CAMBRIDGE TRACTS IN MATHEMATICS General Editors H. BASS, H. HALBERSTAM, J.F.C. KINGMAN, J.E. ROSEBLADE & C.T.C. WALL
83
General irreducible Markov chains and non-negative operators
ESA NUMMELIN Associate Professor of Applied Mathematics, University of Helsinki
The right of the University of Cambridge to print and sell all manner of books was granted by Henry VIII in 1534. The University has printed and published continuously since 1584.
CAMBRIDGE UNIVERSITY PRESS Cambridge London New York New Rochelle Melbourne Sydney
PUBLISHED BY THE PRESS SYNDICATE OF THE UNIVERSITY OF CAMBRIDGE
The Pitt Building, Trumpington Street, Cambridge, United Kingdom

CAMBRIDGE UNIVERSITY PRESS
The Edinburgh Building, Cambridge CB2 2RU, UK
40 West 20th Street, New York NY 10011-4211, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
Ruiz de Alarcón 13, 28014 Madrid, Spain
Dock House, The Waterfront, Cape Town 8001, South Africa
http://www.cambridge.org

© Cambridge University Press 1984

This book is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 1984
First paperback edition 2004

A catalogue record for this book is available from the British Library
Library of Congress catalogue card number: 83-23995
ISBN 0 521 25005 6 hardback
ISBN 0 521 60494 X paperback
To Leena, Mikko and Anna
Contents

Preface
1 Preliminaries
  1.1. Kernels
  1.2. Markov chains
2 Irreducible kernels
  2.1. Closed sets
  2.2. $\varphi$-irreducibility
  2.3. The small functions
  2.4. Cyclicity
3 Transience and recurrence
  3.1. Some potential theory
  3.2. R-transience and R-recurrence
  3.3. Stopping times for Markov chains
  3.4. Hitting and exit times
  3.5. The dissipative and conservative parts
  3.6. Recurrence
4 Embedded renewal processes
  4.1. Renewal sequences and renewal processes
  4.2. Kernels and Markov chains having a proper atom
  4.3. The general regeneration scheme
  4.4. The split chain
5 Positive and null recurrence
  5.1. Subinvariant and invariant functions
  5.2. Subinvariant and invariant measures
  5.3. Expectations over blocks
  5.4. Recurrence of degree 2
  5.5. Geometric recurrence
  5.6. Uniform recurrence
  5.7. Degrees of R-recurrence
6 Total variation limit theorems
  6.1. Renewal theory
  6.2. Convergence of the iterates $K^n(x, A)$
  6.3. Ergodic Markov chains
  6.4. Ergodicity of degree 2
  6.5. Geometric ergodicity
  6.6. Uniform ergodicity
  6.7. R-ergodic kernels
7 Miscellaneous limit theorems for Harris recurrent Markov chains
  7.1. Sums of transition probabilities
  7.2. Ratios of sums of transition probabilities
  7.3. Ratios of transition probabilities
  7.4. A central limit theorem
Notes and comments
List of symbols and notation
Bibliography
Index
Preface
The basic object of study in this book is the theory of discrete-time Markov processes or, briefly, Markov chains, defined on a general measurable space and having stationary transition probabilities.

The theory of Markov chains with values in a countable set (discrete Markov chains) can nowadays be regarded as part of classical probability theory. Its mathematical elegance, often involving the use of simple probabilistic arguments, and its practical applicability have made discrete Markov chains standard material in textbooks on probability theory and stochastic processes.

It is clear that the analysis of Markov chains on a general state space requires more elaborate techniques than in the discrete case. Despite these difficulties, by the beginning of the 1970s the general theory had developed to a mature state where all of the fundamental problems (such as cyclicity, the recurrence-transience classification, the existence of invariant measures, the convergence of the transition probabilities) had been answered in a satisfactory manner. At that time also several monographs on general Markov chains were published (e.g. Foguel, 1969a; Orey, 1971; Rosenblatt, 1971; Revuz, 1975).

The primary motivation for writing this book has been in the recent developments in the theory of general (irreducible) Markov chains. In particular, owing to the discovery of embedded renewal processes, the 'elementary' techniques and constructions based on the notion of regeneration, and common in the study of discrete chains, can now be applied in the general case.

Our second motivation is to point out the close connections between the theories of Markov chains and non-negative operators (operators induced by a non-negative transition kernel). This relationship is analogous to that between discrete Markov chains and non-negative matrices (cf. the monograph by Seneta (1981)).
Since the emphasis here lies on Markov chains we shall discuss the theory of general non-negative transition kernels only as far as it naturally arises as an extension of the theory of markovian transition kernels. However, even within this relatively narrow scope, we are able to develop some fundamental concepts and results, among them a general Perron-Frobenius-type theory for non-negative kernels. (For an account of the general, functional analytic approach to non-negative operators the reader is referred to the monograph by Schaefer (1974).)

Chapter 1 contains basic definitions and some preliminary results on kernels and Markov chains.

In Chapter 2 we examine the fundamental concepts of irreducibility and cyclicity.

Chapter 3 deals with the concepts of transience and recurrence and the associated decomposition results. First we shall analyse general irreducible kernels and prove Vere-Jones' and Tweedie's theorem which classifies them as R-recurrent or R-transient kernels. The remainder of Chapter 3 deals with Markov chains. Starting from elementary potential theoretic notions we end up with Hopf's decomposition theorem, stating that the state space of a Markov chain can be split into a 'transient' and a 'recurrent' part.

Chapter 4 is concerned with renewal processes and renewal sequences embedded in general irreducible Markov chains and non-negative kernels. The techniques introduced in Chapter 4 form a basic tool in the proofs and constructions of the later chapters.

Chapter 5 deals with the principal 'eigenfunctions' and 'eigenmeasures' (to be called here R-invariant functions and R-invariant measures, respectively) of a non-negative kernel, and with the related concepts of R-positive and R-null recurrence. A systematic study is also made of the degrees of (positive) recurrence for Markov chains; in Chapter 6 these are then used to give results regarding the rates of convergence of the transition probabilities.

Finally, Chapters 6 and 7 develop the limit theory of the iterates of non-negative kernels and Markov chain transition probabilities. Chapter 6 is devoted to total variation convergence results, among them Orey's fundamental convergence theorem and various results concerning the rate of convergence.
Chapter 7 contains miscellaneous limit theorems, including results on the convergence of sums and ratios of transition probabilities and central limit theorems.

Our approach is based on the use of probabilistic notions and arguments. Most of the concepts and results, even if first formulated for general, non-markovian kernels, will be interpreted in terms of Markov chain sample paths. It is assumed that the reader is familiar with basic concepts of probability theory, such as random variables, conditional expectations, etc.

The reader interested only in operator theory can read Sections 1.1, 2.1-2.4, 3.1-3.2, 4.1-4.3, 5.1-5.2 and 6.2, skipping everywhere those parts where Markov chains (and probability theory in general) are discussed. (However, for 6.2 the renewal theorems of the preceding Section 6.1 should be consulted. Although Sections 5.7 and 6.7 deal with general non-markovian kernels, the preceding 'probabilistic' sections form a necessary background for them.)
The book contains a few recurring examples (indicated by the letters (a)-(k)), the primary aim of which is to illustrate the general theoretical concepts and results in certain special cases. Some examples refer to practical applications of general Markov chains. In most cases the detailed calculations leading to the statements and results in the examples are left to the reader as exercises.

The references which have been used can be found in the bibliography. They are discussed in a separate section entitled 'Notes and comments'. We do not intend to give a systematic account of the historical development of the concepts and results presented in this book. In order to trace this development more exactly, the reader is advised to study the references and their bibliographies.

There are many people to whom I am indebted for their support and criticism. My special thanks are due to Elja Arjas for many comments and suggestions which helped to improve the quality of the text. I also wish to thank Richard Tweedie for important comments on an early version. For helpful remarks I am grateful to Juha Ahtola, Heikki Bonsdorff, Priscilla Greenwood, Seppo Niemi and Karen Simon. For the typing of the manuscript I thank Raili Pauninsalo. The support received from the Academy of Finland and from the Emil Aaltonen Foundation is also gratefully acknowledged.

Helsinki, June 1983
Esa Nummelin
1 Preliminaries
In this chapter we introduce the basic terminology and definitions concerning Markov chains and non-negative kernels.
1.1 Kernels

Let $E$ be a set and $\mathcal{E}$ a $\sigma$-algebra of subsets of $E$. We assume that the $\sigma$-algebra $\mathcal{E}$ is countably generated, i.e. generated by a countable collection of subsets of $E$. The measurable space $(E, \mathcal{E})$ is called the state space and the points of $E$ are called states. The symbol $\mathcal{E}$ will also be used to denote the collection of extended real valued measurable functions on $(E, \mathcal{E})$.

The symbols $x, y, \ldots$ denote states, $A, B, \ldots$ denote elements of the $\sigma$-algebra $\mathcal{E}$, and $f, g, \ldots$ denote extended real valued measurable functions on $(E, \mathcal{E})$. We write $\mathcal{M}$ for the collection of signed measures on $(E, \mathcal{E})$. The symbols $\lambda, \mu, \ldots$ denote elements of $\mathcal{M}$. When $\mathcal{A}$ is any collection of functions or signed measures, we write $\mathcal{A}_+$ (resp. $b\mathcal{A}$) for the class of non-negative (resp. bounded) elements of $\mathcal{A}$. In what follows, we shall refer to the elements of $\mathcal{M}_+$ simply as measures. $\mathcal{M}^+$ denotes the class of positive measures, i.e. $\mathcal{M}^+ = \{\lambda \in \mathcal{M}_+ : \lambda(E) > 0\}$. The elements of $b\mathcal{M}$ will also be called finite signed measures.

Definition 1.1. A (non-negative) kernel on $(E, \mathcal{E})$ is any map $K : E \times \mathcal{E} \to \bar{\mathbb{R}}_+$ satisfying the following two conditions:
(i) for any fixed set $A \in \mathcal{E}$, the function $K(\cdot, A)$ is measurable;
(ii) for any fixed state $x \in E$, the set function $K(x, \cdot)$ is a measure on $(E, \mathcal{E})$.

A kernel $K$ is called

$\sigma$-finite, if there exists an $\mathcal{E} \otimes \mathcal{E}$-measurable function $f$, $f > 0$ everywhere, such that
\[ \int K(x, dy) f(x, y) < \infty \quad \text{for all } x \in E; \]
finite, if
\[ K(x, E) < \infty \quad \text{for all } x \in E; \]
bounded, if
\[ \sup_{x \in E} K(x, E) < \infty; \]
substochastic, if
\[ K(x, E) \le 1 \quad \text{for all } x \in E; \]
and stochastic, if
\[ K(x, E) = 1 \quad \text{for all } x \in E. \]
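As an informal illustration of Definition 1.1 (not part of the original text), the following sketch treats the simplest case of a finite state space, where a kernel is just a non-negative matrix. The function names `total_mass` and `classify` and the three sample matrices are hypothetical and chosen only for this example.

```python
# Sketch under an assumption: E = {0, ..., N-1} is finite and the kernel is
# stored as a non-negative matrix k[x][y] = K(x, {y}).  Names are illustrative.

def total_mass(k, x):
    """K(x, E): the total mass of the measure K(x, .)."""
    return sum(k[x])

def classify(k, tol=1e-12):
    """Return the strongest label of Definition 1.1 that applies to k."""
    masses = [total_mass(k, x) for x in range(len(k))]
    if all(abs(m - 1.0) <= tol for m in masses):
        return "stochastic"
    if all(m <= 1.0 + tol for m in masses):
        return "substochastic"
    # On a finite state space every matrix with finite entries is bounded.
    return "bounded"

P = [[0.5, 0.5], [0.25, 0.75]]   # every row sums to 1
Q = [[0.5, 0.25], [0.0, 0.5]]    # every row sums to at most 1
K = [[2.0, 1.0], [0.0, 3.0]]     # a general non-negative matrix

print(classify(P), classify(Q), classify(K))
```

On a finite state space the distinctions between $\sigma$-finite, finite and bounded kernels disappear; they matter only on infinite state spaces.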
Examples 1.1. (a) Suppose that $E$ is discrete, that means a countable set. Let $\mathcal{E}$ be the $\sigma$-algebra of all subsets of $E$. Then any kernel $K$ on $(E, \mathcal{E})$ can be identified with the non-negative matrix
\[ k(x, y) := K(x, \{y\}), \quad x, y \in E. \]
$K$ is (for example) $\sigma$-finite if and only if the matrix elements $k(x, y)$ are finite.

(b) Let $\varphi$ be a $\sigma$-finite measure on $(E, \mathcal{E})$ and let $k$ be a non-negative $\mathcal{E} \otimes \mathcal{E}$-measurable function. The kernel
\[ K(x, dy) := k(x, y)\varphi(dy) \]
is called an integral kernel (with basis $\varphi$ and density $k$). Clearly, $K$ is $\sigma$-finite if and only if, for all $x \in E$, $k(x, y)$ is finite for $\varphi$-almost all $y \in E$.

Any kernel $K$ can be interpreted as a non-negative linear operator on the cone $\mathcal{E}_+$ by defining
\[ Kf(x) = \int K(x, dy) f(y), \quad f \in \mathcal{E}_+. \]
(When integration is over the whole state space $E$, we usually omit the symbol $E$ indicating the integration area.) Similarly, $K$ acts as an operator on $\mathcal{M}_+$:
\[ \lambda K(A) = \int \lambda(dx) K(x, A), \quad \lambda \in \mathcal{M}_+. \]

We denote by $I$ the identity kernel $I(x, A)$, $x \in E$, $A \in \mathcal{E}$:
\[ I(x, A) := 1_A(x) := \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases} \]

If $K_1$ and $K_2$ are two kernels, their product kernel $K_1 K_2$ is defined by
\[ K_1 K_2(x, A) = \int K_1(x, dy) K_2(y, A). \]

The iterates $K^n$, $n \ge 0$, of a kernel $K$ are defined by setting $K^0 = I$, and iteratively,
\[ K^n = K K^{n-1} \quad \text{for } n \ge 1. \]

Henceforth we make the following:

Basic assumption. $K$ is a fixed non-negative kernel on $(E, \mathcal{E})$. All the iterates $K^n$, $n \ge 1$, of $K$ are $\sigma$-finite.
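For a discrete state space as in Example 1.1(a), the operator actions $Kf$ and $\lambda K$, the product kernel and the iterates all reduce to matrix arithmetic. The following is a minimal sketch under that assumption; the helper names (`apply_to_function`, `apply_to_measure`, `product`, `iterate`) are hypothetical and do not come from the text.

```python
# Sketch for a finite state space: kernels are matrices, functions are
# column-like lists f[y], and measures are row-like lists lam[x].

def apply_to_function(k, f):
    """(Kf)(x) = sum over y of k(x, y) f(y)."""
    return [sum(kxy * fy for kxy, fy in zip(row, f)) for row in k]

def apply_to_measure(lam, k):
    """(lam K)({y}) = sum over x of lam({x}) k(x, y)."""
    return [sum(lam[x] * k[x][y] for x in range(len(k)))
            for y in range(len(k[0]))]

def product(k1, k2):
    """(K1 K2)(x, {y}) = sum over z of k1(x, z) k2(z, y)."""
    return [[sum(k1[x][z] * k2[z][y] for z in range(len(k2)))
             for y in range(len(k2[0]))] for x in range(len(k1))]

def iterate(k, n):
    """K^n, with K^0 = I and K^n = K K^(n-1)."""
    size = len(k)
    result = [[1.0 if x == y else 0.0 for y in range(size)]
              for x in range(size)]
    for _ in range(n):
        result = product(k, result)
    return result

K = [[0.5, 0.5], [0.0, 1.0]]
f = [1.0, 3.0]
lam = [1.0, 0.0]

print(apply_to_function(K, f))   # the function Kf
print(apply_to_measure(lam, K))  # the measure lam K
print(iterate(K, 2))             # the second iterate K^2
```

In this representation the two actions are compatible in the expected way: integrating $Kf$ against $\lambda$ gives the same number as integrating $f$ against $\lambda K$.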
In the case where $K$ is substochastic, we shall use the symbol $P$ instead of $K$, and call $K = P$ a transition probability. Note that in this case, trivially, all the iterates $P^n$, $n \ge 1$, are substochastic (whence $\sigma$-finite).
1.2 Markov chains

Let $(\Omega, \mathcal{F})$ be a measurable space, to be called the sample space, and let $\mathbb{P}$ be a probability measure on $(\Omega, \mathcal{F})$. A measurable map $\xi$ defined on $(\Omega, \mathcal{F})$ and taking values in an arbitrary measurable space $(\Omega', \mathcal{F}')$ is called an $\Omega'$-valued random element. An $\bar{\mathbb{R}}$-valued random element is called a random variable. A sequence $(\xi_n; n \ge 0)$ of $\Omega'$-valued random elements is called an $\Omega'$-valued stochastic process. Note that the stochastic process $(\xi_n; n \ge 0)$ can also be viewed as an $(\Omega')^{\infty}$-valued random element.

If $\xi$ is an $\Omega'$-valued random element, we shall write $\mathcal{L}(\xi)$ for the probability distribution of $\xi$, i.e.,
\[ \mathcal{L}(\xi)(A') = \mathbb{P}\{\xi \in A'\}, \quad A' \in \mathcal{F}'. \]
The analogous notation will be used for conditional distribution.

Let $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_n \subseteq \cdots$ be an increasing sequence of sub-$\sigma$-algebras of $\mathcal{F}$, to be called a history. A stochastic process $(\xi_n)$ is said to be adapted to the history $(\mathcal{F}_n)$, if $\xi_n$ is $\mathcal{F}_n$-measurable for each $n \ge 0$. Note that a stochastic process $(\xi_n)$ is always adapted to the history
\[ \mathcal{F}_n^{\xi} = \sigma(\xi_0, \ldots, \xi_n), \quad n \ge 0, \]
called the internal history of $(\xi_n)$.

A Markov chain $(X_n; n \ge 0)$ is an $E$-valued stochastic process having the Markov property; that means, at every $n$, the next state $X_{n+1}$ depends only on the present state $X_n$ of the process. We suppose for a while that $K = P$ is a stochastic transition probability.

Definition 1.2. An $E$-valued stochastic process $(X_n; n \ge 0)$ is called a Markov chain with transition probability $P$, provided that
\[ \mathcal{L}(X_{n+1} \mid \mathcal{F}_n^X) = \mathcal{L}(X_{n+1} \mid X_n) = P(X_n, \cdot) \quad \text{for all } n \ge 0. \tag{1.1} \]
(We shall omit systematically the phrase 'almost surely' when dealing with equalities involving conditional distributions.)

A Markov chain $(X_n)$ is called Markov with respect to a history $(\mathcal{F}_n)$, provided that it is adapted to $(\mathcal{F}_n)$, and (1.1) holds when $\mathcal{F}_n^X$ is replaced by $\mathcal{F}_n$, i.e.,
\[ \mathcal{L}(X_{n+1} \mid \mathcal{F}_n) = \mathcal{L}(X_{n+1} \mid X_n) = P(X_n, \cdot) \quad \text{for all } n \ge 0. \tag{1.2} \]
Sometimes, if we want to emphasize that $(X_n)$ is Markov w.r.t. a general history $(\mathcal{F}_n)$ we will speak of the Markov chain $(X_n, \mathcal{F}_n)$.

The distribution $\mathcal{L}(X_0)$ of $X_0$ is the initial distribution of the Markov
chain $(X_n)$. If $\mathcal{L}(X_0) = \varepsilon_x$, i.e., $\mathbb{P}\{X_0 = x\} = 1$, for some state $x$, then $x$ is called the initial state of the chain.

Examples 1.2. (a) Let $(X_n; n \ge 0)$ be a Markov chain on a discrete state space $E$ (abbreviated 'a discrete Markov chain') with transition probability $P$; $P$ can be identified with the matrix
\[ p(x, y) := P(x, \{y\}) = \mathbb{P}\{X_{n+1} = y \mid X_n = x\}, \quad x, y \in E,\ n \ge 0, \]
called the transition matrix of $(X_n)$ (cf. Example 1.1(a)). We have
\[ p^m(x, y) := P^m(x, \{y\}) = \mathbb{P}\{X_{n+m} = y \mid X_n = x\}, \quad x, y \in E,\ m, n \ge 0. \]
(c) Let $z_n$, $n = 1, 2, \ldots$, be i.i.d. (abbreviation for 'independent, identically distributed'), finite random variables with common distribution $F$, $F(A) = \mathbb{P}\{z_n \in A\}$, $A \in \mathcal{R}$. Let $Z_0$ be a finite random variable, independent of $\{z_n; n \ge 1\}$, with distribution $F_0$. Define
\[ Z_n = Z_0 + \sum_{m=1}^{n} z_m \quad \text{for } n \ge 1. \]
The stochastic process $(Z_n; n \ge 0)$ is called a random walk (on $\mathbb{R}$). Clearly, it is a Markov chain on $(\mathbb{R}, \mathcal{R})$ with initial distribution $\mathcal{L}(Z_0) = F_0$ and transition probability
\[ P(x, A) = F(A - x), \quad x \in \mathbb{R},\ A \in \mathcal{R} \]
($A - x := \{y - x : y \in A\}$).

(d) Let $z_n$, $n \ge 1$, and $Z_0$ be as in the above example. Suppose that $Z_0 \ge 0$ a.s. (abbreviation for 'almost surely'). Set $W_0 = Z_0$, and
\[ W_n = (W_{n-1} + z_n)_+ \quad \text{for } n \ge 1. \]
The stochastic process $(W_n; n \ge 0)$ is called a reflected random walk (on $\mathbb{R}_+$). It is a Markov chain on $(\mathbb{R}_+, \mathcal{R}_+)$ with initial distribution $\mathcal{L}(W_0) = F_0$ and transition probability
\[ P(x, A) = 1_A(0) F((-\infty, -x)) + F(A - x), \quad x \in \mathbb{R}_+,\ A \in \mathcal{R}_+. \]
(In queueing theory $W_n$ is commonly interpreted as the waiting time of the $n$th customer (before service). This follows if we set $z_n = s_{n-1} - (T_n - T_{n-1})$, where $s_n$ is the service time of the $n$th customer and $T_n$ is the arrival epoch of the $n$th customer, and assume that $s_0, s_1, \ldots$ are i.i.d., and $T_1 - T_0, T_2 - T_1, \ldots$ are i.i.d., independent of $\{s_n; n \ge 0\}$.)
(e) Let $z_n$, $n \ge 1$, and $Z_0$ be as in Example (c). Suppose that they all are non-negative (a.s.). Then the random walk $(Z_n)$ is called a renewal process on $\mathbb{R}_+$. For any $t \ge 0$, write
\[ V_t^+ = \inf\{Z_n - t : Z_n \ge t,\ n \ge 0\}. \]
The continuous-time stochastic process $(V_t^+; t \ge 0)$ is called the forward process associated with the renewal process $(Z_n; n \ge 0)$. For any $\delta > 0$, the skeletons $(V_{n\delta}^+; n \ge 0)$ form a Markov chain on $(\mathbb{R}_+, \mathcal{R}_+)$ with initial distribution $F_0$.

(f) Let $z_n$, $n \ge 1$, and $Z_0$ be as in Example (c). Let $\rho$ be a constant. Set $R_0 = Z_0$, and iteratively
\[ R_n = \rho R_{n-1} + z_n, \quad n \ge 1. \]
The stochastic process $(R_n; n \ge 0)$, called an autoregressive process, is a Markov chain on $(\mathbb{R}, \mathcal{R})$.

(g) Examples (c), (d) and (f) are special cases of the following scheme, which could be called a stochastic difference equation: Let $(E', \mathcal{E}')$ be an arbitrary measurable space, and let $f : E \times E' \to E$ be jointly measurable. Let $z_n$, $n \ge 1$, be i.i.d., $E'$-valued random elements, and let $X_0$ be an $E$-valued random element, independent of $(z_n; n \ge 1)$. Setting iteratively
\[ X_n = f(X_{n-1}, z_n), \quad n \ge 1, \]
we obtain a Markov chain on $(E, \mathcal{E})$.

In the above stochastic difference equation the Markov property is retained if we allow $z_n$ to depend on the previous state $X_{n-1}$:
\[ \mathcal{L}(z_n \mid \mathcal{F}_{n-1}^X) = P'(X_{n-1}, \cdot) \]
for some transition probability $P'$ from $(E, \mathcal{E})$ into $(E', \mathcal{E}')$. The resulting Markov chain $(X_n)$ represents an abstract learning model if we interpret $X_n$ as the states of learning and the random variables $z_n$ as events (induced by the states of learning through the transition probability $P'$). (See e.g. Norman, 1972.)

In what follows, we shall refer to Examples (a)-(g) by using the above lettering.

Let us again return to the general theory. If $(X_n)$ is a Markov chain with initial distribution $\mathcal{L}(X_0) = \lambda$ and transition probability $P$, then clearly
\[ \mathbb{P}\{X_0 \in A_0, \ldots, X_n \in A_n\} = \int_{A_0} \lambda(dx_0) \int_{A_1} P(x_0, dx_1) \cdots \int_{A_n} P(x_{n-1}, dx_n) \tag{1.3} \]
for all $n \ge 0$, $A_0, \ldots, A_n \in \mathcal{E}$. In particular, we have $\mathcal{L}(X_n) = \lambda P^n$.
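For a discrete chain the integrals in (1.3) become sums over paths, and the identity $\mathcal{L}(X_n) = \lambda P^n$ can be checked numerically. The sketch below does this for a hypothetical two-state chain; the matrix `P`, the initial distribution `lam` and all function names are assumptions made for the example.

```python
from itertools import product as cartesian

# Hypothetical two-state chain; the numbers are illustrative only.
P = [[0.9, 0.1], [0.4, 0.6]]
lam = [0.5, 0.5]

def path_probability(path):
    """lam(x0) P(x0, x1) ... P(x_{n-1}, x_n): the discrete form of (1.3)."""
    prob = lam[path[0]]
    for x, y in zip(path, path[1:]):
        prob *= P[x][y]
    return prob

def marginal_by_paths(n):
    """Distribution of X_n obtained by summing (1.3) over all paths."""
    dist = [0.0, 0.0]
    for path in cartesian(range(2), repeat=n + 1):
        dist[path[-1]] += path_probability(path)
    return dist

def marginal_by_iteration(n):
    """lam P^n, computed by repeated multiplication from the right."""
    dist = list(lam)
    for _ in range(n):
        dist = [sum(dist[x] * P[x][y] for x in range(2)) for y in range(2)]
    return dist

print(marginal_by_paths(3))
print(marginal_by_iteration(3))
```

The two printed vectors should agree up to floating-point rounding; the path enumeration grows exponentially in $n$ and is meant only as a check.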
Conversely, if an $E$-valued stochastic process $(X_n)$ satisfies (1.3) for all $n \ge 0$, $A_0, \ldots, A_n \in \mathcal{E}$, then it is a Markov chain with initial distribution $\lambda$ and transition probability $P$.

For any given probability measure $\lambda$ and stochastic transition probability $P$ on $(E, \mathcal{E})$ we can always construct a Markov chain with initial distribution $\lambda$ and transition probability $P$: Set $\Omega = E^{\infty}$, $\mathcal{F} = \mathcal{E}^{\otimes\infty}$, and define $X_n(\omega) = \omega_n$, the $(n+1)$th coordinate of $\omega = (\omega_0, \omega_1, \ldots) \in \Omega$. On the cylinder sets $A_0 \times \cdots \times A_n$ of $\mathcal{F}$ define a probability measure $\mathbb{P}$ according to the formulas (1.3) and extend it to $\mathcal{F}$ to obtain the desired Markov chain. The Markov chain, constructed in this manner, is called the canonical Markov chain corresponding to the initial distribution $\lambda$ and transition probability $P$. The sample space $(\Omega, \mathcal{F}) = (E^{\infty}, \mathcal{E}^{\otimes\infty})$ is the canonical sample space. For details of this construction see e.g. Doob (1953), Suppl., §2, Neveu (1965), Sect. V.1, or Revuz (1975), Ch. 1, §2.

If the transition probability $P$ is not stochastic then we proceed as follows: Take a point $\Delta$ not belonging to the set $E$ and adjoin it to $E$ to obtain the extended state space $(E_\Delta, \mathcal{E}_\Delta) = (E \cup \{\Delta\}, \sigma(\mathcal{E}, \{\Delta\}))$. Extend $\lambda$ to $(E_\Delta, \mathcal{E}_\Delta)$ by setting $\lambda(\{\Delta\}) = 0$. The transition probability $P$ is extended to $(E_\Delta, \mathcal{E}_\Delta)$ by setting
\[ P(x, \{\Delta\}) = 1 - P(x, E), \quad x \in E, \qquad P(\Delta, \{\Delta\}) = 1. \]
Clearly, this extended $P$ is a stochastic kernel on $(E_\Delta, \mathcal{E}_\Delta)$ so that we can construct the corresponding canonical Markov chain on $(E_\Delta, \mathcal{E}_\Delta)$. The point $\Delta$ is called the cemetery of the Markov chain $(X_n)$.

In what follows, whenever $K = P$ is a transition probability we shall think of it as the transition probability of a Markov chain $(X_n)$. When $P$ is not stochastic, we automatically make the extension described above.

In the sequel we will often use the notation $\mathbb{P}_\lambda$ (resp. $\mathbb{P}_x$) instead of $\mathbb{P}$ to indicate a specific initial distribution $\lambda$ (resp. initial state $x$) of the chain.
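The cemetery extension described above is easy to carry out for a substochastic matrix. The following sketch assumes a finite state space and represents the cemetery by one extra index appended after the states of $E$; the function name `add_cemetery` is hypothetical.

```python
def add_cemetery(p):
    """Extend a substochastic matrix on E to a stochastic one on E plus a cemetery.

    The cemetery is the extra last index.  Each original state x sends its
    missing mass 1 - P(x, E) there, and the cemetery itself is absorbing.
    """
    n = len(p)
    extended = [list(row) + [1.0 - sum(row)] for row in p]
    extended.append([0.0] * n + [1.0])
    return extended

P = [[0.5, 0.25], [0.0, 0.5]]   # illustrative substochastic matrix
P_ext = add_cemetery(P)

for row in P_ext:
    print(row, sum(row))
```

Every row of the extended matrix sums to one, so the canonical construction for stochastic kernels applies to it directly.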
More generally, if $\lambda \in \mathcal{M}_+$ is an arbitrary measure on $(E, \mathcal{E})$, we write $\mathbb{P}_\lambda$ for the measure $\int \lambda(dx)\,\mathbb{P}_x(\cdot)$ on $(\Omega, \mathcal{F})$. The symbol $\mathbb{E}$ denotes the expectation operator corresponding to the probability measure $\mathbb{P}$.

If $(X_n)$ is Markov w.r.t. a history $(\mathcal{F}_n)$, the formula (1.2) expresses in its simplest form the characteristic Markov property of the Markov chain $(X_n, \mathcal{F}_n)$. Below we shall formulate the Markov property in a slightly more general form stating that, given $X_n$, the whole post-$n$-chain $(X_n, X_{n+1}, \ldots)$ is conditionally independent of the past $\mathcal{F}_n$. For this we need the concepts of a shift operator and a functional.

A measurable map $\theta$ on the sample space $(\Omega, \mathcal{F})$ is called a shift operator (for the Markov chain $(X_n)$) provided that
\[ X_n(\theta\omega) = X_{n+1}(\omega) \quad \text{for all } \omega \in \Omega,\ n \ge 0. \]
The iterates $\theta_m$, $m \ge 0$, of $\theta$ are defined by setting $\theta_0 = I_\Omega$, the identity operator on $\Omega$, and iteratively, $\theta_m = \theta \circ \theta_{m-1}$ for $m \ge 1$.

A random variable $\zeta$, which is measurable w.r.t. the $\sigma$-algebra $\mathcal{F}^X := \sigma(X_n; n \ge 0)$, i.e. of the form $\zeta = \eta(X_0, X_1, \ldots)$, $\eta$ $\mathcal{E}^{\otimes\infty}$-measurable, is called a functional (of the Markov chain $(X_n)$). Note that, for any $n \ge 0$, if $\zeta$ is a functional then the functional $\zeta \circ \theta_n$ is measurable w.r.t. the $\sigma$-algebra $\sigma(X_n, X_{n+1}, \ldots)$, and conversely, if a functional $\zeta'$ is measurable w.r.t. $\sigma(X_n, X_{n+1}, \ldots)$ then $\zeta' = \zeta \circ \theta_n$ for some functional $\zeta$.

Theorem 1.1. (The Markov property). Let $(X_n; n \ge 0)$ be a Markov chain w.r.t. a history $(\mathcal{F}_n)$. Then for any non-negative functional $\zeta$,
\[ \mathbb{E}[\zeta \circ \theta_n \mid \mathcal{F}_n] = \mathbb{E}_{X_n}\zeta \quad \text{for all } n \ge 0. \]

Proof. When $\zeta$ is the indicator of a cylinder,
\[ \zeta = 1_{\{X_0 \in A_0, \ldots, X_m \in A_m\}}, \]
the result is a straightforward consequence of (1.2) and (1.3). The extension to the general case follows the standard lines. □

Note that Theorem 1.1 implies the existence of a regular version of the conditional distribution $\mathcal{L}(X_n, X_{n+1}, \ldots \mid \mathcal{F}_n)$, since
\[ \mathcal{L}(X_n, X_{n+1}, \ldots \mid \mathcal{F}_n; X_n = x) = \mathcal{L}(X_0, X_1, \ldots \mid X_0 = x) = \mathbb{P}_x \quad (\text{restricted to } \mathcal{F}^X). \]
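Returning to Examples 1.2 (c), (d), (f) and (g): the random walk, the reflected random walk and the autoregressive process are all instances of the stochastic difference equation $X_n = f(X_{n-1}, z_n)$, which makes them straightforward to simulate. The sketch below is only an illustration; the Gaussian increments, the seed and all function names are assumptions that do not come from the text.

```python
import random

def iterate_sde(f, x0, noise):
    """Return the path X_0, X_1, ... with X_n = f(X_{n-1}, z_n)."""
    path = [x0]
    for z in noise:
        path.append(f(path[-1], z))
    return path

def random_walk_step(x, z):
    """Example (c): Z_n = Z_{n-1} + z_n."""
    return x + z

def reflected_step(x, z):
    """Example (d): W_n = (W_{n-1} + z_n)_+."""
    return max(x + z, 0.0)

def autoregressive_step(rho):
    """Example (f): R_n = rho * R_{n-1} + z_n, for a fixed constant rho."""
    return lambda x, z: rho * x + z

rng = random.Random(0)  # fixed seed, only to make the sketch reproducible
# Gaussian increments with negative drift are an assumption for illustration.
noise = [rng.gauss(-0.1, 1.0) for _ in range(500)]

walk = iterate_sde(random_walk_step, 0.0, noise)
waits = iterate_sde(reflected_step, 0.0, noise)
ar = iterate_sde(autoregressive_step(0.5), 0.0, noise)

print(walk[-1], waits[-1], ar[-1])
```

Feeding the same noise sequence into all three recursions makes it easy to compare the paths: the reflected walk never goes below zero, while the unreflected walk drifts downwards under this choice of increments.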
2 Irreducible kernels
The main theme of this chapter is, broadly speaking, to investigate the 'communication structure' induced by the kernel $K$ on the state space $(E, \mathcal{E})$. That is, we shall be concerned with the relation $x \to A$ on $E \times \mathcal{E}$, defined by
\[ x \to A \quad \text{if and only if} \quad K^n(x, A) > 0 \text{ for some } n \ge 1. \]
When $x \to A$, we say that the set $A \in \mathcal{E}$ is attainable from the state $x \in E$. If $A$ is not attainable from $x$, i.e., $K^n(x, A) = 0$ for all $n \ge 1$, then we write $x \not\to A$.

When $K = P$ is the transition probability of a Markov chain $(X_n)$, we have the following probabilistic interpretation for attainability:
\[ x \to A \quad \text{if and only if} \quad \mathbb{P}_x\{X_n \in A \text{ for some } n \ge 1\} > 0. \]
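When $E$ is finite and $K$ is a non-negative matrix, $K^n(x, \{y\}) > 0$ holds exactly when there is a path of length $n$ from $x$ to $y$ along positive entries, so attainability becomes a graph reachability question. The following sketch works under that assumption; `attainable_from` and `attains` are hypothetical names.

```python
def attainable_from(k, x):
    """Return the set of states y with K^n(x, {y}) > 0 for some n >= 1."""
    n = len(k)
    reached = set()
    frontier = [y for y in range(n) if k[x][y] > 0]
    while frontier:
        y = frontier.pop()
        if y in reached:
            continue
        reached.add(y)
        frontier.extend(z for z in range(n)
                        if k[y][z] > 0 and z not in reached)
    return reached

def attains(k, x, A):
    """x -> A: some state of A is attainable from x."""
    return bool(attainable_from(k, x) & set(A))

# Illustrative matrix: 0 leads to 1, 1 leads to 2, and 2 stays put.
K = [[0.0, 1.0, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]

print(attainable_from(K, 0))   # states attainable from 0
print(attains(K, 2, {0, 1}))   # whether {0, 1} is attainable from 2
```

Note that $x$ itself belongs to the returned set only if the chain can return to $x$ in at least one step, in line with the requirement $n \ge 1$ in the definition.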
2.1 Closed sets
First we shall define and study the so-called closed sets. These are the sets $F \in \mathcal{E}$ such that the complement $F^c$ is not attainable from $F$.

Definition 2.1. A non-empty set $F \in \mathcal{E}$ is called closed (for the kernel $K$), if
\[ K(x, F^c) = 0 \quad \text{for all } x \in F. \]
A set $B \in \mathcal{E}$ is called indecomposable (for $K$), if there do not exist two disjoint closed sets $F_1, F_2 \subseteq B$. A non-empty set $F \in \mathcal{E}$ is called absorbing (for $K$), if
\[ K(x, F) = K(x, E) = 1 \quad \text{for all } x \in F. \]

Clearly, $F \in \mathcal{E}$ is closed if and only if $x \not\to F^c$ for all $x \in F$. Note also that an absorbing set is always closed. Conversely, $F$ may be closed without being absorbing: it is possible that $K(x, F) = K(x, E) \ne 1$ for some $x \in F$.

When $F$ is closed, we can regard $K$ also as a kernel on the restricted state space $(F, \mathcal{E} \cap F)$,
\[ K|_F(x, A) := K(x, A), \quad x \in F,\ A \in \mathcal{E} \cap F. \]
$K|_F$ is called the restriction of $K$ to the closed set $F$. Clearly, the iterates of $K|_F$ coincide with those of $K$ on $F$,
\[ (K|_F)^n = (K^n)|_F. \]
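For a matrix on a finite state space, the notions of Definition 2.1 and the restriction $K|_F$ can be checked mechanically. The sketch below is a hedged illustration only; the function names and the sample matrix are hypothetical.

```python
def is_closed(k, F):
    """F is non-empty and K(x, F^c) = 0 for every x in F."""
    F = set(F)
    outside = [y for y in range(len(k)) if y not in F]
    return bool(F) and all(k[x][y] == 0 for x in F for y in outside)

def is_absorbing(k, F, tol=1e-12):
    """F is non-empty and K(x, F) = K(x, E) = 1 for every x in F."""
    F = set(F)
    return is_closed(k, F) and all(abs(sum(k[x]) - 1.0) <= tol for x in F)

def restrict(k, F):
    """Matrix of the restriction K|_F, with states listed in increasing order."""
    states = sorted(F)
    return [[k[x][y] for y in states] for x in states]

K = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.0],
     [0.0, 0.0, 1.0]]

print(is_closed(K, {1}), is_absorbing(K, {1}))   # closed but not absorbing
print(is_closed(K, {2}), is_absorbing(K, {2}))   # closed and absorbing
print(restrict(K, {1, 2}))
```

In this example $\{1\}$ and $\{2\}$ are disjoint closed sets, so the whole state space is not indecomposable.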
Similarly we can restrict $K$ to the complement $F^c$ of the closed set $F$,
\[ K|_{F^c}(x, A) := K(x, A), \quad x \in F^c,\ A \in \mathcal{E} \cap F^c. \]
Again we have $(K|_{F^c})^n = (K^n)|_{F^c}$.

Note that the restriction of $K$ to an absorbing set is a stochastic transition probability.

The kernel $G$ given by
\[ G = \sum_{n=0}^{\infty} K^n \]
is called the potential kernel (of $K$). It may happen that $G$ is not $\sigma$-finite, since it is possible that $G(x, A)$, $x \in E$, $A \in \mathcal{E}$, admits only the values $0$ and $\infty$. In any case we have the following:
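Proposition 2.1. For any $n \ge 1$:
\[ G = \sum_{m=0}^{n-1} K^m + K^n G = \sum_{m=0}^{n-1} K^m + G K^n. \]
In particular, $G = I + KG = I + GK$, and for any $f \in \mathcal{E}_+$,
\[ \lim_{n \to \infty} K^n G f = 0 \quad \text{on } \{Gf < \infty\}. \]
(The notation $\{Gf < \infty\}$ means the set $\{x \in E : Gf(x) < \infty\}$.)

Proof. Obvious. □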
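To make Proposition 2.1 concrete, the potential kernel of a small matrix can be approximated by a partial sum of the series $\sum_n K^n$ and the identity $G = I + KG$ checked numerically. The sketch below assumes a strictly substochastic $2 \times 2$ matrix for which the series converges; the helper names and the truncation length are choices made for this example only.

```python
def mat_mul(a, b):
    """Matrix product of a and b."""
    return [[sum(a[i][m] * b[m][j] for m in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_add(a, b):
    """Entrywise sum of a and b."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def identity(n):
    """The identity kernel I as an n-by-n matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def potential(k, terms=200):
    """Partial sum I + K + ... + K^(terms-1); approximates G if the series converges."""
    g = identity(len(k))
    power = identity(len(k))
    for _ in range(terms - 1):
        power = mat_mul(k, power)
        g = mat_add(g, power)
    return g

K = [[0.5, 0.25], [0.0, 0.5]]   # illustrative matrix with convergent series
G = potential(K)
rhs = mat_add(identity(2), mat_mul(K, G))   # I + KG

print(G)
print(rhs)
```

For this particular matrix the series can be summed by hand, giving $G = (I - K)^{-1}$ with rows $(2, 1)$ and $(0, 2)$; the truncated sum agrees with this up to rounding.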
Let us denote by $A^0$ the set of states in $A^c$ from which $A$ is not attainable:
\[ A^0 = \{x \in A^c : x \not\to A\} = \{G 1_A = 0\}. \]

Proposition 2.2. For any $A \in \mathcal{E}$: $A^0$ is either empty or closed.

Proof. For all $x \in A^0$: $KG(x, A) \le G(x, A) = 0$, and hence $K(x, (A^0)^c) = 0$. □
Before proceeding further, we have to consider the notion of equivalence for kernels: Two measures $\lambda, \mu \in \mathcal{M}_+$ are said to be equivalent, and we write $\lambda \sim \mu$, if they are mutually absolutely continuous, i.e. have the same null sets.
A $\sigma$-finite kernel $\tilde{K}$ is said to be equivalent to the kernel $K$, if for all $x \in E$, the measures $K(x, \cdot)$ and $\tilde{K}(x, \cdot)$ are equivalent.

Lemma 2.1. There exists a substochastic kernel $\tilde{K}$, $\tilde{K} \le K$, which is equivalent to $K$. Moreover, there exists an $\mathcal{E} \otimes \mathcal{E}$-measurable version $0 < \tilde{k} \le 1$ everywhere, of the Radon-Nikodym derivative
\[ \tilde{k}(x, y) = \frac{\tilde{K}(x, dy)}{K(x, dy)}. \]
The iterated kernels $K^n$ and $\tilde{K}^n$ are equivalent, for any $n \ge 0$.

Proof. Let $f \in \mathcal{E} \otimes \mathcal{E}$, $f > 0$ everywhere, be such that the function $Kf \in \mathcal{E}$ defined by
\[ Kf(x) = \int K(x, dy) f(x, y), \quad x \in E, \]
is finite. Without any loss of generality we may suppose that $f \le 1$. It is easy to check that the desired kernel $\tilde{K}$ is given by
\[ \tilde{K}(x, dy) = \tilde{k}(x, y) K(x, dy), \]
where
\[ \tilde{k}(x, y) = 1_{\{K(x, E) > 0\}} (Kf(x))^{-1} f(x, y). \]
That $K^n$ and $\tilde{K}^n$ are equivalent is obvious. □
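Let $\psi \in \mathcal{M}_+$ be a $\sigma$-finite measure such that $\psi K$ is absolutely continuous w.r.t. $\psi$ (written $\psi K \ll \psi$). There always exist such measures: Take any $\sigma$-finite measure $\varphi \in \mathcal{M}_+$ and set
\[ \psi = \sum_{n=0}^{\infty} 2^{-(n+1)} \tilde{\varphi} \tilde{K}^n, \tag{2.1} \]
where $\tilde{\varphi}$ is a probability measure equivalent to $\varphi$ and $\tilde{K}$ is a substochastic kernel equivalent to $K$.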
Lemma 2.2. (i) If $\psi K \ll \psi$ then $\psi G \sim \psi$.
(ii) If $\psi K \ll \psi$ and $E$ is indecomposable, then $\psi(F) > 0$ for all closed sets $F$.

Proof. (i) By induction $\psi K^n \ll \psi$ for all $n \ge 0$, and hence $\psi G = \sum_{n=0}^{\infty} \psi K^n \ll \psi$. On the other hand, trivially, $\psi \le \psi G$.
(ii) By Proposition 2.2 and by the hypothesis $F^0$ is empty whenever $F$ is closed. Hence $\psi G(F) > 0$, which by (i) implies $\psi(F) > 0$. □

Suppose now that $K = P$ is the transition probability of a Markov chain $(X_n)$. A set $F$ is called closed for $(X_n)$, if it is closed for the transition probability $P$ of $(X_n)$. Similarly, we shall speak about indecomposable and absorbing sets for $(X_n)$, and about the restriction of $(X_n)$ to a closed set $F$ or
to the complement $F^c$. The latter, i.e. the restriction of $(X_n)$ to the complement $F^c$ of a closed set $F$, means that the set $\Delta_F := F + \{\Delta\}$ is made into a cemetery for $(X_n)$.

Note that $F$ is closed for $(X_n)$ if and only if
\[ \mathbb{P}_x\{X_n \in F^c \text{ for some } n \ge 1\} = 0 \quad \text{for all } x \in F; \]
$F$ is absorbing for $(X_n)$ if and only if
\[ \mathbb{P}_x\{X_n \in F \text{ for all } n \ge 1\} = 1 \quad \text{for all } x \in F. \]
2.2 $\varphi$-irreducibility

Let $\varphi \in \mathcal{M}^+$ be a $\sigma$-finite measure on $(E, \mathcal{E})$, and let $B \in \mathcal{E}$ be a $\varphi$-positive set (i.e., a set with $\varphi(B) > 0$).

Definition 2.2. The set $B$ is called $\varphi$-communicating (for the kernel $K$) if every $\varphi$-positive subset $A \subseteq B$ is attainable from $B$, i.e.
\[ x \to A \quad \text{for all } x \in B \text{ and all } \varphi\text{-positive } A \subseteq B. \]
If the whole state space $E$ is $\varphi$-communicating, then the kernel $K$ is called $\varphi$-irreducible. $K$ is called irreducible, if it is $\varphi$-irreducible for some $\varphi$. In this case the measure $\varphi$ is called an irreducibility measure for $K$.

Examples 2.1. (a) When $E$ is discrete, irreducibility (in the sense of the above definition) means (using an obvious notation) that for some non-empty subset $B \subseteq E$,
\[ x \to y \quad \text{for all } x \in E,\ y \in B. \]
The corresponding irreducibility measure is the counting measure on $B$, that is the measure $\mathrm{Card}_B$ defined by $\mathrm{Card}_B(A) = \mathrm{Card}(A \cap B)$, $A \subseteq E$. (In matrix theory, irreducibility usually means Card-irreducibility, i.e. $x \to y$ for all $x, y \in E$.)

(b) Suppose that $K$ is the integral kernel with basis $\varphi$ and density $k$. Then for every $n \ge 1$, $K^n$ is the integral kernel with basis $\varphi$ and density $k^{(n)}$, where $k^{(1)}(x, y) = k(x, y)$, and
\[ k^{(n)}(x, y) = \int k(x, z) k^{(n-1)}(z, y) \varphi(dz) \quad \text{for } n \ge 2. \]
Now, if there exists a $\varphi$-positive set $B \in \mathcal{E}$ such that for all $x \in E$,
\[ \sum_{n=1}^{\infty} k^{(n)}(x, y) > 0 \quad \text{for } \varphi\text{-almost all } y \in B, \]
then $K$ is $\varphi I_B$-irreducible, where $\varphi I_B$ denotes the restriction of $\varphi$ to $B$.
(c) (The random walk). Let $\ell$ denote the Lebesgue measure on $(\mathbb{R}, \mathcal{R})$. A distribution $F$ is called spread-out if some convolution power $F^{*n}$ of $F$ is not singular w.r.t. $\ell$. From the well known theorem of analysis, according to which the convolution of a bounded and of an $\ell$-integrable function on $(\mathbb{R}, \mathcal{R})$ is continuous, it follows that if $F$ is spread-out then there is an interval $[a, b]$, $a < b$, and a constant $\beta > 0$ such that
\[ F^{*2n}(dt) \ge \beta\,dt \quad \text{on } [a, b]. \]
(We write $\ell(dt) = dt$.) The random walk $(Z_n)$ is $\ell$-irreducible if and only if the increment distribution $F$ is spread-out.

(d) (The reflected random walk). Let $\varepsilon_0$ denote the probability measure on $(\mathbb{R}_+, \mathcal{R}_+)$ assigning unit mass to the origin. The reflected random walk $(W_n)$ is $\varepsilon_0$-irreducible if and only if $\mathbb{P}\{z_1 < 0\} > 0$, i.e., $F(\mathbb{R}_+) < 1$.

(e) (The forward process). Let $F(t) = F([0, t])$, $t \ge 0$, and $M = \operatorname{ess\,sup} z_1 = \sup\{t : F(t) < 1\}$. Let $\ell_a$ denote the restriction of the Lebesgue measure $\ell$ to $[0, a)$, $0 < a \le \infty$. Suppose that $F$ is spread-out. Then, for any $\delta > 0$, the Markov chain $(V_{n\delta}^+; n \ge 0)$ is $\ell_M$-irreducible.

(f) (The autoregressive process). Suppose that $|\rho| < 1$. Also suppose that the distribution $F$ of $z_n$ is not singular (w.r.t. $\ell$), i.e., there is $f \in \mathcal{R}_+$ with $\int_{\mathbb{R}} f(t)\,dt > 0$ such that
\[ F(dt) \ge f(t)\,dt. \]
Then there is an interval $\Gamma$ of positive length such that the autoregressive process $(R_n; n \ge 0)$ is $\ell I_\Gamma$-irreducible. (Hint: Consider first the case where $0 < \rho < 1$ and there is an interval $[a, b]$, $a < b$, such that $f > 0$ on $[a, b]$. In this case, prove that the interval $\Gamma$ defined by $\Gamma = [(1 - \rho)^{-1} a, (1 - \rho)^{-1} b]$ is $\ell$-communicating, and then, that $x \to \Gamma$ for all $x \in \mathbb{R}$. After this consider the general case; observe that the 2-step chain $(R_{2n}; n \ge 0)$,
\[ R_{2n} = \rho^2 R_{2(n-1)} + \rho z_{2n-1} + z_{2n}, \quad n \ge 1, \]
satisfies the above hypotheses.)

Let us denote
\[ B^+ = B \cup \{x \in E : x \to B\} = \{G 1_B > 0\}. \]
Recall from Proposition 2.2 that the complement $(B^+)^c = B^0$ is either empty or closed. Hence we can always restrict the kernel $K$ to $B^+$. We need the following simple lemma:

Lemma 2.3. Let $A, B \in \mathcal{E}$ and $x \in B^+$ be arbitrary. If $y \to A$ for all $y \in B$ then $x \to A$.

Proof. Easy. □
Proposition 2.3. (i) If $B$ is $\varphi$-communicating then it is indecomposable.
(ii) If $B$ is $\varphi$-communicating then $K|_{B^+}$, the restriction of $K$ to $B^+$, is $\varphi I_B$-irreducible.

Proof. (i) Suppose that there were two disjoint closed sets in $B$, say $F_1$ and $F_2$. Then either $B \setminus F_1$ or $B \setminus F_2$ would be $\varphi$-positive. But this leads to a contradiction with the hypothesis.
(ii) Take any $\varphi$-positive set $A \subseteq B$. Then apply Lemma 2.3. □

According to part (ii) of the above proposition, if there exists a $\varphi$-communicating set $B$ in the state space (and we are not interested in that part of the state space from which $B$ is not attainable) then there is no loss of generality in assuming that $K$ is irreducible.

It is clear that any measure $\psi$ which is absolutely continuous w.r.t. an irreducibility measure $\varphi$ is itself an irreducibility measure. Conversely, as we will see, starting from an irreducibility measure $\varphi$, we can construct a maximal irreducibility measure, that is an irreducibility measure $\psi$ such that all other irreducibility measures are absolutely continuous w.r.t. $\psi$. Note that, by definition, a maximal irreducibility measure is unique up to the equivalence of measures.

Proposition 2.4. Suppose that $K$ is $\varphi$-irreducible. Then:
(i) There exists a maximal irreducibility measure.
(ii) An irreducibility measure $\psi$ is maximal if and only if $\psi K \ll \psi$.
(iii) Let $\psi$ be a maximal irreducibility measure. If $\psi(B) = 0$ then also $\psi(B^+) = 0$.

Proof. Suppose that $\psi$ is an irreducibility measure satisfying $\psi K \ll \psi$. By Lemma 2.2(i), $\psi G \ll \psi$. From this and from the $\varphi$-irreducibility it follows that if $\varphi(A) > 0$ then $\psi(A) > 0$. Hence $\psi$ is maximal. Conversely, if $\psi$ is an irreducibility measure then so is $\psi K$, and hence, if $\psi$ is maximal then $\psi K \ll \psi$. By Lemma 2.2(i) we have also (iii). It remains to prove (i). Define a finite measure $\psi$ by (2.1). Clearly $\psi$ is an irreducibility measure satisfying $\psi K \ll \psi$ and it is maximal by (ii). □

Examples 2.2. (a) Suppose that $K$ is a $\mathrm{Card}_B$-irreducible matrix, $B \subseteq E$ (cf. Example 2.1(a)). Set $F = \{y \in E : x \to y \text{ for some } x \in B\}$. Note that by $\mathrm{Card}_B$-irreducibility, $B \subseteq F$. Now $\mathrm{Card}_F$, the counting measure on $F$, is a maximal irreducibility measure. $F$ is closed; moreover, it is the minimal closed set. In what follows, when we deal with an irreducible matrix, $F$ stands for the unique closed set having the properties stated above.
(b) Suppose that $K$ is a $\varphi I_B$-irreducible integral kernel with basis $\varphi$ and density $k$ (cf. Example 2.1(b)). Let $B'$ denote the set
$$B' = B \cup \Big\{ y \in E : \sum_{n=1}^{\infty} \int_B \varphi(dx)\, k^{(n)}(x, y) > 0 \Big\}.$$
Then $\psi = \varphi I_{B'}$ is a maximal irreducibility measure for $K$.

If $K$ is irreducible (with maximal irreducibility measure $\psi$), we call a set $B \in \mathcal{E}$ full (for the kernel $K$) whenever $\psi(B^c) = 0$. It turns out that full and closed sets almost coincide:
Proposition 2.5. Suppose that $K$ is irreducible with maximal irreducibility measure $\psi$. Then: (i) Every closed set $F$ is full. (ii) If $B$ is a full set then there exists a closed set $F \subseteq B$. (iii) If $F_i$, $i \ge 1$, are closed then so is their intersection $\bigcap_{i \ge 1} F_i$.

Proof. (i) $\psi(F^c) > 0$ leads to a contradiction with the hypothesis. (ii) Set $F = ((B^c)^+)^c = \{G 1_{B^c} = 0\}$. By Propositions 2.2 and 2.4(iii), $F$ is closed. (iii) In general, the intersection of a countable number of closed sets is either empty or closed. But now, under the assumption of irreducibility, every $F_i$ is full and hence so is their intersection $\bigcap_{i \ge 1} F_i$. □

The result of Proposition 2.5 is very useful. When we want to prove that a given set $B$ is full, it suffices to prove that $B$ is closed or, at least, that it contains a closed set $F$.
2.3 The small functions

In this section we shall define and study an important class of functions, the so-called small functions. Later we will see that, in some respects, they play a similar role for a general kernel $K$ as individual states do in the case where $E$ is discrete (and $K$ is a matrix).

For the rest of this chapter we suppose that $K$ is an irreducible kernel; $\psi$ denotes a fixed maximal irreducibility measure. For any subclass $\mathcal{A}_+$ of non-negative measurable functions on $(E, \mathcal{E})$, $\mathcal{A}^+$ denotes the subclass of $\psi$-positive elements in $\mathcal{A}_+$:
$$\mathcal{A}^+ = \{ f \in \mathcal{A}_+ : \psi(f) > 0 \} \qquad \Big( \psi(f) \overset{\text{def}}{=} \int f(x)\, \psi(dx) \Big).$$
When $1_A \in \mathcal{A}^+$, i.e., $\psi(A) > 0$, we write simply $A \in \mathcal{A}^+$.

We say that the kernel $K$ satisfies the minorization condition $M(m_0, \beta, s, \nu)$, where $m_0 \ge 1$ is an integer, $\beta > 0$ a constant, $s \in \mathcal{E}^+$ a function, and $\nu \in \mathcal{M}^+$ a measure, if
$$K^{m_0}(x, A) \ge \beta\, s(x)\, \nu(A) \quad \text{for all } x \in E,\ A \in \mathcal{E},$$
or, briefly, writing $s \otimes \nu$ for the kernel $s \otimes \nu(x, A) = s(x)\nu(A)$, $x \in E$, $A \in \mathcal{E}$,
$$K^{m_0} \ge \beta\, s \otimes \nu. \tag{2.2}$$

Definition 2.3. A function $s \in \mathcal{E}^+$ is called small (for the kernel $K$), if $K$ satisfies $M(m_0, \beta, s, \nu)$ for some $m_0 \ge 1$, $\beta > 0$, $\nu \in \mathcal{M}^+$. A measure $\nu \in \mathcal{M}^+$ is called small, if $K$ satisfies $M(m_0, \beta, s, \nu)$ for some $m_0 \ge 1$, $\beta > 0$, $s \in \mathcal{E}^+$. A set $C \in \mathcal{E}^+$ is called small, if its indicator $1_C$ is a small function, i.e., there are $m_0 \ge 1$, $\beta > 0$ and $\nu \in \mathcal{M}^+$ such that
$$K^{m_0}(x, \cdot) \ge \beta\, \nu(\cdot) \quad \text{for all } x \in C.$$
We shall use the symbol $\mathcal{S}^+$ to denote the class of small functions. In what follows, the symbols $s$ and $\nu$ will stand exclusively for a small function and a small measure, respectively.

Examples 2.3. (a) Let $K$ be a $\mathrm{Card}_F$-irreducible matrix. Then any singleton $\{x_0\}$, $x_0 \in F$, is a small set. ($K$ satisfies the minorization condition $M(1, 1, 1_{\{x_0\}}, K(x_0, \cdot))$.)

(b) Suppose that $K$ is a $\varphi I_B$-irreducible integral kernel with basis $\varphi$ and density $k$. If there exist $m_0 \ge 1$, $\beta > 0$, and $\varphi$-positive sets $C, D \in \mathcal{E}$, $C \subseteq B$, such that $k^{(m_0)}(x, y) \ge \beta$ for all $x \in C$, $y \in D$,
then C is a small set. (The corresponding minorization condition is Mmo, fl, lc, 91D).) (d) (The reflected random walk). When F(R +) < 1, the singleton {0} is a small set. (e) (The forward process). Suppose that F is spread-out and that 6 > 0 is such that F[0,(5) < 1. Then there are an integer mo > 1 and a constant 13> 0 such that M(mo , #, 1 [0,6) , e 6) holds, i.e., the interval [0, 6) is a small set. (f) With the assumptions of Example 2.1(f) every bounded set CeM such that oc n F) > 0 is small. (Hint: Take c> 0 to be a small constant, and choose N so big that 1 pIN Ix < e whenever XE C. Then observe that
$$R_N = z_N + \rho z_{N-1} + \cdots + \rho^{N-1} z_1 + \rho^N R_0.$$
By using the hypothesis that $F$ is non-singular we can find an interval $[c, d]$, $c < d$, and a constant $\beta > 0$ such that
$$P^N(x, dy) \ge \beta\, dy \quad \text{whenever } x \in C,\ y \in [c, d].)$$

Remarks 2.1. (i) A small measure $\nu$ is always an irreducibility measure for $K$. Hence it is absolutely continuous w.r.t. $\psi$.
(ii) By 'multiplying' both sides of (2.2) by $K^m$ we see that $K^m s$ is small, for all $m \ge 0$. Similarly, $\nu K^m$ is small for all $m \ge 0$. Hence, by irreducibility, in most cases there is no loss of generality in assuming that $\nu(s) > 0$.

(iii) A small function or measure remains small when multiplied by a constant $\gamma > 0$. Consequently, in most cases one can simply assume that $\beta = 1$.

(iv) For any constant $\gamma > 0$, the set $C = \{s \ge \gamma\}$ is small whenever it is $\psi$-positive.

(v) If $K = P$ is a transition probability, then $\nu$ is finite, and $s$ is bounded by $(\beta \nu(E))^{-1}$. Then, without loss of generality one can assume that
$$P^{m_0} \ge s \otimes \nu, \quad \text{where } 0 \le s \le 1 \text{ and } \nu \text{ is a probability measure.} \tag{2.3}$$
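For a finite state space the minorization condition (2.2) can be checked entry by entry. The following is a minimal sketch, assuming a hypothetical 3-state transition matrix `K` and the helper name `satisfies_minorization` (neither comes from the text). It verifies the condition $M(1, 1, 1_{\{x_0\}}, K(x_0, \cdot))$ of Example 2.3(a), and the statement of Remark 2.1(ii) that $Ks$ is again small with $m_0$ increased by one.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(k, m):
    """Return the m-th iterate K^m for an integer m >= 1."""
    result = k
    for _ in range(m - 1):
        result = mat_mul(result, k)
    return result


def satisfies_minorization(k, m0, beta, s, nu, tol=1e-12):
    """Check K^{m0}(x, y) >= beta * s(x) * nu(y) for all states x, y."""
    km = mat_pow(k, m0)
    n = len(k)
    return all(km[x][y] + tol >= beta * s[x] * nu[y]
               for x in range(n) for y in range(n))


# Hypothetical irreducible transition matrix on the states {0, 1, 2}.
K = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.5, 0.0, 0.5]]

x0 = 0
s = [1.0 if x == x0 else 0.0 for x in range(3)]   # s = indicator of {x0}
nu = K[x0]                                         # nu = K(x0, .)

# Example 2.3(a): the singleton {x0} is small via M(1, 1, 1_{x0}, K(x0, .)).
print(satisfies_minorization(K, 1, 1.0, s, nu))

# Remark 2.1(ii): Ks is small as well, now with m0 = 2 and the same nu.
Ks = [sum(K[x][y] * s[y] for y in range(3)) for x in range(3)]
print(satisfies_minorization(K, 2, 1.0, Ks, nu))
```

Both checks succeed for any non-negative matrix, since the first holds with equality on the row $x_0$ and the second follows from it by multiplying with $K$ from the left.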
It is by no means clear that any small functions or measures should exist. However:

Theorem 2.1. Suppose that $K$ is an irreducible kernel. Then $\mathcal{S}^+ \ne \emptyset$.

In order to prove this theorem we need some new definitions and preliminary results. Recall that the measurable space $(E, \mathcal{E})$ was assumed to be countably generated. Then there is a sequence $(\mathcal{E}_i;\ i \ge 1)$ of finite partitions of $E$ which generate $\mathcal{E}$. We can assume that $\mathcal{E}_{i+1}$ is finer than $\mathcal{E}_i$, i.e. for every $i$, any member of $\mathcal{E}_i$ is a finite union of sets from $\mathcal{E}_{i+1}$. Hence $\sigma(\mathcal{E}_i) \subseteq \sigma(\mathcal{E}_{i+1})$ for every $i$. For $x \in E$, let $E^i_x$ be the unique member of $\mathcal{E}_i$ which includes the state $x$.

Let $\lambda$ and $\varphi$ be two finite measures on $(E, \mathcal{E})$, and let $\lambda(dy) = \lambda_a(dy) + \lambda_s(dy)$ be the usual Lebesgue decomposition of $\lambda$ into the absolutely continuous part $\lambda_a$ and the singular part $\lambda_s$ (w.r.t. $\varphi$). The first lemma is the basic differentiation theorem of measures. For a proof, see e.g. Doob (1953), Ch. 7, §8.

Lemma 2.4.
$$\lim_{i \to \infty} 1_{\{\varphi(E^i_x) > 0\}} \frac{\lambda(E^i_x)}{\varphi(E^i_x)} = \frac{d\lambda_a}{d\varphi}(x) \quad \text{for } \varphi\text{-a.e. } x \in E.$$

Being a $\sigma$-finite measure for each $x \in E$, $K(x, \cdot)$ admits a Lebesgue decomposition
$$K(x, dy) = k(x, y)\varphi(dy) + K_s(x, dy),$$
where for any fixed $x \in E$, $k(x, \cdot) \in \mathcal{E}_+$ and $K_s(x, \cdot)$ is singular w.r.t. $\varphi$. It is not immediately clear that we can choose the density $k = k(x, y)$ to be jointly measurable.

Lemma 2.5. There exists a non-negative $\mathcal{E} \otimes \mathcal{E}$-measurable version of the density $k$.

Proof of Lemma 2.5. By Lemma 2.1 it is no restriction to assume that the kernel $K$ is substochastic. Now the functions $k_i \in (\mathcal{E} \otimes \mathcal{E})_+$ defined by
$$k_i(x, y) = 1_{\{\varphi(E^i_y) > 0\}} \frac{K(x, E^i_y)}{\varphi(E^i_y)}$$
are non-negative and $\mathcal{E} \otimes \mathcal{E}$-measurable, and by Lemma 2.4 their limit $\lim_{i \to \infty} k_i$ is a version of the density $k$. □

We note in passing the following corollary of Lemma 2.5:

Example 2.4. (b) $K$ is an integral kernel with basis $\varphi$ if (and only if) $K(x, \cdot) \ll \varphi$ for all $x \in E$.
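Lemma 2.6. There exist non-negative $\mathcal{E} \otimes \mathcal{E}$-measurable versions of the densities $k^{(n)}$ of the iterates $K^n$, which satisfy
$$k^{(m+n)}(x, z) \ge \int K^m(x, dy)\, k^{(n)}(y, z) \ge \int k^{(m)}(x, y)\, k^{(n)}(y, z)\, \varphi(dy) \quad \text{for all } m, n \ge 1,\ x, z \in E.$$

Proof of Lemma 2.6. Let $k_0^{(n)}$ be a non-negative $\mathcal{E} \otimes \mathcal{E}$-measurable version of the density of $K^n$. Define $k^{(n)} \in (\mathcal{E} \otimes \mathcal{E})_+$, $n \ge 1$, by setting $k^{(1)} = k_0^{(1)}$ and, iteratively,
$$k^{(n)}(x, z) = k_0^{(n)}(x, z) \vee \max_{1 \le m \le n-1} \int K^m(x, dy)\, k^{(n-m)}(y, z) \quad \text{for } n \ge 2.$$
It is easy to check that $k^{(n)}$, $n \ge 1$, satisfy the given requirements. □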
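The following lemma is the key to the proof of Theorem 2.1. Let $A, B \in \mathcal{E} \otimes \mathcal{E}$ be arbitrary, and let $A_1(x)$ and $B_2(z)$ denote the sections
$$A_1(x) = \{y \in E : (x, y) \in A\}, \qquad B_2(z) = \{y \in E : (y, z) \in B\}.$$
The composition $A \circ B \in \mathcal{E} \otimes \mathcal{E} \otimes \mathcal{E}$ of $A$ and $B$ is defined by
$$A \circ B = (A \times E) \cap (E \times B) = \{(x, y, z) \in E \times E \times E : (x, y) \in A,\ (y, z) \in B\}.$$
Write $\varphi^n$ for the product measure $\varphi \times \cdots \times \varphi$ ($n$ times).

Lemma 2.7. If $\varphi^3(A \circ B) > 0$, then there exist $\varphi$-positive sets $C, D \in \mathcal{E}$ such that
$$\gamma \overset{\text{def}}{=} \inf_{x \in C,\ z \in D} \varphi(A_1(x) \cap B_2(z)) > 0.$$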
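Proof of Lemma 2.7. Set $E^i_{x,y} = E^i_x \times E^i_y$. By Lemma 2.4 there are $\varphi^2$-null sets $N_1, N_2 \in \mathcal{E} \otimes \mathcal{E}$ such that
$$\lim_{i \to \infty} \frac{\varphi^2(A \cap E^i_{x,y})}{\varphi^2(E^i_{x,y})} = 1 \quad \text{for all } (x, y) \in A \setminus N_1$$
and
$$\lim_{i \to \infty} \frac{\varphi^2(B \cap E^i_{y,z})}{\varphi^2(E^i_{y,z})} = 1 \quad \text{for all } (y, z) \in B \setminus N_2.$$
Fix a triplet $(u, v, w) \in (A \setminus N_1) \circ (B \setminus N_2)$ and an integer $j$ big enough so that
$$\varphi^2(A \cap E^j_{u,v}) \ge \tfrac{3}{4}\, \varphi^2(E^j_{u,v}) \quad \text{and} \quad \varphi^2(B \cap E^j_{v,w}) \ge \tfrac{3}{4}\, \varphi^2(E^j_{v,w}). \tag{2.4}$$
Let
$$C = \{x \in E^j_u : \varphi(A_1(x) \cap E^j_v) \ge \tfrac{3}{4}\, \varphi(E^j_v)\}, \qquad D = \{z \in E^j_w : \varphi(B_2(z) \cap E^j_v) \ge \tfrac{3}{4}\, \varphi(E^j_v)\}.$$
It follows easily from (2.4) that $\varphi(C) > 0$ and $\varphi(D) > 0$. For any $x \in C$ and $z \in D$ we have
$$\varphi(A_1(x) \cap B_2(z)) \ge \tfrac{1}{2}\, \varphi(E^j_v) > 0$$
by (2.4). □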
Now we are able to prove Theorem 2.1:

Proof of Theorem 2.1. Let $\varphi$ be a probability measure which is equivalent to $\psi$. By irreducibility, we have for any $x \in E$
$$\sum_{n=1}^{\infty} k^{(n)}(x, y) > 0 \quad \text{for } \varphi\text{-a.e. } y \in E.$$
It follows that there exist integers $m_1, m_2 \ge 1$ such that
$$\iiint k^{(m_1)}(x, y)\, k^{(m_2)}(y, z)\, \varphi(dx)\varphi(dy)\varphi(dz) > 0,$$
which in turn implies that for $\delta > 0$ sufficiently small, the composition $A \circ B$ of the sets
$$A = \{k^{(m_1)} \ge \delta\}, \qquad B = \{k^{(m_2)} \ge \delta\},$$
is $\varphi^3$-positive. If $C$, $D$ and $\gamma$ are as in Lemma 2.7 then by Lemma 2.6, for all $x \in C$, $z \in D$:
$$k^{(m_1 + m_2)}(x, z) \ge \int_{A_1(x) \cap B_2(z)} k^{(m_1)}(x, y)\, k^{(m_2)}(y, z)\, \varphi(dy) \ge \gamma \delta^2.$$
It follows that $K$ satisfies the minorization condition $M(m_0, \beta, s, \nu)$ with $m_0 = m_1 + m_2$, $\beta = \gamma \delta^2$, $s = 1_C$ and $\nu = \varphi I_D$, and thus $C$ is a small set. □
The following two results will be needed in the sequel. The first is a sharpening of Theorem 2.1:

Proposition 2.6. For any set $B \in \mathcal{E}^+$ there exists a small set $C \in \mathcal{S}^+$ with $C \subseteq B$.

Proof. By Remark 2.1(iv) it suffices to prove the existence of a small function $s \in \mathcal{S}^+$ with $\{s > 0\} \subseteq B$. To this end, let $C'$ be a small set. By irreducibility, there is $m \ge 1$ such that the function $s = 1_B K^m 1_{C'}$ belongs to $\mathcal{E}^+$. By Remark 2.1(ii) $s$ is small. □

Proposition 2.7. (i) For any $f \in \mathcal{E}^+$ and any small function $s \in \mathcal{S}^+$, there exist an integer $m \ge 1$ and a constant $\gamma > 0$ such that
$$K^m f \ge \gamma s. \tag{2.5}$$
In particular, for any $f \in \mathcal{E}^+$ and any small set $C \in \mathcal{S}^+$, there exists an integer $m \ge 1$ such that $\inf_C K^m f > 0$.
(ii) For any $f \in \mathcal{E}^+$ and $s \in \mathcal{S}^+$, there exists a constant $\gamma > 0$ such that
$$Gf \ge \gamma\, Gs.$$
(iii) For any $x \in E$ and any small measure $\nu \in \mathcal{M}^+$, there exists a constant $\gamma > 0$ such that $G(x, \cdot) \ge \gamma\, \nu G$.

Proof. (i) We have
$$K^m f \ge \beta (\nu K^{m - m_0} f)\, s \quad \text{for all } m \ge m_0.$$
Choose $m \ge m_0$ such that $\gamma = \beta \nu K^{m - m_0} f > 0$. (ii) 'Multiply' both sides of (2.5) by $K^n$ and sum $n$ over $\mathbb{N}$. (iii) The proof is similar to that of (ii). □
2.4 Cyclicity

Next we shall examine the cyclic behaviour of an irreducible kernel. We assume that $K$ is an irreducible kernel satisfying the minorization condition $M(m_0, \beta, s, \nu)$.

Definition 2.4. A sequence $(E_0, E_1, \ldots, E_{m-1})$ of $m$ non-empty disjoint sets in $\mathcal{E}$ is called an $m$-cycle (for the kernel $K$), provided that for all $i = 0, \ldots, m-1$, and all $x \in E_i$:
$$K(x, E_j^c) = 0 \quad \text{for } j = i + 1 \pmod m.$$

Note that, if the sets $E_0, \ldots, E_{m-1}$ form an $m$-cycle, then their union $E_0 + \cdots + E_{m-1}$ is closed (whence by Proposition 2.5(i) also full). Also note that for all $n \ge 1$, $i = 0, \ldots, m-1$, $x \in E_i$:
$$K^n(x, E_j^c) = 0 \quad \text{for } j = i + n \pmod m;$$
in particular, $K^m(x, E_i^c) = 0$, i.e. $E_i$ is a closed set for the kernel $K^m$. It turns out that the small function $s$ cannot be strictly positive on two different sets $E_i$ and $E_j$ of a cycle.

Proposition 2.8. Let $(E_0, \ldots, E_{m-1})$ be an $m$-cycle and let $N$ be the $\psi$-null set $N = (E_0 + \cdots + E_{m-1})^c$. Then there is an index $i$, $0 \le i < m$, such that $\{s > 0\} \subseteq E_i + N$ and $\nu(E_j^c) = 0$ where $j = i + m_0 \pmod m$.

Proof. Suppose that $j$ is such that $\nu(E_j) > 0$. Then for any $x \in E$, $s(x) > 0$ implies
$$K^{m_0}(x, E_j) \ge \beta s(x) \nu(E_j) > 0.$$
This is possible only if $x \in E_{j - m_0}$ or $x \in N$. There cannot be any other index $j' \ne j$ with $\nu(E_{j'}) > 0$, since this would contradict the hypotheses $\psi(s) > 0$ and $\psi(N) = 0$. □

By Remark 2.1(ii) there is no loss of generality in assuming that $\nu(s) > 0$. Let $d$ be the greatest common divisor (abbreviated g.c.d.) of the set
$$I = \{m \ge 1 : M(m, \beta_m, s, \nu) \text{ for some } \beta_m > 0\}. \tag{2.6}$$
It is clear that $I$ is closed under addition, and hence it contains all sufficiently large multiples of $d$. The following theorem solves the problem of the existence and uniqueness of cycles:
Theorem 2.2. Suppose that $K$ is an irreducible kernel. Let $d \ge 1$, $s \in \mathcal{S}^+$ and $\nu \in \mathcal{M}^+$, $\nu(s) > 0$, be as above. Then:
(i) There is a $d$-cycle $(E_0, \ldots, E_{d-1})$. The integer $d$ does not depend on the particular choice of the small function $s$ and of the small measure $\nu$.
(ii) If $(E'_0, \ldots, E'_{d'-1})$ is another cycle, then $d'$ divides $d$, and any $E'_i$ is the union ($\psi$-a.e.) of $d/d'$ sets from the collection $\{E_0, \ldots, E_{d-1}\}$. In particular, if $(E'_0, \ldots, E'_{d-1})$ is another $d$-cycle, then $E'_i = E_j$ ($\psi$-a.e.), where $j = i + r \pmod d$, for some integer $0 \le r < d$.

Proof. For any $i = 0, \ldots, d-1$, set
$$\bar{E}_i = \Big\{ \sum_{n=1}^{\infty} K^{nd - i} s > 0 \Big\}.$$
By irreducibility, $\bar{E}_0 \cup \cdots \cup \bar{E}_{d-1} = E$. Should $\bar{E}_i \cap \bar{E}_j \in \mathcal{E}^+$ hold for some $i \ne j$, then $\{K^{nd - i} s > 0\} \cap \{K^{n'd - j} s > 0\} \in \mathcal{E}^+$ for some $n, n' \ge 1$, and by irreducibility, $\nu K^{q + nd - i} s > 0$ and $\nu K^{q + n'd - j} s > 0$ for some $q \ge 0$. But then both $q + nd - i + 2m_0$ and $q + n'd - j + 2m_0$ would belong to the set $I$, which contradicts the definition of the integer $d$. By Proposition 2.5 there is a closed (and hence full) set $F$ such that the sets $E_i = \bar{E}_i \cap F$ are disjoint. Their union $\sum_{i=0}^{d-1} E_i$ is equal to $F$. Now, if $x \in F$ is such that $K(x, E_j) > 0$ then clearly $x$ belongs to $E_i$, for $i = j - 1 \pmod d$. This proves that $(E_0, \ldots, E_{d-1})$ is a $d$-cycle. The uniqueness assertion (ii) follows from Proposition 2.8. The independence of $d$ of the choice of $s$ and $\nu$ is a direct consequence of (ii). □

In what follows we shall assume that $(E_0, \ldots, E_{d-1})$ is a fixed $d$-cycle. The sets $E_0, \ldots, E_{d-1}$ are called cyclic sets. $N$ denotes the $\psi$-null set $N = (E_0 + \cdots + E_{d-1})^c$. The integer $d \ge 1$ is called the period of the kernel $K$. If $d = 1$, $K$ is called aperiodic, otherwise periodic. By convention, in the sequel addition and equalities involving indices of the cyclic sets are always modulo $d$.

Examples 2.5. (a) Suppose that $K$ is a $\mathrm{Card}_F$-irreducible matrix. Fix any state $x_0 \in F$. Then (with obvious notation)
$$d = \text{g.c.d.}\{m \ge 1 : k^m(x_0, x_0) > 0\}$$
is the period of $K$. The cyclic sets are given by
$$E_i = \{x \in F : k^{nd - i}(x, x_0) > 0 \text{ for some } n \ge 1\}, \quad i = 0, \ldots, d-1.$$
(c) Suppose that $F$ is spread-out. Then the random walk $(Z_n)$ is aperiodic.
(d) If $F(\mathbb{R}_+) < 1$, the reflected random walk $(W_n)$ is aperiodic.
(e) Suppose that $F$ is spread-out. Then the Markov chains $(V^+_{n\delta};\ n \ge 0)$, $\delta > 0$, are aperiodic.
(f) With the assumptions of Example 2.1(f), the autoregressive process $(R_n)$ is aperiodic.
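The formulas of Example 2.5(a) can be evaluated directly for a finite irreducible matrix. The sketch below is illustrative only: the function name `period_and_cyclic_sets`, the matrix `K` and the cut-off of $3n$ steps are assumptions of this example, not part of the text. The cut-off is justified by the observation that, in an irreducible graph on $n$ states, every simple cycle can be joined to $x_0$ by two paths of length at most $n - 1$, so return times up to $3n$ already determine the g.c.d.

```python
from math import gcd


def period_and_cyclic_sets(k, x0=0):
    """Period d and cyclic sets E_0, ..., E_{d-1} of an irreducible matrix.

    d = gcd{m >= 1 : k^m(x0, x0) > 0} and
    E_i = {x : k^{nd - i}(x, x0) > 0 for some n >= 1}, as in Example 2.5(a).
    """
    n = len(k)
    # reach[x] is True when k^m(x, x0) > 0 for the current m (starting at m = 1).
    reach = [k[x][x0] > 0 for x in range(n)]
    d = 0
    hits = []  # hits[m - 1] = states x with k^m(x, x0) > 0
    for m in range(1, 3 * n + 1):
        if reach[x0]:
            d = gcd(d, m)
        hits.append([x for x in range(n) if reach[x]])
        # k^{m+1}(x, x0) > 0 iff k(x, y) > 0 and k^m(y, x0) > 0 for some y.
        reach = [any(k[x][y] > 0 and reach[y] for y in range(n))
                 for x in range(n)]
    if d == 0:
        raise ValueError("x0 is never revisited; the matrix is not irreducible")
    cyclic = [set() for _ in range(d)]
    for m, states in enumerate(hits, start=1):
        cyclic[(-m) % d].update(states)   # m = nd - i  <=>  i = -m (mod d)
    return d, cyclic


# Hypothetical example: nearest-neighbour walk on a ring of four states.
K = [[0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.5, 0.0, 0.5, 0.0]]

d, cyclic = period_and_cyclic_sets(K, x0=0)
print(d, cyclic)
```

Only the positions of the positive entries matter here, which reflects the fact that the period is a property of the communication structure of $K$.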
For later purposes we shall discuss the cyclic behaviour of the iterates $K^m$ of $K$, $m \ge 1$. By the remarks after Definition 2.4, $K^d$ 'splits' into $d$ distinct kernels with respective state spaces $(E_i, \mathcal{E} \cap E_i)$, $i = 0, \ldots, d-1$. In fact, it follows from Definition 2.4 and Theorem 2.2 that, for an arbitrary integer $m \ge 1$, $K^m$ splits into $c_m = \text{g.c.d.}\{m, d\}$ distinct kernels having state spaces
$$E_i^{(m)} = E_i + E_{i + c_m} + E_{i + 2c_m} + \cdots + E_{i + d - c_m}, \quad i = 0, 1, \ldots, c_m - 1,$$
respectively. As an immediate consequence of the definitions we have the following result:

Proposition 2.9. Let $m \ge 1$ be arbitrary and let $0 \le i \le c_m - 1$. Then the kernel $K^m$ with state space $(E_i^{(m)}, \mathcal{E} \cap E_i^{(m)})$ is $\psi I_{E_i^{(m)}}$-irreducible and has period $d/c_m$. In particular, the kernel $K^d$ with state space $E_i$ is $\psi I_{E_i}$-irreducible and aperiodic for each $i = 0, 1, \ldots, d-1$.

Proof. Obvious. □
For the rest of this section we return to the study of the class $\mathcal{S}^+$ of small functions. For $i = 0, \ldots, d-1$, let $\mathcal{S}_i^+$ denote the class of those small functions $s$ which vanish on the cyclic sets $E_j$, $j \ne i$. Recall from Proposition 2.8 that every small function belongs to one of the subclasses $\mathcal{S}_i^+$:
$$\mathcal{S}^+ = \sum_{i=0}^{d-1} \mathcal{S}_i^+. \tag{2.7}$$

Proposition 2.10. For every $i = 0, \ldots, d-1$:
(i) If $s \in \mathcal{S}_i^+$ then $Ks \in \mathcal{S}_{i-1}^+$.
(ii) If $s \in \mathcal{S}_i^+$, and $s' \in \mathcal{E}^+$ is such that $s' \le \gamma s$ for some constant $\gamma > 0$, then also $s' \in \mathcal{S}_i^+$.
(iii) If $s$ and $s' \in \mathcal{S}_i^+$ then $s + s' \in \mathcal{S}_i^+$ and $s \vee s' \in \mathcal{S}_i^+$.
(iv) If $s \in \mathcal{S}_i^+$ then $\sum_{n=0}^{m} K^{nd + q} s \in \mathcal{S}_{i-q}^+$, for all $m, q \ge 0$.

Proof. (i) and (ii) are obvious. (iv) is an immediate consequence of (i) and (iii). In order to prove (iii) take two arbitrary small functions $s, s' \in \mathcal{S}_i^+$. By Remarks 2.1(ii) and (iii) there is no loss of generality in assuming that we have $M(m_0, 1, s, \nu)$ and $M(m'_0, 1, s', \nu')$ with $\nu(s) > 0$ and $\nu'(s) > 0$ for some $m_0$, $m'_0$, $\nu$ and $\nu'$. Since the set $I$ given by (2.6) contains all sufficiently great multiples of $d$, we can find integers $m$ and $m'$ such that
$$\nu K^m s > 0, \qquad \nu' K^{m'} s > 0$$
and $m = m' + m'_0 - m_0$. It follows that
$$K^{m + 2m_0} \ge (\nu K^m s)\, s \otimes \nu$$
and
$$K^{m + 2m_0} = K^{m'_0 + m' + m_0} \ge (\nu' K^{m'} s)\, s' \otimes \nu,$$
and hence the function $\tfrac{1}{2}(\nu K^m s)\, s + \tfrac{1}{2}(\nu' K^{m'} s)\, s'$ is small. The final assertion (iii) follows easily now from (ii). □
Similar statements are valid for small measures. We leave the details to the reader.

For every $i = 0, \ldots, d-1$, denote by $\mathcal{S}_i^+$ the subclass of small sets $C \in \mathcal{S}^+$ such that $1_C \in \mathcal{S}_i^+$, i.e., $C \cap E_j = \emptyset$ for all $j \ne i$. By (2.7) every small set $C \in \mathcal{S}^+$ belongs to one of the subclasses $\mathcal{S}_i^+$. The most important result in the next proposition is part (iv) stating that the state space $E$ can be decomposed into a countable number of small sets.

Proposition 2.11. (i) Let $s \in \mathcal{S}_i^+$ be small. Every set $C \in \mathcal{E}^+$ satisfying
$$\inf_{x \in C} \sum_{n=0}^{m} K^{nd + q} s(x) > 0 \quad \text{for some } m, q \ge 0,$$
is small and belongs to the subclass $\mathcal{S}_{i-q}^+$.
(ii) Any set $C \in \mathcal{E}^+$ satisfying the following condition is small: there is a set $B \in \mathcal{E}^+$ such that for all $A \in \mathcal{E}^+$ with $A \subseteq B$,
$$\inf_{x \in C} \sum_{n=0}^{m} K^{nd + q}(x, A) > 0 \quad \text{for some } m = m(A),\ q = q(A) \ge 0.$$
(iii) The subclass $\mathcal{S}_i^+$ of small sets is closed under finite unions.
(iv) There is a countable partition $E = \sum_{m=1}^{\infty} C_m$ of $(E, \mathcal{E})$ into small sets $C_m \in \mathcal{S}^+$.

Proof. (i) and (iii) are immediate consequences of Proposition 2.10. (ii) follows from (i) and Proposition 2.6. For the proof of (iv) set
$$C_m^{(i)} = \Big\{ \sum_{n=1}^{m} K^{nd - i} s \ge m^{-1} \Big\}.$$
By (i), each $C_m^{(i)}$ is small. Using the same notation as in the proof of Theorem 2.2 we see that their union, $\lim \uparrow_{m \to \infty} C_m^{(i)}$, is equal to $\bar{E}_i$. We have $\bigcup_{i=0}^{d-1} \bar{E}_i = E$ and hence the result follows. □
In the sequel, in order to avoid unessential technicalities, we shall mostly concentrate on the aperiodic case. For convenience we write out Proposition 2.11 in this special case:
Corollary 2.1. Let $K$ be aperiodic. Then:
(i) The functions $\sum_{n=0}^{m} K^n s$, $m \ge 0$, are small. In particular any set $C \in \mathcal{E}^+$ satisfying
$$\inf_{x \in C} \sum_{n=0}^{m} K^n s(x) > 0 \quad \text{for some } m \ge 0$$
is small.
(ii) Any set $C \in \mathcal{E}^+$ satisfying the following condition is small: there is a set $B \in \mathcal{E}^+$ such that for all $A \in \mathcal{E}^+$ with $A \subseteq B$,
$$\inf_{x \in C} \sum_{n=0}^{m} K^n(x, A) > 0 \quad \text{for some } m = m(A) \ge 0.$$
(iii) The class $\mathcal{S}^+$ of small sets is closed under finite unions.
(iv) There exists an increasing sequence $C_1 \subseteq C_2 \subseteq \cdots$ of small sets $C_m \in \mathcal{S}^+$ such that $\lim \uparrow_{m \to \infty} C_m = E$. □
3 Transience and recurrence

Our aim in this chapter is to define and investigate the concepts of transience and recurrence. First we consider a general irreducible kernel $K$, showing that there exists a constant $0 \le R < \infty$ (to be called the convergence parameter of $K$), the powers $R^{-n}$ of which describe the growth of the iterates $K^n$ as $n \to \infty$. Roughly speaking, the potential kernel $G^{(r)} = \sum_0^{\infty} r^n K^n$ of the kernel $rK$ is 'finite' for $r < R$ and 'infinite' for $r > R$. If $G^{(r)}$ is 'finite' also at the critical value $r = R$, we say that the kernel $K$ is $R$-transient; otherwise $K$ is $R$-recurrent.

After these general results we will concentrate on the case where $K = P$ is the (not necessarily irreducible) transition probability of a Markov chain $(X_n)$. The main result here is Hopf's decomposition theorem stating that the state space $(E, \mathcal{E})$ can be divided into two parts, $E_d$ (called the dissipative part) and $E_c$ (the conservative part). The dissipative part $E_d$ is a countable union of sets $B_i$ such that the Markov chain $(X_n)$ is transient on each $B_i$; i.e.
$$P_x\{X_n \in B_i \text{ i.o.}\} = 0 \quad \text{for all } x \in E$$
('i.o.' means 'infinitely often'). On the conservative part $E_c$ the Markov chain is recurrent in the following sense. There is a non-trivial $\sigma$-finite measure $\varphi$ on $(E, \mathcal{E})$ such that for any $\varphi$-positive set $B \subseteq E_c$, for $\varphi$-almost all $x \in B$:
$$P_x\{X_n \in B \text{ i.o.}\} = 1. \tag{3.1}$$
When $(X_n)$ is irreducible, the concepts of recurrence and 1-recurrence coincide, and then there is even an absorbing set $H$ such that (3.1) holds for all $B \in \mathcal{E}^+$, $x \in H$. In this case we say that the Markov chain $(X_n)$ is Harris recurrent on $H$.
3.1 Some potential theory

Since it will be useful to have some potential theoretic notions available, we start with a brief excursion into the potential theory of non-negative kernels. Recall from Section 2.1 the definition of the potential kernel $G = \sum_0^{\infty} K^n$.

Definition 3.1. A non-negative function $h \in \mathcal{E}_+$, which is not identically infinite, is called superharmonic (resp. harmonic) for the kernel $K$, if
$$h \ge Kh \quad (\text{resp. } h = Kh).$$
A function $p \in \mathcal{E}_+$, $p \not\equiv \infty$, is called a potential, if there is a function $g \in \mathcal{E}_+$, called the charge of $p$, such that $p = Gg$.

Proposition 3.1. Every potential is superharmonic.

Proof. By Proposition 2.1 we have $p = g + Kp \ge Kp$. □
Proposition 3.2. Suppose that $h$ is superharmonic. Then:
(i) The set $\{h < \infty\}$ is closed.
(ii) Either $h > 0$ everywhere or the set $\{h = 0\}$ is closed.
(iii) If $K$ is irreducible and $h \in \mathcal{E}^+$, then in fact $h > 0$ everywhere.
(iv) If $K$ is substochastic and $0 \le h \le 1$ is harmonic, then either $h < 1$ everywhere or the set $\{h = 1\}$ is absorbing.

Proof. (i) When $x \in \{h < \infty\}$ we have
$$\infty > h(x) \ge Kh(x) \ge \infty \cdot K(x, \{h = \infty\}).$$
Therefore $K(x, \{h = \infty\}) = 0$. (ii) and (iv): The proofs are similar to that of (i). (iii) This is a direct consequence of (ii) and Proposition 2.5(i). □

The fundamental result in potential theory is the Riesz decomposition theorem:

Theorem 3.1. (i) If $h \in \mathcal{E}_+$, $h \not\equiv \infty$, is the sum of a potential $p = Gg$ and a harmonic function $h^{\infty}$,
$$h = p + h^{\infty}, \tag{3.2}$$
then $h$ is superharmonic, and on the closed set $F = \{h < \infty\}$ we have
$$g = h - Kh \tag{3.3}$$
and
$$h^{\infty} = \lim \downarrow K^n h. \tag{3.4}$$
(ii) Conversely, if $h$ is superharmonic, then on $F = \{h < \infty\}$, $h$ is the sum of a potential $p = Gg$ and a harmonic function $h^{\infty}$, where $g$ is given by (3.3) and $h^{\infty}$ by (3.4).
(iii) If $h \in \mathcal{E}_+$ is superharmonic, and $g \in \mathcal{E}_+$ is such that $h \ge g + Kh$, then $h \ge Gg$.

Proof. (i) and (ii): If (3.2) holds, then by Proposition 2.1, and since $h^{\infty}$ is harmonic,
$$h = g + Kp + Kh^{\infty} = g + Kh.$$
Thus $h$ is superharmonic and (3.3) holds on $\{h < \infty\}$. Suppose now that $h$ is superharmonic, and let $g \in \mathcal{E}_+$ be such that $h = g + Kh$. By iterating this equation we obtain
$$h = \sum_{m=0}^{n-1} K^m g + K^n h \quad \text{for all } n \ge 1.$$
As $n \to \infty$, the first term on the right hand side increases to $Gg = p$; the second decreases to a function $h^{\infty}$, which is harmonic on $\{h < \infty\}$ by the monotone convergence theorem. (iii) The proof follows again by iterating. □

The pair $(p, h^{\infty})$ appearing in Theorem 3.1 is called the Riesz decomposition of the superharmonic function $h$ (on $F = \{h < \infty\}$). We call part (iii) the balayage theorem. The following simple result will be needed in the sequel:
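Proposition 3.3. Suppose that $h$ is a finite superharmonic function. Let $h^{\infty} = \lim \downarrow K^n h$. Then either $h = p = Gg$ is a potential (and $h^{\infty} \equiv 0$), or $\sup_E (h^{\infty}/h) = 1$.

Proof. In any case $0 \le h^{\infty} \le h$. Suppose that $\rho = \sup_E (h^{\infty}/h) < 1$. Then
$$h^{\infty} = K^n h^{\infty} \le \rho K^n h \downarrow \rho h^{\infty} \quad \text{as } n \to \infty,$$
which is possible only if $h^{\infty} \equiv 0$. □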
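The Riesz decomposition of Theorem 3.1 can be traced numerically on a finite state space by iterating $h = \sum_{m<n} K^m g + K^n h$. The following is a minimal sketch under stated assumptions: the transition matrix `K`, the superharmonic function `h` and the function names are hypothetical choices for illustration, and the limits are approximated by a fixed number of iterations.

```python
def apply_kernel(k, f):
    """Return the function Kf, i.e. (Kf)(x) = sum_y k(x, y) f(y)."""
    return [sum(kxy * fy for kxy, fy in zip(row, f)) for row in k]


def riesz_decomposition(k, h, n_iter=200):
    """Approximate (g, p, h_inf) with g = h - Kh, p = Gg and h_inf = lim K^n h.

    After n_iter steps, p holds the partial sum of K^m g over m < n_iter and
    h_inf holds K^{n_iter} h, so that h = p + h_inf up to rounding.
    """
    g = [hx - khx for hx, khx in zip(h, apply_kernel(k, h))]
    if min(g) < -1e-12:
        raise ValueError("h is not superharmonic for this kernel")
    p = [0.0] * len(h)
    kn_g = list(g)        # current term K^n g
    kn_h = list(h)        # current iterate K^n h
    for _ in range(n_iter):
        p = [a + b for a, b in zip(p, kn_g)]
        kn_g = apply_kernel(k, kn_g)
        kn_h = apply_kernel(k, kn_h)
    return g, p, kn_h


# Hypothetical chain on {0, 1, 2} with absorbing state 2.
K = [[0.5, 0.25, 0.25],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
h = [2.0, 1.5, 1.0]       # satisfies h >= Kh

g, p, h_inf = riesz_decomposition(K, h)
print(g, p, h_inf)
```

In this example the charge vanishes at the absorbing state, the potential part decays to zero there, and the harmonic part is the constant function obtained from the value of $h$ at the absorbing state.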
3.2 R-transience and R-recurrence

Suppose that the kernel $K$ is irreducible. In this section we shall define and study concepts related to the 'rate of growth' of the iterates $K^n$ of $K$. We adopt the notation used in Chapter 2 for irreducible $K$; in particular, $\psi$ denotes a maximal irreducibility measure, $\mathcal{E}^+ = \{f \in \mathcal{E}_+ : \psi(f) > 0\}$ and $\mathcal{S}^+ = \{s \in \mathcal{E}^+ : s \text{ small}\}$. Recall from Proposition 2.11(iv) that there is a countable partition $E = \sum_{m=1}^{\infty} C_m$ of $(E, \mathcal{E})$ into small sets $C_m \in \mathcal{S}^+$. Recall also that every closed set $F$ is full; this means $\psi(F^c) = 0$.

For any $0 \le r < \infty$, denote by $G^{(r)}$ the potential kernel of the kernel $rK$, i.e. $G^{(r)} = \sum_0^{\infty} r^n K^n$.

Definition 3.2. A real number $0 \le R < \infty$ is called the convergence parameter of the kernel $K$, provided that there exists a closed set $F$ such that
(i) $G^{(r)} s < \infty$ on $F$, for all $0 \le r < R$, all $s \in \mathcal{S}^+$,
and
(ii) $G^{(r)} f \equiv \infty$ for all $r > R$, all $f \in \mathcal{E}^+$.
The kernel $K$ is called $R$-transient (resp. $R$-recurrent), if (i) (resp. (ii)) holds when $r = R$.

It is not immediately clear that there should exist any convergence parameter, or that $K$ should be either $R$-transient or $R$-recurrent. However, we have the following:
Theorem 3.2. Suppose that the kernel $K$ is irreducible. Then: (i) There exists a convergence parameter $0 \le R < \infty$. (ii) The kernel $K$ is either $R$-transient or $R$-recurrent.

Proof. Let us fix a small function $s \in \mathcal{S}^+$. Set
$$R = \sup\{r \ge 0 : G^{(r)} s(x) < \infty \text{ for some } x \in E\}, \qquad F^{(r)} = \{G^{(r)} s < \infty\}, \quad 0 \le r < \infty.$$
Clearly, for all $0 \le r < R$, the function $G^{(r)} s$ is a potential, whence a superharmonic function, for the kernel $rK$. By Proposition 3.2(i) the sets $F^{(r)}$, $0 \le r < R$, are closed, and hence, by Proposition 2.5(iii), so is their intersection
$$F = \bigcap_{0 \le r < R} F^{(r)} = \lim \downarrow_{r_m \uparrow R} F^{(r_m)}.$$
(In the special case when $R = 0$, we set $F = F^{(0)} = E$.) Let $s' \in \mathcal{S}^+$ be another small function. It follows from Proposition 2.7(ii) that $G^{(r)} s' < \infty$ on $F$, for all $0 \le r < R$. By the same proposition we also see that $G^{(r)} f \equiv \infty$ for all $r > R$, $f \in \mathcal{E}^+$. Similarly we find that $K$ is either $R$-transient or $R$-recurrent.

It remains to prove that the convergence parameter $R$ is finite. To this end, suppose that the minorization condition $M(m_0, \beta, s, \nu)$ holds with $\nu(s) > 0$ (cf. Remark 2.1(ii)). Then
$$K^{n m_0} s \ge \beta^n (s \otimes \nu)^n s = (\beta \nu(s))^n s.$$
It follows that $G^{(r)} s \ge \sum_{n=0}^{\infty} r^{n m_0} K^{n m_0} s = \infty$ on the $\psi$-positive set $\{s > 0\}$, for all $r \ge (\beta \nu(s))^{-1/m_0}$. According to what we have proved before this implies that $R \le (\beta \nu(s))^{-1/m_0}$. □

In what follows, if we say that $K$ is $R$-transient (or $R$-recurrent) then, by convention, this implicitly states that the convergence parameter of $K$ is $R$.
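For a finite irreducible non-negative matrix the convergence parameter is the reciprocal of the spectral radius, a standard fact that is used here as an assumption of the illustration rather than taken from the text. The sketch below estimates $R$ by power iteration for a hypothetical aperiodic $2 \times 2$ matrix, compares it with the upper bound $(\beta \nu(s))^{-1/m_0}$ from the proof of Theorem 3.2 (using the singleton small set of Example 2.3(a)), and contrasts partial sums of $G^{(r)} s$ below and above $R$. All function and variable names are illustrative.

```python
def apply_kernel(k, f):
    """Return the function Kf, i.e. (Kf)(x) = sum_y k(x, y) f(y)."""
    return [sum(kxy * fy for kxy, fy in zip(row, f)) for row in k]


def convergence_parameter(k, n_iter=500):
    """Estimate R = 1 / (spectral radius of K) by power iteration.

    Assumes K is a finite, irreducible, aperiodic non-negative matrix,
    so that the normalised iterates of the constant function converge.
    """
    f = [1.0] * len(k)
    growth = 1.0
    for _ in range(n_iter):
        kf = apply_kernel(k, f)
        growth = max(kf)
        f = [v / growth for v in kf]
    return 1.0 / growth


def potential_partial_sum(k, s, r, n_terms):
    """Partial sum of r^n K^n s over 0 <= n < n_terms (approximates G^(r) s)."""
    total = [0.0] * len(s)
    term = list(s)
    for _ in range(n_terms):
        total = [t + u for t, u in zip(total, term)]
        term = [r * v for v in apply_kernel(k, term)]
    return total


# Hypothetical non-negative matrix with eigenvalues 2 and -1.
K = [[0.0, 2.0],
     [1.0, 1.0]]
R = convergence_parameter(K)

# Minorization M(1, 1, 1_{x0}, K(x0, .)) with x0 = 1 gives R <= 1 / nu(s).
s = [0.0, 1.0]
nu = K[1]
nu_s = sum(a * b for a, b in zip(nu, s))
bound = (1.0 * nu_s) ** (-1.0 / 1)

below = potential_partial_sum(K, s, 0.8 * R, 400)        # r < R
below_more = potential_partial_sum(K, s, 0.8 * R, 800)   # more terms, same r
above = potential_partial_sum(K, s, 1.2 * R, 400)        # r > R
print(R, bound, below, above)
```

The partial sums stabilise for $r < R$ and grow without bound for $r > R$, which is the dichotomy in Definition 3.2 restricted to this finite example.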
Examples 3.1. (a) Let $K$ be a $\mathrm{Card}_F$-irreducible matrix, $\mathrm{Card}_F$ being a maximal irreducibility measure (cf. Example 2.2(a)). Let
$$g^{(r)}(x, y) = \sum_{n=0}^{\infty} r^n k^n(x, y), \quad 0 \le r < \infty.$$
We have
$$g^{(r)}(x, y) < \infty \quad \text{for all } 0 \le r < R,\ x \in F,\ y \in E,$$
and
$$g^{(r)}(x, y) = \infty \quad \text{for all } r > R,\ x \in E,\ y \in F;$$
for $r = R$, the former or latter statement holds true, depending on whether $K$ is $R$-transient or $R$-recurrent.

(b) Let $K$ be an irreducible integral kernel with basis $\varphi$ and density $k$, and with maximal irreducibility measure $\psi = \varphi I_{B'}$ (cf. Example 2.2(b)). There exists a closed set $F \subseteq B'$, $\varphi(B' \setminus F) = 0$, such that
$$g^{(r)}(x, y) \overset{\text{def}}{=} \sum_{n=1}^{\infty} r^n k^{(n)}(x, y) < \infty \quad \text{for all } 0 \le r < R,\ x \in F,\ \varphi\text{-almost all } y \in E,$$
and
$$g^{(r)}(x, y) = \infty \quad \text{for all } r > R,\ x \in E,\ \varphi\text{-almost all } y \in F;$$
for $r = R$, the former or latter statement holds true, depending on whether $K$ is $R$-transient or $R$-recurrent.

We introduce a new example:

(h) Let $(Z_n;\ n \ge 0)$ be a multitype branching process with general type space $(E, \mathcal{E})$; i.e., we assume that, for each $n \ge 0$, an individual in the $n$th generation of the type $x$ produces a random number of children with random types independently of the earlier generations $0, 1, \ldots, n-1$ and of the other individuals in the $n$th generation. If we denote by $Z_n(A)$ the number of individuals belonging to the set $A$ ($\in \mathcal{E}$) in the $n$th generation, and set
$$M(x, A) = \mathbb{E}[Z_1(A) \mid Z_0 = \varepsilon_x] \qquad (\varepsilon_x(B) \overset{\text{def}}{=} 1_B(x),\ x \in E,\ B \in \mathcal{E}),$$
it follows that the $n$th iterate $M^n$ of the kernel $M$ has the interpretation
$$M^n(x, A) = \mathbb{E}[Z_n(A) \mid Z_0 = \varepsilon_x].$$
The convergence parameter $R$ of the kernel $M$ is called the Malthusian parameter of the branching process $(Z_n;\ n \ge 0)$. (See e.g. Harris, 1963.)

Remarks 3.1. (i) A sufficient (and necessary) condition for $R > 0$ is: For some $x_0 \in E$, $f_0 \in \mathcal{E}^+$, $M_0 < \infty$, $\gamma_0 < \infty$, we have
$$K^n f_0(x_0) \le M_0 \gamma_0^n \quad \text{for all } n \ge 0.$$
Then in fact, $R \ge \gamma_0^{-1}$.
(ii) If $K = P$ is a transition probability then $1 \le R < \infty$. As a corollary of the above theorem we have in this case: There is a closed set $F$ such that either $Gs < \infty$ on $F$ for all $s \in \mathcal{S}^+$, or $Gf \equiv \infty$ for all $f \in \mathcal{E}^+$. Later in Corollary 3.1 we will see that in the former case even $\sup_E Gs < \infty$, for all $s \in \mathcal{S}^+$.

Later we will need the following:

Proposition 3.4. For any small measure $\nu \in \mathcal{M}^+$ and small function $s \in \mathcal{S}^+$:
(i) $\nu G^{(r)} s < \infty$ for $0 \le r < R$ and $\nu G^{(r)} s = \infty$ for $r > R$.
(ii) $\nu G^{(R)} s = \infty$ if and only if $K$ is $R$-recurrent.

Proof. Use Proposition 2.7(iii) and Theorem 3.2. □
For later purposes we consider next the '$R$-properties' of the iterated kernels $K^m$, $m \ge 2$. Recall from Proposition 2.9 that there are in fact $c_m$ ($= \text{g.c.d.}\{m, d\}$) distinct kernels $K^m$ with respective state spaces $E_i^{(m)}$, $0 \le i < c_m$, all of which are irreducible with period $d/c_m$. Define $d_m = md/c_m$.

Proposition 3.5. If the kernel $K$ is $R$-transient (resp. $R$-recurrent) then all the $c_m$ kernels $K^m$ are $R^m$-transient (resp. $R^m$-recurrent).

Proof. There is no loss of generality in considering $K^m$ on the state space $E_0^{(m)} = E_0 + E_{c_m} + \cdots + E_{d - c_m}$. We adopt the notation used in the proof of Theorem 2.2. In particular, $m_0$, $s$ and $\nu$ denote an integer, a small function and a small measure, respectively, such that $M(m_0, 1, s, \nu)$ and $\nu(s) > 0$ hold. Note that by the construction made in the proof of Theorem 2.2 and by Proposition 2.8, the set $E_0 + N$ supports both $s$ and $\nu$. Write $R^{(m)}$ for the convergence parameter of $K^m$. Clearly $r < R$ implies that $r^m \le R^{(m)}$. Hence $R^{(m)} \ge R^m$.

Suppose now that $r < (R^{(m)})^{1/m}$. Then the series $\sum_{n=0}^{\infty} r^{nm} K^{mn} s$ converges $\psi$-a.e. on $E_0$. Let $n_0 \ge 1$ be an integer and $\gamma > 0$ a constant such that
$$r^{n_0 d_m - jd} K^{n_0 d_m - jd} s \ge \gamma s \quad \text{for all } j = 0, 1, \ldots, \frac{m}{c_m} - 1.$$
(This is possible since the set $I$ given by (2.6) contains all sufficiently great multiples of $d$.) It follows that, $\psi$-a.e. on $E_0$:
$$\infty > \frac{m}{c_m} \sum_{n=0}^{\infty} r^{(n + n_0) d_m} K^{(n + n_0) d_m} s \ge \gamma \sum_{n=0}^{\infty} \sum_{j=0}^{m/c_m - 1} r^{n d_m + jd} K^{n d_m + jd} s = \gamma \sum_{n=0}^{\infty} r^{nd} K^{nd} s.$$
Since, by Proposition 2.10(i), $K^{nd + i} s$ vanishes on $E_0$ for all $n \ge 0$, $i = 1, \ldots, d-1$, we see that $G^{(r)} s$ is finite $\psi$-a.e. on $E_0$. Therefore $r \le R$. This proves the converse inequality $R^{(m)} \le R^m$.

Setting $r = R$ in the above proof we also see that $K^m$ is $R^m$-transient if and only if $K$ is $R$-transient. □
3.3 Stopping times for Markov chains

Throughout Sections 3.3-3.6 we assume that $K = P$ is the (not necessarily irreducible) transition probability of a Markov chain $(X_n;\ n \ge 0)$.

In Sections 3.4-3.6 we will need the concept of a stopping time and the associated notion of strong Markov property. For later purposes (see Section 4.4) we introduce in this context also the concept of a randomized stopping time.

Let $(\mathcal{F}_n)$ be a history. An $\bar{\mathbb{N}}$-valued random variable $T$ is called a random time. If $\{T = n\} \in \mathcal{F}_n$ for all $n \ge 0$, then the random time $T$ is called a stopping time relative to the history $(\mathcal{F}_n)$. If in addition, $(X_n)$ is Markov w.r.t. $(\mathcal{F}_n)$, then $T$ is called a stopping time for the Markov chain $(X_n, \mathcal{F}_n)$. If $T$ is a stopping time relative to the internal history $(\mathcal{F}_n^X)$ then it is called simply a stopping time for the Markov chain $(X_n)$. This means that, for all $n \ge 0$,
$$\{T = n\} = \{(X_0, \ldots, X_n) \in A_n\}$$
for some set $A_n \in \mathcal{E}^{\otimes (n+1)}$.

An important example of a stopping time for the Markov chain $(X_n)$ is the hitting time $T_B$ of a set $B \in \mathcal{E}$, defined by
$$T_B = \inf\{n \ge 0 : X_n \in B\}.$$

A random time $T$ is called a randomized stopping time (for the Markov chain $(X_n)$), if, for every $n \ge 0$, the event $\{T = n\}$ and the post-$n$-chain $(X_{n+1}, X_{n+2}, \ldots)$ are conditionally independent, given the pre-$n$-chain $(X_0, \ldots, X_n)$; i.e.
$$P\{T = n \mid \mathcal{F}^X\} = f_n(X_0, \ldots, X_n),$$
for some $\mathcal{E}^{\otimes (n+1)}$-measurable function $f_n$. We set
$$P\{T = \infty \mid \mathcal{F}^X\} = f_{\infty}(X_0, X_1, \ldots) = 1 - \sum_{0}^{\infty} f_n(X_0, \ldots, X_n).$$
Clearly a random time $T$ is a randomized stopping time, if and only if for every $n \ge 0$,
$$P\{T = n \mid \mathcal{F}^X;\ T \ge n\} = P\{T = n \mid \mathcal{F}_n^X;\ T \ge n\} = r_n(X_0, \ldots, X_n)$$
for some $\mathcal{E}^{\otimes (n+1)}$-measurable function $r_n$. The functions $f_n$ and $r_n$ are related to each other through the formulas
$$r_n = \Big(1 - \sum_{m=0}^{n-1} f_m\Big)^{-1} f_n, \qquad f_n = (1 - r_0) \cdots (1 - r_{n-1})\, r_n, \qquad f_{\infty} = (1 - r_0)(1 - r_1) \cdots.$$
By convention, we regard two randomized stopping times $T$ and $T'$ as equal, and write
$$T \overset{\mathcal{L}}{=} T',$$
provided that their conditional distributions, given $\mathcal{F}^X$, coincide.

As an example of a randomized stopping time take any $0 \le g \le 1$, and set $r_n(x_0, \ldots, x_n) = g(x_n)$, $n \ge 0$. The resulting randomized stopping time, denoted by $T_g$, can be regarded as a geometric random variable where the probability of success at epoch $n$ depends on the current state $X_n$ of the Markov chain through the function $g$. Note that, if $g = 1_B$ is the indicator of a set $B \in \mathcal{E}$, we have $T_g = T_{1_B} \overset{\mathcal{L}}{=} T_B$.

A stopping time for the Markov chain $(X_n)$ is clearly a randomized stopping time. Conversely, a randomized stopping time for $(X_n)$ is a stopping time for $(X_n, \mathcal{F}_n)$, for a conveniently chosen history $(\mathcal{F}_n)$. In fact the following holds:
Proposition 3.6. (i) Suppose that $(X_n)$ is Markov w.r.t. a history $(\mathcal{F}_n)$. If $T$ is a stopping time for the Markov chain $(X_n, \mathcal{F}_n)$, then it is also a randomized stopping time for $(X_n)$.
(ii) Conversely, if $T$ is a randomized stopping time for $(X_n)$, then there is a history $(\mathcal{F}_n)$ such that $(X_n)$ is Markov w.r.t. $(\mathcal{F}_n)$ and $T$ is a stopping time for the Markov chain $(X_n, \mathcal{F}_n)$. One can choose $\mathcal{F}_n = \mathcal{F}_n^X \vee \mathcal{F}_n^T$, where
$$\mathcal{F}_n^T \overset{\text{def}}{=} \sigma(T = m;\ 0 \le m \le n), \quad n \ge 0.$$
Proof. Let $\zeta$ be an arbitrary non-negative functional.
(i) Let $T$ be a stopping time for the Markov chain $(X_n, \mathcal{F}_n)$. By the hypotheses and Theorem 1.1,
$$E[\zeta \circ \theta_n;\ T = n \mid \mathcal{F}_n^X] = E\big[E[\zeta \circ \theta_n \mid \mathcal{F}_n];\ T = n \mid \mathcal{F}_n^X\big] = E_{X_n}[\zeta]\, P\{T = n \mid \mathcal{F}_n^X\},$$
which proves the desired conditional independence.
(ii) Let $T$ be a randomized stopping time. We have for any $0 \le m \le n$:
$$E[\zeta \circ \theta_n;\ T = m \mid \mathcal{F}_n^X] = E\big[\zeta \circ \theta_n\, P\{T = m \mid \mathcal{F}^X\} \mid \mathcal{F}_n^X\big] = E_{X_n}[\zeta]\, P\{T = m \mid \mathcal{F}_n^X\},$$
so that
$$E[\zeta \circ \theta_n \mid \mathcal{F}_n^X \vee \mathcal{F}_n^T] = E_{X_n}[\zeta].$$
In other words, $(X_n)$ is Markov w.r.t. the history $(\mathcal{F}_n) = (\mathcal{F}_n^X \vee \mathcal{F}_n^T)$. Trivially, $T$ is a stopping time relative to $(\mathcal{F}_n)$. □

Theorem 1.1 expresses the Markov property of $(X_n)$ at the deterministic time epochs $n$. According to the following theorem $(X_n)$ is strong Markov, that is, it has the Markov property also at stopping times (or, what by the preceding proposition is equivalent, at randomized stopping times).

Suppose that $T$ is a stopping time relative to a history $(\mathcal{F}_n)$. We can associate with $T$ the $\sigma$-algebra $\mathcal{F}_T$ defined by
$$\mathcal{F}_T = \Big\{A \in \bigvee_{n=0}^{\infty} \mathcal{F}_n :\ A \cap \{T = n\} \in \mathcal{F}_n \text{ for all } n \ge 0\Big\},$$
the $\sigma$-algebra of events which 'happen before $T$'.

Theorem 3.3. Suppose that $(X_n)$ is Markov w.r.t. a history $(\mathcal{F}_n)$, and let $T$ be an arbitrary randomized stopping time. Then for any non-negative functional $\zeta$,
$$E[\zeta \circ \theta_n \mid \mathcal{F}_n;\ T = m] = E_{X_n}[\zeta] \quad \text{for all } 0 \le m \le n.$$
In particular, if $T$ is a stopping time for the Markov chain $(X_n, \mathcal{F}_n)$ then the following strong Markov property holds:
$$E[\zeta \circ \theta_T \mid \mathcal{F}_T] = E_{X_T}[\zeta] \quad \text{on } \{T < \infty\}.$$

Proof. It follows from Proposition 3.6 that the Markov chain $(X_n, \mathcal{F}_n)$ is Markov also w.r.t. the history $(\mathcal{F}_n^X \vee \mathcal{F}_n^T)$. Consequently, it is Markov w.r.t. $(\mathcal{F}_n \vee \mathcal{F}_n^X \vee \mathcal{F}_n^T) = (\mathcal{F}_n \vee \mathcal{F}_n^T)$. The first assertion then reduces to the ordinary Markov property w.r.t. the history $(\mathcal{F}_n \vee \mathcal{F}_n^T)$. The strong Markov property now follows from the formula
$$E[\zeta \circ \theta_T \mid \mathcal{F}_T] = E[\zeta \circ \theta_n \mid \mathcal{F}_n] \quad \text{on } \{T = n\},$$
and from Theorem 1.1. □
3.4 Hitting and exit times

Let $B \in \mathcal{E}$ be arbitrary. We denote by $T_B$ (resp. $S_B$) the first hitting time of the set $B$, including time 0 (resp. not including time 0):
$$T_B = \inf\{n \ge 0 : X_n \in B\},$$
$$S_B = \inf\{n \ge 1 : X_n \in B\} = 1 + T_B \circ \theta_1.$$
(By convention, $\inf \emptyset = \infty$.) The iterates $T_B(i)$, $i \ge 0$, are defined by $T_B(0) = T_B$, and
$$T_B(i) = \inf\{n > T_B(i-1) : X_n \in B\} = m + S_B \circ \theta_m \quad \text{on } \{T_B(i-1) = m\},\ i \ge 1.$$
Similarly, the iterates $S_B(i)$, $i \ge 1$, are defined by $S_B(1) = S_B$, and
$$S_B(i) = \inf\{n > S_B(i-1) : X_n \in B\} = m + S_B \circ \theta_m \quad \text{on } \{S_B(i-1) = m\},\ i \ge 2.$$
So, for example, $T_B(i)$ is the epoch of the $(i+1)$th visit to the set $B$. Clearly all the random times $T_B(i)$ and $S_B(i)$ are stopping times for the Markov chain $(X_n)$.

Let us denote by $I_B$ the multiplication by $1_B$, i.e., $I_B$ is the kernel
$$I_B(x, A) = 1_{A \cap B}(x), \quad x \in E,\ A \in \mathcal{E}.$$
Note that $I_E = I$, the identity kernel. Let us define the kernels $G_B$, $G'_B$ and $U_B$ by
$$G_B(x, A) = E_x \sum_{n=0}^{T_B} 1_A(X_n) = \sum_{n=0}^{\infty} P_x\{X_n \in A,\ T_B \ge n\} = \sum_{n=0}^{\infty} (I_{B^c} P)^n(x, A), \tag{3.5}$$
$$G'_B(x, A) = E_x \sum_{n=0}^{S_B - 1} 1_A(X_n) = \sum_{n=0}^{\infty} P_x\{X_n \in A,\ S_B > n\} = \sum_{n=0}^{\infty} (P I_{B^c})^n(x, A), \tag{3.6}$$
$$U_B(x, A) = E_x \sum_{n=1}^{S_B} 1_A(X_n) = \sum_{n=1}^{\infty} P_x\{X_n \in A,\ S_B \ge n\} = \sum_{n=0}^{\infty} P (I_{B^c} P)^n(x, A), \quad x \in E,\ A \in \mathcal{E}. \tag{3.7}$$
Note that $G_B$ (resp. $G'_B$) is the potential kernel of the kernel $I_{B^c} P$ (resp. $P I_{B^c}$), and that
$$U_B = P G_B = G'_B P, \tag{3.8}$$
$$G_B = I + I_{B^c} U_B, \tag{3.9}$$
$$G'_B = I + U_B I_{B^c}. \tag{3.10}$$
Also note the interpretations
$$G_B I_B(x, A) = P_x\{X_{T_B} \in A,\ T_B < \infty\}, \tag{3.11}$$
$$U_B I_B(x, A) = P_x\{X_{S_B} \in A,\ S_B < \infty\}, \tag{3.12}$$
and the special cases $G_\emptyset = G'_\emptyset = G$, $G_E = G'_E = I$, $U_E = P$.
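On a finite state space the kernels $G_B$, $G'_B$ and $U_B$ become matrices, and the identities (3.8)-(3.10) can be verified exactly. The sketch below is an illustration only: the three-state transition matrix `P`, the set `B` and all helper names are hypothetical, and the potential kernels are computed as matrix inverses, which agrees with the series (3.5) and (3.6) in this example because $B$ is reached from every state.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def diag(n, S):
    """Matrix of I_S, the multiplication by the indicator of the set S."""
    return [[F(int(i == j and i in S)) for j in range(n)] for i in range(n)]

def inverse(M):
    """Exact Gauss-Jordan inverse of a non-singular square matrix."""
    n = len(M)
    A = [list(row) + [F(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for c in range(n):
        p = next(r for r in range(c, n) if A[r][c] != 0)
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(n):
            if r != c and A[r][c] != 0:
                fac = A[r][c]
                A[r] = [vr - fac * vc for vr, vc in zip(A[r], A[c])]
    return [row[n:] for row in A]

# Hypothetical example: a stochastic matrix on {0, 1, 2} and B = {2}.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 2), F(1, 2)]]
n, B = 3, {2}
I = diag(n, set(range(n)))
IBc = diag(n, set(range(n)) - B)

G_B = inverse(sub(I, matmul(IBc, P)))    # potential kernel of I_{B^c} P, cf. (3.5)
Gp_B = inverse(sub(I, matmul(P, IBc)))   # potential kernel of P I_{B^c}, cf. (3.6)
U_B = matmul(P, G_B)                     # cf. (3.8)
h_B = [sum(row[y] for y in B) for row in G_B]   # G_B(x, B) = P_x{T_B < infinity}
```

Exact rational arithmetic is used so that the identities hold with equality rather than up to rounding.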
Proposition 3.7. (i) If $A \subseteq B$, then
$$G_A = G_B I_{B^c} + G_B I_B G_A, \tag{3.13}$$
$$U_A = U_B + U_B I_{B \setminus A} U_A. \tag{3.14}$$
In particular,
$$G = G_B I_{B^c} + G_B I_B G, \tag{3.15}$$
and for any $f \in \mathcal{E}_+$,
$$\sup_E Gf = \sup_{\{f > 0\}} Gf. \tag{3.16}$$
(ii) If $A \subseteq B$, then
$$G'_A = I_{B^c} G'_B + G'_A I_B G'_B, \tag{3.17}$$
$$U_A = U_B + U_A I_{B \setminus A} U_B. \tag{3.18}$$

Proof. (i) In order to prove (3.13) use (3.11) and the strong Markov property at $T_B$. Formula (3.14) then follows by 'multiplying' both sides of (3.13) from the left by $P$, and using (3.8) and (3.9). Formula (3.15) is obtained by setting $A = \emptyset$ in (3.13). To get (3.16) use (3.15) with $B = \{f > 0\}$.
(ii) In order to prove (3.17), note first that, for $A \subseteq B$, and $n \ge 0$:
$$\{S_A > n\} = \{T_B > n\} + \sum_{m=0}^{n} \{S_A > m,\ L_B(n) = m\},$$
where
$$L_B(n) = \max\{m : 0 \le m \le n,\ X_m \in B\}.$$
Consequently, for any $f \in \mathcal{E}_+$,
$$G'_A f(x) = \sum_{n=0}^{\infty} E_x[f(X_n);\ S_A > n] = \sum_{n=0}^{\infty} E_x[f(X_n);\ T_B > n] + \sum_{m=0}^{\infty} \sum_{n=m}^{\infty} E_x[f(X_n);\ S_A > m,\ L_B(n) = m].$$
The first term on the right hand side equals $I_{B^c} G'_B f(x)$. The second is equal to
$$\sum_{m=0}^{\infty} E_x\Big[1_B(X_m)\, E_{X_m}\Big[\sum_{n=0}^{S_B - 1} f(X_n)\Big];\ S_A > m\Big] = G'_A I_B G'_B f(x),$$
which is just what we wanted. Formula (3.18) follows after 'multiplying' (3.17) from the right by $P$. □
The equations (3.14) and (3.18) are called the resolvent equations while (3.16) is usually referred to as the maximum principle.

We denote by $L_B$ the last exit time from the set $B$:
$$L_B = \sup\{n \ge 0 : X_n \in B\}.$$
Of course $L_B$ is defined only on the set $\{T_B < \infty\} = \{X_n \in B \text{ for some } n \ge 0\}$. Obviously
$$\{L_B = n\} = \{X_n \in B,\ S_B \circ \theta_n = \infty\}, \quad n \ge 0,$$
$$\{L_B = \infty\} = \{X_n \in B \text{ i.o.}\}.$$
Note the special case
$$L_E = \sup\{n \ge 0 : X_n \in E\} = \inf\{n \ge 0 : X_{n+1} = \Delta\};$$
$L_E$ is called the lifetime of $(X_n)$. Let us define
$$h_B(x) = P_x\{T_B < \infty\} = G_B(x, B) \quad \text{(cf. (3.11))},$$
$$p_B(x) = P_x\{0 \le L_B < \infty\},$$
$$g_B(x) = P_x\{L_B = 0\} = 1_B(x)\, P_x\{S_B = \infty\} = I_B(1 - U_B 1_B)(x) \quad \text{(cf. (3.12))},$$
$$h_B^\infty(x) = P_x\{X_n \in B \text{ i.o.}\} = \lim_{n \to \infty} (U_B I_B)^n 1(x). \tag{3.19}$$
Theorem 3.4. (i) The function $h_B$ is superharmonic, $p_B$ is the potential with charge $g_B$, $h_B^\infty$ is harmonic, and $(p_B, h_B^\infty)$ is the Riesz decomposition of $h_B$. Moreover, $h_B$ is the smallest superharmonic function $h$ satisfying $h \ge 1_B$.
(ii) The function $h_B^\infty$ is also harmonic for the transition probability $U_B I_B(x, dy) = P_x\{X_{S_B} \in dy,\ S_B < \infty\}$. It is the harmonic part in the Riesz decomposition of the superharmonic function 1 (for the transition probability $U_B I_B$).

Proof. (i) By the definitions of $L_B$ and $g_B$, and by the Markov property,
$$P_x\{L_B = n\} = P_x\{L_B \circ \theta_n = 0\} = P^n g_B(x) \quad \text{for all } n \ge 0.$$
Summing over $n$ gives
$$p_B = G g_B,$$
i.e. $p_B$ is the potential with charge $g_B$. For the function $h_B$ we get
$$h_B(x) = P_x\{T_B < \infty\} = P_x\{L_B = 0\} + P_x\{S_B < \infty\} = g_B(x) + P h_B(x),$$
and
$$\lim_{n \to \infty} P^n h_B(x) = \lim_{n \to \infty} P_x\{T_B \circ \theta_n < \infty\} = P_x\{X_n \in B \text{ i.o.}\} = h_B^\infty(x).$$
This proves that $(p_B, h_B^\infty)$ is the Riesz decomposition of the superharmonic function $h_B$. In order to prove the minimality of $h_B$, apply Theorem 3.1(iii) with $g = 1_B$ and $K = I_{B^c} P$.
(ii) The result follows directly from the representation $h_B^\infty = \lim_{n \to \infty} (U_B I_B)^n 1$, see (3.19). □
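The minimality statement of Theorem 3.4(i) suggests a simple numerical procedure on a finite state space: starting from zero and iterating $h \mapsto 1_B + 1_{B^c} P h$ produces an increasing sequence converging to $h_B$. The sketch below is an illustration under assumed data: the chain is a symmetric random walk on $\{0, \ldots, 4\}$ absorbed at both ends, $B = \{4\}$, and the function name and the number of sweeps are arbitrary choices, not part of the text.

```python
def hitting_probabilities(P, B, sweeps=2000):
    """Approximate h_B(x) = P_x{T_B < infinity} by monotone iteration.

    Each sweep applies h <- 1_B + 1_{B^c} P h, starting from h = 0.
    """
    n = len(P)
    h = [0.0] * n
    for _ in range(sweeps):
        Ph = [sum(P[x][y] * h[y] for y in range(n)) for x in range(n)]
        h = [1.0 if x in B else Ph[x] for x in range(n)]
    return h

# Hypothetical example: symmetric walk on {0, ..., N}, absorbed at 0 and N.
N = 4
P = [[0.0] * (N + 1) for _ in range(N + 1)]
P[0][0] = 1.0
P[N][N] = 1.0
for x in range(1, N):
    P[x][x - 1] = P[x][x + 1] = 0.5

h = hitting_probabilities(P, {N})
Ph = [sum(P[x][y] * h[y] for y in range(N + 1)) for x in range(N + 1)]
```

For this walk the limit is the classical ruin probability $h_B(x) = x/N$, and the computed function satisfies $Ph \le h$ up to rounding, in line with superharmonicity.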
We write
$$B^\infty = \{h_B^\infty = 1\} = \{x \in E : P_x\{X_n \in B \text{ i.o.}\} = 1\}.$$
From the preceding theorem and Propositions 3.2 and 3.3 we obtain:

Proposition 3.8. For any sets $A, B \in \mathcal{E}$:
(i) Either $h_B > 0$ everywhere or the set $\{h_B = 0\}$ is closed.
(ii) Either $h_B^\infty < 1$ everywhere or the set $B^\infty = \{h_B^\infty = 1\}$ is absorbing. If $B^\infty \ne \emptyset$, then $B \cap B^\infty \ne \emptyset$ and $B \cap B^\infty \subseteq (B \cap B^\infty)^\infty$.
(iii) Either $h_B^\infty \equiv 0$, or $\sup_B h_B^\infty = 1$.
(iv) If $B^\infty \ne \emptyset$ and $\inf_B h_A > 0$ then $B^\infty \subseteq A^\infty$.
(v) If $B$ is absorbing then $p_B \equiv 0$ and $h_B = h_B^\infty$.

Proof. (i) Use Proposition 3.2(ii).
(ii) If $B^\infty \ne \emptyset$ then it is absorbing by Proposition 3.2(iv). Clearly, if $B^\infty \ne \emptyset$ then it is impossible that $B \cap B^\infty = \emptyset$. If now $X_0 = x \in B \cap B^\infty$, then $X_n \in B^\infty$ for all $n \ge 0$ and $X_n \in B$ infinitely often, whence $X_n \in B \cap B^\infty$ infinitely often (a.s.). Thus $B \cap B^\infty \subseteq (B \cap B^\infty)^\infty$ as asserted.
(iii) If $h_B^\infty \le \rho < 1$ on $B$, then by Theorem 3.4(ii), $h_B^\infty = U_B I_B h_B^\infty \le \rho$ everywhere, which by Proposition 3.3 leads to $h_B^\infty \equiv 0$.
(iv) Define a transition probability $Q$ by
$$Q(x, dy) = P_x\{X_{S_B} \in dy,\ S_B < \infty,\ S_B \le T_A\} = P_x\{X_0 \notin A,\ X_{S_{A \cup B}} \in B \cap dy,\ S_{A \cup B} < \infty\} = I_{A^c} U_{A \cup B} I_B(x, dy).$$
Clearly the set $B^\infty$ is closed for $Q$. For any $x \in B^\infty$, $i \ge 1$, we have $P_x\{S_B(i) < \infty\} = 1$ and $Q^i 1(x) = P_x\{T_A \ge S_B(i)\}$, and hence
$$\lim_{i \to \infty} Q^i 1(x) = P_x\{T_A = \infty\} = 1 - h_A(x).$$
It follows that the function $1 - h_A$ is the harmonic part in the Riesz decomposition of the superharmonic function 1 (for the transition probability $Q$ restricted to the closed set $B^\infty$). Let $\rho = 1 - \inf_B h_A < 1$. On $B^\infty$ we have
$$1 - h_A = Q(1 - h_A) = I_{A^c} U_{A \cup B} I_B (1 - h_A) \le \rho.$$
Consequently, by Proposition 3.3, $1 - h_A = 0$ on $B^\infty$. By Theorem 3.4(i), and since $B^\infty$ is absorbing,
$$h_A^\infty = \lim_{n \to \infty} P^n h_A \ge \lim_{n \to \infty} P^n 1_{B^\infty} = 1 \quad \text{on } B^\infty,$$
from which it follows that $B^\infty \subseteq A^\infty$.
(v) Obvious. □
3.5 The dissipative and conservative parts

Our goal in this section is to obtain the decomposition $E = E_c + E_d$ mentioned at the beginning of this chapter.

Definition 3.3. A set $B \in \mathcal{E}$ is called transient, if $P_x\{L_B < \infty\} = 1$ for all $x \in B$ (or, equivalently, $h_B^\infty = 0$ on $B$). $B$ is called dissipative, if there exists a function $g \in \mathcal{E}_+$ such that $g > 0$ and $Gg < \infty$ on $B$. If the whole state space $E$ is transient (resp. dissipative), we say that the Markov chain $(X_n)$ is transient (resp. dissipative).

Note that, by Proposition 3.8(iii), if $B$ is transient then in fact $h_B^\infty \equiv 0$. Note also that $(X_n)$ is transient if and only if the lifetime $L_E$ of $(X_n)$ is finite $P_x$-a.s. for all $x \in E$.

The relationships between the concepts of transience and dissipativity are discussed in the following:

Proposition 3.9. Let $B \in \mathcal{E}$ be arbitrary.
(i) If $G1_B$ is finite on $B$ then $B$ is transient.
(ii) The sets $\{p_B > 0\} = \{h_B^\infty < h_B\}$ and $B \setminus B^\infty = \{x \in B : h_B^\infty(x) < 1\}$ are dissipative. In particular, if $B$ is transient, then it is dissipative.
(iii) If $B$ is dissipative, then there exists $g \in \mathcal{E}_+$ such that $\{g > 0\} = B$ and $\sup_E Gg \le 1$.
(iv) If $B$ is dissipative then $B = \bigcup_{i=1}^{\infty} B_i$ where each $B_i$ is transient.
(v) If $B_i$, $i \ge 1$, are dissipative then so is their union $B = \bigcup_{i=1}^{\infty} B_i$.
(vi) If $B$ is dissipative then there exists a dissipative set $\bar{B} \supseteq B$ such that the complement $(\bar{B})^c$ is either empty or absorbing.
(vii) If $(X_n)$ is irreducible dissipative, then the potential $Gs$ is bounded for any small $s \in \mathcal{S}^+$. Then also every small set $C$ is transient.
Proof. (i) Since (by hypothesis) $E_x \sum_{n=0}^{\infty} 1_B(X_n) < \infty$, we have
$$h_B^\infty(x) = P_x\Big\{\sum_{n=0}^{\infty} 1_B(X_n) = \infty\Big\} = 0 \quad \text{for all } x \in B.$$
(ii) Set
$$g = \sum_{n=0}^{\infty} 2^{-(n+1)} P^n g_B.$$
By Theorem 3.4(i)
$$\sum_{n=0}^{\infty} P^n g_B = G g_B = p_B = h_B - h_B^\infty,$$
whence $g > 0$ on $\{h_B^\infty < h_B\}$. Moreover $Gg \le G g_B = p_B \le 1$. Thus $\{h_B^\infty < h_B\}$ and hence also its subset $B \setminus B^\infty$ are dissipative.
(iii) and (iv): Let $g \in \mathcal{E}_+$ be such that $\{g > 0,\ Gg < \infty\} = B$, and let
$$B_i = \{g \ge i^{-1},\ Gg \le i\}, \quad i \ge 1.$$
Then $B = \bigcup B_i$ and by the maximum principle (see (3.16))
$$\sup_E G1_{B_i} = \sup_{B_i} G1_{B_i} \le i^2.$$
By (i) each $B_i$ is transient. The desired function $g$ in (iii) is given by
$$g = 2^{-i} i^{-2} \ \text{on } B_i \setminus \bigcup_{j=1}^{i-1} B_j, \qquad g = 0 \ \text{on } B^c.$$
(v) Let $g_i \in \mathcal{E}_+$ be such that $\{g_i > 0\} = B_i$ and $Gg_i \le 1$. Set $g = \sum_{i=1}^{\infty} 2^{-i} g_i$. Then we have $\{g > 0\} = B$ and $Gg \le 1$.
(vi) First we show that the set $B_+ = \{h_B > 0\} \supseteq B$ is also dissipative. To this end, let $g \in \mathcal{E}_+$ be such that $\{g > 0\} = B$ and $Gg \le 1$, and define
$$f = \sum_{n=0}^{\infty} 2^{-(n+1)} P^n g.$$
Then $\{f > 0\} = B_+$ and $Gf \le 1$, which shows that $B_+$ is dissipative. Now recall from (ii) that the set $B^c \setminus (B^c)^\infty$ is dissipative. It follows that the union
$$\bar{B} = B_+ \cup (B^c \setminus (B^c)^\infty)$$
is also dissipative. Its complement is equal to $(\bar{B})^c = B_+^c \cap (B^c)^\infty$. If $(\bar{B})^c$ is not empty, then, being the intersection of a closed and of an absorbing set (see Propositions 2.2 and 3.8(ii)), it is absorbing.
(vii) Use (iii) and Proposition 2.7(ii). The latter result then follows from (i). □

From Theorem 3.1(iii) we obtain the following criteria for dissipativity and transience:

Proposition 3.10. Let $B \in \mathcal{E}$ be arbitrary.
(i) $B$ is dissipative if and only if there exists a superharmonic function $h$ satisfying
$$\infty > h > Ph \quad \text{on } B.$$
(ii) If there exists a superharmonic function $h$ and a constant $\gamma > 0$ such that
$$\infty > h \ge Ph + \gamma \quad \text{on } B,$$
then $B$ is transient.

Proof. Let $g = h - Ph$ and apply Theorem 3.1(iii) and Proposition 3.9(i). □

Example 3.2. (a) Suppose that $(X_n; n \ge 0)$ is a discrete Markov chain (cf. Example 1.2(a)). Call a state $x \in E$ transient (resp. dissipative), if $\{x\}$ is a transient (resp. dissipative) set. Now any subset $B \subseteq E$ is dissipative if and only if every state $x \in B$ is transient. In particular, an individual state $x \in E$ is transient if and only if it is dissipative.
Let $\varphi$ be an arbitrary non-trivial $\sigma$-finite measure on $(E, \mathcal{E})$.

Definition 3.4. A $\varphi$-positive set $B \in \mathcal{E}$ is called $\varphi$-conservative, if for all $\varphi$-positive sets $A \subseteq B$,
$$h_A^\infty = 1 \quad \varphi\text{-a.e. on } A. \tag{3.20}$$
If the whole state space $E$ is $\varphi$-conservative we say that the Markov chain $(X_n)$ is $\varphi$-conservative.

Note that, if $\varphi P \ll \varphi$ then by (3.19) the condition
$$P_x\{S_A < \infty\} = U_A(x, A) = 1 \quad \text{for } \varphi\text{-a.e. } x \in A$$
is sufficient for (3.20).

$\varphi$-conservative sets can be characterized as being complementary to all $\varphi$-positive dissipative sets:

Proposition 3.11. A $\varphi$-positive set $B \in \mathcal{E}$ is $\varphi$-conservative if and only if it does not contain any $\varphi$-positive dissipative set $A \in \mathcal{E}$.

Proof. If a $\varphi$-positive set $A \subseteq B$ is dissipative, then, by Proposition 3.9(iv), it contains a $\varphi$-positive transient set. Hence $B$ is not $\varphi$-conservative. The converse result is a direct consequence of Proposition 3.9(ii). □
After all these preliminaries the proof of the decomposition result will be easy.

Theorem 3.5. (Hopf's decomposition). Let $\varphi \in \mathcal{M}^+$ be a $\sigma$-finite measure. The state space $(E, \mathcal{E})$ can be divided into two parts
$$E = E_d(\varphi) + E_c(\varphi),$$
where $E_d(\varphi)$ is dissipative, and we have either
(i) $E_c(\varphi) = \emptyset$ (i.e., the Markov chain $(X_n)$ is dissipative), or
(ii) $\varphi(E_c(\varphi)) = 0$, and $E_c(\varphi)$ is absorbing, or
(iii) $\varphi(E_c(\varphi)) > 0$, and $E_c(\varphi)$ is absorbing and $\varphi$-conservative.

Proof. Let $B_i$, $i \ge 1$, be dissipative sets such that their union $B$, which by Proposition 3.9(v) is also dissipative, is a version of $\varphi$-ess sup $\{A \in \mathcal{E} : A \text{ dissipative}\}$. Let $\bar{B} \supseteq B$ be defined as in Proposition 3.9(vi), and set $E_d(\varphi) = \bar{B}$, $E_c(\varphi) = (\bar{B})^c$. The result now follows from Propositions 3.9(vi) and 3.11. □

Examples 3.3. (a) Suppose that $(X_n)$ is a discrete Markov chain and let $\varphi = \text{Card}$ be the counting measure on $E$. Call a state $x \in E$ recurrent, if it is Card-conservative; this means $h_x^\infty(x) = P_x\{X_n = x \text{ i.o.}\} = 1$, or equivalently, see Example 3.2(a),
$$g(x, x) = \sum_{n=0}^{\infty} p^n(x, x) = \infty.$$
Also, $x$ is recurrent if and only if it is not transient. Consequently, $E_d = E_d(\text{Card}) = \{x \in E : x \text{ transient}\}$ and $E_c = E_c(\text{Card}) = \{x \in E : x \text{ recurrent}\}$.
(b) (The forward process). Suppose that $F$ is non-lattice. Let $\bar{M} = \text{ess sup}\, z_1 = \sup\{t : F(t) < 1\}$. If $\bar{M} = \infty$, then $(V_{n\delta}^+; n \ge 0)$ is $\ell$-conservative. If $\bar{M} < \infty$, then $E_d(\ell) = (\bar{M}, \infty)$ and $E_c(\ell) = [0, \bar{M}]$.
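For a discrete chain with finitely many states and a stochastic transition matrix, the decomposition of Examples 3.3(a) can be computed from the reachability relation alone, using the standard fact that a state $x$ of such a chain is recurrent exactly when every state reachable from $x$ leads back to $x$. The sketch below rests on that finite, stochastic assumption (it is not valid for substochastic matrices), and the matrix `P` and the function names are hypothetical.

```python
def reachable(P):
    """reach[i][j] is True when j can be reached from i in zero or more steps."""
    n = len(P)
    reach = [[i == j or P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):              # Warshall's transitive closure
        for i in range(n):
            for j in range(n):
                reach[i][j] = reach[i][j] or (reach[i][k] and reach[k][j])
    return reach

def hopf_decomposition(P):
    """Return (E_d, E_c) for a finite stochastic matrix P.

    Assumption: P is stochastic (rows sum to one) on finitely many states,
    so that x is recurrent iff every y reachable from x leads back to x.
    """
    n = len(P)
    reach = reachable(P)
    E_c = {x for x in range(n)
           if all(reach[y][x] for y in range(n) if reach[x][y])}
    return set(range(n)) - E_c, E_c

# Hypothetical example: states 0 and 1 drift towards the absorbing state 2.
P = [[0.5, 0.5, 0.0],
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]]
E_d, E_c = hopf_decomposition(P)
```

In this example the conservative part is the absorbing singleton and the remaining two states form the dissipative part.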
3.6 Recurrence

Recall from Chapter 2 the definitions of indecomposability and irreducibility. In particular, recall that the latter implies the former. It is easy to see that the converse is not true in general. (Take, for example, the discrete Markov chain on the integers moving deterministically one step to the right.)

We shall now examine what form Theorem 3.5 takes when we assume that the state space $E$ is indecomposable. It turns out that there is a strong dichotomy: the Markov chain $(X_n)$ is either dissipative or irreducible 1-recurrent.

When dealing with irreducible kernels and Markov chains we adopt the notation and terminology used in Chapter 2. In particular, $\psi$ denotes a maximal irreducibility measure, and $\mathcal{E}^+ = \{B \in \mathcal{E} : \psi(B) > 0\}$. We introduce the following:

Definition 3.5. An irreducible Markov chain $(X_n)$ is called recurrent, if
$$h_B^\infty > 0 \text{ everywhere and } h_B^\infty = 1\ \psi\text{-almost everywhere, for all } B \in \mathcal{E}^+,$$
and Harris recurrent (or $\psi$-recurrent), if
$$h_B^\infty \equiv 1 \quad \text{for all } B \in \mathcal{E}^+.$$

Note that, if $(X_n)$ is recurrent then it is $\psi$-conservative. Note also that then the set $\{h_B^\infty = 1\}$ is absorbing (see Proposition 3.8(ii)) for all $B \in \mathcal{E}^+$; in particular, if $(X_n)$ is Harris recurrent then $P$ is stochastic.

As one might guess, recurrence must be closely related to the concept of 1-recurrence. In fact, we will soon see that they actually coincide.

For the following theorem, recall from (2.1) that there always exist measures $\varphi$ satisfying $\varphi P \ll \varphi$ (see also Lemma 2.2).

Theorem 3.6. (i) Suppose that the state space $E$ is indecomposable. Then the Markov chain $(X_n)$ is either dissipative or (irreducible) recurrent.
(ii) Suppose that $(X_n)$ is recurrent. Then, for any non-trivial $\sigma$-finite measure $\varphi$ on $(E, \mathcal{E})$ satisfying $\varphi P \ll \varphi$, the restriction of $\varphi$ to the conservative part $E_c(\varphi)$ is a maximal irreducibility measure; i.e., $\varphi I_{E_c(\varphi)} \sim \psi$.
Proof. Suppose that $(X_n)$ is not dissipative. Let $\varphi$ be such that $\varphi P \ll \varphi$. By Lemma 2.2(ii) we necessarily are in case (iii) of Theorem 3.5. Let $B \subseteq E_c(\varphi)$ be an arbitrary $\varphi$-positive set. By Proposition 3.8(ii) the set $B^\infty = \{h_B^\infty = 1\}$ is absorbing. ($B^\infty$ is non-empty since $E_c(\varphi)$ is conservative.) By the same proposition (part (i)) and by our assumption of indecomposability, $h_B^\infty > 0$ everywhere. In particular, we see that $(X_n)$ is $\varphi I_{E_c(\varphi)}$-irreducible. It follows from Proposition 2.4(ii) that the irreducibility measure $\varphi I_{E_c(\varphi)}$ is maximal. □

In the following theorem we give several characterizations for recurrence.

Theorem 3.7. Suppose that the Markov chain $(X_n)$ is irreducible. $(X_n)$ is recurrent if and only if any of the following equivalent conditions (i)-(vi) is satisfied:
(i) there exists an absorbing full set $H$ such that $(X_n)$ restricted to $H$ is Harris recurrent;
(ii) the transition probability $P$ is 1-recurrent;
(iii) for some small function $s \in \mathcal{S}^+$, the potential $Gs$ is unbounded;
(iv) $(X_n)$ is $\psi$-conservative;
(v) for some small set $C \in \mathcal{S}^+$, the set $C^\infty$ is non-empty;
(vi) for some small set $C \in \mathcal{S}^+$, $U_C 1_C = 1$ $\psi$-a.e. on $C$.
$(X_n)$ is Harris recurrent if (and only if):
(vii) for some small set $C \in \mathcal{S}^+$, $U_C 1_C \equiv 1$.

Proof. The implications (i) $\Rightarrow$ recurrence $\Rightarrow$ (ii) $\Rightarrow$ (iii) and recurrence $\Rightarrow$ (iv) $\Rightarrow$ (v) $\Rightarrow$ (iii) are obvious. That (iii) implies recurrence follows from Proposition 3.9(vii) and part (i) of the preceding theorem. Condition (v) implies (vi), since $C^\infty \ne \emptyset$ is absorbing and full; the converse was noted already in the remark after Definition 3.4.
In order to prove the remaining parts of the theorem, let us fix a small set $C \in \mathcal{S}^+$ and write $H = C^\infty$. In any case it follows from Propositions 2.7(i) and 3.8(iv) that $H \subseteq B^\infty$ for all $B \in \mathcal{E}^+$. This proves that (v) implies (i), and (vii) implies the Harris recurrence (note that $h_C^\infty = \lim_{n \to \infty} (U_C I_C)^n 1 \equiv 1$ when (vii) holds). So the whole proof is completed. □
We call an absorbing set $H \in \mathcal{E}$ such that $(X_n)$ restricted to $H$ is Harris recurrent, a Harris set for $(X_n)$.

From Theorems 3.2 and 3.7 and Proposition 3.9(vii) we immediately obtain the following:

Corollary 3.1. For an irreducible Markov chain $(X_n)$:
(i) $(X_n)$ is dissipative if and only if $R > 1$ or $P$ is 1-transient.
(ii) If $R > 1$ or $P$ is 1-transient, the potentials $Gs$, $s \in \mathcal{S}^+$, are bounded. □

According to the following proposition, the (seemingly weaker) notion of $\varphi$-recurrence is equivalent to Harris recurrence:

Proposition 3.12. Let $\varphi \in \mathcal{M}^+$ be an arbitrary $\sigma$-finite measure. If $(X_n)$ is $\varphi$-recurrent, that means
$$h_B^\infty \equiv 1 \quad \text{for all } \varphi\text{-positive } B \in \mathcal{E},$$
then $(X_n)$ is Harris recurrent (and $\varphi \ll \psi$).

It turns out that Harris recurrence is closely related to the following condition: Every bounded harmonic function is a constant.

Theorem 3.8. (i) If the Markov chain $(X_n)$ is Harris recurrent then for any superharmonic function $h$ there is a constant $0 \le c \le \infty$ such that $h = c$ $\psi$-a.e. and $h \ge c$ everywhere. Then also every bounded harmonic function is a constant.
(ii) Conversely, if every bounded harmonic function is a constant, then $(X_n)$ is either dissipative or Harris recurrent.

Proof. (i) Suppose that $h \in \mathcal{E}_+$ is superharmonic. Let $c = \psi$-ess sup $h$. Then for every $\varepsilon > 0$, the set $B = \{h > c - \varepsilon\}$ is $\psi$-positive. From Theorem 3.4(i) and from our hypothesis it follows that $h \ge (c - \varepsilon) h_B \equiv c - \varepsilon$. Thus $h \ge c$ everywhere.
Suppose now that $h \in b\mathcal{E}_+$ is harmonic. Applying the result we just proved to the harmonic functions $h$ and $\sup_E h - h$ we see that $h$ is a constant.
(ii) The result follows directly from the fact that $h_B^\infty$ is harmonic for all $B \in \mathcal{E}$, and from Theorem 3.6(i). □

For recurrent chains we have:

Proposition 3.13. Suppose that $(X_n)$ is recurrent. Then:
(i) Every superharmonic function is constant $\psi$-a.e.
(ii) There exists a harmonic function $\bar{h}$, $\bar{h} = 1$ $\psi$-a.e., $\bar{h} \le 1$, which is minimal among the superharmonic functions $h$ satisfying $h = 1$ $\psi$-a.e.; $\bar{h}$ is given by
$$\bar{h} = h_H = h_H^\infty \quad \text{for all Harris sets } H \in \mathcal{E}.$$
(iii) There exists a Harris set $\bar{H} \in \mathcal{E}$ which is maximal in the sense that $H \subseteq \bar{H}$ for every Harris set $H$. Also, if $h \in \mathcal{E}_+$, $h = 1$ $\psi$-a.e., is superharmonic, then $h \ge 1$ on $\bar{H}$. $\bar{H}$ is given by
$$\bar{H} = \{\bar{h} = 1\} = \{h_H = 1\} = H^\infty = C^\infty$$
for all Harris sets $H \in \mathcal{E}$, all small sets $C \in \mathcal{S}^+$.

Proof. (i) This follows directly from Theorems 3.7(i) and 3.8(i).
(ii) Let $H \in \mathcal{E}$ be a Harris set. By Proposition 3.8(v) $h_H = h_H^\infty$ is harmonic. By Theorem 3.8(i), if $h$ is any superharmonic function satisfying $h = 1$ $\psi$-a.e., then $h \ge 1$ on $H$. By Theorem 3.4(i) this implies that $h \ge h_H$.
(iii) This is a direct consequence of (ii), and of the fact that $C^\infty$ is a Harris set satisfying $(C^\infty)^\infty = C^\infty$ (cf. the proof of Theorem 3.7). □

Next we shall look at the recurrence of the m-step Markov chains
$(X_{nm}; n \ge 0)$, $m \ge 1$. As a direct corollary of Proposition 3.5 and Corollary 3.1 we have:

Corollary 3.2. Suppose that $(X_n)$ is irreducible. Then $(X_n)$ is recurrent if and only if some (or equivalently, all) of the $c_m = \text{g.c.d.}\{m, d\}$ $m$-step chains $(X_{nm}; n \ge 0)$ is (are) recurrent. □

Concerning the Harris recurrence of the $m$-step chains we obtain the following result:

Proposition 3.14. Suppose that $(X_n)$ is Harris recurrent. Then for any $m \ge 1$, all the $c_m$ $m$-step chains $(X_{nm}; n \ge 0)$ are Harris recurrent.

Proof. Since $(X_n)$ restricted to the closed set $E_0 + \cdots + E_{d-1} = N^c$ is Harris recurrent there is no loss of generality in supposing that $N$ is empty. Let $\bar{h}_i$, $\bar{h}_i = 1$ $\psi$-a.e. on $E_i^{(m)}$, $\bar{h}_i = 0$ on $(E_i^{(m)})^c$, be the minimal harmonic function (given by Proposition 3.13(ii)) for the Markov chain $(X_{nm})$ with state space $E_i^{(m)}$. Thus
$$P^m \bar{h}_i = \bar{h}_i, \quad i = 0, 1, \ldots, c_m - 1. \tag{3.21}$$
It follows that $P\bar{h}_i$ is harmonic for $(X_{nm})$ with state space $E_{i-1}^{(m)}$. By minimality $P\bar{h}_i \ge \bar{h}_{i-1}$. (The indices are modulo $c_m$.) By iterating we get
$$P^{c_m} \bar{h}_i \ge P^{c_m - 1} \bar{h}_{i + c_m - 1} \ge \cdots \ge P\bar{h}_{i+1} \ge \bar{h}_i.$$
Further iterating leads to
$$P^{m c_m} \bar{h}_i \ge \cdots \ge P^{2c_m} \bar{h}_i \ge P^{c_m} \bar{h}_i \ge \bar{h}_i.$$
It follows from (3.21) that all the above inequalities are equalities. In particular,
$$P\bar{h}_i = \bar{h}_{i-1} \quad \text{for all } i = 0, 1, \ldots, c_m - 1,$$
from which we can conclude that the function $\bar{h}$ defined by
$$\bar{h} = \sum_{i=0}^{c_m - 1} \bar{h}_i \quad (= \bar{h}_i \text{ on } E_i^{(m)})$$
is harmonic. By Theorem 3.8(i), $\bar{h} \equiv 1$, and hence also $\bar{h}_i = 1$ on $E_i^{(m)}$ for all $i$. The final result now follows from Proposition 3.13(iii). □
Examples 3.4. (a) Let $(X_n)$ be a discrete, irreducible Markov chain (cf. Example 2.2(a)). $(X_n)$ is recurrent if and only if there exists a recurrent state $x_0 \in E$. Then all the states $x$ in $F = \{x \in E : x_0 \to x\}$ are recurrent, and $F$ is a Harris set for $(X_n)$. The maximal Harris set $\bar{H}$ is given by
$$\bar{H} = \{x \in E : h_{x_0}(x) \overset{\text{def}}{=} P_x\{X_n = x_0 \text{ for some } n \ge 0\} = 1\}.$$
In particular, $(X_n)$ is Harris recurrent if and only if $h_{x_0} \equiv 1$ for some recurrent state $x_0$.
(c) (The random walk). Suppose that $F$ is non-lattice, and has mean
$$M_F = \int_{\mathbb{R}} t\, F(dt).$$
If $M_F \ne 0$, then the random walk $(Z_n)$ visits every finite interval only finitely many times almost surely. On the other hand, if $M_F = 0$, then every interval of positive length is visited infinitely often almost surely (see e.g. Feller (1971), Sect. VI.10). It follows that the random walk is dissipative if $M_F \ne 0$, and Harris recurrent if $F$ is spread-out and $M_F = 0$.
(d) (The reflected random walk). Suppose that $F$ has a mean $M_F$, $-\infty < M_F < \infty$. The reflected random walk $(W_n)$ is dissipative if and only if $M_F > 0$, and Harris recurrent if and only if $M_F \le 0$.
(e) (The forward process). Suppose that $F$ is spread-out. Then, for any $\delta > 0$, the Markov chain $(V_{n\delta}^+)$ is Harris recurrent.
(f) With the assumptions of Example 2.1(f), the autoregressive process $(R_n)$ is Harris recurrent.
Embedded renewal processes
In this chapter we shall develop an important tool for the study of irreducible kernels, the so-called regeneration method. It can be most easily described in the case where $K = P$ is the transition probability of a Markov chain $(X_n)$ having a communicating state; that is, a state $x_0 \in E$ such that $\{x_0\} \in \mathcal{E}$ and $x_0 \to x_0$, i.e., $P^n(x_0, \{x_0\}) > 0$ for some $n \ge 1$. Then the path of the chain splits in a natural manner into $x_0$-blocks; these are the subpaths between consecutive visits to $x_0$ by $(X_n)$. By the Markov property the $x_0$-blocks are i.i.d. random elements, and thus a single $x_0$-block contains the relevant information about the whole path of the chain. Since the chain can be considered to regenerate at every visit to $x_0$, this method is commonly referred to as the regeneration method.

We shall show in this chapter that the regeneration method has far wider applicability than would be expected at first sight. In fact, we will see that a sufficient condition for the successful application of this method is that we have an irreducible kernel $K$ which satisfies the minorization condition $M(m_0, \beta, s, \nu)$ with $m_0 = 1$, i.e.,
$$K \ge \beta\, s \otimes \nu.$$
We usually normalize $\beta$ and $s$ so that $\beta = 1$, and then call the pair $(s, \nu)$ an atom for $K$. By Theorem 2.1, for an irreducible kernel $K$, some iterate $K^{m_0}$, $m_0 \ge 1$, possesses an atom $(s, \nu)$. Therefore the regeneration method is applicable for the entire class of irreducible kernels.
4.1 Renewal sequences and renewal processes

Since renewal theory will play a central role in the sequel we start by proving some elementary results for renewal sequences and processes.

Let $b = (b_n; n \ge 0)$ be a sequence satisfying
$$b_0 = 0, \quad 0 \le b_n < \infty \text{ for all } n \ge 0, \quad b_n > 0 \text{ for some } n \ge 1. \tag{4.1}$$
Let $b^{*i}$, $i \ge 0$, denote the convolution powers of $b$, i.e., we set
$$b^{*0} \overset{\text{def}}{=} \delta = (1, 0, 0, \ldots),$$
and iteratively,
$$b^{*i} = b * b^{*(i-1)} \quad \text{for } i \ge 1.$$
(The convolution $a * a' = (a * a'_n; n \ge 0)$ of any two sequences $a = (a_n; n \ge 0)$ and $a' = (a'_n; n \ge 0)$ is defined in the usual manner:
$$a * a'_n = \sum_{m=0}^{n} a_m a'_{n-m}, \quad n \ge 0.)$$
Note that from our hypothesis $b_0 = 0$ it follows that for every $i \ge 1$,
$$b_n^{*i} = 0 \quad \text{for } 0 \le n < i. \tag{4.2}$$

Definition 4.1. The sequence $u = (u_n; n \ge 0)$ defined by
$$u = \sum_{i=0}^{\infty} b^{*i}$$
is called the (undelayed) renewal sequence corresponding to the increment sequence $b$. If $a = (a_n; n \ge 0)$ is an arbitrary non-negative sequence, the sequence
$$v = a * u = \sum_{i=0}^{\infty} a * b^{*i}$$
is called a delayed renewal sequence (with delay sequence $a$).
Here are some elementary facts about renewal sequences:

Proposition 4.1. (i) $u_0 = 1$, $0 \le u_n < \infty$ for all $n \ge 0$.
(ii) $\text{g.c.d.}\{n \ge 1 : b_n > 0\} = \text{g.c.d.}\{n \ge 1 : u_n > 0\}$.
(iii) The renewal sequence $u$ is the unique non-negative sequence $u = (u_n; n \ge 0)$ satisfying the equation
$$u = \delta + b * u. \tag{4.3}$$
More generally, the delayed renewal sequence $v = a * u$ is the unique non-negative sequence $v = (v_n; n \ge 0)$ satisfying the equation
$$v = a + b * v. \tag{4.4}$$
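Because $b_0 = 0$, the renewal equations (4.3) and (4.4) can be solved term by term, which gives a direct way to compute a renewal sequence. The following is a minimal sketch on truncated sequences; the increment sequence `b`, the delay sequence `a` and the function names are hypothetical choices for illustration.

```python
from fractions import Fraction as F

def convolve(a, b):
    """First len(a) terms of the convolution a * b (sequences of equal length)."""
    return [sum(a[m] * b[k - m] for m in range(k + 1)) for k in range(len(a))]

def renewal_sequence(b, a=None):
    """Solve v = a + b * v recursively; a = None means a = delta, i.e. (4.3)."""
    n = len(b)
    if a is None:
        a = [F(1)] + [F(0)] * (n - 1)
    v = []
    for k in range(n):
        # b[0] = 0, so v_k only depends on v_0, ..., v_{k-1}
        v.append(a[k] + sum(b[m] * v[k - m] for m in range(1, k + 1)))
    return v

def power_sum(b):
    """Sum of the convolution powers b^{*i}; by (4.2) only i <= n - 1 contribute."""
    n = len(b)
    delta = [F(1)] + [F(0)] * (n - 1)
    total, power = delta[:], delta[:]
    for _ in range(n - 1):
        power = convolve(b, power)
        total = [t + p for t, p in zip(total, power)]
    return total

# Hypothetical increment and delay sequences, truncated to six terms.
b = [F(0), F(1, 2), F(1, 2), F(0), F(0), F(0)]
a = [F(1, 3), F(2, 3), F(0), F(0), F(0), F(0)]
u = renewal_sequence(b)
v = renewal_sequence(b, a)
```

In this example the recursive solution agrees with the defining series of Definition 4.1, and the delayed sequence agrees with $a * u$.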
Proof. (i) By (4.2), we have $u_n = \sum_{i=0}^{n} b_n^{*i}$.
(ii) The result is obvious.
(iii) Let $a$ be an arbitrary non-negative sequence and let $v = a * u$. Clearly,
$$v = a + a * \sum_{i=1}^{\infty} b^{*i} = a + b * v.$$
If $v'$ is any non-negative solution of (4.4), then we obtain by iterating
$$v'_n = a_n + v' * b_n = \sum_{i=0}^{j} a * b_n^{*i} + v' * b_n^{*(j+1)} \quad \text{for all } n, j \ge 0.$$
If $j \ge n$ then by (4.2) the first term on the right hand side is equal to $a * u_n$ and the second is zero. □

The integer $d \ge 1$ defined by
$$d = \text{g.c.d.}\{n \ge 1 : b_n > 0\} = \text{g.c.d.}\{n \ge 1 : u_n > 0\}$$
is called the period of the renewal sequence $u$. In most cases we do not lose generality if we suppose that $u$ is aperiodic, that is, $d = 1$. (Otherwise we should consider the aperiodic renewal sequence $(u_{nd}; n \ge 0)$ corresponding to the increment sequence $(b_{nd}; n \ge 0)$.)

Equation (4.3) is called the renewal equation.

If $M_b^{(0)} \overset{\text{def}}{=} \sum_{n=1}^{\infty} b_n \le 1$, we call the corresponding renewal sequence $u$ probabilistic. In this case we have the following interpretation: Let $t(i)$, $1 \le t(i) \le \infty$ a.s., $i = 1, 2, \ldots$, be i.i.d. random times with common distribution $b$. Write briefly $t(1) = t$. Thus we have, for any $i \ge 1$,
$$P\{t(i) = n\} = P\{t = n\} = b_n, \quad n \ge 1,$$
$$P\{t(i) = \infty\} = P\{t = \infty\} = 1 - M_b^{(0)}.$$
Let $T(0)$, $0 \le T(0) \le \infty$ a.s., be a random time, independent of $\{t(i); i \ge 1\}$. We call $T(0)$ the delay. Let $a = (a_n; n \ge 0)$ be its distribution:
$$P\{T(0) = n\} = a_n, \quad n \ge 0,$$
$$P\{T(0) = \infty\} = 1 - \sum_{n=0}^{\infty} a_n = 1 - M_a^{(0)}.$$
We write $T(i)$, $i \ge 1$, for the sums
$$T(i) = T(0) + \sum_{j=1}^{i} t(j).$$

Definition 4.2. The sequence $(T(i); i \ge 0)$ defined above is called the renewal process (on $\mathbb{N}$) with delay distribution $a$ and increment distribution $b$. The sequence $(Y_n; n \ge 0)$ defined by
$$Y_n = 1\{T(i) = n \text{ for some } i \ge 0\}$$
is called the incidence process (of the renewal process $(T(i); i \ge 0)$). If $a = \delta$, i.e., $T(0) = 0$ a.s., then the renewal process $(T(i); i \ge 0)$ is called undelayed.

In order to emphasize a specific delay distribution $a$ we will often write $P_a$ instead of $P$ for the underlying probability measure. It is easy to see that
$$v_n = a * u_n = P_a\{Y_n = 1\}.$$
In particular, for the undelayed renewal sequence $u$ we have
$$u_n = P_\delta\{Y_n = 1\}.$$
Note that in the probabilistic case $0 \le u_n \le 1$ for all $n \ge 0$, and
$$\sum_{n=0}^{\infty} u_n = \sum_{n=0}^{\infty} \sum_{i=0}^{\infty} b_n^{*i} = \sum_{i=0}^{\infty} \big(M_b^{(0)}\big)^i = \big(1 - M_b^{(0)}\big)^{-1}. \tag{4.5}$$

An undelayed renewal process $(T(i); i \ge 0)$ (or the corresponding renewal sequence $u$) is called recurrent if $T(i) < \infty$ a.s. for all $i \ge 0$, otherwise transient.

Proposition 4.2. Either of the following conditions is equivalent to the recurrence of the renewal process $(T(i); i \ge 0)$:
(i) $M_b^{(0)} = 1$,
(ii) $\sum_{n=0}^{\infty} u_n = \infty$.

Proof. Obviously, recurrence is equivalent to the condition $t < \infty$ a.s., i.e. to (i). The equivalence of (i) and (ii) follows from (4.5). □

A recurrent renewal process $(T(i); i \ge 0)$ is called positive recurrent if
$$M_b \overset{\text{def}}{=} E t = \sum_{n=1}^{\infty} n b_n$$
is finite, otherwise null recurrent.

For a probabilistic renewal sequence, let
$$B_n = P\{t > n\} = 1 - b * 1_n.$$
(We use the symbol 1 also to denote the sequence $1 = (1_n; n \ge 0) = (1, 1, \ldots)$.) It follows from the renewal equation (4.3) that
$$B * u = 1. \tag{4.6}$$
Hence in the positive recurrent case the delay distribution $e$ given by
$$e = M_b^{-1} B$$
is an equilibrium distribution in the sense that the corresponding delayed renewal sequence $e * u$ is a constant, $e * u \equiv M_b^{-1}$.

Let $b$ again be an arbitrary sequence satisfying the basic hypotheses (4.1). We define the generating functions $\hat{b}(r)$ and $\hat{u}(r)$ by
$$\hat{b}(r) = \sum_{n=1}^{\infty} r^n b_n, \qquad \hat{u}(r) = \sum_{n=0}^{\infty} r^n u_n, \quad 0 \le r < \infty.$$
Note that $\hat{b}(0) = 0$ and $\hat{u}(0) = 1$. It follows from the renewal equation that
we have
$$\hat{u}(r) = 1 + \hat{b}(r)\hat{u}(r) \quad \text{for all } 0 \le r < \infty,$$
whence
$$\hat{u}(r) = (1 - \hat{b}(r))^{-1} \quad \text{whenever } \hat{b}(r) < 1, \tag{4.7a}$$
and
$$\hat{u}(r) = \infty \quad \text{whenever } \hat{b}(r) \ge 1. \tag{4.7b}$$
The real number
$$R = \sup\{r \ge 0 : \hat{u}(r) < \infty\} = \sup\{r \ge 0 : \hat{b}(r) < 1\}$$
is called the convergence parameter of the renewal sequence $u$. $u$ is called $R$-recurrent if $\hat{u}(R) = \infty$, otherwise $R$-transient. Note that, for a probabilistic renewal sequence, $R \ge 1$ and recurrence means the same as 1-recurrence.

From (4.7) and from the monotone convergence theorem we obtain the following:

Proposition 4.3. $\hat{b}(R) \le 1$ always, and $\hat{b}(R) = 1$ if and only if $u$ is $R$-recurrent. □

Define a transformed sequence $\tilde{b} = (\tilde{b}_n; n \ge 0)$ by
$$\tilde{b}_n = R^n b_n, \quad n \ge 0.$$
It is easy to see that the renewal sequence $\tilde{u}$ corresponding to $\tilde{b}$ is given by
$$\tilde{u}_n = R^n u_n, \quad n \ge 0. \tag{4.8}$$
From Propositions 4.2 and 4.3 it follows that $\tilde{b}$ is probabilistic, and it is recurrent if and only if $u$ is $R$-recurrent.
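When the increment sequence has finite support, the generating function $\hat{b}$ is a polynomial that increases continuously to infinity, so the convergence parameter is the unique root of $\hat{b}(R) = 1$ and can be located by bisection. The sketch below relies on that finite-support assumption; the sequence `b`, the tolerance and the function names are hypothetical, and floating-point arithmetic is used, so the checks are only approximate.

```python
def b_hat(b, r):
    """Generating function of b at r (b is a finite list with b[0] = 0)."""
    return sum(bn * r ** n for n, bn in enumerate(b))

def convergence_parameter(b, tol=1e-12):
    """Bisection for R = sup{r >= 0 : b_hat(r) < 1}, assuming finite support."""
    lo, hi = 0.0, 1.0
    while b_hat(b, hi) < 1:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if b_hat(b, mid) < 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def renewal(b, n_terms):
    """First n_terms of the renewal sequence u solving u = delta + b * u."""
    u = []
    for k in range(n_terms):
        s = sum(b[m] * u[k - m] for m in range(1, min(k, len(b) - 1) + 1))
        u.append((1.0 if k == 0 else 0.0) + s)
    return u

# Hypothetical defective increment sequence: total mass 1/2, so R > 1.
b = [0.0, 0.25, 0.25]
R = convergence_parameter(b)
b_tilde = [R ** n * bn for n, bn in enumerate(b)]
u = renewal(b, 8)
u_tilde = renewal(b_tilde, 8)
```

For this choice of `b` the root solves $R^2 + R - 4 = 0$, the transformed sequence has total mass one, and its renewal sequence matches $R^n u_n$ up to rounding.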
4.2 Kernels and Markov chains having a proper atom

Now we resume our study of non-negative kernels. Before introducing the general regeneration scheme we shall look at the simple special case where the kernel $K$ has a proper atom.

Definition 4.3. A non-empty set $\alpha \in \mathcal{E}$ is called a proper atom (for the kernel $K$), if
(i) $K(x, \cdot) = K(y, \cdot)$ for all $x, y \in \alpha$, and
(ii) $x \to \alpha$ for some $x \in \alpha$.

Let $\alpha \in \mathcal{E}$ be a proper atom. For any function $f$ on $E$, if $f$ is constant on $\alpha$, then, by convention, we write $f(\alpha)$ for the common value of $f(x)$, $x \in \alpha$. So, for example, the notation $K(\alpha, \cdot)$ makes sense. Note that, in fact, $K^n(x, \cdot) = K^n(y, \cdot)$ for all $x, y \in \alpha$, $n \ge 1$, and hence condition (ii) can be written in the form
(ii') $K^{n_0}(\alpha, \alpha) > 0$ for some $n_0 \ge 1$ (or just briefly, $\alpha \to \alpha$).
In other words, (ii') states that the proper atom $\alpha$ is $K^{n_0}(\alpha, \cdot)$-communicating. Consequently, by Proposition 2.3(ii) the kernel $K$ restricted to the set
$$\alpha^+ = \{x \in E : x \to \alpha\} = \{G1_\alpha > 0\}$$
is irreducible. Since we will be interested only in the restriction of $K$ to $\alpha^+$, there is usually no loss of generality in assuming that $\alpha^+ = E$, i.e. $K$ is irreducible.

From the inequalities $K^{n+1}(\alpha, \cdot) \ge K^n(\alpha, \alpha) K(\alpha, \cdot)$, $n \ge 1$, and since $K^{n+1}(\alpha, \cdot)$ is $\sigma$-finite by our Basic assumption, it follows that $K^n(\alpha, \alpha)$ is finite for all $n \ge 1$.

Note that a proper atom is a small set: we have $M(1, 1, 1_\alpha, \nu)$ with $\nu = K(\alpha, \cdot)$. Also note that, if $x_0 \in E$ is a communicating state, i.e. a state satisfying
$$\{x_0\} \in \mathcal{E} \quad \text{and} \quad x_0 \to x_0,$$
then the singleton $\{x_0\}$ is a proper atom.

Suppose now for a while that the kernel $K$ has a proper atom $\alpha \in \mathcal{E}$. It can be shown that with $\alpha$ there is always associated a renewal sequence:
Proposition 4.4. Let $\nu = K(\alpha, \cdot)$.
(i) The sequence $u$ defined by
$$u_n = K^n(\alpha, \alpha) = \begin{cases} 1 & \text{for } n = 0, \\ \nu K^{n-1}(\alpha) & \text{for } n \ge 1, \end{cases}$$
is a renewal sequence. Its increment sequence $b$ is given by
$$b_n = K(I_{\alpha^c} K)^{n-1}(\alpha, \alpha) = \nu (I_{\alpha^c} K)^{n-1}(\alpha), \quad n \ge 1.$$
(ii) More generally, for any $x \in E$, the sequence $v(x)$ defined by
$$v_n(x) = K^n(x, \alpha), \quad n \ge 0,$$
is a delayed renewal sequence: $v(x) = a(x) * u$, where the delay sequence $a(x)$ is given by
$$a_n(x) = (I_{\alpha^c} K)^n(x, \alpha), \quad n \ge 0.$$
(iii) For any $A \in \mathcal{E}$, the sequence $w(A)$ defined by
$$w_n(A) = K^{n+1}(\alpha, A) = \nu K^n(A), \quad n \ge 0,$$
is a delayed renewal sequence: $w(A) = u * \sigma(A)$, where the delay sequence $\sigma(A)$ is given by
$$\sigma_n(A) = K(I_{\alpha^c} K)^n(\alpha, A) = \nu (I_{\alpha^c} K)^n(A), \quad n \ge 0.$$
In the proof we need the following simple algebraic lemma:
Lemma 4.1. Let $\gamma$ and $\delta$ be two elements of a ring. Then for all $n \ge 1$:
$$(\gamma + \delta)^n = \gamma^n + \sum_{m=1}^{n} \gamma^{m-1} \delta (\gamma + \delta)^{n-m} = \gamma^n + \sum_{m=1}^{n} (\gamma + \delta)^{m-1} \delta \gamma^{n-m}.$$
Proof of Lemma 4.1. Write $(\gamma + \delta)^n$ as the sum of $\gamma^n$ and $2^n - 1$ terms each of which is a product of $\gamma$s and $\delta$s containing at least one $\delta$. In order to obtain the first (resp. the second) of the identities of the lemma, sum together those terms where $\delta$ occurs for the first (resp. last) time at the $m$th position. □
Proof of Proposition 4.4. Set $\gamma = I_{\alpha^c} K$ and $\delta = I_\alpha K$. By the lemma we have for the sequence $v_n(x) = K^n(x, \alpha)$, $n \ge 0$:
$$v_n(x) = (\gamma + \delta)^n(x, \alpha) = (I_{\alpha^c} K)^n(x, \alpha) + \sum_{m=1}^{n} (I_{\alpha^c} K)^{m-1} I_\alpha K^{n-m+1}(x, \alpha) = a(x) * u_n.$$
Since $u_n = K^n(\alpha, \alpha) = K v_{n-1}(\alpha)$ and $b_n = K(I_{\alpha^c} K)^{n-1}(\alpha, \alpha) = K a_{n-1}(\alpha)$, $n \ge 1$, we see that $u$ satisfies the renewal equation $u = \delta + b * u$ (here $\delta$ denotes the unit sequence $(1, 0, 0, \ldots)$), whence by Proposition 4.1(iii) it is a renewal sequence. Thus the proofs of (i) and (ii) are completed. The proof of (iii) is analogous to that of (ii). □

We call the sequence $u = (K^n(\alpha, \alpha); n \ge 0)$ the embedded renewal sequence associated with the proper atom $\alpha$. As a direct consequence of the definitions and of Theorem 3.2 we obtain the following:
Proposition 4.5. The kernel $K$ and the embedded renewal sequence $u$ have a common convergence parameter $R$. Moreover, $K$ is $R$-recurrent if and only if $u$ is. □
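For a finite state space the statements of Proposition 4.4 reduce to matrix identities that can be verified directly. The sketch below assumes a hypothetical $3 \times 3$ non-negative matrix $K$ (not from the text) in which the singleton $\{0\}$ plays the role of the proper atom $\alpha$; the taboo kernel $I_{\alpha^c} K$ is obtained by zeroing the row of the atom.

```python
from fractions import Fraction as F

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def power(A, n):
    """n-th matrix power, with A^0 the identity."""
    size = len(A)
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, A)
    return result

# Hypothetical non-negative matrix; state 0 is the atom alpha = {0}.
K = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 3), F(0),    F(2, 3)],
     [F(1),    F(1, 4), F(1, 4)]]
ATOM = 0

# Taboo kernel I_{alpha^c} K: transitions out of the atom are suppressed.
taboo = [[F(0)] * 3 if x == ATOM else row[:] for x, row in enumerate(K)]

N = 6
u = [power(K, n)[ATOM][ATOM] for n in range(N + 1)]       # u_n = K^n(alpha, alpha)
b = [F(0)] + [matmul(K, power(taboo, n - 1))[ATOM][ATOM]  # b_n = K (I_{alpha^c}K)^{n-1}(alpha, alpha)
              for n in range(1, N + 1)]

def a(x, n):
    """Delay sequence a_n(x) = (I_{alpha^c} K)^n (x, alpha)."""
    return power(taboo, n)[x][ATOM]

# (i): u solves the renewal equation u_n = sum_{m=1}^n b_m u_{n-m}.
renewal_ok = all(u[n] == sum(b[m] * u[n - m] for m in range(1, n + 1))
                 for n in range(1, N + 1))

# (ii): first entrance decomposition K^n(x, alpha) = (a(x) * u)_n.
first_entrance_ok = all(
    power(K, n)[x][ATOM] == sum(a(x, m) * u[n - m] for m in range(n + 1))
    for x in range(3) for n in range(N + 1))
```

The checks hold for any non-negative matrix, since they rest only on the ring identity of Lemma 4.1; the particular entries of `K` are immaterial.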
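Next we shall look at the harmonic functions and invariant measures for the kernels $rK$, $0 < r \le R$. We retain our basic hypothesis of the existence of a proper atom $\alpha$. Let $G_\alpha^{(r)}$ be the potential kernel of $r I_{\alpha^c} K$, i.e.,
$$G_\alpha^{(r)} = \sum_{n=0}^{\infty} r^n (I_{\alpha^c} K)^n, \quad 0 \le r < \infty,$$
and let $h_\alpha^{(r)} \in \mathcal{E}_+$ and $\pi_\alpha^{(r)} \in \mathcal{M}_+$ be defined by
$$h_\alpha^{(r)}(x) = r G_\alpha^{(r)}(x, \alpha) = \sum_{n=0}^{\infty} r^{n+1} (I_{\alpha^c} K)^n(x, \alpha),$$
$$\pi_\alpha^{(r)}(A) = r K G_\alpha^{(r)}(\alpha, A) = r \nu G_\alpha^{(r)}(A) = \sum_{n=0}^{\infty} r^{n+1} \nu (I_{\alpha^c} K)^n(A).$$
The generating function $\hat b(r) = \sum_{n=1}^{\infty} r^n b_n$ of the increment sequence $b = (K(I_{\alpha^c} K)^{n-1}(\alpha, \alpha); n \ge 1)$ can also be expressed in the forms
$$\hat b(r) = \sum_{n=0}^{\infty} r^{n+1} K (I_{\alpha^c} K)^n(\alpha, \alpha) = r \nu G_\alpha^{(r)}(\alpha) = \nu(h_\alpha^{(r)}) = \pi_\alpha^{(r)}(\alpha).$$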
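It follows from Propositions 4.3 and 4.5 that the convergence parameter $R$ of $K$ equals
$$R = \sup\{r \ge 0 : \nu(h_\alpha^{(r)}) \le 1\}.$$
Moreover, $\nu(h_\alpha^{(R)}) \le 1$, and $\nu(h_\alpha^{(R)}) = 1$ if and only if $K$ is $R$-recurrent. Since $h_\alpha^{(r)} = r 1_\alpha + r I_{\alpha^c} K h_\alpha^{(r)}$, we have the identity
$$r K h_\alpha^{(r)}(x) = r I_{\alpha^c} K h_\alpha^{(r)}(x) + r 1_\alpha(x) \nu(h_\alpha^{(r)}) = h_\alpha^{(r)}(x) - r 1_\alpha(x)\bigl(1 - \nu(h_\alpha^{(r)})\bigr),$$
from which we obtain part (i) of the following: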
Proposition 4.6. (i) For any $0 < r \le R$, the function $h_\alpha^{(r)}$ is superharmonic for the kernel $rK$,
$$h_\alpha^{(r)} \ge r K h_\alpha^{(r)}.$$
If $K$ is $R$-recurrent, then $h_\alpha^{(R)}$ is harmonic for $RK$,
$$h_\alpha^{(R)} = R K h_\alpha^{(R)}.$$
(ii) Similarly, the measure $\pi_\alpha^{(r)}$ is subinvariant for $rK$, that is $\pi_\alpha^{(r)} \ge r \pi_\alpha^{(r)} K$. If $K$ is $R$-recurrent then $\pi_\alpha^{(R)}$ is invariant for $RK$, i.e., $\pi_\alpha^{(R)} = R \pi_\alpha^{(R)} K$. □

In Sections 5.1 and 5.2 we shall make a thorough study of the subinvariant and invariant functions and measures for a general $R$-recurrent kernel $K$.

When $K = P$ is the transition probability of a Markov chain $(X_n)$ and $P$ has a proper atom $\alpha \in \mathcal{E}$, we can interpret the situation in probabilistic terms: Suppose that $(X_n)$ has initial state $X_0 = x \in E$. The epochs $(T_\alpha(i); i \ge 0)$ of the successive visits to the proper atom $\alpha$ form a renewal process corresponding to the incidence process $Y_n = 1_\alpha(X_n)$, $n \ge 0$. Its delay is $T_\alpha(0) = T_\alpha$. The delay distribution $a(x)$ is given by
$$a_n(x) = P_x\{T_\alpha = n\} = (I_{\alpha^c} P)^n(x, \alpha), \quad n \ge 0.$$
The increments $t(i) = T_\alpha(i) - T_\alpha(i-1)$, $i \ge 1$, have the common distribution
$$b_n = P_\alpha\{S_\alpha = n\} = P(I_{\alpha^c} P)^{n-1}(\alpha, \alpha), \quad n \ge 1.$$
The corresponding undelayed renewal sequence $u$ is
$$u_n = P_\alpha\{X_n \in \alpha\} = P^n(\alpha, \alpha), \quad n \ge 0,$$
while the delayed renewal sequence $v(x)$ (with start at $X_0 = x$) is given by
$$v_n(x) = P_x\{X_n \in \alpha\} = P^n(x, \alpha) = a(x) * u_n, \quad n \ge 0 \tag{4.9}$$
(cf. Proposition 4.4). The identity (4.9) is called the first entrance (to the proper atom $\alpha$) decomposition of the transition probability $P^n(x, \alpha)$.

Since
$$\hat b(r) = \sum_{n=1}^{\infty} r^n P(I_{\alpha^c} P)^{n-1}(\alpha, \alpha) = E_\alpha[r^{S_\alpha}; S_\alpha < \infty],$$
we see that the convergence parameter $R$ of $P$ equals
$$R = \sup\{r \ge 0 : E_\alpha[r^{S_\alpha}; S_\alpha < \infty] \le 1\}$$
and $P$ is $R$-recurrent if and only if
$$E_\alpha[R^{S_\alpha}; S_\alpha < \infty] = 1.$$
As a special case of Proposition 4.6 we have:

Corollary 4.1. (i) The function $h_\alpha$ defined by
$$h_\alpha(x) = h_\alpha^{(1)}(x) = G_\alpha(x, \alpha) = P_x\{T_\alpha < \infty\}$$
is superharmonic, i.e., $h_\alpha \ge P h_\alpha$. $h_\alpha$ is harmonic, $h_\alpha = P h_\alpha$, if $(X_n)$ is recurrent.
(ii) The measure $\pi_\alpha$ defined by
$$\pi_\alpha(A) = \pi_\alpha^{(1)}(A) = \nu G_\alpha(A) = E_\alpha \sum_{n=1}^{S_\alpha} 1_A(X_n)$$
is subinvariant, i.e., $\pi_\alpha \ge \pi_\alpha P$. $\pi_\alpha$ is invariant, $\pi_\alpha = \pi_\alpha P$, if $(X_n)$ is recurrent. □

As a converse result to Proposition 4.4(i), we can easily show that an arbitrary undelayed renewal process $(T(i); i \ge 0)$ can be obtained as the embedded renewal process of a Markov chain: Let $M = \operatorname{ess\,sup} t = \sup\{m \ge 1 : b_m > 0\}$. If $M$ is finite, set $E = \{0, 1, \ldots, M-1\}$; otherwise set $E = \mathbb{N}$. Define the transition matrix $P = (p(x, y); x, y \in E)$ by
$$p(x, x+1) = P\{t > x+1 \mid t > x\} = B_x^{-1} B_{x+1},$$
$$p(x, 0) = P\{t = x+1 \mid t > x\} = B_x^{-1} b_{x+1},$$
$$p(x, y) = 0 \quad \text{otherwise}.$$
Now one observes easily that the above transition probabilities are the transition probabilities of the backward chain $(V_n; n \ge 0)$ of the undelayed renewal process $(T(i); i \ge 0)$,
$$V_n \stackrel{\text{def}}{=} n - \max\{m : 0 \le m \le n,\ T(i) = m \text{ for some } i \ge 0\}, \quad n \ge 0, \tag{4.10}$$
i.e., $V_n$ is the time elapsed since the last renewal epoch before (or at) $n$. The renewal process $(T(i); i \ge 0)$ is reobtained as the embedded renewal process associated with the proper atom $\alpha = \{0\}$.

Examples 4.1. (a) For a Card $F$-irreducible matrix $K = (k(x, y); x, y \in E)$ every state $z \in F$ is communicating (whence $\{z\}$ is a proper atom). For any $z \in F$, let ${}_zK = ({}_zk(x, y); x, y \in E)$ denote the matrix obtained by removing the $z$th row:
$${}_zk(x, y) = k(x, y) - 1_{\{z\}}(x) k(x, y) = \begin{cases} k(x, y) & \text{for } x \ne z, \\ 0 & \text{for } x = z. \end{cases}$$
Let $({}_zK)^n = ({}_zk^{(n)}(x, y); x, y \in E)$. If $K$ is Card $F$-irreducible and $R$-recurrent then the column vector $h_z$ defined by
$$h_z(x) = \sum_{n=0}^{\infty} R^n\, {}_zk^{(n)}(x, z), \quad x \in E,$$
is the unique non-negative column vector satisfying
$$R K h_z = h_z \quad \text{and} \quad h_z(z) = 1.$$
If $K = P$ is the transition matrix of a Card $F$-irreducible, discrete Markov chain then for all $z \in F$ (writing $S_z = S_{\{z\}}$):
$$E_z[r^{S_z}; S_z < \infty] \begin{cases} \le 1 & \text{for all } r \le R, \\ > 1 & \text{for all } r > R, \end{cases}$$
and $P$ is $R$-recurrent if and only if equality holds for $r = R$. In the $R$-recurrent case the $R$-invariant column vector $h_z$ is given by
$$h_z(x) = E_x[R^{T_z}; T_z < \infty] = E_x[R^{S_z}; S_z < \infty].$$
(d) (The reflected random walk). If $F(0) = P\{z_1 \le 0\} > 0$, then the state 0 is communicating (whence $\{0\}$ is a proper atom).
The convergence parameter $R$ is given by
$$R = \sup\{r \ge 0 : E_0[r^{S_0}; S_0 < \infty] \le 1\}.$$
The reflected random walk $(W_n)$ is $R$-recurrent if and only if $E_0[R^{S_0}; S_0 < \infty] = 1$; the unique $R$-invariant function $h_0$ satisfying $h_0(0) = 1$ is given by
$$h_0(x) = E_x[R^{T_0}; T_0 < \infty], \quad x \in \mathbb{R}_+.$$
Suppose that $(W_n)$ is recurrent. Then the measure
$$\pi_0(A) = E_0\Bigl[\sum_{n=1}^{S_0} 1_A(X_n)\Bigr]$$
is the unique invariant measure satisfying $\pi_0(\{0\}) = 1$.

In addition to our previous examples (a)-(h) we shall introduce two new examples:

(i) Consider the following model for a storage: Let $0 < M < \bar M < \infty$ be two constants (the lower and upper level, respectively, of the storage). Let $z_n$, $n = 1, 2, \ldots$, be i.i.d., non-negative random variables (the daily demands). Let $S_0$ be a random variable, independent of $\{z_n; n \ge 1\}$ (the initial storage). We suppose that the daily storages $S_n$, $n \ge 0$, are obtained as follows: Whenever $S_{n-1}$, the storage at day $n-1$, is above the lower level $M$ the storage decreases by the daily demand $z_n$, whereas if the storage $S_{n-1}$ is below $M$ then the storage will be (immediately) supplied to the upper level $\bar M$ (from which it again decreases by the amount $z_n$); i.e.,
$$S_n = \begin{cases} S_{n-1} - z_n & \text{if } S_{n-1} > M, \\ \bar M - z_n & \text{if } S_{n-1} \le M. \end{cases}$$
Clearly $(S_n; n \ge 0)$ is a Markov chain on $(\mathbb{R}, \mathcal{R})$. The half-interval $(-\infty, \bar M]$ is an absorbing set. The half-interval $\alpha = (-\infty, M]$ is a proper atom. The probability measure $\nu = P(\alpha, \cdot)$ is given by $\nu = \mathcal{L}(\bar M - z_1)$.

(j) The waiting times of successive customers in a 2-server queue can be modelled as follows: Let $t_1, t_2, \ldots$ be i.i.d., non-negative random variables (the interarrival times), and let $s_0, s_1, \ldots$ be i.i.d., non-negative random variables, independent of $\{t_n; n \ge 1\}$ (the service times). Let $E = \{x = (x^{(1)}, x^{(2)}) \in \mathbb{R}^2 : 0 \le x^{(1)} \le x^{(2)}\}$, and let $W_0 = (W_0^{(1)}, W_0^{(2)})$ be an $E$-valued random element, i.e., $0 \le W_0^{(1)} \le W_0^{(2)}$ (a.s.). We define iteratively a Markov chain $(W_n; n \ge 0) = (W_n^{(1)}, W_n^{(2)}; n \ge 0)$ on $E$:
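$$W_n^{(1)} = \min\{(W_{n-1}^{(1)} + s_{n-1} - t_n)_+,\ (W_{n-1}^{(2)} - t_n)_+\},$$
$$W_n^{(2)} = \max\{(W_{n-1}^{(1)} + s_{n-1} - t_n)_+,\ (W_{n-1}^{(2)} - t_n)_+\}.$$
The component $W_n^{(1)}$ (which, in general, is not a Markov chain) represents the waiting time of the $n$th customer (before service). Let $\bar t = \operatorname{ess\,sup} t_1$, $\underline{s} = \operatorname{ess\,inf} s_0$. If $\underline{s} < \bar t$ then $(0, 0) \in E$ is a communicating state. (Note that, if $\underline{s} > 2\bar t$ then $(W_n)$ is not even irreducible. We will look at the case $\bar t < \underline{s} < 2\bar t$ later in Example 4.2(j).)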
4.3 The general regeneration scheme

Our next object is to show that a regeneration scheme similar to that described above exists for an arbitrary irreducible kernel satisfying the minorization condition $M(m_0, 1, s, \nu)$ with $m_0 = 1$. We assume that $K$ is irreducible.
Definition 4.4. A pair $(s, \nu)$, $s \in \mathcal{S}^+$, $\nu \in \mathcal{M}^+$, is called an atom (for $K$) if the minorization condition $M(1, 1, s, \nu)$ holds, i.e.,
$$K \ge s \otimes \nu.$$
Concerning the irreducibility assumption similar remarks could be made here as were made after Definition 4.3. So, instead of assuming that $K$ is irreducible and $s \in \mathcal{S}^+$, it would be sufficient to assume only that $s$ is attainable from $\nu$, i.e.,
$$\nu K^n s > 0 \quad \text{for some } n \ge 0.$$
But this would imply irreducibility: the restriction of $K$ to the set $\{Gs > 0\}$ would be irreducible.

Note that, if we have $M(m_0, \beta, s, \nu)$ with $m_0 \ge 1$, $\beta > 0$ arbitrary, then the pair $(\beta s, \nu)$ is an atom for the iterated kernel $K^{m_0}$. Note also that, if $\alpha \in \mathcal{E}^+$ is a proper atom then the pair $(1_\alpha, K(\alpha, \cdot))$ is an atom.

In Proposition 4.4 we proved that with a proper atom there is associated a renewal sequence. It turns out that this holds true also in the case where $K$ has only an atom $(s, \nu)$:
Theorem 4.1. Let $(s, \nu)$ be an atom. Then:
(i) The sequence $u = (u_n; n \ge 0)$ defined by
$$u_0 = 1, \quad u_n = \nu K^{n-1} s, \quad n \ge 1,$$
is a renewal sequence. Its increment sequence $b$ is given by
$$b_0 = 0, \quad b_n = \nu (K - s \otimes \nu)^{n-1} s, \quad n \ge 1.$$
(ii) For any $x \in E$, the sequence $(K^n s(x); n \ge 0)$ is a delayed renewal sequence. Its delay sequence $a(x) = (a_n(x); n \ge 0)$ is given by
$$a_n(x) = (K - s \otimes \nu)^n s(x), \quad n \ge 0; \tag{4.11}$$
i.e., we have
$$K^n s(x) = a(x) * u_n, \quad n \ge 0.$$
(iii) For any $A \in \mathcal{E}$, the sequence $(\nu K^n(A); n \ge 0)$ is a delayed renewal sequence. Its delay sequence $\sigma(A) = (\sigma_n(A); n \ge 0)$ is given by
$$\sigma_n(A) = \nu (K - s \otimes \nu)^n(A), \quad n \ge 0;$$
i.e., we have
$$\nu K^n(A) = u * \sigma(A)_n, \quad n \ge 0.$$
(iv) For all $n \ge 1$, $x \in E$, $A \in \mathcal{E}$:
$$K^n(x, A) = (K - s \otimes \nu)^n(x, A) + a(x) * u * \sigma(A)_{n-1}.$$
Proof. (i) and (ii): Set $\gamma = K - s \otimes \nu$, $\delta = s \otimes \nu$ in Lemma 4.1. We get for all $n \ge 1$:
$$K^n = (K - s \otimes \nu)^n + \sum_{m=1}^{n} (K - s \otimes \nu)^{m-1} s \otimes \nu K^{n-m}. \tag{4.12}$$
Therefore
$$K^n s(x) = \sum_{m=0}^{n} (K - s \otimes \nu)^m s(x)\, u_{n-m} = a(x) * u_n.$$
Since $b_n = \nu(a_{n-1})$ for all $n \ge 1$, it follows that $u$ satisfies the renewal equation $u = \delta + b * u$. Hence it is a renewal sequence.
(iii) The proof is analogous to that of (ii).
(iv) By (4.12) and (iii),
$$K^n(x, A) = (K - s \otimes \nu)^n(x, A) + \sum_{m=1}^{n} a_{m-1}(x)\, u * \sigma(A)_{n-m} = (K - s \otimes \nu)^n(x, A) + a(x) * u * \sigma(A)_{n-1}.$$
□
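The decomposition in Theorem 4.1(iv) can be checked on a finite state space, where $s$ is a column vector, $\nu$ a row vector and $s \otimes \nu$ their outer product. The following is a minimal sketch, assuming a hypothetical $3 \times 3$ matrix $K$ and a hypothetical pair $(s, \nu)$ chosen so that $K \ge s \otimes \nu$ entrywise; none of the numbers come from the text.

```python
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def power(A, n):
    size = len(A)
    result = [[F(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = matmul(result, A)
    return result

# Hypothetical kernel and atom (s, nu) with K >= s (x) nu entrywise.
K = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 3), F(0),    F(2, 3)],
     [F(1),    F(1, 4), F(1, 4)]]
s = [F(1, 2), F(0), F(1)]
nu = [F(1), F(1, 4), F(0)]

gamma = [[K[x][y] - s[x] * nu[y] for y in range(3)] for x in range(3)]  # K - s (x) nu
minorization_ok = all(entry >= 0 for row in gamma for entry in row)

def on_function(A, f):
    """(A f)(x) = sum_y A(x, y) f(y)."""
    return [sum(A[x][y] * f[y] for y in range(len(f))) for x in range(len(A))]

def on_measure(m, A):
    """(m A)(y) = sum_x m(x) A(x, y)."""
    return [sum(m[x] * A[x][y] for x in range(len(m))) for y in range(len(A[0]))]

N = 5
# u_0 = 1, u_n = nu K^{n-1} s
u = [F(1)] + [sum(w * v for w, v in zip(nu, on_function(power(K, n - 1), s)))
              for n in range(1, N + 1)]
a = [on_function(power(gamma, n), s) for n in range(N + 1)]       # a[n][x]
sigma = [on_measure(nu, power(gamma, n)) for n in range(N + 1)]   # sigma[n][y]

def triple_convolution(x, y, n):
    """(a(x) * u * sigma({y}))_n."""
    return sum(a[i][x] * u[j] * sigma[n - i - j][y]
               for i in range(n + 1) for j in range(n + 1 - i))

# Theorem 4.1(iv): K^n = (K - s (x) nu)^n + a * u * sigma_{n-1}.
decomposition_ok = all(
    power(K, n)[x][y] == power(gamma, n)[x][y] + triple_convolution(x, y, n - 1)
    for n in range(1, N + 1) for x in range(3) for y in range(3))
```

Taking singletons $A = \{y\}$ suffices here because both sides of (iv) are additive in $A$.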
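We write $G_{s,\nu}^{(r)}$ for the potential kernel of the kernel $r(K - s \otimes \nu)$, i.e.,
$$G_{s,\nu}^{(r)} = \sum_{n=0}^{\infty} r^n (K - s \otimes \nu)^n, \quad r \ge 0. \tag{4.13}$$
Recall that $G^{(r)}$ denotes the potential kernel of $rK$,
$$G^{(r)} = \sum_{n=0}^{\infty} r^n K^n.$$
For the generating functions $\hat u(r) = \sum_{n=0}^{\infty} r^n u_n$ and $\hat b(r) = \sum_{n=1}^{\infty} r^n b_n$ we obtain
$$\hat u(r) = 1 + \sum_{n=1}^{\infty} r^n \nu K^{n-1} s = 1 + r \nu G^{(r)} s,$$
$$\hat b(r) = \sum_{n=1}^{\infty} r^n \nu (K - s \otimes \nu)^{n-1} s = r \nu G_{s,\nu}^{(r)} s.$$
As a direct consequence of Theorem 3.2 and Proposition 4.3 we have: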
Proposition 4.7. Suppose that $K$ is an irreducible kernel and that it has an atom $(s, \nu)$. Then:
(i) The convergence parameter $R$ of the kernel $K$ and of the embedded renewal sequence $u$ coincide, and
$$R = \sup\{r \ge 0 : \hat u(r) < \infty\} = \sup\{r \ge 0 : \hat b(r) \le 1\};$$
(ii) $\hat b(R) \le 1$ always;
(iii) $K$ is $R$-recurrent $\Leftrightarrow$ $\hat u(R) = \infty$ $\Leftrightarrow$ $\hat b(R) = 1$. □
For later purposes note the following formula: Suppose that $K$ is 1-recurrent. Then the embedded renewal sequence $u$ is recurrent, i.e., $\hat b(1) = 1$, and, writing $h = G_{s,\nu}^{(1)} s$, we have
$$B_n \stackrel{\text{def}}{=} 1 - b * 1_n = \sum_{m=n+1}^{\infty} b_m = \sum_{m=n}^{\infty} \nu (K - s \otimes \nu)^m s = \sigma_n(h). \tag{4.14}$$
4.4 The split chain

When $K = P$ is a transition probability the results of Section 4.3 have an interesting probabilistic interpretation. We will show in Theorem 4.2 that an atom $(s, \nu)$ for $P$ in fact represents a proper atom in a suitably enlarged state space.

Suppose now for a while that $K = P$ is the transition probability of an irreducible Markov chain $(X_n; n \ge 0)$. We suppose that $(s, \nu)$ is an atom, i.e., $P$ satisfies the minorization condition $M(m_0, 1, s, \nu)$ with $m_0 = 1$, $P \ge s \otimes \nu$. We can assume without loss of generality that $\nu$ is a probability measure (cf. Remark 2.1(v)). Then necessarily $0 \le s \le 1$. Define a kernel $Q$ on $(E, \mathcal{E})$ by setting
$$Q(x, A) = \begin{cases} (1 - s(x))^{-1} (P(x, A) - s(x)\nu(A)) & \text{if } s(x) < 1, \\ 1_A(x) & \text{if } s(x) = 1. \end{cases}$$
Then $Q$ is clearly substochastic. The transition probability $P$ splits into two parts:
$$P(x, A) = s(x)\nu(A) + (1 - s(x)) Q(x, A).$$
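On a finite state space the splitting of $P$ is a short computation. The sketch below assumes a hypothetical stochastic matrix $P$ with a hypothetical atom $(s, \nu)$, where $\nu$ is a probability vector and $P \ge s \otimes \nu$ entrywise; the function name `residual_kernel` is illustrative only. One state has $s(x) = 1$, so the second branch of the definition of $Q$ is exercised as well.

```python
from fractions import Fraction as F

def residual_kernel(P, s, nu):
    """Build Q from the definition: Q(x, .) = (P(x, .) - s(x) nu) / (1 - s(x))
    when s(x) < 1, and the unit mass at x when s(x) = 1."""
    size = len(P)
    Q = []
    for x in range(size):
        if s[x] == 1:
            Q.append([F(int(x == y)) for y in range(size)])
        else:
            Q.append([(P[x][y] - s[x] * nu[y]) / (1 - s[x]) for y in range(size)])
    return Q

# Hypothetical transition matrix and atom (s, nu); not taken from the text.
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 3), F(0),    F(2, 3)],
     [F(1, 4), F(1, 4), F(1, 2)]]
nu = [F(1, 2), F(1, 2), F(0)]
s = [F(1), F(0), F(1, 2)]

Q = residual_kernel(P, s, nu)

# P(x, .) = s(x) nu + (1 - s(x)) Q(x, .): 'head' with probability s(x)
# moves according to nu, 'tail' moves according to Q(x, .).
recombined = [[s[x] * nu[y] + (1 - s[x]) * Q[x][y] for y in range(3)]
              for x in range(3)]
row_sums = [sum(row) for row in Q]
```

Because this `P` is stochastic, the rows of `Q` sum to one; for a strictly substochastic `P` they would sum to at most one.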
Thus a transition starting from an arbitrary state $x \in E$ can be interpreted as happening in two stages. First, an $s(x)$-coin (that is a coin with $P\{\text{'head'}\} = s(x)$) is tossed. If the result is 'head' the Markov chain moves according to the probability law $\nu(\cdot)$, otherwise according to $Q(x, \cdot)$. The crucial point here is the fact that the occurrence of 'head' leads to a transition law which is independent of the state $x$.

In what follows we shall formulate this heuristic approach in precise terms by adjoining to the Markov chain $(X_n)$ a $\{0, 1\}$-valued stochastic process $(Y_n)$ representing precisely the successive results of the tossing of an $s(X_n)$-coin at $n = 0, 1, \ldots$. It is obvious that we thus obtain a Markov chain $(X_n, Y_n; n \ge 0)$ on the state space $E \times \{0, 1\} \cong E \times \{\text{'tail'}, \text{'head'}\}$ such that the subset $E \times \{1\} \cong E \times \{\text{'head'}\}$ is a proper atom for $(X_n, Y_n)$. Hence there exists an embedded renewal process, the incidence process of which is the sequence $(Y_n)$.

To formalize all this, suppose that the Markov chain $(X_n; n \ge 0)$ is Markov w.r.t. an arbitrary, fixed history $(\mathcal{F}_n)$. If the transition probability $P$ is not stochastic we complete it to a stochastic kernel by extending it to $(E_\Delta, \mathcal{E}_\Delta)$ as described in Section 1.2. If necessary, we enlarge the underlying probability space $(\Omega, \mathcal{F})$ into the product space $(\Omega \times \{0, 1\}^\infty, \mathcal{F} \otimes \{\emptyset, \{0\}, \{1\}, \{0, 1\}\}^{\otimes \infty})$, to include the results of a coin tossing experiment.

Let $(Y_n; n \ge 0)$ be a $\{0, 1\}$-valued stochastic process depending on the Markov chain $(X_n, \mathcal{F}_n)$ through the formulas
$$P\{X_{n+1} \in A, Y_n = 1 \mid \mathcal{F}_n \vee \mathcal{F}^Y_{n-1}\} = P\{X_{n+1} \in A, Y_n = 1 \mid X_n\} = s(X_n)\nu(A), \tag{4.15a}$$
$$P\{X_{n+1} \in A, Y_n = 0 \mid \mathcal{F}_n \vee \mathcal{F}^Y_{n-1}\} = P\{X_{n+1} \in A, Y_n = 0 \mid X_n\} = P(X_n, A) - s(X_n)\nu(A), \tag{4.15b}$$
$A \in \mathcal{E}_\Delta$, $n \ge 0$. (By convention $\mathcal{F}^Y_{-1} = \{\emptyset, \Omega\}$ and $\nu(\{\Delta\}) = 0$.)

Note that the conditions (4.15a) and (4.15b) are together equivalent to the following set of conditions:
$$P\{X_{n+1} \in A \mid \mathcal{F}_n \vee \mathcal{F}^Y_{n-1}\} = P(X_n, A), \tag{4.16a}$$
$$P\{Y_n = 1 \mid \mathcal{F}_n \vee \mathcal{F}^Y_{n-1}\} = P\{Y_n = 1 \mid X_n\} = s(X_n), \tag{4.16b}$$
$$P\{X_{n+1} \in A \mid \mathcal{F}_n \vee \mathcal{F}^Y_{n-1}; Y_n = 1\} = P\{X_{n+1} \in A \mid Y_n = 1\} = \nu(A), \tag{4.16c}$$
$n \ge 0$. Condition (4.16a) simply states that $(X_n)$ is Markov w.r.t. the history $(\mathcal{F}_n \vee \mathcal{F}^Y_{n-1})$. Condition (4.16b) means that the probability of
getting a 'head' at the $n$th toss is equal to $s(X_n)$ independently of the previous history $\mathcal{F}_n$ of the chain $(X_n)$ and of the tosses up to $n-1$. Condition (4.16c) says that, if 'head' is obtained at the $n$th toss then the next transition obeys the probability law $\nu(\cdot)$ independently of the past history $\mathcal{F}_n$ of the chain and of the tosses up to $n-1$.

From conditions (4.16a, b, c) it is also easy to derive the formula
$$P\{X_{n+1} \in A \mid \mathcal{F}_n \vee \mathcal{F}^Y_{n-1}; Y_n = 0\} = Q(X_n, A), \tag{4.16d}$$
which has a similar interpretation to the formulas above. Note also that
$$P\{Y_n = 1 \mid \mathcal{F}^X \vee \mathcal{F}^Y_{n-1}\} = P\{Y_n = 1 \mid X_n, X_{n+1}\} = r(X_n, X_{n+1}), \tag{4.17}$$
where $r$ is defined as the Radon-Nikodym derivative
$$r(x, y) = \frac{s(x)\nu(dy)}{P(x, dy)}.$$
In view of the discussion above, the following result is not surprising:

Theorem 4.2. (i) The stochastic process $(X_n, Y_n; n \ge 0)$ is a Markov chain w.r.t. the history $(\mathcal{F}_n \vee \mathcal{F}^Y_n; n \ge 0)$.
(ii) The Markov chain $(X_n, Y_n; n \ge 0)$ is irreducible.
(iii) The set $\alpha \stackrel{\text{def}}{=} E \times \{1\}$ is a proper atom for $(X_n, Y_n)$.
(iv) The renewal sequence
$$u_0 = 1, \quad u_n = \nu P^{n-1} s, \quad n \ge 1,$$
associated with the atom $(s, \nu)$ of the kernel $P$ is also the renewal sequence associated with the proper atom $\alpha$ of the Markov chain $(X_n, Y_n)$.

Proof. It follows from (4.15) and (4.16) that
$$P\{X_{n+1} \in dy, Y_{n+1} = 1 \mid \mathcal{F}_n \vee \mathcal{F}^Y_n\} = P\{X_{n+1} \in dy, Y_{n+1} = 1 \mid X_n, Y_n\} = \begin{cases} \nu(dy) s(y) & \text{if } Y_n = 1, \\ Q(X_n, dy) s(y) & \text{if } Y_n = 0, \end{cases}$$
and
$$P\{X_{n+1} \in dy, Y_{n+1} = 0 \mid \mathcal{F}_n \vee \mathcal{F}^Y_n\} = P\{X_{n+1} \in dy, Y_{n+1} = 0 \mid X_n, Y_n\} = \begin{cases} \nu(dy)(1 - s(y)) & \text{if } Y_n = 1, \\ Q(X_n, dy)(1 - s(y)) & \text{if } Y_n = 0. \end{cases}$$
Consequently, $(X_n, Y_n)$ is a Markov chain w.r.t. the history $(\mathcal{F}_n \vee \mathcal{F}^Y_n)$. We
also see that the transitions starting from $\alpha = E \times \{1\}$ are identical in distribution.

In order to show that the chain $(X_n, Y_n)$ is irreducible and that $\alpha$ is a proper atom for it, it suffices to show that
$$P\{Y_n = 1 \text{ for some } n \ge 1 \mid X_0 = x, Y_0 = i\} > 0$$
for any initial state $(x, i) \in E \times \{0, 1\}$ of the bivariate chain $(X_n, Y_n)$. By (4.16)
$$P\{Y_n = 1 \mid X_0 = x, Y_0 = 1\} = \nu P^{n-1} s \quad \text{for all } n \ge 1. \tag{4.18}$$
By irreducibility $\nu P^{n-1} s > 0$ for some $n \ge 1$. Similarly,
$$P\{Y_n = 1 \mid X_0 = x, Y_0 = 0\} = \int Q(x, dy) P^{n-1} s(y) > 0 \quad \text{for some } n \ge 1.$$
The result (4.18) also proves (iv). □
We call the bivariate Markov chain $(X_n, Y_n; n \ge 0)$ the split chain of $(X_n)$. We write $P_\alpha$ for the probability measure defined on the $\sigma$-algebra $\sigma(X_n, n \ge 1; Y_n, n \ge 0)$ and corresponding to the initial state $Y_0 = 1$ ($X_0 = x$ arbitrary), i.e.
$$P_\alpha = \mathcal{L}(X_n, n \ge 1; Y_n, n \ge 0 \mid Y_0 = 1).$$
By (4.16)
$$E_\alpha[\zeta \circ \theta] = E_\nu[\zeta]$$
for any non-negative functional $\zeta$ of the split chain $(X_n, Y_n)$.

In what follows we shall use the symbol $P$ also to denote the transition probability of the split chain. So, for example, we can write
$$P^n(\alpha, A) = P_\alpha\{X_n \in A\} = \nu P^{n-1}(A), \quad A \in \mathcal{E}, \ n \ge 1,$$
$$P^n(x, \alpha) = P_x\{Y_n = 1\} = E_x s(X_n) = P^n s(x), \quad x \in E, \ n \ge 0,$$
$$P^n(\alpha, \alpha) = P_\alpha\{Y_n = 1\} = u_n = \begin{cases} 1, & n = 0, \\ \nu P^{n-1} s, & n \ge 1. \end{cases} \tag{4.19}$$
Define the stopping times $S_\alpha$, $T_\alpha$ and $T_\alpha(i)$, $i \ge 0$, for the split chain $(X_n, Y_n)$ as the hitting times of the proper atom $\alpha$,
$$S_\alpha = \inf\{n \ge 1 : Y_n = 1\}, \quad T_\alpha = T_\alpha(0) = \inf\{n \ge 0 : Y_n = 1\},$$
and iteratively for $i \ge 1$,
$$T_\alpha(i) = \inf\{n > T_\alpha(i-1) : Y_n = 1\}.$$
The sequence $(T_\alpha(i); i \ge 0)$ is the embedded renewal process associated with the proper atom $\alpha$ of the split chain. Its increment distribution $b = (b_n; n \ge 1)$ is given by
$$b_n = P_\alpha\{S_\alpha = n\} = \nu (P - s \otimes \nu)^{n-1} s, \quad n \ge 1, \tag{4.20}$$
and the delay distribution $a(x) = (a_n(x); n \ge 0)$ corresponding to the start $X_0 = x$ by
$$a_n(x) = P_x\{T_\alpha = n\} = (P - s \otimes \nu)^n s(x), \quad n \ge 0. \tag{4.21}$$
Thus the decomposition of Theorem 4.1(ii) for $K = P$ can be interpreted as the first entrance decomposition
$$P^n s(x) = P_x\{Y_n = 1\} = \sum_{m=0}^{n} P_x\{T_\alpha = m\} P_\alpha\{Y_{n-m} = 1\} = a(x) * u_n, \quad n \ge 0,$$
(cf. also (4.9)). Similarly, writing
$$\sigma_n(A) = \nu (P - s \otimes \nu)^n(A) = P_\alpha\{X_{n+1} \in A, S_\alpha \ge n+1\}, \quad n \ge 0, \tag{4.22}$$
$$L_\alpha(n) = \max\{m : Y_m = 1, \ 0 \le m \le n\}, \quad n \ge 0,$$
we can interpret part (iii) of Theorem 4.1 as the last exit decomposition
$$\nu P^{n-1}(A) = P_\alpha\{X_n \in A\} = \sum_{m=0}^{n-1} P_\alpha\{L_\alpha(n-1) = m, X_n \in A\} = \sum_{m=0}^{n-1} P_\alpha\{Y_m = 1\} P_\alpha\{X_{n-m} \in A, S_\alpha \ge n-m\} = u * \sigma(A)_{n-1}, \quad n \ge 1, \tag{4.23}$$
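and part (iv) as the first-entrance last-exit decomposition
$$P^n(x, A) = P_x\{X_n \in A\} = P_x\{X_n \in A, T_\alpha \ge n\} + \sum_{m=0}^{n-1} \sum_{p=m}^{n-1} P_x\{T_\alpha = m, L_\alpha(n-1) = p, X_n \in A\} = (P - s \otimes \nu)^n(x, A) + a(x) * u * \sigma(A)_{n-1}, \quad n \ge 1. \tag{4.24}$$
Note that, by (4.17), $T_\alpha + 1$ is a randomized stopping time for $(X_n)$; we have
$$P\{T_\alpha + 1 = n \mid \mathcal{F}^X; T_\alpha + 1 \ge n\} = r(X_{n-1}, X_n), \quad n \ge 1. \tag{4.25}$$
Set $G_\alpha = G_{s,\nu}^{(1)} = \sum_{n=0}^{\infty} (P - s \otimes \nu)^n$. Note that by (4.19) and (4.20),
$$\sum_{n=0}^{\infty} u_n = 1 + \nu G s, \qquad \sum_{n=1}^{\infty} b_n = \nu G_\alpha s.$$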
As a direct consequence of Proposition 4.7 we have the following characterization for the recurrence of the Markov chain $(X_n)$:

Corollary 4.2. Suppose that $(X_n)$ is an irreducible Markov chain having an atom $(s, \nu)$. Then: $(X_n)$ is recurrent $\Leftrightarrow$ the embedded renewal process $(T_\alpha(i); i \ge 0)$ is recurrent $\Leftrightarrow$ $\nu G s = \infty$ $\Leftrightarrow$ $\nu G_\alpha s = 1$. □

Let us write
$$h_\alpha(x) = P_x\{T_\alpha < \infty\} = G_\alpha s(x) \quad \text{(by (4.21))},$$
$$h_\alpha^\infty(x) = P_x\{Y_n = 1 \text{ i.o.}\}.$$
By the Markov property, and since the (undelayed) renewal process $(T_\alpha(i); i \ge 0)$ is either recurrent or transient, we have
$$h_\alpha^\infty \equiv 0 \text{ in the transient case}, \qquad h_\alpha^\infty = h_\alpha \text{ in the recurrent case}.$$
In the latter case, let $H_\alpha = \{h_\alpha^\infty = 1\} = \{h_\alpha = 1\}$. For the following proposition recall from Proposition 3.13 the existence of the harmonic function $h^\infty$, which is the minimal superharmonic function satisfying $h = 1$ $\psi$-a.e.
Proposition 4.8. Suppose that $(X_n)$ is a recurrent Markov chain having an atom $(s, \nu)$. Then $h_\alpha = h^\infty$ and $H_\alpha = H$. In particular, $(X_n)$ is Harris recurrent if and only if $h_\alpha \equiv 1$.

Proof. Note that by Corollary 4.2, $\nu(h_\alpha) = 1$. It follows that
$$h_\alpha = G_\alpha s = s + (P - s \otimes \nu) h_\alpha = P h_\alpha,$$
i.e., $h_\alpha$ is harmonic. In order to prove the minimality of $h_\alpha$, take any superharmonic function $h \in \mathcal{E}_+$ such that $h = 1$ $\psi$-a.e. Then
$$h \ge Ph = s + (P - s \otimes \nu) h,$$
since $\nu \ll \psi$ (so that $\nu(h) = 1$). Now apply Theorem 3.1(iii) with $f = s$, $K = P - s \otimes \nu$ to prove that $h \ge h_\alpha$. Hence $h_\alpha$ is minimal. That the set $H_\alpha = \{h_\alpha = 1\}$ is the maximal Harris set follows from Proposition 3.13(iii). □

Apart from the coin tossing experiment, there is an alternative interpretation for our basic construction. For this we need the following:

Definition 4.5. A randomized stopping time $T$ (for $(X_n)$), satisfying
$$P_x\{T < \infty\} > 0 \quad \text{for all } x \in E,$$
is called a regeneration time (for $(X_n)$), if there is a probability measure $\varphi$ on $(E, \mathcal{E})$ such that
$$\mathcal{L}(X_n \mid \mathcal{F}^X_{n-1}; T = n) = \varphi \quad \text{for all } n \ge 0$$
(by convention, $\mathcal{F}^X_{-1} = \{\emptyset, \Omega\}$). If there exists a regeneration time, we say that the Markov chain $(X_n)$ is regenerative. The measure $\varphi$ is called the regeneration measure.

Note that, if $(X_n)$ is regenerative, then it is $\varphi$-irreducible. According to the following theorem $(X_n)$ is regenerative if and only if the transition probability $P$ has an atom.

If $P$ has an atom $(s, \nu)$, we let $r$ denote a version, $r \in \mathcal{E} \otimes \mathcal{E}$, $0 \le r \le 1$, of the Radon-Nikodym derivative
$$r(x, y) = \frac{s(x)\nu(dy)}{P(x, dy)}.$$
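Let $T_{s,\nu}$ denote the randomized stopping time defined by
$$P\{T_{s,\nu} = n \mid \mathcal{F}^X; T_{s,\nu} \ge n\} = r(X_{n-1}, X_n), \quad n \ge 1. \tag{4.26}$$
Note that by (4.25), $\mathcal{L}(T_{s,\nu}) = \mathcal{L}(T_\alpha + 1)$.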
Theorem 4.3. Suppose that the Markov chain $(X_n)$ is irreducible.
(i) If the transition probability $P$ has an atom $(s, \nu)$ then the randomized stopping time $T_{s,\nu}$ is a regeneration time. The corresponding regeneration measure $\varphi$ is equal to $\nu$.
(ii) Conversely, suppose that $(X_n)$ is regenerative with regeneration measure $\varphi$. Then there is a small function $s \in \mathcal{S}^+$ such that the pair $(s, \varphi)$ is an atom for $P$.

Proof. (i) The assertion follows directly from (4.16c) and (4.26).
(ii) Let $T$ be a regeneration time for $(X_n)$ with corresponding regeneration measure $\varphi$. We have for any $n \ge 1$:
$$P(x, dy) \ge P\{X_n \in dy, T = n \mid \mathcal{F}^X_{n-1}; X_{n-1} = x\} = P\{T = n \mid \mathcal{F}^X_{n-1}; X_{n-1} = x\}\, \varphi(dy).$$
Let
$$s(x) = \sup_{n \ge 1} \operatorname{ess\,sup} P\{T = n \mid \mathcal{F}^X_{n-1}; X_{n-1} = x\}.$$
Then $P \ge s \otimes \varphi$. If it were true that $s = 0$ $\psi$-a.e., then this would imply that $P_\psi\{T < \infty\} = 0$, a contradiction. Consequently, $(s, \varphi)$ is an atom for $P$. □
Example 4.2. (j) Consider the 2-server queueing chain $(W_n)$ introduced in Example 4.1(j). Suppose that $\bar t < \underline{s} < 2\bar t$. Then the subset $F = \{x \in E : x^{(2)} \ge \underline{s} - \bar t\}$ is an absorbing set for $(W_n)$, and $(0, 0)$ is not a communicating state. However, $(W_n)$ is still regenerative. To see this, let $y$ be any constant satisfying
$$\underline{s} - \bar t < y < \bar t.$$
Then the randomized stopping time $T_y$ defined by
$$T_y = \inf\{n \ge 1 : W_{n-1}^{(1)} = 0, \ W_{n-1}^{(2)} \le y, \ t_n > y\}$$
is a regeneration time for $(W_n)$. The corresponding regeneration measure $\varphi_y$ is given by
$$\varphi_y = \mathcal{L}(0, (s_0 - t^{(y)})_+),$$
where $t^{(y)}$ is a random variable, independent of $s_0$, and having the same distribution as $t_1$ given $t_1 > y$; i.e., at $T_y = n$, $W_n^{(1)} = 0$ a.s., and $W_n^{(2)} = (s_{n-1} - t_n)_+$, which, given $t_n > y$, has the same distribution as $(s_0 - t_1)_+$ given $t_1 > y$.
5 Positive and null recurrence
Our interest in Chapter 5 concentrates on the existence and uniqueness of $R$-invariant functions and measures. (In Proposition 4.6 we have already briefly dealt with this problem under the assumption that $K$ has a proper atom.)

The main result of this chapter states that an irreducible $R$-recurrent kernel $K$ always has an (essentially) unique $R$-invariant function $h$ and an (essentially) unique $R$-invariant measure $\pi$. This gives rise to a classification of $R$-recurrent kernels: If $\pi(h)$ is finite we call the kernel $K$ $R$-positive recurrent, otherwise $R$-null recurrent.

In the case where $K = P$ is the transition probability of a recurrent Markov chain, positive recurrence (i.e. $\pi(E) < \infty$) means probabilistically that the chain has a stationary probability distribution. This can be achieved by normalizing $\pi$ to a probability measure. Then, given that $\mathcal{L}(X_0) = \pi$,
$$\mathcal{L}(X_n) = \pi \quad \text{for all } n \ge 1.$$
It is easy to see that, in fact, given $\mathcal{L}(X_0) = \pi$, the Markov chain $(X_n; n \ge 0)$ is a stationary process.

We shall classify positive recurrent Markov chains further by looking at the 'speeds' with which the chain returns to the $\pi$-positive sets in the stationary situation. Later in Chapter 6 we will see that these 'rates of recurrence' correspond to the rates with which the $n$-step transition probabilities $P^n(x, A)$ tend to their stationary limits $\pi(A)$.

By using a 'similarity transformation' which transforms an $R$-recurrent kernel into a Harris recurrent transition probability we can extend the rate of recurrence results to the general kernel $K$. These extensions will be discussed at the end of this chapter.

Throughout this chapter we assume that $K$ is an irreducible kernel with maximal irreducibility measure $\psi$. $R$ denotes the convergence parameter of $K$, $\mathcal{E}^+ = \{f \in \mathcal{E}_+ : \psi(f) > 0\}$, $d$ is the period of $K$. By Theorem 2.1 there exist an integer $m_0 \ge 1$, a small function $s \in \mathcal{S}^+$ and a small measure $\nu \in \mathcal{M}^+$ such that the minorization condition $M(m_0, 1, s, \nu)$ holds,
$$K^{m_0} \ge s \otimes \nu;$$
in other words, the pair $(s, \nu)$ is an atom for the kernel $K^{m_0}$. We assume also that
$$c_{m_0} = \text{g.c.d.}\{m_0, d\} = 1. \tag{5.1}$$
(By Remark 2.1(ii) this does not involve any loss of generality.) In this case $K^{m_0}$ is an irreducible kernel on $(E, \mathcal{E})$ with period $d$ and convergence parameter $R^{m_0}$. Moreover, $K$ is $R$-recurrent if and only if $K^{m_0}$ is $R^{m_0}$-recurrent (see Proposition 3.5).
5.1 Subinvariant and invariant functions

Let $r$, $0 \le r < \infty$, be a constant.

Definition 5.1. A non-negative function $h \in \mathcal{E}^+$, which is not identically infinite, is called $r$-subinvariant (for $K$) if $h$ is superharmonic for $rK$, i.e.,
$$h \ge rKh.$$
If $h \in \mathcal{E}^+$, $h \not\equiv \infty$, is harmonic for $rK$, i.e.,
$$h = rKh,$$
then $h$ is called $r$-invariant.

We summarize some easy results on $r$-subinvariant functions in two propositions:
Proposition 5.1. Suppose that $h$ is an $r$-subinvariant function. Then:
(i) The set $\{h < \infty\}$ is closed and full.
(ii) $h > 0$ everywhere.
(iii) If $\nu$ is a small measure then $0 < \nu(h) < \infty$.

Proof. (i) and (ii): Use Propositions 2.5 and 3.2.
(iii) By (ii), $\nu(h) > 0$. $\nu(h)$ is finite, since
$$h \ge r^{m_0+n} K^{m_0+n} h \ge r^{m_0+n} \nu(h) K^n s, \quad n \ge 0.$$
□

Proposition 5.2. (i) If either $r < R$, or $r = R$ and $K$ is $R$-transient, then there exists an $r$-subinvariant function.
(ii) If $r > R$ then there does not exist any $r$-subinvariant function.

Proof. (i) Set $h = G^{(r)} s$ with $s$ small.
(ii) Obvious. □
The interesting, non-trivial case is the case where $r = R$ and $K$ is $R$-recurrent. We denote by $G_{m_0,s,\nu}^{(R)}$ the potential kernel of $R^{m_0}(K^{m_0} - s \otimes \nu)$, i.e.,
$$G_{m_0,s,\nu}^{(R)} = \sum_{n=0}^{\infty} R^{n m_0} (K^{m_0} - s \otimes \nu)^n.$$
Note that, when $m_0 = 1$, i.e. $(s, \nu)$ is an atom, $G_{m_0,s,\nu}^{(R)}$ equals $G_{1,s,\nu}^{(R)} = G_{s,\nu}^{(R)}$ (cf. (4.13)).
Theorem 5.1. Suppose that $K$ is $R$-recurrent. Then there exists an $R$-invariant function $h_\nu$ satisfying $\nu(h_\nu) = 1$ and having the following uniqueness and minimality properties: For any $R$-subinvariant function $h$ satisfying $\nu(h) = 1$ we have
$$h = h_\nu \ \ \psi\text{-a.e.}, \quad \text{and} \quad h \ge h_\nu \ \text{everywhere}.$$
$h_\nu$ is given by the formula
$$h_\nu = R^{m_0} G_{m_0,s,\nu}^{(R)} s = \sum_{n=0}^{\infty} R^{(n+1)m_0} (K^{m_0} - s \otimes \nu)^n s.$$

Proof. Consider first the case where $m_0 = 1$. Then by Proposition 4.7(iii) $h_\nu \stackrel{\text{def}}{=} R G_{s,\nu}^{(R)} s$ satisfies $\nu(h_\nu) = \hat b(R) = 1$. Consequently, $h_\nu \in \mathcal{E}^+$, $h_\nu$ is not identically infinite and
$$h_\nu = R(K - s \otimes \nu) h_\nu + Rs = RKh_\nu.$$
(We applied Proposition 2.1 to the kernel $R(K - s \otimes \nu)$.) Thus $h_\nu$ is $R$-invariant.

Let now $h \in \mathcal{E}^+$ be an arbitrary $R$-subinvariant function satisfying $\nu(h) = 1$. We estimate $h$ as follows:
$$h \ge RKh = Rs + R(K - s \otimes \nu) h,$$
whence $h \ge h_\nu$ by Theorem 3.1(iii). The function $h' = h - h_\nu \in \mathcal{E}_+$ satisfies $\nu(h') = 0$, and $h' \ge RKh'$. By Proposition 5.1(ii), $h'$ is not $R$-subinvariant. This is possible only if $\psi(h') = 0$, i.e. $h = h_\nu$ $\psi$-a.e.

Allow now $m_0$ to be arbitrary. What we have proved above holds true for the $R^{m_0}$-recurrent kernel $K^{m_0}$. Hence $h_\nu \stackrel{\text{def}}{=} R^{m_0} G_{m_0,s,\nu}^{(R)} s$ is $R^{m_0}$-invariant for $K^{m_0}$, i.e.,
$$h_\nu = R^{m_0} K^{m_0} h_\nu.$$
It follows that
$$RKh_\nu = R^{m_0+1} K^{m_0+1} h_\nu.$$
Clearly $RKh_\nu > 0$ everywhere. Also, since $h_\nu = R^{m_0-1} K^{m_0-1}(RKh_\nu)$, $RKh_\nu$ cannot be identically infinite. Hence $RKh_\nu$, too, is $R^{m_0}$-invariant for $K^{m_0}$. By minimality and uniqueness, there is a constant $c = \nu(RKh_\nu)$, $0 < c < \infty$, such that
$$RKh_\nu \ge c h_\nu, \quad \text{with an equality } \psi\text{-a.e.}$$
By iterating we obtain
$$R^{m_0} K^{m_0} h_\nu \ge \cdots \ge c^{m_0-1} RKh_\nu \ge c^{m_0} h_\nu, \quad \text{with an equality } \psi\text{-a.e.}$$
Since $h_\nu$ is $R^{m_0}$-invariant for $K^{m_0}$, $c = 1$, and the above inequalities are equalities. In particular, $h_\nu$ is $R$-invariant for $K$.

The uniqueness and minimality of $h_\nu$ for $K$ follow from the fact that every $R$-subinvariant function $h$ for $K$ is $R^{m_0}$-subinvariant for $K^{m_0}$ and from the uniqueness and minimality of $h_\nu$ for $K^{m_0}$. □

For later purposes, note that when $m_0 = 1$ and $R = 1$, we have
$$h_\nu = \sum_{n=0}^{\infty} (K - s \otimes \nu)^n s = \sum_{n=0}^{\infty} a_n \quad \text{(see (4.11))}.$$
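In the finite case with $m_0 = 1$ the function $h_\nu = R G_{s,\nu}^{(R)} s$ can be computed by solving the linear system $(I - R(K - s \otimes \nu)) h = Rs$, which agrees with the series whenever the spectral radius of $R(K - s \otimes \nu)$ is below one. The sketch below assumes a hypothetical $2 \times 2$ matrix $K$ with Perron eigenvalue 4, so that $R = 1/4$, and the atom $s = 1_{\{0\}}$, $\nu = K(0, \cdot)$; the helper `solve` is an illustrative Gauss-Jordan routine, not part of the text.

```python
from fractions import Fraction as F

def solve(A, rhs):
    """Gauss-Jordan elimination over exact fractions (A assumed invertible)."""
    n = len(A)
    M = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

# Hypothetical irreducible matrix with eigenvalues 4 and -1, hence R = 1/4.
K = [[F(1), F(2)],
     [F(3), F(2)]]
R = F(1, 4)
s = [F(1), F(0)]    # indicator of state 0
nu = [F(1), F(2)]   # nu = K(0, .), so K >= s (x) nu

gamma = [[K[x][y] - s[x] * nu[y] for y in range(2)] for x in range(2)]
A = [[F(int(x == y)) - R * gamma[x][y] for y in range(2)] for x in range(2)]

# h_nu solves (I - R gamma) h = R s, i.e. h = R s + R gamma h.
h_nu = solve(A, [R * v for v in s])

nu_of_h = sum(w * v for w, v in zip(nu, h_nu))                       # nu(h_nu)
RKh = [R * sum(K[x][y] * h_nu[y] for y in range(2)) for x in range(2)]
```

Here `nu_of_h` equals one and `RKh` reproduces `h_nu`, matching the normalization and the $R$-invariance asserted in Theorem 5.1 for this finite example.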
Specializing to the case where $K = P$ is the transition probability of a Markov chain $(X_n)$ yields the following corollary. It generalizes Proposition 4.8 for arbitrary $m_0$. We write
$$G_{m_0,s,\nu} = G_{m_0,s,\nu}^{(1)} = \sum_{n=0}^{\infty} (P^{m_0} - s \otimes \nu)^n.$$

Corollary 5.1. Suppose that $(X_n)$ is recurrent. Then the function
$$h_\nu = G_{m_0,s,\nu} s = \sum_{n=0}^{\infty} (P^{m_0} - s \otimes \nu)^n s$$
is equal to the minimal harmonic function $h^\infty$ introduced in Proposition 3.13. In particular, $H = \{h_\nu = 1\}$ is the maximal Harris set for $(X_n)$, and $(X_n)$ is Harris recurrent if and only if $h_\nu \equiv 1$. □
Our next aim is to develop a useful tool, which enables us to transform almost any result from the probabilistic case $K = P$ to general $K$: Let $K$ be an arbitrary irreducible kernel with $R > 0$ and let $h$ be an arbitrary $r$-subinvariant function (for some fixed $0 < r \le R$). On the closed full set $F = \{h < \infty\} = \{0 < h < \infty\}$ (cf. Proposition 5.1) we can define a kernel $\tilde K$ by setting
$$\tilde K(x, A) = r (h(x))^{-1} \int_A K(x, dy) h(y), \quad x \in F, \ A \in \mathcal{E} \cap F;$$
or just briefly, denoting by $I_h$ the (kernel of) multiplication by $h$,
$$\tilde K = r I_{h^{-1}} K I_h \quad \text{on } (F, \mathcal{E} \cap F). \tag{5.2}$$
We call (5.2) the similarity transform of $K$ (by the $r$-subinvariant function $h$). Clearly, $\tilde K$ is a transition probability on $(F, \mathcal{E} \cap F)$. Hence it governs the transitions of a Markov chain $(\tilde X_n; n \ge 0)$ with state space $(F, \mathcal{E} \cap F)$. The following result is obvious:

Proposition 5.3. (i) $(\tilde X_n)$ is irreducible with $\psi$ as a maximal irreducibility measure.
(ii) The convergence parameter $\tilde R$ of $\tilde K$ is equal to $\tilde R = R/r$.
(iii) $(\tilde X_n)$ is recurrent if and only if $r = R$ and $K$ is $R$-recurrent. □
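For a finite matrix the similarity transform (5.2) is the entrywise formula $\tilde K(x, y) = r K(x, y) h(y) / h(x)$. The following minimal sketch assumes a hypothetical $2 \times 2$ matrix $K$ with $R = 1/4$ and the $R$-invariant vector $h = (2, 3)$; it also shows what happens when the same $h$ is used with some $r < R$, for which $h$ is only $r$-subinvariant. The function name `similarity_transform` is illustrative.

```python
from fractions import Fraction as F

def similarity_transform(K, h, r):
    """Entrywise form of (5.2): K~(x, y) = r K(x, y) h(y) / h(x)."""
    size = len(K)
    return [[r * K[x][y] * h[y] / h[x] for y in range(size)] for x in range(size)]

# Hypothetical matrix with Perron eigenvalue 4 (R = 1/4) and K h = 4 h.
K = [[F(1), F(2)],
     [F(3), F(2)]]
h = [F(2), F(3)]
R = F(1, 4)

# r = R with an R-invariant h: the transform is a stochastic matrix.
K_tilde = similarity_transform(K, h, R)
row_sums_invariant = [sum(row) for row in K_tilde]

# r < R: h is r-subinvariant (h >= r K h) and the transform is substochastic.
r = F(1, 5)
K_tilde_sub = similarity_transform(K, h, r)
row_sums_subinvariant = [sum(row) for row in K_tilde_sub]
```

In the second case every row sums to $r/R = 4/5$, which reflects the defect $h - rKh$ of the subinvariant function.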
The most interesting special case is the case where $r = R$, $K$ is $R$-recurrent and $h = h_\nu$ is the minimal $R$-invariant function given by Theorem 5.1.

Proposition 5.4. Suppose that $K$ is $R$-recurrent. Let
$$h = h_\nu = \sum_{n=0}^{\infty} R^{(n+1)m_0} (K^{m_0} - s \otimes \nu)^n s,$$
$\tilde E = \{h_\nu < \infty\}$, $\tilde K = R I_{h_\nu^{-1}} K I_{h_\nu}$ on $\tilde E$. Then the Markov chain $(\tilde X_n)$ is Harris recurrent (on its state space $\tilde E$).

Proof. There is no loss of generality in assuming that $h_\nu < \infty$ everywhere. (Otherwise restrict $K$ to $\tilde E$.) By Proposition 3.13 we have to prove that the minimal superharmonic function $\tilde h^\infty$ for $(\tilde X_n)$ is identically equal to 1. Since $h_\nu$ is $R$-invariant for $K$, 1 is harmonic for $(\tilde X_n)$. Let now $\tilde h$ be superharmonic for $(\tilde X_n)$ satisfying $\tilde h = 1$ $\psi$-a.e. Then $h_\nu \tilde h$ is $R$-subinvariant for $K$, and, by the minimality of $h_\nu$, we have $h_\nu \tilde h \ge h_\nu$. Hence $\tilde h \ge 1$, and therefore $\tilde h^\infty \equiv 1$. □
For later purposes we note the following Proposition 5.5. Suppose that K is R-recurrent. Then for any m, i > 1, hy restricted to the set Er )= Ei + Ei+cm + •• • + is the (essentially) unique minimal Ritz-invariant function for the Rm-recurrent kernel Km with state space Er ) . Proof. Use similar arguments as in the proof of Proposition 3.14.
111
5.2 Subinvariant and invariant measures

Let $0 < r < \infty$ be a constant.

Definition 5.2. A measure $\pi \in \mathcal{M}^+$, $\pi(B) < \infty$ for some $B \in \mathcal{E}^+$, is called $r$-subinvariant (for $K$), if
\[ \pi \ge r \pi K. \]
If $\pi = r \pi K$, then the $r$-subinvariant measure $\pi$ is called $r$-invariant. When $r = 1$ we say simply subinvariant (resp. invariant) instead of 1-subinvariant (resp. 1-invariant).

Proposition 5.6. Suppose that $\pi$ is an $r$-subinvariant measure. Then:
(i) $\pi$ is $\sigma$-finite and $\psi \ll \pi$.
(ii) If $s$ is a small function then $0 < \pi(s) < \infty$.

Proof. Let $B \in \mathcal{E}$ be arbitrary. Then
\[ \pi(B) \ge r^n \pi K^n(B) \quad \text{for all } n \ge 0. \tag{5.3} \]
Let $f = G^{(\rho)} 1_B$ for some $0 < \rho < r$. If $B \in \mathcal{E}^+$ is such that $\pi(B) < \infty$ (cf. Definition 5.2) then it follows that $\pi(f) < \infty$. By irreducibility $f > 0$ everywhere. Hence $\pi$ is $\sigma$-finite. By (5.3) and by irreducibility, $\pi(B) > 0$ for all $B \in \mathcal{E}^+$. Thus $\psi$ is absolutely continuous w.r.t. $\pi$. The proof of (ii) is similar to that of Proposition 5.1(iii). □

Analogously to Proposition 5.2 we have:

Proposition 5.7. (i) If either $r < R$, or $r = R$ and $K$ is $R$-transient, then there exists an $r$-subinvariant measure which is equivalent to $\psi$ (i.e., serves as a maximal irreducibility measure).
(ii) If $r > R$ then there does not exist any $r$-subinvariant measure.

Proof. (i) Set $\pi = \nu G^{(r)}$, $\nu$ any small measure. (ii) Obvious. □
We omit the proof of the following theorem since it is completely analogous to that of Theorem 5.1.
Theorem 5.2. Suppose that $K$ is $R$-recurrent. Then the measure $\pi_s$ defined by
\[ \pi_s = R^{m_0} \nu\, G^{(R)}_{m_0,s,\nu} = \sum_{n=0}^{\infty} R^{(n+1)m_0}\, \nu (K^{m_0} - s \otimes \nu)^n \]
is $R$-invariant, is equivalent to $\psi$, and satisfies $\pi_s(s) = 1$. It is the unique $R$-subinvariant measure $\pi$ satisfying $\pi(s) = 1$. □

Suppose that $K$ is $R$-recurrent. In what follows we shall use the symbol $\pi$ to denote an $R$-invariant measure for $K$. By the above theorem there is a constant $c = \pi(s)$, $0 < c < \infty$, such that
\[ \pi = c\, \pi_s, \tag{5.4} \]
and $\pi$ is a maximal irreducibility measure. So, for example, $\mathcal{E}^+ = \{A \in \mathcal{E} : \pi(A) > 0\}$, and a set $A \in \mathcal{E}$ is full if and only if $\pi(A^c) = 0$.

Let $h$ be an arbitrary $R$-subinvariant function for $K$. Since by Theorem 5.1 there is a constant $c = \nu(h)$, $0 < c < \infty$, such that
\[ h = c\, h_\nu \quad \pi\text{-a.e.}, \tag{5.5} \]
the following definition is unambiguous:

Definition 5.3. An $R$-recurrent kernel $K$ is called $R$-positive recurrent if $\pi(h)$ is finite, otherwise $R$-null recurrent.
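The series defining $\pi_s$ can be summed numerically in the finite case. The sketch below assumes $m_0 = 1$, a positive $2 \times 2$ matrix $K$, and the illustrative choice $s = (1, 1)$, $\nu = (1, 1)$, for which $K \ge s \otimes \nu$ entrywise; none of these concrete values come from the text. It truncates the series and stores the two quantities that Theorem 5.2 says should equal $1$ and $\pi_s$, respectively.

```python
# Sketch with m_0 = 1; the matrix K and the pair (s, nu) are illustrative assumptions.

def vec_mat(v, M):
    """Row vector times matrix."""
    n = len(M)
    return [sum(v[i] * M[i][j] for i in range(n)) for j in range(n)]


K = [[2.0, 1.0], [1.0, 3.0]]
s = [1.0, 1.0]                      # plays the role of the small function
nu = [1.0, 1.0]                     # plays the role of the small measure
N = [[K[i][j] - s[i] * nu[j] for j in range(2)] for i in range(2)]  # K - s (x) nu

R = 2.0 / (5.0 + 5.0 ** 0.5)        # reciprocal of the largest eigenvalue of K

# pi_s = sum_{n >= 0} R^(n+1) * nu * (K - s (x) nu)^n, truncated after 200 terms
pi_s = [0.0, 0.0]
term = [R * x for x in nu]          # the n = 0 term
for _ in range(200):
    pi_s = [a + b for a, b in zip(pi_s, term)]
    term = [R * x for x in vec_mat(term, N)]

pi_s_of_s = sum(p * x for p, x in zip(pi_s, s))   # normalisation pi_s(s)
R_pi_K = [R * x for x in vec_mat(pi_s, K)]        # R * pi_s * K
```

Since a finite irreducible matrix has integrable invariant function with respect to its invariant measure, this toy example is $R$-positive recurrent in the sense of Definition 5.3.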
By (5.4) and (5.5), and by the definitions of $\pi_s$ and $h_\nu$, the integral $\pi(h)$ takes the form
\[ \pi(h) = \pi(s)\, \nu(h) \sum_{n=0}^{\infty} (n+1)\, R^{(n+2)m_0}\, \nu (K^{m_0} - s \otimes \nu)^n s. \tag{5.6} \]
In the case where $m_0 = 1$, $\pi(h)$ can also be written in the form
\[ \pi(h) = R^2\, \pi(s)\, \nu(h)\, \frac{d\hat b(R)}{dR}, \]
whenever $\hat b$ is differentiable at $R$. In fact, it is easy to see that the left hand side derivative
\[ \lim_{r \uparrow R} \frac{1 - \hat b(r)}{R - r} \]
always exists and equals $R^{-2}\, \pi(s)^{-1}\, \nu(h)^{-1}\, \pi(h)$ $(\le \infty)$.
Suppose now that $K = P$ is the transition probability of a Markov chain $(X_n)$. As a corollary to Theorem 5.2 we obtain:

Corollary 5.2. Suppose that $(X_n)$ is recurrent. Then the measure $\pi_s$ defined by
\[ \pi_s = \nu\, G_{m_0,s,\nu} = \sum_{n=0}^{\infty} \nu (P^{m_0} - s \otimes \nu)^n \]
is invariant, is equivalent to $\psi$, and satisfies $\pi_s(s) = 1$. Moreover, it is the unique subinvariant measure $\pi$ satisfying $\pi(s) = 1$. □

We call the Markov chain $(X_n)$ positive recurrent or null recurrent, depending on whether the transition probability $P$ is 1-positive recurrent or 1-null recurrent. Note that, since the harmonic functions for a recurrent Markov chain are constants ($\pi$-a.e.), positive recurrence is equivalent to $\pi$ being a finite measure. In this case we always norm $\pi$ to a probability measure, and call it also the stationary distribution of the Markov chain $(X_n)$. It is easy to see that in the positive recurrent case the Markov chain $(X_n;\ n \ge 0)$ is a stationary process, given that the initial distribution is the stationary distribution, $\mathcal{L}(X_0) = \pi$. From (5.6) we get
\[ \pi(E) = \pi(s) \sum_{n=0}^{\infty} (n+1)\, \nu (P^{m_0} - s \otimes \nu)^n s. \]
When $m_0 = 1$, the invariant measure can be interpreted in terms of the split chain $(X_n, Y_n)$:
\[ \pi_s(A) = \sum_{n=0}^{\infty} \nu (P - s \otimes \nu)^n (A) = E_\alpha \sum_{n=1}^{S_\alpha} 1_A(X_n), \quad A \in \mathcal{E}. \tag{5.7} \]
Hence $\pi_s(A)$ is the expected number of visits by $(X_n)$ to the set $A$ during an $\alpha$-block, that is, between two consecutive occurrences of the event $\{Y_n = 1\}$. In particular, we obtain
\[ \pi_s(E) = E_\alpha S_\alpha. \]
Thus $(X_n)$ is positive recurrent if and only if the expectation $E_\alpha S_\alpha$ is finite.

Let $K$ be an $R$-recurrent kernel with $R$-invariant function $h$ and measure $\pi$, and let $\tilde K$ denote the transformed kernel $\tilde K = R\, I_{h^{-1}} K I_h$. Since clearly the measure $\tilde\pi = \pi I_h$ is invariant for $\tilde K$, we obtain the following result:

Proposition 5.8. The kernel $K$ is $R$-positive recurrent if and only if the Markov chain $(\tilde X_n)$ with transition probability $\tilde K$ is positive recurrent. □

Examples 5.1. (a) Let $(X_n)$ be a recurrent, discrete Markov chain with transition matrix $P$ and with maximal irreducibility measure $\psi = \mathrm{Card}_F$. Then, for any fixed state $z \in F$, the row vector $\pi_z$ defined by
\[ \pi_z(x) = E_z \sum_{n=1}^{S_z} 1_{\{X_n = x\}}, \quad x \in E, \]
is invariant. Moreover, $(X_n)$ is positive recurrent if and only if the expectation $E_x S_x$ is finite for some $x \in E$. Then $E_x S_x$ is finite whenever $x \in F$; the stationary distribution $\pi$ is given by
\[ \pi(x) = (E_x S_x)^{-1}, \quad x \in E. \]
Henceforth, when dealing with a recurrent, discrete Markov chain, we shall write $E_\pi$ for the set of $\pi$-positive states, $E_\pi = \{x \in E : \pi(x) > 0\}$ $(= F)$.
(c) The Lebesgue measure is an invariant measure for the random walk on $(\mathbb{R}, \mathcal{R})$.
(e) The measure $\pi$ defined by $\pi(dt) = (1 - F(t))\,dt$ is invariant for the Markov chain $(V^+_{n\delta};\ n \ge 0)$. Hence $(V^+_{n\delta})$ (with spread-out $F$) is positive recurrent if and only if $E z_1 = \int_{\mathbb{R}_+} (1 - F(t))\,dt$ is finite.
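The occupation-measure description of the invariant measure in the discrete case can be illustrated as follows. This is a sketch only: the three-state transition matrix and the choice $z = 0$ are assumptions made for the example. With $s = 1_{\{z\}}$ and $\nu = P(z, \cdot)$, the matrix $P - s \otimes \nu$ is $P$ with row $z$ set to zero, and the series $\sum_n \nu (P - s \otimes \nu)^n$ gives the expected numbers of visits during one excursion from $z$.

```python
# Sketch: a three-state chain chosen for illustration; z = 0 plays the role of the atom.

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
z = 0
n = len(P)

# P - s (x) nu with s = 1_{z} and nu = P(z, .): row z is set to zero.
N = [[0.0 if i == z else P[i][j] for j in range(n)] for i in range(n)]

# pi_z = sum_{k >= 0} nu N^k: expected visits to each state during one excursion from z
pi_z = [0.0] * n
term = P[z][:]
for _ in range(2000):
    pi_z = [a + b for a, b in zip(pi_z, term)]
    term = [sum(term[i] * N[i][j] for i in range(n)) for j in range(n)]

mean_return_time = sum(pi_z)                       # E_z S_z
stationary = [v / mean_return_time for v in pi_z]  # normalised invariant measure
pi_z_P = [sum(pi_z[i] * P[i][j] for i in range(n)) for j in range(n)]
```

For this chain the excursion measure is proportional to $(1, 2, 1)$, the mean return time to $z$ is 4, and the stationary probability of $z$ is its reciprocal, as in Example 5.1(a).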
5.3 Expectations over blocks

Throughout Sections 5.3-5.5 we assume that $K = P$ is the transition probability of a Harris recurrent Markov chain $(X_n)$; $\pi$ denotes a fixed invariant measure for $(X_n)$. (There is not much loss of generality in assuming the somewhat stronger Harris recurrence instead of mere recurrence. If $(X_n)$ were only recurrent we should consider its restriction to a Harris set $H$ $(= E$ $\pi$-a.e.$)$.) Note, in particular, that $P$ is stochastic, and therefore
\[ E_x S_B = U_B(x, E) \quad \text{for all } x \in E,\ B \in \mathcal{E}. \]
For any $B \in \mathcal{E}^+$, let $0 \le T_B(0) < T_B(1) < \cdots$ denote the successive visit epochs by $(X_n)$ to $B$. For any $i \ge 1$, the sequence $(X_{T_B(i-1)+1}, \ldots, X_{T_B(i)})$ is called the $i$th $B$-block. By the Markov property, for any $i \ge 1$ and $x \in B$, given the history up to $T_B(i-1)$ and given $X_{T_B(i-1)} = x$, the $i$th $B$-block has the same distribution as the first $B$-block $(X_1, \ldots, X_{S_B})$ given $X_0 = x$. In particular, it follows that for any $f \in \mathcal{E}_+$:
\[ E\Big[ \sum_{n = T_B(i-1)+1}^{T_B(i)} f(X_n) \,\Big|\, T_B(i-1) = m,\ X_m = x \Big] = E_x \sum_{n=1}^{S_B} f(X_n) = U_B f(x), \]
for all $x \in B$, $m \ge 0$, $i \ge 1$.
When $P$ has an atom $(s, \nu)$ we can construct the split chain $(X_n, Y_n)$ in the manner described in Section 4.4. By the $i$th $\alpha$-block we mean the sequence $(X_{T_\alpha(i-1)+1}, \ldots, X_{T_\alpha(i)})$. The $\alpha$-blocks are i.i.d. random elements having the same distribution as the sequence $(X_1, \ldots, X_{S_\alpha})$ given $Y_0 = 1$. In particular, for any $f \in \mathcal{E}_+$:
\[ E\Big[ \sum_{n = T_\alpha(i-1)+1}^{T_\alpha(i)} f(X_n) \,\Big|\, T_\alpha(i-1) = m \Big] = E_\alpha \sum_{n=1}^{S_\alpha} f(X_n) = \pi_s(f) = \pi(s)^{-1} \pi(f), \]
where the second equality holds by (5.7); i.e., the expectation of $f(X_n)$ summed over any $\alpha$-block is equal (up to multiplication by a constant) to the invariant measure $\pi(f)$ of $f$. We shall show that a similar result holds true for a $B$-block, provided that the Markov chain $(X_n)$ starts 'stationarily' from $B$. Recall from (3.6) the definition of the potential kernel
\[ G'_B(x, A) = \sum_{n=0}^{\infty} (P I_{B^c})^n (x, A) = E_x \sum_{n=0}^{S_B - 1} 1_A(X_n). \]
Proposition 5.9. For all $B \in \mathcal{E}^+$:
\[ \pi I_B G'_B = \pi I_B U_B = \pi. \]

Proof. Denote $\pi' = \pi I_B G'_B$. Since $\pi I_B U_B = \pi' P$, we need only show that $\pi' = \pi$. Decompose $\pi$ as follows:
\[ \pi = \pi I_B + \pi P I_{B^c} \quad (\text{since } \pi = \pi P) \]
\[ \phantom{\pi} = \pi I_B \sum_{n=0}^{N-1} (P I_{B^c})^n + \pi (P I_{B^c})^N \quad (\text{by iterating}) \]
\[ \phantom{\pi} \ge \pi' \quad (\text{letting } N \to \infty). \]
Further, $\pi'$ is subinvariant:
\[ \pi' = \pi I_B + \pi' P I_{B^c} \quad (\text{by the definition of } \pi') \]
\[ \phantom{\pi'} = \pi P I_B + \pi' P I_{B^c} \quad (\text{since } \pi = \pi P) \]
\[ \phantom{\pi'} \ge \pi' P \quad (\text{since } \pi \ge \pi'). \]
By the definition of $\pi'$, the subinvariant measures $\pi$ and $\pi'$ coincide on $(B, \mathcal{E} \cap B)$. It follows from the uniqueness of $\pi$ (see Theorem 5.2) that $\pi' = \pi$. □
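The result of Proposition 5.9 can also be written in the form
\[ \int_B \pi(dx)\, E_x \sum_{n=0}^{S_B - 1} f(X_n) = \int_B \pi(dx)\, E_x \sum_{n=1}^{S_B} f(X_n) = \pi(f), \quad \text{for all } f \in \mathcal{E}_+. \]
Setting $f \equiv 1$, we obtain the following corollary. For its part (ii), recall from Proposition 5.6 that $0 < \pi(C) < \infty$ for every small set $C \in \mathcal{S}^+$.

Corollary 5.3. (i) For all $B \in \mathcal{E}^+$:
\[ \int_B \pi(dx)\, E_x S_B = \pi(E) < \infty \ \text{ or } \ = \infty \]
depending on whether $(X_n)$ is positive or null recurrent.
(ii) If
\[ \sup_{x \in C} E_x S_C < \infty \quad \text{for some small set } C \in \mathcal{S}^+, \]
then $(X_n)$ is positive recurrent. □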
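Corollary 5.3(i) can be checked by hand on a small example. The sketch below assumes a three-state chain with stationary probability vector $\pi = (1/4, 1/2, 1/4)$; the chain and the helper names `mean_hitting_times` and `block_mass` are illustrative and not part of the text. For each set $B$ it computes $\int_B \pi(dx)\, E_x S_B$, which should equal $\pi(E) = 1$ whatever $B$ is.

```python
# Sketch: finite-state check of Corollary 5.3(i); the chain and helper names are assumptions.

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
pi = [0.25, 0.5, 0.25]              # stationary distribution of P


def mean_hitting_times(P, B, iters=5000):
    """E_x S_B for every state x, where S_B = inf{n >= 1 : X_n in B}."""
    n = len(P)
    outside = [y for y in range(n) if y not in B]
    t = [0.0] * n                   # t[y] approximates E_y S_B for y outside B
    for _ in range(iters):
        t_new = t[:]
        for y in outside:
            t_new[y] = 1.0 + sum(P[y][w] * t[w] for w in outside)
        t = t_new
    # one further step gives E_x S_B for all x, including x in B
    return [1.0 + sum(P[x][w] * t[w] for w in outside) for x in range(n)]


def block_mass(P, pi, B):
    """Integral over B of pi(dx) E_x S_B."""
    ES = mean_hitting_times(P, B)
    return sum(pi[x] * ES[x] for x in B)


masses = [block_mass(P, pi, B) for B in ({0}, {1}, {0, 1}, {2})]
```

For a singleton $B = \{x\}$ the identity reduces to $\pi(x)\, E_x S_x = 1$, which is the formula for the stationary distribution in Example 5.1(a).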
The following proposition gives a criterion for positive recurrence in terms of a 'drift condition':
Proposition 5.10. Let $B \in \mathcal{E}$ be arbitrary. Suppose that for some function $g \in \mathcal{E}_+$ and constant $\gamma > 0$:
\[ E[g(X_{n+1}) - g(X_n) \mid X_n = x] = Pg(x) - g(x) \le -\gamma \quad \text{for all } x \in B^c. \tag{5.8} \]
Then
\[ E_x S_B \le \gamma^{-1} P I_{B^c} g(x) + 1 \quad \text{for all } x \in E \]
and
\[ E_x S_B \le \gamma^{-1} g(x) \quad \text{for all } x \in B^c. \]
In particular, if $C \in \mathcal{E}$ is small, and
\[ \sup_{x \in C^c} E[g(X_{n+1}) - g(X_n) \mid X_n = x] < 0 \]
and
\[ \sup_{x \in C} E[g(X_{n+1});\ X_{n+1} \in C^c \mid X_n = x] = \sup_{x \in C} P I_{C^c} g(x) < \infty, \]
then $(X_n)$ is positive recurrent.

Proof. The hypothesis (5.8) can be written in the form
\[ g \ge I_{B^c} P g + \gamma 1_{B^c}. \]
By Theorem 3.1(iii) this implies
\[ g(x) \ge \gamma\, G_B 1_{B^c}(x) = \gamma\, 1_{B^c}(x)\, E_x S_B. \]
'Multiplying' this from the left by $P I_{B^c}$ we get
\[ P I_{B^c} g(x) \ge \gamma\, U_B 1_{B^c}(x) = \gamma (E_x S_B - 1). \]
The rest follows from Corollary 5.3(ii). □
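The bound $E_x S_B \le \gamma^{-1} g(x)$ on $B^c$ can be illustrated on a finite birth-death chain. This is a sketch under assumptions of our own choosing: the state space $\{0, \ldots, 10\}$, the step probabilities $p = 0.6$ (down) and $q = 0.4$ (up), reflection at both ends, $B = \{0\}$ and $g(x) = x$ are not taken from the text.

```python
# Sketch: a birth-death chain on {0, ..., 10}; p, q and the helpers are illustrative assumptions.

p, q, top = 0.6, 0.4, 10            # step down w.p. p, step up w.p. q


def step_probs(x):
    """Transition probabilities out of state x (reflection at 0 and at top)."""
    if x == 0:
        return {0: p, 1: q}
    if x == top:
        return {top - 1: p, top: q}
    return {x - 1: p, x + 1: q}


def g(x):
    """Test function of the drift condition."""
    return float(x)


gamma = p - q

# Drift condition (5.8) with B = {0}: Pg(x) - g(x) <= -gamma for every x != 0.
drift = [sum(pr * g(y) for y, pr in step_probs(x).items()) - g(x)
         for x in range(1, top + 1)]

# E_x S_B for x outside B, by iterating t = 1 + (P restricted to B^c) t from t = 0.
t = [0.0] * (top + 1)
for _ in range(20000):
    t = [0.0] + [1.0 + sum(pr * t[y] for y, pr in step_probs(x).items() if y != 0)
                 for x in range(1, top + 1)]

bound_holds = all(t[x] <= g(x) / gamma + 1e-9 for x in range(1, top + 1))
```

The same choice $g(x) = x$ is the one suggested in the hint to Example 5.2(d) for the reflected random walk.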
Example 5.2. (d) (The reflected random walk). Suppose that $E z_1 < 0$. Then $(W_n)$ is positive recurrent. (Hint: Choose $g(x) = x$, and $C = [0, c]$, $c$ sufficiently large, in Proposition 5.10.)

Our next object of study is the sums over the sequence $(X_0, \ldots, X_{S_B})$. The following concepts and results will be used later in Chapters 6 and 7 in the investigation of the asymptotic behaviour of the $n$-step transition probabilities $P^n$.

We write $\mathcal{L}^1_+(\pi)$ for the set of $\pi$-integrable functions $f \in \mathcal{E}_+$. Let $B \in \mathcal{E}^+$ and $f \in \mathcal{L}^1_+(\pi)$ be arbitrary. It follows from Proposition 5.9 that $U_B f(x) = E_x \sum_{n=1}^{S_B} f(X_n)$ is finite for $\pi$-almost all $x \in B$. According to the following proposition $U_B f$ is finite even on a full set (i.e. $\pi$-a.e.).

Proposition 5.11. For any $B \in \mathcal{E}^+$, $f \in \mathcal{L}^1_+(\pi)$: $U_B f$ is finite $\pi$-almost everywhere.

Proof. Since $U_B f$ is finite $\pi$-a.e. on $B$, and
\[ G_B f = f + I_{B^c} U_B f, \]
it is sufficient to show that $G_B f$ is finite $\pi$-a.e. By Proposition 2.5(ii) there is a closed set $F$ such that $U_B f$ is finite on $B \cap F$. We claim that the set $F' = \{G_B f < \infty\} \cap F$ is closed. If $x \in B \cap F' \subseteq B \cap F$ then
\[ \infty > U_B f(x) = P G_B f(x), \]
whence $P(x, (F')^c) = 0$. If $x \in B^c \cap F'$ then
\[ \infty > G_B f(x) \ge P G_B f(x), \]
whence again $P(x, (F')^c) = 0$. It follows that $F'$ is closed. By Proposition 2.5(i) it is full. □

Setting $f \equiv 1$ we obtain the following:
Corollary 5.4. If $(X_n)$ is positive recurrent then for all $B \in \mathcal{E}^+$, $E_x S_B$ is finite for $\pi$-almost all $x \in E$. □

Let us fix a non-negative $\pi$-integrable function $f \in \mathcal{L}^1_+(\pi)$. The full set $\{U_B f < \infty\}$ may of course depend on $B$. We set the following:

Definition 5.4. A state $x \in E$ is called $f$-regular if $f(x)$ and $U_B f(x)$ are finite for all $B \in \mathcal{E}^+$. More generally, a finite measure $\lambda \in b\mathcal{M}^+$ is called $f$-regular if $\lambda(f)$ and $\lambda U_B f$ are finite for all $B \in \mathcal{E}^+$. A subset $D \subseteq E$ is called $f$-regular if $f$ and $U_B f$ are bounded on $D$ for all $B \in \mathcal{E}^+$. A bounded $\pi$-integrable function $g \in b\mathcal{L}^1(\pi)$ is called special if the whole state space $E$ is $|g|$-regular, i.e., if $U_B |g|$ is bounded for all $B \in \mathcal{E}^+$. A set $D \in \mathcal{E}$ with $\pi(D) < \infty$ is called special if $1_D$ is a special function. When $f \equiv 1$ we say simply 'regular' instead of '1-regular'. We denote by $R_f$ the set of $f$-regular states $x \in E$.
It is by no means trivial that there should exist any $f$-regular states. However, we shall show that there exists even an increasing sequence $D_1 \subseteq D_2 \subseteq \cdots$ of $f$-regular sets such that their union $\bigcup_1^\infty D_i$ is full. In order to avoid unessential technicalities we assume for the rest of this section that $(X_n)$ is aperiodic. Recall the definition of the kernel
\[ G_{m_0,s,\nu} = \sum_{n=0}^{\infty} (P^{m_0} - s \otimes \nu)^n, \]
and the probabilistic interpretation of the kernel $G_{s,\nu} = G_{1,s,\nu}$:
\[ G_{s,\nu} f(x) = E_x \sum_{n=0}^{T_\alpha} f(X_n). \]
Let us denote
\[ \tilde G_{m_0,s,\nu} = G_{m_0,s,\nu} (I + P + \cdots + P^{m_0 - 1}) = \sum_{n=0}^{\infty} (P^{m_0} - s \otimes \nu)^n (I + P + \cdots + P^{m_0 - 1}), \]
noting that $\tilde G_{1,s,\nu} = G_{s,\nu}$.

Proposition 5.12. $\tilde G_{m_0,s,\nu} f$ is finite $\pi$-almost everywhere.
Proof. If $m_0 = 1$ then the result follows from Proposition 5.11 by applying it to the split chain $(X_n, Y_n)$ with $B = \alpha = E \times \{1\}$. By considering the $m_0$-step chain $(X_{n m_0})$ and the function $(I + P + \cdots + P^{m_0 - 1}) f$ instead of $f$ we can conclude that the result holds true also in the case $m_0 \ge 2$. □

Proposition 5.13. (i) A set $D \subseteq E$ is $f$-regular if and only if $\tilde G_{m_0,s,\nu} f$ is bounded on $D$.
(ii) In particular, the set of $f$-regular states $R_f$ is equal to the full set $\{\tilde G_{m_0,s,\nu} f < \infty\}$. There exists an increasing sequence $D_1 \subseteq D_2 \subseteq \cdots$ of $f$-regular sets, $D_i = \{\tilde G_{m_0,s,\nu} f \le i\}$, $i \ge 1$, for example, such that $\bigcup_1^\infty D_i = R_f$.
(iii) A function $g \in b\mathcal{L}^1(\pi)$ is special if and only if $\tilde G_{m_0,s,\nu} |g|$ is bounded. In particular, every small function $s$ is special, and there is an increasing sequence $D_1 \subseteq D_2 \subseteq \cdots$ of special sets such that $\bigcup_1^\infty D_i = E$.
(iv) A finite measure $\lambda$ is $f$-regular if and only if $\lambda \tilde G_{m_0,s,\nu} f$ is finite.

Proof. Suppose first that $m_0 = 1$, i.e., $(s, \nu)$ is an atom. Since $S_B \le T_\alpha + S_B \circ \theta_{T_\alpha}$ we obtain
\[ f(x) + U_B f(x) = E_x \sum_{n=0}^{S_B} f(X_n) \le E_x \sum_{n=0}^{T_\alpha} f(X_n) + E_\alpha \sum_{n=1}^{S_B} f(X_n). \]
The first term on the right hand side equals $G_{s,\nu} f(x)$. By applying Proposition 5.11 to the split chain, we see that the second term is finite. Hence, if $\tilde G_{1,s,\nu} f = G_{s,\nu} f$ is bounded on $D$ then $f$ and $U_B f$ are bounded on $D$; i.e., $D$ is $f$-regular.

Conversely, if $D$ is $f$-regular, we take a set $D' \in \mathcal{E}^+$ such that $G_{s,\nu} f$ is bounded on $D'$ (cf. Proposition 5.12) and use the inequality $T_\alpha \le S_{D'} + T_\alpha \circ \theta_{S_{D'}}$ to prove that $G_{s,\nu} f$ is bounded on $D$. Thus the proof of (i) is completed in the case where $m_0 = 1$.

Note that (ii) and (iii) are immediate consequences of (i). (For (iii) recall also Proposition 2.11(iv).) The proof of (iv) is similar to that of (i).

In order to prove the case $m_0 \ge 2$ we need three lemmas:
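The first lemma gives a simple criterion in terms of 'geometric trials' for the finiteness of a stopping time.

Lemma 5.1. Let $(\mathcal{F}_n;\ n \ge 0)$ be a history and let $\tau$ be a stopping time relative to it. Suppose that there is a constant $\gamma > 0$ such that
\[ P\{\tau = n \mid \mathcal{F}_{n-1}\} \ge \gamma \quad \text{on } \{\tau \ge n\}, \ \text{for all } n \ge 1. \]
Then $\tau$ is finite (a.s.) and $E[\tau \mid \mathcal{F}_0] \le \gamma^{-1} < \infty$.

Proof. By the hypotheses
\[ P\{\tau \ge n+1 \mid \mathcal{F}_0\} = E\big[ P\{\tau \ne n \mid \mathcal{F}_{n-1}\};\ \tau \ge n \,\big|\, \mathcal{F}_0 \big] \le (1 - \gamma)\, P\{\tau \ge n \mid \mathcal{F}_0\} \quad \text{for all } n \ge 1. \]
Consequently,
\[ P\{\tau \ge n \mid \mathcal{F}_0\} \le (1 - \gamma)^{n-1} \quad \text{for all } n \ge 1, \]
from which the results follow. □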
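The 'geometric trials' bound of Lemma 5.1 can be seen in the simplest situation, where the conditional success probabilities are deterministic. The sketch below is only an illustration under that assumption; the hazard values and the function names are hypothetical. It uses $E\tau = \sum_{n \ge 1} P\{\tau \ge n\}$ and compares the result with $\gamma^{-1}$.

```python
# Sketch: deterministic hazard rates chosen for illustration.

def expected_stopping_time(hazard, horizon=10000):
    """E[tau] = sum_{n >= 1} P{tau >= n} when P{tau = n | tau >= n} = hazard(n)."""
    survival = 1.0                  # P{tau >= n}, starting with n = 1
    total = 0.0
    for n in range(1, horizon + 1):
        total += survival
        survival *= 1.0 - hazard(n)
    return total


gamma = 0.25


def hazard(n):
    """Success probability of the n-th trial; never smaller than gamma."""
    return (0.25, 0.5, 0.9)[n % 3]


mean_tau = expected_stopping_time(hazard)
bound = 1.0 / gamma
```

Here the survival probabilities repeat with period three, so the mean can also be summed in closed form as $1.55 / (1 - 0.0375)$, well below the bound $\gamma^{-1} = 4$.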
The second lemma can be regarded as a generalization of the classical Wald lemma. Wald's lemma states that, if $\xi_0, \xi_1, \ldots$ are i.i.d. random variables with common finite mean $M$, and $\tau$ is a stopping time relative to the internal history $(\mathcal{F}_n^\xi)$ and with finite mean $E\tau$, then the expectation
\[ E \sum_{n=0}^{\tau} \xi_n = M(1 + E\tau) \]
is finite.

Lemma 5.2. Suppose that $(\xi_n;\ n \ge 0)$ is a non-negative stochastic process, adapted to a history $(\mathcal{F}_n;\ n \ge 0)$. Let $\tau$ be a stopping time relative to $(\mathcal{F}_n)$ and let
\[ M_0 = E\xi_0, \quad M = \sup_{n \ge 1} \{ \operatorname{ess\,sup} E[\xi_n \mid \mathcal{F}_{n-1}] \}. \]
Then
\[ E \sum_{n=0}^{\tau} \xi_n = E\xi_0 + E\Big[ \sum_{n=1}^{\tau} E[\xi_n \mid \mathcal{F}_{n-1}];\ \tau \ge 1 \Big] \le M_0 + M\, E\tau. \]

Proof. Write
\[ E \sum_{n=0}^{\tau} \xi_n = E\xi_0 + \sum_{n=1}^{\infty} E[\xi_n;\ \tau \ge n] \]
and condition w.r.t. the $\sigma$-algebras $\mathcal{F}_{n-1}$. □
The last lemma also completes the proof of Proposition 5.13:

Lemma 5.3. A set $D \subseteq E$ (or a measure $\lambda$) is $f$-regular if and only if it is $(I + \cdots + P^{m_0 - 1}) f$-regular for the $m_0$-step chain $(X_{n m_0};\ n \ge 0)$.

Proof. Set $m = m_0$ and define
\[ {}^m S_B = \inf\{n \ge 1 : X_{nm} \in B\}. \]
By using the Markov property at $nm$ we easily get the following identity: for any $x \in E$, $B \in \mathcal{E}^+$,
\[ E_x \sum_{n=0}^{{}^m S_B} (f + \cdots + P^{m-1} f)(X_{mn}) = E_x \sum_{n=0}^{m\,{}^m S_B + m - 1} f(X_n). \tag{5.9} \]
If $D \subseteq E$ is $(f + \cdots + P^{m-1} f)$-regular for $(X_{nm})$, it follows from (5.9) and the inequality $S_B \le m\,{}^m S_B$ that $D$ is $f$-regular for $(X_n)$.

To prove the converse, suppose that $D \subseteq E$ is $f$-regular. Proposition 5.13 applied to the $m$-step chain states that there exists a set $C \in \mathcal{E}^+$ which is $(f + \cdots + P^{m-1} f)$-regular for $(X_{nm})$. (Note that the $m$-step chain possesses the atom $(s, \nu)$ and we have already proved Proposition 5.13 for Markov chains having an atom.) By the first part of the present proof, $C$ is $f$-regular for $(X_n)$. By Proposition 2.6 we may suppose that $C$ is also a small set. This and aperiodicity imply that there is an integer $n_0 \ge 2m$ and constant $\gamma > 0$ such that (keeping $B \in \mathcal{E}^+$ fixed)
\[ P^n(y, B) \ge \gamma \quad \text{for all } y \in C,\ n_0 - 2m < n \le n_0 - m. \tag{5.10} \]
Define a sequence $(\eta(i);\ i \ge 0)$ of stopping times by setting $\eta(0) = T_C$, and
\[ \eta(i) = \inf\{n \ge \eta(i-1) + n_0 : X_n \in C\}, \quad i \ge 1. \]
Since $(X_n)$ is Harris recurrent by our basic hypothesis, $\eta(i)$ is finite $P_x$-a.s. for all $x \in E$, $i \ge 1$. For every $i \ge 0$, set $\mathcal{F}_i = \mathcal{F}^X_{\eta(i)}$, and
\[ \xi_0 = \sum_{n=0}^{\eta(0)} f(X_n) = \sum_{n=0}^{T_C} f(X_n), \quad \xi_i = \sum_{n=\eta(i-1)+1}^{\eta(i)} f(X_n), \ i \ge 1. \]
Let $\sigma(i)$ be the unique integer satisfying
\[ \eta(i-1) + n_0 - 2m < \sigma(i)\, m \le \eta(i-1) + n_0 - m. \]
Define a stopping time $\tau$ relative to the history $(\mathcal{F}_i)$ by
\[ \tau = \inf\{i \ge 1 : X_{\sigma(i) m} \in B\}. \]
It follows from (5.10) that
\[ P\{\tau = i \mid \mathcal{F}_{i-1}\} \ge \gamma \quad \text{on } \{\tau \ge i\}, \ \text{for all } i \ge 1. \]
Since $D$ is $f$-regular,
\[ M_0 \overset{\text{def}}{=} \sup_{x \in D} E_x \xi_0 = \sup_{x \in D} G_C f(x) < \infty. \]
Applying the strong Markov property at $\eta(i-1)$, $i \ge 1$, we obtain for all $x \in E$,
\[ E_x[\xi_i \mid \mathcal{F}_{i-1}] \le \sup_{y \in C} E_y \sum_{n=1}^{S'_C} f(X_n), \]
where $S'_C = \inf\{n \ge n_0 : X_n \in C\} \le S_C(n_0)$. Consequently, by using the strong Markov property $n_0 - 1$ times,
\[ E_x[\xi_i \mid \mathcal{F}_{i-1}] \le \sup_{y \in C} E_y \sum_{n=1}^{S_C(n_0)} f(X_n) \le n_0 \sup_{y \in C} U_C f(y) = M, \ \text{say}. \]
$M$ is finite, because $C$ is $f$-regular. By applying Lemmas 5.1 and 5.2 and using the inequality $m\,{}^m S_B + m \le \eta(\tau)$, we obtain
\[ \sup_{x \in D} E_x \sum_{n=0}^{m\,{}^m S_B + m - 1} f(X_n) \le \sup_{x \in D} E_x \sum_{i=0}^{\tau} \xi_i \le M_0 + M \gamma^{-1}. \]
That $D$ is $(f + \cdots + P^{m-1} f)$-regular for $(X_{nm})$ now follows from the identity (5.9). The result for $\lambda$ is proved similarly. □

Let $A \in \mathcal{E}^+$ be an $f$-regular set, and let $D \subseteq E$ be arbitrary. By using the inequality $S_B \le T_A + S_B \circ \theta_{T_A}$, we see that, if $G_A f(x) = E_x \sum_{n=0}^{T_A} f(X_n)$ is bounded on $D$ then also $D$ is $f$-regular. Since by Proposition 5.13 any set $B_0 \in \mathcal{E}^+$ contains an $f$-regular set we obtain the following criterion for $f$-regularity:

Proposition 5.14. (i) A set $D \subseteq E$ is $f$-regular if (and only if) for some $f$-regular set $A \in \mathcal{E}^+$:
\[ \sup_D G_A f < \infty. \tag{5.11} \]
(ii) A set $D \subseteq E$ is $f$-regular, if (and only if) for some set $B_0 \in \mathcal{E}^+$, (5.11) holds for all $A \in \mathcal{E}^+ \cap B_0$. □

A similar result holds true for measures. For special functions we have:

Corollary 5.5. (i) A $\pi$-integrable function $g \in \mathcal{L}^1(\pi)$ is special if (and only if) for some $|g|$-regular set $A \in \mathcal{E}^+$:
\[ \sup_E G_A |g| < \infty. \tag{5.12} \]
(ii) A $\pi$-integrable function $g \in \mathcal{L}^1(\pi)$ is special if (and only if) for some set $B_0 \in \mathcal{E}^+$, (5.12) holds for all $A \in \mathcal{E}^+ \cap B_0$.

Examples 5.3. (a) Let $(X_n)$ be a positive Harris recurrent, discrete Markov chain with stationary distribution $\pi$ (cf. Example 5.1(a)). The set $R_1$ of regular states is equal to $R_1 = \{x \in E : E_x S_{x_0} < \infty\}$ $(\supseteq E_\pi)$, where $x_0 \in E_\pi$ is arbitrary.
(d) For a positive recurrent reflected random walk (see Example 5.2(d)) every state $x \in \mathbb{R}_+$ is regular. More generally, every bounded set $D \subset \mathbb{R}_+$ is regular.
(e) (The forward process). Suppose that the Markov chain $(V^+_{n\delta};\ n \ge 0)$ is positive recurrent (cf. Example 5.1(e)). Then every bounded set $D \subset \mathbb{R}_+$ is regular. An initial distribution $F_0$ is regular if and only if the mean $\int t\, F_0(dt)$ is finite.
5.4 Recurrence of degree 2

Let $(X_n)$ be a Harris recurrent Markov chain. It follows from Corollary 5.3(i) that positive recurrent chains can be characterized as those recurrent chains for which some (or equivalently, all) of the expectations $\int_B \pi(dx)\, E_x S_B$, $B \in \mathcal{E}^+$, is (are) finite. Consideration of the second moments of the hitting times $S_B$ leads to the following:

Definition 5.5. The Markov chain $(X_n)$ is called recurrent of degree 2, if
\[ \int_B \pi(dx)\, E_x S_B^2 < \infty \quad \text{for all } B \in \mathcal{E}^+. \]

Note that, by Corollary 5.3(i), recurrence of degree 2 implies positive recurrence. Below we will see that recurrence of degree 2 can be alternatively characterized as the regularity of the stationary distribution $\pi$. In order to prove this we need to consider the following identity:

Lemma 5.4. For all $B \in \mathcal{E}^+$:
\[ E_x\big[ \tfrac12 S_B (S_B + 1) \big] = U_B G_B 1(x) \quad \text{for all } x \in E. \tag{5.13} \]

Proof. Since
\[ \tfrac12 S_B (S_B + 1) = \sum_{n=0}^{S_B} (S_B - n), \]
and $S_B = n + T_B \circ \theta_n$ on $\{S_B \ge n\}$ for all $n \ge 1$, the left hand side of (5.13) is equal to
\[ E_x S_B + E_x \sum_{n=1}^{S_B} T_B \circ \theta_n = E_x S_B + E_x \sum_{n=1}^{S_B} E_{X_n} T_B \quad (\text{by conditioning w.r.t. } \mathcal{F}_n) \]
\[ = U_B 1(x) + U_B (G_B 1 - 1)(x) = U_B G_B 1(x). \ \text{□} \]
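Identity (5.13) lends itself to a direct numerical check on a finite chain. The sketch below makes its own assumptions: a three-state transition matrix, $B = \{0\}$, and helper names `G_B` and `U_B` that mirror the kernels $G_B = \sum_k (I_{B^c} P)^k$ and $U_B = P G_B$. The right hand side is computed from the kernels, the left hand side from the law of $S_B$.

```python
# Sketch: finite-state check of (5.13); the chain, B and helper names are assumptions.

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
B = {0}
n = len(P)
outside = [y for y in range(n) if y not in B]


def G_B(f, iters=5000):
    """G_B f = sum_k (I_{B^c} P)^k f, i.e. E_x of the sum of f(X_k) for 0 <= k <= T_B."""
    u = f[:]
    for _ in range(iters):
        u = [f[x] + (sum(P[x][y] * u[y] for y in range(n)) if x in outside else 0.0)
             for x in range(n)]
    return u


def U_B(f):
    """U_B f = P G_B f, i.e. E_x of the sum of f(X_k) for 1 <= k <= S_B."""
    u = G_B(f)
    return [sum(P[x][y] * u[y] for y in range(n)) for x in range(n)]


rhs = U_B(G_B([1.0] * n))           # U_B G_B 1

# Left hand side from the law of S_B: P_x{S_B = k} = (P I_{B^c})^(k-1) P 1_B (x).
lhs = [0.0] * n
alive = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]
for k in range(1, 3000):
    for x in range(n):
        prob = sum(alive[x][y] * P[y][b] for y in range(n) for b in B)
        lhs[x] += 0.5 * k * (k + 1) * prob
    # advance alive from (P I_{B^c})^(k-1) to (P I_{B^c})^k
    alive = [[sum(alive[x][w] * P[w][y] for w in range(n)) if y in outside else 0.0
              for y in range(n)] for x in range(n)]
```

For this chain both sides equal 24 at the state $x = 0$, which corresponds to $E_0 S_0 = 4$ and $E_0 S_0^2 = 44$.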
Proposition 5.15. For all $B \in \mathcal{E}^+$:
\[ E_{\pi I_B}[S_B^2] = 2 E_\pi T_B + 1 = 2 E_\pi S_B - 1. \]

Proof. Integrate both sides of (5.13) over $B$ w.r.t. the measure $\pi$, and use Proposition 5.9. □

For the rest of this section we assume that $(X_n)$ is aperiodic. By using Propositions 5.13, 5.14 and 5.15 we get the following characterizations for recurrence of degree 2:

Proposition 5.16. Each of the following four conditions is equivalent to recurrence of degree 2:
(i) The invariant probability measure $\pi$ is regular, i.e., $E_\pi S_B$ is finite for all $B \in \mathcal{E}^+$.
(ii) $\pi \tilde G_{m_0,s,\nu}(E)$ is finite.
(iii) $E_{\pi I_D}[S_D^2]$ is finite for some regular set $D \in \mathcal{E}^+$.
(iv) There is a set $B_0 \in \mathcal{E}^+$ such that $E_{\pi I_A}[S_A^2]$ is finite for all $A \in \mathcal{E}^+ \cap B_0$.

For later purposes we introduce the concept of regularity of degree 2:

Definition 5.6. A state $x \in E$ (resp. a measure $\lambda \in b\mathcal{M}^+$, a subset $D \subseteq E$) is called regular of degree 2, if $E_x[S_B^2]$ (resp. $E_\lambda[S_B^2]$, $\sup_{y \in D} E_y[S_B^2]$) is finite for all $B \in \mathcal{E}^+$.
According to the following proposition regularity of degree 2 has various equivalent characterizations. We formulate the result only for states, the result for measures and sets being similar.
Proposition 5.17. Let $x \in E$ be an arbitrary state. Each of the following six conditions is equivalent to $x$ being regular of degree 2:
(i) $E_x[S_D^2]$ is finite for some $D \in \mathcal{E}^+$, $D$ regular of degree 2.
(ii) The measure $U_B(x, \cdot)$ is regular for all $B \in \mathcal{E}^+$.
(iii) The state $x$ is $G_B 1$-regular for all $B \in \mathcal{E}^+$.
(iv) The measure $\tilde G_{m_0,s,\nu}(x, \cdot)$ is regular.
(v) The state $x$ is $\tilde G_{m_0,s,\nu} 1$-regular.
(vi) $\tilde G^2_{m_0,s,\nu}(x, E) < \infty$.

Proof. That (i) implies the regularity of degree 2 of $x$ follows immediately from the inequality
\[ S_B \le S_D + S_B \circ \theta_{S_D}, \quad B \in \mathcal{E}^+. \]
If $x$ is regular of degree 2, then by Lemma 5.4
\[ U_B G_B 1(x) < \infty \quad \text{for all } B \in \mathcal{E}^+. \]
Since $U_B \ge U_A$ and $G_B \ge G_A$ whenever $B \subseteq A$, it follows that
\[ U_B G_A 1(x) < \infty \ \text{ and } \ U_A G_B 1(x) < \infty \quad \text{for all } A \supseteq B \in \mathcal{E}^+. \]
By Proposition 5.14(ii) these imply (ii) and (iii). If, conversely, (iii) holds, then in particular
\[ U_B G_B 1(x) < \infty \quad \text{for all } B \in \mathcal{E}^+. \]
But, by Lemma 5.4, this means just that $x$ is regular of degree 2. Now, by Proposition 5.13, (iii) is equivalent to the condition
\[ \tilde G_{m_0,s,\nu} G_B 1(x) < \infty \quad \text{for all } B \in \mathcal{E}^+, \]
which by Proposition 5.14 is equivalent to (iv). Similarly, (ii) and (v) are equivalent. Finally, it also follows from Proposition 5.13 that (iv), (v) and (vi) are equivalent. □

By the following proposition, if $(X_n)$ is recurrent of degree 2 then $\pi$-almost all states $x \in E$ are regular of degree 2.

Proposition 5.18. Suppose that the Markov chain $(X_n)$ is recurrent of degree 2. Then the set $R^{(2)} \overset{\text{def}}{=} \{x \in E : x \text{ regular of degree 2}\}$ is equal to the full set $\{\tilde G^2_{m_0,s,\nu} 1 < \infty\}$. There is an increasing sequence $D_1 \subseteq D_2 \subseteq \cdots$ of sets $D_i \in \mathcal{E}$ such that each $D_i$ is regular of degree 2 and their union equals $\bigcup_1^\infty D_i = R^{(2)}$.

Proof. By Proposition 5.13 $\tilde G_{m_0,s,\nu} 1$ is $\pi$-integrable. It follows from Proposition 5.12 that the set $\{\tilde G^2_{m_0,s,\nu} 1 < \infty\}$ is full. By criterion (vi) of Proposition 5.17 this set is equal to $R^{(2)}$. Clearly, $D_i = \{\tilde G^2_{m_0,s,\nu} 1 \le i\}$, $i \ge 1$, form the desired sequence of regular sets (of degree 2). □

Examples 5.4. (a) Let $(X_n)$ be a Harris recurrent, discrete Markov chain. It is recurrent of degree 2 if and only if the expectation $E_x[S_x^2]$ is finite for some $x \in E$. Then $R^{(2)} = \{x \in E : E_x S_{x_0}^2 < \infty\}$, $x_0 \in E_\pi$ arbitrary. We have $E_\pi \subseteq R^{(2)}$.
(d) The reflected random walk $(W_n)$ is recurrent of degree 2 if and only if $E_0[S_0^2]$ is finite. In this case every state $x \in \mathbb{R}_+$ is regular of degree 2.
(e) (The forward process). Assume that $F$ is spread-out. Then the Markov chain $(V^+_{n\delta})$ is recurrent of degree 2 if and only if $E[z_1^2] = \int t^2 F(dt)$ is finite. Then every state $x \in \mathbb{R}_+$ is regular of degree 2.
5.5 Geometric recurrence

Suppose that $(X_n)$ is a Harris recurrent Markov chain.

Definition 5.7. (i) The Markov chain $(X_n)$ is called geometrically recurrent if for some small set $C \in \mathcal{S}^+$, some constant $r > 1$:
\[ \sup_{x \in C} E_x[r^{S_C}] < \infty. \tag{5.14} \]
(ii) A state $x \in E$ (resp. a measure $\lambda \in b\mathcal{M}^+$, a subset $D \subseteq E$) is called geometrically regular if for all $B \in \mathcal{E}^+$ there exists a constant $r > 1$, depending on $x$ (resp. $\lambda$, $D$) and on $B$, such that $E_x[r^{S_B}]$ (resp. $E_\lambda[r^{S_B}]$, $\sup_{y \in D} E_y[r^{S_B}]$) is finite.

Note that geometric recurrence implies positive recurrence. If there exists a geometrically regular set $D \in \mathcal{E}^+$, then $(X_n)$ is geometrically recurrent (see Proposition 2.6). The following proposition is the main result of this section.

Proposition 5.19. Suppose that the Markov chain $(X_n)$ is geometrically recurrent. Then:
(i) The stationary distribution $\pi$ is geometrically regular.
(ii) The small set $C \in \mathcal{S}^+$ satisfying (5.14) is geometrically regular. There exists an increasing sequence $D_1 \subseteq D_2 \subseteq \cdots$ of geometrically regular sets $D_i \in \mathcal{E}$, such that their union $\bigcup_1^\infty D_i$ is full. In particular, $\pi$-almost every state $x \in E$ is geometrically regular.
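In the proof we need a couple of lemmas. Let us write
\[ f_B^{(r)}(x) = E_x[r^{S_B}], \quad x \in E,\ r \ge 1,\ B \in \mathcal{E}. \]
Recall the definition of the potential kernel
\[ G'_B(x, A) = E_x \sum_{n=0}^{S_B - 1} 1_A(X_n) = \sum_{n=0}^{\infty} (P I_{B^c})^n (x, A). \]

Lemma 5.5. (i) For all $r \ge 1$, $B \in \mathcal{E}$:
\[ f_B^{(r)} = 1 + (1 - r^{-1})\, G'_B f_B^{(r)}. \tag{5.15} \]
(ii) For all $B \in \mathcal{E}^+$:
\[ E_{\pi I_B}[r^{S_B}] = \pi(B) + (1 - r^{-1})\, E_\pi[r^{S_B}]. \]
(iii) If there exists a geometrically regular set $D \in \mathcal{E}^+$ then $\pi$ is geometrically regular.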
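Identity (5.15) can also be verified numerically in a finite example. The sketch below assumes a three-state chain, $B = \{0\}$ and $r = 1.1$ (small enough for $E_x[r^{S_B}]$ to be finite for this particular chain); these choices and the helper name `taboo_powers` are illustrative only.

```python
# Sketch: finite-state check of (5.15); the chain, B, r and the helper are assumptions.

P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
B = {0}
r = 1.1
n = len(P)


def taboo_powers(count):
    """Yield (P I_{B^c})^k for k = 0, 1, ..., count - 1."""
    M = [[1.0 if x == y else 0.0 for y in range(n)] for x in range(n)]
    for _ in range(count):
        yield M
        M = [[0.0 if y in B else sum(M[x][w] * P[w][y] for w in range(n))
              for y in range(n)] for x in range(n)]


# f(x) = E_x[r^{S_B}] = sum_{k >= 1} r^k P_x{S_B = k}
f = [0.0] * n
power = 1.0
for M in taboo_powers(2000):
    power *= r
    for x in range(n):
        f[x] += power * sum(M[x][y] * P[y][b] for y in range(n) for b in B)

# G'_B f = sum_{k >= 0} (P I_{B^c})^k f
Gf = [0.0] * n
for M in taboo_powers(2000):
    for x in range(n):
        Gf[x] += sum(M[x][y] * f[y] for y in range(n))

rhs = [1.0 + (1.0 - 1.0 / r) * v for v in Gf]
```

The vectors `f` and `rhs` hold the two sides of (5.15) and agree up to truncation error.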
Proof. (i) We have
\[ (1 - r^{-1})^{-1} \big( f_B^{(r)}(x) - 1 \big) = E_x \sum_{n=1}^{S_B} r^n = E_x \sum_{n=0}^{S_B - 1} r^{S_B - n} = E_x \sum_{n=0}^{S_B - 1} E_{X_n}\big[ r^{S_B} \big] = G'_B f_B^{(r)}(x), \]
where the third equality follows by conditioning w.r.t. $\mathcal{F}_n^X$ and using the fact that $S_B - n = S_B \circ \theta_n$ on $\{S_B > n\}$ for $n \ge 0$.

(ii) Integrate both sides of the identity (5.15) over $B$ w.r.t. $\pi$ and use Proposition 5.9.

(iii) By (ii), and since $S_B \le S_D + S_B \circ \theta_{S_D}$, we have for all $B \in \mathcal{E}^+$:
\[ E_\pi[r^{S_B}] \le E_\pi\big[ r^{S_D + S_B \circ \theta_{S_D}} \big] \le (1 - r^{-1})^{-1} \sup_{x \in D} E_x[r^{S_D}]\, \sup_{y \in D} E_y[r^{S_B}]. \]
Since $D$ is geometrically regular, the right hand side is finite for sufficiently small $r > 1$. □

The second lemma is a general result concerning the growth of the expectations of random sums (cf. Lemmas 5.1 and 5.2):
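Lemma 5.6. Suppose that $(\xi_n;\ n \ge 0)$ is a non-negative stochastic process, adapted to a history $(\mathcal{F}_n;\ n \ge 0)$. Let $\tau$ be a stopping time relative to $(\mathcal{F}_n)$. Suppose that for some constant $\gamma > 0$,
\[ P\{\tau = n \mid \mathcal{F}_{n-1}\} \ge \gamma \quad \text{on } \{\tau \ge n\} \ \text{for all } n \ge 1, \tag{5.16} \]
and that for some constants $r_0 > 1$ and $M_0 < \infty$,
\[ E[r_0^{\xi_n} \mid \mathcal{F}_{n-1}] \le M_0 \quad \text{for all } n \ge 1. \tag{5.17} \]
Then there exist constants $r_1 > 1$ and $M_1 < \infty$ such that
\[ E\big[ r^{\sum_0^\tau \xi_n} \big] \le M_1\, E[r^{\xi_0}] \quad \text{for all } 1 \le r \le r_1. \]

Proof. It follows from (5.17) that
\[ P\{\xi_n \ge m \mid \mathcal{F}_{n-1}\} \le M_0 r_0^{-m} \quad \text{for all } m \ge 0,\ n \ge 1. \]
Put $f(r) = 1 + (r - 1) \sum_{m=1}^{\infty} r^{m-1} q_m$, where $q_m = \min\{M_0 r_0^{-m}, 1\}$. Then $f(r)$ is a probability generating function satisfying
\[ E[r^{\xi_n} \mid \mathcal{F}_{n-1}] \le f(r) \quad \text{for all } n \ge 1,\ 1 \le r < r_0. \tag{5.18} \]
Consequently we have on $\{\tau \ge 1\}$ for all $1 \le r < r_0$ (using the notation $E^{(n)}$ for the conditional expectation $E[\,\cdot \mid \mathcal{F}_n]$):
\[ E^{(0)}\big[ r^{\sum_1^\tau \xi_n} \big] = \sum_{N=1}^{\infty} E^{(0)}\big[ r^{\sum_1^N \xi_n};\ \tau = N \big] \le f(r) \sum_{N=1}^{\infty} E^{(0)}\big[ r^{\sum_1^{N-1} \xi_n};\ \tau \ge N \big] \tag{5.19} \]
by conditioning w.r.t. $\mathcal{F}_{N-1}$. Further, by using the Schwarz inequality, (5.16) and (5.18), we obtain for all $N \ge 2$ and $1 \le r \le r_0^{1/2}$:
\[ E^{(0)}\big[ r^{\sum_1^{N-1} \xi_n};\ \tau \ge N \big] = E^{(0)}\Big[ E^{(N-2)}\big[ r^{\xi_{N-1}};\ \tau \ne N-1 \big]\, r^{\sum_1^{N-2} \xi_n};\ \tau \ge N-1 \Big] \]
\[ \le (1 - \gamma)^{1/2} \big( f(r^2) \big)^{1/2}\, E^{(0)}\big[ r^{\sum_1^{N-2} \xi_n};\ \tau \ge N-1 \big]. \]
Hence by induction
\[ E^{(0)}\big[ r^{\sum_1^{N-1} \xi_n};\ \tau \ge N \big] \le (1 - \gamma)^{(N-1)/2} \big( f(r^2) \big)^{(N-1)/2} \quad \text{for all } N \ge 2. \]
Substituting this into (5.19) gives
\[ E^{(0)}\big[ r^{\sum_1^\tau \xi_n} \big] \le f(r) \sum_{N=1}^{\infty} (1 - \gamma)^{(N-1)/2} \big( f(r^2) \big)^{(N-1)/2} \quad \text{on } \{\tau \ge 1\}. \]
Since $f(r) \downarrow 1$ as $r \downarrow 1$, the right hand side of this inequality is finite, say $M_1$, for some $r_1 > 1$. The proof is completed after observing that
\[ E\big[ r^{\sum_0^\tau \xi_n} \big] = E[r^{\xi_0};\ \tau = 0] + E\Big[ r^{\xi_0};\ \tau \ge 1;\ E^{(0)}\big[ r^{\sum_1^\tau \xi_n} \big] \Big] \le M_1\, E[r^{\xi_0}] \quad \text{for all } 1 \le r \le r_1. \ \text{□} \]

Proof of Proposition 5.19. First we shall show that the small set $C$ is geometrically regular. Let $B \in \mathcal{E}^+$ be arbitrary. Since $C$ is small there exists an integer $n_0 \ge 1$ such that
\[ \gamma = \inf_{y \in C} P^{n_0}(y, B) > 0 \]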
(cf. Proposition 2.7(i)). Set $\eta(0) = T_C$, $\eta(i) = \inf\{n \ge \eta(i-1) + n_0 : X_n \in C\}$ for $i \ge 1$, $\mathcal{F}_i = \mathcal{F}^X_{\eta(i)}$, $\xi_0 = \eta(0)$, $\xi_i = \eta(i) - \eta(i-1)$ for $i \ge 1$, and define a stopping time $\tau$ relative to $(\mathcal{F}_i)$ by
\[ \tau = \inf\{i \ge 1 : X_{\eta(i-1) + n_0} \in B\}. \]
By using Lemma 5.6 we can conclude that
\[ \sup_{x \in C} E_x r^{S_B} < \infty \quad \text{for some } r > 1; \]
i.e., $C$ is geometrically regular.

Now, by Lemma 5.5(iii), $\pi$ is geometrically regular. In particular, $E_\pi r_0^{S_C}$ is finite for some $r_0 > 1$. Set
\[ D_i = \{x \in E : E_x r_0^{S_C} \le i\}, \quad i \ge 1. \]
Then $\bigcup_1^\infty D_i$ is full. By using the inequality $S_B \le S_C + S_B \circ \theta_{S_C}$ we get for all $i \ge 1$, $x \in D_i$, $B \in \mathcal{E}^+$, $1 < r \le r_0$:
\[ E_x r^{S_B} \le i \sup_{y \in C} E_y r^{S_B}. \]
The right hand side is finite for sufficiently small $r > 1$, because $C$ is geometrically regular. Thus each $D_i$ is geometrically regular. □

Suppose that $P$ satisfies the minorization condition $M(m_0, 1, s, \nu)$ for
some $m_0 \ge 1$, $s \in \mathcal{S}^+$, $\nu \in \mathcal{M}^+$. In the following proposition geometric recurrence is characterized in terms of the potential kernel $G^{(r)}_{m_0,s,\nu} = \sum_{n=0}^{\infty} r^{n m_0} (P^{m_0} - s \otimes \nu)^n$. In order to avoid unessential technicalities we consider the aperiodic case only.

Proposition 5.20. Suppose that $(X_n)$ is aperiodic. Then $(X_n)$ is geometrically recurrent if and only if $\nu G^{(r)}_{m_0,s,\nu} s$ is finite for some $r > 1$.

Remark 5.1. If $m_0 = 1$, i.e., $(s, \nu)$ is an atom, we have $r \nu G^{(r)}_{1,s,\nu} s = r \nu G^{(r)}_{s,\nu} s = E_\alpha r^{S_\alpha}$ (see (4.20)).

In the proof of Proposition 5.20 we need the following:

Lemma 5.7. If $(X_n)$ is aperiodic and geometrically recurrent then so is the $m_0$-step chain $(X_{n m_0})$.

Proof. Let $C \in \mathcal{S}^+$ be a small set satisfying (5.14). Set $B = C$, $f \equiv 1$, and define the history $(\mathcal{F}_i)$, the stochastic process $(\xi_i)$ and the stopping time $\tau$ as in the proof of Lemma 5.3. By using Lemma 5.6 we obtain the desired result
\[ \sup_{x \in C} E_x\big[ r^{\,{}^{m_0} S_C} \big] < \infty. \ \text{□} \]

Proof of Proposition 5.20. Suppose that $(X_n)$ is geometrically recurrent. By Lemma 5.7 there is no loss of generality in assuming that $m_0 = 1$. Now it is easy to see (using Lemma 5.6 in an obvious manner) that in this case the split chain $(X_n, Y_n)$, too, is geometrically recurrent. Now the finiteness of $E_\alpha r^{S_\alpha}$ for some $r > 1$ follows from Proposition 5.19.

The converse result is as easy. Since, trivially, the geometric recurrence of $(X_{n m_0})$ implies that of $(X_n)$, we can again consider only the case $m_0 = 1$. By Remark 5.1 the condition $\nu G^{(r)}_{s,\nu} s < \infty$ implies that the split chain $(X_n, Y_n)$ is geometrically recurrent. By applying Proposition 5.19 we see that there exists a geometrically regular set $D \in \mathcal{E}^+$ (for $(X_n)$). But this implies that $(X_n)$ is geometrically recurrent. □

The following proposition gives a characterization for geometric recurrence in terms of a drift condition (cf. Proposition 5.10):

Proposition 5.21. Suppose that for some small set $C \in \mathcal{S}^+$, function $g \in \mathcal{E}_+$ and constant $r > 1$:
\[ \sup_{x \in C^c} E[r g(X_{n+1}) - g(X_n) \mid X_n = x] = \sup_{C^c} (r P g - g) < 0 \tag{5.20} \]
and
\[ \sup_{x \in C} E[g(X_{n+1});\ X_{n+1} \in C^c \mid X_n = x] = \sup_{C} P I_{C^c} g < \infty. \]
Then the Markov chain $(X_n)$ is geometrically recurrent.
Proof. From the inequality (5.20) we can derive, precisely in the same manner as we did in the proof of Proposition 5.10, the following:
\[ g(x) \ge \gamma\, 1_{C^c}(x)\, E_x \sum_{n=0}^{S_C - 1} r^n, \quad \gamma > 0 \text{ a constant}. \]
Hence by our hypothesis
\[ \infty > \sup_{C} P I_{C^c} g \ge \gamma \sup_{x \in C} E_x\Big[ \sum_{n=0}^{S_C - 2} r^n;\ S_C \ge 2 \Big], \]
from which the assertion easily follows. □

Examples 5.5. (a) Let $(X_n)$ be a Harris recurrent, discrete Markov chain. It is geometrically recurrent if and only if $E_x[r^{S_x}]$ is finite for some state $x \in E$ and some constant $r = r(x) > 1$. In this case for all $x \in E_\pi$, $E_x[r_0^{S_x}]$ is finite for some constant $r_0 = r_0(x) > 1$.
(d) The reflected random walk $(W_n)$ is geometrically recurrent if $E z_1 < 0$ and $E e^{\gamma z_1} < \infty$ for some $\gamma > 0$. (Hint: Set $g(x) = e^{\varepsilon x}$, $C = [0, c]$, for sufficiently small $\varepsilon > 0$ and big $0 < c < \infty$, in Proposition 5.21.)
(e) (The forward process). Suppose that $F$ is spread out. The Markov chain $(V^+_{n\delta})$ is geometrically recurrent if and only if $E e^{\gamma z_1} = \int e^{\gamma t} F(dt)$ is finite for some constant $\gamma > 0$.
(f) The autoregressive process $R_n = \rho R_{n-1} + z_n$, $n \ge 1$, with $|\rho| < 1$, $F = \mathcal{L}(z_1)$ non-singular, and $\int |t|\, F(dt) = E|z_1|$ finite, is geometrically recurrent. (Hint: Set $g(x) = |x|$, $C = [-c, c]$, $c$ big enough, in Proposition 5.21.)
5.6 Uniform recurrence
The strongest form of recurrence we consider is uniform recurrence. We start with a brief discussion on special sets. Let $(X_n)$ be an irreducible Markov chain. In Definition 5.4 we introduced the concepts of a special function and set for Harris recurrent $(X_n)$. Let us extend this definition to arbitrary irreducible $(X_n)$. So, for example, a set $D \in \mathcal{E}^+$ is called special, if
    $\sup_{E} U_B 1_D = \sup_{x \in E} E_x \sum_{n=1}^{S_B} 1_D(X_n) < \infty$ for all $B \in \mathcal{E}^+$.    (5.21)
It turns out that special sets are 'test sets' for recurrence and positive recurrence (cf. also Theorem 3.7(vii) and Corollary 5.3(ii)):

Proposition 5.22. Suppose that there exists a special set $D \in \mathcal{E}^+$ satisfying
    $U_D(x, D) = P_x\{S_D < \infty\} = 1$ for all $x \in D$.    (5.22)
Then the Markov chain $(X_n)$ is recurrent. It is Harris recurrent on the absorbing set $D^\infty = \{h_D = 1\} = \{x \in E: P_x\{T_D < \infty\} = 1\}$. If, in addition,
    $\sup_{x \in D} E_x S_D < \infty$    (5.23)
then $(X_n)$ is positive recurrent.
Proof. From the hypothesis (5.22) it follows that $X_n \in D$ i.o. $P_x$-a.s. for all $x \in D$. If it were true that $P_x\{S_B = \infty\} > 0$ for some $x \in D$, $B \in \mathcal{E}^+$, then this would lead to a contradiction with (5.21). Consequently, $(X_n)$ is Harris recurrent on $D$.
To see that (5.23) implies positive recurrence, take a small set $C \in \mathcal{S}^+$ with $C \subseteq D$. From the resolvent equation (3.18) it follows that
    $\sup_{D} U_C 1 = \sup_{D} (U_D 1 + U_C I_{D \setminus C} U_D 1) \le (1 + \sup_{D} U_C 1_D) \sup_{D} U_D 1 < \infty$.
By Corollary 5.3(ii), $(X_n)$ is positive recurrent. □
We state the following:
Definition 5.8. An irreducible Markov chain $(X_n)$ is called uniformly recurrent if
    $\sup_{x \in E} E_x S_B < \infty$ for all $B \in \mathcal{E}^+$.
Note that uniform recurrence implies Harris recurrence; then, in particular, the transition probability P is stochastic. Note also that uniform recurrence coincides with the requirements that P be stochastic and the state space E be special. Other characterizations are given in the following:
Proposition 5.23. Suppose that $P$ is stochastic and irreducible. Each of the following three conditions is equivalent to uniform recurrence:
(i) $(X_n)$ is uniformly geometrically recurrent in the sense that for all $B \in \mathcal{E}^+$ there exists a constant $r = r(B) > 1$ such that $\sup_{x \in E} E_x[r^{S_B}] < \infty$.
(ii) $\sup_{x \in E} E_x S_D < \infty$ for some special set $D \in \mathcal{E}^+$.
(iii) $\inf_{x \in E} \sum_{n=1}^{N} P^n(x, D) > 0$ for some integer $N \ge 1$, some special $D \in \mathcal{E}^+$.

Proof. Clearly (i) implies uniform recurrence. Conversely, if $(X_n)$ is uniformly recurrent then for any $B \in \mathcal{E}^+$,
    $\sup_{x \in E} P_x\{S_B > N\} < 1$    (5.24)
for $N$ large enough. It is easy to see that this implies (i).
By Proposition 5.22, if (ii) holds then $(X_n)$ is positive Harris recurrent. By Proposition 5.14(i), in order to prove that $(X_n)$ is uniformly recurrent it suffices to show that $E_x S_B = U_B 1(x)$ is bounded for some regular set $B \in \mathcal{E}^+ \cap D$. By (3.18)
    $\sup_{E} U_B 1 = \sup_{E} (U_D 1 + U_B I_{D \setminus B} U_D 1) \le (1 + \sup_{E} U_B 1_D) \sup_{E} U_D 1 < \infty$.
That (iii) follows from uniform recurrence is a direct consequence of (5.24). Conversely, if (iii) holds then there is a constant $\delta > 0$ and a finite partition $E = \bigcup_{n=1}^{N} E_n$ of $E$ such that for all $1 \le n \le N$ and $x \in E_n$, $P^n(x, D) \ge \delta$. Hence
    $\sup_{x \in E} P_x\{S_D > N\} \le 1 - \delta$,
implying (ii). □
Using (ii) and Proposition 5.13(ii) we see that uniform recurrence is equivalent to the condition
    $\sup_{E} E_x S_B < \infty$ for all $B \in \mathcal{E}^+ \cap B_0$, for some $B_0 \in \mathcal{E}^+$.
The 'drift criterion' now takes the following form:
Proposition 5.24. Suppose that $P$ is stochastic and irreducible, and that for some bounded, non-negative function $g \in \mathcal{E}_+$, for some special (in particular, some small) set $D \in \mathcal{E}^+$:
    $\sup_{D^c} (Pg - g) < 0$.
Then the Markov chain $(X_n)$ is uniformly recurrent.
Proof. Use Propositions 5.10 and 5.23(ii). □
Examples 5.6. (a) Let $(X_n)$ be a discrete Markov chain. It is uniformly recurrent if and only if $\sup_{x \in E} E_x S_y$ is finite for some $y \in E$. Then this quantity is finite for all $y \in E$. Every irreducible Markov chain on a finite state space $E$ and with a stochastic transition matrix $P$ is uniformly recurrent.
(e) (The forward process). The Markov chain $(V^+_{n\delta})$ is uniformly recurrent if and only if $M = \operatorname{ess\,sup} z_1 = \sup\{t: F(t) < 1\}$ is finite.
(f) (The autoregressive process). Suppose that $|\rho| < 1$. Also suppose that $F$ is non-singular and there is a finite interval $[a, b]$, $-\infty$
(i) The storage chain $(S_n)$ introduced in Example 4.1(i) is uniformly recurrent.
We shall introduce still one more example:
(k) Let $(X_n)$ be an irreducible and aperiodic Markov chain with convergence parameter $R = 1$. Suppose that for some integers $n_0, l \ge 1$, constant $M < \infty$, small functions $s_1, \ldots, s_l \in \mathcal{S}^+$ and small measures $\nu_1, \ldots, \nu_l \in \mathcal{M}^+$ the following majorization condition holds:
    $P^{n_0} \le M \sum_{i,j=1}^{l} s_i \otimes \nu_j$.
Then $(X_n)$ is recurrent. It is uniformly recurrent on every Harris set $H$. (For hints for the proof see the forthcoming Example 5.7(k).)
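For a finite state space, the criterion of Example 5.6(a) can be made concrete: the expected return times $m(x) = E_x S_y$ solve the linear system $m(x) = 1 + \sum_{z \ne y} P(x, z) m(z)$, so their supremum is simply a maximum over finitely many finite numbers. The sketch below solves this system exactly; the two-state matrix `P` and the helper name `expected_return_times` are illustrative assumptions, not taken from the text.

```python
from fractions import Fraction

def expected_return_times(P, y):
    """Solve m(x) = 1 + sum_{z != y} P[x][z] * m(z), i.e. m(x) = E_x S_y."""
    n = len(P)
    # Augmented matrix of (I - Q) m = 1, where Q is P with column y set to zero.
    A = [[(Fraction(1) if i == j else Fraction(0))
          - (Fraction(0) if j == y else Fraction(P[i][j]))
          for j in range(n)] + [Fraction(1)] for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination with exact arithmetic
        pivot = next(i for i in range(col, n) if A[i][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for i in range(n):
            if i != col and A[i][col] != 0:
                factor = A[i][col]
                A[i] = [vi - factor * vc for vi, vc in zip(A[i], A[col])]
    return [A[i][n] for i in range(n)]

# Hypothetical irreducible stochastic matrix on E = {0, 1}.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
m = expected_return_times(P, y=0)
sup_m = max(m)
```

Here $E_0 S_0 = 3$ agrees with the reciprocal of the stationary probability of state 0, which is $1/3$ for this matrix.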
5.7 Degrees of R-recurrence
In this section we shall extend some of the concepts of the previous sections to general kernels. Our analysis will not, however, be as thorough as in the case of a transition probability $P$. Instead, we shall only briefly discuss the extensions of the concepts of geometric and uniform recurrence.
We assume that $K$ is an $R$-recurrent kernel satisfying the minorization condition $M(m_0, 1, s, \nu)$. We let $\pi$ denote an $R$-invariant measure for $K$. We assume that $h = h_\nu$ is the unique minimal $R$-invariant function satisfying $\nu(h) = 1$. Then by Proposition 5.4, the transformed kernel
    $\tilde K = R I_{h^{-1}} K I_h$
is the transition probability of a Harris recurrent Markov chain $(\tilde X_n)$ on $\tilde E = \{h < \infty\}$. $(\tilde X_n)$ is positive recurrent if and only if $K$ is $R$-positive recurrent.
For any $B \in \mathcal{E}$, $r \ge 0$, let $U_B^{(r)}$ denote the kernel defined by
    $U_B^{(r)} = \sum_{n=1}^{\infty} r^n K (I_{B^c} K)^{n-1}$,
and let $\tilde U_B^{(r)}$ denote the kernel
    $\tilde U_B^{(r)} = \sum_{n=1}^{\infty} r^n \tilde K (I_{B^c} \tilde K)^{n-1}$.
Clearly,
    $\tilde U_B^{(r)} = I_{h^{-1}} U_B^{(rR)} I_h$ for all $r \ge 0$.
Hence
    $\tilde E_x[r^{\tilde S_B}] = \tilde U_B^{(r)}(x, B) = (h(x))^{-1} U_B^{(rR)} I_h (x, B)$.    (5.25)
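For a finite, strictly positive matrix the transformation $\tilde K = R I_{h^{-1}} K I_h$ can be carried out explicitly: the convergence parameter is the reciprocal of the Perron root, and $h$ is a right Perron eigenvector. The sketch below is an illustration only; the matrix `K`, the use of plain power iteration, and the helper names are assumptions made for the example, and $h$ is normalized by its maximum rather than by $\nu(h) = 1$ (the transformed kernel does not depend on the normalization).

```python
def power_iteration(K, steps=200):
    """Approximate the Perron root and a right eigenvector of a positive matrix."""
    n = len(K)
    h = [1.0] * n
    lam = 1.0
    for _ in range(steps):
        Kh = [sum(K[i][j] * h[j] for j in range(n)) for i in range(n)]
        lam = max(Kh)              # h has maximum 1, so max(K h) estimates the root
        h = [v / lam for v in Kh]
    return lam, h

def h_transform(K, R, h):
    """Entries R * K(x, y) * h(y) / h(x) of the transformed kernel."""
    n = len(K)
    return [[R * K[i][j] * h[j] / h[i] for j in range(n)] for i in range(n)]

K = [[1.0, 2.0], [3.0, 2.0]]       # hypothetical positive matrix, Perron root 4
lam, h = power_iteration(K)
R = 1.0 / lam                      # convergence parameter of K
K_tilde = h_transform(K, R, h)
row_sums = [sum(row) for row in K_tilde]
```

The row sums of `K_tilde` equal 1 up to rounding, reflecting that $R$-invariance of $h$ makes the transformed kernel stochastic.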
We state the following:
Definition 5.9. An $R$-recurrent kernel $K$ is called geometrically $R$-recurrent, if for some small set $C \in \mathcal{S}^+$ such that $\inf_C h > 0$ and $\sup_C h < \infty$, and for some constant $r > R$:
    $\sup_{C} U_C^{(r)} 1_C < \infty$.
If $h$ is bounded on a small set $C \in \mathcal{S}^+$, then $C$ is small for the transformed kernel $\tilde K$, too. From (5.25), Definition 5.7 and Proposition 5.19 we easily get the following:

Proposition 5.25. The kernel $K$ is geometrically $R$-recurrent if and only if the Markov chain $(\tilde X_n)$ is geometrically recurrent. □
As a direct consequence of Proposition 5.20 and the above result we have:
Corollary 5.6. Suppose that $K$ is aperiodic. Then $K$ is geometrically $R$-recurrent if and only if $\nu G^{(r)}_{m_0,s,\nu} s$ is finite for some $r > R$. □
Let $G_B^{(r)}$ denote the potential kernel of $r I_{B^c} K$, i.e.,
    $G_B^{(r)} = \sum_{n=0}^{\infty} r^n (I_{B^c} K)^n$.
We shall call a $\pi$-integrable function $g \in \mathcal{L}^1(\pi)$ special, if for every $B \in \mathcal{E}^+$ there is a constant $M = M(B) < \infty$ such that
    $G_B^{(R)} |g| \le M h$.    (5.26)
Since (using obvious notation)
    $\tilde G_B = I_{h^{-1}} G_B^{(R)} I_h$,
we obtain the following result (see Corollary 5.5(ii)):

Proposition 5.26. (i) A function $g \in \mathcal{L}^1(\pi)$ is special for $K$ if and only if the function $\tilde g = h^{-1} g$ is special for $(\tilde X_n)$.
(ii) A function $g \in \mathcal{L}^1(\pi)$ is special for $K$ if (and only if) there is a set $B_0 \in \mathcal{E}^+$ such that (5.26) holds for all $B \in \mathcal{E}^+ \cap B_0$. □
We state the following:

Definition 5.10. An $R$-recurrent kernel $K$ is uniformly $R$-recurrent if the $R$-invariant function $h$ is special; i.e. if for every $B \in \mathcal{E}^+$ there is a constant $M = M(B) < \infty$ such that
    $G_B^{(R)} h \le M h$.    (5.27)
As an immediate consequence of Propositions 5.23 and 5.26 we have (see also the remark made after Proposition 5.23):
Corollary 5.7. Each of the following three conditions is equivalent to uniform $R$-recurrence:
(i) The Markov chain $(\tilde X_n)$ is uniformly recurrent.
(ii) For some $B_0 \in \mathcal{E}^+$: the inequality (5.27) holds for all $B \in \mathcal{E}^+ \cap B_0$.
(iii) For some $B_0 \in \mathcal{E}^+$: for all $B \in \mathcal{E}^+ \cap B_0$ there exist an integer $N = N(B)$ and a constant $\gamma = \gamma(B) > 0$ such that
    $\sum_{n=1}^{N} K^n I_B h \ge \gamma h$ on $\{h < \infty\}$. □
Remark 5.2. In fact, for Corollary 5.7 we need not assume that $K$ is $R$-recurrent, but only that $K$ is irreducible and has an $R$-invariant function $h$ on some closed set $F$. This holds true since then the transformed kernel $\tilde K = R I_{h^{-1}} K I_h$ is stochastic on the closed set $F \cap \{h < \infty\}$ (cf. Proposition 5.23).

Examples 5.7. (a) Let $K$ be a $\mathrm{Card}_F$-irreducible matrix, $\mathrm{Card}_F$ a maximal irreducibility measure. Fix a state $z \in F$ and set
    $b_n = K (I_{\{z\}^c} K)^{n-1} (z, z)$, $n \ge 1$,
(see Example 4.1(a)). The kernel $K$ is geometrically $R$-recurrent if and only if
    $\hat b(r) = \sum_{n=1}^{\infty} r^n b_n < \infty$ for some $r > R$.
Suppose that $K$ has an $R$-invariant column vector $h = (h(x); x \in E)$. $K$ is uniformly $R$-recurrent if for some $z \in F$, $N \ge 1$, $\gamma > 0$:
    $\sum_{n=1}^{N} k^n(x, z) \ge \gamma h(x)$ whenever $h(x) < \infty$.
In particular, every irreducible (non-negative) matrix on a finite state space is uniformly $R$-recurrent.
(k) Let $K$ be an irreducible aperiodic kernel. Suppose that for some integers $n_0, l \ge 1$, constant $M < \infty$, small functions $s_1, \ldots, s_l \in \mathcal{S}^+$ and small measures $\nu_1, \ldots, \nu_l \in \mathcal{M}^+$ the following majorization condition holds:
    $K^{n_0} \le M \sum_{i,j=1}^{l} s_i \otimes \nu_j$.
Then $K$ is uniformly $R$-recurrent.
Hints for the proof: First note that, by Proposition 2.10(iii), we may suppose that $K^{n_0} \le M s \otimes \nu$ for some single small function $s$ and single small measure $\nu$ satisfying the minorization condition $M(m_0, 1, s, \nu)$. By considering suitable iterates of $K$ we can conclude that there is no loss of generality in assuming that $m_0 = n_0$. Further, since the '$R$-properties' of $K$ are inherited by $K^{m_0}$ as the corresponding '$R^{m_0}$-properties' we may assume that $m_0 = n_0 = 1$; i.e., finally we are led to the simple hypothesis
    $s \otimes \nu \le K \le M s \otimes \nu$.
Now
    $R = \sup\{r \ge 0: \sum_{n=0}^{\infty} r^{n+1} \nu (K - s \otimes \nu)^n s \le 1\} = \sup\{r \ge 0: \sum_{n=0}^{\infty} r^n \nu K^n s < \infty\}$.
Since by our hypothesis
    $K - s \otimes \nu \le \rho K$, where $\rho = 1 - M^{-1} < 1$,
it follows that the series $r \nu G^{(r)}_{s,\nu} s = \sum_{n=0}^{\infty} r^{n+1} \nu (K - s \otimes \nu)^n s$ converges for $r = \rho^{-1/2} R > R$; by Corollary 5.6 this implies that $K$ is geometrically $R$-recurrent.
Let $h = h_\nu$ be the unique minimal $R$-invariant function. Note that $h > 0$ everywhere. We have to show that $h$ is special. By our hypothesis
    $h = R K h \le R M \nu(h) s$.
Hence $h$ is small, and by Propositions 5.13(iii) and 5.26, special.
Note that the Markov chain of Example 5.6(k) is uniformly 1-recurrent.
6 Total variation limit theorems
In this chapter we shall study for an $R$-recurrent kernel $K$ the convergence of the iterates $K^n$ as $n \to \infty$. The main result here is Orey's convergence theorem. Formulated for an aperiodic Harris recurrent Markov chain with transition probability $P$, this theorem states that the $n$-step transition probabilities $P^n(x, \cdot)$ converge in total variation norm: for any two initial distributions $\lambda$ and $\mu$,
    $\lim_{n \to \infty} |\lambda P^n - \mu P^n|(E) = 0$.    (6.1)
For a general aperiodic $R$-recurrent kernel $K$, having the (essentially unique) minimal $R$-invariant function $h$, Orey's theorem takes the following form: for any two measures $\lambda, \mu \in \mathcal{M}^+$ such that $\lambda(h) = \mu(h) < \infty$,
    $\lim_{n \to \infty} R^n |\lambda K^n - \mu K^n|(h) = 0$.    (6.2)
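The statement (6.1) is easy to observe on a small finite chain. The following sketch tracks $|\lambda P^n - \mu P^n|(E)$ for two point masses; the three-state matrix `P` is a hypothetical aperiodic, irreducible example chosen for illustration, not one from the text.

```python
def step(dist, P):
    """One step of the chain: the row vector dist multiplied by the matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def total_variation(lam, mu):
    """|lam - mu|(E): the total mass of the signed measure lam - mu."""
    return sum(abs(a - b) for a, b in zip(lam, mu))

# Hypothetical aperiodic, irreducible stochastic matrix on three states.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
lam = [1.0, 0.0, 0.0]   # initial distribution concentrated at state 0
mu = [0.0, 0.0, 1.0]    # initial distribution concentrated at state 2
distances = []
for n in range(30):
    distances.append(total_variation(lam, mu))
    lam, mu = step(lam, P), step(mu, P)
```

For this matrix the distance halves at every step, so the convergence is in fact geometric; in general Orey's theorem asserts only convergence to zero.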
We shall also consider various sharpenings to Orey's theorem by considering the rate with which the norms in (6.1) and (6.2) tend to zero. We shall see that the rate of convergence in Orey's theorem is closely related to the degree of recurrence of the Markov chain $(X_n)$ (or of the kernel $K$).
The basic technique we shall use in the proofs is the regeneration method introduced in Chapter 4. Henceforth we assume that $m_0 \ge 1$, $s \in \mathcal{S}^+$ and $\nu \in \mathcal{M}^+$ are fixed such that the minorization condition $M(m_0, 1, s, \nu)$ holds. When $m_0 = 1$, $(X_n, Y_n)$ denotes the split chain induced by the atom $(s, \nu)$.
Since the regeneration method is based on the exploitation of the embedded renewal process, we start by studying renewal theory, i.e. the asymptotic behaviour of renewal sequences.
6.1 Renewal theory
We adopt the notation and terminology used in Section 4.1. Suppose that $(T(i); i \ge 0)$ is a recurrent renewal process with increment distribution $b$. If $a = (a_n; n \ge 0)$ denotes a delay distribution then the corresponding (delayed) renewal sequence $v = (v_n; n \ge 0)$ is given by $v = a*u$. Recall the definition of the sequence $B_n = \sum_{m=n+1}^{\infty} b_m$, $n \ge 0$. We have $u*B \equiv 1$ (see (4.6)).
The basic renewal theorem is:

Theorem 6.1. Suppose that $u = (u_n; n \ge 0)$ is an aperiodic, recurrent renewal sequence. Let $a = (a_n; n \ge 0)$ be an arbitrary delay distribution. Then
    $\lim_{n \to \infty} |a*u - u| * B_n = 0$.
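The objects in Theorem 6.1 can be computed exactly for a small example. The sketch below builds the undelayed renewal sequence from the renewal equation $u_0 = 1$, $u_n = \sum_{m=1}^{n} b_m u_{n-m}$, checks the identity $u*B \equiv 1$, and evaluates $|a*u - u| * B_n$. The increment law `b` (mass 1/2 at 1 and at 2) and the delay `a` (unit mass at 3) are hypothetical choices made only for illustration.

```python
from fractions import Fraction

def renewal_sequence(b, n_max):
    """u_0 = 1 and u_n = sum_{m=1}^{n} b_m u_{n-m} (the renewal equation)."""
    u = [Fraction(1)]
    for n in range(1, n_max + 1):
        u.append(sum(b.get(m, Fraction(0)) * u[n - m] for m in range(1, n + 1)))
    return u

def tail(b, n_max):
    """B_n = sum of b_m over m > n, for n = 0, ..., n_max."""
    return [sum(v for m, v in b.items() if m > n) for n in range(n_max + 1)]

def convolve(x, y):
    """(x * y)_n = sum_{m=0}^{n} x_m y_{n-m} for sequences of equal length."""
    return [sum(x[m] * y[n - m] for m in range(n + 1)) for n in range(len(x))]

n_max = 40
b = {1: Fraction(1, 2), 2: Fraction(1, 2)}   # hypothetical aperiodic increment law
a = [Fraction(0)] * (n_max + 1)
a[3] = Fraction(1)                            # hypothetical delay: T(0) = 3
u = renewal_sequence(b, n_max)
B = tail(b, n_max)
identity = convolve(u, B)                     # u * B, expected to be identically 1
gap = [abs(p - q) for p, q in zip(convolve(a, u), u)]
theorem_quantity = convolve(gap, B)           # |a*u - u| * B_n
```

For this increment law $u_n = 2/3 + (1/3)(-1/2)^n$, so `theorem_quantity` decays geometrically; the theorem itself guarantees only that it tends to zero.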
The proof of this theorem is based on the use of the so-called coupling technique. It involves the study of two renewal processes defined on the same probability space, and having the same increment distribution but different delays. Their joint distribution is constructed in such a manner that they eventually have a common renewal epoch (a.s.). This leads to an inequality, called the coupling inequality, from which the convergence result easily follows.
In proving that the coupling time is finite we need the following lemma. Its proof can be found e.g. in Feller (1971), Sect. VI.10, Theorem 4.

Lemma 6.1. Let $Z = (Z(i); i \ge 0)$ be an aperiodic, integer valued random walk; i.e.
    $Z(i) = Z(0) + \sum_{j=1}^{i} z(j)$, $i \ge 1$,
where $Z(0)$ is an integer valued random variable, and the increments $z(j)$ are i.i.d., independent of $Z(0)$, integer valued random variables having a common non-lattice distribution. Suppose that the increments have zero expectation, $E z(1) = 0$. Then the random walk $Z$ is an aperiodic and recurrent Markov chain on $E = \{\ldots, -1, 0, 1, 2, \ldots\}$. □

Proof of Theorem 6.1. Assume first that, instead of mere aperiodicity, the following stronger condition holds true:
    g.c.d.$\{n - m: m < n, b_m > 0, b_n > 0\} = 1$.
Let $N \ge 1$ be an integer so big that
    g.c.d.$\{n - m: m < n \le N, b_m > 0, b_n > 0\} = 1$.    (6.3)
Let $T = (T(i); i \ge 0)$, $T(0) = 0$, $T(i) = \sum_{j=1}^{i} t(j)$ for $i \ge 1$, be an undelayed renewal process with increment distribution $b$. We shall construct a sequence $T' = (T'(i); i \ge 0)$ of random times, $T'(i) = T'(0) + \sum_{j=1}^{i} t'(j)$ for $i \ge 1$, in the following manner: Set $T'(0) = 1$. For any $j \ge 1$, if the increment $t(j)$ is bigger than $N$, then we set $t'(j) = t(j)$, whereas if $t(j)$ is smaller than or equal to $N$, then we distribute $t'(j)$ independently of, and with the same (conditional) distribution as, $t(j)$. In exact terms, for every $j \ge 1$,
    $P\{t'(j) = t(j) \mid \mathcal{F}^{T}_{j} \vee \mathcal{F}^{T'}_{j-1}\} = 1$ on $\{t(j) > N\}$
and
    $\mathcal{L}(t'(j) \mid \mathcal{F}^{T}_{j} \vee \mathcal{F}^{T'}_{j-1}) = \mathcal{L}(t(j) \mid t(j) \le N) = (1 - B_N)^{-1} b(\cdot)$ on $\{t(j) \le N\}$.
It is clear that the marginal probability law of $T'$ is the law of a delayed renewal process with delay $T'(0) = 1$ and with increment distribution $b$. By (6.3) and by the above construction the random walk $Z$ defined by
    $Z(i) = T'(i) - T(i)$, $i \ge 0$,
has symmetric, bounded (by $N$), non-lattice increments $z(j) = t'(j) - t(j)$. Hence it satisfies the hypotheses of Lemma 6.1. It follows that the random time
    $\eta = \inf\{i \ge 1: Z(i) = 0\} = \inf\{i \ge 1: T'(i) = T(i)\}$
is finite almost surely.
Let $V = (V_n; n \ge 0)$ and $V' = (V'_n; n \ge 1)$ be the backward chains associated with the renewal processes $T$ and $T'$, respectively. (See (4.10). Note that $V'_0$ is not defined.) The initial states of the backward chains $V$ and $V'$ are $V_0 = 0$ and $V'_1 = 0$, respectively. Note that, since the increments of $T$ and $T'$ have the same distribution $b$, the Markov chains $V = (V_n; n \ge 0)$ and $(V'_{1+n}; n \ge 0)$ obey the same probability law.
Note that the random time $\tau = T(\eta) = T'(\eta)$ is a randomized stopping time for both the Markov chains $V$ and $V'$. Let $A \subseteq \mathbb{N}$ be arbitrary. By using the Markov property at $\tau = m$ and the fact that $V_\tau = V'_\tau = 0$, we obtain for any $n \ge 1$:
    $P\{V_n \in A\} = \sum_{m=1}^{n} P\{V_{n-m} \in A\} P\{\tau = m\} + P\{V_n \in A, \tau > n\}$,
    $P\{V'_n \in A\} = \sum_{m=1}^{n} P\{V_{n-m} \in A\} P\{\tau = m\} + P\{V'_n \in A, \tau > n\}$.
From these we get the coupling inequality
    $\sup_{A \subseteq \mathbb{N}} |P\{V_n \in A\} - P\{V'_n \in A\}| \le P\{\tau > n\}$.
Since $\tau$ is finite a.s., the right hand side tends to zero as $n \to \infty$. The left hand side is equal to
    $\frac{1}{2} \sum_{m=0}^{n} |P\{V_n = m\} - P\{V'_n = m\}| = \frac{1}{2} \sum_{m=0}^{n} |u_{n-m} B_m - u_{n-1-m} B_m| = \frac{1}{2} |u - \delta^{(1)}*u| * B_n$,
where $\delta^{(1)}$ is the probability distribution on $\mathbb{N}$ assigning unit mass to the integer 1. Consequently,
    $\lim_{n \to \infty} |u - \delta^{(1)}*u| * B_n = 0$.    (6.4)
Let us now drop the additional assumption (6.3) and assume only the aperiodicity, i.e. g.c.d.$\{n \ge 1: b_n > 0\} = 1$. Note that proceeding as above would lead to a possibly periodic random walk $Z$. In order to deal with this complication, we use the following trick: Let $0 < p < 1$ be a constant. We modify the increment distribution $b$ by setting $\tilde b_0 = p$, $\tilde b_n = (1 - p) b_n$ for $n \ge 1$ (or, briefly, $\tilde b = p \delta^{(0)} + (1 - p) b$). Let $(\tilde T(i); i \ge 0)$ be the associated renewal process. (Note that we allow now, exceptionally, an increment to be zero with positive probability; cf. (4.1).) It is easy to see that the above proof goes through for the modified process. Hence we get (6.4) for the modified renewal sequence $\tilde u$. But since
    $\tilde u = (1 - p)^{-1} u$ and $\tilde B = (1 - p) B$
we obtain (6.4) for the original renewal sequence $u$, too.
If we have an arbitrary distribution $a$ in place of $\delta^{(1)}$, we proceed as follows. For any $N \ge 0$, let $a^{(N)}$ denote the truncated sequence
    $a^{(N)}_n = a_n$ for $0 \le n \le N$, $= 0$ for $n > N$.
Write
    $A_N = \sum_{n=N+1}^{\infty} a_n$.
We have for all $n \ge 0$:
    $|a*u - u| * B_n \le |(1 - A_N) u - a^{(N)}*u| * B_n + A_N u*B_n + (a - a^{(N)})*u*B_n$.
By (6.4), for any fixed $N$, the first term on the right hand side tends to zero as $n \to \infty$. By (4.6), the second and third terms are both dominated by $A_N$, and they can therefore be made arbitrarily small by choosing $N$ big enough. □

Let $a = (a_n; n \ge 0)$ be an arbitrary non-negative sequence. Set
    $M_a^{(0)} = \sum_{n=0}^{\infty} a_n$, $M_a = \sum_{n=0}^{\infty} n a_n$.
Note that $M_b = E t(1)$ is finite or infinite depending on whether the renewal process $(T(i); i \ge 0)$ is positive or null recurrent.
From Theorem 6.1 we are able to deduce the following:
Theorem 6.2. Suppose that $u$ is an aperiodic, recurrent renewal sequence. Then
    $\lim_{n \to \infty} a*u_n = M_b^{-1} M_a^{(0)}$ unless $M_a^{(0)} = M_b = \infty$.
In particular, the limit
    $u_\infty = \lim_{n \to \infty} u_n = M_b^{-1}$
exists; $u_\infty > 0$ or $= 0$ depending on whether the renewal sequence $u$ is positive or null recurrent.
Proof. It follows from Theorem 6.1 in particular that
    $\lim_{n \to \infty} (a*u_n - M_a^{(0)} u_n) = 0$
if $M_a^{(0)}$ is finite. In the positive recurrent case $M_b = M_B^{(0)}$ is finite, and we can choose $a = B$ to get
    $\lim_{n \to \infty} (1 - M_b u_n) = 0$.
Thus we have proved the result in the case where $M_b$ and $M_a^{(0)}$ are finite. In the case $M_b < \infty$, $M_a^{(0)} = \infty$, the result easily follows by using a truncation argument.
So it remains to consider the null recurrent case; this is the case where $M_b = \infty$. Let $\varepsilon > 0$ be arbitrary and $N = N(\varepsilon) > 0$ be such that
    $\sum_{n=0}^{N} B_n \ge \varepsilon^{-1}$.
Since by (6.4),
    $\lim_{n \to \infty} (u_n - u_{n-1}) = 0$,
we have
    $\min_{0 \le m \le N} u(n - m) \ge u(n) - \varepsilon$
for $n$ big enough. By (4.6)
    $1 \ge \sum_{m=0}^{N} B(m) u(n - m) \ge \varepsilon^{-1} (u(n) - \varepsilon)$
for large $n$, implying
    $\lim_{n \to \infty} u_n = 0$.
By suitable truncation the final result follows:
    $\lim_{n \to \infty} a*u_n = 0$. □
Let now $u = (u_n; n \ge 0)$ be an arbitrary (possibly non-probabilistic) renewal sequence with convergence parameter $0 < R < \infty$. Applying Theorems 6.1 and 6.2 to the renewal sequence $\tilde u = (R^n u_n; n \ge 0)$ (see (4.8)) gives us the following corollary. We write $\tilde B_n = 1 - \sum_{m=1}^{n} R^m b_m$ ($= \sum_{m=n+1}^{\infty} R^m b_m$, whenever $u$ is $R$-recurrent).

Corollary 6.1. Let $u = (u_n; n \ge 0)$ be an aperiodic, $R$-recurrent renewal sequence. Then:
(i) For any non-negative sequence $a = (a_n; n \ge 0)$ such that $\hat a(R) = \sum_{n=0}^{\infty} R^n a_n$ is finite,
    $\lim_{n \to \infty} \sum_{m=0}^{n} R^{n-m} |a*u_{n-m} - \hat a(R) u_{n-m}| \tilde B_m = 0$.
(ii) There exists $\tilde u_\infty = \lim_{n \to \infty} R^n u_n = M_{\tilde b}^{-1}$. Moreover $\tilde u_\infty > 0$ or $= 0$ depending on whether the renewal sequence $u$ is $R$-positive or $R$-null recurrent. □

Let us return to the probabilistic case. We assume that $(T(i); i \ge 0)$ is an aperiodic, recurrent renewal process. $u = (u_n; n \ge 0)$ is the corresponding undelayed renewal sequence. If we want to emphasize a specific delay distribution $a = (a_n; n \ge 0)$, we write $P_a$ instead of $P$ for the underlying probability measure, i.e. $P_a\{T(0) = n\} = a_n$, $n \ge 0$.
We shall study the asymptotic behaviour of the sums
    $\sum_{n=0}^{N} a*u_n = a*u*1_N$
as $N \to \infty$. In probabilistic terms the above sum is equal to the expectation of the number of renewals in the interval $[0, N]$ if the delay distribution is $a$,
    $\sum_{n=0}^{N} a*u_n = E_a \sum_{n=0}^{N} 1_{\{Y_n = 1\}}$;
by recurrence it tends to $\infty$ as $N \to \infty$.
We denote
    $M_a^{(2)} = \sum_{n=0}^{\infty} n^2 a_n$, $A_n = \sum_{m=n+1}^{\infty} a_m = 1 - a*1_n$, $n \ge 0$.

Theorem 6.3. With the assumptions of Theorem 6.1:
(i) We have
    $\lim_{N \to \infty} \sum_{n=0}^{N} (a*u_n - u_n) = -M_b^{-1} M_a$, unless $M_a = M_b = \infty$.    (6.5)
(ii) If $M_b$ is finite, then
    $\lim_{N \to \infty} \left[ \sum_{n=0}^{N} a*u_n - (N + 1) M_b^{-1} \right] = -M_b^{-1} M_a + \frac{1}{2} M_b^{-2} (M_b^{(2)} - M_b)$    (6.6)
unless $M_a = M_b^{(2)} = \infty$, and in particular,
    $\lim_{N \to \infty} \left[ \sum_{n=0}^{N} u_n - (N + 1) M_b^{-1} \right] = \frac{1}{2} M_b^{-2} (M_b^{(2)} - M_b)$.    (6.7)
Proof. This theorem is in fact a corollary of Theorem 6.2. Namely, we can write the term on the left hand side of (6.5) in the form
    $\sum_{n=0}^{N} (a*u_n - u_n) = a*u*1_N - u*1_N = -A*u_N$.
By Theorem 6.2 $A*u_N$ tends to the limit
    $M_b^{-1} M_A^{(0)} = M_b^{-1} M_a$ as $N \to \infty$,
unless $M_b = \infty$ and $M_A^{(0)} = M_a = \infty$.
Setting $a$ to be equal to the equilibrium distribution $e$,
    $a = e = M_b^{-1} B$,
and using (4.6) we obtain
    $\lim_{N \to \infty} \left( \sum_{n=0}^{N} u_n - (N + 1) M_b^{-1} \right) = \lim_{N \to \infty} \sum_{n=0}^{N} (u_n - e*u_n) = M_b^{-1} M_e$.
Now it is easy to see that
    $M_e = \frac{1}{2} M_b^{-1} (M_b^{(2)} - M_b)$.
This proves (6.7). Clearly, (6.6) is a direct consequence of (6.5) and (6.7). □

According to Theorem 6.3 the series
    $\sum_{n=0}^{\infty} (a*u_n - u_n)$
converges provided that $M_a < \infty$, and the series
    $\sum_{n=0}^{\infty} (u_n - M_b^{-1})$
converges provided that $M_b^{(2)} < \infty$. The following theorem deals with the absolute convergence of these series.
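The limits (6.5) and (6.7) can be verified exactly on a small example. The sketch below is an illustration under assumed data: the increment law `b` puts mass 1/2 on each of 1 and 2, and the delay in (6.5) is a unit mass at 2, so that $a*u_n = u_{n-2}$; neither choice comes from the text.

```python
from fractions import Fraction

def renewal_sequence(b, n_max):
    """u_0 = 1 and u_n = sum_{m=1}^{n} b_m u_{n-m} (the renewal equation)."""
    u = [Fraction(1)]
    for n in range(1, n_max + 1):
        u.append(sum(b.get(m, Fraction(0)) * u[n - m] for m in range(1, n + 1)))
    return u

b = {1: Fraction(1, 2), 2: Fraction(1, 2)}       # hypothetical increment law
M_b = sum(n * p for n, p in b.items())           # first moment, 3/2
M_b2 = sum(n * n * p for n, p in b.items())      # second moment, 5/2
n_max = 60
u = renewal_sequence(b, n_max)

# (6.7): sum_{n <= N} u_n - (N + 1)/M_b against (M_b2 - M_b) / (2 M_b^2).
partial_67 = sum(u) - Fraction(n_max + 1) / M_b
predicted_67 = (M_b2 - M_b) / (2 * M_b ** 2)

# (6.5) with delay concentrated at 2, so a*u_n = u_{n-2} and M_a = 2.
shifted = [Fraction(0), Fraction(0)] + u[:-2]
partial_65 = sum(x - y for x, y in zip(shifted, u))
predicted_65 = -Fraction(2) / M_b
```

With $N = 60$ both partial sums agree with the predicted limits to well below $10^{-12}$, because for this increment law the error terms decay like $2^{-N}$.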
Theorem 6.4. With the assumptions of Theorem 6.1:
(i) If $M_b$ is finite, then the total variation $\mathrm{Var}(u)$ of the renewal sequence $u$,
    $\mathrm{Var}(u) \overset{\mathrm{def}}{=} 1 + \sum_{n=1}^{\infty} |u_n - u_{n-1}|$,
is finite. If in addition $M_a$ is finite, then
    $\sum_{n=0}^{\infty} |a*u_n - u_n| \le M_a \mathrm{Var}(u) < \infty$.    (6.8)
(ii) If $M_b^{(2)}$ is finite, then
    $\sum_{n=0}^{\infty} |u_n - M_b^{-1}| \le \frac{1}{2} M_b^{-1} (M_b^{(2)} - M_b) \mathrm{Var}(u) < \infty$,    (6.9)
and
    $\mathrm{Var}^{(2)}(u) \overset{\mathrm{def}}{=} \sum_{n=1}^{\infty} n |u_n - u_{n-1}| < \infty$.    (6.10)
If in addition, $M_a^{(2)}$ is finite, then
    $\sum_{n=1}^{\infty} n |a*u_n - u_n| \le \frac{1}{2} (M_a^{(2)} - M_a) \mathrm{Var}(u) + M_a \mathrm{Var}^{(2)}(u) < \infty$.    (6.11)

Proof. The proof of the finiteness of the total variation $\mathrm{Var}(u)$ is based again on the use of the coupling technique. Let $T = (T(i); i \ge 0)$ and $T' = (T'(i); i \ge 0)$ be two independent renewal processes, the former being undelayed, $T(0) = 0$, the latter being delayed with delay $T'(0) = 1$, and both having the same increment distribution $b$. Consider the product renewal process $T'' = (T''(i); i \ge 0)$, that is, the renewal process with incidence process $Y''_n = Y_n Y'_n$, $n \ge 0$. Thus the renewal epochs of $T''$ consist of precisely the common renewal epochs of the renewal processes $T$ and $T'$. It follows that the delayed renewal sequence $v''$ corresponding to $T''$ is given by $v''_0 = 0$, $v''_n = u_n u_{n-1}$ for $n \ge 1$. By Theorem 6.2,
    $\lim_{n \to \infty} v''_n = u_\infty^2 = M_b^{-2} > 0$.
Consequently, the product renewal process $T''$ is positive recurrent. This implies that the bivariate backward chain $(V_n, V'_n)$ is positive recurrent. (To see this, note that $Y''_n = 1$ if and only if $(V_n, V'_n) = (0, 0)$.)
Since by Theorem 6.2 $u_n \ge (2 M_b)^{-1} > 0$ for all sufficiently big $n$, we can easily prove that the bivariate chain $(V_n, V'_n)$ is aperiodic and irreducible positive recurrent on the state space $E_b \times E_b$, where $E_b = \{0, 1, \ldots, M - 1\}$ or $E_b = \mathbb{N}$ depending on whether $M = \sup\{n \ge 1: b(n) > 0\}$ is finite or infinite.
Since $T''(0) = S_{(0,0)}$ is the first time at which the bivariate chain $(V_n, V'_n)$ hits state $(0, 0)$, the expectation
    $E[T''(0) \mid (V_0, V'_0) = (1, 0)] = E_{(1,0)}[S_{(0,0)}]$
is finite by Corollary 5.4. $E T''(0)$ is finite, since clearly,
    $E T''(0) \le 1 + E[T''(0) \mid (V_0, V'_0) = (1, 0)]$.
Using precisely the same arguments as in the proof of Theorem 6.1 we obtain the coupling inequality
    $|u_n - u_{n-1}| \le P\{T''(0) > n\}$ for all $n \ge 1$,    (6.12)
from which the finiteness of $\mathrm{Var}(u)$ follows after summing $n$ over $\mathbb{N}$.
In order to obtain (6.8) set $w_0 = 1$, $w_n = |u_n - u_{n-1}|$ for $n \ge 1$, and note that
    $|a*u_n - u_n| \le A*w_n$.    (6.13)
The assertion now follows, since $\sum_{n=0}^{\infty} A_n = M_a$ and $\sum_{n=0}^{\infty} w_n = \mathrm{Var}(u)$.
(ii) The inequalities (6.9) follow from (i) by setting $a = e$. In order to prove (6.10) we adopt the notation used in the proof of part (i), and note first that by Proposition 5.18 $E_{(1,0)}[S_{(0,0)}^2]$ is finite and hence so is $E[T''(0)^2]$. The assertion (6.10) follows now easily from the coupling inequality (6.12). The inequality (6.11) can be proved by a straightforward calculation. □

Let $\mathcal{A}$ be any family of delay distributions $a$ such that
    $\lim_{n \to \infty} A_n = \lim_{n \to \infty} \sum_{m=n+1}^{\infty} a_m = 0$ uniformly over $a \in \mathcal{A}$.
Note that a sufficient condition for this is:
    $\sup_{a \in \mathcal{A}} E_a T(0) = \sup_{a \in \mathcal{A}} \sum_{n=0}^{\infty} n a_n < \infty$.
It turns out that uniform convergence of the tails of the initial distributions leads to uniform convergence in the renewal theorem:
Theorem 6.5. With the assumptions of Theorem 6.1: If $M_b$ is finite and $\lim_{n \to \infty} A_n = 0$ uniformly over $a \in \mathcal{A}$, then
    $\lim_{n \to \infty} a*u_n = M_b^{-1}$ uniformly over $a \in \mathcal{A}$.

Proof. By (6.13) $|a*u_n - u_n|$ is dominated by $A*w_n$. This is further dominated by
    $\sum_{m=n-N+1}^{n} w_m + A_N \mathrm{Var}(u)$, for all $0 < N < n$.
By hypothesis and Theorem 6.4, the first term on the right hand side tends to zero as $n \to \infty$. By hypothesis, the second term tends to zero uniformly over $\mathcal{A}$ as $N \to \infty$. □

Our next aim is to investigate the case where the rate of convergence in the renewal theorem is geometric. If the renewal sequence $(u_n; n \ge 0)$ tends to its limit $u_\infty = M_b^{-1}$ with a geometric rate, i.e. for some constants $M < \infty$ and $\rho < 1$,
    $|u_n - u_\infty| = |u_n - M_b^{-1}| \le M \rho^n$ for all $n \ge 0$,
then the renewal process $(T(i); i \ge 0)$ is called geometrically ergodic.
In the following theorem necessary and sufficient conditions are given for the geometric ergodicity of $T$.
Theorem 6.6. With the assumptions of Theorem 6.1 the following three conditions are equivalent:
(i) The renewal process $(T(i); i \ge 0)$ is geometrically ergodic.
(ii) The series
    $\hat b(r) = \sum_{n=1}^{\infty} r^n b_n$
converges for some $r > 1$.
(iii) There is a constant $r_0 > 1$ such that the function $\hat u$ defined on the complex plane $\mathbb{C}$ by
    $\hat u(z) = \sum_{n=0}^{\infty} z^n u_n$
has no singularities in the disc $\{|z| < r_0\}$ except a simple pole at $z = 1$.

Proof. Assume first that (i) holds. Denote
    $f_0 = 1$, $f_n = u_n - u_{n-1}$ for $n \ge 1$.
Then the function
    $\hat f(z) = \sum_{n=0}^{\infty} z^n f_n = (1 - z) \hat u(z)$
has no singularities in the disc $\{|z| < \rho^{-1}\}$, i.e., we have (iii).
Conversely, if (iii) holds, then the function $(1 - z) \hat u(z) = \hat f(z)$ is regular in the disc $\{|z| < r_0\}$, and hence
    $\sum_{n=0}^{\infty} r^n |u_n - u_{n-1}| < \infty$ for all $r < r_0$.
This clearly leads to (i).
By the renewal equation (4.3)
    $\hat f(z) = (1 - z) \hat u(z) = (1 - z)(1 - \hat b(z))^{-1}$ in the disc $\{|z| < 1\}$.    (6.14)
If (ii) holds, then $\hat b(z)$ is regular in the disc $\{|z|$ 0 arbitrary. Since, by aperiodicity, there is only one root, namely $z = 1$ in the disc $\{|z| \le 1\}$, it follows that there exists $r_0 > 1$ such that $z = 1$ is the only root in the disc $\{|z|$
6.2 Convergence of the iterates $K^n(x, A)$
In the following we shall study the convergence of the iterates $K^n(x, A)$ as $n \to \infty$. This will now be relatively easy, since $K^n(x, A)$, $n \ge 0$, is by the decomposition results of Theorem 4.1 essentially equal to a delayed renewal sequence, and so the renewal theorems of the previous section can be directly applied to it.
Throughout Sections 6.2-6.7 we assume that $K$ is an $R$-recurrent kernel. We adopt the earlier notation. In particular, $d$ denotes the period of $K$, and $m_0 \ge 1$, $s \in \mathcal{S}^+$ and $\nu \in \mathcal{M}^+$ denote fixed quantities such that the minorization condition $M(m_0, 1, s, \nu)$ holds. We assume that $c_{m_0} = $ g.c.d.$\{m_0, d\} = 1$ (cf. (5.1)). $\pi$ denotes a fixed $R$-invariant measure and
    $h = h_\nu = R^{m_0} G^{(R)}_{m_0,s,\nu} s = \sum_{n=0}^{\infty} R^{(n+1) m_0} (K^{m_0} - s \otimes \nu)^n s$
is the unique minimal $R$-invariant function satisfying $\nu(h) = 1$ (cf. Theorem 5.1). If $K$ is $R$-positive recurrent, then we shall, by convention, normalize $\pi$ so that $\pi(h) = 1$. The transition probability $\tilde K = R I_{h^{-1}} K I_h$ governs the transitions of a Harris recurrent Markov chain on the closed set $\tilde E = \{h < \infty\}$ (cf. Proposition 5.4).
For any $g \in \mathcal{E}_+$, we write $\mathcal{M}(g) = \{\lambda \in \mathcal{M}: g$ is $|\lambda|$-integrable$\}$. The space $\mathcal{M}(g)$ is equipped with the $g$-total variation norm: for a signed measure $\lambda$,
    $\|\lambda\|_g \overset{\mathrm{def}}{=} \sup_{|f| \le g} |\lambda(f)|$.
$\|K\|_g$ denotes the corresponding operator norm:
    $\|K\|_g = \sup_{\|\lambda\|_g \le 1} \|\lambda K\|_g = \sup_{x \in \{g < \infty\}} g(x)^{-1} K g(x)$.
When $g \equiv 1$, we have $\mathcal{M}(g) = \mathcal{M}(1) = b\mathcal{M}$. Then $\|\lambda\|_g = \|\lambda\|_1$ is simply the total variation of the bounded signed measure $\lambda$:
    $\|\lambda\| \overset{\mathrm{def}}{=} \|\lambda\|_1 = \sup_{A \in \mathcal{E}} \lambda(A) - \inf_{A \in \mathcal{E}} \lambda(A) = |\lambda|(E)$.
For a bounded kernel $\|K\| = \sup_{x \in E} K(x, E) < \infty$. For a substochastic kernel $P$, $\|P\| \le 1$; i.e., $P$ is a contraction on $b\mathcal{M}$.
Proposition 6.1. The operator norm $\|K\|_h$ equals $R^{-1}$.
Proof. By $R$-invariance
    $\|K\|_h = \sup_{x \in \{h < \infty\}} h(x)^{-1} K h(x) = R^{-1}$. □
For any $g \in \mathcal{E}_+$, we write
    $\mathcal{M}_0(g) = \{\lambda \in \mathcal{M}(g): \lambda(g) = 0\}$;
i.e., a signed measure $\lambda \in \mathcal{M}$ is a member of $\mathcal{M}_0(g)$ whenever $g$ is $|\lambda|$-integrable and the integral $\lambda(g)$ vanishes.
We start by proving an $h$-total variation convergence result for aperiodic $K$. We will discuss the periodic case later in Corollary 6.6.

Theorem 6.7. Suppose that $K$ is an aperiodic, $R$-recurrent kernel. Then for any signed measure $\lambda \in \mathcal{M}_0(h)$:
    $\lim_{n \to \infty} R^n \|\lambda K^n\|_h = 0$.
Proof. It is no restriction to consider the case $R = 1$ only. (Otherwise we should look at the 1-recurrent kernel $RK$.) Assume first that $m_0 = 1$, i.e., the pair $(s, \nu)$ is an atom. By the decomposition result (iv) of Theorem 4.1
    $\|\lambda K^n\|_h \le |\lambda| (K - s \otimes \nu)^n h + |\lambda(a)*u| * \sigma(h)_{n-1}$ for all $n \ge 1$.    (6.15)
(We write $\lambda(a) = (\lambda(a_n); n \ge 0)$; the notation $\sigma(h)$ is interpreted similarly.) For the first term on the right hand side we have the estimates
    $\infty > |\lambda|(h) \ge |\lambda| (K - s \otimes \nu)^n h = |\lambda| \sum_{m=n}^{\infty} (K - s \otimes \nu)^m s \downarrow 0$ as $n \to \infty$.
In order to see that the second term also tends to zero as $n \to \infty$, observe that, by (4.11)
    $\sum_{n=0}^{\infty} |\lambda(a_n)| \le |\lambda| \left( \sum_{n=0}^{\infty} a_n \right) = |\lambda|(h) < \infty$, $\sum_{n=0}^{\infty} \lambda(a_n) = \lambda(h) = 0$,
and by (4.22)
    $\sigma_n(h) = B_n$,
and then use Theorem 6.1.
The general case, $m_0 > 1$, follows by considering the iterated kernel $K^{m_0}$ and using the fact that $\|K\|_h = 1$ (see Proposition 6.1). □

Setting $\lambda = h(x)^{-1} \varepsilon_x - h(y)^{-1} \varepsilon_y$ in Theorem 6.7, where $x, y \in \tilde E$, we get:

Corollary 6.2. Suppose that $K$ is aperiodic $R$-recurrent. Then for any states $x, y \in \tilde E$:
    $\lim_{n \to \infty} R^n \| h(x)^{-1} K^n(x, \cdot) - h(y)^{-1} K^n(y, \cdot) \|_h = 0$. □

Setting $\lambda = \mu - \mu(h) \pi$, where $\mu$ belongs to $\mathcal{M}(h)$, we get the following corollary. (By convention $\pi(h) = 1$, if $K$ is $R$-positive recurrent.)

Corollary 6.3. Suppose that $K$ is aperiodic $R$-positive recurrent. Then for any signed measure $\mu \in \mathcal{M}(h)$:
    $\lim_{n \to \infty} \| R^n \mu K^n - \mu(h) \pi \|_h = 0$.
For every $n \ge 1$, let $k^{(n)} \in (\mathcal{E} \otimes \mathcal{E})_+$ and $K_s^{(n)}$ denote the density and singular part, respectively, of $K^n$ w.r.t. $\pi$ (cf. Lemma 2.6):
    $K^n(x, dy) = k^{(n)}(x, y) \pi(dy) + K_s^{(n)}(x, dy)$.
Since for all $x \in \tilde E$, $n \ge 0$,
    $\| R^n K^n(x, \cdot) - h(x) \pi \|_h = \int |R^n k^{(n)}(x, y) - h(x)| h(y) \pi(dy) + R^n K_s^{(n)} h(x)$,
we obtain from Corollary 6.3:

Corollary 6.4. Suppose that $K$ is aperiodic $R$-positive recurrent. Then for all $x \in \tilde E$:
    $\lim_{n \to \infty} \int |R^n k^{(n)}(x, y) - h(x)| h(y) \pi(dy) = 0$
and
    $\lim_{n \to \infty} R^n K_s^{(n)} h(x) = 0$. □
It follows in particular from Corollary 6.3 that, whenever $K$ is aperiodic $R$-positive recurrent, then, for all $x \in \tilde E = \{h < \infty\}$, and all $f \in \mathcal{E}_+$ satisfying $0 \le f \le M h$ for some constant $M < \infty$, the limit
    $\lim_{n \to \infty} R^n K^n f(x) = h(x) \pi(f)$    (6.16)
exists (and is finite). It turns out that (6.16) holds even for $x \in (\tilde E)^c = \{h = \infty\}$:

Proposition 6.2. Suppose that $K$ is aperiodic $R$-positive recurrent. Then for all $x \in E$ such that $h(x) = \infty$, for all $f \in \mathcal{E}^+$:
    $\lim_{n \to \infty} R^n K^n f(x) = \infty$.
Proof. Use Theorem 4.1(iv) (cf. the proof of Theorem 6.7).
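The limit (6.16) can be observed directly for a finite positive matrix, where $R$ is the reciprocal of the Perron root and $h$, $\pi$ are right and left Perron eigenvectors normalized so that $\pi(h) = 1$. The sketch below is only an illustration: the matrix `K` is a hypothetical example, and the values of `R`, `h` and `pi` were worked out by hand for this particular matrix ($Kh = 4h$, $\pi K = 4\pi$).

```python
def apply_kernel(K, f):
    """(K f)(x) = sum_y K(x, y) f(y) for a finite matrix K."""
    return [sum(K[i][j] * f[j] for j in range(len(f))) for i in range(len(K))]

K = [[1.0, 2.0], [3.0, 2.0]]    # hypothetical positive matrix, Perron root 4
R = 0.25                        # convergence parameter: reciprocal of the root
h = [2.0, 3.0]                  # R-invariant function: R * K h = h
pi = [0.2, 0.2]                 # R-invariant measure with pi(h) = 1
f = [1.0, 0.0]                  # test function, 0 <= f <= h

g = f[:]
for n in range(60):             # g becomes R^n K^n f after n steps
    g = [R * v for v in apply_kernel(K, g)]

pi_f = sum(p * v for p, v in zip(pi, f))
limit = [h[x] * pi_f for x in range(len(h))]    # h(x) * pi(f)
```

Here the second eigenvalue of $RK$ is $-1/4$, so the iterates approach $h(x)\pi(f) = (0.4, 0.6)$ at a geometric rate.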
The following result is the 'dual' of Theorem 6.7. First recall the definition of the $\mathcal{L}^1(\pi)$-norms:
    $\|f\|_\pi = \pi(|f|)$, $f \in \mathcal{L}^1(\pi)$.
We set $\mathcal{L}^1_0(\pi) = \{f \in \mathcal{L}^1(\pi): \pi(f) = 0\}$.

Theorem 6.8. Suppose that $K$ is aperiodic $R$-recurrent. Then for any function $f \in \mathcal{L}^1_0(\pi)$:
    $\lim_{n \to \infty} R^n \|K^n f\|_\pi = 0$.
Proof. The proof is analogous to that of Theorem 6.7. First note again that we can assume $R = 1$ and $m_0 = 1$. The inequality corresponding to the inequality (6.15) is
    $\|K^n f\|_\pi \le \pi (K - s \otimes \nu)^n |f| + \pi(a) * |u * \sigma(f)|_{n-1}$.

Corollary 6.5. Suppose that $K$ is aperiodic $R$-positive recurrent. Then for any $f \in \mathcal{L}^1(\pi)$:
    $\lim_{n \to \infty} \| R^n K^n f - \pi(f) h \|_\pi = 0$.

It turns out that the convergence in Corollary 6.5 is also pointwise ($\pi$-a.e.):

Theorem 6.9. Suppose that $K$ is aperiodic $R$-recurrent. Then for any $f \in \mathcal{L}^1(\pi)$, $\pi$-almost all $x \in E$:
    $\lim_{n \to \infty} R^n K^n f(x) = h(x) \pi(h)^{-1} \pi(f)$.
Proof. There is no loss of generality in assuming that $K = P$ is a Harris recurrent transition probability. (Otherwise we should make the similarity transform (5.2); see also Proposition 5.4.) Also we can assume that $m_0 = 1$.
Let now $x \in R_{|f|}$ ($\overset{\mathrm{def}}{=}$ the set of $|f|$-regular states $= E$ $\pi$-a.e., see Proposition 5.13(ii)) be arbitrary. We have
    $P^n f(x) = (P - s \otimes \nu)^n f(x) + a(x) * u * \sigma(f)_{n-1}$.
Since by Proposition 5.13(ii) the sum $\sum_{n=0}^{\infty} (P - s \otimes \nu)^n f(x) = G_{s,\nu} f(x)$ is finite, the first term on the right hand side tends to zero. The latter clearly tends to $\pi(E)^{-1} \pi(f)$. □

Note that the limit in the above theorem is equal to zero (for $\pi$-almost all $x \in E$) whenever $K$ is $R$-null recurrent.
If $K$ is $R$-null recurrent, then $R^n \lambda K^n f$ tends to zero, for all $\lambda \in \mathcal{M}(h)$ and $f \in \mathcal{L}^1(\pi)$ with $|f| \le M h$ for some constant $M < \infty$. In fact, slightly more holds true:

Theorem 6.10. Suppose that $K$ is aperiodic $R$-null recurrent. Then for any $\lambda \in \mathcal{M}^+(h)$ and any constant $\gamma > 0$:
    $\lim_{n \to \infty} \frac{R^n \lambda K^n f}{\pi(f) + \gamma} = 0$ uniformly over the class $\{f \in \mathcal{E}_+: f \le h\}$.
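Proof. There is no loss of generality in assuming that $\lambda(h) = 1$. Let $\varepsilon, \gamma > 0$ be arbitrary. By Theorem 6.7 and Egoroff's theorem, and since $K$ is $R$-null recurrent, there is a set $B = B(\varepsilon) \in \mathcal{E}$ with $\int_B \pi(dx)h(x) \ge \varepsilon^{-1}$ and an integer $N = N(\varepsilon)$ such that
$$R^n \|h(x)^{-1} K^n(x, \cdot) - \lambda K^n\|_h \le \gamma\varepsilon \quad \text{for all } x \in B,\ n \ge N.$$
Consequently, we have for all $0 \le f \le h$, $n \ge N$:
$$\pi(f) = R^n \pi K^n f \ge R^n \int_B \pi(dx) K^n f(x) \ge \int_B \pi(dx)h(x)\,(R^n \lambda K^n f - \gamma\varepsilon) \ge \varepsilon^{-1} R^n \lambda K^n f - \gamma,$$
from which the result follows. $\square$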
We shall now consider the case where $K$ has period $d \ge 2$. Recall from Proposition 5.5 that, for each $i = 0, \ldots, d-1$, the function $h$ restricted to the cyclic set $E_i$ is the (essentially unique) minimal $R^d$-invariant function for the $R^d$-recurrent, aperiodic kernel $K^d$ with state space $E_i$. Hence we have:

Corollary 6.6. Suppose that $K$ is $R$-recurrent and periodic with period $d \ge 2$ and cyclic sets $E_0, \ldots, E_{d-1}$. Let $N = (E_0 + \cdots + E_{d-1})^c$. Then for any $\lambda \in \mathcal{M}_0(h)$ such that $\int_{E_i} \lambda(dx)h(x) = 0$ for each $i = 0, \ldots, d-1$, and $|\lambda|(N) = 0$:
$$\lim_{n\to\infty} R^n \|\lambda K^n\|_h = 0. \qquad \square$$
In the important special case, where $K = P$ is the transition probability of a Harris recurrent Markov chain $(X_n)$, we have $h \equiv 1$, and consequently we obtain from the preceding general results:

Corollary 6.7. Suppose that $(X_n)$ is a Harris recurrent Markov chain. Let $\lambda$ and $\mu$ be any two initial distributions and let $f, g \in \mathcal{L}^1(\pi)$ be such that $\pi(f) = \pi(g)$. Then:

(i) If $(X_n)$ is aperiodic, then
$$\lim_{n\to\infty} \|\lambda P^n - \mu P^n\| = 0, \qquad (6.17)$$
$$\lim_{n\to\infty} \|P^n f - P^n g\|_\pi = 0$$
and
$$\lim_{n\to\infty} P^n f(x) = \frac{\pi(f)}{\pi(E)} \quad \text{for } \pi\text{-almost all } x \in E.$$

(ii) If $(X_n)$ is aperiodic and positive Harris recurrent with stationary distribution $\pi$, then
$$\lim_{n\to\infty} \|\lambda P^n - \pi\| = 0 \qquad (6.18)$$
and
$$\lim_{n\to\infty} \|P^n f - \pi(f)\|_\pi = 0;$$
in particular, writing respectively $p^{(n)} \in (\mathcal{E} \otimes \mathcal{E})_+$ and $P_s^{(n)}$ for the absolutely continuous part and the singular part of $P^n$ w.r.t. $\pi$, we have for all $x \in E$:
$$\lim_{n\to\infty} \|p^{(n)}(x, \cdot) - 1\|_\pi = 0$$
and
$$\lim_{n\to\infty} P_s^{(n)}(x, E) = 0.$$

(iii) If $(X_n)$ is aperiodic and null recurrent then for any initial distribution $\lambda$ and any constant $\gamma > 0$:
$$\lim_{n\to\infty} \sup_{A \in \mathcal{E}} \frac{\lambda P^n(A)}{\pi(A) + \gamma} = 0.$$

(iv) If $(X_n)$ is periodic with period $d \ge 2$ and cyclic sets $E_0, \ldots, E_{d-1}$, and if $\lambda(E_i) = \mu(E_i)$ for all $i = 0, \ldots, d-1$, and $\lambda(N) = \mu(N) = 0$ ($N = (E_0 + \cdots + E_{d-1})^c$), then
$$\lim_{n\to\infty} \|\lambda P^n - \mu P^n\| = 0. \qquad \square$$
Examples 6.1. (a) Suppose that $K = (k(x, y); x, y \in E)$ is a $\mathrm{Card}_F$-irreducible, aperiodic, $R$-recurrent matrix on a discrete state space $E$. Let $h = (h(x); x \in E)$ denote the (essentially unique) $R$-invariant column vector for $K$. Then for any two states $x, y \in F$:
$$\lim_{n\to\infty} R^n \sum_{z \in E} |h(x)^{-1} k^n(x, z) - h(y)^{-1} k^n(y, z)|\,h(z) = 0.$$
(e) Consider the renewal process $(Z_n; n \ge 0)$ on $\mathbb{R}_+$ introduced in Example 1.2(e). Let us write $u$ for the corresponding renewal measure,
$$u(A) \stackrel{\mathrm{def}}{=} \sum_0^\infty F^{n*}(A) = E\Big[\sum_0^\infty 1_A(Z_n) \,\Big|\, Z_0 = 0\Big].$$
Let $B(t) = 1 - F(t) = P\{z_1 > t\}$. Then $u * B \equiv 1$. Suppose that the increment distribution $F$ is spread out. Then for any bounded interval $[0, c]$, $0 < c < \infty$, and constant $0 < \gamma < \infty$:
$$\lim_{t\to\infty} \sup_{A \in \mathcal{R}_+ \cap [0, c]} |u(t + A) - u(t - \gamma + A)| = 0.$$
(Hints for the proof: for all $t \ge \gamma > 0$, $A \in \mathcal{R}_+$: $u(t - \gamma + A) = P_t(\gamma, \cdot) * u(A)$, where $P_t$ denotes the transition probability of the forward process, $P_t(x, dy) = P\{V_t^+ \in dy \mid V_0^+ = x\}$.)
6.3 Ergodic Markov chains

We shall now for a while consider the case where $K = P$ is the transition probability of a Markov chain $(X_n)$. We shall resume the general $K$ in Section 6.7. We call a Markov chain $(X_n)$ (Harris) ergodic, if it is aperiodic and positive Harris recurrent. By Corollary 6.7(ii),
$$\lim_{n\to\infty} \|\lambda P^n - \pi\| = 0$$
for any initial distribution $\lambda$. There is, in fact, also a converse result to that:

Proposition 6.3. A Markov chain $(X_n)$ is ergodic if (and only if) there exists a probability measure $\pi$ on $(E, \mathcal{E})$ such that
$$\lim_{n\to\infty} \|P^n(x, \cdot) - \pi\| = 0 \quad \text{for all } x \in E. \qquad (6.19)$$
Proof. Suppose that (6.19) holds. Clearly, $(X_n)$ is $\pi$-irreducible and aperiodic. Moreover, it follows that $G(x, A) = \infty$ for all $x \in E$, $\pi$-positive $A \in \mathcal{E}$. Thus $(X_n)$ is recurrent. By Corollary 6.7(iii) $(X_n)$ is positive recurrent. Let $h$ be an arbitrary bounded harmonic function for $(X_n)$. From (6.19) it follows that $h \equiv \pi(h)$, whence, by Theorem 3.8(ii), $(X_n)$ is Harris recurrent. $\square$
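As an illustration of (6.19), the following sketch computes the worst-case distance $\max_x \sum_y |P^n(x, y) - \pi(y)|$ for a small finite transition matrix. The matrix `P` and the helper names are hypothetical and chosen only for this example; the norm is taken as the sum of absolute differences, which is assumed to match the total variation norm used in the text.

```python
import numpy as np

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 by least squares (finite state space)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def worst_case_distance(P, pi, n_max):
    """Return max_x sum_y |P^n(x, y) - pi(y)| for n = 0, ..., n_max."""
    Pn = np.eye(P.shape[0])
    out = []
    for _ in range(n_max + 1):
        out.append(np.abs(Pn - pi).sum(axis=1).max())
        Pn = Pn @ P
    return np.array(out)

# Hypothetical aperiodic, irreducible 3-state chain (illustration only).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])

pi = stationary_distribution(P)
dist = worst_case_distance(P, pi, 40)
print("pi =", pi)
print("distances at n = 0, 1, 5, 40:", dist[[0, 1, 5, 40]])
```

For this particular matrix the distances decrease monotonically to zero, in line with the contractivity of $P$.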
Throughout Sections 6.3-6.6 we shall assume that $K = P$ is the transition probability of an ergodic Markov chain $(X_n)$. Our aim in these sections is to study the rate of convergence in (6.17) and (6.18). It turns out that these rates are closely connected with the degrees of recurrence of $(X_n)$. We shall not formulate the results in the periodic case. Corollary 6.6 provides an example of how this extension could be performed. The same remark holds for the formulation of the dual results (cf. Theorem 6.8).

Let us fix two initial distributions $\lambda$ and $\mu$ of $(X_n)$. Recall from Definition 5.4 the concept of regularity: the probability measure $\lambda$ is regular, if $E_\lambda S_B$ is finite for all $B \in \mathcal{E}^+$. We shall show that, for $\lambda$ and $\mu$ regular, the rate of convergence in (6.17) is $n^{-1}$. For the following theorem note that, since the $\sigma$-algebra $\mathcal{E}$ is countably generated, the function $(x, y) \mapsto \|P^n(x, \cdot) - P^n(y, \cdot)\|$ is measurable for all $n \ge 1$. Also note that, for any non-negative non-increasing summable sequence $(c_n; n \ge 0)$, $\lim_{n\to\infty} n c_n = 0$.

Theorem 6.11. Suppose that $(X_n)$ is ergodic. If $\lambda$ and $\mu$ are regular then
$$\sum_{n=0}^\infty \iint \lambda(dx)\mu(dy)\,\|P^n(x, \cdot) - P^n(y, \cdot)\| < \infty.$$
Proof. Consider first the case where $m_0 = 1$, i.e., $(s, \nu)$ is an atom for $P$. By Theorem 4.1, (4.23) and (4.24) we have for all $n \ge 1$, $x \in E$:
$$\|P^n(x, \cdot) - \nu P^{n-1}\| \le P_x\{T_\alpha \ge n\} + |a(x) * u - u| * \sigma(E)_{n-1}. \qquad (6.20)$$
By Theorem 6.4(i), and by (4.21) and (4.22),
$$\sum_{n=0}^\infty |a(x) * u_n - u_n| \le E_x[T_\alpha]\,\mathrm{Var}(u), \qquad \sum_{n=0}^\infty \sigma_n(E) = E_\nu S_\alpha = \pi(s)^{-1}.$$
Consequently,
$$\sum_{n=1}^\infty \int \lambda(dx)\,\|P^n(x, \cdot) - \nu P^{n-1}\| \le E_\lambda T_\alpha + \pi(s)^{-1} E_\lambda T_\alpha\,\mathrm{Var}(u),$$
which is finite by Proposition 5.13(iv) and Theorem 6.4(i). The final result in the case $m_0 = 1$ now follows by using the triangle inequality. To prove the assertion for general $m_0$ use Lemma 5.3 and the contractivity of $P$. $\square$

From the preceding theorem and from Proposition 5.13(ii) we obtain the following:
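Corollary 6.8. Suppose that $(X_n)$ is ergodic.

(i) If $\lambda$ and $\mu$ are regular then
$$\sum_0^\infty \|\lambda P^n - \mu P^n\| < \infty;$$
in particular,
$$\lim_{n\to\infty} n\,\|\lambda P^n - \mu P^n\| = 0.$$

(ii) For $\pi$-almost all $x, y \in E$:
$$\sum_{n=0}^\infty \|P^n(x, \cdot) - P^n(y, \cdot)\| < \infty$$
and
$$\lim_{n\to\infty} n\,\|P^n(x, \cdot) - P^n(y, \cdot)\| = 0.$$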
According to the above corollary we know that, if $\lambda$ and $\mu$ are regular, then the sums
$$\sum_0^N \big(E_\lambda f(X_n) - E_\mu f(X_n)\big) = \sum_0^N (\lambda P^n f - \mu P^n f)$$
converge (even uniformly over $-1 \le f \le 1$) as $N \to \infty$. This naturally raises the problem of identifying the limit. For the following theorem recall the definition of the kernel
$$G_{m_0,s,\nu} = \sum_{n=0}^\infty (P^{m_0} - s \otimes \nu)^n (I + \cdots + P^{m_0-1}).$$
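Theorem 6.12. Suppose that $(X_n)$ is ergodic. If $\lambda$ is regular then for all $f \in b\mathcal{E}$:
$$\sum_0^\infty (\lambda P^n f - \nu P^n f) = \lambda G_{m_0,s,\nu}(I - 1 \otimes \pi) f.$$
In particular, there is a full set $F$ ($= R_1$) such that for all $x \in F$, all $f \in b\mathcal{E}$:
$$\sum_0^\infty (P^n f(x) - \nu P^n f) = G_{m_0,s,\nu}(I - 1 \otimes \pi) f(x).$$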
Proof. Assume first that $m_0 = 1$. By Theorem 4.1, Theorems 6.3 and 6.4, and since $M_b = \pi(s)^{-1}$, $M_{\lambda(a)} = E_\lambda T_\alpha = \lambda G_\alpha 1 - 1$, we have
$$\sum_0^N \lambda P^n f - \sum_1^{N+1} \nu P^{n-1} f = E_\lambda \sum_0^{T_\alpha \wedge N} f(X_n) + \sum_1^N (\lambda(a) * u - u) * \sigma(f)_{n-1} - \nu P^N f$$
$$\xrightarrow[N\to\infty]{} E_\lambda \sum_0^{T_\alpha} f(X_n) + \sum_{m=0}^\infty (\lambda(a) * u_m - u_m) \sum_{n=0}^\infty \sigma_n(f) - \pi(f)$$
$$= \lambda G_\alpha f - M_b^{-1} M_{\lambda(a)}\,\pi(s)^{-1}\pi(f) - \pi(f) = \lambda G_\alpha f - \lambda G_\alpha 1\,\pi(f).$$
The general case, $m_0 > 1$, can easily be proved by considering the $m_0$-step chain $(X_{nm_0})$ and the function $f + \cdots + P^{m_0-1} f$ instead of $f$. $\square$

Examples 6.2. (a) Suppose that $P = (p(x, y); x, y \in E)$ is an ergodic transition matrix with stationary distribution $\pi = (\pi(x); x \in E)$. Set $E_\pi = \{\pi > 0\}$ (cf. Example 5.1(a)). Then for all $x \in E$,
$$\lim_{n\to\infty} \sum_{z \in E} |p^n(x, z) - \pi(z)| = 0,$$
and for all $x, y \in E_\pi$,
$$\sum_{n=0}^\infty \sum_{z \in E} |p^n(x, z) - p^n(y, z)| < \infty.$$
Moreover for all $x, y, z \in E_\pi$,
$$\sum_{n=0}^\infty \big(p^n(x, z) - p^n(y, z)\big) = (E_y T_z - E_x T_z)\,\pi(z).$$
(d) (The reflected random walk). Suppose that $E z_1 < 0$. Then the reflected random walk $(W_n)$ is ergodic (cf. Example 5.2(d)) and for any initial state $W_0 = w$, $\mathcal{L}(W_n) \to \pi$ in total variation norm as $n \to \infty$.

(e) (The renewal process on $\mathbb{R}_+$; cf. Example 6.1(e).) If $F$ is spread-out and $M_F$ is finite then for any constant $0 < c < \infty$:
$$\lim_{t\to\infty} \sup_{A \in \mathcal{R}_+ \cap [0, c]} |u(t + A) - M_F^{-1}\,\ell(A)| = 0;$$
moreover, for any constant 0 < 7 < cc:
I
7)1 < R+
co.
(6.21)
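The last identity of Example 6.2(a), $\sum_n (p^n(x, z) - p^n(y, z)) = (E_y T_z - E_x T_z)\pi(z)$, can be checked numerically on a small chain. The sketch below is only an illustration: the matrix `P` is hypothetical, the series is truncated after `n_terms` terms, and $T_z$ is assumed to be the hitting time $\inf\{n \ge 0 : X_n = z\}$, so that $E_z T_z = 0$.

```python
import numpy as np

# Hypothetical ergodic 3-state transition matrix and its stationary distribution.
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])
n_states = P.shape[0]

def hitting_times(P, z):
    """E_x T_z for all x, with T_z = inf{n >= 0 : X_n = z} (assumption)."""
    others = [i for i in range(n_states) if i != z]
    Q = P[np.ix_(others, others)]            # chain killed on hitting z
    t = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    full = np.zeros(n_states)
    full[others] = t
    return full

def summed_difference(P, x, y, z, n_terms=200):
    """Truncated series sum_n (p^n(x, z) - p^n(y, z))."""
    total, Pn = 0.0, np.eye(n_states)
    for _ in range(n_terms):
        total += Pn[x, z] - Pn[y, z]
        Pn = Pn @ P
    return total

x, y, z = 0, 2, 0
t = hitting_times(P, z)
print("series      :", summed_difference(P, x, y, z))
print("closed form :", (t[y] - t[x]) * pi[z])
```

Solving the linear system $(I - Q)t = 1$ for the chain killed at $z$ is one standard way to obtain expected hitting times on a finite state space; it is used here purely for the check and is not taken from the text.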
6.4 Ergodicity of degree 2

An ergodic Markov chain which is recurrent of degree 2 is called ergodic of degree 2. Recall from Proposition 5.16 that recurrence of degree 2 means that the invariant probability distribution $\pi$ is regular. Thus we obtain as an immediate consequence of the results of the preceding section the following:

Corollary 6.9. Suppose that $(X_n)$ is ergodic of degree 2.

(i) If $\lambda$ is regular then
$$\sum_0^\infty \iint \lambda(dx)\pi(dy)\,\|P^n(x, \cdot) - P^n(y, \cdot)\| < \infty,$$
$$\lim_{n\to\infty} n\,\|\lambda P^n - \pi\| = 0$$
and
$$\sum_0^\infty (\lambda P^n f - \pi(f)) = \lambda(I - 1 \otimes \pi)\,G_{m_0,s,\nu}\,(I - 1 \otimes \pi) f \quad \text{for all } f \in b\mathcal{E}.$$

(ii) For all $x \in R_1$:
$$\sum_0^\infty \|P^n(x, \cdot) - \pi\| < \infty$$
and
$$\lim_{n\to\infty} n\,\|P^n(x, \cdot) - \pi\| = 0. \qquad \square$$
In addition to the above results we can prove that, if $(X_n)$ is ergodic of degree 2, then the rate of convergence in (6.17) is $n^{-2}$ whenever the initial distributions $\lambda$ and $\mu$ are regular of degree 2:

Theorem 6.13. Suppose that the Markov chain $(X_n)$ is ergodic of degree 2.

(i) If $\lambda$ and $\mu$ are regular of degree 2 then
$$\sum_n n \iint \lambda(dx)\mu(dy)\,\|P^n(x, \cdot) - P^n(y, \cdot)\| < \infty, \qquad (6.22)$$
$$\sum_n n\,\|\lambda P^n - \mu P^n\| < \infty$$
and
$$\lim_{n\to\infty} n^2\,\|\lambda P^n - \mu P^n\| = 0.$$

(ii) For all $x, y$ belonging to the full set of states that are regular of degree 2:
$$\sum_n n\,\|P^n(x, \cdot) - P^n(y, \cdot)\| < \infty$$
and
$$\lim_{n\to\infty} n^2\,\|P^n(x, \cdot) - P^n(y, \cdot)\| = 0.$$
Proof We shall prove only (6.22), since the other results are immediate consequences of it. It is sufficient to consider only the case m o = 1 (cf. the proof of Theorem 6.11). By the inequality (6.20) in i 2(d x) II P"(x, .) — v P" - 1 II 1 co .
ymp(dx)ia(x)*u m — um ' 1 -21-(E 2 7',2 — E,t TO Var (u) + EA Ty Var (2) (u) < cc. Example 6.3. (e) (The renewal process). Suppose that F is spread-out and that M (F2) = EE[z] = i t2F(dt) is finite. Then the convergence in (6.21) has rate C I . We leave it to the reader to work out the other examples.
6.5 Geometric ergodicity

A geometrically recurrent, ergodic Markov chain is called geometrically ergodic. We shall show that, if the Markov chain $(X_n)$ is geometrically ergodic, then for $\pi$-almost all $x \in E$ the $n$-step transition probabilities $P^n(x, A)$ tend to their stationary limits $\pi(A)$ with a (common) geometric rate and uniformly over $A \in \mathcal{E}$. In fact, we have the following result:

Theorem 6.14. Suppose that $(X_n)$ is ergodic. Each of the following three conditions is equivalent to the geometric ergodicity of $(X_n)$:

(i) The embedded renewal sequence $u_0 = 1$, $u_n = \nu P^{(n-1)m_0} s$ for $n \ge 1$, is geometrically ergodic.

(ii) For some small function $s'$, some set $B \in \mathcal{E}^+$, there exist functions $M$ and $\rho$ such that $M < \infty$ and $\rho < 1$ on $B$, and
$$|P^n s'(x) - \pi(s')| \le M(x)(\rho(x))^n \quad \text{for all } x \in B,\ n \ge 0.$$

(iii) There is a function $M \in \mathcal{L}^1_+(\pi)$ and a constant $\rho < 1$ such that
$$\|P^n(x, \cdot) - \pi\| \le M(x)\rho^n \quad \text{for all } x \in E,\ n \ge 0.$$
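In the proof we need the following:

Lemma 6.2. Suppose that $m_0 = 1$, i.e., $(s, \nu)$ is an atom. Let $G_\alpha^{(r)} = G_{s,\nu}^{(r)} = \sum_0^\infty r^n (P - s \otimes \nu)^n$. Then, for any constant $r > 1$, function $f \in \mathcal{E}_+$, measure $\lambda \in \mathcal{M}_+$:
$$\lambda G_\alpha^{(r)} s < \infty \quad \text{implies} \quad \lambda G_\alpha^{(r)} 1 < \infty,$$
and
$$\nu G_\alpha^{(r)} f < \infty \quad \text{implies} \quad \pi G_\alpha^{(r)} f < \infty.$$

Proof. By the definition of the kernel $G_\alpha^{(r)}$ and by Proposition 4.8 we have
$$\lambda G_\alpha^{(r)} 1 = \sum_{n=0}^\infty \sum_{m=0}^\infty r^n \lambda (P - s \otimes \nu)^n (P - s \otimes \nu)^m s = \sum_{n=0}^\infty r^{-n} \sum_{m=n}^\infty r^m \lambda (P - s \otimes \nu)^m s \le (r - 1)^{-1} r\,\lambda G_\alpha^{(r)} s,$$
which proves the first implication. The proof of the second is similar. $\square$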
Proof of Theorem 6.14. We start by proving that (i) implies (iii). Suppose for a while that $m_0 = 1$, and let $-1 \le f \le 1$ be arbitrary. By Theorem 4.1 and by (5.7)
$$|P^n f(x) - \pi(f)| = \Big| E_x[f(X_n); T_\alpha \ge n] + a(x) * u * \sigma(f)_{n-1} - \pi(s) \sum_0^\infty \sigma_m(f) \Big|$$
$$\le P_x\{T_\alpha \ge n\} + |a(x) * u - \pi(s)1| * \sigma(E)_{n-1} + \pi(s) \sum_n^\infty \sigma_m(E).$$
Since $\sum_0^\infty a_m(x) = 1$ and $\pi(s) = M_b^{-1}$, we obtain further, writing $A_n(x) = \sum_{n+1}^\infty a_m(x)$,
$$|a(x) * u_n - \pi(s)| \le a(x) * |u - M_b^{-1}|_n + \pi(s) A_n(x).$$
Consequently,
f
ir(dx) II P(x, ') — m II < P{ To, n} + m(a)*Iu — M I* o - (E),, _ 1 + 7r(s)ir(A)* o- (E),, _ 1 + n
from which it follows that for all r> 1, GO
yrn fir(dx) II Pn(x,.) — n II < nG ; ( )1
o
00
(r)s y rmlu,n — +rirG„
M b— 1 IVG Gc(r) 1
m=0 ± rnG2(r)SVG 6t(r) 1 ± TCGa(r) 1.
Now by Theorem 6.6 and by Lemma 6.2 there is a constant $r > 1$ such that the right hand side is finite. Set $M(x) = \sum_0^\infty r^n \|P^n(x, \cdot) - \pi\|$ and $\rho = r^{-1}$ to obtain (iii). In the case of an arbitrary $m_0$ the implication (i) $\Rightarrow$ (iii) follows by considering the $m_0$-step chain $(X_{nm_0})$ and by using the contractivity of $P$.

Trivially, (iii) implies both (i) and (ii).

Next we shall prove that (ii) implies (iii). Let $B' \in \mathcal{E}^+ \cap B$ be such that $M' = \sup_{x \in B'} M(x) < \infty$ and $\rho' = \sup_{x \in B'} \rho(x) < 1$. Let $m_0' \ge 1$, $\beta' > 0$ and $\nu' \in \mathcal{M}^+$ be such that the minorization condition $M(m_0', \beta', s', \nu')$ holds. Without any loss of generality we can assume that $\beta' = 1$ and that $\nu'$ is a probability measure concentrated on $B'$ (see Remarks 2.1(ii) and (iii)). Let
$$u_0' = 1, \quad u_n' = \nu' P^{(n-1)m_0'} s' \quad \text{for } n \ge 1,$$
be the embedded renewal sequence associated with the minorization condition $M(m_0', 1, s', \nu')$. By the hypothesis
$$|u_n' - u_\infty'| \le \int_{B'} \nu'(dx)\,|P^{(n-1)m_0'} s'(x) - \pi(s')| \le M' (\rho')^{(n-1)m_0'},$$
i.e., the embedded renewal sequence $u'$ is geometrically ergodic. Since (i) always implies (iii), regardless of the particular choice of the embedded renewal sequence, the proof of the implication (ii) $\Rightarrow$ (iii) is completed.
It remains to prove that geometric ergodicity is equivalent to (i). But this is a direct consequence of Proposition 5.20 and Lemma 5.7. $\square$

Example 6.4. (e) (The renewal process). Suppose that $F$ is spread-out and $E e^{\gamma z_1} = \int e^{\gamma t} F(dt)$ is finite for some constant $\gamma > 0$. Then the convergence in (6.21) has rate $e^{-\gamma' t}$, for some constant $\gamma' > 0$.
6.6 Uniform ergodicity

An aperiodic, uniformly recurrent Markov chain is called uniformly ergodic. We shall show that for a uniformly ergodic chain the operator norms
$$\|P^n - 1 \otimes \pi\| = \sup_{x \in E} \|P^n(x, \cdot) - \pi\|$$
converge to zero as $n \to \infty$. Note that, by contractivity, the convergence automatically has geometric rate. It also turns out that uniform ergodicity can be characterized as the smallness of the state space $E$.

Theorem 6.15. Either of the following two conditions is equivalent to the uniform ergodicity of $(X_n)$:

(i) There exist a probability measure $\pi$ on $(E, \mathcal{E})$, and constants $M < \infty$ and $\rho < 1$ such that
$$\|P^n - 1 \otimes \pi\| \le M\rho^n.$$

(ii) $P$ is stochastic and aperiodic, and the state space $E$ is small.

Proof. Suppose first that $(X_n)$ is uniformly ergodic. There is no loss of generality in assuming that $m_0 = 1$, i.e. $(s, \nu)$ is an atom. Since uniform recurrence means that the state space $E$ is special, Proposition 5.13(iii) states that $E_x T_\alpha = G_\alpha 1(x) - 1$ is bounded. Hence
$$\lim_{n\to\infty} \sup_{x \in E} P_x\{T_\alpha \ge n\} = 0. \qquad (6.23)$$
Consequently, by Theorem 6.5
$$\lim_{n\to\infty} \sup_{x \in E} |a(x) * u_n - \pi(s)| = 0. \qquad (6.24)$$
Using Theorem 4.1 we obtain
$$\|P^n - 1 \otimes \pi\| \le \sup_{x \in E} P_x\{T_\alpha \ge n\} + \sup_{x \in E} |a(x) * u - \pi(s)1| * \sigma(E)_{n-1} + \pi(s) \sum_n^\infty \sigma_m(E).$$
By (6.23) and (6.24) the right hand side tends to zero as $n \to \infty$. Thus we have (i).
If (i) holds, then $(X_n)$ is ergodic by Proposition 6.3, and
$$P^{n+m_0} \ge P^n s \otimes \nu \ge \tfrac{1}{2}\pi(s)\,1 \otimes \nu \quad \text{for } n \text{ sufficiently big;}$$
i.e., we have (ii). That (ii) implies uniform ergodicity follows by using criterion (iii) of Proposition 5.23. $\square$
Remarks 6.1. (i) That $E$ is small means the following: there exist an integer $m_0 \ge 1$, a constant $\beta > 0$ and a probability measure $\nu \in \mathcal{M}^+$ such that
$$P^{m_0}(x, \cdot) \ge \beta\nu(\cdot) \quad \text{for all } x \in E.$$

(ii) For a uniformly ergodic Markov chain the operator $G_{m_0,\beta 1,\nu} = \sum_{n=0}^\infty (P^{m_0} - \beta 1 \otimes \nu)^n (I + \cdots + P^{m_0-1})$ is bounded; its norm equals
$$\|G_{m_0,\beta 1,\nu}\| = G_{m_0,\beta 1,\nu} 1 = m_0 \beta^{-1}.$$
Note that the series $\sum_{n=0}^\infty (P^n - 1 \otimes \pi)$ converges absolutely (in the operator norm $\|\cdot\|$); by Corollary 6.9 its sum equals
$$\sum_{n=0}^\infty (P^n - 1 \otimes \pi) = (I - 1 \otimes \pi)\,G_{m_0,\beta 1,\nu}\,(I - 1 \otimes \pi).$$
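To make the minorization condition of Remarks 6.1(i) concrete, here is a minimal sketch for a finite chain with $m_0 = 1$. The matrix `P` is hypothetical. Taking $\beta\nu(y) = \min_x P(x, y)$ is one simple way to obtain a valid pair $(\beta, \nu)$; the bound $2(1 - \beta)^n$ checked below is the classical Doeblin coupling estimate, quoted here as a known fact rather than derived from the text, with the norm taken as the sum of absolute differences.

```python
import numpy as np

# Hypothetical strictly positive transition matrix, so that E is small with m_0 = 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
n_states = P.shape[0]

# Minorization P(x, .) >= beta * nu(.) for all x: take column-wise minima.
col_min = P.min(axis=0)
beta = col_min.sum()
nu = col_min / beta

# Stationary distribution from pi P = pi, sum(pi) = 1 (least squares).
A = np.vstack([P.T - np.eye(n_states), np.ones((1, n_states))])
b = np.zeros(n_states + 1)
b[-1] = 1.0
pi, *_ = np.linalg.lstsq(A, b, rcond=None)

# Operator norm sup_x sum_y |P^n(x, y) - pi(y)| against the geometric bound.
Pn = np.eye(n_states)
norms, bounds = [], []
for n in range(21):
    norms.append(np.abs(Pn - pi).sum(axis=1).max())
    bounds.append(2.0 * (1.0 - beta) ** n)
    Pn = Pn @ P

print("beta =", beta, "nu =", nu)
print("norm at n = 5:", norms[5], " bound:", bounds[5])
```

The bound is usually far from sharp; it only illustrates why smallness of the whole state space forces a geometric rate that is uniform in the initial state.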
6.7 R-ergodic kernels

We shall now consider a general $R$-recurrent kernel $K$. The kernel $K$ is called $R$-ergodic, if it is aperiodic and $R$-positive recurrent. By Proposition 5.4 the transformed kernel
$$\tilde{P} = R\,I_{h^{-1}} K I_h,$$
where $h$ is the (essentially unique) minimal $R$-invariant function, is the transition probability of an ergodic Markov chain $(\tilde{X}_n)$ with state space $\bar{E}$. If $K$ is $R$-ergodic, then by Corollary 6.3 and Proposition 6.2
$$\lim_{n\to\infty} \|R^n K^n(x, \cdot) - h(x)\pi\|_h = 0 \quad \text{for all } x \in \bar{E}$$
and
$$\lim_{n\to\infty} R^n K^n f(x) = \infty \quad \text{for all } x \in (\bar{E})^c,\ f \in \mathcal{E}^+.$$
Hence, in particular, $\liminf_{n\to\infty} R^n K^n f(x) > 0$ for all $x \in E$, $f \in \mathcal{E}^+$. There is also a converse result to this:
Proposition 6.4. Suppose that for some constant $0 < R < \infty$, some $\sigma$-finite measure $\varphi \in \mathcal{M}^+$:

(i) For all $0 < r < R$,
$$\lim_{n\to\infty} r^n K^n(x, A) = 0 \quad \text{for some } x \in E, \text{ some } \varphi\text{-positive } A \in \mathcal{E};$$
and

(ii)
$$\liminf_{n\to\infty} R^n K^n(x, A) > 0 \quad \text{for all } x \in E, \text{ all } \varphi\text{-positive } A \in \mathcal{E}.$$

Then the kernel $K$ is $R$-ergodic.

Proof. By (ii) $K$ is $\varphi$-irreducible and aperiodic. (i) and (ii) together imply that $R$ is the convergence parameter of $K$, and that $K$ is $R$-recurrent. By Theorem 6.9 $K$ is $R$-positive recurrent. $\square$
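The 'transformation' of Corollary 6.8(ii) from the Markov chain $(\tilde{X}_n)$ to the kernel $K$ leads to the following:

Corollary 6.10. Suppose that $K$ is $R$-ergodic. Then for $\pi$-almost all $x, y \in E$:
$$\sum_{n=0}^\infty R^n \|h(x)^{-1} K^n(x, \cdot) - h(y)^{-1} K^n(y, \cdot)\|_h < \infty. \qquad \square$$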
We call $K$ geometrically (resp. uniformly) $R$-ergodic, if it is aperiodic and geometrically (resp. uniformly) recurrent. It follows from Proposition 5.25 and Corollary 5.7 that an irreducible kernel $K$ is geometrically (resp. uniformly) $R$-ergodic if and only if the Markov chain $(\tilde{X}_n)$ is geometrically (resp. uniformly) ergodic. The 'transformation' of Theorems 6.14 and 6.15 gives us the following two corollaries:

Corollary 6.11. Suppose that $K$ is irreducible. Either of the following two conditions is equivalent to the geometric $R$-ergodicity of $K$:

(i) $K$ is aperiodic and the renewal sequence
$$\tilde{u}_0 = 1, \quad \tilde{u}_n = R^{nm_0} \nu K^{(n-1)m_0} s \quad \text{for } n \ge 1,$$
is geometrically ergodic.

(ii) $K$ is $R$-ergodic, and there exist a function $M \in \mathcal{L}^1_+(\pi)$ and a constant $\rho < 1$ such that
$$\|R^n K^n(x, \cdot) - h(x)\pi\|_h \le M(x)\rho^n \quad \text{for all } x \in E,\ n \ge 0. \qquad \square$$

If $K$ is uniformly ergodic then even the operator norms
$$\|R^n K^n - h \otimes \pi\|_h \stackrel{\mathrm{def}}{=} \sup_{x \in E} \|R^n h(x)^{-1} K^n(x, \cdot) - \pi\|_h$$
converge:

Corollary 6.12. Either of the following two conditions is equivalent to the uniform $R$-ergodicity of $K$:

(i) $K$ is $R$-ergodic and there exist constants $M < \infty$ and $\rho < 1$ such that
$$\|R^n K^n - h \otimes \pi\|_h \le M\rho^n \quad \text{for all } n \ge 0.$$

(ii) $K$ is $R$-ergodic and the $R$-invariant function $h$ is small, i.e., there exist an integer $m_0 \ge 1$, constant $\beta > 0$ and measure $\nu \in \mathcal{M}^+$ such that $K^{m_0} \ge \beta h \otimes \nu$. $\square$
As an application of the concept of $R$-ergodicity we shall consider the so-called quasi-stationary distributions of an irreducible Markov chain $(X_n)$ with transition probability $P$. The following considerations are of interest only if $P$ is properly substochastic, i.e., $\{P1 < 1\} \in \mathcal{E}^+$. Fix a set $B \in \mathcal{E}^+$. For any $n \ge 0$, $x \in E$, $A \in \mathcal{E} \cap B$, the ratio $P^n(x, A)/P^n(x, B)$ can be interpreted as the conditional probability
$$\frac{P^n(x, A)}{P^n(x, B)} = P_x\{X_n \in A \mid X_n \in B\}. \qquad (6.25)$$
If there is a full set $F \in \mathcal{E}$ such that these conditional probabilities tend to a limit as $n \to \infty$, which is independent of the initial state $x \in F$, and if the limit, considered as a function of $A$, is a probability measure on $(B, \mathcal{E} \cap B)$, then the limit is called the quasi-stationary distribution on $B$. Using Corollary 6.3 we obtain a set of sufficient conditions for the existence of the quasi-stationary distribution on the given set $B$:

Corollary 6.13. Suppose that the transition probability $P$ is $R$-ergodic for some $R > 1$. Let $h$ be the (essentially unique) minimal $R$-invariant function and let $\pi$ be the (essentially unique) $R$-invariant measure for $P$. If 0
and infh > 0, B
then the quasi-stationary distribution on $B$ exists and equals $\pi(A)/\pi(B)$, $A \in \mathcal{E} \cap B$. Moreover the convergence of (6.25) to its limit $\pi(A)/\pi(B)$ is uniform over $A \in \mathcal{E} \cap B$, for any $x \in F = \bar{E}$. $\square$

Note that if $B \in \mathcal{E}^+$ is a small set then it satisfies the hypotheses of the above corollary. Hence in particular, if $P$ is $R$-ergodic and the whole state space $E$ is small, then the quasi-stationary distribution on $E$ exists: for all $x \in E$,
$$\lim_{n\to\infty} P_x\{X_n \in A \mid L_E \ge n\} = \pi(A)/\pi(E) \quad \text{uniformly over } A \in \mathcal{E}.$$
(Here $L_E \stackrel{\mathrm{def}}{=} \sup\{n \ge 0 : X_n \in E\}$ is the lifetime of the Markov chain $(X_n)$.)
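For a finite, properly substochastic matrix the quasi-stationary distribution on the whole state space can be computed directly, which gives a quick sanity check of the limit above. The sketch below assumes a hypothetical irreducible aperiodic matrix `P` with row sums below one; in this finite setting the $R$-invariant measure $\pi$ is the left Perron eigenvector and $R$ is the reciprocal of the Perron eigenvalue.

```python
import numpy as np

# Hypothetical properly substochastic matrix (row sums 0.8, 0.9, 0.8).
P = np.array([[0.5, 0.3, 0.0],
              [0.2, 0.5, 0.2],
              [0.0, 0.3, 0.5]])

# Left Perron eigenvector: pi P = rho * pi, with rho the largest eigenvalue.
eigvals, left_vecs = np.linalg.eig(P.T)
k = int(np.argmax(eigvals.real))
rho = eigvals[k].real
R = 1.0 / rho                       # convergence parameter (finite case)
pi = left_vecs[:, k].real
pi = pi / pi.sum()                  # normalised: pi(A) / pi(E)

def conditional_law(P, x, n):
    """P_x{X_n in . | chain still alive at time n}, renormalised at every step."""
    mu = np.zeros(P.shape[0])
    mu[x] = 1.0
    for _ in range(n):
        mu = mu @ P
        mu = mu / mu.sum()
    return mu

print("R =", R)
print("quasi-stationary distribution:", pi)
for x in range(P.shape[0]):
    print("start", x, "->", conditional_law(P, x, 200))
```

Renormalising after each step is only a numerical convenience; by linearity it yields the same vector as $P^n(x, \cdot)/P^n(x, E)$.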
Miscellaneous limit theorems for Harris recurrent Markov chains

Throughout this last chapter we assume (without further mentioning) that $K = P$ is the transition probability of an aperiodic, Harris recurrent Markov chain $(X_n; n \ge 0)$. We assume that $P$ satisfies the minorization condition $M(m_0, 1, s, \nu)$. We fix an initial distribution $\lambda$ and a $\pi$-integrable function $f$.

In Sections 7.1 and 7.2 we shall examine the sums of transition probabilities $\sum_{n=0}^N \lambda P^n f$. In Section 6.3 we considered the positive recurrent case only; now we shall not exclude null recurrence. Section 7.1 deals with the differences $\sum_{n=0}^N (\lambda P^n f - \nu P^n f)$ whereas Section 7.2 is concerned with the convergence of the ratios $\sum_0^N \lambda P^n f / \sum_0^N \nu P^n s$, $N \to \infty$. In Section 7.3 we shall study the convergence of the individual ratios $\lambda P^n f / \nu P^n s$. Finally, in Section 7.4 we shall prove a central limit theorem for the sums $\sum_0^N f(X_n)$.
7.1 Sums of transition probabilities

In the following theorem sufficient conditions are given for the convergence of the series
$$\sum_{n=0}^\infty (\lambda P^n f - \nu P^n f).$$
Recall the definition of the kernel $G_{m_0,s,\nu} = \sum_0^\infty (P^{m_0} - s \otimes \nu)^n (I + \cdots + P^{m_0-1})$. Also recall from Proposition 5.13 that $\lambda$ is $|f|$-regular, if and only if $\lambda G_{m_0,s,\nu}|f|$ is finite.

Theorem 7.1. If $\pi(f) = 0$, $\lambda$ is $|f|$-regular, and
$$\sup_{N \ge 0} \Big| \sum_0^N \nu P^n f \Big| < \infty, \qquad (7.1)$$
then
$$\sum_0^\infty (\lambda P^n f - \nu P^n f) = \lambda G_{m_0,s,\nu} f. \qquad (7.2)$$
In particular, if $\pi(f) = 0$ and (7.1) holds, then there is a full set $F$ ($= R_{|f|}$) such that
$$\sum_0^\infty (P^n f(x) - \nu P^n f) = G_{m_0,s,\nu} f(x) \quad \text{for all } x \in F.$$

The proof is based on the following:
Lemma 7.1. Let $a = (a_n; n \ge 0)$ be a probability distribution on $\mathbb{N}$ and let $c = (c_n; n \ge 0)$ be an arbitrary bounded sequence satisfying
$$\lim_{n\to\infty} (c_n - c_{n-1}) = 0.$$
Then
$$\lim_{n\to\infty} (a * c_n - c_n) = 0.$$

Proof. For any $0 \le N \le n$, write
$$a * c_n - c_n = \sum_{m=0}^N a_m (c_{n-m} - c_n) + \sum_{m=N+1}^n a_m c_{n-m} - \sum_{m=N+1}^\infty a_m c_n,$$
and let first $n$ and then $N$ tend to $\infty$. $\square$
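A small numerical illustration of Lemma 7.1 may help fix ideas. In the sketch below the sequences are hypothetical choices made only for the example: $a$ is a geometric distribution on $\{0, 1, \ldots\}$ and $c_n = \sin\sqrt{n}$, which is bounded, has increments tending to zero, and does not converge, so the statement is not trivial for it.

```python
import numpy as np

N = 5000
n = np.arange(N + 1)

a = 0.5 ** (n + 1.0)          # geometric probabilities a_m = 2^{-(m+1)}, m >= 0
c = np.sin(np.sqrt(n))        # bounded, c_n - c_{n-1} -> 0, but c_n has no limit

# (a * c)_n = sum_{m=0}^{n} a_m c_{n-m}; keep the first N + 1 terms.
conv = np.convolve(a, c)[: N + 1]
gap = np.abs(conv - c)

print("max |a*c_n - c_n| over n in [100, 200]  :", gap[100:201].max())
print("max |a*c_n - c_n| over n in [4000, 5000]:", gap[4000:].max())
```

The second printed value should be much smaller than the first, reflecting that the increments of $c$ over the effective support of $a$ shrink as $n$ grows.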
Proof of Theorem 7.1. Suppose that $m_0 = 1$. Note that
$$\lim_{N\to\infty} \nu P^N f = 0.$$
Consider the difference
$$\sum_0^N \lambda P^n f - \sum_0^{N-1} \nu P^n f = E_\lambda \sum_0^{T_\alpha \wedge N} f(X_n) + \lambda(a) * u * \sigma(f) * 1_{N-1} - u * \sigma(f) * 1_{N-1}.$$
As $N \to \infty$ the first term on the right hand side tends to the finite limit $\lambda G_\alpha f = \lambda G_{s,\nu} f$. In order to see that the second term tends to zero set $a = \lambda(a)$ and $c = u * \sigma(f) * 1$ in Lemma 7.1.

The proof of the general case, $m_0$ arbitrary, follows by considering the $m_0$-step chain $(X_{nm_0})$ and the function $f + \cdots + P^{m_0-1} f$. $\square$
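Corollary 7.1. Suppose that $\pi(f) = 0$, $\lambda$ is $|f|$-regular and the series $\sum_0^\infty \nu P^n f$ converges. Then the series $\sum_0^\infty \lambda P^n f$ converges and
$$\sum_0^\infty \lambda P^n f = \sum_0^\infty \nu P^n f + \lambda G_{m_0,s,\nu} f.$$
In particular, if $\pi(f) = 0$ and $\sum_0^\infty \nu P^n f$ converges, then
$$\sum_0^\infty P^n f(x) = \sum_0^\infty \nu P^n f + G_{m_0,s,\nu} f(x) \quad \text{for all } x \in R_{|f|}. \qquad \square$$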
It turns out that, if the function $f \in \mathcal{L}^1(\pi)$ is special, then the hypothesis (7.1) of Theorem 7.1 holds true. We denote by $\|\cdot\|$ the supremum norm in the space $b\mathcal{E}$ of bounded measurable functions on $(E, \mathcal{E})$:
$$\|f\| = \sup_E |f|, \quad f \in b\mathcal{E}.$$

Theorem 7.2. If $f$ is special and $\pi(f) = 0$, then
$$\sup_{N \ge 0} \Big\| \sum_0^N P^n f \Big\| < \infty, \qquad (7.3)$$
and (7.2) holds for any initial distribution $\lambda$.
Proof. Without any loss of generality we may assume that $m_0 = 1$ (cf. the proof of Theorem 7.1). By hypothesis
$$\sum_0^\infty \sigma_n(f) = \pi(s)^{-1}\pi(f) = 0.$$
Set $B_n(f) = \sum_{m=n}^\infty \sigma_m(f) = \nu(P - s \otimes \nu)^n G_\alpha f$. It follows that
$$\sigma(f) * 1_{n-1} = -B_n(f) = -\nu(P - s \otimes \nu)^n G_\alpha f \quad \text{for all } n \ge 0,$$
and hence
$$|B_n(f)| \le \|G_\alpha f\|\,\nu(P - s \otimes \nu)^n 1 = \|G_\alpha f\|\,P_\nu\{S_\alpha \ge n + 1\} = \|G_\alpha f\|\,B_n$$
for all $n \ge 0$ (see (4.22)), and
$$\Big| \sum_0^{N-1} \nu P^n f \Big| = |u * \sigma(f) * 1_{N-1}| = |u * B(f)_N| \le \|G_\alpha f\|\,u * B_N = \|G_\alpha f\|$$
(recall (4.6)). The assertion (7.3) follows from the estimates
$$\Big| \sum_0^N P^n f(x) \Big| \le \Big| E_x \sum_0^{T_\alpha \wedge N} f(X_n) \Big| + |a(x) * u * \sigma(f) * 1_{N-1}| \le G_\alpha |f|(x) + \|G_\alpha f\|.$$
The second assertion (7.2) is a direct consequence of Theorem 7.1. $\square$
sup sup Y(7r(z)Pn(x, y) — n(y)Pn(x, z)) N>0 xeE 0
< c0
Ratios of sums of transition probabilities
129
and (writing Gx.(x, y) = Ex EoT rol an =) 00
y
[n(z)(Pn(x, Y) — P n(xo,Y)) — n(Y)( 13"(x,z) — P n(xo, z))] n=0 = n(z)Gx.(x,y)— m(y)G„ o(x,z).
7.2 Ratios of sums of transition probabilities

We shall start by proving a ratio limit theorem for renewal sequences. Let $u = (u_n; n \ge 0)$ be an aperiodic, recurrent renewal sequence and let $a = (a_n; n \ge 0)$ be an arbitrary delay distribution. If $u$ is positive recurrent, then by Theorem 6.2, the expectation of the mean number of renewal epochs during the interval $[0, N]$, that is
$$E_a\Big[(N + 1)^{-1} \sum_0^N 1_{\{Y_n = 1\}}\Big] = (N + 1)^{-1} \sum_0^N a * u_n,$$
tends to the limit $u_\infty = M_b^{-1} > 0$ as $N \to \infty$. It follows that the ratios
$$\frac{E_a \sum_0^N 1_{\{Y_n = 1\}}}{E_0 \sum_0^N 1_{\{Y_n = 1\}}} = \frac{\sum_0^N a * u_n}{\sum_0^N u_n} \qquad (7.4)$$
tend to the limit 1 as $N \to \infty$. The ratios (7.4) converge also in the case of null recurrence:
Proposition 7.1. Suppose that $u$ is an aperiodic, recurrent renewal sequence. Then for any delay distribution $a$,
$$\lim_{N\to\infty} \frac{\sum_0^N a * u_n}{\sum_0^N u_n} = 1.$$
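The ratio limit for renewal sequences can be observed numerically even in a null recurrent case. The sketch below is an illustration under stated assumptions: the increment distribution $b_k = 1/(k(k+1))$, $k \ge 1$, is a hypothetical choice (it sums to one, is aperiodic, and has infinite mean), the delay distribution is geometric, and the renewal sequence is computed from the standard renewal equation $u_n = \sum_{k=1}^n b_k u_{n-k}$, $u_0 = 1$.

```python
import numpy as np

def renewal_sequence(b, N):
    """u_0 = 1, u_n = sum_{k=1}^{n} b_k u_{n-k}, with b[k] = P(increment = k)."""
    u = np.zeros(N + 1)
    u[0] = 1.0
    for n in range(1, N + 1):
        # b_1..b_n paired with u_{n-1}, ..., u_0
        u[n] = np.dot(b[1:n + 1], u[n - 1::-1])
    return u

N = 2000
k = np.arange(1, N + 1)
b = np.zeros(N + 1)
b[1:] = 1.0 / (k * (k + 1.0))            # infinite mean: null recurrent case

a = 0.5 ** (np.arange(N + 1) + 1.0)      # geometric delay distribution on {0, 1, ...}

u = renewal_sequence(b, N)
delayed = np.convolve(a, u)[: N + 1]     # (a * u)_n for n = 0, ..., N
ratios = np.cumsum(delayed) / np.cumsum(u)

print("ratio (7.4) at N = 10, 100, 2000:", ratios[[10, 100, 2000]])
```

Only the values $b_1, \ldots, b_N$ enter $u_0, \ldots, u_N$, so truncating the increment distribution at $N$ does not affect the computed ratios.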
Proof. The ratio (7.4) can be written in the form
$$\frac{a * u * 1_N}{u * 1_N} = \frac{u * 1_N - A * u_N}{u * 1_N}.$$
(Here $A_n = \sum_{n+1}^\infty a_m$ as before.) Hence, what we have to prove is
$$\lim_{N\to\infty} \frac{A * u_N}{u * 1_N} = 0.$$
Let $N_0 \le N$ be a fixed integer. We have
$$\frac{A * u_N}{u * 1_N} \le \frac{\sum_{n=0}^{N_0-1} A_n u_{N-n}}{\sum_{n=0}^N u_n} + A_{N_0} \frac{\sum_{n=N_0}^N u_{N-n}}{\sum_0^N u_n}.$$
Now let $N$ and $N_0$ tend to $\infty$, and use the fact that $\sum_0^\infty u_n = \infty$ by recurrence. $\square$

For a positive Harris recurrent Markov chain $(X_n)$ the ratios
$$\frac{\sum_0^N \lambda P^n f}{\sum_0^N \nu P^n s}$$
tend to $\pi(f)/\pi(s)$ as $N \to \infty$. The following theorem states that under some additional hypotheses this result holds true also in the null recurrent case:

Theorem 7.3. If $\lambda$ is $|f|$-regular, then
$$\lim_{N\to\infty} \frac{\sum_0^N \lambda P^n f}{\sum_0^N \nu P^n s} = \frac{\pi(f)}{\pi(s)}. \qquad (7.5)$$
Proof. Without any loss of generality we may suppose that $m_0 = 1$. It also suffices to consider the case where the function $f$ is non-negative.

By Theorem 4.1
$$\frac{\sum_0^N \lambda P^n f}{\sum_0^N u_n} = \frac{E_\lambda \sum_0^{T_\alpha \wedge N} f(X_n)}{\sum_0^N u_n} + \frac{\lambda(a) * u * \sigma(f) * 1_{N-1}}{\sum_0^N u_n}.$$
The first term on the right hand side is dominated by $\lambda G_\alpha f / \sum_0^N u_n$ which by recurrence tends to zero as $N \to \infty$. By Proposition 7.1 the second term tends to the limit
$$\sum_0^\infty \lambda(a_m) \sum_0^\infty \sigma_n(f) = \pi(s)^{-1}\pi(f).$$
Consequently,
$$\lim_{N\to\infty} \frac{\sum_0^N \lambda P^n f}{\sum_0^N u_n} = \frac{\pi(f)}{\pi(s)},$$
from which the result follows. $\square$
Corollary 7.2. (i) If $f$ is special, then (7.5) holds for any initial distribution $\lambda$.

(ii) We have
$$\lim_{N\to\infty} \frac{\sum_0^N P^n f(x)}{\sum_0^N \nu P^n s} = \frac{\pi(f)}{\pi(s)} \quad \text{for } \pi\text{-almost all } x \in E.$$

Example 7.2. (a) Suppose that $P$ is an aperiodic Harris recurrent transition matrix. Then for all $x, y \in E$, $z \in E_\pi$:
$$\lim_{N\to\infty} \frac{\sum_0^N P^n(x, y)}{\sum_0^N P^n(z, z)} = \frac{\pi(y)}{\pi(z)}.$$
7.3 Ratios of transition probabilities

In the following theorem sufficient conditions are given for the convergence of the ratios $\lambda P^n f / \nu P^n s$.

Theorem 7.4. Suppose that $f \in \mathcal{L}^1_+(\pi)$ and that the embedded renewal sequence $u_0 = 1$, $u_n = \nu P^{(n-1)m_0} s$ for $n \ge 1$, satisfies
$$\lim_{n\to\infty} \frac{u_{n+1}}{u_n} = 1. \qquad (7.6)$$
Then:

(i) We have
$$\liminf_{n\to\infty} \frac{\lambda P^{nm_0} f}{\nu P^{nm_0} s} \ge \frac{\pi(f)}{\pi(s)}.$$

(ii) If $f$ is small and
$$\limsup_{n\to\infty} \frac{\lambda P^{nm_0} s}{\nu P^{nm_0} s} \le 1 \qquad (7.7)$$
then
$$\lim_{n\to\infty} \frac{\lambda P^{nm_0} f}{\nu P^{nm_0} s} = \frac{\pi(f)}{\pi(s)}. \qquad (7.8)$$

(iii) If $\lambda$ is small and
$$\limsup_{n\to\infty} \frac{\nu P^{nm_0} f}{\nu P^{nm_0} s} \le \frac{\pi(f)}{\pi(s)}$$
then (7.8) holds.

(iv) If $f$ and $\lambda$ are small, then (7.8) holds.

Proof. (i) It suffices to consider the case $m_0 = 1$. For any sequence $c = (c_n; n \ge 0)$, for any fixed $N \ge 0$, write $c^{(N)}$ for the truncated sequence
$$c^{(N)}_n = c_n 1_{\{n \le N\}}.$$
By Theorem 4.1
$$\lambda P^n f = \lambda(a^{(N)}) * u * \sigma^{(N)}(f)_{n-1} + r_{n,N}(\lambda, f) \qquad (7.9)$$
where $r_{n,N}(\lambda, f) \ge 0$. Hence, using (7.6), we have for any $N \ge 0$,
$$\liminf_{n\to\infty} \frac{\lambda P^n f}{\nu P^n s} = \liminf_{n\to\infty} \frac{\lambda P^n f}{u_{n+1}} \ge \lim_{n\to\infty} \frac{\lambda(a^{(N)}) * u * \sigma^{(N)}(f)_{n-1}}{u_{n+1}} = \sum_{m=0}^N \lambda(a_m) \sum_{n=0}^N \sigma_n(f) \to \frac{\pi(f)}{\pi(s)} \quad \text{as } N \to \infty.$$

(ii) Assume first that $m_0 = 1$. We obtain for all $m \ge 0$:
$$\lim_{N\to\infty} \Big( \lim_{n\to\infty} \frac{\lambda(a^{(N)}) * u * \sigma^{(N)}(P^m s)_{n-1}}{u_{n+1}} \Big) = \frac{\pi(P^m s)}{\pi(s)} = 1.$$
Since by the hypotheses
$$\limsup_{n\to\infty} \frac{\lambda P^{n+m} s}{u_{n+1}} \le 1 \quad \text{for any } m,$$
we have, by using (7.7) and (7.9) with $f = P^m s$,
$$\lim_{N\to\infty} \Big( \lim_{n\to\infty} \frac{r_{n,N}(\lambda, P^m s)}{u_{n+1}} \Big) = 0.$$
Since $f$ is small, there exist an integer $m \ge 0$ and a constant $\gamma > 0$ such that $f \le \gamma P^m s$. Consequently,
$$\lim_{N\to\infty} \Big( \limsup_{n\to\infty} \frac{r_{n,N}(\lambda, f)}{u_{n+1}} \Big) = 0,$$
and therefore
$$\lim_{n\to\infty} \frac{\lambda P^n f}{u_{n+1}} = \lim_{N\to\infty} \Big( \lim_{n\to\infty} \frac{\lambda(a^{(N)}) * u * \sigma^{(N)}(f)_{n-1}}{u_{n+1}} \Big) = \frac{\pi(f)}{\pi(s)}.$$
The assertion in the case $m_0 \ge 2$ follows after observing that for every $m \ge 0$, the function $P^m f$ is small for the $m_0$-step chain.

(iii) The proof is similar to that of part (ii).

(iv) The inequality (7.7) is trivially satisfied with $\lambda = \nu$. Hence by part (ii),
$$\lim_{n\to\infty} \frac{\nu P^n f}{\nu P^n s} = \frac{\pi(f)}{\pi(s)}.$$
The assertion now follows from part (iii). $\square$
In the following theorem a necessary and sufficient condition is given for the convergence to be valid for all initial distributions $\lambda$ and all small functions $f$.

Theorem 7.5. We have (7.8) for all $\lambda$ and all small $f$ if and only if
$$\limsup_{n\to\infty} \frac{\|P^{nm_0} s\|}{\nu P^{nm_0} s} < \infty. \qquad (7.10)$$

Proof. Suppose first that (7.10) holds, and let $N, \gamma < \infty$ be such that
$$\gamma = \sup_{n \ge N} \frac{\|P^{nm_0} s\|}{\nu P^{nm_0} s} < \infty.$$
For any fixed integer $m \ge 0$, any probability measure $\lambda$,
$$\limsup_{n\to\infty} \frac{\lambda P^{nm_0} s - \nu P^{nm_0} s}{\nu P^{nm_0} s} = \limsup_{n\to\infty} \frac{(\lambda P^{mm_0} - \nu P^{mm_0}) P^{(n-m)m_0} s}{u_{n-m+1}} \cdot \frac{u_{n-m+1}}{u_{n+1}} \le \gamma\,\|\lambda P^{mm_0} - \nu P^{mm_0}\|.$$
Letting $m \to \infty$ and using Corollary 6.7(i) we immediately get (7.7). This implies (7.8).

Suppose now that (7.10) does not hold. Then for any $m \ge 1$, there exist $x_m \in E$, $n_m \ge 0$, such that
$$P^{n_m m_0} s(x_m) > 2^{m+1}\,\nu P^{n_m m_0} s.$$
Define a discrete probability measure $\lambda$ on $(E, \mathcal{E})$ by setting $\lambda(\{x_m\}) = 2^{-m}$, $m \ge 1$. Then clearly,
$$\frac{\lambda P^{n_m m_0} s}{\nu P^{n_m m_0} s} \ge 2 \quad \text{for all } m \ge 1,$$
contradicting (7.8). $\square$

Example 7.3. (a) Suppose that $P$ is an aperiodic, recurrent transition matrix. If
$$\lim_{n\to\infty} \frac{P^{n+1}(z, z)}{P^n(z, z)} = 1 \quad \text{for some } z \in E_\pi,$$
then
$$\lim_{n\to\infty} \frac{P^n(x, y)}{P^n(z, z)} = \frac{\pi(y)}{\pi(z)} \quad \text{for all } x, y \in E_\pi.$$
7.4 A central limit theorem In this section we shall prove a central limit theorem for the functionals def
=
E f(x
), f e 0991 (70.
n= 0
def
Note that, when f = 1A , N (A) = (1 A) counts the number of visits by (X n ) to the set A during [0, N]. We write X(M, a') for a normal random variable with mean M and variance o- 2 .
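Theorem 7.6. Suppose that $(X_n)$ is ergodic. Then for any initial distribution $\lambda$, any $f \in \mathcal{L}^2(\pi)$ with $\pi(f) = 0$ such that the measure $\pi I_{|f|}$ is $|f|$-regular,
$$N^{-1/2}\,\xi_N(f) \to \mathcal{N}(0, \sigma_f^2) \quad \text{in } \mathbb{P}_\lambda\text{-distribution as } N \to \infty,$$
where the variance $\sigma_f^2$ is given by
$$\sigma_f^2 = \pi(f^2) + 2m_0^{-1}\Big[\sum_{m=1}^{m_0}(m_0 - m)\,\pi I_f P^m f + \sum_{m=1}^{m_0}\pi I_f P^m G_{m_0,s,\nu}\sum_{k=0}^{m_0-1}P^k f\Big].$$

Proof. We fix $\lambda$ and write simply $\mathbb{P}_\lambda = \mathbb{P}$. Since the $m_0$-step chain $(X_{nm_0}; n \geq 0)$ has the atom $(s, \nu)$ we can construct its split chain $(X_{nm_0}, Y_n; n \geq 0)$ in the manner described in Section 4.4. We shall next put the entire Markov chain $(X_n)$ and the incidence sequence $(Y_n)$ on the same probability space. This can be achieved by giving the conditional probabilities
$$\mathbb{P}\{Y_n = 1, X_{nm_0+1} \in dx_1, \ldots, X_{nm_0+m_0-1} \in dx_{m_0-1}, X_{(n+1)m_0} \in dy \mid \mathcal{F}^X_{nm_0} \vee \mathcal{F}^Y_{n-1};\ X_{nm_0} = x\}$$
$$= \mathbb{P}\{Y_0 = 1, X_1 \in dx_1, \ldots, X_{m_0-1} \in dx_{m_0-1}, X_{m_0} \in dy \mid X_0 = x\}$$
$$= r(x,y)\,P(x,dx_1)\cdots P(x_{m_0-1}, dy), \quad n \geq 0,\ x, y, \ldots \in E,$$
where $r \in \mathcal{E}\otimes\mathcal{E}$ is the Radon-Nikodym derivative
$$r(x,y) = \frac{s(x)\nu(dy)}{P^{m_0}(x,dy)}.$$
It is easy to see that, for every $q \geq 1$, the conditional distribution of the post-$qm_0$-chain $(X_{qm_0+n}; n \geq 0)$ given $\mathcal{F}^X_{(q-1)m_0}\vee\mathcal{F}^Y_{q-1}$ and given $Y_{q-1} = 1$ is the same as the $\mathbb{P}_\nu$-distribution of $(X_n; n \geq 0)$:
$$\mathcal{L}(X_{qm_0+n}; n \geq 0 \mid \mathcal{F}^X_{(q-1)m_0}\vee\mathcal{F}^Y_{q-1};\ Y_{q-1} = 1) = \mathcal{L}_\nu(X_n; n \geq 0). \tag{7.11}$$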
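The splitting mechanism can be made concrete for a finite state space with $m_0 = 1$. The following is a minimal sketch under stated assumptions: the 3-state matrix is hypothetical, and the pair $(s, \nu)$ is built from the column minima of $P$, which is only one of many possible minorizations. It checks that the regeneration probability given $X_0 = x$ is $s(x)$ and that, given a regeneration, the next state has law $\nu$ whatever $x$ is, which is the finite analogue of (7.11).

```python
from fractions import Fraction as F

# Hypothetical 3-state transition matrix (rows sum to 1, all entries positive).
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 3), F(1, 3), F(1, 3)]]
n = len(P)

# Minorization P(x, .) >= s(x) nu(.) with m0 = 1, built from column minima.
col_min = [min(P[x][y] for x in range(n)) for y in range(n)]
beta = sum(col_min)                  # total mass of the minorizing part
nu = [c / beta for c in col_min]     # probability measure nu
s = [beta] * n                       # constant small function s

def r(x, y):
    """Radon-Nikodym derivative r(x, y) = s(x) nu(y) / P(x, y)."""
    return s[x] * nu[y] / P[x][y]

def joint(x, y, regen):
    """P{Y_0 = regen, X_1 = y | X_0 = x} for the split chain."""
    weight = r(x, y) if regen == 1 else 1 - r(x, y)
    return weight * P[x][y]

for x in range(n):
    # Given a regeneration (Y_0 = 1), the law of X_1 does not depend on x.
    print(x, [joint(x, y, 1) / s[x] for y in range(n)])
```

Exact rational arithmetic is used so that the identities hold without rounding error.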
For every $N, i \geq 1$, let us write
$$i(N) = \sup\{i \geq 0 : (T_\alpha(i)+1)m_0 \leq N\},$$
$$L(N) = T_\alpha(i(N)) = \sup\{n \geq 0 : (n+1)m_0 \leq N,\ Y_n = 1\},$$
$$\zeta(i) = \sum_{n=(T_\alpha(i-1)+1)m_0}^{(T_\alpha(i)+1)m_0-1} f(X_n) = \sum_{n=T_\alpha(i-1)+1}^{T_\alpha(i)} Z_n,$$
where
$$Z_n = \sum_{m=0}^{m_0-1} f(X_{nm_0+m}).$$
We can decompose the sum $\xi_N(f) = \sum_{n=0}^{N} f(X_n)$ as follows:
$$\xi_N(f) = \sum_{n=0}^{T_\alpha m_0+m_0-1} f(X_n) + \sum_{i=1}^{i(N)}\zeta(i) + \sum_{n=(L(N)+1)m_0}^{N} f(X_n). \tag{7.12}$$
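By (7.11) the classes $\{\zeta(j); 1 \leq j \leq i-2\}$ and $\{\zeta(j); j \geq i\}$ are independent for every $i \geq 3$. The (unconditional) distribution of every $\zeta(i)$, $i \geq 1$, is the same as the $\mathbb{P}_\alpha$-distribution of the functional $\sum_{n=m_0}^{S_\alpha m_0+m_0-1} f(X_n)$: for every $i \geq 1$,
$$\mathcal{L}(\zeta(i)) = \mathcal{L}_\alpha\Big(\sum_{n=m_0}^{S_\alpha m_0+m_0-1} f(X_n)\Big) = \mathcal{L}_\nu\Big(\sum_{n=0}^{T_\alpha m_0+m_0-1} f(X_n)\Big) \quad \text{by (7.11),}$$
$$= \mathcal{L}_\nu\Big(\sum_{n=0}^{T_\alpha} Z_n\Big). \tag{7.13}$$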
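Thus, for example,
$$\mathbb{E}\zeta(i) = \mathbb{E}_\nu\Big[\sum_{n=0}^{T_\alpha}\sum_{m=0}^{m_0-1} f(X_{nm_0+m})\Big]$$
$$= \sum_{n=0}^{\infty}\int \mathbb{E}_\nu\Big[\sum_{m=0}^{m_0-1} f(X_{nm_0+m});\ Y_0 = \cdots = Y_{n-1} = 0;\ X_{nm_0} \in dy\Big]$$
$$= \pi(s)^{-1}\int \pi(dy)\,\mathbb{E}_y\sum_{m=0}^{m_0-1} f(X_m) \quad \text{by conditioning w.r.t. } \mathcal{F}^X_{nm_0}\vee\mathcal{F}^Y_{n-1} \text{ and using (7.11),}$$
$$= \pi(s)^{-1}\,m_0\,\pi(f) = 0 \quad \text{by the hypothesis.} \tag{7.14}$$
The first term on the right hand side of (7.12) is dominated by
$$\sum_{n=0}^{T_\alpha m_0+m_0-1} |f(X_n)|,$$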
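which does not depend on $N$ and is finite almost surely. Therefore, when divided by $N^{1/2}$, it tends to zero in probability as $N \to \infty$.

Let $c > 0$ be an arbitrary constant. For the third term we obtain (the notation $[t]$ means the integer part of the real number $t \in \mathbb{R}_+$):
$$\mathbb{P}\Big\{\Big|\sum_{n=(L(N)+1)m_0}^{N} f(X_n)\Big| > c\Big\} \leq \mathbb{P}\Big\{\sum_{n=(L(N)+1)m_0}^{N} |f(X_n)| > c\Big\}$$
$$= \sum_{m=1}^{[N/m_0]} \mathbb{P}\Big\{\sum_{n=([N/m_0]-m+1)m_0}^{N} |f(X_n)| > c;\ L(N) = [N/m_0] - m\Big\}$$
$$= \sum_{m=1}^{[N/m_0]} \mathbb{P}\{Y_{[N/m_0]-m} = 1\}\,\mathbb{P}_\alpha\Big\{\sum_{n=m_0}^{N-([N/m_0]-m)m_0} |f(X_n)| > c;\ S_\alpha \geq m\Big\}$$
$$\leq \sum_{m=1}^{\infty} \mathbb{P}_\alpha\Big\{\sum_{n=m_0}^{mm_0+m_0-1} |f(X_n)| > c;\ S_\alpha \geq m\Big\}$$
$$< \infty \quad \text{by positive recurrence.}$$
Hence, by the monotone convergence theorem
$$\lim_{c\to\infty}\sum_{m=1}^{\infty} \mathbb{P}_\alpha\Big\{\sum_{n=m_0}^{mm_0+m_0-1} |f(X_n)| > c;\ S_\alpha \geq m\Big\} = 0.$$
It follows that
$$N^{-1/2}\sum_{n=(L(N)+1)m_0}^{N} f(X_n) \to 0 \quad \text{in probability as } N \to \infty.$$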
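Consequently, it is sufficient to prove that
$$N^{-1/2}\sum_{i=1}^{i(N)} \zeta(i) \to \mathcal{N}(0, \sigma_f^2) \quad \text{in distribution as } N \to \infty. \tag{7.15}$$
We shall first prove the asymptotic normality of the random variables
$$l^{-1/2}\sum_{i=1}^{l} \zeta(i) \quad \text{as } l \to \infty.$$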
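Let $m \geq 2$ be an arbitrary fixed integer. We can write
$$\sum_{i=1}^{l} \zeta(i) = \sum_{j=0}^{[l/m]-1} \eta(j) + \sum_{j=1}^{[l/m]-1} \zeta(jm) + \sum_{i=[l/m]m}^{l} \zeta(i), \tag{7.16}$$
where the random variables
$$\eta(j) \overset{\text{def}}{=} \zeta(jm+1) + \zeta(jm+2) + \cdots + \zeta(jm+m-1), \quad j \geq 0,$$
are i.i.d. with common mean
$$\mathbb{E}\eta(j) = \mathbb{E}\eta(0) = (m-1)\mathbb{E}\zeta(1) = 0 \quad \text{(see (7.14))}$$
and variance
$$\sigma_m^2 = \mathbb{E}\eta(0)^2 = \mathbb{E}(\zeta(1) + \cdots + \zeta(m-1))^2 = (m-1)\mathbb{E}\zeta(1)^2 + 2(m-2)\mathbb{E}\zeta(1)\zeta(2),$$
and where the random variables $\zeta(jm)$, $j \geq 1$, are i.i.d. with zero mean and variance $\mathbb{E}\zeta(jm)^2 = \mathbb{E}\zeta(1)^2$.

Now divide both sides of (7.16) by $l^{1/2}$ and let $l \to \infty$ ($m$ is fixed). By the central limit theorem for i.i.d. random variables the first term on the right hand side tends to $\mathcal{N}(0, m^{-1}\sigma_m^2)$. Similarly, the second term tends to $\mathcal{N}(0, m^{-1}\mathbb{E}\zeta(1)^2)$. It is easy to see that the third term tends to zero in probability. If we now let $m \to \infty$, then
$$\mathcal{N}(0, m^{-1}\sigma_m^2) \to \mathcal{N}(0, \bar\sigma^2) \quad \text{in distribution,}$$
where
$$\bar\sigma^2 = \lim_{m\to\infty} m^{-1}\sigma_m^2 = \mathbb{E}\zeta(1)^2 + 2\mathbb{E}\zeta(1)\zeta(2),$$
and
$$\mathcal{N}(0, m^{-1}\mathbb{E}\zeta(1)^2) \to 0 \quad \text{in probability.}$$
Consequently,
$$l^{-1/2}\sum_{i=1}^{l} \zeta(i) \to \mathcal{N}(0, \bar\sigma^2) \quad \text{in distribution as } l \to \infty.$$
In order to prove (7.15), note first that by the strong law of large numbers for i.i.d. random variables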
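$$\lim_{N\to\infty} \frac{i(N)m_0}{N} = \pi(s) \quad \text{almost surely.}$$
Hence, for any fixed $\varepsilon > 0$, there exist an integer $N(\varepsilon)$ and a set $A(\varepsilon) \in \mathcal{F}$ such that $\mathbb{P}(A(\varepsilon)^c) < \varepsilon$, and for all $N \geq N(\varepsilon)$, $\omega \in A(\varepsilon)$:
$$\underline{N} \overset{\text{def}}{=} [m_0^{-1}(1-\varepsilon)\pi(s)N] + 2 \leq i(N) \leq [m_0^{-1}(1+\varepsilon)\pi(s)N] \overset{\text{def}}{=} \overline{N}.$$
Clearly, for such $N$ and $\omega$,
$$\Big|\sum_{i=1}^{i(N)} \zeta(i) - \sum_{i=1}^{[m_0^{-1}\pi(s)N]} \zeta(i)\Big| \leq 2\max_{\underline{N} \leq j \leq \overline{N}} \Big|\sum_{i=\underline{N}}^{j} \zeta(i)\Big|.$$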
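Let $c > 0$ be arbitrary. Since any $\zeta(i)$, $i \in \mathcal{I}$, are i.i.d. whenever $\mathcal{I}$ does not contain two consecutive indices, we obtain, by using Kolmogorov's inequality,
$$\mathbb{P}\Big\{\max_{\underline{N} \leq j \leq \overline{N}} \Big|\sum_{i=\underline{N}}^{j} \zeta(i)\Big| > N^{1/2}c\Big\}$$
$$\leq \mathbb{P}\Big\{\max_{\underline{N} \leq j \leq \overline{N}} \Big|\sum_{\substack{\underline{N} \leq i \leq j \\ i \text{ even}}} \zeta(i)\Big| > \tfrac{1}{2}N^{1/2}c\Big\} + \mathbb{P}\Big\{\max_{\underline{N} \leq j \leq \overline{N}} \Big|\sum_{\substack{\underline{N} \leq i \leq j \\ i \text{ odd}}} \zeta(i)\Big| > \tfrac{1}{2}N^{1/2}c\Big\}$$
$$\leq 4N^{-1}c^{-2}(\overline{N} - \underline{N} + 1)\,\mathbb{E}\zeta(1)^2 \leq 8c^{-2}m_0^{-1}\varepsilon\,\pi(s)\,\mathbb{E}\zeta(1)^2.$$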
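Consequently,
$$N^{-1/2}\sum_{i=1}^{i(N)} \zeta(i) - N^{-1/2}\sum_{i=1}^{[m_0^{-1}\pi(s)N]} \zeta(i) \to 0 \quad \text{in probability as } N \to \infty.$$
It follows that
$$\lim_{N\to\infty} N^{-1/2}\sum_{n=0}^{N} f(X_n) = \lim_{N\to\infty} N^{-1/2}\sum_{i=1}^{i(N)} \zeta(i) = \lim_{N\to\infty} N^{-1/2}\sum_{i=1}^{[m_0^{-1}\pi(s)N]} \zeta(i)$$
$$= (m_0^{-1}\pi(s))^{1/2}\lim_{l\to\infty} l^{-1/2}\sum_{i=1}^{l} \zeta(i) = \mathcal{N}(0, \sigma^2),$$
where
$$\sigma^2 = m_0^{-1}\pi(s)\,\bar\sigma^2 = m_0^{-1}\pi(s)\,[\mathbb{E}\zeta(1)^2 + 2\mathbb{E}\zeta(1)\zeta(2)].$$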
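The blocking argument behind (7.16) can be checked on a toy model. The sketch below assumes a hypothetical 1-dependent sequence $\zeta(i) = e_i + \theta e_{i+1}$ with i.i.d. $e_i$ of mean 0 and variance 1 (this model is not from the text; it merely shares the independence structure of the $\zeta(i)$). It verifies that the three index sets in (7.16) partition $\{1, \ldots, l\}$ when $l \geq m$, that the block variance equals $(m-1)\mathbb{E}\zeta(1)^2 + 2(m-2)\mathbb{E}\zeta(1)\zeta(2)$, and that $m^{-1}\sigma_m^2$ approaches $\mathbb{E}\zeta(1)^2 + 2\mathbb{E}\zeta(1)\zeta(2)$.

```python
from fractions import Fraction

def blocks(l, m):
    """Index sets of the three sums in (7.16); assumes l >= m >= 2."""
    q = l // m
    eta = [[j * m + k for k in range(1, m)] for j in range(q)]  # eta(j), j = 0..q-1
    gaps = [j * m for j in range(1, q)]                         # zeta(jm), j = 1..q-1
    tail = list(range(q * m, l + 1))                            # i = [l/m]m, ..., l
    return eta, gaps, tail

def var_eta(m, theta):
    """Exact variance of zeta(1) + ... + zeta(m-1) in the toy model
    zeta(i) = e_i + theta * e_{i+1}, obtained from the coefficients of e_1..e_m."""
    coef = [Fraction(0)] * (m + 1)
    for i in range(1, m):
        coef[i] += 1
        coef[i + 1] += theta
    return sum(c * c for c in coef)

theta = Fraction(1, 2)
var_zeta = 1 + theta ** 2      # E zeta(1)^2 in the toy model
cov_zeta = theta               # E zeta(1) zeta(2) in the toy model
for m in (2, 5, 50, 500):
    print(m, float(var_eta(m, theta) / m), float(var_zeta + 2 * cov_zeta))
```

The second printed column is the limit $\bar\sigma^2$; the first approaches it at rate $1/m$, which is why $m$ is sent to infinity only after $l$.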
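It remains to calculate the variance $\mathbb{E}\zeta(1)^2$ and the covariance $\mathbb{E}\zeta(1)\zeta(2)$. (To see that the use of Fubini's theorem is justified below, the reader is recommended to make the calculations also with $|f|$ instead of $f$ and to use Proposition 5.13.) For the variance we have by (7.13)
$$\mathbb{E}\zeta(1)^2 = \mathbb{E}_\nu\Big(\sum_{n=0}^{T_\alpha} Z_n\Big)^2$$
$$= \sum_{n=0}^{\infty} \mathbb{E}_\nu[Z_n^2;\ T_\alpha \geq n] + 2\sum_{n=0}^{\infty}\sum_{m=n+1}^{\infty} \mathbb{E}_\nu[Z_nZ_m;\ T_\alpha \geq m]$$
$$= \sum_{n=0}^{\infty} \mathbb{E}_\nu[Z_n^2;\ T_\alpha \geq n] + 2\sum_{n=0}^{\infty} \mathbb{E}_\nu\Big[Z_n\sum_{m=n+1}^{\infty} Z_m 1_{\{Y_n = \cdots = Y_{m-1} = 0\}};\ T_\alpha \geq n\Big]$$
$$= \mathbb{E}_\nu\sum_{n=0}^{T_\alpha} \mathbb{E}_{X_{nm_0}}[Z_0^2] + 2\mathbb{E}_\nu\sum_{n=0}^{T_\alpha} \mathbb{E}_{X_{nm_0}}\Big[Z_0\sum_{m=1}^{S_\alpha} Z_m;\ Y_0 = 0\Big] \quad \text{by conditioning w.r.t. } \mathcal{F}^X_{nm_0}\vee\mathcal{F}^Y_{n-1},$$
$$= \pi(s)^{-1}\mathbb{E}_\pi[Z_0^2] + 2\pi(s)^{-1}\mathbb{E}_\pi\Big[Z_0\sum_{m=1}^{S_\alpha} Z_m;\ Y_0 = 0\Big].$$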
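Similarly, we get for the covariance
$$\mathbb{E}[\zeta(1)\zeta(2)] = \mathbb{E}_\nu\Big[\sum_{m=0}^{T_\alpha} Z_m\sum_{n=T_\alpha+1}^{T_\alpha(1)} Z_n\Big] = \sum_{m=0}^{\infty} \mathbb{E}_\nu\Big[Z_m\sum_{n=T_\alpha+1}^{T_\alpha(1)} Z_n;\ T_\alpha \geq m\Big]$$
$$= \pi(s)^{-1}\mathbb{E}_\pi\Big[Z_0\sum_{n=1}^{S_\alpha} Z_n;\ Y_0 = 1\Big] + \pi(s)^{-1}\mathbb{E}_\pi[Z_0\,\zeta(1);\ Y_0 = 0].$$
By conditioning w.r.t. $\mathcal{F}^X_{m_0}\vee\mathcal{F}^Y_0$ we see that the second term is equal to
$$\pi(s)^{-1}\mathbb{E}_\pi[Z_0\,\mathbb{E}_{X_{m_0}}\zeta(1);\ Y_0 = 0] = 0 \quad \text{by (7.14).}$$
Consequently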
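$$\sigma^2 = m_0^{-1}\Big\{\mathbb{E}_\pi[Z_0^2] + 2\mathbb{E}_\pi\Big[Z_0\sum_{n=1}^{S_\alpha} Z_n\Big]\Big\},$$
where
$$\mathbb{E}_\pi Z_0^2 = \mathbb{E}_\pi\Big(\sum_{n=0}^{m_0-1} f(X_n)\Big)^2 = m_0\,\pi(f^2) + 2\sum_{n=1}^{m_0}(m_0 - n)\,\pi I_f P^n f$$
and
$$\mathbb{E}_\pi\Big[Z_0\sum_{n=1}^{S_\alpha} Z_n\Big] = \mathbb{E}_\pi\Big[Z_0\,\mathbb{E}_{X_{m_0}}\sum_{n=0}^{T_\alpha} Z_n\Big] = \mathbb{E}_\pi\Big[\sum_{n=0}^{m_0-1} f(X_n)\,G_{m_0,s,\nu}\sum_{m=0}^{m_0-1} P^m f(X_{m_0})\Big]$$
$$= \sum_{n=1}^{m_0} \pi I_f P^n G_{m_0,s,\nu}\sum_{m=0}^{m_0-1} P^m f. \qquad \square$$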
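Note that, when $m_0 = 1$, the variance $\sigma^2 = \sigma_f^2$ is simply given by
$$\sigma_f^2 = \pi(f^2) + 2\pi I_f P G_{s,\nu} f.$$
If $(X_n)$ has a proper atom $\alpha$, then
$$\sigma_f^2 = \pi(f^2) + 2\pi I_f U_\alpha f = \pi(f^2) + 2\int \pi(dx)\,f(x)\,\mathbb{E}_x\sum_{n=1}^{S_\alpha} f(X_n).$$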
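For a finite chain the case $m_0 = 1$ with a proper atom can be evaluated numerically. The sketch below is an illustration under stated assumptions, not the author's computation: the 3-state matrix and the function $g$ are hypothetical, state 0 plays the role of the atom $\alpha$, so that $s = 1_\alpha$, $\nu = P(\alpha, \cdot)$ and $P - s \otimes \nu$ is $P$ with the atom's row set to zero. The result is compared with the familiar series $\pi(f^2) + 2\sum_{n \geq 1} \pi I_f P^n f$; the two agree because $G_{s,\nu}f$ and $\sum_n P^n f$ both solve the Poisson equation when $\pi(f) = 0$. All series are simply truncated.

```python
def matvec(K, v):
    """Apply the kernel K to the function (column vector) v."""
    return [sum(K[i][j] * v[j] for j in range(len(v))) for i in range(len(K))]

def stationary(P, iters=2000):
    """Invariant probability vector by power iteration (aperiodic, irreducible P)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def potential(K, f, iters=2000):
    """Truncated series sum_{n >= 0} K^n f."""
    total, term = list(f), list(f)
    for _ in range(iters):
        term = matvec(K, term)
        total = [a + b for a, b in zip(total, term)]
    return total

# Hypothetical ergodic 3-state chain and a centred function f.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
n = len(P)
pi = stationary(P)
g = [1.0, -2.0, 0.5]
mean = sum(p * x for p, x in zip(pi, g))
f = [x - mean for x in g]                      # pi(f) = 0

atom = 0
K = [[0.0] * n if i == atom else list(P[i]) for i in range(n)]   # P - s (x) nu
PGf = matvec(P, potential(K, f))               # P G_{s,nu} f
pi_f2 = sum(pi[i] * f[i] ** 2 for i in range(n))
var_atom = pi_f2 + 2 * sum(pi[i] * f[i] * PGf[i] for i in range(n))

Sf = potential(P, f)                           # sum_{n >= 0} P^n f
var_series = pi_f2 + 2 * sum(pi[i] * f[i] * (Sf[i] - f[i]) for i in range(n))
print(var_atom, var_series)
```

Choosing a different state as the atom changes $G_{s,\nu}f$ by an additive constant only, so the computed variance should not depend on that choice.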
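Corollary 7.3. (i) Suppose that $(X_n)$ is ergodic. The result of Theorem 7.6 holds true for any special function $f \in \mathcal{L}^2(\pi)$ with $\pi(f) = 0$.
(ii) Suppose that $(X_n)$ is ergodic of degree 2. Then for any initial distribution $\lambda$, any bounded function $f \in b\mathcal{E}$,
$$N^{-1/2}[\xi_N(f) - (N+1)\pi(f)] \to \mathcal{N}(0, \sigma^2_{f-\pi(f)}) \quad \text{in } \mathbb{P}_\lambda\text{-distribution as } N \to \infty.$$

Example 7.4. (a) Suppose that $(X_n)$ is an ergodic discrete Markov chain. Then for any two distinct states $x, x_0 \in E$, any initial distribution $\lambda$,
$$N^{-1/2}\sum_{n=0}^{N}\big(\pi(x)^{-1}1_{\{X_n = x\}} - \pi(x_0)^{-1}1_{\{X_n = x_0\}}\big) \to \mathcal{N}(0, \sigma^2_{x,x_0}),$$
where
$$\sigma^2_{x,x_0} \overset{\text{def}}{=} \sigma_f^2 \quad \text{with } f = \pi(x)^{-1}1_x - \pi(x_0)^{-1}1_{x_0}$$
$$= \pi(x)^{-1} + 2\pi(x)^{-1}U_{x_0}(x,x) - \pi(x_0)^{-1}.$$
If $(X_n)$ is ergodic of degree 2, then for all $x \in E$,
$$N^{-1/2}\Big[\sum_{n=0}^{N} 1_{\{X_n = x\}} - (N+1)\pi(x)\Big] \to \mathcal{N}(0, \sigma^2_{1_x - \pi(x)}),$$
where
$$\sigma^2_{1_x - \pi(x)} = 2\pi(x)^2\,\mathbb{E}_\pi S_x - \pi(x) - \pi(x)^2 = \pi(x)^3\,\mathbb{E}_x S_x^2 - \pi(x).$$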
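The two expressions for $\sigma^2_{1_x - \pi(x)}$ can be compared numerically on a small chain. The sketch below assumes a hypothetical 3-state ergodic matrix and computes the moments of the return time $S_x = \inf\{n \geq 1 : X_n = x\}$ by propagating the taboo mass that has not yet returned to $x$; all series are truncated. It also checks Kac's formula $\mathbb{E}_x S_x = \pi(x)^{-1}$ and the agreement with the series form $\pi(f^2) + 2\sum_{n \geq 1}\pi I_f P^n f$ for $f = 1_x - \pi(x)$.

```python
def step(t, P):
    """One step of the chain applied to a row vector (measure) t."""
    n = len(P)
    return [sum(t[i] * P[i][j] for i in range(n)) for j in range(n)]

def return_moments(P, start, x, iters=2000):
    """(E S_x, E S_x^2) under the initial distribution `start`,
    where S_x = inf{n >= 1: X_n = x}; truncated after `iters` steps."""
    t = list(start)
    m1 = m2 = 0.0
    for k in range(1, iters + 1):
        t = step(t, P)
        m1 += k * t[x]        # mass that returns to x exactly at time k
        m2 += k * k * t[x]
        t[x] = 0.0            # taboo: discard paths that have already returned
    return m1, m2

# Hypothetical ergodic 3-state chain.
P = [[0.5, 0.3, 0.2],
     [0.2, 0.6, 0.2],
     [0.3, 0.3, 0.4]]
n = len(P)
pi = [1.0 / n] * n
for _ in range(2000):         # power iteration for the invariant distribution
    pi = step(pi, P)

x = 0
p = pi[x]
delta_x = [1.0 if i == x else 0.0 for i in range(n)]
ES_x, ES2_x = return_moments(P, delta_x, x)
ES_pi, _ = return_moments(P, pi, x)

lhs = 2 * p * p * ES_pi - p - p * p
rhs = p ** 3 * ES2_x - p

# Series form of the variance for f = 1_x - pi(x).
f = [delta_x[i] - p for i in range(n)]
term, acc = list(f), 0.0
for _ in range(2000):
    term = [sum(P[i][j] * term[j] for j in range(n)) for i in range(n)]
    acc += sum(pi[i] * f[i] * term[i] for i in range(n))
var_series = p * (1 - p) + 2 * acc
print(lhs, rhs, var_series)
```

The three printed numbers should coincide up to truncation and rounding error.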
Notes and comments
Chapter 1

For thorough accounts of the foundations of general (probabilistic) Markov chain theory the reader is referred to Doob (1953), Ch. 5, Neveu (1965), Ch. 5, Orey (1971), Ch. 1, or Revuz (1975), Ch. 0 and 1. Foguel (1969a) approaches Markov chains via the theory of contractions on the Banach space $\mathcal{L}^1(\mu)$. Basic references to the theory of discrete Markov chains are e.g. Feller (1957), Chung (1960), Kemeny, Snell & Knapp (1966) and Freedman (1971). The general theory of non-negative operators (from the functional theoretic point of view) is presented in Schaefer (1974) (see also Krasnoselskii, 1964). Monographs concerning non-negative matrices are e.g. Gantmacher (1959) and Seneta (1981). Most of the results treated in our Example (a) concerning non-negative matrices can be found in Seneta's book. Most of the examples treated in this book point out the linkages to other fields having connections with Markov chain theory. On each of these fields (random walks and renewal processes, queueing and storage theory, time series, learning models, branching processes, etc.) there is an extensive literature of its own. Most of the material of Chapter 1 is of standard character. We make only a couple of detailed comments. The basic assumption according to which the σ-algebra $\mathcal{E}$ is countably generated is in fact not very restrictive. By using the technique of admissible σ-algebras (see e.g. Orey (1971), Sect. 1.1) one can extend most results to the case where $\mathcal{E}$ is not countably generated. In order to obtain the minorization inequalities of Section 2.3 we demand for the σ-finiteness of a kernel $K$ that the function $f$ making $K(x,dy)f(x,y)$ finite is jointly measurable (cf. Revuz (1975), Ch. 1, Def. 1.1).

Chapter 2

Many of the basic concepts and results of Chapter 2 go back to Doeblin (1937) and (1940).
They were further developed by Doob (1953), Harris (1955, 1956), Orey (1959), Chung (1964), Moy (1967), Jain & Jamison (1967), Isaac (1968), Neveu (1972b), Tweedie (1974a,b) and Revuz (1979). The notions of Chapter 2 are usually formulated for substochastic kernels; however, no additional difficulties arise in the extension to general, non-markovian kernels (see Moy, 1967; Tweedie, 1974a,b). Sections 2.1 and 2.2 contain easy preliminary results on closed sets and irreducible kernels. The concept of maximal irreducibility measure was introduced by Tweedie (1974a). He also proved Proposition 2.4. Since we do not restrict ourselves to stochastic kernels, we have made a distinction between closed and
absorbing sets (see Definition 2.1). So, for example, Orey's (1971) 'closed' means the same as our 'absorbing'. The small sets, as we call them, are essentially the same concept as the C-sets in Orey (1971). (Orey (1971) assumes, in our notation, that the measure $\nu$ is supported by the set $C$; cf. also our Remarks 2.1.) Small sets are commonly defined as those sets which satisfy condition (ii) in Corollary 2.1 with $B = E$ (cf. e.g. Foguel, 1969b; Brunel, 1971; Lin, 1976). Bonsdorff (1980) observed that these two classes (essentially) coincide (cf. Proposition 2.11(i)). This implies that the state space splits into a countable number of small sets. The small sets play in many cases the same role as do individual points (or finite sets) in a countable state space. The fundamental Theorems 2.1 and 2.2 stating the existence of small sets and cycles for irreducible kernels were proved by Jain & Jamison (1967), the presentation of which we closely follow. Many of the arguments used in these proofs go back to Doeblin (1937, 1940) (see also Doob, 1953; Harris, 1956; Orey, 1959; and Isaac, 1968). An extensive study of the concept of cyclicity is made in Chung (1964).

Chapter 3

The potential theoretic results of Sections 3.1 and 3.2 are basically due to Deny (1951) and Doob (1959) (see also Choquet & Deny, 1956). For thorough accounts of the potential theory associated with Markov chains the interested reader should look at Kemeny, Snell & Knapp (1966), Ch. 7-11, and Revuz (1975), Ch. 2 and 6-9. Theorem 3.2 stating the existence of the convergence parameter and of the R-classification for irreducible kernels is the first basic result in our presentation of the general Perron-Frobenius theory. This theorem was proved for countable non-negative matrices by Vere-Jones (1962, 1967, 1968). The extension to the general state space was performed by Tweedie (1974a,b). A good reference on randomized stopping times is Pitman & Speed (1973).
After some preliminary results (basically due to Doeblin, 1940; Chung, 1964; Moy, 1965a,b) concerning the notions of transience and dissipativity we prove in Section 3.5 the fundamental Hopf decomposition result, originally due to Hopf (1954). It is usually formulated in the 'abstract' setting where $P$ is an arbitrary contraction, not necessarily induced by a transition probability, on a space $\mathcal{L}^1(\mu)$ (see e.g. Foguel, 1969a, Ch. 2; or Revuz, 1975, Ch. 4). From Hopf's decomposition theorem we derive the fundamental dichotomy results of Theorems 3.6 and 3.7, due to Jain & Jamison (1967). Most of the other results of Section 3.6 can also be found in Jain & Jamison's paper. Theorem 3.6 is in fact a special case of a more general result, concerning the concept of normal chains (see Jain & Jamison, 1967; or Orey, 1971, Sect. 1.8; cf. also Chung, 1964; Sidak, 1966; Jamison & Orey, 1967; Neveu, 1972a; Winkler, 1975; Tuominen, 1976). The important notion of φ-recurrence was introduced by Harris (1956). The recurrence of autoregressive processes (cf. our Examples 3.4(f) and 5.5(f)) is also studied in Athreya & Pantula (1983).
Chapter 4

Sections 4.1 and 4.2 are mostly of standard character. They serve as an introduction to the general regeneration scheme presented in Section 4.3. The concept of regeneration goes back to Palm (1943). It was further developed e.g. by Feller (1949), Bartlett & Kendall (1951) and Smith (1955). The main result of Chapter 4, Theorem 4.2, was proved independently by Athreya & Ney (1978) and Nummelin (1978a), both stimulated by Griffeath (1978). The decomposition results of Theorem 4.1 were formulated for transition probabilities in Nummelin (1978a); that they easily extend to non-markovian kernels was noted in Nummelin (1979). The use of Lemma 4.1 appears in Horowitz (1979). The first entrance and last exit decompositions given by Proposition 4.4 and Theorem 4.1 belong to the standard techniques for discrete Markov chains. Proposition 4.7 is adopted from Nummelin (1977). A somewhat different approach to the topics of Section 4.3 is presented in Athreya & Ney (1982). That the regeneration scheme described in Section 4.4 can be formulated in terms of a certain randomized stopping time ($T$ in our notation; cf. Theorem 4.3(i)) is noted in Athreya & Ney (1978). The converse result (part (ii) of Theorem 4.3) was suggested by P. Glynn (personal communication, 1982). Other references related to the theme of Section 4.4 are e.g. Athreya, McDonald & Ney (1978), Nummelin (1978b), Berbee (1979), Lindvall (1979b), Ney (1981). The construction of the regeneration time for the many server queues (Example 4.2(j)) appears in Charlot, Ghidouche & Hamami (1978). For a construction of regeneration times for tandem queues see Nummelin (1981b).

Chapter 5

Theorems 5.1 and 5.2 were proved for finite non-negative matrices by Perron (1907) and Frobenius (1908, 1909, 1912). Perron's and Frobenius' results were extended to countable matrices by Vere-Jones (1962, 1967, 1968).
The existence and uniqueness of an invariant measure for a recurrent, discrete Markov chain was proved by Derman (1954); the general, φ-recurrent case was solved by Harris (1956). The proof of the existence and uniqueness of an R-invariant function and measure for a general R-recurrent kernel is due to Tweedie (1974a) (for earlier, related works see e.g. Jentzsch, 1912; Krein & Rutman, 1948; Birkhoff, 1957; Karlin, 1959; Harris, 1963). The similarity transform can be found in Harris (1963), Ch. 3, App. 3. The minimality aspect of the R-invariant function k (which leads to Harris recurrent k; cf. Proposition 5.4) was noted in Nummelin (1977). The expressions for the R-invariant function and measure in terms of the potential kernel $G^{(R)}_{m_0,s,\nu}$ were given in Nummelin (1979). These constructions are based on the corresponding constructions in the discrete case, due to Derman (1954) and Vere-Jones (1967) (see also Athreya & Ney, 1978, 1982; Nummelin & Arjas, 1976; Nummelin, 1978a). The results stated in Theorems 5.1 and 5.2 are also related to the general theory of fixed points for operators in a cone (see Krein & Rutman, 1948). For results concerning subinvariant functions and measures see also Feldman (1962, 1965).
Proposition 5.9 was proved by Harris (1956) (cf. also Kac, 1947). Corollary 5.4 is due to Cogburn (1975). The great practical value of the Balayage principle as a criterion for positive recurrence (and for related concepts) was noted and systematically exploited by Tweedie (1975, 1976) (Proposition 5.10 is taken from Tweedie (1976)). In the discrete case criteria of this kind are often referred to as Foster's (1953) criteria. The rest of Section 5.3 is mostly based on Nummelin (1978a). See also Harris (1956), Brunel (1971), Neveu (1972a), and Brunel & Revuz (1974) for the notion of special sets and functions. (The special sets are the same concept as the D-sets introduced by Harris (1956).) See Isaac (1968) and Cogburn (1975) for the notion of regular states and sets. (Cogburn calls regular sets strongly uniform sets.) Lemma 5.1 is a special case of the general conditional Borel-Cantelli-Lévy lemma (see e.g. Doob, 1953, Ch. 7, Sect. 4). For generalized Wald-type identities (cf. Lemma 5.2) see e.g. Franken & Lisek (1982). Recurrence of degree 2 is extensively studied in Cogburn (1975). He proved Propositions 5.15 and 5.16(i). Most of the other results are special cases of the more general results by Nummelin & Tuominen (1983). For an earlier treatment see Orey (1959). The presentation of Section 5.5 follows closely Nummelin & Tuominen (1982). The results are generalizations of the corresponding results on the discrete state space, due to Kendall (1959), Vere-Jones (1962) and others. The identities of Lemmas 5.4 and 5.5 are special cases of a general identity by Cogburn (1975). Proposition 5.21 is taken from Nummelin & Tuominen (1982). It extends Popov's (1977) criterion. For a study of the geometric recurrence of reflected random walks (cf. Example 5.5(d)) see Miller (1966). Uniformly recurrent Markov chains are often called strongly recurrent (or ergodic), or also Markov chains having a quasicompact transition probability.
They were first studied by Yosida & Kakutani (1941) and Doob (1953). For thorough treatments of uniform recurrence see e.g. Revuz (1975), Ch. 6, §3, and its references. Proposition 5.22 was proved by Horowitz (1972). Proposition 5.23 is basically due to Bonsdorff (1980). Proposition 5.24 is the general counterpart of the corresponding discrete result by Isaacson & Tweedie (1978). Example (k) is motivated by Harris (1963), Ch. 3, Sect. 10. The results of Section 5.7 are straightforward consequences of the corresponding results for transition probabilities.

Chapter 6

Theorem 6.1 is in fact the same as Orey's convergence theorem (Corollary 6.7(i)) for the backward Markov chain (1' n > 0). The coupling used in its proof is due to Ornstein (1969a, b). Our presentation follows closely Berbee (1979) (see also Meilijson, 1975). Lemma 6.1 needed in the proof is due to Chung & Fuchs (1951). Theorem 6.2 was proved by Erdos, Feller & Pollard (1949). Theorems 6.3 and 6.4 are due to Pitman (1974); see also Kemeny, Snell & Knapp (1966), Ch. 9. Theorem 6.5 is a special case of a general result by Cogburn (1975). Theorem 6.6 was proved by
Kendall (1959). More general rates for renewal processes are studied in Lindvall (1979a). The coupling technique is originally due to Doeblin (1938a). Surveys on this method can be found in Griffeath (1978), Berbee (1979) and Thorisson (1981). Theorem 6.7 was proved for positive Harris recurrent transition probabilities by Orey (1959). The null case was proved in Jamison & Orey (1967). Tweedie (1974a) proved the general R-recurrent case. Theorem 6.9 (with K = P Harris recurrent) is due to Horowitz (1979). Theorem 6.10 is essentially due to Jain (1966). More on Orey's theorem and related topics can be found e.g. in Blackwell & Freedman (1964), Jain & Jamison (1967), Jamison & Orey (1967), Horowitz (1969, 1979), Ornstein & Sucheston (1970), Foguel (1971, 1976), Derriennic (1976), Pitman (1976), Revuz (1979), Greiner & Nagel (1982). Theorem 6.11 is essentially due to Cogburn (1975). The present version is taken from Nummelin (1981a). The latter paper also contains more general results on the convergence of sums of transition probabilities and converse results, i.e. criteria for the degrees of recurrence in terms of the convergence of the sums. Theorem 6.12 is from Nummelin (1978a); it is the general counterpart of the corresponding discrete results (see e.g. Kemeny, Snell & Knapp (1966), Ch. 9, Pitman (1974)). Theorem 6.13 is a special case of a more general result from Nummelin & Tuominen (1983). The results of Section 6.5 can be found in Nummelin & Tweedie (1978) and Nummelin & Tuominen (1982). The results in the discrete case were proved by Vere-Jones (1962). Theorem 6.15 is due to Yosida & Kakutani (1941) and Doob (1953). Other results on quasi-stationarity are given in Seneta & Vere-Jones (1966) and Tweedie (1974c). The renewal theorem on R, (cf. Examples 6.1 and 6.2(e)) was proved by Doob (1948), Blackwell (1948) and Smith (1958) (see also Feller, 1971, Ch. 11; McDonald, 1975; Lindvall, 1977, 1979b; Arjas, Nummelin & Tweedie, 1978; Ney, 1981). 
The rate of convergence is studied in Stone & Wainger (1967), Ney (1981), Nummelin & Tuominen (1982, 1983).

Chapter 7

That $\|\sum_{n=0}^{N} P^n f\|$ is a bounded sequence whenever $f$ is a special function satisfying $\pi(f) = 0$ (see Theorem 7.2) is due to Ornstein (1969a, b), Metivier (1969), Duflo
(1969) and Brunel (1971) (see also Foguel & Ghoussob, 1979). Theorem 7.1 improves the results by Duflo (1969), Orey (1971) and Lin (1974b). Corollary 7.1 was proved in Nummelin (1978a). In fact, the results of Section 7.1 are strongly related to the potential theory of recurrent Markov chains (see Revuz, 1975, Ch. 6-9). The results of Section 7.1 for discrete chains can be found in Kemeny, Snell & Knapp (1966), Ch. 9. Theorem 7.3 for discrete Markov chains was proved by Doeblin (1938b). The present version for general chains is taken from Nummelin (1978a). For related works see e.g. Harris (1955, 1956), Chacon & Ornstein (1960), Chacon (1962), Jain (1966), Krengel (1966), Isaac (1967, 1968), Levitan (1967, 1970, 1971), Foguel
(1969a, b), Metivier (1972), Foguel & Lin (1972), Neveu (1973). Of these Chacon & Ornstein's, Chacon's and Foguel's results are more general in that they deal with the 'abstract' case where $P$ is only assumed to be a contraction on $\mathcal{L}^1(\mu)$. Theorems 7.4 and 7.5 are taken from Nummelin (1979) (for related results see Orey, 1961, 1971; Kingman & Orey, 1964; Levitan, 1967, 1970; Jain, 1969; Foguel, 1969b; Lin, 1974a, 1976). A thorough survey of the individual ratio limit theorems is made in King (1981). The regeneration technique is also used in Athreya & Ney (1980). Theorem 7.6 is a special case of a more general result by Kaplan & Silvestrov (1979). The present proof is adopted from Niemi & Nummelin (1982), which in turn follows the pattern of Chung (1960). Related results can be found also in Doeblin (1937, 1938b), Doob (1953), Orey (1959), Cogburn (1972), Grigorescu & Oprisan (1976) and Maigret (1978).
List of symbols and notation
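In general, the list below contains only those symbols and notation which are not explained in the main text.

$\mathbb{N} = \{0, 1, 2, \ldots\}$, the non-negative integers
$\bar{\mathbb{N}} = \mathbb{N} \cup \{\infty\}$, the extended non-negative integers
$\mathbb{R} = (-\infty, \infty)$, the real line
$\bar{\mathbb{R}} = \mathbb{R} \cup \{-\infty\} \cup \{\infty\}$, the extended real line
$\mathbb{R}_+ = [0, \infty)$, the non-negative real line
$\mathcal{R}$, $\mathcal{R}_+$, the Borel subsets of $\mathbb{R}$, $\mathbb{R}_+$, respectively
$a \vee b = \max\{a, b\}$
$a \wedge b = \min\{a, b\}$
$a_+ = a \vee 0$
$a_- = (-a) \vee 0$
$(E, \mathcal{E})$, the basic measurable space (to be called the state space)
$\{f \leq g\} = \{x \in E : f(x) \leq g(x)\}$ (the notation $\{f < \infty\}$, etc., is interpreted similarly)
$A \setminus B = \{x \in A : x \notin B\}$
$B^c = E \setminus B$
$\sum A_i$, the union of the disjoint sets $A_i$
$\emptyset$, the empty set
Card$(A)$, the number of elements in the set $A$
$1_A$, the indicator of the set $A$: $1_A(x) = 1$ if $x \in A$, $= 0$ if $x \notin A$
$A \times B$, the Cartesian product of the sets $A$ and $B$
$E^n = E \times \cdots \times E$ ($n$ times)
$E^\infty = E \times E \times \cdots$
$A \cong B$, the sets $A$ and $B$ are isomorphic
$\mathcal{A} \cap B = \{A \in \mathcal{A} : A \subseteq B\}$ ($\mathcal{A}$ any collection of sets)
$\mathcal{A} \vee \mathcal{B} = \sigma(\mathcal{A}, \mathcal{B})$, the smallest σ-algebra containing the classes $\mathcal{A}$ and $\mathcal{B}$
$\mathcal{A} \otimes \mathcal{B}$, the product σ-algebra of the σ-algebras $\mathcal{A}$ and $\mathcal{B}$
$\mathcal{E}^{\otimes n} = \mathcal{E} \otimes \cdots \otimes \mathcal{E}$ ($n$ times)
$\mathcal{E}^{\otimes\infty} = \mathcal{E} \otimes \mathcal{E} \otimes \cdots$
$\mathbb{E}[\xi \mid \mathcal{A}]$, the conditional expectation of the random variable $\xi$ w.r.t. the sub-σ-algebra $\mathcal{A}$
$\mathbb{E}[\xi \mid \mathcal{A} \mid \mathcal{B}] = \mathbb{E}[\mathbb{E}[\xi \mid \mathcal{A}] \mid \mathcal{B}]$
$\mathbb{E}[\xi; A] = \mathbb{E}[\xi 1_A]$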
Bibliography
Arjas, E., Nummelin, E. & Tweedie, R.L. (1978). Uniform limit theorems for non-singular renewal and Markov renewal processes. Journal of Applied Probability, 15, 112-25.
Athreya, K.B., McDonald, D. & Ney, P. (1978). Limit theorems for semi-Markov processes and renewal theory for Markov chains. Annals of Probability, 6, 788-97.
Athreya, K.B. & Ney, P. (1978). A new approach to the limit theory of recurrent Markov chains. Transactions of the American Mathematical Society, 245, 493-501.
Athreya, K.B. & Ney, P. (1980). Some aspects of ergodic theory and laws of large numbers for Harris-recurrent Markov chains. In Colloquia Mathematica Societatis Janos Bolyai, 32. Nonparametric Statistical Inference. Budapest, Hungary, 1980.
Athreya, K.B. & Ney, P. (1982). A renewal approach to the Perron-Frobenius theory of non-negative kernels on general state spaces. Mathematische Zeitschrift, 179, 507-29.
Athreya, K.B. & Pantula, S.G. (1983). Strong mixing and Harris recurrence for autoregressive processes. Preprint, Iowa State University.
Bartlett, M.S. & Kendall, D.G. (1951). On the use of the characteristic functional in the analysis of some stochastic processes occurring in physics and biology. Proceedings of the Cambridge Philosophical Society, 47, 65-76.
Berbee, H.C.P. (1979). Random walks with stationary increments and renewal theory. Doctoral thesis. De Vrije University, Amsterdam.
Birkhoff, G. (1957). Extensions of Jentzsch's theorem. Transactions of the American Mathematical Society, 85, 219-27.
Blackwell, D. (1948). A renewal theorem. Duke Mathematical Journal, 15, 145-50.
Blackwell, D. & Freedman, D. (1964). The tail σ-field of a Markov chain and a theorem of Orey. Annals of Mathematical Statistics, 35, 1291-95.
Bonsdorff, H. (1980). Characterizations of uniform recurrence for general Markov chains. Annales Academiae Scientiarum Fennicae, Series A, I. Mathematica, Dissertationes, 32.
Brunel, A. (1971). Chaines abstraites de Markov verifiant une condition d'Orey. Extension à ce cas d'un theoreme ergodique de M. Metivier. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 19, 323-29.
Brunel, A. & Revuz, D. (1974). Quelques applications probabilistes de la quasi-compacite. Annales de l'Institut Henri Poincare, B, 10, 301-37.
Chacon, R.V. (1962). Identification of the limit of operator averages. Journal of Mathematics and Mechanics, 12, 961-68.
Chacon, R.V. & Ornstein, D. (1960). A general ergodic theorem. Illinois Journal of Mathematics, 4, 153-60.
Charlot, F., Ghidouche, M. & Hamami, M. (1978). Irreductibilite et recurrence au sens de Harris des 'temps d'attente' des files GI/G/q. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 43, 187-203.
Choquet, G. & Deny, J. (1956). Modeles finis en theorie du potentiel. Journal d'Analyse Mathematique, 5, 77-135.
Chung, K.L. (1960). Markov Chains with Stationary Transition Probabilities. Berlin: Springer.
Chung, K.L. (1964). The general theory of Markov processes according to Doeblin. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 2, 230-54.
Chung, K.L. & Fuchs, W.H. (1951). On the distribution of values of sums of random variables. Memoirs of the American Mathematical Society, 6.
Cogburn, R. (1972). The central limit theory for Markov processes. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, pp. 485-512. Berkeley, California: University of California Press.
Cogburn, R. (1975). A uniform theory for sums of Markov chain transition probabilities. Annals of Probability, 3, 191-214.
Deny, J. (1951). Familles fondamentales, noyaux associes. Annales de l'Institut Fourier, 3, 73-101.
Derman, C. (1954). A solution to a set of fundamental equations in Markov chains. Proceedings of the American Mathematical Society, 5, 332-34.
Derriennic, Y. (1976). Lois 'zero ou deux' pour les processus de Markov. Applications aux marches aleatoires. Annales de l'Institut Henri Poincare, B, 12, 111-29.
Doeblin, W. (1937). Sur les proprietes asymptotiques de mouvement regis par certains types de chaines simples. Bulletin de la Societe Roumaine des Sciences, 39, No. 1, 57-115; No. 2, 3-61.
Doeblin, W. (1938a). Exposé de la theorie des chaines simples constantes de Markov a un nombre fini d'etats. Revue Mathematique de l'Union Interbalkanique, 2, 77-105.
Doeblin, W. (1938b). Sur deux problemes de M. Kolmogoroff concernant les chaines denombrables. Bulletin de la Societe Mathematique de France, 66, 210-20.
Doeblin, W. (1940). Elements d'une theorie generale des chaines simples constantes de Markoff. Annales Scientifiques de l'Ecole Normale Superieure, Paris, III Ser., 57, 61-111.
Doob, J.L. (1948). Renewal theory from the point of view of the theory of probability. Transactions of the American Mathematical Society, 63, 422-38.
Doob, J.L. (1953). Stochastic Processes. New York: Wiley and Sons.
Doob, J.L. (1959). Discrete potential theory and boundaries. Journal of Mathematics and Mechanics, 8, 433-58.
Duflo, M. (1969). Operateurs potentiels des chaines et des processus de Markov irreductibles. Bulletin de la Societe Mathematique de France, 98, 127-63.
Erdos, P., Feller, W. & Pollard, H. (1949). A property of power series with positive coefficients. Bulletin of the American Mathematical Society, 55, 201-04.
Feldman, J. (1962). Subinvariant measures for Markoff operators. Duke Mathematical Journal, 29, 71-98.
Feldman, J. (1965). Integral kernels and invariant measures for Markoff transition functions. Annals of Mathematical Statistics, 36, 517-23.
Feller, W. (1949). Fluctuation theory of recurrent events. Transactions of the American Mathematical Society, 67, 98-119.
Feller, W. (1957). An Introduction to Probability Theory and Its Applications, vol. 1, 2nd edn. New York: Wiley and Sons.
Feller, W. (1971). An Introduction to Probability Theory and Its Applications, vol. 2, 2nd edn. New York: Wiley and Sons.
Foguel, S.R. (1969a). The Ergodic Theory of Markov Processes. New York: Van Nostrand.
Foguel, S.R. (1969b). Ratio limit theorems for Markov processes. Israel Journal of Mathematics, 7, 384-92.
Foguel, S.R. (1971). On the 'zero-two' law. Israel Journal of Mathematics, 10, 275-80.
Foguel, S.R. (1976). More on the 'zero-two' law. Proceedings of the American Mathematical Society, 61, 262-64.
Foguel, S.R. & Ghoussob, N.A. (1979). Ornstein-Metivier-Brunel theorem revisited. Annales de l'Institut Henri Poincare, B, 15, 293-301.
Foguel, S.R. & Lin, M. (1972). Some ratio limit theorems for Markov operators. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 23, 55-66.
Foster, F.G. (1953). On the stochastic matrices associated with certain queueing processes. Annals of Mathematical Statistics, 24, 355-60.
Franken, P. & Lisek, B. (1982). On Wald's identity for dependent variables. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 60, 143-50.
Freedman, D. (1971). Markov Chains. San Francisco: Holden Day.
Frobenius, G. (1908). Über Matrizen aus positiven Elementen I. Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, 471-76.
Frobenius, G. (1909). Über Matrizen aus positiven Elementen II. Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, 514-18.
Frobenius, G. (1912). Über Matrizen aus nicht negativen Elementen. Sitzungsberichte der Königlich Preussischen Akademie der Wissenschaften zu Berlin, 456-77.
Gantmacher, F.R. (1959). The Theory of Matrices. New York: Chelsea.
Greiner, G. & Nagel, R. (1982). La loi 'zéro ou deux' et ses conséquences pour le comportement asymptotique des opérateurs positifs. Journal de Mathématiques Pures et Appliquées, 61, 261-73.
Griffeath, D. (1978). Coupling methods for Markov processes. In Studies in Probability and Ergodic Theory, Advances in Mathematics, Supplementary Studies, 2, 1-43.
Grigorescu, S. & Oprisan, G. (1976). Limit theorems for J-X-processes with a general state space. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 35, 65-73.
Harris, T.E. (1955). Recurrent Markov processes, II (abstract). Annals of Mathematical Statistics, 26, 152-53.
Harris, T.E. (1956). The existence of stationary measures for certain Markov processes. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, vol. 2, pp. 113-24. Berkeley, California: University of California Press.
Harris, T.E. (1963). The Theory of Branching Processes. Berlin: Springer.
Hopf, E. (1954). The general temporally discrete Markoff process. Journal of Rational Mechanics and Analysis, 3, 13-45.
Horowitz, S. (1969). L∞-limit theorems for Markov processes. Israel Journal of Mathematics, 7, 60-62.
Horowitz, S. (1972). Transition probabilities and contractions of L∞. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 24, 263-74.
Horowitz, S. (1979). Pointwise convergence of the iterates of a Harris-recurrent Markov operator. Israel Journal of Mathematics, 33, 177-80.
Isaac, R. (1967). On the ratio limit theorem for Markov processes recurrent in the sense of Harris. Illinois Journal of Mathematics, 11, 608-15.
Isaac, R. (1968). Some topics in the theory of recurrent Markov processes. Duke Mathematical Journal, 35, 641-52.
Isaacson, D. & Tweedie, R.L. (1978). Criteria for strong ergodicity of Markov chains. Journal of Applied Probability, 15, 87-95.
Jain, N. (1966). Some limit theorems for a general Markov process. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 6, 206-23.
Jain, N. (1969). The strong ratio limit property for some general Markov processes. Annals of Mathematical Statistics, 40, 986-92.
Jain, N. & Jamison, B. (1967). Contributions to Doeblin's theory of Markov processes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 8, 19-40.
Jamison, B. & Orey, S. (1967). Markov chains recurrent in the sense of Harris. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 8, 41-48.
Jentzsch, R. (1912). Über Integralgleichungen mit positivem Kern. Journal für die Reine und Angewandte Mathematik, 141, 235-44.
Kac, M. (1947). On the notion of recurrence in discrete stochastic processes. Bulletin of the American Mathematical Society, 53, 1002-10.
Kaplan, E.I. & Silvestrov, D.S. (1979). The invariance principle type theorems for recurrent semi-Markov processes with a general state space (English translation). Theory of Probability and its Applications, 24, 536-47.
Karlin, S. (1959). Positive operators. Journal of Mathematics and Mechanics, 8, 907-37.
Kemeny, J.G., Snell, J.L. & Knapp, A.W. (1966). Denumerable Markov Chains. Princeton: Van Nostrand.
Kendall, D.G. (1959). Unitary dilations of Markov transition operators and the corresponding integral representation of transition probability matrices. In Probability and Statistics, ed. U. Grenander, pp. 138-61. Stockholm: Almqvist and Wiksell.
King, J.H. (1981). Strong ratio theorems for Markov and semi-Markov chains. Doctoral Thesis, University of Wisconsin-Madison.
Kingman, J.F.C. & Orey, S. (1964). Ratio limit theorems for Markov chains. Proceedings of the American Mathematical Society, 15, 907-10.
Krasnoselskii, M.A. (1964). Positive Solutions of Operator Equations. Groningen: Noordhoff.
Krein, M.G. & Rutman, M.A. (1948). Linear operators leaving invariant a cone in a Banach space. Uspehi Mat. Nauk (N.S.), 3, 3-95. English translation: The American Mathematical Society, Translations, 26 (1950).
Krengel, U. (1966). On the global limit behaviour of Markov chains and of general nonsingular Markov processes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 6, 302-16.
Levitan, M. (1967). Some ratio limit theorems for a general state space Markov process. Doctoral Thesis, University of Minnesota.
Levitan, M.L. (1970). Some ratio limit theorems for a general state space Markov process. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 15, 29-50.
Levitan, M.L. (1971). A generalized Doeblin ratio limit theorem. Annals of Mathematical Statistics, 42, 904-11.
Lin, M. (1974a). Convergence of the iterates of a Markov operator. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 29, 153-63.
Lin, M. (1974b). On quasi-compact Markov operators. Annals of Probability, 2, 464-75.
Lin, M. (1976). Strong ratio limit theorems for mixing Markov operators. Annales de l'Institut Henri Poincaré, B, 12, 181-91.
Lindvall, T. (1977). A probabilistic proof of Blackwell's renewal theorem. Annals of Probability, 5, 482-85.
Lindvall, T. (1979a). On coupling of discrete renewal processes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 48, 57-70.
Lindvall, T. (1979b). On coupling of continuous time renewal processes. Technical report, University of Göteborg.
Maigret, N. (1978). Théorème de limite centrale fonctionnel pour une chaîne de Markov récurrente au sens de Harris et positive. Annales de l'Institut Henri Poincaré, B, 14, 425-40.
McDonald, D. (1975). Renewal theorem and Markov chains. Annales de l'Institut Henri Poincaré, B, 11, 187-97.
Meilijson, I. (1975). A probabilistic approach to renewal theory. Report BW 53/75, Math. Centre, Amsterdam.
Métivier, M. (1969). Existence of an invariant measure and an Ornstein's ergodic theorem. Annals of Mathematical Statistics, 40, 79-96.
Métivier, M. (1972). Théorème limite quotient pour les chaînes de Markov récurrentes au sens de Harris. Annales de l'Institut Henri Poincaré, B, 8, 93-105.
Miller, H.D. (1966). Geometric ergodicity in a class of denumerable Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 4, 354-73.
Moy, S.T.C. (1965a). λ-continuous Markov chains I. Transactions of the American Mathematical Society, 117, 68-91.
Moy, S.T.C. (1965b). λ-continuous Markov chains II. Transactions of the American Mathematical Society, 120, 83-107.
Moy, S.T.C. (1967). Period of an irreducible positive operator. Illinois Journal of Mathematics, 11, 24-39.
Neveu, J. (1965). Mathematical Foundations of the Calculus of Probability. San Francisco: Holden Day.
Neveu, J. (1972a). Potentiel markovien récurrent des chaînes de Harris. Annales de l'Institut Fourier, 22, 85-130.
Neveu, J. (1972b). Sur l'irréductibilité des chaînes de Markov. Annales de l'Institut Henri Poincaré, B, 8, 249-54.
Neveu, J. (1973). Une généralisation d'un théorème limite-quotient. In Transactions of the Sixth Prague Conference on Information Theory, Statistical Decision Functions, Random Processes, pp. 675-82. Czechoslovak Academy of Science, Prague, 1973.
Ney, P. (1981). A refinement of the coupling method in renewal theory. Stochastic Processes and their Applications, 11, 11-26.
Niemi, S. & Nummelin, E. (1982). Central limit theorems for Markov random walks. Commentationes Physico-Mathematicae, 54, Societas Scientiarum Fennica, Helsinki.
Norman, M.F. (1972). Markov Processes and Learning Models. New York: Academic Press.
Nummelin, E. (1977). Strong ratio limit theorems for φ-irreducible Markov chains. Report HTKK-Mat-A108, Helsinki University of Technology, Espoo.
Nummelin, E. (1978a). A splitting technique for Harris recurrent Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 43, 309-18.
Nummelin, E. (1978b). Uniform and ratio limit theorems for Markov renewal and semi-regenerative processes on a general state space. Annales de l'Institut Henri Poincaré, B, 14, 119-43.
Nummelin, E. (1979). Strong ratio limit theorems for φ-recurrent Markov chains. Annals of Probability, 7, 639-50.
Nummelin, E. (1981a). The convergence of sums of transition probabilities of a positive recurrent Markov chain. Mathematica Scandinavica, 48, 79-95.
Nummelin, E. (1981b). Regeneration in tandem queues. Advances in Applied Probability, 13, 221-30.
Nummelin, E. & Arjas, E. (1976). A direct construction of the R-invariant measure for a Markov chain on a general state space. Annals of Probability, 4, 674-79.
Nummelin, E. & Tuominen, P. (1982). Geometric ergodicity of Harris recurrent Markov chains with applications to renewal theory. Stochastic Processes and their Applications, 12, 187-202.
Nummelin, E. & Tuominen, P. (1983). The rate of convergence in Orey's theorem for Harris recurrent Markov chains with applications to renewal theory. Stochastic Processes and their Applications, 15, 295-311.
Nummelin, E. & Tweedie, R.L. (1978). Geometric ergodicity and R-positivity for general Markov chains. Annals of Probability, 6, 404-20.
Orey, S. (1959). Recurrent Markov chains. Pacific Journal of Mathematics, 9, 805-27.
Orey, S. (1961). Strong ratio limit property. Bulletin of the American Mathematical Society, 67, 571-74.
Orey, S. (1971). Lecture Notes on Limit Theorems for Markov Chain Transition Probabilities. London: Van Nostrand.
Ornstein, D. (1969a). Random walks I. Transactions of the American Mathematical Society, 138, 1-43.
Ornstein, D. (1969b). Random walks II. Transactions of the American Mathematical Society, 138, 45-60.
Ornstein, D. & Sucheston, L. (1970). An operator theorem on L1 convergence to zero with applications to Markov kernels. Annals of Mathematical Statistics, 41, 1631-39.
Palm, C. (1943). Intensitätsschwankungen im Fernsprechverkehr. Ericsson Technics, No 44. Stockholm: Telefonaktiebolaget LM Ericsson.
Perron, O. (1907). Zur Theorie der Matrizen. Mathematische Annalen, 64, 248-63.
Pitman, J.W. (1974). Uniform rates of convergence for Markov chain transition probabilities. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 29, 193-227.
Pitman, J.W. (1976). On coupling of Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 35, 315-22.
Pitman, J. & Speed, T.P. (1973). A note on random times. Stochastic Processes and their Applications, 1, 369-74.
Popov, N. (1977). Conditions for geometric ergodicity of countable Markov chains. Soviet Mathematics, Doklady, 18, 676-79.
Revuz, D. (1975). Markov Chains. Amsterdam: North-Holland.
Revuz, D. (1979). Sur la définition des classes cycliques des chaînes de Harris. Israel Journal of Mathematics, 33, 378-83.
Rosenblatt, M. (1971). Markov Processes. Structure and Asymptotic Behaviour. Berlin: Springer.
Schaefer, H.H. (1974). Banach Lattices and Positive Operators. Berlin: Springer.
Seneta, E. (1981). Non-negative Matrices and Markov Chains, 2nd edn. New York: Springer.
Seneta, E. & Vere-Jones, D. (1966). On quasi-stationary distributions in discrete-time Markov chains with a denumerable infinity of states. Journal of Applied Probability, 3, 403-34.
Šidák, Z. (1966). Classification of Markov chains with a general state space. Bulletin of the American Mathematical Society, 72, 149-52.
Smith, W.L. (1955). Regenerative stochastic processes. Proceedings of the Royal Society, London, A, 232, 6-31.
Smith, W.L. (1958). Renewal theory and its ramifications. Journal of the Royal Statistical Society, B, 20, 243-302.
Stone, C. & Wainger, S. (1967). One-sided error estimates in renewal theory. Journal d'Analyse Mathématique, 20, 325-52.
Thorisson, H. (1981). The coupling of regenerative processes. Doctoral thesis, University of Göteborg.
Tuominen, P. (1976). Notes on 1-recurrent Markov chains. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 36, 111-18.
Tweedie, R.L. (1974a). R-theory for Markov chains on a general state space I: solidarity properties and R-recurrent chains. Annals of Probability, 2, 840-64.
Tweedie, R.L. (1974b). R-theory for Markov chains on a general state space II: r-subinvariant measures for r-transient chains. Annals of Probability, 2, 865-78.
Tweedie, R.L. (1974c). Quasi-stationary distributions for Markov chains on a general state space. Journal of Applied Probability, 11, 726-41.
Tweedie, R.L. (1975). Sufficient conditions for ergodicity and recurrence of Markov chains on a general state space. Stochastic Processes and their Applications, 3, 385-403.
Tweedie, R.L. (1976). Criteria for classifying general Markov chains. Advances in Applied Probability, 8, 737-71.
Vere-Jones, D. (1962). Geometric ergodicity in denumerable Markov chains. Quarterly Journal of Mathematics, Oxford, 2nd Ser. 13, 7-28.
Vere-Jones, D. (1967). Ergodic properties of nonnegative matrices I. Pacific Journal of Mathematics, 22, 361-85.
Vere-Jones, D. (1968). Ergodic properties of nonnegative matrices II. Pacific Journal of Mathematics, 26, 601-20.
Winkler, W. (1975). Doeblin's and Harris' theory of Markov processes. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 31, 79-88.
Yosida, K. & Kakutani, S. (1941). Operator-theoretical treatment of Markoff's process and mean ergodic theorem. Annals of Mathematics, 42, 188-228.
Index
absorbing set 8
aperiodicity 21, 49
atom 58
  proper a. 51
attainability 8
autoregressive process 5
backward chain 56
balayage theorem 27
block 47, 57
branching process 29
cemetery 6
charge 26
closed set 8
communicating 47
  c. set 11
  c. state 47
conservative set 40
contraction 109
convergence parameter 27
counting measure 11
coupling 99
  c. inequality 100
cycle 20
cyclic sets 21
delay sequence 48
dissipative set 38
equilibrium distribution 50
equivalent kernels 10
ergodicity 114
  geometric e. 107, 119
  R-e. 123
  uniform e. 122
forward process 5
full set 14
functional 7
harmonic function 25
  minimal h.f. 44
Harris 43
  H. recurrence 42
  H. set 43
  maximal H. set 44
history 3
  adaptation to a h. 3
  internal h. 3
  Markov with respect to a h. 3
hitting time 33
Hopf's decomposition 41
incidence process 41
increment sequence 49
indecomposable set 8
initial 3
  i. distribution 3
  i. state 4
invariant 72
  i. measure 72
  r-i. function 69
  r-i. measure 72
irreducibility 11
  Card B -i. 11
  φ-i. 11
  i. measure 72
  maximal i. measure 13
iterates of a kernel 2
kernel 1
  bounded k. 1
  discrete k. 2
  finite k. 1
  integral k. 2
  potential k. 9
  σ-finite k. 1
  stochastic k. 2
  substochastic k. 1
learning model 5
life time of a Markov chain 36
Malthusian parameter 29
Markov chain 3
  canonical M.c. 6
  discrete M.c. 4
Markov property 7
  strong M.p. 33
maximum principle 36
minorization condition 14
periodicity 21, 49
potential 25
queue 4
  2-server q. 57
quasi-stationary distribution 125
random element 3
random variable 3
random walk 4
  reflected r.w. 4
recurrence 42, 50
  φ-recurrence 43
  geometric r. 86
  geometric R-r. 94
  Harris r. 42
  null r. 50, 68
  positive r. 50, 68
  R-null r. 68
  R-positive r. 68
  R-r. 27, 51
  uniform r. 92
  uniform R-r. 92
regeneration time 66
regenerative Markov chain 66
regularity 79
  f-r. 79
  geometric r. 87
  r. of degree 2 85
renewal equation 49
renewal process 5, 49
renewal sequence 48
  embedded r.s. 53
  probabilistic r.s. 49
resolvent equation 36
restriction of a kernel 8
Riesz decomposition 26
sample space 3
shift operator 6
similarity transformation 71
small 15
  s. function 15
  s. measure 15
  s. set 15
special 79
  s. function 79
  s. set 79, 91
split chain 60
spread-out distribution 12
state space 1
stationary distribution 74
stochastic difference equation 5
stochastic process 3
stopping time 31
  randomized s.t. 31
storage 57
subinvariant 72
  s. measure 72
  r-s. function 72
  r-s. measure 72
total variation 109
  g-t.v. norm 108
  t.v. norm
  t.v. of a measure 109
  t.v. of a renewal sequence 105
transience 38, 50
  R-t. 28, 51
transition matrix 4
transition probability 3
Wald's lemma 81