1 Circulation Distribution, Entropy Production and Irreversibility of Denumerable Markov Chains
The concept of entropy production was first put forward in nonequilibrium statistical physics to describe how far a specific state of a system is from its equilibrium state [220, 344, 439]. It is closely related to the concept of macroscopic irreversibility in nonequilibrium statistical physics: a macroscopically irreversible system in a steady state has a positive entropy production rate and is in a nonequilibrium state. In Chapters 1–6 of this book, various stationary stochastic processes are used to model macroscopic systems in nonequilibrium steady states.

A heuristic introduction to the entropy production of Markov chains begins with the corresponding quantity arising in nonequilibrium statistical physics. Let $\Sigma$ be a nonequilibrium system of coupled chemical reactions, where some of its $N$ reactants are continuously introduced into the system and others are continuously withdrawn, so that the ratio of the reactants can be described by a strictly positive probability distribution $\{\pi_i : 1 \le i \le N\}$. Let $p_{ij}$ be the probability of the reactant $i$ transforming into the reactant $j$; then the affinity (thermodynamic flux)
\[
A_{ij} = \pi_j p_{ji} - \pi_i p_{ij}
\]
expresses the reaction rates. The quantity
\[
\tilde{A}_{ij} = \log \frac{\pi_j p_{ji}}{\pi_i p_{ij}}, \qquad p_{ij} > 0,\ i, j \in \{1, \cdots, N\},
\]
is known in the physical nomenclature as the conjugated thermodynamic force of $A_{ij}$. The expression
\[
\mathrm{EP} \stackrel{\mathrm{def}}{=} \frac{1}{2} \sum_{i,j} A_{ij} \tilde{A}_{ij} = \frac{1}{2} \sum_{i,j} (\pi_i p_{ij} - \pi_j p_{ji}) \log \frac{\pi_i p_{ij}}{\pi_j p_{ji}},
\tag{1.1}
\]
with all $p_{ji} > 0$, may be interpreted as the entropy production rate of the system $\Sigma$ up to a constant factor, namely the Boltzmann constant multiplied by the temperature at which the reactions occur.

D.-Q. Jiang, M. Qian, and M.-P. Qian: LNM 1833, pp. 11–44, 2004.
© Springer-Verlag Berlin Heidelberg 2004

The expression (1.1) was first investigated by J. Schnakenberg [439] from the standpoint of nonequilibrium statistical physics. In [401–403], the entropy production rate of a stochastic process was defined measure-theoretically as the specific relative entropy of the distribution of the process with respect to that of its time reversal, unifying different entropy production formulas in various concrete cases. Suppose that $\xi = \{\xi_n\}_{n\in\mathbb{Z}}$ is a stationary, irreducible and positive recurrent Markov chain with a denumerable state space $S$, a transition probability matrix $P = (p_{ij})_{i,j\in S}$, and a unique invariant distribution $\Pi = (\pi_i)_{i\in S}$. Let $\mathbf{P}$ and $\mathbf{P}^-$ be the distributions of the Markov chain and its time reversal respectively; then the entropy production rate (or the specific information gain of the stationary chain with respect to its time reversal) is defined as
\[
e_p \stackrel{\mathrm{def}}{=} \lim_{n\to+\infty} \frac{1}{n}\, H\!\left(\mathbf{P}|_{\mathcal{F}_0^n},\ \mathbf{P}^-|_{\mathcal{F}_0^n}\right),
\]
where $\mathcal{F}_0^n = \sigma(\xi_k : 0 \le k \le n)$, and $H(\mathbf{P}|_{\mathcal{F}_0^n}, \mathbf{P}^-|_{\mathcal{F}_0^n})$ is the relative entropy of $\mathbf{P}$ with respect to $\mathbf{P}^-$ restricted to the $\sigma$-field $\mathcal{F}_0^n$. One can prove that
\[
e_p = \frac{1}{2} \sum_{i,j\in S} (\pi_i p_{ij} - \pi_j p_{ji}) \log \frac{\pi_i p_{ij}}{\pi_j p_{ji}}.
\tag{1.2}
\]
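As a quick numerical illustration of formula (1.2), the following sketch computes $e_p$ for a small chain. The 3-state matrix is a hypothetical example, not taken from the text; all its off-diagonal entries are positive, so every term of (1.2) is well defined.

```python
import numpy as np

# Hypothetical 3-state transition matrix.
P = np.array([[0.0, 0.7, 0.3],
              [0.2, 0.3, 0.5],
              [0.6, 0.2, 0.2]])

def stationary(P):
    """Invariant distribution: the left eigenvector of P for eigenvalue 1."""
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def entropy_production_rate(P, pi):
    """e_p = (1/2) sum_{i,j} (pi_i p_ij - pi_j p_ji) log(pi_i p_ij / pi_j p_ji),
    summed over pairs with p_ij, p_ji > 0, as in (1.2)."""
    ep = 0.0
    for i in range(len(pi)):
        for j in range(len(pi)):
            if P[i, j] > 0 and P[j, i] > 0:
                a, b = pi[i] * P[i, j], pi[j] * P[j, i]
                ep += 0.5 * (a - b) * np.log(a / b)
    return ep

pi = stationary(P)
print(entropy_production_rate(P, pi))  # strictly positive: this chain is irreversible
```

This chain violates the Kolmogorov cycle criterion ($p_{12}p_{23}p_{31} \neq p_{13}p_{32}p_{21}$), so detailed balance fails and $e_p > 0$; for any reversible chain the same routine returns 0.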
Maes and his collaborators [313–316] gave a definition of the entropy production rate in the context of Gibbs measures which is more or less similar to that in [401–403].

Besides the description of the Markov chain in terms of the transition matrix $P$, which in turn provides the invariant distribution $\Pi$ and the edge-coordinates $\pi_i p_{ij}$, $i, j \in S$, another description can be given in terms of a collection $\mathcal{C}$ of directed circuits (or cycles) and weights $\{w_c : c \in \mathcal{C}\}$ on these circuits, which can be regarded as "global coordinates" or cycle-coordinates. Kalpazidou [257] provided a survey of all the principal trends in the cycle representation theory of Markov processes, which is devoted to the study of the interconnections between the edge-coordinates and the cycle-coordinates, along with the corresponding implications for the study of the stochastic properties of the processes. Another school, the Qians in Beijing (Min Qian, Min-Ping Qian, Guang-Lu Gong, etc.), developed the cycle representations independently, using mainly a behavioral approach. They defined and explored with exceptional completeness the probabilistic analogues of certain basic concepts which rule nonequilibrium statistical physics, such as Hill's cycle flux [223–226], Schnakenberg's entropy production [439], detailed balance, etc.

With probability one the Markov chain $\xi$ generates an infinite sequence of cycles. For a probabilistic cycle representation of the Markov chain $\xi$, the set of cycles $\mathcal{C}$ contains all the directed cycles occurring along almost all sample paths of $\xi$, and the weight $w_c$ is the mean number of occurrences of the cycle $c$ along almost all the sample paths of $\xi$. If we translate the diagram method
of Hill [223–226] into the language of Markov chains, then his concept of cycle flux corresponds to the cycle weights (or circulations) and his concept of detailed balance corresponds to reversibility for the stationary Markov chain $\xi$. The entropy production rate $e_p$ of the chain $\xi$ can also be expressed in terms of the circuits and their weights. The chain $\xi$ is reversible if and only if $e_p$ vanishes, or equivalently, iff every cycle and its reversed cycle have the same weight.

The fluctuation theorem, which was first obtained by Gallavotti and Cohen [150, 163] for hyperbolic dynamical systems and then extended to stochastic processes by Kurchan [276], Lebowitz and Spohn [286], etc., provides the probability ratio of observing trajectories that satisfy or violate the second law of thermodynamics. It can be interpreted as an extension, to arbitrarily strong external fields, of the fluctuation-dissipation theorem [150, 151, 430]. In this chapter and the next one, we will prove the fluctuation theorem in the context of finite Markov chains with discrete and continuous time parameter respectively: the distributions of sample entropy production rates (the logarithm of the Radon-Nikodym derivative of the distribution of the Markov chain with respect to that of its time reversal over a time interval $[0, t]$, $t \in \mathbb{N}$ or $\mathbb{R}^+$) have a large deviation property, and the large deviation rate function has a symmetry of Gallavotti-Cohen type. The proof is based on the well-known Perron-Frobenius theorem, and some ideas of the proof come from Lebowitz and Spohn [286].
1.1 Directed Circuits, Cycles and Passage Functions

A circuit or a cycle is a topological concept that can be defined either by geometric or by algebraic considerations. A characteristic property of a directed circuit is the canonical return to its points, that is, a periodic conformation. Here we adopt the presentation given by Kalpazidou [257] and give a functional version of the definition of a directed circuit expressing periodicity. Namely, a circuit will be defined to be any periodic function on the set of integers.

Definition 1.1.1. A directed circuit-function in a denumerable set $S$ is a periodic function $c$ from the set $\mathbb{Z}$ of integers into $S$. The values $c(n)$, $n \in \mathbb{Z}$, are called points (or vertices, or nodes) of $c$, while the pairs $(c(n), c(n+1))$, $n \in \mathbb{Z}$, are called directed edges (or directed branches, or directed arcs) of $c$. The smallest integer $p = p(c) \ge 1$ that satisfies the equation $c(n + p) = c(n)$ for all $n \in \mathbb{Z}$ is called the period of $c$. A directed circuit-function $c$ with $p(c) = 1$ is called a loop.

With each directed circuit-function $c$ we can associate a whole class of directed circuit-functions $c'$ obtained from $c$ by using the group of translations on $\mathbb{Z}$. For any fixed $i \in \mathbb{Z}$ we put $t_i(n) \stackrel{\mathrm{def}}{=} n + i$, $n \in \mathbb{Z}$; then we can define a new directed circuit-function $c'$ as $c' = c \circ t_i$, that is, $c'(n) = c(n + i)$, $n \in \mathbb{Z}$. Clearly $c$ and $c'$ do not differ essentially, and this suggests the following definition:
\[
\text{Two directed circuit-functions } c \text{ and } c' \text{ are called equivalent if and only if there is some } i \in \mathbb{Z} \text{ such that } c' = c \circ t_i.
\tag{1.3}
\]
Note that (1.3) defines an equivalence relation in the class of all directed circuit-functions in $S$. It is obvious that any two directed circuit-functions in the same equivalence class have the same vertices, period and direction.

Definition 1.1.2. A directed circuit in a denumerable set $S$ is an equivalence class according to the equivalence relation defined in (1.3). A directed circuit $c$ is determined either by: (i) the period $p = p(c)$ and (ii) any $(p+1)$-tuple $(i_1, i_2, \cdots, i_p, i_{p+1})$ with $i_{p+1} = i_1$; or by (i$'$) the period $p = p(c)$ and (ii$'$) any $p$ ordered pairs $(i_1, i_2), (i_2, i_3), \cdots, (i_p, i_{p+1})$ with $i_{p+1} = i_1$, where $i_l = c(n + l - 1)$, $1 \le l \le p$, for some $n \in \mathbb{Z}$.

Definition 1.1.3. The directed cycle associated with a given directed circuit $c = (i_1, i_2, \cdots, i_p, i_1)$, $p \ge 1$, with distinct points $i_1, \cdots, i_p$ is the ordered sequence $\hat{c} = (i_1, \cdots, i_p)$. According to Definition 1.1.2, a cycle is invariant with respect to any cyclic permutation of its points.

Definition 1.1.4. The reverse $c_-$ of a circuit $c = (i_1, i_2, \cdots, i_p, i_1)$, $p > 1$, is the directed circuit $c_- = (i_1, i_p, i_{p-1}, \cdots, i_2, i_1)$.

Definition 1.1.5. Given a directed circuit $c$ in the denumerable set $S$ determined by $(i_1, \cdots, i_{p(c)}, i_1)$, for $k \in S$ define $J_c(k)$ as the number of all integers $l$, $0 \le l \le p(c) - 1$, such that $i_{l+1} = k$. We say that $c$ passes through $k$ if and only if $J_c(k) \neq 0$; $J_c(k)$ is the number of times $k$ is passed by $c$. Clearly $J_{c \circ t_j}(k) = J_c(k)$ for any $j \in \mathbb{Z}$. When all the points of $c$ are distinct, except for the terminals, then
\[
J_c(k) = \begin{cases} 1, & \text{if } k \text{ is a point of } c; \\ 0, & \text{otherwise.} \end{cases}
\]

Definition 1.1.6. Given $r > 1$ consecutive points $k_1, \cdots, k_r \in S$ and a directed circuit $c$ in $S$ with period $p(c)$, define $J_c(k_1, \cdots, k_r)$ as the number of distinct integers $l$, $0 \le l \le p(c) - 1$, such that $c \circ t_l(m) = k_m$, $m = 1, 2, \cdots, r$. We say that $c$ passes through $(k_1, k_2, \cdots, k_r)$ if and only if $J_c(k_1, \cdots, k_r) \neq 0$.
$J_c(k_1, \cdots, k_r)$ is the number of times $c$ passes through $(k_1, \cdots, k_r)$. When all the points of $c$ are distinct, except for the terminals, then
\[
J_c(i, j) = \begin{cases} 1, & \text{if } (i, j) \text{ is an edge of } c; \\ 0, & \text{otherwise.} \end{cases}
\]
Lemma 1.1.7. The passage function $J_c$ satisfies the following balance properties:
\[
J_c(k) = \sum_{i \in S} J_c(k, i) = \sum_{l \in S} J_c(l, k),
\]
\[
J_c(\mathbf{k}) = J_{c_-}(\mathbf{k}_-),
\]
for an arbitrarily given $r \ge 1$ and for any $\mathbf{k} = (k_1, \cdots, k_r) \in S^r$, where $c_-$ always symbolizes the reverse of $c$ and $\mathbf{k}_- = (k_r, \cdots, k_1)$ denotes the reversed tuple.
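To make the passage function concrete, here is a minimal sketch that computes $J_c(k)$ and $J_c(i,j)$ for one period of a circuit and checks the first balance property of Lemma 1.1.7. The circuit $(1,2,3,2)$ is a made-up example (period 4, passing through the point 2 twice).

```python
from collections import Counter

def passage_counts(cycle):
    """For the directed circuit c = (i1,...,ip,i1), return
    J_c(k)   = number of positions l in one period with c(l) = k, and
    J_c(i,j) = number of directed edges (i,j) traversed in one period."""
    p = len(cycle)
    Jk = Counter(cycle)                                              # J_c(k)
    Je = Counter((cycle[l], cycle[(l + 1) % p]) for l in range(p))   # J_c(i,j)
    return Jk, Je

c = (1, 2, 3, 2)
Jk, Je = passage_counts(c)
print(Jk[2])  # 2
# Balance property of Lemma 1.1.7: J_c(k) = sum_i J_c(k,i) = sum_l J_c(l,k),
# since each visit to k is entered by exactly one edge and left by exactly one edge.
```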
1.2 The Derived Chain

In this chapter we suppose that $\xi = \{\xi_n(\omega)\}_{n\in\mathbb{Z}}$ is a stationary, irreducible and positive recurrent Markov chain on a probability space $(\Omega, \mathcal{F}, \mathbf{P})$ with a denumerable state space $S$, a transition probability matrix $P = (p_{ij})_{i,j\in S}$, and a unique invariant probability distribution $\Pi = (\pi_i)_{i\in S}$. For simplicity, we can assume that $(\Omega, \mathcal{F}, \mathbf{P})$ is the canonical orbit space of $\xi$, hence $\Omega = S^{\mathbb{Z}} = \{\omega = (\omega_k)_{k\in\mathbb{Z}} : \omega_k \in S, \forall k \in \mathbb{Z}\}$ and $\xi_n(\omega) = \omega_n$.

With probability one the Markov chain $\xi$ generates an infinite sequence of cycles. If we discard the cycles formed by time $n$ and keep track of the remaining states in sequence, we get a new Markov chain $\{\eta_n\}$ which we call the derived chain. We will give the precise definition later, but the basic idea should be clear from the following example. If the values of the original chain $\{\xi_n(\omega)\}_{n\ge 0}$ are $\{1, 2, 3, 4, 2, 3, 5, 1, 5, 4, 5, \cdots\}$, then the cycles and the corresponding values of the derived chain are as follows:

    n              0     1       2         3           4        5
    ξn(ω)          1     2       3         4           2        3
    ηn(ω)         [1]  [1,2]  [1,2,3]  [1,2,3,4]     [1,2]   [1,2,3]
    cycles formed                                   (2,3,4)

    n              6          7      8       9        10
    ξn(ω)          5          1      5       4         5
    ηn(ω)      [1,2,3,5]     [1]   [1,5]  [1,5,4]    [1,5]
    cycles formed        (1,2,3,5)                   (5,4)

Let $w_{c,n}(\omega)$ be the number of occurrences of the cycle $c$ up to time $n$ along the sample path $\{\xi_l(\omega)\}_{l\ge 0}$. The rigorous definitions of the derived chain $\{\eta_n\}$ and $w_{c,n}(\omega)$ are due to Min-Ping Qian, et al. [400]. Here we adopt the definition given in [404] rather than that adopted by [400] and [405], which is very technical. We denote an ordered sequence of distinct points $i_1, \cdots, i_r$ by $[i_1, \cdots, i_r]$ and identify the ordered union $[[i_1, \cdots, i_m], [i_{m+1}, \cdots, i_{m+k}]]$ with $[i_1, \cdots, i_m, i_{m+1}, \cdots, i_{m+k}]$, where $i_1, \cdots, i_m, i_{m+1}, \cdots, i_{m+k}$ are distinct. The
set $[S]$ of all finite ordered sequences $[i_1, \cdots, i_r]$, $r \ge 1$, of points of $S$ is denumerable. To describe the process of discarding cycles formed by the chain $\xi$, we define a mapping $\ast$ from $[S] \times S$ to $[S]$ by
\[
[i_1, i_2, \cdots, i_r] \ast i \stackrel{\mathrm{def}}{=} \begin{cases} [i_1, i_2, \cdots, i_r, i], & \text{if } i \notin \{i_1, i_2, \cdots, i_r\}; \\ [i_1, i_2, \cdots, i_k], & \text{if } i = i_k \text{ for some } 1 \le k \le r. \end{cases}
\]
Then we can define the derived chain $\eta = \{\eta_n\}_{n\in\mathbb{Z}^+}$ by $\eta_0(\omega) = [\xi_0(\omega)]$ and $\eta_n(\omega) = \eta_{n-1}(\omega) \ast \xi_n(\omega)$ for $n \ge 1$. Each $\eta_n$ is a mapping from $\Omega$ to $[S]$. One can inductively prove that $\eta$ is adapted to the filtration $\{\mathcal{F}_n\}_{n\ge 0}$, where $\mathcal{F}_n = \sigma(\xi_k : 0 \le k \le n)$. It is clear that if $\eta_n(\omega) = [i_0, i_1, \cdots, i_r]$, then $\xi_n(\omega) = i_r$ must hold.

It can be seen from the definition of the derived chain $\eta$ that the transition from $\eta_n(\omega) = [i_0, i_1, \cdots, i_k, \cdots, i_r]$ to $\eta_{n+1}(\omega) = [i_0, i_1, \cdots, i_k]$ in the space $[S]$ corresponds to the completion of the circuit $(i_k, i_{k+1}, \cdots, i_r, i_k)$ for the original chain $\xi$, while $\xi_n(\omega) = i_r$ and $\xi_{n+1}(\omega) = i_k$. Since a cycle $c = (i_1, \cdots, i_s)$ is equivalent to any of its cyclic permutations, several transitions in $[S]$ may correspond to the completion of the cycle $c$. If the initial state $i$ of $\eta$ is on the cycle $c$, say $i = i_k$ ($1 \le k \le s$), then the completion of the cycle $c$ is equivalent to the unique transition of $\eta$ from $[i_k, i_{k+1}, \cdots, i_{k+s-1}]$ to $[i_k]$. If the initial state $i$ of $\eta$ is not on the cycle $c = (i_1, \cdots, i_s)$, then for each cyclic permutation $(i_k, i_{k+1}, \cdots, i_{k+s-1})$ of $(i_1, i_2, \cdots, i_s)$ and any distinct $j_1, \cdots, j_r \in S \setminus \{i, i_1, \cdots, i_s\}$, $r \ge 0$, the transition from $\eta_n(\omega) = [i, j_1, \cdots, j_r, i_k, i_{k+1}, \cdots, i_{k+s-1}]$ to $\eta_{n+1}(\omega) = [i, j_1, \cdots, j_r, i_k]$ corresponds to a completion of the cycle $c$ for the chain $\xi$. For each cycle $c = (i_1, \cdots, i_s)$, let
\[
w_{c,n}(\omega) = \sum_{l=1}^{n} \sum_{k=1}^{s} 1_{\{\tilde{\omega} \,:\, \eta_{l-1}(\tilde{\omega}) = [\eta_l(\tilde{\omega}), [i_k, i_{k+1}, \cdots, i_{k+s-1}]]\}}(\omega),
\tag{1.4}
\]
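The derived-chain bookkeeping behind (1.4) is easy to implement. The following sketch replays the example path $1,2,3,4,2,3,5,1,5,4,5$ from the table above and recovers exactly the cycles $(2,3,4)$, $(1,2,3,5)$ and $(5,4)$.

```python
from collections import Counter

def derived_chain(path):
    """Run eta_n = eta_{n-1} * xi_n: append xi_n if it is new; otherwise truncate
    back to its earlier occurrence, completing the discarded cycle (i_k,...,i_r).
    Returns the trajectory of eta and a Counter of completed cycles."""
    eta, history, cycles = [], [], Counter()
    for x in path:
        if x in eta:
            k = eta.index(x)
            cycles[tuple(eta[k:])] += 1   # the cycle (i_k, ..., i_r) is completed
            eta = eta[:k + 1]
        else:
            eta.append(x)
        history.append(tuple(eta))
    return history, cycles

hist, cycles = derived_chain([1, 2, 3, 4, 2, 3, 5, 1, 5, 4, 5])
print(cycles)  # the cycles (2,3,4), (1,2,3,5) and (5,4), each counted once
```

Note that each completed cycle is recorded here with the representative that starts at its earliest-appended state; by Definition 1.1.3 any cyclic permutation names the same cycle.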
where $1_A(\cdot)$ is the indicator function of the set $A$ and the sums $k+1, k+2, \cdots, k+s-1$ are understood to be modulo $s$. From the analysis above, it is clear that $w_{c,n}(\omega)$ counts the number of times that the cycle $c$ has been formed by time $n$.

Let $[S]_i$ be the subset of all ordered sequences $[i_1, i_2, \cdots, i_r]$ ($r \ge 1$) in $[S]$ such that $i_1 = i$ and $p_{i_k i_{k+1}} > 0$, $\forall 1 \le k < r$. According to the definition of $\eta$, if $\eta_0(\omega) = [i]$, then $\eta_n(\omega) \in [S]_i$, $\forall n \in \mathbb{N}$.

Lemma 1.2.1. $\eta = \{\eta_n\}_{n\ge 0}$ is a homogeneous Markov chain with the countable state space $[S]$ and the initial distribution $\mathbf{P}(\eta_0 = [i]) = \pi_i$, $\mathbf{P}(\eta_0 = y) = 0$, $\forall y \notin \{[i] : i \in S\}$. Each $[S]_i$ is an irreducible positive recurrent class of $\eta$. For any two states $y_1 = [i_1, i_2, \cdots, i_s]$, $y_2 = [j_1, j_2, \cdots, j_r]$ in $[S]_i$, the one-step transition probability of $\eta$ from $y_1$ to $y_2$ is
\[
\tilde{p}_{y_1 y_2} = \begin{cases} p_{i_s j_r}, & \text{if } r \le s \text{ and } i_1 = j_1, i_2 = j_2, \cdots, i_r = j_r, \\ & \text{or } r = s + 1 \text{ and } i_1 = j_1, i_2 = j_2, \cdots, i_s = j_s, \\ 0, & \text{otherwise.} \end{cases}
\tag{1.5}
\]
The unique invariant probability distribution $\tilde{\Pi}^i$ of $\eta$ on each $[S]_i$ satisfies
\[
\tilde{\Pi}^i([i]) = \pi_i.
\tag{1.6}
\]
Proof. If neither $y_2 = [y_1, j_r]$ nor $y_1 = [y_2, [i_{r+1}, \cdots, i_s]]$, then it is impossible for both $\eta_n = y_1$ and $\eta_{n+1} = y_2$ to hold, so for any $z_1, \cdots, z_{n-1} \in [S]_i$,
\[
\mathbf{P}(\eta_{n+1} = y_2 \,|\, \eta_n = y_1, \eta_{n-1} = z_{n-1}, \cdots, \eta_1 = z_1, \eta_0 = [i]) = 0.
\]
If $y_2 = [y_1, j_r]$ or $y_1 = [y_2, [i_{r+1}, \cdots, i_s]]$, then for any suitable $z_1, \cdots, z_{n-1} \in [S]_i$ such that $\mathbf{P}(\eta_n = y_1, \eta_{n-1} = z_{n-1}, \cdots, \eta_1 = z_1, \eta_0 = [i]) > 0$, we have
\begin{align*}
&\mathbf{P}(\eta_{n+1} = y_2 \,|\, \eta_n = y_1, \eta_{n-1} = z_{n-1}, \cdots, \eta_1 = z_1, \eta_0 = [i]) \\
&\quad= \mathbf{P}(\eta_{n+1} = y_2, \xi_{n+1} = j_r \,|\, \xi_n = i_s, \eta_n = y_1, \cdots, \eta_1 = z_1, \eta_0 = [i]) \\
&\quad= \mathbf{P}(\xi_{n+1} = j_r \,|\, \xi_n = i_s, \eta_n = y_1, \eta_{n-1} = z_{n-1}, \cdots, \eta_1 = z_1, \eta_0 = [i]) \\
&\quad= \mathbf{P}(\xi_{n+1} = j_r \,|\, \xi_n = i_s) = p_{i_s j_r}.
\end{align*}
The last two equalities follow from the Markov property of $\xi$.

Suppose that $\eta$ can reach $[i, i_1, \cdots, i_s]$ from $[i]$. Since $\xi$ is irreducible, $\eta$ can also return to $[i]$ from $[i, i_1, \cdots, i_s]$. Thereby $[S]_i$ is an irreducible class of $\eta$. As $\xi$ is recurrent, we have $\mathbf{P}(\xi \text{ returns to } i \,|\, \xi_0 = i) = 1$; therefore,
\[
\mathbf{P}(\eta \text{ returns to } [i] \,|\, \eta_0 = [i]) = \mathbf{P}(\xi \text{ returns to } i \,|\, \xi_0 = i) = 1.
\]
Thus $\eta$ is also recurrent on the irreducible class $[S]_i$. Indeed, the ergodicity of $\xi$ guarantees ergodicity for $\eta$, and so $\eta$ has a unique invariant probability distribution $\tilde{\Pi}^i$ on each $[S]_i$. And we can get $\tilde{\Pi}^i([i]) = \pi_i$ from
\[
\tilde{\Pi}^i([i]) = \lim_{n\to+\infty} \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{P}(\eta_k = [i] \,|\, \eta_0 = [i]) = \lim_{n\to+\infty} \frac{1}{n} \sum_{k=0}^{n-1} \mathbf{P}(\xi_k = i \,|\, \xi_0 = i) = \pi_i.
\]
The general probability $\tilde{\Pi}^{i_1}([i_1, i_2, \cdots, i_s])$ has a much more complex algebraic expression in terms of the transition probabilities $p_{ij}$ of $\xi$, which is due to Min-Ping Qian, Min Qian and Cheng Qian [400, 406]. In case the Markov chain $\xi$ has a finite state space $S = \{1, 2, \cdots, N\}$, let $D = (d_{ij}) = I - P$, and let $D(H)$ be the determinant of $D$ with rows and columns indexed by the index set $H$; $D(\emptyset)$ is understood as 1.

Theorem 1.2.2. If the state space $S$ of the Markov chain $\xi$ is finite, then we have:
1) The invariant probability distribution $\tilde{\Pi}^i$ of the Markov chain $\eta$ on the recurrent class $[S]_i$ is given by
\[
\tilde{\Pi}^i([i_1, i_2, \cdots, i_s]) = p_{i_1 i_2} p_{i_2 i_3} \cdots p_{i_{s-1} i_s} \cdot \frac{D(\{i_1, i_2, \cdots, i_s\}^c)}{\sum_{j\in S} D(\{j\}^c)},
\tag{1.7}
\]
where $i_1 = i$;
2)
\[
\tilde{\Pi}^{i_1}([i_1, i_2, \cdots, i_s])\, p_{i_s i_1} = \sum_{k=1}^{s} \sum_{r\ge 1} \sum_{j_2,\cdots,j_r} \tilde{\Pi}^{j_1}([j_1, \cdots, j_r, i_k, i_{k+1}, \cdots, i_{k+s-1}])\, p_{i_{k-1} i_k},
\tag{1.8}
\]
where $j_1$ is fixed in the complement set of $\{i_1, i_2, \cdots, i_s\}$, the inner sum is taken over distinct choices $j_2, j_3, \cdots, j_r \in S \setminus \{j_1, i_1, \cdots, i_s\}$, and the sums $k+1, k+2, \cdots, k+s-1$ are understood to be modulo $s$.

In case the state space $S$ is a countable set, the invariant distribution $\tilde{\Pi}^i$ of the Markov chain $\eta$ on the recurrent class $[S]_i$ can be expressed by a set of taboo probabilities, and 2) of the theorem still holds true. For the convenience of those readers who are not very familiar with probability theory, especially the theory of taboo probability, we present here the case that $S$ is finite and give an algebraic proof by the method of calculating some determinants, while we later present the general case ($S$ finite or not), express the invariant distribution $\tilde{\Pi}^i$ in terms of some taboo Green functions, and give a probabilistic proof in Appendix 1.6. To prove Theorem 1.2.2, we first need four lemmas.

Lemma 1.2.3. The unique invariant probability distribution $\Pi = (\pi_i)_{i\in S}$ of the Markov chain $\xi$ can be expressed as
\[
\pi_i = \frac{D(\{i\}^c)}{\sum_{j\in S} D(\{j\}^c)}.
\]
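Lemma 1.2.3 expresses the invariant distribution through principal minors of $D = I - P$ and is easy to check numerically. A minimal sketch, using a made-up 3-state matrix:

```python
import numpy as np

def D_minor(D, keep):
    """D(H): determinant of D restricted to the rows and columns in H; D(emptyset) = 1."""
    keep = list(keep)
    return np.linalg.det(D[np.ix_(keep, keep)]) if keep else 1.0

# Hypothetical 3-state transition matrix.
P = np.array([[0.1, 0.6, 0.3],
              [0.4, 0.2, 0.4],
              [0.5, 0.3, 0.2]])
N = 3
D = np.eye(N) - P

# Lemma 1.2.3: pi_i = D({i}^c) / sum_j D({j}^c)
num = np.array([D_minor(D, [j for j in range(N) if j != i]) for i in range(N)])
pi = num / num.sum()
print(np.allclose(pi @ P, pi))  # True: pi is the invariant distribution
```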
Proof. The unique invariant probability distribution $\Pi = (\pi_1, \pi_2, \cdots, \pi_N)$ is the solution of the system of equations $\Pi D = 0$ and $\Pi \mathbf{1} = 1$, where $\mathbf{1} = (1, \cdots, 1)^T$. Since the sum of every row of $D$ is 0, the above system of equations is equivalent to
\[
(\pi_1, \cdots, \pi_N) \begin{pmatrix} 1 & d_{11} & d_{12} & \cdots & d_{1,j-1} & d_{1,j+1} & \cdots & d_{1N} \\ 1 & d_{21} & d_{22} & \cdots & d_{2,j-1} & d_{2,j+1} & \cdots & d_{2N} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & d_{N1} & d_{N2} & \cdots & d_{N,j-1} & d_{N,j+1} & \cdots & d_{NN} \end{pmatrix} = (1, 0, \cdots, 0),
\]
where $j$ can be any one of the integers $1, 2, \cdots, N$. Denote the system of equations above simply as
\[
\Pi D_j = (1, 0, \cdots, 0).
\]
Then by the classical adjoint expression of the inverse matrix,
\[
\Pi = (1, 0, \cdots, 0)\, D_j^{-1} = \left( \ast, \cdots, \ast, \frac{D(\{j\}^c)}{(-1)^{j+1}\det D_j}, \ast, \cdots, \ast \right),
\]
hence the $j$-th element
\[
\pi_j = \frac{D(\{j\}^c)}{(-1)^{j+1}\det D_j}.
\tag{1.9}
\]
Furthermore, for each $2 \le j \le N$, if we add all columns of $D_j$ except the first one to the second column, then
\begin{align*}
\det D_j &= \det \begin{pmatrix} 1 & d_{11} & d_{12} & \cdots & d_{1,j-1} & d_{1,j+1} & \cdots & d_{1N} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & d_{N1} & d_{N2} & \cdots & d_{N,j-1} & d_{N,j+1} & \cdots & d_{NN} \end{pmatrix} \\
&= \det \begin{pmatrix} 1 & -d_{1j} & d_{12} & \cdots & d_{1,j-1} & d_{1,j+1} & \cdots & d_{1N} \\ \vdots & \vdots & \vdots & & \vdots & \vdots & & \vdots \\ 1 & -d_{Nj} & d_{N2} & \cdots & d_{N,j-1} & d_{N,j+1} & \cdots & d_{NN} \end{pmatrix} \\
&= (-1)^{j-1} \det \begin{pmatrix} 1 & d_{12} & \cdots & d_{1,j-1} & d_{1j} & d_{1,j+1} & \cdots & d_{1N} \\ \vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\ 1 & d_{N2} & \cdots & d_{N,j-1} & d_{Nj} & d_{N,j+1} & \cdots & d_{NN} \end{pmatrix} \\
&= (-1)^{j+1} \det D_1.
\end{align*}
Therefore, for each $2 \le j \le N$,
\[
(-1)^{j+1} \det D_j = \det D_1.
\tag{1.10}
\]
Then from $\sum_{i\in S} \pi_i = 1$ and (1.9), we get
\[
\det D_1 = \sum_{i\in S} \pi_i \det D_1 = \sum_{i\in S} \pi_i (-1)^{i+1} \det D_i = \sum_{i\in S} D(\{i\}^c),
\]
which together with (1.9) and (1.10) implies the desired result.

Lemma 1.2.4.
\[
D(\{i_1, i_2, \cdots, i_{s-1}\}^c) = d_{i_s i_s} D(\{i_1, \cdots, i_s\}^c) - \sum_{r>0,\, j_1,\cdots,j_r} p_{i_s j_1} p_{j_1 j_2} \cdots p_{j_{r-1} j_r} p_{j_r i_s} D(\{j_1, \cdots, j_r, i_1, \cdots, i_s\}^c),
\]
where the sum is taken over all distinct choices $j_1, \cdots, j_r \in \{i_1, \cdots, i_s\}^c$.
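Lemma 1.2.4 can be checked numerically by brute-force enumeration of the ordered tuples $j_1, \cdots, j_r$. The sketch below uses a made-up 4-state matrix and (in 0-based indexing) the assumed instance $\{i_1, i_2\} = \{0, 1\}$ with $i_s = 1$.

```python
import numpy as np
from itertools import permutations

def D_minor(D, keep):
    """D(H): determinant of D restricted to the rows and columns in H; D(emptyset) = 1."""
    keep = list(keep)
    return np.linalg.det(D[np.ix_(keep, keep)]) if keep else 1.0

# Hypothetical 4-state transition matrix (rows sum to 1).
P = np.array([[0.10, 0.30, 0.40, 0.20],
              [0.25, 0.25, 0.25, 0.25],
              [0.30, 0.20, 0.10, 0.40],
              [0.20, 0.30, 0.30, 0.20]])
N = 4
D = np.eye(N) - P

i_set = [0, 1]                                     # {i_1, i_2}, with i_s = 1
i_s = 1
comp = [k for k in range(N) if k not in i_set]     # {i_1,...,i_s}^c = {2, 3}

# Left side: D({i_1,...,i_{s-1}}^c) = D({0}^c)
lhs = D_minor(D, [k for k in range(N) if k != 0])

# Right side: d_{i_s i_s} D({i_1,...,i_s}^c) minus the sum over ordered
# distinct tuples (j_1,...,j_r) drawn from the complement.
rhs = D[i_s, i_s] * D_minor(D, comp)
for r in range(1, len(comp) + 1):
    for js in permutations(comp, r):
        w = P[i_s, js[0]]
        for a, b in zip(js, js[1:]):
            w *= P[a, b]
        w *= P[js[-1], i_s]
        rhs -= w * D_minor(D, [k for k in comp if k not in js])
print(np.isclose(lhs, rhs))  # True
```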
Proof. Let $D(i, j \,|\, k_1, \cdots, k_r)$ be the determinant formed by the $i$-th, $k_1$-th, $\cdots$, $k_r$-th rows and the $j$-th, $k_1$-th, $\cdots$, $k_r$-th columns of the matrix $D$; then it follows from expanding by the first row that
\[
D(i, j \,|\, k_1, \cdots, k_r) = d_{ij} D(\{k_1, \cdots, k_r\}) + \sum_{l=1}^{r} p_{i k_l} D(k_l, j \,|\, k_1, \cdots, k_{l-1}, k_{l+1}, \cdots, k_r).
\tag{1.11}
\]
Next we exploit (1.11) and induction on $r$ to prove
\[
D(i, j \,|\, k_1, \cdots, k_r) = d_{ij} D(\{k_1, \cdots, k_r\}) - \sum_{\alpha>0,\, j_1,\cdots,j_\alpha} p_{i j_1} p_{j_1 j_2} \cdots p_{j_{\alpha-1} j_\alpha} p_{j_\alpha j} D(\{k_1, \cdots, k_r\} \cap \{j_1, \cdots, j_\alpha\}^c)
\tag{1.12}
\]
with the sum taken over distinct $j_1, \cdots, j_\alpha$ contained in $\{k_1, \cdots, k_r\}$. Obviously, in the case $r = 1$, (1.12) is true. Assuming that (1.12) holds for $r$, from (1.11) we get
\begin{align*}
D(i, j \,|\, k_1, \cdots, k_{r+1}) &= d_{ij} D(\{k_1, \cdots, k_{r+1}\}) + \sum_{l=1}^{r+1} p_{i k_l} D(k_l, j \,|\, k_1, \cdots, k_{l-1}, k_{l+1}, \cdots, k_{r+1}) \\
&= d_{ij} D(\{k_1, \cdots, k_{r+1}\}) + \sum_{l=1}^{r+1} p_{i k_l} \Big[ d_{k_l j} D(\{k_1, \cdots, k_{l-1}, k_{l+1}, \cdots, k_{r+1}\}) \\
&\qquad - \sum_{\alpha>0,\, j_1,\cdots,j_\alpha} p_{k_l j_1} \cdots p_{j_\alpha j} D(\{k_1, \cdots, k_{l-1}, k_{l+1}, \cdots, k_{r+1}\} \cap \{j_1, \cdots, j_\alpha\}^c) \Big] \\
&= d_{ij} D(\{k_1, \cdots, k_{r+1}\}) - \sum_{\alpha'>0,\, j_1,\cdots,j_{\alpha'}} p_{i j_1} \cdots p_{j_{\alpha'} j} D(\{k_1, \cdots, k_{r+1}\} \cap \{j_1, \cdots, j_{\alpha'}\}^c)
\end{align*}
with the last sum taken over distinct $j_1, \cdots, j_{\alpha'}$ contained in $\{k_1, \cdots, k_{r+1}\}$. Thus by induction, (1.12) holds for any $r$. Finally, we put $i = j = i_s$ and $\{k_1, \cdots, k_r\} = \{i_1, \cdots, i_s\}^c$ in (1.12) to get the result in the lemma.

Lemma 1.2.5. For every fixed $j \in \{i_1, \cdots, i_s\}^c$, we have
\[
D(\{i_1, \cdots, i_s\}^c) = \sum_{k=1}^{s} \sum_{r\ge 0,\, j_1,\cdots,j_r} p_{j j_1} p_{j_1 j_2} \cdots p_{j_r i_k} D(\{j, j_1, \cdots, j_r, i_1, \cdots, i_s\}^c)
\tag{1.13}
\]
with the inner sum taken over distinct $j_1, \cdots, j_r \in \{j, i_1, \cdots, i_s\}^c$.
Proof. Since by a permutation we can change the order of the rows and columns simultaneously in a determinant without changing its value, we can simply assume that $\{i_1, \cdots, i_s, j\} = \{1, 2, \cdots, s, s+1\}$. As $D(\emptyset)$ is understood as 1, (1.13) holds for the case $s = N - 1$. For the case $N - s = 2$, since
\begin{align*}
D(\{N-1, N\}) &= d_{N-1,N-1}\, d_{NN} - d_{N-1,N}\, d_{N,N-1} \\
&= \Big( \sum_{k=1}^{N-2} p_{N-1,k} + p_{N-1,N} \Big) d_{NN} - p_{N-1,N}\, p_{N,N-1} \\
&= \sum_{k=1}^{N-2} p_{N-1,k}\, d_{NN} + \sum_{k=1}^{N-2} p_{N-1,N}\, p_{Nk},
\end{align*}
(1.13) holds true. Next we prove (1.13) by induction. Assuming that
\[
D(\{s+1, \cdots, N\}) = \sum_{k=1}^{s} \sum_{r\ge 0,\, j_1,\cdots,j_r} p_{s+1, j_1} p_{j_1 j_2} \cdots p_{j_r k} D(\{s+1, j_1, \cdots, j_r, 1, \cdots, s\}^c),
\]
we have to prove
\[
D(\{s, s+1, \cdots, N\}) = \sum_{k=1}^{s-1} \sum_{r\ge 0,\, j_1,\cdots,j_r} p_{s j_1} p_{j_1 j_2} \cdots p_{j_r k} D(\{s, j_1, \cdots, j_r, 1, \cdots, s-1\}^c)
\tag{1.14}
\]
with the inner sum taken over distinct $j_1, \cdots, j_r \in \{1, 2, \cdots, s\}^c$. In fact, the contribution from $r = 0$ on the right side of (1.14) is
\begin{align*}
\sum_{k=1}^{s-1} p_{sk}\, D(\{1, 2, \cdots, s-1, s\}^c) &= \sum_{k=1}^{s-1} p_{sk}\, D(\{s+1, \cdots, N\}) \\
&= \Big( d_{ss} - \sum_{j_0=s+1}^{N} p_{s j_0} \Big) D(\{s+1, \cdots, N\}) \\
&= d_{ss}\, D(\{s+1, \cdots, N\}) \\
&\qquad - \sum_{j_0=s+1}^{N} p_{s j_0} \sum_{k=1}^{s} \sum_{r\ge 0,\, j_1,\cdots,j_r} p_{j_0 j_1} p_{j_1 j_2} \cdots p_{j_r k} D(\{j_0, j_1, \cdots, j_r, 1, 2, \cdots, s\}^c) \quad \text{(by the induction hypothesis)} \\
&= d_{ss}\, D(\{s+1, \cdots, N\}) - \sum_{k=1}^{s} \sum_{r\ge 1,\, j_0,\cdots,j_r} p_{s j_0} p_{j_0 j_1} \cdots p_{j_r k} D(\{j_0, \cdots, j_r, 1, \cdots, s\}^c).
\tag{1.15}
\end{align*}
We combine the second term in (1.15) with the remaining terms on the right side of (1.14) to get
\[
- \sum_{r\ge 1,\, j_1,\cdots,j_r} p_{s j_1} p_{j_1 j_2} \cdots p_{j_r s} D(\{1, 2, \cdots, s, j_1, \cdots, j_r\}^c)
\]
with the sum taken over distinct $j_1, \cdots, j_r \ge s+1$. Finally, we combine this with the first term in (1.15), and (1.14) follows with the help of Lemma 1.2.4.

Lemma 1.2.6. For any $i \in \{1, 2, \cdots, N\}$,
\[
\sum_{j=1}^{N} D(\{j\}^c) = \sum_{[i_1,\cdots,i_s]\in [S]_i} p_{i_1 i_2} \cdots p_{i_{s-1} i_s} D(\{i_1, \cdots, i_s\}^c).
\]
For the term $s = 1$, $p_{i_1 i_2} \cdots p_{i_{s-1} i_s}$ is understood as 1.

Proof. Summing up over the case $s = 1$ in (1.13), we get
\begin{align*}
\sum_{j=1}^{N} D(\{j\}^c) &= \sum_{\substack{j=1 \\ j\neq i}}^{N} \sum_{r\ge 0,\, j_1,\cdots,j_r} p_{i j_1} p_{j_1 j_2} \cdots p_{j_r j} D(\{i, j_1, \cdots, j_r, j\}^c) + D(\{i\}^c) \\
&= \sum_{\substack{s\ge 2,\, i_2,\cdots,i_s \\ i_1 = i}} p_{i_1 i_2} p_{i_2 i_3} \cdots p_{i_{s-1} i_s} D(\{i_1, i_2, \cdots, i_s\}^c) + D(\{i\}^c) \\
&= \sum_{[i_1,\cdots,i_s]\in [S]_i} p_{i_1 i_2} \cdots p_{i_{s-1} i_s} D(\{i_1, \cdots, i_s\}^c).
\end{align*}
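Lemma 1.2.6 can likewise be verified by enumerating $[S]_i$, the ordered sequences of distinct states starting at $i$ (here every entry of the made-up matrix is positive, so all such sequences qualify). A minimal sketch:

```python
import numpy as np
from itertools import permutations

def D_minor(D, keep):
    """D(H): determinant of D restricted to the rows and columns in H; D(emptyset) = 1."""
    keep = list(keep)
    return np.linalg.det(D[np.ix_(keep, keep)]) if keep else 1.0

# Hypothetical 4-state transition matrix.
P = np.array([[0.10, 0.30, 0.40, 0.20],
              [0.25, 0.25, 0.25, 0.25],
              [0.30, 0.20, 0.10, 0.40],
              [0.20, 0.30, 0.30, 0.20]])
N = 4
D = np.eye(N) - P

lhs = sum(D_minor(D, [k for k in range(N) if k != j]) for j in range(N))

i = 0
rhs = 0.0
others = [k for k in range(N) if k != i]
for r in range(N):                       # length of the tail after i
    for tail in permutations(others, r):
        seq = (i,) + tail                # an element [i_1,...,i_s] of [S]_i
        w = 1.0
        for a, b in zip(seq, seq[1:]):
            w *= P[a, b]                 # p_{i_1 i_2} ... p_{i_{s-1} i_s}
        rhs += w * D_minor(D, [k for k in range(N) if k not in seq])
print(np.isclose(lhs, rhs))  # True
```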
Proof of Theorem 1.2.2. The derived chain $\eta$ is positive recurrent on each irreducible class $[S]_i$, and its invariant measure on $[S]_i$ should be the unique solution to the following system of equations:
\[
\begin{cases} \tilde{\Pi}^i \tilde{P}_i = \tilde{\Pi}^i, \\ \sum_{[i_1,\cdots,i_s]\in[S]_i} \tilde{\Pi}^i([i_1, \cdots, i_s]) = 1, \end{cases}
\tag{1.16}
\]
where $\tilde{P}_i = (\tilde{p}_{y_1 y_2})$ is the probability transition matrix of $\eta$ on $[S]_i$. It follows from Lemma 1.2.6 that $\tilde{\Pi}^i$ given by the right side of (1.7) satisfies the second equation. From $d_{jj} = 1 - p_{jj} > 0$ and Lemma 1.2.4, we get
\[
\tilde{\Pi}^i([i_1, \cdots, i_s]) = \tilde{\Pi}^i([i_1, \cdots, i_{s-1}])\, p_{i_{s-1} i_s} + \tilde{\Pi}^i([i_1, \cdots, i_s])\, p_{i_s i_s} + \sum_{r\ge 1,\, j_1,\cdots,j_r} \tilde{\Pi}^i([i_1, \cdots, i_s, j_1, \cdots, j_r])\, p_{j_r i_s},
\]
i.e. the first equation is satisfied. Therefore, $\tilde{\Pi}^i$ given by (1.7) is the unique solution of (1.16) and it has to be the invariant measure of $\eta$ on $[S]_i$. (1.8) simply follows from Lemma 1.2.5.
˜ i given by (1.7) is the unique i.e. the first equation is satisfied. Therefore, Π solution of (1.16) and it has to be the invariant measure of η on [S]i . (1.8) simply follows from Lemma 1.2.5. In the general case, whether the state space S is finite or not, we have the following result, of which we will give a probabilistic proof in Appendix 1.6.
Theorem 1.2.7. 1) The invariant probability distribution $\tilde{\Pi}^i$ of the Markov chain $\eta$ on the recurrent class $[S]_i$ can be expressed as
\[
\tilde{\Pi}^i([i_1, i_2, \cdots, i_s]) = p_{i_1 i_2} p_{i_2 i_3} \cdots p_{i_{s-1} i_s}\, \pi_{i_1}\, g(i_2, i_2 \,|\, \{i_1\})\, g(i_3, i_3 \,|\, \{i_1, i_2\}) \cdots g(i_s, i_s \,|\, \{i_1, \cdots, i_{s-1}\}),
\tag{1.17}
\]
where $i_1 = i$, and for each $1 \le l \le s - 1$, $g(j, k \,|\, \{i_1, \cdots, i_l\})$ denotes the taboo Green function
\[
g(j, k \,|\, \{i_1, \cdots, i_l\}) = \sum_{n=0}^{+\infty} \mathbf{P}(\xi_n = k,\ \xi_m \notin \{i_1, \cdots, i_l\} \text{ for } 1 \le m < n \,|\, \xi_0 = j);
\]
moreover, the product $\pi_{i_1} g(i_2, i_2 \,|\, \{i_1\})\, g(i_3, i_3 \,|\, \{i_1, i_2\}) \cdots g(i_s, i_s \,|\, \{i_1, \cdots, i_{s-1}\})$ is unaffected by any permutation of the indices $i_1, i_2, \cdots, i_s$.
2)
\[
\tilde{\Pi}^{i_1}([i_1, i_2, \cdots, i_s])\, p_{i_s i_1} = \sum_{k=1}^{s} \sum_{r\ge 1} \sum_{j_2,\cdots,j_r} \tilde{\Pi}^{j_1}([j_1, \cdots, j_r, i_k, i_{k+1}, \cdots, i_{k+s-1}])\, p_{i_{k-1} i_k},
\tag{1.18}
\]
where $j_1$ is fixed in the complement set of $\{i_1, i_2, \cdots, i_s\}$, the inner sum is taken over distinct choices $j_2, j_3, \cdots, j_r \in S \setminus \{j_1, i_1, \cdots, i_s\}$, and the sums $k+1, k+2, \cdots, k+s-1$ are understood to be modulo $s$.
3) For any fixed points $i$ and $j$,
\[
\pi_j = \sum_{r\ge 1} \sum_{j_2,\cdots,j_r} \tilde{\Pi}^i([i, j_2, \cdots, j_r, j]),
\tag{1.19}
\]
where the inner sum is taken over all distinct choices $j_2, \cdots, j_r \in S \setminus \{i, j\}$.

The following fact relates Theorem 1.2.7 to the special case of Theorem 1.2.2.

Proposition 1.2.8. In case $S$ is a finite set, for any distinct $i_1, i_2, \cdots, i_s, i_{s+1} \in S$, the taboo Green function
\[
g(i_{s+1}, i_{s+1} \,|\, \{i_1, i_2, \cdots, i_s\}) = \frac{D(\{i_1, i_2, \cdots, i_{s+1}\}^c)}{D(\{i_1, i_2, \cdots, i_s\}^c)}.
\tag{1.20}
\]
Proof. By the definition of taboo probability, it is easy to see that for $i, j \notin \{i_1, \cdots, i_s\}$, we have
\[
g(i, j \,|\, \{i_1, i_2, \cdots, i_s\}) = \sum_{n=0}^{+\infty} \big( {}_{\{i_1,\cdots,i_s\}}P \big)^n_{ij},
\]
where ${}_{\{i_1,\cdots,i_s\}}P$ is the matrix formed by deleting the rows and columns indexed by $i_1, \cdots, i_s$ from $P$. Consequently, $\big(g(i, j \,|\, \{i_1, i_2, \cdots, i_s\})\big) = (I - {}_{\{i_1,\cdots,i_s\}}P)^{-1}$, and it follows from the classical adjoint expression of the inverse matrix that
\[
g(i_{s+1}, i_{s+1} \,|\, \{i_1, \cdots, i_s\}) = \frac{D(\{i_1, \cdots, i_s, i_{s+1}\}^c)}{D(\{i_1, \cdots, i_s\}^c)}.
\]
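Proposition 1.2.8 is easy to check numerically: compute the taboo Green function as $(I - {}_{H}P)^{-1}$ and compare with the determinant ratio (1.20). The 4-state matrix and the taboo set $\{0, 1\}$ below are made-up choices.

```python
import numpy as np

# Hypothetical 4-state transition matrix.
P = np.array([[0.10, 0.30, 0.40, 0.20],
              [0.25, 0.25, 0.25, 0.25],
              [0.30, 0.20, 0.10, 0.40],
              [0.20, 0.30, 0.30, 0.20]])
N = 4
D = np.eye(N) - P

taboo = [0, 1]                         # the taboo set {i_1, i_2}
rest = [2, 3]                          # states outside the taboo set
P_rest = P[np.ix_(rest, rest)]

# g(j, k | taboo) = sum_n ((taboo-deleted P)^n)_{jk} = ((I - P_rest)^{-1})_{jk}
G = np.linalg.inv(np.eye(len(rest)) - P_rest)

# (1.20) with i_{s+1} = 2: g(2, 2 | {0,1}) = D({0,1,2}^c) / D({0,1}^c)
g = G[0, 0]                                              # entry of state 2 inside `rest`
ratio = D[3, 3] / np.linalg.det(D[np.ix_(rest, rest)])   # D({3}) / D({2,3})
print(np.isclose(g, ratio))  # True
```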
1.3 Circulation Distribution of Recurrent Markov Chains

With the derived chain, in (1.4), we have rigorously defined $w_{c,n}(\omega)$, the number of occurrences of the cycle $c$ up to time $n$ along the trajectory $\omega$ of $\xi$. Applying the Birkhoff ergodic theorem to the positive recurrent and stationary Markov chain $\xi$, we easily get that for any $i, j \in S$, the edge weight $w(i, j) = \pi_i p_{ij}$ is the mean number of consecutive passages of $\xi$ through the points $i$ and $j$. That is, $\pi_i p_{ij}$ is the almost sure limit of
\[
\frac{1}{n}\, \mathrm{card}\{m : 0 \le m < n,\ \xi_m(\omega) = i,\ \xi_{m+1}(\omega) = j\}
\]
as $n \to +\infty$. Because of the non-stationarity of the Markov chain $\eta$, which is used to express $w_{c,n}(\omega)$, we cannot directly apply the Birkhoff ergodic theorem to the derived chain $\eta$ to get the almost sure limit of $\frac{w_{c,n}(\omega)}{n}$. But we can exploit the strong law of large numbers for Markov chains, which is stated in the following lemma.

Lemma 1.3.1. Suppose that $X = \{X_n\}_{n\ge 0}$ is a homogeneous, irreducible and positive recurrent Markov chain with a countable state space $S$ and a unique invariant probability distribution $\mu = (\mu_i)_{i\in S}$. Then for any bounded function $f$ on $S$ and any given probability distribution of $X_0$, almost surely we have
\[
\lim_{n\to+\infty} \frac{1}{n} \sum_{k=0}^{n-1} f(X_k) = E^\mu f(\cdot) = \sum_{i\in S} \mu_i f(i).
\tag{1.21}
\]

To apply the result above to the expression of $\frac{w_{c,n}(\omega)}{n}$, we need to define a new Markov chain $\zeta = \{\zeta_n\}_{n\ge 0}$ by $\zeta_n(\omega) = (\eta_n(\omega), \eta_{n+1}(\omega))$, $\forall n \ge 0$. The following properties of $\zeta$ can be easily proved.

Lemma 1.3.2. $\zeta$ is a homogeneous Markov chain with the countable state space $[S] \times [S]$. For each $i \in S$, $\zeta$ is positive recurrent on the irreducible class
\[
\{(y_0, y_1) \in [S]_i \times [S]_i : \tilde{p}_{y_0 y_1} > 0\}
\]
with the unique invariant probability distribution $\hat{\Pi}^i = \{\hat{\Pi}^i(y_0, y_1)\}$, where $\hat{\Pi}^i(y_0, y_1) = \tilde{\Pi}^i(y_0)\, \tilde{p}_{y_0 y_1}$.
Theorem 1.3.3. Let $C_n(\omega)$ be the class of all cycles occurring along the sample path $\{\xi_l(\omega)\}_{l\ge 0}$ up to time $n > 0$. Then the sequence of sample weighted cycles $(C_n(\omega), w_{c,n}(\omega)/n)$ associated with the chain $\xi$ converges almost surely to a class $(C_\infty, w_c)$, that is,
\[
C_\infty = \lim_{n\to+\infty} C_n(\omega), \quad \text{a.e.}
\tag{1.22}
\]
\[
w_c = \lim_{n\to+\infty} \frac{w_{c,n}(\omega)}{n}, \quad \text{a.e.}
\tag{1.23}
\]
Furthermore, for any directed cycle $c = (i_1, i_2, \cdots, i_s) \in C_\infty$, the weight $w_c$ is given by
\[
w_c = p_{i_1 i_2} p_{i_2 i_3} \cdots p_{i_{s-1} i_s} p_{i_s i_1}\, \pi_{i_1}\, g(i_2, i_2 \,|\, \{i_1\})\, g(i_3, i_3 \,|\, \{i_1, i_2\}) \cdots g(i_s, i_s \,|\, \{i_1, i_2, \cdots, i_{s-1}\}),
\tag{1.24}
\]
where $g(i_k, i_k \,|\, \{i_1, \cdots, i_{k-1}\})$ denotes the taboo Green function introduced in Theorem 1.2.7. In case $S$ is a finite set, the weight $w_c$ can be expressed as
\[
w_c = p_{i_1 i_2} p_{i_2 i_3} \cdots p_{i_{s-1} i_s} p_{i_s i_1} \cdot \frac{D(\{i_1, i_2, \cdots, i_s\}^c)}{\sum_{j\in S} D(\{j\}^c)}.
\tag{1.25}
\]
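As a sanity check on (1.25), one can simulate a finite chain, count cycle completions with the derived-chain construction, and compare $w_{c,n}/n$ with the determinant formula. This sketch uses a made-up 3-state matrix; since a cycle is only defined up to cyclic permutation, completed cycles are stored with their smallest state rotated to the front.

```python
import numpy as np
from collections import Counter

def D_minor(D, keep):
    keep = list(keep)
    return np.linalg.det(D[np.ix_(keep, keep)]) if keep else 1.0

def canon(c):
    """Canonical representative of the cyclic class of c."""
    return min(tuple(c[k:]) + tuple(c[:k]) for k in range(len(c)))

# Hypothetical 3-state transition matrix.
P = np.array([[0.0, 0.8, 0.2],
              [0.3, 0.0, 0.7],
              [0.5, 0.5, 0.0]])
N = 3
D = np.eye(N) - P
denom = sum(D_minor(D, [k for k in range(N) if k != j]) for j in range(N))

def w_formula(c):
    """Cycle weight (1.25): p_{i1 i2}...p_{is i1} * D({i1,...,is}^c) / sum_j D({j}^c)."""
    w = 1.0
    for k in range(len(c)):
        w *= P[c[k], c[(k + 1) % len(c)]]
    return w * D_minor(D, [k for k in range(N) if k not in c]) / denom

# Simulate the chain and pop completed cycles off the derived chain.
rng = np.random.default_rng(0)
n_steps = 200_000
x, eta, counts = 0, [0], Counter()
for _ in range(n_steps):
    x = int(rng.choice(N, p=P[x]))
    if x in eta:
        k = eta.index(x)
        counts[canon(eta[k:])] += 1
        eta = eta[:k + 1]
    else:
        eta.append(x)

c = (0, 1, 2)
print(counts[c] / n_steps, w_formula(c))  # the two values should be close
```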
Proof. Since the sequence $\{C_n(\omega)\}$ is increasing, we can assign to each $\omega$ the class $\lim_{n\to+\infty} C_n(\omega)$ of directed cycles that occur along $\{\xi_l(\omega)\}_{l\ge 0}$. Denote
\[
C_\infty(\omega) \stackrel{\mathrm{def}}{=} \lim_{n\to+\infty} C_n(\omega) = \bigcup_{n=1}^{+\infty} C_n(\omega).
\]
For each $i \in S$, we denote by $\mathbf{P}_i$ the conditional probability distribution $\mathbf{P}(\cdot \,|\, \xi_0 = i)$ on the canonical orbit space $\Omega$ of $\xi$. Given a cycle $c = (i_1, i_2, \cdots, i_s)$, recall that
\[
w_{c,n}(\omega) = \sum_{l=1}^{n} \sum_{k=1}^{s} 1_{\{\tilde{\omega} \,:\, \eta_{l-1}(\tilde{\omega}) = [\eta_l(\tilde{\omega}), [i_k, i_{k+1}, \cdots, i_{k+s-1}]]\}}(\omega),
\]
and apply Lemma 1.3.1 to the Markov chain $\zeta$; then we can get that for each $i \in S$ and $\mathbf{P}_i$-almost every $\omega$,
\begin{align*}
\lim_{n\to+\infty} \frac{w_{c,n}(\omega)}{n} &= E^{\hat{\Pi}^i} \sum_{k=1}^{s} 1_{\{(y_0, y_1) \,:\, y_0 = [y_1, [i_k, i_{k+1}, \cdots, i_{k+s-1}]]\}}(\cdot) \\
&= \begin{cases} \displaystyle\sum_{k=1}^{s} \sum_{r\ge 0} \sum_{j_1,\cdots,j_r} \tilde{\Pi}^i([i, j_1, \cdots, j_r, i_k, i_{k+1}, \cdots, i_{k+s-1}])\, p_{i_{k-1} i_k}, & \text{if } i \notin \{i_1, \cdots, i_s\}, \\ \tilde{\Pi}^i([i, i_{k+1}, \cdots, i_{k+s-1}])\, p_{i_{k-1} i_k}, & \text{if } i = i_k \text{ for some } 1 \le k \le s, \end{cases} \\
&= \tilde{\Pi}^{i_1}([i_1, i_2, \cdots, i_s])\, p_{i_s i_1},
\end{align*}
where $j_1, \cdots, j_r \notin \{i_1, \cdots, i_s\}$ are distinct from one another and the last equality is the result of Theorem 1.2.7. Hence by Theorem 1.2.7 1) and Theorem 1.2.2 1), for $\mathbf{P}$-almost every $\omega$,
\begin{align*}
\lim_{n\to+\infty} \frac{w_{c,n}(\omega)}{n} &= \tilde{\Pi}^{i_1}([i_1, i_2, \cdots, i_s])\, p_{i_s i_1} \\
&= p_{i_1 i_2} p_{i_2 i_3} \cdots p_{i_{s-1} i_s} p_{i_s i_1}\, \pi_{i_1}\, g(i_2, i_2 \,|\, \{i_1\})\, g(i_3, i_3 \,|\, \{i_1, i_2\}) \cdots g(i_s, i_s \,|\, \{i_1, i_2, \cdots, i_{s-1}\}) \\
&= p_{i_1 i_2} p_{i_2 i_3} \cdots p_{i_{s-1} i_s} p_{i_s i_1} \cdot \frac{D(\{i_1, i_2, \cdots, i_s\}^c)}{\sum_{j\in S} D(\{j\}^c)} \quad \text{in the case } S \text{ is finite.}
\end{align*}
Then it follows immediately that $C_\infty(\omega)$ is independent of $\omega$ as well, and we denote it by $C_\infty$. $\Box$

We now introduce the following nomenclature.

Definition 1.3.4. The items occurring in Theorem 1.3.3 are named as follows: the family $\{w_{c,n}(\omega)/n : c\in C_\infty\}$ is called the circulation distribution on $\omega$ up to time $n$; $w_c$ is called the cycle skipping rate of $c$; and $\{w_c : c\in C_\infty\}$ is called the circulation distribution of $\xi$.

With the class of cycles $C_\infty$ and the circulation distribution $\{w_c : c\in C_\infty\}$ of $\xi$ specified by Theorem 1.3.3, we can now present the probabilistic cycle representation of the Markov chain $\xi$.

Theorem 1.3.5 (Probabilistic Cycle Representation). With assumptions as before, we have
$$\pi_i p_{ij} = \lim_{n\to+\infty}\sum_{c\in C_\infty}\frac{w_{c,n}(\omega)}{n}\,J_c(i,j) \ \text{a.s.} \ = \sum_{c\in C_\infty} w_c\,J_c(i,j), \qquad \forall i,j\in S. \tag{1.26}$$
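The almost-sure convergence in Theorem 1.3.3 and the representation (1.26) can be checked empirically: completed cycles are popped off a stack of distinct states, mimicking the derived chain of Sect. 1.2, and the resulting cycle counts are compared with the empirical edge frequencies. The following is a minimal simulation sketch; the 3-state chain, the seed, and all identifiers are our own illustrative choices, not the book's.

```python
import random
from collections import Counter

random.seed(0)

# an arbitrary irreducible 3-state chain (no self-loops)
P = {1: {2: 0.7, 3: 0.3}, 2: {3: 0.7, 1: 0.3}, 3: {1: 0.7, 2: 0.3}}

def step(i):
    u, acc = random.random(), 0.0
    for j, p in P[i].items():
        acc += p
        if u < acc:
            return j
    return j  # guard against float round-off

n = 200_000
state, stack = 1, [1]
cycle_counts = Counter()   # w_{c,n}: completed cycles, up to cyclic permutation
edge_counts = Counter()    # n * sigma_n(omega; i, j)

for _ in range(n):
    nxt = step(state)
    edge_counts[(state, nxt)] += 1
    if nxt in stack:                    # a cycle is completed: pop it off
        k = stack.index(nxt)
        cyc = tuple(stack[k:])
        m = cyc.index(min(cyc))         # canonical rotation of the cycle
        cycle_counts[cyc[m:] + cyc[:m]] += 1
        del stack[k + 1:]
    else:
        stack.append(nxt)
    state = nxt

def J(c, i, j):                         # passage function J_c(i, j)
    s = len(c)
    return any(c[k] == i and c[(k + 1) % s] == j for k in range(s))

# (1.26): empirical pi_i p_ij matches sum_c (w_{c,n}/n) J_c(i,j),
# up to the O(1/n) remainder left on the stack
for (i, j), cnt in edge_counts.items():
    rhs = sum(m / n for c, m in cycle_counts.items() if J(c, i, j))
    assert abs(cnt / n - rhs) < 1e-3, (i, j)
print("edge frequencies match the cycle decomposition")
```

The discrepancy between the two sides is exactly the handful of edges still sitting on the stack, which is the $\varepsilon_n$ remainder of (1.27) below divided by $n$.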
The representative class $(C_\infty, w_c)$ provided by Theorem 1.3.5 is called the probabilistic cycle (circuit) representation of $\xi$ and $P$, while $\xi$ is called a circuit chain. The term "probabilistic" is justified by the algorithm of Theorem 1.3.3, whose unique solution $\{w_c\}$ enjoys a probabilistic interpretation in terms of the sample paths of $\xi$. The terms in equation (1.26) have a natural interpretation via the sample paths of $\xi$, as follows. Consider the functions $\sigma_n(\cdot\,;i,j)$ defined by
$$\sigma_n(\omega;i,j) = \frac1n\,\mathrm{card}\{m : 0\le m<n,\ \xi_m(\omega)=i,\ \xi_{m+1}(\omega)=j\}$$
for any $i,j\in S$. Let $C_n(\omega)$ be, as in Theorem 1.3.3, the class of all cycles occurring up to time $n$ along the sample path $\{\xi_l(\omega)\}_{l\ge 0}$. Recall that a cycle $c=(i_1,\cdots,i_r)$, $r\ge 1$, occurs along a sample path if the chain passes through the states $i_1,i_2,\cdots,i_r,i_1$, or any cyclic permutation thereof. Notice that each sample transition $(\xi_{m-1}(\omega),\xi_m(\omega))$ occurring up to time $n$ is passed either by a cycle of $C_n(\omega)$ or by a circuit completed after time $n$ on the sample path $\{\xi_l(\omega)\}_{l\ge 0}$. Therefore, for $i,j\in S$ and $n>0$, we have
$$\sigma_n(\omega;i,j) = \sum_{c\in C_n(\omega)}\frac{w_{c,n}(\omega)}{n}\,J_c(i,j) + \frac{\varepsilon_n(\omega;i,j)}{n}, \tag{1.27}$$
where $\varepsilon_n(\omega;i,j)=0$ or $1$ according to whether or not the last step from $i$ to $j$ belongs to a completed cycle. With probability one the left-hand side converges to $\pi_i p_{ij}$ and each summand on the right-hand side converges to $w_c J_c(i,j)$; hence (1.26) follows.

From (1.26), we get that for any $i,j\in S$,
$$\pi_i p_{ij} - \pi_j p_{ji} = \sum_{c\in C_\infty}(w_c - w_{c-})\,J_c(i,j), \tag{1.28}$$
where $c-$ denotes the reversed cycle of $c$. That is to say, each one-step probability flux $\pi_i p_{ij}$ ($i,j\in S$, $i\ne j$) can be decomposed into two parts: one is the detailed-balance part $\min\{\pi_i p_{ij}, \pi_j p_{ji}\}$, i.e. the part of the two one-step probability fluxes between $i$ and $j$ that cancels; the other is the circulation-balance part, i.e. the net probability flux between $i$ and $j$, which is composed of a set of circulations on $C_\infty$ that pass through the edge $(i,j)$ or its reversal $(j,i)$. These circulations are just the cycle skipping rates $\{w_c : c\in C_\infty\}$. We call (1.28) the circulation decomposition of the stationary Markov chain $\xi$. In general the circulation decomposition is not unique: one can find another set of cycles $C$ and weights $\{\hat w_c : c\in C\}$ on these cycles which also satisfy (1.28). Using a diagram method, T. Hill [224, 226] proved that his cycle fluxes satisfy the equation of circulation decomposition (1.28); his concept of cycle flux is equivalent to our circulation rate defined in terms of trajectories. Hence Hill's choice of circulation decomposition is the distinguished one from the probabilistic point of view.

The probabilistic cycle representation expresses the relations between the edge coordinates $\pi_i p_{ij}$, $i,j\in S$, and the cycle coordinates $w_c$, $c\in C_\infty$, through the behavior of sample paths. Kalpazidou [257] presented another, deterministic algorithm yielding a deterministic cycle representation of the Markov chain $\xi$.

Conversely, denumerable Markov chains can be generated by weighted circuits [257]. For simplicity, we present only the case where $S$ is a finite set. Consider a finite collection $C$ of overlapping directed circuits in $S$. Suppose further that all points of $S$ can be reached from one another along circuit edges; that is, for any two distinct points $i$ and $j$ of $S$, there exists a sequence $c_1,\cdots,c_k$, $k\ge 1$, of circuits in $C$ such that $i$ lies on $c_1$, $j$ lies on $c_k$, and any pair of consecutive circuits $(c_n, c_{n+1})$ has at least one point in common. Associate a strictly positive number $\hat w_c$ with each $c\in C$. Since the numbers $\hat w_c$ must be independent of the choice of the representative of $c$ (according to Definition 1.1.2), suppose that they satisfy the consistency condition
$$\hat w_{c\circ t_k} = \hat w_c, \quad \forall k\in\mathbb Z.$$
Define
$$\hat w(i,j) = \sum_{c\in C}\hat w_c\,J_c(i,j), \quad \forall i,j\in S, \qquad \hat w(i) = \sum_{c\in C}\hat w_c\,J_c(i), \quad \forall i\in S.$$

Theorem 1.3.6. Under the above assumptions, there exists an irreducible $S$-state Markov chain on a suitable probability space with transition matrix $\hat P = (\hat p_{ij})_{i,j\in S}$, where
$$\hat p_{ij} = \frac{\hat w(i,j)}{\hat w(i)}.$$
We refer the reader to Kalpazidou [257] for more details about finite or denumerable Markov chains generated by weighted circuits.
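The construction of Theorem 1.3.6 is directly computable. Below is a small sketch with a toy collection of two overlapping circuits on $S=\{1,2,3\}$ (the circuits and weights are our own arbitrary choices). Since every circuit through a state $i$ leaves $i$ exactly once, the rows of $\hat P$ sum to one automatically.

```python
from fractions import Fraction

# two overlapping directed circuits with strictly positive weights;
# (1, 2) denotes the two-state circuit 1 -> 2 -> 1
circuits = {(1, 2, 3): Fraction(2), (1, 2): Fraction(1)}

S = sorted({i for c in circuits for i in c})

def J_edge(c, i, j):        # passage function J_c(i, j)
    s = len(c)
    return any(c[k] == i and c[(k + 1) % s] == j for k in range(s))

def J_point(c, i):          # passage function J_c(i)
    return i in c

w_edge = {(i, j): sum(w for c, w in circuits.items() if J_edge(c, i, j))
          for i in S for j in S}
w_point = {i: sum(w for c, w in circuits.items() if J_point(c, i)) for i in S}

# Theorem 1.3.6: p_hat_ij = w_hat(i, j) / w_hat(i)
P_hat = {(i, j): w_edge[(i, j)] / w_point[i] for i in S for j in S}

for i in S:                 # each row is a probability distribution
    assert sum(P_hat[(i, j)] for j in S) == 1
print({k: str(v) for k, v in P_hat.items() if v})
```

Exact rational arithmetic (`fractions.Fraction`) makes the row-sum check an identity rather than a floating-point approximation.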
1.4 Irreversibility and Entropy Production

Definition 1.4.1. The stationary Markov chain $\xi$ is said to be reversible if $(\xi_{m_1},\xi_{m_2},\cdots,\xi_{m_k})$ has the same distribution as $(\xi_{T-m_1},\xi_{T-m_2},\cdots,\xi_{T-m_k})$ for all $k\ge 1$, $m_1<m_2<\cdots<m_k$ and $T\in\mathbb Z$.

The best-known necessary and sufficient criterion for the chain $\xi$ to be reversible is given in terms of its transition probability matrix $P=(p_{ij})_{i,j\in S}$ and the invariant probability distribution $\Pi=(\pi_i)_{i\in S}$:
$$\pi_i p_{ij} = \pi_j p_{ji}, \quad \forall i,j\in S. \tag{1.29}$$
When the relations (1.29) hold, we say that $\xi$ is in detailed balance. Let us write the relations (1.29) for the edges $(i_1,i_2),(i_2,i_3),\cdots,(i_s,i_1)$ of an arbitrarily given directed circuit $c=(i_1,i_2,\cdots,i_s,i_1)$, $s>1$, with distinct points $i_1,\cdots,i_s$, occurring in the graph of $P$. Multiplying these equations together and cancelling the corresponding values of the invariant distribution $\Pi$, we obtain
$$p_{i_1 i_2}\,p_{i_2 i_3}\cdots p_{i_{s-1} i_s}\,p_{i_s i_1} = p_{i_1 i_s}\,p_{i_s i_{s-1}}\cdots p_{i_3 i_2}\,p_{i_2 i_1} \tag{1.30}$$
for any directed cycle $c=(i_1,\cdots,i_s)$. Equations (1.30) are known as Kolmogorov's criterion and provide a necessary and sufficient condition, in terms of the circuits, for the chain $\xi$ to be reversible. The sufficiency is shown below, after Theorem 1.4.7.

Now we introduce two measurable transformations on $(\Omega,\mathcal F)$. One is the time reversal transformation $r:(\Omega,\mathcal F)\to(\Omega,\mathcal F)$, $(r\omega)_n = \omega_{-n}$, $\forall n\in\mathbb Z$. The other is the left-shift operator $\theta:(\Omega,\mathcal F)\to(\Omega,\mathcal F)$, $(\theta\omega)_n = \omega_{n+1}$, $\forall n\in\mathbb Z$. Obviously, $r$ and $\theta$ are invertible with $r^{-1}=r$. Write $\xi_n^-(\omega) = \xi_n(r\omega)$, $\xi^- = \{\xi_n^- : n\in\mathbb Z\}$ and $P^- = rP$; then $\xi^-$ is the time-reversed process of $\xi$ and $P^-$ is the distribution of $\xi^-$. The chain $\xi$ is reversible if and only if $P=P^-$. Since $\xi$ is stationary, one has $\theta^n P = P$, which yields $\theta^n P^- = P^-$ because $r\theta = \theta^{-1} r$. One can easily prove the following result.

Proposition 1.4.2. $\xi^-$ is a stationary Markov chain on $(\Omega,\mathcal F,P)$ with transition probability matrix
$$P^- = (p^-_{ij})_{i,j\in S} = \Bigl(\frac{\pi_j p_{ji}}{\pi_i}\Bigr)_{i,j\in S}$$
and invariant probability distribution $\Pi^- = (\pi_i^-)_{i\in S} = \Pi$.

Now we discuss the relationship between reversibility and the entropy production rate of the stationary Markov chain $\xi$.

Definition 1.4.3. Suppose that $\mu$ and $\nu$ are two probability measures on a measurable space $(M,\mathcal A)$. The relative entropy of $\mu$ with respect to $\nu$ is defined as
$$H(\mu,\nu) \stackrel{\text{def}}{=} \begin{cases} \displaystyle\int_M \log\frac{d\mu}{d\nu}(x)\,\mu(dx), & \text{if } \mu\ll\nu \text{ and } \log\frac{d\mu}{d\nu}\in L^1(d\mu),\\[1mm] +\infty, & \text{otherwise.} \end{cases}$$
There is another, equivalent definition:
$$H(\mu,\nu) = \sup_{\phi\in B(\mathcal A)}\Bigl\{\int\phi\,d\mu - \log\int e^{\phi}\,d\nu\Bigr\}, \tag{1.31}$$
where $\phi$ ranges over all bounded $\mathcal A$-measurable functions. If $M$ is a Polish space and $\mathcal A$ is the Borel $\sigma$-field, then replacing $B(\mathcal A)$ by $C(M)$ gives the same supremum.

For any $m<n\in\mathbb Z$, let $\mathcal F_m^n = \sigma(\xi_k : m\le k\le n)$, $P_{[m,n]} = P|_{\mathcal F_m^n}$ and $P^-_{[m,n]} = P^-|_{\mathcal F_m^n}$.
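For discrete measures, both expressions for $H(\mu,\nu)$ in Definition 1.4.3 can be compared directly: the supremum in (1.31) is attained at $\phi = \log(d\mu/d\nu)$. A small sketch, with two strictly positive toy distributions of our own choosing:

```python
import math
import random

mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]   # strictly positive, so mu << nu

# first definition: H(mu, nu) = sum_x mu(x) log(mu(x) / nu(x))
H = sum(m * math.log(m / v) for m, v in zip(mu, nu))

def functional(phi):   # phi |-> int phi dmu - log int e^phi dnu, as in (1.31)
    return (sum(p * m for p, m in zip(phi, mu))
            - math.log(sum(math.exp(p) * v for p, v in zip(phi, nu))))

# the maximizing test function phi* = log(dmu/dnu) attains H ...
phi_star = [math.log(m / v) for m, v in zip(mu, nu)]
assert abs(functional(phi_star) - H) < 1e-12

# ... and no bounded phi exceeds it (variational formula (1.31))
random.seed(1)
for _ in range(1000):
    phi = [random.uniform(-5, 5) for _ in mu]
    assert functional(phi) <= H + 1e-12
print("H(mu, nu) =", H)
```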
Definition 1.4.4. The entropy production rate of the stationary Markov chain $\xi$ is defined by
$$e_p \stackrel{\text{def}}{=} \lim_{n\to+\infty}\frac1n\,H\bigl(P_{[0,n]}, P^-_{[0,n]}\bigr), \tag{1.32}$$
where $H(P_{[0,n]}, P^-_{[0,n]})$ is the relative entropy of $P$ with respect to $P^-$ restricted to the $\sigma$-field $\mathcal F_0^n$.

From the theorem below, we know that the limit in this definition exists.

Theorem 1.4.5. The entropy production rate $e_p$ of the stationary Markov chain $\xi$ can be expressed as
$$e_p = \frac12\sum_{i,j\in S}(\pi_i p_{ij} - \pi_j p_{ji})\log\frac{\pi_i p_{ij}}{\pi_j p_{ji}}. \tag{1.33}$$
To prove the theorem, we only need to consider the case where the transition probability matrix $P$ satisfies the condition
$$p_{ij} > 0 \iff p_{ji} > 0, \quad \forall i,j\in S, \tag{1.34}$$
since otherwise $P_{[0,n]}$ is not absolutely continuous with respect to $P^-_{[0,n]}$, so by the definition of relative entropy, $H(P_{[0,n]}, P^-_{[0,n]})$ is infinite for all $n\in\mathbb N$, and hence $e_p = +\infty$; meanwhile, no term on the right-hand side of (1.33) can be $-\infty$ and at least one of them is $+\infty$, so (1.33) holds.

Exploiting Proposition 1.4.2, one can easily check the following result.

Lemma 1.4.6. Under the condition (1.34), for all $m\in\mathbb Z$, $n\in\mathbb N$, $P_{[m,m+n]}$ and $P^-_{[m,m+n]}$ are absolutely continuous with respect to each other, and the Radon–Nikodym derivative is given by
$$\frac{dP_{[m,m+n]}}{dP^-_{[m,m+n]}}(\omega) = \frac{\pi_{\xi_m(\omega)}\,p_{\xi_m(\omega)\xi_{m+1}(\omega)}\cdots p_{\xi_{m+n-1}(\omega)\xi_{m+n}(\omega)}}{\pi_{\xi_{m+n}(\omega)}\,p_{\xi_{m+n}(\omega)\xi_{m+n-1}(\omega)}\cdots p_{\xi_{m+1}(\omega)\xi_m(\omega)}}, \quad P\text{-a.s.}$$

Proof of Theorem 1.4.5. Under the condition (1.34),
$$e_p = \lim_{n\to\infty}\frac1n\,H\bigl(P_{[0,n]}, P^-_{[0,n]}\bigr) = \lim_{n\to\infty}\frac1n\sum_{i_0,i_1,\cdots,i_n}\pi_{i_0} p_{i_0 i_1}\cdots p_{i_{n-1} i_n}\,\log\frac{\pi_{i_0} p_{i_0 i_1}\cdots p_{i_{n-1} i_n}}{\pi_{i_n} p_{i_n i_{n-1}}\cdots p_{i_1 i_0}}$$
$$= \lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\ \sum_{i_k,i_{k+1}\in S}\pi_{i_k} p_{i_k i_{k+1}}\log\frac{\pi_{i_k} p_{i_k i_{k+1}}}{\pi_{i_{k+1}} p_{i_{k+1} i_k}} = \sum_{i,j\in S}\pi_i p_{ij}\log\frac{\pi_i p_{ij}}{\pi_j p_{ji}} = \frac12\sum_{i,j\in S}(\pi_i p_{ij}-\pi_j p_{ji})\log\frac{\pi_i p_{ij}}{\pi_j p_{ji}}. \qquad\Box$$
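Formula (1.33) is straightforward to evaluate for a concrete chain. A sketch in Python (the chain, the helper name `entropy_production`, and the parameter values are our own illustrative choices); the matrix is doubly stochastic, so the invariant distribution is uniform:

```python
import math

def entropy_production(P, pi):
    """e_p = (1/2) sum_{i,j} (pi_i p_ij - pi_j p_ji) log(pi_i p_ij / pi_j p_ji),
    omitting the (zero) terms with p_ij = p_ji = 0, cf. condition (1.34)."""
    N = len(P)
    ep = 0.0
    for i in range(N):
        for j in range(N):
            if P[i][j] > 0 and P[j][i] > 0:
                flux_f, flux_b = pi[i] * P[i][j], pi[j] * P[j][i]
                ep += 0.5 * (flux_f - flux_b) * math.log(flux_f / flux_b)
    return ep

p, q = 0.8, 0.2
P = [[0, p, q], [q, 0, p], [p, q, 0]]     # doubly stochastic => pi uniform
pi = [1 / 3, 1 / 3, 1 / 3]

ep = entropy_production(P, pi)
# for this chain, e_p = (p - q) log(p / q); cf. Example 1.4.9
assert abs(ep - (p - q) * math.log(p / q)) < 1e-12
# a chain in detailed balance has vanishing entropy production
assert abs(entropy_production([[0, .5, .5], [.5, 0, .5], [.5, .5, 0]], pi)) < 1e-12
print("e_p =", ep)
```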
From the expression (1.33), one easily sees that the entropy production rate $e_p$ of the chain $\xi$ vanishes if and only if the chain is in detailed balance. Since the Markov chain $\xi$ can be represented by the circulation distribution $\{w_c : c\in C_\infty\}$, it is interesting to express the entropy production rate $e_p$ in terms of the circuits and their weights, besides the expression (1.33) in terms of the edge weights $\pi_i p_{ij}$, $i,j\in S$. Such an expression is due to Min-Ping Qian and Min Qian [400].

Theorem 1.4.7. The entropy production rate of the stationary Markov chain $\xi$ can be expressed in terms of the circulation distribution $\{w_c : c\in C_\infty\}$:
$$e_p = \frac12\sum_{c\in C_\infty}(w_c - w_{c-})\log\frac{w_c}{w_{c-}}, \tag{1.35}$$
where $C_\infty$ is the collection of directed cycles occurring along almost all the sample paths and $c-$ denotes the reversed cycle of $c$.

Proof. By (1.33), Theorem 1.3.3 and Theorem 1.3.5, writing $c=(i_1,\cdots,i_s)$ with indices taken modulo $s$, one gets
$$e_p = \frac12\sum_{i,j}\ \sum_{c\in C_\infty:\,J_c(i,j)=1}(w_c - w_{c-})\log\frac{\pi_i p_{ij}}{\pi_j p_{ji}} = \frac12\sum_{c\in C_\infty}(w_c - w_{c-})\sum_{k=1}^{s}\log\frac{\pi_{i_k} p_{i_k i_{k+1}}}{\pi_{i_{k+1}} p_{i_{k+1} i_k}} = \frac12\sum_{c\in C_\infty}(w_c - w_{c-})\log\frac{w_c}{w_{c-}}. \qquad\Box$$
In fact, the term $(w_c - w_{c-})\log(w_c/w_{c-})$ describes the deviation from symmetry along the directed cycle $c$, while the entropy production rate $e_p$ is the total deviation from symmetry along the cycles occurring on the sample paths. Accordingly, one easily gets the following criterion: the Markov chain $\xi$ is reversible if and only if the components $w_c$, $c\in C_\infty$, of the circulation distribution of $\xi$ satisfy the symmetry condition
$$w_c = w_{c-}, \quad \forall c\in C_\infty. \tag{1.36}$$
By Theorem 1.3.3, this condition is equivalent to Kolmogorov's criterion (1.30). As is well known, the Markov chain $\xi$ is reversible if and only if it is in detailed balance, i.e. $\pi_i p_{ij} = \pi_j p_{ji}$, $\forall i,j\in S$. If this condition of detailed balance is satisfied, then (1.36) follows from Theorem 1.3.3 and (1.30). Conversely, if (1.36) holds true, then by Theorem 1.3.5, the Markov chain $\xi$ is in detailed balance. According to the definition of $e_p$, $e_p$ is the information difference between the distribution of $\xi$ and that of its time reversal. Therefore, Theorem 1.4.7 tells us that time reversibility coincides with symmetry along cycles. The analogues of the relations (1.36) for biophysical phenomena are given by T. Hill [224] using a diagram method, where his concepts of cycle flux and detailed balance correspond respectively to the circulation distribution and reversibility of Markov chains.

Our results about the reversibility of Markov chains with discrete time parameter can be summarized in the following theorem.

Theorem 1.4.8. Suppose that $\xi$ is an irreducible, positive recurrent and stationary Markov chain with a denumerable state space $S$, a transition matrix $P=(p_{ij})_{i,j\in S}$ and a unique invariant probability distribution $\Pi=(\pi_i)_{i\in S}$, and let $\{w_c : c\in C_\infty\}$ be the circulation distribution of $\xi$. Then the following statements are equivalent:
1) The Markov chain $\xi$ is reversible.
2) The Markov chain $\xi$ is in detailed balance, that is, $\pi_i p_{ij} = \pi_j p_{ji}$, $\forall i,j\in S$.
3) The transition matrix $P$ of $\xi$ satisfies the Kolmogorov cyclic condition:
$$p_{i_1 i_2}\,p_{i_2 i_3}\cdots p_{i_{s-1} i_s}\,p_{i_s i_1} = p_{i_1 i_s}\,p_{i_s i_{s-1}}\cdots p_{i_3 i_2}\,p_{i_2 i_1}$$
for any directed cycle $c=(i_1,\cdots,i_s)$.
4) The components of the circulation distribution of $\xi$ satisfy the symmetry condition $w_c = w_{c-}$, $\forall c\in C_\infty$.
5) The entropy production rate $e_p = 0$.

Example 1.4.9. Consider the simplest nontrivial case: the state space of the stationary Markov chain $\xi$ is $S=\{1,2,3\}$ and its transition matrix is
$$P = \begin{pmatrix} 0 & p & q\\ q & 0 & p\\ p & q & 0 \end{pmatrix},$$
where $p>0$, $q>0$ and $p+q=1$. The invariant initial distribution of $\xi$ is $\Pi = (\tfrac13,\tfrac13,\tfrac13)$. The directed cycles occurring along almost all the paths of $\xi$ constitute
$$C_\infty = \{(1,2,3),\ (3,2,1),\ (1,2),\ (2,3),\ (3,1)\}.$$
Note that $(1,2)=(2,1)$, and so on. By Theorem 1.3.3, the cycle skipping rates can be expressed as
$$w_{(1,2,3)} = \frac{p^3}{3(1-pq)}, \qquad w_{(3,2,1)} = \frac{q^3}{3(1-pq)}, \qquad w_{(1,2)} = w_{(2,3)} = w_{(3,1)} = \frac{pq}{3(1-pq)}.$$
The entropy production rate of $\xi$ is given by
$$e_p = \bigl(w_{(1,2,3)} - w_{(3,2,1)}\bigr)\log\frac{w_{(1,2,3)}}{w_{(3,2,1)}} = (p-q)\log\frac{p}{q}.$$
The Markov chain $\xi$ is reversible if and only if $p=q=\frac12$, or equivalently, if and only if its entropy production rate $e_p$ vanishes.

For a system which may be described by a Markov chain model, the movement of the system is a process in which it continuously completes the possible cycles of its state space, including loops containing only one state and the so-called "back and forth" cycles containing only two states. When the system is in a nonequilibrium steady state, there exists at least one cycle, containing at least three states, around which the circulation rates in one direction and in the opposite direction are asymmetric (unequal), so as to cause a net circulation on the cycle. It is the existence of these net circulations that gives rise to such macroscopic quantities as entropy production or free energy dissipation. Two characteristics of the system should be given to describe its average properties in the steady state: one concerns the state of the system, namely the probability distribution; the other concerns the cycling process performed by the system, which is in fact what keeps the system in balance, and this characteristic is the circulation distribution.

Remark 1.4.10. In the case that the stationary irreducible Markov chain $\xi$ on $(\Omega,\mathcal F,P)$ has a finite state space $S$, based on the probabilistic cycle representation $(C_\infty, w_c)$ of $\xi$, Kalpazidou [257, Part I, Chap. 4, Sects. 4,5] developed a homologic representation $(\Gamma, \tilde w_\gamma)$ of $\xi$, where $\Gamma = \{\gamma_1,\cdots,\gamma_B\}$ is a base of "Betti circuits" (if it exists) in the real vector space $\tilde C_1$ of all one-cycles associated with the oriented graph $G(P)$ of the transition matrix $P$ of $\xi$, and the homologic circulation weights
$$\tilde w_{\gamma_k} = \sum_{c\in C_\infty} A(c,\gamma_k)\,w_c,$$
which can be negative, have coefficients $A(c,\gamma_k)\in\mathbb Z$ arising from the linear combination
$$c = \sum_{k=1}^{B} A(c,\gamma_k)\,\gamma_k$$
in $\tilde C_1$. (If the condition (1.34) is satisfied, then there always exists a base of Betti circuits in $\tilde C_1$.) For each $n\in\mathbb N$, the family of occurrence numbers $w_{c,n}(\omega)$ of cycles along the sample path $\omega$ up to time $n$ determines a one-cycle in $\tilde C_1$,
$$c(n,\omega) = \sum_{c\in C_\infty} w_{c,n}(\omega)\,c = \sum_{k=1}^{B}\Bigl(\sum_{c\in C_\infty} A(c,\gamma_k)\,w_{c,n}(\omega)\Bigr)\gamma_k.$$
For $k=1,\cdots,B$, write
$$N_k(n,\omega) = \sum_{c\in C_\infty} A(c,\gamma_k)\,w_{c,n}(\omega).$$
Then $P$-almost surely,
$$\lim_{n\to+\infty}\frac{N_k(n,\omega)}{n} = \sum_{c\in C_\infty} A(c,\gamma_k)\,w_c = \tilde w_{\gamma_k}. \tag{1.37}$$
Exploiting the fact that
$$\pi_i p_{ij} = \sum_{k=1}^{B}\tilde w_{\gamma_k}\,J_{\gamma_k}(i,j), \quad \forall i,j\in S,$$
one can easily verify that the entropy production rate $e_p$ of $\xi$ can be expressed as a linear sum of the homologic circulation weights $\tilde w_{\gamma_k}$; moreover, $\xi$ is reversible if and only if the circulation weights $\tilde w_{\gamma_k}$ all vanish.
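Returning to Example 1.4.9, the closed-form cycle weights can be checked for consistency between the two entropy production formulas, the edge form (1.33) and the cycle form (1.35). A short sketch (the parameter value is our own choice):

```python
import math

p = 0.7
q = 1 - p

# closed-form cycle skipping rates from Example 1.4.9
w123 = p**3 / (3 * (1 - p * q))
w321 = q**3 / (3 * (1 - p * q))

# (1.35): the two-state cycles coincide with their reversals, so only the
# three-state cycle (1,2,3) and its reversal (3,2,1) contribute
ep_cycles = 0.5 * ((w123 - w321) * math.log(w123 / w321)
                   + (w321 - w123) * math.log(w321 / w123))

# (1.33) with the uniform invariant distribution pi_i = 1/3
P = [[0, p, q], [q, 0, p], [p, q, 0]]
ep_edges = 0.0
for i in range(3):
    for j in range(3):
        if P[i][j] > 0:
            ep_edges += (0.5 / 3) * (P[i][j] - P[j][i]) * math.log(P[i][j] / P[j][i])

assert abs(ep_cycles - ep_edges) < 1e-12
print("e_p =", ep_cycles)   # equals (p - q) log(p / q)
```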
1.5 Large Deviations and Fluctuation Theorem

In this section we discuss the large deviation property of the distributions of the sample entropy production rates (i.e. the logarithm of the Radon–Nikodym derivative of the distribution of the Markov chain $\xi$ with respect to that of its time reversal over a time interval $[0,n]$, $n\in\mathbb N$). We then derive the fluctuation theorem: the large deviation rate function has a symmetry of Gallavotti–Cohen type. For simplicity, in this section we assume that the state space $S$ is finite (i.e. $S=\{1,2,\cdots,N\}$) and, moreover, that the transition matrix $P$ satisfies the condition (1.34).

First, we recall the definition of the large deviation property. Let $\mathcal X$ be a complete separable metric space, $\mathcal B(\mathcal X)$ the Borel $\sigma$-field of $\mathcal X$, and $\{\mu_t : t\in\mathbb T\}$ a family of probability measures on $\mathcal B(\mathcal X)$ with index set $\mathbb T = \mathbb N$ or $\mathbb R^+$.

Definition 1.5.1. $\{\mu_t : t\in\mathbb T\}$ is said to have a large deviation property if there exist a family of positive numbers $\{a_t : t\in\mathbb T\}$ tending to $+\infty$ and a function $I(x)$ mapping $\mathcal X$ into $[0,+\infty]$ satisfying the following conditions:
1) $I(x)$ is lower semicontinuous on $\mathcal X$;
2) for each $l<+\infty$, the level set $\{x : I(x)\le l\}$ is a compact set in $\mathcal X$;
3) $\limsup_{t\to+\infty} a_t^{-1}\log\mu_t(K) \le -\inf_{x\in K} I(x)$ for each closed set $K\subset\mathcal X$;
4) $\liminf_{t\to+\infty} a_t^{-1}\log\mu_t(G) \ge -\inf_{x\in G} I(x)$ for each open set $G\subset\mathcal X$.
$I(x)$ is called a rate function (or an entropy function) of $\{\mu_t : t\in\mathbb T\}$.

We note several consequences of the definition. The infimum of $I(x)$ over $\mathcal X$ equals $0$; this follows from the upper and lower large deviation bounds 3)–4) with $K=G=\mathcal X$. It follows from hypotheses 1) and 2) that $I(x)$ attains its infimum over any nonempty closed set (the infimum may be $+\infty$); see Ellis [116, page 34] for the argument. According to Theorem II.3.2 of [116], if a large deviation property holds, then the entropy function is unique.

Now we state, in a simplified form which we will use later, a large deviation result for dependent random variables from [116]. Let $W = \{W_t : t\in\mathbb T\}$ be a family of random variables defined on probability spaces $\{(\Omega_t,\mathcal F_t,P_t) : t\in\mathbb T\}$, and let $\{a_t : t\in\mathbb T\}$ be a family of positive real numbers tending to infinity. Define the functions
$$c_t(\lambda) = \frac{1}{a_t}\log E_t\,e^{\lambda W_t}, \quad \forall t\in\mathbb T,\ \lambda\in\mathbb R,$$
where $E_t$ denotes expectation with respect to $P_t$. The following hypotheses are assumed to hold:
(a) each function $c_t(\lambda)$ is finite for all $\lambda\in\mathbb R$;
(b) $c(\lambda) = \lim_{t\to+\infty} c_t(\lambda)$ exists and is finite for all $\lambda\in\mathbb R$.
As pointed out by Ellis [116], hypothesis (b) is natural for statistical mechanical applications, since $c(\lambda)$ is closely related to the concept of free energy. We call $c(\lambda)$ the free energy function of $W$.

Theorem 1.5.2. Assume that hypotheses (a) and (b) hold, and let $\mu_t$ be the distribution of $W_t/a_t$ on $\mathbb R$. Then the following conclusions hold:
1) The Legendre–Fenchel transform
$$I(z) = \sup_{\lambda\in\mathbb R}\{\lambda z - c(\lambda)\}$$
of $c(\lambda)$ is convex, lower semicontinuous, and non-negative; $I(z)$ has compact level sets and $\inf_{z\in\mathbb R} I(z) = 0$.
2) The upper large deviation bound is valid: for each closed set $K\subset\mathbb R$,
$$\limsup_{t\to+\infty}\frac{1}{a_t}\log\mu_t(K) \le -\inf_{z\in K} I(z).$$
3) Assume in addition that $c(\lambda)$ is differentiable for all $\lambda$; then the lower large deviation bound is valid: for each open set $G\subset\mathbb R$,
$$\liminf_{t\to+\infty}\frac{1}{a_t}\log\mu_t(G) \ge -\inf_{z\in G} I(z).$$
Hence, if $c(\lambda)$ is differentiable for all $\lambda$, then $\{\mu_t : t\in\mathbb T\}$ has a large deviation property with entropy function $I$.

Theorem 1.5.3. Assume that hypotheses (a) and (b) hold. Then the following statements are equivalent:
1) $W_t/a_t$ converges exponentially to a constant $z_0$; that is, for any $\varepsilon>0$, there exist positive numbers $C$ and $M$ such that for all $t\ge M$,
$$P_t\Bigl(\Bigl|\frac{W_t}{a_t} - z_0\Bigr| \ge \varepsilon\Bigr) \le e^{-a_t C};$$
2) $c(\lambda)$ is differentiable at $\lambda=0$ and $c'(0) = z_0$;
3) $I(z)$ attains its infimum on $\mathbb R$ at the unique point $z = z_0$.

For the case $\mathbb T=\mathbb N$, Theorems 1.5.2 and 1.5.3 are respectively Theorem II.6.1 and Theorem II.6.3 in Ellis [116] (see also [85]). If the random variables $\{W_n : n\in\mathbb N\}$ are all defined on the same space, then exponential convergence implies almost sure convergence, provided $\sum_{n=1}^{+\infty}\exp(-a_n C)$ is finite for all $C>0$; this extra condition is needed in order to apply the Borel–Cantelli lemma. For the case $\mathbb T=\mathbb R^+$, one can prove Theorems 1.5.2 and 1.5.3 along the lines of the proof for the discrete parameter case given by Ellis [116]. In the next chapter we will apply Theorems 1.5.2 and 1.5.3 in the continuous parameter case.

Now we discuss the large deviation property of the distributions of a sequence of special random variables. Recall that $(\Omega,\mathcal F,P)$ is the canonical orbit space of the stationary, irreducible and positive recurrent Markov chain $\xi$. For each $n\in\mathbb N$, take $(\Omega_n,\mathcal F_n,P_n) = (\Omega,\mathcal F,P)$, $a_n = n$, and write
$$W_n(\omega) = \log\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\omega), \quad \forall\omega\in\Omega,$$
so that
$$c_n(\lambda) = \frac1n\log E\,e^{\lambda W_n}.$$
As $W_n$ takes only finitely many values and $e^{\lambda W_n} > 0$, $P$-a.s., hypothesis (a) of Theorem 1.5.2 holds. We will prove that hypothesis (b) also holds and, moreover, that the free energy function of $W = \{W_n : n\in\mathbb N\}$ is differentiable; hence the distributions of $\{W_n/n : n\in\mathbb N\}$ have a large deviation property. To do so, we need the following well-known Perron–Frobenius theorem [49, 239, 325, 509].

Theorem 1.5.4. Let $A = (a_{ij})$ be a non-negative $k\times k$ matrix. Then the following statements hold:
1) There is a non-negative eigenvalue $\rho$ such that no eigenvalue of $A$ has absolute value greater than $\rho$, i.e. $\rho$ is equal to the spectral radius $\sigma(A)$ of $A$.
2) Corresponding to the eigenvalue $\rho$, there are a non-negative left eigenvector $(u_1,\cdots,u_k)$ and a non-negative right eigenvector $(v_1,\cdots,v_k)^T$.
3) If $A$ is irreducible, then $\rho$ is a simple eigenvalue and the corresponding eigenvectors are strictly positive (i.e. $u_i>0$, $v_i>0$, $\forall i$).
4) If $A$ is irreducible, then $\rho$ is the only eigenvalue of $A$ with a non-negative eigenvector.

In Theorem 1.5.2, the differentiability of the free energy function $c(\lambda)$ is needed to get the lower large deviation bound, so we present a result about the differentiability of simple eigenvalues, whose proof can be found in Shu-Fang Xu [509].
Theorem 1.5.5. Suppose that $A(\lambda)$ is a $k\times k$ real matrix, differentiable in some neighborhood $U$ of the origin of $\mathbb R$, and that $\rho$ is a real simple eigenvalue of $A(0)$ with corresponding unit right eigenvector $\alpha\in\mathbb R^k$. Then there exist a real function $\rho(\lambda)$ and a real unit vector $\alpha(\lambda)\in\mathbb R^k$, defined and differentiable in a neighborhood $U_0\subset U$ of the origin of $\mathbb R$, such that $\rho(0)=\rho$, $\alpha(0)=\alpha$ and
$$A(\lambda)\,\alpha(\lambda) = \rho(\lambda)\,\alpha(\lambda), \quad \forall\lambda\in U_0.$$

Remark 1.5.6. When $A(\lambda)$ is an irreducible non-negative $k\times k$ matrix, from Theorem 1.5.4 we know that $\rho = \sigma(A(0))$ is a simple eigenvalue of $A(0)$ with corresponding right eigenvector $\alpha>0$. From Theorem 1.5.5 we know that $\alpha(\lambda)$ is differentiable, hence $\alpha(\lambda)>0$ in a neighborhood of $\lambda=0$; then by Theorem 1.5.4, $\rho(\lambda) = \sigma(A(\lambda))$. Therefore $\sigma(A(\lambda))$ is differentiable at $\lambda=0$.

Theorem 1.5.7. There exists a real differentiable function $c(\lambda)$ such that
$$\lim_{n\to+\infty} c_n(\lambda) = c(\lambda), \quad \forall\lambda\in\mathbb R.$$
Consequently, the family of the distributions of $\{W_n/n : n\in\mathbb N\}$ has a large deviation property with entropy function $I(z) = \sup_{\lambda\in\mathbb R}\{\lambda z - c(\lambda)\}$.
Proof. By the definition of $W_n(\omega)$ and Lemma 1.4.6, we have
$$E\,e^{\lambda W_n} = E\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\omega)\right)^{\lambda} = E\left(\frac{\pi_{\xi_0}\,p_{\xi_0\xi_1}\cdots p_{\xi_{n-1}\xi_n}}{\pi_{\xi_n}\,p_{\xi_1\xi_0}\cdots p_{\xi_n\xi_{n-1}}}\right)^{\lambda}$$
$$= \sum_{\substack{i_0,i_1,\cdots,i_n:\\ p_{i_0 i_1}\cdots p_{i_{n-1} i_n}>0}}\pi_{i_0} p_{i_0 i_1}\cdots p_{i_{n-1} i_n}\left(\frac{\pi_{i_0} p_{i_0 i_1}\cdots p_{i_{n-1} i_n}}{\pi_{i_n} p_{i_1 i_0}\cdots p_{i_n i_{n-1}}}\right)^{\lambda}$$
$$= \sum_{\substack{i_0,i_1,\cdots,i_n:\\ p_{i_0 i_1}\cdots p_{i_{n-1} i_n}>0}}\pi_{i_0}\,p_{i_0 i_1}\left(\frac{\pi_{i_0} p_{i_0 i_1}}{\pi_{i_1} p_{i_1 i_0}}\right)^{\lambda}\cdots\, p_{i_{n-1} i_n}\left(\frac{\pi_{i_{n-1}} p_{i_{n-1} i_n}}{\pi_{i_n} p_{i_n i_{n-1}}}\right)^{\lambda}.$$
For any $i,j\in S$ and $\lambda\in\mathbb R$, put
$$a_{ij}(\lambda) = \begin{cases} p_{ij}\left(\dfrac{\pi_i p_{ij}}{\pi_j p_{ji}}\right)^{\lambda}, & \text{if } p_{ij}>0,\\[1mm] 0, & \text{if } p_{ij}=0. \end{cases}$$
It is obvious that $p_{ij}>0 \Leftrightarrow a_{ij}(\lambda)>0$; hence $A(\lambda) = (a_{ij}(\lambda))_{i,j\in S}$ is an irreducible non-negative matrix. By the Perron–Frobenius theorem, the spectral radius $e(\lambda)$ of $A(\lambda)$ is a positive eigenvalue of $A(\lambda)$ with one-dimensional eigenspace $\{k\alpha : k\in\mathbb R\}$, where $\alpha = (\alpha_1,\alpha_2,\cdots,\alpha_N)^T$ satisfies $\alpha_i>0$ for each $i\in S$. Denote $\alpha_{\min} = \min_i\alpha_i$, $\alpha_{\max} = \max_i\alpha_i$. Then for any given $\lambda$,
$$\alpha_{\max}^{-1}\,\Pi A(\lambda)^n\alpha \ \le\ E\,e^{\lambda W_n} = \Pi A(\lambda)^n\mathbf 1 \ \le\ \alpha_{\min}^{-1}\,\Pi A(\lambda)^n\alpha,$$
where $\Pi = (\pi_1,\pi_2,\cdots,\pi_N)$ and $\mathbf 1 = (1,\cdots,1)^T$. Hence
$$\lim_{n\to+\infty}\frac1n\log E\,e^{\lambda W_n} = \lim_{n\to+\infty}\frac1n\log\Pi A(\lambda)^n\alpha = \log e(\lambda).$$
By Remark 1.5.6, $e(\lambda)$ is differentiable, since it is a simple eigenvalue of the differentiable matrix $A(\lambda)$. The desired large deviation result then follows from Theorem 1.5.2. $\Box$

Now we present a symmetry of the entropy function $I$: the fluctuation theorem of Gallavotti–Cohen type.

Theorem 1.5.8 (Fluctuation Theorem). The free energy function $c(\cdot)$ and the large deviation rate function $I(\cdot)$ of $W = \{W_n : n\in\mathbb Z^+\}$ have the following properties:
$$c(\lambda) = c(-(1+\lambda)), \quad \forall\lambda\in\mathbb R; \qquad I(z) = I(-z) - z, \quad \forall z\in\mathbb R.$$

Proof. Recall that $r$ is the time reversal transformation on $\Omega$. As $rP^- = P$, we have
$$E\,e^{\lambda W_n} = \int\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\omega)\right)^{\lambda} dP(\omega) = \int\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\omega)\right)^{\lambda} d(rP^-)(\omega) = \int\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(r\omega)\right)^{\lambda} dP^-(\omega).$$
By Lemma 1.4.6, for $P$-almost every $\omega$,
$$\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(r\omega) = \frac{\pi_{\xi_0(r\omega)}\,p_{\xi_0(r\omega)\xi_1(r\omega)}\cdots p_{\xi_{n-1}(r\omega)\xi_n(r\omega)}}{\pi_{\xi_n(r\omega)}\,p_{\xi_1(r\omega)\xi_0(r\omega)}\cdots p_{\xi_n(r\omega)\xi_{n-1}(r\omega)}} = \frac{\pi_{\xi_0(\omega)}\,p_{\xi_0(\omega)\xi_{-1}(\omega)}\cdots p_{\xi_{-(n-1)}(\omega)\xi_{-n}(\omega)}}{\pi_{\xi_{-n}(\omega)}\,p_{\xi_{-1}(\omega)\xi_0(\omega)}\cdots p_{\xi_{-n}(\omega)\xi_{-(n-1)}(\omega)}}$$
$$= \frac{\pi_{\xi_n(\theta^{-n}\omega)}\,p_{\xi_n(\theta^{-n}\omega)\xi_{n-1}(\theta^{-n}\omega)}\cdots p_{\xi_1(\theta^{-n}\omega)\xi_0(\theta^{-n}\omega)}}{\pi_{\xi_0(\theta^{-n}\omega)}\,p_{\xi_{n-1}(\theta^{-n}\omega)\xi_n(\theta^{-n}\omega)}\cdots p_{\xi_0(\theta^{-n}\omega)\xi_1(\theta^{-n}\omega)}} = \left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\theta^{-n}\omega)\right)^{-1}. \tag{1.38}$$
Then it follows from $\theta P^- = P^-$ that
$$E\,e^{\lambda W_n} = \int\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\theta^{-n}\omega)\right)^{-\lambda} dP^-(\omega) = \int\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\omega)\right)^{-\lambda} dP^-(\omega)$$
$$= \int\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\omega)\right)^{-\lambda}\frac{dP^-_{[0,n]}}{dP_{[0,n]}}(\omega)\,dP(\omega) = \int\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\omega)\right)^{-(1+\lambda)} dP(\omega) = E\,e^{-(1+\lambda)W_n},$$
that is to say, $c_n(\lambda) = c_n(-(1+\lambda))$, which yields $c(\lambda) = c(-(1+\lambda))$. Hence, for any $z\in\mathbb R$,
$$I(z) = \sup_{\lambda\in\mathbb R}\{\lambda z - c(\lambda)\} = \sup_{\lambda\in\mathbb R}\{\lambda z - c(-(1+\lambda))\} = \sup_{\lambda\in\mathbb R}\{-(1+\lambda)z - c(\lambda)\} = \sup_{\lambda\in\mathbb R}\{\lambda\cdot(-z) - c(\lambda)\} - z = I(-z) - z. \qquad\Box$$

By Theorem 1.5.9 below, we can regard
$$\frac{W_n(\omega)}{n} = \frac1n\log\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\omega)$$
as the time-averaged entropy production rate of the sample trajectory $\omega$ of the stochastic system modelled by the Markov chain $\xi$. Roughly speaking, the fluctuation theorem gives a formula for the ratio of the probability that the sample entropy production rate $W_n/n$ takes a value $z$ to the probability that it takes the value $-z$, and this ratio is roughly $e^{nz}$. In fact, by (1.38), for any $n>0$ and $z\in\mathbb R$, it holds that
$$P\Bigl(\frac{W_n}{n}=z\Bigr) = P\Bigl(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}=e^{nz}\Bigr) = P_{[0,n]}\Bigl(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}=e^{nz}\Bigr) = e^{nz}\,P^-_{[0,n]}\Bigl(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}=e^{nz}\Bigr)$$
$$= e^{nz}\,P_{[0,n]}\Bigl(\frac{dP^-_{[0,n]}}{dP_{[0,n]}}=e^{nz}\Bigr) = e^{nz}\,P\Bigl(\frac{W_n}{n}=-z\Bigr).$$
Since $S$ is finite, $W_n/n$ takes only finitely many values, and both sides of the above equality may simultaneously be equal to zero. When division is legitimate, however, the equality can be written as
$$\frac{P(W_n/n = z)}{P(W_n/n = -z)} = e^{nz}.$$
If the Markov chain $\xi$ is reversible (i.e. in detailed balance), then $I(0)=0$ and $I(z)=+\infty$ for all $z\ne 0$, so in this case the fluctuation theorem gives a trivial result. However, if the Markov chain $\xi$ is not reversible, then for $z>0$ in a certain range, the sample entropy production rate $W_n/n$ has a positive probability of taking the value $z$ as well as the value $-z$, but the fluctuation theorem tells us that the former probability is greater, in accordance with the second law of thermodynamics.

As the free energy function $c(\lambda)$ of $W = \{W_n : n\in\mathbb N\}$ is differentiable at $\lambda=0$, by Theorem 1.5.3, $W_n/n$ converges exponentially to the constant $c'(0)$. For each constant $C>0$, $\sum_{n=1}^{+\infty}\exp(-nC)$ is finite, so by the remark after Theorem 1.5.3, $W_n/n$ converges almost surely to $c'(0)$. We will now calculate the almost sure limit of $W_n/n$ directly and see that $c'(0)$ equals the entropy production rate $e_p$ of the stationary Markov chain $\xi$.

Theorem 1.5.9. For $P$-almost every $\omega\in\Omega$,
$$\lim_{n\to+\infty}\frac{W_n(\omega)}{n} = \frac12\sum_{c\in C_\infty}(w_c - w_{c-})\log\frac{w_c}{w_{c-}} = e_p, \tag{1.39}$$
where {wc : c ∈ C∞ } is the circulation distribution of the stationary Markov chain ξ determined by Theorem 1.3.3 and c− denotes the reversed cycle of c. Proof. For each trajectory ω of the Markov chain ξ, in Sect. 1.2 we defined the derived chain {ηn (ω)}n≥0 . Recall that if the length ln+1 (ω) of ηn+1 (ω) is less than the length ln (ω) of ηn (ω), then ω completes a cycle at time n + 1; if ln+1 (ω) = ln (ω), then ξn+1 (ω) = ξn (ω). We define inductively a sequence of random variables {fn (ω) : n ≥ 0} as below: def
1) f0 (ω) = 1; 2) For each n ≥ 0, pξ (ω)ξn+1 (ω) fn (ω) pξn (ω)ξ , if ln+1 (ω) ≥ ln (ω), def n (ω) n+1 fn+1 (ω) = −1 fn (ω) pi1 i2 ···pis−1 is pis i1 , if ηn (ω) = [ηn+1 (ω), [i1 , · · · , is ]]. pi i ···pi i pi i s s−1
2 1
1 s
From the definition of fn (ω), if ηn (ω) = [i1 , i2 , · · · , il ], then
$$f_n(\omega) = \frac{p_{i_1 i_2}\cdots p_{i_{l-1} i_l}}{p_{i_l i_{l-1}}\cdots p_{i_2 i_1}}.$$
By Lemma 1.4.6 and Theorem 1.3.3, we have
$$e^{W_n(\omega)} = \frac{\pi_{\xi_0(\omega)}\,p_{\xi_0(\omega)\xi_1(\omega)}\cdots p_{\xi_{n-1}(\omega)\xi_n(\omega)}}{\pi_{\xi_n(\omega)}\,p_{\xi_n(\omega)\xi_{n-1}(\omega)}\cdots p_{\xi_1(\omega)\xi_0(\omega)}} = \frac{\pi_{\xi_0(\omega)}}{\pi_{\xi_n(\omega)}}\prod_{c\in C_\infty}\Bigl(\frac{w_c}{w_{c-}}\Bigr)^{w_{c,n}(\omega)} f_n(\omega),$$
and
$$\frac{W_n(\omega)}{n} = \sum_{c\in C_\infty}\frac{w_{c,n}(\omega)}{n}\log\frac{w_c}{w_{c-}} + \frac1n\log\frac{\pi_{\xi_0(\omega)}}{\pi_{\xi_n(\omega)}} + \frac1n\log f_n(\omega).$$
Since the state space $S$ of $\xi$ is finite, the state space of the derived chain $\{\eta_n\}_{n\ge 0}$ is finite; hence $f_n(\omega)$ takes only finitely many positive values. Then, by Theorem 1.3.3, for $P$-almost every $\omega$,
$$\lim_{n\to+\infty}\frac{W_n(\omega)}{n} = \lim_{n\to+\infty}\sum_{c\in C_\infty}\frac{w_{c,n}(\omega)}{n}\log\frac{w_c}{w_{c-}} = \sum_{c\in C_\infty} w_c\log\frac{w_c}{w_{c-}} = \frac12\sum_{c\in C_\infty}(w_c - w_{c-})\log\frac{w_c}{w_{c-}} = e_p. \qquad\Box$$

Now we discuss the fluctuations of general observables. Let $\varphi : S\to\mathbb R$ be an observable and $\Phi_n(\omega) = \sum_{k=0}^{n}\varphi(\xi_k(\omega)) = \sum_{k=0}^{n}\varphi(\xi_0(\theta^k\omega))$. Clearly, $\Phi_n$ satisfies $\Phi_n(r\omega) = \Phi_n(\theta^{-n}\omega)$, $\forall\omega\in\Omega$. From the Birkhoff ergodic theorem it follows that $\lim_{n\to+\infty}\Phi_n/n = E^{\Pi}\varphi$. Using the Perron–Frobenius theorem, one sees that
$$c(\lambda_1,\lambda_2) \stackrel{\text{def}}{=} \lim_{n\to+\infty}\frac1n\log E\,e^{\lambda_1 W_n + \lambda_2\Phi_n}$$
exists and is differentiable with respect to $\lambda_1,\lambda_2$. Thus $\{\mu_n : n>0\}$, the family of the distributions of $\{(W_n/n, \Phi_n/n) : n>0\}$, has a large deviation property with rate function $I(z_1,z_2) = \sup_{\lambda_1,\lambda_2\in\mathbb R}\{\lambda_1 z_1 + \lambda_2 z_2 - c(\lambda_1,\lambda_2)\}$. It is not difficult to see that $c(\lambda_1,\lambda_2) = c(-(1+\lambda_1),\lambda_2)$ and $I(z_1,z_2) = I(-z_1,z_2) - z_1$.

In general, let $\{\vec\Phi_n : n>0\}$ and $\{\vec\Psi_n : n>0\}$ be two sets of random vectors on $(\Omega,\mathcal F,P)$, where $\vec\Phi_n$ and $\vec\Psi_n$ are $\mathcal F_0^n$-measurable. Provided the free energy function
$$c(\lambda,\vec\beta,\vec\gamma) \stackrel{\text{def}}{=} \lim_{n\to+\infty}\frac1n\log E\,e^{\lambda W_n + \langle\vec\beta,\vec\Phi_n\rangle + \langle\vec\gamma,\vec\Psi_n\rangle}$$
exists and is differentiable, then $\{\mu_n : n>0\}$, the family of the distributions of $\{\frac1n(W_n,\vec\Phi_n,\vec\Psi_n) : n>0\}$, has a large deviation property with rate function
$$I(z,\vec u,\vec v) = \sup_{\lambda,\vec\beta,\vec\gamma}\bigl\{\lambda z + \langle\vec\beta,\vec u\rangle + \langle\vec\gamma,\vec v\rangle - c(\lambda,\vec\beta,\vec\gamma)\bigr\}.$$
And we have the following generalized fluctuation theorem.

Theorem 1.5.10. If for each $n>0$, $\vec\Phi_n(r\omega) = \vec\Phi_n(\theta^{-n}\omega)$ and $\vec\Psi_n(r\omega) = -\vec\Psi_n(\theta^{-n}\omega)$, $\forall\omega\in\Omega$, then it holds that
$$c(\lambda,\vec\beta,\vec\gamma) = c(-(1+\lambda),\vec\beta,-\vec\gamma), \qquad I(z,\vec u,\vec v) = I(-z,\vec u,-\vec v) - z.$$

Proof. For any given $\lambda,\vec\beta,\vec\gamma$,
$$E\,e^{\lambda W_n + \langle\vec\beta,\vec\Phi_n\rangle + \langle\vec\gamma,\vec\Psi_n\rangle} = \int\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\omega)\right)^{\lambda} e^{\langle\vec\beta,\vec\Phi_n(\omega)\rangle + \langle\vec\gamma,\vec\Psi_n(\omega)\rangle}\,dP(\omega)$$
$$= \int\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(r\omega)\right)^{\lambda} e^{\langle\vec\beta,\vec\Phi_n(r\omega)\rangle + \langle\vec\gamma,\vec\Psi_n(r\omega)\rangle}\,dP^-(\omega)$$
$$= \int\left(\frac{dP_{[0,n]}}{dP^-_{[0,n]}}(\theta^{-n}\omega)\right)^{-\lambda} e^{\langle\vec\beta,\vec\Phi_n(\theta^{-n}\omega)\rangle + \langle-\vec\gamma,\vec\Psi_n(\theta^{-n}\omega)\rangle}\,dP^-(\omega)$$
$$= E\,e^{-(1+\lambda)W_n + \langle\vec\beta,\vec\Phi_n\rangle + \langle-\vec\gamma,\vec\Psi_n\rangle}.$$
The desired result follows immediately. $\Box$
Here we point out that for non-stationary irreducible Markov chains it is easy to obtain the transient fluctuation theorem considered in [125–127, 444–446], which corresponds, in the non-stationary situation, to the results of Theorems 1.5.7, 1.5.8 and 1.5.10 (see [254]).
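For a finite chain, both the finite-$n$ symmetry $c_n(\lambda) = c_n(-(1+\lambda))$ from the proof of Theorem 1.5.8 and the exact relation $P(W_n/n = z) = e^{nz}\,P(W_n/n = -z)$ can be verified by brute-force path enumeration. The sketch below uses the matrix of Example 1.4.9, for which every step has probability $p$ or $q$ and $\Pi$ is uniform, so $W_n = (F-B)\log(p/q)$ with $F$ (resp. $B$) the number of $p$-steps (resp. $q$-steps); the parameter values and identifiers are our own.

```python
import math
from itertools import product
from collections import defaultdict

p, q = 0.7, 0.3
P = [[0, p, q], [q, 0, p], [p, q, 0]]   # Example 1.4.9; pi is uniform
n = 6
logr = math.log(p / q)

# distribution of W_n, indexed by the integer m = F - B
dist = defaultdict(float)               # m -> P(W_n = m log(p/q))
for path in product(range(3), repeat=n + 1):
    prob, m, ok = 1.0 / 3.0, 0, True
    for a, b in zip(path, path[1:]):
        if P[a][b] == 0:
            ok = False
            break
        prob *= P[a][b]
        m += 1 if P[a][b] == p else -1
    if ok:
        dist[m] += prob

assert abs(sum(dist.values()) - 1.0) < 1e-12

def c_n(lam):                           # c_n(lambda) = (1/n) log E exp(lambda W_n)
    return math.log(sum(pr * math.exp(lam * m * logr) for m, pr in dist.items())) / n

# finite-n Gallavotti-Cohen symmetry: c_n(lambda) = c_n(-(1 + lambda))
for lam in (-2.0, -0.7, 0.0, 0.3, 1.5):
    assert abs(c_n(lam) - c_n(-(1 + lam))) < 1e-12

# exact fluctuation relation: P(W_n/n = z) = e^{nz} P(W_n/n = -z)
for m, pr in dist.items():
    assert abs(pr - math.exp(m * logr) * dist.get(-m, 0.0)) < 1e-12
print("finite-n fluctuation relations hold exactly")
```

The symmetry is exact at every finite $n$ here (not only in the limit), exactly as the path-reversal argument in the proof of Theorem 1.5.8 shows.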
1.6 Appendix

To prove Theorem 1.2.7, we need the following result due to T.E. Harris [219]. One can also find its proof in Brémaud [45, page 119].

Lemma 1.6.1. Suppose that X = {X_n}_{n≥0} is a homogeneous, irreducible and positive recurrent Markov chain with a countable state space S. Let μ = (μ_i)_{i∈S} be the unique invariant probability distribution of X. For each i ∈ S, define T_i = inf{n ≥ 1 : X_n = i}. Then for any i, j ∈ S, i ≠ j, the following identity holds:

Prob(T_j < T_i | X_0 = i) = 1 / ( μ_i [ E(T_j | X_0 = i) + E(T_i | X_0 = j) ] ).
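Lemma 1.6.1 is easy to check numerically on a small chain, computing the hitting probability and the expected hitting times by first-step analysis. The 4-state transition matrix below is an illustrative choice, not from the text:

```python
import numpy as np

# Illustrative irreducible 4-state chain.
P = np.array([[0.1, 0.4, 0.3, 0.2],
              [0.3, 0.2, 0.3, 0.2],
              [0.2, 0.3, 0.1, 0.4],
              [0.4, 0.1, 0.3, 0.2]])
n = P.shape[0]

# Invariant distribution mu: solve mu P = mu together with sum(mu) = 1.
A = np.vstack([P.T - np.eye(n), np.ones(n)])
b = np.concatenate([np.zeros(n), [1.0]])
mu = np.linalg.lstsq(A, b, rcond=None)[0]

def hit_time(start, target):
    """E(T_target | X_0 = start), where T = inf{m >= 1 : X_m = target}."""
    others = [k for k in range(n) if k != target]
    Q = P[np.ix_(others, others)]
    m = np.linalg.solve(np.eye(len(others)) - Q, np.ones(len(others)))
    full = np.zeros(n)
    full[others] = m
    return 1.0 + P[start] @ full   # condition on the first step

def hit_before(start, j, i):
    """P(T_j < T_i | X_0 = start): absorb at i (value 0) and j (value 1)."""
    others = [k for k in range(n) if k not in (i, j)]
    Q = P[np.ix_(others, others)]
    a = np.linalg.solve(np.eye(len(others)) - Q, P[others, j])
    full = np.zeros(n)
    full[j] = 1.0
    full[others] = a
    return P[start] @ full

i, j = 0, 2
lhs = hit_before(i, j, i)
rhs = 1.0 / (mu[i] * (hit_time(i, j) + hit_time(j, i)))
assert abs(lhs - rhs) < 1e-10
print("Harris identity verified")
```

The identity combines μ_i = 1/E(T_i | X_0 = i) with the decomposition of the commute time i → j → i into excursions from i.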
We also need Theorem 4 on page 46 of K.L. Chung [62]. We reproduce it here to make the presentation more self-contained.

Lemma 1.6.2. Assume that X = {X_n}_{n≥0} is a homogeneous Markov chain with a denumerable state space S. For any H ⊂ S, i, j ∈ S and n ∈ N, define the taboo probability

p(i, j, n | H) = Prob(X_n = j, X_m ∉ H for 1 ≤ m < n | X_0 = i).

If i ∉ H, j ∉ H and there exists an n_0 ∈ N such that p(i, j, n_0 | H) > 0 (i.e. j can be reached from i under the taboo H), then

lim_{N→+∞} ( 1 + Σ_{n=1}^{N} p(j, j, n | H) ) / ( 1 + Σ_{n=1}^{N} p(i, i, n | H) )
  = ( Σ_{n=1}^{+∞} p(i, j, n | H ∪ {i}) ) / ( Σ_{n=1}^{+∞} p(i, j, n | H ∪ {j}) )
  = lim_{N→+∞} ( 1 + Σ_{n=1}^{N} p(j, j, n | H ∪ {i}) ) / ( 1 + Σ_{n=1}^{N} p(i, i, n | H ∪ {j}) ).
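When H is nonempty and every state leads into H, the taboo sums above converge, so the three expressions in Lemma 1.6.2 can be compared numerically from partial sums. Writing E_H for the diagonal matrix that zeroes the coordinates in H, one has p(·, ·, n | H) = (P E_H)^{n−1} P, since the n − 1 intermediate states must avoid H. A sketch on an illustrative 4-state chain (the matrix is an assumption, not from the text):

```python
import numpy as np

P = np.array([[0.1, 0.4, 0.3, 0.2],
              [0.3, 0.2, 0.3, 0.2],
              [0.2, 0.3, 0.1, 0.4],
              [0.4, 0.1, 0.3, 0.2]])
d = P.shape[0]

def taboo_sum(a, b, H, N=400):
    """Partial sum sum_{n=1}^N p(a, b, n | H), using the representation
    p(., ., n | H) = (P E_H)^{n-1} P with E_H zeroing the columns in H."""
    E = np.diag([0.0 if k in H else 1.0 for k in range(d)])
    M = np.eye(d)            # holds (P E_H)^{n-1}
    total = 0.0
    for _ in range(N):
        total += (M @ P)[a, b]
        M = M @ (P @ E)
    return total

i, j, H = 0, 1, {3}
first = (1.0 + taboo_sum(j, j, H)) / (1.0 + taboo_sum(i, i, H))
middle = taboo_sum(i, j, H | {i}) / taboo_sum(i, j, H | {j})
last = (1.0 + taboo_sum(j, j, H | {i})) / (1.0 + taboo_sum(i, i, H | {j}))
assert abs(first - middle) < 1e-8 and abs(first - last) < 1e-8
print("Chung ratio identity verified")
```

The truncation error decays geometrically here, since the chain restricted to S \ H is strictly substochastic.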
Proof of Theorem 1.2.7. Applying Lemma 1.6.1 to the stationary Markov chain ξ, we obtain

π_i P(T_j < T_i | ξ_0 = i) = π_j P(T_i < T_j | ξ_0 = j), ∀ i, j ∈ S, i ≠ j,

which together with

g(j, j | {i}) = [1 − P(T_j < T_i | ξ_0 = j)]^{−1} = [P(T_i < T_j | ξ_0 = j)]^{−1}

implies the following identity:

π_i g(j, j | {i}) = π_j g(i, i | {j}), ∀ i, j ∈ S, i ≠ j.    (1.40)
Similarly, applying Lemma 1.6.1 to the derived chain η, we get

Π̃_i(y_1) q(y_1, y_2) = Π̃_i(y_2) q(y_2, y_1), ∀ y_1, y_2 ∈ [S]_i, y_1 ≠ y_2,    (1.41)

where q(y_k, y_l) denotes the probability that the derived chain η starting at y_k visits y_l before returning to y_k. For y_1 = [i_1, i_2, ···, i_{s−1}] and y_2 = [i_1, i_2, ···, i_{s−1}, i_s], we have

q(y_1, y_2) = p_{i_{s−1} i_s},   q(y_2, y_1) = 1 − f(i_s, i_s | {i_1, i_2, ···, i_{s−1}}),

where f(i_s, i_s | {i_1, i_2, ···, i_{s−1}}) denotes the probability that the original chain ξ starting at i_s returns to i_s before visiting any of the states i_1, i_2, ···, i_{s−1}. Hence (1.41) becomes

Π̃_i([i_1, ···, i_{s−1}]) p_{i_{s−1} i_s} = Π̃_i([i_1, ···, i_s]) (1 − f(i_s, i_s | {i_1, ···, i_{s−1}}))    (1.42)

and
Π̃_i([i_1, ···, i_s]) = Π̃_i([i_1, ···, i_{s−1}]) p_{i_{s−1} i_s} g(i_s, i_s | {i_1, ···, i_{s−1}}).    (1.43)
Now we may appeal to Lemma 1.6.2 to get

g(i_s, i_s | {i_1, ···, i_{s−1}}) g(i_{s+1}, i_{s+1} | {i_1, ···, i_{s−1}, i_s})
  = g(i_{s+1}, i_{s+1} | {i_1, ···, i_{s−1}}) g(i_s, i_s | {i_1, ···, i_{s−1}, i_{s+1}}).    (1.44)
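Identity (1.44) can also be verified numerically, using the fact that g(a, a | H) = 1 + Σ_{n≥1} p(a, a, n | H) is the expected number of visits to a (counting time 0) before the chain hits the taboo set H. The transition matrix below is an illustrative assumption:

```python
import numpy as np

P = np.array([[0.1, 0.4, 0.3, 0.2],
              [0.3, 0.2, 0.3, 0.2],
              [0.2, 0.3, 0.1, 0.4],
              [0.4, 0.1, 0.3, 0.2]])
d = P.shape[0]

def g(a, H, N=400):
    """g(a, a | H) = 1 + sum_{n>=1} p(a, a, n | H), computed from the
    taboo-probability representation p(., ., n | H) = (P E_H)^{n-1} P."""
    E = np.diag([0.0 if k in H else 1.0 for k in range(d)])
    M = np.eye(d)
    total = 1.0
    for _ in range(N):
        total += (M @ P)[a, a]
        M = M @ (P @ E)
    return total

# (1.44) with {i_1, ..., i_{s-1}} = {0}, i_s = 1, i_{s+1} = 2:
lhs = g(1, {0}) * g(2, {0, 1})
rhs = g(2, {0}) * g(1, {0, 2})
assert abs(lhs - rhs) < 1e-8
print("identity (1.44) verified")
```

This is exactly the third equality of Lemma 1.6.2 with H = {i_1, ···, i_{s−1}}, i = i_s and j = i_{s+1}, rewritten in terms of g.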
Then (1.17) follows from (1.6) and (1.43). By (1.40) and (1.44), the product

π_{i_1} g(i_2, i_2 | {i_1}) g(i_3, i_3 | {i_1, i_2}) ··· g(i_s, i_s | {i_1, ···, i_{s−1}})    (1.45)
is unaffected by any permutation of the indices i_1, i_2, ···, i_s. To prove (1.18), we first show that

1 = Σ_{k=1}^{s} Σ_{r≥1} Σ_{j_2,···,j_r} g(j_1, j_1 | {i_1, ···, i_s}) g(j_2, j_2 | {i_1, ···, i_s, j_1}) g(j_3, j_3 | {i_1, ···, i_s, j_1, j_2}) ··· g(j_r, j_r | {i_1, ···, i_s, j_1, ···, j_{r−1}}) p_{j_1 j_2} p_{j_2 j_3} ··· p_{j_{r−1} j_r} p_{j_r i_k},    (1.46)

where j_1 ∉ {i_1, ···, i_s} is fixed and the inner sum is taken over all distinct j_2, ···, j_r ∉ {i_1, ···, i_s, j_1}. As before, given H ⊂ S, we define the taboo probability

p(i, j, n | H) = P(ξ_n = j, ξ_m ∉ H for 1 ≤ m < n | ξ_0 = i).

For k, j_2, j_3, ···, j_r fixed, the sum over n_1, ···, n_r of

p(j_1, j_1, n_1 | {i_1, ···, i_s}) p_{j_1 j_2} p(j_2, j_2, n_2 | {i_1, ···, i_s, j_1}) p_{j_2 j_3} ··· p(j_r, j_r, n_r | {i_1, ···, i_s, j_1, ···, j_{r−1}}) p_{j_r i_k}

is the probability that the chain ξ starting at j_1 enters the set {i_1, ···, i_s} for the first time at i_k while the value of the derived chain η is [j_1, j_2, ···, j_r, i_k]. Thus we get the summands of (1.46). Their sum over k, r, j_2, j_3, ···, j_r must be 1, hence (1.46) follows. Then multiplying both sides of (1.46) by

p_{i_s i_1} p_{i_1 i_2} ··· p_{i_{s−1} i_s} π_{i_1} g(i_2, i_2 | {i_1}) g(i_3, i_3 | {i_1, i_2}) ··· g(i_s, i_s | {i_1, ···, i_{s−1}}),

and using the symmetry of (1.45), we obtain (1.18). Finally, (1.19) can be obtained from (1.46) by taking s = 1, j_1 = i, i_1 = j, and multiplying both sides of it by π_j.