Commun. Math. Phys. 224, 1 – 2 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Preface

The present issue of CMP is dedicated to Joel L. Lebowitz, the William Hill Professor of Mathematics and Physics of Rutgers University, in recognition of his outstanding contributions and scientific leadership in statistical physics and related areas of mathematical physics.

In his research, Joel Lebowitz has addressed topics in statistical physics ranging over equilibrium and non-equilibrium phenomena. His works reflect the rare combination of a deep understanding of the relevant physics and the ability to see through the mathematical formalism in which physics is being expressed. He has reveled both in learning new physical phenomena and in shedding light on basic questions of physics through mathematically rigorous results.

The subjects on which Joel has worked, with numerous collaborators, include: the theory of equilibrium fluids, rigorous approach to the liquid-vapor transition, critical phenomena in Ising type models, the statistical mechanics of Coulomb systems, phase segregation studied in conjunction with pioneering implementations of Monte-Carlo simulations, ergodic theory in relation to fundamental issues of statistical mechanics, kinetic theory, and the structure of non-equilibrium steady states.

Joel’s contributions have been widely recognized; in 1980 he was elected to the National Academy of Sciences and in 1992 he was awarded the Boltzmann Medal of the Union of Pure and Applied Physics.
An inseparable aspect of the Lebowitz experience for the many whom he has touched has been the sense of his personal engagement and care. Having witnessed the holocaust as a young teenager, Joel emerged from the devastation and the inhumanely twisted reality that he was thrown into with a tenacious commitment to stand up for human rights and dignity. He has inspired many with the message that scientists should use the unique opportunities accorded to them to be at the forefront of that struggle. In 1999 he was awarded the Scientific Freedom and Responsibility Award of the American Association for the Advancement of Science for “... his tireless devotion to the rights of scientists in oppressive regimes throughout the world and his extraordinary creativity in finding ways to help these scientists survive their ordeal”.

In the late fifties Joel instituted a unique series of biannual meetings in Statistical Mechanics, which he has been running uninterrupted since then. These conferences have provided an invaluable forum for the presentation of recent results and for stimulating exchanges on both new emerging vistas and long outstanding fundamental issues. In a fitting reflection of the spirit of these meetings, Joel’s 70th birthday is being marked with two special issues: Physica A Vol. 279, Nos. 1–4, where the reader may also find a more complete biographical sketch, and this issue of CMP which presents rigorous results in fields related to Joel’s interests.

Joel L. Lebowitz stands out in his never satisfied curiosity, the clarity of his thought, and his exceptional ability to reach out and stimulate others. He has inspired and guided numerous colleagues and students. This issue is dedicated with deep gratitude, with a sense of joy at having had the privilege to interact with him, and with best wishes for Joel’s continuing quest.

Michael Aizenman
Herbert Spohn
Commun. Math. Phys. 224, 3 – 16 (2001)
Entropy Production in Quantum Spin Systems

David Ruelle

IHES, route de Chartres, 91440 Bures sur Yvette, France. E-mail: [email protected]

Received: 7 June 2000 / Accepted: 5 November 2000
Abstract: We consider a quantum spin system consisting of a finite subsystem connected to infinite reservoirs at different temperatures. In this setup we define nonequilibrium steady states and prove that the rate of entropy production in such states is nonnegative.

For several decades, Joel Lebowitz has been the soul of research in statistical mechanics. He now plays a central role in the development of new ideas which reshape our understanding of nonequilibrium. The present paper, dedicated to Joel on his 70th birthday, extends some of the new ideas to quantum systems.

1. Introduction

Consider a physical situation where a “small” system $S$ is connected to different “large” heat reservoirs $R_a$ ($a = 1, 2, \ldots$) at different inverse temperatures $\beta_a$. We want to define nonequilibrium steady states for the total system $L = S + R_1 + R_2 + \ldots$, and verify that the rate of entropy production in such states is $\ge 0$. The model which we discuss in this paper is that of a fairly realistic quantum spin system.

In what follows we first describe the model and state our assumptions (A1), (A2), (A3). In this setup we introduce nonequilibrium steady states $\rho$ as states which, in the distant past, described noninteracting reservoirs at different temperatures. Under suitable conditions we check that our definition does not depend on where we place the boundary between the small system and the reservoirs. Our definition of the entropy production $e_\rho$ also does not depend on where the boundary between the small system and the reservoirs is placed. With this definition we prove $e_\rho \ge 0$.

By contrast with an earlier paper [4], we omit here assumptions of asymptotic abelianness in time, which are difficult to verify; the definition of nonequilibrium steady states is more general, but we obtain less specific results.
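2. Description of the Model

(See [3, 1].) Let $L$ be a countably infinite set. For each $x \in L$, let $\mathcal{H}_x$ be a finite dimensional complex Hilbert space, and write $\mathcal{H}_X = \otimes_{x\in X}\mathcal{H}_x$ if $X$ is a finite subset of $L$. We let $\mathcal{A}_X$ be the C*-algebra of bounded operators on $\mathcal{H}_X$, and if $Y \subset X$ we identify $\mathcal{A}_Y$ with a subalgebra of $\mathcal{A}_X$ by the map $\mathcal{A}_Y \to \mathcal{A}_Y \otimes 1_{\mathcal{H}_{X\setminus Y}} \subset \mathcal{A}_X$. We write $L$ as a finite union $L = \cup_{a\ge 0} R_a$, where $R_0 = S$ is finite (small system) and the $R_a$ with $a > 0$ are infinite (reservoirs). We can then define the quasilocal C*-algebras $\mathcal{A}_a$, $\mathcal{A}$ as the norm closures of

$$\bigcup_{X\subset R_a}\mathcal{A}_X, \qquad \bigcup_{X\subset L}\mathcal{A}_X$$

respectively. Note that all these algebras have a common unit element $1$. In this setup we assume that an interaction $\Phi : X \mapsto \Phi(X)$ is given such that $\Phi(X)$ is a selfadjoint element of $\mathcal{A}_X$ for every finite $X \subset L$. Also, for each reservoir, we prescribe an inverse temperature $\beta_a > 0$ and a state $\sigma_a$ on $\mathcal{A}_a$.

The assumptions (A1), (A2), (A3).

Assumption A1. The interaction $\Phi$ satisfies

$$\|\Phi\|_\lambda = \sum_{n\ge 0} e^{n\lambda}\,\sup_{x\in L}\sum_{X\ni x:\,\mathrm{card}X = n+1}\|\Phi(X)\| < \infty$$

for some $\lambda > 0$.

The importance of this assumption is that it allows us to equip $\mathcal{A}$ with a one-parameter group $(\alpha^t)$ of automorphisms¹ defining a time evolution. Introduce a linear operator $\delta : \cup_{X\subset L}\mathcal{A}_X \to \mathcal{A}$ such that

$$\delta A = i\sum_{Y:\,Y\cap X\ne\emptyset}[\Phi(Y), A] \qquad \text{if } A \in \mathcal{A}_X.$$

If $A \in \mathcal{A}_X$, one checks that

$$\|\delta^m A\| \le \|A\|\,e^{\lambda\,\mathrm{card}X}\,m!\,(2\lambda^{-1}\|\Phi\|_\lambda)^m.$$

The strongly continuous one-parameter group $(\alpha^t)$ of *-automorphisms of $\mathcal{A}$ is given by

$$\alpha^t A = \sum_{m=0}^{\infty}\frac{t^m}{m!}\,\delta^m A$$

if $A \in \cup_{X\subset L}\mathcal{A}_X$ and $|t| < \lambda/(2\|\Phi\|_\lambda)$. (More generally one could take $A \in \mathcal{A}_\lambda$, where $\mathcal{A}_\lambda$ is defined in the Appendix.) Let

$$H_\Lambda = \sum_{X\subset\Lambda}\Phi(X)$$

for finite $\Lambda \subset L$. Writing $\Lambda \to L$ if $\Lambda$ eventually contains each finite $X \subset L$ we have, assuming $A \in \mathcal{A}$,

$$\lim_{\Lambda\to L}\|e^{itH_\Lambda}A\,e^{-itH_\Lambda} - \alpha^t A\| = 0$$

uniformly for $t$ in compact intervals of $\mathbb{R}$.

¹ See [1] Theorem 6.2.4 (or [3] Sect. 7.6).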
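As a brief check on the radius of convergence quoted for the series defining $\alpha^t A$, the bound on $\|\delta^m A\|$ can be summed term by term. The display below is only a sketch of this elementary estimate for $A \in \mathcal{A}_X$; it uses nothing beyond the stated bound, and the notation $\|\Phi\|_\lambda$ is the norm of Assumption A1.

```latex
\[
\Big\|\sum_{m=0}^{\infty}\frac{t^m}{m!}\,\delta^m A\Big\|
\;\le\; \sum_{m=0}^{\infty}\frac{|t|^m}{m!}\,\|\delta^m A\|
\;\le\; \|A\|\,e^{\lambda\,\mathrm{card}X}\sum_{m=0}^{\infty}
        \Big(\frac{2|t|\,\|\Phi\|_\lambda}{\lambda}\Big)^{m}
\;=\; \frac{\|A\|\,e^{\lambda\,\mathrm{card}X}}{1-2|t|\,\|\Phi\|_\lambda/\lambda},
\qquad |t|<\frac{\lambda}{2\|\Phi\|_\lambda}.
\]
```

The factorials cancel, so the series is dominated by a geometric series with ratio $2|t|\,\|\Phi\|_\lambda/\lambda$, which is where the restriction on $|t|$ comes from.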
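Assumption A2. $\Phi(X) = 0$ if $X \cap S = \emptyset$, $X \cap R_a \ne \emptyset$, $X \cap R_b \ne \emptyset$ for different $a, b > 0$.

Note that the description of the interaction is somewhat ambiguous because anything ascribed to $\Phi(X)$ might also be ascribed to $\Phi(Y)$ for $Y \supset X$. Condition (A2) means that in our accounting, if a part of the interaction connects two different reservoirs, it must also involve the small system $S$.

Assumption A3. If $a > 0$, let $\Phi_a$ be the restriction of the interaction $\Phi$ to subsets of $R_a$ and write

$$H_{a\Lambda} = \sum_{X\subset R_a\cap\Lambda}\Phi_a(X) = H_{R_a\cap\Lambda}.$$

Let also the interactions $\Psi_{(\Lambda)}$ be given such that

$$\|\Psi_{(\Lambda)}\|_\lambda \le K < \infty \tag{1}$$

and write

$$B_{a\Lambda} = \sum_{X\subset R_a\cap\Lambda}\Psi_{(\Lambda)}(X).$$

We assume that, for a suitable sequence $\Lambda \to L$,

$$\lim_{\Lambda\to L}\frac{\mathrm{Tr}_{\mathcal{H}_{R_a\cap\Lambda}}\big(e^{-\beta_a(H_{a\Lambda}+B_{a\Lambda})}A\big)}{\mathrm{Tr}_{\mathcal{H}_{R_a\cap\Lambda}}\,e^{-\beta_a(H_{a\Lambda}+B_{a\Lambda})}} = \sigma_a(A)$$

if $A \in \mathcal{A}_a$: this defines a state $\sigma_a$ on $\mathcal{A}_a$, depending on the choice of $(\Psi_{(\Lambda)})$ and the sequence $\Lambda \to L$. Furthermore we assume that for each finite $X$ there is $\Lambda_X$ such that $\Psi_{(\Lambda)}(Y) = 0$ if $\Lambda \supset \Lambda_X$ and $Y \subset X$; therefore

$$[B_{a\Lambda}, A] = 0 \tag{2}$$

if $\Lambda \supset \Lambda_X$ and $A \in \mathcal{A}_X$. In particular we can take all $\Psi_{(\Lambda)} = 0$. Using (3) below, it is readily verified that $\sigma_a$ is a $\beta_a$-KMS state (see [2]) for the one-parameter group $(\breve\alpha_a^t)$ of automorphisms of $\mathcal{A}_a$ corresponding to the interaction $\Phi_a$. [I do not know which of the $\beta_a$-KMS states can be obtained in this manner].

Note that the assumptions (A1), (A2), (A3) can be explicitly verified in specific cases. From (A3) we obtain the following result.

Lemma.

$$\lim_{\Lambda\to L}\big\|e^{it(H_{a\Lambda}+B_{a\Lambda})}A\,e^{-it(H_{a\Lambda}+B_{a\Lambda})} - \breve\alpha_a^t A\big\| = 0 \tag{3}$$

for $a > 0$, and

$$\lim_{\Lambda\to L}\big\|e^{it(H_\Lambda+\sum_{a>0}B_{a\Lambda})}A\,e^{-it(H_\Lambda+\sum_{a>0}B_{a\Lambda})} - \alpha^t A\big\| = 0 \tag{4}$$

uniformly for $t$ in compact intervals of $\mathbb{R}$.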
Proof. We prove (4). Write

$$\alpha_\Lambda^t A = e^{it(H_\Lambda+\sum_{a>0}B_{a\Lambda})}A\,e^{-it(H_\Lambda+\sum_{a>0}B_{a\Lambda})} \qquad\text{and}\qquad \delta_\Lambda A = i\Big[H_\Lambda+\sum_{a>0}B_{a\Lambda},\,A\Big].$$

If $A \in \cup_X\mathcal{A}_X$ we see using (1) that

$$\alpha_\Lambda^t A = \sum_{m=0}^{\infty}\frac{t^m}{m!}\,\delta_\Lambda^m A$$

converges uniformly in $\Lambda$ for $|t| < \lambda/(2(\|\Phi\|_\lambda + K))$. Using also (2), it is shown in the Appendix that $\delta_\Lambda^m A \to \delta^m A$ in $\mathcal{A}$ when $\Lambda \to L$. Therefore

$$\lim_{\Lambda\to L}\|\alpha_\Lambda^t A - \alpha^t A\| = 0$$

when $A \in \cup_X\mathcal{A}_X$, uniformly for $|t| \le T < \lambda/(2(\|\Phi\|_\lambda + K))$. But the condition $A \in \cup_X\mathcal{A}_X$ is removed by density, and the condition $|t| \le T < \lambda/(2(\|\Phi\|_\lambda + K))$ by use of the group property. The proof of (3) is similar.

The KMS state $\sigma$. The interaction $\sum_{a>0}\beta_a\Phi_a$, evaluated at $X$, is $\beta_a\Phi_a(X)$ if $X \subset R_a$ and $0$ if $X$ is not contained in one of the $R_a$. The corresponding one-parameter group $(\beta^t)$ of automorphisms of $\mathcal{A}$ has, according to (A3), the KMS state² $\sigma = \otimes_{a\ge 0}\sigma_a$ where $\sigma_0$ is the normalized trace on $\mathcal{A}_0 = \mathcal{A}_S$. In fact

$$\sigma(A) = \lim_{\Lambda\to L}\frac{\mathrm{Tr}_{\mathcal{H}_\Lambda}\big(\exp(-\sum_{a>0}\beta_a(H_{a\Lambda}+B_{a\Lambda}))\,A\big)}{\mathrm{Tr}_{\mathcal{H}_\Lambda}\exp(-\sum_{a>0}\beta_a(H_{a\Lambda}+B_{a\Lambda}))}. \tag{5}$$

² The state $\sigma$ corresponds to the inverse temperature $+1$ rather than the inverse temperature $-1$ favored in the mathematical literature.

Nonequilibrium steady states. We call nonequilibrium steady states (NESS) associated with $\sigma$ the limits when $T \to \infty$ of

$$\frac{1}{T}\int_0^T dt\,(\alpha^t)^*\sigma$$

using the w*-topology on the dual $\mathcal{A}^*$ of $\mathcal{A}$. With respect to this topology, the set $\Sigma$ of NESS is compact, nonempty, and the elements of $\Sigma$ are $(\alpha^t)^*$-invariant states on $\mathcal{A}$. This definition generalizes that given in [4] where, under stringent asymptotic abelianness conditions, the existence of a single NESS was obtained.

Dependence on the decomposition $L = S + R_1 + R_2 + \ldots$³. Our definition of $\sigma$, and therefore of $\Sigma$, depends on the choice of a decomposition of $L$ into small system and reservoirs. If $S$ is replaced by a finite set $S' \supset S$ and the $R_a$ by correspondingly smaller sets $R'_a \subset R_a$ one checks that (A1), (A2), (A3) remain valid. If $\Phi'_a$ is the restriction of $\Phi$ to subsets of $R'_a$, the replacement of $\beta_a\Phi_a$ by $\beta_a\Phi'_a$ changes $(\beta^t)$ to a one-parameter group $(\beta'^t)$ and $\sigma$ to a state $\sigma'$. These changes are in fact bounded perturbations covered by Theorem 5.4.4 and Corollary 5.4.5 of [1]. The map $\sigma \to \sigma'$ (of KMS states for $(\beta^t)$ to KMS states for $(\beta'^t)$) is nonlinear (as can be guessed from (5)) and therefore we cannot expect that $\frac{1}{T}\int_0^T dt\,(\alpha^t)^*\sigma'$ has the same limit as $\frac{1}{T}\int_0^T dt\,(\alpha^t)^*\sigma$ in general, but the deviation is not really bad. The (central) decomposition of KMS states into extremal KMS states gives factor states. If $\sigma$ is assumed to be a factor state, and $(\alpha^t)$ is asymptotically abelian, one finds that $\lim\frac{1}{T}\int_0^T dt\,(\alpha^t)^*\sigma$ does not depend on the decomposition $L = S + R_1 + R_2 + \ldots$, as the following result indicates.

³ This section and the following Proposition are in the nature of a technical digression, and may be omitted by the reader essentially interested in the positivity of the entropy production.

Proposition. Using the above notation, assume that $\sigma$ is a factor state, and that

$$\lim_{t\to\infty}\|[\alpha^t A, B]\| = 0$$

when $A, B \in \mathcal{A}$. Then, when $T \to \infty$,

$$\lim\frac{1}{T}\int_0^T dt\,(\alpha^t)^*\sigma' = \lim\frac{1}{T}\int_0^T dt\,(\alpha^t)^*\sigma.$$

Proof. Let us introduce the GNS representation $(\mathcal{H}, \pi, \Omega)$ associated with $\sigma$ so that if

$$\rho = \lim\frac{1}{T}\int_0^T dt\,(\alpha^t)^*\sigma,$$

we have

$$\rho(A) = \lim\frac{1}{T}\int_0^T dt\,(\Omega, \pi(\alpha^t A)\Omega).$$

By restricting $T$ to a subsequence we may assume that in the weak operator topology

$$\lim\frac{1}{T}\int_0^T dt\,\pi(\alpha^t A) = \bar A \in \pi(\mathcal{A})''$$

and by assumption we also have $\bar A \in \pi(\mathcal{A})'$, hence $\bar A \in \pi(\mathcal{A})' \cap \pi(\mathcal{A})'' = \{\lambda 1\}$ since $\sigma$ is a factor state. But we may write $\sigma'(\cdot) = (\Omega', \pi(\cdot)\Omega')$: this follows from the perturbation theory of [1] (see proof of Theorem 5.4.4). We have thus

$$\lim\frac{1}{T}\int_0^T dt\,\sigma'(\alpha^t A) = \lim\frac{1}{T}\int_0^T dt\,(\Omega', \pi(\alpha^t A)\Omega') = \lim\frac{1}{T}\int_0^T dt\,(\Omega, \pi(\alpha^t A)\Omega) = \lim\frac{1}{T}\int_0^T dt\,\sigma(\alpha^t A)$$

as announced.

Entropy production. For finite $\Lambda \subset L$ we have defined

$$H_\Lambda = \sum_{X\subset\Lambda}\Phi(X),$$
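but $H_L$, $H_{R_a}$ do not make sense. We can however define

$$[H_L, H_{R_a}] = \lim_{\Lambda\to L}[H_\Lambda, H_{R_a\cap\Lambda}] = \lim_{\Lambda\to L}[H_\Lambda, H_{a\Lambda}].$$

We have indeed

$$[H_\Lambda, H_{a\Lambda}] = [H_\Lambda - H_{a\Lambda}, H_{a\Lambda}] = \Big[H_\Lambda - \sum_{b>0}H_{b\Lambda},\,H_{a\Lambda}\Big]$$

and (A2) gives

$$H_\Lambda - \sum_{b>0}H_{b\Lambda} = \sum_{x\in S}\;\sum_{X:\,x\in X\subset\Lambda}\frac{1}{\mathrm{card}(X\cap S)}\,\Phi(X)$$

[implying the existence of the limit $\lim_{\Lambda\to L}(H_\Lambda - \sum_{b>0}H_{b\Lambda}) = H_L - \sum_{b>0}H_{R_b} \in \mathcal{A}$]. Using (A1) we obtain

$$\|[\Phi(X), H_{a\Lambda}]\| \le 2\lambda^{-1}\|\Phi\|_\lambda\,\|\Phi(X)\|\,e^{\lambda\,\mathrm{card}X},$$

hence

$$\sum_{X\ni x}\|[\Phi(X), H_{a\Lambda}]\| \le 2\lambda^{-1}\|\Phi\|_\lambda\,e^{\lambda}\,\|\Phi\|_\lambda,$$

and $[H_\Lambda, H_{a\Lambda}]$ has a limit $[H_L, H_{R_a}] \in \mathcal{A}$ when $\Lambda \to L$ with

$$\|[H_L, H_{R_a}]\| \le 2\,\mathrm{card}S\;\lambda^{-1}e^{\lambda}\,\|\Phi\|_\lambda^2.$$

The operator $i[H_L, H_{R_a}]$ may be interpreted as the rate of increase of the energy of the reservoir $R_a$ or (since this energy is infinite) rather the rate of transfer of energy to $R_a$ from the rest of the system. According to conventional wisdom we define the rate of entropy production in an $(\alpha^t)^*$-invariant state $\rho$ as

$$e_\rho = \sum_{a>0}\beta_a\,\rho(i[H_L, H_{R_a}])$$

(this definition does not require that $\rho \in \Sigma$).

Remark. If we replace $S$ by a finite set $S' \supset S$ and the $R_a$ by the correspondingly smaller sets $R'_a \subset R_a$, we have noted earlier that (A1), (A2), (A3) remain satisfied. As a consequence of (A1) we have

$$i[H_L, H_{R_a} - H_{R'_a}] = \lim_{\Lambda\to L} i[H_\Lambda, H_{a\Lambda} - H'_{a\Lambda}] = \lim_{\Lambda\to L}\delta(H_{a\Lambda} - H'_{a\Lambda})$$

(where the operator $\delta$ has been defined just after (A1)), hence

$$\rho(i[H_L, H_{R_a} - H_{R'_a}]) = \lim_{\Lambda\to L}\rho(\delta(H_{a\Lambda} - H'_{a\Lambda})) = 0,$$

i.e., the rate of entropy production is unchanged when $S$ and the $R_a$ are replaced by $S'$ and the $R'_a$. The reason why we do not have $\rho(i[H_L, H_{R_a}]) = 0$ is mathematically because $H_{R_a}$ is “infinite” ($H_{R_a} \notin \mathcal{A}$), and physically because our definition of $\rho(i[H_L, H_{R_a}])$ takes into account the flux of energy into $R_a$ from $S$, but not the flux at infinity.

Theorem. The entropy production in a NESS is nonnegative, i.e., $e_\rho \ge 0$ if $\rho \in \Sigma$.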
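We have seen that

$$[H_L, H_{R_a}] = \lim_{\Lambda\to L}[H_\Lambda, H_{a\Lambda}] = \lim_{\Lambda\to L}\Big[H_\Lambda - \sum_{b>0}H_{b\Lambda},\,H_{a\Lambda}\Big].$$

Therefore, using (A3) and $[H_{b\Lambda} + B_{b\Lambda}, \sum_{a>0}\beta_a(H_{a\Lambda} + B_{a\Lambda})] = 0$, we find

$$\sum_{a>0}\beta_a[H_L, H_{R_a}] = \lim_{\Lambda\to L}\Big[H_\Lambda - \sum_{b>0}H_{b\Lambda},\,\sum_{a>0}\beta_a H_{a\Lambda}\Big]$$

$$= \lim_{\Lambda\to L}\Big[H_\Lambda - \sum_{b>0}H_{b\Lambda},\,\sum_{a>0}\beta_a(H_{a\Lambda} + B_{a\Lambda})\Big]$$

$$= \lim_{\Lambda\to L}\Big[H_\Lambda + \sum_{b>0}B_{b\Lambda},\,\sum_{a>0}\beta_a(H_{a\Lambda} + B_{a\Lambda})\Big]$$

in the sense of norm convergence. We also have, for some sequence of values of $T$ tending to infinity and all $A \in \mathcal{A}$,

$$\rho(A) = \lim_{T\to\infty}\frac{1}{T}\int_0^T dt\,\sigma(\alpha^t A) = \lim_{T\to\infty}\lim_{\Lambda\to L}\frac{1}{T}\int_0^T dt\,\sigma(\alpha_\Lambda^t A),$$

where, by (4),

$$\alpha_\Lambda^t A = e^{it(H_\Lambda+\sum_{a>0}B_{a\Lambda})}A\,e^{-it(H_\Lambda+\sum_{a>0}B_{a\Lambda})} \to \alpha^t A \quad\text{in norm}$$

when $\Lambda \to L$, uniformly for $t \in [0, T]$. Write

$$H_{B\Lambda} = H_\Lambda + \sum_{a>0}B_{a\Lambda}, \qquad G_\Lambda = \sum_{a>0}\beta_a(H_{a\Lambda} + B_{a\Lambda}) + \log\mathrm{Tr}_{\mathcal{H}_\Lambda}\exp\Big(-\sum_{a>0}\beta_a(H_{a\Lambda} + B_{a\Lambda})\Big).$$

Then the entropy production is

$$e_\rho = \rho\Big(i\sum_{a>0}\beta_a[H_L, H_{R_a}]\Big) = \lim_{T\to\infty}\lim_{\Lambda\to L}\frac{i}{T}\int_0^T dt\,\sigma\big(e^{itH_{B\Lambda}}[H_{B\Lambda}, G_\Lambda]\,e^{-itH_{B\Lambda}}\big)$$

and the convergence when $\Lambda \to L$ of the operator $(e^{itH_{B\Lambda}}[H_{B\Lambda}, G_\Lambda]e^{-itH_{B\Lambda}})$ is uniform for $t \in [0, T]$. According to (A3) we may choose the $\Lambda$ tending to $L$ such that $\mathrm{Tr}_{\mathcal{H}_\Lambda}e^{-G_\Lambda}(\cdot)$ tends to $\sigma(\cdot)$ in the w*-topology, hence

$$e_\rho = \lim_{T\to\infty}\lim_{\Lambda\to L}\frac{i}{T}\int_0^T dt\,\mathrm{Tr}_{\mathcal{H}_\Lambda}\big(e^{-G_\Lambda}e^{itH_{B\Lambda}}[H_{B\Lambda}, G_\Lambda]\,e^{-itH_{B\Lambda}}\big)$$

$$= \lim_{T\to\infty}\lim_{\Lambda\to L}\frac{1}{T}\int_0^T dt\,\mathrm{Tr}_{\mathcal{H}_\Lambda}\Big(e^{-G_\Lambda}\frac{d}{dt}\big(e^{itH_{B\Lambda}}G_\Lambda e^{-itH_{B\Lambda}}\big)\Big)$$

$$= \lim_{T\to\infty}\lim_{\Lambda\to L}\frac{1}{T}\Big[\mathrm{Tr}_{\mathcal{H}_\Lambda}\big(e^{-G_\Lambda}e^{iTH_{B\Lambda}}G_\Lambda e^{-iTH_{B\Lambda}}\big) - \mathrm{Tr}_{\mathcal{H}_\Lambda}\big(e^{-G_\Lambda}G_\Lambda\big)\Big]$$

and the theorem follows from the lemma below, applied with $A = G_\Lambda$, $U = e^{iTH_{B\Lambda}}$ and $\varphi(s) = -e^{-s}$.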
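The matrix inequality invoked at the end of this argument, $\mathrm{tr}(\varphi(A)UAU^{-1}) \le \mathrm{tr}(\varphi(A)A)$ for hermitian $A$, unitary $U$ and increasing $\varphi$, can be probed numerically. The Python sketch below is an illustration only, not part of the proof. It assumes that one works in an eigenbasis of $A$ (so $A$ is diagonal), restricts $U$ to real orthogonal matrices built from random plane rotations, and takes $\varphi(s) = -e^{-s}$. The helper names `random_orthogonal` and `trace_sides` are hypothetical.

```python
import math
import random


def random_orthogonal(n, rng, sweeps=3):
    """Return an n x n real orthogonal matrix as a product of plane rotations.

    Assumption of this sketch: real orthogonal matrices stand in for
    general unitary matrices.
    """
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        for p in range(n):
            for q in range(p + 1, n):
                theta = rng.uniform(0.0, 2.0 * math.pi)
                c, s = math.cos(theta), math.sin(theta)
                # Rotate columns p and q; this preserves orthogonality of U.
                for row in U:
                    rp, rq = row[p], row[q]
                    row[p] = c * rp - s * rq
                    row[q] = s * rp + c * rq
    return U


def trace_sides(eigs, U, phi):
    """For A = diag(eigs), return (tr(phi(A) U A U^-1), tr(phi(A) A))."""
    n = len(eigs)
    # (U A U^T)_ii = sum_j U_ij^2 a_j since U is real orthogonal and A diagonal.
    lhs = sum(phi(eigs[i]) * U[i][j] ** 2 * eigs[j]
              for i in range(n) for j in range(n))
    rhs = sum(phi(a) * a for a in eigs)
    return lhs, rhs


def phi(s):
    """The increasing function phi(s) = -exp(-s) used in the text."""
    return -math.exp(-s)


rng = random.Random(0)
eigs = [rng.uniform(-2.0, 2.0) for _ in range(4)]
for _ in range(5):
    lhs, rhs = trace_sides(eigs, random_orthogonal(4, rng), phi)
    print(f"tr(phi(A) U A U^-1) = {lhs:.6f}   tr(phi(A) A) = {rhs:.6f}")
```

In an eigenbasis of $A$ the left-hand side involves $U$ only through the doubly stochastic matrix $|U_{ij}|^2$, which is why the check reduces to a double sum over eigenvalues.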
Lemma. Let $A$, $U$ be a hermitian and a unitary $n \times n$ matrix respectively, and $\varphi : \mathbb{R} \to \mathbb{R}$ be an increasing function. Then

$$\mathrm{tr}(\varphi(A)\,UAU^{-1}) \le \mathrm{tr}(\varphi(A)\,A).$$

Proof. As R. Seiler kindly pointed out to me, this lemma can be obtained readily from O. Klein’s inequality

$$\mathrm{tr}\big(f(B) - f(A) - (B - A)f'(A)\big) \ge 0,$$

where $A$, $B$ are hermitian and $f$ convex: take $B = UAU^{-1}$ and $\varphi = f'$.

Remark. We have

$$\sum_{a>0}\rho(i[H_L, H_{R_a}]) = 0$$

because

$$-\sum_{a>0}\rho(i[H_L, H_{R_a}]) = \lim_{\Lambda\to L}\rho\Big(i\Big[H_\Lambda,\,H_\Lambda - \sum_{a>0}H_{a\Lambda}\Big]\Big) = \frac{d}{dt}\,\rho\Big(\alpha^t\sum_{X:\,X\cap S\ne\emptyset}\Phi(X)\Big)\Big|_{t=0} = 0,$$

where we have used the fact that $\rho$ is $(\alpha^t)^*$-invariant. In particular, in the case of two reservoirs

$$0 \le e_\rho = (\beta_1 - \beta_2)\,\rho(i[H_L, H_{R_1}])$$

so that if the temperature $\beta_1^{-1}$ is less than $\beta_2^{-1}$, i.e., $\beta_1 - \beta_2 > 0$, the flux of energy into $R_1$ is $\ge 0$: heat flows from the hot reservoir to the cold reservoir.

3. Proving Strict Positivity of $e_\rho$

It is an obvious challenge to prove that $e_\rho \ne 0$. A natural situation to discuss would correspond to $R_a = \mathbb{Z}^\nu$ and $\Phi_a$ translationally invariant. But we need then $\nu \ge 3$ as discussed in [4]. Indeed, for $\nu < 3$ one expects a nonequilibrium steady state to be in fact an equilibrium state at a temperature intermediate between the original temperatures of the reservoirs. Instead of a quantum spin system as described above, a gas of noninteracting fermions would probably be easier to treat first.

4. Complements and Relation with Recent Work of Jakšić and Pillet

After this paper was submitted for publication, two interesting contributions were posted to the mp_arc archive: one by Jakšić and Pillet⁴ and one by Maes et al.⁵ In this section and the next two, I am complying with the editor’s request to take into account remarks by the referees, and in particular to discuss the relations of my work with the two references mentioned above.

⁴ V. Jakšić and C.-A. Pillet. “On entropy production in quantum statistical mechanics.” mp_arc 00-309.
⁵ Chr. Maes, F. Redig, and M. Verschuere. “Entropy production for interacting particle systems.” mp_arc 00-357.
Note that the definition of entropy production used above is based on the thermodynamic relation $dQ = kT\,dS$ or, in the present case, $dS = \sum_a (kT_a)^{-1}dQ_a$. It can be considered a drawback that this definition does not relate directly to a microscopically defined entropy-like quantity, as is done in the papers of Jakšić and Pillet, and Maes et al. We now discuss in detail the approach of Jakšić and Pillet, and its relation with the present paper.⁶

We are given a C*-algebra $\mathcal{A}$ with identity, an element $V = V^* \in \mathcal{A}$, time evolutions $(\breve\alpha^t)$, $(\alpha^t)$ (i.e., strongly continuous one-parameter groups of *-automorphisms of $\mathcal{A}$) such that

$$\alpha^t(A) = \breve\alpha^t(A) + \sum_{n\ge 1} i^n\int_0^t dt_1\int_0^{t_1}dt_2\ldots\int_0^{t_{n-1}}dt_n\,\big[\breve\alpha^{t_n}(V), [\ldots[\breve\alpha^{t_1}(V), A]\ldots]\big]$$

and an $(\breve\alpha^t)$-invariant state $\sigma$ on $\mathcal{A}$. Therefore $(\alpha^t)$ is a local perturbation by $V$ of the “free” evolution given by $(\breve\alpha^t)$ and $\sigma$ is an invariant state for the “free” evolution. We furthermore assume that

(C1) There exists a time evolution $(\beta^t)$ for which $\sigma$ is a KMS state at inverse temperature $+1$.
(C2) $V$ is in the domain of the infinitesimal generator $\delta_\beta$ of $(\beta^t)$.

[In fact Jakšić and Pillet assume a temperature $-1$ in (C1); our choice of temperature $+1$ will bring a change of sign below in the definition of the entropy production. In the situation discussed earlier we have

$$V = \sum_{X\cap S\ne\emptyset}\Phi(X),$$

hence $\|V\|_\lambda \le \|\Phi\|_\lambda\,\mathrm{card}S$, and $V \in \mathcal{A}_\lambda$. Note that $\mathcal{A}_\lambda$ is in the domain of the infinitesimal generator $\delta_\beta$ of $(\beta^t)$ (see the Appendix), hence (C2) holds. The advantage of the approach of Jakšić and Pillet is that $\sigma$ can be an arbitrary KMS state: the existence of “boundary terms” $B_{a\Lambda}$ as in (A3) is not required.]

In this setup one introduces the observable $-\delta_\beta(V)$ and the entropy production in the state $\rho$ is defined as $\rho(-\delta_\beta(V))$. [In our situation we have

$$-\delta_\beta(V) = -\sum_{a>0}\beta_a\sum_{X\subset R_a}\;\sum_{Y:\,Y\cap S\ne\emptyset} i[\Phi(X), \Phi(Y)] = \sum_{a>0}\beta_a\,i[H_L, H_{R_a}]$$

so that $\rho(-\delta_\beta(V)) = e_\rho$ is indeed the rate of entropy production in the state $\rho$.]

⁶ We have changed the notation of [2] to align it with the one used above.
Finite dimensional digression. For the purpose of motivation we discuss now the case where $\mathcal{A}$ would be the algebra of $n \times n$ matrices, and consider two states on $\mathcal{A}$ given by density matrices $\mu$, $\nu$. A relative entropy is then defined by

$$\mathrm{Ent}(\mu|\nu) = -\mathrm{tr}(\mu\log\mu - \mu\log\nu) \le 0.$$

If $(\alpha^t)$ is a one parameter group of *-automorphisms of $\mathcal{A}$ we have thus

$$\frac{d}{dt}\mathrm{Ent}(\mu\circ\alpha^t|\nu) = \mathrm{tr}\Big(\mu\,\frac{d}{dt}\alpha^t(\log\nu)\Big).$$

Suppose now that $\nu$ is preserved by the “free” evolution $(\breve\alpha^t)$, and that $(\alpha^t)$ is a perturbation of $(\breve\alpha^t)$, so that

$$\alpha^t(A) = e^{i(H+V)t}A\,e^{-i(H+V)t}, \qquad \breve\alpha^t(A) = e^{iHt}A\,e^{-iHt},$$

then

$$\frac{d}{dt}\alpha^t(\log\nu) = \alpha^t(i[V, \log\nu]).$$

Define now $(\beta^t)$ by

$$\beta^t(A) = e^{-it\log\nu}A\,e^{it\log\nu}$$

so that $\nu$ is the corresponding KMS state (at inverse temperature $+1$). Then if $\delta_\beta$ is the infinitesimal generator of $(\beta^t)$ we have $i[V, \log\nu] = \delta_\beta(V)$, hence

$$\frac{d}{dt}\alpha^t(\log\nu) = \alpha^t(\delta_\beta(V)), \qquad \frac{d}{dt}\mathrm{Ent}(\mu\circ\alpha^t|\nu) = \mu(\alpha^t(\delta_\beta(V))).$$

We obtain thus

$$\mathrm{Ent}(\mu\circ\alpha^T|\nu) - \mathrm{Ent}(\mu|\nu) = \int_0^T(\mu\circ\alpha^t)(\delta_\beta(V))\,dt$$

or, taking $\mu = \nu = \sigma$,

$$0 \le -\mathrm{Ent}(\sigma\circ\alpha^T|\sigma) = \int_0^T(\sigma\circ\alpha^t)(-\delta_\beta(V))\,dt.$$
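The finite dimensional entropy balance can be checked numerically in the smallest case $n = 2$. The Python sketch below is an illustration under assumptions that are not in the text: $H = h\sigma_z$ and $V = c\sigma_x$ (Pauli matrices), $\sigma = \mathrm{diag}(p, 1-p)$ so that $\sigma$ is invariant under the free evolution, and Simpson's rule for the time integral. All function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Conjugate transpose of a 2 x 2 matrix."""
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]


def trace(a):
    return a[0][0] + a[1][1]


def evolve(X, t, h, c):
    """alpha^t(X) = U X U^* with U = exp(i t (h sigma_z + c sigma_x))."""
    w = math.hypot(h, c)
    cs, sn = math.cos(w * t), math.sin(w * t)
    # (h sigma_z + c sigma_x)^2 = w^2 * identity, so U = cos(wt) + i sin(wt) M / w.
    U = [[cs + 1j * sn * h / w, 1j * sn * c / w],
         [1j * sn * c / w, cs - 1j * sn * h / w]]
    return matmul(matmul(U, X), dagger(U))


def minus_relative_entropy(T, p, h, c):
    """-Ent(sigma o alpha^T | sigma) = tr(sigma log sigma) - tr(sigma alpha^T(log sigma))."""
    sigma = [[p, 0.0], [0.0, 1.0 - p]]
    log_sigma = [[math.log(p), 0.0], [0.0, math.log(1.0 - p)]]
    static = trace(matmul(sigma, log_sigma))
    moved = trace(matmul(sigma, evolve(log_sigma, T, h, c)))
    return (static - moved).real


def entropy_production_rate(t, p, h, c):
    """(sigma o alpha^t)(-delta_beta(V)) with delta_beta(V) = i[V, log sigma]."""
    sigma = [[p, 0.0], [0.0, 1.0 - p]]
    log_sigma = [[math.log(p), 0.0], [0.0, math.log(1.0 - p)]]
    V = [[0.0, c], [c, 0.0]]
    VL, LV = matmul(V, log_sigma), matmul(log_sigma, V)
    minus_delta = [[-1j * (VL[i][j] - LV[i][j]) for j in range(2)] for i in range(2)]
    return trace(matmul(sigma, evolve(minus_delta, t, h, c))).real


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * step)
    return total * step / 3.0


p, h, c, T = 0.8, 0.5, 0.3, 2.0
integral = simpson(lambda t: entropy_production_rate(t, p, h, c), 0.0, T)
print("integrated entropy production:", integral)
print("-Ent(sigma o alpha^T | sigma):", minus_relative_entropy(T, p, h, c))
```

The two printed numbers should agree up to quadrature error and be nonnegative. Because $H$ commutes with $\log\sigma$ in this example, only the perturbation $V$ contributes to the rate.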
The infinite dimensional situation. If $\mu$, $\nu$ are two faithful normal states on a von Neumann algebra $\mathcal{M}$ [in our case $\pi_\sigma(\mathcal{A})''$], Araki has introduced a relative entropy $\mathrm{Ent}(\mu|\nu)$ in terms of a relative modular operator associated with $\mu$, $\nu$. We must refer the reader to [1] Definition 6.2.29 for details. Using this definition, Jakšić and Pillet have worked out an infinite dimensional version of the finite dimensional calculation given above. They are able to prove the formula

$$\int_0^T(\sigma\circ\alpha^t)(-\delta_\beta(V))\,dt = -\mathrm{Ent}(\sigma\circ\alpha^T|\sigma) \ge 0$$

which can be interpreted as an entropy balance, and gives in the limit $\rho(-\delta_\beta(V)) \ge 0$ if $\rho$ is a NESS. The proof is fairly technical.

The approach of Jakšić and Pillet has the interest of great generality. In particular $\sigma$ can be an arbitrary KMS state. Also, instead of a spin lattice system one can consider fermions on a lattice. For a noninteracting fermion model, Jakšić and Pillet have announced a proof of strict positivity of the entropy production, as had been suggested above.

Appendix: The Algebras $\mathcal{A}_\lambda$

The purpose of this Appendix is to complete the proof of (4) by establishing (10) below. On the way to this result we introduce “partial traces” $\pi_\Lambda$, and algebras $\mathcal{A}_\lambda$ which are of interest in their own right.

For finite $\Lambda \subset L$, a map $\pi_\Lambda : \cup_X\mathcal{A}_X \to \mathcal{A}_\Lambda$ is defined by

$$\pi_\Lambda A = \lim_{Y\to L\setminus\Lambda}\frac{\mathrm{tr}_{\mathcal{H}_Y}A}{\dim\mathcal{H}_Y}.$$

If the $\varphi_i$ form an orthonormal basis of $\mathcal{H}_Y$, and $\psi', \psi'' \in \mathcal{H}_\Lambda$ we have

$$\Big(\psi', \frac{\mathrm{tr}_{\mathcal{H}_Y}A}{\dim\mathcal{H}_Y}\,\psi''\Big) = \frac{1}{\dim\mathcal{H}_Y}\sum_i(\varphi_i\otimes\psi', A\,\varphi_i\otimes\psi''),$$

hence $\|\pi_\Lambda A\| \le \|A\|$. The properties of the following lemma are then readily checked.

Lemma. The map $\pi_\Lambda$ extends to a unique linear norm-reducing map $\mathcal{A} \to \mathcal{A}_\Lambda$. Furthermore

$$\pi_\Lambda A = A \ \text{ if } A \in \mathcal{A}_\Lambda, \qquad \pi_\Lambda A^* = (\pi_\Lambda A)^*, \qquad \pi_\Lambda\pi_{\Lambda'} = \pi_{\Lambda'}\pi_\Lambda.$$

Choose now some $\lambda > 0$. For $A \in \mathcal{A}_\Lambda$, define

$$\|A\|_\lambda = \inf\Big\{\sum_{X\subset\Lambda}\|A_X\|\,e^{\lambda\,\mathrm{card}X} : \sum_X A_X = A\Big\}.$$
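By compactness we may replace the inf by min. If $\Lambda$ is replaced by a larger set $\Lambda'$, and $\sum_Y A_Y = A$ with $Y \subset \Lambda'$, we have

$$\sum_Y\|A_Y\|\,e^{\lambda\,\mathrm{card}Y} \ge \sum_{Y\subset\Lambda'}\|\pi_\Lambda A_Y\|\,e^{\lambda\,\mathrm{card}(Y\cap\Lambda)}$$

with $\sum_Y\pi_\Lambda A_Y = \pi_\Lambda A = A$. Therefore $\|A\|_\lambda$ does not depend on the choice of $\Lambda$ provided $A \in \mathcal{A}_\Lambda$. We have thus a norm $\|\cdot\|_\lambda$ on $\cup_X\mathcal{A}_X$, and we may define the Banach space $\mathcal{A}_\lambda$ by completion.

Proposition. The inclusion map $\cup_X\mathcal{A}_X \to \mathcal{A}$ extends to a norm-reducing map $\omega : \mathcal{A}_\lambda \to \mathcal{A}$ and $\omega$ is injective.

Proof. $\omega$ is norm-reducing because $\|A\| \le \|A\|_\lambda$ for $A \in \cup_X\mathcal{A}_X$. Note now that $\pi_\Lambda : \cup_X\mathcal{A}_X \to \mathcal{A}_\Lambda$ reduces the $\|\cdot\|_\lambda$-norm and extends thus to a linear norm-reducing map $\mathcal{A}_\lambda \to \mathcal{A}_{\Lambda\lambda}$, where $\mathcal{A}_{\Lambda\lambda}$ is $\mathcal{A}_\Lambda$ equipped with the $\|\cdot\|_\lambda$-norm. Assume that $A \in \mathcal{A}_\lambda$ with $\|A\|_\lambda = a > 0$. We may choose $\Lambda$ and $B \in \mathcal{A}_\Lambda$ such that $\|A - B\|_\lambda < a/3$, hence $\|B\|_\lambda > 2a/3$. Now $\omega A = 0$ would imply $\pi_\Lambda A = 0$, hence

$$\frac{2a}{3} < \|B\|_\lambda = \|\pi_\Lambda(B - A)\|_\lambda \le \|A - B\|_\lambda < \frac{a}{3}.$$

Therefore $\omega$ must be injective.

Corollary. $\mathcal{A}_\lambda$ is identified by $\omega$ to a dense *-subalgebra of $\mathcal{A}$; $\mathcal{A}_\lambda$ is then a Banach algebra with respect to the norm $\|\cdot\|_\lambda$. Taking $\lambda = 0$ we may define $\mathcal{A}_0 = \mathcal{A}$. With this definition, if $\lambda < \mu$ we have $\mathcal{A}_\lambda \supset \mathcal{A}_\mu$, and the map $\mathcal{A}_\mu \to \mathcal{A}_\lambda$ is norm-reducing.

Proof. If $A, B \in \mathcal{A}_\Lambda$ we may choose $A_X, B_X \in \mathcal{A}_X$ such that $A = \sum_{X\subset\Lambda}A_X$, $B = \sum_{X\subset\Lambda}B_X$, and

$$\|A\|_\lambda = \sum_{X\subset\Lambda}\|A_X\|\,e^{\lambda\,\mathrm{card}X}, \qquad \|B\|_\lambda = \sum_{X\subset\Lambda}\|B_X\|\,e^{\lambda\,\mathrm{card}X}.$$

Thus

$$\|AB\|_\lambda \le \sum_X\sum_Y\|A_XB_Y\|\,e^{\lambda\,\mathrm{card}(X\cup Y)} \le \sum_X\sum_Y\|A_X\|\cdot\|B_Y\|\,e^{\lambda(\mathrm{card}X+\mathrm{card}Y)} = \|A\|_\lambda\|B\|_\lambda.$$

Therefore if $A$, $B$ tend to limits $A_\infty$, $B_\infty$ in $\mathcal{A}_\lambda$, $AB$ tends in $\mathcal{A}_\lambda$ to $A_\infty B_\infty$ and $\|A_\infty B_\infty\|_\lambda \le \|A_\infty\|_\lambda\|B_\infty\|_\lambda$. The rest is clear.

If $\|\Phi\|_\lambda < \infty$ and $A_X \in \mathcal{A}_X$ the formula

$$\delta A_X = i\sum_{Y:\,Y\cap X\ne\emptyset}[\Phi(Y), A_X]$$

defines an element of $\mathcal{A}_\lambda$. If $\lambda > \mu \ge 0$, and $\|\Phi\|_\lambda < \infty$, one also checks that $\delta^m$ defines a map $\mathcal{A}_\lambda \to \mathcal{A}_\mu$ such that

$$\|\delta A\|_\mu \le 2(\lambda-\mu)^{-1}\|A\|_\lambda\|\Phi\|_\lambda, \qquad \|\delta^m A\|_\mu \le \|A\|_\lambda\,m!\,(2(\lambda-\mu)^{-1}\|\Phi\|_\lambda)^m. \tag{6}$$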
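(The proof of (6) is basically the same as that of the standard case $\mu = 0$.)

We turn now to the proof of (10) below. We have $\delta_\Lambda = \delta'_\Lambda + \delta''_\Lambda$, where

$$\delta'_\Lambda A = i[H_\Lambda, A], \qquad \delta''_\Lambda A = i\Big[\sum_{a>0}B_{a\Lambda},\,A\Big]$$

and (1) and (6) (for $m = 1$) yield

$$\|\delta A\|_\mu \le \|A\|_\lambda\cdot 2(\lambda-\mu)^{-1}\|\Phi\|_\lambda, \qquad \|\delta'_\Lambda A\|_\mu \le \|A\|_\lambda\cdot 2(\lambda-\mu)^{-1}\|\Phi\|_\lambda, \qquad \|\delta''_\Lambda A\|_\mu \le \|A\|_\lambda\cdot 2(\lambda-\mu)^{-1}K.$$

Given $\epsilon > 0$ and $A \in \mathcal{A}_\lambda$ we can find $X$ such that $A = A_1 + A_2$ with $A_1 \in \mathcal{A}_X$ and $\|A_2\|_\lambda < \epsilon$. Therefore

$$\|(\delta-\delta_\Lambda)A\|_\mu \le \|(\delta-\delta_\Lambda)A_1\|_\mu + \|\delta A_2\|_\mu + \|\delta'_\Lambda A_2\|_\mu + \|\delta''_\Lambda A_2\|_\mu \le \|(\delta-\delta_\Lambda)A_1\|_\mu + \epsilon\cdot 2(\lambda-\mu)^{-1}(2\|\Phi\|_\lambda + K). \tag{7}$$

Taking $\Lambda \supset \Lambda_X$ we also have $\delta''_\Lambda A_1 = 0$ by (2), and

$$(\delta-\delta_\Lambda)A_1 = i\sum_{Y:\,Y\not\subset\Lambda,\;Y\cap X\ne\emptyset}[\Phi(Y), A_1]$$

so that

$$\|(\delta-\delta_\Lambda)A_1\|_\mu \le \|A_1\|_\lambda\cdot 2(\lambda-\mu)^{-1}\|\Phi\|_{\Lambda X\lambda}, \tag{8}$$

where $\|\Phi\|_{\Lambda X\lambda} = \sup_{x\in X}\sum_{Y\ni x,\,Y\not\subset\Lambda}e^{(\mathrm{card}Y-1)\lambda}\|\Phi(Y)\|$. When $\Lambda \to L$ we have $\|\Phi\|_{\Lambda X\lambda} \to 0$ and (7), (8) yield

$$\lim_{\Lambda\to L}\|(\delta-\delta_\Lambda)A\|_\mu = 0. \tag{9}$$

We can now prove that, if $\|\Phi\|_\lambda < \infty$ and $A \in \mathcal{A}_\lambda$,

$$\lim_{\Lambda\to L}\|\delta^m A - \delta_\Lambda^m A\| = 0. \tag{10}$$

We have indeed

$$\delta^m A - \delta_\Lambda^m A = \sum_{k=0}^{m-1}\delta_\Lambda^{m-k-1}(\delta-\delta_\Lambda)\delta^k A$$

and, using (6),

$$\|\delta^k A\|_{2\lambda/3} \le \|A\|_\lambda\cdot k!\Big(\frac{6\|\Phi\|_\lambda}{\lambda}\Big)^k,$$

hence, by (9),

$$\lim_{\Lambda\to L}\|(\delta-\delta_\Lambda)\delta^k A\|_{\lambda/3} = 0$$

so that, using (6),

$$\|\delta_\Lambda^{m-k-1}(\delta-\delta_\Lambda)\delta^k A\| \le \|\delta_\Lambda^{m-k-1}(\delta-\delta_\Lambda)\delta^k A\|_0 \le \|(\delta-\delta_\Lambda)\delta^k A\|_{\lambda/3}\,(m-k-1)!\Big(\frac{6\|\Phi\|_\lambda}{\lambda}\Big)^{m-k-1},$$

which tends to zero when $\Lambda \to L$. This concludes the proof of (10).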
References

1. Bratteli, O. and Robinson, D.W.: Operator algebras and quantum statistical mechanics I, II. New York: Springer, 1979–1981 (2nd ed. 1987–1997)
2. Haag, R., Hugenholtz, N.M. and Winnink, M.: On the equilibrium states in quantum statistical mechanics. Commun. Math. Phys. 5, 215–236 (1967)
3. Ruelle, D.: Statistical mechanics. Rigorous results. New York: Benjamin, 1969
4. Ruelle, D.: Natural nonequilibrium states in quantum statistical mechanics. J. Statist. Phys. 98, 57–75 (2000)

Communicated by H. Spohn
Commun. Math. Phys. 224, 17 – 31 (2001)
A Rigorous Derivation of the Gross–Pitaevskii Energy Functional for a Two-dimensional Bose Gas

Elliott H. Lieb¹, Robert Seiringer², Jakob Yngvason²

¹ Departments of Physics and Mathematics, Jadwin Hall, Princeton University, P. O. Box 708, Princeton, NJ 08544, USA
² Institut für Theoretische Physik, Universität Wien, Boltzmanngasse 5, 1090 Vienna, Austria

Received: 3 May 2000 / Accepted: 23 October 2000
Dedicated to Joel Lebowitz on the occasion of his 70th birthday

© 2000 by the authors. Reproduction of this work, in its entirety, by any means, is permitted for noncommercial purposes.

Abstract: We consider the ground state properties of an inhomogeneous two-dimensional Bose gas with a repulsive, short range pair interaction and an external confining potential. In the limit when the particle number $N$ is large but $\bar\rho a^2$ is small, where $\bar\rho$ is the average particle density and $a$ the scattering length, the ground state energy and density are rigorously shown to be given to leading order by a Gross–Pitaevskii (GP) energy functional with a coupling constant $g \sim 1/|\ln(\bar\rho a^2)|$. In contrast to the 3D case the coupling constant depends on $N$ through the mean density. The GP energy per particle depends only on $Ng$. In 2D this parameter is typically so large that the gradient term in the GP energy functional is negligible and the simpler description by a Thomas–Fermi type functional is adequate.

1. Introduction

Motivated by recent experimental realizations of Bose–Einstein condensation, the theory of dilute, inhomogeneous Bose gases is currently a subject of intensive studies. Most of this work is based on the assumption that the ground state properties are well described by the Gross–Pitaevskii (GP) energy functional (see the review article [1]). A rigorous derivation of this functional from the basic many-body Hamiltonian in an appropriate limit is not a simple matter, however, and has only been achieved recently for bosons with a short range, repulsive interaction in three spatial dimensions [2]. The present paper is concerned with the justification of the GP functional in two spatial dimensions.

Several new issues arise. One is the form of the nonlinear interaction term in the energy functional for the GP wave function $\Phi$. In three dimensions this term is $4\pi a\int|\Phi|^4$, where $a$ is the scattering length of the interaction potential. The rationale is the well known formula for the energy density of a homogeneous Bose gas, which, for dilute gases with particle density $\rho$, is $4\pi a\rho^2$. This fact has been ‘known’ since the early 50’s but a rigorous proof is fairly recent [3]. In two dimensions the corresponding formula is $4\pi\rho^2|\ln(\rho a^2)|^{-1}$ as proved in [4] by extension of the method of [3]. The formula was first stated by Schick [5]; other early references to this formula are [6–10]. It would seem natural to consider $4\pi\int|\Phi|^4|\ln(|\Phi|^2a^2)|^{-1}$ as the interaction term in the GP functional, and this has indeed been suggested in [11, 12]. Such a term, however, is unnecessarily complicated for the purpose of leading order calculations. In fact, since the logarithm varies only slowly it turns out that one can use the same form as in the three dimensional case, but with an appropriate dimensionless coupling constant $g$ replacing the scattering length, and still retain an exact theory (to leading order in $\rho$).

It is often assumed that a justification of the GP functional depends on the existence of Bose–Einstein condensation. Several remarks can be made about this:

1. We neither assume nor prove the existence of BE condensation, but we do demonstrate a kind of condensation over a distance that is fixed (i.e., non-thermodynamic) but whose length goes to infinity as the density goes to zero;
2. BE condensation does not exist in two dimensions when the temperature is positive, but it can, and most likely does, exist in the ground state;
3. In any event, when the density is low and the temperature is zero it appears to be likely that the system can be described for many purposes in terms of only a few macroscopic order parameters such as the density and phase; at least this is true for the dependence of the ground state energy and density upon an external potential.

The functional we shall consider is

$$\mathcal{E}^{\rm GP}[\Phi] = \int\big(|\nabla\Phi(x)|^2 + V(x)|\Phi(x)|^2 + 4\pi g|\Phi(x)|^4\big)\,d^2x, \tag{1.1}$$
where $V$ is the external confining potential and all integrals are over $\mathbb{R}^2$. The choice of $g$ is an issue on which there has not been unanimous opinion in the recent papers [12–18] on this subject. We shall prove that a right choice is $g = |\ln(\bar\rho a^2)|^{-1}$, where $\bar\rho$ is a mean density that will be defined more precisely below. This mean density depends on the particle number $N$, which implies that the scaling properties of the GP functional are quite different in two and three dimensions. In the three-dimensional case the natural parameter is $Na/a_{\rm osc}$, with $a_{\rm osc}$ being the length scale defined by the external confining potential. If $a/a_{\rm osc}$ is scaled like $1/N$ as $N \to \infty$ this parameter is fixed and the gradient term $\int|\nabla\Phi|^2$ in the GP functional is of the same order as the other terms. In two dimensions the corresponding parameter is $N|\ln(\bar\rho a^2)|^{-1}$. For a quadratic external potential $\bar\rho$ behaves like $N^{1/2}/a_{\rm osc}^2$ and hence the parameter can only be kept fixed if $a/a_{\rm osc}$ decreases exponentially with $N$. A slower decrease means that the parameter tends to infinity. This corresponds to the so-called Thomas–Fermi (TF) limit where the gradient term has been dropped altogether and the functional is

$$\mathcal{E}^{\rm TF}[\rho] = \int\big(V(x)\rho(x) + 4\pi g\rho(x)^2\big)\,d^2x, \tag{1.2}$$

defined for nonnegative functions $\rho$. Our main result, stated in Theorems 1.3 and 1.4 below, is that minimization of (1.2) reproduces correctly the ground state energy and density of the many-body Hamiltonian in the limit when $N \to \infty$, $\bar\rho a^2 \to 0$, but $N|\ln(\bar\rho a^2)|^{-1} \to \infty$. Only in the exceptional situation that $N|\ln(\bar\rho a^2)|^{-1}$ stays bounded is there need for the full GP functional (1.1), cf. Theorems 1.1 and 1.2.
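The scaling of the two-dimensional parameter $N|\ln(\bar\rho a^2)|^{-1}$ can be made concrete with a few numbers. The Python sketch below is only illustrative. It assumes the quadratic-potential behaviour $\bar\rho = N^{1/2}/a_{\rm osc}^2$ with proportionality constant 1 (the text only says "behaves like", and the precise definition of $\bar\rho$ is not used here), and it measures lengths in units of $a_{\rm osc}$. The function names are hypothetical.

```python
import math


def coupling_constant(rho_bar, a):
    """g = 1/|ln(rho_bar * a^2)|, meaningful in the dilute regime rho_bar * a^2 << 1."""
    return 1.0 / abs(math.log(rho_bar * a * a))


def gp_parameter(N, a_over_aosc):
    """N * g, assuming rho_bar = N^(1/2) / a_osc^2 (prefactor 1 is an assumption)."""
    rho_bar = math.sqrt(N)  # in units of a_osc^-2
    return N * coupling_constant(rho_bar, a_over_aosc)


def scattering_length_for_fixed_parameter(N, target):
    """a/a_osc that keeps N * g equal to `target` under the same assumption."""
    # N g = target  <=>  |ln(N^(1/2) a^2)| = N / target
    return N ** -0.25 * math.exp(-N / (2.0 * target))


# With a/a_osc held fixed, N * g grows with N (Thomas-Fermi regime).
for N in (10**2, 10**4, 10**6):
    print(N, gp_parameter(N, 1e-3))

# Keeping N * g fixed forces a/a_osc to shrink exponentially in N.
for N in (50, 200, 800):
    print(N, scattering_length_for_fixed_parameter(N, 5.0))
```

Under these assumptions the second loop shows how quickly $a/a_{\rm osc}$ must decrease to keep the parameter bounded, which is the sense in which the full GP functional is needed only in an exceptional situation.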
Derivation of Gross–Pitaevskii Energy Functional for 2D Bose Gas
We shall now describe the setting more precisely. The starting point is the Hamiltonian for $N$ identical bosons in an external potential $V$ and with pair interaction $v$,
$$H^{(N)} = \sum_{i=1}^{N}\left(-\nabla_i^2 + V(x_i)\right) + \sum_{i<j}v(x_i - x_j), \tag{1.3}$$
acting on the totally symmetric wave functions in $\otimes^N L^2(\mathbb{R}^2)$. Units have been chosen so that $\hbar = 2m = 1$, where $m$ is the particle mass. We assume that $v$ is nonnegative and spherically symmetric with a finite scattering length $a$. (For the definition of scattering length in two dimensions see the appendix.) The external potential should be continuous and tend to $\infty$ as $|x|\to\infty$. It is then possible and convenient to shift the energy scale so that $\min_x V(x) = 0$. For the TF limit theorem we shall require some additional properties of $V$ to be specified later.

The ground state energy $\omega$ of the one-particle operator $-\nabla^2 + V$ is a natural energy unit and gives rise to the length unit $a_{\rm osc}\equiv\omega^{-1/2}$. In the sequel we shall be considering a limit where $a/a_{\rm osc}$ tends to zero while $N\to\infty$. Experimentally $a/a_{\rm osc}$ can be changed in two ways: one can vary either $a_{\rm osc}$ or $a$. The first alternative is usually simpler in practice but very recently a direct tuning of the scattering length itself has also been shown to be feasible [19]. Mathematically, both alternatives are equivalent, of course. The first corresponds to writing $V(x) = a_{\rm osc}^{-2}\hat V(x/a_{\rm osc})$ and keeping $\hat V$ and $v$ fixed. The second corresponds to writing the interaction potential as $v(x) = a^{-2}\hat v(x/a)$, where $\hat v$ has unit scattering length, and keeping $V$ and $\hat v$ fixed. This is equivalent to the first, since for given $\hat V$ and $\hat v$ the ground state energy of (1.3), measured in units of $\omega$, depends only on $N$ and $a/a_{\rm osc}$. In the dilute limit when $a$ is much smaller than the mean particle distance, the energy becomes independent of $\hat v$.

We shall measure all energies in terms of $\omega$ and lengths in terms of $a_{\rm osc}$ and regard $\hat V$ and $\hat v$ as fixed. The notation $E^{\rm QM}(N,a)$ for the ground state energy of (1.3) is then justified. The quantum mechanical particle density is defined by
$$\rho^{\rm QM}_{N,a}(x) = N\int|\Psi^{(N)}(x,x_2,\dots,x_N)|^2\,d^2x_2\cdots d^2x_N, \tag{1.4}$$
where $\Psi^{(N)}$ is a ground state for (1.3).
The GP functional (1.1) has an obvious domain of definition (cf. Eq. (2.1) in [2]). The infimum of $\mathcal{E}^{\rm GP}[\Phi]$ under the condition $\int|\Phi|^2 = N$ will be denoted by $E^{\rm GP}(N,g)$. The infimum is obtained for a unique, positive function, denoted $\Phi^{\rm GP}_{N,g}$, and the GP density is defined as $\rho^{\rm GP}_{N,g}(x) = \Phi^{\rm GP}_{N,g}(x)^2$.

The ground state energy of the TF functional (1.2) with the subsidiary condition $\int\rho = N$ is denoted $E^{\rm TF}(N,g)$. The corresponding minimizer can be written explicitly; it is
$$\rho^{\rm TF}_{N,g}(x) = \frac{1}{8\pi g}\left[\mu^{\rm TF} - V(x)\right]_+, \tag{1.5}$$
where $[t]_+\equiv\max\{t,0\}$ and $\mu^{\rm TF}$ is chosen so that the normalization condition $\int\rho^{\rm TF}_{N,g} = N$ holds.
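The normalization condition that fixes $\mu^{\rm TF}$ in (1.5) can be illustrated numerically. The following is a minimal sketch, assuming the model potential $V(x) = |x|^s$ (the text allows far more general $V$); the function names are hypothetical and chosen only for this illustration. It finds $\mu^{\rm TF}$ by bisection and compares it with the closed form that follows from integrating (1.5) in polar coordinates.

```python
import math


def tf_mass(mu, g, s, n_grid=4000):
    """Integral over R^2 of [mu - |x|^s]_+ / (8 pi g), by a radial midpoint rule."""
    if mu <= 0.0:
        return 0.0
    r_max = mu ** (1.0 / s)  # the TF density (1.5) vanishes for |x| > r_max
    h = r_max / n_grid
    total = 0.0
    for k in range(n_grid):
        r = (k + 0.5) * h
        total += (mu - r ** s) * 2.0 * math.pi * r * h
    return total / (8.0 * math.pi * g)


def tf_chemical_potential(n_particles, g, s, tol=1e-10):
    """Solve tf_mass(mu) = N by bisection; the mass is increasing in mu."""
    lo, hi = 0.0, 1.0
    while tf_mass(hi, g, s) < n_particles:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if tf_mass(mid, g, s) < n_particles:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def tf_chemical_potential_closed_form(n_particles, g, s):
    """For V(x) = |x|^s (assumed): mass = s * mu^((s+2)/s) / (8 g (s+2))."""
    return (8.0 * g * (s + 2.0) * n_particles / s) ** (s / (s + 2.0))


if __name__ == "__main__":
    print(tf_chemical_potential(100.0, 1.0, 2.0))
    print(tf_chemical_potential_closed_form(100.0, 1.0, 2.0))
```

For the quadratic case $s = 2$ the closed form reduces to $\mu^{\rm TF} = 4\sqrt{gN}$, which the bisection result should reproduce up to the quadrature error.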
E. H. Lieb, R. Seiringer, J. Yngvason
We now define the mean density $\bar\rho$ as the average of the TF density $\rho^{\rm TF}_{N,1}$ at coupling constant $g = 1$, weighted with $N^{-1}\rho^{\rm TF}_{N,1}$, i.e.,
$$\bar\rho = \frac{1}{N}\int\rho^{\rm TF}_{N,1}(x)^2\,d^2x. \tag{1.6}$$
It is clear that $\bar\rho$ depends on $N$ and when we wish to emphasize this we write $\bar\rho_N$. The definition (1.6) has the advantage that $\bar\rho$ is easily computed; for instance, if $V(x)\sim|x|^s$ for some $s>0$, then $\bar\rho_N\sim N^{s/(s+2)}$. It may appear more natural to define $\bar\rho$ self-consistently as $\bar\rho = \frac{1}{N}\int\rho^{\rm TF}_{N,g}(x)^2\,d^2x$ with $g = |\ln(\bar\rho a^2)|^{-1}$, which amounts to solving a nonlinear equation for $\bar\rho$. Also, the TF density could be replaced by the GP density. However, since $\bar\rho$ will only appear under a logarithm such sophisticated definitions are not needed for the leading order result we are after. The simple formula (1.6) is adequate for our purpose, but it should be kept in mind that the self-consistent definition may be relevant in computations beyond the leading order.

With this notation we can now state the two dimensional analogue of Theorem I.1 in [2].

Theorem 1.1 (GP limit for the energy). If, for $N\to\infty$, $a^2\bar\rho_N\to 0$ with $N/|\ln(a^2\bar\rho_N)|$ fixed, then
$$\lim_{N\to\infty}\frac{E^{\rm QM}(N,a)}{E^{\rm GP}(N,1/|\ln(a^2\bar\rho_N)|)} = 1. \tag{1.7}$$
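To make the hypothesis of Theorem 1.1 concrete, here is a small sketch for the harmonic model potential $V(x) = |x|^2$, which is an assumption made only for this example. In that case (1.5) and (1.6) give $\mu^{\rm TF} = 4\sqrt{N}$ and $\bar\rho_N = \sqrt{N}/(3\pi)$ at $g = 1$. The helper names are hypothetical. The last function shows that keeping $\gamma = N/|\ln(a^2\bar\rho_N)|$ fixed forces $a^2\bar\rho_N = e^{-N/\gamma}$, i.e. an exponentially small scattering length.

```python
import math


def mean_density_harmonic(n_particles):
    """Mean TF density (1.6) for V(x) = |x|^2 at g = 1: rho_bar = sqrt(N) / (3 pi)."""
    return math.sqrt(n_particles) / (3.0 * math.pi)


def coupling(a, n_particles):
    """g = 1 / |ln(a^2 rho_bar_N)|, meaningful only while a^2 rho_bar_N < 1."""
    x = a * a * mean_density_harmonic(n_particles)
    if not 0.0 < x < 1.0:
        raise ValueError("need 0 < a^2 * rho_bar < 1")
    return 1.0 / abs(math.log(x))


def scattering_length_for_fixed_gamma(gamma, n_particles):
    """Scattering length that keeps gamma = N g fixed: a^2 rho_bar = exp(-N / gamma)."""
    rho_bar = mean_density_harmonic(n_particles)
    return math.sqrt(math.exp(-n_particles / gamma) / rho_bar)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        a = scattering_length_for_fixed_gamma(5.0, n)
        # columns: N, required scattering length, recovered gamma = N * g
        print(n, a, n * coupling(a, n))
```

The rapid decay of the second printed column with $N$ is the reason the TF regime $N/|\ln(a^2\bar\rho_N)|\to\infty$ is the physically more relevant one.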
The corresponding theorem for the density, cf. Theorem I.2 in [2], is

Theorem 1.2 (GP limit for the density). If, for $N\to\infty$, $a^2\bar\rho_N\to 0$ with $\gamma\equiv N/|\ln(a^2\bar\rho_N)|$ fixed, then
$$\lim_{N\to\infty}\frac{1}{N}\rho^{\rm QM}_{N,a}(x) = \rho^{\rm GP}_{1,\gamma}(x) \tag{1.8}$$
in the sense of weak convergence in $L^1(\mathbb{R}^2)$.

These theorems, however, are not particularly useful in the two dimensional case, because the hypothesis that $N/|\ln(a^2\bar\rho_N)|$ stays bounded requires an exponential decrease of $a$ with $N$. As remarked above, the TF limit, where $N/|\ln(a^2\bar\rho_N)|\to\infty$, is much more relevant. Our treatment of this limit requires that $V$ is asymptotically homogeneous and sufficiently regular in a sense made precise below. This condition can be relaxed, but it seems adequate for most practical applications and simplifies things considerably.

Definition 1.1. We say that $V$ is asymptotically homogeneous of order $s>0$ if there is a function $W$ with $W(x)\neq 0$ for $x\neq 0$ such that
$$\frac{\lambda^{-s}V(\lambda x) - W(x)}{1 + |W(x)|}\to 0 \quad\text{as } \lambda\to\infty \tag{1.9}$$
and the convergence is uniform in $x$. The function $W$ is clearly uniquely determined and homogeneous of order $s$, i.e., $W(\lambda x) = \lambda^s W(x)$ for all $\lambda\geq 0$.
Theorem 1.3 (TF limit for the energy). Suppose $V$ is asymptotically homogeneous of order $s>0$ and its scaling limit $W$ is locally Hölder continuous, i.e., $|W(x) - W(y)|\leq(\text{const.})|x-y|^{\alpha}$ for $|x|,|y| = 1$ for some fixed $\alpha>0$. If, for $N\to\infty$, $a^2\bar\rho_N\to 0$ but $N/|\ln(a^2\bar\rho_N)|\to\infty$, then
$$\lim_{N\to\infty}\frac{E^{\rm QM}(N,a)}{E^{\rm TF}(N,1/|\ln(a^2\bar\rho_N)|)} = 1. \tag{1.10}$$

To state the corresponding theorem for the density we need the minimizer of (1.2) with $g = 1$, $V$ replaced by $W$, and normalization $\int\rho = 1$. We shall denote this minimizer by $\tilde\rho^{\rm TF}_{1,1}$; an explicit formula is
$$\tilde\rho^{\rm TF}_{1,1}(x) = \frac{1}{8\pi}\left[\tilde\mu^{\rm TF} - W(x)\right]_+, \tag{1.11}$$
where $\tilde\mu^{\rm TF}$ is determined by the normalization condition.

Theorem 1.4 (TF limit for the density). Let $V$ satisfy the same hypothesis as in Theorem 1.3. If, for $N\to\infty$, $a^2\bar\rho_N\to 0$ but $\gamma = N/|\ln(a^2\bar\rho_N)|\to\infty$, then
$$\lim_{N\to\infty}\frac{\gamma^{2/(s+2)}}{N}\rho^{\rm QM}_{N,a}(\gamma^{1/(s+2)}x) = \tilde\rho^{\rm TF}_{1,1}(x) \tag{1.12}$$
in the sense of weak convergence in $L^1(\mathbb{R}^2)$.

Remark 1.1. For large $N$, $\bar\rho_N$ behaves like $(\text{const.})N^{s/(s+2)}$. Moreover, prefactors are unimportant in the limit $N\to\infty$, because $\bar\rho_N$ stands under a logarithm. Hence Theorems 1.3 and 1.4 could also be stated with $N^{s/(s+2)}$ in place of $\bar\rho_N$.

The proofs of these theorems follow from upper and lower bounds on the ground state energy $E^{\rm QM}(N,a)$ that are derived in Sects. 3 and 4. For these bounds some properties of the minimizers of the functionals (1.1) and (1.2), discussed in the following section, are needed.

2. GP and TF Theory

In this section we consider the functionals (1.1) and (1.2) with an arbitrary positive coupling constant $g$. Existence and uniqueness of minimizers is shown in the same way as in Theorem II.1 in [2]. The GP energy $E^{\rm GP}(N,g)$ has the simple scaling property $E^{\rm GP}(N,g) = NE^{\rm GP}(1,Ng)$. Likewise,
$$N^{-1/2}\Phi^{\rm GP}_{N,g}\equiv\phi^{\rm GP}_{\gamma}\quad\text{depends only on}\quad\gamma\equiv Ng \tag{2.1}$$
and satisfies the normalization condition $\int|\phi^{\rm GP}_{\gamma}|^2 = 1$. The variational equation (GP equation) for the GP minimization problem, written in terms of $\phi^{\rm GP}_{\gamma}$, is
$$-\Delta\phi^{\rm GP}_{\gamma} + V\phi^{\rm GP}_{\gamma} + 8\pi\gamma(\phi^{\rm GP}_{\gamma})^3 = \mu^{\rm GP}(\gamma)\phi^{\rm GP}_{\gamma}, \tag{2.2}$$
where the Lagrange multiplier (chemical potential) $\mu^{\rm GP}(\gamma)$ is determined by the subsidiary normalization condition. Multiplying (2.2) with $\phi^{\rm GP}_{\gamma}$ and integrating we obtain
$$\mu^{\rm GP}(\gamma) = E^{\rm GP}(1,\gamma) + 4\pi\gamma\int\phi^{\rm GP}_{\gamma}(x)^4\,d^2x. \tag{2.3}$$
For the upper bound on the quantum mechanical energy in the next section we shall need a bound on the absolute value of the minimizer $\phi^{\rm GP}_{\gamma}$.
Lemma 2.1 (Upper bound for the GP minimizer).
$$\|\phi^{\rm GP}_{\gamma}\|_{\infty}^2\leq\frac{\mu^{\rm GP}(\gamma)}{8\pi\gamma}. \tag{2.4}$$
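Proof. $\phi^{\rm GP}_{\gamma}$ is a continuous and positive function that satisfies the variational equation
$$-\Delta\phi^{\rm GP}_{\gamma} + U\phi^{\rm GP}_{\gamma} = \mu^{\rm GP}\phi^{\rm GP}_{\gamma} \tag{2.5}$$
with $U = V + 8\pi\gamma(\phi^{\rm GP}_{\gamma})^2$. Let $B = \{x \mid \phi^{\rm GP}_{\gamma}(x)^2>\mu^{\rm GP}/(8\pi\gamma)\}$. Since $V\geq 0$ we see that $-\Delta\phi^{\rm GP}_{\gamma}\leq 0$ on $B$, i.e., $\phi^{\rm GP}_{\gamma}$ is subharmonic on $B$. Hence $\phi^{\rm GP}_{\gamma}$ achieves its maximum on the boundary of $B$, where $\phi^{\rm GP}_{\gamma}(x)^2 = \mu^{\rm GP}/(8\pi\gamma)$, so $B$ is empty.

The ground state energy $E^{\rm TF}(N,g)$ of the TF functional (1.2) scales in the same way as $E^{\rm GP}(N,g)$, i.e., $E^{\rm TF}(N,g) = NE^{\rm TF}(1,Ng)$, and the corresponding minimizer $\rho^{\rm TF}_{N,g}$ is equal to $N\rho^{\rm TF}_{1,Ng}$. For short, we shall denote $\rho^{\rm TF}_{1,\gamma}$ by $\rho^{\rm TF}_{\gamma}$. By (1.5) we have
$$\rho^{\rm TF}_{\gamma}(x) = \frac{1}{8\pi\gamma}\left[\mu^{\rm TF}(\gamma) - V(x)\right]_+, \tag{2.6}$$
with the chemical potential $\mu^{\rm TF}(\gamma)$ determined by the normalization condition $\int\rho^{\rm TF}_{\gamma} = 1$. In the same way as in (2.3) we have
$$\mu^{\rm TF}(\gamma) = E^{\rm TF}(1,\gamma) + 4\pi\gamma\int\rho^{\rm TF}_{\gamma}(x)^2\,d^2x. \tag{2.7}$$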
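The chemical potential can also be computed from a variational principle:

Lemma 2.2 (Variational principle for $\mu^{\rm TF}$).
$$\mu^{\rm TF}(\gamma) = \inf_{\rho\geq 0,\ \int\rho = 1}\left(\int V\rho + 8\pi\gamma\|\rho\|_{\infty}\right). \tag{2.8}$$

Proof. Obviously, the infimum is achieved for a multiple of a characteristic function for some measurable set $R\subset\mathbb{R}^2$. If $|R|$ denotes the Lebesgue measure of $R$, then
$$\inf_{\int\rho = 1}\left(\int V\rho + 8\pi\gamma\|\rho\|_{\infty}\right) = \inf_{R}\frac{1}{|R|}\left(\int_R V + 8\pi\gamma\right) \tag{2.9}$$
$$= \inf_{R}\frac{1}{|R|}\left(\int_R\left(V - \mu^{\rm TF}(\gamma)\right) + 8\pi\gamma + \mu^{\rm TF}(\gamma)|R|\right). \tag{2.10}$$
Now $\int_R(V - \mu^{\rm TF}(\gamma))\geq -8\pi\gamma$, with equality for
$$\{x \mid V(x)<\mu^{\rm TF}(\gamma)\}\subseteq R\subseteq\{x \mid V(x)\leq\mu^{\rm TF}(\gamma)\}. \tag{2.11}$$

Corollary 2.1 (Properties of $\mu^{\rm TF}(\gamma)$). $\mu^{\rm TF}(\gamma)$ is a concave and monotonically increasing function of $\gamma$ with $\mu^{\rm TF}(0) = 0$. Hence $\mu^{\rm TF}(\gamma)/\gamma$ is decreasing in $\gamma$. Moreover, $\mu^{\rm TF}(\gamma)\to\infty$ and $\mu^{\rm TF}(\gamma)/\gamma\to 0$ as $\gamma\to\infty$.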
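Proof. Immediate consequences of Lemma 2.2, using that $\min_x V(x) = 0$ and $\lim_{|x|\to\infty}V(x) = \infty$.

Note that since $E^{\rm TF}(1,\gamma)\geq\frac{1}{2}\mu^{\rm TF}(\gamma)$ we also see that $E^{\rm TF}(1,\gamma)\to\infty$ with $\gamma$. In this limit the GP energy converges to the TF energy, provided the external potential satisfies a mild regularity and growth condition:

Lemma 2.3 (TF limit of the GP energy). Suppose for some constants $\alpha>0$, $L_1$ and $L_2$,
$$|V(x) - V(y)|\leq L_1|x-y|^{\alpha}e^{L_2|x-y|}(1 + V(x)). \tag{2.12}$$
Then
$$\lim_{\gamma\to\infty}\frac{E^{\rm GP}(1,\gamma)}{E^{\rm TF}(1,\gamma)} = 1. \tag{2.13}$$

Proof. It is clear that $E^{\rm TF}(1,\gamma)\leq E^{\rm GP}(1,\gamma)$. For the other direction, we use $(j_{\varepsilon}*\rho^{\rm TF}_{\gamma})^{1/2}$ as a test function for $\mathcal{E}^{\rm GP}$, where
$$j_{\varepsilon}(x) = \frac{1}{2\pi\varepsilon^2}\exp\left(-\frac{1}{\varepsilon}|x|\right). \tag{2.14}$$
Note that $\int j_{\varepsilon} = 1$ and $|\nabla j_{\varepsilon}| = \varepsilon^{-1}j_{\varepsilon}$. Therefore
$$E^{\rm GP}(1,\gamma)\leq\int\left(\frac{|\nabla j_{\varepsilon}*\rho^{\rm TF}_{\gamma}|^2}{4\,j_{\varepsilon}*\rho^{\rm TF}_{\gamma}} + V\,(j_{\varepsilon}*\rho^{\rm TF}_{\gamma}) + 4\pi\gamma(j_{\varepsilon}*\rho^{\rm TF}_{\gamma})^2\right)\leq\frac{1}{4\varepsilon^2} + \int(j_{\varepsilon}*V)\rho^{\rm TF}_{\gamma} + 4\pi\gamma\int(\rho^{\rm TF}_{\gamma})^2, \tag{2.15}$$
where we have used convexity for the last term. Moreover,
$$\int(j_{\varepsilon}*V - V)\rho^{\rm TF}_{\gamma} = \int d^2x\,d^2y\,j_{\varepsilon}(x-y)\left(V(x) - V(y)\right)\rho^{\rm TF}_{\gamma}(x)$$
$$\leq\frac{L_1}{2\pi\varepsilon^2}\int d^2x\,d^2y\,|x-y|^{\alpha}e^{(-\varepsilon^{-1}+L_2)|x-y|}(1 + V(x))\rho^{\rm TF}_{\gamma}(x)\leq(\text{const.})\,\varepsilon^{\alpha}\left(1 + E^{\rm TF}(1,\gamma)\right), \tag{2.16}$$
as long as $\varepsilon<L_2^{-1}$. So we have
$$E^{\rm GP}(1,\gamma)\leq\left(1 + (\text{const.})\,\varepsilon^{\alpha}\right)E^{\rm TF}(1,\gamma) + \frac{1}{4\varepsilon^2} + (\text{const.})\,\varepsilon^{\alpha}. \tag{2.17}$$
Optimizing over $\varepsilon$ gives as a final result
$$E^{\rm GP}(1,\gamma)\leq E^{\rm TF}(1,\gamma)\left(1 + (\text{const.})E^{\rm TF}(1,\gamma)^{-\alpha/(\alpha+2)}\right). \tag{2.18}$$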
Condition (2.12) is in particular fulfilled if $V$ is homogeneous of some order $s>0$ and locally Hölder continuous. In this case,
$$E^{\rm TF}(1,\gamma) = \gamma^{s/(s+2)}E^{\rm TF}(1,1) \tag{2.19}$$
and
$$\gamma^{2/(s+2)}\rho^{\rm TF}_{\gamma}(\gamma^{1/(s+2)}x) = \rho^{\rm TF}_{1,1}(x). \tag{2.20}$$
By (2.7) we also have
$$\mu^{\rm TF}(\gamma) = \gamma^{s/(s+2)}\mu^{\rm TF}(1). \tag{2.21}$$
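The scaling relations (2.19) and (2.21) and the identity (2.7) can be checked numerically for an exactly homogeneous potential. The sketch below assumes $V(x) = |x|^s$ and uses a hypothetical helper `tf_quantities`; it evaluates the two terms of the TF functional (1.2) for the minimizer (2.6) with a radial midpoint rule. It is an illustration under these assumptions, not part of the proof.

```python
import math


def tf_quantities(gamma, s, n_grid=20000):
    """Return (mu, energy) for the TF functional (1.2) with N = 1, g = gamma, V(x) = |x|^s."""
    # Normalizing rho = [mu - |x|^s]_+ / (8 pi gamma) to 1 gives mu in closed form.
    mu = (8.0 * gamma * (s + 2.0) / s) ** (s / (s + 2.0))
    r_max = mu ** (1.0 / s)
    h = r_max / n_grid
    potential_term = 0.0  # integral of V * rho
    square_term = 0.0     # integral of rho^2
    for k in range(n_grid):
        r = (k + 0.5) * h
        rho = (mu - r ** s) / (8.0 * math.pi * gamma)
        weight = 2.0 * math.pi * r * h
        potential_term += r ** s * rho * weight
        square_term += rho * rho * weight
    energy = potential_term + 4.0 * math.pi * gamma * square_term
    return mu, energy


if __name__ == "__main__":
    s = 2.0
    mu1, e1 = tf_quantities(1.0, s)
    for gamma in (4.0, 16.0, 64.0):
        mu, e = tf_quantities(gamma, s)
        # both ratios should be close to gamma^(s/(s+2))
        print(gamma, e / e1, mu / mu1, gamma ** (s / (s + 2.0)))
```

For $s = 2$ one finds by hand $\mu^{\rm TF}(\gamma) = 4\sqrt{\gamma}$ and $E^{\rm TF}(1,\gamma) = \frac{8}{3}\sqrt{\gamma}$, consistent with (2.7) and with the bound $E^{\rm TF}(1,\gamma)\geq\frac{1}{2}\mu^{\rm TF}(\gamma)$ used in the text.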
If $V$ is asymptotically homogeneous with a locally Hölder continuous limiting function $W$, we can prove corresponding formulas for the limit $\gamma\to\infty$. This is the content of the next theorem, where we have included results on the GP $\to$ TF limit as well:

Theorem 2.1 (Scaling limits). Suppose $V$ satisfies the condition of Theorem 1.3. Let $\tilde E^{\rm TF}(1,1)$ be the minimum of the TF functional (1.2) with $g = 1$ and $N = 1$ and $V$ replaced by $W$, and let $\tilde\rho^{\rm TF}_{1,1}$ be the corresponding minimizer. Then

(i) $\lim_{\gamma\to\infty}E^{\rm GP}(1,\gamma)/\gamma^{s/(s+2)} = \lim_{\gamma\to\infty}E^{\rm TF}(1,\gamma)/\gamma^{s/(s+2)} = \tilde E^{\rm TF}(1,1)$.
(ii) $\lim_{\gamma\to\infty}\gamma^{2/(s+2)}\rho^{\rm GP}_{1,\gamma}(\gamma^{1/(s+2)}x) = \tilde\rho^{\rm TF}_{1,1}(x)$, strongly in $L^2(\mathbb{R}^2)$.
(iii) $\lim_{\gamma\to\infty}\gamma^{2/(s+2)}\rho^{\rm TF}_{\gamma}(\gamma^{1/(s+2)}x) = \tilde\rho^{\rm TF}_{1,1}(x)$, uniformly in $x$.

Proof. With the demanded properties of $V$, (2.13) holds. Using this and (1.9) one easily verifies (i). Moreover, $\gamma^{2/(s+2)}\rho^{\rm GP}_{1,\gamma}(\gamma^{1/(s+2)}x)$ is a minimizing sequence for the functional in question, so we can conclude as in Theorem II.2 in [2] that it converges to $\tilde\rho^{\rm TF}_{1,1}(x)$ strongly in $L^2$, proving (ii). (Remark: In Eq. (2.10) in [2] there is a misprint, one should have $\tilde\rho^{\rm GP}_{1,Na}$ instead of $\rho^{\rm GP}_{1,Na}$ on the left side.) To see (iii) let us define
$$\hat\rho_{\gamma}(x) = \gamma^{2/(s+2)}\rho^{\rm TF}_{\gamma}(\gamma^{1/(s+2)}x). \tag{2.22}$$
We can write
$$\hat\rho_{\gamma}(x) = \frac{1}{8\pi}\left[\gamma^{-s/(s+2)}\mu^{\rm TF}(\gamma) - W(x) - \varepsilon(\gamma,x)\right]_+ \tag{2.23}$$
with
$$\varepsilon(\gamma,x) = \gamma^{-s/(s+2)}V(\gamma^{1/(s+2)}x) - W(x). \tag{2.24}$$
By assumption, $|\varepsilon(\gamma,x)|<\delta(\gamma)(1 + W(x))$ for some $\delta(\gamma)$ with $\lim_{\gamma\to\infty}\delta(\gamma) = 0$. Because $\int\hat\rho_{\gamma} = 1$ for all $\gamma$, we see from Eq. (2.23) that $\mu^{\rm TF}(\gamma)\gamma^{-s/(s+2)}$ converges to some $c$ as $\gamma\to\infty$. Moreover, we can conclude that the support of $\hat\rho_{\gamma}$ is for large $\gamma$ contained in some bounded set $B$ independent of $\gamma$. Therefore
$$1 = \lim_{\gamma\to\infty}\int\hat\rho_{\gamma} = (8\pi)^{-1}\int\left[c - W(x)\right]_+\,d^2x \tag{2.25}$$
by dominated convergence, so $c$ is equal to the $\tilde\mu^{\rm TF}$ of Eq. (1.11). Now
$$\hat\rho_{\gamma}(x) = \frac{1}{8\pi}\left[\tilde\mu^{\rm TF} - W(x) - \bar\varepsilon(\gamma,x)\right]_+ \tag{2.26}$$
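with
$$\bar\varepsilon(\gamma,x) = \varepsilon(\gamma,x) + \tilde\mu^{\rm TF} - \gamma^{-s/(s+2)}\mu^{\rm TF}(\gamma). \tag{2.27}$$
Again $|\bar\varepsilon(\gamma,x)|<\bar\delta(\gamma)(1 + W(x))$ for some $\bar\delta(\gamma)$ with $\lim_{\gamma\to\infty}\bar\delta(\gamma) = 0$. By Eqs. (1.11) and (2.26) we thus have
$$\|\hat\rho_{\gamma} - \tilde\rho^{\rm TF}_{1,1}\|_{\infty}<C\,\bar\delta(\gamma) \tag{2.28}$$
with $C = (8\pi)^{-1}\sup_{x\in B}(1 + W(x))<\infty$.

The mean density for the TF theory is defined by
$$\bar\rho_{\gamma}\equiv N\int\rho^{\rm TF}_{\gamma}(x)^2\,d^2x. \tag{2.29}$$
For $\gamma = N$, i.e., $g = 1$, this is the same as (1.6). It satisfies

Lemma 2.4 (Bounds on $\bar\rho_{\gamma}$). For some constant $C>0$,
$$N\frac{\mu^{\rm TF}(\gamma)}{8\pi\gamma}\geq\bar\rho_{\gamma}\geq CN\frac{\mu^{\rm TF}(\gamma)}{\gamma}. \tag{2.30}$$

Proof. The upper bound is trivial. Because $\hat\rho_{\gamma}$, defined in (2.22), converges uniformly to $\tilde\rho^{\rm TF}_{1,1}$ and $\mu^{\rm TF}(\gamma)\gamma^{-s/(s+2)}\to\tilde\mu^{\rm TF}$ as $\gamma\to\infty$, we have the lower bound
$$\bar\rho_{\gamma}\geq 8\pi\gamma^{s/(s+2)}\mu^{\rm TF}(\gamma)^{-1}\,N\frac{\mu^{\rm TF}(\gamma)}{8\pi\gamma}\left(\int(\tilde\rho^{\rm TF}_{1,1})^2 - 2\|\tilde\rho^{\rm TF}_{1,1} - \hat\rho_{\gamma}\|_{\infty}\right)>CN\frac{\mu^{\rm TF}(\gamma)}{\gamma} \tag{2.31}$$
for some $C>0$.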
Remark 2.1. With $V$ asymptotically homogeneous of order $s$, $\mu^{\rm TF}(\gamma)\gamma^{-s/(s+2)}$ converges as $\gamma\to\infty$, i.e. $\mu^{\rm TF}(\gamma)\sim\gamma^{s/(s+2)}$ for large $\gamma$. So the mean TF density for coupling constant $g = 1$, defined in (1.6), has the asymptotic behavior $\bar\rho\sim N^{s/(s+2)}$.

3. Upper Bound to the QM Energy

As in the three dimensional case, cf. Eqs. (3.29) and (3.27) in [2], one has the upper bound
$$\frac{E^{\rm QM}(N,a)}{N}\leq\frac{\int\left(|\nabla\phi^{\rm GP}_{\gamma}|^2 + V(\phi^{\rm GP}_{\gamma})^2\right)}{1 - N\|\phi^{\rm GP}_{\gamma}\|_{\infty}^2 I} + \frac{NJ\int(\phi^{\rm GP}_{\gamma})^4 + \frac{2}{3}N^2\left(\|\phi^{\rm GP}_{\gamma}\|_{\infty}^2 K\right)^2}{\left(1 - N\|\phi^{\rm GP}_{\gamma}\|_{\infty}^2 I\right)^2}, \tag{3.1}$$
where we have implicitly used that $-\Delta\phi^{\rm GP}_{\gamma} + V\phi^{\rm GP}_{\gamma}\geq 0$, which is justified by Lemma 2.1. The coefficients $I$, $J$ and $K$ are given by Eqs. (2.4)–(2.10) in [4]. They depend on the scattering length and a parameter $b$. We choose $\gamma = N/|\ln(a^2\bar\rho)|$ and $b = \bar\rho^{-1/2}$. (Recall that $\bar\rho$ is short for $\bar\rho_N$.) With this choice we have (as long as $a^2\bar\rho<1$)
$$J = \frac{4\pi}{|\ln(a^2\bar\rho)|}, \tag{3.2}$$
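and the error terms
$$N\|\phi^{\rm GP}_{\gamma}\|_{\infty}^2 I\leq(\text{const.})\frac{\mu^{\rm GP}(\gamma)}{\bar\rho}\left(1 + O(|\ln(a^2\bar\rho)|^{-1})\right) \tag{3.3}$$
and
$$K^2N^2\|\phi^{\rm GP}_{\gamma}\|_{\infty}^4\leq(\text{const.})E^{\rm GP}(1,\gamma)\frac{\mu^{\rm GP}(\gamma)}{\bar\rho}\left(1 + O(|\ln(a^2\bar\rho)|^{-1})\right), \tag{3.4}$$
where we have used Lemma 2.1. So we have the upper bound
$$\frac{E^{\rm QM}(N,a)}{E^{\rm GP}(N,1/|\ln(a^2\bar\rho)|)}\leq 1 + O\left(\mu^{\rm GP}(\gamma)/\bar\rho\right) + O\left(|\ln(a^2\bar\rho)|^{-1}\right). \tag{3.5}$$
Now if $\gamma$ is fixed as $N\to\infty$,
$$\frac{\mu^{\rm GP}(\gamma)}{\bar\rho}\sim\frac{1}{|\ln(a^2\bar\rho)|}\sim\frac{1}{N}. \tag{3.6}$$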
If $\gamma\to\infty$ with $N$ we have instead, assuming that the external potential is asymptotically homogeneous of order $s$,
$$\frac{\mu^{\rm GP}(\gamma)}{\bar\rho}\sim\frac{\mu^{\rm TF}(\gamma)}{\mu^{\rm TF}(N)}\sim\left(\frac{\gamma}{N}\right)^{s/(s+2)}, \tag{3.7}$$
so in any case
$$\frac{E^{\rm QM}(N,a)}{E^{\rm GP}(N,1/|\ln(a^2\bar\rho)|)}\leq 1 + O\left(|\ln(a^2\bar\rho)|^{-s/(s+2)}\right) \tag{3.8}$$
holds as $N\to\infty$ and $a^2\bar\rho\to 0$.

4. Lower Bound to the QM Energy

Compared to the treatment of the 3D problem in [2] the new issue here is the TF case, i.e., $\gamma = N/|\ln(a^2\bar\rho)|\to\infty$, and we discuss this case first. The GP limit with $\gamma$ fixed can be treated in complete analogy with the 3D case, cf. Remark 4.1 below.

We introduce again the rescaled $\hat\rho_{\gamma}$ as in (2.22) and also
$$\hat v(x) = \gamma^{2/(s+2)}v(\gamma^{1/(s+2)}x). \tag{4.1}$$
Note that the scattering length of $\hat v$ is $\hat a = a\gamma^{-1/(s+2)}$. Using $V\geq\mu^{\rm TF}(\gamma) - 8\pi\gamma\rho^{\rm TF}_{\gamma}$ and (2.7) we see that
$$E^{\rm QM}(N,a)\geq E^{\rm TF}(N,\gamma/N) + 4\pi N\gamma^{s/(s+2)}\int\hat\rho_{\gamma}^2 + \gamma^{-2/(s+2)}Q - 8\pi N\gamma^{s/(s+2)}\|\hat\rho_{\gamma} - \tilde\rho^{\rm TF}_{1,1}\|_{\infty}, \tag{4.2}$$
with
$$Q = \inf_{\int|\Psi|^2 = 1}\int\left(\sum_i|\nabla_i\Psi|^2 + \sum_{i<j}\hat v(x_i - x_j)|\Psi|^2 - 8\pi\gamma\sum_i\tilde\rho^{\rm TF}_{1,1}(x_i)|\Psi|^2\right). \tag{4.3}$$
Dividing space into boxes $\alpha$ of side length $L$ with Neumann boundary conditions we get
$$Q\geq\sum_{\alpha}\inf_{n_{\alpha}}\left(E^{\rm hom}(n_{\alpha},L) - 8\pi\gamma\rho_{\alpha,\max}n_{\alpha}\right), \tag{4.4}$$
where $\rho_{\alpha,\max}$ denotes the maximal value of $\tilde\rho^{\rm TF}_{1,1}$ in the box $\alpha$, and $E^{\rm hom}(n,L)$ is the energy of a homogeneous gas of $n$ bosons in a box of side length $L$ and Neumann boundary conditions. We can forget about the boxes where $\rho_{\alpha,\max} = 0$, because the energy of particles in these boxes is positive. We now want to use the lower bound on $E^{\rm hom}$ given in [4], namely
$$E^{\rm hom}(n,L)\geq 4\pi\frac{n^2}{L^2}\frac{1}{|\ln(\hat a^2n/L^2)|}\left(1 - C|\ln(\hat a^2n/L^2)|^{-1/5}\right). \tag{4.5}$$
This bound holds for $n>(\text{const.})|\ln(\hat a^2n/L^2)|^{1/5}$ and small enough $\hat a^2n/L^2$. Now if the minimum in (4.4) is taken in some box $\alpha$ for some value $n_{\alpha}$, we have
$$E^{\rm hom}(n_{\alpha}+1,L) - E^{\rm hom}(n_{\alpha},L)\geq 8\pi\gamma\rho_{\alpha,\max}. \tag{4.6}$$
By a computation analogous to the upper bound (see [2]) one shows that
$$E^{\rm hom}(n+1,L) - E^{\rm hom}(n,L)\leq 8\pi\frac{n}{L^2}\frac{1}{|\ln(\hat a^2n/L^2)|}\left(1 + O\left(|\ln(\hat a^2n/L^2)|^{-1}\right)\right). \tag{4.7}$$
Using Lemma 2.4 and the asymptotics of $\mu^{\rm TF}$ (Remark 2.1) we see that
$$\frac{\hat a^2n}{L^2}\leq\frac{\hat a^2N}{L^2} = N^{s/(s+2)}\frac{a^2}{L^2}\left(\frac{N}{\gamma}\right)^{2/(s+2)}\leq a^2\bar\rho\,\frac{C}{L^2}\left(\frac{N}{\gamma}\right)^{2/(s+2)}, \tag{4.8}$$
for some constant $C$, so (4.7) reads
$$E^{\rm hom}(n+1,L) - E^{\rm hom}(n,L)\leq 8\pi\frac{n}{L^2}\frac{1}{|\ln(a^2\bar\rho)|}\left(1 + O\left(\frac{1 + |\ln((\gamma/N)^{2/(s+2)}L^2/C)|}{|\ln(a^2\bar\rho)|}\right)\right). \tag{4.9}$$
So if $L$ is fixed, our minimizing $n_{\alpha}$ is at least $\sim\rho_{\alpha,\max}L^2N$. If $N$ is large enough and $a^2\bar\rho$ is small enough, we can thus use (4.5) in (4.4) to get
$$Q\geq 4\pi\sum_{\alpha}\left[\frac{n_{\alpha}^2}{L^2}\frac{1}{|\ln(\hat a^2n_{\alpha}/L^2)|}\left(1 - \frac{C}{|\ln(\hat a^2N/L^2)|^{1/5}}\right) - 2\frac{N\rho_{\alpha,\max}}{|\ln(a^2\bar\rho)|}n_{\alpha}\right]. \tag{4.10}$$
Lemma 4.1. For $0<x,b<1$ we have
$$\frac{x^2}{|\ln x|} - 2\frac{b}{|\ln b|}x\geq -\frac{b^2}{|\ln b|}\left(1 + \frac{1}{(2|\ln b|)^2}\right). \tag{4.11}$$

Proof. Since $\ln x\geq -\frac{1}{de}x^{-d}$ for all $d>0$ we have
$$\frac{x^2}{b^2}\frac{|\ln b|}{|\ln x|} - 2\frac{x}{b}\geq\frac{|\ln b|}{b^2}\,ed\,x^{2+d} - 2\frac{x}{b}\geq c(d)\left(b^d\,ed\,|\ln b|\right)^{-1/(1+d)} \tag{4.12}$$
with
$$c(d) = 2^{(2+d)/(1+d)}\left(\frac{1}{(2+d)^{(2+d)/(1+d)}} - \frac{1}{(2+d)^{1/(1+d)}}\right)\geq -1 - \frac{1}{4}d^2. \tag{4.13}$$
Choosing $d = 1/|\ln b|$ gives the desired result.

Note that the lemma above implies for $k\geq 1$,
$$\frac{x^2}{|\ln x|} - 2\frac{b}{|\ln b|}xk\geq -\frac{b^2}{|\ln b|}\left(1 + \frac{1}{(2|\ln b|)^2}\right)k^2. \tag{4.14}$$
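As a plausibility check of inequality (4.11), the following sketch samples both sides on a uniform grid in $(0,1)\times(0,1)$ and reports the smallest relative gap between them. The function names and the grid resolution are choices made only for this illustration; a grid test of this kind does not replace the proof, it merely guards against a misread formula.

```python
import math


def lemma_lhs(x, b):
    """Left side of (4.11): x^2/|ln x| - 2 (b/|ln b|) x."""
    return x * x / abs(math.log(x)) - 2.0 * b / abs(math.log(b)) * x


def lemma_rhs(b):
    """Right side of (4.11): -(b^2/|ln b|) (1 + 1/(2|ln b|)^2)."""
    log_b = abs(math.log(b))
    return -(b * b / log_b) * (1.0 + 1.0 / (2.0 * log_b) ** 2)


def worst_margin(n_grid=200):
    """Smallest value of (lhs - rhs)/|rhs| over the grid points i/n_grid, 0 < i < n_grid."""
    worst = float("inf")
    for i in range(1, n_grid):
        b = i / n_grid
        rhs = lemma_rhs(b)
        for j in range(1, n_grid):
            x = j / n_grid
            worst = min(worst, (lemma_lhs(x, b) - rhs) / abs(rhs))
    return worst


if __name__ == "__main__":
    # A positive value means (4.11) holds at every sampled point.
    print(worst_margin())
```

The gap is smallest for small $b$ with $x$ close to $b$, where both sides agree to leading order in $1/|\ln b|$; this is the regime in which the lemma is applied below.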
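Applying this with $x = \hat a^2n_{\alpha}/L^2$ and $b = \hat a^2N\rho_{\alpha,\max}$ we get the bound
$$Q\geq -4\pi N\gamma\sum_{\alpha}\rho_{\alpha,\max}^2L^2\left(1 + \frac{1}{4|\ln(\hat a^2N\rho_{\alpha,\max})|^2}\right)\frac{|\ln(\hat a^2N\rho_{\alpha,\max})|}{|\ln(a^2\bar\rho)|}\left(1 - \frac{C}{|\ln(\hat a^2N/L^2)|^{1/5}}\right)^{-1} \tag{4.15}$$
for (4.10). To estimate the error terms, note that as in (4.8),
$$\hat a^2N\sim a^2\bar\rho\left(\frac{N}{\gamma}\right)^{2/(s+2)}, \tag{4.16}$$
so $|\ln(\hat a^2N)| = |\ln(a^2\bar\rho)| + O(\ln|\ln(a^2\bar\rho)|)$ for small $a^2\bar\rho$. Using $\|\hat\rho_{\gamma} - \tilde\rho^{\rm TF}_{1,1}\|_{\infty}\to 0$ (Theorem 2.1 (iii)) and $\int\hat\rho_{\gamma}^2\to\int(\tilde\rho^{\rm TF}_{1,1})^2$ as $\gamma\to\infty$ (which follows from the uniform convergence and boundedness of the supports) we get
$$\liminf_{N\to\infty}\frac{E^{\rm QM}(N,a)}{E^{\rm TF}(N,1/|\ln(a^2\bar\rho)|)}\geq 1 - (\text{const.})\left(\sum_{\alpha}\rho_{\alpha,\max}^2L^2 - \int(\tilde\rho^{\rm TF}_{1,1})^2\right). \tag{4.17}$$
Since this holds for all choices of the boxes $\alpha$ with arbitrarily small side length $L$, and by the assumptions on $V$ the function $\tilde\rho^{\rm TF}_{1,1}$ is continuous and has compact support, we can conclude
$$\liminf_{N\to\infty}\frac{E^{\rm QM}(N,a)}{E^{\rm TF}(N,1/|\ln(a^2\bar\rho)|)}\geq 1 \tag{4.18}$$
in the limit $N\to\infty$, $a^2\bar\rho\to 0$ and $N/|\ln(a^2\bar\rho)|\to\infty$.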
Remark 4.1 (The GP case). In the derivation of the lower bound we have assumed that $\gamma\to\infty$ with $N$, i.e. $N\gg|\ln(a^2\bar\rho)|$, which seems natural because otherwise the scattering length would have to decrease exponentially with $N$. However, for fixed $\gamma$ one can use the methods of [2] (with slight modifications: one uses the 2D bounds on the homogeneous gas and Lemma 4.1) to compute a lower bound in terms of the GP energy. The result is
$$\liminf_{N\to\infty}\frac{E^{\rm QM}(N,a)}{E^{\rm GP}(N,1/|\ln(a^2\bar\rho)|)}\geq 1 \tag{4.19}$$
in the limit $N\to\infty$, $a^2\bar\rho\to 0$ with $\gamma = N/|\ln(a^2\bar\rho)|$ fixed.

5. The Limit Theorems

We have now all the estimates needed for Theorems 1.1–1.4. The upper bound (3.8) and the lower bound (4.19) prove Theorem 1.1. The energy limit Theorem 1.3 for the TF case follows from (3.8), Theorem 2.1 (i) and (4.18). The convergence of the energies implies the convergence of the densities in the usual way by variation of the external potential. Replacing $V(x)$ by $V(x) + \delta\gamma^{s/(s+2)}Y(\gamma^{-1/(s+2)}x)$ for some positive $Y\in C_0^{\infty}$ and redoing the upper and lower bounds we see that Theorem 1.3 and Theorem 2.1 (i) hold with $W$ replaced by $W + \delta Y$. Differentiating with respect to $\delta$ at $\delta = 0$ yields
$$\lim_{N\to\infty}\frac{\gamma^{2/(s+2)}}{N}\rho^{\rm QM}_{N,a}(\gamma^{1/(s+2)}x) = \tilde\rho^{\rm TF}_{1,1}(x) \tag{5.1}$$
in the sense of distributions. Since the functions all have norm 1, we can conclude that there is even weak $L^1$-convergence.

Remark 5.1 (The 3D case). In [2] the analogues of Theorems 1.1 and 1.2 were shown for the three-dimensional Bose gas. Using the methods developed here one can extend these results to analogues of Theorems 1.3 and 1.4. In 3D the coupling constant is $g = a$, so $\gamma = Na$. Moreover, the relevant mean 3D density is $\bar\rho_{\gamma}\sim N(Na)^{-3/(s+3)}$.

A. Appendix: Scattering Length in Two Dimensions

Due to the logarithmic behavior of the Green function of the two dimensional Laplacian the definition of the scattering length is slightly more delicate in two dimensions than in three. For a nonnegative potential $v(x)$, depending only on $|x|$ and with finite range $R_0$, it is naturally defined by the following variational principle:

Theorem A.1. Let $R>R_0$ and consider the functional
$$\mathcal{E}_R[\phi] = \int_{|x|\leq R}\left(|\nabla\phi(x)|^2 + \frac{1}{2}v(x)|\phi(x)|^2\right)d^2x. \tag{A.1}$$
Then, in the subclass of functions such that $\int(|\phi|^2 + |\nabla\phi|^2)<\infty$ and $\phi(x) = 1$ for $|x| = R$, there is a unique function $\phi_0$ that minimizes $\mathcal{E}_R[\phi]$. This function is nonnegative and rotationally symmetric, and satisfies the equation
$$-\Delta\phi_0(x) + \frac{1}{2}v(x)\phi_0(x) = 0 \tag{A.2}$$
for $|x|\leq R$ in the sense of distributions, with boundary condition $\phi_0(x) = 1$ for $|x| = R$. For $R_0<|x|<R$,
$$\phi_0(x) = \ln(|x|/a)/\ln(R/a) \tag{A.3}$$
for a unique number $a$ called the scattering length.

For the proof see [4], where generalizations to other dimensions and potentials with a negative part are also discussed. Note that the factor $\frac{1}{2}$ in (A.1) and (A.2) is due to the reduced mass of the two body problem.

If $v$ has infinite range it is easy to extend the definition of the scattering length for nonnegative $v$ under the assumption that $\int_{|x|\geq R_1}v(x)\,d^2x<\infty$ for some $R_1$. In fact, one may then simply cut off the potential at some point $R_0>R_1$ (i.e., set $v(x) = 0$ for $|x|>R_0$) and consider the limit of the scattering lengths of the cut off potentials as $R_0\to\infty$. See [4] for details.

References

1. Dalfovo, F., Giorgini, S., Pitaevskii, L.P. and Stringari, S.: Theory of Bose–Einstein condensation in trapped gases. Rev. Mod. Phys. 71, 463–512 (1999)
2. Lieb, E.H., Seiringer, R. and Yngvason, J.: Bosons in a Trap: A Rigorous Derivation of the Gross–Pitaevskii Energy Functional. Phys. Rev. A 61, 043602-1–043602-13 (2000); arXiv: math-ph/9908027, mp_arc 99-312. See also: Proceedings of ‘Quantum Theory and Symmetries’ (Goslar, 18–22 July 1999), edited by H.-D. Doebner, V.K. Dobrev, J.-D. Hennig and W. Luecke, Singapore: World Scientific, 2000; arXiv: math-ph/9911026, mp_arc 99-439
3. Lieb, E.H. and Yngvason, J.: Ground State Energy of the Low Density Bose Gas. Phys. Rev. Lett. 80, 2504–2507 (1998); arXiv: math-ph/9712138, mp_arc 97-631. A more leisurely presentation is in Differential Equations and Mathematical Physics, Proceedings of 1999 conference at the Univ. of Alabama, R. Weikard and G. Weinstein, eds., Cambridge, MA: International Press, 2000, pp. 295–306
4. Lieb, E.H. and Yngvason, J.: Ground State Energy of a Dilute Two-dimensional Bose Gas. J. Stat. Phys. 103, 509–526 (2001); arXiv: math-ph/0002014
5. Schick, M.: Two-Dimensional System of Hard Core Bosons. Phys. Rev. A 3, 1067–1073 (1971)
6. Hines, D.F., Frankel, N.E. and Mitchell, D.J.: Hard disc Bose gas. Physics Letters 68A, 12–14 (1978)
7. Popov, V.N.: On the theory of the superfluidity of two- and one-dimensional Bose systems. Theor. and Math. Phys. 11, 565–573 (1977)
8. Fisher, D.S. and Hohenberg, P.C.: Dilute Bose gas in two dimensions. Phys. Rev. B 37, 4936–4943 (1988)
9. Kolomeisky, E.B. and Straley, J.P.: Renormalization group analysis of the ground state properties of dilute Bose systems in d spatial dimensions. Phys. Rev. B 46, 11749–11756 (1992)
10. Ovchinnikov, A.A.: On the description of a two-dimensional Bose gas at low densities. J. Phys. Condens. Matter 5, 8665–8676 (1993). See also JETP Letters 57, 477 (1993); Mod. Phys. Lett. 7, 1029 (1993)
11. Shevchenko, S.I.: On the theory of a Bose gas in a nonuniform field. Sov. J. Low Temp. Phys. 18, 223–230 (1992)
12. Kolomeisky, E.B., Newman, T.J., Straley, J.P. and Qi, X.: Low-dimensional Bose liquids: Beyond the Gross–Pitaevskii approximation. Phys. Rev. Lett. 85, 1146–1149 (2000); arXiv: cond-mat/0002282
13. Kim, S., Won, C., Oh, S.D. and Jhe, W.: Bose–Einstein condensation in a two-dimensional trap. arXiv: cond-mat/0003342 (2000)
14. Kim, S., Won, C., Oh, S.D. and Jhe, W.: Two-dimensional condensation of dilute Bose atoms in harmonic trap. J. Korean Phys. Soc. 37, 665 (2000); arXiv: cond-mat/9904087
15. Garcia-Ripoll, J.J. and Perez-Garcia, V.M.: Anomalous rotational properties of Bose–Einstein condensates in asymmetric traps. Phys. Rev. A 64, 013602 (2001); arXiv: cond-mat/0003451 (2000)
16. Gonzalez, A. and Perez, A.: Ground-state properties of bosons in three- and two-dimensional traps. Int. J. Mod. Phys. B 12, 2129–38 (1998)
17. Heinrichs, S. and Mullin, W.J.: Quantum-Monte-Carlo Calculations for Bosons in a Two-Dimensional Harmonic Trap. J. Low Temp. Phys. 113, 231–6 (1998)
18. Bayindir, M. and Tanatar, B.: Bose–Einstein condensation in a two-dimensional, trapped, interacting gas. Phys. Rev. A 58, 3134–7 (1998)
19. Cornish, S.L., Claussen, N.R., Roberts, J.L., Cornell, E.A. and Wieman, C.E.: Stable $^{85}$Rb Bose–Einstein Condensates with Widely Tunable Interactions. Phys. Rev. Lett. 85, 1795–98 (2000)

Communicated by H. Spohn
Commun. Math. Phys. 224, 33 – 63 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Quantum Lattice Models at Intermediate Temperature

J. Fröhlich$^1$, L. Rey-Bellet$^2$, D. Ueltschi$^3$

1 Institut für Theoretische Physik, ETH Hönggerberg, 8093 Zürich, Switzerland. E-mail: [email protected]
2 Department of Mathematics, University of Virginia, Charlottesville, VA 22903, USA. E-mail: [email protected]
3 Department of Physics, Princeton University, Jadwin Hall, Princeton, NJ 08544, USA. E-mail: [email protected]

Received: 6 December 2000 / Accepted: 18 July 2001
Dedicated to Joel Lebowitz on the occasion of his seventieth birthday

Abstract: We analyze the free energy and construct the Gibbs-KMS states for a class of quantum lattice systems, at low temperature and when the interactions are almost diagonal, in a suitable basis. The models we study may have continuous symmetries; our results, however, apply to intermediate temperatures where discrete symmetries are broken but continuous symmetries are not. Our results are based on quantum Pirogov–Sinai theory and a combination of high and low temperature expansions.
1. Introduction

In this paper we study the low temperature phase diagram for a class of quantum lattice systems. Starting with [PS, Sin], Pirogov–Sinai theory has evolved [KP, Zah, BKL, BS, BI, BK] into a very powerful tool to study the pure phases, their coexistence and the first-order phase transitions in classical spin systems at low temperature. In recent years a large part of the Pirogov–Sinai theory has been extended to quantum systems [Pir, BKU, DFF, DFFR, KU], quantum spin systems as well as fermionic and bosonic lattice gases, and applied to a variety of models [FR, DFF2, GKU] to describe insulating phases associated with discrete symmetry breaking.

Here we formulate the Pirogov–Sinai theory in terms of tangent functionals to the free energy. This allows us to discuss the completeness of the phase diagram avoiding the difficulties associated with boundary conditions. We reformulate results of [BKU, DFF, DFFR, KU] in this framework, and extend the theory to a class of models where discrete symmetries are broken at intermediate temperatures. This applies in particular to some systems with continuous symmetries. For this, we consider the restricted ensembles introduced in [BKL] that are very useful to analyze phases which are associated to a family of configurations rather than to a single configuration.

Supported by the US National Science Foundation, grant PHY 9820650
The models that we consider have Hamiltonians, for finite volumes $\Lambda$, of the form $H = V + T$, where $V$ is a classical Hamiltonian (i.e. diagonal in a suitable basis) and $T$ is a (usually small) quantum perturbation. In typical situations the suitable basis is the basis of occupation numbers of position operators. Electronic systems provide a large class of interesting models. The classical interaction $V$ describes the many-body short range and classical interaction between the spin-$\frac{1}{2}$ fermions as well as external fields and chemical potentials:
$$V = \sum_{x\in\Lambda,\ \sigma\in\{\uparrow,\downarrow\}}J_{x,\sigma}n_{x,\sigma} + \sum_{x,y\in\Lambda,\ \sigma,\sigma'\in\{\uparrow,\downarrow\}}J_{xy,\sigma\sigma'}n_{x,\sigma}n_{y,\sigma'} + \cdots.$$
A typical quantum perturbation $T$ is the kinetic energy
$$T = \sum_{\langle x,y\rangle\subset\Lambda,\ \sigma\in\{\uparrow,\downarrow\}}t_{xy,\sigma}\left(c^{\dagger}_{x\sigma}c_{y\sigma} + \text{h.c.}\right),$$
where $c^{\dagger}_{x\sigma}$ and $c_{x\sigma}$ are the creation and annihilation operators and $\langle x,y\rangle$ denotes pairs of nearest neighbors. Often, in such systems, the behavior at low temperatures arises from a subtle interplay between the (classical) potential energy and the kinetic energy. In this paper two such mechanisms are considered and combined, each of which we now illustrate with an example.
Example 1 (Hubbard Model). In this case the (classical) interaction is only on-site:
$$V = \sum_{x\in\Lambda}\left[Un_{x\uparrow}n_{x\downarrow} - \mu(n_{x\uparrow} + n_{x\downarrow})\right].$$
For suitable values of $U$ and $\mu$, the ground states of $V$ have an infinite degeneracy (in the thermodynamic limit): each site is occupied by a single particle of arbitrary spin. However the kinetic energy lifts this degeneracy and induces an effective antiferromagnetic interaction between nearest neighbors. The perturbative methods of [DFFR, DFF2] show that, in this parameter range, this system is equivalent, in the sense of statistical mechanics, to the Heisenberg antiferromagnet, up to controlled error terms. If the hopping coefficients are asymmetric (e.g. $t_{xy,\uparrow}\neq t_{xy,\downarrow}$) then quantum Pirogov–Sinai theory implies the coexistence of two antiferromagnetic phases at low enough temperatures [DFFR, KU, DFF2]. Rigorous results for the Hubbard model are reviewed in [Lieb].

Example 2 (Extended Hubbard Model). This variant of the Hubbard model includes a nearest neighbor interaction:
$$V = \sum_{x\in\Lambda}\left[Un_{x\uparrow}n_{x\downarrow} - \mu(n_{x\uparrow} + n_{x\downarrow})\right] + W\sum_{\langle x,y\rangle\subset\Lambda}(n_{x\uparrow} + n_{x\downarrow})(n_{y\uparrow} + n_{y\downarrow}).$$
If the interaction between nearest neighbors is repulsive then for suitable values of $U$, $W$ and $\mu$ the ground states of $V$ are chessboard configurations where empty sites alternate with sites occupied with one particle of arbitrary spin. The degeneracy of the
ground states is infinite in the thermodynamic limit but we have a spatial ordering of the particles. Using a restricted ensemble we associate a pure phase to this spatial ordering by neglecting the spin degrees of freedom. The methods of this paper imply the existence of only two pure phases in the intermediate temperature range βt 1
and
βW 1.
The temperature is so low that the spatial ordering of the particles survives but so high that the spins are in a disordered phase. The continuous symmetry (if txy,↑ = txy,↓ ) is not broken in this parameter regime. These two models illustrate some of the mechanisms arising from the competition between classical and quantum effects, where the system remains insulating and no continuous symmetry is broken. Our main result, Theorem 4.4, provides tools to describe the phase diagram of such models, in particular the coexistence of several phases and the associated first-order phase transitions. The main technical ingredient in this paper is a combined low-temperature and hightemperature expansion for suitable contour models obtained using the perturbation theory developed in [DFFR]. This paper is organized as follows. In Sect. 2 we describe the general formalism of quantum lattice systems and the perturbation theory of [DFFR]. Section 3 is devoted to the Pirogov–Sinai theory. In Sect. 4 we state the results of Pirogov–Sinai theory for quantum systems. The extended Hubbard model is discussed in Sect. 5 as an illustration. In Sect. 6 we prove our main result by studying a contour model and deriving the required bounds on the contours. 2. General Framework of Quantum Lattice Models 2.1. Basic set-up. We consider a quantum mechanical system on a ν-dimensional lattice Zν , as considered, e.g., in [Rue, Isr, BR, Sim]. We will need a slight modification of the usual formalism in order to treat fermionic lattice gases [DFFR] and to accommodate the fact that fermionic creation and annihilation operators do not commute but anticommute. A quantum lattice system is defined by the following data: (i) Hilbert space. For convenience we choose a total ordering (denoted by the symbol ) of the sites in Zν . We choose the spiral order, depicted in Fig.1 for ν = 2, and an analogous ordering for ν ≥ 3. 
This ordering has the property that, for any finite set $A$, the set $\bar{A} := \{z \in \mathbb{Z}^\nu : z \preceq A\}$ of lattice sites which are smaller than $A$, or belong to $A$, is finite. To each lattice site $a \in \mathbb{Z}^\nu$ is associated a finite-dimensional Hilbert space $\mathcal{H}_a$ and, for any finite subset $A = \{a_1 \prec \cdots \prec a_n\} \subset \mathbb{Z}^\nu$, the corresponding Hilbert space $\mathcal{H}_A$ is given by the ordered tensor product
$$\mathcal{H}_A = \mathcal{H}_{a_1} \otimes \cdots \otimes \mathcal{H}_{a_n}. \tag{2.1}$$
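As an illustration of the ordering in (i), the following Python sketch enumerates the sites of $\mathbb{Z}^2$ along a square spiral and uses the resulting rank to order a finite set $A$ as in (2.1). The starting direction and the orientation of the spiral are assumptions made here for concreteness (they are not read off from Fig. 1), and the function names are hypothetical.

```python
from itertools import islice

def spiral_sites():
    """Yield the sites of Z^2 along a square spiral starting at the origin.

    Assumed convention: first step to the right, then counterclockwise turns.
    """
    x, y = 0, 0
    yield (x, y)
    dx, dy = 1, 0
    step = 1
    while True:
        for _ in range(2):            # two legs of equal length, then lengthen
            for _ in range(step):
                x, y = x + dx, y + dy
                yield (x, y)
            dx, dy = -dy, dx          # rotate the direction by 90 degrees
        step += 1

def spiral_rank(site):
    """Position of an integer site in the spiral order (0 for the origin)."""
    for n, s in enumerate(spiral_sites()):
        if s == site:
            return n

def ordered(sites):
    """Sort a finite set of sites as a_1 < ... < a_n in the spiral order."""
    return sorted(sites, key=spiral_rank)
```

Every site has a finite rank, so only finitely many sites precede any given finite set; this is the property of the ordering used in the text.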
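We further require that there be a Hilbert space isomorphism $\varphi_a : \mathcal{H}_a \to \mathcal{H}$, for all $a \in \mathbb{Z}^\nu$. (ii) Field and observable algebras. For any finite subset $A \subset \mathbb{Z}^\nu$ an operator algebra $\mathcal{F}_A$, the field algebra, is given. The algebra $\mathcal{F}_A$ is isomorphic to the algebra $\mathcal{B}(\mathcal{H}_A)$ of bounded operators on $\mathcal{H}_A$, but in general $\mathcal{F}_A \neq \mathcal{B}(\mathcal{H}_A)$; rather, $\mathcal{F}_A \subset \mathcal{B}(\mathcal{H}_{\bar{A}})$. The algebra $\mathcal{F}_A$ is a $*$-algebra equipped with a $C^*$-norm obtained from the operator norm on $\mathcal{B}(\mathcal{H}_{\bar{A}})$. If $A \subset B$ and $a \prec b$ for all $a \in A$ and all $b \in B \setminus A$, then there is a natural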
J. Fröhlich, L. Rey-Bellet, D. Ueltschi
Fig. 1. Spiral order in $\mathbb{Z}^2$ (the sites are numbered 1, 2, 3, ... along a spiral around the origin $(0, 0)$)
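embedding of $\mathcal{F}_A$ into $\mathcal{F}_B$: an operator $K \in \mathcal{F}_A$ corresponds to the operator $K \otimes 1_{\mathcal{H}_{B \setminus A}}$ in $\mathcal{F}_B$. In the following we denote both operators by $K$. For the infinite system the field algebra is the $C^*$-algebra given by
$$\mathcal{F} = \overline{\bigcup_{A \nearrow \mathbb{Z}^\nu} \mathcal{F}_A}^{\,\mathrm{norm}}, \tag{2.2}$$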
(the limit being taken through a sequence of increasing subsets of Zν , where increasing refers to the (spiral) ordering defined above). The algebras FA contain the observable algebras OA which have the same embedding properties as the field algebras and, moreover, satisfy the following commutativity condition: If A ∩ B = ∅, then for any K ∈ FA , L ∈ OB we have [K, L] = 0.
(2.3)
For the infinite system the observable algebra $\mathcal{O}$ is given by
$$\mathcal{O} = \overline{\bigcup_{A \nearrow \mathbb{Z}^\nu} \mathcal{O}_A}^{\,\mathrm{norm}}. \tag{2.4}$$
The group of space translations Zν acts as a ∗-automorphism group {τa }a∈Zν on the algebras F and O, with FX+a = τa (FX ),
OX+a = τa (OX ),
(2.5)
for any $X \subset \mathbb{Z}^\nu$ and $a \in \mathbb{Z}^\nu$. (iii) Interactions, dynamics and free energy. An interaction $H = \{H_A\}$ is given: this is a map from the finite sets $A \subset \mathbb{Z}^\nu$ to self-adjoint operators $H_A$ in the observable algebra $\mathcal{O}_A$. We assume the interaction to be translation invariant or periodic, i.e., there is a lattice # ⊆ $\mathbb{Z}^\nu$, with dim # = $\nu$, such that $\tau_a H_A = H_{A+a}$ for all $a$ ∈ # and all $A \subset \mathbb{Z}^\nu$. We will consider finite range or exponentially decaying interactions. The norm of an interaction is defined as
$$\|H\|_r = \sup_{a \in \mathbb{Z}^\nu} \sum_{A \ni a} \|H_A\| \, e^{r|A|}, \tag{2.6}$$
for some $r > 0$. Here $|A|$ denotes the cardinality of the smallest connected subset of $\mathbb{Z}^\nu$ which contains $A$. We shall denote by $\mathcal{B}_r = \{H : \|H\|_r < \infty\}$ the corresponding Banach space of interactions.
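To make the norm (2.6) concrete, here is a minimal Python sketch for a translation-invariant interaction on the one-dimensional lattice $\mathbb{Z}$, where the smallest connected set containing $A$ is simply the interval from its leftmost to its rightmost site. The dictionary format and the function name are hypothetical choices made for this illustration.

```python
import math

def interaction_norm_1d(pattern_norms, r):
    """Norm (2.6) of a translation-invariant interaction on Z.

    pattern_norms maps a pattern (a tuple of distinct integer sites, taken
    up to translation) to the operator norm ||H_A|| of the corresponding term.
    By translation invariance the supremum over the site a is attained at
    every a, so it suffices to count the translates containing one fixed site.
    """
    total = 0.0
    for pattern, norm in pattern_norms.items():
        hull = max(pattern) - min(pattern) + 1   # |A|: smallest interval containing A
        translates = len(set(pattern))           # translates of A containing a fixed site
        total += translates * norm * math.exp(r * hull)
    return total
```

For an on-site term of norm 4 and a nearest-neighbour term of norm 1, `interaction_norm_1d({(0,): 4.0, (0, 1): 1.0}, r)` evaluates $4e^{r} + 2e^{2r}$, which shows how the weight $e^{r|A|}$ penalizes extended terms.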
Quantum Lattice Models at Intermediate Temperature
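For a finite box $\Lambda$, we denote by $H_\Lambda$ the finite-volume Hamiltonian given by $H_\Lambda = \sum_{A \subset \Lambda} H_A$. Here, we consider only periodic boundary conditions, i.e. $\Lambda$ is the $\nu$-dimensional torus $(\mathbb{Z}/L\mathbb{Z})^\nu$, $L$ being the size of $\Lambda$. In the sequel we will consider infinite volume limits; the notation $\lim_{\Lambda \nearrow \mathbb{Z}^\nu}$ will stand for $\lim_{L \to \infty}$. If $H \in \mathcal{B}_r$, the interaction $H$ determines a one-parameter group of $*$-automorphisms $\{\alpha_t\}_{t \in \mathbb{R}}$ on $\mathcal{F}$. These automorphisms are constructed as the limit (in the strong topology) of the automorphisms $\alpha_t^\Lambda$ given, for $K \in \mathcal{F}_A$, $A \subset \Lambda$, by
$$\alpha_t^\Lambda(K) = e^{itH_\Lambda} K \, e^{-itH_\Lambda}. \tag{2.7}$$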
The proof is standard (see e.g. [BR]). Note that one makes crucial use of the commutativity condition (2.3). For an interaction $H$ and at inverse temperature $\beta$ the partition function is defined as
$$Z_\Lambda^\beta = \mathrm{Tr}\, e^{-\beta H_\Lambda}; \tag{2.8}$$
the free energy $f(H)$ is then
$$f(H) = -\frac{1}{\beta} \lim_{\Lambda \nearrow \mathbb{Z}^\nu} \frac{1}{|\Lambda|} \log Z_\Lambda^\beta. \tag{2.9}$$
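The definitions (2.8) and (2.9) can be tried out on a small classical example. The sketch below is an illustration only: it uses the nearest-neighbour Ising ring with Hamiltonian $-J\sum_i s_i s_{i+1} - h\sum_i s_i$, which is not one of the models of this paper, and compares a brute-force evaluation of the partition function on a ring of $L$ sites with the standard transfer-matrix expression. All function names are hypothetical.

```python
import math
from itertools import product

def partition_function(L, beta, J=1.0, h=0.0):
    """Brute-force Z = sum over configurations of exp(-beta * energy) on a ring."""
    Z = 0.0
    for spins in product((-1, 1), repeat=L):
        energy = -sum(J * spins[i] * spins[(i + 1) % L] + h * spins[i]
                      for i in range(L))
        Z += math.exp(-beta * energy)
    return Z

def free_energy_density(L, beta, J=1.0, h=0.0):
    """Finite-volume version of (2.9): -(1/beta) (1/L) log Z."""
    return -math.log(partition_function(L, beta, J, h)) / (beta * L)

def transfer_matrix_eigenvalues(beta, J=1.0, h=0.0):
    """Eigenvalues of the 2x2 transfer matrix of the Ising chain."""
    a = math.exp(beta * J) * math.cosh(beta * h)
    b = math.sqrt(math.exp(2 * beta * J) * math.sinh(beta * h) ** 2
                  + math.exp(-2 * beta * J))
    return a + b, a - b

def infinite_volume_free_energy(beta, J=1.0, h=0.0):
    """Limit L -> infinity of the free energy density."""
    largest, _ = transfer_matrix_eigenvalues(beta, J, h)
    return -math.log(largest) / beta
```

On a ring of $L$ sites the partition function equals $\lambda_+^L + \lambda_-^L$, so the finite-volume free energy converges exponentially fast in $L$ to `infinite_volume_free_energy`.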
Existence of the limit is a well-known result, see [Isr, Sim]. Notice that $f(H)$ is a concave function of the interaction $H$. (iv) KMS states and tangent functionals. A state $w$ on $\mathcal{O}$ is a positive normalized linear functional on $\mathcal{O}$. A state $w$ is periodic if $w \circ \tau_a = w$ for all $a$ in a lattice # ⊂ $\mathbb{Z}^\nu$, and invariant if # = $\mathbb{Z}^\nu$. A KMS state at inverse temperature $\beta$ is a state $w^\beta$ which satisfies the KMS condition
$$w^\beta(K \alpha_t(L)) = w^\beta(\alpha_{t-i\beta}(L) K). \tag{2.10}$$
For finite systems with periodic boundary conditions it is easy to check that the Gibbs state given by
$$w_\Lambda^\beta(\,\cdot\,) = (\mathrm{Tr}\, e^{-\beta H_\Lambda})^{-1} \, \mathrm{Tr}(e^{-\beta H_\Lambda} \,\cdot\,) \tag{2.11}$$
satisfies the KMS condition. The set of KMS states is convex, and $w$ is called extremal if it cannot be written as a nontrivial convex combination of other KMS states. The state $w$ is clustering if
$$\lim_{|a| \to \infty} \bigl[ w(K \tau_a(L)) - w(K) w(\tau_a L) \bigr] = 0, \tag{2.12}$$
for all $K, L \in \mathcal{O}$. Note that a state $w$ is extremal if it is clustering. The state $w$ is exponentially clustering if, for any local observables $K \in \mathcal{O}_A$, $L \in \mathcal{O}_B$, we have the property
$$|w(K \tau_a(L)) - w(K) w(\tau_a L)| \leq C_{K,L} \, e^{-|a|/\xi} \tag{2.13}$$
with $\xi > 0$; here $C_{K,L}$ depends on $K$ and $L$ only. If we consider the free energy as a function of the interaction, KMS states at inverse temperature $\beta$ are in one-to-one correspondence with tangent functionals to the free energy. The free energy $f$ is a concave function of the interaction $H$, and a linear functional $\alpha$ on $\mathcal{B}_r$ is said to be tangent to $f$ at $H$ if for all interactions $K \in \mathcal{B}_r$ we have
$$f(H + K) \leq f(H) + \alpha(K). \tag{2.14}$$
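The statement that the finite-volume Gibbs state satisfies the KMS condition can be checked numerically. The sketch below is a minimal illustration under stated assumptions: it works in an eigenbasis of the finite-volume Hamiltonian, so that the Hamiltonian is the diagonal matrix of its eigenvalues `E`, and it tests the condition in the form $w(K\alpha_t(L)) = w(\alpha_{t-i\beta}(L)K)$ for arbitrary matrices $K$ and $L$. Function names are hypothetical.

```python
import cmath
import math

def matmul(A, B):
    """Product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def gibbs(X, E, beta):
    """Gibbs expectation Tr(e^{-beta H} X) / Tr(e^{-beta H}) for H = diag(E)."""
    weights = [math.exp(-beta * e) for e in E]
    return sum(w * X[i][i] for i, w in enumerate(weights)) / sum(weights)

def evolve(X, E, z):
    """alpha_z(X) = e^{izH} X e^{-izH} for H = diag(E) and complex time z."""
    n = len(E)
    return [[cmath.exp(1j * z * (E[i] - E[j])) * X[i][j] for j in range(n)]
            for i in range(n)]

def kms_defect(K, L, E, beta, t):
    """|w(K alpha_t(L)) - w(alpha_{t - i beta}(L) K)| for the Gibbs state."""
    lhs = gibbs(matmul(K, evolve(L, E, t)), E, beta)
    rhs = gibbs(matmul(evolve(L, E, t - 1j * beta), K), E, beta)
    return abs(lhs - rhs)
```

The defect vanishes up to rounding for any choice of `K`, `L`, `E`, `beta` and `t`, which reflects the cyclicity of the trace.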
To an invariant state $w$ we associate a tangent functional $\alpha$ defined by
$$\alpha(K) = w(A_K), \tag{2.15}$$
where $A_K = \sum_{X \ni 0} |X|^{-1} K_X$ (and similarly for periodic states). The results of Israel and Araki [Isr, Ara] show that if $\alpha$ is a tangent functional at $H$, then the invariant state $w$ defined in (2.15) is a KMS state at inverse temperature $\beta$ and, conversely, for any KMS state at inverse temperature $\beta$ there is a unique tangent functional $\alpha$. The identification of KMS states with tangent functionals will be very useful to describe the phase diagrams arising from Pirogov–Sinai theory.
Example. As an illustration of the general formalism we consider spin 1/2 fermions, as in the examples treated in this paper. The Hilbert space $\mathcal{H}_a$ is isomorphic to $\mathbb{C}^4$. We denote by $c^\dagger_{a\sigma}$ and $c_{a\sigma}$ the creation and annihilation operators of a particle at site $a$ with spin $\sigma \in \{\uparrow, \downarrow\}$. One can construct an explicit representation of the creation and annihilation operators as operators in $\mathcal{B}(\mathcal{H}_{\bar{a}})$, see e.g. Sect. 4.2 in [DFFR], but $c^\dagger_{a\sigma}, c_{a\sigma} \notin \mathcal{B}(\mathcal{H}_a)$. The algebras $\mathcal{F}_A \subset \mathcal{B}(\mathcal{H}_{\bar{A}})$ are chosen to be the algebras generated by $c^\dagger_{a\sigma}, c_{a\sigma}$, $a \in A$, $\sigma \in \{\uparrow, \downarrow\}$. The observable algebras $\mathcal{O}_A$ are chosen as the algebras generated by pairs of creation or annihilation operators. It is easy to check that the elements of $\mathcal{F}_A$ and $\mathcal{O}_A$ satisfy the commutativity condition (2.3).
Classical interactions. A particular class of interactions consists of the classical interactions. Let $\{e_j\}_{j \in I}$ be an orthonormal basis of $\mathcal{H}$. Then, for $A \subset \mathbb{Z}^\nu$,
$$E_A = \{\otimes_{a \in A} e^a_{j_a}\}, \quad \text{with } e^a_{j_a} = \varphi_a^{-1} e_{j_a},$$
(2.16)
is an orthonormal basis of HA . We denote by C(EA ) the abelian subalgebra of OA consisting of all operators which are diagonal in the basis EA . An interaction V is called classical, if there exists a basis {ej }j ∈I of H such that VA ∈ C(EA ), for all A ⊂ Zν .
(2.17)
The set $\Omega_A$ of configurations in $A$ is defined as the set of all assignments $\{j_a\}_{a \in A}$ of an element $j_a \in I$ to each $a \in A$. A configuration $\omega_A$ is an element of $\Omega_A$. There is a one-to-one correspondence between basis vectors $\otimes_{a \in A} e^a_{j_a}$ of $\mathcal{H}_A$ and configurations on $A$:
$$\bigotimes_{a \in A} e^a_{j_a} \longleftrightarrow \omega_A \equiv \{j_a\}_{a \in A}. \tag{2.18}$$
In the sequel we shall use the notation $e_{\omega_A}$ to denote the basis vector defined by the configuration $\omega_A$ via the correspondence (2.18). Since a classical interaction $V$ only depends on the numbers
$$\Phi_A(\omega_A) = \langle e_{\omega_A} | V_A | e_{\omega_A} \rangle \tag{2.19}$$
we may view $\Phi_A$ as a (real-valued) function on the set of configurations. Similarly the algebra $\mathcal{C}(E_A)$ may be viewed as the $*$-algebra of complex-valued functions on the set of configurations $\Omega_A$.
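The correspondence between configurations and product basis vectors can be spelled out in a few lines. The following Python sketch assumes, for illustration, that the index set is $\{1, \dots, M\}$, that the sites of $A$ are listed in the chosen order, and that the product basis is indexed in the usual mixed-radix way; a classical term is then stored as the list of its diagonal entries. All names are hypothetical.

```python
from itertools import product

def configurations(sites, M):
    """All assignments {j_a} of a label j_a in {1, ..., M} to each site a."""
    return [dict(zip(sites, labels))
            for labels in product(range(1, M + 1), repeat=len(sites))]

def basis_index(config, sites, M):
    """Index of the product basis vector attached to a configuration.

    The first site in `sites` is the most significant digit, matching the
    ordered tensor product over the listed sites.
    """
    index = 0
    for a in sites:
        index = index * M + (config[a] - 1)
    return index

def classical_energy(V_diagonal, config, sites, M):
    """Diagonal matrix element of a classical term in the state e_omega."""
    return V_diagonal[basis_index(config, sites, M)]
```

Since `basis_index` is a bijection between configurations and basis indices, a classical term is the same thing as a real function on configurations.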
2.2. Perturbation theory for interactions. The interactions we will study have the form $H = V + \lambda T$, where $V$ is a classical interaction, $T$ is a perturbation and $\lambda$ a small parameter. A typical situation is the following: the classical part of the interaction has infinitely many ground states, i.e. the number of ground states of the finite-volume Hamiltonian $H_\Lambda$ diverges as $|\Lambda| \to \infty$, but the perturbation $T$ lifts this degeneracy (completely or partially). This is usually easy to check using standard perturbation theory for the finite-volume Hamiltonian $V_\Lambda + \lambda T_\Lambda$. Standard perturbation theory, however, does not work in the thermodynamic limit, since the norm of the error grows with $|\Lambda|$, and other methods are required. Such methods have been developed in [DFFR] and applied in [FR, DFF2] (see also [KU] for an alternative approach). The idea is to construct an interaction $\tilde{H}$ which is equivalent to $H$ and which can be cast in the form
$$\tilde{H} = \tilde{V}(\lambda) + \tilde{T}(\lambda), \tag{2.20}$$
where now the degeneracy of the ground states of $\tilde{V}$ is lifted and $\tilde{T}(\lambda)$ is suitably small with respect to $\tilde{V}(\lambda)$. Recall that two interactions $H$ and $\tilde{H}$ are equivalent if there exists a $*$-automorphism $\gamma$ of the algebra $\mathcal{O}$ of local observables such that
$$\tilde{H}_A = \gamma(H_A), \tag{2.21}$$
for all $A$. In particular, if $H \in \mathcal{B}_r$, there exists $\tilde{r}$ such that $\tilde{H} \in \mathcal{B}_{\tilde{r}}$. A convenient way of constructing equivalent interactions is with a family of unitary transformations $U_\Lambda$. Let $S_A$, $A \subset \mathbb{Z}^\nu$, be a family of anti-self-adjoint operators, periodic or translation invariant, with $S_A \in \mathcal{O}_A$ and $\|S\|_r < \infty$ for some $r > 0$. We set $S_\Lambda = \sum_{A \subset \Lambda} S_A$; then $U_\Lambda = \exp(S_\Lambda)$ is unitary. It is shown in [DFFR] that if $\|S\|_r$ is small enough then the unitarily equivalent Hamiltonians $\tilde{H}_\Lambda = U_\Lambda H_\Lambda U_\Lambda^{-1}$ define an interaction $\tilde{H} \in \mathcal{B}_{\tilde{r}}$ for some $\tilde{r} > 0$, and $\tilde{H}$ is equivalent to $H$. We now consider an interaction of the form $H = V + \lambda T$ which satisfies the following conditions:
(P1) The interaction $V$ is classical and of finite range. Moreover, we assume that $V$ is given by a translation-invariant m-potential. This last condition means that we can assume (if necessary by passing to a physically equivalent interaction) that there exists at least one configuration $\omega$ minimizing all $\Phi^0_X$, i.e.,
$$\Phi^0_X(\omega) = \min_{\omega'} \Phi^0_X(\omega'), \tag{2.22}$$
for all $X$. For any m-potential, the set of all configurations for which Eq. (2.22) holds is the set of ground states of $\Phi^0$.
(P2) The perturbation interaction $T$ is in some Banach space $\mathcal{B}_r$ for some $r > 0$.
Since, by condition (P1), the ground states can be determined locally, there is a corresponding decomposition of the Hilbert space $\mathcal{H}_A$ for all $A$:
$$\mathcal{H}_A = \mathcal{H}_A^{\mathrm{low}} \oplus \mathcal{H}_A^{\mathrm{high}}, \tag{2.23}$$
where $\mathcal{H}_A^{\mathrm{low}}$ is the subspace spanned by the ground states of $V$. We can decompose any operator $K_A \in \mathcal{B}(\mathcal{H}_A)$ according to its action on $\mathcal{H}_A^{\mathrm{low}}$ and $\mathcal{H}_A^{\mathrm{high}}$:
$$K_A = K_A^{ll} + K_A^{hh} + K_A^{lh}, \tag{2.24}$$
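with
$$K_A^{ll} \mathcal{H}_A^{\mathrm{low}} \subset \mathcal{H}_A^{\mathrm{low}}, \qquad K_A^{ll} \mathcal{H}_A^{\mathrm{high}} = 0,$$
$$K_A^{hh} \mathcal{H}_A^{\mathrm{high}} \subset \mathcal{H}_A^{\mathrm{high}}, \qquad K_A^{hh} \mathcal{H}_A^{\mathrm{low}} = 0,$$
$$K_A^{lh} \mathcal{H}_A^{\mathrm{low}} \subset \mathcal{H}_A^{\mathrm{high}}, \qquad K_A^{lh} \mathcal{H}_A^{\mathrm{high}} \subset \mathcal{H}_A^{\mathrm{low}}.$$
Accordingly we decompose any interaction $T$:
$$T = T^{ll} + T^{hh} + T^{lh}. \tag{2.25}$$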
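The block decomposition of an operator with respect to the low and high energy subspaces is easy to carry out in a basis adapted to that splitting. The Python sketch below assumes such a basis, with the low energy subspace spanned by the basis vectors whose indices are listed in `low`; the function name and the data layout (nested lists) are hypothetical.

```python
def block_decompose(K, low):
    """Split a square matrix K into (K_ll, K_hh, K_lh).

    K_ll acts within the low subspace, K_hh within the high subspace, and
    K_lh collects the off-diagonal blocks exchanging the two subspaces.
    """
    n = len(K)
    low = set(low)

    def pick(keep):
        # Keep the entry K[i][j] when keep(i, j) holds, and put 0 elsewhere.
        return [[K[i][j] if keep(i, j) else 0 for j in range(n)]
                for i in range(n)]

    K_ll = pick(lambda i, j: i in low and j in low)
    K_hh = pick(lambda i, j: i not in low and j not in low)
    K_lh = pick(lambda i, j: (i in low) != (j in low))
    return K_ll, K_hh, K_lh
```

The three parts add up to `K`, and only `K_lh` couples the two subspaces; this is the part that Theorem 2.1 pushes to higher order in $\lambda$.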
The following theorem shows that, for any integer $n \geq 1$, it is possible to construct an interaction $H^{(n)}$ equivalent to $H$ with the property that $H^{(n)}$ is block diagonal up to order $n$. Note that this is a constructive result: an algorithm is given in [DFFR] which allows one to construct the unitary transformations $U^{(n)}$ and the interactions $H^{(n)}$.
Theorem 2.1. Consider an interaction of the form
$$H = V + \lambda T, \tag{2.26}$$
where $V$ satisfies Condition (P1) and $T$ satisfies Condition (P2). For any integer $n \geq 1$ there are $r_n > 0$ and $\lambda_n > 0$ such that for $|\lambda| < \lambda_n$ there is an interaction $H^{(n)} = V + T^{(n)} \in \mathcal{B}_{r_n}$, equivalent to $H$, with
$$\|T^{(n)lh}\|_{r_n} = O(\lambda^{n+1}). \tag{2.27}$$
This theorem is useful to analyze the low temperature behavior of quantum spin systems when the ground states of $V$ have infinite degeneracy and $T$ lifts this degeneracy (totally or partially). Consider for example the typical case where the degeneracy is lifted in second order perturbation theory. In that case we may take $n = 1$ and we have $T^{(1)lh} = O(\lambda^2)$:
$$H^{(1)} = V + \sum_{j \geq 1} \lambda^j T_j^{(1)ll} + \sum_{j \geq 1} \lambda^j T_j^{(1)hh} + \sum_{j \geq 2} \lambda^j T_j^{(1)lh}. \tag{2.28}$$
We then decompose $H^{(1)} = \tilde{V} + \tilde{T}$ into a new “classical part” $\tilde{V}$ given by
$$\tilde{V} = V + \sum_{j=1}^{2} \lambda^j T_j^{(1)ll}, \tag{2.29}$$
and $\tilde{T}$ contains all remaining terms. The new perturbation satisfies the bounds $\tilde{T}^{ll} = O(\lambda^3)$, $\tilde{T}^{hh} = O(\lambda)$, and $\tilde{T}^{lh} = O(\lambda^2)$. If $\tilde{V}$ is a classical interaction with a sufficiently regular zero-temperature phase diagram, then Pirogov–Sinai techniques can be applied to study the phase diagrams of $\tilde{V} + \tilde{T}$ for sufficiently small $\lambda$ (see below). Note that this perturbation scheme is not only useful to analyze the low-temperature behavior of the model: the new “classical part” $\tilde{V}$ does not need to be classical at all. For example, if one applies this perturbation scheme to the Hubbard model at half-filling, $\tilde{V}$ is given by the Heisenberg model, and this gives a rigorous proof of the equivalence of both models up to controlled error terms; see [DFFR, DFF2].
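The lifting of a degeneracy at second order can be seen in the smallest possible instance of the Hubbard example, two sites at half filling. The sketch below is an illustration under textbook assumptions that are not taken from this paper: with hopping amplitude $t$ and on-site repulsion $U$, the three triplet states have energy 0, and the singlet sector reduces to the $2 \times 2$ block with rows $(0, -2t)$ and $(-2t, U)$. The exact singlet energy is compared with the second-order exchange energy $4t^2/U$. Function names are hypothetical.

```python
import math

def singlet_ground_energy(t, U):
    """Lowest eigenvalue of the 2x2 singlet block [[0, -2t], [-2t, U]]."""
    trace = U
    det = -4.0 * t * t
    return 0.5 * (trace - math.sqrt(trace * trace - 4.0 * det))

def exchange_second_order(t, U):
    """Singlet-triplet splitting predicted by second-order perturbation theory."""
    return 4.0 * t * t / U

def exact_splitting(t, U):
    """Exact singlet-triplet splitting (the triplet energy is 0)."""
    return -singlet_ground_energy(t, U)
```

For $t/U$ small the two splittings agree up to a correction of order $t^4/U^3$, in line with the statement that the effective low-energy interaction is of Heisenberg type up to controlled error terms.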
3. Phase Diagrams, Contour Models, and Pirogov–Sinai Theory
A phase diagram in thermodynamics is a partition of a space of physical parameters into domains corresponding to phases; the free energy varies very smoothly inside a domain. However, its first or higher-order derivatives may have discontinuities when crossing the boundary between two domains, and in this case one talks of phase transitions. The first proof of a phase transition was proposed by Peierls for the Ising model [Pei]. It was extended by Pirogov and Sinai [PS, Sin] to situations where different phases are not related by a symmetry. Important extensions and simplifications of the Pirogov–Sinai theory include Kotecký and Preiss [KP], Zahradník [Zah], Bricmont et al. [BKL] and [BS], Borgs and Imbrie [BI], Borgs and Kotecký [BK, BK2]. An exposition of the Pirogov–Sinai theory can be found in [EFS]. Another extension of the Peierls argument was done in Fröhlich and Lieb [FL] using reflection positivity [FSS, DLS].
3.1. Phase diagrams. We consider the Banach space $\mathcal{B}_r$ of periodic interactions, with the norm defined in (2.6). Here $r$ is any positive number, but further assumptions (bounds for the weights of the contours, see below) can be verified in given models only if $r$ is large enough. To a given interaction $H \in \mathcal{B}_r$ and inverse temperature $\beta$ we associate the set of all translation invariant (or periodic) KMS states or, equivalently [Ara, Isr], the set of all tangent functionals to the free energy $f(H)$. The set of periodic KMS states forms a simplex, so that it is enough to describe the extremal states, or the corresponding tangent functionals. We denote the set of extremal states by $E^\beta(H)$. In order to define a phase diagram we consider a smooth $(p-1)$-dimensional manifold in the Banach space $\mathcal{B}_r$ of periodic interactions; it is described by a map $u \mapsto H^u$ from a connected open set $U \subset \mathbb{R}^{p-1}$ into $\mathcal{B}_r$. For $m = 1, 2, 3, \dots$, we introduce $\mathcal{E}^{(m)} = \{H \in \mathcal{B}_r : |E^\beta(H)| = m\}$; accordingly, we partition the set $U$ as
$$U = \bigcup_{m=1}^{\infty} U^{(m)}, \tag{3.1}$$
where $u \in U^{(m)}$ iff $H^u \in \mathcal{E}^{(m)}$. The decomposition (3.1) is called the phase diagram of $H^u$. The phase diagram of $H^u$, $u \in U \subset \mathbb{R}^{p-1}$, is said to satisfy the Gibbs phase rule if the following conditions hold. Here, we call “boundary” of $U^{(i)}$ the set $(\bar{U}^{(i)} \setminus U^{(i)}) \cap U$, with $\bar{U}^{(i)}$ the closure of $U^{(i)}$.
(i) $U = U^{(1)} \cup \cdots \cup U^{(p)}$.
(ii) (a) $U^{(1)}$ consists of $p$ connected components, each of which is a $(p-1)$-dimensional manifold. The boundary of $U^{(1)}$ is $U^{(2)} \cup \cdots \cup U^{(p)}$.
(b) $U^{(2)}$ consists of $\binom{p}{2}$ connected components, each of which is a $(p-2)$-dimensional manifold. The boundary of $U^{(2)}$ is $U^{(3)} \cup \cdots \cup U^{(p)}$.
(c) $U^{(q)}$ consists of $\binom{p}{q}$ connected components, each of which is a $(p-q)$-dimensional manifold. The boundary of $U^{(q)}$ is $U^{(q+1)} \cup \cdots \cup U^{(p)}$.
(d) $U^{(p)}$ consists of a single point $u_0$.
In other words, the phase diagram of $H^u$ satisfies the Gibbs phase rule iff it is homeomorphic to a connected, open neighborhood of the boundary of the positive octant of $\mathbb{R}^p$, in such a way that $u_0$ is mapped onto the origin, $U^{(p-1)}$ is mapped onto the union of axes $\cup_i \{a_i > 0, a_j = 0, j \neq i\}$, and so on.
Connected components of $U^{(1)}$ are the one-phase regions, or pure phase regions; $U^{(2)}$ is the region of coexistence of two phases, ..., $U^{(p)}$ is the point of coexistence of all $p$ phases. We will call a phase diagram which satisfies the Gibbs phase rule regular if the free energy is a real analytic function of $u$ in each one-phase region, and if all connected components of the manifolds $U^{(j)}$ are smooth ($C^1$).
3.2. Contour models. A contour $\mathcal{A}$ is a pair $(A, \alpha)$, where $A \subset \mathbb{Z}^\nu$ is a finite connected set and is the support of $\mathcal{A}$; to describe $\alpha$, let us introduce the closed unit cell $C(x) \subset \mathbb{R}^\nu$ centered at $x$, i.e. $C(x) = \{y \in \mathbb{R}^\nu : |y - x|_\infty \leq \tfrac{1}{2}\}$. The boundary $B(A)$ of $A \subset \mathbb{Z}^\nu$ is the union of plaquettes
$$B(A) = \{C(x) \cap C(y) : x \in A, y \notin A\}. \tag{3.2}$$
The boundary $B(A)$ decomposes into connected components; each connected component $b$ is given a label $\alpha_b \in \{1, \dots, p\}$, and $\alpha = (\alpha_b)$. Let $\Lambda \subset \mathbb{Z}^\nu$ be finite, with periodic boundary conditions. A set of contours $\{\mathcal{A}_1, \dots, \mathcal{A}_k\}$ is admissible iff
• $A_i \subset \Lambda$, and $\mathrm{dist}(A_i, A_j) \geq 1$ if $i \neq j$.
• Labels $\alpha_j$ are matching in the following sense. Let $W = \Lambda \setminus \cup_{j=1}^k A_j$; then each connected component of $W$ must have the same label on its boundaries.
For $j \in \{1, \dots, p\}$, let $W_j$ be the union of all connected components of $W$ with label $j$ on their boundaries. For each $j \in \{1, \dots, p\}$, we give ourselves a complex function $g_j^{\beta,u}$ (“free energy of a restricted ensemble”), that is real analytic in $u \in U$. We suppose that the limit $\beta \to \infty$ of $g_j^{\beta,u}$ exists, and we write
$$e_i^u = \lim_{\beta \to \infty} \mathrm{Re}\, g_i^{\beta,u}, \qquad 1 \leq i \leq p, \tag{3.3}$$
$$e_0^u = \min_i e_i^u. \tag{3.4}$$
We consider the partition function (2.8) for an interaction $H^u = V^u + T$, where the periodic interaction $T$ is a perturbation of $V^u$. We assume that the partition function can be rewritten as
$$Z_\Lambda^{\beta,u,T} = \sum_{\{\mathcal{A}_1, \dots, \mathcal{A}_k\}} \prod_{j=1}^{k} w^{\beta,u,T}(\mathcal{A}_j) \prod_{i=1}^{p} e^{-\beta g_i^{\beta,u} |W_i|}, \tag{3.5}$$
where the sum is over admissible sets of contours in $\Lambda$.¹
¹ The sum includes the case $k = 0$, and the corresponding term is $\sum_{j=1}^{p} e^{-\beta g_j^{\beta,u} |\Lambda|}$. It is however irrelevant, since it does not contribute to the infinite-volume free energy (3.6).
The weight $w^{\beta,u,T}(\mathcal{A})$ of a contour $\mathcal{A}$ is a complex function of $\beta$, $u$, and $T$, that behaves nicely for $\beta$ large and $T$ in a neighborhood of 0. Precisely, we assume that there exists a set $\mathcal{W} \subset \mathbb{R}_+ \times \mathcal{B}_r$, that is open and connected, and whose closure contains $(\infty, 0)$; furthermore, we suppose that for all $u \in U$ and all $(\beta, T) \in \mathcal{W}$, and all contours $\mathcal{A}$,
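• $w^{\beta,u,T}$ is periodic with period $\ell$, i.e. we have $w^{\beta,u,T}(\tau_a \mathcal{A}) = w^{\beta,u,T}(\mathcal{A})$ for all $a \in (\ell\mathbb{Z})^\nu$ and all $\mathcal{A}$. Here $\tau_a$ is the translation operator.
• $|w^{\beta,u,T}(\mathcal{A})| \leq e^{-\beta e_0^u |A|} e^{-\tau |A|}$ for a large enough constant $\tau$ (depending on $\nu$, $p$, and $\ell$). Furthermore,
$$\Bigl| \frac{\partial}{\partial u_i} w^{\beta,u,T}(\mathcal{A}) \Bigr| \leq \beta |A| C \, e^{-\beta e_0^u |A|} e^{-\tau |A|}$$
and
$$\Bigl| \frac{\partial}{\partial \eta} w^{\beta,u,T+\eta K}(\mathcal{A}) \Bigr| \leq \beta |A| C \|K\|_r \, e^{-\beta e_0^u |A|} e^{-\tau |A|}$$
for a uniform constant $C$.
• $\lim_{\beta \to \infty} \lim_{T \to 0} w^{\beta,u,T}(\mathcal{A}) = 0$. This means that the weights represent the correction to the situation ($\beta = \infty$, $T = 0$).
• $w^{\beta,u,T}(\mathcal{A})$ is real analytic in $u$; for all $K \in \mathcal{B}_r$, $w^{\beta,u,T+\eta K}(\mathcal{A})$ is real analytic in $\eta$ in a neighborhood of 0 (the neighborhood depends on $K$).
Finally, the free energy is
$$f^{\beta,u,T} = -\frac{1}{\beta} \lim_{\Lambda \nearrow \mathbb{Z}^\nu} \frac{1}{|\Lambda|} \log Z_\Lambda^{\beta,u,T}. \tag{3.6}$$
We also assume the following properties for $f^{\beta,u,T}$:
• $f^{\beta,u,T}$ is real, and concave as a function of $T$;
• whenever $H^u + T = H^{u'} + T'$, we have
$$f^{\beta,u,T} = f^{\beta,u',T'}. \tag{3.7}$$
Although these properties seem difficult to verify in the context of a contour model, they are usually clear in the original physical model.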
3.3. The Pirogov–Sinai theory. The results of the Pirogov–Sinai theory are usually presented in terms of existence of many Gibbs states for a given interaction. However, it is more convenient to think of the Pirogov–Sinai theory as to express the free energy in a suitable form for the description of first-order phase transitions: the free energy is given as the minimum of C 1 functions (“metastable free energies”), that intersect themselves by making angles, hence a first-order phase transition when varying parameters so as to cross an intersection. The free energy at zero temperature is given by (3.4); in typical situations this is the minimum over energies of some important configurations (the “potential ground states”). The Pirogov–Sinai theory shows that in contour models, this structure extends at low temperatures. In the quantum situation one is also interested in adding a perturbation to a “nice” model; the metastable free energies then depend not only on β, but also on the quantum perturbation. We claim that the Pirogov–Sinai theory allows to construct metastable free energies that satisfy the following properties.
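The picture of a free energy given as the minimum of several smooth branches can be illustrated with a toy computation. The Python sketch below is only a caricature of the structure described above: the branches are arbitrary user-supplied functions of the parameter $u$ (here affine, standing in for the zero-temperature energies), and the function reports which branches attain the minimum at a given $u$, i.e. which phases would coexist there. The names and the tolerance are hypothetical.

```python
def coexisting_phases(u, branches, tol=1e-9):
    """Return (minimum value, set of labels of branches attaining it) at u.

    `branches` is a list of functions u -> float; labels start at 1.
    """
    values = [f(u) for f in branches]
    f_min = min(values)
    labels = {i + 1 for i, v in enumerate(values) if v - f_min <= tol}
    return f_min, labels

# Three toy branches on a two-dimensional parameter space (p = 3):
# all three coincide at u = (0, 0), and the matrix of derivatives of the
# differences with the third branch is the identity, hence invertible.
toy_branches = [
    lambda u: u[0],
    lambda u: u[1],
    lambda u: 0.0,
]
```

With these branches, a single label is returned on open regions, two labels on the half-lines where two branches cross, and all three at the origin, which is the pattern required by the Gibbs phase rule for $p = 3$.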
Properties of the metastable free energies. We consider a contour model that satisfies the structure described in Sect. 3.2. Then there exist $p$ real functions $f_i^{\beta,u,T}$ for $(\beta, T, u) \in \mathcal{W} \times U$, such that
(a) $f^{\beta,u,T} = \min_i f_i^{\beta,u,T}$;
(b) $\lim_{\beta \to \infty} \lim_{T \to 0} f_i^{\beta,u,T} = e_i^u$, and $\lim_{\beta \to \infty} \lim_{T \to 0} \frac{\partial}{\partial u_j} f_i^{\beta,u,T} = \frac{\partial}{\partial u_j} e_i^u$;
(c) for all $K \in \mathcal{B}_r$, there exists a neighborhood $N_K$ of 0 such that $f_i^{\beta,u,T+\eta K}$ is $C^1$ as a function of $(u, \eta)$ in $U \times N_K$, and $|\frac{\partial}{\partial \eta} f_i^{\beta,u,T+\eta K}| \leq C \|K\|_r$ for a constant $C$ depending on $\nu$, $p$, $\ell$ only;
(d) $f_i^{\beta,u,T}$ is a real analytic function of $u$ in $M_{\{i\}} = \{u : f_i^{\beta,u,T} < f_j^{\beta,u,T} \ \forall j \neq i\}$.
Notice that point (d) implies that the free energy $f^{\beta,u,T}$ is a real analytic function of $u$ in $\cup_i M_{\{i\}}$ (which is the region of uniqueness, as will be seen below). The proof of these properties involves the full artillery of the Pirogov–Sinai theory. Item (c) is not really standard and may appear as a superfluous technicality, but it plays a role when establishing the properties of the phase diagram, see Theorem 3.1 below. Since the present paper is only aimed at studying a special class of quantum models, we content ourselves with an outline of the proof, so as to make it plausible for readers who have knowledge of the details of the Pirogov–Sinai theory. A review of the Pirogov–Sinai theory is expected to appear shortly and will contain a detailed proof of these properties.
Sketch of the proof of these properties. We heavily rely on [BKU], which itself follows [PS, Sin, Zah, BI, BK, BK2]. Our metastable free energies are defined as the real part of the metastable free energies of [BKU], which are complex in general. The first step consists in defining the metastable free energies. This can be done by introducing truncated contour activities and truncated partition functions following the inductive procedure of [BKU], Eqs. (5.6)–(5.12). One obtains metastable free energies $f_j^{(n)}$ (that depend on $\beta$, $u$, $T$). One can then prove the claims of Lemma A.1 i), iii), iv), v), vi) of [BKU]. We then set $f_j^{\beta,u,T} = \lim_{n \to \infty} f_j^{(n)}$. At this point we have well-defined metastable free energies depending on $\beta$, $u$ and $T$ (that is, they are functionals on the Banach space of interactions), and the free energy of the system is given by the minimum of the metastable free energies, as stated in item (a). It is also clear that $\lim_{\beta \to \infty} \lim_{T \to 0} f_i^{\beta,u,T} = e_i^u$, and that $f_i^{\beta,u,T}$ is real analytic in $u$ on $M_{\{i\}}$. What remains to be done is to check differentiability properties. For given $T$ and $K$, we consider $f_j^{\beta,u,T+\eta K}$ as a function of $(u, \eta)$. This is a mild complication of the situation in [BKU], since the metastable free energies here depend on $p$ parameters instead of $p - 1$. One then gets items ii) and vii) of Lemma A.1: the partial derivatives with respect to $\eta$ of the truncated contour activities and of the partition function with given external label satisfy the claims of the lemma with a constant $C_0 \|K\|_r$ instead of $C_0$. Finally, the metastable free energies are given as convergent series of clusters of contours, the weights of those obeying suitable bounds. This leads to item (c). □
We show now that these metastable free energies allow for a complete characterization of tangent functionals, under the extra assumption that the situation at zero temperature and without perturbation satisfies the Gibbs phase rule in a strong sense.
The stronger condition for the Gibbs phase rule is that, for some u0 ∈ U, we have that all “potential ground state energies” are equal, eiu0 = eju0 for all i, j , and that the matrix of derivatives
∂ (3.8) eiu − epu 1 i,j p−1 ∂uj has an inverse that is uniformly bounded. Actually, energies eiu may not be differentiable; β,u in this case, we consider the same matrix with Re gi instead of eiu , and we suppose that it has an inverse for all β large enough, the inverse matrix being uniformly bounded with respect to u ∈ U, and β const. Theorem 3.1 (Stability of the phase diagram). Assume that there exist metastable free β,u,T energies fi , 1 i p, that satisfy all points (a)–(d) of the properties above. We assume in addition that the strong version of the Gibbs phase rule, described above, is satisfied. Then for β large enough and T r small enough (depending on p and on the bound of the inverse of the matrix of derivatives (3.8)), there exists U ⊂ U such that the phase diagram for H u + T , u ∈ U , at inverse temperature β, satisfies the Gibbs phase rule and is regular. Theorem 3.1 states that there exists u0 ∈ U such that the set of tangent functionals to the free energy at H u0 + T is a simplex with p extremal points. More generally, we have the decomposition U = U (1) ∪ · · · ∪ U (p) such that for u ∈ U (q) , the set of tangent functionals at H u + T is a q-dimensional simplex. This “completeness” of the phase diagram was addressed in [Zah] and [BW]. The approach was however different and involved studying the Gibbs states, which is more intricate and does not easily extend to the quantum case. It is simpler to look at tangent functionals, and then to use existing results on their equivalence with DLR or KMS states. Notice that the Pirogov–Sinai theory also provides various extra information, such as the fact that the limit of U (q) , as T → 0 and β → ∞, is equal to U (q) . Also, the extremal equilibrium states can be shown to be exponentially clustering. We do not claim these properties here however, because doing so would require extra assumptions and technicalities in the description of the abstract contour model. Proof of Theorem 3.1. 
Items (b) and (c) of the properties of metastable free energies (with $\eta = 0$) imply that there exists $u_0'$ such that $f_i^{\beta,u_0',T} = f_j^{\beta,u_0',T}$ for all $i, j$, and that the matrix of derivatives
$$\Bigl( \frac{\partial}{\partial u_j} \bigl[ f_i^{\beta,u,T} - f_p^{\beta,u,T} \bigr] \Bigr)_{1 \leq i,j \leq p-1} \tag{3.9}$$
has a bounded inverse, uniformly in $u$ in a neighborhood $U'$ of $u_0'$. Let us define
$$M_i = \{u \in U' : f_i^{\beta,u,T} = \min_j f_j^{\beta,u,T}\}, \tag{3.10}$$
and, for $Q \subset \{1, \dots, p\}$,
$$M_Q = \bigcap_{i \in Q} M_i \setminus \bigcup_{i \notin Q} M_i \tag{3.11}$$
(notice that $M_{\{i\}} \subset M_i$). By the implicit function theorem, each $M_Q$ is described by a $C^1$ function from an open subset of $\mathbb{R}^{p-|Q|}$ into $U'$. If we set $U'^{(q)} = \cup_{|Q|=q} M_Q$ the phase diagram satisfies the Gibbs phase rule, provided there are exactly $|Q|$ tangent functionals at $H^u + T$ for each $u \in M_Q$. Each metastable free energy $f_j^{\beta,u,T}$, $j \in Q$, defines a tangent functional $\alpha_j$: for all $K \in \mathcal{B}_r$, we set $\alpha_j(K) = \frac{\partial}{\partial \eta} f_j^{\beta,u,T+\eta K} |_{\eta=0}$. Notice that item (c) ensures boundedness of the tangent functional.² We show now that these tangent functionals are linearly independent, and that any other tangent functional is a linear combination of these ones. We examine the manifold where $q$ phases coexist; without loss of generality, we can choose $\tilde{u} \in M_Q$ with $Q = \{1, \dots, q\}$. The determinant of (3.9) can be written as a linear combination of determinants of
$$\Bigl( \frac{\partial}{\partial u_{k_j}} \bigl[ f_i^{\beta,\tilde{u},T} - f_q^{\beta,\tilde{u},T} \bigr] \Bigr)_{1 \leq i,j \leq q-1}, \tag{3.12}$$
with $k_1, \dots, k_{q-1}$ being $q - 1$ different indices. Since the determinant of (3.9) differs from 0, at least one of the determinants in the previous equation differs from 0. Without loss of generality we can assume that
$$\Bigl( \frac{\partial}{\partial u_j} \bigl[ f_i^{\beta,\tilde{u},T} - f_q^{\beta,\tilde{u},T} \bigr] \Bigr)_{1 \leq i,j \leq q-1} \tag{3.13}$$
is not singular. Our analysis is local, so we can take $\tilde{u} = 0$ and $H^u = H^0 + \sum_{j=1}^{p-1} u_j K^j$. Then (3.7) implies that $\alpha_j(K^i) = \frac{\partial}{\partial u_i} f_j^{\beta,u,T} |_{u=0}$, and non-singularity of (3.13) shows that $\alpha_j$, $1 \leq j \leq q$, are linearly independent. Furthermore, it also implies that for all tangent functionals $\alpha'$ the system of equations for $\xi = (\xi_1, \dots, \xi_q)$,
$$\alpha'(K^i) = \sum_{j=1}^{q} \xi_j \alpha_j(K^i), \qquad i = 1, \dots, q - 1, \tag{3.14}$$
has a unique solution with $\sum_j \xi_j = 1$. Now we consider any $K \in \mathcal{B}_r$; we define $g_j(u, \eta) = f_j^{\beta,u,T+\eta K}$, $1 \leq j \leq q$, and
$$g(u, \eta) = \begin{pmatrix} g_1(u, \eta) - g_q(u, \eta) \\ \vdots \\ g_{q-1}(u, \eta) - g_q(u, \eta) \end{pmatrix}. \tag{3.15}$$
We have $g(0, 0) = 0$, $\frac{\partial}{\partial u} g(0, 0)$ is an isomorphism, and $g(u, \eta)$ is a map of class $C^1$ by item (c) of the properties of metastable free energies. By the implicit function theorem
² One may wonder whether the functional $\alpha_j$ is linear. It actually is, because $\alpha_j$ can be obtained as the limit of linear functionals that are tangent to the free energy, uniquely defined for all points of $M_{\{j\}}$, a region of parameters where the concave free energy has a unique tangent functional.
there exists a map $u(\eta)$ such that $g(u(\eta), \eta) = 0$. We introduce the interactions
$$R(\eta) = K + \frac{1}{\eta} \sum_{j=1}^{q-1} u_j(\eta) K^j, \tag{3.16}$$
$$R = \lim_{\eta \to 0} R(\eta) = K + \sum_{j=1}^{q-1} u_j'(0) K^j. \tag{3.17}$$
Then using (3.7) we have
$$f^{\beta,0,T+\eta R(\eta)} = f_1^{\beta,0,T+\eta R(\eta)} = \cdots = f_q^{\beta,0,T+\eta R(\eta)}. \tag{3.18}$$
Differentiating with respect to $\eta$, we obtain (recall that $\alpha'$ is tangent to $f^{\beta,0,T+\eta R(\eta)}$ at $\eta = 0$)
$$\alpha'(R) = \alpha_1(R) = \cdots = \alpha_q(R). \tag{3.19}$$
Then obviously $\alpha'(R) = \sum_j \xi_j \alpha_j(R)$, and it follows by linearity of the tangent functionals that
$$\alpha'(K) = \sum_{j=1}^{q} \xi_j \alpha_j(K). \tag{3.20}$$
□
4. Results of the Quantum Pirogov–Sinai Theory We summarize in this section the results obtained in [BKU, DFF, DFFR, KU], and in the present paper. All results concern the situation where the interaction has the form H = V + T , where V is a classical interaction satisfying the standard Pirogov–Sinai framework, and T is a small perturbation. The temperature will be assumed to be small. The results however split into four classes, according to whether we use the perturbation methods of [DFFR] (Sect. 2.2), and whether we include high temperature expansions to analyze phases at intermediate temperatures. In this section, we implicitly assume all properties of the metastable free energies, see Subsect. 3.3, to be valid – without these properties the statements below would not include completeness, i.e. we could not ascertain to have identified all the periodic Gibbs states of the systems. 4.1. Quantum perturbation of classical model with finitely many ground states. In this case the classical interaction V has finitely many ground states and the phase diagram of V + T is, at low temperatures and for sufficiently small T a small deformation of the zero temperature phase diagram of V . The extension of the Pirogov–Sinai theory to this class of quantum systems goes back to [Pir] and was proved in [BKU, DFF]. ν (a) Structure. We denote by . = {1, . . . , M}Z the space of classical configurations; the dimension ν of the physical space is always supposed to be bigger or equal to 2. The interaction has the form H = V +T , where V is a block interaction and is diagonal with
respect to the basis of classical configurations: if A = U (x) ≡ {y : |y − x|∞ R} for some x ∈ Zν , VA |e&ω = 0x (ωU (x) ) |e&ω ,
(4.1)
and VA = 0 if there is no x with U (x) = A. The function 0x depends on µ ∈ U ⊂ Rp−1 , and we assume that its derivatives ∂µ∂ j 0x (ωU (x) ) are bounded uniformly in x, µ, ω, j .
A finite set $G = \{g^{(1)}, \dots, g^{(p)}\} \subset \Omega$ of periodic configurations is given that contains all ground states of $V$ for all $\mu$ (see the precise assumption below). We write $G_A = \{g_A : g \in G\}$. We suppose that $\Phi_x(g_{U(x)})$ is independent of $x$ for all $g \in G$; this value is denoted by $e^\mu_g$ (it is the mean energy of the configuration $g$).

(b) Assumptions.

(A1) A gap separates the excitations: for all $\omega_{U(x)} \notin G_{U(x)}$,
\[ \Phi_x(\omega_{U(x)}) - \min_{g \in G} \Phi_x(g_{U(x)}) \geq \Delta \]
(uniformly in $\mu$).

(A2) The zero temperature phase diagram is (linearly) regular: there is $\mu_0 \in \mathcal{U}$ such that $e^{\mu_0}_g = e^{\mu_0}_{g'}$ for all $g, g' \in G$, and the inverse of the matrix of derivatives $M_G$, see (3.8), is uniformly bounded.

(c) Properties of Gibbs states.

Theorem 4.1. Assume (A1) and (A2) hold true. There exist $\beta_0, c < \infty$ (depending on $\nu$, $R$, $p$, $M$ and on the periods of $\{g^{(j)}\}$ and $H$ only) such that if $\beta\Delta \geq \beta_0$ and $\|T\|_c/\Delta \leq 1$, the phase diagram of the quantum model satisfies the Gibbs phase rule and is regular in a neighborhood $\mathcal{U}' \subset \mathcal{U}$ of $\mu_0$. In the single phase region, i.e. if $\mu \in M_\beta(\{g\})$, the KMS state $w^{\beta,\mu,T}(\cdot)$ is close to the ground state $g$: for all $K \in \mathcal{O}_A$,
\[ \lim_{\beta\to\infty,\ \|T\|_r \to 0} w^{\beta,\mu,T}(K) = \langle e_g | K | e_g \rangle. \]

The condition $\|T\|_c/\Delta \leq 1$ means that $T$ is a perturbation with respect to $V$; $c$ plays the role of the perturbative parameter: from Definition (2.6) of the norm $\|\cdot\|_c$, $\|T_A\|$ must be very small if $c$ is very large. The proof of this theorem follows from [BKU, DFF].

4.2. Models with infinite degeneracy. Consider a model whose classical part has infinitely many ground states, and a perturbation which lifts this degeneracy completely. The perturbation methods of [DFFR] (see Sect. 2.2) permit one in certain cases to analyze this by constructing an equivalent interaction with a new classical part which has finitely many ground states. In this case the new perturbation has a slightly more complicated form than in Sect. 4.1, and the following theorem deals with this situation. This situation was considered in [DFFR] (for a different approach see [KU]).

(a) Structure. The space of classical configurations is again $\Omega = \{1, \dots, M\}^{\mathbb{Z}^\nu}$. We consider two sets $G, D \subset \Omega$ with $D \subset G$; $D = \{d^{(1)}, \dots, d^{(p)}\}$ is a finite set of periodic configurations, while $G$ may be infinite and will represent the configurations of low energy.

For $A \subset \mathbb{Z}^\nu$, the Hilbert space $\mathcal{H}_A$ has the decomposition $\mathcal{H}_A = \mathcal{H}_A^{\rm low} \oplus \mathcal{H}_A^{\rm high}$, where $\mathcal{H}_A^{\rm low}$ is the subspace spanned by the low energy configurations $g_A \in G_A$. The interaction has the form $H = V + T$, where $V$ is a classical block
interaction with uniformly bounded derivatives $\frac{\partial}{\partial\mu_j}\Phi_x(\omega_{U(x)})$, and $T$ is a perturbation that is subject to some restrictions, see the assumptions below.

(b) Assumptions.

(B1) A gap separates high and low energies: for all $\omega_{U(x)} \notin G_{U(x)}$,
\[ \Phi_x(\omega_{U(x)}) - \max_{g \in G} \Phi_x(g_{U(x)}) \geq \Delta_0. \]

(B2) Gap with the ground states: we assume that $\Phi_x(d_{U(x)})$ is independent of $x$ for $d \in D$, and for all $\omega_{U(x)} \notin D_{U(x)}$,
\[ \Phi_x(\omega_{U(x)}) - \min_{d \in D} \Phi_x(d_{U(x)}) \geq \Delta \]
(and we assume that $\Delta \leq \Delta_0$).

(B3) The perturbation may be decomposed as $T = K + K' + K''$; for all $A$,
\[ K'_A \mathcal{H}_A^{\rm low} = 0, \qquad K'_A \mathcal{H}_A^{\rm high} \subset \mathcal{H}_A^{\rm high}; \qquad K''_A \mathcal{H}_A^{\rm low} \subset \mathcal{H}_A^{\rm high}, \qquad K''_A \mathcal{H}_A^{\rm high} \subset \mathcal{H}_A^{\rm low} \]
(there is no assumption on $K$).$^3$

(B4) The zero temperature phase diagram is (linearly) regular, i.e. all energies $e^\mu_d$ are equal for some $\mu_0 \in \mathcal{U}$, and the matrix $M_D$ [see (3.8)] has a uniformly bounded inverse.

$^3$ Motivation comes from (2.25). It is however slightly more general, and it is just what is required in the proof of Theorem 4.2.

(c) Properties of Gibbs states.

Theorem 4.2. Assume (B1)–(B4) hold true. There exist $\beta_0, c < \infty$ (depending on $\nu$, $R$, $p$, $M$ and on the periods of $\{d^{(j)}\}$ and $H$ only) such that if $\beta\Delta \geq \beta_0$, $\|K\|_c/\Delta \leq 1$, $\|K'\|_c/\Delta_0 \leq 1$, $\|K''\|_c/\Delta_0 \leq 1$, the phase diagram of the quantum model satisfies the Gibbs phase rule and is regular in $\mathcal{U}' \subset \mathcal{U}$, $\mathcal{U}' \ni \mu_0$. In the single phase region, i.e. if $\mu \in M_\beta(\{d\})$, the KMS state $w^{\beta,\mu,T}(\cdot)$ is close to the ground state $d$: for all $K \in \mathcal{O}_A$,
\[ \lim_{\beta\to\infty,\ \|T\|_r \to 0} w^{\beta,\mu,T}(K) = \langle e_d | K | e_d \rangle. \]

The proof of this theorem is given in [DFFR]. A somewhat different method yielding similar results was developed later in [KU].

4.3. Combined high and low temperature expansions. Here we consider models whose classical part $V$ has partially ordered ground states, typically described by periodic configurations of holes and particles but still with infinite degeneracy due to, e.g., degeneracy of the spin at each site. Together with the quantum perturbation the system may have a continuous symmetry. We will suppose that the temperature is low and, in addition, that $\beta\|T\|_c$ is actually small (i.e. the temperature is large compared to $T$), and we will prove that in this case one phase corresponds to each periodic configuration of holes and particles and that in this phase the spin degrees of freedom are in a disordered phase. This situation has many similarities with that of [BKL], and could be called "a theory of restricted ensembles in quantum lattice systems".
(a) Structure. As before, let $\Omega = \{1, \dots, M\}^{\mathbb{Z}^\nu}$. Intermediate temperature phases will be characterized by "motives" giving partial information on the underlying configurations. In order to describe this, we consider a partition of $\{1, \dots, M\}$:
\[ \{1, \dots, M\} = \bigcup_{j=1}^{N} I_j \quad \text{with } I_i \cap I_j = \emptyset \ (i \neq j). \tag{4.2} \]
We denote $\mathbf{N} = \{1, \dots, N\}$ (and $\mathcal{N} \equiv \mathbf{N}^{\mathbb{Z}^\nu}$). For $n \in \mathcal{N}$, we write $\Omega_n = \{\omega \in \Omega : \omega_x \in I_{n_x}\ \forall x\}$. Let $G = \{g^{(1)}, \dots, g^{(p)}\} \subset \mathcal{N}$ be a finite set of periodic configurations; this is the set of motives, and a pure phase will be associated with each of these configurations. We write $\Omega_G = \cup_{g \in G}\, \Omega_g$. The interaction has the form $H = V + T$, where $V$ is a classical block interaction with uniformly bounded derivatives w.r.t. $\mu$, and $T$ is a perturbation. We introduce restricted partition functions for each $g \in G$: let
\[ Z^g_\Lambda = \sum_{\omega_\Lambda \in \Omega_{g,\Lambda}} e^{-\beta \sum_{x,\, U(x) \subset \Lambda} \Phi_x(\omega_{U(x)})} \tag{4.3} \]
and
\[ h^{\beta,\mu}_g = -\frac{1}{\beta} \lim_{\Lambda \nearrow \mathbb{Z}^\nu} \frac{1}{|\Lambda|} \log Z^g_\Lambda. \tag{4.4} \]
The ground energies are $e^\mu_g = \lim_{\beta\to\infty} h^{\beta,\mu}_g$, $g \in G$.

(b) Assumptions.

(C1) For all configurations $\omega_{U(x)} \notin \Omega_{G,U(x)}$, we have
\[ \Phi_x(\omega_{U(x)}) - \min_{\omega' \in \Omega_G} \Phi_x(\omega'_{U(x)}) \geq \Delta. \]
Moreover, we assume that
\[ \min_{\omega_{U(x)} \in \Omega_{g,U(x)}} \Phi_x(\omega_{U(x)}) = e^\mu(g) \]
independently of $x$, for all $g \in G$.

(C2) We need a condition that ensures that no phase transition takes place in a restricted ensemble $\Omega_g$; in other words, spatial correlations should decay quickly enough. The following condition is stronger, and amounts to saying that there is no correlation between different sites. For all $g \in G$, we suppose that there exists an on-site interaction $\Phi^g$ such that for all $x$:
\[ \Phi_x(\omega_{U(x)}) = \Phi^g_x(\omega_x) \quad \text{for all } \omega \in \Omega_g. \]

(C3) The zero temperature phase diagram is regular with $e^{\mu_0}_g = e^{\mu_0}_{g'}$, $g, g' \in G$, for some $\mu_0 \in \mathcal{U}$, and the matrix $M_G$, see (3.8), has a uniformly bounded inverse.$^4$

$^4$ If $\{e^\mu_g\}$ are not $C^1$, we consider the matrix of derivatives of $h^{\beta,\mu}_g$ for $\beta$ large; it must have an inverse that is bounded uniformly w.r.t. $\mu$ and large $\beta$.
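To make the role of the on-site condition (C2) concrete: when the interaction restricted to a motive is on-site, the sum in (4.3) factorizes over sites, so the restricted free energy (4.4) can be evaluated site by site. The sketch below is only an illustration under an extra simplifying assumption that is not made in the text, namely a motive that is the same at every site; the function name and the sample energies are hypothetical.

```python
import math

def restricted_free_energy(site_energies, beta):
    """Per-site restricted free energy h = -(1/beta) * log(sum_w exp(-beta * Phi(w))).

    `site_energies` lists the on-site energies Phi(w) of the states w allowed
    by the motive at one site (illustrative input, not taken from the paper).
    """
    e_min = min(site_energies)
    # Factor out the minimum so that the exponentials do not overflow at large beta.
    s = sum(math.exp(-beta * (e - e_min)) for e in site_energies)
    return e_min - math.log(s) / beta

# Hypothetical motive with two degenerate states per site (e.g. spin up / spin down).
energies = [-1.0, -1.0]
for beta in (1.0, 10.0, 100.0):
    print(beta, restricted_free_energy(energies, beta))
```

For two degenerate states the result is $e - \beta^{-1}\log 2$: the free energy approaches the ground energy as $\beta \to \infty$, and the entropy term $\beta^{-1}\log 2$ is what distinguishes a restricted ensemble from a single ground configuration.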
(c) Gibbs states at intermediate temperature.

Theorem 4.3. Assume (C1)–(C3) hold true. There exist $\beta_0, c < \infty$ (depending on $\nu$, $R$, $p$, $M$ and on the periods of $\{g^{(j)}\}$ and $H$ only) such that if $\beta_0 \leq \beta\Delta < \infty$ and $\beta\|T\|_c \leq 1$, the phase diagram satisfies the Gibbs phase rule and is regular in $\mathcal{U}' \subset \mathcal{U}$, $\mathcal{U}' \ni \mu_0$. In the single phase region, i.e. if $\mu \in M_\beta(\{g\})$, the KMS state $w^{\beta,\mu,T}(\cdot)$ is close to the motive $g$: for all $K \in \mathcal{O}_A$,
\[ \lim_{\beta\to\infty,\ \|T\|_r \to 0} w^{\beta,\mu,T}(K) = (\mathrm{Tr}(P_A))^{-1}\, \mathrm{Tr}(K P_A), \]
where $P_A$ is the projection given by $\sum_{\omega_A \in \Omega_{g,A}} |e_{\omega_A}\rangle\langle e_{\omega_A}|$.

Remark. It follows from our assumptions that $T$ is small compared to $V$; more precisely, $\|T\|_c/\Delta \leq 1/\beta_0$.

This theorem is actually a consequence of Theorem 4.4 below, see the remark after Theorem 4.4.

4.4. Infinite degeneracy, high and low temperature expansions. Here we consider systems where phases result from a subtle interplay between potential and kinetic energy, combining the effects described in Sects. 4.2 and 4.3. The quantum perturbation partially lifts the degeneracy of the classical interaction, leading at intermediate temperatures to spatially ordered phases. Hereafter we describe the general framework in a rather abstract way; it will be illustrated in Sect. 5, and the reader may gain better understanding by working out a concrete application.

(a) Structure. The space of classical configurations is $\Omega = \{1, \dots, M\}^{\mathbb{Z}^\nu}$; we consider a partition as in (4.2) and define $\mathcal{N}$ and $\Omega_n$ similarly. We consider a (possibly infinite) set $G \subset \mathcal{N}$ that represents low energy configurations; the Hilbert spaces decompose in the following way: $\mathcal{H}_A = \mathcal{H}_A^{\rm low} \oplus \mathcal{H}_A^{\rm high}$, where $\mathcal{H}_A^{\rm low}$ is the subspace spanned by the low-energy configurations $g_A \in G_A$. The interaction has the form $H = V + T$; $V$ is a block interaction with uniformly bounded derivatives $\frac{\partial}{\partial\mu_j}\Phi_x(\omega_{U(x)})$; the perturbation $T$ decomposes further as $T = K + K' + K''$; we shall require different assumptions on $K$, $K'$, $K''$, motivated by the perturbation theory of Sect. 2.2. We suppose that a finite set $D = \{d^{(1)}, \dots, d^{(p)}\} \subset G$ is given, that corresponds to possible ground states. For each $d \in D$, we define the corresponding restricted partition function
\[ Z^d_\Lambda = \sum_{\omega_\Lambda \in \Omega_{d,\Lambda}} e^{-\beta \sum_{x,\, U(x) \subset \Lambda} \Phi_x(\omega_{U(x)})} \tag{4.5} \]
and the corresponding restricted free energy
\[ h^{\beta,\mu}_d = -\frac{1}{\beta} \lim_{\Lambda \nearrow \mathbb{Z}^\nu} \frac{1}{|\Lambda|} \log Z^d_\Lambda, \]
and $e^\mu_d = \lim_{\beta\to\infty} h^{\beta,\mu}_d$.

(b) Assumptions.

(D1) A gap separates high and low energies: for all $\omega_{U(x)} \notin \Omega_{G,U(x)}$,
\[ \Phi_x(\omega_{U(x)}) - \max_{\omega' \in \Omega_G} \Phi_x(\omega'_{U(x)}) \geq \Delta_0. \tag{4.6} \]
(D2) Gap with the ground states: for all $\omega_{U(x)} \notin \Omega_{D,U(x)}$,
\[ \Phi_x(\omega_{U(x)}) - \min_{\omega' \in \Omega_D} \Phi_x(\omega'_{U(x)}) \geq \Delta. \]

(D3) For all $d \in D$, there exists an on-site interaction $\Phi^d$ such that for all $\omega \in \Omega_d$ and all $x$, $\Phi_x(\omega_{U(x)}) = \Phi^d_x(\omega_x)$. Moreover, we suppose that
\[ \min_{\omega_x \in I_{d_x}} \Phi^d_x(\omega_x) = e^\mu_d \]
independently of $x$.

(D4) The quantum perturbation $T = K + K' + K''$ has the same properties as in (B3), with respect to the decomposition into low and high energy states.

(D5) There is $\mu_0 \in \mathcal{U}$ such that $e^{\mu_0}_d = e^{\mu_0}_{d'}$, $d, d' \in D$, and the matrix of derivatives (3.8) has a uniformly bounded inverse (see the footnote of (C3) if $e^\mu_d$ is not differentiable).

(c) Properties of Gibbs states.

Theorem 4.4. Assume (D1)–(D5) hold true. There exist $\beta_0, c < \infty$ (depending on $\nu$, $R$, $p$, $M$ and on the periods of $\{d^{(j)}\}$ and $H$ only) such that if $\beta_0 \leq \beta\Delta < \infty$, $\beta\|K\|_c \leq 1$, $\|K'\|_c/\Delta_0 \leq 1$, $\|K''\|_c/\Delta_0 \leq 1$, and $\beta\|K''\|_c^2/\Delta_0 \leq 1$, the phase diagram satisfies the Gibbs phase rule and is regular in an open set $\mathcal{U}' \subset \mathcal{U}$ that contains $\mu_0$. In the single phase region, i.e. if $\mu \in M_\beta(\{d\})$, the KMS state $w^{\beta,\mu,T}(\cdot)$ is close to the motive $d$: for all $K \in \mathcal{O}_A$,
\[ \lim_{\beta\to\infty,\ \|T\|_r \to 0} w^{\beta,\mu,T}(K) = (\mathrm{Tr}(P_A))^{-1}\, \mathrm{Tr}(K P_A), \]
where $P_A$ is the projection given by $\sum_{\omega_A \in \Omega_{d,A}} |e_{\omega_A}\rangle\langle e_{\omega_A}|$.

This theorem follows from the contour representation obtained in Sect. 6, together with the Pirogov–Sinai theory.

Remarks. 1. Theorem 4.3 is an immediate consequence of Theorem 4.4. Indeed, we clearly recover the setting of Sect. 4.3 by choosing $G = \Omega$ (i.e. all configurations have low energy), and $K' = K'' = 0$.

2. These two theorems also generalize results of [Uel]: they can be applied to the Hubbard model
\[ H = -t \sum_{\langle x,y \rangle} \sum_{\sigma = \uparrow,\downarrow} (c^\dagger_{x\sigma} c_{y\sigma} + \mathrm{h.c.}) + U \sum_x n_{x\uparrow} n_{x\downarrow}, \tag{4.7} \]
to show that the high temperature phase extends to
\[ \{(\beta, t, U) : \beta t \text{ small}\} \quad \text{and} \quad \{(\beta, t, U) : \beta t^2/U \text{ small}\} \]
(standard high temperature expansions apply when both $\beta t$ and $\beta U$ are small).
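5. Example: Extended Hubbard Model

This is a Hubbard model where particles interact with each other when their distance is smaller than or equal to 1. Explicitly,
\[ H = -t \sum_{\langle x,y \rangle \subset \Lambda} \sum_{\sigma = \uparrow,\downarrow} (c^\dagger_{x\sigma} c_{y\sigma} + \mathrm{h.c.}) + U \sum_{x \in \Lambda} n_{x\uparrow} n_{x\downarrow} + W \sum_{\langle x,y \rangle \subset \Lambda} n_x n_y - \mu \sum_{x \in \Lambda} n_x. \tag{5.1} \]
Here, $c^\dagger_{x\sigma}$, $c_{x\sigma}$ are creation and annihilation operators of a fermion of spin $\sigma$ at site $x$; $\langle x, y \rangle$ stands for a set of nearest neighbor sites; $n_{x\sigma} = c^\dagger_{x\sigma} c_{x\sigma}$ is the number of particles of spin $\sigma$ at $x$ (it has eigenvalues 0 and 1); $n_x = n_{x\uparrow} + n_{x\downarrow}$ is the total number of particles at $x$. The coefficient $t$ represents the hopping, and will be taken to be small compared to the nearest-neighbor repulsion $W$; $\mu$ is the chemical potential.

The classical limit $t \to 0$ was studied in [Jęd, BJK]. The stability of the chessboard phase $M_{(0,2)}$ (see below) with small $t$ is a straightforward application of [DFF]; a later study devoted to it is [BK3].

We start by analyzing the classical interactions. The configuration space is $\Omega = \{0, \uparrow, \downarrow, 2\}^{\mathbb{Z}^\nu}$ and the corresponding classical interaction can be written as (taking $R = \frac{1}{2}$)
\[ \Phi_x(\omega_{U(x)}) = \frac{U}{2^\nu} \sum_{y \in U(x)} \delta_{\omega_y, 2} + \frac{W}{2^{\nu-1}} \sum_{\langle y,z \rangle \subset U(x)} q_y q_z - \frac{\mu}{2^\nu} \sum_{y \in U(x)} q_y. \tag{5.2} \]
Here we introduced $q_y \in \{0, 1, 2\}$:
\[ q_y = \begin{cases} 0 & \text{if } \omega_y = 0 \\ 1 & \text{if } \omega_y = \uparrow \text{ or } \omega_y = \downarrow \\ 2 & \text{if } \omega_y = 2. \end{cases} \tag{5.3} \]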
The interaction can also be written as a sum over pairs of n.n. sites; this simplifies the analysis of the zero temperature phase diagram, and the search for symmetries (see below). This pair interaction is given by
\[ \Phi_{\langle x,y \rangle}(q_x, q_y) = \frac{U}{2\nu} (\delta_{q_x, 2} + \delta_{q_y, 2}) + W q_x q_y - \frac{\mu}{2\nu} (q_x + q_y). \tag{5.4} \]
This model has a hole-particle symmetry. Introducing the unitary operator $\mathsf{U}$ such that $\mathsf{U} c^\dagger_{x\sigma} \mathsf{U}^{-1} = c_{x\sigma}$ and $\mathsf{U} c_{x\sigma} \mathsf{U}^{-1} = c^\dagger_{x\sigma}$, we see that $\mathsf{U} T \mathsf{U}^{-1} = T$. As for the potential, the effect of the symmetry can be exhibited by considering classical configurations; defining $q'_x = 2 - q_x$, and $\mu' = U + 4\nu W - \mu$, we easily check that
\[ \Phi^{\mu'}_{\langle x,y \rangle}(q'_x, q'_y) = \Phi^{\mu}_{\langle x,y \rangle}(q_x, q_y) + C, \tag{5.5} \]
where $C = -U/\nu - 4W + 2\mu/\nu$ does not depend on $(q_x, q_y)$. As a result, the phase diagrams $(U, \mu)$ are symmetric along the line
\[ \mu = \frac{U}{2} + 2\nu W, \tag{5.6} \]
for any temperature.
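The identity (5.5) can be checked mechanically from the pair interaction (5.4). The following is a minimal sketch using exact rational arithmetic; the function names are hypothetical and the parameter values are arbitrary integers chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def pair_energy(qx, qy, U, W, mu, nu):
    """Pair interaction (5.4) for occupation numbers qx, qy in {0, 1, 2}."""
    doubly_occupied = (qx == 2) + (qy == 2)
    return (Fraction(U, 2 * nu) * doubly_occupied
            + W * qx * qy
            - Fraction(mu, 2 * nu) * (qx + qy))

def hole_particle_symmetry_holds(U, W, mu, nu):
    """Check (5.5) for all nine pairs (qx, qy), with integer U, W, mu, nu."""
    mu_prime = U + 4 * nu * W - mu
    C = Fraction(-U, nu) - 4 * W + Fraction(2 * mu, nu)
    return all(
        pair_energy(2 - qx, 2 - qy, U, W, mu_prime, nu)
        == pair_energy(qx, qy, U, W, mu, nu) + C
        for qx, qy in product((0, 1, 2), repeat=2)
    )

print(hole_particle_symmetry_holds(U=4, W=1, mu=3, nu=2))
```

The check relies on $\delta_{2-q,2} = \delta_{q,2} - q + 1$ for $q \in \{0,1,2\}$, which is what turns the $U$ term into a shift of the chemical potential plus a constant. On the line (5.6) one has $\mu' = \mu$, so the map $q \mapsto 2 - q$ is then a symmetry of the interaction itself up to the constant $C$.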
[Figure 2: panel (a), axes $U/\nu|W|$ and $\mu/\nu|W|$, domains $M_0$, $M_1$, $M_2$; panel (b), axes $U/\nu W$ and $\mu/\nu W$, domains $M_0$, $M_1$, $M_2$, $M_{(0,1)}$, $M_{(0,2)}$, $M_{(1,2)}$.]

Fig. 2. Zero temperature phase diagrams of the extended Hubbard model, (a) when $W < 0$ and (b) when $W > 0$. The dashed line represents the hole-particle symmetry, see (5.6)
The zero temperature phase diagrams with $t = 0$ are depicted in Fig. 2, in both cases $W < 0$ and $W > 0$. In the case $W < 0$, the diagram decomposes into three domains $M_0$, $M_1$, and $M_2$; $M_0$ and $M_2$ have a unique translation invariant ground state with respectively 0 and 2 particles at each site. In $M_1$, any configuration with one particle per site is a ground state; the degeneracy is $2^{|\Lambda|}$ since each particle has spin $\uparrow$ or $\downarrow$.

The situation $W > 0$ presents a richer structure with six domains. Domains $M_0$, $M_1$ and $M_2$ have the same features as with attractive n.n. interactions. In between, domains $M_{(0,2)}$, $M_{(1,2)}$ and $M_{(0,1)}$ now appear. $M_{(0,2)}$ consists of two ground states, the two chessboard configurations with alternately 0 and 2 electrons per site. $M_{(0,1)}$ has $2 \cdot 2^{\frac{1}{2}|\Lambda|}$ ground states of the chessboard type, one sublattice being empty, while the other has exactly one particle of spin $\uparrow$ or $\downarrow$ per site; $M_{(1,2)}$ is similar, with 2 particles per site on one sublattice and one on the other.

We are interested in the case where the temperature is small, but bigger than 0, and with small hopping. The phase diagrams for large $\beta$ and small $\beta t$ are presented in Fig. 3.
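The zero temperature domains can be recovered numerically from the pair interaction (5.4). Since $\mathbb{Z}^\nu$ is bipartite, any pair of occupations $(a, b)$ extends to a configuration in which every nearest-neighbor bond carries $(a, b)$, so minimizing (5.4) over pairs gives the ground states at $t = 0$. The sketch below rests on that observation; the function names and sample parameters are illustrative and not taken from the paper.

```python
from itertools import combinations_with_replacement

def pair_energy(qx, qy, U, W, mu, nu):
    """Pair interaction (5.4) for occupation numbers qx, qy in {0, 1, 2}."""
    doubly_occupied = (qx == 2) + (qy == 2)
    return U / (2 * nu) * doubly_occupied + W * qx * qy - mu / (2 * nu) * (qx + qy)

def ground_pairs(U, W, mu, nu, tol=1e-12):
    """Unordered occupation pairs (a, b), a <= b, that minimize the bond energy.

    (a, a) corresponds to the uniform domain M_a and (a, b) with a < b to the
    chessboard domain M_(a,b); several pairs are returned on domain boundaries.
    """
    pairs = list(combinations_with_replacement((0, 1, 2), 2))
    energies = {p: pair_energy(p[0], p[1], U, W, mu, nu) for p in pairs}
    e_min = min(energies.values())
    return [p for p in pairs if energies[p] - e_min <= tol]

# Repulsive case W > 0 in nu = 2 dimensions: a point inside each of three domains.
print(ground_pairs(U=8.0, W=1.0, mu=8.0, nu=2))   # uniform, one particle per site
print(ground_pairs(U=0.0, W=1.0, mu=2.0, nu=2))   # chessboard with 0 and 2 particles
print(ground_pairs(U=8.0, W=1.0, mu=2.0, nu=2))   # chessboard with 0 and 1 particles
```

Scanning $(U, \mu)$ over a grid with this function is one way to redraw the domains of Fig. 2. It says nothing about spin degeneracy, which enters only through the counting of configurations compatible with a given pair.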
[Figure 3: panel (a), axes $U/\nu|W|$ and $\mu/\nu|W|$, domains $M_0^{\beta,t}$, $M_1^{\beta,t}$, $M_2^{\beta,t}$; panel (b), axes $U/\nu W$ and $\mu/\nu W$, domains $M_0^{\beta,t}$, $M_1^{\beta,t}$, $M_2^{\beta,t}$, $M_{\rm cb}^{\beta,t}$.]

Fig. 3. Phase diagrams of the extended Hubbard model at intermediate temperature and with small hopping, (a) when $W < 0$ and (b) when $W > 0$. Bold lines denote first-order phase transitions. White is the region $P_K$ that resists rigorous investigations, where second-order transitions are expected
In the case $W < 0$, all three domains survive at low temperature and with $t \neq 0$; a first-order phase transition occurs when crossing the border between any two domains.
The point $\left(\frac{U}{\nu|W|} = 2,\ \frac{\mu}{\nu W} = 1\right)$ belongs to $M_1^{\beta,t}$: this phase has residual entropy (it also has more quantum fluctuations, although this has much less effect). The Gibbs state corresponding to the domain $M_1^{\beta,t}$ is thermodynamically stable and exponentially clustering. The restriction to intermediate temperatures ($\beta t \leq \varepsilon$) is important, because, for $\nu \geq 3$, a phase transition is expected when the temperature decreases, leading to an antiferromagnetic phase that breaks both the translation symmetry and the rotation symmetry of the spins.

The phase diagram at finite $\beta$ and nonzero $t$ is especially interesting for $W > 0$. There are not six, but only four domains $M_0^{\beta,t}$, $M_1^{\beta,t}$, $M_2^{\beta,t}$ and $M_{\rm cb}^{\beta,t}$; see Fig. 3. Indeed, the three domains corresponding to chessboard phases have merged into a single domain (this was first understood and proven in [BJK] in the absence of hopping). The free energy is real analytic in the whole domain $M_{\rm cb}^{\beta,t}$. The transition between $M_2^{\beta,t}$ and $M_{\rm cb}^{\beta,t}$ is presumably second-order, but our results do not cover the intermediate region between these domains. The boundary between $M_{\rm cb}^{\beta,t}$ and $M_1^{\beta,t}$ contains a part where a first-order phase transition occurs that can be rigorously described. Crossing the boundary elsewhere presumably results in a second-order transition. Due to the thermal fluctuations, the segment from (2,2) to (2,4) belongs to $M_1^{\beta,t}$.

Our results for this model are summarized in the next two theorems.
Theorem 5.1 (Hubbard model with attractive n.n. interactions). Let $\nu \geq 2$. There exist constants $\beta_0 < \infty$ and $\varepsilon_0 > 0$ (depending on $\nu$) such that the phase diagram $(U, \mu)$ for $\beta|W| \geq \beta_0$ and $\beta t \leq \varepsilon_0$ is regular; the domains $M_a^{\beta,t}$, $a \in \{0, 1, 2\}$, satisfy $\lim_{\beta\to\infty} \lim_{t\to 0} M_a^{\beta,t} = M_a$. If $(U, \mu)$ belongs to a unique $M_a^{\beta,t}$, there is a unique Gibbs state. Furthermore, the density of the system is close to $a$,
\[ |\langle n_x \rangle - a| \leq \varepsilon(\beta, t), \]
for all $x$. $\varepsilon(\beta, t)$ can be made arbitrarily small by taking $\beta$ large and $t$ small.

In order to describe the situation $W > 0$ we first introduce the region of the phase diagram $P_K$ where we have no results. Let
\[ L = \Big[ \big( M_{(0,2)} \cup M_{(1,2)} \cup M_{(0,1)} \big) \cap \big( M_0 \cup M_1 \cup M_2 \big) \Big] \setminus \big( M_{(0,2)} \cap M_1 \big), \tag{5.7} \]
and for $K > 0$,
\[ P_K = \bigcup_{(U,\mu) \in L} B_K(U, \mu), \tag{5.8} \]
where $B_K(U, \mu)$ is the open ball of radius $K$ centered on $(U, \mu)$. We restrict our considerations to the complement of $P_K$.

Theorem 5.2 (Hubbard model with n.n. repulsions). Let $\nu \geq 2$ and $K > 0$. There exist constants $\beta_0 < \infty$ and $\varepsilon_0 > 0$ (depending on $\nu$ and $K$) such that if $\beta_0 \leq \beta W < \infty$ and $\beta t \leq \varepsilon_0$, we have the decomposition
\[ P_K^c = M_0^{\beta,t} \cup M_1^{\beta,t} \cup M_2^{\beta,t} \cup M_{\rm cb}^{\beta,t}, \]
and
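(i) $M_0^{\beta,t} \subset M_0$, $M_2^{\beta,t} \subset M_2$, $M_1^{\beta,t}$ ($\not\subset M_1$) are domains with a unique Gibbs state. Densities are close to 0, 2, 1 respectively in the sense
\[ \langle n_x \rangle \leq \varepsilon(\beta, t) \ \text{in } M_0^{\beta,t}, \qquad \langle n_x \rangle \geq 2 - \varepsilon(\beta, t) \ \text{in } M_2^{\beta,t}, \qquad |\langle n_x \rangle - 1| \leq \varepsilon(\beta, t) \ \text{in } M_1^{\beta,t}, \]
with $\varepsilon(\beta, t)$ arbitrarily close to 0 if $\beta$ is large and $t$ small.

(ii) $M_{\rm cb}^{\beta,t} \subset M_{(0,2)} \cup M_{(1,2)} \cup M_{(0,1)}$ is a domain with two extremal Gibbs states of the chessboard type. The free energy is a real analytic function of $\beta$ and $\mu$ in the domain
\[ \{ (\beta, \mu) : \beta_0/W \leq \beta \leq \varepsilon_0/t \ \text{and} \ (U, \mu) \in M_{\rm cb}^{\beta,t} \}. \]

(iii) $M_{\rm cb}^{\beta,t} \cap M_1^{\beta,t}$ is a line of first-order phase transition, with exactly three extremal states.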
Remarks. The proofs of Theorems 5.1 and 5.2 use Theorem 4.3. But using Theorem 4.1, one could establish stability of the domains $M_0$, $M_2$, $M_{(0,2)}$ for all $\beta|W| \geq \beta_0$, without the restriction that the temperature be not too small. Another possible improvement, for $U, W > 0$, would use Theorem 4.4 to replace the condition $\beta t \leq \varepsilon_0$ by $\beta t^2/U \leq \varepsilon_0$. The latter clearly allows lower temperatures.$^5$
6. Combined High-Low Temperature Expansions

In this section we simultaneously perform a low and a high temperature expansion. The temperature is low, in such a way that excitations above the low energy states ($\mathcal{H}^{\rm low}$) are rare. At the same time, the temperature is high relative to the quantum perturbations $K$ and $K''$. These expansions allow us to write the partition function as that of a contour model, which can be treated by the Pirogov–Sinai theory, see Sect. 3.2.

We rewrite the quantum model as a contour model, by making a mixed low and high temperature expansion (Sect. 6.1); we define suitable weights, so that the partition function takes the form required in Sect. 3.2. Section 6.2 is devoted to proving that the weights are small compared to their size. Finally, we explain in Sect. 6.3 how the other requirements of Sect. 3.2 are fulfilled.

6.1. Expansion of the partition function. Our intention is to expand in $K + K' + K''$; in order to simplify the notation, we introduce $\mathbf{B} = (B, i)$, $B \subset \mathbb{Z}^\nu$, $i = 1, 2, 3$, and we write $K_B = T_{\mathbf{B}}$ with $\mathbf{B} = (B, 1)$, $K'_B = T_{\mathbf{B}}$ with $\mathbf{B} = (B, 2)$, and $K''_B = T_{\mathbf{B}}$ with $\mathbf{B} = (B, 3)$. We refer to $\mathbf{B}$ as a transition.

$^5$ Furthermore, the restriction to intermediate temperatures arises because of possible antiferromagnetism due to "quantum fluctuations" of strength $t^2/U$; it should be stable for $\beta t^2/U > \mathrm{const}$; therefore this new condition is qualitatively correct.
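Using Duhamel's formula, we obtain
\[
\begin{aligned}
\mathrm{Tr}\, e^{-\beta H} = \mathrm{Tr}\, e^{-\beta \sum_{B \subset \Lambda} V_B} + \sum_{m \geq 1} \sum_{\mathbf{B}^1, \dots, \mathbf{B}^m} \sum_{\omega^1, \dots, \omega^m} \int_{0 < \tau_1 < \dots < \tau_m < \beta} & d\tau_1 \dots d\tau_m \; e^{-\tau_1 \sum_{x \in \Lambda} \Phi_x(\omega^1_{U(x)})} \langle \omega^1 | T_{\mathbf{B}^1} | \omega^2 \rangle \\
& \times e^{-(\tau_2 - \tau_1) \sum_{x \in \Lambda} \Phi_x(\omega^2_{U(x)})} \dots \langle \omega^m | T_{\mathbf{B}^m} | \omega^1 \rangle \, e^{-(\beta - \tau_m) \sum_{x \in \Lambda} \Phi_x(\omega^1_{U(x)})}.
\end{aligned}
\tag{6.1}
\]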
At this point, it is natural to define the supports of contours as all sites that belong to $\cup_j B^j$, or for which there exists $\omega^j$ such that $\omega^j_{U(x)} \notin \Omega_{D,U(x)}$. But two technical difficulties arise: $d^{(1)}, \dots, d^{(p)}$ are periodic rather than translation invariant; and the weight of a contour should not depend on the configuration outside of its support (but it may depend on the labeling $\alpha$). The latter difficulty is specific to systems with phases given by a restricted ensemble instead of a single configuration.

To account for these difficulties, we introduce a partition of the lattice into cubes of size $\ell$, where $\ell$ is the lcm of the periods of $\{d^{(i)}\}$ (considering all spatial directions). Let $\bar B = \cup_{x \in B} U(x)$; we define excited cubes.

• A cube $C$ is quantum excited if there is $\mathbf{B}^i$ such that $C \cap \bar B^i \neq \emptyset$.
• Otherwise, it is classically excited if there are $\omega^j$ and $x \in C$ such that $\omega^j_{U(x)} \notin \Omega_{D,U(x)}$.

Consider the set $Q$ of quantum excited cubes, the set $E$ of classically excited cubes, and the set $N$ of cubes that are neighbors of $Q \cup E$ (two cubes $C \neq C'$ are neighbors iff there exist $x \in C$ and $y \in C'$ with $|x - y|_\infty = 1$). Connected components of $Q \cup E \cup N$ form the supports of the contours. Connected components of the complement of $Q \cup E \cup N$ are characterized by a configuration $d \in D$, and this information may be stored in the labeling $\alpha$. The union of all components corresponding to $d$ is denoted $W_d$. Then
\[ \Lambda = Q \cup E \cup N \cup (\cup_{d \in D} W_d), \tag{6.2} \]
see Fig. 4 for an illustration. $W_d$ is a union of cubes, each cube $C$ contributing in (6.1) by a factor [we use (D3)]
\[ \sum_{\omega_C \in \Omega_{d,C}} e^{-\beta \sum_{x \in C} \Phi^d_x(\omega_x)} = e^{-\beta h^{\beta,\mu}_d \ell^\nu}. \tag{6.3} \]
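The construction of contour supports from excited cubes is purely combinatorial and can be sketched as follows. Cubes are labeled by integer coordinates on the coarse-grained lattice, so two distinct cubes are neighbors in the sense above exactly when their labels differ by at most 1 in every coordinate. The sketch assumes an infinite lattice (no boundary of $\Lambda$) and assumes that connectedness of supports uses the same neighbor relation; both are assumptions made here for illustration, and the function names are hypothetical.

```python
from itertools import product

def neighbors(cube):
    """Cubes touching `cube` by a face, edge or corner (coarse-grained coordinates)."""
    for delta in product((-1, 0, 1), repeat=len(cube)):
        if any(delta):
            yield tuple(c + d for c, d in zip(cube, delta))

def contour_supports(quantum_excited, classically_excited):
    """Connected components of Q ∪ E ∪ N, each returned as a set of cubes."""
    excited = set(quantum_excited) | set(classically_excited)
    # N: non-excited cubes that are neighbors of an excited cube.
    halo = {n for c in excited for n in neighbors(c)} - excited
    remaining = excited | halo
    supports = []
    while remaining:
        start = remaining.pop()
        component, stack = {start}, [start]
        # Depth-first search restricted to cubes of Q ∪ E ∪ N.
        while stack:
            for n in neighbors(stack.pop()):
                if n in remaining:
                    remaining.remove(n)
                    component.add(n)
                    stack.append(n)
        supports.append(component)
    return supports

# Two excited cubes far apart in two dimensions give two separate supports.
for support in contour_supports({(0, 0)}, {(5, 5)}):
    print(sorted(support))
```

Including the neighbor layer $N$ in the support means that every cube on the boundary of a support is non-excited, which is what later allows the replacement of $\Phi_x$ by the on-site interaction $\Phi^d_x$ there.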
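Summing first over admissible sets of contours $\{A_1, \dots, A_k\}$, we can rewrite (6.1) in the following way:
\[
\begin{aligned}
\mathrm{Tr}\, e^{-\beta H} = \sum_{\{A_1, \dots, A_k\}} & \prod_{d \in D} \sum_{\omega_{W_d} \in \Omega_{d,W_d}} e^{-\beta \sum_{x \in W_d} \Phi^d_x(\omega_x)} \prod_{j=1}^{k} \sum_{Q \subset A_j} \sum_{m \geq 0} \sum_{\substack{\mathbf{B}^1, \dots, \mathbf{B}^m \\ \bar B^i \subset Q}} \sum_{\omega^1_{A_j}, \dots, \omega^m_{A_j}} \int_{0 < \tau_1 < \dots < \tau_m < \beta} d\tau_1 \dots d\tau_m \\
& \times e^{-\tau_1 \sum_{x \in A_j} \Phi_x(\omega^1_{U(x)})} \langle \omega^1_{A_j} | T_{\mathbf{B}^1} | \omega^2_{A_j} \rangle \dots \langle \omega^m_{A_j} | T_{\mathbf{B}^m} | \omega^1_{A_j} \rangle \, e^{-(\beta - \tau_m) \sum_{x \in A_j} \Phi_x(\omega^1_{U(x)})}.
\end{aligned}
\tag{6.4}
\]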
[Figure 4: four contours, labeled $A_1$, $A_2$, $A_3$, $A_4$.]

Fig. 4. The space is divided into cubes; contours are formed by excited cubes (in black) and by their neighbors. There are four contours on this picture
We used here the fact that the contribution of different contours factorizes. There are several restrictions to the sums over transitions $\{\mathbf{B}^i\}$ and configurations $\{\omega^i_{A_j}\}$: each cube of $Q$ is intersected by at least one $\bar B^i$; $\{\omega^i_{A_j}\}$ are compatible with the labeling $\alpha_j$; excited cubes of $A_j \setminus Q$ do not touch the boundary of $A_j$; and non excited cubes in $A_j$ have at least one neighbor that is excited. In the last line, configurations $\omega^j_{U(x)}$ with $U(x) \cap W_d \neq \emptyset$ appear, which hence depend on $\omega_{W_d}$. However, in such a case $x$ belongs to a cube that is not excited, so that $\omega^j_{U(x)} \in \Omega_{d,U(x)}$. From (D3) we can substitute $\Phi_x(\omega^j_{U(x)})$ with $\Phi^d_x(\omega^j_x)$, which does not depend any more on the configuration outside the support of the contour.$^6$ Then we obtain
\[ \mathrm{Tr}\, e^{-\beta H} = \sum_{\{A_1, \dots, A_k\}} \prod_{d \in D} e^{-\beta h^{\beta,\mu}_d |W_d|} \prod_{j=1}^{k} z(A_j), \tag{6.5} \]
where the sum is over admissible sets of contours, and $z(A)$ is the weight of the contour $A$. The explicit expression for $z(A)$ looks rather tedious, but the main point is to establish the properties of Sect. 3.2. The expression of $z(A)$ is
\[
\begin{aligned}
z(A) = \sum_{Q \subset A} \sum_{m \geq 0} \sum_{\substack{\mathbf{B}^1, \dots, \mathbf{B}^m \\ \bar B^i \subset Q}} \sum_{\omega^1_A, \dots, \omega^m_A} \int_{0 < \tau_1 < \dots < \tau_m < \beta} & d\tau_1 \dots d\tau_m \; e^{-\tau_1 \sum_{x \in A} \Phi_x(\omega^1_{U(x)})} \\
& \times \langle \omega^1_A | T_{\mathbf{B}^1} | \omega^2_A \rangle \dots \langle \omega^m_A | T_{\mathbf{B}^m} | \omega^1_A \rangle \, e^{-(\beta - \tau_m) \sum_{x \in A} \Phi_x(\omega^1_{U(x)})},
\end{aligned}
\tag{6.6}
\]
with some restrictions on the sums over $\{\mathbf{B}^i\}$ and $\{\omega^i_A\}$, see above.

Remark. We constructed contours out of cubes, while the supports of contours in Sect. 3.2 are any connected sets. There is no contradiction if we define the weight $z(A)$ to be 0 whenever the support of $A$ is not a union of cubes.

$^6$ This is why cubes that are neighbors of excited cubes need to be considered as part of contours.
6.2. Bounds for the weights of the contours. We turn to the proof of the exponential decay of the weight of contours, as required in Sect. 3.2. We give the following "space-time" interpretation to the collection of sums and integrals in (6.6): we view $(\bar B^j, \tau_j)$ as a subset of $A \times [0, \beta]_{\rm per}$, with periodic boundary conditions along the "vertical" interval $[0, \beta]$. Furthermore, to each "time" $\tau \in [0, \beta]_{\rm per}$ corresponds the configuration $\omega^j$ for which $(\tau_{j-1}, \tau_j] \ni \tau$. We define
\[ \mathcal{B} = \bigcup_{j=1}^{m} \big[ \bar B^j \times \{\tau_j\} \big] \cup \big[ Q \times \{0\} \big]; \qquad \mathcal{E} = \bigcup_{j=1}^{m+1} E(\omega^j_Q) \times [\tau_{j-1}, \tau_j], \qquad |\mathcal{E}| = \sum_{j=1}^{m+1} |E(\omega^j_Q)| (\tau_j - \tau_{j-1}), \]
(with $\tau_0 \equiv 0$, $\tau_{m+1} \equiv \beta$, and $\omega^{m+1} \equiv \omega^1$). Here, we set $\bar B = \cup_{x \in B} U(x)$, and $E(\omega_Q) = \{x \in Q : \omega_{U(x)} \notin \Omega_{G,U(x)}\}$. From assumptions (D1) and (D2) we can bound
\[ |z(A)| \leq e^{-\beta e^\mu_0 |A|} \sum_{Q \subset A} \sum_{m \geq 0} \sum_{\substack{\mathbf{B}^1, \dots, \mathbf{B}^m \\ \bar B^i \subset Q}} \sum_{\omega^1_A, \dots, \omega^m_A} \int_{0 < \tau_1 < \dots < \tau_m < \beta} d\tau_1 \dots d\tau_m \; e^{-\beta \Delta |E|/\ell^\nu} e^{-\Delta_0 |\mathcal{E}|} \prod_{j=1}^{m} \|T_{\mathbf{B}^j}\|, \tag{6.7} \]
where the sums over $\{\mathbf{B}^j\}$ and $\{\omega^j_A\}$ satisfy the restrictions explained above.

We view each $\bar B^j$ as a connected subset of $\mathbb{R}^{\nu+1}$ (one can e.g. add links between nearest neighbors). Then $\mathcal{B} \cup \mathcal{E}$ is a subset of $\mathbb{R}^{\nu+1}$ made out of vertical segments and horizontal sets. We consider connected components of $\mathcal{B} \cup \mathcal{E}$. For a connected component with $m$ horizontal sets and $m' \geq m - 1$ vertical segments, we delete $m' - m + 1$ of the latter, in such a way that the component remains connected. One of these components contains $Q \times \{0\}$, possibly with extra vertical segments and horizontal sets. Other components have $m$ horizontal sets and $m - 1$ vertical segments. Because of the structure (D4), or (B3), components not linked with $Q \times \{0\}$ either consist of a single transition of type $K$, or include at least two transitions of type $K'$ or $K''$.

A connected object with $m$ horizontal sets, and $(m - 1)$ vertical segments that end on the horizontal sets, is called a gather and is denoted by the letter $G$. It is illustrated in Fig. 5. We introduce the following sets of gathers:

• $\mathcal{G}_m$: gathers with $m$ horizontal sets, one containing the origin $\{x = 0\} \times \{\tau = 0\}$; $\mathcal{G} = \mathcal{G}_\infty$.
• $\mathcal{G}'$: gathers of $\mathcal{G}_1$ that consist of a unique transition of type $K$.
• $\mathcal{G}''_m$: gathers of $\mathcal{G}_m$ with at least two transitions of type $K'$ or $K''$; $\mathcal{G}'' = \mathcal{G}''_\infty$.

The connected component of $\mathcal{B} \cup \mathcal{E}$ that contains $Q \times \{0\}$ can be viewed as a set of gathers, each gather being connected to $Q \times \{0\}$ by a vertical segment.

Since a choice of $\{\mathbf{B}^i\}$ and $\{\omega^i_Q\}$ leads to a set of gathers, we obtain a bound by first integrating over sets of gathers, then summing over compatible space-time configurations
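$\omega_A$, and choosing which gathers are linked to $Q \times \{0\}$. Therefore
\[ |z(A)| \leq e^{-\beta e^\mu_0 |A|} \sum_{k \geq 0} \frac{1}{k!} \int dG_1 \dots dG_k \sum_{\omega_A} \sum_{\rm links} e^{-\beta \Delta |E|/\ell^\nu} e^{-\Delta_0 |\mathcal{E}|} \prod_{j=1}^{k} \prod_{\mathbf{B} \in G_j} \|T_{\mathbf{B}}\|. \tag{6.8} \]

[Figure 5: a gather drawn in $A \times [0, \beta]$, with horizontal sets $\bar B^j$.]

Fig. 5. A "gather" with 6 transitions and 5 vertical segments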
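The shortcut $\int dG$ means a sum over the number $m$ of transitions, a sum over transitions $\mathbf{B}^1, \dots, \mathbf{B}^m$, an integral over ordered times $\tau_1, \dots, \tau_m$, and a sum over $(m - 1)$ vertical segments that link the $\bar B^i \times \{\tau_i\}$ together. We define
\[ \tilde z(G) = e^{-\Delta_0 |G|} \prod_{\mathbf{B} \in G} \|T_{\mathbf{B}}\| \, e^{2\nu \ell^\nu (\log M + \tau) |\bar B|}, \tag{6.9} \]
where $|G|$ is the total length of the vertical segments of $G$. If $\beta\Delta/\ell^\nu \geq 2\nu(\log M + \tau)$, we can write
\[ |z(A)| \, e^{\tau |A|} \leq e^{-\beta e^\mu_0 |A|} \sum_{k \geq 0} \frac{(2|Q|)^k}{k!} \Big[ \int_0^\beta d\tau \, e^{-\Delta_0 \tau} \int_{\mathcal{G}} dG \, \tilde z(G) \Big]^k \sum_{k \geq 0} \frac{|Q|^k}{k!} \Big[ \int_0^\beta d\tau \int_{\mathcal{G}' \cup \mathcal{G}''} dG \, \tilde z(G) \Big]^k, \tag{6.10} \]
where the first sum corresponds to the number of gathers linked to $Q \times \{0\}$, and the second sum is the number of independent gathers. The shortcut $\int_{\mathcal{G}} dG$ is identical to $\int dG$, except for the absence of an integral over $\tau_1$, which is set to 0; integrals over $\mathcal{G}'$ and $\mathcal{G}''$ are similar.

One easily obtains an upper bound for the gathers with a unique transition:
\[ \int_{\mathcal{G}'} dG \, \tilde z(G) = \sum_{B,\, \bar B \ni 0} \|K_B\| \, e^{2\nu \ell^\nu (\log M + \tau) |\bar B|} \leq (2R + 1)^\nu \|K\|_c \tag{6.11} \]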
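with $c = 2\nu \ell^\nu (2R + 1)^\nu (\log M + \tau)$; this is smaller than $\Delta_0$ if $c$ is large enough in the assumptions of Theorem 4.4. For general gathers, we proceed by induction. First,
\[ \int_{\mathcal{G}_1} dG \, \tilde z(G) \leq (2R + 1)^\nu \big[ \|K\|_c + \|K'\|_c + \|K''\|_c \big]. \tag{6.12} \]
Next, we use the recursion inequality:
\[ \int_{\mathcal{G}_m} dG \, \tilde z(G) \leq \sum_{\mathbf{B},\, \bar B \ni 0} \|T_{\mathbf{B}}\| \, e^{2\nu \ell^\nu (\log M + \tau) |\bar B|} \prod_{y \in \bar B} \sum_{k \geq 0} \frac{1}{k!} \Big[ 2 \int_0^\beta d\tau \, e^{-\Delta_0 \tau} \int_{\mathcal{G}_{m-1,0}} dG \, \tilde z(G) \Big]^k. \tag{6.13} \]
Integrating over $\tau$, and since $\|K^{\cdot}\|_c/\Delta_0 \leq 1$ for a large enough $c$, we get
\[ \int_{\mathcal{G}_m} dG \, \tilde z(G) \leq \sum_{\mathbf{B},\, \bar B \ni 0} \|T_{\mathbf{B}}\| \, e^{2\nu \ell^\nu (\log M + \tau) |\bar B|} e^{2 |\bar B|} \leq (2R + 1)^\nu \big[ \|K\|_c + \|K'\|_c + \|K''\|_c \big]. \tag{6.14} \]
This holds independently of $m$.

This allows us to estimate the integral over gathers that contain at least two transitions of type $K'$ or $K''$. Let $\tilde{\mathcal{G}}_m \subset \mathcal{G}_m$ be the gathers with at least one transition of type $K'$ or $K''$. One easily obtains
\[ \int_{\tilde{\mathcal{G}}_m} dG \, \tilde z(G) \leq (2R + 1)^\nu \big[ \|K'\|_c + \|K''\|_c \big]. \tag{6.15} \]
Then the integral over gathers with two transitions of type $K'$ or $K''$ can be done by integrating first on the time for such a transition, then over vertical segments and gathers at their ends, at least one of which must belong to $\tilde{\mathcal{G}}_{m-1}$. We obtain
\[
\begin{aligned}
\int_0^\beta d\tau \sum_{B,\, \bar B \subset Q} \big( \|K'_B\| + \|K''_B\| \big) e^{2\nu \ell^\nu (\log M + \tau) |\bar B|} & \int_0^\beta d\tau \, e^{-\Delta_0 \tau} \int_{\tilde{\mathcal{G}}_{m-1}} dG \, \tilde z(G) \prod_{y \in \bar B} \sum_{k \geq 0} \frac{1}{k!} \Big[ 2 \int_0^\beta d\tau \, e^{-\Delta_0 \tau} \int_{\mathcal{G}_{m-1,0}} dG \, \tilde z(G) \Big]^k \\
& \leq \beta |Q| \, \frac{(\|K'\|_c + \|K''\|_c)^2}{\Delta_0}.
\end{aligned}
\]
Plugging these estimates in (6.10), one easily gets
\[ |z(A)| \, e^{\tau |A|} \leq e^{-\beta e^\mu_0 |A|} e^{3|A|}. \tag{6.16} \]
Exponential decay of the weights of the contours is now clear. The bound on the derivative can be proven in the same way. Looking at (6.6), we see that the integrand gets a factor bounded by $\beta |A| \sup_{x,\mu,\omega,j} \big| \frac{\partial}{\partial\mu_j} \Phi_x(\omega_{U(x)}) \big|$.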
6.3. Other properties of the weights. The weight of the contours can be viewed as a series in powers of $\{\|K_B\|\}$, $\{\|K'_B\|\}$, $\{\|K''_B\|\}$. Since it is absolutely convergent uniformly in $K$, $K'$, $K''$ (provided they are small enough), we have by the dominated convergence theorem
\[ \lim_{\|K\|, \|K'\|, \|K''\| \to 0} z(A) = 0. \tag{6.17} \]
Analyticity of $z(A)$ as a function of $\mu$ and $\beta$ is clear, as well as a function of $\eta$ if we add a new perturbation $\eta L$, in a neighborhood of 0 that depends on $\beta \|L\|$. Periodicity is also obvious.

Acknowledgements. It is a pleasure to thank Roberto Fernández, Roman Kotecký, and Charles-Édouard Pfister for several discussions, and the referee for useful comments.
References

[Ara] Araki, H.: On the equivalence of the KMS condition and the variational principle for quantum lattice systems. Commun. Math. Phys. 38, 1–10 (1974)
[BI] Borgs, C. and Imbrie, J.: A unified approach to phase diagrams in field theory and statistical mechanics. Commun. Math. Phys. 123, 305–328 (1989)
[BJK] Borgs, C., Jędrzejewski, J. and Kotecký, R.: The staggered charge-order phase of the extended Hubbard model in the atomic limit. J. Phys. A 29, 733–747 (1996)
[BK] Borgs, C. and Kotecký, R.: A rigorous theory of finite-size scaling at first-order phase transitions. J. Stat. Phys. 61, 79–119 (1990)
[BK2] Borgs, C. and Kotecký, R.: Surface induced finite size effects for first order phase transitions. J. Stat. Phys. 79, 43–115 (1994)
[BK3] Borgs, C. and Kotecký, R.: Low temperature phase diagrams of fermionic lattice systems. Commun. Math. Phys. 208, 575–604 (2000)
[BKU] Borgs, C., Kotecký, R. and Ueltschi, D.: Low temperature phase diagrams for quantum perturbations of classical spin systems. Commun. Math. Phys. 181, 409–446 (1996)
[BW] Borgs, C. and Waxler, R.: First order phase transitions in unbounded spin systems. II. Completeness of the phase diagram. Commun. Math. Phys. 126, 483–506 (1990)
[BR] Bratteli, O. and Robinson, D.W.: Operator Algebras and Quantum Statistical Mechanics I and II. Berlin–Heidelberg–New York: Springer-Verlag, 1981
[BKL] Bricmont, J., Kuroda, K. and Lebowitz, J.L.: First order phase transitions in lattice and continuous systems: Extension of Pirogov–Sinai theory. Commun. Math. Phys. 101, 501–538 (1985)
[BS] Bricmont, J. and Slawny, J.: Phase transitions in systems with a finite number of dominant ground states. J. Stat. Phys. 54, 89–161 (1989)
[DFF] Datta, N., Fernández, R. and Fröhlich, J.: Low-temperature phase diagrams of quantum lattice systems. I. Stability for quantum perturbations of classical systems with finitely-many ground states. J. Stat. Phys. 84, 455–534 (1996)
[DFF2] Datta, N., Fernández, R. and Fröhlich, J.: Effective Hamiltonians and phase diagrams for tight-binding models. J. Stat. Phys. 96, 545–611 (1999)
[DFFR] Datta, N., Fernández, R., Fröhlich, J. and Rey-Bellet, L.: Low-temperature phase diagrams of quantum lattice systems. II. Convergent perturbation expansions and stability in systems with infinite degeneracy. Helv. Phys. Acta 69, 752–820 (1996)
[DLS] Dyson, F.J., Lieb, E.H. and Simon, B.: Phase transitions in quantum spin systems with isotropic and nonisotropic interactions. J. Stat. Phys. 18, 335–383 (1978)
[EFS] van Enter, A., Fernández, R. and Sokal, A.D.: Regularity properties and pathologies of position-space renormalization-group transformations: Scope and limitations of Gibbsian theory. J. Stat. Phys. 72, 879–1167 (1993)
[FL] Fröhlich, J. and Lieb, E.H.: Phase transitions in anisotropic lattice spin systems. Commun. Math. Phys. 60, 233–267 (1978)
[FR] Fröhlich, J. and Rey-Bellet, L.: Low-temperature phase diagrams of quantum lattice systems. III. Examples. Helv. Phys. Acta 69, 821–849 (1996)
[FSS] Fröhlich, J., Simon, B. and Spencer, T.: Infrared bounds, phase transitions and continuous symmetry breaking. Commun. Math. Phys. 50, 79–95 (1976)
[GKU] Gruber, Ch., Kotecký, R. and Ueltschi, D.: Planar and lamellar antiferromagnetisms in Hubbard models. J. Phys. A 33, 7857–7871 (2000)
[Isr] Israel, R.B.: Convexity in the Theory of Lattice Gases. Princeton, NJ: Princeton University Press, 1979
[Jęd] Jędrzejewski, J.: Phase diagrams of extended Hubbard models in the atomic limit. Physica A 205, 702–717 (1994)
[KP] Kotecký, R. and Preiss, D.: An inductive approach to Pirogov–Sinai theory. Suppl. ai rendiconti del circolo matem. di Palermo, ser. II 3, 161–164 (1984)
[KU] Kotecký, R. and Ueltschi, D.: Effective interactions due to quantum fluctuations. Commun. Math. Phys. 206, 289–335 (1999)
[Lieb] Lieb, E.H.: The Hubbard model: Some rigorous results and open problems. In: XIth Internat. Congress of Math. Physics (Paris, 1994), Cambridge, MA: Internat. Press, 1995, pp. 392–412
[Pei] Peierls, R.: On the Ising model of ferromagnetism. Proc. Cambridge Philos. Soc. 32, 477–481 (1936)
[Pir] Pirogov, S.A.: Phase diagrams of quantum lattice systems. Soviet Math. Dokl. 19, 1096–1099 (1978)
[PS] Pirogov, S.A. and Sinai, Ya.G.: Phase diagrams of classical lattice systems. Theoretical and Mathematical Physics 25, 1185–1192 (1975); 26, 39–49 (1976)
[Rue] Ruelle, D.: Statistical Mechanics: Rigorous Results. Reading, MA: W. A. Benjamin, 1969
[Sim] Simon, B.: The Statistical Mechanics of Lattice Gases. Princeton: Princeton Univ. Press, 1993
[Sin] Sinai, Ya.G.: Theory of Phase Transitions: Rigorous Results. London: Pergamon Press, 1982
[Uel] Ueltschi, D.: Analyticity in Hubbard models. J. Stat. Phys. 95, 693–717 (1999)
[Zah] Zahradník, M.: An alternate version of Pirogov–Sinai theory. Commun. Math. Phys. 93, 559–581 (1984)

Communicated by H. Spohn
Commun. Math. Phys. 224, 65 – 81 (2001)
Ergodicity of the 2D Navier–Stokes Equations with Random Forcing

J. Bricmont (1), A. Kupiainen (2), R. Lefevere (2)

(1) UCL, Physique Théorique, 1348 Louvain-la-Neuve, Belgium
(2) Helsinki University, Department of Mathematics, P.O. Box 4, Helsinki 00014, Finland
Received: 18 May 2000 / Accepted: 8 December 2000
Dedicated to Joel L. Lebowitz for his 70th birthday

Abstract: We consider the Navier–Stokes equation on a two dimensional torus with a random force, acting at discrete times and analytic in space, for arbitrarily small viscosity coefficient. We prove the existence and uniqueness of the invariant measure for this system as well as exponential mixing in time.

Research partially supported by EC grant FMRX-CT98-0175 and by ESF/PRODYN.

1. Introduction

A convenient mathematical model for the study of homogeneous isotropic turbulence is to consider the Navier–Stokes equation subject to a random stationary (in space and time) forcing. The turbulent situation is modelled by a smooth force, i.e. one whose Fourier transform decays fast for large wave numbers. One is then interested in various properties of the correlation functions of the velocity field in a stationary state of the ensuing stochastic process. An obvious first question concerns the large time convergence to such a stationary state starting from an arbitrary initial condition of the velocity field, i.e. the uniqueness of the stationary state. In this paper we prove the existence, uniqueness and exponential mixing of the stationary state in the case of two dimensional turbulence.

We consider the Navier–Stokes equation for an incompressible velocity field $u(t,x)$ defined on the torus $T = (\mathbb{R}/2\pi\mathbb{Z})^2$:

$$\partial_t u + (u\cdot\nabla)u - \nu\Delta u = f - \nabla p \tag{1}$$

supplemented with the incompressibility condition

$$\nabla\cdot u = 0. \tag{2}$$

The external force $f(t,x)$ consists of random kicks at discrete times

$$f(t,x) = \sum_{k\in\mathbb{Z}^2} e^{-2\pi i x\cdot k} f_k(t) \tag{3}$$

with

$$f_k(t) = \sum_{n\in\mathbb{Z}} \delta(t-n)\, f_{k,n}. \tag{4}$$

The random variables $f_{k,n}$ will be taken Gaussian, with mean zero, $\bar f_k = f_{-k}$ and covariance

$$E f_k^{\alpha}(m) f_l^{\beta}(n) = \delta_{k,-l}\,\delta_{m,n}\,\delta^{\alpha\beta}\phi_k.$$

Furthermore, we will assume $\phi_0 = 0$, which implies the vanishing of the average force over the torus: $\int_T f(t,x)\,dx = 0$. Assuming also zero average initial velocity, $\int_T u(0,x)\,dx = 0$, we conclude that $\int_T u(t,x)\,dx = 0$ for all times $t$.

It is convenient to solve the incompressibility condition (2) by expressing the Navier–Stokes equation (1) in terms of the vorticity $\omega = \partial_1 u_2 - \partial_2 u_1$, which satisfies the transport equation

$$\partial_t \omega + (u\cdot\nabla)\omega - \nu\Delta\omega = g, \tag{5}$$
where $g = \partial_1 f_2 - \partial_2 f_1$. Going to the Fourier transform $\omega_k(t) = (2\pi)^{-2}\int_T e^{ik\cdot x}\omega(t,x)\,dx$ with $k\in\mathbb{Z}^2$, we may solve the velocity in terms of the vorticity as

$$u_k = i\,\frac{(-k_2,k_1)}{k^2}\,\omega_k$$

and write the vorticity equation as

$$\partial_t\omega_k = -\nu k^2\omega_k + \sum_{l\in\mathbb{Z}^2\setminus\{0,k\}} (k\times l)\,|l|^{-2}\,\omega_{k-l}\,\omega_l + \sum_{n\in\mathbb{Z}}\delta(t-n)\,g_k(n), \tag{6}$$

where $k\times l = k_1l_2 - l_1k_2$ and the $g_k(n)$ are Gaussian with mean zero, $\bar g_k = g_{-k}$ and covariance $E g_k(m)g_l(n) = \delta_{k,-l}\,\delta_{m,n}\,\gamma_k$ with $\gamma_k = k^2\phi_k$.

We assume

$$b^{-1}e^{-\kappa_\gamma^{-1}|k|} \le \gamma_k \le b\,e^{-\kappa_\gamma^{-1}|k|}, \tag{7}$$

where $\kappa_\gamma > 0$, and we think of $b$ as being large. We will be interested in the turbulent region $\nu\to 0$; therefore, when it is convenient, we will always assume below that $\nu$ is small enough, although our results hold for all $\nu$.

Before stating our result, we need some definitions. First, we define the enstrophy as (a multiple of) the square of the $L^2$ norm

$$\Phi = \tfrac12\sum_k |\omega_k|^2 = \tfrac12\|\omega\|^2_{L^2}. \tag{8}$$

Next, we fix a number $r > 1$ and consider the Banach space

$$\Omega = \{\omega \mid \|\omega\| \equiv \sup_k |\omega_k|\,|k|^r < \infty\}$$

as our probability space, with $\mathcal{B}$ the product σ-algebra. Note that $\Omega$ is a subspace of $L^2$. Finally, due to the analyticity (with probability one) of the noise, $\omega(t)$ also will turn out to be analytic with probability one and it will be useful to introduce norms capturing this property. For any positive number $\kappa$, we define a norm (that we shall call the κ-norm),

$$\|\omega\|_\kappa = \sup_k |\omega_k|\,|k|^r e^{\kappa^{-1}|k|}. \tag{9}$$
Functions with $\|\omega\|_\kappa < \infty$ are analytic in a $\kappa^{-1}$ neighbourhood of the torus. The factor $|k|^r$ is useful technically (and was already used in [5]).

The stochastic equation (6) gives rise to a Markov chain $\omega(n)$, $n\in\mathbb{N}$, defined by

$$\omega(n+1) = F(\omega(n)) + g(n+1), \tag{10}$$

where $F$ is the map at time 1 of the Navier–Stokes flow (6) without the forcing. We denote by $P(\omega,E)$ the transition probability of this chain. Our main result is the following theorem.

Theorem. The Markov chain (10) is defined on $(\Omega,\mathcal{B})$ and has a unique invariant measure $\mu$ there. It satisfies

$$\int 1(\|\omega\|_\kappa \ge \nu\kappa)\,\mu(d\omega) \le C\exp(-c\,\nu^4\kappa^{2/\alpha}) \tag{11}$$

for any $\alpha > 1 + r$, and $C, c < \infty$, depending on $\alpha$. Moreover, $\forall\omega\in\Omega$ and $\forall E\in\mathcal{B}$, we have

$$|P^t(\omega,E) - \mu(E)| \le C(\omega)e^{-mt}, \tag{12}$$

where $m = m(\nu) > 0$ for all $\nu$, and $C(\omega) \le C\left(\frac{\|\omega\|+1}{\nu}\right)^{c}$.

Remark 1. Since $\|\cdot\|_{\kappa'} < \|\cdot\|_{\kappa}$ for $\kappa' > \kappa$, (11) holds for all $\kappa'$-norms with $\kappa' > \kappa$ too, including the norm $\|\cdot\|$ defining $\Omega$, which corresponds to $\kappa = \infty$. Estimate (11) means that with high probability $\omega$ is analytic in a $\nu^{2\alpha}$-neighbourhood of the torus and bounded there by $\nu^{1-2\alpha}$. By taking $r$ close to 1, $\alpha$ can be taken close to 2.

Remark 2. Here and below, we denote by $C$ or $c$ a “generic” constant that can vary from place to place, even in the same equation.

Remark 3. We obtain a lower bound on $m$ in (12) of the form $m \ge \exp\left(-C\nu^{-3}(\log\nu^{-1})^{c}\right)$ (see Proposition 2 and Lemma 4 below), which means, however, that our estimate on the rate of convergence is unphysically small for $\nu$ small.

Let us finish this section by a brief comment on previous work on the uniqueness question. There is a long history of proofs in cases that do not correspond to the turbulence problem. Either the forcing is taken to decay very slowly for large $|k|$, i.e. with a lower bound of the form $|k|^{-p}$ (see [3] and references therein), or the viscosity is taken large [6]. The only proof of uniqueness we know of in the turbulent situation is the recent one [4] where one considers a model like ours but with bounded noise (each $g_k$ has compact support).

The proof of the Theorem, given in Sect. 4, will be based on probabilistic estimates (Sect. 3) and on properties of the deterministic Navier–Stokes equation, which we discuss now.
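The two quantities defined in (8) and (9) are easy to evaluate on a finite set of Fourier modes. The following is a minimal sketch, assuming the vorticity is stored as a Python dictionary mapping integer pairs $(k_1,k_2)$ to coefficients; the function names `enstrophy` and `kappa_norm`, the truncation to a finite box of modes, and the test field are all hypothetical choices made for illustration and are not part of the paper.

```python
import math

def enstrophy(omega):
    """Half the sum of |omega_k|^2 over the stored modes, as in (8)."""
    return 0.5 * sum(abs(w) ** 2 for w in omega.values())

def kappa_norm(omega, r, kappa):
    """sup over stored nonzero modes of |omega_k| |k|^r exp(|k| / kappa), as in (9)."""
    best = 0.0
    for (k1, k2), w in omega.items():
        modk = math.hypot(k1, k2)
        if modk == 0.0:
            continue  # the k = 0 mode vanishes by the zero-average assumption
        best = max(best, abs(w) * modk ** r * math.exp(modk / kappa))
    return best

# Hypothetical test field: omega_k = |k|^{-r} exp(-|k| / kappa0) on a finite box.
r, kappa0 = 1.5, 2.0
omega = {
    (k1, k2): math.hypot(k1, k2) ** (-r) * math.exp(-math.hypot(k1, k2) / kappa0)
    for k1 in range(-8, 9)
    for k2 in range(-8, 9)
    if (k1, k2) != (0, 0)
}

print(kappa_norm(omega, r, kappa0))   # every mode contributes about 1
print(kappa_norm(omega, r, 4.0))      # a larger kappa gives a smaller norm
print(enstrophy(omega))
```

For this test field the κ-norm with $\kappa = \kappa_0$ is attained by every mode simultaneously, while a larger $\kappa$ gives a strictly smaller value, in line with Remark 1.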
2. The Deterministic Navier–Stokes Flow

In this section we derive some properties of the flow of the deterministic Navier–Stokes equation, i.e. (6) without the forcing term $g_k(t)$. Let us define a family of subsets of $\Omega$ that impose constraints on the size of the $L^2$-norm and of the κ-norm:

$$U(\kappa,\phi,A) = \{\omega \mid \Phi \le \phi,\ \|\omega\|_\kappa \le A\}. \tag{13}$$

Then, we introduce a one-parameter subfamily of $U(\kappa,\phi,A)$:

$$U_\kappa \equiv U(\kappa,\phi(\kappa),A(\kappa)), \tag{14}$$

where $\phi(\kappa) = \nu^2\varphi\kappa^{2/\alpha}$ and $A(\kappa) = \nu a\kappa$. This family is useful because, as we shall see, the flow maps one $U_\kappa$ in that family into another one with a smaller $\kappa$. The parameter $\alpha$ will be taken to satisfy $\alpha > 1 + r$ and $\varphi$ and $a$ will be chosen small depending on some “geometric” constants that will appear in the course of the proof. Thus, if $\omega\in U_\kappa$, then for all $k$ we have

$$|\omega_k| \le \nu a\kappa\,|k|^{-r}e^{-\kappa^{-1}|k|} \tag{15}$$

and

$$\Phi \le \nu^2\varphi\kappa^{2/\alpha}. \tag{16}$$

Let now

$$\kappa(t) = \frac{\kappa}{1 + \eta\nu t\min(1,\kappa)}, \tag{17}$$

where $\eta$ will be chosen suitably small below, and denote also by $\omega(t)$ the solution of (6) without the forcing term $g_k(t)$.

Proposition 1. (a) Let $\omega(0)\in U_\kappa$, then for all $0\le t\le 1$, $\omega(t)\in U_{\kappa(t)}$. (b) Suppose $\omega(0)\in\Omega$ with $\|\omega(0)\| \le D\nu$. Then $\omega(1)\in U_\kappa$ for $\kappa = C(D^\alpha + \frac{1}{\nu})$.

The point of part (a) of this proposition is that the domain of analyticity of the solution of the unforced Navier–Stokes equation increases with time and its $L^2$ and κ-norms decrease with time. Part (b) says that, even if $\omega(0)$ is not analytic, but belongs to $\Omega$, the solution after time 1 is analytic and its $L^2$ and κ-norms are bounded in terms of the norm of the initial data in $\Omega$.

Our proof of Proposition 1 is inspired by [5] (see also [1]). For the proof we rewrite (6) (without forcing) in integral form

$$\omega_k(t) = e^{-t\nu k^2}\omega_k(0) + \int_0^t ds\, e^{(s-t)\nu k^2}\sum_{l\in\mathbb{Z}^2\setminus\{0,k\}} (k\times l)\,|l|^{-2}\,\omega_{k-l}(s)\,\omega_l(s) \tag{18}$$

and solve this in a suitable Banach space. Let $Y_\kappa$ be the Banach space equipped with the norm $\|\cdot\|_\kappa$ and

$$X_{\kappa,\tau} = \{\omega\in C^0([0,\tau],Y_\kappa) \mid |||\omega||| \equiv \sup_{t\in[0,\tau]}\|\omega(t)\|_{\kappa(t)} < \infty\}. \tag{19}$$

We have the following existence lemma.
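Lemma 1. Let $\omega(0)\in U_\kappa$, then the solution $\omega$ of Eq. (18) exists in the set $X_{\kappa,\tau}$, for $\tau \le (C\nu^{1/2}\kappa)^{-2}$. Moreover, $|||\omega||| \le 2\nu a\kappa$.

Proof. Let

$$\omega^0_k(t) = e^{-t\nu k^2}\omega_k(0), \tag{20}$$

and write (18) as a fixed point equation

$$\omega(t) = \omega^0(t) + N(\omega)(t) \equiv \mathcal{F}(\omega)(t) \tag{21}$$

with

$$N_k(\omega)(t) \equiv \int_0^t ds\, e^{(s-t)\nu k^2}\sum_{l\in\mathbb{Z}^2\setminus\{0,k\}} (k\times l)\,|l|^{-2}\,\omega_{k-l}(s)\,\omega_l(s). \tag{22}$$

We show that the map $\mathcal{F}$ is a contraction in the ball

$$B = \{\omega\in X_{\kappa,\tau} \mid |||\omega - \omega^0||| \le \nu a\kappa\}. \tag{23}$$

Let us first show that $\mathcal{F}$ maps $B$ into itself. Obviously, $|||\omega^0||| \le \nu a\kappa$ and if $\omega\in B$, then $|||\omega||| \le 2\nu a\kappa$, which means

$$|\omega_k(t)| \le 2\nu a\kappa\,|k|^{-r}e^{-\kappa(t)^{-1}|k|}. \tag{24}$$

We must prove that $|||N(\omega)||| \le \nu a\kappa$, i.e.

$$|N_k(\omega)(t)| \le \nu a\kappa\,|k|^{-r}e^{-\kappa(t)^{-1}|k|}, \tag{25}$$

for all $k\in\mathbb{Z}^2\setminus 0$ (recall that $N_k = 0$ for $k = 0$) and for all $t\in[0,\tau]$. Inserting (24) and $|k\times l|\,|l|^{-2} \le |k|\,|l|^{-1}$ in (22), we get:

$$|N_k(\omega)(t)| \le (2\nu a\kappa)^2|k|\int_0^t ds\, e^{(s-t)\nu k^2}\sum_{l\in\mathbb{Z}^2\setminus\{0,k\}} e^{-\kappa(s)^{-1}|k-l|}\,e^{-\kappa(s)^{-1}|l|}\,|k-l|^{-r}\,|l|^{-r-1}. \tag{26}$$

Writing (17) as $\kappa(t)^{-1} = \kappa^{-1} + \eta\nu t\min(1,\kappa)\kappa^{-1}$, we obtain, since $k\neq 0$ means $|k|\ge 1$, that

$$\tfrac12(s-t)\nu k^2 \le \eta(s-t)\nu|k| \le (\kappa(s)^{-1} - \kappa(t)^{-1})|k| \tag{27}$$

holds for $0\le s\le t\le 1$ and $\eta\le\tfrac12$. Since $-|k-l| - |l| \le -|k|$ and

$$\sum_{l\in\mathbb{Z}^2\setminus\{0,k\}} |k-l|^{-r}\,|l|^{-r-1} \le C|k|^{-r} \tag{28}$$

(since $r > 1$), we get

$$|N_k(\omega)(t)| \le C(2\nu a\kappa)^2|k|\int_0^t ds\, e^{\frac12(s-t)\nu k^2}\,|k|^{-r}e^{-\kappa(t)^{-1}|k|} = C\nu(2a\kappa)^2\,2|k|^{-1}\big(1 - e^{-\frac12 t\nu k^2}\big)\,|k|^{-r}e^{-\kappa(t)^{-1}|k|}. \tag{29}$$

Since $|k|^{-1}(1 - e^{-\frac12 t\nu k^2}) \le (\nu t)^{1/2}$, (25) follows for $\tau \le (C\nu^{1/2}\kappa)^{-2}$. The contractive property is proven similarly. Thus we obtain a unique solution of (21) in $B$, which satisfies (24), hence $\|\omega\|_{\kappa(t)} \le 2\nu a\kappa$, $\forall t \le (C\nu^{1/2}\kappa)^{-2}$.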
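The chain of inequalities (27) is what converts viscous damping into a gain of analyticity, and it can be checked numerically. The following is a minimal sketch, assuming hypothetical parameter values and hypothetical helper names `kappa_t` and `check_27`; it evaluates the three expressions in (27) on a small grid and is an illustration, not a proof.

```python
def kappa_t(kappa, t, nu, eta):
    """Analyticity parameter kappa(t) from (17)."""
    return kappa / (1.0 + eta * nu * t * min(1.0, kappa))

def check_27(kappa, nu, eta, modk, s, t, tol=1e-12):
    """Return True if both inequalities in (27) hold for the given 0 <= s <= t <= 1."""
    left = 0.5 * (s - t) * nu * modk ** 2
    middle = eta * (s - t) * nu * modk
    right = (1.0 / kappa_t(kappa, s, nu, eta) - 1.0 / kappa_t(kappa, t, nu, eta)) * modk
    # tol absorbs rounding in the cases where (27) holds with equality
    return left <= middle + tol and middle <= right + tol

# Hypothetical parameter grid: small viscosity, eta at its largest allowed value 1/2.
nu, eta = 0.01, 0.5
grid = [i / 10 for i in range(11)]
all_ok = all(
    check_27(kappa, nu, eta, modk, s, t)
    for kappa in (0.3, 1.0, 5.0)
    for modk in (1.0, 2 ** 0.5, 2.0, 5.0, 10.0)
    for s in grid
    for t in grid
    if s <= t
)
print(all_ok)
```

The first inequality is the one that needs $\eta \le \tfrac12$ and $|k| \ge 1$; calling `check_27` with a larger `eta` at $|k| = 1$ shows it failing.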
Proof of Proposition 1. (a) It suffices to show that the solution constructed in Lemma 1 on the interval $[0,\tau]$ satisfies the two bounds of the proposition there: the one on the $\kappa(t)$ norm of the solution $\omega$ and the one on its enstrophy. This implies trivially that the solution can be extended to the whole interval $[0,1]$ and satisfies also there the bounds of the proposition. The bound on the enstrophy is easy to prove; as is well known, the enstrophy satisfies

$$\frac{d}{dt}\Phi(t) = -\sum_{k\neq 0}\nu k^2|\omega_k|^2 \le -\nu\Phi(t), \tag{30}$$

leading to $\Phi(t) \le \Phi(0)e^{-\nu t}$. Since $e^{-\nu t} \le (1 + \eta\nu t\min(1,\kappa))^{-2/\alpha}$ for $\eta$ small, we get

$$\Phi(t) \le \nu^2\varphi\,\kappa(t)^{2/\alpha}, \tag{31}$$

i.e. the claim of the proposition concerning $\Phi(t)$.

To prove the bound $\|\omega\|_{\kappa(t)} \le \nu a\kappa(t)$, we consider separately the cases $\kappa < 1$ and $\kappa \ge 1$. If $\kappa < 1$, it is enough to use the bound (29) which, since $|k|\ge 1$, gives

$$|N_k(\omega)(t)| \le \nu a\kappa\lambda\big(1 - e^{-\frac12 t\nu k^2}\big)\,|k|^{-r}e^{-\kappa(t)^{-1}|k|}, \tag{32}$$

where $\lambda$ can be chosen arbitrarily small by decreasing $a$. Now inserting this bound and

$$e^{-\frac12 t\nu k^2}\,|k|^{-r}e^{-\kappa^{-1}|k|} \le |k|^{-r}e^{-\kappa(t)^{-1}|k|},$$

which follows from (27) with $s = 0$, in (21), we conclude that

$$|\omega_k(t)| \le \nu a\kappa\Big(e^{-\frac12 t\nu k^2} + \lambda\big(1 - e^{-\frac12 t\nu k^2}\big)\Big)|k|^{-r}e^{-\kappa(t)^{-1}|k|} \tag{33}$$

$$\le \nu a\kappa(t)\,|k|^{-r}e^{-\kappa(t)^{-1}|k|}, \tag{34}$$

since

$$e^{-\frac12 t\nu k^2} + \lambda\big(1 - e^{-\frac12 t\nu k^2}\big) \le (1 + \eta\nu t\min(1,\kappa))^{-1}, \tag{35}$$

which holds, since $|k|\ge 1$, for $\lambda$ and $\eta$ small enough and $0\le t\le 1$. Inequality (34) is the claim of part (a) of the proposition concerning the $\kappa(t)$ norm of the solution, namely $\|\omega(t)\|_{\kappa(t)} \le \nu a\kappa(t)$.
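The elementary inequality behind the enstrophy bound (31) can be spelled out. In the sketch below, $\Phi(t)$ denotes the enstrophy at time $t$ and $x = \eta\nu t\min(1,\kappa)$ is a shorthand introduced only for this illustration; the explicit sufficient condition $\eta \le \alpha/2$ is one convenient way to read "for $\eta$ small" and is not stated in this form in the text.

```latex
% Shorthand (an assumption of this sketch): x = \eta\nu t\min(1,\kappa), so 0 \le x \le \eta\nu t.
\begin{align*}
(1+x)^{-2/\alpha}
  &= \exp\!\Big(-\tfrac{2}{\alpha}\log(1+x)\Big)
   \ge \exp\!\Big(-\tfrac{2}{\alpha}\,x\Big)
   && \text{since } \log(1+x)\le x,\\
  &\ge \exp\!\Big(-\tfrac{2\eta}{\alpha}\,\nu t\Big)
   && \text{since } x\le \eta\nu t,\\
  &\ge e^{-\nu t}
   && \text{if } \eta\le \alpha/2 .
\end{align*}
% Combined with \Phi(0) \le \nu^2\varphi\kappa^{2/\alpha} and the definition (17) of \kappa(t):
\begin{equation*}
\Phi(t) \le \Phi(0)\,e^{-\nu t}
        \le \nu^2\varphi\,\kappa^{2/\alpha}(1+x)^{-2/\alpha}
        = \nu^2\varphi\,\kappa(t)^{2/\alpha}.
\end{equation*}
```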
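Turning to $\kappa \ge 1$, fix a number $1 < \beta < \frac{\alpha-1}{r}$ (recall that $\alpha > 1 + r$) and consider first $1 \le |k| \le \kappa^{\beta/\alpha}$. Using $|\omega_k(t)| \le (2\Phi(t))^{1/2}$, $\Phi(t) \le \nu^2\varphi\,\kappa(t)^{2/\alpha}$, we get immediately

$$|\omega_k(t)| \le (2\nu^2\varphi)^{1/2}\kappa(t)^{1/\alpha} \le \nu a\kappa(t)\,|k|^{-r}e^{-\kappa(t)^{-1}|k|} \tag{36}$$

for $\varphi$ small enough and $t\le 1$ since $|k|^r \le \kappa^{r\beta/\alpha} \le \kappa^{1-\frac{1}{\alpha}}$ and $\kappa(t)^{-1}|k| \le 1$. To conclude the proof, it suffices to show

$$|N_k(\omega)(t)| \le \nu a\kappa\lambda\big(1 - e^{-\frac12 t\nu k^2}\big)\,|k|^{-r}e^{-\kappa(t)^{-1}|k|}, \tag{37}$$

for $|k| \ge \kappa^{\beta/\alpha}$ since we may then proceed as in (33–35).

Consider first the case $\kappa^{\beta/\alpha} < |k| \le \kappa$. We bound $|k\times l|\,|l|^{-2} \le |k|\,|l|^{-1}$ and split the sum in (22) into

$$\Big(\sum_{0\neq|l|\le\frac{|k|}{2}} + \sum_{l\neq k,\ |l|>\frac{|k|}{2}}\Big)|\omega_{k-l}(s)|\,|\omega_l(s)|\,|k|\,|l|^{-1} \equiv \Sigma_1 + \Sigma_2. \tag{38}$$

In the first sum, we bound, using Lemma 1, $|\omega_{k-l}(s)| \le 2\nu a\kappa|k-l|^{-r} \le C\nu a\kappa|k|^{-r}$, since $|k-l| \ge \tfrac12|k|$. Then Schwarz' inequality and (31) yield

$$\sum_{0\neq|l|\le\frac{|k|}{2}} |\omega_l(s)|\,|l|^{-1} \le (2\nu^2\varphi)^{1/2}\kappa^{1/\alpha}\Big(\sum_{0\neq|l|\le\frac{|k|}{2}} |l|^{-2}\Big)^{1/2} \le C\nu\varphi^{1/2}\kappa^{1/\alpha}(\log|k|)^{1/2}. \tag{39}$$

Combining these two bounds, we get

$$\Sigma_1 \le C\nu^2|k|\,\varphi^{1/2}\kappa^{1/\alpha}(\log|k|)^{1/2}\,a\kappa|k|^{-r}. \tag{40}$$

For the second sum, we use $|\omega_l(s)| \le 2\nu a\kappa|l|^{-r}$ (coming from Lemma 1 again), together with (31) and Schwarz' inequality to bound it by

$$\Sigma_2 \le C\nu^2|k|\,\varphi^{1/2}\kappa^{1/\alpha}\,a\kappa\Big(\sum_{l\neq k,\ |l|>\frac{|k|}{2}} |l|^{-2(r+1)}\Big)^{1/2} \le C\nu^2|k|\,\varphi^{1/2}\kappa^{1/\alpha}\,a\kappa|k|^{-r}. \tag{41}$$

Inserting (38), (40) and (41) into $N_k(\omega)(t)$ and performing the integral over time, we get the bound

$$|N_k(\omega)(t)| \le C\nu\varphi^{1/2}\kappa^{1/\alpha}(\log|k|)^{1/2}\,|k|^{-1}\big(1 - e^{-t\nu k^2}\big)\,a\kappa|k|^{-r} \le C\nu e^{1+\eta\nu}\varphi^{1/2}\kappa^{1/\alpha}\kappa^{-\beta/\alpha}(\log\kappa)^{1/2}\big(1 - e^{-\frac12 t\nu k^2}\big)\,a\kappa|k|^{-r}e^{-\kappa(t)^{-1}|k|}, \tag{42}$$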
where we used $\kappa^{\beta/\alpha} < |k| \le \kappa$ and

$$1 \le e^{-\kappa(t)^{-1}|k|}\,e^{1+\eta\nu},$$

which holds since $|k|\le\kappa$ and, see (17), $\kappa(t)^{-1} \le \kappa^{-1}(1+\eta\nu)$ if $0\le t\le 1$. Thus we obtain (37) for $\varphi$ small enough, because $\beta > 1$ and $\log x \le \frac{1}{\epsilon}x^{\epsilon}$ for $x > 1$ and $\epsilon > 0$. Finally, in the case $|k| > \kappa$ the bound (29) yields immediately (37) for $a$ small. This finishes the proof of part (a) of Proposition 1.

For part (b), we can proceed as in Lemma 1, but replace $a\kappa$ by $D$ and, in the definition (19), $\kappa(t)$ by $\frac{2}{\nu t}$. The inequality (27) is then replaced by $\tfrac12(s-t)\nu k^2 \le \tfrac12(s-t)\nu|k|$ and the proof goes as before to the conclusion

$$\|\omega(t)\|_{\frac{2}{\nu t}} \le 2\nu D \tag{43}$$

for $t \le (C\nu^{1/2}D)^{-2} \equiv \tau$. We want to rewrite this bound in the form $\|\omega(\tau)\|_\kappa \le \nu a\kappa$, for a suitable $\kappa$. If $\tau\le 1$, i.e. $D \ge C^{-1}\nu^{-1/2}$, we have $\frac{2}{\nu\tau} = CD^2$ and we can write (43) as $\|\omega(\tau)\|_\rho \le 2\nu D$ where $\rho = CD^2$ (remember that $C$ is allowed to vary). Choosing now $\kappa = C'\rho = C'D^2$, we obtain (since $D$ is bounded away from zero, hence $D \le CD^2$) that $\|\omega(\tau)\|_\kappa \le \nu a\kappa$. Applying now part (a) yields the same claim for $\omega(1)$. For $\tau > 1$, i.e. $D < C^{-1}\nu^{-1/2}$, we get $\|\omega(1)\|_{\frac{2}{\nu}} \le 2\nu D \le C$, given the bound on $D$; so, $\|\omega(1)\|_\kappa \le \nu a\kappa$, with $\kappa = \frac{C}{\nu}$.

Finally, for the enstrophy, we have, by (31),

$$\Phi(t) \le \Phi(0) \le \nu^2CD^2 \le \nu^2\varphi\kappa^{2/\alpha}$$

if we take $\kappa > CD^\alpha$. Since $\alpha > 2$, taking $\kappa = C\left(D^\alpha + \frac{1}{\nu}\right)$ gives an upper bound covering all cases, i.e. $\omega(1)\in U_\kappa$.
3. Probabilistic Estimates

We define a region $U \equiv U_{\nu^{-p}}$, where $p > \frac72\alpha$, in which the solution of (6) is confined with high probability. Let us divide the transition probability into a likely and an unlikely part:

$$P(\omega,E) = Q(\omega,E) + R(\omega,E), \tag{44}$$

where

$$Q(\omega,E) = \chi_U(\omega)\,P(\omega,U\cap E). \tag{45}$$

The following proposition about the dynamics in $U$ and the improbability of excursions outside $U$ will play a central role in the proof of our uniqueness result. (Footnote 1: Here and below, the kernel $AB(\omega,E)$ is defined in the obvious way by $\int A(\omega,d\omega')B(\omega',E)$.)

Proposition 2. (a) There exist constants $c, C < \infty$, $c' > 0$, such that for all $\omega\in U$, $E\in\mathcal{B}$,

$$|Q^t(\omega,E) - Q^t(0,E)| \le 4e^{-mt}, \tag{46}$$

where $m \ge \exp\left(-C\nu^{-3}(\log\nu^{-1})^{c}\right)$ and $t \le c'm^{-1}\nu^{-q}$, with $q \equiv \frac{2p}{\alpha} - 4 > 3$.

(b) There exist $\zeta < 1$, $c > 0$, $C < \infty$, such that $\forall\kappa\ge 0$, for all $\omega\in U_\kappa$ and for $\kappa' \ge \zeta\kappa$,

$$P(\omega,U^c_{\kappa'}) \le C\exp\left(-c\,\nu^4\kappa'^{\,2/\alpha}\right). \tag{47}$$

The proof of (46) is based on a standard argument for exponential convergence of Markov chains (given in Doob [2]), and the idea is fairly simple. If $Q$ was a genuine transition probability, it would be enough, in order to prove the proposition, to show that $Q$ has good mixing properties. The precise properties are stated in the lemmas below. First, Lemma 2 says that, for any point in $U$ there is a nonzero probability to go in a finite time to a smaller region $\bar U\subset U$ determined by the covariance of the noise and thus by $\kappa_\gamma$:

$$\bar U \equiv U_{2\kappa_\gamma+\rho\nu}, \tag{48}$$

where $\rho > 0$ will be chosen below (sufficiently small). (Footnote 2: Similar ideas were used by Kuksin and Shirikyan in [4].) This is an easy consequence of Proposition 1. On each time interval, the solution increases its domain of analyticity (which is determined by $\kappa$, i.e. $\kappa$ decreases); then, if the “kicks” of the noise are sufficiently small (but not too small, so that this event is not too improbable), the solution reaches $\bar U$ in a finite time (of order $\nu^{-1}\log\nu^{-1}$). Secondly, we show in Lemma 3 that, in the region $\bar U$, the stochastic dynamics is sufficiently mixing; this is again due to the fact that the deterministic Navier–Stokes evolution increases the domain of analyticity of the solution. Third, the fact that $Q$ is not a bona fide transition probability is what limits the proposition to finite times. For longer times, we will need to have some estimate on the probability of escaping the region $U$, which follows from part (b) of the Proposition. Indeed, the latter implies, using (44, 45) and taking $\kappa = \kappa' = \nu^{-p}$ that, for all $\omega\in U$,

$$P(\omega,U^c) = R(\omega,\Omega) \le e^{-c\nu^{-q}}, \tag{49}$$

with $q = \frac{2p}{\alpha} - 4 > 3$ (remember that $p > \frac72\alpha$ and that $\nu$ is small).

Lemma 2. There exist constants $c, C < \infty$, such that $\forall\omega\in U$,

$$P^{T_1}(\omega,\bar U) \ge \exp\left(-C\nu^{-3}(\log\nu^{-1})^{c}\right), \tag{50}$$

with $T_1 = C\nu^{-1}\log\nu^{-1}$.

Lemma 3. There exist constants $c, C < \infty$, such that, $\forall\omega,\omega'\in\bar U$, $\forall B\subset\bar U$,

$$P(\omega,B) + P(\omega',\bar U\setminus B) \ge \exp\left(-C\nu^{-2}(\log\nu^{-1})^{c}\right). \tag{51}$$

Lemmas 2, 3 imply that there exist

$$\delta(\nu) \equiv \exp\left(-C\nu^{-3}(\log\nu^{-1})^{c}\right) \quad\text{and}\quad T \equiv T(\nu) = C\nu^{-1}\log\nu^{-1}$$

with $C, c < \infty$, such that $\forall\omega,\omega'\in U$ and $\forall B\subset\bar U$,

$$P^T(\omega,B) + P^T(\omega',\bar U\setminus B) \ge \delta(\nu), \tag{52}$$

which implies in turn, since $\bar U\subset U$, that $\forall\omega,\omega'\in U$ and $\forall B\subset U$,

$$P^T(\omega,B) + P^T(\omega',U\setminus B) \ge \delta(\nu). \tag{53}$$

This is the main inequality that we shall use now.
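3.1. Proof of Proposition 2. We start with the proof of part (a), where we shall use (49), which is a consequence of part (b), to be proven independently below. To get (46) we follow, with slight modifications, an argument in [2, pp. 197–198]. Let

$$\underline{Q}(t,E) = \inf_{\omega\in U} Q^t(\omega,E), \qquad \overline{Q}(t,E) = \sup_{\omega\in U} Q^t(\omega,E).$$

Fix $\omega,\omega'\in U$ and consider the function defined on subsets $E\subset\Omega$:

$$\psi_{\omega,\omega'}(E) = Q^T(\omega,E) - Q^T(\omega',E).$$

Let $S^+$ be the set such that $\psi_{\omega,\omega'}(E) \ge 0$ for $E\subset S^+$ and $\psi_{\omega,\omega'}(E) \le 0$ for $E\subset U\setminus S^+ \equiv S^-$ ($S^\pm$ depend on $\omega,\omega'$, but we suppress this dependence). Observe that writing, see (44), $P = Q + R$, and using (49), we have, for any $\omega\in U$, $E\subset\Omega$, that

$$|P^T(\omega,E) - Q^T(\omega,E)| = \Big|\sum_{t=0}^{T-1} Q^tRP^{T-t-1}(\omega,E)\Big| \le Te^{-c\nu^{-q}} \equiv \tfrac12\epsilon(\nu). \tag{54}$$

Then,

$$|\psi_{\omega,\omega'}(S^+) + \psi_{\omega,\omega'}(S^-)| = |Q^T(\omega,\Omega) - Q^T(\omega',\Omega)| \le \epsilon(\nu), \tag{55}$$

since $P^T(\omega,\Omega) = P^T(\omega',\Omega) = 1$. Moreover, using (54, 53),

$$\psi_{\omega,\omega'}(S^+) = Q^T(\omega,S^+) - Q^T(\omega',S^+) \le 1 - \big(P^T(\omega,S^-) + P^T(\omega',S^+)\big) + \epsilon(\nu) \le 1 - \delta(\nu) + \epsilon(\nu). \tag{56}$$

Thus,

$$\overline{Q}(t+T,E) - \underline{Q}(t+T,E) = \sup_{\omega,\omega'}\int\big(Q^T(\omega,d\omega'') - Q^T(\omega',d\omega'')\big)Q^t(\omega'',E)$$

$$= \sup_{\omega,\omega'}\int\psi_{\omega,\omega'}(d\omega'')\,Q^t(\omega'',E)$$

$$\le \sup_{\omega,\omega'}\big(\psi_{\omega,\omega'}(S^+)\,\overline{Q}(t,E) + \psi_{\omega,\omega'}(S^-)\,\underline{Q}(t,E)\big)$$

$$\le (1 - \delta(\nu) + \epsilon(\nu))\big(\overline{Q}(t,E) - \underline{Q}(t,E)\big) + \epsilon(\nu),$$

where, to get the last inequality, we write $\psi_{\omega,\omega'}(S^-) = -\psi_{\omega,\omega'}(S^+) + \psi_{\omega,\omega'}(S^+) + \psi_{\omega,\omega'}(S^-)$, bound $\psi_{\omega,\omega'}(S^+)$ by (56), $|\psi_{\omega,\omega'}(S^+) + \psi_{\omega,\omega'}(S^-)|$ by (55) and use $\underline{Q}(t,E) \le 1$. We conclude that, for $\epsilon(\nu) < \delta(\nu)$,

$$|Q^{nT}(\omega,E) - Q^{nT}(0,E)| \le \overline{Q}(nT,E) - \underline{Q}(nT,E) \le 2(1 - \delta(\nu) + \epsilon(\nu))^{n-1} + \frac{\epsilon(\nu)}{\delta(\nu) - \epsilon(\nu)}.$$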
Recall that $\delta(\nu) = \exp(-C\nu^{-3}(\log\nu^{-1})^{c})$ and that $T = T(\nu) = C\nu^{-1}\log\nu^{-1}$, hence, see (54), $\epsilon(\nu) \le e^{-c'\nu^{-q}}$; so, since we assume $\nu$ to be small and $q > 3$, part (a) of the proposition follows.

Let us now prove part (b). It suffices to assume $\kappa \ge C\nu^{-2\alpha}$ since the LHS of (47) is bounded by one. Using (10) for $n = 0$, we have

$$\|\omega(1)\|_{\kappa'} \le \|F(\omega(0))\|_{\kappa'} + \|g(1)\|_{\kappa'} \tag{57}$$

and

$$\Phi(1) = \tfrac12\|\omega(1)\|^2_{L^2} \le \|F(\omega(0))\|^2_{L^2} + \|g(1)\|^2_{L^2}, \tag{58}$$

and, by Proposition 1, we know that, if $\omega(0)\in U_\kappa$, $F(\omega(0))\in U_{\kappa(1)}$, with $\kappa(1) = \frac{\kappa}{1+\eta\nu}$ (recall that $\kappa \ge C\nu^{-2\alpha} \ge 1$ here). Then, letting $\zeta = (1+\eta\nu)^{-1/2} < 1$, we get, for $\kappa' \ge \zeta\kappa$, that

$$\|F(\omega(0))\|_{\kappa'} \le (1+\eta\nu)^{-1/2}\,\nu a\kappa'$$

and

$$\|F(\omega(0))\|^2_{L^2} \le (1+\eta\nu)^{-2/\alpha}\,\nu^2\varphi\kappa^{2/\alpha}.$$

Now, assume that $g(1)$ satisfies, $\forall k$,

$$|g_k(1)| \le \epsilon_1\nu^2\kappa'^{\,1/\alpha}\,b^{-1/2}\,|k|\,(\gamma_k)^{1/2}, \tag{59}$$

with $\epsilon_1$ small (depending on $\eta$ but independent of $\nu$ and $\kappa$). Then, we get that $\omega(1)\in U_{\kappa'}$, using the upper bound in (7) and the fact that, for $\nu$ small, $\kappa' \ge \zeta\kappa \ge C\zeta\nu^{-2\alpha}$ is much larger than $2\kappa_\gamma$. Hence, the probability in (47) is bounded by the probability that at least one of the inequalities in (59) is violated. Since the $g_k$'s are Gaussian random variables with covariance $\gamma_k$, this event has a probability less than:

$$1 - \prod_k\Big(1 - C\exp\big(-C\epsilon_1^2\nu^4(\kappa')^{2/\alpha}b^{-1}|k|^2\big)\Big). \tag{60}$$

For $\nu^4(\kappa')^{2/\alpha}$ large, each exponential in the product is small. The factor $|k|^2$ controls the sum over $k$ of $\exp\big(-C\epsilon_1^2\nu^4(\kappa')^{2/\alpha}b^{-1}|k|^2\big)$, a sum which is small for $\nu^4(\kappa')^{2/\alpha}$ large enough, and therefore (60) is bounded from above by $1 - \big(1 - \exp(-c\nu^4\kappa'^{\,2/\alpha})\big)$, i.e. by the RHS of (47).
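The contraction step used for part (a) has the generic form $a_{n+1} \le (1-\delta+\epsilon)\,a_n + \epsilon$, where $a_n$ stands for the gap between the supremum and the infimum of $Q^{nT}(\cdot,E)$. The following is a minimal sketch of how such a recursion behaves, with hypothetical function names and hypothetical values of $\delta$ and $\epsilon$; it iterates the worst case (equality) and compares it with the geometric-series bound, whose form is slightly simpler than the one displayed in the proof.

```python
def iterate_gap(delta, eps, n, gap0=1.0):
    """Iterate gap -> (1 - delta + eps) * gap + eps, n times, starting from gap0."""
    gap = gap0
    for _ in range(n):
        gap = (1.0 - delta + eps) * gap + eps
    return gap

def closed_form_bound(delta, eps, n, gap0=1.0):
    """Geometric-series bound (1 - delta + eps)^n * gap0 + eps / (delta - eps)."""
    return (1.0 - delta + eps) ** n * gap0 + eps / (delta - eps)

# Hypothetical values with eps < delta, as required in the proof.
delta, eps = 0.1, 0.01
for n in (1, 5, 50, 500):
    print(n, iterate_gap(delta, eps, n), closed_form_bound(delta, eps, n))
```

The iterates decay geometrically until they reach the floor $\epsilon/(\delta-\epsilon)$, which is why the estimate (46) is only useful as long as $\epsilon(\nu)$ is much smaller than $\delta(\nu)$, i.e. for the finite range of times stated in Proposition 2.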
3.2. Proofs of Lemmas 2 and 3.

Proof of Lemma 2. Let $\omega(0)\in U_\kappa\subseteq U$ and consider (10) for $n = 0$. Choose $g(1)$ such that, $\forall k$,

$$|g_k(1)| \le \epsilon_1\nu^2e^{\epsilon_2\nu|k|}\,b^{-1/2}(\gamma_k)^{1/2}. \tag{61}$$

From Proposition 1, (61) and (7), one obtains

$$|\omega_k(1)| \le \nu a\kappa(1)\,|k|^{-r}e^{-\kappa(1)^{-1}|k|} + \epsilon_1\nu^2e^{\epsilon_2\nu|k|}\,e^{-\kappa_\gamma^{-1}\frac{|k|}{2}}. \tag{62}$$

Then, from (17), one gets, for any $\rho > 0$, by choosing $\epsilon_1,\epsilon_2$ small enough, that $\exists\bar\lambda < e^{-c\nu} < 1$ such that, $\forall k$,

$$|\omega_k(1)| \le \nu a\kappa'\,|k|^{-r}e^{-(\kappa')^{-1}|k|} \tag{63}$$

with $\kappa' = \max(\bar\lambda\kappa,\ 2\kappa_\gamma + \rho\nu)$. From (31), (58), (59), one also easily obtains that

$$\Phi(1) \le \nu^2\varphi(\kappa')^{2/\alpha},$$

and thus that $\omega(1)\in U_{\kappa'}$. Thus,

$$P(\omega,U_{\kappa'}) \ge \prod_k P\Big(|g_k(1)| \le \epsilon_1\nu^2e^{\epsilon_2\nu|k|}\,b^{-1/2}(\gamma_k)^{1/2}\Big).$$

Now, since the $g_k$'s are Gaussian random variables with covariance $\gamma_k$, we have that

$$P\Big(|g_k(1)| \le \epsilon_1\nu^2e^{\epsilon_2\nu|k|}\,b^{-1/2}(\gamma_k)^{1/2}\Big) \ge 1 - \exp\big(-c\epsilon_1^2\nu^4e^{2\epsilon_2\nu|k|}b^{-1}\big) \tag{64}$$

for $|k| \ge C\nu^{-1}\log\nu^{-1}$, if $C$ is chosen so that $b\nu^{-4} \le e^{\epsilon_2\nu|k|}$ (note that the product over such $k$'s of the RHS of (64) is strictly positive uniformly in $\nu$), while for $|k| \le C\nu^{-1}\log\nu^{-1}$,

$$P\Big(|g_k(1)| \le \epsilon_1\nu^2e^{\epsilon_2\nu|k|}\,b^{-1/2}(\gamma_k)^{1/2}\Big) \ge P\Big(|g_k(1)| \le \epsilon_1\nu^2\,b^{-1/2}(\gamma_k)^{1/2}\Big) \ge C\nu^4, \tag{65}$$

which follows from the fact that the $g_k$'s are (complex) Gaussian random variables with covariance $\gamma_k$ and therefore that

$$P\Big(|g_k(1)| \le \epsilon_1\nu^2\,b^{-1/2}(\gamma_k)^{1/2}\Big) \ge C\int_0^{\epsilon_k}\frac{2r\,dr}{\gamma_k} \ge C\nu^4$$

with $\epsilon_k \equiv \epsilon_1\nu^2b^{-1/2}(\gamma_k)^{1/2}$. The bound (65) readily implies that there are constants $C, c_1 < \infty$ such that $\forall\omega\in U_\kappa$,

$$P(\omega,U_{\kappa'}) \ge \exp\left(-C\nu^{-2}(\log\nu^{-1})^{c_1}\right). \tag{66}$$

Since $U = U_{\nu^{-p}}$, and since $\kappa$ decreases by a factor $\bar\lambda < 1$ at each step, as long as $\bar\lambda\kappa \ge 2\kappa_\gamma + \rho\nu$, one may iterate the above argument and reach $\bar U = U_{2\kappa_\gamma+\rho\nu}$, see (48), in a time less than $T_1(\nu) = C\nu^{-1}\log\nu^{-1}$, $\forall\omega(0)\in U$. Therefore, the claim of the lemma follows (with a different $C$ than in (66), and with $c = c_1 + 1$).
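The time scale $T_1 = C\nu^{-1}\log\nu^{-1}$ in Lemma 2 comes from counting how many contractions $\kappa \mapsto \max(\bar\lambda\kappa,\ 2\kappa_\gamma+\rho\nu)$ with $\bar\lambda = e^{-c\nu}$ are needed to go from $\nu^{-p}$ down to $2\kappa_\gamma+\rho\nu$. The following is a minimal sketch of that count, assuming hypothetical numerical values for $\nu$, $p$, $c$, $\kappa_\gamma$ and $\rho$ and a hypothetical helper name `steps_to_reach`; none of these values come from the paper.

```python
import math

def steps_to_reach(kappa_start, kappa_floor, lam):
    """Count iterations of kappa -> max(lam * kappa, kappa_floor) until the floor is hit."""
    steps = 0
    kappa = kappa_start
    while kappa > kappa_floor:
        kappa = max(lam * kappa, kappa_floor)
        steps += 1
    return steps

# Hypothetical values, chosen only to illustrate the scaling.
nu, p, c = 0.05, 8.0, 1.0
kappa_gamma, rho = 1.0, 0.1

lam = math.exp(-c * nu)                    # contraction factor per unit time step
kappa_floor = 2 * kappa_gamma + rho * nu   # the index defining the smaller region in (48)
n_steps = steps_to_reach(nu ** (-p), kappa_floor, lam)

# Solving lam^n * nu^{-p} = kappa_floor for n gives a count of order nu^{-1} log(nu^{-1}).
predicted = math.log(nu ** (-p) / kappa_floor) / (c * nu)
print(n_steps, predicted)
```

The predicted count is $\big(p\log\nu^{-1} - \log(2\kappa_\gamma+\rho\nu)\big)/(c\nu)$, which matches the order $\nu^{-1}\log\nu^{-1}$ quoted in the lemma; each of these steps costs a probability factor of the form (66), which is where the extra power of the logarithm in (50) comes from.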
Proof of Lemma 3. Let $\omega_0\in\bar U$ and $B\subset\bar U$. Since the $g_k$'s are Gaussian random variables with covariance $\gamma_k$, we have

$$P(\omega_0,B) = \int_B\prod_k\exp\Big(-\frac{|\omega_k - F_k(\omega_0)|^2}{\gamma_k}\Big)\frac{d\bar\omega_k\wedge d\omega_k}{2\pi i\gamma_k}, \tag{67}$$

where we recall from (10) that $F(\omega_0)$ denotes the value at time 1 of the solution of (18) with initial condition $\omega_0$. In view of Proposition 1, and the definition of $\bar U = U_{2\kappa_\gamma+\rho\nu}$, we can bound, $\forall\omega_0\in\bar U$,

$$|F_k(\omega_0)| \le C\nu a\,e^{-\frac{|k|}{2\kappa_\gamma}}\,e^{-\rho\nu|k|} \equiv \epsilon_k, \tag{68}$$

provided we choose $\rho$ sufficiently small so that

$$\frac{1 + \eta\nu\min(1,\ 2\kappa_\gamma+\rho\nu)}{2\kappa_\gamma+\rho\nu} \ge \frac{1}{2\kappa_\gamma} + \rho\nu.$$

Thus, we can bound $|\omega_k - F_k(\omega_0)|^2 \le (|\omega_k| + \epsilon_k)^2$; this gives a lower bound on (67) independent of $\omega_0$ and we may use this bound on each term of the LHS of (51), with $\omega_0 = \omega,\omega'$. We get that the LHS of (51) is bounded from below by

$$\int_{\bar U}\prod_k\exp\Big(-\frac{(|\omega_k| + \epsilon_k)^2}{\gamma_k}\Big)\frac{d\bar\omega_k\wedge d\omega_k}{2\pi i\gamma_k}. \tag{69}$$

In order to estimate the latter integral, observe that, by (7), $\omega\in\bar U = U_{2\kappa_\gamma+\rho\nu}$ provided that, $\forall k$,

$$|\omega_k| \le \epsilon_1\nu e^{\epsilon_2\nu|k|}\,b^{-1/2}(\gamma_k)^{1/2} \equiv \bar\epsilon_k, \tag{70}$$

if we take $\epsilon_1,\epsilon_2$ small enough. Thus, by restricting the domain of integration, we get a lower bound on (69):

$$\prod_k\int_0^{\bar\epsilon_k}\frac{2r\,dr}{\gamma_k}\exp\Big(-\frac{(r+\epsilon_k)^2}{\gamma_k}\Big). \tag{71}$$

Each factor is bounded from below by

$$1 - C\frac{\epsilon_k^2}{\gamma_k} - \exp\big(-c\epsilon_1^2\nu^2e^{2\epsilon_2\nu|k|}b^{-1}\big) \tag{72}$$

for $|k| \ge C\nu^{-1}\log\nu^{-1}$. To bound the product over those $k$'s of the factors given by (72) by a strictly positive constant, independent of $\nu$, observe first that the last term is summable over $k$, for $|k| \ge C\nu^{-1}\log\nu^{-1}$, and that the sum is small. Moreover, using (68) and the lower bound in (7), we get

$$\frac{\epsilon_k^2}{\gamma_k} \le Cab\,\nu^2\exp(-2\rho\nu|k|). \tag{73}$$

Then, (73) is also summable over $k$, for $|k| \ge C\nu^{-1}\log\nu^{-1}$, and the sum is also small. Finally, for $|k| \le C\nu^{-1}\log\nu^{-1}$, each factor in (71) is bounded from below, using (7, 70), by $C\int_0^{\bar\epsilon_k}\frac{2r\,dr}{\gamma_k} \ge C\nu^2$, which yields the claim of the lemma.
4. Proof of the Theorem

We deduce the theorem from Proposition 2. Let us choose a number $\bar\epsilon$ small enough and a time $\tau$ large enough, i.e. $\tau = cm^{-1}\nu^{-q}$, so that (46) is less than $\frac{\bar\epsilon}{2}$. Then, for $T$ an integer multiple of $\tau$, write

$$P^T(\omega,E) = (P^\tau)^{\frac{T}{\tau}}(\omega,E). \tag{74}$$

Next, let

$$\pi(\omega,E) \equiv \pi(E) = Q^\tau(0,E) \tag{75}$$

and

$$\bar R(\omega,E) = P^\tau(\omega,E) - Q^\tau(\omega,E), \qquad R'(\omega,E) = Q^\tau(\omega,E) - Q^\tau(0,E) = Q^\tau(\omega,E) - \pi(E), \tag{76}$$

$$r(\omega,E) = \bar R(\omega,E) + R'(\omega,E). \tag{77}$$

One may then write

$$P^T(\omega,E) = (\pi + r)^{\frac{T}{\tau}}(\omega,E). \tag{78}$$

We can expand $(\pi + r)^{\frac{T}{\tau}}$ in powers of $r$:

$$(\pi + r)^{\frac{T}{\tau}} = \sum_{k_i}\pi^{k_1}r^{k_2}\cdots\pi^{k_l} + r^{\frac{T}{\tau}} \equiv \Sigma_1 + r^{\frac{T}{\tau}}, \tag{79}$$

where the sum $\Sigma_1$ runs over $k_i \ge 0$, $\sum k_i = \frac{T}{\tau}$, and collects all the terms with at least one factor $\pi$. Now observe that, using (78) with $T = \tau$, we have that $r(\omega,\Omega) = P^\tau(\omega,\Omega) - \pi(\Omega) = 1 - \pi(\Omega) = 1 - Q^\tau(0,U)$ is independent of $\omega$; hence, by (75),

$$(r\pi)(\omega,d\omega_2) = \int r(\omega,d\omega_1)\,\pi(\omega_1,d\omega_2) = r(\omega,\Omega)\,\pi(d\omega_2) = (1 - Q^\tau(0,U))\,\pi(d\omega_2) \tag{80}$$

is also independent of $\omega$. From this, we conclude that, since there is at least one factor $\pi$ in each term of $\Sigma_1$, $\Sigma_1(\omega,E) = \Sigma_1(\omega',E)$, $\forall\omega,\omega'$, $\forall E$, and, using (78), that

$$|P^T(\omega,E) - P^T(\omega',E)| \le |r^{\frac{T}{\tau}}(\omega,E)| + |r^{\frac{T}{\tau}}(\omega',E)|, \tag{81}$$

where the RHS is controlled by:

Lemma 4. For $T$ an integer multiple of $\tau$,

$$|r^{\frac{T}{\tau}}(\omega,E)| \le C(\omega)e^{-mT}, \tag{82}$$

where $m = m(\nu) \ge \exp(-C\nu^{-3}(\log\nu^{-1})^{c})$ and where $C(\omega) \le C\left(\frac{\|\omega\|+1}{\nu}\right)^{c}$, for $C, c < \infty$.
To conclude the proof of (12), it is enough to show that $\lim_{T\to\infty}P^T(0,E) = \mu(E)$ exists. And, to prove that, we write, for $T' > T$,

$$|P^{T'}(0,E) - P^T(0,E)| \le \int P^{T'-T}(0,d\omega)\,|P^T(\omega,E) - P^T(0,E)|. \tag{83}$$

We may write the integral as an integral over $U$ plus a sum over $\kappa\in\mathbb{N}$, $\kappa \ge \nu^{-p}$, of integrals over $U_{\kappa+1}\setminus U_\kappa$ and, combining (81), Lemma 4, $C(\omega) \le C\left(\frac{\|\omega\|+1}{\nu}\right)^{c} \le C\left(\frac{\|\omega\|_\kappa+1}{\nu}\right)^{c}$, $\forall\kappa$, and (47) (which implies a similar bound for $P^{T'-T}(0,U^c_\kappa)$), we bound (83) by

$$Ce^{-mT}\Big(\nu^{-(p+1)c} + \sum_{\kappa\in\mathbb{N},\ \kappa\ge\nu^{-p}}\kappa^{c}\exp\big(-c'\nu^4\kappa^{2/\alpha}\big)\Big) \le C(\nu)e^{-mT}, \tag{84}$$

which proves the existence of $\lim_{T\to\infty}P^T(0,E)$. Finally, the bound (11) follows from (47), for $\kappa$ large, and we bound the LHS of (11) by 1 for $\kappa$ small.

Proof of Lemma 4. Define, for $n\ge 0$, $U(n) \equiv U_{\zeta^{-n}\nu^{-p}}$ (so that $U = U(0)$) with $\zeta < 1$ as in Proposition 2, and define $V(n)$ by $V(n) = U(n)\setminus U(n-1)$, for $n\ge 1$, and $V(0) = U(0)$. Next, let

$$\rho_{mn} \equiv \sup_{\omega\in V(m)}|r(\omega,V(n))|,$$

where $r$ is defined in (77). Observe that we have the following bounds on $\rho_{mn}$:

$$\rho_{00} \le \bar\epsilon, \qquad \rho_{mn} \le \exp(-c\xi^n\nu^{-q}) \ \ (n\ge m), \qquad \rho_{mn} \le 4 \ \ (n < m), \tag{85}$$

where $\xi \equiv \zeta^{-2/\alpha} > 1$. To check this, use, for $m = n = 0$, (54) to bound $\bar R$ and (46) to bound $R'$. For the second inequality, $n\neq 0$, only $P$ contributes to $r$ and the bound follows immediately from (47), with $\kappa' = \zeta^{1-n}\nu^{-p}$ (remember that $q = \frac{2p}{\alpha} - 4$). Finally, for $n < m$, we use the fact that $r$ is the sum of four terms, each less than 1. Write now

$$r^N(\omega,E) = \int\prod_{i=0}^{N-1}r(\omega_i,d\omega_{i+1})\,\chi(\omega_N\in E) \tag{86}$$

with $\omega_0 = \omega$, and insert a decomposition of the identity for each $i = 1,\dots,N$,

$$1 = \sum_{n_i\ge 0}\chi(\omega_i\in V(n_i)).$$

This leads to

$$\sup_{\omega\in U(n_0)}\sup_E|r^N(\omega,E)| \le \sum_{(n_i)_{i=1}^N,\ n_i\ge 0}\ \prod_{i=0}^{N-1}\rho_{n_in_{i+1}} \equiv \sum_n\rho^N_{n_0n}. \tag{87}$$
Note that the RHS describes “random walks” on nonnegative integers, where only steps strictly down ($n_{i+1} < n_i$) are not suppressed. To estimate it, write $\rho = d + u$, where $d$ is the “down” part of $\rho$, i.e. the matrix whose elements are given by $\rho_{mn}$ with $n < m$ and zero otherwise, and $u$ is the rest (“up”). We shall first prove the simple estimates

$$\sum_n d^k_{mn} \le C^m, \qquad k\le m, \tag{88}$$

and zero otherwise (where the restriction $k\le m$ comes from the fact that the indices of $d_{mn}$ must be positive and, whenever $d_{mn}\neq 0$, must satisfy $n < m$), and

$$\sum_n(u^kd^l)_{mn} \le (C\bar\epsilon)^{k+l}. \tag{89}$$

Indeed, (88) is estimated by

$$\sum_{n_1\dots n_k}d_{mn_1}\cdots d_{n_{k-1}n_k} \le 4^k\sum_{p_i\ge 1,\ \sum p_i\le m}1 = 4^k\sum_{l=k}^{m}\binom{l-1}{k-1} \le 4^k2^m,$$

with $p_i = n_{i-1} - n_i \ge 1$, $n_0 = m$, yielding the claim since $k\le m$. To prove (89), write

$$\sum_n(u^kd^l)_{mn} = \sum_{m'\ge l}\ \sum_n u^k_{mm'}\,d^l_{m'n},$$

where the constraint $l\le m'$ in the sum comes from the second inequality in (88), and note that $u^k_{mm'}$ is bounded by $(\bar\epsilon)^k$ if $m = m' = 0$ and by $\exp(-c\xi^{m'}\nu^{-q})(C\bar\epsilon)^{k-1}$ otherwise (both bounds following from (85) and the fact that, by definition, $u_{mn} = 0$ unless $n\ge m$). The bound (89) follows by combining these with (88): since $l\le m'$, we can use the factor $\exp(-c\xi^{m'}\nu^{-q})$ and $\nu$ small to obtain the factor $\bar\epsilon^{\,l+1}$ in (89) ($\bar\epsilon$ is a fixed small number). Inserting (88), (89) into

$$\rho^N = (u + d)^N = \sum d^{l_0}u^{k_1}d^{l_1}\cdots u^{k_s}d^{l_s},$$

where $l_i\ge 0$, $l_i > 0$ for $i\neq 0, s$, we obtain the bound

$$\sum_n\rho^N_{n_0n} \le C^N\bar\epsilon^{\,N-n_0},$$

where $\bar\epsilon^{\,-n_0}$ comes from the fact that we have no $\bar\epsilon$ bound on $d^{l_0}$, but we can use $l_0\le n_0$. This proves the lemma, if we choose in (82),

$$m = -\frac{1}{\tau}\log C\bar\epsilon = -(c\,m^{-1}\nu^{-q})^{-1}\log C\bar\epsilon \ge \exp\left(-C\nu^{-3}(\log\nu^{-1})^{c}\right)$$

(given our choice of $\tau$ at the beginning of this section, our bound on $m$ in Proposition 2, and changing the constants), and, for $\omega\in U(n_0)$, choose $C(\omega) = (C\bar\epsilon)^{-n_0}$; indeed, let, for $\omega\in U_\kappa$, $n_0$ be the smallest integer such that $\kappa \le \zeta^{-n_0}\nu^{-p}$; then, $n_0 \le C\log\kappa$ and $C(\omega) \le C\kappa^{c}$. Moreover, from part (b) of Proposition 1, we know that, $\forall\omega\in\Omega$, $F(\omega)\in U_\kappa$, with $\kappa \le C\left(\left(\frac{\|\omega\|}{\nu}\right)^{\alpha} + \frac{1}{\nu}\right)$; so, altogether, $C(\omega) \le C\left(\frac{\|\omega\|+1}{\nu}\right)^{c}$.
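The counting step in the proof of (88) replaces a sum over strictly descending paths by a sum of binomial coefficients. The following is a minimal sketch that checks this identity and the bound by $2^m$ by brute force for small $m$ and $k$; the function names `count_descents` and `binomial_sum` are hypothetical and the enumeration is only practical for small values.

```python
from itertools import product
from math import comb

def count_descents(m, k):
    """Number of (p_1, ..., p_k) with every p_i >= 1 and p_1 + ... + p_k <= m (brute force)."""
    return sum(1 for p in product(range(1, m + 1), repeat=k) if sum(p) <= m)

def binomial_sum(m, k):
    """Sum over l = k..m of C(l-1, k-1), the closed form used in the proof of (88)."""
    return sum(comb(l - 1, k - 1) for l in range(k, m + 1))

for m in range(1, 7):
    for k in range(1, m + 1):
        n_paths = count_descents(m, k)
        # the two counts agree, and both stay below 2^m
        print(m, k, n_paths, binomial_sum(m, k), 2 ** m)
```

Each tuple $(p_1,\dots,p_k)$ records the sizes of $k$ successive downward steps starting from level $m$, so the count is the number of ways a walk can descend $k$ times without leaving the nonnegative integers.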
References

1. Bricmont, J., Kupiainen, A., Lefevere, R.: Probabilistic estimates for the two dimensional stochastic Navier–Stokes equations. J. Stat. Phys. 100, 743–756 (2000)
2. Doob, J.L.: Stochastic Processes. New York: John Wiley, 1953
3. Flandoli, F., Maslowski, B.: Ergodicity of the 2-D Navier–Stokes equation under random perturbations. Commun. Math. Phys. 171, 119–141 (1995)
4. Kuksin, S., Shirikyan, A.: Stochastic dissipative PDE’s and Gibbs measures. Commun. Math. Phys. 213, 291–330 (2000)
5. Mattingly, J.C., Sinai, Y.: An elementary proof of the existence and uniqueness theorem for the Navier–Stokes equations. Preprint
6. Mattingly, J.C.: Ergodicity of 2D Navier–Stokes equations with random forcing and large viscosity. Commun. Math. Phys. 206, 273–288 (1999)

Communicated by G. Gallavotti
Commun. Math. Phys. 224, 83 – 106 (2001)
Gibbsian Dynamics and Ergodicity for the Stochastically Forced Navier–Stokes Equation Weinan E1 , J. C. Mattingly2 , Ya. Sinai3 1 Department of Mathematics and Program in Applied and Computational Mathematics, Princeton University,
Princeton, NJ 08544, USA and School of Mathematics, Peking University, Beijing, P.R. China
2 Department of Mathematics, Stanford University, Stanford, CA 94305, USA 3 Department of Mathematics, Princeton University, Princeton, NJ 08544, USA and Landau Institute of
Theoretical Physics, Moscow, Russia Received: 21 November 2000 / Accepted: 9 December 2000
Dedicated to Joel L. Lebowitz, on the occasion of his 70th birthday

Abstract: We study stationary measures for the two-dimensional Navier–Stokes equation with periodic boundary condition and random forcing. We prove uniqueness of the stationary measure under the condition that all “determining modes” are forced. The main idea behind the proof is to study the Gibbsian dynamics of the low modes obtained by representing the high modes as functionals of the time-history of the low modes.

1. Introduction and Main Results

We are interested in determining conditions sufficient to ensure that the stochastically forced Navier–Stokes equation (SNS) possesses a unique stationary measure, or equivalently, that the dynamics is ergodic in the phase space. Our main result is that this holds if all the “determining modes” are forced. To prove this, we show that the dynamics of the Navier–Stokes equation can be reduced to the dynamics of the low modes, the so-called determining modes, with memory. This is the stochastic analog of results proved for the deterministic case by Foias et al. [FMRT]. We will work with the periodic boundary condition, but in principle our techniques should also apply to the more physical no-slip boundary condition.

Consider the two-dimensional Navier–Stokes equation with stochastic forcing:
\[ \frac{\partial u}{\partial t} + (u\cdot\nabla)u + \nabla p - \nu\Delta u = \frac{\partial W(x,t)}{\partial t}, \qquad \nabla\cdot u = 0. \tag{1} \]
For simplicity of presentation we will take $W$ to be of the form
\[ W(x,t) = \sum_{|k|\le N} \sigma_k w_k(t,\omega)\, e_k(x), \tag{2} \]
where the $w_k$'s are standard i.i.d. complex-valued Wiener processes satisfying $w_{-k}(t) = \overline{w_k(t)}$, and $\sigma_k \in \mathbb{C}$, with $|\sigma_k| > 0$ and $\sigma_{-k} = \overline{\sigma_k}$, are the amplitudes of
the forcing, and
\[ \Bigl\{ e_k(x) = \binom{-ik_2}{ik_1}\frac{e^{ik\cdot x}}{|k|},\ k\in\mathbb{Z}^2 \Bigr\} \]
is the basis of the space of $L^2$ divergence-free, mean zero vector fields on $\mathbb{T}^2$, the two-dimensional torus. Our techniques apply to more general cases when the higher modes are also forced, as long as $|\sigma_k|$ decays sufficiently fast as $|k|\to\infty$, or to forcing which is not diagonal in Fourier space. But we will restrict ourselves to the form in (2) for clarity.

Define $B(u,v) = -P_{\mathrm{div}}(u\cdot\nabla)v$ and $\Lambda^2 u = -P_{\mathrm{div}}\Delta u$, where $P_{\mathrm{div}}$ is the $L^2$ projection operator onto the space of divergence-free vector fields. Let $\sigma_{\max}^2 = \max\{|\sigma_k|^2 : |k|\le N\}$, $E_0 = \sum_{|k|\le N}|\sigma_k|^2$ and $E_1 = \sum_{|k|\le N}|k|^2|\sigma_k|^2$. Writing $u(x) = \sum_k u_k e_k(x)$, we define
\[ H^\alpha = \Bigl\{ u = (u_k)_{k\in\mathbb{Z}^2} : u_0 = 0,\ \sum_k |k|^{2\alpha}|u_k|^2 < \infty \Bigr\} \]
and $L^2 = H^0$.

We will work on a probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P, \theta_t)$. We associate $\Omega$ with the canonical space generated by all $d\omega_k(t)$. $\mathcal{F}$ and $\mathcal{F}_t$ are respectively the associated global $\sigma$-algebra and filtration generated by $W(t)$. Lastly, $\theta_t$ is the shift on $\Omega$ defined by $\theta_t\, d\omega_k(s) = d\omega_k(s+t)$. Notice that $\theta_t$ is an ergodic group of measure-preserving transformations with respect to $P$. Expectations with respect to $P$ will be denoted by $E$.

Projecting (1) onto $L^2$, we obtain the following system of Itô stochastic equations:
\[ du(x,t) + \nu\Lambda^2 u(x,t)\,dt = B(u,u)\,dt + dW(x,t). \tag{3} \]
It can be shown that (3) generates a continuous Markovian stochastic semi-flow on $L^2$ defined by
\[ \varphi^{\omega}_{s,t}u_0 = u(t,\omega;s,u_0). \tag{4} \]
When $s = 0$, we simply write $\varphi^{\omega}_t$ (see [Fla94, DPZ96]). We will take the state space of (3) to be $L^2$ equipped with the Borel $\sigma$-algebra. A measure $\mu(du)$ on $L^2$ is stationary for the stochastic flow (3) if for all bounded continuous functions $F$ on $L^2$ and $t > 0$,
\[ \int_{L^2} F(u)\,\mu(du) = \int_{L^2} E\,F(\varphi^{\omega}_t u)\,\mu(du). \tag{5} \]
Our main result is:

Theorem 1. There exists some absolute constant $C$ such that if $N^2 \ge C\,E_0/\nu^3$ then (3) has a unique stationary measure on $L^2$.

The existence of at least one stationary measure was proved in [Fla94] and [VF88]. The proof proceeds by establishing compactness for a family of empirical measures. The limiting points of these empirical measures are the stationary measures. Uniqueness has been proved under restrictive assumptions when all modes are forced. Flandoli and Maslowski [FM95] proved that if the $\sigma_k$'s decay algebraically, i.e. if the forcing is sufficiently rough spatially, then the system has a unique stationary measure. These results were extended and refined in [Fer97]. In [Mat99], it was proven that if the viscosity is large enough the contraction induced by the Laplacian dominates and the system possesses a trivial random attractor, and hence a unique stationary measure. We do not address convergence to the stationary measure. This and the coupling construction used to prove convergence are discussed in [Mat00]. Recently Kuksin and Shirikyan [KS] proved uniqueness of the stationary measure when the Navier–Stokes equation is perturbed by a bounded degenerate kicked noise. Results similar to ours have also been obtained independently by Bricmont et al. [BKL].
Our main strategy is to reduce the dynamics of the Navier–Stokes equation to the dynamics of a finite dimensional set of low modes with memory. The reduced dynamics is no longer Markovian, but rather Gibbsian (see §2, §4). The finite dimensional Gibbsian dynamics has a non-degenerate noise, and has a unique stationary measure if the memory is short ranged.

Before proceeding further, let us observe that any given stationary measure $\mu$ can be extended to a measure on the path space, denoted by $\mu_p$, where $p$ stands for path or past. Consider the example of the path space $C((-\infty,0],L^2)$. Let $A$ be a cylinder set of the following type: for some $t_0 < t_1 < \cdots < t_n \le 0$,
\[ A = \bigl\{ u(s) \in C((-\infty,0],L^2) : u(t_i)\in A_i,\ i = 0,\dots,n \bigr\}, \tag{6} \]
where the $A_i$'s are Borel sets of $L^2$. Corresponding to $A$, let $B \subset \Omega\times L^2$,
\[ B = \bigl\{ (u,\omega) : u\in A_0,\ \varphi^{\omega}_{t_0,t_i}u \in A_i,\ i = 1,\dots,n \bigr\}. \tag{7} \]
We will define
\[ \mu_p(A) = (P\times\mu)(B), \tag{8} \]
where $(P\times\mu)$ is the product measure on $\Omega\times L^2$. Clearly $\mu_p$ is consistent on cylinder sets and can be extended to the natural $\sigma$-algebra using the Kolmogorov extension theorem. The natural $\sigma$-algebra is the one generated by the cylinder sets.

The dynamics of the stochastic semi-flow $\{\varphi^{\omega}_t\}$ can be trivially extended to return a function from $C((-\infty,t],L^2)$, given an initial function from $C((-\infty,0],L^2)$. One simply flows forward with $\varphi$ from the initial condition at time 0. To avoid confusion, we will call this map $\psi^{\omega}_t$. Symbolically, if $u(\cdot)\in C((-\infty,0],L^2)$, then $(\psi^{\omega}_t u)(s) = \varphi^{\omega}_s u(0)$ for $s\in[0,t]$ and $(\psi^{\omega}_t u)(s) = u(s)$ for $s\le 0$. If we define the shift on trajectories by $(\theta_t v)(s) = v(s+t)$, we can define a dynamics on $C((-\infty,0],L^2)$ by $\theta_t\psi^{\omega}_t$. In other words, $\theta_t\psi^{\omega}_t u$ takes a trajectory $u$ from $C((-\infty,0],L^2)$, extends it $t$ units of time by flowing forward and then shifts the entire resulting trajectory back $t$ units of time so it again lives on $C((-\infty,0],L^2)$. It is easy to check directly that if $\mu$ is invariant then $\mu_p$ is invariant in the sense that
\[ \int_{C((-\infty,0],L^2)} F(u)\,d\mu_p(u) = E\int_{C((-\infty,0],L^2)} F(\theta_t\psi^{\omega}_t u)\,d\mu_p(u) \tag{9} \]
for all bounded functions on $C((-\infty,0],L^2)$ and $t\ge 0$.

Assume that $\mu$ and $\nu$ are two stationary measures for the stochastic flow (3), and $\mu_p$ and $\nu_p$ are respectively their induced measures on the path space $C((-\infty,0],L^2)$. It is obvious that $\mu_p = \nu_p$ implies $\mu = \nu$.

2. Reduction to the Gibbsian Dynamics

Define two subspaces of $L^2$:
\[ L^2_\ell = \mathrm{span}\{e_k,\ |k|\le N\}, \qquad L^2_h = \mathrm{span}\{e_k,\ |k| > N\}. \tag{10} \]
We will call $L^2_\ell$ the set of low modes and $L^2_h$ the set of high modes. Obviously $L^2 = L^2_\ell\oplus L^2_h$. Denote by $P_\ell$ and $P_h$ the projections onto the low and high mode spaces.
Since we are concerned with stationary measures of (3), we are interested in (statistically) stationary solutions of (3) that exist for time from $-\infty$ to $+\infty$. We will show in this section that for such solutions, the high modes are completely determined by the past history of the low modes. For this purpose, we write $u(t) = (\ell(t),h(t))$ and
\[ d\ell(t) = \bigl[-\nu\Lambda^2\ell + P_\ell B(\ell,\ell)\bigr]dt + \bigl[P_\ell B(\ell,h) + P_\ell B(h,\ell) + P_\ell B(h,h)\bigr]dt + dW(t), \tag{11} \]
\[ \frac{dh(t)}{dt} = -\nu\Lambda^2 h + P_h B(h,h) + P_h B(\ell,h) + P_h B(h,\ell) + P_h B(\ell,\ell). \tag{12} \]
Define the set of “nice pasts” $U\subset C((-\infty,0],L^2)$ to consist of all $v:(-\infty,0]\to L^2$ such that:

i) $v(t)$ is in $H^2$ for all $t\le 0$.

ii) The energy averages correctly. More precisely,
\[ \lim_{t\to-\infty}\frac{1}{|t|}\int_t^0 |\Lambda v(s)|^2_{L^2}\,ds = \frac{E_0}{2\nu}. \]

iii) The energy fluctuations are typical. More precisely, there exists a $T = T(v)$ such that
\[ |v(t)|^2_{L^2} \le E_0 + \max(|t|,T)^{2/3} \]
for $t\le 0$.

The following lemma shows that $U$ contains almost all of the trajectories defined on the whole time interval.

Lemma 2.1. Let $\mu_p$ be the measure on $C((-\infty,0],L^2)$ induced by a stationary measure $\mu$ for (3). Then $\mu_p(U) = 1$.

Proof of Lemma 2.1. It is proved in [Mat98] or [Fer97] that with probability one, a solution to (3) is in $H^2$ for all $t$. The fact that the last condition is satisfied by a set of full measure is proved in Lemma B.3. All that remains to show is ii). From Lemma B.2, $|\Lambda v|^2_{L^2}$ is in $L^1(\mu)$ for any stationary measure $\mu$ and $\int|\Lambda v|^2_{L^2}\,d\mu = E_0/(2\nu)$. Since the measure is invariant under shifts back in time and each ergodic component has the same average enstrophy, the ergodic theorem implies that for $\mu_p$-almost every trajectory the time average converges to the average of $|\Lambda u|^2_{L^2}$ against $\mu$.

Given an arbitrary continuous function of time $\ell(t)$ on $L^2_\ell$, we can view (12) as a closed equation with some exogenous forcing $\ell(t)$. By $\Phi_{s,t}(\ell,h_0)$, we mean the solution to (12) at time $t$ given the initial condition $h_0$ at time $s$ and the “forcing” $\ell$. Denote by $\mathcal{P}$ the set of all $\ell\in C((-\infty,0],L^2_\ell)$ such that the following two conditions hold. First, $\ell = P_\ell u$ for some $u = (\ell,h)\in U$. Second, $h(t) = \Phi_{s,t}(\ell,h(s))$ for any $s < t\le 0$, where $h$ is the matching high mode so that $(\ell,h)\in U$. That is to say, $h(t)$ solves (12) with low modes $\ell(t)$ and the total solution $(\ell,h)$ is in our space of “nice pasts”. In light of Lemma 2.1 the set $\mathcal{P}$ is not empty. We now will show that this $h$ is uniquely determined by $\ell$.
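The value $E_0/(2\nu)$ appearing in condition ii) and in the proof of Lemma 2.1 can be understood from a formal energy balance. The following is only a heuristic sketch, not the argument of Lemma B.2. It writes $\Lambda^2$ for the operator $-P_{\mathrm{div}}\Delta$ of Section 1, assumes the normalization of $w_k$ and $e_k$ under which the Itô correction of $|u|^2_{L^2}$ equals $E_0\,dt$, and assumes enough integrability to take expectations.

```latex
% Formal Ito energy balance for (3); <.,.> denotes the L^2 inner product.
% Assumption: <B(u,u),u> = 0 (incompressibility with periodic boundary conditions).
\begin{align*}
d|u|_{L^2}^2
  &= 2\langle u,\, -\nu\Lambda^2 u + B(u,u)\rangle\,dt + E_0\,dt + 2\langle u, dW\rangle \\
  &= \bigl(-2\nu|\Lambda u|_{L^2}^2 + E_0\bigr)\,dt + 2\langle u, dW\rangle .
\end{align*}
% Averaging against a stationary measure \mu: the left-hand side and the
% martingale term have zero mean, which leaves
\[
  0 = -2\nu\int |\Lambda u|_{L^2}^2\,d\mu + E_0
  \quad\Longrightarrow\quad
  \int |\Lambda u|_{L^2}^2\,d\mu = \frac{E_0}{2\nu}.
\]
```

Integrating the same identity over $[0,t]$ instead of averaging suggests why a quantity of the form $|u(t)|^2_{L^2} + 2\nu\int_0^t|\Lambda u(s)|^2_{L^2}ds$ is compared with $|u(0)|^2_{L^2} + E_0 t$ plus a fluctuation term when typical futures are singled out in Section 3.2.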
Lemma 2.2. There exists an absolute positive constant $C$ such that if $N^2 > C\,E_0/\nu^3$ then the following holds: If there exist two solutions $u_1(t) = (\ell(t),h_1(t))$, $u_2(t) = (\ell(t),h_2(t))$ corresponding to some (possibly different) realizations of the forcing and such that $u_1,u_2\in U$, then $u_1 = u_2$, i.e. $h_1 = h_2$. Furthermore, given a solution $u(t) = (\ell(t),h(t))\in U$, any $h_0\in L^2_h$, and $t\le 0$, the following limit exists:
\[ \lim_{t_0\to-\infty}\Phi_{t_0,t}(\ell,h_0) = h^* \]
and $h^* = h(t)$.

Proof of Lemma 2.2. We begin with the first claim. Denote $\rho(t) = h_1(t) - h_2(t)$. From (12) we have
\begin{align*}
\frac{d\rho}{dt} &= -\nu\Lambda^2\rho + P_hB(h_1,h_1) - P_hB(h_2,h_2) + P_hB(\ell,\rho) + P_hB(\rho,\ell) \\
 &= -\nu\Lambda^2\rho + P_hB(\ell+h_1,\rho) + P_hB(\rho,\ell+h_2) \tag{13} \\
 &= -\nu\Lambda^2\rho + P_hB(u_1,\rho) + P_hB(\rho,u_2).
\end{align*}
Taking the inner product with $\rho$, using the fact that $\langle P_hB(u_1,\rho),\rho\rangle_{L^2} = 0$, gives
\[ \frac12\frac{d}{dt}|\rho|^2_{L^2} = -\nu|\Lambda\rho|^2_{L^2} + \langle P_hB(\rho,u_2),\rho\rangle_{L^2}. \]
Since
\[ |\langle P_hB(\rho,u_2),\rho\rangle_{L^2}| \le \hat C\,|\rho|_{L^2}|\Lambda\rho|_{L^2}|\Lambda u_2|_{L^2} \le \frac{\nu}{2}|\Lambda\rho|^2_{L^2} + \frac{\hat C^2}{2\nu}|\rho|^2_{L^2}|\Lambda u_2|^2_{L^2}, \]
we get
\[ \frac12\frac{d}{dt}|\rho|^2_{L^2} \le -\frac{\nu}{2}|\Lambda\rho|^2_{L^2} + \frac{\hat C^2}{2\nu}|\Lambda u_2|^2_{L^2}|\rho|^2_{L^2}. \]
Since $\rho$ only contains modes with $|k| > N$, the Poincaré inequality implies
\[ \frac{d}{dt}|\rho|^2_{L^2} \le \Bigl(-\nu N^2 + \frac{\hat C^2}{\nu}|\Lambda u_2|^2_{L^2}\Bigr)|\rho|^2_{L^2}. \]
Therefore we have, for $t_0 < t < 0$,
\[ |\rho(t)|^2_{L^2} \le |\rho(t_0)|^2_{L^2}\exp\Bigl(-\nu N^2(t-t_0) + \frac{\hat C^2}{\nu}\int_{t_0}^t|\Lambda u_2(s)|^2_{L^2}\,ds\Bigr). \tag{14} \]
From the second assumption on functions in $U$, we know that
\[ \lim_{t_0\to-\infty}\frac{1}{t-t_0}\int_{t_0}^t|\Lambda u_2(s)|^2_{L^2}\,ds = \frac{E_0}{2\nu}. \]
Hence for $t_0 < T_1$, where $T_1$ depends on $t$ and $u_2$, we have
\[ -\nu N^2(t-t_0) + \frac{\hat C^2}{\nu}\int_{t_0}^t|\Lambda u_2(s)|^2_{L^2}\,ds \le -\frac{\gamma}{2}(t-t_0), \]
where $\gamma = \nu N^2 - \hat C^2E_0/(2\nu^2)$. If we set $C = \hat C^2/2$, then our assumption on $N$ implies $\gamma > 0$. Now using the last property of paths in $U$ we have for any $t_0\le T_2$,
\[ |\rho(t)|^2_{L^2} \le |\rho(t_0)|^2_{L^2}\exp\Bigl(-\frac{\gamma}{2}(t-t_0)\Bigr) \le 2\bigl[E_0 + |t_0|^{2/3}\bigr]\exp\Bigl(-\frac{\gamma}{2}(t-t_0)\Bigr) \to 0 \]
as $t_0\to-\infty$, where $T_2$ is some finite constant depending on $u_1$ and $u_2$. This completes the proof of the first part of Lemma 2.2.

To see the second part, observe that (14) only required control of $\int_{t_0}^t|\Lambda u(s)|^2_{L^2}\,ds$ for one of the two solutions. If we proceed as before, letting the given solution $u(t)$ play the role of $u_2$ and the solution to (12) starting from $h_0$ play the role of $u_1$, then we obtain the estimate
\[ |\rho(t)|^2_{L^2} \le |h(t_0)-h_0|^2_{L^2}\exp\Bigl(-\nu N^2(t-t_0) + \frac{\hat C^2}{\nu}\int_{t_0}^t|\Lambda u(s)|^2_{L^2}\,ds\Bigr). \tag{15} \]
Since $u(t) = (\ell(t),h(t))\in U$, the same reasoning as before shows that $\rho(t)$ goes to zero as $t_0\to-\infty$. Hence the limit exists and equals $h(t)$.

In fact the splitting into high and low modes can be accomplished even when all of the modes are forced. One replaces (12) with an Itô stochastic differential equation. This causes little complication as (13) remains a standard PDE. See [Mat98]. The ideas in this section are related to the ideas of Lyapunov–Schmidt reduction and those around center and inertial manifolds. See [EFNT94] for a discussion and other references.

From now on we assume that $N$ satisfies
\[ N^2 > C\,\frac{E_0}{\nu^3}, \tag{16} \]
where $C$ is the constant from Lemma 2.2. Because of Lemma 2.2, we can define a map $\Phi_0$ which reconstructs the high modes at time zero from a given low mode trajectory stretching from zero back to $-\infty$. Before making this more precise, let us fix some notation. In general, we will use $\ell(t)$ to refer to the value of the low modes at time $t$ and will use $L^t$ to mean the entire trajectory from $-\infty$ to $t$. Hence $\ell(t)\in L^2_\ell$, $L^t\in C((-\infty,t],L^2_\ell)$ and $\ell(s) = L^t(s)$ for $s\le t$. In this notation $h(0) = \Phi_0(L^0)$, where $L^0$ is some “low mode past” in $\mathcal{P}$, which is the projection of $U$ to the low modes. By $\Phi_s(L^t,h(0))$ with $s\le t$, we mean the solution to (12) at time $s$ with initial condition $h(0)$ and low mode forcing $L^t$. Of course $\Phi_s(L^t,h(0))$ only depends on the information in $L^t$ between 0 and $s$. We can extend the definition of $\Phi$ beyond time zero by defining $\Phi_t(L^t) = \Phi_t(L^t,h(0))$, where $h(0) = \Phi_0(L^0)$. Given the initial low mode past $L^0\in\mathcal{P}$, we can solve for the future of $\ell$ using
\[ d\ell(t) = \bigl[-\nu\Lambda^2\ell(t) + P_\ell B(\ell(t),\ell(t)) + G(\ell(t),\Phi_t(L^t))\bigr]dt + dW(t), \tag{17} \]
where
\[ G(\ell,h) = P_\ell B(\ell,h) + P_\ell B(h,\ell) + P_\ell B(h,h). \tag{18} \]
Thus we have a closed formulation of the dynamics on the low modes given an initial past $L^0\in\mathcal{P}$. We write $L^t = S^{\omega}_tL^0$. We reiterate that $L^t$ is the entire trajectory from time $t$ back to $-\infty$, whereas $\ell(t)$ is simply the value of the low modes at time $t$.
Except for the fact that the $G$-term in (17) is history-dependent, (17) has the form of a standard finite dimensional stochastic ODE with non-degenerate forcing, which of course has a unique stationary measure. Our task is reduced to showing that the memory effect in (17) is not strong enough to spoil ergodicity. Existence of the solution for memory-dependent stochastic ODEs of the type (17) was considered in the work of Itô et al. [IN].

3. Uniqueness of the Invariant Measure

3.1. Proof of the Main Theorem. Given any “nice low mode past” $L\in\mathcal{P}$, we can reconstruct the “high modes” and hence define a closed dynamics on the paths of the low modes. However, this dynamics is no longer Markovian, which will produce difficulties.

Let $\mu$ be an ergodic stationary measure on $L^2$ and $\mu_p$ be its extension to the path space $C((-\infty,0],L^2)$. We will also consider the restriction of $\mu_p$ to $C((-\infty,0],L^2_\ell)$, still denoted by $\mu_p$. Lemma 2.1 says that $\mu_p(\mathcal{P}) = 1$. Given any $L^0\in\mathcal{P}$, let $Q_t(L^0,\cdot)$ be the measure induced on $C([0,t],L^2_\ell)$ by the dynamics starting from $L^0$. In other words, $Q_t(L^0,\cdot)$ is the distribution of $S^{\omega}_tL^0$ viewed as a random variable taking values in $C([0,t],L^2_\ell)$. Similarly let $Q_\infty(L^0,\cdot)$ be the distribution induced on $C([0,\infty),L^2_\ell)$ starting from $L^0$.

Consider the stochastic process defined by $\theta_tS^{\omega}_tL^0$, where $L^0$ is a random variable on $\mathcal{P}$ distributed according to the invariant measure $\mu_p$. For $t\ge 0$ it is a random process with values in $\mathcal{P}$. This is clear as all of the defining properties of $U$ are asymptotic in $t$, and hence the addition of a segment of finite length does not destroy them. Since $\mu_p$ is invariant with respect to the dynamics, $\theta_tS^{\omega}_tL^0$ is a stationary random process. Hence with probability one there exist time averages along trajectories $\theta_tS^{\omega}_tL^0$.

Take any bounded measurable functional $F: C((-\infty,0],L^2_\ell)\to\mathbb{R}$ such that $F(L^0)$, $L^0\in C((-\infty,0],L^2_\ell)$, depends only on a finite range of $L^0$. Let
\[ \bar F = \int F(L)\,d\mu_p(L). \tag{19} \]

Theorem 2. The SNS equation (1) has a unique stationary measure.

The proof of Theorem 2 is based on the following two lemmas whose proofs will be given later.

Lemma 3.1. Let $L^0_1$ and $L^0_2$ be two initial pasts in $\mathcal{P}$ such that $\ell_1(0) = \ell_2(0)$. Then $Q_\infty(L^0_1,\cdot)$ and $Q_\infty(L^0_2,\cdot)$ are equivalent.

Recall that $\ell(\tau)$ is the solution of (17) with initial condition $L$.

Lemma 3.2. For any past $L\in\mathcal{P}$ and any $t > 0$, the distribution of $\ell(t)\in L^2_\ell$ conditioned on starting from $L$ at time zero, denoted by $R_t(L,\cdot)$, satisfies the following: there exists a strictly positive function $f_{L,t}\in L^1(L^2_\ell)$ such that
\[ dR_t(L,\cdot) \ge f_{L,t}(\cdot)\,dm(\cdot), \]
where $m(\cdot)$ is the Lebesgue measure on $L^2_\ell$.
For any measure $\mu$ on $L^2$ let $P_\ell\mu$ denote its projection to a measure on the low modes $L^2_\ell$. Namely, $(P_\ell\mu)(B) = \mu(P_\ell^{-1}(B))$. Then we have the following direct consequence of Lemma 3.2.

Corollary 3.3. If $\mu$ is a stationary measure then $P_\ell\mu$ has a component which is equivalent to the Lebesgue measure.

Proof of Theorem 2. Assume that there are two different ergodic stationary measures on $L^2$ called $\mu_1$ and $\mu_2$. They must be mutually singular. Let $\mu_{p,1}$ and $\mu_{p,2}$ be the extensions of these two measures onto the path space $\mathcal{P}$. One can find a functional $F$ such as above so that
\[ \bar F_1 = \int F(L)\,d\mu_{p,1}(L) \ne \bar F_2 = \int F(L)\,d\mu_{p,2}(L). \]
This assumption will lead to a contradiction. Let $L^0_i$ be a random variable on $\mathcal{P}$ distributed as $\mu_{p,i}$. Since $\theta_tS^{\omega}_tL^0_i$ is stationary with respect to $\mu_{p,i}$, we can pick a set $\mathcal{P}_i$, of full $\mu_{p,i}$-measure, such that for all $L^0_i\in\mathcal{P}_i$ the limit
\[ \lim_{T\to\infty}\frac1T\int_0^T F(\theta_tS^{\omega}_tL^0_i)\,dt = \bar F_i \tag{20} \]
is well defined for $P$-almost every $\omega$.

For $\ell\in L^2_\ell$ define $\mathcal{P}_i(\ell) = \{L\in\mathcal{P}_i : L(0) = \ell\}$ and let $\mu_{p,i}(\,\cdot\,|\,\ell)$ be the conditional measure given that $L(0) = \ell$. By Fubini's theorem, we know that for $P_\ell\mu_i$-almost every $\ell\in L^2_\ell$ we have $\mu_{p,i}(\mathcal{P}_i(\ell)\,|\,\ell) = 1$. Hence we can find a set $A_i\subset L^2_\ell$ such that $\mu_{p,i}(\mathcal{P}_i(\ell)\,|\,\ell) = 1$ for all $\ell\in A_i$ and $P_\ell\mu_i(A_i) = 1$. Define $A = A_1\cap A_2$. Corollary 3.3 implies that $P_\ell\mu_i(A) > 0$ for $i = 1,2$. Hence there exists some $\ell_*\in A$. Since $\ell_*\in A_1\cap A_2$, we know that $\mu_{p,i}(\mathcal{P}_i(\ell_*)\,|\,\ell_*) = 1$ for $i = 1,2$. Thus there exist some $L_{*,1}\in\mathcal{P}_1(\ell_*)$ and $L_{*,2}\in\mathcal{P}_2(\ell_*)$. Notice that by construction $L_{*,1}(0) = \ell_* = L_{*,2}(0)$, and hence it follows from Lemma 3.1 that $Q_\infty(L_{*,1},\cdot)$ and $Q_\infty(L_{*,2},\cdot)$ are equivalent. Since $L_{*,i}\in\mathcal{P}_i(\ell_*)$, we know that we can pick $B_i\subset C([0,\infty),L^2_\ell)$ such that the time average of $F$ converges to $\bar F_i$ for all futures in $B_i$ and $Q_\infty(L_{*,i},B_i) = 1$ for $i = 1,2$. Since the $Q$'s are equivalent, $Q_\infty(L_{*,1},B_1\cap B_2) > 0$ and hence $B_1\cap B_2$ is non-empty. This in turn implies that $\bar F_1 = \bar F_2$, which contradicts the assumption that they were not equal.
3.2. Proofs of the lemmas. We first prove Lemma 3.1. Fix $L^0_1$ and $L^0_2$. Most of our construction will depend explicitly on them. With probability one, we can extend each of the initial pasts into the infinite future by $L^s_i = S^{\omega}_sL^0_i$ and setting $\ell_i(s) = L^t_i(s)$ for $s\le t$. We can also reconstruct the entire solution by using $\Phi_t$ to obtain the high modes. Set $h_i(s) = \Phi_s(L^s_i)$ and $u_i(s) = (\ell_i(s),h_i(s))$. Fix a constant $C_0$ such that $|u_i(0)|^2_{L^2}\le C_0$.

We begin by constructing a set of nice future paths which will contain most trajectories. For any positive $K$ we define
\[ A_i(K) = \Bigl\{ f\in C([0,\infty),L^2_\ell) : |v(t)|^2_{L^2} + 2\nu\int_0^t|\Lambda v(s)|^2_{L^2}\,ds < C_0 + E_0t + Kt^{4/5},\ \text{where } v(s) = f(s) + \Phi_s(f,h_i) \Bigr\} \]
and $A(K) = A_1(K)\cap A_2(K)$.
By Lemma A.5, we know that for any $a\in(0,1)$ there exists a $K$ such that for $i = 1,2$,
\[ P\bigl\{\omega : S^{\omega}_tL^0_i\in A_i(K)\bigr\} > 1 - \frac{a}{2}, \]
and hence
\[ P\bigl\{\omega : S^{\omega}_tL^0_i\in A(K)\ \text{for } i = 1,2\bigr\} > 1 - a > 0. \]
This is just another way of saying $Q_\infty(L^0_i,A(K)) > 1 - a$.

Lemma 3.4. Let $L^0_1$ and $L^0_2$ be two initial pasts in $\mathcal{P}$ such that $L^0_1(0) = L^0_2(0)$. Let $A(K)\subset C([0,\infty),L^2_\ell)$ be as defined above. For any choice of $K > 0$, $Q_\infty(L^0_1,\,\cdot\cap A(K))$ is equivalent to $Q_\infty(L^0_2,\,\cdot\cap A(K))$.
Proof of Lemma 3.1. Since we can choose $K$ so that $A(K)$ has measure arbitrarily close to 1, we have that $Q_\infty(L^0_1,\cdot)$ is equivalent to $Q_\infty(L^0_2,\cdot)$.

Proof of Lemma 3.4. We intend to use Girsanov's theorem to compare the two induced measures, $Q_\infty(L^0_1,\cdot)$ and $Q_\infty(L^0_2,\cdot)$. However we do not do so directly. To aid in our analysis, we consider the following surrogate processes $y$ which will agree with $\ell$ on the set $A = A(K)$. As before, we will use $y(t)$ to denote the value of the process at time $t$ and $Y^t$ to be the entire trajectory up to time $t$:
\begin{align*}
dy_i(t) &= \bigl[-\nu\Lambda^2y_i(t) + P_\ell B(y_i(t),y_i(t)) + \Theta_t(Y^t_i)\,G\bigl(y_i(t),\Phi_t(Y^t_i,h_i(0))\bigr)\bigr]dt + dW(t), \tag{21} \\
y_i(0) &= \ell_i(0),
\end{align*}
where $h_i(0) = \Phi_0(L^0_i)$,
\[ \Theta_t(f) = \begin{cases} 1 & \text{if } f\in A|_{[0,t]}, \\ 0 & \text{if } f\notin A|_{[0,t]}, \end{cases} \]
and $A|_{[0,T]}$ is the set of low mode paths which agree with a path in $A$ up to time $T$. Recall that $\Phi_t(Y^t_i,h_i(0))$ is the solution to (12) with $\ell = Y$ and $h(0) = h_i(0)$. Equation (21) is the same as (17) except for the insertion of $\Theta_t(Y^t_i)$. As long as $\Theta_s(Y^t_i) = 1$ for $s\in[0,t]$, then $y_i(s) = \ell_i(s)$ for $s\in[0,t]$.

Let $Q^y_\infty(L^0_1,\cdot)$ and $Q^y_\infty(L^0_2,\cdot)$ be the measures induced by $Y_1$ and $Y_2$ respectively. If applicable, Girsanov's theorem would imply that these measures are equivalent, that is $Q^y_\infty(L^0_1,\cdot)\sim Q^y_\infty(L^0_2,\cdot)$. For Girsanov's theorem to apply, it is sufficient that the Novikov condition holds. Namely,
\[ E\exp\Bigl\{\frac12\int_0^\infty\Bigl|\Sigma^{-1}\Theta_t(Y^t_1)\,D\bigl(y_1(t),\Phi_t(Y^t_1,h_1(0)),\Phi_t(Y^t_1,h_2(0))\bigr)\Bigr|^2dt\Bigr\} < \infty, \tag{22} \]
where $D(g,f_1,f_2) \stackrel{\mathrm{def}}{=} G(g,f_1) - G(g,f_2)$ and $\Sigma$ is a diagonal matrix with the $\sigma_k$'s on its diagonal. Here we have written the condition in terms of the $y_1$ process. One can also
write the condition in terms of the $y_2$ process; the finiteness of one implies the finiteness of the other. We will in fact show something much stronger than (22). Since $|\Sigma^{-1}| < \infty$, it would be enough to show that
\[ \sup_\omega\int_0^\infty\Bigl|\Theta_t(Y^t_1)\,D\bigl(y_1(t),\Phi_t(Y^t_1,h_1(0)),\Phi_t(Y^t_1,h_2(0))\bigr)\Bigr|^2dt < \infty. \tag{23} \]
Putting $h_i(s) = \Phi_s(Y^s_1,h_i(0))$, $u_i(s) = \ell_i(s) + h_i(s)$, $\rho(s) = h_1(s) - h_2(s)$ and using Lemma A.4, we have
\[ \bigl|D(\ell_1(s),h_1(s),h_2(s))\bigr|^2_{L^2} \le C\,|\rho(s)|^2_{L^2}\bigl(|u_1(s)|^2_{L^2} + |u_2(s)|^2_{L^2}\bigr). \tag{24} \]
Notice that if $\ell_i\in A|_{[0,T]}$ then for all $t\in[0,T]$,
\begin{align*}
|u_i(t)|^2_{L^2} &< C_0 + E_0t + Kt^{4/5}, \\
\int_0^t|\Lambda u_i(s)|^2_{L^2}\,ds &< \frac{1}{2\nu}\bigl(C_0 + E_0t + Kt^{4/5}\bigr), \\
|\rho(0)|^2_{L^2} = |u_1(0)-u_2(0)|^2_{L^2} &\le 2\bigl(|u_1(0)|^2_{L^2} + |u_2(0)|^2_{L^2}\bigr) \le 4C_0.
\end{align*}
In addition, we can apply the same analysis as in Sect. 2. Starting from (14) and using the above estimates produces
\begin{align*}
|\rho(t)|^2_{L^2} &\le |\rho(0)|^2_{L^2}\exp\Bigl(-\nu N^2t + \frac{\hat C^2}{\nu}\int_0^t|\Lambda u_2(s)|^2_{L^2}\,ds\Bigr) \\
 &\le 4C_0\exp\Bigl(-\nu N^2t + \frac{\hat C^2}{2\nu^2}\bigl(C_0 + E_0t + Kt^{4/5}\bigr)\Bigr).
\end{align*}
Since by assumption $\nu N^2 > C\,E_0/\nu^2 = \hat C^2E_0/(2\nu^2)$, the second term goes to zero sufficiently fast and hence the estimate on the right-hand side of (24) decays exponentially fast. Thus,
\begin{align*}
\sup_\omega\int_0^\infty\Bigl|\Theta_t(Y^t_1)\,D\bigl(y_1(t),\Phi_t(Y^t_1,h_1(0)),\Phi_t(Y^t_1,h_2(0))\bigr)\Bigr|^2dt
 &\le \sup_{f\in A}\int_0^\infty\bigl|D\bigl(f(t),\Phi_t(f,h_1(0)),\Phi_t(f,h_2(0))\bigr)\bigr|^2dt \\
 &< \mathrm{const}(C_0) < \infty,
\end{align*}
which implies $Q^y_\infty(L^0_1,\cdot)\sim Q^y_\infty(L^0_2,\cdot)$. As long as $Y_i$ stays in $A$, $y_i = \ell_i$. Hence $Q^y_\infty(L^0_i,\,\cdot\cap A) = Q_\infty(L^0_i,\,\cdot\cap A)$ and finally $Q_\infty(L^0_1,\,\cdot\cap A)\sim Q_\infty(L^0_2,\,\cdot\cap A)$.

In fact our proof provided more information than stated in Lemma 3.4. It contains some estimates uniform over a class of initial pasts which will be useful in later investigations of the convergence rate. (See [Mat00].) We state the extra information in the following corollary.
Corollary 3.5. In the setting of the proof of Lemma 3.4, define $\widetilde{\mathcal{P}} = \{L\in\mathcal{P} : |L(0) + \Phi_0(L)|_{L^2} < C_0\}$. Then there exists a constant, depending on $C_0$ and $K$, so that
\[ \sup_{L_1,L_2\in\widetilde{\mathcal{P}}}\int\Bigl|1 - \frac{dQ^y_\infty(L_1,g)}{dQ^y_\infty(L_2,g)}\Bigr|^2dQ^y_\infty(L_2,g) < \mathrm{const}(C_0,K) < \infty. \]

We now move to the proof of Lemma 3.2. Fix $L\in\mathcal{P}$. The proof proceeds by comparing the process $\ell(t)$ to the associated Galerkin approximation living on $L^2_\ell$, which we will denote by $x(t)$. The advantage is that $x(t)$ is a standard non-degenerate diffusion and hence it is Markovian and well understood. Take $x(t)$ as the solution defined by the following stochastic ODEs:
\[ dx(t) = \bigl[-\nu\Lambda^2x + P_\ell B(x,x)\bigr]dt + dW(t), \qquad x(0) = \ell(0). \]
As in the previous section, we do not compare $x(t)$ directly to $\ell(t)$ but instead to a modified version of $\ell(t)$ which we will denote by $z(t)$. In analogy to before, we will denote the path of this process up to time $t$ by $Z^t$. Before continuing let us assume without loss of generality that $|\ell(0)|_{L^2}\le C_0$ and $t\le T$ for some positive $C_0$ and $T$. This will give our estimates some uniformity over all initial conditions inside this ball and for times $t\le T$. The evolution of $z(t)$ is given by
\[ dz(t) = \bigl[-\nu\Lambda^2z + P_\ell B(z,z) + \Theta_t(Z^t)\,G\bigl(z,\Phi_t(Z^t,h_0)\bigr)\bigr]dt + dW, \qquad z(0) = \ell(0) = L(0), \]
where $h_0 = \Phi_0(L)$ and $G$ is defined in (18). As in the last section, $\Theta_t(Z^t)$ is a cut-off function. For any fixed $b_0 > 1$, we define
\[ \Theta_s(Z^s) = \begin{cases} 1 & \text{if } \int_0^s|Z^s(r)|^4_{L^2}\,dr < (b_0C_0)^4T, \\ 0 & \text{otherwise.} \end{cases} \]
Here $b_0$ is a fixed constant to be chosen below. For any $B\subset L^2_\ell$, define $[B] = \{v\in C([0,t],L^2_\ell) : v(t)\in B\}$. Then $R_t(L(0),B) = Q_t(L,[B])$. Let $Q^x_t(L,\cdot)$ and $Q^z_t(L,\cdot)$ be the two measures induced on $C([0,t],L^2_\ell)$ by the dynamics of $x$ and $z$ respectively. Lemma 3.2 will be a consequence of the following two lemmas.

Lemma 3.6. Fix any $b_0 > 1$ (the constant used in defining the $z$ process). Then the following holds: For any $L\in\mathcal{P}$ and $t\ge 0$, $Q^x_t(L(0),\cdot)$ is equivalent to $Q^z_t(L,\cdot)$.

Lemma 3.7. For any $b_0$ the following holds: For any $L\in\mathcal{P}$ and $t\ge 0$, there exists a positive function $g(\cdot)$ so that
\[ Q^x_t(L(0),[B]\cap A) \ge \int_B g(y)\,dm(y), \]
where $m(\cdot)$ is the Lebesgue measure.
We now use these two lemmas to prove Lemma 3.2.

Proof of Lemma 3.2. Observe that by construction as long as the trajectories stay in $A$, $z(t) = \ell(t)$. Hence using Lemma 3.7, we have
\begin{align*}
R_t(L,B) = Q_t(L,[B]) &\ge Q_t(L,[B]\cap A) = Q^z_t(L,[B]\cap A), \\
Q^x_t(L(0),[B]\cap A) &\ge \int_B g(L(0),y)\,dm(y),
\end{align*}
where $g(L(0),y)$ is a positive function in $y$. Since Lemma 3.6 says that $Q^z_t(L,\,\cdot\cap A)$ is equivalent to $Q^x_t(L(0),\,\cdot\cap A)$, we know that $R_t(L(0),B)$ is also bounded from below by a positive measure equivalent to the Lebesgue measure.

We now turn to Lemma 3.6. Our construction gives some measure of uniform control which is useful for estimating the rate at which the system converges to the stationary measure. (See [Mat00].) We state these more precise estimates in the following corollary.

Corollary 3.8. Fix a $C_0 > 0$ and define $\widetilde{\mathcal{P}} = \{L\in\mathcal{P} : |L(0) + \Phi_0(L)|_{L^2} < C_0\}$. Then for any $\alpha\in(0,1)$ there exists a $b_0 > 0$ (the constant used to define $A$) so that
\[ \inf_{t\in[0,T]}\inf_{L\in\widetilde{\mathcal{P}}}P\bigl\{S^{\omega}_tL\in A\bigr\} > 1 - \alpha, \]
\[ \sup_{L\in\widetilde{\mathcal{P}}}\int\Bigl|1 - \frac{dQ^z_t(L,g)}{dQ^x_t(L,g)}\Bigr|^2dQ^x_t(L,g) < K(C_0,t) \]
for $t\in[0,T]$, where $K$ is a constant depending on $C_0$ and $t$ such that for each $C_0$, $K\to 0$ as $t\to 0$.

Proof of Lemma 3.6 and Corollary 3.8. Girsanov's theorem would imply the result if the Novikov condition
\[ E\exp\Bigl\{\frac12\int_0^t|\Theta_s(Z^s)|^2\bigl|G\bigl(z(s),\Phi_s(Z^s,h_0)\bigr)\bigr|^2_{L^2}\,ds\Bigr\} < \infty \]
holds. As in the proof of Lemma 3.4, we will prove the stronger condition
\[ \sup_{z(\cdot)\in A}\int_0^t\bigl|G\bigl(z(s),\Phi_s(Z^s,h_0)\bigr)\bigr|^2_{L^2}\,ds < \infty. \]
Using Lemma A.4, we obtain the following estimate on $G$:
\[ \bigl|G\bigl(z(s),\Phi_s(Z^s,h_0)\bigr)\bigr|^2_{L^2} \le C\bigl[|z(s)|^2_{L^2}|h(s)|^2_{L^2} + |h(s)|^4_{L^2}\bigr], \]
where $h(s) = \Phi_s(Z^s,h_0)$. By Lemma C.1 we know that if $z$ is in $A$ then $\sup_{s\in[0,t]}|h(s)|_{L^2}$ is less than some $C_1$, where $C_1$ depends on $|h_0|_{L^2}$ and the $b_0$, $C_0$ and $T$ used to define $A$. Hence for any $z\in A$, we have
\begin{align*}
\int_0^t\bigl|G\bigl(z(s),\Phi_s(Z^s,h_0)\bigr)\bigr|^2_{L^2}\,ds
 &\le C\int_0^t\bigl[|z(s)|^2_{L^2}|h(s)|^2_{L^2} + |h(s)|^4_{L^2}\bigr]ds \\
 &\le C\Bigl(\int_0^t|z(s)|^4_{L^2}\,ds\Bigr)^{1/2}\Bigl(\int_0^t|h(s)|^4_{L^2}\,ds\Bigr)^{1/2} + C\,C_1^4\,t \\
 &\le C\,(b_0C_0)^2\,T^{1/2}\,C_1^2\,t^{1/2} + C\,C_1^4\,t.
\end{align*}
Hence Novikov’s condition holds and the lemma is proven.
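For readers who want the last estimate spelled out, the following sketch isolates the two elementary steps behind it. It assumes only the cut-off bound $\int_0^t|z(s)|^4_{L^2}ds < (b_0C_0)^4T$ satisfied by paths in $A$ and the bound $\sup_{s\in[0,t]}|h(s)|_{L^2}\le C_1$ quoted from Lemma C.1; nothing else is used.

```latex
\begin{align*}
\int_0^t |z(s)|_{L^2}^2\,|h(s)|_{L^2}^2\,ds
  &\le \Bigl(\int_0^t |z(s)|_{L^2}^4\,ds\Bigr)^{1/2}
       \Bigl(\int_0^t |h(s)|_{L^2}^4\,ds\Bigr)^{1/2}
       && \text{(Cauchy--Schwarz)} \\
  &\le \bigl((b_0C_0)^4\,T\bigr)^{1/2}\,\bigl(C_1^4\,t\bigr)^{1/2}
   = (b_0C_0)^2\,C_1^2\,T^{1/2}\,t^{1/2}, \\
\int_0^t |h(s)|_{L^2}^4\,ds
  &\le C_1^4\,t .
\end{align*}
% Both right-hand sides are deterministic, so the exponent in Novikov's
% condition is bounded uniformly in \omega and its exponential has finite mean.
```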
Proof of Lemma 3.7. The basic idea is as follows. Some of the paths which satisfy the condition defining $A$ can be described by requiring that some norm of the paths be less than some fixed $f_k^*(t)$ at time $t$. Such a condition has the advantage that it corresponds to fixing a zero boundary condition along the boundary of some region for the associated Fokker–Planck equation. Since the diffusion is non-degenerate, this process has a positive density on the interior of this region. By carefully picking $f_k^*$ we can have the region contain sets arbitrarily far away from the origin.

We now make this precise. Fix an $L\in\mathcal{P}$ and a $t > 0$. For $k = 0,1,2,\dots$ define the disk $D_k$ by
\[ D_k = \bigl\{f\in L^2_\ell : |f|^4_{L^2}\in[2^k,2^{k+1})\bigr\} \]
and let $\bar D_k$ be the closure of $D_k$. We will construct $g(\cdot) = \sum_k g_k(\cdot)\mathbf{1}_{D_k}$, where $g_k$ is strictly positive on $\bar D_k$ and zero outside of $\bar D_k$. Let $f_k^*$ be a non-decreasing, positive, real-valued $C^\infty$ function such that $f_k^*(s) = (C_0^4 + \alpha_k)^{1/4}$ for $s\in[0,(1-\alpha_k)t-\varepsilon]$ and $f_k^*(s) = (100\cdot 2^{k+1})^{1/4}$ for $s\in[(1-\alpha_k)t,t]$, and which linearly interpolates in $[(1-\alpha_k)t-\varepsilon,(1-\alpha_k)t]$. Here $\alpha_k$ is some number in $(0,1)$ chosen so that $\int_0^t(f_k^*(r))^4\,dr < (b_0C_0)^4T$. This is possible as long as $b_0 > 1$ and $t\le T$. Now define the subset $H_k$ of $C([0,t],L^2_\ell)$ by
\[ H_k = \bigl\{f\in C([0,t],L^2_\ell) : |f(s)|_{L^2}\le f_k^*(s)\ \text{for all } s\in[0,t]\bigr\}. \]
By the choice of $f_k^*$ it is clear that $H_k\subset A$, where $A$ is the same set used in the definition of $z$. Now consider the process $x_k(t)$ which follows the same equation as $x(t)$ except that it is killed whenever the trajectory leaves $H_k$. Another way of saying this is that $x_k(t)$ is the process $x(t)$ conditioned on staying in $H_k$. The transition density of this process, $g_k(s,\ell(0),y)$, is the solution to the Kolmogorov equation with the same generator as $x$ but with zero boundary conditions along the boundary of $H_k$. Since the generator is elliptic, we know that $g_k(t,\ell(0),y)$ is strictly positive everywhere in the interior of $H_k$. Since the trace of $H_k$ at time $t$ strictly contains $D_k$, we know that $g_k(t,\ell(0),y)$ is strictly positive for $y\in\bar D_k$. Also by construction it is clear that $Q^x_t(\ell(0),H_k) > 0$ for all $k$. Let $a_k = Q^x_t(\ell(0),H_k)$ and set $g_k(\cdot) = a_k\,g_k(t,\ell(0),\cdot)\mathbf{1}_{D_k}(\cdot)$.

All that remains is to verify that this choice of $g_k$ constructs a $g$ with the desired minorization property, since it is clearly everywhere positive. Without loss of generality it is enough to show it for a $B$ contained in some arbitrary $D_k$. Then
\begin{align*}
Q^x_t(\ell(0),[B]\cap A) &\ge Q^x_t(\ell(0),[B]\cap H_k) \ge P_{\ell(0)}\{x\in[B]\ \&\ x\in H_k\} \\
 &\ge P_{\ell(0)}\{x\in[B]\mid x\in H_k\}\,P_{\ell(0)}\{x\in H_k\} \\
 &\ge a_k\int_B g_k(t,\ell(0),y)\,dm(y) = \int_B g_k(y)\,dm(y).
\end{align*}
4. Stationary Measures and Thermodynamical Formalism

In this section we make a few general heuristic remarks about the methodology behind our approach. The starting point of our construction is rewriting the original Navier–Stokes equation with random forcing as a finite-dimensional system of ordinary stochastic differential equations whose drift coefficients depend on the whole past:
\[ d\ell = \bigl[-\nu\Lambda^2\ell + P_\ell B(\ell,\ell) + G(\ell,\Phi_t(L^t))\bigr]dt + dW. \tag{25} \]
From (25),
\[ dW = d\ell - \bigl[-\nu\Lambda^2\ell + P_\ell B(\ell,\ell) + G(\ell,\Phi_t(L^t))\bigr]dt. \tag{26} \]
The measure corresponding to all $dw_k(t)$, $k\in Z_\nu$, $-\infty < t < \infty$, can be symbolically written as
\[ \exp\Bigl\{-\frac12\sum_{k\in Z_\nu}\frac{1}{|\sigma_k|^2}\int_{-\infty}^{\infty}\Bigl|\frac{dw_k(t)}{dt}\Bigr|^2dt\Bigr\}\prod_k dw_k(t). \]
Here $Z_\nu$ is the set of modes that are forced. The substitution of the expression for $dw_k$ from (26) gives
\[ \exp\Bigl\{\int_{-\infty}^{\infty}L_1(\ell(t))\,dt + \int_{-\infty}^{\infty}L_2(\ell(t))\,dt - \frac12\sum_{k\in Z_\nu}\frac{1}{|\sigma_k|^2}\int_{-\infty}^{\infty}\Bigl|\frac{d\ell_k(t)}{dt}\Bigr|^2dt\Bigr\}\prod_k d\ell_k(t), \]
where
\begin{align*}
L_1(\ell(t)) &= -\frac12\bigl|-\nu\Lambda^2\ell + P_\ell B(\ell,\ell) + G(\ell,\Phi_t(L^t))\bigr|^2, \\
\int_{-\infty}^{\infty}L_2(\ell(t))\,dt &= \sum_{k\in Z_\nu}\frac{1}{|\sigma_k|^2}\int_{-\infty}^{\infty}\bigl[-\nu\Lambda^2\ell + P_\ell B(\ell,\ell) + G(\ell,\Phi_t(L^t))\bigr]_k\,d\ell_k(t).
\end{align*}
The factor
\[ \exp\Bigl\{-\frac12\sum_{k\in Z_\nu}\frac{1}{|\sigma_k|^2}\int_{-\infty}^{\infty}\Bigl|\frac{d\ell_k(t)}{dt}\Bigr|^2dt\Bigr\}\prod_k d\ell_k(t) \]
can be considered as the differential of a “free measure”, which in our case is a finite-dimensional white noise. The “Lagrangians” $L_1$, $L_2$ describe the non-local interaction of $\ell(t)$ with the past. The whole expression shows that the stationary measure for the SNS system is actually a Gibbs state constructed with the help of the Lagrangians $L_1$, $L_2$. The estimates of the growth of $L_1$, $L_2$ as a function of the growth of $|\ell_k(s)|_{L^2}$, $s\to-\infty$, show the class of realizations for which the conditional distributions can be defined. Therefore we have a weaker form of the Gibbs state. R. L. Dobrushin in his last papers and talks stressed the importance of this class of probability distributions. Since we are dealing all the time with probability distributions, the free energy of our Gibbs state is zero. It would be interesting to develop a general theory of existence and uniqueness of Gibbs states for general Lagrangians $L_1$, $L_2$ so that our result becomes a particular case of a more general statement.
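The structure of the exponent above is easiest to see in a one-dimensional caricature. The sketch below is an illustration only, under assumptions not made in the paper: a scalar real process $x(t)$, a hypothetical drift $b$ that may depend on the past $X^t$, a constant noise amplitude $\sigma > 0$, and purely formal manipulation of the white-noise density (the distinction between Itô and Stratonovich integrals is ignored).

```latex
% Caricature: dx = b(X^t) dt + \sigma dw, so that \sigma \dot w = \dot x - b(X^t).
% The formal white-noise weight of the path w is \exp\{-\tfrac12 \int |\dot w|^2 dt\}.
\[
-\frac12\int \Bigl|\frac{\dot x - b}{\sigma}\Bigr|^2 dt
  \;=\; -\frac{1}{2\sigma^2}\int |\dot x|^2\,dt
  \;+\; \frac{1}{\sigma^2}\int b\,dx
  \;-\; \frac{1}{2\sigma^2}\int |b|^2\,dt .
\]
% First term : the "free measure" (white noise in x).
% Second term: the analogue of \int L_2 dt (drift paired with dx).
% Third term : the analogue of \int L_1 dt (minus one half of the squared drift).
```

Because $b$ depends on the whole past, the second and third terms couple $x(t)$ to its history, which is the sense in which the text speaks of a non-local interaction.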
5. Conclusion
When analyzing the ergodic properties of an infinite dimensional stochastic process, one of the most delicate aspects is often finding the correct topology in which to work. One of the principal advantages of the approach presented in this paper is that it evades this difficulty. We trade an infinite dimensional diffusion process for a finite dimensional Itô process with memory.
We have tried to present the simplest case of our theory, so that the exposition would be unencumbered. In fact the proofs contained in this work establish a more general theorem than originally stated. Consider forcing defined by
\[ W(x,t) = \sum_{k\in\mathcal Z} \sigma_k w_k(t,\omega) e_k(x), \]
where $\mathcal Z$ is some finite subset of $\mathbb Z^2$ such that $(0,0)\notin\mathcal Z$ and $k\in\mathcal Z$ if and only if $\sigma_k > 0$. We define $L^2_\ell = \mathrm{span}\{e_k,\ k\in\mathcal Z\}$, $L^2_h = \mathrm{span}\{e_k,\ k\notin\mathcal Z\}$, and
\[ N_- = \sup\{ N : k\in\mathcal Z \text{ for all } k \text{ with } 0<|k|\le N \}. \]
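As a concrete illustration of the definition of $N_-$, the following Python sketch computes it for a finite set of forced wave vectors. It assumes that $N$ ranges over the non-negative integers and that $|k|$ is the Euclidean norm; the function name `n_minus` and the two example sets are hypothetical and serve only as an illustration.

```python
def n_minus(forced):
    """Largest integer N such that every k in Z^2 with 0 < |k| <= N is forced.

    `forced` is a finite set of integer pairs (k1, k2) playing the role of
    the index set of forced modes (an assumption of this sketch).
    """
    forced = set(forced)
    n = 0
    while True:
        r = n + 1
        # Every lattice point with 0 < |k| <= r must be forced for N >= r.
        shell_ok = all(
            (k1, k2) in forced
            for k1 in range(-r, r + 1)
            for k2 in range(-r, r + 1)
            if 0 < k1 * k1 + k2 * k2 <= r * r
        )
        if not shell_ok:
            return n
        n = r


# Only the four modes with |k| = 1 are forced.
unit_modes = {(1, 0), (-1, 0), (0, 1), (0, -1)}
# Every mode with 0 < |k| <= 2 is forced.
disk_modes = {
    (a, b) for a in range(-2, 3) for b in range(-2, 3) if 0 < a * a + b * b <= 4
}
```

Because the forced set is finite, the loop always terminates. Removing a single low mode, for instance `(0, 1)`, from `unit_modes` drops the value to zero, which reflects the requirement that all low modes be forced.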
With these definitions all of the previous lemmas and theorems hold with the role of $N$ replaced by $N_-$. In particular, if $N_-^2 > C E_0/\nu^3$ the system has a unique invariant measure. This formulation emphasizes the nature of our principal assumption. By requiring that all of the low modes are forced, we are essentially requiring that the reduced Gibbsian dynamics are elliptic in nature. Some steps towards dealing with a hypo-elliptic setting have been made. In [EMatt], finite dimensional truncations of the two dimensional SNS equation were studied and shown to be ergodic under minimal assumptions. In [EM], a reaction diffusion equation was studied under degenerate forcing.
Our arguments can be easily extended to the case where the forcing of the $k$th mode has the form $f_k + \sigma_k\,dw_k(t)$, where $f_k$ is a constant and $f_k = 0$, $\sigma_k = 0$ for $k\notin\mathcal Z$, or to the case when the forcing is not diagonal in Fourier space. Our approach can also be extended in several other directions. We can consider the case when the high modes are also forced. As long as the forcing of the high modes decays sufficiently fast, our argument still applies with almost no change. The Wiener process in the forcing can be replaced by other diffusion processes such as the Ornstein-Uhlenbeck process. Dissipative PDEs such as the Cahn-Hilliard equation and the Ginzburg-Landau equations can also be studied using the same method. Finally, exponential convergence of empirical distributions to the stationary distribution can be proved.
A. Energy Estimates
In this Appendix, we prove a number of estimates controlling the evolution of the energy and enstrophy. Estimates for higher Sobolev norms are also possible, see [Mat98] for examples. In all cases, they are analogous to the standard results in the deterministic setting. Here we do not limit ourselves to forcing with only finitely many active modes. We will characterize the forcing in terms of the $E_l$ defined by $E_l \overset{\mathrm{def}}{=} \sum_k |k|^{2l}|\sigma_k|^2$.
We begin with the basic energy and enstrophy estimates in the stochastic setting.
W. E, J.C. Mattingly, Ya. Sinai
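Lemma A.1. For any $p > 1$, we have
\[ E|u(t)|_{L^2}^{2p} + 2p\nu\, E\int_0^t |\Lambda u(s)|_{L^2}^{2}\,|u(s)|_{L^2}^{2(p-1)}\,ds \le E|u(0)|_{L^2}^{2p} + C_0\, E\int_0^t |u(s)|_{L^2}^{2(p-1)}\,ds, \]
\[ E|\Lambda u(t)|_{L^2}^{2p} + 2p\nu\, E\int_0^t |\Lambda^2 u(s)|_{L^2}^{2}\,|\Lambda u(s)|_{L^2}^{2(p-1)}\,ds \le E|\Lambda u(0)|_{L^2}^{2p} + C_1\, E\int_0^t |\Lambda u(s)|_{L^2}^{2(p-1)}\,ds. \]
Here $C_i = pE_i + 2p(p-1)\sigma_{\max}^2$ and $\sigma_{\max}^2 = \sup_k |\sigma_k|^2$. In the case $p = 1$, we have the equalities
\[ E|u(t)|_{L^2}^{2} + 2\nu\, E\int_0^t |\Lambda u(s)|_{L^2}^{2}\,ds = E|u(0)|_{L^2}^{2} + E_0 t, \tag{27} \]
\[ E|\Lambda u(t)|_{L^2}^{2} + 2\nu\, E\int_0^t |\Lambda^2 u(s)|_{L^2}^{2}\,ds = E|\Lambda u(0)|_{L^2}^{2} + E_1 t. \tag{28} \]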
Proof. We begin by fixing a positive integer $M$ and considering the Galerkin approximation defined by $u^{(M)}(t) = \sum_{|k|\le M} u_k^{(M)}(t) e_k$. $u^{(M)}(t)$ satisfies an equation of exactly the same form as the full solution except that the nonlinearity has been projected to those terms of order less than or equal to $M$. We will also need $E_l^M \overset{\mathrm{def}}{=} \sum_{|k|\le M} |k|^{2l}|\sigma_k|^2$. Our estimates will be independent of the order of approximation $M$. For simplicity, we will sometimes neglect the superscript $M$.
Applying Itô’s formula to the map $\{u_k\} \mapsto \big(\sum_k |u_k|^2\big)^p$ produces
\[ d|u(t)|_{L^2}^{2p} = 2p|u(t)|_{L^2}^{2(p-1)}\Big[ -\nu|\Lambda u(t)|_{L^2}^{2}\,dt + \langle u(t), dW\rangle_{L^2} \Big] + 2p(p-1)|u(t)|_{L^2}^{2(p-2)}\sum_k |u_k(t)|^2|\sigma_k|^2\,dt + p|u(t)|_{L^2}^{2(p-1)}E_0^M\,dt \tag{29} \]
for the energy moments and
\[ d|\Lambda u(t)|_{L^2}^{2p} = 2p|\Lambda u(t)|_{L^2}^{2(p-1)}\Big[ -\nu|\Lambda^2 u(t)|_{L^2}^{2}\,dt + \langle \Lambda^2 u(t), dW\rangle_{L^2} \Big] + 2p(p-1)|\Lambda u(t)|_{L^2}^{2(p-2)}\sum_k |k|^2|\sigma_k|^2|u_k(t)|^2\,dt + p|\Lambda u(t)|_{L^2}^{2(p-1)}E_1^M\,dt \tag{30} \]
for the enstrophy moments. Here $\langle \Lambda^\alpha u(t), dW(t)\rangle_{L^2}$ is shorthand for $\sum_k |k|^\alpha u_k(t)\sigma_k\,dw_k(t)$. In the first, we have used the fact that $\langle B(u,u), u\rangle_{L^2} = 0$ and in the second the fact that $\langle B(u,u), \Lambda^2 u\rangle_{L^2} = 0$. Since, on the torus, the energy and the enstrophy equations have the same structure, we give all of the details only for the analysis of the enstrophy equation.
The analysis for the energy equation proceeds analogously, see [Mat99, Mat98]. For a fixed $H > 0$, we introduce the stopping time
\[ T = \inf\big\{ t\ge 0 : |\Lambda^2 u(t)|_{L^2}^{2} \ge H^2 \big\}. \]
Denoting by $M_t$ the local martingale term in (30), we define the stopped martingale $M_t^T$ by
\[ M_t^T = 2p\int_0^t |\Lambda u(s\wedge T)|_{L^2}^{2(p-1)}\,\langle \Lambda^2 u(s\wedge T), dW(s)\rangle_{L^2}. \]
$M_t^T$ has the advantage that its quadratic variation, denoted by $[M^T, M^T]_t$, is clearly finite:
\[ [M^T, M^T]_t \le 2p\sigma_{\max}^2\int_0^t |\Lambda^2 u(s\wedge T)|_{L^2}^{2p}\,ds \le 2p\sigma_{\max}^2 H^{2p} t < \infty. \]
Because $E[M^T, M^T]_t < \infty$ we know that $E M_t^T = 0$. And because $t\wedge T$ is a bounded stopping time the Optional Stopping Time Lemma says that $E M^T_{t\wedge T} = 0$. Since $M_{t\wedge T} = M^T_{t\wedge T}$, we have
\[ E|\Lambda u(t\wedge T)|_{L^2}^{2} + 2\nu\, E\int_0^{t\wedge T} |\Lambda^2 u(s)|_{L^2}^{2}\,ds = E|\Lambda u(0)|_{L^2}^{2} + E_1^M\, E(t\wedge T), \]
and when $p > 1$,
\[ E|\Lambda u(t\wedge T)|_{L^2}^{2p} + 2p\nu\, E\int_0^{t\wedge T} |\Lambda^2 u(s)|_{L^2}^{2}\,|\Lambda u(s)|_{L^2}^{2(p-1)}\,ds = E|\Lambda u(0)|_{L^2}^{2p} + E\int_0^{t\wedge T}\Big[ 2p(p-1)|\Lambda u(s)|_{L^2}^{2(p-2)}\sum_k |k|^2|\sigma_k|^2|u_k(s)|^2 + p|\Lambda u(s)|_{L^2}^{2(p-1)}E_1^M \Big]\,ds. \]
Hence
\[ E|\Lambda u(t\wedge T)|_{L^2}^{2p} + 2p\nu\, E\int_0^{t\wedge T} |\Lambda^2 u(s)|_{L^2}^{2}\,|\Lambda u(s)|_{L^2}^{2(p-1)}\,ds \le E|\Lambda u(0)|_{L^2}^{2p} + \big( 2p(p-1)\sigma_{\max}^2 + pE_1^M \big)\, E\int_0^{t\wedge T} |\Lambda u(s)|_{L^2}^{2(p-1)}\,ds. \]
Since $u(t)$ is continuous in time, $T\to\infty$ as $H\to\infty$ and hence $T\wedge t\to t$. Thus we obtain
\[ E|\Lambda u(t)|_{L^2}^{2} + 2\nu\, E\int_0^{t} |\Lambda^2 u(s)|_{L^2}^{2}\,ds = E|\Lambda u(0)|_{L^2}^{2} + E_1^M t, \]
\[ E|\Lambda u(t)|_{L^2}^{2p} + 2p\nu\, E\int_0^{t} |\Lambda^2 u(s)|_{L^2}^{2}\,|\Lambda u(s)|_{L^2}^{2(p-1)}\,ds \le E|\Lambda u(0)|_{L^2}^{2p} + \big( 2p(p-1)\sigma_{\max}^2 + pE_1^M \big)\, E\int_0^{t} |\Lambda u(s)|_{L^2}^{2(p-1)}\,ds. \]
Recall that we have been calculating with an $M$th order Galerkin approximation. For the $p = 1$ equation, the right hand side converges to the desired right hand side. With this bound on $E|\Lambda u(t)|_{L^2}^{2}$ in hand we can take the $M\to\infty$ limit of the $p = 2$ equation. Analogously, once we have taken the limit in the $p$th equation we have the dominating bound needed to take the limit in the $(p+1)$th equation.
In our setting, the Poincaré inequality reads $|\Lambda f|_{L^2}^{2} \ge |f|_{L^2}^{2}$ and $|\Lambda^2 f|_{L^2}^{2} \ge |\Lambda f|_{L^2}^{2}$. This allows us to close the above inequalities. After applying Gronwall’s inequality, we obtain the following estimates which are uniform in time.
Corollary A.2.
\[ E|u(t)|_{L^2}^{2} \le e^{-2\nu t}E|u(0)|_{L^2}^{2} + \frac{E_0}{2\nu}\big(1 - e^{-2\nu t}\big), \]
\[ E|\Lambda u(t)|_{L^2}^{2} \le e^{-2\nu t}E|\Lambda u(0)|_{L^2}^{2} + \frac{E_1}{2\nu}\big(1 - e^{-2\nu t}\big). \]
For any $p > 1$,
\[ E|u(t)|_{L^2}^{2p} \le e^{-2\nu t}E|u(0)|_{L^2}^{2p} + C_0\int_0^t e^{-2\nu(t-s)}E|u(s)|_{L^2}^{2(p-1)}\,ds, \]
\[ E|\Lambda u(t)|_{L^2}^{2p} \le e^{-2\nu t}E|\Lambda u(0)|_{L^2}^{2p} + C_1\int_0^t e^{-2\nu(t-s)}E|\Lambda u(s)|_{L^2}^{2(p-1)}\,ds. \]
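The Gronwall step behind the first bound of Corollary A.2 can be checked numerically in the scalar extremal case. Writing $y(t)$ for the expected energy, Eq. (27) combined with the Poincaré inequality gives $y' \le -2\nu y + E_0$, and the corollary is the solution of the corresponding equality. The Python sketch below compares that closed form with a forward Euler integration; the function names and the parameter values are hypothetical choices made for illustration only.

```python
import math


def energy_bound(t, y0, nu, e0):
    """Right-hand side of the p = 1 energy bound in Corollary A.2."""
    decay = math.exp(-2.0 * nu * t)
    return decay * y0 + e0 / (2.0 * nu) * (1.0 - decay)


def euler_energy(t, y0, nu, e0, steps=100_000):
    """Forward Euler solution of y' = -2*nu*y + e0 (the extremal case)."""
    dt = t / steps
    y = y0
    for _ in range(steps):
        y += dt * (-2.0 * nu * y + e0)
    return y


# Illustrative parameters (assumptions, not taken from the text).
closed_form = energy_bound(3.0, 5.0, 0.5, 2.0)
numerical = euler_energy(3.0, 5.0, 0.5, 2.0)
```

For large $t$ the bound relaxes to $E_0/(2\nu)$ regardless of the initial energy, which is the value that reappears in Lemma B.1 as the bound on the energy of any stationary measure.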
We use a standard estimate on the tri-linear term $\langle B(u,v), w\rangle_{L^2}$ specialized to our two dimensional setting. Its proof can be found in [CF88] for example.
Lemma A.3. Let $\alpha, \beta, \gamma$ be positive real numbers such that $\alpha + \beta + \gamma \ge 1$ and $(\alpha,\beta,\gamma) \ne (0,0,1)$, or $(0,1,0)$, or $(1,0,0)$. Then
\[ |\langle B(u,v), w\rangle_{L^2}| \le C\,|\Lambda^\alpha u|_{L^2}\,|\Lambda^{\beta+1} v|_{L^2}\,|\Lambda^\gamma w|_{L^2}. \]
Using this lemma we prove the following estimate specialized to the two dimensional setting with periodic boundary conditions.
Lemma A.4. Let $\{e_k, k\in\mathbb Z^2\}$ be a basis for $L^2$. Consider a splitting of $L^2 = L^2_\ell + L^2_h$. Let $N^+ = \sup\{|k| : \exists\, e_k \text{ with } e_k\in L^2_\ell\}$ and let $P_\ell$ be the projector onto $L^2_\ell$. If $u, v\in L^2$ then
\[ |P_\ell B(u,v)|_{L^2} \le C(N^+)^3\,|u|_{L^2}|v|_{L^2}. \]
Proof of Lemma A.4. In the periodic setting, $P_\ell$, $P_{\mathrm{div}}$, and $(-\Delta)^s$ all are simply Fourier multipliers and hence commute with one another. Recall that $B(u,v) = P_{\mathrm{div}}(u\cdot\nabla)v$ and hence, with all suprema taken over $w\in L^2$ with $|w|_{L^2} = 1$,
\[ |P_\ell B(u,v)|_{L^2} = \sup_w |\langle P_\ell B(u,v), w\rangle_{L^2}| = \sup_w |\langle B(u,v), P_\ell w\rangle_{L^2}| = \sup_w |\langle B(u, P_\ell w), v\rangle_{L^2}| \]
\[ \le C\,|u|_{L^2}|v|_{L^2}\sup_w |\Lambda^3 P_\ell w|_{L^2} \le C(N^+)^3\,|u|_{L^2}|v|_{L^2}\sup_w |w|_{L^2} \le C(N^+)^3\,|u|_{L^2}|v|_{L^2}. \]
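Lemma A.5. Fix any $\delta > \frac12$, $a\in(0,1)$ and $C_0 > 0$. Let $u(t) = \varphi_t^\omega u_0$. There exists a $K_1 > 0$ such that whenever $|u_0|_{L^2}^{2} < C_0$,
\[ P\Big\{ |u(t)|_{L^2}^{2} + 2\nu\int_0^t |\Lambda u(s)|_{L^2}^{2}\,ds \le C_0 + E_0 t + K_1(t+1)^\delta \ \text{ for all } t\ge 0 \Big\} \ge 1 - a. \]
Proof of Lemma A.5. The energy equation reads
\[ |u(t)|_{L^2}^{2} + 2\nu\int_0^t |\Lambda u(s)|_{L^2}^{2}\,ds = |u_0|_{L^2}^{2} + E_0 t + \int_0^t \langle u(s), dW(s)\rangle_{L^2}. \]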
Since $|u_0|_{L^2}^{2} < C_0$, all we need to show is that
\[ P\big\{ M_t \le K_1(t+1)^\delta \ \text{ for } t\ge 0 \big\} \ge 1 - a \]
for $K_1$ large enough, where $M_t = \int_0^t \langle u(s), dW(s)\rangle_{L^2}$. The quadratic variation $[M, M]_t$ can be calculated and one sees that
\[ [M, M]_t \le \sigma_{\max}^2\int_0^t |u(s)|_{L^2}^{2}\,ds, \]
and hence
\[ ([M, M]_t)^p \le \sigma_{\max}^{2p}\Big(\int_0^t |u(s)|_{L^2}^{2}\,ds\Big)^p \le \sigma_{\max}^{2p}\, t^{p-1}\int_0^t |u(s)|_{L^2}^{2p}\,ds. \]
From Corollary A.2, we know that if $|u(0)|_{L^2}^{2} < C_0$, then there exists a constant $C_p(C_0)$ so that $E|u(t)|_{L^2}^{2p} \le C_p$ for all $t\ge 0$ and $p\ge 1$. Now define the events
\[ A_k = \Big\{ \sup_{s\in[0,k]} |M_s| > K_1 k^\delta \Big\}. \]
By the Doob–Kolmogorov martingale inequality we have
\[ P\{A_k\} \le \frac{E([M, M]_k)^p}{K_1^{2p} k^{2p\delta}} \le \frac{\sigma_{\max}^{2p} C_p\, k^p}{K_1^{2p} k^{2p\delta}}. \]
Lastly observe that
\[ P\big\{ M_t \le K_1(t+1)^\delta \big\} \ge 1 - P\Big\{\bigcup_k A_k\Big\} \ge 1 - \sum_k P\{A_k\}. \]
By the previous estimate on $P\{A_k\}$, for any $\delta > \frac12$ we see that the sum is finite for $p$ sufficiently large. Specifically, we need $\delta > \frac12\big(1 + \frac1p\big)$. Lastly, the sum can be made as small as we want by increasing $K_1$.
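The summability condition at the end of this proof comes from the fact that the bound on the probability of the $k$th event decays like $k^{p - 2p\delta}$, and such a series converges exactly when $2p\delta - p > 1$. The Python sketch below finds the smallest integer moment order $p$ that works for a given $\delta$; the function names and the search cap `p_max` are hypothetical and only illustrate the arithmetic.

```python
def min_moment_order(delta, p_max=10_000):
    """Smallest integer p >= 1 with delta > (1 + 1/p) / 2.

    Raises ValueError when delta <= 1/2, where no finite p can work.
    """
    if delta <= 0.5:
        raise ValueError("need delta > 1/2")
    for p in range(1, p_max + 1):
        if delta > 0.5 * (1.0 + 1.0 / p):
            return p
    raise ValueError("no admissible p found below p_max")


def tail_exponent(p, delta):
    """Exponent e such that the k-th term of the series is of order k**e."""
    return p - 2 * p * delta


# Illustrative value of delta (an assumption, not taken from the text).
p_needed = min_moment_order(0.75)
```

As $\delta$ approaches $\frac12$ from above the required $p$ grows without bound, which is why the argument needs moments of all orders from Corollary A.2.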
B. Properties of Stationary Measures
We now establish a number of properties, derived from the dynamics, which any stationary measure must possess.
Lemma B.1. For any stationary measure all energy moments are finite. In fact for any $p\ge 1$ there exists a constant $C_p < \infty$ such that
\[ \int_{L^2} |u|_{L^2}^{2p}\,d\mu(u) < C_p \]
for all stationary measures $\mu$. In particular $C_1 = \frac{E_0}{2\nu}$.
Proof. We will consider the case when $p = 1$. The other cases follow by the same method. For any $\epsilon > 0$ there exists a $b_\epsilon$ such that $\mu\{u\in L^2 : |u|_{L^2}^{2} \le b_\epsilon\} > 1 - \epsilon$. Let $B_\epsilon$ denote $\{u\in L^2 : |u|_{L^2}^{2} \le b_\epsilon\}$. For any $H > 0$ and $t > 0$, we have
\[ \int_{L^2} \big(|u|_{L^2}^{2}\wedge H\big)\,d\mu(u) = \int_{L^2} E\big(|\varphi^\omega_{0,t}u|_{L^2}^{2}\wedge H\big)\,d\mu(u) \le H\epsilon + \int_{B_\epsilon} E\big(|\varphi^\omega_{0,t}u|_{L^2}^{2}\wedge H\big)\,d\mu(u) \le H\epsilon + \int_{B_\epsilon} E|\varphi^\omega_{0,t}u|_{L^2}^{2}\,d\mu(u). \]
Applying the first bound in Corollary A.2 gives
\[ \int_{L^2} \big(|u|_{L^2}^{2}\wedge H\big)\,d\mu(u) \le H\epsilon + \frac{E_0}{2\nu} + e^{-2\nu t}\Big(b_\epsilon - \frac{E_0}{2\nu}\Big). \]
Taking the limit as $t\to\infty$ and then observing that $\epsilon$ was arbitrary, we obtain
\[ \int_{L^2} \big(|u|_{L^2}^{2}\wedge H\big)\,d\mu(u) = \int_{U} \big(|u|_{L^2}^{2}\wedge H\big)\,d\mu(u) \le \frac{E_0}{2\nu}. \]
Taking $H\to\infty$ gives that the energy of any stationary measure is bounded by $\frac{E_0}{2\nu}$. The argument for higher moments of the energy is the same.
Lemma B.2. For any stationary measure $\mu$,
\[ \int_{L^2} |\Lambda u|_{L^2}^{2}\,d\mu(u) = \frac{E_0}{2\nu}. \]
In addition if the forcing is such that $E_1 < \infty$ then
\[ \int_{L^2} |\Lambda^2 u|_{L^2}^{2}\,d\mu(u) = \frac{E_1}{2\nu} \]
and
\[ \int_{L^2} |\Lambda u|_{L^2}^{2p}\,d\mu(u) < C_1(p) < \infty \]
for all $p\ge 1$.
Proof. Using Eq. (27), we have that for any initial condition $u_0\in L^2$,
\[ E|\varphi_{0,t}u_0|_{L^2}^{2} + 2\nu\int_0^t E|\Lambda\varphi_{0,s}u_0|_{L^2}^{2}\,ds = |u_0|_{L^2}^{2} + E_0 t. \]
Here we have switched the time integral and the expectation by the Fubini–Tonelli theorem because the integrand is non-negative. We know from Lemma B.1 that any stationary measure has finite energy moments. Hence averaging with respect to the stationary measure gives
\[ \int_{L^2} E|\varphi_{0,t}u_0|_{L^2}^{2}\,d\mu(u_0) + 2\nu\int_{L^2}\int_0^t E|\Lambda\varphi_{0,s}u_0|_{L^2}^{2}\,ds\,d\mu(u_0) = \int_{L^2} |u_0|_{L^2}^{2}\,d\mu(u_0) + E_0 t. \]
Because $\mu$ was a stationary measure, we have that
\[ \int_{L^2} E|\varphi_{0,t}u_0|_{L^2}^{2}\,d\mu(u_0) = \int_{L^2} |u_0|_{L^2}^{2}\,d\mu(u_0) \]
and
\[ \int_{L^2}\int_0^t E|\Lambda\varphi_{0,s}u_0|_{L^2}^{2}\,ds\,d\mu(u_0) = t\int_{L^2} |\Lambda u_0|_{L^2}^{2}\,d\mu(u_0). \]
Hence $2\nu\int_{L^2} |\Lambda u_0|_{L^2}^{2}\,d\mu(u_0) = E_0$, concluding the proof of the first claim. We now turn to the enstrophy moments. By the first part of this lemma, we know that there exists a $U\subset H^1$ such that $\mu(U) = 1$. We now can proceed just as in Lemma B.1 to prove that all of the enstrophy moments are finite. To find the expected value of the $H^2$ norm we use Eq. (28). Then we proceed exactly as we did to obtain the expected value of the enstrophy (the $H^1$ norm).
Lemma B.3. Let $\mu_p$ be the measure induced on $C((-\infty,0], L^2_\ell)$ by any given stationary measure $\mu$. Fix any $K_0 > 0$ and $\delta > \frac12$. Then for $\mu_p$-almost every trajectory $v(s)$ in $C((-\infty,0], L^2_\ell)$, there exists a constant $T$ such that for $s\le 0$,
\[ |v(s)|_{L^2}^{2} \le E_0 + K_0\min(T, |s|)^\delta. \]
Proof. The basic energy estimate, derived from (29), reads:
\[ |v(t)|_{L^2}^{2} = |v(t_0)|_{L^2}^{2} + E_0(t - t_0) - 2\nu\int_{t_0}^{t} |\Lambda v(s)|_{L^2}^{2}\,ds + \int_{t_0}^{t} \langle v(s), dW(s)\rangle_{L^2}, \]
for any $t_0 < t\le 0$. There is no problem writing the integration against the Wiener path in the above integral. Our stochastic PDE had pathwise defined solutions. Therefore if we know the initial condition $v(t_0)$ and the trajectory of $v(s)$ for $s\in[t_0,t]$, the increments of the Wiener process on the interval $[t_0,t]$ are uniquely defined. For any $k\ge 1$, the above estimate implies
\[ \sup_{s\in[-k,-k+1]} |v(s)|_{L^2}^{2} \le |v(-k)|_{L^2}^{2} + E_0 + \sup_{s\in[-k,-k+1]} F_k(s), \]
where $F_k(s) = -2\nu\int_{-k}^{s} |\Lambda v(r)|_{L^2}^{2}\,dr + M_k(s)$ and $M_k(s) = \int_{-k}^{s} \langle v(r), dW(r)\rangle_{L^2}$. Now define
\[ A_k = \Big\{ v(s) : \sup_{s\in[-k,-k+1]} |v(s)|_{L^2}^{2} \le E_0 + K_0|k-1|^\delta \Big\} \]
and $U_T = \bigcap_{k>T} A_k$. Since the $U_T$ are an increasing collection of sets it will be sufficient to prove that $\lim_{T\to\infty}\mu_p(U_T) = 1$. This is the same as showing that $\lim_{T\to\infty}\mu_p(U_T^c) = 0$. Now since $\mu_p(U_T^c) \le \sum_{k>T}\mu_p(A_k^c)$, we need only show that $\sum_{k>0}\mu_p(A_k^c) < \infty$:
\[ \mu_p(A_k^c) \le \mu_p\Big\{ v(s) : |v(-k)|_{L^2}^{2} \ge \frac{K_0}{2}|k-1|^\delta \Big\} + \mu_p\Big\{ v(s) : \sup_{s\in[-k,-k+1]} F_k(s) \ge \frac{K_0}{2}|k-1|^\delta \Big\}. \]
The first term is the most straightforward. Lemma B.2 implies that the second moment of the energy is uniformly bounded by some constant $C_2$. Hence Chebyshev’s inequality produces
\[ \mu_p\Big\{ v(s) : |v(-k)|_{L^2}^{2} \ge \frac{K_0}{2}|k-1|^\delta \Big\} \le \frac{4\,E|v(-k)|_{L^2}^{4}}{K_0^2|k-1|^{2\delta}} \le \frac{4C_2}{K_0^2|k-1|^{2\delta}}, \]
which is summable as long as $\delta > \frac12$. The second term proceeds in the same way but with Chebyshev’s inequality replaced by the exponential martingale estimate. The exponential martingale inequality controls the size of a martingale minus something proportional to its quadratic variation (see [RY94, Mao97] for example). The details are given in the following.
The key observation is that we can control $F_k(s)$ by controlling $M_k(s) - \alpha[M_k, M_k](s)$, where $[M_k, M_k](s)$ is the quadratic variation of the martingale $M_k(s)$ and $\alpha$ is a constant we will choose presently. First notice that with probability one,
\[ [M_k, M_k](s) = \sum_l |\sigma_l|^2\int_{-k}^{s} |v_l(r)|^2\,dr \le \sigma_{\max}^2\int_{-k}^{s} |v(r)|_{L^2}^{2}\,dr \le \sigma_{\max}^2\int_{-k}^{s} |\Lambda v(r)|_{L^2}^{2}\,dr \]
and hence
\[ F_k(s) \le M_k(s) - \frac{2\nu}{\sigma_{\max}^2}[M_k, M_k](s) \]
almost surely. In this setting, the exponential martingale inequality states that for positive $\alpha$ and $\beta$,
\[ P\Big\{ \sup_{s\in[-k,0]}\Big( M_k(s) - \frac{\alpha}{2}[M_k, M_k](s) \Big) > \beta \Big\} \le e^{-\alpha\beta}. \]
Taking $\alpha = \frac{4\nu}{\sigma_{\max}^2}$ we find
\[ \mu_p\Big\{ v(s) : \sup_{s\in[-k,-k+1]} F_k(s) \ge \frac{K_0}{2}|k-1|^\delta \Big\} \le \exp\Big( -\frac{2\nu K_0}{\sigma_{\max}^2}|k-1|^\delta \Big). \]
Since this is summable for any $\delta > 0$, the proof is complete.
C. Control of High Modes by Low Modes
Lemma C.1. If $h(t)$ is the solution to (12) with some low mode forcing $\ell\in C([0,t], L^2_\ell)$, then $\sup_{s\in[0,t]} |h(s)|_{L^2}$ is bounded by a constant depending on $|h(0)|_{L^2}$ and $\int_0^t |\ell|_{L^2}^{4}\,ds$.
Proof. Taking the inner product of (12) with $h$ produces
\[ \frac12\frac{d}{dt}|h(t)|_{L^2}^{2} = -\nu|\Lambda h|_{L^2}^{2} + \langle P_h B(h,\ell), h\rangle_{L^2} + \langle P_h B(\ell,\ell), h\rangle_{L^2} \]
because $\langle P_h B(\ell,h), h\rangle_{L^2} = \langle P_h B(h,h), h\rangle_{L^2} = 0$. Next using Lemma A.3 produces
\[ \frac12\frac{d}{dt}|h(t)|_{L^2}^{2} \le -\nu|\Lambda h|_{L^2}^{2} + C|h|_{L^2}|\Lambda h|_{L^2}|\Lambda\ell|_{L^2} + C|\Lambda h|_{L^2}|\Lambda\ell|_{L^2}^{2} \le \frac{C}{2\nu}|h|_{L^2}^{2}|\Lambda\ell|_{L^2}^{2} + \frac{C}{2\nu}|\Lambda\ell|_{L^2}^{4}. \]
Since $\ell\in L^2_\ell$ we have $|\Lambda\ell|_{L^2} \le (N^+)|\ell|_{L^2}$, where $N^+ = \sup\{|k| : \exists\, e_k \text{ with } e_k\in L^2_\ell\}$, and hence after applying Gronwall’s Lemma we have
\[ |h(t)|_{L^2}^{2} \le C_1|h(0)|_{L^2}^{2}\exp\Big( a_1\int_0^t |\ell|_{L^2}^{2}\,ds \Big) + C_2\int_0^t |\ell|_{L^2}^{4}\,ds\ \exp\Big( a_1\int_0^t |\ell|_{L^2}^{2}\,ds \Big). \]
Since by the Hölder inequality,
\[ \int_0^t |\ell|_{L^2}^{2}\,ds \le \Big( t\int_0^t |\ell|_{L^2}^{4}\,ds \Big)^{1/2}, \]
the proof is complete.
Acknowledgements. The authors would like to thank Gérard Ben Arous, Amir Dembo, Persi Diaconis, Yitzhak Katznelson, Di Liu, George Papanicolaou and Andrew Stuart for useful discussions. The work of the first author is partially supported by a Presidential Faculty Fellowship from the NSF. The work of the second author is partially supported by NSF grant DMS-9971087. The work of the third author is partially supported by NSF grant DMS-9706794 and RFFI grant 99-01-00314.
References
[BKL] Bricmont, J., Kupiainen, A., and Lefevere, R.: Preprint
[CDF97] Crauel, H., Debussche, A., and Flandoli, F.: Random attractors. J. Dynam. Diff. Eqs. 9, no. 2, 307–341 (1997)
[CF88] Constantin, P. and Foiaş, C.: Navier–Stokes equations. Chicago: University of Chicago Press, 1988
[EMatt] E, W. and Mattingly, J.C.: Ergodicity for the Navier–Stokes Equation with Degenerate Random Forcing: Finite Dimensional Approximation. Submitted
[EM] Eckmann, J.P., and Hairer, M.: Uniqueness of the invariant measure for a stochastic PDE driven by degenerate noise. Preprint
[EFNT94] Eden, A., Foias, C., Nicolaenko, B., and Temam, R.: Exponential attractors for dissipative evolution equations. Research in Applied Mathematics, New York: John Wiley and Sons and Masson, 1994
[Fer97] Ferrario, B.: Ergodic results for stochastic Navier–Stokes equation. Stochastics and Stochastics Rep. 60, no. 3–4, 271–288 (1997)
[Fla94] Flandoli, F.: Dissipativity and invariant measures for stochastic Navier–Stokes equations. NoDEA 1, 403–426 (1994)
[FM95] Flandoli, F. and Maslowski, B.: Ergodicity of the 2-D Navier–Stokes equation under random perturbations. Commun. Math. Phys. 171, 119–141 (1995)
[FMRT] Foias, C., Manley, O., Rosa, R., Temam, R.: Navier–Stokes Equations and Turbulence. To be published
[IN] Ito, K., Nisio, M.: On stationary solutions of a stochastic differential equation. J. Math. Kyoto Univ. 4, 1–75 (1964)
[KS] Kuksin, S. and Shirikyan, A.: Stochastic Dissipative PDE’s and Gibbs Measures. Commun. Math. Phys. 213, 291–330 (2000)
[Mao97] Mao, X.: Stochastic differential equations and their applications. Horwood Series in Mathematics & Applications, Chichester: Horwood Publishing Limited, 1997
[Mat98] Mattingly, J.C.: The stochastically forced Navier–Stokes equations: Energy estimates and phase space contraction. Ph.D. thesis, Princeton University, 1998
[Mat99] Mattingly, J.C.: Ergodicity of 2D Navier–Stokes equations with random forcing and large viscosity. Commun. Math. Phys. 206, no. 2, 273–288 (1999)
[Mat00] Mattingly, J.C.: Exponential convergence for the stochastically forced Navier–Stokes equations and other partially dissipative dynamics. Submitted
[RY94] Revuz, D. and Yor, M.: Continuous martingales and Brownian motion. Second ed., Grundlehren der Mathematischen Wissenschaften, Vol. 293, Berlin: Springer-Verlag, 1994
[Str82] Stroock, D.W.: Lectures on topics in stochastic differential equations. Bombay: Tata Institute of Fundamental Research, 1982, with notes by Satyajit Karmakar
[SV79] Stroock, D.W. and Varadhan, S.R.S.: Multidimensional diffusion processes. Berlin: Springer-Verlag, 1979
[VF88] Vishik, M. and Fursikov, A.: Mathematical problems of statistical hydrodynamics. Dordrecht: Kluwer Academic Publishers, 1988
Communicated by G. Gallavotti
Commun. Math. Phys. 224, 107 – 112 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Counting Phase Space Cells in Statistical Mechanics
Giovanni Gallavotti
Fisica, I.N.F.N., Università di Roma “La Sapienza”, P. le Moro 2, 00185 Roma, Italy. E-mail: [email protected]
Received: 16 November 2000 / Accepted: 22 April 2001
Work partially supported by IHES and Rutgers University
To Joel L. Lebowitz on his 70th birthday
Abstract: The problem of counting the number of phase space cells is analyzed with the purpose of interpreting the variational principle for the SRB statistics as an equidistribution property, in equilibrium as well as in nonequilibrium statistical mechanics.
1. Phase Space Cells When Volume is Not Conserved. Variational Properties of Stationary States
Consider a transitive Anosov map $S$ on a bounded surface $M$ (“phase space”) modeling, for instance, a simple gas of identical particles subject to nonconservative external forces and to thermostating forces balancing them in the average: a simple but important example that seems well modeled in this way can be found in [CELS93]. Such models would be “typical” for non equilibrium systems if one accepted the chaotic hypothesis, see [GC95a], and Ch. 9 in [Ga99]; for a general discussion see [Ru99].
The general theory of Anosov systems, see [Si68], implies the existence of a “statistics” $\mu_{SRB}$ describing the asymptotic behavior of almost all initial data in phase space (in the sense of the Liouville measure). This means that, except for a volume zero set of initial data $x$, it will be $\lim_{T\to\infty} T^{-1}\sum_{j=0}^{T-1} F(S^j x) = \int_M \mu_{SRB}(dy)F(y)$ for all continuous functions (“observables”) $F$ on $M$.
The SRB distribution admits a rather simple representation which can be interpreted in terms of “coarse graining” of the phase space, and it is convenient to introduce it at this point for later use. Let $\mathcal P$ be a “Markov partition” of phase space $\mathcal P = (P_1,\ldots,P_m)$ with sets $P_\sigma$, see (for instance) Ch. 9 in [Ga99]. Let $T$ be a time such that the size of the $E_{\sigma_{-T/2},\ldots,\sigma_{T/2}} \in \mathcal P_T \overset{\mathrm{def}}{=} \bigvee_{j=-T/2}^{T/2} S^{j}\mathcal P$, $\sigma_j = 1,2,\ldots,m$, is so small that the physically interesting observables can be viewed as a constant inside $E_{\sigma_{-T/2},\ldots,\sigma_{T/2}} = E(\vec\sigma)$. Then the SRB
probability $\mu(\vec\sigma)$ of $E(\vec\sigma)$ and the Liouville distribution are described in terms of the functions
\[ \lambda^1_u(x) = \log|\det(\partial S)_u(x)|, \qquad \lambda^1_s(x) = \log|\det(\partial S)_s(x)|, \tag{1} \]
where $(\partial S)_u(x)$ (resp. $(\partial S)_s(x)$) is the Jacobian of the evolution map $S$ restricted to the unstable (stable) manifold through $x$ and mapping it to the unstable (stable) manifold through $Sx$. Defining $U^{T/2}_{u,\pm}(x) = \sum_{j=0}^{\pm T/2}\lambda^1_u(S^j x)$ and $U^{T/2}_{s,\pm}(x) = \sum_{j=0}^{\pm T/2}\lambda^1_s(S^j x)$ and selecting a point $x(\vec\sigma)\in E(\vec\sigma)$ for each $\vec\sigma$, the SRB distribution and the volume distribution $\mu_L$, on the phase space $M$, which we suppose to have volume $W = V(M)$, attribute to the nonempty sets $E(\vec\sigma)$ the probabilities
\[ \mu(\vec\sigma) \overset{\mathrm{def}}{=} \mu_{SRB}(E(\vec\sigma)) = h^T_{u,u}(\vec\sigma)\cdot\exp\big( -U^{T/2}_{u,-}(x(\vec\sigma)) - U^{T/2}_{u,+}(x(\vec\sigma)) \big), \]
\[ \mu_L(\vec\sigma) \overset{\mathrm{def}}{=} V(E(\vec\sigma))/W = h^T_{s,u}(\vec\sigma)\cdot\exp\big( -U^{T/2}_{s,-}(x(\vec\sigma)) - U^{T/2}_{u,+}(x(\vec\sigma)) \big), \tag{2} \]
where $V(E)$ is the Liouville volume of $E$, and $h^T_{u,u}(\vec\sigma)$, $h^T_{s,u}(\vec\sigma)$ are suitable functions of $\vec\sigma$ uniformly bounded as $\vec\sigma$, $T$ vary, cf. Ch. 9 in [Ga99]. One can read (2) by saying that the “difference” between the Liouville volume and the SRB volume is that the first weighs asymmetrically the past and the future while the second weighs them symmetrically.
As mentioned above we have in mind that the sets $E(\vec\sigma)$ represent macroscopic states, being small enough so that the physically interesting observables have a constant value inside them; and we would like to think that they provide us with a model for a “coarse grained” description of the microscopic states.
The dynamics will, in general, be nonconservative: hence the phase space volume will generally contract under time evolution. We want to describe the time evolution in terms of evolution of microscopic states, with the aim of counting the microscopic states relevant for a given stationary state of the system, i.e. for the SRB distribution. Therefore we divide phase space, supposed of dimension $d$, into parallelepipedal cells $\Delta$ of size $\varepsilon^d \ll V(E(\vec\sigma))$ and try to discuss time evolution in terms of them.
This is a situation that arises in computer simulations, where the cells $\Delta$ are the computer points with coordinates given by a set of integers and the evolution $S$ is a program or code (simulating the solution of equations of motion suitable for the model under study) which operates exactly on the coordinates (i.e. imagining that the deterministic round offs are part of the program). It is clear, or at least it is a widely held belief, that the simulation will produce a chaotic evolution “for all practical purposes”, i.e. if we only look at “macroscopic observables” on the coarse graining scale $e^{-\lambda T}\ell_0$ of the partition $\mathcal P_T$, if $\ell_0$ is the phase space size¹, $\ell_0 = W^{1/d}$, and $-\lambda$ is the most contractive line element exponent, or even at finer observables corresponding to a finer coarse graining, which are constant on elements of the pavement $\mathcal P_{T'}$ for $T' > T$: provided the latter size is greater than the size $\varepsilon$ of the cells $\Delta$: $T < T'$ with $e^{-\lambda T'}\ell_0 \ge \varepsilon$. The question we ask on general grounds is, see also [Ga95]
¹ Here the phase space size $\ell_0$ should be thought of as measured in dimensionless units, i.e. in terms of the sizes $\delta p$, $\delta q$ in momentum and position of the cells $\Delta$. Assuming that we consider $N$ mass-$m$ particles in a gas at temperature $T$ and density $\rho$, so that $d = 6N$, then $W = \ell_0^{6N}$ with $\ell_0$ proportional to $(\rho^{-1/3}\sqrt{2mk_BT}/\delta p\,\delta q)^{1/2}$.
Question: Can we count the number of ways in which the asymptotic state of the system can be realized microscopically?
In equilibrium the (often) accepted answer is simple: the number is $N_0 = W/\varepsilon^d$, i.e. just the number of cells (“ergodic hypothesis”). This means that we think that our program will generate a one cycle permutation of the $N_0$ cells $\Delta$, each of which is therefore representative of the equilibrium state. Average values of macroscopic observables will be obtained simply as:
\[ \lim_{N\to\infty} N^{-1}\sum_{j=0}^{N-1} F(S^j x) = N_0^{-1}\sum_{\Delta} F(\Delta) = \int_M F(y)\,\mu_L(dy). \tag{3} \]
According to Boltzmann the quantity:
\[ S_B \overset{\mathrm{def}}{=} \log(W/\varepsilon^d) \tag{4} \]
is then, see [Bo77] (where however $w$’s denote integers rather than phase space volumes), proportional to the physical entropy of our equilibrium system.
Can one extend the above view to systems out of equilibrium? In such systems the volume will no longer be preserved by time evolution and, in fact, its contraction rate
\[ \eta(x) = -\log|\det\partial S(x)| \tag{5} \]
not only does not vanish but, in general, will have a positive time average $\eta_+$, $\eta_+ = \lim_{N\to\infty} N^{-1}\sum_{j=0}^{N-1}\eta(S^j x) = \int_M \eta(y)\,\mu_{SRB}(dy)$, see [Ru96]. If $\eta_+ > 0$ the volume will contract indefinitely (hence the system is called dissipative).
Out of equilibrium we may imagine that a similar kind of “ergodicity” holds: namely that the cells that represent the stationary state form a subset of all the cells, on which evolution acts as a one cycle permutation. If so the statistical properties of motions will be determined by the equidistribution among such cells, which thus attributes probabilities $\rho(\Delta)$ which maximize the quantity $-\sum_\Delta \rho(\Delta)\log\rho(\Delta)$. Hence the above counting question can be related to a problem
... which necessarily follows from Boltzmann’s train of thought, [and] has remained untouched. Consider an irreversible process which, with fixed outside constraints, is passing by itself from the nonstationary to the stationary state. Can we characterize in any sense the resulting distribution of state as the “relatively most probable distribution”, and can this be given in terms of the minimum of a function which can be regarded as the generalization..., [EE11], footnote 239, p. 103.
Before proceeding it is convenient to note a nontrivial relation between $\eta$ and $\lambda^1_u$, $\lambda^1_s$ valid for all $T > 0$,
\[ \sum_{j=-T/2}^{T/2}\eta(S^j x) = \sum_{\alpha=\pm}\big( U^{T/2}_{u,\alpha}(x) + U^{T/2}_{s,\alpha}(x) \big) + O(1), \tag{6} \]
see Eq. (1), with the error $O(1)$ being uniformly bounded in $T$ and $x$: this is a property which is obtained in proving (2), see [Si68] and [Ga99], Chap. 9.
Considering simulations of a dissipative system we must recognize that no code can be an invertible code: it must happen (many times) that $S\Delta = S\Delta'$ with $\Delta \ne \Delta'$. Clearly
if $S\Delta = S\Delta' = \tilde\Delta$ we can think that both $\Delta$ and $\Delta'$ are not really different and only one of the two can be taken as a representative of the microscopic state. We can imagine “pruning” one after the other the “unnecessary” cells until the map $S$ becomes invertible. More formally each cell $\Delta$ will have a motion that is eventually periodic and we discard as “transients” all cells whose evolution is not strictly periodic. The remaining cells will form a discrete model of the attractor.
The above question becomes now a precise one: which is the number $N$ of leftover cells? It will be only a fraction of the initial number $N_0$ of cells: and we can attempt to estimate it assuming that the evolution is a one cycle permutation of them. The number $N(\vec\sigma)$ of cells leftover in $E(\vec\sigma)$ will have to be proportional to the SRB probability $\mu(\vec\sigma)$ of $E(\vec\sigma)$, otherwise the time average of the observables (i.e. the SRB average introduced above) will not be correctly given by the sum over the cells.
The just described pruning process will have to leave $N \le N_0$ cells; and furthermore inside each “coarse grain” set $E(\vec\sigma)$ a number of cells equal to
\[ N(\vec\sigma) = N\,\mu_{SRB}(E(\vec\sigma)). \tag{7} \]
If $V(\vec\sigma)$ is the volume of $E(\vec\sigma)$, so that $\sum_{\vec\sigma} V(\vec\sigma) = W$, it must be:
\[ V(\vec\sigma)/\varepsilon^d \ge N(\vec\sigma) = N\,\mu_{SRB}(E(\vec\sigma)) \tag{8} \]
for all $\vec\sigma$’s. This gives, using that $W = \varepsilon^d N_0$:
\[ N \le N_0\,\min_{\vec\sigma}\frac{V(\vec\sigma)/\varepsilon^d}{N_0\,\mu_{SRB}(E(\vec\sigma))}. \tag{9} \]
The quantity $\eta = \max_{\vec\sigma}\frac{2}{T}\big( U^{T/2}_{s,-}(x(\vec\sigma)) - U^{T/2}_{u,-}(x(\vec\sigma)) \big)$ differs by $O(T^{-1})$ from the maximal average phase space contraction $\max_{x\in\text{attractor}} -\frac{2}{T}\log|\det\partial S^{T/2}(x)|$, and Eqs. (2), (6), give
\[ N \le N_0\, e^{-\frac12 T\eta + O(1)}, \tag{10} \]
where the $O(1)$ is uniform in $T$, and $\eta$ can be identified with the infinite time average $\eta_+$ of the phase space contraction rate $-\log|\det\partial S(x)|$.
The picture must hold for all Markovian pavements $\mathcal P$ and for all $T$’s such that $e^{-\lambda T}\delta > \varepsilon$ if $\delta \sim \ell_0$ is the typical size of an element of the partition $\mathcal P$: this restricts $T$ to be of the order of $\bar T = \lambda^{-1}\log(\ell_0/\varepsilon)$. And, as in equilibrium, once this requirement is fulfilled we shall think that $N$ has the maximal allowed value, i.e. that in (10) the inequality is saturated for $T = \bar T$. This is a kind of “ergodicity” assumption which is similar to the corresponding assumption that in equilibrium all cells are actually visited (while assuming that only a fraction of them is visited would give the same statistics as long as the fraction is taken to be the same in each coarse grain volume, but a different cell count hence a different entropy assignment).
We call $-\lambda^-_i$, $\lambda^+_i$ the Lyapunov exponents, $\lambda^\alpha_i > 0$, $i = 1, 2, \ldots, d/2$ (equal in number by the transitivity assumption), so that $\lambda \le \min_i\lambda^-_i$, $d\lambda \ge \eta \ge \eta_+ = \sum_i(\lambda^-_i - \lambda^+_i)$, and define:
\[ S_{\mathrm{cells}} = \log N = \log N_0 - \frac12\frac{\eta}{\lambda}\log\frac{\ell_0}{\varepsilon} = (\log N_0)\Big( 1 - \frac{1}{2d}\frac{\eta}{\lambda} \Big). \tag{11} \]
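The two forms of the cell count in Eq. (11) agree because $\log N_0 = d\log(\ell_0/\varepsilon)$ when $W = \ell_0^d$ and $N_0 = W/\varepsilon^d$. The short Python sketch below evaluates both forms; the function names and the numerical values of the phase space size, cell size, dimension and rates are hypothetical and only illustrate the arithmetic.

```python
import math


def log_cell_count_equilibrium(ell0, eps, d):
    """log N0 = d * log(ell0 / eps): every cell counts (Boltzmann's S_B)."""
    return d * math.log(ell0 / eps)


def cell_entropy(ell0, eps, d, eta, lam):
    """First form of Eq. (11): log N0 - (eta / (2 * lam)) * log(ell0 / eps)."""
    log_n0 = log_cell_count_equilibrium(ell0, eps, d)
    return log_n0 - 0.5 * (eta / lam) * math.log(ell0 / eps)


def cell_entropy_factored(ell0, eps, d, eta, lam):
    """Second form of Eq. (11): (log N0) * (1 - eta / (2 * d * lam))."""
    log_n0 = log_cell_count_equilibrium(ell0, eps, d)
    return log_n0 * (1.0 - eta / (2.0 * d * lam))


# Illustrative values (assumptions): size 1, cell size 1e-3, d = 4.
s_dissipative = cell_entropy(1.0, 1e-3, 4, 0.5, 1.0)
s_equilibrium = cell_entropy(1.0, 1e-3, 4, 0.0, 1.0)
```

With a vanishing contraction rate the count reduces to the equilibrium value $\log N_0$, and since $\eta \le d\lambda$ the correction factor never falls below one half.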
This will depend on ε and, unlike the equilibrium case when η = 0, nontrivially so because η/λ is a dynamical quantity and changing (i.e. our representation of the microscopic motion) ε will change Scells as Scells /Scells = | log ε /ε|. Given a precision ε of the observations, the quantity Scells measure, how many “non transient” phase space cells must be used to obtain a faithful representation of the attractor and of its statistical properties on scale ε. Here by “faithful” on scale ε we mean that all observables which are constant on such a scale will show the correct statistical properties, i.e. that cells of size larger than ε will be visited with the correct SRB frequency. Since the quantity η/λ is bounded by 1 we see that dissipation does not “simplify” much the motion. Note, however, that we also assume that the system isAnosov transitive: which implies that the attactor is dense; so that the small reduction due to the dissipation, estimated above, holds only as long as this is a correct assumption: at high forcing the attractor is likely (i.e. examples abound) to be no longer dense on phase space and the number N0 will have to be replaced by the smaller power of 0 , affecting correspondingly the analysis leading to (11). One can ask how many phase space cells are required for a faithful representation of the dynamics by a permutation of cells if one just asks faithfulness to hold only for “most” observations on scale ε or higher: depending on the meaning attributed to “most” we can expect that η/λ is replaced by other similar quantities (i.e.g. by some averages of η+ and of Lyapunov exponents, respectively). Since we are not interested in all observables but only in very few ones, it might be interesting to attempt concrete estimates in this more general sense. 2. Remarks (1) Although Eq. 
(11) gives the cell count it does not seem to deserve to be taken as a definition of entropy also for systems out of equilibrium, not even for systems simple enough to admit a transitive Anosov map as a model for their evolution. It seems a notion distinct from what has become known as the “Boltzmann entropy”, [Le93], see also [EE22]. The notion is also different from the Gibbs’entropy, to which it is equivalent only in equilibrium systems: in nonequilibrium (dissipative) systems the latter can only be defined as −∞ and perpetually decreasing; because in such systems one can define the rate at which (Gibbs’) entropy is “created” or “ceded to the thermostats” by the system to be η+ , i.e. to be the average phase space contraction η+ , see [An82, Ru99]. (2) We also see, from the above analysis, that the variational principle that determines the SRB distribution can be identified with the one that leads to equal probability of the phase space cells. The SRB distribution appears to be the equal probability distribution among the N cells which are not transient. In equilibrium all cells are non transient and the SRB distribution coincides with the Liouville distribution. (3) If we could take T → ∞ (hence, correspondingly, ε → 0) then the distribution µ which is uniform inside each E( σ ) but which attributes a total weight to E( σ ) equal to N ( σ ) = µSRB (E( σ ))N would become the exact SRB distribution. However it seems conceptually more satisfactory, imitating Boltzmann, to suppose that ε is very small but > 0 so that T will be large but not infinite. (4) A deeper understanding of the above analysis appears to be linked to an important question raised by Ruelle asking whether (and how) one could possibly relate an entropy notion to the logarithm of the Hausdorff measure of the attractor: and a pertinent possibility is that the Hausdorff measure on the attractor is absolutely continuous with
respect to the SRB measure. The above analysis in terms of cells is reminiscent, in fact, of the methods to study Hausdorff dimension, Hausdorff measure and Pesin’s formula in general hyperbolic systems, [Yo94]. Acknowledgements. I am grateful to E. Speer for pointing out an inconsistency in a preliminary version of this work and to F. Bonetto for very stimulating and clarifying discussions. The references to [EE11, EE33] and their relevance were pointed out to me by E.G.D. Cohen.
References

[An82] Andrej, L.: The rate of entropy change in non-Hamiltonian systems. Phys. Lett. A 111, 45–46 (1982)
[Bo77] Boltzmann, L.: Über die Beziehung zwischen dem zweiten Hauptsatze der mechanischen Wärmetheorie und der Wahrscheinlichkeitsrechnung, respektive den Sätzen über das Wärmegleichgewicht. In: Wissenschaftliche Abhandlungen, Vol. II, ed. F. Hasenöhrl. New York: Chelsea, 1968, reprint, pp. 164–223
[CELS93] Chernov, N.I., Eyink, G.L., Lebowitz, J.L., Sinai, Y.: Steady state electric conductivity in the periodic Lorentz gas. Commun. Math. Phys. 154, 569–601 (1993)
[EE11] Ehrenfest, P., Ehrenfest, T.: The conceptual foundations of the statistical approach in Mechanics. New York: Dover, 1990, reprint
[EE22] Einstein, E.: Zur Theorie des Radiometers. Annalen der Physik 69, 241–254 (1922). And: Epstein, P.S.: On the resistance experienced by spheres in their motion through gases. Physical Review 23, 710–733 (1924). See also Epstein, P.S.: Theory of the radiometer. Zeitschrift für Physik 54, 537–563 (1929)
[Ga95] Gallavotti, G.: Ergodicity, ensembles, irreversibility in Boltzmann and beyond. J. Stat. Phys. 78, 1571–1589 (1995)
[GC95a] Gallavotti, G., Cohen, E.G.D.: Dynamical ensembles in nonequilibrium statistical mechanics. Phys. Rev. Lett. 74, 2694–2697 (1995); Dynamical ensembles in stationary states. J. Stat. Phys. 80, 931–970 (1995)
[Ga99] Gallavotti, G.: Statistical mechanics. A short treatise. Berlin–Heidelberg–New York: Springer Verlag, 1999, pp. 1–345
[Le93] Lebowitz, J.L.: Boltzmann’s entropy and time’s arrow. Phys. Today, 32–38 (1993)
[Ru68] Ruelle, D.: Statistical mechanics of one-dimensional lattice gas. Commun. Math. Phys. 9, 267–278 (1968)
[Ru96] Ruelle, D.: Positivity of entropy production in non equilibrium statistical mechanics. J. Stat. Phys. 85, 1–25 (1996); Entropy production in nonequilibrium statistical mechanics. Commun. Math. Phys. 189, 365–371 (1997)
[Ru99] Ruelle, D.: Smooth dynamics and new theoretical ideas in non-equilibrium statistical mechanics. J. Stat. Phys. 95, 393–468 (1999)
[Si68] Sinai, Y.G.: Markov partitions and C-diffeomorphisms. Funct. Anal. and Appl. 2, no. 1, 64–89 (1968); Construction of Markov partitions. Funct. Anal. and Appl. 2, no. 2, 70–80 (1968). See also Gibbs measures in ergodic theory, Russ. Math. Surv. 27, 21–69 (1972) and Lectures in ergodic theory, Lecture notes in Mathematics, Princeton, NJ: Princeton University Press, 1977
[Yo94] Young, L.S.: Ergodic theory of differentiable dynamical systems. In: Real and complex dynamical systems, ed. B. Branner, P. Hjorth, Nato ASI series, Dordrecht: Kluwer, 1995; Ergodic theory of chaotic dynamical systems. In: Mathematical Physics XII (M ∩ 5 Conference Proceedings), editors D. de Witt, A.J.B. Bracken, M.D. Gould, P.A. Pearce, Cambridge, MA: International Press, 1999

Communicated by Ya. G. Sinai
Commun. Math. Phys. 224, 113 – 132 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Adiabatic Decoupling and Time-Dependent Born–Oppenheimer Theory

Herbert Spohn, Stefan Teufel

Zentrum Mathematik and Physik Department, Technische Universität München, 80290 München, Germany. E-mail: [email protected]; [email protected]

Received: 10 July 2000 / Accepted: 30 July 2001
Dedicated to Joel Lebowitz on the occasion of his 70th birthday

Abstract: We reconsider the time-dependent Born–Oppenheimer theory with the goal to carefully separate between the adiabatic decoupling of a given group of energy bands from their orthogonal subspace and the semiclassics within the energy bands. Band crossings are allowed and our results are local in the sense that they hold up to the first time when a band crossing is encountered. The adiabatic decoupling leads to an effective Schrödinger equation for the nuclei, including contributions from the Berry connection.

1. Introduction

Molecules consist of light electrons, mass $m_e$, and heavy nuclei, mass $M$ which depends on the type of nucleus. Born and Oppenheimer [3] wanted to explain some general features of molecular spectra and realized that, since the ratio $m_e/M$ is small, it could be used as an expansion parameter for the energy levels of the molecular Hamiltonian. The time-independent Born–Oppenheimer theory has been put on firm mathematical grounds by Combes, Duclos, and Seiler [5], Hagedorn [8], and more recently in [16].

With the development of tailored state preparation and ultra precise time resolution there is a growing interest in understanding and controlling the dynamics of molecules, which requires an analysis of the solutions to the time-dependent Schrödinger equation, again exploiting that $m_e/M$ is small. The molecular Hamiltonian is of the form
\[
H = \frac{\hbar^2}{2m_e}\bigl(-i\nabla_x - A_{\rm ext}(x)\bigr)^2 + \frac{\hbar^2}{2M}\bigl(-i\nabla_X + A_{\rm ext}(X)\bigr)^2 + V_e(x) + V_{en}(X,x) + V_n(X). \tag{1}
\]
Fig. 1. The schematic spectrum of $H_e(R)$ for a diatomic molecule as a function of the separation $R$ of the two nuclei
[Figure 1: vertical axis $\sigma(H_e(R))$, horizontal axis $R$ with $R_0$ marked; bands $E_1(R)$, $E_2(R)$, $E_3(R)$.]

For notational simplicity we ignore spin degrees of freedom and assume that all nuclei have the same mass. We have $k$ electrons with positions $\{x_1, \ldots, x_k\} = x$ and $l$ nuclei with positions $\{X_1, \ldots, X_l\} = X$. The first and second term of $H$ are the kinetic energies of the electrons and of the nuclei, respectively. An external magnetic field is included through the vector potential $A_{\rm ext}$. Electrons and nuclei interact via the static Coulomb potential. Therefore $V_e$ is the electronic, $V_n$ the nucleonic repulsion, and $V_{en}$ the attraction between electrons and nuclei. $V_e$ and $V_n$ may also contain an external electrostatic potential. In atomic units ($m_e = \hbar = 1$) the Hamiltonian (1) can be written more concisely as H =
\[
\frac{m_e}{M}\,\frac{1}{2}\bigl(-i\nabla_X + A_{\rm ext}(X)\bigr)^2 + H_e(X), \tag{2}
\]
emphasizing that the nuclear kinetic energy will be treated as a “small perturbation”. He (X) is the electronic Hamiltonian for a given position X of the nuclei, He (X) =
\[
\frac{1}{2}\bigl(-i\nabla_x - A_{\rm ext}(x)\bigr)^2 + V_e(x) + V_{en}(X,x) + V_n(X). \tag{3}
\]
He (X) is a self-adjoint operator on the electronic Hilbert space L2 (R3k ) restricted to its antisymmetric subspace. Later on we will need some smoothness of He (X), which can be established easily if the electrons are treated as point-like and the nuclei have an extended, rigid charge distribution. Generically He (X) has, possibly degenerate, eigenvalues E1 (X) < E2 (X) < . . . which terminate at the continuum edge (X). Thereby one obtains the band structure as plotted schematically in Fig. 1. The discrete bands Ej (X) may cross and possibly merge into the continuous spectrum as indicated in Fig. 2. Comparing kinetic energies, we find for the speeds |vn | ≈ (me /M)1/2 |ve |, which means that on the atomic scale the nuclei move very slowly. If we regard X(t) as a given nucleonic trajectory, then He (X(t)) is a Hamiltonian with slow time variation and the time-adiabatic theorem [15, 14, 1] can be applied [2]. For us X are quantum mechanical degrees of freedom. The Hamiltonian H of (2) is time-independent and we can only exploit that the nucleonic Laplacian carries a small prefactor. To distinguish, we refer to our situation as space-adiabatic. Since the nuclei move very slowly, their dynamics must be followed over sufficiently long times. From the speed ratio we conclude that
these times are of order $(m_e/M)^{-1/2}$ in atomic units. To simplify notation we define
\[
\varepsilon = \sqrt{\frac{m_e}{M}} \tag{4}
\]
as the small dimensionless parameter. Then
\[
H^\varepsilon = \frac{\varepsilon^2}{2}\bigl(-i\nabla_X + A_{\rm ext}(X)\bigr)^2 + H_e(X), \tag{5}
\]
and we want to study the solutions of the time-dependent Schrödinger equation
\[
i\varepsilon\,\frac{\partial\psi}{\partial t} = H^\varepsilon\psi \tag{6}
\]
in the limit of small ε. The crude physical picture underlying the analysis of (6) is that the nuclei behave semiclassically because of their large mass and that the electrons rapidly adjust to the slow nucleonic motion. Thus, in fact, the time-dependent Born–Oppenheimer approximation involves two limits. If the electrons are initially in the eigenstate $\chi_j(X_0)$ of the $j$th band with energy $E_j(X_0)$, where $X_0$ is the approximate initial configuration of the nuclei, then the $j$th band is adiabatically protected provided there is an energy gap separating it from the rest of the spectrum. Thus at later times, up to small error, the electronic wave function is still in the subspace corresponding to the $j$th band. But this implies that the nuclei are governed by the Born–Oppenheimer Hamiltonian
\[
H^\varepsilon_{\rm BO} = \frac{\varepsilon^2}{2}\bigl(-i\nabla_X + A_{\rm ext}(X)\bigr)^2 + E_j(X). \tag{7}
\]
Since $\varepsilon \ll 1$, $H^\varepsilon_{\rm BO}$ can be analyzed through semiclassical methods where to leading order the contributions come from the classical flow $\Phi^t$ corresponding to the classical Hamiltonian $H^{\rm cl}_{\rm BO} = \frac{1}{2}p^2 + E_j(q)$ on nucleonic phase space.

In general, $E_j(X)$ may touch another band as $X$ varies. To allow for such band crossings we introduce the region $\Lambda \subset \mathbb{R}^n$, $n = 3l$, in nucleonic configuration space, such that $E_j$ restricted to $\Lambda$ does not cross or touch any other energy band. The classical flow $\Phi^t$ then has $\Lambda \times \mathbb{R}^n$ as phase space and is defined only up to the time when it first hits the boundary $\partial\Lambda \times \mathbb{R}^n$. Up to that time (7) still correctly describes the quantum evolution. To follow the tunneling through a band crossing other methods have to be used [11, 7], in particular, the codimension of the crossing is of relevance.

The mathematical investigation of the time-dependent Born–Oppenheimer theory was initiated and carried out in great detail by Hagedorn. In his pioneering work [9] he constructs approximate solutions to (6) of the form $\phi_{q(t),p(t)} \otimes \chi_j(q(t))$, where $\phi_{q(t),p(t)}$ is a coherent state carried along the classical flow, $(q(t), p(t)) = \Phi^t(q_0, p_0)$. The difference to the true solution with the same initial condition is of order $\sqrt{\varepsilon}$ in the $L^2$-norm over times of order $\varepsilon^{-1}$ in atomic units and the approximation holds until the first hitting time of $\partial\Lambda \times \mathbb{R}^n$. In a recent work Hagedorn and Joye [10] construct solutions to (6) satisfying exponentially small error estimates.

In Hagedorn’s approach the “adiabatic and semiclassical limits are being taken simultaneously, and they are coupled [10]”. In our paper we carefully separate the space-adiabatic and the semiclassical limit. One immediate benefit is the generalization of the first order analysis of Hagedorn from coherent states to arbitrary wave functions.
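Fig. 2. The wave function can leave $\mathrm{Ran}\,P_*$ in two different ways: either by transitions to other bands (a) or through the boundary of $\Lambda$ (b)
[Figure 2: vertical axis $\sigma(H_e(X))$, horizontal axis $X$; the band $\sigma_*(X)$ over the region $\Lambda$ is labelled $\mathrm{Ran}\,P_*$, with arrows (a) and (b).]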
Let us explain our result for the space-adiabatic part in more detail. We assume that there is some region $\Lambda \subset \mathbb{R}^n$ in the nucleonic configuration space, such that some subset $\sigma_*(X)$ of $\sigma(H_e(X))$ is separated from the remainder of the spectrum by a gap for all $X \in \Lambda$, i.e. $\mathrm{dist}\bigl(\sigma_*(X), \sigma(H_e(X)) \setminus \sigma_*(X)\bigr) \ge d > 0$ for all $X \in \Lambda$. $\Lambda$ could be punctured by small balls (for $n = 2$) because of band crossings. $\Lambda$ could also terminate because the point spectrum merges in the continuum, which physically means that the molecule loses an electron through ionization. Let $P_*(X)$ be the spectral projection of $H_e(X)$ associated with $\sigma_*(X)$ and $P_* = \int^{\oplus}_{\Lambda} dX\, P_*(X)$. We will establish that the unitary time evolution $e^{-iH^\varepsilon t/\varepsilon}$ agrees on $\mathrm{Ran}\,P_*$ with the diagonal evolution $e^{-iH^\varepsilon_{\rm diag} t/\varepsilon}$ generated by $H^\varepsilon_{\rm diag} := P_* H^\varepsilon P_*$ up to errors of order ε as long as the leaking through the boundary of $\Lambda$ is sufficiently small.

To complete the analysis one has to control the flow of the wave function through $\partial\Lambda$. One possibility is to simply avoid the problem by assuming that $\Lambda = \mathbb{R}^n$, hence $\partial\Lambda = \emptyset$. We will refer to this case as a globally isolated band. Of course, the set $\{(X, y) \in \mathbb{R}^n \times \mathbb{R} : y \in \sigma_*(X)\}$ may contain arbitrary band crossings. As one of our main results, we prove that the subspace $\mathrm{Ran}\,P_*$ is adiabatically protected. In particular for the purpose of studying band crossings the full molecular Hamiltonian may be replaced by a simplified model with two bands only.

In general one has $\partial\Lambda \neq \emptyset$, to which we refer as a locally isolated band. To estimate the flow out of $\Lambda$ the only technique available seems to be semiclassical analysis. But this requires a control over the semiclassical evolution, for which one needs, at present, that $\{(X, y) \in \Lambda \times \mathbb{R} : y \in \sigma_*(X)\}$ contains no band crossings. Then $\{(X, y) \in \Lambda \times \mathbb{R} : y \in \sigma_*(X)\} = \cup_j \{(X, y) \in \Lambda \times \mathbb{R} : y = E_j(X)\}$ is the disjoint union of possibly degenerate energy bands $E_j(X)$. We will prove that each band separately is adiabatically protected.
In the special case where $\sigma_*(X) = E_j(X)$ is a nondegenerate eigenvalue for $X \in \Lambda$, $e^{-iH^\varepsilon_{\rm diag} t/\varepsilon}$ is well approximated through $e^{-iH^\varepsilon_{\rm BO} t/\varepsilon}$ on $L^2(\mathbb{R}^n)$. Since $H^\varepsilon_{\rm BO}$ is a standard semiclassical operator, one can easily control the $X$-support of the wave function and therefore prove a result for rather general $\Lambda \subset \mathbb{R}^n$, for the details see Theorem 2. Roughly speaking, it says that if $\phi_t$ is a solution of the effective Schrödinger equation for the nuclei
\[
i\varepsilon\,\frac{\partial\phi_t}{\partial t} = H^\varepsilon_{\rm BO}\,\phi_t, \tag{8}
\]
with $\mathrm{supp}\,\phi_0 \subset \Lambda$, then, modulo an error of order ε, $\psi_t := \phi_t(X)\chi_j(X, x)$ is a solution of the full Schrödinger equation (6) with initial condition $\psi_0(X, x) = \phi_0(X)\chi_j(X, x)$ as long as $\phi_t$ is supported in $\Lambda$ up to $L^2$-mass of order ε. This maximal time span can be computed using the classical flow $\Phi^t$.

As first observed by Mead and Truhlar [19], in general $H^\varepsilon_{\rm BO}$ acquires as a first order correction an additional vector potential $A_{\rm geo}(X) = -i\langle\chi_j(X), \nabla_X\chi_j(X)\rangle$ and (7) has to be replaced by
\[
H^\varepsilon_{\rm BO} = \frac{\varepsilon^2}{2}\bigl(-i\nabla_X + A_{\rm ext}(X) + A_{\rm geo}(X)\bigr)^2 + E_j(X). \tag{9}
\]
Multiplying $\chi_j(X)$ with a smooth $X$-dependent phase factor induces a gauge transformation for $A_{\rm geo}$, which implies that the physical predictions based on (9) do not change, as it should be. As noticed in [19], in general $A_{\rm geo}$ cannot be removed through a gauge transformation and (9) and (7) describe different physics. Berry realized that geometric phases appear whenever the Hamiltonian has slowly changing parameters. Therefore $A_{\rm geo}(X)$ is referred to as Berry connection, cf. [22] for an instructive collection of reprints. In fact, the motion of nuclei as governed by the Born–Oppenheimer Hamiltonian (9) is one of the paradigmatic examples for geometric phases.

If $\sigma_*(X) = E(X)$ is $k$-fold degenerate, not much of the above analysis changes. $H^\varepsilon_{\rm BO}$ becomes matrix-valued and acts on $L^2(\mathbb{R}^n)^{\oplus k}$, i.e.
\[
H^\varepsilon_{\rm BO} = \Bigl(\frac{\varepsilon^2}{2}\bigl(-i\nabla_X + A_{\rm ext}(X)\bigr)^2 + E_j(X)\Bigr)\mathbf{1}_{k\times k} + \frac{\varepsilon}{2}\Bigl((-i\varepsilon\nabla_X)\cdot A_{\rm geo}(X) + A_{\rm geo}(X)\cdot(-i\varepsilon\nabla_X)\Bigr).
\]
The connection $A_{\rm geo}(X)$ contains in general also off-diagonal terms and matrix-valued semiclassics must be applied. However, since the only nondiagonal term is in the subprincipal symbol, the leading order semiclassical analysis reduces to the scalar case and, in particular, agrees with the nondegenerate band case. We do not carry out the straightforward extension of Theorem 2 below to the degenerate band case, because the technicalities of matrix-valued semiclassics would obscure the simple ideas behind our analysis.
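The statement that the Berry connection cannot in general be gauged away can be illustrated numerically. The following is a minimal sketch, not taken from the paper: it assumes a hypothetical real two-band model $H(X) = X_1\sigma_x + X_2\sigma_z$ on $\mathbb{R}^2$ with a crossing at the origin, and evaluates the gauge-invariant discrete holonomy (the product of overlaps of neighbouring eigenvectors) around a closed loop. The function names are illustrative only.

```python
import numpy as np

def ground_state(x1, x2):
    """Lowest eigenvector of the (hypothetical) real model H(X) = [[x2, x1], [x1, -x2]]."""
    h = np.array([[x2, x1], [x1, -x2]], dtype=float)
    _, vecs = np.linalg.eigh(h)  # eigenvalues ascending, so column 0 is the lower band
    return vecs[:, 0]

def discrete_berry_phase(states):
    """Phase of the product of overlaps <chi_k, chi_{k+1}> around a closed loop.

    Each state enters once as a bra and once as a ket, so arbitrary phase
    (here: sign) choices made by the eigensolver cancel out.
    """
    prod = 1.0 + 0.0j
    n = len(states)
    for k in range(n):
        prod *= np.vdot(states[k], states[(k + 1) % n])
    return -np.angle(prod)

def loop_phase(center, radius, n=200):
    """Berry phase of the lower band along a circle in the (x1, x2) plane."""
    angles = 2.0 * np.pi * np.arange(n) / n
    states = [ground_state(center[0] + radius * np.cos(a),
                           center[1] + radius * np.sin(a)) for a in angles]
    return discrete_berry_phase(states)

if __name__ == "__main__":
    print("loop around the crossing :", loop_phase((0.0, 0.0), 1.0))
    print("loop away from crossing  :", loop_phase((3.0, 0.0), 1.0))
```

In this toy model the phase around the crossing has modulus π although the Hamiltonian is real, whereas a loop that does not encircle the crossing gives a trivial phase. A phase that could be removed by a smooth gauge transformation would vanish on every closed loop.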
In their recent work [18] Martinez and Sordoni independently study the time-dependent Born–Oppenheimer approximation as based on techniques developed by Nenciu and Sordoni [20]. They consider the case of a globally isolated band for a Hamiltonian of the form (1) with smooth V and Aext = 0. They succeed in proving the adiabatic decoupling to any order in ε for subspaces P∗ε which are ε-close to the unperturbed subspaces P∗ considered by us. With this result, in principle, higher order corrections to the effective Hamiltonian (7) could be computed.
The paper is organized as follows. Section 2 contains the precise formulation of the results. Section 3 gives a short discussion of the semiclassical limit of $H^\varepsilon_{\rm BO}$ and on how such results extend to the full molecular system. Proofs are provided in Sect. 4. In spirit they rely on techniques developed in [23] in the context of the semiclassical limit for dressed electron states. In practice the Born–Oppenheimer approximation requires several novel constructions, since the “perturbation” $-\frac{\varepsilon^2}{2}\Delta$ increases quadratically. Our results can be formulated and proved in a more general framework dealing with, possibly time-dependent, perturbations of fibered operators. Also the gap condition can be removed by using arguments similar to those developed by Avron and Elgart in [1]. The general operator theoretical results will appear elsewhere [24].

2. Main Results

The specific form (3) of the electronic part of the Hamiltonian will be of no importance in the following. Thus we only assume that
\[
H_e = \int^{\oplus}_{\mathbb{R}^n} dX\, H_e(X), \qquad H_e(X) = H_e^0 + H_e^1(X),
\]
where $H_e^0$ is self-adjoint on some dense domain $D \subset \mathcal{H}_e$ and bounded from below and $H_e^1(X) \in \mathcal{L}(\mathcal{H}_e)$ is a continuous family of self-adjoint operators, bounded uniformly for $X \in \mathbb{R}^n$. Thus $H_e$ is self-adjoint on $D(H_e) = L^2(\mathbb{R}^n) \otimes D \subset \mathcal{H} := L^2(\mathbb{R}^n) \otimes \mathcal{H}_e$ and bounded from below. For the definition of $L^2(\mathbb{R}^n) \otimes D$ we equip $D$ with the graph-norm $\|\cdot\|_{H_e^0}$, i.e., for $\psi \in D$, $\|\psi\|_{H_e^0} = \|H_e^0\psi\| + \|\psi\|$.

Let $A_{\rm ext} \in C_b^1(\mathbb{R}^n, \mathbb{R}^n)$, where for any open set $\Omega \subset \mathbb{R}^m$, $m \in \mathbb{N}$, $C_b^k(\Omega)$ denotes the set of functions $f \in C^k(\Omega)$ such that for each multi-index α with $|\alpha| \le k$ there exists a $C_\alpha < \infty$ with
\[
\sup_{x\in\Omega} |\partial^\alpha f(x)| \le C_\alpha.
\]
Then $\frac{\varepsilon^2}{2}\bigl(-i\nabla_X + A_{\rm ext}(X)\bigr)^2$ is self-adjoint on $W^2(\mathbb{R}^n)$, the second Sobolev space, since $-i\nabla_X$ is infinitesimally operator bounded with respect to $-\Delta_X$. It follows that
\[
H^\varepsilon = \frac{\varepsilon^2}{2}\bigl(-i\nabla_X + A_{\rm ext}(X)\bigr)^2 \otimes \mathbf{1} + H_e \tag{10}
\]
is self-adjoint on $D(H^\varepsilon) = \bigl(W^2(\mathbb{R}^n) \otimes \mathcal{H}_e\bigr) \cap D(H_e)$. For $X \in \Lambda$, $\Lambda \subset \mathbb{R}^n$ open, we require in addition some regularity for $H_e(X)$ as a function of $X$:

Condition $H_k$. $H_e^1(\cdot) \in C_b^k(\Lambda, \mathcal{L}(\mathcal{H}_e))$.

The exact value of $k$ will depend on whether $\Lambda = \mathbb{R}^n$ or $\Lambda \subset \mathbb{R}^n$. For the type of Hamiltonian considered in the introduction, cf. (1), all the above conditions including Condition $H_k$ are easily checked and put constraints only on the smoothness of the external potentials and on the smoothness and the decay of the charge distribution of the nuclei. For point nuclei $H_k$ fails and a suitable substitute would require a generalization of the Hunziker distortion method of [16]. We will be interested in subsets of $\{(X, s) \in \Lambda \times \mathbb{R} : s \in \sigma(H_e(X))\}$ which are isolated from the rest of the spectrum in the following sense.
Condition S. For $X \in \Lambda$, let $\sigma_*(X) \subset \sigma(H_e(X))$ be such that there are functions $f_\pm \in C_b(\Lambda, \mathbb{R})$ and a constant $d > 0$ with
\[
[f_-(X) + d,\, f_+(X) - d] \cap \sigma_*(X) = \sigma_*(X)
\]
and
\[
[f_-(X),\, f_+(X)] \cap \bigl(\sigma(H_e(X)) \setminus \sigma_*(X)\bigr) = \emptyset.
\]
We set $P_* = \int^{\oplus}_{\Lambda} dX\, P_*(X)$, where $P_*(X) = \mathbf{1}_{\sigma_*(X)}(H_e(X))$ is the spectral projection of $H_e(X)$ with respect to $\sigma_*(X)$. As explained in the introduction we have to distinguish two cases.

(i) Globally isolated bands. We assume $\Lambda = \mathbb{R}^n$ and let
\[
H^\varepsilon_{\rm diag} := P_* H^\varepsilon P_* + P_*^\perp H^\varepsilon P_*^\perp. \tag{11}
\]
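Since we aim at a uniform result for the adiabatic theorem, we introduce the Sobolev spaces $W^{1,\varepsilon}(\mathbb{R}^n)$ and $W^{2,\varepsilon}(\mathbb{R}^n)$ with respect to the ε-scaled gradient, i.e.
\[
W^{1,\varepsilon}(\mathbb{R}^n) := \bigl\{\phi \in L^2(\mathbb{R}^n) : \|\phi\|_{W^{1,\varepsilon}} := \varepsilon\,\| |\nabla\phi| \| + \|\phi\| < \infty\bigr\}
\]
and
\[
W^{2,\varepsilon}(\mathbb{R}^n) := \bigl\{\phi \in L^2(\mathbb{R}^n) : \|\phi\|_{W^{2,\varepsilon}} := \varepsilon^2\,\|\Delta\phi\| + \|\phi\| < \infty\bigr\}.
\]
Alternatively we will project on finite total energies smaller than $E$ and define $\mathsf{E}(H^\varepsilon) := \mathbf{1}_{(-\infty,E]}(H^\varepsilon)$.

Theorem 1. Assume $H_3$ and S for $\Lambda = \mathbb{R}^n$. Then $H^\varepsilon_{\rm diag}$ is self-adjoint on the domain of $H^\varepsilon$. There are constants $C, \tilde{C} < \infty$ such that for all $t \in \mathbb{R}$,
\[
\bigl\| e^{-iH^\varepsilon t/\varepsilon} - e^{-iH^\varepsilon_{\rm diag} t/\varepsilon} \bigr\|_{\mathcal{L}(W^{2,\varepsilon}\otimes\mathcal{H}_e,\,\mathcal{H})} \le \varepsilon\, C\, (1 + |t|)^3, \tag{12}
\]
and for all $E \in \mathbb{R}$,
\[
\bigl\| \bigl(e^{-iH^\varepsilon t/\varepsilon} - e^{-iH^\varepsilon_{\rm diag} t/\varepsilon}\bigr)\, \mathsf{E}(H^\varepsilon) \bigr\|_{\mathcal{L}(\mathcal{H})} \le \varepsilon\, \tilde{C}\, (1 + |E|)\,(1 + |t|). \tag{13}
\]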
$\mathcal{L}(W^{2,\varepsilon} \otimes \mathcal{H}_e, \mathcal{H})$ denotes the space of bounded linear operators from $W^{2,\varepsilon} \otimes \mathcal{H}_e$ to $\mathcal{H}$ equipped with the operator norm. This result should be understood as an adiabatic theorem for the subspaces $\mathrm{Ran}\,P_*$ and $\mathrm{Ran}\,P_*^\perp$, which are not spectral subspaces. Let us point out one immediate application of Theorem 1. The behavior near band crossings is usually investigated using simplified models involving only two energy bands and ignoring the rest of the spectrum, cf. [11, 7]. Theorem 1 shows that this strategy is indeed justified modulo errors of order ε.

(ii) Locally isolated bands. $\sigma_*(X) = E(X)$ is a nondegenerate eigenvalue for all $X \in \Lambda$. $\Lambda$ may now be any open subset of $\mathbb{R}^n$ and for such a $\Lambda$ we assume $H_\infty$ and S. We also assume that $\Lambda$ is connected. Otherwise one could treat each connected component separately.
It is easy to see that, given $H_\infty$ and S, the family of projections $P_*(\cdot) \in C_b^\infty(\Lambda, \mathcal{L}(\mathcal{H}_e))$. However, in order to “map” the dynamics from $\mathrm{Ran}\,P_*$ to $L^2(\Lambda)$ we need in addition a smooth version $\chi(\cdot) \in C_b^\infty(\Lambda, \mathcal{H}_e)$ of the normalized eigenvector of $H_e(X)$ with eigenvalue $E(X)$. In other words we require the complex line bundle over $\Lambda$ defined by $P_*$ to be trivial. This always holds for contractible $\Lambda$, but, as discussed below, also for some relevant examples where $\Lambda$ is not contractible. Given a smooth version of $\chi(X)$ with $\|\chi(X)\| = 1$, one has $\mathrm{Re}\,\langle\chi(X), \nabla_X\chi(X)\rangle = 0$, but, in general, $\mathrm{Im}\,\langle\chi(X), \nabla_X\chi(X)\rangle \neq 0$. In the following we distinguish two cases: Either it is possible to achieve $\mathrm{Im}\,\langle\tilde\chi(X), \nabla_X\tilde\chi(X)\rangle = 0$ by a smooth gauge transformation $\chi(X) \to \tilde\chi(X) = e^{i\theta(X)}\chi(X)$ or not. In the latter case $A_{\rm geo}(X) := -i\langle\chi(X), \nabla_X\chi(X)\rangle$ is the gauge potential of a connection on the trivial complex line bundle over $\Lambda$, the Berry connection, and has to be taken into account in the definition of the effective operator
\[
H^\varepsilon_{\rm BO} := \frac{\varepsilon^2}{2}\bigl(-i\nabla_X + A_{\rm ext}(X) + A_{\rm geo}(X)\bigr)^2 + E(X) \tag{14}
\]
with domain $W^2(\mathbb{R}^n)$. Thus $A_{\rm geo}$ acts as an additional external magnetic vector potential. Although $A_{\rm ext}$ and $A_{\rm geo}$ appear in $H^\varepsilon_{\rm BO}$ with an ε in front only, and therefore are not retained in the semiclassical limit to leading order, they do contribute to the solution of the Schrödinger equation for times of order $\varepsilon^{-1}$. If the full Hamiltonian is real in position representation, as it is the case for the Hamiltonians considered in the introduction whenever $A_{\rm ext} = 0$, then $\chi(X)$ can be chosen real-valued. If, in addition, $\Lambda$ is contractible, the existence of a smooth version of $\chi(X)$ with $\mathrm{Im}\,\langle\chi(X), \nabla_X\chi(X)\rangle = 0$ follows.

To define $H^\varepsilon_{\rm BO}$ on $L^2(\mathbb{R}^n)$ through (14), the functions $E(X)$ and $A_{\rm geo}(X)$, which are a priori defined on $\Lambda$ only, must be continued to functions on $\mathbb{R}^n$. Hence we arbitrarily extend $E(X)$ and $A_{\rm geo}(X)$ to functions in $C_b^\infty(\mathbb{R}^n)$ by modifying them, if necessary, on $\Lambda \setminus (\Lambda - \delta/5)$ (cf. (17)) for some $\delta > 0$. The parameter δ will be fixed in the formulation of Theorem 2 and will appear in several places. It controls how close the states are allowed to come to $\partial\Lambda$.

The generic example for the Berry phase is a band crossing of codimension 2 (cf. [22, 11, 7]). If $E(X)$ is an isolated energy band except for a codimension 2 crossing, then $\Lambda = \mathbb{R}^n \setminus \{\text{closed neighborhood of the crossing}\}$ is no longer contractible, but the line bundle is still trivial. Although the underlying Hamiltonian is real, the Berry connection cannot be gauged away. Within the time-independent Born–Oppenheimer approximation Herrin and Howland [12] study a model with a nontrivial eigenvector bundle.

With the fixed choice for $\chi(X)$ we have
\[
\mathrm{Ran}\,P_* = \Bigl\{ \int^{\oplus}_{\Lambda} dX\, \phi(X)\chi(X);\ \phi \in L^2(\Lambda) \Bigr\} \subset \mathcal{H}. \tag{15}
\]
Thus there is a natural identification $U : \mathrm{Ran}\,P_* \to L^2(\mathbb{R}^n)$ connecting the relevant subspace on which the full quantum evolution takes place and the Hilbert space $L^2(\mathbb{R}^n)$ on which the effective Born–Oppenheimer evolution is defined. According to (15), we set $U(\phi\chi) = \phi$, i.e.
\[
(U P_*\psi)(X) = \langle\chi(X), (P_*\psi)(X)\rangle_{\mathcal{H}_e}.
\]
Its adjoint $U^* : L^2(\mathbb{R}^n) \to \mathrm{Ran}\,P_*$ is given by
\[
U^*\phi = \int^{\oplus}_{\Lambda} dX\, \phi(X)\chi(X).
\]
Clearly $U$ is an isometry and $U^*U = \mathbf{1}$ on $\mathrm{Ran}\,P_*$. But $U$ is not surjective and thus not unitary.

By construction, $e^{-iH^\varepsilon_{\rm BO} t/\varepsilon}$ is a good approximation to the true dynamics only as long as the wave function of the nuclei is supported in $\Lambda$ modulo errors of order ε. Since $H^\varepsilon_{\rm BO}$ is a standard semiclassical operator, the $X$-support of solutions of (8) can be calculated approximately from the classical dynamics generated by its principal symbol $H_{\rm cl}(q, p) = \frac{1}{2}p^2 + E(q)$ on phase space $Z := \mathbb{R}^n \times \mathbb{R}^n$,
\[
\frac{d}{dt}q = p, \qquad \frac{d}{dt}p = -\nabla E(q). \tag{16}
\]
The solution flow to (16) exists for all times and will be denoted by $\Phi^t$. In order to make these notions more precise, we need to introduce some notation. The Weyl quantization of $a \in C_b^\infty(Z)$ is the linear operator
\[
\bigl(a^{W,\varepsilon}\phi\bigr)(X) = (2\pi)^{-n}\int_{\mathbb{R}^{2n}} dY\,dk\; a\Bigl(\frac{X+Y}{2},\, \varepsilon k\Bigr)\, e^{-i(X-Y)\cdot k}\,\phi(Y),
\]
as acting on Schwartz functions. $a^{W,\varepsilon}$ extends to $\mathcal{L}(L^2(\mathbb{R}^n))$ with operator norm bounded uniformly in ε (cf., e.g., Theorem 7.11 in [6]). The wave functions with phase space support in a compact set $\Gamma \subset Z$ do not form a closed subspace of $L^2(\mathbb{R}^n)$. Hence we cannot project on this set. In order to define approximate projections, let for $\Gamma \subset \mathbb{R}^m$, $m \in \mathbb{N}$, and for $\alpha > 0$,
\[
\Gamma - \alpha := \Bigl\{ z \in \Gamma : \inf_{w \in \mathbb{R}^m \setminus \Gamma} |w - z| \ge \alpha \Bigr\}. \tag{17}
\]
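Definition 1. An approximate characteristic function $\mathbf{1}_{(\Gamma,\alpha)} \in C_b^\infty(\mathbb{R}^m)$ of a set $\Gamma \subset \mathbb{R}^m$ with margin α is defined by the requirement that $\mathbf{1}_{(\Gamma,\alpha)}|_{\Gamma-\alpha} = 1$ and $\mathbf{1}_{(\Gamma,\alpha)}|_{\mathbb{R}^m\setminus\Gamma} = 0$. If $\mathbf{1}_{(\Gamma,\alpha)}$ is an approximate characteristic function on phase space $Z$, then the corresponding approximate projection is defined as its Weyl quantization $\mathbf{1}^{W,\varepsilon}_{(\Gamma,\alpha)}$. We will say that functions in $\mathrm{Ran}\,\mathbf{1}^{W,\varepsilon}_{(\Gamma,\alpha)}$ have phase space support in $\Gamma$.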
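Definition 1 only requires the two boundary conditions, so many choices of approximate characteristic function exist. As a hedged illustration (not a construction from the paper), the sketch below builds one such function for a one-dimensional interval $\Gamma = (a, b)$ from the standard smooth step based on $e^{-1/t}$. The names `smooth_step` and `approx_indicator` are hypothetical.

```python
import numpy as np

def _bump_tail(t):
    """exp(-1/t) for t > 0 and 0 otherwise (smooth on the whole real line)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = np.exp(-1.0 / t[pos])
    return out

def smooth_step(t):
    """C-infinity step: 0 for t <= 0, 1 for t >= 1, strictly between otherwise."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    num = _bump_tail(t)
    return num / (num + _bump_tail(1.0 - t))  # denominator never vanishes

def approx_indicator(x, a, b, alpha):
    """Approximate characteristic function of Gamma = (a, b) with margin alpha.

    Equals 1 on Gamma - alpha = [a + alpha, b - alpha] and 0 outside (a, b).
    Assumes b - a >= 2 * alpha so that Gamma - alpha is non-empty.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return smooth_step((x - a) / alpha) * smooth_step((b - x) / alpha)

if __name__ == "__main__":
    xs = np.array([-0.1, 0.1, 0.3, 0.5, 0.7, 0.9, 1.1])
    print(approx_indicator(xs, a=0.0, b=1.0, alpha=0.2))
```

On phase space one would take a product of such cutoffs in the $q$ and $p$ variables; the Weyl quantization step itself is not sketched here.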
For $\Gamma \subset Z$ we will use the abbreviations
\[
\Gamma_q := \{ q \in \mathbb{R}^n : (q, p) \in \Gamma \text{ for some } p \in \mathbb{R}^n \}, \qquad \Gamma_p := \{ p \in \mathbb{R}^n : (q, p) \in \Gamma \text{ for some } q \in \mathbb{R}^n \}.
\]
Let the phase space support $\Gamma$ of the initial wave function be such that $\Gamma_q \subset \Lambda - \delta$. Then the maximal time interval for which the $X$-support of the wave function of the nuclei stays in $\Lambda$ up to errors of order ε can be written as
\[
I^\delta_{\max}(\Gamma, \Lambda) := [T^\delta_-(\Gamma, \Lambda),\, T^\delta_+(\Gamma, \Lambda)],
\]
where the “first hitting times” $T_\pm$ are defined by the classical dynamics through
\[
T^\delta_+(\Gamma, \Lambda) := \sup\bigl\{ t \ge 0 : \bigl(\Phi^s(\Gamma)\bigr)_q \subseteq \Lambda - \delta \ \ \forall\, s \in [0, t] \bigr\}
\]
and $T^\delta_-(\Gamma, \Lambda)$ analogously for negative times. These are just the first times for a particle starting in $\Gamma$ to hit the boundary of $\Lambda - \delta$ when dragged along the classical flow $\Phi^t$.

The following proposition, which is an immediate consequence of Egorov’s Theorem [4, 21], shows that for times in $I^\delta_{\max}(\Gamma, \Lambda)$ the support of the wave function of the nuclei stays indeed in $\Lambda - \delta$, up to errors of order ε uniformly on $\mathrm{Ran}\,\mathbf{1}^{W,\varepsilon}_{(\Gamma,\alpha)}$ for any approximate projection $\mathbf{1}^{W,\varepsilon}_{(\Gamma,\alpha)}$.
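Proposition 1. Let $\Gamma \subset Z$ be such that $\Gamma_q \subset \Lambda - \delta$ and let $\mathbf{1}_{\Lambda-\delta}$ denote multiplication with the characteristic function of $\Lambda - \delta$ on $L^2(\mathbb{R}^n)$. For any approximate projection $\mathbf{1}^{W,\varepsilon}_{(\Gamma,\alpha)}$ and any bounded interval $I \subseteq I^\delta_{\max}(\Gamma, \Lambda)$ there is a constant $C < \infty$ such that for all $t \in I$,
\[
\bigl\| \bigl(\mathbf{1} - \mathbf{1}_{\Lambda-\delta}\bigr)\, e^{-iH^\varepsilon_{\rm BO} t/\varepsilon}\, \mathbf{1}^{W,\varepsilon}_{(\Gamma,\alpha)} \bigr\|_{\mathcal{L}(L^2(\mathbb{R}^n))} \le C\,\varepsilon.
\]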
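An approximate projection on $\Gamma$ in $\mathcal{H}$ is defined as
\[
P^\alpha_\Gamma := U^*\, \mathbf{1}_{(\Lambda,\delta)}\, \mathbf{1}^{W,\varepsilon}_{(\Gamma,\alpha)}\, U P_*,
\]
where $\mathbf{1}^{W,\varepsilon}_{(\Gamma,\alpha)}$ is an approximate projection on $\Gamma$ according to Definition 1 and $\mathbf{1}_{(\Lambda,\delta)}$ is an approximate characteristic function for $\Lambda$. Using the latter instead of the sharp cutoff from $U^*$ makes $\mathrm{Ran}\,P^\alpha_\Gamma$ a bounded set in $W^{2,\varepsilon} \otimes \mathcal{H}_e$ whenever $\Gamma_p$ is a bounded set.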
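Theorem 2. Assume $H_\infty$ and S with $\dim(\mathrm{Ran}\,P_*(X)) = 1$ for some open $\Lambda \subseteq \mathbb{R}^n$. Let $\Gamma \subset Z$ be such that $\Gamma_q \subset \Lambda - \delta$ for some $\delta > 0$ and $\Gamma_p$ bounded. For any approximate projection $P^\alpha_\Gamma$ and any bounded interval $I \subseteq I^\delta_{\max}(\Gamma, \Lambda)$ there is a constant $C < \infty$ such that for all $t \in I$,
\[
\bigl\| \bigl( e^{-iH^\varepsilon t/\varepsilon} - U^*\, e^{-iH^\varepsilon_{\rm BO} t/\varepsilon}\, U \bigr) P^\alpha_\Gamma \bigr\|_{\mathcal{L}(\mathcal{H})} \le C\,\varepsilon. \tag{18}
\]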
Theorem 2 establishes that the electrons adiabatically follow the motion of the nuclei up to errors of order ε as long as the leaking through the boundary of $\Lambda$ is small. The semiclassics was used only to control such a leaking uniformly. However, for $H^\varepsilon_{\rm BO}$ the limit $\varepsilon \to 0$ is a semiclassical limit and, as discussed in the following section, beyond the mere support of the wave function more detailed information is available.
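The first hitting time $T^\delta_+(\Gamma, \Lambda)$ entering Theorem 2 is a purely classical quantity and can be estimated by integrating (16). The following sketch is an illustration under stated assumptions, not part of the paper: it works in one dimension with $\Lambda = (a, b)$, samples $\Gamma$ by finitely many points, uses a velocity Verlet integrator with a fixed step, and takes a hypothetical harmonic band energy $E(q) = q^2/2$ in the example.

```python
import math

def first_hitting_time(q0, p0, grad_e, lam, delta, dt=1e-3, t_max=100.0):
    """First time the trajectory of dq/dt = p, dp/dt = -E'(q) leaves Lambda - delta.

    lam = (a, b) is an interval, so Lambda - delta = [a + delta, b - delta].
    Returns math.inf if no exit is detected before t_max.
    """
    lo, hi = lam[0] + delta, lam[1] - delta
    q, p, t = q0, p0, 0.0
    if not (lo <= q <= hi):
        return 0.0
    while t < t_max:
        # one velocity Verlet step for H_cl(q, p) = p^2/2 + E(q)
        p_half = p - 0.5 * dt * grad_e(q)
        q = q + dt * p_half
        p = p_half - 0.5 * dt * grad_e(q)
        t += dt
        if not (lo <= q <= hi):
            return t
    return math.inf

def hitting_time_of_set(points, grad_e, lam, delta, **kwargs):
    """Estimate T_+ for a set Gamma sampled by finitely many (q0, p0) points."""
    return min(first_hitting_time(q0, p0, grad_e, lam, delta, **kwargs)
               for q0, p0 in points)

if __name__ == "__main__":
    grad_e = lambda q: q  # hypothetical band energy E(q) = q^2 / 2
    print(first_hitting_time(0.0, 1.0, grad_e, lam=(-1.0, 1.0), delta=0.5))
```

For the harmonic example with $(q_0, p_0) = (0, 1)$ the exact trajectory is $q(t) = \sin t$, so the exit from $[-0.5, 0.5]$ occurs at $t = \pi/6$, which the numerical estimate reproduces up to the step size. A finite sample of $\Gamma$ can only give an upper estimate of the true $T^\delta_+$.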
3. Semiclassics for a Single Band

The semiclassical limit of Eq. (8) with a Hamiltonian of the form (14) is well understood and there is a variety of different approaches. For example one can construct approximate solutions $\phi_{q(t)}$ of (8) which are localized along a classical trajectory $q(t)$, i.e. along a solution of (16). Then it follows from Theorem 2 that $\phi_{q(t)}\chi$ is a solution of the full Schrödinger equation, (6), up to an error of order ε as long as $q(t) \in \Lambda - \delta$. Roughly speaking, this coincides with the result of Hagedorn [9]. In applications the assumption that the wave function of the nuclei is well described by a coherent state seems to be rather restrictive and a more general approach to the semiclassical analysis of a Schrödinger equation of the form (8) is to consider the distributions of semiclassical observables, i.e. of operators obtained as Weyl quantization $a^{W,\varepsilon}$ of classical phase space functions $a : Z \to \mathbb{R}$.
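Consider a general initial wave function $\phi^\varepsilon \in L^2(\mathbb{R}^n)$, such that $\phi^\varepsilon$ corresponds to a probability measure $\rho_{\rm cl}(dq\,dp)$ on phase space in the sense that for all semiclassical observables with symbols $a \in C_b^\infty(Z)$,
\[
\lim_{\varepsilon\to 0}\Bigl( \langle\phi^\varepsilon, a^{W,\varepsilon}\phi^\varepsilon\rangle - \int_Z a(q, p)\,\rho_{\rm cl}(dq\,dp) \Bigr) = 0. \tag{19}
\]
The definition is equivalent to saying that the Wigner transform of $\phi^\varepsilon$ converges to $\rho_{\rm cl}$ weakly on test functions in $C_b^\infty(Z)$ [17]. An immediate application of Egorov’s theorem yields
\[
\lim_{\varepsilon\to 0}\Bigl( \langle\phi^\varepsilon, e^{iH^\varepsilon_{\rm BO} t/\varepsilon}\, a^{W,\varepsilon}\, e^{-iH^\varepsilon_{\rm BO} t/\varepsilon}\phi^\varepsilon\rangle - \int_Z (a\circ\Phi^t)(q, p)\,\rho_{\rm cl}(dq\,dp) \Bigr) = 0 \tag{20}
\]
uniformly on bounded intervals in time, where we recall that $\Phi^t$ is the flow generated by (16). In (20) one can of course shift the time evolution from the observables to the states on both sides and write instead
\[
\lim_{\varepsilon\to 0}\Bigl( \langle\phi^\varepsilon_t, a^{W,\varepsilon}\phi^\varepsilon_t\rangle - \int_Z a(q, p)\,\rho_{\rm cl}(dq\,dp, t) \Bigr) = 0. \tag{21}
\]
Here $\phi^\varepsilon_t = e^{-iH^\varepsilon_{\rm BO} t/\varepsilon}\phi^\varepsilon$ and $\rho_{\rm cl}(dq\,dp, t) = (\rho_{\rm cl}\circ\Phi^{-t})(dq\,dp)$ is the initial distribution $\rho_{\rm cl}(dq\,dp)$ transported along the classical flow. Thus with respect to a certain type of experiments the system described by the wave function $\phi^\varepsilon_t$ behaves like a classical system.

For a molecular system the object of real interest is the left hand side of (21) with $\phi^\varepsilon_t$ replaced by the solution $\psi^\varepsilon_t$ of the full Schrödinger equation and $a^{W,\varepsilon} =: a^\varepsilon_{\rm BO}$ as acting on $L^2(\mathbb{R}^n)$ replaced by $a^{W,\varepsilon} \otimes \mathbf{1}$ as acting on $\mathcal{H}$. In order to compare the expectations of $a^\varepsilon_{\rm BO}$ with the expectations of $a^{W,\varepsilon} \otimes \mathbf{1}$, we need the following proposition.

Proposition 2. In addition to the assumptions of Theorem 2 let $a \in C_b^\infty(Z)$ with
\[
\int d\xi\, \sup_{x\in\mathbb{R}^n} |\xi|\,\bigl|\hat{a}^{(2)}(x, \xi)\bigr| < \infty, \tag{22}
\]
where $\hat{\ }^{(2)}$ denotes Fourier transformation in the second argument. Then there is a constant $C < \infty$ such that
\[
\bigl\| \bigl( a^{W,\varepsilon} \otimes \mathbf{1} - U^*\, a^{W,\varepsilon}\, U \bigr)\, \mathbf{1}_{\Lambda-\delta}\, P_* \bigr\| \le C\,\varepsilon.
\]
For the proof of Proposition 2 see the end of Sect. 4.2. With its help we obtain the semiclassical limit for the nuclei as governed by the full Hamiltonian.

Corollary 1. Let $\Gamma$ and $I$ be as in Theorem 2. Let $\psi^\varepsilon \in \mathcal{H}$ be such that (19) is satisfied for $\phi^\varepsilon := U P_*\psi^\varepsilon$ for some $\rho_{\rm cl}$ with $\mathrm{supp}\,\rho_{\rm cl} \subset \Gamma - \alpha$. Let $\psi^\varepsilon_t = e^{-iH^\varepsilon t/\varepsilon}\psi^\varepsilon$, then for all $a \in C_b^\infty(Z)$ which satisfy (22)
\[
\lim_{\varepsilon\to 0}\Bigl( \langle\psi^\varepsilon_t, (a^{W,\varepsilon}\otimes\mathbf{1})\,\psi^\varepsilon_t\rangle - \int_Z a(q, p)\,\rho_{\rm cl}(dq\,dp, t) \Bigr) = 0 \tag{23}
\]
uniformly for $t \in I$.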
Translated to the language of Wigner measures Corollary 1 states the following. Let us define the marginal Wigner transform for the nuclei as
$$W^\varepsilon_{nuc}(\psi^\varepsilon_t)(q,p) := (2\pi)^{-n}\int_{\mathbb{R}^n}dX\,e^{iX\cdot p}\,\big\langle\psi^{\varepsilon*}_t(q+\varepsilon X/2),\,\psi^\varepsilon_t(q-\varepsilon X/2)\big\rangle_{\mathcal{H}_e}.$$
Then, whenever $W^\varepsilon_{nuc}(P_*\psi^\varepsilon_0)(q,p)\,dq\,dp$ converges weakly to some probability measure $\rho_{cl}(dq\,dp)$, $W^\varepsilon_{nuc}(P_*\psi^\varepsilon_t)(q,p)\,dq\,dp$ converges weakly to $(\rho_{cl}\circ\Phi^{-t})(dq\,dp)$.

Corollary 1 follows by applying first Proposition 2 and then Theorem 2 to the left-hand side in the difference (23), where we note that $\lim_{\varepsilon\to0}(1-P_{\Gamma\alpha})\psi^\varepsilon = 0$ and thus also $\lim_{\varepsilon\to0}(1-P_{\Lambda-\delta'})\psi^\varepsilon_t = 0$ for any $\delta'<\delta$. This yields the left-hand side of (20) and thus (23).

We mention some standard examples of initial wave functions $\phi^\varepsilon$ of the nuclei which approximate certain classical distributions. The initial wave function for the full system is, as before, recovered as $\psi^\varepsilon = U^*\phi^\varepsilon = \phi^\varepsilon(X)\chi(X)$. In these examples one regains some control on the rate of convergence with respect to $\varepsilon$ which was lost in (19).

(i) Wave packets tracking a classical trajectory. For $\phi\in L^2(\mathbb{R}^n)$ let
$$\phi^\varepsilon_{q_0,p_0}(X) = \varepsilon^{-\frac{n}{4}}\,e^{-ip_0\cdot(X-q_0)/\varepsilon}\,\phi\Big(\frac{X-q_0}{\sqrt{\varepsilon}}\Big).$$
Then $|\phi^\varepsilon_{q_0,p_0}(X)|^2$ is sharply peaked at $q_0$ for $\varepsilon$ small and its $\varepsilon$-scaled Fourier transform is sharply peaked at $p_0$. Thus one expects that the corresponding classical distribution is given by $\delta(q-q_0)\,\delta(p-p_0)\,dq\,dp$. As was shown, e.g. in [23], this is indeed true for $\phi\in L^2(\mathbb{R}^n)$ such that $\phi$, $|x|\phi$, $\hat\phi$, $|p|\hat\phi\in L^1(\mathbb{R}^n)$. Then Corollary 1 holds with (23) replaced by
$$\big|\big\langle\psi^\varepsilon_t,(a^{W,\varepsilon}\otimes 1)\,\psi^\varepsilon_t\big\rangle - a(q(t),p(t))\big| = O(\sqrt{\varepsilon})\Big(\|\phi\|^2_{L^2} + \|\phi\|_{L^1}\,\||p|\hat\phi\|_{L^1} + \||x|\phi\|_{L^1}\,\|\hat\phi\|_{L^1}\Big), \tag{24}$$
where $(q(t),p(t))$ is the solution of the classical dynamics with initial condition $(q_0,p_0)$. Equation (24) generalizes Hagedorn's first order result in [9] to a larger class of localized wave functions.

(ii) Either sharp momentum or sharp position. For $\phi\in L^2(\mathbb{R}^n)$ let
$$\hat\phi^\varepsilon_{p_0}(p) = \hat\phi\Big(p-\frac{p_0}{\varepsilon}\Big),$$
where $\hat{\ }$ denotes the $\varepsilon$-scaled Fourier transformation, then the corresponding classical distribution is $\rho_{cl}(dq\,dp) = \delta(p-p_0)\,|\phi(q)|^2\,dq\,dp$. Note that the absolute value of $\phi$ does not depend on $\varepsilon$ in that case. Equivalently one defines
$$\phi^\varepsilon_{q_0}(X) = \varepsilon^{-\frac{n}{2}}\,\phi\Big(\frac{X-q_0}{\varepsilon}\Big)$$
and obtains $\rho_{cl}(dq\,dp) = \delta(q-q_0)\,|\hat\phi(p)|^2\,dq\,dp$. In both cases one finds that the difference in (23) is bounded by a constant times either $\varepsilon\big(\|\phi\|^2_{L^2} + \|\phi\|_{L^1}\||p|\hat\phi\|_{L^1}\big)$ for $\phi^\varepsilon_{p_0}$ or $\varepsilon\big(\|\phi\|^2_{L^2} + \||x|\phi\|_{L^1}\|\hat\phi\|_{L^1}\big)$ for $\phi^\varepsilon_{q_0}$.

(iii) WKB wave functions. For $f\in L^2(\mathbb{R}^n)$ and $S\in C^1(\mathbb{R}^n)$, both real valued, let $\phi^\varepsilon(X) = f(X)\,e^{iS(X)/\varepsilon}$, then $\rho_{cl}(dq\,dp) = f^2(q)\,\delta(p-\nabla S(q))\,dq\,dp$. In this case one expects that (23) is bounded as $\sqrt{\varepsilon}$, which has been shown in [23] for a smaller set of test functions.
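The concentration claimed in example (i) can be checked numerically. The following is a minimal sketch, assuming dimension $n=1$ and a Gaussian profile $\phi$ (both are illustrative choices, not taken from the text; the function names are hypothetical). It evaluates the $L^2$ norm, the mean position and the position variance of $|\phi^\varepsilon_{q_0,p_0}|^2$ on a grid; for this profile the variance should scale like $\varepsilon/2$.

```python
import numpy as np

def gaussian(y):
    """L^2-normalised Gaussian profile (an illustrative choice of phi)."""
    return np.pi ** (-0.25) * np.exp(-y ** 2 / 2)

def wave_packet(X, eps, q0, p0, phi=gaussian):
    """Wave packet of example (i) in dimension n = 1."""
    return eps ** (-0.25) * np.exp(-1j * p0 * (X - q0) / eps) * phi((X - q0) / np.sqrt(eps))

def position_statistics(eps, q0=0.3, p0=1.0, half_width=1.0, num_points=200001):
    """Return (norm, mean, variance) of |phi^eps|^2 from a Riemann sum on a grid."""
    X = np.linspace(q0 - half_width, q0 + half_width, num_points)
    dX = X[1] - X[0]
    density = np.abs(wave_packet(X, eps, q0, p0)) ** 2
    norm = density.sum() * dX
    mean = (X * density).sum() * dX / norm
    variance = ((X - mean) ** 2 * density).sum() * dX / norm
    return norm, mean, variance

if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        norm, mean, variance = position_statistics(eps)
        print(f"eps={eps:g}: norm={norm:.6f}, mean={mean:.6f}, variance/eps={variance / eps:.6f}")
```

The prefactor $\varepsilon^{-n/4}$ is what keeps the norm independent of $\varepsilon$, while the position spread shrinks like $\sqrt{\varepsilon}$.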
4. Proofs

4.1. Globally isolated bands. We collect some immediate consequences of H3 and S. Using the Riesz formula
$$P_*(X) = -\frac{1}{2\pi i}\oint_{\gamma(X)}d\lambda\,R_\lambda(H_e(X)), \tag{25}$$
with $\gamma(X)$ a smooth curve in the complex plane circling $\sigma_*(X)$ only and $R_\lambda(H_e(X)) = (H_e(X)-\lambda)^{-1}$, one easily shows that $P_*(\cdot)\in C_b^2(\mathbb{R}^n,\mathcal{L}(\mathcal{H}_e))$. Assumption S enters at this point, since it allows one to choose $\gamma(X)$ locally independent of $X$. Hence, when taking derivatives with respect to $X$ in (25), one only needs to differentiate the integrand. In particular one finds that
$$P_*^\perp(X)(\nabla_XP_*)(X)P_*(X) = \frac{1}{2\pi i}\oint_{\gamma(X)}d\lambda\,R_\lambda(H_e(X))\,P_*^\perp(X)\,(\nabla_XH_e)(X)\,R_\lambda(H_e(X))\,P_*(X). \tag{26}$$
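For a finite matrix the Riesz formula (25) can be evaluated directly. The sketch below is an illustration under stated assumptions: the $3\times3$ matrix `H_example`, the circular contour and the function name `riesz_projection` are hypothetical choices, not part of the paper. It approximates the contour integral by the trapezoidal rule on a counterclockwise circle that encloses a single eigenvalue.

```python
import numpy as np

def riesz_projection(H, center, radius, num_points=200):
    """Approximate P = -(1/(2 pi i)) * contour integral of (H - lam)^{-1} d lam
    over a counterclockwise circle, using the trapezoidal rule."""
    dim = H.shape[0]
    identity = np.eye(dim)
    total = np.zeros((dim, dim), dtype=complex)
    for k in range(num_points):
        phase = np.exp(2j * np.pi * k / num_points)
        lam = center + radius * phase                      # point on the contour
        dlam = 2j * np.pi * radius * phase / num_points    # increment d lam
        total += np.linalg.inv(H - lam * identity) * dlam
    return -total / (2j * np.pi)

# Illustrative matrix with eigenvalues 1, 3 and 5; the circle |lam - 1| = 1
# encloses only the eigenvalue 1.
H_example = np.array([[2.0, 1.0, 0.0],
                      [1.0, 2.0, 0.0],
                      [0.0, 0.0, 5.0]])
P_example = riesz_projection(H_example, center=1.0, radius=1.0)
```

The result can be compared with the projection onto the eigenvector $(1,-1,0)/\sqrt{2}$ obtained from a direct diagonalization. The sign in (25) compensates for the resolvent convention $R_\lambda = (H_e-\lambda)^{-1}$.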
Since $P_*(X)(\nabla_XP_*)(X)P_*(X) = P_*^\perp(X)(\nabla_XP_*)(X)P_*^\perp(X) = 0$, which follows from $(\nabla_XP_*)(X) = (\nabla_XP_*^2)(X) = (\nabla_XP_*)(X)P_*(X) + P_*(X)(\nabla_XP_*)(X)$, we have that
$$(\nabla_XP_*)(X) = P_*^\perp(X)(\nabla_XP_*)(X)P_*(X) + \text{adjoint}. \tag{27}$$
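The off-diagonal structure (27) of the derivative of a projection is easy to test on a matrix family. The sketch below assumes a hypothetical one-parameter family $H(x) = A + xB$ of real symmetric $3\times3$ matrices with a non-degenerate lowest eigenvalue, and approximates the derivative of the band projection by a central finite difference. None of these choices come from the paper.

```python
import numpy as np

# Hypothetical one-parameter family H(x) = A + x * B with an isolated lowest eigenvalue.
A = np.diag([0.0, 2.0, 5.0])
B = np.array([[0.0, 1.0, 0.5],
              [1.0, 0.0, 1.0],
              [0.5, 1.0, 0.0]])

def hamiltonian(x):
    return A + x * B

def band_projection(x):
    """Projection onto the eigenvector of the lowest eigenvalue of hamiltonian(x)."""
    _, vecs = np.linalg.eigh(hamiltonian(x))
    v = vecs[:, [0]]
    return v @ v.T

def projection_derivative(x, h=1e-4):
    """Central finite-difference approximation of dP/dx."""
    return (band_projection(x + h) - band_projection(x - h)) / (2 * h)

def check_identities(x=0.3):
    P = band_projection(x)
    Q = np.eye(3) - P                      # plays the role of P_*^perp
    dP = projection_derivative(x)
    # P dP P and Q dP Q should vanish up to discretisation error
    diagonal_blocks = np.linalg.norm(P @ dP @ P) + np.linalg.norm(Q @ dP @ Q)
    # dP should coincide with Q dP P + P dP Q, cf. (27)
    offdiag_residual = np.linalg.norm(dP - (Q @ dP @ P + P @ dP @ Q))
    return diagonal_blocks, offdiag_residual, np.linalg.norm(dP)
```

Both residuals are of the order of the finite-difference error, while the derivative itself is of order one.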
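In (27) and in the following “+ adjoint” means that the adjoint operator of the first term in a sum is added. Starting with (12), we find, at the moment formally, that
$$\begin{aligned} e^{-iH^\varepsilon_{diag}t/\varepsilon} - e^{-iH^\varepsilon t/\varepsilon} &= e^{-iH^\varepsilon_{diag}t/\varepsilon}\Big(1 - e^{iH^\varepsilon_{diag}t/\varepsilon}\,e^{-iH^\varepsilon t/\varepsilon}\Big)\\ &= i\,e^{-iH^\varepsilon_{diag}t/\varepsilon}\int_0^{t/\varepsilon}ds\;e^{iH^\varepsilon_{diag}s}\,\big(H^\varepsilon - H^\varepsilon_{diag}\big)\,e^{-iH^\varepsilon s}, \end{aligned}\tag{28}$$
where
$$H^\varepsilon - H^\varepsilon_{diag} = P_*^\perp H^\varepsilon P_* + \text{adjoint} = P_*^\perp\Big[\frac{\varepsilon^2}{2}\big(-i\nabla_X + A_{ext}(X)\big)^2,\,P_*\Big]P_* + \text{adjoint}. \tag{29}$$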
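In finite dimensions the Duhamel-type identity (28) holds exactly and can be verified numerically. The following sketch assumes hypothetical $3\times3$ Hermitian matrices `H_full` and `H_diag` in place of $H^\varepsilon$ and $H^\varepsilon_{diag}$, writes $T$ for $t/\varepsilon$, and evaluates the time integral with the composite Simpson rule. It only illustrates the algebra and does not model the Born–Oppenheimer setting.

```python
import numpy as np

def propagator(H, t):
    """exp(-i H t) for a Hermitian matrix H, via its eigendecomposition."""
    vals, vecs = np.linalg.eigh(H)
    return (vecs * np.exp(-1j * vals * t)) @ vecs.conj().T

def block_diagonal_part(H, P):
    """P H P + (1 - P) H (1 - P), the analogue of H_diag."""
    Q = np.eye(H.shape[0]) - P
    return P @ H @ P + Q @ H @ Q

def duhamel_right_hand_side(H, H_diag, T, num_steps=2000):
    """i e^{-i H_diag T} int_0^T ds e^{i H_diag s} (H - H_diag) e^{-i H s},
    with the integral computed by the composite Simpson rule (num_steps even)."""
    s_values = np.linspace(0.0, T, num_steps + 1)
    h = T / num_steps
    weights = np.ones(num_steps + 1)
    weights[1:-1:2] = 4.0
    weights[2:-1:2] = 2.0
    integral = np.zeros_like(H, dtype=complex)
    for s, w in zip(s_values, weights):
        integral += w * (propagator(H_diag, -s) @ (H - H_diag) @ propagator(H, s))
    integral *= h / 3.0
    return 1j * propagator(H_diag, T) @ integral

# Hypothetical example: a band (first basis vector) weakly coupled to two other levels.
H_full = np.diag([0.0, 2.0, 5.0]) + 0.3 * np.array([[0.0, 1.0, 0.5],
                                                    [1.0, 0.0, 1.0],
                                                    [0.5, 1.0, 0.0]])
P_band = np.diag([1.0, 0.0, 0.0])
H_diag = block_diagonal_part(H_full, P_band)
T = 3.0
left_hand_side = propagator(H_diag, T) - propagator(H_full, T)
right_hand_side = duhamel_right_hand_side(H_full, H_diag, T)
```

The two sides agree up to the quadrature error. In the operator setting of the text the same identity needs the domain argument given around (32).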
Let DA := −i∇X + Aext (X). Then the commutator is easily calculated as
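$$\Big[\frac{\varepsilon^2}{2}(D_A\otimes 1)^2,\,P_*\Big] = -i\,\varepsilon\,(\nabla_XP_*)\cdot(\varepsilon D_A\otimes 1) + O(\varepsilon^2) \tag{30}$$
$$\phantom{\Big[\frac{\varepsilon^2}{2}(D_A\otimes 1)^2,\,P_*\Big]} = -\varepsilon\,(\nabla_XP_*)\cdot(\varepsilon\nabla_X\otimes 1) + O(\varepsilon^2), \tag{31}$$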
where O(ε2 ) holds in the norm of L(H, H) as ε → 0. For (30) and (31) it was used that Aext (X) and P∗ (X) are both differentiable with bounded derivatives and that Aext (X) commutes with P∗ .
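Before we can continue, we need to justify (28) by showing that $H^\varepsilon_{diag}$ is self-adjoint on $D(H^\varepsilon)$. To see this, note that $-i\varepsilon\nabla_X$ is bounded with respect to $\varepsilon^2\Delta_X$ with relative bound 0 and that for $\psi\in D(H^\varepsilon)$,
$$\begin{aligned}\|(\varepsilon^2\Delta_X\otimes 1)\,\psi\| &\le c_1\big(\|(\varepsilon^2D_A^2\otimes 1)\,\psi\| + \|\psi\|\big)\\ &\le c_2\big(\|(\varepsilon^2D_A^2\otimes 1 + 1\otimes H_e^0)\,\psi\| + \|\psi\|\big) \le c_3\big(\|H^\varepsilon\psi\| + \|\psi\|\big),\end{aligned}\tag{32}$$
where we used that $H_e^0$ is bounded from below and that $H_e^1$ is bounded. Hence $H^\varepsilon - H^\varepsilon_{diag}$ is infinitesimally operator bounded with respect to $H^\varepsilon$, consequently $H^\varepsilon_{diag}$ is self-adjoint on $D(H^\varepsilon)$ and thus (28) holds on $D(H^\varepsilon)$. Equations (29) and (31) in (28) give
$$P_*^\perp\Big(e^{-iH^\varepsilon_{diag}t/\varepsilon} - e^{-iH^\varepsilon t/\varepsilon}\Big) = -i\varepsilon\,e^{-iH^\varepsilon_{diag}t/\varepsilon}\int_0^{t/\varepsilon}ds\;e^{iH^\varepsilon_{diag}s}\,P_*^\perp(\nabla_XP_*)P_*\cdot(\varepsilon\nabla_X\otimes 1)\,e^{-iH^\varepsilon s} + O(\varepsilon)|t|, \tag{33}$$
where we used that the term of order $O(\varepsilon^2)$ in (31) yields a term of order $O(\varepsilon)|t|$ after integration, since all other expressions in the integrand are bounded uniformly in time and the domain of integration grows like $t/\varepsilon$. In (33) and in the following we omit the adjoint term from (29) and thus consider the difference of the groups projected on $\mathrm{Ran}\,P_*^\perp$ only. The argument for the difference projected on $\mathrm{Ran}\,P_*$ goes through analogously by taking adjoints at the appropriate places.

Now $\varepsilon(\nabla_XP_*)\cdot(\varepsilon\nabla_X\otimes 1)$ is only $O(\varepsilon)$ in the norm of $\mathcal{L}(W^{1,\varepsilon}\otimes\mathcal{H}_e,\mathcal{H})$ and thus, according to the naive argument, only $O(1)|t|$ after integration. As in [13] and [23] we proceed by writing $(\nabla_XP_*)\cdot(\varepsilon\nabla_X\otimes 1)$ as the commutator of a bounded operator $B$ with $H^\varepsilon$ modulo terms of order $O(\varepsilon)$. This is in analogy to the proof of the time-adiabatic theorem [15] and allows one to write the first order part of the integrand in (33) as the time derivative of a bounded operator and, as a consequence, to do the integration without losing one order in $\varepsilon$.

In view of (26) we define
$$\widetilde{B}(X) := \frac{1}{2\pi i}\oint_{\gamma(X)}d\lambda\,R_\lambda(H_e(X))^2\,P_*^\perp(X)\,(\nabla_XH_e)(X)\,R_\lambda(H_e(X))\,P_*(X). \tag{34}$$
An easy calculation shows that
$$\big[H_e,\widetilde{B}\big] = -P_*^\perp(\nabla_XP_*)P_*. \tag{35}$$
By assumption $\partial_{X_j}H_e(X)\in C^2(\mathbb{R}^n,\mathcal{L}(\mathcal{H}_e))$, $j=1,\dots,n$, hence $\widetilde{B}_j(X)\in C^2(\mathbb{R}^n,\mathcal{L}(\mathcal{H}_e))$ and thus
$$\Big[\frac{\varepsilon^2}{2}D_A^2\otimes 1,\,\widetilde{B}\Big] = -\varepsilon\,(\nabla_X\widetilde{B})\cdot(\varepsilon\nabla_X\otimes 1) + O(\varepsilon^2) = O(\varepsilon) \tag{36}$$
in the norm of $\mathcal{L}(W^{1,\varepsilon}\otimes\mathcal{H}_e,\mathcal{H})$. Equations (35) and (36) combined yield that
$$\big[H^\varepsilon,\widetilde{B}\big] = -P_*^\perp(\nabla_XP_*)P_* + O(\varepsilon)$$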
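with $O(\varepsilon)$ in the norm of $\mathcal{L}(W^{1,\varepsilon}\otimes\mathcal{H}_e,\mathcal{H})$. Since $\nabla_XH_e\in\mathcal{L}(\mathcal{H})$, a short calculation shows that $[H^\varepsilon,\varepsilon\nabla_X\otimes 1] = O(\varepsilon)$ in $\mathcal{L}(W^{1,\varepsilon}\otimes\mathcal{H}_e,\mathcal{H})$. Hence we define
$$B := \widetilde{B}\cdot(\varepsilon\nabla_X\otimes 1)$$
and obtain
$$\big[H^\varepsilon,B\big] = -P_*^\perp(\nabla_XP_*)P_*\cdot(\varepsilon\nabla_X\otimes 1) + O(\varepsilon)$$
with $O(\varepsilon)$ in the norm of $\mathcal{L}(W^{1,\varepsilon}\otimes\mathcal{H}_e,\mathcal{H})$. Let
$$B(s) = e^{iH^\varepsilon s}\,B\,e^{-iH^\varepsilon s},$$
then
$$-i\,\frac{d}{ds}B(s) = e^{iH^\varepsilon s}\,[H^\varepsilon,B]\,e^{-iH^\varepsilon s}.$$
Continuing (33), we have
$$\begin{aligned}P_*^\perp\Big(e^{-iH^\varepsilon_{diag}t/\varepsilon} - e^{-iH^\varepsilon t/\varepsilon}\Big) &= i\varepsilon\,e^{-iH^\varepsilon_{diag}t/\varepsilon}\int_0^{t/\varepsilon}ds\;e^{iH^\varepsilon_{diag}s}\,\big[H^\varepsilon,B\big]\,e^{-iH^\varepsilon s} + O(\varepsilon)(|t|+|t|^2)\\ &= \varepsilon\,e^{-iH^\varepsilon_{diag}t/\varepsilon}\int_0^{t/\varepsilon}ds\;e^{iH^\varepsilon_{diag}s}\,e^{-iH^\varepsilon s}\,\frac{d}{ds}B(s) + O(\varepsilon)(|t|+|t|^2),\end{aligned}\tag{37}$$
where $O(\varepsilon)$ holds now in the norm of $\mathcal{L}(W^{1,\varepsilon}\otimes\mathcal{H}_e,\mathcal{H})$. The additional factor of $|t|$ in (37) comes from the fact that
$$\big\|e^{-iH^\varepsilon s}\big\|_{\mathcal{L}(W^{1,\varepsilon}\otimes\mathcal{H}_e)} \le c\,(1+\varepsilon|s|) \tag{38}$$
for some constant $c<\infty$, i.e. the scaled momentum of the nuclei may grow in time. Using $\|A_{ext}\|_\infty = C<\infty$ and
$$\big\|\big[(\varepsilon D_A\otimes 1),H^\varepsilon\big]\big\|_{\mathcal{L}(\mathcal{H})} \le \widetilde{C}\,\varepsilon,$$
(38) follows from
$$\begin{aligned}\big\|(-i\varepsilon\nabla_X\otimes 1)\,e^{-iH^\varepsilon s}\psi\big\| &\le \big\|(\varepsilon D_A\otimes 1)\,e^{-iH^\varepsilon s}\psi\big\| + \big\|(\varepsilon A_{ext}\otimes 1)\,e^{-iH^\varepsilon s}\psi\big\|\\ &\le \big\|(\varepsilon D_A\otimes 1)\,\psi\big\| + \big\|\big[(\varepsilon D_A\otimes 1),e^{-iH^\varepsilon s}\big]\psi\big\| + C\,\|\psi\|\\ &\le \big\|(-i\varepsilon\nabla_X\otimes 1)\,\psi\big\| + \widetilde{C}\,\varepsilon\,|s|\,\|\psi\| + 2\,C\,\|\psi\|\end{aligned}$$
for $\psi\in W^1\otimes\mathcal{H}_e$.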
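Finally, continuing (37), integration by parts yields
$$\begin{aligned}P_*^\perp\Big(e^{-iH^\varepsilon_{diag}t/\varepsilon} - e^{-iH^\varepsilon t/\varepsilon}\Big) &= \varepsilon\,e^{-iH^\varepsilon_{diag}t/\varepsilon}\int_0^{t/\varepsilon}ds\;e^{iH^\varepsilon_{diag}s}\,e^{-iH^\varepsilon s}\,\frac{d}{ds}B(s) + O(\varepsilon)(|t|+|t|^2)\\ &= \varepsilon\Big(B\,e^{-iH^\varepsilon t/\varepsilon} - e^{-iH^\varepsilon_{diag}t/\varepsilon}\,B\Big)\\ &\quad + i\varepsilon\,e^{-iH^\varepsilon_{diag}t/\varepsilon}\int_0^{t/\varepsilon}ds\;e^{iH^\varepsilon_{diag}s}\,\big(H^\varepsilon - H^\varepsilon_{diag}\big)\,B\,e^{-iH^\varepsilon s} + O(\varepsilon)(|t|+|t|^2)\\ &= O(\varepsilon)\,(1+|t|)^3,\end{aligned}\tag{39}$$
where $O(\varepsilon)$ holds in the norm of $\mathcal{L}(W^{2,\varepsilon}\otimes\mathcal{H}_e,\mathcal{H})$. For the last equality we used that $B$ is bounded in $\mathcal{L}(W^{2,\varepsilon}\otimes\mathcal{H}_e,\mathcal{H})$ as well as in $\mathcal{L}(W^{2,\varepsilon}\otimes\mathcal{H}_e,W^{1,\varepsilon}\otimes\mathcal{H}_e)$ uniformly with respect to $\varepsilon$, that $H^\varepsilon - H^\varepsilon_{diag}$ is $O(\varepsilon)$ in $\mathcal{L}(W^{1,\varepsilon}\otimes\mathcal{H}_e,\mathcal{H})$, as we saw in (29) and (31), and that
$$\big\|e^{-iH^\varepsilon s}\big\|_{\mathcal{L}(W^{2,\varepsilon}\otimes\mathcal{H}_e)} \le c\,(1+\varepsilon|s|)^2 \tag{40}$$
for some constant $c<\infty$. Equation (40) follows from arguments similar to those used in the proof of (38).

We are left to prove (13). This follows from exactly the same proof using that $E(H^\varepsilon)$ commutes with $e^{-iH^\varepsilon s}$ and that, according to (32),
$$\big\|(\varepsilon^2\Delta_X\otimes 1)\,E(H^\varepsilon)\,\psi\big\| \le c_3\big(\|H^\varepsilon E(H^\varepsilon)\,\psi\| + \|\psi\|\big) \le c_4\,(|E|+1)\,\|\psi\|.$$

4.2. Locally isolated bands. To prove Theorem 2 we proceed along the same lines as in the previous section, with the one modification that we use Proposition 1 to control the flux out of $\partial\Lambda$. However, one cannot use $P_* = \int^\oplus_\Lambda dX\,P_*(X)$ to define $H^\varepsilon_{diag}$ anymore, because the functions in its range would not be in the range of $H^\varepsilon$ and some smoothing in the cutoff is needed. For $i\in\{0,1,2,3\}$ let $\mathbf{1}_i = \mathbf{1}_{(\Lambda-\frac{4-i}{5}\delta,\,\frac{1}{5}\delta)}$ be approximate characteristic functions according to Definition 1. Then the smoothed projections are defined with $P_i(X) = \mathbf{1}_i(X)\,P_*(X)$ as $P_i = \int^\oplus dX\,P_i(X)$. In the following it will be used that for $i<j$ we have $P_iP_j = P_jP_i = P_i$, and hence $(1-P_j)P_i = P_i(1-P_j) = 0$.

Proposition 1 yields
$$\Big(e^{-iH^\varepsilon t/\varepsilon} - U^*e^{-iH^\varepsilon_{BO}t/\varepsilon}\,U\Big)P_{\Gamma\alpha} = \Big(e^{-iH^\varepsilon t/\varepsilon} - P_1U^*e^{-iH^\varepsilon_{BO}t/\varepsilon}\,U\Big)P_{\Gamma\alpha} + O(\varepsilon). \tag{41}$$
We also make use of the fact that the phase space support of the initial wave function lies in $\Gamma$ and thus has bounded energy with respect to $H_{cl}$. Let $E := \sup_{z\in\Gamma}H_{cl}(z) < \infty$, let $\mathbf{1}_{((-\infty,E+\alpha),\alpha)}$ be a smooth characteristic function on $\mathbb{R}$ and let
$$\mathcal{E} := \big(\mathbf{1}_{((-\infty,E+\alpha),\alpha)}(H_{cl}(\cdot))\big)^{W,\varepsilon}.$$
Then standard results from semiclassical analysis imply the following relations.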
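Proposition 3.
(a) $\mathbf{1}^{W,\varepsilon}_{(\Gamma,\alpha)} = \mathcal{E}\,\mathbf{1}^{W,\varepsilon}_{(\Gamma,\alpha)} + O(\varepsilon)$;
(b) $e^{-iH^\varepsilon_{BO}t/\varepsilon}\,\mathcal{E} = \mathcal{E}\,e^{-iH^\varepsilon_{BO}t/\varepsilon} + O(\varepsilon)$ uniformly for $t\in I$;
(c) $[\,H^\varepsilon_{BO},\mathcal{E}\,] = O(\varepsilon^2)$;
(d) $\mathcal{E}\in\mathcal{L}(L^2(\mathbb{R}^n),W^{2,\varepsilon})$.
In (a)–(c) $O(\varepsilon)$ resp. $O(\varepsilon^2)$ hold in the norm of $\mathcal{L}(L^2(\mathbb{R}^n))$.

Proposition 3 (a), (c) and (d) are direct consequences of the product rule for pseudodifferential operators (see, e.g., [21,6]) and (b) is again Egorov's Theorem. Using Proposition 3 (a) and (b) we continue (41) and obtain
$$\Big(e^{-iH^\varepsilon t/\varepsilon} - P_1U^*e^{-iH^\varepsilon_{BO}t/\varepsilon}\,U\Big)P_{\Gamma\alpha} = \Big(e^{-iH^\varepsilon t/\varepsilon} - P_1U^*\,\mathcal{E}\,e^{-iH^\varepsilon_{BO}t/\varepsilon}\,U\Big)P_{\Gamma\alpha} + O(\varepsilon).$$
We proceed as in the globally isolated band case and write
$$\begin{aligned}\Big(e^{-iH^\varepsilon t/\varepsilon} - P_1U^*\,\mathcal{E}\,e^{-iH^\varepsilon_{BO}t/\varepsilon}\,U\Big)P_{\Gamma\alpha} &= -i\,e^{-iH^\varepsilon t/\varepsilon}\int_0^{t/\varepsilon}ds\;e^{iH^\varepsilon s}\Big(H^\varepsilon P_1U^*\mathcal{E} - P_1U^*\mathcal{E}\,H^\varepsilon_{BO}\Big)e^{-iH^\varepsilon_{BO}s}\,U\,P_{\Gamma\alpha}\end{aligned}$$
$$= -i\,e^{-iH^\varepsilon t/\varepsilon}\int_0^{t/\varepsilon}ds\;e^{iH^\varepsilon s}\big(H^\varepsilon - H^\varepsilon_{diag}\big)P_1U^*\,\mathcal{E}\,e^{-iH^\varepsilon_{BO}s}\,U\,P_{\Gamma\alpha} \tag{42}$$
$$\quad - i\,e^{-iH^\varepsilon t/\varepsilon}\int_0^{t/\varepsilon}ds\;e^{iH^\varepsilon s}\Big(H^\varepsilon_{diag}P_1U^*\mathcal{E} - P_1U^*\mathcal{E}\,H^\varepsilon_{BO}\Big)e^{-iH^\varepsilon_{BO}s}\,U\,P_{\Gamma\alpha}, \tag{43}$$
where
$$H^\varepsilon_{diag} := P_3\,H^\varepsilon\,P_3.$$
One can now show that (42) is bounded in norm by a constant times $\varepsilon(1+|t|)$ using exactly the same sequence of arguments as in the proof in the previous section. One must only keep track of the “hierarchy” of smoothed projections, e.g., instead of (29) one has
$$\big(H^\varepsilon - H^\varepsilon_{diag}\big)P_1 = (1-P_3)\Big[-\frac{\varepsilon^2}{2}\Delta_X\otimes 1,\,P_2\Big]P_1 + O(\varepsilon^2).$$
The adjoint part drops out completely, because this time only the difference on the band, i.e. on $\mathrm{Ran}\,P_1$, is of interest. Note also that the smoothed projections $P_i$ are bounded operators on the respective scaled Sobolev spaces and thus, according to Proposition 3 (d), all estimates hold in the norm of $\mathcal{L}(\mathcal{H})$.

It remains to show that also (43) is $O(\varepsilon)$. First note that, according to Proposition 3 (c), commuting $\mathcal{E}$ and $H^\varepsilon_{BO}$ yields an error of order $O(\varepsilon^2)$ in the integrand and thus an error of order $O(\varepsilon)$ after integration. For $\phi\in W^2$ we compute
$$\begin{aligned}(H^\varepsilon_{diag}P_1U^*\phi)(X) &= \mathbf{1}_1(X)\,E(X)\,\phi(X)\,\chi(X) + \mathbf{1}_1(X)\Big(\frac{\varepsilon^2}{2}\big(-i\nabla_X + A_{ext}\big)^2\phi\Big)(X)\,\chi(X)\\ &\quad + \varepsilon\,\mathbf{1}_1(X)\,(-i\varepsilon\nabla\phi)(X)\cdot\big(-i\langle\chi(X),\nabla_X\chi(X)\rangle_{\mathcal{H}_e}\big)\,\chi(X)\\ &\quad - i\,\varepsilon\,(\nabla\mathbf{1}_1)(X)\cdot(-i\varepsilon\nabla\phi)(X)\,\chi(X) + O(\varepsilon^2).\end{aligned}$$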
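On the other hand, again for $\phi\in W^2$,
$$\begin{aligned}(P_1U^*H^\varepsilon_{BO}\phi)(X) &= \mathbf{1}_1(X)\,E(X)\,\phi(X)\,\chi(X) + \mathbf{1}_1(X)\Big(\frac{\varepsilon^2}{2}\big(-i\nabla_X + A_{ext}\big)^2\phi\Big)(X)\,\chi(X)\\ &\quad + \varepsilon\,\mathbf{1}_1(X)\,(-i\varepsilon\nabla\phi)(X)\cdot A_{geo}(X)\,\chi(X) + O(\varepsilon^2).\end{aligned}$$
Hence
$$H^\varepsilon_{diag}P_1U^*\mathcal{E} - P_1U^*H^\varepsilon_{BO}\,\mathcal{E} = -\varepsilon\,U^*(\nabla\mathbf{1}_1)\cdot\varepsilon\nabla_X\,\mathcal{E} + O(\varepsilon^2).$$
Thus the norm of (43) is, up to an error of order $O(\varepsilon)$, bounded by the norm of
$$\varepsilon\,U^*\int_0^{t/\varepsilon}ds\;(\nabla\mathbf{1}_1)\cdot\varepsilon\nabla_X\,\mathcal{E}\,e^{-iH^\varepsilon_{BO}s}\,U\,P_{\Gamma\alpha}. \tag{44}$$
$(\nabla\mathbf{1}_1)\cdot\varepsilon\nabla_X\,\mathcal{E}$ is a bounded operator and we can apply Proposition 1 in the integrand of (44) once more, this time however with the smoothed projection $P_0$, and obtain
$$(44) = \varepsilon\,U^*\int_0^{t/\varepsilon}ds\;(\nabla\mathbf{1}_1)\cdot\varepsilon\nabla_X\,\mathcal{E}\,\mathbf{1}_0\,e^{-iH^\varepsilon_{BO}s}\,U\,P_{\Gamma\alpha} + O(\varepsilon) = O(\varepsilon). \tag{45}$$
The last equality in (45) follows from the fact that $[\varepsilon\nabla_X\mathcal{E},\mathbf{1}_0] = O(\varepsilon)$ and that $(\nabla\mathbf{1}_1)$ and $\mathbf{1}_0$ are disjointly supported.

Proof of Proposition 2. For the following calculations we continue $\chi(\cdot)\in C_b^\infty(\Lambda,\mathcal{H}_e)$ arbitrarily to a function $\chi(\cdot)\in C_b^\infty(\mathbb{R}^n,\mathcal{H}_e)$ by possibly modifying it on $\Lambda\setminus(\Lambda-\delta/2)$. For $\phi$ in a dense subset of $L^2(\Lambda-\delta)$ and $X\in\Lambda-\delta/2$, by making the substitutions $\tilde k = \varepsilon k$ and $\tilde Y = (Y-X)/\varepsilon$ and using the Taylor expansion with rest, we have:
$$\begin{aligned}\big((a^{W,\varepsilon}\otimes 1)\,\phi\chi\big)(X) &= (2\pi)^{-n}\int dY\,dk\;a\Big(\frac{X+Y}{2},\varepsilon k\Big)\,e^{-i(X-Y)\cdot k}\,\phi(Y)\,\chi(Y)\\ &= (2\pi)^{-n}\int d\tilde Y\;\hat a^{(2)}\Big(X+\frac{\varepsilon}{2}\tilde Y,-\tilde Y\Big)\,\phi(X+\varepsilon\tilde Y)\,\chi(X)\\ &\quad + \varepsilon\,(2\pi)^{-n}\int d\tilde Y\;\hat a^{(2)}\Big(X+\frac{\varepsilon}{2}\tilde Y,-\tilde Y\Big)\,\phi(X+\varepsilon\tilde Y)\;\tilde Y\cdot\nabla_X\chi\big(f(X,\varepsilon\tilde Y)\big)\\ &= \big(U^*a^{W,\varepsilon}\,U\,\phi\chi\big)(X) + R^\varepsilon.\end{aligned}\tag{46}$$
From (46) we conclude that
$$\big\|\big(\mathbf{1}_{\Lambda-\delta/2}(\cdot)\otimes 1\big)\big(a^{W,\varepsilon}\otimes 1 - U^*a^{W,\varepsilon}U\big)P_{\Lambda-\delta}\big\| \le \|R^\varepsilon\|.$$
Since
$$\begin{aligned}&\big(1-\mathbf{1}_{\Lambda-\delta/2}(\cdot)\otimes 1\big)\big(a^{W,\varepsilon}\otimes 1 - U^*a^{W,\varepsilon}U\big)P_{\Lambda-\delta}\\ &\qquad = \big(1-\mathbf{1}_{\Lambda-\delta/2}(\cdot)\otimes 1\big)\big(a^{W,\varepsilon}\otimes 1 - U^*a^{W,\varepsilon}U\big)\big(\mathbf{1}_{\Lambda-\delta}(\cdot)\otimes 1\big)P_{\Lambda-\delta} = O(\varepsilon^n)\end{aligned}$$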
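for arbitrary $n$, Proposition 2 follows by showing that $R^\varepsilon$ is of order $\varepsilon$:
$$\begin{aligned}\|R^\varepsilon\| &\le \varepsilon\,(2\pi)^{-n}\,\Big\|\int d\tilde Y\;\hat a^{(2)}\Big(\cdot+\frac{\varepsilon}{2}\tilde Y,-\tilde Y\Big)\,\phi(\cdot+\varepsilon\tilde Y)\;\tilde Y\cdot\nabla_X\chi\big(f(\cdot,\varepsilon\tilde Y)\big)\Big\|_{\mathcal{H}}\\ &\le \varepsilon\,(2\pi)^{-n}\,\sup_{X\in\mathbb{R}^n}\|(\nabla_X\chi)(X)\|_{\mathcal{H}_e}\times\Big\|\int d\tilde Y\;\hat a^{(2)}\Big(\cdot+\frac{\varepsilon}{2}\tilde Y,-\tilde Y\Big)\,|\tilde Y|\,\phi(\cdot+\varepsilon\tilde Y)\Big\|_{L^2(\mathbb{R}^n)}\\ &\le \varepsilon\,C\,\|\phi\|_{L^2(\mathbb{R}^n)}\int d\tilde Y\,\sup_{X\in\mathbb{R}^n}|\tilde Y|\,|\hat a^{(2)}(X,\tilde Y)| = \varepsilon\,\widetilde{C}\,\|\phi\chi\|_{\mathcal{H}}.\end{aligned}$$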
Acknowledgement. We are grateful to André Martinez and Gheorghe Nenciu for explaining to us their work in great detail. S. T. would like to thank George Hagedorn for stimulating discussions and, in particular, for helpful advice on questions concerning the Berry connection and Markus Klein and Ruedi Seiler for explaining their treatment of Coulomb singularities. We thank Caroline Lasser and Gianluca Panati for careful reading of the manuscript and the referee for pointing out Reference [12].
References

1. Avron, J.E. and Elgart, A.: Adiabatic theorems without a gap condition. Commun. Math. Phys. 203, 445–463 (1999)
2. Bornemann, F. and Schütte, C.: On the singular limit of the quantum-classical molecular dynamics model. SIAM J. Appl. Math. 59, 1208–1224 (1999)
3. Born, M. and Oppenheimer, R.: Zur Quantentheorie der Molekeln. Ann. Phys. (Leipzig) 84, 457–484 (1927)
4. Bouzouina, A. and Robert, D.: Uniform semi-classical estimates for the propagation of Heisenberg observables. Math. Phys. Preprint Archive mp_arc 99–409 (1999)
5. Combes, J.-M., Duclos, P., Seiler, R.: The Born–Oppenheimer approximation. In: Rigorous Atomic and Molecular Physics, eds. G. Velo, A. Wightman, Plenum: New York, 1981, pp. 185–212
6. Dimassi, M. and Sjöstrand, J.: Spectral Asymptotics in the Semi-Classical Limit. London Mathematical Society Lecture Note Series 268, Cambridge: Cambridge University Press, 1999
7. Fermanian Kammerer, C. and Gérard, P.: A Landau–Zener formula for two-scaled Wigner measures. Preprint (2001)
8. Hagedorn, G.A.: High order corrections to the time-independent Born–Oppenheimer approximation I: Smooth potentials. Ann. Inst. H. Poincaré Sect. A 47, 1–19 (1987)
9. Hagedorn, G.A.: A time dependent Born–Oppenheimer approximation. Commun. Math. Phys. 77, 1–19 (1980)
10. Hagedorn, G.A. and Joye, A.: A time-dependent Born–Oppenheimer approximation with exponentially small error estimates. Math. Phys. Preprint Archive mp_arc 00-209 (2000)
11. Hagedorn, G.A.: Molecular Propagation Through Electronic Eigenvalue Crossings. Memoirs Am. Math. Soc. 536 (1994)
12. Herrin, J. and Howland, J.S.: The Born–Oppenheimer approximation: Straight-up and with a twist. Rev. Math. Phys. 9, 467–488 (1997)
13. Hövermann, F., Spohn, H., Teufel, S.: Semiclassical limit for the Schrödinger equation with a short scale periodic potential. Commun. Math. Phys. 215, 609–629 (2001)
14. Joye, A. and Pfister, C.-E.: Quantum adiabatic evolution. In: On Three Levels, eds. M. Fannes, C. Maes, A. Verbeure, New York: Plenum, 1994, pp. 139–148
15. Kato, T.: On the adiabatic theorem of quantum mechanics. Phys. Soc. Jap. 5, 435–439 (1958)
16. Klein, M., Martinez, A., Seiler, R., Wang, X.P.: On the Born–Oppenheimer expansion for polyatomic molecules. Commun. Math. Phys. 143, 607–639 (1992)
17. Lions, P.L. and Paul, T.: Sur les mesures de Wigner. Revista Mathematica Iberoamericana 9, 553–618 (1993)
18. Martinez, A. and Sordoni, V.: On the time-dependent Born–Oppenheimer approximation with smooth potential. Math. Phys. Preprint Archive mp_arc 01-37 (2001)
19. Mead, V. and Truhlar, D.G.: On the determination of Born–Oppenheimer nuclear motion wave functions including complications due to conical intersections and identical nuclei. J. Chem. Phys. 70, 2284–2296 (1979)
20. Nenciu, G. and Sordoni, V.: Semiclassical limit for multistate Klein-Gordon systems: Almost invariant subspaces and scattering theory. Math. Phys. Preprint Archive mp_arc 01-36 (2001)
21. Robert, D.: Autour de l’Approximation Semi-Classique. Progress in Mathematics, Volume 68, Basel–Boston: Birkhäuser, 1987
22. Shapere, A. and Wilczek, F. (eds): Geometric Phases in Physics. Singapore: World Scientific, 1989
23. Teufel, S. and Spohn, H.: Semiclassical motion of dressed electrons. Preprint ArXiv.org math-ph/0010009, to appear in Rev. Math. Phys. (2001)
24. Teufel, S.: Adiabatic decoupling for perturbations of fibered Hamiltonians. In preparation

Communicated by B. Simon
Commun. Math. Phys. 224, 133 – 152 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Resonance Theory for Schrödinger Operators

O. Costin, A. Soffer
Department of Mathematics, Rutgers University, Piscataway, NJ 08854-8019, USA

Received: 10 November 2000 / Accepted: 5 September 2001

Dedicated to J. L. Lebowitz, on the occasion of his 70th birthday

Abstract: Resonances which result from perturbation of embedded eigenvalues are studied by time dependent methods. A general theory is developed, with new and weaker conditions, allowing for perturbations of threshold eigenvalues and relaxed Fermi Golden rule. The exponential decay rate of resonances is addressed; its uniqueness in the time dependent picture is shown in certain cases. The relation to the existence of meromorphic continuation of the properly weighted Green’s function to time dependent resonance is further elucidated, by giving an equivalent time dependent asymptotic expansion of the solutions of the Schrödinger equation.

1. Introduction and Results

1.1. General remarks. Resonances may be defined in different ways, but usually refer to metastable behavior (in time) of the corresponding system. The standard physics definition would be as “bumps” in the scattering cross section, or exponentially decaying states in time, or poles of the analytically continued S matrix (when such an extension exists). Mathematically, in the last 25 years one uses a definition close to the above, by defining $\lambda$ to be a resonance (energy) if it is the pole of the meromorphic continuation of the weighted Green’s function $\chi(H-z)^{-1}\chi$ with suitable weights $\chi$ (usually, in the Schrödinger Theory context, $\chi$ will be a $C_0^\infty$ function). Here $H$ is the Hamiltonian of the system. In many cases the equivalence of some of the above definitions has been shown [1–3]. However, the exponential behavior in time, and the correct estimates on the remainder are difficult to produce in general [21]. It is also not clear how to relate the time behavior to a resonance, uniquely, and whether “analytic continuation” plays a fundamental role; see the review [5]. Important progress
on such relations has recently been obtained; Orth [6] considered the time dependent behavior of states which can be related to resonances without the assumption of analytic continuation and established some preliminary estimates on the remainder terms. Then, Hunziker [7] was able to develop a quite general relation between resonances defined via poles of analytic continuations in the context of Balslev–Combes theory, to exponential decay in time, governed by the standard Fermi Golden rule. Here the resonances were small perturbations of embedded eigenvalues. In [1] a definition of resonance in a time dependent way is given and it is shown to agree with the one resulting from analytic continuation when it exists, in the Balslev–Combes theory. They also get exponential decay and estimates on the remainder terms. Exact solutions, including the case of large perturbations, for time dependent potentials have recently been obtained in [8]. Further notable results on the time dependent behavior of the wave equation were proved by Tang and Zworski [9]. The construction of states which resemble resonances, and thus decay approximately exponentially was accomplished e.g. in [10]. For resonance theory based on the Balslev–Combes method the reader is referred to the book [21] and its comprehensive bibliography on the subject. Then, in a time-dependent approach to perturbation of embedded eigenvalues developed in [11] exponential decay and dispersive estimates on the remainder terms were proved in a general context, without the assumption of analytic continuation.

When an embedded eigenvalue is slightly perturbed, we generally get a “resonance”. One then expects the solution at time $t$ to be a sum of an exponentially decaying term plus a small term (in the perturbation size) which, however, decays slowly. The lifetime of the resonance is given by $\Gamma^{-1}$, where $\Gamma$, the probability of decay per unit time, enters in the exponential decay rate $p(t)\sim e^{-\Gamma t/\hbar}$.

If an analytic continuation of $\chi(H_0-z)^{-1}\chi$ exists in a neighborhood of an embedded eigenvalue, then $\Gamma = -2\,\Im z_0$, and a resonance $z_0$ is defined as the pole of the analytic continuation of $\chi(H-z)^{-1}\chi$. In this case, $\Gamma$ has the following expansion in $\varepsilon$:
$$\Gamma(\lambda_0,\varepsilon) = \varepsilon^2\gamma(\lambda_0,\varepsilon) + o(\varepsilon^2).$$
The expression for $\gamma(\lambda_0,\varepsilon)$ is called the Fermi Golden Rule (FGR). A remarkable fact is that this expansion is defined even when analytic continuation does not exist. Previous works on the existence of resonances required that $\gamma(\lambda_0,\varepsilon) > 0$ as $\varepsilon\to0$. This condition is sometimes hard to verify, and in the present work we remove this assumption.

1.2. Outline of new results. In this work we improve the theory of perturbation of embedded eigenvalues and resonances in three main directions: First, the Fermi Golden Rule condition, which originally required as above (sometimes implicitly) that $\Gamma > C\varepsilon^2$ as $\varepsilon\to0$, is removed. We show that under (relatively weak) conditions of regularity of the resolvent of the unperturbed Hamiltonian all that is needed is that $\Gamma>0$. The price one sometimes has to pay is that it may be needed to evaluate $\Gamma$ at a nearby point of the eigenvalue $\lambda_0$ of the unperturbed Hamiltonian (see (3)). In cases of very low regularity of the unperturbed resolvent, we need in general $\Gamma > C\varepsilon^m$, with $m>2$; $m$ becomes larger if more regularity of the resolvent is provided; cf. (1) and (2) below. The second main improvement relative to known results in resonance theory is that we only require $H^\eta$ regularity (see Sect. 2.1), with $\eta>0$, of the unperturbed resolvent near
the relevant energy. Most works on resonance require analyticity; the recent works [6, 11, 21] require $H^\eta$ regularity with $\eta>1$. This improvement is important to perturbations of embedded eigenvalues at thresholds (e.g., our condition is satisfied by $H_0 = -\Delta$ at $\lambda_0 = 0$ in three or more dimensions, while the previous results only apply to five or more dimensions). As a third contribution we indicate that under conditions of analytic continuation and with suitable cutoff, the term $e^{-\Gamma t}$ can be separated from the solution and the remainder term is given by an asymptotic series in $t^{-a}$, $a>0$, times a stretched exponential $e^{-t^b}$, with $b<1$, see Sect. 5. Our analyticity assumptions are weaker and thus apply in cases of threshold eigenvalues where standard complex deformation approaches could fail. Furthermore we replace analytic perturbation methods by more general complex theory arguments.

As concrete examples of applications we outline the following two classes of problems:

(1) In many applications $H_0 = -\Delta\oplus H_1$, where $H_1$ has a discrete spectrum (see e.g. [21]); if $H_1\psi_0 = 0$ has a solution, then $H_0$ has an embedded eigenvalue at the threshold, since $\sigma(-\Delta) = [0,\infty)$. In this case the known analytic methods do not apply; the methods of [6] apply when $\eta>1$, which is the case of the Laplacian on $L^2(\mathbb{R}^N)$ if $N\ge5$. The results of this paper apply down to $N=3$.

(2) The Hamiltonians one gets by linearizing a nonlinear dispersive completely integrable equation around an exact solution have an embedded eigenvalue corresponding to the soliton/breather, etc. Small perturbations of such completely integrable equations then produce a perturbation problem of embedded eigenvalues with selfconsistent potential $W$. In these cases the size of $\Gamma$ is typically of higher order in $\varepsilon^2$ and in certain cases it is even $O(e^{-1/\varepsilon})$. Hence the previous works are not applicable since they require a lower bound $O(\varepsilon^2)$ on $\Gamma$.
Our approach follows the setup of the time dependent theory of [11], combined with Laplace transform techniques. It is expected to generalize to the $N$-body case following [12]. We will follow, in part, the notation of [11]. The analysis in this work utilizes in some ways this framework, but generalizes the results considerably: the required time decay is $O(t^{-1-\eta})$ and we remove here the assumption of lower bound on $\Gamma$; it is replaced by
$$\Gamma \ge C\varepsilon^{\frac{2}{1-\eta}} \tag{1}$$
when $\eta<1$, and
$$\Gamma > 0,\ \text{arbitrary} \tag{2}$$
when $\eta>1$.

Whenever a meromorphic continuation of the S-matrix or Green’s function exists, the poles give an unambiguous definition of “resonance”. A time dependent approach or other definitions are less precise, not necessarily unique, as was observed in [6], but usually apply in more general situations, where analytic continuation is either hard to prove or not available. We provide some information about defining resonance by time dependent methods and its relation to the existence of “analytic continuation”. In particular, we will show that in general one can find the exponential decay rate up to higher order corrections depending on $\eta$ and $\varepsilon$.
In case it is known that analytic continuation exists, our approach provides a definition of a unique resonance corresponding to the perturbed eigenvalue. It is given by the solution of some transcendental equation in the complex plane and it also corresponds to a pole of the weighted Green’s function.

2. Main Results

We begin with some definitions. Given $H_0$, a self-adjoint operator on $\mathcal{H} = L^2(\mathbb{R}^n)$, we assume that $H_0$ has a simple eigenvalue $\lambda_0$ with normalized eigenvector $\psi_0$:
$$H_0\psi_0 = \lambda_0\psi_0,\qquad \|\psi_0\| = 1. \tag{3}$$
Our interest is to describe the behavior of solutions of
$$i\frac{\partial\phi}{\partial t} = H\phi,\qquad H := H_0 + \varepsilon W(\varepsilon), \tag{4}$$
where $\varepsilon$ is a small parameter, taken to be the size of the perturbation in an appropriate norm (cf. e.g. (8)), $\phi(0) = E_\Delta\phi_0$, where $E_\Delta$ is the spectral projection of $H$ on the interval $\Delta$ and $\Delta$ is a small interval around $\lambda_0$. (Note that $W(\varepsilon)$ depends on $\varepsilon$ in general, and may not even have a limit as $\varepsilon\to0$.) Furthermore, we will describe, in some cases, the analytic structure of $(H-z)^{-1}$ in a neighborhood of $\lambda_0$. $W$ is a symmetric perturbation of $H_0$, such that $H$ is self-adjoint with the same domain as $H_0$.

For an operator $A$, $\|A\|$ denotes its norm as an operator from $L^2$ to itself. We interpret functions of a self-adjoint operator as being defined by the spectral theorem. In the special case where the operator is $H_0$, we omit the argument, i.e., $g(H_0) = g$.

For an open interval $\Delta$, we denote an appropriate smoothed characteristic function of $\Delta$ by $g_\Delta(\lambda)$. In particular, we shall take typically $g_\Delta(\lambda)$ to be a nonnegative $C^\infty$ function, which is equal to one on $\Delta$ and zero outside a neighborhood of $\Delta$. The support of its derivative is furthermore chosen to be small compared to the size of $\Delta$. We further require that $|g_\Delta^{(n)}(\lambda)| \le c_n|\Delta|^{-n}$, $n\ge1$.

$P_0$ denotes the projection on $\psi_0$, i.e., $P_0f = (\psi_0,f)\,\psi_0$. $P_{1b}$ denotes the spectral projection on $\mathcal{H}_{pp}\cap\{\psi_0\}^\perp$, the pure point spectral part of $H_0$ orthogonal to $\psi_0$. That is, $P_{1b}$ projects onto the subspace of $\mathcal{H}$ spanned by all the eigenstates other than $\psi_0$. In our treatment, a central role is played by the subset of the spectrum of the operator $H_0$, $T^\sharp$, on which a sufficiently rapid local decay estimate holds. For a decay estimate to hold for $e^{-iH_0t}$, one must certainly project out the bound states of $H_0$, but there may be other obstructions to rapid decay. In scattering theory these are called threshold energies. Examples of thresholds are: (i) points of stationary phase of a constant coefficient principal symbol for two body Hamiltonians and (ii) for $N$-body Hamiltonians, zero and eigenvalues of subsystems.

We will not give a precise definition of thresholds. For us it is sufficient to say that away from thresholds the favorable local decay estimates for $H_0$ hold. Let $\Delta_*$ be a union of intervals, disjoint from $\Delta$, containing a neighborhood of infinity and all thresholds of $H_0$ except possibly those in a small neighborhood of $\lambda_0$. We then let
$$P_1 = P_{1b} + g_{\Delta_*},$$
where $g_{\Delta_*} = g_{\Delta_*}(H_0)$ is a smoothed characteristic function of the set $\Delta_*$. We also define for $x\in\mathbb{R}^n$
$$\langle x\rangle^2 = 1+|x|^2,\qquad \overline{Q} = I-Q,\qquad\text{and}\qquad P_c^\sharp = I-P_0-P_1. \tag{5}$$
Thus, $P_c^\sharp$ is a smoothed out spectral projection of the set $T^\sharp$ defined as
$$T^\sharp = \sigma(H_0)\setminus\{\text{eigenvalues, real neighborhoods of thresholds and infinity}\}. \tag{6}$$
We expect $e^{-iH_0t}$ to satisfy good local decay estimates on the range of $P_c^\sharp$ (see (H4) below).

2.1. Hypotheses on $H_0$. We assume $H^\eta$ regularity for $H_0$. By this we mean that $(\psi,(H_0-z)^{-1}\phi)$ is in the Hölder space of order $\eta$, $H^\eta$, in the $z$ variable for $z$ near the relevant energy. Here $\psi$, $\phi$ are in the dense set $\{\phi\in L^2:\ \langle x\rangle^\sigma\phi\in L^2\}$.

(H1) $H_0$ is a self-adjoint operator with dense domain $D$, in $L^2(\mathbb{R}^n)$.
(H2) $\lambda_0$ is a simple embedded eigenvalue of $H_0$ with (normalized) eigenfunction $\psi_0$.
(H3) There is an open interval $\Delta$ containing $\lambda_0$ and no other eigenvalue of $H_0$.
(H4) Local decay estimate: Let $r>1$. There exists $\sigma>0$ such that if $\langle x\rangle^\sigma f\in L^2$ then
$$\big\|\langle x\rangle^{-\sigma}e^{-iH_0t}P_c^\sharp f\big\|_2 \le C\,\langle t\rangle^{-r}\,\big\|\langle x\rangle^\sigma f\big\|_2. \tag{7}$$
(H5) By appropriate choice of a real number $c$, the $L^2$ operator norm of $\langle x\rangle^\sigma(H_0+c)^{-1}\langle x\rangle^{-\sigma}$ can be made sufficiently small.

Remarks.
(i) We have assumed that $\lambda_0$ is a simple eigenvalue to simplify the presentation. Our methods can be easily adapted to the case of multiple eigenvalues.
(ii) Note that $\Delta$ does not have to be small and that $\Delta_*$ can be chosen as necessary, depending on $H_0$.
(iii) In certain cases, the above local decay conditions can be proved even when $\lambda_0$ is a threshold; see [13].
(iv) Regarding the verification of the local decay hypothesis, one approach is to use techniques based on the Mourre estimate [14–16]. If $\Delta$ contains no threshold values, then quite generally, the bound (7) holds with $r$ arbitrary and positive.

We now specify the conditions we require of the perturbation, $W$.

Conditions on $W$.
(W1) $W$ is symmetric and $H = H_0+\varepsilon W$ is self-adjoint on $D$ and there exists $c\in\mathbb{R}$ (which can be used in (H5)), such that $c$ lies in the resolvent sets of $H_0$ and $H$.
(W2) For the same $\sigma$ as in (H4) and (H5) we have:
$$|||W||| := \big\|\langle x\rangle^{2\sigma}Wg_\Delta(H_0)\big\| + \big\|\langle x\rangle^{\sigma}Wg_\Delta(H_0)\langle x\rangle^{\sigma}\big\| + \big\|\langle x\rangle^{\sigma}W(H_0+c)^{-1}\langle x\rangle^{-\sigma}\big\| < \infty$$
and
$$\big\|\langle x\rangle^{\sigma}W(H_0+c)^{-1}\langle x\rangle^{\sigma}\big\| < \infty. \tag{8}$$
(W3) Resonance condition (nonvanishing of the Fermi golden rule): For a suitable choice of $\lambda$ (which will be made precise later)
$$\Gamma(\lambda,\varepsilon) := \Gamma(\lambda) := \pi\varepsilon^2\,\big(W(\varepsilon)\psi_0,\,\delta(H_0-\lambda)(I-P_0)\,W(\varepsilon)\psi_0\big) \ne 0. \tag{9}$$
In most cases $\Gamma = \Gamma(\lambda_0)$. But in the case $\Gamma$ is very small it turns out that the “correct” $\Gamma$ will be $\Gamma(\lambda_0+\delta)$ with $\delta$ given in the proof of Proposition 12. See also Sect. S4.

The main results of this paper are summarized in the following theorem.

Theorem 1. Let $H_0$ satisfy the conditions (H1)...(H5) and the perturbation satisfy the conditions (W1)...(W3). Assume moreover that $\varepsilon$ is sufficiently small and either:
(i) $H_0$ has regularity as in Sect. 2.1 with $\eta>1$, or
(ii) we have lower regularity $0<\eta<1$ supplemented by the conditions $\Gamma > C\varepsilon^n$, $n\ge2$, and $\eta > \frac{n-2}{n}$.
Then
a) $H = H_0+\varepsilon W$ has no eigenvalues in $\Delta$.
b) The spectrum of $H$ is purely absolutely continuous in $\Delta$, and
$$\big\|\langle x\rangle^{-\sigma}e^{-iHt}g_\Delta(H)\,\Phi_0\big\|_2 \le C\,\langle t\rangle^{-1-\eta}\,\big\|\langle x\rangle^{\sigma}\Phi_0\big\|_2. \tag{10}$$
c) For $t\ge0$ we have
$$e^{-iHt}g_\Delta(H)\,\Phi_0 = (I+A_W)\Big(e^{-i\omega_*t}a(0)\,\psi_0 + e^{-iH_0t}\phi_d(0)\Big) + R(t), \tag{11}$$
where AW := K(I − K)−1 − I and K is an integral operator defined in (35) and 1. if η < 1 and → 0 with t fixed we have R(t) = O( 2 η−1 ) while as t → ∞ we have R(t) = O( −1 t −η−1 ), 2. for η > 1 we have R(t) = O( 2 t −η+1 ), 3. AW ≤ C|||W |||,
(12)
a(0) and φd (0) are determined by the initial data. The complex frequency ω∗ is given by −iω∗ = −is0 − , where s0 solves the equation s0 + ω + 2 {F (, is0 )} = 0
(13)
(see (47) and (49) below) and 4.
= 2 {F (, is0 )} .
(14)
Remark. ω∗ can be found by solving the transcendental equation (13) by either expansion or iteration if sufficient regularity is present (see also Proposition 12 and note following it and Lemma 18).
2.2. Sketch of the proof of Theorem 1. The proof of Theorem 1 is given in Sects. 3 and 4. Section 3 prepares the ground for the proof, Subsect. 4.1 provides key definitions, while Subsects. 4.2 and 4.3 contain the proof of Theorem 1 (ii) and (i) respectively. As an intuitive guideline, the solution $\phi(t)$ of the time dependent problem is decomposed into the projection $a(t)\psi_0$ on the eigenfunction of $H_0$ and a remainder (see (18)). The remainder is estimated from the detailed knowledge of $a(t)$ (see (34) and (39)). Thus it is essential to control $a(t)$; once that is done, parts (a) and (b) follow from Proposition 4; this $a(t)$ satisfies an integral equation, cf. (43). We chiefly use the Tauberian type duality between the large $t$ behavior of $a(t)$ and the regularity properties of its Laplace transform, cf. Proposition 9 and also Eq. (55). Then, an essential ingredient in the proof of the estimate (11) is Proposition 15. When enough regularity is present, no lower bound on $\Gamma > 0$ is imposed; Proposition 16 and Proposition 17 are key ingredients here.

2.3. Further results.

Lemma 2. Assuming the conditions of Theorem 1 with $\eta > 1$, we have
$$\omega_* = \lambda_0 + (\psi_0, W\psi_0) + (\Lambda + i\Gamma) + o(\varepsilon^{2}), \qquad (15)$$
where
$$\Lambda = \varepsilon^{2}\,\big(W\psi_0,\ \mathrm{P.V.}(H_0-\lambda_0)^{-1}W\psi_0\big), \qquad (16)$$
$$\Gamma = \pi\varepsilon^{2}\,\big(W\psi_0,\ \delta(H_0-\lambda_0)(I-P_0)W\psi_0\big). \qquad (17)$$
This follows from the proof of Proposition 12 and the Remarks below it.

3. Decomposition and Isolation of Resonant Terms

We begin with the following decomposition of the solution of (4):
$$e^{-iHt}\phi_0 = \phi(t) = a(t)\psi_0 + \tilde\phi(t), \qquad (18)$$
$$(\psi_0, \tilde\phi(t)) = 0, \quad -\infty < t < \infty. \qquad (19)$$
Substitution into (4) yields
$$i\partial_t\tilde\phi = H_0\tilde\phi + W\tilde\phi - (i\partial_t a - \lambda_0 a)\psi_0 + aW\psi_0. \qquad (20)$$
Recall now that $I = P_0 + P_1 + P_c^{\sharp}$. Taking the inner product of (20) with $\psi_0$ gives the amplitude equation:
$$i\partial_t a = \big(\lambda_0 + (\psi_0, W\psi_0)\big)a + (\psi_0, WP_1\tilde\phi) + (\psi_0, W\phi_d), \qquad (21)$$
where
$$\phi_d := P_c^{\sharp}\tilde\phi. \qquad (22)$$
The following equation for $\phi_d$ is obtained by applying $P_c^{\sharp}$ to Eq. (20):
$$i\partial_t\phi_d = H_0\phi_d + P_c^{\sharp}W(P_1\tilde\phi + \phi_d) + aP_c^{\sharp}W\psi_0. \qquad (23)$$
To derive a closed system for $\phi_d(t)$ and $a(t)$ we now propose to obtain an expression for $P_1\tilde\phi$, to be used in Eqs. (21) and (23). Since $g_\Delta(H)\phi(\cdot,t) = \phi(\cdot,t)$ we find
$$(I - g_\Delta(H))\phi = (I - g_\Delta(H))\big(a\psi_0 + P_1\tilde\phi + P_c^{\sharp}\tilde\phi\big) = 0 \qquad (24)$$
or
$$(I - g_\Delta(H)g_I(H_0))P_1\tilde\phi = -\bar g_\Delta(H)\big(a\psi_0 + \phi_d\big), \qquad (25)$$
where $g_I(\lambda)$ is a smooth function which is identically equal to one on the support of $P_1(\lambda)$, and which has support disjoint from $\Delta$. Therefore
$$P_1\tilde\phi = -B\,\bar g_\Delta(H)(a\psi_0 + \phi_d), \qquad (26)$$
where
$$B = (I - g_\Delta(H)g_I(H_0))^{-1}. \qquad (27)$$
This computation is justified in Appendix B of [11]. The following was also shown there:

Proposition 3 ([11]). For small $\varepsilon$, the operator $B$ in (27) is a bounded operator on $\mathcal{H}$.

From (26) we get
$$\phi(t) = a(t)\psi_0 + \phi_d + P_1\tilde\phi = \tilde g_\Delta(H)\big(a(t)\psi_0 + \phi_d(t)\big), \qquad (28)$$
with
$$\tilde g_\Delta(H) := I - B\,\bar g_\Delta(H) = B\,g_\Delta(H)(I - g_I(H_0)), \qquad (29)$$
where $\bar g_\Delta := 1 - g_\Delta$; see (5). Although $\tilde g_\Delta(H)$ is not really defined as a function of $H$, we indulge in this mild abuse of notation to emphasize its dependence on $H$. In fact, in some sense, $\tilde g_\Delta(H) \sim g_\Delta(H)$ to higher order in $\varepsilon$ [11]. Substitution of (26) into (23) gives
$$i\partial_t\phi_d = H_0\phi_d + aP_c^{\sharp}W\tilde g_\Delta(H)\psi_0 + P_c^{\sharp}W\tilde g_\Delta(H)\phi_d \qquad (30)$$
and
$$i\partial_t a = \big[\lambda_0 + (\psi_0, W\tilde g_\Delta(H)\psi_0)\big]a + (\psi_0, W\tilde g_\Delta(H)\phi_d) = \omega a + (\omega_1 - \omega)a + (\psi_0, W\tilde g_\Delta(H)\phi_d), \qquad (31)$$
where
$$\omega = \lambda_0 + (\psi_0, W\psi_0), \qquad (32)$$
$$\omega_1 = \lambda_0 + (\psi_0, W\tilde g_\Delta(H)\psi_0). \qquad (33)$$
We will later need the equivalent integral representation of the solution of (30):
$$\phi_d(t) = e^{-iH_0t}\phi_d(0) - i\int_0^t e^{-iH_0(t-s)}a(s)P_c^{\sharp}W\tilde g_\Delta(H)\psi_0\,ds - i\int_0^t e^{-iH_0(t-s)}P_c^{\sharp}W\tilde g_\Delta(H)\phi_d\,ds. \qquad (34)$$
This was also used to prove the following statement.
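Proposition 4 ([11]). Suppose $|a(t)| \le a_\infty\langle t\rangle^{-1-\alpha}$ and assume that $\eta > 0$ and $\alpha \ge \eta$. Then for some $C > 0$ we have
$$\|\langle x\rangle^{-\sigma}\phi_d(t)\|_{L^2} \le C\langle t\rangle^{-1-\eta}\big(\|\langle x\rangle^{\sigma}\phi_d(0)\|_{L^2} + a_\infty|||W|||\big).$$
Note. The proposition, as we mentioned, implies parts (a) and (b) of the main theorem, given the properties of $a(t)$ which will be shown in the sequel. The absolute continuity stated in the theorem follows from (10) with $\eta > 0$.

We define $K$ as an operator acting on $C(\mathbb{R}^+, \mathcal{H})$, the space of continuous functions on $\mathbb{R}^+$ with values in $\mathcal{H}$, by
$$(Kf)(t,x) = \int_0^t e^{-iH_0(t-s)}P_c^{\sharp}W\tilde g_\Delta(H)f(s,x)\,ds. \qquad (35)$$
We introduce on $C(\mathbb{R}^+, \mathcal{H})$ the norm
$$\|f\|_\beta = \sup_{t\ge 0}\langle t\rangle^{\beta}\|f(\cdot,t)\|_{\mathcal{H}} \qquad (36)$$
and define the operator norm
$$\|A\|_{\beta;\sigma} = \sup_{\|f\|_\beta\le 1}\|\langle x\rangle^{-\sigma}A\langle x\rangle^{\sigma}f\|_\beta. \qquad (37)$$
The above definitions directly imply the following.

Proposition 5. If $\varepsilon$ is small, $0 \le \beta \le r$, $r > 1$ and for some $\beta_1 > 0$ we have $\|\langle x\rangle^{-\sigma}e^{-iH_0t}P_c\langle x\rangle^{-\sigma}\| \le C\langle t\rangle^{-1-\beta_1}$, then for $0 \le \beta \le \beta_1$ we have
$$\|K\|_{\beta;\sigma} \le C_{\beta;\sigma;r}. \qquad (38)$$
The proof uses the smallness of $\varepsilon$, which in turn entails the boundedness of $\langle x\rangle^{-\sigma}\tilde g_\Delta(H)\langle x\rangle^{\sigma}$. Using the definition of $K$ given above we see that $K(1-K)^{-1} = \sum_{n=1}^{\infty}K^n$ is also bounded. We can now rewrite the equations for $\phi_d$ as
$$\phi_d(t) = e^{-iH_0t}\phi_d(0) + K(a(t)\psi_0) + K\phi_d = (I-K)^{-1}\big(K(a(t)\psi_0) + e^{-iH_0t}\phi_d(0)\big) \qquad (39)$$
(recall that we defined $A_W = -I + (I-K)^{-1}K$) and therefore
$$i\partial_t a = \omega_1 a + \big(\psi_0, W\tilde g_\Delta(H)(I-K)^{-1}K(a\psi_0)\big) + \big(\psi_0, W\tilde g_\Delta(H)(I-K)^{-1}e^{-iH_0t}\phi_d(0)\big). \qquad (40)$$
To complete the proof of Theorem 1 we need to estimate the large time behavior of $a(t)$ solving Eq. (40). Since the inhomogeneous term satisfies the required decay $O(t^{-1-\eta})$ by our assumptions on $H_0$, it is sufficient to study the associated homogeneous equation. Equivalently, we may choose the embedded eigenfunction as initial condition (that is, $\phi_d(0) = 0$). We now define two operators on $L^\infty$ by
$$\tilde j(a) = \big(v, \langle x\rangle^{-\sigma}K(a\psi_0)\big), \quad\text{where } v = \langle x\rangle^{\sigma}W\tilde g_\Delta(H)\psi_0 \qquad (41)$$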
and
j (a) = v, x−σ (I − K)−1 K(aψ0 ) .
(42)
Proposition 6. The operators j˜ and j are bounded from L∞ into itself. The proposition follows from Proposition 5 with β = 0. Remark. The equation for a can now be written in the equivalent integral form t −iωt −iωt a(t) = a(0)e +e eiωs j (a)(s)ds := a(0)e−iωt + J (a).
(43)
0 ∞ Definition 1. Consider the spaces L∞ T ;ν and Lν to be the spaces of functions on [0, T ] + and R respectively, in the norm
a ν = sup |e−νs a(s)|
(44)
s
Remark 7. We note that for T ∈ R+ , the norm on L∞ T ;ν is equivalent to the usual norm on L∞ [0, T ]. Proposition 8. For some constants c, C and c˜ independent of T we have j a ν ≤ cν −1 2 a ν , J a ν ≤ Cν −2 2 a ν and j˜a ν ≤ cν ˜ −1 2 a ν , and thus j , J , and j˜ ∞ ∞ are defined on LT ;ν and Lν and their norms, in these spaces, are estimated by j ν ≤ cν −1 2 ; j˜ ν ≤ cν ˜ −1 2 ; J ν ≤ Cν −2 2 .
(45)
Similar arguments as above lead to Proposition 9. Equation (40) has a unique solution in L1loc (R+ ), and this solution belongs to L∞ ν if ν > ν0 with ν0 sufficiently large. Thus, in the half-plane (p) > ν0 the Laplace transform of a ∞ aˆ := e−pt a(t)dt (46) 0
exists and is analytic in p. Furthermore, for (p) > ν0 , the Laplace transform of a satisfies ˆ ip aˆ = ωaˆ + ia(0) − i 2 F (, p)a(p),
(47)
where F (, p) is defined by F (, p) := ψ0 , W g˜ (H )
−1 −iI iI I+ P W g˜ (H ) P W g˜ (H ) ψ0 p + iH0 c p + iH0 c
+ i(ω1 − ω) −2
(48)
so ˆ = ia(0). (ip − ω + i 2 F (, p))a(p) Eq. (47) follows by taking the Laplace transform of (31).
(49)
Proof. By Proposition 8, and since e−iωt ν = 1, for large ν Eq. (43) is contractive in 1 L∞ T ;ν and has a unique solution there. It thus has a unique solution in Lloc , by Remark 7. 1 ∞ Since by the same argument Eq. (43) is contractive in L∞ T ;ν and since Lν ⊂ Lloc , the 1 ∞ unique Lloc solution of (43) is in Lν as well. The rest is straightforward. ! " Remark 10. Note that by construction (47) and (48) define F as a Laplace transform of a function. Our assumptions easily imply that if is small enough, then: (a) F (, p) is analytic except for a cut along i. F (, p) is Hölder continuous of order η > 0 at the cut, i.e. lim F (, iτ ± γ ) ∈ H η , γ ↓0
the space of Hölder continuous functions of order η. (b) |F (, p)| ≤ C|p|−1 for some C > 0 as |p| → ∞. To see it we write B = B1 B2 x−σ ; B1 :=
I P - x−σ ; p + iH0 c
B2 := xσ W g˜ (H )xσ .
(50)
-
Noting that Pc projects on the interval it is clear by the spectral theorem that x−σ B is analytic in p on D := C \ (i). By the assumption on the decay rate and the Laplace transform of Eq. (7) we have that B3 (p) := x−σ
I P - x−σ p + iH0 c
(51)
is uniformly Hölder continuous, of order η, as p → i. For p0 ∈ i, the two sided limits lima↓0 B3 (p0 ±a) = B3± will of course differ, in general. A natural closed domain of definition of B3 is D together with the two sides of the cut, D := D ∪ ∂D+ ∪ ∂D− . We then write B3 ≤ C1 (p),
(52)
where we note that C1 can be chosen so that: Remark 11. C1 (p) > 0 is uniformly bounded for p ∈ D and C1 (p) = O(p −1 ) for large p. Hence for some C2 we have uniformly in p (choosing small enough), x−σ (B1 B2 )n ≤ C2n n ,
(53)
and therefore the operator
W g˜ (H )
−1 I I I− P W g˜ (H ) P W g˜ (H ) p + iH0 c p + iH0 c
is analytic in D and is in H η (D).
(54)
4. General Case

4.1. Definition of $\Gamma$. We have from Proposition 9, Eq. (47) that
$$\hat a(p) = \frac{ia(0)}{ip - \omega + i\varepsilon^{2}F(\varepsilon,p)}. \qquad (55)$$
We are most interested in the behavior of $\hat a$ for $p = is$, $s \in \mathbb{R}$. $\Gamma$ will be defined in terms of the approximate zeros of the denominator in (55). Let $F =: F_1 + iF_2$.

Proposition 12. For $\varepsilon$ small enough, the equation $s + \omega + \varepsilon^{2}F_2(\varepsilon, is) = 0$ has at least one root $s_0$, and $s_0 = -\omega + O(\varepsilon^{2})$. If $\eta \ge 1$, then for $\varepsilon$ small enough the solution is unique. If $\eta < 1$ then two solutions $s_1$ and $s_2$ differ by at most $O(\varepsilon^{2/(1-\eta)})$.

Proof. We write $s = -\omega + \delta$ and get for $\delta$ an equation of the form $\delta = \varepsilon^{2}G(\delta)$ where $G(x) = -F_2(\varepsilon, ix - i\omega)$, and $G(x) \in H^{\eta}$. The existence of a solution for small $\varepsilon$ is an immediate consequence of continuity and the fact that $\delta - \varepsilon^{2}G(\delta)$ changes sign in an interval of size $\varepsilon^{2}\|G\|_\infty$. If $\eta \ge 1$ we note that the equation $\delta = \varepsilon^{2}G(\delta)$ is contractive for small $\varepsilon$ and thus has a unique root. If instead $0 < \eta < 1$ and $\delta_1$, $\delta_2$ are two roots, then for some $K > 0$ independent of $\varepsilon$, $|\delta_1 - \delta_2| = \varepsilon^{2}|G(\delta_1) - G(\delta_2)| \le \varepsilon^{2}K|\delta_1 - \delta_2|^{\eta}$, whence the conclusion. □

Remark. Note that $s_0$ are not, in general, poles of (55) since we only solve for the real part equal to zero.

Assumption 13. If $\eta < 1$ then we assume that $\varepsilon^{2}F_1(\varepsilon, -i\omega) \gg \varepsilon^{2/(1-\eta)}$ for small $\varepsilon$. When $\eta > 1$ this restriction will not be needed, cf. Sect. 4.3.

Definition. We choose one solution $s_0 = -\omega + \delta$ and let $\Gamma$ be defined by (14).

Note. In the case $\eta < 1$ the choice of $s_0$ yields, by the previous assumption, a (possible) arbitrariness in the definition of $\Gamma$ of order $O(\varepsilon^{2/(1-\eta)}) = o(\Gamma)$.

Remarks on the verifiability of condition $\Gamma > 0$. As it is generally difficult to check the positivity of $\Gamma$ itself but relatively easier to find $\Gamma_0$, we will look at various scenarios, which are motivated by concrete examples, in which the condition of positivity reduces to a condition on $F(\varepsilon, -i\omega)$. Let
$$\Gamma_0 = \varepsilon^{2}F_1(\varepsilon, -i\omega); \qquad \gamma_0 = \varepsilon^{2}F_2(\varepsilon, -i\omega),$$
where we see that $\Gamma_0$ and $\gamma_0$ are $O(\varepsilon^{2})$. The equation for $\delta$ reads
$$\delta = -\varepsilon^{2}\big[F_2(\varepsilon, -i\omega + i\delta) - F_2(\varepsilon, -i\omega)\big] - \gamma_0 = \varepsilon^{2}H(\delta) - \gamma_0,$$
where $H(0) = 0$. We write $\delta = -\gamma_0 + \zeta$ and get $\zeta = \varepsilon^{2}H(-\gamma_0 + \zeta)$, and the definition of $\Gamma$ becomes
$$\Gamma = \varepsilon^{2}F_1(\varepsilon, -i\omega - i\gamma_0 + i\zeta).$$
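As a purely illustrative sketch of the quantities above, one can take a hypothetical stand-in F(p) = 1/(1 + p) for the function F (this choice, the names `solve_s0` and `exact_pole`, and the parameter values are assumptions made for the example, not objects from the paper). The code iterates the fixed-point equation for δ, reading the root condition as s + ω + ε² Im F(is) = 0, evaluates Γ = ε² Re F(is0) and Γ0 = ε² Re F(−iω), and compares them with the zero of p + iω + ε² F(p) closest to −iω, which for this toy F is the root of a quadratic.

```python
import cmath


def F(p: complex) -> complex:
    """Hypothetical analytic stand-in for F(eps, p); not the paper's F."""
    return 1.0 / (1.0 + p)


def solve_s0(omega: float, eps: float, tol: float = 1e-14, max_iter: int = 200) -> float:
    """Iterate delta = -eps^2 * Im F(i(-omega + delta)) and return s0 = -omega + delta."""
    delta = 0.0
    for _ in range(max_iter):
        new_delta = -eps**2 * F(1j * (-omega + delta)).imag
        converged = abs(new_delta - delta) < tol
        delta = new_delta
        if converged:
            break
    return -omega + delta


def exact_pole(omega: float, eps: float) -> complex:
    """Zero of p + i*omega + eps^2 * F(p) closest to -i*omega (toy F only)."""
    # (p + i*omega)(1 + p) + eps^2 = 0  <=>  p^2 + (1 + i*omega) p + (i*omega + eps^2) = 0
    b = 1.0 + 1j * omega
    c = 1j * omega + eps**2
    root = cmath.sqrt(b * b - 4.0 * c)
    candidates = [(-b + root) / 2.0, (-b - root) / 2.0]
    return min(candidates, key=lambda p: abs(p + 1j * omega))


omega, eps = 1.0, 0.1
s0 = solve_s0(omega, eps)
gamma = eps**2 * F(1j * s0).real        # toy analogue of Gamma
gamma0 = eps**2 * F(-1j * omega).real   # toy analogue of Gamma_0
pole = exact_pole(omega, eps)
print("s0 =", s0, " Gamma =", gamma, " Gamma0 =", gamma0)
print("pole: Re =", pole.real, " Im =", pole.imag)
```

In this toy model the pole sits close to is0 − Γ but not exactly on it, which is consistent with the Remark that s0 only solves the real-part equation; the discrepancy is of higher order in ε.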
Proposition 14. (i) If H0 satisfies the conditions of Theorem 1 with η > 1 and γ0 = o( −2 0 ), then as → 0,
= 0 + o( 0 ) and in particular is positive for 0 > 0.
(56) 2
(ii) Assume that η < 1, γ0 = o( −2 0 ) and 0 & 1−η as → 0. Then again (56) holds. 1/η
Proof. (i) Since ζ = O( 2 γ0 ) + O( 2 ζ ) we get ζ = O( 2 γ0 ), implying that
= 2 F1 , −iω − iγ0 (1 + o(1)) = 0 + O( 2 γ0 ) = 0 + o( 0 ). (ii) We have η
ζ = O( 2 γ0 ) + O( 2 ζ η ).
(57)
If ζ ≤ const.γ0 as → 0, then the proof is as in part (i). If on the contrary, for some large constant C we have ζ > Cγ0 then by (57) we have ζ < const. 2 ζ η so that ζ = O( 2/(1−η) ) and 2 ζ η = O( 2/(1−η) ) = o( 0 ). But then η
= 2 F1 (, −iω) + O( 2 γ0 ) + O( 2 ζ η ) = 0 + o( 0 ).
" !
4.2. Exponential decay. We now let p = is0 + v. The intermediate time and long time behavior of a(t) are given by the following proposition Proposition 15. For t = O(1) (note that in general depends on ), as → 0 we have (i) a(t) = e−is0 t e− t + O( 2 η−1 ).
(58)
a(t) = O( −1 t −η−1 ).
(59)
(ii) As t → ∞ we have Proof. (i) Note first that, taking (v) > 0 and writing F as a Laplace transform, cf. Remark 10, ∞ e−is0 t−vt f (t)dt, F (, −is0 + v) = 0
we have by our assumptions that t ' ∞ −vt −is0 u e e f (u)du F (, −is0 + v) = 0 0 t ∞ =v e−vt e−is0 u f (u)du 0 0 ∞ ∞ ∞ =v e−is0 u f (u)du e−vt − 0 0 t ∞ ∞ ∞ −is0 u e f (u)du − v e−vt e−is0 u f (u)du = 0
= F (, −is0 ) − vL[g](v),
0
t
(60)
where we denoted g(v) = define
∞ t
e−is0 u f (u)du and L[g] is its Laplace transform. Now h(v) = vL[g](v).
(61)
We have, by the formula for the inverse Laplace transform i∞ evt 2πia(t) = e−is0 t dv, 2 −i∞ v + + h(v)
(62)
where by construction we have h ∈ H η , h is analytic in C \ i and h(0) = 0. We write i∞ i∞ evt evt dv dv = 2 2 −1 −i∞ v + + h(v) −i∞ (v + ) 1 + h(v + ) i∞ vt i∞ h(v + )−1 e dv 1 2 = − evt dv. (63) 2 −1 −i∞ v +
−i∞ v + 1 + h(v + ) We first need to estimate L−1 h(v + )−1 ( the transformation is well defined, since the function is just (v + )−1 (F (, −is0 + v) − F (, −is0 )). We need to write
vL[g](v) =: (v + )L[g1 ](v) or L[g1 ] = 1 − L[g] (64) v+
which defines the function g1 : g1 = g − e− t
t
e s g(s)ds.
(65)
0
Since |g(t)| < Const.t −η we have |g1 (t)| ≤ Const.t −η + e− t
t 0
A similar inequality holds for
Q := L−1
Indeed, we have Q = −L−1
eu
u −η
h v+
2h 1 + v+
du ≤ Const.t −η .
(66)
.
h h + 2 L−1 ∗ Q. v+
v+
(67)
(68)
It is easy to check that for t ≤ r −1 and small enough this equation is contractive in the norm Q = sups≤t sη |Q(s)|. But now, for constants independent of , t 1 2 L−1 e s s −η ds ∗ Q ≤ Const.e− s v+
0 t −η u 2 − s −1 (69) = Const.e
eu du
0 2 ≤ Const. 1−η .
(ii) We now use (60) and (61) to write h F (, −is0 + v) F (, −is0 ) = − v+
v+
v+
and get H1 := L−1
t h = e− t e s f (s)ds + conste− t , v+
0
and thus, proceeding as in the proof of (i) we get for some C > 0 |H1 | ≤ C −1 t−η−1 . To evaluate a(t) for large t we resort again to Q as defined in (67) which satisfies (68). This time we note that the equation is contractive in the norm sups≥0 |s1+η · | when is small enough. ! " Using (59), Proposition 4 and (28) imply local decay and therefore χ cannot be an eigenfunction which implies (i). Since the local decay rate is integrable (ii) follows [24]. Part c) follows from (58), (39) and (28) while (12) follows from (39) and the smallness of K. 4.3. Proof of Theorem 1 in case (i) of regularity η > 1. In this case we obtain better estimates. We write G(v) = L−1 [g](v)
(70)
and (62) becomes a(t) = e
−is0 t
i∞ −i∞
evt dv. v + + 2 vG(v)
(71)
Now L−1 (v + + 2 vG(v))−1
v 1 1 v+ G(v) −1 2 −1 −1 =L − L ∗L . v v+
v+
1 + 2 v+
G(v)
Proposition 16. Let
−1
H2 (t) := L
v v+ G(v) v 1 + 2 v+
G(v)
(72)
.
We have |H2 | ≤ Const.t−η ;
0
∞
H2 (t)dt = 0.
(73)
Proof. Consider first the function h3 := v(v + )−1 G(v) = G(v) − (v + )−1 G(v); we see that (cf. (70) and (60)) t ∞ H3 := L−1 h3 = e−is0 u f (u)du − e− t e s t
0
∞ s
e−is0 u f (u)duds,
(74)
and thus, for some positive constants Ci , |H3 | ≤ Const.t
−η
+ Const.e
− t
t
ev −η v−η dv,
(75)
0
and thus, since h3 (0) = 0 we have −η
|H3 | ≤ Const.t
;
∞
H3 (t)dt = 0.
0
Note now that the function −1 v 2 v G(v) 1 + G(v) v+
v+
vanishes for v = 0. Note furthermore that H2 = H3 − 2 H3 ∗ H2 . It is easy to check that this integral equation is contractive in the norm H = sups≤t |sη H (s)| for small enough ; the proof of the proposition is complete. ! " Proposition 17. L−1 (v + + 2 G(v))−1 = e− t + (t), where for some constant C independent of , t, we have || ≤ C 2 t−η+1 . Proof. We have, by (72) ∞ ' t (t) = 2 e− t e s H2 (u)du ds 0 s ∞ t ∞ = 2 H2 (s)ds − e− t e s H2 (u)du. t
The estimate of the last term is done as in (75). Theorem 1 part (c) in case (i) follows.
0
" !
s
(76)
5. Analytic Case Suppose that the function F (p, ) has analytic continuation in a neighborhood of the relevant energy −iω = 0; in this case we can prove stronger results. In many cases one can show the analyticity of F if the resolvent, properly weighted, has analytic continuation. Lemma 18. Assume that for some ω and some neighborhood D of ω, E(, p) is a function with the following properties: (i) E ∈ H η (D) and E is analytic in D (this allows for branch-points on the boundary of the domain, a more general setting than meromorphicity). (ii) |E(, p)| ≤ C 2 for some C. (iii) lima↓0 E(, −iω − a) = − 0 < 0. If (a) η > 1, E(, −iω) = o( 0 / 2 ) or (b) η < 1 and E(, −iω) = O( 0 ) and is small enough, then the function G1 (, p) = p + iω + E(, p) has a unique zero p = pz in D and furthermore (pz ) < 0. In fact, (pz ) + 0 = o( 0 ).
(77)
Remark. If the condition that for η > 1, E(, −iω) = o( −2 0 ) is not satisfied, then we can replace −iω by −iω − is0 and the uniqueness of the complex zero will still be true. Proof. We have G1 (, pz ) = 0 = pz + iω + E(, −iω) + [E(, pz ) − E(, −iω)] or, letting p = −iω + ζ , ζz := pz + iω, 2 φ(, ζ ) := E(, p) − E(, −iω), ζz = −E(, −iω) − 2 φ(, ζz ). Consider a square centered at E(, −iω) with side 2|(E(, −iω))| = 2 0 . For both cases (a) and (b) for η considered in part (iii) of the lemma, note that in our assumptions and by the choice of the square we have 2 φ(ζ, ) (78) ζ + E(, −iω) → 0 (as → 0) (on all sides of the square). In case (a) on the boundary of the rectangle we have by construction of the rectangle, |ζ + E(, −iω)| ≥ 0 . Also by construction, on the sides of the rectangle we have |ζ | ≤ 0 . Still by assumption, φ(, ζ ) ≤ Cζ = o( −2 0 ) and the ratio in (78) is o(1). In case (b), we have η
2 φ(, ζ ) = O( 2 ζ η ) = O( 2 0 ) = o( 0 ). Thus, on the boundary of the square, the variation of the argument of the functions ζ + E(, −iω) + 2 φ(ζ ) and that of ζ + E(, −iω) differ by at most o(1) and thus have to agree exactly (being integer multiples of 2π i); thus ζ + E(, −iω) + 2 φ(ζ ) has exactly one root in the square. The same argument shows that p + iω + E(, p) has no root in any other region in its analyticity domain except in the square constructed in the beginning of the proof. ! "
Theorem 19. Assume the conditions (H) and (W) as before, and furthermore that the function F (, p) has analytic continuation in a neighborhood of −iω; with an appropriate choice of the cutoff function E (H0 ), we have that χ (H − z)−1 χ has a unique pole away from the real axis, near −iω, corresponding to a resonance with imaginary part near , with appropriate choice of weights χ . Proof. First we note that by taking the Laplace transform of (28) and (34) and solving for the resolvent of H we get that ˆ χ (H − z)−1 χ = A(z)a(z)ψ 0 + B(z) with A(z) and B(z) analytic in D by our assumptions (H) and (W), and the assumed analyticity of F (, p), ip := z. Hence the existence and uniqueness of the pole of χ (H − z)−1 χ follows from Lemma 18, with 2 F (, p) = E(, p). ! " As a consequence we obtain the following result. Proposition 20. With an appropriate exponential cutoff function, the remainder term decays as a stretched exponential times an asymptotic series. Sketch of proof. We need the large t behavior of a(t) which is the Inverse Laplace transform of G(p) := (p + iω + i 2 F (, p))−1 and to this end we write G(p) = (p + iω∗ )−1 − i 2 (p + iω∗ )−1 F∗ (, p)G(p),
(79)
where F∗ (, p) := F (, p) − (ω∗ − ω)/ 2 and ω∗ is the unique pole of G(p) found in the previous theorem. Taking the inverse Laplace transform of (79) we get an integral √ ˜ ∼ e− t+iθt ak t −k/4 implies equation for G(t), and direct calculations show that F √ G(t) ∼ e−iω∗ t + O( 2 )e− t+iθt bk t −k/4 . To find the asymptotic behavior of F˜ (t) we derive an integral equation by taking the inverse Laplace transform of (48) and the same integral equation arguments as above reduce the asymptotic study of F˜ to that of the following expression for any u ∈ L2 : ∗ e−iλt g Bψ 0 dµa.c. (λ) := ξ(λ)e−iλt g (λ)dλ, (u, Be−iH0 t Pc- Bψ0 ) = Bu where B = W g˜ (H ) and φ˜ is the spectral representation of φ associated to H0 . By as ∗ )(λ)(λ−z)−1 (Bv)(λ)f sumption B(H0 −z)−1 B is analytic in z ∈ D, hence (Bu (λ)dλ 2 is analytic for any v ∈ L , where f (λ) = dµa.c. /dλ; therefore so is its Hilbert transform ∗ Bvf and thus ξ is also analytic. Choosing g (λ) = exp(−(λ−a)−1 +(λ−b)−1 ) the Bu b 1 1 asymptotic expansion of F˜ follows from that of the integral a e− λ−a + λ−b −itλ ξ(λ)dλ. " !
5.1. Example. Suppose H0 =
− 0 := − ⊕ (− + x 2 ) 0 − + x 2
on L2 (R) ⊕ L2 (R). Assume
W =
0 W˜ W˜ 0
with W˜ = W˜ (x) sufficiently regular and exponentially localized. Then, the spectrum of H0 has embedded eigenvalues corresponding to the spectrum of −+x 2 , with Gaussian localized and smooth eigenfunctions. Since the projection I − P0 in the definition of Pc 2 2 eliminates the − + x part in any interval containing an eigenvalue of − + x , it is left to verify the conditions of the theorem for H0 replaced by −. Since e−αx (− − z)−1 e−αx
(80)
has analytic continuation through the cut (0, ∞) and is an analytic function away from z = 0, we can now choose an interval = [a, b] around each eigenvalue En of −+x 2 , avoiding zero, and let −1 −1 E (λ) = e−(λ−a) e(λ−b) be a function analytic in C except z = a and b. 5.2. Remarks on applications. The examples covered by the above approach include those discussed in [11] as well as the many cases where analytic continuation has been established, see e.g. [21]. Furthermore, following results of [21] it follows that under favorable assumptions on V (x), − + V (x) has no zero energy bound states in three or more dimensions extending the results of [11], where it was proved for 5 or more dimensions. It is worth mentioning that the possible presence of thresholds inside makes it necessary to allow for η < ∞, and that in the case where there are finitely many thresholds inside of known structure, sharper results may be obtained. Other applications of our methods involve numerical reconstruction of resonances from time dependent solutions data, in cases where Borel summability is ensured. This and other implications will be discussed elsewhere. Acknowledgement. The authors acknowledge partial support from the NSF. One of us (A. S.) would like to thank I. M. Sigal for discussions.
References
1. Gérard, C. and Sigal, I.M.: Space-time picture of semiclassical resonances. Commun. Math. Phys. 145, 281–328 (1992)
2. Helffer, B. and Sjöstrand, J.: Résonances en limite semi-classique. Mem. Soc. Math. France (N. S) #24-25 (1986)
3. Balslev, E.: Resonances with a Background Potential. In: Lecture Notes in Physics 325, Berlin–Heidelberg–New York: Springer, 1989
4. Philips, R. and Sarnak, P.: Perturbation theory for the Laplacian on Automorphic Functions. J. Am. Math. Soc. Vol. 5, No. 1, 1–32 (1992)
5. Simon, B.: Resonances and complex scaling: A rigorous overview. Int. J. Quantum Chem. 14, 529–542 (1978)
6. Orth, A.: Quantum mechanical resonance and limiting absorption: The many body problem. Commun. Math. Phys. 126, 559–573 (1990)
7. Hunziker, W.: Resonances, Metastable States and Exponential Decay Laws in Perturbation Theory. Commun. Math. Phys. 132, 177–188 (1990)
8. Costin, O., Lebowitz, J.L., Rokhlenko, A.: Exact results for the ionization of a model quantum system. J. Phys. A: Math. Gen. 33, 1–9 (2000)
9. Tang, S.H. and Zworski, M.: Resonance Expansions of Scattered waves. To appear in CPAM
10. Skibsted, E.: Truncated Gamov functions, α-decay and exponential law. Commun. Math. Phys. 104, 591–604 (1986)
11. Soffer, A. and Weinstein, M.I.: Time dependent resonance theory. GAFA, Geom. Funct. Anal. vol 8, 1086–1128 (1998)
12. Merkli, M., Sigal, I.M.: A Time Dependent Theory of Quantum Resonances. Commun. Math. Phys 201 549–576 (1999)
13. Journé, J.L., Soffer, A. and Sogge, C.: $L^p \to L^{p'}$ Estimates for time dependent Schrödinger Equations. Bull. AMS 23, 2 (1990)
14. Jensen, A., Mourre, E. and Perry, P.: Multiple commutator estimates and resolvent smoothness in quantum scattering theory. Ann. Inst. Poincaré – Phys. Théor. 41, 207–225 (1984)
15. Sigal, I.M. and Soffer, A.: Local decay and velocity bounds for quantum propagation. Preprint (1988); ftp://www.math.rutgers.edu/pub/soffer
16. Hunziker, W., Sigal, I.M., Soffer, A.: Minimal Escape Velocities. Comm. PDE 24, (11, 12) 2279–2295 (2000)
17. Agmon, S., Herbst, I. and Skibsted, E.: Perturbation of embedded eigenvalues in the generalized N-body problem. Commun. Math. Phys. 122, 411–438 (1989)
18. Aguilar, J. and Combes, J.M.: A class of analytic perturbations for one body Schrödinger Hamiltonians. Commun. Math. Phys. 22, 269–279 (1971)
19. Costin, O.: On Borel summation and Stokes phenomena for rank one nonlinear systems of ODE’s. Duke Math. J. Vol. 93, No. 2, 289–344 (1998)
20. Costin, O., Tanveer, S.: Existence and uniqueness for a class of nonlinear higher-order partial differential equations in the complex plane. CPAM Vol. LIII, 1092–1117 (2000)
21. Hislop, P. and Sigal, I.M.: Introduction to Spectral Theory. Applied Math. Sci. 113, Berlin–Heidelberg–New York: Springer, 1996
22. Rauch, J.: Perturbation Theory for Eigenvalues and Resonances of Schrödinger Hamiltonians. J. Funct. Anal. 35, 304–315 (1980)
23. Lavine, R.: Exponential Decay. In: Diff. Eq. and Math. Phys, Proceedings, Alabama, Birmingham, 1995
24. Reed, M., Simon, B.: Methods of Modern Mathematical Physics IV, Analysis of Operators. New York: Academic Press, 1978

Communicated by M. Aizenman
Commun. Math. Phys. 224, 153 – 204 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
The Birth of the Infinite Cluster: Finite-Size Scaling in Percolation

C. Borgs1, J. T. Chayes2, H. Kesten2, J. Spencer3

1 Microsoft Research, One Microsoft Way, Redmond, WA 98052, USA
2 Department of Mathematics, Cornell University, Ithaca, NY 14853, USA
3 Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA

Received: 6 December 2000 / Accepted: 25 May 2001

Abstract: We address the question of finite-size scaling in percolation by studying bond percolation in a finite box of side length $n$, both in two and in higher dimensions. In dimension $d = 2$, we obtain a complete characterization of finite-size scaling. In dimensions $d > 2$, we establish the same results under a set of hypotheses related to so-called scaling and hyperscaling postulates which are widely believed to hold up to $d = 6$. As a function of the size of the box, we determine the scaling window in which the system behaves critically. We characterize criticality in terms of the scaling of the sizes of the largest clusters in the box: incipient infinite clusters which give rise to the infinite cluster. Within the scaling window, we show that the size of the largest cluster behaves like $n^d\pi_n$, where $\pi_n$ is the probability at criticality that the origin is connected to the boundary of a box of radius $n$. We also show that, inside the window, there are typically many clusters of scale $n^d\pi_n$, and hence that “the” incipient infinite cluster is not unique. Below the window, we show that the size of the largest cluster scales like $\xi^d\pi_\xi\log(n/\xi)$, where $\xi$ is the correlation length, and again, there are many clusters of this scale. Above the window, we show that the size of the largest cluster scales like $n^dP_\infty$, where $P_\infty$ is the infinite cluster density, and that there is only one cluster of this scale. Our results are finite-dimensional analogues of results on the dominant component of the Erdős–Rényi mean-field random graph model.

1. Introduction: Background and Discussion of Results

We dedicate this paper to Joel Lebowitz on the occasion of his 70th birthday. He is an inspiration to us all.

We present here the complete version of results announced several years ago in [CPS96] and [Cha98]. Finite-size scaling is the study of corrections to the thermodynamic behavior of an infinite system due to finite-size effects.
In particular, this includes the broadening of the transition point into a transition region in a finite system. Here we present an analysis
of finite-size scaling for percolation on the hypercubic lattice, both in two and in higher dimensions. Our analysis is based on a number of postulates which are mathematical expressions of the purported scaling behavior in critical percolation in dimensions two through six. We explicitly verify these scaling postulates in two dimensions.

We consider bond percolation in a finite subset $\Lambda$ of the hypercubic lattice $\mathbb{Z}^d$. Nearest-neighbor bonds in $\Lambda$ are occupied with probability $p$ and vacant with probability $1 - p$, independently of each other. Let $p_c$ denote the bond percolation threshold in $\mathbb{Z}^d$, namely the value of $p$ above which there exists an infinite connected cluster of occupied bonds. As a function of the size of the box $\Lambda$, we determine the scaling window about $p_c$ in which the system behaves critically. For our purposes, criticality is characterized by the behavior of the distribution of sizes of the largest clusters in the box. We show how these clusters can be identified with the so-called incipient infinite cluster, the cluster of infinite expected size which appears at $p_c$.

The motivation for this work was threefold: first, to give a finite-dimensional analogue and interpretation of results on the Erdős–Rényi mean-field random graph model; second, to provide rigorous results on finite-size scaling at a continuous transition; and third, to establish detailed results on incipient infinite clusters which correspond closely to results observed by numerical physicists. In this introduction, we will discuss each aspect of the motivation in some detail.

The Random Graph Model. The original motivation for this work was to obtain an analogue of known results on the random graph model of Erdős and Rényi ([ER59, ER60]; see also [Bol85, AS92]). The random graph is simply the percolation model on the complete graph, i.e., it is a model on a graph of $N$ sites in which each site is connected to each other site, independently, with uniform probability $p(N)$.
It turns out that the model has particularly interesting behavior if $p(N)$ scales like $p(N) \approx c/N$ with $c \asymp 1$. Here, as usual, $f \asymp g$ means that there are finite, strictly positive constants $c_1$ and $c_2$ such that $c_1 g \le f \le c_2 g$. Let $W^{(i)}$ denote the random variable representing the size of the $i$th largest cluster in the system. Erdős and Rényi ([ER59, ER60]) showed that the model has a phase transition at $c = 1$ characterized by the behavior of $W^{(1)}$. It turns out that, with probability one,
$$W^{(1)} \asymp \begin{cases} \log N & \text{if } c < 1 \\ N^{2/3} & \text{if } c = 1 \\ N & \text{if } c > 1. \end{cases} \qquad (1.1)$$
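The three regimes in (1.1) can be observed in a small simulation. The following is a minimal sketch, assuming a straightforward sampling of the random graph with edge probability c/N and a union-find structure to track clusters; the function name `largest_component`, the graph size, and the seeds are choices made for this illustration and do not come from the paper.

```python
import random


def largest_component(n: int, c: float, seed: int = 0) -> int:
    """Size of the largest cluster in a sample of G(n, p) with p = c / n."""
    rng = random.Random(seed)
    p = c / n
    parent = list(range(n))
    size = [1] * n

    def find(x: int) -> int:
        # Find the root of x with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Visit every unordered pair once and occupy the edge with probability p.
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Union by size: attach the smaller tree to the larger one.
                    if size[ri] < size[rj]:
                        ri, rj = rj, ri
                    parent[rj] = ri
                    size[ri] += size[rj]
    return max(size[find(v)] for v in range(n))


if __name__ == "__main__":
    n = 1000
    for c in (0.5, 1.0, 2.0):
        print(f"c = {c}: largest cluster has {largest_component(n, c, seed=1)} of {n} sites")
```

For a single moderate N the critical case c = 1 fluctuates strongly from sample to sample, so only the contrast between c < 1 (a small largest cluster) and c > 1 (a cluster containing a positive fraction of the sites) should be expected to be clearly visible.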
Moreover, for $c > 1$, $W^{(1)}/N$ tends to some constant $\theta(c) > 0$, with probability one, while for $c = 1$, $W^{(1)}$ has a nontrivial distribution (i.e., $W^{(1)}/N^{2/3} \not\to$ constant) ([ER59, ER60, JKLP93, Ald97]). For $c \le 1$, the sizes of the second, third, . . . , largest clusters are of the same scale as that of the largest cluster, while for $c > 1$ this is not the case: For any fixed $i > 1$, $W^{(i)} \asymp \log N$ for all $c \ne 1$ ([ER59, ER60]), while at $c = 1$, $W^{(i)} \asymp N^{2/3}$ [Bol84]. The cluster of order $N$ for $c > 1$ is clearly the analogue of the infinite cluster in percolation on finite-dimensional graphs; in the random graph, it is called the giant component. As we will see, the clusters of order $\log N$ or smaller are analogues of finite clusters in ordinary percolation. The clusters of order $N^{2/3}$ will turn out to be the analogue of the so-called incipient infinite cluster in percolation.

More interestingly, the critical point $c = 1$ is actually broadened into a critical regime by finite-$N$ corrections. It was shown by Bollobás [Bol84] and Łuczak [Luc90] that the
correct parameterization of the critical regime is
$$p(N) = \frac{1}{N}\left(1 + \frac{\lambda_N}{N^{1/3}}\right), \qquad (1.2)$$
in the sense that if $\lim_{N\to\infty}|\lambda_N| < \infty$, then $W^{(i)} \asymp N^{2/3}$ for all $i$; see also the combinatoric tour de force of Janson, Knuth, Łuczak and Pittel [JKLP93] for more detailed properties, including some distributional results on the $W^{(i)}$’s. Finally, it was shown by Aldous that the $W^{(i)}$, rescaled by $N^{2/3}$, have a nontrivial limiting joint distribution which can be calculated from a one-dimensional Brownian motion with time-dependent drift [Ald97]. On the other hand, if $\lim_{N\to\infty}\lambda_N = -\infty$, then $W^{(2)}/W^{(1)} \to 1$ with probability one, whereas if $\lim_{N\to\infty}\lambda_N = +\infty$, then $W^{(2)}/W^{(1)} \to 0$ and $W^{(1)}/N^{2/3} \to \infty$ with probability one. The largest component in the regime with $\lambda_N \to +\infty$ is called the dominant component. As we will show, it has an analogue in ordinary percolation.

The initial motivation for our work was to find a finite-dimensional analogue of the above results. To this end, we consider $d$-dimensional percolation in a box of linear size $n$, and hence volume $N = n^d$. We ask how the size of the largest cluster in the box behaves as a function of $n$ for $p < p_c$, $p = p_c$ and $p > p_c$. It is straightforward from known results to describe these cluster sizes for fixed $p \ne p_c$. However, we are interested mainly in the situation where $p$ varies with $n$. In particular, we ask whether there is a window about $p_c$ such that the system has a nontrivial cluster size distribution within the window.

Finite-size scaling. The considerations of the previous paragraph lead us immediately to the question of finite-size scaling (FSS). Phase transitions cannot occur in finite volumes, since all relevant functions are polynomials and thus analytic; nonanalyticities only emerge in the infinite-volume limit. What quantities should we study to see the phase transition emerge as we go to larger and larger volumes?
Before our work, this question had been rigorously addressed in detail only in systems with first-order transitions – transitions at which the correlation length and order parameter are discontinuous ([BoK90, BI92-1, BI92-2]). Finite-size scaling at second-order transitions is more subtle due to the fact that the order parameter vanishes at the critical point. For example, in percolation it is believed that the infinite cluster density vanishes at $p_c$. However, physicists routinely talk about an incipient infinite cluster at $p_c$. This brings us to our third motivation.

The incipient infinite cluster. At $p_c$, it is believed that with probability one there is no infinite cluster. On the other hand, the expected size of the cluster of the origin is infinite at $p_c$, see [Ham57], [Kes82, Cor. 5.1] and [AN84]. This suggests that from the perspective of an observer at the origin, all clusters are finite, with larger and larger clusters appearing as one considers larger and larger length scales. Physicists have called the emerging object the incipient infinite cluster. In the mid-1980's there were two attempts to construct rigorously an object that could be identified as an incipient infinite cluster. Kesten [Kes86] proposed to look at the conditional measure in which the origin is connected to the boundary of a box centered at the origin, by a path of occupied bonds: $P_p^n(\cdot) = P_p(\cdot \mid 0 \leftrightarrow \partial[-n, n]^d)$. Here, as usual, $P_p(\cdot)$ is a product measure at bond density $p$. Observe that, at $p = p_c$, as
156
C. Borgs, J. T. Chayes, H. Kesten, J. Spencer
$n \to \infty$, $P_p^n(\cdot)$ becomes mutually singular with respect to the unconditioned measure $P_p(\cdot)$. Nevertheless, Kesten found that in $d = 2$,
$$\lim_{n\to\infty} P_{p_c}^n(\cdot) = \lim_{p \downarrow p_c} P_p(\cdot \mid 0 \leftrightarrow \infty). \tag{1.3}$$
Moreover, Kesten studied properties of the infinite object so constructed and found that it has a nontrivial fractal dimension which agrees with the fractal dimension of the physicists' incipient infinite cluster. Another proposal was made by Chayes, Chayes and Durrett [CCD87]. They modified the standard measure in a different manner than Kesten, replacing the uniform $p$ by an inhomogeneous $p(b)$ which varies with the distance of the bond $b$ from the origin:
$$p(b) = p_c + \frac{\lambda}{1 + \mathrm{dist}(0, b)^{\zeta}}, \tag{1.4}$$
with λ constant. The idea was to enhance the density just enough to obtain a nontrivial infinite object. In d = 2, [CCD87] proved that for ζ = 1/ν, where ν is the so-called correlation length exponent, the measure Pp(b) has some properties reminiscent of the physicists’ incipient infinite cluster. In this work, we propose a third rigorous incipient cluster – namely the largest cluster in a box. This is, in fact, exactly the definition that numerical physicists use in simulations. Moreover, it will turn out to be closely related to the IICs constructed by Kesten and Chayes, Chayes and Durrett. Like the IIC of [Kes86], the largest cluster in a box will have a fractal dimension which agrees with that of the physicists’ IIC. Also, our proofs rely heavily on technical estimates from the IIC construction of [Kes86]. More interestingly, the form of the scaling window p(n) for our problem will turn out to be precisely the form of the enhanced density used to construct the IIC of [CCD87]. Yet a fourth candidate for an incipient infinite cluster is a spanning cluster in a large box, an object studied by Aizenman in [Aiz97]. Let us caution the reader that the terminology in [Aiz97] differs somewhat from ours. While Aizenman reserves the term IIC for an incipient infinite cluster viewed from a point inside this cluster (thus implying uniqueness almost by definition), we use the term incipient infinite clusters for the large clusters viewed from the scale of the box under consideration. From this point of view the IIC is not necessarily unique, see below. Recently, Járai has shown that, viewed from a random point in the IIC, all four notions of the IIC lead to the same distribution on local observables in dimension d = 2 [Jar00]. Informal statement and heuristic interpretation of results. Our results will be stated precisely in Sect. 3. Here we give an informal statement in terms of the critical exponents of percolation, assuming these exponents exist. 
Note that our results hold independently of the existence of critical exponents, but they are easier to state informally and to compare to the random graph results (1.1) and (1.2) in terms of these exponents. To this end, let P∞ (p) denote the infinite cluster density, χ fin (p) denote the expected size of finite clusters, ξ(p) denote the correlation length, i.e., the inverse exponential decay rate of the finite cluster connectivity function, and P≥s (p) denote the probability that the cluster of the origin is of size at least s. Also let πn (pc ) denote the probability at criticality that the origin is connected to the boundary of a hypercube of side 2n. See Sect. 2, in particular Eqs. (2.5), (2.15), (2.18), (2.4) and (2.10), for precise definitions.
It is believed, but not proved in low dimensions, that the behavior of these quantities as $p \to p_c$ or at $p = p_c$ is described by the following scaling laws:
$$P_\infty(p) \approx |p - p_c|^{\beta}, \qquad p > p_c, \tag{1.5}$$
$$\chi^{\mathrm{fin}}(p) \approx |p - p_c|^{-\gamma}, \tag{1.6}$$
$$\xi(p) \approx |p - p_c|^{-\nu}, \tag{1.7}$$
$$P_{\ge s}(p_c) \approx s^{-1/\delta}, \tag{1.8}$$
and
$$\pi_n(p_c) \approx n^{-1/\rho}. \tag{1.9}$$
In (1.5)–(1.7), $G(p) \approx |p - p_c|^{\alpha}$ means
$$\lim_{p \to p_c} \frac{\log G(p)}{\log |p - p_c|} = \alpha. \tag{1.10}$$
Unless otherwise noted we implicitly assume that the approach is identical from above and below threshold. Similarly, we use $G(n) \approx n^{\alpha}$ in (1.8)–(1.9) to mean
$$\lim_{n \to \infty} \frac{\log G(n)}{\log n} = \alpha. \tag{1.11}$$
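As an illustration of the log-ratio convention (1.11), the following minimal sketch evaluates $\log G(n)/\log n$ for a hypothetical function $G(n) = C n^{\alpha}$. The prefactor `C`, the value of `ALPHA` and the helper name `log_ratio_exponent` are assumptions made only for this illustration. The sketch shows that a constant prefactor does not affect the limit, although it slows convergence to order $1/\log n$.

```python
import math

def log_ratio_exponent(G, n):
    """Finite-n value of log G(n) / log n, whose limit defines the exponent in (1.11)."""
    return math.log(G(n)) / math.log(n)

# Hypothetical example (not from the paper): a pure power law with a prefactor.
ALPHA = 0.5
C = 5.0

def G(n):
    return C * n ** ALPHA

if __name__ == "__main__":
    # The ratio equals ALPHA + log(C)/log(n), so it approaches ALPHA only logarithmically.
    for n in (10**2, 10**4, 10**8, 10**16):
        print(n, log_ratio_exponent(G, n))
```

In this sense two functions that differ by a bounded factor, or even by a power of a logarithm, have the same exponent.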
Let $\Lambda_n$ denote a hypercube of side $n$ and let $W^{(i)}_{\Lambda_n}$ denote the $i$-th largest cluster in this hypercube. Then, under certain "scaling assumptions," we find the asymptotic behavior of $W^{(1)}_{\Lambda_n}$, both for fixed $p$ and, more generally, for $p$ which vary with $n$. Combining our results at $p_c$ with known results for fixed $p \ne p_c$, we first establish the following analogue of (1.1):
$$W^{(1)}_{\Lambda_n} \asymp \begin{cases} \log n & \text{if } p < p_c \\ n^{d_f} & \text{if } p = p_c \\ n^{d} & \text{if } p > p_c, \end{cases} \tag{1.12}$$
where we use the suggestive notation
$$d_f = d - 1/\rho \tag{1.13}$$
to indicate that $d - 1/\rho$ is the fractal dimension of our candidate incipient infinite cluster. Moreover, we show that, under the scaling assumptions, the critical point $p_c$ is broadened into a scaling window of the form
$$p(n) = p_c\left(1 \pm \frac{\lambda}{n^{1/\nu}}\right), \tag{1.14}$$
in the sense that inside the window
$$W^{(1)} \approx n^{d_f}, \quad W^{(2)} \approx n^{d_f}, \quad \cdots, \tag{1.15}$$
while above the window
$$W^{(1)} \approx n^{d} P_\infty, \qquad W^{(1)}/n^{d_f} \to \infty, \qquad W^{(2)}/W^{(1)} \to 0, \tag{1.16}$$
and below the window
$$W^{(1)}/n^{d_f} \to 0, \tag{1.17}$$
where, in fact,
$$W^{(1)} \approx \xi^{d_f} \log(n/\xi). \tag{1.18}$$
The results in (1.14)–(1.18) are established both in expectation and in probability. Note the similarity between the form of the scaling window (1.14) and the bond density (1.4) of the [CCD87] incipient infinite cluster. Furthermore, within the scaling window, we get results on the distribution of cluster sizes which show that the distribution does not go to a point mass. This is to be contrasted with the behavior above the window, where the normalized cluster size approaches its expectation, with probability one. All of these additional results require some delicate second moment estimates.

Our scaling assumptions, which are described in detail in Sect. 3, are explicitly proved in dimension $d = 2$, and are believed – but not proved – to hold for $d$ less than the so-called upper critical dimension $d_c$. The upper critical dimension is the dimension above which the critical exponents assume their Cayley tree values; presumably $d_c = 6$ for percolation.

What would results (1.14) and (1.15) say if we attempted to apply them in the case of the random graph model (to which, of course, they do not rigorously apply)? Let us use the widely believed hyperscaling relation $d\nu = \gamma + 2\beta$ and the observation that the volume $N$ of our system is just $n^d$, to rewrite the window in the form
$$p_n = p_c\left(1 \pm \frac{\lambda}{n^{1/\nu}}\right) = p_c\left(1 \pm \frac{\lambda}{N^{1/d\nu}}\right) = p_c\left(1 \pm \frac{\lambda}{N^{1/(\gamma+2\beta)}}\right). \tag{1.19}$$
Similarly, let us use the hyperscaling relation $d_f/d = \delta/(1 + \delta)$ to rewrite the size of the largest cluster as
$$W^{(1)} \approx n^{d_f} \approx N^{d_f/d} \approx N^{\delta/(1+\delta)}. \tag{1.20}$$
Noting that the random graph model is a mean-field model, we expect (and in fact it can be verified [BBCK98]) that $\gamma = 1$, $\beta = 1$ and $\delta = 2$. Using also $p_c = 1/N$, (1.19) suggests a window of the form
$$p(N) = \frac{1}{N}\left(1 \pm \frac{\lambda}{N^{1/3}}\right), \tag{1.21}$$
and within that window
$$W^{(1)} \approx N^{2/3}, \tag{1.22}$$
just the values obtained in the combinatoric calculations on the random graph model. We caution the reader that hyperscaling relations do not apply to the random graph, so that a proper version of the arguments above requires that we deal with a “correlation volume” rather than the correlation length, and that we establish (1.20) directly from the scaling of the cluster size distribution (1.8), rather than by recourse to our finite-dimensional results and a hyperscaling relation. Such arguments can be derived, but are beyond the scope of this paper.
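The exponent arithmetic behind (1.19)–(1.22) can be checked mechanically. The short sketch below uses exact rational arithmetic; the function names `window_exponent` and `largest_cluster_exponent` are hypothetical helpers introduced only for this check, and the mean-field values $\gamma = 1$, $\beta = 1$, $\delta = 2$ are those quoted in the text.

```python
from fractions import Fraction

def window_exponent(gamma, beta):
    """Exponent w with window half-width ~ N**(-w), using d*nu = gamma + 2*beta as in (1.19)."""
    return 1 / Fraction(gamma + 2 * beta)

def largest_cluster_exponent(delta):
    """Exponent d_f/d = delta/(1 + delta) of N in (1.20)."""
    return Fraction(delta, 1 + delta)

if __name__ == "__main__":
    # Mean-field values quoted in the text for the random graph model.
    gamma, beta, delta = 1, 1, 2
    print("window exponent:", window_exponent(gamma, beta))
    print("largest-cluster exponent:", largest_cluster_exponent(delta))
```

With the mean-field values the two functions return 1/3 and 2/3, matching the powers of $N$ in (1.21) and (1.22).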
Our results also have implications for finite-size scaling. Indeed, the form of the window tells us precisely how to locate the critical point, i.e., it tells us the correct region about $p_c$ in which to do critical calculations. Finally, the results tell us that we may use the largest cluster in the box as a candidate for the incipient infinite cluster. Within the window, it is not unique, in the sense that there are many clusters of this scale. However, above the window (even including a region where $p$ is not uniformly greater than $p_c$ as $n \to \infty$), there is a unique cluster of largest scale. This is the analogue of what is called the dominant component in the random graph problem.

It is interesting to contrast our results with recent results in high dimensions. As already observed on a heuristic level in [Con85], the validity of hyperscaling is related to the fact that the critical crossing clusters in a box of side length $n$ have size of order $n^{d-1/\rho}$, and that their number is bounded uniformly in $n$; see [BCKS98] for rigorous results concerning this relationship. Conversely, breakdown of hyperscaling above six dimensions requires, at least on a heuristic level, that at criticality, the number of crossing clusters in a box of side length $n$ grows like $n^{d-6}$, and that all of them have sizes of order $n^4$; see again [Con85]. In a similar way, one would expect that the largest cluster in a box of side length $n$ is of size $n^4$, and that there are roughly $n^{d-6}$ clusters of similar size. Indeed, it can be proven [Aiz97] that these results follow from a postulate on the decay of the connectivity function at criticality which is widely believed to hold above six dimensions. Very recently, T. Hara [Har01] used the so-called lace expansion, in the form developed in [HHS01], to rigorously establish this postulate in sufficiently high dimensions $d \gg 6$.

Methods and organization.
As mentioned above, our results are proved under certain scaling assumptions which we explicitly verify in dimension d = 2. Obviously, the results could have been proven directly – with no assumptions – in d = 2, but the resulting proof would have been quite complicated and would not have yielded much insight. Instead, we formulate postulates which we believe characterize critical behavior in all dimensions below the critical dimension dc , and then prove our results under these postulates. We believe that the postulates are of independent interest since they provide insight into the nature of critical behavior. Indeed, in previous announcements of this work [CPS96] and [Cha98], we used more postulates than we need now. In [BCKS98], we proved that one of these original postulates was implied by several others, in particular that a reasonable assumption on the behavior of crossing probabilities implies certain hyperscaling relations among critical exponents. The proofs in this paper will rely heavily on the results and methods of [BCKS98]. Indeed, [BCKS98] should really be viewed as “Part I” of this paper, since many of our results on the cluster size distribution were derived there. The verification of the postulates in d = 2 relies on the constructive two-dimensional methods of [Kes86] and [Kes87]. The organization of this paper is as follows. In Sect. 2, we give definitions, notations and previous percolation results we will need in our proofs. Our main results are formulated in Sect. 3. There we first state our postulates, and then state the finite-size scaling results under these postulates. In Sect. 4, we state many additional results which may be of independent interest, including the results of [BCKS98]. Finally, using these additional results, in Sect. 5 we prove our main finite-size scaling theorems under the scaling postulates. 
We believe, but cannot prove, that the scaling postulates should hold up to the upper critical dimension, which is believed to be dc = 6 for percolation. Finally, in Sect. 6, we prove that the scaling postulates are satisfied in two dimensions. Thus, we have a complete proof of finite-size scaling for percolation in dimension d = 2. In
Sect. 7, we give a proof of slightly stronger finite-size scaling results under an alternative set of postulates, and also show that the alternative postulates hold in $d = 2$.

2. Definitions, Notation and Preliminaries

Consider the hypercubic site lattice $\mathbb{Z}^d$, and the corresponding bond lattice $\mathbb{B}^d$ consisting of bonds between all nearest-neighbor pairs in $\mathbb{Z}^d$. Bond percolation on $\mathbb{B}^d$ is defined by choosing each bond of $\mathbb{B}^d$ to be occupied with probability $p$ and vacant with probability $1 - p$, independently of all other bonds. The corresponding product measure on configurations of occupied and vacant bonds is denoted by $\Pr_p$. $E_p$ denotes expectation with respect to the measure $\Pr_p$, and $\mathrm{Cov}_p(\cdot\,;\cdot)$ denotes the covariance of two indicator functions with respect to $\Pr_p$: $\mathrm{Cov}_p(A; B) = \Pr_p(A \cap B) - \Pr_p(A)\Pr_p(B)$. A generic configuration is denoted by $\omega$. If $S_1, S_2, S_3 \subset \mathbb{Z}^d$, we say that $S_1$ is connected to $S_2$ in $S_3$, denoted by $\{S_1 \leftrightarrow S_2 \text{ in } S_3\}$, if there exists an occupied path with vertices in $S_3$ from some site of $S_1$ to some site of $S_2$. Maximal connected subsets are called (occupied) clusters. The occupied cluster (in the configuration $\omega$) containing the site $x$ is denoted by $C(x) = C(x; \omega)$. The size of the cluster $C$, denoted by $|C|$, is the number of sites in $C$. $C_\infty$ denotes the (unique) infinite cluster, i.e., the occupied cluster with $|C| = \infty$.

We also consider clusters in a finite box $\Lambda \subset \mathbb{Z}^d$. The connected component of $x$ in $C(x) \cap \Lambda$ is denoted by $C_\Lambda(x) = C_\Lambda(x; \omega)$; this is therefore the collection of all points which are connected to $x$ by an occupied path in $\Lambda$. $C_\Lambda^{(1)}, C_\Lambda^{(2)}, \cdots, C_\Lambda^{(k)}$ denote the occupied clusters in $\Lambda$, ordered from largest to smallest size, with lexicographic order between clusters of the same size. $W_\Lambda^{(i)} = |C_\Lambda^{(i)}|$ denotes the size of the $i$-th largest cluster in $\Lambda$. Finally
$$N_\Lambda(s_1, s_2) = |\{i \mid s_1 \le W_\Lambda^{(i)} \le s_2\}| \tag{2.1}$$
denotes the number of clusters in $\Lambda$ with size between $s_1$ and $s_2$, and
$$\widetilde{N}_\Lambda(s_1, s_2) = |\{i \mid s_1 \le W_\Lambda^{(i)} \le s_2,\ C_\Lambda^{(i)} \not\leftrightarrow \partial\Lambda\}| \tag{2.2}$$
is the corresponding number of clusters which do not touch the boundary $\partial\Lambda$ of $\Lambda$. Here $\partial\Lambda$ is the set of points $x \in \Lambda$ that have distance less than 1 from the complement $\Lambda^c = \mathbb{Z}^d \setminus \Lambda$ of $\Lambda$.

Returning now to the model on the full lattice, the cluster size distribution is characterized by
$$P_s = P_s(p) = \Pr_p(|C(0)| = s), \tag{2.3}$$
or alternatively
$$P_{\ge s} = P_{\ge s}(p) = \Pr_p(|C(0)| \ge s). \tag{2.4}$$
The order parameter of the model is the percolation probability or infinite-cluster density
$$P_\infty(p) = \Pr_p(|C(0)| = \infty). \tag{2.5}$$
The critical probability is
$$p_c = \inf\{p : P_\infty(p) > 0\}. \tag{2.6}$$
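The finite-box quantities introduced in this section, namely the ordered cluster sizes $W_\Lambda^{(i)}$ and the counting function $N_\Lambda(s_1, s_2)$ of (2.1), can be computed for a sampled configuration with a standard union-find pass. The sketch below is only an illustration under stated assumptions: the box is taken to be $\{0, \dots, n-1\}^d$ rather than the box of (3.4), the function names `cluster_sizes` and `count_between` are hypothetical, and ties between equal-sized clusters are not ordered lexicographically because only sizes are returned.

```python
import random
from itertools import product

def cluster_sizes(n, d, p, rng):
    """Sizes of the occupied clusters of bond percolation in {0,...,n-1}^d, largest first."""
    sites = list(product(range(n), repeat=d))
    index = {v: k for k, v in enumerate(sites)}
    parent = list(range(len(sites)))

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for v in sites:
        for axis in range(d):
            # Each nearest-neighbor bond inside the box is occupied independently with probability p.
            if v[axis] + 1 < n and rng.random() < p:
                w = v[:axis] + (v[axis] + 1,) + v[axis + 1:]
                parent[find(index[v])] = find(index[w])

    counts = {}
    for k in range(len(sites)):
        root = find(k)
        counts[root] = counts.get(root, 0) + 1
    return sorted(counts.values(), reverse=True)

def count_between(sizes, s1, s2):
    """Number of clusters with size in [s1, s2], in the spirit of (2.1)."""
    return sum(1 for w in sizes if s1 <= w <= s2)

if __name__ == "__main__":
    sizes = cluster_sizes(32, 2, 0.5, random.Random(1))
    print("largest:", sizes[0], "second largest:", sizes[1])
    print("clusters with size in [10, 100]:", count_between(sizes, 10, 100))
```

Here `sizes[i - 1]` plays the role of $W_\Lambda^{(i)}$. The restricted count (2.2) would additionally require recording which clusters contain a boundary site, which this sketch omits.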
We consider several connectivity functions: the (point-to-point) connectivity function
$$\tau(v, w; p) = \Pr_p(v \leftrightarrow w), \tag{2.7}$$
the finite-cluster (point-to-point) connectivity function
$$\tau^{\mathrm{fin}}(v, w; p) = \Pr_p(v \leftrightarrow w,\ |C(v)| < \infty), \tag{2.8}$$
the point-to-hyperplane connectivity function
$$\widetilde{\pi}_n(p) = \Pr_p\{\exists\, v = (n, \cdot) \text{ such that } 0 \leftrightarrow v\} \tag{2.9}$$
($v = (n, \cdot)$ means that the first coordinate of $v$ equals $n$), and the point-to-box connectivity function
$$\pi_n(p) = \Pr_p\{0 \leftrightarrow \partial B_n(0)\}, \tag{2.10}$$
where
$$B_n(v) = \{w \in \mathbb{Z}^d : |v - w|_\infty \le n\} = [-n, n]^d \cap \mathbb{Z}^d, \tag{2.11}$$
with $|\cdot|_\infty$ denoting the $\ell_\infty$-norm. Notice that $\pi_n(p)$ and $\widetilde{\pi}_n(p)$ are equivalent, in the sense that
$$\widetilde{\pi}_n(p) \le \pi_n(p) \le 2d\, \widetilde{\pi}_n(p). \tag{2.12}$$
A quantity which for $p > p_c$ behaves much like $\tau^{\mathrm{fin}}(x, y; p)$ is the covariance:
$$\tau^{\mathrm{cov}}(v, w; p) = \mathrm{Cov}_p(v \leftrightarrow \infty;\ w \leftrightarrow \infty) \tag{2.13}$$
(see [CCGKS89], Sect. 6). We also consider several susceptibilities:
$$\chi(p) = E_p(|C(0)|) = \sum_v \tau(0, v; p), \tag{2.14}$$
$$\chi^{\mathrm{fin}}(p) = E_p(|C(0)|,\ |C(0)| < \infty) = \sum_v \tau^{\mathrm{fin}}(0, v; p) = \sum_{s < \infty} s P_s(p) \tag{2.15}$$
and
$$\chi^{\mathrm{cov}}(p) = \sum_v \tau^{\mathrm{cov}}(0, v; p). \tag{2.16}$$
Finally, we introduce the quantity
$$s(n) = (2n)^d \pi_n(p_c). \tag{2.17}$$
As we will see, $s(n)$ is the order of magnitude of the size of the largest critical clusters on scale $n$. Length scales in the model are naturally expressed in terms of the correlation length $\xi(p)$, defined by the limit
$$1/\xi(p) = -\lim_{|v|_\infty \to \infty} \frac{1}{|v|_\infty} \log \tau^{\mathrm{fin}}(0, v; p), \tag{2.18}$$
taken with $v$ along a coordinate axis. We will use the fact that $\xi(p) < \infty$ for all $p \ne p_c$ and $\xi(p) \to \infty$ as $p \uparrow p_c$ (see Grimmett [Gri99], Theorem 6.49 and Eq. (6.57) for
$p < p_c$; for $p > p_c$ this follows from Grimmett and Marstrand [GM90]). While it is also believed that $\xi(p) \to \infty$ as $p \downarrow p_c$, this is rigorously known only for $d = 2$.

Alternatively, lengths may be expressed in terms of the finite-size scaling correlation length $L_0(p, \varepsilon)$, introduced in [CCF85] and studied in [CCF85, CCFS86] and [Kes87]. For $p < p_c$, $L_0(p, \varepsilon)$ is defined in terms of the crossing probabilities of rectangles, the so-called sponge crossing probabilities:
$$R_{L,M}(p) = \Pr_p\{\exists \text{ occupied bond crossing of } [0, L] \times [0, M] \times \cdots \times [0, M] \text{ in the 1-direction}\}. \tag{2.19}$$
Observing that, for $p < p_c$, the sponge crossing probability $R_{L,3L}(p) \to 0$ as $L \to \infty$, we define
$$L_0(p) = L_0(p, \varepsilon) = \min\{L \ge 1 \mid R_{L,3L}(p) \le \varepsilon\} \qquad \text{if } p < p_c. \tag{2.20}$$
Using the methods and results of [ACCFR83, CC86, CCF85] and [Kes87], it is straightforward to show that there exists $a(d) > 0$ such that for $\varepsilon < a(d)$, the scaling behavior of $L_0(p, \varepsilon)$ is independent of $\varepsilon$ for $p < p_c$, in the sense that $L_0(p, \varepsilon_1)/L_0(p, \varepsilon_2)$ is bounded away from 0 and infinity for two fixed values $\varepsilon_1, \varepsilon_2 < a(d)$. This scaling behavior is also essentially the same as that of the standard correlation length $\xi(p)$. More specifically, for $0 < \varepsilon < a(d)$, there exist constants $c_1 = c_1(d)$, $c_2 = c_2(d, \varepsilon) < \infty$ such that$^{1}$
$$\frac{1}{L_0(p, \varepsilon)} \le \frac{1}{\xi(p)} \le \frac{c_1 \log L_0(p, \varepsilon) + c_2}{L_0(p, \varepsilon) - 1}, \qquad p < p_c. \tag{2.21}$$
Hereafter we will assume that $\varepsilon < a(d)$; we usually suppress the $\varepsilon$-dependence in our notation.

For $p > p_c$, it is natural to define $L_0(p, \varepsilon)$ in terms of a suitable finite-cluster analogue of the sponge-crossing probability $R_{L,M}(p)$, see [CC87], Eq. (53). For technical reasons, it is convenient, however, to consider instead crossings in an annulus
$$H_{L,M} = \mathbb{Z}^d \cap [-L, L + M]^d \setminus (0, M)^d, \tag{2.22}$$
with inner and outer boundaries $\partial_I H_{L,M}$ and $\partial_E H_{L,M}$. We say that an occupied cluster $C_H$ in $H = H_{L,M}$ is $H$-finite if $H \setminus C_H$ contains a path – occupied or not – that connects $\partial_I H$ to $\partial_E H$. Let
$$S^{\mathrm{fin}}_{L,M}(p) = \Pr_p\{\exists \text{ an occupied } H\text{-finite cluster } C_H \text{ in } H = H_{L,M} \text{ that connects } \partial_I H \text{ to } \partial_E H\}, \tag{2.23}$$
with the convention $S^{\mathrm{fin}}_{0,M}(p) = 1$. We define
$$L_0(p) = L_0(p, \varepsilon) = 1 + \max\{L \ge 0 : S^{\mathrm{fin}}_{L,L}(p) \ge \varepsilon\} \qquad \text{if } p > p_c, \tag{2.24}$$
and more generally, for $x \ge 1$,
$$L_0(p, \varepsilon; x) = 1 + \max\{L \ge 0 : S^{\mathrm{fin}}_{L,xL}(p) \ge \varepsilon\} \qquad \text{if } p > p_c. \tag{2.25}$$
Note that $L_0(p, \varepsilon; x)$ may be finite or infinite, depending on whether or not there exists an $L_0 < \infty$ such that $S^{\mathrm{fin}}_{L,xL}(p) < \varepsilon$ for all $L \ge L_0$. We expect that this definition coincides, say in the sense of Eq. (2.21) (with an $x$-dependent constant $c_2$, and $c_1(d) = 0$), with the standard correlation length $\xi(p)$ above threshold. However, we are not able to prove this in $d \ge 3$, since the rescaling techniques of [ACCFR83] do not work for finite-cluster crossings. In $d = 2$, we can use a Harris ring construction [Har60] in conjunction with the Russo–Seymour–Welsh Lemma ([Rus78, SW78]) to show that this definition is equivalent to $\xi(p)$; see Sect. 6.

$^{1}$ K. Alexander [Ale96] has shown that one can take $c_1(d = 2) = 0$ in (2.21).

An important quantity in the high-density phase is the surface tension $\sigma(p)$; see [ACCFR83] for the precise definition. By analogy with the definition of a finite-size scaling correlation length below threshold, we define a finite-size scaling inverse surface tension as
$$A_0(p) = A_0(p, \varepsilon) = \min\{L^{d-1} \ge 1 \mid R_{L,3L}(p) \ge 1 - \varepsilon\} \qquad \text{if } p > p_c. \tag{2.26}$$
It is easy to see that $A_0(p)$ is well-defined and finite for all $p > p_c$. Indeed, $p > p_c$ implies $P_\infty(p) > 0$, which in turn implies that the probability of the event $|C(x)| < \infty$ for all $x \in \mathbb{Z}^d \cap [0, L]^d$ goes to zero as $L \to \infty$. Since this probability is bounded from below by $(1 - R_{L,3L}(p))^{2d}$ (cf. the proof of Lemma 4.4), this implies that $R_{L,3L}(p) \to 1$ as $L \to \infty$, and hence $A_0(p)$ is well-defined and finite. We expect that $A_0(p)$ is equivalent to the inverse surface tension$^{2}$ $1/\sigma(p)$, which in turn should be equivalent to $\xi^{d-1}(p)$ below the critical dimension $d_c$ (presumably $d_c = 6$). Again, we are only able to prove this equivalence in $d = 2$.

While the behavior of $L_0(p)$ below $p_c$ is well understood in general dimension, much less is known about $L_0(p)$ or $A_0(p)$ above $p_c$. In particular, below $p_c$, it is easy to see that $L_0(p)$ is monotone increasing, left continuous and piecewise constant. Moreover,
$$L_0(p) \uparrow \infty \qquad \text{as } p \uparrow p_c, \tag{2.27}$$
because $R_{L,3L}(p_c)$ is bounded away from 0 (e.g., by Theorem 5.1 in [Kes82]). Furthermore, the jumps in $L_0(p)$ are uniformly bounded on a logarithmic scale. In particular, by the methods of [ACCFR83, CC86, CCF85] and [Kes87], we have
$$R_{2L,6L} \le \frac{1}{a(d)} R_{L,3L}^2, \tag{2.28}$$
which in turn implies
$$\lim_{\delta \to 0} \frac{L_0(p + \delta)}{L_0(p)} \le 2, \tag{2.29}$$
provided $p < p_c$ and $\varepsilon < a(d)$. By contrast, none of these properties are known for $L_0(p)$ above $p_c$. Next consider $A_0(p)$, which, almost by definition, is monotone decreasing and right continuous. However, in general dimension, we do not have a proof that $A_0(p)$ diverges as $p \downarrow p_c$, nor do we have a bound of the form (2.29). We will therefore require several postulates on the behavior of $L_0(p)$ and $A_0(p)$ above $p_c$.

$^{2}$ Using Proposition 3 of [CC87], one can actually prove that $A_0(p) \le \mathrm{const}\, \sigma(p)^{-1}$ for all $d \ge 2$. We do not expect that the opposite inequality holds for $d >$ the critical dimension, $d_c$, since such an inequality – together with the usual assumption that $\sigma(p) \to 0$ as $p \downarrow p_c$ – would imply that $A_0(p) \to \infty$ as $p \downarrow p_c$ for $d > d_c$, which is believed to be false, see Sect. 3.3.
3. Statement of Postulates and Theorems

3.1. The scaling postulates. Most of our theorems are established under a set of assumptions which we can verify explicitly in two dimensions, and which we expect to be true for all dimensions not exceeding the critical dimension $d_c$ (presumably $d_c = 6$). We call these assumptions the Scaling Postulates, since they follow from the type of scaling typically assumed in the physics literature. Since $L_0(p)$ and $A_0(p)$ depend on $\varepsilon$, see Eqs. (2.20), (2.24) and (2.26), many of our postulates implicitly involve the constant $\varepsilon$. We assume that they are true for all nonzero $\varepsilon < \varepsilon_0$, where $\varepsilon_0 = \varepsilon_0(d)$ is a suitable constant. We write our postulates in terms of the equivalence symbol $\asymp$. Here
$$F(p) \asymp G(p) \tag{3.1}$$
means that there are lower and upper bounds of the form
$$C_1 F(p) \le G(p) \le C_2 F(p), \tag{3.2}$$
where $C_1 > 0$ and $C_2 < \infty$ are constants which do not depend on $p$, as long as $p$ is uniformly bounded away from zero or one, but which may depend on the constants $\varepsilon$, $\tilde{\varepsilon}$ or $x$ appearing explicitly or implicitly in the postulates. Occasionally, $p$ is further restricted to lie on one side of $p_c$. Similarly $F(n) \asymp G(n)$ means that $C_1 F(n) \le G(n) \le C_2 F(n)$ for some constants $0 < C_1 \le C_2 < \infty$. Our scaling postulates are

(I) $L_0(p) \to \infty$ as $p \downarrow p_c$;

(II) $A_0(p) \asymp L_0^{d-1}(p) \asymp L_0^{d-1}(p, \tilde{\varepsilon}; x)$, provided $p > p_c$, $x \ge 1$ and $0 < \tilde{\varepsilon} < \varepsilon_0$;

(III) There are constants $D_1 > 0$ and $D_2 < \infty$ such that
$$D_1 \le \frac{\pi_n(p)}{\pi_n(p_c)} \le D_2 \qquad \text{if } n \le L_0(p);$$

(IV) There are constants $D_3 > 0$ and $\rho_1 > \frac{2}{d}$ such that
$$\frac{\pi_m(p_c)}{\pi_n(p_c)} \ge D_3 \left(\frac{m}{n}\right)^{-1/\rho_1} \qquad \text{if } m \ge n \ge 1;$$

(V) There is a constant $D_4$ such that
$$\chi^{\mathrm{cov}}(p) \le D_4 L_0^d(p)\, \pi^2_{L_0(p)}(p_c) \qquad \text{and} \qquad \chi^{\mathrm{fin}}(p) \le D_4 L_0^d(p)\, \pi^2_{L_0(p)}(p_c)$$
if $p > p_c$;

(VI) $\pi_{L_0(p)}(p_c) \asymp P_\infty(p)$ if $p > p_c$;

(VII) There are constants $D_5, D_6 < \infty$ such that
$$P_{\ge k s(L_0(p))}(p) \ge D_5 e^{-D_6 k} P_{\ge s(L_0(p))}(p) \qquad \text{if } p < p_c \text{ and } k \ge 1.$$
We shall have some comments on the interpretation of the postulates and other remarks after we state our theorems.
3.2. Statement of the main results. A central concept in our theorems is the notion of a scaling window in which the system behaves critically. This can best be described by the function
$$g(p, n) := \begin{cases} -\dfrac{n}{L_0(p)} & \text{if } p < p_c \\ 0 & \text{if } p = p_c \\ \dfrac{n}{L_0(p)} & \text{if } p > p_c. \end{cases} \tag{3.3}$$
It will be seen that a sequence of systems with density $p_n$ behaves critically – as far as size of large clusters is concerned – in the finite boxes
$$\Lambda_n := \{v \in \mathbb{Z}^d \mid -n \le v_i < n,\ i = 1, \ldots, d\} \tag{3.4}$$
if
$$p_n \to p_c \quad \text{and} \quad \limsup_{n \to \infty} |g(p_n, n)| < \infty. \tag{3.5}$$
If this is the case we shall say that the (sequence of) systems are inside the scaling window. We shall say that the systems are below (respectively above) the scaling window if $g(p_n, n) \to -\infty$ (respectively, $g(p_n, n) \to \infty$). These regimes correspond to subcritical, respectively supercritical behavior. In particular we must have $p_n < p_c$ eventually if $\{p_n\}$ lies below the scaling window, and $p_n > p_c$ eventually if $\{p_n\}$ lies above the scaling window.

Our theorems below give many details of the finite-size scaling behavior of the system inside, above, and below the scaling window. They confirm the folklore that within distances of the order of the correlation length the system behaves critically. Specifically, we make this statement precise for the behavior of the size of the large clusters. Unfortunately we cannot derive this from the definition of correlation length only. One of our basic assumptions is that within the correlation length the point-to-box connectivity behaves as it does at the critical point (see Postulate III).

In order to state these theorems, we again use the symbol $\asymp$, this time for two sequences $a_n$ and $b_n$ of real numbers. We write
$$a_n \asymp b_n \tag{3.6}$$
if
$$0 < \liminf_{n \to \infty} \frac{a_n}{b_n} \le \limsup_{n \to \infty} \frac{a_n}{b_n} < \infty. \tag{3.7}$$
$|\Lambda_n|$ denotes the number of sites in $\Lambda_n$; thus $|\Lambda_n| = (2n)^d$. We remind the reader that Postulates (I)–(VII) are verified for $d = 2$ in Sect. 6. Thus all the conclusions of our theorems hold in the two-dimensional case. Our first theorem characterizes the scaling window in terms of the expectation of the largest cluster sizes.

Theorem 3.1. i) Suppose that Postulates (I)–(IV) hold. If $\{p_n\}$ is inside the scaling window, i.e., if $\limsup_{n \to \infty} |g(p_n, n)| < \infty$, and $i \in \mathbb{N}$, then
$$E_{p_n}\{W^{(i)}_{\Lambda_n}\} \asymp s(n). \tag{3.8}$$
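ii) Suppose that Postulates (I)–(IV) and (VII) hold. If $\{p_n\}$ is below the scaling window, i.e., $g(p_n, n) \to -\infty$, then
$$E_{p_n}\{W^{(1)}_{\Lambda_n}\} \asymp s(L_0(p_n)) \log \frac{n}{L_0(p_n)}. \tag{3.9}$$
iii) Suppose that Postulates (II), (V) and (VI) hold. If $\{p_n\}$ is above the scaling window, i.e., $g(p_n, n) \to \infty$, then
$$\frac{E_{p_n}\{W^{(1)}_{\Lambda_n}\}}{|\Lambda_n| P_\infty(p_n)} \to 1 \quad \text{as } n \to \infty, \tag{3.10}$$
and
$$\frac{E_{p_n}\{W^{(2)}_{\Lambda_n}\}}{|\Lambda_n| P_\infty(p_n)} \to 0 \quad \text{as } n \to \infty. \tag{3.11}$$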
The next theorem tells us about the distribution of the largest cluster sizes above the scaling window.

Theorem 3.2. Suppose that Postulates (II), (V) and (VI) hold. Let $\{p_n\}$ be above the scaling window. Then, as $n \to \infty$,
$$\frac{W^{(1)}_{\Lambda_n}}{|\Lambda_n| P_\infty(p_n)} \to 1 \quad \text{in probability}. \tag{3.12}$$
The next theorem gives information about the distribution of the cluster sizes inside the scaling window. It shows that, in this regime, the tails of the distribution of $W^{(1)}_{\Lambda_n}/E\{W^{(i)}_{\Lambda_n}\}$ decay, but the distribution does not go to a delta function. This should be contrasted with the behavior (3.12), which shows that above the scaling window the distribution of $W^{(1)}_{\Lambda_n}/E\{W^{(1)}_{\Lambda_n}\}$ does tend to a delta function.

Theorem 3.3. Suppose that Postulates (I)–(IV) hold. Let $\{p_n\}$ lie inside the scaling window.
i) For all $i < \infty$,
$$\liminf_{n \to \infty} \Pr\nolimits_{p_n}\left\{K^{-1} \le \frac{W^{(i)}_{\Lambda_n}}{E_{p_n}\{W^{(i)}_{\Lambda_n}\}} \le K\right\} \to 1 \quad \text{as } K \to \infty. \tag{3.13}$$
ii) For each $K < \infty$ and all $i < \infty$,
$$\limsup_{n \to \infty} \Pr\nolimits_{p_n}\left\{\frac{W^{(i)}_{\Lambda_n}}{E_{p_n}\{W^{(i)}_{\Lambda_n}\}} \ge K^{-1}\right\} < 1. \tag{3.14}$$
We have one more theorem for $p$ inside the scaling window. This concerns the number of clusters on scales $m < n$. Before stating the theorem, we point out that, due to (3.8), the "incipient infinite cluster" inside the scaling window is not unique, in the sense that $W^{(2)}_{\Lambda_n}$ is of the same scale as $W^{(1)}_{\Lambda_n}$. This should be contrasted with the behavior of $W^{(2)}_{\Lambda_n}/W^{(1)}_{\Lambda_n}$ above the scaling window (see (3.10) and (3.11)), a remnant of the uniqueness of the infinite cluster above $p_c$. The next theorem relates the non-uniqueness of the "incipient infinite cluster" inside the scaling window to the property of scale invariance at $p_c$. We remind the reader that the quantities $N_{\Lambda_n}$ and $\widetilde{N}_{\Lambda_n}$ are defined in Eq. (2.1) and (2.2).
Theorem 3.4. Suppose that Postulates (I)–(IV) hold. Let $\{p_n\}$ lie inside the scaling window. Then there exist strictly positive, finite constants $\sigma_1$, $\sigma_2$, $C_1$ and $C_2$ (all depending on the sequence $\{p_n\}$, but not on $n$, $m$ or $k$) such that
$$C_1 \left(\frac{n}{m}\right)^d \le E_{p_n} \widetilde{N}_{\Lambda_n}(s(m), s(km)) \le E_{p_n} N_{\Lambda_n}(s(m), s(km)) \le C_2 \left(\frac{n}{m}\right)^d, \tag{3.15}$$
provided $m$ and $k$ are strictly positive integers with $k \ge \sigma_1$ and $\sigma_2 m \le n$.

Our next theorem gives the behavior of the $W^{(i)}_{\Lambda_n}$ when $p$ is below the scaling window.

Theorem 3.5. Suppose that Postulates (I)–(IV) and (VII) hold. Let $\{p_n\}$ lie below the scaling window. Then, for each fixed $i$,
$$\liminf_{n \to \infty} \Pr\nolimits_{p_n}\left\{K^{-1} \le \frac{W^{(i)}_{\Lambda_n}}{s(L_0(p_n)) \log \frac{n}{L_0(p_n)}} \le K\right\} \to 1 \quad \text{as } K \to \infty. \tag{3.16}$$
As mentioned before, we expect the Scaling Postulates to hold for all $d \le d_c = 6$. The next theorem states that they do hold if $d = 2$.

Theorem 3.6. The Postulates (I)–(VII) hold in $d = 2$.

Notice that in Theorem 3.3 ii) (in conjunction with (3.8)), we prove that inside the scaling window the support of $W^{(i)}_{\Lambda_n}/s(n)$ is not bounded away from 0. We would expect that this support is also unbounded above and that this should be easy to prove from Postulate (VII), which states in a way that the support of $|C(0)|/s(L_0(p))$ is unbounded. However we have been unable to derive this from the Postulate (VII). Instead, in Sect. 7, we consider an alternative postulate, Postulate (VII alt), which says roughly that clusters of size of order $s(L_0(p))$ and distance of order $L_0(p)$ have a reasonable chance of being connected to each other. In that section, we prove the following theorem.

Theorem 3.7. i) Suppose Postulates (I)–(IV) and (VII alt) hold. Let $\{p_n\}$ be inside the scaling window and let $i \in \mathbb{N}$. Then
$$\limsup_{n \to \infty} \Pr\nolimits_{p_n}\left\{\frac{W^{(i)}_{\Lambda_n}}{E_{p_n}\{W^{(i)}_{\Lambda_n}\}} \le K\right\} < 1 \quad \text{for all } K < \infty.$$
ii) Postulate (VII alt) holds in $d = 2$.

3.3. Comments on the postulates and further remarks. The interpretation of our postulates is as follows. The first tells us that the approach to $p_c$ is critical – i.e., continuous or second-order – from above $p_c$. The second postulate is the assumption of equivalence of length scales above $p_c$: namely, Widom scaling, dimensionally relating the surface tension to the correlation length, together with the equivalence of the finite-size scaling lengths at various values of $x \ge 1$ and $\varepsilon \in (0, \varepsilon_0)$. This postulate is not expected to hold above the critical dimension. In fact, it is not even believed that $A_0(p) \to \infty$ as $p \downarrow p_c$, because this would imply that the crossing probability $R_{L,3L}(p_c)$ is bounded away from 1 uniformly in $L$. But uniform boundedness of crossing probabilities implies hyperscaling [BCKS98], which is not believed to hold above the upper critical dimension $d_c$.
Postulate (III) tells us that the system within the correlation length behaves as it does at threshold, at least as characterized by the behavior of the point-to-box connectivity function. Postulate (IV) implies that the connectivity function has a lower bound of power law behavior at threshold. Postulates (III) and (IV) in particular turn out to imply more than is immediately apparent. Proposition 4.6 states that the cluster size distribution for clusters with diameters up to the correlation length behaves like the corresponding distribution at threshold. This proposition also gives us a hyperscaling relation between the exponents δ and ρ, assuming that these exponents exist. We also obtain a scaling relation for χ(p) in Proposition 4.8. Assuming power laws for χ and $L_0$, and the relation (4.24), the assumed bound on $\rho_1$ in Postulate (IV) is equivalent to the very weak bound γ > 0. But it is known ([AN84]) that $\chi(p) \ge C_1 (p_c - p)^{-1}$ for $p < p_c$, i.e., γ ≥ 1 if it exists. In the light of this, Postulate (IV) seems very reasonable. The fifth and sixth postulates give various exponent relations, again provided that these exponents exist. Finally, the last postulate states that (in the subcritical region) $s(L_0(p))$ is the natural scale for the cluster size distribution and that on this scale the tail of the distribution does not decay faster than exponentially. Proposition 4.8 provides an inequality in the opposite direction, i.e., this decay is at least exponentially fast. See also Remark vi) below.

Remarks. i) Assuming the existence of the exponent ρ, see (1.9), Theorem 3.1 implies that inside the scaling window the largest, second largest, third largest, ..., clusters scale like $n^{d_f}$, with $d_f = d - 1/\rho$, while below the scaling window the size of the largest cluster (and hence of all clusters) goes to zero on the scale $n^{d_f}$.

ii) By Postulate (VI) and Lemma 4.5 below,
$$ \frac{|\Lambda_n| P_\infty(p_n)}{s(n)} = \frac{P_\infty(p_n)}{\pi_n(p_c)} \asymp \frac{\pi_{L_0(p_n)}(p_c)}{\pi_n(p_c)} \to \infty \tag{3.17} $$
above the scaling window. Statement iii) of Theorem 3.1 therefore implies that
$$ \frac{E_{p_n}\{W^{(1)}_{\Lambda_n}\}}{s(n)} \to \infty \quad \text{as } n \to \infty \tag{3.18} $$
above the scaling window.

iii) Assume that the critical exponent ν, see Eq. (1.7), exists, and that an equivalence of the form (2.21) holds for $p > p_c$ as well. Choose $p_n^- = \sup\{p < p_c : L_0(p) \le n\}$. Then by (2.29), $L_0(p_n^-) \asymp n$. Moreover,
$$ L_0(p_n^-) \approx \xi(p_n^-) \approx |p_n^- - p_c|^{-\nu}, \tag{3.19} $$
so that $p_c - p_n^- \approx n^{-1/\nu}$. Finally, $\{p_n\}$ is below the scaling window if $\liminf_{n\to\infty} \log(p_c - p_n)/\log n > -1/\nu$. Similar statements hold to the right of $p_c$ with $p_n^+ := \inf\{p > p_c : L_0(p) \le n\}$, provided we make the further assumption that
$$ \limsup_{p \downarrow p_c}\, \lim_{\delta \downarrow 0} \frac{L_0(p-\delta)}{L_0(p)} < \infty. $$
Thus under these various assumptions the scaling window has width $n^{-1/\nu}$. It should be pointed out, though, that at present we do not have enough rigorous knowledge of the behavior of $L_0(p)$ as a function of p to define the scaling window in terms of the behavior of $(p_n - p_c)/g_n^\pm$ for suitable sequences $\{g_n^\pm\}$. For instance, it is not
Finite-Size Scaling in Percolation
known that there exists a sequence $\{g_n^-\}$ of positive numbers such that $n/L_0(p_n) \to \infty$ is equivalent to $(p_c - p_n)/g_n^- \to \infty$ for $p_n < p_c$.

iv) It follows from (3.11) and Markov's inequality that
$$ \frac{W^{(2)}_{\Lambda_n}}{|\Lambda_n| P_\infty(p_n)} \to 0 \quad \text{in probability} \tag{3.20} $$
above the scaling window. Combined with (3.12) this implies that, as $n \to \infty$,
$$ \frac{W^{(2)}_{\Lambda_n}}{W^{(1)}_{\Lambda_n}} \to 0 \quad \text{in probability,} \tag{3.21} $$
provided $g(p_n, n) \to \infty$.

v) In a similar way, it follows from (3.9) that, as $n \to \infty$,
$$ \frac{W^{(1)}_{\Lambda_n}}{s(n)} \to 0 \quad \text{in probability,} \tag{3.22} $$
provided $g(p_n, n) \to -\infty$.

4. Auxiliary Results

In this section, which is split into two subsections, we state several useful auxiliary results which we will need for our proofs in Sect. 5; most of them have already been proved in [BCKS98]. The first subsection gives a fundamental moment estimate and an exponential tail estimate for cluster sizes. These estimates show a close relationship between the diameter and the size or volume of a large cluster. A cluster in $\Lambda_n$ of diameter small with respect to n usually has a volume which is small with respect to s(n). We believe, but could not prove, that the converse also holds, namely that a cluster in $\Lambda_n$ of diameter of order n has with high probability a volume bigger than a small multiple of s(n). The second subsection contains various important properties of the quantities $\pi_n$, $P_s$, $P_{\ge s}$ and χ which are akin to the postulates.

Throughout, the basic parameter p is bounded away from 0 and 1, that is, we restrict p to $\zeta_0 \le p \le 1 - \zeta_0$ for some small strictly positive $\zeta_0$. No further mention of $\zeta_0$ will be made. Many constants $C_i$ appear in this paper. These are always finite and strictly positive, even when this is not indicated. In different formulae the same symbol $C_i$ may denote different constants. All these constants depend on ε, d, $\zeta_0$ and the constants which appear in the postulates. This dependence will not be indicated in the notation. $I[A]$ denotes the indicator function of the event A.

All results in this section are proven under Postulates (I)–(IV) or a subset of these. In fact, none of the statements of this section rely directly on Postulates (I) and (II). Instead, they use the following two assumptions, which are much weaker than Postulates (I) and (II). The first is the assumption that the sponge crossing probabilities at $p_c$ are bounded away from one, that is,
$$ 1 - R_{n,3n}(p_c) > \varepsilon, \qquad n \ge 1, \tag{4.1} $$
for some ε > 0, and the second is the assumption that (4.1) can be extended to $p > p_c$, provided $n \le L_0(p)$. Actually, we only need the slightly weaker assumption that there are some constants ε > 0 and $\sigma_3 > 0$ such that
$$ 1 - R_{n,3n}(p) > \varepsilon \quad \text{for all } p > p_c \text{ and all } n \le \sigma_3 L_0(p). \tag{4.2} $$
To see that (4.1) follows from Postulates (I) and (II), we note that these postulates imply that $A_0(p) \to \infty$ as $p \downarrow p_c$, which in turn implies the statement (4.1). The bound (4.2) follows directly from Postulate (II), since, by the definition of $A_0(p)$,
$$ 1 - R_{r,3r}(p) > \varepsilon \quad \text{for } r^{d-1} < A_0(p) \text{ and } p > p_c. $$
By the equivalence of $A_0(p)$ and $L_0(p)^{d-1}$ (see Postulate (II)) this means that there exists some $\sigma_3 > 0$ such that (4.2) holds for $p > p_c$ and all $n \le \sigma_3 L_0(p)$.

We caution the reader that above $p_c$, the definition of the correlation length $L_0(p)$ in [BCKS98] is slightly different from the definition here (compare (2.17) in [BCKS98] to our Eq. (2.24)). However, as noted in Remark (vi) in [BCKS98], all results there remain valid for any definition of $L_0(p)$ above $p_c$ that obeys Postulates (3.15) and (3.16) in [BCKS98]. While Postulate (3.16) of [BCKS98] is identical to our Postulate (III), Postulate (3.15) in [BCKS98] is slightly stronger than our assumption (4.2): the former corresponds to (4.2) with $\sigma_3 = 1$. Here, we need only one result which uses Postulate (3.15), namely Theorem 3.6 of [BCKS98], which we cite to establish the last statement in our Proposition 4.8 below. However, a careful reading of the proof of Theorem 3.6 in Eqs. (5.32)–(5.35) of [BCKS98] shows that actually only our weaker assumption (4.2) is needed.

4.1. General moment estimates. The first lemma is a direct consequence of Postulate (IV). It is identical to Lemma 4.4 in [BCKS98].

Lemma 4.1. If Postulate (IV) holds, then for $\beta > 1/\rho_1 - 1$ (and a fortiori for $\beta > d/2 - 1 = (d-2)/2$) there exist constants $C_1 = C_1(\beta, d)$ and $C_2 = C_2(d)$ such that
$$ \sum_{m=0}^{L} (m+1)^{\beta}\, \pi_m(p_c) \le C_1 L^{\beta+1} \pi_L(p_c) \quad \text{if } L \ge 1, \tag{4.3} $$
and
$$ \sum_{m=0}^{L} (m+1)^{d-1}\, \pi_m^2(p_c) \le C_2 L^{d} \pi_L^2(p_c) \quad \text{if } L \ge 1. \tag{4.4} $$
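To see where the condition $\beta > 1/\rho_1 - 1$ in Lemma 4.1 comes from, it may help to look at an idealized case. The sketch below assumes, purely for illustration, that $\pi_m(p_c)$ is an exact power law $(m+1)^{-1/\rho}$ with a hypothetical exponent ρ; the lemma itself only uses the one-sided bound of Postulate (IV), so this is not the argument of [BCKS98].

```latex
% Illustration only: assume \pi_m(p_c) = (m+1)^{-1/\rho} exactly (hypothetical),
% and write a := \beta - 1/\rho, so that the hypothesis reads a > -1.
\begin{align*}
\sum_{m=0}^{L} (m+1)^{\beta}\,\pi_m(p_c)
  &= \sum_{j=1}^{L+1} j^{a}
   \;\le\; \frac{(L+2)^{a+1}}{a+1}
   \;\le\; \frac{3^{a+1}}{a+1}\, L^{a+1}
   && (L \ge 1),\\
L^{\beta+1}\,\pi_L(p_c)
  &= L^{\beta+1}(L+1)^{-1/\rho}
   \;\ge\; 2^{-1/\rho}\, L^{a+1}
   && (L \ge 1).
\end{align*}
% Hence (4.3) holds in this model with C_1 = 3^{a+1} 2^{1/\rho} / (a+1).
% For a \le -1 the left-hand sum stays bounded below by its first term 1,
% while L^{a+1} does not grow, so no bound of the form (4.3) can hold
% for all L \ge 1 with a constant independent of L.
```

The first inequality compares the sum with $\int t^{a}\,dt$, treating increasing ($a \ge 0$) and decreasing ($-1 < a < 0$) summands separately.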
The next lemma, which is identical to Lemma 6.1 in [BCKS98], gives a basic moment estimate. For d = 2 such an estimate was already given in [Ngu88].

Lemma 4.2. Assume Postulate (IV) holds. Define
$$ V(L) := \text{number of sites in } \Lambda_L \text{ connected to } \partial\Lambda_{2L}. \tag{4.5} $$
Then for some constants $C_i$, it holds that
$$ E_p\{V^k(L)\} \le C_1\, k!\, \bigl[C_2 L^d \pi_L(p_c)\bigr]^k, \tag{4.6} $$
provided $p \le p_c$, $k \ge 1$ and $L \ge 1$. Consequently
$$ E_p\{\exp(tV(L))\} \le C_1 \bigl[1 - t C_2 L^d \pi_L(p_c)\bigr]^{-1} \tag{4.7} $$
whenever $p \le p_c$ and $0 \le t < [C_2 L^d \pi_L(p_c)]^{-1}$. When Postulates (III) and (IV) hold, then (4.6) and (4.7) remain valid for $p > p_c$ and $L \le L_0(p)$.

The next proposition, which is one of the main technical results of [BCKS98] (Proposition 6.3 in [BCKS98]), follows from the moment estimate of Lemma 4.2. It is crucial for our proofs in Sects. 5.1 and 5.3.

Proposition 4.3. i) Assume that Postulate (IV) holds. Then there exist constants $C_i$ such that
$$ \Pr_p\{W^{(1)}_{\Lambda_n} \ge x\, s(L_0(p))\} \le C_1 \Bigl(\frac{n}{L_0(p)}\Bigr)^{d} e^{-C_2 x} \tag{4.8} $$
if $x \ge 0$, $n \ge L_0(p)$, and $p < p_c$. In particular
$$ \Pr_{p_n}\Bigl\{W^{(1)}_{\Lambda_n} \ge y\, s(L_0(p_n)) \log\frac{n}{L_0(p_n)}\Bigr\} \to 0 \tag{4.9} $$
if $y > d/C_2$ and $g(p_n, n) \to -\infty$.

ii) Assume that Postulate (IV) holds, and if $p > p_c$, that also Postulate (III) holds. Then there exist constants $C_i$ such that
$$ \Pr_p\{W^{(1)}_{\Lambda_n} \ge x\, s(n)\} \le C_1 e^{-C_2 x} \tag{4.10} $$
if $x \ge 0$ and $n \le L_0(p)$.

iii) Assume that Postulates (III) and (IV) hold. Then there exist constants $C_i$ such that
$$ \Pr_p\{W^{(1)}_{\Lambda_n} \ge x\, s(L_0(p))\} \le C_1 \Bigl(\frac{n}{L_0(p)}\Bigr)^{d} \exp\Bigl[-C_2 x + C_3 \Bigl(\frac{n}{L_0(p)}\Bigr)^{d}\Bigr] \tag{4.11} $$
if $x \ge 0$, $n \ge L_0(p)$ and $p > p_c$.

The next lemma summarizes several additional results which follow from Postulate (IV). To state it, we introduce the diameter of a cluster C as
$$ \operatorname{diam}(C) = \max_{v,w \in C} |v - w|_\infty. \tag{4.12} $$
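Lemma 4.4. Assume that Postulate (IV) holds. Then there exist constants $C_i$ such that
$$ \Pr_p\{\operatorname{diam}(C(0)) \ge x L_0(p)\} \le C_1\, \pi_{L_0(p)}(p)\, e^{-C_2 x} \quad \text{if } x \ge 2 \text{ and } p < p_c, \tag{4.13} $$
and
$$ \Pr_p\{\exists \text{ cluster } C \text{ in } \Lambda_n \text{ with } \operatorname{diam}(C) \le yn \text{ and } |C| \ge x\, s(n)\} \le C_1\, y^{-d}\, e^{-C_2 x / y^{d/2}} \tag{4.14} $$
if $x \ge 0$, $0 < y \le 1$, $p \le p_c$ and $4/y \le n \le L_0(p)$.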
Proof. The bound (4.14) was proved in [BCKS98], see Remark (xiii) at the end of Section 6 in [BCKS98]. To prove (4.13) we note that for $x \ge 2$,
$$ \begin{aligned} \Pr_p\{\operatorname{diam}(C(0)) \ge x L_0(p)\} &\le \Pr_p\bigl\{0 \leftrightarrow \partial B_{L_0(p)} \text{ and } \partial B_{L_0(p)} \text{ is connected to at least } x/2 \text{ distinct boxes } B_{L_0(p)}(jL_0(p)),\ j \in 2\mathbb{Z}^d \setminus \{0\}\bigr\} \\ &= \pi_{L_0(p)}(p)\, \Pr_p\bigl\{\partial B_{L_0(p)} \text{ is connected to at least } x/2 \text{ distinct boxes } B_{L_0(p)}(jL_0(p)),\ j \in 2\mathbb{Z}^d \setminus \{0\}\bigr\} \end{aligned} $$
(see (2.11) for the definition of $B_n(v)$). As in the proof of Proposition 6.3 (ii) of [BCKS98] (more precisely, as in the proof of the bound (6.39) in [BCKS98]), the renormalized Peierls argument of Theorem 5.1 in [Kes82] shows that for suitable constants $C_1$, $C_2$ the probability
$$ \Pr_p\bigl\{\partial B_{L_0(p)} \text{ is connected to at least } x/2 \text{ distinct boxes } B_{L_0(p)}(jL_0(p)),\ j \in 2\mathbb{Z}^d \setminus \{0\}\bigr\} $$
is bounded above by $C_1 e^{-C_2 x}$. □
4.2. Some important scaling properties. In this subsection we state a number of properties of the functions $\pi_n$, $P_s$ and χ(p), most of which have already been proved in [BCKS98]. The first lemma provides an upper bound for $\pi_m(p_c)/\pi_n(p_c)$ which complements the lower bound of Postulate (IV).

Lemma 4.5. i) There are constants $C_1 < \infty$ and $C_2 > 0$ such that
$$ \frac{\pi_n(p)}{\pi_{L_0(p)}(p)} \le C_1 e^{-C_2 n / L_0(p)} \quad \text{if } p < p_c \text{ and } n \ge L_0(p). \tag{4.15} $$
ii) Assume that (4.1) holds for some ε > 0. Then
$$ \Pr_{p_c}\{\partial B_n(0) \leftrightarrow \partial B_{3n}(0)\} \le 1 - \varepsilon^{2d} \quad \text{if } n \ge 1. \tag{4.16} $$
iii) Assume that (4.1) holds for some ε > 0. Then there exist constants $C_1, \rho_2 < \infty$ such that
$$ \frac{\pi_m(p_c)}{\pi_n(p_c)} \le C_1 \Bigl(\frac{m}{n}\Bigr)^{-1/\rho_2} \quad \text{if } m \ge n \ge 1. \tag{4.17} $$

Proof. Statements i) and iii) are the content of Theorem 3.8 of [BCKS98]. To prove ii), we show that for any $p \in [0,1]$ and any $n \ge 1$, one has
$$ \Pr_p\{\partial B_n(0) \not\leftrightarrow \partial B_{3n}(0)\} \ge [1 - R_{2n,6n}(p)]^{2d}. \tag{4.18} $$
Indeed, by the definition of $R_{n,m}$, the probability that there is no occupied crossing in the 1-direction of the block
$$ [n, 3n] \times [-3n, 3n]^{d-1} \tag{4.19} $$
is equal to $1 - R_{2n,6n}$. The cube $B_{3n}(0)$ is the union of $B_n(0)$ and the block in (4.19) plus $2d - 1$ more blocks congruent to the block in (4.19). Let $F_n$ be the event that none of these 2d blocks congruent to (4.19) has an occupied crossing in the short direction. Obviously, the event $F_n$ implies that $\partial B_n(0)$ is not connected to $\partial B_{3n}(0)$, so that the probability on the left-hand side of (4.18) is bounded from below by the probability of $F_n$. Since $\Pr_p\{F_n\}$ is at least $[1 - R_{2n,6n}(p)]^{2d}$ by the Harris–FKG inequality, the bound (4.18) follows. Statement ii) then follows by taking complements at $p = p_c$ and applying (4.1) with n replaced by 2n. □

The next proposition summarizes the results of Theorem 3.7 and the first statement of Theorem 3.4 in [BCKS98]. Assuming existence of the critical exponents ρ and δ, the first statement implies the hyperscaling relation $d\rho = \delta + 1$. The second statement is the analogue of Postulate (III) for $P_{\ge s}(p)$.

Proposition 4.6. Assume that (4.1) holds for some ε > 0 and that Postulate (IV) holds. Then there exist constants $C_1 > 0$ and $C_2 < \infty$ such that
$$ C_1 \pi_n(p_c) \le P_{\ge s(n)}(p_c) \le C_2 \pi_n(p_c). \tag{4.20} $$
If Postulate (III) holds as well, then there exist constants $C_3 > 0$, $C_4 < \infty$ and $0 < \sigma_0 = \sigma_0(\varepsilon, d) \le 1$ such that
$$ C_3 P_{\ge s(n)}(p_c) \le P_{\ge s(n)}(p) \le C_4 P_{\ge s(n)}(p_c) \quad \text{if } n \le \sigma_0 L_0(p). \tag{4.21} $$
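The step from (4.20) to the hyperscaling relation $d\rho = \delta + 1$ can be spelled out as follows. This is a heuristic sketch that assumes pure power laws $\pi_n(p_c) \approx n^{-1/\rho}$ and $P_{\ge s}(p_c) \approx s^{-1/\delta}$ (the usual meaning of these exponents) and uses $s(n) = (2n)^d \pi_n(p_c)$, the form in which $s(n)$ enters (5.9).

```latex
% Heuristic exponent matching, assuming the exponents \rho and \delta exist.
\begin{align*}
s(n) &= (2n)^d\,\pi_n(p_c) \approx n^{\,d - 1/\rho},\\
P_{\ge s(n)}(p_c) &\approx s(n)^{-1/\delta} \approx n^{-(d - 1/\rho)/\delta},\\
\text{(4.20):}\quad P_{\ge s(n)}(p_c) \asymp \pi_n(p_c) \approx n^{-1/\rho}
  &\;\Longrightarrow\; \frac{d - 1/\rho}{\delta} = \frac{1}{\rho}
  \;\Longrightarrow\; d\rho - 1 = \delta .
\end{align*}
```

The first line is also the source of the fractal dimension $d_f = d - 1/\rho$ quoted in Remark i).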
Our last two results in this section summarize the results of several theorems in [BCKS98], namely Theorem 3.5, Theorem 3.6 and Theorem 3.9. Proposition 4.8 in particular has two upper bounds complementing lower bounds in the postulates, and a hyperscaling relation. Assuming the existence of the corresponding exponents, this relation implies $\gamma = (d - 2/\rho)\nu$.

Lemma 4.7. Assume Postulate (IV) holds. Then there exist constants $0 < C_i < \infty$ such that
$$ \frac{P_{\ge x s(L_0(p))}(p)}{\pi_{L_0(p)}(p_c)} \le C_1 e^{-C_2 x} \quad \text{if } p < p_c \text{ and } x \ge 1. \tag{4.22} $$

Proposition 4.8. Assume that (4.1) is valid for some ε > 0, and that Postulates (III) and (IV) hold. Then there exist constants $0 < C_i < \infty$ such that, with $\sigma_0$ as in Proposition 4.6, it holds that
$$ \frac{P_{\ge x s(L_0(p))}(p)}{P_{\ge s(\sigma_0 L_0(p))}(p)} \le C_1 \exp[-C_2 x] \quad \text{if } x \ge 1 \text{ and } p < p_c, \tag{4.23} $$
and
$$ C_3 L_0(p)^d\, [\pi_{L_0(p)}(p_c)]^2 \le \chi(p) \le C_4 L_0(p)^d\, [\pi_{L_0(p)}(p_c)]^2, \qquad p < p_c. \tag{4.24} $$
If (4.1) and (4.2) are valid for some ε > 0 and some $\sigma_3 > 0$, and if Postulate (IV) holds, then there exists a constant $C_5 > 0$ such that
$$ C_5 L_0(p)^d\, [\pi_{L_0(p)}(p_c)]^2 \le \chi^{\mathrm{fin}}(p), \qquad p > p_c. \tag{4.25} $$
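The relation $\gamma = (d - 2/\rho)\nu$ mentioned before Lemma 4.7 can be read off from (4.24) by the same kind of exponent matching. The sketch assumes pure power laws $\chi(p) \approx |p - p_c|^{-\gamma}$, $L_0(p) \approx |p - p_c|^{-\nu}$ (as in (3.19)) and $\pi_n(p_c) \approx n^{-1/\rho}$; it is a heuristic reading, not a proof.

```latex
% Heuristic exponent matching for (4.24), assuming \gamma, \nu and \rho exist.
\begin{align*}
L_0(p)^d\,[\pi_{L_0(p)}(p_c)]^2
  &\approx L_0(p)^{\,d - 2/\rho}
   \approx |p - p_c|^{-\nu (d - 2/\rho)},\\
\text{(4.24):}\quad \chi(p) \asymp L_0(p)^d\,[\pi_{L_0(p)}(p_c)]^2
  &\;\Longrightarrow\; \gamma = \Bigl(d - \frac{2}{\rho}\Bigr)\nu .
\end{align*}
```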
5. Proof of the Theorems, Given the Postulates

In this section, we prove our principal results, Theorems 3.1–3.5. The section is divided into three subsections. These correspond to the proof of results within, above and below the scaling window: Theorem 3.1 i), Theorem 3.3 and Theorem 3.4 in Sect. 5.1, Theorem 3.1 iii) and Theorem 3.2 in Sect. 5.2, and finally, Theorem 3.1 ii) and Theorem 3.5 in Sect. 5.3.

5.1. Inside the scaling window. We start this subsection with several lemmas and propositions concerning the numbers $N_\Lambda(s_1, s_2)$ and $\tilde N_\Lambda(s_1, s_2)$ of clusters with size between $s_1$ and $s_2$, defined in (2.1) and (2.2). Although some of these results are very similar to the theorems we are finally going to prove, we give them as separate propositions, since this allows us to better keep track of which postulates are needed in which step.

At many points in this and the following subsections, we use the fact that, for an arbitrary configuration ω and number α, it holds that
$$ \sum_{i \ge 1} \bigl(W^{(i)}_\Lambda\bigr)^{\alpha} = \sum_{v \in \Lambda} \sum_{s \ge 1} s^{\alpha - 1}\, I[|C_\Lambda(v)| = s]. \tag{5.1} $$
This is obvious from the fact that in the right-hand side, the sum of $I[|C_\Lambda(w)| = s]$ over all points w in $C_\Lambda(v)$ equals $s\, I[|C_\Lambda(v)| = s]$. Taking expectations of (5.1) gives
$$ E_p\Bigl\{\sum_{i \ge 1} \bigl(W^{(i)}_\Lambda\bigr)^{\alpha}\Bigr\} = \sum_{v \in \Lambda} \sum_{s \ge 1} s^{\alpha - 1}\, \Pr_p\{|C_\Lambda(v)| = s\}. \tag{5.2} $$
This argument for α = 1 will be used in the proof of Proposition 5.5, but even more often we will use the special case α = 0, which says that the number of clusters of size s can be rewritten as
$$ \bigl|\{i \mid W^{(i)}_\Lambda = s\}\bigr| = \frac{1}{s} \sum_{v \in \Lambda} I[|C_\Lambda(v)| = s]. \tag{5.3} $$
These formulae and some variants form a basic relationship which allows us to relate estimates on the distributions of $W^{(i)}_\Lambda$ and $|C(0)|$. We use the following consequence of (5.3):
$$ E_p\{N_\Lambda(s_1, s_2)\} = \sum_{s = s_1}^{s_2} \frac{1}{s} \sum_{v \in \Lambda} \Pr_p\{|C_\Lambda(v)| = s\}. \tag{5.4} $$
In a similar way, we have
$$ E_p\{\tilde N_\Lambda(s_1, s_2)\} = \sum_{s = s_1}^{s_2} \frac{1}{s} \sum_{v \in \Lambda} \Pr_p\{|C_\Lambda(v)| = s,\ v \not\leftrightarrow \partial\Lambda\}. \tag{5.5} $$
We also need the corresponding representation for the expectation of $\tilde N_\Lambda^2(s_1, s_2)$:
$$ E_p\{\tilde N_\Lambda^2(s_1, s_2)\} = \sum_{v, w \in \Lambda}\ \sum_{s_1 \le s \le s_2}\ \sum_{s_1 \le \tilde s \le s_2} \frac{1}{s \tilde s}\, \Pr_p\{|C_\Lambda(v)| = s,\ v \not\leftrightarrow \partial\Lambda,\ |C_\Lambda(w)| = \tilde s,\ w \not\leftrightarrow \partial\Lambda\}. \tag{5.6} $$
The next two propositions will be useful in proving lower bounds for $W^{(i)}_\Lambda$.
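The counting identities (5.1) and (5.3) hold configuration by configuration, so they can be checked numerically on a single sample. The following is a minimal sketch, not part of the paper's argument: it assumes bond percolation on a two-dimensional box (the paper works in general d), and the function names `percolation_clusters`, `moment_over_clusters` and `moment_over_sites` are hypothetical helpers introduced here for illustration.

```python
import random
from collections import Counter
from fractions import Fraction


def percolation_clusters(n, p, seed=0):
    """Sample bond percolation on the box {0,...,n-1}^2 (d = 2 is an assumption
    made for brevity). Returns (W, size_at): W lists the cluster sizes in
    decreasing order, size_at[v] is the size of the cluster containing site v."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(n) for y in range(n)]
    parent = {v: v for v in sites}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for (x, y) in sites:
        for (dx, dy) in ((1, 0), (0, 1)):
            w = (x + dx, y + dy)
            # Each nearest-neighbour bond inside the box is occupied with probability p.
            if w in parent and rng.random() < p:
                parent[find((x, y))] = find(w)

    root_size = Counter(find(v) for v in sites)
    W = sorted(root_size.values(), reverse=True)
    size_at = {v: root_size[find(v)] for v in sites}
    return W, size_at


def moment_over_clusters(W, alpha):
    """Left-hand side of (5.1): sum over clusters of W^alpha (exact arithmetic)."""
    return sum(Fraction(w) ** alpha for w in W)


def moment_over_sites(size_at, alpha):
    """Right-hand side of (5.1): sum over sites v of |C(v)|^(alpha - 1)."""
    return sum(Fraction(s) ** (alpha - 1) for s in size_at.values())
```

Calling `percolation_clusters(8, 0.5, seed=1)` and comparing `moment_over_clusters(W, alpha)` with `moment_over_sites(size_at, alpha)` for integer α checks (5.1); the case α = 0 is the cluster-counting identity behind (5.3). Exact rational arithmetic is used so that the comparison for α = 0 does not depend on floating-point rounding.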
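Proposition 5.1. Assume that (4.1) holds for some ε > 0 and that Postulates (III) and (IV) hold. Then there exist constants $0 < C_i < \infty$ and $1 \le \sigma_1 < \infty$ such that
$$ C_1 \Bigl(\frac{n}{m}\Bigr)^{d} \le E_p\{\tilde N_{\Lambda_n}(s(m), s(km))\} \le E_p\{N_{\Lambda_n}(s(m), s(km))\} \le C_2 \Bigl(\frac{n}{m}\Bigr)^{d}, \tag{5.7} $$
provided $\sigma_1 m \le \min\{L_0(p), n\}$ and $k \ge \sigma_1$.

Proof. For brevity we write Λ instead of $\Lambda_n$. We start with the upper bound. Using the representation (5.4) and bounding the factor 1/s in (5.4) by 1/s(m), we get
$$ \begin{aligned} E_p\{N_\Lambda(s(m), s(km))\} &\le \frac{1}{s(m)} \sum_{v \in \Lambda} \sum_{s \ge s(m)} \Pr_p\{|C_\Lambda(v)| = s\} \\ &= \frac{1}{s(m)} \sum_{v \in \Lambda} \Pr_p\{|C_\Lambda(v)| \ge s(m)\} \\ &\le \frac{(2n)^d}{s(m)}\, P_{\ge s(m)}(p), \end{aligned} \tag{5.8} $$
where in the last step we used the definition (2.4) of $P_{\ge s(m)}(p)$ and the fact that $|C_\Lambda(v)| \le |C(v)|$. Without loss of generality we shall take $\sigma_1 \ge 1/\sigma_0 \ge 1$, where $\sigma_0$ is the constant of Proposition 4.6. Then $\sigma_1 m \le L_0(p)$ implies $m \le \sigma_0 L_0(p)$, and we may use Proposition 4.6 to bound the right-hand side of (5.8). We get for some finite constant $C_2$,
$$ \frac{(2n)^d}{s(m)}\, P_{\ge s(m)}(p) \le C_2 \frac{(2n)^d}{s(m)}\, \pi_m(p_c) = C_2 \Bigl(\frac{n}{m}\Bigr)^{d}. \tag{5.9} $$
The estimates (5.8) and (5.9) imply the upper bound.

To prove the lower bound, we use that Postulate (IV) implies that
$$ \frac{s(\ell)}{s(\ell')} \ge D_3 \Bigl(\frac{\ell}{\ell'}\Bigr)^{d/2} \quad \text{if } \ell \ge \ell' \ge 1, \tag{5.10} $$
so that in particular $s(\ell) \ge s(\ell')$ whenever $\ell/\ell' \ge D_3^{-2/d}$. We conclude that for any choice of $\tilde k \ge 1$ we can find a $\sigma_1 \ge \tilde k (1 + 1/\sigma_0)$ such that $s(km) \ge s(\tilde k m)$ for all $k \ge \sigma_1$. It then follows from (5.5) that for $k \ge \sigma_1$,
$$ \begin{aligned} E_p\{\tilde N_\Lambda(s(m), s(km))\} &\ge E_p\{\tilde N_\Lambda(s(m), s(\tilde k m) - 1)\} \\ &\ge \sum_{s = s(m)}^{s(\tilde k m) - 1} \frac{1}{s} \sum_{v \in \Lambda_{n/2}} \Pr_p\{|C_\Lambda(v)| = s,\ v \not\leftrightarrow \partial\Lambda\} \\ &= \sum_{s = s(m)}^{s(\tilde k m) - 1} \frac{1}{s} \sum_{v \in \Lambda_{n/2}} \Pr_p\{|C(v)| = s,\ v \not\leftrightarrow \partial\Lambda\}, \end{aligned} \tag{5.11} $$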
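where in the second step we bounded the sum over $\Lambda = \Lambda_n$ from below by a sum over $\Lambda_{n/2}$. Bounding the factor 1/s in (5.11) from below by $1/s(\tilde k m)$, we get
$$ \begin{aligned} E_p\{\tilde N_\Lambda(s(m), s(km))\} &\ge \frac{1}{s(\tilde k m)} \sum_{v \in \Lambda_{n/2}} \Pr_p\{s(m) \le |C(v)| < s(\tilde k m),\ v \not\leftrightarrow \partial\Lambda\} \\ &= \frac{1}{s(\tilde k m)} \sum_{v \in \Lambda_{n/2}} \Bigl[\Pr_p\{s(m) \le |C(v)| < s(\tilde k m)\} - \Pr_p\{s(m) \le |C(v)| < s(\tilde k m),\ v \leftrightarrow \partial\Lambda\}\Bigr] \\ &\ge \frac{1}{s(\tilde k m)} \sum_{v \in \Lambda_{n/2}} \Bigl[\Pr_p\{s(m) \le |C(v)| < s(\tilde k m)\} - \pi_{n/2}(p)\Bigr] \\ &\ge \frac{(n-2)^d}{s(\tilde k m)} \Bigl[P_{\ge s(m)}(p) - P_{\ge s(\tilde k m)}(p) - \pi_{n/2}(p)\Bigr]. \end{aligned} \tag{5.12} $$
Since $n \ge \sigma_1 m \ge \tilde k m$ by the assumption $\sigma_1 m \le \min\{L_0(p), n\}$, we obtain
$$ E_p\{\tilde N_\Lambda(s(m), s(km))\} \ge \frac{(n-2)^d}{s(\tilde k m)} \Bigl[P_{\ge s(m)}(p) - P_{\ge s(\tilde k m)}(p) - \pi_{\tilde k m/2}(p)\Bigr]. \tag{5.13} $$
Again by the assumption $\sigma_1 m \le \min\{L_0(p), n\}$, we have $m \le \tilde k m \le (\tilde k/\sigma_1) L_0(p) \le \sigma_0 L_0(p)$. We therefore may use Proposition 4.6 in conjunction with Postulate (III) and the bound $\pi_{\tilde k m}(p_c) \le \pi_{\tilde k m/2}(p_c)$ to conclude that
$$ E_p\{\tilde N_\Lambda(s(m), s(km))\} \ge \frac{(n-2)^d}{s(\tilde k m)} \Bigl[C_3 \pi_m(p_c) - C_4 \pi_{\tilde k m/2}(p_c)\Bigr], \tag{5.14} $$
for suitable constants $C_3, C_4 \in (0, \infty)$ which depend only on the constants in Proposition 4.6, but not on the choice of $\tilde k$. Finally we appeal to Lemma 4.5 iii) to fix $\tilde k$ so large that $C_4 \pi_{\tilde k m/2}(p_c) \le \tfrac12 C_3 \pi_m(p_c)$. Here $\tilde k$ depends only on $C_4/C_3$ and the constants in Lemma 4.5 iii); also $\tilde k$ determines the value to take for $\sigma_1$. We then get
$$ E_p\{\tilde N_\Lambda(s(m), s(km))\} \ge \frac{(n-2)^d}{s(\tilde k m)}\, \frac12\, C_3 \pi_m(p_c). \tag{5.15} $$
From $s(\tilde k m) \le \tilde k^d s(m)$ we then conclude that for $n \ge 4$,
$$ E_p\{\tilde N_\Lambda(s(m), s(km))\} \ge C_1 \frac{(2n)^d}{s(m)}\, \pi_m(p_c) = C_1 \Bigl(\frac{n}{m}\Bigr)^{d}, \tag{5.16} $$
where $C_1 = 2^{-2d-1} \tilde k^{-d} C_3$. This proves the lower bound when $n \ge 4$. If we choose $\sigma_1$ large enough, then $1 \le n < 4$ is ruled out by $\sigma_1 \le \sigma_1 m \le n$. □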
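Proposition 5.2. Assume that (4.1) holds for some ε > 0 and that Postulates (III) and (IV) hold. Then there is a constant $C_3 < \infty$ such that
$$ \frac{\operatorname{Var}\{\tilde N_{\Lambda_n}(s(m), s(km))\}}{\bigl[E_p\{\tilde N_{\Lambda_n}(s(m), s(km))\}\bigr]^2} \le C_3, \tag{5.17} $$
provided $\sigma_1 m \le \min\{L_0(p), n\}$, $k \ge \sigma_1$. Here $\sigma_1$ is the constant of Proposition 5.1.

Proof. Again we write Λ for $\Lambda_n$. We first will prove that for arbitrary $s_1, s_2 \in \mathbb{N}$, $s_1 \le s_2$, and $p \in (0,1)$,
$$ E_p\{[\tilde N_\Lambda(s_1, s_2)]^2\} \le E_p\{\tilde N_\Lambda(s_1, s_2)\} \Bigl[1 + \frac{(2n)^d}{s_1}\, P_{\ge s_1}(p)\Bigr]. \tag{5.18} $$
We need some notation. We denote the set of bonds with both endpoints in Λ by $\mathcal{B}(\Lambda)$, and the set of bonds with both endpoints in $\Lambda \setminus \partial\Lambda$ by $\tilde{\mathcal{B}}(\Lambda)$. Let B be a subset of $\mathcal{B}(\Lambda)$. With a slight abuse of notation, we say that v is a point in B if v is an endpoint of one of the bonds in B. We write "B is occupied (vacant)" for the event that all bonds in $B \subset \mathcal{B}(\Lambda)$ are occupied (respectively, vacant). Given $v \in \Lambda$, we denote the set of all connected subsets $B \subset \tilde{\mathcal{B}}(\Lambda)$ that contain the point v by $\mathcal{B}_v(\Lambda)$. Again with a slight abuse of notation, we denote the number of points in a cluster $B \in \mathcal{B}_v(\Lambda)$ by |B|. Finally, we write $\partial_\Lambda B$ for the set of all bonds $b \in \mathcal{B}(\Lambda) \setminus B$ which share an endpoint with a bond $b' \in B$. Using Eq. (5.6), we rewrite the left-hand side of (5.18) as
$$ E_p\{[\tilde N_\Lambda(s_1, s_2)]^2\} = \sum_{v, w \in \Lambda}\ \sum_{\substack{B \in \mathcal{B}_v(\Lambda) \\ s_1 \le |B| \le s_2}}\ \sum_{\substack{\tilde B \in \mathcal{B}_w(\Lambda) \\ s_1 \le |\tilde B| \le s_2}} \frac{\Pr_p\{B \cup \tilde B \text{ is occupied},\ \partial_\Lambda B \cup \partial_\Lambda \tilde B \text{ is vacant}\}}{|B|\, |\tilde B|}. \tag{5.19} $$
Next we observe that the event on the right-hand side cannot occur if $B \ne \tilde B$ and $B \leftrightarrow \tilde B$ in Λ, because in this case some occupied bond in $B \cup \tilde B \cup$ (a suitable path from B to $\tilde B$) also lies in $\partial_\Lambda B \cup \partial_\Lambda \tilde B$. As a consequence, the right-hand side decomposes into two terms: the term
$$ \sum_{v, w \in \Lambda}\ \sum_{\substack{B \in \mathcal{B}_v(\Lambda) \cap \mathcal{B}_w(\Lambda) \\ s_1 \le |B| \le s_2}} \frac{\Pr_p\{B \text{ is occupied},\ \partial_\Lambda B \text{ is vacant}\}}{|B|^2} = \sum_{v \in \Lambda}\ \sum_{\substack{B \in \mathcal{B}_v(\Lambda) \\ s_1 \le |B| \le s_2}} \frac{\Pr_p\{B \text{ is occupied},\ \partial_\Lambda B \text{ is vacant}\}}{|B|} = E_p\{\tilde N_\Lambda(s_1, s_2)\} \tag{5.20} $$
and the term
$$ \sum_{v, w \in \Lambda}\ \sum_{\substack{B \in \mathcal{B}_v(\Lambda) \\ s_1 \le |B| \le s_2}}\ \sum_{\substack{\tilde B \in \mathcal{B}_w(\Lambda) \\ s_1 \le |\tilde B| \le s_2,\ B \not\leftrightarrow \tilde B}} \frac{\Pr_p\{B \cup \tilde B \text{ is occupied},\ \partial_\Lambda B \cup \partial_\Lambda \tilde B \text{ is vacant}\}}{|B|\, |\tilde B|}. \tag{5.21} $$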
By using the second decoupling inequality of [BC96], or, alternatively, the van den Berg-Kesten inequality [BK85] we see that the last sum equals
v,w∈
≤
w ( ) B∈Bv ( )\Bw ( ) B∈B s1 ≤|B|≤s2 s1 ≤|B|≤s 2
1 s1
v,w∈
≤
1 s1
v,w∈
B∈Bv ( )\Bw ( ) s1 ≤|B|≤s2
B∈Bv ( )\Bw ( ) s1 ≤|B|≤s2
is occupied, ∂ A B ∪ ∂ A B Prp {B ∪ B
is vacant} |B| |B|
A B is vacant, |C (w)| ≥ s } Prp {B is occupied, ∂
1 |B| A B is vacant} Pr {|C (w)| ≥ s } Prp {B is occupied, ∂
p
1 |B|
Prp {|C (w)| ≥ s1 }
(s1 , s2 ) . ≤ Ep N s1 w∈
(5.22) Combining the two terms (5.20) and (5.22), and observing that Prp {|C (w)| ≥ s1 } ≤ Prp {|C(w)| ≥ s1 }, we obtain (5.18). The bound (5.17) now follows from (5.18), (5.9) and the lower bound in (5.7). " # The next proposition is a consequence of Proposition 5.1, Proposition 5.2 and the fact that (s(m), s(km)) ≥ N 1 (s(m), s(km)) + N 2 (s(m), s(km)), N provided 1 ⊂ and 2 = \ 1 . Proposition 5.3. Assume that (4.1) holds for some ε > 0 and that Postulates (III) and (IV) hold. Then there are constants C4 , C5 > 0 such that Prp
d d n m N n (s(m), s(km)) ≥ C4 ≥ 1 − C5 , m n
(5.23)
provided σ1 m ≤ min{L0 (p), n} and k ≥ σ1 . Here σ1 is the constant of Proposition 5.1. Proof. Let k = n/'σ1 m( be the largest integer less than or equal to n/'σ1 m(, and (s(m), s(km)) is increasing in , n = k'σ1 m(. Note that then σ1 m ≤ n ≤ n. Since N i.e., (s(m), s(km)) ≥ N
⊂ , N (s(m), s(km)) if
(5.24)
= we get that for = n ,
n, d d n (s(m), s(km)) ≥ C4 n
Prp N ≥ Prp N . (s(m), s(km)) ≥ C4 m m (5.25)
contains Next we note that
k d disjoint subvolumes (i) of size (2'σ1 m()d , and introduce the random variable k d
X=
i=1
(i) (s(m), s(km)). N
(5.26)
(s(m), s(km)), we have
Since X ≤ N (s(m), s(km)) ≤ N d d n n ≥ Prp X ≥ C4 . Prp N (s(m), s(km)) ≥ C4 m m
(5.27)
(i) (s(m), s(km)) in (5.26) are i.i.d. and using Observing that the random variables N Proposition 5.2, we have (1) (s(m), s(km))} 1 Var{N C6 Var X = d ≤ d. 2 2 (Ep X) k Ep {N (1) (s(m), s(km))} k
(5.28)
Noting that
4 Var X , Prp X ≤ 21 Ep X ≤ Prp |X − Ep X|2 ≥ 41 (Ep X)2 ≤ (Ep X)2
(5.29)
we find that
Prp X ≥
1 2 Ep X
d m 4C6 ≥ 1 − d ≥ 1 − C5 , n k
k = n/'σ1 m( where C5 = (4σ1 )d 4C6 (note that 1/ lower bound
−1
(5.30)
≤ 4σ1 m/n). Using finally the
(1) (s(m), s(km)) ≥ C1 ( Ep X = k d Ep N kσ1 )d = C1
n m
d
,
which comes from (5.7), we obtain the desired bound (5.23), provided $C_4 > 0$ is chosen small enough. □

Proposition 5.4. Suppose that Postulates (III) and (IV) hold, and that (4.1) and (4.2) are valid for some ε > 0 and some $\sigma_3 > 0$. Then there are strictly positive constants $C_1$ and $\sigma_4$ such that
$$ \Pr_p\{W^{(1)}_{\Lambda_n} \le s(m)\} \ge C_1^{\,1 + (n/m)^d}, \tag{5.31} $$
provided $m \le \sigma_4 L_0(p)$.

Proof. It follows from (4.10) and (5.10) that there exists a constant $\sigma_4 > 0$ such that
$$ \Pr_p\{W^{(1)}_{\Lambda_{3r}} \le s(m)\} \ge \frac12 \quad \text{if } r \le \sigma_4 m \text{ and } r \le \frac13 L_0(p). \tag{5.32} $$
In addition, it follows from (4.18) that
$$ \Pr_p\{v \not\leftrightarrow \partial\Lambda_{3r} \text{ for all } v \in \Lambda_r\} \ge [1 - R_{r,3r}(p)]^{2d}. \tag{5.33} $$
For $p \le p_c$, $1 - R_{r,3r}(p) \ge 1 - R_{r,3r}(p_c) > \varepsilon$ (see (4.1)). We still have $1 - R_{r,3r}(p) > \varepsilon$ for $p > p_c$ and $r \le \sigma_3 L_0(p)$, by virtue of (4.2). Consequently, as in (4.16),
$$ \Pr_p\{v \not\leftrightarrow \partial B_{3r} \text{ for all } v \in \Lambda_r\} \ge \varepsilon^{2d} \quad \text{if } r \le \sigma_3 L_0(p). \tag{5.34} $$
Using the Harris–FKG inequality we obtain from (5.32) and (5.34) that
$$ \begin{aligned} \Pr_p\{|C(v)| \le s(m) \text{ for all } v \in \Lambda_r\} &\ge \Pr_p\{W^{(1)}_{\Lambda_{3r}} \le s(m) \text{ and } v \not\leftrightarrow \partial B_{3r} \text{ for all } v \in \Lambda_r\} \\ &\ge \frac12\, \varepsilon^{2d} \quad \text{if } r \le (\sigma_3 \wedge 1/3) L_0(p) \wedge \sigma_4 m. \end{aligned} \tag{5.35} $$
We are now ready to prove (5.31) for arbitrary n. We first estimate
$$ \Pr_p\{W^{(1)}_\Lambda \le s(m)\} \ge \Pr_p\{|C(v)| \le s(m) \text{ for all } v \in \Lambda\} \tag{5.36} $$
and note that the right-hand side of (5.36) is decreasing in Λ. Let $m \le \sigma_4 L_0(p)$ and choose $0 < \sigma_5 \le \sigma_4$ such that $\sigma_4 \sigma_5 \le (\sigma_3 \wedge 1/3)$. Then choose an integer $r \ge 1$ in $[\sigma_5 m/2, \sigma_5 m]$; if this is not possible, because $\sigma_5 m < 1$, then take r = 1. For this choice of r, $\Pr_p\{|C(v)| \le s(m) \text{ for all } v \in \Lambda_r\} \ge C_1 > 0$ for some constant $C_1$, by virtue of (5.35). If n < r, then this already implies (5.31). Otherwise, choose an integer k such that $n \le \tilde n := kr \le 2n$. We then get
$$ \Pr_p\{W^{(1)}_{\Lambda_n} \le s(m)\} \ge \Pr_p\Bigl\{\bigcap_{v \in \Lambda_{\tilde n}} \{|C(v)| \le s(m)\}\Bigr\}. \tag{5.37} $$
Decomposing $\Lambda_{\tilde n}$ into $k^d$ subvolumes $\Lambda^{(i)}$ of diameter 2r, and using the Harris–FKG inequality for the intersection of the events $\bigcap_{v \in \Lambda^{(i)}} \{|C(v)| \le s(m)\}$, we obtain
$$ \Pr_p\{W^{(1)}_{\Lambda_n} \le s(m)\} \ge \Bigl[\Pr_p\Bigl\{\bigcap_{v \in \Lambda_r} \{|C(v)| \le s(m)\}\Bigr\}\Bigr]^{k^d} \ge C_1^{\,k^d}. \tag{5.38} $$
The proof is concluded by observing that $k \le 2n/r \le 4n/(\sigma_5 m)$. □
Proof of Theorem 3.1 i). For this proof we only use (4.1) and Postulates (III) and (IV). As before, abbreviate $\Lambda_n$ to Λ. Since $\limsup_{n \to \infty} |g(p_n, n)| < \infty$, we have
$$ n \le \lambda L_0(p_n) \quad \text{for all } n \ge n_1, \tag{5.39} $$
where λ and $n_1$ are finite constants depending on the sequence $\{p_n\}$. The fact that $E_p\{W^{(1)}_\Lambda\}/s(n)$ is bounded above is immediate from Proposition 4.3. If $n \le L_0(p_n)$ then (4.10) suffices. If $L_0(p_n) \le n \le \lambda L_0(p_n)$, then we use (4.8) or (4.11) plus the fact that $s(n) \ge D_3 s(L_0(p_n))$ (by (5.10)). Note that this part of the proof only requires Postulates (III) and (IV), and does not rely on the assumption (4.1).
(i) In order to complete the proof, we need lower bounds on Ep W . To this end, we
first note that Proposition 5.3 implies that for any δ > 0 there are constants 1 ≤ σ (i) = σ (i) (λ, δ) < ∞ such that (i) Prp W n ≥ s(m) ≥ 1 − δ, (5.40) provided σ (i) m ≤ n ≤ λL0 (p). Indeed, choose σ (i) (λ, δ) ≥ σ1 (with the constant σ1 as in Proposition 5.1) so large that i) σ (i) m ≤ λL0 (p) implies σ1 m ≤ L0 (p), ii) C4 (σ (i) )d ≥ i, and iii) C5 (σ (i) )−d ≤ δ, where C4 , C5 are as in Proposition 5.3. Then for σ (i) m ≤ n ≤ λL0 (p), we get
(i)
(s(m), s(σ1 m)) ≥ i Prp W ≥ s(m) = Prp N (s(m), ∞) ≥ i ≥ Prp N d (s(m), s(σ1 m)) ≥ C4 n ≥ Prp N , m (5.41) where we used that σ (i) m ≤ n implies C4 (n/m)d ≥ i in the last step. Combined with Proposition 5.3 and the fact that the assumption σ (i) m ≤ n implies C5 (m/n)d ≤ δ by our choice of σ (i) , the bound (5.41) implies (5.40). (i) (i) In order to prove a lower bound on lim inf Epn {W n }, we now assume that n ≥ n1 := n→∞
max{n1 , σ (i) }, where n1 and λ are the constants from (5.39), and σ (i) = σ (i) (λ, 21 ). Choosing m = n/σ (i) , we have m ≥ 1 and σ (i) m ≤ n ≤ λL0 (pn ). Thus, by (5.40)
(i) (5.42) Epn W n ≥ 21 s(m). Since m ≤ n/σ (i) ≤ m + 1 ≤ 2m by the definition of m, we have s(n)/s(m) ≤ (n/m)d ≤ (2σ (i) )d ,
(5.43)
(i)
and hence s(m) ≥ s(n)(2σ (i) )−d . Thus, with C1 (λ) = 21 (2σ (i) )−d , we have
(i) (i) Epn W n ≥ C1 (λ)s(n). This completes the proof of the lower bound.
(5.44)
# "
Proof of Theorem 3.3. For this proof we use (4.1), (4.2) and Postulates (III) and (IV). We start with a lower bound on $\Pr_{p_n}\{W^{(i)}_{\Lambda_n} \ge K^{-1} E_{p_n}(W^{(i)}_{\Lambda_n})\}$. We again have (5.39) for some λ and $n_1$, and by Theorem 3.1 i) (whose proof only used (4.1) and Postulates (III) and (IV)) there exists some constant $C_2^{(i)}$, which depends on the sequence $\{p_n\}$, such that $E_{p_n}\{W^{(i)}_{\Lambda_n}\} \le C_2^{(i)} s(n)$. Thus if m is such that
$$ s(m) \ge K^{-1} C_2^{(i)} s(n), \tag{5.45} $$
then
$$ \Pr_{p_n}\{W^{(i)}_{\Lambda_n} \ge K^{-1} E_{p_n}(W^{(i)}_{\Lambda_n})\} \ge \Pr_{p_n}\{W^{(i)}_{\Lambda_n} \ge s(m)\}. \tag{5.46} $$
We now choose $m = \lfloor n/\sigma^{(i)}(\lambda, \delta) \rfloor$, where the $\sigma^{(i)}$ are the constants introduced above (5.40). Then (5.45) will be satisfied for large enough K (by (5.43)). Since $n \ge n_1$ and $n \ge \sigma^{(i)}(\lambda, \delta)$ imply $m \ge 1$ and $m \sigma^{(i)}(\lambda, \delta) \le n \le \lambda L_0(p_n)$, we now can use (5.40) to conclude that
$$ \liminf_{n \to \infty} \Pr_{p_n}\{W^{(i)}_{\Lambda_n} \ge K^{-1} E_{p_n}(W^{(i)}_{\Lambda_n})\} \ge 1 - \delta, \tag{5.47} $$
provided K is large enough. Together with Markov's inequality,
$$ \Pr_{p_n}\{W^{(i)}_{\Lambda_n} \ge K E_{p_n}(W^{(i)}_{\Lambda_n})\} \le K^{-1}, \tag{5.48} $$
(5.47) implies Theorem 3.3 i).

In order to complete the proof of Theorem 3.3, we choose m(n) as the maximal $m \le (\sigma_4/\lambda \wedge 1) n$ such that $K^{-1} C_1^{(i)}(\lambda) s(n) > s(m)$, where $\sigma_4$ is as in Proposition 5.4, λ as in (5.39) and $C_1^{(i)}$ as in (5.44). Then, by (5.44) and $W^{(i)}_\Lambda \le W^{(1)}_\Lambda$, we have
$$ \begin{aligned} \limsup_{n \to \infty} \Pr_{p_n}\{W^{(i)}_{\Lambda_n} \ge K^{-1} E_{p_n}(W^{(i)}_{\Lambda_n})\} &\le \limsup_{n \to \infty} \Pr_{p_n}\{W^{(i)}_{\Lambda_n} \ge K^{-1} C_1^{(i)}(\lambda) s(n)\} \\ &\le \limsup_{n \to \infty} \Pr_{p_n}\{W^{(1)}_{\Lambda_n} \ge K^{-1} C_1^{(i)}(\lambda) s(n)\} \\ &\le \limsup_{n \to \infty} \bigl[1 - \Pr_{p_n}\{W^{(1)}_{\Lambda_n} \le s(m(n))\}\bigr]. \end{aligned} \tag{5.49} $$
Since n/m(n) is bounded above by virtue of Postulate (IV) (see (5.10)), Proposition 5.4 shows that the right-hand side of (5.49) is bounded away from 1. This proves Theorem 3.3 ii). □

Proof of Theorem 3.4. For this proof we only use (4.1), and Postulates (III) and (IV). Theorem 3.4 follows immediately from Proposition 5.1. Indeed, let λ and $n_1$ be the constants from (5.39), and $C_1$, $C_2$ and $\sigma_1$ be those from Proposition 5.1. Choose $\sigma_2 \ge \max\{\sigma_1, \lambda\sigma_1, n_1\}$. We note that then $m \ge 1$ and $\sigma_2 m \le n$ imply $n \ge n_1$, and hence $n \le \lambda L_0(p_n)$ and $\sigma_1 m \le L_0(p_n)$. The conditions of Theorem 3.4 therefore imply those of Proposition 5.1, proving Theorem 3.4 under the assumption that (4.1), as well as Postulates (III) and (IV), hold. □
5.2. Above the scaling window. In this subsection, we prove Theorem 3.1 iii) and Theorem 3.2. To this end, we consider separately those clusters $C^{(i)}_\Lambda$ which intersect the infinite cluster $C_\infty$ and those which do not. We denote the clusters intersecting $C_\infty$ by $C^{(1)}_{\Lambda,\infty}, C^{(2)}_{\Lambda,\infty}, \dots, C^{(k)}_{\Lambda,\infty}$, ordering them again from largest to smallest size, with lexicographic order between clusters of the same size. In the same way, $C^{(1)}_{\Lambda,\mathrm{fin}}, C^{(2)}_{\Lambda,\mathrm{fin}}, \dots, C^{(k)}_{\Lambda,\mathrm{fin}}$ denote the clusters in Λ which do not intersect the infinite cluster $C_\infty$. Finally, $W^{(i)}_{\Lambda,\mathrm{fin}} = |C^{(i)}_{\Lambda,\mathrm{fin}}|$ and $W^{(i)}_{\Lambda,\infty} = |C^{(i)}_{\Lambda,\infty}|$ denote the sizes of the i-th largest clusters in the corresponding classes.
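Proposition 5.5. Suppose that Postulates (V) and (VI) hold. Then there exists a constant $C_1 < \infty$ such that
$$ \frac{E_p\{W^{(1)}_{\Lambda_n,\mathrm{fin}}\}}{|\Lambda_n| P_\infty(p)} \le C_1 \Bigl(\frac{L_0(p)}{n}\Bigr)^{d/2} \quad \text{if } p > p_c, \tag{5.50} $$
so that in particular
$$ \frac{E_{p_n}\{W^{(1)}_{\Lambda_n,\mathrm{fin}}\}}{|\Lambda_n| P_\infty(p_n)} \to 0 \quad \text{as } n \to \infty \tag{5.51} $$
whenever $p_n > p_c$ is a sequence of densities such that $n/L_0(p_n) \to \infty$ as $n \to \infty$.

Proof. Let $t(n) = (2n L_0(p))^{d/2} \pi_{L_0(p)}(p_c)$. Analogously to (5.2) we have
$$ \begin{aligned} E_p\{W^{(1)}_{\Lambda_n,\mathrm{fin}}\} &\le t(n) + E_p\{W^{(1)}_{\Lambda_n,\mathrm{fin}};\ W^{(1)}_{\Lambda_n,\mathrm{fin}} \ge t(n)\} \\ &\le t(n) + \sum_{v \in \Lambda_n} \Pr_p\{|C_{\Lambda_n}(v)| = W^{(1)}_{\Lambda_n,\mathrm{fin}},\ |C_{\Lambda_n}(v)| \ge t(n),\ v \not\leftrightarrow \infty\} \\ &\le t(n) + |\Lambda_n| \Pr_p\{|C(0)| \ge t(n),\ 0 \not\leftrightarrow \infty\}. \end{aligned} \tag{5.52} $$
Using Markov's inequality and Postulate (V) we obtain
$$ \begin{aligned} E_p\{W^{(1)}_{\Lambda_n,\mathrm{fin}}\} &\le t(n) + \frac{(2n)^d}{t(n)}\, \chi^{\mathrm{fin}}(p) \\ &\le t(n) + D_4 \frac{(2n)^d}{t(n)}\, L_0^d(p)\, \pi^2_{L_0(p)}(p_c) \\ &= t(n)\, [1 + D_4]. \end{aligned} \tag{5.53} $$
Observing that $t(n)/(|\Lambda_n| P_\infty(p)) \asymp (L_0(p)/n)^{d/2}$ by Postulate (VI), we obtain (5.50) and hence (5.51). □

In order to estimate the size of the clusters $C^{(1)}_{\Lambda,\infty}, C^{(2)}_{\Lambda,\infty}, \dots, C^{(k)}_{\Lambda,\infty}$, we make extensive use of the facts that
$$ \sum_{i \ge 1} W^{(i)}_{\Lambda_n,\infty} = |\Lambda_n \cap C_\infty| = \sum_{v \in \Lambda_n} I[v \leftrightarrow \infty] \tag{5.54} $$
and
$$ E_{p_n}\{|\Lambda_n \cap C_\infty|\} = \sum_{v \in \Lambda_n} \Pr_{p_n}\{v \leftrightarrow \infty\} = |\Lambda_n| P_\infty(p_n). \tag{5.55} $$

Lemma 5.6. Suppose that Postulates (V) and (VI) hold. Let $p_n > p_c$ be a sequence of densities such that $n/L_0(p_n) \to \infty$ as $n \to \infty$. Then as $n \to \infty$,
$$ \frac{|\Lambda_n \cap C_\infty|}{|\Lambda_n| P_\infty(p_n)} \to 1 \quad \text{in probability.} \tag{5.56} $$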
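Proof. We bound the variance of $|\Lambda_n \cap C_\infty|$ by
$$ \operatorname{Var}_{p_n}\{|\Lambda_n \cap C_\infty|\} = \sum_{v, w \in \Lambda_n} \operatorname{Cov}_{p_n}(v \leftrightarrow \infty;\ w \leftrightarrow \infty) \le \sum_{v \in \Lambda_n} \sum_{w \in \mathbb{Z}^d} \operatorname{Cov}_{p_n}(v \leftrightarrow \infty;\ w \leftrightarrow \infty) = |\Lambda_n|\, \chi^{\mathrm{cov}}(p_n). \tag{5.57} $$
Note that we used here the positivity of $\operatorname{Cov}_{p_n}(v \leftrightarrow \infty;\ w \leftrightarrow \infty)$; this follows from the Harris–FKG inequality. Combined with (5.55) and Postulates (V) and (VI), we obtain that for a suitable constant $C_1 < \infty$,
$$ \frac{\operatorname{Var}_{p_n}\{|\Lambda_n \cap C_\infty|\}}{E^2_{p_n}\{|\Lambda_n \cap C_\infty|\}} \le C_1 \frac{L_0(p_n)^d}{|\Lambda_n|} = C_1 \Bigl(\frac{L_0(p_n)}{2n}\Bigr)^{d}. \tag{5.58} $$
By our assumption on $p_n$, the right-hand side goes to zero as $n \to \infty$. This implies (5.56). □

Proposition 5.7. Suppose that Postulates (II), (V) and (VI) hold. Let $p_n > p_c$ be a sequence of densities such that $n/L_0(p_n) \to \infty$ as $n \to \infty$. Then, as $n \to \infty$,
$$ \frac{W^{(1)}_{\Lambda_n,\infty}}{|\Lambda_n| P_\infty(p_n)} \to 1 \quad \text{in probability.} \tag{5.59} $$

Proof. We have to show that for all δ > 0
$$ \Pr_{p_n}\{W^{(1)}_{\Lambda_n,\infty} \ge (1 - \delta) |\Lambda_n| P_\infty(p_n)\} \to 1 \quad \text{as } n \to \infty \tag{5.60} $$
and
$$ \Pr_{p_n}\{W^{(1)}_{\Lambda_n,\infty} \le (1 + \delta) |\Lambda_n| P_\infty(p_n)\} \to 1 \quad \text{as } n \to \infty. \tag{5.61} $$
Since $W^{(1)}_{\Lambda_n,\infty} \le |\Lambda_n \cap C_\infty|$ by (5.54), the result (5.61) follows from (5.56). We are therefore left with proving (5.60). Again by (5.56), this amounts to showing that with high probability, the main contribution to the left-hand side of (5.54) comes from $W^{(1)}_{\Lambda_n,\infty}$. We consider suitable volumes $\Lambda_m \subset \Lambda_n$ with
$$ \lim_{n \to \infty} |\Lambda_m| / |\Lambda_n| > 1 - \delta. \tag{5.62} $$
Since
$$ \frac{|C_\infty \cap \Lambda_m|}{|\Lambda_m| P_\infty(p_n)} \to 1 \quad \text{in } \Pr_{p_n}\text{-probability} \tag{5.63} $$
as $n \to \infty$ (the proof is identical to the proof of Lemma 5.6), we conclude that
$$ \Pr_{p_n}\{|C_\infty \cap \Lambda_m| \ge (1 - \delta) |\Lambda_n| P_\infty(p_n)\} \to 1 \quad \text{as } n \to \infty. \tag{5.64} $$
We shall next show that for a suitable choice of $\Lambda_m$,
$$ \Pr_{p_n}\bigl\{\#\{i \mid C^{(i)}_{\Lambda_n,\infty} \cap \Lambda_m \ne \emptyset\} \ge 2\bigr\} \to 0 \quad \text{as } n \to \infty. \tag{5.65} $$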
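If $\#\{i \mid C^{(i)}_{\Lambda_n,\infty} \cap \Lambda_m \ne \emptyset\} = 1$, then all pieces of $C_\infty \cap \Lambda_m$ are connected in $\Lambda_n$ and $|C^{(1)}_{\Lambda_n,\infty}| \ge |C_\infty \cap \Lambda_m|$, so that (5.65) together with (5.64) will prove the desired result (5.60). In order to show that $m$ can be chosen so that (5.62) and (5.65) hold, we define, for $0 < \alpha < 1/6$ and $n \ge 1/\alpha$,
\[ x = \frac{2}{\alpha} - 3, \qquad L(n) = \lfloor \alpha n \rfloor, \qquad M(n) = \lfloor x L(n) \rfloor \qquad \text{and} \qquad m = \Big\lfloor \frac{M(n)+1}{2} \Big\rfloor. \tag{5.66} \]
Note that with this choice $m < n$ for all $n \ge 1/\alpha$, and
\[ \lim_{n\to\infty} \frac{|\Lambda_m|}{|\Lambda_n|} = \Big(1 - \frac{3\alpha}{2}\Big)^d. \tag{5.67} \]
A sufficiently small choice of $\alpha$ therefore ensures the condition (5.62). Note also that $\Lambda_m$ is isomorphic to $[0, M(n)]^d$, while $\Lambda_n$ is isomorphic to $[-\tilde L(n), M(n) + \tilde L(n)]^d$, where
\[ \tilde L(n) := n - m \ge L(n). \tag{5.68} \]
Using these observations and recalling the definition (2.23) of $S^{\mathrm{fin}}_{L,M}(p_n)$, we then bound
\[ P_{p_n}\big\{\#\{i \mid C^{(i)}_{\Lambda_n,\infty} \cap \Lambda_m \ne \emptyset\} \ge 2\big\} \le S^{\mathrm{fin}}_{\tilde L(n),M(n)}(p_n) \le S^{\mathrm{fin}}_{L(n),M(n)}(p_n), \tag{5.69} \]
where in the last step we have used that $S^{\mathrm{fin}}_{L,M}(p_n)$ is decreasing in $L$. In order to complete the proof, we use that for any $\tilde\varepsilon > 0$,
\[ L_0(p, \tilde\varepsilon; x) \asymp L_0(p) = L_0(p, \varepsilon; 1) \tag{5.70} \]
by Postulate (II). Our assumption $n/L_0(p_n) \to \infty$ therefore implies that $L_0(p_n, \tilde\varepsilon; x)/n$, and hence $L_0(p_n, \tilde\varepsilon; x)/L(n)$, goes to zero as $n \to \infty$. Since this is true for all $\tilde\varepsilon > 0$, we can use the definition (2.25) of $L_0(p_n, \tilde\varepsilon; x)$ to conclude that
\[ S^{\mathrm{fin}}_{L(n),M(n)}(p_n) = S^{\mathrm{fin}}_{L(n),xL(n)}(p_n) \to 0 \quad \text{as } n \to \infty. \tag{5.71} \]
Equations (5.71) and (5.69) imply (5.65), and hence the proposition. □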
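The choice of nested volumes in (5.66) can be checked numerically. The following is a minimal sketch, assuming that the integer parts in (5.66) are floors (the extracted text does not show the brackets); the function name `nested_box_sizes` is introduced here for illustration only.

```python
from fractions import Fraction
from math import floor


def nested_box_sizes(n, alpha):
    """Return (L, M, m) as in (5.66), with floors as integer parts (an assumption)."""
    x = 2 / alpha - 3          # x = 2/alpha - 3
    L = floor(alpha * n)       # L(n) = floor(alpha * n)
    M = floor(x * L)           # M(n) = floor(x * L(n))
    m = (M + 1) // 2           # m = floor((M(n) + 1) / 2)
    return L, M, m


if __name__ == "__main__":
    alpha = Fraction(1, 10)    # any 0 < alpha < 1/6; exact arithmetic avoids rounding
    for n in (10, 100, 10**4, 10**6):
        L, M, m = nested_box_sizes(n, alpha)
        # m/n should approach 1 - 3*alpha/2, and n - m should be at least L
        print(n, L, M, m, float(Fraction(m, n)), n - m >= L)
```

Under these assumptions the ratio m/n approaches 1 - 3α/2, so that |Λ_m|/|Λ_n| approaches (1 - 3α/2)^d as in (5.67), and the collar width n - m is at least L(n) as in (5.68).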
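Proof of Theorem 3.1 iii). For this proof we use Postulates (II), (V) and (VI). Let $p_n > p_c$ be such that $n/L_0(p_n) \to \infty$. We may then use (5.59) to conclude that
\[ \liminf_{n\to\infty} \frac{E_{p_n}\{W^{(1)}_{\Lambda_n,\infty}\}}{|\Lambda_n| P_\infty(p_n)} \ge 1. \tag{5.72} \]
Since
\[ E_{p_n}\{W^{(1)}_{\Lambda_n,\infty}\} \le \sum_{i \ge 1} E_{p_n}\{W^{(i)}_{\Lambda_n,\infty}\} = |\Lambda_n| P_\infty(p_n) \tag{5.73} \]
for all $n$, it follows that
\[ \lim_{n\to\infty} \frac{E_{p_n}\{W^{(1)}_{\Lambda_n,\infty}\}}{|\Lambda_n| P_\infty(p_n)} = 1. \tag{5.74} \]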
Combined with (5.51) and $W^{(1)}_{\Lambda_n,\infty} \le W^{(1)}_{\Lambda_n} \le W^{(1)}_{\Lambda_n,\infty} + W^{(1)}_{\Lambda_n,\mathrm{fin}}$, this proves (3.10). In order to prove (3.11), we note that (5.74) together with (5.54) and (5.72) imply that
\[ \frac{E_{p_n}\{W^{(2)}_{\Lambda_n,\infty}\}}{|\Lambda_n| P_\infty(p_n)} \le 1 - \frac{E_{p_n}\{W^{(1)}_{\Lambda_n,\infty}\}}{|\Lambda_n| P_\infty(p_n)} \to 0 \tag{5.75} \]
as $n \to \infty$. Combined with (5.51), this implies (3.11). □
Proof of Theorem 3.2. We again use Postulates (II), (V) and (VI). As before, by assumption, $p_n > p_c$ for all sufficiently large $n$, and $n/L_0(p_n) \to \infty$. Using Markov's inequality and Proposition 5.5, we therefore get
\[ \frac{W^{(1)}_{\Lambda_n,\mathrm{fin}}}{|\Lambda_n| P_\infty(p_n)} \to 0 \quad \text{in probability.} \tag{5.76} \]
Combined with Proposition 5.7, this implies Theorem 3.2. □
5.3. Below the scaling window. We start with a lemma which will play a similar role below the window to that played by the lower bound in Proposition 5.1 inside the window.

Lemma 5.8. Assume that (4.1) holds for some $\varepsilon > 0$ and that Postulates (III), (IV) and (VII) hold. Then there exist constants $0 < C_3 < \infty$ and $1 \le \sigma_6, \sigma_7, \sigma_8 < \infty$ such that
\[ E_p\big\{N_{\Lambda_n}\big(ks(L_0(p)),\, k\sigma_6 s(L_0(p))\big)\big\} \ge \frac{C_3}{k} \Big(\frac{n}{L_0(p)}\Big)^d e^{-D_6 k}, \tag{5.77} \]
provided $k \ge \sigma_7$, $n \ge \sigma_8 k L_0(p)$ and $p < p_c$. Here $D_6$ is the constant from Postulate (VII).

Proof. Let $C_1$ and $C_2$ be the constants from Lemma 4.4. Combining the bound (4.13) with Postulate (VII) and Proposition 4.6 we see that for suitable constants $C_4$, $C_5$, with $C_2 C_4 > D_6$, and $k$ sufficiently large, say $k \ge C_7$, one gets
\[ \begin{aligned} \Pr_p\{|C(0)| \ge ks(L_0(p)), \text{ but } \mathrm{diam}(C(0)) < C_4 k L_0(p)\} &\ge \Pr_p\{|C(0)| \ge ks(L_0(p))\} - \Pr_p\{\mathrm{diam}(C(0)) \ge C_4 k L_0(p)\} \\ &\ge C_5\, \pi_{L_0(p)}(p_c)\, e^{-D_6 k}. \end{aligned} \tag{5.78} \]
We want to restrict $|C(0)|$ further. For this we use Lemma 4.7, which tells us that
\[ \Pr_p\{|C(0)| \ge \sigma_6 k s(L_0(p))\} \le C_1\, \pi_{L_0(p)}(p_c)\, e^{-C_2 \sigma_6 k}. \]
Therefore, if we take $\sigma_6 > D_6/C_2$, then for sufficiently large $k$, say $k \ge \tilde C_7$,
\[ \Pr_p\{ks(L_0(p)) \le |C(0)| \le \sigma_6 k s(L_0(p)), \text{ but } \mathrm{diam}(C(0)) < C_4 k L_0(p)\} \ge \tfrac12 C_5\, \pi_{L_0(p)}(p_c)\, e^{-D_6 k}. \tag{5.79} \]
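Now let $n \ge 2C_4 k L_0(p)$, $\tilde n = n - \lfloor C_4 k L_0(p) \rfloor$, $\Lambda = \Lambda_n$ and $\tilde\Lambda = \Lambda_{\tilde n}$. Observe that if $v \in \tilde\Lambda$ and $\mathrm{diam}(C(v)) \le \lfloor C_4 k L_0(p) \rfloor$, then $C(v) \subset \Lambda$ and $C_\Lambda(v) = C(v)$. Using this observation we now find
\[ \begin{aligned} E_p\big\{N_{\Lambda}\big(ks(L_0(p)),\, k\sigma_6 s(L_0(p))\big)\big\} &\ge \sum_{s = ks(L_0(p))}^{\sigma_6 k s(L_0(p))} \sum_{v \in \tilde\Lambda} \frac{1}{s} \Pr_p\{|C_\Lambda(v)| = s, \text{ but } \mathrm{diam}(C(v)) < \lfloor C_4 k L_0(p) \rfloor\} \\ &\ge \frac{1}{\sigma_6 k s(L_0(p))} \sum_{v \in \tilde\Lambda} \Pr_p\{ks(L_0(p)) \le |C(0)| \le \sigma_6 k s(L_0(p)), \text{ but } \mathrm{diam}(C(0)) < \lfloor C_4 k L_0(p) \rfloor\} \\ &\ge C_3 \frac{(2n)^d}{k s(L_0(p))}\, \pi_{L_0(p)}(p_c)\, e^{-D_6 k} = C_3 \Big(\frac{n}{L_0(p)}\Big)^d k^{-1} e^{-D_6 k}. \end{aligned} \]
Choosing $\sigma_7 = \max\{C_7, \tilde C_7\}$ and $\sigma_8 = 2C_4$, this proves the lemma. □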
Proof of Theorem 3.1 ii) and Theorem 3.5. For the proof, we will need (4.1), and Postulates (III), (IV) and (VII). Assume that $p_n < p_c$ for sufficiently large $n$, and $n/L_0(p_n) \to \infty$. It follows from (4.8) that for $z \ge 0$ and $n$ large,
\[ \begin{aligned} E_{p_n}\{W^{(1)}_{\Lambda_n}\} &\le s(L_0(p_n)) \log\frac{n}{L_0(p_n)} \times \Big[ z + \int_z^\infty dy\, \Pr_{p_n}\Big\{W^{(1)}_{\Lambda_n} \ge y\, s(L_0(p_n)) \log\frac{n}{L_0(p_n)}\Big\} \Big] \\ &\le s(L_0(p_n)) \log\frac{n}{L_0(p_n)} \times \Big[ z + C_1 \Big(\frac{n}{L_0(p_n)}\Big)^{d - C_2 z} \int_z^\infty \exp\Big[-C_2 (y - z) \log\frac{n}{L_0(p_n)}\Big] dy \Big]. \end{aligned} \]
By choosing $C_2 z = d$ we see that
\[ E_{p_n}\{W^{(1)}_{\Lambda_n}\} \le C_3\, s(L_0(p_n)) \log\frac{n}{L_0(p_n)} \tag{5.80} \]
for a suitable constant $C_3 < \infty$. This proves the upper bound for $E_{p_n}\{W^{(1)}_{\Lambda_n}\}$, where we have so far only used Postulate (IV). The lower bound for $E_{p_n}\{W^{(1)}_{\Lambda_n}\}$ follows immediately from Theorem 3.5 so that it suffices to prove (3.16). Also, we only have to prove that
\[ \liminf_{n\to\infty} \Pr_{p_n}\Big\{ K^{-1} \le \frac{W^{(i)}_{\Lambda_n}}{s(L_0(p_n)) \log\frac{n}{L_0(p_n)}} \Big\} \to 1, \quad \text{as } K \to \infty, \tag{5.81} \]
since the other part of (3.16) is obvious from Markov's inequality and the upper bound (5.80). For brevity we write $p$ instead of $p_n$ for the remainder of this proof. The lower bound (5.77) will play a similar role to that played by Proposition 5.1. However, instead
of using an analogue of Proposition 5.2 for a second moment, we now appeal to the BK-inequality [BK85]. This tells us that
\[ \Pr_p\{\exists\, r \text{ disjoint clusters in } B_{\lceil \sigma_8 k L_0(p) \rceil} \text{ of size} \ge ks(L_0(p))\} \le \big[\Pr_p\{\exists \text{ at least one cluster in } B_{\lceil \sigma_8 k L_0(p) \rceil} \text{ of size} \ge ks(L_0(p))\}\big]^r. \]
Consequently if we set
\[ \kappa = \Pr_p\{\exists \text{ at least one cluster in } B_{\lceil \sigma_8 k L_0(p) \rceil} \text{ of size} \ge ks(L_0(p))\}, \tag{5.82} \]
then
\[ E_p\{\text{number of disjoint clusters in } B_{\lceil \sigma_8 k L_0(p) \rceil} \text{ of size} \ge ks(L_0(p))\} \le \frac{\kappa}{1-\kappa}. \]
By (5.77) the left-hand side here is at least $C_8 k^{d-1} \exp[-D_6 k]$, $C_8 = C_3 \sigma_8^d$, so that
\[ \kappa \ge \min\Big\{\frac12,\, \frac{C_8}{2} k^{d-1} e^{-D_6 k}\Big\}. \tag{5.83} \]
Now choose
\[ k = k(n) = C_9 \log\frac{n}{L_0(p)} \]
with the constant $C_9 > 0$ but so small that $D_6 C_9 < d/2$. Then we can find in $\Lambda_n$ approximately
\[ \Big(\frac{n}{2\sigma_8 k L_0(p)}\Big)^d \ge \Big(\frac{n}{L_0(p)}\Big)^{d/2} \]
disjoint boxes $B_{\lceil \sigma_8 k L_0(p) \rceil}(v_i)$. Each of these boxes contains a cluster of size
\[ \ge k(n)\, s(L_0(p)) \sim C_9\, s(L_0(p_n)) \log\frac{n}{L_0(p_n)} \tag{5.84} \]
with a probability at least
\[ \min\Big\{\frac12,\, \frac{C_8}{2} k^{d-1} e^{-D_6 k}\Big\}. \tag{5.85} \]
Moreover, as in (4.16) we also have
\[ \Pr_p\{\partial B_{\lceil \sigma_8 k L_0(p) \rceil}(v_i) \not\leftrightarrow \partial B_{3\lceil \sigma_8 k L_0(p) \rceil}(v_i)\} \ge \varepsilon^{2d}. \]
For large $n$ this gives
\[ \Pr_p\Big\{B_{\lceil \sigma_8 k L_0(p) \rceil}(v_i) \text{ contains a cluster of size} \ge \frac{C_9}{2} s(L_0(p_n)) \log\frac{n}{L_0(p_n)} \text{ and this cluster is not connected to } \partial B_{3\lceil \sigma_8 k L_0(p) \rceil}(v_i)\Big\} \ge \tfrac12 \varepsilon^{2d} C_8 k^{d-1} \exp[-D_6 k]. \tag{5.86} \]
Since the number of boxes times the right-hand side of (5.86) tends to infinity (by our choice of $k(n)$ or $C_9$), the probability that at least $i$ of these boxes contain a cluster of size (5.84), and that these clusters are not connected to each other, tends to 1. This establishes (3.16). □
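Two quantitative steps of this proof can be written out as follows. This is only a sketch: the symbol $N$ (the maximal number of disjoint clusters of size at least $ks(L_0(p))$ in the box) is introduced here for illustration, and $\kappa$, $C_8$, $C_9$, $D_6$ are as in the text.

```latex
% Step 1: from the BK bound to (5.83).
\begin{align*}
E_p\{N\} = \sum_{r \ge 1} \Pr_p\{N \ge r\} \le \sum_{r \ge 1} \kappa^r = \frac{\kappa}{1-\kappa}.
\end{align*}
% If kappa <= 1/2, then kappa/(1-kappa) <= 2 kappa, hence
\begin{align*}
C_8 k^{d-1} e^{-D_6 k} \le E_p\{N\} \le 2\kappa
\quad\Longrightarrow\quad
\kappa \ge \tfrac{C_8}{2} k^{d-1} e^{-D_6 k};
\end{align*}
% otherwise kappa > 1/2. Both cases are covered by the minimum in (5.83).

% Step 2: number of boxes times the probability in (5.86),
% with k = C_9 \log (n/L_0(p)), so that e^{-D_6 k} = (n/L_0(p))^{-D_6 C_9}.
\begin{align*}
\Big(\frac{n}{L_0(p)}\Big)^{d/2} \cdot \tfrac12 \varepsilon^{2d} C_8 k^{d-1} e^{-D_6 k}
= \tfrac12 \varepsilon^{2d} C_8\, k^{d-1} \Big(\frac{n}{L_0(p)}\Big)^{d/2 - D_6 C_9}
\longrightarrow \infty ,
\end{align*}
% since D_6 C_9 < d/2 and n/L_0(p) -> infinity.
```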
6. Verification of the Postulates in Two Dimensions

In this section we prove Theorem 3.6, which states that the Scaling Postulates (I)–(VII) hold for $d = 2$. Before we start on the proof we discuss some general tools. The fundamental tool for two-dimensional bond percolation is duality.³ This rests on the following observations. Let $Z^*$ denote the lattice $Z^2 + (\tfrac12, \tfrac12)$, which is called the dual lattice of $Z^2$. Each dual edge $e^*$ bisects exactly one edge $e$ of the original lattice and vice versa. We call such a pair $e^*$ and $e$ associated. For each configuration $\omega$ of occupied and vacant edges of $Z^2$ we obtain a configuration on $Z^*$ by declaring a dual edge $e^*$ occupied (respectively, vacant) if its associated edge is occupied (respectively, vacant). It is a well-known result that there exists an occupied horizontal crossing of the rectangle $[0, L] \times [0, M]$ if and only if there does not exist a vertical vacant dual crossing of $[\tfrac12, L - \tfrac12] \times [-\tfrac12, M + \tfrac12]$ (see [SmW78], Sect. 2.1 and [Kes82], Sects. 2.6, 2.4). This translates into
\[ R_{L,M}(p) = 1 - R_{M+1,L-1}(1 - p). \tag{6.1} \]
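The crossing probability $R_{L,M}(p)$ can be explored numerically on small rectangles. The sketch below is illustrative only: it assumes that $R_{L,M}(p)$ is the probability of an occupied horizontal crossing of $[0, L] \times [0, M]$ using bonds inside the rectangle, and all function names are hypothetical. For $p = 1/2$ and $M = L - 1$, relation (6.1) reads $R_{L,L-1}(1/2) = 1 - R_{L,L-1}(1/2)$, so exactly half of all bond configurations should contain a crossing; the exhaustive count checks this for $L = 2$.

```python
import itertools
import random


def rectangle_bonds(L, M):
    """Nearest-neighbour bonds of Z^2 with both endpoints in [0, L] x [0, M]."""
    bonds = []
    for x in range(L + 1):
        for y in range(M + 1):
            if x < L:
                bonds.append(((x, y), (x + 1, y)))
            if y < M:
                bonds.append(((x, y), (x, y + 1)))
    return bonds


def has_horizontal_crossing(L, M, occupied):
    """True if occupied bonds connect the side x = 0 to the side x = L."""
    neighbours = {}
    for a, b in occupied:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    stack = [(0, y) for y in range(M + 1)]   # start from every left-side vertex
    seen = set(stack)
    while stack:
        v = stack.pop()
        if v[0] == L:
            return True
        for w in neighbours.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return False


def count_crossing_configs(L, M):
    """Exhaustively count bond configurations containing a horizontal crossing."""
    bonds = rectangle_bonds(L, M)
    hits = 0
    for bits in itertools.product((False, True), repeat=len(bonds)):
        occupied = [b for b, on in zip(bonds, bits) if on]
        hits += has_horizontal_crossing(L, M, occupied)
    return hits, 2 ** len(bonds)


def estimate_R(L, M, p, trials=2000, seed=0):
    """Monte Carlo estimate of the crossing probability R_{L,M}(p)."""
    rng = random.Random(seed)
    bonds = rectangle_bonds(L, M)
    hits = 0
    for _ in range(trials):
        occupied = [b for b in bonds if rng.random() < p]
        hits += has_horizontal_crossing(L, M, occupied)
    return hits / trials


if __name__ == "__main__":
    hits, total = count_crossing_configs(2, 1)
    print(hits, total)                 # exactly half of the configurations cross
    print(estimate_R(8, 7, 0.5))       # Monte Carlo estimate, should be near 1/2
```

The Monte Carlo estimator can also be used to scan $L$ at fixed $p$ and locate the first $L$ with $R_{L,L}(p) \le \varepsilon$, which is the quantity entering the finite-size scaling length in two dimensions.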
This relation can be used to relate quantities in the subcritical regime to similar quantities in the supercritical regime. For instance, define the two-dimensional finite-size scaling length as
\[ \tilde L_0(p, \varepsilon) = \begin{cases} \min\{L \mid R_{L,L}(p) \le \varepsilon\} & \text{if } p < p_c, \\ \min\{L \mid R_{L,L}(p) \ge 1 - \varepsilon\} & \text{if } p > p_c. \end{cases} \tag{6.2} \]
(Note that this is in the spirit of definition (1.21) of [Kes87]. However, [Kes87] treats bond percolation on $Z^2$ as site percolation on the covering graph of $Z^2$, so that the formal definition there is somewhat different. For the purposes of the proofs here this difference in the definitions is without significance.) It follows easily from duality and monotonicity of $R_{L,M}$ in $L$ and $M$ that for bond percolation on $Z^2$, $\tilde L_0(p, \varepsilon) \ge \tilde L_0(1 - p, \varepsilon)$ for $p < p_c$. From the rescaling lemma (Lemmas 3.4 and 4.12 in [ACCFR83]) and duality one obtains that for sufficiently small $\varepsilon > 0$ also $2\tilde L_0(1 - p, \varepsilon) - 1 \ge \tilde L_0(p, \varepsilon)$ for $p < p_c$. We therefore have that
\[ \tilde L_0(p, \varepsilon) \asymp \tilde L_0(1 - p, \varepsilon), \qquad p < p_c = \tfrac12. \tag{6.3} \]
Similarly, using the rescaling lemma and the Russo–Seymour–Welsh lemma [Rus78, SW78, Sect. 3.4] it is straightforward to show that in $d = 2$, the definition (6.2) is equivalent to our finite-size scaling correlation length below threshold, see (2.20):
\[ \tilde L_0(p) \asymp L_0(p) \quad \text{for } p < p_c, \tag{6.4} \]
and to our finite-size scaling inverse surface tension above threshold, see (2.26):
\[ \tilde L_0(p) \asymp A_0(p) \quad \text{for } p > p_c. \tag{6.5} \]
As usual, the constants implicit in the equivalences (6.3)–(6.5) depend on $\varepsilon$.

³ Here we can use duality since we are dealing with bond percolation, which is self-dual. However, with a good deal more work, similar results can be proven for other two-dimensional models which are not self-dual; see [Kes87] (Eq. (1.23) and Sect. 4).
It also follows from the Russo-Seymour-Welsh lemma that for each $x > 0$ and integer $k \ge 1$ there exists a constant $h(x, k, \varepsilon) > 0$ such that for $p \le p_c$, $L \le k \tilde L_0(p)$, $M/L \ge x$, it holds that
\[ R_{L,M}(p) \ge h(x, k, \varepsilon). \tag{6.6} \]
Thus, sponge crossing probabilities of rectangles with the ratio of the sides bounded away from 0 and $\infty$ and with a size comparable to $\tilde L_0(p)$ are bounded away from 0. By means of the Harris-FKG inequality it is then also easy to see that the probability of an occupied circuit surrounding the origin in the annulus $A = [-M, M]^2 \setminus (-L, L)^2$ is bounded away from 0, provided $L \le k L_0(p)$, $M/L \ge 1 + x > 1$. Indeed, by obvious monotonicity we may assume that $M \le 2L$. The annulus $A$ is the union of four $(M - L) \times M$ rectangles, and if each of these has an occupied crossing in the long direction (i.e., a crossing in the direction of the side of length $M$), then $A$ contains a circuit of the desired kind (compare [SmW78], Lemma 3.5). By the above, each of these crossings has a probability of $R_{M,M-L}(p) \ge h(x/(1 + x) \wedge 1/2, 2k, \varepsilon)$, and by the Harris-FKG inequality the desired occupied circuit exists with a probability at least $h^4(x/(1 + x) \wedge 1/2, 2k, \varepsilon)$.

Now consider two adjacent rectangles $[0, L] \times [0, M]$ and $[L, 2L] \times [0, M]$, and assume that each of these contains an occupied horizontal crossing, $r_1$ and $r_2$, say. If, in addition, there exist occupied vertical crossings of $[0, L] \times [-L, M + L]$ and $[L, 2L] \times [-L, M + L]$ as well as occupied horizontal crossings of $[0, 2L] \times [-L, 0]$ and $[0, 2L] \times [M, M + L]$, then these four crossings contain a circuit which necessarily intersects $r_1$ and $r_2$ and therefore connects $r_1$ and $r_2$ (see Fig. 1). Therefore, another application of the Harris-FKG inequality shows that
\[ \begin{aligned} &\Pr_p\{\text{all horizontal crossings of } [0, L] \times [0, M] \text{ and of } [L, 2L] \times [0, M] \text{ are connected} \mid \text{there exists at least one horizontal crossing in each of } [0, L] \times [0, M] \text{ and } [L, 2L] \times [0, M]\} \\ &\qquad \ge h^2\Big(\frac{L}{M + 2L}, \frac{M + 2L}{L_0(p)}, \varepsilon\Big)\, h^2\Big(\frac12, \frac{2L}{L_0(p)}, \varepsilon\Big). \end{aligned} \tag{6.7} \]
If $M/L$ and $L/L_0(p)$ are bounded, then the right-hand side of (6.7) is bounded away from zero. By minor variations of this argument one sees that there is a lower bound for the probability that two occupied crossings $r_1$ and $r_2$ over length $L$ which are within distance of order $L$ from each other are connected (by a circuit of diameter also of order $L$), provided $L/L_0(p)$ is bounded. We shall say in such a situation that $r_1$ and $r_2$ can be connected by a Harris ring. We now prove the postulates for $d = 2$ in several subsections. These proofs rely to a large extent on the results and methods of [Kes86] and [Kes87].

6.1. Proof of Postulates (I) and (II). Postulate (II) is the relation
\[ A_0(p, \varepsilon) \asymp L_0(p, \varepsilon; 1) \asymp L_0(p, \tilde\varepsilon; x) \tag{6.8} \]
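for all $p > p_c$, $x \ge 1$ and $\varepsilon, \tilde\varepsilon \in (0, \varepsilon_0)$. Once we prove this, Postulate (I) follows e.g. from the equivalence in Eq. (6.5) and Postulate (II):
\[ L_0(p) \asymp A_0(p) \asymp \tilde L_0(p), \tag{6.9} \]
Eq. (6.3) and the known behavior (2.27). Hence it suffices to establish Postulate (II).

Fig. 1. Harris ring construction for the proof of (6.7)

We claim that in order to prove (6.8), it suffices to show that for all $x \ge 1$ and $\tilde\varepsilon \in (0, \varepsilon_0/2)$, there exists an $\varepsilon \in (0, \varepsilon_0)$ and a $\lambda = \lambda(\varepsilon, \tilde\varepsilon, x)$ such that
\[ \tilde L_0(p, 2\tilde\varepsilon) \le L_0(p, \tilde\varepsilon; x) \le \lambda \tilde L_0(p, \varepsilon) + 1 \quad \text{as } p \downarrow p_c. \tag{6.10} \]
Indeed, given (6.10), we can deduce (6.8) for $\varepsilon, \tilde\varepsilon < \varepsilon_0/2$ from (6.5) and the known equivalence of $\tilde L_0(p, \varepsilon)$ at different values of $\varepsilon$, i.e.,
\[ \tilde L_0(p, \varepsilon_1) \asymp \tilde L_0(p, \varepsilon_2) \quad \text{as } p \downarrow p_c \text{ for } 0 < \varepsilon_1, \varepsilon_2 < \varepsilon_0, \tag{6.11} \]
which follows from the rescaling lemma. Finally we must replace $\varepsilon_0$ by $\varepsilon_0/2$ to obtain Postulate (II). We establish (6.10) via an upper and a lower bound. For the upper bound, we note that for all $L$, $M$,
\[ S^{\mathrm{fin}}_{L,M}(p) \le 1 - P_p(\exists \text{ an occupied circuit in } H_{L,M} \text{ surrounding } \partial_I H_{L,M}). \tag{6.12} \]
Given $\varepsilon, \tilde\varepsilon \in (0, \varepsilon_0)$ and $x \ge 1$, it is not hard to show, by means of the rescaling lemma (compare the argument for (6.7)), that there exists a $\lambda = \lambda(\varepsilon, \tilde\varepsilon; x)$ such that if $M = xL$ and $L \ge \lambda \tilde L_0(p, \varepsilon)$, then the probability of the circuit described in (6.12) is strictly bounded below by $1 - \tilde\varepsilon$ for $p > p_c$. Hence $S^{\mathrm{fin}}_{L,xL}(p) < \tilde\varepsilon$ for all $L \ge \lambda \tilde L_0(p, \varepsilon)$. But it follows from the definition (2.25) that $S^{\mathrm{fin}}_{L,xL}(p) \ge \tilde\varepsilon$ if $L = L_0(p, \tilde\varepsilon; x) - 1$. Thus
\[ L_0(p, \tilde\varepsilon; x) \le \lambda \tilde L_0(p, \varepsilon) + 1. \tag{6.13} \]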
Next we establish a lower bound of the same form. To this end, note that the annulus $H_{L,xL}$ consists of four non-overlapping $L \times xL$ rectangles and four $L \times L$ corners. Let us call the rectangles the left, right, upper and lower rectangles. Clearly, for all $L$,
\[ S^{\mathrm{fin}}_{L,xL}(p) \ge P_p(\exists \text{ an occupied left-right crossing in the left rectangle and a vacant dual left-right crossing in the right rectangle, each connecting } \partial_I H_{L,xL} \text{ to } \partial_E H_{L,xL}). \tag{6.14} \]
Since $x \ge 1$, the lower bound in (6.14) is only strengthened by requiring that the occupied crossing occur in an $L \times L$ sub-box of the corresponding $L \times xL$ rectangle and that the vacant dual crossing occur in an $(L + 1) \times (L - 1)$ rectangle. By (6.1) this gives
\[ S^{\mathrm{fin}}_{L,xL}(p) \ge R_{L,L}(1 - R_{L,L}) \ge \tfrac12 (1 - R_{L,L}), \tag{6.15} \]
since $p > p_c$. Now let $L = L_0(p, \tilde\varepsilon; x)$. It then follows from the definition (2.25) that $S^{\mathrm{fin}}_{L,xL}(p) \le \tilde\varepsilon$, so that (6.15) implies $R_{L,L} \ge 1 - (2\tilde\varepsilon)$ if $L = L_0(p, \tilde\varepsilon; x)$. Comparing this with the definition (6.2) for $p > p_c$, we conclude
\[ L_0(p, \tilde\varepsilon; x) \ge \tilde L_0(p, 2\tilde\varepsilon), \tag{6.16} \]
a lower bound of the desired form. □
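The chain of inequalities behind (6.15) and (6.16) can be made explicit. The following is a sketch under the assumptions that $R_{L,M}$ is decreasing in $L$ and increasing in $M$, that $p_c = 1/2$, and that $\tilde\varepsilon$ denotes the tolerance appearing in $L_0(p, \tilde\varepsilon; x)$ and $\tilde L_0$ the two-dimensional finite-size scaling length of (6.2).

```latex
% (a) R_{L,L}(p) >= 1/2 for p > p_c = 1/2: by duality (6.1) and monotonicity,
\begin{align*}
R_{L,L}(\tfrac12) = 1 - R_{L+1,L-1}(\tfrac12) \ge 1 - R_{L,L}(\tfrac12)
\quad\Longrightarrow\quad
R_{L,L}(p) \ge R_{L,L}(\tfrac12) \ge \tfrac12 .
\end{align*}
% (b) With L = L_0(p, \tilde\varepsilon; x) one has S^{fin}_{L,xL}(p) <= \tilde\varepsilon, so (6.15) gives
\begin{align*}
\tfrac12 \big(1 - R_{L,L}(p)\big) \le S^{\mathrm{fin}}_{L,xL}(p) \le \tilde\varepsilon
\quad\Longrightarrow\quad
R_{L,L}(p) \ge 1 - 2\tilde\varepsilon .
\end{align*}
% (c) By (6.2) for p > p_c, \tilde L_0(p, 2\tilde\varepsilon) is the smallest L with
%     R_{L,L}(p) >= 1 - 2\tilde\varepsilon, hence
\begin{align*}
L_0(p, \tilde\varepsilon; x) \ge \tilde L_0(p, 2\tilde\varepsilon).
\end{align*}
```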
6.2. Proof of Postulate (III). Postulate (III) is almost identical to Theorem 1 of [Kes87], except that the latter uses the condition $n \le \tilde L_0(p, \varepsilon)$, whereas Postulate (III) assumes $n \le L_0(p, \varepsilon)$. Thus, to establish Postulate (III), it suffices to show that for all $\varepsilon \in (0, \varepsilon_0)$, there exists an $\tilde\varepsilon \in (0, \varepsilon_0)$ such that
\[ L_0(p, \varepsilon) \le \tilde L_0(p, \tilde\varepsilon). \tag{6.17} \]
To prove (6.17) we note that by (6.9) we have $L_0(p, \varepsilon) \le \lambda(\varepsilon) \tilde L_0(p, \varepsilon)$ for $p > p_c$ and a suitable $\lambda < \infty$. This relation also holds for $p < p_c$, as observed in (6.4). Therefore, it suffices to show that for all $p \ne p_c$, $\lambda < \infty$, and $\varepsilon \in (0, \varepsilon_0)$, there exists an $\tilde\varepsilon \in (0, \varepsilon_0)$ such that
\[ \lambda \tilde L_0(p, \varepsilon) \le \tilde L_0(p, \tilde\varepsilon). \tag{6.18} \]
Finally, by (6.3), it suffices to establish (6.18) only for $p < p_c$, and by iteration, to establish the latter only for $\lambda = 2$. To this end, we note that by the Russo-Seymour-Welsh lemma ([Rus78, SW78, Sect. 3.4]), rescaling and the obvious monotonicity of $R_{L,M}$, we have
\[ R_{M,M}(p) \ge f(R_{L,L}(p)) \quad \text{if } L \le M \le 3L, \tag{6.19} \]
for some function $f$ on $[0, 1]$ which is strictly positive on $(0, 1]$. Without loss of generality we may take $f(\varepsilon) \le \varepsilon$. Using the definition (6.2) of $\tilde L_0(p, \varepsilon)$, we conclude that
\[ R_{M,M}(p) > f(\varepsilon) \quad \text{if } M \le 3\tilde L_0(p, \varepsilon) - 3. \tag{6.20} \]
As a consequence,
\[ \tilde L_0(p, f(\varepsilon)) \ge 3\tilde L_0(p, \varepsilon) - 2 \ge 2\tilde L_0(p, \varepsilon), \tag{6.21} \]
where we have used that $R_{1,1}(p) \ge p > \varepsilon$, and hence $\tilde L_0(p, \varepsilon) > 1$ in the last step. This establishes (6.18) and hence Postulate (III). □
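The reduction "by iteration" to the case $\lambda = 2$ can be spelled out as follows. This is a sketch only; $f^{\circ j}$ denotes the $j$-fold composition of the function $f$ from (6.19), a notation introduced here, and $\tilde L_0$ is the two-dimensional finite-size scaling length of (6.2).

```latex
% Apply (6.21) successively with \varepsilon replaced by f(\varepsilon), f(f(\varepsilon)), ...
% Since 0 < f(\varepsilon) <= \varepsilon, each iterate stays in (0, \varepsilon_0)
% and the condition p > \varepsilon used for (6.21) is preserved.
\begin{align*}
\tilde L_0\big(p, f^{\circ j}(\varepsilon)\big)
\ge 2\, \tilde L_0\big(p, f^{\circ (j-1)}(\varepsilon)\big)
\ge \dots
\ge 2^{j}\, \tilde L_0(p, \varepsilon).
\end{align*}
% Given \lambda < \infty, choose j with 2^j >= \lambda and set
% \tilde\varepsilon = f^{\circ j}(\varepsilon); then
\begin{align*}
\lambda\, \tilde L_0(p, \varepsilon) \le 2^{j}\, \tilde L_0(p, \varepsilon) \le \tilde L_0(p, \tilde\varepsilon),
\end{align*}
% which is (6.18).
```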
6.3. Proof of Postulate (IV). We will establish Postulate (IV) for all $p$ such that $m \le L_0(p)$ (a somewhat stronger result than the stated postulate at $p_c$). This postulate with $\rho_1 = 2$ follows from the claim that for some $C_1 > 0$,
\[ \pi_m(p) \ge C_1 \Big(\frac{m}{n}\Big)^{-1/2} \pi_n(p) \quad \text{if } n \le m \le L_0(p). \tag{6.22} \]
In order to establish (6.22), we assume that $kn \le m \le (k + 1)n$ for some integer $k \ge 1$. By (2.12) and monotonicity of $\pi_n$,
\[ \pi_m \ge \pi_{(k+1)n} \ge \tilde\pi_{(k+1)n}. \tag{6.23} \]
Recall the definition (2.9) of $\tilde\pi_{(k+1)n}$ and observe that one mechanism to ensure that the origin is connected to the line at $x_1 = (k + 1)n$ is to have (1) the origin connected to some point in $\partial B_n(0)$, (2) some point on $\partial B_n(0)$ connected to the line at $x_1 = (k + 1)n$, and (3) Harris rings in the annuli $B_n \setminus B_{n/2}$ and $B_{2n} \setminus B_n$ and a rectangle crossing from (say) the right boundary of $B_{n/2}$ to the central quarter of the right boundary of $B_{2n}$ to "glue" the connections in (1) and (2) together. Since $n \le L_0(p)$, the probability of the third event is bounded away from zero, uniformly in $n$ (as in (6.7)). Denote the probability of the event described in (2) above by $G_{n,kn}$. Equation (6.23) and the Harris-FKG inequality then imply that for some constant $C_2 > 0$,
\[ \pi_m \ge C_2\, \pi_n\, G_{n,kn}. \tag{6.24} \]
By an argument almost identical to the proof of Corollary (3.15) in [BK85], $kn \le L_0(p)$ implies $G_{n,kn} \ge C_3/\sqrt{k}$, where $C_3$ is a lower bound on the probability of an occupied crossing of a $2kn \times 2kn$ square. The constant $C_3 > 0$ by virtue of (6.17) and (6.20). (Essentially this same argument is used in [Kes87], Eq. (3.6) and its proof on p. 143.) Thus (6.24) implies the desired bound (6.22). □
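The last step, from (6.24) to (6.22), is a one-line computation. The sketch below only combines the bounds stated in the proof; the identification $C_1 = C_2 C_3$ is one admissible choice and is not claimed to be the constant used by the authors.

```latex
% For kn <= m <= (k+1)n with m <= L_0(p) we have kn <= L_0(p) and k <= m/n, so
\begin{align*}
\pi_m(p) \;\ge\; C_2\, \pi_n(p)\, G_{n,kn}
\;\ge\; C_2 C_3\, k^{-1/2}\, \pi_n(p)
\;\ge\; C_2 C_3 \Big(\frac{m}{n}\Big)^{-1/2} \pi_n(p).
\end{align*}
% This is (6.22) with C_1 = C_2 C_3; the exponent 1/2 corresponds to \rho_1 = 2.
```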
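6.4. Proof of Postulate (V). Theorem 3 of [Kes87] gives the second inequality in Postulate (V). Thus it suffices to prove that for a suitable constant $D_4$ and all $p > p_c$,
\[ \chi^{\mathrm{cov}}(p) \le D_4\, L_0^2(p)\, \pi^2_{L_0(p)}(p_c). \tag{6.25} \]
To this end, we decompose the sum defining $\chi^{\mathrm{cov}}(p)$ (with $|v|$ short for $|v|_\infty$ and $L_0$ for $L_0(p)$):
\[ \chi^{\mathrm{cov}}(p) = \sum_{|v| \le 2L_0} \mathrm{Cov}_p(0 \leftrightarrow \infty;\, v \leftrightarrow \infty) + \sum_{|v| > 2L_0} \mathrm{Cov}_p(0 \leftrightarrow \infty;\, v \leftrightarrow \infty). \tag{6.26} \]
To control the first term, we use the bound (4.4) in Lemma 4.1 and Postulate (III) to estimate
\[ \sum_{|v| \le 2L_0} \mathrm{Cov}_p(0 \leftrightarrow \infty;\, v \leftrightarrow \infty) \le \sum_{|v| \le 2L_0} P_p\{0 \leftrightarrow \infty,\, v \leftrightarrow \infty\} \le \sum_{|v| \le 2L_0} \tau(0, v) \le \sum_{|v| \le 2L_0} \pi^2_{[|v|/2]}(p) \asymp L_0^2\, \pi^2_{L_0}(p_c). \tag{6.27} \]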
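Next, we bound the second term in (6.26). To this end, let $B(w) = B_{L_0}(w)$ be the box of radius $L_0$ centered at $w$. For $|v| > 2L_0$, we have
\[ \begin{aligned} \mathrm{Cov}_p(0 \leftrightarrow \infty;\, v \leftrightarrow \infty) &= \mathrm{Cov}_p(0 \not\leftrightarrow \infty;\, v \not\leftrightarrow \infty) \\ &= \mathrm{Cov}_p(0 \not\leftrightarrow \infty, 0 \leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(v)) + \mathrm{Cov}_p(0 \not\leftrightarrow \infty, 0 \leftrightarrow \partial B(0);\, v \not\leftrightarrow \partial B(v)) \\ &\quad + \mathrm{Cov}_p(0 \not\leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(v)) + \mathrm{Cov}_p(0 \not\leftrightarrow \partial B(0);\, v \not\leftrightarrow \partial B(v)) \\ &= \mathrm{Cov}_p(0 \not\leftrightarrow \infty, 0 \leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(v)) + 2\,\mathrm{Cov}_p(0 \not\leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(v)), \end{aligned} \tag{6.28} \]
where in the last step we have used that $\mathrm{Cov}_p(0 \not\leftrightarrow \partial B(0);\, v \not\leftrightarrow \partial B(v)) = 0$ by the independence of the events $\{0 \not\leftrightarrow \partial B(0)\}$ and $\{v \not\leftrightarrow \partial B(v)\}$ when $B(0)$ and $B(v)$ are disjoint, and also the symmetry of the roles played by $0$ and $v$. Now we bound the second term on the right-hand side of (6.28) according to
\[ \begin{aligned} \mathrm{Cov}_p(0 \not\leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(v)) &= \mathrm{Cov}_p(0 \not\leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(v), v \leftrightarrow \partial B(0)) \\ &\quad + \mathrm{Cov}_p(0 \not\leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(v), v \not\leftrightarrow \partial B(0)) \\ &= \mathrm{Cov}_p(0 \not\leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(0)) \\ &= -\,\mathrm{Cov}_p(0 \leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(0)) \\ &\le P_p\{0 \leftrightarrow \partial B(0)\}\, P_p\{v \not\leftrightarrow \infty, v \leftrightarrow \partial B(0)\}, \end{aligned} \tag{6.29} \]
where we have used that the two events $\{v \not\leftrightarrow \infty\} \cap \{v \leftrightarrow \partial B(v)\} \cap \{v \not\leftrightarrow \partial B(0)\}$ and $\{0 \not\leftrightarrow \partial B(0)\}$ are independent. Using the Harris-FKG inequality and obvious monotonicities, the second factor on the right-hand side of (6.29) is in turn bounded according to
\[ \Pr_p\{v \not\leftrightarrow \infty, v \leftrightarrow \partial B(0)\} \le \Pr_p\{v \leftrightarrow \partial B(0)\}\, \Pr_p\{\exists\, w \in \partial B(0) \text{ such that } w \text{ and } v \text{ are surrounded by a vacant dual contour}\}. \tag{6.30} \]
We now follow a coarse-graining argument along the lines of the proof of Theorem 3 in [Kes87] (see (3.12), (3.13) and (2.25) there). Let $v = (v_1, v_2)$ and for the sake of argument let $v_1 = |v| = |v|_\infty$. If there exists a vacant dual contour surrounding $w \in \partial B(0)$ and $v$, then there exists a vacant dual path from $B(0)$ to some $B(v_1 + j, v_2)$ with $j \ge 0$. By (2.25) in [Kes87] the probability that such a vacant path exists is at most $C_1 \exp[-C_2 |v|/L_0]$. Together with (6.29) and Postulate (III) this leads to a bound of
\[ C_3\, \pi^2_{L_0}(p_c) \exp[-C_2 |v|/L_0] \tag{6.31} \]
for the second term in the right-hand side of (6.28).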
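Next we bound the first term in the right-hand side of (6.28) by means of the BK inequality as follows:
\[ \begin{aligned} &\Pr_p\{0 \not\leftrightarrow \infty, 0 \leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(v)\} \\ &\quad \le \Pr_p\{0 \leftrightarrow \partial B(0), v \leftrightarrow \partial B(v) \text{ and there exist vacant dual contours } C_1, C_2 \text{ surrounding } 0 \text{ and } v, \text{ respectively}\} \\ &\quad \le \Pr_p\{0 \leftrightarrow \partial B(0), v \leftrightarrow \partial B(v) \text{ and there exist edge-disjoint vacant dual contours } C_1, C_2 \text{ surrounding } 0 \text{ and } v, \text{ respectively}\} \\ &\qquad + \Pr_p\{0 \leftrightarrow \partial B(0), v \leftrightarrow \partial B(v) \text{ and there exist vacant dual contours } C_1, C_2 \text{ surrounding } 0 \text{ and } v, \text{ respectively, and } C_1 \text{ and } C_2 \text{ have an edge in common}\}. \end{aligned} \]
By the BK inequality the first term in the right-hand side is no more than
\[ \begin{aligned} &\Pr_p\{0 \leftrightarrow \partial B(0) \text{ and there exists a vacant dual contour } C_1 \text{ which surrounds } 0\} \times \Pr_p\{v \leftrightarrow \partial B(v) \text{ and there exists a vacant dual contour } C_2 \text{ which surrounds } v\} \\ &\quad = \Pr_p\{0 \not\leftrightarrow \infty, 0 \leftrightarrow \partial B(0)\}\, \Pr_p\{v \not\leftrightarrow \infty, v \leftrightarrow \partial B(v)\}. \end{aligned} \]
Therefore,
\[ \begin{aligned} &\mathrm{Cov}_p(0 \not\leftrightarrow \infty, 0 \leftrightarrow \partial B(0);\, v \not\leftrightarrow \infty, v \leftrightarrow \partial B(v)) \\ &\quad \le \Pr_p\{0 \leftrightarrow \partial B(0), v \leftrightarrow \partial B(v) \text{ and there exist vacant dual contours } C_1, C_2 \text{ surrounding } 0 \text{ and } v, \text{ respectively, and } C_1 \text{ and } C_2 \text{ have an edge in common}\} \\ &\quad \le \Pr_p\{0 \leftrightarrow \partial B(0)\}\, \Pr_p\{v \leftrightarrow \partial B(v)\}\, \Pr_p\{\exists \text{ vacant dual contours } C_1, C_2 \text{ surrounding } 0 \text{ and } v \text{ respectively, and } C_1 \text{ and } C_2 \text{ have an edge in common}\}, \end{aligned} \tag{6.32} \]
where we have used the Harris-FKG inequality and disjointness of $B(0)$ and $B(v)$ in the last step. If the two dual contours $C_1$, $C_2$ in (6.32) have an edge in common, and if again $v = (v_1, v_2)$ with $v_1 = |v|$, then $C_1 \cup C_2$ contains a vacant dual path from some $B(-j_1, 0)$ to some $B(v_1 + j_2, v_2)$ with $j_1, j_2 \ge 0$. The same argument as used for (6.31) now shows that also the first term in the right-hand side of (6.28) is bounded by (6.31). Finally, then
\[ \sum_{|v| > 2L_0} \mathrm{Cov}_p(0 \leftrightarrow \infty;\, v \leftrightarrow \infty) \le \sum_{|v| > 2L_0} 2C_3\, \pi^2_{L_0}(p_c) \exp[-C_2 |v|/L_0] \le C(\varepsilon)\, L_0^2\, \pi^2_{L_0}(p_c). \tag{6.33} \]
Together with (6.26), (6.27) this yields (6.25). □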
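The last inequality in (6.33) rests on the fact that the lattice sum of $\exp[-C_2|v|/L_0]$ over $|v|_\infty > 2L_0$ in $Z^2$ is of order $L_0^2$. The sketch below checks this numerically; the function names and the closed-form bound $8(1 + L_0/c)^2$ are introduced here for illustration (the bound follows from $\sum_{r \ge 1} r q^r = q/(1-q)^2$ and $1 - e^{-t} \ge t/(1+t)$), and are not taken from the paper.

```python
import math


def shell_tail_sum(L0, c, cutoff_factor=200):
    """Truncated sum of exp(-c*|v|/L0) over v in Z^2 with |v|_inf > 2*L0.

    There are exactly 8*r lattice points with sup-norm r >= 1, so the sum
    is organised by shells. The truncation only makes the sum smaller.
    """
    r_max = 2 * L0 + int(cutoff_factor * L0 / c)
    return sum(8 * r * math.exp(-c * r / L0) for r in range(2 * L0 + 1, r_max + 1))


def closed_form_bound(L0, c):
    """Upper bound 8*(1 + L0/c)**2 for the full sum over all shells r >= 1."""
    return 8.0 * (1.0 + L0 / c) ** 2


if __name__ == "__main__":
    for L0 in (1, 5, 50, 500):
        s = shell_tail_sum(L0, c=1.0)
        # the ratio s / L0**2 stays bounded as L0 grows
        print(L0, s / L0 ** 2, closed_form_bound(L0, 1.0) / L0 ** 2)
```

For $L_0 \ge 1$ the bound is at most $8(1 + 1/c)^2 L_0^2$, which is the form of the right-hand side of (6.33) once the factor $2C_3 \pi^2_{L_0}(p_c)$ is restored.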
6.5. Proof of Postulate (VI). Postulate (VI) for $d = 2$ goes back to [Ngu85]. We can also immediately obtain this from Theorem 2 in [Kes87], which states that $P_\infty(p)$ is of the same order as $\pi_{\tilde L_0(p,\varepsilon)}(p_c)$. But by (6.10), (6.11) there exists a $\lambda = \lambda(\varepsilon) \ge 1$ such that $\tilde L_0(p, \varepsilon) \le \lambda L_0(p, \varepsilon)$. Therefore, by Postulate (IV)
\[ \pi_{\tilde L_0(p,\varepsilon)}(p_c) \ge \pi_{\lambda L_0(p,\varepsilon)}(p_c) \ge D_3\, \lambda^{-1/\rho_1}\, \pi_{L_0(p,\varepsilon)}(p_c). \]
Combined with Theorem 2 of [Kes87] this gives one inequality of Postulate (VI). In the other direction, it is trivial that $P_\infty(p) \le \pi_{L_0(p)}(p)$ and further $\pi_{L_0(p)}(p) \asymp \pi_{L_0(p)}(p_c)$, by Postulate (III). □

6.6. Proof of Postulate (VII). We shall build a cluster of size at least $ks(L_0(p))$ by connecting together $C_1 k$ clusters of size at least $C_2 s(L_0(p))$ (for suitable constants $C_1$, $C_2$) in adjacent squares of size $2L_0(p)$. These clusters will be connected by means of Harris rings. By Postulate (IV) and Proposition 4.6 (which relies only on Postulates (I)–(IV)) there exists a $\sigma_0 \in (0, 1]$ such that for $n_0 = \lfloor \sigma_0 L_0(p)/2 \rfloor$,
\[ P_{\ge s(L_0(p))}(p) \le P_{\ge s(n_0)}(p) \le C_3\, \pi_{n_0}(p_c); \]
in the first inequality we used that Postulate (IV) implies (5.10), which in turn implies that $s(m) \ge s(n)$ if $m/n$ is large enough. In turn, by Postulate (III), the right-hand side here is at most $C_4 \pi_{n_0}(p)$. It therefore suffices to show that for $p < p_c$ and suitable constants $C_5$, $D_6$,
\[ P_{\ge ks(L_0(p))}(p) \ge C_5\, e^{-D_6 k}\, \pi_{n_0}(p). \tag{6.34} \]
First we use Theorem 3.3 and Lemma 4.4, (4.14). These results show that there exist constants $K_0 < \infty$ and $y_0 > 0$ such that
\[ \begin{aligned} &\Pr_p\{\exists \text{ cluster } C \subset \Lambda_{L_0(p)} \text{ with } \mathrm{diam}(C) \ge y_0 L_0(p) \text{ and } |C| \ge K_0^{-1} s(L_0(p))\} \\ &\quad \ge \Pr_p\{W^{(1)}_{\Lambda_{L_0(p)}} \ge K_0^{-1} s(L_0(p))\} - \Pr_p\{\exists \text{ cluster } C \subset \Lambda_{L_0(p)} \text{ with } \mathrm{diam}(C) < y_0 L_0(p) \text{ and } |C| \ge K_0^{-1} s(L_0(p))\} \\ &\quad \ge \frac12 - C_1 y_0^{-2} \exp[-C_2 (K_0 y_0)^{-1}] \ge \frac14, \end{aligned} \tag{6.35} \]
provided $L_0(p) \ge 4/y_0$. The estimate (6.35) shows that with a probability of at least 1/4 there is a cluster with a "large" size and "large" diameter in $\Lambda_{L_0(p)}$. We wish to locate this large cluster more precisely. In fact we want to show that we may assume that it crosses a certain rectangle in the first coordinate direction. To this end we note that if $\mathrm{diam}(C) \ge y_0 L_0(p)$, then there are two points $v, w \in C$ so that $w_i - v_i \ge y_0 L_0(p)$ for $i = 1$ or $i = 2$. Assume that this holds for $i = 1$. Then for some $-2/y_0 \le j \le 2/y_0$ the event
\[ M(p, j) := \{\exists \text{ cluster } C \subset \Lambda_{L_0(p)} \text{ with } |C| \ge K_0^{-1} s(L_0(p)) \text{ that contains points } v, w \text{ with } v_1 \le j y_0 L_0(p)/2 < (j + 1) y_0 L_0(p)/2 \le w_1\} \tag{6.36} \]
must occur. Therefore there exists a $j_0 \in [-2/y_0, 2/y_0]$ for which
\[ \Pr_p\{M(p, j_0)\} \ge \frac{y_0}{8(y_0 + 1)}. \tag{6.37} \]
From (6.37) and translation invariance it follows that each of the events
\[ \begin{aligned} \{&\exists \text{ cluster } C_\ell \subset [2\ell L_0(p), (2\ell + 2)L_0(p)) \times [-L_0(p), L_0(p)) \text{ with } |C_\ell| \ge K_0^{-1} s(L_0(p)) \text{ and which crosses} \\ &[(2\ell + j_0) y_0 L_0(p)/2,\, (2\ell + j_0 + 1) y_0 L_0(p)/2] \times [-L_0(p), L_0(p)) \text{ in the horizontal direction}\}, \quad \ell \ge 0, \end{aligned} \tag{6.38} \]
has probability at least $y_0/(8y_0 + 8)$. Let $k \ge 1$ be given and take $r = \lceil kK_0 \rceil$. If the event in (6.38) occurs for $\ell = 0, 1, \dots, r$ and $0 \leftrightarrow \partial B_{n_0}(0)$, and the paths from $0$ to $\partial B_{n_0}(0)$ and the horizontal crossings of $[(2\ell + j_0) y_0 L_0(p)/2, (2\ell + j_0 + 1) y_0 L_0(p)/2] \times [-L_0(p), L_0(p))$, $0 \le \ell \le r$, are all connected by Harris rings, then the cluster of the origin has size at least $rK_0^{-1} s(L_0(p)) \ge ks(L_0(p))$. The Harris-FKG inequality now shows that
\[ P_{\ge ks(L_0(p))}(p) \ge \pi_{n_0}(p)\, C_6 \Big[ C_6 \frac{y_0}{8(y_0 + 1)} \Big]^r. \]
This proves (6.34) with
\[ D_6 = \log \frac{8(y_0 + 1)(K_0 + 1)}{C_6 y_0}, \]
and Postulate (VII) follows for all $p < p_c$ with $L_0(p) \ge 4/y_0$. If $L_0(p) < 4/y_0$, the postulate follows from the trivial bound $P_{\ge ks(L_0(p))} \ge p^{ks(L_0(p))} \ge p^{64 y_0^{-2} k}$. □

7. Proof of Theorem 3.7

In this section, we introduce Postulate (VII alt), which is slightly stronger than Postulate (VII), and prove Theorem 3.7. To state Postulate (VII alt), we need some notation. For $k \ge 1$, let $[k]^d = \{1, \dots, k\}^d$. Given an integer $k \ge 1$, and a choice of vertices $v(j) \in \tilde\Lambda(j) := 2jL_0(p) + \Lambda_{\lfloor L_0(p)/4 \rfloor}$, $j \in [k]^d$, we define sets
\[ \Lambda(j) = 2jL_0(p) + \Lambda_{L_0(p)} \qquad \text{and} \qquad F = \bigcup_{j \in [k]^d} \Lambda(j), \]
as well as events
\[ G(j) = G(j; x) = \{|C_{\Lambda(j)}(v(j))| \ge x s(L_0(p))\}, \qquad G_k = G_k(x) = \bigcap_{j \in [k]^d} G(j; x), \]
\[ H(j) = \{v(j) \leftrightarrow v(j \pm e_i) \text{ in } F,\ 1 \le i \le d\}, \]
where the $i$th component of $j \pm e_i$ equals $j_i \pm 1$. We also define
\[ H_k = \{\text{all } v(j) \text{ with } j \in [k]^d \text{ are connected in } F\} = \bigcap_{\substack{2 \le j_i \le k-1 \\ 1 \le i \le d}} H(j). \]
Postulate (VII alt). For all $0 < x \le 1$ there exists a constant $D_7 = D_7(x) > 0$ such that
\[ \Pr_p\{H_k \mid G_k(x)\} \ge D_7^{k^d} \tag{7.1} \]
for all $\zeta_0 \le p < p_c$, $k \ge 1$ and all choices of $v(j)$, $j \in [k]^d$. We remind the reader that $\zeta_0$ is some arbitrary number in $(0, p_c)$.
Note that there are $k^d$ choices for $j \in [k]^d$. Condition (7.1) therefore roughly speaking says that the conditional probability of $H(j)$, given that $|C(v(j))| \ge x s(L_0(p))$ and each $|C(v(j \pm e_i))| \ge x s(L_0(p))$, is at least $D_7$. Or still more intuitively, "clusters of size of order $s(L_0(p))$ and a distance of order $L_0(p)$ apart have a reasonable conditional probability of being connected." We also mention that (7.1) is actually not needed for all $x \in (0, 1]$, but only for one fixed value of $x$ for which $\Pr_p\{G(j)\} \ge C_1 \pi_{L_0(p)}(p_c)$ for some constant $C_1 > 0$, independent of $p < p_c$. Such $x$ and $C_1$ can be shown to exist by means of the bound (5.40) which follows from Proposition 5.3.
7.1. Proof of Theorem 3.7i. In this subsection we always assume Postulates (I)–(IV) and $\zeta_0 \le p < p_c$. For brevity we write in many places $L$ for $L_0(p)$ and $\Lambda$ for $\Lambda_{L_0(p)}$. In Steps i–v we also use Postulate (VII alt), but we shall only use that (7.1) is valid for $0 < x \le x_0$ for some $x_0 > 0$. The value of $x_0$ is irrelevant. All constants in this section are independent of $k$.

Step i. There exists an $x \in (0, 1]$ and a constant $C_2 > 0$ such that uniformly for $v(j) \in \tilde\Lambda(j) = 2jL_0(p) + \Lambda_{\lfloor L_0(p)/4 \rfloor}$,
\[ \Pr_p\{G(j)\} \ge C_2\, \pi_L(p_c). \tag{7.2} \]
To prove (7.2) we use the relation (5.3) between the distribution of $W^{(1)}_\Lambda$ and $P_{\ge s}$. For $r \ge 1$ and any $0 < C_1 < \infty$, we get
\[ \Pr_p\{W^{(1)}_{\Lambda_r} \ge C_1 s(r)\} \le \sum_{v \in \Lambda_r} \sum_{s \ge C_1 s(r)} \frac{1}{s} \Pr_p\{|C_{\Lambda_r}(v)| = s\} \le \frac{|\Lambda_r|}{C_1 s(r)} \sup_{v \in \Lambda_r} \Pr_p\{|C_{\Lambda_r}(v)| \ge C_1 s(r)\}. \tag{7.3} \]
On the other hand, by (5.10), $s(m) \ge C_1 s(r)$ and hence
\[ \Pr_p\{W^{(1)}_{\Lambda_r} \ge C_1 s(r)\} \ge \Pr_p\{W^{(1)}_{\Lambda_r} \ge s(m)\} \tag{7.4} \]
whenever $m \ge r(C_1/D_3)^{2/d}$. Setting $r = \lfloor L_0(p)/4 \rfloor$, $m = r\lceil (C_1/D_3)^{2/d} \rceil$, and choosing $C_1 > 0$ small enough to guarantee that $\lceil (C_1/D_3)^{2/d} \rceil \sigma^{(1)}(1/4, 1/2) \le 1$, where $\sigma^{(1)}(\lambda, \delta)$ is the constant introduced before (5.40), we can now use the bound (5.40). Combined with (7.4) we get $\Pr_p\{W^{(1)}_{\Lambda_r} \ge C_1 s(r)\} \ge 1/2$. Using (7.3), we therefore conclude that there exists a constant $C_3 > 0$ and a $w_0 \in \Lambda_r$ such that
\[ \Pr_p\{|C_{\Lambda_r}(w_0)| \ge C_1 s(r)\} \ge \tfrac12 C_2\, \pi_r(p) \ge C_3\, \pi_r(p_c), \tag{7.5} \]
where we used Postulate (III) in the last step. Now for any $v \in \Lambda_r$, $\Lambda_r$ shifted by $v - w_0$ is contained in $\Lambda_{3r} \subset \Lambda$. Therefore for all $v \in \Lambda_r = \Lambda_{\lfloor L_0(p)/4 \rfloor}$ and sufficiently small $C_4$,
\[ \Pr_p\{|C_\Lambda(v)| \ge C_4 s(L)\} \ge \Pr_p\{|C_{\Lambda_r}(w_0)| \ge C_1 s(r)\} \ge C_3\, \pi_r(p_c) \ge C_2\, \pi_L(p_c). \tag{7.6} \]
This proves (7.2) for $j = 0$ and $x = C_4 \wedge 1$. But then it clearly holds for all $j$ by translation and for all $0 < x \le C_4 \wedge 1$.
Step ii. Now fix $k$ and for brevity write $M = k^d$. Let $C_2$ and $C_4$ be such that (7.6) holds. Also fix $x = C_4 \wedge 1 \wedge x_0$ and take $D_7 = D_7(x)$. It is useful to indicate the choice of the $v(j)$ more explicitly in our notation. With some abuse of notation we denote the possible values of $j$ by $1, \dots, M$, and we occasionally write $G_k(v(1), \dots, v(M))$ instead of $G_k$, and similarly for $H_k(v(1), \dots, v(M))$. We have defined the $\Lambda(j)$ such that they are disjoint. Consequently, for any choice of $v(j)$ in $\tilde\Lambda(j)$, we have by (7.2)
\[ \Pr_p\{G_k(v(1), \dots, v(M))\} = \prod_j \Pr_p\{G(j)\} \ge [C_2\, \pi_L(p_c)]^M, \]
and then by Postulate (VII alt)
\[ \Pr_p\{G_k(v(1), \dots, v(M)) \cap H_k(v(1), \dots, v(M))\} \ge [D_7 C_2\, \pi_L(p_c)]^M. \tag{7.7} \]
We sum this over all $v(j) \in \tilde\Lambda(j)$ for $j \ne 1$. We indicate this sum by $\sum^{(k)}$. We therefore have for some constants $C_5$, $C_6$,
\[ \sum\nolimits^{(k)} \Pr_p\{G_k(v(1), \dots, v(M)) \cap H_k(v(1), \dots, v(M))\} \ge [D_7 C_2\, \pi_L(p_c)]^M\, [2\lfloor L_0(p)/4 \rfloor]^{d(M-1)} \ge C_5\, \pi_L(p_c)\, [C_6 s(L)]^{M-1}. \tag{7.8} \]
Step iii. We next work on an upper bound for the left-hand side of (7.8). To this end we note that on the event $G_k \cap H_k$, $v(j)$ is connected to $v(1)$ and therefore to $\partial\Lambda(j)$ whenever $j \ne 1$. We therefore define
\[ \tilde V(j) = \text{number of } v \in \tilde\Lambda(j) \text{ which are connected to } \partial\Lambda(j). \]
We further define
\[ I_k = I_k(v(1)) = I[|C_F(v(1))| \ge M x s(L)]. \tag{7.9} \]
Clearly, on the event $G_k(v(1), \dots, v(M)) \cap H_k(v(1), \dots, v(M))$, it holds that
\[ |C_F(v(1))| \ge \sum_j |C_{\Lambda(j)}(v(j))| \ge M x s(L) \quad \text{and} \quad v(j) \leftrightarrow \partial\Lambda(j), \]
and therefore
\[ \sum\nolimits^{(k)} \Pr_p\{G_k(v(1), \dots, v(M)) \cap H_k(v(1), \dots, v(M))\} \le E_p\Big\{I_k(v(1))\, I[v(1) \leftrightarrow \partial\Lambda(1)] \prod_{j \ne 1} \tilde V(j)\Big\}. \tag{7.10} \]
We continue this inequality. For any $\gamma \ge 0$ the right-hand side of (7.10) is at most
\[ \begin{aligned} &e^{\gamma M} [s(L)]^{M-1} E_p\{I_k(v(1))\} + E_p\Big\{I_k(v(1))\, I[v(1) \leftrightarrow \partial\Lambda(1)] \prod_{j \ne 1} \tilde V(j);\ \prod_{j \ne 1} \tilde V(j) \ge e^{\gamma M} [s(L)]^{M-1}\Big\} \\ &\quad \le e^{\gamma M} [s(L)]^{M-1} E_p\{I_k(v(1))\} + e^{-\gamma M} [s(L)]^{-M+1} E_p\Big\{I[v(1) \leftrightarrow \partial\Lambda(1)] \prod_{j \ne 1} \tilde V^2(j)\Big\}. \end{aligned} \]
Finally we observe that the $\tilde V(j)$ are independent of each other and of $I[v(1) \leftrightarrow \partial\Lambda(1)]$, because the $\Lambda(j)$ are disjoint. Moreover, for each $j$,
\[ E_p\{\tilde V^2(j)\} \le C_7\, s^2(L), \]
by virtue of (4.6). Therefore
\[ E_p\Big\{I[v(1) \leftrightarrow \partial\Lambda(1)] \prod_{j \ne 1} \tilde V^2(j)\Big\} = \Pr_p\{v(1) \leftrightarrow \partial\Lambda(1)\} \prod_{j \ne 1} E_p\{\tilde V^2(j)\} \le \pi_{L/2}(p_c)\, [C_7 s(L)]^{2M-2}. \]
Combining these estimates gives
\[ \sum\nolimits^{(k)} \Pr_p\{G_k(v(1), \dots, v(M)) \cap H_k(v(1), \dots, v(M))\} \le e^{\gamma M} [s(L)]^{M-1} E_p\{I_k(v(1))\} + e^{-\gamma M} [s(L)]^{-M+1}\, \pi_{L/2}(p_c)\, [C_7 s(L)]^{2M-2}. \tag{7.11} \]
Step iv. In this step we complete the deduction of Postulate (VII) from Postulates (I)–(V) and Postulate (VII alt). From (7.8)–(7.11) we obtain (by means of Postulate (IV))
\[ e^{\gamma M} [s(L)]^{M-1} E_p\{I_k(v(1))\} \ge C_5\, \pi_L(p_c) \Big\{ [C_6 s(L)]^{M-1} - e^{-\gamma M} [s(L)]^{-M+1} (C_5 D_3)^{-1} 2^{1+1/\rho_1} [C_7 s(L)]^{2M-2} \Big\}. \]
Choosing $\gamma$ so large that
\[ e^{-\gamma} < C_6 C_7^{-2} \wedge \tfrac14 C_5 D_3 2^{-1/\rho_1}, \]
we find that
\[ E_p\{I_k(v(1))\} \ge \tfrac12\, \pi_L(p_c)\, e^{-\gamma M} C_5\, C_6^{M-1}. \tag{7.12} \]
Since, by (7.9), the left-hand side is no more than P≥Mxs(L) (p), and, by (4.20), πL (pc ) ≥ C2−1 P≥s(L) (pc ) ≥ C2−1 P≥s(L) (p), we obtain Postulate (VII). Step v. Even though we finished the deduction of Postulate (VII), we point out here that had we summed over v(1) as well, then the derivation given above would have resulted in (1)
P rp {W kL ≥ C2 Ms(L)} = P rp {∃ a cluster in
!
( j) of size ≥ C2 Ms(L)}
(7.13)
j
≥ C9 e−γ M C6M . This is basically the estimate (5.83) and we can deduce the lower bound in Theorem 3.5 almost immediately from (7.13) without repeating most of its proof from Postulate (VII).
Also (7.13) can be used to derive the desired counterpart to (3.14), namely, for each fixed K and i,
$$\limsup_{n\to\infty} P_{p_n}\left\{ \frac{W^{(i)}_{\Lambda_n}}{E_{p_n}\{W^{(i)}_{\Lambda_n}\}} \le K \right\} < 1, \qquad (7.14)$$
when pn is inside the scaling window, i.e., when (3.5) holds. To see (7.14) for i = 1, fix some large K. Then choose k such that for large n, 1/ρ2 k (1) KEpn {W n } ≤ C2 C1−1 s(n) and kL0 (p) > 2n, (7.15) 2 (1)
with C1 as in (4.17). Such a k exists because Epn {W n } and s(n) are of the same order by Theorem 3.1 i) and pn is inside the scaling window. Finally choose pn% ≤ (pn ∧ pc ) such that n ≤ kL0 (pn% ) ≤ 2n. This can be done by virtue of (2.29). Lemma 4.5 then shows that 1/ρ2 (1) −1 k d % C2 k s(L0 (pn )) ≥ C2 C1 s(n) ≥ KEpn {W n } (see (7.15)). 2 Finally, then, by (7.13) for p = pn% , (1)
(1)
(1)
Ppn {W n ≥ KEpn {W n }} ≥ Ppn {W n ≥ C2 k d s(L0 (pn% ))} d
(1)
d
≥ Ppn% {W n ≥ C2 k d s(L0 (pn% ))} ≥ C9 e−γ k C6k > 0. This proves (7.14) for i = 1. For general i a little extra work is needed as in the last few lines of the proof of Theorem 3.5. " # 7.2. Proof of Theorem 3.7ii. We briefly indicate how to derive Postulate (VII alt) in dimension 2. We first show that (7.1) holds when x is sufficiently small. In fact, if K0 , y0 and j0 are the constants for which (6.35)–(6.37) hold, then this argument works for x ≤ [K0 ]−1 . With M(p, j ) as in (6.36), we have by a Harris ring construction that for ( j), suitable constants C1 , C2 > 0 and all v( j) ∈
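$$\mathrm{Pr}_p\{\exists \text{ cluster } C \subset \Lambda_{L_0(p)} \text{ which contains } v(j) \text{ and points } v, w \text{ with } v_1 \le j_0 y_0 L_0(p)/2 < (j_0+1) y_0 L_0(p)/2 \le w_1 \text{ and with } |C| \ge K_0^{-1} s(L_0(p))\}$$
$$\ge C_1\, \mathrm{Pr}_p\{M(p, j_0)\}\, \mathrm{Pr}_p\{v(j) \leftrightarrow \partial B_{L_0(p)}(v(j))\} \ge \frac{C_1 y_0}{8(y_0+1)}\, \pi_{L_0(p)}(p) \ge C_2\, \pi_{L_0(p)}(p_c). \qquad (7.16)$$
On the other hand, by definition of G(j), $\mathrm{Pr}_p\{G(j)\} \le P_{\ge x s(L_0(p))}(p)$.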
Using the bound
$$P_{\ge s}(p) \le \pi_n(p) + \frac{1}{s} \sum_{m=1}^{n} |\partial B_m|\, \pi_{m/2}^2(p),$$
proven in [BCKS98], Eq. (4.20), one easily shows that there is a constant $C_{10} = C_{10}(x) < \infty$ such that $P_{\ge x s(L_0(p))}(p) \le C_{10}\, \pi_{L_0(p)}(p_c)$. Since the G(j) for different j depend on disjoint regions, they are independent, and
$$\mathrm{Pr}_p\{G_k\} \le [P_{\ge x s(L_0(p))}(p)]^{k^2} \le [C_{10}\, \pi_{L_0(p)}(p_c)]^{k^2}. \qquad (7.17)$$
Finally, denote the event in the left-hand side of (7.16) by K(j), that is
$$K(j) = \{\exists \text{ cluster } C \subset \Lambda_{L_0(p)} \text{ which contains } v(j) \text{ and points } v, w \text{ with } v_1 \le j_0 y_0 L_0(p)/2 < (j_0+1) y_0 L_0(p)/2 \le w_1 \text{ and with } |C| \ge K_0^{-1} s(L_0(p))\}.$$
Note that K(j) implies G(j) when $x \le [K_0]^{-1}$. Therefore another Harris ring construction shows that for some constant $C_{11} > 0$,
$$\mathrm{Pr}_p\{G_k \cap H_k\} \ge C_{11}^{k^2}\, \mathrm{Pr}_p\{K(j) \text{ for all } 1 \le j_i \le k,\ i = 1, 2\} \ge C_{11}^{k^2}\, [C_2\, \pi_{L_0(p)}(p_c)]^{k^2},$$
by virtue of (7.16) and the Harris–FKG inequality. Comparing this with (7.17), we see that
$$\mathrm{Pr}_p\{G_k \cap H_k\} \ge [C_{11} C_2 / C_{10}]^{k^2}\, \mathrm{Pr}_p\{G_k\}.$$
This completes the proof of (7.1) when x ≤ [K0 ]−1 . For our purposes (7.1) for 0 < x ≤ [K0 ]−1 is actually good enough, but it is not hard to obtain (7.1) for general 0 < x ≤ 1 now. In fact, we can apply the same argument as above, provided we first prove the following strengthening of (7.16) for some constant C12 > 0: P rp {∃ cluster C ∈ L0 (p) which contains v( j) and points v, w with v1 ≤ j0 y0 L0 (p)/2 < (j0 + 1)y0 L0 (p)/2 ≤ w1 and with |C| ≥ s(L0 (p))} ≥ C12 πL0 (p) (pc ).
(7.18)
But (7.18) can be derived exactly as (7.16) from an analogue of (6.37) if we start from P rp {∃ cluster C ∈ L0 (p) with diam(C) ≥ y0 L0 (p) and |C| ≥ s(L0 (p))} (1)
≥ P rp {WL0 (p) ≥ s(L0 (p))} − P rp {∃ cluster C ⊂ L0 (p) with diam(C) ≤ y0 L0 (p) but |C| ≥ s(L0 (p))} (1)
≥ P rp {WL0 (p) ≥ s(L0 (p))} − C1 y0−2 exp[−C3 y0−1 ] ≥ C13 > 0, (7.19)
which is valid for sufficiently small $y_0 > 0$ and some constant $C_{13} > 0$. Equation (7.19) is the analogue of (6.35) with $[K_0]^{-1}$ replaced by 1. The reason why we can prove this now, but could not take $[K_0]^{-1} = 1$ in (6.35) to begin with, is that we first needed to show that $\mathrm{Pr}_p\{W^{(1)}_{L_0(p)} \ge s(L_0(p))\}$ is bounded away from 0. But this is now available to us from (7.14). As we pointed out before, (7.14) only needs (7.1) for $0 < x \le x_0$ for some $x_0 > 0$, and this we just derived. □

Acknowledgements. The authors wish to thank the Forschungsinstitut of the ETH in Zürich and the Institute for Advanced Study in Princeton for their hospitality and partial support of the research in this paper. The authors are also grateful for partial support from other sources: C.B. was supported by the Commission of the European Union under the grant CHRX-CT93-0411, J.T.C. by NSF grant DMS-9403842, and H.K. by an NSF grant to Cornell University.
References
[ACCFR83] Aizenman, M., Chayes, J.T., Chayes, L., Fröhlich, J. and Russo, L.: On a sharp transition from area law to perimeter law in a system of random surfaces. Commun. Math. Phys. 92, 19–69 (1983)
[Aiz97] Aizenman, M.: On the number of incipient spanning clusters. Nucl. Phys. B[FS] 485, 551–582 (1997)
[Ald97] Aldous, D.: Brownian excursions, critical random graphs and the multiplicative coalescent. Ann. Probab. 25, 812–854 (1997)
[Ale96] Alexander, K.: Private communication (1996)
[AN84] Aizenman, M. and Newman, C.M.: Tree graph inequalities and critical behavior in percolation models. J. Stat. Phys. 36, 107–143 (1984)
[AS92] Alon, N. and Spencer, J.: The Probabilistic Method. New York: Wiley Interscience, 1992
[BBCK98] Bollobás, B., Borgs, C., Chayes, J.T. and Kim, J.-H.: Unpublished (1998)
[BCKS98] Borgs, C., Chayes, J.T., Kesten, H. and Spencer, J.: Uniform boundedness of critical crossing probabilities implies hyperscaling. Rand. Struc. Alg. 15, 368–413 (1999)
[BC96] Borgs, C. and Chayes, J.T.: On the covariance matrix of the Potts model: A random cluster analysis. J. Stat. Phys. 82, 1235–1297 (1996)
[BI92-1] Borgs, C. and Imbrie, J.: Finite-size scaling and surface tension from effective one-dimensional systems. Commun. Math. Phys. 145, 235–280 (1992)
[BI92-2] Borgs, C. and Imbrie, J.: Crossover finite-size scaling at first order transitions. J. Stat. Phys. 69, 487–537 (1992)
[BK85] van den Berg, J. and Kesten, H.: Inequalities with applications to percolation and reliability. J. Appl. Probab. 22, 556–569 (1985)
[BoK90] Borgs, C. and Kotecký, R.: A rigorous theory of finite-size scaling at first-order phase transitions. J. Stat. Phys. 61, 79–110 (1990)
[Bol84] Bollobás, B.: The evolution of random graphs. Trans. Am. Math. Soc. 286, 257–274 (1984)
[Bol85] Bollobás, B.: Random Graphs. London: Academic Press, 1985
[CC86] Chayes, J.T. and Chayes, L.: Percolation and random media. In: Random Systems and Gauge Theories, Les Houches, Session XLIII, eds. K. Osterwalder and R. Stora. Amsterdam: Elsevier, 1986, pp. 1001–1142
[CC87] Chayes, J.T. and Chayes, L.: On the upper critical dimension in Bernoulli Percolation. Commun. Math. Phys. 113, 27–48 (1987)
[CCD87] Chayes, J.T., Chayes, L. and Durrett, R.: Inhomogeneous percolation problems and incipient infinite clusters. J. Phys. A: Math. Gen. 20, 1521–1530 (1987)
[CCF85] Chayes, J.T., Chayes, L. and Fröhlich, J.: The low-temperature behavior of disordered magnets. Commun. Math. Phys. 100, 399–437 (1985)
[CCFS86] Chayes, J.T., Chayes, L., Fisher, D. and Spencer, T.: Finite-size scaling and correlation lengths for disordered systems. Phys. Rev. Lett. 57, 2999–3002 (1986)
[CCGKS89] Chayes, J.T., Chayes, L., Grimmett, G.R., Kesten, H. and Schonmann, R.H.: The correlation length for the high-density phase of Bernoulli percolation. Ann. Probab. 17, 1277–1302 (1989)
[Cha98] Chayes, J.T.: Finite-size scaling in percolation. Doc. Math. J. DMV, Extra Volume ICM III, 113–122 (1998)
[CPS96] Chayes, J.T., Puha, A.L. and Sweet, T.: Independent and dependent percolation. In: IAS-Park City Mathematics Series, Vol. 6, Probability Theory and Applications (Princeton, NJ, 1996). Providence, RI: AMS, 1999, pp. 49–166
[Con85] Coniglio, A.: Shapes, surfaces and interfaces in percolation clusters. In: Proc. Les Houches Conference on Physics of Finely Divided Matter, eds. M. Daoud and N. Boccara. Berlin–Heidelberg–New York: Springer-Verlag, 1985, pp. 84–109
[ER59] Erdős, P. and Rényi, A.: On random graphs I. Publ. Math. (Debrecen) 6, 290–297 (1959)
[ER60] Erdős, P. and Rényi, A.: On the evolution of random graphs. Magy. Tud. Akad. Mat. Kut. Intéz. Közl. 5, 17–61 (1960)
[GM90] Grimmett, G. and Marstrand, J.M.: The supercritical phase of percolation is well behaved. Proc. Roy. Soc. London Ser. A 430, 439–457 (1990)
[Gri99] Grimmett, G.: Percolation. 2nd edition, New York: Springer-Verlag, 1999
[Ham57] Hammersley: Percolation processes. Lower bounds for the critical probability. Ann. Math. Statist. 28, 790–795 (1957)
[Har01] Hara, T.: Critical two-point functions for nearest-neighbour high-dimensional self-avoiding walk and percolation. Preprint in preparation
[HHS01] Hara, T., van der Hofstad, R. and Slade, G.: Critical two-point functions and the lace expansion for spread-out high-dimensional percolation and related models. Preprint (2001)
[Har60] Harris, T.E.: A lower bound for the critical probability in a certain percolation process. Proc. Cambridge Philos. Soc. 56, 13–20 (1960)
[Jar00] Járai, A.: Incipient infinite percolation clusters in 2D. Preprint (2000)
[JKLP93] Janson, S., Knuth, D.E., Łuczak, T. and Pittel, B.: The birth of the giant component. Random Struc. Alg. 4, 233–358 (1993)
[Kes82] Kesten, H.: Percolation Theory for Mathematicians. Boston: Birkhäuser, 1982
[Kes86] Kesten, H.: The incipient infinite cluster in two-dimensional percolation. Probab. Theory Rel. Fields 73, 369–394 (1986)
[Kes87] Kesten, H.: Scaling relations for 2D-percolation. Commun. Math. Phys. 109, 109–156 (1987)
[Luc90] Łuczak, T.: Component behavior near the critical point of the random graph process. Rand. Struc. Alg. 1, 287–310 (1990)
[Ngu85] Nguyen, B.G.: Correlation lengths for percolation processes. PhD thesis, UCLA (1985)
[Ngu88] Nguyen, B.G.: Typical cluster size for two-dimensional percolation processes. J. Stat. Phys. 50, 715–726 (1988)
[Rus78] Russo, L.: A note on percolation. Z. Wahrsch. verw. Geb. 43, 39–48 (1978)
[SmW78] Smythe, R.T. and Wierman, J.C.: First-Passage Percolation on the Square Lattice. Berlin–Heidelberg: Springer-Verlag, 1978
[SW78] Seymour, P.D. and Welsh, D.J.A.: Percolation probabilities on the square lattice. Ann. Discrete Math. 3, 227–245 (1978)
Communicated by M. Aizenman
Commun. Math. Phys. 224, 205 – 218 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Are There Incongruent Ground States in 2D Edwards–Anderson Spin Glasses?
C. M. Newman1, D. L. Stein2
1 Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA.
E-mail: [email protected]
2 Departments of Physics and Mathematics, University of Arizona, Tucson, AZ 85721, USA.
E-mail: dls@physics.arizona.edu
Received: 3 December 2000 / Accepted: 30 April 2001
Abstract: We present a detailed proof of a previously announced result [1] supporting the absence of multiple (incongruent) ground state pairs for 2D Edwards–Anderson spin glasses (with zero external field and, e.g., Gaussian couplings): if two ground state pairs (chosen from metastates with, e.g., periodic boundary conditions) on Z2 are distinct, then the dual bonds where they differ form a single doubly-infinite, positive-density domain wall. It is an open problem to prove that such a situation cannot occur (or else to show – much less likely in our opinion – that it indeed does happen) in these models. Our proof involves an analysis of how (infinite-volume) ground states change as (finitely many) couplings vary, which leads us to a notion of zero-temperature excitation metastates, that may be of independent interest.
1. Introduction
The decades-old challenge of understanding the physical nature of laboratory spin glasses and the mathematical nature of spin glass models at low temperature continues. It is a paradigm of the wider effort to analyze the many novel features that occur in disordered systems generally. One can only hope that this effort will achieve some fraction of the successes that have been reached in understanding homogeneous systems – in and out of equilibrium – and that are epitomized by the work of Joel Lebowitz and his many collaborators. It is indeed an honor to contribute to this celebration of Joel’s first 70 years; may he live to 120.

Research partially supported by the National Science Foundation under grants DMS-98-02310 and DMS-01-02587.
Research partially supported by the National Science Foundation under grants DMS-98-02153 and DMS-01-02541.
Our focus here is entirely on the Edwards–Anderson (EA) [2] model on $\mathbb{Z}^d$, simplest of the short-ranged Ising spin glasses, with Hamiltonian
$$H_{\mathcal{J}}(\sigma) = -\sum_{\langle x,y\rangle} J_{xy}\, \sigma_x \sigma_y. \qquad (1)$$
Here $\mathcal{J}$ denotes a specific realization of the couplings $J_{xy} = J_{\langle x,y\rangle}$, the spins $\sigma_x = \pm 1$ and the sum is over nearest-neighbor pairs $\langle x,y\rangle$ only, with the sites x, y on the square lattice $\mathbb{Z}^d$. The $J_{xy}$’s are independently chosen from a symmetric, continuous distribution with unbounded support, such as Gaussian with mean zero; we denote by ν the overall disorder distribution for $\mathcal{J}$. In this paper, we restrict attention entirely to ground states, and further, to the lowest interesting dimension, d = 2. Of course, for d = 1, and assuming as we do that the $J_{xy}$’s are continuously distributed, it is easy to see that the multiplicity of infinite-volume ground states is exactly two – i.e., a single ground state pair (GSP) of spin configurations related to each other by a global spin flip – since, in the absence of frustration, every bond can be satisfied in a ground state. We are interested in the question of whether there are infinitely many observable GSP’s. By “observable” we mean that these states can be generated without using special $\mathcal{J}$-dependent boundary conditions. This means that by using, say, periodic boundary conditions on the L × L squares $S_L$ centered at the origin, for a sequence of L’s tending to infinity, also chosen in a $\mathcal{J}$-independent way, the corresponding sequence of finite-volume GSP’s for the finite-volume Hamiltonians $H^{(L)}_{\mathcal{J}}$ (when restricted to a fixed, but arbitrarily large window about the origin) will generate an empirical distribution, i.e., a histogram, that in the limit is dispersed over many GSP’s.

2. Main Result
2.1. Preliminaries: Metastates. To state a precise theorem about the GSP’s that arise in this way, we need to explain the notion of a metastate [3–6] in this zero-temperature context. We will do this in the briefest possible way here, using empirical distributions, while delaying to later sections of the paper a discussion of the fact that there are alternative definitions giving rise to the same mathematical object.
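To make Eq. (1) and the notion of a finite-volume GSP with periodic boundary conditions concrete, the following is a minimal sketch, not taken from the paper. It assumes a tiny 3 × 3 torus, Gaussian couplings drawn with a fixed seed, and exhaustive search; the function names (`torus_bonds`, `energy`, `ground_state_pair`, `unsatisfied_bonds`) are hypothetical and chosen only for illustration.

```python
import itertools
import random

def torus_bonds(L):
    """Nearest-neighbor bonds of the L x L torus (periodic boundary conditions), L >= 3."""
    bonds = []
    for x in range(L):
        for y in range(L):
            bonds.append(((x, y), ((x + 1) % L, y)))
            bonds.append(((x, y), (x, (y + 1) % L)))
    return bonds

def energy(J, sigma):
    """H_J(sigma) = - sum over bonds <x,y> of J_xy * sigma_x * sigma_y, as in Eq. (1)."""
    return -sum(Jb * sigma[x] * sigma[y] for (x, y), Jb in J.items())

def ground_state_pair(J, L):
    """Exhaustive search. The spin at (0, 0) is fixed to +1, which selects one
    member of the pair; the other member is its global flip."""
    sites = [(x, y) for x in range(L) for y in range(L)]
    best, best_e = None, float("inf")
    for rest in itertools.product((1, -1), repeat=len(sites) - 1):
        sigma = dict(zip(sites, (1,) + rest))
        e = energy(J, sigma)
        if e < best_e:
            best, best_e = sigma, e
    return best, best_e

def unsatisfied_bonds(J, sigma):
    """A bond <x,y> is unsatisfied when J_xy * sigma_x * sigma_y < 0."""
    return {b for b, Jb in J.items() if Jb * sigma[b[0]] * sigma[b[1]] < 0}

L = 3
rng = random.Random(0)  # fixed seed: an illustrative coupling realization
J = {b: rng.gauss(0.0, 1.0) for b in torus_bonds(L)}
sigma, e0 = ground_state_pair(J, L)
flipped = {s: -v for s, v in sigma.items()}
```

Both members of the pair have the same energy and the same set of unsatisfied bonds, which is why a GSP can be identified with that set. Exhaustive search is only feasible for very small L and is used here purely for illustration.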
First, we note that for a given $\mathcal{J}$, with all couplings nonzero, a GSP α may be identified with the collection of unsatisfied bonds, which we regard as edges in the dual lattice. Now suppose that $L_j \to \infty$ is a sequence of scale sizes, not depending on $\mathcal{J}$, such that for ν-almost every $\mathcal{J}$, there is a probability measure (called a metastate) $\kappa_{\mathcal{J}}$, defined on the configurations α of GSP’s on all of $\mathbb{Z}^2$, which is the limit of the empirical distributions of the finite volume GSP’s $\alpha^{(L)}_{\mathcal{J}}$ along the sequence $L_j$ as follows: Let $D_1$ and $D_2$ be disjoint finite sets of dual edges, let $A(D_1, D_2)$ denote the event that every edge in $D_1$ is unsatisfied and every edge in $D_2$ is satisfied; let $F^{(M)}_{\mathcal{J}}(D_1, D_2)$ denote the fraction of the indices $j \in \{1, \dots, M\}$ such that all the edges of $D_1$ and $D_2$ are within the square $S_{L_j}$ and such that the GSP $\alpha^{(L_j)}_{\mathcal{J}}$ obeys all the requirements of $A(D_1, D_2)$; then for every such $D_1$ and $D_2$,
$$\lim_{M\to\infty} F^{(M)}_{\mathcal{J}}(D_1, D_2) = \kappa_{\mathcal{J}}(A(D_1, D_2)). \qquad (2)$$
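The empirical fraction entering Eq. (2) can be sketched as follows. This is an illustrative sketch only: each finite-volume GSP is represented by a hypothetical pair (the set of dual edges inside the square, the set of unsatisfied dual edges), edges are labeled by arbitrary strings, and the data are made up for the example.

```python
def empirical_fraction(gsps, D1, D2):
    """Fraction of indices j for which all edges of D1 and D2 lie inside the
    j-th square, every edge of D1 is unsatisfied and every edge of D2 is satisfied.

    gsps: list of (edges_in_square, unsatisfied_edges) pairs, one per L_j.
    D1, D2: disjoint finite sets of dual edges.
    """
    hits = 0
    for edges_in_square, unsatisfied in gsps:
        inside = D1 <= edges_in_square and D2 <= edges_in_square
        if inside and D1 <= unsatisfied and not (D2 & unsatisfied):
            hits += 1
    return hits / len(gsps)

# Made-up data for M = 4 squares; "a", "b", "c" are hypothetical edge labels.
gsps = [
    ({"a", "b", "c"}, {"a"}),       # a unsatisfied, b satisfied: counts
    ({"a", "b", "c"}, {"a", "b"}),  # b unsatisfied: does not count
    ({"a"}, {"a"}),                 # b lies outside the square: does not count
    ({"a", "b", "c"}, {"c"}),       # a satisfied: does not count
]
fraction = empirical_fraction(gsps, D1={"a"}, D2={"b"})
```

The metastate probability of the event is the limit of such fractions as M grows along the chosen sequence of sizes.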
Thus a metastate for T = 0 is an ensemble of infinite-volume GSP’s that describes the asymptotic fractions of squares, along a subsequence Lj , for which the various
GSP’s are observed (when restricted to windows of fixed, but arbitrarily large, size) within the finite-volume systems. It can be shown by compactness arguments [5, 6] that such subsequences $L_j$ exist; in fact every subsequence has such an $L_j$ as a further subsubsequence. Although it is a reasonable conjecture that any two metastates are in fact the same for almost every $\mathcal{J}$, no general result has been proved. However, this would be an immediate corollary of the following conjecture, at least for d = 2, which would also imply that the metastate is supported on a single GSP for almost every $\mathcal{J}$. We note that recent numerical results are consistent with the existence of only a single GSP in two dimensions [7, 8].

Conjecture 1. Let $\mathcal{J}$ be chosen from the disorder distribution ν and let α and β be GSP’s chosen independently from d = 2 periodic boundary condition metastates, $\kappa_{\mathcal{J}}$ and $\kappa'_{\mathcal{J}}$ (coming from subsequences $L_j$ and $L'_k$). Then, with probability one, α = β.

2.2. Theorem. The main result of this paper is the proof of the following theorem, which we regard as partial verification of the above Conjecture – see the remark below. Equality of two GSP’s, α and β, is of course equivalent to the vanishing of the symmetric difference $\alpha \Delta \beta$, the collection of bonds that are satisfied in one of the two GSP’s and unsatisfied in the other. It is not hard to show (see Proposition 1 below) that, at least for periodic boundary conditions, the symmetric difference must consist either of a single domain wall (i.e., a doubly-infinite self-avoiding path in the dual lattice) with strictly positive density or else multiple nonintersecting domain walls which have altogether strictly positive density, but may have zero density individually. A priori, we felt (and still feel) that on a heuristic level, the former scenario for GSP multiplicity is the less plausible of the two. The next theorem rigorously eliminates the latter scenario.

Theorem 1. Let $\mathcal{J}$ be chosen from the disorder distribution ν and let α and β be GSP’s chosen independently from d = 2 periodic boundary condition metastates, $\kappa_{\mathcal{J}}$ and $\kappa'_{\mathcal{J}}$ (coming from subsequences $L_j$ and $L'_k$). Then, with probability one, either α = β or else $\alpha \Delta \beta$ is a single domain wall with strictly positive density.

Proof. This theorem will be an immediate consequence of three propositions, given in Sect. 4 of the paper.

Remark. Although Theorem 1 does not eliminate the scenario of multiple GSP’s whose symmetric differences are single positive density domain walls, we suspect that such domain walls do not in fact occur. The proof of Theorem 1 is based on showing that the presence of two or more $\alpha \Delta \beta$ domain walls would create an instability for both α and β with respect to the flip of a large droplet whose boundary consists of two long segments from adjacent domain walls, connected by two short “rungs” between the walls. The stability of α and β to such flips is controlled by the infimum E of the necessarily positive rung energies – see Eq. (11). Proposition 3 of Sect. 4 proves instability by showing that E = 0, while Proposition 2 there shows that such unstable GSP’s cannot actually occur with nonzero probability. If there were a single domain wall, it would be natural to expect that, like the rungs in Proposition 3, the “pseudo-rungs” that connect sections of the domain wall that are close in Euclidean distance, but greatly separated in distance along the domain wall, could also have arbitrarily low positive energies. If these pseudo-rungs connected long pieces of the domain wall containing some fixed bond (and we emphasize that these properties have not been proved), then single domain
walls would be ruled out by an analogue of Proposition 2. The consequence would be that the periodic boundary condition metastate in the 2D EA Ising spin glass would be unique and supported on a single GSP.
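As a finite toy illustration of the objects appearing in Theorem 1, the symmetric difference of two GSP’s and its decomposition into domain walls (connected clusters of dual edges) can be sketched as below. The representation of a GSP as a set of unsatisfied dual edges follows the text; the vertex labels and the helper names `edge` and `domain_walls` are hypothetical, and in the infinite-volume setting of the theorem the walls are of course infinite.

```python
from collections import defaultdict

def edge(u, v):
    """An undirected dual edge between dual-lattice vertices u and v."""
    return frozenset((u, v))

def domain_walls(alpha, beta):
    """Split the symmetric difference of two edge sets into connected clusters.

    alpha, beta: sets of unsatisfied dual edges of two GSP's (finite toy data).
    Returns a list of edge sets, one per cluster (domain wall).
    """
    diff = alpha ^ beta  # edges satisfied in one GSP and unsatisfied in the other
    adj = defaultdict(set)
    for e in diff:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    seen, walls = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # depth-first search over one cluster of vertices
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.add(u)
            stack.extend(adj[u] - seen)
        walls.append({e for e in diff if e <= comp})
    return walls

# Hypothetical dual vertices labeled by strings.
alpha = {edge("a", "b"), edge("b", "c"), edge("x", "y")}
beta = {edge("x", "y"), edge("p", "q")}
walls = domain_walls(alpha, beta)
```

Here the edge shared by both sets drops out of the symmetric difference, and the remaining edges form two clusters, one with two edges and one with a single edge.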
2.3. Extension to other boundary conditions. The restriction to periodic boundary conditions in Theorem 1 can in fact be relaxed to allow other boundary conditions that do not depend on $\mathcal{J}$. For boundary conditions such as antiperiodic that are flip-related to periodic ones, nothing needs to be done, since they yield the same metastate – see Sect. IV of [4]. To explain how other boundary conditions can be handled, we begin by noting that the significance of periodic boundary conditions is that they yield translation-invariance of various infinite-volume objects, which in turn is a crucial ingredient in the propositions of the next section. With periodic boundary conditions, translation-invariance is already valid for finite volume. For example, from the random pair $(\mathcal{J}, \alpha^{(L)}_{\mathcal{J}})$, the finite dimensional distributions of finitely many coupling values and finitely many bond satisfaction variables are unchanged under translation by y, as long as y does not translate any of the finitely many bonds in question beyond $S_L$. On the other hand, in the spirit of the empirical distribution construction of the metastate described above, one could rather consider the random pair $(\mathcal{J}, \alpha^{(L)}_{\mathcal{J}})$, with L chosen, uniformly at random, from $L_1, \dots, L_M$. In that case, there is in a certain sense only approximate translation invariance for finite M, since the bonds typically do get translated out of $S_{L_j}$ for small j. But full translation-invariance is restored in the limit $M \to \infty$.

For non-periodic, but still $\mathcal{J}$-independent, boundary conditions, one can somewhat similarly obtain infinite-volume translation-invariance, as follows. For each L and x, let $\alpha^{(L,x)}_{\mathcal{J}}$ denote the GSP in the translated square $S_L + x$ with some $\mathcal{J}$-independent boundary condition, such as free or plus. Next, let $X^{(L)}$ denote a uniformly random site in $S_{L'(L)}$, where the deterministic $L'(L) \to \infty$ with, say, $L - L'(L) \to \infty$ (e.g., $L'(L) = \sqrt{L}$). Then the random pair $(\mathcal{J}, \alpha^{(L, X^{(L)})}_{\mathcal{J}})$ or, alternatively, the same pair with L chosen uniformly at random from $L_1, \dots, L_M$, has approximate translation-invariance, which becomes exact as $L \to \infty$, or, alternatively, $M \to \infty$. Using such an “average over translates” construction, one can obtain metastates coming from, e.g., free or plus boundary conditions, for which the analogue of Theorem 1 will be valid. Such averaging over translates can also be used to obtain translation-invariance for the extended notions of metastates we describe next.

3. The Excitation Metastate
An important part of the proof of Theorem 1 is based on extending the notion of metastates so as to describe how a given GSP changes as the couplings in $\mathcal{J}$ vary. Of course, if Conjecture 1 were true, then, at least for d = 2, there would be, for almost every $\mathcal{J}$, a GSP $\alpha_{\mathcal{J}}$, uniquely determined as being the one on which the periodic boundary condition metastate is supported; thus one would know how $\alpha_{\mathcal{J}}$ changes even when infinitely many of the couplings in $\mathcal{J}$ vary. But in general, since there might be many GSP’s and perhaps even many metastates, it is not so obvious how to formulate the dependence of a given GSP in the support of a metastate even on finitely many couplings. Neither the statement of Theorem 1 nor that of our three main propositions requires this extension of metastates, but it will be needed for the proofs of the latter two of the
main propositions. This extension will be presented in detail in Sect. 5 of the paper, but we present a short exposition here, since it seems to be of independent interest. Roughly speaking, the extension requires that we keep track of not only the GSP itself, but also of all its excitations in which finitely many spins are forced to take specified values, modulo a global flip. We note that recent numerical studies of spin glasses have analyzed excitations induced in this way [9] and in more novel ways [10]. There are two types of information about our excitations that one might wish to keep track of: (a) the minimum energy cost required to force the spins, and (b) the pair of spin configurations that does the minimizing – i.e., the excited state. It actually suffices to keep track only of (a), but it is perhaps conceptually simpler to keep track of (b) as well, and we will take that tack.

Suppose A is a finite subset of $\mathbb{Z}^2$ (in this discussion, we only take d = 2 for convenience), η is a spin configuration on A and L is sufficiently large so that $A \subset S_L$. We denote by $\alpha^{A,\eta,(L)}_{\mathcal{J}}$ the pair of periodic boundary condition spin configurations on $S_L$ with minimum energy subject to the constraint that they equal ±η on A. If A is empty or a singleton site, this is just the ordinary finite-volume ground state $\alpha^{(L)}_{\mathcal{J}}$. We also define the excitation energy $E^{A,\eta,(L)}_{\mathcal{J}}$ to be the energy of $\alpha^{A,\eta,(L)}_{\mathcal{J}}$ minus the ground state energy of $\alpha^{(L)}_{\mathcal{J}}$. Let B be a finite set of bonds $b = \langle x,y\rangle$ and let $J^B$ denote a realization of the couplings $J_b$ for all $b \in B$. To see how $\alpha^{(L)}_{\mathcal{J}}$ and eventually $\alpha_{\mathcal{J}}$ varies with $J^B$ when all other couplings are fixed, we begin by letting A = A(B) denote the set of sites that are endpoints of bonds in B and considering the excitation energies $E^{A,\eta,(L)}_{\mathcal{J}}$ and corresponding excited states $\alpha^{A,\eta,(L)}_{\mathcal{J}}$, for all possible spin configurations η on A. We also define
$$H_{J^B}(\eta) = -\sum_{\langle x,y\rangle \in B} J^B_{xy}\, \eta_x \eta_y, \qquad H_{\mathcal{J}}(\eta; B) = -\sum_{\langle x,y\rangle \in B} J_{xy}\, \eta_x \eta_y, \qquad (3)$$
and denote by $\mathcal{J}[J^B]$ the coupling configuration in which each coupling $J_b$ of $\mathcal{J}$ with $b \in B$ is replaced by $J^B_b$ and all other couplings are left unchanged. Then, for fixed η, $\alpha^{A,\eta,(L)}_{\mathcal{J}[J^B]}$ does not depend on $J^B$ and
$$H^{(L)}_{\mathcal{J}[J^B]}\big(\alpha^{A,\eta,(L)}_{\mathcal{J}[J^B]}\big) - H^{(L)}_{\mathcal{J}[J^B]}\big(\alpha^{A,\eta',(L)}_{\mathcal{J}[J^B]}\big) = H^{(L)}_{\mathcal{J}[J^B]}\big(\alpha^{A,\eta,(L)}_{\mathcal{J}}\big) - H^{(L)}_{\mathcal{J}[J^B]}\big(\alpha^{A,\eta',(L)}_{\mathcal{J}}\big)$$
$$= \big[H_{J^B}(\eta) - H_{\mathcal{J}}(\eta; B)\big] - \big[H_{J^B}(\eta') - H_{\mathcal{J}}(\eta'; B)\big] + E^{A,\eta,(L)}_{\mathcal{J}} - E^{A,\eta',(L)}_{\mathcal{J}}. \qquad (4)$$
Note that $E^{A,\eta,(L)}_{\mathcal{J}}$ depends on $\mathcal{J}$ but not on $J^B$ while $H_{J^B}(\eta)$ depends on $J^B$ but not on $\mathcal{J}$. Consider now the finitely many functions, as η varies on A,
$$h^{(L)}_{\eta}(J^B) \equiv E^{A,\eta,(L)}_{\mathcal{J}} + H_{J^B}(\eta) - H_{\mathcal{J}}(\eta; B). \qquad (5)$$
These are affine functions of $J^B$, and if we define $\eta^{*(L)}_{\mathcal{J}}(J^B)$ to be the η that minimizes $h^{(L)}_{\eta}(J^B)$, it follows that
$$\alpha^{(L)}_{\mathcal{J}[J^B]} = \alpha^{A,\, \eta^{*(L)}_{\mathcal{J}}(J^B),\, (L)}_{\mathcal{J}}. \qquad (6)$$
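The minimization behind Eqs. (5) and (6) is easiest to see when B is a single bond b = ⟨x,y⟩. Modulo a global flip, η on A(B) = {x, y} is then determined by the product s = η_x η_y = ±1, and the two affine functions cross at one value of the new coupling. The sketch below is illustrative only: the excitation energies and the original coupling are made-up numbers, and the names `h`, `eta_star`, `critical_coupling` are hypothetical.

```python
def h(s, E, J_b, JB_b):
    """Eq. (5) for a single bond: h_s(J^B) = E[s] + H_{J^B}(eta) - H_J(eta; B),
    where H_{J^B}(eta) = -J^B_b * s and H_J(eta; B) = -J_b * s."""
    return E[s] + (-JB_b * s) - (-J_b * s)

def eta_star(E, J_b, JB_b):
    """The value of s = eta_x * eta_y minimizing h_s(J^B), cf. Eq. (6)."""
    return min((+1, -1), key=lambda s: h(s, E, J_b, JB_b))

def critical_coupling(E, J_b):
    """Value of J^B_b at which h_{+1} = h_{-1}, i.e. where the ground state switches."""
    return J_b + (E[+1] - E[-1]) / 2

# Made-up example: original coupling 0.5; the ground state has s = +1 (zero
# excitation energy) and forcing s = -1 costs 0.8.
E = {+1: 0.0, -1: 0.8}
J_b = 0.5
Jc = critical_coupling(E, J_b)
```

For new couplings above the critical value the original ground state survives, and below it the excited state with s = -1 takes over as the ground state. With the numbers above the crossover sits near 0.1.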
When letting $L \to \infty$, we will do so for the ground state $\alpha_{\mathcal{J}}$ and simultaneously for the excitation energies $E^{A,\eta}_{\mathcal{J}}$ and excited states $\alpha^{A,\eta}_{\mathcal{J}}$ for all choices of finite A and spin configurations η on A; a superscript $\sharp$ will denote that collection of choices. Of course, this needs to be done via a metastate construction that extends the “ground metastate” $\kappa_{\mathcal{J}}$ described earlier, to what we will call the excitation metastate $\kappa^{\sharp}_{\mathcal{J}}$. The excitation metastate is a probability measure on infinite-volume excitation energies and states for the given $\mathcal{J}$, $(E^{\sharp}, \alpha^{\sharp})$, which includes the ground metastate since the ground state α can be obtained by restricting $\alpha^{\sharp}$ to A being the empty set (or a singleton, since we are dealing with periodic boundary conditions that do not break spin-flip symmetry). To see how the ground state α changes to $\alpha[J^B]$ when the couplings in a fixed finite B vary, we can then use the infinite-volume extensions of our last two displayed equations (where $H_{J^B}(\eta)$ and $H_{\mathcal{J}}(\eta; B)$ are as before):
$$h_{\eta}(J^B) \equiv E^{A(B),\eta} + H_{J^B}(\eta) - H_{\mathcal{J}}(\eta; B), \qquad (7)$$
and
$$\alpha[J^B] = \alpha^{A(B),\, \eta^*(J^B)}, \qquad (8)$$
where $\eta^*(J^B)$ is the η on A(B) that minimizes $h_{\eta}(J^B)$.
4. The Main Propositions
In this section, we present the three central propositions leading immediately to Theorem 1. The proof of the first of these, a direct application to spin glasses of general 2D percolation results of Burton and Keane [11], will be given in this section. The proof of the second and third propositions will be given in Sect. 6. We begin with a somewhat more detailed discussion of ground metastates than given in the last section. For simplicity, we continue to restrict the discussion to periodic boundary condition metastates, as in Sect. 2.

An (infinite-volume) ground state pair or GSP for a given coupling realization $\mathcal{J}$ is a pair of spin configurations ±σ on $\mathbb{Z}^d$, whose energy, governed by Eq. (1), cannot be lowered by flipping any finite subset of spins. That is, it must satisfy the constraint
$$\sum_{\langle x,y\rangle \in C} J_{xy}\, \sigma_x \sigma_y \ge 0 \qquad (9)$$
along any closed loop C in the dual lattice. Infinite-volume ground states are always the limits of finite volume ground states, but, in general, the finite-volume boundary conditions may need to be carefully chosen, depending on $\mathcal{J}$ and/or the limiting ground state. In a disordered system, if there are many distinct GSP’s for typical fixed $\mathcal{J}$, then in general, as noted in [12], the limit $\lim_{L\to\infty} \alpha^{(L)}_{\mathcal{J}}$ doesn’t exist, if the L’s are chosen in a coupling-independent way. This phenomenon was called chaotic size dependence [12]. The ground metastate, a probability measure $\kappa_{\mathcal{J}}$ on the infinite-volume ground states $\alpha_{\mathcal{J}}$, was proposed in [5] as a means of analyzing the way in which $\alpha^{(L)}_{\mathcal{J}}$ samples from its various possible limits as $L \to \infty$. (The metastate was introduced and defined for both zero and positive temperatures, but we confine the discussion here to zero temperature.) The same metastate can be constructed by at least two distinct approaches. The first,
introduced earlier by Aizenman and Wehr (AW) [13], directly employs the randomness of the J’s, while the “empirical distribution” approach of [5] and subsequent papers was motivated by, but doesn’t require, the potential presence of chaotic size dependence for fixed $\mathcal{J}$. The empirical distribution point of view (and its natural extension to excitation metastates) will be the primary one used throughout this paper. However, we briefly describe the AW construction, since it is the one that directly gives, for, e.g., periodic boundary conditions, the translation invariance that will be crucial in our first proposition; for more details see [13]. Here one considers, for each L, the random pair $(\mathcal{J}, \alpha^{(L)}_{\mathcal{J}})$ (where $\alpha^{(L)}_{\mathcal{J}}$ is the finite-volume periodic boundary condition GSP obtained using the restriction $\mathcal{J}^{(L)}$ of $\mathcal{J}$ to $S_L$), and takes the limit of the finite-dimensional distributions along a $\mathcal{J}$-independent subsequence of L’s, using compactness. This yields a probability distribution K on infinite-volume $(\mathcal{J}, \alpha)$’s which is translation invariant, under simultaneous lattice translations of $\mathcal{J}$ and α, because of the periodic boundary conditions, and is such that the conditional distribution $\tilde{\kappa}_{\mathcal{J}}$ of α given $\mathcal{J}$ is supported entirely on GSP’s for that $\mathcal{J}$. The conditional distribution $\tilde{\kappa}_{\mathcal{J}}$ is the AW ground metastate. It is easy to show that there is sequential compactness leading to convergence for $\mathcal{J}$-independent subsequences of L’s, as described above. We have conjectured [6] that all subsequence limits are the same; i.e., that existence of a limit does not require taking a subsequence. Proving this conjecture remains an open problem.

The empirical distribution approach of [3, 5, 6], as described in Sect. 2, takes a fixed $\mathcal{J}$ and, roughly speaking, replaces the “$\mathcal{J}$-randomness” used in the AW construction of $\tilde{\kappa}_{\mathcal{J}}$ with “L-randomness” – i.e., with chaotic size dependence. The empirical distributions along a subsequence $(L_1, L_2, \dots)$ are the measures
$$\kappa^{M}_{\mathcal{J}} = (1/M) \sum_{k=1}^{M} \delta_{\alpha^{(L_k)}_{\mathcal{J}}}, \qquad (10)$$
where $\delta_{\alpha}$ denotes the Dirac delta measure at the state α and where for convenience we regard the finite-volume GSP $\alpha^{(L)}_{\mathcal{J}}$ as defined in infinite volume by, e.g., taking all bonds outside $S_L$ as satisfied. We say that $\kappa^{M}_{\mathcal{J}}$ has a limit $\kappa_{\mathcal{J}}$ if the probability of any event $A(D_1, D_2)$ (that every edge in $D_1$ is unsatisfied and every edge in $D_2$ is satisfied, where $D_1$ and $D_2$ are disjoint finite sets of dual edges) converges to the $\kappa_{\mathcal{J}}$-probability of that event. It was shown in [6] that there exists a $\mathcal{J}$-independent subsubsequence where the limits $\tilde{\kappa}_{\mathcal{J}}$ and $\kappa_{\mathcal{J}}$ are the same. For more details and proofs, see [3, 5, 6]. Also see [4] for additional properties of the metastate, particularly invariance with respect to gauge-related boundary conditions.

Before we state Proposition 1, some additional definitions are needed. Consider a periodic boundary condition metastate $\kappa_{\mathcal{J}}$ (in some fixed dimension, not necessarily two) and two GSP’s α and β chosen from $\kappa_{\mathcal{J}}$. Then their symmetric difference $\alpha \Delta \beta$, as introduced in Sect. 2, is the set of edges in the dual lattice $\mathbb{Z}^{d*}$ that are satisfied in α and not β or vice-versa. If B is the graph whose edge set is $\alpha \Delta \beta$ and whose vertices are all sites in $\mathbb{Z}^{d*}$ touching $\alpha \Delta \beta$, then a domain wall, defined relative to the two GSP’s, is a cluster (i.e., a maximal connected component) of B. (In two dimensions, according to Proposition 1, domain walls are generically doubly-infinite self-avoiding paths in the dual lattice.) The symmetric difference $\alpha \Delta \beta$ is the union of all $\alpha \Delta \beta$ domain walls and
may consist of a single domain wall or of multiple domain walls that are site-disjoint and hence also edge-disjoint. Two distinct GSP’s $\alpha$ and $\beta$ are said to be incongruent if $\alpha\Delta\beta$ has a well-defined nonvanishing density within the set of all edges in $\mathbb{Z}^{d*}$; if the density is zero, they are regionally congruent. We do not consider here the case where the density is not well-defined; we will see from Proposition 1 that in fact this cannot happen in two dimensions. In Proposition 1, we will also see that, if there are multiple GSP’s, the “observable” ones are incongruent. Our primary interest is therefore in the question of existence of these “physical” incongruent states, which should be observable by using coupling-independent boundary conditions. As mentioned in Sect. 2, incongruent states may consist of a single positive-density wall, or else of multiple domain walls, which individually may or may not have positive density, but collectively have strictly positive density.

In all our propositions, $J$ is chosen from the disorder distribution $\nu$ and then $\alpha$ and $\beta$ are GSP’s chosen independently from periodic boundary condition metastates $\kappa_J$ and $\kappa'_J$ (which may be the same), as described above.

Proposition 1 ([1, 11]). Distinct $\alpha$ and $\beta$ in any dimension must, with probability one, be incongruent. In two dimensions, all domain walls comprising $\alpha\Delta\beta$ have the following properties with probability one:
(i) they are infinite and contain no loops or dangling ends;
(ii) they cannot branch and thus are doubly-infinite self-avoiding paths;
(iii) they together partition $\mathbb{Z}^2$ into at most two topological half-spaces and/or a finite or infinite number of doubly-infinite topological strips (that also cannot branch – i.e., each strip has two boundary domain walls and exactly one neighboring strip or half-space on each side);
(iv) moreover, each domain wall has a well-defined density and there cannot simultaneously be positive-density and zero-density walls.

Proof of Proposition 1.
Let us denote by $D_J$ the probability measure on configurations of $\alpha\Delta\beta$ corresponding to choosing $\alpha$ and $\beta$ independently from $\kappa_J$ and $\kappa'_J$, and denote by $D$ the measure then obtained by integrating out the couplings $J$ with respect to the disorder distribution $\nu$. We claim that $D$ is translation-invariant. To see this, begin with the translation-invariant measures on joint configurations of couplings and GSP’s $K$ $(= \nu\kappa_J)$ and $K'$ $(= \nu\kappa'_J)$ and note that the natural coupling $\nu\kappa_J\kappa'_J$, a measure on $(J, \alpha, \beta)$ configurations, retains translation-invariance. $D$ is then translation-invariant since it is just the distribution of $\alpha\Delta\beta$ with $(\alpha, \beta)$ distributed as the marginal of this coupled measure. The translation-invariance of $D$ in turn implies by the ergodic theorem with respect to $\mathbb{Z}^{2*}$-translations that any “geometrically defined event”, such as a bond belonging to a domain wall, occurs either nowhere or else with strictly positive density. This proves the first claim.

To prove property (i), we note that a domain wall taken from $\alpha\Delta\beta$ separates regions in which the spins of $\alpha$ and $\beta$ agree from regions where they disagree. A domain wall therefore cannot end at a point in any finite region. To rule out loops, note that the sum $\sum_{\langle x,y\rangle} J_{xy}\sigma_x\sigma_y$ along any such loop must have opposite signs in the two GSP’s, violating Eq. (9), unless the sum vanishes. But this occurs with zero probability because the couplings are chosen independently from a continuous distribution.

Claims (ii), (iii), and (iv) are proven in [11], using percolation-theoretic arguments first presented in [14]; we sketch the arguments. To prove (ii), suppose that a domain wall
branches at some site $z$ in the dual lattice. (We note, although it’s not needed for the proof, that the number of branches emanating from $z$ must be even, again because domain walls separate regions of spin configuration agreement from regions of disagreement. Hence the minimal branching at $z$ is four.) None of these branches may intersect somewhere else, by property (i). By the translation-invariance of $D$, there must then be a positive density of branch points, so that the domain wall would have a treelike structure. That implies the existence of an $\varepsilon > 0$ such that the boundary of $S_L$ is intersected by a number of distinct branches that grows as $\varepsilon L^2$ as $L \to \infty$, which is impossible, since that boundary contains only $O(L)$ dual sites. The proof of (iii) uses a similar argument to rule out branching of the strips – see Theorem 2 of [11] for details.

Property (iv) is not needed for subsequent arguments, but is included for completeness; it is proven in Theorem 4 of [11] and follows readily from the properties just proven. If zero-density and positive-density clusters coexist, then for some $p > 0$, there is positive $D$-probability that the origin of the dual lattice is contained in a zero-density domain wall with an adjacent wall of density at least $p$. Let $S_p$ be the set of all walls with density greater than or equal to $p$. Then there can be no more than $1/p$ walls in $S_p$. The maximum number of walls of density zero that are adjacent to walls belonging to $S_p$ (i.e., if every $S_p$-wall is surrounded by two zero-density walls whose other adjacent wall does not belong to $S_p$) is therefore $2/p$. But then the union of such zero-density walls has density zero and so the probability of the event that the origin is contained in a zero-density wall adjacent to a wall in $S_p$ is zero, leading to a contradiction. This completes the proof of the proposition.

So the picture we now have of the symmetric difference $\alpha\Delta\beta$ is a union of one or more doubly infinite domain walls.
These domain walls do not branch or have any internal loops, and they divide the plane into strips or (if there are positive-density domain walls) half-planes. In all cases where there is more than a single domain wall, translation-invariance of $D$ implies that distinct domain walls mostly remain within an $O(1)$ distance of one another. E.g., there can be no “hourglass”, “martini glass”, etc., domain wall configurations; these can be ruled out by arguments similar to those used in the proof of part (ii) of Proposition 1.

The essential idea behind the proof of Theorem 1 is contained in the next two propositions. Before we state these propositions, we need to introduce the notion of a “rung” between adjacent domain walls. A rung $R$, defined with respect to $\alpha\Delta\beta$, is a path of edges in $\mathbb{Z}^{2*}$ connecting two distinct domain walls, with only the first and last sites in $R$ on any domain wall. So $R$ can contain only edges that are not in $\alpha\Delta\beta$, and the corresponding couplings are therefore either both satisfied or both unsatisfied in $\alpha$ and $\beta$. The energy $E_R$ of $R$ is defined to be

$$E_R = \sum_{\langle xy\rangle \in R} J_{xy}\,\sigma_x\sigma_y, \qquad (11)$$
with σx σy taken from α or equivalently β. It must be that ER > 0 with probability one for the following reasons, which we sketch here and make precise later in the proof of Proposition 2. Suppose that a rung could be found with negative energy (there is zero probability of a zero-energy rung); by translation-invariance there would need to be many such rungs between some fixed pair of adjacent domain walls. Consider the “rectangle” formed by two such negative-energy rungs and the connecting segments of the two adjacent domain walls. The sum of Jxy σx σy along the couplings in the domain wall segments would be positive in one GSP (say, α), and would therefore be negative in the other (say, β). Therefore, the loop formed by the boundary of this rectangle would violate Eq. (9) in GSP β.
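The rectangle argument sketched above comes down to a small piece of arithmetic, which the following Python sketch spells out. It is only an illustration under stated assumptions: following the proof of Proposition 2, the loop sum around the rectangle is taken to be the two rung energies plus (in α) or minus (in β) the domain wall contribution; the function names and the numerical values are hypothetical.

```python
def rung_energy(couplings, spin_products):
    """E_R of Eq. (11): sum of J_xy * sigma_x * sigma_y over the edges of a rung."""
    return sum(J * s for J, s in zip(couplings, spin_products))


def rectangle_loop_sums(wall_sum, rung_1, rung_2):
    """Loop sums around the rectangle in the two ground states.

    wall_sum is the sum of J_xy * sigma_x * sigma_y over the two domain wall
    segments, evaluated in alpha.  It changes sign in beta, because every
    domain wall edge is satisfied in exactly one of the two states.  Rung
    edges lie off the symmetric difference, so the rung energies are the
    same in alpha and beta.
    """
    return wall_sum + rung_1 + rung_2, -wall_sum + rung_1 + rung_2


# Two hypothetical negative-energy rungs (illustrative numbers only).
r1 = rung_energy([0.7, -1.2], [+1, +1])
r2 = rung_energy([0.3, -0.9], [+1, +1])

# Whatever the wall contribution is, the two loop sums add up to
# 2 * (r1 + r2) < 0, so at least one of them is negative.
worst = [min(rectangle_loop_sums(w, r1, r2)) for w in (-3.0, -0.2, 0.0, 0.2, 3.0)]
print(worst)
```

Every entry of `worst` is negative, which is the violation of Eq. (9) in one of the two states that the sketch in the text refers to.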
It is then natural to ask the deeper question of whether rung energies along any strip are strictly bounded away from zero, or whether their infimum is exactly zero. Propositions 2 and 3 address this question.

Proposition 2. The rung energies $E_R$ between two fixed (adjacent) domain walls cannot be arbitrarily small; i.e., there is zero probability that $E = \inf_R E_R = 0$.

Proposition 3. There is zero probability that $E > 0$.

The contradiction between Propositions 2 and 3 leads directly to Theorem 1. These propositions will be proved in Sect. 6.

5. Transition Values and Flexibilities

In this section, we present two auxiliary propositions. They will be used in the next section to prove Propositions 2 and 3. These auxiliary propositions involve two notions, transition value and flexibility, that arise in the analysis of how a GSP changes when a single coupling, $J_b$, varies. Since this is a restricted case of the dependence of $\alpha[J_B]$ on a finite collection $J_B$ of couplings, we begin the section by providing a more detailed exposition of the excitation metastate than that given in Sect. 3 above.

Along with an empirical distribution construction of the excitation metastate $\kappa_J^\sharp$ as a probability measure, defined for $\nu$-almost every $J$, on configurations $(E^\sharp, \alpha^\sharp)$ of excitation energies and states for the given $J$, there is an alternative AW-type construction, as follows. For each $L$, consider $(J, E_J^{\sharp,(L)}, \alpha_J^{\sharp,(L)})$, where $E_J^{\sharp,(L)}$ and $\alpha_J^{\sharp,(L)}$ denote the excitation energies and states in $S_L$, with periodic boundary conditions, when the spin configuration on $A \subset S_L$ is constrained to be $\pm\eta$ (for all allowed $A$’s and $\eta$’s).
As in the AW ground metastate construction, one has sequential compactness of the corresponding probability measures, $K^{\sharp,(L)}$, leading to convergence of the finite dimensional distributions (involving finitely many couplings, finitely many finite $A$’s and finitely many $\eta$’s) to those of a limiting translation-invariant measure $K^\sharp$ on infinite-volume configurations $(J, E^\sharp, \alpha^\sharp)$ along deterministic subsequences of $L$’s. The marginal distribution of $J$ from this $K^\sharp$ is of course just $\nu$ and the conditional distribution of $(E^\sharp, \alpha^\sharp)$ given $J$ is then an excitation metastate $\tilde\kappa_J^\sharp$, which, as in the ground metastate case, can be shown for $\nu$-almost every $J$ to equal the $\kappa_J^\sharp$ constructed via empirical distributions, as the limit along a subsubsequence of

$$\frac{1}{M} \sum_{k=1}^{M} \delta_{\left(E_J^{\sharp,(L_k)},\, \alpha_J^{\sharp,(L_k)}\right)}. \qquad (12)$$
The translation-invariance of $K^\sharp$ follows, as usual, from the periodic boundary conditions. The relative compactness (tightness) for $\alpha^{\sharp,(L)}$ follows from the two-valuedness of spin variables. Finally, the relative compactness (tightness) for $E^{\sharp,(L)}$ follows from the trivial bound,

$$\left| E_J^{A,\eta,(L)} \right| \le \sum_{A} |J_{xy}|, \qquad (13)$$

where $\sum_A$ denotes the sum over bonds $\langle x, y\rangle$ with either $x$ or $y$ or both in $A$, together with the fact that the distribution of the $J_{xy}$’s does not change with $L$.
As explained in Sect. 3, for a given $J$, we can extract from $(E^\sharp, \alpha^\sharp)$ not only the GSP $\alpha$, but also $\alpha[J_B]$, which describes how the GSP changes when the couplings in a fixed finite set $B$ of bonds vary. When $B$ consists of a single bond $b = \langle x, y\rangle$, we write $\alpha(K; b)$ for the ground state that results when $J_b$ is replaced by $K$ with all other couplings of $J$ left unchanged. It should be clear from Equations (7) and (8) that as $K$ varies in $(-\infty, +\infty)$, the GSP $\alpha(K; b)$ changes exactly once (this is particularly easy to see in finite volume and the property is preserved in the excitation metastate), from its original configuration $\alpha$ when $K = J_b$ to a new configuration

$$\alpha^b = \alpha^{\{x,y\},\hat\eta}, \qquad (14)$$
where $\hat\eta$ is one of the two spin configurations on $\{x, y\}$ of opposite parity to the original GSP $\alpha$ (so that $\sigma_x\sigma_y$ is $+1$ in one of $\alpha$ and $\alpha^b$ and $-1$ in the other, or equivalently $J_b$ is satisfied in one and unsatisfied in the other). We call the value of $K$ where this change happens the transition value and denote it by $K_b$. For a given $b$, the transition value $K_b$ and the unordered set of two GSP’s $\{\alpha, \alpha^b\}$ do not depend on the value of $J_b$, with all other couplings held fixed (again, this is clear for finite volume, and is preserved in the limit). This means that with respect to the probability measure $K^\sharp$ on infinite-volume configurations $(J, E^\sharp, \alpha^\sharp)$, the random variables $K_b$ and $J_b$ are independent. The next proposition is an immediate consequence of this independence.

Proposition 4. With probability one, no coupling $J_b$ is exactly at its transition value $K_b$.

Proof of Proposition 4. From the independence of $J_b$ and $K_b$, and the continuity of the distribution of $J_b$, it follows that there is probability zero that $J_b - K_b = 0$.

As in the proof of the last proposition, we continue to work on the probability space of $(J, E^\sharp, \alpha^\sharp)$ configurations with probability measure $K^\sharp$. When the value of $J_b$ is moved from its original value past the transition value $K_b$, the change from the original ground state of $\alpha$ to the new ground state, and originally excited state, of $\alpha^b$ may involve the flipping of a finite droplet (region of $\mathbb{Z}^2$) or one or more infinite droplets. Thus the symmetric difference $\alpha\Delta\alpha^b$, representing the dual bonds which change from satisfied to unsatisfied or vice-versa, may consist of a single finite loop or else of one or more infinite disconnected paths, but in all cases some part must pass through $b$ since its satisfaction status clearly changes. To help analyze what other bonds $\alpha\Delta\alpha^b$ may or may not pass through, we introduce the notion of flexibility. The flexibility of a bond $b = \langle x, y\rangle$ is defined as

$$F_b \equiv |K_b - J_b| = \tfrac{1}{2} \left| E^{\{x,y\},\hat\eta} \right| \qquad (15)$$
and thus is proportional to the excitation energy needed to flip the relative sign of the spins at $x$ and $y$; it is a measure of the stability of the ground state $\alpha$ with respect to fluctuations of the single coupling $J_b$.

Proposition 5. For two bonds $a$ and $b$, there is zero probability that $F_b > F_a$ and simultaneously $\alpha\Delta\alpha^a$ passes through $b$.

Proof of Proposition 5. For finite $L$, and a bond $e$ in $S_L$, let us denote by $F_e^{(L)} \equiv |J_e - K_e^{(L)}|$ the finite-volume flexibility. Now $F_e^{(L)}$ is clearly the minimum, over all droplets in $S_L$, with periodic boundary conditions, whose boundary passes through $e$, of (half the) droplet flip energy cost in the GSP $\alpha^{(L)}$. Since this is the case for both $e = a$
and $e = b$, it is an immediate consequence that the finite-volume droplet boundary $\alpha^{(L)}\Delta\alpha^{a,(L)}$ cannot pass through $b$ if $F_b^{(L)} > F_a^{(L)}$. After $L \to \infty$, the characterization of $F_e$ as a minimum over finite droplets may be lost, but we claim that the conclusion of the proposition still holds. This is because, although the convergence of $K^{\sharp,(L)}$ along a subsequence to $K^\sharp$ is not sufficient to imply, e.g., that the probability of $F_b^{(L)} > F_a^{(L)}$ converges along the subsequence to the limiting probability of $F_b > F_a$, it is sufficient to imply that the probability of the event in the proposition is less than or equal to the lim inf of the (zero) probability of the corresponding finite-volume events. This completes the proof of the proposition.
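The finite-volume notions used in this section can be explored by brute force on a very small system. The sketch below is an illustration only, not part of the argument: it assumes the Hamiltonian $H = -\sum J_{xy}\sigma_x\sigma_y$ on a $3 \times 3$ torus with Gaussian couplings, and all function and variable names are hypothetical. For every bond it computes the transition value $K_b$, the flexibility $F_b$ and the excited state $\alpha^b$, and then looks for pairs of bonds that would contradict the finite-volume statement in the proof of Proposition 5.

```python
import itertools
import random


def torus_bonds(L):
    """Nearest-neighbour bonds of an L x L torus (L >= 3); site index is x * L + y."""
    bonds = []
    for x in range(L):
        for y in range(L):
            site = x * L + y
            bonds.append((site, ((x + 1) % L) * L + y))
            bonds.append((site, x * L + (y + 1) % L))
    return bonds


def constrained_minima(L, J):
    """Lowest-energy configuration for each bond b and each parity sigma_x * sigma_y."""
    best = {}
    for tail in itertools.product((-1, 1), repeat=L * L - 1):
        spins = (1,) + tail  # fix one spin to remove the global spin flip
        energy = -sum(J[b] * spins[b[0]] * spins[b[1]] for b in J)
        for b in J:
            key = (b, spins[b[0]] * spins[b[1]])
            if key not in best or energy < best[key][0]:
                best[key] = (energy, spins)
    return best


def flexibility_data(L, J):
    best = constrained_minima(L, J)
    ground = min(best.values())[1]
    data = {}
    for b in J:
        parity = ground[b[0]] * ground[b[1]]
        e_ground = best[(b, parity)][0]
        e_excited, alpha_b = best[(b, -parity)]
        # Constrained energies with the b-term removed; independent of J_b.
        rest = {p: best[(b, p)][0] + J[b] * p for p in (-1, 1)}
        K_b = (rest[1] - rest[-1]) / 2  # coupling value where the two branches cross
        data[b] = {"K": K_b, "F": (e_excited - e_ground) / 2, "alpha_b": alpha_b}
    return ground, data


rng = random.Random(0)
L = 3
J = {b: rng.gauss(0.0, 1.0) for b in torus_bonds(L)}
ground, data = flexibility_data(L, J)

# Pairs (a, b) with F_b > F_a for which bond b changes its satisfaction
# status between the ground state and alpha^a; none should exist.
violations = [
    (a, b)
    for a in J
    for b in J
    if data[b]["F"] > data[a]["F"]
    and ground[b[0]] * ground[b[1]]
    != data[a]["alpha_b"][b[0]] * data[a]["alpha_b"][b[1]]
]
print(len(violations))
```

The list `violations` stays empty for the reason given in the proof: if $\alpha^a$ differed from the ground state at $b$, it would be an admissible excitation for $b$, forcing $F_b \le F_a$. The same run also confirms numerically that half the excitation energy agrees with $|K_b - J_b|$, as in Eq. (15).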
6. Proof of Propositions 2 and 3
Proof of Proposition 2. Suppose that there are two adjacent domain walls from the GSP’s $\alpha$ and $\beta$, $W_1$ and $W_2$, with $W_1$ passing through the origin of the dual lattice, and suppose further that the infimum $E$ of rung energies $E_R$ for rungs $R$ between $W_1$ and $W_2$ is zero. Our object is to prove that this event has zero probability. If the probability is nonzero, then for every $\varepsilon > 0$ there is some $\ell(\varepsilon) < \infty$ so that, with nonzero probability, there is a rung $R$ between $W_1$ and $W_2$, with the property $P(\varepsilon)$, that its length, defined as the number of bonds, is below $\ell(\varepsilon)$ and its energy $E_R$ is below $\varepsilon$. But then, by translation-invariance and the lemma given right after this proof, there must, with nonzero probability, be infinitely many such rungs with property $P(\varepsilon)$ with starting points on $W_1$ in both directions from the origin along $W_1$. Thus we can find two such rungs $R$ and $R'$, one in each direction, and sufficiently far apart that they do not touch each other. Consider the “rectangular” region of $\mathbb{Z}^2$ whose boundary is the union of these two rungs and the connecting segments, $C_1$ and $C_2$ of $W_1$ and $W_2$. The energy cost of flipping the spins in this region in $\alpha$ (respectively, in $\beta$) is $+E(C_1, C_2) + E_R + E_{R'}$ (respectively, $-E(C_1, C_2) + E_R + E_{R'}$). Both these quantities must be positive since both $\alpha$ and $\beta$ are GSP’s; hence $|E(C_1, C_2)|$ is bounded by $E_R + E_{R'} < 2\varepsilon$ and the energy costs in both ground states are bounded by $4\varepsilon$. This implies that every bond $b$ that $W_1$ (or $W_2$) passes through has flexibility less than $2\varepsilon$. Since $\varepsilon$ is arbitrary, the flexibilities must be zero, but that would contradict Proposition 4. This, together with the following lemma, completes the proof.

Lemma 1. Suppose $P$ is a translation-invariant property of rungs, e.g., the property that the rung energy is below a certain value and/or the rung length is below a certain value.
There is zero probability that there exist two adjacent domain walls, $W_1$ and $W_2$, such that the set of starting points on $W_1$ of rungs between $W_1$ and $W_2$ that satisfy $P$ is nonempty without being doubly infinite, i.e., along both directions of $W_1$.

Proof of Lemma 1. The proof is based entirely on the translation invariance of the measure $K^\sharp$. Suppose the claim of the lemma is false. Then for each site $x$ in the dual lattice, there is nonzero probability for the event $A_x$ that there is a domain wall $W$ passing through $x$ and an adjacent wall $W'$ such that $x$ is the last site in one of the two directions along $W$ such that there is a rung from that site to $W'$ satisfying $P$. Since every domain wall has two directions and at most two adjacent domain walls, there can be at most four sites on any domain wall for which this event occurs. Every domain wall that intersects the square $S_L$, sitting inside the infinite lattice, must touch the boundary of the square and thus there are at most $cL$ such domain walls for some constant $c < \infty$, and consequently at most $4cL$ sites $x$ in $S_L$ for which $A_x$ occurs. But by the ergodic theorem for spatial translations, there is nonzero probability that the number of such sites exceeds $c'L^2$ for some constant $c' > 0$. This contradiction completes the proof.

Fig. 1. A rung $R$ with $E_R = E + \delta$. The dots are sites in $\mathbb{Z}^2$, and bonds are drawn in the dual lattice. Two domain walls are solid lines and $R$ is the dashed line. The bonds $b_1$ and $b_2$ have flexibility $> \delta$. The ten dotted line bonds are super-satisfied. (The figure labels the bonds $b_1$, $a$ and $b_2$.)

Proof of Proposition 3. For the proof, we need the notion of a “super-satisfied” bond $b = \langle x, y\rangle$. It is easy to see, for a given $J$, that $b$ is satisfied in every ground state if $|J_{xy}| > \min\{M_x, M_y\}$, where $M_x$ is the sum of the three other coupling magnitudes $|J_{xz}|$ touching $x$, and $M_y$ is defined similarly. Such a bond or its dual, called super-satisfied, clearly cannot be part of a domain wall between any two GSP’s.

As in the proof of Proposition 1, but using the excitation metastates $\kappa_J^{\sharp}$ and $\kappa_J^{\prime\sharp}$ that extend the ground metastates from which $\alpha$ and $\beta$ are chosen, we work in the probability space with the coupled measure $\nu\kappa_J^{\sharp}\kappa_J^{\prime\sharp}$. On this space, we can consider the modified ground states $\alpha[J_B]$ and $\beta[J_B]$ as any finitely many couplings are varied as well as the transition values and flexibilities for both $\alpha$ and $\beta$ for all bonds $b$.

Now suppose that the rung energy infimum $E$ between some pair $W_1$, $W_2$ of domain walls satisfies $E > 0$ with positive probability; we show this leads to a contradiction. First we find, as in Fig. 1, a rung $R$ and two dual bonds $b_1$, $b_2$ whose locations on $W_1$ are respectively in opposite directions from the starting site of $R$, and such that $E_R - E$, which we denote by $\delta$, is strictly less than the flexibility values for both $\alpha$ and $\beta$ of both $b_1$, $b_2$. The existence with positive probability of such an $R$, $b_1$ and $b_2$ follows from the non-vanishing of flexibilities given by Proposition 4 and translation-invariance (e.g., Lemma 1). But we also want a situation, as in Fig.
1, where all the dual lattice non-domain-wall bonds that touch $W_1$ between $b_1$ and $b_2$, other than the first bond $a$ in $R$, are super-satisfied, and remain so regardless of changes of $J_a$ (by a bounded amount). We will call these bonds, numbering ten in Fig. 1, the “special” bonds. How do we know that
such a situation will occur with nonzero probability? If necessary, we can first adjust the signs and then increase the magnitudes (in an appropriate order) of the couplings of the special bonds, so that they first become satisfied and then super-satisfied. This can be done in an “allowed” way because of our assumption that the distribution of individual couplings has unbounded support. Also, this can be done so that $\alpha[J_B]$ and $\beta[J_B]$ remain unchanged from $\alpha$ or $\beta$, and without changing $E_R$, without decreasing any other $E_{R'}$ (and thus without changing $E$ or $E_R - E = \delta$) and without decreasing the flexibilities of $b_1$ or $b_2$. Starting from a nonzero probability event, such an allowed change of finitely many couplings in $J$ yields an event which still has nonzero probability.

Next, suppose we move $J_a$ toward its transition value $K_a$ by an amount slightly greater than $\delta$. The geometry – see, e.g., Fig. 1 – and Proposition 5 forbid the replacement of either $\alpha$ or $\beta$ by $\alpha^a$ or $\beta^a$, because it is impossible, under the conditions given, for $\alpha\Delta\alpha^a$ or $\beta\Delta\beta^a$ to connect to the end of bond $a$ touching $W_1$. But this change of $J_a$ reduces $E_R$ below $E_{R'}$ for any $R'$ not containing $a$, yielding a nonzero probability event that contradicts translation-invariance (i.e., Lemma 1). This completes the proof.
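The super-satisfied criterion used in this proof is easy to test numerically in a small volume. The following sketch is illustrative only and rests on stated assumptions: the Hamiltonian $H = -\sum J_{xy}\sigma_x\sigma_y$ on a $3 \times 3$ torus, uniform couplings on $[-1, 1]$, and one coupling set by hand to a large magnitude so that at least one bond is super-satisfied; the helper names are hypothetical.

```python
import itertools
import random


def torus_bonds(L):
    """Nearest-neighbour bonds of an L x L torus (L >= 3); site index is x * L + y."""
    bonds = []
    for x in range(L):
        for y in range(L):
            site = x * L + y
            bonds.append((site, ((x + 1) % L) * L + y))
            bonds.append((site, x * L + (y + 1) % L))
    return bonds


def other_weight(site, b, J):
    """M_site: sum of |J| over the bonds touching `site`, excluding b itself."""
    return sum(abs(J[c]) for c in J if site in c and c != b)


def is_super_satisfied(b, J):
    """The criterion |J_xy| > min{M_x, M_y} from the proof of Proposition 3."""
    x, y = b
    return abs(J[b]) > min(other_weight(x, b, J), other_weight(y, b, J))


def ground_state(L, J):
    """Brute-force ground state, with one spin fixed to remove the global flip."""
    best_energy, best_spins = None, None
    for tail in itertools.product((-1, 1), repeat=L * L - 1):
        spins = (1,) + tail
        energy = -sum(J[b] * spins[b[0]] * spins[b[1]] for b in J)
        if best_energy is None or energy < best_energy:
            best_energy, best_spins = energy, spins
    return best_spins


rng = random.Random(2)
L = 3
bonds = torus_bonds(L)
J = {b: rng.uniform(-1.0, 1.0) for b in bonds}
strong = bonds[0]
J[strong] = -10.0  # hand-picked large coupling, so this bond is super-satisfied

ground = ground_state(L, J)
flagged = [b for b in bonds if is_super_satisfied(b, J)]
all_satisfied = all(J[b] * ground[b[0]] * ground[b[1]] > 0 for b in flagged)
print(len(flagged), all_satisfied)
```

The check mirrors the one-line reason behind the criterion: if a flagged bond were unsatisfied, flipping the single spin at the endpoint with the smaller $M$ would lower the energy.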
References
1. Newman, C.M. and Stein, D.L.: Nature of ground state incongruence in two-dimensional spin glasses. Phys. Rev. Lett. 84, 3966–3969 (2000)
2. Edwards, S. and Anderson, P.W.: Theory of spin glasses. J. Phys. F 5, 965–974 (1975)
3. Newman, C.M. and Stein, D.L.: Metastate approach to thermodynamic chaos. Phys. Rev. E 55, 5194–5211 (1997)
4. Newman, C.M. and Stein, D.L.: Simplicity of state and overlap structure in finite volume realistic spin glasses. Phys. Rev. E 57, 1356–1366 (1998)
5. Newman, C.M. and Stein, D.L.: Spatial inhomogeneity and thermodynamic chaos. Phys. Rev. Lett. 76, 4821–4824 (1996)
6. Newman, C.M. and Stein, D.L.: Thermodynamic chaos and the structure of short-range spin glasses. In: Mathematics of Spin Glasses and Neural Networks, edited by A. Bovier and P. Picco. Boston: Birkhäuser, 1997, pp. 243–287
7. Middleton, A.A.: Numerical investigation of the thermodynamic limit for ground states in models with quenched disorder. Phys. Rev. Lett. 83, 1672–1675 (1999)
8. Palassini, M. and Young, A.P.: Evidence for a trivial ground-state structure in the two-dimensional Ising spin glass. Phys. Rev. B 60, R9919–R9922 (1999)
9. Krzakala, F. and Martin, O.C.: Spin and link overlaps in 3-dimensional spin glasses. Phys. Rev. Lett. 85, 3013–3016 (2000)
10. Palassini, M. and Young, A.P.: Nature of the spin glass state. Phys. Rev. Lett. 85, 3017–3020 (2000)
11. Burton, R.M. and Keane, M.: Topological and metric properties of infinite clusters in stationary two-dimensional site percolation. Isr. J. Math. 76, 299–316 (1991)
12. Newman, C.M. and Stein, D.L.: Multiple states and thermodynamic limits in short-ranged Ising spin glass models. Phys. Rev. B 46, 973–982 (1992)
13. Aizenman, M. and Wehr, J.: Rounding effects of quenched randomness on first-order phase transitions. Commun. Math. Phys. 130, 489–528 (1990)
14. Burton, R.M. and Keane, M.: Density and uniqueness in percolation. Commun. Math. Phys. 121, 501–505 (1989)

Communicated by M. Aizenman
Commun. Math. Phys. 224, 219 – 253 (2001)
Communications in
Mathematical Physics
Finite-Volume Fractional-Moment Criteria for Anderson Localization
Michael Aizenman¹,², Jeffrey H. Schenker², Roland M. Friedrich³, Dirk Hundertmark¹
¹ Department of Physics, Princeton University, Princeton, NJ 08544, USA
² Department of Mathematics, Princeton University, Princeton, NJ 08544, USA
³ Theoretische Physik, ETH-Zürich, 8093 Zürich, Switzerland
Received: 21 October 1999 / Accepted: 31 March 2000 / Revised: 30 August 2001
To Joel L. Lebowitz on the occasion of his seventieth birthday

Abstract: A technically convenient signature of localization, exhibited by discrete operators with random potentials, is exponential decay of the fractional moments of the Green function within the appropriate energy ranges. Known implications include: spectral localization, absence of level repulsion, strong form of dynamical localization, and a related condition which plays a significant role in the quantization of the Hall conductance in two-dimensional Fermi gases. We present a family of finite-volume criteria which, under some mild restrictions on the distribution of the potential, cover the regime where the fractional moment decay condition holds. The constructive criteria make it possible to establish this condition at spectral band edges, provided there are sufficient “Lifshitz tail estimates” on the density of states. They are also used here to conclude that the fractional moment condition, and thus the other manifestations of localization, are valid throughout the regime covered by the “multiscale analysis”. In the converse direction, the analysis rules out fast power-law decay of the Green functions at mobility edges.

Contents
1. Introduction
   1.1 Overview
   1.2 The finite-volume criteria
2. Proofs of the Main Results
   2.1 Some useful notation
   2.2 Key lemmas
   2.3 Proofs of the main results
3. Generalizations
   3.1 Formulation of the general results
   3.2 Derivation of the general results
4. Some Implications
   4.1 Fast power decay ⇒ exponential decay
   4.2 Lower bounds for $G_\omega(x, y; E_{\mathrm{edge}} + i0)$ at mobility edges
   4.3 Extending off the real axis
   4.4 Relation with the multiscale analysis and density of states estimates
Appendix
A. Dynamical Localization
B. A Fractional Moment Bound
C. Decoupling Inequalities
   C.1 Decoupling inequalities for Green functions
   C.2 A condition for the validity of $R_2(s)$

© 2001 Copyrights rest with the authors. Faithful reproduction of the article for non-commercial purpose is permitted.
1. Introduction

1.1. Overview. Operators with extensive disorder are known to have spectral regimes (energy ranges) where the spectrum consists of a dense collection of eigenvalues corresponding to exponentially localized eigenfunctions. This phenomenon is of relevance in different contexts; e.g., it plays a role in the conductive properties of metals [1–3], in the quantization of Hall conductance [4–8], and in the emerging subject of optical crystals [9].

Most of the mathematical results on localization for operators with random potential in dimensions $d > 1$ have been derived using the multiscale analysis introduced by Fröhlich and Spencer [10] (and later evolved through various other works). For discrete systems there is an alternative approach, based on the analysis of the Green function’s fractional moments [11]. This approach has so far been developed for only a subset of the localization regime, but where it applies it yields somewhat stronger conclusions (through elementary arguments).

In this work we present a further extension of that method. In particular, we derive a family of constructive finite-volume criteria for the exponential decay of the fractional moments of Green functions. This decay condition is a technically convenient characterization of localization, for it is known to imply spectral localization, absence of level repulsion, dynamical localization (in a strong exponential sense) and a related condition which plays a significant role in the quantization of the Hall conductance in two-dimensional Fermi gases. The constructive criteria are used to prove that for the discrete random operators described below all these properties hold throughout the regime of localization – if that is defined through either the criteria of the multiscale analysis or those presented here. The constructive criteria also preclude fast power-law decay of the Green functions at mobility edges.
A guiding example for the operators discussed here is the discrete Schrödinger operator, acting in $\ell^2(\mathbb{Z}^d)$:

$$H_\omega = T + \lambda V_\omega, \qquad (1.1)$$

with $T$ denoting the off-diagonal part, whose matrix elements are referred to as the hopping terms, and $V_\omega$ a random multiplication operator – referred to as the potential. The symbol $\omega$ represents a particular realization of the disorder, in this case the potential variables $\{V_\omega(x)\}$, and $\lambda$ serves as the disorder strength parameter.
For the discrete Schrödinger operator

$$T_{u,v} = \begin{cases} 1 & \text{if } |u - v| = 1, \\ 0 & \text{if } |u - v| \neq 1, \end{cases} \qquad (1.2)$$
and the random potential is given by a collection of independent identically distributed random variables, $\{V_\omega(x)\}_{x\in\mathbb{Z}^d}$. However, we shall also consider a more general class of operators, allowing the incorporation of magnetic fields, periodic terms, and off-diagonal disorder (see Sect. 3). We focus on the case of extensive disorder, where the distribution of the random operator $H_\omega$ is either translation invariant, or at least gauge equivalent to shifts by multiples of basic periods (i.e. invariant under periodic magnetic shifts).

Our main goal is to present a sequence of finite-volume criteria for localization, which permit one to conclude that the following fractional-moment condition is satisfied in some energy interval $[a, b] \subset \mathbb{R}$:

$$\mathbb{E}\left( \left| \langle x | \frac{1}{H_\omega - E - i\eta} | y \rangle \right|^{s} \right) \le A(s)\, e^{-\mu(s)|x-y|}, \qquad (1.3)$$

for all $E \in [a, b]$, $\eta \in \mathbb{R}$, and suitable $s \in (0, 1)$. $\mathbb{E}(\cdot)$ represents here the average over the disorder, i.e. the random potential.

Needless to say, the bound (1.3) is of interest mainly in situations where the energy $E$ is within the spectrum, i.e. $[H_\omega - E]^{-1}$ is an unbounded operator and the exponential decay occurs only due to the localization of the eigenfunctions with energies within the interval $[a, b]$. As in ref. [11], fractional powers are used in order to avoid infinity, however the value of $0 < s < 1$ at which Eq. (1.3) is derived is of almost no importance (if Eq. (1.3) holds for a particular value of $s$, then it will hold for all $s < \tau$, where $\tau < 1$ is a number which depends only on the regularity of the probability distribution of $V_\omega(x)$, see Appendix – Lemma B.2).

For the systems considered here, Eq. (1.3) is known to imply various other properties, mentioned above, which are commonly associated with localization. More explicitly:

(i) Spectral localization ([11] – using [12]): The spectrum of $H_\omega$ within the interval $(a, b)$ is almost-surely of the pure-point type, and the corresponding eigenfunctions are exponentially localized.

(ii) Dynamical localization ([13], expanded here in Appendix A): wave packets with energies in the specified range do not spread –

$$\mathbb{E}\left( \sup_{t\in\mathbb{R}} \left| \langle x | e^{-itH} P_{H\in[a,b]} | y \rangle \right| \right) \le \tilde{A}\, e^{-\tilde\mu |x-y|}. \qquad (1.4)$$

(iii) Exponential decay of the projection kernel ([8]); the condition expressed in a bound similar to Eq. (1.4) for $\mathbb{E}(|\langle x | P_{H\le E} | y \rangle|)$ with $E \in [a, b]$. This condition plays an important role in the quantization of Hall conductance, in the ground state of the two dimensional electron gas with Fermi level $E_F \in [a, b]$ [7, 6, 8].

(iv) Absence of level repulsion ([14]). Minami has shown that Eq. (1.3) implies, for operators of the type considered here, that in the range $[a, b]$ the energy gaps have Poisson-type statistics.

The fractional moment condition has already been established for certain regimes: extreme energies, as well as all energies at high enough disorder [11], and also for weak disorder but far enough from the unperturbed spectrum [13]. The results presented below
M. Aizenman, J. H. Schenker, R. M. Friedrich, D. Hundertmark
permit extending it to band edges, provided there are sufficient “Lifshitz tail estimates” on the density of states (refs. [15–19]), and to other regimes mapped by a sequence of constructive criteria.

1.2. The finite-volume criteria. Our main results admit a number of variations. In this section we present a formulation which is natural for the prototypical example of the discrete random Schrödinger operators, i.e. Hamiltonians of the form (1.1) with $T$ the discrete Laplacian (given by (1.2)). In Sect. 3 we formulate various extensions of the results, including operators incorporating magnetic fields and operators with hopping terms of unbounded range.

The results are derived under some mild regularity assumptions on the probability distribution of the variables $\{V_\omega(x)\}_{x\in\mathbb{Z}^d}$ which form the random potential. For simplicity we address ourselves here to the IID case: the potential variables are independent with a common probability distribution $\rho(dV)$. The assumption is then that $\rho(dV)$ satisfies the regularity conditions listed below, R1(s) or R2(s). However, the independence is not essential. What matters is that the stated regularity condition be satisfied, with a uniform constant, by the conditional distribution of each of the potential variables, conditioned on arbitrary values of the other potentials.

The two regularity conditions mentioned here are:

R1(s): A probability distribution $\rho(dV)$ on $\mathbb{R}$ is said to be s-regular, or to satisfy the condition R1(s) at some $0 < s \le 1$, if there exists $C < \infty$ such that
\[
\rho([a-\epsilon, a+\epsilon]) \le C\,\epsilon^{s}. \tag{1.5}
\]
R2(s): The probability distribution $\rho(dV)$ is said to have the decoupling property R2(s), with some $0 < s \le 1$, if there exists $C < \infty$ such that for any pair of functions $f$ and $g$ of the form
\[
f(V) = \frac{1}{V-a}, \qquad g(V) = \frac{V-b}{V-c}, \tag{1.6}
\]
with $a, b, c \in \mathbb{C}$, the expectation of the product can be dominated as follows:
\[
\mathbb{E}\left( |f(V)|^s |g(V)|^s \right) \le C\, \mathbb{E}\left( |f(V)|^s \right) \mathbb{E}\left( |g(V)|^s \right). \tag{1.7}
\]
The smallest $C$ such that Eq. (1.7) holds for all $a, b, c \in \mathbb{C}$ is called here the decoupling constant for $\rho$, and is denoted by $D_s(\rho)$. A sufficient condition for R2(s) is that $\rho$ have bounded support and satisfy R1($\tau$) for some $\tau > 4s$ (see Appendix C; related discussion is found in refs. [11, 8]).

In Appendix B we show that given any $\tau$-regular measure $\rho$ and any $s < \tau$, there is a finite constant $C$ such that for any $2 \times 2$ self-adjoint matrix $A_{2\times 2}$,
\[
\int\!\!\int \rho(du)\,\rho(dv) \left| \left[ \left( A_{2\times 2} + \begin{pmatrix} u & 0 \\ 0 & v \end{pmatrix} \right)^{-1} \right]_{i,j} \right|^s \le C, \tag{1.8}
\]
where $[\cdot]_{i,j}$ denotes the $i,j$ matrix element with $i,j = 1,2$. Throughout this work, we denote by $C_s$ the smallest value of $C$ at which (1.8) holds. For $\rho(dV)$ which also satisfy R2(s) we let: $\tilde{C}_s = C_s \cdot D_s(\rho)^2$.
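To see concretely why fractional powers keep expectations such as those in (1.7) finite, it may help to evaluate $\mathbb{E}(|V-a|^{-s})$ in one explicit case. The following is a minimal sketch, assuming purely for illustration that $\rho$ is the uniform distribution on $[0,1]$ (which satisfies R1(1)); the function names are hypothetical and are not taken from the paper.

```python
import numpy as np


def fractional_moment_uniform(a: float, s: float) -> float:
    """Closed form of E|V - a|^(-s) for V uniform on [0, 1], 0 <= a <= 1, 0 < s < 1.

    Splitting the integral at v = a gives (a^(1-s) + (1-a)^(1-s)) / (1 - s),
    which is finite for every s < 1 and diverges as s -> 1.
    """
    if not (0.0 <= a <= 1.0 and 0.0 < s < 1.0):
        raise ValueError("need 0 <= a <= 1 and 0 < s < 1")
    return (a ** (1.0 - s) + (1.0 - a) ** (1.0 - s)) / (1.0 - s)


def fractional_moment_sampled(a: float, s: float, n_samples: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity (illustrative assumption: uniform rho)."""
    rng = np.random.default_rng(seed)
    v = rng.uniform(0.0, 1.0, size=n_samples)
    return float(np.mean(np.abs(v - a) ** (-s)))


if __name__ == "__main__":
    # The singularity at V = a is integrable for s < 1; the moment grows as s approaches 1.
    for s in (0.25, 0.5, 0.9, 0.99):
        print(s, fractional_moment_uniform(0.5, s))
```

The closed form makes the role of the restriction $s < 1$ visible: the factor $1/(1-s)$ is the only source of divergence, which is consistent with the first moment of $1/|V-a|$ being infinite for this density.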
Finite-Volume Fractional-Moment Criteria for Anderson Localization
For $\Lambda \subset \mathbb{Z}^d$ we denote by $H_{\Lambda;\omega}$ the operator obtained from $H_\omega$ by “turning off” the hopping terms outside $\Lambda$. Thus, the restriction of $H_{\Lambda;\omega}$ to $\ell^2(\Lambda)$ (considered as a subspace of $\ell^2(\mathbb{Z}^d)$) is nothing but $H_\omega$ with the Dirichlet boundary conditions on the boundary of $\Lambda$. We also denote by $\Gamma(\Lambda)$ the set of the nearest-neighbor bonds reaching out of $\Lambda$ (i.e. pairs with one site in $\Lambda$ and the other outside), by $\Lambda^+$ the collection of sites within distance 1 from $\Lambda$, and by $|\Gamma(\Lambda^+)|$ the number of bonds reaching out of that set. These notions will be generalized in Sect. 2.1. Following are our basic results for operators of the form (1.1).

Theorem 1.1. Let $H_\omega$ be a random Schrödinger operator with the probability distribution of the potential $V(x)$ satisfying the regularity condition R1($\tau$) and fix $s < \tau$. If for some $z \in \mathbb{C}$ (possibly real) and some finite region $\Lambda \subset \mathbb{Z}^d$ which contains the origin $0$:
\[
b(\Lambda, z) := \sup_{W \subset \Lambda} |\Gamma(\Lambda^+)|\, \frac{C_s}{\lambda^s} \sum_{\langle u,u' \rangle \in \Gamma(\Lambda)} \mathbb{E}\left( \left| \langle 0 | \frac{1}{H_{W;\omega} - z} | u \rangle \right|^s \right) < 1, \tag{1.9}
\]
then there are some $\mu(s) > 0$ and $A(s) < \infty$, which depend on the energy $z$ only through the bound $b(\Lambda, z)$, such that for any region $\Omega \subset \mathbb{Z}^d$,
\[
\mathbb{E}_{\pm i0}\left( \left| \langle x | \frac{1}{H_{\Omega;\omega} - z} | y \rangle \right|^s \right) \le A(s)\, e^{-\mu(s)|x-y|}. \tag{1.10}
\]

The subscript of $\mathbb{E}_{\pm i0}$ in (1.10) is to be interpreted as saying that the bound is valid for either of the two limiting expressions:
\[
\lim_{\eta \downarrow 0} \mathbb{E}\left( \left| \langle x | \frac{1}{H_{\Omega;\omega} - E \mp i\eta} | y \rangle \right|^s \right). \tag{1.11}
\]
The “cutoff” $\pm i\eta$ is needed for an unambiguous interpretation in case $z$ is a real energy ($E$) within the spectrum of $H$. For the random operators considered here it is well understood that: (i) the expectation may be exchanged with the limit $\eta \downarrow 0$, (ii) it suffices to verify the uniform bounds (1.10) for finite regions, and (iii) the finite volume expectations are continuous in $\eta$. In the proofs we shall be dealing with finite systems; the subscript will, therefore, be omitted there.

Let us note that already the special case $\Lambda = \{0\}$ is of interest. It provides the following variant of the single-site criterion of ref. [11] (which is, in fact, a bit simpler since it does not invoke the decoupling lemma).

Corollary. For the random Schrödinger operator a sufficient condition for localization (1.3) is that for all $E \in [a,b]$,
\[
2d(2d-1)\, \frac{C_s}{\lambda^s} \int \rho(dV)\, \frac{1}{|\lambda V - E|^s} < 1. \tag{1.12}
\]

Just as the main result of ref. [11], the above criterion permits one to easily conclude localization for the cases of high disorder or extreme energies. However, we may now move beyond that. By testing the hypothesis of Theorem 1.1 in the increasing sequence of volumes $\Lambda = [-L, L]^d$, one may extend the conclusion to increasing regimes in the “energy × disorder plane”. In fact, it is easy to see that for each energy at which the strong localization condition (1.10) is satisfied, the hypothesis (1.9) will be met at all
sufficiently large $L$. (This may, however, be far from a practical test, as the necessary computation may be rather difficult for large $L$.)

Observant readers may note that the conclusion of Theorem 1.1 provides not only the localization condition Eq. (1.3), but it also rules out extended boundary states. The flip side of this observation is that if such states are present in some geometry, e.g. the half space, then the hypothesis of Theorem 1.1 will fail to be satisfied even if the operator exhibits localization in the bulk. Therefore, we present also the following result which permits one to establish bulk localization regardless of the possible presence of extended boundary states.

Theorem 1.2. Let $H_\omega$ be a random Schrödinger operator with the probability distribution of the potential $V(x)$ satisfying R1($\tau$) and R2(s), for some $s < \tau$. If for some $z \in \mathbb{C}$ and some finite region $0 \in \Lambda \subset \mathbb{Z}^d$,
\[
\left( 1 + \frac{\tilde{C}_s}{\lambda^s} |\Gamma(\Lambda)| \right)^{2} \sum_{\langle u,u' \rangle \in \Gamma(\Lambda)} \mathbb{E}\left( \left| \langle 0 | \frac{1}{H_{\Lambda;\omega} - z} | u \rangle \right|^s \right) < 1, \tag{1.13}
\]
then $H_\omega$ satisfies the fractional-moment condition (1.3), and there exist $\mu(s) > 0$, $A(s) < \infty$ so that for any region $\Omega \subset \mathbb{Z}^d$,
\[
\mathbb{E}_{\pm i0}\left( \left| \langle x | \frac{1}{H_{\Omega;\omega} - z} | y \rangle \right|^s \right) \le A(s)\, e^{-\mu(s)\, \mathrm{dist}_\Omega(x,y)}, \tag{1.14}
\]
with
\[
\mathrm{dist}_\Omega(x,y) = \min\{ |x-y|,\ [\mathrm{dist}(x,\partial\Omega) + \mathrm{dist}(y,\partial\Omega)] \}. \tag{1.15}
\]
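The modified metric of Eq. (1.15) can be made concrete for a box. The following is a minimal sketch, assuming that $\Omega = [lo, hi]^d$, that $|x-y|$ is taken as the sup norm, and that $\mathrm{dist}(x,\partial\Omega)$ is the number of lattice steps from $x$ to the outermost layer of sites of the box; these conventions and the function names are illustrative assumptions, not the paper's definitions.

```python
from typing import Sequence


def sup_norm_distance(x: Sequence[int], y: Sequence[int]) -> int:
    """|x - y| taken as the sup norm (an assumption; another lattice norm could be used)."""
    return max(abs(a - b) for a, b in zip(x, y))


def distance_to_box_boundary(x: Sequence[int], lo: int, hi: int) -> int:
    """Steps from x to the outermost layer of the box [lo, hi]^d (assumed convention)."""
    if any(c < lo or c > hi for c in x):
        raise ValueError("x must lie in the box")
    return min(min(c - lo, hi - c) for c in x)


def dist_omega(x: Sequence[int], y: Sequence[int], lo: int, hi: int) -> int:
    """Modified metric of Eq. (1.15): the whole boundary is treated as a single point."""
    direct = sup_norm_distance(x, y)
    via_boundary = distance_to_box_boundary(x, lo, hi) + distance_to_box_boundary(y, lo, hi)
    return min(direct, via_boundary)


if __name__ == "__main__":
    print(dist_omega((5, 5), (6, 5), 0, 10))   # two interior neighbors
    print(dist_omega((0, 5), (10, 5), 0, 10))  # two sites on opposite faces
```

Under these conventions two sites on opposite faces of the box are at modified distance zero, so a bound of the form (1.14) gives no decay between them, while it gives full exponential decay between well separated interior sites.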
Let us add that, as in Theorem 1.1, $A(s)$ and $\mu(s)$ of (1.14) depend on $z$ only through the value of the LHS in Eq. (1.13). The modified metric, $\mathrm{dist}_\Omega(x,y)$, is a distance function relative to which the entire boundary of $\Omega$ is regarded as one point. It permits us to state that there is exponential decay in the bulk without ruling out non-exponential decay along the boundary. We supplement the last result by the following observation.

Theorem 1.3. Let $H_\omega$ be a random operator given by Eq. (1.1), with the probability distribution of the potential $V(x)$ satisfying R1($\tau$) and R2(s), for some $s < \tau$. If at some energy $E$ (or $z \in \mathbb{C}$) the localization condition (1.3) is satisfied, with some $A < \infty$ and $\mu > 0$, then for all large enough (but finite) $L$ the condition (1.13) is met for $\Lambda = [-L, L]^d$.

The statement is a bit less immediate than the analogous claim for Theorem 1.1. We shall therefore include the proof below.

It is natural to compare the above criteria for localization with those of the multiscale analysis. The two methods share the basic feature that the analysis requires an initial condition which one may expect to be met in a finite system provided its linear size is of the order of the localization length, or larger. However, for the method presented here, if a suitable input is received on some scale, then the analysis can proceed using steps, or blocks, of only that size. An important difference in the results is that the fractional moment condition yields exponential decay for the expectation values, which are important for some of the conclusions listed above. Such bounds have not been derived by methods based on the multiscale analysis, since (at least without further
improvement) the bounds the latter yields on the “error terms”, i.e., the probabilities of “bad blocks”, decay not faster than $\exp[-(\log L/\log L_o)^{\alpha}]$. This rate is faster than any power of $L$, but in itself not fast enough to imply exponential bounds for the mean values. However, it should be noted that the extension of the present method to operators in the continuum, for which a number of basic localization results have been established using the multiscale analysis [20, 21, 17], is still unaccomplished. Also not covered are discrete operators with the potential assuming discrete values (e.g., $V_\omega(x) = \pm 1$ [22]).

In Sect. 4 we discuss various implications of the basic results. In particular it is shown that, for discrete random operators of the type considered here, the fractional moment condition (1.3) is satisfied throughout the regime in which the multiscale analysis applies (see Theorem 4.4). This carries the further implication that the properties listed above hold throughout the entire regime for which localization can be proven by any of the known methods. One of those properties is a strong form of dynamical localization, on which more is said in Appendix A.

2. Proofs of the Main Results

2.1. Some useful notation. The proofs of the above statements will be presented in terms which permit a direct extension to operators with more general hopping terms. We start by generalizing the notation; in particular, the sets $\Lambda^+$ and $\Gamma(\Lambda)$ will be made to depend implicitly on the operator $T$.

In the study of $H_{\Omega;\omega}$ we shall often consider “depleted” Hamiltonians, $H^{(\Gamma)}_{\Omega;\omega}$, obtained by setting to zero the operator’s non-diagonal matrix elements (hopping terms) along some collection of ordered pairs of sites (referred to here as bonds) $\Gamma \subset \mathbb{Z}^d \times \mathbb{Z}^d$. The difference is the operator $T^{(\Gamma)}$, with
\[
T^{(\Gamma)}_{x,y} = \begin{cases} T_{x,y} & \text{if } \langle x,y \rangle \in \Gamma \text{ or } \langle y,x \rangle \in \Gamma, \\ 0 & \text{if } \langle x,y \rangle \notin \Gamma \text{ and } \langle y,x \rangle \notin \Gamma, \end{cases} \tag{2.1}
\]
so that
\[
H_{\Omega;\omega} = H^{(\Gamma)}_{\Omega;\omega} + T^{(\Gamma)}. \tag{2.2}
\]
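The construction in Eqs. (2.1) and (2.2) can be tried out on a small matrix. The following is a minimal sketch, assuming a finite one-dimensional chain with nearest-neighbor hopping of strength 1 standing in for $T$, and a uniformly distributed potential; the helper names are hypothetical.

```python
import numpy as np


def chain_hamiltonian(potential, coupling):
    """Return (T, H) with H = T + coupling * V on a finite chain (assumed stand-in for Eq. (1.1))."""
    n = len(potential)
    hopping = np.zeros((n, n))
    for i in range(n - 1):
        hopping[i, i + 1] = hopping[i + 1, i] = 1.0
    return hopping, hopping + coupling * np.diag(potential)


def depleted_part(hopping, bonds):
    """T^(Gamma) of Eq. (2.1): keep T[x, y] only if (x, y) or (y, x) belongs to the bond set."""
    t_gamma = np.zeros_like(hopping)
    for x, y in bonds:
        t_gamma[x, y] = hopping[x, y]
        t_gamma[y, x] = hopping[y, x]
    return t_gamma


rng = np.random.default_rng(1)
V = rng.uniform(-1.0, 1.0, size=6)
T, H = chain_hamiltonian(V, coupling=2.0)
T_gamma = depleted_part(T, bonds=[(2, 3)])  # cut the chain between sites 2 and 3
H_gamma = H - T_gamma                       # Eq. (2.2): H = H^(Gamma) + T^(Gamma)

z = 0.3 + 0.1j
G_gamma = np.linalg.inv(H_gamma - z * np.eye(6))
# Sites 0 and 5 lie on opposite sides of the cut, so the depleted resolvent vanishes there.
print(abs(G_gamma[0, 5]))
```

Removing the single bond splits the chain into two blocks, so the depleted resolvent is block diagonal; this is the sense in which a cut set “decouples” two sites in the arguments that follow.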
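Typically, $\Gamma$ will be a collection of bonds which forms the “cut set” of some $W \subset \mathbb{Z}^d$, i.e., the set of bonds with $T_{x,y} \neq 0$ connecting sites in $W$ with sites in its complement. Thus we denote
\[
\Gamma(W) = \left\{ \langle u,u' \rangle \;\middle|\; u \in W,\ u' \in \mathbb{Z}^d \setminus W, \text{ and } T_{u,u'} \neq 0 \right\}, \tag{2.3}
\]
and also
\[
W^+ = W \cup \left\{ u' \in \mathbb{Z}^d \;\middle|\; T_{u,u'} \neq 0 \text{ for some } u \in W \right\}. \tag{2.4}
\]
The number of elements (i.e. bonds) in $\Gamma$ is denoted $|\Gamma|$. In addition, we use the “Green function” notation:
\[
G_{\Omega;\omega}(x,y;z) = \langle x | \frac{1}{H_{\Omega;\omega} - z} | y \rangle, \tag{2.5}
\]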
with $G^{(\Gamma)}_{\Omega;\omega}(x,y;z)$ defined correspondingly. Often, where it is obvious from context that an operator is a random variable, we shall suppress the subscript $\omega$.

In broad terms, the strategy for the proof is to derive a bound on the average Green function, of the form
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le \sum_{\langle u,u' \rangle \in \Gamma(\Lambda(x))} \gamma_{\Lambda(x)}(u,u')\, |T_{u,u'}|^s\, \mathbb{E}\left( \left| G^{(\Gamma(\Lambda(x)))}_\Omega(u',y;z) \right|^s \right), \tag{2.6}
\]
for all $y \in \mathbb{Z}^d \setminus \Lambda(x)$, where: $\Lambda(x) = \{x + y : y \in \Lambda\}$ is a finite neighborhood of $x$, translate of some fixed region $\Lambda \ni 0$, and $\gamma_{\Lambda(x)}$ is a quantity which is small when the typical values of the finite volume Green function between $x$ and the boundary of $\Lambda(x)$ are small (in a suitable sense). An inequality of the form (2.6) is particularly useful when
\[
\sum_{\langle u,u' \rangle \in \Gamma(\Lambda(x))} \gamma_{\Lambda(x)}(u,u')\, |T_{u,u'}|^s < 1, \tag{2.7}
\]
since in that case Eq. (2.6) is akin to the statement that $\mathbb{E}(|G_\Omega(x,y;z)|^s)$ is a strictly subharmonic function of $x$, as long as $|x-y| > \mathrm{diam}\,\Lambda$, and thus, if it is also uniformly bounded (which it is), it decays exponentially.

The first step towards a bound of the form (2.6) is, naturally, the resolvent identity:
\[
G_{\Omega,\omega} = G^{(\Gamma)}_{\Omega,\omega} - G^{(\Gamma)}_{\Omega,\omega} \cdot T^{(\Gamma)} \cdot G_{\Omega,\omega} = G^{(\Gamma)}_{\Omega,\omega} - G_{\Omega,\omega} \cdot T^{(\Gamma)} \cdot G^{(\Gamma)}_{\Omega,\omega} \tag{2.8}
\]
(written here in the operator form). However, one then reaches an obstacle, since the quantity whose mean needs to be estimated is a product of two Green functions which are not independent. For some time now this co-dependence has been the main obstacle on the road to an argument along the lines outlined above, since otherwise the general strategy applied here is well familiar from its various successful applications in the context of the statistical mechanics of homogeneous systems ([23–27]), and the other auxiliary tools specific to the present context have in essence been available since ref. [11].

The co-dependence problem is solved here through a second application of the resolvent identity (followed by a decoupling argument of a familiar type). In fact, a similar tactic was applied by von Dreifus to the mean correlation functions, in a study of the phase transitions in disordered ferromagnetic models [28] (as we learned from T. Spencer after the completion of the first draft of this work). The two applications of the resolvent identity, for which the depletion sets $\Gamma_1$ and $\Gamma_2$ need not coincide, may be combined by starting our argument from the identity:
\[
G_\Omega = G^{(\Gamma_1)}_\Omega - G^{(\Gamma_1)}_\Omega \cdot T^{(\Gamma_1)} \cdot G^{(\Gamma_2)}_\Omega + G^{(\Gamma_1)}_\Omega \cdot T^{(\Gamma_1)} \cdot G_\Omega \cdot T^{(\Gamma_2)} \cdot G^{(\Gamma_2)}_\Omega. \tag{2.9}
\]
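Both forms of the first-order identity (2.8) and the second-order identity (2.9) are purely algebraic, so they can be checked numerically on any finite matrix. The following is a minimal sketch, assuming a chain of 8 sites with nearest-neighbor hopping and two arbitrarily chosen single-bond depletion sets; all names and parameter values are illustrative assumptions.

```python
import numpy as np


def resolvent(h, z):
    """(h - z)^(-1) for a finite matrix h and complex z off the spectrum."""
    return np.linalg.inv(h - z * np.eye(h.shape[0]))


def cut(hopping, bonds):
    """T^(Gamma): the hopping elements along the listed bonds, in both orientations."""
    t = np.zeros_like(hopping)
    for x, y in bonds:
        t[x, y], t[y, x] = hopping[x, y], hopping[y, x]
    return t


n = 8
rng = np.random.default_rng(2)
T = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
H = T + 1.5 * np.diag(rng.uniform(-1.0, 1.0, size=n))
z = 0.2 + 0.05j

T1, T2 = cut(T, [(2, 3)]), cut(T, [(4, 5)])  # depletion sets Gamma_1 and Gamma_2
G = resolvent(H, z)
G1, G2 = resolvent(H - T1, z), resolvent(H - T2, z)

first_order_a = G1 - G1 @ T1 @ G                          # Eq. (2.8), first form
first_order_b = G1 - G @ T1 @ G1                          # Eq. (2.8), second form
second_order = G1 - G1 @ T1 @ G2 + G1 @ T1 @ G @ T2 @ G2  # Eq. (2.9)

print(np.max(np.abs(G - first_order_a)),
      np.max(np.abs(G - first_order_b)),
      np.max(np.abs(G - second_order)))
```

The second-order expression follows by inserting the second form of (2.8), written for $\Gamma_2$, into the last factor of the first form written for $\Gamma_1$; the check above only confirms the algebra and says nothing about the probabilistic estimates.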
Readers familiar with the current techniques may note that once the middle term $G_\Omega$ is replaced by a uniform bound, the remaining expression can be made free from co-dependence by an appropriate choice of $\Gamma_1$ and $\Gamma_2$. The rest are technicalities, to which we turn next.
2.2. Key lemmas. We shall now present three lemmas which will be used in the proofs of our main results. The first is a known estimate which provides the afore-mentioned uniform upper bound.

Lemma 2.1. Let $V(x)$ be a random potential satisfying the regularity condition R1($\tau$). Then for each $s < \tau$, any region $\Omega$, and any random operator of the form (1.1)
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le \frac{C_s}{\lambda^s}, \tag{2.10}
\]
for all $z \in \mathbb{C}$.

The statement is an immediate consequence of a version of the Wegner estimate which we present in the appendix. (See Lemma B.1; also Eq. (2.18) below.) Next is our new bound.

Lemma 2.2. Let $H_\omega$ be a random operator given by Eq. (1.1) with the probability distribution of the potential $V(x)$ satisfying the regularity condition R1($\tau$), and let $W$ be a subset of $\Omega$. Then, denoting $\tilde{\Gamma} = \Gamma(W^+)$ and $\Gamma = \Gamma(W)$, for all $z \in \mathbb{C}$:

(1) The following “depleted-resolvent bound” holds for any pair of sites $x \in W$, $y \in \Omega \setminus W^+$,
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le \gamma(W) \sum_{\langle v,v' \rangle \in \tilde{\Gamma}} |T_{v,v'}|^s\, \mathbb{E}\left( |G_{\Omega \setminus W^+}(v',y;z)|^s \right), \tag{2.11}
\]
with
\[
\gamma(W) = \frac{C_s}{\lambda^s} \sum_{\langle u,u' \rangle \in \Gamma} |T_{u,u'}|^s\, \mathbb{E}\left( |G_W(x,u;z)|^s \right). \tag{2.12}
\]

(2) If, furthermore, the probability distribution of the potential satisfies also R2(s) then the following bound holds for any pair of sites $x \in W$, $y \in \Omega \setminus W$,
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le \sum_{\langle v,v' \rangle \in \Gamma} \gamma_x(v,v')\, |T_{v,v'}|^s\, \mathbb{E}\left( |G_{\Omega \setminus W}(v',y;z)|^s \right), \tag{2.13}
\]
with
\[
\gamma_x(v,v') = \mathbb{E}\left( |G_W(x,v;z)|^s \right) + \frac{\tilde{C}_s}{\lambda^s} \sum_{\langle u,u' \rangle \in \Gamma} |T_{u,u'}|^s\, \mathbb{E}\left( |G_W(x,u;z)|^s \right). \tag{2.14}
\]

Proof. Both results follow from the second-order resolvent identity Eq. (2.9), which yields:
\[
G_\Omega(x,y;z) = G^{(\Gamma_1)}_\Omega(x,y;z) - \langle x | G^{(\Gamma_1)}_\Omega T^{(\Gamma_1)}_\Omega G^{(\Gamma_2)}_\Omega | y \rangle + \langle x | G^{(\Gamma_1)}_\Omega T^{(\Gamma_1)}_\Omega G_\Omega T^{(\Gamma_2)}_\Omega G^{(\Gamma_2)}_\Omega | y \rangle. \tag{2.15}
\]
Fig. 2.1. Diagrammatic depiction of the bound (2.16) on $G(x,y;z)$, for $x, y \in \mathbb{Z}^d$ and $z \in \mathbb{C}$. The long solid lines are “depleted Green functions”, the two short segments correspond to the hopping terms ($T$) and the double line is a full Green function. Once the latter is replaced by a uniform upper bound, the expectation value of the product of the remaining terms factorizes. (Labels appearing in the figure: $x$, $u$, $u'$, $v$, $v'$, $y$, and the region $W$.)
For the proof of the first claim, we take $\Gamma_1 = \Gamma = \Gamma(W)$ and $\Gamma_2 = \tilde{\Gamma} = \Gamma(W^+)$. Then, the first term of Eq. (2.15) is zero because $\Gamma(W)$ decouples $x$ and $y$, and the second term is zero because $\Gamma(W^+)$ decouples $W^+$ and $y$. Thus
\[
G_\Omega(x,y;z) = \sum_{\langle u,u' \rangle \in \Gamma} \sum_{\langle v,v' \rangle \in \tilde{\Gamma}} T_{u,u'}\, T_{v,v'}\, G^{(\Gamma)}_\Omega(x,u;z)\, G_\Omega(u',v;z)\, G^{(\tilde{\Gamma})}_\Omega(v',y;z). \tag{2.16}
\]
It follows that for any $s \in (0,1)$,
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le \sum_{\langle u,u' \rangle \in \Gamma} \sum_{\langle v,v' \rangle \in \tilde{\Gamma}} |T_{u,u'}|^s |T_{v,v'}|^s\, \mathbb{E}\left( \left| G^{(\Gamma)}_\Omega(x,u;z)\, G_\Omega(u',v;z)\, G^{(\tilde{\Gamma})}_\Omega(v',y;z) \right|^s \right). \tag{2.17}
\]
(Note that for $0 < s < 1$: $|a+b|^s \le |a|^s + |b|^s$.)

In estimating the terms on the right-hand side of Eq. (2.17) let us consider first the conditional expectation of the central factors, $G_\Omega(u',v;z)$. Only these factors depend on the values of the potential at $u'$ and $v$, and therefore they can be replaced by their conditional expectation $\mathbb{E}\left( |G_\Omega(u',v;z)|^s \,\middle|\, \{V(q)\}_{q \in \Omega \setminus \{u',v\}} \right)$. As will be proven in the Appendix, under the regularity condition R1($\tau$) these are uniformly bounded (Lemma B.1):
\[
\mathbb{E}\left( |G_\Omega(u',v;z)|^s \,\middle|\, \{V(q)\}_{q \in \Omega \setminus \{u',v\}} \right) \le \frac{C_s}{\lambda^s}. \tag{2.18}
\]
(The proof involves a reduction to a two-dimensional problem via the Krein formula, and a two-dimensional Wegner-type estimate.)

Once the central factor in each expectation on the right-hand side of Eq. (2.17) is replaced by the above bound, what remains there are two independent random variables, which are $|G^{(\Gamma)}_\Omega(x,u;z)|^s = |G_W(x,u;z)|^s$ and $|G^{(\tilde{\Gamma})}_\Omega(v',y;z)|^s = |G_{\Omega \setminus W^+}(v',y;z)|^s$. The expectation now factorizes, and the resulting expression yields the first claim of the lemma.
For the second claim, we take $\Gamma_1 = \Gamma_2 = \Gamma = \Gamma(W)$. Once again the first term of Eq. (2.15) is zero because $\Gamma(W)$ decouples $x$ and $y$. However, the second term is non-zero, and we obtain
\[
\begin{aligned}
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le{} & \sum_{\langle v,v' \rangle \in \Gamma} |T_{v',v}|^s\, \mathbb{E}\left( \left| G^{(\Gamma)}_\Omega(x,v;z)\, G^{(\Gamma)}_\Omega(v',y;z) \right|^s \right) \\
& + \sum_{\langle u,u' \rangle \in \Gamma} \sum_{\langle v,v' \rangle \in \Gamma} |T_{u,u'}|^s |T_{v,v'}|^s\, \mathbb{E}\left( \left| G^{(\Gamma)}_\Omega(x,u;z)\, G_\Omega(u',v;z)\, G^{(\Gamma)}_\Omega(v',y;z) \right|^s \right).
\end{aligned} \tag{2.19}
\]
At this point we may not use the previous argument, since in the last expectation $V(v)$ affects each of the first two factors and $V(u')$ affects each of the last two factors. However, the dependence of each of these factors on the potentials is of a particularly simple form: they are ratios of two functions (determinants) which are separately linear in each potential variable. Using the decoupling hypotheses, i.e. the regularity conditions R1($\tau$) and R2(s), the expectation may be bounded by the product of expectations. Specifically, we prove in Lemma C.1 that:
\[
\mathbb{E}\left( \left| G^{(\Gamma)}_\Omega(x,u;z)\, G_\Omega(u',v;z)\, G^{(\Gamma)}_\Omega(v',y;z) \right|^s \right) \le \frac{\tilde{C}_s}{\lambda^s}\, \mathbb{E}\left( \left| G^{(\Gamma)}_\Omega(x,u;z)\, G^{(\Gamma)}_\Omega(v',y;z) \right|^s \right). \tag{2.20}
\]
Once again, we are left with a product of two independent random variables, $|G^{(\Gamma)}_\Omega(x,u;z)|^s = |G_W(x,u;z)|^s$ and $|G^{(\Gamma)}_\Omega(v',y;z)|^s = |G_{\Omega \setminus W}(v',y;z)|^s$. The factorization of the remaining expectation yields the second claim of the lemma, Eq. (2.13).

The above lemma provides a bound for the Green function in terms of its depleted versions. This suffices for the derivation of the first of our two main theorems (Thm 1.1). However, this does not suffice for the second theorem, Thm 1.2, for which we shall use an inequality that is linear in the original function. That “closure” will be attained with the help of the following bound on the depleted resolvent in terms of the full one.

Lemma 2.3. Let $H_{\Omega,\omega}$ be a random operator in $\ell^2(\Omega)$, $\Omega \subseteq \mathbb{Z}^d$, given by Eq. (1.1), with the probability distribution of the potential $V(x)$ satisfying the regularity conditions R1($\tau$) and R2(s) for some $s < \tau$. Let $W$ be a subset of $\Omega$. Then, the following holds for any pair of sites $u, y \in \Omega \setminus W$, and every $z \in \mathbb{C}$:
\[
\mathbb{E}\left( |G_{\Omega \setminus W}(u,y;z)|^s \right) \le \mathbb{E}\left( |G_\Omega(u,y;z)|^s \right) + \frac{\tilde{C}_s}{\lambda^s} \sum_{\langle v,v' \rangle \in \Gamma} |T_{v',v}|^s\, \mathbb{E}\left( |G_\Omega(v,y;z)|^s \right), \tag{2.21}
\]
with $\Gamma = \Gamma(W)$ the “cut-set” of $W$.
Proof. Starting from the first order resolvent identity, Eq. (2.8), and taking expectation values of its matrix elements, we find:
\[
\mathbb{E}\left( |G^{(\Gamma)}_\Omega(u,y;z)|^s \right) \le \mathbb{E}\left( |G_\Omega(u,y;z)|^s \right) + \sum_{\langle v,v' \rangle \in \Gamma(W)} |T_{v',v}|^s\, \mathbb{E}\left( |G^{(\Gamma)}_\Omega(u,v';z)|^s\, |G_\Omega(v,y;z)|^s \right), \tag{2.22}
\]
where $\Gamma = \Gamma(W)$, and $G^{(\Gamma)}_\Omega = G_{\Omega \setminus W}$. It suffices, therefore, to show that in the last term the factor $|G^{(\Gamma)}_\Omega(u,v';z)|^s$ may be replaced (for an upper bound) by the constant $\tilde{C}_s/\lambda^s$. This follows through a decoupling argument which we present in the Appendix, see Lemma C.1.

Remark. In the applications we shall use Lemmas 2.2 and 2.3 both in the stated form and in the conjugated form, with the arguments of the Green functions reversed. One form of course implies the other (at conjugate energy).

2.3. Proofs of the main results. We are now ready to derive the results stated in the Introduction. For simplicity these were stated in the context of the Schrödinger operators, for which $T$ is the discrete Laplacian. The proofs given in this section will be restricted to this case. A more generally applicable treatment is presented in the next section.

Proof of Theorem 1.1. Assume that for some $z \in \mathbb{C}$ and a finite region $\Lambda$ the smallness condition (1.9) holds. By Lemma 2.2 and translation invariance, we learn that for any region $\Omega$ and any $x, y \in \Omega$ with $y \in \mathbb{Z}^d \setminus \Lambda^+(x)$:
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le b \cdot \frac{1}{|\Gamma(\Lambda^+)|} \sum_{\langle v,v' \rangle \in \Gamma(\Lambda^+(x))} \mathbb{E}\left( |G_{\Omega \setminus \Lambda^+(x)}(v',y;z)|^s \right), \tag{2.23}
\]
where $b = b(\Lambda, z)$ of Eq. (1.9), and $\Lambda(x)$ is the translate of $\Lambda$ by $x$. By Lemma 2.1, each of the terms in the sum is bounded by $C_s/\lambda^s$. Since the sum is normalized by the prefactor $1/|\Gamma(\Lambda^+)|$, the inequality (2.23) permits one to improve that bound for $\mathbb{E}(|G_\Omega(x,y;z)|^s)$ by the factor $b\ (<1)$. Furthermore, the inequality may be iterated a number of times, each iteration resulting in an additional factor of $b$. One should take note of the fact that the iterations bring in Green functions corresponding to modified domains. It is for this reason that the initial input assumption was required to hold for modified geometries, i.e. not just for $\Lambda$ but also for all its subsets. Inequality (2.23) can be iterated as long as the resulting sequences $(x, v', \ldots, v^{(n)})$ do not get closer to $y$ than the distance $L = \sup\{|u| \,:\, u \in \Lambda^+\}$. Thus:
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le \frac{C_s}{\lambda^s} \cdot b^{\lfloor |x-y|/L \rfloor} \le \frac{C_s}{\lambda^s\, b}\, e^{-\mu|x-y|}, \tag{2.24}
\]
with $\mu = |\ln b|/L$.
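The passage from the iterated bound to the exponential bound in Eq. (2.24) can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the number of admissible iterations is the integer part of $|x-y|/L$; the argument `prefactor` stands for $C_s/\lambda^s$ and its numerical value below is a hypothetical placeholder.

```python
import math


def iterated_bound(prefactor: float, b: float, block_size: int, distance: int) -> float:
    """prefactor * b**n after n = floor(distance / block_size) iterations of (2.23)."""
    if not 0.0 < b < 1.0:
        raise ValueError("the iteration contracts only for 0 < b < 1")
    return prefactor * b ** (distance // block_size)


def exponential_envelope(prefactor: float, b: float, block_size: int, distance: int) -> float:
    """Right-hand side of (2.24): (prefactor / b) * exp(-mu * distance), mu = |ln b| / L."""
    mu = abs(math.log(b)) / block_size
    return (prefactor / b) * math.exp(-mu * distance)


if __name__ == "__main__":
    for distance in (0, 5, 7, 20, 50):
        print(distance,
              iterated_bound(2.0, 0.5, 7, distance),
              exponential_envelope(2.0, 0.5, 7, distance))
```

Since the integer part of $|x-y|/L$ is at least $|x-y|/L - 1$ and $b < 1$, the step function never exceeds the envelope; the extra factor $1/b$ in (2.24) pays exactly for the rounding.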
Next, let us turn to the proof of the second theorem (Thm 1.2). The main change is that we now proceed under the assumption that the smallness condition holds for some region $\Lambda$ without requiring it to hold also in all subsets. As explained in the introduction, the difference may be meaningful if $H_\omega$ has extended boundary states in some geometry.
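Proof of Theorem 1.2. Our first goal is to show that under the assumption (1.13) there is $b < 1$ such that for all pairs $\{x, y\}$ with $\Lambda(x) \subset \Omega$ and $y \in \Omega \setminus \Lambda(x)$,
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le b \sum_{u \in \Lambda^+(x)} P^{l}_x(u)\, \mathbb{E}\left( |G_\Omega(u,y;z)|^s \right), \tag{2.25}
\]
with non-negative weights satisfying:
\[
\sum_{u \in \Lambda^+(x)} P^{l}_x(u) = 1. \tag{2.26}
\]
We shall use this inequality along with its conjugate:
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le b \sum_{v \in \Lambda^+(y)} P^{r}_y(v)\, \mathbb{E}\left( |G_\Omega(x,v;z)|^s \right), \tag{2.27}
\]
where $P^{r}_y(v)$ satisfy the suitable analog of the normalization condition (2.26). It is important that, unlike in the inequality (2.23), the functions which appear on the right-hand side of (2.25) and (2.27) are computed in the same domain as those on the left-hand side.

The first step is by Lemma 2.2, which yields
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le \sum_{\langle u,u' \rangle \in \Gamma(\Lambda(x))} \gamma_x(u,u')\, \mathbb{E}\left( |G_{\Omega \setminus \Lambda(x)}(u',y;z)|^s \right), \tag{2.28}
\]
whenever $\Lambda(x) \subset \Omega$ and $y \in \mathbb{Z}^d \setminus \Lambda(x)$, with $\gamma_x(u,u')$ specified in Eq. (2.14).

Next, we apply Lemma 2.3, Eq. (2.21), to bound $\mathbb{E}(|G_{\Omega \setminus \Lambda(x)}(u',y;z)|^s)$ in terms of a sum of quantities of the form $\mathbb{E}(|G_\Omega(v,y;z)|^s)$ with $v \in \Lambda^+(x)$. The result is initially expressed as a sum over bonds:
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le \sum_{\langle u,u' \rangle \in \Gamma(\Lambda(x))} \gamma_x(u,u')\, \mathbb{E}\left( |G_\Omega(u',y;z)|^s \right) + \frac{\tilde{C}_s}{\lambda^s}\, \Xi \sum_{\langle u,u' \rangle \in \Gamma(\Lambda(x))} \mathbb{E}\left( |G_\Omega(u,y;z)|^s \right), \tag{2.29}
\]
where, using translation invariance,
\[
\Xi := \sum_{\langle u,u' \rangle \in \Gamma(\Lambda)} \gamma_0(u,u').
\]
Collecting terms, and pulling out normalizing factors, one may cast the inequality (2.29) in the form (2.25) with
\[
b := \sum_{\langle u,u' \rangle \in \Gamma(\Lambda(x))} \left[ \gamma_x(u,u') + \frac{\tilde{C}_s}{\lambda^s}\, \Xi \right] = \left( 1 + \frac{\tilde{C}_s}{\lambda^s} |\Gamma(\Lambda)| \right) \Xi \tag{2.30}
\]
\[
\phantom{b :} = \left( 1 + \frac{\tilde{C}_s}{\lambda^s} |\Gamma(\Lambda)| \right)^{2} \sum_{\langle u,u' \rangle \in \Gamma(\Lambda)} \mathbb{E}\left( |G_\Lambda(0,u;z)|^s \right). \tag{2.31}
\]
The smallness condition (1.13) is nothing other than the assumption that $b < 1$.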
The above argument proves Eq. (2.25). By the transposition, or time-reflection, symmetry of $H$ ($H^T = H$) also Eq. (2.27) holds. (Such symmetry of $H$ is not essential for our analysis: it suffices to assume that the smallness condition Eq. (1.13) holds along with its transpose.)

We proceed in the proof by iterating the inequalities (2.25) and (2.27). However, an adaptation is needed in the argument which was used in the proof of Theorem 1.1, since the iteration can be carried out only as long as the two points (the arguments of the resolvent) stay at distance $L = \sup\{|u| : u \in \Lambda^+\}$ not only from each other but also from the boundary $\partial\Omega$. The relevant observation is that for every pair of sites $x, y \in \Omega$ there is a pair of integers $\{n, m\}$ such that:

1. $n + m = \mathrm{dist}_\Omega(x,y)$,
2. the ball of radius $n$ centered at $x$ and the ball of radius $m$ centered at $y$ form a pair of disjoint subsets of $\Omega$.

For the desired bound on $\mathbb{E}(|G_\Omega(x,y;z)|^s)$, we shall iterate Eq. (2.25) $\lfloor n/L \rfloor$ times from the left, and (2.27) $\lfloor m/L \rfloor$ times from the right. Similar to Eq. (2.24), we obtain:
\[
\mathbb{E}\left( |G_\Omega(x,y;z)|^s \right) \le \frac{C_s}{\lambda^s\, b^2}\, e^{-\mu\, \mathrm{dist}_\Omega(x,y)}, \tag{2.32}
\]
with $\mu = |\ln b|/L$.
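The finite-volume quantity entering the smallness condition, Eqs. (1.13) and (2.31), can be estimated by sampling in a small system. The following is a minimal sketch, assuming a one-dimensional chain $\Lambda = [-L, L]$ with nearest-neighbor hopping of strength 1 and a potential uniformly distributed on $[-1, 1]$. The constant $\tilde{C}_s/\lambda^s$ is not computed here and is passed in as a user-supplied number, so the output is an illustration and not a verification of the criterion.

```python
import numpy as np


def boundary_fractional_moment(half_width, coupling, z, s, n_samples=400, seed=0):
    """Monte Carlo estimate of the sum over Gamma(Lambda) of E|G_Lambda(0, u; z)|^s.

    Lambda = [-L, L] in one dimension, so Gamma(Lambda) consists of the two bonds
    leaving from u = -L and u = +L (assumed model: uniform potential on [-1, 1]).
    """
    rng = np.random.default_rng(seed)
    n = 2 * half_width + 1
    hopping = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    origin, left, right = half_width, 0, n - 1
    total = 0.0
    for _ in range(n_samples):
        v = rng.uniform(-1.0, 1.0, size=n)
        green = np.linalg.inv(hopping + coupling * np.diag(v) - z * np.eye(n))
        total += abs(green[origin, left]) ** s + abs(green[origin, right]) ** s
    return total / n_samples


def smallness_lhs(boundary_sum, n_bonds, c_tilde_over_lambda_s):
    """Left-hand side of (1.13), given a user-supplied value of C~_s / lambda^s."""
    return (1.0 + c_tilde_over_lambda_s * n_bonds) ** 2 * boundary_sum


if __name__ == "__main__":
    for half_width in (1, 3, 6):
        estimate = boundary_fractional_moment(half_width, coupling=20.0, z=0.01j, s=0.5)
        print(half_width, estimate)
```

At large disorder the estimate should shrink quickly with $L$ while the prefactor in (1.13) stays fixed in one dimension (two boundary bonds), which is the mechanism by which the criterion can be met at a finite scale.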
The third theorem stated in the introduction (Thm 1.3) is the claim that the condition which is shown above to be sufficient for exponential localization, in the sense of Eq. (1.3), is also a necessary one. We shall now prove this to be the case.

Proof of Theorem 1.3. Suppose that Eq. (1.3) holds with some $A < \infty$ and $\mu > 0$. We need to show that also in finite systems the Green function is sufficiently small between an interior point and the boundary. To bound the finite volume function in terms of the infinite volume one, we may use Lemma 2.3, by which
\[
\sum_{\langle u,u' \rangle \in \Gamma(\Lambda)} \mathbb{E}\left( |G_\Lambda(0,u;z)|^s \right) \le \sum_{\langle u,u' \rangle \in \Gamma(\Lambda)} \mathbb{E}\left( |G(0,u;z)|^s \right) + \frac{\tilde{C}_s}{\lambda^s} |\Gamma(\Lambda)| \sum_{\langle v,v' \rangle \in \Gamma(\Lambda)} |T_{v,v'}|^s\, \mathbb{E}\left( |G(0,v';z)|^s \right), \tag{2.33}
\]
for any finite region $\Lambda$ containing the origin. We need to show that for $\Lambda = [-L, L]^d$ with $L$ large enough
\[
\left( 1 + \frac{\tilde{C}_s}{\lambda^s} |\Gamma(\Lambda)| \right)^{2} \sum_{\langle u,u' \rangle \in \Gamma(\Lambda)} \mathbb{E}\left( |G_\Lambda(0,u;z)|^s \right) < 1. \tag{2.34}
\]
After applying Eq. (2.33) to the terms on the left side of Eq. (2.34) we find that the number of summands involved and their prefactors grow only polynomially in $L$, whereas under our assumption the relevant factors $\mathbb{E}(|G(0,u;z)|^s)$ are exponentially small in $L$. Hence the condition (2.34) is satisfied for $L$ large enough.
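The competition between polynomial growth and exponential decay in this proof can be made explicit. The following is a minimal sketch, assuming nearest-neighbor hopping with $|T_{u,u'}| = 1$ and that every site entering (2.33) is at distance at least $L$ from the origin, so that each infinite-volume term is at most $A e^{-\mu L}$. The names `amplitude`, `mu` and `c_ratio` (standing for $A$, $\mu$ and $\tilde{C}_s/\lambda^s$) take hypothetical values chosen only for illustration.

```python
import itertools
import math


def boundary_bond_count(half_width: int, dim: int) -> int:
    """|Gamma(Lambda)| for Lambda = [-L, L]^d with nearest-neighbor hopping: 2d(2L+1)^(d-1)."""
    return 2 * dim * (2 * half_width + 1) ** (dim - 1)


def boundary_bond_count_brute_force(half_width: int, dim: int) -> int:
    """The same count, by visiting every site of the box and each of its 2d neighbors."""
    box = range(-half_width, half_width + 1)
    count = 0
    for site in itertools.product(box, repeat=dim):
        for axis in range(dim):
            for step in (-1, 1):
                if abs(site[axis] + step) > half_width:
                    count += 1
    return count


def criterion_upper_bound(half_width, dim, amplitude, mu, c_ratio):
    """Upper bound on the left side of (2.34) obtained by inserting (2.33) and (1.3)."""
    n_bonds = boundary_bond_count(half_width, dim)
    decay = amplitude * math.exp(-mu * half_width)
    boundary_sum = n_bonds * (1.0 + c_ratio * n_bonds) * decay  # bound on the left side of (2.33)
    return (1.0 + c_ratio * n_bonds) ** 2 * boundary_sum


def first_scale_meeting_criterion(dim, amplitude, mu, c_ratio, max_half_width=10_000):
    """Smallest L (up to max_half_width) at which the upper bound drops below 1, else None."""
    for half_width in range(1, max_half_width + 1):
        if criterion_upper_bound(half_width, dim, amplitude, mu, c_ratio) < 1.0:
            return half_width
    return None


if __name__ == "__main__":
    print(first_scale_meeting_criterion(dim=2, amplitude=1.0, mu=1.0, c_ratio=1.0))
```

The bound behaves like $L^{4(d-1)} e^{-\mu L}$ up to constants, so a finite scale always exists; how large it is depends strongly on the assumed constants, in line with the earlier caveat that the test may be far from practical.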
3. Generalizations

3.1. Formulation of the general results. We shall now turn to some generalizations of the theorems which were presented in Sect. 1.2 for the random Schrödinger operator. The setup may be extended in a number of ways.

1. Addition of magnetic fields. The hopping terms $\{T_{x,y}\}$ need not be real. In particular, the present analysis remains valid when one includes in $H_\omega$ a constant magnetic field, or a random one with a translation invariant distribution. A magnetic field is incorporated in $T_{x,y}$ through a factor $\exp(-iA_{x,y})$, with $A_{x,y}$ an anti-symmetric function of the bonds. (It represents the integral of the “vector potential” $\times(-e/\hbar)$ along the bond $\langle x,y \rangle$.) Except for the trivial case, with such a factor $T$ is no longer shift invariant. However, in the case of a constant magnetic field, $T$ will still be invariant under appropriate “magnetic shifts”, which consist of ordinary shifts followed by gauge transformations. Translation-invariance plays a role in our discussion. However, since gauge transformations do not affect the absolute values of the resolvent, it suffices for us to assume that $H_\omega$ is stochastically invariant under magnetic shifts, in the sense of Definition 3.1.

2. Extended hopping terms. The discrete Laplacian may be replaced by an operator with hopping terms of unlimited range. For exponential localization we shall however require $\{T_{x,y}\}$ to decay exponentially in $|x-y|$.

3. Off-diagonal disorder. $\{T_{x,y}\}$ may also be made random. It is convenient however to assume exponentially decaying uniform bounds. The regularity conditions on the potential will now be assumed for the conditional distribution of $V(x)$ at specified off-diagonal disorder.

4. Periodicity. $H_\omega$ may also include a periodic potential, i.e., Eq. (1.1) may be modified to:
\[
H_\omega = T_{x,y;\omega} + U_{\mathrm{per}}(x) + \lambda V_\omega(x). \tag{3.1}
\]
This may be further generalized by requiring periodicity only of the probability distribution of $H$.

5. More general lattices. In the previous discussion, the underlying sets $\mathbb{Z}^d$ may be replaced by other graphs, with suitable symmetry groups. The graph structure is relevant if the hopping terms are limited to graph edges. However, since we consider also operators with hopping terms of unlimited range, let us formulate the result for operators on $\ell^2(\mathbb{T})$ where the underlying set is of the form $\mathbb{T} = \mathcal{G} \times S$, with $\mathcal{G}$ a countable group and $S$ a finite set. We let $\mathrm{dist}(x,y)$ denote a metric on $\mathbb{T}$ which is invariant under the natural action of $\mathcal{G}$ on that set. For example, this setup allows for $\mathbb{T}$ to be a Bethe lattice, or a more general Cayley lattice. (Instructive discussion of some statistical mechanical models in such settings may be found in refs. [29].) The set $S$ is included here in order to leave room for periodic structures. We denote by $\mathcal{C}$ the “periodicity cell”, which is $\{\imath\} \times S$ where $\imath$ is the identity in $\mathcal{G}$, and by $g_x$ the “$\mathcal{G}$-coordinate” of $x$. Thus, the lattice $\mathbb{T}$ is tiled by disjoint translates of $\mathcal{C}$, the tile containing $x$ being $g_x\mathcal{C}$.

Some of the relevant concepts are summarized in the following definition.

Definition 3.1. With $\mathbb{T} = \mathcal{G} \times S$ as above, let $H_\omega$ be a random operator on $\ell^2(\mathbb{T})$ (i.e., one with some specified probability distribution), whose off-diagonal part is denoted by $T_\omega$ and the diagonal part is referred to as the potential (for consistency, we denote it as $\lambda V_\omega$).
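1. We say that $H_\omega$ is stochastically invariant under magnetic shifts if for each $\kappa \in \mathcal{G}$ and almost every $\omega$ there is a unitary map of the form
\[
\left( U_{\kappa,\omega}\, \psi \right)(x) = e^{i\phi_{\kappa,\omega}(x)}\, \psi(\kappa x), \tag{3.2}
\]
(with some function $\phi_{\kappa,\omega}(\cdot)$) under which
\[
U_{\kappa,\omega}\, H_\omega\, U^{*}_{\kappa,\omega} \stackrel{D}{=} H_\omega, \tag{3.3}
\]
where $\stackrel{D}{=}$ means equality of the probability distributions.

2. The operator is said to have tempered off-diagonal matrix elements, at a specified value of $s < 1$, if there is a kernel $\tau_{x,y}$, and some $m > 0$, such that $|T_{x,y;\omega}| \le \tau_{x,y}$, almost surely, and
\[
\sup_{x \in \mathbb{T}} \sum_{y \in \mathbb{T}} \tau^{s}_{x,y}\, e^{+m\, \mathrm{dist}(x,y)} < \infty. \tag{3.4}
\]

3. We say that the potential has an s-regular distribution if for some $\tau > s$ the conditional distributions of $\{V_\omega(x)\}$, at specified values of the hopping terms variables $\{T_{u,v;\omega}\}$, are independent and satisfy the regularity conditions R1($\tau$) and R2(s) with uniform constants.

Before presenting our general theorems, Theorem 3.2 and Theorem 3.3, it is convenient to introduce notation for certain quantities which appear in their statements. For each $\Lambda \subset \mathbb{T}$ we define $\tau^{s}_{u,\partial\Lambda}$, “the hopping term from $u$ to the boundary”, by
\[
\tau^{s}_{u,\partial\Lambda} = \sum_{v \in W} \tau^{s}_{u,v}, \tag{3.5}
\]
where $W$ is either $\Lambda$ or $\mathbb{T} \setminus \Lambda$, whichever does not contain $u$. The kernel $k_\Lambda(u,v)$ that appears in our basic bounds (see Lemma 3.4), which is a “dressed” version of $\tau^{s}_{u,v}$, is defined as follows:
\[
k_\Lambda(u,v) := \tau^{s}_{u,v}\, I[u \in \Lambda,\ v \in \mathbb{T} \setminus \Lambda] + \frac{\tilde{C}_s}{\lambda^s}\, \tau^{s}_{u,\partial\Lambda}\, \tau^{s}_{v,\partial\Lambda}\, I[u \in \Lambda] + \left( \frac{\tilde{C}_s}{\lambda^s} \right)^{2} \tau^{s}_{u,\partial\Lambda}\, \tau^{s}_{v,\partial\Lambda}\, E_s(\Lambda)\, I[u, v \in \Lambda], \tag{3.6}
\]
where $E_s(\Lambda) = \sum_{u \in \Lambda} \tau^{s}_{u,\partial\Lambda}$. Notice that $k_\Lambda$ is concentrated on the boundary of $\Lambda$, i.e.,
\[
k_\Lambda(u,v) \le C_\Lambda\, e^{-m\,[\mathrm{dist}(u,\partial\Lambda) + \mathrm{dist}(v,\partial\Lambda)]}, \tag{3.7}
\]
where $m$ is independent of $\Lambda$ and $\mathrm{dist}(v,\partial\Lambda)$ is the distance from $v$ to whichever set, $\Lambda$ or $\mathbb{T} \setminus \Lambda$, does not contain $v$.

Following is the generalization of Theorem 1.1.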
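Theorem 3.2. Let $H_\omega$ be a random operator on $\ell^2(\mathcal{T})$ ($\mathcal{T} = G \times S$, as above) with an s-regular distribution for the potential $V_\omega(\cdot)$, and with tempered off-diagonal matrix elements $\{T_{x,y;\omega}\}$, which is stochastically invariant under magnetic shifts. Let $\mu > 0$, and assume that for some $z \in \mathbb{C}$ and a finite region $\Lambda \subset \mathcal{T}$, which contains the periodicity cell $\mathcal{C}$, the following is satisfied for all subsets $W \subset \Lambda$:
$$\sup_{x\in\mathcal{C}} \sum_{(u,v)\in\Lambda\times(\mathcal{T}\setminus\Lambda)} \mathbb{E}\left(\left|\langle x|\frac{1}{H_{W;\omega}-z}|u\rangle\right|^s\right) k_\Lambda(u,v)\, e^{+\mu\,\mathrm{dist}(x,v)} < 1. \tag{3.8}$$
Then there exists $A < \infty$ such that for all $\Omega \subset \mathcal{T}$, and all $x \in \Omega$,
$$\sum_{y\in\Omega} \mathbb{E}_{\pm i0}\left(\left|\langle x|\frac{1}{H_{\Omega;\omega}-z}|y\rangle\right|^s\right) e^{+\mu\,\mathrm{dist}(x,y)} \le A. \tag{3.9}$$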
Remarks. 1. Because the hopping terms are tempered as described in Definition 3.1, the bound (3.8) will be satisfied for some $\mu > 0$ provided
$$\sup_{x\in\mathcal{C}}\ \sup_{W\subset\Lambda} \sum_{(u,v)\in\Lambda\times(\mathcal{T}\setminus\Lambda)} \mathbb{E}\left(\left|\langle x|\frac{1}{H_{W;\omega}-z}|u\rangle\right|^s\right) k_\Lambda(u,v) < 1. \tag{3.10}$$
We shall use this criterion in Sect. 4 in the slightly different form
$$\left(1 + \frac{C_s}{\lambda^s}\, E_s(\Lambda)\right)^{2} \sup_{x\in\mathcal{C}}\ \sup_{W\subset\Lambda} \sum_{(u,u')\in\Lambda\times(\mathcal{T}\setminus\Lambda)} \tau_{u,u'}^s\, \mathbb{E}\left(\left|\langle x|\frac{1}{H_{W;\omega}-z}|u\rangle\right|^s\right) < 1, \tag{3.11}$$
where we have summed various terms appearing in $k_\Lambda(u,v)$.

2. For graphs which grow at an exponential rate, such as the Bethe lattice, exponentially decaying functions need not be summable. The conclusion, Eq. (3.9), was therefore formulated in the stronger form, which implies both exponential decay and almost sure summability. In particular, it is useful to recall that for $s/2 < 1$:
$$\mathbb{E}\left(\Big(\sum_{y} |G(x,y)|^2\Big)^{s/2}\right) \le \mathbb{E}\left(\sum_{y} |G(x,y)|^s\right). \tag{3.12}$$

3. One may note that in the more general theorem we do make use of the “decoupling lemma”, which was not used in Theorem 1.1.

4. Translation invariance played a limited role here: the analysis extends readily to random operators with non-translation invariant distributions, provided only that the required bounds are satisfied uniformly for all translates of $\Lambda$, and the distribution of the potential is uniformly s-regular. To demonstrate the required change we cast the next statement in that form.

As we discussed in the preceding sections, condition (3.8) may fail due to the existence of extended states at some surfaces. The following generalization of Theorem 1.2 provides criteria for localization in the bulk which are less affected by such surface states.
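Theorem 3.3. Let $H_\omega$ be a random operator on $\ell^2(\mathcal{T})$ ($\mathcal{T} = G \times S$, as above) with an s-regular distribution for the potential $V_\omega(\cdot)$, and with tempered off-diagonal matrix elements $\{T_{x,y;\omega}\}$. Let $\mu > 0$ and assume that for some $z \in \mathbb{C}$ and a finite region $\Lambda$, $\mathcal{C} \subset \Lambda \subset \mathcal{T}$,
$$\sup_{x\in\mathcal{T}} \sum_{u\in g_x\Lambda}\ \sum_{v\in\mathcal{T}} \mathbb{E}\left(\left|\langle x|\frac{1}{H_{g_x\Lambda;\omega}-z[\bar z]}|u\rangle\right|^s\right) k_{g_x\Lambda}(u,v)\, e^{+\mu\,\mathrm{dist}(x,v)} < 1, \tag{3.13}$$
where $z[\bar z]$ means that the bound is satisfied for both $z$ and $\bar z$. Then the condition (3.9) holds for the full operator $H_\omega$ (i.e., with $\Omega = \mathcal{T}$), and there exists $B < \infty$ such that for arbitrary $\Omega \subset \mathcal{T}$:
$$\mathbb{E}_{\pm i0}\left(\left|\langle x|\frac{1}{H_{\Omega;\omega}-z}|y\rangle\right|^s\right) \le B\, e^{-\tilde\mu\,\mathrm{dist}_\Omega(x,y)}. \tag{3.14}$$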
The modified distance $\mathrm{dist}_\Omega(x,y)$ is defined by the natural extension of Eq. (1.15).

3.2. Derivation of the general results. The derivation of Theorems 3.2 and 3.3 follows very closely the proofs of Sect. 2. The main difference is in the second portion of the argument where we extract decay in a single step rather than by iteration. The first part of the proof rests on Lemmas 2.2 and 2.3 which are easily seen to extend to the setup described in Theorem 3.3. One readily obtains the following extension (the hopping terms $T_{x,y}$ appearing in Sect. 2.2 are replaced with the uniform upper-bound $\tau_{x,y}$):

Lemma 3.4. Let $H_\omega$ be a random operator with the properties listed in Theorem 3.3, and let $\Lambda$ be a finite subset of $\mathcal{T}$, containing the periodicity cell $\mathcal{C}$, for which the condition (3.8) is satisfied. Then the following bound is valid for any $x \in \Lambda$, $y \in \mathcal{T}\setminus\Lambda$,
$$\mathbb{E}\left(|G_\Omega(x,y;z)|^s\right) \le \sum_{(u,v)\in\Lambda\times(\mathcal{T}\setminus\Lambda)} \mathbb{E}\left(|G_{\Lambda\cap\Omega}(x,u;z)|^s\right) k_\Lambda(u,v)\, \mathbb{E}\left(|G_{\Omega\setminus\Lambda}(v,y;z)|^s\right), \tag{3.15}$$
and
$$\mathbb{E}\left(|G_\Omega(x,y;z)|^s\right) \le \sum_{(u,v)\in\Lambda\times\mathcal{T}} \mathbb{E}\left(|G_{\Omega\cap\Lambda}(x,u;z)|^s\right) k_\Lambda(u,v)\, \mathbb{E}\left(|G_{\Omega}(v,y;z)|^s\right). \tag{3.16}$$
Notice that (3.16) differs from (3.15) in that the Green function in the region $\Omega$ (not $\Omega\setminus\Lambda$) appears on the right hand side and the summation over $v$ extends over the entire lattice. Theorems 3.2 and 3.3 follow easily from Lemma 3.4:
Proof of Theorem 3.2. To establish the claimed bound (3.9) we will show that
$$A_n := \sup_{\Omega:\,|\Omega|\le n}\ \sup_{x\in\Omega}\ \sum_{y\in\Omega} \mathbb{E}\left(|G_\Omega(x,y;z)|^s\right) e^{+\mu\,\mathrm{dist}(x,y)} \tag{3.17}$$
is bounded independent of $n$, thus establishing the result for finite regions. For infinite regions the result (3.9) follows by a limiting procedure, with the convergence implied by Fatou’s lemma. For any given $\Omega$ with $|\Omega| \le n$ and any site $x \in \Omega$,
$$\sum_{y\in\Omega} \mathbb{E}\left(|G_\Omega(x,y;z)|^s\right) e^{+\mu\,\mathrm{dist}(x,y)} \le |\Lambda|\, e^{\mu\,\mathrm{diam}(\Lambda)}\, \frac{C_s}{\lambda^s} + \sum_{y\in\Omega\setminus\Lambda_x}\ \sum_{u\in\Lambda_x,\ v\in\mathcal{T}\setminus\Lambda_x} \mathbb{E}\left(|G_{\Lambda_x\cap\Omega}(x,u;z)|^s\right) k_{\Lambda_x}(u,v)\, \mathbb{E}\left(|G_{\Omega\setminus\Lambda_x}(v,y;z)|^s\right) e^{+\mu\,\mathrm{dist}(x,y)}, \tag{3.18}$$
where the first term on the right side bounds the contribution to the sum from sites $y$ in $\Lambda_x \equiv g_x\Lambda$, and the remaining terms were estimated by Lemma 3.4, Eq. (3.15). Performing the summation over $y$ first, and applying the triangle inequality to factor the exponential weight, we obtain:
$$\sum_{y\in\Omega} \mathbb{E}\left(|G_\Omega(x,y;z)|^s\right) e^{+\mu\,\mathrm{dist}(x,y)} \le |\Lambda|\, \frac{C_s}{\lambda^s}\, e^{\mu\,\mathrm{diam}(\Lambda)} + b\, A_n, \tag{3.19}$$
where $b$ is the quantity on the left hand side of (3.8). When maximized over $\Omega$ and $x$ this leads to the bound $A_n \le \mathrm{Const.} + b A_n$ which, since $b < 1$, implies that
$$A_n \le \frac{|\Lambda|\, C_s\, \lambda^{-s}\, e^{\mu\,\mathrm{diam}(\Lambda)}}{1-b}, \tag{3.20}$$
as claimed above.

Proof of Theorem 3.3. The claim made for the special case $\Omega = \mathcal{T}$ is covered by analysis similar to what was just described. However the second claim, i.e., Eq. (3.14), requires a somewhat different argument. We will first show that for a finite region $\Omega$ the function
$$g(x,y) = \mathbb{E}\left(|G_\Omega(x,y;z)|^s\right) e^{+\mu\,\mathrm{dist}_\Omega(x,y)} \tag{3.21}$$
attains its maximum value for some $(x,y)$ with $\mathrm{dist}_\Omega(x,y) \le 2\,\mathrm{diam}(\Lambda)$. For any pair with a larger distance at least one of the sites, say $x$, can be separated from both the other and the boundary $\partial\Omega$ by an appropriate translate of $\Lambda$, i.e. $\Lambda_x$. We may then use Lemma 3.4, Eq. (3.16), to bound $g(x,y)$ by a sum of products of Green functions. If, in this sum, we replace each factor of $\mathbb{E}(|G_\Omega(v,y)|^s)\,e^{\mu\,\mathrm{dist}(x,y)}$ by the upper bound $g_{\max}\, e^{\mu\,\mathrm{dist}(x,v)}$, the resulting sum yields
$$g(x,y) \le b\, g_{\max}, \tag{3.22}$$
where $b$ is the quantity which sits on the left hand side of (3.13). As $b < 1$, we learn that $g(\cdot,\cdot)$ is not maximized at $(x,y)$. Since $g(x,y) \le \frac{C_s}{\lambda^s}\, e^{\mu\,\mathrm{dist}_\Omega(x,y)}$, the above implies that for any finite $\Omega$
$$\mathbb{E}\left(|G_\Omega(x,y;z)|^s\right) \le \frac{C_s}{\lambda^s}\, e^{2\mu\,\mathrm{diam}(\Lambda)}\, e^{-\mu\,\mathrm{dist}_\Omega(x,y)}. \tag{3.23}$$
By strong resolvent convergence arguments, the bound extends to infinite regions.
4. Some Implications

We shall now present a number of implications of the finite volume criteria for localization, focusing on the finite dimensional lattices $\mathbb{Z}^d$. The statements will bear some resemblance to results derived using the multiscale approach; however, the conclusions drawn here go beyond the latter by yielding results on the exponential decay of the mean values. The significance of that was described in the introduction.

4.1. Fast power decay ⇒ exponential decay. An interesting and useful implication (as is seen below) is that fast enough power-law decay implies exponential decay. In this sense, random Schrödinger operators join other statistical mechanical models in which such principles have been previously recognized. The list includes the general Dobrushin–Shlosman results [24] and the more specific two-point function bounds in: percolation (Hammersley [23] and Aizenman–Newman [27]), Ising ferromagnets (Simon [25] and Lieb [26]), certain O(N) models (Aizenman–Simon [30]), and time-evolution models (Aizenman–Holley [31], Maes–Shlosman [32]).

Theorem 4.1. Let $H_\omega$ be a random operator on $\ell^2(\mathbb{Z}^d)$ with an s-regular distribution for the potential $(V_\omega(x))$ and tempered off-diagonal matrix elements $(T_{x,y;\omega})$. There are $L_0, B_1, B_2 < \infty$, which depend only on the temperedness bound (3.4), such that if for some $E \in \mathbb{R}$ and some finite $L \ge L_0$, either
$$L^{3(d-1)} \sup_{L/2\le\|x-y\|\le L} \mathbb{E}\left(\left|\langle x|\frac{1}{H_{\Lambda_L(x),\omega}-E}|y\rangle\right|^s\right) \le B_1, \tag{4.1}$$
or
$$L^{4(d-1)} \sup_{L/2\le\|x-y\|\le L} \mathbb{E}\left(\left|\langle x|\frac{1}{H_\omega-E-i0}|y\rangle\right|^s\right) \le B_2, \tag{4.2}$$
where $\Lambda_L(x) = [-L,L]^d + x$ and $\|y\| \equiv \max_j |y_j|$, then the exponential localization (1.3) holds for all energies in some open interval $(a,b)$ containing $E$.

Proof. By Theorem 3.2, to establish exponential decay at the energy $E$ it suffices to show that for each $x \in \mathbb{Z}^d$,
$$\left(1 + \frac{C_s}{\lambda^s}\, E_s(\Lambda_L)\right)^{2} \sum_{u\in\Lambda_L(x),\ u'\in\mathbb{Z}^d\setminus\Lambda_L(x)} \tau_{u,u'}^s\, \mathbb{E}\left(|G_{\Lambda_L(x)}(x,u;E)|^s\right) < 1. \tag{4.3}$$
Because the off-diagonal elements are tempered we have the following bounds
$$\tau_{u,u'}^s \le \mathrm{Const.}\, e^{-m|u-u'|}, \qquad E_s(\Lambda_L) \le \mathrm{Const.}\, L^{d-1}, \tag{4.4}$$
for some $m > 0$, and all $L > 1$. Under the assumption Eq. (4.1):
$$\sum_{u\in\Lambda_L(x),\ u'\in\mathbb{Z}^d\setminus\Lambda_L(x)} \tau_{u,u'}^s\, \mathbb{E}\left(|G_{\Lambda_L(x)}(x,u;E)|^s\right) \le \frac{C_s}{\lambda^s}\, \mathrm{Const.}\, (L/2)^d\, e^{-mL/2} + \mathrm{Const.} \sup_{L/2\le\|x-y\|\le L} \mathbb{E}\left(\left|\langle x|\frac{1}{H_{\Lambda_L(x),\omega}-E}|y\rangle\right|^s\right) L^{d-1}. \tag{4.5}$$
For this bound the sum was split according to $\|u-u'\| <$ (or $\ge$) $L/2$, and in the first case we used the uniform upper bound $\mathbb{E}(|G(x,u;E)|^s) \le C_s/\lambda^s$. It is now easy to see that with an appropriate choice of $L_0$ and $B_1$ condition (4.1) implies the claimed bound (4.3), for the given energy $E$. The extension to an interval of energies around $E$ then follows from the continuity of the fractional moments of finite volume Green functions.

To show the sufficiency of the second condition, we first use Lemma 2.3 to bound finite volume Green functions in terms of the corresponding infinite volume functions:
$$\mathbb{E}\left(|G_{\Lambda_L(x)}(x,y;E)|^s\right) \le \mathbb{E}\left(|G(x,y;E)|^s\right) + \frac{C_s}{\lambda^s} \sum_{u\in\Lambda_L(x),\ u'\in\mathbb{Z}^d\setminus\Lambda_L(x)} \tau_{u,u'}^s\, \mathbb{E}\left(|G(x,u';E)|^s\right). \tag{4.6}$$
Splitting the sum as in Eq. (4.5), we get
$$\sup_{L/2\le\|x-y\|\le L} \mathbb{E}\left(|G_{\Lambda_L(x)}(x,y;E)|^s\right) \le \left(\frac{C_s}{\lambda^s}\right)^{2} \mathrm{Const.}\, (L/2)^d\, e^{-mL/2} + \left(1 + \mathrm{Const.}\, L^{d-1}\right) \times \sup_{L/2\le\|x-y\|\le L} \mathbb{E}\left(|G(x,y;E)|^s\right). \tag{4.7}$$
The combination of Eq. (4.7) with (4.5), yields the claim – for the given energy. Again, the existence of an open interval of energies in which the condition is met is implied by the continuity of the finite-volume expectation values.
4.2. Lower bounds for $G_\omega(x,y;E_{\mathrm{edge}}+i0)$ at mobility edges. Boundary points of the continuous spectrum are often referred to as mobility edges. (In an ergodic setting the location of such points does not depend on the realization $\omega$ [33].) The proof of the occurrence of continuous spectrum for random stochastically shift-invariant operators on $\mathbb{Z}^d$ is still an open problem (one may add that we are here glossing over some fine distinctions in the dynamical behaviour [34]). However, it is interesting to note that Theorem 4.1 directly yields the following pair of lower bounds on the decay rate of the Green function at mobility edges, $E_{\mathrm{edge}}$, for stochastically shift invariant random operators with regular probability distribution of the potential:
$$\sup_{L/2\le\|y\|\le L} \mathbb{E}\left(\left|\langle 0|\frac{1}{H_{[-L,L]^d,\omega}-E_{\mathrm{edge}}}|y\rangle\right|^s\right) \ge B_1\, L^{-3(d-1)}, \tag{4.8}$$
$$\sup_{L/2\le\|y\|\le L} \mathbb{E}\left(\left|\langle 0|\frac{1}{H_\omega-E_{\mathrm{edge}}-i0}|y\rangle\right|^s\right) \ge B_2\, L^{-4(d-1)}, \tag{4.9}$$
with $\|y\| \equiv \max_j |y_j|$. We do not expect the power laws provided here to be optimal. As mentioned above, vaguely similar bounds are known for the critical two-point functions in certain statistical mechanical models (percolation, Ising spin systems, and some O(N) spin models).
4.3. Extending off the real axis. For various applications, such as the decay of the projection kernel (see [8, Sect. 5]), it is useful to have bounds on the resolvent at $z = E + i\eta$ which are uniform in $\eta$. The following result shows that in order to establish such uniform bounds it is sufficient to verify our criteria for real energies in some neighborhood of $E$.

Theorem 4.2. Let $H_\omega$ be a random operator on $\ell^2(\mathbb{Z}^d)$ with an s-regular distribution for the potential $(V_\omega(x))$ and tempered off-diagonal matrix elements $(T_{x,y;\omega})$. Suppose that for some $E \in \mathbb{R}$, and $\Delta E > 0$, the following bound holds uniformly for $\xi \in [E-\Delta E, E+\Delta E]$:
$$\mathbb{E}\left(\left|\langle x|\frac{1}{H_\omega-\xi-i0}|y\rangle\right|^s\right) \le A\, e^{-\mu|x-y|}. \tag{4.10}$$
Then for all $\eta \in \mathbb{R}$:
$$\mathbb{E}\left(\left|\langle x|\frac{1}{H_\omega-E-i\eta}|y\rangle\right|^s\right) \le \tilde A\, e^{-\tilde\mu|x-y|}, \tag{4.11}$$
with some $\tilde A < \infty$ and $\tilde\mu > 0$ which depend on $\Delta E$ and the bound (4.10).

Remarks. 1. This result is not needed in situations covered by the single site version of the criterion provided by Theorem 1.1, since if Eq. (1.12) is satisfied at some $E \in \mathbb{R}$ then it automatically holds uniformly along the entire line $E + i\mathbb{R}$. We do not see a monotonicity argument for such a deduction in case of other finite-volumes.

2. One way to derive the statement is by using the fact that exponential decay may be tested in finite volumes: if a finite volume criterion holds for some $E$ then continuity allows one to extend it to all $E + i\eta$ with $\eta$ sufficiently small. The Combes–Thomas estimate [35] can then be used to cover the rest of the line $E + i\mathbb{R}$. However, by this approach one gets only a weaker decay rate for energies off the real axis. It is tempting to think that some contour integration argument could be found to significantly improve on that. The proof given below is a step in that direction (though it still leaves one with the feeling that a more efficient argument should be possible).
Proof. Assume that condition (4.10) is satisfied for all $\xi \in [E-\Delta E, E+\Delta E]$. We shall show that this implies that for any power $\alpha$,
$$\mathbb{E}\left(\left|\langle x|\frac{1}{H_\omega-\xi-i\eta}|y\rangle\right|^s\right) \le \frac{A_\alpha}{|x-y|^\alpha}, \tag{4.12}$$
with the constant $A_\alpha < \infty$ uniform in $\eta$. The stated conclusion then follows by an application of Theorem 4.1 (and the uniform bounds seen in its proof). We shall deal separately with large and small $|\eta|$, splitting the two regimes at $\Delta E \times \pi/\alpha$. The case $|\eta| \ge \Delta E \times \pi/\alpha$ is covered by the general bound of Combes–Thomas [35], which states that:
$$|G(x,y;E+i\eta)| \le (2/\eta)\, e^{-m|x-y|} \tag{4.13}$$
for any $m \ge 0$ such that
$$\sum_{x\in\mathbb{Z}^d} \tau(x)\,\left(e^{m|x|}-1\right) \le \eta/2. \tag{4.14}$$
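As a numerical illustration of the Combes–Thomas bound (4.13)–(4.14), the following minimal sketch checks it on a finite one-dimensional chain. Everything specific here is an assumption made for the illustration and is not taken from the paper: nearest-neighbour hopping of constant strength `hop` (so that $\tau(\pm 1) = |{\tt hop}|$ and (4.14) reads $2|{\tt hop}|(e^m-1) \le \eta/2$), a potential drawn uniformly from $[-1,1]$, and the helper names `resolvent_column` and `combes_thomas_ratio`.

```python
import math
import random


def resolvent_column(potential, hop, z, y):
    """Return the column G(., y; z) of (H - z)^{-1} for a tridiagonal H.

    H has diagonal entries `potential` and constant nearest-neighbour
    hopping `hop`.  The system (H - z) g = e_y is solved with the Thomas
    algorithm; the pivots are nonzero because Im z != 0.
    """
    n = len(potential)
    diag = [complex(v) - z for v in potential]
    rhs = [0j] * n
    rhs[y] = 1 + 0j
    # Forward elimination of the sub-diagonal.
    for i in range(1, n):
        w = hop / diag[i - 1]
        diag[i] -= w * hop
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    g = [0j] * n
    g[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        g[i] = (rhs[i] - hop * g[i + 1]) / diag[i]
    return g


def combes_thomas_ratio(n=40, hop=1.0, energy=0.3, eta=0.5, seed=0):
    """Largest value of |G(x,y)| / ((2/eta) e^{-m|x-y|}) over all pairs."""
    rng = random.Random(seed)
    potential = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Largest m allowed by (4.14) when tau(+1) = tau(-1) = |hop|:
    # 2 |hop| (e^m - 1) = eta / 2.
    m = math.log(1.0 + eta / (4.0 * abs(hop)))
    z = complex(energy, eta)
    worst = 0.0
    for y in range(n):
        column = resolvent_column(potential, hop, z, y)
        for x in range(n):
            bound = (2.0 / eta) * math.exp(-m * abs(x - y))
            worst = max(worst, abs(column[x]) / bound)
    return worst


ratio = combes_thomas_ratio()
print(f"max |G| / Combes-Thomas bound = {ratio:.4f}")
```

The printed ratio should not exceed 1 for any realization of the potential, since the bound is deterministic; the decay rate $m$ shrinks like $\eta$ as $\eta \to 0$, which is why the bound alone does not reach the real axis.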
To estimate the resolvent for $|\eta| \le \Delta E \times \pi/\alpha$, we shall use the fact that the function
$$f_L(\zeta) = \mathbb{E}\left(|G_{[-L,L]^d}(x,y;\zeta)|^s\right) \tag{4.15}$$
is subharmonic in the upper half plane, and continuous at the boundary. The subharmonicity is a general consequence of the analyticity of the resolvent in $\zeta$, and the continuity is implied through the continuity of the distribution of the potential. $L$ serves as a convenient cutoff, which may be removed after the bounds are derived (since $H_{[-L,L]^d,\omega} \to H_\omega$ as $L \to \infty$ in the strong resolvent sense). Let $D \subset \mathbb{C}$ be the triangular region in the upper half plane in the form of an isosceles triangle based on the real interval $[E-\Delta E, E+\Delta E]$ with the side angles equal to $\theta$, determined by the condition
$$\alpha = \frac{2\pi}{\theta} - 1. \tag{4.16}$$
The Poisson-kernel representation of harmonic functions yields, for $E + i\eta \in D$,
$$f_L(E+i\eta) \le \int_{\partial D} f_L(\zeta)\, P^D_{E+i\eta}(d\zeta), \tag{4.17}$$
where $P^D_{E+i\eta}(d\zeta)$ is a certain probability measure on $\partial D$. We now rely on the fact that this probability measure satisfies
$$P^D_{E+i\eta}(d\zeta) \le \mathrm{Const.}\ d(\eta^{2\pi/\theta})\,/\,\Delta E^{2\pi/\theta}. \tag{4.18}$$
(This is easily understood upon the unfolding of $D$ by the map $z \mapsto z^{2\pi/\theta}$ applied from either of the base corners of $D$, i.e., from $\zeta = E \pm \Delta E$, and a comparison with the Poisson kernel in the upper half plane.) For $\zeta \in \partial D \cap \mathbb{R}$ the integrand satisfies the exponential bound (4.10). Along the rest of the boundary of $D$ we use the Combes–Thomas bound (4.13). Putting it all together we get
$$f_L(E+i\eta) \le A\, e^{-\mu|x-y|} + \mathrm{Const.} \int_0^{\Delta E\,\theta} \frac{2}{\eta}\, e^{-\mathrm{Const.}\,|x-y|\,\eta}\ d(\eta^{2\pi/\theta})\,/\,\Delta E^{2\pi/\theta}. \tag{4.19}$$
The claimed Eq. (4.12) follows by simple integration, and the relation (4.16).
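The "simple integration" can be sketched as follows. The shorthand is ours and not the paper's: $\beta = 2\pi/\theta$, $r = |x-y|$, and $c$ stands for the unspecified constant in the exponent of (4.19); we assume $\beta > 1$ so that the integral converges at the origin.

```latex
\begin{align*}
\int_0^{\Delta E\,\theta} \frac{2}{\eta}\, e^{-c\,r\,\eta}\,
   \frac{d(\eta^{\beta})}{\Delta E^{\beta}}
 &= \frac{2\beta}{\Delta E^{\beta}} \int_0^{\Delta E\,\theta}
    \eta^{\beta-2}\, e^{-c\,r\,\eta}\, d\eta \\
 &\le \frac{2\beta}{\Delta E^{\beta}} \int_0^{\infty}
    \eta^{\beta-2}\, e^{-c\,r\,\eta}\, d\eta
  = \frac{2\beta\,\Gamma(\beta-1)}{\Delta E^{\beta}\, c^{\,\beta-1}}\;
    \frac{1}{r^{\beta-1}} .
\end{align*}
```

By the relation (4.16), $\beta - 1 = \alpha$, so this contribution is bounded by a constant times $|x-y|^{-\alpha}$, while the first term, $A e^{-\mu|x-y|}$, decays faster than any power.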
4.4. Relation with the multiscale analysis and density of states estimates. Using the above results we shall now show that the fractional moment localization condition is satisfied throughout the regime for which localization can be shown via the multiscale analysis, and also in regimes over which one has suitable bounds (e.g., via Lifshitz tail estimates) on the density of states of the operators restricted to finite regions $\Lambda_L = [-L,L]^d$. The following result is useful for the latter case.

Theorem 4.3. Let $H_\omega$ be a random operator on $\ell^2(\mathbb{Z}^d)$ with tempered off-diagonal matrix elements $(T_{x,y;\omega})$ and a distribution of the potential which is s-regular for all $s$ small enough, which is stochastically invariant under magnetic shifts. Then, given $\beta \in (0,1)$, $C_1 > 0$, and $\xi > 3(d-1)$, there exist $L_0 > 0$ and $C_2 > 0$ such that if for some $L \ge L_0$,
$$\mathrm{Prob}\left(\mathrm{dist}\big(\sigma(H_{\Lambda_L;\omega}), E\big) \le C_1 L^{-\beta}\right) < C_2 L^{-\xi}, \tag{4.20}$$
at some energy $E$, then the exponential localization condition (1.3) holds in some open interval containing $E$.

The condition (4.20) is similar to the one used in the multiscale analysis, although there one can also find a sufficient diagnostic with arbitrary $\xi > 0$. It may therefore not be initially clear that the methods of this paper may be used throughout the regime in which the multiscale analysis applies. However, the proof of Theorem 4.3 is easily adapted to prove the following result which implies fractional moment localization via the conclusions of the multiscale analysis.

Theorem 4.4. Let $H_\omega$ be a random operator with tempered off-diagonal matrix elements $(T_{x,y;\omega})$ and a distribution of the potential which is s-regular for all $s$ small enough, which is stochastically invariant under magnetic shifts. If for some $E \in \mathbb{R}$ there exist $A < \infty$, $\mu > 0$, and $\xi > 3(d-1)$ such that
$$\lim_{L\to\infty} L^{\xi}\, \mathrm{Prob}\left(|G_{\Lambda_L;\omega}(0,x)| > A e^{-\mu|x|} \text{ for some } x \in \Lambda_L\right) = 0, \tag{4.21}$$
then the exponential localization condition (1.3) holds in some open interval containing $E$.

Remarks. 1. When the multiscale analysis applies, it allows one to conclude that there are $A < \infty$ and $\mu > 0$ such that the probabilities appearing on the left side of Eq. (4.21) decay faster than any power of $L$ as $L \to \infty$. Thus, the conclusions of the multiscale analysis imply that exponential localization in the stronger sense discussed in our work applies throughout the regime which may be reached by this prior method.

2. It is of interest to combine the criterion presented above with Lifshitz tail estimates on the density of states at the bottom of the spectrum, $E_0$, and at band edges. Using Lifshitz tail estimates, it is possible to show that [36]:
$$\mathrm{Prob}\left(\inf \sigma(H_{\Lambda_L;\omega}) \le E_0 + \Delta E\right) \le \mathrm{Const.}\, L^d\, e^{-\Delta E^{-d/2}}. \tag{4.22}$$
Theorem 4.3 then implies fractional moment localization in a neighborhood of $E_0$; we need only choose $\Delta E \propto L^{-\beta}$ with $\beta \in (0,1)$ for large enough $L$. Previous results in this vein may be found in [21, 16–18].
Proof of Theorems 4.3 and 4.4. We first prove Theorem 4.3 and then indicate how the proof can be modified to show Theorem 4.4. Fix an energy $E \in \mathbb{R}$. For $L > 0$, define
$$p_L(\delta) := \mathrm{Prob}\left(\mathrm{dist}\big(\sigma(H_{\Lambda_L;\omega}), E\big) \le \delta\right), \tag{4.23}$$
and let
$$\delta_L := C_1 L^{-\beta}. \tag{4.24}$$
We will show that for suitable $s \in (0,1)$, $L_0 > 0$ and $C_2 > 0$, if
$$p_L(\delta_L) < C_2 L^{-\xi}, \tag{4.25}$$
then the input condition (4.1) of Theorem 4.1:
$$L^{3(d-1)} \sup_{L/2\le\|y\|\le L} \mathbb{E}\left(\left|\langle 0|\frac{1}{H_{\Lambda_L,\omega}-\tilde E}|y\rangle\right|^s\right) \le B_1, \tag{4.26}$$
is satisfied for all energies $\tilde E \in [E-\tfrac12\delta_L, E+\tfrac12\delta_L]$. Exponential localization in the corresponding interval (and strip, with $\eta \ne 0$) follows then by Theorem 4.1 (and Theorem 4.2).

First we must show how to estimate $\mathbb{E}(|G_{\Lambda_L;\omega}(0,u;\tilde E)|^s)$ in terms of $p_L(\delta)$. This is achieved by considering separately the contributions from the “good set”:
$$\Omega_G = \{\omega \mid \mathrm{dist}\big(\sigma(H_{\Lambda_L;\omega}), E\big) > \delta\}, \tag{4.27}$$
and its complement, the “bad set”: $\Omega_B = \Omega_G^c$. On the “good set”, $\omega \in \Omega_G$, the energy $\tilde E$ is at a small yet significant distance ($\Delta E \ge \tfrac12\delta$) from the spectrum of $H_{\Lambda_L;\omega}$. In this situation, we use the Combes–Thomas [35] bound, by which:
$$|G_{\Lambda_L;\omega}(0,u;\tilde E)| \le \frac{2}{\Delta E}\, e^{-\frac12 \Delta E\,|u|}. \tag{4.28}$$
The above estimate does not apply on the “bad set”. However, using the Hölder inequality, we find that the net contribution to the expectation is small because $\mathrm{Prob}(\Omega_B) = p_L(\delta)$ is small. The two estimates are combined in the following bound:
$$\begin{aligned}
\mathbb{E}\left(|G_{\Lambda_L;\omega}(0,u;\tilde E)|^s\right)
 &= \mathbb{E}\left(|G_{\Lambda_L;\omega}(0,u;\tilde E)|^s\, I[\omega\in\Omega_G]\right) + \mathbb{E}\left(|G_{\Lambda_L;\omega}(0,u;\tilde E)|^s\, I[\omega\in\Omega_B]\right) \\
 &\le 4^s \delta^{-s} e^{-s|u|\delta/4} + \mathbb{E}\left(|G_{\Lambda_L;\omega}(0,u;\tilde E)|^t\right)^{s/t}\, \mathbb{E}\left(I[\omega\in\Omega_B]\right)^{1-s/t} \\
 &\le 4^s \delta^{-s} e^{-s|u|\delta/4} + \frac{C_t^{s/t}}{\lambda^s}\, p_L(\delta)^{1-s/t},
\end{aligned} \tag{4.29}$$
where $t$ is any number greater than $s$ for which the distribution of the potential is still t-regular (i.e., $C_t < \infty$). The required bound, Eq. (4.26), is satisfied once one chooses $s$ small enough so that $\xi \ge \frac{t}{t-s}\, 3(d-1)$, $C_2$ small enough, and $L_0$ large enough so that for $L > L_0$ (using $\delta = \delta_L$ and $|u| \ge L/2$ in (4.29)),
$$4^s\, C_1^{-s}\, L^{3(d-1)+s\beta}\, e^{-s C_1 L^{1-\beta}/8} \le B_1/2. \tag{4.30}$$
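For orientation, here is a sketch of how the two terms of (4.29) are controlled once they are multiplied by the prefactor $L^{3(d-1)}$ of (4.26). It assumes $\delta = \delta_L = C_1 L^{-\beta}$, the hypothesis (4.25), and $|u| \ge L/2$ (which holds on the range of the supremum in (4.26), since the Euclidean length dominates the max-norm); the factor $8$ in the exponent comes from this last substitution.

```latex
\begin{align*}
L^{3(d-1)}\; 4^s\, \delta_L^{-s}\, e^{-s|u|\delta_L/4}
  &\le 4^s\, C_1^{-s}\, L^{3(d-1)+s\beta}\, e^{-s\, C_1 L^{1-\beta}/8}, \\
L^{3(d-1)}\; \frac{C_t^{s/t}}{\lambda^s}\; p_L(\delta_L)^{1-s/t}
  &\le \frac{C_t^{s/t}}{\lambda^s}\; C_2^{\,1-s/t}\;
       L^{\,3(d-1)-\xi\,(t-s)/t} .
\end{align*}
```

Since $\beta < 1$, the first line tends to zero as $L \to \infty$, so it is below $B_1/2$ for $L$ beyond some $L_0$. In the second line the exponent of $L$ is non-positive exactly when $\xi \ge \frac{t}{t-s}\,3(d-1)$, and the remaining constant is below $B_1/2$ for $C_2$ small enough. Together the two lines give (4.26).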
Finally let us remark on how this argument can be adapted to prove Theorem 4.4. We simply define the good and bad sets differently:
$$\Omega_G = \{\omega \mid |G_{\Lambda_L;\omega}(0,x)| \le A e^{-\mu|x|} \text{ for all } x \in \Lambda_L\}, \tag{4.31}$$
and $\Omega_B = \Omega_G^c$, and then proceed as in the proof of Theorem 4.3 using Hölder’s inequality to estimate the contributions from $\Omega_B$. It is easy to see that for large $L$, the condition (4.21) implies that the input for Theorem 4.1 is satisfied.

Thus, we have seen here that the fractional moment localization condition holds throughout the regime for which localization can be established by any available methods. This is meaningful since that condition carries a number of physically significant implications.

Appendix A. Dynamical Localization

Among the implications of the fractional moment condition is dynamical localization, expressed through uniform exponential decay of the average time evolution kernels:
$$\mathbb{E}\left(\sup_{t\in\mathbb{R}} \left|\langle x| P_{H_\omega\in F}\, e^{itH} |y\rangle\right|\right) \le A\, e^{-\mu|x-y|}, \tag{A.1}$$
where $P_{H_\omega\in F}$ indicates the spectral projection of $H_\omega$ onto a set $F \subset \mathbb{R}$ in which the fractional moment condition is known to hold. A derivation of this implication, under some auxiliary assumptions on the distribution of the potential, was given in ref. [13]. For completeness we offer here a streamlined version of that argument, which also extends the result in that we now allow $F$ to be an unbounded set (in particular the full real line).

The inequality expressed in Eq. (A.1) is not special to the time evolution operators $f_t(E) = e^{itE}$; it follows, rather, from a similar bound on the average total mass of the spectral measures, $\mu^{x,y}_\omega$, associated to pairs of sites $x, y$. The measures are defined by the spectral representation:
$$\int f(E)\, \mu^{x,y}_\omega(dE) := \langle x| f(H_\omega) |y\rangle, \tag{A.2}$$
for bounded Borel functions $f$. In the following discussion we denote by $|\mu^{x,y}_\omega|$ the absolute value (sometimes called the total variation) of $\mu^{x,y}_\omega$.

Theorem A.1. Let $H_\omega$ be a random operator on $\ell^2(\mathbb{Z}^d)$ with tempered off-diagonal matrix elements and a potential $V_\omega$ which satisfies:

1. For some $\delta \in (0,1)$, the δ-moments of $V_\omega$, $\mathbb{E}(|V_\omega(x)|^\delta)$, are uniformly bounded.

2. For each $x \in \mathbb{Z}^d$ the conditional distribution of $v = V_\omega(x)$ at specified values of all other matrix elements has a density $\rho^x_\omega(v)$, and the functions $\rho^x_\omega$ are uniformly bounded.

Suppose there is an energy domain $F \subset \mathbb{R}$ on which $H_\omega$ satisfies a uniform fractional moment bound, i.e., there exist $A < \infty$ and $\mu > 0$ such that, for some $s \in (0,1)$,
$$\mathbb{E}\left(\left|\langle x|\frac{1}{H_{\Lambda;\omega}-E}|y\rangle\right|^s\right) \le A\, e^{-\mu|x-y|}, \tag{A.3}$$
for any finite region $\Lambda \subset \mathbb{Z}^d$, any pair of sites $x, y \in \Lambda$, and every $E \in F$. Then there exist $A' < \infty$ and $\mu' > 0$ such that for any pair of sites $x, y \in \mathbb{Z}^d$,
$$\mathbb{E}\left(|\mu^{x,y}_\omega|(F)\right) \le A'\, e^{-\mu'|x-y|}, \tag{A.4}$$
where $\mu^{x,y}_\omega$ is the spectral measure associated to the pair $x, y$ and $H_\omega$.

Remarks. 1. Recall that for any regular Borel measure $\mu$, $|\mu|(F) = \sup \left|\int_F f(E)\, \mu(dE)\right|$, where the supremum ranges over Borel measurable (or even just continuous) functions $f$ which are point-wise bounded by 1. Thus Eq. (A.4) implies that
$$\mathbb{E}\left(\sup_t \left|\langle x| f_t(H_\omega)\, P_{H_\omega\in F} |y\rangle\right|\right) \le C A'\, e^{-\mu'|x-y|}, \tag{A.5}$$
for any uniformly bounded family of Borel functions $\{f_t\}$. In particular, we may take $f_t(E) = e^{itE}$ for $t \in \mathbb{R}$ to obtain dynamical localization (A.1) as promised.

2. The requirement that the conditional densities, $\rho^x_\omega$, be uniformly bounded is overly strong. By the arguments presented in ref. [13], the result extends to potentials for which there is some $q > 0$ such that $\int (\rho^x_\omega(v))^{1+q}\, dv$ are uniformly bounded.

3. Since this work extends now the exponential dynamical localization to the regime covered by the multiscale analysis, let us mention that prior results covering this regime include the proof of localization in terms of power-law bounds for the time evolution kernel [37, 38]. (The analysis there is more general since it applies also to models for which the fractional moment method has not been developed, e.g., continuum operators).

Proof of Theorem A.1. It is convenient to derive the result through the analysis of the finite volume operators obtained by restricting $H_\omega$ to finite regions, $\Lambda_n \subset \mathbb{Z}^d$. It is generally understood that for each $x, y \in \mathbb{Z}^d$ and each increasing sequence of finite regions $\Lambda_n$ which contain $\{x, y\}$ and whose union is $\mathbb{Z}^d$, the associated spectral measures, $\mu^{x,y}_{\Lambda_n;\omega}$, converge in the vague topology to $\mu^{x,y}_\omega$. Thus, by the lemma of Fatou, for any $F \subset \mathbb{R}$: $\mathbb{E}(|\mu^{x,y}_\omega|(F)) \le \liminf_{n\to\infty} \mathbb{E}(|\mu^{x,y}_{\Lambda_n;\omega}|(F))$. The upshot is that it suffices to prove the following statement regarding finite volume operators.

Under the assumptions of Theorem A.1 there exist $C, r > 0$ (which depend only on the regularity assumptions for $H_\omega$) such that for any finite region $\Lambda \subset \mathbb{Z}^d$, any $x, y \in \Lambda$, any $F \subset \mathbb{R}$, and any $s \in (0,1)$:
$$\mathbb{E}\left(|\mu^{x,y}_{\Lambda;\omega}|(F)\right) \le C \left(\sup_{E\in F} \mathbb{E}\left(\left|\langle x|\frac{1}{H_{\Lambda,\omega}-E}|y\rangle\right|^s\right)\right)^{r}. \tag{A.6}$$
Following is a summary of the proof of this assertion. Let us fix a finite region $\Lambda \subset \mathbb{Z}^d$ and a pair of sites $x, y \in \Lambda$. For simplicity of notation, we will suppress the region $\Lambda$ and denote the restricted operator by $H_\omega$ and the associated spectral measure by $\mu^{x,y}_\omega$.
Since $\ell^2(\Lambda)$ is finite dimensional, $\mu^{x,y}_\omega$ is a weighted sum of Dirac measures supported on the eigenvalues of $H_\omega$. Integrals with respect to this measure are discrete sums. The argument of ref. [13] makes an essential use of the following representation of this measure. Let $v = V_\omega(x)$, and let $\hat v$ be any other value in $\mathbb{R}$. Denote $\hat\Sigma(E) := -1/\langle x|\frac{1}{\hat H_\omega-E}|x\rangle$, with $\hat H_\omega$ the operator with the potential at $x$ changed to $\hat v$. Then
$$\mu^{x,y}_\omega(dE) = -(v-\hat v)\, \langle x|\frac{1}{\hat H_\omega-E}|y\rangle\, \delta(v-\hat v-\hat\Sigma(E))\, dE. \tag{A.7}$$
In what follows, we will take $\hat v = \hat v_\omega$ to be a random variable independent of $v_\omega$ and identically distributed. In this case Eq. (A.7) holds almost surely. A special case of Eq. (A.7) is the formula (which was the basis for the important “Kotani-argument” [39, 12]) for the spectral measure at $x$,
$$\mu^{x,x}_\omega(dE) = \delta(v-\hat v-\hat\Sigma(E))\, dE. \tag{A.8}$$
The above is a probability measure. Another normalizing condition is:
$$\int |v-\hat v|^2 \left|\langle x|\frac{1}{\hat H_\omega-E}|y\rangle\right|^2 \delta(v-\hat v-\hat\Sigma(E))\, dE \le 1, \tag{A.9}$$
(which typically holds as equality). The reason for Eq. (A.9) is that by the general structure of the spectral measures, $\mu^{x,y}_\omega(dE) = R_\omega(E)\, \mu^{x,x}_\omega(dE)$, with $R_\omega(E)$ satisfying
$$\int |R_\omega(E)|^2\, \mu^{x,x}_\omega(dE) = \langle y| P_\omega |y\rangle \le 1,$$
where $P_\omega$ is the projection onto the cyclic subspace for $H_\omega$ which contains $|x\rangle$.

Let us first present the necessary estimates for the case that $F \subset \mathbb{R}$ is of finite Lebesgue measure. Using the bound Eq. (A.9), and the Hölder inequality,
$$\mathbb{E}\left(|\mu^{x,y}_\omega|(F)\right) \le \left(\mathbb{E}\int_F |v-\hat v|^\alpha \left|\langle x|\frac{1}{\hat H_\omega-E}|y\rangle\right|^\alpha \delta(v-\hat v-\hat\Sigma(E))\, dE\right)^{1/(2-\alpha)}, \tag{A.10}$$
where $\alpha\,(<1)$ is a small number to be specified later. By a further application of the Hölder inequality, followed by the Jensen inequality we obtain
$$\left[\mathbb{E}\left(|\mu^{x,y}_{\Lambda;\omega}|(F)\right)\right]^{2-\alpha} \le \left(2\,\mathbb{E}(|v|^\delta)\right)^{\alpha/\delta} \times \left(\mathbb{E}\int_F \left|\langle x|\frac{1}{\hat H_\omega-E}|y\rangle\right|^s \delta(v-\hat v-\hat\Sigma(E))\, dE\right)^{\alpha/s}, \tag{A.11}$$
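where $\alpha$ is fixed by the equation $\alpha/s + \alpha/\delta = 1$. Finally we evaluate:
$$\begin{aligned}
\mathbb{E}\int_F \left|\langle x|\frac{1}{\hat H_\omega-E}|y\rangle\right|^s \delta(v-\hat v-\hat\Sigma(E))\, dE
 &= \mathbb{E}\int_F \left|\langle x|\frac{1}{\hat H_\omega-E}|y\rangle\right|^s \rho^x_\omega(\hat v+\hat\Sigma(E))\, dE \\
 &\le \kappa \int_F \mathbb{E}\left(\left|\langle x|\frac{1}{\hat H_\omega-E}|y\rangle\right|^s\right) dE,
\end{aligned} \tag{A.12}$$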
where $\kappa$ is a uniform upper bound for $\rho^x_\omega$. These estimates can be combined to provide a bound of the form Eq. (A.6) for $F$ a finite interval, which was the case considered in ref. [13].

We shall now improve the argument, to obtain a statement which covers the case that the localized spectral regime is unbounded. Since we do not wish our final estimate to depend on the Lebesgue measure of $F$, we seek a way of introducing an integrable weight $h(E)$, so that the final bound involves the integral of $h(E)\,dE$ in place of $dE$. This may be accomplished with the following inequality:
$$|\mu^{x,y}_\omega|(F) \le \langle x|\, |g(H)|^{2p}\, |x\rangle^{\frac{1}{2p}} \left(\int_F |g(E)|^{-p'}\, |\mu^{x,y}_\omega|(dE)\right)^{\frac{1}{p'}}, \tag{A.13}$$
where $1/p + 1/p' = 1$ and $g$ is any continuous function which is bounded and bounded away from zero. To prove Eq. (A.13), write $|\mu^{x,y}_\omega|(F) = \int_F g(E)/g(E)\, |\mu^{x,y}_\omega|(dE)$, and apply the Hölder inequality followed by
$$\int |g(E)|^{p}\, |\mu^{x,y}_\omega|(dE) \le \langle x|\, |g(H)|^{2p}\, |x\rangle^{1/2}. \tag{A.14}$$
It is convenient to choose $g(E)^{2p} = (1+E^2)$, since $\langle x|(1+H_\omega^2)|x\rangle = B_\omega + V_\omega(x)^2$, where $B_\omega$ is a bounded random variable which depends only on the off-diagonal part of $H_\omega$. Upon taking expectations followed by a further application of the Hölder inequality this leads to
$$\mathbb{E}\left(|\mu^{x,y}_\omega|(F)\right) \le \left[\mathbb{E}\left(\left(B_\omega + V_\omega(x)^2\right)^{\frac{q}{2p}}\right)\right]^{1/q} \times \left[\mathbb{E}\left(\left(\int_F \frac{1}{(1+E^2)^{\frac{p'}{2p}}}\, |\mu^{x,y}_\omega|(dE)\right)^{\frac{q'}{p'}}\right)\right]^{1/q'}, \tag{A.15}$$
where $1/q + 1/q' = 1$. We estimate the two factors on the right-hand side of this inequality separately. The first factor can be controlled by choosing $q = p\delta$ so that
$$\mathbb{E}\left(\left(B_\omega + V_\omega(x)^2\right)^{\frac{q}{2p}}\right) \le \|B_\omega\|_\infty^{\delta/2} + \mathbb{E}\left(|V_\omega(x)|^\delta\right). \tag{A.16}$$
The exponents $p, p', q, q'$ are all specified once we choose $p > 1/\delta$. Specifically, $q = \delta p$, $q' = p(p-1/\delta)^{-1}$, and $p' = p(p-1)^{-1}$. Note that $p' < q'$.

To estimate the second factor, we note that $|\mu^{x,y}_\omega|$ is a sub-probability measure and $q'/p' > 1$, so by the Jensen inequality,
$$\mathbb{E}\left(\left(\int_F \frac{1}{(1+E^2)^{\frac{p'}{2p}}}\, |\mu^{x,y}_\omega|(dE)\right)^{\frac{q'}{p'}}\right) \le \mathbb{E}\left(\int_F \frac{1}{(1+E^2)^{\frac{q'}{2p}}}\, |\mu^{x,y}_\omega|(dE)\right). \tag{A.17}$$
Estimating the right hand side with the argument outlined above for $F$ with finite Lebesgue measure, we find that
$$\mathbb{E}\left(\int_F \frac{1}{(1+E^2)^{\frac{q'}{2p}}}\, |\mu^{x,y}_\omega|(dE)\right) \le \left(2\,\mathbb{E}(|v|^\delta)\right)^{\alpha/\delta} \times \left(\kappa \int_F \frac{dE}{(1+E^2)^{q'/2p}}\, \mathbb{E}\left(\left|\langle x|\frac{1}{\hat H_\omega-E}|y\rangle\right|^s\right)\right)^{\alpha/s}, \tag{A.18}$$
which is uniformly bounded provided we choose $p$ such that $q'/p > 1$. This is possible since $q'/p = (p-1/\delta)^{-1}$ which can be made as large as we like.

Thus, for any finite volume $\mathbb{E}(|\mu^{x,y}_{\Lambda;\omega}|(F))$ can be bounded by a constant multiple of $\sup_{E\in F} \mathbb{E}\left(\left|\langle x|\frac{1}{\hat H_{\Lambda;\omega}-E}|y\rangle\right|^s\right)$ raised to a certain power. Which multiple and which power depend only on the δ-moments of the potential and the uniform bound on the conditional distributions $\rho^x_\omega$. By the vague convergence argument outlined at the start of the proof, this proves the theorem.
B. A Fractional Moment Bound

The regularity conditions $R_1(\tau)$ and $R_2(s)$ have been used to give a priori estimates of certain fractional moments. Such fractional moment bounds are properties of the general class of operators with diagonal disorder. Hence, throughout this appendix, we consider random operators $H_\omega$ on $\ell^2(\mathcal{T})$ of the form
$$H_\omega = T_0 + \lambda V_\omega, \tag{B.1}$$
where $T_0$ is an arbitrary bounded self-adjoint operator and $V_\omega$ is a random potential for which $V_\omega(x)$ are independent random variables ($\mathcal{T}$ is any countable set).

Lemma B.1. Let $H_\omega$ be a random operator given by Eq. (B.1) such that for each $x$ the probability distribution of the potential $V_\omega(x)$ satisfies $R_1(\tau)$ for some fixed $\tau > 0$ with constants uniform in $x$. Then there exists $\kappa_\tau < \infty$ such that for any finite subset $\Lambda$ of $\mathcal{T}$, any $x, y \in \Lambda$, any $z \in \mathbb{C}$, and any $s \in (0,\tau)$,
$$\mathbb{E}\left(\left|\langle x|\frac{1}{H_{\Lambda;\omega}-z}|y\rangle\right|^s \,\Big|\, \{V(u)\}_{u\in\Lambda\setminus\{x,y\}}\right) \le \frac{\tau}{\tau-s}\, \frac{(4\kappa_\tau)^{s/\tau}}{\lambda^s}. \tag{B.2}$$

Proof. Let us first consider $z = E \in \mathbb{R}$. For such energies Eq. (B.2) is a consequence of a Wegner type estimate on the 2-dimensional subspace spanned by $|x\rangle$, $|y\rangle$. The key is to determine the correct expression for the dependence of $\langle x|\frac{1}{H_{\Lambda;\omega}-E}|y\rangle$ on $V_\omega(x)$ and $V_\omega(y)$. Such an expression is given by the “Krein formula”:
$$\langle x|\frac{1}{H_{\Lambda;\omega}-E}|y\rangle = \left\langle 1\,\left|\, \left([A]^{-1} + \lambda \begin{pmatrix} V_\omega(x) & 0 \\ 0 & V_\omega(y) \end{pmatrix}\right)^{-1} \right|\, 2\right\rangle, \tag{B.3}$$
where $[A]$ is a $2\times 2$ matrix whose entries do not depend on $V_\omega(x)$ or $V_\omega(y)$. In fact,
$$[A] = \begin{pmatrix} \langle x|\frac{1}{\widehat H_{\Lambda;\omega}-E}|x\rangle & \langle x|\frac{1}{\widehat H_{\Lambda;\omega}-E}|y\rangle \\[4pt] \langle y|\frac{1}{\widehat H_{\Lambda;\omega}-E}|x\rangle & \langle y|\frac{1}{\widehat H_{\Lambda;\omega}-E}|y\rangle \end{pmatrix}, \tag{B.4}$$
where $\widehat H_{\Lambda;\omega}$ denotes the operator obtained from $H_{\Lambda;\omega}$ by setting $V_\omega(x)$ and $V_\omega(y)$ equal to zero. The regularity condition $R_1(\tau)$ implies a Wegner type estimate:
$$\mathrm{Prob}\left(\left\|\left([A]^{-1} + \lambda \begin{pmatrix} V_\omega(x) & 0 \\ 0 & V_\omega(y) \end{pmatrix}\right)^{-1}\right\| > t \,\Big|\, \{V_\omega(u)\}_{u\ne x,y}\right) \le \frac{4\kappa_\tau}{(\lambda t)^\tau}, \tag{B.5}$$
where $\kappa_\tau$ is any finite number such that for every $v \in \mathcal{T}$, $a \in \mathbb{R}$, and $\varepsilon > 0$,
$$\mathrm{Prob}\left(V_\omega(v) \in (a-\varepsilon, a+\varepsilon)\right) \le \kappa_\tau\, \varepsilon^\tau. \tag{B.6}$$
The desired bound (B.2) follows easily from Eq. (B.5). (The factor, 4, on the right hand side of (B.5) arises as the square of the “volume” of the region $\{x, y\}$. In the case $x = y$, we could replace this factor by 1.)

Although the Krein formula (B.3) is true when $E$ is replaced by any $z \in \mathbb{C}$, the resulting matrix $[A]$ may not be normal if $z \notin \mathbb{R}$. (The resolvent, $\frac{1}{H-z}$, is normal. However, given an orthogonal projection, $P$, the operator $P\frac{1}{H-z}P$ may not be normal!) Yet, the Wegner-like estimate (B.5) holds only when $[A]$ is a normal matrix. At first, this seems to be an obstacle to the extension of (B.2) to all values of $z$. However, once the inequality is known for real values of $z$, it follows for all $z \in \mathbb{C}$ from analytic properties of the resolvent. Specifically, the function
$$\varphi(z) = \left|\langle x|\frac{1}{H_{\Lambda;\omega}-z}|y\rangle\right|^s \tag{B.7}$$
is sub-harmonic in the upper and lower half planes and decays as $z \to \infty$. Hence, $\varphi(z)$ is dominated by the convolution of its boundary values with a Poisson kernel:
$$\varphi(E+i\eta) \le \int \varphi(\tilde E)\, \frac{|\eta|}{(\tilde E-E)^2+\eta^2}\, \frac{d\tilde E}{\pi}. \tag{B.8}$$
By Fubini’s theorem and Eq. (B.2) for $\tilde E \in \mathbb{R}$, (B.2) is seen to hold for all $z \in \mathbb{C}$.
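The step from the Wegner-type estimate (B.5) to the constant in (B.2) can be spelled out with the layer-cake formula. In this sketch $X$ is our shorthand for the norm appearing on the left of (B.5), the conditioning on $\{V_\omega(u)\}_{u\ne x,y}$ is suppressed, and $s < \tau$ is used for the convergence of the tail integral; since the matrix element in (B.3) is bounded in modulus by $X$, the same bound applies to it.

```latex
\begin{align*}
\mathbb{E}\,(X^s)
  &= \int_0^\infty \mathrm{Prob}\,\big(X > u^{1/s}\big)\,du
   \;\le\; \int_0^\infty \min\Big\{1,\ \frac{4\kappa_\tau}{\lambda^\tau}\,u^{-\tau/s}\Big\}\,du \\
  &= u_0 + \frac{4\kappa_\tau}{\lambda^\tau}\int_{u_0}^\infty u^{-\tau/s}\,du,
   \qquad u_0 := \frac{(4\kappa_\tau)^{s/\tau}}{\lambda^s}, \\
  &= u_0 + \frac{s}{\tau-s}\,u_0
   \;=\; \frac{\tau}{\tau-s}\,\frac{(4\kappa_\tau)^{s/\tau}}{\lambda^s}.
\end{align*}
```

Here $u_0$ is the point at which the two entries of the minimum coincide, and the last line reproduces the right hand side of (B.2).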
The “all for one” principle mentioned previously is actually a simple consequence of Lemma B.1.
Lemma B.2. Let $H_\omega$ be a random operator as described in Lemma B.1, and suppose that there is a distance function dist on T such that for some s < τ and some z ∈ C
$$E\Big(\Big|\Big\langle x\Big|\frac{1}{H_\omega-z}\Big|y\Big\rangle\Big|^{s}\Big)\le A(s)\,e^{-\mu(s)\,\mathrm{dist}(x,y)}, \tag{B.9}$$
for every x, y ∈ T. Then, in fact, (B.9) holds, with modified constants A(r) and µ(r), when s is replaced by any r < τ.
Proof. Write $g=\langle x|(H_{\Lambda;\omega}-E)^{-1}|y\rangle$. Note that given r, s > 0 with r < s < τ,
$$E\big(|g|^{r}\big)\le E\big(|g|^{s}\big)^{r/s},$$
$$E\big(|g|^{s}\big)\le E\big(|g|^{r}\big)^{\frac{t-s}{t-r}}\,E\big(|g|^{t}\big)^{\frac{s-r}{t-r}}\le\left(\frac{(4\kappa_\tau)^{t/\tau}}{\lambda^{t}}\right)^{\frac{s-r}{t-r}}E\big(|g|^{r}\big)^{\frac{t-s}{t-r}}, \tag{B.10}$$
where t is any number with s < t < τ.
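The two inequalities used in this proof are the Jensen and Hölder (moment interpolation) bounds, which hold for any probability measure. A minimal numerical sketch follows; the sample is a hypothetical heavy-tailed stand-in for the Green function, not data from the model, and the exponents r < s < t are chosen arbitrarily for illustration.

```python
import random

def moment(xs, p):
    """Empirical p-th absolute moment of the sample xs."""
    return sum(abs(x) ** p for x in xs) / len(xs)

def interpolation_bound(xs, r, s, t):
    """Hoelder bound on the s-th moment in terms of the r-th and t-th moments."""
    a = (t - s) / (t - r)
    b = (s - r) / (t - r)
    return moment(xs, r) ** a * moment(xs, t) ** b

random.seed(0)
# Hypothetical sample: reciprocals of Gaussians have a heavy tail,
# so only fractional moments below 1 are finite in the limit.
sample = [1.0 / abs(random.gauss(0.0, 1.0)) for _ in range(10000)]

r, s, t = 0.2, 0.4, 0.6
print("E|g|^s           :", moment(sample, s))
print("interpolation rhs:", interpolation_bound(sample, r, s, t))
print("E|g|^r           :", moment(sample, r))
print("(E|g|^s)^(r/s)   :", moment(sample, s) ** (r / s))
```

Both inequalities hold exactly for the empirical measure, so the printed left-hand sides never exceed the corresponding right-hand sides (up to rounding).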
C. Decoupling Inequalities
C.1. Decoupling inequalities for Green functions. The condition R2(s) plays a crucial role in several of the arguments presented in this paper. It has been used to bound expectations of products of Green functions in terms of products of expectations. In this section we demonstrate the validity of the necessary bounds. The main result is the following:
Lemma C.1. Let $H_\omega$ be a random operator given by Eq. (B.1), with an s-regular distribution of the potential $V_\omega(x)$. Then
1. For any $\Omega_1,\Omega_2\subset T$, any $x,y\in\Omega_1$, and any $u,v\in\Omega_2$,
$$E\big(|G_{\Omega_1}(x,y;z)|^{s}\,|G_{\Omega_2}(u,v;z)|^{s}\big)\le\frac{C_s}{\lambda^{s}}\,E\big(|G_{\Omega_1}(x,y;z)|^{s}\big). \tag{C.1}$$
2. For any .1 ∩ .2 = ∅, x, u ∈ .1 , v, y ∈ .2 , and .3 ⊂ +,
$$E\big(|G_{\Omega_1}(x,u;z)|^{s}\,|G_{\Omega_3}(u,v;z)|^{s}\,|G_{\Omega_2}(v,y;z)|^{s}\big)\le\frac{C_s}{\lambda^{s}}\,E\big(|G_{\Omega_1}(x,u;z)|^{s}\big)\,E\big(|G_{\Omega_2}(v,y;z)|^{s}\big). \tag{C.2}$$
Lemma C.1 is a consequence of the conditional expectation bound (B.2), the Krein formula (B.3), and the following:
Lemma C.2. Let $V_1$, $V_2$ be independent real valued random variables which satisfy R2(s) for some s > 0. Then there exists $D_s^{(2)}>0$ such that
$$E\big(|F(V_1,V_2)|^{s}\,|G(V_1,V_2)|^{s}\big)\le D_s^{(2)}\,E\big(|F(V_1,V_2)|^{s}\big)\,E\big(|G(V_1,V_2)|^{s}\big), \tag{C.3}$$
where F and G are arbitrary functions of the form
$$F(V_1,V_2)=\frac{1}{L_1(V_1,V_2)}, \tag{C.4}$$
$$G(V_1,V_2)=\frac{L_2(V_1,V_2)}{L_3(V_1,V_2)}, \tag{C.5}$$
with $\{L_i\}$ functions which are linear in each variable separately. In fact, we may take $D_s^{(2)}=D_{s;1}D_{s;2}$, where, for j = 1, 2, $D_{s;j}$ is the decoupling constant for $V_j$.
Proof. Let f(V) and g(V) be two functions of the appropriate form for the decoupling lemma. Then, with j = 1, 2,
$$E\big(|f(V_j)|^{s}\,|g(V_j)|^{s}\big)\le D_{s;j}\,E\big(|f(\widetilde V_j)|^{s}\,|g(V_j)|^{s}\big), \tag{C.6}$$
where $\widetilde V_j$ indicates an independent variable distributed identically to $V_j$. Now, if F and G are functions of 2 variables of the given form, then at fixed values of $V_2$, they satisfy the 1 variable decoupling lemma, so
$$E\big(|F(V_1,V_2)|^{s}\,|G(V_1,V_2)|^{s}\big)\le D_{s;1}\,E\big(|F(\widetilde V_1,V_2)|^{s}\,|G(V_1,V_2)|^{s}\big). \tag{C.7}$$
For fixed values of $\widetilde V_1$ and $V_1$, $F(\widetilde V_1,V_2)$ and $G(V_1,V_2)$ (as functions of $V_2$) are again of the correct form to apply the 1 variable decoupling lemma. Thus,
$$E\big(|F(V_1,V_2)|^{s}\,|G(V_1,V_2)|^{s}\big)\le D_{s;1}D_{s;2}\,E\big(|F(\widetilde V_1,\widetilde V_2)|^{s}\,|G(V_1,V_2)|^{s}\big)=D_{s;1}D_{s;2}\,E\big(|F(V_1,V_2)|^{s}\big)\,E\big(|G(V_1,V_2)|^{s}\big). \tag{C.8}$$
C.2. A condition for the validity of R2(s). Decoupling lemmas have been discussed already in references [11, 13, 8]. Though these contain results similar to those required here, they do not provide the exact condition used in this work. Hence, we briefly present an elementary condition under which R2(s) is satisfied. The following discussion is by no means exhaustive. Rather, we simply wish to show that the condition R2(s) is not devoid of meaningful examples.
Lemma C.3. Let ρ be a measure with bounded support which satisfies R1(τ). Then for any s < τ/4, ρ satisfies R2(s).
Proof. For each s > 0, we define
$$\phi_s(z)=\int\frac{1}{|V-z|^{s}}\,\rho(dV), \tag{C.9}$$
$$\psi_s(z,w)=\int\frac{|V-z|^{s}}{|V-w|^{s}}\,\rho(dV), \tag{C.10}$$
$$\gamma_s(z,w,\zeta)=\int\frac{|V-z|^{s}}{|V-w|^{s}\,|V-\zeta|^{s}}\,\rho(dV). \tag{C.11}$$
Property R2(s) amounts to the statement that
$$\sup_{z,w,\zeta\in\mathbb C}\frac{\gamma_s(z,w,\zeta)}{\phi_s(\zeta)\,\psi_s(z,w)}<\infty. \tag{C.12}$$
In fact, if we let
$$F_s(z)=\frac{\sqrt{\phi_{2s}(z)}}{\phi_s(z)}, \tag{C.13}$$
$$G_s(z,w)=\frac{\sqrt{\psi_{2s}(z,w)}}{\psi_s(z,w)}, \tag{C.14}$$
then by the Cauchy–Schwarz inequality, it suffices to show that $F_s$ and $G_s$ are uniformly bounded. However this is elementary since $F_s$ and $G_s$ are continuous functions which are easily shown to have finite limits at infinity.
Acknowledgements. Questions asked by Frédéric Klopp led us to streamline the original derivation of the results in Sect. 3. We thank him for this and other stimulating discussions. This work was supported in part by the NSF Grant PHY-9971149 (MA). Jeff Schenker thanks the NSF for financial support under a Graduate Research Fellowship, and Dirk Hundertmark thanks the Deutsche Forschungsgemeinschaft for financial support under grant Hu 773/1-1.
References 1. Anderson, P.W.: Absence of diffusion in certain random lattices. Phys. Rev. 109, 1492 (1958) 2. Mott, N. and Twose, W.: The theory of impurity conduction. Adv. Phys. 10, 107 (1961) 3. Martinelli, F. and Scoppola, E.: Introduction to the mathematical theory of Anderson localization. Rivista del Nuovo Cimento 10, no. 10 (1987) 4. Halperin, B.I.: Quantized Hall conductance, current-carrying edge states, and the existence of extended states in a two-dimensional disordered potential. Phys. Rev. B 25, 2185 (1982) 5. Niu, Q., Thouless, D.J. and Wu, Y.S.: Quantized Hall conductance as a topological invariant. Phys. Rev. B 31, 3372 (1985) 6. Avron, J.E., Seiler, R. and Simon, B.: Charge deficiency, charge transport and comparison of dimensions. Commun. Math. Phys. 159, 399 (1994) 7. Bellissard, J., van Elst, A. and Schulz-Baldes, H.: The noncommutative geometry of the quantum Hall effect. J. Math. Phys. 35, 5373 (1994) 8. Aizenman, M. and Graf, G.M.: Localization bounds for an electron gas. J. Phys. A: Math. Gen. 31, 6783 (1998) 9. Figotin, A. and Klein, A.: Midgap defect modes in dielectric and acoustic media. SIAM J. Appl. Math. 58, 1748 (1998); no. 6, 1748–1773 (electronic) 10. Fröhlich, J. and Spencer, T.: Absence of diffusion in the Anderson tight binding model for large disorder or low energy. Commun. Math. Phys. 88, 151 (1983) 11. Aizenman, M. and Molchanov, S.: Localization at large disorder and at extreme energies: An elementary derivation. Commun. Math. Phys. 157, 245 (1993) 12. Simon, B. and Wolff, T.: Singular continuous spectrum under rank one perturbations and localization for random Hamiltonians. Comm. Pure Appl. Math. 39, no. 1, 75 (1986) 13. Aizenman, M.: Localization at weak disorder: Some elementary bounds. Rev. Math. Phys. 6, 1163 (1994) 14. Minami, N.: Local fluctuation of the spectrum of a multidimensional Anderson tight binding model. Commun. Math. Phys. 177, 709 (1996) 15. Pastur, L. 
and Figotin, A.: Spectra of random and almost-periodic operators. Berlin: Springer-Verlag, 1992 16. Barbaroux, J.M., Combes, J.-M. and Hislop, P.D.: Localization near band edges for random Schrödinger operators. Helv. Phys. Acta 70, 16 (1997) Papers honouring the 60th birthday of Klaus Hepp and of Walter Hunziker, Part II (Zürich, 1995)
17. Kirsch, W., Stollmann, P. and Stolz, G.: Localization for random perturbations of periodic Schrödinger operators. Rand. Oper. Stoch. Eq. 6, 241 (1998) 18. Stollmann, P.: Lifshitz asymptotics via linear coupling of disorder. Preprint, 1999 19. Klopp, F.: Internal Lifshits tails for random perturbations of periodic Schrödinger operators. Duke Math. J. 98, 335 (1999) 20. Combes, J.-M. and Hislop, P.D.: Localization properties of continuous disordered systems in d- dimensions. In: Mathematical quantum theory. II. Schrödinger operators (Vancouver, BC, 1993). CRM Proc. Lecture Notes 8, Providence, RI: Am. Math. Soc., 1995, p. 213 21. Figotin, A. and Klein, A.: Localization of electromagnetic and acoustic waves in random media. Lattice models. J. Stat. Phys. 76, 985 (1994) 22. Carmona, R., Klein,A. and Martinelli, F.:Anderson localization for Bernoulli and other singular potentials. Commun. Math. Phys. 108, no. 1, 41 (1987) 23. Hammersley, J.M.: Percolation processes II. The connective constant. Proc. Camb. Phil. Soc. 53, 642 (1957) 24. Dobrushin, R.L. and Shlosman, S.B.. Completely analytical interactions: Constructive description. J. Stat. Phys. 46, no. 5–6, 983–1014 (1987) 25. Simon, B.: Correlation inequalities and the decay of correlations in ferromagnets. Commun. Math. Phys. 77, no. 2, 111 (1980) 26. Lieb, E.H.: A refinement of Simon’s correlation inequality. Commun. Math. Phys. 77, no. 2, 127 (1980) 27. Aizenman, M. and Newman, C.M.: Tree graph inequalities and critical behavior in percolation models. J. Stat. Phys. 36, 107 (1984) 28. von Dreifus, H.: Bounds of the critical exponents of disordered ferromagnetic models. Ann. Inst. Henri Poincaré 55, 657 (1991) 29. Benjamini, I., Lyons, R., Peres,Y. and Schramm, O.: Group-invariant percolation on graphs. Geom. Funct. Anal. 9, no. 1, 29 (1999) 30. Aizenman, M. and Simon, B.: Local Ward identities and the decay of correlations in ferromagnets. Commun. Math. Phys. 77, no. 2, 137 (1980) 31. Aizenman, M. 
and Holley, R.: Rapid convergence to equilibrium of stochastic Ising models in the Dobrushin Shlosman regime. In: Percolation theory and ergodic theory of infinite particle systems (Minneapolis, Minn., 1984–1985), 1, New York: Springer, 1987 32. Maes, C. and Shlosman, S.B.: Ergodicity of probabilistic cellular automata: a constructive criterion. Commun. Math. Phys. 135, no. 2, 233 (1991) 33. Kunz, H. and Souillard, B.: Sur le spectre des opérateurs aux différences finies aléatoires. Commun. Math. Phys. 78, no. 201 (1980) 34. Last, Y.: Quantum dynamics and decompositions of singular continuous spectra. J. Funct. Anal. 142, no. 2, 406 (1996) 35. Combes, J.-M. and Thomas, L.: Asymptotic behaviour of eigenfunctions for multiparticle Schrödinger operators. Commun. Math. Phys. 34, 251 (1973) 36. Simon, B.: Lifschitz tails for the Anderson model. J. Stat. Phys. 38, 65 (1985) 37. Germinet, F. and DeBièvre, S.: Dynamical localization for discrete and continuous random Schrödinger operators. Commun. Math. Phys. 194, no. 2, 323 (1998) 38. Damanik, D. and Stollmann, P.: Multi-scale analysis implies strong dynamical localization. Preprint, 1999; http://xxx.lanl.gov/abs/math-ph/9912002 39. Kotani, S.: Lyaponov exponents and spectra for one-dimensional random Schrödiner operators. In: Contemporary Mathematics (AMS), Vol. 50, Providence, RI: AMS, 1986 Communicated by B. Simon
Commun. Math. Phys. 224, 255 – 269 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Correlations Between Zeros and Supersymmetry Pavel Bleher1 , Bernard Shiffman2 , Steve Zelditch2 1 Department of Mathematical Sciences, IUPUI, Indianapolis, IN 46202, USA.
E-mail: [email protected]
2 Department of Mathematics, Johns Hopkins University, Baltimore, MD 21218, USA.
E-mail: [email protected]; [email protected] Received: 13 November 2000 / Accepted: 23 February 2001
To Joel Lebowitz on his 70th birthday
Abstract: In our previous work [BSZ2], we proved that the correlation functions for simultaneous zeros of random generalized polynomials have universal scaling limits and we gave explicit formulas for pair correlations in codimensions 1 and 2. The purpose of this paper is to compute these universal limits in all dimensions and codimensions. First, we use a supersymmetry method to express the n-point correlations as Berezin integrals. Then we use the Wick method to give a closed formula for the limit pair correlation function for the point case in all dimensions.
1. Introduction
This paper is a continuation of our articles [BSZ1, BSZ2, BSZ3] on the correlations between zeros of random holomorphic polynomials in m complex variables and their generalization to holomorphic sections of positive line bundles L → M over general Kähler manifolds of dimension m and their symplectic counterparts. These correlations are defined by the probability density $K^N_{nk}(z^1,\dots,z^n)$ of finding joint zeros of k independent sections at the points $z^1,\dots,z^n\in M$ (see Sect. 2). To obtain universal quantities, we rescale the correlation functions in normal coordinates by a factor of $\sqrt N$. Our main result from [BSZ2, BSZ3] is that the (normalized) correlation functions have a universal scaling limit,
$$\widetilde K^{\infty}_{nkm}(z^1,\dots,z^n)=\lim_{N\to\infty}K^N_{1k}(z_0)^{-n}\,K^N_{nk}\Big(z_0+\frac{z^1}{\sqrt N},\dots,z_0+\frac{z^n}{\sqrt N}\Big), \tag{1}$$
which is independent of the manifold M, the line bundle L and the point $z_0$; $\widetilde K^{\infty}_{nkm}$ depends only on the dimension m of the manifold and the codimension k of the zero set. The
Research partially supported by NSF grants #DMS-9970625 (first author), #DMS-9800479 (second author), #DMS-0071358 (third author).
problem then arises of calculating these universal functions explicitly and analyzing their small distance and large distance behavior. In [BSZ1, BSZ2], we gave explicit formulas for the pair correlation functions $\widetilde K^{\infty}_{2km}(z^1,z^2)$ in codimensions k = 1, 2, respectively. The purpose of this paper is to complete these results by giving explicit formulas for $\widetilde K^{\infty}_{nkm}$ in all dimensions and codimensions. Our first formula expresses the correlation as a supersymmetric (Berezin) integral involving the matrices $\Lambda(z)$, $A^\infty(z)$ used in our prior formulas, as well as a matrix $\Omega$ of fermionic variables described below.
Theorem 1.1. The limit n-point correlation functions are given by
$$\widetilde K^{\infty}_{nkm}(z^1,\dots,z^n)=\frac{[(m-k)!]^n}{(m!)^n\,[\det A^\infty(z)]^k}\int\frac{1}{\det[I+\Lambda(z)\Omega]}\,d\eta.$$
Here, $\Omega$ is the nkm × nkm matrix
$$\Omega=\big(\Omega^{pjq}_{p'j'q'}\big),\qquad \Omega^{pjq}_{p'j'q'}=\delta^{p}_{p'}\,\delta^{q}_{q'}\,\eta^p_j\,\bar\eta^p_{j'}\qquad(1\le p,p'\le n,\ 1\le j,j'\le k,\ 1\le q,q'\le m), \tag{2}$$
where the $\eta^p_j,\bar\eta^p_j$ are anti-commuting (fermionic) variables, and $d\eta=\prod_{j,p}d\eta^p_j\,d\bar\eta^p_j$.
The integral in Theorem 1.1 is a Berezin integral, which is evaluated by simply taking the coefficient of the top degree form of the integrand $\det[I+\Lambda(z)\Omega]^{-1}$ (see Sect. 3). Hence the formula in Theorem 1.1 is a purely algebraic expression in the coefficients of $\Lambda(z)$ and $A^\infty(z)$, which are given in terms of the Szegö kernel of the Heisenberg group and its derivatives (see Sect. 2). We remark that supersymmetric methods have also been applied to limit correlations in random matrix theory by Zirnbauer [Zi].
In the case n = 2, $\widetilde K^{\infty}_{2km}(z^1,z^2)$ depends only on the distance between the points $z^1,z^2$, since it is universal and hence invariant under rigid motions. Hence it may be written as:
$$\widetilde K^{\infty}_{2km}(z^1,z^2)=\kappa_{km}(|z^1-z^2|). \tag{3}$$
We refer to [BSZ2] for details. In [BSZ1] we gave an explicit formula for $\kappa_{1m}$ (using the “Poincaré–Lelong formula”), and in [BSZ2] we evaluated $\kappa_{2m}$. (The pair correlation function $\kappa_{11}(r)$ was first determined by Hannay [Ha] in the case of zeros of SU(2) polynomials in one complex variable.) In Sect. 3.1, we use Theorem 1.1 to give the following new Berezin integral formula for $\kappa_{km}$:
Corollary 1.2. The pair correlation functions are given by
$$\kappa_{km}(r)=\frac{(m-k)!^2}{m!^2\,(1-e^{-r^2})^k}\int\frac{1}{\Phi\,\Psi^{m-1}}\,d\eta,$$
where
$$\Phi=\det\big[I+P(\Omega_1+\Omega_2)+T\,\Omega_1\Omega_2\big],\qquad P=1-\frac{r^2e^{-r^2}}{1-e^{-r^2}},\qquad T=1-e^{-r^2}-\frac{r^4e^{-r^2}}{1-e^{-r^2}},$$
$$\Psi=\det\big[I+\Omega_1+\Omega_2+(1-e^{-r^2})\,\Omega_1\Omega_2\big].$$
Here, $\Omega_1,\Omega_2$ are the k × k matrices
$$\Omega_p=\big(\eta^p_j\,\bar\eta^p_{j'}\big)_{1\le j,j'\le k},\qquad p=1,2.$$
We then expand the formula as a (finite) series (32), which we use to compute explicit formulas for $\kappa_{km}$. The most vivid case is when k = m, where the simultaneous zeros of k-tuples of sections almost surely form a set of discrete points. Our second result is an explicit formula for the point pair correlation functions $\kappa_{mm}$ in all dimensions:
Theorem 1.3. The point pair correlation functions are given by
$$\kappa_{mm}(r)=\frac{m(1-v^{m+1})(1-v)+r^2(2m+2)(v^{m+1}-v)+r^4\Big[v^{m+1}+v^m+\big(\{m+1\}v+1\big)\dfrac{v^m-v}{v-1}\Big]}{m(1-v)^{m+2}},\qquad v=e^{-r^2}, \tag{4}$$
for m ≥ 1. For small values of r, we have
$$\kappa_{mm}(r)=\frac{m+1}{4}\,r^{4-2m}+O(r^{8-2m}),\qquad\text{as } r\to0. \tag{5}$$
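As a quick numerical sanity check of Theorem 1.3, the sketch below evaluates the right-hand side of (4) directly and compares it with the leading small-r behaviour (5) and with the expected large-r limit 1. The function name `kappa_mm` and the sample radii are illustrative choices, not part of the original text.

```python
import math

def kappa_mm(m, r):
    """Right-hand side of formula (4) for integer m >= 1 and r > 0."""
    v = math.exp(-r * r)
    bracket = v ** (m + 1) + v ** m + ((m + 1) * v + 1) * (v ** m - v) / (v - 1)
    numerator = (m * (1 - v ** (m + 1)) * (1 - v)
                 + r ** 2 * (2 * m + 2) * (v ** (m + 1) - v)
                 + r ** 4 * bracket)
    return numerator / (m * (1 - v) ** (m + 2))

for m in (1, 2, 3):
    r_small = 0.2
    leading = (m + 1) / 4 * r_small ** (4 - 2 * m)   # leading term of (5)
    print(m, kappa_mm(m, r_small), leading, kappa_mm(m, 3.0))
```

For m = 1 the value at small r is close to zero (repulsion), for m = 2 it is close to 3/4, and for m = 3 it grows like r^{-2}; in each case the value at r = 3 is close to 1.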
We prove Theorem 1.3 in Sect. 4 without making use of supersymmetry. Our proof uses instead the Wick formula expansion of the Gaussian integral representation of the correlation. It is interesting to observe the dimensional dependence of the short distance behavior of $\kappa_{mm}(r)$. When m = 1, $\kappa_{mm}(r)\to0$ as r → 0 and one has “zero repulsion”. When m = 2, $\kappa_{mm}(r)\to3/4$ as r → 0 and one has a kind of neutrality. With m ≥ 3, $\kappa_{mm}(r)\to\infty$ as r → 0 and there is some kind of attraction between zeros. More precisely, in dimensions greater than 2, one is more likely to find a zero at a small distance r from another zero than at a small distance r from a given point; i.e., zeros tend to clump together in high dimensions. Indeed, in all dimensions, the probability of finding another zero in a ball of small scaled radius r about another zero is $\sim r^4$. We give in Fig. 1 a graph of $\kappa_{33}$; graphs of $\kappa_{11}$ and $\kappa_{22}$ can be found in [BSZ2].
Fig. 1. The limit pair correlation function $\kappa_{33}$ (horizontal axis: r)
Remark. Theorem 1.3 says that the expected
number of zeros in the punctured ball of scaled radius r about a given zero is $\sim\int_0^r\kappa_{mm}(t)\,t^{2m-1}\,dt\sim r^4$. Also, one can show that for balls of small scaled radii r, the expected number of zeros approximates the probability of finding a zero.
2. Background
We begin by recalling the scaling limit zero correlation formula of [BSZ2]. Consider a random polynomial s of degree N in m variables. More generally, s can be a random section of the N-th power $L^N$ of a positive line bundle L on an m-dimensional compact complex manifold M (or a symplectic 2m-manifold; see [SZ3, BSZ3]). We give M the Kähler metric induced by the curvature form ω of the line bundle L. The probability measure on the space of sections is the complex Gaussian measure induced by the Hermitian inner product
$$\langle s_1,\bar s_2\rangle=\int_M h^N(s_1,\bar s_2)\,dV_M,$$
where $h^N$ is the metric on $L^N$ and $dV_M$ is the volume measure induced by ω. (For further discussion of the topics of this section, see [BSZ2].) In particular, if L is the hyperplane section bundle over $\mathbb{CP}^m$, then random sections of $L^N$ are polynomials of degree N in m variables of the form
$$P(z_1,\dots,z_m)=\sum_{|J|\le N}\frac{C_J}{\sqrt{(N-|J|)!\,j_1!\cdots j_m!}}\;z_1^{j_1}\cdots z_m^{j_m}\qquad\big(J=(j_1,\dots,j_m)\big),$$
where the $C_J$ are i.i.d. Gaussian random variables with mean 0; they are called “SU(m + 1)-polynomials”. We consider k-tuples $s=(s_1,\dots,s_k)$ of i.i.d. random polynomials (or sections) $s_j$ (1 ≤ k ≤ m). The zero correlation density $K^N_{nk}(z^1,\dots,z^n)$ is defined as the expected joint volume density of zeros of sections of $L^N$ at the points $z^1,\dots,z^n$. In the case k = m, where the zero sets are discrete points, $K^N_{nk}(z^1,\dots,z^n)$ can be interpreted as the probability density of finding simultaneous zeros at these points. For instance, the zero density function $K^N_{1k}(z)\approx c_kN^k$ as N → ∞, where $c_k$ is independent of the point z (see [SZ1]). In [BSZ2, BSZ3], we gave generalized forms of the Kac-Rice formula [Kac, Ri], which we used to express $K^N_{nk}(z^1,\dots,z^n)$ in terms of the joint probability distribution (JPD) of the random variables $s(z^1),\dots,s(z^n),\nabla s(z^1),\dots,\nabla s(z^n)$. We then showed that the scaling limit correlation function $\widetilde K^{\infty}_{nkm}$ given by (1) can be expressed in terms of the scaling limit of the JPD. The central result of [BSZ2] is that the limit JPD is universal and can be expressed in terms of the Szegö kernel $\Pi^H_1$ for the Heisenberg group:
$$\Pi^H_1(z,\theta;w,\varphi)=\frac{1}{\pi^m}\,e^{i(\theta-\varphi+\Im z\cdot\bar w)-\frac12|z-w|^2}=\frac{1}{\pi^m}\,e^{i(\theta-\varphi)+z\cdot\bar w-\frac12(|z|^2+|w|^2)}. \tag{6}$$
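To be precise, the limit JPD is a complex Gaussian measure with covariance matrix $\Delta^\infty$ given by:
$$\Delta^\infty(z)=\frac{m!}{\pi^m}\begin{pmatrix}A^\infty(z)&B^\infty(z)\\ B^\infty(z)^*&C^\infty(z)\end{pmatrix}, \tag{7}$$
where
$$\begin{aligned}\pi^{-m}A^\infty(z)^{p}_{p'}&=\Pi^H_1(z^p,0;z^{p'},0),\\ \pi^{-m}B^\infty(z)^{p}_{p'q'}&=\frac{\nabla}{\partial\bar z^{p'}_{q'}}\Pi^H_1(z^p,0;z^{p'},0)=(z^p_{q'}-z^{p'}_{q'})\,\Pi^H_1(z^p,0;z^{p'},0),\\ \pi^{-m}C^\infty(z)^{pq}_{p'q'}&=\frac{\nabla^2}{\partial z^p_q\,\partial\bar z^{p'}_{q'}}\Pi^H_1(z^p,0;z^{p'},0)=\big(\delta_{qq'}+(\bar z^{p'}_q-\bar z^{p}_q)(z^p_{q'}-z^{p'}_{q'})\big)\,\Pi^H_1(z^p,0;z^{p'},0).\end{aligned} \tag{8}$$
(Here $A^\infty$, $B^\infty$, $C^\infty$ are n × n, n × mn, mn × mn matrices, respectively.) In the sequel, we shall use the matrix
$$\Lambda^\infty(z):=C^\infty(z)-B^\infty(z)^*A^\infty(z)^{-1}B^\infty(z). \tag{9}$$
We note that $A^\infty(z)$ and $\Lambda^\infty(z)$ are positive definite whenever $z^1,\dots,z^n$ are distinct points. In [BSZ2], we gave the following key formula for the limit correlation functions:
$$\widetilde K^{\infty}_{nkm}(z^1,\dots,z^n)=\frac{[(m-k)!]^n}{(m!)^n[\det A^\infty(z)]^k}\int_{\mathbb C^{kmn}}\prod_{p=1}^{n}\det_{1\le j,j'\le k}\Big(\sum_{q=1}^{m}\xi^p_{jq}\bar\xi^p_{j'q}\Big)\,d\gamma_{\Lambda(z)}(\xi), \tag{10}$$
where $\gamma_{\Lambda(z)}$ is the Gaussian measure with (nkm × nkm) covariance matrix
$$\Lambda(z):=\big(\Lambda(z)^{pjq}_{p'j'q'}\big)=\big(\delta^{j}_{j'}\,\Lambda^\infty(z)^{pq}_{p'q'}\big). \tag{11}$$
(I.e., $\langle\xi^p_{jq}\bar\xi^{p'}_{j'q'}\rangle_{\gamma_{\Lambda(z)}}=\Lambda(z)^{pjq}_{p'j'q'}$.) For the pair correlation case (n = 2), Eq. (10) becomes:
$$\kappa_{km}(r)=\frac{1}{\det A(r)^k}\Big(\frac{(m-k)!}{m!}\Big)^{2}\int_{\mathbb C^{2km}}\det_{1\le j,j'\le k}\Big(\sum_{q=1}^{m}\xi^1_{jq}\bar\xi^1_{j'q}\Big)\det_{1\le j,j'\le k}\Big(\sum_{q=1}^{m}\xi^2_{jq}\bar\xi^2_{j'q}\Big)\,d\gamma_{\Lambda(r)}(\xi), \tag{12}$$
where
$$A(r)=A^\infty(z^1,z^2),\qquad\Lambda(r)=\Lambda(z^1,z^2),\qquad|z^1-z^2|=r.$$
The computations in this paper are all based on formula (10).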
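3. Supersymmetric Approach to n-Point Correlations
We now prove Theorem 1.1 using our formula (10) for the limit n-point correlation function, which we restate as follows:
$$\widetilde K^{\infty}_{nkm}(z^1,\dots,z^n)=\frac{[(m-k)!]^n}{(m!)^n[\det A^\infty(z)]^k}\,G_{nkm}(z), \tag{13}$$
where
$$G_{nkm}(z)=\int_{\mathbb C^{kmn}}\prod_{p=1}^{n}\det_{1\le j,j'\le k}\Big(\sum_{q=1}^{m}\xi^p_{jq}\bar\xi^p_{j'q}\Big)\,d\gamma_{\Lambda(z)}(\xi). \tag{14}$$
Our approach is to represent the determinant in (14) as a Berezin integral and then to exchange the order of integration. We introduce anti-commuting (or “fermionic”) variables $\eta^p_j,\bar\eta^p_j$ (1 ≤ j ≤ k, 1 ≤ p ≤ n), which can be regarded as generators of the Grassmann algebra $\bigwedge^\bullet\mathbb C^{2l}=\bigoplus_{t=0}^{2l}\bigwedge^t\mathbb C^{2l}$, l = nk. The Berezin integral on $\bigwedge^\bullet\mathbb C^{2l}$ is the linear functional $\mathcal I:\bigwedge^\bullet\mathbb C^{2l}\to\mathbb C$ given by
$$\mathcal I\big|_{\bigwedge^t\mathbb C^{2l}}=0\ \text{ for } t<2l,\qquad\mathcal I\Big(\prod_{j,p}\bar\eta^p_j\eta^p_j\Big)=1.$$
Elements $f\in\bigwedge^\bullet\mathbb C^{2l}$ are considered as functions of anti-commuting variables, and we write
$$\mathcal I(f)=\int f\,d\eta=\int f\prod_{j,p}d\eta^p_j\,d\bar\eta^p_j.$$
(See for example [Ef, Chapter 2], [ID, Sect. 2.1].) If $H=\big(H^{pj}_{p'j'}\big)$ is an l × l Hermitian matrix, we have the supersymmetric formula for the determinant:
$$\det H=\int e^{-\langle H\eta,\bar\eta\rangle}\,d\eta,\qquad\langle H\eta,\bar\eta\rangle=\sum_{j,p,j',p'}\eta^p_j\,H^{pj}_{p'j'}\,\bar\eta^{p'}_{j'}. \tag{15}$$
We now use (15) to compute $G_{nkm}$: let
$$\xi^p=\begin{pmatrix}\xi^p_{11}&\cdots&\xi^p_{1m}\\ \vdots&&\vdots\\ \xi^p_{k1}&\cdots&\xi^p_{km}\end{pmatrix}$$
(where $\{\xi^p_{jq}\}$ are ordinary “bosonic” variables). We also write $\xi=\xi^1\oplus\cdots\oplus\xi^n:\mathbb C^{mn}\to\mathbb C^{kn}$. Then
$$\prod_{p=1}^{n}\det_{1\le j,j'\le k}\Big(\sum_{q=1}^{m}\xi^p_{jq}\bar\xi^p_{j'q}\Big)=\det(\xi\xi^*)=\det\begin{pmatrix}\xi^1\xi^{1*}&\cdots&0\\ \vdots&\ddots&\vdots\\ 0&\cdots&\xi^n\xi^{n*}\end{pmatrix}. \tag{16}$$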
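Applying (15) with $H=\xi\xi^*$, we have
$$G_{nkm}=\frac{1}{\pi^{nkm}\det\Lambda}\int_{\mathbb C^{nkm}}\det(\xi\xi^*)\,e^{-\langle\Lambda^{-1}\xi,\bar\xi\rangle}\,d\xi=\frac{1}{\pi^{nkm}\det\Lambda}\int_{\mathbb C^{nkm}}\!\int e^{-\langle\Lambda^{-1}\xi,\bar\xi\rangle-\langle\xi\xi^*\eta,\bar\eta\rangle}\,d\eta\,d\xi, \tag{17}$$
$$\langle\xi\xi^*\eta,\bar\eta\rangle=\sum_{p,q,j,j'}\xi^p_{jq}\,\bar\xi^p_{j'q}\,\eta^p_j\,\bar\eta^p_{j'}=\langle\Omega\xi,\bar\xi\rangle, \tag{18}$$
where $\Omega$ is given by (2). Note that the entries of $\Omega$ commute, since they are of degree 2. Furthermore, adopting the supersymmetric definition of the conjugate [Ef],
$$(\eta^p_j)^{\bar{}}=\bar\eta^p_j,\qquad(\bar\eta^p_j)^{\bar{}}=-\eta^p_j,$$
we see that the matrix $\Omega$ is superhermitian; i.e., $\Omega^*=\Omega$, where $\Omega^*={}^t\bar\Omega$. Thus by (17)–(18), we have
$$G_{nkm}=\frac{1}{\pi^{nkm}\det\Lambda}\int_{\mathbb C^{nkm}}\!\int e^{-\langle(\Lambda^{-1}+\Omega)\xi,\bar\xi\rangle}\,d\eta\,d\xi. \tag{19}$$
We recall that
$$\frac{1}{\pi^{nkm}}\int_{\mathbb C^{nkm}}e^{-\langle P\xi,\bar\xi\rangle}\,d\xi=\det P^{-1}, \tag{20}$$
for a positive definite, Hermitian (nkm × nkm) matrix P. Furthermore, (20) holds when P is the superhermitian matrix $\Lambda^{-1}+\Omega$; we give a short proof of this fact below. Reversing the order of integration in (19) and applying (20) with $P=\Lambda^{-1}+\Omega$, we have
$$G_{nkm}=\frac{1}{\det\Lambda}\int\frac{1}{\det(\Lambda^{-1}+\Omega)}\,d\eta=\int\frac{1}{\det(I+\Lambda\Omega)}\,d\eta. \tag{21}$$
We now verify by formal substitution that (20) holds when $P=\Lambda^{-1}+\Omega$: Suppose that <1 , · · ·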
−1 +Z+Z ∗ )ξ,ξ¯
e−(
e−(Z+Z
∗ )ξ,ξ¯
dξ
−1 ξ,ξ¯
e−
dξ =
1 , −1 det( + Z + Z ∗ )
(22)
for Z = (zαβ ) ∈ gl(nkm, C), where the last equality is by (20). We easily see that f is a ∗ ¯ convergent power series in {zαβ , z¯ αβ } and that the integrand e−(Z+Z )ξ,ξ can be written −1 ¯ as an absolutely convergent power series in {zαβ , z¯ αβ } with values in L1 (e− ξ,ξ dξ ). 1 1 ¯ We now let <αβ = 2 αβ = 2 βα ; the conclusion follows by applying S< to (22). !
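3.1. Pair correlation. In this section, we prove Corollary 1.2. To illustrate the computation, we consider first the case k = m = 1 of zero correlations in dimension one: We have
$$\Omega=\begin{pmatrix}\eta_1\bar\eta_1&0\\ 0&\eta_2\bar\eta_2\end{pmatrix},\qquad\Lambda=\begin{pmatrix}\Lambda^1_1&\Lambda^1_2\\ \Lambda^2_1&\Lambda^2_2\end{pmatrix},\qquad I+\Lambda\Omega=\begin{pmatrix}1+\Lambda^1_1\eta_1\bar\eta_1&\Lambda^1_2\eta_2\bar\eta_2\\ \Lambda^2_1\eta_1\bar\eta_1&1+\Lambda^2_2\eta_2\bar\eta_2\end{pmatrix}.$$
We easily compute
$$\det(I+\Lambda\Omega)=1+\Lambda^1_1\eta_1\bar\eta_1+\Lambda^2_2\eta_2\bar\eta_2+(\det\Lambda)\,\eta_1\bar\eta_1\eta_2\bar\eta_2,$$
$$\det(I+\Lambda\Omega)^{-1}=1-\Lambda^1_1\eta_1\bar\eta_1-\Lambda^2_2\eta_2\bar\eta_2+(2\Lambda^1_1\Lambda^2_2-\det\Lambda)\,\eta_1\bar\eta_1\eta_2\bar\eta_2,$$
$$\int\det(I+\Lambda\Omega)^{-1}\,d\bar\eta_1\,d\eta_1\,d\bar\eta_2\,d\eta_2=2\Lambda^1_1\Lambda^2_2-\det\Lambda=\Lambda^1_1\Lambda^2_2+\Lambda^1_2\Lambda^2_1.$$
Hence by Theorem 1.1, we have
$$\kappa_{11}(r)=\frac{\Lambda^1_1\Lambda^2_2+\Lambda^1_2\Lambda^2_1}{\det A}. \tag{23}$$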
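The k = m = 1 computation above can be reproduced mechanically with a very small Grassmann-algebra implementation. The sketch below is illustrative only: the dictionary representation, the helper names, and the generator order (η₁, η̄₁, η₂, η̄₂) are assumptions of this example, and the Berezin integral is taken to be the coefficient of η₁η̄₁η₂η̄₂, which is the normalization consistent with the result displayed above.

```python
from fractions import Fraction

N_GEN = 4  # assumed generator order: eta1, etabar1, eta2, etabar2 (bit i <-> generator i)

def gmul(x, y):
    """Grassmann product; elements are dicts {bitmask of generators: coefficient}."""
    out = {}
    for ma, ca in x.items():
        for mb, cb in y.items():
            if ma & mb:  # a repeated generator gives zero
                continue
            # count transpositions needed to sort the concatenated monomial
            swaps = sum(bin(ma >> (j + 1)).count("1")
                        for j in range(N_GEN) if (mb >> j) & 1)
            out[ma | mb] = out.get(ma | mb, 0) + (-1) ** swaps * ca * cb
    return out

def gadd(x, y, c=1):
    """Return x + c*y."""
    out = dict(x)
    for mono, coeff in y.items():
        out[mono] = out.get(mono, 0) + c * coeff
    return out

def berezin_inverse_det(l11, l12, l21, l22):
    """Top coefficient of det(I + Lambda*Omega)^(-1) for k = m = 1, n = 2."""
    one = {0: Fraction(1)}
    w1 = gmul({1: Fraction(1)}, {2: Fraction(1)})   # eta1 * etabar1
    w2 = gmul({4: Fraction(1)}, {8: Fraction(1)})   # eta2 * etabar2
    a = gadd(one, w1, l11)                          # 1 + L11 eta1 etabar1
    d = gadd(one, w2, l22)                          # 1 + L22 eta2 etabar2
    det = gadd(gmul(a, d), gmul(w2, w1), -l12 * l21)
    x = gadd(det, one, -1)                          # nilpotent part: x^3 = 0
    inverse = gadd(gadd(one, x, -1), gmul(x, x))    # 1 - x + x^2
    return inverse.get(0b1111, 0)

print(berezin_inverse_det(2, 3, 5, 7))  # Lambda11*Lambda22 + Lambda12*Lambda21
```

With the sample entries 2, 3, 5, 7 the top coefficient equals 2·7 + 3·5, in agreement with the numerator of (23).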
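To obtain the explicit formula for $\kappa_{11}(r)$ [Ha, BSZ1], we set $z^1=(r,0,\dots,0)$, $z^2=0$ and substitute in (23) the resulting values of $\Lambda^p_{p'}$ and det A (see (24)–(26) below). To obtain formulas for $\kappa_{km}(r)$ for higher k, m, we again set $z^1=(r,0,\dots,0)$, $z^2=0$. Using (6), (8) and (9), we see that $\Lambda^{pjq}_{p'j'q'}=0$ for $(j,q)\neq(j',q')$ and
$$\begin{pmatrix}\Lambda^{1j1}_{1j1}&\Lambda^{1j1}_{2j1}\\ \Lambda^{2j1}_{1j1}&\Lambda^{2j1}_{2j1}\end{pmatrix}=\begin{pmatrix}P&Q\\ Q&P\end{pmatrix},\qquad\begin{pmatrix}\Lambda^{1jq}_{1jq}&\Lambda^{1jq}_{2jq}\\ \Lambda^{2jq}_{1jq}&\Lambda^{2jq}_{2jq}\end{pmatrix}=\begin{pmatrix}R&S\\ S&R\end{pmatrix}\quad\text{for } q\ge2, \tag{24}$$
where
$$P=\frac{1-e^{-r^2}-r^2e^{-r^2}}{1-e^{-r^2}},\qquad Q=\frac{e^{-\frac12r^2}\,(1-e^{-r^2}-r^2)}{1-e^{-r^2}},\qquad R=1,\qquad S=e^{-\frac12r^2}. \tag{25}$$
We also have
$$A(r)=\begin{pmatrix}1&e^{-\frac12r^2}\\ e^{-\frac12r^2}&1\end{pmatrix},\qquad\text{and thus}\quad\det A=1-e^{-r^2}. \tag{26}$$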
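We can write the 2km × 2km matrices $\Lambda$, $\Omega$ in block form:
$$\Lambda=\begin{pmatrix}\Lambda'&0&\cdots&0\\ 0&\Lambda''&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\Lambda''\end{pmatrix},\qquad\Omega=\begin{pmatrix}\Omega'&0&\cdots&0\\ 0&\Omega'&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ 0&0&\cdots&\Omega'\end{pmatrix},$$
$$\Lambda'=\begin{pmatrix}PI_k&QI_k\\ QI_k&PI_k\end{pmatrix},\qquad\Lambda''=\begin{pmatrix}RI_k&SI_k\\ SI_k&RI_k\end{pmatrix},\qquad\Omega'=\begin{pmatrix}\Omega_1&0\\ 0&\Omega_2\end{pmatrix},$$
where $I_k$ is the k × k identity matrix and
$$\Omega_p=\begin{pmatrix}\eta^p_1\bar\eta^p_1&\cdots&\eta^p_k\bar\eta^p_1\\ \vdots&&\vdots\\ \eta^p_1\bar\eta^p_k&\cdots&\eta^p_k\bar\eta^p_k\end{pmatrix},\qquad p=1,2.$$
Hence
$$\det(I+\Lambda\Omega)=\Phi\,\Psi^{m-1}, \tag{27}$$
where
$$\Phi=\det(I+\Lambda'\Omega'),\qquad\Psi=\det(I+\Lambda''\Omega'). \tag{28}$$
We note that
$$I+\Lambda'\Omega'=\begin{pmatrix}I+P\Omega_1&Q\Omega_2\\ Q\Omega_1&I+P\Omega_2\end{pmatrix},$$
and thus
$$\Phi=\det(I+P\Omega_1)\,\det\big[I+P\Omega_2-Q^2\Omega_1(I+P\Omega_1)^{-1}\Omega_2\big]=\det\big[I+P(\Omega_1+\Omega_2)+(P^2-Q^2)\,\Omega_1\Omega_2\big]. \tag{29}$$
Similarly,
$$\Psi=\det\big[I+R(\Omega_1+\Omega_2)+(R^2-S^2)\,\Omega_1\Omega_2\big]. \tag{30}$$
We recall that by Theorem 1.1,
$$\kappa_{km}(r)=\frac{[(m-k)!]^2}{(m!)^2(1-e^{-r^2})^k}\int\frac{1}{\det(I+\Lambda\Omega)}\,d\eta. \tag{31}$$
Combining (25)–(31), we obtain Corollary 1.2. □
To obtain explicit formulas for the pair correlation in a fixed codimension k (for all dimensions m), we write $\Psi=1-(1-\Psi)$ and our formula becomes:
$$\kappa_{km}(r)=\frac{[(m-k)!]^2}{(m!)^2(1-e^{-r^2})^k}\sum_{t=0}^{2k}\binom{m+t-2}{t}\int\Phi^{-1}(1-\Psi)^t\,d\eta. \tag{32}$$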
Using Maple™, we evaluate (32) to obtain the following pair correlation formulas:
$$\kappa_{1m}(r)=\frac{1}{m^2\det A}\Big[P^2+2(m-1)PR+Q^2+(m-1)^2R^2+(m-1)S^2\Big],$$
$$\begin{aligned}\kappa_{2m}(r)=\frac{1}{m^2(m-1)\det A^2}\Big[&4(m-1)P^2R^2+2P^2S^2+4(m-1)(m-2)PR^3+4(m-2)PRS^2+2(m-1)Q^2R^2\\ &+4Q^2S^2+(m-1)(m-2)^2R^4+2(m-2)^2R^2S^2+2(m-2)S^4\Big],\end{aligned}$$
$$\begin{aligned}\kappa_{3m}(r)=\frac{1}{m^2(m-1)(m-2)\det A^3}\Big[&9(m-1)(m-2)P^2R^4+12(m-2)P^2R^2S^2+6P^2S^4+6(m-3)(m-1)(m-2)PR^5\\ &+12(m-3)(m-2)PR^3S^2+12(m-3)PRS^4+3(m-1)(m-2)Q^2R^4+12(m-2)Q^2R^2S^2\\ &+18Q^2S^4+(m-1)(m-2)(m-3)^2R^6+3(m-2)(m-3)^2R^4S^2+6(m-3)^2R^2S^4+6(m-3)S^6\Big].\end{aligned}$$
Recalling (25)–(26), we then obtain power series expansions of the pair correlation function in codimensions 1, 2, 3:
$$\kappa_{1m}(r)=\frac{m-1}{m}r^{-2}+\frac{m-1}{2m}+\frac{(m+2)(m+1)}{12m^2}r^2-\frac{(m+4)(m+3)}{720m^2}r^6+\frac{(m+6)(m+5)}{30240m^2}r^{10}-\frac{(m+8)(m+7)}{1209600m^2}r^{14}\cdots,$$
$$\kappa_{2m}(r)=\frac{m-2}{m}r^{-4}+\frac{m-2}{m}r^{-2}+\frac{5m^2-7m+12}{12(m-1)m}+\frac{(m-2)(m+2)(m+1)}{12(m-1)m^2}r^2+\frac{(m+3)(m+2)}{240(m-1)m}r^4-\frac{(m-2)(m+4)(m+3)}{720(m-1)m^2}r^6+\dots,$$
$$\begin{aligned}\kappa_{3m}(r)=\ &\frac{m-3}{m}r^{-6}+\frac{3(m-3)}{2m}r^{-4}+\frac{m^2-4m+6}{(m-2)m}r^{-2}+\frac{(m-3)(3m^2-m+8)}{8m(m-1)(m-2)}\\ &+\frac{(m+2)(m+1)(19m^2-79m+120)}{240m^2(m-1)(m-2)}r^2+\frac{(m-3)(m+3)(m+2)}{160m(m-1)(m-2)}r^4\cdots.\end{aligned}$$
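The codimension-one formula can be cross-checked against its power series without any computer algebra. The sketch below evaluates the closed form for κ₁ₘ from the values of P, Q, R, S in (25)–(26) and compares it with the series truncated after the r¹⁴ term; the function names and the sample radius r = 0.5 are illustrative choices.

```python
import math

def kappa_1m_closed(m, r):
    """Closed form for kappa_1m in terms of P, Q, R, S and det A."""
    u = r * r
    v = math.exp(-u)
    P = (1 - v - u * v) / (1 - v)
    Q = math.exp(-u / 2) * (1 - v - u) / (1 - v)
    R, S = 1.0, math.exp(-u / 2)
    numerator = (P ** 2 + 2 * (m - 1) * P * R + Q ** 2
                 + (m - 1) ** 2 * R ** 2 + (m - 1) * S ** 2)
    return numerator / (m ** 2 * (1 - v))

def kappa_1m_series(m, r):
    """Power series for kappa_1m, truncated after the r^14 term."""
    m2 = m * m
    return ((m - 1) / m * r ** -2 + (m - 1) / (2 * m)
            + (m + 2) * (m + 1) / (12 * m2) * r ** 2
            - (m + 4) * (m + 3) / (720 * m2) * r ** 6
            + (m + 6) * (m + 5) / (30240 * m2) * r ** 10
            - (m + 8) * (m + 7) / (1209600 * m2) * r ** 14)

for m in (1, 2, 5):
    print(m, kappa_1m_closed(m, 0.5), kappa_1m_series(m, 0.5))
```

At r = 0.5 the neglected terms start at order r¹⁸, so the two columns should agree to many digits.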
(The power series for $\kappa_{1m}$ and $\kappa_{2m}$ were given in [BSZ1] and [BSZ2] respectively.)
4. The Point Case
We now prove Theorem 1.3: For the case k = m, where the zero set is discrete, (12) becomes:
$$\kappa_{mm}(r)=\frac{G_m(r)}{(m!)^2\det A(r)^m}, \tag{33}$$
where
$$G_m(r)=G_{2mm}(r)=\int_{\mathbb C^{2m^2}}\Big|\det_{1\le j,q\le m}\big(\xi^1_{jq}\big)\Big|^2\,\Big|\det_{1\le j,q\le m}\big(\xi^2_{jq}\big)\Big|^2\,d\gamma_{\Lambda(r)}(\xi). \tag{34}$$
We let $\langle\,\cdot\,\rangle_{\Lambda(r)}=\int\cdot\;d\gamma_{\Lambda(r)}$ denote the expected value with respect to the Gaussian probability measure $\gamma_{\Lambda(r)}$. Thus
$$G_m(r)=\big\langle\det(\xi^1)\det(\xi^2)\det(\bar\xi^1)\det(\bar\xi^2)\big\rangle_{\Lambda(r)}=\sum_{\alpha,\beta,\mu,\nu}(-1)^{\alpha+\beta+\mu+\nu}\Big\langle\prod_q\xi^1_{\alpha_qq}\prod_q\xi^2_{\beta_qq}\prod_q\bar\xi^1_{\mu_qq}\prod_q\bar\xi^2_{\nu_qq}\Big\rangle_{\Lambda(r)}, \tag{35}$$
where the sum is over all 4-tuples $\alpha,\beta,\mu,\nu\in S_m$ (= permutations of {1, . . . , m}), and $(-1)^\sigma$ stands for the sign of the permutation σ. We shall compute the terms of (35) using the Wick formula rather than directly from the Berezin integral formula. The computation simplifies considerably since the matrix $\Lambda(r)$ is sparse. In fact, we shall see that the sign $(-1)^{\alpha+\beta+\mu+\nu}$ is positive whenever the corresponding moment is nonzero. Let us now evaluate the moments of order 4m in (35):
$$M_{\alpha\beta\mu\nu}:=\Big\langle\prod_q\bar\xi^1_{\mu_qq}\prod_q\bar\xi^2_{\nu_qq}\prod_q\xi^1_{\alpha_qq}\prod_q\xi^2_{\beta_qq}\Big\rangle_{\Lambda(r)}. \tag{36}$$
Recall that the Wick formula expresses $M_{\alpha\beta\mu\nu}$ as a sum of products of second moments with respect to the Gaussian measure $\gamma_{\Lambda(r)}$. Since this Gaussian is complex, these second moments come from pairings of ξ’s with $\bar\xi$’s. We write
$$\Lambda^{pjq}_{p'j'q'}=\delta^{j}_{j'}\,\Lambda^{pq}_{p'q'}=\big\langle\xi^p_{jq}\,\bar\xi^{p'}_{j'q'}\big\rangle_{\Lambda(r)}. \tag{37}$$
Hence the term $M_{\alpha\beta\mu\nu}$ equals the permanent of the submatrix of $\big(\Lambda^{pjq}_{p'j'q'}\big)$ formed from the rows corresponding to the variables $\xi^1_{\alpha_11},\dots,\xi^1_{\alpha_mm},\xi^2_{\beta_11},\dots,\xi^2_{\beta_mm}$ and columns corresponding to $\bar\xi^1_{\mu_11},\dots,\bar\xi^1_{\mu_mm},\bar\xi^2_{\nu_11},\dots,\bar\xi^2_{\nu_mm}$:
$$\begin{pmatrix}\Lambda^{1\alpha_11}_{1\mu_11}&\cdots&\Lambda^{1\alpha_11}_{1\mu_mm}&\Lambda^{1\alpha_11}_{2\nu_11}&\cdots&\Lambda^{1\alpha_11}_{2\nu_mm}\\ \vdots&&\vdots&\vdots&&\vdots\\ \Lambda^{1\alpha_mm}_{1\mu_11}&\cdots&\Lambda^{1\alpha_mm}_{1\mu_mm}&\Lambda^{1\alpha_mm}_{2\nu_11}&\cdots&\Lambda^{1\alpha_mm}_{2\nu_mm}\\ \Lambda^{2\beta_11}_{1\mu_11}&\cdots&\Lambda^{2\beta_11}_{1\mu_mm}&\Lambda^{2\beta_11}_{2\nu_11}&\cdots&\Lambda^{2\beta_11}_{2\nu_mm}\\ \vdots&&\vdots&\vdots&&\vdots\\ \Lambda^{2\beta_mm}_{1\mu_11}&\cdots&\Lambda^{2\beta_mm}_{1\mu_mm}&\Lambda^{2\beta_mm}_{2\nu_11}&\cdots&\Lambda^{2\beta_mm}_{2\nu_mm}\end{pmatrix}. \tag{38}$$
(Recall that permanent$(V_{ij})=\sum_\sigma\prod_iV_{i\sigma_i}$, where the sum is over all permutations σ.) To compute $\kappa_{mm}(r)$, we can set $z^1=(r,0,\dots,0)$, $z^2=0$, as before. Recalling (24) and the fact that $\Lambda^{pjq}_{p'j'q'}=0$ for $(j,q)\neq(j',q')$, we observe that (38) is made up of 4 diagonal matrices. For example, if m = 3, then (38) becomes
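$$\begin{pmatrix}\delta^{\alpha_1}_{\mu_1}P&0&0&\delta^{\alpha_1}_{\nu_1}Q&0&0\\ 0&\delta^{\alpha_2}_{\mu_2}R&0&0&\delta^{\alpha_2}_{\nu_2}S&0\\ 0&0&\delta^{\alpha_3}_{\mu_3}R&0&0&\delta^{\alpha_3}_{\nu_3}S\\ \delta^{\beta_1}_{\mu_1}Q&0&0&\delta^{\beta_1}_{\nu_1}P&0&0\\ 0&\delta^{\beta_2}_{\mu_2}S&0&0&\delta^{\beta_2}_{\nu_2}R&0\\ 0&0&\delta^{\beta_3}_{\mu_3}S&0&0&\delta^{\beta_3}_{\nu_3}R\end{pmatrix}.$$
We conclude that $M_{\alpha\beta\mu\nu}$ is a product of 2 × 2 permanents:
$$M_{\alpha\beta\mu\nu}=\Big(\delta^{\alpha_1}_{\mu_1}\delta^{\beta_1}_{\nu_1}P^2+\delta^{\alpha_1}_{\nu_1}\delta^{\beta_1}_{\mu_1}Q^2\Big)\prod_{q=2}^{m}\Big(\delta^{\alpha_q}_{\mu_q}\delta^{\beta_q}_{\nu_q}R^2+\delta^{\alpha_q}_{\nu_q}\delta^{\beta_q}_{\mu_q}S^2\Big). \tag{39}$$
In particular, $M_{\alpha\beta\mu\nu}$ vanishes unless
$$\{\mu_q,\nu_q\}=\{\alpha_q,\beta_q\}\quad\text{for } 1\le q\le m. \tag{40}$$
We now claim that (40) implies that $(-1)^{\alpha+\beta+\mu+\nu}=1$: First of all, by multiplying the four permutations by $\alpha^{-1}$ on the left, we can assume without loss of generality that $\alpha_i=i$ for all i. Now write β as a product of disjoint cycles. Then one sees that µ is a product of some of these cycles and ν is a product of the other cycles, and the positivity of the product of signs easily follows. Hence Eq. (35) becomes
$$G_m(r)=\sum_{\alpha,\beta,\mu,\nu}M_{\alpha\beta\mu\nu}. \tag{41}$$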
We now use (35) to evaluate $G_m$ for arbitrary dimension m.
Lemma 4.1. $G_m=(m-1)!\,m!\,\big[P^2f_m(R^2,S^2)+Q^2f_m(S^2,R^2)\big]$, where
$$f_m(x,y)=y^{m-1}+2xy^{m-2}+\cdots+(m-1)x^{m-2}y+mx^{m-1}=\frac{mx^{m+1}+y^{m+1}-(m+1)x^my}{(x-y)^2}.$$
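For small m, Lemma 4.1 can be confirmed by brute force: sum the moments (39) over all 4-tuples of permutations and compare with the closed form. The sketch below does this with exact rational arithmetic; the numerical values substituted for P², Q², R², S² are arbitrary test values, and the helper names are illustrative. It also asserts the sign claim made after (40), namely that every nonzero term has total sign +1.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial

def sign(p):
    """Sign of a permutation given as a tuple of 0-based images."""
    s = 1
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if p[i] > p[j]:
                s = -s
    return s

def f_sum(m, x, y):
    """f_m(x, y) = sum_{j=1}^{m} j x^(j-1) y^(m-j)."""
    return sum(j * x ** (j - 1) * y ** (m - j) for j in range(1, m + 1))

def f_closed(m, x, y):
    """Closed form of f_m stated in Lemma 4.1 (requires x != y)."""
    return (m * x ** (m + 1) + y ** (m + 1) - (m + 1) * x ** m * y) / (x - y) ** 2

def moment(a, b, mu, nu, P2, Q2, R2, S2):
    """M_{alpha beta mu nu} from (39); P2 stands for P^2, and so on."""
    total = Fraction(1)
    for q in range(len(a)):
        diag, off = (P2, Q2) if q == 0 else (R2, S2)
        total *= ((a[q] == mu[q] and b[q] == nu[q]) * diag
                  + (a[q] == nu[q] and b[q] == mu[q]) * off)
    return total

def G_brute(m, P2, Q2, R2, S2):
    """Sum (41) over all 4-tuples of permutations, checking the sign claim."""
    perms = list(permutations(range(m)))
    total = Fraction(0)
    for a, b, mu, nu in product(perms, repeat=4):
        M = moment(a, b, mu, nu, P2, Q2, R2, S2)
        if M != 0:
            assert sign(a) * sign(b) * sign(mu) * sign(nu) == 1
            total += M
    return total

def G_lemma(m, P2, Q2, R2, S2):
    """Right-hand side of Lemma 4.1."""
    return factorial(m - 1) * factorial(m) * (P2 * f_sum(m, R2, S2) + Q2 * f_sum(m, S2, R2))

vals = [Fraction(v) for v in (2, 3, 5, 7)]  # arbitrary test values for P^2, Q^2, R^2, S^2
for m in (1, 2, 3):
    print(m, G_brute(m, *vals), G_lemma(m, *vals))
```

The enumeration has (m!)⁴ terms, so it is only practical for very small m; it is meant as a check of the combinatorics, not as a way to compute G_m.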
Proof. We use induction on m. The identity holds trivially for m = 1, since f1 = 1 and G1 = P 2 + Q2 . Let m ≥ 2 and suppose the identity has been verified for 1, . . . , m − 1. Since Mαβµν is unchanged if we multiply all the permutations on the left by α −1 , we " have Gm = m!G0m , where G0m = β,µ,ν Meβµν (e = identity). For 1 ≤ i ≤ m, let Ci ⊂ Sm denote the collection of i-cycles of the form (1a2 . . . ai ). For each σ ∈ Ci , let σ ⊥ denote those permutations τ ∈ Sm that fix the elements 1, a1 , . . . , ai . We claim that Meβµν = (m − i)!(P 2 R 2i−2 + Q2 S 2i−2 )gm−i (R 2 , S 2 ), (42) β∈σ ⊥ µ,ν
where gl (x, y) = x l + x l−1 y + · · · + xy l−1 + y l .
Correlations Between Zeros and Supersymmetry
267
To verify (42), we can assume without loss of generality that $\sigma = (1 \dots i)$. Recall that we need only consider permutations $\mu$ that are products of some of the cycles in $\beta$ (and $\nu$ is determined by the pair $(\beta,\mu)$, since $\nu$ is the product of the other cycles of $\beta$ when $M_{e\beta\mu\nu} \ne 0$). For the $P^2$-terms of the sum, $\nu$ contains the cycle $(1 \dots i)$ so that $\mu_q = q$, $\nu_q = \sigma_q = \beta_q$ for $q = 1,\dots,i$. (For the $Q^2$-terms, $\mu$ contains $(1 \dots i)$.) Hence by (39), we have
\[ \sum_{\beta\in\sigma^\perp}\sum_{\mu,\nu} M_{e\beta\mu\nu} = P^2R^{2i-2}\sum_{\beta\in\sigma^\perp}\,{\sum_{\mu,\nu}}'\,\prod_{q=i+1}^{m}\left(\delta^{q}_{\mu_q}\delta^{\beta_q}_{\nu_q}R^2 + \delta^{q}_{\nu_q}\delta^{\beta_q}_{\mu_q}S^2\right) + \text{terms with } Q^2, \tag{43} \]
where $\sum'$ is over those $\mu,\nu$ with $\mu_q = q$, $\nu_q = \beta_q$, for $1\le q\le i$. To compute the double sum in the right side of (43), we notice by (39) and (41) that it equals $\frac{1}{(m-i)!}G_{m-i}$ with $P, Q$ replaced by $R, S$ respectively. Hence by our inductive assumption, we have
\[ \sum_{\beta\in\sigma^\perp}\,{\sum_{\mu,\nu}}'\,\prod_{q=i+1}^{m}\left(\delta^{q}_{\mu_q}\delta^{\beta_q}_{\nu_q}R^2 + \delta^{q}_{\nu_q}\delta^{\beta_q}_{\mu_q}S^2\right) = (m-i-1)!\,\left[R^2 f_{m-i}(R^2,S^2) + S^2 f_{m-i}(S^2,R^2)\right] = (m-i)!\,g_{m-i}(R^2,S^2). \]
The computation of the $Q^2$ terms is similar, and hence (42) holds. We now have by (42),
\[ G_m = m!\sum_{i=1}^{m}\sum_{\sigma\in C_i}\sum_{\beta\in\sigma^\perp}\sum_{\mu,\nu} M_{e\beta\mu\nu} = m!\sum_{i=1}^{m}(\#C_i)(m-i)!\,\left(P^2R^{2i-2} + Q^2S^{2i-2}\right)g_{m-i}(R^2,S^2). \]
Noting that $(\#C_i)(m-i)! = (m-1)!$ and summing over $i$, we obtain the desired formula. □

We now complete the proof of Theorem 1.3. By (33) and Lemma 4.1, we have
\[ \kappa_{mm}(r) = \frac{P^2 f_m(R^2,S^2) + Q^2 f_m(S^2,R^2)}{m\,(\det A(r))^m}. \tag{44} \]
Substituting (25)–(26) into (44), we obtain (4). Finally, to verify (5), we note by (25) that $P = \frac12 r^2 + \cdots$, $Q = \frac12 r^2 + \cdots$, $R = 1$, $S = 1 + \cdots$, and hence
\[ \kappa_{mm} = \frac{2f_m(1,1)(r^4/4) + \cdots}{m r^{2m} + \cdots} = \frac{f_m(1,1)}{2m}\,r^{4-2m} + \cdots = \frac{m+1}{4}\,r^{4-2m} + \cdots. \]
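Three elementary identities carry the induction and the leading-order computation above: $x f_l(x,y) + y f_l(y,x) = l\,g_l(x,y)$, $\sum_{i=1}^m x^{i-1} g_{m-i}(x,y) = f_m(x,y)$, and $f_m(1,1)/(2m) = (m+1)/4$. The sketch below checks them in exact arithmetic; the helper names `f` and `g` and the sample point are illustrative assumptions.

```python
from fractions import Fraction


def f(m, x, y):
    """f_m(x, y) = sum_{k=1}^m k x^(k-1) y^(m-k), as in Lemma 4.1."""
    return sum(k * x ** (k - 1) * y ** (m - k) for k in range(1, m + 1))


def g(l, x, y):
    """g_l(x, y) = x^l + x^(l-1) y + ... + y^l, as defined after (42)."""
    return sum(x ** (l - j) * y ** j for j in range(l + 1))


# Arbitrary rational test point (hypothetical values).
x, y = Fraction(2, 3), Fraction(5, 4)

# Step used in the inductive assumption: (l-1)! [x f_l(x,y) + y f_l(y,x)] = l! g_l(x,y).
induction_ok = all(x * f(l, x, y) + y * f(l, y, x) == l * g(l, x, y) for l in range(1, 8))

# Step used when summing (42) over i: sum_i x^(i-1) g_{m-i}(x,y) = f_m(x,y).
summation_ok = all(
    sum(x ** (i - 1) * g(m - i, x, y) for i in range(1, m + 1)) == f(m, x, y)
    for m in range(1, 8)
)

# Leading coefficient of kappa_mm: f_m(1,1) / (2m) = (m+1)/4.
leading_ok = all(Fraction(f(m, 1, 1), 2 * m) == Fraction(m + 1, 4) for m in range(1, 8))

print(induction_ok, summation_ok, leading_ok)
```

The same summation identity with $x$ and $y$ exchanged handles the $Q^2$ terms, since $g_l$ is symmetric in its arguments.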
The following proposition yields the remainder estimate of (5).
Proposition 4.2. If $m$ is odd, resp. even, then $\kappa_{mm}(r)$ is an odd, resp. even, function of $r^2$.

Proof. Let $\tilde P$, $\tilde Q$ be the functions given by $P(r) = \tilde P(u)$, $Q(r) = \tilde Q(u)$, where $u = r^2$. From (44), we have
\[ \kappa_{mm} = \frac{\tilde P(u)^2 f_m(1,e^{-u}) + \tilde Q(u)^2 f_m(e^{-u},1)}{m\,(1-e^{-u})^m}. \]
We observe from (25) that $\tilde P(-u) = e^{u/2}\tilde Q(u)$ and thus
\[ \kappa_{mm}(-u) = \frac{e^{u}\tilde Q(u)^2 f_m(1,e^{u}) + e^{u}\tilde P(u)^2 f_m(e^{u},1)}{m\,(1-e^{u})^m} = (-1)^m\kappa_{mm}(u), \]
since $f_m$ is homogeneous of order $m-1$. □
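The expansions of (4) are easily obtained using Maple™:
\[ \kappa_{11}(r) = \frac{1}{2}r^{2} - \frac{1}{36}r^{6} + \frac{1}{720}r^{10} - \frac{1}{16800}r^{14} + \frac{1}{435456}r^{18} - \frac{691}{8382528000}r^{22}\cdots, \]
\[ \kappa_{22}(r) = \frac{3}{4} + \frac{1}{24}r^{4} - \frac{1}{288}r^{8} + \frac{1}{4800}r^{12} - \frac{1}{96768}r^{16} + \frac{691}{1524096000}r^{20}\cdots, \]
\[ \kappa_{33}(r) = r^{-2} + \frac{1}{4}r^{2} - \frac{11}{2160}r^{6} - \frac{1}{50400}r^{10} + \frac{1}{80640}r^{14} - \frac{4871}{5029516800}r^{18}\cdots, \]
\[ \kappa_{44}(r) = \frac{5}{4}r^{-4} + \frac{95}{144} + \frac{19}{576}r^{4} - \frac{79}{40320}r^{8} + \frac{7}{82944}r^{12} - \frac{6049}{2235340800}r^{16}\cdots, \]
\[ \kappa_{55}(r) = \frac{3}{2}r^{-6} + \frac{4}{3}r^{-2} + \frac{55}{288}r^{2} - \frac{19}{16800}r^{6} - \frac{257}{1451520}r^{10} + \frac{21337}{1397088000}r^{14}\cdots, \]
\[ \kappa_{66}(r) = \frac{7}{4}r^{-8} + \frac{7}{3}r^{-4} + \frac{5257}{8640} + \frac{407}{14400}r^{4} - \frac{103}{82944}r^{8} + \frac{38177}{1197504000}r^{12}\cdots \]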
References

[BSZ1] Bleher, P., Shiffman, B. and Zelditch, S.: Poincaré–Lelong approach to universality and scaling of correlations between zeros. Commun. Math. Phys. 208, 771–785 (2000)
[BSZ2] Bleher, P., Shiffman, B. and Zelditch, S.: Universality and scaling of correlations between zeros on complex manifolds. Invent. Math. 142, 351–395 (2000)
[BSZ3] Bleher, P., Shiffman, B. and Zelditch, S.: Universality and scaling of zeros on symplectic manifolds. Random Matrices and Their Applications, P. Bleher and A. Its (eds). MSRI Publications 40. Cambridge: Cambridge Univ. Press, 2001, pp. 31–69
[Ef] Efetov, K.: Supersymmetry in disorder and chaos. Cambridge: Cambridge Univ. Press, 1996
[Ha] Hannay, J.H.: Chaotic analytic zero points: Exact statistics for those of a random spin state. J. Phys. A: Math. Gen. 29, 101–105 (1996)
[ID] Itzykson, C. and Drouffe, J.M.: Statistical field theory, Vol. 1. Cambridge: Cambridge Univ. Press, 1989
[Kac] Kac, M.: On the average number of real roots of a random algebraic equation. Bull. Am. Math. Soc. 49, 314–320 (1943)
[Ri] Rice, S.O.: Mathematical analysis of random noise. Bell System Tech. J. 23, 282–332 (1944) and 24, 46–156 (1945); reprinted in: Selected papers on noise and stochastic processes. New York: Dover, 1954, pp. 133–294
[SZ1] Shiffman, B. and Zelditch, S.: Distribution of zeros of random and quantum chaotic sections of positive line bundles. Commun. Math. Phys. 200, 661–683 (1999)
[SZ2] Shiffman, B. and Zelditch, S.: Asymptotics of almost holomorphic sections of ample line bundles on symplectic manifolds. J. Reine Angew. Math., to appear
[SZ3] Shiffman, B. and Zelditch, S.: Asymptotics of almost holomorphic sections of ample line bundles on symplectic manifolds: an addendum. Proc. Amer. Math. Soc., to appear
[Zi] Zirnbauer, M.: Supersymmetry for systems with unitary disorder: Circular ensembles. J. Phys. A 29, 7113–7136 (1996)
Communicated by M. Aizenman
Commun. Math. Phys. 224, 271 – 305 (2001)
On the 1/n Expansion for Some Unitary Invariant Ensembles of Random Matrices

S. Albeverio^{1,2}, L. Pastur^{3,4,*}, M. Shcherbina^{4}

1 Institut für Angewandte Mathematik, Universität Bonn, Bonn, Germany
2 CERFIM, Locarno, Switzerland
3 Centre de Physique Théorique de CNRS, 13288 Marseille, France. E-mail: [email protected]
4 Mathematical Division of the Institute for Low Temperature Physics, Kharkov, Ukraine. E-mail: [email protected]
Received: 9 November 2000 / Accepted: 26 July 2001
Dedicated to Joel L. Lebowitz on the occasion of his 70th birthday

Abstract: We present a version of the 1/n-expansion for random matrix ensembles known as matrix models. The case where the support of the density of states of an ensemble consists of one interval and the case where the density of states is even and its support consists of two symmetric intervals are treated. In these cases we construct the expansion scheme for the Jacobi matrix determining a large class of expectations of symmetric functions of eigenvalues of random matrices, prove the asymptotic character of the scheme and give an explicit form of the first two terms. This allows us, in particular, to clarify certain theoretical physics results on the variance of the normalized traces of the resolvent of random matrices. We also find the asymptotic form of several related objects, such as smoothed squares of certain orthogonal polynomials, the normalized trace and the matrix elements of the resolvent of the Jacobi matrices, etc.

1. Introduction. Problem and Main Results

Random matrix theory is an actively developing field that has a wide variety of applications (see e.g. the review works [20, 16, 23] and references therein). Among the numerous random matrix ensembles that the theory studies and that have important applications, the ensembles with unitary invariant probability distributions (also known as matrix models) play a significant role [15, 22]. This is, in particular, because of numerous links of the ensembles with the theory of orthogonal polynomials, potential theory, the theory of integrable systems, and other domains and techniques of analysis and mathematical physics. These ensembles consist of $n\times n$ Hermitian matrices and are defined by the distribution
\[ P_n(M)\,dM = Z_n^{-1}\exp\{-n\,\mathrm{Tr}\,V(M)\}\,dM, \tag{1.1} \]
where $Z_n$ is the normalizing constant, $V:\mathbb{R}\to\mathbb{R}_+$ satisfies the conditions

(i) for some $\epsilon > 0$ there exists $L_1 > 0$ such that
\[ |V(\lambda)| \ge (2+\epsilon)\log|\lambda|,\quad |\lambda|\ge L_1, \tag{1.2} \]
(ii) for any $0 < L_2 < \infty$ there exists $\gamma > 0$ such that
\[ |V(\lambda_1) - V(\lambda_2)| \le C|\lambda_1-\lambda_2|^{\gamma},\quad |\lambda_{1,2}|\le L_2, \tag{1.3} \]
(iii) there exists $m > 0$ such that
\[ \int |V'(\lambda)|\,e^{-mV(\lambda)}\,d\lambda < \infty, \tag{1.4} \]

and
\[ dM = \prod_{j=1}^{n} dM_{jj}\prod_{j<k} d\,\Re M_{jk}\,d\,\Im M_{jk}. \tag{1.5} \]

* On leave from the U.F.R. de Mathématiques, Université Paris 7.
The asymptotic regime that we study is intermediate between the global regime (see e.g. [8, 12]) and the local regime (see e.g. [24, 13]) and the respective results are important in studying the central limit theorem [17], universal conductance fluctuations [6], and the universality of the local eigenvalue statistics at the edges of the support of the Integrated Density of States (IDS) of the ensembles [25].

Let us recall the definition of the IDS. Denote by $\lambda_1^{(n)},\dots,\lambda_n^{(n)}$ the eigenvalues of a matrix $M$ of the ensemble and define the eigenvalue counting measure (NCM) of the matrix as
\[ N_n(\Delta) = \#\{\lambda_l^{(n)}\in\Delta\}\cdot n^{-1}, \tag{1.6} \]
where $\Delta$ is an interval of the spectral axis. According to [8] the NCM tends weakly in probability as $n\to\infty$ to the nonrandom limiting measure $N$ known as the Integrated Density of States (IDS) of the ensemble. The IDS is normalized to unity and is absolutely continuous if $V$ satisfies the Lipschitz condition (1.3) (with possibly different constants $C$ and $\gamma$) [27]:
\[ N(\mathbb{R}) = 1,\quad N(\Delta) = \int_{\Delta}\rho(\lambda)\,d\lambda. \tag{1.7} \]
The non-negative function $\rho$ in (1.7) is called the Density of States (DOS) of the ensemble. The IDS can be found as the unique solution of a certain variational problem [8, 12, 27]. The IDS is one of the main outputs of the study of the global regime.

Let us state now our main conditions.

Condition C1. The support $\sigma$ of the IDS of the ensemble consists of either
(i) a single interval: $\sigma = [a,b]$, $-\infty < a < b < \infty$, or
(ii) two symmetric intervals: $\sigma = [-b,-a]\cup[a,b]$, $-\infty < a < b < \infty$, and $V$ is even: $V(\lambda) = V(-\lambda)$, $\lambda\in\mathbb{R}$.

Remark 1. It is easy to see that changing the variables according to $M' = M - \frac{a+b}{2}I$ in case (i) we can always take the support $\sigma$ to be symmetric with respect to the origin. Therefore without loss of generality we can assume that in this case
\[ \sigma = (-a,a). \tag{1.8} \]
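The definition (1.6) of the NCM and its convergence to the IDS can be illustrated numerically. The sketch below assumes the Gaussian case $V(\lambda) = \lambda^2/2$, for which the limiting density is the Wigner semicircle $\rho(\lambda) = \frac{1}{2\pi}\sqrt{4-\lambda^2}$ on $[-2,2]$; the function names, the matrix size and the seed are illustrative choices, not taken from the paper.

```python
import numpy as np


def sample_matrix(n, rng):
    """Hermitian M with density proportional to exp(-n Tr M^2 / 2), i.e. V = lambda^2/2."""
    a = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    # Diagonal entries get variance 1/n; real and imaginary off-diagonal parts get 1/(2n).
    return (a + a.conj().T) / (2.0 * np.sqrt(n))


def counting_measure(eigs, lo, hi):
    """N_n([lo, hi]) as in (1.6): fraction of eigenvalues in the interval."""
    return np.count_nonzero((eigs >= lo) & (eigs <= hi)) / eigs.size


def semicircle_mass(lo, hi):
    """N([lo, hi]) for the semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2]."""
    def cdf_part(x):
        x = min(max(x, -2.0), 2.0)
        return (0.5 * x * np.sqrt(4.0 - x * x) + 2.0 * np.arcsin(0.5 * x)) / (2.0 * np.pi)
    return cdf_part(hi) - cdf_part(lo)


rng = np.random.default_rng(0)
eigs = np.linalg.eigvalsh(sample_matrix(400, rng))
empirical = counting_measure(eigs, 0.0, 1.0)
limit = semicircle_mass(0.0, 1.0)
print(empirical, limit)
```

For a single sample of moderate size the two numbers should already agree to about two decimal places, which is the weak convergence in probability mentioned above seen on one interval.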
Condition C2. The DOS $\rho(\lambda)$ is strictly positive in all internal points of $\sigma$ and behaves asymptotically as $\mathrm{const}\cdot|\lambda-c|^{1/2}$, $\lambda\to c$, in a neighborhood of each edge $c$ of the support. Besides, the function
\[ u(\lambda) = 2\int\log|\mu-\lambda|\,\rho(\mu)\,d\mu - V(\lambda) \tag{1.9} \]
achieves its maximum if and only if $\lambda\in\sigma$. We will call this behavior generic (see e.g. [19] for results justifying the term).

Condition C3. $V(\lambda)$ is real analytic on $\sigma$, i.e. there exists an open domain $D\subset\mathbb{C}$ such that $\sigma\subset D$ and a function $V(z)$, $z\in D$, analytic in $D$, such that
\[ V(\lambda+i0) = V(\lambda),\quad \lambda\in\sigma. \]
Note that we always have the one-interval case (i) if $V$ is convex [8, 17], or if it has a unique absolute minimum and sufficiently large amplitude [19], and we always have the two-interval case (ii) if $V$ has two equal absolute minima and sufficiently large amplitude [19]. Condition C3 holds in many applications to quantum field theory [15] and condensed matter theory [6]. The following statement, known in several contexts, provides a sufficiently explicit form of the IDS in our cases.

Proposition 1. Let an ensemble of form (1.1)–(1.5) satisfy conditions C1–C3 above. Then its density of states $\rho$ has the form
\[ \rho(\lambda) = \frac{1}{2\pi}\,\chi_\sigma(\lambda)P(\lambda)X_+(\lambda), \tag{1.10} \]
where $\chi_\sigma(\lambda)$ is the indicator of the support $\sigma$ of the IDS, $P(\lambda)$ is analytic in $D$ (including $\sigma$) and
\[ X_+(\lambda) = \begin{cases}\sqrt{a^2-\lambda^2}, & |\lambda|\le a, \text{ in the case (i)},\\ \operatorname{sign}\lambda\,\sqrt{(\lambda^2-a^2)(b^2-\lambda^2)}, & a\le|\lambda|\le b, \text{ in the case (ii)}.\end{cases} \tag{1.11} \]
Besides, the Stieltjes transform
\[ g(z) = \int_\sigma\frac{\rho(\mu)\,d\mu}{z-\mu},\quad \Im z\ne 0, \tag{1.12} \]
of the IDS can be represented in the form
\[ g(z) = \frac{1}{2}\left(V'(z) - X(z)P(z)\right),\quad z\in D, \tag{1.13} \]
with
\[ X(z) = \begin{cases}\sqrt{z^2-a^2}, & \text{in the case (i)},\\ \sqrt{(z^2-a^2)(z^2-b^2)}, & \text{in the case (ii)},\end{cases} \tag{1.14} \]
and we take the branches of the square roots which are analytic everywhere except $\sigma$ and have the asymptotic $X(z) = z^p(1+O(z^{-1}))$, $z\to\infty$, with $p = 1, 2$ for the one-interval and for the two-interval cases respectively. $P(z)$ in (1.13) and in (1.10) can be represented in the form
\[ P(z) = \frac{1}{2\pi i}\oint_L Q(z,\zeta)X^{-1}(\zeta)\,d\zeta, \tag{1.15} \]
where $L\subset D$ is a closed contour encircling $\sigma$, and
\[ Q(z,\zeta) \equiv \frac{V'(z)-V'(\zeta)}{z-\zeta}. \tag{1.16} \]
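As a plausibility check of (1.12)–(1.13), one can compare the two expressions for $g(z)$ in the simplest case. The sketch assumes the Gaussian potential $V(\lambda) = \lambda^2/2$, for which $a = 2$, $Q \equiv 1$ and hence $P \equiv 1$; the quadrature rule (Gauss–Chebyshev of the second kind) and all function names are choices made for this illustration only.

```python
import numpy as np

A = 2.0  # assumed endpoint of the support [-A, A] for V(lambda) = lambda^2 / 2


def X(z):
    """Branch of sqrt(z^2 - A^2) analytic off [-A, A] with X(z) ~ z at infinity."""
    z = np.asarray(z, dtype=complex)
    return z * np.sqrt(1.0 - (A / z) ** 2)


def g_formula(z):
    """Right-hand side of (1.13) with V'(z) = z and P(z) = 1."""
    return 0.5 * (z - X(z))


def g_quadrature(z, nodes=2000):
    """Left-hand side (1.12): integral of rho(mu)/(z - mu), rho = sqrt(A^2 - mu^2) / (2 pi)."""
    k = np.arange(1, nodes + 1)
    theta = k * np.pi / (nodes + 1)
    mu = A * np.cos(theta)
    # Gauss-Chebyshev (second kind) weights for the weight sqrt(1 - t^2) on [-1, 1].
    w = (np.pi / (nodes + 1)) * np.sin(theta) ** 2
    return (A ** 2 / (2.0 * np.pi)) * np.sum(w / (z - mu))


for z in (1.0 + 0.5j, 3.0 + 0.0j, -0.3 - 1.2j):
    print(z, abs(g_formula(z) - g_quadrature(z)))
```

The printed differences should be at the level of rounding error, which also confirms that the chosen branch of $X(z)$ has its cut exactly on the support.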
The proof of the proposition will be given in Sect. 3. Here we remark that in the two-interval case the contour $L$ consists of two connected components encircling each of the intervals of $\sigma$.

We need also several facts on ensembles (1.1)–(1.5) (see e.g. [7, 20, 22]). Denote by $p_n(\lambda_1,\dots,\lambda_n)$ the joint eigenvalue probability density, which we assume to be symmetric without loss of generality. It is known that [20]
\[ p_n(\lambda_1,\dots,\lambda_n) = Z_n^{-1}\prod_{1\le j<k\le n}(\lambda_j-\lambda_k)^2\exp\Big\{-n\sum_{j=1}^{n}V(\lambda_j)\Big\}, \tag{1.17} \]
where $Z_n$ is the respective normalization factor. Let
\[ p_l^{(n)}(\lambda_1,\dots,\lambda_l) = \int p_n(\lambda_1,\dots,\lambda_l,\lambda_{l+1},\dots,\lambda_n)\,d\lambda_{l+1}\dots d\lambda_n \tag{1.18} \]
be the $l$th marginal distribution density of (1.17). The link with orthogonal polynomials is provided by the formula [20, 7]
\[ p_l^{(n)}(\lambda_1,\dots,\lambda_l) = \frac{(n-l)!}{n!}\det\|k_n(\lambda_j,\lambda_k)\|_{j,k=1}^{l}, \tag{1.19} \]
where
\[ k_n(\lambda,\mu) = \sum_{l=1}^{n}\psi_l^{(n)}(\lambda)\psi_l^{(n)}(\mu) \tag{1.20} \]
is known as the reproducing kernel of the orthonormalized system
\[ \psi_l^{(n)}(\lambda) = \exp\{-nV(\lambda)/2\}\,p_{l-1}^{(n)}(\lambda),\quad l = 1,\dots, \tag{1.21} \]
in which $p_l^{(n)}(\lambda)$, $l = 1,\dots$ are orthogonal polynomials on $\mathbb{R}$ associated with the weight
\[ w_n(\lambda) = e^{-nV(\lambda)}, \tag{1.22} \]
i.e.
\[ \int p_l^{(n)}(\lambda)\,p_m^{(n)}(\lambda)\,w_n(\lambda)\,d\lambda = \delta_{l,m}. \tag{1.23} \]
The polynomial $p_l^{(n)}(\lambda)$ has degree $l$ and a positive coefficient in front of $\lambda^l$. The orthonormalized functions $\psi_l^{(n)}(\lambda)$ satisfy the recurrence relations
\[ J_l(n)\psi_{l+1}^{(n)}(\lambda) + q_l(n)\psi_l^{(n)}(\lambda) + J_{l-1}(n)\psi_{l-1}^{(n)}(\lambda) = \lambda\psi_l^{(n)}(\lambda),\quad l = 1,\dots, \tag{1.24} \]
where $J_0(n) = 0$. In other words, we have here a semi-infinite real symmetric Jacobi matrix $J(n) = \{J_{l,m}(n)\}_{l,m=1}^{\infty}$,
\[ J_{l,m}(n) = q_l(n)\delta_{l,m} + J_l(n)\delta_{l+1,m} + J_{l-1}(n)\delta_{l-1,m}. \tag{1.25} \]
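The entries of the Jacobi matrix (1.25) can be computed numerically from the weight (1.22) alone. The sketch below uses a discretized Stieltjes procedure, which is a standard numerical technique and not the method of this paper, and assumes the Gaussian case $V(\lambda) = \lambda^2/2$, where the rescaled Hermite recurrence gives $J_l(n) = \sqrt{l/n}$ and $q_l(n) = 0$. The grid, the value of $n$ and the function names are illustrative.

```python
import numpy as np


def jacobi_coefficients(V, n, count, grid):
    """Return (q_1..q_count, J_1..J_count) of (1.24) for the weight exp(-n V) on a uniform grid."""
    w = np.exp(-n * V(grid)) * (grid[1] - grid[0])  # discretized weight w_n(lambda) d lambda
    q = np.zeros(count)
    J = np.zeros(count)
    p_prev = np.zeros_like(grid)             # p_{l-2}, absent for l = 1
    p = np.ones_like(grid) / np.sqrt(w.sum())  # orthonormal p_0
    J_prev = 0.0                             # J_0(n) = 0
    for l in range(1, count + 1):
        q[l - 1] = np.sum(w * grid * p * p)            # q_l = <lambda p_{l-1}, p_{l-1}>
        r = (grid - q[l - 1]) * p - J_prev * p_prev    # equals J_l p_l before normalization
        J[l - 1] = np.sqrt(np.sum(w * r * r))
        p_prev, p = p, r / J[l - 1]
        J_prev = J[l - 1]
    return q, J


n = 20
grid = np.linspace(-6.0, 6.0, 6001)
q, J = jacobi_coefficients(lambda x: 0.5 * x ** 2, n, 25, grid)
exact = np.sqrt(np.arange(1, 26) / n)
print(np.max(np.abs(J - exact)), np.max(np.abs(q)))
```

In this special case $J_n(n) = 1$, which equals half the endpoint $a = 2$ of the semicircle support; for other potentials the same routine can be used to explore how $J_k(n)$ behaves for $k$ near $n$.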
Note that if $V(\lambda)$ is even, then $q_l(n) = 0$, $l = 1,\dots$. As in statistical mechanics the symmetrized marginal densities (1.18) allow us to compute the expectation with respect to measure (1.1) of random variables of the form
\[ \omega_m(\lambda_1,\dots,\lambda_n) = \sum_{\mathrm{ch}}\varphi_m(\lambda_{i_1},\dots,\lambda_{i_m}), \tag{1.26} \]
where $\varphi_m(t_1,\dots,t_m)$ is symmetric in its arguments and $\sum_{\mathrm{ch}}$ denotes the sum over all choices of $m$ $\lambda$'s from the set $(\lambda_1,\dots,\lambda_n)$. By using (1.19) and noting that the semi-infinite matrix $\{\psi_j^{(n)}(\lambda)\psi_k^{(n)}(\lambda)\}_{j,k=1}^{\infty}$ is the density of the resolution of identity of $J(n)$, it is easy to show that $E\{\omega_m\}$ is a linear combination of the matrix elements of $\varphi_m((J(n))^{\otimes m})$ (see e.g. formula (2.78)). This observation makes $J(n)$ an important object of the theory. That is why our main result (Theorem 1 below) yields the 1/n-expansion of entries of $J(n)$. Besides, we give the 1/n-expansion of the expectation
\[ g_n(z) = E\{n^{-1}\,\mathrm{Tr}\,(z-M)^{-1}\},\quad \Im z\ne 0, \]
of the normalized trace of the resolvent of the random matrix $M$ and of the variance of the trace. These quantities are of considerable interest by themselves and are also important technical ingredients of the theory.

Theorem 1. Let the ensemble of the form (1.1)–(1.5) satisfy conditions (1.2), (1.3), and C1–C3 above. Take a sequence of positive integers $N(n)$ and an integer $m > 0$ such that
\[ N(n)\,n^{-1/(m+1)}\to 0,\quad N(n)\log^{-2}n\to\infty,\quad\text{as } n\to\infty. \tag{1.27} \]
Then there exist coefficients $q_{k,n}^{(0)},\dots,q_{k,n}^{(m)}$, $J_{k,n}^{(0)},\dots,J_{k,n}^{(m)}$, $|k-n|\le N(n)$, and functions $g_n^{(0)}(z),\dots,g_n^{(m)}(z)$, analytic outside of $\sigma$, such that for any $k\in[n-N(n),n+N(n)]$ we have the following asymptotic formulas:
\[ q_k(n) = \sum_{j=0}^{m}n^{-j}q_{k,n}^{(j)} + n^{-m}\tilde r_{k,n}^{(m,q)},\quad J_k(n) = \sum_{j=0}^{m}n^{-j}J_{k,n}^{(j)} + n^{-m}\tilde r_{k,n}^{(m,J)}, \tag{1.28} \]
where
\[ |q_{k,n}^{(j)}|,\ |J_{k,n}^{(j)}| \le \mathrm{const}\,(|k-n|^j+1),\quad j = 0,\dots,p, \tag{1.29} \]
\[ |\tilde r_{k,n}^{(m,J)}|,\ |\tilde r_{k,n}^{(m,q)}| \le \varepsilon_n^{(m)},\quad \varepsilon_n^{(m)}\to 0,\quad n\to\infty; \tag{1.30} \]
and
\[ g_n(z) = \sum_{j=0}^{m}n^{-j}g_n^{(j)}(z) + n^{-m}\tilde r_n^{(m,g)}(z), \tag{1.31} \]
where
\[ |g_n^{(j)}(z)| \le \mathrm{const},\quad \tilde r_n^{(m,g)}(z)\to 0,\quad n\to\infty, \tag{1.32} \]
uniformly in any compact set in $\{z:\delta(z)\ge d\}$, where $\delta(z) = \mathrm{dist}(z,\sigma)$, $d > 0$ is an arbitrary fixed number and const does not depend on $k$, $n$. In particular:
\[ g_n^{(0)}(z) = g(z),\quad g_n^{(1)}(z) = 0; \]
in the case (i) (0)
(0)
qk,n = 0,
Jk,n =
1 a, 2
(1)
qk,n = 0,
(1)
Jk,n =
1 1 (k − n) + , a P (a) P (−a)
(1.33)
(1.34)
i.e. the zero order coefficients for k = n(1 + o(1)) are independent of k; in the case (ii) (j )
qk,n = 0,
j = 0, 1, . . . ,
and (0)
1 1 (0) (b + (−1)k a), or Jk,n = (b − (−1)k a), 2 2
1 (k − n) (−1)k − = 2 ± , a − b2 P (b) P (a)
Jk,n = (1)
Jk,n
(1.35) (1.36)
where the sign corresponds to the chosen sign in (1.35) Besides, for m ≤ 1 formulas (1.28)–(1.35) are valid for any k : |k − n| ≤ n2/3 with |k − n| + 1 |k − n|2 + 1 (0,q) , , |˜rk,n | ≤ const n n2 |k − n|2 + 1 (1,q) (1,J ) |˜rk,n |, |˜rk,n | ≤ const . n (0,J )
|˜rk,n | ≤ const
(1.37)
Remark 2. According to (1.35) the 2-periodic function $J_{k,n}^{(0)}$ is determined by our method up to the shift by 1. By using recent results of [13] on the form of the leading term of the asymptotics of the orthogonal polynomials (1.23) it can be shown that
\[ J_{k,n}^{(0)} = \frac{1}{2}\left(b-(-1)^k a\right),\quad k = n(1+o(1)). \tag{1.38} \]
Moreover, all subsequent coefficients $J_{k,n}^{(j)}$ and $g_n^{(j)}$, $j = 1,\dots,N(n)$, of the asymptotic expansions (1.28) and (1.31) are uniquely determined by the choice (1.35) and by the recurrence procedure described in the proof of the theorem.

Remark 3. The zero order coefficients $q_{k,n}^{(0)}$ and $J_{k,n}^{(0)}$ were found in [2, 17]. The first order coefficients $J_{k,n}^{(1)}$ (1.34) of the one-interval case were found in [14] in a somewhat different context.

Theorem 1 allows us, in particular, to find the 1/n-expansion of the covariance
\[ D_n(z_1,z_2) \equiv E\Big\{\frac{1}{n}\mathrm{Tr}\,(z_1-M)^{-1}\,\frac{1}{n}\mathrm{Tr}\,(z_2-M)^{-1}\Big\} - E\Big\{\frac{1}{n}\mathrm{Tr}\,(z_1-M)^{-1}\Big\}E\Big\{\frac{1}{n}\mathrm{Tr}\,(z_2-M)^{-1}\Big\}, \tag{1.39} \]
which is important in a number of questions of random matrix theory and of its applications. Here and below the symbol $E\{\dots\}$ denotes the expectation with respect to measure (1.1)–(1.5). In the paper [24] it is proven that for any $V$ satisfying (1.2)–(1.3) we have the bound
\[ |D_n(z_1,z_2)| \le \frac{\mathrm{const}}{n^2(\Im z_1)^2(\Im z_2)^2}. \]
Hence, the 1/n-expansion of $D_n(z_1,z_2)$ has the form
\[ D_n(z_1,z_2) = \sum_{j=2}^{p}d_n^{(j)}(z_1,z_2)\,n^{-j} + o(n^{-p}),\quad n\to\infty, \tag{1.40} \]
in which the leading term is of the order $n^{-2}$. Theorem 1 implies

Corollary 1. Under the conditions of Theorem 1 we have: in the case (i) the $n$-independent
\[ d^{(2)}(z_1,z_2) = -\frac{1}{2(z_1-z_2)^2}\left(1+\frac{a^2-z_1z_2}{X(z_1)X(z_2)}\right), \tag{1.41} \]
where X(z) is defined in the first line of (1.14); in the case (ii) the 2-periodic in n
(a 2 − z1 z2 )(b2 − z1 z2 ) 1 (−1)n ab 1 + − , dn(2) (z1 , z2 ) = − 2 2(z1 − z2 ) X(z1 )X(z2 ) 2X(z1 )X(z2 ) (1.42) where X(z) is defined in the second line of (1.14).
Remark 4. The covariance $D_n(z_1,z_2)$ of the traces of the resolvent has been of considerable interest in random matrix theory since the beginning of the 90s, when its study was motivated by matrix models of quantum field theory [1, 3–5, 9, 10] and later by solid state theory (see review [6] and references therein). Initially only the one-interval case was studied but later the many-interval case was also analyzed. In particular, in [3, 1] a version of the large-$n$ expansion procedure was proposed. In the case (ii) of the two-interval symmetric potential the procedure leads to an expression for the leading term amplitude $d_n^{(2)}(z_1,z_2)$ that does not depend on $n$ and contains elliptic integrals, while our expression (1.42) is 2-periodic in $n$ and contains only elementary functions. By using recent results of paper [13] on the asymptotic form of the leading term of orthogonal polynomials (1.22)–(1.23) and our formula (2.78) below for the covariance $D_n(z_1,z_2)$, it can be shown that in the general case of a two-interval non-symmetric potential the leading term amplitude $d_n^{(2)}(z_1,z_2)$ is quasi-periodic in $n$ and contains Jacobi elliptic functions that disappear when one passes to a two-interval symmetric potential. Moreover, by using the same results, it can be shown that in the case of a potential leading to a $p$-interval support of the density of states the amplitude $d_n^{(2)}(z_1,z_2)$ is a quasi-periodic function. Its frequency module contains generically $p-1$ incommensurable frequencies (but can reduce to a $p$-periodic function in some special cases [11]), and its form includes the Riemann $\theta$-function of $p-1$ variables. The frequencies are determined by the density of states, and the $\theta$-function is determined by the endpoints of the support of the density of states of the ensemble.

Remark 5. Formulas (1.41) and (1.42) for the leading term amplitude $d^{(2)}(z_1,z_2)$ of the covariance $D_n(z_1,z_2)$ depend on the ensemble only via the number of intervals of the IDS support and via the endpoints of the support. This is why this property of the covariance is often referred to as the long-range universality [10], in contradistinction with the short-range (or microscopic) universality that manifests itself in $1/n$-neighborhoods of the interior points of $\sigma$ and is valid independently of the number of connected components of $\sigma$ (see e.g. papers [13, 24]). Thus under the conditions of these papers all the unitary invariant ensembles belong to the same short-range universality class. On the other hand, since according to (1.41) and (1.42) the leading terms of the covariance $D_n(z_1,z_2)$ are different in the one-interval and in the two-interval cases, the long-range universality classes depend on the number of intervals of the IDS support and on its endpoints.

Corollary 2. Under the conditions of Theorem 1 we have the following expressions for the weak limits of squares of the orthonormalized functions $\psi_k^{(n)}(\lambda)$ with $|k-n|\le N(n)$:
\[ \mathop{\mathrm{w\text{-}lim}}_{n\to\infty}\left(\psi_k^{(n)}(\lambda)\right)^2 = \frac{\chi_\sigma(\lambda)}{\pi X_+(\lambda)}\begin{cases}1, & \text{in case (i)},\\ \lambda, & \text{in case (ii)},\end{cases} \tag{1.43} \]
where $X_+(\lambda)$ is defined in (1.11).

The proofs of these assertions will be given in the next section.

2. Proofs of Main Results

Proof of Theorem 1. We introduce an eigenvalue distribution which is more general than (1.17), in which the number of variables differs from the large parameter in front of $V$ in the exponent of the r.h.s. of (1.17):
\[ p_{k,n}(\lambda_1,\dots,\lambda_k) = Z_{k,n}^{-1}\prod_{1\le j<m\le k}(\lambda_j-\lambda_m)^2\exp\Big\{-n\sum_{j=1}^{k}V(\lambda_j)\Big\}, \tag{2.1} \]
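where $Z_{k,n}$ is the normalizing factor. For $k = n$ this probability distribution density coincides with (1.17). Let
\[ \tilde\rho_{k,n}(\lambda_1) = \int d\lambda_2\dots d\lambda_k\,p_{k,n}(\lambda_1,\dots,\lambda_k), \tag{2.2} \]
\[ \tilde\rho_{k,n}(\lambda_1,\lambda_2) = \int d\lambda_3\dots d\lambda_k\,p_{k,n}(\lambda_1,\dots,\lambda_k) \tag{2.3} \]
be the first and the second marginal densities of (2.1). By standard arguments [20, 7] we have
\[ \tilde\rho_{k,n}(\lambda) = \tilde K_{k,n}(\lambda,\lambda),\quad \tilde\rho_{k,n}(\lambda,\mu) = \frac{k}{k-1}\left[\tilde K_{k,n}(\lambda,\lambda)\tilde K_{k,n}(\mu,\mu) - \tilde K_{k,n}^2(\lambda,\mu)\right], \tag{2.4} \]
where
\[ \tilde K_{k,n}(\lambda,\mu) = k^{-1}\sum_{l=1}^{k}\psi_l^{(n)}(\lambda)\psi_l^{(n)}(\mu), \tag{2.5} \]
and $\psi_l^{(n)}(\lambda)$ is defined by (1.21). We will use the notations
\[ K_{k,n}(\lambda,\mu) \equiv n^{-1}\sum_{l=1}^{k}\psi_l^{(n)}(\lambda)\psi_l^{(n)}(\mu) = \frac{k}{n}\tilde K_{k,n}(\lambda,\mu),\quad \rho_{k,n}(\lambda) \equiv K_{k,n}(\lambda,\lambda) = \frac{k}{n}\tilde\rho_{k,n}(\lambda). \tag{2.6} \]
Consider now the quantity $E_k\Big\{\dfrac{V'(\lambda_1)}{z-\lambda_1}\Big\}$ for $z$, $\Im z\ne 0$, where $E_k\{\dots\}$ denotes the expectation with respect to the probability distribution (2.1). It is well defined in view of condition (1.4) above. It is easy to find that
\[ E_k\Big\{\frac{V'(\lambda_1)}{z-\lambda_1}\Big\} = \int\frac{V'(\lambda)\tilde\rho_{k,n}(\lambda)}{z-\lambda}\,d\lambda. \tag{2.7} \]
On the other hand, integrating by parts the r.h.s. in (2.7) and using (2.3), we obtain that
\[ E_k\Big\{\frac{V'(\lambda_1)}{z-\lambda_1}\Big\} = \frac{1}{n}\int\frac{\tilde\rho_{k,n}(\lambda)}{(z-\lambda)^2}\,d\lambda + 2\,\frac{k-1}{n}\int\!\!\int\frac{\tilde\rho_{k,n}(\lambda,\mu)}{(z-\lambda)(\lambda-\mu)}\,d\lambda\,d\mu. \]
Combining these two expressions, we come to the identity
\[ \int\frac{V'(\lambda)\tilde\rho_{k,n}(\lambda)}{z-\lambda}\,d\lambda = \frac{1}{n}\int\frac{\tilde\rho_{k,n}(\lambda)}{(z-\lambda)^2}\,d\lambda + 2\,\frac{k-1}{n}\int\!\!\int\frac{\tilde\rho_{k,n}(\lambda,\mu)}{(z-\lambda)(\lambda-\mu)}\,d\lambda\,d\mu. \tag{2.8} \]
The symmetry property $\tilde\rho_{k,n}(\lambda,\mu) = \tilde\rho_{k,n}(\mu,\lambda)$ of (2.3) implies
\[ \int\!\!\int\frac{\tilde\rho_{k,n}(\lambda,\mu)}{(z-\lambda)(\lambda-\mu)}\,d\lambda\,d\mu = -\int\!\!\int\frac{\tilde\rho_{k,n}(\lambda,\mu)}{(z-\mu)(\lambda-\mu)}\,d\lambda\,d\mu. \]
This allows us to rewrite (2.8) in the form
\[ \int\frac{V'(\lambda)\tilde\rho_{k,n}(\lambda)}{z-\lambda}\,d\lambda = \frac{1}{n}\int\frac{\tilde\rho_{k,n}(\lambda)}{(z-\lambda)^2}\,d\lambda + \frac{k-1}{n}\int\!\!\int\frac{\tilde\rho_{k,n}(\lambda,\mu)}{(z-\lambda)(z-\mu)}\,d\lambda\,d\mu. \tag{2.9} \]
Now, by using (2.4)–(2.6), we can rewrite (2.9) as
\[ \int\frac{V'(\lambda)\rho_{k,n}(\lambda)}{z-\lambda}\,d\lambda = n^{-1}\int\frac{\rho_{k,n}(\lambda)}{(z-\lambda)^2}\,d\lambda + \int\!\!\int\frac{\rho_{k,n}(\lambda)\rho_{k,n}(\mu) - (K_{k,n}(\lambda,\mu))^2}{(z-\lambda)(z-\mu)}\,d\lambda\,d\mu. \tag{2.10} \]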
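The passage from (2.8) to (2.10) rests on one partial-fraction identity and on the bookkeeping of the factors $k/n$. The following worked lines are a sketch of those two steps, using only the relations (2.4)–(2.6); nothing beyond them is assumed.

```latex
% Averaging the two equal forms of the double integral given by the symmetry of \tilde\rho_{k,n}:
\frac12\left[\frac{1}{(z-\lambda)(\lambda-\mu)} - \frac{1}{(z-\mu)(\lambda-\mu)}\right]
  = \frac{(z-\mu)-(z-\lambda)}{2(z-\lambda)(z-\mu)(\lambda-\mu)}
  = \frac{1}{2(z-\lambda)(z-\mu)},
% so the prefactor 2(k-1)/n in (2.8) becomes (k-1)/n in (2.9).
%
% Multiplying (2.9) by k/n and using (2.4), (2.6):
\frac{k}{n}\cdot\frac{k-1}{n}\,\tilde\rho_{k,n}(\lambda,\mu)
  = \frac{k^2}{n^2}\left[\tilde K_{k,n}(\lambda,\lambda)\tilde K_{k,n}(\mu,\mu) - \tilde K_{k,n}^2(\lambda,\mu)\right]
  = \rho_{k,n}(\lambda)\rho_{k,n}(\mu) - K_{k,n}^2(\lambda,\mu),
\qquad
\frac{k}{n}\cdot\frac{1}{n}\,\tilde\rho_{k,n}(\lambda) = \frac{1}{n}\,\rho_{k,n}(\lambda).
```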
This relation is a version of the well known loop equation of the matrix models of Quantum Field Theory [15]. We will also use

Proposition 2. Consider any unitary invariant ensemble of the form (1.1)–(1.5) and assume that $V(\lambda)$ possesses two bounded derivatives in some neighborhood of the support $\sigma$ of the density of states $\rho$ and that $\rho(\lambda)$ satisfies Condition C2. Denote by $\sigma_\varepsilon$ the $\varepsilon$-neighborhood of $\sigma$ for some $\varepsilon > 0$. Then there exist $n$-independent quantities $C, C_0, \varepsilon_0 > 0$ such that for any positive $n$-independent $\varepsilon < \varepsilon_0$ there exists $\varepsilon_1 > 0$ such that for any integer $k$ satisfying the inequality $|k-n|/n\le\varepsilon_1$ we have the bounds
\[ \int_{\mathbb{R}\setminus\sigma_\varepsilon}\rho_{k,n}(\lambda)\,d\lambda \le e^{-nC\varepsilon},\quad \int_{\mathbb{R}\setminus\sigma_\varepsilon}\left(\psi_k^{(n)}(\lambda)\right)^2 d\lambda \le e^{-nC\varepsilon}. \tag{2.11} \]

Remark 6. The proof of Proposition 2, given in the next section, does not use the fact that ensemble (1.1)–(1.5) consists of Hermitian matrices. Therefore Proposition 2 is valid also for real symmetric and quaternion real matrices, i.e. for orthogonal and symplectic ensembles, satisfying (1.2), (1.3), and Condition C2.

Let us fix now a sufficiently small $\varepsilon$ such that $\sigma_\varepsilon\subset D$ and all the zeros of the function $P(z)$ are outside of $\sigma_\varepsilon$. Then (2.11) allows us to replace the integrals over the whole line by the integrals over $\sigma_\varepsilon$ in (2.10). Therefore, denoting
\[ g_{k,n}(z) \equiv \int_{\sigma_\varepsilon}\frac{\rho_{k,n}(\lambda)\,d\lambda}{z-\lambda},\quad R_{j,m}(z) \equiv \int_{\sigma_\varepsilon}\frac{\psi_j^{(n)}(\lambda)\psi_m^{(n)}(\lambda)\,d\lambda}{z-\lambda},\quad R'_{j,m}(z) \equiv -\int_{\sigma_\varepsilon}\frac{\psi_j^{(n)}(\lambda)\psi_m^{(n)}(\lambda)\,d\lambda}{(z-\lambda)^2},\quad \tilde V(z,\zeta) \equiv \frac{V'(\zeta)}{z-\zeta}, \tag{2.12} \]
we get from (2.10):
\[ (g_{k,n}(z))^2 - \int_{\sigma_\varepsilon}\tilde V(z,\lambda)\rho_{k,n}(\lambda)\,d\lambda - \frac{1}{n^2}\sum_{m=1}^{k}R'_{m,m}(z) - \frac{1}{n^2}\sum_{m,j=1}^{k}R_{m,j}^2(z) = e_n(z), \tag{2.13} \]
where $e_n(z)$ is the remainder function which appears because of our replacement of the integrals over the whole line by the integrals over $\sigma_\varepsilon$. Note that since the l.h.s. of (2.13) is an analytic function in $\mathbb{C}\setminus\sigma_\varepsilon$, $e_n(z)$ is also analytic in $\mathbb{C}\setminus\sigma_\varepsilon$, and admits the bound:
\[ |e_n(z)| \le \frac{C_0}{|\delta_\varepsilon(z)|^{l}}, \tag{2.14} \]
where
\[ \delta_\varepsilon(z) \equiv \mathrm{dist}\{z,\sigma_\varepsilon\} \tag{2.15} \]
and $l = 2$. Besides, it follows from (2.11) that
\[ |e_n(z)| \le \frac{C_1e^{-nC_2}}{|z|^2\,|\delta_\varepsilon(z)|^{l'}} \tag{2.16} \]
with $l' = 0$. We will denote below by $\{e_n(z)\}_{n=1}^{\infty}$ sequences of functions (possibly different in different formulas) which are analytic everywhere in $\mathbb{C}\setminus\sigma_\varepsilon$ and satisfy the estimates (2.14) and (2.16) with some nonnegative $l$, $l'$ and some positive $n$-independent $C$'s.

According to our conditions $\tilde V(z,\zeta)$ in (2.12) is analytic with respect to $\zeta$ inside $D$, except for the point $\zeta = z$. Hence, we can write that
\[ \int_{\sigma_\varepsilon}\tilde V(z,\lambda)\rho_{k,n}(\lambda)\,d\lambda = \frac{1}{2\pi i}\oint_L d\zeta\,\tilde V(z,\zeta)\int_{\sigma_\varepsilon}\frac{\rho_{k,n}(\lambda)}{\zeta-\lambda}\,d\lambda = \frac{1}{2\pi i}\oint_L d\zeta\,\tilde V(z,\zeta)\,g_{k,n}(\zeta), \tag{2.17} \]
where $L\subset D$ is an arbitrary closed contour which contains $\sigma_\varepsilon$ and does not contain $z$. This allows us to rewrite (2.13) as
\[ (g_{k,n}(z))^2 - \frac{1}{2\pi i}\oint_L\tilde V(z,\zeta)\,g_{k,n}(\zeta)\,d\zeta - \frac{1}{n^2}\sum_{m=1}^{k}R'_{m,m}(z) - \frac{1}{n^2}\sum_{m,j=1}^{k}R_{m,j}^2(z) = e_n(z). \tag{2.18} \]
Now, subtracting from (2.18) the relation obtained from (2.18) by the replacement $k\to(k-1)$, we obtain:
\[ 2R_{k,k}(z)\,g_{k-1,n}(z) - \frac{1}{2\pi i}\oint_L\tilde V(z,\zeta)R_{k,k}(\zeta)\,d\zeta - \frac{1}{n}R'_{k,k}(z) - \frac{2}{n}\sum_{j=1}^{k-1}R_{k,j}^2(z) = e_n(z). \tag{2.19} \]
Relations (2.18) and (2.19) are our main technical tools in constructing the 1/n expansion given in the theorem. We will consider (2.18) and (2.19) as a system of equations with respect to the functions $g_{k,n}(z)$ and $R_{j,m}(z)$ and solve them by iterations in $1/n$.
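The subtraction that produces (2.19) from (2.18) can be spelled out as follows. This is a sketch of the bookkeeping only; it uses $g_{k,n} - g_{k-1,n} = n^{-1}R_{k,k}$, which follows from (2.6) and (2.12), and the symmetry $R_{j,m} = R_{m,j}$.

```latex
% Difference of the quadratic terms:
g_{k,n}^2 - g_{k-1,n}^2
  = \frac{R_{k,k}}{n}\left(2g_{k-1,n} + \frac{R_{k,k}}{n}\right),
% difference of the double sums (terms with m = k or j = k):
\sum_{m,j=1}^{k} R_{m,j}^2 - \sum_{m,j=1}^{k-1} R_{m,j}^2
  = R_{k,k}^2 + 2\sum_{j=1}^{k-1} R_{k,j}^2 .
% Subtracting the two copies of (2.18) and multiplying by n:
2R_{k,k}\,g_{k-1,n} + \frac{R_{k,k}^2}{n}
  - \frac{1}{2\pi i}\oint_L \tilde V(z,\zeta) R_{k,k}(\zeta)\,d\zeta
  - \frac{1}{n} R'_{k,k}
  - \frac{1}{n}\Big(R_{k,k}^2 + 2\sum_{j=1}^{k-1} R_{k,j}^2\Big) = n\,(e_n - e_{n}^{\,k-1}),
% the two R_{k,k}^2/n terms cancel, and the right-hand side, being exponentially small
% by (2.16), is again of the type e_n(z); this is (2.19).
```

Here $e_n^{\,k-1}$ is a label introduced for this sketch only; it denotes the remainder in (2.18) written for $k-1$.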
We will need two more facts on ensembles (1.1)–(1.5).

(a) The function $g_{k,n}(z)$ from (2.12) and $g(z)$ from (1.12) are related as
\[ |g_{k,n}(z) - g(z)| \le \mathrm{const}\left(\frac{\log^{1/2}n}{\sqrt{n}\,\delta_\varepsilon^2(z)} + \frac{|k-n|}{n\,\delta_\varepsilon(z)}\right). \tag{2.20} \]
This relation follows from (2.12), (2.6), (2.4), and from the bound, valid for any function $\phi(\mu)$ which grows not faster than $e^{bV(\mu)}$, $b > 0$, as $|\mu|\to\infty$,
\[ \left|\int\phi(\mu)\rho_n(\mu)\,d\mu - \int\phi(\mu)\rho(\mu)\,d\mu\right| \le \mathrm{const}\,\|\phi'\|_2^{1/2}\,\|\phi\|_2^{1/2}\,n^{-1/2}\log^{1/2}n, \tag{2.21} \]
where the symbol $\|\dots\|_2$ denotes the $L^2$-norm on a compact set of $\mathbb{R}$ containing $\sigma_\varepsilon$ (the bound was proved in [8], Lemma 4, see also [24]).

(b)
\[ g^2(z) - V'(z)g(z) + Q(z) = 0,\quad z\in D,\ \Im z\ne 0, \tag{2.22} \]
\[ Q(z) = \frac{1}{2\pi i}\oint_L Q(z,\zeta)\,g(\zeta)\,d\zeta = \int_\sigma\frac{V'(z)-V'(\lambda)}{z-\lambda}\,\rho(\lambda)\,d\lambda, \tag{2.23} \]
and $Q(z,\zeta)$ is defined by (1.16). The relations follow from (2.20) and identity (2.10) for $n = k$. Indeed, in view of (2.4) the r.h.s. of (2.10) is
\[ g_n^2 + E\Big\{\Big(n^{-1}\sum_{l=1}^{n}(z-\lambda_l)^{-1} - E\Big\{n^{-1}\sum_{l=1}^{n}(z-\lambda_l)^{-1}\Big\}\Big)^2\Big\}. \]
The second term here is the variance of $n^{-1}\mathrm{Tr}\,(z-M)^{-1}$, and according to [24], Lemma 3, the variance is of the order $O(n^{-2})$. This and (2.20) imply (2.22).

It follows from the above that the zero order approximation for $g_{k,n}(z)$ coincides with $g(z)$. To find the zero order approximations for $R_{k,k}(z)$ for $|k-n|\le N(n)$, where $N(n)$ is defined in (1.27), let us note that (2.12) leads to the bounds
\[ |R'_{k,k}(z)|,\ \Big|\sum_{j=1}^{k-1}R_{k,j}^2(z)\Big| \le \frac{\mathrm{const}}{\delta_\varepsilon^2(z)}. \]
The first bound follows from the definition of $R'_{k,k}(z)$ in (2.12). To prove the second bound we view $R_{k,i}(z)$ of (2.12) as the generalized Fourier coefficients of the function $\chi_\varepsilon(\lambda)\psi_k^{(n)}(\lambda)(z-\lambda)^{-1}$ with respect to the orthonormal system $\{\psi_l^{(n)}(\lambda)\}_{l=1}^{\infty}$. Then the Bessel inequality gives us the second bound. These bounds imply that the last two terms in the l.h.s. of (2.19) have the order $n^{-1}$. Hence, the zero order equations for $R_{k,k}(z)$ have the form
\[ 2g(z)R_{k,k}(z) = \frac{1}{2\pi i}\oint_L d\zeta\,\tilde V(z,\zeta)R_{k,k}(\zeta) - r_{k,n}^{(0,R)}(z) + e_n(z), \tag{2.24} \]
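where the remainder
\[ r_{k,n}^{(0,R)}(z) \equiv -\frac{1}{n}R'_{k,k}(z) - \frac{2}{n}\sum_{j=1}^{k-1}R_{k,j}^2(z) + 2R_{k,k}(z)\left(g_{k-1,n}(z)-g(z)\right)\to 0,\quad n\to\infty, \tag{2.25} \]
is analytic in $\mathbb{C}\setminus\sigma_\varepsilon$ and tends to zero uniformly on any compact set for which $\mathrm{dist}(z,\sigma_\varepsilon)\ge d > 0$. Besides, since by definition (1.21) $\int(\psi_k^{(n)})^2(\lambda)\,d\lambda = 1$, we have from (2.11) that
\[ R_{k,k}(z) = \frac{1}{z}\left(1+O\Big(\frac{1}{z}\Big)\right) + e_n(z),\quad z\to\infty. \tag{2.26} \]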
Equation (2.24) was already considered in [2]. However we will use here a bit different way to analyze the equation, which is based on the following lemma: Lemma 1. Consider the equation 1 2g(z)R(z) − 2π i
L
dζ V˜ (z, ζ )R(ζ ) = 0,
z ∈ D \ σε ,
(2.27)
˜ ζ ) is defined in (2.12), and a closed contour L ∈ D contains σe and does where V (z, not contain the point z. Set for z ∈ σ , >(z) =
X−1 (z), in the case (i), zX−1 (z), in the case (ii),
(2.28)
where X(z) is defined by (1.14). Then the following statements are valid under the conditions of Theorem 1: 1. In the case (i) Eq. (2.27) has the unique solution R(z) = >(z) in the class of functions analytic in C \ σε and behaving as R(z) = z−1 (1 + o(1)),
z → ∞.
(2.29)
In the case (ii) Eq. (2.27) has the unique solution R(z) = >(z) in the class (2.29), under the additional symmetry condition R(−z) = −R(z). 2. In both cases Eq. (2.27) has no solutions in the class of functions R(z) analytic in C \ σε and satisfying the condition lim |z2 R(z)| ≤ const < ∞.
|z|→∞
(2.30)
284
S. Albeverio, L. Pastur, M. Shcherbina
3. For any analytic in C \ σε function F (z), satisfying condition (2.30) and even in the case (ii), the inhomogeneous equation 1 2g(z)R(z) = dζ V˜ (z, ζ )R(ζ ) − F (z) (2.31) 2π i L has the unique solution of the form R(z) =
1 2π iX(z)
L
dζ
F (ζ ) , P (ζ )(z − ζ )
(2.32)
in the class of functions analytic in C \ σε , satisfying condition (2.30) and odd in the case (ii). Here P (z) is defined by (1.15) and a closed contour L should be taken sufficiently close to σ , to have z and all zeros of P (z) outside of L. In particular, in the case (ii) the contour consists of two components, encircling each interval of the support. The proof of the lemma will be given in the next section. Omitting in (2.24) the error terms, we deduce from the obtained homogeneous equation and from (2.26) on the basis of Assertion 1 of Lemma 1 that the zero order approxi(0) mation Rk,k (z) of Rk,k (z) is >(z) from (2.28). Moreover, the difference Rk,k (z) − >(z) decays at infinity as z−2 at least, and the error terms in the r.h.s. of (2.24) decays also as z−2 , as z → ∞. Thus on the basis of Assertion 3 of the lemma we can write that (0,R)
Rk,k (z) = >(z) + r˜k,n (z) + en (z). (0,R)
(2.33) (0,R)
Here r˜k,n (z) is obtained from formula (2.32) with F (z) = rk,n (z) given by (2.25)). (0,R)
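Using the fact that $|r^{(0,R)}_{k,n}(z)| \to 0$ as $|z| \to \infty$ and that $P(z)$ has no zeros on $L$ we obtain the bound
$$|\tilde r^{(0,R)}_{k,n}(z)| \le \left|\frac{1}{2\pi i P(z)X(z)}\oint_L d\zeta\,\frac{r^{(0,R)}_{k,n}(\zeta)}{z-\zeta}\right| + \left|\frac{1}{2\pi i X(z)}\oint_L d\zeta\, r^{(0,R)}_{k,n}(\zeta)\,\frac{P^{-1}(\zeta)-P^{-1}(z)}{z-\zeta}\right| \le \frac{\mathrm{const}}{|X(z)|}\left(|r^{(0,R)}_{k,n}(z)| + \max_{\zeta\in L}|r^{(0,R)}_{k,n}(\zeta)|\right) \to 0, \quad n\to\infty. \tag{2.34}$$
Thus, for all $k$ such that $|k-n| \le N(n)$, where $N(n)$ is given in (1.27) for $m = 0$, we have
$$R^{(0)}_{k,k}(z) \equiv \lim_{n\to\infty} R_{k,k}(z) = {>}(z). \tag{2.35}$$
We have also the relations following from (1.21), (1.24), (2.11) and (2.12):
$$q_k = \int \lambda\,\psi_k^2(\lambda)\,d\lambda = \frac{1}{2\pi i}\oint_L \zeta R_{k,k}(\zeta)\,d\zeta + O(e^{-nC\varepsilon}),$$
$$q_k^2 + J_k^2 + J_{k-1}^2 = \int \lambda^2\psi_k^2(\lambda)\,d\lambda = \frac{1}{2\pi i}\oint_L \zeta^2 R_{k,k}(\zeta)\,d\zeta + O(e^{-nC\varepsilon}),$$
$$(q_k^2 + J_k^2 + J_{k-1}^2)^2 + (q_k+q_{k+1})^2J_k^2 + (q_k+q_{k-1})^2J_{k-1}^2 + J_k^2J_{k+1}^2 + J_{k-1}^2J_{k-2}^2 = \int \lambda^4\psi_k^2(\lambda)\,d\lambda = \frac{1}{2\pi i}\oint_L \zeta^4 R_{k,k}(\zeta)\,d\zeta + O(e^{-nC\varepsilon}). \tag{2.36}$$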
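The left-hand sides of (2.36) are the diagonal entries of $J$, $J^2$ and $J^4$ written out for a symmetric tridiagonal matrix. The following minimal sketch checks the second and fourth of them on a randomly generated finite Jacobi matrix. The matrix size, the index `k` and the random coefficients are illustrative assumptions, not quantities from the paper.

```python
# Sketch: compare the closed forms on the l.h.s. of (2.36) with (J^2)_{kk}, (J^4)_{kk}.
import random

def jacobi(q, J):
    """Dense symmetric tridiagonal matrix with diagonal q and off-diagonal J."""
    n = len(q)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        M[i][i] = q[i]
    for i in range(n - 1):
        M[i][i + 1] = M[i + 1][i] = J[i]
    return M

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

def second_moment(q, J, k):
    # q_k^2 + J_k^2 + J_{k-1}^2, with J[k] = J_{k,k+1}
    return q[k] ** 2 + J[k] ** 2 + J[k - 1] ** 2

def fourth_moment(q, J, k):
    return (second_moment(q, J, k) ** 2
            + (q[k] + q[k + 1]) ** 2 * J[k] ** 2
            + (q[k] + q[k - 1]) ** 2 * J[k - 1] ** 2
            + J[k] ** 2 * J[k + 1] ** 2
            + J[k - 1] ** 2 * J[k - 2] ** 2)

random.seed(0)
n, k = 12, 5                      # k is an interior index, so k-2 .. k+2 exist
q = [random.uniform(-1, 1) for _ in range(n)]
J = [random.uniform(0.5, 1.5) for _ in range(n - 1)]
M = jacobi(q, J)
M2 = matmul(M, M)
M4 = matmul(M2, M2)
err2 = abs(M2[k][k] - second_moment(q, J, k))
err4 = abs(M4[k][k] - fourth_moment(q, J, k))
```

Both `err2` and `err4` should be at the level of rounding error, since the identities are purely algebraic.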
In what follows we omit the subindex $n$ in the coefficients $q^{(j)}_{k,n}$ and $J^{(j)}_{k,n}$ introduced in (1.28). By using (2.35), and (2.28) for the case (i), we find from the first of the above relations that the zero order term $q^{(0)}_k$ is zero. Then, combining the second relation of (2.36) for $k$, $k-1$, and $k+1$ and the third relation of (2.36), we find that $J^{(0)}_k = a/2$. In the case (ii) the same scheme carried out for even and odd $k$ leads to the coefficients $J^{(0)}_k$ of (1.35). In other words we have proved that in the zero order in $1/n$ the coefficients of the Jacobi matrix $J^{(n)}$ defined in (1.25) do not depend on $k$, $|k-n| \le N(n)$, in the case (i) of a one interval support of the density of states and are 2-periodic functions of $k$ in the case (ii) of a two interval symmetric support.

To find the first order terms for these coefficients, we will study the first order versions of Eqs. (2.18). Note first that we have the bound
$$\frac{1}{n}\left|\sum_{j=1}^{k}\bigl(-R'(z)\bigr)_{j,j} - \sum_{j,m=1}^{k} R^2_{j,m}(z)\right| \le \frac{\mathrm{const}}{n\delta_\varepsilon^4(z)} + |e_n(z)|, \tag{2.37}$$
where const does not depend on n, z. Indeed, by using the orthonormality of system (1.21) we can write the l.h.s. as n 2 2 2 dλ dµ(φ(λ) − φ(µ)) Kk,n (λ, µ) + n dλ dµφ 2 (λ)Kk,n (λ, µ), 2 σε σε σε R\σε where φ(λ) = (z − λ)−1 and Kk,n (λ, µ) is defined in (2.6). According to Lemma 3 of [24] the first term here is bounded by const · sup |φ (λ)|2 /n ≤ const/nδε4 (z), and according to Proposition 2, the second term is en (z). We conclude that the first order equation for the function (1)
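$$g^{(1)}_{k,n}(z) \equiv n\,(g_{k,n}(z) - g(z)) \tag{2.38}$$
has the form
$$2g(z)g^{(1)}_{k,n}(z) = \frac{1}{2\pi i}\oint_L \tilde V(z,\zeta)\,g^{(1)}_{k,n}(\zeta)\,d\zeta - r^{(1,g)}_{k,n}(z) + e_n(z), \tag{2.39}$$
with
$$r^{(1,g)}_{k,n}(z) \equiv \frac{1}{n}\bigl(g^{(1)}_{k,n}(z)\bigr)^2 + \frac{1}{n}\sum_{j=1}^{k}\Bigl(-R'(z)_{j,j} - \sum_{m=1}^{k}R^2_{j,m}(z)\Bigr) \equiv \frac{1}{n}\bigl(g^{(1)}_{k,n}(z)\bigr)^2 + \overline{r}^{(1,g)}_{k,n}(z), \qquad \bigl|\overline{r}^{(1,g)}_{k,n}(z)\bigr| \le \frac{\mathrm{const}}{n\delta_\varepsilon^4(z)}. \tag{2.40}$$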
Besides, we have the normalization condition
$$g^{(1)}_{k,n}(z) = (k-n)z^{-1}\Bigl(1 + O\Bigl(\frac{1}{z}\Bigr)\Bigr) + e_n(z), \quad z\to\infty, \quad |k-n| \le N(n), \tag{2.41}$$
which follows from Definition (2.12) of the function $g_{k,n}(z)$. Then, according to Lemma 1, we get
$$g^{(1)}_{k,n}(z) = (k-n)\,{>}(z) + \tilde r^{(1,g)}_{k,n}(z) + e_n(z), \tag{2.42}$$
where the remainder $\tilde r^{(1,g)}_{k,n}(z)$ has the form
$$\tilde r^{(1,g)}_{k,n}(z) = \frac{1}{2\pi i X(z)}\oint_L \frac{n^{-1}\bigl(g^{(1)}_{k,n}(\zeta)\bigr)^2 + \overline{r}^{(1,g)}_{k,n}(\zeta)}{P(\zeta)(z-\zeta)}\,d\zeta. \tag{2.43}$$
Thus, denoting (1)
mk,n (d) ≡
max
{z:δε (z)≥d}
(1)
|gk,n (z)|,
where d is a positive constant, we obtain from relations (2.42) and (2.43) the inequality (1)
mk,n (d) ≤
(1) (mk,n (d))2 |k − n| 1 , + C + d 1/2 nd 3/2 nd 9/2
where C is independent of n, k, and d. This inequality implies that either (1)
mk,n (d) ≤
2|k − n| , d 1/2
or
(1)
mk,n (d) ≥ nd 3/2 C −1 + O(1).
But the second inequality here cannot be true, because it was proved above that (1)
n−1 mk,n (d) =
max
{z:δε (z)≥d}
|gk,n (z) − g(z)| → 0
for any k such that |k − n| = N (n), where N (n) is given in (1.27) for m = 0. Hence in view of (2.43) we get that for {z : δε (z) ≥ d},
1 |k − n|2 (1,g) |˜rk,n (z)| ≤ const (2.44) + 9/4 . nd nd Substituting now representation (2.42) in the r.h.s. of (2.43), and using bound (2.44), we get finally (1,g)
r˜k,n (z) =
(k − n)2 Y (z) + O(|k − n|3 n−2 d −5/2 ) + O((nd 5 )−1 ), n
(2.45)
where > 2 (ζ ) 1 dζ Y (z) ≡ 2πiX(z) L P (ζ )(z − ζ )
1 1 1 − , (i), 1 2a P (a)(z − a) P (−a)(z + a)
= 1 az bz X(z) − , (ii). 2 (a − b2 ) P (a)(z2 − a 2 ) P (b)(z2 − b2 )
(2.46)
We have obtained the first order term in the 1/n-expansion for gn,k (z). Now we need a lemma that will allow us to replace Rk,j (z) in (2.18), (2.19) by a (j ) (j ) certain simpler expression constructed from the coefficients qk,n , Jk,n , j = 0, . . . , p found during the previous p steps of our expansion process and to estimate the error of this replacement.
Lemma 2. Take $\tilde N(n) = [\log^2 n]$ and let $N_1(n)$ be such that
$$N_1(n)\,n^{-1/(p+1)} \to 0, \qquad (N_1(n))^{-1}\tilde N(n) \to 0, \quad \text{as } n\to\infty. \tag{2.47}$$
Assume that for any $k$: $|k-n| \le N_1(n)$ we have found the coefficients $q^{(0)}_k, \dots, q^{(p)}_k$, $J^{(0)}_k, \dots, J^{(p)}_k$, satisfying bound (1.29), and such that (1.28) is fulfilled for $m = p$. Here and below we omit the subindex $n$ in the coefficients $q^{(j)}_{k,n}$, $J^{(j)}_{k,n}$ of the asymptotic formula (1.28) of Theorem 1. For any $s$ such that $|s| \le 2/n$ consider the $(2N_1+1)$-periodic symmetric Jacobi matrix $\tilde J^{(p)}(s)$ defined by the entries
$$\tilde J^{(p)}_{k,k} \equiv \tilde q^{(p)}_k = \sum_{j=0}^{p} s^j q^{(j)}_k, \qquad \tilde J^{(p)}_{k,k+1} \equiv \tilde J^{(p)}_k = \sum_{j=0}^{p} s^j J^{(j)}_k, \qquad |k-n| \le N_1(n). \tag{2.48}$$
Denote by $\tilde R^{(p)}(z,s)$ the resolvent of $\tilde J^{(p)}(s)$, and set
$$R^{(j)}(z) \equiv \frac{1}{j!}\frac{\partial^j}{\partial s^j}\tilde R^{(p)}(z,s)\Big|_{s=0}, \qquad S^{(p)}(z) \equiv \sum_{j=0}^{p} n^{-j}R^{(j)}. \tag{2.49}$$
Then for any L > 0 there exist positive n-independent quantities C1 and C2 such that for any k satisfying the inequality: |k − n| ≤ N1 − 2N˜ ≡ N2 (n),
(2.50)
and for any z ∈ σε , |z| < L, Rk,k (z) − S (p) (z), − R (z) − (S (p) · S (p) )k,k (z) k,k k,k p+1
˜ (p) C1 N 1 e−C2 δε (z)N 2εn + + , p+1 δε2 (z)np δε (z)|z|2 δε (z)np+1
(2.51)
k k p+1 ˜ (p) 2 C1 N 1 2εn e−C2 δε (z)N (p) 2 ≤ R (z) − (S (z)) + + , k,m k,m δ 2 (z)np p+1 δε (z)|z|2 δε (z)np+1 ε m=1 m=1
(2.52)
≤
k k k 1 k 1 (p) (p) (p)
2 2 − R (z)j,j − Rj,m (z) − [(S · S )j,j (z) − (Sj,m (z)) n n j =1
m=1
m=1
j =1
≤
(p) 2εn N1 δε2 (z)np+1
(p)
where δε (z) ≡ dist {z, σε } and εn
+
p+2 C1 N 1 p+1 δε (z)np+2
+
˜ e−C2 δε (z)N/2
= o(1), n → ∞ (see (1.30)).
|z|3
,
(2.53)
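The truncated expansion $S^{(p)}(z)$ of (2.49) can be illustrated numerically for $p = 1$. The sketch below assumes the resolvent convention $R = (z - J)^{-1}$, for which the first Taylor coefficient in $s$ is $R^{(1)} = R^{(0)}J^{(1)}R^{(0)}$. It uses a finite (not periodic) matrix and randomly chosen first order coefficients; both are simplifying assumptions made only for illustration. The error of $S^{(1)}$ at $s = 1/n$ should then scale like $n^{-2}$.

```python
# Sketch: S^{(1)} = R^{(0)} + n^{-1} R^{(1)} approximates the resolvent of J0 + J1/n
# with an O(n^{-2}) error (finite matrix, hypothetical coefficients).
import numpy as np

def tridiag(diag, off):
    return np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

def resolvent(J, z):
    return np.linalg.inv(z * np.eye(J.shape[0]) - J)

rng = np.random.default_rng(1)
N, z = 40, 3.0 + 0.5j
J0 = tridiag(np.zeros(N), np.ones(N - 1))                        # zero order part
J1 = tridiag(rng.uniform(-1, 1, N), rng.uniform(-1, 1, N - 1))   # first order part

R0 = resolvent(J0, z)
R1 = R0 @ J1 @ R0          # derivative of (z - J0 - s*J1)^{-1} in s at s = 0

def truncation_error(n):
    exact = resolvent(J0 + J1 / n, z)
    S1 = R0 + R1 / n
    return np.linalg.norm(exact - S1, 2)

# Doubling n should divide the error by roughly 4.
ratio = truncation_error(100) / truncation_error(200)
```

A ratio close to 4 is consistent with the order $n^{-(p+1)}$ of the last term in the bounds of the lemma for $p = 1$; the exponentially small terms there come from the periodic truncation, which this sketch does not model.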
The proof of the lemma will be given in the next section. Consider the function
$$R^{(1,n)}_{k,k}(z) \equiv n\bigl(R_{k,k}(z) - R^{(0)}_{k,k}(z)\bigr), \tag{2.54}$$
with $R^{(0)}_{k,k}(z)$ defined in (2.35). From (2.19) and (2.42) we get the first order equation for $R_{kk}$:
$$2g(z)R^{(1,n)}_{k,k}(z) = \frac{1}{2\pi i}\oint_L d\zeta\,\tilde V(z,\zeta)R^{(1,n)}_{kk}(\zeta) - F^{(1,R)}_k(z) - r^{(1,R)}_{k,n}(z) + e_n(z). \tag{2.55}$$
Here
$$F^{(1,R)}_k(z) \equiv 2R^{(0)}_{k,k}(z)\,g^{(1)}_{k-1}(z) + (R^{(0)}\cdot R^{(0)})_{k,k}(z) - 2\sum_{j=1}^{k-1}\bigl(R^{(0)}_{k,j}(z)\bigr)^2,$$
R (0) denotes the resolvent of the double infinite Jacobi matrix J (0) of the zero order (0) coefficients {Jk }k∈Z , and 2 (1,n) (1,g) (1,R) (0) (1) rk,n (z) ≡ 2Rk,k (z)˜rk,n (z) + Rk,k (z)gk,n (z) n
+ −Rk,k (z) − (R (0) · R (0) )k,k (z) −2
k−1 j =1
(Rk,j (z))2 −
k−1 j =1
(0)
(2.56)
(Rk,j (z))2 .
By using the translational symmetry of the resolvent R (0) and the exponential decay of (0) its matrix elements Rj m in |j − m|, as |j − m| → ∞, it is easy to show that (R (0) · R (0) )k,k (z) − 2
k−1 j =1
(0)
(Rk,j (z))2 (0) 2 (i), (Rk,k (z)) + en (z), (0) (0) 2 2 = (J ) − (Jk−1 ) (0) + en (z), (ii), (Rk,k (z))2 + k X 2 (z)
This relation, and formulas (2.42), and (2.54) imply that (i), [2(k − n) − 1] > 2 (z), (1,R) k ab Fk = (−1) [2(k − n) − 1] > 2 (z) ± , (ii), X 2 (z) where the sign in the case (ii) corresponds to that in (1.35). (1,n) In addition, bound (2.45), and the fact that n−1 Rk,k (z) → 0, as n → ∞ (see formulas (2.54) and (2.33)–(2.35)) imply that the first two terms in the r.h.s. of (2.56) tend to zero as n → ∞. And on the basis of Lemma 2, one can conclude that the last two (1,R) terms there also vanish as n → ∞. Therefore rk,n (z) → 0 as n → ∞. Then on the
(1)
basis of Lemma 1, and similarly to (2.38)–(2.46) we get for the first order term Rk,k (z) all k such that |k − n| ≤ N1 (n), where N1 (n) is given in (2.47): [2(k − n) − 1]Y (z), (i), (1) Rk,k (z) = (2.57) k (±) [2(k − n) − 1] Y (z) ± (−1) Y (z), (ii), where Y (z) is defined in (2.46), Y
(±)
dζ ab (z) ≡ 2πiX(z) L P (ζ )X 2 (ζ )(z − ζ )
z b a = − , X(z)(a 2 − b2 ) P (a)(z2 − a 2 ) P (b)(z2 − b2 ) (1,R)
and the remainder function r˜k,n (z) is (1,R)
r˜k,n (z)
1
|k − n|4 2(k − n)[3(k − n) − 1] ˜ + O Y (z) + O , (i), n n3 n 2(k − n)[3(k − n) − 1] ˜ = Y (z) ± 2(−1)k (k − n)Y˜ ± (z) n
4 +O |k − n| + O 1 , (ii) n3 n
where
1 Y (ζ )>(ζ ) , dζ 2π iX(z) L P (ζ )(z − ζ ) 1 Y ± (ζ )>(ζ ) Y˜ ± (z) ≡ dζ . 2π iX(z) L P (ζ )(z − ζ )
(2.58)
Y˜ (z) ≡
(2.59)
Now in the case (ii) we take the first order terms with respect to n−1 in Eqs. (2.36) (recall (0) that the diagonal coefficients qk are zero for all k). We obtain the relations 1 (0) (1) (0) (1) (1) (1,J,2) 2(J2q J2q + J2q−1 J2q−1 ) = ζ 2 R2q,2q (ζ )dζ + r2q , 2π i L (0) (1)
(0)
(1)
(0)
(0)
4(J2q J2q + J2q−1 J2q−1 )((J2q )2 + (J2q−1 )2 ) (0) (0)
(0)
(1)
(0) (1)
(0) (1)
(0)
(1)
+ 2J2q J2q−1 (J2q−1 J2q + J2q J2q+1 + J2q J2q−1 + J2q−1 J2q−2 ) 1 (1) (1,J,4) = ζ 4 R2q,2q (ζ )dζ + r2q , 2πi L (2.60) where k = 2q, |k − n| ≤ N1 (n), N1 (n) is defined in (2.47) for p = 0, and: (1,J,2) (1,R) rk,n ≡ ζ 2 r˜k,n (ζ )dζ → 0, n → ∞, L (1,J,4) (1,R) rk,n ≡ ζ 4 r˜k,n (ζ )dζ → 0, n → ∞. L
(2.61)
Consider also the two analogs of the first equation in (2.60) with $2q$ replaced by $2q-1$ and by $2q+1$. These relations and (2.60) comprise a linear system with the unknowns $J^{(1)}_{2q-2}$, $J^{(1)}_{2q-1}$, $J^{(1)}_{2q}$ and $J^{(1)}_{2q+1}$. The system is uniquely soluble for $J^{(0)}_{2q} \ne J^{(0)}_{2q-1}$; its solution is specified by (1.36), and its remainder terms satisfy the bounds (1.37). However, for $J^{(0)}_{2q} = J^{(0)}_{2q-1}$ this system is degenerate. Thus, in the case (i) we cannot use the system to find the coefficients $J^{(1)}_{k,n}$. In this case we use the first identity of (2.36), which yields the following relation in the first order:
$$q^{(1)}_k = r^{(1,q,1)}_{k,n} \equiv \oint_L \zeta\,\tilde r^{(1,R)}_{k,n}(\zeta)\,d\zeta.$$
This and (2.57) yield that $q^{(1)}_k = 0$. Furthermore, the first equation in (2.60) for $J^{(0)}_{2q} = J^{(0)}_{2q-1} = a/2$, in view of (2.57) and (2.58), has the form
$$a\bigl(J^{(1)}_k + J^{(1)}_{k-1}\bigr) = [2(k-n)-1]\,I^{(i)} + r^{(1,J,2)}_{k,n}, \qquad I^{(i)} \equiv \frac{1}{2}\Bigl(\frac{1}{P(a)} + \frac{1}{P(-a)}\Bigr). \tag{2.62}$$
Iterating this relation starting from $k = n$ it is easy to obtain the one-parameter family of solutions
$$aJ^{(1)}_k = (k-n)\,I^{(i)} - c(-1)^{k-n} + \tilde r^{(1,J)}_{k,n}, \tag{2.63}$$
where
$$\tilde r^{(1,J)}_{k,n} = \sum_{j=0}^{k-n}(-1)^{k-n-j}\,r^{(1,J,2)}_{n+j,n}.$$
(1,J,2)
Substituting expression (2.58) for r˜k,n (z) in (2.61) and using the resulting rk,n (z) in the last relations, we obtain the bound
|k − n|2 + 1 |k − n|5 (1,J ) . (2.64) |˜rk,n | ≤ const + n n3 This leads to (1.37) for the case (i), if |k − n| ≤ n2/3 . To fix the parameter c in (2.63) we use the relation known in random matrix theory as the string equation (see e.g. [15]): k (n) (n) Jk V (λ)ψk (λ)ψk+1 (λ)dλ = . n The relation can be easily obtained from the identity (n) (n) e−nV (λ) pk−1 (λ)pk (λ) dλ = 0. We use this relation in the form Jn V (ζ )Rn,n+1 (ζ )dζ = 1 + O(e−nC ), 2πi L
(2.65)
following from Proposition 2. The first order equation which follows from (2.65) has the form
$$\frac{J^{(0)}_n}{2\pi i}\oint_L V'(\zeta)R^{(1)}_{n,n+1}(\zeta)\,d\zeta + \frac{J^{(1)}_n}{2\pi i}\oint_L V'(\zeta)R^{(0)}_{n,n+1}(\zeta)\,d\zeta = 0.$$
By using (1.34), (2.33), (2.57), and (2.63), we get a linear equation with respect to $c$:
$$D^{(i)}c - A^{(i)} = 0, \tag{2.66}$$
with
$$D^{(i)} \equiv J^{\pm}_n\oint_L V'(\zeta)R^{(0)}_{n,n+1}(\zeta)\,d\zeta + \frac{a}{2}\oint_L V'(\zeta)\bigl(R^{(0)}J^{\pm}R^{(0)}\bigr)_{n,n+1}(\zeta)\,d\zeta,$$
$$A^{(i)} \equiv J^{(1*)}_n\oint_L V'(\zeta)R^{(0)}_{n,n+1}(\zeta)\,d\zeta + \frac{a}{2}\oint_L V'(\zeta)\bigl(R^{(0)}\cdot J^{(1*)}\cdot R^{(0)}\bigr)_{n,n+1}(\zeta)\,d\zeta, \tag{2.67}$$
where $J^{\pm}$ is the symmetric Jacobi matrix with coefficients $J^{\pm}_k = (-1)^{n-k}$ and $J^{(1*)}$ is the symmetric Jacobi matrix with coefficients defined by (1.34).

Lemma 3. Under the conditions of the theorem $A^{(i)} = 0$, $D^{(i)} \ne 0$ and Eq. (2.66) has the unique solution $c = 0$.

The proof of this lemma is given in the next section. By using the lemma we find the first order terms of our expansion in the case (i) given in (1.34).

Now we will prove (1.31) and (1.28) by induction. The scheme of the induction procedure will be as follows. Assume that we have found the coefficients $q^{(0)}_k, \dots, q^{(p)}_k$ and $J^{(0)}_k, \dots, J^{(p)}_k$. Then we can find the $(p+1)$th correction $g^{(p+1)}_k(z)$ and estimate the respective remainder $r^{(p+1,g)}_{k,n}$ from the $(p+1)$th order form of Eq. (2.18) (see Eq. (2.70) below), in which we use the functions $g^{(0)}_k(z), \dots, g^{(p)}_k(z)$ and $R^{(0)}_{kk}(z), \dots, R^{(p)}_{kk}(z)$ found previously. Then, by using the $(p+1)$th order form of Eq. (2.19) (see Eq. (2.73) below), we determine $R^{(p+1)}_{kk}(z)$ and estimate the respective remainder $r^{(p+1,R)}_{k,n}$. Finally, we find the coefficients $q^{(p+1)}_k$ and $J^{(p+1)}_k$ and estimate the respective remainder by using the $(p+1)$th order form of relations (2.36) and (2.65). To realize this scheme we first write the asymptotic relation:
$$g_{k,n}(z) = \sum_{j=0}^{p} n^{-j}g^{(j)}_k(z) + n^{-p}\,\tilde r^{(p,g)}_{k,n}(z), \qquad \tilde r^{(p,g)}_{k,n}(z) \to 0, \quad \text{as } n\to\infty, \tag{2.68}$$
valid for all $k$ such that $|k-n| \le N_1(n)$. Let the matrices $R^{(j)}(z)$, $j = 0, \dots, p$ be defined as in Lemma 2 (see formulas (2.48), (2.49)). Then, denoting
$$g^{(p+1)}_{k,n}(z) \equiv n^{p+1}\Bigl(g_{k,n}(z) - \sum_{j=0}^{p} n^{-j}g^{(j)}_k(z)\Bigr), \tag{2.69}$$
we obtain from (2.18) the equation of the $(p+1)$th order for $g^{(p+1)}_{k,n}(z)$:
$$2g(z)g^{(p+1)}_{k,n}(z) = \frac{1}{2\pi i}\oint_L \tilde V(z,\zeta)\,g^{(p+1)}_{k,n}(\zeta)\,d\zeta - F^{(p+1,g)}_k(z) - r^{(p+1,g)}_{k,n}(z) + e_n(z), \tag{2.70}$$
where (p+1,g)
Fk
(z) =
p
(p+1,g)
(z) =
(l)
(z)gk (z) +
∞ p−1 k
(p−l−1)
Rm,j
m=1 j =k+1 l=0 p (p+1) (p+1) (l) n−p−1 (gk,n (z))2 + 2gk,n (z) n−l gk (z) l=1 p
(l) (l ) np+1−l−l gk (z)gk (z) + l,l =1,l+l >p+1 l=1
rk,n
(p+1−l)
gk
· np
(l)
(z)Rm,j (z),
(2.71)
k k 1 2 − R (z)j,j − Rj,m (z) n m=1
j =1
1 − n
k
(S
(p)
·S
(p)
)j,j (z) −
k m=1
j =1
(p) (Sj,m (z))2
,
(p)
with Sj,m (z) defined by (2.49). On the basis of (2.68), (1.28), and Lemma 2 we conclude that the relations (p+1,g) F (z) ≤ const (|k − n|p+1 + 1), k and (p+1,g)
rk,n
(z) → 0,
as
n → ∞,
are valid uniformly in {z : δε (z) ≥ d}, for any fixed d > 0, because by the induction (p+1) (p) assumption (2.68) we have that n−1 gk,n (z) ≡ g˜ k,n (z) → 0 as n → ∞. Then Lemma 1 leads to the relations (p+1)
gk,n
(p+1)
(z) = gk
(p+1,g)
(z) + r˜k,n
(z),
(2.72)
where for δε (z) ≥ d > 0, (p+1)
gk
(z) =
1 2πi
L
(p+1,g)
Fk (ζ ) dζ, P (ζ )(ζ − z)
(p+1)
|gk
(z)| ≤ const (|k − n|p+1 + 1)
and (p+1,g)
|˜rk,n
(z)| ≤
const (p+1,g) (p+1,g) (z)| + max |rk,n (ζ )|). ((|rk,n {ζ ∈L} |X(z)|
Now, denoting (cf. (2.69)) (p+1,n) (z) Rk,k
p+1
Rk,k (z) −
≡n
p
n
−j
j =0
(j ) Rk,k (z)
,
we get from (2.19) the equation of the form (cf. (2.55)) 1 (p+1,n) (p+1) (z) = V˜ (z, ζ )Rk,k (ζ )dζ 2g(z)Rk,k 2π i (p+1,R)
− Fk
(p+1,R)
(z) − rk,n
(2.73)
(z) + en (z),
where (p+1,R) Fk (z) (p+1,R)
rk,n
=
p l=0
(p+1−l) (l) gk−1 (z)Rk,k (z) +
(p+1)
(z) = 2Rk,k
(z)
p
p
+
l=1
j =1
−
p ∞ j =k+1
l=0
(p−l)
(l)
Rm,j (z)Rm,j (z),
(l)
n−l gk−1 (z)
l,l =1,l+l >p+1
+ np−1
k
(l )
(l)
np+1−l−l gk−1 (z)Rk,k (z)
− R (z)k,k − 2
k
(Rk,m (z))2
m=1
k (p) − (S (p) · S (p) )j,j (z) − 2 (Sj,m (z))2 . m=1
By the virtue of (2.68), (1.28) and of Lemma 2, we conclude that the relations (p+1,R) F (z) ≤ const (|k − n|p+1 + 1), k and (p+1,R)
rk,n
(z) → 0,
as
n → ∞,
are valid uniformly in {z : δε (z) ≥ d}, for any fixed d > 0. Using again Lemma 1, we get (p+1,n)
Rk,k
(p+1)
(z) = Rk,k
(p+1,R)
(z) + r˜k,n
(z),
(2.74)
where for δε (z) > d, (p+1)
Rk,k and
(z) =
1 2πi
L
(p+1,R)
(ζ ) Fk (p+1) dζ, |Rk,k (z)| ≤ const (|k − n|p+1 + 1) (2.75) P (ζ )(ζ − z)
(p+1,R) ≤ const r (p+1,R) (z) + max r (p+1,R) (ζ ) . r˜ (z) k,n k,n {ζ ∈L} k,n |X(z)|
Now, as for the first order approximation case, in the case (ii) we take the (p + 1) - order terms (with respect to n−1 ) of Eqs. (2.36) for k = 2q: 1 (p+1) (p+1) (p+1,J,2) (0) (p+1) (0) + J2q−1 J2q−1 ) = ζ 2 R2q,2q (ζ )dζ + r2q , 2(J2q J2q 2π i L (0) (p+1)
4(J2q J2q
(p+1)
(0)
(0)
(0)
+ J2q−1 J2q−1 )((J2q )2 + (J2q−1 )2 )
(0) (0)
(p+1)
(0)
(0) (p+1)
(0) (p+1)
(0)
(p+1)
+ J2q J2q+1 + J2q J2q−1 + J2q−1 J2q−2 ) + 2J2q J2q−1 (J2q−1 J2q 1 (p+1) (p+1,J,4) = ζ 4 R2q,2q (ζ )dζ + r2q , (2.76) 2πi L (p+1,J,2)
(p+1,J,4)
are the coefficients at n−p−1 in the r.h.s. of the second p (j ) n−j Jk , and and the third equations (2.36) which we get, substituting there Jk =
where Fk
and Fk
(p+1,J,2)
rk
(p+1,J,4)
rk
j =0
≡ ≡
(p+1,R)
L
ζ 2 r˜k,n
(ζ )dζ → 0,
(p+1,R)
L
ζ 4 r˜k,n
n → ∞,
(ζ )dζ → 0,
n → ∞.
Consider also the two analogs of the first relation of (2.76), in which 2q is replaced by 2q − 1 and 2q + 1. These relations together with (2.76) comprise a linear system with (p+1) (p+1) (p+1) (p+1) (0) (0) respect to the variables J2q−2 , J2q−1 , J2q and J2q+1 . For J2q = J2q−1 , i.e. in the case (ii), the system is uniquely soluble and the solution satisfies condition (1.29) in view of (2.75). (0) (0) However, for J2q = J2q−1 this system is degenerated and so in the case (i) we (p+1)
from the system. Therefore similarly to (2.62)–(2.64) for the case (i) cannot find Jk we obtain the one-parameter family of solutions (p+1)
Jk
(p+1)
= bk
(p+1,J )
− c(−1)k−n + r˜k,n
,
(2.77)
where (p+1)
bk
=
k−n j =0
(p+1)
(−1)k−n−j an+j ,
(p+1,J )
r˜k,n
with (p+1)
ak
(p+1,J,2)
≡ −Fk
+
1 2π i
=
k−n j =0
L
(p+1,J,2)
(−1)k−n−j rn+j,n
(p+1)
ζ 2 Rk,k
,
(ζ )dζ,
To fix the parameter c we use again identity (2.65) and Lemma 2. Then we get the equation for c of the form (i)
D (i) c − Ap+1 = 0, where, as usually in perturbation theory, the coefficient D (i) is the same in each order of the procedure. Thus, in view of Lemma 3, D (i) is nonzero and the parameter c is
uniquely defined by this equation. By the same argument as in the case $p = 1$ it is easy to see that in view of (2.75) $q^{(p+1)}_k$ and $J^{(p+1)}_k$ satisfy bounds (1.30). Theorem 1 is proven.

Proof of Corollary 1. By using general formulas (1.18)–(1.25), (2.12), (2.14)–(2.16) and the Christoffel–Darboux identity for orthogonal polynomials it can be shown that the covariance (1.39) can be written as
$$D_n(z_1,z_2) = \frac{1}{2n^2}\iint\frac{(\lambda-\mu)^2k_n^2(\lambda,\mu)\,d\lambda\,d\mu}{(z_1-\lambda)(z_1-\mu)(z_2-\lambda)(z_2-\mu)} = \frac{J_n^2}{n^2}\left[\frac{\delta R_{n+1,n+1}}{\delta z}\frac{\delta R_{n,n}}{\delta z} - \Bigl(\frac{\delta R_{n+1,n}}{\delta z}\Bigr)^2\right] + e_n(z_1) + e_n(z_2), \tag{2.78}$$
where $k_n(\lambda,\mu)$ is defined in (1.20) and we denote $\delta R_{k,j} \equiv R_{k,j}(z_1) - R_{k,j}(z_2)$ and $\delta z \equiv z_1 - z_2$. Then, on the basis of Lemma 2, we conclude that the amplitude $d^{(2)}_n(z_1,z_2)$ of the asymptotic formula (1.40) is:
$$d^{(2)}_n(z_1,z_2) = (J^{(0)}_n)^2\left[\frac{\delta R^{(0)}_{n+1,n+1}}{\delta z}\frac{\delta R^{(0)}_{n,n}}{\delta z} - \Bigl(\frac{\delta R^{(0)}_{n+1,n}}{\delta z}\Bigr)^2\right].$$
According to Theorem 1 and Remark 2 after the theorem the zero-order coefficients $J^{(0)}_k$ of the Jacobi matrix $J^{(n)}$ do not depend on $k$ ($k = n(1+o(1))$) in the case (i) and are 2-periodic functions of $k$ in the case (ii). Thus, we have only to compute the matrix elements of the resolvent of the constant Jacobi matrix and of the 2-periodic Jacobi matrices whose coefficients are given by (1.34) and (1.35) in the cases (i) and (ii) respectively. The computations are standard and lead to (1.41) and to (1.42). □

Proof of Corollary 2. The weak convergence of $(\psi^{(n)}_k(\lambda))^2$ is equivalent to the convergence of its Stieltjes transform
$$\int\frac{(\psi^{(n)}_k(\lambda))^2\,d\lambda}{z-\lambda} \tag{2.79}$$
uniformly in $z$ on any compact set of $\mathbb{C}\setminus\mathbb{R}$. According to (2.12) and Proposition 2 the Stieltjes transform (2.79) is $R_{kk}(z) + e_n(z)$. Now the asymptotic formula (2.33) implies that the Stieltjes transform (2.79) converges to >(z) as $n\to\infty$ and $\mathrm{dist}\{z,\sigma_\varepsilon\} \ge d > 0$. This fact and the inversion formula (3.2) yield the result. □
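For the one interval case the "standard computation" with the constant Jacobi matrix can be checked numerically. The sketch below assumes the convention $R = (z-J)^{-1}$ and uses the known closed form $1/\sqrt{z^2-a^2}$ for the diagonal resolvent entry of the infinite Jacobi matrix with zero diagonal and off-diagonal entries $a/2$; that this is the limiting function of (2.33) in the case (i) is an assumption here, since (2.28) is not restated in this section. The truncation size, the value of `a` and the test points are illustrative choices.

```python
# Sketch: diagonal resolvent entry of the constant Jacobi matrix and the
# density recovered from it by the inversion formula (3.2).
import numpy as np

def free_resolvent_diag(z, a, N=401):
    """Central diagonal entry of (z - J)^{-1} for the N x N truncation of the
    Jacobi matrix with zero diagonal and constant off-diagonal a/2."""
    ones = np.ones(N - 1)
    J = (a / 2) * (np.diag(ones, 1) + np.diag(ones, -1))
    G = np.linalg.inv(z * np.eye(N) - J)
    return G[N // 2, N // 2]

def closed_form(z, a):
    """1 / sqrt(z^2 - a^2) on the branch that behaves like 1/z at infinity."""
    return 1.0 / (z * np.sqrt(1 - (a / z) ** 2 + 0j))

a = 1.0
z = 0.3 + 0.5j
gap = abs(free_resolvent_diag(z, a) - closed_form(z, a))

# Inversion formula: density(lambda) = -(1/pi) * Im g(lambda + i*eps), eps -> +0.
lam, eps = 0.3, 1e-6
density = -closed_form(lam + 1j * eps, a).imag / np.pi
arcsine = 1.0 / (np.pi * np.sqrt(a ** 2 - lam ** 2))
```

Under these assumptions `gap` is at rounding level, because resolvent entries decay exponentially away from the diagonal for non-real $z$, and `density` reproduces the arcsine density $1/(\pi\sqrt{a^2-\lambda^2})$ on $(-a, a)$.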
3. Auxiliary Results

Proof of Proposition 1. For the proof of the weak convergence of the measures $N_n$ and of (1.10) see [8]. Furthermore, it follows from Eq. (2.22) that in $D$ the function $g(z)$ can be written as
$$\frac{V'(z)}{2} - \frac{1}{2}\sqrt{(V'(z))^2 - 4Q(z)}, \tag{3.1}$$
where $Q(z)$ is defined in (2.23). Since
$$\rho(\lambda) = -\frac{1}{\pi}\lim_{\varepsilon\to+0}\Im\, g(\lambda + i\varepsilon), \tag{3.2}$$
we conclude that $\rho(\lambda)$ satisfies the Hölder condition. Thus we find from the real part of (3.1) that
$$\mathrm{v.p.}\int_\sigma\frac{\rho(\mu)\,d\mu}{\lambda-\mu} = \frac{V'(\lambda)}{2}, \quad \lambda\in\sigma.$$
Regarding this relation as a singular integral equation and using standard facts (see [21]), we obtain (1.10) in which
$$P(\lambda) = \frac{1}{\pi}\int_\sigma Q(\lambda,\mu)\,X_+^{-1}(\mu)\,d\mu$$
and $Q$ and $X_+(\mu)$ are defined in (1.16) and (1.11). It is clear that $P(\lambda)$ can be analytically continued into $D$ and can be written in the form (1.15). Since $g(z)$ is uniquely determined by its boundary values on $\sigma$ and its asymptotic behaviour $g(z) = z^{-1}(1+o(1))$, as $z\to\infty$, we obtain the assertions of the lemma. □
Proof of Proposition 2. According to the result of [8], and our condition C2, if we consider the function $u(x)$ of the form (1.9), then $u(x) = C^*$ ($x\in\sigma$) and $u(x) < C^*$ ($x\notin\sigma$). It is easy to see that at all endpoints $a^*$ of $\sigma$ there exist one-sided derivatives $u'_\pm(a^*)$ (we take the right derivative for the right endpoints $a^*$ and the left derivative for the left endpoints), and these derivatives are nonzero. Set $C_1 = \frac{1}{2}\min|u'_\pm(a^*)|$ and consider the function
$$V_1(x) = \begin{cases} 0, & x\in\sigma,\\ C_1\varepsilon, & x\in\mathbb{R}\setminus\sigma_\varepsilon,\\ \pm C_1(x-a^*), & x\in\sigma_\varepsilon\setminus\sigma. \end{cases} \tag{3.3}$$
In the last line here we take plus for the right endpoints and minus for the left endpoints of the spectrum. It is easy to see that we can always choose $\varepsilon_0$ so small that for any $\varepsilon\le\varepsilon_0$ the function $u_1(x) \equiv u(x) + V_1(x)$ also takes its maximum value $C^*$ on $\sigma$. Consider now the following functions of $(x_1,\dots,x_n)\in\mathbb{R}^n$ that we will call Hamiltonians because their role below will be analogous to that of Hamiltonians of classical statistical mechanics (see [8] for this analogy):
$$H_n(x_1,\dots,x_n) = n\sum_{i=1}^{n}V(x_i) - 2\sum_{1\le i<j\le n}\ln|x_i-x_j|,$$
$$H^{(1)}_n(x_1,\dots,x_n) = n\tilde V(x_1) + n\sum_{i=2}^{n}V(x_i) - 2\sum_{1\le i<j\le n}\ln|x_i-x_j|,$$
$$H^{(1a)}_n(x_1,\dots,x_n) = \tilde V(x_1) - (n-1)u_1(x_1) + n\sum_{i=2}^{n}V(x_i) - 2\sum_{2\le i<j\le n}\ln|x_i-x_j|,$$
$$H^{(a)}_n(x_1,\dots,x_n) = -nV_1(x_1) - n\sum_{i=1}^{n}u(x_i) + n(n-1)\iint\ln|x-y|\,\rho(x)\rho(y)\,dx\,dy, \tag{3.4}$$
where $\tilde V(x) \equiv V(x) - V_1(x)$, $u$ is defined in (1.9), and $u_1 = u + V_1$. Denote by $p^{(\cdot)}_n = (Z^{(\cdot)}_n)^{-1}\exp\{-H^{(\cdot)}_n\}$ the probability density defined by one of these functions (cf. (1.17)). We will use the Bogolyubov inequality, valid for any two Hamiltonians $H_{1,2}$ with corresponding normalization constants (partition functions) $Z_{1,2}$,
$$\langle H_2 - H_1\rangle_{H_2} \le \log Z_1 - \log Z_2 \le \langle H_2 - H_1\rangle_{H_1}, \tag{3.5}$$
where the symbol $\langle\dots\rangle_H$ denotes the mathematical expectation with respect to the probability density $p = Z^{-1}\exp\{-H\}$.
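Inequality (3.5) follows from Jensen's inequality applied to $\log\langle e^{\pm(H_2-H_1)}\rangle$, and it can be checked directly on a finite state space. The following minimal sketch does this for two randomly generated "Hamiltonians" on 50 states; the state space, its size and the random values are illustrative assumptions unrelated to the ensembles of the paper.

```python
# Sketch: Bogolyubov inequality <H2-H1>_{H2} <= log Z1 - log Z2 <= <H2-H1>_{H1}
# on a finite state space, where H is a list of energies.
import math
import random

def log_partition(H):
    """log Z for the weights exp(-H)."""
    return math.log(sum(math.exp(-h) for h in H))

def expectation(f, H):
    """Mean of f with respect to the density exp(-H) / Z."""
    w = [math.exp(-h) for h in H]
    Z = sum(w)
    return sum(fi * wi for fi, wi in zip(f, w)) / Z

random.seed(0)
H1 = [random.uniform(-2, 2) for _ in range(50)]
H2 = [random.uniform(-2, 2) for _ in range(50)]
diff = [h2 - h1 for h1, h2 in zip(H1, H2)]

lower = expectation(diff, H2)                     # <H2 - H1>_{H2}
middle = log_partition(H1) - log_partition(H2)    # log Z1 - log Z2
upper = expectation(diff, H1)                     # <H2 - H1>_{H1}
```

The three numbers should satisfy `lower <= middle <= upper` for any choice of the two lists.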
Using the r.h.s. inequality in (3.5) for $H_1 = H^{(1)}_n$ and $H_2 = H^{(1a)}_n$, we get
≤ 2(n − 1) log |x1 − x2 | ρn(1,1) (x1 , x2 ) − ρn(1,1) (x1 )ρn(1,2) (x2 ) dx1 dx2
+ 2(n − 1) log |x1 − x2 |ρn(1,1) (x1 ) ρn(1,2) (x2 ) − ρ(x2 ) dx1 dx2 , (3.6) (1,1)
(1,2)
where ρn (x1 ), and ρn (x2 ) are the first marginal densities corresponding to x1 and x2 (1) (1,1) (1,2) (1) for the Hamiltonian Hn (note that ρn (x1 ) = ρn (x1 ) since Hn is not symmetric (1,1) in x1 and x2 ), and ρn (x1 , x2 ) is the second marginal density, corresponding to x1 , x2 (1,1) (note that ρn (x1 , x2 ) is not symmetric because of the same reason). Lemma 4 of [8] (valid for not necessarily symmetric Hamiltonians) implies that the first term in the r.h.s. of (3.6) is O(log n). To estimate the second term we first take into account that the integral kernel log |x − y|−1 is positive definite, hence by the corresponding Schwartz inequality
log |x − y|ρ (1,1) (x) ρ (1,2) (y) − ρ(y) dxdy n n 1/2 (1,1) (1,1) ≤ log |x − y|ρn (x)ρn (y)dxdy 1/2
(1,2) (1,2) × log |x − y| ρn (x) − ρ(x) ρn (y) − ρ(y) dxdy .
(3.7)
By using the estimate (1,1) x˜ (1,1) ρ x + 3/γ − ρn (x) n n n
x˜ x˜ = (Zn(1) )−1 dx2 . . . dxn exp − nV˜ x + 3/γ − n V xi + 3/γ n n i=2 (3.8) n n n − exp − nV˜ (x) − n V (xi ) · |x − xi |2 |xi − xj |2 i=2
i=2
2≤i<j
const (1,1) ρ ≤ (x), n n
(1,1) valid for |x| ˜ < 1 in view of the condition (1.3), and the fact that ρn (x)dx = 1, (1,1) we obtain that ρn (x) ≤ const n3/γ . Hence we have the following bound for the first factor in the r.h.s. of (3.7): ln |x − y|ρ (1,1) (x)ρ (1,1) (y)dxdy ≤ const log n. n n To estimate the second factor in the r.h.s. of (3.7) we use the l.h.s inequality in (3.5) for (a) (1) (a) (1) the Hamiltonians H1 = Hn and H2 = Hn , where Hn and Hn are defined in (3.4). We obtain the inequality (n − 1)(n − 2) log |x − y|(ρn(1,2) (x, y) − ρn(1,2) (x)ρn(1,2) (y))dxdy − n2 (n − 1)(n − 2) − log |x − y|(ρn(1,2) (x) − ρ(x))(ρn(1,2) (y) − ρ(y))dxdy n2 2(n − 1) + log |x − y|(ρn(1,2) (x) − ρ(x))ρ(y)dxdy n2 2 + log |x − y|ρn(1,1) (x)ρ(y)dxdy (3.9) n 2(n − 1) − log |x − y|ρn(1,1) (x, y)dxdy n2
1 1 1 1 (a) ≤ 2 log Zn(a) − 2 log Zn(1) = log Z − log Z n n n n n2 n2
1 log n 1 2 + log Zn − 2 log Zn(1) ≤ O − V1 (x)ρn (x)dx. 2 n n n n (a)
In the r.h.s. here we have used the result of [8] to estimate 1/n2 log Zn − 1/n2 log Zn (1) and inequality (3.5) to estimate 1/n2 log Zn − 1/n2 log Zn . Using Lemma 4 of [8] (more precisely, repeating almost literally the arguments of that lemma in the case of the non symmetric Hamiltonian ), we obtain that the first and the last terms in the l.h.s. of (3.9) are of the order O(log n/n). And the third, the fourth and the fifth terms here are evidently of the order O(n−1 ). Therefore finally we get from (3.9), log n . (3.10) − log |x − y|(ρn(1,2) (x) − ρ(x))(ρn(1,2) (y) − ρ(y))dxdy ≤ const n
Substituting this estimate in (3.6) we obtain log Zn(1) − log Zn(1a) ≤ const
√
n log n.
(3.11)
(1a)
(1a)
Now we use the r.h.s. inequality in (3.5) for H1 = Hn and H2 = Hn , where Hn and Hn are defined in (3.4). We get (1a) log Zn − log Zn ≤ n V1 (x1 )ρn(a1) (x1 )dx1 (3.12) + (n − 1) ρn(a1) (x1 )(ρn(a2) (y) − ρ(y))dx1 dy, (a1)
(a2)
(1a)
where ρn and ρn are the first marginal densities of the Hamiltonian Hn sponding to x1 and x2 . On the other hand it is easy to see that ρn(a1) (x) =
, corre-
exp{(n − 1)u1 (x) − V˜ (x)} , exp{(n − 1)u1 (x) − V˜ (x)}dx (a1)
and due to the choice of the function V1 the density ρn (x) decays exponentially outside of σ . Thus since V1 (x) = 0 for x ∈ σ the first term in the r.h.s. of (3.12) is of the order O(1). The second term can be estimated by the Schwartz inequality similarly to (3.7) (a2) and then, using the fact that ρn (x) coincides with the first marginal densities of the Hamiltonian,
Hn (x2 , . . . , xn ) = n
n
V (xi ) − 2
i=2
ln |xi − xj |.
2≤i<j (a2)
Therefore the analog of inequality (3.10) for ρn of [8]. Thus, from (3.12) we derive
(x) follows directly from the results
log Zn(1a) − log Zn ≤ const
√
n log n.
(3.13)
Bounds (3.11) and (3.13) lead to the relation
enV1 (x1 ) ρn (x1 )dx1 =
(1)
√ Zn ≤ eC2 n log n . Zn
2 Taking C0 = 2 C C1 , we obtain from the last relation that for any positive ε satisfying the inequality: C0 n−1/2 log n ≤ ε ≤ ε0 we have √ ρn (x1 )dx1 ≤ exp{C2 n log n − C1 εn} ≤ e−nC1 ε/2 .
R\σe
To obtain this statement for ρk,n we have to prove now that for any n-independent ε we can choose ε1 such that for |k − n| ≤ ε1 n the spectrum of the ensemble with potential n V˜ ≡ V is inside of σε/2 . This fact follows from the main result of [8, 12] and also from k [19]. Proposition 2 is proven. !
Proof of Lemma 1. Using Proposition 1 we rewrite Eq. (2.27) in D: 1 P (z)X(z)R(z) = dζ Q(z, ζ )R(ζ ), 2π i L
(3.14)
with Q(z, ζ ) defined by (1.16). It follows from formula (1.15) for P (z) that the function >(z) of (2.28) solves Eq. (3.14) in the class (2.29). Let us show that the solution is ˜ ˜ unique. Denoting by Q(z) the r.h.s. of (3.14), we see that Q(z) is an analytic function ˜ in D. From Eq. (3.14) we derive that zeros of P (z) in D coincide with zeros Q(z) and ˜ Q(z) have the same order. Thus, function R(z)X(z) = P (z) is analytic in D. In the rest of C it is analytic, because we are looking for a solution analytic outside σε . Thus R(z)X(z) is analytic in the whole C. Besides, if R(z) = 1z (1 + o(1)), as |z| → ∞, then in the case (i) R(z)X(z) is bounded, as |z| → ∞. Therefore by the Liouville theorem, R(z)X(z) is a constant. In the case (ii) we get also from the Liouville theorem, that R(z)X(z) = az+b. By the symmetry of the function R(z) we get R(z) = zX−1 (z). This proves the first statement of the lemma. To prove the second statement, we notice that under condition (2.30) in the case (i) we have R(z)X(z) → 0, as |z| → ∞. Thus, according to the above conclusions R(z) = 0 for all z. In the case (ii) condition (2.30) implies that R(z)X(z) = const and we get R(z)X(z) = 0 from the symmetry condition. To prove that (2.32) is a solution of Eq. (2.31) we note first that for any closed contour L that does not contain the zeros of P (z) we can write the relation ˜ )dζ 1 Q(ζ 1 R(ζ )X(ζ )dζ R(z)X(z) = − , (3.15) 2πi L (ζ − z) 2π i L P (ζ )(ζ − z) ˜ where Q(z) is defined as in the r.h.s. of (3.14). Indeed, under the condition of the lemma R(z)X(z) = z−1 (1 + o(1)), as z → ∞, i.e. the function is analytic outside of contour L. Then, by the Cauchy theorem, the first term in the r.h.s. is R(z)X(z). The second term is zero, because the integrand is analytic inside the contour L and z is outside of L. By using this relation, we can rewrite formula (2.32) for the solution as 1 2πi
L1
(V (ζ ) − P (ζ )X(ζ ))R(ζ ) −
1 2πi
L
V (ζ, ζ1 )R(ζ1 )dζ1 + F (ζ )
dζ = 0, P (ζ )(ζ − z)
where the contour L1 lies outside of L and is close enough to L. According to the condition of the lemma the expression in the brackets is analytic outside of L1 . Thus by the Cauchy theorem, we have 1
(V (z) − P (z)X(z))R(z) − V (z, ζ )R(ζ )dζ + F = 0. 2π i L Since 2g(z) = V − P (z)X(z), the last relation proves that (2.32) is the solution of Eq. (2.31). Uniqueness follows from the absence of solutions of the homogeneous equation (2.27) in the class (2.30). This fact was proven above. !
Proof of Lemma 2. Consider the "block" symmetric Jacobi matrix $\dot J^{(n,N_1)}$ which can be obtained from $J$ if we set $J_{n-N_1-1} = 0$. Let $\dot R^{(n,N_1)}(z)$ be its resolvent. We will use the resolvent identity valid for any two selfadjoint operators $J_{1,2}$ with resolvents $R_{1,2}(z) = (z - J_{1,2})^{-1}$ respectively,
$$R_1(z) - R_2(z) = R_1(z)(J_1 - J_2)R_2(z). \tag{3.16}$$
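The resolvent identity is easy to confirm numerically. The sketch below assumes the convention $R_i(z) = (z - J_i)^{-1}$, under which the difference of resolvents equals $R_1(z)(J_1 - J_2)R_2(z)$, and also checks the standard bound $\|R(z)\| \le 1/|\Im z|$ for a real symmetric matrix. The matrix size, the spectral parameter and the random coefficients are illustrative assumptions.

```python
# Sketch: resolvent identity for two random symmetric Jacobi matrices,
# with the convention R_i(z) = (z - J_i)^{-1}.
import numpy as np

rng = np.random.default_rng(0)
N, z = 8, 0.4 + 0.7j

def random_jacobi(N):
    off = rng.uniform(0.5, 1.5, N - 1)
    return np.diag(rng.uniform(-1, 1, N)) + np.diag(off, 1) + np.diag(off, -1)

def resolvent(J, z):
    return np.linalg.inv(z * np.eye(J.shape[0]) - J)

J1, J2 = random_jacobi(N), random_jacobi(N)
R1, R2 = resolvent(J1, z), resolvent(J2, z)

# Largest entrywise deviation between the two sides of the identity.
identity_error = np.abs((R1 - R2) - R1 @ (J1 - J2) @ R2).max()

# Operator-norm bound for the resolvent of a selfadjoint matrix.
norm_bound_ok = np.linalg.norm(R1, 2) <= 1 / abs(z.imag) + 1e-12
```

Since $R_1$ and $R_2$ can be interchanged on the right-hand side, the same check passes with `R2 @ (J1 - J2) @ R1`.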
Thus taking as $R_1(z)$ the resolvent $R(z)$ of $J^{(n)}$, and as $R_2(z)$ the resolvent $\dot R^{(n,N_1)}(z)$ of $\dot J^{(n,N_1)}$ we obtain
$$R_{k,j}(z) - \dot R^{(n,N_1)}_{k,j}(z) = \dot R^{(n,N_1)}_{k,n-N_1-1}(z)\,J_{n-N_1-1}\,R_{n-N_1,j}(z) + \dot R^{(n,N_1)}_{k,n+N_1+1}(z)\,J_{n+N_1+1}\,R_{n+N_1+2,j}(z). \tag{3.17}$$
Now we use a general fact of the theory of Jacobi matrices.

Proposition 3. Let $J$ be the Jacobi matrix with coefficients $J_{k,k+1} = J_{k+1,k} = a_k\in\mathbb{R}$, $|J_{k,k}| \le \varepsilon$, and $|a_k| \le A$. Then there exist positive constants $C_{1,2}$ such that for any $z\in\mathbb{C}\setminus[-2A-\varepsilon, 2A+\varepsilon]$ the matrix elements of the resolvent $G = (zI-J)^{-1}$ satisfy the inequalities
$$|G_{k,k'}(z)| \le \frac{C_1}{\delta_\varepsilon(z)}\,e^{-C_2\delta_\varepsilon(z)|k-k'|}, \tag{3.18}$$
where $\delta_\varepsilon(z) \equiv \mathrm{dist}\{z, [-2A-\varepsilon, 2A+\varepsilon]\}$.

The proof of the proposition is similar to that of the well-known Combes-Thomas estimates for the Schrödinger operator (see e.g. [26]) and we omit it. On the basis of the proposition we obtain the bound
$$|\dot R^{(n,N_1)}_{j,k}(z)| \le \frac{1}{\delta_\varepsilon(z)}\,e^{-C_2\delta_\varepsilon(z)|j-k|}. \tag{3.19}$$
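The exponential off-diagonal decay behind (3.18) and (3.19) can be observed numerically. The sketch below does not implement the Combes-Thomas argument; it uses instead an elementary Neumann series bound, valid only for $|z| > \|J\|$: since $J$ is tridiagonal, $(J^m)_{k,k'} = 0$ for $m < |k-k'|$, hence $|G_{k,k'}(z)| \le (\|J\|/|z|)^{|k-k'|}/(|z| - \|J\|)$. The matrix size, the point `z` and the random coefficients are illustrative assumptions.

```python
# Sketch: exponential decay of resolvent entries of a bounded Jacobi matrix,
# compared with the Neumann-series bound (weaker than Combes-Thomas).
import numpy as np

rng = np.random.default_rng(2)
N, A, eps = 120, 1.0, 0.1
off = rng.uniform(-A, A, N - 1)        # |a_k| <= A
diag = rng.uniform(-eps, eps, N)       # |J_kk| <= eps
J = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

z = 3.0                                # outside [-2A - eps, 2A + eps]
G = np.linalg.inv(z * np.eye(N) - J)
norm_J = np.linalg.norm(J, 2)          # never exceeds 2A + eps

k = N // 2
dist = np.arange(0, 41)                # distances |k - k'| = 0 .. 40
entries = np.abs(G[k, k + dist])
bound = (norm_J / abs(z)) ** dist / (abs(z) - norm_J)
```

Every value in `entries` stays below the corresponding value in `bound`, so the entries at distance 40 from the diagonal are already smaller than $10^{-6}$ for these parameters.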
Thus, for (N1 − 2N˜ ) ≤ |k − n| ≤ (N1 − N˜ ) we have (n,N ) (n,N ) |R˙ n−N11 −1,k (z)|, |R˙ n+N11 +1,k (z)| ≤
1 −C2 δε (z)N˜ . e δε (z)
So, it follows from (3.17) that (n,N ) |Rk,j (z) − R˙ k,j 1 (z)| ≤
const −C1 δε (z)N˜ . e |z|δε (z)
(3.20)
Similarly, if we consider the (2N1 + 1)-periodic symmetric Jacobi matrix J˜ such that J˜k,k+1 = Jk,k+1
|k − n| ≤ N1 ,
(3.21)
and denote by R˜ its resolvent, then (n,N ) |R˜ k,k − R˙ k,k 1 (z)| ≤
2 ˜ e−C2 δε (z)N . |z|δε (z)
(3.22)
|Rk,k (z) − R˜ k,k (z)| ≤
const −C2 δε (z)N˜ . e |z|δε (z)
(3.23)
Therefore,
Applying the resolvent identity (3.16) to the matrices J˜(p) and J˜ we obtain in view of estimate (1.28): (p)
|R˜ k,j (z) − R˜ k,j (z, n−1 )| ≤
(p)
2εn , np |z|2
(3.24)
(p) where R˜ k,j (z, s) is the resolvent of the Jacobi matrix J˜(p) (z, s) defined in (2.48) and (p) (p) εn is defined in (1.30). Now expanding R˜ k,k (z, n−1 ) with respect to n−1 it is easy to find that (p)
(p)
|R˜ k,j (z, n−1 ) − Sk,j (z)| ≤
p+1
C1 N1
p+1
δε
(z)np+1
.
(3.25)
From (3.23)–(3.25) we derive that p+1
(p)
Rk,k (z) − Sk,k (z)| ≤
˜ (p) C1 N 1 εn e−C2 δε (z)N + + . p+1 δε2 (z)np |z|δε (z) δε (z)np+1
This inequality and (2.11), lead to the first inequality in (2.51). To prove the second inequality in (2.51) we use again identity (3.17). Taking the second power of the identity, using the bounds (n,N1 )
|R˙ k,j
(z)|, |Rk,j (z)| ≤
1 , |z|
valid for resolvents of arbitrary selfadjoint operators, and bound (3.19), we obtain ∞ ∞ (n,N1 ) 2 2 ˙ (Rk,j (z)) − (Rk,j (z)) j =1
j =1
∞
∞ 4 −C2 δε (z)N˜ 2 2 (3.26) ≤ |Rn−N1 ,j (z)| + |Rn+N1 ,j (z)) | e δε (z) j =1
j =1
8 ˜ ≤ e−C2 δε (z)N . 2 |z| δε (z) 2 To estimate here the sums of the type ∞ j =1 |Rn−N1 ,j (z)| we have used the simple inequalities ∞
|Rn−N1 ,j (z)|2 =
j =1
∞
Rn−N1 ,j (z)Rj,n−N1 (z) ≤ (R(z) · R(z))n−N1 ,n−N1 ≤
j =1
Similarly, ∞ ∞ 1 ˜ (n,N1 ) 2 2 ˜ ˙ e−C2 δε (z)N . (Rk,m (z)) − (Rkm (z)) ≤ 2 2 |z| δε (z) m=k+1
1 . |z|2
(3.27)
m=k+1
Then, in the same way as in (3.23)–(3.25), we get the second inequality of (2.51). The proof of (2.52) is similar.
Note that in fact we have proved (2.51) and (2.52) for |k − n| ≤ (N1 − N˜ ). To prove (2.53) we need to make one more step. Let us prove that for |k − n| ≤ (N1 − 2N˜ ), ˜ n−(N1 −N) k 1 ˜
2 −C2 δε (z)N/2 ≤ − R (z) − R (z) . j,j j,m |z|2 δ (z) e ε
(3.28)
m=1
j =1
˜
To this end we consider one more ”block” symmetric Jacobi matrix J˙(n,(N1 −2N)) which can be obtained from J if we put Jn−(N1 −2N)−1,n−(N ˜ ˜ = 0. Using identity (3.16) 1 −2N) ˜ ˜ (n,(N −2 N)) (n,(N −2 N)) 1 1 and (3.19) for J˙ , we obtain similarly to (3.26), for J and J˙ ˜ ˜ n−(N1 −N) n−(N ∞ ∞ 1 −N) ˜ (n,(N1 −2N)) 2 2 ˙ (R (z)) − ( R (z)) j,m j,m j =1
m=k+1
j =1
≤
m=k+1
2n −C2 δε (z)N˜ 1 ˜ e ≤ e−C2 δε (z)N/2 . |z|3 |z|2 δε (z) ˜ (n,(N1 −2N))
Then, using the estimate (3.19) for R˙ j,m k + 1 > n − (N1 − 2N˜ ) we get
(3.29)
(z) with j ≤ n − (N1 − N˜ ) and m ≥
˜ n−(N1 −N) ∞ 2n 1 ˜ ˜ ˜ (n,(N1 −2N)) 2 ˙ (Rj,m (z)) ≤ 2 e−C2 δε (z)N ≤ 2 e−C2 δε (z)N/2 . δε (z) δε (z) j =1
m=k+1
This inequality combined with (3.29) proves that ˜ n−(N1 −N) k 2 (R · R) (z) − R (z) j,j j,m m=1
j =1
˜ n−(N1 −N) ∞ 1 −C2 δε (z)N/2 ˜ 2 = Rj,m (z) ≤ e . |z|3 j =1
m=k+1
(z)) and R Now, using (2.11), we can replace (R ∗ R)j,j (z) by (−Rj,j j,m (z) by Rj,m (z) to get (3.28). Applying the first and the second line of (2.51) for |k − n| ≤ (N1 − N˜ ) we get (2.53). !
Proof of Lemma 3. To find Dk we first compute the quantity (R (0) (ζ )J (±) R (0) (ζ ))n,n+1 = =
∞ j =−∞ ∞ j =−∞
(0)
(0)
(0)
(0)
(Rn,j (ζ )(−1)n−j Rj +1,n+1 + Rn,j +1 (ζ )(−1)n−j Rj +2,n+1 (ζ ) 1 (2π)2
2π 0
dxdy
ei(n−j )(x−y−π) (1 + e−i(x+y) ) (ζ − a cos x)(ζ − a cos x)
2π 1 − cos 2x 1 dx 2 2π 0 (ζ − a 2 cos2 x)
2π 1 1 ζ2 1 + dx 2 =2 1− 2 2 2 a 2π 0 (ζ − a cos x) π a 2
2 ζ2 = 1 − 2 X −1 (ζ ). ζ a
=
(0)
(0)
Then using the simple formula Rn,n+1 (ζ ) = a −1 (ζ Rn,n (ζ ) − 1) = a −1 (ζ X −1 (ζ ) − 1) we find from (2.65),
1 ζ a ζ2 D (i) = X −1 (ζ )dζ V (ζ ) + 1− 2 2πi L a ζ a a V (ζ ) = dζ = aP (0) = 0. 2πi L X(ζ )ζ Here we have used representation (1.15) and the fact that L dζ (X(ζ )ζ )−1 = 0. Similar calculations show us that A(i) = 0, so it follows from Eq. (2.66) that c = 0 and we get (1.34). !
References

1. Akemann, G.: Higher genus correlators for the Hermitian matrix model with multiple cuts. Nuclear Phys. B 482, 403–430 (1996)
2. Albeverio, S., Pastur, L., Shcherbina, M.: On asymptotic properties of certain orthogonal polynomials. Matem. Fizika, Analiz, Geometriya 4, 263–277 (1997)
3. Ambjörn, J., Akemann, G.: New universal spectral correlators. J. Phys. A 29, L555–L560 (1996)
4. Ambjörn, J., Chekhov, L., Makeenko, Yu.: Higher genus correlators from the Hermitian one-matrix model. Phys. Lett. B 282, 341–348 (1992)
5. Ambjörn, J., Jurkiewicz, J., Makeenko, Yu.M.: Multiloop correlators for two-dimensional quantum gravity. Phys. Lett. B 251, 517–524 (1990)
6. Beenakker, C.W.J.: Random-matrix theory of quantum transport. Rev. Mod. Phys. 69, 731–847 (1997)
7. Bessis, D., Itzykson, C., Zuber, J.-B.: Quantum Field Theory Techniques in Graphical Enumeration. Adv. Appl. Math. 1, 109–157 (1980)
8. Boutet de Monvel, A., Pastur, L., Shcherbina, M.: On the statistical mechanics approach in the random matrix theory. Integrated density of states. J. Stat. Phys. 79, 585–611 (1995)
9. Brézin, É., Deo, N.: Correlations and symmetry breaking in gapped matrix models. Phys. Rev. E 59, 3901–3910 (1999)
10. Brézin, É., Zee, A.: Universality of the correlations between eigenvalues of large random matrices. Nuclear Phys. B 402, 613–627 (1993)
11. Buslaev, V., Pastur, L.: On a class of multi-cut solutions of matrix models and related structures. In preparation
12. Deift, P., Kriecherbauer, T., McLaughlin, K.: New results on the equilibrium measure in the presence of external field. J. Approx. Theory 95, 388–475 (1998)
13. Deift, P., Kriecherbauer, T., McLaughlin, K., Venakides, S., Zhou, X.: Uniform asymptotics for polynomials orthogonal with respect to varying exponential weights and applications to universality questions in random matrix theory. Commun. Pure Appl. Math. 52, 1335–1425 (1999)
14. Deift, P., Kriecherbauer, T., McLaughlin, K., Venakides, S., Zhou, X.: Strong asymptotics of orthogonal polynomials with respect to exponential weights. Commun. Pure Appl. Math. 52, 1491–1552 (1999)
15. Di Francesco, P., Ginsparg, P., Zinn-Justin, J.: 2D gravity and random matrices. Phys. Rep. 254, 1–133 (1995)
16. Guhr, T., Mueller-Groeling, A., Weidenmueller, H.A.: Random matrix theories in quantum physics: Common concepts. Phys. Rept. 299, 189–425 (1998)
17. Johansson, K.: On fluctuations of eigenvalues of random Hermitian matrices. Duke Math. J. 91, 151–204 (1998)
18. Khorunzhy, A., Khoruzhenko, B., Pastur, L.: Random matrices with independent entries: Asymptotic properties of the Green function. J. Math. Phys. 37, 5033–5060 (1996)
19. Kuijlaars, A.B.J., McLaughlin, K.: Generic behavior of the density of states in random matrix theory and equilibrium problems in the presence of real analytic external fields. Commun. Pure Appl. Math. 53, 736–785 (2000)
20. Mehta, M.L.: Random Matrices. New York: Academic Press, 1991
21. Muskhelishvili, N.I.: Singular Integral Equations. Groningen: P. Noordhoff, 1953
22. Pastur, L.: Spectral and probabilistic aspects of matrix models. In: Boutet de Monvel, A., Marchenko, V. (eds.), Algebraic and Geometric Methods in Mathematical Physics. Dordrecht: Kluwer, 1996, pp. 207–247
23. Pastur, L.: Random matrices as paradigm. In: Fokas, A., Grigoryan, A., Kibble, T., Zegarlinski, B. (eds.), Mathematical Physics 2000. London: Imperial College Press, 2000, pp. 216–266
24. Pastur, L., Shcherbina, M.: Universality of the local eigenvalue statistics for a class of unitary invariant random matrix ensembles. J. Stat. Phys. 86, 109–147 (1997)
25. Pastur, L., Shcherbina, M.: Universality of the local eigenvalue statistics near generic endpoints of the spectrum for a class of unitary invariant random matrix ensembles. In preparation
26. Reed, M., Simon, B.: Methods of Modern Mathematical Physics, Vol. IV. New York: Academic Press, 1978
27. Saff, E.B., Totik, V.: Logarithmic Potentials with External Fields. Berlin: Springer, 1997

Communicated by H. Spohn
Commun. Math. Phys. 224, 307 – 321 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Symmetric Simple Exclusion Process: Regularity of the Self-Diffusion Coefficient

C. Landim1,2,*, S. Olla3, S. R. S. Varadhan4,**

1 IMPA, Estrada Dona Castorina 110, CEP 22460 Rio de Janeiro, Brasil. E-mail: [email protected]
2 CNRS UPRES-A 6085, Université de Rouen, 76128 Mont Saint Aignan, France
3 Université de Cergy Pontoise, Département de Mathématiques, CNRS UPRES-A 8088, 2 Av. Adolphe Chauvin, B.P.222, Pontoise, 95302 Cergy-Pontoise-Cedex, France. E-mail: [email protected]
4 Courant Institute of Mathematical Sciences, New York University, 251 Mercer Street, New York, NY 10012, USA. E-mail: [email protected]

* Partially supported by CNPq grant 300358/93-8, FAPERJ grant E26/150.940/99 and PRONEX grant 41.96.0923.00.
** Supported by the National Science Foundation grant DMS-9803140.

Received: 6 December 2000 / Accepted: 6 April 2001

Dedicated to Joel Lebowitz on his seventieth birthday

Abstract: We prove that the self-diffusion coefficient of a tagged particle in the symmetric exclusion process in $\mathbb{Z}^d$, which is in equilibrium at density $\alpha$, is of class $C^\infty$ as a function of $\alpha$ in the closed interval $[0, 1]$. The proof also provides a recursive method to compute the Taylor expansion at the boundaries.

1. Introduction

In the course of the study of macroscopic behavior of large particle systems, effective diffusion coefficients often appear. These are functions of the parameters (associated to the conserved quantities) that define the equilibrium measures of the system. These diffusion coefficients are usually expressed in terms of integrals of time correlation functions (Green–Kubo formulas), or through (infinite dimensional) variational formulas. They also appear as coefficients in the diffusive equations that govern the non-equilibrium evolution of the conserved quantities of the system. In order to study the existence and regularity of solutions to these equations it is important to establish first the regularity of these diffusion coefficients as functions of the parameters.

In this article we develop a method for proving smooth dependence, on the density, of the self-diffusion coefficient of a tagged or tracer particle in symmetric simple exclusion particle systems that are in equilibrium. It is based on the duality properties of the symmetric simple exclusion process. This method, with modifications, can also be applied to study the smooth dependence on the density of other diffusion coefficients that arise in the study of more general simple exclusion processes. But this will be taken up elsewhere.

The paper is organized along the following lines. In Sect. 2 we introduce the notation and state the main theorem. In Sect. 3 we describe the generalized duality and discuss
several operators and norms that appear in the dual representation. Section 4 is devoted to some key estimates that are used in Sect. 5 to prove the main result on the smoothness of the self-diffusion coefficient of the tagged particle in the case of the symmetric simple exclusion process. At the end, in Remark 5.3, we present a recursive method to compute the Taylor expansion at the boundaries.

Very few results exist at the present time about the regularity of diffusion coefficients. Continuous dependence on the density has been established in different contexts (cf. [2]). Generally, proving continuity does not seem to be considerably harder than establishing the existence of a diffusion coefficient. In [6], Lipschitz continuity of the self-diffusion coefficient for the tagged particle in the symmetric simple exclusion process is proved in dimensions $d \ge 3$.

2. Notation and Results

Let us fix a symmetric finite-range probability distribution $p(\cdot)$ on $\mathbb{Z}^d$. Consider the symmetric simple exclusion process associated with $p$. We assume, without loss of generality, that the subgroup generated by the support of $p$ is all of $\mathbb{Z}^d$. In addition we assume that we are not dealing with the trivial situation of $d = 1$ and $p(\pm 1) = 1/2$, i.e. the one-dimensional nearest neighbor case where the self-diffusion coefficient is identically zero.

The simple exclusion process is the Markov process on $\mathcal{X} = \{0, 1\}^{\mathbb{Z}^d}$ whose generator $L$ acts on cylinder functions $f$ as
\[ (Lf)(\eta) = \sum_{x,y\in\mathbb{Z}^d} p(y-x)\,\eta(x)[1-\eta(y)]\,[f(\sigma^{x,y}\eta) - f(\eta)] = \frac{1}{2}\sum_{x,y\in\mathbb{Z}^d} p(y-x)\,[f(\sigma^{x,y}\eta) - f(\eta)]. \tag{2.1} \]
Here and below the configurations of $\mathcal{X}$ are denoted by Greek letters. In particular, for $x$ in $\mathbb{Z}^d$, $\eta(x)$ is equal to 1 if the site $x$ is occupied in the configuration $\eta$ and is equal to 0 if it is not. Moreover, for a configuration $\eta$ and $x, y$ in $\mathbb{Z}^d$, $\sigma^{x,y}\eta$ is the configuration obtained from $\eta$ by exchanging the occupation variables $\eta(x)$, $\eta(y)$:
\[ (\sigma^{x,y}\eta)(z) = \begin{cases} \eta(y) & \text{if } z = x, \\ \eta(x) & \text{if } z = y, \\ \eta(z) & \text{otherwise.} \end{cases} \tag{2.2} \]
Fix $0 \le \alpha \le 1$ and denote by $\mu_\alpha$ the Bernoulli product measure on $\mathcal{X}$. This is the probability measure on $\mathcal{X}$ obtained by placing a particle with probability $\alpha$ at each site $x$, independently from the other sites. It is easy to check that the measures in the one-parameter family $\{\mu_\alpha, 0 \le \alpha \le 1\}$ are stationary, reversible and ergodic for the Markov process with generator $L$.

We examine in this article the evolution of a single tagged particle in the symmetric simple exclusion process. Let $\eta$ be an initial configuration with a particle at the origin, i.e. with $\eta(0) = 1$. Tag this particle and denote by $\eta_t$ (resp. $X_t$) the state of the process (resp. the position of the tagged particle) at time $t$. We shall refer to $\eta_t$ as the environment. Let $\xi_t$ be the state of the environment as seen from the tagged particle: $\xi_t = \theta_{X_t}\eta_t$. Here, for $x$ in $\mathbb{Z}^d$ and a configuration $\eta$, $\theta_x\eta$ stands for the translation of $\eta$ by $x$, i.e.
$(\theta_x\eta)(y) = \eta(x + y)$. Notice that the origin is always occupied (by the tagged particle) for the environment as seen from the tagged particle. For this reason, we shall consider the process $\xi_t$ as taking values in $\{0, 1\}^{\mathbb{Z}^d_*}$, where $\mathbb{Z}^d_* = \mathbb{Z}^d \setminus \{0\}$. Whereas $X_t$ is not a Markov process due to the presence of the environment, $(X_t, \xi_t)$ and $\xi_t$ are. A simple computation shows that the generator $\mathcal{L}$ of the Markov process $\xi_t$ is given by $\mathcal{L} = \mathcal{L}_0 + \mathcal{L}_\tau$, where
\[ (\mathcal{L}_0 f)(\xi) = \sum_{x,y\in\mathbb{Z}^d_*} p(y-x)\,\xi(x)[1-\xi(y)]\,[f(\sigma^{x,y}\xi) - f(\xi)] = \frac{1}{2}\sum_{x,y\in\mathbb{Z}^d_*} p(y-x)\,[f(\sigma^{x,y}\xi) - f(\xi)], \]
\[ (\mathcal{L}_\tau f)(\xi) = \sum_{z\in\mathbb{Z}^d_*} p(z)\,[1-\xi(z)]\,[f(\tau_z\xi) - f(\xi)]. \tag{2.3} \]
The first part of the generator takes into account the jumps in the environment, while the second one corresponds to jumps of the tagged particle. In the above formula, $\tau_z\xi$ stands for the configuration where the tagged particle, sitting at the origin, is first transferred to the (empty) site $z$ and then the entire environment is translated by $-z$: for all $y$ in $\mathbb{Z}^d_*$,
\[ (\tau_z\xi)(y) = \begin{cases} \xi(z) & \text{if } y = -z, \\ \xi(y+z) & \text{for } y \neq -z. \end{cases} \]
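The two elementary moves, the exchange $\sigma^{x,y}$ of (2.2) and the shift $\tau_z$, can be made concrete with a small sketch. It assumes a representation that is not in the paper: a configuration with finitely many particles is stored as a frozenset of occupied sites (tuples of integers), and the function names are illustrative.

```python
def exchange(eta, x, y):
    """sigma^{x,y} eta: swap the occupation variables at sites x and y.
    `eta` is a frozenset of occupied sites."""
    in_x, in_y = x in eta, y in eta
    if in_x == in_y:
        return eta  # both occupied or both empty: nothing changes
    if in_x:
        return (eta - {x}) | {y}
    return (eta - {y}) | {x}

def shift_tagged(xi, z):
    """tau_z xi: the tagged particle jumps from the origin to the empty
    site z and the environment is then translated by -z.
    `xi` is a frozenset of occupied sites other than the origin."""
    if all(c == 0 for c in z):
        raise ValueError("z must differ from the origin")
    if z in xi:
        raise ValueError("the tagged particle can only jump to an empty site")
    # A particle at w is seen at w - z after the jump; the site -z
    # (the old position of the tagged particle) is left empty.
    return frozenset(tuple(a - b for a, b in zip(w, z)) for w in xi)

xi = frozenset({(1, 0), (0, 2)})
moved = shift_tagged(xi, (0, 1))
swapped = exchange(xi, (1, 0), (2, 0))
```

Here `moved` contains the translated particles and never contains the origin or the site $-z$, in agreement with $(\tau_z\xi)(-z) = \xi(z) = 0$.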
For $0 \le \alpha \le 1$, denote by $\mu_\alpha$ the Bernoulli product measure on $\mathcal{X}_* = \{0, 1\}^{\mathbb{Z}^d_*}$. A simple computation shows that $\mu_\alpha$ is a reversible and ergodic stationary measure for the Markov process $\xi_t$.

In this context Kipnis and Varadhan ([1]) proved a central limit theorem for the position of the tagged particle starting with an initial environment chosen randomly from the equilibrium $\mu_\alpha$. They showed that $\varepsilon X_{t\varepsilon^{-2}}$ converges, as $\varepsilon \downarrow 0$, to a Brownian motion with diffusion coefficient $D(\alpha)$, which we will describe in more detail in the next section. This result has been generalized by Varadhan ([6]) to the asymmetric case with mean zero ($\sum_y y\,p(y) = 0$). More recently, for the general asymmetric case in dimension $d \ge 3$, if $\sum_y y\,p(y) = m \neq 0$, it is proved in Sethuraman-Varadhan-Yau ([5]) that $\varepsilon[X_{t\varepsilon^{-2}} - mt(1-\alpha)\varepsilon^{-2}]$ converges, as $\varepsilon \downarrow 0$, to a Brownian motion with another diffusion coefficient. In this article we limit ourselves to the symmetric case and study the regularity properties of $D(\alpha)$ as a function of $\alpha$. The main result is

Theorem 2.1. The self-diffusion coefficient $D(\alpha)$, as a function of $\alpha$, is of class $C^\infty$ in the closed interval $[0, 1]$.

3. Generalized Duality

The proof of Theorem 2.1 relies on the duality properties of the symmetric exclusion process that we will now describe.
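We have the Hilbert space $L^2(\mu_\alpha)$ with its natural inner product $\langle\cdot,\cdot\rangle_\alpha$. The operator $\mathcal{L}$ is self-adjoint and the natural Dirichlet inner products will be denoted by
\[ \langle f, g\rangle_{1,\alpha} = \langle -\mathcal{L}f, g\rangle_\alpha, \qquad \langle f, g\rangle_{1,\mathrm{env},\alpha} = \langle -\mathcal{L}_0 f, g\rangle_\alpha. \]
The dual norms $\|f\|_{-1,\alpha}$ and $\|f\|_{-1,\mathrm{env},\alpha}$ are defined by
\[ \|f\|^2_{-1,\alpha} = \sup_g \big\{ 2\langle f, g\rangle_\alpha - \langle g, g\rangle_{1,\alpha} \big\}, \qquad \|f\|^2_{-1,\mathrm{env},\alpha} = \sup_g \big\{ 2\langle f, g\rangle_\alpha - \langle g, g\rangle_{1,\mathrm{env},\alpha} \big\}. \]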
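A finite-dimensional analogue may help in reading the variational definition of the dual norms. In the sketch below, which is an illustration under assumptions and not taken from the paper, a symmetric positive definite $2\times 2$ matrix `M` stands in for $-\mathcal{L}$; the supremum of $2\langle f,g\rangle - \langle g, Mg\rangle$ is then attained at $g = M^{-1}f$ and equals $\langle f, M^{-1}f\rangle$.

```python
def functional(M, f, g):
    """Value of 2<f,g> - <g,Mg> for a 2x2 matrix M and vectors f, g."""
    fg = f[0] * g[0] + f[1] * g[1]
    Mg = (M[0][0] * g[0] + M[0][1] * g[1],
          M[1][0] * g[0] + M[1][1] * g[1])
    return 2.0 * fg - (g[0] * Mg[0] + g[1] * Mg[1])

def solve_2x2(M, f):
    """Solve M g = f by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return ((M[1][1] * f[0] - M[0][1] * f[1]) / det,
            (M[0][0] * f[1] - M[1][0] * f[0]) / det)

# Hypothetical stand-in for -L: symmetric and positive definite.
M = ((2.0, -1.0), (-1.0, 2.0))
f = (1.0, 0.0)

g_star = solve_2x2(M, f)                 # maximiser g = M^{-1} f
dual_norm_sq = functional(M, f, g_star)  # equals <f, M^{-1} f>

# Any perturbation of the maximiser lowers the value of the functional.
perturbed = [functional(M, f, (g_star[0] + dx, g_star[1] + dy))
             for dx in (-0.1, 0.0, 0.1) for dy in (-0.1, 0.0, 0.1)]
```

The same computation explains why a larger Dirichlet form gives a smaller dual norm, a monotonicity used repeatedly in the estimates below.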
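For each $n \ge 0$, denote by $\mathcal{E}_{*,n}$ the subsets of $\mathbb{Z}^d_*$ with $n$ points and let $\mathcal{E}_* = \cup_{n\ge 0}\mathcal{E}_{*,n}$ be the class of all finite subsets of $\mathbb{Z}^d_*$. Let us consider an abstract Hilbert space $\mathcal{H}$ with a complete orthonormal basis consisting of $\{e_A : A \in \mathcal{E}_*\}$. The space $\mathcal{H}$ can be viewed as the space of square summable maps $\mathfrak{f} : \mathcal{E}_* \to \mathbb{R}$. In a natural way $\mathcal{H} = \oplus_{n\ge 0}\mathcal{G}_n$, where $\mathcal{G}_n$ is spanned by $\{e_A : A \in \mathcal{E}_{*,n}\}$.

For each $A$ in $\mathcal{E}_*$, let $\Psi_A$ be the local function in $L^2(\mu_\alpha)$ defined by
\[ \Psi_A = \Psi_A(\alpha, \xi) = \prod_{x\in A} \frac{\xi(x) - \alpha}{\sqrt{\chi(\alpha)}}, \]
where $\chi(\alpha) = \alpha(1-\alpha)$. By convention, $\Psi_\emptyset = 1$. It is easy to check that $\{\Psi_A, A \in \mathcal{E}_*\}$ is an orthonormal basis of $L^2(\mu_\alpha)$. For each $n \ge 0$, denote by $G_n$ the subspace of $L^2(\mu_\alpha)$ generated by $\{\Psi_A, A \in \mathcal{E}_{*,n}\}$, so that $L^2(\mu_\alpha) = \oplus_{n\ge 0} G_n$. Functions of $G_n$ are said to have degree $n$. The main property of the symmetric simple exclusion process that will be used here is that part of the generator, i.e. $\mathcal{L}_0$, preserves the degree of the functions.

Consider a local function $f$. Since $\{\Psi_A : A \in \mathcal{E}_*\}$ is a basis of $L^2(\mu_\alpha)$, we may write
\[ f = \sum_{n\ge 0}\ \sum_{A\in\mathcal{E}_{*,n}} \mathfrak{f}(A)\,\Psi_A = \sum_{n\ge 0} \pi_n f. \]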
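Here we have denoted by $\pi_n$ the orthogonal projection onto $G_n$. Notice that the coefficients $\mathfrak{f}(A)$ depend not only on $f$ but also on the density $\alpha$: $\mathfrak{f}(A) = \mathfrak{f}(A, \alpha)$. Since $f$ is a local function, $\mathfrak{f} : \mathcal{E}_* \to \mathbb{R}$ has finite support. In other words we have a unitary isomorphism, $f \sim \sum_A \mathfrak{f}(A)e_A$, between $L^2(\mu_\alpha)$ and $\mathcal{H}$ that takes local functions in $L^2(\mu_\alpha)$ onto finite linear combinations of the basis elements. Of course this also establishes an isomorphism between $G_n$ and $\mathcal{G}_n$.

We now conclude this section by expressing the operators $\mathcal{L}$ and $\mathcal{L}_0$ as well as their Dirichlet forms, through this isomorphism, in the basis $\{e_A\}$ of $\mathcal{H}$. To begin with, because the isomorphism is unitary, we have
\[ \langle f, g\rangle_\alpha = \langle \mathfrak{f}, \mathfrak{g}\rangle = \sum_{A\in\mathcal{E}_*} \mathfrak{f}(A)\,\mathfrak{g}(A), \]
where
\[ f \sim \sum_A \mathfrak{f}(A)e_A \quad\text{and}\quad g \sim \sum_A \mathfrak{g}(A)e_A. \]
The norm in $\mathcal{H}$ will be denoted by $\|\mathfrak{f}\|_0$.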
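The orthonormality of the functions $\Psi_A = \prod_{x\in A}(\xi(x)-\alpha)/\sqrt{\alpha(1-\alpha)}$ under the Bernoulli product measure can be checked by brute force on a small system. The sketch below assumes a hypothetical environment of three sites labelled 1, 2, 3 and the density $\alpha = 0.3$; it enumerates all configurations and computes the Gram matrix of the family.

```python
import math
from itertools import combinations, product

def psi(A, xi, alpha):
    """Psi_A(alpha, xi): product over x in A of (xi(x) - alpha) / sqrt(chi(alpha))."""
    chi = alpha * (1.0 - alpha)
    value = 1.0
    for x in A:
        value *= (xi[x] - alpha) / math.sqrt(chi)
    return value

def inner(A, B, alpha, sites):
    """<Psi_A, Psi_B> under the Bernoulli(alpha) product measure on `sites`."""
    total = 0.0
    for occupation in product((0, 1), repeat=len(sites)):
        xi = dict(zip(sites, occupation))
        weight = 1.0
        for v in occupation:
            weight *= alpha if v == 1 else 1.0 - alpha
        total += weight * psi(A, xi, alpha) * psi(B, xi, alpha)
    return total

sites = (1, 2, 3)   # hypothetical finite environment
alpha = 0.3
subsets = [frozenset(c) for r in range(len(sites) + 1)
           for c in combinations(sites, r)]
gram = {(A, B): inner(A, B, alpha, sites) for A in subsets for B in subsets}
```

Up to rounding, `gram` is the identity matrix indexed by subsets, which is the finite-volume version of the unitary isomorphism $f \sim \sum_A \mathfrak{f}(A)e_A$.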
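For a subset $A$ of $\mathbb{Z}^d_*$ and $x, y, z$ in $\mathbb{Z}^d_*$, denote by $A_{x,y}$, $S_z A$ the sets defined by
\[ A_{x,y} = \begin{cases} (A\setminus\{x\}) \cup \{y\} & \text{if } x \in A,\ y \notin A, \\ (A\setminus\{y\}) \cup \{x\} & \text{if } y \in A,\ x \notin A, \\ A & \text{otherwise;} \end{cases} \qquad S_z A = \begin{cases} A - z & \text{if } z \notin A, \\ ((A\setminus\{z\}) - z) \cup \{-z\} & \text{if } z \in A. \end{cases} \tag{3.1} \]
In this formula, $B + z$ is the set $\{x + z;\ x \in B\}$. Therefore, to obtain $S_z A$ from $A$ in the case where $z$ belongs to $A$, we first remove $z$ to get a set not containing $z$, then translate $A\setminus\{z\}$ by $-z$ and finally add the site $-z$.

Recall the definition of the generators $\mathcal{L}_0$, $\mathcal{L}_\tau$ given in (2.3). A simple computation shows that
\[ (\mathcal{L}_0 f) \sim \sum_{A\in\mathcal{E}_*} (\mathfrak{L}_0\mathfrak{f})(A)\,e_A, \qquad (\mathcal{L}_\tau f) \sim \sum_{A\in\mathcal{E}_*} (\mathfrak{L}_{\tau,\alpha}\mathfrak{f})(A)\,e_A, \]
where
\[ (\mathfrak{L}_0\mathfrak{f})(A) = \frac{1}{2}\sum_{x,y\in\mathbb{Z}^d_*} p(y-x)\,[\mathfrak{f}(A_{x,y}) - \mathfrak{f}(A)] \tag{3.2} \]
and $\mathfrak{L}_{\tau,\alpha}$ is an operator which can be decomposed as
\[ \mathfrak{L}_{\tau,\alpha} = \alpha\,\mathfrak{L}^1_\tau + (1-\alpha)\,\mathfrak{L}^2_\tau + \sqrt{\chi(\alpha)}\,(\mathfrak{L}^+_\tau + \mathfrak{L}^-_\tau), \]
where
\[ \begin{aligned} (\mathfrak{L}^1_\tau\mathfrak{f})(A) &= \sum_{y\in A} p(y)\,[\mathfrak{f}(S_y A) - \mathfrak{f}(A)], \\ (\mathfrak{L}^2_\tau\mathfrak{f})(A) &= \sum_{y\notin A} p(y)\,[\mathfrak{f}(S_y A) - \mathfrak{f}(A)], \\ (\mathfrak{L}^+_\tau\mathfrak{f})(A) &= \sum_{y\in A} p(y)\,[\mathfrak{f}(A\setminus\{y\}) - \mathfrak{f}(S_y A\setminus\{-y\})], \\ (\mathfrak{L}^-_\tau\mathfrak{f})(A) &= \sum_{y\notin A} p(y)\,[\mathfrak{f}(A\cup\{y\}) - \mathfrak{f}(S_y A\cup\{-y\})]. \end{aligned} \tag{3.3} \]
Notice that $\mathcal{L}$ on $L^2(\mu_\alpha)$ is represented on $\mathcal{H}$ by $\mathfrak{L}_\alpha = \mathfrak{L}_0 + \mathfrak{L}_{\tau,\alpha}$:
\[ \mathfrak{L}_\alpha = \mathfrak{L}_0 + \alpha\,\mathfrak{L}^1_\tau + (1-\alpha)\,\mathfrak{L}^2_\tau + \sqrt{\chi(\alpha)}\,[\mathfrak{L}^+_\tau + \mathfrak{L}^-_\tau]. \tag{3.4} \]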
We mentioned earlier that the main property to be exploited here is that the generator of the symmetric exclusion process preserves the degree of local functions. It is easy to check that the operators $\mathfrak{L}_0$, $\mathfrak{L}^1_\tau$, $\mathfrak{L}^2_\tau$ preserve the degree of a function, i.e. they map $\mathcal{G}_n$ into itself. Moreover, $\mathfrak{L}^+_\tau$ increases the degree of a function by one while $\mathfrak{L}^-_\tau$ decreases it by one. For a function $\mathfrak{f} : \mathcal{E}_* \to \mathbb{R}$ and $n \ge 0$, denote by $\pi_n\mathfrak{f}$ or by $\mathfrak{f}_n$ its restriction to $\mathcal{E}_{*,n}$: $(\pi_n\mathfrak{f})(A) = \mathfrak{f}(A)\,\mathbf{1}\{A \in \mathcal{E}_{*,n}\}$.
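The set operations $A_{x,y}$ and $S_z A$ behind these operators are easy to experiment with. The following sketch assumes finite subsets of $\mathbb{Z}^d_*$ stored as frozensets of integer tuples (an illustrative representation, with hypothetical function names) and checks that both maps preserve the number of points, which is why the operators built from them preserve the degree.

```python
def swap_sites(A, x, y):
    """A_{x,y}: move the point of A from x to y, or from y to x, when
    exactly one of the two sites belongs to A; otherwise return A."""
    if x in A and y not in A:
        return (A - {x}) | {y}
    if y in A and x not in A:
        return (A - {y}) | {x}
    return A

def shift(A, z):
    """S_z A: translate A by -z; when z belongs to A, first remove z and
    finally add the site -z."""
    def translate(w):
        return tuple(a - b for a, b in zip(w, z))
    if z not in A:
        return frozenset(translate(w) for w in A)
    minus_z = tuple(-c for c in z)
    return frozenset(translate(w) for w in A - {z}) | {minus_z}

A = frozenset({(1, 0), (0, 1)})
shifted_in = shift(A, (1, 0))    # z belongs to A
shifted_out = shift(A, (2, 2))   # z does not belong to A
moved = swap_sites(A, (1, 0), (3, 3))
```

For $y \in A$ the set $S_y A \setminus \{-y\}$ coincides with $S_y(A\setminus\{y\})$ and has one point fewer than $A$, matching the fact that the operator with superscript $+$ evaluates $\mathfrak{f}$ on sets of size $|A| - 1$.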
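For local functions $f, g : \mathcal{X}_* \to \mathbb{R}$, a long but elementary computation shows that, if we define
\[ \begin{aligned} 2\langle \mathfrak{f}, \mathfrak{g}\rangle_{1,\alpha} ={}& \sum_{x,y\in\mathbb{Z}^d_*}\ \sum_{A\in\mathcal{E}_*} p(y-x)\,[\mathfrak{f}(A_{x,y}) - \mathfrak{f}(A)]\,[\mathfrak{g}(A_{x,y}) - \mathfrak{g}(A)] \\ &+ \sum_{y\in\mathbb{Z}^d_*}\ \sum_{A\in\mathcal{E}_*} p(y)\,r_y(A)\,[\mathfrak{f}(S_y A) - \mathfrak{f}(A)]\,[\mathfrak{g}(S_y A) - \mathfrak{g}(A)] \\ &- \sqrt{\chi(\alpha)} \sum_{y\in\mathbb{Z}^d_*}\ \sum_{A\in\mathcal{E}_*,\, y\notin A} p(y)\,[\mathfrak{f}(S_y A) - \mathfrak{f}(A)]\,[\mathfrak{g}(S_y[A\cup\{y\}]) - \mathfrak{g}(A\cup\{y\})] \\ &- \sqrt{\chi(\alpha)} \sum_{y\in\mathbb{Z}^d_*}\ \sum_{A\in\mathcal{E}_*,\, y\notin A} p(y)\,[\mathfrak{f}(S_y[A\cup\{y\}]) - \mathfrak{f}(A\cup\{y\})]\,[\mathfrak{g}(S_y A) - \mathfrak{g}(A)], \end{aligned} \tag{3.5} \]
where $r_y(A)$ is equal to $\alpha$ if $y$ belongs to $A$ and is equal to $1-\alpha$ if $y$ does not belong to $A$, then $\langle f, g\rangle_{1,\alpha} = \langle \mathfrak{f}, \mathfrak{g}\rangle_{1,\alpha}$. Notice that the last three terms can be recombined to give a positive expression when $\mathfrak{f} = \mathfrak{g}$. The corresponding norm will be denoted by $\|\mathfrak{f}\|_{1,\alpha}$, which of course is equal to $\|f\|_{1,\alpha}$. By completing the space of finitely supported functions with this norm we obtain the Dirichlet space $\mathcal{H}_1$. Let $\mathcal{H}_{-1}$ be the dual of $\mathcal{H}_1$ with respect to the standard inner product on $\mathcal{H}$. This is the Hilbert space generated by finitely supported functions and the norm $\|\cdot\|_{-1,\alpha}$ defined by
\[ \|\mathfrak{f}\|^2_{-1,\alpha} = \sup_{\mathfrak{g}} \big\{ 2\langle \mathfrak{f}, \mathfrak{g}\rangle - \langle \mathfrak{g}, \mathfrak{g}\rangle_{1,\alpha} \big\}, \]
where the supremum is carried over all finitely supported functions $\mathfrak{g}$. It follows from the isomorphism that $\|f\|_{-1,\alpha} = \|\mathfrak{f}\|_{-1,\alpha}$.

The Dirichlet form corresponding to $\mathfrak{L}_0$ is much simpler to calculate in the $\mathcal{H}$ representation. Denote by $\|\cdot\|_{1,\mathrm{env}}$ and $\|\cdot\|_{-1,\mathrm{env}}$ respectively the Dirichlet norm and its dual associated to the generator $\mathfrak{L}_0$:
\[ \|\mathfrak{g}\|^2_{1,\mathrm{env}} = \langle \mathfrak{g}, (-\mathfrak{L}_0)\mathfrak{g}\rangle = \frac{1}{2}\sum_{x,y\in\mathbb{Z}^d_*}\ \sum_{A\in\mathcal{E}_*} p(y-x)\,[\mathfrak{g}(A_{x,y}) - \mathfrak{g}(A)]^2 = \sum_{n\ge 0} \|\pi_n\mathfrak{g}\|^2_{1,\mathrm{env}} \]
and
\[ \|\mathfrak{g}\|^2_{-1,\mathrm{env}} = \sup_{\mathfrak{f}} \big\{ 2\langle \mathfrak{f}, \mathfrak{g}\rangle - \langle \mathfrak{f}, (-\mathfrak{L}_0)\mathfrak{f}\rangle \big\} = \sum_{n\ge 0} \|\pi_n\mathfrak{g}\|^2_{-1,\mathrm{env}}, \tag{3.6} \]
where the supremum is carried over all finitely supported functions. In contrast to the norms $\|\cdot\|_{1,\alpha}$, $\|\cdot\|_{-1,\alpha}$, the norms $\|\cdot\|_{1,\mathrm{env}}$, $\|\cdot\|_{-1,\mathrm{env}}$ do not depend explicitly on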
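the parameter $\alpha$. Moreover, since $\langle \mathfrak{f}, (-\mathfrak{L}_0)\mathfrak{f}\rangle \le \langle \mathfrak{f}, (-\mathfrak{L}_\alpha)\mathfrak{f}\rangle$, it follows that $\|\mathfrak{g}\|_{1,\mathrm{env}} \le \|\mathfrak{g}\|_{1,\alpha}$ and $\|\mathfrak{g}\|_{-1,\alpha} \le \|\mathfrak{g}\|_{-1,\mathrm{env}}$. In Lemma 4.4, we estimate $\|\mathfrak{g}\|_{1,\alpha}$ and $\|\mathfrak{g}\|_{-1,\mathrm{env}}$ in terms of $\|\mathfrak{g}\|_{1,\mathrm{env}}$ and $\|\mathfrak{g}\|_{-1,\alpha}$, respectively. Finally, for any $k \ge 0$, let us define
\[ |||\mathfrak{f}|||^2_{0,k} = \sum_{n\ge 0} n^{2k}\,\|\pi_n\mathfrak{f}\|^2_0, \qquad |||\mathfrak{f}|||^2_{1,k} = \sum_{n\ge 0} n^{2k}\,\|\pi_n\mathfrak{f}\|^2_{1,\mathrm{env}}, \qquad |||\mathfrak{f}|||^2_{-1,k} = \sum_{n\ge 0} n^{2k}\,\|\pi_n\mathfrak{f}\|^2_{-1,\mathrm{env}}. \tag{3.7} \]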
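If $T$ is the operator that acts as scalar multiplication by $n$ on the space $\mathcal{G}_n$ of degree $n$, these are the quadratic forms $\|T^k\mathfrak{f}\|^2_0$, $\langle T^k\mathfrak{f}, (-\mathfrak{L}_0)T^k\mathfrak{f}\rangle$ and $\langle T^k\mathfrak{f}, (-\mathfrak{L}_0)^{-1}T^k\mathfrak{f}\rangle$ respectively. Note that $\mathfrak{L}_0$ commutes with $T$. The completions under these norms will be denoted by $\mathcal{H}_{0,k}$, $\mathcal{H}_{1,k}$ and $\mathcal{H}_{-1,k}$ respectively.

4. Some Estimates

Since $\mathfrak{L}_\alpha$ is self-adjoint, for the solution $\mathfrak{u}_\lambda$ of the resolvent equation
\[ \lambda\mathfrak{u}_\lambda - \mathfrak{L}_\alpha\mathfrak{u}_\lambda = \mathfrak{f}, \tag{4.1} \]
we have the basic estimate
\[ \|\mathfrak{u}_\lambda\|_{1,\alpha} \le \|\mathfrak{f}\|_{-1,\alpha} \]
that implies
\[ \|\mathfrak{u}_\lambda\|_{1,\mathrm{env}} \le \|\mathfrak{f}\|_{-1,\mathrm{env}} \quad\text{or}\quad |||\mathfrak{u}_\lambda|||_{1,0} \le |||\mathfrak{f}|||_{-1,0}. \]
The following regularity result follows from Eq. (5.5) of [4].

Lemma 4.1. Let $k \ge 1$ be given. Let $\mathfrak{f}$ be a function such that $|||\mathfrak{f}|||_{-1,k} < \infty$. For $\lambda > 0$, let $\mathfrak{u}_\lambda$ be the solution of the resolvent equation (4.1). Then,
\[ |||\mathfrak{u}_\lambda|||_{1,k} \le C(k)\,|||\mathfrak{f}|||_{-1,k} \tag{4.2} \]
for a finite constant $C(k)$ independent of $\alpha$ and $\lambda$.

In fact the proof of (4.2) given in [4] extends immediately to non-local $\mathfrak{f}$.

We now state some bounds on the restrictions of $\mathfrak{L}^1_\tau$, $\mathfrak{L}^2_\tau$, $\mathfrak{L}^+_\tau$ and $\mathfrak{L}^-_\tau$ on $\mathcal{G}_n$. These bounds will grow linearly with $n$. Notice that $\mathfrak{L}^j_\tau$, $j = 1, 2$, are symmetric operators, while $\mathfrak{L}^+_\tau$ is the adjoint of $\mathfrak{L}^-_\tau$:
\[ \langle \mathfrak{L}^j_\tau\mathfrak{f}, \mathfrak{g}\rangle = \langle \mathfrak{f}, \mathfrak{L}^j_\tau\mathfrak{g}\rangle, \qquad \langle \mathfrak{L}^+_\tau\mathfrak{f}, \mathfrak{g}\rangle = \langle \mathfrak{f}, \mathfrak{L}^-_\tau\mathfrak{g}\rangle, \]
for $j = 1, 2$ and $\mathfrak{f}, \mathfrak{g}$ in $L^2(\mathcal{E}_*)$. Moreover,
\[ \langle (-\mathfrak{L}^1_\tau)\mathfrak{f}, \mathfrak{f}\rangle = \frac{1}{2}\sum_{A\in\mathcal{E}_*}\ \sum_{y\in A} p(y)\,[\mathfrak{f}(S_y A) - \mathfrak{f}(A)]^2, \qquad \langle (-\mathfrak{L}^2_\tau)\mathfrak{f}, \mathfrak{f}\rangle = \frac{1}{2}\sum_{A\in\mathcal{E}_*}\ \sum_{y\notin A} p(y)\,[\mathfrak{f}(S_y A) - \mathfrak{f}(A)]^2. \]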
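Lemma 4.2. There exists a finite constant $C_0$ depending only on the transition probability $p$ such that
\[ \langle (-\mathfrak{L}^j_\tau)\mathfrak{f}, \mathfrak{f}\rangle \le C_0\, n\, \langle (-\mathfrak{L}_0)\mathfrak{f}, \mathfrak{f}\rangle \tag{4.3} \]
for $j = 1, 2$, all $n \ge 1$ and all $\mathfrak{f}$ in $\mathcal{G}_n$. Moreover
\[ \langle (-\mathfrak{L}^\pm_\tau)\mathfrak{f}, \mathfrak{g}\rangle^2 \le C_0^2\, n^2\, \langle (-\mathfrak{L}_0)\mathfrak{f}, \mathfrak{f}\rangle\, \langle (-\mathfrak{L}_0)\mathfrak{g}, \mathfrak{g}\rangle \tag{4.4} \]
for all $n \ge 1$ and all $\mathfrak{f}$ in $\mathcal{G}_n$, $\mathfrak{g}$ in $\mathcal{G}_{n\pm 1}$. On the other hand for $j = 1, 2$,
\[ \|\mathfrak{L}^j_\tau\mathfrak{f}\|^2_0 \le 4\|\mathfrak{f}\|^2_0 \quad\text{and}\quad \|\mathfrak{L}^\pm_\tau\mathfrak{f}\|^2_0 \le 4\|\mathfrak{f}\|^2_0 \tag{4.5} \]
for all $\mathfrak{f}$ in $\mathcal{H}$.

Proof. The first estimate (4.3) follows immediately from Lemma 5.1 in [4]. We first prove that for all $\mathfrak{f}, \mathfrak{g}$ in $L^2(\mathcal{E}_*)$,
\[ \langle \mathfrak{L}^\pm_\tau\mathfrak{f}, \mathfrak{g}\rangle^2 \le \langle (-\mathfrak{L}^1_\tau)\mathfrak{f}, \mathfrak{f}\rangle\, \langle (-\mathfrak{L}^2_\tau)\mathfrak{g}, \mathfrak{g}\rangle. \tag{4.6} \]
Fix $\mathfrak{f}, \mathfrak{g}$ in $L^2(\mathcal{E}_*)$. By the explicit formula for $\mathfrak{L}^+_\tau$, we have that
\[ \langle (-\mathfrak{L}^+_\tau)\mathfrak{f}, \mathfrak{g}\rangle = \sum_y p(y) \sum_{A\ni y} \mathfrak{g}(A)\,\big[\mathfrak{f}(S_y A\setminus\{-y\}) - \mathfrak{f}(A\setminus\{y\})\big]. \]
Rewrite this expression as twice one half of it. In one of the pieces, we perform the change of variables $B = S_y A$, $z = -y$ to obtain that it is equal to
\[ -\frac{1}{2}\sum_y p(y) \sum_{A\ni y} \mathfrak{g}(S_y A)\,\big[\mathfrak{f}(S_y A\setminus\{-y\}) - \mathfrak{f}(A\setminus\{y\})\big]. \]
Here we used the fact that $p(\cdot)$ is symmetric. Adding the two expressions we get that $\langle (-\mathfrak{L}^+_\tau)\mathfrak{f}, \mathfrak{g}\rangle$ is equal to
\[ -\frac{1}{2}\sum_y p(y) \sum_{A\ni y} \big[\mathfrak{g}(S_y A) - \mathfrak{g}(A)\big]\,\big[\mathfrak{f}(S_y(A\setminus\{y\})) - \mathfrak{f}(A\setminus\{y\})\big]. \]
By Schwarz's inequality, this expression is bounded above by
\[ \frac{1}{4\beta}\sum_y p(y) \sum_{A\ni y} \big[\mathfrak{g}(S_y A) - \mathfrak{g}(A)\big]^2 + \frac{\beta}{4}\sum_y p(y) \sum_{A\ni y} \big[\mathfrak{f}(S_y A\setminus\{-y\}) - \mathfrak{f}(A\setminus\{y\})\big]^2 \]
for all $\beta > 0$. By the identities presented just before the statement of the lemma, the first term is $(1/2\beta)\,\langle (-\mathfrak{L}^1_\tau)\mathfrak{g}, \mathfrak{g}\rangle$. A change of variables $B = A\setminus\{y\}$ shows that the second is bounded by $(\beta/2)\,\langle (-\mathfrak{L}^2_\tau)\mathfrak{f}, \mathfrak{f}\rangle$. Minimizing over $\beta$, we conclude the proof of (4.6).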
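We may now prove the second estimate of the lemma. Fix $n \ge 1$, and functions $\mathfrak{f}$ and $\mathfrak{g}$ of degree $n$ and $n + 1$, respectively. By (4.6), $\langle \mathfrak{L}^+_\tau\mathfrak{f}, \mathfrak{g}\rangle^2$ is bounded above by $\langle (-\mathfrak{L}^1_\tau)\mathfrak{f}, \mathfrak{f}\rangle\,\langle (-\mathfrak{L}^2_\tau)\mathfrak{g}, \mathfrak{g}\rangle$. By the first part of the lemma, this product is bounded by $C_0^2 n^2\,\langle (-\mathfrak{L}_0)\mathfrak{f}, \mathfrak{f}\rangle\,\langle (-\mathfrak{L}_0)\mathfrak{g}, \mathfrak{g}\rangle$. This proves (4.4) for $\mathfrak{L}^+_\tau$. The proof for $\mathfrak{L}^-_\tau$ is similar. The last estimate (4.5) is elementary and follows from Schwarz's inequality and the explicit formulas for the operators $\mathfrak{L}^1_\tau$, $\mathfrak{L}^2_\tau$, $\mathfrak{L}^+_\tau$, and $\mathfrak{L}^-_\tau$.

Lemma 4.3. For every $k \ge 0$, there exists a finite constant $C_k$ such that for $j = 1, 2, +, -$,
\[ |||\mathfrak{L}^j_\tau\mathfrak{f}|||_{-1,k} \le C_k\,|||\mathfrak{f}|||_{1,k+1}, \]
so that $\mathfrak{L}^j_\tau$ maps $\mathcal{H}_{1,k+1}$ boundedly into $\mathcal{H}_{-1,k}$.

Proof. Follows immediately from the preceding lemma.

Lemma 4.4. There exists a finite constant $C_0$ such that for all $n \ge 1$,
\[ \|\mathfrak{f}\|_{1,\alpha} \le C_0\, n\, \|\mathfrak{f}\|_{1,\mathrm{env}}, \qquad \|\mathfrak{f}\|_{-1,\mathrm{env}} \le C_0\, n\, \|\mathfrak{f}\|_{-1,\alpha} \]
for all $\alpha$ in $[0, 1]$, and all $\mathfrak{f}$ in $\mathcal{G}_n$.

Proof. Fix $n \ge 1$ and $\mathfrak{f}$ in $\mathcal{G}_n$. By (3.5) and Schwarz's inequality, $\langle \mathfrak{f}, \mathfrak{f}\rangle_{1,\alpha}$ is bounded above by
\[ \|\mathfrak{f}\|^2_{1,\mathrm{env}} + 2\sum_{A\in\mathcal{E}_*}\ \sum_{y\in\mathbb{Z}^d_*} p(y)\,[\mathfrak{f}(S_y A) - \mathfrak{f}(A)]^2 + \sum_{A\in\mathcal{E}_*}\ \sum_{y\notin A} p(y)\,[\mathfrak{f}(S_y[A\cup\{y\}]) - \mathfrak{f}(A\cup\{y\})]^2 \]
because $|r_y(A)| \le 1$ and $\chi(\alpha) \le 1$. Since $\mathfrak{f}$ belongs to $\mathcal{G}_n$, we may restrict the second sum to sets $A$ in $\mathcal{E}_{*,n}$. A change of variables permits us to estimate the third sum by the second one. In conclusion,
\[ \langle \mathfrak{f}, \mathfrak{f}\rangle_{1,\alpha} \le \|\mathfrak{f}\|^2_{1,\mathrm{env}} + 3\sum_{A\in\mathcal{E}_{*,n}}\ \sum_{y\in\mathbb{Z}^d_*} p(y)\,[\mathfrak{f}(S_y A) - \mathfrak{f}(A)]^2. \]
By Lemma 4.2, the second term on the right-hand side is less than or equal to $C_0\, n\, \|\mathfrak{f}\|^2_{1,\mathrm{env}}$ because $\mathfrak{f}$ belongs to $\mathcal{G}_n$. The second estimate of the lemma is obtained by duality.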
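5. The Self-Diffusion Coefficient

By [1], the self-diffusion coefficient $D(\alpha)$ in the direction $v$ is given by the variational formula:
\[ v\cdot D(\alpha)v = \inf_f \Big\{ \sum_{z\in\mathbb{Z}^d_*} p(z)\,E_{\mu_\alpha}\big[ [1-\xi(z)]\,\{v\cdot z - [f(\tau_z\xi) - f(\xi)]\}^2 \big] + \sum_{x,y\in\mathbb{Z}^d_*} p(x-y)\,E_{\mu_\alpha}\big[ \xi(x)[1-\xi(y)]\,\{f(\sigma^{x,y}\xi) - f(\xi)\}^2 \big] \Big\}, \]
where the infimum is carried over all cylinder functions $f$. A simple computation shows that
\[ v\cdot D(\alpha)v = (1-\alpha)\sum_{z\in\mathbb{Z}^d_*} (z\cdot v)^2\,p(z) - \alpha(1-\alpha)\,\|f_v\|^2_{-1,\alpha} \tag{5.1} \]
for each $v$ in $\mathbb{R}^d$. Here $f_v$ is the cylinder function given by
\[ f_v(\xi) = \frac{1}{\sqrt{\alpha(1-\alpha)}}\sum_{y\in\mathbb{Z}^d_*} p(y)\,(y\cdot v)\,[1-\xi(y)] = \frac{1}{\sqrt{\alpha(1-\alpha)}}\sum_{y\in\mathbb{Z}^d_*} p(y)\,(y\cdot v)\,[\alpha-\xi(y)] \]
because $p$ has mean zero. With the notation introduced in the previous section, we may write $f_v$ as
\[ f_v(\xi) = -\sum_{y\in\mathbb{Z}^d_*} (y\cdot v)\,p(y)\,\Psi_y, \]
where $\Psi_z = \Psi_{\{z\}}$ for $z$ in $\mathbb{Z}^d_*$. We are now in a position to state the main result of this section. Theorem 2.1 follows from this result in view of formula (5.1).

Theorem 5.1. As a function of $\alpha$, $\|f_v\|^2_{-1,\alpha}$ is of class $C^\infty$ on $[0, 1]$.

The proof is based on the lemmas at the end of the previous section. To explain the strategy of the proof we introduce the resolvent equation associated to $f_v$: for $\lambda > 0$, denote by $u_\lambda$ the solution of the resolvent equation:
\[ \lambda u_\lambda - \mathcal{L}u_\lambda = f_v. \]
We will use the dual representation and carry out the estimates in $\mathcal{H}$. Let $u_\lambda \sim \mathfrak{u}_\lambda$ through the unitary isomorphism. Of course $\mathfrak{u}_\lambda = \mathfrak{u}_\lambda(\alpha)$ depends on $\alpha$, while
\[ f_v \sim \mathfrak{f}_v = -\sum_{z\in\mathbb{Z}^d_*} (z\cdot v)\,p(z)\,e_{\{z\}} \]
is independent of $\alpha$ and is actually in $\mathcal{H}_{-1}$. We have
\[ \lambda\mathfrak{u}_\lambda(\alpha) - \mathfrak{L}_\alpha\mathfrak{u}_\lambda = \mathfrak{f}_v. \tag{5.2} \]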
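It follows from [1] that
\[ \|f_v\|^2_{-1,\alpha} = \lim_{\lambda\to 0} \langle f_v, u_\lambda\rangle_\alpha = -\lim_{\lambda\to 0} \sum_{z\in\mathbb{Z}^d_*} (z\cdot v)\,p(z)\,\mathfrak{u}_\lambda(\{z\},\alpha) = \lim_{\lambda\to 0} \frac{1}{2}\sum_{z\in\mathbb{Z}^d_*} (z\cdot v)\,p(z)\,[\mathfrak{u}_\lambda(\{-z\},\alpha) - \mathfrak{u}_\lambda(\{z\},\alpha)] \tag{5.3} \]
because $p(\cdot)$ is symmetric. In view of this identity, to prove Theorem 5.1 we just need to show that there exists a subsequence $\lambda_k \downarrow 0$ such that, for each $z$ with $p(z) > 0$, $\{\mathfrak{u}_{\lambda_k}(\alpha,\{z\}) - \mathfrak{u}_{\lambda_k}(\alpha,\{-z\}),\ k \ge 1\}$ converges uniformly in $\alpha$ to a smooth function. To prove the existence of such a subsequence, it is enough to show that the functions $\{\mathfrak{u}_\lambda(\alpha,\{z\})\}$ are smooth for each $\lambda > 0$ and, for each $z$ and $j \ge 0$, to obtain the uniform bounds
\[ \sup_{0<\lambda\le 1}\ \sup_{0\le\alpha\le 1} |\mathfrak{u}^{(j)}_\lambda(\alpha,\{-z\}) - \mathfrak{u}^{(j)}_\lambda(\alpha,\{z\})| < \infty. \tag{5.4} \]
Here $\mathfrak{u}^{(j)}_\lambda$ stands for the $j$th derivative of $\mathfrak{u}_\lambda$ with respect to the density $\alpha$. By Schwarz's inequality,
\[ \Big( \sum_z p(z)\,|\mathfrak{u}^{(j)}_\lambda(\alpha,\{-z\}) - \mathfrak{u}^{(j)}_\lambda(\alpha,\{z\})| \Big)^2 \le \sum_z p(z)\,[\mathfrak{u}^{(j)}_\lambda(\alpha,\{-z\}) - \mathfrak{u}^{(j)}_\lambda(\alpha,\{z\})]^2. \]
Since the support of $p$ generates $\mathbb{Z}^d$, and we can exclude the one-dimensional nearest neighbor case, there exists a path $z_0 = -z, z_1, \dots, z_n = z$, avoiding 0, such that $p(z_{i+1} - z_i) > 0$ for $0 \le i < n$. Rewriting the difference $\mathfrak{u}^{(j)}_\lambda(\alpha,\{-z\}) - \mathfrak{u}^{(j)}_\lambda(\alpha,\{z\})$ as $\sum_{0\le i<n} [\mathfrak{u}^{(j)}_\lambda(\alpha,\{z_i\}) - \mathfrak{u}^{(j)}_\lambda(\alpha,\{z_{i+1}\})]$ and applying Schwarz's inequality once more, we obtain that the right-hand side of the previous inequality is bounded by
\[ C_0\,\|\pi_1\mathfrak{u}^{(j)}_\lambda\|^2_{1,\mathrm{env}} \le C_0\,|||\mathfrak{u}^{(j)}_\lambda|||^2_{1,0}. \tag{5.5} \]
By (5.5), in order to prove (5.4) it is enough to obtain for each $j \ge 0$ the bound
\[ \sup_{0<\lambda\le 1}\ \sup_{0\le\alpha\le 1} |||\mathfrak{u}^{(j)}_\lambda|||_{1,0} < \infty. \]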
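Notice that the coefficients of $\mathfrak{L}_\alpha$ are not smooth at the boundary of $[0, 1]$. For this reason, we reparametrize the family of equations by $\alpha = \sin^2 t$, $t \in [0, \pi/2]$, to get
\[ \mathfrak{L}(t) = \mathfrak{L}_0 + (\sin^2 t)\,\mathfrak{L}^1_\tau + (\cos^2 t)\,\mathfrak{L}^2_\tau + (\sin t\cos t)\,[\mathfrak{L}^+_\tau + \mathfrak{L}^-_\tau] \]
and consider the resolvent equation
\[ \lambda\mathfrak{v}_\lambda(t) - \mathfrak{L}(t)\mathfrak{v}_\lambda(t) = \mathfrak{f}_v. \]
Since $\mathfrak{f}_v$ does not depend on $\alpha$ we have $\mathfrak{u}_\lambda(\alpha(t)) = \mathfrak{v}_\lambda(t)$. To prove that the sequences $\{\mathfrak{u}^{(j)}_\lambda(A,\alpha),\ \lambda > 0\}$, $j \ge 0$, are uniformly bounded in the $|||\cdot|||_{1,0}$ norm, we first prove such a statement for the sequences $\{\mathfrak{v}^{(j)}_\lambda(A,t),\ \lambda > 0\}$, $j \ge 0$. From this result and the relation between $\mathfrak{u}_\lambda$ and $\mathfrak{v}_\lambda$, $t$ and $\alpha$, we deduce boundedness in $|||\cdot|||_{1,0}$ norm of $\{\mathfrak{u}^{(j)}_\lambda(A,\alpha),\ \lambda > 0\}$ in the interior of the domain. An extra argument, presented at the end of the proof, extends the smoothness up to the boundary.

We start by observing that the function $\mathfrak{f}_v$ has finite $\mathcal{H}_{-1,k}$ norm for all $k \ge 0$, i.e. there exists a finite constant $C_0$ such that
\[ |||\mathfrak{f}_v|||_{-1,k} \le C_0 \tag{5.6} \]
for all $k \ge 0$. The proof of this claim is elementary. Since $\mathfrak{f}_v$ has degree 1, $|||\mathfrak{f}_v|||_{-1,k} = |||\mathfrak{f}_v|||_{-1,0} = \|\mathfrak{f}_v\|_{-1,\mathrm{env}}$, so it suffices to show that $\|\mathfrak{f}_v\|_{-1,\mathrm{env}}$ is finite. To this end, recall the variational formula (3.6) for the $\|\cdot\|_{-1,\mathrm{env}}$ norm and fix a finitely supported function $\mathfrak{g}$. Since $\mathfrak{L}_0$ does not change the degree of a function and since $\mathfrak{f}_v$ has degree one, we may assume that $\mathfrak{g}$ has degree one. Since $p$ is symmetric,
\[ \langle \mathfrak{f}_v, \mathfrak{g}\rangle = \frac{1}{2}\sum_z p(z)\,(z\cdot v)\,[\mathfrak{g}(\{-z\}) - \mathfrak{g}(\{z\})]. \]
By Schwarz's inequality, the square of this expression is bounded by
\[ \frac{1}{4}\sum_z p(z)\,|z\cdot v|^2\ \sum_z p(z)\,[\mathfrak{g}(\{-z\}) - \mathfrak{g}(\{z\})]^2. \]
Now we proceed as for the bound (5.5): there exists a path $z_0 = -z, z_1, \dots, z_n = z$, avoiding 0, such that $p(z_{i+1} - z_i) > 0$ for $0 \le i < n$. Rewriting the difference $\mathfrak{g}(\{-z\}) - \mathfrak{g}(\{z\})$ as $\sum_{0\le i<n} [\mathfrak{g}(\{z_i\}) - \mathfrak{g}(\{z_{i+1}\})]$, we obtain that the previous expression is bounded by $C_0\,\langle \mathfrak{g}, (-\mathfrak{L}_0)\mathfrak{g}\rangle$, which proves the claim (5.6) in view of the variational formula (3.6) for the $\|\cdot\|_{-1,\mathrm{env}}$ norm.

We now start our way through the proof that $\mathfrak{v}_\lambda$ is a sequence of smooth functions with bounded derivatives. Lemma 4.1 applied to $\mathfrak{f}_v$ shows that
\[ \sup_{0<\lambda\le 1}\ \sup_{0\le t\le\pi/2} |||\mathfrak{v}_\lambda(t)|||_{1,k} \]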
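is finite for all $k \ge 1$. We now turn to the proof of the differentiability of $\mathfrak{v}_\lambda(\cdot)$. We say that a function $\mathfrak{g}(t)$ with values in $\mathcal{H}$ is differentiable at $t$ if $\gamma^{-1}[\mathfrak{g}(t+\gamma) - \mathfrak{g}(t)]$ converges, as $\gamma \downarrow 0$, strongly in $\mathcal{H}$ to some function that we denote by $\mathfrak{g}'$. Notice that differentiating formally $\mathfrak{L}(t)$ in $t$ we get the operator
\[ \mathfrak{L}'(t) = (2\sin t\cos t)\,(\mathfrak{L}^1_\tau - \mathfrak{L}^2_\tau) + (\cos^2 t - \sin^2 t)\,[\mathfrak{L}^-_\tau + \mathfrak{L}^+_\tau]. \]

Lemma 5.2. Suppose that $\mathfrak{f}(t)$ is a differentiable function of $t$. Let $\mathfrak{u}_\lambda$ be the solution of the resolvent equation
\[ \lambda\mathfrak{u}_\lambda(t) - \mathfrak{L}(t)\mathfrak{u}_\lambda(t) = \mathfrak{f}(t). \]
Then, $\mathfrak{u}_\lambda(t)$ is differentiable and its derivative is the solution $\mathfrak{u}'_\lambda(t)$ of
\[ \lambda\mathfrak{u}'_\lambda - \mathfrak{L}(t)\mathfrak{u}'_\lambda = \mathfrak{f}'(t) + \mathfrak{L}'(t)\,\mathfrak{u}_\lambda. \tag{5.7} \]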
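The differentiated resolvent equation (5.7) can be tested on a toy model. The sketch below is a finite-dimensional illustration only: the $2\times 2$ matrices `L0`, `L1`, `L2` and `LPM` are hypothetical stand-ins for the operators on $\mathcal{H}$ (with `LPM` playing the role of the sum of the two degree-changing operators), the right-hand side `f` is constant in $t$, and the derivative obtained from (5.7) is compared with a central finite difference.

```python
import math

# Hypothetical 2x2 stand-ins for the operators on H.
L0 = ((-1.0, 1.0), (1.0, -1.0))
L1 = ((-1.0, 0.0), (0.0, 0.0))
L2 = ((0.0, 0.0), (0.0, -1.0))
LPM = ((0.0, 0.5), (0.5, 0.0))

def combine(coeffs, mats):
    """Linear combination of 2x2 matrices."""
    return tuple(tuple(sum(c * m[i][j] for c, m in zip(coeffs, mats))
                       for j in range(2)) for i in range(2))

def L(t):
    s, c = math.sin(t), math.cos(t)
    return combine((1.0, s * s, c * c, s * c), (L0, L1, L2, LPM))

def dL(t):
    """Formal t-derivative of L(t)."""
    s, c = math.sin(t), math.cos(t)
    return combine((2 * s * c, -2 * s * c, c * c - s * s), (L1, L2, LPM))

def matvec(M, v):
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def solve_resolvent(lam, A, rhs):
    """Solve (lam*I - A) x = rhs by Cramer's rule."""
    a, b = lam - A[0][0], -A[0][1]
    c, d = -A[1][0], lam - A[1][1]
    det = a * d - b * c
    return ((d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det)

lam, f, t, h = 1.0, (1.0, 0.0), 0.4, 1e-5

def u(s):
    return solve_resolvent(lam, L(s), f)

# Derivative predicted by (5.7) with f'(t) = 0 ...
du_formula = solve_resolvent(lam, L(t), matvec(dL(t), u(t)))
# ... and by a central finite difference.
du_numeric = tuple((p - m) / (2 * h) for p, m in zip(u(t + h), u(t - h)))
```

The two derivatives agree up to discretisation error, as expected from differentiating $(\lambda - \mathfrak{L}(t))\mathfrak{u}_\lambda(t) = \mathfrak{f}$ in $t$.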
Proof. The proof of the differentiability of $\mathfrak{u}_\lambda(t)$ is standard; all we need to control is that $\mathfrak{L}'(t)\mathfrak{u}_\lambda(t)$ is in $\mathcal{H}$, which follows from (4.5) of Lemma 4.2 and the boundedness of the coefficients of $\mathfrak{L}'(t)$.

The previous lemma applied to $\mathfrak{f} = \mathfrak{f}_v$ shows that the family of functions $\mathfrak{u}_\lambda$ is differentiable for each fixed $\lambda$ and that the derivative $\mathfrak{u}'_\lambda$ satisfies some resolvent-type equation.

Proof of Theorem 5.1. We first show that $\{\mathfrak{u}_\lambda(t),\ \lambda > 0\}$ is a family of smooth functions whose derivatives satisfy for each $k \ge 0$,
\[ \sup_{0<\lambda\le 1}\ \sup_{0\le t\le\pi/2} |||\mathfrak{u}'_\lambda(t)|||_{1,k} < \infty. \tag{5.8} \]
By (5.6) $|||\mathfrak{f}|||_{-1,k}$ is bounded uniformly in $t$. Hence, by Lemma 4.1, $|||\mathfrak{u}_\lambda|||_{1,k}$ is bounded, uniformly in $\lambda$ and $t$. Since $\mathfrak{f}$ does not depend on $t$, by Lemma 5.2, $\mathfrak{u}_\lambda$ is differentiable and its derivative $\mathfrak{u}'_\lambda$ satisfies
\[ \lambda\mathfrak{u}'_\lambda - \mathfrak{L}(t)\mathfrak{u}'_\lambda = \mathfrak{L}'(t)\,\mathfrak{u}_\lambda. \]
By Lemma 4.3 and the explicit form of the operator $\mathfrak{L}'(t)$,
\[ |||\mathfrak{L}'(t)\,\mathfrak{u}_\lambda|||_{-1,k} \le 2\sum_{j=1,2,+,-} |||\mathfrak{L}^j_\tau\,\mathfrak{u}_\lambda|||_{-1,k} \le 8C_k\,|||\mathfrak{u}_\lambda|||_{1,k+1}; \]
then by Lemma 4.1 $|||\mathfrak{L}'(t)\,\mathfrak{u}_\lambda|||_{-1,k}$ is bounded for each $k \ge 1$, uniformly in $\lambda$ and $t$. We may therefore apply again Lemma 4.1 to show that $|||\mathfrak{u}'_\lambda(t)|||_{1,k}$ is uniformly bounded in $(t, \lambda)$ for all $k \ge 1$.

To iterate the argument, we just need to prove by induction the existence of constants $\{a_{n,i},\ n \ge 1,\ 0 \le i < n\}$ such that
\[ \lambda\mathfrak{u}^{(j)}_\lambda - \mathfrak{L}(t)\mathfrak{u}^{(j)}_\lambda = \sum_{i=0}^{j-1} a_{j,i}\,\mathfrak{L}^{(j-i)}(t)\,\mathfrak{u}^{(i)}_\lambda, \tag{5.9} \]
where $\mathfrak{u}^{(i)}_\lambda$, $\mathfrak{L}^{(i)}(t)$ stand for the $i$th derivative of $\mathfrak{u}_\lambda(t)$, $\mathfrak{L}(t)$. This is elementary and left to the reader. The previous argument shows that $\mathfrak{u}_\lambda(t)$ is a sequence of smooth functions on $[0, \pi/2]$ with their derivatives having the uniform bounds
\[ \sup_{0<\lambda\le 1}\ \sup_{0\le t\le\pi/2} |||\mathfrak{u}^{(j)}_\lambda(t)|||_{1,k} < \infty \]
for each $j \ge 0$. We have seen just after (5.3) that these uniform estimates guarantee the smoothness of $\|\mathfrak{f}\|^2_{-1,\alpha(t)}$ as a function of $t$ defined in $[0, \pi/2]$. Since $\alpha = \sin^2 t$, this translates immediately into smoothness in $\alpha$ for $\alpha \in (0, 1)$.

Regularity at the boundary requires the following extra argument. We claim that the odd derivatives of $\langle \mathfrak{f}, \mathfrak{u}_\lambda(t)\rangle$ vanish at $t = 0$ and $\pi/2$. We consider the case $t = 0$, the other being similar. To keep notation simple, let $U_\lambda(t) = \langle \mathfrak{f}, \mathfrak{u}_\lambda(t)\rangle$. Since $\mathfrak{f}$ does not depend on $\alpha$, for $j \ge 0$, $U^{(j)}_\lambda(0) = \langle \mathfrak{f}, \mathfrak{u}^{(j)}_\lambda(0)\rangle$. Since $\mathfrak{f}$ is a function of degree 1, to prove that the odd derivatives of $U_\lambda(t)$ vanish at 0, it is enough to prove that $\mathfrak{u}^{(2j+1)}_\lambda(0)$ is a function of even degree. We prove this statement by induction on $j$.
Observe that $L(0) = L_0 + L_{\tau_2}$, which are operators that preserve the degree of a function. On the other hand, since $\sin^2 t$, $\cos^2 t$ are even functions and since $\sin t \cos t$ is an odd function, there exist constants $a_j$, $b_j$, $c_j$ such that
\[ L^{(2j)}(0) = a_j L_{\tau_1} + b_j L_{\tau_2}, \qquad L^{(2j+1)}(0) = c_j\,[L_{\tau_+} + L_{\tau_-}] \]
for $j \ge 0$. In particular, while $L^{(2j)}(0)$ preserves the degree of a function, $L^{(2j+1)}(0)$ changes it by one.

To prove that $u^{(2j+1)}_\lambda(0)$ (resp. $u^{(2j)}_\lambda(0)$), $j \ge 0$, are functions of even (resp. odd) degree, notice first that $u_\lambda(0)$ is the solution of $[\lambda - (L_0 + L_{\tau_2})]u_\lambda(0) = f$. Since $f$ is a function of degree 1, $u_\lambda(0)$ is also of degree 1. This proves the claim for $j = 0$. It is easy to conclude the proof by induction using formula (5.9) and the fact that $L^{(2j)}(0)$ preserves the degree, while $L^{(2j+1)}(0)$ changes it by one.

Since $u^{(2j+1)}_\lambda(0)$ are functions of even degree, $U^{(2j+1)}_\lambda(0) = \langle f, u^{(2j+1)}_\lambda(0)\rangle$ vanishes because $f$ has degree one. Since we proved uniform convergence of a subsequence $u_{\lambda_k}(t)$ and its derivatives, the limit $U(t) = \|f\|^2_{-1,\alpha(t)}$ of $U_\lambda(\cdot)$ inherits these properties. In particular, $U^{(2j+1)}(0) = 0$. Elementary analytic considerations show that $U(t)$ is in fact a smooth function of $t^2$ and hence of $\sin^2 t = \alpha$.

Remark 5.3. The proof of the smoothness at the boundary provides a recursive method to compute the Taylor expansion at the origin of the diffusion coefficient. Recall that $U(t) = \|f\|^2_{-1,\alpha(t)}$. By Theorem 5.1, $U(0) = \lim_{\lambda\to 0}\langle f, u_\lambda(0)\rangle = \langle f, u(0)\rangle$, where $u(0)$ is the solution of
\[ -[L_0 + L_{\tau_2}]\,u(0) = f. \tag{5.10} \]
Since $f$ has degree one and since $L_0$, $L_{\tau_2}$ preserve the degree, this equation can be solved in $H_1$. In this space both $L_0$, $L_{\tau_2}$ are essentially Laplace operators and this equation may be solved. Knowing $u(0)$, we may examine the equation
\[ -[L_0 + L_{\tau_2}]\,u^{(1)}(0) = L^{(1)}(0)\,u(0). \]
As noticed earlier, the right hand side is a function of degree 0 and 2, so that $u^{(1)}(0)$ has this property. By induction we may obtain $u^{(j)}(0)$ for all $j \ge 1$ by inverting an operator which is essentially a Laplacian. This permits us to compute the Taylor expansion of $U$ around the origin because $U^{(j)}(0) = \langle f, u^{(j)}(0)\rangle$. In particular, from (5.1),
\[ v\cdot D(\alpha)v = (1-\alpha)\sum_{z\in\mathbb Z^d_*} (z\cdot v)^2\,p(z) - \alpha(1-\alpha)\,\langle u(0), f_v\rangle + O(\alpha^2) \]
includes the first order correction, where $u(0)$ is the solution of (5.10).
References

1. Kipnis, C., Varadhan, S.R.S.: Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusion. Commun. Math. Phys. 106, 1–19 (1986)
2. Landim, C., Olla, S., Yau, H.T.: Some properties of the diffusion coefficient for asymmetric simple exclusion processes. Ann. of Probab. 24, 1779–1807 (1996)
3. Landim, C., Yau, H.T.: Fluctuation–dissipation equation of asymmetric simple exclusion processes. Probab. Th. Rel. Fields 108, 321–356 (1997)
4. Landim, C., Olla, S., Varadhan, S.R.S.: Finite-dimensional approximation of the self-diffusion coefficient for the exclusion process. Preprint
5. Sethuraman, S., Varadhan, S.R.S., Yau, H.T.: Diffusive limit of a tagged particle in asymmetric exclusion process. Comm. Pure Appl. Math. 53, 972–1006 (2000)
6. Varadhan, S.R.S.: Regularity of the self-diffusion coefficient. In: The Dynkin Festschrift, Progr. Probab. 34, Boston, MA: Birkhäuser Boston 1994, pp. 387–397

Communicated by H.-T. Yau
Commun. Math. Phys. 224, 323 – 340 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
A Simple Proof of Stability of Fronts for the Cahn–Hilliard Equation

E. A. Carlen¹, M. C. Carvalho¹, E. Orlandi²

¹ School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA. E-mail: [email protected]
² Dipartimento di Matematica, Università degli Studi di Roma Tre, P. S. Murialdo 1, 00146 Roma, Italy. E-mail: [email protected]

Work partially supported by U.S. National Science Foundation grant DMS 00–70589.
Work partially supported by E.U. grant ERB FMRX CT 97-0157 and FCT PRAXIS XXI.
On leave from Departamento de Matemática da Faculdade de Ciencias de Lisboa and GFM, 1700 Lisboa codex, Portugal. E-mail: [email protected]
Work partially supported by the CNR-GNFM, MURST COFIM 99–00.

Received: 14 November 2000 / Accepted: 30 July 2001

Dedicated to Joel Lebowitz on the occasion of his 70th birthday

Abstract: We apply a method developed in our earlier work on a non-local phase kinetics equation to give a simple proof of the non-linear stability of fronts for the Cahn–Hilliard equation.

1. Introduction

In this paper we consider the one dimensional Cahn–Hilliard equation, which is a particularly interesting example of a class of equations for the transport of a conserved order parameter $m(x)$ on $\mathbb R$. Such equations generally have the form
\[ \frac{\partial}{\partial t} m = \frac{\partial}{\partial x} J, \tag{1.1} \]
where the current $J$ is given in terms of the variation of a free energy functional $\mathcal F$ through
\[ J = \frac{\partial}{\partial x}\frac{\delta \mathcal F}{\delta m}. \tag{1.2} \]
In this particular case, the free energy $\mathcal F$ is
\[ \mathcal F(m) = \int_{\mathbb R} \left[ \frac12\left|\frac{\partial m}{\partial x}\right|^2 + \frac18 (1 - m^2)^2 \right] dx. \tag{1.3} \]
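The variation in (1.2) is to be computed with respect to the $L^2$ norm on $\mathbb R$, and hence
\[ \frac{\delta \mathcal F}{\delta m} = -\frac{\partial^2}{\partial x^2} m - \frac12 m(1 - m^2) \tag{1.4} \]
and the equation is
\[ \frac{\partial}{\partial t} m = \frac{\partial^2}{\partial x^2}\left[ -\frac{\partial^2}{\partial x^2} m - \frac12 m(1 - m^2) \right]. \tag{1.5} \]
Clearly the free energy is a decreasing function under this evolution:
\[ \frac{d}{dt}\mathcal F(m) = -\int_{\mathbb R} \left| \frac{\partial}{\partial x}\frac{\delta \mathcal F}{\delta m}(m) \right|^2 dx, \tag{1.6} \]
and thus our evolution has a Lyapunov functional. We will denote $-d\mathcal F(m)/dt$ by $\mathcal I(m(t))$. Moreover, the evolution has a conservation law: for all $t > 0$,
\[ \int_{\mathbb R} (m(x,t) - m(x,0))\,dx = 0. \tag{1.7} \]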
Replacing derivatives by gradients and divergences in the obvious places, one obtains a two or three dimensional version. In such cases, $m(x)$ represents the order parameter in the model of a binary alloy with a phase transition. The two global equilibrium states correspond to the two minima of the potential $W(m) = (1 - m^2)^2/8$. Clearly these are $m = 1$ and $m = -1$. At the boundary between two regions of different phases, there will be a transition from $m = 1$ to $m = -1$. Since the evolution decreases the free energy, we expect that after a short initial time period, these transitions should occur in a way that minimizes the cost in excess free energy. Therefore, in the one dimension across the boundary between two regions of different phase, we expect a “transition profile” that is very close to some translate of $\bar m_0$, where
\[ \mathcal F(\bar m_0) = \inf\Big\{ \mathcal F(m) \ :\ \operatorname{sgn}(x)m(x) \ge 0,\ \lim_{x\to\pm\infty} \operatorname{sgn}(x)m(x) > 0 \Big\}. \tag{1.8} \]

The minimizer is well known, and easily seen, to be $\bar m_0(x) = \tanh(x/2)$. The physical interest in the one dimensional problem is that stability of these minimal free energy transition profiles, which we simply call “fronts” in the rest of the paper, is important for understanding how the boundaries between regions of different phases evolve in higher dimension. Without further mention of the higher dimensional case, we now turn to this stability problem.

The subscript 0 on the minimizer in (1.8) is present because the constraint imposed in (1.8) breaks the translational invariance of the free energy. For any $a$ in $\mathbb R$, define
\[ \bar m_a(x) = \bar m_0(x - a). \tag{1.9} \]
These functions $\bar m_a$ are the fronts whose stability is to be investigated here. Clearly $\mathcal F(\bar m_a) = \mathcal F(\bar m_0)$, so that $\bar m_0$ belongs to a one parameter family of minimizers of the free energy. Another family is obtained by reflecting this one because the free energy is also reflection invariant. However, these two families of minimizers are separated in all of the relevant metrics, and it suffices to consider just one.
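The statements about the front can be checked numerically. The following is a minimal sketch and not part of the original argument: it evaluates the variation (1.4) on $\bar m_0(x) = \tanh(x/2)$ by central differences and approximates the free energy (1.3) of $\bar m_0$ and of one translate. The function names, the step sizes, and the truncation of $\mathbb R$ to $[-40, 40]$ are illustrative assumptions. A short calculation using $\bar m_0' = (1 - \bar m_0^2)/2$ gives $\mathcal F(\bar m_0) = 2/3$, which the quadrature should reproduce approximately.

```python
import math


def front(x, a=0.0):
    """Front profile m_a(x) = tanh((x - a) / 2), as in (1.9)."""
    return math.tanh((x - a) / 2.0)


def variation(m, x, h=1e-3):
    """delta F / delta m = -m'' - (1/2) m (1 - m^2), with m'' by central differences."""
    second = (m(x + h) - 2.0 * m(x) + m(x - h)) / h**2
    return -second - 0.5 * m(x) * (1.0 - m(x) ** 2)


def free_energy(m, lo=-40.0, hi=40.0, n=16000):
    """Midpoint-rule approximation of F(m) on the truncated interval [lo, hi]."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        slope = (m(x + 0.5 * dx) - m(x - 0.5 * dx)) / dx
        total += (0.5 * slope**2 + 0.125 * (1.0 - m(x) ** 2) ** 2) * dx
    return total


# Largest residual of the Euler-Lagrange equation on a grid in [-10, 10].
residual = max(abs(variation(front, 0.1 * k)) for k in range(-100, 101))

# Free energy of the front centred at 0 and of its translate by a = 3.
energy_0 = free_energy(front)
energy_3 = free_energy(lambda x: front(x, a=3.0))

print(residual, energy_0, energy_3)
```

The residual is limited only by the finite-difference step, and the two energies agree up to quadrature error, in line with the translation invariance $\mathcal F(\bar m_a) = \mathcal F(\bar m_0)$.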
It is easy to guess the result of solving (1.5) for initial data $m_0$ that is a small perturbation of the front $\bar m_0$. The excess free energy should decrease in a way that forces the solution $m(t)$ to tend to the family of fronts, and the conservation law should select $\bar m_a$ as the front it should be converging to, so the result should be that, in any reasonable sense, $\lim_{t\to\infty}(m(x,t) - \bar m_a(x)) = 0$ with $a$ given in terms of the initial data $m_0$ through (1.7) in the form
\[ \int \big(m(x,0) - \bar m_a(x)\big)\,dx = 0. \tag{1.10} \]

Our main result is a proof that this is the case. The result has recently been obtained in this case by Bricmont, Kupiainen and Taskinen [2] using renormalization group methods. Their result gives a tighter estimate on the decay rate, but in a weaker norm that does not control the excess free energy.

We recently proved such a result for a related equation, the LOP equation, which first appeared in [10] and was later rigorously derived from an underlying microscopic model in [7]. The method that we used was developed to deal with the non-local nature of the LOP equation, and the fact that one has no explicit formula for $\bar m$ in that case, which precluded the explicit spectral analysis required in the renormalization group method. However, as we show here, the method developed for the LOP equation also applies to the local Cahn–Hilliard equation, and yields a fairly simple proof of the non-linear stability. Moreover, this method works directly in physical norms, and it provides an estimate on the rate of decrease of the excess free energy. The result is:

Theorem 1.1. Consider initial data $m_0(x)$ for the one dimensional Cahn–Hilliard equation (1.5) such that
\[ \int x^2 (m_0(x) - \bar m_0(x))^2\,dx \le c_0, \]
where $c_0$ is any positive constant. Then for any $\epsilon > 0$ there is a strictly positive constant $\delta = \delta(\epsilon, c_0)$ depending only on $\epsilon$ and $c_0$ such that for all initial data with
\[ \int (m_0(x) - \bar m_0(x))^2\,dx \le \delta, \]
the excess free energy $\mathcal F(m(t)) - \mathcal F(\bar m_0)$ of the corresponding solution $m(t)$ of (1.5) satisfies
\[ \mathcal F(m(t)) - \mathcal F(\bar m) \le c_2 (1 + c_1 t)^{-(9/13 - \epsilon)} \]
and
\[ \|m(t) - \bar m_a\|_1 \le c_2 (1 + c_1 t)^{-(5/52 - \epsilon)}, \]
where $c_1$ and $c_2$ are finite constants depending only on $\epsilon$ and $c_0$, and $a$ is given by (1.10).

Since the problem has both a Lyapunov functional and a conservation law, it may appear that it should be a simple matter to prove this result. One reason that it is not so simple is that the decrease of the excess free energy provides only $L^2$ control, and by itself, only partial control at that. To use the conservation law, one needs $L^1$ control. Our equation is not dissipative in $L^1$, a circumstance which is closely related to the lack of a maximum principle.

Decrease of free energy can be used to show that the solution $m(x,t)$ approaches some moving front $\bar m_{a(t)}(x)$ in some norm other than $L^2$. For example, Asselah did this in [1] for the LOP equation studied in [4] and [5], with the approach controlled in the $L^\infty$ norm. But since the free energy is translation invariant, it cannot provide any control over $a(t)$. Moreover, without control on $a(t)$ that prevents it from “running away”, it is not at all clear how one can even get $L^2$ control on the difference between $m(x,t)$ and $\bar m_{a(t)}(x)$, or get a rate estimate. The difficulties in this sort of problem are discussed in more detail in [4]. Here we move directly on to the solution.

Despite what has been said above, understanding the free energy functional $\mathcal F$ is still central to understanding the stability. To begin, we introduce the operator $A$ associated with its second variation at a front $\bar m$. First, throughout this paper, we make the following convention: whenever some solution $m(x,t)$ of (1.5) is under discussion, then $v(x,t)$ is defined by
\[ v(x,t) = m(x,t) - \bar m_{a(t)}(x), \tag{1.11} \]
where $a(t)$ is defined to be that value of $c$ such that
\[ \|m(t) - \bar m_{a(t)}\|_2 = \inf_{c\in\mathbb R}\{ \|m(t) - \bar m_c\|_2 \}. \tag{1.12} \]
It is shown in [4] that $a(t)$ is a well-defined function as long as $\|m(t) - \bar m_{a(t)}\|_2$ stays sufficiently small, since then the minimum is uniquely attained. Finally, it will be convenient to have the convention that $\bar m(x)$ denotes $\bar m_{a(t)}(x)$. In the same vein, we shall generally simply write $A$ in place of $A_{a(t)}$ for the second variation of $\mathcal F$ at $\bar m_{a(t)}$, and leave the $a(t)$ implicit. However, in the definition, we shall be explicit:
\[ \langle v, A_a v\rangle_{L^2} = \frac{d^2}{ds^2}\mathcal F(\bar m_a + sv)\Big|_{s=0}. \tag{1.13} \]
One easily computes that
\[ Av(x) = -v''(x) + V(x)v(x) + v(x), \tag{1.14} \]
where
\[ V(x) = \frac32\left(\bar m^2 - 1\right) = \frac32\left(\tanh^2\frac{x}{2} - 1\right). \tag{1.15} \]
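As a numerical sanity check on (1.14) and (1.15), the sketch below (an illustration, not part of the original text) verifies by finite differences that $\bar m'$ is annihilated by $A$ for the front centred at $a = 0$, and approximates $\int_{\mathbb R} |V(x)|^2 dx$, the quantity that the spectral-gap argument combines with the Lieb–Thirring bound. The helper names, the grid, and the truncation to $[-40, 40]$ are assumptions made for the example.

```python
import math


def front(x):
    """Front m_0(x) = tanh(x / 2)."""
    return math.tanh(x / 2.0)


def front_prime(x):
    """m_0'(x) = (1/2) sech^2(x/2) = (1 - m_0^2) / 2."""
    return 0.5 * (1.0 - front(x) ** 2)


def potential(x):
    """V(x) = (3/2)(m_0^2 - 1), as in (1.15)."""
    return 1.5 * (front(x) ** 2 - 1.0)


def apply_A(psi, x, h=1e-3):
    """(A psi)(x) = -psi'' + V psi + psi, with psi'' by central differences."""
    second = (psi(x + h) - 2.0 * psi(x) + psi(x - h)) / h**2
    return -second + potential(x) * psi(x) + psi(x)


def integrate(g, lo=-40.0, hi=40.0, n=16000):
    """Midpoint rule on the truncated interval [lo, hi]."""
    dx = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * dx) for i in range(n)) * dx


# A m_0' should vanish identically (zero eigenvalue of A).
kernel_residual = max(abs(apply_A(front_prime, 0.1 * k)) for k in range(-100, 101))

# The integral of V^2; the exact value is 6.
v_squared_integral = integrate(lambda x: potential(x) ** 2)

# Arithmetic of the Lieb-Thirring step: 1 + |e_1|^(3/2) <= (3/16) * int V^2.
e1_bound = (3.0 / 16.0 * v_squared_integral - 1.0) ** (2.0 / 3.0)

print(kernel_residual, v_squared_integral, e1_bound)
```

With $\int |V|^2 dx = 6$ the last line gives $|e_1| \le 1/4$, so the spectrum of $A = H + 1$ away from the zero eigenvalue is bounded below by $3/4$.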
The operator $A$ has a spectral gap:

Lemma 1.2. In the spectrum of $A$, 0 is an isolated eigenvalue of multiplicity one. In fact,
\[ \int v(x)Av(x)\,dx \ge \frac34 \|v\|_2^2 \]
for all $v$ with $\int v(x)\bar m'(x)\,dx = 0$.

Proof. We consider the operator $H$ given by $Hv(x) = -v''(x) + V(x)v(x)$. We know that $\bar m'$ is an eigenvector, and that the corresponding eigenvalue is $-1$. Let $-1 = e_0, e_1, e_2, \dots$ be the negative eigenvalues of $H$, repeated according to their multiplicity. Then by a bound of Lieb and Thirring [9], one has
\[ \sum_j |e_j|^{3/2} \le \frac{3}{16}\int_{\mathbb R} |V(x)|^2\,dx. \]
The integral is easily evaluated and equals 6. Keeping only the first two terms in the sum on the left, $1 + |e_1|^{3/2} \le 18/16$, and this implies that $|e_1| \le 1/4$. Thus $e_1 \ge -1/4$, and this completes the proof.

As indicated in Theorem 1.1, we shall start out with $\|v\|_2$ small, and then, because of the smoothing properties of the equation [3, 5], it will be the case that at least a short time later, $\|v\|_2$ is still small, and then $\|v'\|_2$ is small as well. We shall obtain a number of a-priori estimates that hold when $\|v\|_2$ and $\|v'\|_2$ are both small, and shall use them in the final section of the paper to prove that this condition persists indefinitely. The first estimate that we obtain under these conditions shows that the excess free energy of $\bar m + v$ is comparable to $\langle v, Av\rangle$.

Lemma 1.3. For all $\epsilon > 0$, there are $\delta, \kappa > 0$ so that whenever $\|v\|_2 \le \delta$ and $\|v'\|_2 \le \kappa$, then
\[ \frac{1-\epsilon}{2}\langle v, Av\rangle \le \mathcal F(\bar m + v) - \mathcal F(\bar m) \le \frac{1+\epsilon}{2}\langle v, Av\rangle. \]

Proof. One easily computes that
\[ \mathcal F(\bar m + v) - \mathcal F(\bar m) = \frac12\langle v, Av\rangle + \frac14\int \Big(2\bar m v^3 + \frac{v^4}{2}\Big)\,dx. \]
Using the inequality $\|v\|_\infty^2 \le 2\|v\|_2\|v'\|_2$, one obtains
\[ \left| \int \Big(2\bar m v^3 + \frac{v^4}{2}\Big)\,dx \right| \le \big(2\sqrt{2\kappa\delta} + \kappa\delta\big)\|v\|_2^2. \]
By the previous lemma, for $\kappa$ and $\delta$ small enough, $(2\sqrt{2\kappa\delta} + \kappa\delta)\|v\|_2^2 \le (\epsilon/2)\langle v, Av\rangle$, and this completes the proof.

The first key result is a lower bound on the dissipation in terms of $A$:

Lemma 1.4. For any $\epsilon > 0$,
\[ \mathcal I(m(t)) = -\frac{d}{dt}\big[\mathcal F(m(t)) - \mathcal F(\bar m)\big] \ge (1-\epsilon)\int \big|(Av)'(x)\big|^2\,dx \tag{1.16} \]
whenever $\|v'\|_2 \le \kappa_1(\epsilon)$ and $\|v\|_2 \le \delta_1(\epsilon)$ for some strictly positive constants $\kappa_1(\epsilon)$ and $\delta_1(\epsilon)$. Moreover, there exists a constant $\gamma > 0$ so that
\[ \int \big|(Av)'\big|^2\,dx \ge \gamma\|v'\|_2^2 \tag{1.17} \]
whenever $\int v(x)\bar m'(x)\,dx = 0$.
This theorem is proved in Sect. 2. We use (1.16) only when $\mathcal I(m(t)) \ll \mathcal F(m(t)) - \mathcal F(\bar m)$; i.e., when the dissipation is very small compared to the excess free energy. The point is that in this case, $v$ must be very “smooth and spread out” and so both $\langle v', v'\rangle$ and $\langle v, Vv\rangle$ are negligible compared to $\langle v, v\rangle$. That is, $\langle v, Av\rangle \approx \langle v, v\rangle$, and so the operator $A$ differs negligibly from the identity. But if we replace $A$ by the identity in the linearized evolution equation, it simply becomes the heat equation. Therefore, when the dissipation is small compared to the excess free energy, we expect heat equation behavior, which we can calculate, to govern this dissipation process.

On the other hand, when the dissipation is not small compared to the excess free energy, we are in a position to benefit from the plentiful dissipativity to compute bounds on the rate at which it occurs. This is the dissipation–dichotomy in our argument. To make precise use of the dichotomy, introduce a small parameter $\epsilon_1$ to be fixed later, and distinguish between the times $t$ for which
\[ \mathcal I(m(t)) \le \epsilon_1\big[\mathcal F(m(t)) - \mathcal F(\bar m)\big] \tag{1.18} \]
or
\[ \mathcal I(m(t)) \ge \epsilon_1\big[\mathcal F(m(t)) - \mathcal F(\bar m)\big]. \tag{1.19} \]
When (1.19) is true, there is plenty of dissipation, and the excess free energy is decaying at an exponential rate, and it will be relatively simple to exploit this. We have already explained that condition (1.18) will help us because under this condition, we will be able to show that $\|v\|_1$ dissipates away as though it were a solution of the heat equation with $\int_{\mathbb R} v(t)\,dx = 0$ for all $t$. We will return to this shortly, but it depends on the fact that when (1.18) holds, $v$ is very “smooth and spread out”. This is used several ways in the proof. Indeed, combining (1.18) with the key bound (1.16),
\[ \|(Av)'\|_2^2 \le \epsilon_1\big[\mathcal F(\bar m + v) - \mathcal F(\bar m)\big] \]
under appropriate conditions on $v$. Then since $\langle v, Av\rangle$ is comparable with the excess free energy of $\bar m + v$, when (1.18) holds, one has $\|(Av)'\|_2^2 \ll \langle v, Av\rangle$. Any function $v$ such that this is the case is so smooth and spread out that
\[ \|Av\|_2^2 \approx \langle v, Av\rangle \approx 2\big[\mathcal F(m(t)) - \mathcal F(\bar m)\big]. \tag{1.20} \]
The precise version of this is given in Theorem 2.3, and it is the key inequality behind the dissipation–dichotomy argument. It enables us to “drop” extra powers of $A$ when (1.18) holds.

We shall also need certain moment inequalities, which show that $v(t)$ can't spread out too fast.

Theorem 1.5. Let $m = \bar m + v$ be a solution of (1.5) and let $C$ be a positive number. Define $\phi(t)$ by
\[ \phi(t) = 1 + \int_{\mathbb R} |x\,(Av)|^2\,dx + C\big[\mathcal F(\bar m + v) - \mathcal F(\bar m)\big]. \tag{1.21} \]
Then for any $\epsilon > 0$, there is a choice of $C < \infty$ and an $\epsilon_1 > 0$ so that one has
\[ \frac{d}{dt}\phi(t) \le 4(1+\epsilon)\big[\mathcal F(\bar m + v) - \mathcal F(\bar m)\big] \tag{1.22} \]
whenever (1.18) holds, and $\|v\|_2 \le \delta_1(\epsilon)$, $\|v'\|_2 \le \kappa_1(\epsilon)$, and $|a(t)| \le 1$ for some strictly positive constants $\kappa_1(\epsilon)$ and $\delta_1(\epsilon)$. Regardless of whether (1.18) holds or not, there is a constant $K < \infty$ such that
\[ \frac{d}{dt}\phi(t) \le K\big[\mathcal F(\bar m + v) - \mathcal F(\bar m)\big] \tag{1.23} \]
for as long as $\|v'\|_2 \le \kappa_1(\epsilon)$, $\|v\|_2 \le \delta_1(\epsilon)$ and $|a(t)| \le 1$.

Theorem 1.5 is proved in Sect. 3. Theorems 1.4 and 1.5 are the main ingredients of our argument specific to the Cahn–Hilliard equation. The other two ingredients are a constrained form of the uncertainty principle inequality and a decay estimate for a system of differential inequalities introduced in [5]. We will now explain what these are, and how they work together to provide the proof of Theorem 1.1.

The constrained form of the uncertainty principle inequality [5] is the following: under either of the constraints $\int \psi(x)\,dx = 0$ or $\psi(0) = 0$, one has
\[ \left( \int x^2|\psi(x)|^2\,dx \right)\left( \int |\psi'(x)|^2\,dx \right) \ge \frac94\left( \int |\psi(x)|^2\,dx \right)^2. \tag{1.24} \]
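To illustrate the constant in (1.24), the sketch below (an illustration, not taken from the paper) computes the ratio of the left side to $(\int|\psi|^2 dx)^2$ for two test functions: the Gaussian $e^{-x^2/2}$, which satisfies neither constraint, and the odd function $x e^{-x^2/2}$, which satisfies both $\int\psi\,dx = 0$ and $\psi(0) = 0$. The choice of test functions, the helper names, and the truncation to $[-12, 12]$ are assumptions for the example.

```python
import math


def integrate(g, lo=-12.0, hi=12.0, n=24000):
    """Midpoint rule on the truncated interval [lo, hi]."""
    dx = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * dx) for i in range(n)) * dx


def uncertainty_ratio(psi, dpsi):
    """(int x^2 psi^2)(int psi'^2) / (int psi^2)^2 for a real function psi."""
    second_moment = integrate(lambda x: x * x * psi(x) ** 2)
    kinetic = integrate(lambda x: dpsi(x) ** 2)
    mass = integrate(lambda x: psi(x) ** 2)
    return second_moment * kinetic / mass**2


def gauss(x):
    return math.exp(-x * x / 2.0)


def dgauss(x):
    return -x * math.exp(-x * x / 2.0)


def odd(x):
    return x * math.exp(-x * x / 2.0)


def dodd(x):
    return (1.0 - x * x) * math.exp(-x * x / 2.0)


ratio_unconstrained = uncertainty_ratio(gauss, dgauss)  # usual constant 1/4
ratio_constrained = uncertainty_ratio(odd, dodd)        # constrained constant 9/4

print(ratio_unconstrained, ratio_constrained)
```

For these two functions the integrals can also be done by hand with Gaussian moments, giving exactly $1/4$ and $9/4$, so the odd test function attains equality in (1.24).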
The difference between (1.24) and the usual uncertainty principle is a factor of 9 in the constant, and, as we showed in [5], this is crucial for $L^1$ control. We wish to apply this to $\psi = Av$. It is clear that $Av$ will have a zero somewhere, but a technical argument is needed to control the location.

To explain how all of the pieces of the argument fit together, assume for the moment that the initial data is antisymmetric. Then the solution will be antisymmetric for all time and so
\[ Av(0,t) = 0 \tag{1.25} \]
for all $t$. The technical argument needed to remove the antisymmetry assumption will be given in Sect. 2. However, assuming (1.25), we have from (1.16) and (1.24) that
\[ \frac{d}{dt}\big[\mathcal F(m(t)) - \mathcal F(\bar m)\big] \le -(1-\epsilon)\,\frac94\,\frac{\|Av\|_2^4}{\|xAv\|_2^2}. \tag{1.26} \]
The problem with this inequality is that the right hand side does not directly involve the excess free energy $\mathcal F(m(t)) - \mathcal F(\bar m)$. If it did, we could hope to get a Gronwall inequality for the decay of the excess free energy. The problem is thus one of closure: we have to relate the quantity on the right-hand side to the excess free energy.

Now we are ready to put the pieces together. When (1.20) is valid, interpreting the approximation sign appropriately in terms of $\epsilon$, we can rewrite (1.26) as
\[ \frac{d}{dt}\big[\mathcal F(m(t)) - \mathcal F(\bar m)\big] \le -9(1-\epsilon)\,\frac{\big[\mathcal F(m(t)) - \mathcal F(\bar m)\big]^2}{\|xAv\|_2^2}. \tag{1.27} \]
Now define
\[ f(t) = \mathcal F(\bar m + v(t)) - \mathcal F(\bar m) \tag{1.28} \]
and define $\phi(t)$ as in Theorem 1.5. Then (1.27) becomes
\[ \frac{d}{dt}\big[\mathcal F(m(t)) - \mathcal F(\bar m)\big] \le -9(1-\epsilon)\,\frac{\big[\mathcal F(m(t)) - \mathcal F(\bar m)\big]^2}{\phi(t)}, \]
and from Theorem 1.5 we have that
\[ \frac{d}{dt}\phi(t) \le (1+\epsilon)\,4\,\big[\mathcal F(\bar m + v) - \mathcal F(\bar m)\big]. \]
Notice the condition that $|a(t)| \le 1$ in Theorem 1.5, to which we shall return. Thus, when (1.18) holds, we have
\[ \frac{d}{dt}f(t) \le -\tilde A\,\frac{f(t)^2}{\phi(t)} \quad\text{and}\quad \frac{d}{dt}\phi(t) \le \tilde B f(t) \tag{1.29} \]
with the difference between $\tilde A/(\tilde A + \tilde B)$ and $9/13$ arbitrarily small for $\epsilon$ small enough, for all times $t$ such that (1.18) holds, $\|v(t)\|_2$, $\|v'(t)\|_2$ are sufficiently small and $|a(t)| \le 1$.

On the other hand, when (1.19) holds, there is plenty of dissipation, and using (1.19) and the second half of Theorem 1.5, we get (1.29) with some different constants $\tilde A$ and $\tilde B$ (in fact, $\tilde A$ will be the constant $K$ from Theorem 1.5), but such that the ratio $\tilde A/(\tilde A + \tilde B)$ is the same. The upshot is that we always have (1.29), but at two different time scales according to whether (1.19) or (1.18) holds. The heuristic idea that we will make precise in Sect. 4 is that by taking the slower of these two time scales, we bound the decay of our system. Therefore we consider the system of differential inequalities
\[ \frac{d}{dt}f(t) \le -A\,\frac{f(t)^2}{\phi(t)} \quad\text{and}\quad \frac{d}{dt}\phi(t) \le B f(t) \tag{1.30} \]
with $A = 9$ and $B = 4$. Theorem 5.1 of [4] says that for any solution of (1.30),
\[ f(t) \le f(0)^{1-q}\phi(0)^{q}\left[ \frac{\phi(0)}{f(0)} + (A+B)t \right]^{-q}, \qquad \phi(t) \le f(0)^{1-q}\phi(0)^{q}\left[ \frac{\phi(0)}{f(0)} + (A+B)t \right]^{1-q}, \]
where $q = A/(A+B)$. In the case at hand, this is $q = 9/13$. Since this value exceeds $1/2$, we get $L^1$ decay in the following way: by the elementary Lemma 5.2 of [5], for any function $w$ and any $0 < \delta < 1$,
\[ \|w\|_1 \le C(\delta)\,\|(1 + x^2)^{1/2} w\|_2^{(1+\delta)/2}\,\|w\|_2^{(1-\delta)/2}, \tag{1.31} \]
where $C(\delta)$ is a finite constant. (This same method may be applied to solutions $u$ of the heat equation $\partial u/\partial t = \Delta u$ with $\int_{\mathbb R} u(t)\,dx = 0$ to estimate the rate of $L^1$ decay, as shown in [5].) Here, we apply (1.31) with $w = Av(t)$, so that we obtain
\[ \|Av(t)\|_1^2 \le C(\delta)\,\phi(t)^{1+\delta}\,\|Av(t)\|_2^{1-\delta}. \]
Since $9/13 > 1/2$, for $\delta$ sufficiently small we have that $\phi(t)^{1+\delta}$ increases more slowly than $\|Av(t)\|_2^{1-\delta}$ decreases, and so $\|Av(t)\|_1$ decreases to zero. In fact, the rate one gets is arbitrarily close to $t^{-5/26}$, for $\delta$ sufficiently small, as in Theorem 1.1. This leads to
\[ \lim_{t\to\infty}\int_{\mathbb R} Av(x,t)\,dx = \lim_{t\to\infty}\int_{\mathbb R} (V(x) + 1)\,v(x,t)\,dx = 0. \]
But $\left|\int_{\mathbb R} V(x)v(x,t)\,dx\right| \le \|V\|_2\|v(t)\|_2$, and this tends to zero as $t$ tends to infinity by the above, so that finally, $\lim_{t\to\infty}\int_{\mathbb R} v(x,t)\,dx = 0$. But (1.7) is equivalent to
\[ \int_{\mathbb R} \big(\bar m_{a(t)}(x) - m(x,0)\big)\,dx + \int_{\mathbb R} v(x,t)\,dx = 0, \]
and hence $\lim_{t\to\infty}\int_{\mathbb R} \big(\bar m_{a(t)}(x) - m(x,0)\big)\,dx = 0$, so that $\lim_{t\to\infty} a(t) = a$, where $a$ is determined through (1.10). Indeed, the map $a \mapsto \int_{\mathbb R} (\bar m_a(x) - m(x,0))\,dx$ is linear, and the slope is $-\int_{\mathbb R} \bar m_a'(x)\,dx = -2$, as one sees simply by differentiating. Thus,
\[ \left| \int_{\mathbb R} \big(\bar m_{a(t)}(x) - m(x,0)\big)\,dx \right| = 2|a(t) - a|. \]
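The decay mechanism encoded in (1.30) can be explored numerically. The sketch below is an illustration under stated assumptions, not the authors' method: it integrates the equality case of (1.30), $f' = -A f^2/\phi$ and $\phi' = B f$, with a classical fourth-order Runge–Kutta scheme and compares the result with the closed-form right sides of the bounds quoted above from Theorem 5.1 of [4]. The initial values $f(0) = 0.5$, $\phi(0) = 1$, the final time, and the step count are arbitrary choices made for the example.

```python
A, B = 9.0, 4.0
q = A / (A + B)  # equals 9/13


def rhs(f, phi):
    """Right-hand side of the equality case of (1.30)."""
    return -A * f * f / phi, B * f


def integrate_system(f0, phi0, t_end, steps):
    """Classical RK4 integration of f' = -A f^2 / phi, phi' = B f."""
    dt = t_end / steps
    f, phi = f0, phi0
    for _ in range(steps):
        k1 = rhs(f, phi)
        k2 = rhs(f + 0.5 * dt * k1[0], phi + 0.5 * dt * k1[1])
        k3 = rhs(f + 0.5 * dt * k2[0], phi + 0.5 * dt * k2[1])
        k4 = rhs(f + dt * k3[0], phi + dt * k3[1])
        f += dt * (k1[0] + 2.0 * k2[0] + 2.0 * k3[0] + k4[0]) / 6.0
        phi += dt * (k1[1] + 2.0 * k2[1] + 2.0 * k3[1] + k4[1]) / 6.0
    return f, phi


def closed_form(f0, phi0, t):
    """Right sides of the bounds on f(t) and phi(t) stated after (1.30)."""
    base = phi0 / f0 + (A + B) * t
    scale = f0 ** (1.0 - q) * phi0**q
    return scale * base ** (-q), scale * base ** (1.0 - q)


f0, phi0, t_end = 0.5, 1.0, 50.0
f_num, phi_num = integrate_system(f0, phi0, t_end, 50000)
f_bound, phi_bound = closed_form(f0, phi0, t_end)

print(f_num, f_bound, phi_num, phi_bound)
```

In the equality case the numerical solution should coincide with the closed-form expressions up to integration error, which suggests the bounds are attained there; $f$ then decays like $t^{-q}$ with $q = 9/13$, the exponent appearing in Theorem 1.1.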
2. Free Energy Estimates

It follows from (1.6) and the definition of $A$ that
\[ \frac{d}{dt}\mathcal F(m) = -\int_{\mathbb R} \left| \frac{d}{dx}\Big( Av - \frac12\big(3\bar m v^2 + v^3\big) \Big) \right|^2 dx. \tag{2.1} \]
For convenience of notation, define
\[ U = -\frac12\frac{d}{dx}\big(3\bar m v^2 + v^3\big) = -\frac32\big(\bar m' v^2 + 2\bar m v v' + v^2 v'\big). \tag{2.2} \]
Now for any $f$ and $g$ in $L^2$ and for any $0 < \epsilon < 1$,
\[ \|f + g\|_2^2 \ge (1-\epsilon)\|f\|_2^2 - \frac1\epsilon\|g\|_2^2. \tag{2.3} \]
Combining (2.1), (2.2) and (2.3), we have
\[ -\frac{d}{dt}\mathcal F(m) = \int_{\mathbb R} \big|(Av)' + U\big|^2\,dx \ge (1-\epsilon)\int_{\mathbb R} \big|(Av)'\big|^2\,dx - \frac1\epsilon\int_{\mathbb R} |U|^2\,dx. \tag{2.4} \]
The following lemma is closely based on lemmas and arguments in Sect. 3 of [4]. We have stated it so that it applies to a general class of potentials because the proof, although somewhat involved, depends only on fairly general properties of $\bar m$ and $A$.

Theorem 2.1. Let $v \in L^2(\mathbb R)$, $v' \in L^2(\mathbb R)$ and $\int v(x)\bar m'(x)\,dx = 0$. Then there exists a positive constant $\gamma$ such that
\[ \int \big|(Av)'\big|^2\,dx \ge \gamma\|v'\|_2^2, \tag{2.5} \]
where $A$ is the linear operator defined in (1.14).
Proof. First observe that $(Av)' = Av' + V'v$, where $V$ is given in (1.15). Next, $v(x) = v(y) + \int_y^x v'(z)\,dz$. Multiply both sides by $\bar m'(y)$, and integrate in $y$. Since $\int v(y)\bar m'(y)\,dy = 0$, and since $\int \bar m'(y)\,dy = 2$,
\[ v(x) = \frac12\int_{-\infty}^{\infty} \bar m'(y)\left( \int_y^x v'(z)\,dz \right) dy. \tag{2.6} \]
Hence
\[ (Av)' = Av' + Kv', \tag{2.7} \]
where
\[ K\phi(x) = \frac12 V'(x)\int_{-\infty}^{\infty} \bar m'(y)\left( \int_y^x \phi(z)\,dz \right) dy. \]
The operator $K$ is compact on $L^2$. A detailed proof in a closely related case is given in [4]. Now consider the quadratic form $Q(\phi)$ given by
\[ Q(\phi) = \|(A + K)\phi\|_2^2 \]
for $\phi$ in the domain of $A$. We next show that $Q(\phi) > 0$ for all nonzero $\phi$ in its domain. Suppose on the contrary that $Q(\phi) = 0$ for some $\phi$ in the domain of $Q$, which is the operator domain of $A$. Define
\[ \eta(x) = \int_0^x \phi(y)\,dy = \langle 1_{[0,x]}, \phi\rangle. \]
It follows by the Schwarz inequality that
\[ |\eta(x)| \le \|\phi\|_2\sqrt{|x|} \quad\text{for all } x. \tag{2.8} \]
It then follows that $K\phi = V'\eta - \frac12 V'\langle \bar m', \eta\rangle$, where the inner product on the right is well defined because of the exponential decay of $\bar m'$ and (2.8). Hence
\[ (A + K)\phi = A\eta' + V'\eta - \frac12 V'\langle \bar m', \eta\rangle = (A\eta)' - \frac12 V'\langle \bar m', \eta\rangle. \]
Since the right side is a total derivative, we have
\[ A\eta - \frac12 V\langle \bar m', \eta\rangle = C, \tag{2.9} \]
where $C$ is a constant. To determine $C$, multiply both sides by $\bar m'$, and integrate. Note that $\int \bar m'(A\eta)\,dx = 0$, because (2.8) permits the integration by parts. The computation then yields $C = (1/2)\langle \bar m', \eta\rangle$. Putting this in (2.9) yields $A\big(\eta - (1/2)\langle \bar m', \eta\rangle\big) = 0$.

Now any solution $\psi$ of $A\psi = 0$ either decays exponentially or diverges exponentially at infinity, since, due to the rapid decay of $\bar m'$, and hence $V$, $\psi'' \approx \psi$. The only option consistent with (2.8) is exponential decay. Hence we must have that $\eta - (1/2)\langle \bar m', \eta\rangle$ is in the $L^2$ kernel of $A$. However, we know from Lemma 1.2 that this is spanned by $\bar m'$. So we must have $\eta - (1/2)\langle \bar m', \eta\rangle = \alpha\bar m'$. Integrating both sides against $\bar m'$ yields $\alpha = 0$. Hence $\eta$ is constant, and so $\phi = 0$, as was to be shown.
We will now show that there is a $\gamma > 0$ so that
\[ Q(\phi) \ge \gamma\|\phi\|_2^2 \tag{2.10} \]
for all $\phi$. The proof is similar to the proof of Weyl's lemma, though note that $A + K$ is not self adjoint. If (2.10) were false, there would exist an infinite orthonormal sequence $\{\phi_n\}$ in $L^2$ such that $\lim_{n\to\infty} Q(\phi_n) = 0$. Since the sequence $\{\phi_n\}$ is orthonormal, it converges weakly to zero. Next, let $c_n = \langle \phi_n, \bar m'\rangle$ and note that $\lim_{n\to\infty} c_n = 0$. If the $c_n$ are not all zero, let $n_0$ be such that $|c_{n_0}| \ge |c_n|$ for all $n$, and define $\tilde\phi_n = \phi_n - (c_n/c_{n_0})\phi_{n_0}$. It is clear that the $\tilde\phi_n$ are all orthogonal to $\bar m'$, and moreover the modified sequence still converges weakly to zero, and still satisfies $\lim_{n\to\infty} Q(\tilde\phi_n) = 0$ and $\lim_{n\to\infty}\|\tilde\phi_n\|_2^2 = 1$. (If all of the $c_n$ vanish, we simply take $\tilde\phi_n = \phi_n$ for all $n$.) Moreover, by Lemma 1.2,
\[ \|A\tilde\phi_n\|_2^2 \ge \frac{9}{16}\|\tilde\phi_n\|_2^2. \tag{2.11} \]
Since the sequence $\{\tilde\phi_n\}$ converges weakly to zero,
\[ \lim_{n\to\infty} K\tilde\phi_n = 0 \tag{2.12} \]
strongly in $L^2$. Also, it is clear that the operator domain of $A$ is the form domain of $Q$ and that $\|A\phi\|_2^2 \le 2\big(Q(\phi) + \|K\phi\|_2^2\big)$ on this domain. Thus,
\[ \|A\tilde\phi_n\|_2^2 \le 2\big(Q(\tilde\phi_n) + \|K\|^2\|\tilde\phi_n\|_2^2\big), \tag{2.13} \]
where $\|K\|$ denotes the operator norm of $K$ on $L^2$. In particular, the $\|A\tilde\phi_n\|_2$ are uniformly bounded by a finite constant. Now,
\[ Q(\tilde\phi_n) \le \|A\tilde\phi_n\|_2^2 + \|K\tilde\phi_n\|_2^2 + 2\|A\tilde\phi_n\|_2\|K\tilde\phi_n\|_2. \tag{2.14} \]
By (2.12) and (2.13), the last two terms on the right in (2.14) tend to zero with $n$. Hence for any $\epsilon > 0$, we obtain that $\|A\tilde\phi_n\|_2^2 \le \epsilon\|\tilde\phi_n\|_2^2$ for all sufficiently large $n$, which would contradict (2.11). This proves (2.10). Now by (2.7), when $\langle \bar m', v\rangle = 0$,
\[ \|(Av)'\|_2^2 = Q(v'), \]
and hence we have the result.

Combining this result with (2.4), we have
\[ -\frac{d}{dt}\mathcal F(m) \ge (1 - 2\epsilon)\int_{\mathbb R} \big|(Av)'\big|^2\,dx + \epsilon\gamma\|v'\|_2^2 - \frac1\epsilon\int_{\mathbb R} |U|^2\,dx. \tag{2.15} \]
We next show that the quantity on the last line is positive whenever $\delta$ and $\kappa$ are small enough. To accomplish this, we use the following lemma:

Lemma 2.2. Let $v \in L^2(\mathbb R)$, $v' \in L^2(\mathbb R)$. For any $\kappa > 0$ and $\epsilon_0 > 0$ small enough, there exists $\delta(\kappa, \epsilon_0) > 0$ such that the following estimate holds:
\[ \int_{\mathbb R} |U(v)|^2\,dx \le \epsilon_0\int |v'|^2\,dx, \tag{2.16} \]
provided $\|v\|_2 \le \delta$, $\|v'\|_2 \le \kappa$.
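Proof. This follows directly from (2.2) and the bound $\|v\|_\infty^2 \le 2\|v\|_2\|v'\|_2$.

Proof of Theorem 1.4. Now choose $\kappa$ and $\delta$ so that $\epsilon_0 \le \epsilon^2\gamma$, and then from (2.15), we have the inequality of Theorem 2.1.

We now prove a bound that will enable us to apply the dissipation–dichotomy argument described in the introduction.

Theorem 2.3. For all $\epsilon > 0$, there is an $\epsilon_0 > 0$ such that for all $v$ orthogonal to $\bar m'$ with
\[ \mathcal I(\bar m + v) = \|(Av)'\|_2^2 \le \epsilon_0^2\langle v, Av\rangle \tag{2.17} \]
one has
\[ (1-\epsilon)\|Av\|_2^2 \le \langle v, Av\rangle \le (1+\epsilon)\|Av\|_2^2. \tag{2.18} \]

Proof. First, by Lemma 1.2, inserting $A^{1/2}v$ in place of $v$,
\[ \langle v, Av\rangle \le \frac43\|Av\|_2^2, \tag{2.19} \]
so we have that $\|(Av)'\|_2^2 \le (4\epsilon_0^2/3)\|Av\|_2^2$. Then, using the notation of Lemma 1.2,
\[ \|Av\|_2^2 - \langle v, Av\rangle = \langle -v'', Av\rangle + \langle Vv, Av\rangle \le \langle v', (Av)'\rangle + |\langle Vv, Av\rangle|. \]
Now $|\langle Vv, Av\rangle| \le \|v\|_2\|V\|_2\|Av\|_\infty$ and by (2.17) and (2.19),
\[ \|Av\|_\infty^2 \le 2\|Av\|_2\|(Av)'\|_2 \le \frac{8\epsilon_0}{3}\|Av\|_2^2. \]
Then, by Lemma 1.2 and Schwarz's inequality, $\|v\|_2 \le (4/3)\|Av\|_2$, so that, recalling from the proof of Lemma 1.2 that $\|V\|_2^2 = 6$,
\[ |\langle Vv, Av\rangle| \le 8\sqrt{\frac{\epsilon_0}{3}}\,\|Av\|_2^2. \tag{2.20} \]
Next we bound $\langle v', (Av)'\rangle$. First, an easy application of (2.17) and (2.19) yields
\[ \langle v', (Av)'\rangle \le \|v'\|_2\|(Av)'\|_2 \le \epsilon_0\sqrt{\frac43}\,\|v'\|_2\|Av\|_2. \tag{2.21} \]
By Theorem 2.1, $\|v'\|_2 \le (1/\sqrt\gamma)\|(Av)'\|_2$; hence applying (2.17) and (2.19) again,
\[ \langle v', (Av)'\rangle \le \epsilon_0^2\,\frac{4}{3\sqrt\gamma}\,\|Av\|_2^2. \tag{2.22} \]
Combining (2.20) and (2.22), we have the result.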
3. Moment Estimates

In this section we prove Theorem 1.5, which bounds the growth of
\[ \phi(t) = 1 + \int_{\mathbb R} |x\,(Av)|^2\,dx + C\big[\mathcal F(\bar m + v) - \mathcal F(\bar m)\big], \tag{3.1} \]
where $C$ is a positive constant to be specified. Actually,
\[ 1 + C\big[\mathcal F(\bar m + v) - \mathcal F(\bar m)\big] \tag{3.2} \]
is non-negative and monotone decreasing, so as far as growth is concerned, the quantity of real interest is
\[ \psi(t) = \int_{\mathbb R} |x\,(Av)|^2\,dx. \tag{3.3} \]
However, (3.2) contributes negative terms to the time derivative of $\phi(t)$ that serve to absorb certain terms that cannot be controlled in terms of the excess free energy, due to the unboundedness of the operator $A$.

Recall that $A$ means $A_{a(t)}$, where the solution $m(x,t)$ has the form $m(x,t) = v(x,t) + \bar m_{a(t)}(x)$, and $a(t)$ minimizes $\|m(t) - \bar m_a\|_2^2$. Therefore, it follows from (1.14) that
∂ ∂ ˙ (3.4) Aa(t) v(t) = Aa(t) v(t) − 3m ¯ a(t) a(t), ∂t ∂t where a(t) ˙ denotes the derivative of a(t). We can also rewrite the evolution equation (1.5) in terms of v(t) = m(t) − m ¯ a(t) , and doing so we obtain
\[ A_{a(t)}\frac{\partial}{\partial t}v(t) = A_{a(t)}\Big[ A_{a(t)}v(t) + \frac12\big(3\bar m_{a(t)}v^2(t) + v^3(t)\big) \Big]''. \tag{3.5} \]
(This time there is no contribution involving $\dot a(t)$ since $\bar m'_{a(t)}$ is annihilated by $A_{a(t)}$.) Note that the first term on the right is linear in $v$, and the second term is higher order. The main contribution will come from the linear term, and it is this that we must work hardest to control.

To control the term involving $\dot a(t)$, first note that
\[ \int (m(t) - \bar m_{a(t)})\,\bar m'_{a(t)}\,dx = 0, \]
which holds for all $t$. Differentiating this equation in $t$, one obtains
\[ \dot a(t)\Big[ \|\bar m'_a\|_2^2 - \langle v, \bar m''_a\rangle \Big] = -\int (\partial m/\partial t)\,\bar m'_a\,dx. \]
Thus, we have
\[ |\dot a(t)| \le 2\left| \int \Big(\frac{\delta\mathcal F}{\delta m}\Big)''\,\bar m'_a\,dx \right| \le 2\sqrt{\mathcal I(m(t))}\,\|\bar m''\|_2, \tag{3.6} \]
as long as $\|v\|_2$ is sufficiently small that $\|\bar m'_a\|_2^2 - \langle v, \bar m''_a\rangle > 1/2$. Since $\bar m'$ has exponential decay, this gives us the bounds we will need to control the effects of the terms involving $\dot a(t)$, as we will see below. The non-linear terms are easily handled without any preparatory analysis.
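We now turn to the linear part, which will provide all of the most important terms. Consider the growth of $\psi(t)$ when $v$ evolves according to the linearized equation
\[ \frac{\partial}{\partial t}v = (Av)''. \tag{3.7} \]
The computations that follow can be more clearly and compactly represented if we introduce the notation
\[ \xi = x\,(Av)' \quad\text{and}\quad \eta = Av. \tag{3.8} \]

Lemma 3.1. Let $v(x,t)$ solve (3.7), and let $\psi(t)$ be defined in terms of $v$ through (3.3). Then for any $\alpha > 0$,
\[ \frac{d}{dt}\psi(t) \le \left( 12 + \frac{1 + 4\|V\|_1}{2\alpha} \right)\langle \eta', \eta'\rangle + \left( 2 + \frac{\alpha}{2}\|x^2V'\|_\infty^2 + 2\alpha\|V\|_1 \right)\langle \eta, \eta\rangle, \tag{3.9} \]
where $\eta = Av$.

Proof. Let $V$ be the potential defined in (1.15). Then one easily computes the commutators
\[ \Big[\frac{\partial}{\partial x}, A\Big] = V' \quad\text{and}\quad [x, A] = 2\frac{\partial}{\partial x}. \tag{3.10} \]
Clearly,
\[ \frac{d}{dt}\psi(t) = 2\int_{\mathbb R} x^2\,(Av)\,A(Av)''\,dx. \]
Now one commutes derivatives and multiples of $x$ past $A$ and integrates by parts to obtain a dissipative term of the form $-\int_{\mathbb R} \big(x(Av)'\big)\,A\,\big(x(Av)'\big)\,dx$ into which positive terms can be absorbed. The result, in the notation (3.8), is that
\[ \frac{d}{dt}\psi(t) = -2\langle \xi, A\xi\rangle - 4\langle \xi, \eta''\rangle - 4\langle x\eta, A\eta'\rangle - 2\langle \eta', x^2V'\eta\rangle. \]
The last three terms require further manipulation. First:
\[ \langle \xi, \eta''\rangle = -\langle \xi', \eta'\rangle = -\langle x\eta'', \eta'\rangle - \langle \eta', \eta'\rangle = -\frac12\langle \eta', \eta'\rangle. \]
This term is controlled by the derivative of the excess free energy. Second, one has, using (3.10),
\[ \langle x\eta, A\eta'\rangle = \langle \eta, A\xi\rangle + 2\langle \eta, \eta''\rangle = \langle \eta, A\xi\rangle - 2\langle \eta', \eta'\rangle. \]
Finally, for any $\alpha > 0$,
\[ \langle \eta', x^2V'\eta\rangle \le \langle \eta', \eta'\rangle^{1/2}\langle (x^2V')\eta, (x^2V')\eta\rangle^{1/2} \le \frac{1}{2\alpha}\langle \eta', \eta'\rangle + \frac{\alpha}{2}\|x^2V'\|_\infty^2\langle \eta, \eta\rangle. \]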
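Putting everything together, one obtains:
\[ \frac{d}{dt}\psi(t) \le -2\langle\xi, A\xi\rangle - 4\langle\xi, A\eta\rangle + \Bigl(10 + \frac{1}{2\alpha}\Bigr)\langle\eta',\eta'\rangle + \frac{\alpha}{2}\|x^2V'\|_\infty^2\langle\eta,\eta\rangle. \]
Now one uses that
\[ -2\langle\xi, A\xi\rangle - 4\langle\xi, A\eta\rangle = -2\langle(\xi+\eta), A(\xi+\eta)\rangle + 2\langle\eta, A\eta\rangle. \tag{3.11} \]
But
\[ \langle\eta, A\eta\rangle = \langle\eta',\eta'\rangle + \langle\eta, V\eta\rangle + \langle\eta,\eta\rangle \le \langle\eta',\eta'\rangle + \|V\|_1\|\eta\|_\infty^2 + \langle\eta,\eta\rangle, \]
and
\[ \|\eta\|_\infty^2 \le 2\|\eta\|_2\|\eta'\|_2 \le \frac{1}{\alpha}\langle\eta',\eta'\rangle + \alpha\langle\eta,\eta\rangle. \]
Altogether
\[ \langle\eta, A\eta\rangle \le \Bigl(1 + \frac{\|V\|_1}{\alpha}\Bigr)\langle\eta',\eta'\rangle + (1 + \alpha\|V\|_1)\langle\eta,\eta\rangle. \tag{3.12} \]
Putting (3.12) into (3.11) gives the result.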
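The commutator identities (3.10) drive the whole proof of Lemma 3.1, so a quick numerical sanity check may be useful. The following is only a sketch under stated assumptions: it takes $A = -d^2/dx^2 + V + 1$ (the form implied by the expansion of $\langle\eta, A\eta\rangle$ above), replaces the paper's potential from (1.15) by a hypothetical smooth stand-in `V`, and uses central differences with an arbitrary test function `f`. None of these choices comes from the paper.

```python
import math

H = 1e-3  # finite-difference step (illustrative choice)

def V(x):
    # Hypothetical smooth, decaying potential; the paper's V is fixed by (1.15).
    return -1.5 / math.cosh(x / 2.0) ** 2

def dV(x):
    # Exact derivative of the illustrative potential above.
    return 1.5 * math.tanh(x / 2.0) / math.cosh(x / 2.0) ** 2

def f(x):
    # Arbitrary smooth test function.
    return (1.0 + x) * math.exp(-x * x)

def df(x):
    # Exact derivative of f.
    return (1.0 - 2.0 * x - 2.0 * x * x) * math.exp(-x * x)

def D(g):
    # Central-difference approximation of d/dx.
    return lambda x: (g(x + H) - g(x - H)) / (2.0 * H)

def A(g):
    # A = -d^2/dx^2 + V + 1, with a central second difference.
    return lambda x: -(g(x + H) - 2.0 * g(x) + g(x - H)) / H**2 + (V(x) + 1.0) * g(x)

def comm_dA(x):
    # ([d/dx, A] f)(x), expected to be close to V'(x) f(x).
    return D(A(f))(x) - A(D(f))(x)

def comm_xA(x):
    # ([x, A] f)(x), expected to be close to 2 f'(x).
    return x * A(f)(x) - A(lambda y: y * f(y))(x)
```

The identities do not depend on the particular `V` or `f`; any smooth choices should give agreement up to discretization error.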
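Lemma 3.2. $\langle\eta,\eta\rangle \le \langle\eta',\eta'\rangle + \langle v, Av\rangle$.

Proof. By Schwarz, for any $\alpha > 0$,
\[ \langle\eta,\eta\rangle = \langle A^{1/2}\eta, A^{1/2}v\rangle \le \langle\eta, A\eta\rangle^{1/2}\langle v, Av\rangle^{1/2} \le \frac{\alpha}{2}\langle\eta, A\eta\rangle + \frac{1}{2\alpha}\langle v, Av\rangle, \]
and $\langle\eta, A\eta\rangle \le \langle\eta',\eta'\rangle + \|V+1\|_\infty\langle\eta,\eta\rangle$. Since $\|V+1\|_\infty = 1$, we can choose $\alpha = 1$ and combine the above to obtain the result.

Proof of Theorem 1.5. First, we deal with the inhomogeneous terms involving $\dot a(t)$ on the right in (3.4), as they contribute to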
x 2 Aa(t) v ∂ Aa(t) v dx . ∂t R By symmetry and the Schwarz inequality, we have that
2 ˙ 3 A v x m ¯ dx ≤ 3 Aa(t) x 2 m ¯ a(t) 2 v 2 |a(t)|. ˙ a(t) a(t) |a(t)| R
Now applying (3.6) , the contribution of the term involving a(t) ˙ is bounded above by ¯ a(t) 2 v 2 m ¯ a(t) 2 I(m(t)). 6 Aa(t) x 2 m It is here that we begin using the hypothesis that |a(t)| ≤ 1. The exponential decay of m ¯ a(t) would not give a bound on Aa(t) x 2 m ¯ a(t) 2 that is uniform in t if |a(t)| gets large. Since this is precluded by the hypotheses, for any α > 0, there is a universal constant Kα so that
Kα 3 x 2 Aa(t) v m ¯ a(t) dx |a(t)| ˙ ≤ (3.13) I(m(t)) + α v 22 . α R
Note that the first term on the right in (3.13) can be absorbed into the negative contribution from the inclusion of the multiple $C$ of the excess free energy in $\phi$, at least if $C$ is chosen appropriately large. Therefore, since we can take $\alpha$ arbitrarily small, and can bound $\|v\|_2^2$ in terms of the excess free energy by Lemma 1.3, this term is under control. One even more easily handles the contributions of the nonlinear terms in (3.5) using the bound $\|v\|_\infty^2 \le 2\|v\|_2\|v'\|_2$. We do not give the details here, but turn to the application of the lemmas from this section to control the contribution from the linear terms. To apply Lemma 3.1, choose $\alpha$ so that
\[ 2 + \frac{\alpha}{2}\|x^2V'\|_\infty^2 + 2\alpha\|V\|_1 \le 2\Bigl(1 + \frac{\epsilon}{4}\Bigr). \]
Then, for this choice of $\alpha$, and using the notation from (3.8),
\[ \frac{d}{dt}\psi(t) \le \Bigl(12 + \frac{1 + 4\|V\|_1}{2\alpha}\Bigr)\langle\eta',\eta'\rangle + 2\Bigl(1 + \frac{\epsilon}{4}\Bigr)\langle\eta,\eta\rangle. \tag{3.14} \]
Next, by Theorem 1.4,
\[ C\frac{d}{dt}\bigl[F(\bar m + v) - F(\bar m)\bigr] \le -C(1-\epsilon)\langle\eta',\eta'\rangle. \]
Therefore, if we choose $C$ so that $C(1-\epsilon) \ge 12 + (1 + 4\|V\|_1)/(2\alpha)$, we get
\[ \frac{d}{dt}\phi(t) \le 2\Bigl(1 + \frac{\epsilon}{4}\Bigr)\langle\eta,\eta\rangle. \]
It remains to bound $\|\eta\|_2^2$. There are two cases. First suppose that the dissipation is small compared to the excess free energy so that (1.18) holds. Then by Theorem 2.3,
η 22 ≤ (1 + )v, Av, and then by Lemma 1.3, η 22 ≤ (1 + )3 [F(m(t)) − F(m)], ¯ for δ and κ sufficiently small. Redefining , we have proved (1.22) under the hypothesis (1.18). If we don’t assume (1.18), we use
η 22 = v , η + (V + 1)v, η ≤ v 2 I(m(t)) + v 2 η 2 since v+1 ∞ = 1. This leads to η 22 ≤ (2/γ )I(m(t))+4 v 22 , where γ is the constant in Theorem 1.4. Again, the term involving I(m(t)) can be absorbed by an appropriate choice of C. The remaining term is easily handled by Lemma 1.2 and Lemma 1.3, and so (1.23) is established.
4. Proof of the Main Theorem

We will be brief in the presentation of this proof since from this point on, it is very close to the one we have given for the LOP equation in Sect. 4 of [5]. Let $m(t)$ be a solution of (1.5) with initial data as specified in Theorem 1.1, where the size of $\delta$ is to be specified in the course of the proof. The first step is to wait a bit to acquire some smoothness. For any fixed $\kappa > 0$, if initially $\|v\|_2 \le \delta/4$, where $\delta$ is sufficiently small, we will have that $\|v(1)\|_2 \le \delta/2$ and $\|v'(1)\|_2 \le \kappa/2$, and moreover $|a(1)|$ will be small. Regularity theory for $m(t)$ can be found in [3]. Also, the smoothness estimates produced in Sect. 2 of [5] are easily adapted to this case to see the validity of the above assertion.
We now begin the analysis from this starting point. All of the lemmas and theorems that required $\|v(1)\|_2 \le \delta$, $\|v'(1)\|_2 \le \kappa$, and $|a(1)| < 1$ can be used until time $T$, which is the first time that any of them is violated. Of course, we have to show that such a time $T$ never occurs. Let $f(t)$ and $\phi(t)$ be given in terms of $m(t)$ as in the introduction. We begin by assuming that at time $t$, (1.18) holds. Then by Theorem 1.4,
\[ \frac{d}{dt}f(t) \le -(1-\epsilon)\|(Av)'\|_2^2. \]
By convexity $\|(Av)'\|_2^2 \ge \|(A\rho * v)'\|_2^2$, where $\rho = (1/2)\bar m'$, which is a probability density. Because $v$ is orthogonal to $\bar m'$, $\rho * v(a(t)) = 0$. Therefore, by the constrained uncertainty principle (1.24),
\[ \|(Av)'\|_2^2 \ge \|(A\rho * v)'\|_2^2 \ge \frac{9}{4}\,\frac{\|A\rho * v\|_2^4}{\|(x - a(t))(A\rho * v)\|_2^2}. \]
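The constant 9/4 in the constrained uncertainty principle can be checked numerically. The sketch below assumes, from the way (1.24) is applied above, that the inequality reads $\|g'\|_2^2 \ge (9/4)\|g\|_2^4/\|xg\|_2^2$ for functions $g$ vanishing at the origin (taking $a = 0$). The two test functions and the trapezoid quadrature are illustrative choices, not taken from the paper. For $g(x) = x e^{-x^2/2}$ both sides equal $\tfrac34\sqrt{\pi}$, so the constant cannot be improved.

```python
import math

def integrate(g, lo=-12.0, hi=12.0, n=24000):
    # Composite trapezoid rule on [lo, hi].
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * h)
    return total * h

def uncertainty_sides(f, df):
    # Returns (||f'||^2, (9/4) ||f||^4 / ||x f||^2) for a function with f(0) = 0.
    norm_f = integrate(lambda x: f(x) ** 2)
    norm_df = integrate(lambda x: df(x) ** 2)
    norm_xf = integrate(lambda x: (x * f(x)) ** 2)
    return norm_df, 2.25 * norm_f ** 2 / norm_xf

# Extremal case: f(x) = x exp(-x^2/2), for which the two sides coincide.
gauss = uncertainty_sides(lambda x: x * math.exp(-0.5 * x * x),
                          lambda x: (1.0 - x * x) * math.exp(-0.5 * x * x))

# Non-extremal case: f(x) = x exp(-|x|), for which the inequality is strict.
expo = uncertainty_sides(lambda x: x * math.exp(-abs(x)),
                         lambda x: (1.0 - abs(x)) * math.exp(-abs(x)))
```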
Now under the condition (1.18), $v$ is so smooth and spread out that $\rho * v \approx v$, and we do not lose much in passing from $v$ to $\rho * v$. The estimates are straightforward, making use of (3.10), and are exactly like those applied on pp. 868–869 of [5]. Without repeating the details, the result is that
\[ \|(Av)'\|_2^2 \ge \frac{9}{4}(1-\epsilon)^2\,\frac{\|Av\|_2^4}{\|(x - a(t))(Av)\|_2^2}, \]
and hence that, with $\epsilon$ redefined, and making use of Lemma 1.3,
\[ \frac{d}{dt}f(t) \le -9(1-\epsilon)\frac{f^2(t)}{\phi(t)}, \]
where we have used the fact that $|a(t)| < 1$ to absorb the effects of $a(t)$ into the constant term. By Theorem 1.5, we have that
\[ \frac{d}{dt}\phi(t) \le 4(1+\epsilon)f(t). \]
Hence for such $t$, we have (1.30) satisfied with $A/(A+B)$ arbitrarily close to $9/13$. Now suppose that (1.19) holds. Then we have
\[ \frac{d}{dt}\phi(t) \le \tilde B f(t) \]
from the second half of Theorem 1.5, where $\tilde B$ is the constant $K$ given there. From (1.19),
\[ \frac{d}{dt}f(t) \le -\epsilon_1 f(t) \le -\tilde A\frac{f^2(t)}{\phi(t)}, \tag{4.1} \]
where $\tilde A$ can be chosen as large as we like provided $f(t)$ is sufficiently small. Thus with $\delta$ chosen sufficiently small, as long as $f(t) < \delta$ holds, we have (??) and can arrange for it to hold with a value of $\tilde A$ so that $\tilde A/(\tilde A + \tilde B) = A/(A+B)$. Thus, by rescaling the time in those time intervals in which (1.19) holds, i.e., possibly using a slower clock there, we have a system holding for all $t$. The details of this argument are exactly as in Sect. 5 of [5]. One now concludes that as long as $|a(t)| < 1$, $\|v(t)\|_2 \le \delta$ and $\|v'(t)\|_2 \le \kappa$, $f(t)$ decays at a rate close to $t^{-9/13}$ (using the slower of the two time scales). Therefore, as in [5], $|a(t)| < 1$, $\|v(t)\|_2 \le \delta$ and $\|v'(t)\|_2 \le \kappa$ hold for all $t$, and so $f(t)$ decays all the way to zero at a rate close to $t^{-9/13}$, as in Theorem 1.1. As explained at the end of Sect. 1 of this paper, this means that $\|Av(t)\|_1$ decays to zero at an algebraic rate, and that this forces $\lim_{t\to\infty} a(t) = a$, where $a$ is given by the conservation law.
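To see where the exponent $9/13$ comes from, one can look at the extremal case in which the two differential inequalities hold with equality, $f' = -Af^2/\phi$ and $\phi' = Bf$. The sketch below is an illustration under that assumption only; the values $A = 9$, $B = 4$, unit initial data and the RK4 integrator are choices made here, not taken from the paper. A direct substitution shows that $f(t) = f_0(1 + t/\tau)^{-p}$ with $p = A/(A+B)$ and $\tau = p\,\phi_0/(A f_0)$ solves this system, and the code compares that closed form with a numerical integration.

```python
import math

def rhs(f, phi, A, B):
    # Extremal case of the two differential inequalities, taken as equalities.
    return -A * f * f / phi, B * f

def rk4(t_end, dt=1e-3, f0=1.0, phi0=1.0, A=9.0, B=4.0):
    # Classical fourth-order Runge-Kutta integration of (f, phi) up to t_end.
    f, phi = f0, phi0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(f, phi, A, B)
        k2 = rhs(f + 0.5 * dt * k1[0], phi + 0.5 * dt * k1[1], A, B)
        k3 = rhs(f + 0.5 * dt * k2[0], phi + 0.5 * dt * k2[1], A, B)
        k4 = rhs(f + dt * k3[0], phi + dt * k3[1], A, B)
        f += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6.0
        phi += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6.0
    return f, phi

def exact_f(t, f0=1.0, phi0=1.0, A=9.0, B=4.0):
    # Closed-form solution f(t) = f0 (1 + t/tau)^(-p), p = A/(A+B).
    p = A / (A + B)
    tau = p * phi0 / (A * f0)
    return f0 * (1.0 + t / tau) ** (-p)

def local_exponent(t1, t2):
    # Effective decay exponent of the closed form between times t1 < t2.
    return math.log(exact_f(t1) / exact_f(t2)) / math.log(t2 / t1)
```

For large times `local_exponent` approaches $A/(A+B) = 9/13$; the actual proof only has inequalities, so this is an upper envelope for the decay of $f$, not a statement about any particular solution of (1.5).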
References

1. Asselah, A.: Stability of a wave front for a nonlocal conservative evolution. Proc. Royal Soc. Edinburgh 128 A, no. 2, 219–234 (1998)
2. Bricmont, J., Kupiainen, A., Taskinen, J.: Stability of Cahn–Hilliard Fronts. Comm. Pure and Appli. Math. 52, no. 7, 839–871 (1999)
3. Caffarelli, L., Muler, N.E.: An L∞ bound for solutions of the Cahn–Hilliard equation. Arch. Rational Mech. Anal. 133, 129–144 (1995)
4. Carlen, E.A., Carvalho, M.C., Orlandi, E.: Algebraic rate of decay for the excess free energy and stability of fronts for a non-local phase kinetics equation with a conservation law I. J. Stat. Phys. 95, no. 5/6, 1069–1117 (1999)
5. Carlen, E.A., Carvalho, M.C., Orlandi, E.: Algebraic rate of decay for the excess free energy and stability of fronts for a non-local phase kinetics equation with a conservation law II. Comm. P.D.E. 25, no. 5/6, 847–886 (2000)
6. De Masi, A., Orlandi, E., Presutti, E., Triolo, E.: Stability of the interface in a model of phase separation. Proc. Royal Soc. Edinburgh 124A, 1013–1022 (1994)
7. Giacomin, G., Lebowitz, J.: Phase segregation dynamics in particle systems with long range interactions I: Macroscopic limits. J. Stat. Phys. 87, no. 1/2, 37–61 (1997)
8. Hardy, G., Littlewood, J., and Polya, G.: Inequalities. Cambridge: Cambridge Univ. Press, 1932
9. Lieb, E.H., Thirring, W.: Inequalities of the moments of the eigenvalues of the Schrödinger Hamiltonian and their relation to Sobolev inequalities. In: Studies in Mathematical Physics, Essays in honor of Valentine Bargmann, edited by Lieb, Simon and Wightman, Princeton, NJ: Princeton University Press, 1976, pp. 269–303
10. Lebowitz, J.L., Orlandi, E., Presutti, E.: A Particle model for spinodal decomposition. J. Stat. Phys. 63, 933–974 (1991)
11. Weyl, H.: Gruppentheorie und Quantenmechanik. Leipzig: Wissenschaftlicher Verlag, 1926

Communicated by A. Kupiainen
Commun. Math. Phys. 224, 341 – 372 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
On Quasi-Hopf Superalgebras

Mark D. Gould, Yao-Zhong Zhang, Phillip S. Isaac

Department of Mathematics, The University of Queensland, Brisbane, Qld 4072, Australia. E-mail: [email protected]

Received: 14 December 1998 / Accepted: 29 January 2000
Abstract: In this work we investigate several important aspects of the structure theory of the recently introduced quasi-Hopf superalgebras (QHSAs), which play a fundamental role in knot theory and integrable systems. In particular we introduce the opposite structure and prove in detail (for the graded case) Drinfeld's result that the coproduct $\Delta' \equiv (S\otimes S)\cdot T\cdot\Delta\cdot S^{-1}$ induced on a QHSA is obtained from the coproduct $\Delta$ by twisting. The corresponding "Drinfeld twist" $F_D$ is explicitly constructed, as well as its inverse, and we investigate the complete QHSA associated with $\Delta'$. We give a universal proof that the coassociator $\Phi' = (S\otimes S\otimes S)\Phi_{321}$ and canonical elements $\alpha' = S(\beta)$, $\beta' = S(\alpha)$ correspond to twisting the original coassociator $\Phi = \Phi_{123}$ and canonical elements $\alpha$, $\beta$ with the Drinfeld twist $F_D$. Moreover in the quasi-triangular case, it is shown algebraically that the R-matrix $R' = (S\otimes S)R$ corresponds to twisting the original R-matrix $R$ with $F_D$. This has important consequences in knot theory, which will be investigated elsewhere.

Current address: Graduate School of Mathematical Sciences, The University of Tokyo, 3-8-1 Komaba, Meguro, Tokyo 153-8914, Japan. E-mail: [email protected]

1. Introduction

The main aim of this paper, in conjunction with [1], is to continue the work introduced in [2] which defines $\mathbb{Z}_2$ graded versions of Drinfeld's quasi-Hopf algebras [3], called quasi-Hopf superalgebras (QHSAs). In particular, we show that the special QHSA structure obtained by application of the antipode (see Proposition 4) actually coincides with the quasi-Hopf superalgebra structure induced by twisting with $F_D$, the "Drinfeld twist" (see Eq. (4.10)). In the quasi-triangular case, our results in this direction are new, even in the non-graded case. The potential for application of these new structures is enormous. They give rise to new (non-standard) representations of the braid group and corresponding link polynomials which will be investigated elsewhere. Moreover, it has already been shown in [4–8]
and [2] that QHSAs are directly relevant to elliptic quantum (super)groups [9, 10], which are useful in obtaining elliptic solutions [11–16] to the (graded) quantum Yang-Baxter equation. The importance of QHSAs in supersymmetric integrable models and the theory of knots and links [17] should become evident as the theory is developed further, which is the aim of this paper. In particular, the opposite structure is introduced and several aspects of their structure theory are investigated.

2. Quasi-Hopf Superalgebras and Twistings

This section is mostly a summary of the definitions and results given in [2]. They are important and worth restating here since they will be used frequently.

Definition 1. A $\mathbb{Z}_2$ graded quasi-bialgebra $A$ over $\mathbb{C}$ is a unital associative algebra equipped with algebra homomorphisms $\epsilon : A \to \mathbb{C}$ (counit), $\Delta : A \to A\otimes A$ (coproduct) together with an invertible homogeneous $\Phi \in A\otimes A\otimes A$ (coassociator) satisfying
\[ (1\otimes\Delta)\Delta(a) = \Phi^{-1}(\Delta\otimes1)\Delta(a)\,\Phi, \quad \forall a\in A, \tag{2.1} \]
\[ (\Delta\otimes1\otimes1)\Phi\cdot(1\otimes1\otimes\Delta)\Phi = (\Phi\otimes1)\cdot(1\otimes\Delta\otimes1)\Phi\cdot(1\otimes\Phi), \tag{2.2} \]
\[ (\epsilon\otimes1)\Delta = 1 = (1\otimes\epsilon)\Delta, \tag{2.3} \]
\[ (1\otimes\epsilon\otimes1)\Phi = 1. \tag{2.4} \]
Properties (2.2), (2.3) and (2.4) imply that
\[ (\epsilon\otimes1\otimes1)\Phi = 1 = (1\otimes1\otimes\epsilon)\Phi. \]
In this case, multiplication of tensor products is $\mathbb{Z}_2$ graded and defined as
\[ (a\otimes b)(c\otimes d) = (-1)^{[b][c]}\,ac\otimes bd \]
for homogeneous $a, b, c, d \in H$ and where $[a]\in\mathbb{Z}_2$ denotes the grading of $a$, so that we have the following important result which will be used frequently: $[a] = 1 \Rightarrow \epsilon(a) = 0$. Also, the twist map $T : H\otimes H \to H\otimes H$ is defined by
\[ T(a\otimes b) = (-1)^{[a][b]}\,b\otimes a. \]
Since $\Phi$ is homogeneous, the counit properties imply that $\Phi$ is even ($[\Phi] = 0$).

Definition 2. A QHSA $H$ is a $\mathbb{Z}_2$ graded quasi-bialgebra equipped with a $\mathbb{Z}_2$ graded antiautomorphism $S : H\to H$ (antipode) and homogeneous canonical elements $\alpha, \beta\in H$ such that for all $a\in H$,
\[ m\cdot(1\otimes\alpha)(S\otimes1)\Delta(a) = \epsilon(a)\alpha, \tag{2.5} \]
\[ m\cdot(1\otimes\beta)(1\otimes S)\Delta(a) = \epsilon(a)\beta, \tag{2.6} \]
\[ m(m\otimes1)\cdot(S\otimes1\otimes1)(1\otimes\alpha\otimes\beta)(1\otimes1\otimes S)\Phi = 1, \tag{2.7} \]
\[ m(m\otimes1)\cdot(1\otimes\beta\otimes\alpha)(1\otimes S\otimes1)\Phi^{-1} = 1. \tag{2.8} \]
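Here $m : H\otimes H\to H$ is the multiplication map, $m(a\otimes b) = ab$, $\forall a, b\in H$, and $S$ is defined by $S(ab) = (-1)^{[a][b]}S(b)S(a)$ for homogeneous $a, b$. This can be extended to inhomogeneous elements by linearity. Also, since $H$ is associative, $m(m\otimes1) = m(1\otimes m)$. If we apply $\epsilon$ to (2.7) and (2.8) we obtain, in view of Eq. (2.4), $\epsilon(\alpha)\epsilon(\beta) = \epsilon(\alpha\beta) = 1$, so that $[\alpha] = [\beta] = 0$. It then follows by applying $\epsilon$ to (2.5) and (2.6) that $\epsilon(S(a)) = \epsilon(a)$, $\forall a\in H$. If we write
\[ \Phi = \sum_\nu X_\nu\otimes Y_\nu\otimes Z_\nu, \]
and using the standard coproduct notation of Sweedler [18],
\[ \Delta(a) = \sum_{(a)} a_{(1)}\otimes a_{(2)}, \]
(2.5), (2.6), (2.7) and (2.8) may be expressed
\[ \sum_{(a)} S(a_{(1)})\,\alpha\,a_{(2)} = \epsilon(a)\alpha, \qquad \sum_{(a)} a_{(1)}\,\beta\,S(a_{(2)}) = \epsilon(a)\beta, \]
\[ 1 = \sum_\nu S(X_\nu)\,\alpha\,Y_\nu\,\beta\,S(Z_\nu) = \sum_\nu \bar X_\nu\,\beta\,S(\bar Y_\nu)\,\alpha\,\bar Z_\nu. \]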
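The relations (2.5) and (2.6), together with the graded product rule on $H\otimes H$, can be checked in the smallest graded example. The sketch below is an illustration and not part of the paper: it assumes the Grassmann algebra on one odd generator $\theta$ (with $\theta^2 = 0$), primitive coproduct $\Delta(\theta) = \theta\otimes1 + 1\otimes\theta$, antipode $S(\theta) = -\theta$ and $\epsilon(\theta) = 0$, which is a standard Hopf superalgebra, so that $\alpha = \beta = 1$ and the coassociator is trivial. All function names are hypothetical.

```python
from itertools import product

# Basis of the Grassmann algebra on one odd generator theta:
# index 0 <-> 1 (even), index 1 <-> theta (odd), with theta * theta = 0.

def mul(a, b):
    """Product in H for elements given as {basis index: coefficient}."""
    out = {}
    for (i, x), (j, y) in product(a.items(), b.items()):
        if i + j <= 1:
            out[i + j] = out.get(i + j, 0) + x * y
    return {k: v for k, v in out.items() if v != 0}

def tensor_mul(s, t):
    """Graded product on H (x) H: (a (x) b)(c (x) d) = (-1)^{[b][c]} ac (x) bd."""
    out = {}
    for ((i, j), x), ((k, l), y) in product(s.items(), t.items()):
        if i + k <= 1 and j + l <= 1:
            key = (i + k, j + l)
            out[key] = out.get(key, 0) + (-1) ** (j * k) * x * y
    return {key: v for key, v in out.items() if v != 0}

DELTA = {0: {(0, 0): 1}, 1: {(1, 0): 1, (0, 1): 1}}  # Delta(1), Delta(theta)
ANTIPODE = {0: {0: 1}, 1: {1: -1}}                    # S(1) = 1, S(theta) = -theta
COUNIT = {0: 1, 1: 0}                                 # eps(1) = 1, eps(theta) = 0

def contract(i, left):
    """sum S(a_(1)) a_(2) if left, else sum a_(1) S(a_(2)), for basis element i."""
    out = {}
    for (j, k), c in DELTA[i].items():
        term = mul(ANTIPODE[j], {k: 1}) if left else mul({j: 1}, ANTIPODE[k])
        for b, v in term.items():
            out[b] = out.get(b, 0) + c * v
    return {b: v for b, v in out.items() if v != 0}

def counit_times_one(i):
    """eps(a) * 1 for basis element i (alpha = beta = 1 in this example)."""
    return {0: COUNIT[i]} if COUNIT[i] != 0 else {}
```

With the graded sign, `tensor_mul(DELTA[1], DELTA[1])` vanishes, consistent with $\Delta(\theta^2) = 0$; without the factor $(-1)^{[b][c]}$ the cross terms would add up to $2\,\theta\otimes\theta$ and $\Delta$ would fail to be an algebra homomorphism.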
The definition of a QHSA is designed to ensure that its finite dimensional representations constitute a monoidal category. For example, a Hopf superalgebra is a QHSA with $\alpha = \beta = 1$ and $\Phi = 1^{\otimes3}$. In fact, the relation between QHSAs and Hopf superalgebras is analogous to that between quasi-triangular Hopf superalgebras and cocommutative ones. In the latter case cocommutativity is weakened while in the former case coassociativity is weakened (in the same sense). Before proceeding, it is important to establish some notation. For the coassociator and its inverse, we set
\[ \Phi_{123} \equiv \Phi = \sum_\nu X_\nu\otimes Y_\nu\otimes Z_\nu, \qquad \Phi^{-1}_{123} \equiv \Phi^{-1} = \sum_\nu \bar X_\nu\otimes\bar Y_\nu\otimes\bar Z_\nu. \]
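We may then define the elements $\Phi_{132}$ and $\Phi_{312}$ (for example) by applying appropriate twists to the positions so that
\[ \Phi_{132} = (1\otimes T)\Phi_{123} = \sum_\nu X_\nu\otimes Z_\nu\otimes Y_\nu\times(-1)^{[Y_\nu][Z_\nu]}, \]
\[ \Phi_{312} = (T\otimes1)\Phi_{132} = \sum_\nu Z_\nu\otimes X_\nu\otimes Y_\nu\times(-1)^{[Y_\nu][Z_\nu]+[X_\nu][Z_\nu]}, \]
and similarly for $\Phi^{-1}$, so that, for example,
\[ \Phi^{-1}_{231} = (1\otimes T)\Phi^{-1}_{213} = (1\otimes T)(T\otimes1)\Phi^{-1}_{123} = \sum_\nu \bar Y_\nu\otimes\bar Z_\nu\otimes\bar X_\nu\times(-1)^{[\bar X_\nu][\bar Y_\nu]+[\bar X_\nu][\bar Z_\nu]}. \]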
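The sign factors in these permuted coassociators follow mechanically from $T(a\otimes b) = (-1)^{[a][b]}b\otimes a$. As an illustration only (the helper names are hypothetical and not from the paper), the following sketch tracks the order of the factors and the accumulated sign exponent for a homogeneous tensor $X\otimes Y\otimes Z$, and checks the exponents quoted above for every assignment of parities.

```python
from itertools import product

def permute(parities, slots):
    """Apply graded twists T to X (x) Y (x) Z.

    parities: ([X], [Y], [Z]) as 0/1; slots: left slot index (0 or 1) of each
    twist, listed in order of application.
    Returns (order of the factors, sign exponent mod 2).
    """
    factors = list(zip("XYZ", parities))
    sign = 0
    for pos in slots:
        a, b = factors[pos], factors[pos + 1]
        factors[pos], factors[pos + 1] = b, a
        sign ^= a[1] & b[1]  # T(a (x) b) = (-1)^{[a][b]} b (x) a
    return "".join(label for label, _ in factors), sign

def check_all():
    for px, py, pz in product((0, 1), repeat=3):
        p = (px, py, pz)
        # (1 (x) T): the pattern of Phi_132
        assert permute(p, [1]) == ("XZY", py & pz)
        # (T (x) 1)(1 (x) T): the pattern of Phi_312
        assert permute(p, [1, 0]) == ("ZXY", (py & pz) ^ (px & pz))
        # (1 (x) T)(T (x) 1): the pattern of Phi^{-1}_231
        assert permute(p, [0, 1]) == ("YZX", (px & py) ^ (px & pz))
        # (T (x) 1)(1 (x) T)(T (x) 1): full reversal, as in the subscript 321
        assert permute(p, [0, 1, 0]) == ("ZYX", (px & py) ^ (px & pz) ^ (py & pz))
    return True
```

Here `^` and `&` implement addition and multiplication in $\mathbb{Z}_2$, so each returned exponent is the sum, mod 2, of the products of parities of the pairs that have been interchanged.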
Note that our convention differs from the usual one (see [3] for example) which employs the inverse permutations on the positions. However, this is simply notation and is not important below. We now have the following definition, which once again appears in [2], and which we include here for convenience.

Definition 3. A QHSA $H$ is called quasi-triangular if there exists an invertible homogeneous $R\in H\otimes H$ such that
\[ \Delta^T(a)R = R\Delta(a), \quad \forall a\in H, \tag{2.9} \]
\[ (\Delta\otimes1)R = \Phi^{-1}_{231}R_{13}\Phi_{132}R_{23}\Phi^{-1}_{123}, \tag{2.10} \]
\[ (1\otimes\Delta)R = \Phi_{312}R_{13}\Phi^{-1}_{213}R_{12}\Phi_{123}, \tag{2.11} \]
where $\Delta^T \equiv T\cdot\Delta$. Moreover, if $R$ satisfies $R^{-1} = T\cdot R \equiv R^T$, then $H$ is called triangular.

Note that this definition of quasi-triangular QHSAs ensures that the family of finite dimensional $H$-modules constitutes a quasi-tensor category. Equations (2.10) and (2.11) immediately imply $(\epsilon\otimes1)R = (1\otimes\epsilon)R = 1$, and hence $[R] = 0$. It can be shown that $R$ also satisfies the graded quasi-quantum Yang-Baxter equation (graded QQYBE)
\[ R_{12}\Phi^{-1}_{231}R_{13}\Phi_{132}R_{23}\Phi^{-1}_{123} = \Phi^{-1}_{321}R_{23}\Phi_{312}R_{13}\Phi^{-1}_{213}R_{12}. \tag{2.12} \]
Now we come to twistings. Here we point out that the category of quasi-triangular QHSAs is invariant under a kind of gauge-transformation. Let $F\in H\otimes H$ be an invertible homogeneous element satisfying the property
\[ (1\otimes\epsilon)F = (\epsilon\otimes1)F = 1, \tag{2.13} \]
(so that $[F] = 0$) with $H$ a (quasi-triangular) QHSA. Set
\[ \Delta_F(a) = F\Delta(a)F^{-1}, \quad \forall a\in H, \]
\[ \Phi_F = (F\otimes1)\cdot(\Delta\otimes1)F\cdot\Phi\cdot(1\otimes\Delta)F^{-1}\cdot(1\otimes F^{-1}), \tag{2.14} \]
and
\[ \alpha_F = m\cdot(1\otimes\alpha)(S\otimes1)F^{-1}, \qquad \beta_F = m\cdot(1\otimes\beta)(1\otimes S)F. \tag{2.15} \]
Also put
\[ R_F = F^TRF^{-1}, \tag{2.16} \]
where $F^T \equiv T\cdot F \equiv F_{21}$. The following theorem summarises results proven in [2]. Let $(H, \Delta, \epsilon, \Phi, S, \alpha, \beta)$ denote the entire QHSA structure. Given this structure, we have

Theorem 1. $(H, \Delta_F, \epsilon, \Phi_F, S, \alpha_F, \beta_F)$ is also a QHSA. Moreover, if $H$ is quasi-triangular with R-matrix $R$, then $(H, \Delta_F, \epsilon, \Phi_F, S, \alpha_F, \beta_F)$ is also quasi-triangular with R-matrix $R_F$.

We refer to $F$ as a twistor. $(H, \Delta_F, \epsilon, \Phi_F, S, \alpha_F, \beta_F)$ is said to be the structure of $H$ twisted under $F$. It is possible to impose on $F$ the cocycle condition
\[ (F\otimes1)(\Delta\otimes1)F = (1\otimes F)(1\otimes\Delta)F. \tag{2.17} \]
It is worth pointing out that if we have a quasi-triangular Hopf superalgebra ($\Phi = 1^{\otimes3}$, $\alpha = \beta = 1$) with structure $(H, \Delta, \epsilon, S)$ and R-matrix $R$, and then apply a twist $F$ that satisfies (2.17), we would obtain a Hopf superalgebra $(H, \Delta_F, \epsilon, S)$ with new R-matrix $R_F$.

3. Opposite Structure

Let
\[ \Delta^T = T\cdot\Delta \]
be the opposite coproduct on a QHSA $H$. Also set
\[ \Phi^T = \Phi^{-1}_{321} = \sum_\nu \bar Z_\nu\otimes\bar Y_\nu\otimes\bar X_\nu\times(-1)^{[\bar X_\nu][\bar Y_\nu]+[\bar X_\nu][\bar Z_\nu]+[\bar Y_\nu][\bar Z_\nu]}, \]
\[ \alpha^T = S^{-1}(\alpha), \quad\text{and}\quad \beta^T = S^{-1}(\beta). \]
Our aim here is to prove the following.

Proposition 1. $(H, \Delta^T, \epsilon, \Phi^T, S^{-1}, \alpha^T, \beta^T)$ is a QHSA. This is called the opposite structure on $H$.
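Proof. Firstly we prove that we indeed have a $\mathbb{Z}_2$ graded quasi-bialgebra structure. We note that (2.3) and (2.4) are obvious. For $a\in A$, (2.1) may be written (in Sweedler's notation [18])
\[ \sum a_{(1)}\otimes\Delta(a_{(2)}) = \Phi^{-1}_{123}\Bigl(\sum\Delta(a_{(1)})\otimes a_{(2)}\Bigr)\Phi_{123}. \]
Below we set
\[ \Delta(a_{(1)}) = \sum_i a_{(1)i}\otimes a^i_{(1)}, \qquad \Delta(a_{(2)}) = \sum_i a_{(2)i}\otimes a^i_{(2)}, \]
so that (2.1) becomes
\[ \sum a_{(1)}\otimes a_{(2)i}\otimes a^i_{(2)} = \Phi^{-1}_{123}\Bigl(\sum a_{(1)i}\otimes a^i_{(1)}\otimes a_{(2)}\Bigr)\Phi_{123}. \tag{3.1} \]
If we then apply the algebra homomorphism $(1\otimes T)(T\otimes1)(1\otimes T)$ to (3.1) we obtain
\[ (\Delta^T\otimes1)\Delta^T(a) = \Phi^{-1}_{321}\,(1\otimes\Delta^T)\Delta^T(a)\,\Phi_{321}, \]
which can be written
\[ (1\otimes\Delta^T)\Delta^T(a) = (\Phi^T)^{-1}(\Delta^T\otimes1)\Delta^T(a)\,\Phi^T \]
with $\Phi^T$ as stated. Taking the inverse of (2.2) and applying the algebra homomorphism $(T\otimes T)(1\otimes T\otimes1)(T\otimes T)(1\otimes T\otimes1)$ to both sides, we have
\[ (\Delta^T\otimes1\otimes1)\Phi^T\cdot(1\otimes1\otimes\Delta^T)\Phi^T = (\Phi^T\otimes1)\cdot(1\otimes\Delta^T\otimes1)\Phi^T\cdot(1\otimes\Phi^T), \]
which is (2.2) for the opposite structure. Hence we have proved the $\mathbb{Z}_2$ graded quasi-bialgebra properties. As to the remaining properties, we use (2.5) to obtain the following:
\[ m\cdot(1\otimes\alpha^T)(S^{-1}\otimes1)\Delta^T(a) = \sum S^{-1}(a_{(2)})S^{-1}(\alpha)a_{(1)}\times(-1)^{[a_{(1)}][a_{(2)}]} = S^{-1}\Bigl(\sum S(a_{(1)})\alpha a_{(2)}\Bigr) = S^{-1}(\epsilon(a)\alpha) = \epsilon(a)\alpha^T, \]
and similarly, we can use (2.6) to obtain
\[ m\cdot(1\otimes\beta^T)(1\otimes S^{-1})\Delta^T(a) = \epsilon(a)\beta^T. \]
As to the opposite of (2.7), we have
\[ m(m\otimes1)\cdot(S^{-1}\otimes1\otimes1)(1\otimes\alpha^T\otimes\beta^T)(1\otimes1\otimes S^{-1})\Phi^T = \sum_\nu S^{-1}(\bar Z_\nu)S^{-1}(\alpha)\bar Y_\nu S^{-1}(\beta)S^{-1}(\bar X_\nu)\times(-1)^{[\bar X_\nu][\bar Y_\nu]+[\bar X_\nu][\bar Z_\nu]+[\bar Y_\nu][\bar Z_\nu]} = S^{-1}\Bigl(\sum_\nu\bar X_\nu\beta S(\bar Y_\nu)\alpha\bar Z_\nu\Bigr) = 1. \]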
In a similar way, we can show the opposite of (2.8) is
\[ m(m\otimes1)\cdot(1\otimes\beta^T\otimes\alpha^T)(1\otimes S^{-1}\otimes1)(\Phi^T)^{-1} = 1. \]
This completes the proof.

Now consider (2.9). This immediately shows that the opposite R-matrix $R^T \equiv T\cdot R$ satisfies the intertwining property under the opposite coproduct $\Delta^T$. We now investigate (2.10) and (2.11) for this opposite structure. Set
\[ R = \sum_i e_i\otimes e^i. \]
Applying the homomorphism (1 ⊗ T )(T ⊗ 1)(1 ⊗ T ) to (2.10) gives (1 ⊗ T )R T = (X¯ ν ⊗ Z¯ ν ⊗ Y¯ν )(ej ⊗ 1 ⊗ ej )(Yρ ⊗ Zρ ⊗ Xρ )(ek ⊗ ek ⊗ 1) j k ¯ ¯ · (Z¯ µ ⊗ Y¯µ ⊗ X¯ µ ) × (−1)[Yν ][Zν ]+[Yρ ][Zρ ]+[Xρ ][Yρ ]+[ej ][e ]+[ek ][e ]
¯
¯
¯
¯
¯
¯
×(−1)[Xµ ][Yµ ]+[Xµ ][Zµ ]+[Yµ ][Zµ ] −1 T T = −1 132 R13 231 R123 321 . Since T −1 321 = 123 ,
231 = ( T )−1 213 , T −1 132 = 312 ,
we have
T T T ( T )−1 (1 ⊗ T )R T = T312 R13 213 R12 123 , which proves (2.11) for the opposite structure. Now applying the homomorphism (T ⊗ 1)(1 ⊗ T )(T ⊗ 1) to (2.11), we can obtain Eq. (2.10) for the opposite structure in a similar way: T T T T −1 (T ⊗ 1)R T = ( T )−1 231 R13 132 R23 ( )123 .
Thus we have proved

Proposition 2. $(H, \Delta^T, \epsilon, \Phi^T, S^{-1}, \alpha^T, \beta^T)$ is a quasi-triangular QHSA with R-matrix $R^T \equiv T\cdot R$.

It is worth noting that if $H$ is a quasi-triangular QHSA, then its R-matrix $R$ satisfies (2.13), so we may consider twisting $H$ with its own R-matrix. Obviously the coproduct now reduces to the opposite one: $\Delta_R(a) = R\Delta(a)R^{-1} = \Delta^T(a)$ for every $a\in H$. In this case, in view of the graded QQYBE (2.12), the coassociator induced by $R$ coincides with the opposite coassociator:
\[ \Phi_R \overset{(2.14)}{=} R_{12}\cdot(\Delta\otimes1)R\cdot\Phi\cdot(1\otimes\Delta)R^{-1}\cdot R^{-1}_{23} \]
\[ \overset{(2.10),(2.11)}{=} R_{12}\cdot\Phi^{-1}_{231}R_{13}\Phi_{132}R_{23}\Phi^{-1}_{123}\cdot R^{-1}_{12}\Phi_{213}R^{-1}_{13}\Phi^{-1}_{312}R^{-1}_{23} \]
\[ \overset{(2.12)}{=} \Phi^{-1}_{321} = \Phi^T. \]
The corresponding canonical elements are given, from (2.15), by
\[ \alpha_R = m\cdot(1\otimes\alpha)(S\otimes1)R^{-1}, \qquad \beta_R = m\cdot(1\otimes\beta)(1\otimes S)R, \]
while the R-matrix induced by twisting with $R$ is, from (2.16), $R^T\cdot R\cdot R^{-1} = R^T$, which is simply the opposite R-matrix. It thus appears that the structure induced by twisting with $R$ corresponds to the opposite quasi-triangular QHSA structure. Note however that $\alpha_R$ and $\beta_R$ are defined with respect to the antipode $S$ rather than the opposite antipode $S^{-1}$.

So now we come to consider the opposite structure of the twisted quasi-triangular QHSA $(H, \Delta_F, \epsilon, \Phi_F, S, \alpha_F, \beta_F)$ with R-matrix $R_F$. The opposite coproduct is clearly given by
\[ (\Delta_F)^T(a) = F^T\Delta^T(a)(F^T)^{-1}, \]
which obviously corresponds to twisting the opposite coproduct on $H$ with $F^T$. That is, $(\Delta_F)^T(a) = (\Delta^T)_{F^T}(a)$. To see this is in fact the case for the remaining structure, we note that the opposite coassociator to $\Phi_F$ is
\[ (\Phi_F)^T = (\Phi^{-1}_F)_{321} = (T\otimes1)(1\otimes T)(T\otimes1)(\Phi^{-1}_F)_{123} \]
\[ \overset{(2.14)}{=} (T\otimes1)(1\otimes T)(T\otimes1)\cdot\bigl\{F_{23}\cdot(1\otimes\Delta)F\cdot\Phi^{-1}_{123}\cdot(\Delta\otimes1)F^{-1}\cdot F^{-1}_{12}\bigr\} \]
\[ = F^T_{12}\cdot(\Delta^T\otimes1)F^T\cdot\Phi^{-1}_{321}\cdot(1\otimes\Delta^T)(F^T)^{-1}\cdot(F^T_{23})^{-1} \]
\[ = F^T_{12}\cdot(\Delta^T\otimes1)F^T\cdot\Phi^T_{123}\cdot(1\otimes\Delta^T)(F^T)^{-1}\cdot(F^T)^{-1}_{23} \overset{(2.14)}{=} (\Phi^T)_{F^T}. \]
Similarly for the opposite R-matrix we have
\[ (R_F)^T = FR^T(F^T)^{-1} = (R^T)_{F^T}. \]
It remains to consider the canonical elements (2.15). To this end, writing $F^{-1} = \sum_i\bar f_i\otimes\bar f^i$,
\[ (\alpha_F)^T = S^{-1}(\alpha_F) = S^{-1}\Bigl(\sum_i S(\bar f_i)\alpha\bar f^i\Bigr) = \sum_i S^{-1}(\bar f^i)S^{-1}(\alpha)\bar f_i\times(-1)^{[\bar f_i][\bar f^i]} \]
\[ = m\cdot(1\otimes S^{-1}(\alpha))(S^{-1}\otimes1)\sum_i(\bar f^i\otimes\bar f_i)\times(-1)^{[\bar f_i][\bar f^i]} = m\cdot(1\otimes\alpha^T)(S^{-1}\otimes1)(F^T)^{-1} = (\alpha^T)_{F^T}, \]
and similarly $(\beta_F)^T = (\beta^T)_{F^T}$. Here we have used Proposition 1 and the fact that $S^{-1}$ is the antipode under the opposite structure. Thus we have proved
Proposition 3. $(H, (\Delta_F)^T, \epsilon, (\Phi_F)^T, S^{-1}, (\alpha_F)^T, (\beta_F)^T) = (H, (\Delta^T)_{F^T}, \epsilon, (\Phi^T)_{F^T}, S^{-1}, (\alpha^T)_{F^T}, (\beta^T)_{F^T})$. Moreover, if $H$ is quasi-triangular with R-matrix $R$, then $(R_F)^T = (R^T)_{F^T}$.

Now take $H$ to be a normal quasi-triangular Hopf superalgebra and consider a twistor $F(\lambda)\in H\otimes H$ which depends on $\lambda\in H$, where we assume $\lambda$ depends on one or possibly several parameters. Here we assume that $F(\lambda)$ satisfies the shifted cocycle condition (cf. Eq. (2.17))
\[ F_{12}(\lambda)\cdot(\Delta\otimes1)F(\lambda) = F_{23}(\lambda+h^{(1)})\cdot(1\otimes\Delta)F(\lambda), \tag{3.2} \]
where $h^{(1)} = h\otimes1\otimes1$ and $h\in H$ fixed. We then have the following QHSA structure induced by twisting with $F(\lambda)$:
\[ \Phi(\lambda) \equiv \Phi_{F(\lambda)} = F_{23}(\lambda+h^{(1)})F_{23}(\lambda)^{-1}, \]
\[ \Delta_\lambda(a) = F(\lambda)\Delta(a)F(\lambda)^{-1}, \quad \forall a\in H, \]
\[ \alpha_\lambda = m\cdot(S\otimes1)F(\lambda)^{-1}, \qquad \beta_\lambda = m\cdot(1\otimes S)F(\lambda), \]
\[ R(\lambda) = F(\lambda)^TRF(\lambda)^{-1}. \tag{3.3} \]
It is straightforward to show that Eqs. (2.10), (2.11) in this case reduce to
\[ (\Delta_\lambda\otimes1)R(\lambda) = \Phi^{-1}_{231}(\lambda)R_{13}(\lambda)R_{23}(\lambda+h^{(1)}), \]
\[ (1\otimes\Delta_\lambda)R(\lambda) = R_{13}(\lambda+h^{(2)})R_{12}(\lambda)\Phi_{123}(\lambda), \tag{3.4} \]
while the QQYBE (2.12) becomes
\[ R_{12}(\lambda+h^{(3)})R_{13}(\lambda)R_{23}(\lambda+h^{(1)}) = R_{23}(\lambda)R_{13}(\lambda+h^{(2)})R_{12}(\lambda). \]
This is the graded dynamical QYBE, of interest in obtaining elliptic solutions to the QYBE. We can also determine the opposite structure of the above. Recall that $H$ is also a QHSA with the opposite coproduct $\Delta^T_\lambda$ and with the opposite coassociator
\[ \Phi(\lambda)^T = \Phi(\lambda)^{-1}_{321} \overset{(3.3)}{=} F^T_{12}(\lambda)F^T_{12}(\lambda+h^{(3)})^{-1}. \]
It is worth noting, in view of Proposition 3, that this coincides with the QHSA structure induced on the opposite QHSA structure of $H$ by twisting with $F^T(\lambda)$. By applying $(1\otimes T)(T\otimes1)(1\otimes T)$ to the shifted cocycle condition (3.2), it can be shown that $F^T(\lambda)$ satisfies the opposite shifted cocycle condition
\[ F^T_{23}(\lambda)(1\otimes\Delta^T)F^T(\lambda) = F^T_{12}(\lambda+h^{(3)})(\Delta^T\otimes1)F^T(\lambda). \]
To complete the opposite QHSA structure the antipode is $S^{-1}$, while the canonical elements are now given by
\[ \alpha^T_\lambda = S^{-1}(\alpha_\lambda), \qquad \beta^T_\lambda = S^{-1}(\beta_\lambda). \]
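Applying $(T\otimes1)(1\otimes T)(T\otimes1)$ to (3.4) gives the coproduct properties
\[ (\Delta^T_\lambda\otimes1)R^T(\lambda) = R^T_{13}(\lambda+h^{(2)})R^T_{23}(\lambda)\Phi_{321}(\lambda), \]
\[ (1\otimes\Delta^T_\lambda)R^T(\lambda) = \Phi^{-1}_{132}(\lambda)R^T_{13}(\lambda)R^T_{12}(\lambda+h^{(3)}), \]
which are special cases of (2.10) and (2.11), for the coassociator concerned. Finally, the graded QQYBE satisfied by $R^T(\lambda)$ reduces to
\[ R^T_{12}(\lambda)R^T_{13}(\lambda+h^{(2)})R^T_{23}(\lambda) = R^T_{23}(\lambda+h^{(1)})R^T_{13}(\lambda)R^T_{12}(\lambda+h^{(3)}), \]
which we refer to as the opposite graded dynamical QYBE.

4. Drinfeld Twist

This section is concerned with the QHSA structure induced by the Drinfeld twist [3], and gives details of some remarkable results relating to this construction. First it is worth establishing some useful notation. Set
\[ (1\otimes\Delta)\Delta(a) = \sum a_{(1)}\otimes\Delta(a_{(2)}) = \sum a^R_{(1)}\otimes a^R_{(2)}\otimes a^R_{(3)}, \]
\[ (\Delta\otimes1)\Delta(a) = \sum\Delta(a_{(1)})\otimes a_{(2)} = \sum a^L_{(1)}\otimes a^L_{(2)}\otimes a^L_{(3)}. \]
The following result will be used later.

Lemma 1. $\forall a\in H$, we have
\[ \sum X_\nu a\otimes Y_\nu\beta S(Z_\nu)(-1)^{[a][X_\nu]} = \sum a^L_{(1)}X_\nu\otimes a^L_{(2)}Y_\nu\beta S(Z_\nu)S(a^L_{(3)})\times(-1)^{[X_\nu][a^L_{(2)}]}, \tag{4.1} \]
\[ \sum S(X_\nu)\alpha Y_\nu\otimes aZ_\nu(-1)^{[a][Z_\nu]} = \sum S(a^R_{(1)})S(X_\nu)\alpha Y_\nu a^R_{(2)}\otimes Z_\nu a^R_{(3)}\times(-1)^{[Z_\nu][a^R_{(2)}]}, \tag{4.2} \]
\[ \sum a\bar X_\nu\otimes S(\bar Y_\nu)\alpha\bar Z_\nu = \sum\bar X_\nu a^L_{(1)}\otimes S(a^L_{(2)})S(\bar Y_\nu)\alpha\bar Z_\nu a^L_{(3)}\times(-1)^{[\bar X_\nu]([a^L_{(1)}]+[a^L_{(2)}])}, \tag{4.3} \]
\[ \sum\bar X_\nu\beta S(\bar Y_\nu)\otimes\bar Z_\nu a = \sum a^R_{(1)}\bar X_\nu\beta S(\bar Y_\nu)S(a^R_{(2)})\otimes a^R_{(3)}\bar Z_\nu\times(-1)^{[\bar Z_\nu]([a^R_{(2)}]+[a^R_{(3)}])}. \tag{4.4} \]

Proof. For (4.1), $\Phi(1\otimes\Delta)\Delta(a) = (\Delta\otimes1)\Delta(a)\Phi$ can be rewritten as
\[ \sum X_\nu a^R_{(1)}\otimes Y_\nu a^R_{(2)}\otimes Z_\nu a^R_{(3)}(-1)^{[Z_\nu]([a^R_{(1)}]+[a^R_{(2)}])+[Y_\nu][a^R_{(1)}]} = \sum a^L_{(1)}X_\nu\otimes a^L_{(2)}Y_\nu\otimes a^L_{(3)}Z_\nu(-1)^{[X_\nu]([a^L_{(2)}]+[a^L_{(3)}])+[Y_\nu][a^L_{(3)}]}. \]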
Then applying (1 ⊗ m)(1 ⊗ 1 ⊗ βS) to both sides we obtain R R R R R R ⊗ Yν a(2) βS(a(3) )S(Zν )(−1)[Zν ]([a(2) ]+[a(3) ])+[a(1) ][Xν ] l.h.s. = Xν a(1) = Xν a(1) ⊗ Yν (a(2) )βS(Zν )(−1)[a(1) ][Xν ] = Xν a ⊗ Yν βS(Zν )(−1)[a][Xν ] L L L L = r.h.s. = a(1) Xν ⊗ a(2) Yν βS(Zν )S(a(3) )(−1)[Xν ][a(2) ] . This proves (4.1). Parts (4.2), (4.3) and (4.4) are proved similarly and we shall only outline how they are obtained. We can arrive at (4.2) by applying (m ⊗ 1)(S ⊗ α ⊗ 1) to ( ⊗ 1)(a) = (1 ⊗ )(a). Equation (4.3) can be obtained by applying (1 ⊗ m)(1 ⊗ S ⊗ α) to (1 ⊗ )(a) −1 = −1 ( ⊗ 1)(a). Finally, if we apply (m ⊗ 1)(1 ⊗ βS ⊗ 1) to −1 ( ⊗ 1)(a) = (1 ⊗ )(a) −1 we arrive at (4.4). This completes the proof. Also, the following equations, which arise from Eq. (2.2), will prove useful throughout: ⊗ 1 = ( ⊗ 1 ⊗ 1) · (1 ⊗ 1 ⊗ ) · (1 ⊗ −1 ) · (1 ⊗ ⊗ 1) −1 (1) (2) = (X(ν) X(µ) X¯ ρ ⊗ Xν) Xµ X¯ σ Y¯ρ(1) ⊗ Y(ν) Zµ(1) Y¯σ Y¯ρ(2) ⊗ Zν Zµ(2) Z¯ σ Z¯ ρ ) (2)
¯
×(−1)[Xρ ]([Xν
(1)
]+[Xµ ]+[Xν ])+([X¯ σ ]+[Y¯ρ ])([Xν ]+[Zµ ]) (2)
×(−1)[Zµ ][Xν ]+[Xµ ][Xν
(1)
¯ (2)
¯
(1)
(2)
]+[Zν ][Zµ ]+[Y¯ρ ][X¯ σ ]+[Y¯ρ ][Z¯ σ ] (2)
×(−1)([Yσ ]+[Yρ ])([Zν ]+[Zµ ]) , (4.5) 1 ⊗ = (1 ⊗ ⊗ 1) −1 · ( −1 ⊗ 1) · ( ⊗ 1 ⊗ 1) · (1 ⊗ 1 ⊗ ) = (X¯ ν X¯ µ Xσ(1) Xρ ⊗ Y¯ν(1) X¯ µ Xσ(2) Yρ ⊗ Y¯ν(2) Z¯ µ Yσ Zρ(1) ⊗ X¯ ν Zσ Zρ(2) ) (1)
]+[Xρ ])([X¯ µ ]+[X¯ ν ])+[Zρ ][Xσ ]
(2)
]+[Yρ ])([Z¯ ν ]+[Z¯ µ ]+[Y¯ν ])+[Z¯ ν ]([Yσ ]+[Zρ ])+[Xρ ][Xσ ]+[Zσ ][Zρ ]
×(−1)([Xσ ×(−1)([Xσ ¯
(2)
¯
¯
(1)
(2)
(1)
¯ (2)
¯
×(−1)[Xµ ][Xν ]+[Yµ ]([Zν ]+[Yν ])+[Zµ ][Zν ] , (4.6) −1 ⊗ 1 = (1 ⊗ ⊗ 1) · (1 ⊗ ) · (1 ⊗ 1 ⊗ ) −1 · ( ⊗ 1 ⊗ 1) −1 = (Xν X¯ σ X¯ ρ(1) ⊗ Yν(1) Xµ Y¯σ X¯ ρ(2) ⊗ Yν(2) Yµ Z¯ σ(1) Y¯ρ ⊗ Zν Zµ Z¯ σ(2) Z¯ ρ ) ¯
¯ (1) ])[Xν ]+([Y¯σ ]+[X¯ ρ(2) ])([Xµ ]+[Zν ]+[Yν(2) ])
¯
¯ (1) ])([Zν ]+[Zµ ])+[Zν ][Zµ ]+[Xµ ][Yµ(2) ]
×(−1)([Xσ ]+[Xρ ×(−1)([Yρ ]+[Zσ ¯ (1)
1 ⊗ −1
¯ (2) ¯
¯
¯
¯ (2)
×(−1)[Xρ ][Xσ ]+[Xρ ][Zσ ]+[Yρ ][Zσ ] , (4.7) = (1 ⊗ 1 ⊗ ) −1 · ( ⊗ 1 ⊗ 1) −1 · ( ⊗ 1) · (1 ⊗ ⊗ 1) = (X¯ ν X¯ µ(1) Xσ Xρ ⊗ Y¯ν X¯ µ(2) Yσ Yρ(1) ⊗ Z¯ ν(1) Y¯µ Zσ Yρ(2) ⊗ Z¯ ν(2) Z¯ µ Zρ ) ¯
¯
(2) ¯ (2) (1) ¯ (2) ])+[X¯ µ ][Zν ]+[Y¯µ ][Z¯ ν ]+[Zσ ][Yρ ]
×(−1)([Xσ ]+[Xρ ])([Xµ ]+[Xν ]+[Xµ (1)
×(−1)([Yσ ]+[Yρ
¯ (1) ][X¯ ν ]+[Xρ ][Xσ ]
×(−1)[Xµ
(2)
(2)
])([X¯ µ ]+[Z¯ ν ])+([Zσ ]+[Yρ ])([Z¯ µ ]+[Z¯ ν ])
.
(4.8)
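Given a QHSA $H$, we note that $(S\otimes S)\Delta^T$ and $\Delta^T\cdot S^{-1}$ both determine $\mathbb{Z}_2$ graded algebra antihomomorphisms. It follows that $\Delta' \equiv (S\otimes S)\Delta^T\cdot S^{-1}$ determines an algebra homomorphism and thus a new coproduct on $H$. That is,
\[ \Delta'(a) = (S\otimes S)\Delta^T(S^{-1}(a)), \quad \forall a\in H. \]
Remark. In the case $H$ is a normal Hopf superalgebra, $\Delta' = \Delta$ (cf. Sweedler [18]).

In what follows, we work towards showing that $\Delta'$ is obtained from $\Delta$ by twisting. Apply $(S\otimes S)\Delta^T\otimes1$ to Lemma 1, (4.1), to give
\[ \text{l.h.s.} = \sum(S\otimes S)\Delta^T(a)\,(S\otimes S)\Delta^T(X_\nu)\otimes Y_\nu\beta S(Z_\nu) \]
\[ = \text{r.h.s.} = \sum(S\otimes S)\Delta^T(X_\nu)\,(S\otimes S)\Delta^T(a^L_{(1)})\otimes a^L_{(2)}Y_\nu\beta S(Z_\nu)S(a^L_{(3)})\times(-1)^{[X_\nu]([a^L_{(1)}]+[a^L_{(2)}])}. \]
Now let $\gamma\in H\otimes H$ be an even element (i.e. $[\gamma] = 0$). If we apply $(1^{\otimes2}\otimes\gamma)(1^{\otimes2}\otimes\Delta)$ to the above equation, we obtain
\[ \sum(S\otimes S)\Delta^T(a)\,(S\otimes S)\Delta^T(X_\nu)\otimes\gamma\Delta(Y_\nu\beta S(Z_\nu)) \]
\[ = \sum(S\otimes S)\Delta^T(X_\nu)\,(S\otimes S)\Delta^T(a^L_{(1)})\otimes\gamma\Delta(a^L_{(2)})\Delta(Y_\nu\beta S(Z_\nu))\Delta(S(a^L_{(3)}))\times(-1)^{[X_\nu]([a^L_{(1)}]+[a^L_{(2)}])}. \]
Then applying $(m\otimes m)(1\otimes T\otimes1)$ gives
\[ \sum(S\otimes S)\Delta^T(a)\,(S\otimes S)\Delta^T(X_\nu)\cdot\gamma\cdot\Delta(Y_\nu\beta S(Z_\nu)) \]
\[ = \sum(S\otimes S)\Delta^T(X_\nu)\,(S\otimes S)\Delta^T(a^L_{(1)})\cdot\gamma\cdot\Delta(a^L_{(2)})\Delta(Y_\nu\beta S(Z_\nu))\Delta(S(a^L_{(3)}))\times(-1)^{[X_\nu]([a^L_{(1)}]+[a^L_{(2)}])}, \]
so that if $\gamma$ satisfies
\[ \sum(S\otimes S)\Delta^T(a_{(1)})\cdot\gamma\cdot\Delta(a_{(2)}) = \epsilon(a)\gamma, \tag{4.9} \]
then
\[ (S\otimes S)\Delta^T(a)\sum(S\otimes S)\Delta^T(X_\nu)\cdot\gamma\cdot\Delta(Y_\nu\beta S(Z_\nu)) \]
\[ = \sum(S\otimes S)\Delta^T(X_\nu)\cdot\epsilon(a_{(1)})\gamma\cdot\Delta(Y_\nu\beta S(Z_\nu))\Delta(S(a_{(2)}))(-1)^{[a_{(1)}][X_\nu]} \]
\[ = \sum(S\otimes S)\Delta^T(X_\nu)\cdot\gamma\cdot\Delta(Y_\nu\beta S(Z_\nu))\Delta(S(a)). \]
This can be rewritten
\[ (S\otimes S)\Delta^T(a)F_D = F_D\Delta(S(a)), \quad \forall a\in H, \]
where
\[ F_D = \sum(S\otimes S)\Delta^T(X_\nu)\cdot\gamma\cdot\Delta(Y_\nu\beta S(Z_\nu)). \tag{4.10} \]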
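To find $\gamma\in H\otimes H$ satisfying (4.9), we first note, $\forall a\in H$,
\[ (\Delta\otimes\Delta)\Delta(a) = (\Delta\otimes1\otimes1)(1\otimes\Delta)\Delta(a) = (\Delta\otimes1\otimes1)\bigl(\Phi^{-1}(\Delta\otimes1)\Delta(a)\Phi\bigr) \]
\[ = (\Delta\otimes1\otimes1)\Phi^{-1}\cdot((\Delta\otimes1)\Delta\otimes1)\Delta(a)\cdot(\Delta\otimes1\otimes1)\Phi \]
\[ = (\Delta\otimes1\otimes1)\Phi^{-1}\cdot(\Phi\otimes1)\cdot((1\otimes\Delta)\Delta\otimes1)\Delta(a)\cdot(\Phi^{-1}\otimes1)\cdot(\Delta\otimes1\otimes1)\Phi \]
\[ = (\Delta\otimes1\otimes1)\Phi^{-1}\cdot(\Phi\otimes1)\cdot(1\otimes\Delta\otimes1)(\Delta\otimes1)\Delta(a)\cdot(\Phi^{-1}\otimes1)\cdot(\Delta\otimes1\otimes1)\Phi. \]
We thus arrive at
\[ (\Phi^{-1}\otimes1)\cdot(\Delta\otimes1\otimes1)\Phi\cdot(\Delta\otimes\Delta)\Delta(a) = (1\otimes\Delta\otimes1)(\Delta\otimes1)\Delta(a)\cdot(\Phi^{-1}\otimes1)\cdot(\Delta\otimes1\otimes1)\Phi. \tag{4.11} \]
Now write
\[ (\Delta\otimes\Delta)\Delta(a) = \sum\Delta(a_{(1)})\otimes\Delta(a_{(2)}) = \sum a^L_{(1)}\otimes a^L_{(2)}\otimes a^R_{(1)}\otimes a^R_{(2)}, \]
\[ (1\otimes\Delta\otimes1)(\Delta\otimes1)\Delta(a) = \sum(1\otimes\Delta\otimes1)(a^L_{(1)}\otimes a^L_{(2)}\otimes a^L_{(3)}) = \sum a^L_{(1)}\otimes a^L_{(2)(1)}\otimes a^L_{(2)(2)}\otimes a^L_{(3)}. \]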
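Lemma 2.
\[ \gamma = (m\otimes m)\cdot(1\otimes\alpha\otimes1\otimes\alpha)(S\otimes1\otimes S\otimes1)\cdot(1\otimes T\otimes1)(T\otimes1\otimes1)(\Phi^{-1}\otimes1)(\Delta\otimes1\otimes1)\Phi \tag{4.12} \]
satisfies (4.9). Moreover
\[ \gamma = (m\otimes m)\cdot(1\otimes\alpha\otimes1\otimes\alpha)(S\otimes1\otimes S\otimes1)\cdot(1\otimes T\otimes1)(T\otimes1\otimes1)(1\otimes\Phi)(1\otimes1\otimes\Delta)\Phi^{-1}. \]

Proof. First we set
\[ \sum_i A_i\otimes B_i\otimes C_i\otimes D_i \equiv \sum\bar X_\nu X^{(1)}_\mu\otimes\bar Y_\nu X^{(2)}_\mu\otimes\bar Z_\nu Y_\mu\otimes Z_\mu\,(-1)^{[X^{(1)}_\mu][\bar X_\nu]+[X^{(2)}_\mu][\bar Z_\nu]} = (\Phi^{-1}\otimes1)(\Delta\otimes1\otimes1)\Phi. \]
Note that $[A_i]+[B_i]+[C_i]+[D_i] = 0 \pmod 2$. Now we have, from (4.11),
\[ \sum A_ia^L_{(1)}\otimes B_ia^L_{(2)}\otimes C_ia^R_{(1)}\otimes D_ia^R_{(2)}\,(-1)^{[a^L_{(1)}][A_i]+[a^L_{(2)}]([C_i]+[D_i])+[a^R_{(1)}][D_i]} \]
\[ = \sum a^L_{(1)}A_i\otimes a^L_{(2)(1)}B_i\otimes a^L_{(2)(2)}C_i\otimes a^L_{(3)}D_i\times(-1)^{[A_i]([a^L_{(2)}]+[a^L_{(3)}])+[B_i]([a^L_{(3)}]+[a^L_{(2)(2)}])+[C_i][a^L_{(3)}]}. \]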
Applying (m ⊗ m)(S ⊗ α ⊗ S ⊗ α)(1 ⊗ T ⊗ 1)(T ⊗ 1 ⊗ 1) to the above we obtain L R L R )S(Bi )αCi a(1) ⊗ S(a(1) )S(Ai )αDi a(2) l.h.s. = S(a(2) R
L
L
R
× (−1)[a(1) ]([Ai ]+[Di ])+[Ai ]([Bi ]+[Ci ])+[a(1) ]([Bi ]+[Ci ]+[a(2) ]+[a(1) ]) L L R R = (S ⊗ S)(a(2) ⊗ a(1) )(S(Bi )αCi ⊗ S(Ai )αDi )(a(1) ⊗ a(2) ) L
R
× (−1)[Ai ]([Bi ]+[Ci ])+[a(1) ][a(1) ] = (S ⊗ S)T (a(1) )(S(Bi )αCi ⊗ S(Ai )αDi )(a(2) )(−1)[Ai ]([Bi ]+[Ci ]) = (S ⊗ S)T (a(1) ) · γ · (a(2) ) L L L = r.h.s. = S(Bi )(a(2) )αCi ⊗ S(Ai )S(a(1) )αa(3) Di L
L
× (−1)[Di ]([a(1) ]+[a(3) ])+[Ai ]([Bi ]+[Ci ]) = S(Bi )αCi ⊗ S(Ai )S(a(1) )αa(2) Di (−1)[Di ]([a(1) ]+[a(2) ])+[Ai ]([Bi ]+[Ci ]) = (a) S(Bi )αCi ⊗ S(Ai )αDi (−1)[Ai ]([Bi ]+[Ci ]) = (a)γ with γ given by (4.12). As to the second part, note that γ = S(Y¯ν Xµ(2) )α Z¯ ν Yµ ⊗ S(X¯ ν Xµ(1) )αZµ (1)
(2)
¯
¯
¯
(1)
¯
(2)
× (−1)[Xµ ][Xν ]+[Xµ ][Zν ]+([Xν ]+[Xµ ])([Xν ]+[Xµ ]+[Yµ ]) ¯ = (S ⊗ S)T (Xµ )(S(Y¯ν )α Z¯ ν Yµ ⊗ S(X¯ ν )αZµ )(−1)[Xν ](1+[Yµ ]) . From (2.2), (1 ⊗ )(1 ⊗ 1 ⊗ ) −1 = (1 ⊗ ⊗ 1) −1 · ( −1 ⊗ 1)( ⊗ 1 ⊗ 1) = X¯ σ X¯ ν Xµ(1) ⊗ Y¯σ(1) Y¯ν Xµ(2) ⊗ Y¯σ(2) Z¯ ν Yµ ⊗ Z¯ σ Zµ (1) (1) (2) ¯ (2) ]([Z¯ ν ]+[Xµ ])+[Y¯σ(1) ]([X¯ ν ]+[Xµ ])+[Xµ ][X¯ ν ]+[Xµ ][Z¯ ν ]
¯
× (−1)[Xσ ][Zµ ]+[Yσ
.
If we then apply (m ⊗ m)(S ⊗ α ⊗ S ⊗ α)(1 ⊗ T ⊗ 1)(T ⊗ 1 ⊗ 1) to this equation, straightforward calculation reveals (m ⊗ m)(S ⊗ α ⊗ S ⊗ α)(1 ⊗ T ⊗ 1)(T ⊗ 1 ⊗ 1)(1 ⊗ )(1 ⊗ 1 ⊗ ) −1 ¯ = (S ⊗ S)T (Xµ )(S(Y¯ν )α Z¯ ν Yµ ⊗ S(X¯ ν )αZµ )(−1)[Xν ](1+[Yµ ]) = γ.
Thus we have shown that $F_D$ defined by (4.10) satisfies
$$\Delta'(a) F_D = F_D \Delta(a), \quad \forall a \in H. \tag{4.13}$$
It remains to show that $F_D$ is invertible and thus qualifies as a twist. We proceed by constructing $F_D^{-1}$ explicitly.
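Note. From the definition of $\gamma$, it is easily seen that
$$(1 \otimes \epsilon)\gamma = \alpha \otimes \epsilon(\alpha), \quad (\epsilon \otimes 1)\gamma = \epsilon(\alpha) \otimes \alpha,$$
so that
$$(1 \otimes \epsilon)F_D = (\epsilon \otimes 1)F_D = \epsilon(\alpha)\sum S(X_\nu)\alpha Y_\nu \beta S(Z_\nu) = \epsilon(\alpha).$$
It then becomes clear, since $\epsilon(\alpha)\epsilon(\beta) = 1$, that strictly speaking $\epsilon(\beta)F_D$ qualifies as a twist. This corresponds to a non-zero scalar multiple of $F_D$ which is not important below.

Now let $\bar\gamma \in H \otimes H$ be an even element. Apply $(1 \otimes \bar\gamma)(\Delta \otimes \Delta')$ to Lemma 1, (4.3), to give
$$\begin{aligned}
\text{l.h.s.} &= \sum \Delta(a)\Delta(\bar X_\nu) \otimes \bar\gamma\, \Delta'(S(\bar Y_\nu)\alpha \bar Z_\nu) \\
= \text{r.h.s.} &= \sum \Delta(\bar X_\nu a^L_{(1)}) \otimes \bar\gamma\, \Delta'(S(a^L_{(2)})S(\bar Y_\nu)\alpha \bar Z_\nu)\, \Delta'(a^L_{(3)}) \,(-1)^{[\bar X_\nu]([a^L_{(1)}]+[a^L_{(2)}])} \\
&= \sum \Delta(\bar X_\nu)\Delta(a^L_{(1)}) \otimes \bar\gamma\, (S \otimes S)\Delta^T(a^L_{(2)})\, \Delta'(S(\bar Y_\nu)\alpha \bar Z_\nu)\, \Delta'(a^L_{(3)}) \,(-1)^{[\bar X_\nu]([a^L_{(1)}]+[a^L_{(2)}])} .
\end{aligned}$$
On applying $(m \otimes m)(1 \otimes T \otimes 1)$, we obtain
$$\sum \Delta(a)\Delta(\bar X_\nu) \cdot \bar\gamma \cdot \Delta'(S(\bar Y_\nu)\alpha \bar Z_\nu) = \sum \Delta(\bar X_\nu)\Delta(a^L_{(1)})\, \bar\gamma\, (S \otimes S)\Delta^T(a^L_{(2)})\, \Delta'(S(\bar Y_\nu)\alpha \bar Z_\nu)\, \Delta'(a^L_{(3)}) \,(-1)^{[\bar X_\nu]([a^L_{(1)}]+[a^L_{(2)}])} .$$
If $\bar\gamma$ satisfies
$$\sum \Delta(a_{(1)}) \cdot \bar\gamma \cdot (S \otimes S)\Delta^T(a_{(2)}) = \epsilon(a)\bar\gamma, \quad \forall a \in H, \tag{4.14}$$
then
$$F_D^{-1}\Delta'(a) = \Delta(a)F_D^{-1}, \quad \forall a \in H, \tag{4.15}$$
where
$$F_D^{-1} = \sum \Delta(\bar X_\nu) \cdot \bar\gamma \cdot \Delta'(S(\bar Y_\nu)\alpha \bar Z_\nu). \tag{4.16}$$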
To explicitly construct γ¯ ∈ H ⊗ H satisfying (4.14), we note ( ⊗ )(a) · ( ⊗ 1 ⊗ 1) −1 · ( ⊗ 1) = ( ⊗ 1 ⊗ 1) −1 · ( ⊗ 1) · (1 ⊗ ⊗ 1)( ⊗ 1)(a).
(4.17)
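Lemma 3.
$$\bar\gamma = (m \otimes m) \cdot (1 \otimes \beta S \otimes 1 \otimes \beta S) \cdot (1 \otimes T \otimes 1)(1 \otimes 1 \otimes T)(\Delta \otimes 1 \otimes 1)\Phi^{-1} \cdot (\Phi \otimes 1)$$
satisfies (4.14). Moreover,
$$\bar\gamma = (m \otimes m) \cdot (1 \otimes \beta S \otimes 1 \otimes \beta S) \cdot (1 \otimes T \otimes 1)(1 \otimes 1 \otimes T)(1 \otimes 1 \otimes \Delta)\Phi \cdot (1 \otimes \Phi^{-1}).$$
Proof. The proof is very similar to that of Lemma 2. We obtain the first part by applying $(m \otimes m)(1 \otimes \beta S \otimes 1 \otimes \beta S)(1 \otimes T \otimes 1)(1 \otimes 1 \otimes T)$ to (4.17). The second part is obtained by noting that $\bar\gamma$ can be written as
$$\bar\gamma = \sum \Delta(\bar X_\nu) \cdot \left(X_\mu \beta S(\bar Z_\nu) \otimes Y_\mu \beta S(\bar Y_\nu Z_\mu)\right)(-1)^{[\bar Z_\nu]([Y_\mu]+[\bar Y_\nu]) + [\bar X_\nu][Z_\mu]},$$
then applying $(m \otimes m)(1 \otimes \beta S \otimes 1 \otimes \beta S)(1 \otimes T \otimes 1)(1 \otimes 1 \otimes T)$ to
$$(1 \otimes 1 \otimes \Delta)\Phi \cdot (1 \otimes \Phi^{-1}) = (\Delta \otimes 1 \otimes 1)\Phi^{-1} \cdot (\Phi \otimes 1) \cdot (1 \otimes \Delta \otimes 1)\Phi ,$$
which is a restatement of (2.2). This proves the second part.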
It remains to show that $F_D^{-1}$ is indeed the inverse of $F_D$. To this end, the following result is useful.

Lemma 4.
$$F_D \Delta(\alpha) = \gamma, \qquad \Delta(\beta) F_D^{-1} = \bar\gamma .$$
Proof. Note that
FD ⊗ 1 = (m(1 ⊗ m) ⊗ 1) · ((S ⊗ S)T ⊗ γ ⊗ ⊗ 1) · (1 ⊗ 1 ⊗ βS ⊗ 1) · ( ⊗ 1) (4.5) = (S ⊗ S){T (Xµ )T (X¯ ρ )}(Y¯ρ ) · γ · (Yµ X¯ σ βS(Zµ(1) Y¯σ )) · (S(Yν ))(Xν ) (2)
¯ ¯ ¯ ⊗ Zν Zµ(2) Z¯ σ Z¯ ρ (−1)[Xρ ][Xµ ]+[Xσ ][Zµ ]+[Yσ ][Zµ
(1) ]+[Xν ]([Z¯ σ ]+[X¯ ρ ]+[Zµ ])
Now applying 1 ⊗ 1 ⊗ S to both sides, this reduces to FD ⊗ 1 = (S ⊗ S)T (Xµ ) · γ · (Yµ X¯ σ βS(Zµ(1) Y¯σ )) ¯
(2)
¯
⊗ S(Zµ(2) Z¯ σ )(−1)[Xσ ][Zµ ]+[Yσ ][Zµ ] . Further, applying (1 ⊗ 1 ⊗ )(1 ⊗ 1 ⊗ S −1 ) to both sides gives FD ⊗ 1 ⊗ 1 = (S ⊗ S)T (Xµ ) · γ · (Yµ X¯ σ βS(Zµ(1) Y¯σ )) ¯
¯
(2)
⊗ (Zµ(2) Z¯ σ )(−1)[Xσ ][Zµ ]+[Yσ ][Zµ ] .
.
Now multiply by (α) ⊗ 1 ⊗ 1 from the right and apply (m ⊗ m)(1 ⊗ T ⊗ 1) so that (S ⊗ S)T (Xµ ) · γ FD (α) = (1)
(2)
¯ ¯ · (Yµ X¯ σ βS(Y¯σ )S(Zµ(1) )αZµ(2) Z¯ σ )(−1)[Yσ ]([Zµ ]+[Zµ ])+[Xσ ][Zµ ] = (S ⊗ S)T (Xµ ) · γ · (Yµ X¯ σ βS(Y¯σ )(Zµ )α Z¯ σ ) = (S ⊗ S)T (Xµ ) · γ · (Yµ (Zµ ))(X¯ σ βS(Y¯σ )α Z¯ σ ) = (S ⊗ S)T (Xµ ) · γ · (Yµ (Zµ )) = γ.
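The second part, $\Delta(\beta)F_D^{-1} = \bar\gamma$, is proved similarly with the help of (4.7) and (4.15).

Now set
$$\sum \bar A_i \otimes \bar B_i \otimes \bar C_i \otimes \bar D_i \equiv \sum \bar X^{(1)}_\nu X_\mu \otimes \bar X^{(2)}_\nu Y_\mu \otimes \bar Y_\nu Z_\mu \otimes \bar Z_\nu \,(-1)^{[Z_\mu][\bar Y_\nu] + [\bar X^{(2)}_\nu][X_\mu]} = (\Delta \otimes 1 \otimes 1)\Phi^{-1} \cdot (\Phi \otimes 1).$$
We compute $F_D^{-1} \cdot F_D$, using $\sum \bar X_\sigma \beta S(\bar Y_\sigma)\alpha \bar Z_\sigma = 1$ in the first step:
$$\begin{aligned}
F_D^{-1} \cdot F_D &= \sum \Delta(\bar X_\sigma \beta S(\bar Y_\sigma)\alpha \bar Z_\sigma)\, F_D^{-1} \cdot F_D \\
&\overset{(4.15)}{=} \sum \Delta(\bar X_\sigma \beta S(\bar Y_\sigma))\, F_D^{-1}\, \Delta'(\alpha \bar Z_\sigma)\, F_D \\
&\overset{(4.13)}{=} \sum \Delta(\bar X_\sigma)\Delta(\beta)\Delta(S(\bar Y_\sigma))\, F_D^{-1} \cdot F_D\, \Delta(\alpha)\Delta(\bar Z_\sigma) \\
&\overset{(4.15)}{=} \sum \Delta(\bar X_\sigma)\Delta(\beta)\, F_D^{-1}\, \Delta'(S(\bar Y_\sigma)) \cdot F_D\, \Delta(\alpha)\Delta(\bar Z_\sigma).
\end{aligned}$$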
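The sign factors carried by expressions such as $\sum \bar A_i \otimes \bar B_i \otimes \bar C_i \otimes \bar D_i$ above can be checked mechanically. The following Python sketch is only an illustration and is not part of the paper: it assumes the usual graded multiplication rule $(a \otimes b)(c \otimes d) = (-1)^{[b][c]}\, ac \otimes bd$, extended factor by factor, and that $\Phi$ and $\Phi^{-1}$ are even. The function names are hypothetical.

```python
from itertools import product


def sign_exponent(left, right):
    """Parity (mod 2) of the sign from multiplying two homogeneous tensors.

    left[i], right[j] are the parities (0 or 1) of the i-th factor of the
    left tensor and the j-th factor of the right tensor.  Assumed rule: the
    j-th right factor must pass every left factor standing to its right
    (index i > j), each passage contributing [a_i][b_j].
    """
    return sum(left[i] * right[j]
               for i in range(len(left))
               for j in range(i)) % 2


def check_abar_sign():
    """Check the exponent [Xbar^(2)][X] + [Ybar][Z] for all parity choices."""
    for xb1, xb2, yb, zb, x, y, z in product((0, 1), repeat=7):
        # Assumption: Phi and Phi^{-1} are even elements of H x H x H.
        if (x + y + z) % 2 or (xb1 + xb2 + yb + zb) % 2:
            continue
        left = (xb1, xb2, yb, zb)   # factors of (Delta x 1 x 1) Phi^{-1}
        right = (x, y, z, 0)        # factors of Phi x 1
        expected = (xb2 * x + yb * z) % 2
        if sign_exponent(left, right) != expected:
            return False
    return True


print(check_abar_sign())
```

Under these assumptions the brute-force enumeration agrees with the exponent $[Z_\mu][\bar Y_\nu] + [\bar X^{(2)}_\nu][X_\mu]$ for every admissible parity assignment.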
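Using Lemma 4 this reduces to
$$F_D^{-1} \cdot F_D = \sum (\bar X^{(1)}_\sigma \bar A_i \otimes \bar X^{(2)}_\sigma \bar B_i)(\beta \otimes \beta)(S \otimes S) \cdot T(A_j \bar Y^{(1)}_\sigma \bar C_i \otimes B_j \bar Y^{(2)}_\sigma \bar D_i) \cdot (\alpha \otimes \alpha)(C_j \bar Z^{(1)}_\sigma \otimes D_j \bar Z^{(2)}_\sigma) \cdot (-1)^{\xi},$$
where
$$\xi = [B_j]([\bar D_i] + [\bar Y_\sigma]) + [\bar Y_\sigma]([A_j] + [\bar C_i] + [\bar D_i]) + [A_j]([\bar C_i] + [\bar D_i]) + [\bar A_i][\bar X^{(2)}_\sigma] + [\bar C_i][\bar Y^{(2)}_\sigma] + [D_j][\bar Z^{(1)}_\sigma] + [B_j][\bar Y^{(1)}_\sigma].$$
On the other hand, setting
$$\begin{aligned}
r &\equiv \sum (1^{\otimes 2} \otimes A_j \otimes B_j \otimes C_j \otimes D_j) \cdot (\Delta \otimes \Delta \otimes \Delta)\Phi^{-1} \cdot (\bar A_i \otimes \bar B_i \otimes \bar C_i \otimes \bar D_i \otimes 1^{\otimes 2}) \\
&= \sum \bar X^{(1)}_\sigma \bar A_i \otimes \bar X^{(2)}_\sigma \bar B_i \otimes A_j \bar Y^{(1)}_\sigma \bar C_i \otimes B_j \bar Y^{(2)}_\sigma \bar D_i \otimes C_j \bar Z^{(1)}_\sigma \otimes D_j \bar Z^{(2)}_\sigma \,(-1)^{\xi},
\end{aligned}$$
implies
$$F_D^{-1} \cdot F_D = \varphi(r)$$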
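with $\varphi : H^{\otimes 6} \to H^{\otimes 2}$ defined by
$$\varphi(a_1 \otimes a_2 \otimes a_3 \otimes a_4 \otimes a_5 \otimes a_6) = (a_1 \otimes a_2)(\beta \otimes \beta)(S \otimes S) \cdot T(a_3 \otimes a_4) \cdot (\alpha \otimes \alpha)(a_5 \otimes a_6).$$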
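Remark. The two equivalent expressions for $\bar\gamma$ ($\gamma$) imply that we can choose either
$$\sum \bar A_i \otimes \bar B_i \otimes \bar C_i \otimes \bar D_i = (\Delta \otimes 1 \otimes 1)\Phi^{-1} \cdot (\Phi \otimes 1) \quad \text{or} \quad (1 \otimes 1 \otimes \Delta)\Phi \cdot (1 \otimes \Phi^{-1}),$$
$$\sum A_j \otimes B_j \otimes C_j \otimes D_j = (1 \otimes \Phi) \cdot (1 \otimes 1 \otimes \Delta)\Phi^{-1} \quad \text{or} \quad (\Phi^{-1} \otimes 1) \cdot (\Delta \otimes 1 \otimes 1)\Phi ,$$
the latter pair being the two expressions of Lemma 2.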
Similarly, we can show FD−1 · FD = ϕ(¯ ¯ r ), where r¯ =
(Aj ⊗ Bj ⊗ Cj ⊗ Dj ⊗ 1⊗2 ) · ( ⊗ ⊗ ) · (1⊗2 ⊗ A¯ i ⊗ B¯ i ⊗ C¯ i ⊗ D¯ i )
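with $\bar\varphi : H^{\otimes 6} \to H^{\otimes 2}$ defined by
$$\bar\varphi(a_1 \otimes a_2 \otimes a_3 \otimes a_4 \otimes a_5 \otimes a_6) = (S \otimes S) \cdot T(a_1 \otimes a_2) \cdot (\alpha \otimes \alpha)(a_3 \otimes a_4) \cdot (\beta \otimes \beta)(S \otimes S) \cdot T(a_5 \otimes a_6).$$
Before proceeding, it is worth noting the following properties of $\varphi$ and $\bar\varphi$, which follow immediately from their definition:
$$\varphi(h\,\Delta_{23}(a)) = \epsilon(a)\varphi(h) = \varphi(\Delta_{45}(a)\,h), \tag{4.18}$$
$$\varphi(h\,\Delta_{14}(a)) = \epsilon(a)\varphi(h) = \varphi(\Delta_{36}(a)\,h), \tag{4.19}$$
$$\bar\varphi(\Delta_{23}(a)\,h) = \epsilon(a)\bar\varphi(h) = \bar\varphi(h\,\Delta_{45}(a)), \tag{4.20}$$
$$\bar\varphi(\Delta_{14}(a)\,h) = \epsilon(a)\bar\varphi(h) = \bar\varphi(h\,\Delta_{36}(a)), \tag{4.21}$$
$\forall a \in H$, $h \in H^{\otimes 6}$, where we have used the notation $\Delta_{14}(a) = \sum a_{(1)} \otimes 1 \otimes 1 \otimes a_{(2)} \otimes 1 \otimes 1$ (i.e. $\Delta(a)$ acting in the first and fourth components of the tensor product), etc. Now we choose the following expressions for $r$ and $\bar r$:
$$r = \left(1^{\otimes 2} \otimes (1 \otimes \Phi)(1 \otimes 1 \otimes \Delta)\Phi^{-1}\right) \cdot (\Delta \otimes \Delta \otimes \Delta)\Phi^{-1} \cdot \left((\Delta \otimes 1 \otimes 1)\Phi^{-1} \cdot (\Phi \otimes 1) \otimes 1^{\otimes 2}\right),$$
$$\bar r = \left((\Phi^{-1} \otimes 1)(\Delta \otimes 1 \otimes 1)\Phi \otimes 1^{\otimes 2}\right) \cdot (\Delta \otimes \Delta \otimes \Delta)\Phi \cdot \left(1^{\otimes 2} \otimes (1 \otimes 1 \otimes \Delta)\Phi \cdot (1 \otimes \Phi^{-1})\right),$$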
which implies r = (1⊗3 ⊗ ) · ( ⊗ 1⊗2 ⊗ ) × {(1 ⊗ −1 ) · (1 ⊗ ⊗ 1) −1 · ( −1 ⊗ 1)} · ( ⊗ 1⊗3 ) (2.2)
= (1⊗3 ⊗ )( ⊗ 1 ⊗ (1 ⊗ )) −1 · (( ⊗ 1) ⊗ 1 ⊗ ) −1 · ( ⊗ 1⊗3 )
(2.1)
= ( ⊗ 1 ⊗ ( ⊗ 1)) −1 · (1⊗3 ⊗ ) · ( ⊗ 1⊗3 )((1 ⊗ ) ⊗ 1 ⊗ ) −1 = 45 (Z¯ ν(1) )((X¯ ν ) ⊗ Y¯ν ⊗ 1⊗2 ⊗ Z¯ ν(2) )( ⊗ 1⊗3 ) (2) ¯ ¯ (1) ][Z¯ ν ]+[X¯ µ ][Xµ ]
· (1⊗3 ⊗ )(X¯ µ(1) ⊗ 1⊗2 ⊗ Y¯µ ⊗ (Z¯ µ ))23 (X¯ µ(2) )(−1)[Zν
.
Equation (4.18) implies ϕ(r) = ϕ(s), where s= ((X¯ ν ) ⊗ Y¯ν ⊗ 1⊗2 ⊗ Z¯ ν )( ⊗ 1⊗3 )(1⊗3 ⊗ )(X¯ µ ⊗ 1⊗2 ⊗ Y¯µ ⊗ (Z¯ µ )). Using (2.2), and noting that ⊗3 ⊗ (1 ⊗ T )(T ⊗ 1))(1 ⊗ −1 ⊗ 1⊗2 ), −1 236 = (1 ⊗3 ⊗2 −1 ⊗ −1 ⊗ 1), 145 = ((T ⊗ 1)(1 ⊗ T ) ⊗ 1 )(1
the expression for s reduces to s= 36 (Zµ )45 (Y¯σ ) · (Xµ ⊗ Yµ ⊗ 1⊗4 ) ⊗4 ¯ · −1 ⊗ Z¯ ν )(X¯ σ ⊗ 1⊗4 ⊗ Z¯ σ ) 236 · (Xν ⊗ 1 · −1 · (1⊗4 ⊗ Yρ ⊗ Zρ )23 (Y¯ν ) 145
¯
¯
¯
¯
· 14 (Xρ )(−1)[Yσ ]([Zµ ]+[Yν ]+[Xσ ])+[Yν ]([Xρ ]+[Zν ])+[Zµ ]+[Xρ ] . Equations (4.18) and (4.19) then imply ϕ(s) = ϕ(t), where −1 t = −1 236 · 145 ¯ ¯ = X¯ µ ⊗ X¯ ν ⊗ Y¯ν ⊗ Y¯µ ⊗ Z¯ µ ⊗ Z¯ ν (−1)[Zν ][Zµ ] ,
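which then implies
$$\begin{aligned}
\varphi(r) = \varphi(t) &= \sum (\bar X_\mu \otimes \bar X_\nu)(\beta \otimes \beta)(S(\bar Y_\mu) \otimes S(\bar Y_\nu))(\alpha \otimes \alpha)(\bar Z_\mu \otimes \bar Z_\nu)\,(-1)^{[\bar Z_\nu][\bar Z_\mu] + [\bar Y_\nu][\bar Y_\mu]} \\
&= \sum \bar X_\mu \beta S(\bar Y_\mu)\alpha \bar Z_\mu \otimes \bar X_\nu \beta S(\bar Y_\nu)\alpha \bar Z_\nu = 1 \otimes 1.
\end{aligned}$$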
Similarly, with the following choice of r¯ , r¯ = ( −1 ⊗ 1⊗3 ) · ( ⊗ 1⊗2 ⊗ )(( ⊗ 1) · (1 ⊗ ⊗ 1) · (1 ⊗ )) · (1⊗3 ⊗ −1 ), and using (2.2) and (2.1), we obtain ϕ(¯ ¯ r ) = ϕ(¯ ¯ s ), with s¯ defined by s¯ =
(Xµ ⊗ 1⊗2 ⊗ Yµ ⊗ (Zµ ))(1⊗3 ⊗ −1 )
· ( −1 ⊗ 1⊗3 )((Xν ) ⊗ Yρ ⊗ 1⊗2 ⊗ Zν ) which reduces to s¯ =
14 (X¯ ν )23 (Yρ )(1⊗4 ⊗ Y¯ν ⊗ Z¯ ν )
· 145 · (Xµ ⊗ 1⊗4 ⊗ Zµ )(Xρ ⊗ 1⊗4 ⊗ Zρ ) · 236 · (X¯ σ ⊗ Y¯σ ⊗ 1⊗4 )45 (Yµ ) ¯ ¯ · 36 (Z¯ σ )(−1)[Xµ ][Zµ ]+[Yρ ]([Xρ ]+[Xν ]+[Yµ ])+[Yµ ][Zσ ] .
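This implies that
$$\bar\varphi(\bar r) = \bar\varphi(\bar s) = \bar\varphi(\bar t),$$
where
$$\bar t = \Phi_{145} \cdot \Phi_{236} = t^{-1},$$
from which it follows that
$$\bar\varphi(\bar r) = \bar\varphi(\bar t) = 1 \otimes 1,$$
so that $F_D^{-1}$ is indeed the inverse of $F_D$. Summarising the above results, we have proved

Theorem 2. $\Delta'$ is obtained from $\Delta$ by twisting with $F_D$. That is,
$$\Delta'(a) = F_D \Delta(a) F_D^{-1}, \quad \forall a \in H,$$
with $F_D$ as in (4.10) and $\gamma$ as in Lemma 2. Moreover $F_D^{-1}$ is given explicitly by (4.16) with $\bar\gamma$ as in Lemma 3.

Remark. It is actually $\bar F_D = \epsilon(\beta)F_D$ which qualifies as a twist. Thus we have
$$\Delta'(a) = \bar F_D \Delta(a) \bar F_D^{-1}, \quad \forall a \in H,$$
with $\bar F_D^{-1} = \epsilon(\alpha)F_D^{-1}$. Thus $H$ is a QHSA with coproduct $\Delta'$ under the twisted structure induced by $\bar F_D$.

The following gives alternative expressions for $F_D$ and $F_D^{-1}$ (the proof is straightforward).

Lemma 5.
$$F_D = \sum \Delta'(\bar X_\nu \beta S(\bar Y_\nu)) \cdot \gamma \cdot \Delta(\bar Z_\nu), \qquad F_D^{-1} = \sum \Delta(S(X_\nu)\alpha Y_\nu) \cdot \bar\gamma \cdot (S \otimes S)\Delta^T(Z_\nu).$$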
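5. QHSA Structure Induced by $\Delta'$

In this section we give the full QHSA induced by $\Delta'$.

Proposition 4. $H$ is a QHSA with coproduct, coassociator and canonical elements given respectively by
$$\Delta', \qquad \Phi' \equiv (S \otimes S \otimes S)\Phi_{321}, \qquad \alpha' = S(\beta), \qquad \beta' = S(\alpha).$$
Proof. First we note that
$$\Phi' = (S \otimes S \otimes S)(\Phi^T)^{-1}, \qquad \Phi^T = \Phi^{-1}_{321}.$$
$\Phi^T$ is the coassociator associated with the opposite QHSA structure, and obeys
$$(1 \otimes \Delta^T)\Delta^T(a)(\Phi^T)^{-1} = (\Phi^T)^{-1}(\Delta^T \otimes 1)\Delta^T(a).$$
Applying $S \otimes S \otimes S$ to both sides of this expression yields
$$\Phi' \cdot \sum S(a_{(2)}) \otimes (S \otimes S)\Delta^T(a_{(1)})\,(-1)^{[a_{(1)}][a_{(2)}]} = \Big(\sum (S \otimes S)\Delta^T(a_{(2)}) \otimes S(a_{(1)})\,(-1)^{[a_{(1)}][a_{(2)}]}\Big) \cdot \Phi',$$
which reduces to
$$\Phi' \cdot (1 \otimes \Delta')(S \otimes S)\Delta^T(a) = (\Delta' \otimes 1)(S \otimes S)\Delta^T(a) \cdot \Phi'$$
or
$$(1 \otimes \Delta')\Delta'(a) = (\Phi')^{-1}(\Delta' \otimes 1)\Delta'(a)\,\Phi', \quad \forall a \in H.$$
Next, from
$$(\Delta^T \otimes 1 \otimes 1)\Phi^T \cdot (1 \otimes 1 \otimes \Delta^T)\Phi^T = (\Phi^T \otimes 1) \cdot (1 \otimes \Delta^T \otimes 1)\Phi^T \cdot (1 \otimes \Phi^T)$$
we take the inverse
$$(1 \otimes 1 \otimes \Delta^T)(\Phi^T)^{-1} \cdot (\Delta^T \otimes 1 \otimes 1)(\Phi^T)^{-1} = (1 \otimes (\Phi^T)^{-1}) \cdot (1 \otimes \Delta^T \otimes 1)(\Phi^T)^{-1} \cdot ((\Phi^T)^{-1} \otimes 1)$$
and then apply $S \otimes S \otimes S \otimes S$ to both sides:
$$\begin{aligned}
\text{l.h.s.} &= ((S \otimes S)\Delta^T \cdot S^{-1} \otimes 1 \otimes 1)(S \otimes S \otimes S)(\Phi^T)^{-1} \cdot (1 \otimes 1 \otimes (S \otimes S)\Delta^T \cdot S^{-1})(S \otimes S \otimes S)(\Phi^T)^{-1} \\
&= (\Delta' \otimes 1 \otimes 1)\Phi' \cdot (1 \otimes 1 \otimes \Delta')\Phi' \\
= \text{r.h.s.} &= (\Phi' \otimes 1)(1 \otimes (S \otimes S)\Delta^T \cdot S^{-1} \otimes 1)(S \otimes S \otimes S)(\Phi^T)^{-1} \cdot (1 \otimes \Phi') \\
&= (\Phi' \otimes 1) \cdot (1 \otimes \Delta' \otimes 1)\Phi' \cdot (1 \otimes \Phi').
\end{aligned}$$
Thirdly, from $(1 \otimes \epsilon \otimes 1)\Phi^T = 1$, applying $S \otimes S \otimes S$ to both sides gives
$$(1 \otimes \epsilon \otimes 1)\Phi' = 1.$$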
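As to the canonical elements $\alpha'$ and $\beta'$,
$$\begin{aligned}
m \cdot (1 \otimes \alpha')(S \otimes 1)\Delta'(a) &= m \cdot (1 \otimes S(\beta))(S \otimes 1)(S \otimes S)\Delta^T(S^{-1}(a)) \\
&= m \cdot (1 \otimes S(\beta))(S \otimes 1)(S \otimes S)\sum \bar a_{(2)} \otimes \bar a_{(1)}\,(-1)^{[\bar a_{(2)}][\bar a_{(1)}]} \\
&= \sum S^2(\bar a_{(2)})S(\beta)S(\bar a_{(1)})\,(-1)^{[\bar a_{(2)}][\bar a_{(1)}]} \\
&= S\Big(\sum \bar a_{(1)}\beta S(\bar a_{(2)})\Big) = \epsilon(\bar a)S(\beta) \\
&= \epsilon(S^{-1}(a))S(\beta) = \epsilon(a)\alpha'
\end{aligned}$$
and similarly
$$m \cdot (1 \otimes \beta')(1 \otimes S)\Delta'(a) = \epsilon(a)\beta'.$$
Finally,
$$\begin{aligned}
m(m \otimes 1) \cdot (1 \otimes \beta' \otimes \alpha')(1 \otimes S \otimes 1)(\Phi')^{-1} &= m(m \otimes 1) \cdot (1 \otimes S(\alpha) \otimes S(\beta))(1 \otimes S \otimes 1)(S \otimes S \otimes S)\Phi^{-1}_{321} \\
&= \sum S(\bar Z_\nu)S(\alpha)S^2(\bar Y_\nu)S(\beta)S(\bar X_\nu)\,(-1)^{[\bar Z_\nu] + [\bar X_\nu][\bar Y_\nu]} \\
&= S\Big(\sum \bar X_\nu \beta S(\bar Y_\nu)\alpha \bar Z_\nu\Big) = S(1) = 1,
\end{aligned}$$
and similarly
$$m(m \otimes 1) \cdot (S \otimes 1 \otimes 1)(1 \otimes \alpha' \otimes \beta')(1 \otimes 1 \otimes S)\Phi' = 1.$$
This proves that $H$ is a QHSA with the structure given.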
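5.1. Connection with the Drinfeld twist. Our aim is to show that the twisted structure induced by $F_D$ coincides precisely with the QHSA structure of Proposition 4. We have already shown in Theorem 2 that $\Delta' = \Delta_{F_D}$, so it remains to show that $\Phi' = \Phi_{F_D}$, while $\alpha'$ and $\beta'$ are equivalent to $\alpha_{F_D}$ and $\beta_{F_D}$ respectively. For the coassociator, it remains to prove
$$\Phi' = (S \otimes S \otimes S)\Phi_{321} = \Phi_{F_D} = (F_D \otimes 1)(\Delta \otimes 1)F_D \cdot \Phi \cdot (1 \otimes \Delta)F_D^{-1} \cdot (1 \otimes F_D^{-1}),$$
or
$$\Phi' \cdot (1 \otimes F_D)(1 \otimes \Delta)F_D = (F_D \otimes 1)(\Delta \otimes 1)F_D \cdot \Phi. \tag{5.1}$$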
To this end, (1 ⊗ FD )(1 ⊗ )FD (4.13)
= (1 ⊗ )FD · (1 ⊗ FD ) (4.10) = (1 ⊗ ) (S(Xν )) · (1 ⊗ FD )(1 ⊗ FD−1 ) · (1 ⊗ )γ · (1 ⊗ FD )(1 ⊗ FD−1 ) · (1 ⊗ )(Yν βS(Zν ))(1 ⊗ FD ) (2.1) = (1 ⊗ ) (S(Xν )) · (1 ⊗ FD ) · (1 ⊗ )γ · −1 ( ⊗ 1)(Yν βS(Zν )) · . Now multiplying both sides by on the left gives · (1 ⊗ FD )(1 ⊗ )FD = ( ⊗ 1) (S(Xν )) · · (1 ⊗ FD ) · (1 ⊗ )γ · −1 ( ⊗ 1)(Yν βS(Zν )) · , while we can likewise show (FD ⊗ 1)( ⊗ 1)FD · = ( ⊗ 1)FD · (FD ⊗ 1) · = ( ⊗ 1) (S(Xν )) · ( )−1 · (FD ⊗ 1)( ⊗ 1)γ · · ( )−1 · ( ⊗ 1)(Yν βS(Zν )) · . So to prove (5.1), it suffices to prove (1 ⊗ FD )(1 ⊗ )γ = ( )−1 · (FD ⊗ 1)( ⊗ 1)γ · , or Lemma 6. ( )−1 · (FD ⊗ 1)( ⊗ 1)γ = (1 ⊗ FD )(1 ⊗ )γ · −1 . Proof. Since γ =
S(Bi )αCi ⊗ S(Ai )αDi (−1)[Ai ]([Bi ]+[Ci ]) ,
we have (FD ⊗ 1)( ⊗ 1)γ = FD (S(Bi ))(α)(Ci ) ⊗ S(Ai )αDi (−1)[Ai ]([Bi ]+[Ci ]) (4.13) = (S ⊗ S)T (Bi )FD (α)(Ci ) ⊗ S(Ai )αDi (−1)[Ai ]([Bi ]+[Ci ]) = (S ⊗ S)T (Bi ) · γ · (Ci ) ⊗ S(Ai )αDi (−1)[Ai ]([Bi ]+[Ci ]) = (S ⊗ S)T (Bi ) · (S ⊗ S)T (Aj ⊗ Bj ) · (α ⊗ α) · (Cj ⊗ Dj ) · (Ci ) ⊗ S(Ai )αDi (−1)[Ai ]([Bi ]+[Ci ]) ,
(5.2)
where in the penultimate equation we have used Theorem 2. Set ¯ ¯ ¯ ( )−1 = (S ⊗ S ⊗ S)(Z¯ ν ⊗ Y¯ν ⊗ X¯ ν )(−1)[Zν ]+[Xν ][Yν ] which implies ( )−1 (FD ⊗ 1)( ⊗ 1)γ = (S ⊗ S)T (Y¯ν ⊗ Z¯ ν ) · (S ⊗ S)T · (Bi ) · (S ⊗ S)T (Aj ⊗ Bj ) · (α ⊗ α) · (Cj ⊗ Dj ) · (Ci ) ⊗ S(X¯ ν )S(Ai )αDi ¯
× (−1)[Ai ]([Bi ]+[Ci ])+[Xν ](1+[Bi ]+[Ci ]) = (S ⊗ S)T {(Aj ⊗ Bj )(Bi )(Y¯ν ⊗ Z¯ ν )} · (α ⊗ α) · (Cj ⊗ Dj ) · (Ci ) ⊗ S(Ai X¯ ν )αDi
¯
¯
¯
¯
× (−1)([Aj ]+[Bj ])([Bi ]+[Xν ])+[Bi ][Xν ]+([Ai ]+[Xν ])([Bi ]+[Ci ]+[Xν ]) = ζ (p), where p=
Ai X¯ ν ⊗ (Aj ⊗ Bj ) · (Bi ) · (Y¯ν ⊗ Z¯ ν ) ⊗ (Cj ⊗ Dj ) · (Ci ) ⊗ Di ¯
¯
×(−1)([Aj ]+[Bj ])([Bi ]+[Xν ])+[Bi ][Xν ] and with ζ : H ⊗6 → H ⊗3 defined by ζ (a1 ⊗ a2 ⊗ a3 ⊗ a4 ⊗ a5 ⊗ a6 ) = S(a3 )αa4 ⊗ S(a2 )αa5 ⊗ S(a1 )αa6 × (−1)[a1 ]([a2 ]+[a3 ]+[a4 ]+[a5 ])+[a2 ]([a3 ]+[a4 ]) . Also, p can be reduced to p= (1 ⊗ Aj ⊗ Bj ⊗ Cj ⊗ Dj ⊗ 1) · (1 ⊗ ⊗ ⊗ 1)(Ai ⊗ Bi ⊗ Ci ⊗ Di ) · ( −1 ⊗ 1⊗3 ) = {1 ⊗ (1 ⊗ )(1 ⊗ 1 ⊗ ) −1 ⊗ 1} · (1 ⊗ ⊗ ⊗ 1) · {(1 ⊗ )(1 ⊗ 1 ⊗ ) −1 } · ( −1 ⊗ 1⊗3 ). Now we compute the right-hand side of (5.2): (1 ⊗ FD )(1 ⊗ F )γ · −1 = S(Bi )αCi ⊗ FD (S(Ai )αDi ) · −1 (−1)[Ai ]([Bi ]+[Ci ]) = S(Bi )αCi ⊗ (S ⊗ S)T (Ai )FD (α)(Di ) · −1 (−1)[Ai ]([Bi ]+[Ci ]) = S(Bi )αCi ⊗ (S ⊗ S)T (Ai ) · γ · (Di ) · −1 (−1)[Ai ]([Bi ]+[Ci ]) = S(Bi )αCi X¯ ν ⊗ (S ⊗ S)T {(Aj ⊗ Bj )(Ai )} · (α ⊗ α) · (Cj ⊗ Dj ) ¯ · (Di ) · (Y¯ν ⊗ Z¯ ν )(−1)[Xν ]([Ai ]+[Di ])+[Ai ]([Aj ]+[Bj ]+[Bi ]+[Ci ]) = ζ (p), ˜
where, in the third equality we have used Theorem 2. Here p˜ = (Aj ⊗ Bj )(Ai ) ⊗ Bi ⊗ Ci X¯ ν ⊗ (Cj ⊗ Dj ) · (Di ) · (Y¯ν ⊗ Z¯ ν ) ¯
× (−1)[Xν ]([Di ]+[Cj ]+[Dj ])+[Di ]([Cj ]+[Dj ]) = (Aj ⊗ Bj ⊗ 1⊗2 ⊗ Cj ⊗ Dj ) · ( ⊗ 1⊗2 ⊗ )(Ai ⊗ Bi ⊗ Ci ⊗ Di ) · (1⊗3 ⊗ −1 ). Therefore, to prove (5.2), it suffices to show that ζ (p) = ζ (p). ˜
(5.3)
We first note that ∀h ∈ H ⊗6 and ∀a ∈ H (notation as in Eqs. (4.18)–(4.21)) ζ (34 (a)h) = (a)ζ (h) = ζ (25 (a)h) = ζ (16 (a)h). We can also write where
(5.4)
¯ p˜ = {(1 ⊗ )(1 ⊗ 1 ⊗ ) −1 }1256 · p,
p¯ = ( ⊗ 1⊗2 ⊗ ){(1 ⊗ )(1 ⊗ 1 ⊗ ) −1 } · (1⊗3 ⊗ −1 ).
In the following we use ∼ to denote equivalence under the map ζ : p˜
(5.4),(2.4)
∼
=
(1⊗2 ⊗ (Xν ) ⊗ Yν ⊗ Zν ) · p¯ {(1 ⊗ )(1 ⊗ 1 ⊗ ) −1 }1256 (1 ⊗ Xµ ⊗ 1⊗2 ⊗ Yµ ⊗ Zµ )(X¯ ν ⊗ Y¯ν ⊗ 1⊗2 ⊗ (Z¯ ν )
∼
· {1⊗2 ⊗ ( ⊗ 1 ⊗ 1) } · p¯ L L L (1 ⊗ )1256 · (X¯ ν ⊗ Y¯ν ⊗ (Z¯ ν(1) ) ⊗ Z¯ ν(2) ⊗ Z¯ ν(3) )
=
· {1⊗2 ⊗ ( ⊗ 1 ⊗ 1) } · p¯ (1 ⊗ Xµ ⊗ 1⊗2 ⊗ Yµ ⊗ Zµ ){1⊗2 ⊗ ( ⊗ 1 ⊗ 1)( ⊗ 1)} −1
(5.4),(2.4)
(5.4),(2.4)
∼
=
· {1⊗2 ⊗ ( ⊗ 1 ⊗ 1) } · p¯ (1 ⊗ Xµ ⊗ (Yµ(1) ) ⊗ Yµ(2) ⊗ Zµ ){1⊗2 ⊗ ( ⊗ 1 ⊗ 1)( ⊗ 1)} −1 · {1⊗2 ⊗ ( ⊗ 1 ⊗ 1) } · p¯ {1 ⊗ (1 ⊗ ( ⊗ 1) ⊗ 1) } · {1⊗2 ⊗ (( ⊗ 1) ⊗ 1)} −1 · {1⊗2 ⊗ ( ⊗ 1 ⊗ 1) } · p. ¯
That is, ζ (p) ˜ = ζ (u), where u = {1 ⊗ (1 ⊗ ( ⊗ 1) ⊗ 1) } ¯ · {1⊗2 ⊗ (( ⊗ 1) ⊗ 1)} −1 · {1⊗2 ⊗ ( ⊗ 1 ⊗ 1) } · p.
We now compute p. Using Eq. (2.2) we obtain p = (1⊗2 ⊗ ⊗ 1){1 ⊗ (1 ⊗ 1 ⊗ ) −1 ⊗ 1} · {1 ⊗ ( ⊗ ⊗ 1) } · (1 ⊗ ⊗ ⊗ 1)(1 ⊗ 1 ⊗ ) −1 · ( −1 ⊗ 1⊗3 ) = (1⊗2 ⊗ ⊗ 1){1 ⊗ (1 ⊗ 1 ⊗ ) −1 ⊗ 1} · {1 ⊗ ( ⊗ ⊗ 1) } · {1 ⊗ (1⊗2 ⊗ ( ⊗ 1)) } · {1⊗2 ⊗ (1 ⊗ ( ⊗ 1))} −1 · { ⊗ 1 ⊗ ( ⊗ 1)} −1 = (1⊗2 ⊗ ⊗ 1){1 ⊗ (1 ⊗ (1 ⊗ ) ⊗ 1) } · {1⊗2 ⊗ (1 ⊗ ⊗ 1) } · {1⊗2 ⊗ (1 ⊗ ( ⊗ 1))} −1 · { ⊗ 1 ⊗ ( ⊗ 1)} −1 = (1⊗2 ⊗ ⊗ 1){1 ⊗ (1 ⊗ (1 ⊗ ) ⊗ 1) } · {1⊗2 ⊗ (1 ⊗ ⊗ 1)( ⊗ 1)} −1 · {1⊗2 ⊗ (1 ⊗ ⊗ 1) } · { ⊗ 1 ⊗ ( ⊗ 1)} −1 = {1 ⊗ (1 ⊗ ( ⊗ 1) ⊗ 1) } · {1⊗2 ⊗ (( ⊗ 1) ⊗ 1)} −1 · (1⊗2 ⊗ ⊗ 1){1⊗2 ⊗ (1 ⊗ ⊗ 1) } · { ⊗ 1 ⊗ ( ⊗ 1)} −1 = {1 ⊗ (1 ⊗ ( ⊗ 1) ⊗ 1) } · {1⊗2 ⊗ (( ⊗ 1) ⊗ 1)} −1 · {1⊗2 ⊗ ( ⊗ 1 ⊗ 1) } · {1⊗2 ⊗ (1 ⊗ 1 ⊗ ) } · { ⊗ 1 ⊗ (1 ⊗ )} −1 · (1⊗3 ⊗ −1 ) = {1 ⊗ (1 ⊗ ( ⊗ 1) ⊗ 1) } · {1⊗2 ⊗ (( ⊗ 1) ⊗ 1)} −1 · {1⊗2 ⊗ ( ⊗ 1 ⊗ 1) } · p¯ = u. Thus we have proved (5.3), i.e. ζ (p) = ζ (u) = ζ (p). ˜ This proves Lemma 6, so that
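$$\Phi' = \Phi_{F_D},$$
as required. For the canonical elements, we begin with the following useful result:

Lemma 7. For any $\eta \in H \otimes H$,
$$m \cdot (1 \otimes \alpha)(S \otimes 1)\{\Delta(a)\eta\} = \epsilon(a)\, m \cdot (1 \otimes \alpha)(S \otimes 1)\eta, \tag{5.5}$$
$$m \cdot (1 \otimes \beta)(1 \otimes S)\{\eta\Delta(a)\} = \epsilon(a)\, m \cdot (1 \otimes \beta)(1 \otimes S)\eta. \tag{5.6}$$
Proof. For (5.5), writing $\eta = \sum \eta_i \otimes \eta^i$,
$$\begin{aligned}
\text{l.h.s.} &= m \cdot (1 \otimes \alpha)(S \otimes 1)\Big\{\sum (a_{(1)} \otimes a_{(2)})(\eta_i \otimes \eta^i)\Big\} \\
&= \sum S(\eta_i)S(a_{(1)})\alpha a_{(2)}\eta^i\,(-1)^{[\eta_i]([a_{(1)}]+[a_{(2)}])} \\
&= \epsilon(a)\sum S(\eta_i)\alpha\eta^i = \epsilon(a)\, m \cdot (1 \otimes \alpha)(S \otimes 1)\eta = \text{r.h.s.}
\end{aligned}$$
The proof of (5.6) is similar.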
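For $\alpha_{F_D}$, we have
$$\begin{aligned}
\alpha_{F_D} &= m \cdot (1 \otimes \alpha)(S \otimes 1)F_D^{-1} \\
&= m \cdot (1 \otimes \alpha)(S \otimes 1)\sum \Delta(\bar X_\nu) \cdot \bar\gamma \cdot \Delta'(S(\bar Y_\nu)\alpha \bar Z_\nu) \\
&\overset{(5.5)}{=} m \cdot (1 \otimes \alpha)(S \otimes 1)\sum \epsilon(\bar X_\nu) \cdot \bar\gamma \cdot \Delta'(S(\bar Y_\nu)\alpha \bar Z_\nu) \\
&= m \cdot (1 \otimes \alpha)(S \otimes 1)\{\bar\gamma \cdot \Delta'(\alpha)\} \\
&= m \cdot (1 \otimes \alpha)(S \otimes 1)\sum \Delta(\bar X_\nu)\left(X_\mu \beta S(\bar Z_\nu) \otimes Y_\mu \beta S(\bar Y_\nu Z_\mu)\right) \cdot \Delta'(\alpha) \\
&\qquad \times (-1)^{[\bar Z_\nu]([Y_\mu]+[\bar Y_\nu]) + [\bar X_\nu][Z_\mu]} \\
&\overset{(5.5)}{=} m \cdot (1 \otimes \alpha)(S \otimes 1)\sum \epsilon(\bar X_\nu)\left(X_\mu \beta S(\bar Z_\nu) \otimes Y_\mu \beta S(Z_\mu)S(\bar Y_\nu)\right) \cdot \Delta'(\alpha) \\
&\qquad \times (-1)^{[\bar Z_\nu]([Y_\mu]+[\bar Y_\nu]) + [\bar Y_\nu][Z_\mu]} \\
&= \sum S(\beta\tilde\alpha_{(1)})S(X_\mu)\alpha Y_\mu \beta S(Z_\mu)\tilde\alpha_{(2)} \\
&= \sum S(\beta\tilde\alpha_{(1)})\tilde\alpha_{(2)} \\
&= S\Big(\sum S^{-1}(\tilde\alpha_{(2)})\beta\tilde\alpha_{(1)}\,(-1)^{[\tilde\alpha_{(1)}][\tilde\alpha_{(2)}]}\Big),
\end{aligned}$$
where we have used the notation
$$\Delta'(\alpha) = \sum \tilde\alpha_{(1)} \otimes \tilde\alpha_{(2)}.$$
Now observe
$$\begin{aligned}
\sum S^{-1}(\tilde\alpha_{(2)})\beta\tilde\alpha_{(1)}\,(-1)^{[\tilde\alpha_{(1)}][\tilde\alpha_{(2)}]} &= m \cdot (1 \otimes \beta)(S^{-1} \otimes 1)\,T\,\Delta'(\alpha) \\
&= m \cdot (1 \otimes \beta)(S^{-1} \otimes 1)(S \otimes S)\Delta(S^{-1}(\alpha)) \\
&= m \cdot (1 \otimes \beta)(1 \otimes S)\Delta(S^{-1}(\alpha)) \\
&= \epsilon(S^{-1}(\alpha))\beta = \epsilon(\alpha)\beta,
\end{aligned}$$
which implies
$$\alpha_{F_D} = m \cdot (1 \otimes \alpha)(S \otimes 1)F_D^{-1} = S(\epsilon(\alpha)\beta) = \epsilon(\alpha)S(\beta) = \epsilon(\alpha)\alpha'.$$
The result for $\beta_{F_D}$, namely
$$\beta_{F_D} = m \cdot (1 \otimes \beta)(1 \otimes S)F_D = \epsilon(\beta)\beta',$$
is proved similarly. We have therefore proved the following:

Theorem 3. The QHSA structure defined on $H$ by Proposition 4 is precisely equivalent to that induced by the Drinfeld twist $F_D$.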
5.2. Drinfeld twisting on quasi-triangular QHSAs. Our aim here is to extend Theorem 3 to the important case of quasi-triangular QHSAs. We begin with

Proposition 5. With the full QHSA structure of Proposition 4, $H$ is quasi-triangular with R-matrix
$$R' = (S \otimes S)R.$$
Proof. Applying $S \otimes S$ to (2.9) gives, $\forall a \in H$,
$$R'(S \otimes S)\Delta^T(a) = (S \otimes S)\Delta(a)R',$$
so that
$$R'\Delta'(a) = (\Delta')^T(a)R'.$$
Applying $T \otimes 1$ to (2.10) gives
$$(\Delta^T \otimes 1)R = \Phi^{-1}_{321}R_{23}\Phi_{312}R_{13}\Phi^{-1}_{213}.$$
Then applying $S \otimes S \otimes S$ we obtain
$$\begin{aligned}
\text{l.h.s.} &= ((S \otimes S)\Delta^T \cdot S^{-1} \otimes 1)(S \otimes S)R = (\Delta' \otimes 1)R' \\
= \text{r.h.s.} &= (S \otimes S \otimes S)\Phi^{-1}_{213} \cdot (S \otimes S \otimes S)R_{13} \cdot (S \otimes S \otimes S)\Phi_{312} \cdot (S \otimes S \otimes S)R_{23} \cdot (S \otimes S \otimes S)\Phi^{-1}_{321}.
\end{aligned}$$
Since
$$\Phi'_{123} = (S \otimes S \otimes S)\Phi_{321}, \qquad (\Phi')^{-1}_{123} = (S \otimes S \otimes S)\Phi^{-1}_{321},$$
$$(\Phi')^{-1}_{231} = (S \otimes S \otimes S)\Phi^{-1}_{213}, \qquad \Phi'_{132} = (S \otimes S \otimes S)\Phi_{312},$$
we have
$$(\Delta' \otimes 1)R' = (\Phi')^{-1}_{231}(R')_{13}\Phi'_{132}(R')_{23}(\Phi')^{-1}_{123}.$$
Similarly, applying $(S \otimes S \otimes S)(1 \otimes T)$ to (2.11) we arrive at
$$(1 \otimes \Delta')R' = \Phi'_{312}(R')_{13}(\Phi')^{-1}_{213}(R')_{12}\Phi'_{123}.$$
This completes the proof.

We now show that the R-matrix $R'$ coincides with the R-matrix $R_{F_D}$ induced from $R$ by the Drinfeld twist $F_D$. Our main result is

Theorem 4. The quasi-triangular QHSA structure on $H$, defined by Propositions 4 and 5, is precisely equivalent to the quasi-triangular QHSA structure induced on $H$ by the Drinfeld twist $F_D$. Namely,
$$R' = F_D^T R F_D^{-1} = R_{F_D}.$$
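Proof. To prove this, it suffices to show $R'F_D = F_D^T R$, where
$$F_D^T = \sum (S \otimes S)\Delta(X_\nu) \cdot \gamma^T \cdot \Delta^T(Y_\nu \beta S(Z_\nu)) = T \cdot F_D,$$
and $\gamma^T = T \cdot \gamma$. To this end,
$$\begin{aligned}
R'F_D &= \sum R'(S \otimes S)\Delta^T(X_\nu) \cdot \gamma \cdot \Delta(Y_\nu \beta S(Z_\nu)) \\
&= \sum (\Delta')^T(S(X_\nu))R' \cdot \gamma \cdot \Delta(Y_\nu \beta S(Z_\nu)) \\
&= \sum (S \otimes S)\Delta(X_\nu)R' \cdot \gamma \cdot \Delta(Y_\nu \beta S(Z_\nu)),
\end{aligned}$$
and similarly
$$F_D^T R = \sum (S \otimes S)\Delta(X_\nu) \cdot \gamma^T \cdot R\,\Delta(Y_\nu \beta S(Z_\nu)).$$
It therefore suffices to show

Lemma 8.
$$R'\gamma = \gamma^T R.$$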
Proof. Write R = at ⊗ a t and note that R is even. We then have for the left hand side Rγ = (S(at ) ⊗ S(a t ))(S(Bi )αCi ⊗ S(Ai )αDi )(−1)[Ai ]([Bi ]+[Ci ]) = (S ⊗ S)T {(Ai ⊗ Bi )(a t ⊗ at )} · (α ⊗ α) · (Ci ⊗ Di ) t
t
t
× (−1)[Bi ][a ]+([Ai ]+[a ])([Bi ]+[at ])+[Ai ]([Bi ]+[a ]) = (S ⊗ S)T {(Ai ⊗ Bi )R T } · (α ⊗ α) · (Ci ⊗ Di ) = ψ(v), where
v=
(Ai ⊗ Bi ⊗ Ci ⊗ Di )(R T ⊗ 1⊗2 )
and ψ : H ⊗4 → H ⊗2 is defined by ψ(a1 ⊗ a2 ⊗ a3 ⊗ a4 ) = (S ⊗ S)T (a1 ⊗ a2 ) · (α ⊗ α) · (a3 ⊗ a4 ). For the right hand side (using obvious notation), we have γTR = T( S(Bi )αCi ⊗ S(Ai )αDi ) · (et ⊗ et )(−1)[Ai ]([Bi ]+[Ci ]) = (S ⊗ S)(Ai ⊗ Bi ) · (α ⊗ α)(Di ⊗ Ci )(et ⊗ et )(−1)[Di ][Ci ] = (S ⊗ S)T {T (Ai ⊗ Bi )} · (α ⊗ α) · T {(Ci ⊗ Di )R T } = ψ(v), ˜
where v˜ = (T ⊗ T )
(Ai ⊗ Bi ⊗ Ci ⊗ Di )(1⊗2 ⊗ R T ),
so it suffices to show ψ(v) = ψ(v). ˜ Above we have used Lemma 2, so that Ai ⊗ Bi ⊗ Ci ⊗ Di = ( −1 ⊗ 1)( ⊗ 1 ⊗ 1) , Ai ⊗ Bi ⊗ Ci ⊗ Di = (1 ⊗ )(1 ⊗ 1 ⊗ ) −1 . In view of Eq. (2.9), v immediately reduces to T T v = ( −1 123 (R )12 ⊗ 1)( ⊗ 1 ⊗ 1) .
With the help of the equation (1 ⊗ )R T = (T ⊗ 1)(1 ⊗ T )( ⊗ 1)R −1 T T = −1 123 (R )12 213 (R )13 312 , v can be written −1 T v = {(1 ⊗ )R T · 312 (R T )−1 13 213 ⊗ 1}( ⊗ 1 ⊗ 1) −1 T = 23 (at )(a t ⊗ 1⊗3 ){ 312 (R T )−1 13 213 ⊗ 1}( ⊗ 1 ⊗ 1) .
Now observe ψ(23 (a)h) = (a)ψ(h) = ψ(14 (a)h),
(5.7)
which holds ∀a ∈ H , h ∈ H ⊗4 . In what follows, we use ∼ to denote equivalence under ψ. We then have (5.7)
v ∼
−1 T (at )(a t ⊗ 1⊗3 ){ 312 (R T )−1 13 213 ⊗ 1} · ( ⊗ 1 ⊗ 1)
−1 = (T ⊗ 1 ⊗ 1){( 132 ⊗ 1)((R T )−1 ⊗ 1)( ⊗ 1 ⊗ 1) } 23 ⊗ 1)( (2.2)
= (T ⊗ 1 ⊗ 1){( 132 ⊗ 1)(1 ⊗ (R T )−1 ⊗ 1)(1 ⊗ ⊗ 1) · (1 ⊗ )(1 ⊗ 1 ⊗ ) −1 }
(2.9)
= (T ⊗ 1 ⊗ 1){( 132 ⊗ 1)(1 ⊗ T ⊗ 1) · (1 ⊗ (R T )−1 ⊗ 1)(1 ⊗ )(1 ⊗ 1 ⊗ ) −1 }
(2.2)
= (T ⊗ 1 ⊗ 1){(1 ⊗ T ⊗ 1)(( ⊗ 1 ⊗ 1) · (1 ⊗ 1 ⊗ ) · (1 ⊗ −1 )) · (1 ⊗ (R T )−1 ⊗ 1) · (1 ⊗ )(1 ⊗ 1 ⊗ ) −1 }.
By straightforward application of Eq. (5.7) we obtain v∼ (Xν )(Zµ )(Yν ⊗ 1⊗2 ⊗ Zν )(1 ⊗ Xµ ⊗ Yµ ⊗ 1) · (T ⊗ 1 ⊗ 1){(1 ⊗ T ⊗ 1)(1 ⊗ −1 ) · (1 ⊗ (R T )−1 ⊗ 1) · (1 ⊗ )(1 ⊗ 1 ⊗ ) −1 } T −1 −1 = (T ⊗ 1 ⊗ 1){(1 ⊗ −1 213 )((R )23 ⊗ 1)(1 ⊗ )(1 ⊗ 1 ⊗ ) }.
As to v˜ we note that ( ⊗ 1)R T = (1 ⊗ T )(T ⊗ 1)(1 ⊗ )R T = 123 (R T )23 −1 132 (R )13 231 . Paying particular attention to Eqs. (2.9) and (5.7), we have v˜ = (T ⊗ T ) · {(1 ⊗ )(1 ⊗ 1 ⊗ ) −1 · (1⊗2 ⊗ R T )} (2.9)
= (T ⊗ T ) · {(1 ⊗ )(1⊗2 ⊗ R T )(1 ⊗ 1 ⊗ T ) −1 } T = (T ⊗ T ){(1 ⊗ 123 R23 )(1 ⊗ 1 ⊗ T ) −1 } T −1 = 14 (a t )(1⊗2 ⊗ at ⊗ 1)(T ⊗ T ){1 ⊗ −1 231 (R )13 132 } t
· (T ⊗ T )(1⊗2 ⊗ T )(1 ⊗ 1 ⊗ ) −1 (−1)[at ][a ] T −1 ∼ (a t )(1⊗2 ⊗ at ⊗ 1)(T ⊗ T ){1 ⊗ ( −1 231 (R )13 132 )} · (T ⊗ 1⊗2 )(1 ⊗ 1 ⊗ ) −1 T −1 = (T ⊗ 1⊗2 ){(1⊗2 ⊗ T ){(1 ⊗ −1 231 )(1 ⊗ (R )13 )(1 ⊗ 132 )} · (1 ⊗ 1 ⊗ ) −1 }. We therefore have T −1 v˜ ∼ (T ⊗ 1 ⊗ 1){(1 ⊗ −1 ⊗ 1)(1 ⊗ )(1 ⊗ 1 ⊗ ) −1 }. 213 )(1 ⊗ (R )
Thus $\psi(v) = \psi(\tilde v)$, from which the lemma follows.
This is sufficient to prove Theorem 4.
6. Concluding Remarks As noted in the introduction, the potential for applications of QHSAs is enormous, particularly in knot theory and supersymmetric integrable models, and these applications will be investigated elsewhere. In applications such as these, it is important to have a well developed and accessible structure theory, which has been the main focus of this paper. It is worth noting, even in the non-graded case, that the structure induced by the Drinfeld twist (4.10) has only been investigated for quasi-bialgebras [3]. Thus our results on the complete (graded) quasi-Hopf algebra structure, and in particular the purely algebraic and universal proof of Theorem 4, are new even in the non-graded case. Note. After this paper was posted to the math.QA bulletin board, we were informed by F. Hausser of their paper [19], in which the result of Theorem 4 was proved (in the non-graded case only) using graphical techniques on the category of finite dimensional modules of H . However, as we have mentioned above, our proof is purely algebraic and universal. Acknowledgements. P.S.I is supported by a JSPS postdoctoral fellowship.
References

1. Gould, M.D., Zhang, Y.-Z., Isaac, P.S.: J. Math. Phys. 41, no. 1, 547 (2000)
2. Zhang, Y.-Z., Gould, M.D.: J. Math. Phys. 40, no. 10, 5264 (1999)
3. Drinfeld, V.G.: Quasi-Hopf algebras. Leningrad Math. J. 1, 1419 (1990)
4. Babelon, O., Bernard, D., Billey, E.: Phys. Lett. B375, 89 (1996)
5. Fronsdal, C.: Lett. Math. Phys. 40, 134 (1997)
6. Jimbo, M., Konno, H., Odake, S., Shiraishi, J.: Transform. Groups 4, no. 4, 303 (1999)
7. Arnaudon, D., Buffenoir, E., Ragoucy, E., Roche, Ph.: Lett. Math. Phys. 44, no. 3, 201 (1998)
8. Enriquez, B., Felder, G.: Commun. Math. Phys. 195, no. 3, 651 (1998)
9. Foda, O., Iohara, K., Jimbo, M., Kedem, R., Miwa, T., Yan, H.: Lett. Math. Phys. 32, 259 (1994)
10. Felder, G.: Elliptic quantum groups. Proc. ICMP Paris 1994. Cambridge, MA: International Press, 1995, pp. 211
11. Baxter, R.J.: Ann. Phys. 70, 193 (1972)
12. Andrews, G.E., Baxter, R.J., Forrester, P.J.: J. Stat. Phys. 35, 193 (1984)
13. Belavin, A.: Nucl. Phys. B180, 189 (1981)
14. Jimbo, M., Miwa, T., Okado, M.: Commun. Math. Phys. 116, 507 (1988)
15. Bazhanov, V.V., Stroganov, Yu.G.: Theor. Math. Phys. 62, 253 (1985)
16. Deguchi, T., Fujii, A.: Mod. Phys. Lett. A6, 3413 (1991)
17. Altschuler, D., Coste, A.: Commun. Math. Phys. 150, 83 (1992)
18. Sweedler, M.E.: Hopf Algebras. New York: Benjamin, 1969
19. Hausser, F., Nill, F.: Commun. Math. Phys. 199, no. 3, 547 (1999)
Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 224, 373 – 397 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Hairy Black Hole Solutions to SU(3) Einstein–Yang–Mills Equations

Wei H. Ruan

Department of Mathematics, Computer Science and Statistics, Purdue University Calumet, Hammond, IN 46323, USA. E-mail: [email protected]

Received: 23 October 2000 / Accepted: 30 January 2001
Abstract: We give a rigorous proof of existence of infinitely many black hole solutions to the Einstein–Yang–Mills equations with gauge group SU(3). In the case that the radius of event horizon is not too small, we show that there is a black hole solution for any possible numbers of zeros of the two field variables.

1. Introduction

The coupling of Einstein's general relativity with Yang–Mills field theories has been receiving active study for over a decade, ever since the discovery by Bartnik and McKinnon of numerical solutions of hairy black holes when the gauge group is SU(2). The Einstein–Yang–Mills equations with the gauge group SU(N) have the form
$$\begin{aligned}
r^2\mu\,\omega_i'' + \Phi\,\omega_i' + \frac{\partial P}{\partial \omega_i} &= 0, \quad i = 1, \ldots, N-1, \\
r\mu' + 2G\mu &= \frac{\Phi}{r}, \\
\frac{S'}{S} &= \frac{2}{r}\,G, \qquad r > \hat r,
\end{aligned} \tag{1.1}$$
where $\hat r > 0$ is the radius of event horizon,
$$P = -\frac{1}{8}\sum_{i=1}^{N}\left(\omega_i^2 - \omega_{i-1}^2 - N - 1 + 2i\right)^2, \tag{1.2}$$
in which $\omega_0 = \omega_N = 0$, and
$$G = \sum_{i=1}^{N-1}\omega_i'^{\,2}, \qquad \Phi = r(1 - \mu) + \frac{4}{r}\,P. \tag{1.3}$$
374
W. H. Ruan
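This system is derived using the ansatz
$$ds^2 = \mu^{-1}dr^2 + r^2\left(d\theta^2 + \sin^2\theta\, d\phi^2\right) - S^2\mu\, dt^2$$
for the metric and
$$A_j\, dx^j = \frac{1}{2}\left(C - C^H\right)d\theta - \frac{i}{2}\left[\left(C + C^H\right)\sin\theta + D\cos\theta\right]d\phi$$
for the field potential, where
$$C = \begin{pmatrix} 0 & \omega_1 & & \\ & 0 & \ddots & \\ & & \ddots & \omega_{N-1} \\ & & & 0 \end{pmatrix}, \qquad D = \begin{pmatrix} N-1 & & & \\ & N-3 & & \\ & & \ddots & \\ & & & -N+1 \end{pmatrix}.$$
(For a derivation of the equations, see [4, 6].) A regular black hole solution is the one that satisfies the condition
$$\mu(\hat r) = 0, \qquad \mu(r) > 0 \ \text{for } r > \hat r, \qquad \text{and} \quad \lim_{r \to \infty}\mu(r) = 1.$$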
For such a system, the so-called No Hair Conjecture has been the general belief for a long time. The conjecture states that a stationary black hole is uniquely determined by mass, angular momentum, and Yang–Mills charge at infinity. This was disproved by Bartnik and McKinnon [1] in 1988. They found in the SU(2) case numerical solutions corresponding to nonsingular and nonabelian black holes. (See also [2, 5, 11] and a recent review by Volkov and Gal'tsov [12].) A rigorous and thorough mathematical analysis in this case is given by Smoller, Wasserman, and Yau [9]. (See also [8, 10].) It is shown that for every value of the radius of the event horizon, and every nonnegative integer $n$, there are two black hole solutions such that the field function $\omega$ has exactly $n$ zeros. A natural question is hence whether this result can be extended to the more general case where the gauge group is SU(N). Since in this case there are $N-1$ field functions $\omega_1, \ldots, \omega_{N-1}$, the conjecture is that for every radius of the event horizon and every $(N-1)$-tuple $(n_1, \ldots, n_{N-1})$ of nonnegative integers, there are $k$ black hole solutions such that $\omega_i$ has exactly $n_i$ zeros for $i = 1, \ldots, N-1$, where $k = 2^{3(N-1)/2}$ if $N$ is odd and $k = 2^{3(N-2)/2+1}$ if $N$ is even. (The multiplicity is due to the symmetry of the system under the changes $\omega_i \to -\omega_i$, and $\omega_i \to \omega_{N-i}$ for any fixed $i$.) A proof of this conjecture has been attempted but not yet achieved. The SU(N) case appears to be difficult because the $N-1$ field equations are strongly coupled (i.e., coupled through derivatives). It is to be noted that in a recent paper [6], Mavromatos and Winstanley give an argument for a weaker version of the conjecture: given $N-1$ nonnegative integers $n_1, \ldots, n_{N-1}$, there exist solutions such that each $\omega_i$ possesses at least (rather than exactly) $n_i$ zeros. Their argument, however, is heuristic. There are gaps in the proofs at the fundamental level.
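The multiplicity $k$ quoted above can be cross-checked with a short Python sketch. This is only an illustration: it assumes that $k$ simply counts the $N-1$ independent sign flips $\omega_i \to -\omega_i$ together with the independent swaps $\omega_i \leftrightarrow \omega_{N-i}$ (one per pair with $i < N-i$), and the function names are hypothetical.

```python
def multiplicity_formula(n):
    """k from the closed form: 2^{3(N-1)/2} for odd N, 2^{3(N-2)/2 + 1} for even N."""
    if n < 2:
        raise ValueError("SU(N) requires N >= 2")
    if n % 2 == 1:
        return 2 ** (3 * (n - 1) // 2)
    return 2 ** (3 * (n - 2) // 2 + 1)


def multiplicity_by_symmetry(n):
    """k counted from symmetries (an assumption about where k comes from).

    There are N-1 sign flips omega_i -> -omega_i and one swap
    omega_i <-> omega_{N-i} for each pair {i, N-i} with i < N-i.
    """
    if n < 2:
        raise ValueError("SU(N) requires N >= 2")
    sign_flips = n - 1
    swaps = (n - 1) // 2
    return 2 ** (sign_flips + swaps)


for n in range(2, 8):
    print(n, multiplicity_formula(n), multiplicity_by_symmetry(n))
```

For $N = 2$ this gives $k = 2$, matching the two solutions of the SU(2) case, and for $N = 3$ it gives $k = 8$.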
In this paper, we give a rigorous proof of the original version of the conjecture in a special case where the gauge group is SU(3) and the radius $\hat r$ of the event horizon
exceeds 2. In this particular case, the equations for the metric variable $\mu$ and the field functions $\omega_i$ have the form
$$\begin{aligned}
r^2\mu\,\omega_1'' + \Phi\,\omega_1' + \omega_1\left(1 - \omega_1^2 + \tfrac{1}{2}\omega_2^2\right) &= 0, \\
r^2\mu\,\omega_2'' + \Phi\,\omega_2' + \omega_2\left(1 - \omega_2^2 + \tfrac{1}{2}\omega_1^2\right) &= 0, \qquad r > \hat r, \\
r\mu' + 2G\mu &= \frac{\Phi}{r},
\end{aligned} \tag{1.4}$$
together with the initial condition
$$\mu(\hat r) = 0, \qquad \omega_i(\hat r) = \hat\omega_i, \quad i = 1, 2, \tag{1.5}$$
where $\hat\omega_i$ are constants. We set aside the equation for $S$ since $S$ is not involved in the above equations, and hence can be solved separately once $\mu$, $\omega_1$ and $\omega_2$ are found. In view of the symmetry of the system under the substitutions $\omega_i \to -\omega_i$ for $i = 1, 2$, and $\omega_1 \leftrightarrow \omega_2$, we may restrict ourselves to consider only the solutions that satisfy the initial condition $0 \le \hat\omega_1 \le \hat\omega_2$. Our main result is the following.

Theorem 1.1. Suppose $\hat r > 2$. Then for any integers $n_1 \ge n_2 \ge 0$, there is a regular black hole solution $(\mu, \omega_1, \omega_2)$ of the system (1.4)–(1.5) such that
$$0 \le \hat\omega_1 \le \hat\omega_2 \le \sqrt{2},$$
and $\omega_i$ has exactly $n_i$ zeros over the interval $(\hat r, \infty)$. Furthermore, each black hole solution, with $\hat\omega_1 \ne 0$ and $\hat\omega_2 \ne 0$, has the constant limit
$$\lim_{r \to \infty}\mu = 1, \qquad \lim_{r \to \infty}\omega_i = \pm\sqrt{2}, \quad i = 1, 2,$$
and $(\omega_1, \omega_2)$ reaches the limit from within the square
$$D = \left\{-\sqrt{2} \le \omega_1 \le \sqrt{2},\ -\sqrt{2} \le \omega_2 \le \sqrt{2}\right\}.$$

It is implied by the theorem that at infinity, the field is constant and the spacetime is flat. We will also show that each of these solutions has a finite ADM mass $m \equiv r(1 - \mu)/2$, which is derived from the solution, not an arbitrary constant.

The proof of Theorem 1.1 will be complete at the end of the paper, as a result of analyzing several aspects of the system. Let us define a few terms before describing the structure of this paper. It will be seen that the square $D$ defined in Theorem 1.1 plays an important role. Throughout this paper, by trajectory we mean the curve in the $(\omega_1, \omega_2)$-plane generated by a solution $(\mu, \omega_1, \omega_2)$. We classify trajectories into three types. A crashing trajectory is generated by a solution of which $\mu$ becomes zero before the point $\omega \equiv (\omega_1, \omega_2)$ leaves $D$ (if ever). A connecting trajectory is one that does not crash, and stays in $D$ for all $r > \hat r$. An exiting trajectory is one that leaves $D$ before it crashes (if ever). A starting point $\hat\omega \equiv (\hat\omega_1, \hat\omega_2)$ of a trajectory is called a crashing, connecting, or exiting point if the corresponding trajectory is of the respective type. By zero-numbers of a trajectory
or an initial point we mean the numbers n1 and n2 of zeros of ω1 and ω 2 , respectively, before the trajectory crashes or exits D (if ever). We often write ni ωˆ to indicate the dependence of ni on the starting point ωˆ ∈ D. This paper is organized as follows. In Sect. 2, we give preliminary properties about solutions. In particular, we show that any trajectory that ever reaches the boundary of D in finite r must exit D immediately. We also show that the condition rˆ > 2 eliminates the existence of crashing trajectories. Properties of µ, and m are also given. In Sect. 3, we study the connecting trajectories, and show that these solutions converge to equilibria. This calling it “connecting”. In Sect. 4,give justifies properties of the zero-numbers ni ωˆ . We also show that each zero number ni ωˆ , as an integer-valued function in D, is upper semicontinuous at exiting points and lower semicontinuous at connecting points. In the final section, we examine the distribution of the connecting and exiting points in the square D, and prove the existence of connecting points with all possible values of zero-numbers. Thus we complete the proof of Theorem 1.1. 2. Properties of Solutions In this section, we give a preliminary study about the spacetime variable µ, the field variables (ω1 , ω1 ), the ADM mass m, and the zero-numbers ni ωˆ . We first consider µ. It is clear that the system (1.4) is singular whenever µ = 0. In particular, it is singular at r = rˆ . Because of the singularity at r = rˆ , the existence of a local solution is a nontrivial problem. The result on the existence of a local solution for the general SU (N ) case (1.1) is proved in Künzle [4], which also shows that the solutions depend on the initial values ωˆ = ωˆ 1 , . . . , ωˆ N−1 analytically. Furthermore, from Eq. (1.1), we see that
$$\mu'(\hat r) = \frac{\Phi(\hat r)}{\hat r^2} = \frac{1}{\hat r}\left(1 + \frac{4}{\hat r^2} P(\hat\omega)\right), \qquad \Phi \equiv r(1-\mu) + \frac{4}{r}P(\omega).$$
Since we are only interested in solutions for which $\mu(r) > 0$ for $r > \hat r$, we assume throughout this paper that $\Phi(\hat r) > 0$, or equivalently
$$\hat r^2 > -4P(\hat\omega). \tag{2.1}$$
In the $SU(3)$ case, we will see that this condition holds if $\hat r > 2$. Furthermore, we show that $\mu(r) > 0$ as long as $\omega \in D$. This rules out any other singularity while the trajectory stays in $D$.

Theorem 2.1. Suppose $\hat r > 2$. Let $(\mu, \omega_1, \omega_2)$ be a solution of the initial value problem (1.4)–(1.5) with $(\hat\omega_1, \hat\omega_2) \in D$. Then $\mu(r) > 0$ before the trajectory $\omega = (\omega_1, \omega_2)$ exits $D$ (if ever). Furthermore, $\mu(r) < 1$ holds for all $r$ at which the solution is defined.

Proof. First observe that in the $SU(3)$ case,
$$P(\omega) = -\frac{1}{8}\left[(\omega_1^2 - 2)^2 + (\omega_1^2 - \omega_2^2)^2 + (\omega_2^2 - 2)^2\right].$$
A simple analysis shows that
$$-1 \le P(\omega) \le 0 \quad \text{if } \omega \in D. \tag{2.2}$$
Hence, the condition rˆ > 2 implies (2.1), which in turn implies µ > 0 for r > rˆ and near rˆ .
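The bound (2.2) can be spot-checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the $SU(3)$ potential $P(\omega) = -\frac18[(\omega_1^2-2)^2 + (\omega_1^2-\omega_2^2)^2 + (\omega_2^2-2)^2]$ quoted in the proof, and the helper names `P` and `p_range_on_D` are hypothetical.

```python
import math

def P(w1, w2):
    """SU(3) potential P(omega), as quoted in the proof of Theorem 2.1."""
    a, b = w1 * w1, w2 * w2
    return -((a - 2.0) ** 2 + (a - b) ** 2 + (b - 2.0) ** 2) / 8.0

def p_range_on_D(n=201):
    """Return (min, max) of P sampled on an n-by-n grid over the square D."""
    s = math.sqrt(2.0)
    grid = [-s + 2.0 * s * k / (n - 1) for k in range(n)]
    values = [P(x, y) for x in grid for y in grid]
    return min(values), max(values)

# Hypothetical usage: sample D and report the extreme values of P.
lo, hi = p_range_on_D()
print(lo, hi)
```

On such a grid the sampled values stay in $[-1, 0]$ up to rounding, with the minimum attained at points such as $\omega = (0,0)$ and the maximum at the corners $(\pm\sqrt2, \pm\sqrt2)$. A grid check of this kind illustrates (2.2) but does not replace the analytic argument.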
We next show that $\mu(r) > 0$ whenever $\omega \in D$ on $[\hat r, r]$. Suppose this is not true. Let $r_0$ be the first $r > \hat r$ at which $\mu = 0$. Then $\mu'(r_0) \le 0$. However, from Eq. (1.4) and the relation (2.2),
$$\mu'(r_0) = \frac{1}{r_0^2}\left(r_0 + \frac{4}{r_0} P(\omega(r_0))\right) \ge \frac{1}{r_0^2}\left(r_0 - \frac{4}{r_0}\right).$$
Since $r_0 > \hat r > 2$, it follows that $\mu'(r_0) > 0$. This is impossible. The assertion is thus proved.

Next, we show that $\mu(r) < 1$ for all $r > \hat r$. Suppose this is not true. Let $r_1$ be the first number at which $\mu(r_1) = 1$. Then $\mu'(r_1) \ge 0$. On the other hand, from Eq. (1.1),
$$r_1\mu'(r_1) = -2\mu(r_1)G + \frac{4}{r_1^2}P = -2\left(\omega_1'^2 + \omega_2'^2\right) - \frac{1}{2r_1^2}\left[(\omega_1^2 - 2)^2 + (\omega_1^2 - \omega_2^2)^2 + (\omega_2^2 - 2)^2\right] \le 0.$$
Hence $\mu'(r_1) = 0$. This implies that
$$\omega_i'(r_1) = 0, \qquad \omega_i^2(r_1) = 2, \qquad i = 1, 2.$$
However, by the uniqueness of solutions, these conditions imply that each $\omega_i$ is a constant equilibrium. Therefore $G = 0$ and $P = 0$ for all $r$, and the equation for $\mu$ reduces to $r\mu' + \mu = 1$. It follows that
$$\mu(r_1) = 1 - \frac{\hat r}{r_1} < 1,$$
contradicting the assumption. Hence no such $r_1$ exists.
The above theorem shows that if $\hat r > 2$, then no trajectory starting in $D$ crashes in $D$. We assume $\hat r > 2$ throughout this paper without further notice. We next consider the field variables $\omega \equiv (\omega_1, \omega_2)$. Our next result shows that if a trajectory ever reaches the boundary $\partial D$ from inside $D$, then the trajectory exits $D$.

Theorem 2.2. Let $(\mu, \omega_1, \omega_2)$ be a solution of the initial value problem (1.4)–(1.5) where $\omega(r) \equiv (\omega_1(r), \omega_2(r))$ is not constant. Suppose $\bar r > \hat r$ is such that $\omega(r) \in D$ for $\hat r \le r \le \bar r$ and $\omega(\bar r) \in \partial D$. Then there is an $\varepsilon > 0$ such that $\omega(r) \notin D$ for $\bar r < r < \bar r + \varepsilon$.

Proof. Suppose this is not true. Then either $\omega_1^2$ or $\omega_2^2$ has a local maximum 2 at $\bar r$. Without loss of generality, we may assume that $\omega_1(\bar r) = \sqrt{2}$. Hence $\omega_1'(\bar r) = 0$ and $\omega_1''(\bar r) \le 0$. In view of Theorem 2.1, $\mu(\bar r) > 0$. Thus, from the equation for $\omega_1$ in (1.4),
$$-1 + \frac{1}{2}\omega_2^2(\bar r) \ge 0,$$
that is, $\omega_2^2(\bar r) \ge 2$. Since $\omega_i^2(\bar r) \le 2$ on $\partial D$, it follows that $\omega_2^2(\bar r) = 2$, which is also a local maximum. Hence $\omega_2'(\bar r) = 0$. However, this implies that $\omega(\bar r)$ is at an equilibrium with zero derivatives. It follows from the uniqueness of solutions that $\omega(r)$ is constant for all $r > \hat r$. This contradicts the assumption of the theorem.
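The last step of the proof relies on the corner points $(\pm\sqrt2, \pm\sqrt2)$ of $D$ being equilibria, that is, on $\nabla P$ vanishing there. A minimal sketch of this check follows. It assumes the $SU(3)$ potential $P$ from the proof of Theorem 2.1 and the gradient obtained by differentiating it, $\partial P/\partial\omega_i = \omega_i(1 - \omega_i^2 + \tfrac12\omega_j^2)$ for $j \ne i$. The function names are hypothetical.

```python
import math

def P(w1, w2):
    """SU(3) potential P(omega), as quoted in the proof of Theorem 2.1."""
    a, b = w1 * w1, w2 * w2
    return -((a - 2.0) ** 2 + (a - b) ** 2 + (b - 2.0) ** 2) / 8.0

def grad_P(w1, w2):
    """Analytic gradient of P, obtained by differentiating the formula above."""
    return (w1 * (1.0 - w1 * w1 + 0.5 * w2 * w2),
            w2 * (1.0 - w2 * w2 + 0.5 * w1 * w1))

def grad_P_numeric(w1, w2, h=1e-6):
    """Central finite-difference gradient, used only as a cross-check."""
    return ((P(w1 + h, w2) - P(w1 - h, w2)) / (2.0 * h),
            (P(w1, w2 + h) - P(w1, w2 - h)) / (2.0 * h))

s = math.sqrt(2.0)
corners = [(sx * s, sy * s) for sx in (1, -1) for sy in (1, -1)]
for c in corners:
    # Each printed gradient should vanish up to rounding error.
    print(c, grad_P(*c))
```

The finite-difference helper is included only to guard against an algebra slip in the analytic gradient. It plays no role in the proof.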
Throughout this paper, for any exiting trajectory $\omega$ starting at a point $\hat\omega \in D$, we use $\bar\omega \equiv (\bar\omega_1, \bar\omega_2)$ to denote the first point of the trajectory on $\partial D$. With slight abuse of language, we call $\bar\omega$ the end point of the trajectory. We also use $\bar\mu$ and $\bar r$ to denote the corresponding “end” values of the solution. As a consequence of the previous theorem, we show that the end values depend continuously on $\hat\omega$.

Theorem 2.3. Suppose $\hat\omega^* \in D$ is an exiting point. Then in a neighborhood of $\hat\omega^*$ the end values $\bar r$, $\bar\mu$, $\bar\omega$ depend continuously on the initial point $\hat\omega$.

Proof. Let $(\mu^*, \omega^*)$ be the solution whose trajectory starts at $\hat\omega^*$, and let $\bar r^*$ be the end value of $r$ for this solution. By Theorem 2.1, $\mu^*(\bar r^*) > 0$. Hence, by the continuity of solutions with respect to initial values, for small $\varepsilon$ there is a $\delta > 0$ such that if $\hat\omega \in N_\delta(\hat\omega^*)$ then $\mu > 0$ in $(\bar r^*, \bar r^* + \varepsilon/2]$ and $\omega \notin D$ in $[\bar r^* + \varepsilon/4, \bar r^* + \varepsilon/2]$.

We first show that $\bar r$ depends continuously on $\hat\omega$ in $N_\delta(\hat\omega^*)$. Let $\hat\omega^k$ be a sequence in $D$ such that $\hat\omega^k \to \hat\omega^*$ as $k \to \infty$, and let $\bar r^k$, $\bar\omega^k$ and $\bar\mu^k$ be the corresponding end values of the solutions. Let $\varepsilon_n \to 0^+$ as $n \to \infty$. For each $n$, repeating the argument of the previous paragraph, we can show that $\omega^k \notin D$ in $[\bar r^* + \varepsilon_n/4, \bar r^* + \varepsilon_n/2]$ if $k$ is large enough. Hence
$$\limsup_{k\to\infty} \bar r^k \le \bar r^* + \varepsilon_n/4$$
for each $n$. This shows that $\limsup_{k\to\infty} \bar r^k \le \bar r^*$. On the other hand, by Theorem 2.2, for each $\varepsilon_n$, the distance between $\omega^*(r)$ and $\partial D$ for $r \in [\hat r, \bar r^* - \varepsilon_n]$ has a positive lower bound, say $d_n$. Hence, by the continuity of solutions with respect to initial values, if $k$ is large enough, the distance between $\omega^k(r)$ and $\partial D$ for $r \in [\hat r, \bar r^* - \varepsilon_n]$ is at least $d_n/2$. This implies that
$$\liminf_{k\to\infty} \bar r^k \ge \bar r^* - \varepsilon_n$$
for each $n$. Since $n$ is arbitrary, it follows that $\liminf_{k\to\infty} \bar r^k \ge \bar r^*$. This proves the assertion. Now, the convergence of $\bar\omega^k$ and $\bar\mu^k$ follows directly from $\bar r^k \to \bar r^*$ and the continuous dependence of solutions on initial values.

It follows from the above theorem that the set of exiting points is relatively open in $D$. Hence the set of connecting points is closed. Next, we present some properties of the ADM mass $m = \frac{r}{2}(1 - \mu)$. We first show that $m$ is nondecreasing.

Theorem 2.4. Suppose $\mu > 0$ in an interval $(\hat r, \tilde r)$, where $\tilde r > \hat r$ is a constant. Then $m$ is nondecreasing in $(\hat r, \tilde r)$. Furthermore, $m' > 0$ for all $r \in (\hat r, \tilde r)$ unless both $\omega_1$ and $\omega_2$ are constant.

Proof. By computation,
$$m'(r) = \mu G - \frac{2}{r^2}P = \mu\left(\omega_1'^2 + \omega_2'^2\right) + \frac{1}{4r^2}\left[(\omega_1^2 - 2)^2 + (\omega_1^2 - \omega_2^2)^2 + (\omega_2^2 - 2)^2\right] \ge 0.$$
Hence $m$ is always nondecreasing. If there is an $r_0 \in (\hat r, \tilde r)$ at which $m' = 0$, then
$$\omega_i'(r_0) = 0, \qquad \omega_i^2(r_0) = 2$$
for $i = 1, 2$. Hence each $\omega_i$ is constant by the uniqueness of solutions.
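We next show that $m$ has an upper bound which depends only on the number of sign changes of the field functions $\omega_1$ and $\omega_2$. We first prove the following lemmas.

Lemma 2.1. Suppose $\mu > 0$ and $\omega \in D$ in the interval $(\hat r, \tilde r)$. Then there is a constant $M > 0$, independent of the initial value $\hat\omega$, such that $\mu G \le M$.

Proof. Let $\xi(\hat\omega; r) = \mu G$. Then by computation,
$$r^2\xi' + (2r\xi + \Phi)G + 2\left(\omega_1'\frac{\partial P}{\partial\omega_1} + \omega_2'\frac{\partial P}{\partial\omega_2}\right) = 0. \tag{2.3}$$
Since $D$ is bounded, there exists a constant $M_0 > 1$ such that $|\partial P/\partial\omega_i| \le M_0$, $i = 1, 2$, whenever $\omega \in D$. Hence, by Theorem 2.4 and the bound $|P| \le 1$ in $D$,
$$\Phi = r(1 - \mu) + \frac{4}{r}P = 2m + \frac{4}{r}P \ge 2m(\hat r) - \frac{4}{\hat r} = \hat r - \frac{4}{\hat r} > 0, \tag{2.4}$$
and
$$\omega_1'\frac{\partial P}{\partial\omega_1} + \omega_2'\frac{\partial P}{\partial\omega_2} \ge -M_0\left(|\omega_1'| + |\omega_2'|\right) \ge -M_0\sqrt{2G}.$$
Suppose by contradiction that $\xi$ has no upper bound over all $r$ and $\hat\omega$. Then for each $n > 0$ there are an $\hat\omega_n$ and an $r_n > \hat r$ such that $\xi(\hat\omega_n; r_n) \ge n$. Since $\xi(\hat\omega_n; \hat r) = 0$, $r_n$ can be chosen such that $\xi'(\hat\omega_n; r_n) \ge 0$. Hence, for large $n$,
$$2r_n\xi(\hat\omega_n; r_n) + \Phi \ge 2\hat r n \ge 2\sqrt{2}M_0$$
and
$$G(r_n) = \frac{\xi(\hat\omega_n; r_n)}{\mu(r_n)} \ge n > 1.$$
Hence $G(r_n) > \sqrt{G(r_n)}$ and, at $r = r_n$,
$$r_n^2\xi' + (2r_n\xi + \Phi)G + 2\left(\omega_1'\frac{\partial P}{\partial\omega_1} + \omega_2'\frac{\partial P}{\partial\omega_2}\right) \ge 2\sqrt{2}M_0 G - 2\sqrt{2}M_0\sqrt{G} > 0.$$
This contradicts (2.3).

Lemma 2.2. Let $\tilde r > \hat r$. Then there is a constant $B > 0$, depending on $\tilde r$ nonincreasingly, such that $|\omega_i'(\tilde r)| \le B$, $i = 1, 2$, for any trajectory $\omega$ that stays in $D$ for $r \in [\hat r, \tilde r]$.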
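Proof. We first observe that there is a constant $\sigma > 0$ such that if $\omega(r) \in D$ then $\Phi(r) \ge \sigma$. This can be seen from (2.4) and the assumption $\hat r > 2$. Write the equation for $\omega_i$ in the form
$$\omega_i'' = -\frac{1}{r^2\mu}\left(\Phi\omega_i' + \frac{\partial P}{\partial\omega_i}\right). \tag{2.5}$$
Since $\omega \in D$ in $[\hat r, \tilde r]$, there is a constant, say $M_1$, such that $|\partial P/\partial\omega_i| \le M_1$. Let $r \in (\hat r, \tilde r]$. Assume first that $\omega_i'$ does not change sign in $(\hat r, r)$. Without loss of generality, we may further assume that $\omega_i' \ge 0$ in this interval. If there is an $\bar r$ such that $\hat r < \bar r < r$ and $\omega_i'(\bar r) < 2M_1/\sigma$, then necessarily $\omega_i'(r) \le 2M_1/\sigma$. Otherwise, there would be an $r^* \in (\bar r, r)$ such that
$$\omega_i'(r^*) = \frac{2M_1}{\sigma} \quad\text{and}\quad \omega_i''(r^*) \ge 0.$$
However, by (2.5),
$$\omega_i''(r^*) \le -\frac{1}{r^{*2}\mu}\left(\Phi(r^*)\frac{2M_1}{\sigma} - M_1\right) \le -\frac{M_1}{r^{*2}\mu} < 0,$$
which is impossible. Hence $\omega_i'(r) \le 2M_1/\sigma$. If no such $\bar r$ exists, then $\omega_i' \ge 2M_1/\sigma$ in $(\hat r, r)$. Hence, by (2.5),
$$\omega_i''(s) \le -\frac{1}{s^2\mu}(2M_1 - M_1) = -\frac{M_1}{s^2\mu}$$
for any $s \in (\hat r, r)$. Hence, by Lemma 2.1,
$$\frac{-\omega_i''}{\omega_i'^2} \ge \frac{M_1}{s^2\mu\,\omega_i'^2} \ge \frac{M_1}{s^2\mu G} \ge \frac{M_1}{M_2 s^2},$$
where $M_2$ is an upper bound of $\mu G$ guaranteed by Lemma 2.1. Integrating from $\hat r$ to $r$ with respect to $s$, we have
$$\frac{1}{\omega_i'(r)} \ge \frac{1}{\omega_i'(\hat r)} + \frac{M_1}{M_2}\left(\frac{1}{\hat r} - \frac{1}{r}\right) \ge \frac{M_1}{M_2}\,\frac{r - \hat r}{\hat r r}.$$
Hence,
$$\omega_i'(r) \le \frac{M_2\,\hat r r}{M_1 (r - \hat r)}.$$
Finally, if $\omega_i'$ changes sign in $(\hat r, r)$, then there is an $\bar r$ in this interval such that $\omega_i'(\bar r) = 0 < 2M_1/\sigma$. Thus, by the above argument, $|\omega_i'(r)| \le 2M_1/\sigma$. Therefore, in any case, we can choose
$$B = \max\left\{\frac{2M_1}{\sigma},\ \frac{M_2\,\hat r\tilde r}{M_1(\tilde r - \hat r)}\right\}.$$
We now prove that $m$ is bounded above by a constant that depends only on the number of times the field functions change signs.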
Theorem 2.5. For each nonnegative integer n, there is a constant Mn > 0 such that if both ω1 and ω2 of a solution (µ, ω1 , ω2 ) do not change signs for more than n times while the trajectory ω is in D, then m ≤ Mn as long as ω ∈ D. Proof. Since µ > 0 in D, for r ≤ rˆ + 1, m (r) =
r rˆ + 1 . (1 − µ) ≤ 2 2
For r ≥ rˆ + 1, we integrate m = µG − with respect to r to obtain
1 P, 2r 2
r 1 ds ω12 + ω22 ds + 2 rˆ +1 rˆ +1 2s r 1 rˆ + 1 + ω12 + ω22 ds. + ≤ 2 2 rˆ + 1 rˆ +1 ωi changes its sign at s1 , . . . , sk ∈ rˆ + 1, r , (k ≤ n). Let B be the bound of Suppose ω (r) for r > rˆ + 1 guaranteed by Lemma 2.2. Then i r k ωi (sj +1 ) k √ 2 ωi sj +1 − ωi sj ≤ 2 2nB. ≤ B ωi ds ≤ dω ω (s) i i ωi (sj ) rˆ +1 m (r) ≤ m rˆ + 1 +
r
j =1
j =1
Hence m (r) ≤
√ rˆ + 1 1 + 4 2nB. + 2 2 rˆ + 1
Theorems 2.4 and 2.5 show that if the field functions of a solution change signs only finitely many times, then its ADM mass is finite. Recall that the purpose of this paper is to show that the system has black hole solutions with all possible zero-numbers, each with field functions changing signs finitely many times. Hence each has a finite ADM mass. Using Lemma 2.2, we obtain for future use a positive lower bound of $\mu(r)$ which is independent of $\hat\omega$.

Proposition 2.1. Let $\tilde r > \hat r$. Then there is a function $\delta(\tilde r) > 0$, depending on $\tilde r$ increasingly, such that $\mu(\tilde r) \ge \delta(\tilde r)$ for any trajectory that stays in $D$ for $r \in [\hat r, \tilde r]$.

Proof. By Lemma 2.2, there is a constant $M$ such that
2 2 rˆ + r˜ G = ω1 + ω2 ≤ M in , r˜ . 2 Let a = rˆ + r˜ /2 and let y be the solution of the initial value problem σ , r y (a) = 0,
ry + 2My =
for r > a,
where σ > 0 is a positive lower bound of while ω ∈ D. Then, by the comparison principle,
$$\mu(\tilde r) \ge y(\tilde r) = \frac{\sigma}{2M}\left[1 - \left(\frac{\hat r + \tilde r}{2\tilde r}\right)^{2M}\right] > 0.$$

3. The Convergence of Connecting Trajectories

In this section, we consider solutions whose trajectory $\omega(r)$ stays in $D$ for all $r > \hat r$. It is easy to see that the system (1.4) has nine equilibria for $(\omega_1, \omega_2)$:
$$(0, 0), \quad (\pm 1, 0), \quad (0, \pm 1), \quad (\pm\sqrt{2}, \pm\sqrt{2}).$$
The purpose of this section is to show that, unless $\omega(r)$ is itself one of these equilibria, any solution starting at a connecting point tends to a constant limit
$$\lim_{r\to\infty}(\mu(r), \omega_1(r), \omega_2(r)) = (1, \bar\omega_1, \bar\omega_2),$$
where $\bar\omega_i$ is either $\sqrt{2}$ or $-\sqrt{2}$. The following theorem is actually more general. It only assumes that the trajectory is uniformly bounded, regardless of whether it is in $D$.

Theorem 3.1. Suppose $(\mu(r), \omega_1(r), \omega_2(r))$ is a solution of problem (1.4)–(1.5) such that $\mu(r) > 0$ and $\omega(r) \equiv (\omega_1(r), \omega_2(r))$ is uniformly bounded for all $r > \hat r$. Then $\lim_{r\to\infty}\mu(r) = 1$, and the limit of $\omega(r)$ exists and is an equilibrium. Furthermore, for any $i = 1, 2$, if $\hat\omega_i \equiv \omega_i(\hat r) \ne 0$ then $\lim_{r\to\infty}\omega_i(r) \ne 0$.

The proof is long. We divide it into several steps. We first show that $\mu \to 1$ as $r \to \infty$.

Lemma 3.1. Suppose the condition of Theorem 3.1 holds. Then $\lim_{r\to\infty}\mu(r) = 1$.

Proof. Assume the opposite. Then the mass $m$, being nondecreasing and unbounded, necessarily tends to $\infty$. Hence
$$\lim_{r\to\infty}\Phi(r) = \lim_{r\to\infty}\left[2m(r) + \frac{4}{r}P(\omega)\right] = \infty$$
because $P(\omega)$ is bounded. We show that $\lim_{r\to\infty}\omega_j' = 0$ for each $j = 1, 2$. First observe that, since $\omega$ is bounded, $\liminf_{r\to\infty}|\omega_j'| = 0$. Also, since $\omega$ is bounded, there is an upper bound $M$ such that $|\partial P/\partial\omega_j| \le M$ for all $r$ and $j$. Let $\varepsilon > 0$ be fixed. Since $\Phi \to \infty$, we can choose $r_0 > 0$ so large that $\varepsilon\Phi(r) > 2M$ for all $r \ge r_0$. Increasing $r_0$ if necessary, we may assume also that $|\omega_j'(r_0)| < \varepsilon$. We show that $|\omega_j'(r)| < \varepsilon$ for $r > r_0$. Suppose this is not true. Assume first that $\omega_j'(r_0) \ge 0$ and there is an $r_1 > r_0$ such that $\omega_j'(r_1) = \varepsilon$, and $0 \le \omega_j'(r) < \varepsilon$ for $r < r_1$. Then
$$\omega_j''(r_1) = \frac{-1}{r_1^2\mu}\left(\Phi\omega_j' + \frac{\partial P}{\partial\omega_j}\right)\bigg|_{r=r_1} \le \frac{-1}{r_1^2\mu}(2M - M) < 0.$$
This is impossible. Hence no such $r_1$ exists. If, on the other hand, $\omega_j'(r_0) \le 0$ and there is an $r_1 > r_0$ such that $\omega_j'(r_1) = -\varepsilon$, and $-\varepsilon < \omega_j'(r) \le 0$ for $r < r_1$, then
$$\omega_j''(r_1) = \frac{-1}{r_1^2\mu}\left(\Phi\omega_j' + \frac{\partial P}{\partial\omega_j}\right)\bigg|_{r=r_1} \ge \frac{-1}{r_1^2\mu}(-2M + M) > 0.$$
This is again impossible. This proves that $|\omega_j'(r)| < \varepsilon$ for all $r > r_0$.

To show that $\lim_{r\to\infty}\mu = 1$, we use the variable $\tau = \ln r$ and write the equation for $\mu$ in (1.4) in the form
$$\dot\mu + \left(1 + 2\omega_1'^2 + 2\omega_2'^2\right)\mu = 1 + \frac{4}{r^2}P(\omega),$$
where the upper dot “$\cdot$” represents $d/d\tau$. Let $\delta > 0$. Since $0 < \mu < 1$, $P$ is bounded, and $\omega_j' \to 0$, there is a $T$ such that
$$\dot\mu + \mu > 1 - \delta/2 \quad\text{for } \tau > T.$$
Compare $\mu$ with the solution of the equation
$$\dot\xi + \xi = 1 - \delta/2, \qquad \xi(T) = \mu(T).$$
The comparison principle implies that $\mu > \xi$ for all $\tau > T$. It is clear that $\xi \to 1 - \delta/2$. Hence $\mu > 1 - \delta$ for large $\tau$. Since $\mu < 1$ for all $\tau$, it follows that
$$\lim_{r\to\infty}\mu = \lim_{\tau\to\infty}\mu = 1.$$

Throughout the remainder of this paper, we use $\dot f$ to denote $r\,\frac{df}{dr}$ for any differentiable function $f$. This is equivalent to introducing a new variable $\tau = \ln r$. It is often more convenient not to change the variable explicitly from $r$ to $\tau$. In terms of this operator, the field equation in (1.1) can be written as
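$$\mu\ddot\omega_i + \left(1 - 2\mu + \frac{4}{r^2}P(\omega)\right)\dot\omega_i + \frac{\partial P}{\partial\omega_i} = 0, \qquad i = 1, 2. \tag{3.1}$$
We next show that $\dot\omega_i \to 0$ as $r \to \infty$ for each $\omega_i$ of a uniformly bounded trajectory.

Lemma 3.2. Suppose the condition of Theorem 3.1 holds. Then $\lim_{r\to\infty}\dot\omega_i = 0$ for each $i = 1, 2$.

Proof. Define the energy function
$$H = \frac{1}{2}\mu\dot\omega^2 + P(\omega),$$
where $\dot\omega^2 \equiv \dot\omega_1^2 + \dot\omega_2^2$. By computation,
$$\dot H = \dot\omega^2\left(\frac{3}{2}\mu - \frac{1}{2} - \frac{2}{r^2}P(\omega) - \frac{1}{r^2}\mu\dot\omega^2\right). \tag{3.2}$$
Since $P(\omega)$ is bounded and, by Lemma 3.1, $\mu \to 1$, it is clear that if $\dot\omega$ is also bounded, then $\dot H \approx \dot\omega^2$ for large $r$. We prove the boundedness of $\dot\omega$ as follows. Let $M$ be an upper bound of $|P(\omega)|$ and $|\nabla P(\omega)|$, and let $r_0 > \hat r$ be so large that $M/r_0^2 < 1/24$. Since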
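$\mu \to 1$, we may choose $r_0$ larger if necessary so that $\mu(r) \ge 5/6$ for all $r \ge r_0$. We show that $\dot\omega^2 \le 8M^2$ for all $r > r_0$. Suppose this is not true. Without loss of generality, we may assume that there is an $r_1 > r_0$ such that $\dot\omega_1(r_1) > 2M$. By Eq. (3.1), at any point $r > r_0$ at which $\dot\omega_1 > 2M$, we have
$$\mu\ddot\omega_1(r) = \left(2\mu - 1 - \frac{4}{r^2}P\right)\dot\omega_1(r) - \frac{\partial P}{\partial\omega_1} > 2\left(\frac{5}{3} - 1 - \frac{1}{6}\right)M - M = 0. \tag{3.3}$$
Furthermore, $\ddot\omega_1$ cannot become zero again, because if it did, say first at an $r_2$ after $r_1$, then $\dot\omega_1(r_2) \ge \dot\omega_1(r_1) > 2M$, whereas $\ddot\omega_1(r_2) > 0$ by (3.3). Thus $\dot\omega_1$ stays above $2M$ for all $r > r_1$, which forces $\omega_1$ to be unbounded. This contradiction shows that $\dot\omega_1 \le 2M$ for all $r > r_0$. This proves the boundedness of $\dot\omega_1$.

As a consequence of the boundedness of $P$ and $\dot\omega$, it follows from (3.2) that there is an $r^* > \hat r$ such that
$$\dot H \ge \frac{2}{3}\dot\omega^2 \quad\text{for } r > r^*. \tag{3.4}$$
We show that $\liminf_{r\to\infty}\dot\omega^2 = 0$. If not, by (3.4), $H \to \infty$ as $r \to \infty$. From the definition of $H$, since $P(\omega)$ is bounded, it follows that $\dot\omega^2 \to \infty$. This contradicts the boundedness of $\dot\omega$.

We now show that $\lim_{r\to\infty}\dot\omega^2 = 0$. Suppose this is not true. Then there exist $\delta > 0$ and sequences $\{s_n\}$, $\{t_n\}$ such that $r_0 < \ldots < s_n < t_n < s_{n+1} < \ldots$, $s_n \to \infty$, $t_n \to \infty$,
$$\frac{1}{2}\mu\dot\omega^2(s_n) \ge \delta, \quad\text{and}\quad \frac{1}{2}\mu\dot\omega^2(t_n) \downarrow 0$$
as $n \to \infty$. From the field equations and the boundedness of $\dot\omega$ and $\nabla P$, $\ddot\omega$ is also bounded. Hence there is an $\varepsilon > 0$ such that $\dot\omega^2(r) \ge \delta/2$ in $(s_n - \varepsilon, s_n + \varepsilon)$. It is clear that $(t_n, t_{n+1}) \supset (s_{n+1} - \varepsilon, s_{n+1} + \varepsilon)$. Now, since $\dot H \ge \frac{2}{3}\dot\omega^2$ for large $r$, it follows that for large $n$,
$$\frac{1}{2}\mu\dot\omega^2(t_{n+1}) + P(\omega(t_{n+1})) \ge \frac{1}{2}\mu\dot\omega^2(t_n) + P(\omega(t_n)) + \frac{2}{3}\int_{\tau_n}^{\tau_{n+1}}\dot\omega^2\,d\tau \ge \frac{1}{2}\mu\dot\omega^2(t_{n+1}) + P(\omega(t_n)) + \frac{2}{3}\delta\varepsilon,$$
where $\tau_i = \ln t_i$ for $i = 1, 2, \ldots$. This implies that
$$P(\omega(t_{n+1})) \ge P(\omega(t_n)) + \frac{2}{3}\delta\varepsilon$$
for large $n$. This contradicts the boundedness of $P$. The lemma is thus proved.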
In the next step, we show that $\omega$ has a limit as $r \to \infty$, and the limit is an equilibrium.

Lemma 3.3. Suppose the condition of Theorem 3.1 holds. Then the field function $\omega = (\omega_1, \omega_2)$ has a limit as $r \to \infty$, and the limit is one of the equilibria of the system.
Proof. We first show that $\lim_{\tau\to\infty}\nabla P = 0$. Suppose this is not true. Then there are a constant $\varepsilon_0 > 0$, a component $\omega_j$ and a sequence $\{r_n\}$ such that $r_n \uparrow \infty$ and
$$\left|\frac{\partial P}{\partial\omega_j}(r_n)\right| \ge \varepsilon_0 \quad\text{for all } n.$$
Since $\dot\omega^2 \to 0$, there is an $\tilde r > 0$ such that $|\dot\omega_j| < \delta \equiv \frac{\varepsilon_0}{2(M+3)}$ for $r \ge \tilde r$, where $M$ is an upper bound for $\frac{1}{\mu}\left|2\mu - 1 - 4P/r^2\right|$ for $r > \tilde r$. However, from the field equation (3.1) we see that
$$\begin{aligned}
|\dot\omega_j(e r_n)| &\ge -|\dot\omega_j(r_n)| + \left|\int_{\ln r_n}^{\ln r_n + 1}\ddot\omega_j\,d\tau\right| \\
&\ge -|\dot\omega_j(r_n)| + \left|\int_{\ln r_n}^{\ln r_n + 1}\frac{\partial P}{\partial\omega_j}\,d\tau\right| - \left|\int_{\ln r_n}^{\ln r_n + 1}\frac{1}{\mu}\left(2\mu - 1 - \frac{4}{r^2}P\right)\dot\omega_j\,d\tau\right| \\
&\ge -3\delta + \varepsilon_0 - \delta M = \frac{\varepsilon_0}{2} > \delta.
\end{aligned}$$
This is a contradiction. This proves the assertion.

Finally, let $\Omega$ denote the “$\omega$-limit set” defined by
$$\Omega = \left\{w = (w_1, w_2) : \text{there is a sequence } r_n \to \infty \text{ such that } \lim_{n\to\infty}\omega(r_n) = w\right\}.$$
Since $\omega$ stays in a bounded set, $\Omega$ is clearly nonempty. Also, $\lim_{r\to\infty}\omega$ exists if and only if $\Omega$ is a singleton. Suppose the limit does not exist. We show that $\Omega$ has infinitely many points. Let $w^1$ and $w^2$ be two different points of $\Omega$. Then there are two sequences $\{s_n\}$ and $\{t_n\}$ such that $\lim_{n\to\infty}\omega(s_n) = w^1$ and $\lim_{n\to\infty}\omega(t_n) = w^2$. We may choose the sequences such that $s_n < t_n < s_{n+1} < t_{n+1} < \ldots$ for all $n$. Choose $\varepsilon > 0$ such that the $\varepsilon$-neighborhoods $N(w^1, \varepsilon)$ and $N(w^2, \varepsilon)$ of $w^1$ and $w^2$ do not intersect. Then for each $n$, there is an $r_n$ between $s_n$ and $t_n$ such that $\omega(r_n)$ is outside both neighborhoods $N(w^1, \varepsilon)$ and $N(w^2, \varepsilon)$. Since $\{\omega(r_n)\}$ is bounded, it has a limit point $w^0$. Clearly $w^0 \in \Omega$ and is outside of $N(w^1, \varepsilon)$ and $N(w^2, \varepsilon)$. This shows that for any pair of distinct points of $\Omega$, there is another point. Hence $\Omega$ has infinitely many points. To see that this is impossible, we notice that each point of $\Omega$ is a zero of $\nabla P$. Indeed, from the preceding paragraph, if $\lim_{n\to\infty}\omega(r_n) = w$, then
$$\nabla P(w) = \lim_{n\to\infty}\nabla P(\omega(r_n)) = \lim_{r\to\infty}\nabla P(\omega(r)) = 0.$$
Thus, since $\nabla P$ has only finitely many zeros, $\Omega$ is necessarily a singleton. This proves that the limit $\lim_{\tau\to\infty}\omega$ exists and is a zero of $\nabla P$. It is clear from the equations that any zero of $\nabla P$ is an equilibrium.

Our final step is to show that the limit of any field function $\omega_i$ is nonzero if its initial value $\hat\omega_i$ is nonzero. This would complete the proof of Theorem 3.1.

Lemma 3.4. Let the condition of Theorem 3.1 hold. Then for each $\omega_i$ with the initial value $\omega_i(\hat r) \equiv \hat\omega_i \ne 0$, $\lim_{r\to\infty}\omega_i(r) \ne 0$.
Proof. Suppose the opposite holds. Without loss of generality, we may assume that there is a solution $(\mu, \omega_1, \omega_2)$ such that $\hat\omega_1 \ne 0$ but $\lim_{r\to\infty}\omega_1 = 0$. Let
$$\bar\omega \equiv (0, \bar\omega_2) = \lim_{r\to\infty}\omega(r).$$
Since it is an equilibrium, $\bar\omega_2$ is either $0$ or $\pm 1$. Let $\xi_j = \dot\omega_j$ for $j = 1, 2$. The equations for $\omega_j$ can be written as a system of first order equations
$$\begin{aligned}
\dot\omega_1 &= \xi_1, \qquad \dot\omega_2 = \xi_2, \\
\dot\xi_1 &= -\frac{1}{\mu}\left(1 - 2\mu + \frac{4}{r^2}P\right)\xi_1 - \frac{\omega_1}{\mu}\left(1 - \omega_1^2 + \frac{1}{2}\omega_2^2\right), \\
\dot\xi_2 &= -\frac{1}{\mu}\left(1 - 2\mu + \frac{4}{r^2}P\right)\xi_2 - \frac{\omega_2}{\mu}\left(1 - \omega_2^2 + \frac{1}{2}\omega_1^2\right).
\end{aligned}$$
In a neighborhood of the equilibrium $\bar\omega$, the system can be written as a perturbed system
$$\dot\eta_j = \xi_j, \qquad \dot\xi_j = \xi_j - \sum_{k=1,2}\eta_k\frac{\partial^2 P}{\partial\omega_j\partial\omega_k}(\bar\omega) + \varepsilon_j(r, \eta, \xi), \qquad j = 1, 2, \tag{3.5}$$
where $\eta_j = \omega_j - \bar\omega_j$, and $\varepsilon_j(\tau, \eta, \xi)$ is a smooth function in $\eta \equiv (\eta_1, \eta_2)$ and $\xi \equiv (\xi_1, \xi_2)$ such that $\varepsilon_j(\tau, 0, 0) = 0$. Notice that since $\bar\omega_1 = 0$, the linear part of the equations for $(\eta_1, \xi_1)$ is independent of $(\eta_2, \xi_2)$ and vice versa. Specifically, the linearized equations for $(\eta_1, \xi_1)$ are
$$\dot\eta_1 = \xi_1, \qquad \dot\xi_1 = \xi_1 - \eta_1\left(1 + \frac{1}{2}\bar\omega_2^2\right),$$
whose eigenvalues are complex with positive real parts, and the equations for $(\eta_2, \xi_2)$ are
$$\dot\eta_2 = \xi_2, \qquad \dot\xi_2 = \xi_2 - \eta_2\left(1 - 3\bar\omega_2^2\right). \tag{3.6}$$
This implies that the stable manifold for the linearized system is a subset of the subspace $S_2 = \{\eta_1 = 0, \xi_1 = 0\}$. Let $k$ be the dimension of this stable manifold. We show that the stable manifold of the perturbed system (3.5) also lies in $S_2$. Since $\varepsilon_i(\tau, 0, 0) = 0$, it is well known that the stable manifold of the perturbed system is homeomorphic to the stable manifold of the unperturbed system (see e.g. Chapter 4, Theorem 3.1 of [3]). Hence the stable manifold of the perturbed system again has dimension $k$. Observe next that $S_2$ is invariant under the perturbed system (3.5). To see this, consider the system (1.4), which is equivalent to (3.5). If $\omega_1(\hat r) = 0$, then
$$\omega_1'(\hat r) = -\omega_1(\hat r)\left(1 - \omega_1^2(\hat r) + \frac{1}{2}\omega_2^2(\hat r)\right)\Big/\Phi(\hat r) = 0.$$
Hence by the uniqueness of (local) solutions, $\omega_1(r) = 0$ for all $r > \hat r$. This corresponds to $\eta_1 = \xi_1 = 0$. Hence $S_2$ is invariant. Restricting system (3.5) to $S_2$, we find that its linearized equations about $(0, \bar\omega_2)$ are again given by (3.6). Hence the stable manifold, when restricted to $S_2$, also has dimension $k$. The same dimensionality implies that the stable manifold of (3.5) is contained entirely in $S_2$. Hence any trajectory that approaches $\bar\omega$ as $r \to \infty$ must be in $S_2$ for all $r > \hat r$. This contradicts the assumption $\hat\omega_1 \ne 0$.

This completes the proof of Theorem 3.1 about the convergence of connecting trajectories. In the next section, we examine the number of zeros of $\omega_i$ of connecting and exiting trajectories.

4. Zero-Numbers of Trajectories

In this section, we discuss the properties of the zero-numbers $n_i(\hat\omega)$ at exiting and connecting points. These numbers can be viewed as integer-valued functions in $D$. It will be shown that the variation of $n_i(\hat\omega)$ at any point (i.e., the difference between the maximum and minimum in neighborhoods as the neighborhoods shrink to the point) is at most one. Also, each $n_i$ is upper semicontinuous at any exiting point, and lower semicontinuous at any connecting point. The following lemma is needed.

Lemma 4.1. Suppose $\omega^*$ is a trajectory that stays in $D$ for $r \in [\hat r, \tilde r]$, where $\tilde r > \hat r$ is a constant. Suppose also that the component $\omega_i^*$ has $n$ zeros in $(\hat r, \tilde r)$, and is nonzero at $\tilde r$. Then there are an $r' > \tilde r$ and a neighborhood $N(\hat\omega^*)$ of the initial point $\hat\omega^*$ of $\omega^*$ such that the $i$th component of any trajectory starting in $N(\hat\omega^*)$ has exactly $n$ zeros in $(\hat r, r')$.

Proof. Since by Theorem 2.1 solutions do not crash while the trajectory stays in $D$, the differential equations are not singular for $r \in (\hat r, \tilde r]$. Hence the continuous dependence of solutions on initial values holds while solutions stay in $D$. This implies that there are an $r' > \tilde r$ and a neighborhood $N(\hat\omega^*)$ such that any trajectory $\omega$ starting in $N(\hat\omega^*)$ does not crash for $r \in (\hat r, r']$ and $\omega_i(r') \ne 0$.

We may also assume that $\omega_i^*$ has no additional zero in $(\tilde r, r')$. Suppose by contradiction that there is a sequence of initial points $\hat\omega^k \to \hat\omega^*$ as $k \to \infty$ such that the $i$th component $\omega_i^k$ has the number of zeros $n_k \ne n$. We show that this is impossible. We first show that for large $k$, $n_k \ge n$. Let $r_1 < r_2 < \ldots < r_n$ be the zeros of $\omega_i^*$ in $(\hat r, r')$. Note that since $\omega_i^*$ is not constantly zero, it follows that $r_1 > \hat r$ and $(\omega_i^*)' \ne 0$ at any of these points. Hence there is an $\varepsilon > 0$ such that $\omega_i^*(r_j - \varepsilon)$ and $\omega_i^*(r_j + \varepsilon)$ have opposite signs for $j = 1, \ldots, n$. By the continuous dependence on initial values, for each $j$ and large $k$, $\omega_i^k(r_j - \varepsilon)$ and $\omega_i^k(r_j + \varepsilon)$ also have opposite signs. Hence there is a zero of $\omega_i^k$ in $(r_j - \varepsilon, r_j + \varepsilon)$. The assertion follows if we choose $\varepsilon$ smaller than each of $r_{j+1} - r_j$.

We next show that for large $k$, $n_k \not> n$. Suppose $\omega_i^k$ has more than $n$ zeros for infinitely many $k$. Let $r_1^k < r_2^k < \ldots < r_{n+1}^k$ be $n + 1$ of them in $(\hat r, r')$. Since each sequence $\{r_j^k\}_{k=1}^\infty$ is bounded, we may choose a subsequence so that it has a limit. Let the limit be denoted by $r_j$. Clearly, $r_j$ is a zero of $\omega_i^*$. Since $\omega_i^*$ has only $n$ zeros in $(\hat r, r')$, there are two sequences $r_j^k$ and $r_{j+1}^k$ that converge to the same limit. Let us call this
limit $s$. Hence $\omega_i^*(s) = 0$. However, by the mean value theorem, there is an $s^k \in (r_j^k, r_{j+1}^k)$ such that $(\omega_i^k)'(s^k) = 0$. Taking $k \to \infty$, we see that $(\omega_i^*)'(s) = 0$. By the uniqueness of solutions, $\omega_i^* \equiv 0$. This contradicts the assumption that $\omega_i^*$ has only finitely many zeros.

We first consider $n_i(\hat\omega)$ at an exiting point.

Theorem 4.1. Let $\omega^*$ be an exiting trajectory with the starting point $\hat\omega^*$ and the end point $\bar\omega^*$. Then there is a neighborhood $N(\hat\omega^*)$ of $\hat\omega^*$ such that (1) the zero-number $n_i$ is constant in $N(\hat\omega^*)$ if $\bar\omega_i^* \ne 0$, and (2)
$$n_i(\hat\omega^*) - 1 \le n_i(\hat\omega) \le n_i(\hat\omega^*) \quad\text{for any } \hat\omega \in N(\hat\omega^*)$$
if $\bar\omega_i^* = 0$ and $\omega_i^*$ is not a constant.

Proof. Suppose $\bar\omega_i^* \ne 0$. Then there is an $\varepsilon > 0$ such that $\omega_i^* \ne 0$ in $(\bar r^* - \varepsilon, \bar r^* + \varepsilon)$, where $\bar r^*$ is the end value of $r$ for $\omega^*$. By Theorem 2.3 and continuous dependence on initial values, there is a $\delta > 0$ such that any trajectory $\omega$ starting in the $\delta$-neighborhood $N_\delta(\hat\omega^*)$ has its end value $\bar r \in (\bar r^* - \varepsilon, \bar r^* + \varepsilon)$ and $\omega_i \ne 0$ in $(\bar r^* - \varepsilon, \bar r^* + \varepsilon)$. In view of Lemma 4.1, if $\delta$ is sufficiently small, $\omega_i$ and $\omega_i^*$ have the same number of zeros in $(\hat r, \bar r^*)$. Hence $n_i(\hat\omega) = n_i(\hat\omega^*)$. This proves part (1) of the theorem.

Suppose $\bar\omega_i^* = 0$. By the uniqueness of the solution, since $\omega_i^*$ is not constantly zero, $\bar r^*$ is an isolated zero of $\omega_i^*$. Hence there is an $\varepsilon > 0$ such that $\omega_i^* \ne 0$ in $(\bar r^* - \varepsilon, \bar r^*)$. Choose $\delta > 0$ so that any trajectory $\omega$ starting in $N_\delta(\hat\omega^*)$ has the end value $\bar r \in (\bar r^* - \varepsilon, \bar r^* + \varepsilon)$ and $\omega_i$ has the same number of zeros as $\omega_i^*$ in $(\hat r, \bar r^* - \varepsilon)$. We show that for small $\varepsilon$ and the corresponding $\delta$, $\omega_i$ can have no more than one zero in $(\bar r^* - \varepsilon, \bar r)$. Suppose the opposite is true. Then there are sequences $\varepsilon_k \to 0$ and $\hat\omega^k \to \hat\omega^*$ as $k \to \infty$ such that each $\omega_i^k$ has at least two zeros in $(\bar r^* - \varepsilon_k, \bar r^k)$. Let them be denoted by $s^k < t^k$. Then by the mean value theorem, there is an $r^k$ such that $s^k < r^k < t^k$ and $(\omega_i^k)'(r^k) = 0$. Since $s^k \to \bar r^*$, $t^k \to \bar r^*$ and $\omega_i^k \to \omega_i^*$ as $k \to \infty$, it follows that $\omega_i^*(\bar r^*) = (\omega_i^*)'(\bar r^*) = 0$. Hence by the uniqueness of solutions, $\omega_i^* \equiv 0$, contradicting the assumption of the theorem. Hence $n_i(\hat\omega) \le n_i(\hat\omega^*)$ and the difference between them is at most one. This proves part (2).

We now consider $n_i(\hat\omega)$ at a connecting point.

Theorem 4.2. Let $\hat\omega^* \in D$ be a connecting point such that both $\hat\omega_1^*$ and $\hat\omega_2^*$ are nonzero. Then there is a neighborhood $N(\hat\omega^*)$ in which
$$n_i(\hat\omega^*) \le n_i(\hat\omega) \le n_i(\hat\omega^*) + 1, \qquad i = 1, 2 \tag{4.1}$$
for any $\hat\omega \ne \hat\omega^*$ in $N(\hat\omega^*)$. Furthermore, if $n_i(\hat\omega) > n_i(\hat\omega^*)$ for some $\hat\omega \in N(\hat\omega^*)$, then $\hat\omega$ is exiting.

Proof. For simplicity of notation, we denote the zero-number $n_i(\hat\omega^*)$ by $n_i^*$. Let $(\mu^*, \omega_1^*, \omega_2^*)$ be the solution of problem (1.4), (1.5) that starts at $\hat\omega^*$. In view of Theorem 3.1, $\lim_{r\to\infty}\mu^* = 1$ and there is an equilibrium $p \in D$ such that $\lim_{r\to\infty}\omega^* = p$. Furthermore, by Theorem 2.2, if $\hat\omega^* \ne p$, then for each $r > \hat r$, $\omega^*(r)$ is in the interior
√ √ of D. Without loss of generality, we assume that p = − 2, − 2 . Then, there is an r˜ > rˆ such that ω∗ is in the square √ C = − 2 ≤ ωi ≤ −1, i = 1, 2 for all r > r˜ . By the continuous dependence of solutions on initial values, there is an ε-neighborhood Nε ωˆ ∗ and an r0 > r˜ such that any trajectory ω starting in Nε ωˆ ∗ must stay in D for r ∈ rˆ ,r0 and ω (r0 ) ∈ C. In view of Lemma 4.1, for small ε, ωi has n∗i zeros in the interval rˆ , r0 . Hence, ni ωˆ ≥ n∗i . It remains to show that ε can be chosen so small such that if the trajectory ω leaves C without leaving D, each ωi can have at most one more zero before ω exits D. In the following, let r¯ be the end value of r for the trajectory (i.e., ω (¯r ) ∈ ∂D), let r2 ≤ r¯ be the maximum of r such that ωi has no more than n∗i + 2 zeros before exiting D, and let δ > 0 be so small that δ < 1/3 and √ √ 4 2 1 2 + (2 − 5δ) > √ . (4.2) 2 3 3 3 (1 − 3δ) By choosing ε small enough, we may assume that r0 is so large such that √ 4 4 2 1 1+ ≤ δ, ≤δ r2 r2 3
(4.3)
for r > r0 . Furthermore, by Theorem 2.5, m has an upper bound Mn for r ∈ rˆ , r2 which is independent of r and ω. ˆ Hence, choosing ε smaller if necessary, we assume µ (r) = 1 −
2m 2Mn ≥1− ≥1−δ r r
(4.4)
for r ∈ (r0 , r2 ). We first show that if at any r ∗ ∈ (r0 , r2 ) √ 4 2
ω˙ i > √ , 3 3 (1 − 3δ)
(4.5)
then ω˙ i will be increasing in (r ∗ , r2 ). To see this, we use conditions (4.3) and (4.4) in Eq. (3.1) to obtain
1 µω¨ i r ∗ ≥ (1 − 3δ) ω˙ i r ∗ − ωi 1 − ωi2 + ωj2 2 √
4 2 1 > √ − ωi 1 − ωi2 + ωj2 , (j = i) . 2 3 3 By simple calculation, it can be shown that sup ωi
ω∈D
1 − ωi2
1 + ωj2 2
√ 4 2 = √ . 3 3
Hence ω¨ i (r ∗ ) > 0. If ω¨ i r = 0 for some r ∈ (r ∗ , r2 ), then since at this point (4.1) also holds, the same argument would lead to ω¨ i r > 0. This is impossible. Hence the assertion follows. To show that ωi cannot have more than two zeros before ω exits D, it suffices to show that by the time ωi reaches its first zero after leaving C, ω˙ i is higher than the threshold value that is on the right side of (4.5). The next lemma provides such a lower bound. Lemma 4.2. Suppose ε is chosen such that conditions (4.3) and (4.4) hold. Then each component ωi of a trajectory starting in Nε ωˆ c can have at most one zero in (r0 , r¯ ). At such a zero, if it exists, √ 1 2 + (2 − 5δ) . ω˙ i ≥ 2 3 Proof. Let r1 denote the first zero of ωi in (r0 , r¯ ). Since ωi (r0 ) ≤ −1 and ωi (r1 ) = 0, there is an r1 ∈ (r0 , r1 ) such that ωi r1 = −1 and ω˙ i r1 ≥ 0. Furthermore, from Eq. (3.1), we see that
1 µω¨ i = −ωi 1 − ωi2 + ωj2 > 0, (j = i) 2 whenever ω˙ i = 0 and −1 ≤ ωi ≤ 0. Hence ω˙ i ≥ 0 in r1 , r1 . Define an “individual energy function” Hi by Hi = µω˙ i2 + ωi2 −
ωi4 . 2
By computation,
4P 2µω˙ 2 H˙ i = −1 + 3µ − 2 − ω˙ i2 − ωi ω˙ i ωj2 , (j = i) . r r2 Since P ≤ 0, ωi ≤ 0 and ω˙ i ≥ 0 in r1 , r1 , and by (4.3)-(4.4) √ 2µω˙ 2 4 2 2 µ ≥ 1 − δ, ≤ 2δ, ≤ 2 1+ r2 r 3 it follows that H˙ i ≥ (2 − 5δ) ω˙ i2 > 0
(4.6) in r1 , r1 . This implies that Hi is increasing in this interval. Hence, for any r ∈ r1 , r1 µω˙ i2 + ωi2 − which leads to
ω˙ i >
1 ωi4 = Hi > Hi r1 ≥ 2 2
1 − ω2 ω4 1 − ωi2 + i = √ i . 2 2 2
Also, since ωi (r1 ) = 0, it follows from (4.6) that µω˙ i2 (r1 ) = Hi (r1 ) = Hi r1 +
ln r1 ln r1
H˙ i dτ
ln r1 1 ≥ µω˙ i r1 + + (2 − 5δ) ω˙ i2 dτ 2 ln r1 0 1 − ωi2 1 ≥ + (2 − 5δ) √ dωi 2 2 −1 √ 2 1 = + (2 − 5δ) , 2 3 where τ = ln r. This proves the inequality in the lemma. To see that there can be no other zero for ωi , we observe that by (4.2), ω˙ i satisfies (4.5) at r = r1 . It follows that ω˙ i is increasing in (r1 , r¯ ), and hence must be positive. Therefore, it is impossible for ωi to have another zero in (r1 , r¯ ). This completes the proof of Theorem 4.2.
5. Distribution of Connecting Points in D In this final section, we complete the proof of Theorem 1.1, which asserts that for each pair of nonnegativenumbers n1 and n2 , there is a connecting trajectory ω ∈ D such that ωi has ni zeros in rˆ , ∞ for i = 1, 2. Because of the symmetry of the equations, we only consider connecting points in the subset D = {(ω1 , ω2 ) ∈ D : ω2 ≥ ω1 ≥ 0} . In view of Theorem 4.1, the zero-numbers ni are upper semi-continuous at each exiting point. In addition, by Theorem 4.2, if we define Ni = ni at each exiting point and Ni = ni + 1 at each connecting point, then Ni is upper semi-continuous over the entire D . It is clear that the variation of Ni at any point is again at most 1. Let l1 = (u, v) ∈ D : u = 0 and l2 = (u, v) ∈ D : u = v be the left and right side boundary of D . We first construct a sequence of dividing curves 5k that separate regions where N2 < k and N2 ≥ k. Lemma 5.1. For each k = 0, 1, . . . , there is a continuous curve 5k that joins a point ak ∈ l1 and a point bk ∈ l2 , such that N2 = k + 1 on 5k and in any neighborhood of any point on 5k there are points with N2 < k. Furthermore, any two different curves do not intersect, and 5k+1 is below 5k for each k. Proof. Let Dk = (u, v) ∈ D : N2 (u, v) < k + 1 for k = 0, 1, . . . . Then since N2 is upper semicontinuous, Dk is open for each k. We define 5k by induction as follows. First observe that there is a component of D0 that contains the line segments s1 = v = √ √ √ 2, 0 ≤ u ≤ 2 and s2 = u = 0, 1 ≤ v ≤ 2 in D . In fact, if (ω1 , ω2 ) is a trajectory that starts in s1 , then by computation √ 2 ω ˆ 2 ω2 rˆ = − 2 −1 + 1 > 0. 2 rˆ µ rˆ
Hence the trajectory exits D immediately. Therefore N2 = n2 = 0. Similarly, if (ω1 , ω2 ) starts in s2 , then by the uniqueness of solutions, ω2 ≡ 0. Hence ωˆ 2 ω2 rˆ = − 2 1 − ωˆ 22 > 0. rˆ µ rˆ Furthermore, if ω2 (r) = 0 at any r > rˆ , then by (1.4), r 2 µω2 = −ω2 1 − ω22 > 0 which is impossible. Hence ω exits D along the line {ω1 = 0} with N2 = n2 = 0. This proves the assertion. Let D0∗ denote the component of D0 that contains s1 ∪ s2 and let 50 = ∂D1∗ \ (s1 ∪ s2 ). Hence 50 lies in the interior of D except for a point on l2 . It is clear that 50 joins a point a0 ∈ l1 and a point on b0 ∈ l2 . Clearly, at any point p ∈ 50 , N2 (p) ≤ 1 by upper semi-continuity. Furthermore, if N2 (p) = 0 for some p ∈ 50 , then p is exiting (because n2 cannot be negative), and the end value ω¯ 2 = 0. However, by Theorem 4.1, N2 = n2 = 0 is constant in a neighborhood of p. Hence p ∈ / ∂D0 . This shows that N2 = 1 on 50 . Suppose Dn∗ and 5n have been defined such that Dn∗ is the component of Dn that contains 5n−1 , 5n = ∂Dn∗ \ (s1 ∪ s2 ), and N2 = n + 1 on 5n . Suppose also that 5n joins ∗ a point an ∈ l1 and a point bn ∈ l2 . Hence there is a component of Dn+1 of Dn+1 that ∗ contains 5n . Define 5n+1 = ∂Dn+1 \ (s1 ∪ s2 ). It is clear that 5n+1 is below 5n and it again joins a point an+1 on l1 to a point bn+1 on l2 . Suppose p ∈ 5n+1 at which N2 (p) < n + 2. We first show that p cannot be connecting. If it is, then n2 (p) = N2 (p) − 1 ≤ n. Hence by Theorem 4.2, in a neighborhood of p, n2 ≤ n at every connecting point and n2 ≤ n + 1 at every exiting point. This leads to N2 ≤ n + 1 in this neighborhood, ∗ . On the other hand, p cannot be exiting. Because otherwise, contradicting p ∈ ∂Dn+1 by Theorem 4.1, there is a neighborhood that only contains exiting points such that ∗ . Hence N (p) = n + 2. The N2 = n2 ≤ n + 1. Again it contradicts p ∈ ∂Dn+1 2 construction by induction is complete. 
It is clear from the construction that in any neighborhood of any point of $\Gamma_k$ there are points with $N_2 < k$ and $N_2 \ge k$. This implies that $\Gamma_k \cap \Gamma_m = \emptyset$ if $m > k$. Indeed, if there is an intersection point $p$, then in any small neighborhood of $p$ there are points with $N_2 < k$ and also points with $N_2 \ge m \ge k + 1 \ge N_2 + 2$. This contradicts the fact that the variation of $N_2$ is at most one. The proof is complete.

It follows from this lemma that at any connecting point on $\Gamma_k$, $n_2 = N_2 - 1 = k$. The proof of Theorem 1.1 would be complete if we show that on each dividing curve $\Gamma_k$ there is a connecting point $p_m \equiv (u_m, v_m)$ such that $n_1(p_m) = m$ for each $m = k, k+1, \ldots$. (See Fig. 5.1 below.) For this purpose, we consider how $n_i(\hat\omega)$ changes as the point $\hat\omega$ moves through the null-exiting sets $L_i \equiv \{\hat\omega \in D : \bar\omega_i = 0\}$. Clearly, any exiting point of $\Gamma_k$ lies in $L_2$, since $n_2 = N_2$ is not constant in a neighborhood. The following lemma implies that any point of the intersection $\Gamma_k \cap \bar L_1$ is connecting.

Lemma 5.2. The intersection $\bar L_1 \cap \bar L_2$ contains only connecting points.

Proof. Let $\hat\omega^* \in \bar L_1 \cap \bar L_2$ and let $\bar\omega^*$ be the end point of the corresponding trajectory $\omega^*$. If it is not connecting, then it is exiting. Hence, either $(\bar\omega_1^*)^2 = 2$ or $(\bar\omega_2^*)^2 = 2$. Without loss of generality, we assume the former. Then, by Theorem 2.3, for any $\hat\omega$ near $\hat\omega^*$, the end value $\bar\omega_1$ of the trajectory is nonzero. Hence $\hat\omega \notin L_1$. This means that $\hat\omega^* \notin \bar L_1$, contradicting the assumption.
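Fig. 5.1. Connecting points on $\Gamma_k$ (plot in the $(\omega_1, \omega_2)$-plane showing the curves $\Gamma_0, \ldots, \Gamma_k$, with labels $n_1 = 1$, $n_1 = 2$, $\ldots$, $n_1 = k$, $n_1 = k+1$, $n_1 = k+2$)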
The next lemma describes the change of the zero-number ni ωˆ as ωˆ moves along a curve that passes through Li . Lemma 5.3. L et t ∈ [0, 1] → p (t) ∈ D be a continuous curve containing only exiting points. Suppose ni (p (0)) = ni (p (1)). Then the curve passes through Li . If in addition, the curve passes through Li only once at a point not in the set {ωi = 0}, then n1 (p (0)) differs from n2 (p (0)) by one. Proof. Let t¯ = inf {t ∈ (0, 1) : ni (p (t)) = ni (p(0))}. Then t¯ = 0. Suppose the end value ω¯ i,p(t¯) corresponding to the starting point p t¯ is nonzero. Then by Theorem 4.1, ni is constant in a neighborhood of p t¯ . Hence, there is a t1 < 1 such that ni (p (t1 )) = ni (p (1)) = ni (p (0)). This contradicts the definition of t¯. Hence ω¯ i,p(t¯) = 0, that is, p t¯ ∈ Li . Suppose ni (p (1)) = ni (p (0)) and the curve passes through Li only at t ∈ [0, 1]. Hence ni (p (t)) changes value only at t = t . Furthermore, Theorem 4.1 ensures that the difference of values of ni can be at most one in a neighborhood of t . This implies that ni (p (0)) and ni (p (1)) can differ at most by one. In view of the above lemma, if one shows that points on 5k near ak ∈ l1 can have arbitrary large n1 values, then the existence of connecting points pm ∈ 5k for m ≥ k would follow. To see this, let cn ∈ 5k be near ak such that n1 (cn ) ≥ n > k. Note that bk is connecting and n1 (bk ) = k. Suppose there is no connecting point on 5k between bk and cn . Then by Lemma 5.3 there is a point of L1 in this segment. And by Lemma 5.2 the intersection point is connecting. Since n1 can only change one at a time, and since n can be arbitrarily large, the existence of all types of connecting points pm follows. Hence, the proof of Theorem 1.1 will be complete if we prove the following lemma. Lemma 5.4. Let ωˆ ∗ = (0, v ∗ ) ∈ D , where v ∗ > 0, be a connecting point. Then, for any n > 0, there is an ε > 0 such that n1 ≥ n for any trajectory starting in the neighborhood Nε ωˆ ∗ .
∗ ωˆ in which Proof. Suppose the opposite holds. Then there is a neighborhood N ε ∗ n1 < n. Assume that ε is so small that Nε ωˆ ⊂ Dk for some k. Then by Theorem ¯ which is independent of ωˆ ∈ Nε ωˆ ∗ . Since 2.5, = 2m + 4r P has an upper bound
rˆ > 2 and P ≥ −1 in D, by the monotonicity of m given by Theorem 2.4,
= 2m +
4 4P 4 ≥ 2m rˆ − = rˆ − > 0. r rˆ rˆ
Hence also has a positive lower bound ≡ rˆ − 4/ˆr . In view of Proposition 2.1, µ has a positive lower bound µ for r > rˆ + 1. For any trajectory ω = (ω1 , ω2 ), define the “polar coordinates” ρ (r) and θ (r) in the (ω1 , ω˙ 1 )-space by ω˙ 1 ρ = ω12 + ω˙ 12 , θ = tan−1 , ω1 where the angle is defined so that −π/2 < θ rˆ < π/2. We show that for r and r˜ such that π ¯ r > max rˆ + 1, /µ , r˜ ≥ r + (n + 1) (5.1) 4 ˙ ≤ −1/4 in r , r˜ . there is an ε sufficiently small, such that the end value r ¯ > r ˜ and θ Once this is it is clear that θ (˜r ) − θ r ≥ (n + 1) π and hence ωi has at least proved, n zeros in r , r˜ . This is a contradiction. The conclusion of the lemma thus follows. We prove θ˙ ≤ −1/4 below. Since ωˆ ∗ is a connecting point, itis clear that ε can be chosen so small that r¯ > r˜ for any trajectory starting in Nε ωˆ ∗ . We first show that there is a constant M ≥ 1, depending on r˜ but independent of the initial point ω, ˆ such that |ω1 (r)| ≤ M ωˆ 1 in rˆ , r˜ . (5.2) Observe that by Eq. (1.4),
1 2 ωˆ 1 2 1 − ωˆ 1 + ωˆ 2 < 0. (5.3) 2 rˆ 2 µ rˆ Hence, either ω1 (r) > 0 for all r ∈ rˆ , r˜ or there is a zero of ω1 in this interval. In the first case, 0 ≤ ω1 ≤ ωˆ 1 whichclearly implies (5.2). In the second case, let r0 be the first zero of ω in r ˆ , r ˜ . Then on r ˆ , r , we again have 0 ≤ ω1 ≤ ωˆ 1 . It remains to show 1 0 (5.2) in r0 , r˜ . We show that M can be chosen such that in this interval ω1
rˆ = −
ρ (r) ≤ M ωˆ 1 .
(5.4)
This will imply (5.3). We first compute ρ ρ˙ = ω1 ω˙ 1 −
1 4P 1 1 ω1 ω˙ 1 1 − ω12 + ω22 − ω˙ 12 1 − 2µ + 2 µ 2 µ r
in rˆ , r˜ .
Using the Schwarz inequality and the boundedness of ωi and P , we can find a constant c1 > 0 such that c1 ρ ρ˙ ≤ ρ 2 . (5.5) µ
We next estimate µ in the interval (r0 , r˜ ). Observe that by the mean value theorem, −ωˆ 1 = ω1 (r0 ) − ω1 rˆ = ω1 (r1 ) r0 − rˆ for some r1 ∈ rˆ , r0 . If we can find a constant c2 > 0, independent of ω, such that ω ≤ c2 ωˆ 1 1
in rˆ , r0
(5.6)
then the above relation leads to r0 − rˆ > c2 . Hence, by Proposition 2.1, µ (r) ≥ δ rˆ + c2 ≡ c3
in r0 , r˜ .
(5.7)
To prove (5.6), we observe that since at r = r0 , r02 µω1 = − ω1 ≥ 0, the minimum of ω1 in rˆ , r0 occurs either at rˆ or at a point r2 ∈ (ˆr , r0 ] at which ω1 = 0. In the former case, (5.3) implies that ω (r) ≤ ω rˆ ≤ c2 ωˆ 1 in rˆ , r0 , 1 1 where c2 ≥
2 . rˆ 2 µ rˆ
In the latter case, (1.4) implies that
ω (r) ≤ −ω (r2 ) = ω1 1 − ω2 + 1 ω2 ≤ 2 ωˆ 1 in rˆ , r0 . 1 1 1 2
2
Recall that ≥ > 0 for all r ∈ rˆ , r˜ . Hence we again have (5.6) with c2 ≥ 2/ . This proves (5.6) and (5.7). Let us return to the discussion about ρ. By (5.5) and (5.7), rρ = ρ˙ ≤ c3 ρ
in (r0 , r˜ ) ,
where c3 = c1 /c2 . Furthermore, by (5.6), ρ (r0 ) = |ω˙ 1 (r0 )| ≤ c2 r˜ ωˆ 1 . Hence, by the comparison principle, ρ (r) ≤ ρ (r0 ) r c3 ≤ c2 r˜ c3 +1 ωˆ 1 . This proves (5.4) and also (5.2).
˙ By computation, and using the definition of ρ and θ , We now consider θ. ω1 ω¨ 1 − ω˙ 12 ρ2
4P 1 ω1 1 −ω1 1 − ω12 + ω22 − 1 − 2µ + 2 ω˙ 1 − ω˙ 12 = 2 ρ µ 2 r
1 1 1 4P =− 1 − ω12 + ω22 cos2 θ − 1 − 2µ + 2 sin 2θ − sin2 θ µ 2 2µ r
1 1 1 4P = −1 − 1 − µ − ω12 + ω22 cos2 θ − 1 − 2µ + 2 sin 2θ. µ 2 2µ r
θ˙ =
Since by Theorem 2.1, µ < 1, it follows that
ω2 1 1 2 2 − 1 − µ − ω 1 + ω2 ≤ 1 . µ 2 µ ¯ ≥ , it follows that Also, by (5.1), rµ ≥ r µ >
1 1 4P 1 − 1 − 2µ + 2 sin 2θ ≤ (rµ − ) ≤ . 2µ r 2rµ 2 Hence θ˙ ≤ −1 +
ω12 1 + . µ 2
Hence, by (5.2), we can choose ωˆ 1 sufficiently small such that ω12 /µ ≤ 1/4. This ensures that θ˙ ≤ −1/4 in r , r˜ . The assertion is proven. The proof of Theorem 1.1 is complete.
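The zero-counting step in the proof of Lemma 5.4 can be illustrated on a toy example. The sketch below does not integrate the Einstein–Yang–Mills system; it assumes a hypothetical trajectory $\omega(t) = \cos(t/2)$, chosen only because its polar angle $\theta = \tan^{-1}(\dot\omega/\omega)$ satisfies $\dot\theta \le -1/4$, and checks that a total rotation of at least $(n+1)\pi$ forces at least $n$ zeros. The function name `zeros_and_rotation` and all parameters are illustrative assumptions.

```python
import math

def zeros_and_rotation(omega, omega_dot, t0, t1, steps=10007):
    """Count sign changes of omega on [t0, t1] and accumulate the change of
    theta = atan2(omega_dot, omega), unwrapped across the atan2 branch cut."""
    h = (t1 - t0) / steps
    prev_w = omega(t0)
    prev_theta = math.atan2(omega_dot(t0), prev_w)
    zeros, rotation = 0, 0.0
    for j in range(1, steps + 1):
        t = t0 + j * h
        w = omega(t)
        theta = math.atan2(omega_dot(t), w)
        # Map each increment into [-pi, pi) so that branch jumps are removed.
        rotation += (theta - prev_theta + math.pi) % (2 * math.pi) - math.pi
        if (w < 0) != (prev_w < 0):
            zeros += 1
        prev_w, prev_theta = w, theta
    return zeros, rotation

# Toy trajectory (an assumption, not a solution of the paper's equations):
# omega = cos(t/2) gives theta_dot = -(1/4) / (cos^2(t/2) + sin^2(t/2)/4) <= -1/4.
n = 3
t0 = 0.3
t1 = t0 + 4 * (n + 1) * math.pi   # interval long enough for a rotation of (n+1)*pi
zeros, rotation = zeros_and_rotation(
    lambda t: math.cos(t / 2), lambda t: -0.5 * math.sin(t / 2), t0, t1)
print(zeros, rotation / math.pi)
```

In this toy case the angle decreases faster than the bound requires, so the number of zeros exceeds $n$; the lemma only needs the lower bound.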
References
1. Bartnik, R. and McKinnon, J.: Particlelike solutions of the Einstein–Yang–Mills equations. Phys. Rev. Lett. 61, 141–144 (1988)
2. Bizon, P.: Colored black holes. Phys. Rev. Lett. 64, 2844–2847 (1990)
3. Hale, J.K.: Ordinary Differential Equations. New York: John Wiley & Sons, Inc., 1969
4. Künzle, H.: Analysis of the static spherically symmetric SU(n)-Einstein–Yang–Mills equations. Commun. Math. Phys. 162, 371–397 (1994)
5. Künzle, H. and Masood-ul-Alam, A.: Spherically symmetric static SU(2) Einstein–Yang–Mills fields. J. Math. Phys. 31, 928–935 (1990)
6. Mavromatos, N.E. and Winstanley, E.: Existence theorems for hairy black holes in su(N) Einstein–Yang–Mills theories. J. Math. Phys. 39, 4849–4873 (1998)
7. Ruan, W.H.: Existence of infinitely many black holes in su(3) Einstein–Yang–Mills theory. Nonlinear Analysis 47, 6109–6119 (2001)
8. Smoller, J. and Wasserman, A.: Existence of infinitely-many smooth, static, global solutions of the Einstein–Yang–Mills equations. Commun. Math. Phys. 151, 303–325 (1993)
9. Smoller, J., Wasserman, A. and Yau, S.-T.: Existence of black hole solutions for the Einstein–Yang/Mills equations. Commun. Math. Phys. 154, 377–401 (1993)
10. Smoller, J., Wasserman, A., Yau, S.-T. and McLeod, J.: Smooth static solutions of the Einstein/Yang–Mills equations. Commun. Math. Phys. 143, 115–147 (1991)
11. Volkov, M. and Gal'tsov, D.: Black-holes in Einstein–Yang–Mills theory. Sov. J. Nucl. Phys. 51, 747–753 (1990)
12. Volkov, M. and Gal'tsov, D.: Gravitating non-Abelian solitons and black holes with Yang–Mills fields. Phys. Rep. 319, 2–83 (1999)

Communicated by H. Nicolai
Commun. Math. Phys. 224, 399 – 426 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Quantum Invariant Measures
Nicolai Reshetikhin 1, Milen Yakimov 1,2
1 Department of Mathematics, University of California at Berkeley, Berkeley, CA 94720, USA.
E-mail: [email protected]; [email protected]
2 Department of Mathematics, Cornell University, Ithaca, NY 14853, USA
Received: 26 January 2001 / Accepted: 31 May 2001
Partially supported by NSF grant DMS96-03239.
Partially conducted for the Clay Mathematics Institute and also supported by NSF grants DMS94-00097 and DMS96-03239.

Abstract: We derive an explicit expression for the Haar integral on the quantized algebra of regular functions $C_q[K]$ on the compact real form $K$ of an arbitrary simply connected complex simple algebraic group $G$. This is done in terms of the irreducible $*$-representations of the Hopf $*$-algebra $C_q[K]$. Quantum analogs of the measures on the symplectic leaves of the standard Poisson structure on $K$ which are (almost) invariant under the dressing action of the dual Poisson algebraic group $K^*$ are also obtained. They are related to the notion of quantum traces for representations of Hopf algebras. As an application we define and compute explicitly quantum analogs of Harish-Chandra $c$-functions associated to the elements of the Weyl group of $G$.

1. Introduction

Let $G$ be a simply connected complex simple algebraic group. The commutative Hopf algebra $C[G]$ of regular functions on $G$ has a standard quantization, denoted by $C_q[G]$ and called the quantized algebra of regular functions on $G$. It is a Hopf subalgebra of the dual Hopf algebra of the standard quantized universal enveloping algebra $U_q\mathfrak{g}$. Let $K$ denote a compact real form of $G$. The complex conjugation in the algebra $C[K]$ $(= C[G])$ can be deformed to a conjugate linear antiisomorphism $*$ of $C_q[G]$. This gives rise to a Hopf $*$-algebra $(C_q[G], *)$, called the quantized algebra of regular functions on $K$, which will be denoted by $C_q[K]$. The Hopf algebra $C_q[K]$ is known [1] to have a unique Haar functional $H : C_q[K] \to \mathbb{C}$ normalized by $H(1) = 1$. It is known through a quantum analog of the Schur orthogonality relations. At the same time an analog of the classical expression for the bi-invariant functional on $C[K]$ as an integral over $K$ with respect to the Haar measure was found only
in the case of $SU_2$ [16]. The first result which we obtain in this paper is a representation for the Haar integral on $C_q[K]$ of this type in the general case.

Let us first note that the quantum analog of the set of points on $K$ is the set of irreducible $*$-representations of the Hopf $*$-algebra $C_q[K]$. Its representations were classified by Soibelman [14] and can be nicely described by a version of the Kirillov–Kostant orbit method. Fix a maximal torus $T$ of $K$. Let $G = KAN$ be the related Iwasawa decomposition of $G$. The group $K$ has a standard Poisson structure making it a real Poisson algebraic group which is the semiclassical structure of the deformation of $C[K]$ to $C_q[K]$. The double and dual Poisson algebraic groups of $K$ are isomorphic to $G$ and $AN$ as real algebraic groups, respectively. The dressing action of $AN$ on $K$ is global and is explicitly given by the following rule [9, 14]: $\delta_{an}(k)$, for $a \in A$, $n \in N$, $k \in K$, is such that

$$ank = \delta_{an}(k)\, a_1 n_1 \tag{1.1}$$

for some $a_1 \in A$, $n_1 \in N$ (see [13, 9] for general facts about the dressing action).

Let us choose for each element $w$ of the Weyl group $W$ of $G$ a representative $\dot w$ in the normalizer of $A$ in $K$. The orbits of the dressing action of $AN$ on $K$ (symplectic leaves of $K$) are $S_w.t$, where $w \in W$, $t \in T$ and $S_w$ denotes the orbit of $\dot w$. The disjoint union $\bigsqcup_{t \in T} S_w.t$ is the Bruhat cell $K \cap B\dot wB$, where $B$ is the Borel subgroup $B = TAN$ of $G$. Soibelman proved that the leaves $S_w.t$ are deformed to a set $\pi_{w,t}$ of (inequivalent) irreducible $*$-representations of the Hopf $*$-algebra $C_q[K]$. Up to an equivalence they exhaust all such representations of $C_q[K]$.

Our result on the Haar integral on $C_q[K]$ expresses it as an integral over the maximal torus $T$ of $K$ of the traces of the representations $\pi_{w_\circ,t}$ for the maximal length element $w_\circ$ of $W$. In other words these are the irreducible $*$-representations of $C_q[K]$ corresponding to the symplectic leaves in the maximal Bruhat cell of $K$. This result is derived in Sect. 5. It is particularly suited for obtaining integral expressions for quantum spherical functions. This will be discussed in a future publication.

For each $w \in W$ denote $N_w = N \cap wN_-w^{-1}$ and $N_w^+ = N \cap wNw^{-1}$, where $N_-$ is the unipotent subgroup of $G$ opposite to $N$. Our next result is a quantum analog of the Haar measures on the unipotent groups $N_w$. The symplectic leaf $S_w.t$, considered as an $AN$-homogeneous space via the dressing action, is isomorphic to

$$S_w.t = AN/AN_w^+. \tag{1.2}$$

The quotient $AN/AN_w^+$ does not have a left invariant measure because the ratio of the corresponding modular functions is not equal to 1, see [3]. Using the factorization $AN = N_wAN_w^+$, we can identify $AN/AN_w^+ \cong N_w$, which induces a measure on the symplectic leaf (1.2) from the Haar measure on $N_w$. The resulting measure transforms under the action of $AN$ by the following multiplicative character of $AN$:

$$\chi(an) = a^{2(\rho - w\rho)}, \quad a \in A,\ n \in N. \tag{1.3}$$

The dressing action of $AN = K^*$ on the symplectic leaf $S_w.t$ of $K$ induces an action of $K^*$ on the space of functions on $S_w.t$. The latter transforms in the quantum situation to an action of $C_q[G]$ on the space of linear operators in the Hilbert space completion $\overline{V}_{w,t}$ of the representation space of $\pi_{w,t}$. It coincides with the standard adjoint action

$$c.L = \sum \pi_{w,t}(c_{(1)})\, L\, \pi_{w,t}(S(c_{(2)})). \tag{1.4}$$
(Here and later we use the standard notation for the comultiplication in a Hopf algebra, $\Delta(c) = \sum c_{(1)} \otimes c_{(2)}$.) Let us also note that $C_q[K]$ acts by bounded operators in all of its $*$-representations and thus in particular in $\overline{V}_{w,t}$.

The standard trace in $\overline{V}_{w,t}$ is not a homomorphism from the space of trace class operators in $\overline{V}_{w,t}$ with the adjoint $C_q[G]$-action (1.4) to the 1-dimensional representation of $C_q[G]$ determined by its counit. After Reshetikhin and Turaev such a homomorphism, from possibly a “deformation” of the space of trace class operators, is called a quantum trace for the Hopf algebra module under consideration. We define a space $B_1^q(\overline{V}_{w,t})$ of “quantum” trace class operators in $\overline{V}_{w,t}$, stable under the adjoint $C_q[G]$-action (1.4), and construct a homomorphism from it to the 1-dimensional representation of $C_q[G]$ determined by a multiplicative character of it which is a deformation of the character (1.3). Such homomorphisms, to be called quantum quasi-traces, are treated in Sect. 6, where we also study some of their properties. They are quantum analogs of the invariant measures on the unipotent groups $N_w$ and the almost $AN$-invariant measures on the symplectic leaves $S_w.t$.

Section 7 contains an application to quantum analogs of Harish-Chandra $c$-functions related to the elements of the Weyl group of $G$. They are constructed with the help of the quantum quasi-traces from Sect. 6 and are explicitly computed by a $q$-analog of the original Harish-Chandra formula. In the quantum situation the role of the factorization formulas for the groups $N_w$ as products of 1-dimensional unipotent subgroups is played by tensor product formulas for the representations $\pi_{w,t}$ [14, 7]. In a forthcoming publication we will discuss the relation between the quantum $c$-functions and the asymptotics of quantum spherical functions at infinity, which is similar to the one in the classical case.
Sections 2 and 3 review some standard facts about quantized universal enveloping algebras, quantized function algebras, and their representations. Section 4 deals with a family of elements of Cq [K] which enter in all formulas for quantum invariant functionals derived in this paper.
2. Preliminaries on Quantized Enveloping Algebras

2.1. Root data. Let $\mathfrak{g}$ be a complex simple Lie algebra of rank $l$ with Cartan matrix $(a_{ij})$. Denote by $(\cdot\,,\cdot)$ the invariant inner product on $\mathfrak{g}$ for which the square length of a minimal root equals 2 in the resulting identification $\mathfrak{h}^* \cong \mathfrak{h}$ for a Cartan subalgebra $\mathfrak{h}$ of $\mathfrak{g}$. The sets of simple roots, simple coroots, and fundamental weights of $\mathfrak{g}$ will be denoted by $\{\alpha_i\}_{i=1}^l$, $\{\alpha_i^\vee\}_{i=1}^l$, and $\{\omega_i\}_{i=1}^l$, respectively. Let $P$, $Q$, and $Q^\vee$ denote the weight, root, and coroot lattices of $\mathfrak{g}$. Denote by $\Delta$, $\Delta_+$, $\Delta_-$, and $P_+$ the sets of roots, positive/negative roots, and dominant weights of $\mathfrak{g}$. Set $Q_+ = \{\sum m_i\alpha_i\}$ and $Q_+^\vee = \{\sum m_i\alpha_i^\vee\}$, $m_i \in \mathbb{N}$. Recall that there exists a unique set of relatively prime positive integers $\{d_i\}_{i=1}^l$ for which the matrix $(d_ia_{ij})$ is symmetric, and for it $(\alpha_i, \alpha_j) = d_ia_{ij}$. The Weyl group of $\mathfrak{g}$ will be denoted by $W$. The simple reflections in $W$ will be denoted by $s_i$ and the maximal length element in $W$ by $w_\circ$.
2.2. Definition of $U_q\mathfrak{g}$. Throughout this paper we will assume that $q$ is a real number different from $\pm 1$ and $0$. The adjoint rational form of the quantized universal enveloping algebra $U_q\mathfrak{g}$ of $\mathfrak{g}$ is generated by $K_i^{\pm 1}$ and $X_i^\pm$, $i = 1, \ldots, l$, subject to the relations

$$K_i^{-1}K_i = K_iK_i^{-1} = 1, \qquad K_iK_j = K_jK_i,$$
$$K_iX_j^\pm K_i^{-1} = q_i^{\pm a_{ij}}X_j^\pm,$$
$$X_i^+X_j^- - X_j^-X_i^+ = \delta_{i,j}\,\frac{K_i - K_i^{-1}}{q_i - q_i^{-1}},$$
$$\sum_{r=0}^{1-a_{ij}} (-1)^r \begin{bmatrix} 1-a_{ij} \\ r \end{bmatrix}_{q_i} (X_i^\pm)^r X_j^\pm (X_i^\pm)^{1-a_{ij}-r} = 0, \qquad i \ne j.$$

It is a Hopf algebra with comultiplication given by

$$\Delta(K_i) = K_i \otimes K_i, \qquad \Delta(X_i^+) = X_i^+ \otimes K_i + 1 \otimes X_i^+, \qquad \Delta(X_i^-) = X_i^- \otimes 1 + K_i^{-1} \otimes X_i^-,$$

antipode and counit given by

$$S(K_i) = K_i^{-1}, \qquad S(X_i^+) = -X_i^+K_i^{-1}, \qquad S(X_i^-) = -K_iX_i^-,$$
$$\epsilon(K_i) = 1, \qquad \epsilon(X_i^\pm) = 0,$$

where $q_i = q^{d_i}$. As usual, $q$-integers, $q$-factorials, and $q$-binomial coefficients are denoted by

$$[n]_q = \frac{q^n - q^{-n}}{q - q^{-1}}, \qquad [n]_q! = [1]_q \cdots [n]_q, \qquad \begin{bmatrix} n \\ m \end{bmatrix}_q = \frac{[n]_q!}{[m]_q!\,[n-m]_q!}$$

for $n, m \in \mathbb{N}$ and $m \le n$. The conjugate linear antiisomorphism $*$ of $U_q\mathfrak{g}$ defined on its generators by

$$K_i^* = K_i, \qquad (X_i^+)^* = X_i^-K_i, \qquad (X_i^-)^* = K_i^{-1}X_i^+ \tag{2.1}$$

equips $U_q\mathfrak{g}$ with a structure of a Hopf $*$-algebra. In the limit $q \to 1$ the involution $*$ recovers the Cartan (anti)involution (conjugate linear antiisomorphism of order 2) of $\mathfrak{g}$ associated to its compact real form $\mathfrak{k}$. For the definition and properties of Hopf $*$-algebras we refer to [7, pp. 95–97] and [1, pp. 117–118].

For $i = 1, \ldots, l$ the Hopf subalgebra of $U_q\mathfrak{g}$ generated by $K_i^{\pm 1}$ and $X_i^\pm$ will be denoted by $U_{q_i}\mathfrak{g}_i$. It is naturally isomorphic to $U_{q_i}\mathfrak{sl}_2$. The canonical embedding $U_{q_i}\mathfrak{sl}_2 \cong U_{q_i}\mathfrak{g}_i \hookrightarrow U_q\mathfrak{g}$ will be denoted by $\varphi_i$. Recall that a $U_q\mathfrak{g}$-module is called integrable if the subalgebras $U_{q_i}\mathfrak{g}_i$ act locally finitely. The subalgebras of $U_q\mathfrak{g}$ generated by $\{K_i^{\pm 1}\}_{i=1}^l$, $\{X_i^+\}_{i=1}^l$, and $\{X_i^-\}_{i=1}^l$ will be denoted by $U_0$, $U^+$, and $U^-$, respectively. Clearly $U_0$ is a commutative Hopf subalgebra of $U_q\mathfrak{g}$ isomorphic to the group algebra of the lattice $Q$ equipped with the standard structure of a cocommutative Hopf algebra.
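The $q$-integers, $q$-factorials, and $q$-binomial coefficients of Sect. 2.2 are easy to tabulate. The following is a minimal sketch using exact rational arithmetic, assuming the symmetric convention $[n]_q = (q^n - q^{-n})/(q - q^{-1})$ and $q$-binomials defined through $q$-factorials; the function names are illustrative and not taken from the paper.

```python
from fractions import Fraction

def q_int(n, q):
    """Symmetric q-integer [n]_q = (q^n - q^-n) / (q - q^-1), for q not in {0, 1, -1}."""
    q = Fraction(q)
    return (q ** n - q ** (-n)) / (q - 1 / q)

def q_factorial(n, q):
    """[n]_q! = [1]_q [2]_q ... [n]_q, with the empty product [0]_q! = 1."""
    result = Fraction(1)
    for k in range(1, n + 1):
        result *= q_int(k, q)
    return result

def q_binomial(n, m, q):
    """q-binomial coefficient [n]_q! / ([m]_q! [n-m]_q!) for 0 <= m <= n."""
    return q_factorial(n, q) / (q_factorial(m, q) * q_factorial(n - m, q))

q = Fraction(2)
print(q_int(2, q), q_int(3, q), q_binomial(4, 2, q))
```

For the algebra $U_q\mathfrak{g}$ the same functions would be evaluated at $q_i = q^{d_i}$, for instance in the coefficients of the $q$-Serre relations.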
2.3. Quantum Weyl group. Let $B_\mathfrak{g}$ denote the (generalized) braid group associated to the Coxeter group $W$ with generators $T_i$ corresponding to the simple reflections $s_i \in W$. For any integrable $U_q\mathfrak{g}$-module $V$ one can define an action of $B_\mathfrak{g}$ on $V$. It is given by [10]

$$T_i = \sum_{a,b,c \in \mathbb{N}} (-1)^b q_i^{ac-b} (X_i^+)^{(a)} (X_i^-)^{(b)} (X_i^+)^{(c)},$$

where

$$(X_i^\pm)^{(n)} = \frac{(X_i^\pm)^n}{[n]_{q_i}!}.$$

In the case of the adjoint representation of $U_q\mathfrak{g}$ this gives an action of the braid group $B_\mathfrak{g}$ on $U_q\mathfrak{g}$. The explicit action of $T_i$ on the generators $K_j$, $X_j^\pm$ of $U_q\mathfrak{g}$ is

$$T_i(X_i^+) = -X_i^-K_i, \qquad T_i(X_i^-) = -K_i^{-1}X_i^+, \qquad T_i(K_j) = K_jK_i^{-a_{ij}},$$
$$T_i(X_j^+) = \sum_{r=0}^{-a_{ij}} (-1)^r q_i^{-r} (X_i^+)^{(-a_{ij}-r)} X_j^+ (X_i^+)^{(r)} \quad \text{if } i \ne j,$$
$$T_i(X_j^-) = \sum_{r=0}^{-a_{ij}} (-1)^r q_i^{r} (X_i^-)^{(r)} X_j^- (X_i^-)^{(-a_{ij}-r)} \quad \text{if } i \ne j.$$

The defined actions of $B_\mathfrak{g}$ are compatible in the sense that for any integrable $U_q\mathfrak{g}$-module $V$,

$$T_i.(xv) = (T_ix).(T_iv), \qquad \forall\, x \in U_q\mathfrak{g},\ v \in V.$$

Recall that there exists a canonical section $T : W \to B_\mathfrak{g}$ of the natural projection $B_\mathfrak{g} \to W$ (where $T_i \mapsto s_i$). If $w = s_{i_1} \cdots s_{i_n}$ is a reduced decomposition of $w \in W$, then the image $T_w$ of $w$ in $B_\mathfrak{g}$ is defined by $T_w = T_{i_1} \cdots T_{i_n}$. It does not depend on the choice of a reduced decomposition. The weight subspaces of a $U_0$-module (in particular of a $U_q\mathfrak{g}$-module) $V$ are defined by

$$V_\lambda = \{v \in V \mid K_i.v = q^{(\lambda,\alpha_i)}v\}, \qquad \lambda \in P.$$

The elements of $B_\mathfrak{g}$ preserve the weight space decomposition of an integrable $U_q\mathfrak{g}$-module; in particular $T_wV_\lambda = V_{w\lambda}$.
2.4. R-matrix. Put

$$U_k^\pm = \bigoplus_{\lambda \in \pm Q_+,\ |(\lambda,\rho^\vee)| \ge k} U_\lambda^\pm, \qquad k \in \mathbb{N}, \tag{2.2}$$
U− the completion where ρ ∨ is the half-sum of positive coroots of g. Denote by U+ ⊗ U− according to the descending sequence of vector spaces of the vector space U+ ⊗ + Uk ⊗ U − ⊕ U + ⊗ Uk− . U− acts in the tensor product of two finite dimenAny element of the completion U+ ⊗ sional Uq g-modules. Recall that a representation V of Uq g is called a type 1 representation if it is a direct sum of its weight subspaces. For a pair (V1 , V2 ) of type 1 Uq g-modules define the linear operator 3V1 ,V2 : V1 ⊗ V2 → V1 ⊗ V2 by 3V1 ,V2 (v1 ⊗ v2 ) = q (λ,µ) v1 ⊗ v2
if
v1 ∈ (V1 )λ , v2 ∈ (V2 )µ .
Denote also by σ : V1 ⊗ V2 → V2 ⊗ V1 the flip operator σ (v1 ⊗ v2 ) = v2 ⊗ v1 . U − , called a quasi R-matrix for Uq g, There exists [10, 7] a unique element R ∈ U + ⊗ normalized by U1− R − 1 ∈ U1+ ⊗ such that for any pair (V1 , V2 ) of finite dimensional Uq g-modules of type 1 the composition σ ◦ 3V1 ,V2 ◦ R : V1 ⊗ V2 → V2 ⊗ V1
(2.3)
defines an isomorphism of $U_q\mathfrak{g}$-modules. For any pair $(V_1, V_2)$ of finite dimensional $U_q\mathfrak{g}$-modules and an element $w \in W$ the actions of $T_w \in B_\mathfrak{g}$ on $V_1$, $V_2$, and $V_1 \otimes V_2$, to be denoted by $T_{w,V_1}$, $T_{w,V_2}$, and $T_{w,V_1 \otimes V_2}$, are related as follows. There exists a unique element $R^w \in U^+ \widehat{\otimes}\, U^-$ which does not depend on $V_1$ and $V_2$ such that

$$T_{w,V_1 \otimes V_2} = R^w\,(T_{w,V_1} \otimes T_{w,V_2}). \tag{2.4}$$

Like the quasi R-matrix $R$, the element $R^w$ satisfies

$$R^w - 1 \in U_1^+ \widehat{\otimes}\, U_1^-. \tag{2.5}$$

The element $R^{w_\circ}$ associated to the maximal length element $w_\circ$ of $W$ is equal to the quasi R-matrix $R$.
3. Quantized Algebras of Functions

3.1. Quantized algebras of regular functions. Let $G$ be a connected, simply connected, complex simple algebraic group and $\mathfrak{g} = \mathrm{Lie}\,G$. The finite dimensional $U_q\mathfrak{g}$-modules of type 1 form a quasitensor category. Hence their matrix coefficients form a Hopf subalgebra of the Hopf dual $(U_q\mathfrak{g})^*$ of $U_q\mathfrak{g}$. It is called the quantized algebra of regular functions on $G$ and is denoted by $C_q[G]$. Every finite dimensional type 1 $U_q\mathfrak{g}$-module is a direct sum of irreducible type 1 $U_q\mathfrak{g}$-modules. The latter are highest weight modules with highest weights $\Lambda \in P_+$ (the corresponding module will be denoted by $L(\Lambda)$). The matrix coefficient of $L(\Lambda)$ associated to $v \in L(\Lambda)$ and $l \in L(\Lambda)^*$ will be denoted by $c^\Lambda_{l,v}$:

$$c^\Lambda_{l,v} \in C_q[G], \qquad c^\Lambda_{l,v}(x) = \langle l, x.v \rangle.$$

The above implies

$$C_q[G] = \mathrm{span}\{c^\Lambda_{l,v} \mid \Lambda \in P_+,\ v \in L(\Lambda),\ l \in L(\Lambda)^*\}.$$

The $*$-involution in $U_q\mathfrak{g}$ induces a structure of Hopf $*$-algebra on $C_q[G]$ by

$$\langle \xi^*, x \rangle = \overline{\langle \xi, S(x)^* \rangle}, \qquad \xi \in C_q[G],\ x \in U_q\mathfrak{g}. \tag{3.1}$$

The resulting Hopf $*$-algebra $(C_q[G], *)$ is called the quantized algebra of regular functions on the compact real form $K$ of $G$ and is denoted by $C_q[K]$. The inclusions $\varphi_i : U_{q_i}\mathfrak{g}_i \hookrightarrow U_q\mathfrak{g}$ induce surjective homomorphisms $\varphi_i^* : (C_q[G], *) \to (C_{q_i}[G_i], *)$, where $G_i$ is the subgroup of $G$ isomorphic to $SL_2$ with tangent Lie algebra $\mathfrak{g}_i$ generated by the root vectors of $\pm\alpha_i$.

We finish this subsection with a simple fact on the explicit structure of the Hopf $*$-algebra $C_q[K]$ (see, for instance, [1, Proposition 13.1.3]). Recall that $L(\Lambda)^* \cong L(-w_\circ\Lambda)$ and if we fix these isomorphisms, we can consider any $v \in L(\Lambda)$, $l \in L(\Lambda)^*$ as elements of $L(-w_\circ\Lambda)^*$, $L(-w_\circ\Lambda)$, respectively. Recall that any module $L(\Lambda)$ can be equipped with a unique (up to a constant) inner product which turns it into a $(U_q\mathfrak{g}, *)$ $*$-representation.

Lemma 3.1. (i) The comultiplication, the counit, and the antipode of $C_q[G]$ are given by

$$\Delta(c^\Lambda_{l,v}) = \sum_j c^\Lambda_{l,v_j} \otimes c^\Lambda_{l_j,v}, \tag{3.2}$$
$$\epsilon(c^\Lambda_{l,v}) = \langle l, v \rangle, \qquad S(c^\Lambda_{l,v}) = c^{-w_\circ\Lambda}_{v,l}, \tag{3.3}$$

where in (3.2) $(\{v_j\}, \{l_j\})$ is an arbitrary pair of dual bases of $L(\Lambda)$ and $L(\Lambda)^*$.
(ii) Fix an orthonormal basis $\{v_i\}$ of $L(\Lambda)$ equipped with an invariant inner product as above and a dual basis $\{l_j\}$ of $L(\Lambda)^*$. The action of the $*$-involution (3.1) on the corresponding elements of $C_q[G]$ is given by

$$(c^\Lambda_{l_i,v_j})^* = c^{-w_\circ\Lambda}_{v_i,l_j}. \tag{3.4}$$
3.2. Quantized algebra of continuous functions of $K$. Let $G$ be a complex simple algebraic group as in the previous subsection and $K$ be its compact real form. The quantized algebra of continuous functions $C_q(K)$ on $K$ is by definition the $C^*$-completion of the $*$-algebra $C_q[K]$ with respect to the norm

$$\|f\| = \sup_\eta \|\eta(f)\|, \qquad f \in C_q[K], \tag{3.5}$$

where $\eta$ runs through all $*$-representations of $C_q[K]$. The fact that for any $*$-representation $\eta$ of $C_q[K]$, $\eta(f)$ is a bounded operator and that the supremum in (3.5) is finite for all $f \in C_q[G]$ follows from the following identity in $C_q[K]$:

$$\sum_j c^\Lambda_{l_j,v_i}\,(c^\Lambda_{l_j,v_i})^* = 1,$$

where $\{v_i\}$ and $\{l_j\}$ are dual bases of $L(\Lambda)$ and $L(\Lambda)^*$ as in part (ii) of Lemma 3.1, see [1, Eq. (13), p. 452]. The $C^*$-algebras $C_q(K)$ possess natural structures of compact matrix quantum groups in the sense of Woronowicz [18], see [1, Sect. 13.3].

3.3. $C_q[SU_2]$. The $U_q\mathfrak{sl}_2$-module $L(\omega_1)$ has a basis in which the operators $K_1$, $X_1^\pm$ act by

$$K_1 \mapsto \begin{pmatrix} q & 0 \\ 0 & q^{-1} \end{pmatrix}, \qquad X_1^+ \mapsto \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad X_1^- \mapsto \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}.$$

The corresponding matrix coefficients $c_{ij} \in C_q[SL_2]$, $i, j = 1, 2$, generate $C_q[SL_2]$. More precisely:

Lemma 3.2. The Hopf algebra $C_q[SL_2]$ is isomorphic to the algebra generated by $c_{ij}$, $i, j = 1, 2$, subject to the relations

$$c_{11}c_{12} = q^{-1}c_{12}c_{11}, \qquad c_{11}c_{21} = q^{-1}c_{21}c_{11},$$
$$c_{12}c_{22} = q^{-1}c_{22}c_{12}, \qquad c_{21}c_{22} = q^{-1}c_{22}c_{21},$$
$$c_{12}c_{21} = c_{21}c_{12}, \qquad c_{11}c_{22} - c_{22}c_{11} = (q^{-1} - q)c_{12}c_{21},$$
$$c_{11}c_{22} - q^{-1}c_{12}c_{21} = 1.$$

In these generators the comultiplication, the counit, the antipode, and the $*$-involution of $C_q[SU_2]$ are given by

$$\Delta(c_{ij}) = \sum_{k=1,2} c_{ik} \otimes c_{kj}, \qquad \epsilon(c_{ij}) = \delta_{ij},$$
$$S(c_{11}) = c_{22}, \qquad S(c_{22}) = c_{11}, \qquad S(c_{12}) = -qc_{12}, \qquad S(c_{21}) = -q^{-1}c_{21},$$
$$c_{11}^* = c_{22}, \qquad c_{21}^* = -qc_{12}.$$
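A proof of Lemma 3.2 can be found, for instance, in [7, Example 2.3.3 and Theorem 3.0.1].

Let $q \in \mathbb{R}$, $q > 1$. The Hopf $*$-algebra $C_q[SU_2]$ has an infinite dimensional $*$-representation $\pi$ on $l^2(\mathbb{N})$ given by the following action of its generators $c_{ij}$, $i, j = 1, 2$ (see [14, 7]):

$$\pi(c_{11})e_k = \sqrt{1 - q^{-2k}}\;e_{k-1}, \qquad \pi(c_{12})e_k = q^{-k-1}e_k, \tag{3.6}$$
$$\pi(c_{21})e_k = -q^{-k}e_k, \qquad \pi(c_{22})e_k = \sqrt{1 - q^{-2k-2}}\;e_{k+1}, \tag{3.7}$$

where $e_{-1} := 0$. The square roots are forced by the relation $c_{11}c_{22} - q^{-1}c_{12}c_{21} = 1$ of Lemma 3.2: applied to $e_k$, it requires the product of the two off-diagonal coefficients to equal $1 - q^{-2k-2}$.

3.4. Irreducible star representations of $C_q[K]$. The group of multiplicative characters of the Hopf algebra $C_q[G]$ is isomorphic to the complex torus $(\mathbb{C}^\times)^l$, see [4, Theorem 3.3] and [6, Sect. 10.3.8] in the case when $q$ is an indeterminate. The character corresponding to the $l$-tuple $t = (t_1, \ldots, t_l) \in (\mathbb{C}^\times)^l$ is given by

$$\chi_t(c^\Lambda_{l,v}) = \prod_{i=1}^l t_i^{(\lambda,\alpha_i^\vee)}\,\langle l, v \rangle = \prod_{i=1}^l t_i^{(\lambda,\alpha_i^\vee)}\,\epsilon(c^\Lambda_{l,v}), \qquad v \in L(\Lambda)_\lambda. \tag{3.8}$$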
The unitary ones among these are the ones corresponding to the real torus (S 1 )l = {(t1 , . . . , tl ) ∈ (C× )l | |ti | = 1}. From now on we will assume that q ∈ R, q > 1. Denote by πi the ∗-representation of (Cqi [Gi ], ∗) ∼ = Cqi [SU2 ] given by (3.6)–(3.7). The ∗-representation of Cq [K] ∼ = (Cq [G], ∗) induced from it by the homomorphism ϕi∗ : (Cq [G], ∗) → (Cqi [Gi ], ∗) will be denoted by πsi . (Recall that si denotes the simple reflection in the Weyl group W of g corresponding to the root αi .) The irreducible ∗-representations of the Hopf ∗-algebra Cq [K] were classified by Soibelman [14], see also the book [7] for an exposition. Theorem 3.3. (i) For any reduced decomposition w = si1 . . . sin of an element w of W and any t ∈ (S 1 )l the tensor product πw,t = πsi1 ⊗ . . . ⊗ πsin ⊗ χt
(3.9)
is an irreducible ∗-representation of Cq [K]. (ii) Up to an equivalence the representation πw,t does not depend on the choice of reduced decomposition of w. (iii) Every irreducible ∗-representation of Cq [G] is isomorphic to some πw,t . Denote by Vw,t the representation space of πw,t equipped with the Hermitian inner product from Theorem 3.3. The Hilbert space completion of Vw,t with respect to it will be denoted by V w,t . Then: The representations πw,t naturally induce irreducible representations of the C ∗ algebra Cq (K), πw,t : Cq (K) → B(V w,t ). The latter exhaust all irreducible representations of Cq (K) up to a unitary equivalence. Each module Vw,t has a natural orthonormal basis ek1 ,... ,kn = ek1 ⊗ . . . ⊗ ekn ⊗ 1,
n = l(w), k1 , . . . , kn ∈ N
(3.10)
induced from the orthonormal basis $\{e_k\}$ of the $C_q[SU_2]$-module $V$ defined by (3.6)–(3.7). Here $1$ denotes a (fixed) vector of the 1-dimensional representation of $C_q[G]$ corresponding to $\chi_t$.

For an element $w$ of the Weyl group $W$ denote by $I_w$ the $*$-ideal of $C_q[K]$ generated by

$$c^\Lambda_{l,v_\Lambda} \quad \text{such that} \quad \Lambda \in P_+,\ \langle l, U^+T_w.v_\Lambda \rangle = 0, \tag{3.11}$$

where $v_\Lambda$ denotes a highest weight vector of $L(\Lambda)$. The annihilation ideals of the representations $\pi_{w,t}$ are contained in $I_w$ [14, 7]:

$$\ker \pi_{w,t} \subset I_w. \tag{3.12}$$
4. A Family of Elements $a_{\Lambda,w} \in C_q[K]$

4.1. Definitions. For a dominant integral weight $\Lambda \in P_+$ and a highest weight vector $v_\Lambda$ of $L(\Lambda)$ denote by $l_{\Lambda,w}$ the unique element of $L(\Lambda)^*_{-w\Lambda}$ such that $\langle l_{\Lambda,w}, T_w v_\Lambda \rangle = 1$. (The uniqueness follows from the fact that $\dim L(\Lambda)_{w\Lambda} = 1$.) Define
\[ a_{\Lambda,w} = c^{\Lambda}_{l_{\Lambda,w}, v_\Lambda}. \tag{4.1} \]
Note that $a_{\Lambda,w}$ does not depend on the choice of highest weight vector $v_\Lambda$ of $L(\Lambda)$. The ∗-subalgebras of $C_q[K]$ generated by $a_{\Lambda,w}$ played an important role in Soibelman's classification of the irreducible ∗-representations of $C_q[K]$, see Theorem 3.3. Most of the results in this subsection are due to Soibelman [14]. We include their proofs since [14] does not assume the normalization made in the definition of $a_{\Lambda,w}$.

Properties (2.4) and (2.5) of the elements $R^w \in U_+ \widehat{\otimes} U_-$ allow us to write $l_{\Lambda,w}$ and thus $a_{\Lambda,w}$ slightly more explicitly. Let $l_\Lambda = l_{\Lambda,1}$, i.e. let $l_\Lambda \in L(\Lambda)^*_{-\Lambda}$ be the unique element such that $\langle l_\Lambda, v_\Lambda \rangle = 1$. Then (2.4), (2.5) imply $l_{\Lambda,w} = T_w l_\Lambda$ and thus
\[ a_{\Lambda,w} = c^{\Lambda}_{T_w l_\Lambda, v_\Lambda}. \tag{4.2} \]
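Proposition 4.1. (i) The elements $a_{\Lambda,w}, a^*_{\Lambda,w} \in C_q[K]$, $\Lambda \in P_+$ are normal modulo $I_w$:
\[ a_{\Lambda,w}\, c^{\Lambda'}_{l,v} - q^{(\Lambda,\lambda') - (w\Lambda,\mu')}\, c^{\Lambda'}_{l,v}\, a_{\Lambda,w} \in I_w, \tag{4.3} \]
\[ a^*_{\Lambda,w}\, c^{\Lambda'}_{l,v} - q^{(\Lambda,\lambda') - (w\Lambda,\mu')}\, c^{\Lambda'}_{l,v}\, a^*_{\Lambda,w} \in I_w, \tag{4.4} \]
for $v \in L(\Lambda')_{\lambda'}$, $l \in L(\Lambda')^*_{-\mu'}$.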
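(ii) The images of $\{a_{\Lambda,w}, a^*_{\Lambda,w}\}_{\Lambda \in P_+}$ in $C_q[K]/I_w$ generate a commutative subalgebra. More precisely the following identity holds in $C_q[K]$:
\[ a_{\Lambda_1,w}\, a_{\Lambda_2,w} = a_{\Lambda_1+\Lambda_2,w}, \quad \forall\, \Lambda_1, \Lambda_2 \in P_+. \tag{4.5} \]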
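Proofs of Proposition 4.1 can be found in [14, 7]. The property (4.3) follows from the existence of a quasi R-matrix for $U_q\mathfrak{g}$, see (2.3). Equation (4.4) follows from (4.3), Lemma 3.1, and the fact that the ideals $I_w$ are stable under the ∗-involution. The first statement in part (ii) is a direct consequence of part (i). The second statement in (ii) follows from the existence of the element $R^w \in U_+ \widehat{\otimes} U_-$ with the properties (2.4), (2.5) and the fact that $v_{\Lambda_1} \otimes v_{\Lambda_2} \in L(\Lambda_1) \otimes L(\Lambda_2)$ generates a submodule isomorphic to $L(\Lambda_1 + \Lambda_2)$.

4.2. The action of $a_{\Lambda,w}$ in $V_{w,t}$.

Lemma 4.2. Let $w, w' \in W$ be such that $w = s_i w'$ and $l(w) = l(w') + 1$ for some simple reflection $s_i \in W$. Then
\[ \Delta(a_{\Lambda,w}) - c^{\Lambda}_{l_{\Lambda,w}, T_{w'} v_\Lambda} \otimes a_{\Lambda,w'} \in \ker \varphi_i^* \otimes C_q[K] + C_q[K] \otimes I_{w'}. \]

Proof. According to (3.2) $\Delta(a_{\Lambda,w})$ is given by
\[ \Delta(a_{\Lambda,w}) = \sum_j c^{\Lambda}_{l_{\Lambda,w}, v_j} \otimes c^{\Lambda}_{l_j, v_\Lambda}, \]
where $(\{v_j\}, \{l_j\})$ is a pair of dual bases of $L(\Lambda)$ and $L(\Lambda)^*$ consisting of weight vectors ($v_j \in L(\Lambda)_{\lambda_j}$, $l_j \in L(\Lambda)^*_{-\lambda_j}$, $\lambda_j \in P$). The definition (3.11) of $I_{w'}$ implies
\[ c^{\Lambda}_{l_j, v_\Lambda} \in I_{w'} \quad \text{if} \quad \lambda_j \notin w'\Lambda + Q_+. \tag{4.6} \]
The map $\varphi_i^* : C_q[G] \to C_{q_i}[G_i]$ acts on the matrix coefficients of a $U_q\mathfrak{g}$-module by restricting the module to $U_{q_i}\mathfrak{g}_i$. Since $w = s_i w'$ and $l(w) = l(w') + 1$, we have $w^{-1}\alpha_i^\vee \in -Q_+^\vee$. Since $\Lambda$ is a dominant weight,
\[ \langle \Lambda, w^{-1}\alpha_i^\vee \rangle \le 0 \quad \text{and thus} \quad \langle w\Lambda, \alpha_i^\vee \rangle \le 0. \]
Hence $T_w v_\Lambda$ is a lowest weight vector for the $U_{q_i}\mathfrak{g}_i$-submodule of $L(\Lambda)$ generated by $T_w v_\Lambda$. The corresponding $U_{q_i}\mathfrak{g}_i$-highest weight vector is $T_{w'} v_\Lambda$ and
\[ c^{\Lambda}_{l_{\Lambda,w}, v_j} \in \ker \varphi_i^* \quad \text{if} \quad \lambda_j \notin \{w\Lambda, w\Lambda + \alpha_i, \dots, w'\Lambda\}. \tag{4.7} \]
The lemma now follows from (4.6) and (4.7). □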
For an element $w \in W$ and a reduced decomposition $w = s_{i_1} \dots s_{i_n}$ of it denote
\[ w_j = s_{i_{j+1}} \dots s_{i_n}, \quad j = 0, \dots, n-1, \qquad w_n = 1. \tag{4.8} \]
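The suffixes $w_j$ of (4.8) and the roots $w_j^{-1}\alpha_{i_j}$ attached to them can be computed mechanically from a reduced word. The following is a minimal sketch and not part of the original argument: roots are stored by their coordinates in the basis of simple roots, `cartan[i][k]` is assumed to hold the pairing $\langle \alpha_k, \alpha_i^\vee \rangle$, indices are 0-based, and the helper names `reflect` and `suffix_roots` are hypothetical.

```python
def reflect(cartan, i, beta):
    """Apply the simple reflection s_i to a root in simple-root coordinates.

    Uses s_i(beta) = beta - <beta, alpha_i^vee> alpha_i, with the assumed
    convention cartan[i][k] = <alpha_k, alpha_i^vee>.
    """
    pairing = sum(c * cartan[i][k] for k, c in enumerate(beta))
    out = list(beta)
    out[i] -= pairing
    return tuple(out)


def suffix_roots(cartan, word):
    """Return [w_j^{-1} alpha_{i_j} for j = 1..n] for the word (i_1, ..., i_n).

    Here w_j = s_{i_{j+1}} ... s_{i_n} as in (4.8), so w_j^{-1} applies
    s_{i_{j+1}} first and s_{i_n} last.
    """
    rank = len(cartan)
    roots = []
    for j in range(1, len(word) + 1):
        # Start from the simple root alpha_{i_j}.
        beta = tuple(1 if k == word[j - 1] else 0 for k in range(rank))
        for i in word[j:]:
            beta = reflect(cartan, i, beta)
        roots.append(beta)
    return roots


# Type A2 (assumed Cartan matrix), reduced word s_1 s_2 s_1 of the longest element.
A2 = [[2, -1], [-1, 2]]
print(suffix_roots(A2, [0, 1, 0]))  # [(0, 1), (1, 1), (1, 0)]
```

For this word in type $A_2$ the list consists of the three positive roots $\alpha_2$, $\alpha_1 + \alpha_2$, $\alpha_1$, each occurring once.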
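Proposition 4.3. In the notation (4.8) the action of the elements $a_{\Lambda,w}$ in the module $V_{w,t}$ is given by
\[ \pi_{w,t}(a_{\Lambda,w}) = \bigotimes_{j=1}^{n} \pi_{s_{i_j}}\big(a_{(w_j\Lambda, \alpha_{i_j}^\vee)\omega_{i_j},\, s_{i_j}}\big) \cdot \prod_{i=1}^{l} t_i^{(\Lambda, \alpha_i^\vee)}. \tag{4.9} \]
In the orthonormal basis $\{e_{k_1,\dots,k_n}\}_{k_j=0}^{\infty}$ of $V_{w,t}$, see (3.10), the elements $a_{\Lambda,w}$ act diagonally by
\[ \pi_{w,t}(a_{\Lambda,w}).e_{k_1,\dots,k_n} = \prod_{j=1}^{n} q^{-(k_j+1)(w_j\Lambda, \alpha_{i_j})} \prod_{i=1}^{l} t_i^{(\Lambda, \alpha_i^\vee)}\; e_{k_1,\dots,k_n}. \tag{4.10} \]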
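As a sanity check of (4.10), consider the rank 1 case $G = SL_2$, $w = s_1$, $\Lambda = m\omega_1$ with $m \in \mathbb{N}$. The worked instance below assumes the normalization $(\alpha_1, \alpha_1) = 2$, so that $(\omega_1, \alpha_1) = (\omega_1, \alpha_1^\vee) = 1$; this normalization is an assumption made here for illustration only.

```latex
% n = l(s_1) = 1 and w_1 = 1 by (4.8), so (4.10) has a single factor:
\pi_{s_1,t}(a_{m\omega_1,s_1}).e_k
  = q^{-(k+1)(m\omega_1,\alpha_1)}\, t_1^{(m\omega_1,\alpha_1^\vee)}\, e_k
  = q^{-(k+1)m}\, t_1^{m}\, e_k , \qquad k \in \mathbb{N}.
```

In particular, for $q > 1$ the eigenvalues decay geometrically in $k$.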
Formula (4.9) follows by induction from Lemma 4.2 and Definition (3.8) of the multiplicative characters $\chi_t$ of $C_q[G]$. To prove (4.10) we first compute that in $C_q[SL_2]$
\[ a_{\omega_1,s_1} = -q^{-1} c_{21} \tag{4.11} \]
(cf. Sect. 3.3) and then use (4.5) which implies $a_{m\omega_1,s_1} = (a_{\omega_1,s_1})^m$. We also use the identity $d_i\alpha_i^\vee = (\alpha_i, \alpha_i)\alpha_i^\vee/2 = \alpha_i$, see Sect. 2.1.

5. The Haar Integral on $C_q(K)$

5.1. Definition and the Schur orthogonality relations. Recall that a left invariant integral on a Hopf algebra $A$ is a linear functional $H$ on $A$ satisfying
\[ (\mathrm{id} \otimes H)(\Delta(a)) = H(a).1, \quad \forall\, a \in A. \tag{5.1} \]
A right invariant integral is analogously defined. In the analytic setting a left Haar integral for a $C^*$-Hopf algebra $A$ is a state $H$ on $A$ satisfying (5.1), see [18].

Proposition 5.1. There exists a unique left invariant integral $H$ on the Hopf algebra $C_q[K]$ normalized by $H(1) = 1$. It is also right invariant and can be uniquely extended to a bi-invariant Haar integral on $C_q(K)$. It is given by a quantum version of the classical Schur orthogonality relations:
\[ H\big(c^{\Lambda}_{l,v}\, c^{\Lambda'}_{l',v'}\big) = \frac{\delta_{\Lambda,\Lambda'}\, \langle l, v' \rangle \langle l', v \rangle}{\sum_{\lambda} \dim L(\Lambda)_\lambda\, q^{2(\lambda,\rho)}} \]
or equivalently by
\[ H(c^{\Lambda}_{l,v}) = \delta_{\Lambda,0}\, \langle l, v \rangle. \tag{5.2} \]
5.2. Statement of the main result.

Theorem 5.2. The bi-invariant integral $H$ on $C_q(K)$ ($q \in \mathbb{R}$, $q > 1$) is given in terms of the irreducible representations $\pi_{w,t}$ of $C_q(K)$ by
\[ H(c) = \prod_{\beta \in \Delta_+} (q^{(2\rho,\beta)} - 1) \int_{(S^1)^l} \operatorname{tr}_{\overline{V}_{w_\circ,t}}\big(\pi_{w_\circ,t}(a_{\rho,w_\circ} a^*_{\rho,w_\circ} c)\big)\, dt, \tag{5.3} \]
where $w_\circ$ is the maximal length element of the Weyl group $W$ of $\mathfrak{g}$, $\rho$ is the half sum of all positive roots of $\mathfrak{g}$, and $dt$ is the invariant measure on the torus $(S^1)^l$ normalized by $\int_{(S^1)^l} dt = 1$.

In the special case of $K = SU_2$ Theorem 5.2 was established by Soibelman and Vaksman [16]. A similar formula is also known for quantum spheres [15, 17]. Theorem 5.2 answers Question 3 in [15].

Note that formula (4.10) implies that $\pi_{w_\circ,t}(a_{\rho,w_\circ} a^*_{\rho,w_\circ})$ is a trace class operator in $\overline{V}_{w_\circ,t}$. Since $\pi_{w_\circ,t}(c)$ is a bounded operator for $c \in C_q(K)$, the product is also a trace class operator in $\overline{V}_{w_\circ,t}$ for all $c \in C_q(K)$. From Definition (3.9) of $\pi_{w,t}$ it is also clear that
\[ \operatorname{tr}_{\overline{V}_{w_\circ,t}}\big(\pi_{w_\circ,t}(a_{\rho,w_\circ} a^*_{\rho,w_\circ} c)\big) \]
is a continuous function in $t \in (S^1)^l$ for a fixed $c \in C_q(K)$ and that the r.h.s. of (5.3) defines a continuous linear functional on $C_q(K)$. By identifying $(S^1)^l \cong \{(t_1, \dots, t_l) \in \mathbb{C}^l : |t_i| = 1\}$, the normalized invariant measure on the torus $(S^1)^l$ is represented as
\[ dt = \frac{1}{(2\pi i)^l}\, \frac{dt_1}{t_1} \wedge \dots \wedge \frac{dt_l}{t_l}. \]

In Sects. 5.3 and 5.4 we show that the functional $\widetilde{H}$ on $C_q[G]$ given by the right-hand side of (5.3) satisfies
\[ \widetilde{H}(c^{\Lambda}_{l,v}) = 0 \quad \text{if} \quad \Lambda \neq 0. \tag{5.4} \]
In Sect. 5.5 we check that it satisfies the normalization condition $\widetilde{H}(1) = 1$. Combined with (5.2) this proves Theorem 5.2.
5.3. Proof of (5.4): Reduction to the rank 1 case. Recall first the following simple characterization of $w_\circ \in W$.

Lemma 5.3. The maximal length element $w_\circ \in W$ is the only element $w \in W$ that has a representation of the form $w = w' s_i$ with $l(w') = l(w) - 1$ for an arbitrary simple reflection $s_i$.

Lemma 5.3 follows from the so-called "deletion condition", see [5], and the property of $w_\circ$ that it is the only element $w \in W$ such that $w^{-1}(\alpha_i)$ is a negative root of $\mathfrak{g}$ for all simple roots $\alpha_i$ of $\mathfrak{g}$.
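Lemma 5.3 can be checked by brute force in type $A$, where $W$ is a symmetric group. The sketch below rests on assumptions that are not part of the text: permutations are written in one-line notation, the length is taken to be the number of inversions, and right multiplication by $s_i$ is realized by swapping the entries in positions $i$ and $i+1$. The function names are illustrative.

```python
from itertools import permutations


def length(w):
    """Coxeter length of a permutation in one-line notation (number of inversions)."""
    n = len(w)
    return sum(1 for a in range(n) for b in range(a + 1, n) if w[a] > w[b])


def times_simple(w, i):
    """Return w * s_i: swap the entries in positions i and i + 1 (0-based)."""
    v = list(w)
    v[i], v[i + 1] = v[i + 1], v[i]
    return tuple(v)


def shortened_by_every_simple_reflection(n):
    """All w in S_n with l(w s_i) = l(w) - 1 for every simple reflection s_i."""
    return [
        w
        for w in permutations(range(n))
        if all(length(times_simple(w, i)) == length(w) - 1 for i in range(n - 1))
    ]


print(shortened_by_every_simple_reflection(3))  # [(2, 1, 0)]
```

The only permutation returned is the decreasing one, which is the longest element of the symmetric group.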
We show that (5.4) for $K = SU_2$ implies its validity in the general case. Let $\Lambda \in P_+$, $\Lambda \neq 0$. Equip $L(\Lambda)$ with a Hermitian inner product making it a $(U_q\mathfrak{g}, *)$ ∗-representation, recall (2.1). Denote
\[ L_i = \{v \in L(\Lambda) \mid U_{q_i}\mathfrak{g}_i . v = 0\}, \quad i = 1, \dots, l. \]
Since $L(\Lambda)$ is an irreducible $U_q\mathfrak{g}$-module, $\cap_{i=1}^{l} L_i = 0$ and
\[ L_1^\perp + \dots + L_l^\perp = (\cap_{i=1}^{l} L_i)^\perp = L(\Lambda). \]
Hence to show (5.4) it is sufficient to show that
\[ \widetilde{H}(c^{\Lambda}_{l,v}) = 0 \quad \text{if} \quad v \in L_m^\perp \quad \text{for some} \quad m = 1, \dots, l. \tag{5.5} \]
Note that $L_m^\perp$ is the span of the nontrivial irreducible $U_{q_m}\mathfrak{g}_m$-submodules of $L(\Lambda)$. Choose a reduced decomposition of $w_\circ$ of the form $w_\circ = s_{i_1} \dots s_{i_{n_\circ - 1}} s_m$ and consider the corresponding model for the representation $\pi_{w_\circ,t}$,
\[ \pi_{w_\circ,t} \cong \pi_{s_{i_1}} \otimes \dots \otimes \pi_{s_{i_{n_\circ - 1}}} \otimes \pi_{s_m} \otimes \chi_t. \]
Taking trace over the component $\pi_{s_m} \otimes \chi_t$ of $\pi_{w_\circ,t}$ and using (3.2) and (4.9) we see that to prove (5.5) it is sufficient to prove that
\[ \int_{(S^1)^l} \operatorname{tr}_{\overline{V}_{s_m}}\big(\pi_{s_m}(a_{\omega_m,s_m} a^*_{\omega_m,s_m} \varphi_m^*(c^{\Lambda}_{l',v}))\big)\, dt = 0 \quad \text{for all} \quad l' \in L(\Lambda)^*. \tag{5.6} \]
(Recall that by definition $(w_\circ)_{n_\circ} = 1$, see (4.8).) Since $v \in L_m^\perp$,
\[ \varphi_m^*(c^{\Lambda}_{l,v}) = \sum_p c^{p\omega_m}_{l_p, v_p} \]
with all $p > 0$. By appropriately breaking the integral (5.6) into a product of a 1-dimensional and an $(l-1)$-dimensional integral one sees that (5.6) follows from (5.4) for $K = SU_2$.
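5.4. Proof of (5.4): The case of $C_q[SU_2]$. Our proof in the rank 1 case is similar to the one from [16]. Lemma 3.2 implies that $C_q[SU_2]$ is spanned by the elements
\[ c_{11}^m c_{12}^p c_{21}^r \quad \text{and} \quad c_{22}^m c_{12}^p c_{21}^r \quad \text{for} \quad m, p, r \in \mathbb{N}. \]
The Haar functional $H$ acts on them by [1, Example 13.3.9]
\[ H(c_{11}^m c_{12}^p c_{21}^r) = H(c_{22}^m c_{12}^p c_{21}^r) = \delta_{m,0}\,\delta_{p,r}\, \frac{(-q)^p (q^2 - 1)}{q^{2p+2} - 1}. \]
We check that the functional $\widetilde{H}$ has the same property. This implies (5.4) for $K = SU_2$. Recall from (4.11) that $a_{\omega_1,s_1} = -q^{-1} c_{21}$ and thus $a^*_{\omega_1,s_1} = c_{12}$, see Lemma 3.2. Using (3.6)–(3.7) we compute
\[ \operatorname{tr}_{\overline{V}}\big(\pi(a_{\omega_1,s_1} a^*_{\omega_1,s_1} c_{ii}^m c_{12}^p c_{21}^r)\big) = \delta_{m,0} \sum_{k=0}^{\infty} -q^{-1} \cdot q^{-(k+1)(p+1)} \cdot (-q^{-k})^{r+1} = \delta_{m,0}\, \frac{(-q)^r}{q^{p+r+2} - 1} \]
for $i = 1, 2$. This gives
\[ \frac{1}{2\pi i} \int_{S^1} \operatorname{tr}_{\overline{V}_{s_1,t}}\big(\pi_{s_1,t}(a_{\omega_1,s_1} a^*_{\omega_1,s_1} c_{ii}^m c_{12}^p c_{21}^r)\big)\, \frac{dt}{t} = \delta_{m,0}\, \frac{(-q)^r}{q^{p+r+2} - 1} \cdot \frac{1}{2\pi i} \int_{S^1} t^{r-p-1}\, dt = \delta_{m,0}\,\delta_{p,r}\, \frac{(-q)^p}{q^{2p+2} - 1} \]
($i = 1, 2$) which shows that $\widetilde{H} = H$ in the case $K = SU_2$.

5.5. Checking the normalization $\widetilde{H}(1) = 1$. Let $w_\circ = s_{i_1} \dots s_{i_{n_\circ}}$ be a reduced decomposition of the maximal element of $W$. Using (4.10) and the notation (4.8) we compute
\[ \int_{(S^1)^l} \operatorname{tr}_{\overline{V}_{w_\circ,t}}\big(\pi_{w_\circ,t}(a_{\rho,w_\circ} a^*_{\rho,w_\circ})\big)\, dt = \prod_{j=1}^{n_\circ} \sum_{k_j=0}^{\infty} q_{i_j}^{-2(k_j+1)((w_\circ)_j\rho,\, \alpha_{i_j}^\vee)} = \prod_{j=1}^{n_\circ} \frac{1}{q_{i_j}^{(2\rho,\, (w_\circ)_j^{-1}\alpha_{i_j}^\vee)} - 1}. \]
Note that $q_i^{(\lambda,\alpha_i^\vee)} = q^{(\lambda,\alpha_i)}$ for all simple roots $\alpha_i$ of $\mathfrak{g}$. The set of elements $(w_\circ)_j^{-1}\alpha_{i_j} \in Q$, $j = 1, \dots, n_\circ$, coincides with the set of positive roots of $\mathfrak{g}$. This together with the definition of the functional $\widetilde{H}$ via the r.h.s. of (5.3) gives $\widetilde{H}(1) = 1$.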
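The two geometric series used in Sects. 5.4 and 5.5 are easy to confirm numerically. The following sketch is only an illustration under stated assumptions: the series are truncated after a fixed number of terms, floating point arithmetic is used, and the pairings $(2\rho, \beta) = 2, 2, 4$ for type $A_2$ assume the normalization $(\alpha_i, \alpha_i) = 2$. The function names are hypothetical.

```python
def rank_one_trace(q, p, r, terms=200):
    """Partial sum of sum_k -q^{-1} q^{-(k+1)(p+1)} (-q^{-k})^{r+1} (Sect. 5.4)."""
    return sum(
        -(1.0 / q) * q ** (-(k + 1) * (p + 1)) * (-(q ** (-k))) ** (r + 1)
        for k in range(terms)
    )


def rank_one_closed_form(q, p, r):
    """Closed form (-q)^r / (q^{p+r+2} - 1) of the series above."""
    return (-q) ** r / (q ** (p + r + 2) - 1)


def normalization(q, pairings, terms=200):
    """prod over beta of (q^m - 1) * sum_k q^{-(k+1) m}, where m = (2 rho, beta)."""
    total = 1.0
    for m in pairings:
        series = sum(q ** (-(k + 1) * m) for k in range(terms))
        total *= (q ** m - 1) * series
    return total


q = 2.0
print(rank_one_trace(q, 1, 1), rank_one_closed_form(q, 1, 1))  # both close to -2/15
print(normalization(q, [2, 2, 4]))  # close to 1
```

The second check mirrors the normalization of Sect. 5.5: each factor $(q^{(2\rho,\beta)} - 1)$ in (5.3) is cancelled by the corresponding geometric series.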
5.6. Semiclassical limit. Here we explain the semiclassical analog of the integral formula from Theorem 5.2. As earlier $G$ denotes a complex simple algebraic group and $K$ denotes a compact real form of $G$. For each element $w$ of the Weyl group $W$ of $K$ choose a representative $\dot{w}$ of it in the normalizer of a fixed maximal torus $T$ of $K$. Using the related Iwasawa decomposition of $G$, introduce the map $a_w : N \to A$ by
\[ \dot{w}^{-1} n \dot{w} = k_1\, a_w(n)\, n_1, \quad k_1 \in K,\ n_1 \in N, \tag{5.7} \]
see for instance [8]. It can be pushed down to a well defined map from the symplectic leaf $S_w$ to $A$,
\[ a_w(\delta_n \dot{w}) := a_w(n), \quad n \in N. \]
We refer to the introduction for details on the dressing action of $AN$ on $K$ related to the standard Poisson structure on $K$.

The semiclassical analog of formula (5.3) is the following formula for the normalized Haar integral on $K$:
\[ H(f) = \prod_{\beta \in \Delta_+} \frac{(\rho, \beta)}{\pi} \int_{S_{w_\circ} \times T} a_{w_\circ}(k)^{-2\rho} f(k.t)\, \mu_{w_\circ}\, dt, \quad f \in C(K). \tag{5.8} \]
Here $\mu_{w_\circ}$ denotes the Liouville volume form on the symplectic leaf $S_{w_\circ}$ corresponding to the maximal element $w_\circ \in W$ and $dt$ denotes the invariant measure on the torus $T$ normalized by $\int_T dt = 1$. Recall that $S_{w_\circ} \times T$ coincides with the maximal Bruhat cell of $K$. Formula (5.8) can be easily proved following the idea of Sects. 5.3–5.5 on the basis of the product formulas [14, 7] for the symplectic leaves $S_w$ of $K$, $w \in W$,
\[ S_w = S_{s_{i_1}} \dots S_{s_{i_n}}, \tag{5.9} \]
where $s_{i_1} \dots s_{i_n}$ is a reduced decomposition of $w$. The integral with respect to the symplectic measure on the leaf $S_w.t$ is (up to a factor) a semiclassical limit of the trace in the module $\overline{V}_{w,t}$.

At the end we explain the connection between the functions $a_w^{-2\rho}$ on the leaves $S_w$ and the operators $\pi_{w,t}(a_{\rho,w} a^*_{\rho,w})$ in $\overline{V}_{w,t}$. Let us consider the highest weight module $L(\Lambda)$ of $\mathfrak{g}$ with highest weight $\Lambda$ and the matrix coefficient $a_{\Lambda,w} \in C[G]$,
\[ a_{\Lambda,w}(g) = \langle l_{w,\Lambda}, g v_\Lambda \rangle, \quad g \in G, \]
where $v_\Lambda$ is a highest weight vector of $L(\Lambda)$ and $l_{w,\Lambda} \in L(\Lambda)^*_{-w\Lambda}$ is normalized by $\langle l_{w,\Lambda}, \dot{w} v_\Lambda \rangle = 1$, cf. (4.1). It is easy to show that the restriction of $a_{\Lambda,w}$ to the symplectic leaf $S_w$ coincides with $a_w^{-\Lambda}$,
\[ a_{\Lambda,w}|_{S_w} = a_w^{-\Lambda}. \]
For $t \in T$ the functions
\[ |a_{w,\rho}(k.t)|^2 = |a_{w,\rho}(k)|^2 = a_w(k)^{-2\rho}, \quad k \in S_w \]
are semiclassical analogs of the linear operators $\pi_{w,t}(a_{w,\rho} a^*_{w,\rho})$ in $\overline{V}_{w,t}$.
6. Quantum Quasi-Traces of $V_{w,t}$

6.1. Motivation. Let $A$ be a Hopf algebra and $A^*$ be its dual Hopf algebra. Denote by $A^\circ$ the dual Hopf algebra of $A$ equipped with the opposite comultiplication. Recall [2, 1] that the quantum double $D(A)$ of $A$ is isomorphic to $A \otimes A^\circ$ as a coalgebra and the following commutation relation holds in $D(A)$:
\[ \xi a = \sum \langle \xi_{(1)}, a_{(3)} \rangle\, a_{(2)} \xi_{(2)}\, \langle S^{-1}\xi_{(3)}, a_{(1)} \rangle, \quad \xi \in A^*,\ a \in A. \tag{6.1} \]
Analogously to the classical situation one defines a quantum dressing action $\delta$ of $A^*$ on $A$. Using the identification $D(A) \cong A \otimes A^*$ as vector spaces, set
\[ \delta_\xi a = (\mathrm{id} \otimes \varepsilon)(\xi a). \]
In view of the commutation relation (6.1) it is explicitly given by
\[ \delta_\xi a = \sum \langle \xi_{(1)}, a_{(3)} \rangle\, a_{(2)}\, \langle S^{-1}\xi_{(2)}, a_{(1)} \rangle. \]
It is dual to the standard adjoint action of $A^*$ on itself
\[ \mathrm{ad}_\xi\, \xi' = \sum \xi_{(1)}\, \xi'\, S(\xi_{(2)}) \]
in the sense that
\[ \langle \mathrm{ad}_\xi\, \xi', a \rangle = \langle \xi', \delta_{S(\xi)} a \rangle. \tag{6.2} \]
For any representation $\pi$ of $A^*$ in the vector space $V$ the adjoint action of $A^*$ on itself lifts to an action of $A^*$ in the space of linear operators on $V$ by
\[ \mathrm{ad}_\xi L = \sum \pi(\xi_{(1)})\, L\, \pi(S\xi_{(2)}). \tag{6.3} \]

Suppose that $A^*$ is a deformation of the Poisson Hopf algebra $C[F]$ of regular functions on a Poisson algebraic group $F$. According to the Kirillov–Kostant orbit method philosophy an irreducible $A^*$-module $V$ can be viewed as a quantization of a symplectic leaf $S$ in $F$. The left action of $A^*$ in the space of linear operators in $V$ is a deformation of the Poisson $C[F]$-module of functions on the leaf $S$. At the same time the dual Poisson algebraic group $F^*$ of $F$ acts in the space of functions on $S$ by the dressing action. The quantum analog of this action is the adjoint action (6.3) of $A^*$ in the space of linear operators in the $A^*$-module $V$. This leads to:

The quantum analog of a measure on the symplectic leaf $S$ in the Poisson algebraic group $F$ which is invariant up to a multiplicative character of $F^*$ is a homomorphism from a subspace of linear operators in the $A^*$-module $V$, equipped with the $A^*$-action (6.3), to a 1-dimensional representation of $A^*$.

In the next subsection we will develop this idea from a categorical point of view and relate it to the notion of quantum traces for $A^*$-modules. In analogy, the more general morphisms defined there will be called quantum quasi-traces. Subsections 6.3 and 6.4 construct such morphisms for the irreducible ∗-representations of the quantized algebras of functions $(C_q[G], *)$.
6.2. Definitions. Let $\mathcal{C}$ be a $\mathbb{C}$-linear, rigid, monoidal category with identity object $\mathbf{1}$. Recall that $\mathcal{C}$ is called balanced if for each object $V \in \mathrm{Ob}(\mathcal{C})$ there exists an isomorphism $b_V : V \xrightarrow{\cong} V^{**}$ such that
\[ b_{V_1} \otimes b_{V_2} = b_{V_1 \otimes V_2}, \tag{6.4} \]
\[ b_{V^*} = (b_V^*)^{-1}, \tag{6.5} \]
\[ b_{\mathbf{1}} = \mathrm{id}_{\mathbf{1}}. \tag{6.6} \]
Given a Hopf algebra $C$ over the field $\mathbb{C}$ let $\mathrm{rep}_C$ denote the category of its finite dimensional modules equipped with the left dual object $V^*$ of $V \in \mathrm{Ob}(\mathrm{rep}_C)$ defined by
\[ \langle c.\xi, v \rangle = \langle \xi, S(c).v \rangle, \quad \xi \in V^*,\ v \in V. \tag{6.7} \]
The spaces $\mathrm{Hom}_{\mathbb{C}}(V_1, V_2)$, $V_1, V_2 \in \mathrm{Ob}(\mathrm{rep}_C)$ can be equipped with the canonical $C$-action
\[ c.L = \pi_{V_2}(c_{(1)})\, L\, \pi_{V_1}(S(c_{(2)})), \quad L \in \mathrm{Hom}_{\mathbb{C}}(V_1, V_2). \tag{6.8} \]
Here the Hopf algebra $C$ plays the role of the Hopf algebra $A^*$ from the motivation in the previous subsection, cf. (6.3) and its derivation from the quantum dressing action. Clearly $\mathrm{Hom}_{\mathbb{C}}(V_1, V_2) \cong V_2 \otimes V_1^*$ as $C$-modules. In particular, for this action $\mathrm{Hom}_{\mathbb{C}}(V, \varepsilon)$ is canonically isomorphic to $V^*$, where, by abuse of notation, $\varepsilon$ denotes the 1-dimensional representation of $C$ defined by its counit.

Reshetikhin and Turaev [11] defined the following notion of quantum trace for a finite dimensional $C$-module $V$.

Definition 6.1. A quantum trace for a finite dimensional $C$-module $V$ is a homomorphism
\[ \operatorname{qtr}_V : \mathrm{End}_{\mathbb{C}}(V) \to \varepsilon \]
of $C$-modules for the action of $C$ on $\mathrm{End}_{\mathbb{C}}(V)$ defined in (6.8).

The pairing $\mathrm{End}_{\mathbb{C}}(V) \cong V \otimes V^* \to \mathbb{C}$ is not a homomorphism of $C$-modules, where $\mathbb{C}$ is given the structure of the $C$-module corresponding to the counit $\varepsilon$. At the same time the opposite pairing $V^* \otimes V \to \mathbb{C}$ has this property. If $\mathrm{rep}_C$ is balanced each $V \in \mathrm{Ob}(\mathrm{rep}_C)$ has a quantum trace defined by the composition [11]
\[ \mathrm{End}_{\mathbb{C}}(V) \cong V \otimes V^* \xrightarrow{\ b_V \otimes \mathrm{id}\ } V^{**} \otimes V^* \to \varepsilon \]
or explicitly $\operatorname{qtr}_V(L) = \operatorname{tr}_V(b_V L)$, $L \in \mathrm{End}(V)$. Here $b_V$ is considered as a linear endomorphism of $V$ using the canonical identification of $V$ and $V^{**}$ as vector spaces. The properties (6.4)–(6.5) of the balancing morphisms $b_V$ imply the following properties of the quantum traces $\operatorname{qtr}_V$:
\[ \operatorname{qtr}_{V_1 \otimes V_2}(L_1 \otimes L_2) = \operatorname{qtr}_{V_1}(L_1)\, \operatorname{qtr}_{V_2}(L_2), \tag{6.9} \]
\[ \operatorname{qtr}_{V^*}(L^*) = \operatorname{qtr}_V(L) \tag{6.10} \]
for all $L_i \in \mathrm{End}_{\mathbb{C}}(V_i)$. In [11, 12] it was proved that the category of finite dimensional type 1 $U_q\mathfrak{g}$-modules is balanced and this was used for constructing invariants of links and 3-dimensional manifolds.

We would like to incorporate in Definition 6.1 the possibility for an invariant up to a character "quantum measure", as explained in the previous subsection, and the general case of an infinite dimensional $C$-module $V$. We will restrict ourselves to representations of $C$
\[ \pi : C \to B(V) \]
by bounded operators in a Hilbert space $V$ and will call them bounded representations of $C$. The Hermitian inner product in $V$ is not assumed to possess any invariance properties and the linear operators $\pi(c)$, $c \in C$, in $V$ are not assumed to be uniformly bounded. The dual $V^*$ of such a bounded representation $\pi : C \to B(V)$ is defined in the Hilbert space $V^*$ of bounded functionals on $V$ by formula (6.7). Obviously it is again a bounded representation.

Definition 6.2. Two bounded representations of a Hopf algebra $C$, $\pi_i : C \to B(V_i)$ in the Hilbert spaces $V_i$, $i = 1, 2$, will be called weakly equivalent if the $V_i$ contain dense $C$-stable subspaces $W_i \subset V_i$ which are equivalent as $C$-modules.

The point here is that the equivalence can be given by an unbounded operator $b : W_1 \xrightarrow{\cong} W_2$ which therefore does not extend to the full space $V_1$.

Definition 6.3. A bounded representation $\pi : C \to B(V)$ of a Hopf algebra $C$ in a Hilbert space $V$ will be called quasi-balanced if there exists a multiplicative character $\chi$ of $C$ for which $V$ and $\chi \otimes V^{**}$ are weakly equivalent.

By abuse of notation we denote by $\chi$ the 1-dimensional $C$-module corresponding to the multiplicative character $\chi$ of $C$. In other words the bounded $C$-module $V$ is quasi-balanced if there exists an invertible linear operator $b_V$ in $V$ with dense domain and range such that $\mathrm{Dom}\, b_V$ is $C$-stable and
\[ b_V \pi(c) = \sum \chi(c_{(1)})\, \pi(S^2(c_{(2)}))\, b_V, \quad \forall\, c \in C. \tag{6.11} \]
(Here we use the canonical identification of $V^{**}$ and $V$ as Hilbert spaces.)

Remark 6.4. Often $V$ is the Hilbert space completion of a $C$-module $W$, equipped with a Hermitian inner product, which is a direct sum of mutually orthogonal finite dimensional submodules $W_\mu$ for a Hopf subalgebra $B$ of $C$
\[ W = \oplus_\mu W_\mu. \tag{6.12} \]
The restricted dual of such a module $W$ with respect to the decomposition (6.12) as a direct sum of finite dimensional subspaces is naturally a $C$-module of the same type. The double restricted dual $W^{**}$ of $W$ is canonically isomorphic to $W$ as a vector space. If $W \cong \chi \otimes W^{**}$ as $C$-modules then the modules $V$ and $\chi \otimes V^{**}$ are weakly equivalent and $V$ is a quasi-balanced $C$-module.

Let $\pi : C \to B(V)$ be a quasi-balanced representation as above. We call the subspace of the space of linear operators in $V$ with dense domains
\[ B_1^q(V) := B_1(V)\, b_V^{-1} \]
a space of quantum trace class operators in the $C$-module $V$. Here $B_1(V)$ stands for the standard trace class in $V$. It is naturally a $C$-module by
\[ c.L = \pi(c_{(1)})\, L\, \pi(S(c_{(2)})) \]
because $C$ acts in $V$ by bounded operators. The linear map $\operatorname{qtr}_V : B_1^q(V) \to \mathbb{C}$ given by
\[ \operatorname{qtr}_V(L) := \operatorname{tr}_V(L b_V) \]
is a well defined homomorphism of $C$-modules
\[ \operatorname{qtr}_V : B_1^q(V) \to \chi. \]
It will be called a quantum quasi-trace for the module $V$.

Remark 6.5. One can as well use the space
\[ \widetilde{B}_1^q(V) := b_V^{-1} B_1(V) \]
instead of $B_1^q(V)$. When $b_V^{-1}$ is not defined on the full space $V$ the composition $b_V^{-1} L_0$, $L_0 \in B_1(V)$, need not have a dense domain in $V$. Because of this, it is convenient to use the space $\widetilde{B}_1^q(V)$ only when $b_V^{-1}$ has full domain. In that case the space $\widetilde{B}_1^q(V)$ is also a $C$-module and the following map
\[ \widetilde{\operatorname{qtr}}_V : \widetilde{B}_1^q(V) \to \chi, \quad \widetilde{\operatorname{qtr}}_V(L) := \operatorname{tr}(b_V L) \]
is a homomorphism of $C$-modules.

Remark 6.6. It is natural to look for a quasi-balancing map $b_V \in \mathrm{End}_{\mathbb{C}}(V)$ for a bounded representation $\pi : C \to B(V)$ of the form $b_V = \pi(a_V)$ for some $a_V \in C$. The definition (6.11) implies that such a map $\pi(a_V)$ provides a quasi-balancing endomorphism if $\pi(a_V)$ is an invertible linear operator in $V$ with a dense range satisfying
\[ a_V c - \chi(c_{(1)})\, S^2(c_{(2)})\, a_V \in \operatorname{Ker} \pi, \quad \forall\, c \in C \tag{6.13} \]
for some multiplicative character $\chi$ of $C$.
Thus quasi-balancing of the modules of a Hopf algebra $A$ is related to the properties of the square of the antipode $S$ of $A$. This is analogous to the usual case of balancing when $\chi = \varepsilon$ and (6.13) reduces to
\[ a_V c - S^2(c)\, a_V \in \operatorname{Ker} \pi, \quad \forall\, c \in C, \]
see [11]. Similarly $b_V = \pi_V(a_V)^{-1}$ is a quasi-balancing map for the $C$-module $V$ if $\pi_V(a_V)$ is an invertible operator in $V$ with a dense range such that
\[ c\, a_V - \chi(c_{(1)})\, a_V\, S^2(c_{(2)}) \in \operatorname{Ker} \pi, \quad \forall\, c \in C. \tag{6.14} \]

6.3. Main construction. In this subsection we construct quasi-balancing morphisms for the $C_q[G]$-modules $V_{w,t}$. As was pointed out in Sect. 3.4 they are bounded $C_q[G]$-modules in the terminology from the previous subsection. Set
\[ 2\rho = \sum_{\alpha \in \Delta_+} \alpha = \sum_{i=1}^{l} p_i \alpha_i \]
for some positive integers $p_i$ and denote
\[ q^{2\rho} = \prod_{i=1}^{l} K_i^{p_i} \in U_q\mathfrak{g}. \tag{6.15} \]
Its commutation with the generators $X_i^\pm$ of $U_q\mathfrak{g}$ is given by
\[ q^{2\rho} X_i^\pm q^{-2\rho} = q^{\pm(2\rho,\alpha_i)} X_i^\pm, \quad \forall\, i = 1, \dots, l. \]
As is well known, the square of the antipode in $U_q\mathfrak{g}$ is given by the following lemma.

Lemma 6.7. For all $x \in U_q\mathfrak{g}$, $S^2(x) = q^{2\rho} x q^{-2\rho}$.

For an arbitrary element $\nu = \sum_i m_i \alpha_i^\vee$ of the coroot lattice $Q^\vee$ of $\mathfrak{g}$ we set $q^\nu := (q^{m_1}, \dots, q^{m_l}) \in (\mathbb{C}^\times)^l$ and consider the multiplicative character $\chi_{q^\nu}$ of $C_q[G]$. It is explicitly given by
\[ \chi_{q^\nu}(c^{\Lambda}_{l,v}) = q^{(\nu,\mu)} \langle l, v \rangle, \quad l \in L(\Lambda)^*_{-\mu}, \tag{6.16} \]
recall (3.8). From Lemma 6.7 we deduce the following properties of $S^2$ in $C_q[G]$.
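Lemma 6.8. (i) If $v \in L(\Lambda)_\lambda$ and $l \in L(\Lambda)^*_{-\mu}$, then
\[ S^2(c^{\Lambda}_{l,v}) = q^{2(\rho,\lambda-\mu)}\, c^{\Lambda}_{l,v}. \tag{6.17} \]
(ii) For all elements $w \in W$
\[ c\, a_{\rho,w} a^*_{\rho,w} - \chi_{q^{2(w\rho-\rho)}}(c_{(1)})\, a_{\rho,w} a^*_{\rho,w}\, S^2(c_{(2)}) \in I_w, \quad \forall\, c \in C_q[G], \]
recall (3.11).

Proof. (i) By a straightforward computation, for all $x \in U_q\mathfrak{g}$,
\[ \langle S^2(c^{\Lambda}_{l,v}), x \rangle = \langle c^{\Lambda}_{l,v}, S^2(x) \rangle = \langle c^{\Lambda}_{l,v}, q^{2\rho} x q^{-2\rho} \rangle = q^{2(\rho,\lambda-\mu)} \langle c^{\Lambda}_{l,v}, x \rangle. \]
(ii) Combining part (i) with the identities (4.3) and (4.4) gives
\[ c^{\Lambda}_{l,v}\, a_{\rho,w} a^*_{\rho,w} - q^{2(w\rho-\rho,\mu)}\, a_{\rho,w} a^*_{\rho,w}\, S^2(c^{\Lambda}_{l,v}) \in I_w, \quad \forall\, l \in L(\Lambda)^*_{-\mu},\ v \in L(\Lambda), \]
which implies (6.17) in view of (3.2). □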
Let us fix an element $w \in W$, a reduced decomposition $w = s_{i_1} \dots s_{i_n}$ of it, and an element $t \in (S^1)^l$. Consider the $(C_q[G], *)$-module $V_{w,t}$. We will make use of the notation (4.8)
\[ w_j = s_{i_{j+1}} \dots s_{i_n}, \quad j = 0, \dots, n-1, \qquad w_n = 1 \]
and of the basis $e_{k_1,\dots,k_n}$, $k_j \in \mathbb{N}$ of $V_{w,t}$ from (3.10). Formula (4.10) implies that the space $V_{w,t}$ decomposes as a sum of weight subspaces with respect to the action of the commutative subalgebra of $C_q[G]$ spanned by $a_{\Lambda,w}$, $\Lambda \in P_+$ (recall part (ii) of Proposition 4.1) as
\[ V_{w,t} = \bigoplus_{\mu \in Q_+} \operatorname{span}\Big\{ e_{k_1,\dots,k_n} \;\Big|\; \sum_{j=1}^{n} (k_j + 1)\, w_j^{-1}\alpha_{i_j} = \mu \Big\}. \tag{6.18} \]
All weight subspaces of $V_{w,t}$ are finite dimensional and we can identify the corresponding double restricted dual $V_{w,t}^{**}$ with $V_{w,t}$ as a vector space.

Part (ii) of Lemma 6.8 and the fact that the ideal $I_w$ contains the annihilation ideal of $V_{w,t}$, see (3.12), imply that $\pi_{w,t}(a_{\rho,w} a^*_{\rho,w})^{-1} : V_{w,t} \to V_{w,t}$ induces an isomorphism of the $C_q[G]$-modules $V_{w,t}$ and $\chi \otimes V_{w,t}^{**}$. In view of Remark 6.4, $b_{w,t} = \pi_{w,t}(a_{\rho,w} a^*_{\rho,w})^{-1}$ defines a quasi-balancing map for the $C_q[G]$-module $\overline{V}_{w,t}$. Explicitly in the basis (3.10) of $V_{w,t}$, $\pi_{w,t}(a_{\rho,w} a^*_{\rho,w})^{-1}$ acts diagonally by
\[ \pi_{w,t}(a_{\rho,w} a^*_{\rho,w})^{-1}.e_{k_1,\dots,k_n} = \prod_{j=1}^{n} q^{2(k_j+1)(w_j\rho,\alpha_{i_j})}\; e_{k_1,\dots,k_n}, \tag{6.19} \]
recall (4.10). Define the set of quantum trace class operators in the $C_q[G]$-module $\overline{V}_{w,t}$ by
\[ B_1^q(\overline{V}_{w,t}) = B_1(\overline{V}_{w,t})\, \pi_{w,t}(a_{\rho,w} a^*_{\rho,w}). \tag{6.20} \]
It is clear from (6.19) that $\pi_{w,t}(a_{\rho,w} a^*_{\rho,w})$ is a compact operator and thus
\[ B_1^q(\overline{V}_{w,t}) \subset B_1(\overline{V}_{w,t}). \]
Using Proposition 4.1, observe that
\[ \pi_{w,t}(a_{2\rho,w} a^*_{2\rho,w}) = \pi_{w,t}(a_{\rho,w} a^*_{\rho,w})^2. \tag{6.21} \]
Finally define the quantum quasi-trace functional $\operatorname{qtr}_{\overline{V}_{w,t}} : B_1^q(\overline{V}_{w,t}) \to \mathbb{C}$ by
\[ \operatorname{qtr}_{\overline{V}_{w,t}}(L) = \mathrm{const}_w\, \operatorname{tr}_{\overline{V}_{w,t}}\big(L\, \pi_{w,t}(a_{\rho,w} a^*_{\rho,w})^{-1}\big), \tag{6.22} \]
where
\[ \mathrm{const}_w = \prod_{\beta \in \Delta_+ \cap w^{-1}\Delta_-} (q^{(2\rho,\beta)} - 1). \tag{6.23} \]

Proposition 6.9. The $C_q[G]$-modules $\overline{V}_{w,t}$ are quasi-balanced with multiplicative characters $\chi_{2(w\rho-\rho)}$ and quasi-balancing morphisms $b_{w,t} = \pi_{w,t}(a_{\rho,w} a^*_{\rho,w})^{-1}$. The space of quantum trace class operators in $\overline{V}_{w,t}$ and quantum quasi-trace morphisms
\[ \operatorname{qtr}_{\overline{V}_{w,t}} : B_1^q(\overline{V}_{w,t}) \to \chi_{2(w\rho-\rho)} \]
are given by (6.20) and (6.22). The morphisms $\operatorname{qtr}_{\overline{V}_{w,t}}$ are normalized by
\[ \operatorname{qtr}_{\overline{V}_{w,t}}\big(\pi_{w,t}(a_{2\rho,w} a^*_{2\rho,w})\big) = 1. \tag{6.24} \]

To check (6.24) it is sufficient to check that
\[ \operatorname{tr}_{\overline{V}_{w,t}}\big(\pi_{w,t}(a_{\rho,w} a^*_{\rho,w})\big) = \prod_{\beta \in \Delta_+ \cap w^{-1}\Delta_-} (q^{(2\rho,\beta)} - 1)^{-1}, \]
recall (6.21). This easily follows from (6.19) using the standard fact
\[ \{w_j^{-1}\alpha_{i_j}\}_{j=1}^{n} = \Delta_+ \cap w^{-1}\Delta_- \tag{6.25} \]
in the notation of (4.8), see for instance [5].

Remark 6.10. Consider again the compact group $K$ equipped with the standard Poisson structure, see the introduction and Sect. 5.6. Recall the notation $N_w = N \cap wN_-w^{-1}$ and $N_w^+ = N \cap wNw^{-1}$, $w \in W$, where $N_-$ is the unipotent subgroup of $G$ which is dual to $N$ with respect to the fixed complex torus $TA$ of $G$. The symplectic leaf $S_w.t$ of $K$, considered as an $AN$ homogeneous space under the dressing action, is isomorphic to $AN/AN_w^+$. We choose as a base point of $S_w.t$ the point $\dot{w}.t$. Denote by $\mu_{w,t}$ the Liouville volume form on the leaf $S_w.t$. The diffeomorphisms
\[ S_w.t \cong AN/AN_w^+ \cong N_w \tag{6.26} \]
induce a measure $dn_w$ on $S_w.t$ from the Haar measure on $N_w$. The second one comes from the factorization $AN = N_w A N_w^+$. The measure $dn_w$ will be normalized by
\[ \int a_{w,2\rho}^2|_{S_w.t}\; dn_w = 1, \]
cf. Sect. 5.6. The relation between the volume forms $\mu_{w,t}$ and $dn_w$ on $S_w.t$ was found by Lu [8]. It is given by
\[ dn_w = \prod_{\beta \in \Delta_+ \cap w^{-1}\Delta_-} \frac{(\rho, \beta)}{\pi}\; a_{w,\rho}^{-2}|_{S_w.t}\; \mu_{w,t}. \tag{6.27} \]
It is easy to compute that the measure $dn_w$ on $S_w.t$ transforms under the dressing action of $AN = K^*$ by
\[ \delta_{an}(dn_w) = a^{2(\rho - w\rho)}\, dn_w. \]
The quantum quasi-trace morphisms
\[ \operatorname{qtr}_{\overline{V}_{w,t}} : B_1^q(\overline{V}_{w,t}) \to \chi_{2(w\rho-\rho)} \]
are quantum analogs of the measures $dn_w$ on $S_w.t$ and thus also of the Haar measures on the unipotent subgroups $N_w$ of $G$. The traces in the modules $V_{w,t}$ can be considered as quantizations of the Liouville volume forms $\mu_{w,t}$ on the leaves $S_w.t$. The relation (6.22) is a quantum version of Lu's relation (6.27).

6.4. Tensor product properties of the quasi-balancing morphisms $b_{w,t}$. When $w, w' \in W$ are such that $l(ww') = l(w) + l(w')$ the tensor product of $(C_q[G], *)$-modules $V_{w,t} \otimes V_{w',t'}$ is again an irreducible $(C_q[G], *)$-module, see Lemma 6.12 below. Here we discuss the relation between the corresponding quasi-balancing morphisms constructed in the previous subsection.

For an element $t = (t_1, \dots, t_l) \in (\mathbb{C}^\times)^l$ denote its $j$th component by $(t)_j := t_j$. Define an action of the Weyl group $W$ of $\mathfrak{g}$ on the torus $(\mathbb{C}^\times)^l$ by
\[ (w(t))_i := \prod_j t_j^{m_{ij}} \quad \text{where} \quad w^{-1}\alpha_j^\vee = \sum_i m_{ij}\, \alpha_i^\vee. \]
It can be easily identified with the conjugation action of $W$ on a complex torus of $G$. It is straightforward to check that
\[ \chi_{w(t)}(c^{\Lambda}_{l,v}) = \prod_{i=1}^{l} t_i^{(\lambda,\, w^{-1}\alpha_i^\vee)}\, \langle l, v \rangle, \]
cf. (3.8).

Fix $w \in W$ and a reduced decomposition $w = s_{i_1} \dots s_{i_n}$ of it. The representation space $V_{w,t}$, recall Theorem 3.3, is canonically identified with the vector space
\[ V_w = V_{s_1} \otimes \dots \otimes V_{s_n} \]
for all $t \in (\mathbb{C}^\times)^l$. (As earlier we will not show explicitly the dependence on the choice of a reduced decomposition of $w$.) Under this identification the basis (3.10) of $V_{w,t}$ corresponds to the basis
\[ e_{k_1,\dots,k_n} = e_{k_1} \otimes \dots \otimes e_{k_n}, \quad n = l(w),\ k_1, \dots, k_n \in \mathbb{N} \tag{6.28} \]
of $V_w$. In the notation (4.8) define the linear operator $J_{w,t}$ in $V_w$ acting diagonally in the above basis of $V_w$ by
\[ J_{w,t}.e_{k_1,\dots,k_n} = \prod_{j=1}^{n} \big(w_{j-1}(t)\, w_j(t^{-1})\big)_{i_j}^{k_j+1}\; e_{k_1,\dots,k_n}. \tag{6.29} \]

Lemma 6.11. For all $w \in W$, and $t, t' \in (\mathbb{C}^\times)^l$ the operator $J_{w,t'}$ defines an isomorphism of the $C_q[G]$-representations $\chi_{w(t')} \otimes \pi_{w,t}$ and $\pi_{w,t} \otimes \chi_{t'} \cong \pi_{w,tt'}$ in the natural identification of their representation spaces with $V_w$.

Lemma 6.11 is checked directly in the case of $G = SL_2$ using the defining identities (3.6)–(3.7) for the $C_q[SL_2]$-module $\pi$, see Sect. 3.3. This implies the lemma when $w$ is a simple reflection and the general case is proved by induction on $l(w)$.

Lemma 6.12. Let $w, w' \in W$ be such that $l(ww') = l(w) + l(w')$ and $t, t' \in (S^1)^l$. The linear operator $J_{w',(w')^{-1}(t)}$ induces the unitary equivalence of $(C_q[G], *)$-modules
\[ D_{w,t;w',t'} : \overline{V}_{w,t} \otimes \overline{V}_{w',t'} \to \overline{V}_{ww',(w')^{-1}(t)t'} \tag{6.30} \]
by identifying the spaces $V_{w,t} \otimes V_{w',t'} \cong V_w \otimes V_{w'} \cong V_{ww'} \cong V_{ww',(w')^{-1}(t)t'}$. (The product of two reduced decompositions of $w$ and $w'$ is used as a reduced decomposition of $ww'$.)

In the setting of Lemma 6.12 the $C_q[G]$-module $\overline{V}_{ww',(w')^{-1}(t)t'}$ admits a quasi-balancing morphism constructed from the quasi-balancing morphisms $b_{w,t}$ and $b_{w',t'}$ for the modules $\overline{V}_{w,t}$ and $\overline{V}_{w',t'}$. It is given by the composition
\[
\begin{aligned}
V_{ww',(w')^{-1}(t)t'}
 &\xrightarrow{\ D^{-1}_{w,t;w',t'}\ } V_{w,t} \otimes V_{w',t'}
  \xrightarrow{\ \mathrm{id} \otimes b_{w',t'}\ } V_{w,t} \otimes \chi_{q^{2(w'\rho-\rho)}} \otimes V^{**}_{w',t'} \\
 &\xrightarrow{\ J^{-1}_{w,q^{2(w'\rho-\rho)}} \otimes \mathrm{id}\ } \chi_{q^{2w(w'\rho-\rho)}} \otimes V_{w,t} \otimes V^{**}_{w',t'}
  \xrightarrow{\ b_{w,t} \otimes \mathrm{id}\ } \chi_{q^{2w(w'\rho-\rho)}} \otimes \chi_{q^{2(w\rho-\rho)}} \otimes V^{**}_{w,t} \otimes V^{**}_{w',t'} \\
 &\xrightarrow{\ D^{**}_{w,t;w',t'}\ } \chi_{q^{2(ww'\rho-\rho)}} \otimes V^{**}_{ww',(w')^{-1}(t)t'}.
\end{aligned}
\tag{6.31}
\]
The restricted duals of the modules $V_{w,t}$ and $V_{w',t'}$ are taken with respect to the weight space decomposition (6.18) for the commutative subalgebras of $C_q[G]$ spanned by $a_{\Lambda,w}$ and $a_{\Lambda,w'}$, $\Lambda \in P_+$, respectively. Recall also the notation (6.16).

Proposition 6.13. If $w, w' \in W$ are such that $l(ww') = l(w) + l(w')$ and $t, t' \in (S^1)^l$ then the quasi-balancing map for the $C_q[G]$-module $\overline{V}_{ww',(w')^{-1}(t)t'}$ given by the composition (6.31) coincides with the quasi-balancing map $b_{ww',(w')^{-1}(t)t'}$.
To prove Proposition 6.13 observe that in the natural identification of the representation spaces in (6.31) with $V_w \otimes V_{w'}$ the composition is simply
\[ b_{w,t}\, J^{-1}_{w,q^{2(w'\rho-\rho)}} \otimes b_{w',t'}. \]
(We use again the product of two reduced decompositions of $w$ and $w'$ as a reduced decomposition of $ww'$.) Now the proposition easily follows from (6.19) and the following formula for the action of $J^{-1}_{w,q^{2(w'\rho-\rho)}}$ in the basis (6.28) of $V_{w,t}$, which is a direct consequence of (6.29):
\[ J^{-1}_{w,q^{2(w'\rho-\rho)}}.e_{k_1,\dots,k_n} = \prod_{j=1}^{l(w)} q^{2(k_j+1)(w_j(w'\rho-\rho),\, \alpha_{i_j})}\; e_{k_1,\dots,k_n}. \]
This computation implies also the following connection between the spaces $B_1^q(\overline{V}_{w,t})$, $B_1^q(\overline{V}_{w',t'})$ and $B_1^q(\overline{V}_{ww',(w')^{-1}(t)t'})$ when $l(ww') = l(w) + l(w')$.

Corollary 6.14. If $L \in B_1^q(\overline{V}_{w,t})$ and $L' \in B_1^q(\overline{V}_{w',t'})$ in the setting of Proposition 6.13, then
\[ L\, J_{w,q^{2(w'\rho-\rho)}} \otimes L' \in B_1^q(\overline{V}_{ww',(w')^{-1}(t)t'}) \]
and
\[ \operatorname{qtr}_{\overline{V}_{ww',(w')^{-1}(t)t'}}\big(L\, J_{w,q^{2(w'\rho-\rho)}} \otimes L'\big) = \frac{\mathrm{const}_{ww'}}{\mathrm{const}_w\, \mathrm{const}_{w'}}\, \operatorname{qtr}_{\overline{V}_{w,t}}(L)\, \operatorname{qtr}_{\overline{V}_{w',t'}}(L'), \]
where $\mathrm{const}_w$ is given by (6.23).

7. An Application: Quantum Harish-Chandra c-Functions

Denote by 1 the identity element $(1, \dots, 1)$ of the real torus $(S^1)^l$. According to (4.10) the linear operators $\pi_{w,1}(a_{\omega_i,w})$ in $\overline{V}_{w,1}$ are compact, selfadjoint with spectrum contained in $[0, \infty)$. For different values of $i$ they mutually commute. Hence for each $\lambda \in \mathfrak{h}$ we can define the linear operator in $\overline{V}_{w,1}$,
\[ d_{\lambda,w} = \prod_{i=1}^{l} \pi_{w,1}(a_{\omega_i,w})^{\lambda_i}, \]
where $\lambda_i = (\lambda, \alpha_i^\vee)$, i.e. $\lambda = \sum_i \lambda_i \omega_i$. It is obvious that
\[ d_{\lambda,w} = \pi_{w,1}(a_{\lambda,w}) \quad \text{when} \quad \lambda \in P_+ \subset \mathfrak{h} \tag{7.1} \]
and
\[ d_{\lambda_1,w}\, d_{\lambda_2,w} = d_{\lambda_1+\lambda_2,w}, \quad \forall\, \lambda_1, \lambda_2 \in \mathfrak{h}. \tag{7.2} \]

Lemma 7.1. The linear operator $d_{i\lambda+2\rho,w}$ in $\overline{V}_{w,1}$ is quantum trace class (belongs to $B_1^q(\overline{V}_{w,1})$) if and only if
\[ \operatorname{Im}(\lambda, \beta) < 0, \quad \forall\, \beta \in \Delta_+ \cap w^{-1}\Delta_-. \]
Proof. The operator diλ+2ρ,w in V w,1 belongs to B1 (V w,1 ) if and only if diλ,w ∈ B1 (V w,1 ) because of (7.1), (7.2), and the selfadjointness of πw,1 (aρ,w ). The operator diλ,w is diagonal in the orthonormal basis (3.10) of V w,1 and according to (4.10) acts by diλ,w .ek1 ,... ,kn =
n
q
−i(kj +1)(wj λ,αij )
ek1 ,... ,kn ,
(7.3)
j =1
recall the notation (4.8). It is clear that the linear operator diλ,w in V w,1 is trace class if and only if Re(iλ, wj−1 αij ) > 0 for i = 1, . . . , n = l(w) which implies the statement because of (6.25). # Definition 7.2. The function q
cw−1 (λ) = qtr V w,1 (diλ+2ρ,w ) = tr V w,1 (diλ,w )
(7.4)
in the domain {λ ∈ h | Im(λ, β) < 0, ∀ β ∈ + ∩ w −1 − } will be called quantum Harish-Chandra c-function associated to the element w−1 of the Weyl group W of g. q
Proposition 7.3. For all $w \in W$ the quantum Harish-Chandra c-function $c^q_w(\lambda)$ is given by
$$c^q_w(\lambda) = \prod_{\beta \in \Delta_+ \cap w\Delta_-} \frac{q^{(2\rho,\beta)} - 1}{q^{(i\lambda,\beta)} - 1}.$$
This proposition follows from (7.3) and (6.25) similarly to the proof of the normalization (6.24).

Remark 7.4. Proposition 7.3 is a quantum analog of the Harish-Chandra formula for the c-function in the case of complex simple Lie groups, generalized later by Gindikin and Karpelevich to arbitrary real reductive groups. Recall the setting of Sect. 5.6 and Remark 6.10. Let $dn_w$ denote the Haar measure on the unipotent subgroup $N_w$ of $G$. The classical Harish-Chandra c-function associated to the element $w^{-1} \in W$ is given by the integral formula
$$c_{w^{-1}}(\lambda) = \int_{N_w} a_w(n)^{-(i\lambda+2\rho)}\, dn_w, \quad \lambda \in \mathfrak{h},\ \mathrm{Im}(\lambda, \beta) < 0,\ \forall\, \beta \in \Delta_+ \cap w^{-1}\Delta_-,$$
recall Definition (5.7) of the map $a_w : N \to A$. We refer to [3] for a detailed treatment of spherical functions and to [8] for an interpretation of the c-function in terms of the Poisson geometry of $K$, see in particular Example 2.8 in [8]. The linear operators $d_{i\lambda+2\rho}$ in the modules $V_{w,1}$ can be thought of as quantizations of the pushforwards of the functions $a_w(n)^{-(i\lambda+2\rho)}$ on $N_w$ to the symplectic leaves $S_w$ by the dressing action, using the base points $\dot{w} \in S_w$ (i.e. using the diffeomorphisms (6.26)). As was explained in Remark 6.10 the quantum quasi-traces $\mathrm{qtr}_{V_{w,1}}$ in the $C_q[G]$-modules $V_{w,1}$ are quantizations of the pushforwards of the Haar measures on $N_w$ to the symplectic leaves $S_w$.
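The sense in which Proposition 7.3 is a quantum analog can be illustrated factor by factor: each factor $(q^{a} - 1)/(q^{b} - 1)$ tends to $a/b$ as $q \to 1$, which is the shape of the factors $(2\rho,\beta)/(i\lambda,\beta)$ in the classical formula. The following is only a numerical sketch of this elementary limit. The numbers `a` and `b` are hypothetical stand-ins for $(2\rho,\beta)$ and $(i\lambda,\beta)$ and are not taken from the paper.

```python
def quantum_factor(q: float, a: float, b: complex) -> complex:
    """One factor (q^a - 1) / (q^b - 1) of the product in Proposition 7.3.

    a stands for (2*rho, beta) and b for (i*lambda, beta); both are passed
    in directly as numbers (illustrative values only).
    """
    return (q ** a - 1) / (q ** b - 1)


def classical_factor(a: float, b: complex) -> complex:
    """The corresponding factor a / b of the classical product formula."""
    return a / b


# Hypothetical sample values for the two pairings.
a = 2.0
b = complex(0.5, 1.0)

# The gap between the quantum and the classical factor shrinks as q -> 1.
for q in (0.9, 0.99, 0.999999):
    gap = abs(quantum_factor(q, a, b) - classical_factor(a, b))
    print(q, gap)
```

This checks only the behaviour of a single factor; it does not touch the operator-theoretic content of the proposition.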
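The classical Harish-Chandra formula
$$c_w(\lambda) = \prod_{\beta \in \Delta_+ \cap w\Delta_-} \frac{(2\rho, \beta)}{(i\lambda, \beta)}, \quad \lambda \in \mathfrak{h},\ \mathrm{Im}(\lambda, \beta) < 0,\ \forall\, \beta \in \Delta_+ \cap w^{-1}\Delta_-$$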
is proved by induction on the length of $w$, see [3, Chapter IV, §6]. Lu [8] found that this argument is essentially based on the product formula (5.9) for the leaves $S_w$. Our computation relies on its quantum counterpart, the tensor product formula (3.9) for the representations $\pi_{w,t}$, cf. also Sect. 6.4.

References
1. Chari, V. and Pressley, A.: A guide to quantum groups. Cambridge: Cambridge Univ. Press, 1994
2. Drinfeld, V. G.: Quantum groups. Proc. ICM, Berkeley, 1986, Providence, RI: AMS, 1987, pp. 798–820
3. Helgason, S.: Groups and geometric analysis. Pure Appl. Math. 113, London–New York: Acad. Press, 1984
4. Hodges, T. J. and Levasseur, T.: Primitive ideals of $C_q[G]$. Preprint 1992
5. Humphreys, J. E.: Reflection groups and Coxeter groups. Cambridge Stud. Adv. Math. 29, Cambridge: Cambridge Univ. Press, 1990
6. Joseph, A.: Quantum groups and their primitive ideals. Ergebnisse der Mathematik und ihrer Grenzgebiete (3), Berlin–Heidelberg–New York: Springer-Verlag, 1995
7. Korogodski, L.I. and Soibelman, Ya.S.: Algebras of functions on quantum groups: Part I. AMS Math. Surveys and Monographs 56, Providence, RI: AMS, 1998
8. Lu, J.-H.: Coordinates on Schubert cells, Kostant's harmonic forms, and the Bruhat Poisson structure on G/B. Transform. Groups 4, no. 4, 355–374 (1999)
9. Lu, J.-H. and Weinstein, A.: Poisson Lie groups, dressing transformations, and Bruhat decompositions. J. Diff. Geom. 31, no. 2, 501–526 (1990)
10. Lusztig, G.: Introduction to quantum groups. Progr. Math. 110, Basel–Boston: Birkhäuser, 1993
11. Reshetikhin, N.Yu. and Turaev, V.G.: Ribbon graphs and their invariants derived from quantum groups. Commun. Math. Phys. 127, no. 1, 1–26 (1990)
12. Reshetikhin, N.Yu. and Turaev, V.G.: Invariants of 3-manifolds via link polynomials and quantum groups. Invent. Math. 103, no. 3, 547–597 (1991)
13. Semenov-Tian-Shansky, M.A.: Dressing transformations and Poisson group actions. Publ. Res. Inst. Math. Sci. 21, no. 6, 1237–1260 (1985)
14. Soibelman, Ya.S.: The algebra of functions on a compact quantum group and its representations. St. Petersburg Math. J. 2, 193–225 (1991)
15. Soibelman, Ya.S.: Selected topics in quantum groups. In: Infinite analysis, Part B (Kyoto, 1991), Adv. Ser. Math. Phys. 16, Singapore: World Sci. Publ., 1992, pp. 859–887
16. Soibelman, Ya.S. and Vaksman, L.L.: An algebra of functions on the quantum group SU(2). Funct. Anal. Appl. 22, no. 3, 170–181 (1988)
17. Soibelman, Ya.S. and Vaksman, L.L.: On some problems in the theory of quantum groups. In: Representation theory and dynamical systems, Adv. Soviet Math. 9, Providence, RI: AMS, 1992, pp. 3–55
18. Woronowicz, S.L.: Compact matrix pseudogroups. Commun. Math. Phys. 111, no. 4, 613–665 (1987)

Communicated by H. Araki
Commun. Math. Phys. 224, 427 – 442 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Quantum Morphing and the Jones Polynomial
Oliver T. Dasbach, Thang D. Le, Xiao-Song Lin
Department of Mathematics, University of California, Riverside, CA 92521, USA
E-mail: [email protected]; [email protected]; [email protected]
Received: 15 February 2001 / Accepted: 8 June 2001
Abstract: We explore the experimental observation that on the set of knots with bounded crossing number, algebraically independent Vassiliev invariants become correlated, as first noticed by S. Willerton. We see this through the value distribution of the Jones polynomial at roots of unity. As the order of the roots of unity grows, the higher order fluctuation diminishes and a more organized shape emerges from a rather random value distribution of the Jones polynomial. We call this phenomenon “quantum morphing”. Evaluations of the Jones polynomial at roots of unity play a crucial role, for example, in the volume conjecture.

When I questioned your pupil, under a pine-tree,
“My teacher”, he answered, “went for herbs,
But toward which corner of the mountain,
How can I tell, through all these clouds?”
Jia Dao (777–841), Chinese Poet of Tang Dynasty
1. Introduction

While the Alexander polynomial of a knot is considered well understood, the Jones polynomial remains mysterious. The Alexander polynomial has a solid interpretation in terms of classical topology; no such interpretation of the Jones polynomial is known. The Alexander polynomial is computable in polynomial time in the number of crossings of a knot, while evaluations of the Jones polynomial at all but eight points are known to be #P-hard [JVW90].

Partially supported by the Overseas Youth Cooperation Research Fund of NSFC
The theory of Vassiliev knot invariants gave a common framework for the Alexander polynomial and the Jones polynomial and its generalizations, known as quantum polynomials. After suitable renormalizations, the coefficients of the polynomials are Vassiliev invariants. Since each of these invariants is computable in polynomial time, this gives a way of approximating the Jones polynomial in polynomial time.

Although the space of Vassiliev knot invariants is quite large [Das00], there is a lot of interest in understanding the simplest Vassiliev invariants. (See e.g. the recent 28-page preprint [PV99] devoted to the Vassiliev invariant of order 2. For the use of this Vassiliev invariant in proofs concerning the Property P conjecture see [MZ00].) The space of Vassiliev invariants of order three is two-dimensional and has a natural (both algebraically and linearly independent) basis $v_2$ and $v_3$ with integer values. The image of the pair $(v_2, v_3)$ under all knots is the integer lattice. However, this point of view might be misleading. As observed by Willerton [Wil01], if one restricts the crossing number of the knots, the image of $(v_2, v_3)$, plotted in the plane, has (at least for crossing number less than 16) a distinct shape, which he called “fish”. Therefore, there might be a correlation among the crossing number and the first two Vassiliev invariants.

We will put this observation in a more general setting. The first two Vassiliev invariants are the quadratic and cubic terms in the power series expansion of the Jones polynomial evaluated at $e^x$. Note that its linear term vanishes and that the constant term is always 1. Thus, if we evaluate the Jones polynomial at the primitive $n$th root of unity then, as $n$ grows, this complex number is more and more determined by $v_2$ and $v_3$. In this way we see $v_2$ and $v_3$ as a limit of the Jones polynomial evaluated at primitive roots of unity.
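The expansion at $e^x$ can be made concrete for a knot whose Jones polynomial is already known. If $V(t) = \sum_m a_m t^m$, then the coefficient of $x^k$ in $V(e^x)$ is $\sum_m a_m m^k / k!$. The sketch below applies this to two standard tabulated Jones polynomials, used here as illustrative inputs that are not taken from the paper; which trefoil has the polynomial $t + t^3 - t^4$ depends on the chirality convention.

```python
from fractions import Fraction
from math import factorial


def vassiliev_coefficient(jones, k):
    """Coefficient of x**k in V(e**x), for V(t) = sum_m a_m * t**m.

    `jones` maps each integer exponent m to its coefficient a_m, so the
    coefficient of x**k is sum_m a_m * m**k / k!.
    """
    return sum(Fraction(a * m ** k, factorial(k)) for m, a in jones.items())


# Standard tabulated Jones polynomials (illustrative inputs).
trefoil = {1: 1, 3: 1, 4: -1}                      # t + t^3 - t^4
figure_eight = {-2: 1, -1: -1, 0: 1, 1: -1, 2: 1}  # t^-2 - t^-1 + 1 - t + t^2

for name, jones in (("trefoil", trefoil), ("figure-eight", figure_eight)):
    coefficients = [vassiliev_coefficient(jones, k) for k in range(4)]
    # Constant term 1 and vanishing linear term, as stated in the text.
    print(name, [str(c) for c in coefficients])
# trefoil      -> ['1', '0', '-3', '-6']
# figure-eight -> ['1', '0', '3', '0']
```

In both cases the constant term is 1 and the linear term vanishes, so the quadratic and cubic coefficients are the first ones carrying information about the knot.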
So, Willerton's plotting, revealing the correlation of $v_2$, $v_3$ and the crossing number, can be approached by renormalizing the value plotting of the Jones polynomial at a primitive root of unity of high order. Hence, within the set of knots with fixed crossing number, by varying the order of the primitive root of unity, we observe that the value distribution of the Jones polynomial reduces its randomness and becomes stabilized at a more organized shape. We call this phenomenon “quantum morphing”.

A somewhat astonishing observation, seen in our plottings, is that the same kind of correlation seems to hold if we confine ourselves to alternating knots or, equally well, to non-alternating knots. The Jones polynomial for alternating knots is a specialization of the Tutte polynomial of the corresponding checkerboard graph of an alternating diagram of the knot. Thus, all observations hold for this specialization of the Tutte polynomial for planar graphs on a given number of edges as well.

To explain some of the phenomena seen in the pictures, we rewrite the formulas for the second and third Vassiliev invariants given by Polyak and Viro (also known to Lannes and Fiedler). This will provide further hints at the correlation among the crossing number and these knot invariants. It is interesting to consider the special case of knots of low braid index. For knots of braid index 3, we will provide an explanation of the expected correlation. Finally, we indicate that this method could be explored further to reveal a possible correlation of a similar kind among the crossing number and the “Vassiliev coefficients” of the Jones and the Alexander polynomials.

We would like to thank Jim Hoste and Morwen Thistlethwaite for their program knotscape. It provides a wonderful tool for the study of knots. Furthermore, the first author would like to thank Joan Birman for her encouragement.
2. Further Motivations and Discussion

2.1. Complexity theory. It is intriguing to think of the Jones polynomial and the Alexander polynomial from the point of view of computational complexity theory. The Alexander polynomial has its roots in classical topology. Like most classical topological invariants, the Alexander polynomial is computable in polynomial time. To give a common framework with the Jones polynomial and other quantum polynomials it is convenient to see this fact in terms of representations of the braid group. Starting with a diagram of a knot with $c$ crossings, Vogel's algorithm (see [Vog90] and compare with [Yam87]) transforms the knot into a closed braid on $s$ strands, where $s$ is the number of Seifert circles in the diagram and thus bounded by $c + 1$. The word length of the resulting (non-unique) braid is bounded by a polynomial in $c$. Now the Alexander polynomial can be computed as a determinant from an $s$-dimensional representation of the braid group $B_s$. Combining these steps, we see that the computation of the Alexander polynomial for a knot of crossing number $c$ is possible in polynomial time in $c$.

The Jones polynomial, on the other hand, is defined in this setting as a weighted trace of a $2^s$-dimensional representation of the braid group $B_s$. Since $s$ depends on $c$, we could only get an algorithm of exponential complexity for the computation of the Jones polynomial. Note, however, that a subtlety is of some importance here. If we confine our consideration to knots given as diagrams with a bounded number of Seifert circles, then the computation of the Jones polynomial is polynomial in the crossing number. In particular, the computation of the Jones polynomial of closed $n$-braids is possible in polynomial time in the word length of $n$-braids. Without this restriction the computation of the Jones polynomial is harder than the computation of the Alexander polynomial (assuming $NP \ne P$). This was shown by Jaeger, Vertigan and Welsh [JVW90].
They proved that for any primitive root of unity $e^{2\pi i/n}$, $n > 4$ and $n \ne 6$, the evaluation of the Jones polynomial at this value is #P-hard. For a definition of #P see for example [GJ79]. This result makes it interesting to look at polynomial-time approximations of the Jones polynomial. Here, the theory of Vassiliev knot invariants comes into play. As shown in [BL93] the coefficient of $x^k$ in the power series expansion of $V_K(e^x)$ is a Vassiliev invariant of order $k$. Since in general Vassiliev invariants of order $k$ are computable in $O(c^k)$ time [BN95], this holds for these coefficients in particular. Truncations of the power series expansion now give a polynomial time approximation of the Jones polynomial. It is unknown whether one could get an a priori error estimate in terms of the crossing number. The possible correlation among the finite type coefficients of the Jones polynomial we observed here is certainly encouraging for the search for such an a priori estimate. For related discussion, see [Fre98].
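The idea of approximating by truncation can be illustrated on a single knot whose Jones polynomial is already known. The sketch below compares the exact value $V(e^{x})$ at $x = 2\pi i/n$ with the power series truncated after the cubic term, for the trefoil polynomial $t + t^3 - t^4$ (a standard tabulated value, used here as an assumption). It says nothing about computing the coefficients in $O(c^k)$ time from a diagram, and it does not provide the a priori error estimate discussed above.

```python
import cmath
from math import factorial, pi


def jones_value(jones, t):
    """Evaluate V(t) = sum_m a_m * t**m for a dict {m: a_m}."""
    return sum(a * t ** m for m, a in jones.items())


def truncated_series(jones, x, order):
    """Sum of V_k * x**k for k <= order, where V_k = sum_m a_m * m**k / k!."""
    total = 0
    for k in range(order + 1):
        v_k = sum(a * m ** k for m, a in jones.items()) / factorial(k)
        total += v_k * x ** k
    return total


def truncation_error(jones, n, order=3):
    """Distance between V(e^(2*pi*i/n)) and its truncated expansion."""
    x = 2j * pi / n
    return abs(jones_value(jones, cmath.exp(x)) - truncated_series(jones, x, order))


trefoil = {1: 1, 3: 1, 4: -1}  # t + t^3 - t^4 (illustrative input)

# The error of the cubic truncation shrinks as the order n of the root grows.
for n in (20, 50, 200):
    print(n, truncation_error(trefoil, n))
```

For a fixed knot the error behaves like the first omitted term, of size proportional to $(2\pi/n)^4$; how it depends on the crossing number is exactly the open question raised in the text.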
2.2. Quantum computing and value distribution of the Jones polynomial. The braid group is intimately related to the physics of anyons and the quantum Hall effect. Such a relationship is at the heart of the introduction in [FLW00] and [FKW00] of a universal computation model equivalent to quantum computation. Roughly speaking, this universal computation model uses the Jones representation at the fifth root of unity as basic logic gates. Density results in [FLW00] led to the following theorem in [FLW01] about the statistical value distribution of the Jones polynomial. To state the theorem, we fix a primitive $r$th root of unity, $r \ge 3$, $r \ne 3, 4, 6$, and let $V : B_n \to \mathbb{C}$ be given by evaluating the Jones polynomial of the closure of a braid $\sigma \in B_n$ at the $r$th root of unity. For a braid $\sigma \in B_n$, its word length will be calculated in terms of the $n - 1$ standard generators of $B_n$. A density measure $\mu_n$ on $\mathbb{C}$ can be defined as follows. For a subset $S \subset \mathbb{C}$,
$$\mu_n(S) = \lim_{l \to \infty} \frac{\#\{\sigma \in B_n;\ \mathrm{length}(\sigma) = l,\ V(\sigma) \in S\}}{(2(n-1))^l}.$$

Theorem 2.1 ([FLW01]). When $n \to \infty$, $\mu_n$ approaches a Gaussian distribution on $\mathbb{C}$ whose deviation depends on $r$.

If we understand this theorem as describing the statistical value distribution of the Jones polynomial on the set of isotopy classes of links, caution must be used. First of all, the braid index is used here to filter the set of isotopy classes. Moreover, since different braids may represent the same link, the limiting Gaussian distribution in the theorem above is for a “weighted” value distribution of the Jones polynomial on links.

Factoring through braids leads to this theorem, which is the first of its kind, about the statistical value distribution of the Jones polynomial. It can be thought of as an indication of the randomness of the values of the Jones polynomial. Nevertheless, our plotting shows some more delicate features of the actual value distribution of the Jones polynomial. This seems to be the case in particular regarding the phenomenon of “quantum morphing”, that the value distribution of the Jones polynomial exhibits some kind of regularity when $r$ is getting larger. Such a tendency becomes precise in the case of $B_3$ (see Prop. 6.1). We wonder whether some theorems about the phenomenon of “quantum morphing” could be established for each braid group $B_n$.

2.3. The volume conjecture of Kashaev and Murakami–Murakami. The distinctive shape of the value distribution of the Jones polynomial at higher order roots of unity might be thought of as evidence in support of the volume conjecture of Kashaev and Murakami–Murakami. Recall that the $N$-dimensional irreducible representation of $sl_2$ gives rise to the colored Jones polynomial. The exponential growth rate of the norm of the colored Jones polynomial at $e^{2\pi i/N}$ is conjectured in [Kas97, MM99] to be equal to the simplicial volume of the knot complement.
It is known that for a knot $K$, the colored Jones polynomial at the $N$-dimensional irreducible representation of $sl_2$ is determined by the usual Jones polynomial ($N = 2$) at the connected $r$-fold cabling of $K$, $r < N$. Notice further that for a fixed knot, the crossing number of its connected $N$-fold cabling grows like a quadratic function of $N$. If the value distribution of the Jones polynomial at $e^{2\pi i/N}$ for knots with crossing number, say, $N$ is random when $N$ is very large, we certainly would have less chance to get a meaningful exponential growth rate for its norm. Fortunately, what we see from our plottings points in the opposite direction. It is interesting to try to think about the volume conjecture along this line more quantitatively.

3. The Pictures

With the standard notation as in [Jon87], let $\Delta_K(t)$ be the Alexander polynomial and $V_K(t)$ be the Jones polynomial of a knot $K$. We expand the Jones polynomial into a power series by a change of variables $t = e^x$:
$$V_K(e^x) = 1 + \sum_{n=2}^{\infty} V_n(K)\, x^n. \qquad (1)$$
The Alexander polynomial $\Delta_K(t)$, on the other hand, is a polynomial in $(t^{1/2} - t^{-1/2})^2$:
$$\Delta_K(t) = 1 + \sum_{n=1}^{N} c_{2n}(K)\,(t^{1/2} - t^{-1/2})^{2n}. \qquad (2)$$
The coefficients $V_n(K)$ and $c_n(K)$ are Vassiliev invariants of order $n$. We will call them Vassiliev coefficients of the Jones polynomial and the Alexander polynomial. From the general theory of Vassiliev invariants, we know that $V_2, V_3, \ldots, V_n, \ldots$ are algebraically independent knot invariants. In other words, there is no non-trivial polynomial $P = P(x_1, x_2, \ldots, x_k)$ such that $P(V_{n_1}(K), V_{n_2}(K), \ldots, V_{n_k}(K)) = 0$ for all knots $K$. Nevertheless, actual evaluation of the Jones polynomial reveals that on the set of knots with bounded crossing number, these knot invariants $V_2, V_3, \ldots, V_n, \ldots$ become correlated in a certain sense. Such a correlation between $V_2$ and $V_3$ was first observed by S. Willerton in his plotting of the “fish”. To agree with the standard notation in the literature, we give the following definition.

Definition 3.1.
$$v_2(K) := -\frac{1}{6} V_K''(1) = \frac{1}{2}\Delta_K''(1), \qquad v_3(K) := -\frac{1}{36}\bigl(V_K'''(1) + 3V_K''(1)\bigr).$$

Note that with this definition $V_K(e^x) = 1 - 3v_2 x^2 - 6v_3 x^3 + O(x^4)$. So $V_2 = -3v_2 = -3c_2$ and $V_3 = -6v_3$. The invariants $v_2$ of order 2 and $v_3$ of order 3 span the whole space of Vassiliev invariants of order less than or equal to 3. For $\bar{K}$ the mirror image of $K$ we have $v_2(\bar{K}) = v_2(K)$ and $v_3(\bar{K}) = -v_3(K)$. In particular, if $K$ is amphicheiral then $v_3(K) = 0$. As a Vassiliev invariant of order 2, $v_2$ is uniquely determined by $v_2(\text{unknot}) = 0$ and $v_2(\text{trefoil}) = 1$. Similarly, the Vassiliev invariant $v_3$ of order 3 is uniquely determined by $v_3(\text{unknot}) = 0$, $v_3(\text{right-trefoil}) = 1$, and $v_3(\text{figure-eight}) = 0$.

Let $q_n$ be the $n$th root of unity $q_n := e^{2\pi i/n}$. Since $V_K(1) = 1$ for a knot $K$, we have the classical limit
$$\lim_{n \to \infty} V_K(q_n) = 1.$$
For our purposes other limits are more useful:

Proposition 3.2. We renormalize the real and imaginary part of the evaluation of the Jones polynomial in the following way:
$$\tilde{V}_{2,n} := \mathrm{re}\left(\frac{V_K(q_n) - 1}{(2\pi i/n)^2}\right), \qquad \tilde{V}_{3,n} := \mathrm{re}\left(\frac{V_K(q_n)}{(2\pi i/n)^3}\right).$$
Here, $\mathrm{re}(z)$ denotes the real part of a complex number $z$. We have
$$\lim_{n \to \infty} \tilde{V}_{2,n} = V_2, \qquad \lim_{n \to \infty} \tilde{V}_{3,n} = V_3.$$
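The two limits of Proposition 3.2 can be observed numerically for a single knot. The following sketch evaluates the renormalized quantities for the trefoil polynomial $t + t^3 - t^4$ (a standard tabulated value, used as an illustrative input), for which Eq. (1) gives $V_2 = -3$ and $V_3 = -6$. The function names are ours and not part of the paper.

```python
import cmath
from math import pi


def jones_value(jones, t):
    """Evaluate V(t) = sum_m a_m * t**m for a dict {m: a_m}."""
    return sum(a * t ** m for m, a in jones.items())


def renormalized_pair(jones, n):
    """Return (V~_{2,n}, V~_{3,n}) from Proposition 3.2 at q_n = exp(2*pi*i/n)."""
    x = 2j * pi / n
    value = jones_value(jones, cmath.exp(x))
    v2_tilde = ((value - 1) / x ** 2).real
    v3_tilde = (value / x ** 3).real
    return v2_tilde, v3_tilde


trefoil = {1: 1, 3: 1, 4: -1}  # t + t^3 - t^4 (illustrative input)

# The pair approaches (V_2, V_3) = (-3, -6) as n grows.
for n in (10, 100, 1000):
    print(n, renormalized_pair(trefoil, n))
```

Dividing by $x^3$ with $x = 2\pi i/n$ purely imaginary turns the terms $1/x^3$ and $V_2/x$ into purely imaginary numbers, which is why taking the real part isolates $V_3$ in the limit.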
The proof is immediate from the expansion in Eq. (1). Other coefficients of the Jones polynomial can be obtained similarly by considering the limit of some functions of derivatives of $V_K(q_n)$ when $n$ approaches infinity.

The plottings of the values of the (renormalized) Jones polynomial at various roots of unity will show decreasing randomness and the emergence of more organized shapes. More specifically, we plot the following data:
1. The (renormalized) evaluation at various roots of unity of the Jones polynomials of all
(a) alternating prime knots (Fig. 4)
(b) non-alternating prime knots (Fig. 5)
(c) prime knots (Fig. 6)
with crossing number 13;
2. The (renormalized) evaluation at various roots of unity of the Jones polynomials of all alternating prime knots with crossing number 14 (Fig. 7).
Finally, the plottings for other pairs of Vassiliev coefficients of the Jones polynomial, namely $(V_2, V_4)$, $(V_4, V_3)$, and $(V_4, V_5)$, show similar phenomena as in the case of $(V_2, V_3)$ (Fig. 8).

4. Some Explanations

We can offer only some very partial explanations of the observed correlations among Vassiliev coefficients of the Jones polynomial on the set of knots with bounded crossing number.

4.1. Upper bounds for $v_2$ and $v_3$. By results in [BN95] every Vassiliev invariant of order $k$ can be computed in $O(c^k)$ time and its value is in $O(c^k)$, where $c$ is the crossing number of a knot. An explicit quadratic upper bound for the invariant $v_2$ of order 2 in terms of $c$ is given in [PV99] as: $|v_2| \le c^2/8$. Similarly, for $v_3$ one can get the cubic bound [Wil01]
$$|v_3| \le \frac{1}{4}\, c(c-1)(c-2).$$
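As a small worked instance of these bounds, the sketch below compares them with the values of $v_2$ and $v_3$ for the two smallest knots. The trefoil values and $v_3(\text{figure-eight}) = 0$ are the normalizations quoted in Sect. 3; the crossing numbers 3 and 4 and the value $v_2(\text{figure-eight}) = -1$ are standard facts that we supply as assumptions.

```python
from fractions import Fraction


def v2_bound(c):
    """Quadratic bound c^2 / 8 on |v_2| for crossing number c."""
    return Fraction(c * c, 8)


def v3_bound(c):
    """Cubic bound c (c - 1) (c - 2) / 4 on |v_3| for crossing number c."""
    return Fraction(c * (c - 1) * (c - 2), 4)


# (crossing number, v_2, v_3); the figure-eight v_2 is a standard value
# supplied here as an assumption.
known_values = {
    "trefoil": (3, 1, 1),
    "figure-eight": (4, -1, 0),
}

for name, (c, v2, v3) in known_values.items():
    print(name, v2_bound(c), v3_bound(c), abs(v2) <= v2_bound(c), abs(v3) <= v3_bound(c))
```

For the trefoil the quadratic bound is $9/8$ against $|v_2| = 1$, so in this smallest case the bound on $v_2$ is nearly attained.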
4.2. Formulas for $v_2$ and $v_3$. Combinatorial formulas for $v_2$ and $v_3$ were given by several authors. We will use the approach of Polyak and Viro [PV94], where we fix an oriented knot diagram and extract from it a “signed arrow diagram”. Briefly, the knot diagram gives us a generic immersion of $S^1$ in the plane, which in turn determines a chord diagram as before. Then an arrow and a sign $\pm 1$ is added to each chord to encode the information we get from the fact that this chord diagram comes from an oriented knot diagram. This is the signed arrow diagram of the knot diagram. Ignoring the signs from a signed arrow diagram, we get an arrow diagram. A (signed) arrow diagram can also be based, which means that we fix a base point on the circle away from the end points of arrows. Finally, a sub-diagram is obtained by deleting several arrows from a (based, signed) arrow diagram.

Fig. 1. Some (based) arrow diagrams X, H and Y

Let $G$ be a signed based arrow diagram coming from a knot projection of a knot $K$. For a given based arrow diagram $D$, an imbedding $\phi : D \to G$ identifies $D$ with a sub-diagram of $G$. Define $\mathrm{sign}(\phi)$ to be the product of all signs of the arrows in $\phi(D)$. Let $\dot{X}$ be the based arrow diagram in Fig. 1. Ignoring the base point of $\dot{X}$ we get the arrow diagram $X$. Two other arrow diagrams $H$ and $Y$ are also given in Fig. 1.

Proposition 4.1 (Polyak–Viro). We have
$$v_2(K) = \sum_{\phi : \dot{X} \to G} \mathrm{sign}(\phi)$$
and
$$v_3(K) = \frac{1}{2} \sum_{\phi : H \to G} \mathrm{sign}(\phi) + \sum_{\phi : Y \to G} \mathrm{sign}(\phi).$$
We now reformulate the Polyak–Viro formulas so that the summation is taken over the same set of elements in both cases of $v_2$ and $v_3$. This set consists of all imbeddings $\phi : X \to G$. Fixing such an imbedding $\phi$, we define three weights as follows:
1. $w_1(\phi)$ equals 1 plus the number of endpoints of arrows in $G$ which lie between the arrow-heads of $\phi(X)$.
2. $w_2(\phi)$ equals the sum of signs of arrows in $G$ such that together with arrows in $\phi(X)$, they form the arrow diagram $H$.
3. $w_3(\phi)$ equals the sum of signs of arrows in $G$ such that together with arrows in $\phi(X)$, they form the arrow diagram $Y$.
Finally, let $c$ be the number of arrows in $G$.

Proposition 4.2. We have
$$v_2(K) = \sum_{\phi : X \to G} \mathrm{sign}(\phi)\, \frac{w_1(\phi)}{2c}$$
and
$$v_3(K) = \sum_{\phi : X \to G} \mathrm{sign}(\phi) \left( \frac{1}{4} w_2(\phi) + \frac{1}{3} w_3(\phi) \right).$$
4.3. Positive knots. Let $K$ be a knot having a knot diagram in which all $c$ crossings are positive. Then we have the following.

Proposition 4.3. We have
$$\frac{1}{2}\, v_2(K) \le v_3(K) \le \frac{10c - 5}{6}\, v_2(K).$$

Proof. The lower bound was already noticed by Willerton [Wil01]. In order to prove the promised upper bound, let us try to bound the weights $w_2$ and $w_3$ by $w_1$ from above. The first upper bound is easy to obtain: $w_3(\phi) \le w_1(\phi) - 1$ for every imbedding $\phi : X \to G$. The comparison of $w_2$ and $w_1$ is slightly more complicated. We will consider two imbeddings $\phi, \phi' : X \to G$ as two vertices in a graph $\mathcal{G}$. These two vertices are connected by an edge iff $\phi(X)$ and $\phi'(X)$ together have three different arrows and these three arrows form the arrow diagram $H$. Then $w_2(\phi)$ is equal to the valence of $\phi$ in the graph $\mathcal{G}$. There are two types of neighboring vertices for a fixed imbedding $\phi$, as shown in Fig. 2.

Fig. 2. Two types (type 1 and type 2) of relative positions of $\phi$ (solid arrows) and $\phi'$

Let $w_2^+(\phi)$ be the number of neighboring vertices of $\phi$ of the first type and $w_2^-(\phi)$ be the number of neighboring vertices of the second type. Of course, $w_2(\phi) = w_2^+(\phi) + w_2^-(\phi)$. First, it is obvious that $w_2^+(\phi) \le w_1(\phi) - 1$. We observe next that if $\phi$ and $\phi'$ are connected by an edge in $\mathcal{G}$ and $\phi'$ is of type 2 for $\phi$, then $\phi$ is of type 1 for $\phi'$. Therefore,
$$\sum_{\phi : X \to G} w_2(\phi) = 2 \sum_{\phi : X \to G} w_2^+(\phi).$$
Thus, we have
$$\begin{aligned} v_3(K) &= \frac{1}{2} \sum_{\phi} w_2^+(\phi) + \frac{1}{3} \sum_{\phi} w_3(\phi) \\ &\le \frac{1}{2} \sum_{\phi} w_1(\phi) - \frac{1}{2} \sum_{\phi} 1 + \frac{1}{3} \sum_{\phi} w_1(\phi) - \frac{1}{3} \sum_{\phi} 1 \\ &= \frac{5}{6}\, 2c\, v_2(K) - \frac{5}{6}\, v_2(K) \\ &= \frac{10c - 5}{6}\, v_2(K) \end{aligned}$$
as desired. We learned that a slightly better upper bound is given in [Sto98].

4.4. Knots given as closed braids. Of special interest is the class of knots given as closed 3-braids. Using representation theory of Hecke algebras, Jones gave the following relation between $V_K(t)$ and $\Delta_K(t)$ for $K = \hat{\alpha}$, $\alpha \in B_3$ (see [Jon87]).

Proposition 4.4 (Jones). If $\alpha \in B_3$ is such that the closure $\hat{\alpha}$ is a knot and the exponent sum of $\alpha$ is $e$ then
$$V_{\hat{\alpha}}(t) = t^{e/2}\bigl(1 + t^{e} + t + 1/t - t^{e/2-1}(1 + t + t^2)\,\Delta_{\hat{\alpha}}(t)\bigr).$$

As a corollary, we have the following relationship among $v_2$, $v_3$ and $e$ for knots $K = \hat{\alpha}$, $\alpha \in B_3$.

Corollary 4.5. Let $K$ be a knot given as a closed 3-braid $K = \hat{\alpha}$ and $e$ be the exponent sum of $\alpha \in B_3$. Then
$$V_3(K) = e\,V_2(K) - \frac{e}{2} + \left(\frac{e}{2}\right)^3.$$

For $B_4$, again using representation theory of Hecke algebras, Jones established the following formula relating the symmetrized Jones polynomial with the Alexander polynomial (see [Jon87]).

Proposition 4.6 (Jones). If $\alpha \in B_4$ is such that $\hat{\alpha}$ is a knot and the exponent sum of $\alpha$ is $e$ then
$$t^{-e} V_{\hat{\alpha}}(t) + t^{e} V_{\hat{\alpha}}(1/t) = (t^{-3/2} + t^{-1/2} + t^{1/2} + t^{3/2})(t^{e/2} + t^{-e/2}) - (t^{-2} + t^{-1} + 2 + t + t^2)\,\Delta_{\hat{\alpha}}(t). \qquad (3)$$
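Proposition 4.4 and Corollary 4.5 can be checked on small examples. The sketch below builds the Jones polynomial of a closed 3-braid from its Alexander polynomial and exponent sum via Proposition 4.4, and then tests the relation of Corollary 4.5 using the coefficients of Eq. (1). The three test cases, with their braid words noted in the comments and their symmetric Alexander polynomials normalized by $\Delta(1) = 1$, are standard examples that we supply as assumptions. Laurent polynomials are stored as dictionaries `{exponent: coefficient}`.

```python
from fractions import Fraction
from math import factorial


def add_term(poly, exponent, coeff):
    """Add coeff * t**exponent to a Laurent polynomial {exponent: coeff}."""
    total = poly.get(exponent, 0) + coeff
    if total:
        poly[exponent] = total
    else:
        poly.pop(exponent, None)


def jones_of_closed_3braid(alexander, e):
    """Proposition 4.4, expanded as
    V = t^(e/2) (1 + t^e + t + 1/t) - t^(e-1) (1 + t + t^2) Delta(t).
    """
    if e % 2:
        raise ValueError("a 3-braid closing to a knot has even exponent sum")
    h = e // 2
    jones = {}
    for exponent in (h, 3 * h, h + 1, h - 1):
        add_term(jones, exponent, 1)
    for m, a in alexander.items():
        for shift in (0, 1, 2):
            add_term(jones, e - 1 + shift + m, -a)
    return jones


def vassiliev_coefficient(jones, k):
    """Coefficient V_k of x**k in V(e**x), as in Eq. (1)."""
    return sum(Fraction(a * m ** k, factorial(k)) for m, a in jones.items())


def corollary_holds(alexander, e):
    """Check V_3 = e V_2 - e/2 + (e/2)^3 for the closed 3-braid."""
    jones = jones_of_closed_3braid(alexander, e)
    v2 = vassiliev_coefficient(jones, 2)
    v3 = vassiliev_coefficient(jones, 3)
    return v3 == e * v2 - Fraction(e, 2) + Fraction(e, 2) ** 3


# (Alexander polynomial, exponent sum); braid words are our assumptions.
examples = {
    "unknot (s1 s2)": ({0: 1}, 2),
    "trefoil (s1^3 s2)": ({-1: 1, 0: -1, 1: 1}, 4),
    "figure-eight (s1 s2^-1 s1 s2^-1)": ({-1: -1, 0: 3, 1: -1}, 0),
}

for name, (alexander, e) in examples.items():
    print(name, jones_of_closed_3braid(alexander, e), corollary_holds(alexander, e))
```

For the trefoil example the formula returns $t + t^3 - t^4$, and the relation reads $-6 = 4 \cdot (-3) - 2 + 8$.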
Comparing the Vassiliev coefficients of both sides of Eq. (3), we can get an algebraic relation among $e$, $V_2$, $V_3$, $V_4$, and $c_4$ as in Corollary 4.5. We will offer an elementary proof of Proposition 4.6 below. The proof also generalizes Eq. (3) to every $\alpha \in B_4$, not just the braids which close to a knot. Furthermore, our method can be generalized to get a similar equation for every braid group $B_n$ using the HOMFLY polynomial. But since we have no conclusive results directly relevant to the topics of this paper from such a generalization, it will not be presented here. Suffice it to say that such equations tell us that the Vassiliev coefficients of the Jones and Alexander polynomial
for closed braids, together with the exponent sum $e$, are algebraically related with each other. See also [DL00].

For $\alpha \in B_4$, let $e$ be the exponent sum and $k$ be the number of components of $\hat{\alpha}$. Consider the following three invariants of conjugacy classes of $B_4$:
$$\bar{V} = t^{-e} V_{\hat{\alpha}}(t) + (-1)^{k-1} t^{e} V_{\hat{\alpha}}(1/t), \qquad Q = t^{e/2} + (-1)^{k-1} t^{-e/2}, \qquad \Delta = \Delta_{\hat{\alpha}}(t).$$
They all satisfy the following skein relation:
$$[\alpha_+] - [\alpha_-] = (t^{1/2} - t^{-1/2})\,[\alpha_0],$$
where $\alpha_+ = \alpha_0 \sigma_i$ and $\alpha_- = \alpha_0 \sigma_i^{-1}$, with $\sigma_i$ the standard braid generator. Thus, these invariants of conjugacy classes of $B_4$ are determined by their initial values on $\sigma_1\sigma_2\sigma_3$, $\sigma_1\sigma_3$, $\sigma_1$, and $1$ in $B_4$. (The set of braids $\{\sigma_1\sigma_2\sigma_3, \sigma_1\sigma_2, \sigma_1\sigma_3, \sigma_1, 1\}$ is identified with the set of conjugacy classes of the symmetric group $S_4$. Their closures are all unlinks. So the values of $\bar{V}$, $Q$, $\Delta$ on these braids depend only on $e$ and $k$. Thus $\sigma_1\sigma_2$ is dropped from our list of braids determining the initial values of $\bar{V}$, $Q$, $\Delta$, since it has the same $e$ and $k$ as the braid $\sigma_1\sigma_3$.) It is easy to check that the following matrix, whose rows are the values of $\bar{V}$, $Q$, and $\Delta$ on $\sigma_1\sigma_2\sigma_3$, $\sigma_1\sigma_3$, $\sigma_1$, and $1$, respectively, is of rank 2:
$$\begin{pmatrix} t^{-3} + t^{3} & (t^{-2} - t^{2})(-t^{1/2} - t^{-1/2}) & (t^{-1} + t)(-t^{1/2} - t^{-1/2})^2 & 0 \\ t^{3/2} + t^{-3/2} & t - t^{-1} & t^{1/2} + t^{-1/2} & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$
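The rank statement can be checked numerically at sample values of $t$. The sketch below evaluates the three rows of initial values at a real $t > 0$ and verifies that the $\bar{V}$ row equals $A \cdot Q + B \cdot \Delta$, where the coefficients $A = t^{-3/2} + t^{-1/2} + t^{1/2} + t^{3/2}$ and $B = -(t^{-2} + t^{-1} + 2 + t + t^2)$ are read off from Eq. (3). This is a floating-point sanity check at a few points under our own naming, not a symbolic proof.

```python
def initial_value_rows(t):
    """Rows (V-bar, Q, Delta) of the matrix of initial values at a real t > 0.

    Columns correspond to the braids s1 s2 s3, s1 s3, s1 and 1 in B_4.
    """
    r = t ** 0.5
    d = -(r + 1 / r)  # the factor -t^(1/2) - t^(-1/2) appearing in the matrix
    v_bar = [t ** -3 + t ** 3, (t ** -2 - t ** 2) * d, (1 / t + t) * d ** 2, 0.0]
    q_row = [r ** 3 + r ** -3, t - 1 / t, r + 1 / r, 0.0]
    delta = [1.0, 0.0, 0.0, 0.0]
    return v_bar, q_row, delta


def relation_residual(t):
    """Largest entry of V-bar - A*Q - B*Delta, with A and B taken from Eq. (3)."""
    r = t ** 0.5
    a = r ** -3 + 1 / r + r + r ** 3
    b = -(t ** -2 + 1 / t + 2 + t + t ** 2)
    v_bar, q_row, delta = initial_value_rows(t)
    return max(abs(v - a * q - b * dl) for v, q, dl in zip(v_bar, q_row, delta))


# The residual vanishes up to rounding, so the V-bar row lies in the span of
# the Q and Delta rows, which are themselves clearly independent.
for t in (0.7, 2.0, 3.5):
    print(t, relation_residual(t))
```

Since the rows for $Q$ and $\Delta$ are linearly independent and the $\bar{V}$ row is a combination of them, the matrix has rank 2 at these sample points, in agreement with the claim.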
Therefore, $\bar{V}$, $Q$, and $\Delta$ are linearly related and one can work out Eq. (3) directly from this conclusion.

4.5. The evaluation of the Jones polynomial at $e^{2\pi i/10}$. The evaluation of the Jones polynomial at the tenth root of unity $q_{10} = e^{2\pi i/10}$ is somewhat special. We will give at least a heuristic reason for the difference in the “density” of the values of the Jones polynomial at the tenth root of unity compared with the other roots of unity.
Fig. 3. The link $L_\alpha$

Fig. 8. Plots of $(V_2, V_3)$, $(V_2, V_4)$, $(V_4, V_3)$, $(V_4, V_5)$ for knots on 13 crossings
Proposition 4.7. For a 3-braid $\alpha \in B_3$ and fixed $L$ let $L_\alpha$ be the link as in Fig. 3. Here the strands of $\alpha$ are supposed to be oriented in the same direction. Then the Jones polynomial evaluated at the tenth root of unity $q_{10}$ takes only finitely many values on the set $\{L_\alpha \mid \alpha \in B_3\}$.

Proof. By a result of Przytycki [Prz88] and a generalization of Stoimenow [Sto99] the value of the Jones polynomial at $q_{4k+2}$ does not change its norm if we insert $\sigma_i^{2k+1}$ into, or delete it from, the braid $\alpha$. Here, the $\sigma_i$ denote the standard generators of the braid groups. More specifically, $V(L_\alpha; q_{10})$ only changes by a multiplication by $-i$ if we delete $\sigma_i^{5}$ in a braid. By a result of Coxeter [Cox59] (compare with [Che01]) the braid group $B_n$ modulo its normal subgroup generated by $\sigma_1^{p}, \ldots, \sigma_{n-1}^{p}$ is finite if and only if $(n-2)(p-2) < 4$. In particular, the group $B_3$ modulo the normal subgroup generated by the fifth powers of the standard generators is a finite group. Thus, $V(L_\alpha; q_{10})$ can only take finitely many values.

As a corollary we get the following result of Jones [Jon87]:

Corollary 4.8. The Jones polynomial evaluated at $q_{10}$ takes only finitely many values on the set of closed 3-braids.

References
[BL93] Birman, J.S. and Lin, X.-S.: Knot polynomials and Vassiliev's invariants. Invent. Math. 111, no. 2, 225–270 (1993)
[BN95] Bar-Natan, D.: Polynomial invariants are polynomial. Math. Research Letters 2, 239–246 (1995)
[Che01] Chen, Q.: The 3-move conjecture for 5-braids. In: Proceedings of the International Conference on Knot Theory and its Ramifications, Singapore: World Scientific, 2001, pp. 36–47
[Cox59] Coxeter, H.S.M.: Factor groups of the braid group. In: Proc. Fourth Canad. Math. Congress, Banff 1957, 1959, pp. 95–122
[Das00] Dasbach, O.T.: On the combinatorial structure of primitive Vassiliev invariants III – A lower bound. Comm. Contempor. Math. 2, no. 4, 579–590 (2000)
[DL00] Dasbach, O.T. and Lin, X.-S.: The Bennequin number of closed n-trivial n-braids is negative. To appear in: Math. Res. Lett. 8, no. 5–6 (2001)
[FKW00] Freedman, M.H., Kitaev, A. and Wang, Z.: Simulation of topological field theories by quantum computers. Preprint (Microsoft), available as: quant-ph/0001071, 2000
[FLW00] Freedman, M.H., Larsen, M.J. and Wang, Z.: A modular functor which is universal for quantum computation. Preprint (Microsoft), available as: quant-ph/0001108, 2000
[FLW01] Freedman, M.H., Larsen, M.J. and Wang, Z.: The two-eigenvalue problem and density of Jones representation of braid groups. Preprint (Microsoft), available as: math.GT/0103200, 2001
[Fre98] Freedman, M.H.: Topological views on computational complexity. In: Proceedings of the International Congress of Mathematicians, Vol. II (Berlin, 1998), Doc. Math. 1998, Extra Vol. II, pp. 453–464 (electronic)
[GJ79] Garey, M.R. and Johnson, D.S.: Computers and intractability. A guide to the theory of NP-completeness. A Series of Books in the Mathematical Sciences, San Francisco, CA: W. H. Freeman and Co., 1979
[Jon87] Jones, V.F.R.: Hecke algebra representations of braid groups and link polynomials. Ann. Math. 126, 335–388 (1987)
[JVW90] Jaeger, F., Vertigan, D.L. and Welsh, D.J.A.: On the computational complexity of the Jones and Tutte polynomials. Math. Proc. Cambridge Philos. Soc. 108, no. 1, 35–53 (1990)
[Kas97] Kashaev, R.M.: The hyperbolic volume of knots from the quantum dilogarithm. Lett. Math. Phys. 39, no. 3, 269–275 (1997)
[MM99] Murakami, H. and Murakami, J.: The colored Jones polynomials and the simplicial volume of a knot. Acta Math. 186, no. 1, 85–104 (2001)
[MZ00] Menasco, W.W. and Zhang, X.: Positive knots and knots with braid index three have Property P. Available as: math.GT/0010154, 2000
[Prz88] Przytycki, J.H.: $t_k$ moves on links. Braids (Santa Cruz, CA, 1986), Providence, RI: Amer. Math. Soc., 1988, pp. 615–656
[PV94] Polyak, M. and Viro, O.: Gauss diagram formulas for Vassiliev invariants. Internat. Math. Res. Notices no. 11, 445ff. (1994), approx. 8 pp. (electronic)
[PV99] Polyak, M. and Viro, O.: On the Casson knot invariant. Knots in Hellas '98, Vol. 3 (Delphi). J. Knot Theory Ramifications 10, no. 5, 711–738 (2001)
[Sto98] Stoimenow, A.: Positive knots, closed braids and the Jones polynomial. Preprint, available as: math.GT/9805078, 1998
[Sto99] Stoimenow, A.: The granny and the square tangle and the unknotting number. Preprint, MPI Bonn, October 1999
[Vog90] Vogel, P.: Representation of links by braids: A new algorithm. Comment. Math. Helv. 65, 104–113 (1990)
[Wil01] Willerton, S.: On the first two Vassiliev invariants. Preprint, available as: math.GT/0104061, 2001
[Yam87] Yamada, S.: The Minimum Number of Seifert Circles Equals the Braid Index of a Link. Invent. Math. 89, 346–356 (1987)
Communicated by P. Sarnak
Commun. Math. Phys. 224, 443 – 544 (2001)
Global Regularity of Wave Maps II. Small Energy in Two Dimensions
Terence Tao
Department of Mathematics, UCLA, Los Angeles, CA 90095-1555, USA. E-mail: [email protected]
Received: 14 December 2000 / Accepted: 18 June 2001
Abstract: We show that wave maps from Minkowski space $\mathbb{R}^{1+n}$ to a sphere $S^{m-1}$ are globally smooth if the initial data is smooth and has small norm in the critical Sobolev space $\dot H^{n/2}$, in all dimensions $n \geq 2$. This generalizes the results in the prequel [40] of this paper, which addressed the high-dimensional case $n \geq 5$. In particular, in two dimensions we have global regularity whenever the energy is small, and global regularity for large data is thus reduced to demonstrating non-concentration of energy.
1. Introduction

Throughout this paper $d \geq 2$, $n \geq 1$ will be fixed integers, and all constants may depend on $d$ and $n$. Let $\mathbb{R}^{1+n}$ be $n+1$ dimensional Minkowski space with flat metric $\eta := \mathrm{diag}(-1, 1, \ldots, 1)$, and let $S^{d-1} \subset \mathbb{R}^d$ denote the unit sphere in the Euclidean space $\mathbb{R}^d$. Elements $\phi$ of $\mathbb{R}^d$ will be viewed as column vectors, while their adjoints $\phi^\dagger$ are row vectors. We let $\partial_\alpha$ and $\partial^\alpha$ for $\alpha = 0, \ldots, n$ be the usual derivatives with respect to the Minkowski metric $\eta$, subject to the usual summation conventions. We let $\Box := \partial_\alpha \partial^\alpha = \Delta - \partial_t^2$ denote the d'Alembertian. We shall write $\phi_{,\alpha}$ and $\phi^{,\alpha}$ for $\partial_\alpha \phi$ and $\partial^\alpha \phi$ respectively.

Define a classical wave map to be any function $\phi$ defined on an open set in $\mathbb{R}^{1+n}$ taking values on the sphere $S^{d-1}$ which is smooth, equal to a constant outside of a finite union of light cones, and obeys the equation
\[ \Box\phi = -\phi\,\phi^\dagger_{,\alpha}\phi^{,\alpha}. \tag{1} \]
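As a consistency check on (1), the following standard computation (not part of the original text, and using only the constraint $\phi^\dagger\phi = 1$ for real vector-valued $\phi$) shows that the right-hand side of (1) is exactly the normal component which the sphere constraint forces on $\Box\phi$:
\begin{align*}
\phi^\dagger\phi = 1 \quad &\Longrightarrow \quad \phi^\dagger\phi_{,\alpha} = 0 \\
&\Longrightarrow \quad \partial^\alpha(\phi^\dagger\phi_{,\alpha}) = \phi^\dagger_{,\alpha}\phi^{,\alpha} + \phi^\dagger\Box\phi = 0 \\
&\Longrightarrow \quad \phi^\dagger\Box\phi = -\phi^\dagger_{,\alpha}\phi^{,\alpha}.
\end{align*}
Multiplying (1) on the left by $\phi^\dagger$ and using $\phi^\dagger\phi = 1$ gives the same identity, so (1) can be read as saying that $\Box\phi$ has no component tangent to the sphere at $\phi$.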
For any time $t$, we use $\phi[t] := (\phi(t), \partial_t\phi(t))$ to denote the position and velocity of $\phi$ at time $t$. We refer to $\phi[0]$ as the initial data of $\phi$. Note that in order for $\phi[0]$ to be the initial data for a classical wave map, $\phi[0]$ must be smooth, equal to a constant outside of a compact set, and satisfy the consistency conditions
\[ \phi^\dagger(0)\phi(0) = 1; \qquad \phi^\dagger(0)\partial_t\phi(0) = 0. \tag{2} \]
We shall refer to data $\phi[0]$ which satisfy these properties as classical initial data. The purpose of this paper is to prove the following regularity result for classical wave maps. Let $\dot H^s := (\sqrt{-\Delta})^{-s} L^2(\mathbb{R}^n)$ denote the usual homogeneous Sobolev spaces.

Theorem 1. Let $n \geq 2$, and suppose that $\phi[0]$ is classical initial data which has a sufficiently small $\dot H^{n/2} \times \dot H^{n/2-1}$ norm. Then $\phi$ can be extended to a classical wave map globally in time. Furthermore, if $s$ is sufficiently close to $n/2$, we have the global bounds
\[ \|\phi[t]\|_{L^\infty_t(\dot H^s_x \times \dot H^{s-1}_x)} \lesssim \|\phi[0]\|_{\dot H^s_x \times \dot H^{s-1}_x}. \tag{3} \]
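To see why $\dot H^{n/2} \times \dot H^{n/2-1}$ is the critical space in Theorem 1, here is a short scaling computation (a standard sketch, not taken from the text; the rescaled map $\phi_\lambda$ is notation introduced only for this illustration). For $\lambda > 0$ set $\phi_\lambda(t,x) := \phi(\lambda t, \lambda x)$. Both sides of (1) pick up a factor $\lambda^2$, so $\phi_\lambda$ is again a wave map, and a change of variables in frequency gives
\begin{align*}
\|\phi_\lambda(0)\|_{\dot H^s} &= \lambda^{s - n/2}\,\|\phi(0)\|_{\dot H^s}, \\
\|\partial_t\phi_\lambda(0)\|_{\dot H^{s-1}} &= \lambda \cdot \lambda^{(s-1) - n/2}\,\|\partial_t\phi(0)\|_{\dot H^{s-1}} = \lambda^{s - n/2}\,\|\partial_t\phi(0)\|_{\dot H^{s-1}}.
\end{align*}
The exponent vanishes exactly when $s = n/2$, so the smallness hypothesis of Theorem 1 is invariant under this rescaling. For $n = 2$ the critical norm is $\dot H^1 \times L^2$, which is comparable to the square root of the energy.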
In particular, in the energy-critical two-dimensional case one has global regularity for wave maps with small energy into a sphere. From this and standard arguments based on finite speed of propagation (see e.g. [4]), we see that the problem of global regularity for general smooth data is thus reduced to demonstrating the non-concentration of energy. This non-concentration is known if one assumes some symmetry on the data and some curvature assumptions on the target manifold ([4, 30, 34, 35]), but is not known in general. For further discussion on these problems see, e.g. [15, 21, 29, 33].

A similar result, but with the Sobolev norm $\dot H^{n/2}$ replaced by a slightly smaller Besov counterpart $\dot B^{n/2,1}_2$, was obtained in [42]. Indeed, our paper shall largely be a (self-contained) combination of [42] and the prequel [40] to this paper, although there are some additional technical issues arising here which do not occur in the two papers just mentioned.

Theorem 1 was proven in [40] in the high-dimensional case $n \geq 5$. In that paper the main techniques were Littlewood–Paley decomposition, Strichartz estimates¹ and an adapted co-ordinate frame constructed by parallel transport. To cover the low dimensional cases $n = 2, 3, 4$ we shall keep the Littlewood–Paley decomposition and adapted co-ordinate frame construction (with only minimal changes from [40]), but we shall abandon the use of Strichartz estimates as the range of these estimates becomes far too restrictive to be of much use, especially when $n = 2$. Instead, we shall adapt the more intricate spaces (including $\dot X^{s,b}$ spaces) and estimates developed in [42], as substitutes for the Strichartz estimates. This will make the argument much lengthier and involved, although the overall strategy is little changed² from that in [40]. One major new difficulty is that multiplication by $L^\infty_t L^\infty_x$ functions is not well-behaved on $\dot X^{s,b}$ spaces, and so we will need to replace $L^\infty_t L^\infty_x$ with a more complicated Banach algebra.
¹ Readers familiar with the literature may be surprised that Strichartz estimates are able to handle the critical problem for wave maps. The reason is that the renormalization almost reduces the strength of the non-linearity to the level of a pure power. A more precise statement is that the renormalization ensures that in the event of high-low frequency interactions, at least one of the two derivatives in the cubic non-linearity will land on the lowest frequency term. To compare this with the pure power problem, observe that if we could somehow ensure that both derivatives in the non-linearity landed on the two lowest frequency terms, then we could differentiate the equation to obtain something like $\Box\nabla_{x,t}\phi = -\nabla_{x,t}\phi\,\phi^\dagger_{,\alpha}\phi^{,\alpha}$, which is a cubic semi-linear equation in $\nabla_{x,t}\phi$ with an additional null structure.
² Indeed, the basic renormalization argument only covers about a third of the paper, from Sect. 2 to Sect. 5. The bulk of the paper is concerned instead with constructing rather complicated function spaces as substitutes for the Strichartz spaces, and proving the relevant estimates for those spaces.

In [40] the non-linearity was placed (after localizing in frequency and switching to the adapted co-ordinate frame) in the familiar space $L^1_t \dot H^{n/2-1}_x$. When $n \geq 5$ this was relatively easy to achieve, since one had access to $L^2_t L^4_x$ and $L^2_t L^\infty_x$ Strichartz estimates. For $n = 4$ one loses the $L^2_t L^4_x$ estimate, but one could probably use $\dot X^{s,b}$ spaces and null form estimates (which would place $\phi^\dagger_{,\alpha}\phi^{,\alpha}$ in spaces such as $L^2_t L^2_x$) as a substitute, so that one could continue to place the non-linearity in such good spaces as $L^1_t \dot H^{n/2-1}_x$. When $n = 3$ one also (barely) loses the $L^2_t L^\infty_x$ estimate, although in principle this could be compensated for by the $L^p$ null form estimates in [44, 39] for certain $p < 2$. However in the energy-critical case $n = 2$, the best Strichartz estimate available is only $L^4_t L^\infty_x$, and it appears that even the best possible $L^p$ null form estimates³ are not strong enough to place the non-linearity in a space such as $L^1_t L^2_x$, even after using the adapted co-ordinate frame and introducing $\dot X^{s,b}$ type spaces. Because of this, we can only place a small portion of the non-linearity in $L^1_t L^2_x$. Following [42], we shall place the other portions of the non-linearity either in an $\dot X^{s,b}$ type space, or in $L^1_t L^2_x$ spaces corresponding to null frames. To obtain these types of control on the non-linearity, we shall use null-form estimates, as well as the decomposition, introduced in [42], of free solutions as a superposition of travelling waves, each of which is in $L^2_t L^\infty_x$ with respect to a certain null frame. This decomposition, combined with the $L^2_t L^2_x$ control coming from $\dot X^{s,b}$ estimates, is crucial to recover the $L^1_t L^2_x$ type control of the non-linearity which we need to close the argument.

The high-dimensional argument in [40] did not need to exploit the null structure in (1). However, one does not have this luxury in the low-dimensional cases, and we shall need in particular to rely on the identity
\[ 2\phi_{,\alpha}\psi^{,\alpha} = \Box(\phi\psi) - \phi\,\Box\psi - \Box\phi\,\psi \tag{4} \]
heavily (cf. [41, 42], and elsewhere). This identity is useful when $\phi$, $\psi$, $\phi\psi$ are relatively close to the light cone in frequency space, although when one is far away from the light cone this identity can become counter-productive.

It is quite possible that Theorem 1 can be extended to other manifolds than the sphere⁴, and to scattering and well-posedness results. We refer the reader to [40] as we have nothing of interest to add to that discussion here (other than a large increase in complexity). Indeed, we would strongly recommend to anyone interested in these problems for small data that they first study the high-dimensional case before attempting the low-dimensional one. (For large data one of course has blowup in dimensions greater than two due to the supercritical nature of the energy conservation law; see [31].)

³ An examination of the known counter-examples suggests that it may just be possible to place $\phi_{,\alpha}\psi^{,\alpha}$ in $L^{4/3}_t L^2_x$, which in principle is just barely enough to obtain $L^1_t L^2_x$ control on the non-linearity thanks to the $L^4_t L^\infty_x$ Strichartz estimate. However this would require (among other things) a reworking of the endpoint argument of [39] and would therefore not be a simplification to this paper.
⁴ In dimensions $n \geq 5$ this has recently been achieved [19] in the case when the target manifold is boundedly parallelizable.

2. Notation and Preliminary Reduction

We shall restrict our attention to the low dimensional case $n = 2, 3, 4$ since the high dimensional case was already treated in [40]. We shall need some small exponents
\[ 0 < \delta_0 \ll \delta_1 \ll \delta_2 \ll \delta_3 \ll \delta_4 \ll 1. \]
The exact choice of these exponents is not important, but for concreteness we shall choose them as follows. We first choose $0 < \delta_4 \ll 1$ to be a small absolute constant depending on $n$ ($\delta_4 = 1/100n$ shall do), and then set $\delta_i := \delta_{i+1}^{10}$ for $i = 3, 2, 1, 0$. We shall implicitly be inserting the disclaimer “assuming $\delta_4$ is sufficiently small depending on $n$” in all the arguments which follow. Thus any exponential term involving $\delta_4$ shall dominate a corresponding term involving $\delta_3$, and so forth down to $\delta_0$, which is dominated by everything.

The exponents $\delta_i$ are only of technical importance and the reader should not take them too seriously. Broadly speaking, we shall use the smallest constant $\delta_0$ to control the flexibility of frequency envelopes, the next smallest constant $\delta_1$ to measure the exponential gains in our final iteration spaces $S_k$, $N_k$, and the largest constant $\delta_4$ to measure rather large exponential gains coming from the basic linear and bilinear estimates. The intermediate exponents $\delta_2$, $\delta_3$ are only used for the delicate trilinear estimate (31), and arise because the proof of (31) is essentially an interpolation between several different types of arguments.

Let $j, k$ be integers and $i = 0, 1, 2, 3, 4$. We use $\chi^{(i)}_{j \leq k}$ or $\chi^{(i)}_{k \geq j}$ to denote a quantity of the form $\min(1, 2^{-\delta(j-k)})$, where $\delta > C^{-1}\delta_i^2$ for some absolute constant $C > 0$ depending only on $n$. We also use $\chi^{(i)}_{j=k}$ to denote a quantity of the form $2^{-\delta|j-k|}$ with the same assumptions on $\delta$. Thus $\chi^{(i)}_{j \leq k}$ is small unless $j \leq k + O(1)$, and $\chi^{(i)}_{j=k}$ is small unless $j = k + O(1)$. We suggest the reader ignore the $i$ index and think of the $\chi^{(i)}$ as characteristic functions, e.g. $\chi^{(i)}_{j \leq k}$ is morally a cutoff to the region $j \leq k$.

Similarly, we use $\chi^{-(i)}_{j \leq k} = \chi^{-(i)}_{k \geq j}$ to denote a quantity of the form $\max(1, 2^{\delta(j-k)})$, where $\delta < C\delta_i^{1/2}$ for some absolute constant $C > 0$ depending only on $n$, and also use $\chi^{-(i)}_{j=k}$ to denote a quantity of the form $2^{\delta|j-k|}$ with the same assumptions on $\delta$. The $\chi^{(i)}$ thus represent various exponential gains in our estimates, while $\chi^{-(i)}$ represent various exponential losses. Note that a $\chi^{(i)}$ gain will dominate a corresponding $\chi^{-(j)}$ loss whenever $i > j$.

As usual, we use $A \lesssim B$ or $A = O(B)$ to denote the estimate $A \leq CB$, where $C$ is some quantity depending only on $n$, $d$, and the $\delta_i$. All sums will be over the integers $\mathbb{Z}$ unless otherwise specified.

We fix $0 < \varepsilon \ll 1$ to be a small constant depending only on $n$, $d$, and the $\delta_i$ ($\varepsilon := \delta_0^{nd}$ will suffice⁵). We shall implicitly insert the disclaimer “assuming $\varepsilon$ is sufficiently small depending on $n$, $d$ and the $\delta_i$” in all the arguments which follow. Eventually we shall assume that the initial data has a $\dot H^{n/2} \times \dot H^{n/2-1}$ norm of $\varepsilon^2$.

⁵ Of course, by our above construction this value of $\varepsilon$ is absurdly small, as we have wildly exaggerated the separation of scales in exponents that we shall actually need. One can improve the value of $\varepsilon$ substantially, but we shall not attempt to do so here.

We shall parameterize spacetime $\mathbb{R}^{1+n}$ in the standard Euclidean frame $\{(t, x) : t \in \mathbb{R}, x \in \mathbb{R}^n\}$ with the Euclidean inner product $(t, x) \cdot (t', x') := tt' + x \cdot x'$; we will not use the Minkowski metric $\eta$ much (except in Case 4(e) of Sect. 18). In the proof of our estimates in the second half of the paper we shall also introduce null frames for $\mathbb{R}^{1+n}$, but we shall not need them for quite a while.

We fix $T > 0$ to be a given time. It will be important that our implicit constants do not depend on $T$. For the first half of this paper, which is concerned with the iteration scheme and the renormalization, our functions shall only be supported on the slab $[-T, T] \times \mathbb{R}^n$, but in the second half, which is concerned with the function spaces and the estimates,
we shall mainly work on all of $\mathbb{R}^{1+n}$ (as one then gains access to the spacetime Fourier transform) and then apply standard restriction arguments to return to $[-T, T] \times \mathbb{R}^n$.

We define the Lebesgue spaces $L^q_t L^r_x$ by the norm
\[ \|\phi\|_{L^q_t L^r_x} := \Big( \int \Big( \int |\phi(t, x)|^r\, dx \Big)^{q/r} dt \Big)^{1/q} \]
with the usual modifications when $r = \infty$ or $q = \infty$.

If $\phi(t, x)$ is a function on $[-T, T] \times \mathbb{R}^n$ or $\mathbb{R}^{1+n}$, we define the spatial Fourier transform $\hat\phi(t, \xi)$ by
\[ \hat\phi(t, \xi) := \int_{\mathbb{R}^n} e^{-2\pi i x \cdot \xi} \phi(t, x)\, dx \]
with the inverse transform given by
\[ \check F(t, x) := \int_{\mathbb{R}^n} e^{2\pi i x \cdot \xi} F(t, \xi)\, d\xi. \]
The spatial Fourier transform is distinct from the spacetime Fourier transform $\mathcal{F}\phi(\tau, \xi)$, which we shall need in the second half of the paper. We define the spatial Fourier support or $\xi$-Fourier support of $\phi$ to be the set $\{\xi : \hat\phi(t, \xi) \neq 0 \text{ for some } t\}$.

We shall write $D_0$ for $|\xi|$, so that $D_0$ measures the strength of the operator $\nabla_x$. Thus, for instance, the set $\{D_0 \sim 2^k\}$ denotes the frequency region $\{\xi : |\xi| \sim 2^k\}$.

We now set up some Littlewood–Paley operators, which shall play a central role in our arguments. Fix $m_0(\xi)$ to be a non-negative radial bump function supported on $D_0 \leq 2$ which equals 1 on the ball $D_0 \leq 1$. For each integer $k$, define the operators $P_{\leq k} = P_{<k+1}$ by the formula
\[ \widehat{P_{\leq k}\phi}(t, \xi) := m_0(2^{-k}\xi)\,\hat\phi(t, \xi) \]
and the projection operators $P_k$ to the frequency annulus $D_0 \sim 2^k$ by the formula
\[ P_k := P_{\leq k} - P_{<k}; \qquad P_{\geq k} := 1 - P_{<k}. \]
Similarly define $P_{k_1 < \cdot \leq k_2}$, $P_{>k}$, etc. Note that these operators all have kernels which are measures of finite mass, and so these operators are bounded on all translation-invariant Banach spaces. Also observe that these operators commute with differentiation operators (or other Fourier multipliers), as well as any time cutoffs. We shall write $\phi_k$ for $P_k\phi$. Similarly define $\phi_{\leq k}$, $\phi_{\geq k}$, $\phi_{k_1 < \cdot \leq k_2}$, etc.
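The basic bookkeeping behind these operators can be made explicit. The following sketch assumes (this is an assumption of the illustration, in line with the usual convention) that $P_{\leq k}$ is the Fourier multiplier with symbol $m_0(2^{-k}\xi)$ and that $P_{<k} = P_{\leq k-1}$ for integer $k$. Then $P_k$ has symbol
\[ m_0(2^{-k}\xi) - m_0(2^{-(k-1)}\xi), \]
which vanishes for $|\xi| \leq 2^{k-1}$ (where both terms equal 1) and for $|\xi| > 2^{k+1}$ (where both terms equal 0), so $P_k$ is supported in the annulus $2^{k-1} \leq |\xi| \leq 2^{k+1}$. The sums telescope:
\begin{align*}
\sum_{k_1 < j \leq k_2} P_j &= \sum_{k_1 < j \leq k_2} \big(P_{\leq j} - P_{\leq j-1}\big) = P_{\leq k_2} - P_{\leq k_1} = P_{k_1 < \cdot \leq k_2}.
\end{align*}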
We shall often exploit the Littlewood–Paley product trichotomy (also known as the paraproduct decomposition), which asserts that if $\phi$ and $\psi$ have Fourier support on $D_0 \sim 2^{k_1}$ and $D_0 \sim 2^{k_2}$ respectively, then $P_k(\phi\psi)$ vanishes unless one of the following three statements⁶ is true:

– (Low-high interaction) $k = k_2 + O(1)$; $k_1 \leq k_2 + O(1)$.
– (High-low interaction) $k = k_1 + O(1)$; $k_2 \leq k_1 + O(1)$.
– (High-high interaction) $k \leq k_2 + O(1)$; $k_1 = k_2 + O(1)$.

The same is true if $\phi\psi$ is replaced by similar expressions such as $\phi^{,\alpha}\psi_{,\alpha}$ or $L(\phi, \psi)$ (see below). In the wave map problem with $n \geq 2$ the low-high (and by symmetry, the high-low) interactions are dominant; the high-high interactions are weaker because in two and higher dimensions it is unlikely that two high frequencies will be so opposed as to cancel and form a low frequency⁷. This qualitative statement will be made concrete by the presence of factors such as $2^{-|k-k_2|/10}$ in several of our estimates. This fact implies that the “high to low” frequency cascade is heuristically negligible⁸, and so we only need to concentrate on stemming a possible “low to high” frequency cascade coming from a divergence of the low-high and high-low interactions. Our main weapon in achieving this is a “gauge transformation” or “renormalization”, which effectively moves a derivative from the high frequency term to the low one. We remark that in the one-dimensional case the “high to low” cascade is not negligible and is not damped by the renormalization. Indeed, this cascade causes ill-posedness at the critical regularity [37].

⁶ One could also distinguish the “low-low interaction” case $k = k_2 + O(1)$; $k_1 = k_2 + O(1)$ from the three listed above. This is sometimes useful in Schrödinger or KdV contexts as it isolates the parallel interaction case, but is not particularly worthwhile for wave equations because there is essentially no change in the geometry of the light cone in the transition between this case and the other three.
⁷ Another way to see this is to consider an expression such as $P_0(\phi_{k,\alpha}\psi_k^{,\alpha})$ for $k > 0$. If $\phi, \psi \in \dot H^{n/2}_x$, then $\phi_{k,\alpha}$ and $\psi_{k,\alpha}$ are in $L^2_x$ with a norm of $O(2^{-(\frac{n}{2}-1)k})$. For $n > 2$ this gives a decay in $k$. For $n = 2$ one can exploit the null form structure via (4) to obtain a similar heuristic decay. For $n = 1$ no decay is available, even with the null structure.
⁸ More precisely, if we only retained the high-high interaction component of the non-linearity and suppressed the other two components, then one could obtain global regularity, well-posedness, and scattering for small $\dot H^{n/2}$ data for $n \geq 2$ directly from the iteration arguments in [42] (or in [41] for $n \geq 4$). Indeed, for $n \geq 5$ one could even do this just by iterating in Strichartz spaces as one can see by inspecting the arguments in [40]. Note that once the high-high interaction is neglected, the wave map equation essentially becomes an upper-triangular system in the sense that low frequencies affect the high, but not vice versa. In principle, this allows us to solve the system recursively, and our construction of the gauge $U_{\leq k}$ shall reflect this philosophy.

Note that if $\phi$ is a smooth function which is equal to a constant $e$ outside of a compact set, then the $\phi_k$ are rapidly decreasing in space and we have the Littlewood–Paley decomposition $\phi = e + \sum_k \phi_k$. In particular, we have
\[ \phi_{,\alpha} = \sum_k \phi_{k,\alpha} \]
for all indices $\alpha$.

We shall rely frequently on Bernstein’s inequality, which asserts (among other things) that
\[ \|f\|_{L^\infty_x} \lesssim 2^{nk/2}\|f\|_{L^2_x} \tag{5} \]
whenever $f$ has Fourier support on $D_0 \lesssim 2^k$, and (by duality) that
\[ \|f\|_{L^2_x} \lesssim 2^{nk/2}\|f\|_{L^1_x} \tag{6} \]
under the same assumption. More generally, we have
\[ \|f\|_{L^\infty_x} \lesssim |R|^{1/2}\|f\|_{L^2_x} \tag{7} \]
whenever $f$ has Fourier support on a box $R$. When estimating a product expression such as $P_k(\phi_{k_1}\psi_{k_2})$ we shall usually apply Bernstein’s inequality at the lowest frequency $\min(k, k_1, k_2)$, in order to minimize the factor $2^{nk/2}$ which appears.

We shall now introduce a convenient notation for describing multi-linear expressions of product type. More precisely, if $\phi^{(1)}(t, x), \ldots, \phi^{(s)}(t, x)$ are scalar functions, we use $L(\phi^{(1)}, \ldots, \phi^{(s)})(t, x)$ to denote any multi-linear expression of the form
\[ L(\phi^{(1)}, \ldots, \phi^{(s)})(t, x) := \int K(y_1, \ldots, y_s)\,\phi^{(1)}(t, x - y_1) \ldots \phi^{(s)}(t, x - y_s)\, dy_1 \ldots dy_s, \]
where the kernel $K$ is a measure with bounded mass (the exact value of $K$ might change from line to line). The kernel of $L$ shall never depend on the index $\alpha$. Also we extend the notation to the case when $\phi^{(1)}, \ldots, \phi^{(s)}$ take values as $d$-dimensional vectors or $d \times d$ matrices, by making $K$ into an appropriate tensor. For instance, if $\phi^{(1)} = (\phi^{(1),i})_{i=1}^d$ is a $d$-dimensional vector field and $\phi^{(2)} = (\phi^{(2),jk})_{j,k=1}^d$ is a $d \times d$ matrix field, we use $L(\phi^{(1)}, \phi^{(2)})(t, x)$ to denote any function of the form
\[ L(\phi^{(1)}, \phi^{(2)})(t, x) = \sum_{i,j,k=1}^d \int K_{ijk}(y_1, y_2)\,\phi^{(1),i}(t, x - y_1)\,\phi^{(2),jk}(t, x - y_2)\, dy_1\, dy_2, \]
where for each $i, j, k$ the kernel $K_{ijk}$ (which may be itself scalar, vector, or matrix valued) has a bounded mass.

This $L$ notation will turn out to be extremely handy for suppressing matrix coefficients, Littlewood–Paley multipliers, commutator expressions, null forms, etc., whenever these structures are not being exploited (in a manner similar to the big-$O()$ notation, which we shall also use). The expression $L(\phi^{(1)}, \ldots, \phi^{(s)})$ should be thought of as a variant of the expression $O(\phi^{(1)} \ldots \phi^{(s)})$, and in general $L$ can be treated as if it were just the pointwise product operator. For instance, the Fourier support of $L(\phi^{(1)}, \ldots, \phi^{(s)})$ is contained in the set-theoretic sum of the Fourier supports of the individual $\phi^{(i)}$. Also, from Minkowski’s inequality we immediately have

Lemma 1. Let $X_1, \ldots, X_s, X$ be spatially translation-invariant⁹ Banach spaces such that we have the product estimate
\[ \|\phi^{(1)} \ldots \phi^{(s)}\|_X \leq A\,\|\phi^{(1)}\|_{X_1} \ldots \|\phi^{(s)}\|_{X_s} \]
for all scalar-valued $\phi^{(i)} \in X_i$ and some constant $A > 0$. Then we have
\[ \|L(\phi^{(1)}, \ldots, \phi^{(s)})\|_X \lesssim (Cd)^{Cs} A\,\|\phi^{(1)}\|_{X_1} \ldots \|\phi^{(s)}\|_{X_s} \]
for all $\phi^{(i)} \in X_i$ which are scalars, $d$-dimensional vectors, or $d \times d$ matrices. Similarly, if we have the null form estimate
\[ \|\phi^{(1)} \ldots \phi^{(s-2)}\phi^{(s-1)}_{,\alpha}\phi^{(s),\alpha}\|_X \leq A\,\|\phi^{(1)}\|_{X_1} \ldots \|\phi^{(s)}\|_{X_s} \]
for all scalar $\phi^{(i)} \in X_i$, then
\[ \|L(\phi^{(1)}, \ldots, \phi^{(s-2)}, \phi^{(s-1)}_{,\alpha}, \phi^{(s),\alpha})\|_X \lesssim (Cd)^{Cs} A\,\|\phi^{(1)}\|_{X_1} \ldots \|\phi^{(s)}\|_{X_s} \]
for all $\phi^{(i)} \in X_i$ as previously. Finally, if we have the product estimate
\[ \|\phi\psi\|_X \leq A\,\|\phi\|_{X_1}\|\psi\|_{X_2} \]
for all scalar $\phi, \psi$, then we have
\[ \|L(\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)})\|_X \lesssim C_d A\,\|\phi^{(1)}\|_{X_1}\|L(\phi^{(2)}, \ldots, \phi^{(s)})\|_{X_s} \]
for all $\phi^{(i)}$ as previously, where the right-hand side $L$ depends on the left-hand side $L$ and also on the functions $\phi^{(1)}, \ldots, \phi^{(s)}$.

⁹ By “$X$ is spatially translation-invariant” we mean that $f(t, x - x_0)$ has exactly the same $X$ norm as $f(t, x)$ for all $f \in X$ and $x_0 \in \mathbb{R}^n$.

In our applications (except in Step 2(c) of Sect. 5) $s$ will usually just be 2 or 3, so the $(Cd)^{Cs}$ term can be entirely neglected.

The $L$ notation is invariant under permutations of the functions (with the kernel $K$ also being permuted, of course). It also interacts well with Littlewood–Paley operators (which always have integrable kernels):
\[ L(P_k\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)}) = L(\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)}); \qquad P_k L(\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)}) = L(\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)}). \tag{8} \]
Similarly for $P_{\leq k}$, etc. A similar statement holds for derivatives of Littlewood–Paley operators:
\[ L(\nabla_x P_{\leq k}\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)}) = 2^k L(\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)}); \qquad \nabla_x P_{\leq k} L(\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)}) = 2^k L(\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)}). \tag{9} \]
In particular, we have
\[ L(\nabla_x\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)}) = 2^{k_1} L(\phi^{(1)}, \phi^{(2)}, \ldots, \phi^{(s)}) \tag{10} \]
whenever $\phi^{(1)}$ has Fourier support on $D_0 \lesssim 2^{k_1}$, and similarly for permutations.

A typical place where the $L$ notation is useful is in commutator expressions. Specifically, we have:

Lemma 2 (Leibniz rule for $P_k$). We have the commutator identity
\[ P_k(fg) = f P_k g + L(\nabla_x f, 2^{-k} g). \tag{11} \]
Proof. We may rescale $k = 0$. By the Fundamental Theorem of Calculus we have
\begin{align*}
(P_0(fg) - f P_0 g)(t, x) &= \int_{\mathbb{R}^n} \check m(y)\,\big(f(t, x - y) - f(t, x)\big)\,g(t, x - y)\, dy \\
&= -\int_0^1 \int_{\mathbb{R}^n} \check m(y)\, y \cdot \nabla_x f(t, x - sy)\, g(t, x - y)\, dy\, ds.
\end{align*}
The claim follows from the rapid decay of $\check m$.
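As an illustration of how (11) can be used, here is a sketch that combines it with Lemma 1 and Hölder's inequality. The choice of spaces $X = L^\infty_t L^2_x$, $X_1 = L^\infty_t L^\infty_x$, $X_2 = L^\infty_t L^2_x$ and the frequency parameter $j$ are assumptions made for this example only, not choices taken from the text:
\begin{align*}
\|P_k(fg) - f P_k g\|_{L^\infty_t L^2_x} &= \|L(\nabla_x f, 2^{-k} g)\|_{L^\infty_t L^2_x} \\
&\lesssim 2^{-k}\,\|\nabla_x f\|_{L^\infty_t L^\infty_x}\,\|g\|_{L^\infty_t L^2_x}.
\end{align*}
If in addition $f$ has Fourier support in $D_0 \lesssim 2^j$ for some $j \leq k$, then $\|\nabla_x f\|_{L^\infty_t L^\infty_x} \lesssim 2^j \|f\|_{L^\infty_t L^\infty_x}$ (compare (10)), and the commutator gains a factor $2^{-(k-j)}$ over the trivial bound for the product $fg$.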
This lemma is most useful when $g$ has frequency $\sim 2^k$ and $f$ has frequency $\ll 2^k$. In this case $P_k(fg) - f P_k g$ effectively shifts a derivative from the high-frequency function $g$ to the low-frequency function $f$. This shift will generally ensure that all such commutator terms will be easily estimated.

We now recall a definition, essentially from [40]:

Definition 1. A frequency envelope is a sequence $c = \{c_k\}_{k \in \mathbb{Z}}$ of positive reals such that we have the $l^2$ bound
\[ \|c\|_{l^2} \lesssim \varepsilon \tag{12} \]
and the local constancy condition
\[ \chi^{(0)}_{k=k'}\, c_{k'} \lesssim c_k \lesssim \chi^{-(0)}_{k=k'}\, c_{k'} \tag{13} \]
for all $k, k' \in \mathbb{Z}$. In particular we have $c_k \sim c_{k'}$ whenever $k = k' + O(1)$. If $c$ is a frequency envelope and $(f, g)$ is a pair of functions on $\mathbb{R}^n$, we say that $(f, g)$ lies underneath the envelope $c$ if one has
\[ \|P_k f\|_{\dot H^{n/2}} + \|P_k g\|_{\dot H^{n/2-1}} \leq c_k \tag{14} \]
for all $k \in \mathbb{Z}$.

One should think of (13) as asserting that $c_k$ is effectively constant. In practice, the factors of $\chi^{\pm(0)}_{k=k'}$ shall always be dominated by factors such as $\chi^{(i)}_{k=k'}$ for $i > 0$. We shall not use the full strength of (12) often, and shall usually rely instead on the weaker estimate
\[ c_k \lesssim \varepsilon \tag{15} \]
for all integers $k$. However, there will be occasions (especially in the construction of the co-ordinate frame $U$) in which we will have to fully exploit (12). Because of this, our argument will not apply to any Besov space beyond $\dot H^{n/2}$.

To prove Theorem 1, we shall first prove

Theorem 2. Let $T_0 > 0$ and $c$ be a frequency envelope, and suppose that $\phi$ is a classical wave map on $[-T_0, T_0] \times \mathbb{R}^n$ such that $\phi[0]$ lies underneath $\varepsilon c$. Then $\phi[t]$ lies underneath $Cc$ for all $t \in [-T_0, T_0]$, where $C$ is an absolute constant depending only on $n$, $d$.
We now show how Theorem 1 follows from Theorem 2. From the regularity theory in [20, 42] it suffices to show that (3) holds in $[-T_0, T_0]$ for all classical wave maps $\phi$ on $[-T_0, T_0] \times \mathbb{R}^n$ whose initial data $\phi[0]$ has $\dot H^{n/2} \times \dot H^{n/2-1}$ norm $\lesssim \varepsilon^2$. But if $\phi$ is such a wave map, we can define the envelope $c$ by
\[ c_k := \varepsilon^{-1} \sum_{k' \in \mathbb{Z}} 2^{-\delta_0|k - k'|}\,\|\phi_{k'}[0]\|_{\dot H^{n/2} \times \dot H^{n/2-1}}. \]
It is easy to verify (12), (13), (14), so that $\phi[0]$ lies underneath $\varepsilon c$. Also, from Young’s inequality on $l^2$ we have
\[ \Big( \sum_k 2^{2\sigma k} c_k^2 \Big)^{1/2} \sim \|\phi[0]\|_{\dot H^{n/2+\sigma} \times \dot H^{n/2-1+\sigma}} \]
for all $|\sigma| \leq \delta_0$. From Theorem 2 we thus have that $\phi[t]$ lies underneath $Cc$ for $t \in [-T_0, T_0]$, which implies again from (13) that
\[ \|\phi[t]\|_{\dot H^{n/2+\sigma} \times \dot H^{n/2-1+\sigma}} \lesssim \Big( \sum_k 2^{2\sigma k} c_k^2 \Big)^{1/2}. \]
This implies (3), and global regularity then follows.

Henceforth the envelope $c$ will be fixed. The rest of the paper will be devoted to the proof of Theorem 2. We remark that if the $l^2$ control in (12) was strengthened to $l^1$ then the proof of this theorem is essentially contained in [42], and is based on a pure iteration argument in a rather intricate Banach space. Our argument will also have an iterative flavor and uses similar spaces, but also requires the renormalizations in [40].
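The verification of (12), (13), (14) for the envelope constructed above is short enough to spell out. In the following sketch the abbreviation $a_j := \|\phi_j[0]\|_{\dot H^{n/2} \times \dot H^{n/2-1}}$ is introduced only for this illustration, so that $c_k = \varepsilon^{-1}\sum_j 2^{-\delta_0|k-j|} a_j$, and the almost-orthogonality bound $\|a\|_{l^2} \lesssim \|\phi[0]\|_{\dot H^{n/2} \times \dot H^{n/2-1}}$ is assumed as a standard fact:
\begin{align*}
\text{(14):}\quad & c_k \geq \varepsilon^{-1} a_k \quad\Longrightarrow\quad a_k \leq \varepsilon c_k, \\
\text{(13):}\quad & |k - j| \geq |k' - j| - |k - k'| \quad\Longrightarrow\quad c_k \leq 2^{\delta_0|k - k'|} c_{k'}, \quad\text{and by symmetry } c_k \geq 2^{-\delta_0|k - k'|} c_{k'}, \\
\text{(12):}\quad & \|c\|_{l^2} \leq \varepsilon^{-1} \Big( \sum_j 2^{-\delta_0|j|} \Big) \|a\|_{l^2} \lesssim \varepsilon^{-1}\,\|\phi[0]\|_{\dot H^{n/2} \times \dot H^{n/2-1}} \lesssim \varepsilon.
\end{align*}
The first line keeps only the term $j = k$ in the sum, the second uses the triangle inequality inside the exponent, and the third is Young's inequality for the convolution of the $l^1$ sequence $2^{-\delta_0|j|}$ with the $l^2$ sequence $a$; the implicit constant depends on $\delta_0$, which the conventions of this section allow.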
3. The “Iteration” Space, and Key Estimates

In order to prove Theorem 2 we shall need some Banach spaces $S(c)$, $S_k$, $N_k$ to “iterate” in. The purpose of this section is to describe the important properties and estimates of these spaces; their exact construction and the verification of these properties will be deferred to the second half of the paper (Sects. 6–18) as they are rather technical and lengthy.

The space $S(c)$ shall contain the wave map $\phi$ and the gauge transform $U_{\leq k}$. The spaces $S_k$ shall contain the Littlewood–Paley pieces $\phi_k$ of $\phi$, as well as the renormalizations $w_k := U_{\leq k-10}\phi_k$ of these pieces. Finally, $N_k$ contains the non-linearity $\Box\phi_k$ and similar expressions (basically any algebraic combination of $U$, $\phi$ with two derivatives and a frequency of $\sim 2^k$ should be placed inside $N_k$).

Before we describe the properties of these spaces in detail, we first give some motivation. The proof of Theorem 2 shall be based on the following (informal) scheme:

– (Bootstrap hypothesis) By a continuity argument, we shall assume a priori that the $\phi_k$ are already in $S_k$, and then bootstrap this to better control on $\phi_k$ in $S_k$.
– (Control of $\phi$) Since $\phi$ lies on the sphere, it will be in $L^\infty_t L^\infty_x$. Combining this with the assumption $\phi_k \in S_k$, we shall obtain $\phi \in S(c)$.
– (Construction of gauge) We then construct for each frequency $2^k$ a gauge transform $U_{\leq k}$, which turns out to be a polynomial in $\phi$, $\phi_k$. Using the above control on $\phi_k$, $\phi$, we shall show¹⁰ that $U_{\leq k} \in S(c)$ for all $k$.
– (Control of renormalized non-linearity) We define $w_k := U_{\leq k-10}\phi_k$, and use the above estimates to show that $\Box w_k \in N_k$.
– (Energy estimates) We also have $w_k[0] \in \dot H^{n/2} \times \dot H^{n/2-1}$. Using this and the previous step, we shall place $w_k \in S_k$.
– (Inverting the gauge) We then show that $U_{\leq k-10}$ is invertible in $S(c)$, and use this and the previous step to show that $\phi_k = U_{\leq k-10}^{-1} w_k$ is in $S_k$, thus closing the bootstrap circle.
– (Epilogue) Finally, we show that control of $\phi_k \in S_k$ implies that $\phi$ lies underneath the envelope $Cc$.

¹⁰ Actually, we shall first prove that $U_{\leq k} \in S(Cc)$ for some constant $C$, but the spaces $S(Cc)$ and $S(c)$ will end up being equivalent.

To make this scheme work we need several estimates connecting $S_k$, $N_k$, and $S(c)$, which we shall describe in detail in Theorem 3. One of the major tasks in proving Theorem 2 is thus to select spaces $S(c)$, $S_k$, $N_k$ which obey all of these estimates.

In the high-dimensional case [40], $N_k$ was just the energy method space $L^1_t \dot H^{n/2-1}_x$ (localized to frequency $2^k$), and the spaces $S_k$ were Strichartz spaces. The analogue of the $S(c)$ norm in [40] was essentially given by
\[ \|\phi\|_{S(c)} \sim \|\phi\|_{L^\infty_t L^\infty_x} + \sup_k c_k^{-1}\|\phi_k\|_{S_k}. \]
The properties in Theorem 3 turn out to be easily verifiable (after some minor modifications) from standard Strichartz estimates and Hölder’s inequality as long as $n \geq 5$.

In the low-dimensional case one cannot place the non-linearity just in $L^1_t \dot H^{n/2-1}_x$, even after renormalization. Thus we must enlarge the space $N_k$ somewhat, incorporating not only $L^1_t \dot H^{n/2-1}_x$ but also some $\dot X^{s,b}$ type norms, as well as some more complicated null frame spaces of Tataru [42]. Because we wish to prove energy estimates, this enlargement of $N_k$ also causes $S_k$ to become more complicated; it shall also involve $\dot X^{s,b}$ norms and null frame spaces. This also forces the space $S(c)$ to become a more sophisticated Banach algebra – something of roughly the same strength as $\dot X^{n/2,1/2}$ but still controlling $L^\infty_t L^\infty_x$ and closed under multiplication. Fortunately, examples of such algebras $S(c)$ exist (see [18, 41, 42, 38]). The space we use shall be loosely based on that in [42].

For technical reasons (e.g. wanting to make $S_k$ closed under multiplication) we will not localize the spaces $S_k$ very strongly to the frequency $2^k$, and allow functions in $S_k$ to wander away from the annulus $|\xi| \sim 2^k$ with only a mild penalty; see (87). However, most of the functions which we will place in $S_k$ will indeed be localized to frequency $2^k$. We will also endow the spaces $N_k$ with similar characteristics (but with a far harsher penalty for leaving the annulus; see Definition 8).

We now summarize all the properties of the spaces $S(c)$, $S_k$, $N_k$ that we shall need, and discuss these properties in turn.

Theorem 3. Let $T \geq 0$ and $c$ be a frequency envelope. Then there exist Banach spaces $S(c) = S(c)([-T, T] \times \mathbb{R}^n)$, $S_k = S_k([-T, T] \times \mathbb{R}^n)$, and $N_k = N_k([-T, T] \times \mathbb{R}^n)$ for $k \in \mathbb{Z}$ of functions in $[-T, T] \times \mathbb{R}^n$, which satisfy the following properties for all integers $k, k_1, k_2, k_3$:
– (Quasi-continuity) Suppose that $\phi$ is a classical wave map on $[-T_0, T_0] \times \mathbb{R}^n$. Then the function
\[ a(T) := \max\Big(1, \sup_k c_k^{-1}\,\big\|\phi_k|_{[-T,T] \times \mathbb{R}^n}\big\|_{S_k([-T,T] \times \mathbb{R}^n)}\Big) \tag{16} \]
defined on $[0, T_0]$ obeys the quasi-continuity property
\[ \limsup_{T' \to T} a(T') \lesssim \liminf_{T' \to T} a(T') \tag{17} \]
for all $0 \leq T \leq T_0$. Also, $a(T)$ is monotone non-decreasing in $T$.
– (Invariance properties) The spaces $S(c)$, $S_k$, $N_k$ are invariant under spatial translations. For any $j \in \mathbb{Z}$, the scaling¹¹ $\phi(t, x) \mapsto \phi(2^j t, 2^j x)$, $T \mapsto T/2^j$ maps $S(\{c_k\})$ to $S(\{c_{k-j}\})$ and $S_k$ to $S_{k+j}$. Similarly, the scaling $F(t, x) \mapsto 2^{2j} F(2^j t, 2^j x)$, $T \mapsto T/2^j$ maps $N_k$ to $N_{k+j}$.
– ($S(c)$ is an algebra) The space $S(c)$ is a Banach algebra with respect to pointwise multiplication, i.e. $S(c)$ contains the identity 1 and obeys the estimate¹²
\[ \|L(\phi, \psi)\|_{S(c)} \lesssim \|\phi\|_{S(c)}\|\psi\|_{S(c)} \tag{18} \]
for all $\phi, \psi \in S(c)$. Also, we have
\[ \|\phi\|_{L^\infty_t L^\infty_x} \lesssim \|\phi\|_{S(c)} \tag{19} \]
for all $\phi \in S(c)$.
– (Frequency-localized algebra property) If $\phi, \psi \in S_k$, then
\[ \|L(\phi, \psi)\|_{S_k} \lesssim \|\phi\|_{S_k}\|\psi\|_{S_k}. \tag{20} \]
Similarly, if $\phi \in S_k$, $\psi \in S(c)$ and $\psi$ has Fourier support in $D_0 \ll 2^k$, then
\[ \|L(\phi, \psi)\|_{S_k} \lesssim \|\phi\|_{S_k}\|\psi\|_{S(c)}. \tag{21} \]
Also, $S_k$ controls $L^\infty$, in the sense that
\[ \|\phi\|_{L^\infty_t L^\infty_x} + 2^{-k}\|\nabla_{x,t}\phi\|_{L^\infty_t L^\infty_x} \lesssim \|\phi\|_{S_k} \tag{22} \]
whenever $\phi \in S_k$.
– ($S(c)$ insensitive to $c$) For all $\phi \in S(c)$ and $C > 0$ we have
\[ \|\phi\|_{S(Cc)} \sim \|\phi\|_{S(c)} \tag{23} \]
with the implicit constants depending at most polynomially on $C$.
– ($S(c)$ is built up from $S_k$) Let $\phi$ be a smooth function on $[-T, T] \times \mathbb{R}^n$ which is rapidly decreasing in the spatial variable. Suppose we have a decomposition $\phi = \sum_k \phi^{(k)}$, where each $\phi^{(k)}$ is in $S_k$. Then we have
\[ \|\phi\|_{S(c)} \lesssim \|\phi\|_{L^\infty_t L^\infty_x} + \sup_k c_k^{-1}\|\phi^{(k)}\|_{S_k}. \tag{24} \]

¹¹ In other words, the spaces $S(c)$, $S_k$ are dimensionless, while $N_k$ has the units $\mathrm{length}^{-2}$.
¹² We shall consistently use $\phi$, $\psi$ to denote generic functions in the $S$ family of spaces, and $F$ to denote generic functions in the $N$ family of spaces.
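– ($N_k$ contains $L^1_t \dot H^{n/2-1}_x$) Let $F$ be an $L^1_t L^2_x$ function on $[-T,T] \times \mathbb{R}^n$ which has Fourier support on the region $D_0 \sim 2^k$ for some integer $k$. Then $F$ is in $N_k$ and
\[ \|F\|_{N_k} \lesssim \|F\|_{L^1_t \dot H^{n/2-1}_x} \sim 2^{(\frac{n}{2}-1)k} \|F\|_{L^1_t L^2_x}. \tag{25} \]
– (Adjacent $N_k$ are equivalent) We have the compatibility property
\[ \|F\|_{N_{k_1}} \sim \|F\|_{N_{k_2}} \tag{26} \]
whenever $F \in N_{k_2}$ and $k_1 = k_2 + O(1)$.
– (Energy estimate) For any Schwartz function $\phi$ on $[-T,T] \times \mathbb{R}^n$ with Fourier support in $D_0 \sim 2^k$, we have
\[ \|\phi\|_{S_k} \lesssim \|\Box\phi\|_{N_k} + \|\phi[0]\|_{\dot H^{n/2} \times \dot H^{n/2-1}} \sim \|\Box\phi\|_{N_k} + 2^{\frac{n}{2}k} \|\phi(0)\|_{L^2} + 2^{(\frac{n}{2}-1)k} \|\partial_t\phi(0)\|_{L^2}. \tag{27} \]
– (Product estimates) We have
\[ \| P_k L(\phi, F) \|_{N_k} \lesssim \chi^{(1)}_{k \geq k_2} \|\phi\|_{S(c)} \|F\|_{N_{k_2}} \tag{28} \]
for all $\phi \in S(c)$ and $F \in N_{k_2}$. We also have the variant
\[ \| P_k L(\phi, F) \|_{N_k} \lesssim \chi^{(1)}_{k \geq k_2} \|\phi\|_{S_{k_1}} \|F\|_{N_{k_2}} \tag{29} \]
whenever $\phi \in S_{k_1}$ and $F \in N_{k_2}$.
– (Null form estimates) We have
\[ \| P_k L(\phi_{,\alpha}, \psi^{,\alpha}) \|_{N_k} \lesssim \chi^{(1)}_{k = \max(k_1,k_2)} \|\phi\|_{S_{k_1}} \|\psi\|_{S_{k_2}} \tag{30} \]
for all $\phi \in S_{k_1}$, $\psi \in S_{k_2}$.
– (Trilinear estimate) We have
\[ \| P_k L(\phi^{(1)}, \phi^{(2)}_{,\alpha}, \phi^{(3),\alpha}) \|_{N_k} \lesssim \chi^{(1)}_{k = \max(k_1,k_2,k_3)} \chi^{(1)}_{k_1 \leq \min(k_2,k_3)} \prod_{i=1}^{3} \|\phi^{(i)}\|_{S_{k_i}} \tag{31} \]
whenever $\phi^{(i)} \in S_{k_i}$ for $i = 1, 2, 3$.
– (Epilogue) For any $\phi \in S_k$ with Fourier support in $D_0 \lesssim 2^k$ we have
\[ \sup_t \|\phi[t]\|_{\dot H^{n/2}_x \times \dot H^{n/2-1}_x} \lesssim 2^{nk/2} \sup_t \|\phi[t]\|_{L^2_x \times L^2_x} \lesssim \|\phi\|_{S_k}. \tag{32} \]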
We now discuss each of the above properties in turn.
– The estimate (17) is a technical fact needed in order to make the continuity argument work, and is proven in Sect. 12, mainly using (27) and (25). Since we are assuming $\phi$ to be smooth and constant outside of a compact set, one would certainly expect the function $a$ to actually be continuous rather than just quasi-continuous, but we do not know how to prove this and in any event it is not needed for our argument13. In the high dimensional case this estimate was trivial as the spaces $S_k$ were just Lebesgue spaces, but more care is required here because $S_k$ will be defined by restriction from $\mathbb{R}^{1+n}$ and have a spacetime Fourier component in their norms. We remark that the quantity $a(T)$ is necessarily finite for classical wave maps $\phi$, thanks to (27) and (13).

13 Note added in proof: Daniel Tataru has observed that continuity can be obtained by using the fact that the scaling map $\lambda \mapsto \phi(\lambda t, \lambda x)$ is continuous in the $S_k$ topology for sufficiently nice $\phi$.
– The invariance properties are unsurprising given the translation and scaling symmetries of the equation, and will be automatic from our construction of $S(c)$, $S_k$, $N_k$ in Sect. 10. As a corollary of translation invariance we observe that the Littlewood–Paley operators $P_k$, $P_{\leq k}$, etc. are bounded on the spaces $S(c)$, $S_k$, $N_k$.
– The algebra property (18) is essential for us to invert the gauge transformation, and will be proven in Sect. 16. The spaces described in [42] obey this algebra property if $c \in l^1$, but when $c \in l^2$ there is a logarithmic divergence in the estimates. Fortunately, this divergence can be rectified (with some non-trivial effort) by adding $L^\infty_t L^\infty_x$ control to $S(c)$. This is analogous to the well-known fact that $\dot H^{n/2}_x$ is not closed under multiplication, but $\dot H^{n/2}_x \cap L^\infty_x$ is. The estimate (19) thus will be an automatic consequence of our construction of $S(c)$ in Sect. 16. We shall be able to obtain (18) to some extent from (28) via a convenient duality argument.
– The estimates (21), (20) are minor variants of (18); indeed, all three estimates shall be treated in a unified manner in Sect. 16. The $L^\infty$ control (22) is unsurprising given (20), and shall be easy to prove.
– The insensitivity property (23) will be immediate from the construction of $S(c)$. This property is required because it will turn out for induction purposes that it is more convenient to initially measure $U$ in $S(Cc)$ instead of $S(c)$.
– The estimate (24) shall turn out to be automatic, because we shall essentially use (24) to define the space $S(c)$ in Sect. 10.
– The estimate (25) plays only a minor role in the main argument, ensuring that $N_k$ does indeed contain test functions and certain error terms. Note that the space $L^1_t \dot H^{n/2-1}_x$ is the classical space which one would use to hold the non-linearity, if one attempted to apply the energy method (although this method of course fails at the critical regularity).
This estimate shall be an automatic consequence of our construction of $N_k$ in Sect. 10.
– The compatibility property (26) allows us to ignore certain technical “frequency leakage” problems arising from the fact that $\phi\psi$ does not quite have the same frequency as $\phi$, even when $\phi$ has much higher frequency than $\psi$. It will be an automatic consequence of the construction of $N_k$ in Sect. 10. A similar property for $S_k$ holds but will not be needed in our argument. From (26) and Littlewood–Paley decomposition we observe the estimate
\[ \|F\|_{N_k} \lesssim \sum_{k' = k + O(1)} \| P_{k'} F \|_{N_{k'}} \tag{33} \]
whenever $F$ is supported on the region $D_0 \sim 2^k$.
– The energy estimate (27) is a bit lengthy, and is proven in Sect. 11. One could try to make (27) the definition of $S_k$, as is done in some other papers, but this makes the product estimate (28) difficult to prove.
– The estimates (28), (29) shall be proven in Sect. 15. The factor $\chi^{(1)}_{k \geq k_2}$ is an indication that the high-high interactions in this problem are quite weak. (A similar gain is implicit in [42]).
– We shall prove (30) in Sect. 17. The proof basically uses the estimates (29), (18) described above, combined with the identity (4). In practice we shall only apply (30) in the high-high interaction case (since we then obtain an exponential gain from $\chi^{(1)}_{k = \max(k_2,k_3)}$), or if a derivative has been transferred from the high-frequency term to the low-frequency term14. From (33) we observe that the $P_k$ projection in the above lemma can be removed if the expression inside the $P_k$ already has frequency $\sim 2^k$.
– The trilinear estimate (31) is the most difficult estimate in this theorem to prove, and is handled in Sect. 18. The factor $\chi^{(1)}_{k = \max(k_1,k_2,k_3)}$ again reflects the fact that high-high interactions are weak. The difficulty lies primarily in obtaining the small but crucial factor15 of $\chi^{(1)}_{k_1 \leq \min(k_2,k_3)}$. Without this factor, (31) essentially follows from (29) and (30). The presence of this factor allows us to handle any non-linearity of cubic or higher degree in which at least one derivative falls on a low frequency term. In order to obtain this key exponential gain we have to go beyond the arguments in [42] and apply some other tools, notably some multiplier calculus to shift null forms from one function to another, and the use of Bernstein’s inequality when the null forms are too degenerate for the multiplier calculus to be effective. As with (30), we remark that the $P_k$ can be removed if the expression inside the $P_k$ already has frequency $\sim 2^k$.
– The estimate (32) is basically a dual to (25), and shall be automatic from our construction of $S_k$ in Sect. 16. When $n = 2$ it might be possible to use energy conservation to circumvent the need for this estimate; however, this does not seem to achieve any substantial simplification in this paper.

To close this section we informally discuss how the bilinear and trilinear estimates (28), (30), (31) are to be used. They cannot quite treat the original non-linearity $L(\phi, \phi_{,\alpha}, \phi^{,\alpha})$ in (1), especially when the derivatives $\partial_\alpha$, $\partial^\alpha$ fall on high-frequency components of $\phi$. However these estimates can treat these types of expressions when the derivatives are in more favorable locations. Examples of such “good” non-linearities include
– (Derivative falls on a low frequency) An expression $L(\phi_{k_1}, \phi_{k_2,\alpha}, \phi_{k_3}^{,\alpha})$ with $k_1 \geq \min(k_2,k_3) + O(1)$. For these expressions we use (31).
– (High-high interactions) An expression $L(\phi_{k_1}, \phi_{k_2,\alpha}, \phi_{k_3}^{,\alpha})$ with $k_2 = k_3 + O(1)$. For these expressions we use (30) and (28).
– (Derivative shifted from high-frequency to low, Type I) An expression of the form $L(\nabla_x \phi_{k_1}, \phi_{k_2,\alpha}, \partial^\alpha \nabla_x^{-1} \phi_{k_3})$ with $k_1 \leq k_3 + O(1)$. This generalizes the high-high interaction non-linearity, and arises from commutator expressions via Lemma 2. This non-linearity is treated by (31).
– (Derivative shifted from high-frequency to low, Type II) An expression of the form $L(\phi_{k_1}, \nabla_x \phi_{k_2,\alpha}, \nabla_x^{-1} \phi_{k_3}^{,\alpha})$ with $k_2 \leq k_3 + O(1)$. This type of non-linearity also arises from commutator expressions via Lemma 2, and is estimated by (30) and (28).

14 Because of this, it is possible to lose a factor of up to (but not including) $2^{|k_1-k_2|}$ in (30) without affecting the argument. This is for instance the case in the $n \geq 5$ theory in [40], where the high frequency term is estimated using the endpoint Strichartz space $L^2_t L^{2(n-1)/(n-3)}_x$ and the low frequency term in the companion space $L^2_t L^{n-1}_x$, thus losing a factor of $2^{|k_1-k_2|(n+1)/2(n-1)}$.
15 For $n \geq 4$ this estimate can be obtained by estimating the two low frequencies in $L^2_t L^\infty_x$ and the high frequency in $L^1_t L^2_x$, and by moving these exponents around by an epsilon one can also cover the $n = 3$ case by Strichartz estimates. However in the $n = 2$ case the Strichartz estimates are far too weak to prove this estimate, and we shall need to work much harder.
– (Repeated derivatives avoiding the highest frequency) An expression of the form L(✷φk1 , φk2 , . . . , φks ) with k1 ≤ max(k2 , . . . , ks ) + O(1). These types of expressions will arise once we apply the gauge transformation U . In principle, one can use Eq. (1) to break this expression up into combinations of the previous types of good non-linearity, although the computations become somewhat tedious in practice. Note that in all cases we have retained the null structure of the non-linearity. In the low dimensional cases n = 2, 3, 4 this is vital to the above non-linearities being good. In all of the above cases we obtain various exponential gains which will allow us to sum in the ki indices. As a first approximation, one should treat these good non-linearities as being negligible errors. The objective is then to gauge transform (Littlewood–Paley localized versions of) (1), exploiting such geometric identities as φ † φ,α = 0 as well as Lemma 2, until all the non-linearities are negligible. In this strategy the Littlewood–Paley decomposition seems to play an indispensable role, as this decomposition allows us to easily separate the core component of the non-linearity (which for wave maps is a connection term where the connection Aα;≤k has small curvature) from the remaining error terms which are good non-linearities and therefore negligible.
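The remark that exponential gains allow one to sum in the $k_i$ indices can be illustrated numerically. The following is a toy sketch only: the model envelope, the rates `SIGMA` and `DELTA`, and the functions `envelope` and `gain` are assumptions introduced here for illustration, and the precise definitions of the frequency envelope and of the factors $\chi^{(1)}$ given earlier in the paper may differ. The sketch checks that a slowly varying sequence summed against a faster exponential gain is controlled by its value at the central index.

```python
# Toy illustration only: SIGMA, DELTA, envelope() and gain() are assumptions
# made for this sketch, not definitions taken from the paper.
SIGMA = 0.1   # assumed slow-variation rate of the model envelope
DELTA = 0.5   # assumed decay rate of the exponential gain (DELTA > SIGMA)


def envelope(k, eps=0.01, centre=3):
    """Model sequence obeying c_k <= 2**(SIGMA*|k-k'|) * c_k' for all k, k'."""
    return eps * 2.0 ** (-SIGMA * abs(k - centre))


def gain(k, k_other):
    """Model exponential gain 2**(-DELTA*|k - k_other|)."""
    return 2.0 ** (-DELTA * abs(k - k_other))


def gained_sum(k, cutoff=200):
    """Truncated sum over k' of gain(k, k') * c_k'."""
    return sum(gain(k, kp) * envelope(kp)
               for kp in range(k - cutoff, k + cutoff + 1))


def geometric_constant():
    """Sum over all integers j of 2**(-(DELTA - SIGMA)*|j|)."""
    r = 2.0 ** (-(DELTA - SIGMA))
    return (1.0 + r) / (1.0 - r)


if __name__ == "__main__":
    C = geometric_constant()
    for k in (-20, 0, 3, 15):
        # The ratio stays below the k-independent constant C.
        print(k, gained_sum(k) / envelope(k), "bounded by", C)
```

The bound holds because the slow variation of the model envelope costs at most $2^{\sigma|k-k'|}$, which the gain $2^{-\delta|k-k'|}$ more than pays for when $\sigma$ is smaller than $\delta$, leaving a convergent geometric series.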
4. The Main Proposition

Let $S(c)$, $S_k$, $N_k$ be as in Theorem 3. We now adapt the argument from [40]. In the next section we shall prove the following “bootstrap” property of the $S_k$ norms:

Proposition 1 (Main Proposition). Let $c$ be a frequency envelope, $0 < T < \infty$, and let $\phi$ be a classical wave map on $[-T,T] \times \mathbb{R}^n$, extended to $\mathbb{R}^{1+n}$ by the free wave equation, such that $\phi[0]$ lies underneath $\varepsilon c$, and that
\[ \|\phi_k\|_{S_k} \lesssim c_k \tag{34} \]
for all $k$. Then we have
\[ \|\phi_k\|_{S_k} \leq c_k \tag{35} \]
for all $k$.

We now give the continuity argument which deduces Theorem 2 from this proposition. Let $T_0$, $c$, $\phi$ be as in Theorem 2, and let $a(T)$ be the quantity in (16). From (27) and the hypothesis that $\phi$ lies underneath $\varepsilon c$ we see that $a(0) = 1$. From Proposition 1 we see that if $0 < T \leq T_0$ obeys $a(T) \lesssim 1$, then we can automatically bootstrap this bound (using the monotonicity of $a$) to $a(T) = 1$. From this and (17) we see that the set $\{T \in [0,T_0] : a(T) = 1\}$ is both open and closed in $[0,T_0]$. Since this set contains the origin, we thus have $a(T_0) = 1$. From this and (32) we thus see that $\phi[t]$ lies underneath $Cc$ for all $0 \leq t \leq T_0$, as desired. It only remains to prove Proposition 1.
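The topological step of the continuity argument can be spelled out as follows. This is only a sketch of the reasoning: the set name $E$ and the constant $C_0$ (standing for the implicit constant in (17)) are labels introduced here, and we assume that the limits in (17) are taken over $T' \in [0,T_0]$ with $T' = T$ allowed, so that $\liminf_{T' \to T} a(T') \leq a(T) \leq \limsup_{T' \to T} a(T')$.

```latex
% Sketch only; E and C_0 are labels introduced here, not notation from the paper.
Let $E := \{ T \in [0,T_0] : a(T) = 1 \}$; we are given $0 \in E$.

\emph{Open.} If $T \in E$, then $\liminf_{T' \to T} a(T') \le a(T) = 1$, so (17) gives
\[ \limsup_{T' \to T} a(T') \le C_0 . \]
Hence $a(T') \le 2 C_0$ for all $T'$ in a neighbourhood of $T$, i.e.\ $a(T') \lesssim 1$ there,
and Proposition 1 upgrades this to $a(T') = 1$. Thus the neighbourhood lies in $E$.

\emph{Closed.} If $T_j \in E$ and $T_j \to T$, then $\liminf_{T' \to T} a(T') \le 1$, so by (17)
\[ a(T) \le \limsup_{T' \to T} a(T') \le C_0 , \]
and Proposition 1 again gives $a(T) = 1$, i.e.\ $T \in E$.

Since $[0,T_0]$ is connected and $E$ is non-empty, open and closed, $E = [0,T_0]$;
in particular $a(T_0) = 1$.
```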
5. Renormalized Iteration: The Proof of Proposition 1

We shall divide this proof into several steps16.

Step 0. Scaling. Fix $c$, $T$, $\phi$, and suppose that the hypotheses of Proposition 1 hold. In this section all our functions and equations shall be on the slab $[-T,T] \times \mathbb{R}^n$. Since $\phi$ is on the sphere, it is bounded in $L^\infty_t L^\infty_x$. From this and (24) we have the bound
\[ \|\phi\|_{S(c)} \lesssim 1. \tag{36} \]
Of course, the same bound then holds for all Littlewood–Paley projections of $\phi$, such as $\phi_k$, $P_{\leq k}\phi$, $P_{\geq k}\phi$, etc. We need to show (35). By scale-invariance (scaling $T$, $c$, and $\phi$ appropriately) it suffices to show that
\[ \|\phi_0\|_{S_0} \leq c_0. \tag{37} \]
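The scale-invariance reduction can be made explicit along the following lines. This is a hedged sketch: the rescaled objects $\psi$, $c'$, $T'$ are names introduced here, and we read the invariance property of Theorem 3 as saying that the scaling maps preserve norms (not merely up to constants) and that the Littlewood–Paley projections rescale in the standard dyadic way.

```latex
% Sketch only; psi, c', T' are names introduced here for illustration.
Fix an integer $k$ and apply the scaling of Theorem 3 with $j = -k$:
\[ \psi(t,x) := \phi(2^{-k} t, 2^{-k} x), \qquad T' := 2^{k} T, \qquad c'_m := c_{m+k} . \]
Then $\psi$ is a classical wave map on $[-T',T'] \times \mathbb{R}^n$, and
\[ P_0 \psi (t,x) = (P_k \phi)(2^{-k} t, 2^{-k} x) . \]
By the invariance property, $S_k$ is mapped to $S_{k+j} = S_0$ and $S(\{c_m\})$ to
$S(\{c_{m-j}\}) = S(\{c'_m\})$, so that
\[ \| \psi_0 \|_{S_0([-T',T'] \times \mathbb{R}^n)} = \| \phi_k \|_{S_k([-T,T] \times \mathbb{R}^n)},
   \qquad c'_0 = c_k . \]
Hence (37), applied to the triple $(c', T', \psi)$, yields (35) for this value of $k$.
```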
By applying $P_0$ to (1) we obtain the evolution equation for $\phi_0$:
\[ \Box\phi_0 = -P_0(\phi\, \phi^\dagger_{,\alpha} \phi^{,\alpha}). \tag{38} \]
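Step 1. Linearize the $\phi_0$ evolution.

Definition 2. For each integer $k$, define the connection $A_{\leq k;\alpha}$ by the formula17
\[ A_{\leq k;\alpha} := \phi_{\leq k,\alpha} \phi^\dagger_{\leq k} - \phi_{\leq k} \phi^\dagger_{\leq k,\alpha}. \tag{39} \]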
Definition 3. A function $F$ on $[-T,T] \times \mathbb{R}^n$ is said to be an acceptable error if
\[ \|F\|_{N_0} \lesssim \varepsilon c_0, \]
and we shall write $F = \text{error}$ to denote this.

The purpose of this step is to convert the non-linear equation (38) into the linear transport equation (cf. [40])
\[ \Box\phi_0 = 2 A_{\leq -10;\alpha} \phi_0^{,\alpha} + \text{error}. \tag{40} \]
In order to convert (38) into (40) we need to show that
\[ P_0(\phi\, \phi^\dagger_{,\alpha} \phi^{,\alpha}) + 2 A_{\leq -10;\alpha} \phi_0^{,\alpha} \tag{41} \]
is an acceptable error. We shall do this in two stages.

16 We have decided to organize this paper as a tree as opposed to the more usual linear structure. Hopefully this shall help keep the “high-level” ideas of the argument from being obscured by the dozens of cases and sub-cases which we shall eventually have to consider.
17 Morally speaking, $A_{\leq k;\alpha}$ is the pullback of the Levi–Civita connection on the sphere by $\phi_{\leq k}$; however this is a little inaccurate because $\phi_{\leq k}$ does not quite lie on the sphere.
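Step 1(a). Decompose (41) into small pieces. The purpose of this step is to prove the identity
\[ (41) = \sum_{k_2,k_3:\, k_2 \geq O(1);\, k_3 = k_2 + O(1)} P_0 L(\phi, \phi_{k_2,\alpha}, \phi_{k_3}^{,\alpha}) \tag{42} \]
\[ \quad + \sum_{k_2,k_3:\, k_3 \geq k_2;\, k_3 \geq O(1)} P_{-10<\cdot<10} L(\phi_{>k_2-30}, \phi_{k_2,\alpha}, \phi_{k_3}^{,\alpha}) \tag{43} \]
\[ \quad + L(\nabla_x \phi_{\leq -10}, \phi_{\leq -10,\alpha}, \phi^{,\alpha}_{-10<\cdot<10}) \tag{44} \]
\[ \quad + L(\phi_{\leq -10}, \nabla_x \phi_{\leq -10,\alpha}, \phi^{,\alpha}_{-10<\cdot<10}). \tag{45} \]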
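In the first two summations the kernel of $L$ may depend on $k_2$, $k_3$. To obtain the above decomposition, we split
\[ \phi\,\phi^\dagger_{,\alpha}\partial^\alpha\phi = \sum_{k_2,k_3:\, \max(k_2,k_3) \geq 10;\, |k_2-k_3|<5} \phi\,\phi^\dagger_{k_2,\alpha}\phi^{,\alpha}_{k_3} \]
\[ \quad + \sum_{k_2,k_3:\, k_2 \geq 10,\, k_3+5} \phi\,\phi^\dagger_{k_2,\alpha}\phi^{,\alpha}_{k_3} + \sum_{k_2,k_3:\, k_3 \geq 10,\, k_2+5} \phi\,\phi^\dagger_{k_2,\alpha}\phi^{,\alpha}_{k_3} \]
\[ \quad + \phi_{>-10}\,\phi^\dagger_{<10,\alpha}\phi^{,\alpha}_{<10} + \phi_{\leq -10}\,\phi^\dagger_{-10<\cdot<10,\alpha}\phi^{,\alpha}_{-10<\cdot<10} \]
\[ \quad + \phi_{\leq -10}\,\phi^\dagger_{\leq -10,\alpha}\phi^{,\alpha}_{-10<\cdot<10} + \phi_{\leq -10}\,\phi^\dagger_{-10<\cdot<10,\alpha}\phi^{,\alpha}_{\leq -10} + \phi_{\leq -10}\,\phi^\dagger_{\leq -10,\alpha}\phi^{,\alpha}_{\leq -10}. \]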
We apply P0 , and discard some terms which are now zero, to obtain † ,α P0 (φφ,α φ )= P0 (φφk†2 ,α φk,α3 ) k2 ,k3 :max(k2 ,k3 )≥10;|k2 −k3 |<5
+
k2 ,k3 :k2 ≥10,k3 +5
+
P0 (φk2 −5<·
k2 ,k3 :k3 ≥10,k2 +5 † ,α + P0 (φ>−10 φ<10,α φ<10 ) † ,α φ−10<·<10 ) + P0 (φ≤−10 φ−10<·<10,α † ,α + P0 (φ≤−10 φ≤−10,α φ−10<·<10 ) † ,α φ≤−10 ). + P0 (φ≤−10 φ−10<·<10,α
The first term is of type (42), while the next three terms are of type (43) by (8) and dyadic decomposition (swapping $k_2$ and $k_3$ as necessary). Also, the fifth term is of type (42) by (8). The sixth and seventh terms are equal, so it remains to show that
\[ P_0(\phi_{\leq -10}\,\phi^\dagger_{\leq -10,\alpha}\phi^{,\alpha}_{-10<\cdot<10}) + A_{\leq -10;\alpha}\phi_0^{,\alpha} \]
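is of the right form. From Lemma 2 and the Leibnitz rule we see that
\[ P_0(\phi_{\leq -10}\,\phi^\dagger_{\leq -10,\alpha}\phi^{,\alpha}_{-10<\cdot<10}) = \phi_{\leq -10}\,\phi^\dagger_{\leq -10,\alpha}\phi_0^{,\alpha} + (44) + (45). \]
From the definition (39) of $A_{\leq -10;\alpha}$ it thus remains to show that
\[ \phi_{\leq -10,\alpha}\,\phi^\dagger_{\leq -10}\,\phi_0^{,\alpha} \tag{46} \]
is of the right form. Since $\phi$ lies on the sphere, we have the geometric identity $\phi^\dagger \phi^{,\alpha} = 0$ for all $\alpha$. We apply $P_0$ and multiply by $\phi_{\leq -10,\alpha}$ to obtain
\[ \phi_{\leq -10,\alpha} P_0(\phi^\dagger \phi^{,\alpha}) = 0. \]
On the other hand, from Lemma 2 we have
\[ \phi_{\leq -10,\alpha} P_0(\phi^\dagger_{\leq -10} \phi^{,\alpha}) = \phi_{\leq -10,\alpha}\,\phi^\dagger_{\leq -10}\,\phi_0^{,\alpha} + \phi_{\leq -10,\alpha} L(\nabla_x \phi^\dagger_{\leq -10}, \phi^{,\alpha}). \]
Subtracting these two identities and re-arranging we obtain
\[ (46) = \phi_{\leq -10,\alpha} L(\nabla_x \phi_{\leq -10}, \phi^{,\alpha}) + \phi_{\leq -10,\alpha} P_0 L(\phi_{>-10}, \phi^{,\alpha}). \]
We may take $P_{-5<\cdot<5}$ projections of both sides; only the first term on the right-hand side is affected. We may then replace $\phi^{,\alpha}$ by $\phi^{,\alpha}_{-10\leq\cdot\leq 10}$ since the contribution of the remainder term vanishes. The first term is now of the form (44) by (8). We can exploit the $P_0$ projection to split the second term as
\[ \phi_{\leq -10,\alpha} P_0 L(\phi_{-10<\cdot<10}, \phi^{,\alpha}_{<20}) + \sum_{k_2 \geq 20}\ \sum_{k_1:\, |k_1-k_2|<5} \phi_{\leq -10,\alpha} P_0 L(\phi_{k_1}, \phi^{,\alpha}_{k_2}). \]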
By (8) the first term is of the form (43) and the second term is of the form (42).

Step 1(b). Show that the small pieces are all acceptable errors. To finish the proof of (40) we need to show that the four expressions (42)-(45) are acceptable errors.

Step 1(b).1. The contribution of (42). (High-high interaction). By Lemma 1 and (36) it suffices to show
\[ \Bigl\| \sum_{k_2 \geq O(1)}\ \sum_{k_3 = k_2+O(1)} P_0(\phi^{(1)} \phi^{(2)}_{k_2,\alpha} \phi^{(3),\alpha}_{k_3}) \Bigr\|_{N_0} \lesssim \varepsilon c_0 \|\phi^{(1)}\|_{S(c)} \|\phi^{(2)}\|_{S(c)} \|\phi^{(3)}\|_{S(c)} \]
for all scalar $\phi^{(i)}$, $i = 1, 2, 3$. By the triangle inequality and dyadic decomposition we can estimate the left-hand side by
\[ \sum_{k_2 \geq O(1)}\ \sum_{k_3 = k_2+O(1)}\ \sum_k \bigl\| P_0(\phi^{(1)} P_k(\phi^{(2)}_{k_2,\alpha} \phi^{(3),\alpha}_{k_3})) \bigr\|_{N_0}. \]
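From (28) we may bound this by
\[ \|\phi^{(1)}\|_{S(c)} \sum_{k_2 \geq O(1)}\ \sum_{k_3 = k_2+O(1)}\ \sum_k \chi^{(1)}_{k \leq 0} \bigl\| P_k(\phi^{(2)}_{k_2,\alpha} \phi^{(3),\alpha}_{k_3}) \bigr\|_{N_k}. \]
We may restrict $k$ to $k \leq k_2 + O(1)$, since the summand vanishes otherwise. From (30), (13) we may majorize the previous by
\[ \|\phi^{(1)}\|_{S(c)} \|\phi^{(2)}\|_{S(c)} \|\phi^{(3)}\|_{S(c)} \sum_{k,k_2,k_3:\, k_2 \geq O(1),\, k_3 = k_2+O(1);\, k \leq k_2+O(1)} \chi^{(1)}_{k \leq 0}\, \chi^{(1)}_{k_2 = k}\, c_{k_2} c_{k_3}. \]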
We first sum in $k_3$, then in $k_2$, and finally in $k$, using (13) and splitting into $k \leq 0$ and $k > 0$ if desired. The sum then converges to $O(c_0^2)$, which is acceptable by (15).

Step 1(b).2. The contribution of (43). (Derivative falls on low frequency). We can bound the $N_0$ norm of (43) by
\[ \sum_{k_3} \sum_{k_2} \sum_{k_1 > k_2+O(1)} \| P_0 L(\phi_{k_1}, \phi_{k_2,\alpha}, \phi^{,\alpha}_{k_3}) \|_{N_0}. \]
By (31) and (34) this is bounded by
\[ \sum_{k_3} \sum_{k_2} \sum_{k_1 > k_2+O(1)} \chi^{(1)}_{0 = \max(k_1,k_3)}\, \chi^{(1)}_{k_1 \leq k_2}\, c_{k_1} c_{k_2} c_{k_3}. \]
Split into $k_3 \geq k_1$ and $k_1 > k_3$. If $k_3 \geq k_1$, we perform the $k_1$ summation using (13) to estimate this by
\[ \sum_{k_3} \sum_{k_2} \chi^{(1)}_{k_3 = 0}\, c_{k_2}^2 c_{k_3} \]
which is acceptable by (12), (13). If $k_1 > k_3$, we perform the $k_2$ and $k_3$ summations using (13) to estimate this by
\[ \sum_{k_1} \chi^{(1)}_{k_1 = 0}\, c_{k_1}^3 \]
which is acceptable by (13), (15).

Step 1(b).3. The contribution of (44). (Derivative transferred from high frequency to low, Type I). We estimate the $N_0$ norm using the triangle inequality and (10) by
\[ \sum_{k \leq -10} 2^k \| L(\phi_k, \phi_{\leq -10,\alpha}, \phi^{,\alpha}_{-10<\cdot<10}) \|_{N_0}. \]
We may freely add a $P_{-10<\cdot<10}$ projection to the expression inside the norm. By (33) and Lemma 1 we may thus bound the previous by
\[ \sum_{k \leq O(1)}\ \sum_{k' = O(1)}\ \sum_{k_2 \leq O(1)}\ \sum_{k_3 = O(1)} 2^k \| P_{k'} L(\phi_k, \phi_{k_2,\alpha}, \phi^{,\alpha}_{k_3}) \|_{N_{k'}}. \]
By (31) and (34) we may bound this by
\[ \sum_{k \leq O(1)}\ \sum_{k' = O(1)}\ \sum_{k_2 \leq O(1)}\ \sum_{k_3 = O(1)} \chi^{(1)}_{k \leq k_2}\, 2^k c_k c_{k_2} c_{k_3}. \]
Using (13), (15) we can simplify this to
\[ \varepsilon^2 c_0 \sum_{k \leq O(1)}\ \sum_{k_2 \leq O(1)} \chi^{-(0)}_{k = 0}\, 2^k \chi^{(1)}_{k \leq k_2} \]
which is acceptable.

Step 1(b).4. The contribution of (45). (Derivative transferred from high frequency to low, Type II). The expression $\phi_{\leq -10}$ is bounded in $S(c)$ by (36), while the expression
\[ L(\nabla_x \phi_{\leq -10,\alpha}, \phi^{,\alpha}_{-10<\cdot<10}) \]
has Fourier support in $D_0 \sim 1$. By (33) and (28), we can thus bound the $N_0$ norm of (45) by
\[ \sum_{k = O(1)} \| P_k L(\nabla_x \phi_{\leq -10,\alpha}, \phi^{,\alpha}_{-10<\cdot<10}) \|_{N_k}. \]
By the triangle inequality and (10) we can thus bound the previous by
\[ \sum_{k = O(1)}\ \sum_{k_1 \leq O(1)}\ \sum_{k_2 = O(1)} 2^{k_1} \| P_k L(\phi_{k_1,\alpha}, \phi^{,\alpha}_{k_2}) \|_{N_k}. \]
By (30), (34) this is bounded by
\[ \sum_{k = O(1)}\ \sum_{k_1 \leq O(1)}\ \sum_{k_2 = O(1)} 2^{k_1} c_{k_1} c_{k_2}. \]
But this is acceptable by (13), (15). This completes the proof of (40).

Step 2. Construct a gauge $U_{\leq -10}$. We continue the proof of (37). In Step 3 we shall apply a renormalization $w_0 = U_{\leq -10}\phi_0$ which will transform (40) into a much better form, namely $\Box w_0 = \text{error}$. In order to transform (40) like this, we would like18 $U_{\leq -10}$ to approximately be an orthogonal co-ordinate frame given by adjoint parallel transport by $A_{\leq -10;\alpha}$ in all directions; in other words, we expect $U_{\leq -10} U^\dagger_{\leq -10} \approx 1$ and $U_{\leq -10,\alpha} \approx -U_{\leq -10} A_{\leq -10;\alpha}$.

We now give the construction of $U_{\leq -10}$. Let $M > 10$ be a large integer (depending on $T$, $n$, $d$, and the $\delta_i$) to be chosen later. We define the real $d \times d$ matrix fields $U_k$ for integer $k$ by setting $U_k = 0$ for $k \leq -M$ and † Uk := U
(47)
† 18 This scheme is slightly modified from that in [40]; basically, we have replaced U ≤−10 by U≤−10 as it allows for some technical simplifications.
for k ≥ −M, where U
Uk
k
and I is the identity matrix. An easy inductive argument shows that U
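\[ \|U_{\leq k}\|_{S(Cc)} \leq C, \tag{48} \]
\[ \|U_k\|_{S_k} \leq C c_k, \tag{49} \]
\[ \|U_{\leq k} U^\dagger_{\leq k} - I\|_{S(c)} \lesssim \varepsilon \tag{50} \]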
for all k and some absolute constant C depending only on n, d, and the δi , assuming that M is sufficiently large (depending on T , ε, n, d, and the δi ). We prove (48), (49) by induction on k (this is why we made the constant C explicit in the estimates). As a by-product of this induction argument we shall also obtain (50). The estimates (48), (49) are trivial when k ≤ −M. Now suppose that k > −M and that (48), (49) held for all smaller k. From (47) and two applications of (21) (one with c and one with Cc) we have
Uk Sk U
By (24) we have
† †
U≤k U≤k − I S(c) U U k k k
∞ L∞ t Lx
+ sup Uk 2Sk /ck . k
By the triangle inequality, (22), and the induction hypothesis of (49) we thus see that † − I S(c) ck2 + sup ck ,
U≤k U≤k k
k
which implies (50) by (12), (15). From (19) we then have
\[ \|U_{\leq k} U^\dagger_{\leq k} - I\|_{L^\infty_t L^\infty_x} \lesssim \varepsilon \]
which implies that
\[ \|U_{\leq k}\|_{L^\infty_t L^\infty_x} \leq C/2. \]
From this and (49) (which has just been proven for this value of $k$ and all preceding values) we obtain (48) if $C$ is sufficiently large. This concludes the induction, and also gives the proof of (50).
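Step 2(b). Show that $(U_{\leq -M,\alpha} + U_{\leq -M} A_{\leq -M;\alpha})\phi_0^{,\alpha}$ is an acceptable error. Since $U_{\leq -M} = I$, it suffices by (25), (39) to show
\[ \| L(\phi_{\leq -M}, \phi_{\leq -M,\alpha}, \phi_0^{,\alpha}) \|_{L^1_t L^2_x} \lesssim \varepsilon c_0. \]
By Hölder we may majorize the left-hand side by
\[ T\, \|\phi_{\leq -M}\|_{L^\infty_t L^\infty_x}\, \|\nabla_{x,t}\phi_{\leq -M}\|_{L^\infty_t L^\infty_x}\, \|\nabla_{x,t}\phi_0\|_{L^\infty_t L^2_x}. \]
The first norm can be discarded since $\phi$ lies on the sphere. The second norm is $O(2^{-M})$ by (32), (34), and (5). The third norm is $O(c_0)$ by (32), (34). The claim then follows if $M$ is sufficiently large depending on $T$.

From (48), (49), and (23) we have
\[ \|U_{\leq k}\|_{S(c)} \lesssim 1, \tag{51} \]
\[ \|U_k\|_{S_k} \lesssim c_k. \tag{52} \]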
Step 2(c). Show that (U≤−10,α + U≤−10 A≤−10;α )φ0,α is an acceptable error. From Step 2(b) and the triangle inequality it suffices by (12) to show the telescoping bound
(Uk,α + U≤k A≤k;α − U
and † † U≤k A≤k;α − U
Comparing the two, we see that the first terms of both expressions cancel. By several applications of (8) we thus have (Uk,α + U≤k A≤k;α − U
+ L(U
+ L(φ≤k , Uk , φ≤k,α , φ0,α ). Thus it remains to show that all the terms on the right-hand side have a N0 norm of O(ck2 c0 ). Note that all these non-linearities are of the good “derivative falls on a low frequency” type, as the derivative fails to fall on φk or Uk .
The expressions φ≤k , U
L(φk , Uk ,α , φ0,α ) N0 + L(φk , φk ,α , φ0,α ) N0 + L(Uk , φk ,α , φ0,α ) N0 (1)
χk≤k ck ck c0 for all k ≤ k. But this follows from (31), (52), and (34). Step 2(d). Show that ✷U≤−10 φ0 is an acceptable error. Morally speaking, this is a non-linearity of “repeated derivatives avoiding the high frequency” type, although the treatment gets somewhat technical because of the recursive construction of U . Applying ✷ to the definition (47) of Uk we obtain the identity ✷Uk = ✷Uk + Fk = ✷Uk >k + Fk (53) −M
for all −M < k ≤ −10, where >k is the matrix † >k := φ
and Fk is the matrix † ) Fk := U
† + 2U
,α† + 2U
We can simplify Fk using (8) as ,α ,α Fk = L(U
(54)
Iterating (53) completely we see that ✷U≤−10 =
M−10
Fk1 >k2 . . . >ks .
s=1 −M
At first glance this series seems to grow exponentially in s, but we shall eventually extract a 1/s! decay19 from the ordering of the kj to counteract this. We multiply both sides by φ0 on the right and take the N0 norm. Using the L notation (possibly conceding some powers of Cd because of matrix multiplication) we obtain
(✷U≤−10 )φ0 N0
M−10
(Cd)Cs L(Fk1 , >k2 , . . . , >ks , φ0 ) N0 .
s=1 −M
We can re-arrange the right-hand side as M−10
(Cd)Cs L(Fk1 , φ0 , >k2 , . . . , >ks ) N0 .
−M
From (21), (36), (34) we have
>k Sk φk Sk φ
M−10
(Cd)Cs L(Fk1 , φ0 ) N0 ck2 . . . cks .
−M
By the binomial (or multinomial) theorem we can estimate this by M−10 1 (Cd)Cs L(Fk1 , φ0 ) N0 s! −M
s ck ,
k1
which in turn is estimated by
L(Fk1 , φ0 ) N0 exp Cd
−M
ck .
k1
From (12) and Cauchy–Schwarz we have ck ε|k1 |1/2 , k1
so that the previous is majorized by
L(Fk1 , φ0 ) N0 exp Cdε|k1 |1/2 .
−M
It will thus suffice to show the bound (1)
L(Fk , φ0 ) N0 χk=0 εc0 for all −M < k ≤ −10. Fix k. From (54) we have L(Fk , φ0 ) = L(U
,α + L(φ≤k , φ0 , U
,α ) + L(U
(1)
so it suffices to show that these three expressions have an N0 norm of χk=0 εc0 .
The quantities U
L(✷φ≤k , φ0 ) N0 ,
(55)
,α
L(φ0 , U
L(φ0 , φ≤k,α , φ≤k ) N0
(56) (57)
(1)
are χk=0 εc0 . Case 2(d).1. The treatment of (56) (Derivative falls on low frequency). By dyadic decomposition and (31) we can bound (56) by (1) χ0≤min(k ,k ) φ0 S0 Uk Sk φk Sk . k ,k ≤k
But this is acceptable by (34), (52), (34). Case 2(d).2. The treatment of (57) (Derivative falls on low frequency). This is identical to Case 2(d).1 except that (52) is replaced with (34). Case 2(d).3. The treatment of (55). By dyadic decomposition it suffices to show (1)
L(✷φk , φ0 ) N0 χk=0 εc0 . We may freely insert P−10<·<10 in front of the L. Expanding ✷φk using Eq. (1), we can write the left-hand side as
P−10<·<10 L(Pk L(φ, φ,α , φ ,α ), φ0 ) N0 . By dyadic decomposition and symmetry it suffices to show that (1)
P−10<·<10 L(Pk L(φ, φk2 ,α , φk,α3 ), φ0 ) N0 χk=0 εc0 .
(58)
k2 ,k3 :k2 ≤k3
We divide (58) into three contributions and treat each separately. Case 2(d).3(a). The contribution to (58) when k2 ≤ k/2. (Derivative falls on low frequency). We use (8) to write this contribution as
P−10<·<10 L(φ, φ0 , φk2 ,α , φk,α3 ) N0 . k2 ,k3 :k2 ≤k3 ,k/2
We split the first φ as φ<10 + k1 ≥10 φk1 . To deal with φ<10 we use (31), (26) to estimate this contribution by (1) (1) χ0=max(0,k3 ) χ0≤k2 L(φ<10 , φ0 ) S0 φk2 Sk2 φk3 Sk3 . k2 ,k3 :k2 ≤k3 ,k/2
Using (34), (36), and (21) we can bound this by (1) (1) χ0=max(0,k3 ) χ0≤k2 c0 ck2 ck3 k2 ,k3 :k2 ≤k3 ,k/2
which is acceptable by (15) (performing the k3 summation first). To deal with k1 ≥10 φk1 , we again use (31) to estimate this contribution by (1) (1) χ0=max(k1 ,k3 ) χk1 ≤k2 L(φk1 , φ0 ) Sk1 φk2 Sk2 φk3 Sk3 k1 ≥O(1) k2 ,k3 :k2 ≤k3 ,k/2
which by the previous arguments is bounded by (1) (1) χ0=max(k1 ,k3 ) χk1 ≤k2 ck1 ck2 ck3 . k1 ≥O(1) k2 ,k3 :k2 ≤k3 ,k/2
But this is acceptable by (13) and (15) (performing the k1 summation, then the k3 , then the k2 ). Case 2(d).3(b). The contribution to (58) when k2 ≥ k/2 and k3 = k2 + O(1). (Highhigh interaction). By (29) and (34) we may control this contribution by c0 Pk L(φ, φk2 ,α , φk,α3 ) Nk . k2 ≥k/2 k3 =k2 +O(1)
Since φ is bounded in S(c) by (36), we apply another dyadic decomposition and (28) again to bound this by (1) c0 χk≥k0 Pk0 L(φk2 ,α , φk,α3 ) Nk0 . k2 ≥k/2 k3 =k2 +O(1) k0
By (30) and (34) this is bounded by (1) (1) c0 χk≥k0 χk0 =max(k2 ,k3 ) ck2 ck3 , k2 ≥k/2 k3 =k2 +O(1) k0
which is acceptable by (15). Case 2(d).3(c). The contribution to (58) when k2 ≥ O(1) and k3 > k2 +5. (Derivative falls on low frequency). As in Case 2(d).3(b) we can use (28), (34) to control this contribution by c0 Pk L(φ, φk2 ,α , φk,α3 ) Nk . k2 ≥k/2 k3 >k2 +5
We may replace the first φ by a φk3 −5≤·≤k3 +5 since the contribution vanishes otherwise. By dyadic decomposition, (31) and (34) we can thus control the previous by (1) (1) c0 χk=max(k1 ,k2 ,k3 ) χk1 ≤k3 ck1 ck2 ck3 k2 ≥k/2 k3 ≥k2 +O(1) k1 =k3 +O(1)
which is acceptable by (15). This concludes Step 2(d).

Step 3. Renormalization. We can now complete the proof of (37). Define the renormalization $w_0$ of $\phi_0$ by $w_0 := U_{\leq -10}\phi_0$.
Step 3(a). Estimate $\Box w_0$. From the Leibnitz rule and (40) we have
\[ \Box w_0 = 2(U_{\leq -10,\alpha} + U_{\leq -10} A_{\leq -10;\alpha})\phi_0^{,\alpha} + \Box U_{\leq -10}\,\phi_0 + U_{\leq -10}\,\text{error}. \tag{59} \]
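For the reader's convenience, the computation behind (59) can be written out as follows. This is a short sketch that assumes $\Box = \partial_\alpha \partial^\alpha$ with the summation convention, so that the product rule applies in the usual way.

```latex
% Sketch of the Leibnitz-rule computation, assuming \Box = \partial_\alpha \partial^\alpha.
\begin{align*}
\Box w_0 = \Box( U_{\le -10} \phi_0 )
  &= (\Box U_{\le -10}) \phi_0 + 2\, U_{\le -10,\alpha}\, \phi_0^{,\alpha} + U_{\le -10}\, \Box \phi_0 \\
  &= (\Box U_{\le -10}) \phi_0 + 2\, U_{\le -10,\alpha}\, \phi_0^{,\alpha}
     + U_{\le -10} \bigl( 2 A_{\le -10;\alpha}\, \phi_0^{,\alpha} + \mathrm{error} \bigr)
     && \text{by (40)} \\
  &= 2 \bigl( U_{\le -10,\alpha} + U_{\le -10} A_{\le -10;\alpha} \bigr) \phi_0^{,\alpha}
     + (\Box U_{\le -10}) \phi_0 + U_{\le -10}\, \mathrm{error}.
\end{align*}
```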
Note that $U_{\leq -10}\,\text{error}$ must have Fourier support in the region $D_0 \sim 1$, since all the other terms of (59) do. From (51), (28), and (33) we thus see that this term is an acceptable error. From Steps 2(c), 2(d) the other two terms on the right-hand side of (59) are also acceptable errors. We thus have
\[ \|\Box w_0\|_{N_0} \lesssim \varepsilon c_0. \]

Step 3(b). Estimate $w_0[0]$. From (51), (19) we have
\[ \|U_{\leq -10}(0)\|_{L^\infty} \lesssim 1. \]
Also, from (51), (32), and Bernstein’s inequality (5) we have
\[ \|P_k \partial_t U_{\leq -10}(0)\|_{L^\infty} \lesssim 2^k \]
for all $k$. Summing over all $k \leq C$ and using the Fourier support of $U_{\leq -10}$ we obtain
\[ \|U_{\leq -10}[0]\|_{L^\infty \times L^\infty} \lesssim 1. \]
From the hypothesis that $\phi$ lies underneath the frequency envelope $\varepsilon c$ we have
\[ \|\phi_0[0]\|_{L^2 \times L^2} \sim \|\phi_0[0]\|_{\dot H^{n/2} \times \dot H^{n/2-1}} \lesssim \varepsilon c_0. \tag{60} \]
Combining the two we see that
\[ \|w_0[0]\|_{\dot H^{n/2} \times \dot H^{n/2-1}} \sim \|w_0[0]\|_{L^2 \times L^2} \lesssim \varepsilon c_0. \]

Step 3(c). Invert $U_{\leq -10}$. From the Fourier support of $U_{\leq -10}$ and $\phi_0$ we see that $w_0$ has Fourier support in the region $2^{-5} \leq D_0 \leq 2^5$. From Steps 3(a), 3(b) and the energy estimate (27) we thus have
\[ \|w_0\|_{S_0} \lesssim \varepsilon c_0. \tag{61} \]
To obtain (37) from (61) we need to invert $U_{\leq -10}$. From (18) we know that $S(c)$ is a Banach algebra. From (50) we thus see that $U_{\leq -10} U^\dagger_{\leq -10}$ is invertible in $S(c)$, so that
\[ \|(U^{-1}_{\leq -10})^\dagger U^{-1}_{\leq -10}\|_{S(c)} \lesssim 1. \]
From (51) and (18) we thus have
\[ \|U^{-1}_{\leq -10}\|_{S(c)} \lesssim 1. \]
To use this we observe from the Fourier support of $\phi_0$, $w_0$ that
\[ \phi_0 = P_{-5<\cdot<5}\phi_0 = P_{-5<\cdot<5}(U^{-1}_{\leq -10} w_0) = P_{-5<\cdot<5}(P_{<10} U^{-1}_{\leq -10}\, w_0). \]
From (21) and the preceding bounds, we thus have
\[ \|\phi_0\|_{S_0} \lesssim \|P_{<10} U^{-1}_{\leq -10}\, w_0\|_{S_0} \lesssim \|P_{<10} U^{-1}_{\leq -10}\|_{S(c)} \|w_0\|_{S_0} \lesssim \|U^{-1}_{\leq -10}\|_{S(c)} \|w_0\|_{S_0} \lesssim \varepsilon c_0 \]
and (37) follows. This completes the proof of Proposition 1 and hence Theorem 1.

We have now completed the iterative portion of the proof of Theorem 1; it remains only to prove Theorem 3. This shall occupy the remainder of the paper.
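The inversion of the gauge in Step 3(c) can be understood through a Neumann series in the Banach algebra $S(c)$. The following is only a sketch under stated assumptions: $B$, $C_1$, $C_2$ are labels introduced here ($C_1$ for the implicit constant in (50), $C_2$ for the one in (18)), we write $U$ for $U_{\leq -10}$, and we assume the $S(c)$ norm of a matrix field is unchanged by taking the transpose.

```latex
% Sketch only; B, C_1, C_2 are labels introduced here, not notation from the paper.
Write $U := U_{\le -10}$ and $B := I - U U^\dagger$, so that $\|B\|_{S(c)} \le C_1 \varepsilon$ by (50).
By (18), $\|B^m\|_{S(c)} \le C_2^{\,m-1} \|B\|_{S(c)}^m$ for $m \ge 1$, hence for
$C_1 C_2\, \varepsilon < 1$ the series
\[ (U U^\dagger)^{-1} = (I - B)^{-1} = \sum_{m=0}^{\infty} B^m \]
converges in $S(c)$, with
\[ \| (U U^\dagger)^{-1} \|_{S(c)} \le \|I\|_{S(c)} + \sum_{m=1}^{\infty} C_2^{\,m-1} (C_1 \varepsilon)^m \lesssim 1 . \]
Since $U \cdot U^\dagger (U U^\dagger)^{-1} = I$ pointwise, we have
$U^{-1} = U^\dagger (U U^\dagger)^{-1}$, and therefore by (18) and (51)
\[ \| U^{-1} \|_{S(c)} \lesssim \| U^\dagger \|_{S(c)}\, \| (U U^\dagger)^{-1} \|_{S(c)} \lesssim 1 . \]
```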
6. Overview of Spaces and Estimates

We now begin the proof of Theorem 3. Our spaces $S(c)$, $S_k$, $N_k$ are modeled on those of Tataru in [42]. Roughly speaking, these spaces can be divided into a “Fourier” or “$\dot X^{s,b}$” component, and a “physical space” or “$L^q_t L^r_x$” component, although the separation is not entirely clean, and the physical space component utilizes null frames in addition to the usual Euclidean space-time frame. This section will only be an outline of the spaces and estimates to come, and will therefore be rather informal in nature. Much of the notation used in this section will be defined later on.

In the sub-critical theory $s > n/2$ one usually chooses $N_k$ to be $X_k^{s-1,-1/2+}$ and $S(c)$ to be $\nabla_{x,t}^{-1} X^{s-1,1/2+}$, in which case one essentially has Theorem 3 for short times. Thus at the critical level $s = n/2$ one expects the space $N_k$ to be something like $\dot X_k^{n/2-1,-1/2}$, and $S(c)$ to be something like $\nabla_{x,t}^{-1} \dot X^{n/2-1,1/2}$. However, these spaces contain too many logarithmic divergences to be useful (for instance, $\nabla_{x,t}^{-1} \dot X^{n/2-1,1/2}$ will just barely fail to control $L^\infty_t \dot H^{n/2}_x$).

In the high dimensional case $n \geq 4$ Tataru [41] worked with the Besov counterparts $\dot X_k^{n/2-1,-1/2,1}$, $\dot X_k^{n/2,1/2,1}$ of the above spaces, and showed that if these spaces were combined with Strichartz spaces such as $L^\infty_t \dot B_2^{n/2,1}$ (for $S(c)$) and $L^1_t \dot B_2^{n/2-1,1}$ (for $N$), then one could recover all the estimates above provided that one localized each function to a single dyadic block. This is enough to obtain a regularity and well-posedness theory for the Besov space $\dot B_2^{n/2,1}$. When $n = 2, 3$ the Strichartz spaces are not strong enough to close all the above estimates, but in [42] null frame Strichartz spaces (specifically, $L^2_{t_\omega} L^\infty_{x_\omega}$ and $L^\infty_{t_\omega} L^2_{x_\omega}$ type spaces for $S(c)$, and $L^1_{t_\omega} L^2_{x_\omega}$ type spaces for $N$) were introduced in order to resolve this problem.
Our spaces shall be similar to those in [42], but we shall need to prove stronger estimates than what was shown in that paper, and so we have modified the spaces somewhat in order to accomplish this. Specifically, we need to obtain an exponential gain (already implicit in [42]) in all high-high interactions, and we also need an exponential gain in the trilinear estimate $S S_{,\alpha} S^{,\alpha} \subset N$ when one or more derivatives fall on low frequency terms (which is a substantially more difficult gain to obtain, especially when $n = 2$). Both of these gains are essential in this paper because these are the two types of interactions which are not removed or mollified by the renormalization. Finally, we need to select $S(c)$ so that it is a genuine algebra, as opposed to merely obeying product-type estimates when all functions are restricted to dyadic blocks; this is essential in order for us to invert the renormalization, among other things. Heuristically the algebra property should be obtainable just by adding $L^\infty_t L^\infty_x$ control to the space $S(c)$, but the actual proof requires some care, mainly because one needs to commute multiplication by $L^\infty_t L^\infty_x$ functions with frequency projections such as $P_k$ or $Q_j$ at various junctures.

There is a certain amount of discretion in how to select the $S$ and $N$ spaces. However, any attempt to make one of the above estimates easier to prove (for instance, by making a space $S$ or $N$ stronger or weaker) often causes two or three other estimates to become more difficult to prove. The spaces that the author eventually settled upon were obtained via a lengthy trial-and-error procedure, in which extreme difficulties in one of the above estimates were traded (via one or more modifications of the $S$ and $N$ spaces) for slightly less extreme difficulties in other estimates, with this process being iterated until all estimates could eventually be proved by the author. As in other work at or near the
critical problem, a partial duality relationship (94) between $S$ and $N$ can be exploited to slightly reduce the amount of computation required.

The rest of the paper is organized as follows. In Sect. 7 we define the $Q_j$ family of multipliers which is used to define spaces of $\dot X^{s,b}$ type, and prove some technical lemmata allowing us to manipulate the $Q_j$, as well as a Strichartz estimate for these spaces. In Sect. 8 we set out the notation for spherical caps and null frames, define two key spaces $NFA[\kappa]$, $S[k,\kappa]$ adapted to these null frames and caps, and set out their properties. With these preliminaries, and some sector projections defined in Sect. 9, we shall be able to define the spaces $S_k$, $N_k$, $S(c)$ in Sect. 10, and set out some basic estimates for these spaces, including a partial duality between the $S$ and $N$ spaces. We then prove the energy estimate (27) in Sect. 11, which then allows us to demonstrate the quasi-continuity estimate (17) in Sect. 12. The remaining sections are devoted to bilinear and trilinear estimates. We begin with a discussion of the geometry of the cone in Sect. 13. We then prove a basic product estimate, Lemma 12, in Sect. 14. Using this lemma and some additional effort, we shall be able to prove the product estimates (28), (29) in Sect. 15, the algebra estimates (18), (21), (20) in Sect. 16, and the null form estimate (30) in Sect. 17. Finally in Sect. 18 we use the estimates just proven, plus some additional arguments, to prove the trilinear estimate (31).

7. $\dot X^{s,b}$ Type Spaces

We first use a convenient trick (going back at least to [7]) to work in the global Minkowski space $\mathbf{R}^{1+n}$ instead of just the time-localized slab $[-T,T] \times \mathbf{R}^n$. Namely, if $X(\mathbf{R}^{1+n})$ is a Banach space of functions on $\mathbf{R}^{1+n}$ and $T \ge 0$, we define $X([-T,T] \times \mathbf{R}^n)$ to be the restriction of functions in $X$ to $[-T,T] \times \mathbf{R}^n$, with the norm
$$\|\phi\|_{X([-T,T]\times\mathbf{R}^n)} := \inf\{\|\Phi\|_{X(\mathbf{R}^{1+n})} : \Phi|_{[-T,T]\times\mathbf{R}^n} = \phi\}.$$
We shall construct the above Banach spaces first on $\mathbf{R}^{1+n}$, prove the estimates in Theorem 3 in this global setting, and then restrict these spaces to $[-T,T] \times \mathbf{R}^n$ by the above procedure. This restriction will not affect the above estimates (except for some minor technicalities involving time cutoffs in proving (32), which we shall address in Sect. 11). For a further discussion see e.g. [25].

Now that we are working globally in spacetime, we have available the space-time Fourier transform
$$\mathcal{F}\phi(\tau,\xi) := \int\!\!\int e^{-2\pi i(x\cdot\xi + t\tau)}\,\phi(t,x)\,dt\,dx.$$
It will be convenient to define the non-negative quantities $D_0$, $D_+$, $D_-$ on frequency space $\{(\tau,\xi) : \tau \in \mathbf{R}, \xi \in \mathbf{R}^n\}$ by
$$D_0 := |\xi|; \qquad D_+ := |\xi| + |\tau|; \qquad D_- := \bigl||\tau| - |\xi|\bigr|.$$
Note that $D_+ \sim D_0 + D_-$. We shall also define the direction $B \in S^{n-1}$ by $B := \tau\xi/|\tau\xi|$ if $\tau\xi \ne 0$, and $B := 0$ otherwise. We sometimes refer to $D_0$ as the frequency, $D_-$ as the modulation, and $B$ as the direction. We shall often use $D_0$, $D_-$, $D_+$, and $B$ to define
various regions in frequency space, thus for instance $\{D_0 \sim 2^k, D_- \sim 2^j\}$ denotes the frequency region $\{(\tau,\xi) : |\xi| \sim 2^k, \bigl||\xi| - |\tau|\bigr| \sim 2^j\}$. Most of the action shall take place in the region $D_0 \sim D_+$, and one can think of these quantities as heuristically equivalent. In practice, the region $D_+ \gg D_0$ will generate numerous minor sub-cases in the estimates, which can usually be disposed of quickly$^{20}$.

Using the spacetime Fourier transform we can define Littlewood–Paley projections adapted to the light cone. For any integer $j$, define the projection operator $Q_{\le j} = Q_{<j+1}$ by
$$\mathcal{F}(Q_{\le j}\phi)(\tau,\xi) := m_0(2^{-j}D_-)\,\mathcal{F}\phi(\tau,\xi).$$
Similarly define $Q_j$, $Q_{\ge j}$, $Q_{j_1 \le \cdot \le j_2}$, etc. in exact analogy to the corresponding $P$ multipliers. Thus, for instance, $Q_j$ has symbol $m(2^{-j}D_-)$. We observe the decomposition$^{21}$ $\phi = \sum_j Q_j\phi$ for Schwartz functions $\phi$.

For any sign $\pm$, define the operator $Q^\pm_{\le j}$ to be the restriction of $Q_{\le j}$ to the frequency region $\pm\tau \ge 0$, thus $Q_{\le j} = Q^+_{\le j} + Q^-_{\le j}$ and
$$\mathcal{F}(Q^\pm_{\le j}\phi)(\tau,\xi) := \chi_{[0,\infty)}(\pm\tau)\, m_0\bigl(2^{-j}(\pm\tau - |\xi|)\bigr)\,\mathcal{F}\phi(\tau,\xi).$$
Similarly define $Q^\pm_j$, $Q^\pm_{\ge j}$, $Q^\pm_{j_1 \le \cdot \le j_2}$, etc. The reader should be cautioned that $Q^\pm_{\le j} + Q^\pm_{>j}$ does not add up to the identity, but rather to the Riesz projection to the half-space $\{\pm\tau \ge 0\}$.

Note that the operators $P_k$, $Q_j$, $Q^\pm_j$, $Q_{\le j}$, etc. are all spacetime Fourier multipliers and thus commute with each other and with constant-coefficient differential operators. The operator $P_k Q_j$ is essentially a projection onto the functions with spatial frequency $2^k$ and modulation $2^j$.
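The comparability $D_+ \sim D_0 + D_-$ noted in this section can be checked directly from the definitions. The following short verification is a sketch; the explicit constant 2 is our own bookkeeping and is not taken from the text.

```latex
% Upper bound: D_0 + D_- is at most 2 D_+,
% using | |tau| - |xi| | <= |tau| + |xi|.
\[
  D_0 + D_- = |\xi| + \bigl||\tau| - |\xi|\bigr|
  \le |\xi| + |\tau| + |\xi|
  \le 2\,(|\xi| + |\tau|) = 2\,D_+ .
\]
% Lower bound: D_+ is at most 2 (D_0 + D_-),
% using |tau| <= |xi| + | |tau| - |xi| |.
\[
  D_+ = |\xi| + |\tau|
  \le |\xi| + \Bigl(|\xi| + \bigl||\tau| - |\xi|\bigr|\Bigr)
  = 2\,D_0 + D_- \le 2\,(D_0 + D_-) .
\]
```

In particular $D_+ \sim D_0$ exactly when the modulation $D_-$ is at most comparable to the frequency $D_0$.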
Definition 4. A spacetime Fourier multiplier is said to be disposable if its (distributional) convolution kernel is given by a measure with total mass $O(1)$.

By Minkowski’s inequality, a disposable multiplier is bounded on all spacetime-translation-invariant Banach spaces; in practice, this means that these operators can be discarded whenever one wishes. Note also that the composition of two disposable multipliers is also disposable. The Littlewood–Paley operators $P_k$, $P_{\le k}$, $P_{\ge k}$, etc. can easily be seen to be disposable. Unfortunately, the operators $Q_j$, $Q^\pm_j$, $Q_{\le j}$, etc. are not disposable; however, the following two lemmas will serve as adequate (and very useful!) substitutes.

Lemma 3. If $j$, $k$ are integers such that $j \ge k + O(1)$, then $P_{\le k}Q_{\le j}$, $P_k Q_{\le j}$, $P_{\le k}Q_j$, and $P_k Q_j$ are disposable multipliers. If we strengthen the assumption $j \ge k + O(1)$ to $j \ge k + 10$, then $P_k Q^\pm_j$ and $P_k Q^\pm_{\ge j}$ are disposable for either choice of sign $\pm$.

20 Since the wave map equation is Lorentz invariant, there is no singularity near the frequency time axis, and one expects the $D_+ \gg D_0$ case to not be significantly different from, say, the $D_- \sim D_0$ case. Unfortunately the use of spatial Littlewood–Paley projections $P_k$ breaks the Lorentz invariance and introduces these (admittedly artificial) cases. One could use spacetime Littlewood–Paley projections to resolve this problem (as is done in several other papers), but this creates problems elsewhere, specifically with energy estimates and the quasicontinuity property. We recommend that the reader ignore all the $D_+ \gg D_0$ cases in a first reading.

21 Of course, this identity is not true for global free solutions, since $Q_j\phi$ always vanishes then, but all our functions shall be time localized, so this issue does not arise.
Proof. Observe that when $|k - j| > 10$, then $P_k Q^\pm_{\ge j}$ can be factorized as the product of $P_k Q_{\ge j}$ and a multiplier whose symbol is a bump function adapted to $D_+ \sim 2^k$, and is therefore disposable. Similarly for $P_k Q^\pm_j$. Thus it suffices to prove the claims in the first paragraph. In fact, it suffices to verify the claim for $P_{\le k}Q_{\le j}$, as the other projections are linear combinations of this type of multiplier. We may rescale $k = 0$. The symbol $m$ of $P_{\le 0}Q_{\le j}$ is supported on the region $|\xi| \lesssim 1$, $|\tau| \lesssim 2^j$, is smooth except when $\xi = 0$ or $\tau = 0$, and obeys the derivative bounds
$$|\partial_\tau^s \partial_\xi^\alpha m(\tau,\xi)| \lesssim 2^{-js}\,|\xi|^{1-|\alpha|}$$
for arbitrarily many indices $s$, $\alpha$ with $s \ge 0$, $|\alpha| > 0$ away from the singular regions $\xi = 0$, $\tau = 0$. Standard calculations then show that the kernel $K(t,x)$ obeys the bounds
$$|K(t,x)| \lesssim (1 + |x|)^{-n-1+\varepsilon}\, 2^j\, (1 + 2^j|t|)^{-2}$$
for any $\varepsilon > 0$, which is integrable as desired.
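The integrability of the kernel bound at the end of this proof, uniformly in $j$, can be made explicit. The computation below is a sketch under the assumption that one fixes some $0 < \varepsilon < 1$; the symbol $|S^{n-1}|$ for the surface measure of the unit sphere and the variables $u$, $\rho$ are our own notation.

```latex
% The bound factors into a time integral and a space integral.
% Time integral: substitute u = 2^j t, so the factor 2^j is absorbed.
\[
  \int_{\mathbf{R}} 2^j\,(1 + 2^j|t|)^{-2}\,dt
  = \int_{\mathbf{R}} (1 + |u|)^{-2}\,du = 2 .
\]
% Space integral: polar coordinates, with rho^{n-1} <= (1+rho)^{n-1}.
\[
  \int_{\mathbf{R}^n} (1 + |x|)^{-n-1+\varepsilon}\,dx
  = |S^{n-1}| \int_0^\infty (1+\rho)^{-n-1+\varepsilon}\rho^{n-1}\,d\rho
  \le |S^{n-1}| \int_0^\infty (1+\rho)^{-2+\varepsilon}\,d\rho
  = \frac{|S^{n-1}|}{1-\varepsilon} .
\]
% Hence the kernel has total mass O(1), independently of j.
```

Since neither factor depends on $j$, the total mass is $O(1)$, which is exactly the disposability condition of Definition 4.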
Lemma 4. The operators $Q_j$, $Q_{\le j}$, $Q_{\ge j}$, $Q_{j_1 \le \cdot \le j_2}$, etc. are bounded on the spaces $L^p_t L^2_x$ for all $1 \le p \le \infty$.

Proof. It suffices to check $Q_{\le j}$. By scaling we may take $j = 0$. We then decompose
$$Q_{\le 0} = P_{\le 10}Q_{\le 0} + P_{>10}Q^+_{\le 0} + P_{>10}Q^-_{\le 0}.$$
The first term is disposable by Lemma 3, so by conjugation symmetry it suffices to demonstrate the boundedness of $P_{>10}Q^+_{\le 0}$. It suffices to show that the multiplier with symbol $m_0(\tau - |\xi|)$ is bounded on $L^p_t L^2_x$. However, the transformation $U$ given by
$$\mathcal{F}(U\phi)(\tau,\xi) := \mathcal{F}\phi(\tau - |\xi|, \xi)$$
is easily seen to be an isometry on $L^p_t L^2_x$ (indeed, for each time $t$, $U$ is just the unitary operator $e^{2\pi i t\sqrt{-\Delta}}$). The claim then follows by conjugating by $U$ and observing that the multiplier with symbol $m_0(\tau)$ is bounded on $L^p_t L^2_x$.

Of course, on $L^2_t L^2_x$ one does not need the above lemmata, and can just use Plancherel to discard all multipliers with bounded symbols.

In the sequel we shall frequently be faced with estimating bilinear expressions such as $Q(\phi\psi)$, where $Q$ is some disposable multiplier and $\phi$ has a larger frequency support than $\psi$. Often we shall use the $Q$ projection to localize $\phi$ to some frequency region (e.g. replacing $\phi$ by $\tilde Q\phi$ for some variant $\tilde Q$ of $Q$), and then use one of the above lemmata to discard the $Q$. This effectively moves the multiplier $Q$ from $\phi\psi$ onto $\phi$, albeit at the cost of enlarging $Q$ slightly. We shall exploit this trick in much the same way that Lemma 2 was exploited in previous sections.

Because we wish to treat the two dimensional case $n = 2$, we will not have access to good $L^2_t L^\infty_x$ or $L^2_t L^4_x$ Strichartz estimates, in contrast to [40]. However, we have the following partial substitute (which basically comes from the $L^4_t L^\infty_x$ estimate):

Lemma 5 (Improvement to Bernstein’s inequality). If $\phi$ has Fourier support in the region $D_0 \sim 2^k$, $D_- \sim 2^j$, then
$$\|\phi\|_{L^2_t L^\infty_x} \lesssim \chi^{(4)}_{j \ge k}\, 2^{nk/2}\, \|\phi\|_{L^2_t L^2_x}.$$
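Proof. We may rescale $j = 0$. We may assume that $k \gg 1$ since the claim follows from Bernstein’s inequality (5) otherwise. From the Poisson summation formula we can construct a Schwartz function $a(t)$ whose Fourier transform is supported on the interval $|\tau| \ll 1$ and which satisfies
$$1 = \sum_{s \in \mathbf{Z}} a^3(t - s)$$
for all $t \in \mathbf{R}$. From Hölder we have
$$\|\phi\|_{L^2_t L^\infty_x} = \Bigl\|\sum_s a^3(\cdot - s)\phi\Bigr\|_{L^2_t L^\infty_x} \lesssim \Bigl(\sum_s \|a^2(\cdot - s)\phi\|^2_{L^2_t L^\infty_x}\Bigr)^{1/2} \lesssim \Bigl(\sum_s \|a(\cdot - s)\phi\|^2_{L^4_t L^\infty_x}\Bigr)^{1/2}.$$
On the other hand, from the $X^{s,b}$ version (as in [1])
$$\|\phi\|_{L^4_t L^\infty_x} \lesssim \bigl\|\langle D_0\rangle^{n/2-1/4}\,\langle D_-\rangle^{1/2+}\,\mathcal{F}\phi\bigr\|_{L^2_\tau L^2_\xi}$$
of the $L^4_t L^\infty_x$ Strichartz estimate (see e.g. [12]), the Fourier support of $a(\cdot - s)\phi$ and Plancherel we have
$$\|a(\cdot - s)\phi\|_{L^4_t L^\infty_x} \lesssim 2^{-k/4}\, 2^{nk/2}\, \|a(\cdot - s)\phi\|_{L^2_t L^2_x},$$
and the claim follows by square-summing in $s$.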
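Two bookkeeping steps in the proof of Lemma 5 can be spelled out as follows. This is a sketch: the constant $\|a\|_{L^4_t}$ is finite because $a$ is Schwartz, and reading the exponent $1/2+$ as $1/2+\epsilon$ for a small $\epsilon > 0$ is an assumption about the notation.

```latex
% Hölder in the time variable with 1/2 = 1/4 + 1/4, applied to
% a^2(t-s) phi = a(t-s) * [a(t-s) phi]; the first factor depends on t only.
\[
  \|a^2(\cdot - s)\phi\|_{L^2_t L^\infty_x}
  \le \|a(\cdot - s)\|_{L^4_t}\,\|a(\cdot - s)\phi\|_{L^4_t L^\infty_x}
  = \|a\|_{L^4_t}\,\|a(\cdot - s)\phi\|_{L^4_t L^\infty_x} .
\]
% Size of the weights on the Fourier support of a(.-s) phi after
% rescaling j = 0: frequency D_0 ~ 2^k with k >> 1, modulation D_- <~ 1.
\[
  \langle D_0\rangle^{n/2 - 1/4} \sim 2^{(n/2 - 1/4)k} = 2^{-k/4}\,2^{nk/2},
  \qquad
  \langle D_-\rangle^{1/2+} \lesssim 1 .
\]
```

The factor $2^{-k/4}$ is the gain over the Bernstein exponent $2^{nk/2}$ in the regime $j < k$.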
One can improve this bound substantially when $n > 2$, but we shall not need to do so here. The factor $\chi^{(4)}_{j \ge k}$ is a gain over Bernstein’s inequality when $j < k$, and reflects the Cp properties of the cone (as already indicated by Strichartz estimates).

For any $s, b \in \mathbf{R}$, $k \in \mathbf{Z}$ and $1 \le q \le \infty$, we define $\dot X_k^{s,b,q}$ to be the completion of the space of all Schwartz functions with Fourier support in $2^{k-5} \le D_0 \le 2^{k+5}$ whose norm
$$\|\phi\|_{\dot X_k^{s,b,q}} := 2^{sk}\Bigl(\sum_j \bigl[2^{bj}\,\|Q_j\phi\|_{L^2_{t,x}}\bigr]^q\Bigr)^{1/q}$$
is finite, with the usual supremum convention when $q = \infty$.

As a first approximation, one should think of $\phi_k$ as belonging to spaces of strength comparable to $\nabla_{x,t}^{-1}\dot X_k^{n/2-1,1/2,q}$ for some $q$, while $\Box\phi_k$ belongs to spaces of strength comparable to $\dot X_k^{n/2-1,1/2-1,q}$. In sub-critical contexts the value of $q$ is usually irrelevant, but some care must be taken with this index for our situation.

The space $\dot X_k^{s,b,1}$ is an atomic space, whose atoms are functions with spacetime Fourier support in the region $\{2^{k-5} \le D_0 \le 2^{k+5}, D_- \sim 2^j\}$ and have an $L^2_t L^2_x$ norm of $O(2^{-sk}2^{-j/2})$ for some integer $j$. The verification of multi-linear estimates on these spaces then reduces to verifying the estimates on atoms. Unfortunately only a portion of our waves $\phi$ can be placed into such a nice space.

From Plancherel we observe the duality relationship
$$\sup\{|\langle\phi,\psi\rangle| : \|\phi\|_{\dot X_k^{s,b,1}} \le 1\} \sim 2^{-k(s+s')}\,\|\psi\|_{\dot X_k^{s',-b,\infty}} \tag{62}$$
for all $k$, $s$, $s'$, $b$ and Schwartz $\psi \in \dot X_k^{s',-b,\infty}$, where $\langle\phi,\psi\rangle := \int\!\!\int \phi\overline{\psi}\,dx\,dt$ is the usual inner product.

8. Null Frames

We now set out the machinery of null frames which we shall need to define our spaces. In particular we shall describe the complementary null frame spaces $NFA[\kappa]$ and $S[k,\kappa]$, which shall play a key role in the construction of $N_k$ and $S_k$ respectively.

We define a spherical cap to be any subset $\kappa$ of $S^{n-1}$ of the form $\kappa = \{\omega \in S^{n-1} : |\omega - \omega_\kappa| < r_\kappa\}$ for some $\omega_\kappa \in S^{n-1}$ and $0 < r_\kappa < 2$. We call $\omega_\kappa$ and $r_\kappa$ the center and radius of $\kappa$ respectively. If $\kappa$ is a cap and $C > 0$, we use $C\kappa$ to denote the cap with the same center but $C$ times the radius. If $\pm$ is a sign, we use $\pm\kappa$ to denote the cap with the same radius but center $\pm\omega_\kappa$.

For any direction $\omega \in S^{n-1}$, we define the null direction $\theta_\omega$ by
$$\theta_\omega := \frac{1}{\sqrt{2}}(1, \omega)$$
and the null plane $NP(\omega)$ by
$$NP(\omega) := \{(t,x) \in \mathbf{R}^{1+n} : (t,x)\cdot\theta_\omega = 0\},$$
where $(t,x)\cdot(t',x') := tt' + x\cdot x'$ is the usual Euclidean inner product. We can parameterize physical space $\mathbf{R}^{1+n}$ by the null co-ordinates $(t_\omega, x_\omega) \in \mathbf{R} \times NP(\omega)$ defined by
$$t_\omega := (t,x)\cdot\theta_\omega; \qquad x_\omega := (t,x) - t_\omega\theta_\omega.$$
One can similarly parameterize frequency space by $(\tau_\omega, \xi_\omega) \in \mathbf{R} \times NP(\omega)$ defined by
$$\tau_\omega := (\tau,\xi)\cdot\theta_\omega; \qquad \xi_\omega := (\tau,\xi) - \tau_\omega\theta_\omega.$$
We can then define Lebesgue spaces $L^q_{t_\omega}L^r_{x_\omega}$, $L^q_{\tau_\omega}L^r_{\xi_\omega}$ in the usual manner. The trivial identity
$$\|\phi\|_{L^2_t L^2_x} \sim \|\phi\|_{L^2_{t_\omega}L^2_{x_\omega}} \tag{63}$$
shall be crucial in linking the null frame estimates to the Euclidean frame estimates.

The null plane $NP(\omega)$ contains the null direction $(1, -\omega)$. As such, it is not conducive to good energy estimates; for instance, control of $\Box\phi$ in $L^1_{t_\omega}L^2_{x_\omega}$ does not give satisfactory control of $\phi$ in general. However, if one also knows that $\phi$ has Fourier support in a sector $\{B \in \kappa\}$ and that $\omega$ is outside of $2\kappa$, then one can recover good energy estimates in this co-ordinate system (but losing a factor of $1/\mathrm{dist}(\omega,\kappa)$). This motivates the definition of a Banach space $NFA[\kappa]$ of null frame atoms oriented away from $\kappa$. More precisely:
Definition 5. For any cap $\kappa$, we define $NFA[\kappa]$ to be the atomic Banach space whose atoms are functions $F$ with
$$\|F\|_{L^1_{t_\omega}L^2_{x_\omega}} \le \mathrm{dist}(\omega, \kappa)$$
for some $\omega \notin 2\kappa$.

The spaces $NFA[\kappa]$ will be one of the more exotic building blocks for our nonlinearity space $N_k$, and appear implicitly in [42]. The reader should think of $NFA[\kappa]$ as a variant of $L^1_t L^2_x$. Since the norm $NFA[\kappa]$ is defined entirely using Lebesgue spaces, we have the estimate
$$\|\phi\psi\|_{NFA[\kappa]} \lesssim \|\phi\|_{L^\infty_t L^\infty_x}\,\|\psi\|_{NFA[\kappa]} \tag{64}$$
for all Schwartz $\phi$, $\psi$. We also observe the nesting inequality
$$\|F\|_{NFA[\kappa']} \le \|F\|_{NFA[\kappa]} \tag{65}$$
whenever $\kappa' \subset \kappa$.

We now construct a Banach space $S[k,\kappa]$ which is something of a dual space$^{22}$ to $NFA[\kappa]$, and will be a key part of the construction of the spaces $S_k$ and $S(c)$.

Proposition 2. For each integer $k$ and cap $\kappa$ we can associate a Banach space $S[k,\kappa]$, such that the following properties hold for all integers $k$, $k'$, caps $\kappa$, $\kappa'$, and Schwartz functions $\phi$, $\psi$.

– ($S[k,\kappa]$ is stable under $L^\infty$ multiplications) We have
$$\|\phi\psi\|_{S[k,\kappa]} \lesssim \|\phi\|_{L^\infty_t L^\infty_x}\,\|\psi\|_{S[k,\kappa]}. \tag{66}$$
– (Nesting property) We have
$$\|\phi\|_{S[k,\kappa]} \lesssim \|\phi\|_{S[k,\kappa']} \tag{67}$$
whenever $\kappa' \subset \kappa$.
– ($S[k,\kappa]$ is dimensionless) For any integer $j$, we have
$$\|\phi(2^j t, 2^j x)\|_{S[k+j,\kappa]} = \|\phi(t,x)\|_{S[k,\kappa]}. \tag{68}$$
– (Energy estimate) We have
$$\|\phi\|_{L^\infty_t L^2_x} \lesssim 2^{-nk/2}\,\|\phi\|_{S[k,\kappa]}. \tag{69}$$
– (Duality) We have
$$|\langle\phi,\psi\rangle| \lesssim 2^{-nk/2}\,\|\phi\|_{S[k,\kappa]}\,\|\psi\|_{NFA[\kappa]}. \tag{70}$$

22 The reader may think of this space $S[k,\kappa]$ as something like a Strichartz space, containing norms similar to $L^\infty_t L^2_x$ and $L^2_t L^\infty_x$, normalized to frequencies $2^k$ and adapted to null frames oriented in the direction $\kappa$. The main reason for introducing this space is to restore some vestige of the crucial $L^2_t L^\infty_x$ type of Strichartz estimates, which are otherwise lacking in dimension $n = 2$.
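– (Consistency) We have
$$\|\phi\|_{S[k,\kappa]} \sim \|\phi\|_{S[k',\kappa]} \tag{71}$$
whenever $k = k' + O(1)$.
– (Product estimates) We have the estimates$^{23}$
$$\|\phi\psi\|_{NFA[\kappa]} \lesssim \frac{|\kappa'|^{1/2}}{2^{k'/2}\,\mathrm{dist}(\kappa,\kappa')}\,\|\phi\|_{L^2_t L^2_x}\,\|\psi\|_{S[k',\kappa']} \tag{72}$$
and
$$\|\phi\psi\|_{L^2_t L^2_x} \lesssim \frac{|\kappa'|^{1/2}}{2^{nk/2}\,2^{k'/2}\,\mathrm{dist}(\kappa,\kappa')}\,\|\phi\|_{S[k,\kappa]}\,\|\psi\|_{S[k',\kappa']} \tag{73}$$
whenever $2\kappa \cap 2\kappa' = \emptyset$.
– (Square-summability) Let $0 < r \le 2^{-5}r_\kappa$ be a radius, and suppose that $K$ is a finitely overlapping collection of caps of radius $r$ which are contained in $\kappa$. For each $\kappa' \in K$ we associate a Schwartz function $\phi_{\kappa'}$ with Fourier support in the region $D_0 \sim 2^k$, $B \in \kappa'$, $D_- \lesssim r^2 2^k$. Then we have
$$\Bigl\|\sum_{\kappa' \in K}\phi_{\kappa'}\Bigr\|_{S[0,\kappa]} \lesssim \Bigl(\sum_{\kappa' \in K}\|\phi_{\kappa'}\|^2_{S[0,\kappa']}\Bigr)^{1/2}. \tag{74}$$
– ($S[k,\kappa]$ contains $\dot X_k^{n/2,1/2,1}$ waves with direction in $\kappa$) We have the estimate
$$\|\phi\|_{S[k,\kappa]} \lesssim \|\phi\|_{\dot X_k^{n/2,1/2,1}} \tag{75}$$
whenever $\phi \in \dot X_k^{n/2,1/2,1}$ has Fourier support on $\{B \in \kappa\}$.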
Proof. Step 1. Construction of $S[k,\kappa]$.

We shall need to define some intermediate spaces, which will only be used inside the proof of this proposition. Define $NFA^*[\kappa]$ to be the space of functions $\phi$ whose norm
$$\|\phi\|_{NFA^*[\kappa]} := \sup_{\omega \notin 2\kappa} \mathrm{dist}(\omega,\kappa)\,\|\phi\|_{L^\infty_{t_\omega}L^2_{x_\omega}}$$
is finite. Observe that $NFA^*[\kappa]$ is the dual of $NFA[\kappa]$. We also define the plane wave space $PW[\kappa]$ to be the atomic Banach space whose atoms are functions $\phi$ with
$$\|\phi\|_{L^2_{t_\omega}L^\infty_{x_\omega}} \le 1$$
for some $\omega \in \kappa$.

23 The angular separation condition $2\kappa \cap 2\kappa' = \emptyset$ is only really needed when $n = 2$. When $n > 2$ the Strichartz estimates allow one to make $S(c)$ contain $L^4_t L^4_x$ type spaces, and $N_k$ contain $L^{4/3}_t L^{4/3}_x$ type spaces, in which case one can just use Hölder as a substitute for (72), (73). The factor $|\kappa'|^{1/2} \sim r_{\kappa'}^{(n-1)/2}$ allows us to obtain a gain in the small angle interaction case when $n > 1$.
We now define $S[k,\kappa]$ by the norm
$$\|\phi\|_{S[k,\kappa]} := 2^{nk/2}\,\|\phi\|_{NFA^*[\kappa]} + |\kappa|^{-1/2}\,2^{k/2}\,\|\phi\|_{PW[\kappa]} + 2^{nk/2}\,\|\phi\|_{L^\infty_t L^2_x}. \tag{76}$$
The estimates (66), (67), (70), (69), (71) and (68) are now straightforward and are left to the reader. It remains to prove (72), (73), (74), and (75).

Step 2. Proof of the product estimates (72), (73).

The estimate (73) will follow from (72), (70), and duality, so we need only prove (72). By (76) it suffices to show
$$\|\phi\psi\|_{NFA[\kappa]} \lesssim \frac{1}{\mathrm{dist}(\kappa,\kappa')}\,\|\phi\|_{L^2_t L^2_x}\,\|\psi\|_{PW[\kappa']}.$$
But this follows by reducing $\psi$ to a $PW[\kappa']$ atom, then using Hölder and (63).

Step 3. Proof of the square-summability estimate (74).

Fix $\kappa$. By (68) we may rescale $k = 0$. By (67) we may replace the $S[0,\kappa']$ norms on the right-hand side by $S[0,\kappa]$. We expand the left-hand side of (74) using (76) and treat the three components separately.

Step 3(a). The contribution of $PW[\kappa]$. By the triangle inequality and Cauchy–Schwarz we can bound this contribution by
$$|\kappa|^{-1/2}\,|K|^{1/2}\Bigl(\sum_{\kappa' \in K}\|\phi_{\kappa'}\|^2_{PW[\kappa]}\Bigr)^{1/2}.$$
Since the caps in $K$ are finitely overlapping, we have $|K| \lesssim |\kappa|/r^{n-1}$, and the claim then follows from (76).

Step 3(b). The contribution of $L^\infty_t L^2_x$. This follows from Plancherel and the observation that the $\phi_{\kappa'}$ have finitely overlapping $\xi$-support as $\kappa'$ varies.
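One possible way to unwind the cap counting in Step 3(a) is sketched below (with $k = 0$). The normalisation $|\kappa'| \sim r^{n-1}$ for a cap of radius $r$ on $S^{n-1}$ is our own remark, and this is only one reading of how the bound $|K| \lesssim |\kappa|/r^{n-1}$ is used.

```latex
% The number of caps converts the weight of the large cap kappa
% into the weight of a small cap kappa' of radius r.
\[
  |\kappa|^{-1/2}\,|K|^{1/2}
  \lesssim |\kappa|^{-1/2}\Bigl(\frac{|\kappa|}{r^{n-1}}\Bigr)^{1/2}
  = r^{-(n-1)/2} \sim |\kappa'|^{-1/2} .
\]
% Combined with the triangle inequality and Cauchy--Schwarz in kappa':
\[
  |\kappa|^{-1/2}\Bigl\|\sum_{\kappa'\in K}\phi_{\kappa'}\Bigr\|_{PW[\kappa]}
  \lesssim \Bigl(\sum_{\kappa'\in K}
     \bigl(|\kappa'|^{-1/2}\,\|\phi_{\kappa'}\|_{PW[\kappa]}\bigr)^2\Bigr)^{1/2} .
\]
% Every PW[kappa'] atom is also a PW[kappa] atom when kappa' lies in kappa,
% so ||phi||_{PW[kappa]} <= ||phi||_{PW[kappa']}.
```

Each summand on the right is then controlled by the plane wave component of (76) with the weight of the small cap $\kappa'$.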
Step 3(c). The contribution of $NFA^*[\kappa]$. Suppose that $\omega$ is a direction such that $\omega \notin 2\kappa$. Then from elementary geometry we see that the functions $\phi_{\kappa'}$ have finitely overlapping $\xi_\omega$-Fourier support as $\kappa'$ varies. We thus have from Plancherel that
$$\Bigl\|\sum_{\kappa' \in K}\phi_{\kappa'}(t_\omega)\Bigr\|_{L^2_{x_\omega}} \lesssim \Bigl(\sum_{\kappa' \in K}\|\phi_{\kappa'}(t_\omega)\|^2_{L^2_{x_\omega}}\Bigr)^{1/2}$$
for all $t_\omega \in \mathbf{R}$. Taking suprema in $t_\omega$, then taking suprema in $\omega$ and using (76) we obtain the claim.

Step 4. Proof of the embedding (75).

By (68) we may rescale $k = 0$. By the atomic nature of $\dot X_0^{n/2,1/2,1}$ and conjugation symmetry we may assume that $\phi$ has Fourier support in the region $\tau > 0$, $D_0 \sim 1$, $D_- \sim 2^j$, $B \in \kappa$ for some integer $j$, in which case we need to show
$$\|\phi\|_{S[k,\kappa]} \lesssim 2^{j/2}\,\|\phi\|_{L^2_t L^2_x}.$$
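Fix $j$. We now prove the claim for the three components of (76) separately.

Step 4(a). The estimation of the $L^\infty_t L^2_x$ norm. By Plancherel and Minkowski we have
$$\|\phi\|_{L^\infty_t L^2_x} \lesssim \|\mathcal{F}\phi\|_{L^2_\xi L^1_\tau}. \tag{77}$$
For fixed $\xi$, the function $\mathcal{F}\phi$ has $\tau$-support in an interval of length $O(2^j)$. The claim then follows by using Hölder and Plancherel.

Step 4(b). The estimation of the $NFA^*[\kappa]$ norm. Let $\omega \in S^{n-1}$ be such that $\omega \notin 2\kappa$. By Plancherel and Minkowski we have
$$\|\phi\|_{L^\infty_{t_\omega}L^2_{x_\omega}} \lesssim \|\mathcal{F}\phi\|_{L^2_{\xi_\omega}L^1_{\tau_\omega}}.$$
From elementary geometry we see that for fixed $\xi_\omega$, the function $\mathcal{F}\phi$ has $\tau_\omega$-support in an interval of length $O(2^j/\mathrm{dist}(\omega,\kappa)^2)$. The claim then follows by using Hölder and Plancherel, and then taking suprema in $\omega$.

Step 4(c). The estimation of the $PW[\kappa]$ norm. This estimate is perhaps the most interesting one in the proposition; it shows that even though the $L^2_t L^\infty_x$ Strichartz estimate fails in two dimensions, some analogue of this estimate can still hold if one is willing to use null frames. From the Fourier inversion formula in polar co-ordinates and the Fourier support of $\phi$ we have
$$\phi(t,x) = C\int_{\omega \in \kappa}\int_{|a| \lesssim 2^j} e^{2\pi i a t}\int_{r \sim 1}\mathcal{F}\phi(r + a, r\omega)\,e^{2\pi i r (t,x)\cdot(1,\omega)}\,r^{n-1}\,dr\,da\,d\omega.$$
From Minkowski’s inequality and the definition of $PW[\kappa]$ we thus have
$$\|\phi\|_{PW[\kappa]} \lesssim \int_{\omega \in \kappa}\int_{|a| \lesssim 2^j}\Bigl\|e^{2\pi i a t}\int_{r \sim 1}\mathcal{F}\phi(r + a, r\omega)\,e^{2\pi i r (t,x)\cdot(1,\omega)}\,r^{n-1}\,dr\Bigr\|_{L^2_{t_\omega}L^\infty_{x_\omega}}\,da\,d\omega.$$
The $e^{2\pi i a t}$ factor is bounded and can be discarded. Since $(t,x)\cdot(1,\omega) = \sqrt{2}\,t_\omega$, we can estimate the previous by
$$\int_{\omega \in \kappa}\int_{|a| \lesssim 2^j}\Bigl\|\int_{r \sim 1}\mathcal{F}\phi(r + a, r\omega)\,e^{2\pi i \sqrt{2}\,r t_\omega}\,r^{n-1}\,dr\Bigr\|_{L^2_{t_\omega}}\,da\,d\omega.$$
By Plancherel this is bounded by
$$\int_{\omega \in \kappa}\int_{|a| \lesssim 2^j}\Bigl(\int_{r \sim 1}|\mathcal{F}\phi(r + a, r\omega)\,r^{n-1}|^2\,dr\Bigr)^{1/2}\,da\,d\omega.$$
By Cauchy–Schwarz this is bounded by
$$|\kappa|^{1/2}\,2^{j/2}\Bigl(\int_{\omega \in \kappa}\int_{|a| \lesssim 2^j}\int_{r \sim 1}|\mathcal{F}\phi(r + a, r\omega)\,r^{n-1}|^2\,dr\,da\,d\omega\Bigr)^{1/2}.$$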
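The Cauchy–Schwarz step just used can be written out as follows. This is a sketch in which the auxiliary function $G(a,\omega)$ is our own shorthand for the inner $L^2_r$ norm, not notation from the text.

```latex
% Shorthand (hypothetical notation) for the inner integral in r:
\[
  G(a,\omega) := \Bigl(\int_{r\sim 1}
     |\mathcal{F}\phi(r+a, r\omega)\, r^{n-1}|^2\,dr\Bigr)^{1/2} .
\]
% Cauchy--Schwarz on the domain {omega in kappa, |a| <~ 2^j},
% whose measure is O(|kappa| 2^j):
\[
  \int_{\omega\in\kappa}\int_{|a|\lesssim 2^j} G(a,\omega)\,da\,d\omega
  \le \Bigl(\int_{\omega\in\kappa}\int_{|a|\lesssim 2^j} 1\,da\,d\omega\Bigr)^{1/2}
      \Bigl(\int_{\omega\in\kappa}\int_{|a|\lesssim 2^j} G(a,\omega)^2\,da\,d\omega\Bigr)^{1/2}
  \lesssim |\kappa|^{1/2}\,2^{j/2}\,
      \Bigl(\int_{\omega\in\kappa}\int_{|a|\lesssim 2^j} G(a,\omega)^2\,da\,d\omega\Bigr)^{1/2} .
\]
```

The factor $|\kappa|^{1/2}$ produced here is what cancels the weight $|\kappa|^{-1/2}$ attached to the $PW[\kappa]$ component in (76), leaving the required $2^{j/2}$.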
Undoing the polar co-ordinates we can estimate this by
$$|\kappa|^{1/2}\,2^{j/2}\,\|\mathcal{F}\phi\|_{L^2_\tau L^2_\xi}.$$
By Plancherel we thus see that the contribution of the $PW[\kappa]$ norm is acceptable. This finishes the proof of (75).

9. Sector Projections

We now define some sector projection operators which will be needed in the definition of $N_k$ and $S(c)$.

For every real $l > 10$ we let $F_l$ be a maximal $2^{-l}$-separated subset of the unit sphere $S^{n-1}$. We let $K_l$ denote the space of caps
$$K_l := \{\kappa : \omega_\kappa \in F_l;\ r_\kappa = 2^{-l+5}\}.$$
For any real $l > 10$ and any cap $\kappa \in K_l$, we define a bump function $m_\kappa$ on $S^{n-1}$ such that $1 = \sum_{\kappa \in K_l} m_\kappa$. For any integer $k$, we then define $P_{k,\kappa}$ to be the spatial Fourier multiplier with symbol
$$m_{k,\kappa}(\xi) := m_k(|\xi|)\,m_\kappa(\xi/|\xi|),$$
where $m_k$ is a bump function adapted to $2^{k-4} \le D_0 \le 2^{k+4}$ which equals 1 on $2^{k-3} \le D_0 \le 2^{k+3}$. Thus $m_{k,\kappa}$ is a bump function adapted to the tube
$$\Bigl\{\xi \in \mathbf{R}^n : 2^{k-4} \le D_0 \le 2^{k+4};\ \frac{\xi}{|\xi|} \in \tfrac{1}{2}\kappa\Bigr\} \tag{78}$$
and $\phi = \sum_{\kappa \in K_l} P_{k,\kappa}\phi$ for all functions $\phi$ with Fourier support in $2^{k-3} \le D_0 \le 2^{k+3}$. We shall also need the variant $\tilde P_{k,\kappa}$, given by a spatial Fourier multiplier $\tilde m_{k,\kappa}$ which is a bump function, equals 1 on (78) and is adapted to the enlargement
$$\Bigl\{\xi \in \mathbf{R}^n : 2^{k-5} \le D_0 \le 2^{k+5};\ \frac{\xi}{|\xi|} \in \kappa\Bigr\}$$
of (78). Observe that the multipliers $P_{k,\kappa}$ and $\tilde P_{k,\kappa}$ are disposable.

Lemma 6. Let $j, k \in \mathbf{Z}$, $l \in \mathbf{R}$ be such that $l > 10$ and $j \ge k - 2l + O(1)$, and let $\kappa \in K_l$. Then $\tilde P_{k,\kappa}Q_j$, $\tilde P_{k,\kappa}Q_{\le j}$, and $\tilde P_{k,\kappa}Q_{\ge j}$ are disposable multipliers. If we also assume that $|j - k| > 10$, then $\tilde P_{k,\kappa}Q^\pm_j$ is also disposable. If $j > k + 10$, then $\tilde P_{k,\kappa}Q^\pm_{\ge j}$ is disposable, while if $j < k - 10$, then $\tilde P_{k,\kappa}Q^\pm_{\le j}$ is disposable. The above results also hold when $\tilde P_{k,\kappa}$ is replaced by $P_{k,\kappa}$.
Proof. We shall just prove these claims for P˜k,κ , as the Pk,κ claims then follow from the factorization Pk,κ = Pk,κ P˜k,κ . If |j − k| ≤ 5 then we factorize P˜k,κ Q≤j = P˜k,κ (Pk−10<· 5. We shall show the claim for P˜k,κ Q+ j ; the other multipliers are treated similarly and will be left to the reader.
One may verify using the hypothesis $j \ge k - 2l + O(1)$ that the symbol of $\tilde P_{k,\kappa}Q^+_j$ is a bump function adapted to the parallelepiped
$$\{(\tau,\xi) : \xi\cdot\omega_\kappa \sim 2^k;\ |\xi \wedge \omega_\kappa| \lesssim 2^{k-l};\ \tau = \xi\cdot\omega_\kappa + O(2^j)\}.$$
The kernel is then rapidly decreasing away from the dual of this parallelepiped, and the claim follows.

10. Construction of $N_k$, $S_k$, $S(c)$

In this section we construct the spaces $N_k$, $S_k$, $S(c)$ required for Theorem 3, and prove some of the easier properties about them. The energy estimate (27), the bilinear estimates (18), (21), (20), (30), (28), (29), and the trilinear estimate (31) are quite lengthy to prove and will be deferred to later sections.

The spaces $N_k$, $S_k$, $S(c)$ will be defined in terms of frequency localized variants $N[k]$, $S[k]$. We begin with the definition of the $S(c)$-type spaces.

Definition 6. For any integer $k$, we define $S[k]$ to be the completion of the space of Schwartz functions with Fourier support in the region $\{\xi \in \mathbf{R}^n : 2^{k-3} \le D_0 \le 2^{k+3}\}$ with respect to the norm$^{24}$
$$\|\phi\|_{S[k]} := \|\nabla_{x,t}\phi\|_{L^\infty_t \dot H_x^{n/2-1}} + \|\nabla_{x,t}\phi\|_{\dot X_k^{n/2-1,1/2,\infty}} + \sup_\pm \sup_{l > 10}\Bigl(\sum_{\kappa \in K_l}\|P_{k,\pm\kappa}Q^\pm_{<k-2l}\phi\|^2_{S[k,\kappa]}\Bigr)^{1/2}. \tag{79}$$
The first two terms in (79) are standard, but are not sufficient by themselves to obtain good product estimates$^{25}$. This will be rectified by the addition of the third term in (79), which also essentially appears in [42]. Note that the square sum in the third term of (79) is essentially increasing in $l$ thanks to (74).

It is not a priori clear that the third term in (79) is finite even for Schwartz functions. However, this follows from

Lemma 7. Suppose $\phi$ is a Schwartz function with Fourier support in $\{2^{k-5} \le D_0 \le 2^{k+5}\}$. Then we have
$$\Bigl(\sum_{\kappa \in K_l}\|P_{k,\pm\kappa}Q^\pm_{<k-2l}\phi\|^2_{S[k,\kappa]}\Bigr)^{1/2} \lesssim \|\phi\|_{\dot X_k^{n/2,1/2,1}} \tag{80}$$
for all $l > 0$ and any sign $\pm$.

Proof. By (75) it suffices to show
$$\Bigl(\sum_{\kappa \in K_l}\|P_{k,\pm\kappa}Q^\pm_{<k-2l}\phi\|^2_{\dot X_k^{n/2,1/2,1}}\Bigr)^{1/2} \lesssim \|\phi\|_{\dot X_k^{n/2,1/2,1}}.$$
By the atomic nature of $\dot X_k^{n/2,1/2,1}$ we may assume that $\phi$ has Fourier support in the region $D_- \sim 2^j$ for some integer $j$. But then the claim follows from Plancherel’s theorem and the finite overlap of the $P_{k,\kappa}$ Fourier supports.

24 Unlike the frequency $2^k$ and modulation $2^j$ (which have the units of inverse length), the quantity $2^l$ will measure an angle and is therefore dimensionless.

25 More precisely, a logarithmic divergence occurs whenever one considers the imbalanced modulation case (as defined in Sect. 13).
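A function $\phi$ in $S[k]$ enjoys several estimates. From (79) and Lemma 4 we clearly have
$$\|\phi\|_{L^\infty_t L^2_x},\ \|Q_{\le j}\phi\|_{L^\infty_t L^2_x},\ \|Q_j\phi\|_{L^\infty_t L^2_x},\ \text{etc.} \lesssim 2^{-nk/2}\,\|\phi\|_{S[k]}, \tag{81}$$
and hence from Bernstein’s inequality (5) that
$$\|\phi\|_{L^\infty_t L^\infty_x},\ \|Q_{\le j}\phi\|_{L^\infty_t L^\infty_x},\ \|Q_j\phi\|_{L^\infty_t L^\infty_x},\ \text{etc.} \lesssim \|\phi\|_{S[k]}. \tag{82}$$
Also, from (79) we have
$$\|Q_j\phi\|_{L^2_t L^2_x} \lesssim 2^{-nk/2}\,2^{-j/2}\,\frac{2^k}{2^k + 2^j}\,\|\phi\|_{S[k]}, \tag{83}$$
since $\nabla_{x,t}$ behaves on $Q_j\phi$ like the quantity $D_+ \sim 2^k + 2^j$, at least as far as the $L^2_t L^2_x$ norm is concerned. From this and Lemma 5 we have
$$\|Q_j\phi\|_{L^2_t L^\infty_x} \lesssim \chi^{(4)}_{j \ge k}\,\min(2^{-(j-k)}, 1)\,2^{-j/2}\,\|\phi\|_{S[k]}. \tag{84}$$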
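The passage from (83) to (84) via Lemma 5 can be written out as a short chain. This is a sketch which uses only that $Q_j\phi$ has Fourier support in $D_0 \sim 2^k$, $D_- \sim 2^j$ when $\phi \in S[k]$.

```latex
% Lemma 5 applied to Q_j phi, followed by (83); the powers 2^{nk/2} cancel.
\[
  \|Q_j\phi\|_{L^2_t L^\infty_x}
  \lesssim \chi^{(4)}_{j\ge k}\,2^{nk/2}\,\|Q_j\phi\|_{L^2_t L^2_x}
  \lesssim \chi^{(4)}_{j\ge k}\,2^{-j/2}\,\frac{2^k}{2^k+2^j}\,\|\phi\|_{S[k]} .
\]
% The fraction is comparable to the minimum appearing in (84):
\[
  \frac{2^k}{2^k+2^j} = \frac{1}{1+2^{\,j-k}}
  \le \min\bigl(2^{-(j-k)},\,1\bigr) .
\]
```

The minimum equals 1 for $j \le k$ and decays like $2^{-(j-k)}$ for large modulations $j > k$.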
In each of these four estimates we remark that one can replace $\phi$ by $\phi_t$ by paying a power of $2^k$ on the right-hand side; this is basically because the $S[k]$ norms contain a time derivative $\partial_t$ in addition to spatial derivatives $\nabla_x$.

The above four “Strichartz” estimates will be used very frequently in the sequel, and between them are capable of handling all cases except when two of $\phi$, $\psi$, $\phi\psi$ are close to the light cone$^{26}$, in which case one must resort to the null frame estimates in Lemma 2 instead. It is worth mentioning that of the above four estimates, the $L^2_x$ estimates are more effective for high frequencies, while the $L^2_t$ estimates are more effective for large modulations. For instance, if $\phi$ has low frequency and small modulation, then (82) is probably the best of the above four estimates to use.

The space $S[k]$ also contains $\dot X_k^{n/2,1/2,1}$ functions:

Lemma 8. If $\phi$ is in $\dot X_k^{n/2,1/2,1}$, then
$$\|\phi\|_{S[k]} \lesssim 2^{-k}\,\|\nabla_{x,t}\phi\|_{\dot X_k^{n/2,1/2,1}}.$$

Proof. We may rescale $k = 0$. By the atomic nature of $\dot X_0^{n/2,1/2,1}$ and conjugation symmetry we may assume that $\phi$ has Fourier support in the region $\tau > 0$, $D_0 \sim 1$, $D_- \sim 2^j$ for some integer $j$, in which case we need to show
$$\|\phi\|_{S[k]} \lesssim (1 + 2^j)\,2^{j/2}\,\|\phi\|_{L^2_t L^2_x}.$$
To control the first term of (79), we use (77) to obtain
$$\|\nabla_{x,t}\phi\|_{L^\infty_t L^2_x} \lesssim (1 + 2^j)\,\|\phi\|_{L^\infty_t L^2_x} \lesssim (1 + 2^j)\,\|\mathcal{F}\phi\|_{L^2_\xi L^1_\tau}.$$
Since $\mathcal{F}\phi$ is supported in an interval of length $O(2^j)$, the claim then follows from Hölder and Plancherel.

26 More precisely, in the language of Sect. 13, we shall use (81), (82), (83), (84) when the modulations of $\phi$, $\psi$, $\phi\psi$ are balanced, and rely on Lemma 2 when the modulations are imbalanced.
The second term of (79) is trivial, so it remains only to show
$$\Bigl(\sum_{\kappa \in K_l}\|P_{0,\pm\kappa}Q^\pm_{<-2l}\phi\|^2_{S[0,\kappa]}\Bigr)^{1/2} \lesssim (1 + 2^j)\,2^{j/2}\,\|\phi\|_{L^2_t L^2_x}$$
for all $l > 10$ and signs $\pm$. Fix $l$, $\pm$. We may assume that $j \le -2l + O(1)$ since the left-hand side vanishes otherwise. In particular we have $1 + 2^j \sim 1$. The claim then follows from Lemma 7.

We also have the technical estimate

Lemma 9. For all $\phi \in S[k']$ and $k = k' + O(1)$ we have
$$\|P_k\phi\|_{S[k]} \lesssim \|\phi\|_{S[k']}. \tag{85}$$
Proof. This estimate is easily verified for the first two factors of (79). Thus it only remains to show κ∈Kl
2
Pk,±κ Q±
1/2 φ S[k ]
for all l > 10 and signs ±. ± ± ± Fix ±, l. We split Q±
nk/2 (k−2l)/2
2
2 ± − Q )φ Pk,±κ (Q±
Lt L2x
κ∈Kl
1/2 ,
which by Plancherel is bounded by ± 2nk/2 2(k−2l)/2 (Q±
x
But this is acceptable by the second term in (79). It remains to estimate κ∈Kl
2
Pk,±κ Q±
1/2 .
From the construction of Pk,±κ and the Fourier support of φ we observe the identity Pk,±κ Pk φ = Pk ,±κ Pk φ. Discarding the Pk we then see that this contribution is acceptable by the third term in (79).
We are now in a position to define the full space $S(c)(\mathbf{R}^{1+n})$. We define the space $S(c)(\mathbf{R}^{1+n})$ to be the closure of the space of affinely Schwartz$^{27}$ functions $\phi$ whose norm
$$\|\phi\|_{S(c)(\mathbf{R}^{1+n})} := \|\phi\|_{L^\infty_t L^\infty_x} + \sup_k c_k^{-1}\,\|\phi_k\|_{S[k]} \tag{86}$$
is finite. We can then form the restricted space $S(c) := S(c)([-T,T] \times \mathbf{R}^n)$ in the usual manner.

We then define the space $S_k(\mathbf{R}^{1+n})$ to be the subspace of $S(c)(\mathbf{R}^{1+n})$ given by the norm
$$\|\phi\|_{S_k(\mathbf{R}^{1+n})} := \sup_{k'} 2^{\delta_1|k - k'|}\,\|\phi_{k'}\|_{S[k']}, \tag{87}$$
and define $S_k := S_k([-T,T] \times \mathbf{R}^n)$ in the usual manner. Thus $S_k$ is a larger variant of $S[k]$ which allows for some leakage outside of the frequency region $D_0 \sim 2^k$. Most of the estimates in this half of the paper will contain a decay of $\chi^{(2)}$ or stronger, which shall be more than enough to overcome this $2^{\delta_1|k - k'|} = \chi^{-(1)}_{k = k'}$ leakage.

We shall also define the slightly weaker norm $S(1)(\mathbf{R}^{1+n})$ by
$$\|\phi\|_{S(1)(\mathbf{R}^{1+n})} := \|\phi\|_{L^\infty_t L^\infty_x} + \sup_k \|\phi_k\|_{S[k]}. \tag{88}$$
From (86), (82), (87), and (85) we observe the embeddings
$$S(c)(\mathbf{R}^{1+n}),\ S_k(\mathbf{R}^{1+n}),\ S[k] \subseteq S(1)(\mathbf{R}^{1+n}). \tag{89}$$
The space $S(1)$ will therefore be convenient for unifying the treatment of $S(c)(\mathbf{R}^{1+n})$ and $S[k]$ in the estimates in Theorem 3.

Certain portions of Theorem 3 can now be easily verified. It is easy to see that $S(c)$ and $S_k$ satisfy the required invariance properties, as well as (19), (23), and (24). The estimate (22) easily follows from (82) and (87). It is also clear that $S(c)$ contains the identity function 1, and that the function $a(T)$ defined in (16) is monotone non-decreasing.

Having defined the Banach algebra $S(c)$, which will be used to hold the wave map $\phi$, we now define the spaces $N[k]$ and $N_k$, which are used to hold the renormalized frequency pieces of $\Box\phi$. The invariance properties of $N_k$ and (25) shall be immediate; however, the proof of (27) is far more involved, and shall be deferred to the next section.

Just as the space $S[k]$ was defined as the intersection of three different “sup-type” Banach spaces, the space $N[k]$ (which is sort of a dual to $S[k]$) will be defined as the sum of three “atomic-type” Banach spaces:

Definition 7. Let $k$ be an integer, and let $F$ be a Schwartz function with Fourier support in the region $2^{k-4} \le D_0 \le 2^{k+4}$. We say that $F$ is an $L^1_t \dot H_x^{n/2-1}$-atom at frequency $2^k$ if
$$\|F\|_{L^1_t L^2_x} \le 2^{-(\frac{n}{2}-1)k}.$$
If $j \in \mathbf{Z}$, we say that $F$ is a $\dot X^{n/2-1,-1/2,1}$-atom with frequency $2^k$ and modulation $2^j$ if $F$ has Fourier support in the region $2^{k-4} \le D_0 \le 2^{k+4}$, $2^{j-5} \le D_- \le 2^{j+5}$ and
$$\|F\|_{L^2_t L^2_x} \le 2^{j/2}\,2^{-(\frac{n}{2}-1)k}.$$

27 We call a function $\phi$ affinely Schwartz if $\phi - e$ is a Schwartz function for some constant $e$.
Finally, if $l>10$ is a real number and $\pm$ is a sign, we say that $F$ is a $\pm$-null frame atom with frequency $2^k$ and angle $2^{-l}$ if there exists a decomposition $F=\sum_{\kappa\in K_l}F_\kappa$ such that each $F_\kappa$ has Fourier support in the region
$$\{(\tau,\xi):\ \pm\tau>0;\ D_-\le 2^{k-2l-50};\ 2^{k-4}\le D_0\le 2^{k+4};\ B\in\tfrac12\kappa\}$$
and
$$\Big(\sum_{\kappa\in K_l}\|F_\kappa\|^2_{NFA[\kappa]}\Big)^{1/2}\le 2^{-(n/2-1)k}.$$
If $F$ is an atom of one of the above three types, then we say that $F$ is an $N[k]$ atom. We let $N[k]$ be the atomic Banach space generated by the $N[k]$ atoms. For future reference we observe that if $F$ is a $\pm$-null frame atom with frequency $2^k$ and angle $2^{-l}$, and $F_\kappa$ is as above, then
$$F_\kappa=\tilde P_{k,\kappa}F_\kappa. \tag{90}$$
From our definition we clearly have the estimates
$$\|F\|_{N[k]}\lesssim\|F\|_{L^1_t\dot H^{n/2-1}_x}\sim 2^{(n/2-1)k}\|F\|_{L^1_tL^2_x} \tag{91}$$
and
$$\|F\|_{N[k]}\lesssim\|F\|_{\dot X_k^{n/2-1,-1/2,1}} \tag{92}$$
whenever $F$ is a Schwartz function with Fourier support in $2^{k-4}\le D_0\le 2^{k+4}$. Furthermore, we have
$$\Big\|\sum_{\kappa\in K_l}F_\kappa\Big\|_{N[k]}\lesssim 2^{(n/2-1)k}\Big(\sum_{\kappa\in K_l}\|F_\kappa\|^2_{NFA[\kappa]}\Big)^{1/2} \tag{93}$$
for all signs $\pm$, $l>10$ and $F_\kappa$ with Fourier support in the region $\{(\tau,\xi):\ \pm\tau>0;\ D_-\le 2^{k-2l-50};\ 2^{k-4}\le D_0\le 2^{k+4};\ B\in\tfrac12\kappa\}$.

Often we will need to estimate the $N_k$ norm of a function $F$ which is a bilinear expression of two other functions. In such cases, the estimate (92) is the most favorable when $F$ has modulation at least as large as its component functions, while (91) is favorable when $F$ has modulation much smaller than its components. The remaining bound (93) is needed when $F$ and one of its components have small modulation, while the other component has large modulation. In this case one must use the geometry of the cone to create some angular separation between the two small modulation functions and then use$^{28}$ (72).

Definition 8. Let $F$ be a Schwartz function and $k$ be an integer. We say that $F$ is an $N_k$ atom if there exists a $k'\in\mathbf Z$ such that $2^{100n|k-k'|}F$ is an $N[k']$ atom. We define $N_k(\mathbf R^{1+n})$ to be the atomic Banach space generated by the $N_k$ atoms. Finally, we define $N_k := N_k([-T,T]\times\mathbf R^n)$ to be the restriction of $N_k(\mathbf R^{1+n})$ to the slab $[-T,T]\times\mathbf R^n$.

28 Informally, provided one has some angular separation, one may pretend that the $S$ spaces controlled (a null frame version of) $L^2_tL^\infty_x\cap L^\infty_tL^2_x$, while (a null frame version of) the space $L^1_tL^2_x$ controlled the $N$ spaces. This allows us to play the same Hölder type games that are already used in the balanced modulation case.
Thus $N_k$ is a very slight enlargement of $N[k]$ which (reluctantly!) allows for some frequency leakage outside the region $D_0\sim 2^k$. Observe that the invariance properties of $N_k$ required in Theorem 3 are immediate, as are (25) and (26).

From (70), (62), and Hölder, we observe the useful duality property
$$|\langle\phi,F\rangle|\lesssim 2^{-(n-1)k}\,\|\phi\|_{S[k]}\,\|F\|_{N[k']} \tag{94}$$
whenever $k=k'+O(1)$ and $\phi\in S[k]$, $F\in N[k']$. Thus up to a scaling factor, $S[k]$ is contained in the dual of $N[k]$ and vice versa. Finally, we observe the useful embedding

Lemma 10. For all $F\in N[k]$ we have
$$\|F\|_{\dot X_k^{n/2-1,-1/2,\infty}}\lesssim\|F\|_{N[k]}.$$
Proof. We control $Q_{\le k+10}F$ and $Q_{>k+10}F$ separately. The contribution of $Q_{\le k+10}F$ is acceptable from Lemma 8, (94), and duality (note that the time derivative in Lemma 8 can be discarded in the Fourier support of $Q_{\le k+10}$).

Now we control $Q_{>k+10}F$. The claim is clear for $\dot X^{n/2-1,-1/2,1}$ atoms, and for null frame atoms $Q_{>k+10}F$ vanishes. It remains to verify the claim for $L^1_t\dot H^{n/2-1}_x$ atoms, or in other words that
$$\|F\|_{\dot X_k^{n/2-1,-1/2,\infty}}\lesssim\|F\|_{L^1_t\dot H^{n/2-1}_x}$$
when $F$ has Fourier support in $2^{k-4}\le D_0\le 2^{k+4}$. But this follows from the formula
$$\mathcal F F(\tau,\xi)=\int e^{-2\pi it\tau}\,\widehat{F(t)}(\xi)\,dt,$$
Plancherel’s theorem, Minkowski’s inequality, and a straightforward computation.
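The "straightforward computation" at the end of this proof can be sketched as follows. The sketch assumes the normalization $\|F\|_{\dot X^{s,b,\infty}_k}=2^{sk}\sup_j 2^{bj}\|Q_jF\|_{L^2_tL^2_x}$ for $F$ with Fourier support in $D_0\sim 2^k$, where $Q_j$ localizes to $D_-\sim 2^j$; this convention is not restated in this section and is an assumption here, as is the auxiliary symbol $G$.

```latex
% Sketch, assuming \|F\|_{\dot X^{s,b,\infty}_k} = 2^{sk} \sup_j 2^{bj} \|Q_j F\|_{L^2_t L^2_x}.
% Pointwise bound from the formula for \mathcal{F}F (G is an auxiliary symbol):
|\mathcal{F}F(\tau,\xi)| \le G(\xi) := \int |\widehat{F(t)}(\xi)|\,dt .
% For fixed \xi, the set \{\tau : D_- \sim 2^j\} has measure O(2^j), so by Plancherel
\|Q_j F\|_{L^2_t L^2_x}
  \lesssim \Big( \int 2^{j}\, G(\xi)^2 \, d\xi \Big)^{1/2}
  = 2^{j/2}\, \|G\|_{L^2_\xi}
  \le 2^{j/2} \int \|\widehat{F(t)}\|_{L^2_\xi}\,dt      % Minkowski
  = 2^{j/2}\, \|F\|_{L^1_t L^2_x} .
% Multiplying by 2^{(n/2-1)k} 2^{-j/2} and taking the supremum over j:
\|F\|_{\dot X^{n/2-1,-1/2,\infty}_k}
  \lesssim 2^{(n/2-1)k}\, \|F\|_{L^1_t L^2_x}
  \sim \|F\|_{L^1_t \dot H^{n/2-1}_x} .
```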
From Lemma 10, we may reverse (92) if $D_-$ is restricted to a single dyadic block, and obtain
$$\|F\|_{\dot X_k^{n/2-1,-1/2,q}}\sim\|F\|_{N[k]} \tag{95}$$
for any $1\le q\le\infty$ when $F$ has Fourier support in the set $D_-\sim 2^j$ for some $j$.

11. Energy Estimates: The Proof of (27)

In this section we prove (27). We remark that this section can be read independently since the techniques used here are not needed elsewhere in the paper.

Step 0. Scaling.

By (87) and Definition 8 it suffices to show that
$$\|\phi_{k'}\|_{S[k']([-T,T]\times\mathbf R^n)}\lesssim\|\Box\phi\|_{N[k]([-T,T]\times\mathbf R^n)}+\|\phi[0]\|_{\dot H^{n/2}\times\dot H^{n/2-1}}$$
for all $k,k'$ with $k'=k+O(1)$. By rescaling $k$, $k'$, and $T$ we may make $k'=0$, so that $k=O(1)$. We may assume that $T\ge 1$, since the case $T<1$ then follows by restricting the $T\ge 1$ estimate to a smaller slab. For any $M>0$, we let $\eta_M(t)$ be a bump function adapted to $[-10M,10M]$ which equals 1 on $[-5M,5M]$.
Fix $k$. By linearity we can split $\phi$ into a free solution and a solution to the inhomogeneous problem, and we address the two cases separately.

Step 1. (Free solutions) Prove (27) when $\Box\phi=0$ on $[-T,T]\times\mathbf R^n$.

We may assume that $\phi[0]$ is Schwartz. Of course we may then extend $\phi$ to be a free solution on all of $\mathbf R^{1+n}$. By Plancherel's theorem we then observe the Fourier representation
$$\mathcal F\phi_0(\tau,\xi)=f_+(\xi)\,\delta(\tau-|\xi|)+f_-(\xi)\,\delta(\tau+|\xi|), \tag{96}$$
where $f_+,f_-$ are supported on $2^{-3}\le D_0\le 2^{3}$ and obey
$$\|f_+\|_2+\|f_-\|_2\lesssim\|\phi[0]\|_{\dot H^{n/2}\times\dot H^{n/2-1}}.$$
It thus suffices to show that
$$\|\eta_T(t)\phi_0\|_{S[0](\mathbf R^{1+n})}\lesssim\|f_+\|_2+\|f_-\|_2.$$
By Lemma 8 it suffices to show
$$\|\nabla_{x,t}(\eta_T(t)\phi_0)\|_{\dot X_0^{n/2,1/2,1}}\lesssim\|f_+\|_2+\|f_-\|_2.$$
But this follows by applying the identity
$$\mathcal F\nabla_{x,t}(\eta_T\phi_0)(\tau,\xi)=2\pi i(\xi,\tau)\big[f_+(\xi)\hat\eta_T(\tau-|\xi|)+f_-(\xi)\hat\eta_T(\tau+|\xi|)\big]$$
coming from (96), then using Plancherel's theorem and a routine computation.

Step 2. (Inhomogeneous solutions) Prove (27) when $\phi[0]=0$.

As in [42], the idea shall be to use Duhamel's formula to write the inhomogeneous solution as an average of free solutions truncated along half-spaces. Because $N[k]$ contains null frame atoms, we shall need to perform some truncation along null planes $NP(\omega)$, which becomes a little messy. We turn to the details.

By Duhamel's formula it suffices to show that
$$\Big\|P_0\int_0^t\frac{\sin((t-s)\sqrt{-\Delta})}{\sqrt{-\Delta}}F(s)\,ds\Big\|_{S[0]([-T,T]\times\mathbf R^n)}\lesssim\|F\|_{N[k]([-T,T]\times\mathbf R^n)}$$
for all $F\in N[k]([-T,T]\times\mathbf R^n)$, where we adopt the convention that $\int_0^t=-\int_t^0$ when $t<0$.

Let $\eta_T^+$ be the restriction of $\eta_T$ to $[0,\infty)$. If we set
$$\phi:=P_0\int_{-\infty}^t\eta_T^+(t-s)\,\frac{\sin((t-s)\sqrt{-\Delta})}{\sqrt{-\Delta}}F(s)\,ds$$
then we have$^{29}$
$$P_0\int_0^t\frac{\sin((t-s)\sqrt{-\Delta})}{\sqrt{-\Delta}}F(s)\,ds=\phi-S(t)\phi[0]$$
29 The author thanks Daniel Tataru for this simplifying observation.
for $t\in[-T,T]$, where
$$S(t)\phi[0]:=\cos(t\sqrt{-\Delta})\,\phi(0)+\frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}}\,\phi_t(0)$$
is the solution to the free wave equation. In light of the estimates already obtained for free solutions, and the energy estimate
$$\|\phi[0]\|_{L^2\times L^2}\lesssim\|\phi\|_{S[0]}$$
from (79), it suffices to show that
$$\|\phi\|_{S[0]}\lesssim\|F\|_{N[k]}. \tag{97}$$
We may assume that $F$ is an $N[k]$ atom, and replace the right-hand side by 1. Observe the identity
$$\mathcal F\phi(\tau,\xi)=C\,\frac{m(\xi)}{|\xi|}\big(\hat\eta_T^+(\tau-|\xi|)-\hat\eta_T^+(\tau+|\xi|)\big)\,\mathcal FF(\tau,\xi) \tag{98}$$
and the estimate
$$\hat\eta_T^+(\tau-|\xi|)-\hat\eta_T^+(\tau+|\xi|)=O\Big(\frac{1}{D_-(1+D_-)}\Big) \tag{99}$$
when $D_0\sim 1$; this follows from a routine computation of $\hat\eta_T^+$ and the hypothesis $T\ge 1$. From (98) and Lemma 10 we see that
$$\|\nabla_{x,t}\phi\|_{\dot X_0^{n/2,1/2,\infty}}\lesssim\|P_0F\|_{\dot X_0^{n/2-1,-1/2,\infty}}\lesssim\|F\|_{\dot X_k^{n/2-1,-1/2,\infty}}\lesssim\|F\|_{N[k](\mathbf R^{1+n})}\lesssim 1.$$
Thus the second component of the $S[0]$ norm in (79) always makes an acceptable contribution to (97). We now divide into three cases depending on which type of $N[k]$ atom $F$ is.

Case 2(a). $F$ is an $\dot X^{n/2-1,-1/2,1}$ atom with frequency $2^k$ and modulation $2^j$.

In this case $F$ has Fourier support in $2^{j-5}\le D_-\le 2^{j+5}$ and
$$\|F\|_{L^2_tL^2_x}\lesssim 2^{j/2}.$$
From this, (99) and (98) we see that $\nabla_{x,t}\phi$ is bounded in $\dot X_0^{n/2,1/2,1}$, and (97) follows from Lemma 8.

Case 2(b). $F$ is an $L^1_t\dot H^{n/2-1}_x$ atom with frequency $2^k$.

By Minkowski's inequality and time translation invariance it suffices to show that
$$\Big\|P_0\,\eta_T^+(t)\,\frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}}f\Big\|_{S[0]}\lesssim\|f\|_2$$
for all $f\in L^2(\mathbf R^n)$.

The $L^\infty_tL^2_x$ component of (79) is acceptable by standard energy estimates. The $\dot X_0^{n/2,1/2,\infty}$ component is acceptable by the remarks made previously, so we turn to
the last component. The idea is to treat the expression inside the norm as a free solution, smoothly truncated to the upper half-space $\{t>0\}$. By conjugation symmetry it suffices to show
$$\Big(\sum_{\kappa\in K_l}\Big\|P_{0,\kappa}Q^+_{<-2l}P_0\,\eta_T^+(t)\,\frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}}f\Big\|^2_{S[0,\kappa]}\Big)^{1/2}\lesssim\|f\|_2$$
for all $l>10$. Fix $l$. By the finite overlap of the $P_{0,\kappa}$ it suffices to show that
$$\Big\|P_{0,\kappa}Q^+_{<-2l}P_0\,\eta_T^+(t)\,\frac{\sin(t\sqrt{-\Delta})}{\sqrt{-\Delta}}f\Big\|_{S[0,\kappa]}\lesssim\|f\|_2 \tag{100}$$
for each $\kappa\in K_l$ and all $f\in L^2$. Fix $\kappa$, $f$. The expression inside the norm has a spacetime Fourier transform of
$$C\,m_{0,\kappa}(\xi)\,m_0\big(2^{2l}(\tau-|\xi|)\big)\,\frac{m(\xi)}{|\xi|}\big(\hat\eta_T^+(\tau-|\xi|)+O(1)\big)\hat f(\xi)$$
since $\hat\eta_T^+(\tau+|\xi|)=O(1)$ on the support of $m_0(2^{2l}(\tau-|\xi|))\,\frac{m(\xi)}{|\xi|}$.

We treat the main term $\hat\eta_T^+(\tau-|\xi|)$ and the error $O(1)$ separately.

Case 2(b).1. The contribution of $O(1)$.

By Lemma 7 we may estimate this contribution by
$$\Big\|\mathcal F^{-1}\Big[O\Big(m_{0,\kappa}(\xi)\,m_0\big(2^{2l}(\tau-|\xi|)\big)\,\frac{m(\xi)}{|\xi|}\hat f(\xi)\Big)\Big]\Big\|_{\dot X_1^{n/2,1/2,1}},$$
which by Plancherel and the support of $m_0(2^{2l}(\tau-|\xi|))$ is bounded by
$$2^{-l}\,\Big\|m_{0,\kappa}(\xi)\,m_0\big(2^{2l}(\tau-|\xi|)\big)\,\frac{m(\xi)}{|\xi|}\hat f(\xi)\Big\|_{L^2_\tau L^2_\xi},$$
which is acceptable (with a factor of $2^{-2l}$ to spare).

Case 2(b).2. The contribution of $\hat\eta_T^+(\tau-|\xi|)$.

This contribution can be rewritten as
$$C\,\Big\|\big(\eta_T^+*2^{-2l}\check m_0(2^{-2l}\cdot)\big)(t)\,P_{0,\kappa}P_0\,\frac{e^{it\sqrt{-\Delta}}}{\sqrt{-\Delta}}f\Big\|_{S[0,\kappa]}.$$
The expression inside the norm is Schwartz, so the above expression can be estimated by
$$\Big\|\big(\eta_T^+*\check m_0(2^{-2l}\cdot)\big)(t)\,\eta_M(t)\,P_{0,\kappa}P_0\,\frac{e^{it\sqrt{-\Delta}}}{\sqrt{-\Delta}}f\Big\|_{S[0,\kappa]}$$
for some sufficiently large $M>0$ (depending on $f$, $l$, $T$). By (66) we can discard the bounded function $\eta_T^+*\check m_0(2^{-2l}\cdot)$. We then revert to the frequency space formulation, and apply Lemma 7 to bound this contribution by
$$\Big\|\mathcal F^{-1}\Big[m_{0,\kappa}(\xi)\,\frac{m(\xi)}{|\xi|}\,\hat\eta_M(\tau-|\xi|)\,\hat f(\xi)\Big]\Big\|_{\dot X_1^{n/2,1/2,1}},$$
which is acceptable by Plancherel and a direct computation.

Case 2(c). $F$ is a $\pm$-null frame atom with frequency $2^k$ and angle $2^{-l}$.

By conjugation symmetry we may take $\pm=+$. Then there exists a decomposition $F=\sum_{\kappa\in K_l}F_\kappa$ such that each $F_\kappa$ has Fourier support in $\{\tau>0;\ D_-\lesssim 2^{-2l};\ D_0\sim 1;\ B\in\tfrac12\kappa\}$ and
$$\Big(\sum_{\kappa\in K_l}\|F_\kappa\|^2_{NFA[\kappa]}\Big)^{1/2}\lesssim 1. \tag{101}$$
From (98) we see that $\phi$ has the same Fourier support as $F$. Since $F$ is supported on the region $D_0\sim 1$, $D_-\lesssim 2^{-2l}$, we have $D_+\sim 1$ and therefore that
$$\|\nabla_{x,t}\phi\|_{L^\infty_tL^2_x}\lesssim\|\phi\|_{L^\infty_tL^2_x}.$$
By orthogonality and (76) we thus see that the first component of $\|\phi\|_{S[0]}$ in (79) is bounded by the third. The second component is controlled by the remarks made previously, so it remains to control the third component of $\|\phi\|_{S[0]}$, i.e. that
$$\Big(\sum_{\kappa'\in K_{l'}}\|P_{0,\pm\kappa'}Q^\pm_{<-2l'}\phi\|^2_{S[0,\kappa']}\Big)^{1/2}\lesssim 1 \tag{102}$$
for all $l'>10$ and choices of sign $\pm$. Actually, we can take $\pm=+$ since the $\pm=-$ term vanishes completely.

We can decompose $\phi=\sum_{\kappa\in K_l}\phi_\kappa$, where $\phi_\kappa$ is given by the Fourier representation
$$\mathcal F\phi_\kappa:=C\,\frac{m(\xi)}{|\xi|}\big(\hat\eta_T^+(\tau-|\xi|)-\hat\eta_T^+(\tau+|\xi|)\big)\,\mathcal FF_\kappa(\tau,\xi). \tag{103}$$
We shall first show (102) for $l'\ge l+C$, and then later bootstrap this to the case $l+C>l'>10$.

Case 2(c).1. Proof of (102) when $l'\ge l+C$.

In this case each $\phi_\kappa$ only contributes to those $\kappa'\in K_{l'}$ for which $\kappa'\subset\kappa$. Thus each $\kappa'\in K_{l'}$ has contributions from at most $O(1)$ values of $\kappa\in K_l$. It therefore suffices by (101) to show that
$$\Big(\sum_{\kappa'\in K_{l'}:\,\kappa'\subset\kappa}\|P_{0,\kappa'}Q^+_{<-2l'}\phi_\kappa\|^2_{S[0,\kappa']}\Big)^{1/2}\lesssim\|F_\kappa\|_{NFA[\kappa]} \tag{104}$$
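for all $\kappa\in K_l$.

Fix $\kappa$. Because of the multiplier $Q^+_{<-2l'}$ we may estimate $\hat\eta_T^+(\tau+|\xi|)$ by $O(1)$ in (103), as in Case 2(b).2(b). From Definition 5 and (103) it suffices to show
$$\Big(\sum_{\kappa'\in K_{l'}:\,\kappa'\subset\kappa}\Big\|P_{0,\kappa'}Q^+_{<-2l'}\mathcal F^{-1}\Big[\frac{m(\xi)}{|\xi|}\big(\hat\eta_T^+(\tau-|\xi|)+O(1)\big)\mathcal FF(\tau,\xi)\Big]\Big\|^2_{S[0,\kappa']}\Big)^{1/2}\lesssim\frac{1}{\mathrm{dist}(\omega,\kappa)}\,\|F\|_{L^1_{t_\omega}L^2_{x_\omega}}$$
for all $\omega\notin 2\kappa$ and $F\in L^1_{t_\omega}L^2_{x_\omega}$.

Fix $\omega$. By rotation invariance we may take $\omega=e_1$. By Minkowski's inequality it suffices to prove this estimate with $F$ replaced by $\delta(t_{e_1}-t_0)f(x_{e_1})$ for some $t_0\in\mathbf R$ and $f\in L^2(NP(e_1))$, and with $\|F\|_{L^1_{t_{e_1}}L^2_{x_{e_1}}}$ replaced by $\|f\|_2$. By translation invariance we may let $t_0=0$, so we reduce to
$$\Big(\sum_{\kappa'\in K_{l'}:\,\kappa'\subset\kappa}\Big\|P_{0,\kappa'}Q^+_{<-2l'}\mathcal F^{-1}\Big[\frac{m(\xi)}{|\xi|}\big(\hat\eta_T^+(\tau-|\xi|)+O(1)\big)\hat f(\xi_{e_1})\Big]\Big\|^2_{S[0,\kappa']}\Big)^{1/2}\lesssim\frac{1}{\mathrm{dist}(e_1,\kappa)}\,\|f\|_2.$$
For each $\kappa'$, the only portion of $\hat f$ which contributes is that which is supported on the region
$$\{\xi_{e_1}:\ \tau>0,\ D_+\sim 1,\ D_-\lesssim 2^{-2l'},\ B\in\tfrac12\kappa'\}. \tag{105}$$
From elementary geometry we see that these regions are finitely overlapping as $\kappa'$ varies. Thus by Plancherel we need only show that
$$\Big\|P_{0,\kappa'}Q^+_{<-2l'}\mathcal F^{-1}\Big[\frac{m(\xi)}{|\xi|}\big(\hat\eta_T^+(\tau-|\xi|)+O(1)\big)\hat f(\xi_{e_1})\Big]\Big\|_{S[0,\kappa']}\lesssim\frac{1}{\mathrm{dist}(e_1,\kappa')}\,\|f\|_2$$
for all $\kappa'\in K_{l'}$ such that $\kappa'\subset\kappa$, and all $f\in L^2$ (using the trivial estimate $\mathrm{dist}(e_1,\kappa')\ge\mathrm{dist}(e_1,\kappa)$).

Fix $\kappa'$. We rewrite the left-hand side of the previous as
$$\Big\|\mathcal F^{-1}\Big[m_{0,\kappa'}(\xi)\,m_0\big(2^{2l'}(\tau-|\xi|)\big)\,\frac{m(\xi)}{|\xi|}\big[\hat\eta_T^+(\tau-|\xi|)+O(1)\big]\hat f(\xi_{e_1})\Big]\Big\|_{S[0,\kappa']}. \tag{106}$$
We shall now replace these Euclidean multipliers with null frame counterparts. Observe that we may freely insert the multiplier $\tilde m_{0,\kappa'}(\xi_{e_1})$, where $\tilde m_{0,\kappa'}$ equals 1 on the region
$$\{\xi_{e_1}:\ \tau>0;\ D_0\sim 1,\ D_-\lesssim 2^{-2l'},\ B\in\kappa'\}$$
and is a bump function adapted to a dilate of this region.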
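Next, we observe the identity
$$\tau-|\xi|=\frac{1}{\tau+|\xi|}\,(\tau^2-|\xi|^2)=\frac{2\xi^1_{e_1}}{\tau+|\xi|}\,\big(\tau_{e_1}-h(\xi_{e_1})\big), \tag{107}$$
where
$$\xi^1_{e_1}:=\xi_{e_1}\cdot\tfrac{1}{\sqrt2}(1,-e_1);\qquad \xi'_{e_1}:=\xi_{e_1}-\xi^1_{e_1}\,\tfrac{1}{\sqrt2}(1,-e_1)$$
and $h$ is the function
$$h(\xi_{e_1}):=|\xi'_{e_1}|^2/2\xi^1_{e_1}.$$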
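As a check on (107), the computation can be sketched as follows. It assumes the null frame convention $\tau_{e_1}:=(\tau,\xi)\cdot\frac{1}{\sqrt2}(1,e_1)$, with $\xi_{e_1}$ the component of $(\tau,\xi)$ orthogonal to $(1,e_1)$, and writes $\xi'_{e_1}$ for the part of $\xi_{e_1}$ orthogonal to $(1,-e_1)$. These conventions are assumed from the earlier definition of the null frame and are not restated in this section.

```latex
% Decompose (\tau,\xi) along (1,e_1)/\sqrt2, (1,-e_1)/\sqrt2 and their
% orthogonal complement (assumed convention):
(\tau,\xi) = \tau_{e_1}\tfrac{1}{\sqrt2}(1,e_1) + \xi^1_{e_1}\tfrac{1}{\sqrt2}(1,-e_1) + \xi'_{e_1},
\qquad \xi'_{e_1} = (0,\xi'),\quad \xi'\cdot e_1 = 0 .
% Reading off components:
\tau = \tfrac{1}{\sqrt2}\,(\tau_{e_1} + \xi^1_{e_1}), \qquad
\xi\cdot e_1 = \tfrac{1}{\sqrt2}\,(\tau_{e_1} - \xi^1_{e_1}), \qquad
|\xi|^2 = (\xi\cdot e_1)^2 + |\xi'|^2 .
% Hence
\tau^2 - |\xi|^2
  = \tfrac12(\tau_{e_1}+\xi^1_{e_1})^2 - \tfrac12(\tau_{e_1}-\xi^1_{e_1})^2 - |\xi'|^2
  = 2\,\tau_{e_1}\xi^1_{e_1} - |\xi'_{e_1}|^2
  = 2\,\xi^1_{e_1}\Big(\tau_{e_1} - \frac{|\xi'_{e_1}|^2}{2\,\xi^1_{e_1}}\Big),
% and dividing by \tau + |\xi| gives (107).
```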
On the Fourier support of (106) we see from elementary geometry that $|\xi'_{e_1}|\sim\mathrm{dist}(e_1,\kappa')$ and $|\xi^1_{e_1}|\sim\mathrm{dist}(e_1,\kappa')^2$. We thus have
$$\tau_{e_1}=h(\xi_{e_1})+O\big(2^{-2l'}/\mathrm{dist}(e_1,\kappa')^2\big). \tag{108}$$
We can therefore freely insert the multiplier $m_0\big(2^{2l'-C}\,\mathrm{dist}(e_1,\kappa')^2\,(\tau_{e_1}-h(\xi_{e_1}))\big)$ in (106). By Lemma 6 the multiplier with symbol $m_{0,\kappa'}(\xi)\,m_0(2^{2l'}(\tau-|\xi|))\,m(\xi)/|\xi|$ is disposable. We may therefore estimate (106) by
$$\Big\|\mathcal F^{-1}\Big[m_0\big(2^{2l'-C}\,\mathrm{dist}(e_1,\kappa')^2(\tau_{e_1}-h(\xi_{e_1}))\big)\big[\hat\eta_T^+(\tau-|\xi|)+O(1)\big]\,\tilde m_{0,\kappa'}(\xi_{e_1})\,\hat f(\xi_{e_1})\Big]\Big\|_{S[0,\kappa']}.$$
It still remains to convert $\hat\eta_T^+(\tau-|\xi|)$ to a null frame equivalent. A direct computation gives the estimate
$$\hat\eta_T^+(\tau)=\frac{C}{\tau}+O\big(T(T\tau)^{-100}\big)$$
for $|\tau|\gtrsim T^{-1}$, and
$$\hat\eta_T^+(\tau)=O(T)$$
for $|\tau|\lesssim T^{-1}$. Combining these facts with (107) we have
$$\hat\eta_T^+(\tau-|\xi|)+O(1)=\frac{\tau+|\xi|}{2\xi^1_{e_1}}\,\hat\eta^+_{T/\mathrm{dist}(e_1,\kappa')^2}\big(\tau_{e_1}-h(\xi_{e_1})\big)+O\big(T\min(1,(TD_-)^{-100})\big)+O(1)$$
on the support of (106); note that $|(\tau+|\xi|)/2\xi^1_{e_1}|\sim\mathrm{dist}(e_1,\kappa)^2$ in this region. We deal with the main term and the two $O()$ terms separately.
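Case 2(c).1(a). The contribution of the $O()$ terms.

We use (75) to estimate this contribution by
$$\Big\|\mathcal F^{-1}\Big[m_0\big(2^{2l'-C}\,\mathrm{dist}(e_1,\kappa')^2(\tau_{e_1}-h(\xi_{e_1}))\big)\times\big[O\big(T\min(1,(TD_-)^{-100})\big)+O(1)\big]\,\tilde m_{0,\kappa'}(\xi_{e_1})\,\hat f(\xi_{e_1})\Big]\Big\|_{\dot X_0^{n/2,1/2,1}}.$$
The expression inside the norm has Fourier support on $\tau=|\xi|+O(2^{-2l'})$. From (107), Plancherel and the triangle inequality we can thus estimate the previous by
$$\sum_{j\le -2l'+C}2^{j/2}\big(T\min(1,(T2^j)^{-100})+1\big)\,\Big\|\mathcal F^{-1}\big[\chi_{\tau=|\xi|+O(2^j)}\,\tilde m_{0,\kappa'}(\xi_{e_1})\,\hat f(\xi_{e_1})\big]\Big\|_{L^2_tL^2_x}.$$
By Plancherel we may estimate the previous by
$$\sum_{j\le -2l'+C}2^{j/2}\big(T\min(1,(T2^j)^{-100})+1\big)\,\Big\|\chi_{\tau=|\xi|+O(2^j)}\,\tilde m_{0,\kappa'}(\xi_{e_1})\,\hat f(\xi_{e_1})\Big\|_{L^2_{\xi_{e_1}}L^2_{\tau_{e_1}}}.$$
For each $\xi_{e_1}$ in the support of $\tilde m_{0,\kappa'}$, the expression $\chi_{\tau=|\xi|+O(2^j)}$ has an $L^2_{\tau_{e_1}}$ norm of $\lesssim 2^{j/2}/\mathrm{dist}(e_1,\kappa')$ by (107). Thus we can estimate the previous by
$$\sum_{j\le -2l'+C}\frac{2^{j}}{\mathrm{dist}(e_1,\kappa')}\big(T\min(1,(T2^j)^{-100})+1\big)\,\|f\|_2$$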
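which is acceptable.

Case 2(c).1(b). The contribution of the main term.

We need to show
$$\Big\|\mathcal F^{-1}\Big[m_0\big(2^{2l'-C}\,\mathrm{dist}(e_1,\kappa')^2(\tau_{e_1}-h(\xi_{e_1}))\big)\,\frac{\tau+|\xi|}{2\xi^1_{e_1}}\,\hat\eta^+_{T/\mathrm{dist}(e_1,\kappa')^2}\big(\tau_{e_1}-h(\xi_{e_1})\big)\,\tilde m_{0,\kappa'}(\xi_{e_1})\,\hat f(\xi_{e_1})\Big]\Big\|_{S[0,\kappa']}\lesssim\frac{1}{\mathrm{dist}(e_1,\kappa')}\,\|f\|_2.$$
Since the above expression has Fourier support on the region $|\tau|+|\xi|\sim 1$, the multiplier $(\tau+|\xi|)/2$ is effectively disposable and can be discarded. We can thus estimate the previous by
$$\Big\|\mathcal F^{-1}\Big[\big(\varphi\,\hat\eta^+_{T/\mathrm{dist}(e_1,\kappa')^2}\big)\big(\tau_{e_1}-h(\xi_{e_1})\big)\,\hat f_{\kappa'}(\xi_{e_1})\Big]\Big\|_{S[0,\kappa']},$$
where
$$\hat f_{\kappa'}(\xi_{e_1}):=\frac{1}{\xi^1_{e_1}}\,\tilde m_{0,\kappa'}(\xi_{e_1})\,\hat f(\xi_{e_1})$$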
and
$$\varphi(\tau):=m_0\big(2^{2l'-C}\,\mathrm{dist}(e_1,\kappa')^2\,\tau\big).$$
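We can rewrite this further as
$$\Big\|\big(\check\varphi*\eta^+_{T/\mathrm{dist}(e_1,\kappa')^2}\big)(t_{e_1})\,\eta_M\,\mathcal F^{-1}\big[\delta\big(\tau_{e_1}-h(\xi_{e_1})\big)\,\hat f_{\kappa'}(\xi_{e_1})\big]\Big\|_{S[0,\kappa']}$$
for some large $M$ (depending on $f$, $l$, $T$). By (66) we may discard the $(\check\varphi*\eta^+_{T/\mathrm{dist}(e_1,\kappa')^2})(t_{e_1})$ factor. We then move $\eta_M$ inside the Fourier transform and use (75) to estimate the previous by
$$\Big\|\mathcal F^{-1}\big[\hat\eta_M\big(\tau_{e_1}-h(\xi_{e_1})\big)\,\hat f_{\kappa'}(\xi_{e_1})\big]\Big\|_{\dot X_0^{n/2,1/2,1}}.$$
By Plancherel we can estimate this by
$$\sum_j 2^{j/2}\,\Big\|\chi_{\tau=|\xi|+O(2^j)}\,\hat\eta_M\big(\tau_{e_1}-h(\xi_{e_1})\big)\,|\xi^1_{e_1}|^{-1}\,\tilde m_{0,\kappa'}(\xi_{e_1})\,\hat f(\xi_{e_1})\Big\|_{L^2_{\xi_{e_1}}L^2_{\tau_{e_1}}}.$$
We use the estimates $|\tau_{e_1}-h(\xi_{e_1})|\sim 2^j/\mathrm{dist}(e_1,\kappa')^2$ (from (107)) and $\hat\eta_M(\tau)=O(M\min(1,(M|\tau|)^{-100}))$ to estimate this by
$$\sum_j 2^{j/2}\,M\min\big(1,(M2^j/\mathrm{dist}(e_1,\kappa')^2)^{-100}\big)\,\Big\|\chi_{|\tau_{e_1}-h(\xi_{e_1})|\sim 2^j/\mathrm{dist}(e_1,\kappa')^2}\,|\xi^1_{e_1}|^{-1}\,\tilde m_{0,\kappa'}(\xi_{e_1})\,\hat f(\xi_{e_1})\Big\|_{L^2_{\xi_{e_1}}L^2_{\tau_{e_1}}}.$$
Since $|\xi^1_{e_1}|\sim\mathrm{dist}(e_1,\kappa')^2$, we can estimate this by
$$\mathrm{dist}(e_1,\kappa')^{-1}\sum_j\frac{2^j}{\mathrm{dist}(e_1,\kappa')^2}\,M\min\big(1,(M2^j/\mathrm{dist}(e_1,\kappa')^2)^{-100}\big)\,\|\hat f\|_2$$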
which is acceptable. This completes the proof of (102) when $l'\ge l+C$.

Case 2(c).2. Proof of (102) when $10<l'<l+C$.

We divide $Q_{<-2l'}\phi=Q_{<-2l-4C}\phi+Q_{-2l-4C\le\cdot<-2l'}\phi$.

Case 2(c).2(a). The contribution of $Q_{-2l-4C\le\cdot<-2l'}\phi$.

In this case we use (75) to bound this contribution by
$$\Big(\sum_{\kappa'\in K_{l'}}\|P_{0,\kappa'}Q^+_{-2l-4C\le\cdot<-2l'}\phi\|^2_{\dot X_0^{n/2,1/2,1}}\Big)^{1/2}.$$
From (98) we see that the expression inside the norm has Fourier support in $D_-\sim 2^{-2l}$, hence we can estimate the previous by
$$2^{-l}\Big(\sum_{\kappa'\in K_{l'}}\|P_{0,\kappa'}Q^+_{-2l-4C\le\cdot<-2l'}\phi\|^2_{L^2_tL^2_x}\Big)^{1/2},$$
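which by Plancherel is bounded by
$$2^{-l}\,\|Q_{-2l-4C\le\cdot<-2l'}\phi\|_{L^2_tL^2_x},$$
which is acceptable since we have already bounded the second component of (79).

Case 2(c).2(b). The contribution of $Q_{<-2l-4C}\phi$.

In this case we write the contribution to (102) as
$$\Big(\sum_{\kappa'\in K_{l'}}\Big\|\sum_{\kappa''\in K_{l+2C}}P_{0,\kappa'}P_{0,\kappa''}Q^+_{<-2l-4C}\phi\Big\|^2_{S[0,\kappa']}\Big)^{1/2}.$$
We may restrict the inner summation to the case when $\kappa''\subset\kappa'$ since the contribution vanishes otherwise. We then discard $P_{0,\kappa'}$, estimating the previous by
$$\Big(\sum_{\kappa'\in K_{l'}}\Big\|\sum_{\kappa''\in K_{l+2C}:\,\kappa''\subset\kappa'}P_{0,\kappa''}Q^+_{<-2l-4C}\phi\Big\|^2_{S[0,\kappa']}\Big)^{1/2}.$$
By (74) we may estimate this by
$$\Big(\sum_{\kappa'\in K_{l'}}\ \sum_{\kappa''\in K_{l+2C}:\,\kappa''\subset\kappa'}\|P_{0,\kappa''}Q^+_{<-2l-4C}\phi\|^2_{S[0,\kappa'']}\Big)^{1/2}$$
which simplifies to
$$\Big(\sum_{\kappa''\in K_{l+2C}}\|P_{0,\kappa''}Q^+_{<-2l-4C}\phi\|^2_{S[0,\kappa'']}\Big)^{1/2}.$$
Split $\phi=\sum_{\kappa\in K_l}\phi_\kappa$ as before. For each $\kappa''$ we can restrict the $\kappa$ summation to those caps for which $\kappa''\subset\kappa$, so that only $O(1)$ values of $\kappa$ contribute for each $\kappa''$. Thus we may estimate the previous by
$$\Big(\sum_{\kappa''\in K_{l+2C}}\ \sum_{\kappa\in K_l:\,\kappa''\subset\kappa}\|P_{0,\kappa''}Q^+_{<-2l-4C}\phi_\kappa\|^2_{S[0,\kappa'']}\Big)^{1/2}.$$
By (104) (applied with $l'$ replaced by $l+2C$) we can bound this by
$$\Big(\sum_{\kappa''\in K_{l+2C}}\ \sum_{\kappa\in K_l:\,\kappa''\subset\kappa}\|F_\kappa\|^2_{NFA[\kappa'']}\Big)^{1/2}.$$
By (65) we may replace $NFA[\kappa'']$ with $NFA[\kappa]$. But this is then acceptable by (101). This concludes the proof of (27).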
12. The Continuity of (16)

In this section we prove the continuity of (16). Our main tools will be the estimates (27), (25), which have already been proven. We remark that this section can be read independently since the arguments used here are not needed elsewhere in the paper.

Our arguments here are unfortunately a little inelegant. Daniel Tataru has observed that one could prove the stronger continuity estimate (for $T>0$ at least) by first showing that the scaling map $\lambda\mapsto\phi(\cdot/\lambda,\cdot/\lambda)$ is continuous in the $S_k$ topology, but we shall not pursue this argument here.

Fix $\phi$, $T$, $T_0$; we may extend $\phi$ to all of $\mathbf R^{1+n}$ by the free wave equation outside of $[-T_0,T_0]\times\mathbf R^n$. Since the spaces $S_k([-T,T]\times\mathbf R^n)$ are defined by restriction we see that the function $a(T)$ is monotone non-decreasing in $T$. Let $A$ denote the quantity $A:=\liminf_{T'\to T}a(T')$. Clearly $A\ge 1$. By monotonicity we need to show that
$$\big\|\phi_k|_{[-T-H,T+H]\times\mathbf R^n}\big\|_{S_k([-T-H,T+H]\times\mathbf R^n)}\lesssim Ac_k \tag{109}$$
for all $k$, where $0<H\ll 1$ is some quantity depending on $\phi$, $T$, $A$ but independent of $k$. By (27) we may majorize the left-hand side of (109) by
$$2^{nk/2}\|\phi_k(0)\|_2+2^{(n/2-1)k}\|\partial_t\phi_k(0)\|_2+2^{(n/2-1)k}\|\Box\phi_k\|_{L^1_tL^2_x([-T-H,T+H]\times\mathbf R^n)}. \tag{110}$$
Since $\phi$ is a classical wave map, we see that this quantity decays like $O(2^{-|k|})$ or faster as $k\to\pm\infty$, uniformly in $T\in[0,T_0]$. From (13) we thus see that (109) holds whenever $|k|$ is sufficiently large. Thus we only have a finite number of $k$ left to deal with, which implies that we only need to show (109) for each $k$ separately (with $H$ now allowed to depend on $k$).

Fix $k$; we may rescale $k=0$. We first dispose of the case $T=0$. In this case we must have $A=a(0)$. By (32) we then have
$$\|\phi_0(0)\|_2+\|\partial_t\phi_0(0)\|_2\lesssim Ac_0.$$
Since $\phi$ is a classical wave map, we may therefore bound (110) by $\lesssim Ac_0$ for sufficiently small $H$, as desired.

Now suppose $T>0$. If $H$ is sufficiently small, we have $a(T-H)\sim A$ by monotonicity. We may therefore find $\tilde\phi_0\in S_0(\mathbf R^{1+n})$ which agrees with $\phi_0$ on $[-T+H,T-H]$ and satisfies the estimate
$$\|\tilde\phi_0\|_{S_0}\lesssim Ac_0. \tag{111}$$
By replacing $\tilde\phi_0$ with $P_{-5<\cdot<5}\tilde\phi_0$ if necessary we may assume that $\tilde\phi_0$ has Fourier support on the region $D_0\sim 1$. From (32) and the previous we see that
$$\|\tilde\phi_0(t)\|_2\lesssim\|\nabla_{x,t}\tilde\phi(t)\|_2\lesssim Ac_0 \tag{112}$$
for all $t\in\mathbf R$. Unfortunately, this is not enough regularity for us to apply (27), since we need to control $\Box\tilde\phi$. To resolve this we shall regularize $\tilde\phi$. Specifically, define the smoothing operator $S$ by
$$S\tilde\phi(t):=\int\tilde\phi(t+Hs)\,\varphi(s)\,ds,$$
where $\varphi$ is a bump function on $[-1,1]$ of mass 1. From Minkowski's inequality and (111) we have
$$\|S\tilde\phi_0\|_{S_0}\lesssim Ac_0,$$
which of course implies
$$\big\|S\tilde\phi_0|_{[-T-H,T+H]\times\mathbf R^n}\big\|_{S_0([-T-H,T+H]\times\mathbf R^n)}\lesssim Ac_0.$$
From the identity
$$\phi_0=S\tilde\phi_0+S(\phi_0-\tilde\phi_0)+(1-S)\phi_0$$
and (32) we see that (109) will follow from
$$\|\Box S(\phi_0-\tilde\phi_0)\|_{L^1_tL^2_x([-T-H,T+H])}+\|(1-S)\phi_0[0]\|_{L^2\times L^2}+\|(1-S)\Box\phi_0\|_{L^1_tL^2_x([-T-H,T+H]\times\mathbf R^n)}\lesssim Ac_0 \tag{113}$$
(note that $S(\phi_0-\tilde\phi_0)$ vanishes at time 0).

Since $\phi$ is a classical wave map on $[-T_0,T_0]$ and is extended by the free wave equation, we have the estimates
$$\|\nabla^j_{x,t}\phi_0(t)\|_{L^2}\le C_\phi \tag{114}$$
for all $t\in\mathbf R$ and $j=0,1,2$, and some quantity $C_\phi<\infty$ which is independent of $H$. From this and the Lebesgue differentiation theorem we see that the last two terms of (113) go to zero as $H\to 0$, and are therefore acceptable if $H$ is sufficiently small.

Since $S(\phi_0-\tilde\phi_0)$ vanishes on $[-T+2H,T-2H]$ we may bound the first term of (113) by
$$H\,\|\Box S(\phi_0-\tilde\phi_0)\|_{L^\infty_tL^2_x(\{|t|=T+O(H)\})}.$$
Expanding out $\Box$ and $S$ and using the fact that $\phi_0-\tilde\phi_0$ has Fourier support in $D_0\sim 1$, we may bound this by
$$\|\nabla_{x,t}(\phi_0-\tilde\phi_0)\|_{L^\infty_tL^2_x(\{|t|=T+O(H)\})}.$$
The contribution of $\tilde\phi_0$ is acceptable by (112). To control the contribution of $\phi_0$, we observe from (114) and the Fundamental Theorem of Calculus that
$$\|\nabla_{x,t}\phi_0(t)\|_{L^2}\lesssim\|\nabla_{x,t}\phi_0(T-H)\|_{L^2}+C_\phi H$$
whenever $t=T+O(H)$. Since $\phi_0(T-H)=\tilde\phi_0(T-H)$, we thus see from (112) that this term is also acceptable if $H$ is sufficiently small. This concludes the proof of (109).
13. The Geometry of the Cone

In the next few sections we shall be proving a number of bilinear estimates. In all of these estimates it will be important to understand the relationship between the modulation of $\phi$, $\psi$, and $\phi\psi$, and with the angular separation of $\phi$ and $\psi$.

Definition 9. If $N_0,N_1,N_2$ are numbers, we write $N_{\max}\ge N_{\mathrm{med}}\ge N_{\min}$ for the maximum, median, and minimum of $N_0,N_1,N_2$ respectively. We say that $N_0,N_1,N_2$ are balanced if $N_{\mathrm{med}}\sim N_{\max}$ and imbalanced otherwise.

The Littlewood–Paley trichotomy thus says that $P_{k_0}(\phi_{k_1}\psi_{k_2})$ vanishes unless the frequencies $2^{k_0},2^{k_1},2^{k_2}$ are balanced. There is a similar relationship concerning modulations, but it is more complicated. For simplicity we restrict ourselves to the case when all frequencies lie near the cone:

Lemma 11. Let $C$ be a large constant, let $2^{k_0},2^{k_1},2^{k_2}$ be balanced, let $j_0,j_1,j_2$ be integers such that $j_i\le k_i-C$ for $i=0,1,2$, and let $\kappa_1,\kappa_2$ be caps of radius $0<r\le 2^{-5}$. Let $\phi$ have Fourier support on the region $\{D_0\sim 2^{k_1},\ D_-\sim 2^{j_1},\ B\in\kappa_1\}$ and $\psi$ have Fourier support on the region $\{D_0\sim 2^{k_2},\ D_-\sim 2^{j_2},\ B\in\kappa_2\}$.
– If $2^{j_0},2^{j_1},2^{j_2}$ are imbalanced, then $P_{k_0}Q_{j_0}L(\phi,\psi)$ vanishes unless $2^{j_{\max}}\lesssim 2^{k_{\min}}$ and
$$\mathrm{dist}(\kappa_1,\kappa_2)+r\sim 2^{k_0-k_{\max}}\,2^{(j_{\max}-k_{\min})/2}+r. \tag{115}$$
– If $2^{j_0},2^{j_1},2^{j_2}$ are balanced, then $P_{k_0}Q_{j_0}L(\phi,\psi)$ vanishes unless
$$\mathrm{dist}(\kappa_1,\kappa_2)\lesssim 2^{k_0-k_{\max}}\,2^{(j_{\max}-k_{\min})/2}+r. \tag{116}$$

Proof. If $P_{k_0}Q_{j_0}L(\phi,\psi)$ does not vanish, then by the Fourier transform there must exist $(\tau_0,\xi_0)=(\tau_1,\xi_1)+(\tau_2,\xi_2)$ such that $||\tau_i|-|\xi_i||\sim 2^{j_i}$ and $|\xi_i|\sim 2^{k_i}$ for $i=0,1,2$, and that $\tau_i\xi_i/|\tau_i\xi_i|\in\kappa_i$ for $i=1,2$. Similarly for $P_{k_0}Q_{j_0}L(\phi_{,\alpha},\psi^{,\alpha})$. From our hypothesis $j_i\le k_i-C$ we see that $|\tau_i|\sim|\xi_i|$ for $i=0,1,2$.

By conjugation symmetry we may assume $\tau_0\ge 0$. By symmetry it suffices to consider three cases.

Case 1. ((++) case) $k_0\le k_{\max}-C$ and $\tau_1,\tau_2>0$.

Observe the identity
$$|\xi_1|+|\xi_2|-|\xi_1+\xi_2|=(|\tau_0|-|\xi_0|)-(|\tau_1|-|\xi_1|)-(|\tau_2|-|\xi_2|).$$
The left-hand side is $\sim 2^{k_{\max}}$, but the right-hand side can at most be $O(2^{j_{\max}})$. Since we are assuming $j_i\le k_i-C$, this case is therefore impossible.

Case 2. ((+−) case) $k_0\le k_{\max}-C$ and $\tau_1>0$, $\tau_2<0$.

This forces $k_0=k_{\min}$ and $k_1,k_2=k_{\max}+O(1)$. Now observe the identity
$$-|\xi_1|+|\xi_2|+|\xi_1+\xi_2|=-(\tau_0-|\xi_0|)+(\tau_1-|\xi_1|)-(|\tau_2|-|\xi_2|).$$
The right-hand side has magnitude $O(2^{j_{\max}})$, and in the imbalanced case has magnitude $\sim 2^{j_{\max}}$. The left-hand side can be rewritten as
$$-|\xi_1|+|\xi_2|+|\xi_1+\xi_2|=2\,\frac{|\xi_1+\xi_2||\xi_2|+(\xi_1+\xi_2)\cdot\xi_2}{|\xi_1|+|\xi_2|+|\xi_1+\xi_2|}$$
and therefore
$$-|\xi_1|+|\xi_2|+|\xi_1+\xi_2|\sim\angle(\xi_1+\xi_2,-\xi_2)^2\,2^{k_0}.$$
By the sine rule we thus have
$$-|\xi_1|+|\xi_2|+|\xi_1+\xi_2|\sim\angle(\xi_1,-\xi_2)^2\,2^{-k_0}\,2^{2k_{\max}}.$$
Combining this with the previous we see that we are in either (115) or (116).

Case 3. (Low-high case) $k_0=k_{\max}+O(1)$ and $k_2=k_{\max}+O(1)$.

If $\tau_2<0$, then $\tau_1>0$. Since $|\tau_i|\sim|\xi_i|$, we must then have $k_1=k_{\max}+O(1)$, at which point we could swap $k_1$ and $k_2$. Thus we may assume that $\tau_2>0$.

If $\tau_1>0$, we have the identity
$$|\xi_1|+|\xi_2|-|\xi_1+\xi_2|=(|\tau_0|-|\xi_0|)-(|\tau_1|-|\xi_1|)-(|\tau_2|-|\xi_2|).$$
The right-hand side has magnitude $O(2^{j_{\max}})$, and in the imbalanced case magnitude $\sim 2^{j_{\max}}$. The left-hand side can be rewritten as
$$|\xi_1|+|\xi_2|-|\xi_1+\xi_2|=2\,\frac{|\xi_1||\xi_2|-\xi_1\cdot\xi_2}{|\xi_1|+|\xi_2|+|\xi_1+\xi_2|}$$
so that
$$|\xi_1|+|\xi_2|-|\xi_1+\xi_2|\sim\angle(\xi_1,\xi_2)^2\,2^{k_1}.$$
Combining this with the previous we see that we are in either (115) or (116).

If $\tau_1<0$, we have the identity
$$|\xi_1|-|\xi_2|+|\xi_1+\xi_2|=-(|\tau_0|-|\xi_0|)-(|\tau_1|-|\xi_1|)+(|\tau_2|-|\xi_2|).$$
The right-hand side has magnitude $O(2^{j_{\max}})$, and in the imbalanced case has magnitude $\sim 2^{j_{\max}}$. The left-hand side can be rewritten as
$$|\xi_1|-|\xi_2|+|\xi_1+\xi_2|=2\,\frac{|\xi_1||\xi_1+\xi_2|+\xi_1\cdot(\xi_1+\xi_2)}{|\xi_1|+|\xi_2|+|\xi_1+\xi_2|}$$
so that
$$|\xi_1|-|\xi_2|+|\xi_1+\xi_2|\sim\angle(-\xi_1,\xi_1+\xi_2)^2\,2^{k_1}.$$
By the sine rule we thus have
$$|\xi_1|-|\xi_2|+|\xi_1+\xi_2|\sim\angle(-\xi_1,\xi_2)^2\,2^{k_1}.$$
Combining this with the previous we see that we are in either (115) or (116).
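The angular asymptotics used in Case 3 can be checked directly. The following is a sketch for the sub-case $\tau_1,\tau_2>0$, writing $\theta:=\angle(\xi_1,\xi_2)$ (a symbol introduced here only for illustration) and using $|\xi_1|\sim 2^{k_1}$, $|\xi_2|\sim 2^{k_{\max}}$:

```latex
% Difference of squares:
(|\xi_1|+|\xi_2|)^2 - |\xi_1+\xi_2|^2 = 2\,\big(|\xi_1||\xi_2| - \xi_1\cdot\xi_2\big),
% so dividing by |\xi_1|+|\xi_2|+|\xi_1+\xi_2|:
|\xi_1|+|\xi_2|-|\xi_1+\xi_2|
  = 2\,\frac{|\xi_1||\xi_2|-\xi_1\cdot\xi_2}{|\xi_1|+|\xi_2|+|\xi_1+\xi_2|}
  = \frac{2\,|\xi_1||\xi_2|\,(1-\cos\theta)}{|\xi_1|+|\xi_2|+|\xi_1+\xi_2|} .
% Since 1-\cos\theta \sim \theta^2 for 0 \le \theta \le \pi, and the
% denominator is \sim 2^{k_{\max}} \sim |\xi_2|:
|\xi_1|+|\xi_2|-|\xi_1+\xi_2| \sim \theta^2\, 2^{k_1} .
```

The other sign configurations follow the same pattern, with $\xi_1\cdot\xi_2$ replaced by the relevant inner product and the angle measured against the reflected vector.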
One could of course formulate variants of this lemma when we drop the $j_i\le k_i-C$ hypothesis, but the number of additional cases becomes excessive, and we shall just treat these cases by hand whenever they arise. Most of these cases are rather simple, as the modulations are so large that the geometry of the cone becomes irrelevant. However, there is one case in this category, namely the (++) case mentioned above, when $\phi$, $\psi$ have opposing high frequencies and low modulation, and $\phi\psi$ has low frequency and high modulation, which warrants some special attention in Case 2(a).3 of Sect. 17.

In the situations considered in Lemma 11, we refer to $\phi$, $\psi$ as inputs and $P_{k_0}Q_{j_0}L(\phi,\psi)$ or $P_{k_0}Q_{j_0}L(\phi_{,\alpha},\psi^{,\alpha})$ as the output. We refer to the inputs and output collectively$^{30}$ as functions. We say that one function dominates another if the modulation of the former is much greater than that of the latter.

The imbalanced modulation case thus occurs when one of the functions dominates the other two. In such a case, our estimates will usually be proven by dividing up into caps $\kappa$ of the radius suggested by the above lemma, and then applying Lemma 2 or the third term in (79) (for the inputs) and (93) or (79) (for the output).

The balanced modulation case occurs when none of the functions dominate each other, and in this case one usually uses Hölder's inequality and the estimates (81)–(84) or Lemma 10 (for the inputs), and (91), (92), or (79) (for the output).

14. The Core Product Estimate

The purpose of this section is to state and prove the core product estimate Lemma 12, which lies at the heart of the proof of all the bilinear and trilinear estimates in Theorem 3 ((18), (21), (20), (28)–(31)):

Lemma 12. Let $j,k,k_1,k_2$ be integers such that $j\le\min(k_1,k_2)+O(1)$. Then we have
$$\|P_k(F\psi)\|_{N[k]}\lesssim\chi^{(4)}_{k=\max(k_1,k_2)}\,\chi^{(4)}_{j=\min(k_1,k_2)}\,\|F\|_{\dot X_{k_1}^{n/2-1,-1/2,\infty}}\,\|\psi\|_{S[k_2]} \tag{117}$$
for all Schwartz functions $F$ on $\mathbf R^{1+n}$ with Fourier support in $2^{k_1-5}\le D_0\le 2^{k_1+5}$, $D_-\sim 2^j$ and Schwartz functions $\psi\in S[k_2]$.

The factor $\chi^{(4)}_{k=\max(k_1,k_2)}$ reflects the fact that high-high interactions (which are the only case when $k$ is significantly less than $\max(k_1,k_2)$) are weak. The factor $\chi^{(4)}_{j=\min(k_1,k_2)}$ indicates some decay when $F$ is close to the light cone; note that if $F$ is normalized in $\dot X_{k_1}^{n/2-1,-1/2,\infty}$, then the $L^2_tL^2_x$ norm of $F$ decays like the square root of the distance to the light cone. This gain near the light cone shall be crucial for eliminating various logarithmic divergences arising from certain types of low-high interactions, and is a reflection of the fact that small-angle interactions are weaker than large-angle interactions when $n>1$. (Small-angle interactions are the only interactions in which $F$, $\psi$, and $F\psi$ can all stay near the cone.)

The estimates (28), (29) shall largely be obtained from Lemma 12 via Lemma 10. By (94) we can convert Lemma 12 to an estimate of the form $S\cdot S\subset\dot X^{n/2,1/2,1}$, at which point (18), (21), (20) can largely be obtained from Lemma 8. Then, (30) will be largely obtained from the previous estimates and (4). Finally, (31) will be obtained from all the previous estimates, plus some additional arguments to deal with some difficult sub-cases.

The rest of this section is devoted to the proof of Lemma 12. By the Littlewood–Paley product trichotomy we may divide into the high-high interaction case $k\le k_2+O(1)$, $k_1=k_2+O(1)$, the high-low interactions $k=k_1+O(1)$, $k_2\le k_1+O(1)$, and the low-high interactions $k=k_2+O(1)$, $k_1\le k_2+O(1)$.

Despite the large number of sub-cases we shall soon consider, we shall only use a small number of techniques: we first use the geometry of the cone systematically to obtain as much frequency localization as possible, and then use either the standard spaces $L^\infty_tL^\infty_x$, $L^2_tL^\infty_x$, $L^\infty_tL^2_x$, $L^2_tL^2_x$, $L^1_tL^2_x$ combined with Hölder and Bernstein (in the balanced modulation case), or use variants of these arguments involving null frame spaces $NFA[\kappa]$, $S[k,\kappa]$ (in the imbalanced modulation case). Unfortunately each case has a slightly different geometry and numerology, and slightly different summability issues arise from case to case, and so the author was forced to treat each case separately. It may well be possible however to re-organize the cases to reduce this repetition somewhat, especially in three and higher dimensions in which the estimates are not as tight$^{31}$.

30 Since one can use duality to swap an input with the output, it is natural to consider both inputs and outputs on the same footing. Indeed, we will sometimes use a dual formulation of the above lemma, when the input $\phi$ is restricted to the region $D_0\sim 2^{k_0}$, $D_-\sim 2^{j_0}$ and the output is restricted via the projections $P_{k_1,\kappa_1}Q_{j_1}$.
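The Hölder and Bernstein pairings that recur in the balanced-modulation cases of this section can be summarized as follows. This is a sketch using only the standard mixed-norm Hölder inequality and the form of Bernstein's inequality consistent with the factor $2^{nk/2}$ appearing in the estimates of this section; the symbol $G$ is generic and introduced here only for illustration.

```latex
% Hölder in t and in x (reciprocal exponents add in each variable):
\|F\psi\|_{L^1_t L^2_x} \le \|F\|_{L^2_t L^2_x}\,\|\psi\|_{L^2_t L^\infty_x},
\qquad
\|F\psi\|_{L^2_t L^2_x} \le \|F\|_{L^2_t L^\infty_x}\,\|\psi\|_{L^\infty_t L^2_x},
% and with both factors in L^2_x:
\|F\psi\|_{L^1_t L^1_x} \le \|F\|_{L^2_t L^2_x}\,\|\psi\|_{L^2_t L^2_x},
\qquad
\|F\psi\|_{L^2_t L^1_x} \le \|F\|_{L^2_t L^2_x}\,\|\psi\|_{L^\infty_t L^2_x}.
% Bernstein at frequency 2^k: passing from L^1_x to L^2_x costs 2^{nk(1/1 - 1/2)}:
\|P_k G\|_{L^2_x} \lesssim 2^{nk/2}\,\|G\|_{L^1_x}.
```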
Case 1. (High-high interactions) $k\le k_2+O(1)$, $k_1=k_2+O(1)$, $j\le k_2+O(1)$.

We rescale $k_1=0$, so $k_2=O(1)$ and $j,k\le O(1)$, and we reduce to showing
$$\|P_k(F\psi)\|_{N[k]}\lesssim\chi^{(4)}_{k=0}\,\chi^{(4)}_{k=j}\,2^{-j/2}\,\|F\|_{L^2_tL^2_x}\,\|\psi\|_{S[k_2]}.$$
We now split the left-hand side into three contributions.

Case 1(a). ($F$ does not dominate $\psi$) The contribution of $P_k(FQ_{\ge j-C}\psi)$.

In this case we use (91) to estimate this contribution by
$$2^{(n/2-1)k}\,\|P_k(FQ_{\ge j-C}\psi)\|_{L^1_tL^2_x}$$
and then split into two sub-cases.

Case 1(a).1. ($F$ very close to light cone) $j<100k$.

We discard $P_k$ and estimate the previous by
$$2^{(n/2-1)k}\,\|F\|_{L^2_tL^2_x}\,\|Q_{\ge j-C}\psi\|_{L^2_tL^\infty_x}.$$
By dyadic decomposition and (84) (noting that the right-hand side of this estimate is decreasing in $j$) we can bound this by
$$2^{(n/2-1)k}\,\|F\|_{L^2_tL^2_x}\,\min(2^{-j},1)\,\chi^{(4)}_{j\ge 0}\,2^{-j/2}\,\|\psi\|_{S[k_2]}$$
which is acceptable.

Case 1(a).2. ($F$ not too close to light cone) $j\ge 100k$.

We use Bernstein's inequality (6) to estimate this by
$$2^{(n/2-1)k}\,2^{nk/2}\,\|P_k(FQ_{\ge j-C}\psi)\|_{L^1_tL^1_x}.$$
We discard $P_k$ and estimate the previous by
$$2^{(n/2-1)k}\,2^{nk/2}\,\|F\|_{L^2_tL^2_x}\,\|Q_{\ge j-C}\psi\|_{L^2_tL^2_x}.$$

31 For instance, it is tempting to divide immediately into the balanced and imbalanced modulation cases before any other localization, as the techniques used depend crucially on this distinction. Unfortunately this runs into the annoying technical problem that the $Q_j$ family of multipliers are not disposable, and one is in danger of requiring a solution to the (as yet unsolved) Bochner–Riesz problem for the cone. In the author's argument, several artificial cases needed to be added or subtracted solely to circumvent this obstacle.
By (83) this is bounded by
\[
2^{(\frac n2-1)k}\, 2^{nk/2}\, \|F\|_{L^2_t L^2_x}\, 2^{-j/2}\, \frac{1}{1+2^{j}}\, \|\psi\|_{S[k_2]},
\]
which is acceptable.

Case 1(b). ($F$ does not dominate the output) The contribution of $P_k Q_{\ge j-C}(F Q_{<j-C}\psi)$. We use (92) to estimate the contribution by
\[
\|P_k Q_{\ge j-C}(F Q_{<j-C}\psi)\|_{\dot X^{n/2-1,-1/2,\infty}_k}.
\]
We estimate this in turn by
\[
2^{(\frac n2-1)k}\, 2^{-j/2}\, \|P_k(F Q_{<j-C}\psi)\|_{L^2_t L^2_x}.
\]
We split into two cases.

Case 1(b).1. ($F$ very close to light cone) $j < 100k$. Discarding the $P_k$, we estimate this contribution by
\[
2^{(\frac n2-1)k}\, 2^{-j/2}\, \|F\|_{L^2_t L^\infty_x} \|Q_{<j-C}\psi\|_{L^\infty_t L^2_x}.
\]
By Lemma 5 and (81) we can bound this by
\[
2^{(\frac n2-1)k}\, 2^{-j/2}\, \chi^{(4)}_{j\ge k}\, \|F\|_{L^2_t L^2_x} \|\psi\|_{S[k_2]},
\]
which is acceptable.

Case 1(b).2. ($F$ not too close to light cone) $100k \le j < O(1)$. We use Bernstein's inequality (6) to estimate this by
\[
2^{(\frac n2-1)k}\, 2^{-j/2}\, 2^{nk/2}\, \|F Q_{<j-C}\psi\|_{L^2_t L^1_x},
\]
which we estimate by
\[
2^{(\frac n2-1)k}\, 2^{-j/2}\, 2^{nk/2}\, \|F\|_{L^2_t L^2_x} \|Q_{<j-C}\psi\|_{L^\infty_t L^2_x},
\]
which by (81) is bounded by
\[
2^{(\frac n2-1)k}\, 2^{-j/2}\, 2^{nk/2}\, \|F\|_{L^2_t L^2_x} \|\psi\|_{S[k_2]},
\]
which is acceptable.

Case 1(c). ($F$ dominates) The contribution of $P_k Q_{<j-C}(F Q_{<j-C}\psi)$. We may assume $j \le k + O(1)$ since the contribution vanishes otherwise. Since we are in the unbalanced case, we shall decompose into caps of the size suggested by Lemma 11.
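We introduce the parameter $l := (k-j)/2 + C/4$, and then split $Q_{<j-C} = Q^+_{<j-C} + Q^-_{<j-C}$. By conjugation symmetry it suffices to show
\[
\Bigl\| P_k \sum_{\kappa\in K_l}\sum_{\kappa'\in K_l} P_{k,\kappa} Q^+_{<j-C}\bigl(F\, P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr) \Bigr\|_{N[k](\mathbb{R}^{1+n})}
\lesssim \chi^{(4)}_{k=0}\,\chi^{(4)}_{j=k}\, 2^{-j/2}\, \|F\|_{L^2_t L^2_x} \|\psi\|_{S[k_2]}
\]
for both choices of sign $\pm$. Fix $\pm$. By Lemma 11 (or more precisely, a dual of this lemma) we may assume that $\operatorname{dist}(\kappa,\kappa') \sim 2^{-l+C}$. By (93) we can bound the left-hand side by
\[
2^{(\frac n2-1)k} \Bigl( \sum_{\kappa\in K_l} \Bigl\| \sum_{\kappa'\in K_l:\ \operatorname{dist}(\kappa,\kappa')\sim 2^{-l+C}} P_k P_{k,\kappa} Q^+_{<j-C}\bigl(F\, P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr) \Bigr\|^2_{NFA[\kappa]} \Bigr)^{1/2}.
\]
Since the $\kappa'$ summation is only over $O(1)$ elements we can estimate this by
\[
2^{(\frac n2-1)k} \Bigl( \sum_{\kappa,\kappa'\in K_l:\ \operatorname{dist}(\kappa,\kappa')\sim 2^{-l+C}} \bigl\| P_k P_{k,\kappa} Q^+_{<j-C}\bigl(F\, P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr) \bigr\|^2_{NFA[\kappa]} \Bigr)^{1/2}.
\]
By Lemma 6 we may discard $P_k P_{k,\kappa} Q^+_{<j-C}$. By (72) we can thus estimate the previous by
\[
2^{(\frac n2-1)k}\, 2^{l}\, 2^{-(n-1)l/2}\, \|F\|_{L^2_t L^2_x} \Bigl( \sum_{\kappa,\kappa'\in K_l:\ \operatorname{dist}(\kappa,\kappa')\sim 2^{-l+C}} \bigl\| P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi \bigr\|^2_{S[k_2,\kappa']} \Bigr)^{1/2}.
\]
The $\kappa$ summation is now trivial and can be discarded. Note that if we replaced $Q^\pm_{<j-C}$ by $Q^\pm_{<k_2-2l}$, then the resulting expression could be bounded by
\[
2^{(\frac n2-1)k}\, 2^{l}\, 2^{-(n-1)l/2}\, \|F\|_{L^2_t L^2_x} \|\psi\|_{S[k_2]},
\]
which is acceptable by the definition of $l$. Thus it remains only to control
\[
2^{(\frac n2-1)k}\, 2^{l}\, 2^{-(n-1)l/2}\, \|F\|_{L^2_t L^2_x} \Bigl( \sum_{\kappa'\in K_l} \bigl\| P_{k_2,\pm\kappa'} \bigl(Q^\pm_{<j-C} - Q^\pm_{<k_2-2l}\bigr)\psi \bigr\|^2_{S[k_2,\kappa']} \Bigr)^{1/2}.
\]
By Lemma 7 we may bound this by
\[
2^{(\frac n2-1)k}\, 2^{l}\, 2^{-(n-1)l/2}\, \|F\|_{L^2_t L^2_x}\, \bigl\| \bigl(Q^\pm_{<j-C} - Q^\pm_{<k_2-2l}\bigr)\psi \bigr\|_{\dot X^{n/2,1/2,1}_{k_2}},
\]
which we can bound using (83) by
\[
2^{(\frac n2-1)k}\, 2^{l}\, 2^{-(n-1)l/2}\, \|F\|_{L^2_t L^2_x}\, \bigl(1 + |(j-C) - (k_2-2l)|\bigr)\, \|\psi\|_{S[k_2]}.
\]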
But this is acceptable by our construction of $l$. This concludes the proof of Case 1.

Case 2. (Low-high interactions) $k = k_2 + O(1)$, $j \le k_1 + O(1)$, and $k_1 \le k_2 + O(1)$. Let $C$ be a large constant. We may rescale $k = 0$, so that $k_2 = O(1)$, $k_1 \le O(1)$, and $j \le k_1 + O(1)$. We may also assume that $k_1 < -C$, since the claim follows from Case 1 otherwise. Our task is then to show
\[
\|P_0(F\psi)\|_{N[0](\mathbb{R}^{1+n})} \lesssim \chi^{(4)}_{j=k_1}\, 2^{(n/2-1)k_1}\, 2^{-j/2}\, \|F\|_{L^2_t L^2_x} \|\psi\|_{S[k_2]}.
\]
We split the left-hand side into three separate contributions.

Case 2(a). (Output is not too close to the light cone) The contribution of $P_0 Q_{\ge k_1+j-2C}(F\psi)$. We apply (92) and estimate this contribution by
\[
\sum_{j' \ge k_1+j-2C} 2^{-j'/2}\, \|P_0 Q_{j'}(F\psi)\|_{L^2_t L^2_x}.
\]
We use Plancherel to discard $P_0 Q_{j'}$ and estimate this by
\[
\sum_{j' \ge k_1+j-2C} 2^{-j'/2}\, \|F\|_{L^2_t L^\infty_x} \|\psi\|_{L^\infty_t L^2_x}.
\]
Applying Lemma 5 and (81) we can bound this by
\[
\sum_{j' \ge k_1+j-2C} 2^{-j'/2}\, 2^{nk_1/2}\, \chi^{(4)}_{j\ge k_1}\, \|F\|_{L^2_t L^2_x} \|\psi\|_{S[k_2]}
\]
which is acceptable (with a factor of about 2k1 /2 to spare). Case 2(b). (Input ψ is not too close to the light cone) The contribution of P0 Q
t
x
x
Applying Lemma 5 and (83), we can bound this by
\[
2^{nk_1/2}\, \chi^{(4)}_{j=k_1}\, \|F\|_{L^2_t L^2_x}\, 2^{-(k_1+j-C)/2}\, \|\psi\|_{S[k_2]}
\]
which is acceptable (with a factor of about 2k1 /2 to spare). Case 2(c). (Both the input ψ and the output are very close to the light cone; F has the dominant modulation) The contribution of Pk Q
n
χk1 =j 2( 2 −1)k1 2−j/2 F L2 L2 ψ S[k2 ] t
x
for all signs ±. We can assume that ± = + since the expression vanishes otherwise. Since we are again in the imbalanced case, we again use sector decomposition. We introduce the parameter l := (k2 − k1 − j + C)/2, and rewrite the left-hand side as + + P0 P0,κ Q
NFA[κ]
Since the interior summation is only over O(1) choices, we may estimate this by 1/2 + + 2
P0 P0,κ Q (F Pk2 ,κ Q ψ) NFA[κ] .
κ,κ ∈Kl :dist(κ,κ )∼2−l+C
By Lemma 6 we may discard P0 P0,κ Q+
F 2L2 L2 Pk2 ,κ Q+
κ,κ ∈Kl :dist(κ,κ )∼2−l+C
x
The κ summation is now trivial and can be discarded. By construction of l we have Q
2l 2−(n−1)l/2 F L2 L2 ψ S[k2 ] ∼ 2−(n−1)(k1 −j )/2 2( 2 −1)k1 2−j/2 F L2 L2 ψ S[k2 ] t
t
x
x
which is acceptable.$^{32}$ This concludes the treatment of Case 2.

Case 3. (High-low interactions) $k = k_1 + O(1)$, $j \le k_2 + O(1)$, and $k_2 \le k_1 + O(1)$. This will be a variant of the argument for Case 2. There is less room to spare, but on the other hand one does not have to perform as severe an angular decomposition. Let $C$ be a large constant. We may rescale $k = 0$. We may take $k_2 < -C$ since the claim follows from Case 1 otherwise. For future reference we observe that
\[
\|F\|_{\dot X^{n/2-1,-1/2,\infty}_{k_1}} \sim 2^{-j/2}\, \|F\|_{L^2_t L^2_x}.
\]
We split into four contributions.

Case 3(a). ($F$ does not dominate $\psi$) The contribution of $P_0(F Q_{\ge j-C}\psi)$. For this contribution we apply (91) and discard $P_0$ to control this contribution by
\[
\|F\|_{L^2_t L^2_x} \|Q_{\ge j-C}\psi\|_{L^2_t L^\infty_x}.
\]
32 Note that in this case one does not gain a factor of $2^{k_1}$ as in the other cases. This inability to gain any additional factors will be a substantial headache when it comes to proving (31).
Applying (84) we can bound this by
\[
\|F\|_{L^2_t L^2_x}\, \chi^{(4)}_{k_2=j}\, 2^{-j/2}\, \|\psi\|_{S[k_2]},
\]
which is acceptable.

Case 3(b). (Output very far away from light cone) The contribution of $P_0 Q_{>k_2+C}(F Q_{<j-C}\psi)$. We apply (92) and estimate this contribution by
\[
2^{-k_2/2}\, \|F Q_{<j-C}\psi\|_{L^2_t L^2_x} \lesssim 2^{-k_2/2}\, \|F\|_{L^2_t L^2_x} \|Q_{<j-C}\psi\|_{L^\infty_t L^\infty_x},
\]
which is acceptable by (82).

Case 3(c). ($F$ does not dominate the output) The contribution of $P_0 Q_{j-2C\le\cdot\le k_2+C}(F Q_{<j-C}\psi)$. We apply (92) and estimate this contribution by
\[
\sum_{j-2C \le j' \le k_2+C} 2^{-j'/2}\, \|P_0 Q_{j'}(F Q_{<j-C}\psi)\|_{L^2_t L^2_x}. \tag{118}
\]
Fix $j'$. We introduce the parameter $l := (k_2 - j)/2 + C$, and write
\[
\|P_0 Q_{j'}(F Q_{<j-C}\psi)\|_{L^2_t L^2_x} = \Bigl\| \sum_{\kappa\in K_l} P_0 Q_{j'}\bigl(F\, P_{k_2,\kappa} Q_{<j-C}\psi\bigr) \Bigr\|_{L^2_t L^2_x}.
\]
From the geometry of the cone we see that the summand has Fourier support in the region $\{\Theta \in C'\kappa\}$ for some $C'$ depending on $C$. Thus the summands are almost orthogonal, and we may estimate the previous by
\[
\Bigl( \sum_{\kappa\in K_l} \bigl\| P_0 Q_{j'}\bigl(F\, P_{k_2,\kappa} Q_{<j-C}\psi\bigr) \bigr\|^2_{L^2_t L^2_x} \Bigr)^{1/2}.
\]
We use Plancherel to discard $P_0 Q_{j'}$ and then Hölder to estimate this by
\[
\Bigl( \sum_{\kappa\in K_l} \|F\|^2_{L^2_t L^2_x}\, \|P_{k_2,\kappa} Q_{<j-C}\psi\|^2_{L^\infty_t L^\infty_x} \Bigr)^{1/2}.
\]
By Bernstein's inequality (7) we may bound this by
\[
\|F\|_{L^2_t L^2_x}\, 2^{nk_2/2}\, 2^{-(n-1)l/2} \Bigl( \sum_{\kappa\in K_l} \|P_{k_2,\kappa} Q_{<j-C}\psi\|^2_{L^\infty_t L^2_x} \Bigr)^{1/2}.
\]
By Plancherel and orthogonality we may bound this by
\[
\|F\|_{L^2_t L^2_x}\, 2^{nk_2/2}\, 2^{-(n-1)l/2}\, \|Q_{<j-C}\psi\|_{L^\infty_t L^2_x}.
\]
Applying (81) and then inserting this back into (118), we can therefore estimate this contribution by
\[
\sum_{j-2C \le j' \le k_2+O(1)} 2^{-j'/2}\, \|F\|_{L^2_t L^2_x}\, 2^{-(n-1)l/2}\, \|\psi\|_{S[k_2]},
\]
which is acceptable.

Case 3(d). ($F$ dominates) The contribution of $P_0 Q_{<j-2C}(F Q_{<j-C}\psi)$. By conjugation symmetry it suffices to show
\[
\bigl\| P_0 Q^+_{<j-2C}\bigl(F Q^\pm_{<j-C}\psi\bigr) \bigr\|_{N[0](\mathbb{R}^{1+n})} \lesssim \chi^{(4)}_{k_2=j}\, 2^{-j/2}\, \|F\|_{L^2_t L^2_x} \|\psi\|_{S[k_2]}
\]
for all $\pm$. Fix $\pm$. As usual in the imbalanced case we divide into sectors. We introduce the parameter $l := -(j - k_2 + C)/2$ and rewrite the left-hand side as
\[
\Bigl\| \sum_{\kappa\in K_l}\sum_{\kappa'\in K_l} P_0 P_{0,\kappa} Q^+_{<j-2C}\bigl(F\, P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr) \Bigr\|_{N[0](\mathbb{R}^{1+n})}.
\]
By (a dual of) Lemma 11 the summand vanishes unless $\operatorname{dist}(\kappa,\kappa') \sim 2^{-l+C}$. Thus for each $\kappa \in K_l$ there are only $O(1)$ values of $\kappa'$ which have a non-zero contribution. We can therefore write the previous as
\[
\Bigl\| \sum_{\kappa\in K_l}\ \sum_{\kappa'\in K_l:\ \operatorname{dist}(\kappa,\kappa')\sim 2^{-l+C}} P_0 P_{0,\kappa} Q^+_{<j-2C}\bigl(F\, P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr) \Bigr\|_{N[0](\mathbb{R}^{1+n})}.
\]
Observe from our hypotheses $k_2 < -C$ and $j \le k_2 + O(1)$ (and Lemma 3) that we may assume that $F$ has Fourier support in the region $\tau > 0$, $D_- \sim 2^{j}$. We would like to discard $Q^+_{<j-2C}$, but unfortunately the tools we have (Lemmata 3, 4, 6) do not allow us to do so automatically. We must therefore split
\[
Q^+_{<j-2C} = 1 - \bigl(1 - Q^+_{\le j+C}\bigr) - Q^+_{j-2C\le\cdot\le j+C}.
\]

Case 3(d).1. (Output dominates $F$) The contribution of $1 - Q^+_{\le j+C}$. From (a dual of) Lemma 11 we see that this term vanishes. (The projection $Q^-_{\le j+C}$ does not contribute, from the Fourier support of $F$ and the hypothesis $k_2 < -C$.)

Case 3(d).2. (Output balances $F$) The contribution of $Q^+_{j-2C\le\cdot\le j+C}$. By (92) and orthogonality we may bound this contribution by
\[
2^{-j/2} \Bigl( \sum_{\kappa,\kappa'\in K_l:\ \operatorname{dist}(\kappa,\kappa')\sim 2^{-l+C}} \bigl\| P_0 P_{0,\kappa} Q^+_{j-2C\le\cdot\le j+C}\bigl(F\, P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr) \bigr\|^2_{L^2_t L^2_x} \Bigr)^{1/2}.
\]
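We now use Lemma 6 to discard $P_0 P_{0,\kappa} Q^+_{j-2C\le\cdot\le j+C}$ and estimate this by
\[
2^{-j/2} \Bigl( \sum_{\kappa,\kappa'\in K_l:\ \operatorname{dist}(\kappa,\kappa')\sim 2^{-l+C}} \|F\|^2_{L^2_t L^2_x}\, \bigl\|P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr\|^2_{L^\infty_t L^\infty_x} \Bigr)^{1/2}.
\]
By the improved Bernstein's inequality (7) we can estimate this by
\[
2^{-j/2}\, 2^{nk_2/2}\, 2^{-(n-1)l/2}\, \|F\|_{L^2_t L^2_x} \Bigl( \sum_{\kappa,\kappa'\in K_l:\ \operatorname{dist}(\kappa,\kappa')\sim 2^{-l+C}} \bigl\|P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr\|^2_{L^\infty_t L^2_x} \Bigr)^{1/2}.
\]
The $\kappa$ sum is now trivial and can be discarded. We can factorize $P_{k_2,\pm\kappa'} Q^\pm_{<j-C}$ as $P_{k_2,\pm\kappa'} Q_{<j-C}$ times a disposable multiplier, which we then discard. By Plancherel we can thus bound the previous by
\[
2^{-j/2}\, 2^{nk_2/2}\, 2^{-(n-1)l/2}\, \|F\|_{L^2_t L^2_x} \|Q_{<j-C}\psi\|_{L^\infty_t L^2_x},
\]
which is acceptable by (81).

Case 3(d).3. (Output arbitrary) The contribution of $1$. We use (93) to estimate this contribution by
\[
\Bigl( \sum_{\kappa\in K_l} \Bigl\| \sum_{\kappa'\in K_l:\ \operatorname{dist}(\kappa,\kappa')\sim 2^{-l+C}} P_0 P_{0,\kappa}\bigl(F\, P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr) \Bigr\|^2_{NFA[\kappa]} \Bigr)^{1/2}.
\]
Since the interior summation is only over $O(1)$ choices, we may estimate this by
\[
\Bigl( \sum_{\kappa,\kappa'\in K_l:\ \operatorname{dist}(\kappa,\kappa')\sim 2^{-l+C}} \bigl\| P_0 P_{0,\kappa}\bigl(F\, P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr) \bigr\|^2_{NFA[\kappa]} \Bigr)^{1/2}.
\]
We discard $P_0 P_{0,\kappa}$ and use (72) to bound the previous by
\[
2^{l}\, 2^{-(n-1)l/2}\, 2^{-k_2/2} \Bigl( \sum_{\kappa,\kappa'\in K_l:\ \operatorname{dist}(\kappa,\kappa')\sim 2^{-l+C}} \|F\|^2_{L^2_t L^2_x}\, \bigl\|P_{k_2,\pm\kappa'} Q^\pm_{<j-C}\psi\bigr\|^2_{S[k_2,\kappa']} \Bigr)^{1/2}.
\]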
The κ summation is trivial and can be discarded. By construction of l we have Q<j −C = Q
x
which is acceptable. This concludes the proof of Case 3, and thus of Lemma 12. In the next three sections we show how the above core product estimates can be used to prove (28), (29), (18), (21), (20), and (30).
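The sector-decomposition sub-cases of the proof above repeatedly produce the factor $2^{l} 2^{-(n-1)l/2}$ after applying (72). As a quick check of the numerology, the following display simplifies this factor and specializes it using only the definition $l := (k-j)/2 + C/4$ from Case 1(c). The specializations to $n = 2$ and $n = 3$ are for illustration and are not a substitute for the case analysis in the text.

```latex
\[
2^{l}\, 2^{-(n-1)l/2} \;=\; 2^{\,l - (n-1)l/2} \;=\; 2^{-(n-3)l/2},
\]
\[
n = 2:\quad 2^{l/2} \;=\; 2^{(k-j)/4 + C/8},
\qquad\qquad
n = 3:\quad 2^{0} \;=\; 1 .
\]
```

In two dimensions the factor grows with $l$, whereas for $n \ge 3$ and $l \ge 0$ it is at most $1$; this is consistent with the earlier remark that the estimates are not as tight in three and higher dimensions.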
15. Product Estimates: The Proof of (28), (29)

From Lemma 12 it shall be straightforward to prove (28) and (29). To prove these estimates for $[-T,T] \times \mathbb{R}^n$ it suffices to do so on $\mathbb{R}^{1+n}$. By Lemma 1 we may replace $L(\phi, F)$ with $\phi F$. We may also replace $N_k$, $N_{k_2}$ with $N[k]$, $N[k_2]$. By (89) and dyadic decomposition we see that both (28) and (29) will follow from
\[
\|P_k(\phi F)\|_{N[k](\mathbb{R}^{1+n})} \lesssim \chi^{(4)}_{k\ge k_2}\, \|\phi\|_{S(1)(\mathbb{R}^{1+n})} \|F\|_{N[k_2](\mathbb{R}^{1+n})} \tag{119}
\]
for all Schwartz φ ∈ S(1)(R1+n ) and Schwartz F ∈ N [k2 ](R1+n ). The estimate (119) shall also be helpful in proving (31). We may rescale k = 0. We divide into three cases: k2 > 10, k2 < −10, and −10 ≤ k2 ≤ 10. Case 1: k2 > 10 (High-high interactions). We may replace φ by φk2 −5<·
(120)
for all $k_2 - 5 < k_1 < k_2 + 5$. Fix $k_1$. We may of course assume $F$ is an $N[k_2]$ atom. As usual we split into three cases.

Case 1(a). $F$ is an $L^1_t \dot H^{n/2-1}_x$ atom with frequency $2^{k_2}$. We use (91) followed by Bernstein's inequality (6) to estimate the left-hand side of (120) by
\[
\|\phi_{k_1} F\|_{L^1_t L^1_x} \lesssim \|\phi_{k_1}\|_{L^\infty_t L^2_x} \|F\|_{L^1_t L^2_x}.
\]
By (81) and the Case 1(a) assumption this is bounded by
\[
2^{-nk_1/2}\, \|\phi_{k_1}\|_{S[k_1]}\, 2^{-(n/2-1)k_2},
\]
which is acceptable.

Case 1(b). $F$ is an $\dot X^{n/2-1,-1/2,1}$ atom with frequency $2^{k_2}$ and modulation $2^{j}$. We may assume that $j \ge k_1 + 10$ since the claim follows from Lemma 12 and Lemma 10 otherwise. First consider the contribution of $Q_{<j-20}\phi_{k_1}$. This contribution has Fourier support in $D_0 \sim 1$, $D_- \sim 2^{j}$, so by (92) we may bound it by
\[
2^{-j/2}\, \|Q_{<j-20}\phi\, F\|_{L^2_t L^2_x}.
\]
By Bernstein's inequality (6) and Hölder we may bound this by
\[
2^{-j/2}\, \|Q_{<j-20}\phi\|_{L^\infty_t L^2_x} \|F\|_{L^2_t L^2_x},
\]
which is acceptable by (82) and the Case 1(b) hypothesis.

Finally, consider the contribution of $Q_{>j-20}\phi_{k_1}$. In this case we use (91) and Bernstein's inequality (6) to bound this by
\[
\|Q_{>j-20}\phi_{k_1} F\|_{L^1_t L^1_x} \lesssim \|Q_{>j-20}\phi_{k_1}\|_{L^2_t L^2_x} \|F\|_{L^2_t L^2_x},
\]
which is acceptable by (83) and the Case 1(b) hypothesis.

Case 1(c). $F$ is a null frame atom with frequency $2^{k_2}$. In this case we split $F = \sum_{j \le k_2+5} Q_j F$. From Lemma 10 and Lemma 12 we have
\[
\|P_0(\phi_{k_1} Q_j F)\|_{N[0](\mathbb{R}^{1+n})} \lesssim \chi^{(4)}_{k_2=0}\, \chi^{(4)}_{k_2=j}\, \|\phi_{k_1}\|_{S[k_1]}
\]
for all such $j$, and the claim follows by summing in $j$.

Case 2: $k_2 < -10$ (High-low interactions). We may replace $\phi$ by $\phi_{-5<\cdot<5}$. By Littlewood–Paley decomposition it thus suffices to prove the estimate
\[
\|P_0(\phi_{k_1} F)\|_{N[0](\mathbb{R}^{1+n})} \lesssim \|\phi_{k_1}\|_{S[k_1]} \|F\|_{N[k_2](\mathbb{R}^{1+n})} \tag{121}
\]
for all $-5 < k_1 < 5$. We may of course assume $F$ is an $N[k_2]$ atom. As usual we split into three cases.

Case 2(a). $F$ is an $L^1_t \dot H^{n/2-1}_x$ atom with frequency $2^{k_2}$. We use (91) to estimate the left-hand side of (121) by
\[
\|\phi_{k_1} F\|_{L^1_t L^2_x} \lesssim \|\phi_{k_1}\|_{L^\infty_t L^2_x} \|F\|_{L^1_t L^\infty_x}.
\]
By (81), Bernstein's inequality (5) and the Case 2(a) assumption this is bounded by $\|\phi_{k_1}\|_{S[k_1]}\, 2^{k_2}$, which is acceptable.

Case 2(b). $F$ is an $\dot X^{n/2-1,-1/2,1}$ atom with frequency $2^{k_2}$ and modulation $2^{j}$. We may assume that $j \ge k_2 + 10$ since the claim follows from Lemma 12 and Lemma 10 otherwise. Consider the contribution of $Q_{<j-20}\phi_{k_1}$. This contribution has Fourier support in $D_0 \sim 1$, $D_- \sim 2^{j}$, so by (92) we may estimate it by
\[
2^{-j/2}\, \|Q_{<j-20}\phi_{k_1} F\|_{L^2_t L^2_x} \lesssim 2^{-j/2}\, \|Q_{<j-20}\phi_{k_1}\|_{L^\infty_t L^2_x} \|F\|_{L^2_t L^\infty_x}.
\]
But this is acceptable by (81), Bernstein's inequality (5), and the Case 2(b) hypothesis. Finally, consider the contribution of $Q_{\ge j-20}\phi_{k_1}$. We use (91) to bound this by
\[
\|Q_{\ge j-20}\phi_{k_1} F\|_{L^1_t L^2_x} \lesssim \|Q_{\ge j-20}\phi_{k_1}\|_{L^2_t L^2_x} \|F\|_{L^2_t L^\infty_x}.
\]
But this is acceptable by (83), Bernstein's inequality (5), and the Case 2(b) hypothesis.
Case 2(c). $F$ is a $\pm$-null frame atom with frequency $2^{k_2}$ and angle $2^{-l}$. In this case we split $F = \sum_{j \le k_2+5} Q_j F$. From Lemma 10 and Lemma 12 we have
\[
\|P_0(\phi_{k_1} Q_j F)\|_{N[0](\mathbb{R}^{1+n})} \lesssim \chi^{(4)}_{k_2=j}\, \|\phi_{k_1}\|_{S[k_1]}
\]
for all such $j$, and the claim follows by summing in $j$.

Case 3: $-10 \le k_2 \le 10$ (Low-high interactions). We may then freely apply the Littlewood–Paley cutoff $P_{<20}$ to $\phi$, so that $\phi$ has Fourier support in the region $D_0 \lesssim 1$. We may assume that $F$ is an $N[k_2]$ atom. We now split into three cases depending on which type of atom $F$ is.

Case 3(a). $F$ is an $L^1_t \dot H^{n/2-1}_x$ atom with frequency $2^{k_2}$. In this case we simply use (91) to compute
\[
\|P_0(\phi F)\|_{N[0](\mathbb{R}^{1+n})} \lesssim \|P_0(\phi F)\|_{L^1_t L^2_x} \lesssim \|\phi\|_{L^\infty_t L^\infty_x} \|F\|_{L^1_t L^2_x},
\]
which is acceptable by (88).

Case 3(b). $F$ is an $\dot X^{n/2-1,-1/2,1}$ atom with frequency $2^{k_2}$ and modulation $2^{j}$. In this case $F$ has Fourier support in $D_0 \sim 1$, $D_- \sim 2^{j}$ and
\[
\|F\|_{L^2_t L^2_x} \lesssim 2^{j/2}. \tag{122}
\]
From Lemma 12 we thus have
\[
\|P_0(\phi_{k_1} F)\|_{N[0](\mathbb{R}^{1+n})} \lesssim \|\phi_{k_1}\|_{S[k_1]}\, 2^{-j/2}\, \chi^{(4)}_{j=k_1}\, \|F\|_{L^2_t L^2_x}
\]
whenever $k_1 \ge j - C$. Summing this and using (88) we see that
\[
\|P_0(\phi_{\ge j-C} F)\|_{N[0](\mathbb{R}^{1+n})} \lesssim \|\phi\|_{S(1)(\mathbb{R}^{1+n})}\, 2^{-j/2}\, \|F\|_{L^2_t L^2_x}.
\]
It therefore suffices to consider the case when $\phi$ is close to the origin in frequency. More precisely, we reduce to showing
\[
\|P_0(\phi_{<j-C} F)\|_{N[0](\mathbb{R}^{1+n})} \lesssim \|\phi\|_{S(1)(\mathbb{R}^{1+n})}\, 2^{-j/2}\, \|F\|_{L^2_t L^2_x}.
\]
We split this into two contributions.

Case 3(b).1. ($F$ does not dominate $\phi$) The contribution of $P_0(Q_{>j-2C}\phi_{<j-C} F)$. By (91) and the triangle inequality we estimate this contribution by
\[
\sum_{k_1 \le j-C} \|P_0(Q_{>j-2C}\phi_{k_1} F)\|_{L^1_t L^2_x}.
\]
Discarding $P_0$, we can bound this by
\[
\sum_{k_1 \le j-C} \|Q_{>j-2C}\phi_{k_1}\|_{L^2_t L^\infty_x} \|F\|_{L^2_t L^2_x}.
\]
By (84), (122), (88) we can bound this by
\[
\sum_{k_1 \le j-C} 2^{-(j-k_1)}\, 2^{-j/2}\, \|\phi\|_{S(1)(\mathbb{R}^{1+n})}\, 2^{j/2},
\]
which is acceptable.

Case 3(b).2. ($F$ dominates $\phi$) The contribution of $P_0(Q_{\le j-2C}\phi_{<j-C} F)$. This contribution has Fourier support in the region $D_0 \sim 1$, $D_- \sim 2^{j}$, and can therefore be estimated using (92) by
\[
2^{-j/2}\, \|P_0(Q_{\le j-2C}\phi_{\le j-C} F)\|_{L^2_t L^2_x}.
\]
We then discard $P_0$ and bound this by
\[
2^{-j/2}\, \|Q_{\le j-2C} P_{\le j-C}\phi\|_{L^\infty_t L^\infty_x} \|F\|_{L^2_t L^2_x}.
\]
But this is acceptable by (88), (122), and Lemma 3. This concludes the treatment of the case when $F$ is an $\dot X^{n/2-1,-1/2,1}$ atom.

Case 3(c). $F$ is a $\pm$-null frame atom with frequency $2^{k_2}$ and angle $2^{-l}$. By conjugation symmetry we may take $\pm = +$. We then have a decomposition $F = \sum_{\kappa\in K_l} F_\kappa$ such that each $F_\kappa$ has Fourier support in the region
\[
\bigl\{ (\tau,\xi) : \tau > 0;\ D_- \lesssim 2^{-2l};\ D_+ \sim 1;\ \Theta \in \tfrac12\kappa \bigr\}
\]
and such that
\[
\Bigl( \sum_{\kappa\in K_l} \|F_\kappa\|^2_{NFA[\kappa]} \Bigr)^{1/2} \lesssim 1. \tag{123}
\]
We now split $P_0(\phi F)$ into three pieces and treat each contribution separately.

Case 3(c).1. ($F$ stays away from light cone, $\phi$ close to origin) The contribution of $P_0(\phi_{<-2l-2C}\, Q_{\ge -2l-C} F)$. The term $Q_{\ge -2l-C} F$ has Fourier support on the region $D_0 \sim 1$, $D_- \sim 2^{-2l}$. From Lemma 10 and (92) we thus see that $Q_{\ge -2l-C} F$ is a bounded linear combination of $\dot X^{n/2-1,-1/2,1}$ atoms of frequency $2^{k_2}$ and modulation $\sim 2^{-2l}$, in which case the Case 2 argument applies.

Case 3(c).2. ($F$ close to light cone, $\phi$ close to origin) The contribution of $P_0(\phi_{<-2l-2C}\, Q_{<-2l-C} F)$. In this case the idea is to use (64) and the fact that $\phi$ is bounded and does not significantly affect frequency support. We subdivide this contribution as
\[
\Bigl\| \sum_{\kappa'\in K_{l+10}} \sum_{\kappa\in K_l} P_{0,\kappa'} P_0\bigl(\phi_{<-2l-2C}\, Q_{<-2l-C} F_\kappa\bigr) \Bigr\|_{N[0](\mathbb{R}^{1+n})}.
\]
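By (93) we can bound this by
\[
\Bigl( \sum_{\kappa'\in K_{l+10}} \Bigl\| \sum_{\kappa\in K_l} P_{0,\kappa'} P_0\bigl(\phi_{<-2l-2C}\, Q_{<-2l-C} F_\kappa\bigr) \Bigr\|^2_{NFA[\kappa']} \Bigr)^{1/2}.
\]
We may restrict the $\kappa$ summation to those $\kappa$ for which $\kappa' \subset \kappa$, since the contribution vanishes otherwise after applying $P_{0,\kappa'}$. Thus for each $\kappa'$ there are only $O(1)$ values of $\kappa$ which contribute, so we can estimate the previous by
\[
\Bigl( \sum_{\kappa'\in K_{l+10}}\ \sum_{\kappa\in K_l:\ \kappa'\subset\kappa} \bigl\| P_{0,\kappa'} P_0\bigl(\phi_{<-2l-2C}\, Q_{<-2l-C} F_\kappa\bigr) \bigr\|^2_{NFA[\kappa']} \Bigr)^{1/2}.
\]
By (65) we may replace the $NFA[\kappa']$ norm with the $NFA[\kappa]$ norm. We then discard $P_{0,\kappa'} P_0$ and use (64), (88) to bound the previous by
\[
\|\phi\|_{S(1)(\mathbb{R}^{1+n})} \Bigl( \sum_{\kappa'\in K_{l+10}}\ \sum_{\kappa\in K_l:\ \kappa'\subset\kappa} \|Q_{<-2l-C} F_\kappa\|^2_{NFA[\kappa]} \Bigr)^{1/2}.
\]
By (90) we may insert $\tilde P_{k_2,\kappa}$ in front of $F_\kappa$. By Lemma 6 we may discard $\tilde P_{k_2,\kappa} Q_{<-2l-C}$. The claim now follows from (123) and the observation that for each $\kappa$ there are only $O(1)$ values of $\kappa'$ which contribute.

Case 3(c).3. ($\phi$ not too close to origin) The contribution of $P_0(\phi_{\ge -2l-2C} F)$. We use the triangle inequality to estimate this contribution by
\[
\sum_{k_1 \ge -2l-2C}\ \sum_{j} \|P_0(\phi_{k_1} Q_j F)\|_{N[0](\mathbb{R}^{1+n})}.
\]
The case $k_1 > C$ can be ignored since its contribution vanishes. Similarly for $j > -2l + C$. For the remaining cases we apply Lemma 12 to estimate the contribution by
\[
\sum_{-2l-2C \le k_1 \le C}\ \sum_{j \le -2l+C} \|\phi_{k_1}\|_{S[k_1]}\, 2^{-j/2}\, \chi^{(4)}_{j=k_1}\, \|Q_j F\|_{L^2_t L^2_x}.
\]
By (88) and Lemma 10 we can bound this by
\[
\sum_{-2l-2C \le k_1 \le C}\ \sum_{j \le -2l+C} \|\phi\|_{S(1)(\mathbb{R}^{1+n})}\, \chi^{(4)}_{j=k_1},
\]
which is acceptable.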
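To see why the final double sum in Case 3(c).3 is bounded uniformly in $l$, one can carry out the summation explicitly. The sketch below assumes that $\chi^{(4)}_{j=k_1}$ denotes a factor of the form $2^{-\delta|j-k_1|}$ for some small $\delta > 0$ (the definition is not restated in this section, so this is an assumption made for illustration), and that $C$ is a positive integer.

```latex
% For fixed k_1, sum in j first; (x)_+ denotes max(x, 0).
\[
\sum_{j \le -2l+C} 2^{-\delta|j-k_1|}
\;\le\; \frac{2}{1-2^{-\delta}}\; 2^{-\delta\,(k_1 + 2l - C)_+} .
\]
% Then sum in k_1: at most 3C+1 values have k_1 <= -2l+C,
% and the remaining values form a geometric series.
\[
\sum_{-2l-2C \le k_1 \le C}\ \sum_{j \le -2l+C} 2^{-\delta|j-k_1|}
\;\le\; \frac{2}{1-2^{-\delta}} \Bigl( (3C+1) + \frac{1}{1-2^{-\delta}} \Bigr)
\;=\; O_{C,\delta}(1) .
\]
```

The point is that although $k_1$ ranges over roughly $2l$ values, only those within $O(1)$ of the range of $j$ contribute substantially, so no logarithmic loss in $l$ occurs.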
16. Algebra Estimates: The Proof of (18), (21), (20)

We now use the core product estimates in Sect. 14, combined with some duality arguments and some additional work, to obtain the algebra estimates (18), (21), (20).

Step 1. Obtain decay near the light cone. The purpose of this step is to prove the following lemma, which asserts that if $\phi \in S[k_1]$ and $\psi \in S[k_2]$, then $\phi\psi$ begins to decay once one approaches within $\min(2^{k_1}, 2^{k_2})$ of the light cone in frequency space.$^{33}$ This lemma shall also be useful in dealing with several sub-cases of the trilinear estimate (31).

Lemma 13. Let $j, k_1, k_2, k$ be such that $j \le \min(k_1,k_2) + O(1)$. Then
\[
\bigl\| 2^{-k}\nabla_{x,t} P_k Q_j(\phi\psi) \bigr\|_{\dot X^{n/2,1/2,1}_k} \lesssim \chi^{(4)}_{k=\max(k_1,k_2)}\, \chi^{(4)}_{j=\min(k_1,k_2)}\, \|\phi\|_{S[k_1]} \|\psi\|_{S[k_2]}
\]
for all Schwartz functions $\phi \in S[k_1]$, $\psi \in S[k_2]$. As is usual the $2^{-k}\nabla_{x,t}$ can be discarded by Littlewood–Paley calculus if desired.

Proof. By symmetry we may assume $k_1 \le k_2$. By scaling we may take $k = 0$. By conjugation symmetry we may assume that $\phi$, $\psi$ are real-valued. We may assume that either $k_2 = O(1)$ (low-high interaction) or that $k_2 \ge O(1)$ and $k_1 = k_2 + O(1)$ (high-high interaction). We may of course replace $\psi$ by $P_{k_2-5\le\cdot\le k_2+5}\psi$. The left-hand side is
\[
\sim 2^{j/2}\, \|\nabla_{x,t} P_0 Q_j(\phi\psi)\|_{L^2_t L^2_x},
\]
which by duality is
\[
\sim 2^{j/2}\, \bigl\langle \psi,\ P_{k_2-5\le\cdot\le k_2+5}\bigl(\phi\, \nabla_{x,t} P_0 Q_j F\bigr) \bigr\rangle
\]
for some $F$ in the unit ball of $L^2_t L^2_x$. By (94) this is
\[
\lesssim 2^{j/2}\, 2^{-(n-1)k_2}\, \|\psi\|_{S[k_2]} \sum_{k_2' = k_2 + O(1)} \bigl\| P_{k_2'}\bigl(\phi\, \nabla_{x,t} P_0 Q_j F\bigr) \bigr\|_{N[k_2']}. \tag{124}
\]
Suppose we are in the low-high interaction case $k_2 = O(1)$. By Lemma 12 we can estimate (124) by
\[
\sum_{k_2' = k_2 + O(1)} 2^{j/2}\, \|\psi\|_{S[k_2]}\, \chi^{(4)}_{j=k_1}\, \|\phi\|_{S[k_1]}\, \|\nabla_{x,t} P_0 Q_j F\|_{\dot X^{n/2-1,-1/2,1}_0},
\]
which is acceptable by the normalization of $F$ (estimating $\nabla_{x,t}$ by $O(1)$). Now suppose we are in the high-high interaction case $k_2 \ge O(1)$, $k_1 = k_2 + O(1)$. By Lemma 12 we may estimate (124) by
\[
2^{j/2}\, 2^{-(n-1)k_2} \sum_{k_2' = k_2 + O(1)} \|\psi\|_{S[k_2]}\, \chi^{(4)}_{j=0}\, \|\phi\|_{S[k_1]}\, \|\nabla_{x,t} P_0 Q_j F\|_{\dot X^{n/2-1,-1/2,1}_0},
\]
which is acceptable by the normalization of $F$ (estimating $\nabla_{x,t}$ by $O(1 + 2^{j})$). This proves the lemma.

33 We shall prove this lemma by duality as this is by far the quickest way to do so. However, it is an instructive exercise to prove this lemma directly. Geometrically, the lemma reflects the fact that the portion of $\phi\psi$ this close to the light cone arises from small angle interactions, which give a better contribution than large angle interactions in dimensions 2 and higher.
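The duality step in the proof of Lemma 13 can be unpacked as follows. This is a sketch under the assumptions that $\langle\cdot,\cdot\rangle$ denotes the real $L^2_t L^2_x$ pairing (summed over the components of the gradient), that $P_0$, $Q_j$ and the Littlewood–Paley projections are self-adjoint with respect to it, and that $\phi$, $\psi$ are real-valued as arranged in the proof. The test function is called $G$ here only to keep it visually separate from the $F$ of Lemma 12; it plays the role of the $F$ in the proof.

```latex
\begin{align*}
\|\nabla_{x,t} P_0 Q_j(\phi\psi)\|_{L^2_t L^2_x}
 &= \sup_{\|G\|_{L^2_t L^2_x}\le 1}
    \bigl|\langle \nabla_{x,t} P_0 Q_j(\phi\psi),\, G\rangle\bigr| \\
 &= \sup_{\|G\|_{L^2_t L^2_x}\le 1}
    \bigl|\langle \phi\psi,\, \nabla_{x,t} P_0 Q_j G\rangle\bigr|
    && \text{(integration by parts; $P_0$, $Q_j$ self-adjoint)} \\
 &= \sup_{\|G\|_{L^2_t L^2_x}\le 1}
    \bigl|\langle \psi,\, \phi\,\nabla_{x,t} P_0 Q_j G\rangle\bigr|
    && \text{($\phi$ real-valued)} \\
 &= \sup_{\|G\|_{L^2_t L^2_x}\le 1}
    \bigl|\langle \psi,\, P_{k_2-5\le\cdot\le k_2+5}(\phi\,\nabla_{x,t} P_0 Q_j G)\rangle\bigr|
    && \text{($\psi$ localized to frequencies $2^{k_2+O(1)}$)} .
\end{align*}
```

The last line is what allows (94) to be applied: the second factor is now frequency-localized near $2^{k_2}$, so it can be measured in the $N[k_2']$ norms with $k_2' = k_2 + O(1)$.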
Step 2. Control frequency-localized interactions. The purpose of this step is to prove the estimate
\[
\|P_k(\phi\psi)\|_{S[k]} \lesssim \chi^{(4)}_{k=\max(k_1,k_2)}\, \|\phi\|_{S[k_1]} \|\psi\|_{S[k_2]} \tag{125}
\]
whenever $k, k_1, k_2$ are integers and $\phi \in S[k_1]$, $\psi \in S[k_2]$ are Schwartz functions. In this step we shall also prove the strengthened estimate
\[
\|P_k(\phi\psi)\|_{S[k]} \lesssim \|\phi\|_{S(1)} \|\psi\|_{S[k_2]} \tag{126}
\]
in the low-high interaction case $k = k_2 + O(1)$. To prove (125) we may use the Littlewood–Paley product trichotomy and symmetry to divide into the high-high interaction case $k \le k_2 + O(1)$, $k_1 = k_2 + O(1)$ and the low-high interaction case $k = k_2 + O(1)$, $k_1 \le k_2 + O(1)$. In the latter case we see from (89) that it will suffice to prove (126).

Case 2(a). (High-high interactions) Proof of (125) when $k \le k_2 + O(1)$, $k_1 = k_2 + O(1)$. By scale invariance we may take $k_2 = O(1)$, so $k_1 = O(1)$ and $k \le O(1)$. First consider the $L^\infty_t L^2_x$ component of the $S[k]$ norm in (79). By Bernstein (6) and Hölder we have
\[
\|\nabla_{x,t} P_k(\phi\psi)\|_{L^\infty_t L^2_x} \lesssim 2^{nk/2}\, \|\nabla_{x,t}(\phi\psi)\|_{L^\infty_t L^1_x}
\lesssim 2^{nk/2} \bigl( \|\nabla_{x,t}\phi\|_{L^\infty_t L^2_x} \|\psi\|_{L^\infty_t L^2_x} + \|\phi\|_{L^\infty_t L^2_x} \|\nabla_{x,t}\psi\|_{L^\infty_t L^2_x} \bigr),
\]
(127) 2 and so by (79) we see that the L∞ t Lx component of the S[k] norm is acceptable. To deal with the other two components we split Pk (φψ) = Pk Q
which implies from Lemma 8 that Pk Q
x
for all $j \ge k_1 + C$. Fix $j$. Consider the contribution of $Q_{>j-C}\phi$. By Plancherel we may discard $P_k Q_j$, and estimate this contribution by
\[
2^{nk/2}\, 2^{j-k}\, \|Q_{>j-C}\phi\|_{L^2_t L^\infty_x} \|\psi\|_{L^\infty_t L^2_x},
\]
which is acceptable by (84), (81). Similarly one may dispose of the contribution of $Q_{\le j-C}\phi$ and $Q_{>j-C}\psi$, which leaves only the contribution of $Q_{\le j-C}\phi$ and $Q_{\le j-C}\psi$. But this vanishes by the assumption on $j$, and so we are done.
Case 2(b). (Low-high interactions) Proof of (126) when $k = k_2 + O(1)$, $k_1 \le k_2 + O(1)$. This case will not follow as easily from Lemma 13 as Case 2(a), because of the possible logarithmic pile-up of low frequencies. However, we can compensate for this because of the $L^\infty_t L^\infty_x$ control in (86). By scale invariance we may take $k = 0$, so that $k_2 = O(1)$. By a limiting argument we may assume that $\psi$ is Schwartz, and that $\phi$ is a Schwartz function plus a constant. If $\phi$ is a constant then the claim is trivial from Lemma 9, so we may assume that $\phi$ is actually Schwartz. Expanding out (79), we have to show that
\[
\|P_0 \nabla_{x,t}(\phi\psi)\|_{L^\infty_t L^2_x} \lesssim \|\phi\|_{S(1)(\mathbb{R}^{1+n})} \|\psi\|_{S[k_2]}, \tag{128}
\]
that
\[
(1 + 2^{j})\, \|P_0 Q_j(\phi\psi)\|_{L^2_t L^2_x} \lesssim 2^{-j/2}\, \|\phi\|_{S(1)(\mathbb{R}^{1+n})} \|\psi\|_{S[k_2]} \tag{129}
\]
for all $j \in \mathbb{Z}$, and that
\[
\Bigl( \sum_{\kappa\in K_l} \bigl\| P_{0,\kappa} P_0 Q^+_{<-2l}(\phi\psi) \bigr\|^2_{S[0,\kappa]} \Bigr)^{1/2} \lesssim \|\phi\|_{S(1)(\mathbb{R}^{1+n})} \|\psi\|_{S[k_2]} \tag{130}
\]
for all $l > 10$ (the corresponding estimate for the $-$ sign follows by conjugation symmetry).

Step 2(b).1. Energy estimate: Proof of (128). We use (127) from Step 2(a). The claim then follows from (82), (79), (88) and dyadic decomposition.

Step 2(b).2. $\dot X^{n/2,1/2,\infty}$ estimate: Proof of (129). Fix $j$. We first deal with the easy case $j > C$. Consider the contribution of $Q_{>j-C}\psi$. By Plancherel we may discard $P_0 Q_j$ and estimate the previous by
\[
(1 + 2^{j})\, \|\phi\|_{L^\infty_t L^\infty_x} \|Q_{>j-C}\psi\|_{L^2_t L^2_x},
\]
which is acceptable by (19), (83). The contribution of $Q_{\le j-C}\phi$ and $Q_{\le j-C}\psi$ vanishes, so we only need consider the contribution of $Q_{>j-C}\phi$ and $Q_{\le j-C}\psi$. By Plancherel we may discard $P_0 Q_j$ and estimate this contribution by
\[
(1 + 2^{j})\, \|Q_{>j-C}\phi\|_{L^2_t L^2_x} \|Q_{\le j-C}\psi\|_{L^\infty_t L^\infty_x}.
\]
But this is acceptable by (83), (82), (88) and dyadic decomposition. Thus we may assume that $j < O(1)$. From Lemma 13, (88) and dyadic decomposition we see that
\[
\|P_0 Q_j(\phi_{\ge j-C}\psi)\|_{L^2_t L^2_x} \lesssim 2^{-j/2}\, \|\phi\|_{S(1)(\mathbb{R}^{1+n})} \|\psi\|_{S[k_2]}.
\]
Thus we need only show that
\[
\|P_0 Q_j(\phi_{<j-C}\psi)\|_{L^2_t L^2_x} \lesssim 2^{-j/2}\, \|\phi\|_{S(1)(\mathbb{R}^{1+n})} \|\psi\|_{S[k_2]}.
\]
Consider first the contribution of $Q_{\ge j-C/2}\phi_{<j-C}$. By (84) the $L^2_t L^\infty_x$ norm of this is $O(2^{-j/2}\|\phi\|_{S(1)(\mathbb{R}^{1+n})})$, while the $L^\infty_t L^2_x$ norm of $\psi$ is $O(\|\psi\|_{S[k_2]})$ by (81). The claim follows by discarding $P_0 Q_j$ and using Hölder. It remains to control $Q_{<j-C/2}\phi_{<j-C}$. For this contribution we may replace $\psi$ by $Q_{j-10<\cdot<j+10}\psi$ since the left-hand side vanishes otherwise. We then use Plancherel to discard $P_0 Q_j$ and estimate the left-hand side by
\[
\|Q_{<j-C/2}\phi_{<j-C}\|_{L^\infty_t L^\infty_x} \|Q_{j-10<\cdot<j+10}\psi\|_{L^2_t L^2_x},
\]
which is acceptable by (88), (83). This proves (129).

Step 2(b).3. Null frame estimate: Proof of (130). Fix $l > 10$. We divide (130) into two contributions.

Case 2(b).3(a). ($\phi$ not too close to origin) The contribution of $\phi_{>-2l-C}$. By Lemma 7 we can bound this contribution by
\[
\bigl\| P_0 Q^+_{<-2l}(\phi_{>-2l-C}\psi) \bigr\|_{\dot X^{n/2,1/2,1}_0} \lesssim \sum_{j < -2l+C} 2^{j/2}\, \bigl\| P_0 Q^+_j(\phi_{>-2l-C}\psi) \bigr\|_{L^2_t L^2_x}.
\]
But the right-hand side is acceptable by Lemma 13 (estimating $Q^+_j$ by $Q_j$), (88) and dyadic decomposition.

Case 2(b).3(b). ($\phi$ close to origin) The contribution of $\phi_{\le -2l-C}$. In this case the idea is to use (66) and the fact that $\phi$ is bounded and does not significantly affect frequency support. We may assume that $\psi$ has Fourier support in $2^{-2.5} \le D_0 \le 2^{2.5}$ since this contribution vanishes otherwise. We first take advantage of Lemma 2 to write
\[
P_0(\phi_{\le -2l-C}\psi) = \phi_{\le -2l-C}\,\psi_0 + L(\nabla\phi_{\le -2l-C}, \psi).
\]
We consider the contribution of each term separately.

Case 2(b).3(b).1. (Commutator term) The contribution of $L(\nabla\phi_{\le -2l-C}, \psi)$. By Lemma 7 we may control this by
\[
\bigl\| Q^+_{<-2l} L(\nabla\phi_{\le -2l-C}, \psi) \bigr\|_{\dot X^{n/2,1/2,1}_0},
\]
which we can bound using (9) by
\[
\sum_{j < -2l+C}\ \sum_{k_1 \le -2l-C} 2^{j/2}\, 2^{k_1}\, \|Q_j L(\phi_{k_1}, \psi)\|_{L^2_t L^2_x}.
\]
However, from (129), (88), and Lemma 1 we have
\[
\|Q_j L(\phi_{k_1}, \psi)\|_{L^2_t L^2_x} \lesssim 2^{-j/2}\, \|\phi\|_{S(1)(\mathbb{R}^{1+n})} \|\psi\|_{S[k_2]},
\]
while from Lemma 13, (88), and Lemma 1 we have
\[
\|Q_j L(\phi_{k_1}, \psi)\|_{L^2_t L^2_x} \lesssim 2^{-j/2}\, \chi^{(4)}_{k_1=j}\, \|\phi\|_{S(1)(\mathbb{R}^{1+n})} \|\psi\|_{S[k_2]}
\]
when $j \le k_1 + O(1)$. Combining these estimates we see that the contribution of $L(\nabla\phi_{\le -2l-C}, \psi)$ is acceptable (in fact we have a sizeable gain in $l$).
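Case 2(b).3(b).2. (Main term) The contribution of $\phi_{\le -2l-C}\,\psi_0$. We need to show
\[
\Bigl( \sum_{\kappa\in K_l} \bigl\| P_{0,\kappa} Q^+_{<-2l}(\phi_{\le -2l-C}\,\psi_0) \bigr\|^2_{S[0,\kappa]} \Bigr)^{1/2} \lesssim \|\phi\|_{S(1)(\mathbb{R}^{1+n})} \|\psi\|_{S[k_2]}. \tag{131}
\]
Let us first consider the contribution of $Q_{>-2l-C/2}\phi_{\le -2l-C}$. For this term we use Lemma 7 to bound this by
\[
2^{-l}\, \|Q_{>-2l-C/2}\phi_{\le -2l-C}\,\psi_0\|_{L^2_t L^2_x}.
\]
But this is acceptable using (84) to estimate $Q_{>-2l-C/2}\phi_{\le -2l-C}$ and (81) to estimate $\psi_0$ (cf. the corresponding term in Step 2(b).2). It remains to control the contribution of $Q_{\le -2l-C/2}\phi_{\le -2l-C}$. For this term we may freely replace $\psi_0$ by
\[
\sum_{\kappa'\in K_{l+10}:\ \kappa'\subset\kappa} P_{0,\kappa'} Q^+_{<-2l+C}\,\psi_0.
\]
By Lemma 6 we may discard $P_{0,\kappa} Q^+_{<-2l}$ and estimate this contribution by
\[
\Bigl( \sum_{\kappa\in K_l}\ \sum_{\kappa'\in K_{l+10}:\ \kappa'\subset\kappa} \bigl\| Q_{\le -2l-C/2}\phi_{\le -2l-C}\; P_{0,\kappa'} Q^+_{<-2l+C}\,\psi_0 \bigr\|^2_{S[0,\kappa]} \Bigr)^{1/2}.
\]
We now discard a technical contribution.

Case 2(b).3(b).2(a). ($\psi$ stays away from light cone) The contribution of $Q_{-2l-20\le\cdot<-2l+C}\,\psi_0$. This contribution can be estimated by (75) by
\[
\Bigl( \sum_{\kappa\in K_l}\ \sum_{\kappa'\in K_{l+10}:\ \kappa'\subset\kappa} \bigl\| P_{0,\kappa} Q^+_{<-2l}\bigl( Q_{\le -2l-C/2}\phi_{\le -2l-C} \times P_{0,\kappa'} Q^+_{-2l-2C\le\cdot\le -2l+C}\,\psi_0 \bigr) \bigr\|^2_{\dot X^{n/2,1/2,1}_0} \Bigr)^{1/2}.
\]
We may estimate the $\dot X^{n/2,1/2,1}_0$ norm by $2^{-l}$ times the $L^2_t L^2_x$ norm. By (88) we can then estimate the previous by
\[
2^{-l}\, \|\phi\|_{S(1)(\mathbb{R}^{1+n})} \Bigl( \sum_{\kappa\in K_l}\ \sum_{\kappa'\in K_{l+10}:\ \kappa'\subset\kappa} \bigl\| P_{0,\kappa'} Q^+_{-2l-2C\le\cdot\le -2l+C}\,\psi_0 \bigr\|^2_{L^2_t L^2_x} \Bigr)^{1/2},
\]
which by Plancherel is bounded by
\[
2^{-l}\, \|\phi\|_{S(1)(\mathbb{R}^{1+n})}\, \bigl\| Q^+_{-2l-2C\le\cdot\le -2l+C}\,\psi_0 \bigr\|_{L^2_t L^2_x},
\]
which is acceptable by (83).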
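Case 2(b).3(b).2(b). ($\psi$ close to light cone) The contribution of $Q_{<-2l-20}\,\psi_0$. This is the main term. We need to control
\[
\Bigl( \sum_{\kappa\in K_l}\ \sum_{\kappa'\in K_{l+10}:\ \kappa'\subset\kappa} \bigl\| Q_{\le -2l-C/2}\phi_{\le -2l-C}\; P_{0,\kappa'} Q^+_{<-2l-20}\,\psi_0 \bigr\|^2_{S[0,\kappa]} \Bigr)^{1/2}. \tag{132}
\]
By (66) and (88) we can estimate this by
\[
\|\phi\|_{S(1)(\mathbb{R}^{1+n})} \Bigl( \sum_{\kappa\in K_l}\ \sum_{\kappa'\in K_{l+10}:\ \kappa'\subset\kappa} \bigl\| P_{0,\kappa'} Q^+_{<-2l-20}\,\psi_0 \bigr\|^2_{S[0,\kappa]} \Bigr)^{1/2}.
\]
By (67) we may replace the $S[0,\kappa]$ norm by the $S[0,\kappa']$ norm. By (79) the previous is therefore bounded by $\|\phi\|_{S(1)(\mathbb{R}^{1+n})} \|\psi_0\|_{S[0]}$, which is acceptable by Lemma 9. This concludes the proof of (125).

Step 3. Prove (18). Since the $L^\infty_t L^\infty_x$ control on $\phi\psi$ is immediate from (19), it thus suffices by (86) to show that
\[
\|P_k(\phi\psi)\|_{S[k]} \lesssim c_k\, \|\phi\|_{S(c)(\mathbb{R}^{1+n})} \|\psi\|_{S(c)(\mathbb{R}^{1+n})}
\]
for all $k$. By scale invariance we may take $k = 0$. By the Littlewood–Paley trichotomy we can decompose
\begin{align*}
P_0(\phi\psi) &= P_0(\phi_{\le -5}\,\psi) + P_0(\phi_{-5<\cdot<5}\,\psi) + P_0(\phi_{\ge 5}\,\psi) \\
&= P_0(\phi_{\le -5}\,\psi_{-5<\cdot<5}) + P_0(\phi_{-5<\cdot<5}\,\psi_{<10}) + \sum_{k_1 \ge 5}\ \sum_{k_1-5 \le k_2 \le k_1+5} P_0(\phi_{k_1}\psi_{k_2}).
\end{align*}
Applying the triangle inequality and (125), (126) we obtain
\[
\|P_0(\phi\psi)\|_{S[0]} \lesssim \sum_{k_2 = O(1)} \|\phi\|_{S(1)} \|\psi_{k_2}\|_{S[k_2]} + \sum_{k_1 = O(1)} \|\phi_{k_1}\|_{S[k_1]} \|\psi\|_{S(1)} + \sum_{k_1 \ge O(1)}\ \sum_{k_2 = k_1 + O(1)} \chi^{(4)}_{k_1=0}\, \|\phi_{k_1}\|_{S[k_1]} \|\psi_{k_2}\|_{S[k_2]}.
\]
But this is acceptable by (86), (89). This completes the proof of (18).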
Step 4. Prove (21). By Lemma 1 we may replace L(φ, ψ) with φψ. By (87) and dyadic decomposition it suffices to show
2δ1 |k −k|
Pk (φk ψ) φ S ψ S(c) 1 k S[k ] k1
for all k . Recall from hypothesis that ψ has Fourier support in the region D0 2k . Fix k . We divide the k1 summation into three pieces. Case 4(a). (Low-high interactions) The contribution when k1 ≤ k − 5. In this case we may replace ψ by ψk −5<·
2δ1 (k−k )
φk1 S[k1 ] ψk2 S[k2 ]
k1 ≤k −5 k2 =k +O(1)
which is acceptable by (87), (86). Case 4(b). (High-high interactions) The contribution when k1 ≥ k + 5. In this case we may replace ψ by ψk1 −5<·
2δ1 (k −k)
(4)
k1 ≥k −5 k2 =k1 +O(1)
χk =k1 φk1 S[k1 ] ψk2 S[k2 ]
which is acceptable by (87), (86) (note that the χ (4) gain dominates any δ1 losses). Case 4(c). (High-low interactions) The contribution when k − 5 < k1 < k + 5. In this case we apply (126) to dominate this by
2δ|k−k |
φk1 S[k1 ] ψ
k1 =k +O(1)
S(1)
R1+n
which is acceptable by (87), (89). This completes the proof of (21). Step 5. Prove (20). As in Step 4 we may reduce to showing
2δ1 |k −k|
Pk (φk ψ) φ S ψ S 1 k k S[k ] k1
for all k . Again, we divide into three cases.
Case 5(a). (Low-high interactions) The contribution when k1 ≤ k − 5. In this case we may replace ψ by ψk −5<·
φk1 S[k1 ] ψk2 S[k2 ] k1 ≤k −5 k2 =k +O(1)
which is acceptable by (87). Case 5(b). (High-high interactions) The contribution when k1 ≥ k + 5. In this case we may replace ψ by ψk1 −5<·
which is acceptable by (87) (note that the χ (4) gain dominates any δ1 losses). Case 5(c). (High-low interactions) The contribution when k − 5 < k1 < k + 5. In this case we may replace ψ by ψ
φk1 S[k1 ] ψk2 S[k2 ] 2δ1 |k−k | k1 =k +O(1) k2 ≤k1 +O(1)
which is acceptable by (87). This completes the proof of (20).
17. Null Forms: The Proof of (30)

We now use the estimates of the previous three sections, combined with (4), to prove (30). The identity (4) is especially good for small angle interactions in which φ, ψ, and φψ are all close to the light cone in frequency space³⁴. However, if φ has much smaller frequency than ψ, and ψ is far away from the light cone, then (4) is no longer efficient; in this case one can obtain a very satisfactory estimate just by ignoring the null structure and writing $\phi_{,\alpha}\psi^{,\alpha}$ as $L(\nabla_{x,t}\phi, \nabla_{x,t}\psi)$. This improvement shall be important in the proof of (31) in the next section. We shall prove (30) in several stages.

Step 1. Obtain decay away from light cone. The purpose of this step is to obtain the bound
$$\|P_k(\phi_{,\alpha}\, Q_j \psi^{,\alpha})\|_{N[k]} \lesssim \chi^{(4)}_{k=k_2}\,\chi^{(4)}_{j\le k_1}\,\|\phi\|_{S[k_1]}\,\|\psi\|_{S[k_2]} \tag{133}$$
whenever $k_1 \le k_2 + O(1)$ and $j > k_1 + O(1)$. In other words, we obtain a decay in the low-high interaction case if ψ is more than $2^{k_1}$ away from the light cone. This bound (133) is quite easy to prove and does not exploit the null structure. We may of course assume that $k \le k_2 + O(1)$ since the contribution vanishes otherwise.

34 Indeed, it allows one - heuristically at least - to consider large angle interactions as the dominant term, although in practice the small angle interactions do introduce several technical nuisances, notably the angular decomposition into sectors which appears in the definition of $S[k]$ and $N_k$.
We first dispose of the contribution of Q>k1 −C φ,α . In this case we use (91) and Hölder to estimate this contribution by 2(n/2−1)k Q>k1 −C ∇x,t φ L2 L∞ Qj ∇x,t ψ L2 L2 , t
t
x
x
which by (84), (83) is bounded by 2(n/2−1)k 2k1 /2 φ S[k1 ] 2−j/2 2−(n/2−1)k2 ψ S[k2 ] which is acceptable. It remains to control the contribution of Q
Qj ψ N[k2 ] 2−k2 2−j ψ S[k2 ] , while from Lemma 3 we have
Q k1 + C. Then the expression inside the norm has Fourier support in D0 ∼ 2k , D− ∼ 2j , so we may use (92) to obtain 2k1 (2k2 + 2j ) Pk (Q
2k1 (2k2 + 2j )2(n/2−1)k 2−j/2 Q
x
If k = k2 + O(1) the claim then follows by (82) for φ and (83) for ψ. If k < k2 − C, then k1 = k2 + O(1). By Bernstein’s inequality (6) and Hölder we may estimate the previous by 2k1 (2k2 + 2j )2(n/2−1)k 2−j/2 2nk/2 Q
x
t
x
which is acceptable by (81), (83).

Step 2. Control null forms from $S[k_1] \times S[k_2] \to N[k]$. The purpose of this step is to prove

Lemma 14. Let $k_1$, $k_2$, $k$ be integers. Then we have
$$\|P_k(\phi_{,\alpha}\psi^{,\alpha})\|_{N[k]} \lesssim \chi^{(4)}_{k=\max(k_1,k_2)}\,\|\phi\|_{S[k_1]}\,\|\psi\|_{S[k_2]} \tag{134}$$
for all $\phi \in S[k_1]$ and $\psi \in S[k_2]$.
This lemma shall also be useful in the proof of (31). Our main tools shall be (133), Lemma 12, (4), and the estimate (from Plancherel and the identity $|\tau^2 - |\xi|^2| = D_+ D_-$)
$$\|\Box\psi\|_{\dot X_k^{n/2-1,-1/2,\infty}} \sim \|\nabla_{x,t}\psi\|_{\dot X_k^{n/2-1,1/2,\infty}} \tag{135}$$
whenever ψ has Fourier support in 2k−5 ≤ D0 ≤ 2k+5 . By the Littlewood–Paley product trichotomy and symmetry we may split into the high-high interaction case k ≤ k2 + O(1), k1 = k2 + O(1) and the low-high interaction case k1 ≤ k2 + O(1), k = k2 + O(1). Case 2(a). (High-high interactions) k1 = k2 + O(1), k ≤ k2 + O(1). By scale invariance we may take k1 = 0, hence k2 = O(1) and k ≤ O(1). We begin by discarding those contributions where φ or ψ is far away from the light cone. From (133) and the triangle inequality, the contribution of Q≥k2 ψ is acceptable, thus we need only consider the contribution of Q
ψ S[k2 ] by Lemma 3). By similar reasoning we may now dispose of the contribution of Q≥0 φ. We are thus left with showing (4) Pk (Q<0 φ,α Q
By Lemma 13 we may bound this by (4)
χk=0 Q<0 φ S[0] Q
By (92) we can estimate the left hand side by 2(n/2−1)k 23j/2 Pk Qj (Q<0 φQ
k+2C≤j ≤C
x
since the j > C contributions vanish. We split this into three contributions. Case 2(a).1. (φ not too close to light cone) The contribution of Qk+j −C<·<0 φQ
k+2C≤j ≤C
x
which by (83) and (82) is bounded by 2nk/2 2(n/2−1)k 23j/2 2−(k+j −C)/2 φ S[0] ψ S[k2 ] , k+2C≤j ≤C
which is acceptable. Case 2(a).2. (φ is very close to light cone, but ψ is not) The contribution of Q≤k+j −C φQk2 +k+j −C<·
k+2C≤j ≤C
x
which by (83) and (82) is bounded by 2nk/2 2(n/2−1)k 23j/2 φ S[0] 2−(k2 +j −C)/2 ψ S[k2 ] , k+2C≤j ≤C
which is acceptable. Case 2(a).3. ((++) case: both φ, ψ are very close to light cone) The contribution of Q≤k+j −C φQ≤k2 +k+j −C ψ. From elementary geometry we see that this case is only non-zero when j = O(1); note that this is the (++) case which was left uncovered by Lemma 11. We may assume that k < −C since the sum is vacuous otherwise. In this case the output dominates, and so we shall use sector decomposition. Fix j , and l := (C − j − k)/2. From the triangle inequality we have Pk Qj (Q≤k+j −2 φQ
Pk Qj (P0,±κ Q± φP ψ) 2 2. k ,± 2 ≤−2l
±,± κ,κ ∈Kl
From the geometry of the cone we see that the expression inside the norm vanishes unless ± = ± and dist(κ, −κ ) 2−l . By (73) we can thus estimate the previous by 2−(n−1)l/2 ; P0,±κ Q± Pk2 ,±κ Q± ≤−2l φ
S[0,κ]
S[k2 ,κ]
note that $\kappa$ and $\kappa'$ are separated by $\sim 1$, not by $\sim 2^{-l}$. By Cauchy–Schwarz and (79) we may bound this by $2^{-(n-1)l/2}\|\phi\|_{S[0]}\|\psi\|_{S[k_2]}$, which is acceptable.

Case 2(b). (Low-high interactions) $k_1 \le k_2 + O(1)$, $k = k_2 + O(1)$. By scale invariance we may take $k = 0$, thus $k_2 = O(1)$ and $k_1 \le O(1)$. We may assume that $k_1 < -10C$ for some large constant $C$, since we could use Case 2(a) otherwise. We first discard a rather technical contribution, that of $Q_j\phi_{,\alpha}$ for $j = O(1)$. By (133) we may replace ψ by Q
χ0≥k1 Qj ∇x,t φ N[k1 ] ∇x,t Q
We discard the χ0≥k1 gain. From (92), (83) we have Qj ∇x,t φ φ S[k1 ] N[k ] 1
while from the hypothesis k2 = O(1) we have ∇x,t Q
First consider the contribution of Qj φ,α for some j > k1 + C, j = O(1). Then this term has Fourier support in {D0 ∼ 1, D− ∼ 2j }, so by (92) and Hölder we may estimate this contribution by 2−j/2 Qj ∇x,t φ L2 L∞ Q≤k1 +O(1) ∇x,t ψ L∞ 2. t Lx t
x
By (84), (81) we may bound this by 2−j/2 2k1 −j/2 φ S[k1 ] ψ S[k2 ] . Summing over all such j we see that these contributions are acceptable. We thus reduce to showing
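$$\|P_0(Q_{\le k_1+C}\phi_{,\alpha}\; Q_{\le k_1+O(1)}\psi^{,\alpha})\|_{N[0]} \lesssim \|\phi\|_{S[k_1]}\|\psi\|_{S[k_2]}.$$
By (4) it suffices to show that
$$\|P_0(Q_{\le k_1+C}\phi\;\Box Q_{\le k_1+O(1)}\psi)\|_{N[0]} \lesssim \|\phi\|_{S[k_1]}\|\psi\|_{S[k_2]}, \tag{136}$$
$$\|P_0(\Box Q_{\le k_1+C}\phi\; Q_{\le k_1+O(1)}\psi)\|_{N[0]} \lesssim \|\phi\|_{S[k_1]}\|\psi\|_{S[k_2]}, \tag{137}$$
$$\|P_0\,\Box(Q_{\le k_1+C}\phi\; Q_{\le k_1+O(1)}\psi)\|_{N[0]} \lesssim \|\phi\|_{S[k_1]}\|\psi\|_{S[k_2]}. \tag{138}$$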
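For orientation, the reduction to (136), (137), (138) can be read off from the product rule for the d'Alembertian. The following is a sketch, assuming that (4) is the standard null form identity and that $\Box := \partial_\alpha\partial^\alpha$; the normalisation shown here is an assumption and may differ from the one used in (4) by harmless constants.

```latex
% Product rule for \Box = \partial_\alpha \partial^\alpha (assumed convention):
\Box(\phi\psi) \;=\; \phi\,\Box\psi \;+\; \psi\,\Box\phi \;+\; 2\,\phi_{,\alpha}\psi^{,\alpha}
\qquad\Longrightarrow\qquad
2\,\phi_{,\alpha}\psi^{,\alpha} \;=\; \Box(\phi\psi) \;-\; \phi\,\Box\psi \;-\; \psi\,\Box\phi .
```

Applying this with φ and ψ replaced by their cone-localised pieces $Q_{\le k_1+C}\phi$ and $Q_{\le k_1+O(1)}\psi$, and then using the triangle inequality in $N[0]$, produces exactly three terms: one with the d'Alembertian on ψ, one with it on φ, and one with it on the product.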
Step 2(b).1. Proof of (136). By Lemma 12 we may bound the left-hand side by (4) Q≤k +C φ S[k ] ✷Qj ψ χ j ≤k1 +O(1)
j =k1
1
1
n/2−1,−1/2,∞ . X˙ k 2
By Lemma 3 we may discard Q≤k1 +C . By (135), (79) this is bounded by (4) χj =k1 φ S[k1 ] ψ S[k2 ] j ≤k1 +O(1)
which is acceptable. Step 2(b).2. Proof of (137). By repeating the argument in Step 2(b).1 (but with the roles of φ, ψ reversed) we can see that P0 (✷Q≤k +C φQ≤k +O(1) ψ) φ S[k1 ] ψ S[k2 ] . 1 2 N[0] Thus it suffices to show
P0 (✷Q≤k
1 +C
k1 +O(1)<j ≤k2 +O(1)
φQj ψ) N[0] φ S[k1 ] ψ S[k2 ] .
First consider the case when j < k1 + 2C. In this case we use Lemma 12, (135), (83) to estimate P0 (✷Q≤k +C φQj ψ) φ S[k1 ] ψ S[k2 ] 1 N[0] as desired. Now suppose j > k1 + 2C. Then the expression inside the norm has Fourier support in D0 ∼ 1, D− ∼ 2j , so by (92) and Hölder we may bound this by 2−j/2 ✷Q≤k1 +C φ L2 L∞ Qj ψ L∞ 2. t Lx t
x
By Bernstein’s inequality (5) and (81) this is bounded by 2−j/2 2nk1 /2 ✷Q≤k1 +C φ L2 L2 ψ S[k2 ] . t
x
Applying (135), (83) we may estimate this by 2−j/2 2k1 /2 φ S[k1 ] ψ S[k2 ] , which is acceptable after summing in j . Step 2(b).3. Proof of (138). We may freely insert Q≤k1 +2C in front of P0 . We first consider the contribution of P0 Q≤k +2C ✷(Q≤k +C φψ) . 1 1 N[0] By (92), (135) we may estimate this by Q≤k1 +2C (Q≤k1 +C φψ) X˙ n/2,1/2,1 . 0
By dyadic decomposition and Lemma 13 we may bound this by (4) χj =k1 Q≤k1 +C φ S[k ] ψ S[k2 ] . j ≤k1 +2C
1
By Lemma 3 we may discard Q≤k1 +C , and so we see this contribution is acceptable. It remains to control P0 Q≤k +2C ✷(Q≤k +C φQ≥k +O(1) ψ) . 1 1 1 N[0] We may freely replace Q≥k1 +O(1) with Qk1 +O(1)≤·≤k1 +3C . We then use (92) to estimate this by 2k1 /2 Q≤k +C φQk +O(1)≤·≤k +3C ψ 2 2 . 1
1
1
Lt Lx
But this is acceptable by (82) for φ and (83) for ψ.

Step 3. Proof of (30). By Lemma 1 we may replace $L(\phi_{,\alpha}, \psi^{,\alpha})$ with $\phi_{,\alpha}\psi^{,\alpha}$. It suffices to prove this estimate on $\mathbf{R}^{1+n}$, since the $[-T,T]\times\mathbf{R}^n$ estimate then follows by truncation. Since $N_k$ contains $N[k]$, it suffices to show that
$$\|P_k(\phi_{,\alpha}\psi^{,\alpha})\|_{N[k]} \lesssim \chi^{(1)}_{k=\max(k_1,k_2)}\,\|\phi\|_{S_{k_1}}\|\psi\|_{S_{k_2}}.$$
By the triangle inequality and Littlewood–Paley decomposition the left-hand side is bounded by
$$\sum_{k_1',k_2'}\|P_k(\phi_{k_1',\alpha}\,\psi_{k_2'}^{,\alpha})\|_{N[k]}.$$
By Lemma 14 and (87) we may bound this by
$$\sum_{k_1',k_2'}\chi^{(4)}_{k=\max(k_1',k_2')}\,\chi^{(1)}_{k_1'=k_1}\,\chi^{(1)}_{k_2'=k_2}\,\|\phi\|_{S_{k_1}}\|\psi\|_{S_{k_2}},$$
which is acceptable.
18. Trilinear Estimates: The Proof of (31)

We now prove the final major estimate in Theorem 3, namely (31). The major difficulty here is in obtaining the crucial $\chi^{(1)}_{k_1\le\min(k_2,k_3)}$ gain; without this gain, the lemma follows fairly easily from (119) and (30). In some cases one can use the refined estimates (133) and Lemma 13 to eke out this gain, but there is one difficult case (Case 4, when $k_1$ is much larger than $k_2$, $k_3$) in which this does not suffice, in which case we must use more difficult arguments. We apologize in advance for the convoluted nature of the arguments here, which are an amalgam of five or six separate attacks on this estimate by the author. Each attack was only able to deal with a portion of the cases, but together they eventually manage to cover all the possible trilinear interactions between $\phi^{(1)}$, $\phi^{(2)}$, and $\phi^{(3)}$. The crude piecing together of these distinct attacks is responsible for the small constant $\delta_1$ in the exponential gain, which is of course not optimal.

We turn to the details. As usual we shall work on $\mathbf{R}^{1+n}$ rather than $[-T,T]\times\mathbf{R}^n$. We may replace $N_k$ by $N[k]$. By Lemma 1 we may replace $L(\phi^{(1)}, \phi^{(2)}_{,\alpha}, \phi^{(3),\alpha})$ with $\phi^{(1)}\phi^{(2)}_{,\alpha}\phi^{(3),\alpha}$. By the argument in Step 3 of the previous section, it suffices to prove the estimate
$$\|P_k[\phi^{(1)}\phi^{(2)}_{,\alpha}\phi^{(3),\alpha}]\|_{N[k]} \lesssim \chi^{(2)}_{k=\max(k_1,k_2,k_3)}\,\chi^{(2)}_{k_1\le\min(k_2,k_3)}\,\|\phi^{(1)}\|_{S[k_1]}\|\phi^{(2)}\|_{S[k_2]}\|\phi^{(3)}\|_{S[k_3]}$$
for all integers $k$, $k_1$, $k_2$, $k_3$ and Schwartz $\phi^{(i)} \in S[k_i]$ for $i = 1, 2, 3$. By symmetry we may assume $k_2 \le k_3$; by scaling we may take $k_3 = 0$. We thus need to show
$$\|P_k[\phi^{(1)}\phi^{(2)}_{,\alpha}\phi^{(3),\alpha}]\|_{N[k]} \lesssim \chi^{(2)}_{k=\max(k_1,0)}\,\chi^{(2)}_{k_1\le k_2}\,\|\phi^{(1)}\|_{S[k_1]}\|\phi^{(2)}\|_{S[k_2]}\|\phi^{(3)}\|_{S[0]}.$$
To do this we split into four (slightly overlapping) cases.

Case 1: ($\phi^{(1)}$ has the lowest frequency) $k_1 < k_2 + O(1)$.

This is the easiest case as one does not need to obtain the decay $\chi^{(2)}_{k_1\le k_2}$. From Lemma 14 we have
$$\|P_{k'}[\phi^{(2)}_{,\alpha}\phi^{(3),\alpha}]\|_{N[k']} \lesssim \chi^{(4)}_{k'=0}\,\|\phi^{(2)}\|_{S[k_2]}\|\phi^{(3)}\|_{S[0]}$$
for all $k'$. From (119), (89) and Littlewood–Paley decomposition we thus have
$$\|P_k[\phi^{(1)}\phi^{(2)}_{,\alpha}\phi^{(3),\alpha}]\|_{N[k]} \lesssim \chi^{(4)}_{k=0}\,\|\phi^{(1)}\|_{S[k_1]}\|\phi^{(2)}\|_{S[k_2]}\|\phi^{(3)}\|_{S[0]},$$
which is acceptable.

Case 2: ($\phi^{(1)}$ has the intermediate frequency) $k_2 \le k_1 + O(1) \le O(1)$.

The idea is to play off Lemma 12 (which gives a gain if $\phi^{(2)}_{,\alpha}\phi^{(3),\alpha}$ is far from the cone) against a variant of (133) (which will gain when $\phi^{(2)}_{,\alpha}\phi^{(3),\alpha}$ is close to the cone). We may assume that $k \le O(1)$ since the left-hand side vanishes otherwise. We may also assume that $k_2 < -C$ for some large constant $C$ since we can use Case 1 otherwise. By the triangle inequality it thus suffices to show
$$\|P_k[\phi^{(1)} G_{k'}]\|_{N[k]} \lesssim \chi^{(2)}_{k=0}\,\chi^{(2)}_{k_1\le k_2}\,\|\phi^{(1)}\|_{S[k_1]}\|\phi^{(2)}\|_{S[k_2]}\|\phi^{(3)}\|_{S[0]} \tag{139}$$
for all $k' = O(1)$, where
$$G_{k'} := P_{k'}\bigl(\phi^{(2)}_{,\alpha}\phi^{(3),\alpha}\bigr). \tag{140}$$
Fix k . We apply Lemma 15. For any k , k2 , k3 we have (2) (3),α φ ) φ (2) S[k2 ] φ (3) S[k3 ] Pk (φ,α N[k ]
and
(2) (3),α φ ) Qj Pk (φ,α
N[k ]
(4)
χj =min(k2 ,k3 ) φ (2) S[k2 ] φ (3) S[k3 ]
whenever j > min(k2 , k3 ) + O(1). In particular we have
(141)
(142)
Proof. By symmetry we may assume k2 ≤ k3 . We may then assume that k ≤ k3 + O(1) since the expression vanishes otherwise. The estimate (141) follows from Lemma 14, so we turn to (142). This shall be a variant of (133). We may assume that j > k2 + 10 since the claim follows from (141), Lemma 10, and (92) otherwise. Consider the contribution of Q>j −5 φ (3),α . By (92) and Hölder we can bound this contribution by
(3) ∞ Q>j −5 ∇x,t φ
L2 L2 2(n/2−1)k 2−j/2 ∇x,t φ (2) L∞ t Lx t
x
which is acceptable by (82), (83). Now consider the contribution of Q≤j −5 φ (3),α . We may then freely insert Q>j −10 (2) in front of φ,α . By (92) and Hölder we can then bound the previous by 2(n/2−1)k 2−j/2 ∇x,t Q>j −10 φ (2) 2 ∞ Q>j −5 ∇x,t φ (3) ∞ 2 Lt Lx
which is acceptable by (84), (81). This proves the lemma.
Lt Lx
From the above lemma we have Q>(k +k )/2 Gk χ (4) φ (2) S[k ] φ (3) S[0] 2 1 2 k1 =k2 N[k ] which when combined with (119), (89) shows that the contribution of Q>(k1 +k2 )/2 Gk is acceptable. It thus suffices to show that (2) χk1 =k2 φ (1) S[k1 ] φ (2) S[k2 ] φ (3) S[0] . Pk φ (1) Q≤(k1 +k2 )/2 Gk N[k]
We first consider the contribution of Q>k1 −C φ (1) . We first consider the case when k1 < −C, which forces k = O(1). By (91) and Hölder we may bound this contribution by Q>k1 −C φ (1) L2 L∞ Q≤(k1 +k2 )/2 Gk L2 L2 . t
t
x
x
From (84) and Lemma 10 we can estimate this by 2−k1 /2 φ (1) S[k1 ] 2(k1 +k2 )/4 Gk N[k ] which is acceptable by (141). Now suppose k1 = O(1). Then we apply (91) as before, but then use Bernstein’s inequality (6) and Hölder to obtain a bound of 2nk/2 Q>k1 −C φ (1) 2 2 Q≤(k1 +k2 )/2 Gk L2 L2 . Lt Lx
t
x
If one argues as before using (83) instead of (84) we see that this case is also acceptable. We are now left with controlling , Pk Q≤k1 −C φ (1) Q≤(k1 +k2 )/2 Gk N[k]
We may freely insert Q≤k1 +10 in front of Pk , and insert Pk1 −10<·
j ≤(k1 +k2 )/2
which by Lemma 12 is bounded by (4) χk=0 Pk1 −10<·
(4)
S[k1 ]
j ≤(k1 +k2 )/2
2−j/2 χj =k1 Qj Gk L2 L2 . t
x
By Lemma 3 we may discard Pk1 −10<·
S[k1 ]
which is acceptable by (141).

Case 3: ($\phi^{(1)}$ has the highest frequency; $\phi^{(2)}$, $\phi^{(3)}$ have different frequencies) $k_1 \gg 1$ and $k_2 \le -\delta_2 k_1$.

This will be similar to Case 2, but uses the separation between $k_2$ and $k_3 = 0$ as a proxy for the separation between $k_2$ and $k_1$ for the purposes of obtaining the gain $\chi^{(2)}_{k_1\le k_2}$. In this case we may assume $k = k_1 + O(1)$. It suffices by the triangle inequality to show that
$$\|P_k[\phi^{(1)} G_{k'}]\|_{N[k]} \lesssim \chi^{(2)}_{k_1\le k_2}\,\|\phi^{(1)}\|_{S[k_1]}\|\phi^{(2)}\|_{S[k_2]}\|\phi^{(3)}\|_{S[0]}$$
for all k = O(1), where Gk is defined in (140); note that Gk vanishes for k = O(1). Fix k . We shall repeat the argument in (139). From Lemma 15 we have (3) Q>k /2 Gk χ (4) φ (2) φ . 2 k2 =0 N[k ] S[k ] S[0] 2
From (119) and the Case 3 hypothesis we thus see that the contribution of Q>k2 /2 Gk is acceptable. It thus remains to estimate . Pk [φ (1) Qj Gk ] j ≤k2 /2
N[k]
By Lemma 12 and Lemma 10 we can estimate this by (4) χj =0 Gk N[k ] φ (1) S[k ] , j ≤k2 /2
1
which is acceptable by (141) and the Case 3 hypothesis.

Case 4: ($\phi^{(1)}$ has the highest frequency; $\phi^{(2)}$, $\phi^{(3)}$ have comparable frequencies) $k_1 \gg 1$ and $-\delta_2 k_1 \le k_2 \le 0$.

In this case we may assume $k = k_1 + O(1)$, and we reduce to showing
$$\|P_k[\phi^{(1)}\phi^{(2)}_{,\alpha}\phi^{(3),\alpha}]\|_{N[k]} \lesssim \chi^{(2)}_{k_1=0}\,\|\phi^{(1)}\|_{S[k_1]}\|\phi^{(2)}\|_{S[k_2]}\|\phi^{(3)}\|_{S[0]}.$$
In principle this case is easy because both derivatives are on low frequency terms, however by the same token the null structure is much less advantageous in this case. In (2) fact in two dimensions (where the null structure is crucial) the gain χk1 =0 is not easy to obtain even when the φ (i) are all free solutions35 . Our strategy will be to use some multilinear multiplier calculus to try to move the derivatives (and the null structure) back onto the high frequency term, hopefully gaining in the process a factor proportional to the ratio between the high and low frequencies. This turns out to be achievable except when the high frequency is Minkowski-orthogonal to the two low frequencies, but in this case one can repeat the derivation of (28), improving upon Bernstein’s inequality for the low frequencies at one crucial juncture to obtain the desired gain. We now consider the contribution of (2) (3),α φ ) . Pk φ (1) P≤−δ2 k1 (φ,α N[k]
This contribution is only non-zero when k2 = O(1). By (119) and Lemma 14 we can bound this by (2)
φ (1) S[k1 ] χk =0 φ (2) S[k2 ] φ (3) S[k3 ] k ≤−δ2 k1
which is acceptable. Next, consider the contribution of (2) (3),α φ ) Pk φ (1) P−δ2 k1 <·<5 Q≤−2δ2 k1 (φ,α
N[k]
.
We estimate this by
−δ2 k1
(2) (3),α φ ) Pk φ (1) Pk Qj (φ,α
N[k]
which by Lemma 12, Lemma 10 is bounded by (4) (2) (3),α χj ≥k φ (1) S[k1 ] Pk (φ,α φ )] N[k ] −δ2 k1
which by Lemma 14 is bounded by (4) χj ≥k φ (1) S[k1 ] φ (2) S[k2 ] φ (3) S[k3 ] , −δ2 k1
which is acceptable.

35 In particular, it cannot be proven solely using $\dot X^{s,b}$ spaces even in the free case, because these spaces just barely fail to exclude the possibility of the low frequencies concentrating in the region Minkowski-orthogonal to the high frequency, which has the effect of keeping the above trilinear expression close to the light cone in frequency space. A similar difficulty is also encountered in equations of Maxwell–Klein–Gordon or Yang–Mills type near the critical regularity (see [22, 21, 43]). This possible concentration can be shown to be impossible by using $L^p$ bilinear estimates for $p > 2$, but in low dimensions these estimates are quite difficult (cf. [44, 39]). Another approach would be to refine the frequency-sector control of $S(c)$ further, perhaps to small balls, in order to isolate the portions of $\phi^{(i)}$ which are contributing to the bad Minkowski-orthogonal case. In both cases one would have to re-engineer the space $S(c)$ substantially. We have chosen a different approach based on applying Bernstein's inequality to the bad Minkowski-orthogonal region.
It thus remains to show (2) (3),α φ )] Pk [φ (1) J(φ,α
N[k]
(2)
χk1 =k2 φ (1) S[k1 ] φ (2) S[k2 ] φ (3) S[0] ,
where $J := P_{-\delta_2 k_1<\cdot<5}\, Q_{>-2\delta_2 k_1}$. By adapting the proof of Lemma 6 we observe that $\chi^{(2)}_{k_1=0} J$ is disposable for a suitable choice of weight $\chi^{(2)}_{k_1=0}$. Thus we can usually discard $J$ whenever desired by paying a price of $\chi^{-(2)}_{k_1=0}$. The projection $J$ will be a minor nuisance for the next few steps, but is crucially needed much later in the argument (in Cases 4(e).3(b)–(c)), and is difficult to insert later on.

We now use Littlewood–Paley projections in time to split into five sub-cases, of which the first four are quite easy (they are of "$D_+ \gg D_0$" type). The fifth case is the main one, in which $D_+ \sim D_0$ for all three functions $\phi^{(i)}$.

Case 4(a): $\phi^{(2)}$, $\phi^{(3)}$ both have Fourier support in the region $|\tau| \gg 1$.

In this case we use (91) and Hölder and discard $J$ (paying $\chi^{-(2)}_{k_1=0}$) to estimate this contribution by
$$\|\phi^{(1)} J(\phi^{(2)}_{,\alpha}\phi^{(3),\alpha})\|_{L^1_t L^2_x} \lesssim \|\phi^{(1)}\|_{L^\infty_t L^2_x}\,\chi^{-(2)}_{k_1=0}\,\|\nabla_{x,t}\phi^{(2)}\|_{L^2_t L^\infty_x}\|\nabla_{x,t}\phi^{(3)}\|_{L^2_t L^\infty_x}.$$
By (81), (84) we may bound this by
$$2^{-nk_1/2}\,\|\phi^{(1)}\|_{S[k_1]}\,\chi^{-(2)}_{k_1=0}\,\|\phi^{(2)}\|_{S[k_2]}\|\phi^{(3)}\|_{S[k_3]},$$
which is acceptable.

Case 4(b): $\phi^{(2)}$, $\phi^{(3)}$ have Fourier support in the regions $|\tau| \lesssim 1$ and $|\tau| \gg 1$ respectively.

Split $\phi^{(1)} = Q_{<-C}\phi^{(1)} + Q_{\ge -C}\phi^{(1)}$. To control the contribution of $Q_{\ge -C}$ we use (91) and Hölder, and discard $J$ (paying $\chi^{-(2)}_{k_1=0}$) to estimate this contribution by
$$\|Q_{\ge -C}\phi^{(1)} J(\phi^{(2)}_{,\alpha}\phi^{(3),\alpha})\|_{L^1_t L^2_x} \lesssim \|Q_{\ge -C}\phi^{(1)}\|_{L^2_t L^2_x}\,\chi^{-(2)}_{k_1=0}\,\|\nabla_{x,t}\phi^{(2)}\|_{L^\infty_t L^\infty_x}\|\nabla_{x,t}\phi^{(3)}\|_{L^2_t L^\infty_x}.$$
By (83), (82), (84) this is bounded by
$$2^{-nk_1/2}\,\|\phi^{(1)}\|_{S[k_1]}\,\chi^{-(2)}_{k_1=0}\,\|\phi^{(2)}\|_{S[k_2]}\|\phi^{(3)}\|_{S[k_3]},$$
which is acceptable as before.

We now control the contribution of $Q_{<-C}$. It suffices by conjugation symmetry to control $Q^+_{<-C}$. Let us first assume (using Lemma 3) that $\phi^{(3)}$ avoids the region $D_- \sim 2^{k_1}$.
Then this term has Fourier support in the region $D_0 \sim 2^k$, $D_- \gtrsim 1$. Thus we may use (92) to estimate this contribution by
$$2^{(n/2-1)k}\,\|Q^+_{<-C}\phi^{(1)} J(\phi^{(2)}_{,\alpha}\phi^{(3),\alpha})\|_{L^2_t L^2_x}.$$
Using Hölder and discarding $J$ (paying $\chi^{-(2)}_{k_1=0}$) we may estimate this contribution by
$$2^{(n/2-1)k}\,\|Q^+_{<-C}\phi^{(1)}\|_{L^\infty_t L^2_x}\,\chi^{-(2)}_{k_1=0}\,\|\nabla_{x,t}\phi^{(2)}\|_{L^\infty_t L^\infty_x}\|\nabla_{x,t}\phi^{(3)}\|_{L^2_t L^\infty_x}.$$
But this is acceptable by (81), (82), (84). Now consider the case when φ (3) is supported in the region D− ∼ 2k1 . The problem here is that this term can hit the negative light cone τ = −|ξ |. From the previous discussion, we only need to show that + (1) Pk Q−
(2)
χk1 =k2 φ (1) S[k1 ] φ (2) S[k2 ] φ (3) S[0] ,
N[k]
where l is chosen so that k1 − 2l = −C, and (2) Qk1 +O(1) φ (3),α . F := J φ,α Observe that F has Fourier support in the region 2−δ2 k1 D0 1 and D− ∼ 2k1 . If we estimate F in L2 , use Plancherel to discard J, then use Hölder and (82), (83) we obtain
F L2 L2 ∇x,t φ (2) ∞ ∞ ∇x,t Qk1 +O(1) φ (3) 2 2 t x Lt Lx Lt Lx (3) −k1 /2 (2) 2 . φ φ S[k2 ]
Thus it will suffice to show that + (1) Pk Q−
N[k]
S[0]
(4) χk1 =0 φ (1)
S[k1 ]
2k1 /2 F L2 L2 . t
x
To prove this we shall use what is roughly the dual argument to Case 2(a).3 of Sect. 17. We split the left-hand side as + (1) Pk Pk,−κ Q−
.
N[k]
By (93) we may bound this by
2 + (1) 2(n/2−1)k Pk Pk,−κ Q−
NFA[κ]
1/2
.
By elementary geometry we see that the summand vanishes unless dist(−κ, κ ) 2−l . In particular for each κ there are only O(1) κ which contribute, so we may estimate the previous by 2(n/2−1)k
κ,κ ∈Kl :dist(−κ,κ )2−l
1/2
2 + (1) Pk Pk,−κ Q−
NFA[κ]
.
By Lemma 6 we may discard Pk Pk,−κ Q−
2(n/2−1)k 2−l/2 2−k1 /2
κ,κ ∈Kl :dist(−κ,κ )2−l
2 (1) φ Pk1 ,κ Q+
S[k1 ,κ ]
1/2
F L2 L2 . t
x
By (79) this is bounded by 2(n/2−1)k 2−l/2 2−k1 /2 φ (1) S[k1 ] F L2 L2 t
x
which is acceptable when $n = 2, 3, 4$.

Case 4(c): $\phi^{(2)}$, $\phi^{(3)}$ have Fourier support in the regions $|\tau| \gg 1$ and $|\tau| \lesssim 1$ respectively.

This is similar to Case 4(b) and is left to the reader. (Some additional factors of $2^{k_2}$ appear but the reader can easily verify that they are negligible since we are in Case 4.)

Case 4(d): $\phi^{(2)}$, $\phi^{(3)}$ both have Fourier support in the region $|\tau| \lesssim 1$, and $\phi^{(1)}$ has Fourier support on $|\tau| \gg 2^{k_1}$.

This contribution has Fourier support on $D_0 \sim 2^k$, $D_- \gtrsim 2^k$, so by (92) we may estimate it by
$$2^{(n/2-1)k}\,2^{-k/2}\,\|\phi^{(1)} J(\phi^{(2)}_{,\alpha}\phi^{(3),\alpha})\|_{L^2_t L^2_x}.$$
Using Hölder and discarding $J$ (paying $\chi^{-(2)}_{k_1=0}$) we may bound this by
$$2^{(n/2-1)k}\,2^{-k/2}\,\|\phi^{(1)}\|_{L^2_t L^2_x}\,\chi^{-(2)}_{k_1=0}\,\|\nabla_{x,t}\phi^{(2)}\|_{L^\infty_t L^\infty_x}\|\nabla_{x,t}\phi^{(3)}\|_{L^\infty_t L^\infty_x}.$$
But this is acceptable by (83) and two applications of (82).

Case 4(e): (No function is too far away from the light cone) $\phi^{(2)}$, $\phi^{(3)}$ both have Fourier support in the region $|\tau| \lesssim 1$, and $\phi^{(1)}$ has Fourier support on $|\tau| \lesssim 2^{k_1}$.

We now divide the $\phi^{(i)}$ into spacetime sectors of angular width $2^{-\delta_4 k_1}$. More precisely, we let $F_*$ be a maximal $2^{-\delta_4 k_1}$-separated subset of the set $\{D_0 = 1;\ D_+ \lesssim 1\}$ and for each $K = (\tau, \xi)$ in $F_*$, let $\pi_K$ be a spacetime Fourier multiplier whose symbol $m_K$ is homogeneous of degree zero, is adapted to the sector $\{(\tau', \xi') : \angle((\tau', \xi'), K) \lesssim 2^{-\delta_4 k_1}\}$
(where $\angle$ denotes the standard Euclidean angle) and we have the partition of unity
$$1 = \sum_{K\in F_*}\pi_K$$
for functions with Fourier support on the region $D_+ \sim D_0$. We can then bound the contribution of this case by
$$\sum_{K_1,K_2,K_3\in F_*}\bigl\|P_k\bigl(\pi_{K_1}\phi^{(1)}\, J(\pi_{K_2}\phi^{(2)}_{,\alpha}\,\pi_{K_3}\phi^{(3),\alpha})\bigr)\bigr\|_{N[k]}. \tag{143}$$
Observe that the $\pi_{K_i}$ are disposable. If $K = (\tau, \xi)$, $K' = (\tau', \xi')$ are spacetime vectors, we define the Minkowski inner product $\eta(K, K')$ by
$$\eta(K, K') := \xi\cdot\xi' - \tau\tau'.$$
We now split the sum into three pieces.
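To see heuristically why the size of the Minkowski inner product tracks angular separation, here is a short worked computation. It assumes, purely for illustration, that both vectors lie exactly on the forward light cone, with $\theta$ denoting the Euclidean angle between the spatial parts; the sectors in $F_*$ are not required to satisfy this, so the computation is a guide rather than part of the argument.

```latex
% Illustrative assumption: K = (|\xi|, \xi), K' = (|\xi'|, \xi') are null vectors,
% and \theta := \angle(\xi, \xi').
\eta(K,K') \;=\; \xi\cdot\xi' - |\xi|\,|\xi'|
          \;=\; -\,|\xi|\,|\xi'|\,(1-\cos\theta)
          \;=\; -\,2\,|\xi|\,|\xi'|\,\sin^2(\theta/2).
```

In particular, for two such null vectors $\eta(K, K')$ vanishes exactly when the spatial directions coincide, and $|\eta(K,K')| \approx \tfrac12|\xi||\xi'|\theta^2$ for small $\theta$. This is consistent with calling the regime where $\eta(K_1, K_2)$ and $\eta(K_1, K_3)$ are small the "Minkowski-orthogonal" case.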
Case 4(e).1: ($\phi^{(1)}_{,\alpha}\phi^{(2),\alpha}$ is non-degenerate) The portion of the summation where $|\eta(K_1, K_2)| \gg 2^{-\delta_4 k_1} 2^{k_1}$.

In this case the idea is to move the null form from $\phi^{(2)}_{,\alpha}\phi^{(3),\alpha}$ to the null form $\phi^{(1)}_{,\alpha}\phi^{(2),\alpha}$ (which is now non-degenerate), gaining a factor of about $2^{-k_1}$ in the process. We turn to the details. Consider a single summand of (143). The Fourier transform of this at $(\tau, \xi)$ is
$$C\int \delta(\tau_1+\tau_2+\tau_3)\,\delta(\xi_1+\xi_2+\xi_3)\,(\xi_2\cdot\xi_3-\tau_2\tau_3)\,\bigl(1-m_0(2^{\delta_2 k_1}|\xi_2+\xi_3|)\bigr)\,\bigl(1-m_0(2^{2\delta_2 k_1}\bigl||\tau_2+\tau_3|-|\xi_2+\xi_3|\bigr|)\bigr) \times \prod_{i=1}^{3} m_{K_i}(\tau_i,\xi_i)\,\mathcal{F}\phi^{(i)}(\tau_i,\xi_i)\,d\tau_i\,d\xi_i.$$
We rewrite this as C2−k1 2δ4 k1 δ(τ1 + τ2 + τ3 )δ(ξ1 + ξ2 + ξ3 )(ξ1 · ξ2 − τ1 τ2 ) ×M
3
mKi (τi , ξi )Fφ (i) (τi , ξi ) dτi dξi ,
i=1
where the symbol M = M(τ1 , τ2 , τ3 , ξ1 , ξ2 , ξ3 ) is defined by M := 1 − m0 (2δ2 k1 |ξ2 + ξ3 |) 2k1 (ξ · ξ − τ τ ) 2 3 2 3 × 1 − m0 (22δ2 k1 ||τ2 + τ3 | − |ξ2 + ξ3 ||) δ k . 2 4 1 (ξ1 · ξ2 − τ1 τ2 ) The function
3 i=1
mKi (τi , ξi )Fφ (i) (τi , ξi )
is supported on a 3n+3-dimensional box, which is the product of three n+1-dimensional boxes of dimension 2ki ×2−δ4 k1 2ki ×. . . 2−δ4 k1 2ki for i = 1, 2, 3. On this box the symbol M is bounded, and by the hypothesis of Case 4(e).1 can in fact be replaced by a bump function adapted to a dilate of this 3n + 3-dimensional box, except that each normalized −(2) derivative could pull down a factor of χk1 =0 . In particular, we can then write M on this box as a Fourier series (cf. [36]) M(τ1 , τ2 , τ3 , ξ1 , ξ2 , ξ3 ) :=
X∈L
cX
3
e2πi(ti τi +xi ·ξi ) ,
i=1
where L is some discrete subset of (R×Rn )3 (basically the dual lattice of the dilate of the 3n + 3-dimensional box), X = (t1 , t2 , t3 , x1 , x2 , x3 ), and the cX are a set of co-efficients −(2) with an l 1 norm of O(χk1 =0 ). Undoing the Fourier transform, we thus see that (2) πK1 φ (1) (t, x)πK2 φ,α (t, x)πK3 φ (3),α (t, x)
can be written as −(4) (1) cX πK1 φ,α (t−t1 , x−x1 )πK2 φ (2),α (t−t2 , x−x2 )πK3 φ (3) (t−t3 , x−x3 ). C2−k1 χk1 =0 X∈L
The summand in (143) can thus be estimated by −(4) (1) (t − t1 , x − x1 )πK2 φ (2),α 2−k1 χk1 =0 sup Pk πK1 φ,α X∈L
× (t − t2 , x − x2 )πK3 φ (3) (t − t3 , x − x3 )
N[k]
.
By Lemma 14 and (119) we can bound this by36 −(4)
2−k1 χk1 =0 sup
3 πKi φ (i) (· − ti , · − xi )
S[ki ]
X∈L i=1
.
On the Fourier support of φ (i) , πKi is a multiplier with integrable kernel, so we can estimate this by (2) (3) −(4) . 2−k1 χk1 =0 φ (1) φ φ S[k1 ]
S[k2 ]
S[k3 ]
Since there are at most $O(\chi^{-(4)}_{k_1=0})$ summands, we see that the total contribution of these terms is acceptable.

Case 4(e).2: ($\phi^{(1)}_{,\alpha}\phi^{(3),\alpha}$ is non-degenerate) The portion of the summation not covered by 4(e).1, where $|\eta(K_1, K_3)| \gg 2^{-\delta_4 k_1} 2^{k_1}$.

This is done similarly to Case 4(e).1. A few additional powers of $2^{k_2} = O(\chi^{-(2)}_{k_1=0})$ appear in the estimates, but they make no difference to the final argument.

36 It is possible to adapt the proofs of Lemma 14 and (119) to apply directly to the summands of (143) in this case, and thus forego the need to perform the Fourier series decomposition of $M$. But this would require a tedious repetition of the proofs of these two Lemmata - in addition to the repetition which will occur in Case 4(e).3 - so we opted to take this shortcut instead.
Case 4(e).3: (Low frequencies are Minkowski-orthogonal to the high frequency) The portion of the summation where η(K1 , K2 ), η(K1 , K3 ) = O(2−δ4 k1 2k1 ). We are thus left with showing (1) (2) (3),α Pk πK1 φ J πK2 φ,α πK3 φ ∗
(2)
N[k]
χk1 =0
3 (i) φ
S[ki ]
i=1
,
(144)
where the summation ∗ is over all K1 , K2 , K3 ∈ F∗ for which η(K1 , K2 ), −(3) −δ k k 4 1 1 2 ). Note that there are at most O(χk1 =0 ) terms in this summaη(K1 , K3 ) = O(2 tion. Define the quantity l := k1 + 3δ2 k1 . We split φ (1) = Q≥k1 −2l φ (1) + Q
∗
n (2) 2( 2 −1)k1 πK1 Q≥k1 −2l φ (1) J πK2 φ,α πK3 φ (3),α
L1t L2x
.
−(2)
We use Hölder and discard J (paying χk1 =0 ) to estimate this by
∗
n −(2) 2( 2 −1)k1 Q≥k1 −2l φ (1) L2 L2 χk1 =0 ∇x,t φ (2) L∞ L∞ ∇x,t φ (3) L∞ L∞ t
t
x
t
x
x
which by (83), (82) can be bounded by −(4) n −(2) χk1 =0 2( 2 −1)k1 2−nk1 /2 2−(k1 −2l)/2 φ (1) S[k ] χk1 =0 φ (2) S[k ] φ (3) S[k ] . 1
2
3
But this simplifies to −(4) χk1 =0 2−k1 /2 φ (1) S[k ] φ (2) S[k ] φ (3) S[k ] , 1
2
3
which is acceptable. Case 4(e).3(b). (The output is not too close to light cone) The contribution of Q
N[k]
By the triangle inequality and (92) we may estimate this by
∗
n (1) (2) (3),α 2( 2 −1)k1 2−(k1 −2l)/2 Pk πK1 Q+
L2t L2x
We expand out J and estimate this by
n 2( 2 −1)k1 2−(k1 −2l)/2 φ (1)
∗
2 L∞ t Lx
−2δ2 k1
(2) πK3 φ (3),α Pk Q>−2δ2 k1 πK2 φ,α
L2t L∞ x
.
By (81) and Bernstein’s inequality (5) we can bound this by
∗
n 2( 2 −1)k1 2−(k1 −2l)/2 2−nk1 /2 φ (1)
S[k1 ]
−2δ2 k1
(2) 2nk /2 Pk Q>−2δ2 k1 πK2 φ,α πK3 φ (3),α
L2t L2x
.
By Lemma 10 we can bound this by −(2)
χk1 =0
n 2( 2 −1)k1 2−(k1 −2l)/2 2−nk1 /2 φ (1)
∗
S[k1 ]
2
nk /2
2−(
(2) ) πK3 φ (3),α Pk πK2 φ,α
n 2 −1
−2δ2 k1
k
N[k ]
which by Lemma 14 is bounded by −(2)
χk1 =0
n −1k 1 −(k1 −2l)/2 −nk1 /2 (1) φ 2 2 2 2 S[k ∗
1]
n
(3) 2 −1 k φ (2) . S[k2 ] φ S[k ]
2nk /2 2−
3
−2δ2 k1
But this simplifies to −(4) 2−k1 /2 χk1 =0 φ (1) S[k ] φ (2) S[k ] φ (3) S[k 1
which is acceptable.
2
3]
.
Case 4(e).3(c). (The input φ (1) and the output are both very close to light cone) The contribution of Q
N[k] (2)
χk1 =0
3 (i) φ i=1
S[ki ]
.
We estimate the left-hand side by Q+
∗ κ ∈Kl
(1) (2) (3),α πK1 Q+ P φ J π φ π φ K2 ,α K3
N[k]
.
Observe from the geometry of the cone that we may assume that 2−l dist(κ, κ ) 2−k1 since the other terms have a zero contribution. By (93) we may thus estimate the previous by 2( 2 −1)k1 n
κ∈Kl
Q+
πK1 Q+ P φ J π φ π φ . K K 2 ,α 3 NFA[κ]
∗ κ ∈Kl :2−l dist(κ,κ )2−k1
By Lemma 6 we can discard Q+
k1 =0
κ∈Kl κ ∈Kl :2−l dist(κ,κ )2−k1 K1 ∈F∗ : (K1 ,θκ )2−δ4 k1
2 (1) (2) πK2 φ,α πK3 φ (3),α πK1 Q+
NFA[κ]
1/2
where for each K1 , ∗∗ ranges over all the pairs (K2 , K3 ) which would have contributed to ∗ . By (72), then discarding πK1 , we can estimate the previous by 2( 2 −1)k1 χk1 =0 2l 2−(n−1)l/2 2−k1 /2 n
−(2)
κ∈Kl κ ∈Kl :2−l dist(κ,κ )2−k1 K1 ∈F∗ : K1 ,θ2−δ4 k1
2 + Q
S[k1 ,κ ]
2 (2) πK2 φ,α πK3 φ (3),α 2 J
.
Lt L2x
∗∗
Assume for the moment that we can show
(3) (2) (3),α πK2 φ,α πK3 φ χk1 =0 φ (2) S[k ] φ (3) S[k ] J 2 3 2 2 ∗∗
1/2
(145)
Lt Lx
for all κ, κ , K1 in the above summation. Substituting this into the previous and evaluating the K1 and κ summations using (79) we obtain the bound n −(2) −(2) 2( 2 −1)k1 χk1 =0 2l 2−(n−1)l/2 2−k1 /2 χk1 =0 φ (1)
S[k1 ]
(3) χk1 =0 φ (2) S[k ] φ (3) S[k ] . 2
3
But this simplifies to (3) χk1 =0 φ (1) S[k ] φ (2) S[k ] φ (3) S[k 1
2
3]
which is acceptable. It remains to prove (145). This is an estimate similar to those in Lemma 14, but the (3) conditions of Case 4(e).3 will allow us to squeeze out the additional χk1 =0 gain. Fix κ, κ , K1 . We can factorize the left-hand side of (145) as (2) (3),α J πK2 φ,α πK3 φ K2 ∈F∗∗ K3 ∈F∗∗
,
(146)
L2t L2x
where
F∗∗ := K ∈ F∗ : η(K1 , K) = O(2−δ4 k1 2k1 ) .
Write Ki = (τi , ξi ) for i = 1, 2, 3. From the hypotheses 2−k1 η(K1 , K2 ), 2−k1 η(K1 , K3 ), K1 , θκ = O(2−δ4 k1 ) we see that τ2 = ξ2 · ωκ + O 2−δ4 k1 ;
τ3 = ξ3 · ωκ + O 2−δ4 k1 .
(147)
Define ξi := ξi − (ξi · ωκ )ωκ for i = 2, 3, thus ξi is the portion of ξi orthogonal to ωκ . We now split the summation into four cases.
Case 4(e).3(c).1: (φ (2) , φ (3) aligned with φ (1) ) |ξ2 | < 2−δ3 k1 and |ξ3 | < 2−δ3 k1 . In this case we see from (147) that |τ2 + τ3 | = |ξ2 + ξ3 | + O 2−δ3 k1 and therefore that the contribution of this case vanishes once J is applied. Case 4(e).3(c).2: (φ (2) is aligned with φ (1) , but φ (3) is not) |ξ2 | < 2−δ3 k1 and |ξ3 | ≥ 2−δ3 k1 . In this case we may assume that |ξ3 | ≥ 2−Cδ2 k1 , since otherwise this contribution −(2) vanishes by the arguments in Case 1. We then discard J (paying χk1 =0 ) and estimate (146) by −(2) (2) (3) χk1 =0 π ∇ φ π ∇ φ K x,t K x,t 2 3 ∞ 2 . ∞ L L Lt L2x −δ k t x 3 1 K2 ∈F∗∗ :|ξ2 |<2 K3 ∈F∗∗ :ξ3 |≥2−Cδ2 k1 (148) Consider the first factor of (148). By Bernstein’s inequality (5), taking advantage of the restriction |ξ2 | < 2−δ3 k1 0 , we may bound this by (4) χk1 =0 πK2 ∇x,t φ (2) . K2 ∈F∗∗ :|ξ2 |<2−δ3 k1 ∞ 2 Lt Lx
By (147) we see that the Fourier supports of πK2 have finitely overlapping ξ -projections as K2 varies across F∗∗ . Thus we can use orthogonality to estimate the previous by (4) (4) χk1 =0 ∇x,t φ (2) ∞ 2 χk1 =0 2nk2 /2 φ (2) S[k ] 2 Lt Lx
by (81). Now consider the second factor of (148). From (147) and the restriction |ξ3 | ≥ 2−Cδ2 k1 we see that we may freely insert a factor of Q>−Cδ2 k1 inside the norm. We then use the almost orthogonality of the πK3 as K3 varies over F∗∗ to estimate this factor by Q>−Cδ2 k1 ∇x,t φ (3) 2 2 Lt Lx
−(2)
which is O(χk1 =0 φ (3) S[0] ) by (83). Combining this with the previous we see that the contribution of this case is acceptable. Case 4(e).3(c).3: (φ (3) is aligned with φ (1) , but φ (2) is not) |ξ2 | ≥ 2−δ3 k1 and |ξ3 | < 2−δ3 k1 . (2) This is similar to Case 2 and is left to the reader (the factor 2nk2 /2 = O(χk1 =0 ) moves around but is irrelevant). Case 4(e).3(c).4: (Neither φ (2) nor φ (3) is aligned with φ (1) ) |ξ2 | ≥ 2−δ3 k1 and |ξ3 | ≥ 2−δ3 k1 .
Global Regularity of Wave Maps II. Small Energy in Two Dimensions
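In this case we discard J (paying $\chi^{-(2)}_{k_1=0}$) and estimate (146) by
\[
\chi^{-(2)}_{k_1=0}\,
\Big\| \sum_{K_2 \in F_{**}:\ |\xi_2'| \ge 2^{-\delta_3 k_1}} \pi_{K_2} \nabla_{x,t}\phi^{(2)} \Big\|_{L^2_t L^2_x}
\Big\| \sum_{K_3 \in F_{**}:\ |\xi_3'| \ge 2^{-\delta_3 k_1}} \pi_{K_3} \nabla_{x,t}\phi^{(3)} \Big\|_{L^\infty_t L^\infty_x}.
\]
By (147), the second factor has Fourier support in a $O(2^{-\delta_4 k_1}) \times O(1) \times \ldots \times O(1)$ slab, so by Bernstein's inequality (5) we can bound the previous by
\[
\chi^{-(2)}_{k_1=0}\,
\Big\| \sum_{K_2 \in F_{**}:\ |\xi_2'| \ge 2^{-\delta_3 k_1}} \pi_{K_2} \nabla_{x,t}\phi^{(2)} \Big\|_{L^2_t L^2_x}\,
\chi^{(4)}_{k_1=0}\,
\Big\| \sum_{K_3 \in F_{**}:\ |\xi_3'| \ge 2^{-\delta_3 k_1}} \pi_{K_3} \nabla_{x,t}\phi^{(3)} \Big\|_{L^2_t L^2_x}.
\]
By (147) and the constraints $|\xi_2'|, |\xi_3'| \ge 2^{-\delta_3 k_1}$ we may freely insert $Q_{>-C\delta_3 k_1}$ into both factors. By orthogonality and (83) we can estimate the previous by
\[
\chi^{-(2)}_{k_1=0}\, \|\nabla_{x,t} Q_{>-C\delta_3 k_1}\phi^{(2)}\|_{L^2_t L^2_x}\;
\chi^{(4)}_{k_1=0}\, \|\nabla_{x,t} Q_{>-C\delta_3 k_1}\phi^{(3)}\|_{L^2_t L^2_x}
\lesssim
\chi^{-(2)}_{k_1=0}\, \chi^{-(3)}_{k_1=0}\, \|\phi^{(2)}\|_{S[k_2]}\;
\chi^{(4)}_{k_1=0}\, \chi^{(4)}_{k_1=0}\, \chi^{-(3)}_{k_1=0}\, \|\phi^{(3)}\|_{S[0]},
\]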
which is acceptable. This concludes the proof of (145) and therefore of (31).
Acknowledgements. This work was conducted at UCLA, Tohoku University, UNSW, and the French Alps. The author thanks Daniel Tataru, Mark Keel, and Sergiu Klainerman for very helpful discussions, insights, and encouragement, the referees for their thorough reading and valuable suggestions, and Joachim Krieger for pointing out an error in the original manuscript. The author also thanks Daniel Tataru for simplifying some of the arguments in this paper. The author is a Clay Prize Fellow and is supported by grants from the Sloan and Packard Foundations.
Communicated by P. Constantin
Commun. Math. Phys. 224, 545 – 550 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Gradient Estimates for the Ground State Schrödinger Eigenfunction and Applications
Rodrigo Bañuelos1, Pawel Kröger2
1 Mathematics Department, Purdue University, West Lafayette, IN 47907, USA.
E-mail: [email protected]
2 Departamento de Matemática, UTFSM, Valparaíso, Chile. E-mail: [email protected]
Received: 21 May 2001 / Accepted: 29 June 2001
Abstract: Let $\Omega$ be a planar convex domain which is symmetric with respect to each coordinate axis. In this paper we give a simple and very short proof, based on the maximum principle, of the sharp lower bound $3\pi^2/d^2$ for the spectral gap of the Dirichlet Laplacian in $\Omega$.
1. Introduction

Let $\Omega$ be a bounded convex domain in Euclidean space $\mathbb{R}^n$. Consider the Schrödinger operator $-\Delta + V$ for a nonnegative convex potential $V$ on $\Omega$ under Dirichlet boundary conditions. Under these assumptions the eigenvalues are discrete and satisfy
\[
0 < \lambda^V_{1,\Omega} < \lambda^V_{2,\Omega} \le \lambda^V_{3,\Omega} \le \ldots
\]
When the potential is identically zero we will just write $\lambda_{i,\Omega}$ for these eigenvalues. The quantity $\lambda^V_{2,\Omega} - \lambda^V_{1,\Omega}$ is called the spectral gap. It was conjectured by M. van den Berg [4] that $\lambda^V_{2,\Omega} - \lambda^V_{1,\Omega}$ can be estimated below by $3\pi^2/d^2$, where $d$ denotes the diameter of $\Omega$. (See also [1, 2] and Problem 44 in [14].) The lower bound $\pi^2/4d^2$ was obtained in [12]. This bound was subsequently improved to $\pi^2/d^2$ in [11] (see also [9]). For $V = 0$ and planar convex domains which are symmetric in both axes, Smits [13] obtained a similar lower bound with the diameter replaced by the length of the longest axis of symmetry. The symmetry assumption is needed in order to apply a result by L. Payne [10] which guarantees the existence of an eigenfunction for the second eigenvalue whose nodal line is one of the axes of symmetry. The conjectured lower bound was proved in [6] for the Laplacian (the case $V = 0$) in planar domains which are again symmetric in both axes. A different proof with some extensions is given in [3].

The purpose of this note is to give a simple proof for a first order differential inequality that implies the gap estimate when $V = 0$ under the symmetry assumptions on $\Omega$. Our proof is based on the maximum principle technique. As in the case of the results in [3] and [6], the full convexity of the domain is not needed if the domain is symmetric and convex relative to both coordinate axes.

Author footnotes, in the order of the author list: Supported in part by NSF grant # 9700585-DMS (first author). Supported in part by Fondecyt Grant # 1000713 and by UTFSM Grant # 120023 (second author).

2. The Result

Let $\Omega \subset \mathbb{R}^n$ be a convex domain which can be included in a strip (infinite slab) $S$ of width $2b$. We may assume that the strip is perpendicular to the first coordinate axis and centered with respect to the origin. That is, $S = \{x = (x_1, x_2, \ldots, x_n) \mid |x_1| \le b\}$. We aim to compare the ground state eigenfunction of the Schrödinger operator $-\Delta + V$ on $\Omega \subset \mathbb{R}^n$ with the ground state eigenfunction of $-\Delta$ in $S$. We will assume that both $V$ and $\Omega$ are symmetric with respect to the hyperplane $x_1 = 0$ and that $V$ is increasing in $x_1$ for $x_1 \ge 0$. Under these assumptions we have

Proposition 1. Let $\varphi_1$ denote the ground state eigenfunction of $-\Delta + V$ on $\Omega$ and let $\psi_1$ denote the ground state eigenfunction of $-\Delta$ on $S$. Then
\[
\frac{\partial}{\partial x_1} \ln(\varphi_1)(x) < \frac{\partial}{\partial x_1} \ln(\psi_1)(x)
\]
for every $x \in \Omega$ with $x_1 > 0$.

Proof. We can and will assume without loss of generality that the potential $V$ is smooth and that $\frac{\partial}{\partial x_1} V(x) > 0$ for every $x$ with $x_1 > 0$, that $\Omega$ has a smooth boundary and that the coordinate axis $x_1$ is not tangent to that boundary for $x_1 > 0$, and that the closure of $\Omega$ is contained in the interior of the strip $S$. The general case follows from a limit argument. Moreover, we will replace the strip by a translate $S^{\varepsilon} = \{(x_1, x_2, \ldots, x_n) \mid |x_1 + \varepsilon| \le b\}$ for a positive $\varepsilon$ such that the closure of $\Omega$ is still contained in the interior of $S^{\varepsilon}$. By symmetry,
\[
\frac{\partial}{\partial x_1} (\ln \varphi_1)(x) = 0 < \frac{\partial}{\partial x_1} \ln \psi_1^{(\varepsilon)}(x)
\]
for every $x \in \Omega$ with $x_1 = 0$. On the other hand, by Hopf's boundary lemma
\[
\frac{\partial}{\partial x_1} \ln(\varphi_1)(x) = -\infty
\]
for every $x \in \partial\Omega$ with $x_1 > 0$. Thus we only have to show that the assertion is correct in the interior of $\Omega_+ \equiv \Omega \cap \{(x_1, x_2, \ldots, x_n) \mid x_1 > 0\}$. Assume that
\[
\frac{\partial}{\partial x_1} (\ln \varphi_1)(x) - \frac{\partial}{\partial x_1} \ln \psi_1^{(\varepsilon)}(x)
\]
attains a nonnegative maximum at an interior point $z_M$ of $\Omega_+$.
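We emphasize that it is sufficient to show that the assumption that this maximum is equal to 0 leads to a contradiction (cf. the last section of [12] for the method of continuity). Indeed, if the maximum is strictly positive, then we can and will replace the strip $S$ by a strip with width greater than $2b$ in order to make the above maximum equal to 0 (take into account that $\ln(\varphi_1)$ is concave, see [5] and [12]). Thus,
\[
\nabla_x \frac{\partial}{\partial x_1} (\ln \varphi_1)(z_M) = \nabla_x \frac{\partial}{\partial x_1} \ln \psi_1^{(\varepsilon)}(z_M)
\]
and
\[
\Delta_x \frac{\partial}{\partial x_1} (\ln \varphi_1)(z_M) \le \Delta_x \frac{\partial}{\partial x_1} \ln \psi_1^{(\varepsilon)}(z_M).
\tag{1}
\]
Obviously, $\nabla_x \frac{\partial}{\partial x_1} (\ln \psi_1^{(\varepsilon)})(z_M)$ is parallel to the $x_1$-axis. Recall also that
\[
\frac{\partial}{\partial x_1} (\ln \varphi_1)(z_M) = \frac{\partial}{\partial x_1} \ln \psi_1^{(\varepsilon)}(z_M).
\]
Thus,
\[
\Big\langle \nabla_x (\ln \varphi_1)(z_M),\ \nabla_x \frac{\partial}{\partial x_1} (\ln \varphi_1)(z_M) \Big\rangle
= \Big\langle \nabla_x (\ln \psi_1^{(\varepsilon)})(z_M),\ \nabla_x \frac{\partial}{\partial x_1} (\ln \psi_1^{(\varepsilon)})(z_M) \Big\rangle,
\tag{2}
\]
where $\langle \cdot , \cdot \rangle$ denotes the inner product. On the other hand, differentiating
\[
\Delta_x (\ln \varphi_1) + |\nabla_x \ln \varphi_1|^2 - V + \lambda^V_{1,\Omega} = 0
\]
in the direction of $x_1$, we obtain that
\[
\Delta_x \frac{\partial}{\partial x_1} (\ln \varphi_1) + 2 \Big\langle \nabla_x (\ln \varphi_1),\ \frac{\partial}{\partial x_1} \nabla_x (\ln \varphi_1) \Big\rangle - \frac{\partial}{\partial x_1} V = 0.
\]
Similarly,
\[
\Delta_x \frac{\partial}{\partial x_1} (\ln \psi_1^{(\varepsilon)}) + 2 \Big\langle \nabla_x (\ln \psi_1^{(\varepsilon)}),\ \frac{\partial}{\partial x_1} \nabla_x (\ln \psi_1^{(\varepsilon)}) \Big\rangle = 0.
\]
Since $-\frac{\partial}{\partial x_1} V < 0$, we arrive at a contradiction to (1) and (2), and that completes the proof.

Theorem 1. Suppose that $\Omega$ and $V$ are as in the above proposition. Suppose that there is an odd eigenfunction $\varphi_2$ of $-\Delta + V$ with respect to $x_1$. That is, $\varphi_2(-x_1, x_2, \ldots, x_n) = -\varphi_2(x_1, x_2, \ldots, x_n)$ for all $(x_1, x_2, \ldots, x_n) \in \Omega$. Then
\[
\lambda^V_{2,\Omega} - \lambda^V_{1,\Omega} \ge \frac{3\pi^2}{4b^2} \ge \frac{3\pi^2}{d^2}.
\]

Proof. Let $\chi \equiv \varphi_2/\varphi_1$. Then
\[
\lambda^V_{2,\Omega} - \lambda^V_{1,\Omega}
= \frac{\int_\Omega |\nabla_x \chi|^2 \varphi_1^2\, dx}{\int_\Omega \chi^2 \varphi_1^2\, dx}
\tag{3}
\]
\[
\ge \frac{\int_\Omega \big|\frac{\partial}{\partial x_1} \chi\big|^2 \varphi_1^2\, dx}{\int_\Omega \chi^2 \varphi_1^2\, dx}.
\tag{4}
\]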
Clearly the quantity
\[
\frac{\int_I \big|\frac{\partial}{\partial x_1} \chi\big|^2 \varphi_1^2\, dx_1}{\int_I \chi^2 \varphi_1^2\, dx_1},
\]
where $I$ is the set of all $x_1$ such that $(x_1, x_2, \ldots, x_n)$ belongs to $\Omega$, is bounded from below by the first nonzero Neumann eigenvalue $\mu_1$ of the operator
\[
-\frac{\partial^2}{\partial x_1^2} - 2\, \frac{\partial}{\partial x_1} \ln \varphi_1\, \frac{\partial}{\partial x_1}
\tag{5}
\]
on $I$. Set $u \equiv \frac{\partial \chi}{\partial x_1} / \chi$ (cf. [7], the end of Sect. 2.15). Thus, the eigenvalue $\mu_1$ is the smallest positive number $\mu$ such that the Riccati equation
\[
\frac{\partial}{\partial x_1} u + u^2 + 2\, \frac{\partial}{\partial x_1} \ln \varphi_1\; u + \mu = 0
\]
has a solution on $I \cap [0, \infty)$ that satisfies $\lim_{x_1 \downarrow 0} u(x_1) = +\infty$ and $\inf_{I \cap [0,\infty)} u = 0$. The Riccati equation
\[
\frac{\partial}{\partial x_1} v + v^2 + 2\, \frac{\partial}{\partial x_1} \ln \psi_1\; v + \mu = 0
\]
is obtained in a similar way. Thus we have arrived at a pair of first order equations where the second equation satisfies the standard uniqueness theorem. The above proposition implies via pointwise comparison of the solutions of the two Riccati equations that the eigenvalue $\mu_1$ is bounded from below by the first nonzero Neumann eigenvalue of the operator
\[
-\frac{\partial^2}{\partial x_1^2} - 2\, \frac{\partial}{\partial x_1} \ln \psi_1\, \frac{\partial}{\partial x_1}
\tag{6}
\]
on $(-b, b)$, which is $3\pi^2/4b^2$. This completes the proof of the theorem.

Remark 1. Alternatively, we can use the variational characterization in order to compare the lowest eigenvalue of the operator (5) with that of (6). Let $I = (-a, a)$, where $a \le b$. Let $u$ be the eigenfunction of the problem (5) in $I$ corresponding to the lowest nonzero eigenvalue $\mu_1$. This function, in addition to $u'(a) = u'(-a) = 0$, satisfies (from the symmetry of the situation) (i) $u u' \ge 0$ for $x \ge 0$, $u u' \le 0$ for $x \le 0$, and (ii) $u(-x) = -u(x)$. By (ii) we see that
\[
\int_{-a}^{a} u(x)\, \psi_1^2(x)\, dx = 0,
\]
where $\psi_1$ is the eigenfunction for the Laplacian in $(-a, a)$. Let $\nu_1$ be the lowest nonzero eigenvalue of the problem (6) restricted to the interval $(-a, a)$. By domain monotonicity, $\nu_1 \ge \frac{3\pi^2}{4b^2}$. Testing with the eigenfunction $u$ of problem (5) in problem (6) in the interval $(-a, a)$, we get that
\[
\nu_1 \int_{-a}^{a} u^2 |\psi_1|^2\, dx \le \int_{-a}^{a} |u'|^2 |\psi_1|^2\, dx
= -\int_{-a}^{a} u\, (u' \psi_1^2)'\, dx
= -\int_{-a}^{a} u u'' \psi_1^2\, dx - 2 \int_{-a}^{a} u u' \psi_1 \psi_1'\, dx,
\]
where we have used the "prime" notation to denote derivatives in the $x_1$ direction. From (5) we know that
\[
-u u'' \psi_1^2 = \mu_1 \psi_1^2 |u|^2 + 2 \psi_1^2\, \frac{\varphi_1'}{\varphi_1}\, u' u.
\]
Hence,
\[
\nu_1 \int_{-a}^{a} u^2 |\psi_1|^2\, dx \le \mu_1 \int_{-a}^{a} u^2 \psi_1^2\, dx + 2 \int_{-a}^{a} u' u \Big( \frac{\varphi_1'}{\varphi_1} - \frac{\psi_1'}{\psi_1} \Big) \psi_1^2\, dx
\]
or
\[
\frac{3\pi^2}{4b^2} \le \nu_1 \le \mu_1 + 2\, \frac{\int_{-a}^{a} u u' \Big( \frac{\varphi_1'}{\varphi_1} - \frac{\psi_1'}{\psi_1} \Big) \psi_1^2\, dx}{\int_{-a}^{a} u^2 |\psi_1|^2\, dx}.
\]
This together with property (i) of $u$, the proposition, and (3) and (4) proves the comparison result.

By the result of Payne [10], for $V = 0$ and $\Omega \subset \mathbb{R}^2$ symmetric with respect to both axes and convex on both axes, there exists an eigenfunction corresponding to $\lambda_2$ whose nodal line is the intersection of the domain with one of the two axes. Therefore we have

Corollary 1. Suppose $V = 0$ and let $\Omega \subset \mathbb{R}^2$ be symmetric and convex with respect to both coordinate axes. Let $l = 2b$ be the length of its major axis. Then
\[
\lambda_{2,\Omega} - \lambda_{1,\Omega} \ge \frac{3\pi^2}{l^2}.
\]
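The constant in Theorem 1 can be sanity-checked numerically. The sketch below assumes the strip ground state is ψ1(x) = cos(πx/2b) on (−b, b) and tests that the trial function u(x) = 2 sin(πx/2b) satisfies the eigenvalue equation for operator (6) with µ = 3π²/4b², together with the Neumann condition u'(±b) = 0. The function names are illustrative and do not come from the paper, and the check only confirms that this µ is an eigenvalue, not that it is the smallest nonzero one.

```python
import math


def operator6_residual(b, x):
    """Residual of -u'' - 2 (ln psi_1)' u' - mu * u at the point x.

    Assumptions (not from the paper's text): psi_1(x) = cos(pi x / (2 b)),
    trial eigenfunction u(x) = 2 sin(pi x / (2 b)), mu = 3 pi^2 / (4 b^2).
    """
    k = math.pi / (2.0 * b)
    mu = 3.0 * math.pi ** 2 / (4.0 * b ** 2)
    u = 2.0 * math.sin(k * x)
    du = 2.0 * k * math.cos(k * x)
    d2u = -2.0 * k * k * math.sin(k * x)
    dlog_psi = -k * math.tan(k * x)  # derivative of ln cos(k x)
    return -d2u - 2.0 * dlog_psi * du - mu * u


def neumann_derivative_at_endpoint(b):
    """Value of u'(b) for the trial function; it should vanish."""
    k = math.pi / (2.0 * b)
    return 2.0 * k * math.cos(k * b)


if __name__ == "__main__":
    b = 1.7
    worst = max(abs(operator6_residual(b, b * s / 10.0)) for s in range(-9, 10))
    print(worst, neumann_derivative_at_endpoint(b))
```

The same conclusion follows by hand from the substitution w = uψ1, which turns the eigenvalue problem for (6) into the Dirichlet problem −w'' = (µ + π²/4b²) w on (−b, b); the second Dirichlet eigenvalue π²/b² then corresponds to µ = 3π²/4b².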
References
1. Ashbaugh, M., Benguria, R.: Optimal lower bounds for eigenvalue gaps for Schrödinger operators with symmetric single well potentials and related results, Maximum principles and eigenvalue problems in partial differential equations. White Plains, NY: Longman, 1988
2. Ashbaugh, M., Benguria, R.: Optimal lower bounds for the gap between the first two eigenvalues of one-dimensional Schrödinger operators with symmetric single-well potentials. Proc. Am. Math. Soc. 105, 419–424 (1989)
3. Bañuelos, R., Méndez-Hernández, P.J.: Sharp inequalities for heat kernels of Schrödinger operators and applications to spectral gaps. J. Funct. Anal. 176, 368–399 (2000)
4. van den Berg, M.: On condensation in the free-boson gas and the spectrum of the Laplacian. J. Statist. Phys. 31, 623–637 (1983)
5. Brascamp, H.J., Lieb, E.H.: On extensions of the Brunn-Minkowski and Prékopa-Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Funct. Anal. 22, 366–389 (1976)
6. Davis, B.: On the spectral gap of the Dirichlet Laplacian. Arkiv. Mat. (to appear)
7. Ince, E.L.: Ordinary differential equations. New York: Dover Publ., 1956
8. Kröger, P.: An extension of a theorem by Brascamp and Lieb. Proc. XIV Encuentro de la Zona Sur, Lican Ray, 2000
9. Lin, J.: A lower bound for the gap between the first two eigenvalues of Schrödinger operators on convex domains of S^n and R^n. Michigan Math. J. 40, 259–270 (1993)
10. Payne, L.: On two conjectures in the fixed membrane eigenvalue problem. Z. Angew. Math. Phys. 24, 721–728 (1973)
11. Qihuang, Y., Zhong, J.Q.: Lower bounds for the gap between the first and second eigenvalues for the Schrödinger operator. Trans. Am. Math. Soc. 294, 341–349 (1986)
12. Singer, I.M., Wang, B., Yau, S.-T., Yau, St. S.-T.: An estimate of the gap of the first two eigenvalues of the Schrödinger operator. Ann. Scuola Norm. Sup. Pisa Cl. Sci. 12, 319–333 (1985)
13. Smits, R.: Spectral gaps and rates to equilibrium for diffusions in convex domains. Michigan Math. J. 43, 141–157 (1996)
14. Yau, S.-T.: Open problems in geometry, Proceedings of symposia in pure mathematics. Vol. 54, 1–22 (1993)

Communicated by B. Simon
Commun. Math. Phys. 224, 551 – 564 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Statistics of a Flux in Burgers Turbulence with One-Sided Brownian Initial Data
J. Bertoin1,2, C. Giraud1, Y. Isozaki3
1 Laboratoire de Probabilités et Modèles Aléatoires, Université Paris 6, 175 rue du Chevaleret, 75013 Paris, France. E-mail: [email protected]; [email protected]
2 Institut universitaire de France.
3 Department of Mathematics, Graduate School of Science, Osaka University, Machikaneyama-cho, Toyonaka 560-0043, Japan. E-mail: [email protected]
Received: 18 April 2001 / Accepted: 16 July 2001
Abstract: We study the statistics of the flux of particles crossing the origin, which is induced by the dynamics of ballistic aggregation in dimension 1, under certain random initial conditions for the system. More precisely, we consider the cases when particles are uniformly distributed on R at the initial time, and if u(x, t) denotes the velocity of the particle located at x at time t, then u(x, 0) = 0 for x < 0 and (u(x, 0), x ≥ 0) is either a white noise or a Brownian motion.
1. Introduction This work is motivated by a question in the so-called Burgers turbulence that is perhaps more easily formulated in terms of the model of ballistic aggregation (i.e. system of free sticky particles). Specifically, consider infinitesimal particles that are uniformly spread in the one-dimensional space R, and suppose that at the initial time, particles receive some impulse, inducing a velocity field. Then let the system evolve according to the dynamics of completely inelastic shocks. As time passes, this induces the formation of clusters (i.e. point masses); more precisely, particles clump in case of collision to form a single cluster whose mass and velocity are determined by the law of conservation of mass and momentum. Of course, the velocity of particles is unchanged as long as they are not involved in shocks. Suppose now that the particles in (−∞, 0] are at rest at the initial time and that sufficiently many particles in [0, ∞) have a negative velocity. We may observe as time passes a flux of clusters that cross the point 0 from the right to the left. Typically, if we denote by a(0, t) the total mass of particles that crossed 0 up to time t, then t is a jump time of a(0, ·) if and only if there is a cluster crossing 0 at time t, and the mass of this cluster is just the size of the jump, a(0, t) − a(0, t−). Moreover, it can be seen that the velocity of this cluster is then a function of t, a(0, t−) and a(0, t), so that the flux is completely determined by the process (a(0, t), t ≥ 0).
Let us now present the interpretation in terms of Burgers turbulence. Consider Burgers equation ∂t u + u∂x u = ε∂xx u, where ε > 0 is a viscosity parameter, with initial data u(·, 0). It is known that the solution uε has a limit u when the viscosity tends to 0, and that in this setting, u(0, t) = −a(0, t)/t. In other words, the flux described above also specifies the evolution of the velocity field evaluated at the location 0. The purpose of this work is to investigate the statistics of this flux when the initial velocities on [0, ∞) are random. Specifically, we will be interested in the cases when they are given by either a white noise or a Brownian motion, which have been considered first by Burgers [5], and by Sinai [15] and She et al. [14]. We also refer to Tribe and Zaboronski [16] and Frachebourg et al. [7] for recent works on Burgers turbulence where the initial impulse is given by a Gaussian white noise supported by a finite or semi-infinite interval. Roughly, in both cases a(0, ·) is pure jump and satisfies a time-inhomogeneous Markov property 1 ; more precisely jumps occur on a discrete set of times in the case of white noise initial velocity, whereas the jump times are everywhere dense for Brownian initial velocity. Our main results specify the statistics of the jumps of a(0, ·) in the white noise case, and its transition probabilities in the Brownian case. The rest of this note is organized as follows. In the next section we recall some standard features on the evolution of sticky particles and its connection with the inviscid Burgers equation. Sections 3 and 4 are devoted to the statements and proofs of our results in the case of white noise and Brownian initial velocities, respectively. 2. Some Background on Burgers Equation Provided that at the initial time the mass distribution of particles is given by the Lebesgue measure on R, it is well-known (see [5, 11, 10, 6, 12], ...) 
that the state of the system at time t > 0 for the dynamics of ballistic aggregation can be completely described in terms of the initial velocity field u(·, t = 0) as follows. First, we introduce the so-called initial potential $U(\cdot) = \int_0^{\cdot} u(y, 0)\, dy$, so for x ≥ 0, U(x) represents the initial momentum of particles located in [0, x]. We assume that the initial data ensure that $U(x) = o(x^2)$ as x → ±∞. For every x ∈ R and t > 0, we denote by a(x, t) the right-most location of the overall minimum of the function
\[
z \mapsto U(z) + \frac{(z - x)^2}{2t};
\]
see Fig. 1. Alternatively, this quantity can be viewed as the right-most location of the overall minimum of
\[
z \mapsto \int_0^z \big( t\, u(y, 0) + y - x \big)\, dy, \qquad z \in \mathbb{R}.
\]
We shall refer to a(x, t) as the inverse Lagrangian function evaluated at location x and time t. It corresponds to the right-most initial location of the particles that lie in (−∞, x] at time t.

1 The Markov property relies crucially on the hypothesis that at the initial time, the particles in (−∞, 0) have velocity zero, and would fail if we considered instead e.g. a two-sided Brownian motion as initial data. Indeed, it may happen in that case that at time t > 0, there is a cluster with mass m that passes across 0 from the left to the right. The dynamics of ballistic aggregation impose that, if the next cluster that will cross 0 arrives from the right, then its mass will be greater than m, which clearly impedes the Markov property.
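The variational definition of a(x, t) is easy to experiment with on a grid. The following is a minimal sketch, not the authors' method: it assumes a potential U that can be evaluated pointwise and approximates the right-most minimiser by brute force. The names `inverse_lagrangian`, `toy_potential` and `Z_GRID` are hypothetical, and the toy initial velocity (u(y, 0) = −1 on (0, 1], zero elsewhere) is chosen only because the answer can be checked by hand.

```python
def inverse_lagrangian(z_grid, potential, x, t):
    """Right-most grid minimiser of z -> U(z) + (z - x)**2 / (2 t)."""
    best_z = None
    best_val = float("inf")
    for z in z_grid:
        val = potential(z) + (z - x) ** 2 / (2.0 * t)
        if val <= best_val:  # '<=' keeps the right-most minimiser on exact ties
            best_z, best_val = z, val
    return best_z


def toy_potential(z):
    """U(z) when u(y, 0) = -1 on (0, 1] and 0 elsewhere (illustrative choice)."""
    return -min(max(z, 0.0), 1.0)


# Uniform grid on [-3, 3] with spacing 0.001 (an arbitrary resolution).
Z_GRID = [-3.0 + 0.001 * i for i in range(6001)]

if __name__ == "__main__":
    for t in (0.5, 2.0):
        print(t, inverse_lagrangian(Z_GRID, toy_potential, 0.0, t))
```

For this toy potential one can minimise by hand and find a(0, t) = min(t, 1): the mass that has crossed the origin grows linearly until all moving particles have arrived. The grid search reproduces this up to the grid spacing.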
Fig. 1. Geometrical interpretation of a(x, t) [figure: the graph of U(z) and the parabola −(z − x)²/(2t) + C, with x and a(x, t) marked]

Fig. 2. Geometrical interpretation of a shock [figure: axes labelled mass location and velocity]
For every fixed t > 0, x → a(x, t) is a right-continuous and increasing function, and its jumps are related to the clusters in the system. More precisely, there is a cluster located at x ∈ R at time t if and only if a(x, t) > a(x−, t). By the conservation of mass and momentum, the mass of this cluster is given by the size of the jump a(x, t) − a(x−, t) and its velocity by
\[
u(x, t) = \frac{1}{a(x, t) - a(x-, t)} \int_{a(x-, t)}^{a(x, t)} u(y, 0)\, dy = \frac{1}{2t} \big( 2x - a(x, t) - a(x-, t) \big),
\tag{1}
\]
see Fig. 2. When the initial velocities are identically zero on (−∞, 0] (which is assumed in this work), a(0, t) represents the total mass of the particles that crossed 0 up to time t. In particular the map t → a(0, t) is also right-continuous and increasing. On the other hand, it is easily seen that its left-limit a(0, t−) at t > 0 coincides with a(0−, t). So the velocity of the cluster crossing 0 at time t (if any) can also be expressed as
\[
u(0, t) = -\frac{a(0, t) + a(0-, t)}{2t} = -\frac{a(0, t) + a(0, t-)}{2t},
\tag{2}
\]
which shows that the process (a(0, t), t ≥ 0) completely describes the flux of clusters crossing 0.
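The second equality in (1) can be recovered from the variational definition of a(x, t). The short derivation below is a sketch under the assumption that U is continuous, so that the left limit a(x−, t) is itself a minimiser of z → U(z) + (z − x)²/2t; the abbreviations $a_+$ and $a_-$ are introduced here only for readability.

```latex
% Notation (introduced for this sketch): a_+ = a(x,t), a_- = a(x-,t), with a_+ > a_-.
% Both points minimise F(z) = U(z) + (z - x)^2 / (2t), hence F(a_+) = F(a_-):
\[
U(a_+) - U(a_-) = \frac{(a_- - x)^2 - (a_+ - x)^2}{2t}
                = \frac{(a_- - a_+)\,(a_- + a_+ - 2x)}{2t}.
\]
% Since U(a_+) - U(a_-) = \int_{a_-}^{a_+} u(y,0)\,dy, dividing by a_+ - a_- > 0 gives
\[
\frac{1}{a_+ - a_-} \int_{a_-}^{a_+} u(y,0)\,dy = \frac{2x - a_+ - a_-}{2t}.
\]
```

Setting x = 0 and using a(0−, t) = a(0, t−) then gives formula (2).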
Let us turn our attention to the connection with the inviscid Burgers equation. If we define
\[
u(x, t) = \frac{x - a(x, t)}{t}
\tag{3}
\]
for every point x ∈ R at which a(·, t) is continuous and by (1) otherwise, then u is the well-known entropy solution to the inviscid Burgers equation
\[
\partial_t u + u\, \partial_x u = 0.
\tag{4}
\]
In particular, describing the solution at the fixed location 0 as time varies, (u(0, t), t ≥ 0), is the same as describing the flux of particles crossing 0 in the model of ballistic aggregation. An interesting consequence of (3) and (4) for the study of the flux is that a solves
\[
\partial_t a + u\, \partial_x a = 0
\tag{5}
\]
in the weak sense (in (4) and (5), it is of course crucial to define properly the value of u(x, t) at its discontinuity points, which is the purpose of (1)). The physical interpretation of this PDE is clear: $\partial_x a$ gives the mass field and u the velocity field of the particles, so (5) is just the transport equation. Observe also that taking derivatives with respect to the spatial variable x in (5) yields the equation of conservation of masses, see e.g. [6].

3. White Noise Initial Velocity

3.1. Main results. In this section, we focus on white noise initial velocity, i.e.
\[
u(x, 0) = 0 \ \text{ for } x \le 0 \qquad \text{and} \qquad u(x, 0) = \frac{dW_x}{dx} \ \text{ for } x > 0,
\]
where $(W_x, x \ge 0)$ is a standard Brownian motion which thus coincides with the initial potential U. So, strictly speaking, the initial velocities u(·, 0) are not given by a classical function, but rather by a generalized function which is the derivative of a random continuous path. However it is well known that the analysis of the inviscid Burgers equation can be extended to such singular initial data. More precisely the singularity immediately disappears, in the sense that the solution u(·, s) to Burgers equation is a classical function for any s > 0. An important guide for our analysis is the fact that the shock structure is discrete (see [2]). This means that at time s > 0, all the particles are concentrated on clusters whose locations form a discrete set in space. As a consequence, (a(0, s), s ≥ 0) is a.s. a step process, in the sense that it increases only by jumps and the number of the jumps (i.e. of clusters of particles that cross 0) during the time interval [ε, 1/ε] is finite for every ε > 0. We shall show that a(0, ·) is a time-inhomogeneous Markov process. Its one-dimensional distributions have been determined by Groeneboom [9], see also Sect. 3 in Frachebourg et al. [7]. To give an explicit expression, it is convenient to first introduce the following notation. We write Ai for the Airy function (see [1] on p. 446), and $-\omega_1 > -\omega_2 > \ldots$ for its zeros ranked in decreasing order. Following Groeneboom [9], we introduce the function $g : \mathbb{R} \to \mathbb{R}_+$ which has Fourier transform
\[
\hat{g}(\xi) = \int_{-\infty}^{\infty} e^{i\xi x} g(x)\, dx = \frac{2^{1/3}}{\mathrm{Ai}(i\, 2^{-1/3} \xi)}, \qquad \xi \in \mathbb{R},
\]
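and the function $h(w, \cdot) : \mathbb{R}_+ \to \mathbb{R}_+$ that is determined via its Laplace transform
\[
\int_0^{\infty} e^{-\lambda x} h(w, x)\, dx = \frac{\mathrm{Ai}(2^{2/3} w + 2^{-1/3} \lambda)}{\mathrm{Ai}(2^{-1/3} \lambda)}.
\]
There is also a representation of h(w, m) as a series:
\[
h(w, m) = 2^{1/3} \sum_{n=1}^{\infty} \frac{\mathrm{Ai}(2^{2/3} w - \omega_n)}{\mathrm{Ai}'(-\omega_n)} \exp\big( -2^{1/3} m\, \omega_n \big).
\]
Then according to Corollary 3.1 in [9], we have
\[
P(a(0, s) \in dm) = (2s)^{-2/3}\, g\big( (2s)^{-2/3} m \big) \left( \int_0^{\infty} h\big( w, (2s)^{-2/3} m \big)\, dw \right) dm.
\]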
By the Markov property, since the one-dimensional distributions of the process a(0, ·) are known, the statistics of its evolution are completely determined by the family of conditional distributions of the pair (T(s), M(s)) given a(0, s), where
\[
T(s) = \inf\{ t > s : a(0, t) > a(0, s) \}, \qquad M(s) = a(0, T(s)) - a(0, s)
\]
denote the first instant after time s at which a cluster crosses 0, and the mass of this cluster. The calculations of these conditional laws rely again crucially on the work of Groeneboom [9]; more precisely, the fact that the shock structure is discrete will enable us to relate the rates of jump of the map s → a(0, s) to that of x → a(x, s) at x = 0 (which has been computed by Groeneboom). In order to state the result, we need to introduce the function $p : \mathbb{R}_+ \to \mathbb{R}_+$,
\[
p(x) = 2 \sum_{k \ge 1} \exp\big( -2^{1/3} \omega_k x \big).
\]
Alternatively, p is determined by the Laplace transform
\[
\int_0^{\infty} e^{-\lambda x} \big( p(x) - (2\pi x^3)^{-1/2} \big)\, dx = 2^{2/3}\, \frac{\mathrm{Ai}'(2^{-1/3} \lambda)}{\mathrm{Ai}(2^{-1/3} \lambda)} + \sqrt{2\lambda}.
\]

Theorem 1. For white noise initial velocity, a(0, ·) is a time-inhomogeneous Markov step process. For any m, s > 0, the conditional distribution of the instant and the size of the first jump of a(0, ·) after time s given a(0, s) = m is
\[
P\big( T(s) \in dt,\ M(s) \in dy \mid a(0, s) = m \big)
= \Big( \frac{s}{t} \Big)^{1/3} \exp\Big( -\frac{m^3}{6} \Big( \frac{1}{s^2} - \frac{1}{t^2} \Big) \Big)\,
\frac{y(2m + y)}{4t^3}\,
\frac{g\big( (2t)^{-2/3} (m + y) \big)}{g\big( (2s)^{-2/3} m \big)}\,
p\big( (2t)^{-2/3} y \big)\, dy\, dt,
\]
where y > 0 and t > s.
Specifying the above formula for t = s, we obtain the rates of jump of s → a(0, s):
\[
\lim_{t \to s} \frac{1}{t - s}\, P\big( a(0, t) - a(0, s) \in dy \mid a(0, s) = m \big)
= \frac{y(2m + y)}{4s^3}\,
\frac{g\big( (2s)^{-2/3} (m + y) \big)}{g\big( (2s)^{-2/3} m \big)}\,
p\big( (2s)^{-2/3} y \big)\, dy.
\]
Before starting to prove Theorem 1, recall from Corollary 3.4(ii) in [9] that
\[
g(x) \sim 4x \exp\Big( -\frac{2}{3} x^3 \Big) \qquad \text{as } x \to \infty,
\]
which enables us to estimate the decay of the rates of jump, e.g. when the size y → ∞. For instance one gets for s = 1/2 and a(0, 1/2) = m that the rate of jump of size y is of order
\[
y^3 \exp\Big( -\frac{2}{3} (y + m)^3 - 2^{1/3} \omega_1 y \Big)
\]
when y is large (the first root of the Airy function is $-\omega_1 \approx -2.3381$).

3.2. Proof of Theorem 1. We have already explained why a(0, ·) must be a step process. An argument close to that in Avellaneda and E [2] shows that the (time-inhomogeneous) Markov property follows from a general path decomposition for Markov processes due to Millar [13]. More precisely, Millar's result entails that conditionally on a(0, s) = m, the processes $(W_x, x \le a(0, s))$ and $(W_{a(0,s)+x} - W_{a(0,s)}, x \ge 0)$ are independent. For r < s < t, a(0, r) can be viewed as the location of the minimum of $x \mapsto W_x + x^2/2r$ for x ≤ m, whereas a(0, t) − a(0, s) coincides with the location of the minimum of $x \mapsto W_{m+x} - W_m + (x + m)^2/2t$ for x ≥ 0; see Fig. 3. The Markov property follows.
Fig. 3. [Figure: the path W(x) with the point m marked, together with the parabolas −x²/(2s) + C and −x²/(2t) + C.]
We now turn our attention to the conditional distribution of the pair (T(s), M(s)) given a(0, s) = m, that is we have to calculate the asymptotic of
\[
P\big( T(s) \in [t, t + h],\ M(s) \in [y, y + \eta] \mid a(0, s) = m \big)
\]
as h, η → 0+. As a first step, we express the latter quantity in the form A × B where
\[
A = P\big( T(s) \ge t \mid a(0, s) = m \big),
\]
\[
B = P\big( T(s) \in [t, t + h],\ M(s) \in [y, y + \eta] \mid a(0, s) = m,\ T(s) \ge t \big).
\]
We start with the computation of the conditional probability A. For every z > 0, let $P^z$ denote the law of a Brownian motion $(W_x, x \ge 0)$ starting from z. On the one hand, it is known from the work of Millar [13] that the conditional law
\[
P^z\big( \,\cdot \mid W_x \ge -x(x + 2m)/2s \text{ for all } x \ge 0 \big)
\]
has a weak limit when z decreases to 0, which serves as the conditional distribution of the process $(W_{a(0,s)+x} - W_{a(0,s)}, x \ge 0)$ given a(0, s) = m. On the other hand, the analysis of the dynamics of ballistic aggregation sketched in Sect. 2 shows that the first instant T(s) after time s at which a cluster crosses 0 occurs after time t if and only if $W_{a(0,s)+x} - W_{a(0,s)} \ge -\frac{1}{2t} x(x + 2a(0, s))$ for all x ≥ 0. As a consequence
\[
A = P\Big( W_{a(0,s)+x} - W_{a(0,s)} \ge -\frac{1}{2t} x(x + 2a(0, s)) \text{ for all } x \ge 0 \;\Big|\; a(0, s) = m \Big)
= \lim_{z \downarrow 0} \frac{P^z\big( W_x \ge -\frac{1}{2t} x(x + 2m) \text{ for all } x \ge 0 \big)}{P^z\big( W_x \ge -\frac{1}{2s} x(x + 2m) \text{ for all } x \ge 0 \big)}.
\]
According to Corollary 3.1 in [9], we have for every r > 0 that
\[
\lim_{z \downarrow 0} \frac{\partial}{\partial z} P^z\Big( W_x \ge -\frac{1}{2r} x(x + 2m) \text{ for all } x \ge 0 \Big)
= (2r)^{-1/3} \exp\Big( \frac{m^3}{6r^2} \Big)\, g\big( (2r)^{-2/3} m \big),
\]
where g has been defined before Theorem 1. By l'Hospital's rule, we conclude that the first conditional probability is
\[
A = \Big( \frac{s}{t} \Big)^{1/3} \exp\Big( -\frac{m^3}{6} \Big( \frac{1}{s^2} - \frac{1}{t^2} \Big) \Big)\, \frac{g\big( (2t)^{-2/3} m \big)}{g\big( (2s)^{-2/3} m \big)}.
\tag{6}
\]
We next turn our attention to the conditional probability B. The Markov property of s → a(0, s) applied at time t shows that
\[
B = P\big( T(t) \in [t, t + h],\ M(t) \in [y, y + \eta] \mid a(0, t) = m \big).
\]
We shall first estimate this quantity when h → 0+ and η → 0+, in terms of the jump rate of x → a(x, t).

Lemma 1. We have the following estimate when h, η → 0+:
\[
P\big( T(t) \in [t, t + h],\ M(t) \in [y, y + \eta] \mid a(0, t) = m \big)
\sim P\big( a((m + y/2)h/t,\ t) - a(0, t) \in [y, y + \eta] \mid a(0, t) = m \big).
\]
Let us sketch a heuristic argument for Lemma 1 (the rigorous proof is postponed to the end of this section). It is convenient to consider the left-most cluster located in [0, ∞) at time t, so the location of this cluster is ℓ = inf{x ≥ 0 : a(x, t) > a(0, t)} and its mass µ = a(ℓ, t) − a(0, t). On the one hand, as the shock structure is discrete, we may neglect the effects of collisions during small time-intervals. Therefore, a cluster of mass ≈ y and velocity ≈ v crosses 0 during the time-interval [t, t + h] "if and only if" the left-most cluster in [0, ∞) at time t is located at ℓ ∈ [0, −vh] and has mass µ ≈ y and velocity ≈ v (roughly, this is a loose interpretation of the transport equation (5)). Recall from (2) that given a(0, t) = m, we must have v ≈ −(m + y/2)/t. On the other hand, because the process x → a(x, t) is an inhomogeneous Markov step process (see [9] or [2]), we have as h → 0+,
\[ P\big(\ell \in [0, (m+y/2)h/t] \text{ and } \mu \in [y, y+\eta] \mid a(0,t) = m\big) \sim P\big(a((m+y/2)h/t,\, t) - a(0,t) \in [y, y+\eta] \mid a(0,t) = m\big). \tag{7} \]
We thus arrive at the stated estimate. In order to evaluate the quantities appearing in Lemma 1, we can invoke Theorem 4.1 of Groeneboom [9]². If we denote by $\nu_t(m, y)$ the rate of jump of size y for x → a(x, t) at x = 0, conditionally on a(0, t) = m, then we have for t = 1/2,
\[ \nu_{1/2}(m, y) = 2y\, p(y)\, \frac{g(m+y)}{g(m)}. \]
This means that for every y' > y, we have the following estimate as h → 0+:
\[ P\big(a(h, 1/2) - a(0, 1/2) \in [y, y'] \mid a(0, 1/2) = m\big) \sim h \int_y^{y'} \nu_{1/2}(m, z)\, dz. \]
It is an easy matter to deduce from this the expression of $\nu_t(m, y)$ for an arbitrary t > 0. Indeed, the scaling property of Brownian motion propagates to the inverse Lagrangian function, in the sense that for every t > 0,
\[ \big(a(x, t),\ x \ge 0\big) \overset{\text{law}}{=} \big((2t)^{2/3}\, a(x (2t)^{-2/3}, 1/2),\ x \ge 0\big), \]
see e.g. Eq. (21) in She et al. [14]. It follows readily that the conditional rates of jump at time t are given by
\[ \nu_t(m, y) = (2t)^{-4/3}\, \nu_{1/2}\big((2t)^{-2/3} m,\ (2t)^{-2/3} y\big) = \frac{y}{2t^2}\, p\big((2t)^{-2/3} y\big)\, \frac{g\big((2t)^{-2/3}(m+y)\big)}{g\big((2t)^{-2/3} m\big)}. \]
By Lemma 1, we get as h, η → 0+ that
\[ P\big(T(t) \in [t, t+h],\ M(t) \in [y, y+\eta] \mid a(0,t) = m\big) \sim h\eta\, \frac{y(2m+y)}{4t^3}\, p\big((2t)^{-2/3} y\big)\, \frac{g\big((2t)^{-2/3}(m+y)\big)}{g\big((2t)^{-2/3} m\big)}. \]
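The two prefactors y/(2t²) and y(2m + y)/(4t³) come from elementary arithmetic. The following sketch spells it out; the shorthand h' for the spatial window (m + y/2)h/t of Lemma 1 is introduced here for illustration only and is not notation from the paper.

```latex
% Sketch of the arithmetic; h' := (m + y/2) h / t is shorthand used only here.
\begin{align*}
(2t)^{-4/3}\cdot 2\,(2t)^{-2/3}\,y
  &= 2\,(2t)^{-2}\,y \;=\; \frac{y}{2t^{2}},\\[2pt]
h'\,\eta\,\frac{y}{2t^{2}}
  &= \frac{(m+y/2)\,h}{t}\,\eta\,\frac{y}{2t^{2}}
  \;=\; h\,\eta\,\frac{y\,(2m+y)}{4t^{3}}.
\end{align*}
```

The first line is the prefactor of $\nu_t(m,y)$ obtained from $\nu_{1/2}$ by scaling; the second multiplies the rate by the window length h' and by η, as Lemma 1 prescribes.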
2 In fact, Groeneboom considered a two-sided white noise as initial velocity, which induces spatial stationarity for the inviscid Burgers turbulence. Replacing the two-sided white noise by a one-sided white noise modifies the one-dimensional distributions of the process a(·, 1/2), but not its transition probabilities nor its infinitesimal generator (this is easily seen from a perusal of Groeneboom’s argument).
Combining this with (6) yields the formula stated in Theorem 1. So all that we need to complete the proof of Theorem 1 is to establish Lemma 1.

Proof of Lemma 1. We shall follow the heuristic approach sketched above; the main technical problem is to check that the effects of shocks during a small time interval can be neglected. Let us work conditionally on a(0, t) = m and fix ε > 0. Denote by ρ(ℓ, µ, m, ε) the conditional probability, given the location ℓ and the mass µ of the first cluster at the right of 0 at time t, that during the time interval (t, t + ε] there is at least one cluster that crosses the location ℓ. By the same argument based on Corollary 3.1 in [9] as that we used to obtain (6), we see that
\[ \rho(\ell, \mu, m, \varepsilon) = 1 - \frac{k(t+\varepsilon,\ m+\mu-\ell)}{k(t,\ m+\mu-\ell)}, \]
where
\[ k(r, m) = (2r)^{-1/3} \exp\Big(\frac{m^3}{6r^2}\Big)\, g\big((2r)^{-2/3} m\big). \]
Hence, for every c > 0, if we set
\[ \bar\rho(\varepsilon, m, c) := \sup_{0 \le \ell, \mu \le c} \rho(\ell, \mu, m, \varepsilon), \]
then
\[ \bar\rho(\varepsilon, m, c) \le \varepsilon \times \sup_{\substack{t \le \tau \le t+\varepsilon \\ 0 \le \mu, \ell \le c}} \frac{|\partial_1 k(\tau,\ m+\mu-\ell)|}{k(t,\ m+\mu-\ell)}, \tag{8} \]
and this uniform upper bound tends to 0 as ε decreases to 0.

Next, we write for simplicity T for T(t) and M for M(t), i.e. T is the first instant after time t at which a cluster crosses 0, and M is the mass of this cluster. We thus have a(0, T−) = m and a(0, T) = m + M, and by (2), the velocity of this cluster is v = −(m + M/2)/T. By the conservation of masses and momenta, the center of mass of the particles that constitute this cluster is located at −v(T − t) at time t. As a consequence, the location ℓ of the left-most cluster in [0, ∞) at time t must fulfill
\[ \ell \le (m + M/2)(T - t)/T \sim (m + M/2)(T - t)/t \quad \text{as } T \to t+. \]
Combining these two observations, we deduce the following for every fixed 0 < dt ≤ ε. On the one hand, the event, say Λ, that T ∈ [t, t + dt] and M ∈ [y, y'], is implied by the event that ℓ ∈ [0, (m + y/2)dt/t], µ ∈ [y, y'], and that no cluster crosses ℓ during the time interval (t, t + ε]. This yields the lower bound
\[ P\big(T \in [t, t+dt],\ M \in [y, y'] \mid a(0,t) = m\big) \ge P\big(\ell \in [0, (m+y/2)dt/t],\ \mu \in [y, y'] \mid a(0,t) = m\big)\,\big(1 - \bar\rho(\varepsilon, m, y')\big). \]
On the other hand, this same event Λ forces that ℓ ∈ [0, (m + y'/2)dt/t], and either that µ ∈ [y, y'], or that µ < y and there is at least one cluster that crosses the location ℓ during the time interval (t, t + ε]. This yields the upper bound
\[ P\big(T \in [t, t+dt],\ M \in [y, y'] \mid a(0,t) = m\big) \le P\big(\ell \in [0, (m+y'/2)dt/t],\ \mu \in [y, y'] \mid a(0,t) = m\big) + P\big(\ell \in [0, (m+y'/2)dt/t],\ \mu < y \mid a(0,t) = m\big)\, \bar\rho(\varepsilon, m, y). \]
Finally, letting ε → 0+ and then y' → y, we obtain from (8) that
\[ \frac{P\big(T \in [t, t+dt],\ M \in [y, y+dy] \mid a(0,t) = m\big)}{dt\, dy} = \frac{P\big(\ell \in [0, (m+y/2)dt/t],\ \mu \in [y, y+dy] \mid a(0,t) = m\big)}{dt\, dy}. \]
By (7), the proof of Lemma 1 is complete.
4. Brownian Initial Velocity

4.1. Main results. Throughout this section, we assume Brownian initial velocities, i.e. u(x, 0) = 0 for x < 0 and (u(x, 0), x ≥ 0) is a standard Brownian motion. One major difference with the white noise case is that now there are no rarefaction intervals, i.e. at any time s > 0, the locations of clusters form an everywhere dense subset of [0, ∞) with probability 1 (cf. Sinai [15]). We now state the main result of this work in this setting.

Theorem 2. For Brownian initial velocities, a(0, ·) is a time-inhomogeneous Markov process that increases only by jumps. The transition probabilities can be described as follows. For every q, m > 0 and 0 < s < t, we have
\[ E\big(\exp\{-q(a(0,t) - a(0,s))\} \mid a(0,s) = m\big) = \sqrt{\frac{s}{t} + \frac{t-s}{t\sqrt{2t^2 q + 1}}}\ \exp\Big\{-\frac{m(t-s)}{s t^2}\Big(\sqrt{2t^2 q + 1} - 1\Big)\Big\}. \]
It is interesting to observe that
\[ \lim_{q \to \infty} E\big(\exp\{-q(a(0,t) - a(0,s))\} \mid a(0,s) = m\big) = 0, \]
which entails that on any non-void time-interval (s, t], there are clusters of particles that cross 0. This is certainly not surprising due to the absence of rarefaction intervals. We also point out that it is easy to derive from the formula in Theorem 2 the jump rates of a(0, ·) (that is the rate of clusters with a given mass crossing zero) conditional on its current value. Specifically, we see that when t decreases to s,
\[ \frac{1}{t-s}\, E\big(1 - \exp\{-q(a(0,t) - a(0,s))\} \mid a(0,s) = m\big) \sim \frac{m}{s^3}\Big(\sqrt{2s^2 q + 1} - 1\Big) + \frac{1}{2s}\Big(1 - 1/\sqrt{2s^2 q + 1}\Big). \]
Inverting the Laplace transform, we obtain the following formula for the rate of jump of size y for a(0, ·) at time s conditionally on a(0, s) = m:
\[ \frac{1}{t-s}\, P\big(a(0,t) - a(0,s) \in [y, y+dy] \mid a(0,s) = m\big) \sim \frac{2m+y}{2s}\, \frac{1}{s\sqrt{2\pi y^3}} \exp\Big(-\frac{y}{2s^2}\Big)\, dy. \tag{9} \]
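The Laplace inversion leading to (9) can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the quadrature (Simpson's rule after the substitution y = u², truncated at u = 12s) is a choice made here, not something taken from the paper. It integrates (1 − e^{−qy}) against the density in (9) and compares the result with the small-time expansion (m/s³)(√(2s²q + 1) − 1) + (1/2s)(1 − 1/√(2s²q + 1)).

```python
import math


def jump_rate_density(y, s, m):
    """Density in y of the jump rate in (9), at time s, given a(0, s) = m."""
    return ((2 * m + y) / (2 * s)
            * math.exp(-y / (2 * s * s))
            / (s * math.sqrt(2 * math.pi * y ** 3)))


def laplace_closed_form(q, s, m):
    """Small-time expansion stated just before (9)."""
    root = math.sqrt(2 * s * s * q + 1)
    return m / s ** 3 * (root - 1) + (1 - 1 / root) / (2 * s)


def laplace_numeric(q, s, m, n=20000):
    """Simpson approximation of the integral of (1 - exp(-q y)) * density over y > 0.

    The substitution y = u^2 removes the y^(-1/2) singularity at the origin.
    The cut-off 12 * s and the even number of panels n are choices of this sketch.
    """
    upper = 12.0 * s  # exp(-u^2 / (2 s^2)) is negligible beyond this point
    h = upper / n

    def integrand(u):
        if u == 0.0:
            # Limit of the transformed integrand as u -> 0.
            return 2 * q * m / (s * s * math.sqrt(2 * math.pi))
        y = u * u
        return -math.expm1(-q * y) * jump_rate_density(y, s, m) * 2 * u

    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


if __name__ == "__main__":
    for q, s, m in [(1.0, 1.0, 1.0), (0.5, 2.0, 0.3), (3.0, 0.7, 2.0)]:
        print(q, s, m, laplace_numeric(q, s, m), laplace_closed_form(q, s, m))
```

Agreement of the two columns for a few parameter triples supports the reading of (9) as the density ((2m + y)/2s) · (s√(2πy³))⁻¹ exp(−y/2s²); it is a consistency check, not a proof.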
As a check, it may be interesting to recover (9) by the following informal argument, inspired by our approach via Lemma 1 in the white noise case. If we neglect the effects of collisions during a small time-interval, then we should have
\[ P\big(a(0,t) - a(0,t-) \in [y, y+dy] \text{ for some } t \in [s, s+ds] \mid a(0,s) = m\big) \sim P\Big(a(x,s) - a(x-,s) \in [y, y+dy] \text{ for some } x \in \Big[0, \frac{2m+y}{2s}\, ds\Big] \,\Big|\, a(0,s) = m\Big). \tag{10} \]
On the other hand, we know from Theorem 1 in [3] that the point process of the jumps of the inverse Lagrangian function, (a(x, s) − a(x−, s), x ≥ 0) (that is the process of the masses of clusters at time s as a function of their location) is independent of a(0, s) and has the distribution of a Poisson point process on (0, ∞) with characteristic measure
\[ \nu_s(dy) = \frac{1}{s\sqrt{2\pi y^3}} \exp\Big(-\frac{y}{2s^2}\Big)\, dy. \]
So the probability of the right-hand side in (10) equals
\[ \frac{2m+y}{2s}\, ds\, \nu_s(dy), \]
which is the formula found in (9). Unfortunately, it does not seem easy to make this heuristic argument rigorous, because it does not take into account the collisions during the time interval [s, s + ds] (recall that there are no rarefaction intervals a.s.). The rest of this section is devoted to the proof of Theorem 2.
4.2. Proof of Theorem 2. Let us first establish the easiest parts of the statement. To start with, recall that with probability one, at any positive time almost every particle belongs to some cluster; see Sinai [15] or E et al. [6]. We may therefore identify a(0, s) with the total mass of the clusters that crossed 0 by time s, and the latter, viewed as a process depending on s, is obviously pure jump. Next, we lift from Lemma 1 in [3] the key fact that a(0, s) is a splitting time for the Brownian motion u(·, 0). Specifically, write $\mathcal{F}_s$ for the sigma-field generated by (u(x, 0), x ≤ a(0, s)). Because a(0, s) increases with s, $(\mathcal{F}_s)_{s \ge 0}$ is a filtration and the process a(0, ·) is clearly $(\mathcal{F}_s)$-adapted. Then for each s > 0,
\[ \big(u(x + a(0,s), 0),\ x \ge 0\big) \text{ is independent of } \mathcal{F}_s. \tag{11} \]
On the other hand, the increment a(0, t) − a(0, s) can be identified as the (right-most) location of the minimum of the function
\[ z \mapsto \int_0^z \big(t\, u(y + a(0,s), 0) + y + a(0,s)\big)\, dy, \qquad z \ge 0. \]
We thus see from (11) that the conditional distribution of a(0, t) given $\mathcal{F}_s$ only depends on a(0, s), which establishes the Markov property. So all that we need now is to calculate the (time-inhomogeneous) transition probabilities.
In this direction, we consider the solution v to the inviscid Burgers equation with initial velocity v(·, 0) given for x ≤ 0 by v(x, 0) = 0 and for x ≥ 0 by v(x, 0) = u(x, s) − u(0, s) = u(x, s) + a(0, s)/s. We write
\[ (x, r) \mapsto \alpha(x, r) = x - r\, v(x, r) \]
for the associated inverse Lagrangian function.

Lemma 2. We have for every x ≥ 0
\[ \frac{t}{t-s}\, \alpha(x + x_0,\ t-s) = \frac{s x}{t-s} + a(x, t), \qquad \text{where } x_0 := \frac{t-s}{s}\, a(0, s). \]

Proof. It is convenient to consider first the so-called delayed solution (cf. Sect. 4.3 in [4]), that is for every x ∈ R and r ≥ 0, we set $\tilde u(x, r) = u(x, s + r)$. Then $\tilde u$ is the entropy solution to the inviscid Burgers equation $\partial_t \tilde u + \tilde u\, \partial_x \tilde u = 0$ with initial velocity $\tilde u(\cdot, 0) = u(\cdot, s)$. Let us compare the two sticky particle systems with respective initial velocities v(·, 0) and $\tilde u(\cdot, 0)$. Because the initial relative velocities $v(x, 0) - v(y, 0) = \tilde u(x, 0) - \tilde u(y, 0)$ are the same for x, y ≥ 0 and the initial velocities $\tilde u(\cdot, 0)$ and v(·, 0) both are non-positive on (−∞, 0], it should be plain from the dynamics of the system and the interpretation of v and $\tilde u$ as velocity fields that at any time r > 0, we have the identity
\[ v\big(x - r\, \tilde u(0, 0),\ r\big) = \tilde u(x, r) - \tilde u(0, 0) = u(x, s + r) - u(0, s). \]
In particular, for r = t − s, if we set
\[ x_0 := -r\, \tilde u(0, 0) = \frac{t-s}{s}\, a(0, s), \]
then we have
\[ \frac{x + x_0 - \alpha(x + x_0,\ t-s)}{t-s} = v(x + x_0,\ t-s) = u(x, t) - u(0, s) = \frac{x - a(x, t)}{t} + \frac{a(0, s)}{s}; \]
and we arrive at the stated identity.

We may now use results in [3] to analyze the distribution of variables arising in Lemma 2. In this direction, we re-write the identity there for x = 0 in the form
\[ a(0, t) = a(0, s) + \tau_{s,t}\big(a(0, s)\big) + \frac{t}{t-s}\, \alpha(0,\ t-s), \tag{12} \]
where
\[ \tau_{s,t}(y) = \frac{t}{t-s}\Big(\alpha\Big(\frac{t-s}{s}\, y,\ t-s\Big) - \alpha(0,\ t-s)\Big) - y, \qquad y \ge 0. \]
Theorem 2 now follows from (12) and an appeal to the following lemma.
Lemma 3. (i) The process $(\tau_{s,t}(x), x \ge 0)$ is independent of the variables a(0, s) and α(0, t − s). It is a subordinator (increasing process with stationary and independent increments) with Laplace transform
\[ E\big(\exp\{-q\, \tau_{s,t}(x)\}\big) = \exp\Big\{-\frac{x(t-s)}{s t^2}\Big(\sqrt{2t^2 q + 1} - 1\Big)\Big\}. \]
(ii) The Laplace transform of the random variable tα(0, t − s)/(t − s) is given by
\[ E\Big(\exp\Big\{-\frac{q t}{t-s}\, \alpha(0,\ t-s)\Big\}\Big) = \sqrt{\frac{s}{t} + \frac{t-s}{t\sqrt{2t^2 q + 1}}}. \]

Proof. (i) According to Theorem 1 in [3], the process v(·, 0) is a Lévy process with no positive jumps which is independent of a(0, s). Because the inverse Lagrangian function α(·, t − s) is measurable with respect to v(·, 0), it is also independent of a(0, s). Then Theorem 2 in [3] applies to the initial velocity v(·, 0), and we get that the process α(·, t − s) is a subordinator started from α(0, t − s). We thus see that the process $\tau_{s,t}(\cdot)$ is independent of a(0, s) and α(0, t − s), and has stationary and independent increments. Then it follows from Lemma 2 that increments of $\tau_{s,t}$ have the same distribution as those of x → a((t − s)x/s, t). According to Theorem 1 in [3] (and the scaling property), its Laplace transform is given by
\[ E\big(\exp\{-q(a(x, t) - a(0, t))\}\big) = \exp\Big\{-\frac{x}{t^2}\Big(\sqrt{2t^2 q + 1} - 1\Big)\Big\}, \]
and the formula for the Laplace transform of $\tau_{s,t}(x)$ follows.

(ii) We can use the identity (12) and the independence stated in part (i) to determine the Laplace transform of tα(0, t − s)/(t − s) in terms of the other variables. Specifically, we get
\[ E\big(\exp\{-q t\, \alpha(0,\ t-s)/(t-s)\}\big) = \frac{E\big(\exp\{-q\, a(0, t)\}\big)}{E\big(\exp\{-q\, a(0, s) - q\, \tau_{s,t}(a(0, s))\}\big)}. \]
On the one hand, the Laplace exponent of the subordinator $\tau_{s,t}$ has been given in part (i). On the other hand, we know from Theorem 1 in [3] (and the scaling property of Brownian motion) that for every r > 0, a(0, r) has a Gamma(1/4, 1/2r²) distribution, i.e. its Laplace transform is given by
\[ E\big(\exp\{-q\, a(0, r)\}\big) = \big(2r^2 q + 1\big)^{-1/4}. \]
Straightforward (but lengthy) calculations then yield the stated formula.

We mention that there is also an alternative proof which we now sketch. Because v(·, 0) is a Lévy process with no positive jumps, we can also determine the distribution of α(0, t − s) by adapting the argument of [3] on pp. 402–403, which deals with Brownian initial velocities. Specifically, one gets that if $\tilde\alpha(0, t-s)$ is an independent copy of α(0, t − s), then the sum $\alpha(0, t-s) + \tilde\alpha(0, t-s)$ has the same distribution as the last passage time
\[ \ell_{s,t} := \sup\{x \ge 0 : (t-s)\, v(x, 0) + x = 0\}. \]
So all that is needed is to calculate the Laplace transform of $\ell_{s,t}$, which is an easy task since the distribution of the Lévy process v(·, 0) is known explicitly. One finds
\[ E\big(\exp\{-q\, \ell_{s,t}\}\big) = \frac{s}{t} + \frac{t-s}{t\sqrt{2qt(t-s) + 1}}, \]
so
\[ E\big(\exp\{-q\, \alpha(0,\ t-s)\}\big) = \sqrt{\frac{s}{t} + \frac{t-s}{t\sqrt{2qt(t-s) + 1}}}, \]
and we recover the formula (ii).
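The "straightforward (but lengthy) calculations" behind Lemma 3 (ii) can be sanity-checked numerically. The sketch below is an illustration with hypothetical function names; it assumes that the right-hand side of (ii) carries a square root over the whole expression, which is what the Gamma(1/4, 1/2r²) law of a(0, r) and the independent-copy argument both force. It compares the ratio of Laplace transforms from (12) with the closed form, and checks that the last-passage transform is the square of (ii) after the change of variable q → qt/(t − s).

```python
import math


def lt_a0(q, r):
    """Laplace transform of a(0, r), which has a Gamma(1/4, 1/(2 r^2)) law."""
    return (2 * r * r * q + 1) ** -0.25


def exponent_tau(q, s, t):
    """Laplace exponent (per unit of x) of the subordinator tau_{s,t}, Lemma 3 (i)."""
    return (t - s) / (s * t * t) * (math.sqrt(2 * t * t * q + 1) - 1)


def lt_alpha_scaled(q, s, t):
    """Closed form of E exp(-q t alpha(0, t-s) / (t-s)), read with an outer square root."""
    return math.sqrt(s / t + (t - s) / (t * math.sqrt(2 * t * t * q + 1)))


def lt_alpha_from_ratio(q, s, t):
    """Same quantity obtained from identity (12) and the independence in part (i).

    By independence, E exp(-q a(0,s) - q tau(a(0,s))) = E exp(-(q + Phi(q)) a(0,s)),
    where Phi is the Laplace exponent of tau_{s,t}.
    """
    denominator = lt_a0(q + exponent_tau(q, s, t), s)
    return lt_a0(q, t) / denominator


def lt_last_passage(q, s, t):
    """Laplace transform of the last passage time from the alternative proof."""
    return s / t + (t - s) / (t * math.sqrt(2 * q * t * (t - s) + 1))


if __name__ == "__main__":
    for q, s, t in [(1.0, 1.0, 2.0), (0.3, 0.5, 0.9), (5.0, 2.0, 7.0)]:
        print(lt_alpha_from_ratio(q, s, t), lt_alpha_scaled(q, s, t),
              math.sqrt(lt_last_passage(q * t / (t - s), s, t)))
```

The three printed values agree for each parameter triple, which is consistent with both derivations of (ii); the algebraic reason is that 2s²(q + Φ(q)) + 1 is the perfect square ((s√(2t²q + 1) + t − s)/t)².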
Acknowledgement. We should like to thank Ph. A. Martin as the question about the statistics of the flux that motivates this work was raised by him during a visit of J.B. to the Institut de Physique Théorique at the École Polytechnique Fédérale in Lausanne.
References
1. Abramowitz, M. and Stegun, I.A.: Handbook of Mathematical functions. Washington: Nat. Bur. Stand, 1964
2. Avellaneda, M. and E, W.: Statistical properties of shocks in Burgers turbulence. Commun. Math. Phys. 172, 13–38 (1995)
3. Bertoin, J.: The inviscid Burgers equation with Brownian initial velocity. Commun. Math. Phys. 193, 397–406 (1998)
4. Bertoin, J.: Clustering statistics for sticky particles with Brownian initial velocity. J. Math. Pures Appl. 79, 173–194 (2000)
5. Burgers, J.M.: The nonlinear diffusion equation. Dordrecht: Reidel, 1974
6. E, W., Rykov, Yu.G. and Sinai, Ya.G.: Generalized variational principles, global weak solutions and behavior with random initial data for systems of conservation laws arising in adhesion particle dynamics. Commun. Math. Phys. 177, 349–380 (1996)
7. Frachebourg, L., Jacquemet, V. and Martin, Ph.A.: Inhomogeneous ballistic aggregation. Preprint (2001)
8. Frachebourg, L. and Martin, Ph.A.: Exact statistical properties of the Burgers equation. J. Fluid Mech. 417, 323–349 (2000)
9. Groeneboom, P.: Brownian motion with a parabolic drift and Airy functions. Probab. Theory Relat. Fields 81, 79–109 (1989)
10. Kida, S.: Asymptotic properties of Burgers turbulence. J. Fluid Mech. 93, 337–377 (1979)
11. Lax, P.D.: Hyperbolic systems of conservation laws and the mathematical theory of shock waves. Philadelphia: Society for Industrial and Applied Mathematics, 1973
12. Martin, Ph.A. and Piasecki, J.: One dimensional ballistic aggregation: Rigorous long-time estimates. J. Stat. Phys. 76, 447–476 (1994)
13. Millar, P.W.: A path decomposition for Markov processes. Ann. Probab. 6, 345–348 (1978)
14. She, Z.S., Aurell, E. and Frisch, U.: The inviscid Burgers equation with initial data of Brownian type. Commun. Math. Phys. 148, 623–641 (1992)
15. Sinai, Ya.: Statistics of shocks in solution of inviscid Burgers equation. Commun. Math. Phys. 148, 601–621 (1992)
16. Tribe, R. and Zaboronski, O.: On the large time asymptotics of decaying Burgers turbulence. Commun. Math. Phys. 212, 415–436 (2000)
17. Woyczyński, W.A.: Göttingen Lectures on Burgers-KPZ turbulence. Lecture Notes in Math., Berlin–Heidelberg–New York: Springer, 1998

Communicated by H. Spohn
Commun. Math. Phys. 224, 565 – 591 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Norton's Trace Formulae for the Griess Algebra of a Vertex Operator Algebra with Larger Symmetry

Atsushi Matsuo

Department of Pure Mathematics and Mathematical Statistics, Centre for Mathematical Sciences, University of Cambridge, Wilberforce Road, Cambridge CB3 0WB, UK

Received: 24 July 2000 / Accepted: 15 June 2001
Abstract: Formulae expressing the trace of the composition of several (up to five) adjoint actions of elements of the Griess algebra of a vertex operator algebra are derived under certain assumptions on the action of the automorphism group. They coincide, when applied to the moonshine module $V^{\natural}$, with the trace formulae obtained in a different way by S. Norton, and the spectrum of some idempotents related to 2A, 2B, 3A and 4A elements of the Monster is determined by the representation theory of the Virasoro algebra at c = 1/2, the W3 algebra at c = 4/5 or the W4 algebra at c = 1. The generalization to the trace function on the whole space is also given for the composition of two adjoint actions, which can be used to compute the McKay-Thompson series for a 2A involution of the Monster.

Introduction

Since Griess's construction of the Monster simple group [Gr1] as the automorphism group of a commutative nonassociative algebra of dimension 196883 + 1, many attempts have been made to understand the nature of this algebra. Conway [Co] reconstructed a slightly modified version of the algebra, called the Griess-Conway algebra, and gave a description of a 2A involution (a transposition) in terms of the eigenspace decomposition with respect to the adjoint action of an idempotent of a particular type called the transposition axis. Some more formulae related to decompositions of this kind were obtained by Norton [No]. In particular, he wrote down a trace formula for the composition of several (up to five) adjoint actions of elements of the algebra:
\[ \mathrm{Tr}\, \mathrm{ad}\, a, \quad \mathrm{Tr}\, \mathrm{ad}\, a\, \mathrm{ad}\, b, \quad \ldots. \]

On leave of absence from the Graduate School of Mathematical Sciences, University of Tokyo, Komaba 3-8-1, Tokyo 153-8914, Japan. E-mail: [email protected]
Supported by the Overseas Research Scholarship of the Ministry of Education, Science, Sports and Culture, Japan.
These results are based on the explicit construction of the algebra as well as the character table of the Monster and some of its subgroups. On the other hand, Frenkel et al. [FLM1] constructed a graded vector space $V^{\natural}$ called the moonshine module, and showed that the Griess-Conway algebra is naturally realized as the subspace of degree (conformal weight) 2 of $V^{\natural}$. The multiplication and the inner product of the algebra are actually a part of an infinite series of bilinear operations on the whole space $V^{\natural}$ endowing this space with the structure of a vertex operator algebra (VOA) [Bo1, FLM2]. The moonshine module is probably the most natural object to be considered in the study of the Monster in its relation to the moonshine phenomena; although this was more or less clear in the original construction of $V^{\natural}$ as a graded vector space in [FLM1], it is conclusively supported by the existence of the VOA structure on $V^{\natural}$ observed by Borcherds in [Bo1], as it is essential in his solution [Bo2] to the Conway–Norton conjecture [CN]. The algebra formed by the degree 2 subspace of a VOA is generally called the Griess algebra of the VOA.

Recently an attempt to understand the Griess–Conway algebra from the VOA point of view was made by Miyamoto [Mi1], who opened a way to study the action of Monster elements on the moonshine module by using a subVOA whose fusion rules have a nice symmetry. In particular, a 2A involution of the Monster is described as an automorphism obtained by the eigenspace decomposition with respect to the action of the Virasoro $L_0$ corresponding to the subVOA isomorphic to $L(\frac{1}{2}, 0)$, which reproduces, on the level of the Griess algebra, the action described by Conway mentioned above. Further, the structure of $V^{\natural}$ as a module over the tensor product of 48 copies of $L(\frac{1}{2}, 0)$, called a frame, was studied in [DGH], and the VOA structure of the moonshine module was reconstructed from a frame in [Mi3].
In particular, the character and the McKay–Thompson series for 2A, 2B and 3C elements of the Monster can be computed by using a frame.

The primary purpose of this paper is to derive Norton's trace formula concerning the adjoint action of elements of the Griess algebra mentioned above from the VOA structure on the moonshine module $V^{\natural}$ without using any explicit structure of the Griess algebra or the Monster; the only particular property necessary for our derivation is the fact that the components of $V^{\natural}$ fixed by the full automorphism group coincide with the Virasoro submodule generated by the vacuum vector up to degree 11. This property says that, while the automorphism group is finite as it is the Monster, the symmetry of the VOA is large enough to separate the trivial components into the small subspace of which the action on the VOA is determined by the action of the Virasoro algebra (up to that degree).

It should be emphasized that, in our derivation of the formulae, we do not need to know that the automorphism group is the Monster, so that our method is generally applicable to any VOA with the same property; we call it a VOA of class $S^n$ if its trivial components with respect to the full automorphism group coincide with the Virasoro submodule generated by the vacuum vector up to degree n. This is what we mean by a VOA with larger symmetry.

We will actually show in this paper (Theorem 2.1) that the trace of the composition of m adjoint actions of elements of the Griess algebra, m = 1, . . . , 5, is expressed in the same way as Norton's formula, with coefficients being replaced by certain rational functions of the rank (central charge) c and the dimension d of the Griess algebra, if the VOA is of class $S^{2m}$ under some technical assumptions.

In spite of the fact that the trace must be invariant under the cyclic permutation of the order of the operators involved in the trace, the explicit expression for m = 4 and 5
does not satisfy this property in general; it means that there is a restriction on the pair (c, d) coming from our assumptions. Also, in a slightly different way, we see that if the Griess algebra contains an idempotent with central charge different from 0 and c, then similar restrictions are imposed on the pair (c, d). In this way, we may list the possible pairs (c, d) of a VOA satisfying our assumptions (Sect. 3). In particular, if the VOA is of class $S^8$ and it has an idempotent as above, then the rank must be 24 and the dimension must be 196884, i.e., those of the moonshine module $V^{\natural}$ (Theorem 3.1).

Now, as we have established Norton's trace formulae in a different way, we may use them to study the action of some elements of the Monster. Indeed, the trace formulae have sufficient information to determine the spectrum of some idempotents related to 2A, 2B, 3A and 4A elements of the Monster by the representation theory of $L(\frac{1}{2}, 0)$, the W3 algebra at c = 4/5 or the W4 algebra at c = 1 (Subsect. 4.2), reproducing some of the results of Conway [Co] and Norton [No] in the opposite way.

We note that the trace formulae would be generalized to the traces on the higher degree subspaces. In fact, we will show (Theorem 5.1) that, under suitable assumptions, the trace functions
\[ \mathrm{Tr}\, a_{(1)}\, q^{L_0} \quad \text{and} \quad \mathrm{Tr}\, a_{(1)} b_{(1)}\, q^{L_0}, \]
where a, b are elements of the Griess algebra, are expressed in terms of the character ch V and its derivatives with coefficients written by (a|ω), (b|ω) and (a|b) as well as the Eisenstein series $E_2(q)$ and $E_4(q)$ by using some identities established by Zhu [Zh]. As a corollary, we show that the McKay-Thompson series $T_{2A}(q)$ for a 2A involution of the Monster can be computed from ch $V^{\natural}$ = q(J(q) − 744) and the characters of $L(\frac{1}{2}, h)$, h = 0, 1/2, 1/16, without using a frame.

We finally note that our consideration based on the nonexistence of a Monster invariant primary vector of degree less than 12 in the moonshine module can be understood to be complementary to the result of Dong and Mason [DM] that the one-point function for a Monster invariant primary vector of degree 12 gives rise to the cusp form Δ(q) of weight 12. Indeed, in the case of the moonshine module, we can give yet another derivation of Norton's trace formulae by using the modular invariance of the one-point functions and the nonexistence of cusp forms of weight less than 12 (Remark 4.1).

Most of the results of this paper were obtained by using a computer. The author used Mathematica Ver. 3.0 for Linux.
1. Preliminaries In this section, we recall or give some definitions and facts necessary in this paper.
1.1. The Griess algebra of a vertex operator algebra. Let V be a vertex operator algebra (VOA) over the field C of complex numbers. Recall that it is a vector space equipped with a linear map
\[ Y(a, z) = \sum_{n \in \mathbb{Z}} a_{(n)} z^{-n-1} \in (\mathrm{End}\, V)[[z, z^{-1}]] \]
and nonzero vectors 1 and ω satisfying a number of conditions [Bo1, FLM2]. We recall some of the properties (cf. [MaN]). The operators $a_{(n)}$ (a ∈ V, n ∈ Z) are subject to
\[ \sum_{i=0}^{\infty} \binom{p}{i} \big(a_{(r+i)} b\big)_{(p+q-i)} = \sum_{i=0}^{\infty} (-1)^i \binom{r}{i} \Big(a_{(p+r-i)} b_{(q+i)} - (-1)^r\, b_{(q+r-i)} a_{(p+i)}\Big), \tag{1.1} \]
where a, b ∈ V and p, q, r ∈ Z. The vacuum vector 1 satisfies
\[ \mathbf{1}_{(n)} a = \begin{cases} 0, & (n \ne -1), \\ a, & (n = -1), \end{cases} \qquad \text{and} \qquad a_{(n)} \mathbf{1} = \begin{cases} 0, & (n \ge 0), \\ a, & (n = -1). \end{cases} \tag{1.2} \]
The conformal vector ω generates a representation of the Virasoro algebra:
\[ [L_m, L_n] = (m - n) L_{m+n} + \frac{m^3 - m}{12}\, \delta_{m+n,0}\, c, \]
where $L_m = \omega_{(m+1)}$. The central charge c is called the rank of the VOA. The operator $L_{-1}$ satisfies
\[ (L_{-1} a)_{(n)} = (\omega_{(0)} a)_{(n)} = -n\, a_{(n-1)}. \tag{1.3} \]
The operator $L_0$ is supposed to be semisimple, giving rise to a grading $V = \bigoplus_{n=0}^{\infty} V^n$ of the VOA V, where $V^n$ denotes the eigenspace with eigenvalue n, which is supposed to be finite-dimensional by definition. The eigenvalue is called the degree or the conformal weight. It follows that $V^i_{(n)} V^j \subset V^{i+j-n-1}$ and $\mathbf{1} \in V^0$. We denote the sum of the subspaces with degree up to n as
\[ V^{\le n} = \bigoplus_{m=0}^{n} V^m. \tag{1.4} \]
Throughout the paper, we assume that the grading of the VOA V is of the form
\[ V = \bigoplus_{n=0}^{\infty} V^n, \qquad \text{where } V^0 = \mathbb{C}\mathbf{1} \text{ and } V^1 = 0. \tag{1.5} \]
Consider the first nontrivial subspace $B = V^2$. Set
\[ ab = a_{(1)} b, \qquad (a|b)\mathbf{1} = a_{(3)} b, \qquad \text{for } a, b \in B. \tag{1.6} \]
Then the multiplication gives a commutative nonassociative algebra structure on B and ( | ) is a symmetric invariant bilinear form on it. The space B equipped with these structures is called the Griess algebra of V. We denote the adjoint action of an element a ∈ B as
\[ R_a : B \to B, \qquad x \mapsto xa\ (= ax). \tag{1.7} \]
By a slight abuse of terminology, we call a vector τ ∈ B satisfying τ(1) τ = 2τ an idempotent. A vector τ ∈ B generates a representation of the Virasoro algebra on V if and only if it is an idempotent of B, for which the central charge is given by 2(τ |τ ). The adjoint action of the conformal vector ω of the VOA V is twice an identity element of
the algebra, i.e., ωa = 2a for any a ∈ B. The squared norm (ω|ω) = c/2 is half the rank of the VOA V.

Recall that the VOA V carries a unique invariant bilinear form ( | ) up to normalization ([Li]). We normalize the form by (1|1) = 1. It is actually an extension of the form ( | ) on B to the whole space V, so we have denoted it by the same symbol. We note that $(V^i|V^j) = 0$ if i ≠ j. The relation
\[ (a_{(n)} u | v) = (-1)^m\, (u | a_{(2m-2-n)} v), \qquad (u, v \in V), \tag{1.8} \]
for a vector $a \in V^m$ such that $L_1 a = 0$ is a particular case of the invariance of the form. Therefore, $(a_{(n)} u | v) = (u | a_{(2-n)} v)$ holds for any a ∈ B thanks to the assumption (1.5).

1.2. Virasoro submodule generated by the vacuum vector. Let V be a VOA and let $V_\omega$ be the Virasoro submodule generated by the vacuum vector 1 with respect to the action $L_n$ associated with the conformal vector ω. Then, since $L_n \mathbf{1} = 0$ for n ≥ −1, we have the following sequence of surjective homomorphisms of Virasoro modules:
\[ M(c, 0)/M(c, 1) \to V_\omega \to L(c, 0), \tag{1.9} \]
where M(c, h) (resp. L(c, h)) denotes the Verma module (resp. irreducible module) over the Virasoro algebra of central charge c with highest weight (lowest conformal weight) h. Let us denote the highest weight vector of M(c, 0)/M(c, 1) mapped to $\mathbf{1} \in V_\omega \subset V$ by the same symbol 1. Now, let $P_n$ denote the set of all partitions of n by integers greater than 1:
\[ P_n = \big\{ \mathbf{m} = m_1 m_2 \cdots m_k \;\big|\; k, m_1, \ldots, m_k \in \mathbb{N},\ m_1 \ge m_2 \ge \cdots \ge m_k \ge 2 \big\}. \tag{1.10} \]
For each partition $\mathbf{m} \in P_n$, we set
\[ [\mathbf{m}] = [m_1, m_2, \ldots, m_k] = L_{-m_1} L_{-m_2} \cdots L_{-m_k} \mathbf{1} \in M(c, 0)/M(c, 1). \tag{1.11} \]
Then the set $\{[\mathbf{m}] \mid \mathbf{m} \in P_n\}$ forms a basis of the subspace with conformal weight n of the module M(c, 0)/M(c, 1). We denote
\[ L_{\mathbf{m}} = L_{m_k} \cdots L_{m_2} L_{m_1} \tag{1.12} \]
for a partition $\mathbf{m} = m_1 m_2 \cdots m_k$. Recall that a singular vector (or a primary vector) of a Virasoro module is a nonzero vector v of the module such that
\[ L_m v = 0 \qquad \text{for all } m \ge 1. \tag{1.13} \]
In this paper, by convention, a singular vector means a nonzero vector in a highest weight module over the Virasoro algebra satisfying (1.13) which is not a multiple of the highest weight vector (vacuum vector), and a primary vector means any nonzero vector in a VOA satisfying (1.13). The module M(c, 0)/M(c, 1) contains a singular vector of conformal weight up to n if and only if the central charge c is a zero of a certain polynomial $D_n(c)$, which
can be computed by the Kac-determinant formula. We normalize the polynomials for n = 2, . . . , 10 as follows:
\begin{align*}
D_2(c) &= c, \\
D_4(c) &= c(5c + 22), \\
D_6(c) &= c(2c - 1)(5c + 22)(7c + 68), \\
D_8(c) &= c(2c - 1)(3c + 46)(5c + 3)(5c + 22)(7c + 68), \\
D_{10}(c) &= 10c(2c - 1)(3c + 46)(5c + 3)(5c + 22)(7c + 68)(11c + 232). \tag{1.14}
\end{align*}
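As a small illustration of (1.10) and (1.14), the sketch below enumerates the partitions in $P_n$ and lists the exact rational zeros of the polynomials $D_n(c)$. It is a minimal sketch: the names `FACTORS`, `LEADING`, `D`, `zeros` and `partitions_min2` are hypothetical, and the factor tables are transcribed from (1.14) as printed, not recomputed from the Kac determinant.

```python
from fractions import Fraction

# Each pair (p, r) stands for the linear factor p*c + r, transcribed from (1.14).
FACTORS = {
    2: [(1, 0)],
    4: [(1, 0), (5, 22)],
    6: [(1, 0), (2, -1), (5, 22), (7, 68)],
    8: [(1, 0), (2, -1), (3, 46), (5, 3), (5, 22), (7, 68)],
    10: [(1, 0), (2, -1), (3, 46), (5, 3), (5, 22), (7, 68), (11, 232)],
}
# Overall numerical factors as printed in (1.14).
LEADING = {2: 1, 4: 1, 6: 1, 8: 1, 10: 10}


def D(n, c):
    """Evaluate D_n(c) exactly for a rational (or integer) central charge c."""
    value = Fraction(LEADING[n])
    for p, r in FACTORS[n]:
        value *= p * Fraction(c) + r
    return value


def zeros(n):
    """Sorted list of the zeros of D_n, i.e. the central charges excluded for class S^n."""
    return sorted({Fraction(-r, p) for p, r in FACTORS[n]})


def partitions_min2(n, largest=None):
    """Yield the partitions of n into parts >= 2 as non-increasing tuples (the set P_n)."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 1, -1):
        for rest in partitions_min2(n - first, first):
            yield (first,) + rest


if __name__ == "__main__":
    print([str(z) for z in zeros(10)])
    print(D(8, 24) != 0)  # c = 24 is not a zero of D_8
    print({n: len(list(partitions_min2(n))) for n in (2, 4, 6, 8, 10)})
```

The number of tuples produced by `partitions_min2(n)` is the dimension of the weight-n subspace of M(c, 0)/M(c, 1), since the vectors [m] with m in $P_n$ form a basis of it.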
If Dn (c) = 0, then, up to the degree n, the maps (1.9) are isomorphisms and we may identify Vω with M(c, 0)/M(c, 1). 1.3. VOA of class S n . An automorphism of a VOA V is a linear isomorphism g : V → V satisfying g(a(n) b) = (ga)(n) (gb) for all a, b ∈ V and all n ∈ Z that fixes the conformal vector ω. Let Aut V denote the group of all automorphisms of V . Any automorphism sends the vacuum vector 1 to itself and preserves the grading. Since an automorphism g ∈ Aut V fixes any vector in the subspace Vω , the group Aut V acts on the quotient space V /Vω and on its graded pieces V n /Vωn . Definition 1.1. A VOA V is said to be of class S n if the action of Aut V on V ≤n /Vω≤n is fixed-point free. In other words, a VOA V is of class S n if V ≤n has no extra fixed-vector other than those belonging to Vω≤n . The VOA L(c, 0) associated with the irreducible highest weight representation of the Virasoro algebra at central charge c (cf. [FrZh]) is obviously of class S n . However, since its Griess algebra B is one-dimensional, this example is not of interest to us, although we will use L( 21 , 0), etc. later in a different context. The main example of concern to us isis the moonshine module V constructed by Frenkel et al. in [FLM1, FLM2]. It follows from [CN] and [Bo2] (cf. [HL] and [DM]) that the VOA V is of class S 11 . We will later see that the moonshine module V , for which c = 24 and dim B = 196884, has exceptionally large symmetry in our sense. In the rest of this paper, we always assume that Dn (c) = 0 whenever V is supposed to be of class S n , for the cases when Dn (c) = 0 are not interesting from our point of view. The VOA’s of concern to us are at most of class S 10 so that the excluded values of c are only 0, 1/2, −46/3, −3/5, −22/5, −68/7 and −232/11. 2. Trace Formulae for the Griess Algebra of a VOA Let B be the Griess algebra of a VOA V of rank c, and let d denote the dimension of B. 
In the first subsection, we give the formulae that describe the traces $\mathrm{Tr}\,R_{a_1}R_{a_2}\cdots R_{a_m}$ up to $m = 5$ under appropriate assumptions. A sketch of the derivation of the formulae is given in the subsequent subsections.
2.1. The formulae. To describe the formulae, we set

$(a_1|a_2|a_3) = (a_1|a_2a_3)$, (2.1)

which is a totally symmetric trilinear form on $B$, and define a totally antisymmetric quinary form on $B$ by setting

$(a_1,a_2,a_3,a_4,a_5)\,\mathbf{1} = \frac{1}{5!}\sum_{\sigma\in S_5}(-1)^{\ell(\sigma)}\,\sigma\bigl(a_{1(3)}a_{2(2)}a_{3(1)}a_{4(0)}a_5\bigr)$. (2.2)
Here we let $\sigma \in S_5$ act by permuting the indices of $a_i$, $i = 1,\dots,5$. Let Cyc denote the operation of summing over the cyclic permutations of the indices, and Sym that of summing over all permutations whose results remain distinct after applying the symmetries $a_ia_j = a_ja_i$, $(a_i|a_j) = (a_j|a_i)$ and $(a_ia_j|a_k) = (a_i|a_ja_k)$ for any $i,j,k = 1,\dots,5$; for instance,

$\mathrm{Sym}\,(a_1|a_2)(a_3|\omega)(a_4|\omega) = (a_1|a_2)(a_3|\omega)(a_4|\omega) + (a_1|a_3)(a_2|\omega)(a_4|\omega) + (a_1|a_4)(a_2|\omega)(a_3|\omega) + (a_2|a_3)(a_1|\omega)(a_4|\omega) + (a_2|a_4)(a_1|\omega)(a_3|\omega) + (a_3|a_4)(a_1|\omega)(a_2|\omega)$.

The result is summarized in the following theorem.

Theorem 2.1. Let $B$ be the Griess algebra of a VOA $V$ such that the bilinear form $(\,|\,)$ on $B$ is nondegenerate or Aut $V$ is finite$^1$:

(1) If $V$ is of class $S^2$ then, for any $a \in B$,

$\mathrm{Tr}\,R_a = \frac{4d}{c}(a|\omega)$.

(2) If $V$ is of class $S^4$ then, for any $a_1, a_2 \in B$,

$\mathrm{Tr}\,R_{a_1}R_{a_2} = \frac{-2(5c^2 - 88d + 2cd)}{c(5c+22)}(a_1|a_2) + \frac{4(5c+22d)}{c(5c+22)}(a_1|\omega)(a_2|\omega)$.

(3) If $V$ is of class $S^6$ then, for any $a_1, a_2, a_3 \in B$,

$\mathrm{Tr}\,R_{a_1}R_{a_2}R_{a_3} = \frac{-3c^2(70c^2+769c-340) + 2d(4c^3-445c^2+12236c-5984)}{c(2c-1)(5c+22)(7c+68)}(a_1|a_2|a_3) + \frac{4c(70c^2+1017c-340) - 8d(32c^2-1419c+748)}{c(2c-1)(5c+22)(7c+68)}\,\mathrm{Cyc}\,(a_1|a_2)(a_3|\omega) + \frac{5952c(d-1)}{c(2c-1)(5c+22)(7c+68)}(a_1|\omega)(a_2|\omega)(a_3|\omega)$.

(4) If $V$ is of class $S^8$ then, for any $a_1, a_2, a_3, a_4 \in B$,

$\mathrm{Tr}\,R_{a_1}R_{a_2}R_{a_3}R_{a_4} = \frac{1}{D_8(c)}\Bigl[A_1(a_1a_2|a_3a_4) + A_2(a_1a_3|a_2a_4) + A_3(a_1a_4|a_3a_2) + B\,\mathrm{Sym}\,(a_1|a_2|a_3)(a_4|\omega) + C\,\mathrm{Sym}\,(a_1|a_2)(a_3|a_4) + D\,\mathrm{Sym}\,(a_1|a_2)(a_3|\omega)(a_4|\omega) + E\,(a_1|\omega)(a_2|\omega)(a_3|\omega)(a_4|\omega)\Bigr]$,

where $D_8(c)$ is given in (1.14) and the coefficients are listed in Appendix A.1.

(5) If $V$ is of class $S^{10}$ then, for any $a_1, a_2, a_3, a_4, a_5 \in B$,

$\mathrm{Tr}\,R_{a_1}R_{a_2}R_{a_3}R_{a_4}R_{a_5} = \frac{1}{D_{10}(c)}\Bigl[\sum A_{i_1,i_2,i_3,i_4,i_5}(a_{i_1}a_{i_2}|a_{i_3}|a_{i_4}a_{i_5}) + B_1\,\mathrm{Cyc}\,(a_1a_2|a_3a_4)(a_5|\omega) + B_2\,\mathrm{Cyc}\,(a_1a_3|a_2a_4)(a_5|\omega) + B_3\,\mathrm{Cyc}\,(a_1a_4|a_2a_3)(a_5|\omega) + C\,\mathrm{Sym}\,(a_1|a_2|a_3)(a_4|a_5) + D\,\mathrm{Sym}\,(a_1|a_2|a_3)(a_4|\omega)(a_5|\omega) + E\,\mathrm{Sym}\,(a_1|a_2)(a_3|a_4)(a_5|\omega) + F\,\mathrm{Sym}\,(a_1|a_2)(a_3|\omega)(a_4|\omega)(a_5|\omega) + G\,(a_1|\omega)(a_2|\omega)(a_3|\omega)(a_4|\omega)(a_5|\omega) + H\,(a_1,a_2,a_3,a_4,a_5)\Bigr]$,

where $D_{10}(c)$ is given in (1.14) and the coefficients are listed in Appendix A.2.

$^1$ We suppose this for simplicity although the condition can be slightly weakened.

Remark 2.1. Suppose that $V$ is of class $S^8$. By the cyclic property of the trace, $\mathrm{Tr}\,R_{a_1}R_{a_2}R_{a_3}R_{a_4} = \mathrm{Tr}\,R_{a_2}R_{a_3}R_{a_4}R_{a_1}$, we must have $A_1 = A_3$ in Theorem 2.1 (4) if there exist elements $a_1,\dots,a_4$ such that $(a_1a_2|a_3a_4) \neq (a_1a_4|a_2a_3)$. In this case, the dimension is determined from the rank as

$d = -\frac{1050c^6 + 22565c^5 + 33121c^4 - 1707790c^3 - 3390408c^2 + 308160c}{2(30c^5 - 3212c^4 + 107355c^3 - 1135590c^2 - 206024c + 825792)}$. (2.3)
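As a quick sanity check of (2.3), the following minimal sketch evaluates the right-hand side in exact rational arithmetic. The function name `d_from_rank` is hypothetical, introduced only for this illustration; the polynomial coefficients are copied from (2.3).

```python
from fractions import Fraction

def d_from_rank(c):
    """Right-hand side of (2.3): the dimension d forced by A_1 = A_3."""
    c = Fraction(c)
    num = (1050*c**6 + 22565*c**5 + 33121*c**4
           - 1707790*c**3 - 3390408*c**2 + 308160*c)
    den = 2*(30*c**5 - 3212*c**4 + 107355*c**3
             - 1135590*c**2 - 206024*c + 825792)
    return -num / den

# At the rank of the moonshine module the formula returns dim B = 196884.
print(d_from_rank(24))
```

At c = 24 both numerator and denominator are nonzero integers and their quotient is exactly 196884, in agreement with the values c = 24 and d = 196884 quoted for the moonshine module.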
We will discuss restrictions on the pair $(c,d)$ similar to (2.3) later in Sect. 3.

2.2. Derivation by Casimir elements. In this subsection, we sketch the derivation of the formulae in the case where the form $(\,|\,)$ on $B$ is nondegenerate. Let $\{x_1,\dots,x_d\}$ be a basis of $B$ and let $\{x^1,\dots,x^d\}$ be the dual basis with respect to the form $(\,|\,)$: $(x_i|x^j) = \delta_{i,j}$. We adopt the convention that any expression with a repeated index $i$ is summed over $i = 1,\dots,d$. Note that

$(x^i_{(3)}a)_{(-1)}x_i = x^i_{(-1)}a_{(3)}x_i = a$.
The strategy of the derivation of the formula is to write the trace as

$\mathrm{Tr}\,R_{a_1}\cdots R_{a_m} = (a_{1(1)}\cdots a_{m(1)}x_i|x^i) = (x^i_{(1)}a_{m-1(1)}\cdots a_{1(1)}x_i|a_m)$, (2.4)

and to rearrange the vectors by using the identity (1.1) and the invariance of the form $(\,|\,)$ on $V$ until the trace is written as a sum of expressions of the form $(x^i_{(k)}x_i|X)$, where $X$ is an element of $V$ written in terms of $a_1, a_2, \dots, a_m$ and $k$ is an integer. Let us first study the "Casimir" elements

$\kappa_n = x^i_{(3-n)}x_i = \sum_{i=1}^{d}x^i_{(3-n)}x_i$, (2.5)

which do not depend on the choice of the basis. Note that

$\kappa_0 = d\,\mathbf{1}$ and $\kappa_1 = 0$, (2.6)

and that the vector $\kappa_n$ for an odd $n$ is determined from those for even $n$ by the action of $L_{-1}$. By the identity (1.1), the sequence $\kappa_0,\dots,\kappa_n$ is subject to the relations

$L_m\kappa_k = (m+k-2)\kappa_{k-m} + \delta_{m,2}L_{-k+2}\mathbf{1} + \delta_{m,k-2}\frac{m^3-m}{6}L_{-2}\mathbf{1}$, $(k = 0,\dots,n)$. (2.7)
Now, in order to deduce some information on the Griess algebra from these elements, the following simple observation is fundamental.

Lemma 2.1. The vector $\kappa_n = x^i_{(3-n)}x_i$ is fixed by any automorphism of the VOA $V$.

Therefore, if the VOA $V$ has larger symmetry then the vector $\kappa_n$ has to belong to a smaller subspace of $V$; the smallest possible case is $V_\omega$ when the VOA is of class $S^n$.

Lemma 2.2. If the VOA $V$ is of class $S^n$ then the Casimir elements $\kappa_2,\dots,\kappa_n$ are contained in $V_\omega$.

Now, suppose that the vectors $\kappa_2,\dots,\kappa_n$ are indeed contained in $V_\omega$. If the Virasoro submodule $V_\omega$ does not contain a singular vector of degree up to $n$, then these vectors are uniquely determined by the properties (2.6) and (2.7). In this way, we have the following result.

Proposition 2.1. If the VOA $V$ is of class $S^n$ then $\kappa_n$ is uniquely written as

$\kappa_n = \frac{1}{D_n(c)}\sum_{m\in P_n}P_m(c,d)[m]$, (2.8)

where $P_m(c,d)$ are certain polynomials in $c$ and $d$.

The explicit expressions of $\kappa_n$ for $n = 2, 4, 6$ are given as follows:

$\kappa_2 = \frac{4d}{c}[2]$,

$\kappa_4 = \frac{6(d-1)}{5c+22}[4] + \frac{2(5c+22d)}{c(5c+22)}[2,2]$,

$\kappa_6 = \frac{8(d-1)(5c^2+35c-228)}{(2c-1)(5c+22)(7c+68)}[6] + \frac{2c(70c^2+769c+1644) + 4d(92c^2+427c-748)}{c(2c-1)(5c+22)(7c+68)}[4,2] + \frac{31(d-1)(5c+44)}{(2c-1)(5c+22)(7c+68)}[3,3] + \frac{992(d-1)}{(2c-1)(5c+22)(7c+68)}[2,2,2]$. (2.9)
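The relation (2.7) with $m = k = n$ reads $L_n\kappa_n = (2n-2)\kappa_0 = (2n-2)d\,\mathbf{1}$, which gives a simple consistency test for (2.9). The following is a minimal sketch of such a test, assuming only the Virasoro commutation relations $[L_k, L_{-m}] = (k+m)L_{k-m} + \delta_{k,m}\,c\,(k^3-k)/12$, the vacuum conditions $L_j\mathbf{1} = 0$ for $j \ge -1$, and the reading $[m_1,\dots,m_r] = L_{-m_1}\cdots L_{-m_r}\mathbf{1}$. The helper names `act`, `top_pairing` and `kappa` are hypothetical.

```python
from fractions import Fraction

def act(k, word, c):
    """Apply L_k to L_{-m1}...L_{-mr} 1, where word = (m1, ..., mr).

    Returns {word: coefficient}; the words produced need not be ordered.
    """
    if not word:
        # L_k 1 = 0 for k >= -1, otherwise L_k 1 is the one-letter word (-k,)
        return {} if k >= -1 else {(-k,): Fraction(1)}
    if k < 0:
        return {(-k,) + word: Fraction(1)}
    m, rest = word[0], word[1:]
    out = {}

    def add(w, x):
        if x:
            out[w] = out.get(w, 0) + x

    # commutator term: (k + m) L_{k-m} applied to the rest of the word
    for w, x in act(k - m, rest, c).items():
        add(w, (k + m) * x)
    # central term of [L_k, L_{-m}]
    if k == m:
        add(rest, Fraction(c) * (k**3 - k) / 12)
    # remaining term: L_{-m} L_k applied to the rest of the word
    for w, x in act(k, rest, c).items():
        add((m,) + w, x)
    return out

def top_pairing(vec, n, c):
    """Coefficient of the vacuum in L_n v for a vector v of degree n."""
    return sum(x * act(n, w, c).get((), 0) for w, x in vec.items())

def kappa(n, c, d):
    """Casimir elements kappa_2, kappa_4, kappa_6 as written in (2.9)."""
    c, d = Fraction(c), Fraction(d)
    if n == 2:
        return {(2,): 4*d/c}
    if n == 4:
        return {(4,): 6*(d - 1)/(5*c + 22),
                (2, 2): 2*(5*c + 22*d)/(c*(5*c + 22))}
    if n == 6:
        den = (2*c - 1)*(5*c + 22)*(7*c + 68)
        return {(6,): 8*(d - 1)*(5*c**2 + 35*c - 228)/den,
                (4, 2): (2*c*(70*c**2 + 769*c + 1644)
                         + 4*d*(92*c**2 + 427*c - 748))/(c*den),
                (3, 3): 31*(d - 1)*(5*c + 44)/den,
                (2, 2, 2): 992*(d - 1)/den}
    raise ValueError("only n = 2, 4, 6 are listed in (2.9)")

for c, d in [(24, 196884), (8, 156), (Fraction(47, 2), 96256)]:
    for n in (2, 4, 6):
        assert top_pairing(kappa(n, c, d), n, c) == (2*n - 2) * d
```

The check uses only the top component of (2.7), so it does not by itself prove (2.9); it merely confirms that the listed coefficients are compatible with $L_n\kappa_n = (2n-2)d\,\mathbf{1}$ for the sample pairs $(c,d)$.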
The expressions for higher $n$ are lengthy, so we do not include them in this paper. In the case $c = 24$ and $d = 196884$ they read as follows:

$\kappa_2 = 32814[2]$,

$\kappa_4 = 8319[4] + 2542[2,2]$,

$\kappa_6 = 3492[6] + 1302[4,2] + \frac{1271}{2}[3,3] + 124[2,2,2]$,

$\kappa_8 = \frac{3863}{2}[8] + 552[6,2] + 434[5,3] + \frac{333}{2}[4,4] + 96[4,2,2] + 93[3,3,2] + \frac{13}{3}[2,2,2,2]$,

$\kappa_{10} = 1182[10] + \frac{613}{2}[8,2] + 207[7,3] + 141[6,4] + 41[6,2,2] + 74[5,5] + 64[5,3,2] + \frac{99}{4}[4,4,2] + 24[4,3,3] + \frac{9}{2}[4,2,2,2] + \frac{13}{2}[3,3,2,2] + \frac{7}{60}[2,2,2,2,2]$.

Now, suppose that $V$ is of class $S^2$ and let $a$ be any element of the Griess algebra. Then we immediately obtain Theorem 2.1 (1): $\mathrm{Tr}\,R_a = (a_{(1)}x^i|x_i) = (x^i_{(1)}x_i|a) = 4d(a|\omega)/c$. In particular, we have

$\mathrm{Tr}\,R_{ab} = \frac{8d}{c}(a|b)$. (2.10)

Next, take any two elements $a, b$ of the Griess algebra. Then

$\mathrm{Tr}\,R_aR_b = (a_{(1)}b_{(1)}x^i|x_i) = (x^i_{(1)}a_{(1)}x_i|b)$.

By the identity (1.1) for $p = -1$, $q = 1$, $r = 2$, we have

$-(x^i_{(3)}a)_{(-1)}x_i = x^i_{(1)}a_{(1)}x_i + x^i_{(-1)}a_{(3)}x_i - a_{(3)}x^i_{(-1)}x_i + 2a_{(2)}x^i_{(0)}x_i - a_{(1)}x^i_{(1)}x_i$.

Since $a_{(2)}x^i_{(0)}x_i = a_{(1)}x^i_{(1)}x_i$,

$\mathrm{Tr}\,R_aR_b = -2(a|b) + (a_{(3)}x^i_{(-1)}x_i|b) - (a_{(1)}x^i_{(1)}x_i|b) = -2(a|b) + (x^i_{(-1)}x_i|a_{(-1)}b) - (x^i_{(1)}x_i|a_{(1)}b)$.

Suppose that $V$ is of class $S^4$ and substitute the expressions of $\kappa_2 = x^i_{(1)}x_i$ and $\kappa_4 = x^i_{(-1)}x_i$ given by (2.9). Then using

$(L_{-4}\mathbf{1}|a_{(-1)}b) = (\mathbf{1}|L_4a_{(-1)}b) = 6(a|b)$,
$(L_{-2}L_{-2}\mathbf{1}|b_{(-1)}a) = (\mathbf{1}|L_2L_2(b_{(-1)}a)) = 2(a|\omega)(b|\omega) + 8(a|b)$, (2.11)

we have Theorem 2.1 (2):

$\mathrm{Tr}\,R_aR_b = -2(a|b) + \frac{6(d-1)}{5c+22}\,6(a|b) + \frac{2(5c+22d)}{c(5c+22)}\bigl(2(a|\omega)(b|\omega) + 8(a|b)\bigr) - \frac{4d}{c}(ab|\omega)$
$= \frac{-2(5c^2-88d+2cd)}{c(5c+22)}(a|b) + \frac{4(5c+22d)}{c(5c+22)}(a|\omega)(b|\omega)$.
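The last simplification can be checked mechanically. The sketch below is an illustration under stated assumptions, not part of the original derivation: it collects the coefficients of $(a|b)$ and $(a|\omega)(b|\omega)$ from the substitution above, using $(ab|\omega) = 2(a|b)$ as implied by (2.10), and compares them with the closed form of Theorem 2.1 (2). The function names are hypothetical.

```python
from fractions import Fraction

def substituted(c, d):
    """Coefficients of (a|b) and (a|w)(b|w) after inserting (2.9) and (2.11)."""
    c, d = Fraction(c), Fraction(d)
    k2 = 4*d/c                               # kappa_2 = k2 [2]
    k4_4 = 6*(d - 1)/(5*c + 22)              # coefficient of [4] in kappa_4
    k4_22 = 2*(5*c + 22*d)/(c*(5*c + 22))    # coefficient of [2,2] in kappa_4
    # -2(a|b) + k4_4 * 6(a|b) + k4_22 * (2(a|w)(b|w) + 8(a|b)) - k2 * 2(a|b)
    return -2 + 6*k4_4 + 8*k4_22 - 2*k2, 2*k4_22

def closed_form(c, d):
    """Coefficients stated in Theorem 2.1 (2)."""
    c, d = Fraction(c), Fraction(d)
    den = c*(5*c + 22)
    return -2*(5*c**2 - 88*d + 2*c*d)/den, 4*(5*c + 22*d)/den

for c, d in [(24, 196884), (8, 156), (Fraction(1, 3), 7)]:
    assert substituted(c, d) == closed_form(c, d)

print(closed_form(24, 196884))
```

For $c = 24$ and $d = 196884$ the two coefficients are 4620 and 5084.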
Similarly, if $V$ is of class $S^6$ (respectively $S^8$), one can express the trace involving three (respectively four) elements of the Griess algebra in terms of the inner product and the multiplication. However, if five elements are involved, then we encounter the expression $a_{1(3)}a_{2(2)}a_{3(1)}a_{4(0)}a_5$ and its permutations, which cannot in general be written as a combination of the inner product $(\,|\,)$ and the multiplication. Thus we are led to consider the totally antisymmetric quinary form defined by (2.2). Then the trace $\mathrm{Tr}\,R_{a_1}R_{a_2}R_{a_3}R_{a_4}R_{a_5}$ is written as a combination of these operations if $V$ is of class $S^{10}$. In this way, we obtain Theorem 2.1 (3)-(5).

2.3. Derivation by projection to $V_\omega$. In this subsection, we will sketch another derivation of the formulae under the assumption that Aut $V$ is finite. For any $v \in V$, let $\tilde v$ denote its average over the action of the automorphism group:

$\tilde v = \frac{1}{|\mathrm{Aut}\,V|}\sum_{g\in\mathrm{Aut}\,V}gv$. (2.12)
The following lemma is obvious.

Lemma 2.3. Let $v$ be an element of $V^n$. Then $\mathrm{Tr}|_{V^k}\,v_{(n-1)} = \mathrm{Tr}|_{V^k}\,(gv)_{(n-1)}$ for any $g \in$ Aut $V$ at any degree $k$. In particular, we have $\mathrm{Tr}\,v_{(n-1)} = \mathrm{Tr}\,\tilde v_{(n-1)}$ for any $v \in V^n$.

Now, for each $n$, consider the map

$\eta_n : V^n \to \mathbb{C}^{\dim V_\omega^n}$, $v \mapsto (L_mv)_{m\in P_n}$, (2.13)

where $P_n$ is the set (1.10) which parametrizes a basis of $V_\omega^n$.

Lemma 2.4. If $D_n(c) \neq 0$ then $V^n = V_\omega^n \oplus \mathrm{Ker}\,\eta_n$.

Proof. Since the map $\eta_n$ is an isomorphism on $V_\omega^n$ if $D_n(c) \neq 0$, the result follows.
For any $v \in V$, let $\delta(v)$ denote its projection to $V_\omega$ with respect to the decomposition as in the lemma.

Lemma 2.5. Suppose that $V$ is of class $S^n$. Then $\mathrm{Tr}|_{V^k}\,v_{(n-1)} = \mathrm{Tr}|_{V^k}\,\delta(v)_{(n-1)}$ for any $v \in V^n$ at any degree $k$.

Proof. Set $\delta = \delta(v)$ and $\pi = v - \delta(v)$. Then obviously $\tilde v = \tilde\delta + \tilde\pi = \delta + \tilde\pi$, and $\tilde\pi$ is fixed by any automorphism of $V$. Therefore, if $V$ is of class $S^n$ then $\tilde\pi = 0$ because $\tilde\pi \in V_\omega \cap \mathrm{Ker}\,\eta_n$. Hence, by Lemma 2.3, we have $\mathrm{Tr}|_{V^k}\,v_{(n-1)} = \mathrm{Tr}|_{V^k}\,\tilde v_{(n-1)} = \mathrm{Tr}|_{V^k}\,\delta_{(n-1)}$.

Since $\delta(v) \in V_\omega$, the action of $\delta(v)_{(n-1)}$ on $B$ can be explicitly computed by the commutation relations of the Virasoro algebra. For example, if $c \neq 0$, then we have $\delta(a) = 2(a|\omega)\omega/c$ for any $a \in B$. Namely, any element $a \in B$ can be written as

$a = \frac{2(a|\omega)}{c}\omega + \pi$, (2.14)

where $\pi$ is a primary vector in $B = V^2$. If $V$ is of class $S^2$ then

$\mathrm{Tr}\,R_a = \frac{2(a|\omega)}{c}\mathrm{Tr}\,R_\omega = \frac{4d}{c}(a|\omega)$.

Thus we have obtained Theorem 2.1 (1). Next, let $a, b$ be any elements of $B$. Then, by the identity (1.1) for $p = 2$, $q = 1$, $r = -1$, we have $a_{(1)}b_{(1)} = (a_{(-1)}b)_{(3)} + 2(a_{(0)}b)_{(2)} + (a_{(1)}b)_{(1)} - a_{(-1)}b_{(3)} - b_{(-1)}a_{(3)}$ on $B$. Therefore,

$\mathrm{Tr}\,R_aR_b = \mathrm{Tr}|_B\,(a_{(-1)}b)_{(3)} + 2\,\mathrm{Tr}|_B\,(a_{(0)}b)_{(2)} + \mathrm{Tr}|_B\,(a_{(1)}b)_{(1)} - 2(a|b)$.

If $V$ is of class $S^4$ then, since

$\delta(a_{(-1)}b) = \frac{6c(a|b) - 12(a|\omega)(b|\omega)}{c(5c+22)}[4] + \frac{44(a|b) + 20(a|\omega)(b|\omega)}{c(5c+22)}[2,2]$,
$\delta(a_{(0)}b) = \frac{2(a|b)}{c}[3]$, $\delta(a_{(1)}b) = \frac{4(a|b)}{c}[2]$ (2.15)

and $\mathrm{Tr}|_B\,[4]_{(3)} = 6d$, $\mathrm{Tr}|_B\,[2,2]_{(3)} = 8d + c$, $\mathrm{Tr}|_B\,[3]_{(2)} = -4d$, $\mathrm{Tr}|_B\,[2]_{(1)} = 2d$, we have Theorem 2.1 (2). The derivation of Theorem 2.1 (3)-(5) is similar.

3. Constraints on c and d

Let $B$ be the Griess algebra of a VOA $V$ such that the form $(\,|\,)$ on $B$ is nondegenerate. In this section, we will give some necessary conditions satisfied by the pair $(c,d)$ for a VOA with larger symmetry under some additional conditions on $V$.

3.1. Constraints from a proper idempotent. By a proper idempotent we mean an idempotent $\tau \in B$ such that the central charge $c_\tau = 2(\tau|\tau)$ differs from $0$ and from the rank $c$ of the VOA. If the algebra $B$ has a real form on which the bilinear form $(\,|\,)$ is positive-definite then, by Theorem 11 of [MeN] and Theorem 6.8 of [Mi1], the conformal vector $\omega$ is decomposable if $d \ge 2$; in particular, the algebra $B$ contains a proper idempotent.

Lemma 3.1. Let $\varphi(\tau)$ be a vector generated by an idempotent $\tau \in B$. Then $(\tau_{(2)}(a_{(m-1)}b)|\varphi(\tau)) = (3-m)(a_{(m)}b|\varphi(\tau))$ holds for any $a, b \in B$ and $m \in \mathbb{Z}$.

Proof. Note that the actions of $\tau_{(p)}$ and $(\omega-\tau)_{(q)}$ commute, and $(\omega-\tau)_{(q)}\mathbf{1} = 0$ if $q \ge 0$. Hence $\tau_{(q)}\varphi(\tau) = \omega_{(q)}\varphi(\tau)$ if $q \ge 0$. Since

$(\tau_{(2)}(a_{(m-1)}b)|\varphi(\tau)) = (a_{(m-1)}b|\tau_{(0)}\varphi(\tau)) = (a_{(m-1)}b|\omega_{(0)}\varphi(\tau)) = (\omega_{(2)}(a_{(m-1)}b)|\varphi(\tau)) = ((\omega_{(0)}a)_{(m+1)}b|\varphi(\tau)) + 2((\omega_{(1)}a)_{(m)}b|\varphi(\tau)) = (3-m)(a_{(m)}b|\varphi(\tau))$,

we have the result.
Now suppose that $V$ is of class $S^6$. By the lemma,

$(x^i_{(-1)}\tau_{(1)}x_i|\tau_{(0)}\tau_{(0)}\tau) = 3(x^i_{(0)}\tau_{(1)}x_i|\tau_{(0)}\tau)$. (3.1)

Computing both sides of this equality by the method of Subsect. 2.2, we have

$c_\tau(c_\tau - c)\bigl[(70c^2+955c+2388)c - 2(c^2-55c+748)d\bigr] = 0$,

where $c_\tau = 2(\tau|\tau)$ is the central charge of $\tau$. Therefore, if $\tau$ is proper then

$d = \frac{(70c^2+955c+2388)c}{2(c^2-55c+748)} = 35c + 2402 + \frac{c^2+214248c-3593392}{2(c^2-55c+748)}$. (3.2)

Further, if $V$ is of class $S^8$ then, by computing

$(x^i_{(-3)}\tau_{(1)}x_i|\tau_{(0)}\tau_{(0)}\tau_{(0)}\tau_{(0)}\tau) = 5(x^i_{(-2)}\tau_{(1)}x_i|\tau_{(0)}\tau_{(0)}\tau_{(0)}\tau)$, (3.3)

we have

$d = \frac{5250c^5 + 155250c^4 + 1369715c^3 + 3507098c^2 + 1497768c}{125c^4 - 4770c^3 - 23382c^2 + 1561868c + 1032240}$ (3.4)
if $\tau$ is proper. Recall that we have excluded by convention the cases when $D_n(c) = 0$ whenever $V$ is supposed to be of class $S^n$.

Theorem 3.1. Let $V$ be a VOA of class $S^8$ for which the form $(\,|\,)$ on $B$ is nondegenerate. If $B$ contains a proper idempotent then $c = 24$ and $d = 196884$.

Proof. By (3.2) and (3.4), $c = -46/3, -68/7, -22/5, -3/5, 0, 1/2, 24, 142/5$. However, $D_8(c) = 0$ for the first 6 cases and $d < 0$ for the last case.

By the remark at the beginning of this subsection, we have the following corollary.

Corollary 3.1. Suppose that the algebra $B$ has a real form on which the bilinear form $(\,|\,)$ is positive-definite. If the VOA $V$ is of class $S^8$ and if $d \ge 2$ then $c = 24$ and $d = 196884$.

Now, let us come back to VOA's of class $S^6$ but restrict our attention to the case when the rank $c$ is a positive half-integer. In this case, inspecting the relation (3.2), we see that the pair $(c,d)$ must be one from Table 3.1.

Table 3.1.

c        d          c        d          c        d          c        d
8        156        23 1/2   96256      32       139504     54 1/2   9919
16       2296       24       196884     34       57889      68       8146
20       10310      24 1/2   1107449    36       35856      93 1/2   7566
21 1/2   21414      30 1/2   1964871    40       20620      132      8154
22       28639      31 1/2   207144     44       14994      1496     54836
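The entries of Table 3.1 can be reproduced from (3.2) by a direct search. The following is a minimal sketch, assuming that "positive half-integer" means $c \in \frac12\mathbb{Z}_{>0}$ and that only integral $d \ge 2$ are of interest; the function names and the search bound are hypothetical choices made for this illustration.

```python
from fractions import Fraction

def d_from_c(c):
    """The dimension d determined by the rank c through (3.2)."""
    c = Fraction(c)
    return c*(70*c**2 + 955*c + 2388) / (2*(c**2 - 55*c + 748))

def half_integer_solutions(c_max):
    """Pairs (c, d) with c in (1/2)Z, 0 < c <= c_max, and d an integer >= 2."""
    found = []
    for twice_c in range(1, 2*c_max + 1):
        c = Fraction(twice_c, 2)
        d = d_from_c(c)   # the denominator has no rational root
        if d.denominator == 1 and d >= 2:
            found.append((c, int(d)))
    return found

# Check the second form of (3.2) at a sample rank.
c = Fraction(7, 3)
assert d_from_c(c) == 35*c + 2402 + (c**2 + 214248*c - 3593392)/(2*(c**2 - 55*c + 748))

for c, d in half_integer_solutions(1500):
    print(c, d)
```

The printed list can be compared with Table 3.1; the bound 1500 is chosen only so that the largest tabulated rank, $c = 1496$, is covered.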
3.2. Constraints from an idempotent of central charge 1/2. Suppose that $B$ contains an idempotent of central charge $1/2$ for which the eigenvalues of the adjoint action are $0$, $1/2$, $1/16$ and $2$, and the eigenspace with eigenvalue $2$ is one-dimensional. This is indeed the case if $\tau$ generates a subVOA isomorphic to $L(\frac12,0)$, for this is a rational VOA for which the irreducible modules are isomorphic to either $L(\frac12,0)$, $L(\frac12,\frac12)$ or $L(\frac12,\frac1{16})$ (cf. [BPZ, FrZh, Wa, DMZ]). In particular, this holds if $V$ has a real form on which the form $(\,|\,)$ is positive-definite and $\tau$ is a real idempotent of central charge $1/2$.

First consider the case when the eigenspace with eigenvalue $1/16$ is zero. In this case, we have the following equations satisfied by the dimension $d(\frac12)$ of the eigenspace with eigenvalue $1/2$:

$d(\tfrac12) + 4 = 2\,\mathrm{Tr}\,R_\tau$, $d(\tfrac12) + 16 = 4\,\mathrm{Tr}\,R_\tau^2$. (3.5)

By the compatibility, we obtain $2\,\mathrm{Tr}\,R_\tau^2 - \mathrm{Tr}\,R_\tau = 6$. If the VOA $V$ is of class $S^4$ then, by Theorem 2.1,

$(-22+2c)d = c(-37-10c)$. (3.6)

Further if $V$ is of class $S^6$ then we get

$(3c^2+164c-2992)d = c(-140c^2-1903c-4832)$. (3.7)
By solving these equations, we obtain the following result.

Proposition 3.1. Suppose that the VOA $V$ contains an idempotent of central charge $1/2$ for which the eigenvalues are $0$, $1/2$ and $2$ and the eigenspace with eigenvalue $2$ is one-dimensional. If $V$ is of class $S^6$ and $d \ge 2$ then $c = 8$ and $d = 156$.

Now suppose that the rank $c$ is a positive half-integer. If $V$ is of class $S^4$ then, by (3.6), the nonnegativity and the integrality of $d(0)$ and $d(\frac12)$ give us a list of possible pairs of such $c$ and $d$ (Table 3.2).

Table 3.2.

c        d = d(0) + d(1/2) + 1          c        d = d(0) + d(1/2) + 1
4        22 = 14 + 7 + 1                9 1/2    418 = 333 + 84 + 1
7 1/2    120 = 91 + 28 + 1              10       685 = 551 + 133 + 1
8        156 = 120 + 35 + 1             10 1/2   1491 = 1210 + 280 + 1
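Table 3.2 can be regenerated by the following minimal sketch. It assumes the relation $(-22+2c)d = c(-37-10c)$ for (3.6), which is the form satisfied by every row of the table, and it uses $\mathrm{Tr}\,R_\tau = 4d(\tau|\omega)/c = d/c$ for $(\tau|\omega) = (\tau|\tau) = 1/4$ together with the first equation of (3.5). The function names are hypothetical.

```python
from fractions import Fraction

def row(c):
    """Return (d, d(0), d(1/2)) for a given rank c, from (3.6) and (3.5)."""
    c = Fraction(c)
    d = c*(10*c + 37) / (2*(11 - c))   # (3.6) solved for d
    d_half = 2*d/c - 4                 # d(1/2) + 4 = 2 Tr R_tau, Tr R_tau = d/c
    d_zero = d - d_half - 1            # the eigenvalue 2 occurs once
    return d, d_zero, d_half

def table_3_2():
    """Positive half-integers c < 11 giving integers d >= 2, d(0), d(1/2) >= 0."""
    rows = []
    for twice_c in range(1, 22):
        c = Fraction(twice_c, 2)
        vals = row(c)
        if all(v.denominator == 1 and v >= 0 for v in vals) and vals[0] >= 2:
            rows.append((c,) + tuple(int(v) for v in vals))
    return rows

for c, d, d0, d_half in table_3_2():
    print(f"c = {c}: {d} = {d0} + {d_half} + 1")
```

The search stops below $c = 11$ because (3.6) gives a negative $d$ for larger positive $c$.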
Note 3.1. The fixed-point VOA $V^+_{\sqrt2E_8}$ of the VOA associated with the lattice $\sqrt2E_8$ by the $-1$ automorphism of the lattice would be an example with $c = 8$ and $d = 156$. According to [Gr2], the automorphism group of this VOA is isomorphic to $O^+_{10}(2)$. The decomposition $156 = 120 + 35 + 1$ coincides with Theorem 5.2 of [Gr2]. The Hamming code VOA $V_{H_8}$ considered by Miyamoto [Mi2], isomorphic to the VOA $V^+_{\sqrt2D_4}$, would be an example with $c = 4$ and $d = 22$. The automorphism group of this VOA is isomorphic to a group of shape $2^6 : (GL_3(2) \times S_3)$ [MaM].
Table 3.3.

c        d = d(0) + d(1/2) + d(1/16) + 1
16       2296 = 1116 + 155 + 1024 + 1
20       10310 = 4914 + 403 + 4992 + 1
23 1/2   96256 = 46851 + 2300 + 47104 + 1
24       196884 = 96256 + 4371 + 96256 + 1
24 1/2   1107449 = 543960 + 22816 + 540672 + 1
30 1/2   1964871 = 1029630 + 13640 + 921600 + 1
31 1/2   207144 = 109771 + 1116 + 96256 + 1
32       139504 = 74340 + 651 + 64512 + 1
36       35856 = 19951 + 0 + 15904 + 1
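The decompositions in Table 3.3 can be recomputed from Theorem 2.1 (1) and (2). The sketch below assumes an idempotent $\tau$ of central charge $1/2$, so that $(\tau|\tau) = (\tau|\omega) = 1/4$, with eigenvalues $0$, $1/2$, $1/16$ and a single eigenvalue $2$; it then solves the two linear equations $\frac12 d(\frac12) + \frac1{16}d(\frac1{16}) + 2 = \mathrm{Tr}\,R_\tau$ and $\frac14 d(\frac12) + \frac1{256}d(\frac1{16}) + 4 = \mathrm{Tr}\,R_\tau^2$. The function name `spectrum` is hypothetical.

```python
from fractions import Fraction

def spectrum(c, d):
    """Return (d(0), d(1/2), d(1/16)) for an idempotent of central charge 1/2."""
    c, d = Fraction(c), Fraction(d)
    t = Fraction(1, 4)                      # (tau|tau) = (tau|omega)
    tr1 = 4*d/c * t                         # Theorem 2.1 (1)
    tr2 = (-2*(5*c**2 - 88*d + 2*c*d)*t     # Theorem 2.1 (2)
           + 4*(5*c + 22*d)*t**2) / (c*(5*c + 22))
    A, B = tr1 - 2, tr2 - 4                 # remove the eigenvalue-2 contribution
    y = 256*(A/2 - B)/7                     # d(1/16)
    x = 2*A - y/8                           # d(1/2)
    return d - x - y - 1, x, y

print(spectrum(24, 196884))
```

For $c = 24$ and $d = 196884$ this returns the row $96256$, $4371$, $96256$ of the table.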
We next consider the case when the $1/16$ components are indeed present. In this case, if $V$ is of class $S^6$ then we have

$(2c^2 - 110c + 1496)d = (70c^2 + 955c + 2388)c$, (3.8)

which is actually the same as condition (3.2). If $V$ is of class $S^6$ with $d \ge 2$ and if $c$ is a positive half-integer then the rank $c$ and the dimension $d$ must be a pair from Table 3.3.

Note 3.2. The moonshine module $V^\natural$ is of course an example with $c = 24$ and $d = 196884$. The bosonic projection of the Babymonster VOSA $VB^\natural$ constructed by Höhn [Hö] would be an example with $c = 23\frac12$ and $d = 96256$. The fixed-point VOA $V^+_{\Lambda_{16}}$ of the VOA associated with the Barnes-Wall lattice $\Lambda_{16}$ by the $-1$ automorphism of the lattice would be an example with $c = 16$ and $d = 2296$.

4. Application to the Moonshine Module

Now, let $V^\natural$ be the moonshine module and let $B^\natural$ be the Griess-Conway algebra. In this section, we will compute the spectrum of the eigenspace decomposition of $B^\natural$ with respect to idempotents related to some Monster elements, starting from the trace formulae and using the representation theory of various subVOA's inside $V^\natural$.

4.1. Norton's formulae. Since the moonshine module $V^\natural$ is of class $S^{11}$, just substituting $c = 24$ and $d = 196884$ in Theorem 2.1, we recover the original trace formulae of Norton [No]:

Corollary 4.1. For any elements $a_1, a_2, a_3, a_4, a_5$ of the Griess-Conway algebra $B^\natural$,

$\mathrm{Tr}\,R_{a_1} = 32814(a_1|\omega)$,

$\mathrm{Tr}\,R_{a_1}R_{a_2} = 4620(a_1|a_2) + 5084(a_1|\omega)(a_2|\omega)$,

$\mathrm{Tr}\,R_{a_1}R_{a_2}R_{a_3} = 900(a_1|a_2|a_3) + 620\,\mathrm{Cyc}\,(a_1|a_2)(a_3|\omega) + 744(a_1|\omega)(a_2|\omega)(a_3|\omega)$,

$\mathrm{Tr}\,R_{a_1}R_{a_2}R_{a_3}R_{a_4} = 166(a_1a_2|a_3a_4) - 116(a_1a_3|a_2a_4) + 166(a_1a_4|a_2a_3) + 114\,\mathrm{Sym}\,(a_1|a_2|a_3)(a_4|\omega) + 52\,\mathrm{Sym}\,(a_1|a_2)(a_3|a_4) + 80\,\mathrm{Sym}\,(a_1|a_2)(a_3|\omega)(a_4|\omega) + 104(a_1|\omega)(a_2|\omega)(a_3|\omega)(a_4|\omega)$,

$\mathrm{Tr}\,R_{a_1}R_{a_2}R_{a_3}R_{a_4}R_{a_5} = 30\,\mathrm{Cyc}\,(a_1a_2|a_3|a_4a_5) + 4\,\mathrm{Cyc}\,(a_1a_4|a_3|a_2a_5) - 22\,\mathrm{Cyc}\,(a_1a_5|a_3|a_2a_4) + 20\,\mathrm{Cyc}\,(a_1a_2|a_3a_4)(a_5|\omega) - 14\,\mathrm{Cyc}\,(a_1a_3|a_2a_4)(a_5|\omega) + 20\,\mathrm{Cyc}\,(a_1a_4|a_2a_3)(a_5|\omega) + 8\,\mathrm{Sym}\,(a_1|a_2|a_3)(a_4|a_5) + 14\,\mathrm{Sym}\,(a_1|a_2|a_3)(a_4|\omega)(a_5|\omega) + 6\,\mathrm{Sym}\,(a_1|a_2)(a_3|a_4)(a_5|\omega) + 10\,\mathrm{Sym}\,(a_1|a_2)(a_3|\omega)(a_4|\omega)(a_5|\omega) + 14(a_1|\omega)(a_2|\omega)(a_3|\omega)(a_4|\omega)(a_5|\omega) + 52(a_1,a_2,a_3,a_4,a_5)$.

Note 4.1. To compare this result with Table 2 of [No], substitute $1 = \omega/2$ (the identity element of the algebra) and suppose for the last one that $a_1, a_2, a_3, a_4, a_5$ are perpendicular to $\omega$. The formula for $\mathrm{Tr}\,R_{a_1}R_{a_2}$ was given on p. 528 of [Co].

In particular, letting $a_1 = \cdots = a_5$ be an idempotent $\tau$, we have

$\mathrm{Tr}\,R_\tau = 32814(\tau|\tau)$,
$\mathrm{Tr}\,R_\tau^2 = 4620(\tau|\tau) + 5084(\tau|\tau)^2$,
$\mathrm{Tr}\,R_\tau^3 = 1800(\tau|\tau) + 1860(\tau|\tau)^2 + 744(\tau|\tau)^3$,
$\mathrm{Tr}\,R_\tau^4 = 864(\tau|\tau) + 1068(\tau|\tau)^2 + 480(\tau|\tau)^3 + 104(\tau|\tau)^4$,
$\mathrm{Tr}\,R_\tau^5 = 480(\tau|\tau) + 680(\tau|\tau)^2 + 370(\tau|\tau)^3 + 100(\tau|\tau)^4 + 14(\tau|\tau)^5$. (4.1)
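The passage from Corollary 4.1 to (4.1) is a matter of counting terms. The sketch below records that count under the following assumptions, which are read off from the text rather than stated there in this form: $\tau\tau = 2\tau$ (the eigenvalue of $R_\tau$ on $\tau$ is 2), $(\tau|\omega) = (\tau|\tau) = t$, and the antisymmetric quinary form vanishes when all arguments coincide. Cyc contributes 3 or 5 equal terms, and Sym contributes the number of distinct terms. The function names are hypothetical.

```python
def norton_power_traces():
    """Coefficients of t, t^2, ... in Tr R_tau^k for k = 1..5, with t = (tau|tau)."""
    return {
        1: [32814],
        2: [4620, 5084],
        # (tau|tau|tau) = 2t; Cyc has 3 terms
        3: [900*2, 620*3, 744],
        # (tau tau|tau tau) = 4t; Sym has 4, 3 and 6 terms respectively
        4: [(166 - 116 + 166)*4, 114*4*2 + 52*3, 80*6, 104],
        # (tau tau|tau|tau tau) = 8t; Cyc has 5 terms; Sym has 10, 10, 15, 10 terms
        5: [(30 + 4 - 22)*5*8, (20 - 14 + 20)*5*4 + 8*10*2,
            14*10*2 + 6*15, 10*10, 14],
    }

def trace_power(k, t):
    """Evaluate Tr R_tau^k at a given value t of (tau|tau)."""
    return sum(a * t**(j + 1) for j, a in enumerate(norton_power_traces()[k]))

for k, coeffs in norton_power_traces().items():
    print(k, coeffs)
```

The resulting coefficient lists agree with (4.1).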
Recall that $2(\tau|\tau)$ is the central charge of the Virasoro algebra corresponding to the idempotent $\tau$.

Remark 4.1. By a slight variation of the argument of Subsect. 2.3, using results of Zhu [Zh] and of Dong and Mason [DM] and the absence of a cusp form of weight less than 12, we see that Norton's formulae hold for any rational selfdual (holomorphic) VOA of rank 24 with shape (1.5) satisfying Zhu's $C_2$ finiteness condition.

4.2. Eigenspace decomposition of the Griess-Conway algebra. Recall that $V^\natural$ has a real form $V^\natural_{\mathbb{R}}$ on which the form $(\,|\,)$ is positive-definite. Let $U \to V^\natural$ be an inclusion of a VOA $U$ with conformal vector $\tau$ into $V^\natural$ such that

(W1) The map preserves the operations of VOA's.
(W2) The operators $\tau_{(n+1)}$ and $L_n$ for $n = 0, 1$ coincide on the image of $U$.
(W3) The image of $U$ is closed under the complex conjugation.
(W4) The image of $\tau$ is contained in the real form $V^\natural_{\mathbb{R}}$.

Then the adjoint action $R_\tau = \tau_{(1)}$ is semisimple on $B^\natural$. Let $B^\natural(h)$ denote the eigenspace of $R_\tau$ with eigenvalue $h$, and let $d(h)$ be its dimension. In case there is a primary vector $w \in U$ of conformal weight 3 mapped to the real form $V^\natural_{\mathbb{R}}$, the action of $w_{(2)}$ is semisimple, as it is alternating with respect to the invariant bilinear form, and we have $[w_{(2)}, \tau_{(1)}] = 0$. Let us denote by $B^\natural(h,\sigma)$ the simultaneous eigenspace of $\tau_{(1)}$ and $w_{(2)}$, where $\sigma = -, 0, +$ is the sign of the action of $\sqrt{-1}\,w_{(2)}$. Let $d(h,\sigma)$ be the dimension. Note that $d(h,+) = d(h,-)$.

For the representation theory of various VOA's discussed below, we refer the reader to [FrZh, Wa, DMZ, KMY, Mi4, DN, Ab] as well as the physics papers [BPZ, FaZ, ZaF, FaL], and for the construction of an automorphism by means of fusion rules to [Mi1] and [Mi4]. We will use the ATLAS notation [Atlas] for conjugacy classes of the Monster.

$L(\frac12,0)$ and 2A involution. Let $U$ be the VOA $L(\frac12,0)$ associated with the irreducible highest weight representation of the Virasoro algebra at $c = 1/2$. There are 3 irreducible modules for this VOA, which are parametrized by the lowest conformal weight $h = 0, 1/16, 1/2$. Hence we have a decomposition

$B^\natural = B^\natural(0) \oplus B^\natural(\tfrac1{16}) \oplus B^\natural(\tfrac12) \oplus B^\natural(2)$,

where $d(2) = 1$. Hence

$\frac{d(\frac12)}{2} + \frac{d(\frac1{16})}{16} + 2 = \mathrm{Tr}\,R_\tau$, $\frac{d(\frac12)}{2^2} + \frac{d(\frac1{16})}{16^2} + 2^2 = \mathrm{Tr}\,R_\tau^2$.

By (4.1) with $2(\tau|\tau) = 1/2$, we get

$d(0) = d(\tfrac1{16}) = 96256$, $d(\tfrac12) = 4371$, $d(2) = 1$.

Now consider the map

$1$ on $B^\natural(0) \oplus B^\natural(\tfrac12) \oplus B^\natural(2)$,
$-1$ on $B^\natural(\tfrac1{16})$.

This map gives rise to an automorphism of $B^\natural$ [Mi1], and hence to an involution of the Monster. It is identified with a 2A involution of the Monster by [Mi1] and [Co]. We may confirm this by looking at the trace of this map; it is $96256 - 96256 + 4371 + 1 = 4372$, which coincides with the corresponding value in the list of Conway and Norton [CN]. Thus we have come back to the situation considered in Sect. 15 of [Co] without using any explicit construction of the Griess-Conway algebra.

$L(\frac12,0) \otimes L(\frac12,0)$ and 2B involution. Suppose given an embedding of $U = L(\frac12,0) \otimes L(\frac12,0)$ into $V^\natural$, and let $\tau^1$ and $\tau^2$ denote the images of the conformal vectors of the first and the second component, which we suppose to be real. Since they are mutually orthogonal, we have a simultaneous eigenspace decomposition

$B^\natural = \bigoplus_{h,h'\in\{0,\frac1{16},\frac12,2\}}B^\natural(h,h')$.
We already know that $d(2,0) = d(0,2) = 1$ and $d(2,\frac1{16}) = d(2,\frac12) = d(\frac1{16},2) = d(\frac12,2) = d(2,2) = 0$. By Corollary 4.1, we have

$\mathrm{Tr}\,R_{\tau^1}R_{\tau^2} = \frac{1271}{4}$, $\mathrm{Tr}\,R_{\tau^1}^2R_{\tau^2} = \mathrm{Tr}\,R_{\tau^1}R_{\tau^2}^2 = \frac{403}{8}$, $\mathrm{Tr}\,R_{\tau^1}^2R_{\tau^2}^2 = \frac{197}{32}$.

Therefore, the dimensions of the eigenspaces are given by

$d(0,0) = 46851$, $d(0,\tfrac1{16}) = d(\tfrac1{16},0) = d(\tfrac1{16},\tfrac1{16}) = 47104$,
$d(\tfrac12,0) = d(0,\tfrac12) = 2300$, $d(\tfrac12,\tfrac1{16}) = d(\tfrac1{16},\tfrac12) = 2048$, $d(\tfrac12,\tfrac12) = 23$.

Hence, for the idempotent $\tau = \tau^1 + \tau^2$,

$d(0) = 46851$, $d(\tfrac1{16}) = 94208$, $d(\tfrac18) = 47104$, $d(\tfrac12) = 4600$, $d(\tfrac9{16}) = 4096$, $d(1) = 23$, $d(2) = 2$.

In particular, the trace of the map

$1$ on $B^\natural(0) \oplus B^\natural(\tfrac18) \oplus B^\natural(\tfrac12) \oplus B^\natural(1) \oplus B^\natural(2)$,
$-1$ on $B^\natural(\tfrac1{16}) \oplus B^\natural(\tfrac9{16})$,

which is the composition of two 2A involutions corresponding to $\tau^1$ and $\tau^2$, is equal to $46851 - 94208 + 47104 + 4600 - 4096 + 23 + 2 = 276$. Hence this map is identified with a 2B involution of the Monster. We may do the same analysis for an embedding of $L(\frac12,0)^{\otimes3}$. However, the spectrum is not uniquely determined by the trace formulae; there are two possibilities.

$L(\frac7{10},0)$ and 2A involution. Let $U$ be the VOA $L(\frac7{10},0)$, for which the irreducible modules are parametrized by the lowest conformal weight $h = 0, 3/80, 1/10, 7/16, 3/5, 3/2$. Hence the spectrum of the idempotent is determined as

$d(0) = 51054$, $d(\tfrac3{80}) = 91392$, $d(\tfrac1{10}) = 47634$, $d(\tfrac7{16}) = 4864$, $d(\tfrac35) = 1938$, $d(\tfrac32) = 1$, $d(2) = 1$.

The map

$1$ on $B^\natural(0) \oplus B^\natural(\tfrac1{10}) \oplus B^\natural(\tfrac32) \oplus B^\natural(\tfrac35) \oplus B^\natural(2)$,
$-1$ on $B^\natural(\tfrac3{80}) \oplus B^\natural(\tfrac7{16})$,

gives rise to an automorphism of $B^\natural$ by [Mi1]. Since the trace is equal to $51054 - 91392 + 47634 - 4864 + 1938 + 1 + 1 = 4372$, this map is identified with a 2A involution of the Monster.
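The spectra obtained so far can be tested against (4.1) by comparing power sums: if $d(h)$ is the multiplicity of the eigenvalue $h$ of $R_\tau$, then $\sum_h h^k d(h) = \mathrm{Tr}\,R_\tau^k$. The sketch below is a hedged illustration with hypothetical helper names; the coefficients are those of (4.1), and the sixth weight for $L(\frac7{10},0)$ is taken to be $3/2$, the value for which these power sums balance.

```python
from fractions import Fraction as F

# Coefficients of (tau|tau)^j, j = 1, 2, ..., in Tr R_tau^k, copied from (4.1)
COEFFS = {1: [32814], 2: [4620, 5084], 3: [1800, 1860, 744],
          4: [864, 1068, 480, 104], 5: [480, 680, 370, 100, 14]}

def norton_trace(k, t):
    """Tr R_tau^k as a polynomial in t = (tau|tau)."""
    return sum(a * t**(j + 1) for j, a in enumerate(COEFFS[k]))

def check_spectrum(spectrum, central_charge):
    """True if the multiplicities add up to 196884 and match (4.1) for k = 1..5."""
    t = F(central_charge) / 2
    if sum(spectrum.values()) != 196884:
        return False
    return all(sum(n * h**k for h, n in spectrum.items()) == norton_trace(k, t)
               for k in range(1, 6))

ising = {F(0): 96256, F(1, 16): 96256, F(1, 2): 4371, F(2): 1}
tricritical = {F(0): 51054, F(3, 80): 91392, F(1, 10): 47634,
               F(7, 16): 4864, F(3, 5): 1938, F(3, 2): 1, F(2): 1}

assert check_spectrum(ising, F(1, 2))
assert check_spectrum(tricritical, F(7, 10))
```

The same function can be applied to any other spectrum listed in this subsection by passing the multiplicities together with the central charge $2(\tau|\tau)$ of the idempotent.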
$W_3$ algebra at $c = 4/5$ and 3A element. Let $U = W_3(\frac45)$ be the vacuum sector of the $W_3$ algebra [Za] at $c = 4/5$. It is isomorphic to $L(\frac45,0) \oplus L(\frac45,3)$ as a module over the Virasoro algebra. A realization of $W_3(\frac45)$ as a VOA as well as its representation theory are described in [KMY] and [Mi4]. There are 6 irreducible modules for this VOA, which are labeled as $(h,\sigma) = (0,0), (2/5,0), (2/3,\pm), (1/15,\pm)$, where $h$ is the lowest conformal weight and $\sigma$ is the sign of the eigenvalue of the action of a certain primary vector $\sqrt{-1}\,w \in W_3(\frac45)$ of conformal weight 3. Hence the spectrum is given by

$d(0) = 57478$, $d(\tfrac1{15}) = 129168$, $d(\tfrac25) = 8671$, $d(\tfrac23) = 1566$, $d(2) = 1$,

and we have a decomposition

$B^\natural(\tfrac1{15}) = B^\natural(\tfrac1{15},+) \oplus B^\natural(\tfrac1{15},-)$, $B^\natural(\tfrac23) = B^\natural(\tfrac23,+) \oplus B^\natural(\tfrac23,-)$,

into the sum of subspaces of equal dimensions for $h = 1/15$ and $2/3$. Now the map

$1$ on $B^\natural(0) \oplus B^\natural(\tfrac25) \oplus B^\natural(2)$,
$\zeta^{\pm1}$ on $B^\natural(\tfrac23,\pm) \oplus B^\natural(\tfrac1{15},\pm)$,

where $\zeta$ is a primitive 3rd root of unity, gives rise to an automorphism of $B^\natural$ by [Mi4]. Since the trace is equal to $57478 + 8671 + (\zeta + \zeta^{-1})(129168 + 1566)/2 + 1 = 783$, it is identified with a 3A element of the Monster. This eigenspace decomposition is described in (24) of [MeN] and Lemma 4 of [No].

$W_4$ algebra at $c = 1$ and 4A element. Let $U = W_4(1)$ be the vacuum sector of the $W_4$ algebra at $c = 1$. It is realized as the fixed-point subspace $V_L^+$ of the lattice VOA $V_L$ corresponding to the rank one lattice $L = \mathbb{Z}\gamma$ with $\langle\gamma,\gamma\rangle = 6$ with respect to the $-1$ automorphism of the lattice. It is generated by the conformal vector $\tau$ and certain primary vectors $w, z$ of conformal weight 3 and 4 respectively. We may use the representation theory of $V_L^+$ developed in [DN] and [Ab]. There are 10 irreducible modules for this VOA, which are labeled as $(h,\sigma) = (0,0), (1,0), (1/12,0), (1/3,0), (3/4,\pm), (1/16,\pm), (9/16,\pm)$. Hence the spectrum is given by

$d(0) = 38226$, $d(\tfrac1{16}) = 94208$, $d(\tfrac1{12}) = 48600$, $d(\tfrac13) = 11178$, $d(\tfrac9{16}) = 4096$, $d(\tfrac34) = 552$, $d(1) = 23$, $d(2) = 1$.
These dimensions are determined as the unique nonnegative integers that satisfy the formula (4.1), although the number of unknown dimensions exceeds the number of equations. The map

$1$ on $B^\natural(0) \oplus B^\natural(\tfrac13) \oplus B^\natural(1) \oplus B^\natural(2)$,
$-1$ on $B^\natural(\tfrac1{12}) \oplus B^\natural(\tfrac34)$,
$\pm\sqrt{-1}$ on $B^\natural(\tfrac1{16},\pm) \oplus B^\natural(\tfrac9{16},\pm)$

gives rise to an automorphism of $B^\natural$ by the fusion rules of $W_4(1)$. Since the trace is equal to $38226 - 48600 + 11178 - 552 + 23 + 1 = 276$, it is identified with a 4A element of the Monster. Note that the trace of the square of this map, i.e., of the map

$1$ on $B^\natural(0) \oplus B^\natural(\tfrac1{12}) \oplus B^\natural(\tfrac13) \oplus B^\natural(\tfrac34) \oplus B^\natural(1) \oplus B^\natural(2)$,
$-1$ on $B^\natural(\tfrac1{16}) \oplus B^\natural(\tfrac9{16})$,

is equal to 276. Hence this map is identified with a 2B involution of the Monster as expected. This eigenspace decomposition is described in Lemma 5 of [No]$^2$.

$W_5$ algebra at $c = 8/7$ and 5A element. Unfortunately, the classification of irreducible modules and the determination of fusion rules based on the theory of VOA for $W_5(\frac87)$ seem to be missing. However, formally applying the expected properties of this algebra to our situation, the eigenspace decomposition is expected to be

$d(0) = 27228$, $d(\tfrac2{35}) = 72010$, $d(\tfrac3{35}) = 76912$, $d(\tfrac{17}{35}) = 6688$, $d(\tfrac{23}{35}) = 1520$,
$d(\tfrac27) = 12122$, $d(\tfrac67) = 133$, $d(\tfrac45) = 268$, $d(\tfrac65) = 2$, $d(2) = 1$.

Since the trace of the map

$1$ on $B^\natural(0) \oplus B^\natural(\tfrac27) \oplus B^\natural(\tfrac67) \oplus B^\natural(1) \oplus B^\natural(2)$,
$\zeta^{\pm1}$ on $B^\natural(\tfrac2{35},\pm) \oplus B^\natural(\tfrac{17}{35},\pm) \oplus B^\natural(\tfrac65,\pm)$,
$\zeta^{\pm2}$ on $B^\natural(\tfrac3{35},\pm) \oplus B^\natural(\tfrac{23}{35},\pm) \oplus B^\natural(\tfrac45,\pm)$,

where $\zeta$ is a primitive 5th root of unity, is equal to $27228 + (\zeta + \zeta^{-1})(72010 + 6688 + 2)/2 + (\zeta^2 + \zeta^{-2})(76912 + 1520 + 268)/2 + 12122 + 133 + 1 = 134$, this map would be identified with a 5A element of the Monster (if we appropriately choose the signs above$^3$).

Remark 4.2. The fusion rules of $W_n$ algebras constructed by the quantized Drinfeld-Sokolov reduction are determined by Frenkel et al. [FKW] via the Verlinde formula. In particular, the fusion ring of the first unitary series $W_n(c_n)$, $c_n = 2(n-1)/(n+2)$, is isomorphic to that of the level 2 integrable highest weight representations of the affine Kac-Moody Lie algebra of type $A^{(1)}_{n-1}$. Then the map $[\lambda] \mapsto \zeta_\lambda[\lambda]$, where $\zeta_\lambda = \exp\bigl(2\pi\sqrt{-1}\sum_{i=1}^{n-1}im_i/n\bigr)$ for a level 2 weight $\lambda = \sum_{i=1}^{n-1}m_i\bar\Lambda_i$, gives an automorphism of the fusion ring over $\mathbb{C}$. Therefore, we expect that an element of order $n$ of the Monster would be obtained by using the decomposition of $V^\natural$ into the sum of irreducible $W_n(c_n)$-modules.

5. Generalization to Higher Degree

Recall the notations and assumptions in Sect. 2.3. In particular, Aut $V$ is supposed to be finite.

$^2$ There 24104 + 24104 should read 47104 + 47104.
$^3$ This ambiguity is a matter of identification of the representations. There is no ambiguity if we adopt the labeling as in [FKW].
585
5.1. The trace functions. Let us set o(u) = u(n−1) : V → V
(5.1)
for any u ∈ V n after Frenkel and Zhu [FrZh]. Consider the trace functions Tr o(a)q L0
and
Tr o(a)o(b)q L0
for elements a, b ∈ B, where Tr denotes the trace over the whole space V . In this subsection, we will express these trace functions, under the corresponding assumptions, in terms of the character ch V =
∞
(dim V n )q n
(5.2)
n=0
and the Eisenstein series E2k = E2k (q), which we normalize as in [DM]: ∞ B2k 2 d 2k−1 q n . + E2k = − (2k)! (2k − 1)! n=1
(5.3)
d|n
Here Bm are the Bernoulli numbers defined by ∞ t tm B = . m et − 1 m! m=0
In particular, 1 + 2q + 6q 2 + 8q 3 + 14q 4 + 12q 5 + · · · , 12 1 1 28 73 E4 = + q + 3q 2 + q 3 + q 4 + 42q 5 + · · · . 720 3 3 3
E2 = −
(5.4)
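The expansions (5.4) follow directly from the normalization (5.3). The following minimal sketch computes them in exact arithmetic; the function names are hypothetical, and the Bernoulli numbers are generated from the standard recurrence $\sum_{j=0}^{n}\binom{n+1}{j}B_j = 0$ for $n \ge 1$, which is equivalent to the generating function above.

```python
from fractions import Fraction
from math import comb, factorial

def bernoulli(m):
    """List [B_0, ..., B_m] with t/(e^t - 1) = sum B_m t^m / m!, so B_1 = -1/2."""
    B = [Fraction(1)]
    for n in range(1, m + 1):
        B.append(-sum(comb(n + 1, j) * B[j] for j in range(n)) / (n + 1))
    return B

def eisenstein(weight, order):
    """Coefficients of q^0, ..., q^order of E_weight in the normalization (5.3)."""
    coeffs = [-bernoulli(weight)[weight] / factorial(weight)]
    for n in range(1, order + 1):
        sigma = sum(dv**(weight - 1) for dv in range(1, n + 1) if n % dv == 0)
        coeffs.append(Fraction(2 * sigma, factorial(weight - 1)))
    return coeffs

print(eisenstein(2, 5))
print(eisenstein(4, 5))
```

The two printed lists are the coefficients of $E_2$ and $E_4$ up to $q^5$ and agree with (5.4).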
The result is summarized in the following theorem.

Theorem 5.1. Let $B$ be the Griess algebra of a VOA $V$ such that Aut $V$ is finite.

(1) If $V$ is of class $S^2$ then, for any $a \in B$,

$\mathrm{Tr}\,o(a)q^{L_0} = \frac{2(a|\omega)}{c}\,q\frac{d}{dq}\,\mathrm{ch}\,V$.

(2) If $V$ is of class $S^4$ then, for any $a, b \in B$,

$\mathrm{Tr}\,o(a)o(b)q^{L_0} = \Bigl[\frac{44(a|b) + 20(a|\omega)(b|\omega)}{c(5c+22)}\Bigl(q\frac{d}{dq}\Bigr)^2 - (11 + 60E_2)\frac{c(a|b) - 2(a|\omega)(b|\omega)}{3c(5c+22)}\,q\frac{d}{dq} + (11 + 120E_2 - 720E_4)\frac{c(a|b) - 2(a|\omega)(b|\omega)}{144(5c+22)}\Bigr]\mathrm{ch}\,V$.

For instance,

$\mathrm{Tr}|_{V^3}\,o(a)o(b) = \frac{-2\bigl(20c^2 + 40c\dim V^2 + (3c-198)\dim V^3\bigr)}{c(5c+22)}(a|b) + \frac{16\bigl(5c + 10\dim V^2 + 12\dim V^3\bigr)}{c(5c+22)}(a|\omega)(b|\omega)$,

$\mathrm{Tr}|_{V^4}\,o(a)o(b) = \frac{-2\bigl(55c^2 + (5c^2+120c)\dim V^2 + 60c\dim V^3 + (4c-352)\dim V^4\bigr)}{c(5c+22)}(a|b) + \frac{4\bigl(55c + (5c+120)\dim V^2 + 60\dim V^3 + 84\dim V^4\bigr)}{c(5c+22)}(a|\omega)(b|\omega)$.
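The degree-by-degree content of Theorem 5.1 (2) can be extracted by truncated power series arithmetic. The sketch below is an illustration under stated assumptions: it takes the constant-term normalization to be $1/(144(5c+22))$, the value for which the formula specializes to the coefficient $47/163584$ in (5.15) and to Theorem 2.1 (2) in degree 2, and it uses the expansions (5.4). The function names are hypothetical.

```python
from fractions import Fraction as F

E2 = [F(-1, 12), 2, 6, 8, 14]                     # q^0..q^4 of E_2, from (5.4)
E4 = [F(1, 720), F(1, 3), 3, F(28, 3), F(73, 3)]  # q^0..q^4 of E_4, from (5.4)

def mul(x, y, n):
    """Coefficient of q^n in the product of two truncated series."""
    return sum(x[i] * y[n - i] for i in range(n + 1))

def trace_coeffs(n, c, dims):
    """Coefficients of (a|b) and (a|w)(b|w) in Tr|_{V^n} o(a)o(b); dims[j] = dim V^j."""
    c = F(c)
    ch = [F(x) for x in dims]
    D1 = [j * x for j, x in enumerate(ch)]        # q d/dq ch V
    D2 = [j * j * x for j, x in enumerate(ch)]    # (q d/dq)^2 ch V
    s1 = [11*(j == 0) + 60*E2[j] for j in range(n + 1)]
    s0 = [11*(j == 0) + 120*E2[j] - 720*E4[j] for j in range(n + 1)]
    u = mul(s1, D1, n) / (3*c*(5*c + 22))
    v = mul(s0, ch, n) / (144*(5*c + 22))
    den = c*(5*c + 22)
    ab = 44*D2[n]/den - c*u + c*v                 # coefficient of (a|b)
    aw = 20*D2[n]/den + 2*u - 2*v                 # coefficient of (a|w)(b|w)
    return ab, aw

moonshine_dims = [1, 0, 196884, 21493760, 864299970]
print(trace_coeffs(2, 24, moonshine_dims))
```

In degree 2 with the moonshine dimensions, the output is the pair 4620 and 5084 of Corollary 4.1, as it should be since $o(a) = a_{(1)} = R_a$ on $B = V^2$.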
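We sketch the derivation of the formulae in the rest of this subsection. The formula (1) immediately follows from (2.14) and Lemma 2.5: if $V$ is of class $S^2$ then

$\mathrm{Tr}\,a_{(1)}q^{L_0} = \frac{2(a|\omega)}{c}\mathrm{Tr}\,\omega_{(1)}q^{L_0} = \frac{2(a|\omega)}{c}\mathrm{Tr}\,L_0q^{L_0} = \frac{2(a|\omega)}{c}\,q\frac{d}{dq}\,\mathrm{ch}\,V$

for any $a \in B$. Suppose that $V$ is of class $S^4$ and consider two vectors $a, b \in B$ of the Griess algebra. Then, by Proposition 4.3.5 of [Zh], we have

$\mathrm{Tr}\,o(a)o(b)q^{L_0} = \mathrm{Tr}\,o(a_{[-1]}b)q^{L_0} - E_2\,\mathrm{Tr}\,o(a_{[1]}b)q^{L_0} - E_4\,\mathrm{Tr}\,o(a_{[3]}b)q^{L_0}$. (5.5)

Here the operations $(a,b) \mapsto a_{[n]}b$, $(n \in \mathbb{Z})$, are another VOA structure on $V$ introduced by Zhu [Zh], which is normalized so that

$a_{[-1]}b = a_{(-1)}b + \frac32a_{(0)}b + \frac5{12}a_{(1)}b + \frac{11}{720}a_{(3)}b$,
$a_{[1]}b = a_{(1)}b - \frac16a_{(3)}b$, $a_{[3]}b = a_{(3)}b$ (5.6)

for $a, b \in B$. Then, by (5.5) and (2.15), we have

$\mathrm{Tr}\,o(a)o(b)q^{L_0} = \frac{11(a|b)}{720}\mathrm{Tr}\,o(\mathbf{1})q^{L_0} + \frac{5(a|b)}{3c}\mathrm{Tr}\,o([2])q^{L_0} + \frac{3(a|b)}{c}\mathrm{Tr}\,o([3])q^{L_0} + \frac{6c(a|b) - 12(a|\omega)(b|\omega)}{c(5c+22)}\mathrm{Tr}\,o([4])q^{L_0} + \frac{44(a|b) + 20(a|\omega)(b|\omega)}{c(5c+22)}\mathrm{Tr}\,o([2,2])q^{L_0} - E_2\frac{4(a|b)}{c}\mathrm{Tr}\,o([2])q^{L_0} + \frac16E_2(a|b)\mathrm{Tr}\,o(\mathbf{1})q^{L_0} - E_4(a|b)\mathrm{Tr}\,o(\mathbf{1})q^{L_0}$. (5.7)

Hence we get Theorem 5.1 (2) by using the following:

$\mathrm{Tr}\,o([2])q^{L_0} = q\frac{d}{dq}\mathrm{ch}\,V$, $\mathrm{Tr}\,o([3])q^{L_0} = -2q\frac{d}{dq}\mathrm{ch}\,V$, $\mathrm{Tr}\,o([4])q^{L_0} = 3q\frac{d}{dq}\mathrm{ch}\,V$,
$\mathrm{Tr}\,o([2,2])q^{L_0} = \Bigl[\Bigl(q\frac{d}{dq}\Bigr)^2 + \Bigl(\frac{13}{6} + 2E_2\Bigr)q\frac{d}{dq} - \Bigl(\frac{11c}{1440} + \frac{c}{12}E_2 - \frac{c}{2}E_4\Bigr)\Bigr]\mathrm{ch}\,V$. (5.8)

Here only the last one is not obvious. By (5.5), we have

$\mathrm{Tr}\,o(\omega)o(\omega)q^{L_0} = \mathrm{Tr}\,o(\omega_{[-1]}\omega)q^{L_0} - E_2\,\mathrm{Tr}\,o(\omega_{[1]}\omega)q^{L_0} - E_4\,\mathrm{Tr}\,o(\omega_{[3]}\omega)q^{L_0}$. (5.9)

Substituting

$o(\omega_{[-1]}\omega) = [2,2]_{(3)} - \frac{13}{6}[2]_{(1)} + \frac{11c}{1440}$, $o(\omega_{[1]}\omega) = 2[2]_{(1)} - \frac{c}{12}$, $o(\omega_{[3]}\omega) = \frac{c}{2}$

and

$\mathrm{Tr}\,o(\omega)q^{L_0} = \mathrm{Tr}\,L_0q^{L_0} = q\frac{d}{dq}\mathrm{ch}\,V$, $\mathrm{Tr}\,o(\omega)o(\omega)q^{L_0} = \Bigl(q\frac{d}{dq}\Bigr)^2\mathrm{ch}\,V$,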
we have the result.

5.2. McKay-Thompson series for 2A involution. Let $V$ be the moonshine module $V^\natural$. In this subsection, we will show that the McKay-Thompson series for a 2A involution is determined by the formulae above using the fact that the character of $V$ is given by
\[ \mathrm{ch}\, V = q(J(q) - 744) = 1 + 196884q^2 + 21493760q^3 + 864299970q^4 + 20245856256q^5 + \cdots , \tag{5.10} \]
and that the characters of $L(\tfrac12, h)$ are given by
\[ \begin{aligned} \chi_0(q) &= \mathrm{ch}\, L(\tfrac12, 0) = \frac12 \Bigl[ \prod_{k=0}^{\infty} (1 + q^{k+1/2}) + \prod_{k=0}^{\infty} (1 - q^{k+1/2}) \Bigr], \\ \chi_{1/2}(q) &= \mathrm{ch}\, L(\tfrac12, \tfrac12) = \frac12 \Bigl[ \prod_{k=0}^{\infty} (1 + q^{k+1/2}) - \prod_{k=0}^{\infty} (1 - q^{k+1/2}) \Bigr], \\ \chi_{1/16}(q) &= \mathrm{ch}\, L(\tfrac12, \tfrac1{16}) = q^{1/16} \prod_{k=1}^{\infty} (1 + q^{k}) . \end{aligned} \tag{5.11} \]
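The product formulas (5.11) can be expanded mechanically. The following Python sketch multiplies out the truncated products in the variable $t = q^{1/2}$ and returns the leading integer coefficients of the three characters, with the fractional prefactors $q^{1/2}$ and $q^{1/16}$ stripped. The function names `product_series` and `ising_characters` are illustrative choices and do not come from the paper.

```python
def product_series(sign, order):
    """Coefficients in t = q^(1/2) of prod_{k>=0} (1 + sign * t^(2k+1)), up to t^order."""
    coeffs = [0] * (order + 1)
    coeffs[0] = 1
    for m in range(1, order + 1, 2):           # m = 2k+1 corresponds to q^(k+1/2)
        for n in range(order, m - 1, -1):      # in-place multiplication by (1 + sign*t^m)
            coeffs[n] += sign * coeffs[n - m]
    return coeffs


def ising_characters(q_order):
    """Return coefficient lists (chi_0, chi_{1/2}, chi_{1/16}) up to the given order in q.

    chi_0[n] is the coefficient of q^n, chi_half[n] that of q^(n+1/2),
    and chi_16[n] that of q^(n+1/16).
    """
    t_order = 2 * q_order + 1
    plus = product_series(+1, t_order)
    minus = product_series(-1, t_order)
    # (plus + minus)/2 keeps the even powers of t, (plus - minus)/2 the odd ones.
    chi_0 = [(plus[2 * n] + minus[2 * n]) // 2 for n in range(q_order + 1)]
    chi_half = [(plus[2 * n + 1] - minus[2 * n + 1]) // 2 for n in range(q_order + 1)]
    # chi_{1/16} = q^(1/16) * prod_{k>=1} (1 + q^k)
    chi_16 = [0] * (q_order + 1)
    chi_16[0] = 1
    for k in range(1, q_order + 1):
        for n in range(q_order, k - 1, -1):
            chi_16[n] += chi_16[n - k]
    return chi_0, chi_half, chi_16


chi_0, chi_half, chi_16 = ising_characters(8)
```

The truncation order only needs to exceed the highest power of $q$ one wants to compare with the coefficients of (5.10).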
Now, let $\tau$ be an idempotent of central charge 1/2 in the Griess-Conway algebra $B$ and consider the corresponding Virasoro action $T_n = \tau_{(n+1)}$. Consider the subspace
\[ P(h) = \{ v \in V \mid T_n v = 0 \text{ if } n \ge 1 \text{ and } T_0 v = hv \} \tag{5.12} \]
for each $h = 0, 1/2, 1/16$, and set
\[ z_h(q) = \sum_{n=0}^{\infty} \dim(V^n \cap P(h))\, q^n . \tag{5.13} \]
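Then we have
\[ \begin{aligned} z_0(q)\chi_0(q) + q^{-1/2} z_{1/2}(q)\chi_{1/2}(q) + q^{-1/16} z_{1/16}(q)\chi_{1/16}(q) &= \mathrm{Tr}\, q^{L_0}, \\ z_0(q)\dot\chi_0(q) + q^{-1/2} z_{1/2}(q)\dot\chi_{1/2}(q) + q^{-1/16} z_{1/16}(q)\dot\chi_{1/16}(q) &= \mathrm{Tr}\, o(e) q^{L_0}, \\ z_0(q)\ddot\chi_0(q) + q^{-1/2} z_{1/2}(q)\ddot\chi_{1/2}(q) + q^{-1/16} z_{1/16}(q)\ddot\chi_{1/16}(q) &= \mathrm{Tr}\, o(e)^2 q^{L_0}, \end{aligned} \tag{5.14} \]
where $\dot\chi(q) = q\,\frac{d}{dq}\chi(q)$ and $\ddot\chi(q) = \bigl(q\,\frac{d}{dq}\bigr)^2 \chi(q)$. By Theorem 5.1,
\[ \begin{aligned} \mathrm{Tr}\, q^{L_0} &= \mathrm{ch}\, V, \qquad \mathrm{Tr}\, o(e) q^{L_0} = \frac{1}{48}\, q\frac{d}{dq}\, \mathrm{ch}\, V, \\ \mathrm{Tr}\, o(e)^2 q^{L_0} &= \Bigl[ \frac{49}{13632} \Bigl(q\frac{d}{dq}\Bigr)^{2} - \frac{47(11 + 60E_2)}{81792}\, q\frac{d}{dq} + \frac{47(11 + 120E_2 - 720E_4)}{163584} \Bigr] \mathrm{ch}\, V, \end{aligned} \tag{5.15} \]
where $\mathrm{ch}\, V$ is given by (5.10). Therefore, the condition (5.14) determines the series $z_0(q)$, $z_{1/2}(q)$ and $z_{1/16}(q)$, and hence the McKay-Thompson series $T_{2A}(q)$ via
\[ T_{2A}(q) = q^{-1} \bigl( z_0(q)\chi_0(q) + q^{-1/2} z_{1/2}(q)\chi_{1/2}(q) - q^{-1/16} z_{1/16}(q)\chi_{1/16}(q) \bigr) . \tag{5.16} \]
The result is written as a rational expression involving the functions $J(q)$, $\chi_h(q)$, their first and second derivatives and the Eisenstein series $E_2(q)$ and $E_4(q)$. We do not include the explicit form in this paper.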
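As a consistency check on (5.15), the three rational coefficients can be reassembled from the trace identities (5.7) and (5.8) with exact rational arithmetic. The sketch below assumes $c = 24$ and $(e|e) = (e|\omega) = 1/4$ for the element $e$; these normalizations are not restated in this subsection and are adopted here only because they reproduce $\mathrm{Tr}\, o(e) q^{L_0} = \frac{1}{48} q \frac{d}{dq} \mathrm{ch}\, V$. The $o(\mathbf{1})$ term of (5.7) is taken with the factor $(a|b)$, and the helper name `s4_trace_coefficients` is hypothetical.

```python
from fractions import Fraction as F


def s4_trace_coefficients(c, ab, aw, bw):
    """Coefficients of the operator acting on ch V in Tr o(a)o(b) q^{L_0}.

    Assembled from (5.7) and (5.8). Arguments are the central charge c and
    the inner products (a|b), (a|omega), (b|omega), all given as Fractions.
    Keys: "D2" multiplies (q d/dq)^2; "D1", "D1_E2" multiply q d/dq (plain
    and E_2 part); "D0", "D0_E2", "D0_E4" are the constant parts.
    """
    denom = c * (5 * c + 22)
    k22 = (44 * ab + 20 * aw * bw) / denom     # weight of Tr o([2,2])
    k4 = (6 * c * ab - 12 * aw * bw) / denom   # weight of Tr o([4])
    k3 = 3 * ab / c                            # weight of Tr o([3])
    k2 = 5 * ab / (3 * c)                      # weight of Tr o([2])
    return {
        "D2": k22,
        "D1": k2 - 2 * k3 + 3 * k4 + F(13, 6) * k22,
        "D1_E2": 2 * k22 - 4 * ab / c,
        "D0": F(11, 720) * ab - F(11, 1440) * c * k22,
        "D0_E2": ab / 6 - c * k22 / 12,
        "D0_E4": -ab + c * k22 / 2,
    }


# Assumed normalization for the moonshine case: c = 24, (e|e) = (e|omega) = 1/4.
coeffs = s4_trace_coefficients(F(24), F(1, 4), F(1, 4), F(1, 4))
```

Under these assumptions the entries group as $-\frac{47}{81792}(11 + 60E_2)$ for the first-order part and $\frac{47}{163584}(11 + 120E_2 - 720E_4)$ for the constant part, matching the pattern of (5.15).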
Appendix

A.1 The coefficients in the trace formula (4).

$A_1 = -c(2100c^5 + 53650c^4 + 304049c^3 - 980942c^2 - 1641936c + 229152) + (2455c^4 - 193958c^3 + 4032472c^2 + 539488c - 1651584)d,$
$A_2 = -c(1050c^5 + 30965c^4 + 279826c^3 + 609848c^2 - 271248c - 150144) - c(60c^4 - 4929c^3 + 96248c^2 + 258428c - 56304)d,$
$A_3 = -c(1050c^5 + 31085c^4 + 270928c^3 + 726848c^2 + 1748472c - 79008) + c(60c^4 - 3969c^3 + 20752c^2 + 1761292c + 127440)d,$
$B = 4c(1050c^4 + 30905c^3 + 289750c^2 + 281168c - 4d(120c^4 - 14853c^3 + 424928c^2 + 11132c - 206448)),$
$C = 8c(d - 1)(120c^3 - 9437c^2 + 187858c + 22968),$
$D = -192c(d - 1)(100c^2 - 4297c - 2852)(d - 1),$
$E = 15744c(30c + 47).$
A.2 The coefficients in the trace formula (5).

$A_{1,2,3,4,5} = -5c(46200c^6 + 2154600c^5 + 31531073c^4 + 123663366c^3 - 560461448c^2 - 1390398720c - 168205824) - 5d(100c^6 - 2405c^5 - 1037398c^4 + 70463896c^3 - 1249353984c^2 + 60544768c + 766334976),$
$A_{1,2,4,3,5} = c(1500c^5 - 161985c^4 + 5500754c^3 - 19601928c^2 - 1338547904c - 3497905152)(d - 1),$
$A_{1,2,5,3,4} = c(-1500c^5 + 147745c^4 - 3380778c^3 - 83375368c^2 + 2968841472c + 3711048192)(d - 1),$
$A_{1,3,2,4,5} = -c(115500c^6 + 5849050c^5 + 102135165c^4 + 720684894c^3 + 1549368552c^2 - 664210624c + 4461754368) + cd(300c^5 - 81505c^4 + 5253294c^3 - 87363968c^2 - 611758944c + 4940713728),$
$A_{1,3,4,2,5} = c(500c^5 - 29035c^4 + 518574c^3 - 15730088c^2 + 553755136c - 3442893312)(d - 1),$
$A_{1,3,5,2,4} = c(-500c^5 + 14795c^4 + 1601402c^3 - 87247208c^2 + 1076538432c + 3656036352)(d - 1),$
$A_{1,4,2,3,5} = -c(115500c^6 + 5848050c^5 + 102268115c^4 + 715702714c^3 + 1553240392c^2 + 1228092416c + 4516766208) - cd(700c^5 - 51445c^4 - 271114c^3 + 83492128c^2 - 1280544096c - 4995725568),$
$A_{1,4,3,2,5} = c(-500c^5 + 82955c^4 - 5023222c^3 + 122369272c^2 - 861258816c - 1610972160)(d - 1),$
$A_{1,4,5,2,3} = c(500c^5 - 118155c^4 + 6583582c^3 - 91119048c^2 - 815764608c + 3601024512)(d - 1),$
$A_{1,5,2,3,4} = -c(115500c^6 + 5847050c^5 + 102401065c^4 + 710720534c^3 + 1557112232c^2 + 3120395456c + 4571778048) - cd(1700c^5 - 184395c^4 + 4711066c^3 + 79620288c^2 - 3172847136c - 5050737408),$
$A_{1,5,3,2,4} = c(500c^5 - 49995c^4 - 41042c^3 + 118497432c^2 - 2753561856c - 1665984000)(d - 1),$
$A_{1,5,4,2,3} = c(-500c^5 + 103915c^4 - 4463606c^3 - 11858248c^2 + 2446058176c - 3387881472)(d - 1),$
$A_{2,3,1,4,5} = -3c(100c^5 - 21675c^4 + 907054c^3 + 11023128c^2 - 806389760c + 1100745216)(d - 1),$
$A_{2,4,1,3,5} = c(700c^5 - 67925c^4 + 2261018c^3 - 36941224c^2 + 526866240c - 3357247488)(d - 1),$
$A_{2,5,1,3,4} = c(1700c^5 - 200875c^4 + 7243198c^3 - 40813064c^2 - 1365436800c - 3412259328)(d - 1),$
$B_1 = 4c(115500c^5 + 5848250c^4 + 101927925c^3 + 740910478c^2 + 1067413032c + 217343424) + 4d(500c^5 + 288745c^4 - 25478878c^3 + 569319488c^2 - 269795104c - 478959360),$
$B_2 = -4c(8100c^4 - 616655c^3 + 8745246c^2 + 142937384c - 614801472)(d - 1),$
$B_3 = 4c(8100c^4 - 482575c^3 - 1572066c^2 + 339056296c - 368532288)(d - 1),$
$C = -8c(1780c^4 - 264997c^3 + 12872162c^2 - 203786696c - 26642880)(d - 1),$
$D = 64c(3620c^3 - 510813c^2 + 15237868c + 4458096)(d - 1),$
$E = 256c(2095c^3 - 161208c^2 + 3064358c + 3847956)(d - 1),$
$F = -3840c(3000c^2 - 125177c - 223532)(d - 1),$
$G = 333312c(90c + 259)(d - 1),$
$H = -\frac{c}{12}(100c^5 - 13295c^4 + 498218c^3 - 387184c^2 - 189230304c - 5501184)(d - 1).$
Acknowledgement. The author wishes to thank Alexander Ivanov, Simon Norton and Michael Tuite for stimulating discussions and useful comments on the subject. He is deeply grateful to Ian Grojnowski for hospitality at Cambridge.
References

[Atlas] Conway, J.H., Curtis, R.T., Norton, S.P., Parker, R.A. and Wilson, R.A.: Atlas of finite groups. Oxford: Oxford University Press, 1985
[Ab] Abe, T.: Fusion rules for the charge conjugation orbifold. Preprint (math/0006101)
[BPZ] Belavin, A.A., Polyakov, A.M. and Zamolodchikov, A.B.: Infinite conformal symmetry in two-dimensional quantum field theory. Nuclear Phys. B 241, 333–380 (1984)
[Bo1] Borcherds, R.E.: Vertex algebras, Kac-Moody algebras, and the monster. Proc. Nat’l. Acad. Sci. USA 83, 3068–3071 (1986)
[Bo2] Borcherds, R.E.: Monstrous moonshine and monstrous Lie superalgebras. Invent. Math. 109, 405–444 (1992)
[Co] Conway, J.H.: A simple construction for the Fischer-Griess monster group. Invent. Math. 79, 513–540 (1985)
[CN] Conway, J.H. and Norton, S.P.: Monstrous moonshine. Bull. London Math. Soc. 11, 308–339 (1979)
[DGH] Dong, C.-Y., Griess, R.L. and Höhn, G.: Framed vertex operator algebras, codes and moonshine module. Commun. Math. Phys. 193, 407–448 (1998)
[DM] Dong, C.-Y. and Mason, G.: Monstrous moonshine of higher weight. Acta Math. 185, 101–121 (2000)
[DMZ] Dong, C.-Y., Mason, G. and Zhu, Y.-C.: Discrete series of the Virasoro algebra and the moonshine module. In: Haboush, W.J. and Parshall, B.J. (eds.): Algebraic groups and their generalizations: Quantum and infinite-dimensional methods (University Park, PA, 1991), Proc. Sympos. Pure Math. 56, Part 2, Providence, RI: Amer. Math. Soc., 1994, pp. 295–316
[DN] Dong, C.-Y. and Nagatomo, K.: Representations of vertex operator algebra $V_L^+$ for rank one lattice $L$. Commun. Math. Phys. 202, 384–404 (1999)
[FaL] Fateev, V.A. and Lukyanov, S.L.: The models of two-dimensional conformal quantum field theory with $Z_n$ symmetry. Int. J. Mod. Phys. A3, 507 (1988)
[FKW] Frenkel, E., Kac, V. and Wakimoto, M.: Characters and fusion rules for W-algebras via quantized Drinfeld-Sokolov reduction. Commun. Math. Phys. 147, 295–328 (1992)
[FLM1] Frenkel, I.B., Lepowsky, J. and Meurman, A.: A natural representation of the Fischer–Griess Monster with the modular function J as character. Proc. Nat’l. Acad. Sci. USA 81, 3256–3260 (1984)
[FLM2] Frenkel, I.B., Lepowsky, J. and Meurman, A.: Vertex operator algebras and the Monster. Pure and Appl. Math. 134, Boston: Academic Press, 1989
[FaZ] Fateev, V.A. and Zamolodchikov, A.B.: Conformal quantum field theory models in two dimensions having $Z_3$ symmetry. Nucl. Phys. B 280, 644–660 (1987)
[FrZh] Frenkel, I.B. and Zhu, Y.-C.: Vertex operator algebras associated to representations of affine and Virasoro algebras. Duke Math. J. 66, 123–168 (1992)
[Gr1] Griess, R.L.: The friendly giant. Invent. Math. 69, 1–102 (1982)
[Gr2] Griess, R.L.: The vertex operator algebra related to $E_8$ with automorphism group $O^+(10, 2)$. In: The Monster and Lie algebras (Columbus, Ohio, 1996), Ohio State Univ. Math. Res. Inst. Publ. 7, Berlin: de Gruyter, 1998, pp. 43–58
[HL] Harada, K. and Lang, M.-L.: Modular forms associated with the Monster module. In: The Monster and Lie algebras (Columbus, Ohio, 1996), Ohio State Univ. Math. Res. Inst. Publ. 7, Berlin: de Gruyter, 1998, pp. 59–83
[Hö] Höhn, G.: Selbstduale Vertexoperatorsuperalgebren und das Babymonster. Dissertation, Bonn, 1995
[KMY] Kitazume, M., Miyamoto, M. and Yamada, H.: Ternary codes and vertex operator algebra. J. Algebra 223, 379–395 (2000)
[Li] Li, H.-S.: Symmetric invariant bilinear forms on vertex operator algebras. J. Pure Appl. Algebra 96, 279–297 (1994)
[MaM] Matsuo, A. and Matsuo, M.: The automorphism group of the Hamming code vertex operator algebra. J. Algebra 228, 204–226 (2000)
[MaN] Matsuo, A. and Nagatomo, K.: Axioms for a vertex algebra and the locality of quantum fields. MSJ-Memoirs 4, Mathematical Society of Japan, 1999
[MeN] Meyer, W. and Neutsch, W.: Associative subalgebras of the Griess algebra. J. Algebra 158, 1–17 (1993)
[Mi1] Miyamoto, M.: Griess algebras and conformal vectors in vertex operator algebras. J. Algebra 179, 528–548 (1996)
[Mi2] Miyamoto, M.: Binary codes and vertex operator (super)algebras. J. Algebra 181, 207–222 (1996)
[Mi3] Miyamoto, M.: A new construction of the Moonshine vertex operator algebra over the real number field. Preprint (q-alg/9701012)
[Mi4] Miyamoto, M.: 3-state Potts model and automorphism of vertex operator algebra of order 3. Preprint (q-alg/9710038)
[No] Norton, S.P.: The Monster algebra: Some new formulae. In: Moonshine, the Monster, and related topics (South Hadley, MA, 1994), Contemp. Math. 193, Providence, RI: Amer. Math. Soc., 1996, pp. 297–306
[Wa] Wang, W.: Rationality of Virasoro vertex operator algebra. Internat. Math. Res. Notices 1993, no. 7, 197–211 (1993)
[Za] Zamolodchikov, A.B.: Infinite additional symmetries in 2-dimensional conformal quantum field theory. Theor. Math. Phys. 65, 1205–1213 (1985)
[ZaF] Zamolodchikov, A.B. and Fateev, V.A.: Nonlocal (parafermion) currents in two-dimensional conformal quantum field theory and self-dual critical points in $Z_N$-symmetric statistical systems. Soviet Phys. JETP 62, 215–225 (1985)
[Zh] Zhu, Y.-C.: Modular invariance of characters of vertex operator algebras. J. Amer. Math. Soc. 9, 237–302 (1996)

Communicated by M. Aizenman
Commun. Math. Phys. 224, 593 – 612 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Potential Approximations to δ′: An Inverse Klauder Phenomenon with Norm-Resolvent Convergence

Pavel Exner1,2, Hagen Neidhardt3, Valentin A. Zagrebnov4,5

1 Department of Theoretical Physics, NPI, Academy of Sciences, 25068 Řež, Czech Republic
2 Doppler Institute, Czech Technical University, Břehová 7, 11519 Prague, Czech Republic. E-mail: [email protected]
3 Weierstraß-Institut für Angewandte Analysis und Stochastik, Mohrenstr. 39, 10117 Berlin, Germany. E-mail: [email protected]
4 Département de Physique, Université de la Méditerranée (Aix-Marseille II), Marseille, France
5 Centre de Physique Théorique, CNRS, Luminy Case 907, 13288 Marseille, Cedex 9, France. E-mail: [email protected]

Received: 19 March 2001 / Accepted: 18 June 2001
Abstract: We show that there is a family of Schrödinger operators with scaled potentials which approximates the δ′-interaction Hamiltonian in the norm-resolvent sense. This approximation, based on a formal scheme proposed by Cheon and Shigehara, has nontrivial convergence properties which are in several respects opposite to those of the Klauder phenomenon.

1. Introduction

Point interactions are often used for constructing solvable models of quantum mechanical systems [AGHH]. To judge the quality of such models one has to be able, of course, to decide how well a point interaction approximates the “actual” interaction. In the simplest case of a one-dimensional point interaction introduced originally by Kronig and Penney [KP] the answer is easy: the appropriate Hamiltonian is a norm-resolvent limit of a family of Schrödinger operators with squeezed potentials, which physically means that a slow particle with a widely smeared wave packet “sees” only the mean value of a localized potential. The problem is more complicated in dimensions two and three, where squeezed potentials can also be used, however, with a renormalization such that the limiting coupling is “infinitely weak”. The idea belongs to Friedman [Fr]; a detailed discussion for a general shape of the approximating potential can be found in [AGHH] together with a description of other, nonlocal, approximations and the corresponding bibliography. More generally, we have here an important particular case of the question: what is the “right Hamiltonian” for a strongly singular perturbation of the Laplace operator (see [NZ1, NZ2] and references therein).

A peculiarity of the one-dimensional situation is that not all point interactions are of the δ type.
This follows from the standard construction of a point interaction [BF, AGHH] which relies on the restriction of the free Hamiltonian to functions which vanish in the vicinity of the interaction support, and a consecutive construction of self-adjoint extensions of the obtained symmetric operator. For a single center in dimension one
the latter has deficiency indices (2, 2) leading thus to a four-parameter family of extensions. A subset of them usually called δ′ interactions was introduced in [GHM]; the whole family was later studied in [GK, Še1, GH] and subsequent papers by other authors. In distinction to the usual δ interactions, the other extensions were constructed as mathematical objects and the question about their physical meaning arose naturally. Šeba [Še2] was the first to address the question of approximating δ′ Hamiltonians by those with “regular” interactions. He showed, in particular, that the name is misleading because such Hamiltonians cannot be obtained using families of scaled zero-mean potentials. At the same time he demonstrated that the δ′ interaction can be approximated in a nonlocal way using a suitable family of rank-one operators with a nontrivial coupling-constant renormalization. Later local approximations were constructed [Ca, CH] but they were not of potential type since they involved first-derivative terms.

The question about the meaning of the δ′ interaction became more appealing when interesting physical properties of this coupling were discovered. Specifically, it was shown that Wannier–Stark systems with an array of δ′ interactions have no absolutely continuous spectrum [AEL, Ex, MS] and even that the spectrum is pure point for most values of the parameters [ADE]. The origin of this effect is the high-energy behaviour of the δ′ scattering, with the transmission amplitude vanishing as k → ∞. Such a behaviour can be approximated, up to a phase factor, within a fixed finite interval of energies by small complicated graph scatterers [AEL], and qualitatively the same scattering picture, up to a series of resonances, was found for a sphere with two halflines attached [Ki]. Until recently it was believed, however, that no potential-type approximation to the δ′ interaction existed.
It thus came as a surprise when two years ago Cheon and Shigehara (CS) constructed an approximation by means of a triple of δ interactions with the coupling constants scaled in a nonlinear way as their distances tend to zero [CS]. In distinction to the situations mentioned above this renormalization leads to an “infinitely strong” coupling in the limit. The authors computed formally the limiting wave function and showed that it obeyed the δ′ boundary conditions [AGHH]; they also presented an alternative argument based on convergence of the corresponding transfer matrices [SMMC].

It is natural to ask in which sense the limit exists and whether one can construct a similar approximation using regular potentials. We shall answer the second question affirmatively and show that the approximating families converge in a rather strong topology, namely norm resolvent. A nontrivial character of the approximation will be seen from the fact that we do not recover the sought limit when the involved operators are expressed through the respective quadratic forms, in particular, because the form domain of the limiting operator is larger than those of the approximating ones. Such a disparity between the form domains reminds us of the Klauder phenomenon [Kl, Si] where a singular perturbation is switched off in the strong resolvent sense yielding an operator different from the free one obtained as the formal limit by putting the coupling constant equal to zero. Here the situation is in several respects opposite. First of all, the coupling here is not switched off as in [Kl, Si] but rather becomes infinitely strong, so it is not straightforward to identify the formal limit. On the other hand, the larger form domain corresponds to the true norm-resolvent limit. In addition, the CS-approximation requires a subtle interplay of the coupling constants.
If we change this choice, we arrive at an operator the form domain of which is smaller than those of the approximants, namely at the Laplacian with Dirichlet decoupling at the δ′ interaction position.

Let us review briefly the contents of the paper. In the next section we will formulate the approximation by triple δ interaction and examine it using the explicit form of
the operators involved. Then we combine this result with the known squeezed-potential approximation of the δ interaction [AGHH, Thm. I.3.2.3] to show that a δ′ can be approximated by a family of potentials consisting of three $a(\varepsilon)$-spaced parts of a “size” $\varepsilon$ which approach each other as $\varepsilon \to 0+$ and at the same time undergo a CS-type scaling. Furthermore, we determine a squeezing rate which yields a convergent approximation: it is sufficient that $\varepsilon\, a(\varepsilon)^{-12}$ tends to zero. In Sect. 4 we illustrate the mentioned nonstability of the approximation: if we disbalance only slightly the dependence of the coupling constants we get a family which converges in the norm-resolvent sense to the Dirichlet decoupled Laplace operator on the line. To keep things simple we do not strive for maximum generality. We restrict ourselves to the δ′ case, because an extension to the general four-parameter point interaction is easy to obtain by adapting the scheme of [SMMC]. We also do not ask about the optimal rate between $\varepsilon$ and $a(\varepsilon)$ needed for the convergence.
2. Resolvent Approach to the CS Approximation

In the following we use the notations and definitions of [AGHH]. Let $H_0 = -\Delta$ be a free one-dimensional Schrödinger operator in the Hilbert space $L^2(\mathbb{R})$. Its resolvent is an integral operator with the kernel
\[ G_k(x - x') \equiv (-\Delta - k^2)^{-1}(x, x') := \frac{i}{2k}\, e^{ik|x - x'|} \tag{2.1} \]
for any $\mathrm{Im}\, k > 0$ and $x, x' \in \mathbb{R}$. The related function
\[ \tilde{G}_k(x - x') := \mathrm{sgn}(x - x')\, \frac{i}{2k}\, e^{ik|x - x'|} \tag{2.2} \]
allows us to express the resolvent for the δ′-perturbation of $H_0$ centered at the point $y$ and having the “strength” $\beta$, denoted by $\Xi_{\beta,y}$, in the form [AGHH, Sect. I.4]:
\[ (\Xi_{\beta,y} - k^2)^{-1}(x, x') = G_k(x - x') - \frac{2\beta k^2}{2 - i\beta k}\, \tilde{G}_k(x - y)\, \tilde{G}_k(x' - y). \tag{2.3} \]
Recall that $\Xi_{\beta,y}$ acts as $H_0$ away from $y$ and its domain consists of those $\psi \in W^{2,2}(\mathbb{R} \setminus \{y\})$ which satisfy the boundary conditions
\[ \psi'(y+) = \psi'(y-) =: \psi'(y), \qquad \psi(y+) - \psi(y-) = \beta \psi'(y). \tag{2.4} \]
Our first aim is to approximate the resolvent (2.3) of $\Xi_{\beta,y}$ by a family of those corresponding to the triple δ-perturbation of $H_0$ with the couplings $A_a = \{\alpha_j\}_{j=-1,0,1} = \{2\beta^{-1} - a^{-1},\ \beta a^{-2},\ 2\beta^{-1} - a^{-1}\}$ localized at $Y_a = \{y_j\}_{j=-1,0,1} = \{y - a,\ y,\ y + a\}$ for $a \ge 0$ letting $a \to 0$. Denote this perturbed operator by $-\Delta_{A_a,Y_a}$. Then by [AGHH, Sect. II.2] the corresponding resolvent has the kernel
\[ (-\Delta_{A_a,Y_a} - k^2)^{-1}(x, x') = G_k(x - x') - \sum_{j,j'=-1,0,1} [\Gamma_a(k)]^{-1}_{jj'}\, G_k(x - y_j)\, G_k(x' - y_{j'}), \tag{2.5} \]
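where $[\Gamma_a(k)]_{jj'} := \alpha_j^{-1} \delta_{jj'} + G_k(y_j - y_{j'})$ and $j, j' = -1, 0, 1$. In particular, for a purely imaginary $k = i\kappa$, $\kappa > 0$, we get
\[ \Gamma_a(i\kappa) = \frac{1}{2\kappa} \begin{pmatrix} 1 + u & w & w^2 \\ w & 1 + v & w \\ w^2 & w & 1 + u \end{pmatrix}, \tag{2.6} \]
where
\[ u := \frac{2\beta\kappa a}{2a - \beta}, \qquad v := \frac{2\kappa a^2}{\beta}, \qquad w := e^{-\kappa a}. \tag{2.7} \]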
Let us look at how the spectrum of the operators $\{-\Delta_{A_a,Y_a}\}_{a \ge 0}$ behaves as $a \to 0$ for a fixed $\beta$. Since the perturbation in (2.5) is a rank three operator, $\sigma_{\mathrm{ess}}(H_0) = \sigma_{\mathrm{ac}}(H_0) = [0, \infty)$ is not affected by the perturbation and the point spectrum consists of at most three negative eigenvalues, with the multiplicity taken into account [We, Sect. 8.3]. Here we have:

Proposition 2.1. For small enough spacing $a$ the operator $-\Delta_{A_a,Y_a}$ has at most one eigenvalue. This happens if and only if $\beta < 0$, and in that case
\[ \inf \sigma(-\Delta_{A_a,Y_a}) = -\frac{4}{\beta^2} + O(a). \tag{2.8} \]
Proof. Since the negative part of $\sigma(-\Delta_{A_a,Y_a})$ is the point spectrum determined by zeros of $\det \Gamma_a(i\kappa)$ by [AGHH, Sect. II.2] we arrive at the equation
\[ (1 + u - w^2) \bigl[ (1 + u)(1 + v) - w^2 (1 - v) \bigr] = 0, \tag{2.9} \]
or
\[ e^{-2\kappa a} = 1 + \frac{2\beta\kappa a}{2a - \beta} \tag{2.10} \]
and
\[ e^{-2\kappa a} = \Bigl( 1 + \frac{2\beta\kappa a}{2a - \beta} \Bigr) \frac{1 + 2\kappa a^2 \beta^{-1}}{1 - 2\kappa a^2 \beta^{-1}} . \tag{2.11} \]
Expanding the left- and right-hand sides of the last two equations around $a = 0$, one finds that only (2.10) has a solution for a sufficiently small $a > 0$ and that it equals
\[ \kappa(a) = -\frac{2}{\beta} + O(a). \]
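The two claims of this proof, the factorization (2.9) of the determinant and the location of the root of (2.10), can be checked numerically. The Python sketch below is an illustration under stated assumptions: the sample values of $\kappa$, $a$, $\beta$ and the bisection bracket are arbitrary choices, and the function names are hypothetical.

```python
import math


def uvw(kappa, a, beta):
    """Quantities u, v, w of (2.7)."""
    u = 2 * beta * kappa * a / (2 * a - beta)
    v = 2 * kappa * a ** 2 / beta
    w = math.exp(-kappa * a)
    return u, v, w


def det_direct(kappa, a, beta):
    """Determinant of 2*kappa*Gamma_a(i*kappa) from (2.6), expanded along the first row."""
    u, v, w = uvw(kappa, a, beta)
    m = [[1 + u, w, w * w], [w, 1 + v, w], [w * w, w, 1 + u]]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def det_factored(kappa, a, beta):
    """Left-hand side of (2.9)."""
    u, v, w = uvw(kappa, a, beta)
    return (1 + u - w * w) * ((1 + u) * (1 + v) - w * w * (1 - v))


def eigen_kappa(a, beta, lo, hi, iterations=200):
    """Bisection for the root of (2.10) inside the bracket [lo, hi]."""
    def f(kappa):
        # exp(-2*kappa*a) - 1 - 2*beta*kappa*a/(2a - beta), written with expm1
        return math.expm1(-2 * kappa * a) - 2 * beta * kappa * a / (2 * a - beta)
    if f(lo) * f(hi) > 0:
        raise ValueError("no sign change on the bracket")
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Sample case beta = -1, a = 0.01: the root should lie close to -2/beta = 2.
kappa_root = eigen_kappa(a=0.01, beta=-1.0, lo=1.0, hi=4.0)
```

With these sample values the eigenvalue $-\kappa^2$ comes out close to $-4/\beta^2 = -4$, in line with (2.8).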
Since $k = i\kappa$ corresponds to an isolated eigenvalue if and only if $\mathrm{Im}\, k > 0$, the assertion follows readily.

Proposition 2.1 also shows that if $\kappa > -2/\beta$, $\beta \ne 0$, is fixed, then there is $a_0(\kappa) > 0$ such that $-\Delta_{A_a,Y_a} + \kappa^2 > 0$ and the resolvent $(-\Delta_{A_a,Y_a} + \kappa^2)^{-1}$ exists for all $a \in (0, a_0(\kappa))$. We further note that the operator $-\Delta_{A_a,Y_a}$ admits a definition in the sense of quadratic forms. Denoting this quadratic form by $Q_{A_a,Y_a}[\cdot, \cdot]$ one has
\[ Q_{A_a,Y_a}[u, v] = (u', v') + \frac{\beta}{a^2}\, u(y) v(y) + \Bigl( \frac{2}{\beta} - \frac{1}{a} \Bigr) \bigl[ u(y + a) v(y + a) + u(y - a) v(y - a) \bigr] \tag{2.12} \]
for $u, v \in \mathrm{dom}(Q_{A_a,Y_a}) = W^{1,2}(\mathbb{R})$. When equipped with the scalar product
\[ (u, v)_{Q_{A_a,Y_a}} := \Bigl( \sqrt{-\Delta_{A_a,Y_a} + \kappa^2}\; u,\ \sqrt{-\Delta_{A_a,Y_a} + \kappa^2}\; v \Bigr), \tag{2.13} \]
where $\kappa > -2/\beta$ and $a \in (0, a_0(\kappa))$, the domain $\mathrm{dom}(Q_{A_a,Y_a})$ becomes a Hilbert space. It is important to note that the norm $\|\cdot\|_{Q_{A_a,Y_a}}$ arising from this scalar product is equivalent to the norm of the Hilbert space $W^{1,2}(\mathbb{R})$. Proposition 2.1 shows that up to an $O(a)$ error the spectral properties of $-\Delta_{A_a,Y_a}$ coincide with those of $\Xi_{\beta,y}$. Next we compare the corresponding resolvents.

Theorem 2.2. Let $\kappa \ne -2/\beta$ and $\beta \ne 0$ be fixed. Then the relation
\[ \lim_{a \to 0+} \bigl( -\Delta_{A_a,Y_a} + \kappa^2 \bigr)^{-1}(x, x') = \bigl( \Xi_{\beta,y} + \kappa^2 \bigr)^{-1}(x, x') \tag{2.14} \]
holds for any $x, x' \in \mathbb{R}$. Consequently, $-\Delta_{A_a,Y_a} \to \Xi_{\beta,y}$ as $a \to 0+$ in the norm-resolvent sense.

Proof. By virtue of (2.3), to check (2.14) it is sufficient to compute the pointwise limit of the second term at the right-hand side of (2.5). Using the notations introduced in the preceding proof, we obtain an explicit expression for the inverse matrix in (2.5):
\[ [\Gamma_a(i\kappa)]^{-1} = \frac{2\kappa}{(w^2 - 1 - u) \bigl[ (1 + u)(1 + v) - w^2 (1 - v) \bigr]} \begin{pmatrix} w^2 - (1 + u)(1 + v) & -w(w^2 - 1 - u) & w^2 v \\ -w(w^2 - 1 - u) & (w^2 + 1 + u)(w^2 - 1 - u) & -w(w^2 - 1 - u) \\ w^2 v & -w(w^2 - 1 - u) & w^2 - (1 + u)(1 + v) \end{pmatrix}. \tag{2.15} \]
Without loss of generality we may assume $y = 0$. Suppose, for instance, that $x, x' > a$, then the resolvent difference kernel is obtained by sandwiching the above matrix between the vectors $G(x)$, $G(x')$, where
\[ G(x) := \begin{pmatrix} G_{i\kappa}(x + a) \\ G_{i\kappa}(x) \\ G_{i\kappa}(x - a) \end{pmatrix} = \frac{1}{2\kappa}\, e^{-\kappa x} \begin{pmatrix} w \\ 1 \\ w^{-1} \end{pmatrix}, \tag{2.16} \]
which yields the expression
\[ \sum_{j,j'=-1,0,+1} [\Gamma_a(i\kappa)]^{-1}_{jj'}\, G_{i\kappa}(x - y_j)\, G_{i\kappa}(x' - y_{j'}) = \frac{1}{4\kappa^2}\, e^{-\kappa x} e^{-\kappa x'}\, \frac{N}{D} \tag{2.17} \]
with
\[ D = \frac{(w^2 - 1 - u) \bigl[ (1 + u)(1 + v) - w^2 (1 - v) \bigr]}{2\kappa} \tag{2.18} \]
and
\[ N = (w^2 + w^{-2}) \bigl[ w^2 - (1 + u)(1 + v) \bigr] + 2 w^2 v + (w^2 - 1 - u)(u - 1 - w^2) . \tag{2.19} \]
It is straightforward, if tedious, to compute the Taylor expansions of the denominator and numerator: we get
\[ D = -2\kappa^2 a^4 \bigl( \kappa + 2\beta^{-1} \bigr) + O(a^5), \tag{2.20} \]
while in the other expression all the terms cancel up to the third order giving
\[ N = 4\kappa^4 a^4 + O(a^5) . \tag{2.21} \]
The sought kernel is thus
\[ \sum_{j,j'=-1,0,1} [\Gamma_a(k)]^{-1}_{jj'}\, G_k(x - y_j)\, G_k(x' - y_{j'}) = -\frac{\beta}{2(2 + \beta\kappa)}\, e^{-\kappa x} e^{-\kappa x'}\, (1 + O(a)) \tag{2.22} \]
as expected. In the same way one can treat the other situations with $x, x'$ belonging to $(-\infty, -a)$, $(-a, 0)$, $(0, a)$, and $(a, \infty)$. In the coefficient this corresponds to different combinations of $(w, 1, w^{-1})$ and $(w^{-1}, 1, w)$ in (2.16). Due to the symmetry of $[\Gamma_a(i\kappa)]^{-1}$, however, there are just two different expressions, the other one having the numerator replaced by
\[ N' = (w^4 + 1) v + 2 \bigl[ w^2 - (1 + u)(1 + v) \bigr] + (w^2 - 1 - u)(u - 1 - w^2) \tag{2.23} \]
leading to
\[ N' = -4\kappa^4 a^4 + O(a^5) \tag{2.24} \]
and the correct kernel again; recall the sign factor in (2.2). This yields the relation (2.14). For a fixed $\kappa > 0$ we see from the relation (2.16) that its left-hand side can be majorized by a function from $L^2(\mathbb{R}^2)$ which is independent of $a$. The same is, of course, true for the last term in (2.3). Then by (2.3), (2.5), (2.17), and dominated convergence the resolvent converges in the Hilbert-Schmidt norm,
\[ \lim_{a \to 0} \bigl\| (-\Delta_{A_a,Y_a} + \kappa^2)^{-1} - (\Xi_{\beta,y} + \kappa^2)^{-1} \bigr\|_2 = 0, \tag{2.25} \]
and thus, a fortiori, $\{-\Delta_{A_a,Y_a}\}_{a \ge 0}$ approximates $\Xi_{\beta,y}$ in the norm-resolvent topology.

Remark 2.3. The result remains valid if the coupling constants $A_a$ are replaced by
\[ \alpha_{\pm 1}(a) = \frac{2}{\beta} - \frac{1}{a} + \varphi_1(a), \qquad \alpha_0(a) = \frac{\beta}{a^2}\, (1 + \varphi_0(a)), \tag{2.26} \]
where $\varphi_j$ are smooth functions behaving as $O(a)$ for $a \to 0+$.
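The limit (2.22) that underlies Theorem 2.2 can be probed numerically by evaluating the quotient $N/(4\kappa^2 D)$ from (2.17)–(2.19) at a small spacing and comparing it with $-\beta/(2(2 + \beta\kappa))$. The following Python sketch does this for the case $x, x' > a$; the sample values $\kappa = \beta = 1$, $a = 10^{-3}$ and the function names are assumptions made for illustration. Plain double precision is adequate here, but much smaller $a$ would suffer from cancellation, since $N$ and $D$ are both of order $a^4$.

```python
import math


def kernel_coefficient(kappa, a, beta):
    """Quotient N / (4 kappa^2 D) built from (2.7), (2.18) and (2.19)."""
    u = 2 * beta * kappa * a / (2 * a - beta)
    v = 2 * kappa * a ** 2 / beta
    w2 = math.exp(-2 * kappa * a)              # w^2 with w = exp(-kappa a)
    d = (w2 - 1 - u) * ((1 + u) * (1 + v) - w2 * (1 - v)) / (2 * kappa)
    n = ((w2 + 1 / w2) * (w2 - (1 + u) * (1 + v))
         + 2 * w2 * v
         + (w2 - 1 - u) * (u - 1 - w2))
    return n / (4 * kappa ** 2 * d)


def limit_coefficient(kappa, beta):
    """Coefficient of exp(-kappa x) exp(-kappa x') on the right-hand side of (2.22)."""
    return -beta / (2 * (2 + beta * kappa))


ratio = kernel_coefficient(1.0, 1e-3, 1.0) / limit_coefficient(1.0, 1.0)
```

For these sample values the ratio deviates from 1 by an amount of order $a$, as the factor $(1 + O(a))$ in (2.22) suggests.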
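3. Approximation of δ′ by Regular Potentials

It is easy to use the above result to prove the existence of an approximation of δ′ by local potentials. After a suitable translation we can put $y = 0$ and we seek the approximating potential in the form
\[ W^a_{\varepsilon,0}(x) = \frac{\beta}{\varepsilon\, a(\varepsilon)^2}\, V_0\Bigl( \frac{x}{\varepsilon} \Bigr) + \Bigl( \frac{2}{\beta} - \frac{1}{a(\varepsilon)} \Bigr) \Bigl[ \frac{1}{\varepsilon}\, V_{-1}\Bigl( \frac{x + a(\varepsilon)}{\varepsilon} \Bigr) + \frac{1}{\varepsilon}\, V_1\Bigl( \frac{x - a(\varepsilon)}{\varepsilon} \Bigr) \Bigr] ; \tag{3.1} \]
the general potential approximation $W^a_{\varepsilon,y}(x)$ is obtained by replacing $x$ by $x - y$ at the right-hand side. In this expression $\beta \in \mathbb{R} \setminus \{0\}$, and the involved potentials are supposed to satisfy
\[ V_j \in L^1(\mathbb{R}) \quad \text{and} \quad \int_{\mathbb{R}} V_j(x)\, dx = 1 \tag{3.2} \]
for $j = -1, 0, 1$. The function $a : \mathbb{R}_+ \to \mathbb{R}_+$, to be specified later, is supposed to be continuous at $\varepsilon = 0$ with $a(0) = 0$. The family of one-dimensional Schrödinger operators used to approximate $\Xi_{\beta,y}$ will be of the form
\[ H^a_{\varepsilon,y} := -\Delta + W^a_{\varepsilon,y} . \tag{3.3} \]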
If $V_j \in L^1(\mathbb{R})$ the r.h.s. is defined in the sense of the corresponding quadratic forms. If we add the requirement $V_j \in L^2(\mathbb{R})$, then $W^a_{\varepsilon,y}(x)$ is an infinitely small perturbation of the Laplacian and (3.3) as a self-adjoint operator is defined on $\mathrm{dom}(H^a_{\varepsilon,0}) = \mathrm{dom}(-\Delta) = W^{2,2}(\mathbb{R})$ as an operator sum. We will make this assumption everywhere in the following, except for Theorem 3.1 where we refer directly to a result in [AGHH]. To compare the resolvents, we choose $k = i\kappa$ such that $k^2$ belongs to the resolvent sets of both $H^a_{\varepsilon,y}$ and the operator $\Xi_{\beta,y}$ introduced above; this can be achieved if $k^2$ is nonreal or with $\kappa > 0$ large enough. Then we may employ the elementary estimate
\[ \bigl\| (H^a_{\varepsilon,y} + \kappa^2)^{-1} - (\Xi_{\beta,y} + \kappa^2)^{-1} \bigr\| \le \bigl\| (H^a_{\varepsilon,y} + \kappa^2)^{-1} - (-\Delta_{A_{a(\varepsilon)},Y_{a(\varepsilon)}} + \kappa^2)^{-1} \bigr\| + \bigl\| (-\Delta_{A_{a(\varepsilon)},Y_{a(\varepsilon)}} + \kappa^2)^{-1} - (\Xi_{\beta,y} + \kappa^2)^{-1} \bigr\| \tag{3.4} \]
to prove the following claim:

Theorem 3.1. Let $V_j \in L^1(\mathbb{R})$, $j = -1, 0, 1$. For any sequence $\{a_n\} \subset (0, \infty)$ with $a_n \to 0$ there is a sequence $\{\varepsilon_n\}$ of positive numbers with $\varepsilon_n \to 0$ such that
\[ \lim_{n \to \infty} \bigl\| (H^{a_n}_{\varepsilon_n,y} + \kappa^2)^{-1} - (\Xi_{\beta,y} + \kappa^2)^{-1} \bigr\| = 0 \tag{3.5} \]
holds for any $\kappa > 2|\beta|^{-1}$.

Proof. Without loss of generality we may put $y = 0$. In view of Theorem 2.2 it is sufficient to deal with the first term at the right-hand side of (3.4). By [AGHH, Thm. II.2.2.2] for each $a_n > 0$, $n = 1, 2, \ldots$, there exists a sequence $\{\varepsilon_{nm}\}_{m=1}^{\infty}$ with $\lim_{m \to \infty} \varepsilon_{nm} = 0$ such that
\[ \lim_{m \to \infty} \bigl\| (H^{a_n}_{\varepsilon_{nm},0} + \kappa^2)^{-1} - (-\Delta_{A_{a_n},Y_{a_n}} + \kappa^2)^{-1} \bigr\| = 0 , \tag{3.6} \]
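where $Y_{a_n} = \{y_j\}_{j=-1,0,1} = \{-a_n, 0, a_n\}$, $A_{a_n} = \{\alpha_j^{(n)}\}_{j=-1,0,1} = (2\beta^{-1} - a_n^{-1},\ \beta a_n^{-2},\ 2\beta^{-1} - a_n^{-1})$ and $\{H^{a_n}_{\varepsilon_{nm},0}\}_{n \ge 1}$ are defined by local potentials
\[ W^{a_n}_{\varepsilon_{nm},0}(x) = \frac{\beta}{\varepsilon_{nm}\, a_n^2}\, V_0\Bigl( \frac{x}{\varepsilon_{nm}} \Bigr) + \Bigl( \frac{2}{\beta} - \frac{1}{a_n} \Bigr) \Bigl[ \frac{1}{\varepsilon_{nm}}\, V_{-1}\Bigl( \frac{x + a_n}{\varepsilon_{nm}} \Bigr) + \frac{1}{\varepsilon_{nm}}\, V_1\Bigl( \frac{x - a_n}{\varepsilon_{nm}} \Bigr) \Bigr] . \tag{3.7} \]
Indeed, in view of (3.2), Theorem II.2.2.2 of [AGHH] applies if we choose the real analytic function $\lambda_j(\cdot)$, which enters into Theorem II.2.2.2, of the form $\lambda_j(\varepsilon_{nm}) := \varepsilon_{nm}\, \alpha_j^{(n)}$. If $\mathrm{Im}\, k^2 \ne 0$, the norms at the right-hand side of (3.6) are uniformly bounded and the claim is valid for the diagonal sequence, $\varepsilon_n := \varepsilon_{nn}$ (cf. [RS, Sect. I.3]). By the first resolvent identity its validity extends to any point outside the spectrum of $\Xi_{\beta,0}$.

The diagonal trick used in the above proof introduces a relation between the parameters $a$ and $\varepsilon$. Since to a given $a$ we choose $\varepsilon$ small enough to meet the requirements, the procedure works if $a(\varepsilon)$ tends to zero sufficiently slowly as $\varepsilon \to 0+$. Put like that the claim is, of course, very vague. Even without computing the resolvents, e.g., we can conjecture that the family (3.3) will not yield the sought approximation if $a(\varepsilon) \sim \varepsilon^{\nu}$ with $\nu > 1$ since then the three potentials will overlap substantially for small values of $\varepsilon$ and eventually the (divergent) overall mean value will prevail. The question about a rate between $a$ and $\varepsilon$ which is sufficient to yield a convergent approximation is subtle, and the rest of the section is devoted to it. As above we put $y = 0$ in the following argument restoring a general $y$ only in the final result. First we introduce the sesquilinear forms $t^{(0)}_{a,\varepsilon}[\cdot, \cdot]$,
\[ t^{(0)}_{a,\varepsilon}[u, v] := \frac{\beta}{a^2} \Bigl( u(0) v(0) - \frac{1}{\varepsilon} \int_{-\infty}^{+\infty} dx\, V_0(x/\varepsilon)\, u(x) v(x) \Bigr), \]
and $t^{(j)}_{a,\varepsilon}[\cdot, \cdot]$,
\[ t^{(j)}_{a,\varepsilon}[u, v] := \Bigl( \frac{2}{\beta} - \frac{1}{a} \Bigr) \Bigl( u(ja) v(ja) - \frac{1}{\varepsilon} \int_{-\infty}^{+\infty} dx\, V_j\bigl( (x - ja)/\varepsilon \bigr)\, u(x) v(x) \Bigr), \]
where $j = \pm 1$ and $\mathrm{dom}(t^{(j)}_{a,\varepsilon}) = \mathrm{dom}(t^{(0)}_{a,\varepsilon}) = W^{1,2}(\mathbb{R})$. We set
\[ t_{a,\varepsilon}[\cdot, \cdot] := t^{(0)}_{a,\varepsilon}[\cdot, \cdot] + t^{(-1)}_{a,\varepsilon}[\cdot, \cdot] + t^{(+1)}_{a,\varepsilon}[\cdot, \cdot] \]
with $\mathrm{dom}(t_{a,\varepsilon}) = W^{1,2}(\mathbb{R})$. To proceed further we need stronger hypotheses about the potentials, namely the conditions (3.8) and (3.11) below. It can be shown that in combination with $V_j \in L^2(\mathbb{R})$ they imply $V_j \in L^1(\mathbb{R})$.

Lemma 3.2. Let $V_0 \in L^2(\mathbb{R})$. If the conditions (3.2) and
\[ \int_{-\infty}^{+\infty} dx\, |x|^{1/2}\, |V_0(x)| < +\infty \tag{3.8} \]
are valid, then
\[ \bigl| t^{(0)}_{a,\varepsilon}[u, v] \bigr| \le \sqrt{2}\, \sqrt{\varepsilon}\, |\beta|\, a^{-2} \int_{-\infty}^{+\infty} dx\, |x|^{1/2}\, |V_0(x)|\ \|u\|_{W^{1,2}} \|v\|_{W^{1,2}} \]
holds for $u, v \in W^{1,2}(\mathbb{R})$.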
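Proof. Changing the integration variable $x \to \varepsilon x$ in the definition of $t^{(0)}_{a,\varepsilon}[u, v]$ and using (3.2) we get
\[ t^{(0)}_{a,\varepsilon}[u, v] = \frac{\beta}{a^2} \int_{-\infty}^{+\infty} dx\, V_0(x) \bigl( u(0) v(0) - u(\varepsilon x) v(\varepsilon x) \bigr), \]
which yields
\[ t^{(0)}_{a,\varepsilon}[u, v] = \frac{\beta}{a^2} \int_{-\infty}^{+\infty} dx\, V_0(x) \bigl[ (u(0) - u(\varepsilon x)) v(0) + u(\varepsilon x) (v(0) - v(\varepsilon x)) \bigr] . \]
Since
\[ |f(x)| \le \frac{1}{\sqrt{2}}\, \|f\|_{W^{1,2}}, \qquad f \in W^{1,2}(\mathbb{R}), \tag{3.9} \]
and
\[ |f(x) - f(y)| \le \sqrt{|x - y|}\ \|f\|_{W^{1,2}}, \qquad f \in W^{1,2}(\mathbb{R}), \tag{3.10} \]
as it follows from $f(x) - f(y) = -\int_x^y f'(t)\, dt$ and the Hölder inequality, we find
\[ \bigl| t^{(0)}_{a,\varepsilon}[u, v] \bigr| \le \sqrt{2}\, \frac{|\beta|}{a^2} \int_{-\infty}^{+\infty} dx\, \sqrt{\varepsilon |x|}\ |V_0(x)|\ \|u\|_{W^{1,2}} \|v\|_{W^{1,2}} \]
for $u, v \in W^{1,2}(\mathbb{R})$ which proves the lemma.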
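Lemma 3.3. Let $V_j \in L^2(\mathbb{R})$, $j = \pm 1$, and $\beta \ne 0$. If the conditions (3.2) and
\[ \int_{-\infty}^{+\infty} dx\, |x|^{1/2}\, |V_j(x)| < +\infty, \qquad j = \pm 1, \tag{3.11} \]
are satisfied, then
\[ \bigl| t^{(j)}_{a,\varepsilon}[u, v] \bigr| \le \sqrt{2}\, \sqrt{\varepsilon}\, \Bigl| \frac{2}{\beta} - \frac{1}{a} \Bigr| \int_{-\infty}^{+\infty} dx\, |x|^{1/2}\, |V_j(x)|\ \|u\|_{W^{1,2}} \|v\|_{W^{1,2}} \tag{3.12} \]
holds for any $u, v \in W^{1,2}(\mathbb{R})$ and $j = \pm 1$.

Proof. Let $j = -1$. Changing the integration variable $x \to \varepsilon x - a$ in the definition of $t^{(-1)}_{a,\varepsilon}[u, v]$ we get
\[ t^{(-1)}_{a,\varepsilon}[u, v] = \Bigl( \frac{2}{\beta} - \frac{1}{a} \Bigr) \int_{-\infty}^{+\infty} dx\, V_{-1}(x) \bigl( u(-a) v(-a) - u(\varepsilon x - a) v(\varepsilon x - a) \bigr) . \]
From here we infer
\[ t^{(-1)}_{a,\varepsilon}[u, v] = \Bigl( \frac{2}{\beta} - \frac{1}{a} \Bigr) \int_{-\infty}^{+\infty} dx\, V_{-1}(x) \bigl[ (u(-a) - u(\varepsilon x - a)) v(-a) + u(\varepsilon x - a) (v(-a) - v(\varepsilon x - a)) \bigr] . \tag{3.13} \]
Using again (3.9) and (3.10) we complete the proof.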
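Corollary 3.4. Let $V_j \in L^2(\mathbb{R})$, $j = -1, 0, +1$, and $\beta \ne 0$. If the potentials $V_j$ satisfy the conditions (3.2), (3.8), (3.11), then the estimate
\[ \bigl| t_{a,\varepsilon}[u, v] \bigr| \le \sqrt{\varepsilon}\, C(a)\, \|u\|_{W^{1,2}} \|v\|_{W^{1,2}} \]
is valid for $u, v \in \mathrm{dom}(t_{a,\varepsilon}) = W^{1,2}(\mathbb{R})$, where the constant $C(a)$ is given by
\[ C(a) := \sqrt{2}\, \Bigl\{ \frac{|\beta|}{a^2} \int_{-\infty}^{+\infty} dx\, |x|^{1/2}\, |V_0(x)| + \Bigl| \frac{2}{\beta} - \frac{1}{a} \Bigr| \int_{-\infty}^{+\infty} dx\, |x|^{1/2}\, \bigl\{ |V_{-1}(x)| + |V_{+1}(x)| \bigr\} \Bigr\} . \tag{3.14} \]
Let us next introduce the operator $G(a) : L^2(\mathbb{R}) \to \mathbb{C}^3$,
\[ G(a) f := \begin{pmatrix} \int_{-\infty}^{+\infty} dx\, G_{i\kappa}(x + a) f(x) \\ \int_{-\infty}^{+\infty} dx\, G_{i\kappa}(x) f(x) \\ \int_{-\infty}^{+\infty} dx\, G_{i\kappa}(x - a) f(x) \end{pmatrix} \]
for $f \in \mathrm{dom}(G(a)) = L^2(\mathbb{R})$. Obviously, the action of the adjoint operator $G(a)^* : \mathbb{C}^3 \to L^2(\mathbb{R})$ is given by
\[ (G(a)^* \xi)(x) = G_{i\kappa}(x + a)\, \xi_{-1} + G_{i\kappa}(x)\, \xi_0 + G_{i\kappa}(x - a)\, \xi_{+1}, \]
where
\[ \xi := \begin{pmatrix} \xi_{-1} \\ \xi_0 \\ \xi_{+1} \end{pmatrix} \in \mathbb{C}^3 . \]
With these definitions the r.h.s. of (2.5) can be rewritten as
\[ (-\Delta_{A_a,Y_a} + \kappa^2)^{-1} f = (H_0 + \kappa^2)^{-1} f + G(a)^*\, \Gamma_a(i\kappa)^{-1}\, G(a) f , \tag{3.15} \]
where $Y_a = \{y_j\}_{j=-1,0,+1}$ with $y_j = ja$ and the matrix $\Gamma_a(i\kappa)$ is given by (2.6). Furthermore, we introduce the operator $\hat{G}(a)$:
\[ \hat{G}(a) f := \begin{pmatrix} (H_0 + \kappa^2)^{-1/2} f \\ G(a) f \end{pmatrix} : L^2(\mathbb{R}) \longrightarrow L^2(\mathbb{R}) \oplus \mathbb{C}^3 \tag{3.16} \]
and the operator $\hat{\Gamma}_a(i\kappa)$:
\[ \hat{\Gamma}_a(i\kappa) := \begin{pmatrix} I & 0 \\ 0 & \Gamma_a(i\kappa) \end{pmatrix} : L^2(\mathbb{R}) \oplus \mathbb{C}^3 \longrightarrow L^2(\mathbb{R}) \oplus \mathbb{C}^3 . \tag{3.17} \]
Using the definitions (3.16) and (3.17) we can rewrite (3.15) as
\[ (-\Delta_{A_a,Y_a} + \kappa^2)^{-1} f = \hat{G}(a)^*\, \hat{\Gamma}_a(i\kappa)^{-1}\, \hat{G}(a) f . \]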
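Since $G_{i\kappa}(x-ja)\in W^{1,2}(\mathbb{R})$ for $j=-1,0,+1$, one gets that $\mathrm{ran}(G(a)^*)\subseteq W^{1,2}(\mathbb{R})$, and consequently, $\mathrm{ran}(\hat G(a)^*)\subseteq W^{1,2}(\mathbb{R})$. Thus it makes sense to define the following sesquilinear form
\[
d_{a,\varepsilon}[\hat\xi,\hat\eta] := t_{a,\varepsilon}[\hat G(a)^*\hat\xi,\hat G(a)^*\hat\eta], \qquad \hat\xi,\hat\eta\in\mathrm{dom}(d_{a,\varepsilon}) = \hat{\mathcal{H}} := L^2(\mathbb{R})\oplus\mathbb{C}^3,
\]
where
\[
\hat\xi := \begin{pmatrix} f\\ \xi\end{pmatrix} \qquad\text{and}\qquad \hat\eta := \begin{pmatrix} g\\ \eta\end{pmatrix}
\]
with $f,g\in L^2(\mathbb{R})$ and $\xi,\eta\in\mathbb{C}^3$. By construction, the form $d_{a,\varepsilon}[\cdot,\cdot]$ defines a bounded operator $D_{a,\varepsilon}:\hat{\mathcal{H}}\to\hat{\mathcal{H}}$.

Lemma 3.5. Let $V_j\in L^2(\mathbb{R})$, $j=-1,0,+1$, and $\beta\neq0$. If the potentials $V_j$ satisfy the conditions (3.2), (3.8) and (3.11) and $\kappa\ge1$, then one has
\[
\|D_{a,\varepsilon}\|_{\mathcal{B}(\hat{\mathcal{H}},\hat{\mathcal{H}})} \le 4\sqrt{\varepsilon}\,C(a) \tag{3.18}
\]
for $a>0$.

Proof. Using Corollary 3.4 we find
\[
|d_{a,\varepsilon}[\hat\xi,\hat\eta]| = |t_{a,\varepsilon}[\hat G(a)^*\hat\xi,\hat G(a)^*\hat\eta]| \le \sqrt{\varepsilon}\,C(a)\,\|\hat G(a)^*\hat\xi\|_{W^{1,2}}\,\|\hat G(a)^*\hat\eta\|_{W^{1,2}} .
\]
Since
\[
(\hat G(a)^*\hat\xi)(x) = \big((H_0+\kappa^2)^{-1/2}f\big)(x) + G_{i\kappa}(x+a)\xi_{-1} + G_{i\kappa}(x)\xi_{0} + G_{i\kappa}(x-a)\xi_{+1},
\]
we have
\[
\|\hat G(a)^*\hat\xi\|^2_{W^{1,2}} \le 4\Big(\|(H_0+\kappa^2)^{-1/2}f\|^2_{W^{1,2}} + \|G_{i\kappa}(\cdot+a)\xi_{-1}\|^2_{W^{1,2}} + \|G_{i\kappa}(\cdot)\xi_{0}\|^2_{W^{1,2}} + \|G_{i\kappa}(\cdot-a)\xi_{+1}\|^2_{W^{1,2}}\Big).
\]
The assumption $\kappa\ge1$ yields
\[
\|(H_0+\kappa^2)^{-1/2}f\|^2_{W^{1,2}} \le \|f\|^2, \qquad f\in L^2(\mathbb{R}),
\]
and
\[
\|G_{i\kappa}(\cdot-ja)\xi_j\|^2_{W^{1,2}} = \frac14\big(\kappa^{-1}+\kappa^{-3}\big)|\xi_j|^2 \le |\xi_j|^2, \qquad j=-1,0,+1,
\]
for $a\ge0$. In this way we get the estimate
\[
\|\hat G(a)^*\hat\xi\|^2_{W^{1,2}} \le 4\big(\|f\|^2+\|\xi\|^2_{\mathbb{C}^3}\big) \le 4\|\hat\xi\|^2_{\hat{\mathcal{H}}}
\]
for $a\ge0$. This leads to the estimate
\[
|d_{a,\varepsilon}[\hat\xi,\hat\eta]| = |t_{a,\varepsilon}[\hat G(a)^*\hat\xi,\hat G(a)^*\hat\eta]| \le 4\sqrt{\varepsilon}\,C(a)\,\|\hat\xi\|_{\hat{\mathcal{H}}}\,\|\hat\eta\|_{\hat{\mathcal{H}}},
\]
from which (3.18) follows readily.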
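The norm identity for the translated Green's functions used in this proof can be verified by direct quadrature. The sketch below assumes the standard one-dimensional free resolvent kernel $G_{i\kappa}(x)=e^{-\kappa|x|}/(2\kappa)$ (the definition (2.1) is not reproduced in this part of the text); the function names, cutoff and grid size are illustrative choices.

```python
import math

def trapezoid(g, lo, hi, n):
    """Composite trapezoidal rule for g on [lo, hi] with n subintervals."""
    h = (hi - lo) / n
    total = 0.5 * (g(lo) + g(hi))
    for k in range(1, n):
        total += g(lo + k * h)
    return total * h

def free_kernel(x, kappa):
    """Assumed free resolvent kernel G_{i kappa}(x) = exp(-kappa |x|) / (2 kappa)."""
    return math.exp(-kappa * abs(x)) / (2.0 * kappa)

def w12_norm_sq_numeric(kappa, cutoff=40.0, n=80_000):
    """Quadrature value of ||G||^2 + ||G'||^2 in L^2(R).

    For x != 0 one has G'(x)^2 = kappa^2 * G(x)^2, so the integrand
    is (1 + kappa^2) * G(x)^2.
    """
    integrand = lambda x: (1.0 + kappa ** 2) * free_kernel(x, kappa) ** 2
    return trapezoid(integrand, -cutoff, cutoff, n)

def w12_norm_sq_closed_form(kappa):
    """Closed form (1/4)(kappa^{-1} + kappa^{-3}) quoted in the proof."""
    return 0.25 * (1.0 / kappa + 1.0 / kappa ** 3)
```

Translation by $ja$ does not change the norm, which is why the bound is uniform in $a\ge0$; for $\kappa\ge1$ the closed form is at most $1/2$, consistent with the estimate by $|\xi_j|^2$.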
Let us further introduce the Neumann iterations $R^{(n)}_{a,\varepsilon}(i\kappa)$ defined by
n (n) ˆ ∗ (a)ˆ a (iκ)−1 Da, ˆ a (iκ)−1 G(a), ˆ (iκ) := G n = 0, 1, 2, . . . Ra, for k > max(−2/β, 1) and a ∈ (0, a0 (κ)). The meaning of these expressions will become clear below; we note that (0) Ra, (iκ) = (− Aa ,Ya + κ 2 )−1 .
(3.19)
We also need to know how the norm of a (iκ)−1 behaves as a → 0. The Taylor expansion for all the expressions contained in (2.15) yields [a (iκ)]−1 =
2βa −2 2 + βκ
2κβ −1 2κ(κ + β −1 ) −2κ(κ + 2β −1 ) × −2κ(κ + 2β −1 ) 4κ(κ + 2β −1 ) −2κ(κ + 2β −1 ) (1 + O(a)) . 2κβ −1 −2κ(κ + 2β −1 ) 2κ(κ + β −1 )
Consequently, for κ > max(−2/β, 1) there is a constant C (κ) > 0 such that a (iκ)−1 3 3 ≤ C (κ) a −2 B(C ,C )
(3.20)
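holds for any $a\in(0,a_0(\kappa))$.

Lemma 3.6. Let $V_j\in L^2(\mathbb{R})$, $j=-1,0,+1$, and $\kappa>\max(-2/\beta,1)$, $\beta\neq0$. If the potentials $V_j$ satisfy the conditions (3.2), (3.8) and (3.11), then the Neumann iterations obey the estimate
\[
\|R^{(n)}_{a,\varepsilon}(i\kappa)\| \le \frac{2C'(\kappa)}{a^{2}}\bigg(\frac{4\sqrt{\varepsilon}\,C'(\kappa)\,C(a)}{a^{2}}\bigg)^{n} \tag{3.21}
\]
for $a\in(0,a_0(\kappa))$ and $n=1,2,\dots$, where $C(a)$ is given by (3.14).

Proof. Since $\kappa>1$, we have
\[
\|\hat G(a)\|_{\mathcal{B}(\mathcal{H},\hat{\mathcal{H}})} = \|\hat G^*(a)\|_{\mathcal{B}(\hat{\mathcal{H}},\mathcal{H})} \le \sqrt{2}.
\]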
n+1 (n) An elementary estimate, Ra, (iκ) ≤ 2 a (iκ)−1 B(C3 ,C3 ) Da, n
B(Hˆ ,Hˆ )
, gives
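\[
\|R^{(n)}_{a,\varepsilon}(i\kappa)\| \le 2\cdot4^{n}\,\varepsilon^{n/2}\,C'(\kappa)^{n+1}\,C(a)^{n}\,a^{-(2n+2)},
\]
so (3.21) follows readily.

If $\kappa>\max\{-2/\beta,1\}$ and the condition
\[
\tau(\varepsilon,a,\kappa) := \frac{4\sqrt{\varepsilon}\,C'(\kappa)\,C(a)}{a^{2}} < 1 \tag{3.22}
\]
is satisfied for some $a\in(0,a_0(\kappa))$, then the operator $R_{a,\varepsilon}(i\kappa)$,
\[
R_{a,\varepsilon}(i\kappa) := \sum_{n=0}^{\infty}R^{(n)}_{a,\varepsilon}(i\kappa),
\]
is well defined. We denote the closed quadratic form which is associated with the self-adjoint operator $H^{a}_{\varepsilon,0}$ by $h^{a}_{\varepsilon,0}[\cdot,\cdot]$. Obviously, its domain is $\mathrm{dom}(h^{a}_{\varepsilon,0}) = W^{1,2}(\mathbb{R})$; we note that the natural norm $\|\cdot\|_{h^{a}_{\varepsilon,0}}$ on $\mathrm{dom}(h^{a}_{\varepsilon,0})$ is equivalent to the norm of the Hilbert space $W^{1,2}(\mathbb{R})$.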
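The series defining $R_{a,\varepsilon}(i\kappa)$ is a Neumann series: each term consists of an inverse followed by $n$ copies of (perturbation times inverse), and under a smallness condition such as (3.22) the sum converges geometrically. A finite-dimensional sketch of this mechanism is given below. The $2\times2$ matrices `L` and `D` are hypothetical stand-ins chosen only for illustration, with no relation to the actual operators of the paper; the identity being checked is $\sum_{n\ge0}L^{-1}(DL^{-1})^{n} = (L-D)^{-1}$.

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(A, B):
    """Entrywise sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

def inverse(A):
    """Inverse of an invertible 2x2 matrix."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]

def neumann_partial_sum(L, D, n_terms):
    """Partial sum  sum_{n=0}^{n_terms} L^{-1} (D L^{-1})^n."""
    L_inv = inverse(L)
    step = matmul(D, L_inv)
    term = L_inv          # n = 0 term
    total = L_inv
    for _ in range(n_terms):
        term = matmul(term, step)   # next term: previous term times D L^{-1}
        total = matadd(total, term)
    return total

# Hypothetical toy data: D is small compared with L, so the series converges.
L = [[2.0, 0.0], [0.0, 4.0]]
D = [[0.3, 0.1], [0.1, -0.2]]
approx = neumann_partial_sum(L, D, 60)
exact = inverse([[L[i][j] - D[i][j] for j in range(2)] for i in range(2)])
```

In this toy setting the norm of `D` times the inverse of `L` plays the role of $\tau(\varepsilon,a,\kappa)$: the smaller it is, the fewer terms are needed for a given accuracy.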
Lemma 3.7. Let $V_j\in L^2(\mathbb{R})$, $j=-1,0,+1$, and $\kappa>\max(-2/\beta,1)$, $\beta\neq0$. If the potentials $V_j$ satisfy the conditions (3.2), (3.8), (3.11), and $\tau(\varepsilon,a,\kappa)<1$ is valid for some $a\in(0,a_0(\kappa))$, then $-\kappa^2$ belongs to the resolvent set of the operator $H^{a}_{\varepsilon,0}$ given by (3.3), and moreover, one has
\[
(H^{a}_{\varepsilon,0}+\kappa^2)^{-1} = R_{a,\varepsilon}(i\kappa). \tag{3.23}
\]
Proof. Combining the above definitions of the quadratic forms, we get − Aa ,Ya + κ 2 u, − Aa ,Ya + κ 2 v = ha,0 [u, v] + κ 2 (u, v) + ta, [u, v] (3.24) −1 for u, v ∈ W 1,2 (R). We use this relation for u = Ra, (iκ)f and v = − Aa ,Ya + κ 2 g with f, g ∈ L2 (R). Since ˆ ∗ ˆ a (iκ)−1 Ra, (iκ) = G(a)
∞
Da, ˆ a (iκ)−1
n
ˆ G(a)
(3.25)
n=0
ˆ ∗ ) ⊆ W 1,2 (R) we get u ∈ W 1,2 (R). Since v = − A ,Y + κ 2 −1 g ∈ and ran(G(a) a a W 1,2 (R), we can insert u and v into (3.24). This yields (Ra, (iκ)f, g) = ha,0 [Ra, (iκ)f, (− Aa ,Ya + κ 2 )−1 g] + κ 2 (Ra, (iκ)f, (− Aa ,Ya + κ 2 )−1 g)
+ ta, [Ra, (iκ)f, (− Aa ,Ya + κ 2 )−1 g].
Using (3.19) and (3.25) we find ta, [Ra, (iκ)f, (− Aa ,Ya + κ 2 )−1 g] ∞ n −1 ˆ ˆ ˆ ∗ (iκ) ˆ ∗ ˆ a (iκ)−1 ˆ = ta, [G(a) Da, ˆ a (iκ)−1 G(a)f, G(a) G(a)g] n=0
= Da, ˆ a (iκ)−1 =
∞ n=1
∞
Da, ˆ a (iκ)−1
n=0
(n) Ra, (iκ)f, g
.
n
−1 ˆ ˆ ˆ G(a)f, (iκ) G(a)g
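Furthermore, from (3.19) we infer that
\[
\big((-\Delta_{A_a,Y_a}+\kappa^2)^{-1}f, g\big) = h^{a}_{\varepsilon,0}\big[R_{a,\varepsilon}(i\kappa)f, (-\Delta_{A_a,Y_a}+\kappa^2)^{-1}g\big] + \kappa^2\big(R_{a,\varepsilon}(i\kappa)f, (-\Delta_{A_a,Y_a}+\kappa^2)^{-1}g\big).
\]
Setting now $h := (-\Delta_{A_a,Y_a}+\kappa^2)^{-1}g$ we find
\[
(f,h) = h^{a}_{\varepsilon,0}[R_{a,\varepsilon}(i\kappa)f, h] + \kappa^2\big(R_{a,\varepsilon}(i\kappa)f, h\big) \tag{3.26}
\]
for $h\in\mathrm{dom}(-\Delta_{A_a,Y_a})$. Since $\mathrm{dom}(-\Delta_{A_a,Y_a})$ is a core for the quadratic form $h^{a}_{\varepsilon,0}[\cdot,\cdot]$ one concludes that the equality (3.26) extends to each $h\in\mathrm{dom}(h^{a}_{\varepsilon,0})$. In particular, if $h\in\mathrm{dom}(H^{a}_{\varepsilon,0})$ we have
\[
(f,h) = \big(R_{a,\varepsilon}(i\kappa)f, (H^{a}_{\varepsilon,0}+\kappa^2)h\big).
\]
In this way we find $R_{a,\varepsilon}(i\kappa)f\in\mathrm{dom}(H^{a}_{\varepsilon,0})$ and
\[
(H^{a}_{\varepsilon,0}+\kappa^2)R_{a,\varepsilon}(i\kappa)f = f, \qquad f\in\mathcal{H},
\]
and
\[
R_{a,\varepsilon}(i\kappa)(H^{a}_{\varepsilon,0}+\kappa^2)h = h, \qquad h\in\mathrm{dom}(H^{a}_{\varepsilon,0}).
\]
Hence $\ker(H^{a}_{\varepsilon,0}+\kappa^2)=\{0\}$ and $\mathrm{ran}(H^{a}_{\varepsilon,0}+\kappa^2)=\mathcal{H}$, so the operator $H^{a}_{\varepsilon,0}+\kappa^2$ is boundedly invertible and $(H^{a}_{\varepsilon,0}+\kappa^2)^{-1} = R_{a,\varepsilon}(i\kappa)$.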
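With the help of Lemma 3.7 one can prove the following estimate.

Lemma 3.8. Let $V_j\in L^2(\mathbb{R})$, $j=-1,0,+1$, and $\kappa>\max(-2/\beta,1)$, $\beta\neq0$. If the potentials $V_j$ satisfy the conditions (3.2), (3.8) and (3.11) and $\tau(\varepsilon,a,\kappa)<1$ is valid for some $a\in(0,a_0(\kappa))$, then
\[
\big\|(H^{a}_{\varepsilon,0}+\kappa^2)^{-1} - (-\Delta_{A_a,Y_a}+\kappa^2)^{-1}\big\| \le \frac{2C'(\kappa)}{a^{2}}\,\tau(\varepsilon,a,\kappa)\,\big(1-\tau(\varepsilon,a,\kappa)\big)^{-1}.
\]

Proof. Taking into account (3.23) and (3.19) we find
\[
(H^{a}_{\varepsilon,0}+\kappa^2)^{-1} - (-\Delta_{A_a,Y_a}+\kappa^2)^{-1} = \sum_{n=1}^{\infty}R^{(n)}_{a,\varepsilon}(i\kappa).
\]
Using the notation (3.22) and taking into account the estimate (3.21) one gets
\[
\big\|(H^{a}_{\varepsilon,0}+\kappa^2)^{-1} - (-\Delta_{A_a,Y_a}+\kappa^2)^{-1}\big\| \le \frac{2C'(\kappa)}{a^{2}}\sum_{n=1}^{\infty}\tau(\varepsilon,a,\kappa)^{n}.
\]
If $\tau(\varepsilon,a,\kappa)<1$ is satisfied, summing the geometric series, $\sum_{n\ge1}\tau^{n} = \tau(1-\tau)^{-1}$, gives the claimed estimate.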
Now we are ready to say something about the rate of the potential approximation in terms of the relation between $a$ and $\varepsilon$. Consider a function $a: (0,\infty)\to(0,\infty)$.
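To get a feeling for what such a relation between $a$ and $\varepsilon$ means, one can look at a power law. The sketch below assumes $a(\varepsilon)=\varepsilon^{\gamma}$, which is merely one illustrative choice and is not singled out in the paper. It evaluates the quantity $\sqrt{\varepsilon}/a(\varepsilon)^{6}$ that controls the error in the proof of Theorem 3.9, together with the ratio $\varepsilon/a(\varepsilon)^{12}$ appearing in condition (3.27). For this power law both tend to zero exactly when $\gamma<1/12$.

```python
import math

def error_scale(eps, gamma):
    """sqrt(eps) / a(eps)^6 for the assumed power law a(eps) = eps**gamma."""
    a = eps ** gamma
    return math.sqrt(eps) / a ** 6

def condition_ratio(eps, gamma):
    """eps / a(eps)^12, the ratio required to vanish in (3.27)."""
    a = eps ** gamma
    return eps / a ** 12

# gamma = 0.05 < 1/12: the error scale shrinks as eps decreases.
slow_shrink = [error_scale(eps, 0.05) for eps in (1e-6, 1e-12)]
# gamma = 0.10 > 1/12: a(eps) shrinks too fast and the error scale grows.
fast_shrink = [error_scale(eps, 0.10) for eps in (1e-6, 1e-12)]
```

In other words, the spacing $a$ of the three approximating potentials has to shrink sufficiently slowly compared with their width $\varepsilon$.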
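Theorem 3.9. Let $V_j\in L^2(\mathbb{R})$, $j=-1,0,+1$, and $\kappa>\max(-2/\beta,1)$, $\beta\neq0$. Moreover, suppose that $a(\varepsilon)\to0$ as $\varepsilon\to0+$. If the potentials $V_j$ satisfy the conditions (3.2), (3.8) and (3.11) for $j=-1,0,+1$, and
\[
\lim_{\varepsilon\to0}\frac{\varepsilon}{a(\varepsilon)^{12}} = 0, \tag{3.27}
\]
then
\[
\lim_{\varepsilon\to0}\big\|(H^{a}_{\varepsilon,y}+\kappa^2)^{-1} - (-\Delta_{A_{a(\varepsilon)},Y_{a(\varepsilon)}}+\kappa^2)^{-1}\big\| = 0 \tag{3.28}
\]
and
\[
\lim_{\varepsilon\to0}\big\|(H^{a}_{\varepsilon,y}+\kappa^2)^{-1} - (\Xi_{\beta,y}+\kappa^2)^{-1}\big\| = 0. \tag{3.29}
\]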
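Proof. Since $H^{a}_{\varepsilon,y}$ is unitarily equivalent to $H^{a}_{\varepsilon,0}$ by translation and the same is true for the other involved operators, we can again put $y=0$ without loss of generality. By assumption, $a(\varepsilon)\in(0,a_0(\kappa))$ for $\varepsilon$ sufficiently small. Further, we note that there is a constant $C = C(V_j,\beta)$ such that $C(a)\le Ca^{-2}$ for all sufficiently small $a>0$. Using that we can estimate
\[
\tau(\varepsilon,a(\varepsilon),\kappa) \le 4C'(\kappa)\,C\,\frac{\sqrt{\varepsilon}}{a(\varepsilon)^{4}} = 4C'(\kappa)\,C\,a(\varepsilon)^{2}\,\frac{\sqrt{\varepsilon}}{a(\varepsilon)^{6}},
\]
so $\lim_{\varepsilon\to0+}\tau(\varepsilon,a(\varepsilon),\kappa)=0$ by (3.27) and $\lim_{\varepsilon\to0}a(\varepsilon)=0$. Hence, $\tau(\varepsilon,a(\varepsilon),\kappa)<1$ holds for $\varepsilon$ sufficiently small. Applying Lemma 3.8 we get
\[
\big\|(H^{a}_{\varepsilon,0}+\kappa^2)^{-1} - (-\Delta_{A_{a(\varepsilon)},Y_{a(\varepsilon)}}+\kappa^2)^{-1}\big\| \le 8C'(\kappa)^{2}\,C\,\frac{\sqrt{\varepsilon}}{a(\varepsilon)^{6}}\,\big(1-\tau(\varepsilon,a(\varepsilon),\kappa)\big)^{-1}.
\]
Taking into account once again the assumption (3.27) we prove (3.28). Moreover, using Theorem 2.2 together with the estimate (3.4) we arrive at (3.29).

4. Exceptional Character of the CS Approximation

In conclusion we want to show that it is sufficient to disbalance the limiting procedure slightly, say by changing the normalization (3.2), and the result will be completely different from that in Theorem 3.9. For simplicity we will consider the case $y=0$ only. Denote by $-\Delta_{D,0}$ the Laplace operator with Dirichlet boundary conditions at the origin, i.e.
\[
\mathrm{dom}(-\Delta_{D,0}) = \big\{f\in W^{2,2}(\mathbb{R}_-)\oplus W^{2,2}(\mathbb{R}_+) : f(0-)=f(0+)=0\big\}
\]
and
\[
(-\Delta_{D,0}f)(x) = -\frac{d^{2}}{dx^{2}}f(x), \qquad f\in\mathrm{dom}(-\Delta_{D,0}).
\]
With respect to $L^2(\mathbb{R}) = L^2(\mathbb{R}_-)\oplus L^2(\mathbb{R}_+)$ the operator $-\Delta_{D,0}$ decomposes into
\[
-\Delta_{D,0} = -\Delta^{-}_{D,0}\oplus-\Delta^{+}_{D,0}
\]
with $\mathrm{dom}(-\Delta^{\pm}_{D,0}) = \{f\in W^{2,2}(\mathbb{R}_\pm) : f(0\pm)=0\}$. We note that $\sigma(-\Delta^{\pm}_{D,0}) = [0,+\infty)$. The resolvents $(-\Delta^{\pm}_{D,0}+\kappa^2)^{-1}$ are integral operators with the kernels
\[
D^{\pm}_{i\kappa}(x,x') := \begin{cases} \pm\dfrac{1}{\kappa}\,e^{\mp\kappa x}\sinh(\kappa x') & \dots\quad \pm x'\in[0,\pm x) \\[6pt] \pm\dfrac{1}{\kappa}\,\sinh(\kappa x)\,e^{\mp\kappa x'} & \dots\quad \pm x'\in[\pm x,+\infty). \end{cases}
\]
Then a straightforward computation shows that the free resolvent (2.1) gets the form
\[
G_{i\kappa}(x-x') = D^{-}_{i\kappa}(x,x')\oplus D^{+}_{i\kappa}(x,x') + \frac{1}{2\kappa}\,e^{-\kappa|x|}\,e^{-\kappa|x'|}. \tag{4.1}
\]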
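The decomposition (4.1) is easy to confirm numerically. The sketch below assumes that the free kernel (2.1) is the standard one, $G_{i\kappa}(x-x') = e^{-\kappa|x-x'|}/(2\kappa)$, and uses the Dirichlet kernels exactly as given above, written in the symmetric form $\kappa^{-1}\sinh(\kappa\min(|x|,|x'|))\,e^{-\kappa\max(|x|,|x'|)}$ on each half-line. The function names are illustrative.

```python
import math

def free_kernel(x, xp, kappa):
    """Kernel of the free resolvent, assumed to be exp(-kappa|x-x'|)/(2 kappa)."""
    return math.exp(-kappa * abs(x - xp)) / (2.0 * kappa)

def dirichlet_kernel(x, xp, kappa):
    """D^-(x,x') (+) D^+(x,x'): zero unless x and x' lie on the same half-line."""
    if x * xp <= 0.0:
        return 0.0
    near, far = sorted((abs(x), abs(xp)))
    return math.sinh(kappa * near) * math.exp(-kappa * far) / kappa

def rank_one_term(x, xp, kappa):
    """The correction (1/(2 kappa)) exp(-kappa|x|) exp(-kappa|x'|) in (4.1)."""
    return math.exp(-kappa * (abs(x) + abs(xp))) / (2.0 * kappa)

def decomposition_defect(x, xp, kappa):
    """Left-hand side minus right-hand side of (4.1); should vanish."""
    return (free_kernel(x, xp, kappa)
            - dirichlet_kernel(x, xp, kappa)
            - rank_one_term(x, xp, kappa))
```

Evaluating `decomposition_defect` on points of either sign gives zero up to rounding, which reflects that the free and the Dirichlet-decoupled resolvents differ by a rank-one operator.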
The indicated modification corresponds to the changed − Aa ,Ya with Aa replaced by αAa , αAa := α(2β −1 − a −1 ), αβa −2 , α(2β −1 − a −1 ) , where α, β ∈ R \ {0}. The form Qα Aa ,Ya [·, ·] associated with the operator − α Aa ,Ya is given by β 2 1 − u(+a)v(+a)+u(−a)v(−a) , Qα Aa ,Ya [u, v] = (u , v )+α 2 u(0)v(0)+α a β a where u, v ∈ dom(Qα Aa ,Ya,α ) = W 1,2 (R), which means that α = 1 amounts to a simultaneous change of all the δ coupling parameters. The resolvent (− α Aa ,Ya +κ 2 )−1 is again given by Krein’s formula (− α Aa ,Ya + κ 2 )−1 (x, x ) = Giκ (x − x ) − j,j =−1,0,+1
[a,α (iκ)]−1 jj Giκ (x − yj )Giκ (x − yj ) ,
(4.2)
2 1 + αu w w 1 w 1 + αv w , a,α (iκ) = 2κ w2 w 1 + αu
where
i.e., in comparison with (2.6) we have $u\to\alpha u$, $v\to\alpha v$, while $w$ is preserved.

Lemma 4.1. Let $\kappa>0$. The resolvent $(-\Delta_{\alpha A_a,Y_a}+\kappa^2)^{-1}$ exists for sufficiently small $a>0$ if $\alpha\neq1$.

Proof. It is sufficient that $-\kappa^2$ is not an eigenvalue. As in Proposition 2.1, $-\kappa^2$ would be an eigenvalue of $-\Delta_{\alpha A_a,Y_a}$ if $\kappa$ satisfied one of the equations analogous to (2.10) and (2.11), with $\kappa$ replaced by $\alpha\kappa$ at the r.h.s. The Taylor expansion around $a=0$ shows that this cannot happen unless $\alpha=1$.

In the following we fix $\kappa>0$, $\alpha\notin\{0,1\}$, and $\beta\neq0$. Then there is $a_0(\kappa)>0$ such that for all $a\in(0,a_0(\kappa))$ the resolvent $(-\Delta_{\alpha A_a,Y_a}+\kappa^2)^{-1}$ exists.

Theorem 4.2. Let $\kappa>0$, $\alpha\neq0,1$, and $\beta\neq0$ be fixed. Then the relation
\[
\lim_{a\to0+}\big(-\Delta_{\alpha A_a,Y_a}+\kappa^2\big)^{-1}(x,x') = D^{-}_{i\kappa}(x,x')\oplus D^{+}_{i\kappa}(x,x') \tag{4.3}
\]
holds for any $x,x'\in\mathbb{R}$. Consequently, $-\Delta_{\alpha A_a,Y_a}\to-\Delta_{D,0}$ as $a\to0+$ in the norm-resolvent sense.
Proof. Considering the case x, x ≥ a and following the line of reasoning from (2.15) to (2.20) we obtain 1 −κx −κx Nα [a,α (iκ)]−1 e (4.4) jj G(x −yj )G(x −yj ) = 4κ 2 e Dα jj =−1,0+1
with Dα :=
(w 2 − 1 − αu)[(1+αu)(1+αv) − w 2 (1−αv)] 2κ
and Nα := (w 2 + w −2 )[w 2 − (1+αu)(1+αv)] + 2αw 2 v + (w 2 − 1− αu)(αu−1− w 2 ) . If α = 1, one gets
Dα = −2κa 2 (1 − α) + O(a 3 )
and Nα = −4κ 2 a 2 (1 − α) + O(a 3 ) , so the r.h.s. of (4.4) equals 2κe−κx e (4.1), we find
−κx
(4.5)
(1 + O(a)). Inserting (4.5) into (4.2) and using
\[
\lim_{a\to+0}\big(-\Delta_{\alpha A_a,Y_a}+\kappa^2\big)^{-1}(x,x') = D^{+}_{i\kappa}(x,x') \tag{4.6}
\]
for $x,x'\in[a,+\infty)$. In the same way one can treat the other combinations with $x,x'$ belonging to $(-\infty,-a]$, $(-a,0)$, $(0,a)$ and $[a,+\infty)$; doing so we check (4.3) for $x,x'\in\mathbb{R}$. Taking into account (4.1) and (4.2) one easily verifies that
\[
\big(-\Delta_{\alpha A_a,Y_a}+\kappa^2\big)^{-1}(x,x') - \big[D^{-}_{i\kappa}(x,x')\oplus D^{+}_{i\kappa}(x,x')\big]
\]
can be majorized by a function from $L^2(\mathbb{R}^2)$ which is independent of $a$. By (4.3) and the Lebesgue convergence theorem the difference $(-\Delta_{\alpha A_a,Y_a}+\kappa^2)^{-1} - (-\Delta_{D,0}+\kappa^2)^{-1}$ converges to zero in the Hilbert-Schmidt norm, so $-\Delta_{\alpha A_a,Y_a}\to-\Delta_{D,0}$ as $a\to0+$ in the norm-resolvent sense.

Let us introduce the Schrödinger operator $H^{a}_{\varepsilon,0,\alpha}$ defined by
\[
H^{a}_{\varepsilon,0,\alpha} := -\Delta + \alpha W^{a}_{\varepsilon,0}
\]
for $\alpha\in\mathbb{R}\setminus\{0\}$ as in the previous section. It corresponds to a rescaling of the original approximation potential: we have $H^{a}_{\varepsilon,0,\alpha} = H^{a}_{\varepsilon,0}$ if $\alpha=1$. The Neumann iterations are now defined by
n (n) ˆ ∗ (a)ˆ a,α (iκ)−1 Da, ˆ a,α (iκ)−1 G(a), ˆ Ra,,α (iκ) := G n = 0, 1, 2, . . . for k > 1 and a ∈ (0, a0 (κ)), where the definition of ˆ a,α (iκ) is obvious, cf. (3.17). We note that for κ > 1 and α = 1 there is a constant Cα (κ) > 0 such that instead of (3.20) one has the estimate a,α (iκ)−1 3 3 ≤ C (κ)a −1 α B(C ,C ) for a ∈ (0, a0 (κ)). Lemma 3.6 reads now as follows.
Lemma 4.3. Let $V_j\in L^2(\mathbb{R})$, $j=-1,0,+1$, and $\kappa>1$, $\beta\neq0$. If the potentials $V_j$ satisfy the conditions (3.2), (3.8) and (3.11), then the Neumann iterations obey the estimate
\[
\|R^{(n)}_{a,\varepsilon,\alpha}(i\kappa)\| \le \frac{2C_\alpha(\kappa)}{a}\bigg(\frac{4\sqrt{\varepsilon}\,C_\alpha(\kappa)\,C(a)}{a}\bigg)^{n}
\]
for $a\in(0,a_0(\kappa))$ and $n=1,2,\dots$, where $C(a)$ is given by (3.14).

The proof is similar to that of Lemma 3.6. In view of Lemma 4.3 one has to modify the parameter $\tau(\varepsilon,a,\kappa)$ to
\[
\tau_\alpha(\varepsilon,a,\kappa) := \frac{4\sqrt{\varepsilon}\,C_\alpha(\kappa)\,C(a)}{a}.
\]
If $\alpha\tau_\alpha(\varepsilon,a,\kappa)<1$ is satisfied, then the operator $R_{a,\varepsilon,\alpha}(i\kappa) := \sum_{n=0}^{\infty}\alpha^{n}R^{(n)}_{a,\varepsilon,\alpha}(i\kappa)$ is well defined. With obvious modifications Lemma 3.7 takes the following form.

Lemma 4.4. Let $V_j\in L^2(\mathbb{R})$, $j=-1,0,+1$, and let $\kappa>1$, $\alpha\notin\{0,1\}$, and $\beta\neq0$. If the potentials $V_j$ satisfy the conditions (3.2), (3.8) and (3.11) and $\alpha\tau_\alpha(\varepsilon,a,\kappa)<1$ is valid for some $a\in(0,a_0(\kappa))$, then $-\kappa^2$ belongs to the resolvent set of the operator $H^{a}_{\varepsilon,0,\alpha}$, and, moreover, one has
\[
(H^{a}_{\varepsilon,0,\alpha}+\kappa^2)^{-1} = R_{a,\varepsilon,\alpha}(i\kappa).
\]

Lemma 3.8 modifies similarly but we get a slightly stronger result because the inverse of the matrix entering Krein's formula (4.2) is now less singular as $a\to0$, for any $\kappa>0$.

Lemma 4.5. Under the assumptions of the preceding lemma,
\[
\big\|(H^{a}_{\varepsilon,0,\alpha}+\kappa^2)^{-1} - (-\Delta_{\alpha A_a,Y_a}+\kappa^2)^{-1}\big\| \le \frac{2\alpha C_\alpha(\kappa)}{a}\,\tau_\alpha(\varepsilon,a,\kappa)\,\big(1-\alpha\tau_\alpha(\varepsilon,a,\kappa)\big)^{-1}.
\]

Taking into account Theorem 4.2 and Lemmata 4.4, 4.5 we thus prove the following theorem.

Theorem 4.6. Let $V_j\in L^2(\mathbb{R})$, $j=-1,0,1$, and let $\kappa>1$, $\alpha\notin\{0,1\}$, and $\beta\neq0$. Furthermore, let $\lim_{\varepsilon\to0}a(\varepsilon)=0$. If the potentials $V_j$ satisfy the conditions (3.2), (3.8) and (3.11) and
\[
\lim_{\varepsilon\to0}\frac{\varepsilon}{a(\varepsilon)^{8}} = 0,
\]
then
\[
\lim_{\varepsilon\to0}\Big\|\big(H^{a}_{\varepsilon,0,\alpha}+\kappa^2\big)^{-1} - \big(-\Delta_{\alpha A_{a(\varepsilon)},Y_{a(\varepsilon)}}+\kappa^2\big)^{-1}\Big\| = 0
\]
and
\[
\lim_{\varepsilon\to0}\Big\|\big(H^{a}_{\varepsilon,0,\alpha}+\kappa^2\big)^{-1} - \big(-\Delta_{D,0}+\kappa^2\big)^{-1}\Big\| = 0.
\]

Using a translation, the analogous conclusion can be made for the family $\{H^{a}_{\varepsilon,y,\alpha}\}$ with the potential center shifted to a point $y$, which naturally converges for $\alpha\notin\{0,1\}$ to the Laplacian with the Dirichlet decoupling at $y$.
Acknowledgements. The authors are grateful for the hospitality in the institutes where parts of this work were done: P.E. and H.N. in Centre de Physique Théorique, CNRS, Marseille-Luminy, and H.N. and V.Z. in Nuclear Physics Institute, AS, Řež near Prague. We also thank the referee for pointing out an error in the first version of the manuscript. The research was partially supported by the GAAS Grant A1048101 and the Exchange Agreement No. 7919 between CNRS and the Czech Academy of Sciences.
Note added in proof. We thank J. Brasche, who drew our attention to the article “Singular Schrödinger Operators as Limits of Point Interaction Hamiltonian”, Potential Analysis 8, 163–178 (1998), by J. Brasche, R. Figari and A. Teta, related to the topic of the present paper.

Communicated by H. Araki
Commun. Math. Phys. 224, 613 – 655 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Clebsch–Gordan and Racah–Wigner Coefficients for a Continuous Series of Representations of $U_q(sl(2,\mathbb{R}))$

B. Ponsot$^1$, J. Teschner$^2$

$^1$ Laboratoire de Physique Mathématique, Université Montpellier II, Pl. E. Bataillon, 34095 Montpellier, France. E-mail: [email protected]
$^2$ Institut für Theoretische Physik, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany. E-mail: [email protected]

Received: 9 August 2000 / Accepted: 2 July 2001
Abstract: The decomposition of tensor products of representations into irreducibles is studied for a continuous family of integrable operator representations of $U_q(sl(2,\mathbb{R}))$. It is described by an explicit integral transformation involving a distributional kernel that can be seen as an analogue of the Clebsch–Gordan coefficients. Moreover, we also study the relation between two canonical decompositions of triple tensor products into irreducibles. It can be represented by an integral transformation with a kernel that generalizes the Racah–Wigner coefficients. This kernel is explicitly calculated.

1. Introduction

Noncompact quantum groups can be expected to lead to very interesting generalizations of the rich and beautiful subject of harmonic analysis on noncompact groups. Important progress has recently been made concerning an abstract ($C^*$-algebraic) theory of noncompact quantum groups, see [1] for a nice overview and further references. However, an important problem is still the rather limited supply of interesting examples. Results on the harmonic analysis are so far only known for the quantum deformation of the group of motions on the euclidean plane [2, 3], the quantum Lorentz group [5, 6] and $SU_q(1,1)$ [7, 8]. Moreover, there sometimes exist subtle analytical obstacles to constructing quantum deformations of classical groups such as $SU(1,1)$ on the $C^*$-algebraic level, cf. [4].

Recently some evidence was presented in [9] that a certain noncompact quantum group with deformation parameter $q = e^{\pi i b^2}$ should describe a crucial internal structure of Liouville theory, a two-dimensional conformal field theory (CFT) that can be seen to be as much a prototype for a CFT with continuous spectrum of Virasoro representations as the harmonic analysis on $SL(2,\mathbb{C})$ is a prototype for noncompact groups.

The relation between Liouville theory and that quantum group which was proposed in [9] generalizes the known equivalences between fusion categories of chiral algebras in conformal field theories and braided tensor categories of quantum group representations, cf. e.g. [12, 13]. These equivalences concern the isomorphisms that represent the operation of commuting
tensor factors as well as the associativity of tensor products, and can be boiled down to the comparison of certain numerical data, the most non-trivial being some generalization of the Racah–Wigner coefficients (or fusion coefficients in CFT terminology).

The quantum group in question is $U_q(sl(2,\mathbb{R}))$. A class of “well-behaved” representations of $U_q(sl(2,\mathbb{R}))$ on Hilbert spaces was defined and classified in [10]. We will study a certain subclass of the representations listed there. Some of the representations found in [10] reproduce known representations of the principal or discrete series of $sl(2,\mathbb{R})$ in the classical limit $b\to0$, others do not have a classical limit at all. The representations we will consider are of the latter type. Let us remark that representations that are essentially equivalent to the class of representations discussed in our paper were recently also discussed in [14]. The main result of the latter paper is a very interesting proposal for a braiding operation on such representations.

In our present paper we will present explicit descriptions for the decomposition of tensor products of these representations into irreducibles, as well as the isomorphism relating two canonical bases for triple tensor products. What appears to be remarkable is the fact that the subseries we have picked out is actually closed under forming tensor products, which one would generally not expect if there exist other unitary representations. The maps describing the decomposition of tensor products lead to the definition and explicit calculation of the generalization of the Racah–Wigner coefficients which represent the central ingredient for the approach of [9] from the mathematics of quantum groups.

From the mathematical point of view one may view our results as providing a technical basis for further studies of a $C^*$-algebraic quantum group that may be generated$^1$ from $U_q(sl(2,\mathbb{R}))$ and its dual object, which is expected to be a $C^*$-algebraic quantum group generated from $SL_q(2,\mathbb{R})$. In [9] we presented the definition of $SL^+_q(2,\mathbb{R})$ as a quantum space, a $C^*$-algebra $A^+$ that is generated from $SL_q(2,\mathbb{R})$ and is acted on by analogues of the left and right regular representations of $U_q(sl(2,\mathbb{R}))$. An $L^2$-space was introduced there, and the result describing its decomposition into irreducible representations of $U_q(sl(2,\mathbb{R}))$ (Plancherel decomposition) was announced. Two aspects of these constructions were unusual: $A^+$ was introduced such that the elements $a, b, c, d$ generating $SL_q(2,\mathbb{R})$ have positive spectrum, and the $L^2$-space was introduced by a measure that has no classical $q\to1$ limit. It turns out that it is precisely the subset of unitary $U_q(sl(2,\mathbb{R}))$ representations studied in the present paper which appears in the Plancherel decomposition of that $L^2$-space. We view these results as hints towards the existence of a rather interesting $C^*$-algebraic quantum group related to $SL_q(2,\mathbb{R})$ that has no classical counterpart, but other beautiful properties such as a self-duality under $b\to b^{-1}$ which are crucial for the application to Liouville theory [9].

A first hint towards this self-duality can be found in the observation made in [9, 14] (see also [15] for closely related earlier observations) that the representations that we consider may alternatively be seen as representations of $U_{\tilde q}(sl(2,\mathbb{R}))$, where $\tilde q = e^{\pi i/b^2}$. This led L. Faddeev to the proposal [14] to unify $U_q(sl(2,\mathbb{R}))$ and $U_{\tilde q}(sl(2,\mathbb{R}))$ into an object called “modular double”, which exhibits the self-duality under $b\to b^{-1}$ in a manifest way.
And indeed, it is found in the present paper that the Clebsch–Gordan intertwining maps, as well as the Racah–Wigner coefficients, can be constructed in terms of a remarkable special function $S_b(x)$. This special function is closely related to the Barnes Double Gamma function [28], and was more recently independently introduced under the names of “Quantum Dilogarithm” in [16], and as “Quantum Exponential function” in [17]. The function $S_b(x)$ is self-dual in the sense that it satisfies $S_b(x) = S_{1/b}(x)$. It follows from this self-duality of the function $S_b$ that the Clebsch–Gordan maps constructed in the present paper can be seen as intertwining maps for the “modular double” of L. Faddeev.

$^1$ In a similar sense as the bounded operators on $L^2(\mathbb{R})$ are generated by the unbounded operators $p$ and $q$ that satisfy $[p,q] = -i$, cf. [11] for more details.

We would finally like to point out that our techniques for dealing with finite difference operators that involve shifts by imaginary amounts, in particular the method for determining the spectrum of such an operator, seem to be new and should have generalizations to a variety of other problems where such operators appear. Moreover, the investigation of the class of special functions that we use is fairly recent, so we will need to deduce several previously unknown properties.

The paper is organized as follows: In the following section we will introduce some technical preliminaries. Since we have to deal with finite difference operators that shift the arguments of functions by imaginary amounts, a lot of what follows will be based on the theory of functions analytic in certain strips around the real axis, and the description of their Fourier-transforms via results of Paley–Wiener type. The third section introduces the class of representations that will be studied in the present paper and discusses some of their properties. This is followed by a section describing the decomposition of tensor products of representations into irreducibles. We then define and calculate b-Racah–Wigner coefficients as the kernel that appears in the integral transformation that establishes the isomorphism between two canonical decompositions of triple tensor products. Appendix A is in some sense the technical heart of the paper: It contains the spectral analysis of a finite difference operator of second order that is related to the Casimir on tensor products of two representations.
Appendices B and C contain some information on the special functions that are used in the body of the paper.
2. Preliminaries

We collect some basic conventions, definitions and standard results that will be used throughout the paper.

2.1. Finite difference operators. The quantum group will be realized in terms of finite difference operators that shift the arguments by an imaginary amount. On functions $f(x)$, $x\in\mathbb{R}$ that have an analytic continuation to a strip containing $\{x\in\mathbb{C};\ \mathrm{Im}(x)\in[-a_-,a_+]\}$, $a_\pm\ge0$ one may define the finite difference operators $T_x^{ia}$, $a\in[-a_-,a_+]$ by
\[
T_x^{ia}f(x) = f(x+ia). \tag{1}
\]
As convenient notation we will use
\[
[x]_b \equiv \frac{\sin(\pi bx)}{\sin(\pi b^{2})}, \qquad d_x \equiv \frac{1}{2\pi}\partial_x, \qquad [d_x+a]_b \equiv \frac{e^{\pi iba}\,T_x^{\frac{ib}{2}} - e^{-\pi iba}\,T_x^{-\frac{ib}{2}}}{e^{\pi ib^{2}} - e^{-\pi ib^{2}}}. \tag{2}
\]
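The notation in (2) can be made concrete on plane waves $e^{2\pi i\omega x}$, which are entire in $x$: the shift $T_x^{ia}$ multiplies such a wave by $e^{-2\pi a\omega}$, and $[d_x+a]_b$ multiplies it by the number $[a+i\omega]_b$, since $d_x$ acts on it as multiplication by $i\omega$. The sketch below checks both statements numerically; the helper names and the sample values of $b$, $a$, $\omega$ and $x$ are illustrative assumptions.

```python
import cmath
import math

def shift(f, a):
    """Return T^{ia} f, i.e. the function x -> f(x + i a)."""
    return lambda x: f(x + 1j * a)

def q_number(z, b):
    """[z]_b = sin(pi b z) / sin(pi b^2), allowing complex z."""
    return cmath.sin(math.pi * b * z) / math.sin(math.pi * b * b)

def q_difference(f, a, b):
    """Apply [d_x + a]_b from (2) to a function f of a complex argument."""
    denom = cmath.exp(1j * math.pi * b * b) - cmath.exp(-1j * math.pi * b * b)
    up = shift(f, b / 2.0)      # T^{ib/2} f
    down = shift(f, -b / 2.0)   # T^{-ib/2} f
    return lambda x: (cmath.exp(1j * math.pi * b * a) * up(x)
                      - cmath.exp(-1j * math.pi * b * a) * down(x)) / denom

def plane_wave(omega):
    """x -> exp(2 pi i omega x)."""
    return lambda x: cmath.exp(2j * math.pi * omega * x)

# Illustrative sample values.
b, a, omega, x = 0.7, 0.3, 0.4, 1.1
wave = plane_wave(omega)
shifted_value = shift(wave, a)(x)
lhs = q_difference(wave, a, b)(x)
rhs = q_number(a + 1j * omega, b) * wave(x)
```

Here `shifted_value` agrees with `math.exp(-2 * math.pi * a * omega) * wave(x)` and `lhs` agrees with `rhs`, which is the plane-wave version of the statement that the Fourier transformation turns these difference operators into multiplication operators.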
2.2. Fourier-transformation. Our notation and conventions concerning the Fourier-transformations are as follows: Let $\mathcal{S}(\mathbb{R})$ denote the usual Schwartz-space of functions on the real line. The Fourier-transformation of a function $f\in\mathcal{S}(\mathbb{R})$ will be defined as
\[
\tilde f(\omega) = \int_{-\infty}^{\infty}dx\; e^{-2\pi i\omega x}f(x). \tag{3}
\]
The corresponding inversion formula is then
\[
f(x) = \int_{-\infty}^{\infty}d\omega\; e^{2\pi i\omega x}\tilde f(\omega). \tag{4}
\]
The Fourier-transformation maps the finite difference operator $T_x^{ia}$ to the operator of multiplication with $e^{-2\pi a\omega}$. It will therefore be a useful tool for dealing with these operators. Of fundamental importance will be the connection between analyticity of functions in a strip and exponential decay properties of their Fourier-transforms, and vice versa, that is expressed by the classical Paley–Wiener theorem:

Theorem 1 (Paley–Wiener). Let $f$ be in $L^2(\mathbb{R})$. Then $(e^{2\pi xa_+}+e^{-2\pi xa_-})f\in L^2(\mathbb{R})$, $a_\pm>0$ if and only if $\tilde f$ has an analytic continuation to the strip $\{\omega\in\mathbb{C};\ \mathrm{Im}(\omega)\in(-a_-,a_+)\}$ such that for any $\omega_2\in(-a_-,a_+)$, $\tilde f(\,\cdot+i\omega_2)\in L^2(\mathbb{R})$ and
\[
\sup_{\omega_2\le b}\int_{-\infty}^{\infty}d\omega_1\,|\tilde f(\omega_1+i\omega_2)|^{2} < \infty \quad\text{for any } b\in(-a_-,a_+). \tag{5}
\]

Proof. Cf. e.g. [19].

The following simple variant of this result will often be useful:

Lemma 1. For $f\in\mathcal{S}(\mathbb{R})$, the following two conditions are equivalent:

1. $f$ is the restriction to $\mathbb{R}$ of a function $F$ that is meromorphic in the strip $\{z\in\mathbb{C};\ \mathrm{Im}(z)\in(-a_-,a_+)\}$, $a_+,a_->0$ with finitely many poles in the upper (lower) half plane at $P_\pm\equiv\{z_j;\ j\in I_\pm\}$, $|\mathrm{Im}(z_j)|>0$, and all functions $F_y(x)\equiv F(x+iy)$, $y\in(-a_-,a_+)$ are of rapid decrease, and

2. one has the following asymptotic behavior of the Fourier-transform $\tilde f(\omega)$ for $\omega\to\pm\infty$:
\[
\tilde f(\omega) = -2\pi i\sum_{j\in I_-}e^{-2\pi iz_j\omega}\,\mathop{\mathrm{Res}}_{z=z_j}F(z) + \tilde f_{a_+}(\omega),
\]
\[
\tilde f(\omega) = +2\pi i\sum_{j\in I_+}e^{-2\pi iz_j\omega}\,\mathop{\mathrm{Res}}_{z=z_j}F(z) + \tilde f_{a_-}(\omega),
\]
where $\tilde f_{a_\pm}(\omega)$ decay as $\omega\to\pm\infty$ faster than $e^{-2\pi a|\omega|}$ for any $a\in(-a_-,a_+)$.
2.3. Distributions. Let $\mathcal{S}'(\mathbb{R})$ be the space of tempered distributions on $\mathcal{S}(\mathbb{R})$. The dual pairing between a distribution $\Phi\in\mathcal{S}'(\mathbb{R})$ and a function $f\in\mathcal{S}(\mathbb{R})$ will be denoted by $\langle\Phi,f\rangle$. The Fourier transformation on $\mathcal{S}'(\mathbb{R})$ is defined by $\langle\tilde\Phi,\tilde f\rangle\equiv\langle\Phi,f\rangle$ for any $f\in\mathcal{S}(\mathbb{R})$. It should be noted that if a distribution $\Phi\in\mathcal{S}'(\mathbb{R})$ actually happens to be represented by a function $\Phi(x)$ via
\[
\langle\Phi,f\rangle = \int_{-\infty}^{\infty}dx\;\Phi(x)f(x),
\]
then our definition of the Fourier-transform of $\Phi$ implies that instead of (4) one has the following inversion formula for $\Phi(x)$:
\[
\Phi(x) = \int_{-\infty}^{\infty}d\omega\; e^{-2\pi i\omega x}\,\tilde\Phi(\omega). \tag{6}
\]
The distributions that appear below will all be defined in terms of meromorphic functions by means of the so-called $i\epsilon$-prescription: Assume given a family of functions $\Phi_\epsilon$, $\epsilon>0$ that are meromorphic in some strip containing $\mathbb{R}$, rapidly decreasing at infinity and have finitely many poles with $\epsilon$-independent residues at a distance $\epsilon$ from the real axis. The limit $\Phi\equiv\lim_{\epsilon\to0}\Phi_\epsilon$ then defines a distribution $\Phi\in\mathcal{S}'(\mathbb{R})$. We will often use the symbolic notation $\Phi(x)$ for the resulting distribution, keeping in mind that $\Phi(x)$ will not be defined for all $x\in\mathbb{R}$.

There is a simple generalization of Lemma 1 to such distributions in $\mathcal{S}'(\mathbb{R})$: Poles on the real axis correspond to asymptotic behavior of the form $e^{2\pi i\omega x}$ of the Fourier-transform:

Lemma 2. For $\Phi\in\mathcal{S}'(\mathbb{R})$, the following two conditions are equivalent:

1. $\Phi=\lim_{\epsilon\to0}\Phi_\epsilon$, where $\Phi_\epsilon(x)$ is for $\epsilon>0$ represented as the restriction to $\mathbb{R}$ of a function that is meromorphic in the strip $\{z\in\mathbb{C};\ \mathrm{Im}(z)\in(-a_-,a_+)\}$, $a_+,a_->0$ with finitely many poles in the upper (lower) half plane at $P^{\epsilon}_\pm\equiv\{z_j\pm i\epsilon;\ j\in I_\pm\}$, $\pm\mathrm{Im}(z_j)\ge0$, and all functions $\Phi_{\epsilon,y}(x)\equiv\Phi_\epsilon(x+iy)$, $x,y\in\mathbb{R}$, $y\in(-a_+,a_-)$ are of rapid decrease, and

2. $\tilde\Phi$ is represented by a function $\tilde\Phi(\omega)\in C^\infty(\mathbb{R})$ that has the following asymptotic behavior:
\[
\tilde\Phi(\omega) = +2\pi i\sum_{j\in I_+}e^{2\pi iz_j\omega}\,\mathop{\mathrm{Res}}_{z=z_j}\Phi(z) + \tilde\Phi_{a_+}(\omega),
\]
\[
\tilde\Phi(\omega) = -2\pi i\sum_{j\in I_-}e^{2\pi iz_j\omega}\,\mathop{\mathrm{Res}}_{z=z_j}\Phi(z) + \tilde\Phi_{a_-}(\omega),
\]
where $\tilde\Phi_{a_\pm}(\omega)$ decay faster than $e^{-2\pi a|\omega|}$ for any $a\in(-a_-,a_+)$.

Remark 1. The sign flips between Lemmas 1 and 2 are due to the different inversion formulae for functions and distributions.
2.4. A useful lemma from complex analysis. The following lemma is useful for determining the analytic properties of convolutions of meromorphic functions:

Lemma 3. Let $f(z_0; z_1, z_2)$ be meromorphic in its variables in some open strip $S$ around the real axis, with singular behavior near $z_0 = z_1 = z_2$ of the form $R_{12}(z_1)(z_0 - z_1)^{-1}(z_0 - z_2)^{-1}$. The function $I(z_1, z_2)$, defined by the integral
$$I(z_1, z_2) \equiv \int_{-\infty}^{\infty} dz_0\, f(z_0; z_1, z_2), \tag{7}$$
will then be a function that has a meromorphic continuation w.r.t. $z_i$, $i = 1, 2$ to the whole strip $S$. If $z_1$ and $z_2$ were initially separated by the real axis one will find a pole with residue $R_{12}(z_1)$ at $z_1 = z_2$. If not, $I(z_1, z_2)$ will be nonsingular at $z_1 = z_2$ as well.

Proof. To define the meromorphic continuation of $I(z_1, z_2)$ in cases where the poles $z_i$, $i = 1, 2$ cross the contour of integration of the integral (7) one just needs to deform the contour accordingly. This will obviously always be possible as long as $z_i$, $i = 1, 2$ were initially not separated by the real axis. We will therefore turn to the case that they were initially separated, and consider w.l.o.g. the case that $z_1$ was initially in the upper, $z_2$ in the lower half plane. In this case one may deform the contour into a contour that passes above $z_1$ plus a small circle around $z_1$. The residue contribution from the integral over that small circle is
$$2\pi i\, \frac{R_{12}(z_1)}{z_1 - z_2} + (\text{contributions regular as } z_1 - z_2 \to 0). \tag{8}$$
The Lemma is proven.

3. A Class of Representations of $U_q(sl(2,\mathbb{R}))$

Definition. $U_q(sl(2,\mathbb{R}))$ is a Hopf-algebra with

generators: $E$, $F$, $K$, $K^{-1}$;

relations: $KE = qEK$, $KF = q^{-1}FK$, $[E, F] = -\dfrac{K^2 - K^{-2}}{q - q^{-1}}$;

star-structure: $K^* = K$, $E^* = E$, $F^* = F$;

co-product: $\Delta(K) = K \otimes K$, $\Delta(E) = E \otimes K + K^{-1} \otimes E$, $\Delta(F) = F \otimes K + K^{-1} \otimes F$. (9)

The center of $U_q(sl(2,\mathbb{R}))$ is generated by the q-Casimir
$$C = FE - \frac{qK^2 + q^{-1}K^{-2} - 2}{(q - q^{-1})^2}. \tag{10}$$
We will consider the case that $q = e^{\pi i b^2}$, $b \in (0, 1) \cap (\mathbb{R} \setminus \mathbb{Q})$. Unitary representations of $U_q(sl(2,\mathbb{R}))$ by operators on a Hilbert-space have been studied in [10]. Since there are no unitary representations in terms of bounded operators some care is needed in order to single out an interesting class of “well-behaved”
representations. A natural notion of “well-behaved” was introduced in [10], where the corresponding unitary representations of $U_q(sl(2,\mathbb{R}))$ were classified. In the present paper we will study a one-parameter subclass $\mathcal{P}_\alpha$, $\alpha \in Q/2 + i\mathbb{R}$, $Q = b + b^{-1}$ of the representations listed in [10] which are constructed as follows: The representation will be realized on the space $\mathcal{P}_\alpha$ of entire analytic functions $f(x)$ that have a Fourier-transform $\tilde f(\omega)$ which is meromorphic in $\mathbb{C}$ with possible poles at
$$\omega = i(\alpha - Q - nb - mb^{-1}), \qquad \omega = i(Q - \alpha + nb + mb^{-1}), \qquad n, m \in \mathbb{Z}^{\ge 0}. \tag{11}$$
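Remark 2. It can be shown that $\mathcal{P}_\alpha$ is a Fréchet-space.

One may then introduce the following finite difference operators:
$$\pi_\alpha(E) \equiv e^{+2\pi bx}\,[d_x + Q - \alpha]_b, \qquad \pi_\alpha(F) \equiv e^{-2\pi bx}\,[d_x + \alpha - Q]_b, \qquad \pi_\alpha(K) \equiv T_x^{ib/2}. \tag{12}$$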
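The defining relations (9) can be spot-checked numerically for operators of the form (12). The following is only a sketch under stated assumptions, not part of the original construction: it assumes the conventions $d_x = \frac{1}{2\pi}\partial_x$ and $(T_x^{a}f)(x) = f(x+a)$, so that $e^{\pm i\pi b\, d_x} = T_x^{\pm ib/2}$, takes $q = e^{\pi i b^2}$ and $[x]_b = \sin(\pi b x)/\sin(\pi b^2)$, and uses an arbitrary sample value of $b$, an arbitrary label $\alpha \in Q/2 + i\mathbb{R}$ and the entire test function $e^{-x^2}$. All Python names are illustrative.

```python
import cmath
import math

# Sample parameters (assumptions chosen for illustration only).
b = 0.6180339887498949               # any b in (0, 1)
Q = b + 1.0 / b
alpha = Q / 2 + 0.37j                # a label in Q/2 + iR
q = cmath.exp(1j * math.pi * b * b)  # q = exp(i pi b^2)
s = 2j * math.sin(math.pi * b * b)   # equals q - 1/q


def shift(f, a):
    """T_x^a: (T_x^a f)(x) = f(x + a)."""
    return lambda x: f(x + a)


def q_bracket(f, c):
    """[d_x + c]_b f, written with exp(+-i pi b d_x) = T_x^{+-ib/2}."""
    phase = cmath.exp(1j * math.pi * b * c)
    up, down = shift(f, 0.5j * b), shift(f, -0.5j * b)
    return lambda x: (phase * up(x) - down(x) / phase) / s


def K(f):
    return shift(f, 0.5j * b)


def K_inv(f):
    return shift(f, -0.5j * b)


def E(f):
    g = q_bracket(f, Q - alpha)
    return lambda x: cmath.exp(2 * math.pi * b * x) * g(x)


def F(f):
    g = q_bracket(f, alpha - Q)
    return lambda x: cmath.exp(-2 * math.pi * b * x) * g(x)


def f0(x):
    """Entire test function, evaluated at complex arguments."""
    return cmath.exp(-x * x)


def residuals(x0=0.3):
    """Absolute errors of KE = qEK, KF = q^{-1}FK and of the [E, F] relation at x0."""
    r1 = K(E(f0))(x0) - q * E(K(f0))(x0)
    r2 = K(F(f0))(x0) - F(K(f0))(x0) / q
    lhs = E(F(f0))(x0) - F(E(f0))(x0)
    rhs = -(K(K(f0))(x0) - K_inv(K_inv(f0))(x0)) / s
    return abs(r1), abs(r2), abs(lhs - rhs)


if __name__ == "__main__":
    print(residuals())
```

Under these conventions the three residuals should vanish up to rounding for any choice of the constant in the bracket; the particular shifts $Q - \alpha$ and $\alpha - Q$ only enter the value of the q-Casimir, which this sketch does not test.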
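As shorthand notation we will also use $u_\alpha \equiv \pi_\alpha(u)$.

Lemma 4. (i) The operators $\pi_\alpha(u)$, $u = E, F, K$ map $\mathcal{P}_\alpha$ into itself. (ii) $\pi_\alpha(u)$, $u = E, F, K$ generate a representation of $U_q(sl(2,\mathbb{R}))$ on $\mathcal{P}_\alpha$.

Proof. To verify (i), note that Fourier-transformation maps $E_\alpha$, $F_\alpha$, $K_\alpha$ into the following operators:
$$\tilde E_\alpha = [-i\omega + \alpha]_b\, T_\omega^{ib}, \qquad \tilde F_\alpha = [-i\omega - \alpha]_b\, T_\omega^{-ib}, \qquad K_\alpha = e^{-\pi b\omega}. \tag{13}$$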
The claim follows from the fact that $[x]_b = 0$ for $x = nb^{-1}$, $n \in \mathbb{Z}$. (ii) is checked by straightforward calculation.

Proposition 1. The operators (12) generate an integrable operator representation of $U_q(sl(2,\mathbb{R}))$ in the sense of [10], i.e.
1. $E_\alpha$, $F_\alpha$, $K_\alpha$ have self-adjoint extensions in $L^2(\mathbb{R})$,
2. the corresponding unitary operators $E_\alpha^{it}$, $F_\alpha^{it}$, $K_\alpha^{it}$ satisfy
$$K_\alpha^{is} E_\alpha^{it} = q^{-ts} E_\alpha^{it} K_\alpha^{is}, \qquad K_\alpha^{is} F_\alpha^{it} = q^{ts} F_\alpha^{it} K_\alpha^{is},$$
and
3. the q-Casimir strongly commutes with $E_\alpha$, $F_\alpha$ and $K_\alpha$.

Proof. It suffices to show that the representation $\mathcal{P}_\alpha$ is unitarily equivalent to one of the representations listed in [10]. Consider the operator $J_\alpha$ defined as $(J_\alpha \tilde f)(\omega) = S_b(\alpha - i\omega)\tilde f(\omega)$ in terms of the special function $S_b(x)$ (cf. Appendix B). $J_\alpha$ is unitary since $|S_b^{-1}(\alpha - i\omega)|^2 = 1$ which follows from Eq. (134) in Appendix B. Moreover, it follows from the analytic and asymptotic properties of $S_b(x)$ given in the Appendix that $J_\alpha$ maps $\mathcal{P}_\alpha$ to the space $\mathcal{R}_\alpha$ of entire analytic functions which have a Fourier-transform that is meromorphic in $\mathbb{C}$ with possible poles at
$$\omega = i(\alpha - Q - nb - mb^{-1}), \qquad \omega = i(-\alpha - nb - mb^{-1}), \qquad n, m \in \mathbb{Z}^{\ge 0}. \tag{14}$$
One finally finds from the functional relations of the $S_b$-functions, Eq. (133) that
$$J_\alpha^{-1}\tilde E_\alpha J_\alpha = T_\omega^{ib}, \qquad J_\alpha^{-1}\tilde F_\alpha J_\alpha = [\alpha + i\omega]_b\, T_\omega^{-ib}\, [\alpha - i\omega]_b, \qquad J_\alpha^{-1} K_\alpha J_\alpha = e^{-\pi b\omega}. \tag{15}$$
Our representation is thereby easily recognized as the representation denoted by $(I)_{1,-1,c}$ in Corollary 5 of [10], where $c = [\alpha - \frac{Q}{2}]_b^2 + 2(q - q^{-1})^{-2}$. Note that our notation $Q$ is different from that in [10] and $c \le 2(q - q^{-1})^{-2}$.

Remark 3. The representations considered here form a subset of the representations of $U_q(sl(2,\mathbb{R}))$ that appear in the classification of [10]. This subset has the following remarkable property: If one introduces generators $\tilde E$, $\tilde F$, $\tilde K$ by replacing $b \to b^{-1}$ in the expressions for $E$, $F$, $K$ given above, one obtains a representation of $U_{\tilde q}(sl(2,\mathbb{R}))$, $\tilde q = \exp(\pi i b^{-2})$ on the same space $\mathcal{P}_\alpha$. The generators $\tilde E$, $\tilde F$, $\tilde K^2$ commute with $E$, $F$, $K^2$ on the space $\mathcal{P}_\alpha$. This does not mean, however, that these operators commute as self-adjoint operators on $L^2(\mathbb{R})$. This self-duality property of our representations $\mathcal{P}_\alpha$ is related to the fact that the representations $(\mathcal{P}_\alpha, \pi_\alpha)$ do not have a classical ($b \to 0$) limit.

Intertwining operators. The representations with labels $\alpha$ and $Q - \alpha$ are equivalent. The unitary operator establishing this equivalence can be most easily found by considering the Fourier-transform of the representation (12), as already done in the proof of Proposition 1, Eqs. (13): Define the operator $\tilde I_\alpha : L^2(\mathbb{R}) \to L^2(\mathbb{R})$ as
$$(\tilde I_\alpha \tilde f)(\omega) = \tilde B_\alpha(\omega)\tilde f(\omega), \qquad \tilde B_\alpha(\omega) \equiv \frac{S_b(\alpha - i\omega)}{S_b(Q - \alpha - i\omega)}. \tag{16}$$
The operator $\tilde I_\alpha$ is unitary since $|\tilde B_\alpha(\omega)| = 1$. It maps $\mathcal{P}_\alpha$ to $\mathcal{P}_{Q-\alpha}$ as follows from the analytic and asymptotic properties of the $S_b$-function summarized in Appendix B. The fact that
$$\pi_{Q-\alpha}(u)\,\tilde I_\alpha = \tilde I_\alpha\,\pi_\alpha(u), \qquad u \in U_q(sl(2,\mathbb{R})) \tag{17}$$
is a simple consequence of the functional relations (133), Appendix B of the $S_b$-functions. By inverse Fourier-transformation one finds the representation of the intertwining operator on functions $f(x)$. It takes the form
$$(I_\alpha f)(x) = \int_{\mathbb{R}} dx'\, B_\alpha(x - x') f(x'), \tag{18}$$
where the inverse Fourier-transform defining the kernel $B_\alpha(x - x')$ may be found by means of Eq. (136), Appendix B to be given by
$$B_\alpha(x - x') = S_b(2\alpha)\, \frac{S_b\bigl(\frac{Q}{2} + i(x - x') - \alpha\bigr)}{S_b\bigl(\frac{Q}{2} + i(x - x') + \alpha\bigr)}. \tag{19}$$
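4. The Clebsch–Gordan Decomposition of Tensor Products

The co-product allows us to define the tensor product of representations: For any $u \in U_q(sl(2,\mathbb{R}))$ let $\pi_{21}(u) \equiv (\pi_{\alpha_2} \otimes \pi_{\alpha_1})\Delta(u)$. The operators $\pi_{21}(u)$ generate a representation of $U_q(sl(2,\mathbb{R}))$ on $\mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1}$. Our aim is to determine the decomposition of this representation into irreducible representations of $U_q(sl(2,\mathbb{R}))$.

Lemma 5. $\mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1}$ is dense in $L^2(\mathbb{R}) \otimes L^2(\mathbb{R})$.

Proof. Any two-variable Hermite-function is contained in $\mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1}$.

Definition 1. Define a distributional kernel $\begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix}$ (the “Clebsch–Gordan coefficients”) by an expression of the form
$$\begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix} \equiv \lim_{\epsilon\downarrow 0}\begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix}_\epsilon, \tag{20}$$
where the meromorphic function $\begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix}_\epsilon$ is defined as
$$\begin{bmatrix}Q-\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix}_\epsilon = e^{-\frac{\pi i}{2}(\Delta_{\alpha_3} - \Delta_{\alpha_2} - \Delta_{\alpha_1})}\, D_b(\beta_{32}; y_{32} + \epsilon)\, D_b(\beta_{31}; y_{31} + \epsilon)\, D_b(\beta_{21}; y_{21} + \epsilon), \tag{21}$$
$\Delta_\alpha = \alpha(Q - \alpha)$, the distribution $D_b(\alpha; y)$ is defined in terms of the Double Sine function $S_b(y)$ (cf. Appendix) as
$$D_b(\alpha; y) = \frac{S_b(y)}{S_b(y + \alpha)}, \tag{22}$$
and the coefficients $y_{ji}$, $\beta_{ji}$, $j > i \in \{1, 2, 3\}$ are given by
$$\begin{aligned}
y_{32} &= i(x_3 - x_2) - \tfrac{1}{2}(\alpha_3 + \alpha_2 - Q), & \beta_{32} &= \alpha_2 + \alpha_3 - \alpha_1,\\
y_{31} &= i(x_1 - x_3) - \tfrac{1}{2}(\alpha_3 + \alpha_1 - Q), & \beta_{31} &= \alpha_3 + \alpha_1 - \alpha_2,\\
y_{21} &= i(x_1 - x_2) - \tfrac{1}{2}(\alpha_2 + \alpha_1 - 2\alpha_3), & \beta_{21} &= \alpha_2 + \alpha_1 - \alpha_3.
\end{aligned} \tag{23}$$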
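The aim of this section will be to prove

Theorem 2. The $U_q(sl(2,\mathbb{R}))$-representation $\pi_{21}$ defined on $\pi_{\alpha_2} \otimes \pi_{\alpha_1}$ decomposes as follows into irreducible representations $\mathcal{P}_\alpha$:
$$\pi_{\alpha_2} \otimes \pi_{\alpha_1} \simeq \int_{\mathbb{S}}^{\oplus} d\alpha\, \pi_\alpha, \qquad \mathbb{S} \equiv \frac{Q}{2} + i\mathbb{R}^+. \tag{24}$$
The isomorphism can be described explicitly in terms of a unitary map $C_{21}$ of the form
$$\begin{aligned}
C_{21}:\quad & L^2(\mathbb{R} \times \mathbb{R}) \to L^2(\mathbb{S} \times \mathbb{R},\, d\mu(\alpha_3)dx_3), \qquad d\mu(\alpha) \equiv |S_b(2\alpha)|^2,\\
& f(x_2, x_1) \to F_f(\alpha_3, x_3) \equiv \int_{\mathbb{R}} dx_2\, dx_1 \begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix} f(x_2, x_1)
\end{aligned} \tag{25}$$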
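such that the corresponding projections $\Pi_{21}(\alpha_3)$, $(\Pi_{21}(\alpha_3)f)(x_3) = F_f(\alpha_3, x_3)$, map $\mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1}$ into $\mathcal{P}_{\alpha_3}$ and intertwine the respective $U_q(sl(2,\mathbb{R}))$ actions according to
$$\Pi_{21}(\alpha_3)\,\pi_{21}(u) = \pi_{\alpha_3}(u)\,\Pi_{21}(\alpha_3), \qquad u \in U_q(sl(2,\mathbb{R})). \tag{26}$$

Remark 4. It follows from Theorem 2 that the representation $\pi_{21}$ is in fact integrable, which was not clear a priori.

Remark 5. It is remarkable and nontrivial that the subset of “self-dual” integrable representations of $U_q(sl(2,\mathbb{R}))$ is actually closed under tensor products.

Remark 6. The appearance of the measure $d\mu(\alpha)$ is natural since $d\mu(\alpha)$ is the Plancherel measure for the dual space of functions $L^2(SL_q^+(2,\mathbb{R}))$, cf. [18].

Corollary 1. The Clebsch–Gordan coefficients $\begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix}$ satisfy the following orthogonality and completeness relations:
$$\begin{aligned}
\lim_{\epsilon\downarrow 0}\int_{\mathbb{R}} dx_1\, dx_2 \begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix}_\epsilon^{*} \begin{bmatrix}\beta_3&\alpha_2&\alpha_1\\y_3&x_2&x_1\end{bmatrix}_\epsilon &= |S_b(2\alpha_3)|^{-2}\, \delta(\alpha_3 - \beta_3)\,\delta(x_3 - y_3),\\
\lim_{\epsilon\downarrow 0}\int_{\mathbb{S}} d\alpha_3\, |S_b(2\alpha_3)|^2 \int_{\mathbb{R}} dx_3 \begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix}_\epsilon^{*} \begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&y_2&y_1\end{bmatrix}_\epsilon &= \delta(x_2 - y_2)\,\delta(x_1 - y_1).
\end{aligned} \tag{27}$$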
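The main step in the proof of Theorem 2 will be the construction of a common spectral decomposition for the operators $Q_{21} \equiv (\pi_{\alpha_2} \otimes \pi_{\alpha_1})\Delta(Q)$ and $K_{21}$. The decomposition of $L^2(\mathbb{R} \times \mathbb{R})$ into eigenspaces of $K_{21}$ is simply obtained by Fourier-transformation:
$$\begin{aligned}
\mathcal{F}:\quad & L^2(\mathbb{R} \times \mathbb{R}) \to L^2(\mathbb{R} \times \mathbb{R}),\\
& f(x_2, x_1) \to F(\kappa_3, x_-) \equiv \int_{\mathbb{R}} dx_+\, e^{-\pi i\kappa_3 x_+}\, f\Bigl(\frac{x_+ + x_-}{2}, \frac{x_+ - x_-}{2}\Bigr).
\end{aligned} \tag{28}$$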
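The q-Casimir $Q_{21}$ is mapped under this Fourier-transformation $\mathcal{F}$ into a second order finite difference operator $C_{21}(\kappa_3)$ that contains shifts w.r.t. the variable $x_-$ only and therefore leaves the eigenspaces of $K_{21}$ invariant:
$$\begin{aligned}
C_{21}(\kappa_3) - \bigl[\alpha_3 - \tfrac{Q}{2}\bigr]_b^2 ={}& \bigl[-ix - \tfrac{1}{2}(\alpha_1 + \alpha_2 - Q) + (\alpha_3 - \tfrac{Q}{2})\bigr]_b\, \bigl[-ix - \tfrac{1}{2}(\alpha_1 + \alpha_2 - Q) - (\alpha_3 - \tfrac{Q}{2})\bigr]_b\\
& - \bigl[-ix + \tfrac{1}{2}(\alpha_1 + \alpha_2) - Q\bigr]_b \Bigl( e^{i\pi b(-ix - \frac{1}{2}(\alpha_1 + \alpha_2))}\{\alpha_1 - \alpha_2 + i\kappa_3\}_b - e^{-i\pi b(-ix - \frac{1}{2}(\alpha_1 + \alpha_2))}\{\alpha_1 - \alpha_2 - i\kappa_3\}_b \Bigr) T_{x_-}^{-ib}\\
& + \bigl[-ix + \tfrac{1}{2}(\alpha_1 + \alpha_2) - Q\bigr]_b\, \bigl[-ix + \tfrac{1}{2}(\alpha_1 + \alpha_2) - 2Q\bigr]_b\, T_{x_-}^{-2ib},
\end{aligned} \tag{29}$$
where the following notation has been used:
$$[x]_b \equiv \frac{\sin(\pi bx)}{\sin(\pi b^2)}, \qquad \{x\}_b \equiv \frac{\cos(\pi bx)}{i\sin(\pi b^2)}. \tag{30}$$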
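The brackets in (30) can be evaluated directly. The short sketch below is an illustration rather than part of the paper: it assumes $q = e^{\pi i b^2}$, in which case $[x]_b$ and $\{x\}_b$ coincide with $(q^{x/b} \mp q^{-x/b})/(q - q^{-1})$, and it checks the zeros $[nb^{-1}]_b = 0$ that are used in the proof of Lemma 4. The function names are hypothetical.

```python
import cmath
import math


def q_number(x, b):
    """[x]_b = sin(pi b x) / sin(pi b^2); x may be complex."""
    return cmath.sin(math.pi * b * x) / math.sin(math.pi * b * b)


def q_cosine(x, b):
    """{x}_b = cos(pi b x) / (i sin(pi b^2)); x may be complex."""
    return cmath.cos(math.pi * b * x) / (1j * math.sin(math.pi * b * b))


def q_number_exponential(x, b):
    """[x]_b rewritten as (q^{x/b} - q^{-x/b}) / (q - q^{-1}) with q = exp(i pi b^2)."""
    q = cmath.exp(1j * math.pi * b * b)
    w = cmath.exp(1j * math.pi * b * x)  # this is q^{x/b}
    return (w - 1 / w) / (q - 1 / q)


def q_cosine_exponential(x, b):
    """{x}_b rewritten as (q^{x/b} + q^{-x/b}) / (q - q^{-1}) with q = exp(i pi b^2)."""
    q = cmath.exp(1j * math.pi * b * b)
    w = cmath.exp(1j * math.pi * b * x)
    return (w + 1 / w) / (q - 1 / q)


if __name__ == "__main__":
    b = 0.7
    print(abs(q_number(b, b) - 1))   # [b]_b = 1 up to rounding
    print(abs(q_number(3 / b, b)))   # [n / b]_b = 0 up to rounding
```

The exponential form makes it visible why the shift operators $T^{\pm ib/2}$ and the brackets $[\,\cdot\,]_b$ appear together in the finite difference operators of this section.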
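The spectral analysis of the operator $C_{21}$ is performed in Appendix A. The result may be summarized as follows: Eigenfunctions $\Phi_{\alpha_3}(\alpha_2, \alpha_1|\kappa_3|x)$ of $C_{21}$ are given by an expression of the form
$$\Phi_{Q-\alpha_3}(\alpha_2, \alpha_1|\kappa_3|x) = M^{\alpha_3;\kappa_3}_{\alpha_2,\alpha_1}\, e^{\pi x(2\alpha_3 - 2\alpha_2 + i\kappa_3)}\, \Theta_b(T, y_-)\, \Psi_b(U, V, W; y_+). \tag{31}$$
The special functions $\Theta_b(T; y)$ and $\Psi_b(U, V, W; y)$ are defined in Appendix B, $y_\pm$ are introduced as $y_\pm = -ix - \frac{1}{2}(\alpha_2 + \alpha_1 - Q) \mp (\alpha_3 - \frac{Q}{2})$ and the coefficients $T$, $U$, $V$, $W$ are given as
$$T = \alpha_2 + \alpha_1 - \alpha_3, \qquad U = \alpha_3 + \alpha_1 - \alpha_2, \qquad V = -i\kappa_3 + \alpha_3, \qquad W = -i\kappa_3 + \alpha_1 - \alpha_2 + Q. \tag{32}$$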
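Theorem 3. A complete set of generalized eigenfunctions for the operator $C_{21}(\kappa_3)$ is given by $\{(\Phi_{\alpha_3})^*;\ \alpha_3 \in \mathbb{S}\}$.

By combining Theorem 3 with the usual Plancherel formula for the Fourier-transformation $\mathcal{F}$ one concludes that each function $f(x_2, x_1) \in L^2(\mathbb{R} \times \mathbb{R})$ can be decomposed as ($x_\pm \equiv x_2 \pm x_1$)
$$f(x_2, x_1) = \int_{\mathbb{R}} d\kappa_3\, e^{\pi i\kappa_3 x_+} \int_{\mathbb{S}} d\mu(\alpha_3)\, \bigl(\Phi_{\alpha_3}(\alpha_2, \alpha_1|\kappa_3|x_-)\bigr)^{*}\, F_f(\alpha_3, \kappa_3), \tag{33}$$
where the generalized Fourier-transformation $F_f$ of $f$ is defined as
$$F_f(\alpha_3, \kappa_3) = \int_{\mathbb{R}} dx_2\, dx_1\, e^{-\pi i\kappa_3 x_+}\, \Phi_{\alpha_3}(\alpha_2, \alpha_1|\kappa_3|x_-)\, f(x_2, x_1). \tag{34}$$
The measure $d\mu(\alpha_3)$ will be determined later. One may next observe that

Lemma 6. One has
$$\begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\\kappa_3&x_2&x_1\end{bmatrix} \equiv \int_{\mathbb{R}} dx_3\, e^{2\pi i\kappa_3 x_3} \begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix} = e^{-\pi i\kappa_3 x_+}\, \Phi_{\alpha_3}(\alpha_2, \alpha_1|\kappa_3|x_-), \tag{35}$$
if the normalization factor $M$ in (31) is chosen as
$$M^{\alpha_3;\kappa_3}_{\alpha_2,\alpha_1} \equiv e^{\pi i\alpha_2(\alpha_2 - \alpha_3)}\, e^{-\pi i(\alpha_3 - i\kappa_3)(\alpha_3 + \alpha_2 - Q)}. \tag{36}$$

Proof. The kernel $\begin{bmatrix}Q-\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix}$ may be rewritten in terms of the function $\Theta_b(\beta; y)$ as follows:
$$\begin{bmatrix}Q-\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix} = e^{\pi i\alpha_1\alpha_2}\, e^{2\pi(x_3(\alpha_2 - \alpha_1) + \alpha_1 x_1 - \alpha_2 x_2)}\, \Theta_b(\beta_{32}; y_{32})\,\Theta_b(\beta_{31}; y_{31})\,\Theta_b(\beta_{21}; y_{21}). \tag{37}$$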
The substitution $s = -i(x_3 - x_2) + \frac{1}{2}(\alpha_3 + \alpha_2 - Q)$ then leads to the Euler-type integral (146) for the b-hypergeometric function. The rest is straightforward.

It follows that the generalized Fourier-transformation defined in Theorem 3 represents a decomposition into eigenspaces of the q-Casimir $Q_{21}$. Two things remain to be done in order to finish the proof of Theorem 2: On the one hand it remains to calculate the spectral measure $d\mu(\alpha_3)$, and on the other hand one needs to verify the intertwining property (26).
4.1. Spectral measure. We will show in this subsection that $d\mu(\alpha_3) = |S_b(2\alpha_3)|^2$. This follows from the combination of the following two results. We first of all determine the asymptotics of the distributional Fourier-transform of $\Phi_{\alpha_3}$:

Lemma 7. The function $\tilde\Phi_{\alpha_3}(\omega)$ (defined as in (6)) decays exponentially for $\omega \to \infty$ and has the following asymptotic behavior for $\omega \to -\infty$:
$$\tilde\Phi_{\alpha_3}(\omega) = N_+(\alpha_3)\, e^{2\pi i\omega x_+} + N_-(\alpha_3)\, e^{2\pi i\omega x_-} + R_-(\omega), \tag{38}$$
where $R_-(\omega)$ decays exponentially for $\omega \to -\infty$, $x_+$ and $x_-$ are defined by $x_\pm \equiv \frac{i}{2}(\alpha_1 + \alpha_2 - Q) \pm i(\alpha_3 - \frac{Q}{2})$ and $|N_\pm(\alpha_3)|^2 = |S_b(2\alpha_3)|^{-2}$.

Proof. According to Lemma 2 one just needs to calculate the residues of $\Phi_{\alpha_3}$ for the poles at $x = x_\pm$. We will only need the absolute values of these quantities. The pole at $x = x_-$ comes from the $G_b/G_b$ factor in the expression for $\Phi$. To calculate its residue one needs the following special value of the $\Psi$-function:
$$\Psi_b(U, V; W; W - U - V) = \frac{G_b(V)\, G_b(W - U - V)}{G_b(W - U)}, \tag{39}$$
which follows easily from the fact that the representation (146) simplifies to the b-beta integral (136) for $x = W - U - V$. We furthermore note that $|G_b(\frac{Q}{2} + ix)|^2 = 1$ from the reflection property of $S_b(x)$ stated in Appendix B. It thereby follows that
$$|N_-(\alpha_3)|^2 = \bigl|M^{\alpha_3;\kappa_3}_{\alpha_2,\alpha_1}\, G_b(Q - 2\alpha_3)\bigr|^2. \tag{40}$$
One has $|M^{\alpha_3;\kappa_3}_{\alpha_2,\alpha_1}|^2 = e^{\pi iQ(Q - 2\alpha_3)}$, and $|G_b(Q - 2\alpha_3)|^2 = e^{-\pi iQ(Q - 2\alpha_3)}\,|S_b(2\alpha_3)|^{-2}$ from the connection between $S_b$ and $G_b$, as well as the reflection property of $S_b$ (see Appendix B). Therefore $|N_-(\alpha_3)|^2 = |S_b(2\alpha_3)|^{-2}$.

The pole at $x = x_+$ corresponds to the pole at $y = 0$ of $\Psi_b(U, V; W; y)$. One may determine the singular term for $y \to 0$ by applying Lemma 3 to the Euler integral representation (146) for the function $\Psi_b$:
$$2\pi e^{-2\pi iy\beta}\, \frac{G_b(-y + \gamma - \beta)}{G_b(\alpha)\, G_b(-y + Q)} = \frac{1}{y}\, \frac{G_b(\gamma - \beta)}{G_b(\alpha)} + (\text{contributions regular as } y \to 0). \tag{41}$$
The rest of the calculation proceeds as in the case of $N_-(\alpha_3)$ and yields $|N_+(\alpha_3)|^2 = |S_b(2\alpha_3)|^{-2}$.

Proposition 2. Assume that the generalized eigenfunctions $\tilde\Phi_{\alpha_3}$ decay exponentially for $\omega \to \infty$ and have asymptotic behavior of the form (38) with $|N_+(\alpha_3)|^2 = |N_-(\alpha_3)|^2$ for $\omega \to -\infty$. In that case one may define the “inner product” $(\Phi_{\alpha_3}, \Phi_{\alpha_3'})$ as a bidistribution which is explicitly given by
$$(\Phi_{\alpha_3}, \Phi_{\alpha_3'}) = |N_+(\alpha_3)|^2\, \delta(\alpha_3 - \alpha_3'). \tag{42}$$
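Proof. Consider
$$\begin{aligned}
&(C_{21}(\kappa_3)\Phi_{\alpha_3}, \Phi_{\alpha_3'}) - (\Phi_{\alpha_3}, C_{21}(\kappa_3)\Phi_{\alpha_3'})\\
&\quad = \lim_{W\to\infty} \sum_{s=\pm} \int_{-W}^{W} d\omega\, \Bigl( \bigl(\tilde\delta_s(\omega)\,\tilde\Phi_{\alpha_3}(\omega + sib)\bigr)^{*}\, \tilde\Phi_{\alpha_3'}(\omega) - \bigl(\tilde\Phi_{\alpha_3}(\omega)\bigr)^{*}\, \tilde\delta_s(\omega)\,\tilde\Phi_{\alpha_3'}(\omega + sib) \Bigr),
\end{aligned} \tag{43}$$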
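where the Fourier-transform of the explicit expression (105) for $C_{21}(\kappa_3)$ has been used. The contour of integration for the second term in (43) can be deformed into $\mathbb{R} - isb$ plus contours from $-W$ to $-W - isb$ and $W - isb$ to $W$. The integral over $\mathbb{R} - isb$ cancels the first term on the right-hand side of (43). Only the contour from $-W$ to $-W - isb$ will give nonvanishing contributions in the limit $W \to \infty$ due to the exponential decay of $\tilde\Phi_{\alpha_3}(\omega)$ for $\omega \to \infty$. In the remaining term one gets in the limit $W \to \infty$ contributions only from the leading terms in the asymptotics of $\tilde\Phi_{\alpha_3}(\omega)$ for $\omega \to -\infty$ as quoted in Lemma 7, Eq. (38). Taking into account that
$$\tilde\delta_s(\omega) = \frac{1}{(q - q^{-1})^2}\, e^{s\pi ib(Q - \alpha_1 - \alpha_2)} + O(e^{2\pi b\omega}) \tag{44}$$
for $\omega \to -\infty$, it follows that ($\alpha_3 = \frac{Q}{2} + ip_3$, $\alpha_3' = \frac{Q}{2} + ip_3'$)
$$\begin{aligned}
&(C_{21}(\kappa_3)\Phi_{\alpha_3}, \Phi_{\alpha_3'}) - (\Phi_{\alpha_3}, C_{21}(\kappa_3)\Phi_{\alpha_3'})\\
&\quad = \frac{1}{(q - q^{-1})^2} \lim_{W\to\infty} \sum_{s=\pm}\ \sum_{\epsilon_1,\epsilon_2=\pm} \frac{\bigl(N_{\epsilon_1}(\alpha_3)\bigr)^{*} N_{\epsilon_2}(\alpha_3')}{2\pi i(\epsilon_1 p_3 - \epsilon_2 p_3')}\, e^{2\pi iW(\epsilon_1 p_3 - \epsilon_2 p_3')} \cdot e^{2\pi s\epsilon_2 bp_3'}\bigl(1 - e^{2\pi sb(\epsilon_1 p_3 - \epsilon_2 p_3')}\bigr).
\end{aligned} \tag{45}$$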
The expression on the right-hand side of (45) vanishes by the Riemann–Lebesgue Lemma for $p_3 \ne p_3'$ as well as $\epsilon_1 \ne \epsilon_2$. The remainder is found to be
$$(C_{21}(\kappa_3)\Phi_{\alpha_3}, \Phi_{\alpha_3'}) - (\Phi_{\alpha_3}, C_{21}(\kappa_3)\Phi_{\alpha_3'}) = \bigl([ip_3]_b^2 - [ip_3']_b^2\bigr)\, |N_+(\alpha_3)|^2 \lim_{W\to\infty} \frac{e^{2\pi iW(p_3 - p_3')} - e^{-2\pi iW(p_3 - p_3')}}{2\pi i(p_3 - p_3')}. \tag{46}$$
It follows that
$$(\Phi_{\alpha_3}, \Phi_{\alpha_3'}) = |N_+(\alpha_3)|^2 \lim_{W\to\infty} \frac{e^{2\pi iW(p_3 - p_3')} - e^{-2\pi iW(p_3 - p_3')}}{2\pi i(p_3 - p_3')} = |N_+(\alpha_3)|^2\, \delta(\alpha_3 - \alpha_3') \tag{47}$$
by the corresponding well-known property of the kernel $\sin(Rx)/x$, cf. e.g. [21, Chapter IX, Exercise 14].
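The kernel in (47) equals $\sin(2\pi W p)/(\pi p)$ with $p = p_3 - p_3'$, and its delta-function limit can be observed numerically. The sketch below is an illustration only: it pairs this kernel with a Gaussian test function using a plain trapezoidal rule (the grid parameters and function names are arbitrary choices, not taken from the paper). For the Gaussian $e^{-p^2}$ the pairing equals $\mathrm{erf}(\pi W)$ in closed form, which tends to the value of the test function at $p = 0$.

```python
import math


def dirichlet_pairing(g, W, half_width=8.0, steps=32000):
    """Trapezoidal estimate of the integral of g(p) sin(2 pi W p) / (pi p) over [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        p = -half_width + k * h
        if abs(p) < 1e-12:
            kernel = 2.0 * W  # limit of sin(2 pi W p) / (pi p) at p = 0
        else:
            kernel = math.sin(2.0 * math.pi * W * p) / (math.pi * p)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * g(p) * kernel
    return total * h


def gaussian(p):
    return math.exp(-p * p)


if __name__ == "__main__":
    for W in (0.2, 1.0, 5.0):
        print(W, dirichlet_pairing(gaussian, W), math.erf(math.pi * W))
```

The truncation to a finite interval is harmless here only because the test function decays rapidly; the kernel itself is not absolutely integrable.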
4.2. Intertwining property.

Proposition 3. The projections $\Pi_{21}(\alpha_3)$, $\alpha_3 \in \mathbb{S}$ map $\mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1}$ into $\mathcal{P}_{\alpha_3}$ and satisfy the intertwining property (26).

Proof. $F_f(\alpha_3, x_3)$ will be entire analytic w.r.t. $x_3$ by straightforward application of Lemma 3, using that $f$ is entire analytic in $x_2$, $x_1$ and the analytic properties of the Clebsch–Gordan coefficients summarized in Lemma 19, Appendix C. One similarly finds by using Lemma 20, Appendix C that the Fourier-transform $F_f(\alpha_3, \kappa_3)$ will be meromorphic in $\kappa_3$ with poles at $\kappa = \pm(Q - \alpha + nb + mb^{-1})$, $n, m \in \mathbb{Z}^{\ge 0}$ for any $f \in \mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1}$. This establishes the first claim in Proposition 3.

Note that the analytic continuation of the integral (25) that defines $F_f(\alpha_3, x_3)$ can be represented by integrating over a deformed contour $C^{(2)} \subset \mathbb{C}^2$. For later use we will present suitable contours for the cases of analytic continuation to $\{x_3 \in \mathbb{C};\ \mathrm{Im}(x_3) \in [0, \frac{b}{2}]\}$ and $\{x_3 \in \mathbb{C};\ \mathrm{Im}(x_3) \in [-\frac{b}{2}, 0]\}$ respectively: In the first case one may integrate $x_1$ over the real axis and instead of integrating over $x_2$ one may integrate $x_{32} \equiv -iy_{32}$, cf. (23), over a contour consisting of the union of the half axes $(-\infty, -\delta]$ and $[\delta, +\infty)$, $b > \delta > b/2$ with a half-circle in the upper half plane around $x_{32} = 0$ of radius $\delta$. In the second case one may integrate $x_2$ over $\mathbb{R}$, and $x_{31} \equiv -iy_{31}$ over the contour $C_1$ consisting of the union of the half axes $(-\infty, -\delta]$ and $[\delta, +\infty)$ with a half-circle of radius $\delta$ in the lower half plane around $x_{31} = 0$.

Now consider the right-hand side of (26). The expressions for $\pi_{21}(u)$, $u = E, F, K$ contain the shift operators
$$T_{x_1}^{+\frac{ib}{2}}\, T_{x_2}^{+\frac{ib}{2}}, \qquad T_{x_1}^{-\frac{ib}{2}}\, T_{x_2}^{-\frac{ib}{2}} \qquad \text{and} \qquad T_{x_1}^{-\frac{ib}{2}}\, T_{x_2}^{+\frac{ib}{2}}. \tag{48}$$
The shift operator $T_{x_i}^{\pm\frac{ib}{2}}$ is “partially integrated” by (i) shifting the contour of integration over $x_i$ to the axis $\mathbb{R} \mp \frac{ib}{2}$, where one will pick up a residue contribution from the pole of the Clebsch–Gordan coefficients that lies between these two contours, and (ii) introducing the new variables of integration $x_i' \equiv x_i \pm \frac{ib}{2}$. In this way one rewrites the expression for $C_{21}\pi_{21}(u)f$ in the form
$$\int_{C_2} dx_2 \int_{C_1} dx_1\, \Bigl( \pi_{21}^{t}(u) \begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix} \Bigr) f(x_2, x_1), \tag{49}$$
where $\pi_{21}^{t}$ denotes the transpose of $\pi_{21}$, and the contours $C_i$, $i = 1, 2$ are just the contours introduced above to represent the analytic continuation w.r.t. $x_3$. It is important to notice that due to the fact that only the shift operators (48) appear in the expressions for $\pi_{21}(u)$, $u = E, F, K$ one does not need to introduce further deformations of the contours in order to treat the poles from the factor in the Clebsch–Gordan coefficients that depends on $x_2 - x_1$ only. It is verified by a straightforward calculation using (133) that the Clebsch–Gordan coefficients satisfy the finite difference equations
$$\pi_{21}^{t}(u) \begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix} = \pi_{\alpha_3}(u) \begin{bmatrix}\alpha_3&\alpha_2&\alpha_1\\x_3&x_2&x_1\end{bmatrix}, \qquad u = E, F, K. \tag{50}$$
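Inserting these relations into (49) yields an expression that is easily identified as $\pi_{\alpha_3}(u)C_{21}f$.

5. Racah–Wigner Coefficients for $U_q(sl(2,\mathbb{R}))$

5.1. Canonical decompositions for triple tensor products. Triple tensor products $\mathcal{P}_{\alpha_3} \otimes \mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1}$ carry a representation $\pi_{321}$ of $U_q(sl(2,\mathbb{R}))$ given by
$$\pi_{321} \equiv (\pi_{\alpha_3} \otimes \pi_{\alpha_2} \otimes \pi_{\alpha_1}) \circ \Delta^{(3)}, \qquad \Delta^{(3)} \equiv (\Delta \otimes \mathrm{id}) \circ \Delta \equiv (\mathrm{id} \otimes \Delta) \circ \Delta. \tag{51}$$
The decomposition of this representation into irreducibles can be constructed by iterating Clebsch–Gordan maps: There are two canonical ways to do so, which will be referred to as “s-channel” and “t-channel” respectively. The first of these corresponds to first decomposing the factor $\mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1}$ into a direct sum of irreducible representations $\mathcal{P}_{\alpha_s}$ and then performing the Clebsch–Gordan decomposition of $\mathcal{P}_{\alpha_3} \otimes \mathcal{P}_{\alpha_s}$. This extends to a unitary map
$$C_{3(21)}:\quad L^2(\mathbb{R} \times \mathbb{R} \times \mathbb{R}) \to L^2(\mathbb{S}^2 \times \mathbb{R},\, d\mu(\alpha_4)d\mu(\alpha_s)dx_4), \qquad f(x_3, x_2, x_1) \to F^s_f(\alpha_4, \alpha_s, x_4). \tag{52}$$
The generalized Fourier-transform $F^s_f$ of $f$ is defined as
$$F^s_f(\alpha_4, \alpha_s; x_4) \equiv \lim_{\epsilon_2\downarrow 0}\lim_{\epsilon_1\downarrow 0} \int_{\mathbb{R}^2} dx_3\, dx_s \begin{bmatrix}\alpha_4&\alpha_3&\alpha_s\\x_4&x_3&x_s\end{bmatrix}_{\epsilon_2} \int_{\mathbb{R}^2} dx_2\, dx_1 \begin{bmatrix}\alpha_s&\alpha_2&\alpha_1\\x_s&x_2&x_1\end{bmatrix}_{\epsilon_1} f(x_3, x_2, x_1), \tag{53}$$
which in the notation $x \equiv (x_3, x_2, x_1)$, $dx \equiv dx_3\,dx_2\,dx_1$ can be rewritten as
$$F^s_f(\alpha_4, \alpha_s; x_4) \equiv \lim_{\epsilon\downarrow 0} \int_{\mathbb{R}^3} dx\ \Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(x_4; x)\, f(x),$$
where
$$\Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(x_4; x) = \int_{\mathbb{R}} dx_s \begin{bmatrix}\alpha_4&\alpha_3&\alpha_s\\x_4&x_3&x_s\end{bmatrix}_\epsilon \begin{bmatrix}\alpha_s&\alpha_2&\alpha_1\\x_s&x_2&x_1\end{bmatrix}_\epsilon, \qquad \alpha_4, \alpha_s \in \mathbb{S},\ x_4 \in \mathbb{R}. \tag{54}$$
Some useful properties of the functions $\Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(x_4; x)$ are collected in Appendix C.

The generalized Fourier-transformation $C_{3(21)}$ is such that the two-parameter family of projections $\Pi^s(\alpha_4, \alpha_s) : \mathcal{P}_{\alpha_3} \otimes \mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1} \to \mathcal{P}_{\alpha_4}(\mathbb{R})$ defined by $f \to F^s_f(\alpha_4, \alpha_s; .)$ intertwine the representation $\pi_{321}$ with the irreducible representation $\pi_{\alpha_4}$. It therefore realizes the following isomorphism of $U_q(sl(2,\mathbb{R}))$ representations:
$$\mathcal{P}_{\alpha_3} \otimes \mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1} \simeq \int_{\mathbb{S}}^{\oplus} d\mu(\alpha_4)\, \mathcal{P}_{\alpha_4} \otimes S_\mu, \tag{55}$$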
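where the multiplicity space $S_\mu \simeq L^2(\mathbb{S}, d\mu)$ is considered to be equipped with the trivial action of $U_q(sl(2,\mathbb{R}))$.

A second canonical decomposition of $\mathcal{P}_{\alpha_3} \otimes \mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1}$ is obtained by first decomposing the factor $\mathcal{P}_{\alpha_3} \otimes \mathcal{P}_{\alpha_2}$ into a direct sum of irreducible representations $\mathcal{P}_{\alpha_t}$ and then performing the Clebsch–Gordan decomposition of $\mathcal{P}_{\alpha_t} \otimes \mathcal{P}_{\alpha_1}$. One obtains a map
$$C_{(32)1}:\quad L^2(\mathbb{R} \times \mathbb{R} \times \mathbb{R}) \to L^2(\mathbb{S}^2 \times \mathbb{R},\, d\mu(\alpha_4)d\mu(\alpha_t)dx_4), \qquad f(x_3, x_2, x_1) \to F^t_f(\alpha_4, \alpha_t, x_4), \tag{56}$$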
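where $F^t_f$ is defined by a generalized Fourier-transform of the same form as (53) but with $\Phi^s_{\alpha_s}$ replaced by
$$\Phi^t_{\alpha_t}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(x_4; x) = \int_{\mathbb{R}} dx_t \begin{bmatrix}\alpha_4&\alpha_t&\alpha_1\\x_4&x_t&x_1\end{bmatrix}_\epsilon \begin{bmatrix}\alpha_t&\alpha_3&\alpha_2\\x_t&x_3&x_2\end{bmatrix}_\epsilon, \qquad \alpha_4, \alpha_t \in \mathbb{S},\ x_4 \in \mathbb{R}. \tag{57}$$
As in the case of the s-channel, one has a corresponding two-parameter family of projections $\Pi^t(\alpha_4, \alpha_t) : \mathcal{P}_{\alpha_3} \otimes \mathcal{P}_{\alpha_2} \otimes \mathcal{P}_{\alpha_1} \to \mathcal{P}_{\alpha_4}$ that intertwine the representation $\pi_{321}$ with the irreducible representation $\pi_{\alpha_4}$.

Remark 7. The unitarity of the maps $C_{3(21)}$ and $C_{(32)1}$ ensures existence of self-adjoint extensions for the operators $\pi_{3(21)}(u)$, $\pi_{(32)1}(u)$, $u = E, F, K, Q$: Simply take the image of the self-adjoint extensions on $L^2(\mathbb{S}^2 \times \mathbb{R})$ under $C_{3(21)}^{-1}$ or $C_{(32)1}^{-1}$. However, it is not a priori clear that such self-adjoint extensions are unique. In particular, it could be that the self-adjoint extensions that are defined in terms of the maps $C_{3(21)}$ and $C_{(32)1}$ are inequivalent. This disturbing possibility will be excluded shortly.

5.2. Relation between $C_{3(21)}$ and $C_{(32)1}$. It will be convenient to also consider the Fourier-transforms $\Phi^\flat_{\alpha_\flat}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(k_4; x)$, $\flat = s, t$ that are defined as
$$\Phi^\flat_{\alpha_\flat}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(k_4; x) = \int_{\mathbb{R}} dx_4\, e^{2\pi ik_4 x_4}\, \Phi^\flat_{\alpha_\flat}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(x_4; x). \tag{58}$$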
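Unitarity of the maps $C_{3(21)}$ and $C_{(32)1}$ allows us to relate the transforms $F^s_f$ and $F^t_f$ by a transformation of the form
$$F^s_f(\alpha_4, \alpha_s, k_4) = \int_{\mathbb{S}^2} d\alpha_4'\, d\alpha_t \int_{\mathbb{R}} dk_4'\ K\begin{pmatrix}\alpha_4&\alpha_s&k_4\\\alpha_4'&\alpha_t&k_4'\end{pmatrix} F^t_f(\alpha_4', \alpha_t, k_4'). \tag{59}$$
The distribution $K$ appearing in (59) can be represented as
$$K\begin{pmatrix}\alpha_4&\alpha_s&k_4\\\alpha_4'&\alpha_t&k_4'\end{pmatrix} = \lim_{\rho\to\infty}\lim_{\epsilon\downarrow 0} \int_{-\infty}^{\infty} dx_2 \int_{-\rho}^{\rho} dx_3\, dx_1\, \Bigl(\Phi^t_{\alpha_t}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4'&\alpha_1\end{bmatrix}_\epsilon(k_4'; x)\Bigr)^{*}\, \Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(k_4; x). \tag{60}$$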
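We will first prove

Proposition 4. The distribution $K$ is of the form
$$K\begin{pmatrix}\alpha_4&\alpha_s&k_4\\\alpha_4'&\alpha_t&k_4'\end{pmatrix} = \delta(\alpha_4 - \alpha_4')\,\delta(k_4 - k_4')\ K\begin{pmatrix}\alpha_4&\alpha_s\\k_4&\alpha_t\end{pmatrix}. \tag{61}$$

Proof. This will be a consequence of the following result: $K$ satisfies
$$\Bigl(\bigl[\alpha_4 - \tfrac{Q}{2}\bigr]_b^2 - \bigl[\alpha_4' - \tfrac{Q}{2}\bigr]_b^2\Bigr)\, K\begin{pmatrix}\alpha_4&\alpha_s&k_4\\\alpha_4'&\alpha_t&k_4'\end{pmatrix} = 0, \qquad (k_4 - k_4')\, K\begin{pmatrix}\alpha_4&\alpha_s&k_4\\\alpha_4'&\alpha_t&k_4'\end{pmatrix} = 0. \tag{62}$$
To see that (62) implies the claim, consider the simplified case of a distribution $T \in \mathcal{S}'(\mathbb{R})$ that satisfies $Tf = 0$, where $f$ is a function that vanishes only at $x_0$ and such that $fg \in \mathcal{S}(\mathbb{R})$ if $g \in \mathcal{S}(\mathbb{R})$. This distribution has support only at $x_0$. By Theorem V.11 of [20] one has $T = \sum_{n=0}^{N} a_n(x_0)\,\partial_x^n \delta(x - x_0)$. It is then easy to see that $Tf = 0$ implies $a_n = 0$ for $n \ne 0$. The generalization to the case at hand is clear.

To verify (62) one may note that the functions $\Phi^\flat_{\alpha_\flat}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(k_4; x)$, $\flat = s, t$ satisfy eigenvalue equations for the operators $Q_{321} \equiv \pi_{321}(Q)$ and $K_{321} \equiv \pi_{321}(K)$ up to an error of order $O(\epsilon)$. It follows that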
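$$\begin{aligned}
&\Bigl(\bigl[\alpha_4 - \tfrac{Q}{2}\bigr]_b^2 - \bigl[\alpha_4' - \tfrac{Q}{2}\bigr]_b^2\Bigr)\, K\begin{pmatrix}\alpha_4&\alpha_s&k_4\\\alpha_4'&\alpha_t&k_4'\end{pmatrix}\\
&\quad = \lim_{\epsilon_1,\epsilon_2\downarrow 0}\ \lim_{\rho\to\infty} \int_{\mathbb{R}} dx_2 \int_{-\rho}^{\rho} dx_3\, dx_1\, \Bigl\{ \Bigl(\Phi^t_{\alpha_t}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4'&\alpha_1\end{bmatrix}_{\epsilon_1}(k_4'; x)\Bigr)^{*}\, Q_{321}\, \Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_{\epsilon_2}(k_4; x)\\
&\qquad\qquad - \Bigl(Q_{321}\, \Phi^t_{\alpha_t}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4'&\alpha_1\end{bmatrix}_{\epsilon_1}(k_4'; x)\Bigr)^{*}\, \Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_{\epsilon_2}(k_4; x) \Bigr\}.
\end{aligned} \tag{63}$$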
The right-hand side of (63) will vanish if $Q_{321}$ can be “partially integrated”. To show that this is the case, one needs some information on the form that $Q_{321}$ takes when acting on functions $f(x)$. By straightforward evaluation of its definition one obtains an expression in terms of shift operators
$$T_1^{is_1b}\, T_2^{is_2b}\, T_3^{is_3b}, \qquad \text{where } T_i = T_{x_i},\ s_i \in \{+, -\},\ i = 1, 2, 3.$$
It is convenient to introduce an alternative set of shift operators
$$T_+^3 = T_1 T_2 T_3, \qquad T_{21}^2 = T_2 T_1^{-1}, \qquad T_{32}^2 = T_3 T_2^{-1}.$$
The crucial point now is that the expression for $Q_{321}$ when rewritten in terms of $T_+$, $T_{21}$, $T_{32}$ takes the following form
$$Q_{321} = \sum_{n_+=-3}^{3}\ \sum_{n_{21}=0}^{3}\ \sum_{n_{32}=0}^{3} P_{n_+ n_{21} n_{32}}(x)\ T_+^{in_+b}\ T_{21}^{\frac{2}{3}ibn_{21}}\ T_{32}^{\frac{2}{3}ibn_{32}}, \tag{64}$$
so it contains shifts of $x_{21}$, $x_{32}$, $x_{31}$ by positive imaginary amounts up to $2ib$ only. Furthermore note that in (63) one may replace $T_+$ by $e^{-2\pi ik_4}$. The analytic properties of the integrand in (63) as they follow from Lemma 22 in Appendix C now allow one to partially integrate $Q_{321}$ by appropriate shifts of the contours of integration over $x_3$, $x_2$, $x_1$ (cf. proof of Proposition 3). The verification of the second equation in (62) is similar.

Remark 8. This result implies that the self-adjoint extensions of $\pi_{321}(u)$, $u = K, Q$ that are defined by the maps $C_{3(21)}$ and $C_{(32)1}$ indeed coincide. An argument similar to that in the proof of the previous proposition will also cover the two other cases $u = E, F$.

5.3. Calculation of the Racah–Wigner coefficients I. It will be useful to also introduce
$$X\begin{pmatrix}\alpha_4&\alpha_s&x_4\\\alpha_4'&\alpha_t&x_4'\end{pmatrix} = \lim_{\epsilon\to 0+} \int_{-\infty}^{\infty} dx_3\, dx_2\, dx_1\, \Bigl(\Phi^t_{\alpha_t}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4'&\alpha_1\end{bmatrix}_\epsilon(x_4'; x)\Bigr)^{*}\, \Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(x_4; x). \tag{65}$$
Proposition 4 has an obvious counterpart for $X$:

Proposition 5. The distribution $X$ is of the form
$$X\begin{pmatrix}\alpha_4&\alpha_s&x_4\\\alpha_4'&\alpha_t&x_4'\end{pmatrix} = \delta(\alpha_4 - \alpha_4')\,\delta(x_4 - x_4') \begin{Bmatrix}\alpha_1&\alpha_2&\alpha_s\\\alpha_3&\alpha_4&\alpha_t\end{Bmatrix}_b. \tag{66}$$
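Proof. Introduce
$$K_{\epsilon,\rho}\begin{pmatrix}\alpha_4&\alpha_s&k_4\\\alpha_4'&\alpha_t&k_4'\end{pmatrix} = \int_{-\infty}^{\infty} dx_2 \int_{-\rho}^{\rho} dx_3\, dx_1\, \Bigl(\Phi^t_{\alpha_t}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4'&\alpha_1\end{bmatrix}_\epsilon(k_4'; x)\Bigr)^{*}\, \Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(k_4; x). \tag{67}$$
The coefficient of $\delta(k_4 - k_4')$ in the expression for $K$ coincides with the sum of the coefficients with which $e^{-2\pi i(k_4 - k_4')x_1}$ and $e^{-2\pi i(k_4 - k_4')x_3}$ appear in the asymptotic expansion of the integrand in (67), cf. Lemma 22. Lemma 2 identifies the origin of these terms in the asymptotic expansion of $\Phi^\flat$, $\flat = s, t$, with the poles in the dependence of $\Phi^\flat[\ldots]_\epsilon(x_4; x)$, $\flat = s, t$ on their variable $x_4$. It follows that the coefficient of $\delta(k_4 - k_4')$ in the expression for $K$ is independent of $k_4$. The result now follows from standard properties of the Fourier transformation.

Proposition 6. We have
$$\begin{Bmatrix}\alpha_1&\alpha_2&\alpha_s\\\alpha_3&\alpha_4&\alpha_t\end{Bmatrix}_b = N\, \frac{S_b(\alpha_2 + \alpha_s - \alpha_1)\, S_b(\alpha_t + \alpha_1 - \alpha_4)}{S_b(\alpha_2 + \alpha_t - \alpha_3)\, S_b(\alpha_s + \alpha_3 - \alpha_4)} \cdot |S_b(2\alpha_t)|^2 \int_{-i\infty}^{i\infty} ds\, \frac{S_b(U_1 + s)\, S_b(U_2 + s)\, S_b(U_3 + s)\, S_b(U_4 + s)}{S_b(V_1 + s)\, S_b(V_2 + s)\, S_b(V_3 + s)\, S_b(V_4 + s)}, \tag{68}$$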
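where the coefficients $U_i$ and $V_i$, $i = 1, \ldots, 4$ are given by
$$\begin{aligned}
U_1 &= \alpha_s + \alpha_1 - \alpha_2, & V_1 &= 2Q + \alpha_s - \alpha_t - \alpha_2 - \alpha_4,\\
U_2 &= Q + \alpha_s - \alpha_2 - \alpha_1, & V_2 &= Q + \alpha_s + \alpha_t - \alpha_4 - \alpha_2,\\
U_3 &= \alpha_s + \alpha_3 - \alpha_4, & V_3 &= 2\alpha_s,\\
U_4 &= Q + \alpha_s - \alpha_3 - \alpha_4, & V_4 &= Q,
\end{aligned} \tag{69}$$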
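and $N$ is a constant.

Proof. Let
$$K_\epsilon\begin{pmatrix}\alpha_4&\alpha_s&x_4\\\alpha_4'&\alpha_t&x_4'\end{pmatrix} = \int_{-\infty}^{\infty} dx_3\, dx_2\, dx_1\, \Bigl(\Phi^t_{\alpha_t}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4'&\alpha_1\end{bmatrix}_\epsilon(x_4'; x)\Bigr)^{*}\, \Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(x_4; x). \tag{70}$$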
The analytic and asymptotic properties of the integrand follow from Lemma 21 in Appendix C. Let us observe that for $\epsilon > 0$ one is dealing with absolutely convergent integrals, the integrand being meromorphic both w.r.t. the integration variables and the parameters. The integral (70) therefore does not depend on the order in which the integrations are performed, so we will assume that it is first integrated over $x_2$. Singular behavior will emerge in the limit $\epsilon \to 0$. We will call a pole relevant if it has distance of $O(\epsilon)$ from the real axis, irrelevant otherwise². It then easily follows from Lemma 3 that the integration over $x_2$ does not introduce any new relevant poles since all the relevant poles in the $x_2$ dependence that have distance of $O(\epsilon)$ are lying on the same side of the contour. Next one may integrate over $x_1$. We find from Lemma 21 in Appendix C that
$$\begin{aligned}
\Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}_\epsilon(x_4, x) &= \frac{R^s_{13}}{x_1 - x_3 + \alpha_{13} - 2i\epsilon} + \frac{R^s_{14}}{x_1 - x_4 + \alpha_{14} - 2i\epsilon} + (\mathrm{Reg}^s),\\
\Bigl(\Phi^t_{\alpha_t}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4'&\alpha_1\end{bmatrix}_\epsilon(x_4', x)\Bigr)^{*} &= \frac{R^t_{13}}{x_1 - x_3 + \alpha_{13}' + 2i\epsilon} + \frac{R^t_{14}}{x_1 - x_4' + \alpha_{14}' + i\epsilon} + (\mathrm{Reg}^t),
\end{aligned} \tag{71}$$
where $(\mathrm{Reg}^\flat)$, $\flat = s, t$ are terms that do not lead to relevant poles in the variable $x_1$ after having integrated over $x_2$. The following abbreviations have been used:
$$\begin{aligned}
\alpha_{13} &= \tfrac{i}{2}(\alpha_1 + \alpha_3 - 2(Q - \alpha_4)), & \alpha_{14} &= \tfrac{i}{2}(\alpha_1 - \alpha_4),\\
\alpha_{13}' &= \tfrac{i}{2}(\alpha_1 + \alpha_3 - 2(Q - \alpha_4')), & \alpha_{14}' &= \tfrac{i}{2}(\alpha_1 - \alpha_4').
\end{aligned} \tag{72}$$
It is then easily found by using Lemma 3 that the result of the integration over $x_1$ will have poles at the following locations:
$$\begin{aligned}
i(\alpha_4 - \alpha_4') - 4i\epsilon &= 0, & x_3 - x_4 - \tfrac{i}{2}(\alpha_3 + \alpha_4 - 2(Q - \alpha_4')) - 4i\epsilon &= 0,\\
x_4' - x_4 + \tfrac{i}{2}(\alpha_4' - \alpha_4) - 3i\epsilon &= 0, & x_4' - x_3 + \tfrac{i}{2}(\alpha_3 + \alpha_4' - 2(Q - \alpha_4)) - 3i\epsilon &= 0.
\end{aligned} \tag{73}$$
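The four conditions in (73) arise by pairing each relevant $x_1$-pole of the s-channel function with each relevant $x_1$-pole of the conjugated t-channel function, as in Lemma 3. The sketch below checks this bookkeeping with arbitrary sample numbers. It relies on an assumption about how the primes are distributed in (71)–(72), namely that unprimed quantities ($\alpha_4$, $x_4$) belong to the s-channel factor and primed ones ($\alpha_4'$, $x_4'$) to the t-channel factor, with $i\epsilon$-shifts $-2i\epsilon$, $-2i\epsilon$, $+2i\epsilon$, $+i\epsilon$ in the four denominators. The function names are hypothetical.

```python
def offsets(alpha1, alpha3, alpha4, alpha4p, Q):
    """Abbreviations alpha_13, alpha_13', alpha_14, alpha_14' of (72) under the stated reading."""
    a13 = 0.5j * (alpha1 + alpha3 - 2 * (Q - alpha4))
    a13p = 0.5j * (alpha1 + alpha3 - 2 * (Q - alpha4p))
    a14 = 0.5j * (alpha1 - alpha4)
    a14p = 0.5j * (alpha1 - alpha4p)
    return a13, a13p, a14, a14p


def pinch_conditions(x3, x4, x4p, alpha1, alpha3, alpha4, alpha4p, Q, eps):
    """Return (differences of x1-pole positions, left-hand sides listed in (73))."""
    a13, a13p, a14, a14p = offsets(alpha1, alpha3, alpha4, alpha4p, Q)
    # Positions of the x1-poles read off from the denominators in (71).
    s13 = x3 - a13 + 2j * eps
    s14 = x4 - a14 + 2j * eps
    t13 = x3 - a13p - 2j * eps
    t14 = x4p - a14p - 1j * eps
    # A pole of the x1-integral appears when a t-pole meets an s-pole.
    from_poles = [t13 - s13, t14 - s14, t13 - s14, t14 - s13]
    listed = [
        1j * (alpha4 - alpha4p) - 4j * eps,
        x4p - x4 + 0.5j * (alpha4p - alpha4) - 3j * eps,
        x3 - x4 - 0.5j * (alpha3 + alpha4 - 2 * (Q - alpha4p)) - 4j * eps,
        x4p - x3 + 0.5j * (alpha3 + alpha4p - 2 * (Q - alpha4)) - 3j * eps,
    ]
    return from_poles, listed


if __name__ == "__main__":
    Q = 0.7 + 1 / 0.7
    found, listed = pinch_conditions(
        0.3, -1.1, 0.8,
        Q / 2 + 0.2j, Q / 2 - 0.5j, Q / 2 + 1.3j, Q / 2 + 0.9j,
        Q, 1e-3,
    )
    print(max(abs(u - v) for u, v in zip(found, listed)))
```

The differences of pole positions reproduce the four expressions of (73) identically, including the $3i\epsilon$ and $4i\epsilon$ offsets, which is the consistency one expects if the reading of (71)–(72) used here is the intended one.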
² We of course assume that $\epsilon$ has been chosen to be much smaller than $b$.

The relevant residues can easily be assembled from the expressions given in Appendix C. Moreover, it is straightforward to work out their poles. By again using Lemma 3 one
then finds that all four poles listed in (73) will, after doing the $x_3$ integration, produce terms that are singular for $x_4 = x_4'$, $\alpha_4 = \alpha_4'$ and $\epsilon \to 0$. The terms that lead to $\delta(x_4 - x_4')\,\delta(\alpha_4 - \alpha_4')$ are easily identified by means of
$$\lim_{\epsilon\to 0+} \Bigl( \frac{1}{x - i\epsilon} - \frac{1}{x + i\epsilon} \Bigr) = 2\pi i\,\delta(x). \tag{74}$$
y31 =0 y21 =0
α4 α3 αs ∗ ∗ ∗
R
Res Res
y31 =0 y21 =0
dx2 Res
y31 =0
αs α2 α1 ∗ x 2 x1
α4 αt α1 ∗ ∗ ∗
Res
x1 =x3 −α13 y32 =0
αt α3 α2 xt ∗ x 2
xt =x3 − 2i (α3 −αt )
(75) .
One just needs to assemble the ingredients to check that the expression (75) coincides with what one finds on the right-hand side of (68). Remark 9. With more patience, one could of course also fix the constant N by the method used in the previous proof. We refrain from doing so since we will present a less tedious and more illuminating way of calculating it in the next subsection. What will be needed there, however, is the information on analyticity of the coefficients {. . . } w.r.t. αt that follows from Proposition 6.
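5.4. Relation between the distributions $\Phi^s$ and $\Phi^t$.

Proposition 7. $\Phi^s$ and $\Phi^t$ are related by a linear transformation of the form
$$\Phi^s_{\alpha_s}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}(x_4; x) = \int_{\mathbb{S}} d\alpha_t \begin{Bmatrix}\alpha_1&\alpha_2&\alpha_s\\\alpha_3&\alpha_4&\alpha_t\end{Bmatrix}_b\, \Phi^t_{\alpha_t}\begin{bmatrix}\alpha_3&\alpha_2\\\alpha_4&\alpha_1\end{bmatrix}(x_4; x). \tag{76}$$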
The relation (76) can be read either (i) as a relation between functions analytic in
\[ A^{(4)} \equiv \{x=(x_4,x_3,x_2,x_1)\in\mathbb{C}^4;\ \mathrm{Im}(x_1)<\mathrm{Im}(x_2)<\mathrm{Im}(x_3),\ \mathrm{Im}(x_1)<\mathrm{Im}(x_4)<\mathrm{Im}(x_3),\ \mathrm{Im}(x_3-x_1)<Q\}, \]
or (ii) as a relation between functions meromorphic w.r.t. $x\in\mathbb{C}^4$, or (iii) as a relation between distributions defined as boundary values of $\Phi^\flat$, $\flat=s,t$, for $(x_4,x)\in\mathbb{R}^4$.

Proof. We will start from Eq. (59). By using Fourier transformation w.r.t. the variable $k_4$ and Eq. (66) one may rewrite (59) as follows:
\[ F^s_f(\alpha_4,\alpha_s,x_4) = \int_{S} d\alpha_t \left\{\begin{smallmatrix}\alpha_1 & \alpha_2 & \alpha_s\\ \alpha_3 & \alpha_4 & \alpha_t\end{smallmatrix}\right\}_b F^t_f(\alpha_4,\alpha_t,x_4). \tag{77} \]
Let us introduce sequences of test-functions that tend towards delta-distributions:
\[ t_n(y;x) = \left(\frac{n}{2\pi}\right)^{3/2} e^{-\frac{n}{2}\|x-y\|^2}, \qquad y=(y_3,y_2,y_1). \tag{78} \]
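The way the Gaussians (78) approximate a delta-distribution can be illustrated in one dimension. The sketch below is an illustration under simplifying assumptions (a single real variable, real $y$, the test function $\cos$), not the three-dimensional setting of the text; the name `gaussian_pairing` is hypothetical.

```python
import math

def gaussian_pairing(n, y, f, steps=20_000, width=12.0):
    """Midpoint-rule value of the integral of t_n(y; x) * f(x) dx in one dimension,
    with t_n(y; x) = sqrt(n/(2*pi)) * exp(-n*(x - y)**2 / 2).

    The integral is truncated to |x - y| <= width/sqrt(n), where the Gaussian
    weight is negligible.
    """
    half = width / math.sqrt(n)
    h = 2.0 * half / steps
    norm = math.sqrt(n / (2.0 * math.pi))
    total = 0.0
    for k in range(steps):
        x = y - half + (k + 0.5) * h
        total += norm * math.exp(-0.5 * n * (x - y) ** 2) * f(x)
    return total * h

if __name__ == "__main__":
    y = 0.3
    for n in (1, 10, 100, 10_000):
        # For f = cos the pairing equals cos(y) * exp(-1/(2n)) exactly.
        print(n, gaussian_pairing(n, y, math.cos), math.cos(y))
```

For $f=\cos$ the deviation from $f(y)$ is of order $1/n$, in line with the delta-sequence property used in Lemma 8.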
Lemma 8. Let $(x_4,y)\in A^{(4)}$ with $\mathrm{Im}(y_1)<0$. In this case one has
\[ \lim_{n\to\infty} F^\flat_{t_n(y;.)}(\alpha_4,\alpha_\flat,x_4) = \Phi^\flat_{\alpha_\flat}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \alpha_4 & \alpha_1\end{smallmatrix}\right](x_4;y). \tag{79} \]

Proof. By writing out the definition of $F^\flat_{t_n}$ and shifting the contours of integration over $x_i$ to $\mathbb{R}+i\,\mathrm{Im}(y_i)$, $i=1,2,3$, one reduces the claim to the standard result that
\[ \lim_{n\to\infty} t_n(y;x) = \delta^3(x-y) \]
for $\mathrm{Im}(y_i)=0$, $i=1,2,3$. (Note that $\Phi^\flat$ is regular for these values of its arguments, as follows from Lemma 21, Appendix C.)
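We will now consider the sequence with elements
\[ \int_{S} d\alpha_t \left\{\begin{smallmatrix}\alpha_1 & \alpha_2 & \alpha_s\\ \alpha_3 & \alpha_4 & \alpha_t\end{smallmatrix}\right\}_b F^t_{t_n(y,.)}(\alpha_4,\alpha_t,x_4). \tag{80} \]
It converges for $n\to\infty$ due to Lemma 8 and Eq. (77). We would like to show that one may exchange the limit $n\to\infty$ with the integration over $\alpha_t$ so that the limit of (80) is given by the integral
\[ \int_{S} d\alpha_t \left\{\begin{smallmatrix}\alpha_1 & \alpha_2 & \alpha_s\\ \alpha_3 & \alpha_4 & \alpha_t\end{smallmatrix}\right\}_b \Phi^t_{\alpha_t}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \alpha_4 & \alpha_1\end{smallmatrix}\right](x_4;y). \tag{81} \]
To this aim it is useful to note the following.

Lemma 9. Under the conditions on the variable $y$ introduced in Lemma 8 one finds that the integrand in (81) decays exponentially for $p_t\equiv -i(\alpha_t-\frac{Q}{2})\to\pm\infty$. The integrand in (80) decays at least as fast as the integrand in (81).

Proof. By a straightforward calculation using the method in the proof of Lemma 17, Appendix B and Eq. (135) one finds that
\[ \Phi^t_{\alpha_t}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \alpha_4 & \alpha_1\end{smallmatrix}\right](x_4;y)\ \text{decays stronger than}\ e^{\mp\pi Q p_t}\quad\text{and}\quad \left\{\begin{smallmatrix}\alpha_1 & \alpha_2 & \alpha_s\\ \alpha_3 & \alpha_4 & \alpha_t\end{smallmatrix}\right\}_b\ \text{grows as}\ e^{\pm\pi Q p_t} \tag{82} \]
for $p_t\to\pm\infty$. The first statement in Lemma 9 follows. The second statement follows from the first by shifting the contour of integration over $x_1$ in the definition of $F^t_{t_n(y,.)}$ to $\mathbb{R}+i\,\mathrm{Im}(y_1)$.

The integrals (80), (81) can therefore be transformed into integrals over a compact set, e.g. the interval $[0,1]$. In order to justify the exchange of limit and integration it therefore suffices to prove the following

Lemma 10. The convergence of $F^t_{t_n(y,.)}(\alpha_4,\alpha_t,x_4)$ is uniform in $\alpha_t$.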
Proof. To shorten the exposition, let us consider a slightly simplified situation. Assume that $f_p(x)$ is analytic w.r.t. both $p$ and $x$ in open strips that contain the real axis and decays exponentially for either $|p|$ or $|x|$ going to infinity. Let $t_n(x)=\sqrt{\frac{n}{2\pi}}\,e^{-nx^2/2}$ and study the convergence of $f_{p,n}\equiv\int_{\mathbb{R}}dx\,f_p(x)\,t_n(x)$ for $n\to\infty$. Upon writing $f_p(x)=f_p(0)+x\,g_p(x)$, the task reduces to the study of
\[ \int_{\mathbb{R}} dx\; g_p(x)\,x\,t_n(x) = \frac{1}{\sqrt{2\pi n}}\int_{\mathbb{R}} dx\; e^{-\frac{n}{2}x^2}\,\partial_x g_p(x). \tag{83} \]
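Equation (83) is an integration by parts based on $x\,t_n(x)=-\frac{1}{n}\partial_x t_n(x)$, and it makes the $O(1/n)$ rate of convergence explicit. The sketch below checks both sides numerically for the sample choice $g(x)=\sin x$, for which each side equals $\frac{1}{n}e^{-1/(2n)}$. The choice of $g$ and all function names are illustrative assumptions, not taken from the paper.

```python
import math

def gauss_integral(n, integrand, steps=20_000, width=12.0):
    """Midpoint rule over |x| <= width/sqrt(n); the integrand is assumed to carry
    a Gaussian factor exp(-n*x^2/2) and to be negligible outside that window."""
    half = width / math.sqrt(n)
    h = 2.0 * half / steps
    return h * sum(integrand(-half + (k + 0.5) * h) for k in range(steps))

def t_n(n, x):
    """One-dimensional Gaussian test function sqrt(n/(2*pi)) * exp(-n*x^2/2)."""
    return math.sqrt(n / (2.0 * math.pi)) * math.exp(-0.5 * n * x * x)

def lhs_83(n, g):
    """Left-hand side of (83): integral of g(x) * x * t_n(x)."""
    return gauss_integral(n, lambda x: g(x) * x * t_n(n, x))

def rhs_83(n, dg):
    """Right-hand side of (83): (2*pi*n)^(-1/2) * integral of exp(-n*x^2/2) * g'(x)."""
    integral = gauss_integral(n, lambda x: math.exp(-0.5 * n * x * x) * dg(x))
    return integral / math.sqrt(2.0 * math.pi * n)

if __name__ == "__main__":
    for n in (1, 10, 100):
        # Sample choice g = sin, g' = cos (hypothetical; any smooth bounded g works).
        print(n, lhs_83(n, math.sin), rhs_83(n, math.cos), math.exp(-0.5 / n) / n)
```

Since the right-hand side is bounded by $\frac{1}{n}\sup|\partial_x g_p|$, a bound on $\partial_x g_p$ that is uniform in $p$ gives uniform convergence, which is the point of the argument.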
Convergence for $n\to\infty$ will be uniform in $p$ provided that $\partial_x g_p(x)$ is bounded as a function of both $p$ and $x$. But this is a consequence of our assumptions: The exponential decay allows us to transform $f_p(x)$ (resp. $\partial_x g_p(x)$) to a function that is analytic on a compact rectangle in $\mathbb{C}^2$, and therefore bounded. The regularity properties of $\Phi^t$ necessary to extend the argument to the present situation follow from Lemma 21, Appendix C.

We have proved (76) provided $(x_4,x)$ satisfies the same conditions as $(x_4,y)$ in Lemma 8. Proposition 7 follows by analytic continuation.
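5.5. Calculation of Racah–Wigner coefficients II. We have shown that the meromorphic functions $\Phi^s$ and $\Phi^t$ are related by an integral transformation of the form (76). If one fixes the values of three of the four variables $x_4,\ldots,x_1$ in (76) one obtains an integral transformation for a function of a single variable. In fact, the analytic properties of $\Phi^s_{\alpha_s}$ and $\Phi^t_{\alpha_t}$ even allow one to choose complex values. It will be convenient to consider
\[ \Psi^s_{\alpha_s}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x) = \lim_{x_4\to\infty} e^{2\pi\alpha_4 x_4}\lim_{x_2\to-\infty}\ \prod_{j=1}^{3} e^{-2\pi\alpha_j x_j}\ \Phi^s_{\alpha_s}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x_4;x_3,x_2,x_1)\Big|_{x_1=x,\ x_3=\frac{i}{2}(Q+\alpha_2-\alpha_4)}, \tag{84} \]
where $\bar\alpha = Q-\alpha$, and the same for $\Psi^t_{\alpha_t}$. The integral that defines $\Phi^s_{\alpha_s}$ and $\Phi^t_{\alpha_t}$, (54)(57) can be done explicitly in this limit by using (146). One finds expressions of the form
\[ \begin{aligned}
\Psi^s_{\alpha_s}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x) &= N^s_{\alpha_s}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right]\Theta^s_{\alpha_s}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x),\\
\Theta^s_{\alpha_s}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x) &= e^{+2\pi x(\alpha_s-\alpha_2-\alpha_1)}\,F_b(\alpha_s+\alpha_1-\alpha_2,\ \alpha_s+\alpha_3-\alpha_4;\ 2\alpha_s;\ -ix),\\
\Psi^t_{\alpha_t}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x) &= N^t_{\alpha_t}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right]\Theta^t_{\alpha_t}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x),\\
\Theta^t_{\alpha_t}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x) &= e^{-2\pi x(\alpha_t+\alpha_1-\alpha_4)}\,F_b(\alpha_t+\alpha_3-\alpha_2,\ \alpha_t+\alpha_1-\alpha_4;\ 2\alpha_t;\ +ix),
\end{aligned} \tag{85} \]
where $F_b$ is the b-hypergeometric function defined in the Appendix, and $N^s_{\alpha_s}$, $N^t_{\alpha_t}$ are certain normalization factors.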
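The linear transformation following from (76) can now be calculated as follows: One observes that $\Psi^s_{\alpha_s}$ (resp. $\Psi^t_{\alpha_t}$) are eigenfunctions of the finite difference operators $Q_s$ and $Q_t$ defined respectively by
\[ \begin{aligned}
Q_s &= \big[d_x+\alpha_1+\alpha_2-\tfrac{Q}{2}\big]_b^2 - e^{+2\pi bx}\,[d_x+\alpha_1+\alpha_2+\alpha_3-\alpha_4]_b\,[d_x+2\alpha_1]_b,\\
Q_t &= \big[d_x+\alpha_1-\alpha_4+\tfrac{Q}{2}\big]_b^2 - e^{-2\pi bx}\,[d_x+\alpha_1+\alpha_2-\alpha_3-\alpha_4]_b\,[d_x]_b.
\end{aligned} \tag{86} \]
It can be shown that

Theorem 4. The operators $Q_s$ and $Q_t$ have unique self-adjoint extensions in $L^2(\mathbb{R},dx\,e^{2\pi Qx})$. Bases of $L^2(\mathbb{R},dx\,e^{2\pi Qx})$ in the sense of generalized eigenfunctions are given by the sets of functions $\{\Theta^s_{\alpha_s};\ \alpha_s\in S\}$ and $\{\Theta^t_{\alpha_t};\ \alpha_t\in S\}$, where the normalization is given by
\[ \int_{\mathbb{R}} dx\; e^{2\pi Qx}\left(\Theta^\flat_{\alpha_\flat}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x)\right)^{\!*}\Theta^\flat_{\alpha_\flat'}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x) = \delta(\alpha_\flat-\alpha_\flat'),\qquad \flat=s,t. \tag{87} \]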
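The proof is omitted as it is very similar to the proof of Theorem 3. It follows that the Racah–Wigner coefficients can be evaluated in terms of the overlap between these two bases:
\[ \left\{\begin{smallmatrix}\alpha_1 & \alpha_2 & \alpha_s\\ \alpha_3 & \bar\alpha_4 & \alpha_t\end{smallmatrix}\right\}_b = \frac{N^s_{\alpha_s}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right]}{N^t_{\alpha_t}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right]}\int_{\mathbb{R}} dx\; e^{2\pi Qx}\left(\Theta^t_{\alpha_t}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x)\right)^{\!*}\Theta^s_{\alpha_s}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \bar\alpha_4 & \alpha_1\end{smallmatrix}\right](x). \tag{88} \]
The integral can be done by using the representation (143) for the b-hypergeometric function. The result is just Eq. (68) with $N=1$.

5.6. Properties of the Racah–Wigner coefficients. First of all let us note that orthogonality and completeness of the bases $\{\Phi^s_{\alpha_s};\ \alpha_s\in S\}$ and $\{\Phi^t_{\alpha_t};\ \alpha_t\in S\}$ imply the following orthogonality relations for the b-Racah–Wigner symbols:
\[ \int_{S} d\alpha_s\; |S_b(2\alpha_s)|^2 \left\{\begin{smallmatrix}\alpha_1 & \alpha_2 & \alpha_s\\ \alpha_3 & \alpha_4 & \alpha_t\end{smallmatrix}\right\}_b \left(\left\{\begin{smallmatrix}\alpha_1 & \alpha_2 & \alpha_s\\ \alpha_3 & \alpha_4 & \beta_t\end{smallmatrix}\right\}_b\right)^{\!*} = |S_b(2\alpha_t)|^2\,\delta(\alpha_t-\beta_t). \tag{89} \]
This may be verified e.g. by rewriting
\[ \left(\Phi^t_{\alpha_t}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \alpha_4 & \alpha_1\end{smallmatrix}\right](x_4;\,.\,),\ \Phi^t_{\alpha_t'}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \alpha_4' & \alpha_1\end{smallmatrix}\right](x_4';\,.\,)\right) = |S_b(2\alpha_t)|^{-2}\,\delta(\alpha_t-\alpha_t')\,\delta(\alpha_4-\alpha_4')\,\delta(x_4-x_4') \tag{90} \]
with the help of the inversion formula to (76),
\[ \Phi^t_{\alpha_t}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \alpha_4 & \alpha_1\end{smallmatrix}\right](x_4;x) = \int_{S} d\alpha_s \left|\frac{S_b(2\alpha_s)}{S_b(2\alpha_t)}\right|^2 \left(\left\{\begin{smallmatrix}\alpha_1 & \alpha_2 & \alpha_s\\ \alpha_3 & \alpha_4 & \alpha_t\end{smallmatrix}\right\}_b\right)^{\!*}\Phi^s_{\alpha_s}\!\left[\begin{smallmatrix}\alpha_3 & \alpha_2\\ \alpha_4 & \alpha_1\end{smallmatrix}\right](x_4;x), \tag{91} \]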
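and finally using (90) with subscripts $t$ replaced by $s$. Second, by considering quadruple products of representations one finds the so-called pentagon equation in the usual way:
\[ \int_{S} d\delta_1 \left\{\begin{smallmatrix}\alpha_1 & \alpha_2 & \beta_1\\ \alpha_3 & \beta_2 & \delta_1\end{smallmatrix}\right\}_b \left\{\begin{smallmatrix}\alpha_1 & \delta_1 & \beta_2\\ \alpha_4 & \alpha_5 & \gamma_2\end{smallmatrix}\right\}_b \left\{\begin{smallmatrix}\alpha_2 & \alpha_3 & \delta_1\\ \alpha_4 & \gamma_2 & \gamma_1\end{smallmatrix}\right\}_b = \left\{\begin{smallmatrix}\beta_1 & \alpha_3 & \beta_2\\ \alpha_4 & \alpha_5 & \gamma_1\end{smallmatrix}\right\}_b \left\{\begin{smallmatrix}\alpha_1 & \alpha_2 & \beta_1\\ \gamma_1 & \alpha_5 & \gamma_2\end{smallmatrix}\right\}_b. \tag{92} \]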
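5.7. From intertwiners to coinvariants. Let us consider coinvariants on tensor products of representations. These will be maps $B:\mathcal{P}_{\alpha_n}\otimes\ldots\otimes\mathcal{P}_{\alpha_1}\to\mathbb{C}$ that satisfy the coinvariance property
\[ B\circ(\pi_{\alpha_n}\otimes\ldots\otimes\pi_{\alpha_1})\,\Delta^{(n)}(u) = 0,\qquad u\in U_q(sl(2,\mathbb{R})), \tag{93} \]
where $\Delta^{(n)}$ is defined recursively by $\Delta^{(n)}=(\mathrm{id}\otimes\Delta)(\Delta^{(n-1)})=(\Delta\otimes\mathrm{id})(\Delta^{(n-1)})$, $\Delta^{(2)}\equiv\Delta$. The basic case to consider is $n=2$. Let $B_\alpha:\mathcal{P}_{Q-\alpha}\otimes\mathcal{P}_\alpha\to\mathbb{C}$ be defined by
\[ B_\alpha(f\otimes g)\equiv\langle f,\,Tg\rangle,\qquad T\equiv T_x^{-i\frac{Q}{2}}. \tag{94} \]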
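Proposition 8. $B_\alpha$ satisfies the coinvariance property (93).

Proof. Let us note that
\[ \langle T_x^{i\alpha}f,\,g\rangle = \langle f,\,T_x^{-i\alpha}g\rangle \tag{95} \]
if $f\in\mathcal{P}_{Q-\alpha}$ and $g\in\mathcal{P}_\alpha$. A straightforward calculation then shows that
\[ \langle\pi_{Q-\alpha}(u)f,\,g\rangle = \langle f,\,\pi_\alpha(u)g\rangle,\qquad u\in U_q(sl(2,\mathbb{R})). \tag{96} \]
It is useful to also note the commutation relations
\[ TE_\alpha = e^{-i\pi bQ}E_\alpha T,\qquad TF_\alpha = e^{+i\pi bQ}F_\alpha T,\qquad TK_\alpha = K_\alpha T. \tag{97} \]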
We may then calculate in the case u = E Bα ((πQ−α ⊗ πα ) ◦ &(E))f ⊗ g = EQ−α f , T Kα g + KQ−α f , T Eα g = EQ−α f , Kα T g + e−iπbQ KQ−α f , Eα T g = f , Eα Kα T g − q = 0.
−1
(98)
T f , Kα Eα T g
The calculation for the case u = F is identical and the case u = K is trivial.
A coinvariant Bα : Pα ⊗ Pα is then obtained by combining Bα with the intertwining operator Iα : Bα ≡ Bα ◦ (Iα ⊗ id).
(99)
In order to construct coinvariants B (n) for n > 2 one may use intertwining maps C ∈ HomUq (sl(2,R)) (Pαn−1 ⊗ . . . ⊗ Pα1 , Pαn ). Such maps can be constructed by iterating Clebsch–Gordan maps, as has been discussed explicitly in the case n = 4 at the beginning of the present section. One may associate a coinvariant BC to any C ∈ HomUq (sl(2,R)) (Pαn−1 ⊗ . . . ⊗ Pα1 , Pαn ) via BC ≡ B ◦ (id ⊗ C).
(100)
The maps $C$ can be represented explicitly with the help of meromorphic integral kernels $\Phi_C(x_n;x)$, $x\equiv(x_{n-1},\ldots,x_1)$, that generalize $\Phi^\flat_{\alpha_\flat}$ and the Clebsch–Gordan coefficients. It follows that the corresponding coinvariant $B_C$ can be represented as
\[ B_C(f_n\otimes\ldots\otimes f_1) = \int_{\mathbb{R}} dx_n\,\big(T_{x_n}^{i\frac{Q}{2}}f_n\big)(x_n)\int_{\mathbb{R}^{n-1}} dx\;\Phi_C(x_n;x)\,f_{n-1}(x_{n-1})\ldots f_1(x_1). \tag{101} \]
It is possible to rewrite (101) as a convolution of $f_n(x_n)\ldots f_1(x_1)$ against a kernel $\Psi_C(x)$, $x\equiv(x_n,\ldots,x_1)$: To this aim it is necessary to "partially" integrate the finite difference operator in (101) to let it act on $\Phi_C$. One should note that the analytic continuation of the integral over $x$ to complex values of $x_n$ may in general be represented by integrating the variable $x$ over deformed contours, cf. e.g. the proof of Proposition 3. One arrives at a representation of the form
\[ B_C(f_n\otimes\ldots\otimes f_1) = \int_{C_n} dx_n\ldots dx_1\;\Psi_C(x_n,\ldots,x_1)\,f_n(x_n)\ldots f_1(x_1), \tag{102} \]
where
\[ \Psi_C(x_n,\ldots,x_1) = T_{x_n}^{-i\frac{Q}{2}}\,\Phi_C(x_n;x_{n-1},\ldots,x_1). \tag{103} \]
Remark 10. The kernels that represent the coinvariants are in some respects analogous to functional realizations of the conformal blocks in conformal field theory. We strongly suspect that we are touching upon the tip of an iceberg at this point: Quantization of Teichmüller space, as developed in [22, 23] conjecturally leads to a construction of spaces of conformal blocks in Liouville theory. One may expect this to be equivalent to a quantization of certain moduli spaces of flat SL(2, R) connections on Riemann surfaces with marked points. In analogy to results of [24] one would expect spaces of conformal blocks in the case of the punctured Riemann sphere to be represented by spaces of coinvariants in tensor products of Uq (sl(2, R)) representations. A class of these has been constructed in the present subsection. It would certainly be rather interesting
and far-reaching if one could establish a direct relation between these spaces and the Hilbert spaces constructed via quantization of Teichmüller space. In this regard we find the following observation quite intriguing: Consider the case of $n=4$. There is a canonical way to define a Hilbert space $H_{(0,4)}$ of coinvariants by taking the sets $\{\Phi^\flat_\alpha;\ \alpha\in S\}$ for either $\flat=s$ or $\flat=t$ as basis in the sense of generalized functions with the normalization given by
\[ \big(\Phi^\flat_\alpha,\ \Phi^\flat_{\alpha'}\big) = |S_b(2\alpha)|^{-2}\,\delta(\alpha-\alpha'). \tag{104} \]
The observation made in Subsect. 5.6. now implies that $H_{(0,4)}$ is in a canonical way isomorphic to $L^2(\mathbb{R})$ such that multiplication with $[\alpha_s-\frac{Q}{2}]_b^2$ (resp. $[\alpha_t-\frac{Q}{2}]_b^2$) gets mapped into the self-adjoint finite difference operator $Q_s$ (resp. $Q_t$). Maybe there is a rather direct connection of these operators to the geodesic length operators appearing in the quantization of Teichmüller space. This would establish a direct relation between the latter and our quantum group results.

6. Appendix A: Spectral Analysis of $C_{21}(\kappa_3)$

This appendix is devoted to the proof of Theorem 3.

6.1. Preliminaries. The difference operator to be considered is of the form
\[ C_{21}(\kappa_3)-\big[\alpha_3-\tfrac{Q}{2}\big]_b^2 = \delta_+\,e^{\pi ibQ}e^{2\pi bx} - \delta_0 + \delta_-\,e^{-\pi ibQ}e^{-2\pi bx}, \tag{105} \]
where $\delta_s$, $s=-,0,+$ are $x$-independent finite difference operators given by
\[ \begin{aligned}
\delta_+ &= T_x^{-ib}\,[d_x-\alpha_2-ik_3]_b\,[d_x-\alpha_1+ik_3]_b,\\
2\delta_0 &= \{0\}_b\{Q\}_b\,T_x^{-2ib} - \big(e^{-2\pi bk_3}\{2\alpha_2-Q\}_b + e^{2\pi bk_3}\{2\alpha_1-Q\}_b\big)\,T_x^{-ib} + \{2\alpha_3-Q\}_b,\\
\delta_- &= T_x^{-ib}\,[d_x+\alpha_2-ik_3]_b\,[d_x+\alpha_1+ik_3]_b,
\end{aligned} \tag{106} \]
and $\kappa_3=-2k_3$. It will initially be defined on the domain $D\subset L^2(\mathbb{R})$ consisting of functions with the following property: There exists a function $F(z)$ that is

1. holomorphic in the strip $\{z\in\mathbb{C}\,|\,\mathrm{Im}(z)\in[-2b,0]\}$ and
2. the functions $F_y(x)\equiv F(x+iy)$ are in $L^2(\mathbb{R},dx\cosh(2\pi bx))$ for any $y\in[-2b,0]$.

Proposition 9. The operator $(C_{21}(k_3),D)$ is a symmetric, densely defined operator in $L^2(\mathbb{R})$. The domain $D^\dagger$ of its adjoint is dense as well.

Proof. First of all note that one has
\[ (f,\,T_x^{-ib}g) = (T_x^{-ib}f,\,g) \tag{107} \]
for any $f,g\in D$. This follows by shifting the contour of the integration that represents $(f,T_-g)$ to the line $\mathbb{R}+ib$. The fact that $C_{21}(\kappa_3)$ is symmetric is then seen by a simple calculation remembering that $\alpha_i^*=Q-\alpha_i$, $i=1,2$. The fact that $D$ and $D^\dagger$ are dense in $L^2(\mathbb{R})$ is easily seen by noting that any Hermite function is contained in these sets.
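The Paley–Wiener theorem provides a characterization of the Fourier-transform $\tilde D$ of the domain $D$ of $C_{21}(\kappa_3)$. The action of $C_{21}(\kappa_3)$ on functions in $D$ then corresponds to acting on $\tilde D$ with the following operator:
\[ \begin{aligned}
\tilde C_{21}(\kappa_3)-\big[\alpha_3-\tfrac{Q}{2}\big]_b^2 &\equiv \Delta_0 - e^{2\pi b\omega}\Delta_1 + e^{4\pi b\omega}\Delta_2,\\
\Delta_0 &= \big[d_\omega+\alpha_3-Q-\tfrac12(\alpha_1+\alpha_2)\big]_b\,\big[d_\omega-\alpha_3-\tfrac12(\alpha_1+\alpha_2)\big]_b,\\
\Delta_1 &= \big[d_\omega+\tfrac12(\alpha_1+\alpha_2)\big]_b\Big(e^{i\pi b(d_\omega-\frac12(\alpha_1+\alpha_2)+Q)}\{\alpha_1-\alpha_2-2ik\}_b - e^{-i\pi b(d_\omega-\frac12(\alpha_1+\alpha_2)+Q)}\{\alpha_1-\alpha_2+2ik\}_b\Big),\\
\Delta_2 &= \big[d_\omega+\tfrac12(\alpha_1+\alpha_2)\big]_b\,\big[d_\omega+\tfrac12(\alpha_1+\alpha_2)+Q\big]_b.
\end{aligned} \tag{108} \]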
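6.2. Strategy. The key to the proof of Theorem 3 is the following result characterizing regularity and asymptotic properties of distributional solutions to the eigenvalue equation of the operator $C_{21}(\kappa_3)$:

Theorem 5. Let $\Phi\in S'(\mathbb{R})$ be a distributional solution of $\big(C_{21}(\kappa_3)-[\alpha_3-\frac{Q}{2}]_b^2\big)^t\Phi=0$.

1. $\tilde\Phi$ is represented by a function $\tilde\Phi(\omega)$ that can be continued to a meromorphic function on $\mathbb{C}$, with simple poles within $S_{Q/2}$ only at
\[ \begin{aligned}
\omega &= -k_3+i(\alpha_1+nb+mb^{-1}), &\quad \omega &= -k_3-i(\alpha_1+nb+mb^{-1}),\\
\omega &= +k_3+i(\alpha_2+nb+mb^{-1}), &\quad \omega &= +k_3-i(\alpha_2+nb+mb^{-1}),
\end{aligned}\qquad n,m\in\mathbb{Z}_{\ge 0}. \]
2. $\Phi$ can be represented as $\Phi=\lim_{\epsilon\to 0}\Phi_\epsilon$ where $\Phi_\epsilon$ is for $\epsilon>0$ represented as the restriction to $\mathbb{R}$ of a function $\Phi_\epsilon(x)$ that is meromorphic on $\mathbb{C}$ with poles only at
\[ \begin{aligned}
x &= +\tfrac{i}{2}(\alpha_1+\alpha_2-Q)\pm i\big(\alpha_3-\tfrac{Q}{2}\big)-i(\epsilon+nb+mb^{-1}),\\
x &= -\tfrac{i}{2}(\alpha_1+\alpha_2-Q)+i\big(\tfrac{Q}{2}+nb+mb^{-1}\big),
\end{aligned}\qquad n,m\in\mathbb{Z}_{\ge 0}. \]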
In fact, given these properties it is not very difficult to show that for any given eigenvalue $[\alpha_3-\frac{Q}{2}]_b^2$ there is at most one tempered distributional solution to the eigenvalue equation (Proposition 13). Moreover, no such solution exists for $\mathrm{Re}(2\alpha_3-Q)\neq 0$. It follows [25] that the deficiency indices vanish and $C_{21}(\kappa_3)$ has a unique self-adjoint extension. The spectral decomposition can be written as an expansion into generalized eigenfunctions [26]. It can be shown on rather general grounds that only tempered distributions can appear in the spectral decomposition, as is nicely discussed in [27]. The combination of Theorem 5 and Proposition 13 therefore also yields a characterization of the support of the Plancherel measure. These remarks reduce the proof of Theorem 3 to that of Theorem 5 and Proposition 13.

6.3. Preparations. In view of the explicit expressions for $C_{21}(\kappa_3)$ (cf. (105)) resp. its Fourier-transform (108) one may anticipate that the analysis of the asymptotic behavior of $\Phi$ and $\tilde\Phi$ will require some information about properties of the operators $\delta_+$, $\delta_-$ resp. $\Delta_0$, $\Delta_2$. The information that will be needed is contained in the following lemmas:
Lemma 11. $\delta_\pm$ is invertible on $C_c^\infty(\mathbb{R})$. The image $f(x)$ of a function $g\in C_c^\infty(\mathbb{R})$ under $\delta_\pm^{-1}$ has the following properties:

1. $f(x)$ is analytic in the strip $\{x\in\mathbb{C};\ \mathrm{Im}(x)\in(-2b,0)\}$ and $f(x)\in C^\infty(\mathbb{R})$, $f(x-2ib)\in C^\infty(\mathbb{R})$.
2. $\tilde f(\omega)$ is meromorphic in $\mathbb{C}$ with simple poles at
\[ \omega=-k_3+i(\mp\alpha_1+nb^{-1}),\qquad \omega=+k_3+i(\mp\alpha_2+nb^{-1}),\qquad n\in\mathbb{Z}. \]

Proof. The action of $\delta_\pm^{-1}$ is represented on the Fourier transform $\tilde f$ as multiplication with
\[ (\tilde\delta_\pm)^{-1}(\omega)\equiv e^{-2\pi b\omega}\,[i\omega\mp\alpha_2-ik_3]_b^{-1}\,[i\omega\mp\alpha_1+ik_3]_b^{-1}. \]
The statement on the analyticity properties of $\tilde f$ is then clear after recalling that the function $\tilde g(\omega)$ is entire analytic and of rapid decay, being the Fourier transform of a $C_c^\infty$ function [21, Theorem IX.11]. The statement that $(\delta_+^{-1}g)(x)$ is analytic in the strip $\{x\in\mathbb{C};\ \mathrm{Im}(x)\in(-2b,0)\}$ follows from the asymptotic decay properties of $(\tilde\delta_\pm^{-1})(\omega)$ by means of the Paley–Wiener Theorem. In fact, the rapid decay of $\tilde g(\omega)$ ensures convergence of the inverse Fourier transformation for any $x$-derivative of $(\delta_+^{-1}g)(x)$ even in the extremal cases $\mathrm{Im}(x)=0$ and $\mathrm{Im}(x)=-2b$.

We will furthermore need similar statements about the inverses of $\Delta_0$ and $\Delta_2$.

Lemma 12. $\Delta_2$ is invertible on $C_c^\infty(\mathbb{R})$. The image $f(\omega)$ of a function $g\in C_c^\infty(\mathbb{R})$ under $\Delta_2^{-1}$ has the following properties:

1. $\tilde f(x)$ is meromorphic in $\mathbb{C}$ with simple poles at
\[ x=-\tfrac{i}{2}(\alpha_1+\alpha_2)-i(Q+nb^{-1}),\qquad x=-\tfrac{i}{2}(\alpha_1+\alpha_2)+inb^{-1},\qquad n\in\mathbb{Z}. \]
2. $f(\omega)$ is analytic in the strip $\{\omega\in\mathbb{C};\ \mathrm{Im}(\omega)\in(-b,b)\}$ and $f(\omega\pm ib)\in C^\infty(\mathbb{R})$.

Lemma 13. $\Delta_0$ is invertible on the space of functions
\[ D(\Delta_0)\equiv\Big\{\big(d_\omega+\alpha_3-Q-\tfrac12(\alpha_1+\alpha_2)\big)\big(d_\omega-\alpha_3-\tfrac12(\alpha_1+\alpha_2)\big)h;\quad h\in C_c^\infty(\mathbb{R})\Big\}. \]
The image $f(\omega)$ of a function $g\in D(\Delta_0)$ under $\Delta_0^{-1}$ has the following properties:

1. $\tilde f(x)$ is meromorphic in $\mathbb{C}$ with simple poles at
\[ x=+\tfrac{i}{2}(\alpha_1+\alpha_2-Q)\pm i\big(\alpha_3-\tfrac{Q}{2}\big)-inb^{-1},\qquad n\in\mathbb{Z}\setminus\{0\}. \]
2. $f(\omega)$ is analytic in the strip $\{\omega\in\mathbb{C};\ \mathrm{Im}(\omega)\in(-b,b)\}$ and $f(\omega\pm ib)\in C^\infty(\mathbb{R})$.
6.4. Asymptotic estimates. We now want to show that the Fourier-transform $\tilde\Phi$ of $\Phi$ may actually be represented by integration against a function $\tilde\Phi(\omega)$. For technical reasons it will be necessary to start by considering the distribution $\Phi_R\in S'(\mathbb{R})$ defined by
\[ \tilde\Phi_R\equiv\tilde\delta_{\mathrm{tr},R}(\omega)\,\tilde\Phi\equiv\prod_{\substack{\omega'\in I_+\cup I_-\\ |\mathrm{Im}(\omega')|<R}}(\omega-\omega')\ \tilde\Phi, \]
where $I_+$ (resp. $I_-$) are the sets of values for $\omega$, where either $\tilde\delta_+(\omega)$ or $\tilde\delta_-(\omega)$ have a pole in the upper (resp. lower) half plane. The following result characterizes the asymptotic behavior of $\Phi_R$.

Proposition 10. Let $\tau_n\in C_c^\infty(\mathbb{R})$ have support only in $[n-1,n+1]$. For a sufficiently large value of $R$ there exists some $N>0$ such that
\[ \cosh(2\pi bn)\,\big|\langle\Phi_R,\tau_n\rangle\big|<N\quad\text{for all } n\in\mathbb{Z}. \tag{109} \]

Proof. We will rewrite $\langle\Phi_R,\tau_n\rangle$ in a form that allows us to estimate its asymptotics for large $n$. One may write
\[ \begin{aligned}
\langle\Phi_R,\tau_n\rangle &= \langle\Phi,\ \delta_{\mathrm{tr},R}\tau_n\rangle\\
&= \langle\Phi,\ \delta_+e^{2\pi bx}\sigma_{n,R}\rangle, &&\text{where}\quad \sigma_{n,R}\equiv e^{-2\pi bx}(\delta_+)^{-1}\delta_{\mathrm{tr},R}\tau_n;\\
&= \langle\Phi,\ \delta_+^c\,\sigma_{n,R}\rangle, &&\text{where}\quad \delta_+^c\equiv(\delta_0-\delta_-e^{-2\pi bx}).
\end{aligned} \tag{110} \]
In the last step we have used that $\Phi$ weakly solves the eigenvalue equation, for which one needs to check that $\sigma_{n,R}\in D$: One point of having introduced $\delta_{\mathrm{tr},R}$ is that it improves the asymptotic behavior of $(\delta_+)^{-1}\delta_{\mathrm{tr},R}\tau_n$ for $x\to-\infty$ by cancelling the poles of its Fourier transform in $\{\omega\in\mathbb{C};\ \mathrm{Im}(\omega)<R\}$. The regularity theorem for tempered distributions [20, Theorem V.10] allows us to furthermore write
\[ \langle\Phi_R,\tau_n\rangle = \int_{-\infty}^{\infty}dx\;\Theta(x)\,\rho_{n,R}(x),\qquad\text{where}\quad \rho_{n,R}\equiv\partial_x^k\,\delta_+^c\,e^{-2\pi bx}(\delta_+)^{-1}\delta_{\mathrm{tr},R}\tau_n, \tag{111} \]
for some positive integer $k$ and a polynomially bounded continuous function $\Theta(x)$. The functions $\rho_{n,R}(x)$ may be represented by expressions of the form
\[ \rho_{n,R}(x) = \sum_{k=1,2}C_k\,e^{-2\pi bx}\int_{-\infty}^{\infty}d\omega\;e^{2\pi i\omega x}\,\frac{P_{k,R}(\omega)\,\tilde\tau_n(\omega)}{(1-e^{2\pi b(\omega-k+i\alpha_1)})(1-e^{2\pi b(\omega+i\alpha_2)})}, \tag{112} \]
where $P_{k,R}(\omega)$, $k=1,2$ are some polynomials in $\omega$. The functions $\rho_{n,R}(x)$ have main support around $x=n$, and by choosing $R$ large enough one can achieve decay stronger than $e^{-2\pi\lambda|x-n|}$ for any $\lambda>0$. It is then convenient to split the integral in (111) into an integral $J_n$ obtained by integrating over $[\frac{n}{2},\frac{3n}{2}]$ and the remainder $J_n^c$.

In order to estimate $J_n^c$ one may use the polynomial boundedness of $\Theta(x)$ to estimate its absolute value by some constant times $\cosh(\epsilon x)$, where $\epsilon$ can be as small as one likes. The absolute value of $\rho_{n,R}(x)$ can in $\mathbb{R}\setminus[\frac{n}{2},\frac{3n}{2}]$ be estimated by some inverse power of
$\cosh(x)$, which is bounded by the chosen value of $R$. It follows that there exist $D_1$, $N_1$ such that
\[ |J_n^c|\le D_1\,e^{-2\pi\mu n}\quad\text{for any } n>N_1, \tag{113} \]
where $\mu$ can be made arbitrarily large by choosing $R$ large enough. In the case of $J_n$ one may estimate $|\rho_{n,R}(x)|$ by some constant times $e^{-2\pi bn}e^{-2\pi b|x-n|}$ and $\Theta(x)$ simply by a constant, which easily gives existence of $D_2$, $N_2$ such that
\[ |J_n|\le D_2\,e^{-2\pi bn}\quad\text{for any } n>N_1. \tag{114} \]
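This proves the claim about the asymptotics for $n\to\infty$. In the case of $n\to-\infty$ one uses the operator $\delta_-$ in a completely analogous fashion.

6.5. Representation of $\tilde\Phi$. Assume that the set $\{\tau_n;\ n\in\mathbb{Z}\}$ represents a $C_c^\infty(\mathbb{R})$-partition of unity. It will be convenient to choose the $\tau_n$ as translates of $\tau_0$: $\tau_n(x)=\tau_0(x-n)$. This can always be achieved: Let
\[ \tau_0(x)=\begin{cases}
0 & \text{if } |x|>\tfrac34,\\
1 & \text{if } |x|<\tfrac14,\\
\chi(x+\tfrac12) & \text{if } x\in[-\tfrac34,-\tfrac14],\\
1-\chi(x-\tfrac12) & \text{if } x\in[+\tfrac14,+\tfrac34],
\end{cases}\qquad
\chi(x)=N^{-1}\int_{-\frac14}^{x}dt\,\exp\!\left(\frac{1}{(t-\frac14)(t+\frac14)}\right),\qquad
N=\int_{-\frac14}^{\frac14}dt\,\exp\!\left(\frac{1}{(t-\frac14)(t+\frac14)}\right). \tag{115} \]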
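The construction (115) can be tried out numerically. The following sketch is an illustration only: it approximates $\chi$ by a midpoint rule (the step count and the helper names `bump`, `integrate_bump`, `chi`, `tau0`, `tau` are assumptions) and checks that the translates $\tau_n(x)=\tau_0(x-n)$ sum to one.

```python
import math

def bump(t):
    """exp(1/((t - 1/4)(t + 1/4))) on (-1/4, 1/4), extended by zero elsewhere."""
    d = t * t - 0.0625  # equals (t - 1/4)(t + 1/4)
    if d >= 0.0:
        return 0.0
    return math.exp(1.0 / d)

def integrate_bump(upper, steps=2000):
    """Midpoint rule for the integral of bump over [-1/4, upper]."""
    h = (upper + 0.25) / steps
    return h * sum(bump(-0.25 + (k + 0.5) * h) for k in range(steps))

NORM = integrate_bump(0.25)  # the constant N in (115)

def chi(x):
    """Smooth step rising from 0 at x = -1/4 to 1 at x = +1/4."""
    return integrate_bump(x) / NORM

def tau0(x):
    a = abs(x)
    if a >= 0.75:
        return 0.0
    if a <= 0.25:
        return 1.0
    return chi(x + 0.5) if x < 0 else 1.0 - chi(x - 0.5)

def tau(n, x):
    """Translate tau_n(x) = tau_0(x - n)."""
    return tau0(x - n)

if __name__ == "__main__":
    for x in (-0.4, 0.0, 0.3, 0.5, 0.62, 0.9):
        # Only the translates with |x - n| < 3/4 contribute to the sum.
        print(x, sum(tau(n, x) for n in range(-3, 4)))
```

On an overlap interval such as $[\frac14,\frac34]$ the two contributing terms are $1-\chi(x-\frac12)$ and $\chi(x-\frac12)$, so the sum equals one independently of the quadrature accuracy.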
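The result of Proposition 10 implies convergence of the following sum:
\[ \tilde\Phi_R(\omega)\equiv\sum_{n\in\mathbb{Z}}\langle\Phi_R,\ \tau_n e^{-2\pi i\omega x}\rangle, \tag{116} \]
which defines $\tilde\Phi_R(\omega)$ as a function that is analytic in the strip $\{\omega\in\mathbb{C};\ \mathrm{Im}(\omega)\in(-b,b)\}$.

Proposition 11. The function $\tilde\Phi_R(\omega)$ represents the distribution $\Phi_R$ in the sense that
\[ \langle\Phi_R,f\rangle = \int_{-\infty}^{\infty}d\omega\;\tilde\Phi_R(\omega)\,\tilde f(\omega). \tag{117} \]

Proof. To begin with, note that $\tilde\Phi_{R,n}(\omega)\equiv\langle\Phi_R,\ \tau_n e^{-2\pi i\omega x}\rangle$ represents the Fourier-transform of the distribution $\tau_n\Phi_R\in S'(\mathbb{R})$ of compact support [21, Theorem IX.12]. It follows that $\langle\Phi_R,\ \tau_n e^{-2\pi i\omega x}\rangle$ is polynomially bounded. Since the convergence in (116) is absolute, one concludes that $\tilde\Phi_R(\omega)$ is polynomially bounded as well. In the evaluation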
of $\tilde\Phi_R(\omega)$ against a test-function $f\in S(\mathbb{R})$ one may therefore insert definition (116) and exchange the orders of integration and summation to get
\[ \int_{-\infty}^{\infty}d\omega\;\tilde\Phi_R(\omega)\tilde f(\omega) = \sum_{n\in\mathbb{Z}}\int_{-\infty}^{\infty}d\omega\;\tilde\Phi_{R,n}(\omega)\tilde f(\omega) = \sum_{n\in\mathbb{Z}}\langle\Phi_R,\ \tau_n f\rangle = \langle\Phi_R,f\rangle, \tag{118} \]
where we used the fact that the set $\{\tau_n;\ n\in\mathbb{Z}\}$ represents a partition of unity in the last step.

In order to recover the sought-for distribution $\Phi$ from $\Phi_R$ one only has to divide $\tilde\Phi_R(\omega)$ by $\tilde\delta_{\mathrm{tr},R}(\omega)$. The resulting function is meromorphic in the strip $\{\omega\in\mathbb{C};\ \mathrm{Im}(\omega)\in(-b,b)\}$, with poles at distance $\frac12(b^{-1}-b)$ from the real axis.

6.6. Representation of $\Phi$. In order to get a similar result on the representation of $\Phi$ in $x$-space we will analogously consider the asymptotics of $\tilde\Phi$ in $\omega$-space. Here it will be convenient to start by considering
\[ \Phi_R\equiv\tilde\delta_{\mathrm{tr},R}(x)\,\Phi\equiv\prod_{s\in\{+,-\}}(x-x_s)\prod_{\substack{y\in I_+\cup I_-\\ |\mathrm{Im}(y)|<R}}(x-y)\ \Phi, \]
where $I_+$ (resp. $I_-$) denotes the union of the sets of zeros of $\tilde\Delta_2(z)$ and $\tilde\Delta_0(z)$ which lie in the upper (resp. lower) half plane, and $x_\pm$ are the zeros of $\tilde\Delta_0(z)$ that lie on the real axis, given by $x_\pm\equiv+\frac{i}{2}(\alpha_1+\alpha_2-Q)\pm i(\alpha_3-\frac{Q}{2})$. For the asymptotics of $\tilde\Phi_R$ one has a result completely analogous to Proposition 10:
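Proposition 12. Let $\{\tau_n;\ n\in\mathbb{Z}\}$ be a sequence of functions in $C_c^\infty(\mathbb{R})$ that have support only in $[n-1,n+1]$. For sufficiently large $R$ there exists some $N>0$ such that
\[ \cosh(2\pi bn)\,\big|\langle\tilde\Phi_R,\tau_n\rangle\big|<N\quad\text{for all } n\in\mathbb{Z}. \tag{119} \]

Proof. The proof is to a large extent analogous to that of Proposition 10, so we will only sketch some necessary modifications. In order to get an estimate of $\langle\tilde\Phi_R,\tau_n\rangle$ for $n\to-\infty$ one may use the eigenvalue equation to rewrite it as
\[ \langle\tilde\Phi_R,\tau_n\rangle = \langle\tilde\Phi,\ \Delta_0\Delta_0^{-1}\delta_{\mathrm{tr},R}\tau_n\rangle = \langle\tilde\Phi,\ \Delta_0^c\Delta_0^{-1}\delta_{\mathrm{tr},R}\tau_n\rangle,\qquad\text{where}\quad \Delta_0^c=e^{2\pi b\omega}\Delta_1-e^{4\pi b\omega}\Delta_2. \tag{120} \]
It follows as in the proof of Proposition 10 that $\langle\tilde\Phi_R,\tau_n\rangle\sim e^{+2\pi bn}$ for $n\to-\infty$. In the case of $n\to\infty$ one may use instead
\[ \langle\tilde\Phi_R,\tau_n\rangle = \langle\tilde\Phi,\ e^{4\pi b\omega}\Delta_2\Delta_2^{-1}e^{-4\pi b\omega}\delta_{\mathrm{tr},R}\tau_n\rangle = \langle\tilde\Phi,\ \Delta_2^c\Delta_2^{-1}e^{-4\pi b\omega}\delta_{\mathrm{tr},R}\tau_n\rangle,\qquad\text{where}\quad \Delta_2^c=e^{2\pi b\omega}\Delta_1-\Delta_0, \tag{121} \]
which gives $\langle\tilde\Phi_R,\tau_n\rangle\sim e^{-2\pi bn}$ for $n\to\infty$.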
It follows as in the previous section that $\Phi_R$ is represented by convolution against a function $\Phi_R(x)$ which is holomorphic in $\{x\in\mathbb{C};\ \mathrm{Im}(x)\in(-b,b)\}$. In this case, however, recovering $\Phi$ from $\Phi_R$ is more subtle since $\tilde\delta_{\mathrm{tr},R}(x)$ has two simple zeros on the real axis. The resulting ambiguity in the definition of $\Phi$ in terms of $\Phi_R(x)$ is well-known (cf. e.g. [20, Chapter V, Example 9]) and may be parametrized as follows:
\[ \Phi = \prod_{s\in\{+,-\}}\left(\frac{C_s}{x-x_s+i0}+\frac{1-C_s}{x-x_s-i0}\right)\prod_{\substack{y\in I_+\cup I_-\\ |\mathrm{Im}(y)|<R}}\frac{1}{x-y}\ \Phi_R(x). \tag{122} \]
Lemma 2 then describes the corresponding asymptotic behavior of $\tilde\Phi(\omega)$. In general one would find terms with exponential decay weaker than $e^{-2\pi b|\omega|}$ for $\omega\to\infty$ that come either from zeros of $\tilde\delta_{\mathrm{tr},R}(x)$ strictly above the real axis, or from $x_\pm$ in the case of $C_s\neq 0$. The occurrence of such terms can be excluded by means of the following argument:

Lemma 14. Let $\Phi\in S'(\mathbb{R})$ be a distributional solution of $\big(C_{21}(\kappa_3)-[\alpha_3-\frac{Q}{2}]_b^2\big)^t\Phi=0$ that is represented by a function $\tilde\Phi(\omega)$ which has asymptotic behavior for $\omega\to\infty$ of the form
\[ \tilde\Phi(\omega) = +2\pi i\sum_{j\in I_-}e^{-2\pi iz_j\omega}R_j+\tilde\Phi_b(\omega), \]
where $\tilde\Phi_b(\omega)$ decays at least as fast as $e^{-2\pi b\omega}$ for $\omega\to\infty$. Then $R_j=0$ if $\mathrm{Im}(z_j)<b$.

Proof. Consider $\langle\tilde\Phi,\tau_n\rangle$, where now $\tau_n$ is chosen proportional to $e^{-\kappa(x-n)^2}$. One has
\[ \big[\alpha_3-\tfrac{Q}{2}\big]_b^2\,\langle\tilde\Phi,\tau_n\rangle = \Big\langle\tilde\Phi,\ \Big(\Delta_0-e^{2\pi b\omega}\Delta_1+e^{4\pi b\omega}\Delta_2+\big[\alpha_3-\tfrac{Q}{2}\big]_b^2\Big)\tau_n\Big\rangle. \tag{123} \]
Now if there were terms with exponential decay weaker than $e^{-2\pi b\omega}$ in the asymptotic expansion of $\tilde\Phi(\omega)$ for $\omega\to\infty$ one would find terms that grow exponentially with $n\to\infty$ on the right-hand side of (123). But polynomial boundedness of $\tilde\Phi$ excludes the occurrence of such terms on the left-hand side of (123).
6.7. Completing the proof of Theorem 5. Concerning the distribution $\Phi$, we previously found that away from its singular support at $x=x_\pm$ it is represented by a function $\Phi(x)$. The asymptotic behavior of $\Phi(x)$ is via Lemma 2 given by the analytic properties of $\tilde\Phi$ that were stated after the proof of Proposition 11. The possible poles of $\tilde\Phi$ at distance $\frac12(b^{-1}-b)$ from the real axis would lead to terms which decay more slowly than $e^{-2\pi b|x|}$ for $|x|\to\infty$. The appearance of such terms can now easily be excluded by an argument analogous to the proof of Lemma 14 in the $x$-representation.

Furthermore, knowing that the function $\Phi(x)$ that represents $\Phi$ away from its singular support decays exponentially for $|x|\to\infty$ allows us to use an argument very similar to the proof of Proposition 10 to further improve upon the estimate of the rate of decay as given in Proposition 10: In estimating $J_n$ one may for large enough $n$ replace $\Theta(x)$ by $\Phi(x)$. The exponential decay of the latter may then be used to improve (114) to
\[ |J_n|\le D_2\,e^{-2\pi\nu n}\quad\text{for any } n>N_1 \tag{124} \]
for some $\nu>b$, implying that $\Phi(x)$ decays faster than $e^{-2\pi b|x|}$ for $|x|\to\infty$. But this means via Lemma 2 that the Fourier-transformation $\tilde\Phi(\omega)$ is analytic in an open strip containing $\{\omega\in\mathbb{C};\ |\mathrm{Im}(\omega)|<b\}$, and that $\tilde\Phi(\omega)$ solves $\big(\tilde C_{21}(k_3)-[\alpha_3-\frac{Q}{2}]_b^2\big)^t\tilde\Phi(\omega)=0$ in the ordinary sense. The meromorphic extension to all of $\mathbb{C}$ is then easily obtained by using the eigenvalue equation to define the values of $\tilde\Phi(\omega)$ outside $\{\omega\in\mathbb{C};\ |\mathrm{Im}(\omega)|<b\}$ in terms of those inside. This finishes the proof of the first half of Theorem 5. The completion of the proof of the second half proceeds along very similar lines.

6.8. Uniqueness of generalized eigenfunctions. Theorem 3 also implies that the meromorphic function $\Phi(x)$ that represents the distribution $\Phi$ must solve the transpose of the eigenvalue equation in the usual sense.

Proposition 13. There is at most one solution to $\big(C_{21}(\kappa_3)-[\alpha_3-\frac{Q}{2}]_b^2\big)^t\Phi(x)=0$ that has the analytic and asymptotic properties that follow from Theorem 5.

Proof. If one introduces $H(x)$ via (recall $\kappa_3=-2k_3$)
\[ \Phi(x) = e^{\pi x(\alpha_3+\alpha_1-\alpha_2-i\kappa_3)}\,\frac{S_b(-ix-\frac12(\alpha_1+\alpha_2)+\alpha_3)}{S_b(-ix+\frac12(\alpha_1+\alpha_2))}\times H\big(x-\tfrac{i}{2}(\alpha_1+\alpha_2-2(Q-\alpha_3))\big), \tag{125} \]
one may verify by direct calculation using the functional equation of the function $S_b(x)$ that the equation $\big(C_{21}(\kappa_3)-[\alpha_3-\frac{Q}{2}]_b^2\big)^t\Phi(x)=0$ is equivalent to the following equation for $H(x)$:
\[ \Big[(1-e^{2\pi ib(\alpha_3+\alpha_1-\alpha_2)}T_x^{ib})(1-e^{2\pi ib(\alpha_3-i\kappa_3)}T_x^{ib}) - e^{-2\pi bx}(1-T_x^{ib})(1-e^{2\pi ib(\alpha_1-\alpha_2-i\kappa_3)}T_x^{ib})\Big]H(x)=0. \tag{126} \]
By using Lemma 2 and the properties of $S_b(x)$ that are summarized in Appendix B one may deduce the following properties of the Fourier transform $\tilde H(\omega)$ of $H(x)$ from Theorem 5:

1. $H(x)$ has a Fourier transform $\tilde H(\omega)$ that is analytic in $\{\omega\in\mathbb{C};\ \mathrm{Im}(\omega)\in(-Q/2,0)\}$, and
2. $\tilde H(\omega)$ has the following asymptotic behavior for $\omega\to\pm\infty$:
\[ \tilde H(\omega)=R_+(\omega),\qquad \tilde H(\omega)=K_-+R_-(\omega), \]
where $K_-$ is a constant, $R_-(\omega)$ has exponential decay for $\omega\to-\infty$ and $R_+(\omega)$ has exponential decay stronger than $e^{-4\pi b\omega}$ for $\omega\to\infty$.

Equation (126) is equivalent to the following first order difference equation for $\tilde H(\omega)$:
\[ \Big[(1-e^{2\pi ib(\alpha_3+\alpha_1-\alpha_2-i\omega)})(1-e^{2\pi ib(\alpha_3-i\kappa_3-i\omega)}) - (1-e^{2\pi ib(Q-i\omega)})(1-e^{2\pi ib(Q+\alpha_1-\alpha_2-i\kappa_3-i\omega)})\,T_\omega^{ib}\Big]\tilde H(\omega)=0. \tag{127} \]
Now there exists a solution to (127), namely
\[ \tilde H(\omega) = \frac{G_b(\alpha_3+\alpha_1-\alpha_2-i\omega)\,G_b(\alpha_3-i\kappa_3-i\omega)}{G_b(Q-i\omega)\,G_b(Q+\alpha_1-\alpha_2-i\kappa_3-i\omega)}, \tag{128} \]
that has all the required analytic and asymptotic properties. If there was a second solution $\tilde H'(\omega)$ of these conditions one could consider the ratio $Q(\omega)\equiv\tilde H'(\omega)/\tilde H(\omega)$. This ratio must be a solution to $(T_\omega^{ib}-1)Q(\omega)=0$. Since $\tilde H(\omega)$ has no zeros in the open strip $\{\omega\in\mathbb{C};\ \mathrm{Im}(\omega)\in(-Q/2,0)\}$ one concludes that $Q(\omega)$ is holomorphic in any such strip. The function $Q(\omega)$ must furthermore be asymptotic to the constant function for $\omega\to\pm\infty$. But this implies that $Q=\mathrm{const.}$: The function $P(z)\equiv Q(\frac{b}{2\pi}\ln(z))$ is holomorphic and regular on the whole Riemann sphere, therefore constant.

7. Appendix B: Special Functions

The basic building block for the class of special functions to be considered is the Double Gamma function introduced by Barnes [28], see also [29]. The Double Gamma function is defined as
\[ \log\Gamma_2(s|\omega_1,\omega_2) = \left(\frac{\partial}{\partial t}\sum_{n_1,n_2=0}^{\infty}(s+n_1\omega_1+n_2\omega_2)^{-t}\right)_{t=0}. \tag{129} \]
Let $\Gamma_b(x)=\Gamma_2(x|b,b^{-1})$, and define the Double Sine function $S_b(x)$ and the Upsilon function $\Upsilon_b(x)$ respectively by
\[ S_b(x)=\frac{\Gamma_b(x)}{\Gamma_b(Q-x)}. \tag{130} \]
It will also be useful to introduce
\[ G_b(x)=e^{\frac{\pi i}{2}x(x-Q)}\,S_b(x). \tag{131} \]
7.1. Useful properties of $S_b$, $G_b$.

7.1.1. Self-duality.
\[ S_b(x)=S_{b^{-1}}(x),\qquad G_b(x)=G_{b^{-1}}(x). \tag{132} \]

7.1.2. Functional equations.
\[ S_b(x+b)=2\sin(\pi bx)\,S_b(x),\qquad G_b(x+b)=(1-e^{2\pi ibx})\,G_b(x). \tag{133} \]

7.1.3. Reflection property.
\[ S_b(x)\,S_b(Q-x)=1,\qquad G_b(x)\,G_b(Q-x)=e^{\pi i(x^2-xQ)}. \tag{134} \]

7.1.4. Analyticity. $S_b(x)$ and $G_b(x)$ are meromorphic functions with poles at $x=-nb-mb^{-1}$ and zeros at $x=Q+nb+mb^{-1}$, $n,m\in\mathbb{Z}_{\ge 0}$.
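The functional equation (133) and the reflection property (134) for $S_b$ can be checked numerically for real arguments. The sketch below relies on an assumption that is not stated in this paper: the integral representation $\log S_b(x)=\int_0^\infty\frac{dt}{t}\Big[\frac{\sinh((\frac{Q}{2}-x)t)}{2\sinh(\frac{bt}{2})\sinh(\frac{t}{2b})}-\frac{Q-2x}{t}\Big]$, valid for $0<\mathrm{Re}(x)<Q$, which is commonly quoted in the literature on the double sine function. The function name and the quadrature parameters are likewise illustrative.

```python
import math

def log_double_sine(x, b, t_max=80.0, steps=40_000):
    """log S_b(x) for real 0 < x < Q = b + 1/b.

    Assumed integral representation (from the general literature, not this paper):
      log S_b(x) = int_0^inf dt/t [ sinh((Q/2 - x) t) / (2 sinh(b t/2) sinh(t/(2b)))
                                    - (Q - 2x)/t ].
    Evaluated with a midpoint rule on (0, t_max]. Very small or very large b
    would overflow sinh at t_max and need a smaller cutoff.
    """
    q = b + 1.0 / b
    a = 0.5 * q - x  # so that Q - 2x = 2a
    h = t_max / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        ratio = math.sinh(a * t) / (2.0 * math.sinh(0.5 * b * t) * math.sinh(0.5 * t / b))
        total += (ratio - 2.0 * a / t) / t
    # Beyond t_max only the -(Q - 2x)/t^2 term is non-negligible; its tail
    # integrates to -(Q - 2x)/t_max.
    return total * h - 2.0 * a / t_max

if __name__ == "__main__":
    b, x = 0.8, 0.5
    q = b + 1.0 / b
    # Functional equation (133): log S_b(x + b) - log S_b(x) = log(2 sin(pi b x)).
    print(log_double_sine(x + b, b) - log_double_sine(x, b),
          math.log(2.0 * math.sin(math.pi * b * x)))
    # Reflection property (134): log S_b(x) + log S_b(Q - x) = 0.
    print(log_double_sine(x, b) + log_double_sine(q - x, b))
```

The check is restricted to real $x$ with both $x$ and $x+b$ inside $(0,Q)$; complex arguments, as needed in the main text, would require a complex-valued quadrature.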
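7.1.5. Asymptotic behavior.
\[ S_b(x)\sim\begin{cases} e^{-\frac{\pi i}{2}(x^2-xQ)} & \text{for } \mathrm{Im}(x)\to+\infty,\\ e^{+\frac{\pi i}{2}(x^2-xQ)} & \text{for } \mathrm{Im}(x)\to-\infty, \end{cases}\qquad
G_b(x)\sim\begin{cases} 1 & \text{for } \mathrm{Im}(x)\to+\infty,\\ e^{+\pi i(x^2-xQ)} & \text{for } \mathrm{Im}(x)\to-\infty. \end{cases} \tag{135} \]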
7.2. b-beta integral.

Lemma 15. We have
\[ B_b(\alpha,\beta)\equiv\frac{1}{i}\int_{-i\infty}^{i\infty}d\tau\;e^{2\pi i\tau\beta}\,\frac{G_b(\tau+\alpha)}{G_b(\tau+Q)} = \frac{G_b(\alpha)\,G_b(\beta)}{G_b(\alpha+\beta)}. \tag{136} \]

Proof. From the relation (recall $T_\tau^b f(\tau)\equiv f(\tau+b)$)
\[ 0=\int_{-i\infty}^{i\infty}d\tau\;(1-T_\tau^b)\,e^{2\pi i\tau\beta}\,\frac{G_b(\tau+\alpha)}{G_b(\tau+Q)}, \tag{137} \]
which easily follows from the analyticity and asymptotic properties of the $G_b$-function by means of Cauchy's theorem, one finds the following functional equation for $B_b(\alpha,\beta)$:
\[ \frac{B_b(\alpha,\beta+b)}{B_b(\alpha+b,\beta)} = \frac{1-e^{2\pi ib\beta}}{1-e^{2\pi ib\alpha}}. \tag{138} \]
By the $b\to b^{-1}$ self-duality of $B_b$ one also has the same equation with $b\to b^{-1}$. For irrational values of $b$ it follows that (138) and its $b\to b^{-1}$ counterpart determine $B_b$ uniquely up to a function of $\alpha+\beta$. The expression on the left-hand side of course satisfies (138). To fix the remaining ambiguity one may note that the integral defining $B_b$ can be evaluated in the special case of $\alpha=b^{-1}$ by means of [31, Chapt. 1.5., Eq. (28)]:
\[ B_b(b^{-1},\beta)=\frac{b^{-1}}{1-e^{2\pi ib^{-1}\beta}}. \tag{139} \]
Equation (136) follows.

Let us also introduce the combination
\[ \Theta_b(y;\alpha)\equiv\frac{G_b(y)}{G_b(y+\alpha)}. \tag{140} \]
The b-beta-integral (136) can be read as a formula for the Fourier-transform of $\Theta_b(y;\alpha)$:
\[ \Theta_b(y;\alpha)=\frac{1}{G_b(\alpha)}\,\frac{1}{i}\int_{-i\infty}^{i\infty}d\tau\;e^{2\pi i\alpha\tau}\,\Theta_b(\tau+y;\,Q-y). \tag{141} \]
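An expansion describing the asymptotic behavior of $\Theta_b(y;\alpha)$ for $|\mathrm{Im}(y)|\to\infty$ can therefore easily be obtained from Lemma 2: One finds
\[ \begin{aligned}
\Theta_b(y;\alpha) &\underset{\mathrm{Im}(y)\to+\infty}{\simeq}\ \sum_{n,m\ge 0}\Theta^{(n,m)}_{b,+}(\alpha)\,e^{2\pi i(nb+mb^{-1})y},\\
\Theta_b(y;\alpha) &\underset{\mathrm{Im}(y)\to-\infty}{\simeq}\ \sum_{n,m\ge 0}\Theta^{(n,m)}_{b,-}(\alpha)\,e^{-2\pi i(\alpha+nb+mb^{-1})y},
\end{aligned} \tag{142} \]
where $\Theta^{(0,0)}_{b,+}(\alpha)=1$, $\Theta^{(0,0)}_{b,-}(\alpha)=e^{-\pi i\alpha(\alpha-Q)}$.

7.3. b-hypergeometric function. The b-hypergeometric function will be defined by an integral representation that resembles the Barnes integral for the ordinary hypergeometric function:
\[ F_b(\alpha,\beta;\gamma;y)=\frac{1}{i}\,\frac{S_b(\gamma)}{S_b(\alpha)S_b(\beta)}\int_{-i\infty}^{i\infty}ds\;e^{2\pi isy}\,\frac{S_b(\alpha+s)\,S_b(\beta+s)}{S_b(\gamma+s)\,S_b(Q+s)}, \tag{143} \]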
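where the contour is to the right of the poles at $s = -\alpha - nb - mb^{-1}$ and $s = -\beta - nb - mb^{-1}$ and to the left of the poles at $s = nb + mb^{-1}$ and $s = Q - \gamma + nb + mb^{-1}$, $n, m = 0, 1, 2, \ldots$. The function $F_b(\alpha,\beta;\gamma;-ix)$ is a solution of the b-hypergeometric difference equation
\[
\big([\delta_x+\alpha][\delta_x+\beta] - e^{-2\pi b x}[\delta_x][\delta_x+\gamma-Q]\big)\, F_b(\alpha,\beta;\gamma;-ix) = 0, \qquad \delta_x = \frac{1}{2\pi}\partial_x.
\tag{144}
\]
This definition of a b-hypergeometric function is closely related to the one first given in [30].

Lemma 16. Consider the case that $\mathrm{Re}(\alpha) = \mathrm{Re}(\beta) = Q/2$, $\mathrm{Re}(\gamma) = Q$. $F_b(\alpha,\beta;\gamma;y)$ is analytic in $y$ in the strip $\{y \in \mathbb{C};\ \mathrm{Re}(y) \in (-Q/2, Q/2)\}$. The leading asymptotic behavior for $|\mathrm{Im}(y)| \to \infty$ is given by
\[
\begin{aligned}
F_b(\alpha,\beta;\gamma;y) ={}& 1 + O(e^{2\pi i b y}) + e^{2\pi i(Q-\gamma)y}\,\frac{S_b(\gamma)}{S_b(2Q-\gamma)}\,\frac{S_b(Q+\beta-\gamma)\,S_b(Q+\alpha-\gamma)}{S_b(\alpha)\,S_b(\beta)}\,\big(1 + O(e^{2\pi i b y})\big), \\
F_b(\alpha,\beta;\gamma;y) ={}& e^{-2\pi i\alpha y}\,\frac{S_b(\gamma)\,S_b(\alpha-\beta)}{S_b(\beta)\,S_b(\gamma-\alpha)}\,\big(1 + O(e^{-2\pi i b y})\big) + e^{-2\pi i\beta y}\,\frac{S_b(\gamma)\,S_b(\beta-\alpha)}{S_b(\alpha)\,S_b(\gamma-\beta)}\,\big(1 + O(e^{-2\pi i b y})\big).
\end{aligned}
\tag{145}
\]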
There is also a kind of deformed Euler-integral for the hypergeometric function [30]:
\[
\Psi_b(\alpha,\beta;\gamma;y) = \frac{1}{i}\int_{-i\infty}^{i\infty} ds\, e^{2\pi i s\beta}\,\frac{G_b(s+y)\,G_b(s+\gamma-\beta)}{G_b(s+y+\alpha)\,G_b(s+Q)}.
\tag{146}
\]
For the case of main interest, $\mathrm{Re}(\alpha) = \mathrm{Re}(\beta) = Q/2$, $\mathrm{Re}(\gamma) = Q$ and $\mathrm{Re}(x) = 0$, one needs to deform the contour such that it passes the pole at $s = 0$ in the right half plane and the pole at $s = -y$ in the left half plane, respectively. It then defines a function that is analytic in the right $y$ half plane and develops a pole on the imaginary axis at $x = 0$ (Lemma 3).

Lemma 17. $\Psi_b(\alpha,\beta;\gamma;y)$ has the following asymptotic behavior for $|\mathrm{Im}(y)| \to \infty$:
\[
\begin{aligned}
\Psi_b(\alpha,\beta;\gamma;y) ={}& \frac{G_b(\gamma-\beta)\,G_b(\beta)}{G_b(\gamma)}\,\big(1 + O(e^{2\pi i b y})\big) + e^{\pi i(\gamma-\beta)(\gamma-\beta-Q)}\, e^{2\pi i(Q-\gamma)y}\,\frac{G_b(Q+\alpha-\gamma)}{G_b(2Q-\gamma)\,G_b(\alpha)}\,\big(1 + O(e^{2\pi i b y})\big), \\
\Psi_b(\alpha,\beta;\gamma;y) ={}& e^{-2\pi i\alpha y}\, e^{-\pi i\alpha(\alpha-Q)}\,\frac{G_b(\beta-\alpha)\,G_b(\gamma-\beta)}{G_b(\gamma-\alpha)}\,\big(1 + O(e^{-2\pi i b y})\big) + e^{-2\pi i\beta y}\, e^{-\pi i\beta(\beta-Q)}\,\frac{G_b(\alpha-\beta)\,G_b(\beta)}{G_b(\alpha)}\,\big(1 + O(e^{-2\pi i b y})\big).
\end{aligned}
\tag{147}
\]
Proof. In order to study the limit $\mathrm{Im}(y) \to \infty$ it is convenient to split the integral into two integrals $I_+$ and $I_-$ over the intervals $(-y/2, \infty)$ and $(-\infty, -y/2)$ respectively. In the case of $I_+$ one may use the asymptotics of the $\Theta_b$ functions containing $y$ for the imaginary part of their argument going to $+\infty$, Eq. (142), to get
\[
\lim_{\mathrm{Im}(y)\to\infty} I_+ = \lim_{\mathrm{Im}(y)\to\infty} \frac{1}{i}\int_{-y/2}^{i\infty} ds\, e^{2\pi i s\beta}\,\frac{G_b(s+\gamma-\beta)}{G_b(s+Q)} = \frac{G_b(\beta)\,G_b(\gamma-\beta)}{G_b(\gamma)},
\tag{148}
\]
where (136) was used in the second step. To study the behavior of $I_-$ for $\mathrm{Im}(y) \to \infty$ it is convenient to change the integration variable in the second integral to $t = s + y$. One gets
\[
I_- = \frac{1}{i}\int_{-i\infty}^{y/2} dt\, e^{2\pi i(t-y)\beta}\,\frac{G_b(t)\,G_b(t-y+\gamma-\beta)}{G_b(t+\alpha)\,G_b(t-y+Q)}.
\tag{149}
\]
In this expression one may now use the asymptotics of the $\Theta_b$ functions containing $y$ for the imaginary part of their argument going to $-\infty$, Eq. (142), which yields as previously
\[
\lim_{\mathrm{Im}(y)\to\infty} e^{-2\pi i y(Q-\gamma)}\, I_- = e^{\pi i(\gamma-\beta)(\gamma-\beta-Q)}\,\frac{G_b(Q+\alpha-\gamma)}{G_b(2Q-\gamma)\,G_b(\alpha)}.
\tag{150}
\]
The behavior for $\mathrm{Im}(y) \to -\infty$ is studied similarly.

Lemma 18. $\Psi_b(\alpha,\beta;\gamma;y)$ is a solution of the finite difference equation $L_b \Psi_b = 0$, where
\[
L_b \equiv e^{-2\pi i b y}\,(1 - T_y^b)(1 - e^{2\pi i b(\gamma-Q)}\,T_y^b) - (1 - e^{2\pi i b\alpha}\,T_y^b)(1 - e^{2\pi i b\beta}\,T_y^b).
\tag{151}
\]
Proof. Abbreviate the integrand in (146) by $I$. A direct calculation shows that it satisfies the equation
\[
L_b I = -(1 - e^{2\pi i b\alpha})\,(1 - T_s^b)\, e^{2\pi i s\beta}\,\frac{G_b(s+x)\,G_b(s+\gamma-\beta)}{G_b(s+x+\alpha+b)\,G_b(s+b^{-1})}.
\tag{152}
\]
The lemma follows from Cauchy's theorem.

The finite difference equation allows us to define the meromorphic continuation of $\Psi_b$ into the right $y$ half plane. The precise relation between $\Psi_b$ and $F_b$ is
\[
\Psi_b(\alpha,\beta;\gamma;y) = \frac{G_b(\beta)\,G_b(\gamma-\beta)}{G_b(\gamma)}\, F_b(\alpha,\beta;\gamma;y'), \qquad y' = y - \tfrac{1}{2}(\gamma-\alpha-\beta+Q).
\tag{153}
\]
This follows as in the proof of Proposition (13) from the facts that (i) the finite difference equations satisfied by left- and right-hand sides of (153) are equivalent, and (ii) analytic and asymptotic properties of the functions of y appearing on both sides of (153) coincide. 8. Appendix C This appendix collects some results on the analytic and asymptotic properties of Clebsch– Gordan coefficients, the kernels = , = = s, t and the Racah–Wigner coefficients. 8.1. Clebsch–Gordan coefficients. Lemma 19. The analytic and asymptotic properties of the Clebsch–Gordan coefficients α3 α2 α1 may be summarized as follows: x 3 x 2 x1 Q − α 3 α2 α 1 decays exponentially as e−2παi |xi | if any one of |xi | → ∞, i = 2. x2 x1 x3 1, 2, 3. 2. the Clebsch–Gordan coefficients are meromorphic w.r.t. each variable xi , i = 1, 2, 3 with poles w.r.t. x1 at Upper half plane: Lower half plane:
x1 x1 x1 x1
= x2 − = x3 − = x2 − = x3 −
−1 i 2 (α1 + α2 − 2α3 ) + i(! + nb + mb ) −1 i 2 (α3 + α1 − Q) + i(! + nb + mb ), −1 i 2 (Q − α1 − α2 ) − i(Q + nb + mb ) −1 i 2 (2α2 − α3 − α1 ) − i(Q + nb + mb ),
where n, m ∈ Z≥0 , and w.r.t. x2 at Upper half plane: Lower half plane:
x2 x2 x2 x2
= x1 + = x3 + = x1 − = x3 −
−1 i 2 (Q − α1 − α2 ) + i(Q + nb + mb ) −1 i 2 (2α1 − α3 − α2 ) + i(Q + nb + mb ), −1 i 2 (2α3 − α1 − α2 ) − i(! + nb + mb ) −1 i 2 (Q − α3 − α2 ) − i(! + nb + mb ).
Proof. Direct consequence of analytic and asymptotic properties of the Sb -function given in Appendix B.
α 3 α2 α1 w.r.t. variables κ3 , κ2 , κ1 is of the following κ3 κ2 κ 1
Lemma 20. The dependence of form:
α3 α2 α1 α α α = δ(κ3 − κ2 − κ1 ) Z 3 2 1 , κ3 κ2 κ1 κ3 κ2 κ1
(154)
Q − α3 α2 α1 is defined on the hypersurface κ3 − κ2 − κ1 = 0 only and is κ3 κ2 κ1 meromorphic w.r.t. κi , i = 1, 2, 3 with poles only at where Z
κi = ±i(αi + nb + mb−1 ),
i = 1, 2, 3,
n, m ∈ Z≥0 .
(155)
Proof. One needs to calculate
α α α α3 α2 α1 = dx2 dx1 e2πik1 x1 e2πik2 x2 3 2 1 . κ3 κ2 κ1 κ3 x 2 x 1
(156)
R
By inserting (35) and changing variables (x1 , x2 ) → (x+ , x− ), x± ≡ x2 ± x1 one finds
α α α that the integration over x+ produces δ(κ3 − κ2 − κ1 ). Z 3 2 1 is therefore given κ3 κ 2 κ 1 by the integral Z
α3 α2 α1 = dx− eπix− (k2 −k1 ) κ3 κ2 κ1
α3 (α2 , α1 |κ3 |x− ).
(157)
R
It is then useful to employ the Barnes integral representation (143) for the b-hypergeometric function that appears in the definition (31) of the function α3 . The order of integrals in the resulting double integral may be exchanged, and the x− integration carried out by means of (136). Up to prefactors that are entire analytic in ki , i = 1, 2, 3 one is left with the following integral: 1 i
i∞ ds e2πisQ −i∞
Gb (s + A1 )Gb (s + A2 )Gb (s + A3 ) , Gb (s + B1 )Gb (s + B2 )Gb (s + B3 )
(158)
where the coefficients are given by A1 =Q − α3 + α1 − α2 , A2 =Q − α3 − iκ3 , A3 =α1 + iκ1 ,
B1 =Q + α1 − α2 − iκ3 , B2 =2Q − α3 − α2 + iκ1 , B3 =Q.
The claim now follows by straightforward application of Lemma 3.
(159)
8.2. Kernels
= α= ,
= = s, t.
Lemma 21. Analytic and asymptotic properties of
= αs
rized as follows: α α 1. sαs 3 2 (x4 ; x) is meromorphic w.r.t. α 4 α1 ! x1 in {x1 ∈ C; Im(x1 ) ∈ (−Q, b)}, x2 in {x2 ∈ C; Im(x1 ) ∈ (−b, Q)},
α3 α 2 α4 α 1
!
(x4 ; x) can be summa-
x3 in {x3 ∈ C; Im(x1 ) ∈ (−b, Q)}, x4 in {x4 ∈ C; Im(x1 ) ∈ (−b, b)}.
The poles are located at (notation: xij ≡ xi − xj ) x12 + 2i (α2 + α1 − 2αs ) − 2i! = 0,
x14 + 2i (α1 − α4 ) − 2i! = 0,
x12 + 2i (α2 + α1 − 2(Q − αs )) − i! = 0,
x34 + 2i (α4 − α3 ) + i! = 0.
x13 + 2i (α3 + α1 − 2(Q − α4 )) − 2i! = 0, It decays exponentially for |xi | → ∞ as e−πQ|xi | . α α 2. tαs 3 2 (x4 ; x) is analytic w.r.t. α 4 α1 ! x1 in {x1 ∈ C; Im(x1 ) ∈ (−Q, b)}, x2 in {x2 ∈ C; Im(x1 ) ∈ (−Q, b)},
x3 in {x3 ∈ C; Im(x1 ) ∈ (−b, Q)}, x4 in {x4 ∈ C; Im(x1 ) ∈ (−b, b)}.
The poles are located at x32 − 2i (α3 + α2 − 2αt ) + 2i! = 0,
x14 + 2i (α1 − α4 ) − i! = 0,
x32 − 2i (α3 + α2 − 2(Q − αt )) + i! = 0,
x34 + 2i (α4 − α3 ) + 2i! = 0.
x13 + 2i (α3 + α1 − 2(Q − α4 )) − 2i! = 0, It decays exponentially for |xi | → ∞ as e−πQ|xi | .
The residues of these poles that are needed in Sect. 5 can be represented as follows: Rs13 ∝ Res
y21 =0
Rs14 ∝ Res
y31 =0
α4 α3 αs x4 x3 ∗ α4 α3 αs x4 x3 ∗
Res
y31 =0
Res
y31 =0
αs α2 α1 xs x 2 ∗ αs α2 α1 xs x 2 ∗
xs =x3 − 2i (αs +α3 −2(Q−α4 ))+i! xs =x4 − 2i (αs −α4 )+i!
,
,
αt α3 α2 α4 αt α1 Rt13 ∝ Res , Res y32 =0 ∗ x3 x2 y21 =0 x4 xt ∗ xs =x3 − 2i (α3 −αs )+i! α4 αt α1 αt α3 α2 Rt14 ∝ dxt Res , xt x3 x2 y31 =0 x4 xt ∗ R
(160)
where the undetermined prefactor does not depend on any of the variables and the ∗ appearing in the arguments indicates the variable of the b-Clebsch–Gordan coefficients that is to be expressed in terms of the others. The necessary residues are Res
y21 =0
α3 α2 α1 x3 x2 ∗
Sb i(x3 − x2 ) − 21 (α2 − α3 ) 1 = 2πSb (α3 + α2 + α1 − Q) Sb i(x3 − x2 ) − 21 (α2 − α3 ) + β32 Sb i(x2 − x3 ) + 21 (α2 + α3 − 2(Q − α3 )) , Sb i(x2 − x3 ) + 21 (α2 + α3 − 2(Q − α3 )) + β31 α3 α2 α1 Res y31 =0 ∗ x2 x1 Sb i(x1 − x2 ) − 21 (α1 + α2 − 2α3 ) Sb (α3 + α2 − α1 ) = 2π Sb i(x1 − x2 ) − 21 (α1 + α2 − 2α3 ) + β31 Sb i(x1 − x2 ) − 21 (α1 + α2 − 2(Q − α3 )) , Sb i(x1 − x2 ) − 21 (α1 + α2 − 2(Q − α3 )) + β32 α3 α2 α1 Res y32 =0 x3 x2 x1 Sb i(x1 − x2 ) − 21 (α1 + α2 − 2α3 ) Sb (α3 + α1 − α2 ) = 2π Sb i(x1 − x2 ) − 21 (α1 + α2 − 2α3 ) + β31 Sb i(x1 − x2 ) − 21 (α1 + α2 − 2(Q − α3 )) , Sb i(x1 − x2 ) − 21 (α1 + α2 − 2(Q − α3 )) + β21 α3 α2 α1 Res Res y32 =0 y21 =0 ∗ ∗ ∗ Sb (2α3 − Q) α3 α2 α1 = Res Res . = 2 ∗ ∗ ∗ y31 =0 y21 =0 (2π ) Sb (α1 + α2 + α3 − Q) (161)
Lemma 22. Analytic and asymptotic properties of summarized as follows: 1.
s αs
α 3 α2 α 4 α1
!
= αs
α 3 α2 α 4 α1
!
(k4 ; x), = = s, t can be
(k4 ; x) is meromorphic w.r.t.
x1 in {x1 ∈ C; Im(x1 ) ∈ (−Q, b)}, x2 in {x2 ∈ C; Im(x1 ) ∈ (−b, Q)},
x3 in {x3 ∈ C; Im(x1 ) ∈ (−b, Q)}, Q k4 in {k4 ∈ C; Im(x1 ) ∈ (− Q 2 , 2 )}.
2.
B. Ponsot, J. Teschner t αs
α 3 α2 α 4 α1
!
(k4 ; x) is meromorphic w.r.t.
x1 in {x1 ∈ C; Im(x1 ) ∈ (−Q, b)}, x2 in {x2 ∈ C; Im(x1 ) ∈ (−Q, b)},
x3 in {x3 ∈ C; Im(x1 ) ∈ (−b, Q)}, Q k4 in {k4 ∈ C; Im(x1 ) ∈ (− Q 2 , 2 )}.
α3 α2 (x ; x), α4 α 1 ! 4 = = s, t, which are at positions independent of x4 . Both behave asymptotically
The poles in their dependence on x1 , x2 , x3 are those poles of
= αs
for |x1 | → ∞ as e−2πik4 x1 ,
for |x3 | → ∞ as e−2πik4 x3 ,
for |x2 | → ∞ as e−2πα2 |x2 | ,
for |k4 | → ∞ as e−2π!k4 .
8.3. Racah–Wigner coefficients.
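Lemma 23. $\big\{{}^{\alpha_1}_{\alpha_3}\,{}^{\alpha_2}_{\alpha_4}\,{}^{\alpha_s}_{\alpha_t}\big\}_b$ is meromorphic w.r.t. all six variables and has poles at $\beta = -nb - mb^{-1}$, where $n, m \in \mathbb{Z}^{\geq 0}$ and $\beta$ may be any of the following:
\[
\begin{array}{llll}
\alpha_2+\alpha_1-\alpha_s, & \alpha_s+\alpha_1-\alpha_2, & \alpha_3+\alpha_2+\alpha_t-Q, & \alpha_3+\alpha_2-\alpha_t, \\
Q-\alpha_s-\alpha_2+\alpha_1, & 2Q-\alpha_1-\alpha_2-\alpha_s, & Q-\alpha_3-\alpha_t-\alpha_2, & Q-\alpha_2-\alpha_t-\alpha_3, \\
Q-\alpha_s-\alpha_4+\alpha_3, & Q-\alpha_s-\alpha_3+\alpha_4, & \alpha_1+\alpha_4+\alpha_t-Q, & \alpha_1+\alpha_4-\alpha_t, \\
2Q-\alpha_3-\alpha_4-\alpha_s, & Q-\alpha_3-\alpha_4+\alpha_s, & \alpha_t+\alpha_4-\alpha_1, & Q-\alpha_1+\alpha_4-\alpha_t.
\end{array}
\]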
Acknowledgement. B.P. was supported in part by the EU under contract ERBFMRX CT960012. J.T. is supported by DFG SFB 288 “Differentialgeometrie und Quantenphysik”. Most of this work was carried out while the second named author was at the Dublin Institute for Advanced Studies. He would like to express to this institution his sincere gratitude for support and hospitality.
References
1. Kustermans, J., Vaes, S.: The operator algebra approach to quantum groups. Proc. Natl. Acad. Sci. USA 97(2), 547–552 (2000)
2. Woronowicz, S.L.: Quantum E(2) group and its Pontryagin dual. Lett. Math. Phys. 23, 251–263 (1991)
3. Van Daele, A., Woronowicz, S.L.: Duality for the quantum E(2) group. Pac. J. Math. 173, 375–385 (1996)
4. Woronowicz, S.: Unbounded elements affiliated with C∗-algebras and non-compact quantum groups. Commun. Math. Phys. 136, 399–432 (1991)
5. Buffenoir, E., Roche, Ph.: Harmonic Analysis on the quantum Lorentz group. Commun. Math. Phys. 207, 499–555 (1999)
6. Buffenoir, E., Roche, Ph.: Tensor Products of Principal Unitary Representations of Quantum Lorentz Group and Askey-Wilson Polynomials. Preprint math/9910147
7. Kakehi, T.: Eigenfunction expansion associated with the Casimir operator on the quantum group SUq(1, 1). Duke Math. J. 80, 535–573 (1995)
8. Koelink, E., Stokman, J., Rahman, M.: Fourier transforms on the quantum SU(1,1) group. Preprint math.QA/9911163
9. Ponsot, B., Teschner, J.: Liouville bootstrap via harmonic analysis on a noncompact quantum group. Preprint hep-th/9911110
10. Schmüdgen, K.: Operator representations of Uq(sl(2, R)). Lett. Math. Phys. 37, 211–222 (1996)
11. Woronowicz, S.: C∗-algebras generated by unbounded elements. Rev. Math. Phys. 7, 481–521 (1995)
12. Kazhdan, D., Lusztig, G.: Tensor structures arising from affine Lie algebras I–IV. J. Am. Math. Soc. 6, 905–947, 949–1011 (1993) and 7, 335–381, 383–453 (1994)
13. Finkelberg, M.: An equivalence of fusion categories. Geom. Funct. Anal. 6, 249–267 (1996)
14. Faddeev, L.: Modular Double of Quantum Group. Preprint math.QA/9912078
15. Faddeev, L.: Discrete Heisenberg-Weyl group and modular group. Lett. Math. Phys. 34, 249–254 (1995)
16. Faddeev, L., Kashaev, R.: Quantum dilogarithm. Mod. Phys. Lett. 9, 265–282 (1994)
17. Woronowicz, S.L.: Quantum Exponential Function. Rev. Math. Phys. 12, 873–920 (2000)
18. Teschner, J.: In preparation
19. Katznelson, V.: An introduction to harmonic analysis. New York: Dover Publ., 1976
20. Reed, M., Simon, B.: Methods of Modern Mathematical Physics I: Functional Analysis. New York: Academic Press, 1980 (revised ed.)
21. Reed, M., Simon, B.: Methods of Modern Mathematical Physics II: Fourier Analysis, Self-adjointness. New York: Academic Press, 1975
22. Fock, V.V.: Dual Teichmüller spaces, dg-ga/9702018, and: Chekhov, L., Fock, V.V.: Quantum Teichmüller space. math/9908165
23. Kashaev, R.M.: Quantization of Teichmüller spaces and the quantum dilogarithm. q-alg/9705021, and: Liouville central charge in quantum Teichmüller theory. hep-th/9811203
24. Alekseev, A., Schomerus, V.: Representation theory of Chern-Simons observables. Duke Math. J. 85, 447 (1996)
25. Akhiezer, N.I., Glazman, I.M.: Theory of Linear Operators in Hilbert Space II, Monographs and Studies in Mathematics, 10. Boston–London–Melbourne: Pitman Advanced Publishing Program. XXXII (1981)
26. Gelfand, I.M., Vilenkin, N.Ya.: Generalized functions, Vol. 4. Academic Press, 1964
27. Bernstein, J.: On the support of Plancherel measure. J. Geom. Phys. 5, 663–710 (1988)
28. Barnes, E.W.: Theory of the double gamma function. Phil. Trans. Roy. Soc. A 196, 265–388 (1901)
29. Shintani, T.: On a Kronecker limit formula for real quadratic fields. J. Fac. Sci. Univ. Tokyo Sect. 1A 24, 167–199 (1977)
30. Nishizawa, M., Ueno, K.: Integral solutions of q-difference equations of the hypergeometric type with |q| = 1. q-alg/9612014
31. Erdélyi, A. (Ed.): Higher Transcendental Functions, Vol. 1. New York: McGraw-Hill, 1953
Communicated by A. Connes
Commun. Math. Phys. 224, 657 – 681 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Discrete Dynamical Systems Associated with Root Systems of Indefinite Type
Tomoyuki Takenawa
Graduate School of Mathematical Sciences, University of Tokyo, Komaba 3-8-1, Meguro-ku, Tokyo 153-8914, Japan
Received: 19 March 2001 / Accepted: 11 July 2001
Abstract: A geometric characterization of the equation found by Hietarinta and Viallet, which satisfies the singularity confinement criterion but which exhibits chaotic behavior, is presented. It is shown that this equation can be lifted to an automorphism of a certain rational surface and can therefore be considered to be a realization of a Cremona isometry on the Picard group of the surface. It is also shown that the group of Cremona isometries is isomorphic to an extended Weyl group of indefinite type. A method to construct the mappings associated with some root systems of indefinite type is also presented.

1. Introduction

The singularity confinement method has been proposed by Grammaticos et al. [6] as a criterion for the integrability of (finite or infinite dimensional) discrete dynamical systems. The singularity confinement method demands that when singularities appear due to particular initial values such singularities should disappear after a finite number of iteration steps, in which case the information on the initial values ought to be recovered (hence the dynamical system has to be invertible). However, "counter examples" were found by Hietarinta and Viallet [8]. These mappings satisfy the singularity confinement criterion but the orbits of their solutions exhibit chaotic behavior. The authors of [8] introduced the notion of algebraic entropy in order to test the degree of complexity of successive iterations. The algebraic entropy is defined as $s := \lim_{n\to\infty} \log(d_n)/n$, where $d_n$ is the degree of the $n$th iterate. This notion is linked to Arnold's complexity since the degree of a mapping gives the intersection number of the image of a line and a hyperplane. While the degree grows exponentially for a generic mapping, it was shown that it only grows polynomially for a large class of integrable mappings [1, 2, 8, 12].

The discrete Painlevé equations have been extensively studied [9, 14]. Recently it was shown by Sakai [16] that all of these (from the point of view of symmetries) are obtained by studying rational surfaces in connection with the extended affine Weyl groups.
Surfaces obtained by successive blow-ups [7] of $\mathbb{P}^2$ or $\mathbb{P}^1 \times \mathbb{P}^1$ have been studied by several authors by means of connections between the Weyl groups and the groups of Cremona isometries on the Picard group of the surfaces [3–5]. Here, the Picard group of a rational surface $X$ is the group of isomorphism classes of invertible sheaves on $X$ and it is isomorphic to the group of linear equivalence classes of divisors on $X$. A Cremona isometry is an isomorphism of the Picard group such that a) it preserves the intersection number of any pair of divisors, b) it preserves the canonical divisor $K_X$ and c) it leaves the set of effective classes of divisors invariant. In the case where 9 points in the case of $\mathbb{P}^2$ (8 points in the case of $\mathbb{P}^1 \times \mathbb{P}^1$) are blown up, if the points are in general position the group of Cremona isometries becomes isomorphic with an extension of the Weyl group of type $E_8^{(1)}$. In case the 9 points are not in general position, the classification of connections between the group of Cremona isometries and the extended affine Weyl groups was first studied by Looijenga [11] and more generally by Sakai. Birational (bimeromorphic) mappings on $\mathbb{P}^2$ (or $\mathbb{P}^1 \times \mathbb{P}^1$) are obtained by interchanging the procedure of blow downs. Discrete Painlevé equations are recovered as the birational mappings corresponding to the translations of affine Weyl groups. It was shown that for every Painlevé equation in Sakai's list the order of the $n$th iterate is at most $O(n^2)$ [18].

Our aim in this paper is to characterize some birational mappings which satisfy the singularity confinement criterion but exhibit chaotic behavior from the point of view of the theory of rational surfaces. Considering one such mapping and the space of its initial values, we obtain a rational surface associated with some root system of hyperbolic type. Conversely, we recover the mapping from the surface and consequently obtain the extension of the mapping to its non-autonomous version.

It is important to remark that this method also allows the construction of other mappings starting from suitable rational surfaces. We also show some other examples of such constructions.

In Sect. 2, we start from one of the mappings found by Hietarinta and Viallet (we call it the HV equation in this paper) and construct the space such that the mappings are lifted to an automorphism, i.e. a bi-holomorphic mapping, of the surface. The mapping $\tilde\varphi$ is called a mapping lifted from the mapping $\varphi$ if $\tilde\varphi$ coincides with $\varphi$ on any point where $\varphi$ is defined. For this purpose we compactify the original space of initial values, $\mathbb{C}^2$, to $\mathbb{P}^1 \times \mathbb{P}^1$ and blow up 14 times.

In Sect. 3, we study the symmetry of the space of initial values. We show that the group of all the Cremona isometries of the Picard group of the surface is isomorphic to an extended Weyl group of hyperbolic type. As a corollary, we prove that there does not exist any Cremona isometry whose action on the Picard group commutes with the action of the HV equation except the action itself.

In Sect. 4, we show a method to recover the HV equation from the surface as an element of the extended Weyl group. Each element of the extended Weyl group which acts on the Picard group as a Cremona isometry is realized as a Cremona transformation (i.e. a birational mapping) on $\mathbb{P}^1 \times \mathbb{P}^1$ by interchanging the blow down structure. Here, a blow down structure is the sequence designating the procedure of blow downs. As a result of this we obtain the non-autonomous version of the equation.

In Sect. 5, we discuss the construction of other mappings from certain rational surfaces and show some examples which are associated with root systems of indefinite type.
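2. Construction of the Space of Initial Values by Blow-ups

We consider the dynamical system written by the birational (bi-meromorphic) mapping
\[
\varphi : \mathbb{C}^2 \to \mathbb{C}^2, \qquad
\begin{pmatrix} x_n \\ y_n \end{pmatrix} \mapsto \begin{pmatrix} x_{n+1} \\ y_{n+1} \end{pmatrix} = \begin{pmatrix} y_n \\ -x_n + y_n + a/y_n^2 \end{pmatrix},
\tag{1}
\]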
where $a \in \mathbb{C}$ is a nonzero constant. This mapping was found by Hietarinta and Viallet [8] and we call it the HV equation. To test the singularity confinement, let us assume $x_0 \neq 0$ and $y_0 = \varepsilon$, where $|\varepsilon| \ll 1$. With these initial values we obtain the sequence:
\[
\begin{aligned}
x_0 &= x_0, \\
x_1 = y_0 &= \varepsilon, \\
x_2 = y_1 &= a\varepsilon^{-2} - x_0 + \varepsilon, \\
x_3 = y_2 &= a\varepsilon^{-2} - x_0 + a^{-1}\varepsilon^{4} + O(\varepsilon^{6}), \\
x_4 = y_3 &= -\varepsilon + 2a^{-1}\varepsilon^{4} + 4x_0a^{-2}\varepsilon^{6} + O(\varepsilon^{7}), \\
x_5 = y_4 &= x_0 + 3\varepsilon + O(\varepsilon^{2}), \\
x_6 = y_5 &= (a x_0^{-2} + x_0) + O(\varepsilon), \\
&\ \ \vdots
\end{aligned}
\]
In this sequence singularities appear at $n = 1$ as $\varepsilon \to 0$ and disappear at $n = 4$, and the information on the initial values is hidden in the coefficients of higher powers of $\varepsilon$. However, taking suitable rational functions of $x_n$ and $y_n$ we can find the information of the initial values as finite values. The fact that the leading orders of $(x_1^2 y_1 - a)y_1$, $(x_2^3(y_2/x_2 - 1)^2 - a)x_2$ and $(x_3 y_3^2 - a)x_3$ become $-ax_0$, $-ax_0$ and $-ax_0$ actually suggests that the HV equation can be lifted to an automorphism of a suitable rational surface (although these rational functions are of course not uniquely determined).

Let us consider the HV equation $\varphi$ to be a mapping from the complex projective space $\mathbb{P}^1(\mathbb{C}) \times \mathbb{P}^1(\mathbb{C})$ $(= \mathbb{P}^1 \times \mathbb{P}^1)$ to itself. We use the terminology space of initial values as follows (analogous to the space of initial values of Painlevé equations introduced by Okamoto [13]).

Definition. A sequence of algebraic varieties $X_i$ is (or the $X_i$ themselves are) called the space of initial values for the sequence of rational mappings $\varphi_i$, if each $\varphi_i$ is lifted to an isomorphism from $X_i$ to $X_{i+1}$, for all $i$.

The procedure for constructing the space of initial values $X_i$ in this paper is as follows. Let $F$ be a minimal surface and let each $\varphi_i : F \to F$ be a birational mapping. First blowing up $F$, we have the surfaces $Y_{1,i}$ such that each $\varphi_i$ can be lifted to a birational regular mapping from $Y_{1,i}$ to $Y_{0,i+1} := F$ and each inverse mapping $\varphi_i^{-1}$ can also be lifted to a birational regular mapping from $Y_{1,i+1}$ to $Y_{0,i}$.
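The singularity confinement test described earlier in this section can be reproduced numerically. The following is a minimal sketch in exact rational arithmetic; the sample values of $a$, $x_0$ and $\varepsilon$ and the helper names `hv_step` and `hv_orbit` are assumptions made for illustration and do not come from the paper.

```python
from fractions import Fraction


def hv_step(x, y, a):
    """One step of the HV map (1): (x, y) -> (y, -x + y + a / y^2)."""
    return y, -x + y + a / (y * y)


def hv_orbit(x0, y0, a, steps):
    """Exact orbit [(x_0, y_0), ..., (x_steps, y_steps)] in rational arithmetic."""
    orbit = [(Fraction(x0), Fraction(y0))]
    for _ in range(steps):
        x, y = orbit[-1]
        orbit.append(hv_step(x, y, Fraction(a)))
    return orbit


# Illustrative values (assumptions): a = 2, x_0 = 3, eps = 10^-6.
a, x0, eps = Fraction(2), Fraction(3), Fraction(1, 10**6)
orbit = hv_orbit(x0, eps, a, steps=5)
ys = [y for _, y in orbit]

# y_1 and y_2 are of size a / eps^2, y_3 returns to about -eps,
# and y_4, y_5 stay finite and depend on x_0 again.
for n, y in enumerate(ys):
    print(f"y_{n} ~ {float(y):.6e}")

# One of the finite combinations quoted in the text:
# (x_1^2 y_1 - a) y_1 should be close to -a x_0.
x1, y1 = orbit[1]
confined = (x1 * x1 * y1 - a) * y1
print("(x_1^2 y_1 - a) y_1 ~", float(confined), "  -a x_0 =", float(-a * x0))
```

Exact fractions are used instead of floating point because $y_1$ and $y_2$ are of order $\varepsilon^{-2}$, and the recovery of $x_0$ at later steps relies on cancellations between such large terms.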
Here the mapping $\tilde\psi$ is called a mapping lifted from the mapping $\psi$ if $\tilde\psi$ coincides with $\psi$ on any point where $\psi$ is defined. In our case $\psi_i$ can also be lifted to a birational mapping from $Y_{1,i}$ to $Y_{1,i+1}$ and therefore similarly $\varphi_i$ can be lifted to a birational mapping from $Y_{2,i}$ to $Y_{2,i+1}$. If we have $Y_{n,i} = Y_{n+1,i}$ for all $i$ for some $n$ by continuing this operation, then each $\varphi_i$ is lifted to a
biregular mapping, i.e. an isomorphism, from $Y_{n,i}$ to $Y_{n,i+1}$, and hence the sequence of $X_i := Y_{n,i}$ can be considered to be the space of initial values. Of course this procedure does not finish for general birational mappings and we may need not only blow-ups but also blow-downs for some mappings. Our aim in this section is to construct the surface $X$ by blowing up $\mathbb{P}^1 \times \mathbb{P}^1$ such that $\varphi$ is lifted to an automorphism of $X$.

2.1. Regular mapping from $Y_1$ to $\mathbb{P}^1 \times \mathbb{P}^1$. Let the coordinates of $\mathbb{P}^1 \times \mathbb{P}^1$ be $(x, y)$, $(x, 1/y)$, $(1/x, y)$ and $(1/x, 1/y)$ and let $x = \infty$ denote $1/x = 0$. We denote the HV equation as
\[
\varphi : (x, y) \mapsto (\bar x, \bar y) = (y,\ -x + y + a/y^2),
\tag{2}
\]
where $(\bar x, \bar y)$ means the image of $(x, y)$ by the mapping. This mapping has two indeterminate points: $(x, y) = (\infty, 0), (\infty, \infty)$. By blowing up at these points we can ease the indeterminacy. We denote blowing up at $(x, y) = (x_0, y_0) \in \mathbb{C}^2$:
\[
\{(x, y) : x, y \in \mathbb{C}\} \xleftarrow{\ \mu_{(x_0,y_0)}\ } \{(x - x_0, y - y_0;\ \zeta_1 : \zeta_2) \mid x, y, \zeta_1, \zeta_2 \in \mathbb{C},\ |\zeta_1| + |\zeta_2| \neq 0,\ (x - x_0)\zeta_2 = (y - y_0)\zeta_1\}
\]
by
\[
(x, y) \leftarrow (x - x_0,\ (y - y_0)/(x - x_0)) \cup ((x - x_0)/(y - y_0),\ y - y_0).
\tag{3}
\]
In this way, blowing up at $(x, y) = (x_0, y_0)$ gives meaning to $(x - x_0)/(y - y_0)$ at this point. First we blow up at $(x, y) = (\infty, 0)$, $(1/x, y) \leftarrow (1/x, xy) \cup (1/xy, y)$, and denote the obtained surface by $Y_0$. Then $\varphi$ is lifted to a birational mapping from $Y_0$ to $\mathbb{P}^1 \times \mathbb{P}^1$. For example, in the new coordinates $\varphi$ is expressed as
\[
\begin{aligned}
(u_1, v_1) := (1/x, xy) &\mapsto (\bar x, \bar y) = \big(u_1 v_1,\ (-u_1 v_1^2 + u_1^3 v_1^3 + a)/(u_1^2 v_1^2)\big), \\
(u_2, v_2) := (1/xy, y) &\mapsto (\bar x, \bar y) = \big(v_2,\ (-v_2 + u_2 v_2^3 + a u_2)/(u_2 v_2^2)\big).
\end{aligned}
\]
This maps the exceptional curve at $(x, y) = (\infty, 0)$, i.e. $u_1 = 0$ and $v_2 = 0$, almost to $(\bar x, \bar y) = (0, \infty)$ but has an indeterminate point on the exceptional curve: $(u_2, v_2) = (0, 0)$. Hence we have to blow up again at this point. In general it is known that, if there is a rational mapping $X \to X'$, where $X$ and $X'$ are rational surfaces, the procedure of blowing up can be completed in a finite number of steps, after which one obtains a rational surface $Y$ such that the rational mapping is lifted to a birational regular mapping from $Y$ to $X'$ (theorem of the elimination of indeterminacy [7]). Here adapting the above method to the mappings $\varphi$ and $\varphi^{-1}$ we obtain the surface $Y_1$ defined by the following sequence of blow-ups (for simplicity we take only one coordinate of (3)): for $\varphi$,
\[
\begin{aligned}
(x, y) &\xleftarrow[\mu_1]{(\infty,0)} \Big(\frac{1}{xy},\ y\Big) \xleftarrow[\mu_2]{(0,0)} \Big(\frac{1}{xy},\ xy^2\Big) \xleftarrow[\mu_3]{(0,a)} \Big(\frac{1}{xy},\ xy(xy^2 - a)\Big) \xleftarrow[\mu_4]{(0,0)} \Big(\frac{1}{xy},\ x^2y^2(xy^2 - a)\Big), \\
(x, y) &\xleftarrow[\mu_9]{(\infty,\infty)} \Big(\frac{1}{x},\ \frac{x}{y}\Big) \xleftarrow[\mu_{10}]{(0,1)} \Big(\frac{1}{x},\ x\Big(\frac{x}{y} - 1\Big)\Big),
\end{aligned}
\]
and for $\varphi^{-1}$,
\[
(x, y) \xleftarrow[\mu_5]{\text{at } (0,\infty)} \Big(x,\ \frac{1}{xy}\Big) \xleftarrow[\mu_6]{(0,0)} \Big(x^2y,\ \frac{1}{xy}\Big) \xleftarrow[\mu_7]{(a,0)} \Big(xy(x^2y - a),\ \frac{1}{xy}\Big) \xleftarrow[\mu_8]{(0,0)} \Big(x^2y^2(x^2y - a),\ \frac{1}{xy}\Big),
\]
where $\mu_i$ denotes the $i$th blow-up. Of course the above sequence is not unique since there is freedom in choosing the coordinates.

2.2. Automorphism of $X$. We have obtained a mapping from $Y_1$ to $\mathbb{P}^1 \times \mathbb{P}^1$ which is lifted from $\varphi$ or $\varphi^{-1}$. But our aim is to construct a rational surface $X$ such that $\varphi$ is lifted to an automorphism of $X$. We construct the rational surface $Y_2$ such that $\varphi$ and $\varphi^{-1}$ are lifted to a regular mapping from $Y_2$ to $Y_1$ respectively. For this purpose it is sufficient to eliminate the indeterminacy of the mapping from $Y_1$ to $Y_1$. Consequently we obtain $Y_2$ defined by the following sequence of blow-ups:
\[
\begin{aligned}
\Big(\frac{1}{x},\ x\Big(\frac{x}{y} - 1\Big)\Big)
&\xleftarrow[\mu_{11}]{(0,0)} \Big(\frac{1}{x^2(x/y - 1)},\ x\Big(\frac{x}{y} - 1\Big)\Big)
\xleftarrow[\mu_{12}]{(0,0)} \Big(\frac{1}{x^2(x/y - 1)},\ x^3\Big(\frac{x}{y} - 1\Big)^2\Big) \\
&\xleftarrow[\mu_{13}]{(0,a)} \Big(\frac{1}{x^2(x/y - 1)},\ x^2\Big(\frac{x}{y} - 1\Big)\Big(x^3\Big(\frac{x}{y} - 1\Big)^2 - a\Big)\Big) \\
&\xleftarrow[\mu_{14}]{(0,0)} \Big(\frac{1}{x^2(x/y - 1)},\ x^4\Big(\frac{x}{y} - 1\Big)^2\Big(x^3\Big(\frac{x}{y} - 1\Big)^2 - a\Big)\Big).
\end{aligned}
\]
It can be shown that the mappings $\varphi$ and $\varphi^{-1}$ from $Y_2$ to $Y_2$ do not have any indeterminate points. To show this, it is sufficient that the preimage of each exceptional curve of $Y_2$ by $\varphi$ and $\varphi^{-1}$ is not a point. First we show the regularity of $\varphi^{-1}$. Let us define the total and proper transforms.

Definition. Let $S$ be the set of zero points of $\prod_{i\in I} f_i(u, v) = 0$, where $(u, v) \in \mathbb{C}^2$ and $\{f_i\}_{i\in I}$ is a finite set of polynomials, and let $U_1 : (u_1, v_1)$ and $U_2 : (u_2, v_2)$ be the new coordinates of the blow-up at the point $(u, v) = (a, b)$, i.e. $(u_1, v_1) = (u - a, (v - b)/(u - a))$, $(u_2, v_2) = ((u - a)/(v - b), v - b)$. The total transform of $S$ is
\[
\Big\{(u_1, v_1) \in U_1;\ \prod_{i\in I} f_i(u_1 + a,\ u_1 v_1 + b) = 0\Big\} \cup \Big\{(u_2, v_2) \in U_2;\ \prod_{i\in I} f_i(u_2 v_2 + a,\ v_2 + b) = 0\Big\}
\]
and the proper transform of $S$ is
\[
\Big\{(u_1, v_1) \in U_1;\ \frac{\prod_{i\in I} f_i(u_1 + a,\ u_1 v_1 + b)}{u_1^{m}} = 0\Big\} \cup \Big\{(u_2, v_2) \in U_2;\ \frac{\prod_{i\in I} f_i(u_2 v_2 + a,\ v_2 + b)}{v_2^{n}} = 0\Big\},
\]
where $m$ or $n$ is the maximum integer simplifying the respective equations for $u_1$ or $v_2$ respectively.
For example, by blowing up at $(u, v) = (0, 0)$, the total transform of $u = 0$ is $\{(u_1, v_1) \in U_1;\ u_1 = 0\} \cup \{(u_2, v_2) \in U_2;\ u_2 v_2 = 0\}$ and its proper transform is $\{(u_1, v_1) \in U_1;\ 1 = 0\}\,(= \emptyset) \cup \{(u_2, v_2) \in U_2;\ u_2 = 0\}$. We denote the total transform of the point of the $i$th blow-up by $E_i$ and denote the proper transforms of the exceptional curves of the $i$th blow-up, $i = 1, \ldots, 14$, by
\[
D_0,\ D_1,\ D_2,\ C_0 = E_4,\ D_3,\ D_4,\ D_5,\ C_1 = E_8,\ D_6,\ D_{12},\ D_7,\ D_8,\ D_9,\ C_2 = E_{14}.
\tag{4}
\]
Moreover we denote the proper transforms of $x = 0$, $x = \infty$, $y = 0$, $y = \infty$ as
\[
C_3,\ D_{10},\ C_4,\ D_{11}.
\tag{5}
\]
(See Fig. 1.)
Fig. 1.
The proper transforms of $C_4$, $D_0$, $D_1$, $D_2$ and $C_0$ are written as
\[
C_4 : (x, y) = (x, 0),\quad D_0 : (u_1, v_1) = (u_1, 0),\quad D_1 : (u_2, v_2) = (0, v_2),\quad D_2 : (u_3, v_3) = (0, v_3),\quad C_0 : (u_4, v_4) = (0, v_4),
\]
where $u_i$ and $v_i$ are the new coordinates of the $i$th blow-up (more precisely, these express the total transforms of curves and we have to write each curve by using the coordinates of the last blow-up but this makes the notation rather complicated) and therefore the relations
\[
x = 1/(u_1 v_1),\quad y = v_1,\quad u_1 = u_2,\quad v_1 = u_2 v_2,\quad u_2 = u_3,\quad v_2 = u_3 v_3 + a,\quad u_3 = u_4,\quad v_3 = u_4 v_4
\]
hold. The proper transforms of $C_1$, $D_5$, $D_4$, $D_3$ and $C_3$ are written as
\[
C_1 : (u_8, v_8) = (u_8, 0),\quad D_5 : (u_7, v_7) = (u_7, 0),\quad D_4 : (u_6, v_6) = (u_6, 0),\quad D_3 : (u_5, v_5) = (0, v_5),\quad C_3 : (x, 1/y) = (0, 1/y),
\]
and the relations
\[
u_8 = u_7/v_7,\quad v_8 = v_7,\quad u_7 = (u_6 - a)/v_6,\quad v_7 = v_6,\quad u_6 = u_5/v_5,\quad v_6 = v_5,\quad u_5 = x,\quad v_5 = 1/(xy)
\]
hold. Using these relations one can calculate the images (by $\varphi$) of the curves. For example, in the case of $C_4$: from the above relations and (2) we can calculate $(\bar u_8, \bar v_8)$ for initial values on $C_4$ as
\[
(\bar u_8, \bar v_8)\big|_{(x,y)=(x,0)} = \Big((-x + y)\big(a + y^2(-x + y)\big)^2,\ \frac{y}{a - xy^2 + y^3}\Big)\Big|_{(x,y)=(x,0)} = (-a^2 x,\ 0).
\]
This then implies that the image of $C_4$ $(= \bar C_4)$ is $C_1$. Analogously, from the equation
\[
(\bar u_7, \bar v_7)\big|_{(u_1,v_1)=(u_1,0)} = \Big(\frac{(-1 + u_1 v_1^2)(a u_1 - v_1 + u_1 v_1^3)}{u_1^2},\ \frac{u_1 v_1}{a u_1 - v_1 + u_1 v_1^3}\Big)\Big|_{(u_1,v_1)=(u_1,0)} = \Big(-\frac{a}{u_1},\ 0\Big)
\]
we find $\bar D_0 = D_5$. In this way we can show that
\[
\begin{aligned}
&(D_0, D_1, D_2, D_3, D_4, D_5, D_6, D_7, D_8, D_9, D_{10}, D_{11}, D_{12}, C_0, C_1, C_2, C_4) \\
&\quad\to (D_5, D_4, D_3, D_7, D_8, D_9, D_6, D_0, D_1, D_2, D_{11}, D_{12}, D_{10}, C_3, C_2, C_0, C_1),
\end{aligned}
\tag{6}
\]
which implies that the preimage of each exceptional curve by $\varphi^{-1}$ is some curve and therefore $\varphi^{-1}$ is regular. Similarly the regularity is shown for $\varphi$, hence we obtain the following theorem.

Theorem 1. The HV equation (1) can be lifted to an automorphism of $X\,(= Y_2)$.

3. The Picard Group and Symmetry

3.1. Action on the Picard group. We denote the (linear equivalence classes of) total transform of $x = \text{constant}$ (or $y = \text{constant}$) on $X$ by $H_0$ (or $H_1$ respectively) and the (linear equivalence classes of) total transform of the point of the $i$th blow-up by $E_i$. From [7] we know that the Picard group of $X$, $\mathrm{Pic}(X)$, is
\[
\mathrm{Pic}(X) = \mathbb{Z}H_0 + \mathbb{Z}H_1 + \mathbb{Z}E_1 + \cdots + \mathbb{Z}E_{14}
\]
and that the canonical divisor of $X$, $K_X$, is
\[
K_X = -2H_0 - 2H_1 + E_1 + \cdots + E_{14}.
\]
It is also known that the intersection form, i.e. the intersection numbers of pairs of base elements, is
\[
H_i \cdot H_j = 1 - \delta_{i,j}, \qquad E_k \cdot E_l = -\delta_{k,l}, \qquad H_i \cdot E_k = 0,
\tag{7}
\]
where $\delta_{i,j}$ is 1 if $i = j$ and 0 if $i \neq j$, and the intersection numbers of any pairs of divisors are given by their linear combinations.

Remark. Let $X$ be a rational surface. It is known that $\mathrm{Pic}(X)$, the group of isomorphism classes of invertible sheaves of $X$, is isomorphic to the following groups: (i) the group of linear equivalence classes of divisors on $X$; (ii) the group of numerical equivalence classes of divisors on $X$, where divisors $D$ and $D'$ on $X$ are numerically equivalent if and only if for any divisor $D''$ on $X$, $D \cdot D'' = D' \cdot D''$ holds. Hence we identify them in this paper.

The (linear equivalence classes of) prime divisors in (4), (5) as elements of $\mathrm{Pic}(X)$ are described as
\[
\begin{aligned}
&C_0 = E_4,\quad C_1 = E_8,\quad C_2 = E_{14},\quad C_3 = H_0 - E_5,\quad C_4 = H_1 - E_1 && (-1 \text{ curve}), \\
&D_0 = E_1 - E_2,\quad D_1 = E_2 - E_3,\quad D_2 = E_3 - E_4,\quad D_3 = E_5 - E_6,\quad D_4 = E_6 - E_7, \\
&D_5 = E_7 - E_8,\quad D_6 = E_9 - E_{10},\quad D_7 = E_{11} - E_{12},\quad D_8 = E_{12} - E_{13},\quad D_9 = E_{13} - E_{14} && (-2 \text{ curve}), \\
&D_{10} = H_0 - E_1 - E_2 - E_9,\quad D_{11} = H_1 - E_5 - E_6 - E_9,\quad D_{12} = E_{10} - E_{11} - E_{12} && (-3 \text{ curve}),
\end{aligned}
\]
where by $n$ curve we mean a curve whose self-intersection number is $n$. See Fig. 2.
Fig. 2.
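The anti-canonical divisor $-K_X$ can be reduced uniquely (see Appendix A) to prime divisors as
\[ D_0 + 2D_1 + D_2 + D_3 + 2D_4 + D_5 + 3D_6 + D_7 + 2D_8 + D_9 + 2D_{10} + 2D_{11} + 2D_{12}, \tag{8} \]
and the intersections among the $D_i$ are expressed by the following diagram, in which $D_0,\dots,D_9$ are nodes of $-2$ curves, $D_{10}, D_{11}, D_{12}$ are nodes of $-3$ curves, and an edge denotes an intersection:
\[ \begin{aligned} &D_6 - D_{10} - D_1,\quad D_1 - D_0,\quad D_1 - D_2,\\ &D_6 - D_{11} - D_4,\quad D_4 - D_3,\quad D_4 - D_5,\\ &D_6 - D_{12} - D_8,\quad D_8 - D_7,\quad D_8 - D_9. \end{aligned} \tag{9} \]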
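The bookkeeping behind (7), (8) and (9) can be checked mechanically. The following is a minimal sketch, not part of the original argument: the helper names `cls` and `dot` are hypothetical, and classes are encoded as integer coefficient dictionaries over the basis $H_0, H_1, E_1, \dots, E_{14}$. It recomputes the self-intersection numbers of the $D_i$, the sum (8), and the pairs of components that meet.

```python
import re

BASIS = ["H0", "H1"] + [f"E{k}" for k in range(1, 15)]

def cls(expr):
    """Parse a class such as 'H0 - E1 - 2E9' into coefficients over BASIS."""
    v = dict.fromkeys(BASIS, 0)
    for sign, coef, name in re.findall(r"([+-]?)\s*(\d*)\s*([HE]\d+)", expr):
        v[name] += (-1 if sign == "-" else 1) * int(coef or 1)
    return v

def dot(u, v):
    """Intersection form (7): H0.H1 = 1, Ek.Ek = -1, everything else 0."""
    return (u["H0"] * v["H1"] + u["H1"] * v["H0"]
            - sum(u[e] * v[e] for e in BASIS[2:]))

# D0, ..., D12 as listed in Sect. 3.1
D = [cls(s) for s in [
    "E1 - E2", "E2 - E3", "E3 - E4", "E5 - E6", "E6 - E7", "E7 - E8",
    "E9 - E10", "E11 - E12", "E12 - E13", "E13 - E14",
    "H0 - E1 - E2 - E9", "H1 - E5 - E6 - E9", "E10 - E11 - E12"]]
MULT = [1, 2, 1, 1, 2, 1, 3, 1, 2, 1, 2, 2, 2]  # multiplicities in (8)

self_int = [dot(d, d) for d in D]
total = {b: sum(m * d[b] for m, d in zip(MULT, D)) for b in BASIS}
minus_K = cls("2H0 + 2H1 - " + " - ".join(f"E{k}" for k in range(1, 15)))
# pairs (i, j) with D_i . D_j = 1, i.e. the edges of diagram (9)
edges = sorted((i, j) for i in range(13) for j in range(i + 1, 13)
               if dot(D[i], D[j]) == 1)
```

Under these assumptions `self_int` lists ten $-2$ curves followed by three $-3$ curves, `total` agrees with $-K_X$, and `edges` forms a tree with $D_6$ at its centre.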
The HV equation (1) acts on curves as in (6). Hence the HV equation acts on $\mathrm{Pic}(X)$ as
\[ \begin{pmatrix} H_0,\ H_1,\ E_1,\ E_2\\ E_3,\ E_4,\ E_5,\ E_6\\ E_7,\ E_8,\ E_9,\ E_{10}\\ E_{11},\ E_{12},\ E_{13},\ E_{14} \end{pmatrix} \to \begin{pmatrix} 3H_0 + H_1 - E_5 - E_6 - E_7 - E_8 - E_9 - E_{10},\ H_0,\ H_0 - E_8,\ H_0 - E_7\\ H_0 - E_6,\ H_0 - E_5,\ E_{11},\ E_{12}\\ E_{13},\ E_{14},\ H_0 - E_{10},\ H_0 - E_9\\ E_1,\ E_2,\ E_3,\ E_4 \end{pmatrix} \]
(this table means that the image of $H_0$ is $3H_0 + H_1 - E_5 - E_6 - E_7 - E_8 - E_9 - E_{10}$, the image of $H_1$ is $H_0$, the image of $E_1$ is $H_0 - E_8$, and so on) and their linear combinations.

Remark. Let $\theta$ be an isomorphism from the rational surface $X$ to the rational surface $X'$. Let $D$ be a divisor and $[D]$ its class. The class of $\theta(D)$ coincides with the class $\theta([D]) \in \mathrm{Pic}(X')$, and the action of $\theta$ on $\mathrm{Pic}(X)$ ($\cong \mathrm{Pic}(X')$) is linear.
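As a consistency check between the action on curves (6) and the action on $\mathrm{Pic}(X)$ just given, the table can be extended linearly and applied to the classes of Sect. 3.1. This is only an illustrative sketch; `cls`, `dot` and `apply_hv` are hypothetical helper names, and the table is read as basis element $\mapsto$ image.

```python
import re

BASIS = ["H0", "H1"] + [f"E{k}" for k in range(1, 15)]

def cls(expr):
    """Parse a class such as 'H0 - E1 - 2E9' into coefficients over BASIS."""
    v = dict.fromkeys(BASIS, 0)
    for sign, coef, name in re.findall(r"([+-]?)\s*(\d*)\s*([HE]\d+)", expr):
        v[name] += (-1 if sign == "-" else 1) * int(coef or 1)
    return v

def dot(u, v):
    """Intersection form (7)."""
    return (u["H0"] * v["H1"] + u["H1"] * v["H0"]
            - sum(u[e] * v[e] for e in BASIS[2:]))

# images of H0, H1, E1, ..., E14 under the HV equation (table of Sect. 3.1)
HV = dict(zip(BASIS, map(cls, [
    "3H0 + H1 - E5 - E6 - E7 - E8 - E9 - E10", "H0",
    "H0 - E8", "H0 - E7", "H0 - E6", "H0 - E5",
    "E11", "E12", "E13", "E14",
    "H0 - E10", "H0 - E9", "E1", "E2", "E3", "E4"])))

def apply_hv(v):
    """Linear extension of the table to an arbitrary class."""
    out = dict.fromkeys(BASIS, 0)
    for b, c in v.items():
        for name, coef in HV[b].items():
            out[name] += c * coef
    return out

CURVES = {n: cls(s) for n, s in {
    "D0": "E1 - E2", "D1": "E2 - E3", "D2": "E3 - E4", "D3": "E5 - E6",
    "D4": "E6 - E7", "D5": "E7 - E8", "D6": "E9 - E10", "D7": "E11 - E12",
    "D8": "E12 - E13", "D9": "E13 - E14", "D10": "H0 - E1 - E2 - E9",
    "D11": "H1 - E5 - E6 - E9", "D12": "E10 - E11 - E12",
    "C0": "E4", "C1": "E8", "C2": "E14", "C3": "H0 - E5", "C4": "H1 - E1"}.items()}

def name_of(v):
    return next(n for n, c in CURVES.items() if c == v)

# C3 is omitted as a source, as in (6)
image = {n: name_of(apply_hv(c)) for n, c in CURVES.items() if n != "C3"}
K = cls("-2H0 - 2H1 + " + " + ".join(f"E{k}" for k in range(1, 15)))
is_isometry = all(dot(HV[a], HV[b]) == dot(cls(a), cls(b))
                  for a in BASIS for b in BASIS)
```

With these definitions `image` reproduces the permutation (6), `apply_hv(K)` returns `K`, and `is_isometry` holds, in line with conditions (a) and (b) of the definition in Sect. 3.2.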
3.2. Cremona isometries and the root system.

Definition. An automorphism $s$ of $\mathrm{Pic}(X)$ is called a Cremona isometry if the following three properties are satisfied: (a) $s$ preserves the intersection form in $\mathrm{Pic}(X)$; (b) $s$ leaves $K_X$ fixed; (c) $s$ leaves the semigroup of effective classes of divisors invariant.

In general, if a birational mapping can be lifted to an isomorphism from $X$ to $X'$ by blow-ups, its action on the resulting Picard group is always a Cremona isometry. We will show that the group of Cremona isometries is an extended Weyl group of hyperbolic type. In the next section we will show that these Cremona isometries can be realized as Cremona transformations, i.e. birational mappings, on $\mathbb{P}^1\times\mathbb{P}^1$.

Lemma 1. Let $s$ be a Cremona isometry; then (c') $s$ is an automorphism of the diagram (9).

Proof. First we show that for any $i \in \{0, 1, \dots, 12\}$ there exists $j \in \{0, \dots, 12\}$ such that $D_i = s(D_j)$. Recall that $-K_X$ can be uniquely reduced to prime divisors in the form $-K_X = \sum_i m_i D_i$ (see (8)), and recall condition (b). We have $s(-K_X) = -K_X = \sum_i m_i\, s(D_i)$, where all $s(D_i)$ are effective divisors by condition (c) (and moreover $D_i\cdot D_i = s(D_i)\cdot s(D_i)$). By the uniqueness of the decomposition of $-K_X$, for any $i$ there exists $j$ such that $D_i = s(D_j)$ and $m_i = m_j$. According to this fact and condition (a) we have the lemma. (Another proof is shown in [11, 16].)

Let us define $\langle D_i\rangle$ and $\langle D_i\rangle^{\perp}$ as
\[ \langle D_i\rangle = \sum_{i=0}^{12} \mathbb{Z}D_i \quad\text{and}\quad \langle D_i\rangle^{\perp} = \{\alpha \in \mathrm{Pic}(X);\ \alpha\cdot D_i = 0 \text{ for } i = 0, 1, \dots, 12\}. \]

Lemma 2. The Cremona isometry $s$ leaves $\langle D_i\rangle^{\perp}$ invariant.

Proof. Let $\alpha \in \langle D_i\rangle^{\perp}$. By condition (c'), for any $i \in \{0, \dots, 12\}$ there exists $j \in \{0, \dots, 12\}$ such that $s(\alpha)\cdot D_i = s(\alpha)\cdot s(D_j) = \alpha\cdot D_j = 0$. It implies $s(\alpha) \in \langle D_i\rangle^{\perp}$.
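In this case $\langle D_i\rangle^{\perp}$ can be written as $\langle D_i\rangle^{\perp} = \langle\alpha_i\rangle := \mathbb{Z}\alpha_1 + \mathbb{Z}\alpha_2 + \mathbb{Z}\alpha_3$, where
\[ \begin{aligned} \alpha_1 &= 2H_1 - E_1 - E_2 - E_3 - E_4,\\ \alpha_2 &= 2H_0 - E_5 - E_6 - E_7 - E_8,\\ \alpha_3 &= 2H_0 + 2H_1 - 2E_9 - 2E_{10} - E_{11} - E_{12} - E_{13} - E_{14}. \end{aligned} \]
We consider $\langle\alpha_i\rangle$ with the intersection form to be a root lattice with a symmetric bilinear form. Let us define the transformation $w_i$ ($i = 1, 2, 3$) on $\langle\alpha_i\rangle$ as
\[ w_i(\alpha) = \alpha - 2\,\frac{\alpha_i\cdot\alpha}{\alpha_i\cdot\alpha_i}\,\alpha_i \tag{10} \]
for $\alpha \in \langle\alpha_i\rangle$. The transformation $w_i(\alpha_j)$ has the form $w_i(\alpha_j) = \alpha_j - c_{ij}\alpha_i$, where $c_{ij} = 2(\alpha_i\cdot\alpha_j)/(\alpha_i\cdot\alpha_i)$, and the matrix $(c_{ij})$ becomes a generalized Cartan matrix. Here the generalized Cartan matrix and its Dynkin diagram are of the hyperbolic type $H^{(3)}_{71}$ [19]: the matrix is
\[ \begin{pmatrix} 2 & -2 & -2\\ -2 & 2 & -2\\ -2 & -2 & 2 \end{pmatrix} \tag{11} \]
and the Dynkin diagram is a triangle with vertices $\alpha_1$, $\alpha_2$, $\alpha_3$ in which every pair of vertices is joined. Hence the group $W$ generated by the actions $w_1, w_2, w_3$ is a Weyl group of hyperbolic type. The extended (including the full automorphism group of the Dynkin diagram) Weyl group $\widetilde{W}$ is generated by
\[ \{w_1, w_2, w_3, \sigma_{12}, \sigma_{13}\} \tag{12} \]
and the fundamental relations
\[ \begin{aligned} &w_i^2 = \sigma_{1j}^2 = 1, \qquad (\sigma_{12}\sigma_{13})^3 = 1,\\ &\sigma_{12}w_1 = w_2\sigma_{12},\quad \sigma_{12}w_2 = w_1\sigma_{12},\quad \sigma_{12}w_3 = w_3\sigma_{12},\\ &\sigma_{13}w_1 = w_3\sigma_{13},\quad \sigma_{13}w_2 = w_2\sigma_{13},\quad \sigma_{13}w_3 = w_1\sigma_{13}, \end{aligned} \tag{13} \]
where the action of $\sigma_{12}$ or $\sigma_{13}$ on $\langle\alpha_i\rangle$ is defined by the exchange of indices of $\alpha_i$; the actions of $\sigma_{1j}$ and $w_k$ on $\alpha_i$ can be summarized as follows:
\[ \begin{array}{c|ccccc} & \sigma_{12} & \sigma_{13} & w_1 & w_2 & w_3\\ \hline \alpha_1 \mapsto & \alpha_2 & \alpha_3 & -\alpha_1 & \alpha_1 + 2\alpha_2 & \alpha_1 + 2\alpha_3\\ \alpha_2 \mapsto & \alpha_1 & \alpha_2 & \alpha_2 + 2\alpha_1 & -\alpha_2 & \alpha_2 + 2\alpha_3\\ \alpha_3 \mapsto & \alpha_3 & \alpha_1 & \alpha_3 + 2\alpha_1 & \alpha_3 + 2\alpha_2 & -\alpha_3 \end{array} \]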
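The Cartan matrix (11) and the reflection columns of the table above follow directly from the intersection form (7). A minimal sketch of that computation is given here; the helpers `cls`, `dot`, `reflect` and `lin` are hypothetical names introduced only for this illustration.

```python
import re

BASIS = ["H0", "H1"] + [f"E{k}" for k in range(1, 15)]

def cls(expr):
    """Parse a class such as '2H1 - E1 - E2' into coefficients over BASIS."""
    v = dict.fromkeys(BASIS, 0)
    for sign, coef, name in re.findall(r"([+-]?)\s*(\d*)\s*([HE]\d+)", expr):
        v[name] += (-1 if sign == "-" else 1) * int(coef or 1)
    return v

def dot(u, v):
    """Intersection form (7)."""
    return (u["H0"] * v["H1"] + u["H1"] * v["H0"]
            - sum(u[e] * v[e] for e in BASIS[2:]))

alpha = [cls("2H1 - E1 - E2 - E3 - E4"),
         cls("2H0 - E5 - E6 - E7 - E8"),
         cls("2H0 + 2H1 - 2E9 - 2E10 - E11 - E12 - E13 - E14")]

gram = [[dot(a, b) for b in alpha] for a in alpha]
# c_ij = 2 (alpha_i . alpha_j) / (alpha_i . alpha_i); the division is exact here
cartan = [[2 * g // gram[i][i] for g in row] for i, row in enumerate(gram)]

def reflect(i, v):
    """w_i of (10), for v in the root lattice (integer coefficient is exact)."""
    c = 2 * dot(alpha[i], v) // dot(alpha[i], alpha[i])
    return {b: v[b] - c * alpha[i][b] for b in BASIS}

def lin(c1, c2, c3):
    """The class c1*alpha1 + c2*alpha2 + c3*alpha3."""
    return {b: c1 * alpha[0][b] + c2 * alpha[1][b] + c3 * alpha[2][b]
            for b in BASIS}
```

For instance, `reflect(0, alpha[1])` equals `lin(2, 1, 0)`, i.e. $w_1(\alpha_2) = \alpha_2 + 2\alpha_1$, as in the $w_1$ column.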
Moreover, by the following property, the group of Cremona isometries is included in $\pm\widetilde{W}$.

Proposition 1 ([10] §5.10). If the generalized Cartan matrix $(c_{ij})$ is a symmetric matrix of finite, affine, or hyperbolic type, then the group of all automorphisms of $\langle\alpha_i\rangle$ preserving the bilinear form is $\pm\widetilde{W}$.
Remark. If $s$ is a Cremona isometry, then $-s$ cannot satisfy condition (c).

Next we consider uniqueness for the extension of the action of $\widetilde{W}$ to the action on $\mathrm{Pic}(X)$.

Lemma 3. Let $s$ and $s'$ be Cremona isometries such that the action of $s$ is identical to the action of $s'$ on $\langle\alpha_i\rangle$; then $s = s'$ as Cremona isometries, i.e. $s$ is identical to $s'$ as an automorphism of $\mathrm{Pic}(X)$.

Proof. Let $s$ and $s'$ be Cremona isometries whose actions on $\langle\alpha_i\rangle$ are identical. The action of $s\circ s'^{-1}$ on $\langle\alpha_i\rangle$ is the identity. We investigate where the exceptional divisor $E_4$ is moved by the action of $s\circ s'^{-1}$. In $\{D_i;\ i = 0, \dots, 12\}$, only $D_2$ has an intersection with $E_4$. By condition (c'), $s\circ s'^{-1}(D_2)$ is $D_0$, $D_2$, $D_3$, $D_5$, $D_7$ or $D_9$, and only $s\circ s'^{-1}(D_2)$ has an intersection with $s\circ s'^{-1}(E_4)$ in $\{s\circ s'^{-1}(D_i);\ i = 0, \dots, 12\}$.

(i) Assume $s\circ s'^{-1}(D_2) = D_2$. Then $s\circ s'^{-1}(E_4)$ has an intersection only with $D_2$ in $\{D_i;\ i = 0, \dots, 12\}$ (this condition on the coefficients with respect to the basis of $\mathrm{Pic}(X)$ can be considered to be a system of linear equations of order 13). Then we have the general solution with three integers $z_1, z_2, z_3$: $s\circ s'^{-1}(E_4) = E_4 + z_1\alpha_1 + z_2\alpha_2 + z_3\alpha_3$. Multiplying this equation by $s\circ s'^{-1}(\alpha_i) = \alpha_i$, we have the system of linear equations
\[ \begin{cases} 1 - 4z_1 + 4z_2 + 4z_3 = 1,\\ 0 + 4z_1 - 4z_2 + 4z_3 = 0,\\ 0 + 4z_1 + 4z_2 - 4z_3 = 0, \end{cases} \]
whose only solution is $z_1 = z_2 = z_3 = 0$. It implies $s\circ s'^{-1}(E_4) = E_4$.

(ii) Assume $s\circ s'^{-1}(D_2) = D_0$. We have $s\circ s'^{-1}(E_4) = (H_1 - E_1) + z_1\alpha_1 + z_2\alpha_2 + z_3\alpha_3$. Multiplying this equation by $s\circ s'^{-1}(\alpha_i) = \alpha_i$, one finds that this equation does not have integer solutions.

(iii) The other cases. $s\circ s'^{-1}(D_2) = D_3$, $D_5$, $D_7$ or $D_9$ implies $s\circ s'^{-1}(E_4) = L + z_1\alpha_1 + z_2\alpha_2 + z_3\alpha_3$, where $L = H_0 - E_5$, $E_8$, $H_0 + H_1 - E_9 - E_{10} - E_{11}$ or $E_{14}$ respectively. Again this equation does not have integer solutions.

According to (i), (ii) and (iii), $s\circ s'^{-1}(E_4) = E_4$ and $s\circ s'^{-1}(D_2) = D_2$.
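Analogously we have $s\circ s'^{-1}(H_1 - E_1) = H_1 - E_1$ and $s\circ s'^{-1}(D_0) = D_0$, and so on. Due to this fact and condition (c'), $s\circ s'^{-1}$ must be the identity as a Cremona isometry. This implies the lemma.

Next we consider the extension of the actions of elements of $\widetilde{W}$ on $\langle\alpha_i\rangle$ to actions on $\mathrm{Pic}(X)$. Let us define $\alpha_{i,j}$ ($i = 1, 2, 3$; $j = 1, 2$) as
\[ \begin{aligned} \alpha_{1,1} &= H_1 - E_1 - E_4, & \alpha_{1,2} &= H_1 - E_2 - E_3,\\ \alpha_{2,1} &= H_0 - E_5 - E_8, & \alpha_{2,2} &= H_0 - E_6 - E_7,\\ \alpha_{3,1} &= H_0 + H_1 - E_9 - E_{10} - E_{11} - E_{14}, & \alpha_{3,2} &= H_0 + H_1 - E_9 - E_{10} - E_{12} - E_{13}, \end{aligned} \]
so that $\alpha_{i,1} + \alpha_{i,2} = \alpha_i$, and define the action of $\alpha_{i,j}$ ($i = 1, 2, 3$; $j = 1, 2$) on $\alpha \in \langle\alpha_i\rangle$ as
\[ w_{i,j}(\alpha) := \alpha - 2\,\frac{\alpha_{i,j}\cdot\alpha}{\alpha_{i,j}\cdot\alpha_{i,j}}\,\alpha_{i,j}. \]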
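It is easy to see that $w_i(\alpha) = w_{i,1}\circ w_{i,2}(\alpha) = w_{i,2}\circ w_{i,1}(\alpha)$. We define the action of $w_i$ on $D \in \mathrm{Pic}(X)$ as $w_i(D) := w_{i,1}\circ w_{i,2}(D) = w_{i,2}\circ w_{i,1}(D)$. These actions are explicitly written as follows (see Fig. 2). (For brevity we do not write the elements that are invariant under each action.)
\[ \begin{aligned} w_1 &: (H_0,\ E_1,\ E_2,\ E_3,\ E_4) \to (H_0 + 2H_1 - E_1 - E_2 - E_3 - E_4,\ H_1 - E_4,\ H_1 - E_3,\ H_1 - E_2,\ H_1 - E_1),\\ w_2 &: (H_1,\ E_5,\ E_6,\ E_7,\ E_8) \to (2H_0 + H_1 - E_5 - E_6 - E_7 - E_8,\ H_0 - E_8,\ H_0 - E_7,\ H_0 - E_6,\ H_0 - E_5),\\ w_3 &: \begin{pmatrix} H_0,\ H_1,\ E_9,\ E_{10}\\ E_{11},\ E_{12},\ E_{13},\ E_{14} \end{pmatrix} \to \begin{pmatrix} H_0 + \alpha_3,\ H_1 + \alpha_3,\ E_9 + \alpha_3,\ E_{10} + \alpha_3\\ E_{11} + \alpha_{3,1},\ E_{12} + \alpha_{3,2},\ E_{13} + \alpha_{3,2},\ E_{14} + \alpha_{3,1} \end{pmatrix}. \end{aligned} \tag{14} \]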
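The table (14) can be reproduced from the definition $w_i = w_{i,1}\circ w_{i,2}$. The sketch below is illustrative only: `cls`, `dot`, `refl` and `w` are hypothetical names, and $\alpha_{2,1}$ is taken to be $H_0 - E_5 - E_8$, which is an assumption made here because it is the choice for which $\alpha_{2,1} + \alpha_{2,2} = \alpha_2$ and for which (14) is recovered.

```python
import re

BASIS = ["H0", "H1"] + [f"E{k}" for k in range(1, 15)]

def cls(expr):
    """Parse a class such as 'H1 - E1 - E4' into coefficients over BASIS."""
    v = dict.fromkeys(BASIS, 0)
    for sign, coef, name in re.findall(r"([+-]?)\s*(\d*)\s*([HE]\d+)", expr):
        v[name] += (-1 if sign == "-" else 1) * int(coef or 1)
    return v

def dot(u, v):
    """Intersection form (7)."""
    return (u["H0"] * v["H1"] + u["H1"] * v["H0"]
            - sum(u[e] * v[e] for e in BASIS[2:]))

# alpha_{i,1}, alpha_{i,2}; alpha_{2,1} = H0 - E5 - E8 is an assumption (see text)
PAIRS = {1: ("H1 - E1 - E4", "H1 - E2 - E3"),
         2: ("H0 - E5 - E8", "H0 - E6 - E7"),
         3: ("H0 + H1 - E9 - E10 - E11 - E14",
             "H0 + H1 - E9 - E10 - E12 - E13")}

def refl(a, v):
    """Reflection in a class a with a.a = -2 (so the coefficient is an integer)."""
    c = 2 * dot(a, v) // dot(a, a)
    return {b: v[b] - c * a[b] for b in BASIS}

def w(i, v):
    """w_i := w_{i,1} o w_{i,2} acting on a class v of Pic(X)."""
    a1, a2 = (cls(s) for s in PAIRS[i])
    return refl(a1, refl(a2, v))

def plus(u, v):
    return {b: u[b] + v[b] for b in BASIS}
```

For example, `w(1, cls("E1"))` gives $H_1 - E_4$ and `w(2, cls("H1"))` gives $2H_0 + H_1 - E_5 - E_6 - E_7 - E_8$, matching the first two rows of (14).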
We define the action of $\sigma_{12}$ and $\sigma_{13}$ on $\mathrm{Pic}(X)$ as follows:
\[ \begin{aligned} \sigma_{12} &: \begin{pmatrix} H_0,\ H_1,\ E_1,\ E_2,\ E_3\\ E_4,\ E_5,\ E_6,\ E_7,\ E_8 \end{pmatrix} \to \begin{pmatrix} H_1,\ H_0,\ E_5,\ E_6,\ E_7\\ E_8,\ E_1,\ E_2,\ E_3,\ E_4 \end{pmatrix},\\ \sigma_{13} &: \begin{pmatrix} H_1,\ E_1,\ E_2\\ E_3,\ E_4,\ E_9,\ E_{10}\\ E_{11},\ E_{12},\ E_{13},\ E_{14} \end{pmatrix} \to \begin{pmatrix} H_0 + H_1 - E_9 - E_{10},\ E_{11},\ E_{12}\\ E_{13},\ E_{14},\ H_0 - E_{10},\ H_0 - E_9\\ E_1,\ E_2,\ E_3,\ E_4 \end{pmatrix}. \end{aligned} \tag{15} \]
By direct calculation, it is easy to check that each $w_i$ (or $\sigma_{1i}$) expressed by (14) (or (15) resp.) acts on $\langle\alpha_i\rangle$ as (10) (or as the exchange of indices of $\alpha_i$ resp.) and that they satisfy the fundamental relations (13) (the latter property is of course guaranteed by the uniqueness of the extension of $\widetilde{W}$). Moreover it is also easy to check that the actions of all elements of $\widetilde{W}$ on $\mathrm{Pic}(X)$ satisfy the conditions (a), (b) and (c').

Theorem 2. The group of Cremona isometries of $X$ is isomorphic to $\widetilde{W}$, where $\widetilde{W}$ is generated by $\{w_1, w_2, w_3, \sigma_{12}, \sigma_{13}\}$ and the fundamental relations (13). The actions of elements of $\widetilde{W}$ on $\mathrm{Pic}(X)$ are given by (14) and (15) and their compositions.

To show this theorem, it is enough to show that (14) and (15) satisfy condition (c). For this purpose, it is enough to realize them as isomorphisms from $X$ to $X'$, where $X$ and $X'$ have the same semigroup of classes of effective divisors. We show this fact in the next section. From (14) and (15) it is straightforward to show that the action of the HV equation on $\mathrm{Pic}(X)$ is identical to the action of $w_2\circ\sigma_{13}\circ\sigma_{12}$.

Corollary 1. There does not exist any Cremona isometry of $X$ whose action on $\mathrm{Pic}(X)$ commutes with the action of the HV equation except $(w_2\circ\sigma_{13}\circ\sigma_{12})^m$, where $m \in \mathbb{Z}$.

Proof. In this proof we denote $\sigma_{12}$ or $\sigma_{13}$ by $\sigma_2$ or $\sigma_3$ respectively and omit the symbol $\circ$. Each element of $\widetilde{W}$ can be uniquely written in the form $w_{i_1}w_{i_2}\cdots w_{i_n}s$, where all indices of $w$ (or $\sigma$) are considered mod 3 (or mod 2 resp.), $i_l \ne i_{l+1}$, and $s \in \{1, \sigma_j, \sigma_j\sigma_{j+1}, \sigma_j\sigma_{j+1}\sigma_j\}$. Assume that $g = w_{i_1}w_{i_2}\cdots w_{i_n}s$ commutes with $w_2\sigma_3\sigma_2$.

(i) The case $s = 1$. According to the relation $w_2\sigma_3\sigma_2\, w_{i_1}w_{i_2}\cdots w_{i_n} = w_{i_1}w_{i_2}\cdots w_{i_n}\, w_2\sigma_3\sigma_2$, we have the relation $w_2 w_{i_1+1}w_{i_2+1}\cdots w_{i_n+1}\sigma_3\sigma_2 = w_{i_1}w_{i_2}\cdots w_{i_n}w_2\sigma_3\sigma_2$. It implies $i_1 \equiv 2$, $i_2 \equiv 3$, $\dots$, $i_n \equiv n+1$, $2 \equiv n+2$, and therefore there exists an integer $m$ such that $n = 3m$. On the other hand $(w_2\sigma_3\sigma_2)^{3m} = w_2w_3\cdots w_{n+1}$. It implies $g = (w_2\sigma_3\sigma_2)^{3m}$.

(ii) The case $s = \sigma_3\sigma_2$ or $\sigma_2\sigma_3$. Similarly to case (i), $n$ must be $n = 3m+1$ or $n = 3m+2$ respectively, and $g$ becomes $(w_2\sigma_3\sigma_2)^n$.

(iii) The case $s = \sigma_j$. Suppose $j = 2$. According to the relation $w_2\sigma_3\sigma_2\, w_{i_1}w_{i_2}\cdots w_{i_n}\sigma_2 = w_{i_1}w_{i_2}\cdots w_{i_n}\sigma_2\, w_2\sigma_3\sigma_2$, we have the relation $w_2 w_{i_1+1}w_{i_2+1}\cdots w_{i_n+1}\sigma_3 = w_{i_1}w_{i_2}\cdots w_{i_n}w_1\sigma_2\sigma_3\sigma_2$. It implies $\sigma_3 = \sigma_2\sigma_3\sigma_2$, but this is a contradiction. Similarly $s = \sigma_3$ is impossible.

(iv) The case $s = \sigma_j\sigma_{j+1}\sigma_j$. Suppose $j = 2$. Similarly to case (iii), we have the relation $w_2 w_{i_1+1}w_{i_2+1}\cdots w_{i_n+1}\sigma_2 = w_{i_1}w_{i_2}\cdots w_{i_n}w_3\sigma_2\sigma_3\sigma_2\sigma_3\sigma_2$. It implies $\sigma_2 = \sigma_2\sigma_3\sigma_2\sigma_3\sigma_2$ and hence $1 = \sigma_3\sigma_2\sigma_3\sigma_2$, which leads to a contradiction. Similarly it can be shown that $s = \sigma_3\sigma_2\sigma_3$ is impossible.

Conclusion of the section. We have shown in this section that (a) the "Dynkin diagram" (9) of the irreducible components of the anti-canonical divisor $-K_X$ does not correspond to a generalized Cartan matrix, (b) the group of Cremona isometries is isomorphic to an extended Weyl group of hyperbolic type, and (c) there does not exist a Cremona isometry which commutes with the action of the HV equation except its own powers; whereas for the discrete Painlevé equations (a) the diagram is of affine type, (b) the group is of affine type, and (c) such Cremona isometries do exist, apart from trivial exceptions (the type of the root system is $A_1^{(1)}$, etc.).

4. The Inverse Problem

A birational mapping is called a Cremona transformation. One method for obtaining a Cremona transformation whose action on $\mathrm{Pic}(X)$ is a Cremona isometry is to interchange the blow down structures, i.e. to interchange the procedure of blow downs.
Following this method, we construct the Cremona transformations on $\mathbb{P}^1\times\mathbb{P}^1$ which yield the extended Weyl group (12), (13) and thereby recover the HV equation from its action on $\mathrm{Pic}(X)$. An element of $\widetilde{W}$ is an automorphism of $\mathrm{Pic}(X)$ but does not have to be an automorphism of $X$ itself, i.e. the blow-up points can be changed by a transformation satisfying the conditions (a), (b) and (c) of Sect. 3.2. In order to do this, one has to consider not only autonomous but also non-autonomous mappings.
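Before turning to the construction, the identification of the HV action on $\mathrm{Pic}(X)$ with $w_2\circ\sigma_{13}\circ\sigma_{12}$, stated at the end of Sect. 3.2, can be verified by composing the linear maps (14) and (15). The following sketch is an illustration under the reading of those tables as "basis element $\mapsto$ image, all unlisted elements fixed"; `cls` and `linear` are hypothetical helper names.

```python
import re

BASIS = ["H0", "H1"] + [f"E{k}" for k in range(1, 15)]

def cls(expr):
    """Parse a class such as 'H0 - E10' into coefficients over BASIS."""
    v = dict.fromkeys(BASIS, 0)
    for sign, coef, name in re.findall(r"([+-]?)\s*(\d*)\s*([HE]\d+)", expr):
        v[name] += (-1 if sign == "-" else 1) * int(coef or 1)
    return v

def linear(images):
    """Linear map on Pic(X) from images of basis elements (others are fixed)."""
    full = {b: cls(images.get(b, b)) for b in BASIS}
    def f(v):
        out = dict.fromkeys(BASIS, 0)
        for b, c in v.items():
            for name, coef in full[b].items():
                out[name] += c * coef
        return out
    return f

w2 = linear({"H1": "2H0 + H1 - E5 - E6 - E7 - E8",
             "E5": "H0 - E8", "E6": "H0 - E7", "E7": "H0 - E6", "E8": "H0 - E5"})
s12 = linear({"H0": "H1", "H1": "H0",
              "E1": "E5", "E2": "E6", "E3": "E7", "E4": "E8",
              "E5": "E1", "E6": "E2", "E7": "E3", "E8": "E4"})
s13 = linear({"H1": "H0 + H1 - E9 - E10",
              "E1": "E11", "E2": "E12", "E3": "E13", "E4": "E14",
              "E9": "H0 - E10", "E10": "H0 - E9",
              "E11": "E1", "E12": "E2", "E13": "E3", "E14": "E4"})
hv = linear({"H0": "3H0 + H1 - E5 - E6 - E7 - E8 - E9 - E10", "H1": "H0",
             "E1": "H0 - E8", "E2": "H0 - E7", "E3": "H0 - E6", "E4": "H0 - E5",
             "E5": "E11", "E6": "E12", "E7": "E13", "E8": "E14",
             "E9": "H0 - E10", "E10": "H0 - E9",
             "E11": "E1", "E12": "E2", "E13": "E3", "E14": "E4"})

# sigma_12 acts first, then sigma_13, then w_2
agrees = all(w2(s13(s12(cls(b)))) == hv(cls(b)) for b in BASIS)
involutions = all(f(f(cls(b))) == cls(b) for f in (w2, s12, s13) for b in BASIS)
```

Both `agrees` and `involutions` evaluate to true for these tables, which is consistent with the relations $w_i^2 = \sigma_{1j}^2 = 1$ in (13).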
By $a_0, a_1, a_2, a_3, a_4, a_5, a_6, a_7$ we denote the points of the 10th, 3rd, 4th, 7th, 8th, 11th, 13th and 14th blow-ups respectively, or the corresponding coordinates, and we call them "the parameters". In short these points can be expressed as follows:
\[ \begin{aligned} &\left(\frac{1}{xy},\ xy^2\right) = (0, a_1), \qquad \left(\frac{1}{xy},\ xy(xy^2 - a_1)\right) = (0, a_2),\\ &\left(x^2y,\ \frac{1}{xy}\right) = (a_3, 0), \qquad \left(xy(x^2y - a_3),\ \frac{1}{xy}\right) = (a_4, 0),\\ &\left(\frac{1}{x},\ \frac{x}{y}\right) = (0, a_0), \quad\text{where we normalize } a_0 \text{ to be } a_0 = 1,\\ &\left(\frac{1}{x},\ x\Bigl(\frac{x}{y} - 1\Bigr)\right) = (0, a_5), \qquad \left(\frac{1}{xz},\ xz^2\right) = (0, a_6), \qquad \left(\frac{1}{xz},\ xz(xz^2 - a_6)\right) = (0, a_7), \end{aligned} \]
with the abbreviation $z = x(x/y - 1) - a_5$, where $a_i \in \mathbb{C}$ and $a_1, a_3, a_6$ are nonzero. The points of the 2nd, 6th and 12th blow-ups are determined by intersection numbers. Moreover the points of the 1st, 5th, 9th, 10th and 11th blow-ups can be fixed by acting with a suitable automorphism of $\mathbb{P}^1\times\mathbb{P}^1$, i.e. a Möbius transformation of each coordinate combined with an exchange of the coordinates. We call this operation "normalization". It can also be seen that we can normalize $a_5 = 1$ except in the case $a_5 = 0$.

In this section we consider a realization of the generating elements of $\widetilde{W}$ as Cremona transformations which can be lifted to isomorphisms from $X$ to $X'$, where $X'$ is the same rational surface as $X$ except for a difference in parameters. First we realize the action of $w_2$ as a Cremona transformation on $\mathbb{P}^1\times\mathbb{P}^1$.
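4.1. The calculation of interchanging the blow down structure. In the following we shall present a scheme in which the blow down structure is changed. This method is based on the following fact. By $F_n$ we denote the $n$th Hirzebruch surface with the coordinate system
\[ (u, v) \cup \left(u, \frac{1}{v}\right) \cup \left(\frac{1}{u},\ u^n v\right) \cup \left(\frac{1}{u},\ \frac{1}{u^n v}\right). \tag{16} \]
Blowing up the $n$th Hirzebruch surface at the point $(1/u,\ u^n v) = (0, 0)$ and blowing down along the line $1/u = 1/(u^{n+1}v) = 0$, we obtain the $(n+1)$th Hirzebruch surface as follows:
\[ \begin{aligned} &(u, v) \cup \left(u, \frac{1}{v}\right) \cup \left(\frac{1}{u},\ u^n v\right) \cup \left(\frac{1}{u},\ \frac{1}{u^n v}\right)\\ \xleftarrow{\ \text{up}\ }\ &(u, v) \cup \left(u, \frac{1}{v}\right) \cup \left(\frac{1}{u},\ u^{n+1} v\right) \cup \left(\frac{1}{u^{n+1}v},\ u^n v\right) \cup \left(\frac{1}{u},\ \frac{1}{u^n v}\right)\\ \xrightarrow{\ \text{down}\ }\ &(u, v) \cup \left(u, \frac{1}{v}\right) \cup \left(\frac{1}{u},\ u^{n+1} v\right) \cup \left(\frac{1}{u},\ \frac{1}{u^{n+1} v}\right). \end{aligned} \]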
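On the other hand, blowing up the $n$th Hirzebruch surface at the point $(1/u,\ 1/(u^n v)) = (0, 0)$ and blowing down along the line $1/u = u^{n-1}v = 0$, we obtain the $(n-1)$th Hirzebruch surface as follows:
\[ \begin{aligned} &(u, v) \cup \left(u, \frac{1}{v}\right) \cup \left(\frac{1}{u},\ u^n v\right) \cup \left(\frac{1}{u},\ \frac{1}{u^n v}\right)\\ \xleftarrow{\ \text{up}\ }\ &(u, v) \cup \left(u, \frac{1}{v}\right) \cup \left(\frac{1}{u},\ u^n v\right) \cup \left(u^{n-1}v,\ \frac{1}{u^n v}\right) \cup \left(\frac{1}{u},\ \frac{1}{u^{n-1} v}\right)\\ \xrightarrow{\ \text{down}\ }\ &(u, v) \cup \left(u, \frac{1}{v}\right) \cup \left(\frac{1}{u},\ u^{n-1} v\right) \cup \left(\frac{1}{u},\ \frac{1}{u^{n-1} v}\right). \end{aligned} \]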
The next figure shows the order of the blow-ups and the blow downs used to obtain the Cremona transformation corresponding to $w_2$. (The figure has to be read from left to right.)

[Figure: successive configurations of the curves $H_0$ ($x = 0$, $x = \infty$), $H_1$ ($y = \infty$), $E_5$, $E_6$, $E_7$, $E_8$, $H_1 - E_5$, $H_1 - E_5 - E_6$, $H_0 - E_5$, $H_0 - E_6$, $H_0 - E_7$, $H_0 - E_8$, $H_0 + H_1 - E_5 - E_6 - E_7$ and $2H_0 + H_1 - E_5 - E_6 - E_7 - E_8$,]

where double lines mean the lines which are blown down in the next steps.
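The calculation of changing the blow down structure is as follows:
\[ \begin{aligned} &(x, y) \cup \left(x, \tfrac{1}{y}\right) \cup \left(\tfrac{1}{x}, y\right) \cup \left(\tfrac{1}{x}, \tfrac{1}{y}\right) = \mathbb{P}^1\times\mathbb{P}^1\\ \leftarrow\ &(x, y) \cup \left(x, \tfrac{1}{xy}\right) \cup \left(xy, \tfrac{1}{y}\right) \cup \left(\tfrac{1}{x}, y\right) \cup \left(\tfrac{1}{x}, \tfrac{1}{y}\right)\\ \rightarrow\ &(x, xy) \cup \left(x, \tfrac{1}{xy}\right) \cup \left(\tfrac{1}{x}, y\right) \cup \left(\tfrac{1}{x}, \tfrac{1}{y}\right) = F_1\\ \leftarrow\ &(x, xy) \cup \left(x, \tfrac{1}{x^2y}\right) \cup \left(x^2y, \tfrac{1}{xy}\right) \cup \left(\tfrac{1}{x}, y\right) \cup \left(\tfrac{1}{x}, \tfrac{1}{y}\right)\\ \rightarrow\ &(x, x^2y) \cup \left(x, \tfrac{1}{x^2y}\right) \cup \left(\tfrac{1}{x}, y\right) \cup \left(\tfrac{1}{x}, \tfrac{1}{y}\right) = F_2\\ \sim\ &(x, x^2y - a_3) \cup \left(x, \tfrac{1}{x^2y - a_3}\right) \cup \left(\tfrac{1}{x}, \tfrac{x^2y - a_3}{x^2}\right) \cup \left(\tfrac{1}{x}, \tfrac{x^2}{x^2y - a_3}\right) = F_2\\ \leftarrow\ &\left(x, \tfrac{x^2y - a_3}{x}\right) \cup \left(\tfrac{x}{x^2y - a_3}, x^2y - a_3\right) \cup \left(x, \tfrac{1}{x^2y - a_3}\right) \cup \left(\tfrac{1}{x}, \tfrac{x^2y - a_3}{x^2}\right) \cup \left(\tfrac{1}{x}, \tfrac{x^2}{x^2y - a_3}\right)\\ \rightarrow\ &\left(x, \tfrac{x^2y - a_3}{x}\right) \cup \left(x, \tfrac{x}{x^2y - a_3}\right) \cup \left(\tfrac{1}{x}, \tfrac{x^2y - a_3}{x^2}\right) \cup \left(\tfrac{1}{x}, \tfrac{x^2}{x^2y - a_3}\right) = F_1\\ \sim\ &\left(x, \tfrac{a_3(x^2y - a_3) - a_4x}{a_3x}\right) \cup \left(x, \tfrac{a_3x}{a_3(x^2y - a_3) - a_4x}\right) \cup \cdots = F_1\\ \leftarrow\ &\left(x, \tfrac{a_3(x^2y - a_3) - a_4x}{a_3x^2}\right) \cup \left(\tfrac{a_3x^2}{a_3(x^2y - a_3) - a_4x}, \tfrac{a_3(x^2y - a_3) - a_4x}{a_3x}\right) \cup \left(x, \tfrac{a_3x}{a_3(x^2y - a_3) - a_4x}\right) \cup \cdots\\ \rightarrow\ &\left(x, \tfrac{a_3(x^2y - a_3) - a_4x}{a_3x^2}\right) \cup \left(x, \tfrac{a_3x^2}{a_3(x^2y - a_3) - a_4x}\right) \cup \cdots = \mathbb{P}^1\times\mathbb{P}^1, \end{aligned} \]
where $\cdots$ is
\[ \left(\frac{1}{x},\ \frac{a_3(x^2y - a_3) - a_4x}{a_3x^2}\right) \cup \left(\frac{1}{x},\ \frac{a_3x^2}{a_3(x^2y - a_3) - a_4x}\right), \]
and $\sim$ means an automorphism of the Hirzebruch surface, determined so that the point of blow-up in (16) is moved to the origin. Writing
\[ w_2' : (x, y) \to \left(x,\ y - \frac{a_3}{x^2} - \frac{a_4}{a_3x}\right), \]
we obtain $w_2 = t\circ w_2'$, where $t$ is an automorphism of $\mathbb{P}^1\times\mathbb{P}^1$. By taking a suitable $t$, we can normalize $w_2$ to get the required result.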
4.2. Normalization and the action on the space of parameters. First we determine the automorphism $t$ used for normalization. By (14), $w_2$ does not move the points $(x, y) = (\infty, 0), (\infty, \infty)$. Since $w_2' : (\infty, 0) \to (\infty, 0)$, $(\infty, \infty) \to (\infty, \infty)$, $t$ is reduced to the mapping $t : (x, y) \to (c_1x + c_2,\ c_3y)$, where $c_1, c_2, c_3 \in \mathbb{C}$ are constants with $c_1$ and $c_3$ nonzero.
Similarly, $w_2$ moves the point $a_3$ to the proper transform of the point of the 6th blow-up. We denote this fact as $(\bar u_6, \bar v_6)|_{(u_6,v_6)=(a_3,0)} = (0, 0)$, where $(u_n, v_n)$ is the coordinate of the $n$th blow-up. On the other hand, under $w_2'$,
\[ \begin{aligned} (\bar u_6, \bar v_6)\big|_{(u_6,v_6)=(a_3,0)} &= \left.\left(\bar x^2\bar y,\ \frac{1}{\bar x\bar y}\right)\right|_{(u_6,v_6)=(a_3,0)}\\ &= \left.\left(x^2\Bigl(y - \frac{a_3}{x^2} - \frac{a_4}{a_3x}\Bigr),\ \frac{a_3x}{a_3(x^2y - a_3) - a_4x}\right)\right|_{(u_6,v_6)=(a_3,0)}\\ &= \left.\left(-a_3 + u_6 - \frac{a_4u_6v_6}{a_3},\ -\frac{a_3u_6v_6}{a_3^2 - a_3u_6 + a_4u_6v_6}\right)\right|_{(u_6,v_6)=(a_3,0)} = (0, 0) \end{aligned} \]
holds. Hence $t$ does not move the point $(x, y) = (0, \infty)$ and therefore $c_2 = 0$. Similarly, since $w_2$ does not move the points of the 10th and the 11th blow-ups, we have $c_1 = c_3 = 1$ (moreover we can normalize $a_5$ to be $a_5 = 1$ or $a_5 = 0$ by taking a suitable value of $c_1 = c_3$). Hence $t$ has to be the identity.

Next we calculate how the parameters $a_1, a_2, a_3, a_4, a_6, a_7$ are changed by the action of $w_2$. Notice that $w_2$ is an isomorphism from $X$ to $X'$ which satisfies the conditions (a), (b) and (c) of Sect. 3.2, and therefore $X$ and $X'$ have the same sequence of blow-ups except for their parameters. By $\bar a_i$ we denote the parameter of the corresponding blow-up of $X'$. Since the action of $w_2$ moves the points of the blow-ups as
\[ \begin{aligned} &(u_2, v_2) = (0, a_1) \to (\bar u_2, \bar v_2) = (0, \bar a_1), & &(u_3, v_3) = (0, a_2) \to (\bar u_3, \bar v_3) = (0, \bar a_2),\\ &(u_6, v_6) = (0, 0) \to (\bar u_6, \bar v_6) = (\bar a_3, 0), & &(xy, 1/y) = (0, 0) \to (\bar u_7, \bar v_7) = (\bar a_4, 0),\\ &(u_{12}, v_{12}) = (0, a_6) \to (\bar u_{12}, \bar v_{12}) = (0, \bar a_6), & &(u_{13}, v_{13}) = (0, a_7) \to (\bar u_{13}, \bar v_{13}) = (0, \bar a_7), \end{aligned} \]
the $\bar a_i$ can be calculated. For example $\bar a_1$ is calculated as follows:
\[ \begin{aligned} (0, \bar a_1) &= (\bar u_2, \bar v_2)\big|_{(u_2,v_2)=(0,a_1)} = \left.\left(\frac{1}{\bar x\bar y},\ \bar x\bar y^2\right)\right|_{(u_2,v_2)=(0,a_1)}\\ &= \left.\left(\frac{a_3u_2}{a_3 - a_4u_2 - a_3^2u_2^3v_2},\ \frac{v_2(-a_3 + a_4u_2 + a_3^2u_2^3v_2)^2}{a_3^2}\right)\right|_{(u_2,v_2)=(0,a_1)} = (0, a_1), \end{aligned} \]
and therefore $\bar a_1 = a_1$.
Similarly we can calculate $\bar a_2$, $\bar a_3$, $\bar a_4$ as follows:
\[ \begin{aligned} (0, \bar a_2) &= \left.\left(\frac{1}{\bar x\bar y},\ \bar x\bar y(\bar x\bar y^2 - \bar a_1)\right)\right|_{(u_3,v_3)=(0,a_2)} = \left(0,\ a_2 - \frac{2a_1a_4}{a_3}\right),\\ (\bar a_3, 0) &= \left.\left(\bar x^2\bar y,\ \frac{1}{\bar x\bar y}\right)\right|_{(u_6,v_6)=(0,0)} = (-a_3, 0),\\ (\bar a_4, 0) &= \left.\left(\bar x\bar y(\bar x^2\bar y - \bar a_3),\ \frac{1}{\bar x\bar y}\right)\right|_{(xy,1/y)=(0,0)} = (a_4, 0). \end{aligned} \]
Consequently $w_2$ changes the parameters $a_i$ as $\bar a_1 = a_1$, $\bar a_2 = a_2 - 2a_1a_4/a_3$, $\bar a_3 = -a_3$, $\bar a_4 = a_4$, $\bar a_5 = a_5$, $\bar a_6 = a_6$, $\bar a_7 = a_7 + 2a_4a_6/a_3$. We write the action of $w_2$ on $\mathbb{P}^1\times\mathbb{P}^1$ and on the space of parameters together as
\[ \begin{aligned} w_2 : (x, y;\ a_1, \dots, a_7) &\to (\bar x, \bar y;\ \bar a_1, \dots, \bar a_7)\\ &= \left(x,\ y - \frac{a_3}{x^2} - \frac{a_4}{a_3x};\ a_1,\ a_2 - \frac{2a_1a_4}{a_3},\ -a_3,\ a_4,\ a_5,\ a_6,\ a_7 + \frac{2a_4a_6}{a_3}\right). \end{aligned} \]
Here, in the calculation of the next iteration step we have to use $\bar a_3 = -a_3$ instead of $a_3$, and so on. As was remarked before, the mapping $w_2$ is of order 2 as an element of an extended Weyl group and can be lifted to an isomorphism from $X$ to $X'$.

4.3. The actions of other elements. Next we calculate the action of $\sigma_{13}$ from $X$ to $X'$. The following figure shows the order of the blow-ups and the blow downs.
[Figure: successive configurations of the curves $H_0$ ($x = \infty$, $x = 0$), $H_1$ ($y = \infty$), $E_9$, $E_{10}$, $H_0 - E_9$, $H_1 - E_9$, $H_0 - E_{10}$ and $H_0 + H_1 - E_9 - E_{10}$.]
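Its calculation is as follows:
\[ \begin{aligned} &(x, y) \cup \left(x, \tfrac{1}{y}\right) \cup \left(\tfrac{1}{x}, y\right) \cup \left(\tfrac{1}{x}, \tfrac{1}{y}\right) = \mathbb{P}^1\times\mathbb{P}^1\\ \leftarrow\ &(x, y) \cup \left(x, \tfrac{1}{y}\right) \cup \left(\tfrac{1}{x}, y\right) \cup \left(\tfrac{1}{x}, \tfrac{x}{y}\right) \cup \left(\tfrac{y}{x}, \tfrac{1}{y}\right)\\ \rightarrow\ &(x, y) \cup \left(x, \tfrac{1}{y}\right) \cup \left(\tfrac{1}{x}, \tfrac{y}{x}\right) \cup \left(\tfrac{1}{x}, \tfrac{x}{y}\right) = F_1\\ \sim\ &(x, y - x) \cup \left(x, \tfrac{1}{y - x}\right) \cup \left(\tfrac{1}{x}, \tfrac{y - x}{x}\right) \cup \left(\tfrac{1}{x}, \tfrac{x}{y - x}\right) = F_1\\ \leftarrow\ &(x, y - x) \cup \left(x, \tfrac{1}{y - x}\right) \cup \left(\tfrac{1}{x}, y - x\right) \cup \left(\tfrac{1}{y - x}, \tfrac{y - x}{x}\right) \cup \left(\tfrac{1}{x}, \tfrac{x}{y - x}\right)\\ \rightarrow\ &(x, y - x) \cup \left(x, \tfrac{1}{y - x}\right) \cup \left(\tfrac{1}{x}, y - x\right) \cup \left(\tfrac{1}{x}, \tfrac{1}{y - x}\right) = \mathbb{P}^1\times\mathbb{P}^1. \end{aligned} \]
Similarly to the case of $w_2$, we have the action of $\sigma_{13}$ on $X$ and the space of parameters as follows:
\[ \sigma_{13} : (x, y;\ a_1, \dots, a_7) \to \left(x,\ x - y - a_5;\ a_6,\ a_7 - 2a_5^2a_6,\ -a_3,\ a_4,\ a_5,\ a_1,\ a_2 + 2a_1a_5^2\right). \]
Similarly the action of $\sigma_{12}$ on $X$ and the space of parameters is
\[ \sigma_{12} : (x, y;\ a_1, \dots, a_7) \to \left(-y,\ -x;\ -a_3,\ -a_4,\ -a_1,\ -a_2,\ a_5,\ -a_6,\ a_7 - 4a_5^2a_6\right). \]
The action of $w_1$ or $w_3$ is determined by the relation $w_1 = \sigma_{12}\circ w_2\circ\sigma_{12}$ or $w_3 = \sigma_{13}\circ w_1\circ\sigma_{13}$ respectively as follows:
\[ \begin{aligned} w_1 &: (x, y;\ a_1, \dots, a_7) \to \left(x - \frac{a_1}{y^2} - \frac{a_2}{a_1y},\ y;\ -a_1,\ a_2,\ a_3,\ a_4 - \frac{2a_2a_3}{a_1},\ a_5,\ a_6,\ a_7 + \frac{2a_2a_6}{a_1}\right),\\ w_3 &: (x, y;\ a_1, \dots, a_7) \to \Biggl(x - \frac{a_6}{(x - y - a_5)^2} - \frac{a_7 - 2a_5^2a_6}{a_6(x - y - a_5)},\ y - \frac{a_6}{(x - y - a_5)^2} - \frac{a_7 - 2a_5^2a_6}{a_6(x - y - a_5)};\\ &\qquad a_1,\ a_2 + \frac{2a_1(a_7 - 2a_5^2a_6)}{a_6},\ a_3,\ a_4 + \frac{2a_3(a_7 - 2a_5^2a_6)}{a_6},\ a_5,\ -a_6,\ a_7 - 4a_5^2a_6\Biggr). \end{aligned} \]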
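The maps of Sects. 4.2 and 4.3 are easy to experiment with in exact rational arithmetic. The sketch below is an illustration rather than part of the derivation: it encodes $w_2$, $\sigma_{13}$ and $\sigma_{12}$ as read from the formulas above, checks at a sample point that each is an involution once the updated parameters are used, and compares $w_2\circ\sigma_{13}\circ\sigma_{12}$ with the closed form stated as Eq. (17) in Sect. 4.4. The function names and the sample values are assumptions made for the example.

```python
from fractions import Fraction as F

def w2(x, y, a):
    a1, a2, a3, a4, a5, a6, a7 = a
    return (x, y - a3 / x**2 - a4 / (a3 * x),
            (a1, a2 - 2 * a1 * a4 / a3, -a3, a4, a5, a6, a7 + 2 * a4 * a6 / a3))

def s13(x, y, a):
    a1, a2, a3, a4, a5, a6, a7 = a
    return (x, x - y - a5,
            (a6, a7 - 2 * a5**2 * a6, -a3, a4, a5, a1, a2 + 2 * a1 * a5**2))

def s12(x, y, a):
    a1, a2, a3, a4, a5, a6, a7 = a
    return (-y, -x, (-a3, -a4, -a1, -a2, a5, -a6, a7 - 4 * a5**2 * a6))

def hv17(x, y, a):
    """Closed form of w2 o sigma13 o sigma12 as stated in Eq. (17)."""
    a1, a2, a3, a4, a5, a6, a7 = a
    return (-y, x - y - a5 - a1 / y**2 - a2 / (a1 * y),
            (-a6, a7 - 2 * a5**2 * a6 - 2 * a2 * a6 / a1, -a1, -a2, a5,
             -a3, -a4 - 2 * a3 * a5**2 + 2 * a2 * a3 / a1))

# arbitrary sample point (x, y; a1, ..., a7), chosen to avoid zero denominators
sample = (F(3), F(5), tuple(F(k) for k in (2, 3, 5, 7, 1, 11, 13)))

composed = w2(*s13(*s12(*sample)))   # sigma12 first, then sigma13, then w2
closed = hv17(*sample)

# autonomous reduction a2 = a4 = a5 = a7 = 0, a1 = a3 = a6 = a (here a = 2)
auto = hv17(F(3), F(5), tuple(F(k) for k in (2, 0, 2, 0, 0, 2, 0)))
```

At the sample point `composed == closed`, and in the autonomous reduction the map becomes $(x, y) \to (-y,\ x - y - a/y^2)$ with the three nonzero parameters changing sign.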
4.4. The non-autonomous HV equation. The composition $w_2\circ\sigma_{13}\circ\sigma_{12}$ is reduced to
\[ \begin{aligned} w_2\circ\sigma_{13}\circ\sigma_{12} : (x, y;\ a_1, \dots, a_7) \to \Bigl(&-y,\ x - y - a_5 - \frac{a_1}{y^2} - \frac{a_2}{a_1y};\ -a_6,\ a_7 - 2a_5^2a_6 - \frac{2a_2a_6}{a_1},\\ &-a_1,\ -a_2,\ a_5,\ -a_3,\ -a_4 - 2a_3a_5^2 + \frac{2a_2a_3}{a_1}\Bigr), \end{aligned} \tag{17} \]
where $a_5$ can be normalized to be $a_5 = 0$ or 1. Of course this mapping satisfies the singularity confinement criterion by construction, and in the case of $a_2 = a_4 = a_5 = a_7 = 0$ and $a_1 = a_3 = a_6 = a$ it coincides with the HV equation (1) except for signs. The difference between them comes from the assumption $\bar a_5 = a_5$. Assuming $\bar a_5 = -a_5$ under the actions of $w_2$, $\sigma_{13}$ and $\sigma_{12}$, we have $-w_2$, $-\sigma_{13}$ and $-\sigma_{12}$ as new $w_2$, $\sigma_{13}$ and $\sigma_{12}$, and therefore (17) becomes
\[ \begin{aligned} w_2\circ\sigma_{13}\circ\sigma_{12} : (x, y;\ a_1, \dots, a_7) \to \Bigl(&y,\ -x + y + a_5 + \frac{a_1}{y^2} + \frac{a_2}{a_1y};\ a_6,\ -a_7 + 2a_5^2a_6 + \frac{2a_2a_6}{a_1},\\ &a_1,\ a_2,\ -a_5,\ a_3,\ a_4 + 2a_3a_5^2 - \frac{2a_2a_3}{a_1}\Bigr). \end{aligned} \]
Actually in the case of $a_2 = a_4 = a_5 = a_7 = 0$ and $a_1 = a_3 = a_6 = a$ it coincides with the HV equation (2).

5. Some Other Examples

We present some examples of rational mappings which satisfy the singularity confinement criterion and some of which have positive algebraic entropy.

5.1. Example 1. Let $w_1, w_2, w_3$ and $a_i$ be as in Sect. 4. First we consider the mapping $w_1\circ w_2$,
\[ \begin{aligned} w_1\circ w_2 : (x, y;\ a_1, \dots, a_7) \to \Biggl(&x - \frac{a_1}{y^2} - \frac{a_2}{a_1y},\ y - \frac{a_3}{\bigl(x - a_1/y^2 - a_2/(a_1y)\bigr)^2} - \frac{a_4 - 2a_2a_3/a_1}{a_3\bigl(x - a_1/y^2 - a_2/(a_1y)\bigr)};\\ &-a_1,\ a_2 + \frac{2(a_1a_4 - 2a_2a_3)}{a_3},\ -a_3,\ a_4 - \frac{2a_2a_3}{a_1},\ a_5,\ a_6,\ a_7 + \frac{2a_2a_6}{a_1} + \frac{2a_6}{a_3}\Bigl(a_4 - \frac{2a_2a_3}{a_1}\Bigr)\Biggr). \end{aligned} \]
This mapping has the following properties: (1) This mapping satisfies the singularity confinement criterion. (2) The degree of the $n$th iterate of the mapping is $O(n^2)$ (which can be seen from the action on the Picard group; in fact the degree is given by the coefficients of the linear equivalence class obtained from the class of generic lines by the action [18]). (3) The actions on the parameters $a_5, a_6, a_7$ can be ignored.

This mapping is nothing but one of the discrete Painlevé equations, since the surface obtained by blowing down the curves $E_9, E_{10}, \dots, E_{14}$ in $X$ is also the space of initial values, and the type of the Dynkin diagram corresponding to the irreducible components of the anti-canonical divisor is $D_7^{(1)}$ with the symmetry $A_1^{(1)}$. Actually the irreducible components of the anti-canonical divisor are $E_1 - E_2$, $E_2 - E_3$, $E_3 - E_4$, $E_5 - E_6$, $E_6 - E_7$, $E_7 - E_8$, $H_0 - E_1 - E_2$, $H_1 - E_5 - E_6$, and the root basis of the orthogonal lattice is $\alpha_1 = 2H_1 - E_1 - E_2 - E_3 - E_4$, $\alpha_2 = 2H_0 - E_5 - E_6 - E_7 - E_8$.

Next we consider the mapping $w_3\circ w_2\circ w_1$. This mapping is almost identical to the non-autonomous HV equation after 3 steps. Actually the latter becomes $w_2\circ w_3\circ w_1$.

Finally we consider the mapping $w_2\circ w_3\circ w_2\circ w_1$. This mapping satisfies the singularity confinement criterion and its algebraic entropy is $17 + 12\sqrt{2}$.
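The growth rates quoted in this example can be compared with the action on the root lattice $\langle\alpha_i\rangle$ of Sect. 3.2. The sketch below rests on an assumption: that the degree growth is governed by the dominant eigenvalue of that $3\times 3$ action, which is plausible because the components $D_i$ are only permuted. It uses the table of Sect. 3.2 with $c_{ij} = -2$, and all function names are hypothetical.

```python
import math

def reflection(i, c=2):
    """Matrix of w_i on coordinates w.r.t. (alpha1, alpha2, alpha3):
    w_i(alpha_i) = -alpha_i and w_i(alpha_j) = alpha_j + c*alpha_i for j != i."""
    m = [[int(r == k) for k in range(3)] for r in range(3)]
    m[i] = [-1 if j == i else c for j in range(3)]
    return m

def swap(i, j):
    """Permutation matrix exchanging alpha_{i+1} and alpha_{j+1}."""
    m = [[int(r == k) for k in range(3)] for r in range(3)]
    m[i], m[j] = m[j], m[i]
    return m

def mul(*ms):
    """Matrix product ms[0] @ ms[1] @ ... (the rightmost map acts first)."""
    out = ms[0]
    for m in ms[1:]:
        out = [[sum(out[r][k] * m[k][c] for k in range(3)) for c in range(3)]
               for r in range(3)]
    return out

def char_poly(m):
    """(t, s, d) with characteristic polynomial z^3 - t z^2 + s z - d."""
    t = m[0][0] + m[1][1] + m[2][2]
    s = sum(m[i][i] * m[j][j] - m[i][j] * m[j][i]
            for i in range(3) for j in range(i + 1, 3))
    d = (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    return t, s, d

def spectral_radius(m):
    """Largest |eigenvalue|, assuming z = 1 or z = -1 is a root (checked)."""
    t, s, d = char_poly(m)
    r = 1 if 1 - t + s - d == 0 else -1
    assert r**3 - t * r**2 + s * r - d == 0
    p, q = r - t, d * r            # remaining factor z^2 + p z + q
    disc = p * p - 4 * q
    assert disc >= 0               # real roots in the cases considered here
    root = math.sqrt(disc)
    return max(1.0, abs(-p + root) / 2, abs(-p - root) / 2)

W1, W2, W3 = (reflection(i) for i in range(3))
S12, S13 = swap(0, 1), swap(0, 2)

hv = mul(W2, S13, S12)          # w2 o sigma13 o sigma12
ex1 = mul(W2, W3, W2, W1)       # w2 o w3 o w2 o w1
painleve = mul(W1, W2)          # w1 o w2
```

Here `spectral_radius(ex1)` agrees numerically with $17 + 12\sqrt{2}$, `spectral_radius(painleve)` is 1 (consistent with polynomial degree growth), and `spectral_radius(hv)` is $(3 + \sqrt{5})/2$ for the HV equation itself.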
5.2. Example 2. We consider the following diagram as irreducible components of the anti-canonical divisor:
[Diagram: sixteen nodes for $-2$ curves and three nodes for $-4$ curves; an edge denotes an intersection.]
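This diagram is realized by the sequence of blow-ups from $\mathbb{P}^1\times\mathbb{P}^1$ as follows:
\[ \begin{aligned} (x, y) &\xleftarrow[\mu_1]{(\infty,0)} \left(\tfrac{1}{xy},\ y\right) \xleftarrow[\mu_2]{(0,0)} \left(\tfrac{1}{xy^2},\ y\right) \xleftarrow[\mu_3]{(0,0)} \left(\tfrac{1}{xy^2},\ xy^3\right) \xleftarrow[\mu_4]{(0,a_1)} \left(\tfrac{1}{xy^2},\ xy^2(xy^3 - a_1)\right)\\ &\xleftarrow[\mu_5]{(0,a_2)} \left(\tfrac{1}{xy^2},\ xy^2\bigl(xy^2(xy^3 - a_1) - a_2\bigr)\right) \xleftarrow[\mu_6]{(0,a_3)} \left(\tfrac{1}{xy^2},\ xy^2\bigl(xy^2(xy^2(xy^3 - a_1) - a_2) - a_3\bigr)\right),\\[4pt] (x, y) &\xleftarrow[\mu_7]{(0,\infty)} \left(x,\ \tfrac{1}{xy}\right) \xleftarrow[\mu_8]{(0,0)} \left(x,\ \tfrac{1}{x^2y}\right) \xleftarrow[\mu_9]{(0,0)} \left(x^3y,\ \tfrac{1}{x^2y}\right) \xleftarrow[\mu_{10}]{(a_4,0)} \left(x^2y(x^3y - a_4),\ \tfrac{1}{x^2y}\right)\\ &\xleftarrow[\mu_{11}]{(a_5,0)} \left(x^2y\bigl(x^2y(x^3y - a_4) - a_5\bigr),\ \tfrac{1}{x^2y}\right) \xleftarrow[\mu_{12}]{(a_6,0)} \left(x^2y\bigl(x^2y(x^2y(x^3y - a_4) - a_5) - a_6\bigr),\ \tfrac{1}{x^2y}\right), \end{aligned} \]
and
\[ \begin{aligned} (x, y) &\xleftarrow[\mu_{13}]{(\infty,\infty)} \left(\tfrac{1}{x},\ \tfrac{x}{y}\right) \xleftarrow[\mu_{14}]{(0,1)} \left(\tfrac{1}{x},\ x\bigl(\tfrac{x}{y} - 1\bigr)\right) \xleftarrow[\mu_{15}]{(0,a_7)} \left(\tfrac{1}{xz},\ z\right), \quad\text{where we denote } z := x\bigl(\tfrac{x}{y} - 1\bigr) - a_7,\\ &\xleftarrow[\mu_{16}]{(0,0)} \left(\tfrac{1}{xz^2},\ z\right) \xleftarrow[\mu_{17}]{(0,0)} \left(\tfrac{1}{xz^2},\ xz^3\right) \xleftarrow[\mu_{18}]{(0,a_8)} \left(\tfrac{1}{xz^2},\ xz^2(xz^3 - a_8)\right)\\ &\xleftarrow[\mu_{19}]{(0,a_9)} \left(\tfrac{1}{xz^2},\ xz^2\bigl(xz^2(xz^3 - a_8) - a_9\bigr)\right) \xleftarrow[\mu_{20}]{(0,a_{10})} \left(\tfrac{1}{xz^2},\ xz^2\bigl(xz^2(xz^2(xz^3 - a_8) - a_9) - a_{10}\bigr)\right). \end{aligned} \]
Similarly to the case of the HV equation (Sect. 4), we obtain
\[ \begin{aligned} w_2 &: (x, y;\ a_1, \dots, a_{10}) \to \Biggl(x,\ y - \frac{a_4}{x^3} - \frac{a_5}{a_4x^2} - \frac{1}{x}\Bigl(\frac{a_6}{a_4^2} - \frac{a_5^2}{a_4^3}\Bigr);\\ &\qquad a_1,\ a_2,\ a_3 + \frac{3a_1^2a_5^2}{a_4^3} - \frac{3a_1^2a_6}{a_4^2},\ -a_4,\ a_5,\ -a_6,\ a_7,\ a_8,\ a_9,\ a_{10} - \frac{3a_5^2a_8^2}{a_4^3} + \frac{3a_6a_8^2}{a_4^2}\Biggr),\\ \sigma_{13} &: (x, y;\ a_1, \dots, a_{10}) \to \left(x,\ x - y - a_7;\ a_8,\ a_9,\ a_{10} - 3a_7^2a_8^2,\ -a_4,\ a_5,\ -a_6,\ a_7,\ a_1,\ a_2,\ a_3 + 3a_1^2a_7^2\right),\\ \sigma_{12} &: (x, y;\ a_1, \dots, a_{10}) \to \left(-y,\ -x;\ a_4,\ -a_5,\ a_6,\ a_1,\ -a_2,\ a_3,\ a_7,\ -a_8,\ a_9,\ -a_{10} + 6a_7^2a_8^2\right), \end{aligned} \]
and finally
\[ \begin{aligned} w_2\circ\sigma_{13}\circ\sigma_{12} &: (x, y;\ a_1, \dots, a_{10}) \to \Biggl(-y,\ -y + x - a_7 - \frac{a_1}{y^3} - \frac{a_2}{a_1y^2} + \frac{1}{y}\Bigl(-\frac{a_3}{a_1^2} + \frac{a_2^2}{a_1^3}\Bigr);\\ &\qquad -a_8,\ a_9,\ -a_{10} + 3a_8^2a_7^2 - \frac{3a_2^2a_8^2}{a_1^3} + \frac{3a_8^2a_3}{a_1^2},\ a_1,\ -a_2,\ a_3,\ a_7,\ a_4,\ -a_5,\ a_6 + 3a_4^2a_7^2 + \frac{3a_2^2a_4^2}{a_1^3} - \frac{3a_3a_4^2}{a_1^2}\Biggr). \end{aligned} \]
In the case $a_i = 0$ for $i = 2, 3, 5, 6, 9, 10$, this mapping reduces to
\[ w_2\circ\sigma_{13}\circ\sigma_{12} : (x, y;\ a_1, a_4, a_7, a_8) \to \left(-y,\ x - y - a_7 + \frac{a_1}{y^3};\ -a_8,\ a_1,\ a_7,\ a_4\right). \]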
We present some basic properties of this mapping. The Picard group of the space of initial values is
\[ \mathrm{Pic}(X) = \mathbb{Z}H_0 + \mathbb{Z}H_1 + \mathbb{Z}E_1 + \cdots + \mathbb{Z}E_{20} \]
and the canonical divisor of $X$ is
\[ K_X = -2H_0 - 2H_1 + E_1 + \cdots + E_{20}. \]
The irreducible components of the anti-canonical divisor are $E_1 - E_2$, $E_2 - E_3$, $E_3 - E_4$, $E_4 - E_5$, $E_5 - E_6$, $E_6 - E_7$, $E_7 - E_8$, $E_8 - E_9$, $E_9 - E_{10}$, $E_{11} - E_{12}$, $E_{12} - E_{13}$, $E_{15} - E_{16}$, $E_{16} - E_{17}$, $\dots$, $E_{19} - E_{20}$, $H_0 - E_1 - E_2 - E_3 - E_{13}$, $H_1 - E_5 - E_6 - E_7 - E_{13}$, $E_{14} - E_{15} - E_{16} - E_{17}$, and the root basis is
\[ \begin{aligned} \alpha_1 &= 3H_1 - E_1 - E_2 - E_3 - E_4 - E_5 - E_6,\\ \alpha_2 &= 3H_0 - E_7 - E_8 - E_9 - E_{10} - E_{11} - E_{12},\\ \alpha_3 &= 3H_0 + 3H_1 - 3E_{13} - 3E_{14} - E_{15} - E_{16} - E_{17} - E_{18} - E_{19} - E_{20}. \end{aligned} \]
The Cartan matrix $2(\alpha_i\cdot\alpha_j)/(\alpha_i\cdot\alpha_i)$ is
\[ \begin{pmatrix} 2 & -3 & -3\\ -3 & 2 & -3\\ -3 & -3 & 2 \end{pmatrix}. \tag{18} \]
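The matrix (18) can be recomputed from the root basis with the same intersection form as in (7), now on the basis $H_0, H_1, E_1, \dots, E_{20}$. The sketch below is illustrative: the helper names are hypothetical, and the $3\times 3$ matrix `M` assumes, by analogy with Sect. 3.2, that $\sigma_{12}$ and $\sigma_{13}$ exchange the indices of the $\alpha_i$ and that $w_2$ acts as the reflection in $\alpha_2$.

```python
import math
import re

BASIS = ["H0", "H1"] + [f"E{k}" for k in range(1, 21)]

def cls(expr):
    """Parse a class such as '3H1 - E1 - E2' into coefficients over BASIS."""
    v = dict.fromkeys(BASIS, 0)
    for sign, coef, name in re.findall(r"([+-]?)\s*(\d*)\s*([HE]\d+)", expr):
        v[name] += (-1 if sign == "-" else 1) * int(coef or 1)
    return v

def dot(u, v):
    """Intersection form: H0.H1 = 1, Ek.Ek = -1, everything else 0."""
    return (u["H0"] * v["H1"] + u["H1"] * v["H0"]
            - sum(u[e] * v[e] for e in BASIS[2:]))

alpha = [
    cls("3H1 - " + " - ".join(f"E{k}" for k in range(1, 7))),
    cls("3H0 - " + " - ".join(f"E{k}" for k in range(7, 13))),
    cls("3H0 + 3H1 - 3E13 - 3E14 - " + " - ".join(f"E{k}" for k in range(15, 21))),
]
gram = [[dot(a, b) for b in alpha] for a in alpha]
cartan = [[2 * g // gram[i][i] for g in row] for i, row in enumerate(gram)]

# Assumed action of w2 o sigma13 o sigma12 on (alpha1, alpha2, alpha3); the
# columns are the images -alpha2, alpha3 + 3 alpha2 and alpha1 + 3 alpha2.
M = [[0, 0, 1], [-1, 3, 3], [0, 1, 0]]

def det_shifted(m, lam):
    """det(m - lam * I) for a 3x3 matrix."""
    a = [[m[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

residual = det_shifted(M, 2 + math.sqrt(3))
```

Under these assumptions `cartan` reproduces (18) and `residual` vanishes up to rounding, i.e. $2 + \sqrt{3}$ is an eigenvalue of the assumed action on the root lattice.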
This Cartan matrix is of neither finite, affine nor hyperbolic type. The algebraic entropy is $2 + \sqrt{3}$.

Acknowledgement. The author would like to thank J. Satsuma, H. Sakai, T. Tokihiro, K. Okamoto, R. Willox, A. Nobe, T. Tsuda and M. Eguchi for discussions and advice. The author is also grateful to the referees for their useful comments and suggestions.
A. Uniqueness of the Decomposition of the Anti-Canonical Divisor

Proposition 2. Let $X$ be the space of initial values of the HV equation obtained in Sect. 2. The anti-canonical class of divisors $-K_X$ can be reduced uniquely to prime divisors as $D_0 + 2D_1 + D_2 + D_3 + 2D_4 + D_5 + 3D_6 + D_7 + 2D_8 + D_9 + 2D_{10} + 2D_{11} + 2D_{12}$.

Proof. We show that the fixed part of the complete linear system $|-K_X|$ is $-K_X$ itself. This proof is essentially based on [15]. First we recall the Zariski decomposition.

Zariski decomposition. Given a pseudo-effective $\mathbb{Q}$-divisor $D$, there exists a unique Zariski decomposition $D = P + N$, where $P$ is a numerically effective $\mathbb{Q}$-divisor and $N$ is an effective $\mathbb{Q}$-divisor such that, if we write the irreducible decomposition of $N$ as $N = \sum_i \alpha_i E_i$, (i) the intersection matrix $(E_i\cdot E_j)_{i,j}$ is negative definite, and (ii) $P\cdot E_j = 0$ for each $j$. If $D$ is effective, then so is $P$.

In our case, it is easily seen that the intersection matrix $(D_i\cdot D_j)_{i,j}$ is negative definite and therefore we have $-K_X = \sum_i \alpha_i D_i = N$. By the next lemma the uniqueness of the decomposition of $-K_X$ follows.

Lemma 2.4 ([15]). Let $D$ be a pseudo-effective (integral) divisor on $X$. Let $D = P + N$ be the Zariski decomposition. Then $N$ is in the fixed part of the complete linear system $|D|$.

References

1. Arnold, V.I.: Dynamics of complexity of intersections. Bol. Soc. Bras. Mat. 21, 1–10 (1990)
2. Bellon, M.P. and Viallet, C.M.: Algebraic entropy. Commun. Math. Phys. 204, 425–437 (1999)
3. Cossec, F., Dolgachev, I.: Enriques surfaces I. Boston: Birkhäuser, 1988
4. Dolgachev, I.: Weyl groups and Cremona transformations. Proc. Symp. Pure Math. 40, 283–294 (1983)
5. Dolgachev, I. and Ortland, D.: Point sets in projective spaces and theta functions. Astérisque Soc. Math. de France 165 (1988)
6. Grammaticos, B., Ramani, A. and Papageorgiou, V.: Do integrable mappings have the Painlevé property? Phys. Rev. Lett. 67, 1825–1827 (1991)
7. Hartshorne, R.: Algebraic geometry. New York: Springer-Verlag, 1977
8. Hietarinta, J. and Viallet, C.M.: Singularity confinement and chaos in discrete systems. Phys. Rev. Lett. 81, 325–328 (1997)
9. Jimbo, M. and Sakai, H.: A q-analog of the sixth Painlevé equation. Lett. Math. Phys. 38, 145–154 (1996)
10. Kac, V.: Infinite dimensional Lie algebras, 3rd ed. Cambridge: Cambridge University Press, 1990
11. Looijenga, E.: Rational surfaces with an anti-canonical cycle. Ann. of Math. 114, 267–322 (1981)
12. Ohta, Y., Tamizhmani, K.M., Grammaticos, B. and Ramani, A.: Singularity confinement and algebraic entropy: The case of the discrete Painlevé equations. Phys. Lett. A 262, 152–157 (1999)
13. Okamoto, K.: Sur les feuilletages associés aux équations du second ordre à points critiques fixes de P. Painlevé (French). Japan J. Math. 5, 1–79 (1979)
14. Ramani, A., Grammaticos, B. and Hietarinta, J.: Discrete versions of the Painlevé equations. Phys. Rev. Lett. 67, 1829–1832 (1991)
15. Sakai, F.: Anticanonical models of rational surfaces. Math. Ann. 269, 389–410 (1984)
16. Sakai, H.: Rational surfaces associated with affine root systems and geometry of the Painlevé equations. Commun. Math. Phys. 220, 165–229 (2001). Webpage: http://www.kusm.kyoto-u.ac.jp/preprint/preprint99.html
17. Takenawa, T.: A geometric approach to singularity confinement and algebraic entropy. J. Phys. A: Math. Gen. 34, L95–L102 (2001)
18. Takenawa, T.: Algebraic entropy and the space of initial values for discrete dynamical systems. J. Phys. A: Math. Gen. (in press)
19. Wan, Z.: Kac-Moody algebra. Singapore: World Scientific, 1991
Communicated by L. Takhtajan
Commun. Math. Phys. 224, 683 – 703 (2001)
Chiral Forms and Their Deformations

Xavier Bekaert (1), Marc Henneaux (1,2), Alexander Sevrin (3)

(1) Physique Théorique et Mathématique, Université Libre de Bruxelles, Campus Plaine C.P. 231, 1050 Bruxelles, Belgium. E-mail: [email protected]; [email protected]
(2) Centro de Estudios Científicos, Casilla 1469, Valdivia, Chile
(3) Theoretische Natuurkunde, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussel, Belgium. E-mail: [email protected]

Received: 17 April 2000 / Accepted: 13 July 2001
Abstract: We systematically study deformations of chiral forms with applications to string theory in mind. To first order in the coupling constant, this problem can be translated into the calculation of the local BRST cohomological group at ghost number zero. We completely solve this cohomology and present detailed proofs of results announced in a previous letter. In particular, we show that there is no room for non-abelian, local deformations of a pure system of chiral p-forms.

1. Introduction

A chiral p-form A is defined by the equation
$$F = {}^{*}F, \tag{1.1}$$
where F ≡ dA is the corresponding fieldstrength. From this, it is clear that the dimension, d, of space-time is given by d = 2(p + 1). Furthermore, p should be even in the Minkowski case and odd in the Euclidean case since only in those cases is the square of the Hodge ∗-operator equal to the identity. Throughout this paper we maintain a Minkowski signature. Chiral p-forms naturally appear in string or M-theory. Chiral bosons are essential in the worldsheet formulation of the heterotic string and correspond to p = 0. Chiral two-forms, which, as we will explain further, constitute the main motivation for the present study, are central in the description of the M5-brane. Finally, chiral four-forms appear in type IIB string theory where they signal the presence of D3-branes. The strongest motivation for studying deformations of chiral forms arises from the study of coinciding M5-branes. The solitonic objects in M theory (viewed here as eleven dimensional supergravity) are M2- and M5-branes. These soliton solutions break half of the supersymmetries, reducing them from 32 to 16. Their effective worldbrane actions contain therefore 16 Goldstinos, which correspond to 8 propagating fermionic degrees of freedom. This should be matched by 8 bosonic degrees of freedom. Obvious candidates for the bosonic degrees of freedom of a p-brane living in d dimensions are the d − p − 1
transverse positions of the brane. For the M2-brane (p = 2 and d = 11), this saturates the number of bosonic degrees of freedom. For the M5-brane, however, one needs three additional bosonic degrees of freedom. The little group of the worldvolume theory is Spin(4) = SU(2) × SU(2), which means we need a (3,1) representation of this group. This is precisely a chiral two-form in six dimensions.
In the low energy limit where bulk gravity decouples, a single M5-brane is described by a six-dimensional N = (2, 0) superconformal field theory [1, 2]. Its field content consists of five scalar fields and a single chiral two-form1. A Lorentz non-covariant action was constructed in [3], [4] and [5]. A covariant action was obtained in [6] and [7]. The covariant action contains appropriate extra auxiliary fields and gauge symmetries. Partial gauge fixing of the covariant action yields the non-covariant action.
Once n M5-branes coincide, the situation changes. This can be seen by compactifying one direction on a circle. For small radius, the resulting theory is weakly coupled type IIA string theory. When the M5-branes are transverse to the circle, they appear in the type IIA theory as n coinciding NS5-branes. Not much is explicitly known about this system. However, when the M5-branes are longitudinal to the circle, they emerge as n coinciding D4-branes. The effective action for such a system is a U(n) non-abelian Born–Infeld action [8]. Its leading and next-to-leading terms are well understood, but the subleading terms are still under discussion [9, 10]. Ignoring higher derivative terms and focusing on the leading term, one finds that the dynamics of the D4 system is governed by 5 scalar fields in the adjoint representation of U(n) coupled to a 5-dimensional U(n) gauge theory. Going back to the supergravity description, this observation suggests the existence of a non-abelian extension of chiral 2-forms.
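The counting of bosonic degrees of freedom for the M2- and M5-branes described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the helper names `transverse_scalars` and `two_form_dof` are hypothetical, and the on-shell count of a 2-form in D dimensions is taken to be the number of components of an antisymmetric tensor of SO(D − 2), halved by the self-duality condition.

```python
from math import comb

def transverse_scalars(d, p):
    """Transverse positions of a p-brane living in d spacetime dimensions."""
    return d - p - 1

def two_form_dof(D, chiral=False):
    """On-shell degrees of freedom of a 2-form in D dimensions (assumed to be
    the antisymmetric tensor of the little group SO(D - 2)); a self-duality
    condition halves the count."""
    dof = comb(D - 2, 2)
    return dof // 2 if chiral else dof

# 16 Goldstinos correspond to 8 propagating fermionic degrees of freedom.
FERMIONIC_DOF = 16 // 2

m2_bosons = transverse_scalars(11, 2)                   # M2-brane: scalars only
m5_scalars = transverse_scalars(11, 5)                  # M5-brane: five scalars
m5_bosons = m5_scalars + two_form_dof(6, chiral=True)   # plus the chiral two-form

print(m2_bosons, m5_scalars, m5_bosons)
```

In this sketch the three missing bosonic degrees of freedom of the M5-brane are exactly those supplied by `two_form_dof(6, chiral=True)`.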
Genuine non-abelian extensions of non-chiral p-forms, for p ≥ 2, have not yet been constructed. Viewing a 2-form as a connection over loop space, one can show that no straightforward non-abelian extension exists [11] (see also [12]). Dropping geometric prejudices, all local deformations continuously connected to the free action were constructed in [13]. Though both known and novel deformations were discovered, none of them had the required property that the p-form gauge algebra becomes truly non-abelian.
Turning back to chiral 2-forms, one finds that M-theoretical considerations indicate that n coinciding M5-branes constitute a highly unusual physical system. Indeed, the supergravity description of n M5-branes predicts that both the entropy [14] and the two-point function for the stress-energy tensor [15] scale as n^3 in the large n limit. Anomaly considerations lead to a similar behaviour [16, 17]. This suggests that a non-abelian extension of chiral two-forms falls outside the scope of finite-dimensional semi-simple Lie groups, as none of those has a dimension growing as fast as n^3 (where n would be the dimension of the Cartan subalgebra). It has been argued that “gerbes” could provide the appropriate mathematical framework [18, 19].
In [20], we announced the result that no local field theory is able to describe a system of coinciding M5-branes. This result was obtained by showing that local deformations of the action cannot modify the abelian nature of the algebra of the 2-form gauge symmetries. It holds under the assumption that the deformed action is continuous in the coupling constant (i.e., possible non-perturbative “miracles” are not investigated) and reduces, in the limit of vanishing coupling constant, to the action describing free chiral 2-forms. In particular, no assumption was made on the polynomial order (cubic, quartic, ...) of the interaction terms.
1 Throughout this paper we ignore the fermionic degrees of freedom; this does not change any of our conclusions.
In the present paper we present detailed proofs of that assertion. The techniques used in this paper can be applied in a straightforward fashion to prove the results in [21] as well. There, deformations of chiral four-forms in ten dimensions were analyzed with the conclusion that the only consistent deformation was the type IIB coupling of the chiral four-form to the NS-NS and the R-R two-forms familiar from IIB supergravity [22, 23].
The outline of this paper is as follows. In the next section, we review how the problem of consistent couplings can be reformulated as a cohomological problem [24, 25]. We then recall the non-covariant formalism for chiral 2-forms, and their BRST formulation (Sects. 3, 4 and 5). In particular, we point out that the BRST differential s naturally splits as the sum s = δ + γ of simpler building blocks. After a brief section in which we recall the so-called “algebraic Poincaré lemma”, which provides an important tool for our investigations, we turn to the calculation of the BRST cohomology. First we compute the cohomology of γ (Sect. 7). Next, we compute the cohomology of γ modulo d, where d is the spacetime exterior derivative (Sects. 8 and 9). In Sect. 10, we compute the same cohomologies for the other piece involved in s, namely δ. In Sect. 11, we put together the calculations of the previous sections to derive the announced result that the gauge symmetries for a set of free chiral 2-forms are rigid and cannot be deformed continuously in the local field theoretical context. Our paper ends with a short concluding section.

2. Constructing Consistent Couplings as a Deformation Problem

The theoretical problem of determining consistent interactions for a given gauge invariant system has a long history. It has been formulated in general terms in [26] (see also [27]).
The equations for the consistent interactions are rather intricate because they are nonlinear and involve simultaneously not only the deformed action, but also the deformed structure functions of the deformed gauge algebra, as well as the deformed reducibility coefficients if the gauge transformations are reducible. The problem is further complicated by the fact that one has to factor out the “trivial” interactions that are simply induced by a change of variables. As we now review, one can reformulate the problem as a cohomological problem [24]. This approach systematizes the recursive construction of the consistent interactions and, furthermore, enables one to use the powerful tools of homological algebra.
Starting with a “free” action $\overset{(0)}{S}_0[\varphi^i]$ with “free” gauge symmetries
$$\delta_\varepsilon \varphi^i = \overset{(0)}{R}{}^i_\alpha\, \varepsilon^\alpha, \tag{2.1}$$
leading to the Noether identities
$$\frac{\delta \overset{(0)}{S}_0}{\delta \varphi^i}\, \overset{(0)}{R}{}^i_\alpha = 0, \tag{2.2}$$
we introduce a coupling constant g and modify $\overset{(0)}{S}_0$,
$$\overset{(0)}{S}_0 \longrightarrow S_0 = \overset{(0)}{S}_0 + g\, \overset{(1)}{S}_0 + g^2\, \overset{(2)}{S}_0 + \dots . \tag{2.3}$$
We consider only consistent deformations, meaning that the deformed action should be gauge invariant as well. In the generic case this requires a deformation of the gauge
transformation rules,
$$\overset{(0)}{R}{}^i_\alpha \longrightarrow R^i_\alpha = \overset{(0)}{R}{}^i_\alpha + g\, \overset{(1)}{R}{}^i_\alpha + g^2\, \overset{(2)}{R}{}^i_\alpha + \dots . \tag{2.4}$$
Consistency is then translated into the requirement that the Noether identities should hold to all orders,
$$\frac{\delta S_0}{\delta \varphi^i}\, R^i_\alpha = 0, \tag{2.5}$$
where
$$\delta_\varepsilon \varphi^i = R^i_\alpha\, \varepsilon^\alpha. \tag{2.6}$$
Expanding Eq. (2.5) order by order in the coupling constant gives consistency conditions of increasing complexity. For reducible theories, which is the case relevant to chiral 2-forms, there is an additional constraint. The gauge transformations of the free theory are not independent,
$$\overset{(0)}{R}{}^i_\alpha\, \overset{(0)}{Z}{}^\alpha_A = 0 \tag{2.7}$$
(possibly on-shell). One must then also impose that the gauge transformations remain reducible, possibly in a deformed way. This yields additional conditions on the coefficients $R$ in Eq. (2.4).
The deformations of an action fall into three classes. In the first one, gauge invariant terms are added to the original Lagrangian and therefore no modification of the gauge transformations is required. Examples of this are functionals of the field strength and its derivatives, as well as Chern–Simons-like terms [28]. In the second class, both the action and the transformation rules are modified. However, the terms added to the transformation rules are invariant under the original gauge transformations. As a consequence, the gauge algebra is not modified to first order in the coupling constant. An example of this is the Freedman–Townsend model [29] for two-forms in four dimensions. Finally, in the last class, the additional terms in the deformed transformation rules are not gauge invariant. Therefore the gauge algebra itself gets modified as well. The best known example of this is the deformation of an abelian Yang–Mills theory to a non-abelian theory.
The key to translating the problem of consistent interactions into a cohomological problem is the antifield formalism [30–32] (for reviews, see [33, 34]). Let us assume that we solved the master equation for the undeformed theory. Its solution is denoted by $\overset{(0)}{S}$, which satisfies $(\overset{(0)}{S}, \overset{(0)}{S}) = 0$. The existence of a consistent deformation of the original gauge invariant action implies the existence of a deformation of $\overset{(0)}{S}$, which we denote by $S$,
$$\overset{(0)}{S} \longrightarrow S = \overset{(0)}{S} + g\, \overset{(1)}{S} + g^2\, \overset{(2)}{S} + \dots . \tag{2.8}$$
Expanding the master equation for $S$, $(S, S) = 0$, order by order in the coupling constant yields various consistency relations,
$$(\overset{(0)}{S}, \overset{(0)}{S}) = 0, \tag{2.9}$$
$$(\overset{(0)}{S}, \overset{(1)}{S}) = 0, \tag{2.10}$$
$$2\,(\overset{(0)}{S}, \overset{(2)}{S}) + (\overset{(1)}{S}, \overset{(1)}{S}) = 0, \tag{2.11}$$
$$\vdots$$
The first equation is satisfied by assumption. As $(\overset{(0)}{S}, (\overset{(0)}{S}, \cdot)) = 0$, the second equation implies that $\overset{(1)}{S}$ is a cocycle for the free differential $\overset{(0)}{s} \equiv (\overset{(0)}{S}, \cdot)$. If $\overset{(1)}{S}$ is a coboundary, $\overset{(1)}{S} = (\overset{(1)}{T}, \overset{(0)}{S})$, one can show that this corresponds to a trivial deformation (i.e. a deformation which amounts to a simple redefinition of the fields).
In practice, we consider deformations which are local in spacetime, i.e., we impose that $\overset{(1)}{S}, \overset{(2)}{S}, \dots$ be local functionals. Reformulating the equations in terms of the Lagrange densities takes care of this problem. E.g., rewriting Eq. (2.10) as
$$\overset{(0)}{s}\, \overset{(1)}{S} = \overset{(0)}{s} \Big( \int \overset{(1)}{\mathcal{S}} \Big) = 0 \;\Leftrightarrow\; \int \big( \overset{(0)}{s}\, \overset{(1)}{\mathcal{S}} \big) = 0, \tag{2.12}$$
we obtain the following condition on the Lagrange density $\overset{(1)}{\mathcal{S}}$,
$$\overset{(0)}{s}\, \overset{(1)}{\mathcal{S}} + dM = 0, \tag{2.13}$$
where M is a local form of degree n − 1, n is the dimensionality of space-time and d is the spacetime exterior derivative2. Again one can show that BRST-exact terms modulo d are trivial solutions of (2.13) and correspond to trivial deformations. In the local context, the proper cohomology to evaluate is thus $H^{0,n}(\overset{(0)}{s}\,|\,d)$, where the first and second superscripts denote the ghost number and form degree, respectively.
Note that when all the representatives of $H^{0,n}(\overset{(0)}{s}\,|\,d)$ can be taken not to depend on the antifields, one may take the first-order deformations $\overset{(1)}{S}$ to be antifield-independent. In this case Eq. (2.11) reduces to $(\overset{(0)}{S}, \overset{(2)}{S}) = 0$ and implies that the deformation at order $g^2$ also defines an element of $H^{0,n}(\overset{(0)}{s}\,|\,d)$. One can thus take $\overset{(2)}{S}$ not to depend on the antifields either. Proceeding in this manner order by order in the coupling constant, we conclude that the additional terms in $S$ are all independent of the antifields. Since the antifield-dependent terms in the deformation of the master equation are related to the deformations of the gauge transformations, this means that there is no deformation of the gauge transformations. Summarizing, if there is no non-trivial dependence on the antifields in $H^{0,n}(\overset{(0)}{s}\,|\,d)$, the only possible consistent interactions are of the first class and do not modify the gauge symmetry. This is the situation met for a system of chiral 2-forms, as we now discuss.
2 Throughout this paper we ignore boundary contributions.
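The order-by-order expansion of the master equation in Eqs. (2.9)–(2.11) follows a simple combinatorial pattern, which the short sketch below makes explicit. It is an illustration only: the function name `master_equation_terms` is hypothetical, and the sketch assumes that the bracket is symmetric, $(S_i, S_j) = (S_j, S_i)$, as is the case for the antibracket of two bosonic functionals.

```python
from collections import Counter

def master_equation_terms(order):
    """Brackets (S_i, S_j) multiplying g**order in (S, S) = 0, where
    S = sum_i g**i S_i. Assumes a symmetric bracket, so (S_i, S_j) and
    (S_j, S_i) are collected together.
    Returns {(i, j): multiplicity} with i <= j."""
    terms = Counter()
    for i in range(order + 1):
        j = order - i
        terms[(min(i, j), max(i, j))] += 1
    return dict(terms)

for n in range(4):
    print(n, master_equation_terms(n))
```

At order 2 the sketch returns multiplicity 2 for the pair (0, 2) and multiplicity 1 for the pair (1, 1), which is the structure of Eq. (2.11); at order 1 the overall factor 2 in front of the pair (0, 1) can be dropped, giving Eq. (2.10).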
3. System of Free Chiral 2-Forms in 6 Dimensions

The non-covariant action for a system of N free chiral 2-forms is [35],
$$S_0[A^A_{ij}] = \sum_A \int dt\, d^5x\; B^{Aij}\, (\dot A^A_{ij} - B^A_{ij}) \qquad (A = 1, \dots, N), \tag{3.1}$$
where
$$B^{Aij} = \frac{1}{6}\, \epsilon^{ijklm} F^A_{klm} = \frac{1}{2}\, \epsilon^{ijklm}\, \partial_k A^A_{lm}. \tag{3.2}$$
The integer N can be any function of the number n of coincident M5-branes (e.g., N ∼ n^3). The action (3.1) differs from the one in [3–5], where a space-like dimension was singled out. Here we take time as the distinguished direction; from the point of view of the PST formulation [6, 7], the two approaches simply differ in the gauge fixing.
We work in Minkowski spacetime. This implies, in particular, that the topology of the spatial sections $R^5$ is trivial. Most of our considerations would go unchanged in a curved background of the product form $R \times \Sigma$ provided the De Rham cohomology groups $H^2_{DeRham}(\Sigma)$ and $H^1_{DeRham}(\Sigma)$ of the spatial sections $\Sigma$ vanish. [If $H^2_{DeRham}(\Sigma)$ is non-trivial, there are additional gauge symmetries besides (3.3) below, given by time-dependent spatially closed 2-forms; similarly, if $H^1_{DeRham}(\Sigma)$ is non-trivial, there are additional reducibility identities besides (3.4) below. One would thus need additional ghosts and ghosts of ghosts. These, however, would not change the discussion of local Lagrangians because they would be global in space (and local in t).]
The action $S_0$ is invariant under the following gauge transformations:
$$\delta_\Lambda A^A_{ij} = \partial_i \Lambda^A_j - \partial_j \Lambda^A_i, \tag{3.3}$$
because $B^{Aij}$ is gauge-invariant and identically transverse ($\partial_i B^{Aij} \equiv 0$)3. As $\delta_\Lambda A^A_{ij} = 0$ for
$$\Lambda^A_i = \partial_i \varepsilon^A, \tag{3.4}$$
this set of gauge transformations is reducible. This exhausts completely the redundancy in $\Lambda^A_i$ since $H^1_{DeRham}(R^5) = 0$.
The equations of motion obtained from $S_0[A^A_{ij}]$ by varying $A^A_{ij}$ are
$$\epsilon^{ijklm}\, \partial_k \dot A^A_{lm} - 2\, \partial_k F^{Aijk} = 0 \;\Leftrightarrow\; \epsilon^{ijklm}\, \partial_k (\dot A^A_{lm} - B^A_{lm}) = 0. \tag{3.5}$$
Using $H^2_{DeRham}(R^5) = 0$, one finds that the general solution of (3.5) is
$$\dot A^A_{ij} - B^A_{ij} = \partial_i \Lambda^A_j - \partial_j \Lambda^A_i. \tag{3.6}$$
The ambiguity in the solutions of the equations of motion is thus completely accounted for by the gauge freedom (3.3). Hence the set of gauge transformations is complete.
We can view $\Lambda^A_i$ as $A^A_{0i}$, so Eq. (3.6) can be read as the self-duality equation
$$F^A_{0ij} - {}^{*}F^A_{0ij} = 0, \tag{3.7}$$
where $F^A_{0ij} = \dot A^A_{ij} + \partial_i A^A_{j0} + \partial_j A^A_{0i}$. Alternatively, one may use the gauge freedom to set $\Lambda^A_i = 0$, which yields the self-duality condition in the temporal gauge.
3 Since $A^A_{0i}$ does not occur in the action, even if one replaces $\partial_0 A^A_{ij}$ by $\partial_0 A^A_{ij} - \partial_i A^A_{0j} - \partial_j A^A_{i0}$ (it drops out because $B^{Aij}$ is transverse), the action is of course invariant under arbitrary shifts of $A^A_{0i}$.
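The second equality in Eq. (3.2) rests on the cyclic structure of $F^A_{klm} = \partial_k A^A_{lm} + \partial_l A^A_{mk} + \partial_m A^A_{kl}$. The following sketch checks it with exact arithmetic for a single chiral form, with the derivatives $\partial_k A_{lm}$ replaced by random integer constants antisymmetric in (l, m). All names (`levi_civita`, `dA`, `B_from_F`, `B_from_dA`) are hypothetical and chosen for illustration only.

```python
from fractions import Fraction
from itertools import permutations, product
import random

def levi_civita(n):
    """Map each permutation of range(n) to its sign (+1 or -1)."""
    signs = {}
    for perm in permutations(range(n)):
        inversions = sum(1 for a in range(n) for b in range(a + 1, n)
                         if perm[a] > perm[b])
        signs[perm] = -1 if inversions % 2 else 1
    return signs

EPS = levi_civita(5)
random.seed(0)

# dA[k][l][m] stands for a constant value of d_k A_{lm}, antisymmetric in (l, m).
dA = [[[0] * 5 for _ in range(5)] for _ in range(5)]
for k, l, m in product(range(5), repeat=3):
    if l < m:
        dA[k][l][m] = random.randint(-9, 9)
        dA[k][m][l] = -dA[k][l][m]

def F(k, l, m):
    """Field strength F_{klm} as the cyclic sum of d_k A_{lm}."""
    return dA[k][l][m] + dA[l][m][k] + dA[m][k][l]

def B_from_F(i, j):
    """(1/6) eps^{ijklm} F_{klm}."""
    return Fraction(1, 6) * sum(s * F(p[2], p[3], p[4])
                                for p, s in EPS.items() if p[:2] == (i, j))

def B_from_dA(i, j):
    """(1/2) eps^{ijklm} d_k A_{lm}."""
    return Fraction(1, 2) * sum(s * dA[p[2]][p[3]][p[4]]
                                for p, s in EPS.items() if p[:2] == (i, j))

agree = all(B_from_F(i, j) == B_from_dA(i, j)
            for i in range(5) for j in range(5))
print(agree)
```

The agreement reflects the fact that the three cyclic terms of $F_{klm}$ contribute equally once contracted with the Levi-Civita symbol, which turns the factor 1/6 into 1/2.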
4. Fields – Antifields – Solution of the Master Equation

The solution of the master equation is easy to construct in this case because the gauge transformations are abelian. We refer to [30–33] for the general construction. The fields present here are
$$\{\Phi^M\} = \{A^A_{ij},\, C^A_i,\, \eta^A\}. \tag{4.1}$$
The ghosts $C^A_i$ correspond to the gauge parameters $\Lambda^A_i$, and the ghosts of ghosts $\eta^A$ correspond to $\varepsilon^A$. Now, to each field $\Phi^M$ we associate an antifield $\Phi^*_M$. The set of antifields is then
$$\{\Phi^*_M\} = \{A^{*Aij},\, C^{*Ai},\, \eta^{*A}\}. \tag{4.2}$$
The fields and antifields have the respective parities:
$$\epsilon(A^A_{ij}) = \epsilon(\eta^A) = \epsilon(C^{*Ai}) = 0, \tag{4.3}$$
$$\epsilon(C^A_i) = \epsilon(A^{*Aij}) = \epsilon(\eta^{*A}) = 1. \tag{4.4}$$
The antibracket is defined as
$$(X, Y) = \int d^n x \left( \frac{\delta^R X}{\delta \Phi^M(x)} \frac{\delta^L Y}{\delta \Phi^*_M(x)} - \frac{\delta^R X}{\delta \Phi^*_M(x)} \frac{\delta^L Y}{\delta \Phi^M(x)} \right), \tag{4.5}$$
where $\delta^R/\delta Z(x)$ and $\delta^L/\delta Z(x)$ denote functional right- and left-derivatives.
Because the set of gauge transformations is complete and defines a closed algebra, the (minimal, proper) solution of the master equation $(S, S) = 0$ takes the general form
$$S = S_0 + \sum_M (-)^{\epsilon(M)}\, \Phi^*_M\, s\Phi^M, \tag{4.6}$$
where $\epsilon(M)$ is the Grassmann parity of $\Phi^M$. More explicitly, we have
$$S = S_0 + \sum_A \int dt\, d^5x \left( A^{*Aij}\, \partial_i C^A_j - C^{*Ai}\, \partial_i \eta^A \right). \tag{4.7}$$
The solution S of the master equation captures all the information about the gauge structure of the theory: the Noether identities, the closure of the gauge transformations and the higher order gauge identities are contained in the master equation. The existence of S reflects the consistency of the gauge transformations.

5. BRST Operator

The BRST operator s is obtained by taking the antibracket with the proper solution S of the classical master equation,
$$sX = (S, X). \tag{5.1}$$
The BRST operator can be decomposed as
$$s = \delta + \gamma, \tag{5.2}$$
where δ is the Koszul–Tate differential [33]. What distinguishes δ and γ is the antighost number (antigh) defined through
$$\mathrm{antigh}(A^A_{ij}) = \mathrm{antigh}(C^A_i) = \mathrm{antigh}(\eta^A) = 0, \tag{5.3}$$
$$\mathrm{antigh}(A^{*Aij}) = 1, \quad \mathrm{antigh}(C^{*Ai}) = 2, \quad \mathrm{antigh}(\eta^{*A}) = 3. \tag{5.4}$$
The ghost number (gh) is related to the antighost number by
$$\mathrm{gh} = \mathrm{puregh} - \mathrm{antigh}, \tag{5.5}$$
where puregh is defined through
$$\mathrm{puregh}(A^A_{ij}) = 0, \quad \mathrm{puregh}(C^A_i) = 1, \quad \mathrm{puregh}(\eta^A) = 2, \tag{5.6}$$
$$\mathrm{puregh}(A^{*Aij}) = \mathrm{puregh}(C^{*Ai}) = \mathrm{puregh}(\eta^{*A}) = 0. \tag{5.7}$$
The differential δ is characterized by antigh(δ) = −1, i.e. it lowers the antighost number by one unit, and acts on the fields and antifields according to
$$\delta A^A_{ij} = \delta C^A_i = \delta \eta^A = 0, \tag{5.8}$$
$$\delta A^{*Aij} = 2\, \partial_k F^{Akij} - \epsilon^{ijklm}\, \partial_k \dot A^A_{lm}, \tag{5.9}$$
$$\delta C^{*Ai} = \partial_j A^{*Aij}, \tag{5.10}$$
$$\delta \eta^{*A} = \partial_i C^{*Ai}. \tag{5.11}$$
The differential γ is characterized by antigh(γ) = 0 and acts as
$$\gamma A^A_{ij} = \partial_i C^A_j - \partial_j C^A_i, \tag{5.12}$$
$$\gamma C^A_i = \partial_i \eta^A, \tag{5.13}$$
$$\gamma \eta^A = 0, \tag{5.14}$$
$$\gamma A^{*Aij} = \gamma C^{*Ai} = \gamma \eta^{*A} = 0. \tag{5.15}$$
Furthermore we have
$$s x^\mu = 0, \qquad s(dx^\mu) = 0. \tag{5.16}$$
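The gradings (5.3)–(5.7) and the actions (5.8)–(5.15) can be cross-checked mechanically. The sketch below is only a bookkeeping aid under stated assumptions: the dictionaries `PUREGH`, `ANTIGH`, `DELTA`, `GAMMA` and the short field labels are hypothetical names, and each non-vanishing action is represented only by the variable appearing on its right-hand side.

```python
# Pure ghost and antighost numbers of Eqs. (5.3)-(5.7).
PUREGH = {"A": 0, "C": 1, "eta": 2, "A*": 0, "C*": 0, "eta*": 0}
ANTIGH = {"A": 0, "C": 0, "eta": 0, "A*": 1, "C*": 2, "eta*": 3}

def gh(field):
    """Ghost number gh = puregh - antigh, Eq. (5.5)."""
    return PUREGH[field] - ANTIGH[field]

# Non-vanishing actions: variable acted upon -> variable on the right-hand side.
DELTA = {"A*": "A", "C*": "A*", "eta*": "C*"}   # Eqs. (5.9)-(5.11)
GAMMA = {"A": "C", "C": "eta"}                  # Eqs. (5.12)-(5.13)

# delta lowers antigh by one and preserves puregh.
delta_ok = all(ANTIGH[rhs] == ANTIGH[lhs] - 1 and PUREGH[rhs] == PUREGH[lhs]
               for lhs, rhs in DELTA.items())
# gamma preserves antigh and raises puregh by one.
gamma_ok = all(ANTIGH[rhs] == ANTIGH[lhs] and PUREGH[rhs] == PUREGH[lhs] + 1
               for lhs, rhs in GAMMA.items())
# The antifield terms of Eq. (4.7) pair an antifield with a ghost: ghost number 0.
action_ok = gh("A*") + gh("C") == 0 and gh("C*") + gh("eta") == 0

print({f: gh(f) for f in PUREGH}, delta_ok, gamma_ok, action_ok)
```

Both pieces raise the ghost number by one unit in this bookkeeping, consistent with s = δ + γ carrying ghost number one.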
6. Local Forms – Algebraic Poincaré Lemma

A local function is a function of the fields, the ghosts, the antifields, and their derivatives up to some finite order k (which depends on the function),
$$f = f(\Phi, \partial_\mu \Phi, \dots, \partial_{\mu_1} \dots \partial_{\mu_k} \Phi). \tag{6.1}$$
A local function is thus a function over a finite dimensional vector space $J^k$ called “jet space”. A local form is an exterior polynomial in the $dx^\mu$’s with local functions as coefficients. The algebra of local forms will be denoted by $\mathcal{A}$. In practice, the local forms are polynomial in the ghosts and the antifields, as well as in the differentiated fields, so we shall from now on assume that the local forms under consideration are of this type. One can actually show that polynomiality in the ghosts, the antifields and their derivatives follows from polynomiality in the derivatives of the $A_{ij}$ by an argument similar to the
one used in [36] for 1-forms; and polynomiality in the derivatives is automatic in our perturbative approach where we work order by order in the coupling constant(s). Note also that we exclude an explicit x-dependence of the local forms. One could allow for one without change in the conclusions. In fact, as we shall indicate below, allowing for an explicit x-dependence simplifies some of the proofs. We choose not to do so here since the interaction terms in the Lagrangian should not depend explicitly on the coordinates in the Poincaré-invariant context.
The following theorem describes the cohomology of d in the algebra of local forms, in degree q < n.

Theorem 6.1. The cohomology of d in the algebra of local forms of degree q < n is given by
$$H^0(d) \simeq \mathbb{R}, \qquad H^q(d) = \{\text{constant forms}\}, \quad 0 < q < n.$$
Constant forms are by definition polynomials in the $dx^\mu$’s with constant coefficients.

This theorem is called the algebraic Poincaré lemma (for q < n). There exist many proofs of this lemma in the literature. One of the earliest can be found in [37, 38]. Constant q-forms are trivial in degree 0 < q < n in the algebra of local forms with an explicit x-dependence; e.g., $dx^0 = df$, where f is the $x^0$-dependent function $f = x^0$. Thus, in this enlarged algebra, the cohomology of d is simpler and vanishes in degrees 0 < q < n. This is the reason that the calculations are somewhat simpler when one allows for an explicit x-dependence.
We work in a formalism where the time direction is privileged. For this reason, it is useful to introduce the following notation: the l-th time derivative of a field $\Phi$ (including the ghosts and antifields) is denoted by $\Phi^{(l)}$ ($= \partial_0^l \Phi$), and the spatial differential is denoted by $\tilde d = dx^i \partial_i$. A local spatial form is an exterior polynomial in the spatial $dx^k$’s with coefficients that are local functions. If we write the set of the generators of the jet space $J^k$ as
$$\left\{ \Phi^{(l_0)},\; \partial_{i_1} \Phi^{(l_1)},\; \dots,\; \partial_{i_1} \dots \partial_{i_k} \Phi^{(0)};\quad l_j = 0, \dots, k - j \right\}, \tag{6.2}$$
it is clear that

Theorem 6.2. The cohomology of $\tilde d$ in the algebra of local spatial forms of degree q < n − 1 is given by
$$H^0(\tilde d) \simeq \mathbb{R}, \qquad H^q(\tilde d) = \{\text{constant spatial forms}\}, \quad 0 < q < n - 1.$$

A similar decomposition of space and time derivatives occurs of course in the Hamiltonian formalism. A discussion of the problem of consistent deformations of a gauge invariant action has been carried out in the Hamiltonian context in [39–41].

7. Cohomology of γ

The following theorem gives H(γ) completely.
Theorem 7.1. The cohomology of γ is given by
$$H(\gamma) = \mathcal{I} \otimes \mathcal{V}. \tag{7.1}$$
Here, the algebra $\mathcal{I}$ is the algebra of the local forms with coefficients that depend only on the variables $F^A_{ijk}$, the antifields $\phi^*_M$, and all their partial derivatives up to a finite order (“gauge-invariant” local forms). These variables are collectively denoted by χ. The algebra $\mathcal{V}$ is the polynomial algebra in the ghosts $\eta^A$ of ghost number two and their time derivatives.

Proof. The generators of $\mathcal{A}$ can be grouped in three sets:
$$T = \{t^i\} = \left\{ \partial_{\mu_1 \dots \mu_k} F^A_{ijk},\; \partial_{\mu_1 \dots \mu_k} \phi^*_M,\; \eta^{A(l)},\; dx^\mu \right\}, \tag{7.2}$$
$$U = \{u^\alpha\} = \left\{ \partial_{(i_1 \dots i_k} A^{A(l)}_{[i)_2\, j]_1},\; \partial_{(i_1 \dots i_{k-1}} C^{A(l)}_{i_k)} \right\}, \tag{7.3}$$
$$V = \{v^\alpha\} = \left\{ \partial_{i_1 \dots i_k} \partial_{[i} C^{A(l)}_{j]},\; \partial_{i_1 \dots i_k} \eta^{A(l)} \right\}, \tag{7.4}$$
(k, l = 0, ···), where [ ] and ( ) mean respectively antisymmetrization and symmetrization; the subscript indicates the order in which the operations are made. The differential γ acts on these three sets in the following way:
$$\gamma T = 0, \qquad \gamma U = V, \qquad \gamma V = 0. \tag{7.5}$$
The elements of U and V are in a one-to-one correspondence and are linearly independent with respect to each other, so they constitute a manifestly contractible part of the algebra and can thus be removed from the cohomology. No element in the algebra generated by T is trivial in the cohomology of γ, except 0. Indeed, let us assume the existence of a local form $F(t^i) \neq 0$ which is γ-exact; then
$$F(t^i) = \gamma G(t^i, u^\alpha, v^\alpha) = v^\alpha\, \frac{\partial^L G}{\partial u^\alpha}(t^i, u^\alpha, v^\alpha). \tag{7.6}$$
But this implies that
$$F(t^i) = F(t^i)\big|_{v^\alpha = 0} = 0, \tag{7.7}$$
as announced.

Note that, contrary to what happens in the non-chiral case, the temporal derivatives of the ghosts $\eta^A$ are non-trivial in cohomology. There is thus an infinite number of generators in ghost number two for H(γ), namely, all the $\eta^{A(l)}$’s. In contrast, in the non-chiral case, one has $\partial_0 \eta^A = \gamma C^A_0$ and so $\partial_0 \eta^A$ (and all the subsequent derivatives) are γ-exact. In the chiral case, there is no $C^A_0$.
Let $\{\omega^I\}$ be a basis of the vector space $\mathcal{V}$ of polynomials in the variables $\eta^A$ and all their time derivatives. Theorem 7.1 tells us that
$$\gamma \alpha = 0, \;\; \alpha \in \mathcal{A} \quad \Leftrightarrow \quad \alpha = \sum_I P_I(\chi)\, \omega^I + \gamma \beta. \tag{7.8}$$
Furthermore, because $\{\omega^I\}$ is a basis of $\mathcal{V}$,
$$\sum_I P_I(\chi)\, \omega^I = \gamma \beta \quad \Rightarrow \quad P_I(\chi) = 0. \tag{7.9}$$
It will be useful in the sequel to choose a special basis $\{\omega^I\}$. The vector space $\mathcal{V}$ of polynomials in the ghosts $\eta^A$ and their time derivatives splits as the direct sum $\oplus_k \mathcal{V}^{2k}$ of vector spaces with definite pure ghost number 2k. The space $\mathcal{V}^0$ is one-dimensional and given by the constants. We may choose 1 as basis vector for $\mathcal{V}^0$, so let us turn to the less trivial spaces $\mathcal{V}^{2k}$ with k ≠ 0. These spaces are themselves the direct sums of finite dimensional vector spaces $\mathcal{V}^{2k}_r$ containing the polynomials with exactly r time derivatives of the η’s (e.g., $\partial_0 \eta^A\, \partial_{00} \eta^B$ is in $\mathcal{V}^4_3$). The following lemma provides a basis of $\mathcal{V}^{2k}$ for k ≠ 0:

Lemma 7.1. Let $\mathcal{V}^{2k}$ be the vector space of polynomials in the variables $\eta^{A(l)}$ with fixed pure ghost number 2k ≠ 0. $\mathcal{V}^{2k}$ is the direct sum
$$\mathcal{V}^{2k} = \mathcal{V}^{2k}_0 \oplus \mathcal{V}^{2k}_1 \oplus \dots, \tag{7.10}$$
where $\mathcal{V}^{2k}_m$ is the subspace of $\mathcal{V}^{2k}$ containing the polynomials with exactly m derivatives of $\eta^A$. One has $\dim \mathcal{V}^{2k}_m \leq \dim \mathcal{V}^{2k}_{m+1}$. There exists a basis of $\mathcal{V}^{2k}$,
$$\left\{ \omega^{I_m}_{(m)} : I_m = 1, \dots, q_m;\; m = 0, \dots \right\}, \tag{7.11}$$
which fulfills
$$\omega^{I_m}_{(m)} = \partial_0\, \omega^{I_m}_{(m-1)} \qquad (I_m = 1, \dots, q_{m-1}). \tag{7.12}$$
In other words, the first $q_{m-1}$ basis vectors of $\mathcal{V}^{2k}_m$ are directly constructed from the basis vectors of $\mathcal{V}^{2k}_{m-1}$ by taking their time derivative $\partial_0$.

Proof. We will prove the lemma by induction. For m = 0, take an arbitrary basis of $\mathcal{V}^{2k}_0$ (space of polynomials in the undifferentiated ghosts $\eta^A$ of degree k). Assume now that a basis with the required properties exists up to order m − 1. Let $\{\omega^I_{(m-1)};\; I = 1, \dots, q_{m-1}\}$ be a basis with those properties for $\mathcal{V}^{2k}_{m-1}$. We want to prove that it is possible to construct a basis of $\mathcal{V}^{2k}_m$, where the first $q_{m-1}$ basis vectors are the time derivatives of the basis vectors of $\mathcal{V}^{2k}_{m-1}$. We only have to show that the $\partial_0 \omega^I_{(m-1)}$ are linearly independent (because they can always be completed to form a basis of $\mathcal{V}^{2k}_m$). In other words, we must prove that
$$\sum_{I=1}^{q_{m-1}} \lambda_I\, \partial_0\, \omega^I_{(m-1)} = \partial_0 \Big( \sum_{I=1}^{q_{m-1}} \lambda_I\, \omega^I_{(m-1)} \Big) = 0 \tag{7.13}$$
implies $\lambda_I = 0$. But (7.13) is equivalent to
$$\sum_{I=1}^{q_{m-1}} \lambda_I\, \omega^I_{(m-1)} = K, \tag{7.14}$$
where K is a constant (algebraic Poincaré lemma in form degree 0). K must be equal to zero because we are in pure ghost number ≠ 0. By hypothesis, the $\omega^I_{(m-1)}$ are linearly independent, hence the $\lambda_I$ must all be equal to zero, which ends the proof.
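The inequality between the dimensions of consecutive subspaces stated in Lemma 7.1 can be explored numerically by counting monomials. The sketch below is an illustration under stated assumptions: the function name `dim_V` is hypothetical, the ghosts $\eta^{A(l)}$ are treated as commuting variables (they have even parity), and a basis of the subspace with pure ghost number 2k and exactly m time derivatives is taken to be the set of monomials of degree k whose derivative orders add up to m.

```python
from itertools import combinations_with_replacement

def dim_V(N, k, m):
    """Number of monomials of degree k in the commuting variables eta^{A(l)}
    (A = 1..N, l >= 0) whose time-derivative orders sum to m."""
    generators = [(A, l) for A in range(N) for l in range(m + 1)]
    return sum(1 for monomial in combinations_with_replacement(generators, k)
               if sum(l for _, l in monomial) == m)

# One chiral form (N = 1), pure ghost number 4 (k = 2), m = 0..4.
print([dim_V(1, 2, m) for m in range(5)])   # [1, 1, 2, 2, 3]
# Two chiral forms (N = 2), same ghost number.
print([dim_V(2, 2, m) for m in range(5)])
```

For N = 1 and k = 2 the count is the number of partitions of m into at most two parts, which is non-decreasing in m, in line with the lemma.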
8. Cohomology of γ Modulo d at Positive Antighost Number

Let $a^p$ be a local p-form of antighost number k ≠ 0 fulfilling
$$\gamma a^p + d b^{p-1} = 0. \tag{8.1}$$
We want to show that if we add to $a^p$ an adequate d-trivial term, Eq. (8.1) reduces to $\gamma a^p = 0$. From (8.1), using the algebraic Poincaré lemma and the fact that γ is nilpotent and anticommutes with d, we can derive the descent equations
$$\gamma a^p + d b^{p-1} = 0, \tag{8.2}$$
$$\gamma b^{p-1} + d c^{p-2} = 0, \tag{8.3}$$
$$\vdots$$
$$\gamma e^{q+1} + d f^q = 0, \tag{8.4}$$
$$\gamma f^q = 0. \tag{8.5}$$
Indeed, the fact that the antighost number is strictly positive eliminates the constants. [E.g., from (8.1), one derives $d \gamma b^{p-1} = 0$ and thus $\gamma b^{p-1} + d c^{p-2} = \text{constant}$, but the constant must vanish since it must have strictly positive antighost number.] We suppose q < p, since otherwise $\gamma a^p = 0$, which is the result we want to prove.
Equation (8.5) tells us that $f^q$ is a cocycle of γ. It must be non-trivial in $H^q(\gamma)$ because if $f^q = \gamma g^q$, then (8.4) becomes $\gamma (e^{q+1} - d g^q) = 0$. The redefinition $e'^{\,q+1} = e^{q+1} - d g^q$ does not affect the descent equations before (8.4), which means that the descent stops one step earlier, at q + 1. Using Theorem 7.1, we deduce from (8.5) that
$$f^q = \sum_{m, I_m} \left( \tilde P^{(m)}_{I_m}(\chi) + dx^0\, \tilde Q^{(m)}_{I_m}(\chi) \right) \omega^{I_m}_{(m)}, \tag{8.6}$$
where $\tilde P^{(m)}_{I_m}$ and $\tilde Q^{(m)}_{I_m}$ are local spatial forms of respective degree q and q − 1. We take the basis elements $\omega^{I_m}_{(m)}$ to fulfill the conditions of Lemma 7.1. Differentiating (8.6), we find
$$d f^q = \sum_{m, I_m} \left[ \tilde d \tilde P^{(m)}_{I_m}\, \omega^{I_m}_{(m)} + \gamma \big( \tilde P^{(m)}_{I_m}\, \hat\omega^{I_m}_{(m)} \big) + dx^0 \left( \big( \partial_0 \tilde P^{(m)}_{I_m} - \tilde d \tilde Q^{(m)}_{I_m} \big)\, \omega^{I_m}_{(m)} + \tilde P^{(m)}_{I_m}\, \partial_0 \omega^{I_m}_{(m)} \right) \right]. \tag{8.7}$$
The local function $\hat\omega^{I_m}_{(m)}$ is defined by $\tilde d \omega^{I_m}_{(m)} = \gamma \hat\omega^{I_m}_{(m)}$ (and exists thanks to Eq. (5.13)).
Now, we will show that the component $\tilde P^{(m)}_{I_m}$ can be eliminated from $f^q$ by a trivial redefinition of $f^q$. In order to satisfy (8.4), the term independent of $dx^0$ and the coefficient of the term linear in $dx^0$ in (8.7) must separately be γ-exact. The second condition gives explicitly
$$\sum_{m, I_m} \left[ \big( \partial_0 \tilde P^{(m)}_{I_m} - \tilde d \tilde Q^{(m)}_{I_m} \big)\, \omega^{I_m}_{(m)} + \tilde P^{(m)}_{I_m}\, \partial_0 \omega^{I_m}_{(m)} \right] = \gamma \beta. \tag{8.8}$$
To analyze precisely this equation, we define a degree T by
$$T(\chi) = 0, \qquad T(\eta^{A(m)}) = m. \tag{8.9}$$
In fact, T simply counts the number of time derivatives of $\eta^A$. We can decompose (8.8) according to the degree T. Let p be the highest degree occurring in $f^q$. Then, the highest degree occurring in (8.8) is p + 1 and we must have
$$\sum_{I=1}^{q_p} \tilde P^{(p)}_I\, \partial_0 \omega^I_{(p)} = \gamma \beta_{p+1}. \tag{8.10}$$
From the proof of Lemma 7.1, we find that
$$\tilde P^{(p)}_I = 0 \qquad (I = 1, \dots, q_p) \tag{8.11}$$
because the $\partial_0 \omega^I_{(p)}$ are linearly independent. In T-degree p, (8.8) then gives
$$\gamma \beta_p = - \sum_{I=1}^{q_p} \tilde d \tilde Q^{(p)}_I\, \omega^I_{(p)} + \sum_{I=1}^{q_{p-1}} \tilde P^{(p-1)}_I\, \partial_0 \omega^I_{(p-1)} \tag{8.12}$$
$$\phantom{\gamma \beta_p} = \sum_{I=1}^{q_{p-1}} \left( \tilde P^{(p-1)}_I - \tilde d \tilde Q^{(p)}_I \right) \omega^I_{(p)} - \sum_{I=q_{p-1}+1}^{q_p} \tilde d \tilde Q^{(p)}_I\, \omega^I_{(p)}, \tag{8.13}$$
where we have used the property (7.12) of the basis $\{\omega^I\}$. This implies that
$$\tilde P^{(p-1)}_I = \tilde d \tilde Q^{(p)}_I \qquad (I = 1, \dots, q_{p-1}). \tag{8.14}$$
Inserting this equation in (8.6), we find that $\tilde P^{(p-1)}_I$ can be removed from $f^q$ by eliminating a trivial cocycle of γ modulo d and redefining $\tilde Q^{(p-1)}_I$. It only affects $e^{q+1}$ by a d-exact term. Next, Eq. (8.8) at T-degree p − 1 shows that $\tilde P^{(p-2)}_I$ is also $\tilde d$-exact and can thus also be removed. Proceeding in the same way until the order 1 in T, we have proved that all the $\tilde P^{(m)}_I$ can be eliminated from $f^q$.
Looking back at (8.8) and taking into account that $\tilde P^{(m)}_{I_m}$ can be set equal to zero by the above argument, we find that
$$\tilde d \tilde Q^{(m)}_{I_m} = 0. \tag{8.15}$$
Now, we must use the invariant Poincaré lemma (invariant means in the algebra $\mathcal{I}$ of gauge-invariant forms) stating that

Theorem 8.1. Let $\tilde P(\chi)$ be a local spatial form of degree q < 5, then
$$\tilde d \tilde P(\chi) = 0 \quad \Rightarrow \quad \tilde P(\chi) = \tilde R(F^{A(l)}) + \tilde d \tilde Q(\chi), \tag{8.16}$$
where $\tilde R(F^{A(l)})$ is a polynomial in the curvature forms $F^A = \frac{1}{6} F^A_{ijk}\, dx^i dx^j dx^k$ and all their time derivatives (with coefficients that may involve $dx^k$, which takes care of the constant forms).
Proof. The set of generators of the algebra $\mathcal I$ is
\[ \{\chi\} = \left\{ \partial_{i_1 \dots i_k} F^{A(l)}_{ijk},\ \partial_{i_1 \dots i_k} \phi^{*(l)}_M,\ \eta^{A(l)},\ dx^\mu \right\}. \tag{8.17} \]
The 1-form $dx^0$ is not present in our problem since $\tilde P$ is a spatial local form (it only involves $dx^k$). Considering $l$ and $A$ as only one label (call it $\alpha$) and forgetting about $dx^0$, the set (8.17) is the same as the corresponding set of generators of the algebra $\mathcal I$ ($\equiv H(\gamma)$ in pureghost number 0) for a system of spatial two-forms $A^\alpha_{ij} \equiv (A^A_{ij}, \partial_0 A^A_{ij}, \partial_{00} A^A_{ij}, \cdots)$ in 5 dimensions. Consequently, we can simply use the results demonstrated in [42] for a system of $p$-forms in any dimension.

We assumed before that $f^q$ is of degree $q < 6$, hence $\tilde Q_I$ is of degree $< 5$. Thus, (8.15) implies
\[ \tilde Q^{(m)}_{I_m} = \tilde d\, \tilde R^{(m)}_{I_m}, \tag{8.18} \]
where $\tilde R^{(m)}_{I_m}$ is a spatial form which only depends on the variables $\chi$. There is no exterior polynomial in the curvatures in $\tilde Q^{(m)}_{I_m}$ because $\tilde Q^{(m)}_{I_m}$ has strictly positive antighost number. We can therefore conclude that $f^q$ is trivial in $H^q(\gamma\,|\,d)$ and can be eliminated by redefining $e^{q+1}$. The true bottom is then one step higher. We can proceed in the same way until we arrive at $\gamma a'^p = 0$ with $a'^p = a^p + dg^{p-1}$. This can be translated into the following theorem.

Theorem 8.2. Let $a$ be a local form of antighost number $\neq 0$ fulfilling $\gamma a + db = 0$. There exists a local form $c$ such that $a' := a + dc$ satisfies $\gamma a' = 0$.

9. Cohomology of γ Modulo d at Zero Antighost Number

Now, we want to study $H^{6,0}(\gamma\,|\,d)$ in pureghost number 0. Let $a^{(6,0)} \in \mathcal A$ be of form degree 6, of antighost and pureghost number 0, and fulfilling $\gamma a^{(6,0)} + da^{(5,1)} = 0$. If $a^{(5,1)}$ is trivial $\gamma$ modulo $d$, this equation reduces to $\gamma\,(a^{(6,0)} + db^{(5,0)}) = 0$, which gives $a^{(6,0)} = f(\partial_{\mu_1 \dots \mu_k} F^A_{ijk})\, d^6x$ plus a term trivial in the cohomology of $\gamma$ modulo $d$. Otherwise, we can derive the non-trivial descent equations
\begin{align}
\gamma a^{(6,0)} + da^{(5,1)} &= 0, \tag{9.1} \\
\gamma a^{(5,1)} + da^{(4,2)} &= 0, \tag{9.2} \\
&\ \,\vdots \notag \\
\gamma a^{(7-g,g-1)} + da^{(6-g,g)} &= 0, \tag{9.3} \\
\gamma a^{(6-g,g)} &= 0, \tag{9.4}
\end{align}
because $\mathrm{pureghost}(\gamma a^{(6-i,i)}) > 0$ eliminates the constants. If $a^{(6-g,g)}$ is trivial $\gamma$ modulo $d$, the bottom is really one step higher. Equation (9.4) implies that
\[ a^{(6-g,g)} = \sum_I \left( \tilde P_I^{6-g}(\chi) + dx^0\, \tilde Q_I^{5-g}(\chi) \right) \omega^I + \gamma b^{(6-g,g-1)}, \tag{9.5} \]
where $\tilde P_I^{6-g}$ and $\tilde Q_I^{5-g}$ are local spatial forms, the superscript giving the form degree. Because the pureghost number of $\eta$ is two, $a^{(6-g,g)}$ is non-trivial only for $g$ even. So, three cases are of interest: $g = 0, 2, 4$. The case $g = 0$ corresponds to $\gamma a^{(6,0)} = 0$ and has already been studied, so let us assume $g > 0$. Equations (9.3) and (9.5) together imply
\[ \sum_I \left( \partial_0 \tilde P_I^{6-g} - \tilde d\, \tilde Q_I^{5-g} \right) \omega^I + \sum_I \tilde P_I^{6-g}\, \partial_0 \omega^I = \gamma\beta. \tag{9.6} \]
Repeating the same analysis as for Eq. (8.8), we arrive at the conclusion that $\tilde P_I^{6-g}$ is trivial in the invariant cohomology of $\tilde d$ (or vanishes) and can thus be removed from $a^{(6-g,g)}$ by the addition of trivial terms in the cohomology of $\gamma$ modulo $d$ and a redefinition of $\tilde Q_I^{5-g}$. The case $g = 6$ is then eliminated because in that case $\tilde Q_I^{5-g}$ is not present at all. Hence, there remain only two cases to examine: $g = 2$ and $g = 4$.

Once $\tilde P_I^{6-g}$ is removed, Eq. (9.6) gives $\tilde d\, \tilde Q_I^{5-g} = 0$. Using the invariant Poincaré lemma, we find $\tilde Q_I^{5-g} = \tilde R_I^{5-g}(F^{A(l)}) + \tilde d\, \tilde S_I^{(4-g,g)}(\chi)$. Hence, the form of the bottom is
\[ a^{(6-g,g)} = dx^0 \sum_I \tilde R_I^{5-g}\!\left(F^{A(l)}\right) \omega^I + \gamma b^{(6-g,g-1)} + dc^{(5-g,g)}. \tag{9.7} \]
But $F^{A(l)}$ is of form degree 3, thus if $g = 4$, $\tilde R_I^{5-g}$ must be a constant spatial 1-form. In that instance, the $\omega^I$ must be quadratic in the ghosts $\eta^{A(l)}$. The lift of such a bottom is obstructed (i.e., leads to no $a^{6,0}$) unless it is trivial (see [13]), so that the case $g = 4$ need not be considered. [In the algebra of $x$-dependent local forms, the argument is simpler: the bottom is always trivial and removable since it involves a constant 1-form, which is trivial.]

It only remains to examine the case $g = 2$. $\tilde R$ must then be a 3-form. One can take $\tilde R$ linear in $F^{A(l)}$. In that case, the lift gives Chern–Simons terms, which are linear combinations of $dx^0 F^{A(l)} A^{B(m)}$, with $A^{B(m)} = \frac{1}{2} A^{B(m)}_{ij}\, dx^i dx^j$. Or one can take $\tilde R$ to be a constant 3-form. The corresponding deformation is linear in the 2-form $A^{A(l)}$ with coefficients that are constant forms. This second possibility is not $SO(5)$ invariant and leads to equations of motion that are not Lorentz invariant. It will not be considered further. Dropping the latter possibility, all these results can be summarized in

Theorem 9.1. The non-trivial elements of $H_0^{6,0}(\gamma\,|\,d)$ are of two types: (i) those that descend trivially; they are of the form $f(\partial_{\mu_1 \dots \mu_k} B^A_{ij})\, d^6x$; (ii) those that descend non-trivially; they are linear combinations of the Chern–Simons terms $\partial_0^l B^{A\,ij}\, \partial_0^m A^B_{ij}\, d^6x$.

Note that the kinetic term in the free action is precisely of the Chern–Simons type (with $l = 0$ and $m = 1$).

10. Invariant Cohomology of δ Modulo d̃ in Antighost Number 2, 4, 6, . . .

To pursue the analysis, we need some results on the cohomology of the Koszul–Tate differential $\delta$ as well as on its mod-$d$ and mod-$\tilde d$ cohomologies.
We can rewrite the action of the Koszul–Tate differential in the following way:
\begin{align}
\delta A^{A(l)}_{ij} = \delta C^{A(l)}_i = \delta \eta^{A(l)} &= 0, \tag{10.1} \\
\delta A^{*A(l)ij} &= 2\, \partial_k F^{A(l)kij} - \varepsilon^{ijklm}\, \partial_k A^{A(l+1)}_{lm}, \tag{10.2} \\
\delta C^{*A(l)i} &= \partial_j A^{*A(l)ij}, \tag{10.3} \\
\delta \eta^{*A(l)} &= \partial_i C^{*A(l)i}. \tag{10.4}
\end{align}
If we regard $A$ and $l$ as only one label, these equations correspond to an infinite number of coupled non-chiral 2-forms in 5 dimensions. It is useful to introduce a degree $N$ defined as
\begin{align}
N(\Phi^*_M) = 1, \quad N(\Phi^M) &= 0, \tag{10.5} \\
N(\partial_k) = 1, \quad N(\partial_0) &= 0, \tag{10.6} \\
N(dx^\mu) &= 0. \tag{10.7}
\end{align}
$N$ counts the number of spatial derivatives as well as the antifields (with equal weight given to each). According to this degree, $\delta$ decomposes as $\delta_0 + \delta_1$. The differential $\delta_1$ acts exactly in the same way as the Koszul–Tate differential for a system of free 2-forms in 5 dimensions. We are now able to prove

Theorem 10.1. $H_i(\delta) = 0$ for $i > 0$, where $i$ is the antighost number, i.e., the cohomology of $\delta$ is empty in antighost number strictly greater than zero.

Proof. From [42], we know that $H_i(\delta_1) = 0$. Let $a \in \mathcal A$ be a $\delta$-closed local function of antighost number $i > 0$. We decompose $a$ according to the degree $N$:
\[ a = a_1 + \dots + a_m. \tag{10.8} \]
The expansion stops because $a$ is polynomial in the antifields and the derivatives. Furthermore, $a_0 = 0$ because $\mathrm{antigh}(a) = i > 0$. The equation $\delta a = 0$ gives in $N$-degree $m+1$: $\delta_1 a_m = 0$. But $H_i(\delta_1) = 0$, hence $a_m = \delta_1 b_{m-1}$. We can define $a'$ by
\[ a' = a - \delta b_{m-1} = a_1 + \dots + a_{m-2} + a'_{m-1}, \tag{10.9} \]
with $a'_{m-1} = a_{m-1} - \delta_0 b_{m-1}$. We can proceed in the same way as before with $a'$, whose component of highest $N$-degree is of degree less than $m$. We will then find a new $a''$ of highest degree less than $m-1$, and so on, each time lowering the $N$-degree. After a finite number of steps, we arrive at $a' = a'_1 = a - \delta b$. Then, $\delta a' = 0$ implies $\delta_1 a'_1 = 0$. Hence, $a'_1 = \delta_1 b_0 = \delta b_0$ because $\delta_0 \Phi^M = 0$. In conclusion $a = \delta b$, with $b = b_0 + \dots + b_{m-1}$.
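As a bookkeeping aid for the proof above, the condition $\delta a = 0$ can be written out degree by degree. The sketch below assumes, as the statement "$\delta a = 0$ gives in $N$-degree $m+1$: $\delta_1 a_m = 0$" suggests, that $\delta_0$ preserves the $N$-degree and $\delta_1$ raises it by one; the layout is illustrative and not taken from the original text.

```latex
% Sketch: N-degree components of \delta a = 0, with \delta = \delta_0 + \delta_1
% and a = a_1 + \dots + a_m (a_0 = 0). Assumes \delta_0 has N-degree 0, \delta_1 has N-degree 1.
\begin{align*}
\delta a &= \sum_{j=1}^{m} (\delta_0 + \delta_1)\, a_j = 0, \\
N = m+1: &\quad \delta_1 a_m = 0, \\
N = j \ \ (2 \le j \le m): &\quad \delta_0 a_j + \delta_1 a_{j-1} = 0, \\
N = 1: &\quad \delta_0 a_1 = 0 .
\end{align*}
```

Only the top equation is used at each step of the induction: it gives $a_m = \delta_1 b_{m-1}$, and subtracting $\delta b_{m-1}$ lowers the highest $N$-degree by one.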
Of course, this theorem is really a consequence of general known results on the cohomology of the Koszul–Tate differential. It simply confirms, in a sense, that we have correctly taken into account all gauge symmetries and reducibility identities in constructing the antifield spectrum.

The cohomological space $H_k^{5,\mathrm{inv}}(\delta\,|\,\tilde d)$ is defined as $H_k^5(\delta\,|\,\tilde d)$ in the space of local spatial forms that belong to $\mathcal I$, i.e., that are invariant. We want to compute it for $k$ even and $\neq 0$. To do this, we will proceed as in the proof of Theorem 10.1. We first prove the requested result for $\delta_1$; we then use "cohomological perturbation" techniques to extend the result to $\delta$.
Lemma 10.1. For $k = 2, 4, \dots$,
\[ H_k^{5,\mathrm{inv}}(\delta_1\,|\,\tilde d) = 0. \tag{10.10} \]
Again, this result is simply a particular case of more general results, which were previously known, but for completeness, we prove it here.

Proof. Firstly, Theorem 9.1 of [43] says that for a linear gauge theory of reducibility order $p$ in $n$ dimensions, $H_k^n(\delta\,|\,d) = 0$ for $k > p + 2$. A system of abelian spatial 2-forms in 5 dimensions is a linear gauge theory of reducibility order 1 (see Sect. 3); thus, we can state that $H_k^5(\delta_1\,|\,\tilde d) = 0$ for $k > 3$.

Secondly, Theorem 7.4 of [42] gives here: $H_2^5(\delta_1\,|\,\tilde d) = 0$.

Finally, Theorem 10.1 of [42] says that for a system of space-time $p$-form gauge fields of the same degree, $H_k^n(\delta\,|\,d) \cong H_k^{n,\mathrm{inv}}(\delta\,|\,d)$ for $k > 0$. For the system under consideration here, this can be translated into: $H_k^5(\delta_1\,|\,\tilde d) \cong H_k^{5,\mathrm{inv}}(\delta_1\,|\,\tilde d)$ for $k > 0$. Putting all these results together completes the proof.

Let $a^5(\chi)$ be a local spatial 5-form in $\mathcal I$ of strictly positive and even antighost number, satisfying
\[ \delta a^5(\chi) + \tilde d b^4(\chi) = 0. \tag{10.11} \]
We can decompose $a^5$ and $b^4$ according to the degree $N$,
\begin{align}
a^5 &= a_1^5 + \dots + a_n^5, \tag{10.12} \\
b^4 &= b_1^4 + \dots + b_m^4. \tag{10.13}
\end{align}
$a_0^5 = 0$ and $b_0^4 = 0$ because $a^5$ and $b^4$ are of antighost number $> 0$. We can always suppose $m \le n$ because if $m > n$, (10.11) gives in $N$-degree $m+1$: $\tilde d b_m^4 = 0$. Using the invariant Poincaré lemma, this yields $b_m^4 = \tilde d c_{m-1}^3$. Hence, $b_m^4$ only contributes to $b^4$ by a $\tilde d$-trivial term which can be eliminated. Proceeding in the same way until $m = n$, we arrive at the equation
\[ \delta_1 a_n^5(\chi) + \tilde d b_n^4(\chi) = 0. \tag{10.14} \]
It has already been noticed above that the algebra $\mathcal I$ without dependence on $dx^0$ is the same as for a system of spatial 2-forms. We can thus use Lemma 10.1 in (10.14) to find that
\[ a_n^5(\chi) = \delta_1 e_{n-1}^5(\chi) + \tilde d f_{n-1}^4(\chi). \tag{10.15} \]
Therefore, $a'^5 = a^5 - \delta e_{n-1}^5 - \tilde d f_{n-1}^4$ satisfies the same properties as $a^5$, except that its component of highest $N$-degree is of degree $< n$. We can now apply the same reasoning as before to $a'^5$, and so on, until we arrive at
\[ a'^5 = a'^5_1 = a^5 - \delta \left( \sum_{i=1}^{n-1} e_i^5 \right) - \tilde d \left( \sum_{i=1}^{n-1} f_i^4 \right). \tag{10.16} \]
This leads to
\[ a'^5_1 = \delta_1 e_0^5(\chi) + \tilde d f_0^4(\chi). \tag{10.17} \]
But $\delta_1 e_0^5 = \delta e_0^5$ because $\delta_0 \Phi^M = 0$. Eventually, we have $a^5 = \delta e^5(\chi) + \tilde d f^4(\chi)$, with $e^5 = \sum_{i=0}^{n-1} e_i^5$ and $f^4 = \sum_{i=0}^{n-1} f_i^4$. This gives the awaited theorem:

Theorem 10.2. For $k = 2, 4, \dots$,
\[ H_k^{5,\mathrm{inv}}(\delta\,|\,\tilde d) = 0. \tag{10.18} \]
11. Decomposition of the Wess–Zumino Equation

We now have all the necessary tools to solve the Wess–Zumino consistency condition that controls the consistent deformations (to first order) of the action,
\[ sa^6 + db^5 = 0, \tag{11.1} \]
where $a^6$ and $b^5$ are local forms of respective form degrees 6 and 5, and ghost number 0 and 1. These forms are defined up to the following allowed redefinitions
\begin{align}
a^6 &\to a^6 + sf^6 + dg^5, \tag{11.2} \\
b^5 &\to b^5 + sg^5 + dh^4, \tag{11.3}
\end{align}
which preserve (11.1). We can decompose $a^6$ and $b^5$ according to antighost number, which gives
\begin{align}
a^6 &= a_0^6 + \dots + a_k^6, \tag{11.4} \\
b^5 &= b_0^5 + \dots + b_q^5, \tag{11.5}
\end{align}
with $a_k^6 \neq 0$. We suppose $k > 0$ and we will show that $a_k^6$ can be eliminated if we redefine $a^6$ in an appropriate way. In antighost number $k$, Eq. (11.1) just reads
\[ \gamma a_k^6 + db_k^5 = 0. \tag{11.6} \]
We can always assume $k \ge q$ because if $q > k$, Eq. (11.1) gives in highest antighost number $db_q^5 = 0$. Using the algebraic Poincaré lemma, we find that $b_q^5 = dc_q^4$. Hence, we can remove the component $b_q^5$ up to a $d$-trivial redefinition of $b^5$. From Theorems 7.1 and 8.2, we know that Eq. (11.6) implies
\[ a_k^6 = \sum_I P_I(\chi)\, \omega^I + \gamma f_k^6 + dg_k^5. \tag{11.7} \]
The $\gamma$ modulo $d$ trivial part of $a_k^6$ can be eliminated by redefining $a^6$ in the following way:
\[ a^6 \to a^6 - sf_k^6 - dg_k^5. \tag{11.8} \]
We notice that $H_k^{6,0}(\gamma)$ is non-trivial only in even antighost number $k$ (because $\eta$ is of pureghost number 2). This implies that we can assume $k$ to be even. The Wess–Zumino consistency condition in antighost number $k - 1$ is
\[ \gamma a_{k-1}^6 + \delta a_k^6 + db_{k-1}^5 = 0. \tag{11.9} \]
The term $b_{k-1}^5$ is invariant because (11.9) implies $d(\gamma b_{k-1}^5) = 0$. Therefore, the algebraic Poincaré lemma gives $\gamma b_{k-1}^5 + dc_{k-1}^4 = 0$ because $k > 1$. From Theorem 8.2 we know that we can suppose $\gamma b_{k-1}^5 = 0$ without affecting $a^6$. Furthermore, if $b_{k-1}^5 = \gamma c_{k-1}^5$, we can eliminate $b_{k-1}^5$ by redefining $b^5$ in the following way: $b^5 \to b^5 - sc_{k-1}^5$, which does not modify $a_k^6$. Therefore, we can assume
\[ a_k^6 = dx^0 \sum_I \tilde P_I^5\, \omega^I, \tag{11.10} \]
\[ b_{k-1}^5 = \sum_I \left( \tilde Q_I^5 + dx^0\, \tilde R_I^4 \right) \omega^I. \]
˜ 5 , and R˜ 4 are local spatial forms belonging to I. The P˜I5 , Q I I Inserting (11.10) and (11.11) in (11.9), we find 5 6 ˜ 5I ωI − γ Q ˜ I + dx 0 R˜ I4 ωˆ I γ ak−1 − d˜ Q =
(11.11)
(11.12)
I
˜ 5I ωI − Q ˜ 5I ∂0 ωI , +dx 0 δ P˜I5 + d˜ R˜ I4 − ∂0 Q ˜ I = γ ωˆ I . This implies that with dω ˜ 5I ∂0 ωI = γβ. ˜ 5I ωI − Q δ P˜I5 + d˜ R˜ I4 − ∂0 Q
(11.13)
(11.14)
I
If we analyse this equation in the same way as Eq. (8.8), we can prove that $\tilde Q_I^5 = \delta \tilde P_I^5 + \tilde d \tilde R_I^4$ (or simply vanishes). Inserting these equations in (11.11), we find that $b_{k-1}^5$ is of the form
\[ b_{k-1}^5 = \delta c_k^5 + de_{k-1}^4 + \gamma f_{k-1}^5 + dx^0 \sum_I \tilde R_I^4(\chi)\, \omega^I, \tag{11.15} \]
where $c_k^5$ and $e_{k-1}^4$ belong to $H(\gamma)$. In conclusion, we can eliminate $\tilde Q_I^5$ from $b_{k-1}^5$ by redefining $a^6$ and $b^5$ in the following way:
\begin{align}
a^6 &\to a^6 - d(c_k^5 + f_{k-1}^5), \tag{11.16} \\
b^5 &\to b^5 - s(c_k^5 + f_{k-1}^5) - de_{k-1}^4, \tag{11.17}
\end{align}
which does not affect the condition $\gamma a_k^6 = 0$, because $\gamma c_k^5 = 0$. Therefore, we can finally assume
\[ a_k^6 = dx^0 \sum_I \tilde P_I^5(\chi)\, \omega^I, \qquad b_{k-1}^5 = dx^0 \sum_I \tilde R_I^4(\chi)\, \omega^I. \tag{11.18} \]
Equation (11.9) becomes
\[ \gamma a_{k-1} + dx^0 \sum_I \left( \delta \tilde P_I^5(\chi) + \tilde d \tilde R_I^4(\chi) \right) \omega^I = 0, \tag{11.19} \]
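which implies that $\delta \tilde P_I^5(\chi) + \tilde d \tilde R_I^4(\chi) = 0$. We know that we are in even antighost number, thus we can use Theorem 10.2 to find that $\tilde P_I^5 = \delta \tilde S_I^5(\chi) + \tilde d \tilde T_I^4(\chi)$. Hence,
\[ a_k^6 = sf_{k+1}^6 + dg_k^5 + \gamma h_k^6, \tag{11.20} \]
where we have defined
\begin{align}
f_{k+1}^6 &= -dx^0 \sum_I \tilde S_I^5\, \omega^I, \qquad g_k^5 = -dx^0 \sum_I \tilde T_I^4\, \omega^I, \tag{11.21} \\
h_k^6 &= dx^0 \sum_I \tilde T_I^4\, \hat\omega^I, \qquad \tilde d \omega^I = \gamma \hat\omega^I. \tag{11.22}
\end{align}
Thus $a_k^6$ can be completely eliminated by redefining $a^6$ as
\[ a'^6 = a^6 - s(f_{k+1}^6 + h_k^6) - dg_k^5, \tag{11.23} \]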
which only affects the components of antighost number $< k$. Repeating the argument at lower antighost numbers enables one to remove successively $a_{k-1}, a_{k-2}, \dots$, up to $a_1$. This completes the proof of the fact that there is no non-trivial dependence on the antifields for the elements of $H^{6,0}(s\,|\,d)$.

For antifield-independent local forms, the cocycle condition for $H^{6,0}(s\,|\,d)$ reduces to the cocycle condition for $H^{6,0}(\gamma\,|\,d)$. Furthermore, $\gamma$-exact (mod-$d$) solutions are also $s$-exact. Thus, we are led to consider $H^{6,0}(\gamma\,|\,d)$. This cohomology is given by Theorem 9.1. [The terms in that cohomology that vanish on-shell are trivial in the $s$-cohomology.] Thus, the only consistent deformations of the free action for a system of abelian chiral 2-forms are either functions of the curvatures or of the Chern–Simons type. In both cases, the integrated deformations are off-shell gauge invariant and yield no modification of the gauge transformations.

12. Final Comments and Conclusions

We have shown that the most general first-order consistent deformation of a set of free chiral 2-forms cannot modify (non-trivially) the original gauge transformations and, a fortiori, their algebra, which remains abelian. Thus, there is no room for a non-abelian, local generalization of the theory analogous to the Yang–Mills construction. This result holds in fact to all orders, since the allowed deformations involve the gauge-invariant curvatures or Chern–Simons terms. The addition of such terms to the original action yields a new action which is evidently gauge-invariant under the original gauge transformations to all orders.

One can show along identical lines that the rigidity of the gauge symmetries is actually valid for a set of chiral $2p$-forms in $2p + 2$ dimensions, for any $p > 0$. If one includes other fields, one may deform the gauge transformations, but the possibilities are severely limited [21]. For instance, in 10 dimensions, the only couplings of a chiral 4-form to 2-forms are those present in type IIB supergravity.

Acknowledgements. X.B. and M.H. are supported in part by the "Actions de Recherche Concertées" of the "Direction de la Recherche Scientifique - Communauté Française de Belgique", by IISN - Belgium (convention 4.4505.86) and by Proyectos FONDECYT 1970151 and 7960001 (Chile). A.S. is supported in part by the FWO and by the European Commission TMR programme ERBFMRX-CT96-0045 in which he is associated to K. U. Leuven.
References

1. Strominger, A.: Phys. Lett. B383, 44 (1996), hep-th/9512059; Witten, E.: J. Geom. Phys. 22, 103 (1997), hep-th/9610234; Witten, E.: JHEP 9801, 001 (1998), hep-th/9710065
2. Seiberg, N.: Phys. Lett. B408, 98 (1997), hep-th/9705221; Berkooz, M., Rozali, M. and Seiberg, N.: Phys. Lett. B408, 105 (1997), hep-th/9704089
3. Perry, M. and Schwarz, J.H.: Nucl. Phys. B489, 47–64 (1997), hep-th/9611065
4. Schwarz, J.H.: Phys. Lett. B395, 191 (1997), hep-th/9701008
5. Aganagic, M., Park, J., Popescu, C. and Schwarz, J.H.: Nucl. Phys. B496, 191 (1997), hep-th/9701166
6. Pasti, P., Sorokin, D. and Tonin, M.: Phys. Rev. D55, 6292 (1997), hep-th/9611100; Phys. Lett. B398, 41 (1997), hep-th/9701037
7. Bandos, I., Lechner, K., Nurmagambetov, A., Pasti, P., Sorokin, D. and Tonin, M.: Phys. Rev. Lett. 78, 4332 (1997), hep-th/9701149
8. Tseytlin, A.A.: Nucl. Phys. B501, 41 (1997), hep-th/9701125
9. Hashimoto, A. and Taylor IV, W.: Nucl. Phys. B503, 193 (1997), hep-th/9703217
10. Denef, F., Sevrin, A. and Troost, J.: Preprint hep-th/0002180
11. Teitelboim, C.: Phys. Lett. 167B, 63 (1986)
12. Nepomechie, R.I.: Nucl. Phys. B212, 301 (1983)
13. Henneaux, M. and Knaepen, B.: Phys. Rev. D56, 6076 (1997), hep-th/9706119; Henneaux, M.: Phys. Lett. B368, 83 (1996), hep-th/9511145; Henneaux, M. and Knaepen, B.: Nucl. Phys. B548, 491 (1999), hep-th/9812140; Henneaux, M. and Knaepen, B.: Preprint hep-th/9912052
14. Klebanov, I.R. and Tseytlin, A.A.: Nucl. Phys. B475, 164 (1996), hep-th/9604089
15. Gubser, S.S., Klebanov, I.R. and Tseytlin, A.A.: Nucl. Phys. B499, 217 (1997), hep-th/9703040; Gubser, S.S. and Klebanov, I.R.: Phys. Lett. B413, 41 (1997), hep-th/9708005
16. Henningson, M. and Skenderis, K.: JHEP 9807, 023 (1998), hep-th/9806087
17. Bastianelli, F., Frolov, S. and Tseytlin, A.A.: Preprint hep-th/0001041
18. Kalkkinen, J.: JHEP 9907, 002 (1999), hep-th/9905018
19. Hitchin, N.: Preprint math.dg/9907034
20. Bekaert, X., Henneaux, M. and Sevrin, A.: Phys. Lett. B468, 228 (1999)
21. Bekaert, X., Henneaux, M. and Sevrin, A.: Preprint hep-th/9912077
22. Schwarz, J.H.: Nucl. Phys. B226, 269 (1983)
23. Howe, P.S. and West, P.C.: Nucl. Phys. B238, 181 (1984)
24. Barnich, G. and Henneaux, M.: Phys. Lett. B311, 123 (1993), hep-th/9304057; Henneaux, M.: In: Moscow, Russia, 1997, Proceedings, Conference on Secondary Calculus and Cohomological Physics, hep-th/9712226
25. Stasheff, J.: Preprint q-alg/9702012
26. Berends, F.A., Burgers, G.J. and van Dam, H.: Nucl. Phys. B260, 295 (1985)
27. Wald, R.M.: Phys. Rev. D33, 3613 (1986)
28. Deser, S., Jackiw, R. and Templeton, S.: Annals Phys. 140, 372 (1982)
29. Freedman, D.Z. and Townsend, P.K.: Nucl. Phys. B177, 282 (1981)
30. Batalin, I.A. and Vilkovisky, G.A.: Phys. Lett. B102, 27 (1981)
31. Batalin, I.A. and Vilkovisky, G.A.: Phys. Lett. B120, 166 (1983)
32. Batalin, I.A. and Vilkovisky, G.A.: Phys. Rev. D28, 2567 (1983)
33. Henneaux, M. and Teitelboim, C.: Quantization of Gauge Systems. Princeton, NJ: Princeton University Press, 1992
34. Gomis, J., Paris, J. and Samuel, S.: Phys. Rept. 259, 1 (1995), hep-th/9412228
35. Henneaux, M. and Teitelboim, C.: Phys. Lett. B206, 650 (1988)
36. Barnich, G., Brandt, F. and Henneaux, M.: Commun. Math. Phys. 174, 93 (1995), hep-th/9405194
37. Vinogradov, A.M.: Sov. Math. Dokl. 18, 1200 (1977)
38. Vinogradov, A.M.: Sov. Math. Dokl. 19, 144 (1978)
39. Bizdadea, C., Saliu, L. and Saliu, S.O.: Preprint hep-th/0003191
40. Bizdadea, C., Cioroianu, E.M. and Saliu, S.O.: Preprint hep-th/0003192
41. Bizdadea, C.: Preprint hep-th/0003199
42. Henneaux, M., Knaepen, B. and Schomblond, C.: Commun. Math. Phys. 186, 137 (1997), hep-th/9606181
43. Barnich, G., Brandt, F. and Henneaux, M.: Commun. Math. Phys. 174, 57 (1995), hep-th/9405109

Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 224, 705 – 732 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Tensor Product of Crystal Bases for $U_q(\mathfrak{gl}(m,n))$-Modules

Seok-Jin Kang (1), Jae-Hoon Kwon (2)

(1) School of Mathematics, Korea Institute for Advanced Study, Seoul 130-012, South Korea. E-mail: [email protected]
(2) Department of Mathematics, College of Natural Sciences, Seoul National University, Seoul 151-747, South Korea. E-mail: [email protected]

Received: 6 November 2000 / Accepted: 26 July 2001
Abstract: In this paper, we give a combinatorial algorithm for decomposing the tensor product of $U_q(\mathfrak{gl}(m,n))$-crystals. Using the insertion scheme for $(m,n)$-hook semistandard tableaux, we show that there exists a bijection between the set of Littlewood–Richardson tableaux and the set of genuine highest weight vectors in the tensor product of $U_q(\mathfrak{gl}(m,n))$-crystals. Hence the connected components are parametrized by the Littlewood–Richardson tableaux.
Supported by KOSEF Grant # 98-0701-01-05-L and the Young Scientist Award, Korean Academy of Science and Technology.
Supported by KOSEF Grant # 98-0701-01-05-L and BK21 Mathematical Sciences Division, Seoul National University.
Current address: School of Mathematics, Korea Institute for Advanced Study, Seoul 130-012, South Korea. E-mail: [email protected]

Introduction

The Lie superalgebras arise as the natural algebraic structure behind many important developments in mathematics and mathematical physics (see, for example, [6,10]). The basic structure theory for Lie superalgebras was developed by Kac along with an important classification theorem for finite dimensional simple Lie superalgebras [7] (see also [23]). For the representations of Lie superalgebras, most of the fundamental facts can be found in the works of Kac [8,9], Penkov and Serganova [18,19], and Berele and Regev [2]. The main difficulty lies in the fact that the category of finite dimensional modules over finite dimensional simple Lie superalgebras is not semisimple. However, for the general linear Lie superalgebra $\mathfrak{gl}(m,n)$, the tensor powers of the vector representation are completely reducible, and their irreducible summands are parametrized by the $(m,n)$-hook Young diagrams (see [2]).

In this paper, we focus on the study of the quantum superalgebra $U_q(\mathfrak{gl}(m,n))$, the quantized universal enveloping algebra of $\mathfrak{gl}(m,n)$, and its finite dimensional representations arising from the tensor powers of the vector representation.

The combinatorial structure of the finite dimensional $U_q(\mathfrak{gl}(m,n))$-modules can best be described in the language of crystal basis theory. The crystal basis theory was originally developed by Kashiwara for the integrable modules over the quantum groups associated with symmetrizable Kac–Moody algebras [11, 12]. The crystal basis can be viewed as a basis at $q = 0$ and it is given a structure of a colored oriented graph, called the crystal graph, with arrows defined by the Kashiwara operators. Two of the most important features of crystal basis theory may be the following: (i) the combinatorial structure of integrable modules is reflected in their crystal graphs, (ii) the crystal bases have extremely simple behavior with respect to taking the tensor product.

In [1], Benkart, Kang and Kashiwara developed the crystal basis theory for the finite dimensional $U_q(\mathfrak{gl}(m,n))$-modules arising from the tensor powers of the vector representation. Given an $(m,n)$-hook Young diagram $Y$, let $\lambda \in P^+$ be the associated dominant integral weight and let $V(\lambda)$ be the irreducible highest weight $U_q(\mathfrak{gl}(m,n))$-module with highest weight $\lambda$. Then it was shown that $V(\lambda)$ has a crystal basis $(L(\lambda), B(\lambda))$ and its crystal $B(\lambda)$ can be realized as the set of $(m,n)$-hook semistandard tableaux of shape $Y$, where the $U_q(\mathfrak{gl}(m,n))$-crystal structure is given by an admissible reading [1].

The purpose of this paper is to give a combinatorial algorithm for decomposing the tensor product of finite dimensional $U_q(\mathfrak{gl}(m,n))$-modules into a direct sum of irreducible modules using the crystal basis theory. That is, we would like to find a combinatorial algorithm for decomposing the tensor product of $U_q(\mathfrak{gl}(m,n))$-crystals into a disjoint union of connected components. In finding such an algorithm, the main obstacle is the existence of fake highest weight vectors in the crystal graphs. Therefore, we need to find a set of nice combinatorial objects that can be used to parametrize the genuine highest weight vectors.

Our approach to this problem is based on the insertion scheme for the tableaux with entries in $\mathbf B = \{ 1 < 2 < \cdots < m < \bar 1 < \bar 2 < \cdots < \bar n \}$. For the tableaux with entries in the set $\mathbb N$ of positive integers, Schensted introduced the column insertion scheme [22] and it was generalized by Thomas to an algorithm for inserting a tableau with entries in $\mathbb N$ into another [25]. Using this insertion scheme, Thomas gave a combinatorial proof for the decomposition of products of Schur functions. In this work, we generalize their insertion scheme to the tableaux with entries in $\mathbf B$, and using this super-version of insertion schemes, we give a combinatorial algorithm for decomposing the tensor product of $U_q(\mathfrak{gl}(m,n))$-crystals.

Let us describe our algorithm in more detail. Let $Y$ and $W$ be $(m,n)$-hook Young diagrams and let $B(Y)$ and $B(W)$ be the $U_q(\mathfrak{gl}(m,n))$-crystals consisting of $(m,n)$-hook semistandard tableaux of shape $Y$ and $W$, respectively. Given $T \otimes T' \in B(Y) \otimes B(W)$, let $P = T \leftarrow T'$ be the semistandard tableau obtained from $T$ by inserting $T'$. Then, since the connected component containing $T \otimes T'$ and the one containing $P$ are isomorphic as $U_q(\mathfrak{gl}(m,n))$-crystals, we have an irreducible component $B(\mathrm{sh}(P)) = B(\mathrm{sh}(T \leftarrow T'))$ whenever $T \otimes T'$ is a genuine highest weight vector. We will denote by $GH(Y, W)$ the set of all genuine highest weight vectors $T \otimes T'$ in $B(Y) \otimes B(W)$.

On the other hand, the recording tableau $Q$ for the insertion of $T'$ into $T$ satisfies the following conditions:
(i) the entries of $Q$ lie in $\mathbb N$,
(ii) $Q$ is a skew tableau of shape $Z/Y$ and content $W$, where $Z = \mathrm{sh}(T \leftarrow T')$,
(iii) the word of $Q$ obtained by the Middle Eastern reading forms a lattice permutation.

Such a tableau $Q$ is called a Littlewood–Richardson tableau of shape $Z/Y$ and content $W$. We denote by $LR(Y, W)$ the set of all Littlewood–Richardson tableaux of shape $Z/Y$ and content $W$ for some $(m,n)$-hook Young diagram $Z$ containing $Y$. Then we prove that there exists a natural bijection $GH(Y, W) \to LR(Y, W)$ which is given by the recording tableaux, and obtain the decomposition
\[ B(Y) \otimes B(W) \;\cong\; \bigoplus_{(T \otimes T') \in LR(Y,W)} B(\mathrm{sh}(T \leftarrow T')) \;\cong\; \bigoplus_{Z \in H(m,n)} B(Z)^{\oplus N^Z_{Y,W}}, \tag{0.1} \]
where $H(m,n)$ is the set of all $(m,n)$-hook Young diagrams and $N^Z_{Y,W}$ denotes the number of Littlewood–Richardson tableaux of shape $Z/Y$ and content $W$, called the Littlewood–Richardson coefficient. Furthermore, we show that any two elements in $B(Y) \otimes B(W)$ are in the same connected component if and only if they have the same recording tableaux.

We can generalize our algorithm to a more general setting. Let $Y_1, \cdots, Y_l$ be $(m,n)$-hook Young diagrams and consider the tensor product $B(Y_1) \otimes \cdots \otimes B(Y_l)$. Then the genuine highest weight vectors in $B(Y_1) \otimes \cdots \otimes B(Y_l)$ are parametrized by the sequences of tableaux $\mathbf Q = (Q_1, \cdots, Q_{l-1})$, called the generalized Littlewood–Richardson tableaux of shape $Z = Z_l \supset \cdots \supset Z_1 = Y_1$ and content $(Y_2, \cdots, Y_l)$, which consist of the recording tableaux for the insertion $(\cdots(T_1 \leftarrow T_2) \leftarrow \cdots) \leftarrow T_l$ for a genuine highest weight vector $T_1 \otimes \cdots \otimes T_l \in B(Y_1) \otimes \cdots \otimes B(Y_l)$ (say, $T_1 \otimes \cdots \otimes T_l$ corresponds to $\mathbf Q$). Therefore, we obtain the decomposition
\[ B(Y_1) \otimes \cdots \otimes B(Y_l) \;\cong\; \bigoplus_{(T_1 \otimes \cdots \otimes T_l) \in LR(Y_1, \cdots, Y_l)} B(\mathrm{sh}((\cdots(T_1 \leftarrow T_2) \leftarrow \cdots) \leftarrow T_l)) \;\cong\; \bigoplus_{Z \in H(m,n)} B(Z)^{\oplus N^Z_{Y_1, \cdots, Y_l}}, \tag{0.2} \]
where $N^Z_{Y_1, \cdots, Y_l}$ is the number of generalized Littlewood–Richardson tableaux of shape $Z = Z_l \supset \cdots \supset Z_1 = Y_1$ and content $(Y_2, \cdots, Y_l)$, called the generalized Littlewood–Richardson coefficient.

We also present the application of our algorithm to several interesting cases, including the tableaux switching introduced by Benkart, Sottile and Stroomer [3].

1. The Quantum Superalgebra $U_q(\mathfrak{gl}(m,n))$

We begin with a brief review of crystal basis theory for the quantum superalgebra $U_q(\mathfrak{gl}(m,n))$, the quantized universal enveloping algebra of the general linear Lie superalgebra $\mathfrak{gl}(m,n)$ (see [1]).
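Set $\mathbf B = \mathbf B_+ \cup \mathbf B_-$, where $\mathbf B_+ = \{ 1, 2, \cdots, m \}$, $\mathbf B_- = \{ \bar 1, \bar 2, \cdots, \bar n \}$, and define a linear ordering on $\mathbf B$ by $1 < 2 < \cdots < m < \bar 1 < \bar 2 < \cdots < \bar n$. The general linear Lie superalgebra $\mathfrak{gl}(m,n)$ is the Lie superalgebra of $(m+n) \times (m+n)$ matrices having rows and columns indexed by $\mathbf B$, with $\mathbb Z_2$-grading given by
\begin{align*}
\mathfrak{gl}(m,n)_0 &= \left\{ \begin{pmatrix} A & 0 \\ 0 & D \end{pmatrix} \;\middle|\; A = (a_{ij})_{i,j \in \mathbf B_+},\ D = (d_{ij})_{i,j \in \mathbf B_-} \right\}, \\
\mathfrak{gl}(m,n)_1 &= \left\{ \begin{pmatrix} 0 & B \\ C & 0 \end{pmatrix} \;\middle|\; B = (b_{ij})_{i \in \mathbf B_+,\, j \in \mathbf B_-},\ C = (c_{ij})_{i \in \mathbf B_-,\, j \in \mathbf B_+} \right\}.
\end{align*}
Let $E_{b,b'}$ ($b, b' \in \mathbf B$) denote the unit matrices and let $\epsilon_b : \mathfrak{gl}(m,n) \to \mathbb C$ be the linear functional on $\mathfrak{gl}(m,n)$ defined by $\epsilon_b(A) = a_{b,b}$ for $A = (a_{b,b'})_{b,b' \in \mathbf B} \in \mathfrak{gl}(m,n)$. The free abelian group $P = \bigoplus_{b \in \mathbf B} \mathbb Z \epsilon_b$ (resp. $P^\vee = \bigoplus_{b \in \mathbf B} \mathbb Z E_{b,b}$) is called the weight lattice (resp. dual weight lattice) of $\mathfrak{gl}(m,n)$, and there is a symmetric bilinear form $(\cdot\,,\cdot)$ on $\mathfrak h^* = \mathbb C \otimes_{\mathbb Z} P$ defined by
\[ (\epsilon_b, \epsilon_{b'}) = \begin{cases} 1 & \text{if } b = b' \in \mathbf B_+, \\ -1 & \text{if } b = b' \in \mathbf B_-, \\ 0 & \text{otherwise.} \end{cases} \tag{1.1} \]
The simple roots of $\mathfrak{gl}(m,n)$ are given by
\[ \begin{cases} \alpha_i = \epsilon_i - \epsilon_{i+1} & \text{for } i = 1, \cdots, m-1, \\ \alpha_0 = \epsilon_m - \epsilon_{\bar 1} & \text{for } i = 0, \\ \alpha_{\bar i} = \epsilon_{\bar i} - \epsilon_{\overline{i+1}} & \text{for } \bar i = \bar 1, \cdots, \overline{n-1}. \end{cases} \tag{1.2} \]
We denote by $I = I_{\mathrm{even}} \sqcup I_{\mathrm{odd}}$ the index set for the simple roots, where $I_{\mathrm{even}} = \{ 1, 2, \cdots, m-1, \bar 1, \bar 2, \cdots, \overline{n-1} \}$ and $I_{\mathrm{odd}} = \{ 0 \}$. The simple coroot $h_i$ corresponding to $\alpha_i$ is the unique element in $P^\vee$ satisfying
\[ l_i \langle h_i, \lambda \rangle = (\alpha_i, \lambda) \qquad \text{for all } \lambda \in P, \tag{1.3} \]
where $\langle \cdot\,, \cdot \rangle$ denotes the natural pairing between $P$ and $P^\vee$ and
\[ l_i = \begin{cases} 1 & \text{if } i = 0, 1, \cdots, m-1, \\ -1 & \text{if } i = \bar 1, \cdots, \overline{n-1}. \end{cases} \tag{1.4} \]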
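To make the sign pattern in (1.1) and (1.2) concrete, the following minimal Python sketch computes the matrix of pairings $(\alpha_i, \alpha_j)$ of the simple roots. The encoding is an assumption made here for illustration: the basis $\epsilon_1, \dots, \epsilon_m, \epsilon_{\bar 1}, \dots, \epsilon_{\bar n}$ is stored as coordinates $0, \dots, m+n-1$, and the function names `simple_roots`, `form` and `gram` are hypothetical.

```python
def simple_roots(m, n):
    """Simple roots in the order alpha_1..alpha_{m-1}, alpha_0, alpha_{bar 1}..alpha_{bar(n-1)},
    as coefficient vectors over (eps_1..eps_m, eps_{bar 1}..eps_{bar n})."""
    dim = m + n
    roots = []
    for pos in range(dim - 1):
        alpha = [0] * dim
        alpha[pos], alpha[pos + 1] = 1, -1   # eps_pos - eps_{pos+1}
        roots.append(alpha)
    return roots


def form(x, y, m):
    """Bilinear form of (1.1): +1 on the B_+ coordinates, -1 on the B_- coordinates."""
    return sum((1 if k < m else -1) * a * b
               for k, (a, b) in enumerate(zip(x, y)))


def gram(m, n):
    """Matrix of pairings (alpha_i, alpha_j) of the simple roots."""
    roots = simple_roots(m, n)
    return [[form(a, b, m) for b in roots] for a in roots]
```

For example, `gram(2, 2)` lists the pairings of $\alpha_1, \alpha_0, \alpha_{\bar 1}$; the middle diagonal entry vanishes, which reflects that the odd simple root $\alpha_0$ has $(\alpha_0, \alpha_0) = 0$.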
We assume that $q$ is an indeterminate.

Definition 1.1. The quantum superalgebra $U_q(\mathfrak{gl}(m,n))$ is the associative algebra over $\mathbb C(q)$ generated by the elements $e_i$, $f_i$ ($i \in I$) and $q^h$ ($h \in P^\vee$) satisfying the following relations:
\begin{align*}
& q^0 = 1, \qquad q^{h_1 + h_2} = q^{h_1} q^{h_2} \quad \text{for } h_1, h_2 \in P^\vee, \\
& q^h e_i = q^{\langle h, \alpha_i \rangle} e_i q^h, \qquad q^h f_i = q^{-\langle h, \alpha_i \rangle} f_i q^h \quad \text{for } h \in P^\vee, \\
& e_i f_j - (-1)^{p(i)p(j)} f_j e_i = \delta_{ij}\, \frac{K_i - K_i^{-1}}{q_i - q_i^{-1}}, \\
& e_i e_j - (-1)^{p(i)p(j)} e_j e_i = f_i f_j - (-1)^{p(i)p(j)} f_j f_i = 0 \quad \text{if } a_{ij} = 0, \\
& e_i^2 e_j - (q + q^{-1}) e_i e_j e_i + e_j e_i^2 = 0 \quad \text{if } a_{ij} < 0,\ i \neq 0, \\
& f_i^2 f_j - (q + q^{-1}) f_i f_j f_i + f_j f_i^2 = 0 \quad \text{if } a_{ij} < 0,\ i \neq 0, \\
& e_0 e_{m-1} e_0 e_{\bar 1} + e_{m-1} e_0 e_{\bar 1} e_0 + e_0 e_{\bar 1} e_0 e_{m-1} + e_{\bar 1} e_0 e_{m-1} e_0 - (q + q^{-1}) e_0 e_{m-1} e_{\bar 1} e_0 = 0, \\
& f_0 f_{m-1} f_0 f_{\bar 1} + f_{m-1} f_0 f_{\bar 1} f_0 + f_0 f_{\bar 1} f_0 f_{m-1} + f_{\bar 1} f_0 f_{m-1} f_0 - (q + q^{-1}) f_0 f_{m-1} f_{\bar 1} f_0 = 0.
\end{align*}
Here, $q_i = q^{l_i}$, $K_i = q^{l_i h_i}$, $a_{ij} = \langle h_i, \alpha_j \rangle$ and $p$ denotes the parity function defined by
\[ p(i) = \begin{cases} 0 & \text{if } i \neq 0, \\ 1 & \text{if } i = 0. \end{cases} \tag{1.5} \]
Note that $U_q(\mathfrak{gl}(m,n))$ has a Hopf superalgebra structure (see [1]) and the subalgebra $U_q(\mathfrak{gl}(m,n))_i$ generated by $e_i$, $f_i$, $K_i^{\pm 1}$ is isomorphic to the quantum algebra $U_q(\mathfrak{sl}_2)$ for $i \neq 0$ and to the quantum superalgebra $U_q(\mathfrak{sl}(1,1))$ for $i = 0$.

2. Crystal Bases

We define the category of $U_q(\mathfrak{gl}(m,n))$-modules for which the crystal basis theory is developed.

Definition 2.1. The category $\mathcal O_{\mathrm{int}}$ is the category of $\mathbb Z_2$-graded finite dimensional $U_q(\mathfrak{gl}(m,n))$-modules $M$ and $U_q(\mathfrak{gl}(m,n))$-module homomorphisms satisfying the following conditions:
(i) $M$ has a weight space decomposition $M = \bigoplus_{\lambda \in P} M_\lambda$, where $M_\lambda = \{ u \in M \mid q^h u = q^{\langle h, \lambda \rangle} u \text{ for } h \in P^\vee \}$,
(ii) if $M_\mu \neq 0$, then $\langle h_0, \mu \rangle \ge 0$,
(iii) if $\langle h_0, \mu \rangle = 0$, then $e_0 M_\mu = f_0 M_\mu = 0$.

Let $M = \bigoplus_{\lambda \in P} M_\lambda$ be a $U_q(\mathfrak{gl}(m,n))$-module in the category $\mathcal O_{\mathrm{int}}$. For $i \in I_{\mathrm{even}}$, every $u \in M_\lambda$ can be written uniquely as
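\[ u = \sum_{k \ge 0,\, -\langle h_i, \lambda \rangle} f_i^{(k)} u_k, \tag{2.1} \]
where $e_i u_k = 0$ for all $k \ge 0$, and
\[ [n]_i = \frac{q_i^n - q_i^{-n}}{q_i - q_i^{-1}}, \qquad [n]_i! = \prod_{k=1}^{n} [k]_i, \qquad f_i^{(n)} = \frac{1}{[n]_i!}\, f_i^n. \]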
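The quantum integers and factorials entering the divided powers $f_i^{(n)}$ are Laurent polynomials in $q_i$. As a small illustration, the sketch below expands them explicitly, representing a Laurent polynomial as a map from exponents to coefficients. The representation and the names `qint`, `mul`, `qfactorial` are choices made here, not notation from the paper.

```python
from collections import Counter


def qint(n):
    """[n] = (q^n - q^-n)/(q - q^-1) = q^(n-1) + q^(n-3) + ... + q^-(n-1),
    stored as {exponent: coefficient}; [0] is the empty (zero) polynomial."""
    return Counter({n - 1 - 2 * j: 1 for j in range(n)})


def mul(a, b):
    """Product of two Laurent polynomials given as {exponent: coefficient}."""
    out = Counter()
    for ea, ca in a.items():
        for eb, cb in b.items():
            out[ea + eb] += ca * cb
    return out


def qfactorial(n):
    """[n]! = [1][2]...[n], with [0]! = 1."""
    result = Counter({0: 1})
    for k in range(1, n + 1):
        result = mul(result, qint(k))
    return result
```

Since each $[n]_i$ is symmetric under $q_i \leftrightarrow q_i^{-1}$, the same expansion applies whether $q_i = q$ or $q_i = q^{-1}$.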
Then the Kashiwara operators are defined by (k−1) f uk if i = 1, · · · , m − 1, e˜i u = k≥1 ihi ,λ+1 (k−1) ¯ · · · , n − 1, q f u if i = 1, k k≥1 i i (k+1) f uk if i = 1, · · · , m − 1, f˜i u = k≥0 i−hi ,λ+1 (k+1) ¯ · · · , n − 1. fi uk if i = 1, k≥0 qi
(2.2)
(2.3)
For i = 0, the Kashiwara operators are defined by e˜0 u = q −1 K0 e0 u,
and
f˜0 u = f0 u.
(2.4)
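Let $A$ denote the subring of $\mathbb{C}(q)$ consisting of all rational functions $f/g \in \mathbb{C}(q)$ such that $g(0) \neq 0$.

Definition 2.2. A free $A$-submodule $L$ of $M$ is called a crystal lattice if
(i) $L$ generates $M$ as a vector space over $\mathbb{C}(q)$,
(ii) $L$ has a weight decomposition $L = \bigoplus_{\lambda \in P} L_\lambda$, where $L_\lambda = L \cap M_\lambda$,
(iii) $\tilde e_i L \subset L$ and $\tilde f_i L \subset L$ for all $i \in I$.

Definition 2.3. A crystal basis of $M$ is a pair $(L, B)$ such that
(i) $L$ is a crystal lattice of $M$,
(ii) $B$ is a pseudo-basis of $L/qL$; that is, $B = B^{\bullet} \cup (-B^{\bullet})$ for a $\mathbb{C}$-basis $B^{\bullet}$ of $L/qL$,
(iii) $B$ has a weight decomposition $B = \bigsqcup_{\lambda \in P} B_\lambda$ with $B_\lambda = B \cap (L/qL)_\lambda$,
(iv) $\tilde e_i B \subset B \sqcup \{0\}$ and $\tilde f_i B \subset B \sqcup \{0\}$ for all $i \in I$,
(v) for any $b, b' \in B$ and $i \in I$, we have $\tilde f_i b = b'$ if and only if $b = \tilde e_i b'$.

The set $B/\{\pm 1\}$ is given a colored oriented graph structure with the arrows defined by $b \xrightarrow{\,i\,} b'$ if and only if $\tilde f_i b = b'$ ($b, b' \in B/\{\pm 1\}$). We call $B/\{\pm 1\}$ the crystal of $M$. For $b \in B/\{\pm 1\}$ and $i \in I$, we set
\[
\varepsilon_i(b) = \max\{\, n \in \mathbb{Z}_{\geq 0} \mid \tilde e_i^{\,n} b \neq 0 \,\}, \qquad
\varphi_i(b) = \max\{\, n \in \mathbb{Z}_{\geq 0} \mid \tilde f_i^{\,n} b \neq 0 \,\}. \tag{2.5}
\]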
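Then the representation theory of $U_q(\mathfrak{sl}_2)$ and $U_q(\mathfrak{sl}(1,1))$ yields
\[
\langle h_i, \mathrm{wt}(b) \rangle = \varphi_i(b) - \varepsilon_i(b) \quad \text{for } i \in I_{\mathrm{even}}, \tag{2.6}
\]
and
\[
\varphi_0(b) + \varepsilon_0(b) = \begin{cases} 0 & \text{if } \langle h_0, \mathrm{wt}(b) \rangle = 0, \\ 1 & \text{if } \langle h_0, \mathrm{wt}(b) \rangle > 0. \end{cases} \tag{2.7}
\]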
In the following proposition, we recall the tensor product rule for the crystal bases of $U_q(\mathfrak{gl}(m,n))$-modules in the category $\mathcal{O}_{\mathrm{int}}$.

Proposition 2.4 ([1]). Let $M_j$ ($j = 1, 2$) be a $U_q(\mathfrak{gl}(m,n))$-module in the category $\mathcal{O}_{\mathrm{int}}$ with a crystal basis $(L_j, B_j)$ and set
\[
L = L_1 \otimes_A L_2, \qquad B = B_1 \otimes B_2 \subset (L_1/qL_1) \otimes (L_2/qL_2) = L/qL.
\]
Then $(L, B)$ is a crystal basis of $M_1 \otimes M_2$, where the Kashiwara operators on $B_1 \otimes B_2$ are given as follows:
Tensor Product of Uq (gl(m, n))-Crystal Bases
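(a) If $i = 1, \dots, m-1$, then
\[
\tilde e_i (b_1 \otimes b_2) = \begin{cases} \tilde e_i b_1 \otimes b_2 & \text{if } \varphi_i(b_1) \geq \varepsilon_i(b_2), \\ b_1 \otimes \tilde e_i b_2 & \text{if } \varphi_i(b_1) < \varepsilon_i(b_2), \end{cases}
\qquad
\tilde f_i (b_1 \otimes b_2) = \begin{cases} \tilde f_i b_1 \otimes b_2 & \text{if } \varphi_i(b_1) > \varepsilon_i(b_2), \\ b_1 \otimes \tilde f_i b_2 & \text{if } \varphi_i(b_1) \leq \varepsilon_i(b_2). \end{cases}
\]
(b) If $i = \bar 1, \dots, \overline{n-1}$, then
\[
\tilde e_i (b_1 \otimes b_2) = \begin{cases} b_1 \otimes \tilde e_i b_2 & \text{if } \varphi_i(b_2) \geq \varepsilon_i(b_1), \\ \tilde e_i b_1 \otimes b_2 & \text{if } \varphi_i(b_2) < \varepsilon_i(b_1), \end{cases}
\qquad
\tilde f_i (b_1 \otimes b_2) = \begin{cases} b_1 \otimes \tilde f_i b_2 & \text{if } \varphi_i(b_2) > \varepsilon_i(b_1), \\ \tilde f_i b_1 \otimes b_2 & \text{if } \varphi_i(b_2) \leq \varepsilon_i(b_1). \end{cases}
\]
(c) If $i = 0$, then
\[
\tilde e_0 (b_1 \otimes b_2) = \begin{cases} \tilde e_0 b_1 \otimes b_2 & \text{if } \langle h_0, \mathrm{wt}(b_1) \rangle > 0, \\ \pm\, b_1 \otimes \tilde e_0 b_2 & \text{if } \langle h_0, \mathrm{wt}(b_1) \rangle = 0, \end{cases}
\qquad
\tilde f_0 (b_1 \otimes b_2) = \begin{cases} \tilde f_0 b_1 \otimes b_2 & \text{if } \langle h_0, \mathrm{wt}(b_1) \rangle > 0, \\ \pm\, b_1 \otimes \tilde f_0 b_2 & \text{if } \langle h_0, \mathrm{wt}(b_1) \rangle = 0. \end{cases}
\]
The sign in part (c) depends on the parity of $b_1$.

As an example, consider the vector representation $V = V_+ \oplus V_-$ of $U_q(\mathfrak{gl}(m,n))$, where $V_\pm = \bigoplus_{b \in \mathbf{B}_\pm} \mathbb{C}(q)\, b$. Then $V$ belongs to the category $\mathcal{O}_{\mathrm{int}}$ with a crystal basis $(L, \mathbf{B} \cup (-\mathbf{B}))$, where $L = \bigoplus_{b \in \mathbf{B}} A\, b$, and the associated crystal graph is given by
\[
1 \xrightarrow{\;1\;} 2 \xrightarrow{\;2\;} \cdots \xrightarrow{\;m-1\;} m \xrightarrow{\;0\;} \bar 1 \xrightarrow{\;\bar 1\;} \bar 2 \xrightarrow{\;\bar 2\;} \cdots \xrightarrow{\;\overline{n-1}\;} \bar n .
\]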
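The tensor product rule of Proposition 2.4 can be illustrated on two copies of the crystal of the vector representation. The following Python sketch is only an illustration under stated assumptions: entries $1, \dots, m$ encode $\mathbf{B}_+$, entries $m+1, \dots, m+n$ encode $\bar 1, \dots, \bar n$, the color $j$ labels the arrow $j \to j+1$ (so $j = m$ plays the role of $i = 0$), the sign in part (c) is ignored (that is, we work in $B/\{\pm 1\}$), and the condition $\langle h_0, \mathrm{wt}(b_1) \rangle > 0$ is tested through (2.7). All function names are hypothetical.

```python
from typing import Optional, Tuple

Pair = Optional[Tuple[int, int]]

def f1(j: int, b: int) -> Optional[int]:
    """f~_j on a single box; None stands for 0."""
    return b + 1 if b == j else None

def e1(j: int, b: int) -> Optional[int]:
    """e~_j on a single box; None stands for 0."""
    return b - 1 if b == j + 1 else None

def phi(j: int, b: int) -> int:
    return 1 if b == j else 0

def eps(j: int, b: int) -> int:
    return 1 if b == j + 1 else 0

def _acts_on_left(m: int, j: int, b1: int, b2: int, strict: bool) -> bool:
    """Decide which tensor factor the operator of color j acts on."""
    if j < m:  # part (a): even indices
        return phi(j, b1) > eps(j, b2) if strict else phi(j, b1) >= eps(j, b2)
    if j > m:  # part (b): barred indices, roles of the factors are swapped
        on_right = phi(j, b2) > eps(j, b1) if strict else phi(j, b2) >= eps(j, b1)
        return not on_right
    # part (c): by (2.7), <h_0, wt(b1)> > 0 iff phi_0(b1) + eps_0(b1) = 1
    return phi(j, b1) + eps(j, b1) > 0

def f2(m: int, j: int, b1: int, b2: int) -> Pair:
    """f~_j (b1 (x) b2), up to sign; None stands for 0."""
    if _acts_on_left(m, j, b1, b2, strict=True):
        x = f1(j, b1)
        return None if x is None else (x, b2)
    y = f1(j, b2)
    return None if y is None else (b1, y)

def e2(m: int, j: int, b1: int, b2: int) -> Pair:
    """e~_j (b1 (x) b2), up to sign; None stands for 0."""
    if _acts_on_left(m, j, b1, b2, strict=False):
        x = e1(j, b1)
        return None if x is None else (x, b2)
    y = e1(j, b2)
    return None if y is None else (b1, y)
```

For instance, with $m = n = 2$ the call `f2(2, 1, 1, 1)` returns `(2, 1)`, while `f2(2, 2, 1, 2)` returns `(1, 3)`, which encodes $1 \otimes \bar 1$.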
Moreover, $V^{\otimes k}$ is completely reducible and has a crystal basis $(L^{\otimes k}, (\mathbf{B} \cup (-\mathbf{B}))^{\otimes k})$ (Corollary 2.13 in [1]). From now on, we will identify $\mathbf{B}$ with the crystal of the vector representation.

3. Tableaux and Crystals

In this section, we discuss the connection between combinatorics of tableaux and crystal basis theory for $U_q(\mathfrak{gl}(m,n))$-modules in the category $\mathcal{O}_{\mathrm{int}}$. Recall that a Young diagram is a collection of boxes in left-justified rows with a weakly decreasing number of boxes in each row. Also, if $Y$ is a Young diagram and $\lambda_i$ is the number of boxes in the $i$th row, we may identify $Y$ with the partition $\lambda = (\lambda_1, \lambda_2, \lambda_3, \dots)$. If $Z$ and $Y$ are Young diagrams such that $Z \supset Y$, we denote by $Z/Y$ the skew Young diagram obtained by removing $Y$ from $Z$.
For a skew Young diagram $Y$, the size of $Y$, denoted by $|Y|$, is defined to be the total number of boxes in $Y$. A box in a skew Young diagram is called a corner or a removable corner if there are no boxes to its right and beneath it. A place where a box can be added to a skew Young diagram to create a removable corner of a larger diagram is called a co-corner or an indent corner. The coordinate of a box in a skew Young diagram $Z/Y$ is defined to be the pair $(i, j)$ if the box lies in the $i$th row (from the top row) and in the $j$th column (from the leftmost column) of the diagram $Z$. Note that in our definition, the leftmost box in a skew Young diagram may not be in the first column of $Z$.

Let $T$ be a tableau with entries in $\mathbb{N}$; that is, $T$ is made by filling a skew Young diagram with positive integers. We say that $T$ is semistandard if the entries in $T$ are weakly increasing from left to right in each row and strictly increasing from top to bottom in each column. If $T$ is semistandard and all the entries in $T$ are strictly increasing in each row and column, $T$ is said to be standard. The skew Young diagram from which $T$ is made is called the shape of $T$ and is denoted by $\operatorname{sh} T$. If $\mu_i$ is the number of occurrences of $i$ in $T$, we define the content of $T$ to be the sequence $\mu = (\mu_1, \mu_2, \mu_3, \dots)$. If $\mu_i \geq \mu_{i+1}$ for all $i$, then the content of $T$ becomes a partition and it can be viewed as a Young diagram having $\mu_i$ boxes in the $i$th row.

A Young diagram is called an $(m, n)$-hook Young diagram if the number of boxes in the $(m+1)$st row is at most $n$. Thus an $(m, n)$-hook Young diagram fits in the $(m, n)$-hook displayed below.

[Figure: the $(m, n)$-hook, with vertical extent labeled $m$ and horizontal extent labeled $n$.]

A tableau $T$ obtained by filling a skew Young diagram with elements of $\mathbf{B}$ is called $(m, n)$-hook semistandard if
(i) the entries in each row and column are weakly increasing,
(ii) the entries in $\mathbf{B}_+$ are strictly increasing in each column,
(iii) the entries in $\mathbf{B}_-$ are strictly increasing in each row.

It is easy to see that a Young diagram can be made into an $(m, n)$-hook semistandard tableau with entries in $\mathbf{B}$ if and only if it is an $(m, n)$-hook Young diagram. We denote by $H(m, n)$ the set of all $(m, n)$-hook Young diagrams. Given a skew Young diagram $Y$, let $\mathbf{B}(Y)$ denote the set of all $(m, n)$-hook semistandard tableaux of shape $Y$. For $T \in \mathbf{B}(Y)$, the weight of $T$ is defined by
\[
\mathrm{wt}\, T = \sum_{b \in \mathbf{B}} \mu_b\, \epsilon_b \in P, \tag{3.1}
\]
where $\mu_b$ is the number of occurrences of $b$ in $T$. If $|Y| = N$, the set $\mathbf{B}(Y)$ may be embedded into $\mathbf{B}^{\otimes N}$ by reading the entries $\{ b_1, b_2, \dots, b_N \}$ of a tableau with respect to a given order and identifying the tableau with $b_1 \otimes b_2 \otimes \cdots \otimes b_N$. Such an embedding $\psi : \mathbf{B}(Y) \to \mathbf{B}^{\otimes N}$ is called a reading of $\mathbf{B}(Y)$. The Far-Eastern reading of $T \in \mathbf{B}(Y)$ is the one which reads each column from top to bottom and then moves to the next column from right to left, and the Middle-Eastern reading of $T \in \mathbf{B}(Y)$ is the one which reads each row from right to left and moves to the next row from top to bottom.

Example 3.1. For the tableau with rows $(1, 2, 3)$ and $(4, 5)$, the Far-Eastern reading is $3 \otimes 2 \otimes 5 \otimes 1 \otimes 4$ and the Middle-Eastern reading is $3 \otimes 2 \otimes 1 \otimes 5 \otimes 4$.
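The two readings of Example 3.1 can be reproduced with a short Python sketch. It assumes, as an illustration only, that a tableau is stored as a dictionary mapping 1-based coordinates `(row, column)` to entries, and that the tableau of the example has rows $(1, 2, 3)$ and $(4, 5)$, which is the layout consistent with both quoted readings. The helper `is_admissible` checks the condition that a box is read before every box it is strictly higher than, where $(i, j)$ is strictly higher than $(i', j')$ if $i \leq i'$ and $j \geq j'$. All names are hypothetical.

```python
from typing import Dict, List, Tuple

Coord = Tuple[int, int]

def far_eastern_order(t: Dict[Coord, int]) -> List[Coord]:
    """Columns from right to left, each column from top to bottom."""
    return sorted(t, key=lambda rc: (-rc[1], rc[0]))

def middle_eastern_order(t: Dict[Coord, int]) -> List[Coord]:
    """Rows from top to bottom, each row from right to left."""
    return sorted(t, key=lambda rc: (rc[0], -rc[1]))

def word(t: Dict[Coord, int], order: List[Coord]) -> List[int]:
    """Entries of t listed in the given reading order."""
    return [t[c] for c in order]

def is_admissible(order: List[Coord]) -> bool:
    """True if every box is read before all boxes it is strictly higher than."""
    pos = {c: k for k, c in enumerate(order)}
    return all(
        pos[a] < pos[b]
        for a in order
        for b in order
        if a != b and a[0] <= b[0] and a[1] >= b[1]
    )

# Tableau with rows (1, 2, 3) and (4, 5).
T = {(1, 1): 1, (1, 2): 2, (1, 3): 3, (2, 1): 4, (2, 2): 5}
far = word(T, far_eastern_order(T))        # [3, 2, 5, 1, 4]
middle = word(T, middle_eastern_order(T))  # [3, 2, 1, 5, 4]
```

Both orders pass `is_admissible`, whereas the reversed Far-Eastern order does not.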
Suppose $\beta$ and $\beta'$ are two distinct boxes in a skew Young diagram $Y$ with coordinates $(i, j)$ and $(i', j')$, respectively. We say that $\beta$ is strictly higher than $\beta'$ if $i \leq i'$ and $j \geq j'$. In general, a reading $\psi : \mathbf{B}(Y) \to \mathbf{B}^{\otimes N}$ is called admissible if $\beta$ is read before $\beta'$ whenever $\beta$ is strictly higher than $\beta'$.

Theorem 3.2 ([1]). Let $Y$ be a skew Young diagram and let $\mathbf{B}(Y)$ be the set of all $(m, n)$-hook semistandard tableaux of shape $Y$.
(a) For any admissible reading $\psi : \mathbf{B}(Y) \to \mathbf{B}^{\otimes N}$, $\psi(\mathbf{B}(Y))$ is stable under the Kashiwara operators $\tilde e_i$ and $\tilde f_i$ ($i \in I$). Hence an admissible reading defines a $U_q(\mathfrak{gl}(m,n))$-crystal structure on $\mathbf{B}(Y)$.
(b) The induced $U_q(\mathfrak{gl}(m,n))$-crystal structure on $\mathbf{B}(Y)$ does not depend on the choice of an admissible reading.
(c) If $Y$ is an $(m, n)$-hook Young diagram, then the $U_q(\mathfrak{gl}(m,n))$-crystal $\mathbf{B}(Y)$ is connected.

For $\lambda, \mu \in P$, we define $\lambda \geq \mu$ if and only if $\lambda - \mu \in Q_+ = \sum_{i \in I} \mathbb{Z}_{\geq 0}\, \alpha_i$. Let $B$ be a $U_q(\mathfrak{gl}(m,n))$-crystal. We say that an element $b \in B_\lambda$ ($\lambda \in P$) is a genuine highest weight vector if $B_\lambda = \{ b \}$ and $\mathrm{wt}(b) \geq \mathrm{wt}(b')$ for all $b' \in B$. Note that a genuine highest weight vector is unique if it exists. For an $(m, n)$-hook Young diagram $Y$, let $H_Y$ be the tableau in $\mathbf{B}(Y)$ defined as follows.

[Figure: the tableau $H_Y$. For $1 \leq i \leq m$, the $i$th row is filled with $i$; each row below the $m$th row is filled with $\bar 1, \bar 2, \dots$ from left to right, with at most $n$ boxes.]
Then $H_Y$ is the unique genuine highest weight vector in $\mathbf{B}(Y)$. If $b$ is a genuine highest weight vector, we have $\tilde e_i b = 0$ for all $i \in I$. An element $b \in B$ satisfying $\tilde e_i b = 0$ for all $i \in I$ is called a highest weight vector. Clearly, a genuine highest weight vector is a highest weight vector, but the converse is not true. If $Y \in H(m, 0)$, then it is well known that the $U_q(\mathfrak{gl}_m)$-crystal $\mathbf{B}(Y)$ has a unique highest weight vector, which is a genuine highest weight vector. But if $n > 0$, then $\mathbf{B}(Y)$ may have more than one highest weight vector. A highest weight vector that is not a genuine highest weight vector is called a fake highest weight vector. Example 3.3. Let $m = 2$, $n = 2$ and consider
Y =
1 1 1 1 2 T = 2 2 2 2 1 2 1
.
Then $\tilde e_i T = 0$ for all $i \in I$, but $\mathrm{wt}(T) < \mathrm{wt}(H_Y)$. Hence $T$ is a fake highest weight vector.

Let $P_{\geq 0}$ be the set of weights $\lambda \in P$ such that
(i) $\langle h_i, \lambda \rangle \geq 0$ for all $i \in I$,
(ii) if $\langle h_{\bar k}, \lambda \rangle > 0$ for some $k = 1, \dots, n-1$, then $\langle h_0 - h_{\bar 1} - \cdots - h_{\bar k}, \lambda \rangle \geq k$.

Set $P^+ = P_{\geq 0} \cap \bigoplus_{b \in \mathbf{B}} \mathbb{Z}_{\geq 0}\, \epsilon_b$. For each $\lambda \in P^+$, there is a unique $(m, n)$-hook Young diagram $Y^\lambda$ such that $\mathrm{wt}(H_{Y^\lambda}) = \lambda$. In fact, if $\lambda = \sum_{b \in \mathbf{B}} \lambda_b\, \epsilon_b \in P^+$, then $Y^\lambda = (\lambda_1, \dots, \lambda_m, \mu_1, \mu_2, \dots)$, where $\mu = (\mu_1, \mu_2, \dots)$ is the transpose of the partition $(\lambda_{\bar 1}, \dots, \lambda_{\bar n})$. For $\lambda \in P^+$, let $V(\lambda)$ be the irreducible highest weight $U_q(\mathfrak{gl}(m,n))$-module with highest weight $\lambda$ (for the definition, see [9]). Then $V(\lambda)$ belongs to the category $\mathcal{O}_{\mathrm{int}}$. Also, $V(\lambda)$ has a crystal basis $(L(\lambda), B(\lambda))$ and the crystal of $V(\lambda)$ is isomorphic to the $U_q(\mathfrak{gl}(m,n))$-crystal $\mathbf{B}(Y^\lambda)$ consisting of $(m, n)$-hook semistandard tableaux of shape $Y^\lambda$ (Theorem 5.1 in [1]).

4. Littlewood–Richardson Rule

In [25], Thomas introduced the technique of inserting a semistandard tableau into another to give a combinatorial proof of the Littlewood–Richardson rule for Schur functions. In this section, we will generalize his result to the case of $(m, n)$-hook semistandard tableaux and give an explicit algorithm for decomposing the tensor product $\mathbf{B}(Y) \otimes \mathbf{B}(W)$ of $U_q(\mathfrak{gl}(m,n))$-crystals for $(m, n)$-hook Young diagrams $Y, W \in H(m, n)$. We first recall Schensted's column insertion scheme, or column bumping, for $(m, n)$-hook semistandard tableaux.

Definition 4.1 ([2, 20]). Let $Z/Y$ be a skew Young diagram and $T$ be an $(m, n)$-hook semistandard tableau in $\mathbf{B}(Z/Y)$. For $x \in \mathbf{B}$, we define $T \leftarrow x$ to be the tableau obtained from $T$ by applying the following procedure:
(i) If the first column of $Z/Y$ is empty, put $x$ at the bottom of the first column of $Z$.
(ii) Suppose that the first column of $Z/Y$ is not empty. If $x \in \mathbf{B}_+$, let $y$ be the smallest entry in the first column which is greater than or equal to $x$. If $x \in \mathbf{B}_-$, let $y$ be the smallest entry in the first column which is greater than $x$. In either case, if there is more than one such $y$, choose the one in the highest position.
(iii) Replace $y$ by $x$. (Hence $y$ is bumped out of the first column.) If there is no such $y$, put $x$ at the bottom of the first column and stop the procedure.
(iv) Apply the same procedure to the second column with $y$ as described in (i), (ii) and (iii).
(v) Repeat the same procedure column by column from left to right until we place a box at a co-corner of $T$.

By definition of the above insertion algorithm, we can check that $T \leftarrow x$ is also an $(m, n)$-hook semistandard tableau ([20]) and $\operatorname{sh}(T \leftarrow x) = Z'/Y$, where $Z' \supset Z$ and $|Z'| = |Z| + 1$. Example 4.2. (a)
1 2 2 4 1 2 6 3
← 3
2 (b)
3 1¯ 2¯
← 4
=
=
1 2 1 2 3 4 2 6 3
2 3 2¯ 4 1¯
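The column insertion of Definition 4.1 can be sketched in Python as follows. This is a minimal illustration under explicit assumptions, not the authors' implementation: it treats only tableaux of non-skew shape, stores a tableau as a list of columns (each listed from top to bottom), and encodes entries $1, \dots, m$ for $\mathbf{B}_+$ and $m+1, \dots, m+n$ for $\bar 1, \dots, \bar n$. The function name `column_insert` is hypothetical.

```python
from typing import List

def column_insert(columns: List[List[int]], x: int, m: int) -> List[List[int]]:
    """Return T <- x for a non-skew tableau T given by its columns."""
    cols = [list(c) for c in columns]  # work on a copy
    j = 0
    while True:
        if j == len(cols):
            # No column left: x starts a new column on the right.
            cols.append([x])
            return cols
        col = cols[j]
        if x <= m:
            # x in B_+: bump the highest entry that is >= x.
            idx = next((k for k, y in enumerate(col) if y >= x), None)
        else:
            # x in B_-: bump the highest entry that is > x.
            idx = next((k for k, y in enumerate(col) if y > x), None)
        if idx is None:
            # Nothing to bump: x goes to the bottom of this column.
            col.append(x)
            return cols
        # Replace y by x and carry y to the next column.
        col[idx], x = x, col[idx]
        j += 1
```

Since the entries of a column are weakly increasing from top to bottom, the first entry satisfying the inequality is both the smallest such entry and the one in the highest position. For example, with $m = 2$, inserting `1` into the tableau with columns `[[1, 2], [1]]` gives `[[1, 2], [1], [1]]`.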
Also, there is an algorithm for deleting a box from a tableau, which is simply the reverse of Schensted's column insertion (cf. [5, 21]). Let $T$ be an $(m, n)$-hook semistandard tableau of shape $Z/Y$ and let $c = (i, j)$ be the coordinate of a removable corner of $T$. Let $x$ be the entry of $T$ at $c$. Suppose that for each $j'$ ($1 \leq j' < j$), the $j'$th column of $Z/Y$ is not empty. We define $T'$ to be the tableau obtained from $T$ by applying the following procedure:
(i) If $x \in \mathbf{B}_+$, let $y$ be the largest entry in the $(j-1)$st column which is smaller than or equal to $x$. If $x \in \mathbf{B}_-$, let $y$ be the largest entry in the $(j-1)$st column which is smaller than $x$. If there is more than one such $y$, choose the one in the lowest position.
(ii) Replace $y$ by $x$.
(iii) Apply the same procedure to the $(j-2)$nd column with $y$ as described in (i) and (ii).
(iv) Repeat the same procedure column by column from right to left until a box is bumped out of the first column.

If $z$ is bumped out of $T$ by the above procedure, then it is clear from the definition of $T'$ that $T' \leftarrow z = T$. We will denote by $d_c(T)$ the resulting tableau after deleting a box at $c$ from $T$.
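The deletion procedure above can be sketched in the same spirit. The following Python code is an illustration under assumptions: it handles only non-skew shapes, stores a tableau as a list of columns (top to bottom), encodes $\mathbf{B}_+$ as $1, \dots, m$ and $\mathbf{B}_-$ as $m+1, \dots, m+n$, and identifies the removable corner by its 0-based column index. The name `reverse_column_insert` is hypothetical.

```python
from typing import List, Tuple

def reverse_column_insert(
    columns: List[List[int]], j: int, m: int
) -> Tuple[List[List[int]], int]:
    """Delete the corner at the bottom of column j; return (tableau, bumped entry)."""
    cols = [list(c) for c in columns]  # work on a copy
    if not 0 <= j < len(cols):
        raise ValueError("column index out of range")
    if j + 1 < len(cols) and len(cols[j + 1]) >= len(cols[j]):
        raise ValueError("bottom of this column is not a removable corner")
    x = cols[j].pop()
    if not cols[j]:
        cols.pop()  # an emptied column is necessarily the last one
    for k in range(j - 1, -1, -1):
        col = cols[k]
        if x <= m:
            # x in B_+: candidates are the entries <= x.
            cand = [r for r, y in enumerate(col) if y <= x]
        else:
            # x in B_-: candidates are the entries < x.
            cand = [r for r, y in enumerate(col) if y < x]
        if not cand:
            raise ValueError("input is not a valid hook semistandard tableau")
        r = cand[-1]  # largest such entry, lowest position
        col[r], x = x, col[r]
    return cols, x
```

With $m = 2$, deleting the corner in the last column of `[[1, 2], [1], [1]]` returns `([[1, 2], [1]], 1)`, which undoes the insertion of `1` into `[[1, 2], [1]]`.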
For $i = 1, 2$, let $B_i$ be a $U_q(\mathfrak{gl}(m,n))$-crystal and let $C(b_i)$ denote the connected component of $B_i$ containing $b_i \in B_i$. We say that $b_1$ is equivalent to $b_2$, denoted by $b_1 \simeq b_2$, if there is an isomorphism of $U_q(\mathfrak{gl}(m,n))$-crystals $C(b_1) \xrightarrow{\;\sim\;} C(b_2)$ sending $b_1$ to $b_2$ (for the definition of isomorphism of crystals, see [13]).

Lemma 4.3 ([1]). Let $Y$ be a skew Young diagram. Then for any $x \in \mathbf{B}$ and $T \in \mathbf{B}(Y)$, we have
\[
(T \leftarrow x) \simeq T \otimes x. \tag{4.1}
\]

Lemma 4.4. Let $Y$ and $Y'$ be $(m, n)$-hook Young diagrams and let $T$ and $T'$ be $(m, n)$-hook semistandard tableaux of shape $Y$ and $Y'$, respectively. If $T \simeq T'$ and $\mathrm{wt}(T) = \mathrm{wt}(T')$, then $Y = Y'$ and $T = T'$.

Proof. Since $Y$ is an $(m, n)$-hook Young diagram, there exists a unique genuine highest weight vector $H_Y \in \mathbf{B}(Y)$. Thus there is a sequence of indices $i_1, \dots, i_r$ and nonnegative integers $a_k, b_k$ ($1 \leq k \leq r$) such that $\tilde e_{i_1}^{a_1} \tilde f_{i_1}^{b_1} \cdots \tilde e_{i_r}^{a_r} \tilde f_{i_r}^{b_r} T = H_Y$. Since $T \simeq T'$, we must have $\tilde e_{i_1}^{a_1} \tilde f_{i_1}^{b_1} \cdots \tilde e_{i_r}^{a_r} \tilde f_{i_r}^{b_r} T' = H_{Y'}$. Hence we get $\mathrm{wt}(H_Y) = \mathrm{wt}(H_{Y'})$, which implies $Y = Y'$ and $H_Y = H_{Y'}$. Therefore, we conclude that $T = T'$. $\square$

Let $Y$ and $Y'$ be skew Young diagrams and let $T$ and $T'$ be $(m, n)$-hook semistandard tableaux of shape $Y$ and $Y'$, respectively. We fix an admissible reading $\psi$ of $\mathbf{B}(Y')$ and write $\psi(T') = x_1 \otimes \cdots \otimes x_k$, where $k = |Y'|$.

Definition 4.5. We define $T \xleftarrow{\;\psi\;} T'$ to be the tableau
\[
(\cdots((T \leftarrow x_1) \leftarrow x_2) \cdots) \leftarrow x_k. \tag{4.2}
\]

By definition, $T \xleftarrow{\;\psi\;} T'$ is an $(m, n)$-hook semistandard tableau, and by Lemma 4.3, we have
\[
(T \xleftarrow{\;\psi\;} T') \simeq T \otimes \psi(T') \simeq T \otimes T'. \tag{4.3}
\]
Let $\psi'$ be another admissible reading and consider $T \xleftarrow{\;\psi'\;} T'$. By Theorem 3.2(b), we have $\psi(T') \simeq \psi'(T')$, which implies
\[
(T \xleftarrow{\;\psi\;} T') \simeq (T \xleftarrow{\;\psi'\;} T'). \tag{4.4}
\]
Moreover, it is clear that $\mathrm{wt}(T \xleftarrow{\;\psi\;} T') = \mathrm{wt}(T \xleftarrow{\;\psi'\;} T')$. Hence, if $Y$ is an $(m, n)$-hook Young diagram, then Lemma 4.4 implies
\[
(T \xleftarrow{\;\psi\;} T') = (T \xleftarrow{\;\psi'\;} T'). \tag{4.5}
\]
Therefore, the insertion scheme $T \xleftarrow{\;\psi\;} T'$ does not depend on the choice of an admissible reading $\psi$ and we denote it by $T \leftarrow T'$. For general skew Young diagrams, we will give a combinatorial proof of this fact in Proposition 4.13.
Example 4.6. Let T =
1 2 3 1 2 2
T = 1 2 1 3
.
Note that ψ(T ) = 1 ⊗ 2 ⊗ 1 ⊗ 3 for any admissible reading ψ. Therefore, we obtain
T ← 1
=
1 2 3 1 2 1 2
1 2 3 (T ← 1 )← 2 = 2 1 2 1 2
((T ← 1 )← 2 )← 1
1 1 2 3 = 2 1 2 1 2
1 1 2 3 = T ← T . (((T ← 1 )← 2 )← 1 )← 3 = 2 1 2 3 1 2

We now give an explicit algorithm for decomposing the tensor product of $U_q(\mathfrak{gl}(m,n))$-crystals. Let $Y$ and $W$ be $(m, n)$-hook Young diagrams and set
\[
GH(Y, W) = \{\, T \otimes T' \in \mathbf{B}(Y) \otimes \mathbf{B}(W) \mid T \leftarrow T' = H_Z \text{ for some } Z \in H(m, n) \,\}. \tag{4.6}
\]
Observe that for any $T \in \mathbf{B}(Y)$ and $T' \in \mathbf{B}(W)$, we have
\[
C(T \otimes T') \cong C(T \leftarrow T') = \mathbf{B}(Z), \tag{4.7}
\]
where $Z = \operatorname{sh}(T \leftarrow T')$. It follows that
\[
\mathbf{B}(Y) \otimes \mathbf{B}(W) = \bigoplus_{T \otimes T' \in GH(Y, W)} \mathbf{B}(\operatorname{sh}(T \leftarrow T')). \tag{4.8}
\]

We would like to find a combinatorial parametrization of the set $GH(Y, W)$. Recall that a finite sequence of positive integers $x = x_1 \cdots x_t$ is called a lattice permutation if for $1 \leq r \leq t$ and $i \geq 1$, the number of occurrences of $i$ in $x_1 \cdots x_r$ is greater than or equal to the number of occurrences of $i + 1$. For each tableau $T$ with entries in $\mathbb{N}$, we associate a sequence $w(T)$ of positive integers obtained by the Middle-Eastern reading of $T$. The sequence $w(T)$ is called the (Middle-Eastern) word of $T$. For $(m, n)$-hook Young diagrams $Y$ and $W$, let $LR(Y, W)$ be the set of all semistandard tableaux $Q$ with entries in $\mathbb{N}$ satisfying the following conditions:
(i) the shape of $Q$ is $Z/Y$ for some $Z \in H(m, n)$,
(ii) the content of $Q$ is $W$,
(iii) $w(Q)$ is a lattice permutation.

Such a tableau $Q$ is called a Littlewood–Richardson tableau of shape $Z/Y$ and content $W$. The number of Littlewood–Richardson tableaux of shape $Z/Y$ and content $W$ is called the Littlewood–Richardson coefficient and denoted by $N^{Z}_{Y,W}$. Our goal in this section is to show that there exists a bijection between $GH(Y, W)$ and $LR(Y, W)$ and hence each connected component of $\mathbf{B}(Y) \otimes \mathbf{B}(W)$ can be parametrized by the tableaux in $LR(Y, W)$.

Let $Y$ be a skew Young diagram and for $T \in \mathbf{B}(Y)$, consider $T \otimes x_1 \otimes x_2$ with $x_1, x_2 \in \mathbf{B}$. Let $c_1$ be the coordinate of the box which is created in a co-corner of $T$ when $x_1$ is inserted and let $c_2$ be the coordinate of the box which is created in a co-corner of $T \leftarrow x_1$ when $x_2$ is inserted. Then we have

Lemma 4.7. The box at $c_1$ is strictly higher than the box at $c_2$ if and only if $x_1 < x_2$ or $x_1 = x_2 \in \mathbf{B}_-$.

Proof. Suppose that $x_1 < x_2$ or $x_1 = x_2 \in \mathbf{B}_-$, that $z_i$ ($1 \leq i \leq p$) is bumped out from the coordinate $(l_i, i)$ when $x_1$ is inserted, and that $z'_j$ ($1 \leq j \leq q$) is bumped out from the coordinate $(l'_j, j)$ when $x_2$ is inserted. By definition of the column insertion, we observe that $p \geq q$ and $l_k < l'_k$ for $1 \leq k \leq q$. This implies that $c_1$ is strictly higher than $c_2$. Conversely, if $x_1 > x_2$ or $x_1 = x_2 \in \mathbf{B}_+$, by a similar argument, we conclude that $c_2$ is strictly higher than $c_1$. $\square$

Corollary 4.8. The box at $c_2$ is strictly higher than the box at $c_1$ if and only if $x_1 > x_2$ or $x_1 = x_2 \in \mathbf{B}_+$.

Remark 4.9. The sequences of coordinates $(l_1, 1), \dots, (l_p, p)$ and $(l'_1, 1), \dots, (l'_q, q)$ in the proof of the above lemma are called the insertion paths or bumping routes of $x_1$ and $x_2$, respectively.

Let $Y$ and $Y'$ be skew Young diagrams. Let $T \in \mathbf{B}(Y)$ and $T' \in \mathbf{B}(Y')$. For an admissible reading $\psi$, suppose that $\psi(T') = x_1 \otimes \cdots \otimes x_k$, where $k = |Y'|$. Set $Z = \operatorname{sh}(T \leftarrow T')$.

Definition 4.10. The recording tableau $Q$ of the insertion $T \xleftarrow{\;\psi\;} T'$ is the skew tableau of shape $Z/Y$ constructed as follows:
(i) the tableau $Q$ consists of the boxes that are created by the insertion of $T'$ into $T$,
(ii) if a box $x_i$ in the $r$th row of $T'$ is inserted into $(\cdots(T \leftarrow x_1) \leftarrow \cdots) \leftarrow x_{i-1}$ to create a box at the position $c_i$, then we fill the box at $c_i$ with the entry $r$.

Example 4.11. In Example 4.6, the corresponding recording tableau is given as follows:

1 1 1 2
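Condition (iii) in the definition of a Littlewood–Richardson tableau, that the word $w(Q)$ be a lattice permutation, is easy to test mechanically. The Python sketch below is an illustration only; it takes the word as a list of positive integers and checks that in every initial segment the number of occurrences of $i$ is at least the number of occurrences of $i + 1$. The function name is hypothetical.

```python
from collections import Counter
from typing import Iterable

def is_lattice_permutation(word: Iterable[int]) -> bool:
    """True if every prefix has at least as many i's as (i+1)'s, for all i >= 1."""
    counts: Counter = Counter()
    for x in word:
        counts[x] += 1
        # Appending x can only break the inequality between x - 1 and x.
        if x > 1 and counts[x] > counts[x - 1]:
            return False
    return True
```

For instance, `[1, 1, 2, 1, 2, 3]` is a lattice permutation, whereas `[1, 2, 2]` is not, since its full prefix contains two 2's but only one 1.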
The following technical lemma is a generalization of the Corollary of Lemma 8 in [25], which will be used in the proof of Proposition 4.13. Lemma 4.12. Let T be an (m, n)-hook semistandard tableau and let x1 , · · · , xk and y1 , · · · , yk be sequences in B such that (i) x1 ≤ x2 ≤ · · · ≤ xk and y1 ≥ y2 ≥ · · · ≥ yk , (ii) xk = yk and xk−1 = yk −1 , (iii) xi < xi+1 for xi ∈ B− , (iv) yj > yj +1 for yj ∈ B+ . Then we have (· · · (((· · · (T ← xk ) · · · ) ← x1 ) ← yk −1 ) · · · ) ← y1 = (· · · (((· · · (T ← yk ) · · · ) ← y1 ) ← xk−1 ) · · · ) ← x1 = T . Furthermore, the place of a box in T , which is created by insertion of xi , yj is uniquely determined. Proof. From Lemma 4.7, we see that the insertion path of xi (1 ≤ i ≤ k − 1) is lying above the insertion path of xk = yk and the insertion path of yj (1 ≤ j ≤ k − 1) is lying below the insertion path of xk = yk . Also the insertion paths of xk−1 and yk −1 are disjoint. This implies the required result. ' In [25], Thomas defined the notion of the insertion tableau T ← T and the recording tableau of T ← T for semistandard tableaux with entries in N with respect to the FarEastern reading and the Middle-Eastern reading, and proved that they do not depend on the choice of these readings. Now, we generalize his result to (m, n)-hook semistandard tableaux (with entries in B) with respect to an arbitrary admissible reading. Proposition 4.13. Let Y and Y be skew Young diagrams and let ψ be an admissible ψ
reading. Then for T ∈ B(Y ) and T ∈ B(Y ), the insertion tableau T ←− T and its recording tableau do not depend on the choice of an admissible reading. Proof. Let ψ be a given admissible reading and let (T ⊗ T ) be the recording tableau with respect to ψ. Write ψ(T ) = x1 ⊗ · · · ⊗ xk . We may assume that Y = sh T is connected (that is, Y cannot be decomposed into a union of two skew Young diagrams Y1 and Y2 such that no box of Y1 has a common side with a box in Y2 ). For 1 ≤ l ≤ k, let Sl be the part of T consisting of the boxes corresponding to x1 , · · · , xl and let y1 ⊗ · · · ⊗ yl be the Far-Eastern reading of Sl . Then y1 ⊗ · · · ⊗ yl ⊗ xl+1 ⊗ · · · ⊗ xk is also an admissible reading of T . We denote by l (T ⊗ T ), the recording tableau with respect to this reading. To prove our claim, we will use induction on l. Suppose that (· · · (T ← x1 ) ← · · · ) ← xl = ((· · · (T ← y1 ) ← · · · ) ← yl and (T ⊗ T ) = l (T ⊗ T ). When l = 1, our assumption is true.
Let Sl+1 be the part of T consisting of the boxes corresponding to x1 , · · · , xl+1 . By considering the shape of Sl+1 , we see that there exists a maximal i such that for j ≤ i, the column where yj is located lies in the right of the column where xl+1 is located (see the figure below). yi+1
Sl xl+1
T =
yi
Let T = (· · · (T ← y1 ) ← · · · ) ← yi and Sl be the part of T consisting of the boxes corresponding to yi+1 , · · · , yl . If such an i does not exist, we may put i = 0 and T = T . Suppose that z1 ⊗ · · · ⊗ zl−i is the Middle-Eastern reading of Sl . We claim that (· · · (T ← yi+1 ) ← · · · ) ← yl = (· · · (T ← z1 ) ← · · · ) ← zl−i = U and the place of the box in U , which is created by inserting yj remains unchanged even if we insert Sl into T by Middle-Eastern reading. We will show this by induction on |Sl |, the number of boxes in Sl . If |Sl | = 1, it is clear. Let yi+1 , · · · , yp be the boxes lying in the right-most column of Sl (i + 1 ≤ p ≤ l) and let z1 , · · · , zq be the boxes lying in the first row of Sl (1 ≤ q ≤ l − i). Note (1)
that yi+1 = z1 . We denote by S l , the part of Sl which is obtained by removing its (2)
first row and denote by S l , the part of Sl which is obtained by removing its right-most column. Also let Sl be the part of Sl obtained by removing its first row and right-most column. Let ψ1 be the Middle-Eastern reading and let ψ2 be the Far-Eastern reading. Then we have ψ2
T ←− Sl ψ2
(2)
ψ1
(2)
= ((· · · (T ← yi+1 ) · · · ) ← yp ) ←− S l = ((· · · (T ← yi+1 ) · · · ) ← yp ) ←− S l
by induction hypothesis, ψ1
= ((· · · (((· · · (T ← yi+1 ) · · · ) ← yp ) ← z2 ) · · · ) ← zq ) ←− S l ψ1
= ((· · · (((· · · (T ← z1 ) · · · ) ← zq ) ← yi+2 ) · · · ) ← yp ) ←− S l by Lemma 4.12,
721 ψ2
= ((· · · (((· · · (T ← z1 ) · · · ) ← zq ) ← yi+2 ) · · · ) ← yp ) ←− S l
ψ2
(1) S l
ψ1
(1)
= ((· · · (T ← z1 ) · · · ) ← zq ) ←−
= ((· · · (T ← z1 ) · · · ) ← zq ) ←− S l
by induction hypothesis,
by induction hypothesis,
ψ1
= T ←− Sl = U, and in each step of the above equations, the place of the box in U , which is created by inserting yj remains unchanged. Hence we proved our claim. Finally, let Sl+1 be the part of T containing Sl and xl+1 . Then z1 ⊗ · · · ⊗ zl−i ⊗ . Let w ⊗· · ·⊗ w xl+1 is the Middle-Eastern reading of Sl+1 1 l−i+1 be the Far-Eastern reading of Sl+1 . By applying the above argument once more, we have
((· · · (T ← z1 ) ← · · · ) ← zl−i ) ← xl+1 = (· · · (T ← w1 ) ← · · · ) ← wl−i+1 = U and the place of the box in U which is created by inserting zj and xl+1 remains . unchanged under the Far-Eastern reading of Sl+1 Therefore,
y1 ⊗ · · · ⊗ yi ⊗ w1 ⊗ · · · ⊗ wl−i+1 is the Far-Eastern reading of Sl+1 and (T ⊗T ) = l+1 (T ⊗T ), which completes our ψ
induction. Thus we have shown that T ←− T and (T ⊗ T ) are equal to the insertion tableau and the recording tableau with respect to the Far-Eastern reading, respectively. Since the choice of ψ was arbitrary, this proves our assertion. ' Now, for (m, n)-hook semistandard tableaux T and T of skew shapes, we can define ψ
T ← T to be the insertion tableau T ←− T for an admissible reading ψ and (T ⊗T ) to be the recording tableau of T ← T . Corollary 4.14. If Y is an (m, n)-hook Young diagram, then the recording tableaux of T ← T becomes a Littlewood–Richardson tableau for T ∈ B(Y ) and T ∈ B(Y ). In particular, if Y is also an (m, n)-hook Young diagram, then (T ⊗ T ) ∈ LR(Y, Y ). Proof. First, we claim that (T ⊗ T ) is a semistandard tableau with entries in N. Set Z = sh (T ← T ) ⊃ Y . Consider the recording tableau of T ← T with respect to the Middle-Eastern reading. In this case, the boxes of T are inserted row by row. By definition of the recording tableau, each r is placed in (T ⊗ T ) after all r − 1 ’s are placed. This implies that (T ⊗ T ) forms a semistandard tableau with entries in N by Lemma 4.8. Next, consider the recording tableau of T ← T with respect to the Far-Eastern reading. In this case, the boxes in T are inserted column by column. Let x be a box in T and y be the one beneath it. Suppose that the coordinate of x is (p, q). Consider the boxes q and q + 1 in (T ⊗ T ) which are created by inserting x and y .
Then by Lemma 4.7, q is strictly higher than q + 1 . This observation implies that w((T ⊗ T )) is a lattice permutation since Y is a Young diagram, which completes our proof. ' From now on, we assume that Y and W are (m, n)-hook Young diagrams. We define B(Y, W ) to be the set of ordered pairs (P , Q) such that (i) P is an (m, n)-hook semistandard tableau of shape Z ⊃ Y for some Z ∈ H (m, n), (ii) Q is a Littlewood–Richardson tableau of shape Z/Y and content W . Let (P , Q) ∈ B(Y, W ). Suppose that the number of boxes in the i th row of W is λi for 1 ≤ i ≤ t. Since the content of Q is W , λi is equal to the number of occurrences of i in Q. Let c(i, j ) be the coordinate of i in Q (1 ≤ j ≤ λi ) such that c(i, j ) is strictly higher than c(i, j + 1). Let x(i, j ) be the box obtained by applying the reverse column insertion dc(i,j ) to the tableau dc(i,j −1) · · · dc(i,1) dc(i+1,λi+1 ) · · · dc(i+1,1) · · · dc(t,λt ) · · · dc(t,1) P . Let T be the resulting tableau after deleting all the boxes in P corresponding to the ones in Q following the above order. Consider b = x(1, λ1 ) ⊗ · · · ⊗ x(1, 1) ⊗ · · · ⊗ x(t, λt ) ⊗ · · · ⊗ x(t, 1) . By Lemma 4.7, x(i, j ) < x(i, j + 1) or x(i, j ) = x(i, j + 1) ∈ B+ . Since w(Q) is a lattice permutation, we can verify that c(i − 1, j ) is strictly higher than c(i, j ) for 1 ≤ j ≤ λi , which implies x(i − 1, j ) < x(i, j ) or x(i − 1, j ) = x(i, j ) ∈ B− by Lemma 4.8. Thus, b can be viewed as the Middle-Eastern reading of a tableau T in B(W ). We define @(P , Q) = T ⊗ T .
(4.9)
By construction of T and T , it is straightforward to check that T ← T = P and (T ⊗ T ) = Q. Also, we can check that @(T ← T , (T ⊗ T )) = T ⊗ T
(4.10)
for all $T \in \mathbf{B}(Y)$ and $T' \in \mathbf{B}(W)$. Therefore, we have:

Proposition 4.15 (cf. [25]). There is a one-to-one correspondence between $\mathbf{B}(Y) \otimes \mathbf{B}(W)$ and $\mathbf{B}(Y, W)$ given by
\[
\mathbf{B}(Y) \otimes \mathbf{B}(W) \longrightarrow \mathbf{B}(Y, W), \qquad T \otimes T' \longmapsto (P, Q), \tag{4.11}
\]
where $P = T \leftarrow T'$ and $Q = (T \otimes T')$. Moreover, the inverse map @ of this correspondence is given by (4.9).

Now, the main result of this section follows immediately.
Theorem 4.16. (a) The map $GH(Y, W) \longrightarrow LR(Y, W)$ defined by
\[
T \otimes T' \longmapsto (T \otimes T') \tag{4.12}
\]
is a bijection and its inverse is given by $Q \longmapsto @(H_Z, Q)$ for $Q \in LR(Y, W)$, where $\operatorname{sh} Q = Z/Y$ for some $Z \in H(m, n)$.
(b) For each $Z \in H(m, n)$, the multiplicity of $\mathbf{B}(Z)$ in the tensor product $\mathbf{B}(Y) \otimes \mathbf{B}(W)$ is equal to $N^{Z}_{Y,W}$, the Littlewood–Richardson coefficient. That is, we have
\[
\mathbf{B}(Y) \otimes \mathbf{B}(W) \cong \bigoplus_{Z \in H(m, n)} \mathbf{B}(Z)^{\oplus N^{Z}_{Y,W}}. \tag{4.13}
\]

Proof. (a) Our assertion follows easily from the observation
\[
GH(Y, W) = \{\, T \otimes T' \in \mathbf{B}(Y) \otimes \mathbf{B}(W) \mid T \leftarrow T' = H_{\operatorname{sh}(T \leftarrow T')} \,\} = \{\, @(H_Z, Q) \mid Q \in LR(Y, W) \text{ and } \operatorname{sh} Q = Z/Y \text{ for some } Z \in H(m, n) \,\}.
\]
(b) Note that for each $T \otimes T' \in GH(Y, W)$, if $\operatorname{sh}(T \leftarrow T') = Z$, then $\operatorname{sh}\,(T \otimes T') = Z/Y$. Hence our assertion follows from (4.8) and the definition of $N^{Z}_{Y,W}$. $\square$

Next, we will show that any two elements in $\mathbf{B}(Y) \otimes \mathbf{B}(W)$ are in the same connected component if and only if they have the same recording tableaux.

Proposition 4.17. For $T \otimes T' \in \mathbf{B}(Y) \otimes \mathbf{B}(W)$, the recording tableaux of all the elements in the connected component of $T \otimes T'$ are the same.

Proof. To prove our claim, it suffices to prove the following statements:
(a) $(T \otimes T') = (\tilde e_i (T \otimes T'))$, where $\tilde e_i (T \otimes T') \neq 0$ ($i \in I$),
(b) $(T \otimes T') = (\tilde f_i (T \otimes T'))$, where $\tilde f_i (T \otimes T') \neq 0$ ($i \in I$).
We will prove only the statement (a) since the proof of (b) is similar. Suppose that $\tilde e_i (T \otimes T') \neq 0$ for some $i \in I$. Let $\psi$ be a given admissible reading, and write $\psi(T') = x_1 \otimes \cdots \otimes x_k$. Set $b_t = x_1 \otimes \cdots \otimes x_t$ for $1 \leq t \leq k$.

Case 1. Suppose that $\tilde e_i (T \otimes T') = \tilde e_i T \otimes T'$. Note that for $1 \leq t \leq k$, we have
\[
\tilde e_i ((\cdots(T \leftarrow x_1) \leftarrow \cdots) \leftarrow x_t) \simeq \tilde e_i T \otimes b_t \simeq (\cdots(\tilde e_i T \leftarrow x_1) \leftarrow \cdots) \leftarrow x_t
\]
by Lemma 4.3 and Proposition 2.4. Then Lemma 4.4 yields
\[
\tilde e_i ((\cdots(T \leftarrow x_1) \leftarrow \cdots) \leftarrow x_t) = (\cdots(\tilde e_i T \leftarrow x_1) \leftarrow \cdots) \leftarrow x_t
\]
and we have $\operatorname{sh}((\cdots(T \leftarrow x_1) \leftarrow \cdots) \leftarrow x_t) = \operatorname{sh}((\cdots(\tilde e_i T \leftarrow x_1) \leftarrow \cdots) \leftarrow x_t)$. Therefore, $(T \otimes T') = (\tilde e_i (T \otimes T'))$.
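Case 2. Suppose that $\tilde e_i (T \otimes T') = T \otimes \tilde e_i T'$. Write $\tilde e_i \psi(T') = x_1 \otimes \cdots \otimes \tilde e_i x_l \otimes \cdots \otimes x_k = y_1 \otimes \cdots \otimes y_k$ ($1 \leq l \leq k$). Set $b'_t = y_1 \otimes \cdots \otimes y_t$ for $1 \leq t \leq k$. If $1 \leq t \leq l - 1$, then
\[
(\cdots(T \leftarrow x_1) \leftarrow \cdots) \leftarrow x_t = (\cdots(T \leftarrow y_1) \leftarrow \cdots) \leftarrow y_t,
\]
and if $t \geq l$, also by Lemma 4.3 and Proposition 2.4,
\[
\tilde e_i ((\cdots(T \leftarrow x_1) \leftarrow \cdots) \leftarrow x_t) \simeq \tilde e_i (T \otimes b_t) = T \otimes b'_t \simeq (\cdots(T \leftarrow y_1) \leftarrow \cdots) \leftarrow y_t.
\]
By Lemma 4.4, for $t \geq l$, we have
\[
\tilde e_i ((\cdots(T \leftarrow x_1) \cdots) \leftarrow x_t) = (\cdots(T \leftarrow y_1) \cdots) \leftarrow y_t,
\]
which implies $\operatorname{sh}((\cdots(T \leftarrow x_1) \cdots) \leftarrow x_t) = \operatorname{sh}((\cdots(T \leftarrow y_1) \cdots) \leftarrow y_t)$ for $1 \leq t \leq k$. Therefore, $(T \otimes T') = (\tilde e_i (T \otimes T'))$. $\square$

For $Q \in LR(Y, W)$, we define
\[
B_Q = \{\, T \otimes T' \in \mathbf{B}(Y) \otimes \mathbf{B}(W) \mid (T \otimes T') = Q \,\}. \tag{4.14}
\]
By Proposition 4.15, $\mathbf{B}(Y) \otimes \mathbf{B}(W)$ is a disjoint union of its subsets $B_Q$ with $Q \in LR(Y, W)$:
\[
\mathbf{B}(Y) \otimes \mathbf{B}(W) = \bigsqcup_{Q \in LR(Y, W)} B_Q. \tag{4.15}
\]

Theorem 4.18. We have the following decomposition of the $U_q(\mathfrak{gl}(m,n))$-crystal $\mathbf{B}(Y) \otimes \mathbf{B}(W)$ into a disjoint union of connected components:
\[
\mathbf{B}(Y) \otimes \mathbf{B}(W) = \bigsqcup_{Q \in LR(Y, W)} B_Q, \tag{4.16}
\]
where $B_Q$ is isomorphic to $\mathbf{B}(Z)$ with $\operatorname{sh} Q = Z/Y$ for some $Z \in H(m, n)$.

Proof. By Proposition 4.17, $B_Q \cup \{ 0 \}$ is closed under the Kashiwara operators and by Proposition 4.15, $B_Q = \{\, @(P, Q) \mid (P, Q) \in \mathbf{B}(Y, W) \,\}$. Suppose that $@(P, Q) = T \otimes T'$ and $\tilde e_i P \neq 0$ ($i \in I$). Put $S \otimes S' = \tilde e_i (T \otimes T')$. Since $\tilde e_i P \simeq \tilde e_i (T \otimes T') = S \otimes S'$, we see that $S \leftarrow S' = \tilde e_i P$ by Lemma 4.4. Also, by Proposition 4.17, $(S \otimes S') = (T \otimes T') = Q$. Thus, we have $\tilde e_i\, @(P, Q) = @(\tilde e_i P, Q)$.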
Similarly, if f˜i P = 0 (i ∈ I ), we have f˜i @(P , Q) = @(f˜i P , Q). Therefore, BQ is isomorphic to B(Z) with sh Q = Z/Y for some Z ∈ H (m, n) and it forms a connected component. Conversely, for each T ⊗ T ∈ B(Y ) ⊗ B(W ), let C(T ⊗ T ) be the connected component of T ⊗T in B(Y )⊗B(W ). Since T ⊗T ∈ B(T ⊗T ) , we have C(T ⊗T ) ⊂ B(T ⊗T ) . Since B(T ⊗T ) is connected, we conclude C(T ⊗ T ) = B(T ⊗T ) ,which completes the proof. ' Example 4.19. By Theorem 4.16 (a), we can explicitly construct genuine highest weight vectors in B(Y )⊗B(W ) by computing @(HZ , Q) for Q ∈ LR(Y, W ) with sh Q = Z/Y for some Z ∈ H (m, n). For example,
HZ =
1 1 1 1 2 2 2
Q=
1 2
&
1 1 1 1 2 2 2
1 1 2 3 1
⊗ 1
1 2
⊗ 2 ⊗ 1
1
⊗ 1 ⊗ 2 ⊗ 1
1
2
&
1 1 1 1 2 2
1
2
&
1 1 1 2 2 2
&
1 1 1 2
⊗ 2 ⊗ 1 ⊗ 2 ⊗ 1
2 Therefore, @(HZ , Q) =
1 1 1 2 2
⊗
1 2 2
& HZ .
1
Example 4.20. Let us give an example of decomposition of tensor product of Uq (gl(2, 2))-crystals.
Let Y = elements:
and W =
1 1
1 1
, then LR(Y, W ) consists of the following
1 1 2
2
1
1
1 2
1
2 1
2 1
2
1 2
1
1 2
1
1 1 2
and the corresponding elements in GH (Y, W ) are given by 1 1 ⊗ 1 1 2 2 1
1 1 ⊗ 1 1 2 1 1
1 1 ⊗ 1 2 2 2 1
1 1 ⊗ 1 2 2 1 2
1 1 ⊗ 1 2 2 1 1
1 1 ⊗ 1 1 2 2 1
1 1 ⊗ 1 1 2 1 1
1 1 ⊗ 2 2 2 1 1
1 1 ⊗ 2 1 2 1 . 1
Therefore, we have
B(
) ⊗ B(
⊕B(
) = B(
) ⊕ B(
) ⊕ B(
)⊕2 ⊕ B(
) ⊕ B(
) ⊕ B(
) ⊕ B(
)
).
Remark 4.21. Note that in the case of $U_q(\mathfrak{gl}(m,0))$-crystals, if $T \otimes T'$ is a highest weight vector in $B(Y) \otimes B(W)$ for $Y, W \in H(m,0)$, then $T = H_Y$ (see [12]). Thus, one can characterize the highest weight vector $T \otimes T'$ (or its connected component) in terms of the second factor $T'$. Based on this observation, Nakashima gave a simple combinatorial algorithm for decomposing the tensor product $B(Y) \otimes B(W)$ for $Y, W \in H(m,0)$, including the cases of the crystal bases of the other classical types $B_n$, $C_n$ and $D_n$ [17]. But for $U_q(\mathfrak{gl}(m,n))$-crystals with $m, n \neq 0$, we cannot expect an algorithm analogous to that of Nakashima, because the first factor of a genuine highest weight vector in $B(Y) \otimes B(W)$ is not always equal to $H_Y$, and hence the second factor does not necessarily determine the connected component uniquely, as we have seen in the above examples.

5. Generalized Littlewood–Richardson Rule

In this section, we generalize the method given in Sect. 4 to give an explicit algorithm for decomposing the tensor product $B(Y_1) \otimes \cdots \otimes B(Y_l)$ of $U_q(\mathfrak{gl}(m,n))$-crystals for $(m,n)$-hook Young diagrams $Y_1, \dots, Y_l$ ($l \geq 2$). Set
\[ GH(Y_1, \dots, Y_l) = \{\, T_1 \otimes \cdots \otimes T_l \in B(Y_1) \otimes \cdots \otimes B(Y_l) \mid (\cdots(T_1 \leftarrow T_2) \leftarrow \cdots) \leftarrow T_l = H_Z \text{ for some } Z \in H(m,n) \,\}. \tag{5.1} \]
Note that for $T_1 \otimes \cdots \otimes T_l \in B(Y_1) \otimes \cdots \otimes B(Y_l)$, we have
\[ C(T_1 \otimes \cdots \otimes T_l) \cong C((\cdots(T_1 \leftarrow T_2) \leftarrow \cdots) \leftarrow T_l) = B(Z), \tag{5.2} \]
where $Z = \mathrm{sh}((\cdots(T_1 \leftarrow T_2) \leftarrow \cdots) \leftarrow T_l)$. It follows that
\[ B(Y_1) \otimes \cdots \otimes B(Y_l) \cong \bigoplus_{T_1 \otimes \cdots \otimes T_l \in GH(Y_1, \dots, Y_l)} B(\mathrm{sh}((\cdots(T_1 \leftarrow T_2) \leftarrow \cdots) \leftarrow T_l)). \tag{5.3} \]

We define $LR(Y_1, \dots, Y_l)$ to be the set of ordered sequences $(Q_1, \dots, Q_{l-1})$ of semistandard tableaux with entries in $\mathbb{N}$ such that (i) there exists a sequence of $(m,n)$-hook Young diagrams $Z_l \supset Z_{l-1} \supset \cdots \supset Z_2 \supset Z_1 = Y_1$ and (ii) $Q_i$ is a Littlewood–Richardson tableau of shape $Z_{i+1}/Z_i$ and content $Y_{i+1}$ for $1 \leq i \leq l-1$. Such a sequence $(Q_1, \dots, Q_{l-1})$ is called a generalized Littlewood–Richardson tableau of shape $Z_l \supset Z_{l-1} \supset \cdots \supset Z_2 \supset Z_1$ and content $(Y_2, \dots, Y_l)$. We denote by $N^{Z}_{Y_1, \dots, Y_l}$ the number of generalized Littlewood–Richardson tableaux of shape $Z = Z_l \supset \cdots \supset Z_1 = Y_1$ and content $(Y_2, \dots, Y_l)$, called the generalized Littlewood–Richardson coefficient.

Let $T_1 \otimes \cdots \otimes T_l \in B(Y_1) \otimes \cdots \otimes B(Y_l)$. For $1 \leq k \leq l-1$, set $T_{(k)} = (\cdots(T_1 \leftarrow T_2) \leftarrow \cdots) \leftarrow T_k$, and let $Q_k$ be the recording tableau of $T_{(k)} \otimes T_{k+1}$ defined in Sect. 4. By Corollary 4.14, $Q_k$ is a Littlewood–Richardson tableau of shape $\mathrm{sh}\, T_{(k+1)} / \mathrm{sh}\, T_{(k)}$ and content $Y_{k+1}$. We call $(Q_1, \dots, Q_{l-1})$ the recording tableau of $(\cdots(T_1 \leftarrow T_2) \leftarrow \cdots) \leftarrow T_l$ and denote it by (T1 ⊗ · · · ⊗ Tl ). By construction, it is clear that this recording tableau belongs to $LR(Y_1, \dots, Y_l)$.
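The counting in this definition can be made concrete with a small sketch. It assumes the classical description of a Littlewood–Richardson tableau (a semistandard skew tableau whose reverse row reading word is a lattice word), which is taken here to agree with the definition used in Sect. 4. The function names `lr_coefficient`, `subpartitions` and `generalized_lr` are hypothetical and chosen only for illustration; the generalized coefficient is computed by summing over the intermediate shapes $Z_{l-1}$.

```python
def lr_coefficient(lam, mu, nu):
    """Count classical Littlewood-Richardson tableaux of shape lam/mu, content nu."""
    lam, nu = list(lam), list(nu)
    if len(mu) > len(lam):
        return 0
    mu = list(mu) + [0] * (len(lam) - len(mu))
    if any(m > l for m, l in zip(mu, lam)) or sum(lam) - sum(mu) != sum(nu):
        return 0
    # Cells in reverse reading order: rows top to bottom, each row right to left.
    cells = [(r, c) for r in range(len(lam))
             for c in range(lam[r] - 1, mu[r] - 1, -1)]
    filling = {}
    counts = [0] * (len(nu) + 1)  # counts[v] = number of entries v placed so far

    def extend(k):
        if k == len(cells):
            return 1
        r, c = cells[k]
        total = 0
        for v in range(1, len(nu) + 1):
            if counts[v] >= nu[v - 1]:
                continue  # content would be exceeded
            if v > 1 and counts[v] + 1 > counts[v - 1]:
                continue  # lattice word condition would fail
            right = filling.get((r, c + 1))
            if right is not None and v > right:
                continue  # rows must be weakly increasing
            above = filling.get((r - 1, c))
            if above is not None and v <= above:
                continue  # columns must be strictly increasing
            filling[(r, c)] = v
            counts[v] += 1
            total += extend(k + 1)
            counts[v] -= 1
            del filling[(r, c)]
        return total

    return extend(0)


def subpartitions(lam, size):
    """Yield the partitions of `size` whose diagram is contained in lam."""
    def rec(i, remaining, cap):
        if i == len(lam):
            if remaining == 0:
                yield ()
            return
        for part in range(min(lam[i], cap, remaining), -1, -1):
            for rest in rec(i + 1, remaining - part, part):
                yield (part,) + rest
    for p in rec(0, size, size):
        yield tuple(x for x in p if x > 0)


def generalized_lr(Z, shapes):
    """Number of chains Z = Z_l > ... > Z_1 = shapes[0] of LR tableaux."""
    Z = tuple(Z)
    if len(shapes) == 1:
        return 1 if tuple(shapes[0]) == Z else 0
    last = shapes[-1]
    return sum(generalized_lr(W, shapes[:-1]) * lr_coefficient(Z, W, last)
               for W in subpartitions(Z, sum(Z) - sum(last)))
```

The hook condition does not appear explicitly because every diagram contained in an $(m,n)$-hook diagram is again an $(m,n)$-hook diagram, so it only needs to be checked for the outer shape $Z$ by the caller.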
Next, define $B(Y_1, \dots, Y_l)$ to be the set of ordered pairs $(P, Q)$ such that (i) $P$ is an $(m,n)$-hook semistandard tableau of shape $Z$ for some $Z \in H(m,n)$, (ii) $Q = (Q_1, \dots, Q_{l-1}) \in LR(Y_1, \dots, Y_l)$ with $\mathrm{sh}\, Q_{l-1} = Z/W$ for some $W \in H(m,n)$.

Given a pair $(P, Q) \in B(Y_1, \dots, Y_l)$, suppose that the shape of $Q = (Q_1, \dots, Q_{l-1})$ is $Z_l \supset Z_{l-1} \supset \cdots \supset Z_2 \supset Z_1 = Y_1$. Let $U_{(l-1)}$ and $U_l$ be the unique $(m,n)$-hook semistandard tableaux such that $@(P, Q_{l-1}) = U_{(l-1)} \otimes U_l$. For $2 \leq i \leq l-1$, define inductively $U_{(i-1)}$ and $U_i$ to be the unique $(m,n)$-hook semistandard tableaux satisfying $@(U_{(i)}, Q_{i-1}) = U_{(i-1)} \otimes U_i$, and set $U_1 = U_{(1)}$. We now define
\[ @(P, Q) = U_1 \otimes \cdots \otimes U_l. \tag{5.4} \]
Then we can verify that (i) $U_1 \otimes \cdots \otimes U_l \in B(Y_1) \otimes \cdots \otimes B(Y_l)$, (ii) $(\cdots(U_1 \leftarrow U_2) \leftarrow \cdots) \leftarrow U_l = P$, (iii) the recording tableau of $U_1 \otimes \cdots \otimes U_l$ is $Q$. Conversely, if $T_1 \otimes \cdots \otimes T_l \in B(Y_1) \otimes \cdots \otimes B(Y_l)$, then, writing $P = (\cdots(T_1 \leftarrow T_2) \leftarrow \cdots) \leftarrow T_l$ and $Q$ for the recording tableau of $T_1 \otimes \cdots \otimes T_l$, we have
\[ @(P, Q) = T_1 \otimes \cdots \otimes T_l. \tag{5.5} \]
Therefore, the above argument gives

Proposition 5.1. There is a one-to-one correspondence between $B(Y_1) \otimes \cdots \otimes B(Y_l)$ and $B(Y_1, \dots, Y_l)$ given by
\[ B(Y_1) \otimes \cdots \otimes B(Y_l) \longrightarrow B(Y_1, \dots, Y_l), \qquad T_1 \otimes \cdots \otimes T_l \longmapsto (P, Q), \tag{5.6} \]
where $P = (\cdots(T_1 \leftarrow T_2) \cdots) \leftarrow T_l$ and $Q$ is the recording tableau of $T_1 \otimes \cdots \otimes T_l$. Moreover, the inverse map @ of this correspondence is given by (5.4).

Consequently, we obtain

Theorem 5.2. (a) The map
\[ GH(Y_1, \dots, Y_l) \longrightarrow LR(Y_1, \dots, Y_l) \tag{5.7} \]
sending $T_1 \otimes \cdots \otimes T_l$ to its recording tableau is a bijection, and the inverse is given by $Q \mapsto @(H_Z, Q)$ for $Q \in LR(Y_1, \dots, Y_l)$, where $Q = (Q_1, \dots, Q_{l-1})$ and $\mathrm{sh}\, Q_{l-1} = Z/W$ for some $Z, W \in H(m,n)$.
(b) For $Z \in H(m,n)$, the multiplicity of $B(Z)$ in the tensor product $B(Y_1) \otimes \cdots \otimes B(Y_l)$ is equal to $N^{Z}_{Y_1, \dots, Y_l}$, the generalized Littlewood–Richardson coefficient. That is, we have
\[ B(Y_1) \otimes \cdots \otimes B(Y_l) \cong \bigoplus_{Z \in H(m,n)} B(Z)^{\oplus N^{Z}_{Y_1, \dots, Y_l}}. \tag{5.8} \]
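As in Theorem 4.18, we can characterize the connected components of $B(Y_1) \otimes \cdots \otimes B(Y_l)$ in terms of Littlewood–Richardson tableaux in $LR(Y_1, \dots, Y_l)$. We have to extend Proposition 4.17 as follows.

Proposition 5.3. For $T = T_1 \otimes \cdots \otimes T_l \in B(Y_1) \otimes \cdots \otimes B(Y_l)$, the recording tableaux of all the elements in the connected component of $T$ are the same.

Proof. As in the proof of Proposition 4.17, it suffices to prove the following statements: (a) $T$ and $\tilde{e}_i T$ have the same recording tableau whenever $\tilde{e}_i T \neq 0$ ($i \in I$), (b) $T$ and $\tilde{f}_i T$ have the same recording tableau whenever $\tilde{f}_i T \neq 0$ ($i \in I$). Again, we prove only statement (a). Suppose that
\[ \tilde{e}_i T = T_1 \otimes \cdots \otimes \tilde{e}_i T_k \otimes \cdots \otimes T_l = T'_1 \otimes \cdots \otimes T'_l \neq 0 \]
for some $1 \leq k \leq l$. Set
\[ T_{(s)} = (\cdots(T_1 \leftarrow T_2) \cdots) \leftarrow T_s \quad \text{and} \quad T'_{(s)} = (\cdots(T'_1 \leftarrow T'_2) \cdots) \leftarrow T'_s \qquad (1 \leq s \leq l). \]
It is clear that for $1 \leq s \leq k-1$, the tensors $T_{(s-1)} \otimes T_s$ and $T'_{(s-1)} \otimes T'_s$ have the same recording tableau. For $s \geq k$, note that
\[ \tilde{e}_i T_{(s)} \simeq \tilde{e}_i (T_1 \otimes \cdots \otimes T_s) = T'_1 \otimes \cdots \otimes T'_s \simeq T'_{(s)}. \]
By Lemma 4.4, we have $\tilde{e}_i T_{(s)} = T'_{(s)}$, and Proposition 4.17 implies that $T_{(s-1)} \otimes T_s$, $\tilde{e}_i (T_{(s-1)} \otimes T_s)$ and $T'_{(s-1)} \otimes T'_s$ all have the same recording tableau. Therefore, $T$ and $\tilde{e}_i T$ have the same recording tableau. □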
For $Q \in LR(Y_1, \dots, Y_l)$, we define
\[ B_Q = \{\, T_1 \otimes \cdots \otimes T_l \mid T_i \in B(Y_i)\ (1 \leq i \leq l), \text{ and the recording tableau of } T_1 \otimes \cdots \otimes T_l \text{ is } Q \,\}. \tag{5.9} \]
By Proposition 5.1, $B(Y_1) \otimes \cdots \otimes B(Y_l)$ is a disjoint union of its subsets $B_Q$:
\[ B(Y_1) \otimes \cdots \otimes B(Y_l) = \bigsqcup_{Q \in LR(Y_1, \dots, Y_l)} B_Q. \tag{5.10} \]
Hence, by Proposition 5.3 together with an argument similar to that given in Theorem 4.18, we obtain

Theorem 5.4. We have the following decomposition of the $U_q(\mathfrak{gl}(m,n))$-crystal $B(Y_1) \otimes \cdots \otimes B(Y_l)$ into a disjoint union of connected components:
\[ B(Y_1) \otimes \cdots \otimes B(Y_l) = \bigsqcup_{Q \in LR(Y_1, \dots, Y_l)} B_Q, \tag{5.11} \]
where $B_Q$ is isomorphic to $B(Z)$ with $Q = (Q_1, \dots, Q_{l-1})$ and $\mathrm{sh}\, Q_{l-1} = Z/W$ for some $Z, W \in H(m,n)$.
Example 5.5. Suppose that each of $Y_1, \dots, Y_l$ consists of a single box. Then $B(Y_i) = \mathbf{B}$ for $1 \leq i \leq l$, and for each $(Q_1, \dots, Q_{l-1}) \in LR(Y_1, \dots, Y_l)$, by filling the empty box corresponding to $Y_1$ with 1 and replacing the entry of $Q_i$ by $i + 1$, we may identify $(Q_1, \dots, Q_{l-1})$ with a standard tableau $T$ with entries in $\mathbb{N}$ whose shape is an $(m,n)$-hook Young diagram. Thus we have
\[ \mathbf{B}^{\otimes l} = \bigsqcup_{T \in ST(m,n;l)} B_T, \tag{5.12} \]
where $ST(m,n;l)$ is the set of standard tableaux $T$ with entries in $\mathbb{N}$ such that $\mathrm{sh}\, T \in H(m,n)$ and $|\mathrm{sh}\, T| = l$, and $B_T$ is isomorphic to $B(\mathrm{sh}\, T)$. Furthermore, if we denote by $f_Z$ the number of standard tableaux of shape $Z$ with entries in $\mathbb{N}$ ($Z$ is not necessarily in $H(m,n)$), we can rewrite the above decomposition as follows:
\[ \mathbf{B}^{\otimes l} \cong \bigoplus_{Z \in H(m,n)} B(Z)^{\oplus f_Z}. \tag{5.13} \]
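The multiplicities $f_Z$ in (5.13) can be tabulated with a short sketch. It uses the hook length formula for the number of standard tableaux, which is a standard fact not stated in the text, and it reads the sum in (5.13) as running over the diagrams $Z \in H(m,n)$ with $|Z| = l$, where membership in $H(m,n)$ is taken to mean $\lambda_{m+1} \leq n$ (the condition used in Example 5.7). The helper names are hypothetical.

```python
from math import factorial


def num_standard_tableaux(shape):
    """f_Z via the hook length formula: |Z|! divided by the product of hook lengths."""
    if not shape:
        return 1
    # Column lengths of the diagram (the conjugate partition).
    conj = [sum(1 for row in shape if row > c) for c in range(shape[0])]
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            arm = row - j - 1        # boxes to the right in the same row
            leg = conj[j] - i - 1    # boxes below in the same column
            hooks *= arm + leg + 1
    return factorial(sum(shape)) // hooks


def partitions(n, max_part=None):
    """Yield all partitions of n as weakly decreasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def in_hook(shape, m, n):
    """(m, n)-hook condition: the (m+1)-st row has at most n boxes."""
    return len(shape) <= m or shape[m] <= n


def tensor_power_multiplicities(m, n, l):
    """Map each Z in H(m, n) with |Z| = l to its multiplicity f_Z."""
    return {Z: num_standard_tableaux(Z) for Z in partitions(l) if in_hook(Z, m, n)}
```

For instance, `tensor_power_multiplicities(1, 1, 4)` omits the shape (2, 2), which does not fit in the (1, 1)-hook.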
Remark 5.6. Note that if we view $x_1 \otimes \cdots \otimes x_l \in \mathbf{B}^{\otimes l}$ as the word $x_1 \cdots x_l$, then the bijection in Proposition 4.15 which sends $x_1 \otimes \cdots \otimes x_l$ to the pair consisting of $(\cdots(x_1 \leftarrow x_2) \leftarrow \cdots) \leftarrow x_l$ and the recording tableau of $x_1 \otimes \cdots \otimes x_l$ is the Robinson–Schensted correspondence for $(m,n)$-hook type [2, 20]. The connection between the Robinson–Schensted correspondence and the crystal basis for the $U_q(\mathfrak{gl}_n)$-module $V^{\otimes l}$ was first observed by Date, Jimbo and Miwa [4] (see also [15]). Motivated by this work, Kashiwara developed the crystal basis theory for the quantum groups associated with symmetrizable Kac–Moody algebras [11, 12].

Example 5.7. For each $k \geq 1$, let $R_k$ be the Young diagram consisting of a single row with $k$ boxes. Given a sequence $\lambda = (\lambda_1, \dots, \lambda_l)$ of positive integers, consider the tensor product $B(R_{\lambda_1}) \otimes \cdots \otimes B(R_{\lambda_l})$. Let $(Q_1, \dots, Q_{l-1}) \in LR(R_{\lambda_1}, \dots, R_{\lambda_l})$ have shape $Z = Z_l \supset Z_{l-1} \supset \cdots \supset Z_2 \supset Z_1 = R_{\lambda_1}$. Then $\mathrm{sh}\, Q_i$ has no two boxes in the same column and $Q_i$ is filled with 1. If we fill the empty boxes corresponding to $R_{\lambda_1}$ with 1 and replace the entries of $Q_i$ by $i + 1$, then we may identify $(Q_1, \dots, Q_{l-1})$ with a semistandard tableau $T$ with entries in $\mathbb{N}$ having shape $Z$ and content $\lambda$. Then we get
\[ B(R_{\lambda_1}) \otimes \cdots \otimes B(R_{\lambda_l}) = \bigsqcup_{T \in ST_0(m,n;\lambda)} B_T, \tag{5.14} \]
where $ST_0(m,n;\lambda)$ is the set of semistandard tableaux with entries in $\mathbb{N}$ of shape in $H(m,n)$ and content $\lambda$, and $B_T$ is isomorphic to $B(\mathrm{sh}\, T)$. In particular, if $\lambda_1 \geq \cdots \geq \lambda_l$ and $\lambda_{m+1} \leq n$, then the Young diagram $Y = (\lambda_1, \dots, \lambda_l)$ has an $(m,n)$-hook shape, and we have
\[ B(R_{\lambda_1}) \otimes \cdots \otimes B(R_{\lambda_l}) \cong B(Y) \oplus \bigoplus_{Z > Y} B(Z)^{\oplus K_{ZY}}, \tag{5.15} \]
where $Z = (\mu_1, \mu_2, \dots) \geq Y = (\lambda_1, \lambda_2, \dots)$ means that $\sum_{i=1}^{k} \mu_i \geq \sum_{i=1}^{k} \lambda_i$ for all $k \geq 1$, and $K_{ZY}$ is the number of semistandard tableaux $T$ with entries in $\mathbb{N}$ having shape $Z$ and content $Y$, called the Kostka number.
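The Kostka numbers and the dominance order appearing in (5.15) can be computed with a brief sketch. It counts semistandard tableaux of shape $Z$ and content $Y$ by peeling off, one value at a time, the horizontal strip occupied by the largest entry; this recursion is a standard technique and is not taken from the text. The function names are hypothetical.

```python
def dominates(mu, lam):
    """True if every partial sum of mu is at least the matching partial sum of lam."""
    s_mu = s_lam = 0
    for i in range(max(len(mu), len(lam))):
        s_mu += mu[i] if i < len(mu) else 0
        s_lam += lam[i] if i < len(lam) else 0
        if s_mu < s_lam:
            return False
    return True


def horizontal_strip_removals(shape, k):
    """Yield partitions inner with shape/inner a horizontal strip of k boxes."""
    rows = len(shape)

    def rec(i, remaining):
        if i == rows:
            if remaining == 0:
                yield ()
            return
        below = shape[i + 1] if i + 1 < rows else 0
        # At most shape[i] - below boxes may leave row i (no two in one column).
        for removed in range(min(shape[i] - below, remaining), -1, -1):
            for rest in rec(i + 1, remaining - removed):
                yield (shape[i] - removed,) + rest

    for inner in rec(0, k):
        yield tuple(x for x in inner if x > 0)


def kostka(shape, content):
    """Number of semistandard tableaux of the given shape and content."""
    shape, content = tuple(shape), tuple(content)
    if sum(shape) != sum(content):
        return 0
    if not content:
        return 1
    # The boxes holding the largest value form a horizontal strip.
    return sum(kostka(inner, content[:-1])
               for inner in horizontal_strip_removals(shape, content[-1]))
```

When shape and content are partitions of the same size, `kostka(Z, Y)` is nonzero exactly when `dominates(Z, Y)` holds, which is why only shapes $Z \geq Y$ contribute to (5.15).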
Example 5.8. For each $k \geq 1$, let $C_k$ be the Young diagram consisting of a single column with $k$ boxes. Given a sequence $\lambda = (\lambda_1, \dots, \lambda_l)$ of positive integers, consider the tensor product $B(C_{\lambda_1}) \otimes \cdots \otimes B(C_{\lambda_l})$. Let $(Q_1, \dots, Q_{l-1}) \in LR(C_{\lambda_1}, \dots, C_{\lambda_l})$ have shape $Z = Z_l \supset Z_{l-1} \supset \cdots \supset Z_2 \supset Z_1 = C_{\lambda_1}$. In this case, $\mathrm{sh}\, Q_i$ has no two boxes in the same row and $w(Q_i)$ is $1, 2, \dots, \lambda_{i+1}$. If we fill the empty boxes corresponding to $C_{\lambda_1}$ with 1 and replace the entries of $Q_i$ by $i + 1$, then we may identify $(Q_1, \dots, Q_{l-1})$ with the transpose of a semistandard tableau $T$ with entries in $\mathbb{N}$ having shape $Z^t$ and content $\lambda$, where $Z^t$ is the transpose of $Z$ obtained by reflecting $Z$ with respect to its diagonal. Hence, we have
\[ B(C_{\lambda_1}) \otimes \cdots \otimes B(C_{\lambda_l}) = \bigsqcup_{T \in ST_0(n,m;\lambda)} B_{T^t}, \tag{5.16} \]
where $B_{T^t}$ is isomorphic to $B(\mathrm{sh}\, T^t)$. If $\lambda_1 \geq \cdots \geq \lambda_l$ and $\lambda_{n+1} \leq m$, then $Y^t = (\lambda_1, \dots, \lambda_l)^t \in H(m,n)$, and we have
\[ B(C_{\lambda_1}) \otimes \cdots \otimes B(C_{\lambda_l}) \cong B(Y^t) \oplus \bigoplus_{Z > Y} B(Z^t)^{\oplus K_{ZY}}. \tag{5.17} \]
Example 5.9. Using the decomposition given in Theorem 5.4, we can describe the symmetries of tensor products of $U_q(\mathfrak{gl}(m,n))$-crystals. Let $Y_1, \dots, Y_l \in H(m,n)$ and let $S_l$ denote the symmetric group on $l$ letters. In [3], Benkart, Sottile and Stroomer constructed an explicit bijection
\[ s_\sigma : LR(Y_1, \dots, Y_l) \longrightarrow LR(Y_{\sigma(1)}, \dots, Y_{\sigma(l)}) \qquad \text{for } \sigma \in S_l \tag{5.18} \]
as an application of the combinatorial algorithm called tableau switching, and deduced the symmetries of the generalized Littlewood–Richardson coefficients:
\[ N^{Z}_{Y_1, \dots, Y_l} = N^{Z}_{Y_{\sigma(1)}, \dots, Y_{\sigma(l)}}, \tag{5.19} \]
where $Z$ is an $(m,n)$-hook Young diagram. Note that the inverse of $s_\sigma$ is $s_{\sigma^{-1}}$ (for a more general statement of their result, see Example 3.7 in [3]). For $T_1 \otimes \cdots \otimes T_l \in B(Y_1) \otimes \cdots \otimes B(Y_l)$ with $P = (\cdots(T_1 \leftarrow T_2) \leftarrow \cdots) \leftarrow T_l$ and recording tableau $Q$, we define
\[ C_\sigma(T_1 \otimes \cdots \otimes T_l) = @(P, s_\sigma(Q)). \tag{5.20} \]
Then we have
\[ C_\sigma \circ C_{\sigma^{-1}} = C_{\sigma^{-1}} \circ C_\sigma = \mathrm{id}, \qquad C_\sigma(B_Q) = B_{\sigma(Q)} \ \text{ for } Q \in LR(Y_1, \dots, Y_l), \]
\[ T_1 \otimes \cdots \otimes T_l \simeq C_\sigma(T_1 \otimes \cdots \otimes T_l) \in B(Y_{\sigma(1)}) \otimes \cdots \otimes B(Y_{\sigma(l)}). \tag{5.21} \]
Therefore,
\[ C_\sigma : B(Y_1) \otimes \cdots \otimes B(Y_l) \longrightarrow B(Y_{\sigma(1)}) \otimes \cdots \otimes B(Y_{\sigma(l)}) \tag{5.22} \]
becomes a $U_q(\mathfrak{gl}(m,n))$-crystal isomorphism.

Remark 5.10. For the classical Lie algebras, there is a description of the symmetry of the tensor product of crystals in terms of Littelmann's path models [16].

Acknowledgements. Part of this work was completed while the first author was visiting the Korea Institute for Advanced Study in the fall of 2000. He is very grateful to the faculty and staff members of the Korea Institute for Advanced Study for their hospitality and support during his visit. Also, we would like to thank Professor Marc A. A. van Leeuwen for his kind suggestion and the referee for pointing out that Definition 4.1 was incomplete in the first version of this paper.
References
1. Benkart, G., Kang, S.-J., Kashiwara, M.: Crystal bases for the quantum superalgebra Uq(gl(m, n)). J. Am. Math. Soc. 13, No. 2, 295–331 (2000)
2. Berele, A., Regev, A.: Hook Young diagrams with applications to combinatorics and to the representations of Lie superalgebras. Adv. Math. 64, 118–175 (1987)
3. Benkart, G., Sottile, F., Stroomer, J.: Tableau switching: Algorithms and Applications. J. Combin. Theory Ser. A 76, 11–43 (1996)
4. Date, E., Jimbo, M., Miwa, T.: Representations of Uq(gl(n, C)) at q = 0 and the Robinson–Schensted correspondence. In: Physics and mathematics of strings, Memorial volume of V. Knizhnik. L. Brink, D. Friedan, A. M. Polyakov (eds). Singapore: World Scientific, 1990
5. Fulton, W.: Young tableaux, with Application to Representation theory and Geometry. Cambridge: Cambridge Univ. Press, 1997
6. Harvey, J.A., Moore, G.: Algebras, BPS states, and strings. Nuclear Phys. B 463, 315–368 (1996)
7. Kac, V.G.: Lie superalgebras. Adv. Math. 26, 8–96 (1977)
8. Kac, V.G.: Infinite dimensional algebras, Dedekind's η-function, classical Möbius function, and the very strange formula. Adv. Math. 30, 85–136 (1978)
9. Kac, V.G.: Representations of classical Lie superalgebras. Lecture Notes in Math. 676, Berlin–Heidelberg–New York: Springer-Verlag, 1978, pp. 597–626
10. Kac, V.G., Wakimoto, M.: Modular invariant representations of infinite dimensional Lie algebras and Lie superalgebras. Proc. Natl. Acad. Sci. U.S.A. 85, 4956–4960 (1988)
11. Kashiwara, M.: Crystalizing the q-analogue of universal enveloping algebras. Commun. Math. Phys. 133, 249–260 (1990)
12. Kashiwara, M.: On crystal bases of the q-analogue of universal enveloping algebras. Duke Math. J. 63, 465–516 (1991)
13. Kashiwara, M.: The crystal base and Littelmann's refined Demazure character formula. Duke Math. J. 71, 839–858 (1993)
14. Kashiwara, M., Nakashima, T.: Crystal graphs for representations of the q-analogue of classical Lie algebras. J. Algebra 165, 295–345 (1994)
15. Leclerc, B., Thibon, J.-Y.: The Robinson–Schensted correspondence, crystal bases, and the quantum straightening at q = 0. The Foata Festschrift. Electron. J. Combin. 3 (1996), Research Paper 11, approx. 24 pp. (electronic)
16. van Leeuwen, M.A.A.: An analogue of jeu de taquin for Littelmann's crystal paths. Sem. Lothar. Combin. 41 (1998), Art. B41b, 23 pp. (electronic)
17. Nakashima, T.: Crystal base and a generalization of the Littlewood–Richardson rule for classical Lie algebras. Commun. Math. Phys. 154, 215–243 (1993)
18. Penkov, I., Serganova, V.: Representations of classical Lie superalgebras of type I. Indag. Math. (N.S.) 4, 419–466 (1992)
19. Penkov, I., Serganova, V.: Generic irreducible representations of finite dimensional Lie superalgebras. Int. J. Math. 5, 389–419 (1994)
20. Remmel, J.B.: The combinatorics of (k, l)-hook Schur functions. Contemp. Math. 34, 253–287 (1984)
21. Sagan, B.E.: The Symmetric Group: Representations, Combinatorial Algorithms and Symmetric Functions. Cole, California: Wadsworth and Brooks, 1991
22. Schensted, C.: Longest increasing and decreasing sequences. Canad. J. Math. 13, 179–191 (1961)
23. Scheunert, M.: The Theory of Lie superalgebras. Lecture Notes in Math. 716, Berlin–Heidelberg–New York: Springer-Verlag, 1979
24. Schützenberger, M.P.: La correspondence de Robinson. In: Combinatoire et Représentation de Groupe Symétrique, D. Foata ed., Lecture Notes in Math. 579, Berlin–Heidelberg–New York: Springer-Verlag, 1979, pp. 59–135
25. Thomas, G.P.: On Schensted's construction and the multiplication of Schur functions. Adv. Math. 30, 8–32 (1978)

Communicated by H. Araki
Commun. Math. Phys. 224, 733 – 781 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Cube–Root Boundary Fluctuations for Droplets in Random Cluster Models
Kenneth S. Alexander
Department of Mathematics, DRB 155, University of Southern California, Los Angeles, CA 90089-1113, USA. E-mail: [email protected]
Received: 20 January 2000 / Accepted: 7 August 2001
Abstract: For a family of bond percolation models on $\mathbb{Z}^2$ that includes the Fortuin–Kasteleyn random cluster model, we consider properties of the "droplet" that results, in the percolating regime, from conditioning on the existence of an open dual circuit surrounding the origin and enclosing at least (or exactly) a given large area $A$. This droplet is a close surrogate for the one obtained by Dobrushin, Kotecký and Shlosman by conditioning the Ising model; it approximates an area-$A$ Wulff shape. The local part of the deviation from the Wulff shape (the "local roughness") is the inward deviation of the droplet boundary from the boundary of its own convex hull; the remaining part of the deviation, that of the convex hull of the droplet from the Wulff shape, is inherently long-range. We show that the local roughness is described by at most the exponent 1/3 predicted by nonrigorous theory; this same prediction has been made for a wide class of interfaces in two dimensions. Specifically, the average of the local roughness over the droplet surface is shown to be $O(l^{1/3} (\log l)^{2/3})$ in probability, where $l = \sqrt{A}$ is the linear scale of the droplet. We also bound the maximum of the local roughness over the droplet surface and bound the long-range part of the deviation from a Wulff shape, and we establish the absence of "bottlenecks", which are a form of self-approach by the droplet boundary, down to scale $\log l$. Finally, if we condition instead on the event that the total area of all large droplets inside a finite box exceeds $A$, we show that with probability near 1 for large $A$, only a single large droplet is present.

1. Introduction

Consider an Ising model in a finite box $\Lambda$ (or other "nice" region) in $\mathbb{Z}^2$ at a supercritical inverse temperature $\beta$, with minus boundary condition. There is a positive magnetization $m(\beta)$, and if $\Lambda$ is large, the expected and actual fraction of plus spins observed in $\Lambda$ will each be approximately $(1 - m(\beta))/2$, with high probability.
Research supported by NSF grant DMS-9802368.
If, however, one conditions on the observed number of plus spins being sufficiently greater than the expected number, then, as first explicated by Dobrushin, Kotecký and Shlosman [12], the typical configuration contains a single macroscopic droplet of the plus phase, that is, a droplet in which the usual proportions of plus and minus spins are reversed. Further, the droplet will have a characteristic equilibrium crystal shape, the solution of an isoperimetric problem, given by the Wulff construction. If, for example, the temperature is such that the usual proportion of plus to minus spins is 20/80, and one conditions the fraction of plus spins to be 30%, then the typical configuration will show a 20/80 mix except inside a Wulff droplet in which the proportion is approximately 80/20. This droplet will cover approximately 1/6 of the box $\Lambda$, so as to account for nearly all of the excess plus spins. This is an example of the general phenomenon of phase separation. The work of Dobrushin, Kotecký and Shlosman in [12] provides the first rigorous derivation of phase separation beginning from a local interaction, at a fixed temperature.

In the joint construction [13] of the Ising model and the corresponding Fortuin–Kasteleyn random cluster model ([14]; see [15]), abbreviated "FK model", the droplet boundary appears as a circuit of open dual bonds. If a particular site, say the origin, is inside the droplet, one expects that the outermost open dual circuit $\Gamma_0$ containing the origin will closely approximate the droplet boundary. Since the Ising droplet has approximately a fixed area, we can gain information useful in studying the droplet by studying the FK model conditioned to have $\Gamma_0$ enclose at least, or exactly, a given area $A$. This is our main aim in this paper. Since it is of interest to study phase separation beyond the context of the Ising model, we establish our results not just for the FK model but for general percolation models having properties known, or reasonably expected, to hold quite widely in the percolating regime.
These include the FKG property, a special case of the Markov property, exponential decay of dual connectivity and certain mixing properties.

In [12] for very low temperatures, and in [17] for all subcritical temperatures, bounds are given for the boundary fluctuations of the Ising droplet, that is, for the deviation of the boundary of the observed droplet from the boundary of an appropriately translated and rescaled Wulff (that is, equilibrium crystal) shape. Let $N$ denote the linear scale of the box. For a droplet also of linear scale $N$, the boundary fluctuations are shown to be of order at most $N^{3/4} \sqrt{\log N}$, and for a droplet of linear scale $l_N = N^{\alpha}$, with $2/3 < \alpha < 1$, the boundary fluctuations are shown to be of order less than $N^{2/3} l_N$. There is typically never a droplet of linear scale $N^{\alpha}$ for $0 < \alpha < 2/3$; if the excess number of plusses is of order $N^{2\alpha}$ with $0 < \alpha < 2/3$, then these plusses are dispersed throughout the minus phase without the formation of a large droplet. Also, when phase separation does occur, other than the single large droplet there are typically no droplets of linear scale greater than $\log N$.

For the random cluster model context, in the special case of independent percolation it was proved in [1] that the boundary fluctuations are at most of order $l^{2/3} (\log l)^{1/3}$ for a droplet of linear scale $l$. In fact, for that special case, Theorem 3.1 of [1] is close to Theorem 5.8 of this paper and can be substituted for it in obtaining our main result (2.13). Nearly all the technical difficulties encountered for random cluster models are absent for independent percolation, though, due to the availability of the van den Berg–Kesten inequality [27].

Heuristics suggest that the boundary fluctuation bounds of [1, 12] and [17] are not sharp. To see what the correct fluctuation size should be, one must refine the analysis by considering three separate types of fluctuations.
The first type is shrinkage – the actual droplet may be smaller than an ideal “full size” Wulff shape large enough to account for all excess plus spins, since some of the excess may be dispersed in the surrounding
minus phase. So one should actually consider fluctuations about a shrunken Wulff shape enclosing the same area as the actual droplet. The second type is local roughness, defined as inward deviations of the droplet boundary from the boundary of its own convex hull. The third type is long-wave fluctuations, defined as deviations of the convex hull of the droplet from the shrunken Wulff shape. (More precise definitions will be made below.) The local roughness is of particular interest, and is our main focus in this paper, because it is subject to the same type of interface-roughness heuristics as a wide variety of other dynamic and equilibrium systems, including first-passage percolation [22, 20], various deposition models [18], polymers in random environments [23], asymmetric exclusion processes [18] and longest increasing subsequences of random permutations [8], only the last of which is now well-understood rigorously. For a two-dimensional object of linear scale l, these heuristics predict fluctuations of order l 1/3 and a transverse correlation length of order l 2/3 . For the local roughness, this transverse correlation length should appear as the typical separation between adjacent extreme points, where the droplet boundary touches the boundary of its convex hull (see Sect. 2.) This is what makes local roughness local – one expects distinct inward excursions of the droplet boundary from the convex hull boundary to interact only minimally. The main result of this paper is that in the random cluster model context, with probability approaching 1 as l → ∞, the average local roughness is O(l 1/3 (log l)2/3 ). The above-mentioned heuristic connection between interface models and longest increasing subsequences is perhaps most readily understood via the Poisson-process construction of the longest increasing subsequence (see e.g. [10]). Consider a unitdensity Poisson process in the upper right quadrant of the plane. 
For $z$ in that quadrant let $N(z)$ be the number of Poisson sites in $[0, z_1] \times [0, z_2]$ and let $T(z)$ be the maximum number of Poisson sites in any path from 0 to $z$ which moves only upward and to the right. Then the length of the longest increasing subsequence of a random permutation of length $n$ has the distribution of $T(z)$ given $N(z) = n$; this length is asymptotic to $2\sqrt{n}$ in probability [28]. Therefore (provided $2\sqrt{n}$ is an integer) one can view the set $\{z : T(z) = 2\sqrt{n}\}$ as an "interface" approximating the curve $\{(z_1, z_2) \in \mathbb{R}^2 : z_1 > 0,\ z_2 > 0,\ z_1 z_2 = n\}$, and the fluctuations of this interface are related to the variability of the length of the longest increasing subsequence. A more complete comparison of the heuristics of local roughness and longest increasing subsequences is given in the next section.

Results of Dobrushin and Hryniv [11] and Hryniv [16] (at very low temperatures) strongly suggest that the fluctuations of the droplet boundary about the shrunken Wulff shape should be Gaussian, heuristically resembling roughly a rescaled Brownian bridge added radially to the Wulff shape. In particular, the long-wave fluctuations should be of order $l^{1/2}$. We are only able to bound these by $l^{2/3} (\log l)^{1/3}$, however, in the random cluster model context.

2. Definitions, Heuristics and Statement of Main Results

The results in this paper make use of only a few basic properties of the FK or other percolation model, so we will state our results for general bond percolation models satisfying these properties. A bond, denoted $\langle xy \rangle$, is an unordered pair of nearest neighbor sites of $\mathbb{Z}^2$. When convenient we view bonds as being open line segments in the plane; this should be clear from the context. In particular for $R \subset \mathbb{R}^2$, $B(R)$ denotes the set of all bonds for which the corresponding open line segments are contained in $R$, and when we refer to distances between sets of bonds, we mean distances between the
corresponding sets of line segments. The exception is for $\Lambda \subset \mathbb{Z}^2$, for which we set $B(\Lambda) = \{\langle xy \rangle : x, y \in \Lambda\}$. (Again, this should be clear from the context.) For a set $D$ of bonds we let $V(D)$ denote the set of all endpoints of bonds in $D$, and
\[ \partial D = \{\langle xy \rangle : x \in V(D),\ y \notin V(D)\}, \qquad \overline{D} = D \cup \partial D. \]
We write $\overline{B}(\Lambda)$ for $\overline{B(\Lambda)}$. A bond configuration is an element $\omega \in \{0,1\}^{B(\mathbb{Z}^2)}$. The dual lattice is the translation of the integer lattice by $(1/2, 1/2)$; we write $x^*$ for $x + (1/2, 1/2)$. To each (regular) bond $e$ of the lattice there corresponds a dual bond $e^*$ which is its perpendicular bisector; the dual bond is defined to be open in a configuration $\omega$ precisely when the regular bond is closed, and the corresponding configuration of dual bonds is denoted $\omega^*$. We write $(\mathbb{Z}^2)^*$ for $\{x^* : x \in \mathbb{Z}^2\}$. A cluster in a given configuration is a connected component of the graph with site set $\mathbb{Z}^2$ and all open bonds; dual clusters are defined analogously for open dual bonds. (In contexts where there is a boundary condition consisting of a configuration on the complement $D^c$ for some set $D$ of bonds, a cluster may include bonds in $D^c$.) $C_x$ and $C_{x^*}$ denote the regular and dual clusters containing sites $x$ and $x^*$, respectively. Given a set $D$ of bonds, we write $D^*$ for $\{e^* : e \in D\}$. The set of all endpoints of bonds in $D^*$ is denoted $V^*(D)$ or $V^*(D^*)$. For $\Lambda \subset \mathbb{Z}^2$ or $\Lambda \subset (\mathbb{Z}^2)^*$ we define
\[ \partial \Lambda = \{x \notin \Lambda : x \text{ adjacent to } \Lambda\}, \qquad \partial_{\mathrm{in}} \Lambda = \{x \in \Lambda : x \text{ adjacent to } \Lambda^c\}, \]
where adjacency is in the appropriate lattice $\mathbb{Z}^2$ or $(\mathbb{Z}^2)^*$. A (dual) path is a sequence $\gamma = (x_0, \langle x_0 x_1 \rangle, x_1, \dots, x_{n-1}, \langle x_{n-1} x_n \rangle, x_n)$ of alternating (dual) sites and bonds. $\gamma$ is self-avoiding if all sites are distinct. We write $x \leftrightarrow y$ (in $\omega$) if there is a path of open bonds (or open dual bonds, if $x$ and $y$ are dual sites) from $x$ to $y$ in $\omega$. A circuit is a path with $x_n = x_0$ which has all bonds distinct and which does not "cross itself" (in the obvious sense). Note that we do allow $x_i = x_j$ for some $i \neq j$ here, i.e. a circuit may touch itself without crossing. A path or circuit is open in a bond configuration $\omega$ if all its bonds are open. The exterior of a circuit $\gamma$, denoted $\mathrm{Ext}(\gamma)$, is the unique unbounded component of the complement of $\gamma$ in $\mathbb{R}^2$, and the interior $\mathrm{Int}(\gamma)$ is the union of the bounded components. An open circuit $\gamma$ is called an exterior circuit in a configuration $\omega$ if $\gamma \cup \mathrm{Int}(\gamma)$ is maximal among all open circuits in $\omega$. (These definitions differ slightly from what is common in the literature.) Similar definitions apply to dual circuits. A site $x$ is surrounded by at most one exterior circuit; when this circuit exists we denote it $\Gamma_x$.

For $u, v$ points in a path or circuit $\zeta$, let $\zeta^{[u,v]}$ and $\zeta^{(u,v)}$ denote the closed and open segments, respectively, of $\zeta$ from $u$ to $v$ (in the direction of positive orientation, for circuits). $|\cdot|$ denotes the Euclidean norm for vectors, Euclidean length for curves, cardinality for finite sets, and Lebesgue measure for regions in $\mathbb{R}^2$ (which one should be clear from the context). Euclidean distance is denoted $d(\cdot, \cdot)$. Define $d(A, B) = \inf\{d(x, y) : x \in A,\ y \in B\}$ for $A, B \subset \mathbb{R}^2$ and $d(x, A) = d(\{x\}, A)$.

We define the average local roughness of a circuit $\gamma$ by
\[ \mathrm{ALR}(\gamma) = \frac{|\mathrm{Co}(\gamma) \setminus \mathrm{Int}(\gamma)|}{|\partial\, \mathrm{Co}(\gamma)|}, \]
where $\mathrm{Co}(\cdot)$ denotes the convex hull. The maximum local roughness is
\[ \mathrm{MLR}(\gamma) = \sup\{d(x, \partial\, \mathrm{Co}(\gamma)) : x \in \gamma\}. \]
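The two roughness functionals can be illustrated numerically with a small sketch. It assumes the circuit is given as the vertex list of a simple (non-self-touching) polygon with a non-degenerate convex hull, so that the area of Co(γ) \ Int(γ) is the hull area minus the polygon area. The maximum local roughness is evaluated at the vertices only, which in general gives a lower bound for the supremum over all points of γ. All function names are hypothetical.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula (absolute value)."""
    n = len(vertices)
    twice = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                - vertices[(i + 1) % n][0] * vertices[i][1] for i in range(n))
    return abs(twice) / 2.0


def perimeter(vertices):
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment ab (a != b)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(p[0] - (a[0] + t * dx), p[1] - (a[1] + t * dy))


def average_local_roughness(circuit):
    """ALR = |Co(gamma) minus Int(gamma)| / |boundary of Co(gamma)|."""
    hull = convex_hull(circuit)
    return (polygon_area(hull) - polygon_area(circuit)) / perimeter(hull)


def max_local_roughness_at_vertices(circuit):
    """Largest distance from a vertex of the circuit to the hull boundary."""
    hull = convex_hull(circuit)
    edges = [(hull[i], hull[(i + 1) % len(hull)]) for i in range(len(hull))]
    return max(min(point_segment_distance(p, a, b) for a, b in edges)
               for p in circuit)
```

For an L-shaped circuit such as `[(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]`, the hull cuts off one triangular notch, and a convex circuit has roughness zero under both functionals.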
By a bond percolation model we mean a probability measure $P$ on $\{0,1\}^{B(\mathbb{Z}^2)}$. The conditional distributions for the model $P$ are
\[ P_{D,\rho} = P(\,\cdot \mid \omega_e = \rho_e \text{ for all } e \in D^c), \]
where $D \subset B(\mathbb{Z}^2)$. We say a bond percolation model $P$ has bounded energy if there exists $p_0 > 0$ such that
\[ p_0 < P(\omega_e = 1 \mid \omega_b,\ b \neq e) < 1 - p_0 \qquad \text{for all } \{\omega_b,\ b \neq e\}. \tag{2.1} \]
From [9], bounded energy and translation invariance imply that there is at most one infinite cluster $P$-a.s. Write $\omega_D$ for $\{\omega_e : e \in D\}$ and let $\mathcal{G}_D$ denote the $\sigma$-algebra generated by $\omega_D$. $P$ has exponential decay of dual connectivity if for some $C, \lambda > 0$ we have
\[ P(x^* \leftrightarrow y^*) \leq C e^{-\lambda |y - x|} \qquad \text{for all } x^*, y^*. \]
$P$ has the weak mixing property if for some $C, \lambda > 0$, for all finite sets $D, E$ with $D \subset E$,
\[ \sup\bigl\{ \mathrm{Var}\bigl(P_{E,\rho}(\omega_D \in \cdot),\ P_{E,\rho'}(\omega_D \in \cdot)\bigr) : \rho, \rho' \in \{0,1\}^{E^c} \bigr\} \leq C \sum_{x \in V(D),\, y \in V(E^c)} e^{-\lambda |x - y|}, \]
where $\mathrm{Var}(\cdot, \cdot)$ denotes total variation distance between measures. Roughly, the influence of the boundary condition on a finite region decays exponentially with distance from that region. Equivalently, for some $C, \lambda > 0$, for all sets $D, F \subset B(\mathbb{Z}^2)$,
\[ \sup\bigl\{ |P(E \mid F) - P(E)| : E \in \mathcal{G}_D,\ F \in \mathcal{G}_F,\ P(F) > 0 \bigr\} \leq C \sum_{x \in V(D),\, y \in V(F)} e^{-\lambda |x - y|}. \tag{2.2} \]
$P$ has the ratio weak mixing property if for some $C, \lambda > 0$, for all sets $D, F \subset B(\mathbb{Z}^2)$,
\[ \sup\Bigl\{ \Bigl| \frac{P(E \cap F)}{P(E) P(F)} - 1 \Bigr| : E \in \mathcal{G}_D,\ F \in \mathcal{G}_F,\ P(E) P(F) > 0 \Bigr\} \leq C \sum_{x \in V(D),\, y \in V(F)} e^{-\lambda |x - y|}, \tag{2.3} \]
whenever the right side of (2.3) is less than 1. Let $\mathrm{Open}(D)$ denote the event that all bonds in $D$ are open.

The FK model [14] with parameters $(p, q)$, $p \in [0, 1]$, $q > 0$, on a finite $D \subset B(\mathbb{Z}^2)$ is described by a weight attached to each bond configuration $\omega \in \{0,1\}^{D}$, which is
\[ W(\omega) = p^{|\omega|} (1 - p)^{|D| - |\omega|} q^{C(\omega)}, \]
where $|\omega|$ denotes the number of open bonds in $\omega$ and $C(\omega)$ denotes the number of open clusters in $\omega$, counted in accordance with the boundary condition, if any; see [15] for details and further information. For integer $q \geq 1$ the FK model is a random cluster representation of the $q$-state Potts model at inverse temperature $\beta$ given by $p = 1 - e^{-\beta}$.

For the study of phase separation involving more than two species, for example in the Potts model, it is useful to be able to "tilt" the distribution with one or more external fields before calculating various probabilities, as well as quantities such as surface tension and magnetization. For the $q$-state Potts model with external fields $h_i$ on species $i$, $i = 0, 1, \dots, q - 1$, we need only consider $0 = h_0 \geq h_1 \geq \dots \geq h_{q-1}$, and then the factor $q^{C(\omega)}$ in the weight $W(\omega)$ is replaced by
\[ \prod_{C \in \mathcal{C}(\omega)} \bigl( 1 + (1 - p)^{h_1 |C|} + \dots + (1 - p)^{h_{q-1} |C|} \bigr), \]
where $\mathcal{C}(\omega)$ is the set of clusters in the configuration $\omega$ and $|C|$ denotes the number of sites in the cluster $C$. We call species $i$ stable if $h_i$ is maximal, i.e. $h_i = h_0$. For each species $i$ and for finite $\Lambda \subset \mathbb{Z}^2$, corresponding to the species-$i$ boundary condition for the Potts model on $\Lambda$ there is the $i$-wired boundary condition on $\overline{B}(\Lambda)$ for the FK model, in which sites in $\Lambda$ connected to $\partial \Lambda$ are considered a single cluster $C_\partial$ and assigned weight $(1 - p)^{h_i |C_\partial|}$. Given a circuit $\gamma$ and a configuration $\rho \in \{0,1\}^{B(\mathrm{Ext}(\gamma))}$, conditioning on $\rho$ and on $\mathrm{Open}(\gamma)$ induces a boundary condition on $B(\mathrm{Int}(\gamma))$ which is a mixture over $i$ of the different $i$-wired boundary conditions. The weight assigned to $i$-wiring in this mixture is proportional to $(1 - p)^{h_i N(\rho)}$, where $N(\rho)$ is the number of sites in $\gamma$ plus the number of sites in $\mathrm{Ext}(\gamma)$ connected to $\gamma$ in $\rho$. In the absence of external fields, $i$-wiring is the same for all $i$ and the choice of $\rho$ does not affect the boundary condition induced on $B(\mathrm{Int}(\gamma))$, which is a form of Markov property, but if an external field is present this property fails. However, the weight assigned to $i$-wiring in the mixture for unstable $i$ is exponentially small in $|\gamma|$, uniformly in $\rho$, so for large $\gamma$ the effect of $\rho$ on the boundary condition is uniformly small.

Motivated by the preceding, we say a bond percolation model $P$ has the Markov property for open circuits if for every circuit $\gamma$ (of regular bonds), the bond configurations inside and outside $\gamma$ are independent given the event $\mathrm{Open}(\gamma)$. We have seen that the FK model has this property if and only if there are no external fields. If $P$ is the infinite-volume $k$-wired FK model for some stable $k$, then letting $\omega_{\mathrm{int}}$ and $\omega_{\mathrm{ext}}$ denote the bond configurations inside and outside $\gamma$, respectively, we have from the preceding discussion for some $C, a > 0$,
\[ \sup\Bigl\{ \Bigl| \frac{P(\omega_{\mathrm{int}} \in A \mid \mathrm{Open}(\gamma),\ \omega_{\mathrm{ext}} \in B)}{P(\omega_{\mathrm{int}} \in A \mid \mathrm{Open}(\gamma))} - 1 \Bigr| : A \in \mathcal{G}_{B(\mathrm{Int}(\gamma))},\ B \in \mathcal{G}_{B(\mathrm{Ext}(\gamma))} \Bigr\} \leq C e^{-a |\gamma|} \qquad \text{for all } \gamma. \tag{2.4} \]
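When (2.4) holds we say $P$ has the near-Markov property for open circuits. It is easy to see that one can interchange the roles of interior and exterior in (2.4). Further, if $\gamma_1, \dots, \gamma_k$ are circuits with disjoint interiors, $B_i \in \mathcal{G}_{\mathcal{B}(\mathrm{Int}(\gamma_i))}$, $A \in \mathcal{G}_{\mathcal{B}(\cap_i \mathrm{Ext}(\gamma_i))}$, then by easy induction on $k$,
$$\frac{P(A \mid \mathrm{Open}(\gamma_i) \cap B_i \text{ for all } i \le k)}{P(A \mid \mathrm{Open}(\gamma_i) \text{ for all } i \le k)} \le \prod_{i \le k} \frac{1 + Ce^{-a|\gamma_i|}}{1 - Ce^{-a|\gamma_i|}} \tag{2.5}$$
and
$$\frac{P(A \mid \mathrm{Open}(\gamma_i) \text{ for all } i \le k)}{P(A \mid \mathrm{Open}(\gamma_i) \cap B_i \text{ for all } i \le k)} \le \prod_{i \le k} \frac{1 + Ce^{-a|\gamma_i|}}{1 - Ce^{-a|\gamma_i|}}. \tag{2.6}$$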
An event $A \subset \{0,1\}^{\mathcal{B}(\mathbb{Z}^2)}$ is called increasing if $\omega \in A$ and $\omega \le \omega'$ imply $\omega' \in A$. Here $\omega \le \omega'$ refers to the natural coordinatewise partial ordering. A bond percolation model $P$ has the FKG property if $A, B$ increasing implies $P(A \cap B) \ge P(A)P(B)$.

Throughout the paper, $\epsilon_1, \epsilon_2, \dots$, $c_1, c_2, \dots$ and $K_1, K_2, \dots$ are constants which depend only on $P$. We reserve $\epsilon_i$ for constants which are "sufficiently small", $K_i$ for constants which are "sufficiently large", and $c_i$ for those which fall in neither category. Our basic assumptions will be that

$P$ is translation-invariant, is invariant under $90^\circ$ rotation, and has the FKG property, bounded energy and exponential decay of dual connectivity, and $P_{\Lambda,\rho}$ has the FKG property for all $\Lambda, \rho$. (2.7)
When necessary we will also assume weak mixing, ratio weak mixing and/or the near-Markov property for open circuits. Since $P$ has the FKG property, $-\log P(0^* \leftrightarrow x^*)$ is a subadditive function of $x$, and therefore the limit
$$\tau(x) = \lim_{n \to \infty} -\frac{1}{n} \log P(0^* \leftrightarrow (nx)^*) \tag{2.8}$$
exists for $x \in \mathbb{Q}^2$, provided we take the limit through values of $n$ for which $nx \in \mathbb{Z}^2$. This definition extends to $\mathbb{R}^2$ by continuity (see [2]); the resulting $\tau$ is a norm on $\mathbb{R}^2$ when the dual connectivity decays exponentially (i.e. $\tau(x)$ is positive for all $x \ne 0$, or equivalently by lattice symmetry, $\tau(x)$ is positive for some $x \ne 0$; we abbreviate this by saying $\tau$ is positive). By standard subadditivity results,
$$P(0^* \leftrightarrow x^*) \le e^{-\tau(x)} \quad \text{for all } x. \tag{2.9}$$
In the opposite direction, it is known [4] that if $\tau$ is positive, ratio weak mixing holds and some milder assumptions hold, then for some $\epsilon_1$ and $K_1$,
$$P(0^* \leftrightarrow x^*) \ge \epsilon_1 |x|^{-K_1} e^{-\tau(x)} \quad \text{for all } x \ne 0. \tag{2.10}$$
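It follows from the fact that the surface tension $\tau$ is a norm on $\mathbb{R}^2$ with axis symmetry that, letting $e_i$ denote the $i$th unit coordinate vector, for $\kappa_\tau = \tau(e_1)$ we have
$$\frac{1}{\sqrt{2}}\,\kappa_\tau \le \frac{\tau(x)}{|x|} \le \sqrt{2}\,\kappa_\tau \quad \text{for all } x \ne 0. \tag{2.11}$$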
For a curve $\gamma$ tracing the boundary of a convex region we define the $\tau$-length of $\gamma$ as the line integral
$$\mathcal{W}(\gamma) = \int_\gamma \tau(v_x)\, dx,$$
where $v_x$ is the unit forward tangent vector at $x$ and $dx$ is arc length. The Wulff shape is the convex set $\mathcal{K}_1 = \mathcal{K}_1(\tau)$ which minimizes $\mathcal{W}(\partial V)$ subject to the constraint $|V| = 1$.
(We also refer to multiples of $\mathcal{K}_1$ as Wulff shapes, when confusion is unlikely.) The Wulff shape actually minimizes $\mathcal{W}$ over a much larger class of $\gamma$ than just boundaries of convex sets ([24, 25]) but that fact will not concern us here. We define $w_1 = \mathcal{W}(\partial \mathcal{K}_1)$. Let $d_\tau(\cdot,\cdot)$ denote $\tau$-distance; $\mathrm{diam}(\cdot)$ and $\mathrm{diam}_\tau(\cdot)$ denote Euclidean diameter and $\tau$-diameter, respectively. $B(x,r)$ and $B_\tau(x,r)$ denote the closed Euclidean and $\tau$-balls, respectively, of radius $r$ about $x$. We write $x + A$ for the translation of the set $A$ by the vector $x$. $d_H$ denotes Hausdorff distance. The deviation of a closed curve $\gamma$ from the boundary of an area-$A$ Wulff shape is given by
$$\Delta_A(\gamma) = \inf_z d_H\bigl(\gamma,\ z + \partial(\sqrt{A}\,\mathcal{K}_1)\bigr).$$
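To make the definition of $\tau$-length concrete, here is a minimal Python sketch for closed polygons. Because $\tau$ is a norm, the line integral over a straight edge reduces to $\tau$ of the edge vector, so the $\tau$-length of a polygon is the sum of $\tau$ over its edges. The function names and the choice of the $\ell^1$ norm as a stand-in for $\tau$ are assumptions made for illustration only; the true surface tension of a given model is not known in closed form in general.

```python
def tau_l1(v):
    """Hypothetical stand-in for the surface tension: the l1 norm on R^2."""
    return abs(v[0]) + abs(v[1])


def tau_length(vertices, tau=tau_l1):
    """Tau-length of the closed polygon through `vertices` (listed in order).

    For a norm tau, the integral of tau(unit tangent) over an edge equals
    tau(edge vector) by homogeneity, so we just sum over edges.
    """
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        total += tau((x1 - x0, y1 - y0))
    return total


def tau_diameter(vertices, tau=tau_l1):
    """Tau-diameter of a finite point set (maximum pairwise tau-distance)."""
    return max(tau((p[0] - q[0], p[1] - q[1]))
               for p in vertices for q in vertices)


# Usage: the unit square under the l1 stand-in norm.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(tau_length(square), tau_diameter(square))
```

For a convex polygon the $\tau$-diameter is attained at a pair of vertices, so the vertex-only maximum above is enough in that case.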
As a convention, whenever we refer to the object in a finite class which maximizes or minimizes something, we implicitly assume there is a deterministic algorithm for breaking ties.

Our description of heuristics for the local roughness is nonrigorous, so we permit ourselves the following partly vague assumptions:

(i) The Wulff shape boundary has curvature bounded away from 0 and $\infty$.
(ii) For a droplet of any linear scale $l$, there is a characteristic length scale $\xi = \xi(l)$ representing the typical spacing between adjacent extreme points where the droplet touches the boundary of its convex hull.
(iii) On any length scale $n \le \xi$ the fluctuations of the droplet boundary are of order $\sqrt{n}$.

Here (iii) is reasonable because within each inward excursion between extreme points, the droplet boundary is nearly unconstrained, except by surface tension. (i) is known for the Ising case from the exact solution (see [7, 21]). Under (i), an arc of $\partial(l\mathcal{K}_1)$ of length $n$ deviates from the corresponding secant line by a distance of order $n^2/l$, so we call $n^2/l$ the curvature deviation (on scale $n$). On the characteristic scale $\xi$ the fluctuations and the curvature deviation should be of the same order, that is,
$$\frac{\xi^2}{l} \approx \sqrt{\xi}. \tag{2.12}$$
To see this, consider two adjacent extreme points $x$ and $y$ of the droplet boundary separated by a distance of order $\xi$. If the curvature deviation $\xi^2/l \gg \sqrt{\xi}$ this means the boundary between $x$ and $y$ is following the straight segment $xy$ much more closely than it follows the arc of $\partial(l\mathcal{K}_1)$ from $x$ to $y$, that is, the droplet has an approximate facet from $x$ to $y$. But such facets are isoperimetrically disadvantageous since the Wulff shape lacks them under (i), so this is not a probable picture. Therefore we expect $\xi^2/l \le \sqrt{\xi}$. Another point of view is obtained by viewing the droplet boundary as a curve fluctuating above and below the boundary of the approximating Wulff shape. Consider to be specific the highest point $u$ of the approximating Wulff shape. Suppose the nearest extreme points $x, y$ of the droplet boundary to the left and right are at a horizontal distance $\xi$ from $u$. The Wulff-shape boundary at distance $\xi$ is an amount of order $\xi^2/l$ lower than it is at $u$, but at least one of $x, y$ must be higher than the droplet boundary point above $u$. Thus the fluctuation at $x$ or $y$ must overcome the curvature deviation to achieve the necessary height, which means again that $\xi^2/l \le \sqrt{\xi}$.

On the other hand, if the curvature deviation $\xi^2/l \ll \sqrt{\xi}$ then even on length scales $n \gg \xi$ an arc of the Wulff shape boundary looks nearly flat compared to the droplet
boundary fluctuations, so along such an arc the roughly $n/\xi$ extreme points appear as a large number of local maxima of the droplet boundary above the Wulff shape boundary, all approximately collinear. This too is an unlikely picture, so we expect $\xi^2/l \ge \sqrt{\xi}$. From (2.12) we get $\xi \approx l^{2/3}$, and then from (iii), we expect local roughness of order $l^{1/3}$.

The same relation (2.12) occurs in the assorted systems mentioned in the introduction, and it is worth comparing the heuristics of the present problem to the best-understood case, longest increasing subsequences. Consider the process $T(z)$ defined in the introduction. An up-right path from $(0,0)$ to $(2l, 2l)$ which is maximal (i.e. contains the maximum number of Poisson sites) deviates transversely from the diagonal by some typical distance $\tilde{\xi}$. This same typical distance characterizes the location, relative to the center point $(l,l)$, of the maximum of $T(z)$ as $z$ varies along the transverse diagonal $L_{2l} = \{(z_1, z_2) : z_1 + z_2 = 2l\}$. The mean of $T(z)$ varies as $\sqrt{z_1 z_2}$, hence is reduced by an amount of order $\tilde{\xi}^2/l$ at distance $\tilde{\xi}$ from $(l,l)$ along $L_{2l}$. This reduction is like the curvature deviation in that it corresponds to the deviation of the straight line $L_{2l}$ from the curved line $\{z : z_1 z_2 = l^2\}$ of constant mean. As with the local roughness, the fluctuations of $T(z)$ must overcome the reduction in mean in order that the maximum occur at transverse distance $\tilde{\xi}$, so we have $\tilde{\xi}^2/l \le \sqrt{\tilde{\xi}}$. Essentially the same "collinear local maxima" picture described above applies to $T(z)$ varying along $L_{2l}$, so we have also $\tilde{\xi}^2/l \ge \sqrt{\tilde{\xi}}$ and thus $\tilde{\xi}$ of order $l^{2/3}$.

For $r > q > 0$, a $(q,r)$-bottleneck in an exterior dual circuit $\gamma$ is an ordered pair $(u,v)$ of sites in $\gamma$ such that there exists a path of length at most $q$ from $u$ to $v$ in $\mathrm{Int}(\gamma)$, and the segments $\gamma^{[u,v]}$ and $\gamma^{[v,u]}$ each have diameter at least $r$.
When $r$ is not very large (as in our main theorem, where $r$ can be of order $\log l$) the absence of $(q,r)$-bottlenecks reflects a high degree of regularity in the structure of the boundary. Note, however, that only outward protuberances of the boundary count as bottlenecks; we do not establish the absence of inward protuberances, though we anticipate this could be accomplished by similar methods. Our main theorem is the following.

Theorem 2.1. Let $P$ be a percolation model on $\mathcal{B}(\mathbb{Z}^2)$ satisfying (2.7), the near-Markov property for open circuits, and the ratio weak mixing property. There exist $K_i, \epsilon_i$ such that for $A > K_2$ and $l = \sqrt{A}$, under the measure $P(\cdot \mid |\mathrm{Int}(\Gamma_0)| \ge A)$ with probability approaching 1 as $A \to \infty$ we have
$$\mathrm{ALR}(\Gamma_0) \le K_3\, l^{1/3} (\log l)^{2/3}, \tag{2.13}$$
$$\Delta_A(\partial \mathrm{Co}(\Gamma_0)) \le K_4\, l^{2/3} (\log l)^{1/3}, \tag{2.14}$$
$$\mathrm{MLR}(\Gamma_0) \le K_5\, l^{2/3} (\log l)^{1/3}, \tag{2.15}$$
and, for $\epsilon_2 \sqrt{A} \ge r \ge 15q \ge K_6 \log A$,
$$\Gamma_0 \text{ is } (q,r)\text{-bottleneck-free}. \tag{2.16}$$

It is easy to see that if $\mathrm{MLR}(\gamma)$ and $\Delta_A(\partial \mathrm{Co}(\gamma))$ are each a sufficiently small fraction of $\sqrt{A}$, then $\mathrm{MLR}(\gamma) = d_H(\gamma, \partial \mathrm{Co}(\gamma))$ and hence
$$\Delta_A(\gamma) \le \Delta_A(\partial \mathrm{Co}(\gamma)) + \mathrm{MLR}(\gamma). \tag{2.17}$$
Here "sufficiently small" does not depend on $A$. Hence provided $A$ is large, (2.14) and (2.15) imply
$$\Delta_A(\Gamma_0) \le (K_4 + K_5)\, l^{2/3} (\log l)^{1/3}. \tag{2.18}$$
Theorem 7.4 below shows that one may condition on $|\mathrm{Int}(\Gamma_0)| = A$ instead of on $|\mathrm{Int}(\Gamma_0)| \ge A$ in Theorem 2.1. For Bernoulli bond percolation, (2.14) was established in [1]. As mentioned in the introduction, the exponent 2/3 appearing in (2.14) and (2.15) is presumably not optimal, with the optimal value instead being 1/2.

Theorem 2.1 does not of course fully establish the validity of the heuristic (2.12) or of its implication that $\mathrm{ALR}(\Gamma_0)$ should be of order $l^{1/3}$. For one thing it gives only an upper bound. In the case of Bernoulli percolation, lower bounds of order $l^{1/3}(\log l)^{2/3}$ for $\mathrm{MLR}(\Gamma_0)$ were recently proved in [26]. Second, the proof does not follow the heuristic. It is instead based on the idea that the "cost" of $\Gamma_0$ is really given by the $\tau$-length of the boundary of its convex hull. A large value of $\mathrm{ALR}(\Gamma_0)$ "pushes out" the convex hull boundary and thus increases the cost.

The basic strategy of the proof of Theorem 2.1 is like that of [1, 6, 12] and [17]: one establishes a lower bound for $P(|\mathrm{Int}(\Gamma_0)| \ge A)$ and upper bounds for the (unconditional) probability that $|\mathrm{Int}(\Gamma_0)| \ge A$ but $\Gamma_0$ does not have the desired behavior, these upper bounds being much smaller than the lower bound. Both the upper and lower bounds involve coarse-graining to create a skeleton $(y_0, \dots, y_n, y_0)$ for $\Gamma_0$, and both require showing that the various connections $y_j \leftrightarrow y_{j+1}$ occur approximately independently, that is, $P(y_0 \leftrightarrow \cdots \leftrightarrow y_n \leftrightarrow y_0)$ can be approximated in an appropriate sense by
$$\prod_{j \le n} P(y_j \leftrightarrow y_{j+1}),$$
and hence by
$$\exp\Bigl( - \sum_{j \le n} \tau(y_{j+1} - y_j) \Bigr)$$
(setting $y_{n+1} = y_0$). For the lower bound this is a straightforward application of the FKG inequality, but for the upper bound the mixing properties established in Sect. 3 are needed, and our methods must handle "pathological" forms of $\Gamma_0$ having many near-self-intersections. The difficulties are of two types. First, near-independence generally requires large spatial separation, or, under the near-Markov property, separation by a circuit of open bonds, neither of which need be present in our context. Second, direct application of standard mixing properties such as weak mixing requires specifying in advance some deterministic spatially separated regions on which the near-independent events will occur, but in our context one does not know a priori where the paths $y_j \leftrightarrow y_{j+1}$ may go. This is where Lemma 3.2 below is important.

We consider now the special case of the FK model on $\mathcal{B}(\mathbb{Z}^2)$. For each $(p,q)$ there is a value $p^*$ dual to $p$ at level $q$ given by
$$\frac{p}{q(1-p)} = \frac{1-p^*}{p^*};$$
the dual configuration to the infinite-volume wired-boundary FK model at $(p,q)$ is the infinite-volume free-boundary FK model at $(p^*, q)$ (see [15]). The model has a percolation critical point $p_c(q)$ which for $q = 1$, $q = 2$ and $q \ge 25.72$ is known to coincide with the self-dual point $p_{sd}(q) = \sqrt{q}/(1 + \sqrt{q})$ [19]; positivity of $\tau$ is known to hold for $p > p_{sd}(q)$ for these same values of $q$. For $2 < q < 25.72$, it is known that positivity of $\tau$ holds for $p > p_{sd}(q-1)^*$, where the $*$ refers to duality at level $q$ [5]. The FK model without external fields has the Markov property for open circuits; assuming positivity of $\tau$ it satisfies (2.7) (see [15]) and has the ratio weak mixing property [3]. In a forthcoming paper we will show that positivity of $\tau$ also implies ratio weak mixing for the FK model with external fields. For now, we can conclude the following from Theorem 2.1.

Theorem 2.2. Let $P$ be the FK model at $(p,q)$ on $\mathcal{B}(\mathbb{Z}^2)$ with $q \ge 1$ (without external fields) and suppose the surface tension $\tau$ is positive. There exists $K_2$ such that for $A > K_2$ and $l = \sqrt{A}$, under the measure $P(\cdot \mid |\mathrm{Int}(\Gamma_0)| \ge A)$ with probability approaching 1 as $A \to \infty$, (2.13)–(2.16) hold.

As a byproduct of the proof of Theorem 2.1 we will obtain the following large-deviation-type estimate. A related result for Bernoulli percolation appears in [1].

Theorem 2.3. Let $P$ be a percolation model on $\mathcal{B}(\mathbb{Z}^2)$ satisfying (2.7), the near-Markov property for open circuits, and the ratio weak mixing property. Then
$$P(|\mathrm{Int}(\Gamma_0)| = A) = \exp\Bigl( -w_1 \sqrt{A} + O\bigl(A^{1/6} (\log A)^{2/3}\bigr) \Bigr) \quad \text{as } A \to \infty.$$
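The FK duality relation for $p^*$ and the self-dual point $p_{sd}(q)$ quoted in this section can be checked numerically. The following is a small Python sketch; the function names `dual_p` and `self_dual_point` are hypothetical, and `dual_p` simply solves the duality relation, written in the standard equivalent form $p\,p^* = q(1-p)(1-p^*)$, for $p^*$.

```python
import math


def dual_p(p, q):
    """Value p* dual to p at level q, from p * p* = q * (1 - p) * (1 - p*)."""
    return q * (1.0 - p) / (p + q * (1.0 - p))


def self_dual_point(q):
    """Self-dual point p_sd(q) = sqrt(q) / (1 + sqrt(q))."""
    return math.sqrt(q) / (1.0 + math.sqrt(q))


# Usage: duality is an involution, and p_sd(q) is its fixed point.
for q in (1.0, 2.0, 30.0):
    p_sd = self_dual_point(q)
    print(q, p_sd, dual_p(p_sd, q), dual_p(dual_p(0.7, q), q))
```

For $q = 1$ (Bernoulli percolation) the relation reduces to $p^* = 1 - p$, which gives a quick sanity check.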
3. Preliminaries – Coarse Graining and Mixing Properties

We first define our coarse-graining concepts. Our definition of the $s$-hull skeleton follows [1], with some added refinements. For a contour $\gamma$ let $E_\gamma$ denote the set of extreme points of $\mathrm{Co}(\gamma)$ and let $\gamma_{co} : [0,1] \to \mathbb{R}^2$ be a curve which traces $\partial \mathrm{Co}(\gamma)$ in the direction of positive orientation, beginning at the leftmost lattice site $u_0$ having minimal second coordinate. When confusion is unlikely we also use $\gamma_{co}$ to denote the image of this curve. Note that $u_0 \in E_\gamma \subset \gamma_{co} \cap \gamma \cap (\mathbb{Z}^2)^*$.

To define the $s$-hull skeleton we require that the $\tau$-diameter of $\gamma$ be at least $2s$. We traverse $\gamma$ in the direction of positive orientation, beginning at, say, the leftmost lattice site $u_0$ having minimal second coordinate. Given $u_0, \dots, u_j \in E_\gamma$, go forward from $u_j$ along $\gamma_{co}$ until either $u_0$ or $\partial B_\tau(u_j, s)$ is reached, at some point $u'_{j+1}$. Then backtrack along $\gamma_{co}$ until a point of $E_\gamma$ is reached (possibly $u'_{j+1}$, meaning we backtrack zero distance). If this backtracking does not require going all the way back to $u_j$, then this new point of $E_\gamma$ is labeled $u_{j+1}$. If instead the backtracking does require going all the way back to $u_j$, then from $u'_{j+1}$ continue forward along $\gamma_{co}$, necessarily in a straight line, to the next point of $E_\gamma$, which is then labeled $u_{j+1}$. Stop the process when $u_{m+1} = u_0$ for some $m$. The $s$-hull pre-skeleton is then $(u_0, \dots, u_{m+1})$. (A similar definition, under the name "$s$-hull skeleton", may be found in [1].)

The sites $u_0, \dots, u_{m+1}$ are sites of $E_\gamma$ which appear in order in $\gamma_{co}$ (and in $\gamma$). Therefore $\mathrm{Co}(\{u_0, \dots, u_{m+1}\})$ is a convex polygon bounded by the polygonal path $u_0 \to \cdots \to u_{m+1}$. Now
$$\mathcal{W}(\gamma_{co}) \ge \sum_{j=0}^{m} \tau(u_{j+1} - u_j),$$
and as noted in [1], we have
$$\tau(u_{j+2} - u_j) > s \quad \text{for all } 0 \le j \le m-2.$$
Therefore
$$m \le 1 + \frac{2\mathcal{W}(\gamma_{co})}{s}. \tag{3.1}$$
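The pre-skeleton construction can be sketched in Python for the simplified case where the hull boundary is a convex polygon given by its extreme points. This is an illustration under stated assumptions, not the construction used in the proofs: the input `extreme_points` is assumed to list $E_\gamma$ in positive orientation starting at $u_0$, the Euclidean norm is a hypothetical stand-in for $\tau$, and the function names are invented here. Since a $\tau$-ball is convex, the walk along the polygon first reaches $\partial B_\tau(u_j, s)$ on the edge leading to the first vertex at $\tau$-distance at least $s$ from $u_j$, which is what the inner loop locates.

```python
import math


def euclid(v):
    """Hypothetical stand-in for the norm tau: the Euclidean norm."""
    return math.hypot(v[0], v[1])


def s_hull_pre_skeleton(extreme_points, s, tau=euclid):
    """Return (u_0, ..., u_{m+1}) with u_{m+1} = u_0 for a convex polygon."""
    n = len(extreme_points)
    pts = list(extreme_points) + [extreme_points[0]]  # index n is u_0 again

    def dist(a, b):
        return tau((a[0] - b[0], a[1] - b[1]))

    indices = [0]
    j = 0
    while j < n:
        k = j + 1
        # Move forward while the next vertex is strictly inside B_tau(u_j, s).
        while k < n and dist(pts[k], pts[j]) < s:
            k += 1
        # If the ball boundary was hit strictly inside the edge (k-1, k),
        # backtrack to vertex k-1, unless that would return to u_j itself,
        # in which case we continue forward to vertex k.
        if dist(pts[k], pts[j]) > s and k - 1 > j:
            k -= 1
        indices.append(k)
        j = k
    return [pts[i] for i in indices]


# Usage: a regular 60-gon of radius 10 with s = 3.
polygon = [(10 * math.cos(2 * math.pi * i / 60),
            10 * math.sin(2 * math.pi * i / 60)) for i in range(60)]
skeleton = s_hull_pre_skeleton(polygon, 3.0)
m = len(skeleton) - 2
perimeter = sum(euclid((polygon[(i + 1) % 60][0] - polygon[i][0],
                        polygon[(i + 1) % 60][1] - polygon[i][1]))
                for i in range(60))
print(m, 1 + 2 * perimeter / 3.0)  # m should respect the bound (3.1)
```

In this example each step advances two vertices, so the pre-skeleton has far fewer sites than the polygon while its consecutive sites stay within $\tau$-distance $s$ of each other.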
To obtain the $s$-hull skeleton we refine the $s$-hull pre-skeleton. This is necessary because if $\gamma_{co}$ has a sharp corner, then the polygonal path $u_0 \to \cdots \to u_{m+1}$ may clip this corner excessively, meaning part of $\gamma$ may be too far outside the polygon for our needs. We must add vertices so that, whenever possible, the angular change between successive segments of the polygonal path does not exceed $s/\mathrm{diam}(\gamma)$, which is of the order of the angular change we would obtain if $\gamma$ were a circle. By convexity, for each $x \in \gamma_{co}$ there exists a forward tangent vector $v_x$ and a corresponding forward tangent line; for $y \ne x \in \gamma_{co}$ let $\alpha(x,y)$ denote the angle measured counterclockwise from $v_x$ to $v_y$. Fix $0 \le j \le m$ and let $u_{j0} = u_j$. Note that $\alpha(u_j, \cdot)$ is a nondecreasing function as one traces $\gamma_{co}$ from $u_j$ to $u_{j+1}$. Having defined $u_{j0}, \dots, u_{jk} \in E_\gamma \cap \gamma_{co}^{[u_{j-1}, u_{j+1}]}$, let $u_{j,k+1}$ be the first point of $\gamma_{co}$ after $u_{j,k}$ for which $\alpha(u_{j,k}, u_{j,k+1}) \ge s/\mathrm{diam}(\gamma)$. If there is no such point, then set $u_{j,k+1} = u_{j+1}$ and stop the process. Necessarily $u_{j,k+1}$ is a lattice site in $E_\gamma$. We call the sites $u_{jk}$ strictly between $u_j$ and $u_{j+1}$ refinement sites. Let $(w_0, \dots, w_n, w_{n+1})$ be a relabeling of all sites $u_{jk}$, $0 \le j \le m$, $k \ge 1$, in order on $\gamma_{co}$, with $w_0 = w_{n+1} = u_0$; we call $(w_0, \dots, w_n, w_{n+1})$ the $s$-hull skeleton of $\gamma$ and denote it $\mathrm{HSkel}_s(\gamma)$. The polygonal path $w_0 \to \cdots \to w_{n+1}$ is denoted $\mathrm{HPath}_s(\gamma)$. It is easy to see that the angle between $u_{jk} - u_{j,k-1}$ and $u_{j,k+2} - u_{jk}$ cannot be less than $s/\mathrm{diam}(\gamma)$, for $k \ge 1$. It follows that the number of refinement sites satisfies $n - m \le 4\pi\, \mathrm{diam}(\gamma)/s$. With (3.1) this shows that the number of sites in the $s$-hull skeleton satisfies
$$n + 1 \le K_7\, \mathrm{diam}(\gamma)/s. \tag{3.2}$$

As with the pre-skeleton, the sites $w_0, \dots, w_{n+1}$ appear in order in $\gamma$ and in $\gamma_{co}$, and $\mathrm{Co}(\{w_0, \dots, w_{n+1}\})$ is a convex polygon bounded by $\mathrm{HPath}_s(\gamma)$. A key property of the $s$-hull skeleton involves the extent to which $\gamma$ can go outside $\mathrm{Co}(\{w_0, \dots, w_{n+1}\})$. Let $T_j$ denote the triangle formed by the segment $w_j w_{j+1}$, the forward tangent line at $w_j$ and the backward tangent line at $w_{j+1}$. The angle between these two tangent lines is at most $s/\mathrm{diam}(\gamma)$. Now every point of $\gamma$ outside $\mathrm{Co}(\{w_0, \dots, w_{n+1}\})$ is in some $T_j$, and $\gamma_{co} \cap T_j \subset \gamma^{[w_j, w_{j+1}]}$. Let $J = \{j \le n : \tau(w_{j+1} - w_j) \le 2s\}$. For distinct points $x, y \in \mathbb{R}^2$ let $H_{xy}$ ($\overline{H}_{xy}$) denote the open (closed) halfspace which is to the right of the line from $x$ to $y$. From the construction of the $s$-hull pre-skeleton, if $\gamma^{(u_j, u_{j+1})} \not\subset B_\tau(u_j, s)$ for some $j$, then $\gamma \cap H_{u_j u_{j+1}} = \emptyset$. It follows that if $\tau(w_{j+1} - w_j) > 2s$ for some $j$, then $w_j$ and $w_{j+1}$ are sites of the $s$-hull pre-skeleton and $\gamma \cap H_{w_j w_{j+1}} = \emptyset$. Thus we have
$$\mathrm{Int}(\gamma) \setminus \mathrm{Int}(\mathrm{HPath}_s(\gamma)) \subset \cup_{j \in J} T_j. \tag{3.3}$$
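For $j \in J$ we have $|T_j| \le K_8 s^3/\mathrm{diam}(\gamma)$ and $d\bigl(x, \mathrm{Int}(\mathrm{HPath}_s(\gamma))\bigr) \le K_9 s^2/\mathrm{diam}(\gamma)$ for all $x \in T_j$. With (3.3) this shows that
$$|\mathrm{Int}(\gamma) \setminus \mathrm{Int}(\mathrm{HPath}_s(\gamma))| \le K_{10} s^2 \tag{3.4}$$
and
$$\sup_{x \in \mathrm{Co}(\gamma)} d\bigl(x, \mathrm{Int}(\mathrm{HPath}_s(\gamma))\bigr) \le K_9 s^2/\mathrm{diam}(\gamma). \tag{3.5}$$
From (3.5) and convexity it follows that
$$\mathcal{W}(\gamma_{co}) \le \mathcal{W}(\mathrm{HPath}_s(\gamma)) + K_{11} s^2/\mathrm{diam}(\gamma). \tag{3.6}$$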
We turn now to mixing properties. The following is an immediate consequence of the definition of ratio weak mixing.

Lemma 3.1. ([4]) Suppose $P$ has the ratio weak mixing property. There exists a constant $K_{12}$ as follows. Suppose $r > 3$ and $D, E \subset \mathcal{B}(\mathbb{Z}^2)$ with $\mathrm{diam}(E) \le r$ and $d(D,E) \ge K_{12} \log r$. Then for all $A \in \mathcal{G}_D$ and $B \in \mathcal{G}_E$, we have
$$\tfrac{1}{2} P(A)P(B) \le P(A \cap B) \le 2P(A)P(B).$$

A weakness of Lemma 3.1 is that the locations $D, E$ of the two events must be deterministic. The next lemma applies only to a limited class of events but allows the locations to be partially random. For $C \subset D \subset \mathcal{B}(\mathbb{Z}^2)$ we say an event $A \subset \{0,1\}^D$ occurs on $C$ (or on $C^*$) in $\omega \in \{0,1\}^D$ if $\omega' \in A$ for every $\omega' \in \{0,1\}^D$ satisfying $\omega'_e = \omega_e$ for all $e \in C$. For a possibly random set $\mathcal{F}(\omega)$ we say $A$ occurs only on $\mathcal{F}$ (or equivalently, on $\mathcal{F}^*$) if $\omega \in A$ implies $A$ occurs on $\mathcal{F}(\omega)$ in $\omega$. We say events $A$ and $B$ occur at separation $r$ in $\omega$ if there exist $C, E \subset D$ with $d(C,E) \ge r$ such that $A$ occurs on $C$ and $B$ occurs on $E$ in $\omega$. Let $A \circ_r B$ denote the event that $A$ and $B$ occur at separation $r$. Let $D^r = \{e \in \mathcal{B}(\mathbb{Z}^2) : d(e, D) \le r\}$.

In the next lemma we give two alternate hypotheses, in the interest of wider applicability, though either hypothesis alone suffices for our purposes in this paper.

Lemma 3.2. Assume (2.7) and either (i) the ratio weak mixing property or (ii) both the weak mixing property and the near-Markov property for open circuits. There exist constants $K_i, \epsilon_i$ as follows. Let $D \subset \mathcal{B}(\mathbb{Z}^2)$, $x^* \in (\mathbb{Z}^2)^*$ and $r > K_{13} \log |D|$, and let $A, B$ be events such that $A$ occurs only on $C_{x^*}$ and $B \in \mathcal{G}_D$. Then
$$P(A \circ_r B) \le (1 + K_{14} e^{-\epsilon_3 r}) P(A) P(B). \tag{3.7}$$

Proof. First suppose $P$ has the ratio weak mixing property. For $y^* \in V^*(D^r)$, let $B^{y^*} = B(y^*, r/12)$, $B^{y^*}_{\mathbb{Z}} = B^{y^*} \cap (\mathbb{Z}^2)^*$ and $D^{y^*} = B(y^*, r/6)$, $D^{y^*}_{\mathbb{Z}} = D^{y^*} \cap (\mathbb{Z}^2)^*$. Define events
$$G_{y^*} = [y^* \leftrightarrow \partial_{in} B^{y^*}_{\mathbb{Z}} \text{ by an open dual path}],$$
$$Z_{y^*} = [A \text{ and } B \text{ occur at separation } r \text{ on } \mathcal{B}(D^{y^*})^c],$$
$$Q = \cup_{y^* \in V^*(D^r)} (Z_{y^*} \cap G_{y^*}).$$
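Then by (2.9),
$$P(G_{y^*}) \le K_{15} e^{-\epsilon_4 r}. \tag{3.8}$$
Let $C$ and $\lambda$ be as in (2.2). Then for some $K_i, \epsilon_i$ depending on $C, \lambda$, provided $K_{13}$ is chosen large enough,
$$\begin{aligned}
P((A \circ_r B) \cap Q) &\le \sum_{y^* \in V^*(D^r)} P(Z_{y^*}) P(G_{y^*} \mid Z_{y^*}) \\
&\le \sum_{y^* \in V^*(D^r)} P(Z_{y^*}) \bigl(P(G_{y^*}) + K_{16} r^2 e^{-\epsilon_5 r}\bigr) \\
&\le K_{17} r^2 |D|\, P(A \circ_r B) \bigl(K_{18} e^{-\epsilon_4 r} + K_{16} r^2 e^{-\epsilon_5 r}\bigr) \\
&\le K_{18} e^{-\epsilon_6 r} P(A \circ_r B).
\end{aligned} \tag{3.9}$$
Therefore
$$P(A \circ_r B) \le (1 - K_{18} e^{-\epsilon_6 r})^{-1} P((A \circ_r B) \cap Q^c). \tag{3.10}$$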
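Suppose $\omega \in (A \circ_r B) \cap Q^c$, and consider $C, E$ with $C^* \subset C_{x^*}(\omega)$, $E \subset D$ and $d(C,E) \ge r$ for which $A$ occurs on $C^*$ and $B$ occurs on $E$ in $\omega$. If $r/2 < d(y^*, C) \le r/2 + 1$ for some $y^* \in V^*(D^r)$, then $y^* \notin C_{x^*}(\omega)$ since $\omega \notin Z_{y^*} \cap G_{y^*}$. Therefore $C_{x^*}(\omega) \subset C^{r/2}$ and hence $d(E, C_{x^*}(\omega)) > r/2$. For $\mathcal{F} \subset \mathcal{B}(\mathbb{Z}^2)$ let $B_{\mathcal{F},r}$ be the event that $B$ occurs on some set $E \subset D$ with $d(E, \mathcal{F}) > r/2$. We call $\mathcal{F}$ $A$-sufficient if $\omega_e = 0$ for all $e \in \mathcal{F}$ implies $\omega \in A$. Let $\mathfrak{A}$ denote the set of all finite $A$-sufficient subsets of $\mathcal{B}(\mathbb{Z}^2)$. Since the event $[C_{x^*} = \mathcal{F}^*]$ occurs on $\mathcal{B}(V(\mathcal{F}))$ for all $\mathcal{F} \subset \mathcal{B}(\mathbb{Z}^2)$, we have by ratio weak mixing
$$\begin{aligned}
P((A \circ_r B) \cap Q^c) &\le \sum_{\mathcal{F} \in \mathfrak{A}} P([C_{x^*} = \mathcal{F}^*] \cap B_{\mathcal{F},r}) \\
&\le \sum_{\mathcal{F} \in \mathfrak{A}} (1 + K_{19} |D| e^{-\epsilon_7 r}) P(C_{x^*} = \mathcal{F}^*) P(B_{\mathcal{F},r}) \\
&\le (1 + K_{20} e^{-\epsilon_8 r}) P(A) P(B).
\end{aligned} \tag{3.11}$$
Combining (3.10) and (3.11) proves (3.7), provided $K_{13}$ is sufficiently large.

Under hypothesis (ii) of weak mixing and the near-Markov property for open circuits, the proof through the first inequality of (3.11) is still valid, but we need to modify the rest of (3.11) as follows. Fix $\mathcal{F} \in \mathfrak{A}$ and let $F = [C_{x^*} = \mathcal{F}^*]$, $D' = D \setminus \mathcal{F}^r$, $\mathcal{A} = \{e \in \mathcal{B}(\mathbb{Z}^2) : r/6 \le d(e, D') \le r/3\}$. Let $\mathcal{C}$ be the set of all circuits (of regular bonds) in $\mathcal{A}$ which surround $D'$ and let $O_{\mathcal{A}}$ be the event that some circuit $\alpha \in \mathcal{C}$ is open; for $\omega \in O_{\mathcal{A}}$ there is a unique outermost open circuit in $\mathcal{C}$, which we denote $\Gamma = \Gamma(\omega)$. Note that
$$O_{\mathcal{A}}^c \subset \cup_{y^* \in V^*(\mathcal{A})} G_{y^*}, \tag{3.12}$$
$$|\alpha| \ge r \quad \text{for all } \alpha \in \mathcal{C}, \tag{3.13}$$
and
$$[\Gamma = \alpha] = \mathrm{Open}(\alpha) \cap G_\alpha \quad \text{for some event } G_\alpha \in \mathcal{G}_{\mathcal{B}(\mathrm{Ext}(\alpha))}. \tag{3.14}$$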
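Let
$$p_\alpha = P(\Gamma = \alpha), \qquad p_{F\alpha} = P(\Gamma = \alpha \mid F), \qquad p_{FB\alpha} = P(\Gamma = \alpha \mid F \cap B_{\mathcal{F},r}).$$
By weak mixing we have for $\delta = K_{21} e^{-\epsilon_9 r}$:
$$\sum_{\alpha \in \mathcal{C}} |p_{F\alpha} - p_\alpha| \le \delta, \qquad \sum_{\alpha \in \mathcal{C}} |p_{FB\alpha} - p_\alpha| \le \delta. \tag{3.15}$$
Define the set of "good" circuits
$$\mathcal{R} = \{\alpha \in \mathcal{C} : p_{F\alpha} \le p_\alpha (1 + \sqrt{\delta})\}$$
and let
$$h(\alpha) = \Bigl( \frac{p_{F\alpha}}{p_\alpha} - 1 \Bigr) 1_{[\alpha \in \mathcal{C} \setminus \mathcal{R}]}.$$
From (3.13), (3.14) and the near-Markov property for open circuits, if $r$ is sufficiently large we obtain
$$P(B_{\mathcal{F},r} \mid F \cap [\Gamma = \alpha]) \le (1 + \delta') P(B_{\mathcal{F},r} \mid \Gamma = \alpha), \tag{3.16}$$
where $\delta' = 3Ce^{-ar} < 1$ for $C, a$ as in (2.4).

We need to bound $P(F \cap B_{\mathcal{F},r})$. To do this, we decompose $F \cap B_{\mathcal{F},r}$ into 3 pieces by intersecting it with $[\Gamma \in \mathcal{R}]$, $[\Gamma \in \mathcal{C} \setminus \mathcal{R}]$ and $O_{\mathcal{A}}^c$ (the latter meaning there is no $\Gamma$). We then show that the first piece is approximately bounded by $P(F)P(B_{\mathcal{F},r})$, and the other two pieces are negligible relative to the size of the full event. Specifically, from (3.16),
$$\begin{aligned}
P(F \cap B_{\mathcal{F},r} \cap [\Gamma \in \mathcal{R}]) &= \sum_{\alpha \in \mathcal{R}} p_{F\alpha} P(F) P(B_{\mathcal{F},r} \mid F \cap [\Gamma = \alpha]) \\
&\le (1 + \sqrt{\delta})(1 + \delta') P(F) \sum_{\alpha \in \mathcal{R}} p_\alpha P(B_{\mathcal{F},r} \mid \Gamma = \alpha) \\
&\le (1 + 2\sqrt{\delta} + \delta') P(F) P(B_{\mathcal{F},r}).
\end{aligned} \tag{3.17}$$
Next, one application of (3.15) yields the expected value
$$E\bigl(h(\Gamma) 1_{O_{\mathcal{A}}}\bigr) = \sum_{\alpha \in \mathcal{C} \setminus \mathcal{R}} h(\alpha) p_\alpha \le \delta;$$
this and a second application of (3.15), with Markov's inequality, yield
$$\begin{aligned}
P(F \cap B_{\mathcal{F},r} \cap [\Gamma \in \mathcal{C} \setminus \mathcal{R}]) &= P(F \cap B_{\mathcal{F},r}) P(h(\Gamma) > \sqrt{\delta} \mid F \cap B_{\mathcal{F},r}) \\
&\le P(F \cap B_{\mathcal{F},r}) \bigl( P(h(\Gamma) > \sqrt{\delta}) + \delta \bigr) \\
&\le P(F \cap B_{\mathcal{F},r}) (\sqrt{\delta} + \delta).
\end{aligned} \tag{3.18}$$
Finally, similarly to (3.9) we have using (3.12),
$$P(F \cap B_{\mathcal{F},r} \cap O_{\mathcal{A}}^c) \le \sum_{y^* \in V^*(\mathcal{A})} P(F \cap B_{\mathcal{F},r} \cap G_{y^*}) \le \delta'' P(F \cap B_{\mathcal{F},r}), \tag{3.19}$$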
where $\delta'' = K_{22} e^{-\epsilon_{10} r}$. Combining (3.17), (3.18) and (3.19) we obtain
$$P(F \cap B_{\mathcal{F},r}) \le (1 - \sqrt{\delta} - \delta - \delta'')^{-1} P(F \cap B_{\mathcal{F},r} \cap [\Gamma \in \mathcal{R}]) \le (1 + K_{23} e^{-\epsilon_{11} r}) P(F) P(B_{\mathcal{F},r}). \tag{3.20}$$
Summing over $\mathcal{F}$ yields
$$\sum_{\mathcal{F} \in \mathfrak{A}} P([C_{x^*} = \mathcal{F}^*] \cap B_{\mathcal{F},r}) \le (1 + K_{23} e^{-\epsilon_{11} r}) P(A) P(B),$$
which substitutes for (3.11). □

4. Lower Bounds for Open Dual Circuit Probabilities

In this section we prove the following result.

Theorem 4.1. Let $P$ be a percolation model on $\mathcal{B}(\mathbb{Z}^2)$ satisfying (2.7), the near-Markov property for open circuits, positivity of $\tau$ and the ratio weak mixing property. There exist $K_i$ such that for $A > K_{24}$ and $l = \sqrt{A}$,
$$P(|\mathrm{Int}(\Gamma_0)| \ge A) \ge \exp\Bigl( -w_1 \sqrt{A} - K_{25}\, l^{1/3} (\log l)^{2/3} \Bigr).$$

The size of the error term $K_{25}\, l^{1/3}(\log l)^{2/3}$ in Theorem 4.1 is important because it determines what "bad" behaviors can be ruled out as unlikely: in particular, those which have probability at most $\exp(-w_1 l - c\, l^{1/3}(\log l)^{2/3})$ for some $c > K_{25}$. Though our error term is likely not optimal (according to [16] the optimal error term may be of order $\log l$), it is enough of an improvement over the corresponding results in [12] and [17] to enable us to establish an apparently near-optimal bound on the local roughness.

The proof of our Theorem 4.1 relies on the following result, the halfspace version of (2.10).

Theorem 4.2 ([4]). Let $P$ be a percolation model on $\mathcal{B}(\mathbb{Z}^2)$ satisfying (2.7), positivity of $\tau$ and the ratio weak mixing property. There exist $\epsilon_{12}, K_{26}$ such that for all $x \ne y \in \mathbb{R}^2$ and all dual sites $u, v \in H_{xy}$,
$$P(u \leftrightarrow v \text{ via an open dual path in } H_{xy}) \ge \frac{\epsilon_{12}}{|v-u|^{K_{26}}}\, e^{-\tau(v-u)}.$$
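Proof of Theorem 4.1. Let $s = l^{2/3}(\log l)^{1/3}$ and $\delta = K_{27} s^2/l$, with $K_{27}$ to be specified. Let $(y_0, \dots, y_n, y_0)$ be the $s$-hull skeleton of $\partial (l+\delta)\mathcal{K}_1$. For each $i$ let $y_i'$ be a dual site with $y_i' \in H_{y_{i-1} y_i} \cap H_{y_i y_{i+1}}$ and $|y_i' - y_i| \le 2\sqrt{2}$. By (3.5), provided $K_{27}$ is large enough we have
$$\mathrm{Co}(\{y_0, \dots, y_n\}) \supset l\mathcal{K}_1 \quad \text{and hence} \quad |\mathrm{Co}(\{y_0, \dots, y_n\})| \ge A.$$
Further,
$$\sum_{j=0}^{n} \tau(y'_{j+1} - y'_j) \le \mathcal{W}(\partial (l+\delta)\mathcal{K}_1) + 4\kappa_\tau n \le w_1 l + K_{28}\, l^{1/3} (\log l)^{2/3} \tag{4.1}$$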
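(with $y'_{n+1} = y'_0$). Therefore using the FKG property, Theorem 4.2, (4.1) and (3.2),
$$\begin{aligned}
P(|\mathrm{Int}(\Gamma_0)| \ge A) &\ge P(\Gamma_0 \text{ encloses } \mathrm{Co}(\{y_0, \dots, y_n\})) \\
&\ge P(y'_j \leftrightarrow y'_{j+1} \text{ via a path in } H_{y_j y_{j+1}} \text{ for all } j \le n) \\
&\ge \prod_{j=0}^{n} P(y'_j \leftrightarrow y'_{j+1} \text{ via a path in } H_{y_j y_{j+1}}) \\
&\ge \Bigl( \frac{\epsilon_{12}}{l^{K_{26}}} \Bigr)^{n+1} \exp\Bigl( -\sum_{j=0}^{n} \tau(y'_{j+1} - y'_j) \Bigr) \\
&\ge \exp\Bigl( -w_1 l - K_{29}\, l^{1/3} (\log l)^{2/3} \Bigr).
\end{aligned} \tag{4.2}$$
□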
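As a reading aid, the following worked calculation (not part of the original argument) checks that the choices $s = l^{2/3}(\log l)^{1/3}$ and $\delta = K_{27}s^2/l$ in the proof make both error contributions of order $l^{1/3}(\log l)^{2/3}$. It assumes only the homogeneity of the $\tau$-length under scaling, the bound (3.2) on the number of skeleton sites, and that the diameter of $(l+\delta)\mathcal{K}_1$ is of order $l$.

```latex
\begin{align*}
\delta &= K_{27}\,\frac{s^2}{l}
        = K_{27}\,\frac{l^{4/3}(\log l)^{2/3}}{l}
        = K_{27}\, l^{1/3}(\log l)^{2/3}, \\
\mathcal{W}\bigl(\partial (l+\delta)\mathcal{K}_1\bigr)
       &= (l+\delta)\, w_1
        = w_1 l + w_1 K_{27}\, l^{1/3}(\log l)^{2/3}, \\
n + 1  &\le K_7\,\frac{\operatorname{diam}\bigl((l+\delta)\mathcal{K}_1\bigr)}{s}
        = O\!\left(\frac{l}{l^{2/3}(\log l)^{1/3}}\right)
        = O\!\left(l^{1/3}(\log l)^{-1/3}\right), \\
(n+1)\, K_{26} \log l
       &= O\!\left(l^{1/3}(\log l)^{2/3}\right).
\end{align*}
```

The second line controls the exponential factor in (4.2) through (4.1), and the last line controls the logarithm of the polynomial prefactor coming from Theorem 4.2.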
5. Upper Bounds for Open Dual Circuit Probabilities

We need to develop a method of cutting a dual circuit across a bottleneck, modifying the bond configuration to create two dual circuits. The cutting procedure is simplified if the bottleneck is clean, in the following sense. The canonical path from dual site $u = (x_1, y_1)$ to dual site $v = (x_2, y_2)$ is the path, denoted $\zeta_{uv}$, which goes horizontally from $(x_1, y_1)$ to $(x_2, y_1)$, then vertically to $(x_2, y_2)$. We call a bottleneck $(u,v)$ clean if $\zeta_{uv} \subset \mathrm{Int}(\gamma)$ (except for the endpoints $u, v$). The next lemma will enable us to restrict our cutting to clean bottlenecks.

Lemma 5.1. If a dual circuit $\gamma$ contains a $(q,r)$-bottleneck for some $r > 3q > 0$, then $\gamma$ contains a clean $(q, r/3)$-bottleneck.

Proof. Suppose $\gamma$ contains a $(q,r)$-bottleneck $(u,v)$. We have two disjoint paths from $u$ to $v$: $\gamma^{[u,v]}$ and $\gamma^{[v,u]}$ (traversed backwards). Each of these may intersect $\zeta_{uv}$ a number of times. Accordingly, $\zeta_{uv}$ contains a finite sequence of sites $u = x_0, x_1, \dots, x_m = v$ such that the segment $\zeta_i$ of $\zeta_{uv}$ between $x_{i-1}$ and $x_i$ satisfies $\zeta_i \subset \mathrm{Int}(\gamma)$ for all $i \in I$ and $\zeta_i \subset \mathrm{Ext}(\gamma) \cup \gamma$ for all $i \notin I$, where $I$ consists either of all odd $i$ or of all even $i$. For $i \in I$, we call the segment of $\zeta_{uv}$ with endpoints $x_{i-1}$ and $x_i$ an interior gap.

Let $\psi$ be a dual path from $u$ to $v$ in $\mathrm{Int}(\gamma)$ with $|\psi| \le q$. We can extend $\psi$ to a doubly infinite path $\psi^+$ by adding on (possibly non-lattice) paths $\psi_1$ from $v$ to $\infty$ and $\psi_2$ from $\infty$ to $u$, both in $\mathrm{Ext}(\gamma)$. The path $\psi^+$ divides the plane into two regions, $A_L \supset \gamma^{[v,u]}$ to the left of $\psi^+$ and $A_R \supset \gamma^{[u,v]}$ to the right. Replacing $\psi$ with $\zeta_{uv}$ in the definition of $\psi^+$, we obtain another doubly infinite path $\zeta^+$. The path $\zeta^+$ is not necessarily self-avoiding, but $\mathbb{R}^2 \setminus \zeta^+$ has exactly two unbounded components $B_L$ and $B_R$, to the left and right of $\zeta^+$, respectively. Since $\mathrm{diam}(\zeta_{uv}) \le q$, there exist sites $z_1 \in \gamma^{[u,v]}$ and $z_2 \in \gamma^{[v,u]}$ for which $d(z_j, \zeta_{uv}) \ge (r-q)/2 > q$ ($j = 1, 2$) and hence $z_1 \in B_R$, $z_2 \in B_L$.

Let $\theta$ be a (possibly non-lattice) path from $z_1$ to $z_2$ in $\mathrm{Int}(\gamma)$. Then $\theta$ must intersect $\zeta^+$, and hence must intersect $\zeta_{uv}$, necessarily in some interior gap. Thus every $\theta$ from $z_1$ to $z_2$ in $\mathrm{Int}(\gamma)$ must cross at least one interior gap, so there exists an interior gap $\zeta_i$ which separates $z_1$ and $z_2$, that is, exactly one of $z_1, z_2$ is in $\gamma^{[x_{i-1}, x_i]}$. It follows that $(x_{i-1}, x_i)$ is a clean $(q, (r-q)/2)$-bottleneck. Since $(r-q)/2 > r/3$, the proof is complete. □
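The canonical path $\zeta_{uv}$ used above is simple to generate explicitly. The following Python sketch lists its sites; the function name `canonical_path` and the representation of dual sites as coordinate pairs with integer coordinate differences (for instance half-integer coordinates) are assumptions made for illustration.

```python
def canonical_path(u, v):
    """Sites of the canonical path from u to v: horizontal first, then vertical.

    u and v are (x, y) pairs whose coordinate differences are integers,
    e.g. dual sites with half-integer coordinates.
    """
    (x1, y1), (x2, y2) = u, v
    dx, dy = x2 - x1, y2 - y1
    if dx != round(dx) or dy != round(dy):
        raise ValueError("u and v must differ by integer amounts")
    step_x = 1 if dx >= 0 else -1
    step_y = 1 if dy >= 0 else -1
    # Horizontal leg from (x1, y1) to (x2, y1), including both endpoints.
    path = [(x1 + step_x * k, y1) for k in range(abs(round(dx)) + 1)]
    # Vertical leg from (x2, y1) to (x2, y2), excluding the shared corner.
    path += [(x2, y1 + step_y * k) for k in range(1, abs(round(dy)) + 1)]
    return path


# Usage: a path between two dual sites.
print(canonical_path((0.5, 0.5), (2.5, -1.5)))
```

The number of sites is $|x_2 - x_1| + |y_2 - y_1| + 1$, so the path length is the $\ell^1$ distance between $u$ and $v$.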
Define $R_x = x + [-1/2, 1/2]^2$ and $R_x^+ = x + [-1, 1]^2$. Let $Q_1(u,v) = \cup_{x \in \zeta_{uv}} R_x$ and $Q_2(u,v) = \cup_{x \in \zeta_{uv}} R_x^+$. Note that $|\partial Q_2(u,v)| \le 4q + 8$. Let $J_1(u,v), J_2(u,v), \dots$ be an enumeration of the subsets of $\partial Q_2(u,v)$. We say a clean $(q,r)$-bottleneck $(u,v)$ in a dual circuit $\gamma$ is of type $\eta$ if the set of bonds in $\partial Q_2(u,v)$ which are contained (except possibly for endpoints) in $\mathrm{Int}(\gamma)$ is precisely $J_\eta(u,v)$. Following [17] we call a dual circuit $\gamma$ $r$-large if $\mathrm{diam}_\tau(\gamma) > r$ and $r$-small if $\mathrm{diam}_\tau(\gamma) \le r$.

We assume we have a fixed but arbitrary algorithm for choosing a particular $(q,r)$-bottleneck, which we then call primary, from any circuit containing one or more $(q,r)$-bottlenecks. When a configuration $\omega$ includes an exterior dual circuit $\gamma$ for which $(u,v)$ is a primary $(q,r)$-bottleneck of type $\eta$, we can apply a procedure, which we term bottleneck surgery (on $\gamma$, at $(u,v)$), to create a new configuration denoted $Y_{uv\eta}(\omega)$. Bottleneck surgery consists of replacing the configuration $\omega$ with the configuration given by
$$Y_{uv\eta}(\omega)_e = \begin{cases} 1, & \text{if } e \in \partial Q_1(u,v), \\ 0, & \text{if } e^* \in J_\eta(u,v), \\ \omega_e, & \text{otherwise}, \end{cases} \tag{5.1}$$
for each bond $e$. The configuration $Y_{uv\eta}(\omega)$ then contains two or more disjoint open dual circuits $\alpha_i$, each consisting of some dual bonds of $\gamma$ and some dual bonds of $J_\eta(u,v)$, with no open dual path connecting $\alpha_i$ to $\alpha_j$ for $i \ne j$, and with
$$\cup_i \mathrm{Int}(\alpha_i) = \mathrm{Int}(\gamma) \setminus Q_2(u,v). \tag{5.2}$$
Further,
$$|Q_2(u,v)| + \sum_{i\,:\,\alpha_i\ (\kappa_\tau r/3)\text{-small}} |\mathrm{Int}(\alpha_i)| \le K_{30} r^2, \tag{5.3}$$
and, since $\gamma$ is exterior, there is no open dual path connecting $\alpha_i$ to $\alpha_j$ for $i \ne j$. We call each $\alpha_i$ a $(q,r)$-offspring or a $(q,r)$-descendant of $\gamma$. A $(q,r)$-offspring of a $(q,r)$-descendant is also a $(q,r)$-descendant, iteratively. We may perform bottleneck surgery on each $(q,r)$-offspring of $\gamma$ which contains a clean $(q,r)$-bottleneck, and iterate this process until no descendant of $\gamma$ contains such a clean $(q,r)$-bottleneck (necessarily after a finite number of surgeries). The bottleneck-free $(q,r)$-descendants are called final $(q,r)$-descendants. Among final $(q,r)$-descendants, the one enclosing maximal area is called the maximal $(q,r)$-descendant of $\gamma$ and denoted $\alpha_{\max,\gamma}$. The set of all $(\kappa_\tau r/3)$-large final $(q,r)$-descendants of $\gamma$ is denoted $\mathcal{F}_{(q,r)}(\gamma)$; the non-maximal among these form the set $\mathcal{F}'_{(q,r)}(\gamma) = \mathcal{F}_{(q,r)}(\gamma) \setminus \{\alpha_{\max,\gamma}\}$. Note that since $\gamma$ is exterior, so is each offspring of $\gamma$.

It is useful to note the following general fact about norms on $\mathbb{R}^2$, which can be verified by a simple geometric argument. Let $C$ be a convex set; then
$$\mathcal{W}(\partial C) \le 6\, \mathrm{diam}_\tau(C). \tag{5.4}$$
Define
$$u(c, A) = \max(w_1 A^{1/2} - cA^{1/6},\ 0)$$
and
$$D_{(q,r)}(\gamma) = \sum_{\alpha \in \mathcal{F}_{(q,r)}(\gamma)} \mathrm{diam}_\tau(\alpha), \qquad D'_{(q,r)}(\gamma) = \sum_{\alpha \in \mathcal{F}'_{(q,r)}(\gamma)} \mathrm{diam}_\tau(\alpha).$$
The following lemma is related to (5.4).

Lemma 5.2. Let $\gamma$ be a circuit, let $A = |\mathrm{Int}(\gamma)|$, and let $q \ge 1$, $r \ge 15q$. Then
$$D_{(q,r)}(\gamma) \ge \frac{1}{6} w_1 \sqrt{A}. \tag{5.5}$$

Proof. We may assume $\gamma$ contains a clean $(q,r)$-bottleneck $(u,v)$, for otherwise (5.5) is immediate from (5.4). We have
$$q + 2\sqrt{2} \le \frac{1}{4} r.$$
Let $S$ denote the union of $Q_2(u,v)$ and all $(\kappa_\tau r/3)$-small offspring of $\gamma$, and let $R = \{z \in \mathbb{R}^2 : d_\tau(z, Q_2(u,v)) \le \kappa_\tau r/3\}$. Then $S \subset R$,
$$\mathrm{diam}(R) \le q + 2\sqrt{2} + \frac{2\sqrt{2}\, r}{3}$$
and
$$|R| \le \Bigl| \Bigl\{ z \in \mathbb{R}^2 : d(z, Q_2(u,v)) \le \frac{\sqrt{2}\, r}{3} \Bigr\} \Bigr| \le \pi \left( \frac{q + 2\sqrt{2} + \frac{2\sqrt{2}\, r}{3}}{2} \right)^2 \le r^2. \tag{5.6}$$
Note that the set $\{\alpha_1, \alpha_2, \dots\}$ of $(\kappa_\tau r/3)$-large offspring of $\gamma$ can be divided into two disjoint classes: right offspring, which intersect $\gamma^{[u,v]}$, and left offspring, which intersect $\gamma^{[v,u]}$. Also, every point of $\gamma$ is either in a left offspring, in a right offspring, or in $S$. The diameter of $Q_2(u,v)$ is at most $q + 2\sqrt{2} \le r/6$, while the diameters of $\gamma^{[u,v]}$ and $\gamma^{[v,u]}$ are at least $r$, so the right and left classes each include at least one $(5\kappa_\tau r/6\sqrt{2})$-large offspring. Further, if
$$D_{(q,r)}(\gamma) \ge \mathrm{diam}_\tau(\gamma), \tag{5.7}$$
then (5.5) follows from (5.4). Let $w$ and $x$ be sites of $\gamma$ with $d_\tau(w,x) = \mathrm{diam}_\tau(\gamma)$. At least one of these points is not in $Q_2(u,v)$, so we may assume $w$ is in some $\alpha_i$. There are now four possibilities. First, if also $x \in \alpha_i$, then (5.7) holds. Second, if instead $x \in S$, then there exists a $(5\kappa_\tau r/6\sqrt{2})$-large offspring $\alpha_j \ne \alpha_i$, and we have
$$\begin{aligned}
D_{(q,r)}(\gamma) &\ge \mathrm{diam}_\tau(\alpha_i) + \mathrm{diam}_\tau(\alpha_j) \\
&\ge \mathrm{diam}_\tau(\gamma) - \sqrt{2}\,\kappa_\tau (q + 2\sqrt{2}) - \frac{\kappa_\tau r}{3} + \frac{5\kappa_\tau r}{6\sqrt{2}} \\
&\ge \mathrm{diam}_\tau(\gamma),
\end{aligned}$$
and again (5.7) holds. Third, suppose $x \in \alpha_k$ for some $k \neq i$ and there exists a third $(\kappa_\tau r/3)$-large offspring $\alpha_l$ with $l \neq i, k$. Let $d_m = \mathrm{diam}_\tau(\alpha_m)$. Then
\[ d_i + d_k \ge \mathrm{diam}_\tau(\gamma) - 2\kappa_\tau(q + 2\sqrt{2}) \ge \mathrm{diam}_\tau(\gamma) - d_l, \]
so once more (5.7) holds.

The fourth possibility is that $x \in \alpha_k$ for some $k \neq i$ and $\alpha_i, \alpha_k$ are the only $(\kappa_\tau r/3)$-large offspring; each is necessarily actually $(5\kappa_\tau r/(6\sqrt{2}))$-large. From (5.4) we have
\[ A \le |R| + |\mathrm{Int}(\alpha_i)| + |\mathrm{Int}(\alpha_k)| \le r^2 + \frac{36}{w_1^2}\bigl(d_i^2 + d_k^2\bigr). \]
Using this and the fact that $w_1 \le 4\kappa_\tau$ (since the unit square encloses unit area) we obtain
\[ \frac{w_1^2}{36} A \le \frac{4}{9}\kappa_\tau^2 r^2 + d_i^2 + d_k^2 \le 2 d_i d_k + d_i^2 + d_k^2. \]
Taking square roots yields (5.5). □

For $k \ge 0$ define the events
\[ M_y(k, q, r, A, A', d', t) = \bigl[|F'_{(q,r)}(\Gamma_y)| = k\bigr] \cap \bigl[|\mathrm{Int}(\Gamma_y)| = A\bigr] \cap \bigl[|\mathrm{Int}(\alpha_{\max,\Gamma_y})| = A'\bigr] \cap \bigl[D'_{(q,r)}(\Gamma_y) \in [d', d'+1)\bigr] \cap \bigl[W(\partial\, \mathrm{Co}(\alpha_{\max,\Gamma_y})) \ge t\bigr]. \]
We first consider $k = 0$, which means $\alpha_{\max,\gamma} = \gamma$ and $D'_{(q,r)}(\gamma) = 0$; larger values will be handled later by induction.

Proposition 5.3. Assume (2.7) and either (i) the ratio weak mixing property or (ii) both the weak mixing property and the near-Markov property for open circuits. Then there exist constants $\epsilon_i, K_i$ as follows. Let $A \ge 1$, $t_+ \ge 0$, $t = w_1\sqrt{A} + t_+ \ge 2$, and $\epsilon_{13} t > r > 15q > K_{31} \log t$. Then
\[ P(M_0(0, q, r, A, A, 0, t)) \le \exp\Bigl( -u(K_{32} r^{2/3}, A) - \frac{1}{2} t_+ \Bigr). \tag{5.8} \]
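Proof. From the definition of $w_1$ we may assume $t_+ \ge 0$. It follows easily from (2.9) that for some $K_{33}, K_{34}$,
\[ \begin{aligned} P(\mathrm{diam}_\tau(\Gamma_0) \ge t) &\le \sum_{m > t-1} P(m \le \mathrm{diam}_\tau(\Gamma_0) < m+1) \\ &\le \sum_{m > t-1} \ \sum_{x^*, y^* \in B_\tau(0, m+1) \cap (\mathbb{Z}^2)^* :\ \tau(y-x) \ge m} e^{-\tau(y-x)} \\ &\le K_{33} t^4 e^{-t} \le e^{-u(K_{34} r^{2/3}, A) - t_+}, \end{aligned} \]
so by (5.4) it suffices to consider
\[ M_0(q, r, A, t) = M_0(0, q, r, A, A, 0, t) \cap [t/6 \le \mathrm{diam}_\tau(\Gamma_0) \le t]. \]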
(5.9)
Suppose $\omega \in M_0(q, r, A, t)$. Fix $\alpha > 0$ to be specified, let $s = \alpha t^{2/3} r^{1/3}$ and suppose $\mathrm{HSkel}_s(\Gamma_0) = (y_0, \ldots, y_{m+1})$. By (3.1),
\[ m \le K_{35}\, \alpha^{-1} t^{1/3} r^{-1/3}. \tag{5.10} \]
Let $B_i = B(y_i, 4r) \cap (\mathbb{Z}^2)^*$. Let $I = \{i \le m : |y_{i+1} - y_i| > 8r\}$. For each $i \in I$ there is a segment $\Gamma_0^{[w_i, x_i]} \subset \Gamma_0^{[y_i, y_{i+1}]}$ entirely outside $B_i \cup B_{i+1}$, with $w_i \in \partial B_i$ and $x_i \in \partial B_{i+1}$. We next show that
\[ d\bigl(\Gamma_0^{[w_i, x_i]}, \Gamma_0^{[w_j, x_j]}\bigr) > q/2 \quad \text{for all } i \neq j \text{ in } I. \tag{5.11} \]
If not, there exist $u \in \Gamma_0^{[w_i, x_i]}$, $v \in \Gamma_0^{[w_j, x_j]}$ and a dual path $\psi$ from $u$ to $v$ in $\mathrm{Co}(\Gamma_0)$ with $|\psi| \le q$. Let $a$ be the last site of $\psi$ in $\Gamma_0^{[y_i, y_{i+1}]}$, and $b$ the first site of $\psi$ after $a$ which is in some segment $\Gamma_0^{[y_k, y_{k+1}]}$ with $k \neq i$. Since all sites $y_l$ are extreme points, we must have $\psi^{(a,b)} \subset \mathrm{Int}(\Gamma_0)$. We claim that $(a, b)$ is a $(q, 3r)$-bottleneck. By Lemma 5.1 this is a contradiction, so our claim will establish (5.11). Suppose $i < k$; the proof if $i > k$ is similar. We have $\psi \subset B(u, q)$ and $u \notin B_{i+1}$, so $\psi \cap B(y_{i+1}, 3r+1) = \emptyset$. Therefore $\Gamma_0^{[a,b]}$ contains a segment in $\mathcal{B}(B_{i+1})$ which includes $y_{i+1}$ and has diameter at least $3r$. Similarly, since $v \notin B_i$, $\Gamma_0^{[b,a]}$ contains a segment in $\mathcal{B}(B_i)$ which includes $y_i$ and has diameter at least $3r$. This proves the claim and thus (5.11).

From (3.6) and (5.10) we have
\[ \begin{aligned} \sum_{i \in I} \tau(x_i - w_i) &\ge W(\mathrm{HPath}_s(\Gamma_0)) - K_{36}\, m r \\ &\ge W(\partial\, \mathrm{Co}(\Gamma_0)) - K_{37}\, \alpha^2 t^{1/3} r^{2/3} - K_{36}\, m r \\ &\ge t - K_{38} (\alpha^2 + \alpha^{-1})\, t^{1/3} r^{2/3}. \end{aligned} \tag{5.12} \]
Equation (5.12) shows that it is optimal to take $\alpha$ of order 1 in our choice of $s$, so we now set $\alpha = 1$. For $w_i \in \partial(B_i \cap \mathbb{Z}^2)$ and $x_i \in \partial(B_{i+1} \cap \mathbb{Z}^2)$ for each $i \le m$, let $A(w_0, x_0, \ldots, w_m, x_m)$ be the event that for each $i \in I$ there is an open dual path $\alpha_i$ from $w_i$ to $x_i$ in $B_\tau(w_i, t)$, with $d(\alpha_i, \alpha_j) > q/2$ for all $i \neq j$. Then we have shown
\[ \begin{aligned} P\bigl(M_0(q, r, A, t) \cap [\mathrm{HSkel}_s(\Gamma_0) = (y_0, \ldots, y_{m+1})]\bigr) &\le \sum_{w_0} \sum_{x_0} \cdots \sum_{w_m} \sum_{x_m} P(A(w_0, x_0, \ldots, w_m, x_m)) \\ &\le (K_{39} r^2)^{m+1} \max_{w_0, x_0, \ldots, w_m, x_m} P(A(w_0, x_0, \ldots, w_m, x_m)). \end{aligned} \tag{5.13} \]
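The remark after (5.12) that $\alpha$ should be of order 1 can be checked directly: the only $\alpha$-dependence of the error term is the factor $\alpha^2 + \alpha^{-1}$. The short Python sketch below (function and variable names are ours, introduced only for illustration) compares the calculus minimizer with a crude grid search and with the value at $\alpha = 1$.

```python
def error_factor(alpha: float) -> float:
    """The alpha-dependent factor alpha**2 + 1/alpha from the last line of (5.12)."""
    return alpha ** 2 + 1.0 / alpha


# Setting the derivative 2*alpha - alpha**(-2) to zero gives alpha = 2**(-1/3).
alpha_star = 2.0 ** (-1.0 / 3.0)

# Crude grid search over (0, 5] as an independent check of the calculus.
grid = [k / 1000.0 for k in range(1, 5001)]
alpha_grid = min(grid, key=error_factor)

# Ratio of the factor at alpha = 1 to its minimum; a value close to 1 means
# the convenient choice alpha = 1 loses only a bounded constant.
loss_ratio = error_factor(1.0) / error_factor(alpha_star)
```

Since the ratio is a fixed constant, choosing $\alpha = 1$ rather than the exact minimizer only changes the constant multiplying $t^{1/3} r^{2/3}$.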
Provided $K_{31}$ is sufficiently large, Lemma 3.2, (5.12) and induction give
\[ P(A(w_0, x_0, \ldots, w_m, x_m)) \le 2^m \prod_{i \in I} P(w_i \leftrightarrow x_i) \le 2^m e^{-t + 2K_{38} t^{1/3} r^{2/3}}, \tag{5.14} \]
which with (5.10) and (5.13) yields
\[ P\bigl(M_0(q, r, A, t) \cap [\mathrm{HSkel}_s(\Gamma_0) = (y_0, \ldots, y_{m+1})]\bigr) \le e^{-t + K_{40} t^{1/3} r^{2/3}}. \tag{5.15} \]
The number of possible $(y_0, \ldots, y_{m+1})$ in (5.15) is at most $(K_{41} t^2)^{m+1}$, which with (5.15) yields
\[ P(M_0(q, r, A, t)) \le e^{-t + K_{42} t^{1/3} r^{2/3}}, \tag{5.16} \]
provided $K_{31}$, and hence $r$, is large enough. It is easily verified that, provided $\epsilon_{13}$ is small enough,
\[ K_{42} (t_+ + w_1 A^{1/2})^{1/3} r^{2/3} \le 2K_{42} (w_1 A^{1/2})^{1/3} r^{2/3} + \frac{1}{2} t_+, \]
by considering two cases according to which of $t_+$ and $w_1 A^{1/2}$ is larger. This and (5.16) establish (5.8) for $M_0(q, r, A, t)$; as we have noted, this and (5.9) establish (5.8) as given. □

Remark 5.4. Let $I$ be any increasing event. Since the event on the left side of (5.14) is decreasing, its probability is not increased by conditioning on $I$. It follows easily that Proposition 5.3 remains true if the probability on the left side of (5.8) is conditioned on $I$, even though $M_0(0, q, r, A, A, 0, t)$ is not itself a decreasing event.

Under (2.7), open dual bonds do not percolate, so for every bounded set $A$ there is a.s. an innermost open circuit surrounding $A$; we denote this circuit by $O(A)$. An enclosure event is an event of the form
\[ \bigcap_{i \le n} \bigl( \mathrm{Open}(\alpha_i) \cap [\alpha_i \leftrightarrow \infty] \bigr), \]
where $\alpha_1, \ldots, \alpha_n$ are circuits (of regular bonds). This includes the degenerate case of the full space $\{0,1\}^{\mathcal{B}(\mathbb{Z}^2)}$. Clearly any such event is increasing.

Proposition 5.5. Assume (2.7), the weak mixing property and the near-Markov property for open circuits. There exist constants $K_i, \epsilon_i$ as follows. Let $A \ge A' \ge 3$, $k \ge 0$, $t_+ \ge 0$, $t = w_1\sqrt{A'} + t_+$, $d' \ge 0$, and $\epsilon_{14}(w_1\sqrt{A} + d' + t_+) \ge r \ge 15q \ge K_{43} \log A$. Then
\[ P(M_0(k, q, r, A, A', d', t)) \le \exp\Bigl( -u(K_{44} r^{2/3}, A) - \frac{1}{60} t_+ - \frac{1}{10} d' \Bigr) \tag{5.17} \]
and
\[ P(M_0(k, q, r, A, A', d', t)) \le \exp\Bigl( -\frac{1}{2} d' \Bigr) P(M_0(0, q, r, A', A', 0, t)). \tag{5.18} \]
Remark 5.6. The proof of Proposition 5.5 actually shows slightly more than (5.18). For $U \subset \mathbb{R}^2$ let
\[ M_{y,U}(k, q, r, A, A', d', t) = M_y(k, q, r, A, A', d', t) \cap [\mathrm{Int}(\Gamma_y) \subset U]. \]
Under the assumptions of the proposition, for every enclosure event $E$ and every $U \subset \mathbb{R}^2$, we have
\[ P(M_{0,U}(k, q, r, A, A', d', t) \mid E) \le \exp\Bigl( -u(K_{44} r^{2/3}, A) - \frac{1}{60} t_+ - \frac{1}{10} d' \Bigr) \tag{5.19} \]
and
\[ P(M_{0,U}(k, q, r, A, A', d', t) \mid E) \le \exp\Bigl( -\frac{1}{2} d' \Bigr) \max_{y \in U \cap \mathbb{Z}^2} P(M_{y,U}(0, q, r, A', A', 0, t) \mid E). \tag{5.20} \]
We do not need this stronger result here, but it will be useful in a forthcoming paper.
Proof of Proposition 5.5. We will refer to the requirement $\epsilon_{14}(w_1\sqrt{A} + d' + t_+) \ge r$ as the size condition, and to all other assumptions of the proposition collectively as the basic assumptions. We first prove (5.17). We proceed by induction on $k$, using Proposition 5.3 for $k = 0$. Fix $q, r$ and define, for $U \subset \mathbb{R}^2$,
\[ L_{y,U}(k, A, A', d, d', t) = M_{y,U}(k, q, r, A, A', d', t) \cap [d \le D_{(q,r)}(\Gamma_y) < d + 1], \tag{5.21} \]
where
\[ \frac{1}{6} w_1 \sqrt{A} - 1 \le d \le K_{45} A, \qquad d \ge \frac{1}{6} t_+ - 1. \tag{5.22} \]
We omit the $U$ in the notation when $U = \mathbb{R}^2$. If $K_{45}$ is large enough then, from Lemma 5.2 and the lattice nature of $\Gamma_y$, $L_{y,U}(k, A, A', d, d', t)$ is empty if any of the inequalities in (5.22) fails. Note that for some $K_{46}$,
\[ M_y(k, q, r, A, A', d', t) \subset [\mathrm{diam}(\Gamma_y) \le K_{46} A]. \tag{5.23} \]
Our induction hypothesis is that for some constants $\epsilon_i, K_i$, for all $j < k$, all $A, A', t, d'$ satisfying the basic assumptions, all $U \subset \mathbb{R}^2$ and all $d$ satisfying (5.22), for every enclosure event $E$, we have
\[ P(L_{0,U}(j, A, A', d, d', t) \mid E) \le \exp\Bigl( -\frac{9}{10} d + \frac{\kappa_\tau}{40} j r \Bigr); \tag{5.24} \]
if the size condition is also satisfied, then in addition
\[ P(L_{0,U}(j, A, A', d, d', t) \mid E) \le \exp\Bigl( -u(K_{32} r^{2/3}, A) - \frac{1}{60} t_+ - \frac{1}{10} d' \Bigr), \tag{5.25} \]
with $K_{32}$ from (5.8). We wish to verify these hypotheses for $j = k$. For $j = 0$ it suffices to consider $d' = 0$ and (5.25) is Proposition 5.3 (together with Remark 5.4), while (5.24) follows easily from the first inequality in (5.9), if $K_{43}$ is large. Hence we may assume $k \ge 1$ and fix $A, A', d, d'$.

Let $Q(u, v, \eta)$ denote the event that $L_0(k, A, A', d, d', t)$ occurs with $(u, v)$ a primary $(q, r)$-bottleneck in $\Gamma_0$ of type $\eta$, and let $R(u, v, \eta) = \{Y_{uv\eta}(\omega) : \omega \in Q(u, v, \eta)\}$. Let $E$ be an enclosure event; it is easy to see that bottleneck surgery cannot destroy $E$, that is, $Y_{uv\eta}(Q(u, v, \eta) \cap E) \subset R(u, v, \eta) \cap E$. (This is the reason for considering only enclosure events, not general increasing events.) Since $|\partial Q_1(u, v)| + |J_\eta(u, v)| \le K_{47} q$, we then have from the bounded energy property:
\[ P(Q(u, v, \eta) \mid E) \le e^{K_{48} q} P(R(u, v, \eta) \mid E). \tag{5.26} \]
Fix $u, v, \eta$ and for $y_1, \ldots, y_l \in \mathbb{Z}^2$ and $l, A_i, k_i, d_i, d_i' \ge 0$ in $\mathbb{Z}$, let $Z = Z(l, y_1, \ldots, y_l, A_1, \ldots, A_l, d_1, \ldots, d_l, d_1', \ldots, d_l', k_1, \ldots, k_l)$ denote the event that there exist disjoint exterior open dual circuits $\alpha_1, \ldots, \alpha_l$ such that:

(i) $\alpha_i$ surrounds $y_i$, $|\mathrm{Int}(\alpha_i)| = A_i$, $\mathrm{diam}_\tau(\alpha_i) \ge \kappa_\tau r/3$, $|F'_{(q,r)}(\alpha_i)| = k_i$, $d_i \le D_{(q,r)}(\alpha_i) < d_i + 1$ and $d_i' \le D'_{(q,r)}(\alpha_i) < d_i' + 1$ for all $i \le l$;

(ii) letting $\alpha_{\mathrm{main}}$ denote the open dual circuit enclosing maximal area among all descendants of all $\alpha_i$, we have $\alpha_{\mathrm{main}}$ a descendant of $\alpha_1$ satisfying $|\mathrm{Int}(\alpha_{\mathrm{main}})| = A'$ and $W(\partial\, \mathrm{Co}(\alpha_{\mathrm{main}})) \ge w_1\sqrt{A'} + t_+$;

(iii) there is no open dual path connecting $\alpha_i$ to $\alpha_n$ for any two indices $i \neq n$.
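We suppress the parameters in $Z$ when confusion is unlikely. Then considering only $(\kappa_\tau r/3)$-large offspring we see that
\[ R(u, v, \eta) \subset \bigcup Z(l, y_1, \ldots, y_l, A_1, \ldots, A_l, d_1, \ldots, d_l, d_1', \ldots, d_l', k_1, \ldots, k_l), \tag{5.27} \]
where the union is over all parameters satisfying
\[ 2 \le l \le \min(4q, k+1), \quad y_i \in (\mathbb{Z}^2)^* \text{ with } d(y_i, \zeta_{uv}) \le 2, \quad A - K_{30} r^2 \le \sum_{i \le l} A_i \le A, \quad A_i \ge \frac{\kappa_\tau r}{6}, \quad \sum_{i \le l} k_i = k + 1 - l, \tag{5.28} \]
\[ A_1 \ge A', \qquad \frac{1}{6} w_1 \sqrt{A_i} - 1 \le d_i \le K_{45} A_i, \qquad d_i' \le d_i, \tag{5.29} \]
\[ d' - l \le d_1' + \sum_{2 \le i \le l} d_i \le d', \qquad d - l \le \sum_{i \le l} d_i \le d, \tag{5.30} \]
\[ d_1 - d_1' + 1 \ge \frac{1}{6}\bigl(w_1 \sqrt{A'} + t_+\bigr). \tag{5.31} \]
Here (5.31) and the first inequality in (5.29) follow from (ii) above and (5.4), and $K_{30}$ is from (5.3). Temporarily fix such a set of parameters and let $\nu_1, \ldots, \nu_l$ be circuits with
\[ \mathrm{diam}_\tau(\nu_i) \ge \frac{\kappa_\tau r}{3} \ \text{ for all } i, \quad \text{and} \quad \mathrm{Int}(\nu_i) \cap \mathrm{Int}(\nu_j) = \emptyset \ \text{ for } i \neq j. \]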
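Define events
\[ \tilde{L}_i = \bigcup_{3 \le B \le A_i} L_{y_i}(k_i, A_i, B, d_i, d_i', w_1\sqrt{B}), \quad i \neq 1, \qquad \tilde{L}_1 = L_{y_1}(k_1, A_1, A', d_1, d_1', t), \]
\[ T_i = [O(y_i) = \nu_i], \qquad Y_i = \tilde{L}_i \cap T_i \ \text{ for } i \le l, \qquad T = \bigcap_{i \le l} T_i. \]
Then
\[ Z \cap T \subset \bigcap_{i \le l} Y_i. \tag{5.32} \]
Note that as in (3.14),
\[ Y_i = \mathrm{Open}(\nu_i) \cap [\nu_i \leftrightarrow \infty] \cap G_i \quad \text{for some } G_i \in \mathcal{G}_{\mathcal{B}(\mathrm{Int}(\nu_i))}, \ \text{for every } i \le l. \tag{5.33} \]
Define regions and events
\[ R = \bigcup_{i \le l-1} \mathrm{Int}(\nu_i), \quad F = \bigcap_{i \le l-1} \mathrm{Ext}(\nu_i), \quad H = \bigcap_{i \le l-1} \mathrm{Open}(\nu_i), \quad \tilde{G} = \bigcap_{i \le l-1} G_i, \quad N = \bigcap_{i \le l-1} [\nu_i \leftrightarrow \infty], \]
and let $L_l$ denote the event that $\tilde{L}_l$ occurs on $\mathcal{B}(F)$. Then
\[ N \cap E \cap H = E_R \cap E_F \cap H \quad \text{for some } E_R \in \mathcal{G}_{\mathcal{B}(R)},\ E_F \in \mathcal{G}_{\mathcal{B}(F)}, \tag{5.34} \]
\[ \bigcap_{i \le l-1} Y_i = H \cap N \cap \tilde{G}, \tag{5.35} \]
and
\[ \bigcap_{i \le l} Y_i \subset L_l. \tag{5.36} \]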
The relation between area $A_i$ and diameter $d_i$ tells us roughly whether the circuit $\alpha_i$ (or its collection of descendants) is regular or irregular; we thereby subdivide the circuits into "large regular", "small regular" and "irregular" categories as follows:
\[ I_1 = \{i \le l : d_i < 4w_1\sqrt{A_i},\ A_i \ge c_1 r^2\}, \quad I_2 = \{i \le l : d_i < 4w_1\sqrt{A_i},\ A_i < c_1 r^2\}, \quad I_3 = \{i \le l : d_i \ge 4w_1\sqrt{A_i}\}, \]
where $c_1 = \max\bigl(1/\epsilon_{14}^2,\ (3K_{32}/w_1)^3\bigr)$ is chosen so that
\[ u(K_{32} r^{2/3}, A_i) \ge \frac{2}{3} w_1 \sqrt{A_i}, \qquad i \in I_1. \tag{5.37} \]
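The three categories can be read as a simple decision rule on the pair (diameter, area). The following Python sketch illustrates it, assuming the thresholds compare $d_i$ with $4 w_1 \sqrt{A_i}$ and $A_i$ with $c_1 r^2$; the function name and all numerical values are hypothetical and serve only to exercise each branch.

```python
import math


def classify_circuit(d_i: float, area_i: float, r: float, w1: float, c1: float) -> str:
    """Return 'I1' (large regular), 'I2' (small regular) or 'I3' (irregular)."""
    # Irregular: the diameter is large compared with the square root of the area.
    if d_i >= 4.0 * w1 * math.sqrt(area_i):
        return "I3"
    # Regular: split by whether the area reaches the scale c1 * r**2.
    return "I1" if area_i >= c1 * r ** 2 else "I2"


# Hypothetical numbers, chosen only so that each category occurs once.
w1, c1, r = 4.0, 2.0, 10.0
labels = [
    classify_circuit(d_i=50.0, area_i=400.0, r=r, w1=w1, c1=c1),
    classify_circuit(d_i=50.0, area_i=100.0, r=r, w1=w1, c1=c1),
    classify_circuit(d_i=500.0, area_i=400.0, r=r, w1=w1, c1=c1),
]
```

Every index falls in exactly one class, which is what allows the sums over $i \le l$ later in the proof to be split into sums over $I_1$, $I_2$ and $I_3$.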
Let 1 9 κτ µi = max u(K32 r 2/3 , Ai ) + di , di − ki r , i ∈ I1 \{1}, 10 10 40 9 κτ µi = di − ki r, i ∈ I2 ∪ I3 , 10 40 1 1 9 κτ µ1 = max u(K32 r 2/3 , A1 ) + t+ + d1 , d1 − k1 r if 1 ∈ I1 60 10 10 40 (cf. (5.24)). Now H ∩ EF is an enclosure event so by the induction hypotheses (5.24) and (5.25), summing over B ≤ Al gives P (Ll | H ∩ EF ) ≤ Al e−µl .
(5.38)
(Note that the size condition can be used here for $i \in I_1$.) Since $|\nu_i| \ge \epsilon_{15} r$ for all $i$, from (2.5) and (2.6), provided $K_{43}$ is sufficiently large we get
\[ P(L_l \cap E_F \mid H \cap \tilde{G} \cap E_R) \le (1 + e^{-\epsilon_{15} r/2}) P(L_l \cap E_F \mid H) \tag{5.39} \]
and
\[ P(E_F \mid H) \le (1 + e^{-\epsilon_{15} r/2}) P(E_F \mid H \cap \tilde{G} \cap E_R). \tag{5.40} \]
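Combining (5.33)–(5.40) we obtain
\[ \begin{aligned} P\Bigl( \bigl(\textstyle\bigcap_{i \le l} Y_i\bigr) \cap E \Bigr) &\le P\Bigl( L_l \cap \bigl(\textstyle\bigcap_{i \le l-1} Y_i\bigr) \cap E \Bigr)\, P\Bigl( T_l \Bigm| L_l \cap \bigl(\textstyle\bigcap_{i \le l-1} Y_i\bigr) \cap E \Bigr) \\ &= P(L_l \cap H \cap \tilde{G} \cap E_R \cap E_F)\, P\Bigl( T_l \Bigm| L_l \cap \bigl(\textstyle\bigcap_{i \le l-1} Y_i\bigr) \cap E \Bigr) \\ &\le (1 + e^{-\epsilon_{15} r/2})\, P(L_l \cap E_F \mid H)\, P(H \cap \tilde{G} \cap E_R) \cdot P\Bigl( T_l \Bigm| L_l \cap \bigl(\textstyle\bigcap_{i \le l-1} Y_i\bigr) \cap E \Bigr) \\ &= (1 + e^{-\epsilon_{15} r/2})\, P(L_l \mid E_F \cap H)\, P(E_F \mid H)\, P(H \cap \tilde{G} \cap E_R) \cdot P\Bigl( T_l \Bigm| L_l \cap \bigl(\textstyle\bigcap_{i \le l-1} Y_i\bigr) \cap E \Bigr) \\ &\le (1 + e^{-\epsilon_{15} r/2})^2 A_l e^{-\mu_l}\, P(E_F \mid H \cap \tilde{G} \cap E_R)\, P(H \cap \tilde{G} \cap E_R) \cdot P\Bigl( T_l \Bigm| L_l \cap \bigl(\textstyle\bigcap_{i \le l-1} Y_i\bigr) \cap E \Bigr) \\ &= (1 + e^{-\epsilon_{15} r/2})^2 A_l e^{-\mu_l}\, P\Bigl( \bigl(\textstyle\bigcap_{i \le l-1} Y_i\bigr) \cap E \Bigr)\, P\Bigl( T_l \Bigm| L_l \cap \bigl(\textstyle\bigcap_{i \le l-1} Y_i\bigr) \cap E \Bigr). \end{aligned} \tag{5.41} \]
Summing over $\nu_l$ (which appears via $T_l$), dividing by $P(E)$ and iterating this (taking $H$ and $N$ to be the full space and $R = \emptyset$, $F = \mathbb{R}^2$, at the last iteration step) we obtain using (5.32)
\[ P(Z \mid E) \le (1 + e^{-\epsilon_{15} r/4})^{2l} \prod_{i \le l} A_i e^{-\mu_i} \le 2A^l \exp\Bigl( -\sum_{i \le l} \mu_i \Bigr). \tag{5.42} \]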
We now want to sum (5.42) over all parameters of $Z$ allowed in (5.27). We first view $l$ as fixed and allow the other parameters to vary. Note that the number of parameter choices is at most $(K_{49} A)^{5l+1}$, and the number of possible $(u, v, \eta)$ is at most $(K_{49} A)^3$, for some $K_{49}$. Suppose we can show that, under the basic assumptions,
\[ \sum_{i \le l} \mu_i \ge \frac{9}{10} d - \frac{\kappa_\tau}{40} k r + \frac{\kappa_\tau}{100} (l-1) r \tag{5.43} \]
and, if the size condition is also satisfied,
\[ \sum_{i \le l} \mu_i \ge u(K_{32} r^{2/3}, A) + \frac{1}{10} d' + \frac{1}{60} t_+ + \frac{\kappa_\tau}{100} (l-1) r. \tag{5.44} \]
Then from (5.27) and (5.42),
\[ P(R(u, v, \eta) \mid E) \le 2 (K_{49} A)^{5l+2} \exp\Bigl( -\frac{9}{10} d + \frac{\kappa_\tau}{40} k r - \frac{\kappa_\tau}{40} (l-1) r \Bigr), \]
and if the size condition is satisfied,
\[ P(R(u, v, \eta) \mid E) \le (K_{49} A)^{5l+2} \exp\Bigl( -\max\Bigl[ u(K_{32} r^{2/3}, A) + \frac{1}{10} d' + \frac{1}{60} t_+,\ \frac{9}{10} d - \frac{\kappa_\tau}{40} k r \Bigr] - \frac{\kappa_\tau}{100} (l-1) r \Bigr). \]
Thus, summing over $u, v, \eta$, then over $l$, and using $r > K_{43} \log A$ and (5.26), we obtain (5.24) and (5.25) for $j = k$.
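Now (5.43) is a direct consequence of (5.28) and (5.30), so we turn to (5.44) and assume that the size condition holds. Let
\[ \delta_i = \begin{cases} \frac{1}{10} d_i, & i \ge 2, \\[2pt] \frac{1}{10} d_1' + \frac{1}{60} t_+, & i = 1, \end{cases} \]
and set $\beta_1 = \beta_3 = 1/2$, $\beta_2 = 1/10$. We claim that
\[ \mu_i \ge \beta_j w_1 \sqrt{A_i} + \delta_i + \frac{1}{10} d_i - 1 \quad \text{for every } i \in I_j,\ j = 1, 2, 3. \tag{5.45} \]
Observe that
\[ d_i \ge (k_i + 1) \frac{\kappa_\tau r}{3} - 1 \ge (k_i + 1) \frac{\kappa_\tau r}{4}, \qquad d_i' \ge k_i \frac{\kappa_\tau r}{3} - 1 \ge k_i \frac{\kappa_\tau r}{4}, \tag{5.46} \]
and hence
\[ \mu_i \ge \frac{4}{5} d_i, \qquad i \le l. \tag{5.47} \]
This yields
\[ \mu_i \ge \frac{4}{5} d_i \ge w_1 \sqrt{A_i} + \frac{11}{20} d_i, \qquad i \in I_3, \tag{5.48} \]
which proves (5.45) for $i \in I_3 \setminus \{1\}$. For $i \in I_1$ we can use (5.37), (5.46) and a convex combination of the lower bounds (5.47) and $u(K_{32} r^{2/3}, A_i)$ for $\mu_i$ to obtain
\[ \mu_i \ge \frac{3}{4} u(K_{32} r^{2/3}, A_i) + \frac{1}{5} d_i \ge \frac{1}{2} w_1 \sqrt{A_i} + \frac{1}{5} d_i, \qquad i \in I_1, \tag{5.49} \]
which proves (5.45) for $i \in I_1 \setminus \{1\}$. From (5.29),
\[ d_i \ge \frac{1}{8} w_1 \sqrt{A_i} + \frac{1}{4} d_i - \frac{3}{4}, \qquad i \le l, \]
and hence by (5.47),
\[ \mu_i \ge \frac{4}{5} d_i \ge \frac{1}{10} w_1 \sqrt{A_i} + \frac{1}{5} d_i - \frac{3}{5}, \qquad i \le l, \tag{5.50} \]
which proves (5.45) for $i \in I_2 \setminus \{1\}$. We need slightly different estimates for $i = 1$. If $1 \in I_3$ then using (5.48) and (5.31) we obtain
\[ \mu_1 \ge w_1 \sqrt{A_1} + \frac{1}{2} d_1 \ge w_1 \sqrt{A_1} + \frac{1}{4} d_1 + \frac{1}{4} (d_1' - 1) + \frac{1}{24} t_+, \tag{5.51} \]
which proves (5.45) for $i = 1$. If $1 \in I_1$ then by (5.49) and (5.31),
\[ \mu_1 \ge \frac{1}{2} w_1 \sqrt{A_1} + \frac{1}{5} d_1 \ge \frac{1}{2} w_1 \sqrt{A_1} + \frac{1}{10} d_1 + \frac{1}{10} (d_1' - 1) + \frac{1}{60} t_+, \tag{5.52} \]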
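which again proves (5.45) for $i = 1$. Finally if $1 \in I_2$, then by (5.50), (5.52) remains valid with 1/10 in place of 1/2, and 3/5 subtracted from the right side, once again proving (5.45) for $i = 1$. Thus (5.45) holds in all cases.

The next step is to sum (5.45) over $i$. There are 2 cases.

Case 1. $d \ge 20 w_1 \sqrt{A}$. Then using (5.45), (5.30) and (5.46),
\[ \begin{aligned} \sum_{i \le l} \mu_i &\ge \sum_{i \le l} \Bigl( \delta_i + \frac{1}{10} d_i + 1 \Bigr) \\ &\ge \frac{1}{10} (d' - l) + \frac{1}{60} t_+ + \frac{1}{20} (d - l) + \frac{1}{20} \sum_{i \le l} d_i + l \\ &\ge \frac{1}{10} d' + \frac{1}{60} t_+ + w_1 \sqrt{A} + \frac{\kappa_\tau}{80} l r, \end{aligned} \tag{5.53} \]
which proves (5.44).

Case 2. $d < 20 w_1 \sqrt{A}$. By (5.30), (5.31) and (5.46),
\[ 6d' + t_+ \le 6(d + l + 1) \le 7d \le 140 w_1 \sqrt{A}. \]
This and the size condition imply
\[ r < 141\, \epsilon_{14} w_1 \sqrt{A} \le \frac{\kappa_\tau}{80 K_{30} w_1} \sqrt{A} \]
if $\epsilon_{14}$ is small enough, with $K_{30}$ as in (5.3), and then
\[ \sqrt{A - K_{30} r^2} \ge \sqrt{A} \Bigl( 1 - \frac{K_{30} r^2}{A} \Bigr) \ge \sqrt{A} - \frac{\kappa_\tau r}{80 w_1}. \tag{5.54} \]
Let
\[ S_j = \sum_{i \in I_j} A_i, \quad j = 1, 2, 3, \qquad \text{and} \qquad S = S_1 + S_2 + S_3. \]
Provided $\epsilon_{14}$ is small enough, we have by (5.3) and (5.54),
\[ w_1 \sqrt{S} \ge w_1 \sqrt{A} - \frac{\kappa_\tau r}{80}. \tag{5.55} \]
It is easily checked that, for $\theta \le 1$,
\[ \sqrt{a + b} \le \sqrt{a} + \theta \sqrt{b} \quad \text{for } 0 \le b \le 4\theta^2 a. \tag{5.56} \]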
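Inequality (5.56) follows by squaring: $(\sqrt{a} + \theta\sqrt{b})^2 - (a + b) = 2\theta\sqrt{ab} - (1 - \theta^2) b$, which is nonnegative as soon as $\sqrt{b} \le 2\theta\sqrt{a}$. As a sanity check, the Python sketch below tests the inequality on random inputs inside the admissible range, assuming $0 < \theta \le 1$; the helper name and the sampling ranges are ours, chosen only for illustration.

```python
import math
import random


def sqrt_sum_bound_holds(a: float, b: float, theta: float) -> bool:
    """Check sqrt(a + b) <= sqrt(a) + theta * sqrt(b), with a small float tolerance."""
    return math.sqrt(a + b) <= math.sqrt(a) + theta * math.sqrt(b) + 1e-12


random.seed(0)
violations = 0
for _ in range(10_000):
    theta = random.uniform(0.01, 1.0)
    a = random.uniform(0.0, 100.0)
    b = random.uniform(0.0, 4.0 * theta ** 2 * a)  # stay inside 0 <= b <= 4*theta^2*a
    if not sqrt_sum_bound_holds(a, b, theta):
        violations += 1

# Outside the admissible range the bound can fail, e.g. b much larger than a.
fails_outside = not sqrt_sum_bound_holds(1.0, 100.0, 0.1)
```

The restriction on $b$ is what makes a small $\theta$ usable: the smaller $\theta$ is, the smaller $b$ must be relative to $a$.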
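Choose $i_{13}, i_2$ satisfying
\[ A_{i_{13}} = \max_{i \in I_1 \cup I_3} A_i, \qquad A_{i_2} = \max_{i \in I_2} A_i. \]
(If $I_1 \cup I_3$ or $I_2$ is empty then the corresponding $i_{13}$ or $i_2$ is undefined.) We now consider two subcases.

Case 2a. $I_2 = \emptyset$ or $A_{i_2} \le \frac{1}{25}(S_1 + S_3)$. From the definition of $\mu_i$ if $i_{13} \in I_1$, and from (5.45) if $i_{13} \in I_3$, we have
\[ \mu_{i_{13}} \ge w_1 \sqrt{A_{i_{13}}} - \lambda + \delta_{i_{13}}, \tag{5.57} \]
where $\lambda = \min\bigl(K_{32} r^{2/3} A_{i_{13}}^{1/6},\ w_1 \sqrt{A_{i_{13}}}\bigr)$. Therefore by (5.56) with $\theta = 1/2$,
\[ \sum_{i \in I_1 \cup I_3} \mu_i \ge w_1 \sqrt{S_1 + S_3} - \lambda + \sum_{i \in I_1 \cup I_3} \delta_i + \sum_{i \in I_1 \cup I_3,\ i \neq i_{13}} \frac{1}{10} d_i - l. \tag{5.58} \]
Hence by (5.56) with $\theta = 1/25$, and (5.55), (5.46), (5.58) and (5.45),
\[ \begin{aligned} \sum_{i \le l} \mu_i &\ge w_1 \sqrt{S} - \lambda + \sum_{i \le l} \delta_i + \frac{\kappa_\tau}{40} (l-1) r - l \\ &\ge w_1 \sqrt{A} - \frac{\kappa_\tau r}{80} - \lambda + \frac{1}{10} (d' - l) + \frac{1}{60} t_+ + \frac{\kappa_\tau}{40} (l-1) r - l \\ &\ge u(K_{32} r^{2/3}, A) + \frac{1}{10} d' + \frac{1}{60} t_+ + \frac{\kappa_\tau}{100} (l-1) r, \end{aligned} \tag{5.59} \]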
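which gives (5.44).

Case 2b. $A_{i_2} > \frac{1}{25}(S_1 + S_3)$. Let us relabel $(A_i,\ i \in I_2)$ as $B_1 \ge \cdots \ge B_n$. We have $S_1 + S_3 \le 25 c_1 r^2$, while from (5.3), provided $\epsilon_{14}$ is small enough, $S \ge A - K_{30} r^2 \ge 32{,}525\, c_1 r^2$, so
\[ S_2 \ge 32{,}500\, c_1 r^2 \ge 325 \sum_{m=1}^{100} B_m, \]
which implies
\[ \frac{1}{20} \Bigl( \sum_{m=101}^{n} B_m \Bigr)^{1/2} \ge \frac{9}{10} \Bigl( \sum_{m=1}^{100} B_m \Bigr)^{1/2}. \]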
Using this, and using (5.56) twice (with θ = 1 and with θ = 1/20), we get n 1 Bi 10 m=1 1/2 100 n 1 1 ≥ Bm + Bi 10 10 m=1 m=101 1/2 1/2 1/2 100 n 100 n 1 1 9 ≥ Bm + Bi + Bi − Bm 20 20 10 m=1 m=101 m=101 m=1 ≥ S2 . (5.60)
If $I_1 \cup I_3 \neq \emptyset$ then (5.58) remains valid. This, with (5.45), (5.46) and (5.60), shows that, whether $I_1 \cup I_3 = \emptyset$ or not, (5.59) (with $\lambda = 0$ if $I_1 \cup I_3 = \emptyset$) still holds.
The proof of (5.44), and thus of (5.25), is now complete. Taking $y = 0$ and $E$ the full configuration space in (5.25), and summing over $d$ satisfying (5.22) shows that
\[ P(M_0(k, q, r, A, A', d', t)) \le K_{45} A \exp\Bigl( -u(K_{32} r^{2/3}, A) - \frac{1}{60} t_+ - \frac{1}{10} d' \Bigr). \tag{5.61} \]
This proves (5.17), with $K_{44} = 2K_{32}$.

It remains to prove (5.18). This is similar to the proof of (5.25), so we will only describe the changes. Again fix $q, r$. We make the same induction hypothesis, except that (5.25) is replaced by
\[ P(L_{0,U}(j, A, A', d, d', t) \mid E) \le \exp\Bigl( -\frac{4}{5} d' \Bigr) \sup_{y \in U \cap \mathbb{Z}^2} P(M_{y,U}(0, q, r, A', A', 0, t) \mid E), \tag{5.62} \]
and the requirement that the size condition be satisfied is removed. This hypothesis is true for $j = 0$, where only $d' = 0$ is relevant; hence we fix $k \ge 1$ and $A, A', d, d'$. Let
\[ f(A', t) = -\log \sup_{y \in U \cap \mathbb{Z}^2} P(M_{y,U}(0, q, r, A', A', 0, t) \mid E). \]
In place of $\mu_i$ we use
\[ \hat{\mu}_i = \frac{9}{10} d_i - \frac{\kappa_\tau}{40} k_i r, \quad i \ge 2, \qquad \hat{\mu}_1 = \max\Bigl( \frac{4}{5} d_1' + f(A', t),\ \frac{9}{10} d_1 - \frac{\kappa_\tau}{40} k_1 r \Bigr). \]
In place of (5.43), (5.44) and their multi-case proofs, we have simply, using the first half of (5.46),
\[ \sum_{2 \le i \le l} \Bigl( \frac{1}{10} d_i - \frac{\kappa_\tau}{40} k_i r \Bigr) \ge (l-1) \frac{\kappa_\tau r}{40} \ge l + \frac{\kappa_\tau r}{50} (l-1), \]
and hence using (5.30),
\[ \sum_{i \le l} \hat{\mu}_i \ge f(A', t) + \frac{4}{5} (d' - l) + \sum_{2 \le i \le l} \Bigl( \frac{1}{10} d_i - \frac{\kappa_\tau}{40} k_i r \Bigr) \ge f(A', t) + \frac{4}{5} d' + \frac{\kappa_\tau r}{50} (l-1). \]
This leads directly to (5.62) (with $L_{0,U}, M_{0,U}$ in place of $L_0, M_0$), as in the proof of (5.25). In place of (5.61) we have
\[ P(M_{0,U}(k, q, r, A, A', d', t)) \le K_{45} A \exp\Bigl( -\frac{4}{5} d' \Bigr) \sup_{y \in U \cap \mathbb{Z}^2} P(M_{y,U}(0, q, r, A', A', 0, t)). \tag{5.63} \]
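By (5.23), taking $U = B(0, K_{46} A)$ gives $M_{0,U}(k, q, r, A, A', d', t) = M_0(k, q, r, A, A', d', t)$, so (5.63) becomes
\[ P(M_0(k, q, r, A, A', d', t)) \le K_{45} A \exp\Bigl( -\frac{4}{5} d' \Bigr) P(M_0(0, q, r, A', A', 0, t)). \]
For $k \ge 1$ we need only consider $d' \ge \kappa_\tau r/3$, so provided $K_{43}$ is large enough, this proves (5.18). □

Lemma 5.7. Let $q \ge 1$, $r \ge 15q$, let $\gamma$ be a dual circuit and let $\alpha_{\max,\gamma}$ be its maximal $(q, r)$-descendant. Then
\[ W(\partial\, \mathrm{Co}(\gamma)) \le W(\partial\, \mathrm{Co}(\alpha_{\max,\gamma})) + 19 D'_{(q,r)}(\gamma). \]
Proof. We may assume $\gamma$ has at least one bottleneck. If $(u, v)$ is a primary bottleneck in $\gamma$, and the $(\kappa_\tau r/3)$-large offspring of $\gamma$ are $\alpha_1, \ldots, \alpha_k$, then
\[ \mathrm{Int}(\gamma) \subset B_\tau\Bigl( u, \frac{\kappa_\tau r}{3} + 2\kappa_\tau q \Bigr) \cup \bigcup_{i \le k} \mathrm{Int}(\alpha_i), \]
and therefore from (5.4), $\gamma$ can be surrounded by a (non-lattice) loop of $\tau$-length at most
\[ 12 \Bigl( \frac{\kappa_\tau r}{3} + 2\kappa_\tau q \Bigr) + \sum_{i \le k} W(\partial\, \mathrm{Co}(\alpha_i)). \]
Since $\partial\, \mathrm{Co}(\gamma)$ minimizes the $\tau$-length over all such loops, it follows that
\[ W(\partial\, \mathrm{Co}(\gamma)) \le 6\kappa_\tau r + \sum_{i \le k} W(\partial\, \mathrm{Co}(\alpha_i)). \]
Iterating this, and using (5.4) and
\[ D'_{(q,r)}(\gamma) \ge \frac{\kappa_\tau r}{3} |F'_{(q,r)}(\gamma)|, \]
we obtain
\[ W(\partial\, \mathrm{Co}(\gamma)) \le 6\kappa_\tau r\, |F'_{(q,r)}(\gamma)| + \sum_{\alpha \in F_{(q,r)}(\gamma)} W(\partial\, \mathrm{Co}(\alpha)) \le W(\partial\, \mathrm{Co}(\alpha_{\max,\gamma})) + 19 D'_{(q,r)}(\gamma). \]
□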
The next theorem, together with Theorem 4.1, shows roughly that for a droplet of size $A$, there is a cost for the convex hull boundary $\tau$-length exceeding the minimum $w_1\sqrt{A}$ by an amount $s_+$, this cost being exponential in $s_+$, and there is an exponential cost for positive $D'_{(q,r)}(\Gamma_0)$.

Theorem 5.8. Assume (2.7) and either (i) the ratio weak mixing property or (ii) both the weak mixing property and the near-Markov property for open circuits. There exist constants $K_i$ as follows. Let $A > K_{50}$, $s_+ \ge 0$, $s = w_1\sqrt{A} + s_+$ and $d' \ge 0$. Then
\[ P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ W(\partial\, \mathrm{Co}(\Gamma_0)) \ge s,\ D'_{(q,r)}(\Gamma_0) \ge d' \bigr) \le \exp\Bigl( -u(K_{51} (\log A)^{2/3}, A) - \frac{1}{1520} s_+ - \frac{1}{20} d' \Bigr). \tag{5.64} \]
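Proof. Let $K_{52} \ge K_{43}$ (of Proposition 5.5) and
\[ q_B = \frac{1}{15} K_{52} \log B, \qquad r_B = K_{52} \log B, \qquad t_+(n, A') = \max\bigl( s - 19n - w_1\sqrt{A'},\ 0 \bigr), \]
\[ I_1(n) = \Bigl\{ A' \in \mathbb{Z}_+ : t_+(n, A') \ge \frac{s_+}{2} \Bigr\}, \]
\[ I_2(n) = \Bigl\{ A' \in \mathbb{Z}_+ : t_+(n, A') < \frac{s_+}{2},\ w_1\sqrt{A'} \le w_1\sqrt{A} + 19n \Bigr\}, \]
and
\[ I_3(n) = \Bigl\{ A' \in \mathbb{Z}_+ : t_+(n, A') < \frac{s_+}{2},\ w_1\sqrt{A'} > w_1\sqrt{A} + 19n \Bigr\}. \]
Then using Lemma 5.7,
\[ \begin{aligned} &P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ W(\partial\, \mathrm{Co}(\Gamma_0)) \ge s,\ D'_{(q,r)}(\Gamma_0) \ge d' \bigr) \\ &\quad \le \sum_{B \ge A} \sum_{n \ge d'} P\bigl( |\mathrm{Int}(\Gamma_0)| = B,\ D'_{(q,r)}(\Gamma_0) \in [n, n+1),\ W(\partial\, \mathrm{Co}(\alpha_{\max,\Gamma_0})) \ge s - 19n \bigr) \\ &\quad \le \sum_{B \ge A} \sum_{n \ge d'} \Biggl[ \sum_{A' \le B,\ A' \in I_1(n)} \sum_{k \ge 0} P\bigl( M_0(k, q_B, r_B, B, A', n, w_1\sqrt{A'} + t_+(n, A')) \bigr) \\ &\qquad\qquad + \sum_{A' \le B,\ A' \in I_2(n)} \sum_{k \ge 0} P\bigl( M_0(k, q_B, r_B, B, A', n, w_1\sqrt{A'}) \bigr) \\ &\qquad\qquad + \sum_{A' \le B,\ A' \in I_3(n)} \sum_{k \ge 0} P\bigl( M_0(k, q_B, r_B, B, A', n, w_1\sqrt{A'}) \bigr) \Biggr]. \end{aligned} \tag{5.65} \]
The events $M_0(k, q_B, r_B, B, A', n, \cdot)$ are empty unless $n + 1 > (k+1)\kappa_\tau r/4$ (cf. (5.46)); if $K_{52}$, and hence $r$, is large enough, this implies $k \le n$, so we may restrict the sums in (5.65) to such $k$. Presuming $A$ is large enough, $u(K_{44} (\log B)^{2/3}, B)$ is strictly positive for all $B \ge A$, for $K_{44}$ of Proposition 5.5. For $A' \in I_1(n)$ we apply Proposition 5.5 to get
\[ \sum_{A' \le B,\ A' \in I_1(n)} \sum_{k \le n} P\bigl( M_0(k, q_B, r_B, B, A', n, w_1\sqrt{A'} + t_+(n, A')) \bigr) \le (n+1) B \exp\Bigl( -w_1\sqrt{B} + K_{44} B^{1/6} (\log B)^{2/3} - \frac{1}{120} s_+ - \frac{1}{10} n \Bigr). \tag{5.66} \]
Note that if $I_2(n)$ or $I_3(n)$ is nonempty we must have $s_+ = s - w_1\sqrt{A} > 0$. If $A' \in I_2(n)$ we have
\[ \frac{1}{2} \bigl( s - w_1\sqrt{A} \bigr) > s - 19n - w_1\sqrt{A'} \ge s - w_1\sqrt{A} - 38n, \]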
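and hence $n \ge s_+/64$. Therefore
\[ \begin{aligned} \sum_{A' \le B,\ A' \in I_2(n)} \sum_{k \le n} P\bigl( M_0(k, q_B, r_B, B, A', n, w_1\sqrt{A'}) \bigr) &\le (n+1) B \exp\Bigl( -w_1\sqrt{B} + K_{44} B^{1/6} (\log B)^{2/3} - \frac{1}{10} n \Bigr) \\ &\le (n+1) B \exp\Bigl( -w_1\sqrt{B} + K_{44} B^{1/6} (\log B)^{2/3} - \frac{1}{20} n - \frac{1}{1520} s_+ \Bigr). \end{aligned} \tag{5.67} \]
If $A' \in I_3(n)$ we have
\[ s_+ + w_1\sqrt{A} - w_1\sqrt{A'} - 19n \le t_+(n, A') \le \frac{s_+}{2}, \]
so that
\[ 2\bigl( w_1\sqrt{A'} - w_1\sqrt{A} \bigr) > w_1\sqrt{A'} - w_1\sqrt{A} + 19n \ge \frac{s_+}{2}, \]
which implies
\[ w_1\sqrt{B} \ge w_1\sqrt{A'} > w_1\sqrt{A} + \frac{s_+}{4}. \]
Therefore
\[ \begin{aligned} \sum_{A' \le B,\ A' \in I_3(n)} \sum_{k \le n} P\bigl( M_0(k, q_B, r_B, B, A', n, w_1\sqrt{A'}) \bigr) &\le (n+1) B \exp\Bigl( -w_1\sqrt{B} + K_{44} B^{1/6} (\log B)^{2/3} - \frac{1}{10} n \Bigr) \\ &\le (n+1) B \exp\Bigl( -\frac{1}{2} w_1\sqrt{B} - \frac{1}{2} w_1\sqrt{A} - \frac{1}{8} s_+ + K_{44} B^{1/6} (\log B)^{2/3} - \frac{1}{10} n \Bigr). \end{aligned} \tag{5.68} \]
We can now use (5.66), (5.67) and (5.68) to sum over $n$ and $B$ in (5.65), obtaining (5.64). □

Part of our main result is an easy consequence of Theorem 5.8.

Proof of Theorem 2.1, (2.13) and (2.14). From the definition of $w_1$ and Theorem 5.8, for any $c$, if $A$ is sufficiently large,
\[ \begin{aligned} P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ \mathrm{ALR}(\Gamma_0) > c\, l^{1/3} (\log l)^{2/3} \bigr) &\le P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ |\mathrm{Co}(\Gamma_0)| \ge A + c w_1 l^{4/3} (\log l)^{2/3} \bigr) \\ &\le P\Bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ W(\partial\, \mathrm{Co}(\Gamma_0)) \ge w_1\sqrt{A} + \frac{c w_1^2\, l^{1/3} (\log l)^{2/3}}{3} \Bigr) \\ &\le \exp\bigl( -u(K_{51} (\log A)^{2/3}, A) - \epsilon_{16}\, c\, l^{1/3} (\log l)^{2/3} \bigr). \end{aligned} \tag{5.69} \]
If we take $c$ sufficiently large, this and Theorem 4.1 prove that (2.13) holds with conditional probability approaching 1 as $A \to \infty$.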
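Next, from the quadratic nature of the Wulff variational minimum (see [1, 12]), for any $a, b$, if $A$ is sufficiently large,
\[ \begin{aligned} &P\bigl( A \le |\mathrm{Int}(\Gamma_0)| \le A + a w_1 l^{4/3} (\log l)^{2/3},\ \Delta_A(\partial\, \mathrm{Co}(\Gamma_0)) > b\, l^{2/3} (\log l)^{1/3} \bigr) \\ &\quad \le \sum_{B} P\Bigl( |\mathrm{Int}(\Gamma_0)| = B,\ \Delta_B(\partial\, \mathrm{Co}(\Gamma_0)) > \frac{b}{2} l^{2/3} (\log l)^{1/3} \Bigr) \\ &\quad \le \sum_{B} P\bigl( |\mathrm{Int}(\Gamma_0)| = B,\ W(\partial\, \mathrm{Co}(\Gamma_0)) \ge w_1\sqrt{A} + \epsilon_{17}\, b\, l^{1/3} (\log l)^{2/3} \bigr) \\ &\quad \le \exp\bigl( -u(K_{51} (\log A)^{2/3}, A) - \epsilon_{18}\, b\, l^{1/3} (\log l)^{2/3} \bigr), \end{aligned} \tag{5.70} \]
where the sums are over $A \le B \le A + a w_1 l^{4/3} (\log l)^{2/3}$. Now $P(|\mathrm{Int}(\Gamma_0)| > A + a w_1 l^{4/3} (\log l)^{2/3})$ can be bounded as in (5.69), so if we take $a, b$ sufficiently large, (5.70) and Theorem 4.1 prove that (2.14) holds with conditional probability approaching 1 as $A \to \infty$. □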
6. Proof of (2.15) and (2.16) For x, y ∈ (Z2 )∗ , r > 0 and G ⊂ R2 , we say there is an r-near dual connection from x to y in G if for some u, v ∈ (Z2 )∗ with d(u, v) ≤ r, there are open dual paths from x to u and from y to v in G. Let N (x, y, r, G) denote the event that such a connection exists. The following result is from [4]. Lemma 6.1. Let P be a percolation model on B(Z2 ) satisfying (2.7) and the ratio weak mixing property. There exist Ki such that if |x| > 1 and r ≥ K53 log |x| then P (N(0, x, r, R2 )) ≤ e−τ (x)+K54 r . To prove (2.15) we also need the following. Lemma 6.2. Suppose τ is positive. There exists 419 such that if q ≥ 1, r ≥ 15q, A > A > 0 and γ is a dual circuit with | Int(γ )| = A, | Int(αmax,γ )| = A , then √ (γ ) ≥ 419 A − A . D(q,r) Proof. Let α1 , . . . , αk be the non-maximal final (q, r)-descendants of γ , and Ai = | Int(αi )|. Then # κτ r 1 1 D(q,r) (γ ) ≥ ≥ w1 max w1 Ai , Ai + kκτ r 3 2 6 i≤k
i≤k
while (cf. (5.6)) A − A ≤ kr 2 +
i≤k
The lemma follows easily. " #
Ai .
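The notion of an $r$-near dual connection introduced at the start of this section can be illustrated on a finite configuration. The Python sketch below is a toy model only: sites are integer pairs standing in for dual sites, `open_bonds` is a hypothetical list of open bonds, the region $G$ is given as a finite set of sites, and $d$ is taken to be the Euclidean distance. None of these representation choices come from the paper.

```python
from collections import deque
from math import hypot


def open_cluster(start, open_bonds, region):
    """Sites reachable from `start` via open bonds whose endpoints both lie in `region`."""
    if start not in region:
        return set()
    adjacency = {}
    for a, b in open_bonds:
        if a in region and b in region:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        site = queue.popleft()
        for nbr in adjacency.get(site, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def has_r_near_connection(x, y, r, open_bonds, region):
    """True if some u reachable from x and some v reachable from y satisfy d(u, v) <= r."""
    cluster_x = open_cluster(x, open_bonds, region)
    cluster_y = open_cluster(y, open_bonds, region)
    return any(
        hypot(u[0] - v[0], u[1] - v[1]) <= r for u in cluster_x for v in cluster_y
    )


# Toy configuration: two open paths on a line, separated by a gap of length 3.
region = {(i, 0) for i in range(7)}
bonds = [((0, 0), (1, 0)), ((1, 0), (2, 0)), ((5, 0), (6, 0))]
near_with_r3 = has_r_near_connection((0, 0), (6, 0), 3.0, bonds, region)
near_with_r2 = has_r_near_connection((0, 0), (6, 0), 2.5, bonds, region)
```

In this toy example the two clusters are not connected, but they come within distance 3 of each other, so the event holds for $r = 3$ and fails for $r = 2.5$.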
The proof of (2.15) is relatively straightforward, compared to (2.13) and (2.16), so in the interest of space we give only a sketch of the proof. Proof sketch for Theorem 2.1, (2.15). The basic idea is that a large inward deviation of [y ,y ] 0 k k+1 from ∂ Co(0 ) for some k reduces the factor P (wk ↔ xk ) in (5.14). Let Vk denote the line through yk and yk+1 . Suppose HSkel(0 ) = {y0 , . . . , ym+1 }, z is a site in 0 with d(z, ∂ Co({y0 , . . . , ym+1 })) > K55 l 2/3 (log l)1/3 , with K55 large, and 0 satisfies (2.13) and (2.14). Let J (yk , yk+1 ) denote the event that there is an open dual path from yk to yk+1 containing some such site z. Let Rk denote the region bounded [y ,y ] by 0 k k+1 and by the segment of ∂ Co(0 ) from yk to yk+1 . There exists a site z ∈ Vk with the following property: a line tangent to ∂Bτ (yk , τ (z − yk )) at z passes through z. This z is a projection of z onto Vk and satisfies τ (yk − z) ≥ τ (yk − z ),
τ (yk+1 − z) ≥ τ (yk+1 − z ). D
E
F
(6.1) z ,
⊂ ⊂ be balls centered at all with Let D ⊂ E be balls centered at z and let radii of order l 2/3 (log l)1/3 . (The statements to follow are valid for appropriate choices [y ,y ] of these radii.) Equation (2.13) limits the area of Rk and thereby requires that 0 k k+1 intersects D . Thus we have dual connections yk ↔ z and z ↔ yk+1 , at least one of which intersects D . We then want to use Lemma 3.2 and then the triangle inequality to say something like P (J (yk , yk+1 )) ≤ exp −τ (yk+1 − z) − τ (z − yk ) (6.2) ≤ exp −τ (yk+1 − yk ) − K56 l 2/3 (log l)1/3 , but we must deal with the fact that the two dual connections may go close to each other so that Lemma 3.2 does not apply. There are various cases to consider depending on the geometry of the two dual connections yk ↔ z and z ↔ yk+1 relative to the balls and relative to each other. Consider for example the possibility that yk ↔ z and z ↔ yk+1 with an r-near connection from yk to yk+1 outside E. In this case this r-near connection and a path from z to the boundary of D occur at a large separation, so Lemma 3.2 can be applied to these two events, and the path from z to the boundary of D provides the extra cost K56 l 2/3 (log l)1/3 on the right side of (6.2). If there is no r-near connection outside E, we must consider cases depending on the location of z relative to yk and yk+1 and on whether the path intersecting D includes an r-near connection outside F . In general for each case we add costs (such as τ (yk+1 − z) and τ (z − yk ) in (6.2)), and then use the triangle inequality to get a bound like the right side of (6.2), if paths outside some ball are separated enough that costs can be added. When paths are not sufficiently separated, there is an r-near connection outside one of the larger balls E or F , and we add the costs of the r-near connection and another connection inside a smaller ball D or E , as exemplified above, to get a bound. 
Combining all cases bounds the left side of (6.2) by the right side of (6.2), while analogously to (5.14), P (wi ↔ xi ). P (A(w0 , x0 , . . . , wm , xm ) ∩ J (yk , yk+1 )) ≤ 2m P (J (yk , yk+1 )) i∈I \{k}
Analogously to (5.14)–(5.16), for sufficiently large K57 this leads to P (| Int(0 )| ≥ A, MLR(0 ) ≥ K57 l 2/3 (log l)1/3 , 0 is (q, r) − bottleneck-free) √ 1 (6.3) κτ K57 l 2/3 (log l)1/3 ≤ exp −w1 A − 200
with $q, r$ of order $\log A$. Essentially in the manner of (6.7)–(6.10) below, we can reduce to the case of bottleneck-free $\Gamma_0$ and conclude that if $K_{58}$ is sufficiently large,
\[ P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ \mathrm{MLR}(\Gamma_0) \ge K_{58}\, l^{2/3} (\log l)^{1/3} \bigr) \le \exp\Bigl( -w_1\sqrt{A} - \frac{1}{4000} \kappa_\tau K_{58}\, l^{2/3} (\log l)^{1/3} \Bigr). \tag{6.4} \]
With Theorem 4.1 this completes the proof. □

The proof of (2.16) is based on the following idea. As we will see, by summing (5.18) as in the proof of Theorem 5.8 it is easy to obtain roughly
\[ P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ D'_{(q,r)}(\Gamma_0) \ge d' \bigr) \le \exp\Bigl( -\frac{1}{2} d' \Bigr) P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A - v \bigr) \tag{6.5} \]
with $v \ll A$. (Statement (6.5) is for motivation only; the actual statement we prove is contained in (6.19).) Note that for $q, r$ as in Theorem 2.1, if $\Gamma_0$ is not $(q, r)$-bottleneck-free then $D'_{(q,r)}(\Gamma_0) \ge \frac{1}{2} K_6 \log A$; hence to prove (2.16) we would like a result somewhat like (6.5) but with $A$ on the right side in place of $A - v$. To replace $A - v$ with $A$ we need to know something of how the probability on the right side of (6.5) behaves as a function of $v$, which is obtainable from our next result. Let $\Lambda_N = [-N, N]^2$.

Proposition 6.3. Let $P$ be a percolation model on $\mathcal{B}(\mathbb{Z}^2)$ satisfying (2.7), the near-Markov property for open circuits, and the ratio weak mixing property. There exist $K_i$ such that for $A \ge K_{59}$ and $\delta \ge K_{60} \log A$ we have
\[ P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A + \delta\sqrt{A} \bigr) \ge e^{-K_{61} \delta} P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A \bigr). \tag{6.6} \]
Proof. Let $l = \sqrt{A}$ and $r = 15q = K_{43} \log A$, where $K_{43}$ is from Proposition 5.5. Let $\mathrm{Mid}_t(\Gamma_0)$ denote the Wulff shape of area $t\,|\mathrm{Int}(\Gamma_0)|$ centered at the center of mass of $\mathrm{Int}(\Gamma_0)$. From translation invariance, Theorem 4.1, (2.17), (5.70) and (6.4) we have for sufficiently large $K_{62}$,
\[ \begin{aligned} &P\Bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ D'_{(q,r)}(\Gamma_0) \ge \frac{1}{3} K_{62} \kappa_\tau\, l^{2/3} (\log l)^{1/3} \Bigr) \\ &\quad + P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ \Delta_A(\Gamma_0) > K_{62}\, l^{2/3} (\log l)^{1/3} \bigr) \\ &\quad + P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ \Delta_A(\Gamma_0) \le K_{62}\, l^{2/3} (\log l)^{1/3},\ 0 \notin \mathrm{Mid}_{3/4}(\Gamma_0) \bigr) \\ &\quad \le \frac{1}{2} P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A \bigr). \end{aligned} \]
It is easy to see (cf. the proof of Lemma 5.7) that
\[ \Delta_A(\alpha_{\max,\Gamma_0}) \le \Delta_A(\Gamma_0) + 3\kappa_\tau^{-1} D'_{(q,r)}(\Gamma_0). \]
Using this and Lemma 6.2, and letting $g(A) = (3\epsilon_{19})^{-2} K_{62}^2 \kappa_\tau^2\, l^{4/3} (\log l)^{2/3}$, we get
\[ \begin{aligned} P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A \bigr) &\le \frac{1}{2} P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A \bigr) \\ &\quad + P\bigl( |\mathrm{Int}(\Gamma_0)| \ge A,\ |\mathrm{Int}(\Gamma_0)| - |\mathrm{Int}(\alpha_{\max,\Gamma_0})| \le g(A), \\ &\qquad\qquad \Delta_A(\alpha_{\max,\Gamma_0}) \le 2K_{62}\, l^{2/3} (\log l)^{1/3},\ 0 \in \mathrm{Mid}_{3/4}(\Gamma_0) \bigr). \end{aligned} \tag{6.7} \]
We need the following straightforward extension of (5.18), under the conditions of Proposition 5.5: P M0 (k, q, r, A, A , d , t) ∩ [;A (αmax,0 )
≤ 2K62 l 2/3 (log l)1/3 , 0 ∈ Mid7/8 (αmax,0 )] (6.8) 1 ≤ exp − d P M0 (0, q, r, A , A , 0, t) 2
∩ [;A (0 ) ≤ 2K62 l 2/3 (log l)1/3 , 0 ∈ Mid7/8 (0 )] . This yields that P | Int(0 )| ≥ A, | Int(0 )| − | Int(αmax,0 )| ≤ g(A),
;A (αmax,0 ) ≤ 2K62 l 2/3 (log l)1/3 , 0 ∈ Mid3/4 (0 ) √ ≤ P M0 (k, q, r, B, A , d , w1 A ) B≥A d ≥0 B−g(A)
≤
∩ [;A (αmax,0 ) ≤ 2K62 l 2/3 (log l)1/3 , 0 ∈ Mid7/8 (αmax,0 )] √ (d + 1)e−d /2 P M0 (0, q, r, A , A , 0, w1 A )
A ≥A−g(A) A ≤B
∩ [;A (0 ) ≤ 2K62 l 2/3 (log l)1/3 , 0 ∈ Mid7/8 (0 )] ≤ 10g(A)P | Int(0 )| ≥ A − g(A), 0 is (q, r) − bottleneck-free,
∩ [;A (0 ) ≤ 2K62 l 2/3 (log l)1/3 , 0 ∈ Mid7/8 (0 )]
(6.9)
so that from (6.7), P (| Int(0 )| ≥ A)
≤ 20g(A)P | Int(0 )| ≥ A − g(A), 0 is (q, r) − bottleneck-free, ;A (0 ) ≤ 2K62 l 2/3 (log l)1/3 , 0 ∈ Mid7/8 (0 ) .
(6.10)
The idea now is to split 0 into two halves and approximate the probability on the right side of (6.10) by the product of the probabilities of the two halves. With this independence the two halves can in effect be pulled apart from one another to increase the area enclosed by 0 at only a small cost in increased boundary length. To accomplish this we first need some definitions. Let ρ be a path (not necessarily self-avoiding) from x2 = (a2 , b2 ) to x1 = (a1 , b1 ) in the slab Sx1 x2 = {(x, y) ∈ R2 : b2 ≤ y ≤ b1 }. Let JL (ρ) and JR (ρ) denote the regions to the left and right, respectively, of the image of ρ in Sx1 x2 . The right-side area determined by ρ is µR (ρ) = |JR (ρ) ∩ N | − |JR (x1 x2 ) ∩ N |, evaluated for N large enough that N contains ρ. (Note that for such N , the right-side area does not vary with N . Also, in our definition the path ρ must be oriented so that
$b_2 \le b_1$.) The left-side area $\mu_L(\rho)$ is defined similarly using the left side of $\rho$. Let $X_1$ and $X_2$ be the points of $\Gamma_0$ of maximum and minimum second coordinate, respectively, using the leftmost if there is more than one. Then
$$|\mathrm{Int}(\Gamma_0)| = \mu_L\big(\Gamma_0^{[X_2, X_1]}\big) + \mu_R\big(\hat{\Gamma}_0^{[X_2, X_1]}\big),$$
where $\hat{\Gamma}_0$ is $\Gamma_0$ traversed in the direction of negative orientation. Let $B_i = B(X_i, 4r)$, $i = 1, 2$,
and let $U_i$, $V_i$ be the first and last lattice sites, respectively, of the segment of $\Gamma_0 \cap B_i$ containing $X_i$. Let $W_1$ be the first site in $\Gamma_0^{[U_1, X_1]}$ for which
$$d\big(W_1, \Gamma_0^{[X_1, U_2]}\big) \le q,$$
and let $Z_1$ be the closest site to $W_1$ in $\Gamma_0^{[X_1, X_2]}$. $W_2$ and $Z_2$ are defined similarly with subscripts 1 and 2 interchanged. Suppose now that $\Gamma_0$ is $(q, r)$-bottleneck-free and satisfies
$$\Delta_A(\Gamma_0) \le 2K_{62}\, l^{2/3}(\log l)^{1/3}. \qquad (6.11)$$
Then $B_1$ and $B_2$ are disjoint, and since
$$\mathrm{diam}\, \Gamma_0^{[U_1, X_1]} > r \quad \text{and} \quad \mathrm{diam}\, \Gamma_0^{[X_1, V_1]} > r,$$
the absence of bottlenecks implies
$$d\big(\Gamma_0^{[X_2, U_1]}, \Gamma_0^{[X_1, U_2]}\big) > q \quad \text{and} \quad d\big(\Gamma_0^{[V_2, X_1]}, \Gamma_0^{[V_1, X_2]}\big) > q.$$
It follows that $W_i \ne U_i$, $Z_i \in \Gamma_0^{[X_i, V_i]}$ ($i = 1, 2$) and
$$d\big(\Gamma_0^{[X_2, W_1]}, \Gamma_0^{[X_1, W_2]}\big) > q - 1 > \frac{q}{2}.$$
When $\rho$ and $\sigma$ are paths such that the endpoint of $\rho$ is the initial point of $\sigma$, we let $(\rho, \sigma)$ denote the path obtained by concatenating $\sigma$ and $\rho$. Then
[X ,X ] [X ,W ] µL 0 2 1 − µL 0 2 1 , ζW1 X1 ≤ |B1 | = 16π r 2 , since the paths differ only inside B1 . Again we may interchange 1, 2 and L, R. Using Lemma 3.2, P | Int(0 )| ≥ A − g(A), 0 is (q, r) − bottleneck-free, ;A (0 ) ≤ 2K62 l 2/3 (log l)1/3 , 0 ∈ Mid7/8 (0 ) ≤ A−u≤AL +AR ≤2A
√ x1 ,x2 ∈B(0, A) xi =(ai ,b√ i) b1 −b2 ≥ 23 A
w1 ,z1 ∈B(x1 ,4r)
w2 ,z2 ∈B(x2 ,4r)
P X1 = x1 , X2 = x2 , W1 = w1 , W2 = w2 , Z1 = z1 , Z2 = z2
and there exist paths ρR from x2 to w1 and ρL from w2 to x1 satisfying µL ((ρR , ζw1 x1 )) ≥ AR − 16π r 2 ,
q µR ((ζx2 w2 , ρL )) ≥ AL − 16π r 2 , d(ρL , ρR ) ≥ , z1 ∈ ρL , z2 ∈ ρR , 2 0 ∈ JR ((ζx2 w2 , ρL )) ∩ JL ((ρR , ζw1 x1 )) ≤ (6.12) A−u≤AL +AR ≤2A
√ x1 ,x2 ∈B(0, A) xi =(ai ,b√ i) b1 −b2 ≥ 23 A
w1 ,z1 ∈B(x1 ,4r)
w2 ,z2 ∈B(x2 ,4r)
2P x2 ↔ w1 via an open dual path ρR in Sx1 x2 with µL ((ρR , ζw1 x1 )) ≥ AR − 16π r 2 , z2 ∈ ρR , 0 ∈ JL ((ρR , ζw1 x1 )) · P w2 ↔ x1 via an open dual path ρL in Sx1 x2 with µR ((ζx2 w2 , ρL )) ≥ AL − 16π r 2 , z1 ∈ ρL , 0 ∈ JR ((ζx2 w2 , ρL )) . Let us assume for convenience that δ is an integer (if not, the necessary modifications are simple), and let x1 , w1 , z2 and x2 be the lattice sites which are 2δ units to the right of x1 , w1 , z2 and x2 , respectively. We now “pull apart” the two halves of 0 by replacing each of these four sites by its right-shifted counterpart in the first probability on the right side of (6.12). Specifically, by the FKG property, for each summand we have P x2 ↔ w1 via an open dual path ρR in Sx1 x2 with µL ((ρR , ζw1 x1 )) ≥ AR − 16π r 2 , z2 ∈ ρR , 0 ∈ JL ((ρR , ζw1 x1 )) · P w2 ↔ x1 via an open dual path ρL in Sx1 x2 with µR ((ζx2 w2 , ρL )) ≥ AL − 16π r 2 , z1 ∈ ρL , 0 ∈ JR ((ζx2 w2 , ρL ))
(6.13)
· P (Open(ζz1 w1 ))P (Open(ζw2 z2 )) 3 √ 2 ≤ P |0 | ≥ AL + AR + δ A − 32π r 2 √ ≤ P (|0 | ≥ A + δ A). Here the first inequality uses the fact that, even though the paths ρR , ρL , ζz1 w1 , ζw2 z2 are not necessarily disjoint, the 4 events on the left side of (6.13) imply the event on the right side of the first inequality in (6.13), under our nonstandard definition of circuit. From (6.10), (6.12), (6.13) and the bounded energy property we obtain
$$P(|\mathrm{Int}(\Gamma_0)| \ge A) \le K_{63} A^4 r^4 u\, e^{K_{64}\delta}\, P(|\mathrm{Int}(\Gamma_0)| \ge A + \delta\sqrt{A}) \le e^{K_{65}\delta}\, P(|\mathrm{Int}(\Gamma_0)| \ge A + \delta\sqrt{A}), \qquad (6.14)$$
completing the proof. □
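Proof of Theorem 2.1, (2.16). Let $\varepsilon_{20}, \varepsilon_{21} > 0$ be constants to be specified, let $n_0 = \min\{n : 2^n \kappa_\tau r/3 > \varepsilon_{21}\sqrt{A}\}$, and let $b_n = \varepsilon_{19}^{-2}\, 2^{2n} (\kappa_\tau r/3)^2$, where $\varepsilon_{19}$ is from Lemma 6.2. Then provided $\varepsilon_{21}$ is small enough (depending on $\varepsilon_{20}$), we have
$$b_n < \varepsilon_{20}\, 2^{n-1}\, \frac{\kappa_\tau r}{3} \sqrt{A} \quad \text{for all } n \le n_0. \qquad (6.15)$$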
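We have the following decomposition according to the size of $D_{(q,r)}(\Gamma_0)$:
$$\begin{aligned}
P(|\mathrm{Int}(\Gamma_0)| \ge A,\ \Gamma_0 \text{ is not } (q, r)\text{-bottleneck-free})
&\le P\Big(|\mathrm{Int}(\Gamma_0)| \ge A,\ D_{(q,r)}(\Gamma_0) \ge \frac{\kappa_\tau r}{3}\Big) \\
&\le P\Big(|\mathrm{Int}(\Gamma_0)| \ge 2A,\ D_{(q,r)}(\Gamma_0) \ge \frac{\kappa_\tau r}{3}\Big) \\
&\quad + P\big(A \le |\mathrm{Int}(\Gamma_0)| < 2A,\ D_{(q,r)}(\Gamma_0) \ge \varepsilon_{21}\sqrt{A}\big) \\
&\quad + \sum_{n=1}^{n_0} P\Big(A \le |\mathrm{Int}(\Gamma_0)| < 2A,\ 2^{n-1}\frac{\kappa_\tau r}{3} \le D_{(q,r)}(\Gamma_0) < 2^n \frac{\kappa_\tau r}{3}\Big).
\end{aligned} \qquad (6.16)$$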
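By Theorems 5.8 and 4.1, the first probability on the right side of (6.16) satisfies
$$\begin{aligned}
P\Big(|\mathrm{Int}(\Gamma_0)| \ge 2A,\ D_{(q,r)}(\Gamma_0) \ge \frac{\kappa_\tau r}{3}\Big)
&\le \exp\Big(-\frac{1}{20}\frac{\kappa_\tau r}{3} - u\big(K_{51}(\log 2A)^{2/3}, 2A\big)\Big) \\
&\le \exp\Big(-\frac{1}{20}\frac{\kappa_\tau r}{3}\Big)\, P(|\mathrm{Int}(\Gamma_0)| \ge A),
\end{aligned} \qquad (6.17)$$
and, for large $A$, the second satisfies
$$\begin{aligned}
P\big(A \le |\mathrm{Int}(\Gamma_0)| < 2A,\ D_{(q,r)}(\Gamma_0) \ge \varepsilon_{21}\sqrt{A}\big)
&\le \exp\Big(-\frac{1}{20}\varepsilon_{21}\sqrt{A} - u\big(K_{51}(\log A)^{2/3}, A\big)\Big) \\
&\le \exp\Big(-\frac{1}{40}\varepsilon_{21}\sqrt{A}\Big)\, P(|\mathrm{Int}(\Gamma_0)| \ge A).
\end{aligned} \qquad (6.18)$$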
Thus we must consider the terms of the sum on the right side of (6.16). By Lemma 6.2,
$$D_{(q,r)}(\Gamma_0) < 2^n \frac{\kappa_\tau r}{3} \quad \text{implies} \quad |\mathrm{Int}(\Gamma_0)| - |\mathrm{Int}(\alpha_{\max,\Gamma_0})| < b_n.$$
Hence similarly to (6.9) we have κτ r κτ r
P A ≤ | Int(0 )| < 2A, 2n−1 ≤ D(q,r) (0 ) < 2n 3 3 n−1 κτ r ≤ P A ≤ | Int(0 )| < 2A, D(q,r) (0 ) > 2 and 3 | Int(y )| − | Int(αmax,y )| < bn √ ≤ P (M0 (k, q, r, B, A , d , w1 A )) A≤B<2A d ≥2n−1 κτ r/3 B−bn
≤
(6.19)
A−bn ≤A <2A A ≤B
√ (d + 1)e−d /2 P (M0 (0, q, r, A , A , 0, w1 A ))
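$$\begin{aligned}
&\le K_{66}\, b_n \Big(2^{n-1}\frac{\kappa_\tau r}{3} + 1\Big) e^{-2^{n-1}\kappa_\tau r/6}\, P(|\mathrm{Int}(\Gamma_0)| \ge A - b_n) \\
&\le K_{67} \Big(2^{n-1}\frac{\kappa_\tau r}{3} + 1\Big)^3 e^{-2^{n-1}\kappa_\tau r/6}\, e^{2K_{61}\varepsilon_{20} 2^{n-1}\kappa_\tau r/3}\, P(|\mathrm{Int}(\Gamma_0)| \ge A) \\
&\le K_{68}\, e^{-2^{n-1}\kappa_\tau r/12}\, P(|\mathrm{Int}(\Gamma_0)| \ge A),
\end{aligned}$$
where the first inequality follows from Lemma 6.2, the third from Proposition 5.5, the fifth from (6.15) and Proposition 6.3, and the sixth from a sufficiently small choice of $\varepsilon_{20}$. Summing gives
$$\sum_{n=1}^{n_0} P\Big(A \le |\mathrm{Int}(\Gamma_0)| < 2A,\ 2^{n-1}\frac{\kappa_\tau r}{3} \le D_{(q,r)}(\Gamma_0) < 2^n \frac{\kappa_\tau r}{3}\Big) \le K_{69}\, e^{-\kappa_\tau r/12}\, P(|\mathrm{Int}(\Gamma_0)| \ge A). \qquad (6.20)$$
With (6.16), (6.17) and (6.18) this completes the proof. □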
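The summation step leading to (6.20) rests on the fact that a sum of terms $e^{-2^{n-1}c}$ over dyadic scales is dominated by its first term. The following is a minimal numerical sketch of that fact only; the name `dyadic_tail_ratio` and the parameter `c` (standing in for $\kappa_\tau r/12$) are illustrative assumptions and not notation from the paper.

```python
import math

def dyadic_tail_ratio(c, n_max=60):
    """Return sum_{n=1}^{n_max} exp(-2**(n-1) * c) divided by exp(-c).

    Here c > 0 is a hypothetical stand-in for kappa_tau * r / 12.
    The ratio equals 1 + e^{-c} + e^{-3c} + e^{-7c} + ..., so it is
    at most the geometric series 1 / (1 - e^{-c}).
    """
    total = sum(math.exp(-(2 ** (n - 1)) * c) for n in range(1, n_max + 1))
    return total / math.exp(-c)

if __name__ == "__main__":
    for c in (0.5, 1.0, 2.0, 5.0):
        # The ratio stays bounded, which is what lets a single constant
        # absorb the whole sum over n.
        print(c, dyadic_tail_ratio(c), 1.0 / (1.0 - math.exp(-c)))
```

The bound $1/(1 - e^{-c})$ is uniform once $c$ is bounded away from 0, which is the sense in which the constant in front of the first term can be taken independent of $n_0$.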
7. Conditioning on the Exact Area

In Theorems 2.1 and 2.2, one could as well consider conditioning on $|\mathrm{Int}(\Gamma_0)| = A$ in place of $|\mathrm{Int}(\Gamma_0)| \ge A$. With the exception of (2.16), it is straightforward to alter the existing proof to prove these theorems under this conditioning once one has a lower bound like Theorem 4.1 for $P(|\mathrm{Int}(\Gamma_0)| = A)$. For this we need first some definitions and lemmas. In the interest of space we will not give full details in the proofs; these details involve many of the same technicalities we have encountered earlier.

Consider distinct points $x, y \in \mathbb{R}^2$ with $|y - x| \ge 4\sqrt{2}$. We let $U(x, y)$ denote the open slab between the tangent line to $\partial B_\tau(x, \tau(y - x))$ at $y$ and the parallel line through $x$; we call $U(x, y)$ the natural slab of $x$ and $y$. (Note the tangent line is not necessarily unique; if it is not we make some arbitrary choice.) We have $U(x, y) = U(y, x)$. It follows from the definition of $U(x, y)$ that if $u$ and $v$ are on opposite sides of $U(x, y)$, then
$$\tau(v - u) \ge \tau(y - x). \qquad (7.1)$$
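As an illustration of (7.1), the sketch below checks the inequality numerically in the special case where $\tau$ is the Euclidean norm. This is an assumption made only for the example: the paper's $\tau$ is a general norm, for which the tangent line need not be perpendicular to $y - x$. The function names are hypothetical.

```python
import math
import random

def slab_coordinate(x, y, p):
    """Signed distance of p from the line through x, measured along y - x."""
    dx, dy = y[0] - x[0], y[1] - x[1]
    length = math.hypot(dx, dy)
    return ((p[0] - x[0]) * dx + (p[1] - x[1]) * dy) / length

def check_opposite_sides(x, y, trials=1000, seed=0):
    """Sample u, v on opposite sides of the Euclidean natural slab of x and y.

    Returns the smallest observed value of |v - u| / |y - x|; (7.1) with
    tau taken to be the Euclidean norm says this is at least 1.
    """
    rng = random.Random(seed)
    dx, dy = y[0] - x[0], y[1] - x[1]
    length = math.hypot(dx, dy)
    ex, ey = dx / length, dy / length      # unit vector across the slab
    nx, ny = -ey, ex                       # unit vector along the slab
    worst = float("inf")
    for _ in range(trials):
        a, c = rng.uniform(0, 5), rng.uniform(0, 5)        # depth beyond each side
        b, d = rng.uniform(-10, 10), rng.uniform(-10, 10)  # position along the slab
        u = (x[0] - a * ex + b * nx, x[1] - a * ey + b * ny)
        v = (x[0] + (length + c) * ex + d * nx,
             x[1] + (length + c) * ey + d * ny)
        # u lies on the x side of the slab, v on the y side.
        assert slab_coordinate(x, y, u) <= 1e-9
        assert slab_coordinate(x, y, v) >= length - 1e-9
        worst = min(worst, math.hypot(v[0] - u[0], v[1] - u[1]) / length)
    return worst

if __name__ == "__main__":
    print(check_opposite_sides((0.0, 0.0), (6.0, 8.0)))
```

In the Euclidean case the inequality is just the statement that the distance between two points is at least the length of its projection onto the direction of $y - x$.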
The portion of $U(x, y)$ which is strictly to the right of the line from $x$ to $y$ is called the natural half-slab of $x$ and $y$ and denoted $U_R(x, y)$. For $x, y \in U(u, v)$ we let $\tilde{U}_{uv}(x, y)$ denote the open slab with sides parallel to those of $U(u, v)$ with $x$ in one edge of $\tilde{U}_{uv}(x, y)$ and $y$ in the other edge. We let $\tilde{U}^R_{uv}(x, y) = \tilde{U}_{uv}(x, y) \cap U_R(u, v)$. One or two of the dual sites adjacent to $x$ are in $U(x, y)$; we let $x'$ denote such a site, making an arbitrary choice if there are two. We define $y'$ analogously as a dual site in $U(x, y)$ adjacent to $y$. By a face of the dual lattice we mean a square $z + [-\frac{1}{2}, \frac{1}{2}]^2$ with $z \in \mathbb{Z}^2$. Let $V(x, y)$ denote the interior of the union of all faces of the dual lattice whose interiors are contained in $\tilde{U}_{xy}(x', y')$, and let $T_x(x, y)$ and $T_y(x, y)$ denote the components of $V(x, y)^c$ containing $x$ and $y$, respectively. (These components are distinct since $|x - y| \ge 4\sqrt{2}$.) Then $V(x, y) \subset \tilde{U}_{xy}(x', y') \subset U(x, y)$. The following lemma says roughly that typical open dual paths from $x$ to $y$ are connected to the boundary of the natural slab only near $x$ and $y$.

Lemma 7.1. Let $P$ be a percolation model on $\mathcal{B}(\mathbb{Z}^2)$ satisfying (2.7). There exist constants $K_{70}, K_{71}, \varepsilon_{22}$ as follows. For all $x, y \in (\mathbb{Z}^2)^*$ and $J \ge K_{70} \log |x - y|$,
$$P\big(x \leftrightarrow z \text{ for some } z \in \partial V(x, y) \setminus (x + \Lambda_J) \mid x \leftrightarrow y\big) \le K_{71}\, e^{-\varepsilon_{22} J}.$$

Proof. As mentioned above, we omit some details. Suppose $x \leftrightarrow y$ and $x \leftrightarrow z$ for some $z \in \partial T_x(x, y) \setminus (x + \Lambda_J)$, via open dual paths. If $K_{72}$ is large, one can trivially dispose of the case in which $x \leftrightarrow B(x, K_{72}|y - x|)^c$, so we henceforth tacitly consider only connections occurring inside $B(x, K_{72}|y - x|)$; in particular this means $|z - x| \le K_{72}|y - x|$. There are then two cases: either there is an $(\varepsilon_{23} J)$-near connection from $x$ to $y$ in $B(z, J/10)^c$, or there is not; here $\varepsilon_{23}$ is to be specified. In the first case, there is also an open dual path from $z$ to $\partial(B(z, J/20) \cap (\mathbb{Z}^2)^*)$, so we can apply (2.10) and Lemmas 3.1 and 6.1 (assuming $K_{70}$ is large) to obtain
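$$\begin{aligned}
P\Big(N\big(x, y, \varepsilon_{23} J, B(z, J/10)^c\big) \cap \big[z \leftrightarrow \partial(B(z, J/20) \cap (\mathbb{Z}^2)^*)\big]\Big)
&\le K_{73}\, J\, e^{-\tau(y - x) + K_{54}\varepsilon_{23} J}\, e^{-\kappa_\tau J/40} \\
&\le K_{74}\, e^{-\varepsilon_{24} J}\, P(x \leftrightarrow y),
\end{aligned} \qquad (7.2)$$
provided $\varepsilon_{23}$ is chosen small enough and $K_{70}$ large enough. In the second case, there exist dual sites $v, w$ just outside $B(z, J/10)$ and open dual paths $x \leftrightarrow v$ and $w \leftrightarrow y$ occurring at separation $\varepsilon_{23} J$. It follows easily from (7.1) and the fact that $z$ is close to $\partial U(x, y)$ that $\tau(y - w) \ge \tau(y - x) - \frac{1}{5}\kappa_\tau J$. Also, since $|z - x| \ge J$,
$$\tau(v - x) \ge \tau(z - x) - \tfrac{1}{5}\kappa_\tau J \ge \tfrac{1}{2}\kappa_\tau J. \qquad (7.3)$$
Therefore by Lemma 3.2 and (2.10),
$$\begin{aligned}
P\Big(N\big(x, y, \varepsilon_{23} J, B(z, J/10)^c\big)^c \cap [x \leftrightarrow y] \cap [x \leftrightarrow z]\Big)
&\le \sum_{v, w} 2\, P(x \leftrightarrow v)\, P(w \leftrightarrow y) \\
&\le K_{75}\, J^2\, e^{-\tau(y - x) - \frac{3}{10}\kappa_\tau J} \\
&\le K_{76}\, e^{-\varepsilon_{25} J}\, P(x \leftrightarrow y).
\end{aligned} \qquad (7.4)$$
Now (7.2) and (7.4), summed over $z$ with $|z - x| \le K_{72}|y - x|$, prove the lemma. □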
For dual sites $x$ and $y$, we say that $x \leftrightarrow y$ cylindrically if there is an open dual path $\gamma$ from $x$ to $y$ in $U(x, y)$ and every open dual path from $\gamma$ to $U(x, y)^c$ passes through $x$ or $y$. For $D$ a subgraph of the dual lattice (or just a set of dual bonds, which we may view as such a subgraph), and $A \subset \mathbb{R}^2$, we define the bond boundary of $D$ in $A$ to be the set of bonds contained in $A$ having exactly one endpoint in $D$. (As always, we view bonds as open intervals contained in the plane.)

Lemma 7.2. Let $P$ be a percolation model on $\mathcal{B}(\mathbb{Z}^2)$ satisfying (2.7). There exist constants $K_{77}, \varepsilon_{26}$ as follows. For all $x, y \in (\mathbb{Z}^2)^*$,
$$P(x \leftrightarrow y \text{ cylindrically}) \ge \varepsilon_{26}\, |y - x|^{-K_{77}}\, P(x \leftrightarrow y).$$

Proof. Fix $x, y$ and let $J = K_{70} \log |y - x|$, with $K_{70}$ from Lemma 7.1. Let $J_x(x, y)$ denote the bond boundary of $\partial T_x(x, y) \cap \mathcal{B}(x + \Lambda_J)$ in $\mathbb{R}^2$, and define $J_y(x, y)$ analogously. For $e \in J_x(x, y)$ let $A_e$ denote the event that all dual bonds in $\mathcal{B}(\partial T_x(x, y) \cap (x + \Lambda_J))$ are open, $e$ and $xx'$ are open and all other bonds in $J_x(x, y)$ are closed; define $A_e$ analogously for $e \in J_y(x, y)$. Define the event $B = [x \leftrightarrow z \text{ for some } z \in \partial V(x, y) \setminus (x + \Lambda_J)]$ (cf. Lemma 7.1). Given a configuration $\omega \in [x \leftrightarrow y] \cap B^c$, there necessarily exists an open dual path from $\partial T_x(x, y) \cap (x + \Lambda_J)$ to $\partial T_y(x, y) \cap (y + \Lambda_J)$ in $V(x, y)$ which contains only one bond in $J_x(x, y)$ and one bond in $J_y(x, y)$; we denote these two bonds by $b_{xy}(x, \omega)$ and $b_{xy}(y, \omega)$, respectively, making an arbitrary choice if more than one choice is possible. If the configuration $\omega$ is in $[x \leftrightarrow y] \cap B^c \cap [b_{xy}(x) = e] \cap [b_{xy}(y) = f]$ for some $e, f$, then we can modify at most $16J$ bonds (those in $\partial T_x \cap \mathcal{B}(\partial T_x(x, y) \cap (x + \Lambda_J))$) to obtain a configuration in $A_e \cap A_f$; in the resulting configuration we have $x \leftrightarrow y$ cylindrically. Therefore from the bounded energy property we have
$$P(x \leftrightarrow y \text{ cylindrically} \mid [x \leftrightarrow y] \cap B^c) \ge e^{-K_{78} J}.$$
This and Lemma 7.1 prove the lemma.

It is easy to check that Lemmas 7.1 and 7.2 are valid if we restrict to connections in the halfspace $H_{xy}$. More precisely, under the assumptions of Lemma 7.2 we have
$$P(x \leftrightarrow y \text{ cylindrically in } U_R(x, y)) \ge \varepsilon_{26}\, |y - x|^{-K_{77}}\, P(x \leftrightarrow y). \qquad (7.5)$$
Additionally, we can extend the idea of cylindrical connections as follows: for $u, v \in \mathbb{R}^2$ and $x, y$ dual sites in $U(u, v)$, we say that $x \leftrightarrow y$ $(u, v)$-cylindrically if there is an open dual path from $x$ to $y$ in $\tilde{U}_{uv}(x, y)$ and the dual cluster of $x$ and $y$ intersects $\partial \tilde{U}_{uv}(x, y)$ only at $x$ and $y$. Provided $x, y \in U_R(u, v)$, the proof of Lemma 7.2 shows that
$$P\big(x \leftrightarrow y\ (u, v)\text{-cylindrically in } \tilde{U}^R_{uv}(x, y)\big) \ge \varepsilon_{26}\, |y - x|^{-K_{77}}\, P(x \leftrightarrow y \text{ in } H_{xy}). \qquad (7.6)$$
Theorem 7.3. Let $P$ be a percolation model on $\mathcal{B}(\mathbb{Z}^2)$ satisfying (2.7), the near-Markov property for open circuits, positivity of $\tau$ and the ratio weak mixing property. There exist $K_i$ such that for $A > K_{79}$ and $l = \sqrt{A}$,
$$P(|\mathrm{Int}(\Gamma_0)| = A) \ge \exp\big(-w_1\sqrt{A} - K_{80}\, l^{1/3}(\log l)^{2/3}\big).$$
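To get a feel for the sizes in Theorem 7.3, the sketch below tabulates the linear scale $l = \sqrt{A}$, the skeleton scale $s = l^{2/3}(\log l)^{1/3}$ used in the proof that follows, and the correction scale $l^{1/3}(\log l)^{2/3}$ appearing in the exponent. The constants $w_1$ and $K_{80}$ are not evaluated, since their values are not given here; the function name `droplet_scales` is a hypothetical helper for illustration.

```python
import math

def droplet_scales(A):
    """Return the length scales attached to a droplet of area A.

    l          : linear scale sqrt(A)
    s          : l^(2/3) (log l)^(1/3)
    correction : l^(1/3) (log l)^(2/3), the order of the error term
    relative   : correction / l, the error relative to the leading order
    """
    if A <= 1:
        raise ValueError("A must exceed 1 so that log l is positive")
    l = math.sqrt(A)
    log_l = math.log(l)
    s = l ** (2.0 / 3.0) * log_l ** (1.0 / 3.0)
    correction = l ** (1.0 / 3.0) * log_l ** (2.0 / 3.0)
    return {"l": l, "s": s, "correction": correction, "relative": correction / l}

if __name__ == "__main__":
    for A in (1e4, 1e8, 1e12):
        print(A, droplet_scales(A))
```

Two things are visible in the output: the product of `s` and `correction` is exactly $l \log l$, and the relative size of the correction, $(\log l / l)^{2/3}$, tends to 0 as $A$ grows.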
√ Proof. Fix A large and let l = A. Let a1 denote the vertical coordinate of the point where ∂K1 meets the positive vertical axis. Let s = l 2/3 (log l)1/3 and δ = K27 s 2 / l, with K27 as in the proof of Theorem 4.1. Let α = ∂(l + δ)K1 and let (z0 , . . . , zn , z0 ) be the s-hull skeleton of α. It is an easy exercise in geometry to see that the natural ), i = 0, . . . , n, are disjoint. (Our labeling as usual is cyclical: half-slabs UR (zi , zi+1 ) from the skeleton zn+1 = z0 .) For some K81 to be specified, let us call a pair (zi , zi+1 √ √ very short if |zi+1 − zi | ≤ 2 2, short if 2 2 < |zi+1 − zi | ≤ 2K81 log l and long if |zi+1 − zi | > 2K81 log l. In what follows, very short pairs can be handled quite trivially but tediously, so for convenience we will assume there are no very short pairs. For long pairs we define xi and yi+1 to be the points on the line segment zi zi+1 at distance K81 log l ) within from zi and from zi+1 , respectively, and let xi , yi+1 be dual sites in UR (zi , zi+1 √ distance 2 of xi and yi+1 , respectively. For short pairs we let xi = yi+1 be a dual site √ ) within distance 2 of the midpoint of z z . With minor modification in UR (zi , zi+1 i i+1 of the definition of the s-hull skeleton, we may assume the set {z0 , . . . , zn } has lattice symmetry, that is, for each zi the reflection of zi across the horizontal or vertical axis is another zj , and analogously for the sites xi and yi . For each i we let φi denote a dual lattice path of minimal length from yi to xi outside Co({z , . . . , zn })∪ U˜ z z (xi−1 , yi )∪ 0
i−1 i
U˜ zi zi+1 (xi , yi+1 ). We call such a φi a short link. Let Ci denote the bond boundary of φi in (Co({z , . . . , zn }) ∪ U˜ z z (xi−1 , yi ) ∪ U˜ z z (xi , yi+1 ))c .
0
i−1 i
i i+1
Let λa be the vertical line through (a, 0). Let HL (x) and HR (x) denote the open half planes to the left and right, respectively, of the vertical line through x. Let HU (x) and HB (x) denote the open half planes above and below the horizontal line through x, respectively. (In general we use the convention that subscripts L, R, U, B refer to left, right, upper and lower halfspaces, respectively, with combinations, such as LU , referring to quadrants.) Let S(x, y) denote the open slab between the vertical lines through x and y. Let N be the integer part of a1 l/2, M the integer part of l 2/3 (log l)1/3 and D the integer part of l 1/3 (log l)2/3 . Let uRU = HR (0) ∩ HU (0) ∩ α ∩ λM+ 1 , vRU = HR (0) ∩ HU (0) ∩ α ∩ λM+D+ 1 , 2
= HR (0) ∩ HU (0) ∩ α ∩ λN+ 1 , wRU 2
2
xRU = HR (0) ∩ HU (0) ∩ α ∩ λN+D+ 1 . 2
(Note each of these intersections is a single point.) We call these 4 points determining points. Lattice symmetry yields corresponding determining points with appropriate subscripts in the other three quadrants. We may assume that uRU is one of the sites zi of the s-hull skeleton of α (if not, we add uRU to the skeleton), and analogously for the other sites just defined. Let uRU be the second closest dual site above uRU in λM+ 1 , and analogously for vRU , wRU , xRU . If uRU = zi , for some i, we de2 fine zi to be uRU , and again analogously for the other determining points. Loosely, the idea is to remove from 0 its intersection with each of the width-D vertical slabs S(xLU , wLU ), S(vLU , uLU ), S(uRU , vRU ), S(wRU , xRU ), then raise or lower the segments of 0 between these slabs to adjust the area as desired, then reconnect these segments to make a new circuit enclosing area A. To do this we must first ensure that 0 intersects each vertical line bounding any of these four slabs only twice. We refer to the 4 width-D vertical slabs above as removal slabs. We call the 5 regions HL (xLU ), S(wLU , vLU ), S(uLU , uRU ), S(vRU , wRU ), HR (xRU ) (whose closures together form the complement of the 4 removal slabs) retention regions. By a retained
segment we mean a connected component of the intersection of α with a retention re gion. Each retained segment has the form α (zj ,zk ) for some j, k; we call zj an initial determining point and zk a final determining point, and call (j, k) a retention pair. We let J ret denote the set of all 8 retention pairs. For each initial determining point zj , in the boundary of some retained region F , we let ψj be a dual lattice path from zj to xj in F \U˜ zj zj +1 (xj , yj +1 ), of minimal length, and let Dj be the bond boundary of ψj in F \U˜ z z (xj , yj +1 )}. For each final determining point z we let ψk be a dual lattice k
j j +1
path from yk to zk in F \U˜ zk−1 zk (xk−1 , yk ), of minimal length, and define Dk analogously to Dj . We refer to ψj and ψk as the endpaths of the retention pair (j, k).
For each retention pair (j, k) let Ij k = {i : zi , zi+1 ∈ α (zj ,zk ) } and let Qj k denote )-cylindrically the event that (i) for each i ∈ Ij k ∪ {j }, we have xi ↔ yi+1 (zi , zi+1 in U˜ zR z (xi , yi+1 ), (ii) for each i ∈ Ij k we have φi open and all bonds in Ci closed, i i+1
and (iii) both endpaths of (j, k) are open and all bonds in Dj ∪ Dk are closed. These 3 component events are denoted Q(i) (j, k), Q(ii) (j, k) and Q(iii) (j, k). For a configuration in Qj k , the paths xi ↔ yi+1 together with the short links φi and the two endpaths form an open dual path from zj to zk outside Co({z0 , . . . , zn }), and there is no open dual connection from this path to any point of the retention region boundary except zj and zk . By Lemma 3.1, (7.6) and Theorem 4.2, provided K81 is large we have P (Q(i) (j, k)) |I | ≥ 21 j k
i∈Ij k ∪{j }
≥
4 |Ij k |+1 27
l
≥ exp −
P (xi ↔ yi+1 (zi , zi+1 ) − cylindrically in U˜ zR z (xi , yi+1 )) i i+1
exp −
τ (yi+1 − xi )
i∈Ij k ∪{j }
(7.7)
τ (zi+1 − zi ) − K82 |Ij k | log l .
i∈Ij k ∪{j }
From the bounded energy property,
$$P\big(Q^{(ii)}(j, k) \cap Q^{(iii)}(j, k) \mid Q^{(i)}(j, k)\big) \ge \exp(-K_{83} |I_{jk}| \log l),$$
which with (7.7) yields
$$P(Q_{jk}) \ge \exp\Big(-\sum_{i \in I_{jk} \cup \{j\}} \tau(z_{i+1} - z_i) - K_{84} |I_{jk}| \log l\Big). \qquad (7.8)$$
For a configuration ω ∈ Qj k , and for F the retention region with zj , zk ∈ ∂F , we can associate an area Rj k (ω) as follows. There is a unique outermost open dual path Yj k (ω) from zj to zk in F . If F is a halfspace (HL (xLU ) or HR (xRU )), then Rj k (ω) is the area of the region between Yj k (ω) and zj zk . If F is a slab and zj , zk ∈ HU (0), then Rj k (ω) is the area of the region in F ∩ HU (0) below Yj k (ω). If F is a slab and zj , zk ∈ HB (0), then Rj k (ω) is the area of the region in F ∩ HB (0) above Yj k (ω).
We define corresponding nonrandom areas $R^0_{jk}$ similarly but using $\alpha^{(z_j, z_k)}$ in place of $Y_{jk}(\omega)$. Then $R_{jk}(\omega) \ge R^0_{jk}$. It is not hard to see that for some $K_{85}$ we have
$$P\Big(R_{jk} \ge R^0_{jk} + K_{85}\, l^{4/3}(\log l)^{2/3} \text{ for some retention pair } (j, k) \;\Big|\; \bigcap_{(j,k) \in J^{\mathrm{ret}}} Q_{jk}\Big) \le \frac{1}{2}. \qquad (7.9)$$
In fact, if this were false we could obtain extra area $K_{85}\, l^{4/3}(\log l)^{2/3}$ almost "for free" in Theorem 4.1; more precisely, one could replace $A$ with $A + K_{85}\, l^{4/3}(\log l)^{2/3}$ on the left side in the conclusion of that theorem. But, assuming $K_{85}$ is large, this would contradict Theorem 5.8. It follows from (7.9) that for each retention pair $(j, k)$ there exists $a_{jk} \in [R^0_{jk}, R^0_{jk} + K_{85}\, l^{4/3}(\log l)^{2/3})$ such that
$$P\Big(\bigcap_{(j,k) \in J^{\mathrm{ret}}} [R_{jk} = a_{jk}] \;\Big|\; \bigcap_{(j,k) \in J^{\mathrm{ret}}} Q_{jk}\Big) \ge \frac{1}{2}\big(K_{85}\, l^{4/3}(\log l)^{2/3}\big)^{-8}. \qquad (7.10)$$
Let
$$R_1 = \sum_{(j,k) \in J^{\mathrm{ret}}} a_{jk},$$
where the sum is over the 8 retention pairs. We call $(k, j)$ a removal pair if $\alpha^{(z_k, z_j)}$ is a connected component of the intersection of $\alpha$ with some removal slab. Let $J^{\mathrm{rem}}$ denote the set of all 8 removal pairs. For each removal pair $(k, j)$ and corresponding removal slab $F$, let $\chi_{kj}$ be a dual path from $z_k$ to $z_j$ in $F \setminus \mathrm{Co}(\{z_0, \ldots, z_n\})$, and let $G_{kj}$ denote the event that all bonds in $\chi_{kj}$ are open, while all bonds in the bond boundary of $\chi_{kj}$ in $(\psi_j \cup \psi_k)^c$ are closed. We call $\chi_{kj}$ a long link. There are 2 long links in each removal slab, one each in the upper and lower half planes. Let $R_2$ be the total area in the 4 removal slabs, between the upper and lower long link in each slab. Assuming the long links are chosen to have length of order $D$ (say, at most $4D$), we have from the bounded energy property that
$$P\Big(\bigcap_{(k,j) \in J^{\mathrm{rem}}} G_{kj} \;\Big|\; \Big(\bigcap_{(j,k) \in J^{\mathrm{ret}}} Q_{jk}\Big) \cap \Big(\bigcap_{(j,k) \in J^{\mathrm{ret}}} [R_{jk} = a_{jk}]\Big)\Big) \ge e^{-K_{86} D} \ge e^{-K_{87}\, l^{1/3}(\log l)^{2/3}}. \qquad (7.11)$$
Define the event
$$E = \Big(\bigcap_{(j,k) \in J^{\mathrm{ret}}} Q_{jk}\Big) \cap \Big(\bigcap_{(j,k) \in J^{\mathrm{ret}}} [R_{jk} = a_{jk}]\Big) \cap \Big(\bigcap_{(k,j) \in J^{\mathrm{rem}}} G_{kj}\Big).$$
For a configuration $\omega \in E$, there is an open dual circuit surrounding $\mathrm{Co}(\{z_0, \ldots, z_n\})$ satisfying the constraint that it include all of the short links $\phi_i$, long links $\chi_{kj}$ and endpaths $\psi_j$. There is a unique outermost such circuit subject to this constraint, obtained by taking the outermost $(z_i, z_{i+1})$-cylindrical connection from $x_i$ to $y_{i+1}$ for each $i$; we denote this circuit $\Gamma_1(\omega)$. Because of the cylindrical nature of these connections and the closed state of the bond boundaries of the short and long links, we have $\Gamma_1(\omega) = \Gamma_0(\omega)$, unless $\Gamma_0(\omega)$ and $\Gamma_1(\omega)$ are disjoint with $\Gamma_0(\omega)$ surrounding $\Gamma_1(\omega)$ and no open dual path connecting $\Gamma_0(\omega)$ to $\Gamma_1(\omega)$. It therefore follows from the near-Markov property that
$$P(\Gamma_0 \ne \Gamma_1 \mid E) \le e^{-\varepsilon_{28} l} \le \frac{1}{2}.$$
By (3.2),
$$\sum_{(j,k) \in J^{\mathrm{ret}}} |I_{jk}| \le K_{88}\, \frac{l}{s} \le K_{89}\, l^{1/3}(\log l)^{-1/3}.$$
Using these facts with (7.10), (7.11), Lemma 3.1, (7.8) and (4.1) (which is still valid here), we obtain
$$\begin{aligned}
P(|\Gamma_0| = R_1 + R_2) &\ge P(E \cap [\Gamma_0 = \Gamma_1]) \ge \frac{1}{2} P(E) \\
&\ge \frac{1}{2}\big(K_{85}\, l^{4/3}(\log l)^{2/3}\big)^{-8}\, e^{-K_{87}\, l^{1/3}(\log l)^{2/3}}\, P\Big(\bigcap_{(j,k) \in J^{\mathrm{ret}}} Q_{jk}\Big) \\
&\ge e^{-K_{90}\, l^{1/3}(\log l)^{2/3}} \prod_{(j,k) \in J^{\mathrm{ret}}} P(Q_{jk}) \\
&\ge \exp\Big(-\sum_{i=0}^{n} \tau(z_{i+1} - z_i) - K_{91}\, l^{1/3}(\log l)^{2/3}\Big) \\
&\ge \exp\big(-w_1\sqrt{A} - K_{92}\, l^{1/3}(\log l)^{2/3}\big).
\end{aligned} \qquad (7.12)$$
Let $\theta\omega$ be the upward shift of a configuration $\omega$ by 1 unit, and for an event $G$ let $\theta^m G = \{\omega : \theta^{-m}\omega \in G\}$. Let $J^{\mathrm{ret}}_B$ be the set consisting of the 3 retention pairs corresponding to segments of $\alpha$ in the lower half plane. Given a constant $K_{93}$, for $m \le K_{93}\, l^{1/3}(\log l)^{2/3}$ we can replace $Q_{jk}$ with $\theta^m Q_{jk}$ for all $(j, k) \in J^{\mathrm{ret}}_B$ throughout the argument leading to (7.12) at the expense of only a possible increase in $K_{91}$, provided we alter the 4 long links $\chi_{kj}$ in the outer 2 removal slabs to connect to the appropriate shifted sites $z_i + (0, m)$ instead of to $z_i$. (The possible increase in $K_{91}$ reflects a possible reduction in the probabilities of the events $G_{kj}$, resulting in an increase of $K_{87}$ in (7.11).) We can readily keep the area $R_2$ fixed when we so alter the long links. We thereby obtain
$$P(|\Gamma_0| = R_1 + R_2 - (2N + 1)m) \ge \exp\big(-w_1\sqrt{A} - K_{94}\, l^{1/3}(\log l)^{2/3}\big).$$
Provided $K_{93}$ is large, we can choose $m \in \mathbb{Z}$ so that $|R_1 + R_2 - (2N + 1)m - A| \le N$. We can then repeat this entire argument, but shift upward (by some amount $q \le K_{93}\, l^{1/3}(\log l)^{2/3}$) only the event $Q_{jk}$ for the central of the 3 retention pairs in $J^{\mathrm{ret}}_B$. This gives
$$P(|\Gamma_0| = R_1 + R_2 - (2N + 1)m - (2M + 1)q) \ge \exp\big(-w_1\sqrt{A} - K_{95}\, l^{1/3}(\log l)^{2/3}\big). \qquad (7.13)$$
We can choose $q$ so that
$$|R_1 + R_2 - (2N + 1)m - (2M + 1)q - A| \le M \le l^{2/3}(\log l)^{1/3}.$$
But it is easy to see that one can alter the long links to change $R_2$ by any amount up to $l^{2/3}(\log l)^{1/3}$, so that $R_1 + R_2 - (2N + 1)m - (2M + 1)q = A$, at the expense of only a possible increase in $K_{95}$ in (7.13). With (7.13) this completes the proof. □

Now that we can use Theorem 7.3 in place of Theorem 4.1, all proofs leading to (2.13)–(2.15) of Theorems 2.1 and 2.2 remain valid under conditioning on $|\mathrm{Int}(\Gamma_0)| = A$ in place of $|\mathrm{Int}(\Gamma_0)| \ge A$. This establishes the following result.

Theorem 7.4. Under the assumptions in Theorem 2.1 or Theorem 2.2, under the measure $P(\,\cdot \mid |\mathrm{Int}(\Gamma_0)| = A)$ the conclusions (2.13)–(2.15) hold with probability approaching 1 as $A \to \infty$.

We do not include (2.16) in Theorem 7.4 because of technical difficulties in replacing $P(|\mathrm{Int}(\Gamma_0)| \ge A + \delta\sqrt{A})$ with $P(|\mathrm{Int}(\Gamma_0)| = A + \delta\sqrt{A})$ in Proposition 6.3. However, we have no reason to expect (2.16) should not be valid here as well. We conclude with the following.

Proof of Theorem 2.3. The $O(\cdot)$ is a combined upper and lower estimate; the lower estimate follows from Theorem 7.3 and the upper estimate from Theorem 5.8. □
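The area-adjustment step at the end of the proof of Theorem 7.3 is, at its core, integer arithmetic: a coarse correction in steps of $2N + 1$ followed by a finer one in steps of $2M + 1$. The sketch below illustrates only that arithmetic, under the assumption that the areas involved are integers; the function names are hypothetical, and the sketch does not check the size constraints $m, q \le K_{93}\, l^{1/3}(\log l)^{2/3}$ or model the final alteration of the long links.

```python
def nearest_shift(diff, step):
    """Return the integer t minimising |diff - step * t| for odd step > 0.

    With step = 2H + 1 the remaining error satisfies |diff - step * t| <= H.
    """
    return (diff + step // 2) // step

def adjust_area(total, target, N, M):
    """Two-stage correction of an integer area `total` towards `target`.

    Stage 1 removes (2N + 1) * m, leaving an error of at most N.
    Stage 2 removes (2M + 1) * q, leaving an error of at most M.
    Returns (m, q, residual), where residual is the final signed error.
    """
    m = nearest_shift(total - target, 2 * N + 1)
    after_m = total - (2 * N + 1) * m
    q = nearest_shift(after_m - target, 2 * M + 1)
    after_q = after_m - (2 * M + 1) * q
    return m, q, after_q - target

if __name__ == "__main__":
    # Hypothetical numbers: R_1 + R_2 = 123456, A = 100000, N = 50, M = 7.
    print(adjust_area(123456, 100000, 50, 7))
```

The residual left after both stages is at most $M$ in absolute value, which is the amount the proof then absorbs by altering the long links to change $R_2$.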
References
1. Alexander, K.S.: Stability of the Wulff minimum and fluctuations in shape for large finite clusters in two-dimensional percolation. Probab. Theory Rel. Fields 91, 507–532 (1992)
2. Alexander, K.S.: Approximation of subadditive functions and rates of convergence in limiting shape results. Ann. Probab. 25, 30–55 (1997)
3. Alexander, K.S.: On weak mixing in lattice models. Probab. Theory Rel. Fields 110, 441–471 (1998)
4. Alexander, K.S.: Power-law corrections to exponential decay of connectivities and correlations in lattice models. Ann. Probab. 29, 92–122 (2001)
5. Alexander, K.S.: The asymmetric random cluster model and comparison of Ising and Potts models. Probab. Theory Rel. Fields 120, 395–444 (2001)
6. Alexander, K.S., Chayes, J.T. and Chayes, L.: The Wulff construction and asymptotics of the finite cluster distribution for two dimensional Bernoulli percolation. Commun. Math. Phys. 131, 1–50 (1990)
7. Avron, J.E., van Beijeren, H., Schulman, L.S. and Zia, R.K.P.: Roughening transition, surface tension and equilibrium droplet shapes in a two-dimensional Ising system. J. Phys. A 15, L81–L86 (1982)
8. Baik, J., Deift, P. and Johansson, K.: On the distribution of the length of the longest increasing subsequence of random permutations. J. Am. Math. Soc. 12, 1119–1178 (1999)
9. Burton, R. and Keane, M.: Density and uniqueness in percolation. Commun. Math. Phys. 121, 501–505 (1989)
10. Deuschel, J.-D. and Zeitouni, O.: On increasing subsequences of I.I.D. samples. Combin. Probab. Comput. 8, 247–263 (1999)
11. Dobrushin, R.L. and Hryniv, O.: Fluctuations of the phase boundary in the 2D Ising ferromagnet. Commun. Math. Phys. 189, 395–445 (1997)
12. Dobrushin, R.L., Kotecký, R. and Shlosman, S.: Wulff construction. A global shape from local interaction. Translations of Mathematical Monographs 104, Providence, RI: American Mathematical Society, 1992
13. Edwards, R.G. and Sokal, A.D.: Generalization of the Fortuin–Kasteleyn–Swendsen–Wang representation and Monte Carlo algorithm. Phys. Rev. D 38, 2009–2012 (1988)
14. Fortuin, C.M. and Kasteleyn, P.W.: On the random cluster model. I. Introduction and relation to other models. Physica 57, 536–564 (1972)
15. Grimmett, G.R.: The stochastic random-cluster process and uniqueness of random-cluster measures. Ann. Probab. 23, 1461–1510 (1995)
16. Hryniv, O.: On local behaviour of the phase separation line in the 2D Ising model. Probab. Theory Rel. Fields 110, 91–107 (1998)
17. Ioffe, D. and Schonmann, R.H.: Dobrushin–Kotecký–Shlosman theorem up to the critical temperature. Commun. Math. Phys. 199, 117–167 (1998)
18. Krug, J. and Spohn, H.: Kinetic roughening of growing interfaces. In: Solids Far from Equilibrium: Growth, Morphology and Defects (C. Godrèche, ed.) Cambridge: Cambridge University Press, 1991, pp. 479–582
19. Laanait, L., Messager, A. and Ruiz, J.: Phase coexistence and surface tensions for the Potts model. Commun. Math. Phys. 105, 527–545 (1986)
20. Licea, C., Newman, C.M. and Piza, M.S.T.: Superdiffusivity in first-passage percolation. Probab. Theory Rel. Fields 106, 559–591 (1996)
21. McCoy, B.M. and Wu, T.T.: The Two-Dimensional Ising Model. Cambridge, MA: Harvard University Press, 1973
22. Newman, C.M. and Piza, M.S.T.: Divergence of shape fluctuations in two dimensions. Ann. Probab. 23, 977–1005 (1995)
23. Piza, M.S.T.: Directed polymers in a random environment: Some results on fluctuations. J. Statist. Phys. 89, 581–603 (1997)
24. Taylor, J.E.: Existence and structure of solutions to a class of nonelliptic variational problems. Symp. Math. 14, 499–508 (1974)
25. Taylor, J.E.: Unique structure of solutions to a class of nonelliptic variational problems. Proc. Symp. Pure Math. 27, 419–427 (1975)
26. Uzun, H.B.: On maximum local roughness of random droplets in two dimensions. Ph.D. dissertation, University of Southern California, 2001
27. van den Berg, J. and Kesten, H.: Inequalities with applications to percolation and reliability. J. Appl. Probab. 22, 556–569 (1985)
28. Vershik, A.M. and Kerov, C.V.: Asymptotics of the Plancherel measure of the symmetric group and the limiting form of Young tables. Dokl. Acad. Nauk. 233, 1024–1028 (1977)

Communicated by A. Kupiainen