Commun. Math. Phys. 218, 1 – 97 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Strange Attractors with One Direction of Instability
Qiudong Wang1, Lai-Sang Young2,3
1 Department of Mathematics, University of Arizona, Tucson, AZ 85721, USA.
E-mail:
[email protected]
2 Courant Institute of Mathematical Sciences, 251 Mercer St., New York, NY 10012, USA.
E-mail:
[email protected]
3 Department of Mathematics, UCLA, Los Angeles, CA 90095, USA. E-mail:
[email protected]
Received: 25 April 2000 / Accepted: 17 October 2000
Abstract: We give simple conditions that guarantee, for strongly dissipative maps, the existence of strange attractors with a single direction of instability and certain controlled behaviors. Only the d = 2 case is treated in this paper, although our approach is by no means limited to two phase-dimensions. We develop a dynamical picture for the attractors in this class, proving they have many of the statistical properties associated with chaos: positive Lyapunov exponents, existence of SRB measures, and exponential decay of correlations. Other results include the geometry of fractal critical sets, nonuniform hyperbolic behavior, symbolic coding of orbits, and formulas for topological entropy.
Contents
Introduction
1. Statements of Results
2. Preliminaries
Part I. Controlling a Source of Nonhyperbolicity
3. The Critical Set
4. Replication of Orbit Segments
5. Pushing the Induction Forward
6. Measure of Selected Parameters
Part II. Geometric and Statistical Results
7. Nonuniform Hyperbolic Behavior
8. Statistical Properties of SRB Measures
9. Global Geometry
10. Symbolic Dynamics and Topological Entropy
This research is partially supported by NSF grant #9970673 and an NSF Postdoctoral Research Fellowship.
This research is partially supported by a grant from NSF and a Guggenheim Fellowship.
Appendix
A. Examples
B. Computational Proofs
Introduction
Strange attractors are of fundamental importance in dynamical systems; they have also been observed and recognized in many scientific disciplines. Up until now, most of the studies of strange attractors have relied on numerical simulations. Rigorous mathematical analysis has tended to be difficult, and progress has been slow. Among the not-so-many examples that have been studied are the Lorenz and Hénon attractors, both of which are closely related to certain one-dimensional maps. The theory of one-dimensional maps, on the other hand, has experienced unprecedented growth in the last two decades. The purpose of this paper is to bring some of the techniques in one dimension to bear on the analysis of attractors with a single direction of instability.
More precisely, our aim is to develop a general theory of strange attractors with one unstable direction and n − 1 directions of strong contraction, n being the dimension of the phase space. For simplicity, we will formulate our results in terms of perturbations of one-dimensional maps; what is important is that locally our dynamical systems have a one-dimensional character. In this paper, we will treat only the case n = 2, where most of the interesting phenomena already occur, leaving the case of arbitrary phase dimension to be published elsewhere. We focus on three aspects of this work that we regard as among the most important.
A. Conditions for the existence of strange attractors with known properties. One of the goals of this paper is to introduce an implementable scheme that would enable one to rigorously verify the existence of strange attractors with certain well defined “chaotic” properties. Leaving these properties to paragraph C below, we now give a rough description of the scheme we propose. Given a family of strongly dissipative maps with some expansion, i.e. a situation where a strange attractor potentially exists, we show that the following steps will ensure the desired conclusions for a positive measure set of parameters:
(a) First, pass to the singular limit by letting dissipation go to infinity. This gives a family of one-dimensional maps.
(b) Next, check that among these one-dimensional maps, there exist some with strong expanding properties (e.g. Misiurewicz maps).
(c) Then check that varying the parameter around the maps in (b) changes the dynamics effectively (this is a transversality condition).
(d) Finally, check that a nondegeneracy condition is satisfied in the unfolding, i.e. in the process that reverses (a) to recover the original dynamical systems.
These conditions are made precise in Sect. 1.1. We will show that all that one has to do to get the whole package of results described in paragraph C below (hyperbolicity, SRB measure, central limit theorem etc.) is to take the limit in (a), and then check (b)–(d); the latter two steps involve checking only that something is not equal to zero at a finite number of points.
The paper that paved the way for the use of one-dimensional techniques in two dimensions is [BC2], in which Benedicks and Carleson showed, for certain parameters
of the Hénon maps, that the attractor has positive Lyapunov exponents along its critical orbits (see paragraph B). Their estimates, however, use explicitly the formulas of the Hénon maps, making it difficult to apply directly the results in [BC2]. Extensions of [BC2] to small perturbations of Hénon maps have since been made, and some applications have been found; see e.g. [MV]. We do not claim by any means that this work is the first attempt to prove the existence of strange attractors, but we hope this is the most comprehensive attempt so far, both in terms of the clarity and generality of the conditions and in terms of the package of results that follow once these conditions are verified.
B. Geometry of critical regions. In this paragraph we discuss an object which dominates the landscape of the attractor, namely its critical set. In fact, it is important to understand not just the geometry of the critical set but the behavior of the map on its neighborhoods of various sizes; we call these critical regions.
Going back to one dimension, there are basically two philosophically “different” ways to capture expanding properties for maps with critical points. There is the method of inducing used by Jakobson [J], which advocates, for an orbit passing near a critical point, waiting until it has regained a large derivative before looking at it again; and there is the idea first used by Collet and Eckmann [CE] and later in [BC1], which advocates imposing growth conditions directly on critical orbits. It is the second approach described above that we will use in this paper to study the dynamics on attractors.
The idea of trying to identify a critical set for two-dimensional maps, that is to say, a set designated to play the role analogous to that played by critical points in one dimension, goes back to [BC2]. The construction of the critical set in [BC2], however, is ad hoc, and the resulting object has no obvious intrinsic characterization.
Moreover, while certain geometric relationships are satisfied, no coherent geometric picture of the critical regions follows from or is exploited in [BC2]. In order to develop a coherent geometric picture, we believe it is necessary to rework the entire inductive construction of [BC2] with built-in geometric properties for the critical set as part of the induction. This is what we have done in Part I of this paper. We have borrowed various pieces of local analysis from [BC2], but we have also added a geometric component to the story. This part is new, and as the reader will see, this departure from [BC2] will make a nontrivial difference when it comes to deriving dynamical consequences (see paragraph C). We remark that previous extensions of [BC2] have followed the inductive construction of [BC2] faithfully and are therefore also without these geometric considerations.
The critical set we introduce is an intrinsically defined object, characterized as those points on the attractor at which stable and unstable directions are interchanged. We prove that this set has a special geometric structure; it can be realized as the intersection of a nested sequence of rectangles with known geometric and dynamical properties. (For a quick description, see the statement of Theorem 1.1 in Sect. 1.2.) This geometric structure will be exploited heavily in the rest of the paper.
C. Dynamical consequences. The purpose of Part II of this paper is to develop, for the good parameters, properties that are consequences of the basic structures established in Part I. By “dynamical consequences”, we refer to a comprehensive description of the attractor: its local and global structures, its dynamics as seen from statistical, geometric, combinatorial and symbolic points of view. We think of the first part of our paper as “sowing the seeds”, and the second part as “reaping the harvest”.
We state and prove in Part II of this paper more than a dozen results. Some of these results have been shown before for the Hénon maps; others are new even in that restricted context. Some require delicate proofs; others follow, with a little bit of work, from general theory. All are natural consequences of the picture established in Part I, namely hyperbolicity away from the critical regions, and the geometry of the critical regions. Taken together, they represent a fairly complete understanding of the class of attractors in question.
Statistical properties. From the statistical point of view, an inherent difficulty with dissipative dynamical systems is that a priori there is no natural invariant probability measure. By “natural”, we refer to a measure that reflects the properties of Lebesgue-typical points. (The measure itself can be singular.) For systems with some hyperbolicity, there is the notion of a Sinai–Ruelle–Bowen or SRB measure introduced earlier in another context ([S2,R1,R2]). SRB measures are natural in the sense above. The problem is, not every attractor has an SRB measure.
Our first result is the existence of SRB measures for each of the attractors associated with a good parameter. These measures are not necessarily unique. In Sect. 8, we identify a finite number of ergodic SRB measures, called $\mu_1, \dots, \mu_r$, and show that the asymptotic orbit distribution starting at Lebesgue-a.e. z in the basin is given by one of these $\mu_i$. The domains of attraction of the $\mu_i$ can be quite complicated. In a way reminiscent of phase transitions, there are examples in which starting from certain arbitrarily small open sets, one has a positive probability of reaching several different $\mu_i$. By appealing to some general results in [Y3] or [Y4], we prove exponential decay of correlations and a central limit theorem for the $\mu_i$.
For the Hénon family near a = 2, b = 0 and their small perturbations, SRB measures and their statistical properties are studied in [BY1] and [BY2], and the basin property in [BV]. In this special case, the attractor admits only one SRB measure.
Geometric and other properties. It is useful conceptually to distinguish between the following two kinds of properties: properties that hold for Lebesgue-typical points, and properties carried by “small sets” or sets having Lebesgue measure zero. Because Lebesgue-typical points approach the attractor slowly, properties of the first kind require less precise information on the critical set. It is, in many ways, a greater challenge to understand the behavior of every orbit, for there is no control on how often or how close it comes to the critical set. Properties of the second kind, some of which we now describe, rely much more heavily on the detailed geometry of the critical set. All of our results in this category are new even for the Hénon maps.
A useful tool for keeping track of orbits in chaotic systems is to encode them into symbolic sequences generated by a finite alphabet. Some encodings are more meaningful than others. Given that our attractors do not admit finite Markov partitions, we show in Sect. 10 that the situation is as good as can be: we have symbolic representations of orbits that reflect their true geometric locations, and a coding that is essentially one-to-one.
This coding allows us to identify our attractor with strings of symbols, bringing us closer to one-dimensional lattice models in statistical mechanics. A useful concept for dynamical systems borrowed from lattice models is that of an equilibrium state (see [S2,R2]). We prove for our attractors the existence of equilibrium states, including measures of maximal entropy. We also prove various natural formulas for topological entropy, such as one given by counting the number of distinct “states” in n iterates.
Uniformly hyperbolic or Axiom A attractors were among the first attractors to be understood (see [Sm,Bo,S2,R2]). In Sect. 7, we show that our attractors, obviously
nonuniformly hyperbolic as they are, can be seen as the limit of an increasing sequence of uniformly hyperbolic invariant sets. Finally, in an entirely different direction, we give a finitary description of the approximate shape and complexity of these attractors, introducing a notion of “monotone branches”. The way these branches fit together gives strong insight into the differences between one and two-dimensional maps.
This concludes our description of the content of this paper. Among the directions of research made more tractable by our results are questions on the zeta function and transfer operator (see e.g. [Bal], [PP, R3]). With kneading sequences for critical orbits being well defined, it is reasonable to consider the possibility of a kneading theory (see [C, MT]). Our topological discussion leads naturally to questions on prime ends (e.g. [Bar]).
There are many related works that have not yet been mentioned. First, there are various results in one dimension (see [dMvS]) and on attractors with one direction of instability (not necessarily satisfying our conditions), including the solenoid and other Axiom A attractors [Sm,W1], the Lorenz attractors (see [G, Ro, Ry,W2]), dissipative twist maps [Bi], and certain periodically forced nonlinear oscillators ([Lev]; see also [GH]). Extending [BC2] and therefore closer to the setting of this paper are [DRV] and [V]. For results on piecewise uniformly hyperbolic attractors, see e.g. [CL,I1,I2], [M2] and [Y1]; and for statistical properties of hyperbolic billiards, see e.g. [S1,BSC1,BSC2] and [Y3].
This paper is by and large self-contained, with the exception of Sect. 6, where two results from 1-dimensional maps are quoted without proof, and Sect. 8, where previous work of the second-named author is used. Proofs that are computational in nature have been put in the Appendix so that they will not obstruct the main flow of ideas.
In a paper as long as this one, it might be useful to indicate the logical connections among the various sections. After Sect. 1, we recommend at least looking through Sect. 2, in which we introduce much of the basic vocabulary for subsequent sections. The other sections are connected as indicated. (For example, the technical content of Sect. 6 is not needed for reading Sects. 7–10.)
[Diagram of logical connections among the sections. Nodes: 3. Critical Set; 4. Replication of Orbits; 5. Inductive Step; 6. Parameter Set; 7. Hyperbolic Behavior; 8. Statistical Properties; 9. Global Geometry; 10. Sym. Dyn. & Entropy.]
1. Statements of Results
1.1. Setting. For definiteness, Theorems 1.1–1.7 are stated in the context of attractors that arise from perturbations of circle maps. For the interval case, see Sect. 1.5.
Let $A = S^1 \times [-1, 1]$. We consider 2-parameter families of maps $\{T_{a,b}\}$, where for each $(a, b)$, $T_{a,b} : A \to A$ is a self-map of $A$ and $(x, y, a, b) \mapsto T_{a,b}(x, y)$ is $C^4$. The class of 2-parameter families $\{T_{a,b}\}$ to which our results apply is constructed via the following four steps. The necessary smoothness is assumed in each step.
Step I. Let $f : S^1 \to S^1$ satisfy the following Misiurewicz conditions, i.e. letting $C = \{x : f'(x) = 0\}$, we assume:
1. $f''(x) \neq 0$ for all $x \in C$;
2. $f$ has negative Schwarzian derivative on $S^1 \setminus C$;^1
3. there is no $x \in S^1$ with $f^n(x) = x$ and $|(f^n)'(x)| \leq 1$;
4. for all $x \in C$, $\inf_{n>0} d(f^n x, C) > 0$.
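As a rough numerical illustration of the conditions in Step I, the following sketch spot-checks them for the quadratic map $f(x) = 1 - 2x^2$ on $[-1, 1]$. This interval map is an assumed example (the interval case is deferred to Sect. 1.5), all function names are hypothetical, and a finite computation of this kind only suggests, and does not prove, that the conditions hold.

```python
# Spot-check of Misiurewicz-type conditions for the assumed example
# f(x) = 1 - 2 x^2 on [-1, 1]. Names and sample points are illustrative only.

def f(x):
    return 1.0 - 2.0 * x * x

def df(x):
    return -4.0 * x

def d2f(x):
    return -4.0

def schwarzian(x):
    # Sf = f'''/f' - (3/2) (f''/f')^2, and f''' = 0 for a quadratic map.
    return -1.5 * (d2f(x) / df(x)) ** 2

critical_points = [0.0]  # solutions of f'(x) = 0

def critical_orbit_distance(c, n_iter=50):
    """Smallest distance from f^n(c), 1 <= n <= n_iter, to the critical set."""
    x, best = c, float("inf")
    for _ in range(n_iter):
        x = f(x)
        best = min(best, min(abs(x - q) for q in critical_points))
    return best

# Condition 1: nondegenerate critical points.
nondegenerate = all(d2f(c) != 0.0 for c in critical_points)
# Condition 2: negative Schwarzian derivative (checked at a few sample points only).
neg_schwarzian = all(schwarzian(x) < 0.0 for x in (-0.9, -0.3, 0.2, 0.7))
# Condition 3: only the two fixed points are checked here, not all periodic points.
fixed_points = [-1.0, 0.5]
fixed_repelling = all(f(x) == x and abs(df(x)) > 1.0 for x in fixed_points)
# Condition 4: the critical orbit 0 -> 1 -> -1 -> -1 -> ... stays away from 0.
min_dist = critical_orbit_distance(0.0)
```

Here the critical orbit lands on the repelling fixed point $-1$ after two steps, which is why condition 4 holds for this particular map.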
Observe that for $p \in S^1$ with $\inf_{n \geq 0} d(f^n p, C) > 0$, if $g$ is sufficiently near $f$ in the $C^2$ sense, then there is a unique point $p(g)$ having the same symbolic dynamics with respect to $g$ as $p$ does with respect to $f$. If $\{f_a\}$ is a 1-parameter family through $f$, then for those $a$ for which it makes sense, we will call $p(a) = p(f_a)$ the continuation of $p$. For $x \in C$, we let $x(a)$ denote the corresponding critical point of $f_a$.
Step II. Let $f$ be as in Step I, and let $\{f_a\}$, $a \in [a_0, a_1]$, be a 1-parameter family of maps from $S^1$ to $S^1$ with $f = f_{a^*}$ for some $a^* \in [a_0, a_1]$. We require that $\{f_a\}$ satisfy the following transversality condition^2: For every $x \in C$, if $p = f(x)$, then
$$\frac{d}{da} f_a(x(a)) \neq \frac{d}{da} p(a) \quad \text{at } a = a^*. \tag{1}$$
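To make the transversality condition (1) concrete, here is a small numerical sketch for the quadratic family $f_a(x) = 1 - ax^2$ at $a^* = 2$, an assumed interval example rather than a circle family. The critical point is $x(a) = 0$ for all $a$, and $p = f_2(0) = 1$ is mapped to the repelling fixed point $-1$; its continuation is taken to be the positive preimage of the continued fixed point. The helper names and the finite-difference step are hypothetical choices.

```python
import math

def q(a):
    # Continuation of the fixed point -1 of f_2: negative root of 1 - a q^2 = q.
    return (-1.0 - math.sqrt(1.0 + 4.0 * a)) / (2.0 * a)

def p(a):
    # Continuation of p = f_2(0) = 1: the positive solution of f_a(p) = q(a).
    return math.sqrt((1.0 - q(a)) / a)

def critical_value(a):
    # f_a(x(a)) with x(a) = 0 for every a, so this is constant in a.
    return 1.0 - a * 0.0 ** 2

def central_diff(g, a, h=1e-6):
    # Symmetric finite-difference approximation of g'(a).
    return (g(a + h) - g(a - h)) / (2.0 * h)

a_star = 2.0
d_crit = central_diff(critical_value, a_star)  # derivative of the critical value
d_p = central_diff(p, a_star)                  # derivative of the continuation p(a)
transversal = abs(d_crit - d_p) > 1e-3         # condition (1), numerically
```

A hand computation gives $\frac{d}{da} p(a) = -\frac{1}{3}$ at $a = 2$, while the critical value does not move, so the two derivatives differ and (1) holds for this example.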
Step III. Let $\{f_a\}$ be as in Step II. Identifying $S^1$ with $S^1 \times \{0\} \subset A$, we extend $\{f_a\}$ to a 2-parameter family $\{f_{a,b}\}$, $a \in [a_0, a_1]$, $b \in [0, b_1]$, where $f_{a,b} : S^1 \to A$ is such that $f_{a,0} = f_a$ and $f_{a,b}$ is an embedding for $b > 0$.
Step IV. Finally, we extend $f_{a,b}$ to $T_{a,b} : A \to A$ in such a way that $T_{a,0}(A) \subset S^1 \times \{0\}$ and for $b > 0$, $T_{a,b}$ maps $A$ diffeomorphically onto its image. We further impose the following non-degeneracy condition^3 on the map $T_{a^*,0}$:
$$\partial_y T_{a^*,0}(x, 0) \neq 0 \quad \text{whenever } f'_{a^*}(x) = 0. \tag{2}$$
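As a quick worked instance of condition (2), consider the Hénon family of Corollary 1.3 (an interval rather than circle example, used here only for illustration). Setting $b = 0$ collapses the map onto the $x$-axis, and the $y$-derivative is a nonzero constant vector, so (2) holds at the critical point $x = 0$ and indeed at every $x$:

```latex
\begin{align*}
T_{a,b}(x,y) &= (1 - a x^2 + y,\; b x), \\
T_{a,0}(x,y) &= (1 - a x^2 + y,\; 0), \qquad f_a(x) = 1 - a x^2, \\
\partial_y T_{a^*,0}(x,0) &= (1,\, 0) \neq 0 \quad \text{for all } x,
  \text{ in particular where } f'_{a^*}(x) = -2 a^* x = 0 .
\end{align*}
```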
This completes our construction of admissible families $\{T_{a,b}\}$. We remark that the transversality and non-degeneracy conditions in Steps II and IV are generic. Thinking in terms of normal neighborhoods, one constructs easily for a given $f_{a,b}$ extensions of the type in Step IV; the signs of the $\partial_y$-derivatives at the critical points of $f_{a^*}$ are determined by the orientations of the turns of $f_{a,b}$ at the corresponding points. Step III is feasible if and only if the degree of $f$ is 0, 1 or −1. If $|\deg(f)| > 1$, an extra dimension is needed; this will be treated in a separate paper.
^1 This condition can be dropped but the proofs would be more complicated.
^2 This transversality condition is used in [TTY].
^3 This condition is not assumed in [MV] or [BV]. Their regularity condition on $|\det(DT)|$ and bound on the perturbation term, however, imply a condition which is similar (though not equivalent) to (2) and which serves a similar purpose.
Finally, we observe that for $b > 0$, $T_{a,b}$ has the general form
$$T_{a,b} : \begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} F(x, y, a) + b\,u(x, y, a, b) \\ b\,v(x, y, a, b) \end{pmatrix},$$
where $F(x, y, a) = T_{a,0}(x, y)$ and the $C^3$ norms of $(x, y, a) \mapsto u(x, y, a, b)$ and $v(x, y, a, b)$ are uniformly bounded for all $b \in (0, b_1]$.^4 We may, in fact, replace the $C^4$ assumption at the beginning of Sect. 1.1 by the expression for $T_{a,b}$ above and the requirement of uniformly bounded $C^3$ norms.
Notation. Given $\{T_{a,b}\}$, constants that are determined entirely by the family $\{T_{a,b}\}$ will be referred to as system constants. Except where declared otherwise, the letter $K$ is reserved throughout this article for use as a generic system constant, meaning a system constant that is allowed to change from statement to statement (the other system constants are fixed). We will use $K_1$, $K_2$ etc. where $K$ appears in more than one role in the same statement.
Let $K$ be such that $T_{a,b}(A) \subset R_0 := S^1 \times [-Kb, Kb]$ for all $(a, b)$. It is convenient for us to work with $R_0$ instead of $A$. For $T = T_{a,b}$, let $R_n = T^n R_0$. Then $\{R_n\}$ is a decreasing sequence of neighborhoods of the attractor $\Omega := \cap_{n=0}^{\infty} R_n = \cap_{n=0}^{\infty} T^n A$.
1.2. Critical set and hyperbolic behavior. Our first theorem identifies, for each map $T$ corresponding to a selected set of parameters, a fractal set $\mathcal{C}$ chosen to play the role of the critical set in 1-dimension. This set will be called the critical set of $T$. Our parameter selection imposes strong hyperbolic properties on orbits starting from $\mathcal{C}$ in the hope that these properties will be passed on to the rest of the system. The geometric structure near $\mathcal{C}$, which is described in some detail in Theorem 1.1, is crucial for many of our later results.
For $z_0 \in R_0$, let $z_i = T^i z_0$. If $w_0$ is a tangent vector at $z_0$, let $w_i = DT^i(z_0) w_0$. A curve in $R_0$ is called a $C^2(b)$-curve if the slopes of its tangent vectors are $O(b)$ and its curvature is everywhere $O(b)$.
Theorem 1.1 (Parameter selection and the critical set). Given $\{T_{a,b}\}$ as in Sect. 1.1, there is a positive measure set $\Delta \subset [a_0, a_1] \times (0, b_1]$ such that (1) and (2) below hold for $T = T_{a,b}$ for all $(a, b) \in \Delta$. The set $\Delta$
is located near $a = a^*$ and $b = 0$; it has the property that for all sufficiently small $b$, $\Delta_b := \{a : (a, b) \in \Delta\}$ has positive 1-dimensional Lebesgue measure. The constants $\alpha, \delta, c > 0$ and $0 < \rho < 1$ below are system constants, and $b \ll \alpha, \delta, \rho, e^{-c}$ for all $(a, b) \in \Delta$.
(1) Geometry of critical regions and critical set. There is a Cantor set $\mathcal{C} \subset \Omega$ called the critical set given by $\mathcal{C} = \cap_{k=0}^{\infty} \mathcal{C}^{(k)}$, where the $\mathcal{C}^{(k)}$ are a decreasing sequence of neighborhoods of $\mathcal{C}$ called critical regions. More precisely,
(i) $\mathcal{C}^{(0)} = \{(x, y) \in R_0 : d(x, C) < \delta\}$, where $C$ is the set of critical points of $f$.
^4 This is a calculus exercise: Observe that $bu$ extends to a $C^4$ function $g$ on $\{b \geq 0\}$ with $g|_{\{b = 0\}} = 0$. Writing $\partial^3 = \frac{\partial^3}{\partial z_1 \partial z_2 \partial z_3}$, where $z_i = x$, $y$ or $a$, we then check that $\partial^3 u$ extends to a continuous function $h$ on $\{b \geq 0\}$ with $h = \frac{\partial}{\partial b} \partial^3 g$ on $\{b = 0\}$.
Fig. 1. Critical regions (labels in the figure: $Q^{(k)}$, $Q^{(k-1)}$, $R_k$)
(ii) $\mathcal{C}^{(k)}$ has a finite number of components called $Q^{(k)}$ each one of which is diffeomorphic to a rectangle. The boundary of $Q^{(k)}$ is made up of two $C^2(b)$ segments of $\partial R_k$ connected by two vertical lines: the horizontal boundaries are $\approx \min(2\delta, \rho^k)$ in length, and the Hausdorff distance between them is $O(b^{k/2})$.
(iii) $\mathcal{C}^{(k)}$ is related to $\mathcal{C}^{(k-1)}$ as follows: $Q^{(k-1)} \cap R_k$ has at most finitely many components, each one of which lies between two $C^2(b)$ subsegments of $\partial R_k$ that stretch across $Q^{(k-1)}$ as shown. Each component of $Q^{(k-1)} \cap R_k$ contains exactly one component of $\mathcal{C}^{(k)}$.
(2) Properties of critical orbits. On each horizontal boundary $\gamma$ of each component $Q^{(k)}$ of $\mathcal{C}^{(k)}$, $k = 0, 1, 2, \dots$, there is a unique point $z_0$ characterized by the following two properties:
(i) $\|DT^j(z_0) \binom{0}{1}\| \geq K^{-1} e^{cj}$ for all $j > 0$.
(ii) If $\tau$ is a unit tangent vector to $\gamma$ at $z_0$, then $\|DT^n(z_0)\tau\| < (Kb)^n$ for all $n > 0$.
The point $z_0$ is located within $O(b^{k/4})$ of the midpoint of $\gamma$. Let $\Gamma$ be the set of all of these points, and let $d_{\mathcal{C}}(\cdot)$ be the notion of “distance to the critical set” defined below. Then $z_0 \in \Gamma$ also satisfies
(iii) $d_{\mathcal{C}}(z_j) \geq K^{-1} e^{-\alpha j}$ for all $j > 0$.
Finally, since the critical set $\mathcal{C}$ is the accumulation set of $\Gamma$, properties (i) and (iii) of $\Gamma$ are passed on to $\mathcal{C}$.
For $z \in R_0$, $d_{\mathcal{C}}(z)$ is defined as follows: For $z \in \mathcal{C}^{(0)} \setminus \mathcal{C}$, let $k$ be the largest number with $z \in \mathcal{C}^{(k)}$, and let $d_{\mathcal{C}}(z)$ be the horizontal distance between $z$ and the midpoint of the component of $\mathcal{C}^{(k)}$ containing $z$; for $z \notin \mathcal{C}^{(0)}$, use the component of $\mathcal{C}^{(0)}$ nearest to $z$.
Theorems 1.2–1.7 apply to $T = T_{a,b}$, $(a, b) \in \Delta$, where $\Delta$ is as in Theorem 1.1. Our next theorem is about the abundance of hyperbolic behavior on the attractor and in the basin. A compact $T$-invariant set $\Lambda$ is called uniformly hyperbolic if there is a splitting of the tangent bundle over $\Lambda$ into invariant subbundles $E^u \oplus E^s$ such that for some $C, \lambda > 1$, we have, for all $n \geq 1$, $\|DT^n v\| \leq C\lambda^{-n}\|v\|$ for all $v \in E^s$ and $\|DT^{-n} v\| \leq C\lambda^{-n}\|v\|$ for all $v \in E^u$.
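Both property (2)(i) above and the definition of uniform hyperbolicity concern exponential growth rates of $\|DT^n v\|$. The sketch below shows how one might estimate such a rate numerically for a Hénon map by pushing the vector $(0, 1)$ forward along an orbit and renormalizing. The parameter values $a = 1.4$, $b = 0.3$, the starting point, and the iteration counts are assumptions chosen for illustration; nothing here indicates that these parameters belong to the set of good parameters of Theorem 1.1, and the computed number is an empirical estimate, not the constant $c$.

```python
import math

def henon(x, y, a, b):
    # One step of the Henon map (x, y) -> (1 - a x^2 + y, b x).
    return 1.0 - a * x * x + y, b * x

def henon_tangent(x, y, vx, vy, a, b):
    # Apply the Jacobian DT(x, y) = [[-2 a x, 1], [b, 0]] to the vector (vx, vy).
    return -2.0 * a * x * vx + vy, b * vx

def growth_rate(a=1.4, b=0.3, n_transient=1000, n_steps=20000):
    """Average of log ||DT v|| / ||v|| along one orbit, after a transient."""
    x, y = 0.0, 0.0
    for _ in range(n_transient):
        x, y = henon(x, y, a, b)
    vx, vy = 0.0, 1.0
    total = 0.0
    for _ in range(n_steps):
        vx, vy = henon_tangent(x, y, vx, vy, a, b)
        x, y = henon(x, y, a, b)
        norm = math.hypot(vx, vy)
        total += math.log(norm)
        vx, vy = vx / norm, vy / norm  # renormalize to avoid overflow
    return total / n_steps

rate = growth_rate()
```

Renormalizing at every step keeps the tangent vector at unit length, so the sum of the logarithms of the stretching factors equals $\log \|DT^n(z_0) v\|$ without overflow.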
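Theorem 1.2 (Hyperbolic behavior). (1) Let
$$\Omega_\varepsilon := \{z_0 \in \Omega : d_{\mathcal{C}}(z_n) \geq \varepsilon \ \ \forall n \in \mathbb{Z}\}.$$
(i) For every $\varepsilon > 0$, $\Omega_\varepsilon$ is uniformly hyperbolic. In fact, independent of $\varepsilon$, $\lambda$ in the definition of hyperbolicity can be taken to be $\approx e^{c/3}$, where $c$ is as in Theorem 1.1. In particular, for every periodic point $z \in \Omega$ with $T^q z = z$, $\|DT^q|_{E^u(z)}\| \geq K^{-1} e^{\frac{c}{3} q}$.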
(ii) As $\varepsilon \to 0$, the hyperbolicity on $\Omega_\varepsilon$ deteriorates in the sense that $C \to \infty$ and the minimum angle between $E^u$ and $E^s$ tends to zero.
(iii) $\Omega = \overline{\cup_{\varepsilon > 0} \Omega_\varepsilon}$ provided a surjective condition of the type (*) below is assumed.
(2) Under the regularity conditions (**) below, we have
$$\limsup_{n \to \infty} \frac{1}{n} \log \|DT^n(z_0)\| \geq \frac{c}{3}$$
for Lebesgue-almost every $z_0 \in R_0$.
The two technical conditions used in Parts (1)(iii) and (2) of Theorem 1.2 are:
(*) Let $J_1, \dots, J_r$ be the intervals of monotonicity of $f$. Then for each $i$, there exists $j$ such that $f(J_j) \supset J_i$.
(**) There exist $K_1, K_2 > 0$ such that for all $z \in R_0$,
$$K_1^{-1} b \leq |\det(DT_{a,b}(z))| \leq K_2 b.$$
We remark that Theorem 1.2(1) confirms that $\mathcal{C}$ is the sole source of nonhyperbolicity in the system. Part (2) expresses the fact that many orbits experience at least some form of (nonuniform) hyperbolicity. A more detailed discussion is given in Sect. 7.
1.3. SRB measures and their statistical properties.
Definition 1.1. Let $g : M \to M$ be a diffeomorphism of a manifold. A $g$-invariant Borel probability measure $\mu$ is called an SRB measure if $g$ has a positive Lyapunov exponent $\mu$-a.e. and the conditional measures of $\mu$ on unstable manifolds are absolutely continuous with respect to the Riemannian measure on these manifolds.
In the absence of zero Lyapunov exponents, it follows from general hyperbolic theory that an SRB measure has at most a countable number of ergodic components, and that each ergodic component has a positive measure set of generic points. A point $z$ is said to be generic with respect to $\mu$ if for every continuous function $\varphi$, $\frac{1}{n} \sum_{i=0}^{n-1} \varphi(g^i z) \to \int \varphi \, d\mu$ as $n \to \infty$. See [Led] and [PS].
Theorem 1.3 (Existence and ergodic properties of SRB measures). (1) $T$ admits an SRB measure.
Assuming condition (**) above, we have the following additional information:
(2) $T$ admits at most $r$ ergodic SRB measures $\mu_i$, where $r$ is the cardinality of the critical set of the 1-dimensional map $f$.
(3) Lebesgue-a.e. $z_0 \in R_0$ is generic with respect to some $\mu_i$; in fact, Lebesgue-a.e. $z_0 \in R_0$ lies in the stable manifold of a $\mu_i$-typical point in $\Omega$.
We know from general hyperbolic theory that without zero Lyapunov exponents, ergodic components of SRB measures are, up to finite factors, mixing [Led].
Theorem 1.4 (Decay of correlations and central limit theorem). Let $\mu$ be an ergodic SRB measure, which, by taking a power of $T$ if necessary, we assume to be mixing. Then
(1) for each $\eta \in (0, 1]$, there exists $\lambda = \lambda(\eta) < 1$ such that if $\psi : A \to \mathbb{R}$ is Hölder continuous with exponent $\eta$ and $\varphi \in L^\infty(\mu)$, then there exists $K(\varphi, \psi)$ such that
$$\left| \int (\varphi \circ T^n)\psi \, d\mu - \int \varphi \, d\mu \int \psi \, d\mu \right| < K(\varphi, \psi)\lambda^n \quad \text{for all } n;$$
(2) the Central Limit Theorem holds for all Hölder $\varphi$ with $\int \varphi \, d\mu = 0$, i.e.
$$\frac{1}{\sqrt{n}} \sum_{i=0}^{n-1} \varphi \circ T^i \to N(0, \sigma),$$
where $N(0, \sigma)$ is the normal distribution with variance $\sigma^2$; furthermore, $\sigma > 0$ if and only if $\varphi \circ T \neq \psi \circ T - \psi$ for any $\psi$.
We remark that the word “attractor” has different meanings in the literature (see [Mil] for a discussion). In this article, it is convenient for us to refer to $\Omega$ as “the attractor”. Theorem 1.3 suggests, however, that from a measure-theoretic point of view, it may be more appropriate to regard the supports of the $\mu_i$ as attractors.
1.4. Global geometry, symbolic dynamics and topological entropy. A monotone branch of $R_n$ is a region diffeomorphic to a rectangle and bordered by two subsegments of $\partial R_n$. Roughly speaking, it is the largest domain of this kind with the property that for $0 \leq i \leq n$, the $x$-coordinates of its $T^{-i}$-image stay inside some interval of monotonicity of $f$, where $f$ is the initial 1-dimensional map from which $\{T_{a,b}\}$ is built. This notion is made precise in Sect. 9, where a combinatorial tree is introduced to describe the structure of a natural class of monotone branches.
Theorem 1.5 (Coarse geometry of attractor). There is a sequence of neighborhoods $\tilde{R}_n$ of $\Omega$ with $\tilde{R}_1 \supset \tilde{R}_2 \supset \tilde{R}_3 \supset \cdots$ and $\cap_i \tilde{R}_i = \Omega$ such that each $\tilde{R}_n$ is the union of a finite number of monotone branches of $R_k$, $n \leq k \leq n(1 + K\theta)$, where $\theta \sim |\log b|^{-1}$.
Let $\{1, 2, \dots, k\}$ be a finite alphabet and let $\Sigma_k$ be the set of all bi-infinite sequences $s = (\cdots, s_{-1}, s_0, s_1, \cdots)$ with $s_i \in \{1, 2, \dots, k\}$. The shift operator $\sigma : \Sigma_k \to \Sigma_k$ is defined by $(\sigma s)_i = (s)_{i+1}$. For $\Sigma \subset \Sigma_k$, we call $\sigma|_\Sigma : \Sigma \to \Sigma$ a subshift of the full shift on $k$ symbols if $\Sigma$ is a closed $\sigma$-invariant subset of $\Sigma_k$.
Let $x_1 < x_2 < \cdots < x_r < x_{r+1} = x_1$ be the critical points of $f$. Let $\mathcal{C}_i^{(0)}$ be the component of $\mathcal{C}^{(0)}$ containing $x_i$ and let $\mathcal{C}_i = \mathcal{C} \cap \mathcal{C}_i^{(0)}$. We remark that each $\mathcal{C}_i$ is a fractal set (it is not contained in any smooth curve) and that a priori there is no well defined notion of whether a point lies to the left or to the right of $\mathcal{C}_i$.
Theorem 1.6 (Coding of orbits on attractor). (1) The critical set $\mathcal{C}$ partitions $\Omega \setminus \mathcal{C}$ into disjoint sets $A_1, A_2, \dots, A_r$ so that $z \in A_i$ has the interpretation of being “to the right” of $\mathcal{C}_i$ and “to the left” of $\mathcal{C}_{i+1}$.
(2) There is a subshift $\sigma : \Sigma \to \Sigma$ of a full shift on finitely many symbols and a continuous surjection $\pi : \Sigma \to \Omega$ such that $T \circ \pi = \pi \circ \sigma$; $\pi$ is 1-1 except on $\cup_{i=-\infty}^{\infty} T^i \mathcal{C}$, where it is 2-1.
(3) Under the additional assumption that $f[x_j, x_{j+1}] \not\supset S^1$ for any $j$, the coding in (2) is given by (1), i.e. for all $z_0 \in \Omega \setminus \cup_{i=-\infty}^{\infty} T^i \mathcal{C}$, $\pi^{-1}(z_0)$ is the unique sequence $(s_i)_{i=-\infty}^{\infty}$ with $z_i \in A_{s_i}$.
Corollary 1.1 (Kneading sequences for critical points). For every $z_0 \in C$, the itinerary of $\{z_1, z_2, \cdots\}$ is uniquely represented by a sequence in $\Sigma$.
Another consequence of Theorem 1.6 is the existence of equilibrium states. For a continuous map $g : X \to X$ of a compact metric space and a continuous function $\varphi : X \to \mathbb{R}$, a $g$-invariant Borel probability measure $\mu$ on $X$ is called an equilibrium state for $g$ with respect to the potential $\varphi$ if $\mu$ attains the supremum $\sup_\nu \{ h_\nu(g) + \int \varphi \, d\nu \}$, where $h_\nu(g)$ denotes the metric entropy of $g$ with respect to $\nu$ and the supremum is taken over all $g$-invariant Borel probability measures $\nu$.
Corollary 1.2 (Existence of equilibrium states). $T$ has an equilibrium state for every continuous $\varphi : \Omega \to \mathbb{R}$. In particular, $T$ admits an invariant Borel probability measure maximizing entropy.
The topological entropy of $g$, written $h_{top}(g)$, is usually defined in terms of open covers of arbitrarily small diameters or in terms of $(n, \varepsilon)$-spanning or separated sets. For precise definitions, see [Wa]. For the class of attractors studied in this paper, $h_{top}(g)$ can be computed in more concrete ways. In Theorem 1.6 we saw that every $z_0 \in \Omega$ can be unambiguously associated with one (and occasionally two) symbol sequences in $\Sigma$ determined by the locations of its iterates with respect to the components of the critical set. We will show in Sect. 10 that in like manner all the points in $R_0$ can be assigned symbol sequences – except that this assignment is not unique. Let us temporarily refer to this as the “fuzzy” coding on $R_0$. Let
$N_n$ = number of distinct $n$-blocks in the coding of $\Omega$;
$\tilde N_n$ = number of distinct $n$-blocks in the “fuzzy” coding of $R_0$;
$P_n$ = number of fixed points of $T^n$;
$M_n^{\pm}$ = number of monotone segments in $\partial R_n^{\pm}$, the two boundary components of $R_n$ (see Sect. 9.1 for the precise definition).
Theorem 1.7 (Formulas and inequalities for topological entropy).
(i) $$h_{top}(T) = \lim_{n\to\infty} \frac{1}{n} \log N_n = \lim_{n\to\infty} \frac{1}{n} \log \tilde N_n = \lim_{n\to\infty} \frac{1}{n} \log P_n .$$
(ii) $$\limsup_{n\to\infty} \frac{1}{n} \log M_n^{\pm} \;\le\; h_{top}(T) \;\le\; \liminf_{n\to\infty} \frac{1}{n} \log M_n^{\pm} \left( 1 + \frac{K}{\log \frac{1}{b}} \right).$$
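For a one-dimensional point of comparison with the counts in Theorem 1.7, the following minimal sketch estimates $\frac{1}{n}\log(\text{number of monotone pieces of } f^n)$ for the quadratic map $f(x) = 1 - 2x^2$ on $[-1,1]$. The choice of map, the grid resolution and all function names are illustrative assumptions and are not taken from the paper; the count is a finite-grid approximation, not a proof.

```python
import math


def iterate_quadratic(x, n, a=2.0):
    """Return f^n(x) for f(x) = 1 - a*x^2."""
    for _ in range(n):
        x = 1.0 - a * x * x
    return x


def count_laps(n, a=2.0, half_grid=10000):
    """Count monotone pieces of f^n on [-1, 1] via sign changes on a uniform grid."""
    # Symmetric grid that contains the critical point x = 0 exactly.
    xs = [(i - half_grid) / half_grid for i in range(2 * half_grid + 1)]
    ys = [iterate_quadratic(x, n, a) for x in xs]
    laps, prev_sign = 1, 0
    for y0, y1 in zip(ys, ys[1:]):
        diff = y1 - y0
        if diff == 0.0:  # a flat step carries no direction information
            continue
        sign = 1 if diff > 0 else -1
        if prev_sign and sign != prev_sign:
            laps += 1
        prev_sign = sign
    return laps


def lap_growth_rate(n, a=2.0):
    """(1/n) * log(lap count): a finite-n stand-in for entropy in dimension one."""
    return math.log(count_laps(n, a)) / n


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, count_laps(n), round(lap_growth_rate(n), 4))
```

The grid must be fine compared with the shortest monotone piece of $f^n$, so the default resolution is only meant for small $n$.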
For a 1-dimensional piecewise monotonic map $g$, it is a well known fact that $h_{top}(g)$ is the growth rate of the number of intervals on which $g^n$ is monotonic [MS]. The factor $1 + \frac{K}{\log \frac{1}{b}}$ gives, in a sense, the potential defect in measuring the complexity of $T$ via the 1-dimensional curves $\partial R_0^{\pm}$.
1.5. Hénon maps and homoclinic bifurcations. Theorems 1.1–1.7 are stated for attractors that arise from perturbations of circle maps. We state here, for the record, the corresponding results for interval maps and some of their applications. Reduction to the circle case is carried out in Appendix A.1.
Theorem 1.8 (Attractors arising from interval maps). Let $I$ be a closed interval of finite length, and let $f : I \to I$ be a Misiurewicz map with $f(I) \subset \mathrm{int}(I)$. Let $U$ be a neighborhood of $I \times \{0\}$ in $\mathbb{R}^2$, and let $\{T_{a,b}\}$ be a 2-parameter family of maps with $T_{a,b} : U \to \mathbb{R}^2$. We identify $I$ with $I \times \{0\} \subset \mathbb{R}^2$, and assume that $\{T_{a,b}\}$ satisfies the conditions in Steps II, III and IV in Sect. 1.1 with $f_{a^*} = f$. Then
(i) there exist $K > 0$ and a rectangle $\hat\Delta = [a_0, a_1] \times (0, b_1]$ arbitrarily near $(a^*, 0)$ such that for each $(a, b) \in \hat\Delta$, $T_{a,b}$ maps $R := I \times [-Kb, Kb]$ strictly into its interior, defining an attractor $\Omega := \cap_{n \ge 0} T_{a,b}^n R$;
(ii) there is a positive measure set $\Delta \subset \hat\Delta$ such that the conclusions of Theorems 1.1–1.7 hold for $T = T_{a,b}|_R$ for all $(a, b) \in \Delta$.
Corollary 1.3 (The Hénon family). Let
$$T_{a,b} : (x, y) \mapsto (1 - ax^2 + y, \; bx), \qquad (x, y) \in \mathbb{R}^2 .$$
Then for every $a^* \in [1.5, 2]$ for which $f_{a^*} : x \mapsto 1 - a^* x^2$ is a Misiurewicz map, the conclusions of Theorem 1.8 hold. In particular, there is a positive measure set $\Delta$ near $(a^*, 0)$ such that the conclusions of Theorems 1.1–1.7 hold for all $T = T_{a,b}$, $(a, b) \in \Delta$. These results are valid for both $b > 0$ and $b < 0$.
When specialized to $a^* = 2$ and $b > 0$, the part of Corollary 1.3 that corresponds to Theorem 1.1, part (2), in this paper is a version of the main result of [BC2]. The results in [BY1, BV], and [BY2] are respectively the parts of Corollary 1.3 that correspond to Theorem 1.3(1),(2), Theorem 1.3(3) and Theorem 1.4.
Our last result concerns the application of Theorems 1.1–1.7 to homoclinic bifurcations. Let $g_\mu$, $\mu \in [0, 1]$, be a $C^\infty$ one-parameter family of surface diffeomorphisms unfolding at $\mu = 0$ a nondegenerate tangency of $W^u(p_0)$ and $W^s(p_0)$, where $p_0$ is a hyperbolic fixed point. We assume that the eigenvalues $\lambda$ and $\sigma$ of $Dg_0$ at $p_0$ satisfy $0 < \lambda < 1 < \sigma$ and $\lambda\sigma < 1$, and that they belong to the open and dense set of eigenvalue pairs that meet the hypotheses of Sternberg’s linearization theorem. Under these conditions, it is well known (see [PT]) that for all sufficiently large $k$, there is a positive measure set of parameters $\hat\Delta_k$ such that for all $\mu \in \hat\Delta_k$, $g_\mu$ has a $k$-periodic attractor $\Omega_\mu$ all but finitely many of whose periodic components are located near the fixed point $p_\mu$.
Theorem 1.9 (Attractors arising from homoclinic bifurcations). Let $g_\mu$ be as above. Then for all sufficiently large $k$, there is a positive measure set of parameters $\Delta_k \subset \hat\Delta_k$ for which the following holds: for all $\mu \in \Delta_k$, there is a component $\Omega_\mu^0$ of $\Omega_\mu$ with the property that if $T_\mu$ denotes the restriction of $g_\mu^k$ to a neighborhood of $\Omega_\mu^0$, then the conclusions of Theorems 1.1–1.7 hold for $T = T_\mu$.
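As a concrete handle on the Hénon family of Corollary 1.3, the sketch below iterates $T_{a,b}(x,y) = (1 - ax^2 + y, bx)$ and estimates the largest Lyapunov exponent by pushing a tangent vector forward with $DT_{a,b}$. The parameter values $a = 1.4$, $b = 0.3$ used in the demonstration are the classical numerical ones and are an assumption made for illustration only: they are not claimed to lie in the parameter set $\Delta$, which concerns small $|b|$ and $a$ near a Misiurewicz value $a^*$. Function names are hypothetical.

```python
import math


def henon(x, y, a, b):
    """One step of T_{a,b}(x, y) = (1 - a x^2 + y, b x)."""
    return 1.0 - a * x * x + y, b * x


def henon_jacobian(x, y, a, b):
    """Jacobian DT_{a,b} at (x, y); its determinant equals -b at every point."""
    return ((-2.0 * a * x, 1.0), (b, 0.0))


def lyapunov_estimate(a, b, n_iter=20000, n_transient=1000, start=(0.0, 0.0)):
    """Finite-time estimate of the largest Lyapunov exponent along one orbit."""
    x, y = start
    for _ in range(n_transient):  # discard the approach to the attractor
        x, y = henon(x, y, a, b)
    vx, vy = 1.0, 0.0
    total = 0.0
    for _ in range(n_iter):
        (m11, m12), (m21, m22) = henon_jacobian(x, y, a, b)
        vx, vy = m11 * vx + m12 * vy, m21 * vx + m22 * vy
        norm = math.hypot(vx, vy)
        total += math.log(norm)  # accumulate the log of the one-step growth
        vx, vy = vx / norm, vy / norm  # renormalize to avoid overflow
        x, y = henon(x, y, a, b)
    return total / n_iter


if __name__ == "__main__":
    print(lyapunov_estimate(1.4, 0.3))
```

A positive output is numerical evidence only; the theorems above are what establish positive exponents, and only for parameters in $\Delta$.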
Our proof of Theorem 1.9, which is given in Appendix A.2, consists of observing that the maps $g_\mu^k$ meet the conditions of Theorem 1.8. The part of Theorem 1.9 that corresponds to Theorem 1.1, part (2), in this paper is the main result of [MV].
2. Preliminaries
We gather in this section a collection of technical facts used repeatedly in later sections. Most of the proofs are given in Appendix B. Sections 2.1–2.4 contain material not specific to the family $\{T_{a,b}\}$, and $K$ is not a “system constant” in these subsections.
2.1. Linear algebra. Let $M$ be a $2 \times 2$ matrix. Assuming that $M$ is not a scalar multiple of an orthogonal matrix, we say that a unit vector $e$ defines the most contracted direction of $M$ if $\|Mu\| \ge \|Me\|$ for all unit vectors $u$. For a sequence of matrices $M_1, M_2, \cdots$, we use $M^{(i)}$ to denote the matrix product $M_i \cdots M_2 M_1$ and $e_i$ to denote the most contracted direction of $M^{(i)}$ when it makes sense.
Hypotheses for Sect. 2.1. The $M_i$ are $2 \times 2$ matrices; they satisfy $|\det(M_i)| \le b$ and $\|M_i\| \le K_0$, where $K_0$ and $b$ are fixed numbers with $K_0 > 1$ and $b \ll 1$.
Lemma 2.1. There exists $K$ depending only on $K_0$ such that if $\|M^{(i)}\| \ge \kappa^i$ and $\|M^{(i-1)}\| \ge \kappa^{i-1}$ for some $\kappa \gg \sqrt{b}$, then $e_i$ and $e_{i-1}$ are well-defined, and
$$\|e_i \times e_{i-1}\| \le \left( \frac{Kb}{\kappa^2} \right)^{i-1} .$$
Corollary 2.1. If for $1 \le i \le n$, $\|M^{(i)}\| \ge \kappa^i$ for some $\kappa \gg \sqrt{b}$, then:
(a) $\|e_n - e_1\| < \frac{Kb}{\kappa^2}$;
(b) $\|M^{(i)} e_n\| \le \left( \frac{Kb}{\kappa^2} \right)^i$ for $1 \le i \le n$.
Proof. (a) follows immediately from Lemma 2.1. For (b), since $\|e_n - e_i\| \le \left( \frac{Kb}{\kappa^2} \right)^i$, we have
$$\|M^{(i)} e_n\| \le \|M^{(i)}(e_n - e_i)\| + \|M^{(i)} e_i\| < K_0^i \cdot \left( \frac{Kb}{\kappa^2} \right)^i + \left( \frac{b}{\kappa} \right)^i . \qquad \square$$
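The most contracted direction of Sect. 2.1 can be computed in closed form for a $2 \times 2$ matrix: it is the unit vector minimizing the quadratic form $u \mapsto \|Mu\|^2$. The sketch below does this and tracks $e_i$ along a product $M^{(i)} = M_i \cdots M_1$. It is an illustration under stated assumptions: the function names and the matrix representation as nested tuples are hypothetical, and the sample matrices are arbitrary choices with small determinant.

```python
import math


def mat_mul(p, q):
    """Product p*q of 2x2 matrices given as ((a, b), (c, d))."""
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def mat_vec(m, v):
    """Apply the 2x2 matrix m to the vector v."""
    (a, b), (c, d) = m
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])


def most_contracted_direction(m):
    """Unit vector e minimizing |m e|; it is defined only up to sign."""
    (a, b), (c, d) = m
    # Entries of the symmetric matrix m^T m = [[p, q], [q, r]].
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    if q == 0.0 and p == r:
        raise ValueError("scalar multiple of an orthogonal matrix: direction undefined")
    # |m (cos t, sin t)|^2 = (p + r)/2 + ((p - r)/2) cos 2t + q sin 2t, which is
    # smallest when (cos 2t, sin 2t) points opposite to ((p - r)/2, q).
    t = 0.5 * math.atan2(-2.0 * q, r - p)
    return (math.cos(t), math.sin(t))


def contracted_directions(matrices):
    """Return [e_1, ..., e_n] for the products M^(i) = M_i ... M_1."""
    product = ((1.0, 0.0), (0.0, 1.0))
    directions = []
    for m in matrices:
        product = mat_mul(m, product)
        directions.append(most_contracted_direction(product))
    return directions
```

For matrices with determinant of size $b$ and norm of order 1, consecutive directions returned by `contracted_directions` differ by a small amount, in line with the qualitative content of Lemma 2.1.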
Next we consider for each $i$ a 3-parameter family of matrices $M_i(s_1, s_2, s_3)$. For the purpose of the next corollary we make the additional assumptions that for $0 < j \le 3$, $\|\partial^j M_i(s_1, s_2, s_3)\| \le K_0^i$ and $|\partial^j \det(M_i(s_1, s_2, s_3))| < K_0^i b$, where $\partial^j$ represents any one of the partial derivatives of order $j$ with respect to $s_1$, $s_2$ or $s_3$. Let $\theta_i(s_1, s_2, s_3)$ denote the angle $e_i(s_1, s_2, s_3)$ makes with the positive $x$-axis, assuming it makes sense.
Corollary 2.2. Suppose that for some $\kappa \gg \sqrt{b}$, $\|M^{(i)}(s_1, s_2, s_3)\| \ge \kappa^i$ for every $(s_1, s_2, s_3)$ and for every $1 \le i \le n$. Then for $j = 1, 2, 3$, $|\partial^j \theta_1| \le K \kappa^{-(1+j)}$, and for $i \le n$,
$$|\partial^j (\theta_i - \theta_{i-1})| < \left( \frac{Kb}{\kappa^{(2+j)}} \right)^{i-1}, \tag{3}$$
$$\|\partial^j M^{(i)} e_n\| < \left( \frac{Kb}{\kappa^{(2+j)}} \right)^{i}. \tag{4}$$
Our next lemma is a perturbation result. Let $M_i$, $M_i'$ be two sequences of matrices, let $w$ be a vector, and let $\theta_i$ and $\theta_i'$ denote the angles $M^{(i)} w$ and $M'^{(i)} w$ make with the positive $x$-axis respectively.
Lemma 2.2 ([BC2], Lemma 5.5). Let $\kappa, \lambda$ be such that $\frac{Kb}{\kappa^2} < \lambda < K_0^{-12} \kappa^8$. If for $1 \le i \le n$, $\|M_i - M_i'\| \le \lambda^i$ and $\|M^{(i)} w\| \ge \kappa^i$, then
(a) $\|M'^{(n)} w\| \ge \frac{1}{2} \kappa^n$;
(b) $|\theta_n - \theta_n'| < \lambda^{\frac{n}{4}}$.
Proofs of Lemmas 2.1, 2.2 and Corollary 2.2 are given in Appendix B.1.
Hypothesis for Sects. 2.2 and 2.3. $T : A \to A$ is an embedding of the form $T(x, y) = (t_1(x, y), b\, t_2(x, y))$, where the $C^2$-norms of $t_1$ and $t_2$ are $\le K_0$, and $K_0 > 1$ and $b \ll 1$ are fixed numbers.
2.2. Stable curves.
Lemma 2.3. Let $\kappa, \lambda$ be as in Lemma 2.2 and $z_0 \in A$ be such that for $i = 1, \cdots, n$, $\|DT^i(z_0)\| \ge \kappa^i$. Then there is a $C^1$ curve $\gamma_n$ passing through $z_0$ such that
(a) for all $z \in \gamma_n$, $d(T^i z_0, T^i z) \le \left( \frac{Kb}{\kappa^2} \right)^i$ for all $i \le n$;
(b) $\gamma_n$ can be extended to a curve of length $\sim \lambda$ or until it meets $\partial A$.
A proof of this lemma is given in Appendix B.2. We call $\gamma_n$ a stable curve of order $n$. It will follow from this lemma that if $\|DT^i(z_0)\| \ge \kappa^i$ for all $i > 0$, then there is a stable curve $\gamma_\infty$ passing through $z_0$ obtained as a limit of the $\gamma_n$’s.
2.3. Curvature estimates. Let $\gamma_0 : [0, 1] \to A$ be a $C^2$ curve, and let $\gamma_i(s) = T^i(\gamma_0(s))$. We denote the curvature of $\gamma_i$ at $\gamma_i(s)$ by $k_i(s)$.
Lemma 2.4. Let $\kappa > b^{\frac{1}{3}}$. We assume that for every $s$, $k_0(s) \le 1$ and
$$\|DT^j(\gamma_{n-j}(s)) \gamma_{n-j}'(s)\| \ge \kappa^j \|\gamma_{n-j}'(s)\|$$
for every $j < n$. Then
$$k_n(s) \le \frac{Kb}{\kappa^3} .$$
A proof is given in Appendix B.3.
2.4. One-dimensional dynamics. We begin with some properties of maps satisfying the Misiurewicz condition. Let $f$ be as in Sect. 1.1, and let $C_\delta := \{x \in S^1 : d(x, C) < \delta\}$.
Lemma 2.5. There exist $\hat c_0, \hat c_1 > 0$ such that the following hold for all sufficiently small $\delta > 0$: Let $x \in S^1$ be such that $x, fx, \cdots, f^{n-1}x \notin C_\delta$, any $n$. Then
(i) $|(f^n)'x| \ge \hat c_0 \delta e^{\hat c_1 n}$;
(ii) if, in addition, $f^n x \in C_\delta$, then $|(f^n)'x| \ge \hat c_0 e^{\hat c_1 n}$.
A proof is given in Appendix B.4.
Corollary 2.3. Let $c_0 < \hat c_0$ and $c_1 < \hat c_1$. Then for all sufficiently small $\delta$, there exists $\varepsilon = \varepsilon(\delta)$ such that for all $g$ with $\|g - f\|_{C^2} < \varepsilon$, (i) and (ii) above hold for $g$ with $c_0$ and $c_1$ in the places of $\hat c_0$ and $\hat c_1$.
Proof. Let $N$ be such that $\delta e^{\hat c_1 N} > e^{c_1 N}$, and choose $\varepsilon$ small enough so that for all $i \le N$, if $x, gx, \cdots, g^{i-1}x \notin C_\delta(g)$, then $(g^i)'x \approx (f^i)'x$. $\square$
The results in the rest of this subsection are not needed in this article. We include them only as motivation for the corresponding results in 2-dimensions. Temporarily write $C = C(g)$. To control $(g^n)'x$ when $g^i x \in C_\delta$ for some $i < n$, we need to impose further conditions on $g$. Following [BC1] and [BC2], we assume there exist $\lambda > 1$ and $0 < \alpha \ll 1$ such that for all $\hat x \in C$ and $n \ge 0$:
(a) $d(g^n \hat x, C) \ge c_0 e^{-\alpha n}$ and
(b) $|(g^n)'(g \hat x)| \ge c_0 \lambda^n$.
We define for each $x \in C_\delta$ a bound period $p(x)$ as follows. Fix $\beta > \alpha$. Let $\hat x \in C$ be such that $|x - \hat x| < \delta$. Then $p(x)$ is the smallest $p$ such that $|g^p x - g^p \hat x| > c_0 e^{-\beta p}$.
Lemma 2.6 (Derivative recovery). There exists $K$ such that for $g$ satisfying the conditions above, if $|x - \hat x| = e^{-\mu} < \delta$ for some $\hat x \in C$, then
(i) $K^{-1} \mu \le p(x) \le K \mu$;
(ii) $K^{-1} (x - \hat x)^2 |(g^{i-1})'(g \hat x)| < |g^i x - g^i \hat x| < K (x - \hat x)^2 |(g^{i-1})'(g \hat x)|$;
(iii) $|(g^p)'x| \ge K^{-1} \lambda^{\frac{p}{2}}$, where $p = p(x)$.
Proof. For this result there is no substantive difference between the situation here and that of the quadratic family $x \mapsto 1 - ax^2$. See [BC1] and [BC2], Sect. 2. $\square$
Standing hypotheses for the rest of the paper. $\{T_{a,b}\}$ is as in Sect. 1.1. In particular, it has the form
$$T_{a,b}(x, y) = (F_a(x, y) + b\, u_{a,b}(x, y), \; b\, v_{a,b}(x, y)).$$
Where no ambiguity arises, we will write $T = T_{a,b}$. The phrase “for $(a, b)$ sufficiently near $(a^*, 0)$” will appear (finitely) many times in the next few sections. Each time it appears, the rectangle in parameter space for which our results apply may have to be reduced. From here on $K$ is the generic system constant as declared in Sect. 1.
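Returning briefly to the one-dimensional bound period $p(x)$ of Sect. 2.4: the sketch below computes it for the quadratic map $g(x) = 1 - 2x^2$, whose critical orbit $0 \mapsto 1 \mapsto -1 \mapsto -1$ is preperiodic. The values $a = 2$, $c_0 = 0.5$, $\beta = 0.1$ and the function name are assumptions chosen for illustration; the paper does not fix them. For $x = e^{-\mu}$ the computed periods grow roughly linearly in $\mu$, which is consistent with Lemma 2.6(i) but of course does not prove it.

```python
import math


def bound_period(x, a=2.0, c0=0.5, beta=0.1, x_crit=0.0, max_iter=200):
    """Smallest p >= 1 with |g^p(x) - g^p(x_crit)| > c0 * exp(-beta * p).

    Here g(x) = 1 - a*x^2 and x_crit is its critical point. Returns None if the
    two orbits have not separated within max_iter steps.
    """
    u, v = x, x_crit
    for p in range(1, max_iter + 1):
        u = 1.0 - a * u * u  # orbit of x
        v = 1.0 - a * v * v  # orbit of the critical point
        if abs(u - v) > c0 * math.exp(-beta * p):
            return p
    return None


if __name__ == "__main__":
    for mu in (2, 4, 6, 8, 10, 12):
        print(mu, bound_period(math.exp(-mu)))
```

Very small $|x - \hat x|$ (say $\mu$ much larger than 12) would need extended precision, since $2x^2$ then falls below the resolution of double-precision numbers near 1.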
2.5. Dynamics outside of $C^{(0)}$. The first system constant to be chosen is $\delta$. A number of upper bounds for $\delta$ will be specified as we go along. For now we think of it as a very small positive number with $d(f^n \hat x, C) \gg \delta$ for all $\hat x \in C$ and $n > 0$. We assume also that $a$ is sufficiently near $a^*$ that the Hausdorff distances between the critical sets of $f_{a^*}$ and $f_a$ are $\ll \delta$. Recall that we will be working in $R_0 = \{(x, y) \in A : |y| \le Kb\}$. Our zeroth critical region $C^{(0)}$ is defined to be
$$C^{(0)} = \{(x, y) \in R_0 : |x - \hat x| < \delta \text{ for some } \hat x \in C\}.$$
Let $s(u)$ denote the slope of a vector $u$. Assuming that $b^{\frac{1}{4}} \ll \delta$, an easy calculation shows that for $z \notin C^{(0)}$, if $|s(u)| < \frac{b}{\delta^4}$, then $|s(DT(z)u)| = O(\frac{b}{\delta})$. Also, if $\kappa_0 := \min \|DT(z)u\|$, where the minimum is taken over all $z \notin C^{(0)}$ and unit vectors $u$ with $|s(u)| < \frac{b}{\delta^4}$, then $\kappa_0 > K^{-1} \delta$. Let $K(\delta) := \frac{K}{\kappa_0^3}$, so that $K(\delta) b$ is the upper bound for $k_n$ in Lemma 2.4. We call a vector $u$ a $b$-horizontal vector if $|s(u)| < K(\delta) b$. A curve $\gamma$ is called a $C^2(b)$-curve if its tangent vectors are $b$-horizontal and its curvature is $\le K(\delta) b$ at every point.
Lemma 2.7. (a) For $z \notin C^{(0)}$, if $u$ is $b$-horizontal, then so is $DT(z)u$.
(b) If $\gamma$ is a $C^2(b)$-curve outside of $C^{(0)}$, then $T(\gamma)$ is again a $C^2(b)$-curve.
Proof. (a) has already been explained; (b) is an immediate consequence of (a) and Lemma 2.4. $\square$
Our next lemma describes the dynamics of $b$-horizontal vectors outside of $C^{(0)}$.
Lemma 2.8. There exist constants $c_0, c_1 > 0$ independent of $\delta$ such that the following holds for $T = T_{a,b}$ for all $(a, b)$ sufficiently near $(a^*, 0)$. Let $z \in R_0$ be such that $z, Tz, \cdots, T^{n-1}z \notin C^{(0)}$, and let $u$ be a $b$-horizontal vector. Then
(i) $\|DT^n(z)u\| \ge c_0 \delta e^{c_1 n}$;
(ii) if, in addition, $T^n z \in C^{(0)}$, then $\|DT^n(z)u\| \ge c_0 e^{c_1 n}$.
Proof. As with Corollary 2.3, this follows from Lemma 2.5 by perturbation. $\square$
2.6. Critical points inside $C^{(0)}$. Wherever it makes sense, let $e_m$ denote the field of most contracted directions of $DT^m$ and let $q_m$ be the slope of $e_m$. When working with a curve $\gamma$ parameterized by arc length, we write $q_m(s) = q_m(\gamma(s))$. We begin with some easy observations about $e_1$.
Lemma 2.9. For all $(a, b)$ sufficiently near $(a^*, 0)$, $e_1$ is defined everywhere on $R_0$, and there exists $K > 0$ such that
(a) $|q_1| > K^{-1} \delta$ outside of $C^{(0)}$, and $q_1$ has opposite signs on adjacent components of $R_0 \setminus C^{(0)}$;
(b) $|\frac{dq_1}{ds}| > K^{-1}$ on every $C^2(b)$-curve $\gamma$ in $C^{(0)}$.
Proof. The existence of $e_1$ follows from the fact that everywhere on $R_0$, $\|DT\| > K^{-1}$ (this uses the non-degeneracy condition in Step IV, Sect. 1.1) while $|\det(DT)| = O(b)$. For $a = a^*$, $b = 0$ and $\{y = 0\}$, the assertion in (a) is obvious, and part (a) of Lemma 2.9 follows by a perturbative argument. The estimate for $|\frac{dq_1}{ds}|$ uses the non-degeneracy condition above and the fact that $f_{a^*}'' \ne 0$ on $C$. See Appendix B.5 for details. $\square$
Definition 2.1. Let $\gamma$ be a $C^2(b)$-curve in $C^{(0)}$. We say that $z_0$ is a critical point of order $m$ on $\gamma$ if
(a) $\|DT^i(z_0)\| \ge K^{-1}$ for $i = 1, 2, \cdots, m$;
(b) at $z_0$, $e_m$ coincides with the tangent vector to $\gamma$.
It follows from Lemma 2.9 that on every $C^2(b)$-curve that stretches across a component of $C^{(0)}$, there is a unique critical point of order 1. The next two lemmas are used in the “updating” of existing critical points and the creation of new ones. Their proofs are given in Appendix B.5.
Lemma 2.10 ([BC2], p. 113). Let $\gamma$ be a $C^2(b)$-curve in $C^{(0)}$, where $\gamma(0) = z$ is a critical point of order $m$. We assume that
(a) $\|DT^i(z)\| \ge 1$ for $i = 1, 2, \cdots, 3m$;
(b) $\gamma(s)$ is defined for $s \in [-(Kb)^{\frac{m}{2}}, (Kb)^{\frac{m}{2}}]$.
Then there exists a unique critical point $\hat z$ of order $3m$ on $\gamma$, and $|\hat z - z| < (Kb)^m$.
Lemma 2.11 ([BC2], Lemma 6.1). For $\varepsilon > 0$, let $\gamma$ and $\hat\gamma$ be two disjoint $C^2(b)$-curves in $C^{(0)}$ defined for $s \in [-4K_1\sqrt{\varepsilon}, 4K_1\sqrt{\varepsilon}]$, where $K_1$ is the constant $K$ in Lemma 2.9(b). We assume
(a) $\gamma(0)$ is a critical point of order $m$;
(b) the $x$-coordinates of $\gamma(0)$ and $\hat\gamma(0)$ coincide, and $|\gamma(0) - \hat\gamma(0)| < \varepsilon$.
Then there exists a critical point of order $\hat m$ at $\hat\gamma(\hat s)$ with $|\hat s| < 4K_1\sqrt{\varepsilon}$ and $\hat m = \min\{m, K \log \frac{1}{\varepsilon}\}$.
2.7. Tracking $DT^n$: a splitting algorithm. The purpose of this section is to recall an algorithm introduced in [BC2] that gives, under suitable circumstances, a direct relation between $DT^n$ and 1-dimensional derivatives. Let $z_0 \in R_0$, and let $w_0$ be a unit vector at $z_0$ that is $b$-horizontal. We write $z_n = T^n z_0$ and $w_n = DT^n(z_0) w_0$.
In the case where $z_i \notin C^{(0)}$ for all $i$, the resemblance to 1-d is made clear in Lemmas 2.5 and 2.8. Consider next an orbit $z_0, z_1, \cdots$ that visits $C^{(0)}$ exactly once, say at time $t > 0$. Assume:
(a) there exists $J > 0$ such that $\|DT^i(z_t) \binom{0}{1}\| \ge 1$ for all $i < J$, so that in particular $e_J$, the most contracted direction of $DT^J$, is defined at $z_t$, and
(b) $\theta(w_t, e_J)$, the angle between $w_t$ and $e_J$, is $\ge b^{\frac{J}{2}}$.
Then $DT^i(z_0)$ can be analyzed as follows. (Note that our notation is different from that in [BC2].) We split $w_t$ into $w_t = \hat w_t + \hat E$, where $\hat w_t$ is parallel to the vector $\binom{0}{1}$ and $\hat E$ is parallel to $e_J$. For $i \le t$ and $i \ge t + J$, let $w_i^* = w_i$. For $i$ with
$t < i < t + J$, let $w_i^* = DT^{i-t}(z_t) \hat w_t$. We claim that all the $w_i^*$ are $b$-horizontal vectors, so that $\{\|w_{i+1}^*\| / \|w_i^*\|\}_{i=0,1,2,\cdots}$ resemble a sequence of 1-d derivatives. In particular, $\|w_{t+1}^*\| / \|w_t^*\| \sim \theta(w_t, e_J)$ simulates a drop in the derivative when an orbit comes near a critical point in 1-dimension.
To justify the statement about the slope of the $w_i^*$, we note that $DT(z_t) \binom{0}{1}$ is $b$-horizontal, so that in view of Lemma 2.7 we need only to consider $w_{t+J}^*$. We have
$$\|DT^J(z_t) \hat E\| \le b^J \frac{\|\hat w_t\|}{\theta(w_t, e_J)} \le b^{\frac{J}{2}} \|\hat w_t\| \le b^{\frac{J}{2}} \|DT^J(z_t) \hat w_t\|,$$
the first and third inequalities following from (a) and the second from (b). Since the slope of $DT^J(z_t) \hat w_t$ is smaller than $\frac{Kb}{2\delta}$, it follows that $w_{t+J}^* = DT^J(z_t) \hat w_t + DT^J(z_t) \hat E$ remains $b$-horizontal.
The discussion above motivates the following splitting algorithm introduced in [BC2]. Consider $\{z_i\}_{i=0}^{\infty}$, and let $t_1 < \cdots < t_j < \cdots$ be the times when $z_i \in C^{(0)}$. We let $w_0$ be a $b$-horizontal unit vector, and assume as before that $e_{J_i}$ makes sense at $z_i$ for $i = t_j$. Define $w_i^*$ as follows:
1. For $0 \le i \le t_1$, let $w_i^* = DT^i(z_0) w_0$.
2. At $i = t_j$, we split $w_i^*$ into $w_i^* = \hat w_i + \hat E_i$, where $\hat w_i$ is parallel to $\binom{0}{1}$ and $\hat E_i$ is parallel to $e_{J_i}$.
3. For $i > t_1$, let
$$w_i^* = DT(z_{i-1}) \hat w_{i-1} + \sum_{j \,:\, t_j + J_{t_j} = i} DT^{J_{t_j}}(z_{t_j}) \hat E_{t_j} \tag{5}$$
and let $\hat w_i = w_i^*$ if $i \ne t_j$ for any $j$.
This algorithm does not give anything meaningful in general. It does, however, in the scenario of the next lemma.
Lemma 2.12. Let $z_i$, $w_i$ and $w_i^*$ be as above. Assume
(a) for each $i = t_j$, $\theta(w_i^*, e_{J_i}) \ge b^{\frac{J_i}{2}}$;
(b) the time intervals $I_j := [t_j, t_j + J_{t_j}]$ are strictly nested, i.e. for $j \ne j'$, either $I_j \cap I_{j'} = \emptyset$, $I_j \subset I_{j'}$, or $I_{j'} \subset I_j$, and $t_j + J_{t_j} \ne t_{j'} + J_{t_{j'}}$.
Then $w_i = w_i^*$ for $i \notin \cup_j I_j$, and the $w_i^*$’s are all $b$-horizontal vectors. The sequence $\{\|w_i^*\|\}$ has the property that $\|w_{i+1}^*\| / \|w_i^*\| \sim \theta(w_i^*, e_{J_i})$ for $i = t_j$, and $w_{i+1}^* \approx DT(z_i) w_i^*$ for $i \ne t_j$.
Proof. The nested condition in (b) allows us to consider the $I_j$’s one at a time beginning with the innermost time intervals. This reduces to the case of a single visit to $C^{(0)}$ treated earlier on. $\square$
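The single splitting step used at each return time, writing a vector as a vertical part plus a part along the contracted direction, amounts to solving a $2 \times 2$ linear system. A minimal sketch follows; the function names are hypothetical, and computing $e_J$ itself is outside the scope of this fragment, so `e` is simply an input.

```python
import math


def split_vector(w, e):
    """Write w = w_hat + E_hat with w_hat parallel to (0, 1) and E_hat parallel to e."""
    wx, wy = w
    ex, ey = e
    if ex == 0.0:
        raise ValueError("e is vertical, so (0, 1) and e do not span the plane")
    coeff = wx / ex                     # E_hat = coeff * e carries all of w's x-component
    e_hat = (coeff * ex, coeff * ey)
    w_hat = (0.0, wy - coeff * ey)      # the vertical remainder
    return w_hat, e_hat


def angle_between(u, v):
    """Unsigned angle in [0, pi/2] between the lines spanned by u and v."""
    cross = u[0] * v[1] - u[1] * v[0]
    dot = u[0] * v[0] + u[1] * v[1]
    return math.atan2(abs(cross), abs(dot))


if __name__ == "__main__":
    w = (1.0, 0.0)
    e = (math.cos(0.01), math.sin(0.01))
    w_hat, e_hat = split_vector(w, e)
    print(w_hat, e_hat, angle_between(w, e))
```

In this example $w$ is horizontal and $e$ makes an angle $0.01$ with it, so $\|\hat w\| = \tan(0.01)\,\|w\|$: the vertical part has size comparable to the angle, which is the drop in length described in the text.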
Part I. Controlling a Source of Nonhyperbolicity
3. The Critical Set
Many authors, including [BC1, CE, J, M1], and [NS], have studied 1-dimensional maps by controlling their critical orbits. These ideas were mimicked in [BC2], where the authors developed techniques for identifying, for certain Hénon maps, a set they called the “critical set”. This is done via an inductive procedure involving parameter selection. The first step in our analysis of the family $\{T_{a,b}\}$ is to carry out a similar parameter selection, and the aim of this section is to formulate suitable inductive hypotheses.
3.1. What is the critical set? In 1-dimension, the critical set is where all previous expansion is destroyed. Tangencies of stable and unstable manifolds play a similar role in higher dimensions. Here is how we propose to capture the set $C$ that we will prove in Sect. 7 to be the origin of all nonhyperbolic behavior.
Let $F_0$ be the foliation on $R_0$ with leaves $\{y = \text{constant}\}$, and let $F_k$ be its image under $T^k$. In Sect. 2.5 we defined the 0th critical region $C^{(0)}$. Suppose that $T^i C^{(0)} \cap C^{(0)} = \emptyset$ for all $i \le i_0$. Then for $i \le i_0$, $F_i$ restricted to $C^{(0)} \cap R_i$ consists of finitely many bands of roughly horizontal leaves whose tangent vectors have been expanded in the previous $i$ iterates (Lemma 2.8). From Corollaries 2.1, 2.2 and Lemma 2.9, we see also that in $C^{(0)}$, $DT^i$ has a well-defined field of most contracted directions, namely $e_i$, whose integral curves are roughly parabolas. It is natural to take the set of tangencies in $C^{(0)} \cap R_i$ between the leaves of $F_i$ and the integral curves of $e_i$ to be our $i$th approximation of $C$. Since these approximations stabilize quickly with $i$, they would converge to $C$ if this picture could be maintained indefinitely, i.e. if the “turns” of $F_i$ could be prevented from entering $C^{(0)}$ for all $i$.
For $i \le i_0$, we think of $C^{(i)} := C^{(0)} \cap R_i$ as our $i$th critical region. The strategy as explained above, then, is essentially to solve for tangencies of temporary stable and unstable manifolds in $C^{(i)}$ and call the resulting set our $i$th approximation of $C$. Observe that $C^{(i)}$ is the union of at most $K^i$ rectangles with a transparent geometry. This geometry will be passed on to the critical set.
Now experience from 1-dimension tells us that in order to retain a positive measure set of parameters, we must allow our “turns” to approach the critical set as $i$ increases.
We will allow them to return slowly, and to maintain a picture similar to that for $i \le i_0$, we will shrink the critical regions $C^{(i)}$ sideways at a rate faster than this rate of approach. Justification is needed to show that this process can be continued indefinitely and to prove the stabilization of the approximate critical sets. In the end, an alternate characterization of $C$ will be $C = \cap_{i \ge 0} C^{(i)}$.
In order for the contractive fields above to be defined, it is necessary that the derivative along orbits starting from $C$ experience some exponential growth. This growth, which is also useful for controlling the movements of the “turns”, is brought about in two ways: (i) by arranging for critical orbits to stay away from the critical set for a very long time, hyperbolicity is guaranteed for a long initial period; (ii) when an orbit of $C$ gets near a point $z \in C$, it copies the initial segment of the orbit of $z$, thereby replicating the growth properties created in (i). A version of these ideas will be made precise in the inductive assumptions.
3.2. Getting started. We first introduce our main system constants. They are $\alpha$, $\beta$, $\rho$, $c$, $n_0$, $\theta$ and $\delta$ (which we have already met):
– $e^{-\alpha n}$ and $e^{-\beta n}$, with $\alpha \ll \beta \ll 1$, represent two small length scales.
– $c > 0$ is our target Lyapunov exponent; it is $< c_1$, where $c_1$ is as in Lemma 2.8.
– $0 < \rho < K^{-1}$ is an arbitrary number of order 1. It determines the rate at which our critical regions decrease in size (see Sect. 3.1).
– $n_0$ is the number of iterates the critical orbits are required to stay a preassigned distance away from $C$; see below for more precise specifications.
– $\theta$ is chosen so that $b^\theta$ is a number of order 1 and $< K^{-1}$; one use of $\theta$ is the following: critical orbits originating from the same component of $C^{([\theta N])}$ are indistinguishable in their first $N$ iterates. For this reason, critical points of generation $> \theta N$ are not constructed in the first $N$ steps of the induction (see (IA1) below).
These constants are chosen in the following order: $c$ and $\rho$ are determined by the derivative of $T$; $\alpha$ and $\beta$ are then chosen. This is followed by $\delta$, which is $\ll \delta_0$ to start with and shrunk a number of times as needed in the course of our argument. The value of $n_0$ is not determined until very late in the proof: it is used to ensure sufficient hyperbolicity at the start (and to overcome various “irregularities” that occur at initial stages); it depends on all the other system constants except for $b$. Observe that increasing $n_0$ is at the expense of shrinking the size of the parameter set at the start. The magnitude of $b$, which is used to beat everything, is the last to be chosen; $\theta$ as we have defined it is, of course, determined by $b$.
At the start of our induction, we assume we have a parameter set $\Delta_0$ with the following properties: Let $f$ be the Misiurewicz map from which we are perturbing, and let $\delta_0 = \frac{1}{4} \inf \{d(f^n x, C), \; x \in C, \; n > 0\}$. First, by considering $a$ sufficiently near $a^*$, we may assume that for all $a$ and for every critical point $x$ of $f_a$, $d(f_a^n x, C) \ge 2\delta_0$ for all $0 < n \le n_0$. Next, by choosing $b$ sufficiently small, we may assume, through Corollaries 2.1, 2.2 and Lemmas 2.9, 2.10, that $T_{a,b}$ has on each connected segment of $\partial R_0 \cap C^{(0)}$ a unique critical point $z_0$ of order $n_0$, and that $z_0$ is close enough to the corresponding critical point of $f_a$ that $d_C(z_n) \ge \delta_0$ for all $n \le n_0$. These are our critical points of generation 0. They comprise the set we call $\Gamma_0$.
Parameters are deleted at each stage of our induction. Sections 3–5 are concerned with the dynamics of the maps corresponding to the parameters retained. Issues pertaining to the measure of the set of retained parameters (including whether or not it is nonempty) are postponed to Sect. 6.
3.3. Inductive assumptions. Let $N \ge n_0$ be a large number, and let $\Delta_N$ be the set of parameters retained after $N$ iterates. We now formulate a set of inductive assumptions that describes the desired dynamical picture for $T = T_{a,b}$, $(a, b) \in \Delta_N$. While we will continue to provide motivations and explanations, (IA1)–(IA6) below are to be viewed as formal inductive hypotheses. As before, let $z_i = T^i z_0$.
3.3.1. Critical points and critical regions.
(IA1). For all $k \le \theta N$, the critical regions $C^{(k)}$ are defined and have the geometric properties stated in (1)(i), (ii) and (iii) of Theorem 1.1. Moreover, on each horizontal boundary of each component of $C^{(k)}$, there is a unique critical point of order $N$ located within $O(b^{\frac{k}{3}})$ of the midpoint of the segment.
Critical points of order $N$ on $\partial C^{(k)}$ are called critical points of generation $k$ and order $N$. The set of critical points of generation $\le k$ is denoted by $\Gamma_k$. As the induction progresses, the orders of the critical points are updated, and the precise locations of $\Gamma_k$ are modified accordingly. At the end of the induction process, $\Gamma := \cup_k \Gamma_k$, where $\Gamma_k$ now refers to the set of critical points of generation $k$ and order $\infty$, is the set in the statement of part (2) of Theorem 1.1.
3.3.2. Distance to critical set and loss of hyperbolicity. If the critical set is where would-be stable and unstable directions are interchanged, then distance to the critical set might provide a measure of loss of hyperbolicity. This is indeed the case under suitable circumstances and for a suitable notion of “distance”. If $Q$ is a component of $C^{(k)}$, we let $L_Q$ denote the vertical line midway between the two vertical boundaries of $Q$.
Definition 3.1. We say $z \in C^{(0)}$ is horizontally related or simply h-related to $\Gamma_{\theta N}$ if there exists a component $Q$ of $C^{(k)}$, $k \le \theta N$, such that $z \in Q$ and $\mathrm{dist}(z, L_Q) \ge b^{\frac{k}{20}}$. When this holds, we say $z$ is h-related to $z_0$ for all $z_0 \in \Gamma_{\theta N} \cap Q$.$^5$
This is an attempt to describe the location of a point relative to $\Gamma_{\theta N}$, which, as $N \to \infty$, converges to a fractal set. From Lemma 4.1, we see that $\Gamma_{\theta N} \cap Q$ is contained in a region of width $O(b^{\frac{k}{4}})$ in the middle of $Q$, so that $z$ and $\Gamma_{\theta N} \cap Q$ have a very obviously horizontal relationship. We caution, however, that there may be points in $\Gamma_{\theta N}$ that are directly above or below $z$, and quite possibly both to its left and to its right. Observe also that if $Q'$ is a component of $C^{(k')}$ such that $z \in Q' \subset Q$, then $\mathrm{dist}(z, L_{Q'}) \ge b^{\frac{k'}{20}}$.
Definition 3.2. For $z \in R_0$, we define its distance to the critical set, denoted $d_C(z)$, as follows: for $z \in C^{(0)}$, we let $d_C(z) = \mathrm{dist}(z, L_Q)$, where $Q$ is the component of $C^{(k)}$ containing $z$ and $k$ is the largest number $\le \theta N$ with $z \in C^{(k)}$; for $z \notin C^{(0)}$, let $Q$ be the component of $C^{(0)}$ nearest to $z$. We further let $\phi(z)$ be one of the two points in $\partial Q \cap \Gamma_{\theta N}$ if $z$ is h-related to $\Gamma_{\theta N}$.
For $z \in C^{([\theta N])}$, the definitions of $d_C(z)$ and $\phi(z)$ are temporary and will be modified as the induction progresses. We remark that for $z$ in an h-related position, the distance from $z$ to $\phi(z)$ is a very good approximation of $d_C(z)$.
To secure growth properties for the orbits of $\Gamma_{\theta N}$, we forbid them to approach the critical set too closely too soon. (IA2) is a result of parameter selection.
(IA2). For all $z_0 \in \Gamma_{\theta N}$ and all $i \le N$, $d_C(z_i) \ge \min(\delta, e^{-\alpha i})$.
We will assume, for convenience, that $e^{-\alpha n_0} < \delta$. Under this assumption, (IA2) reads $d_C(z_i) \ge e^{-\alpha i}$ for $i > n_0$.
(IA2) implies that for all $z_0 \in \Gamma_{\theta N}$ and $i \le N$, $z_i$ is h-related to $\Gamma_{\theta N}$ whenever it is in $C^{(0)}$. Intuitively, this is because $z_i$ is in a very “deep” layer relative to its distance to $\Gamma_{\theta N}$. Formally, let $z_i \in Q \subset C^{(k)}$, where $Q$ and $k$ are as in Definition 3.2. Then $k \ll i$ since $\rho^k \ge e^{-\alpha i}$. Now $z_i \in R_i$. If $k < [\theta N]$, then $z_i \in Q \cap R_{k+1}$, proving $d_C(z_i) \ge \rho^{k+1} \gg b^{\frac{k}{20}}$. If $k = [\theta N]$, then $d_C(z_i) \ge e^{-\alpha i} \ge e^{-\alpha N} \gg b^{\frac{1}{20} \theta N}$ provided that $b^\theta$ is chosen to be $< e^{-20\alpha}$.
$^5$ When studying the dynamics of $T$ on $\partial R_k$, it will be convenient to include the following in the definition of h-relatedness: Let $\gamma$ be a horizontal boundary of a component of $C^{(k)}$, $k \le \theta N$, and let $\hat z \in \gamma \cap \Gamma_{\theta N}$. Then $z \in \gamma$ is also said to be h-related to $\hat z$.
Fig. 2. Correct splitting of $w_i^*$ (labels in the figure: integral curves of $e_{\ell_i}$, $\phi(z_i)$, $w_i^*$, $z_i$)
Definition 3.3. (a) For arbitrary $z \in C^{(0)}$, we define its fold period $J(z)$ to be the integer $J \ge 1$ such that $b^{\frac{J}{2}}$ is closest to $d_C(z)$.
(b) Given $z_0 \in R_0$ and unit vector $w_0$, we let $w_i^*$, $i = 0, 1, 2, \cdots$, be given by the splitting algorithm in Sect. 2.7 with $J_i = J(z_i)$ assuming $e_{J(z_i)}$ is defined at $z_i$.
For $J \le N$, Lemma 2.2 gives an estimate on the size of the neighborhood of $\Gamma_{\theta N}$ on which $e_J$ is well defined. In particular, if $z$ is h-related to $\Gamma_{\theta N}$, then $e_{J(z)}$ is defined at $z$.
Recall that $q_1$ is the slope of $e_1$. We fix $\varepsilon_0 > 0$ such that $\varepsilon_0 \ll |\frac{\partial q_1}{\partial x}|$ in $C^{(0)}$. For $z \in \partial R_k$, let $\tau(z)$ denote a unit tangent vector to $\partial R_k$ at $z$. In the angle estimates below, $\tau$ and $e_J$ are assumed to point in roughly the same direction as $w$.
Definition 3.4. Let $z \in C^{(0)}$ be h-related to $\Gamma_{\theta N}$, and let $w$ be a vector at $z$. We say $w$ splits correctly if $\left| \frac{w}{\|w\|} - \tau(\phi(z)) \right| < \varepsilon_0 d_C(z)$.
(IA3). For $z_0 \in \Gamma_{\theta N}$, $w_0 = \binom{0}{1}$ and $i \le N$, $w_i^*$ splits correctly whenever $z_i \in C^{(0)}$.
The sense in which this splitting is “correct” is as follows. We wish to use Lemma 2.12 to understand the evolution of $w_i$, and (IA3) implies condition (a) of the lemma. This is because
$$\left| e_{J_i}(z_i) - \frac{w_i^*}{\|w_i^*\|} \right| \ge |e_{J_i}(z_i) - e_{J_i}(\phi(z_i))| - |e_{J_i}(\phi(z_i)) - \tau(\phi(z_i))| - \left| \tau(\phi(z_i)) - \frac{w_i^*}{\|w_i^*\|} \right|$$
$$\ge \left| \frac{\partial q_{J_i}}{\partial x} \right| d_C(z_i) - O(b^{J_i}) - \varepsilon_0 d_C(z_i) \ge \frac{1}{2} \left| \frac{\partial q_1}{\partial x} \right| d_C(z_i) \sim b^{\frac{J_i}{2}} .$$
Condition (b) of Lemma 2.12 is discussed in Sect. 4.1.
3.3.3. Derivative along critical orbits. We saw in the last paragraph that for $z_0 \in \Gamma_{\theta N}$, as $z_i$ enters $C^{(0)}$, $w_i^*$ suffers a loss of hyperbolicity proportional to $d_C(z_i)$. Combining this with (IA5)(c) below applied to an earlier step, we see that this loss will be partially – but not fully – compensated for at the end of a certain period. To prevent a downward spiral in Lyapunov exponent, further parameter exclusion is needed.
(IA4). For all $z_0 \in \Gamma_{\theta N}$ and $0 \le i \le N$, $\|w_i^*(z_0)\| > c_0 e^{ci}$.
In future steps of the induction, orbits of length $N$ starting from $\Gamma_{\theta N}$ will be replicated; in other words, they will serve as guides for other points that enter $C^{(0)}$.
Definition 3.5. For arbitrary $\xi_0$ and $\xi_0' \in C^{(0)}$, we define their bound period to be the largest integer $p$ such that for all $0 < j \le p$, $|\xi_j - \xi_j'| \le e^{-\beta j}$.
Consider the situation where $\xi_0' = z_0 \in \Gamma_{\theta N}$. An important observation is that for $j \le p$, $|\xi_j - z_j| \ll d_C(z_j)$. Observe also that by taking $\delta$ small enough, we have $d_C(\xi_j) > \frac{1}{2} \delta_0$ for all $j \le \min(p, n_0)$ independent of $n_0$. (To achieve this, choose $n_1$ with $e^{-\beta n_1} < \frac{1}{2} \delta_0$, and require $K \delta^2 \|DT\|^{n_1} < e^{-\beta n_1}$.) Taking $n_0$ large also ensures that $d_C(\xi_j) > \frac{\delta}{2}$ whenever $z_j$ is outside of $C^{(0)}$.
Our last two inductive assumptions deal with the properties $z_0$ passes along to $\xi_0$.
(IA5). Let $z_0 \in \Gamma_{\theta N} \cap \partial C^{(k)}$, and let $\gamma : [0, \varepsilon] \to C^{(0)}$ be a $C^2(b)$-curve with $\gamma(0) = z_0$ and $\gamma'(0)$ tangent to $\partial C^{(k)}$. We regard all $\xi_0 \in \gamma$ as bound to $z_0$, and let $p(\xi_0)$ denote their bound periods. Then:
(a) There exists $K$ such that for $\xi_0 \in \gamma$ with $|\xi_0 - z_0| = e^{-h}$,
$$\frac{1}{K} h \le p(\xi_0) \le K h \quad \text{provided } Kh < N;$$
moreover, $p(\xi_0)$ varies monotonically with the distance between $\xi_0$ and $z_0$, decreasing as that distance increases;
(b) for $J \le j \le \min(p, N)$, $|\xi_j - z_j| \approx |\xi_0 - z_0|^2 \|w_j(z_0)\|$, where “$\approx$” means up to a factor of $(1 \pm \varepsilon_1)$ for some $\varepsilon_1 > 0$;
(c) $\|w_p(\xi_0)\| \cdot |\xi_0 - z_0| \ge e^{\frac{cp}{3}}$ provided $p < N$.
(IA5) describes the quadratic nature of the “turn” as $\gamma$ is mapped forward. For comparison with 1-dimensional behavior, see Lemma 2.6.
The following distortion estimates are used in the proof of (IA5). Let $w_0(\xi_0) = w_0(z_0) = \binom{0}{1}$, and let $\hat w_i^*(\xi_0)$ be given by Definition 3.3(b) except that $e_{J(z_i)}$ (and not $e_{J(\xi_i)}$) is used for splitting at time $i$. (IA6) compares $w_i^*(z_0)$ and $\hat w_i^*(\xi_0)$. Let $M_i(\cdot)$ and $\theta_i(\cdot)$ denote the magnitude and argument of the vectors in question. Define
$$\Delta_i(\xi_0, z_0) = \sum_{s=0}^{i} (Kb)^{\frac{s}{4}} |\xi_{i-s} - z_{i-s}| . \tag{6}$$
(IA6). Given $z_0 \in \Gamma_{\theta N}$ and any $\xi_0 \in C^{(0)}$, we regard $\xi_0$ as bound to $z_0$ and let $p$ be the bound period. Then for $i \le \min\{p, N\}$,
$$\frac{M_i(\xi_0)}{M_i(z_0)}, \; \frac{M_i(z_0)}{M_i(\xi_0)} \le \exp\left\{ K \sum_{j=1}^{i-1} \frac{\Delta_j}{d_C(z_j)} \right\} \tag{7}$$
and
$$|\theta_i(\xi_0) - \theta_i(z_0)| \le (Kb)^{\frac{1}{2}} \Delta_{i-1} . \tag{8}$$
The estimates above also hold with $w_i^*(z_0)$ replaced by $\hat w_i^*(\xi_0')$, where $\xi_0'$ is another point in $C^{(0)}$ also thought of as bound to $z_0$, and $p$ is the minimum of the two bound periods. We remark that the right side of (7) is finite and can be made arbitrarily close to 1 by choosing $\delta$ small (see Appendix B.7).
Let us return for a moment to Definition 3.1. From the geometry of $C^{(k)}$ (see (IA1) and Lemma 4.1) it is an exercise in calculus to show that if $\xi_0$ is h-related to $z_0 \in \Gamma_{\theta N}$, then it lies on a $C^2(b)$-curve through $z_0$ tangent to $\tau(z_0)$. In particular, (IA5) applies.
Our rules of parameter exclusion, namely (IA2) and (IA4), are similar to those used in [BC2], but they are applied to different orbits and with a different definition of “$d_C(\cdot)$”. The notions of bound and fold periods are borrowed from [BC2], as are (IA5) and (IA6). Our construction of $C$, however, has a distinctly different flavor.
4. Replication of Orbit Segments
In Sect. 3.1 we outlined a scheme for obtaining derivative growth along critical orbits, namely to choose a start-up geometry that guarantees some initial growth, and then to try to replicate this behavior. Section 4 contains a detailed analysis of the replication process. The main results are stated in Sect. 4.3, after some technical preparations in Sects. 4.1 and 4.2, including amending slightly the definitions of bound and fold periods. Throughout Sect. 4, (IA1)–(IA6) are assumed up to time $N$.
4.1. Nested properties of bound and fold periods. Consider $z_0 \in \Gamma_{\theta N}$. When $z_i$ enters $C^{(0)}$, it is natural to assign to it a bound period $p(z_i)$ defined using $\phi(z_i)$. An unsatisfactory aspect of this definition is that two bound periods so defined may overlap without one being completely contained in the other. The purpose of this subsection is to adjust slightly the definition of $p(z_i)$ to create a simpler binding structure. A similar adjustment is made in [BC2].
First we fix some notation. Let $Q^{(j)}$ denote the components of $C^{(j)}$, and let $\hat Q^{(j)}$ be the component of $R_j \cap C^{(j-1)}$ containing $Q^{(j)}$. For $z \in \partial R_j$, let $\tau(z)$ be a unit vector at $z$ tangent to $\partial R_j$.
Lemma 4.1. For $z, z' \in \Gamma_{\theta N} \cap Q^{(k)}$, we have
$$|z - z'| = O(b^{\frac{k}{4}}) \quad \text{and} \quad \|\tau(z) \times \tau(z')\| = O(b^{\frac{k}{4}}).$$
Proof. Let $z^{(k)}$ be a critical point in $\partial Q^{(k)}$. For $k \le i < [\theta N]$, let $z^{(i+1)}$ be a critical point of generation $i + 1$ in $Q^{(i)}(z^{(i)})$, the component of $Q^{(i)}$ containing $z^{(i)}$. From (IA1) we know that the Hausdorff distance between the two horizontal boundaries of $Q^{(i)}(z^{(i)})$ is $O(b^{\frac{i}{2}})$. Lemma 2.11 then tells us that $|z^{(i)} - z^{(i+1)}| = O(b^{\frac{i}{4}})$. The angle estimate also follows from the proof of Lemma 2.11. □
Lemma 4.2. Let $\xi_0$ be h-related to $z_0 \in \Gamma_{\theta N}$. If during their bound period $z_i$ returns to $C^{(k)}$, then $\xi_i \in \hat Q^{(k)}(z_i)$.
Proof. Let $\gamma$ be a $C^2(b)$-curve joining $z_0$ and $\xi_0$. Then $T^i \gamma \subset R_i$. Since $e^{-\alpha i} \le d_C(z_i) \le \rho^k$, we have $k < i$ and therefore $T^i \gamma \subset R_k$. By the monotonicity of bound periods, every point in $T^i \gamma$ is within a distance of $< e^{-\beta i}$ from $z_i$. This puts $\xi_i \in R_k \cap Q^{(k-1)}(z_i)$. □
Lemma 4.3. Let $z_0 \in \Gamma_{\theta N}$ be such that $z_i \in C^{(0)}$ at times $t_1 < t_2 < \cdots < t_r$, and that for each $j < r$ the bound period $p_j$ initiated at time $t_j$ extends beyond time $t_{j+1}$. Then $p_j < (K\alpha)^{j-1} p_1$.
Proof. Let $\tilde z_0 = \phi(z_{t_1})$. We claim that $|z_{t_2} - \phi(z_{t_2})| \approx |\tilde z_{t_2 - t_1} - \phi(\tilde z_{t_2 - t_1})|$, which is $> e^{-\alpha(t_2 - t_1)}$. If true, this will imply, by (IA5)(a), that $p_2 < K\alpha(t_2 - t_1) < K\alpha p_1$, and the assertion in the lemma will follow inductively. Since $|z_{t_2} - \tilde z_{t_2 - t_1}| < e^{-\beta(t_2 - t_1)} \ll e^{-\alpha(t_2 - t_1)}$, it suffices to show that $|\phi(\tilde z_{t_2 - t_1}) - \phi(z_{t_2})| \ll |\tilde z_{t_2 - t_1} - \phi(\tilde z_{t_2 - t_1})|$. Let $k$ be the largest number such that $\tilde z_{t_2 - t_1} \in C^{(k)}$. By Lemma 4.2, $z_{t_2} \in Q^{(k-1)}(\tilde z_{t_2 - t_1})$, so $\phi(\tilde z_{t_2 - t_1})$ and $\phi(z_{t_2})$ must both be in $Q^{(k-1)}(\tilde z_{t_2 - t_1})$. By Lemma 4.1 they are $\le b^{\frac{k-1}{4}}$ apart, and this is $\ll |\tilde z_{t_2 - t_1} - \phi(\tilde z_{t_2 - t_1})|$. □
Definition 4.1. For $z_0 \in \Gamma_{\theta N}$ with $z_i \in C^{(0)}$, the adjusted bound period $p^*(z_i)$ is defined to be the smallest number $p^*$ with the property that for all $j$ with $i \le j < i + p^*$, if $z_j \in C^{(0)}$, then $j + p(z_j) \le i + p^*$.
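Definition 4.1 can be read as a closure computation: start from the unadjusted period and keep extending it until every return it covers has its own bound period finish inside it. The following is a minimal sketch under stated assumptions: the function name and the dictionary encoding of return times are hypothetical, the sample data are invented, and $p^*$ is taken to be positive so that the return at time $i$ itself is covered.

```python
def adjusted_bound_period(i, p):
    """Smallest p* > 0 such that every return j with i <= j < i + p*
    satisfies j + p[j] <= i + p*.

    p: dict mapping each return time j (z_j in C^(0)) to its
       unadjusted bound period p(z_j).  Hypothetical encoding.
    """
    end = i + p[i]                      # the period must at least cover p(z_i)
    changed = True
    while changed:                      # extend until no covered return sticks out
        changed = False
        for j, pj in p.items():
            if i <= j < end and j + pj > end:
                end = j + pj
                changed = True
    return end - i

# Invented example: the period started at time 25 overlaps the one started at 10.
returns = {10: 20, 25: 8, 31: 1}
for t in sorted(returns):
    print(t, adjusted_bound_period(t, returns))
```

In this example the unadjusted intervals $[10, 30]$ and $[25, 33]$ overlap without nesting; after adjustment the interval started at time 10 is extended to $[10, 33]$, which contains the other two.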
Adjusted bound periods, therefore, have a nested structure by definition.
Corollary 4.1. (a) $p^* \le p + K\alpha p$. (b) For $z_i \in C^{(0)}$ with $\phi(z_i) = \hat z_0$, we have for all $j \le p^*$,
$$|z_{j+i} - \hat z_j| < e^{-\beta^* j}$$
for some $\beta^*$ smaller than $\beta$ and $\gg \alpha$.
The proof is left as an exercise. We assume from here on that all bound periods for all critical orbits are adjusted, and write $p$ and $\beta$ instead of $p^*$ and $\beta^*$.
This amended definition gives critical orbits the following simple structure of bound and free states. We call $z_i$ a return if $z_i \in C^{(0)}$. Then $z_i$ is free for $i \le n_1$, where $n_1 > 0$ is the time of the first return, and it is in bound state for $n_1 < i \le n_1 + p_1$, where $p_1$ is the bound period initiated at time $n_1$. After time $n_1 + p_1$, $z_i$ remains free until its next return at time $n_2$, is bound for the next $p_2$ iterates, and so on. The times $n_j$ are called free return times. A primary bound period begins at each $n_j$. Inside the time interval $[n_j, n_j + p_j]$, there may be secondary bound periods which comprise disjoint time intervals, and so on.
Next we consider fold periods, which are denoted by $\ell$ and defined in Sect. 3.3.2. As with bound periods, if $z_i$ enters $C^{(0)}$ at times $t_1$ and $t_2$ with $t_1 < t_2 \le N$, and if the fold period begun at $t_1$ remains in effect at $t_2$, then using Lemma 4.2 we see that
$$\ell_{t_2} < \frac{\alpha}{\log \frac{1}{b}} \, \ell_{t_1},$$
so that adjusted fold periods can be defined similarly to give a nested structure. This is condition (b) of Lemma 2.12. A further simplifying arrangement, which we will also adopt, is that no fold periods expire at returns to $C^{(0)}$ or at the step immediately after. The proof of the following lemma is straightforward and will be omitted.
Lemma 4.4 (cf. [BC2], Lemma 6.5). Let $z_0 \in \Gamma_{\theta N}$. Then for every $i < N$, there exist $i_1 \le i \le i_2$ with $i_2 - i_1 < K\theta\alpha i$ such that $i_1$ and $i_2$ are out of all fold periods.
4.2. Orbits controlled by $\Gamma_{\theta N}$. In this subsection we consider $(z_0, w_0)$, where $z_0$ is an arbitrary point in $R_0$ and $w_0$ is a unit vector. We write $z_i = T^i z_0$ and $w_i = DT^i(z_0) w_0$.
Definition 4.2. We say $(z_0, w_0)$ is controlled by $\Gamma_{\theta N}$ up to time $m$ (with $m$ possibly $> N$) if the following hold.
– Initial conditions: if $z_0 \notin C^{(0)}$, then $w_0$ is a b-horizontal vector; if $z_0 \in C^{(0)}$, then either $w_0 = \binom{0}{1}$, or $z_0$ is h-related to $\Gamma_{\theta N}$ and $w_0$ splits correctly.
– For $0 < i \le m$, if $z_i \in C^{(0)}$, then $z_i$ is h-related to $\Gamma_{\theta N}$ and $w_i^*$ splits correctly in the sense of Definition 3.4 with $\varepsilon_0$ replaced by $2\varepsilon_0$.
No h-relatedness property is required for $z_0 \in C^{(0)}$ when $w_0 = \binom{0}{1}$ because for practical purposes, one may think of the sequence as starting with $(z_1, w_1)$.
Let $(z_0, w_0)$ be as above. Then the orbit of $z_0$ has a natural bound/free structure defined as follows: If $z_0 \in \Gamma_{\theta N}$, then it is natural to regard $z_0, z_1, \cdots, z_i$ as free until $z_i$ returns to $C^{(0)}$. For $z_0 \in C^{(0)} \setminus \Gamma_{\theta N}$, we may regard $z_0$ as bound to any $\hat z \in \Gamma_{\theta N}$ for a period $p$ provided that $(\max \|DT\|)^p \, |z_0 - \hat z| < e^{-\beta p}$. (This trivial bound period is used to ensure that Lemma 4.2 continues to work.) When $z_i$ is h-related to $\Gamma_{\theta N}$, we take the bound period to be that between $z_i$ and $\phi(z_i)$ (which is longer than the trivial one). Observe that Lemma 4.3 is equally valid for controlled orbits as for orbits starting from $\Gamma_{\theta N}$, so that a nested structure can also be assumed for the bound and fold periods of controlled orbits.
In the language of Definition 4.2, the situation can be summed up as follows. First, it follows from (IA2) and (IA3) that for all $\hat z_0 \in \Gamma_{\theta N}$, $(\hat z_0, \binom{0}{1})$ is controlled by $\Gamma_{\theta N}$ up to time $N$. (In fact, the angle of splitting is better than that in the definition of "control".) Second, for $(z_0, w_0)$ controlled by $\Gamma_{\theta N}$, (IA5) and (IA6) apply to give information during its bound periods. In particular, the orbit of $(z_0, w_0)$ has similar bound/free structures and "derivative recovery" estimates as those of $(\hat z_0, \binom{0}{1})$, $\hat z_0 \in \Gamma_{\theta N}$, except that (IA2) and (IA4) need not hold.
In the remainder of this subsection we record some basic facts on the growth of $w_i$ and $w_i^*$. Their proofs are given in Appendix B.6. In Lemmas 4.6–4.8, it is assumed that $(z_0, w_0)$ is controlled by $\Gamma_{\theta N}$ up to time $m$, and all time indices are $\le m$.
Lemma 4.5. Suppose $(z_0, w_0)$ satisfies the initial conditions in Definition 4.2, and for $0 < i \le m$, $z_i$ is h-related to $\Gamma_{\theta N}$ at all returns. Then $(z_0, w_0)$ is controlled up to time $m$ if the angle condition on $w_i^*$ is satisfied at all free returns.
Lemma 4.6. Under the additional assumption that $d_C(z_i) > e^{-\alpha i}$ for all $i \le m$, we have
$$K^{-\varepsilon i} \|w_i^*\| \le \|w_i\| \le K^{\varepsilon i} e^{\alpha i} \|w_i^*\|, \qquad \varepsilon = K\alpha\theta.$$
Lemma 4.7. There exists $c' > 0$ such that for every $0 \le k < n$,
$$\|w_n^*\| \ge K^{-1} d_C(z_j) \, e^{c'(n-k)} \|w_k^*\|,$$
where $j$ is the first time $\ge k$ when a bound period extending beyond time $n$ is initiated. If no such $j$ exists, the factor $d_C(z_j)$ in the inequality above is replaced by $\delta$ in general, and by 1 if $z_n$ is a free return.
Lemma 4.8. Let $k < n$ and assume $z_n$ is free. Then
$$\|w_n\| > K^{-1} \delta \, e^{c'(n-k)} \|w_k\|,$$
with $\delta$ omitted if $z_n \in C^{(0)}$.
4.3. Controlled orbits as "guides" for other orbits. (IA2)–(IA6) are about orbits starting from $\Gamma_{\theta N}$. In Sect. 4.2 we introduced a class of orbits that successfully use orbits from $\Gamma_{\theta N}$ as their "guides". We now let these orbits serve as guides for other orbits and study the properties they pass along. This is the essence of the replication process.
Throughout Sect. 4.3 we assume that
(1) $z_0 \in C^{(0)}$, $w_0 = \binom{0}{1}$, and $(z_0, w_0)$ is controlled by $\Gamma_{\theta N}$ up to time $m$;
(2) $d_C(z_i) > \delta_0$ for $i \le n_0$ and $> e^{-\alpha i}$ for all $n_0 < i \le m$.
Observe that these conditions are satisfied by all $z_0 \in \Gamma_{\theta N}$.
Our first order of business is to establish that for all $\xi_0$ bound to $z_0$, $\hat w_i^*(\xi_0)$ copies $w_i^*(z_0)$ faithfully. A detailed proof of the following lemma is given in Appendix B.7.
Lemma 4.9 ([BC2], Lemma 7.8). Let $(z_0, w_0)$ be as above, and let $\xi_0 \in C^{(0)}$ be an arbitrary point which we think of as bound to $z_0$. Let $M_\mu(\cdot)$ and $\theta_\mu(\cdot)$ have the same meaning as in (IA6). Then the estimates for
$$\frac{M_\mu(\xi_0)}{M_\mu(z_0)}, \qquad \frac{M_\mu(z_0)}{M_\mu(\xi_0)} \qquad \text{and} \qquad |\theta_\mu(\xi_0) - \theta_\mu(z_0)|$$
as stated in (IA6) hold for all $\mu \le \min(p, m)$. The corresponding distortion estimates for two points $\xi_0$ and $\xi_0'$ bound to $z_0$ apply as well.
In the rest of this subsection we consider the situation where $z_0$ is a critical point on a $C^2(b)$-curve in the sense of Sect. 2.6 and study the quadratic behavior as this curve is iterated. More precisely, let $e_m$ be the contractive field of order $m$, which we know from Lemmas 4.6 and 4.7 is defined at $z_0$. We assume
(3) $z_0$ lies on a $C^2(b)$-curve $\gamma \subset C^{(0)}$, and $e_m(z_0)$ is tangent to $\gamma$.
For $\xi_0 \in \gamma$, let $p = p(\xi_0)$ denote the bound period between $z_0$ and $\xi_0$. We assume that during its bound period, the orbit of $\xi_0$ inherits the secondary and higher order bound structures of the orbit of $z_0$.
Lemma 4.10. In the part of $\gamma$ where $p < m$, $p$ increases monotonically with distance from $z_0$.
Proof. Proceeding inductively, we assume that on a connected subsegment $\gamma_k$ of $\gamma$ one of whose end points is $z_0$, the minimum bound period is $k$. It suffices to show that at time $k + 1$, the part of $\gamma_k$ that remains bound to $z_0$ is connected. We may assume $T^k(\gamma_k)$ is not in a secondary fold period (otherwise all of $T^{k+1}(\gamma_k)$ will be in a bound period), and that $d_C(\xi_0) > \frac{1}{2}\delta$ for all $\xi_0 \in T^k(\gamma_k)$. Let $T^k(\gamma_k) = \gamma^{(1)} \cup \gamma^{(2)}$, where $\gamma^{(1)}$ consists of points for which the primary fold period remains in effect and $\gamma^{(2)}$ is its complement. Then $\gamma^{(1)}$ is contained in a disk $B$ of radius $K^k b^{\frac{k}{2}}$ centered at $z_k$, and the bound period on no part of $B$ can expire at time $k + 1$. If the bound period of any part of $\gamma^{(2)}$ is to expire at time $k + 1$, then the far end of $\gamma^{(2)}$ must be $> K^{-1} e^{-\beta(k+1)}$ from $z_k$. Also, its tangent vectors are b-horizontal. One concludes that $T^k(\gamma) \setminus B$ is a b-horizontal connected segment which will remain horizontal in the next iterate, forcing the desired picture. □
Let $s \mapsto \xi_0(s)$ be the parametrization of $\gamma$ by arc length with $\xi_0(0) = z_0$.
The following lemma, whose proof is given in Appendix B.8, contains a distance formula for $|\xi_\mu(s) - z_\mu|$. See Sect. 2.5 for comparison with 1-d.
Lemma 4.11. Let $\varepsilon_1 > 0$ be given. Then for all $\mu \in \mathbb{Z}^+$ and $s > 0$ satisfying $\mu \le m$, $(Kb)^{\frac{\mu}{2}} < s$ and $p(\xi_0(s)) \ge \mu$, we have
$$(1 - \varepsilon_1) \, \|w_\mu(0)\| \, K_1 s^2 < |\xi_\mu(s) - z_\mu| < (1 + \varepsilon_1) \, \|w_\mu(0)\| \, K_1 s^2, \tag{9}$$
where $K_1 = \frac{1}{2} \left| \frac{dq_1}{dx}(z_0) \right|$.
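The 1-dimensional analogue of (9) is easy to check numerically. The sketch below uses the quadratic family $f(x) = 1 - ax^2$ as a stand-in, which is an assumption made purely for illustration and is not the map $T$ of the paper. Its critical point is $z_0 = 0$, the derivative along the critical orbit from $z_1$ plays the role of $\|w_\mu\|$, and $\frac{1}{2}|f''(0)| = a$ plays the role of $K_1$. The function name is hypothetical.

```python
def quadratic_turn_ratio(a, s, mu):
    """Ratio |f^mu(s) - f^mu(0)| / (|(f^(mu-1))'(z_1)| * K1 * s^2)
    for f(x) = 1 - a*x^2, with K1 = a = |f''(0)|/2.
    A ratio close to 1 is the 1-d counterpart of (9)."""
    f = lambda x: 1.0 - a * x * x
    df = lambda x: -2.0 * a * x

    z = f(0.0)          # z_1, the critical value
    w = 1.0             # derivative of f^(mu-1) at z_1, built up below
    for _ in range(mu - 1):
        w *= df(z)
        z = f(z)        # after the loop, z = z_mu

    xi = s              # xi_0 = s, a point at distance s from the critical point
    for _ in range(mu):
        xi = f(xi)      # xi_mu

    return abs(xi - z) / (abs(w) * a * s * s)

print(quadratic_turn_ratio(2.0, 1e-4, 5))
```

The ratio stays close to 1 only while $\xi_\mu$ remains close to $z_\mu$, which mirrors the hypothesis $p(\xi_0(s)) \ge \mu$ in Lemma 4.11; the condition $(Kb)^{\mu/2} < s$ has no counterpart in one dimension.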
Corollary 4.2. Assume in addition to (1)–(3) above that $\|w_j^*(z_0)\| > e^{cj}$ for all $j \le m$. Let $\xi_0 \in \gamma$. Suppose that $|\xi_0 - z_0| = e^{-h}$ and $p(\xi_0) \le m$. Then
(a) $\frac{h}{3K_2} \le p \le \frac{3h}{c}$, where $K_2 = \log \|DT\|$;
(b) $\|w_p(\xi_0)\| \cdot |\xi_0 - z_0| \ge e^{\frac{cp}{3}}$.
Proof. (a) The lower bound for $p$ follows from the fact that for all $j \le \frac{h}{3K_2}$,
$$|\xi_j - z_j| < \|DT\|^j \, |\xi_0 - z_0| < e^{-\frac{2h}{3}} \ll e^{-\beta \frac{h}{3K_2}}.$$
By Lemma 4.11, $p$ is the smallest $\mu$ such that $\|w_\mu(0)\| \cdot |z_0 - \xi_0|^2 > K_1^{-1} e^{-\beta\mu}$. This must happen for some $\mu \le \frac{3h}{c}$ because
$$\|w_{\frac{3h}{c}}(z_0)\| \cdot |z_0 - \xi_0|^2 > K^{-\varepsilon \frac{3h}{c}} \, \|w^*_{\frac{3h}{c}}(z_0)\| \cdot |z_0 - \xi_0|^2 > K^{-\varepsilon \frac{3h}{c}} \, e^{c \cdot \frac{3h}{c}} \, e^{-2h} > 1.$$
(b) This follows from the fact that $\|w_p(\xi_0)\| \approx \|w_p(z_0)\|$ (Lemma 4.9) and
$$|z_0 - \xi_0| \cdot \|w_p(\xi_0)\| > e^{-\frac{\beta}{2} p} \, \|w_p(\xi_0)\|^{\frac{1}{2}} > e^{-\frac{\beta}{2} p} \, e^{\frac{cp}{2}} > e^{\frac{cp}{3}}. \qquad \Box$$
In analogy with Definition 3.3, we define for $\xi_0(s) \in \gamma$ the notion of a fold period $\ell$ with respect to $z_0$. This is the number $\ell$ such that $(Kb)^{\frac{\ell}{2}} \approx s$. If $\tau_0(\xi_0)$, the unit tangent vector to $\gamma$ at $\xi_0$, is split according to this definition, then the rejoining of the $E_i$-vector for $\ell < i < p$ has negligible effect. We may assume also that as we iterate, the subsegment of $\gamma$ bound to $z_0$ acquires the same fold periods as $z_i$, and think of these as secondary fold periods for $\xi_i$.
Corollary 4.3. Let the assumptions and notation be as in Corollary 4.2. We let $p = p(\xi_0)$, where $|\xi_0 - z_0| = e^{-h}$, and assume that $z_p$ is not in a fold period. Then
(a) the subsegment of $T^p \gamma$ between $\xi_p$ and $z_p$ contains a curve $\ge e^{-K\beta h}$ in length and with b-horizontal tangent vectors;
(b) $\|\tau_p(\xi_0)\| \ge K^{-1} e^{h(1 - \beta K)}$.
Proof. (a) By definition, $|\xi_p - z_p| > e^{-\beta p}$. The part of $T^p \gamma$ in a fold period with respect to $z_0$ has length $\le (Kb)^{\frac{p}{2}} \|DT\|^p$, and the rest have b-horizontal tangent vectors. To convert these estimates in $p$ into bounds involving $h$, use Corollary 4.2(a).
(b) Splitting $\tau_0$ using $e_p$, we see that $\|w_p\| \sim e^h \|\tau_p\|$. Combining this with Lemmas 4.11 and 4.9, we have
$$e^h \|\tau_p(\xi_0)\| \sim \|w_p(\xi_0)\| \approx \|w_p(z_0)\| > K^{-1} |\xi_p - z_p| \, e^{2h} \ge K^{-1} e^{-K\beta h} e^{2h}. \qquad \Box$$
5. Pushing the Induction Forward
The goal of this section is to define $\Delta_{3N}$ and to prove that (IA1)–(IA6) hold up to time $3N$ for parameters in $\Delta_{3N}$. The key to this inductive step is the correct splitting of the $w_i^*$-vectors at free returns (Proposition 5.2). This is proved with the aid of another important fact, namely the control of points in $\partial R_k$ (Proposition 5.1).
5.1. Control of $\partial R_k$, $k \le \theta N$. For $z \in \partial R_k$, let $\tau(z)$ denote a unit tangent vector to $\partial R_k$ at $z$.
Proposition 5.1. For every $\xi_0 \in \partial R_0$ and every $k \le \theta N$, $(\xi_0, \tau_0)$ with $\tau_0 = \tau(\xi_0)$ is controlled up to time $k$ by $\Gamma_k$.
Proof. The proof proceeds by induction. The correctness of splitting of $\tau_0$ is evident. We assume all $(\xi_0, \tau_0)$ have been controlled up to time $k - 1$, so that it makes sense to speak of $\xi_k$ as being in a bound or free state. Suppose $\xi_k$ is bound to $z_i$ for some $z_0 \in \Gamma_{k-1}$. Since $d_C(z_i) > e^{-\alpha i}$, we have $z_i \in C^{(j)} \setminus C^{(j+1)}$ for some $j \ll i \le k$. By Lemma 4.2, $\xi_k$ is h-related to $\Gamma_k$, and by Lemma 4.5, $\tau_k^*$ splits correctly, proving control at step $k$. Before proceeding to the free case, we state a lemma of independent interest:
Lemma 5.1. Let $\gamma$ be a subsegment of $\partial R_k$. If all the points on $\gamma$ are free, then $\gamma$ is a $C^2(b)$-curve.
Proof. That $\tau_k$ is a b-horizontal vector is an immediate consequence of the splitting algorithm. As for curvature, we appeal to Lemma 2.4 after using Lemma 4.8 to establish that $\|\tau_k\| > K^{-1} \delta \, e^{c'(k-i)} \|\tau_i\|$ for all $i < k$. □
Returning to the proof of Proposition 5.1, let $\xi_k$ be a free return, and let $\gamma$ be the maximal free subsegment of $\partial R_k$ containing $\xi_k$. Since the end points of $\gamma$ are in bound state, they cannot be in $C^{(k-1)}$ as explained earlier. This leaves two possibilities for the relation between $\gamma$ and $C^{(k-1)}$.
Case 1. $\gamma$ passes through the entire length of a component of $C^{(k-1)}$. In this case we know from (IA1) that there is a critical point $z_0 \in \gamma$. To see that every $\xi \in \gamma \cap C^{(0)}$ is h-related to $\Gamma_k$, start from $z_0$ and move away from it along $\gamma$. Using the $C^2(b)$ property of $\gamma$, the structure of critical regions (see (IA1)) and the fact that $\gamma \cap \partial R_i = \emptyset$ for all $i < k$, we observe that after leaving $\partial Q^{(k)}(z_0)$ one gets into $Q^{(k-1)}(z_0)$, then $Q^{(k-2)}(z_0)$, and so on, with $d_C(\xi) \ge \rho^i$ for $\xi \in Q^{(i-1)}(z_0) \setminus Q^{(i)}(z_0)$. For the splitting of $\tau(\xi)$, it follows from Lemma 4.1 and the $C^2(b)$ property of $\gamma$ that for $\xi \in \gamma \cap Q^{(i-1)}(z_0) \setminus Q^{(i)}(z_0)$,
$$\angle(\tau(\xi), \tau(\phi(\xi))) \le \angle(\tau(\xi), \tau(z_0)) + \angle(\tau(z_0), \tau(\phi(\xi))) < (Kb)|\xi - z_0| + (Kb)^{\frac{i-1}{4}} < \varepsilon_0 d_C(\xi).$$
Case 2. $\gamma$ does not intersect $C^{(k-1)}$. Let $j < k$ be the largest integer such that $\gamma \cap C^{(j-1)} \ne \emptyset$. Then there exists $z \in \gamma \cap (\hat Q^{(j)} \setminus Q^{(j)})$ for some $Q^{(j)}$. Suppose for definiteness that $z$ lies in the right component of $\hat Q^{(j)} \setminus Q^{(j)}$. Moving left along $\gamma$ from $z$, we note that since $\gamma \cap Q^{(j)} = \emptyset$, the left end point $\hat z$ of $\gamma$ must also be in the same component of $\hat Q^{(j)} \setminus Q^{(j)}$. H-relatedness and correct splitting are now proved as in Case 1 with $\hat z$ playing the role of $z_0$. We know $\tau(\hat z)$ splits correctly because $\hat z$ is, by definition, in a bound state. □
5.2. Extending control of $\Gamma_{\theta N}$-orbits to time $3N$. We continue to assume (IA1)–(IA6). Let $z_0 \in \Gamma_{\theta N}$ and $w_0 = \binom{0}{1}$. The next proposition plays a key role in the inductive process.
Proposition 5.2. If $z_0 \in \Gamma_{\theta N}$ satisfies $d_C(z_i) > e^{-\alpha i}$ for all $i \le 3N$, then $(z_0, w_0)$ is automatically controlled by $\Gamma_{\theta N}$ up to time $3N$. In fact, we have the following stronger results on the angle of splitting:
(i) If $z_i$ is a free return, then $\angle(w_i, \tau(\phi(z))) \ll \varepsilon_0 d_C(z)$.
(ii) If $z_i$ is a bound return, then $\angle(w_i^*, \tau(\phi(z))) < \varepsilon_0 d_C(z)$.
Proof. From the condition that $d_C(z_i) > e^{-\alpha i}$, we know that $z_0$ is h-related to $\Gamma_{\theta N}$ up to time $3N$ (see the remark following (IA2) in Sect. 3.3.2), and that $p < K\alpha 3N \ll N$. To prove the assertions on the angle of splitting, we proceed inductively, assuming they are valid up to time $k - 1$ for some $k \le 3N$.
First, we consider the case where $z_k$ is a free return. Then either $z_k \in \hat Q^{(j)} \setminus Q^{(j)}$ for some $j \le \theta N$, or $z_k \in C^{([\theta N])}$. In the latter case we let $j = [\theta N]$ for purposes of the following arguments.
Claim 5.1. There exists $j'$, $\frac{1}{2} j \le j' < j$, such that if
$$\xi_0 = z_{k-j'} \quad \text{and} \quad u_0 = \frac{w_{k-j'}(z_0)}{\|w_{k-j'}(z_0)\|},$$
then for $0 \le s < j'$,
$$\|DT^s(\xi_0) u_0\| \ge \|DT\|^{-s}.$$
Proof of Claim 5.1. We consider the graph $G$ of $i \mapsto \log \|w_i(z_0)\|$ for $k - j < i \le k$. Let $L$ be the (infinite) line through $(k, \log \|w_k\|)$ with slope $\log \|DT\|$. Then clearly, all the points in $G$ lie above $L$. Let $P$ be the intersection of $L$ with the line $x = k - \frac{1}{2} j$. We let $L$ be pivoted at $P$ and rotate it clockwise until it hits some point in $G$. (Draw a picture!) Let $k - j'$ be the first coordinate of the first point hit. Then $\frac{1}{2} j \le j' < j$, and Claim 5.1 is proved if we can show that in its final position, the slope of $L$ is $\ge -\log \|DT\|$. This is true because $z_k$ being a free return, $\|w_{k-j}\| < \|w_k\|$ by Lemma 4.8, so the straight line joining the two points $(k - j, \log \|w_{k-j}\|)$ and $(k, \log \|w_k\|)$ has slope $\ge -\log \|DT\|$. (This result also follows from a lemma of Pliss; see e.g. [Ma].) □
Now by Lemma 2.3, there exists an integral curve $\gamma$ of the most contracted field of order $j'$ through $\xi_0$ having length $O(1)$. Since $\gamma$ follows roughly the direction of $e_1$, it has slope $> K^{-1}\delta$ outside of $C^{(0)}$ and is roughly a parabola inside $C^{(0)}$ (Lemma 2.9). In both cases, $\gamma$ meets $\partial R_0$. Let $\xi_0' \in \gamma \cap \partial R_0$. Then $|\xi_s - \xi_s'| < (K^2 b)^s$ for all $0 \le s \le j'$. Our next claim is made possible by Proposition 5.1.
Claim 5.2. $\xi'_{j'}$ is a free return.
Proof of Claim 5.2. If not, then $\xi'_{j'}$ would be bound to $\hat z$, a point on a critical orbit, and we would have $\xi_{j'}, \xi'_{j'} \in \hat Q^{(i)}(\hat z)$ for some $i \ll j' < j$ with $d_C(\xi_{j'}) \approx d_C(\xi'_{j'}) \approx d_C(\hat z) > e^{-\alpha j'}$. This contradicts our assumption that $\xi_{j'} = z_k$ is in $\hat Q^{(j)}$ or in $C^{([\theta N])}$, for in either case, $d_C(z_k) < \rho^{j-1}$. □
Claim 5.3. With $u_0$ as in Claim 5.1, let
$$\tau_i = DT^i(\xi_0') \tau_0, \qquad u_i = DT^i(\xi_0) u_0,$$
and let $\theta_i$ be the angle between $u_i$ and $\tau_i$. Then $\theta_{j'} \le b^{\frac{j'}{2}}$.
Proof of Claim 5.3. Write $A = DT(\xi_{i-1})$ and $A' = DT(\xi'_{i-1})$. Then
$$\theta_i = \frac{\|\tau_i \times u_i\|}{\|\tau_i\| \cdot \|u_i\|} = \frac{1}{\|\tau_i\| \cdot \|u_i\|} \, \| A' \tau_{i-1} \times A' u_{i-1} + A' \tau_{i-1} \times (A - A') u_{i-1} \|$$
$$\le \frac{\|\tau_{i-1}\|}{\|\tau_i\|} \cdot \frac{\|u_{i-1}\|}{\|u_i\|} \left( |\det(A')| \, \theta_{i-1} + K |\xi_{i-1} - \xi'_{i-1}| \right) \le \frac{\|\tau_{i-1}\|}{\|\tau_i\|} \cdot \frac{\|u_{i-1}\|}{\|u_i\|} \left( b \, \theta_{i-1} + K (K^2 b)^{i-1} \right).$$
Applying this relation for $\theta_i$ recursively, we obtain
$$\theta_{j'} < \left[ \sum_{i=0}^{j'} \frac{\|\tau_i\|}{\|\tau_{j'}\|} \cdot \frac{\|u_i\|}{\|u_{j'}\|} \right] (K^2 b)^{j'-1}.$$
Since both $z_k$ and $\xi'_{j'}$ are free returns, we may use Lemma 4.8 to bound the sum in brackets, completing the proof of Claim 5.3. □
We are finally ready to prove our assertion on the angle of splitting for the free return $z_k$. Recall that $\xi_{j'} = z_k \in \hat Q^{(j)}$ or $Q^{([\theta N])}$. Since $|\xi_{j'} - \xi'_{j'}| < (K^2 b)^{j'}$, $\xi'_{j'} \in \partial R_{j'}$ and $j' < j$, we have $\xi'_{j'} \in \partial Q^{(j')}(z_k)$. By our inductive hypothesis, $\tau_{j'}(\xi_0')$ splits correctly. Since $\angle(w_k(z_0), \tau(\xi'_{j'})) \le b^{\frac{j'}{2}}$ (Claim 5.3), $\angle(\tau(\phi(\xi'_{j'})), \tau(\phi(z_k))) = O(b^{\frac{j'}{4}})$ and $|d_C(\xi'_{j'}) - d_C(z_k)| = O(b^{\frac{j'}{4}})$ (Lemma 4.1), it suffices to prove that $b^{\frac{j'}{4}} \ll d_C(z_k)^2$. In the case where $z_k \in \hat Q^{(j)} \setminus Q^{(j)}$, this is trivial as $d_C(z_k) \sim \rho^j$. In the case where $z_k \in Q^{([\theta N])}$, since $d_C(z_k) > e^{-\alpha k}$, we have $d_C(z_k)^2 > e^{-6\alpha N}$, which we may assume is $\gg b^{\frac{1}{12}\theta N} \ge b^{\frac{1}{4} j'}$.
To complete our proof of Proposition 5.2, we now consider the case when $z_k$ is a bound return. Our argument is along the lines of Lemma 4.5, with the following modifications to get the sharper result claimed in assertion (ii). The argument in the proof of Lemma 4.5 transfers the problem of estimating the angle of splitting in question to that of estimating $\angle(DT^{k-j}(\hat z_0) u, \tau(\phi(\hat z_{k-j})))$, where $\hat z_0 \in \Gamma_{\theta N}$ is a binding point for $z_j$ for some $j < k$ (and $u$ is as in Lemma 4.5). If $\hat z_{k-j}$ is a free return, then this angle is $\ll \varepsilon_0 d_C(z_k)$ according to assertion (i) in this proposition, and we are done.
Suppose $\hat z_{k-j}$ is not free. For definiteness, let us first consider the case where $\hat z_{k-j}$ is bound to the orbit of $\hat z_0'$ with a binding initiated at time $j'$, $j < j' < k$, and that $\hat z'_{k-j'}$ is a free return. We first apply assertion (i) of this proposition to $\angle(DT^{k-j'}(\hat z_0') u', \tau(\phi(\hat z'_{k-j'})))$. Based on this information, we use our modified version of Lemma 4.5 to first estimate $\angle(DT^{k-j}(\hat z_0) u, \tau(\phi(\hat z_{k-j})))$, and then repeat the argument to estimate the angle of splitting of $w_k^*$. Observe that here $d_C(z_k) \approx d_C(\hat z_{k-j})$.
In general, $\hat z'_{k-j'}$ may not be free, in which case we consider the critical orbit it is following, and so on. It may take several steps before we arrive at the situation of a guiding critical orbit making a free return. We need to argue that the errors in these successive approximations do not add up. They do not, because for the same reason that $d_C(z_k) \approx d_C(\hat z_{k-j})$ above, each approximation guarantees that $d_C(z_k)$ is bigger, so that the errors in the constant in front of $d_C(z_k)$ form a geometric series, the sum of which we may assume is $< \varepsilon_0$. This completes the proof of Proposition 5.2. □
5.3. Verification of (IA1)–(IA6) up to time $3N$.
Step 1. Deletion of parameters. We delete from $\Delta_N$ all $(a, b)$ for which there exists $z_0 \in \Gamma_{\theta N}$ and $i$, $N < i \le 3N$, such that
$$d_C(z_i) < e^{-\alpha i} \quad \text{or} \quad \|w_i^*(z_0)\| < e^{ci}.$$
The set of remaining parameters is called $\Delta_{3N}$. We do not claim in (IA1)–(IA6) that $\Delta_{3N}$ has positive measure or even that it is nonempty; this is discussed in Sect. 6. Steps 2–5 below apply to $T = T_{a,b}$ for $(a, b) \in \Delta_{3N}$.
Step 2. Updating of $\Gamma_{\theta N}$. For each $z_0 \in \Gamma_{\theta N}$, since $\|w_i\|$ grows exponentially (Step 1 and Lemma 4.6), there exists a unique $z_0'$ on the component of $\partial C^{(k)}$ containing $z_0$ that is a critical point of order $3N$ (Lemma 2.10). Let $\Gamma'_{\theta N}$ be the set of these $z_0'$, i.e. $\Gamma'_{\theta N}$ is a copy of $\Gamma_{\theta N}$ updated to order $3N$.
Step 3. Construction of $\Gamma_{3\theta N}$ and $C^{(k)}$, $\theta N < k \le 3\theta N$. We proceed inductively, assuming all has been accomplished for $k - 1$. First we establish control of $\partial R_k$ as in Sect. 5.1, with one minor (technical) difference to be explained below. It follows that $R_k$ meets $C^{(k-1)}$ in at most a finite number of components bounded by free, and hence $C^2(b)$, curves.
Next, we construct critical points on $\partial R_k$. Let $Q$ be one of the components of $R_k \cap C^{(k-1)}$, and let $\gamma$ be one of its horizontal boundaries. By Lemma 2.11, there exists a critical point $\hat z_0 \in \gamma$ of order $\hat m = \min\{3N, -\log d(z_0', \gamma)^{\frac{1}{2}}\}$, where $z_0' \in \Gamma'_{\theta N}$ lies on the boundary of the component $Q^{([\theta N])}$ containing $\gamma$. Since $d(z_0', \gamma) = O(b^{\frac{\theta N}{2}})$, we have, assuming $\theta$ is chosen with $e^{-3N} > K^{-N} > b^{\frac{\theta N}{4}}$, that $\hat z_0$ is of order $3N$.
The critical regions $C^{(k)}$ are then constructed as follows: For each $Q$ as above, choose one of the critical points on $\partial Q$, and define $Q^{(k)} := \{\xi \in Q : \text{the horizontal distance between } \xi \text{ and } \hat z_0 \text{ is} \le \rho^k\}$. This is the component of $C^{(k)}$ in $Q$.
To continue, we need to set bindings for points in $\partial R_k$. Technically, only $z_0 \in \Gamma_{\theta N}$ (and not the critical points on $\partial R_i$, $\theta N < i \le k$) can be used. This is of no concern to us for the following reason: for $k'$ with $k < k' \le 3\theta N$, only those parts of $\partial R_{k'}$ that are free are involved in the construction of $C^{(k')}$; and for $\xi_0 \in \partial R_k \cap C^{([\theta N])}$, independent of which $z_0 \in Q^{([\theta N])}(\xi_0)$ we think of it as bound to, $\xi_i$ will remain in bound state through time $3\theta N$ because $|\xi_i - z_i| \le K^{3\theta N} \rho^{\theta N} \ll e^{-3\beta\theta N}$.
This completes the constructive procedure. The critical points in $\partial R_k$, $N < k \le 3N$, together with $\Gamma_{\theta N}$ form $\Gamma_{3\theta N}$.
To complete the verification of (IA1) up to time $3N$, we need to explain the uniqueness of $\hat z_0$ as a critical point of order $3N$ on $\gamma$. Since $e_1$ is defined everywhere in $C^{(0)}$ and has derivative $> K^{-1}$, while $\gamma$ is a $C^2(b)$-curve, Corollary 2.1 limits the possibility of any critical points to an interval of length $O(b)$. On this interval, $e_2$ is defined, further limiting the candidates for critical points to an interval of length $O(b^2)$, etc. Finally, the bound on the Hausdorff distance between the two horizontal boundaries of $Q^{(k)}$ as stated in Theorem 1 is a triviality and not an inductive fact: it is true because $\mathrm{area}(R_k) \le |\det(DT^k)| < (Kb)^k$ and the two horizontal boundaries of $Q^{(k)}$ are roughly parallel.
Step 4. Updating the definitions of $d_C(\cdot)$ and $\phi(\cdot)$. Using $\Gamma_{3\theta N}$ and $C^{(k)}$, $k \le [3\theta N]$, we reset these definitions for $z \in C^{([\theta N])}$ in accordance with Definition 3.2. Since $|\text{old } \phi(z) - \text{new } \phi(z)| = O(b^{\frac{\theta N}{4}})$ and $|\tau(\text{old } \phi(z)) - \tau(\text{new } \phi(z))| = O(b^{\frac{\theta N}{4}})$ (Lemma 4.1), these changes have essentially no effect on the correctness of splitting for points with $d_C(\cdot) > b^{\frac{3\theta N}{20}}$. The relations in (IA5) are also not affected.
Step 5. Verification of (IA2)–(IA6) for $i \le 3N$. This is carried out in 3 stages.
(1) First we argue that for $z_0 \in \Gamma_{\theta N}$ (we really mean $\Gamma_{\theta N}$, not $\Gamma'_{\theta N}$), (IA2)–(IA6) hold for $i \le 3N$: (IA2) and (IA4) hold by design; (IA3) is given by Proposition 5.2, and (IA5) and (IA6) are proved in Sect. 4.3 with $m = 3N$.
(2) With the properties of $\Gamma_{\theta N}$ in (1) having been established, we observe that continuing to use $\Gamma_{\theta N}$ as the source of control, the material in Sects. 4.2 and 4.3 is now valid for times up to $\min(m, 3N)$.
(3) We are now ready to argue that (IA2)–(IA6) hold for all $z_0 \in \Gamma_{3\theta N}$. For each $z_0 \in \Gamma_{3\theta N}$, whether it is in $\Gamma'_{\theta N}$ or of generation $> \theta N$, there exists $z_0' \in \Gamma_{\theta N}$ such that $|z_0 - z_0'| = O(b^{\frac{\theta N}{4}})$. This implies, for $i \le 3N$, that $|z_i - z_i'| < b^{\frac{\theta N}{4}} \|DT\|^{3N} \ll e^{-\beta 3N}$ provided $\theta$ is chosen so that $b^{\frac{\theta}{4}} \|DT\|^3 < \frac{1}{2} e^{-\beta}$. (IA2) follows immediately from the corresponding condition for $z_0'$. Regarding $z_0$ as bound to $z_0'$ for at least $3N$ iterates, (IA3) and (IA4) follow from property (IA6) of $z_0'$. Finally, regarding $(z_0, \binom{0}{1})$ as controlled by $\Gamma_{\theta N}$ up to time $3N$, we obtain (IA5) and (IA6) from Lemmas 4.9–4.11 and Corollary 4.2.
Conclusions from Sections 3–5. After letting $N$ go to infinity, we have defined for each $T = T_{a,b}$ with $(a, b) \in \Delta := \cap_N \Delta_N$ a set $C$ given by $C = \cap_{i \ge 0} C^{(i)}$. This is the critical set in Theorem 1.1. Let $\Gamma$ be the set to which $\Gamma_{\theta N}$ converges as $N \to \infty$. An equivalent characterization of $C$ is that it is the set of accumulation points of $\Gamma$. Clearly, the properties that $d_C(z_i) \ge e^{-\alpha i}$ and $\|w_i\|$ grows exponentially are passed on to points in $C$. We have thus completed the proof of Theorem 1.1 modulo the positivity of the measure of $\Delta$.
6. Measure of Selected Parameters
In this section we fix $b > 0$ and consider the 1-parameter family $a \mapsto T_{a,b}$. Let $\Delta_b = \{a : (a, b) \in \Delta\}$. The Lebesgue measure of a set $A \subset \mathbb{R}$ is denoted by $|A|$. More generally, we use $|\cdot|$ to denote the measure on curves induced by arc length. The purpose of this section is to prove that $|\Delta_b| > 0$ for all sufficiently small $b > 0$.
6.1. Phase-space dynamics and curves of critical orbits. Assuming $\delta = e^{-\mu^*}$ for some $\mu^* \in \mathbb{Z}^+$, let $P = \{I_{\mu j}\}$ be the partition of the interval $(-\delta, \delta)$ defined as follows: for $\mu \ge \mu^*$, let $I_\mu = (e^{-(\mu+1)}, e^{-\mu})$, and let each $I_\mu$ be further subdivided into $\mu^2$ subintervals of equal length called $I_{\mu j}$, $j = 1, 2, \cdots, \mu^2$; for $\mu \le -\mu^*$, let $I_{\mu j} = -I_{(-\mu) j}$.
Next let $\gamma$ be a curve with nearly horizontal tangent vectors. We assume for simplicity that $\gamma$ meets only one component $Q^{(0)}$ of $C^{(0)}$, and let $\hat z = (\hat x, \hat y)$ be a point near the center of $Q^{(0)}$. The partition $P_{\gamma, \hat z}$ on $\gamma$ is defined to be $(\psi^{-1} P)|_\gamma \cup \{I^\pm\}$, where $\psi(x, y) = x - \hat x$ and $I^\pm$ are the two components of $\gamma \setminus \psi^{-1}(-\delta, \delta)$. An element of $P_{\gamma, \hat z}$ is said to have "full length" if its image under $\psi$ is either equal to some $I_{\mu j}$ or longer than all the $I_{\mu j}$'s. When $\gamma$ and $\hat z$ are understood, we often refer to $P_{\gamma, \hat z}$ simply as $P$ and $(\psi^{-1} I_{\mu j}) \cap \gamma$ as $I_{\mu j}$.
Before proceeding to the estimation of $|\Delta_b|$, we consider first the following problem in phase-space dynamics. The estimation of $|\Delta_b|$ includes an argument parallel to and more complicated than this.
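The partition $P = \{I_{\mu j}\}$ of Sect. 6.1 is simple to generate explicitly. The sketch below is illustrative only: the function name is hypothetical, the cutoff `mu_max` is an assumption introduced because the actual partition has infinitely many elements accumulating at 0, and the left-to-right numbering of the subintervals $j = 1, \dots, \mu^2$ is a choice made here for concreteness.

```python
import math

def partition_P(mu_star, mu_max):
    """Return {(mu, j): (lo, hi)} for mu_star <= |mu| <= mu_max.

    I_mu = (e^{-(mu+1)}, e^{-mu}) is cut into mu^2 equal pieces I_{mu j};
    for negative mu, I_{mu j} = -I_{(-mu) j}.
    mu_max is an artificial truncation (assumption of this sketch).
    """
    intervals = {}
    for mu in range(mu_star, mu_max + 1):
        left, right = math.exp(-(mu + 1)), math.exp(-mu)
        width = (right - left) / mu**2
        for j in range(1, mu**2 + 1):
            lo = left + (j - 1) * width
            hi = left + j * width
            intervals[(mu, j)] = (lo, hi)
            intervals[(-mu, j)] = (-hi, -lo)   # mirror image on the negative side
    return intervals

P = partition_P(mu_star=3, mu_max=5)
print(len(P), P[(3, 1)], P[(-3, 1)])
```

Each $I_{\mu j}$ has length $(1 - e^{-1}) e^{-\mu} / \mu^2$, so points of $I_{\mu j}$ lie at distance comparable to $e^{-|\mu|}$ from the origin while the pieces are much shorter than that distance.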
A model problem in phase-space dynamics. Let $T = T_{a,b}$ with $(a,b) \in \Delta$. Recall from the proof of Proposition 5.1 that if $\gamma \subset \partial R_k$ is a maximal free segment meeting some $Q^{(0)}$, then either $\gamma \cap Q^{(0)}$ contains a critical point $\hat z \in \Gamma$ or the entire segment $\gamma \cap Q^{(0)}$ is h-related to some $\hat z \in \Gamma$. In both cases, $P_{\gamma,\hat z}$ is the partition of choice on $\gamma$. Note that for $z \in I_{\mu j}$, $d_C(z) \approx e^{-|\mu|}$. Let $\omega_0$ be a subsegment of $\partial R_0$, and write $\omega_i := T^i \omega_0$. We assume that (i) for all $z_0 \in \omega_0$, $d_C(z_i) > e^{-\alpha i}$ for all $i \le N$, and (ii) $\omega_N$ is free and is approximately equal to some $I_{\mu_0 j_0}$. The problem is to find a lower estimate for the measure of $\{z_0 \in \omega_0 : d_C(z_i) > e^{-\alpha i} \text{ for all } i\}$.

We may assume that all the points in $\omega_N$ have the same bound period, and let $i_1 > N$ be the first moment in time after the expiration of this bound period when $\omega_{i_1} \cap C^{(0)}$ contains an $I_{\mu j}$ of full length. This must happen at some point, for the length of $\omega_i$ grows by a factor $> K$ between successive free returns (Corollary 4.3). It is easy to check that $d_C > e^{-\alpha i}$ is not violated between times $N$ and $i_1$. Let $\{\omega\}$ be the partition $P$ on $\omega_{i_1}$ with end segments attached to their neighbors if they are not of full length. We delete those $\omega$'s that contain some $z$ with $d_C(z) < e^{-\alpha i_1}$. For each $\omega$ that is kept, we repeat the procedure above with $\omega$ in the place of $\omega_N$, that is, we iterate until $\omega$ makes a free return at time $i_2 = i_2(\omega)$ with $T^{i_2 - i_1}\omega$ containing an $I_{\mu j}$ of full length. We then partition $T^{i_2 - i_1}\omega$, discard subsegments that violate $d_C > e^{-\alpha i_2}$, and continue to iterate the rest.

We estimate the fraction of $\omega_{i_1}$ deleted at time $i_1$ as follows. Since $\omega_N \approx I_{\mu_0 j_0}$, the bound period $p$ is $\le K|\mu_0|$. From Corollary 4.3, we see that $\omega_{i_1}$ has length $> \frac{K^{-1}}{\mu_0^2} e^{-\beta K|\mu_0|} > e^{-2\beta|\mu_0|K}$. Now $|\mu_0| \le \alpha N$ and $i_1 > N + p_0$, where $p_0 > 0$ is a lower bound for all bound periods. Then

$$\frac{|\{z \in \omega_{i_1} : d_C(z) < e^{-\alpha i_1}\}|}{|\omega_{i_1}|} < \frac{2e^{-\alpha(N+p_0)}}{e^{-2K\alpha\beta N}} < e^{-\frac{1}{2}\alpha N}$$

assuming $N$ is sufficiently large. Similarly, for each subsegment $\omega \approx I_{\mu j}$ of $\omega_{i_1}$ that is kept, the fraction of $T^{i_2 - i_1}\omega$ deleted at time $i_2$ is $< e^{-\frac{1}{2}\alpha i_1} < e^{-\frac{1}{2}\alpha(N+p_0)}$, and so on. To estimate the total measure of $\omega_0$ deleted, these fractions have to be pulled back to $\omega_0$. This involves a distortion estimate for $DT^i$ along certain subsegments of $\partial R_k$. Using the fact that this distortion is uniformly bounded (Lemma 8.2), we see that the fraction of $\omega_0$ deleted in this procedure is $< K \sum_i e^{-\frac{1}{2}\alpha(N + i p_0)} < K e^{-\frac{1}{2}\alpha N}$. We remark that the scheme in this paragraph relies on the fact that $\omega_N$ has a certain minimum length depending on $N$; otherwise the entire segment may be obliterated before time $i_1$ is reached.

Strategy for estimating $|\Delta_b|$. Since $b$ is fixed throughout this discussion, let us for notational simplicity omit mention of it and write $\Delta$, $\Delta_N$ and $T_a$ instead of $\Delta_b$, $\Delta_b \cap \Delta_N$ and $T_{a,b}$. Let $N$ be fixed. The problem is to estimate the measure of parameters deleted between times $N$ and $3N$. Our strategy is as follows: For $\hat a \in \Delta_N$ and $z_0 \in \Gamma_{\theta N}(\hat a)$, let $a \mapsto z_0(a)$ be defined on an interval containing $\hat a$. We consider

$$\gamma_0 \to \gamma_1 \to \gamma_2 \to \cdots, \qquad \text{where } \gamma_i(a) := z_i(a) = T_a^i(z_0(a)),$$

and estimate the measure of the set of $a$ for which $z_i(a)$ violates (IA2) or (IA4). The idea behind this line of proof is that qualitatively, the evolution of $\gamma_0$ is similar to that of $\omega_0$ in the model phase-space problem. If this is true, then the measure deleted on account of (IA2) can be estimated analogously. To understand why the $\gamma_i$'s behave
like phase curves, i.e. curves that are obtained through the iteration of $T_a$, observe the way in which $\frac{d}{da}\gamma_i$, the tangent vector to the curve $a \mapsto \gamma_i(a)$, is transformed: if $\|\frac{d}{da}\gamma_i(a)\| \gg 1$, then $\frac{d}{da}\gamma_{i+1}(a) \approx DT_a(\gamma_i(a)) \frac{d}{da}\gamma_i(a)$; that is to say, $\gamma_{i+1} \approx T_a \circ \gamma_i$ near $\gamma_i(a)$.

Issues to be addressed.

1. Similarity of space- and a-derivatives. This is the first and most important step in justifying the thinking in the last paragraph. Let $\gamma_0$ be as above. In Sect. 6.2, we show that $\|\frac{d}{da}\gamma_i\| \sim \|DT^i \frac{d}{da}\gamma_0\|$ or $\|DT^i \binom{0}{1}\|$. As we will see, this is made possible by our transversality condition on $\{f_a\}$ in Sect. 1.1. The only other prerequisite for this comparison is that the slopes of $\gamma_0$ be suitably bounded. This is verified in Sect. 6.3 for curves corresponding to critical points of all generations and all orders.

2. Dynamics of the curves $a \mapsto \gamma_i(a)$. Our next step is to show that as curves parametrized by $a$, the $\gamma_i$ have properties similar to those of $\omega_i$. For example, with $\Gamma_{\theta N}$ moving with $a$, how is $d_C(z_i(a))$ affected? Other properties include the geometry of free segments, quadratic behavior of the type in Sect. 4.3, distortion estimates along $\gamma_i$, etc. These questions are discussed in Sect. 6.4.

3. Deletions of parameters in violation of (IA2) or (IA4). We consider $z_0 \in \Gamma_{\theta N}$ one at a time, and let $\gamma_0$ be the corresponding curve of critical points. Assuming the success of the last step, deletions on $\gamma_0$ on account of (IA2) are estimated following the scheme outlined in the model problem. Estimates for the measure of parameters deleted on account of (IA4) are discussed in Sect. 6.5.

4. Combined effect of deletions corresponding to all $z_0 \in \Gamma_{\theta N}$. Obviously, we need to multiply the measure of the parameters deleted on each $\gamma_0$ by the cardinality of $\Gamma_{\theta N}$, but there are technical considerations: As in our phase-space model, to get started we need $\gamma_N$ to have a certain minimum length. This raises the question of the length of the parameter interval on which each $a \mapsto z_0(a)$ can be continued (this problem appears already in Sect. 6.3). Also relevant is the combined effect of deletions on all critical curves prior to time $N$. The final estimate is made in Sect. 6.6.

The idea to relate parameter-space dynamics to phase-space dynamics is, of course, not new. Two results on 1-dimensional maps are cited without proof and used in this section: a transversality condition from [TTY] is used in Sect. 6.2 and a large deviation estimate from [BC2] is used in Sect. 6.5.

6.2. Equivalence of space- and a-derivatives. The setting of this subsection is as follows: For fixed $b > 0$, let $\hat a$ be such that $z_0 = z_0(\hat a) \in \Gamma_{\theta N}(\hat a)$ obeys the conditions in (IA2) and (IA4) and the conclusions of Lemmas 4.6–4.8 up to time $n$. This assumes implicitly that all the binding structures needed for the last sentence to make sense are in place. (See the first part of Sect. 6.5 for a more detailed discussion.) We assume also that $z_0(\hat a)$ has a smooth continuation $a \mapsto z_0(a)$ to an $a$-interval containing $\hat a$. Let

$$w_i = DT^i_{\hat a}(z_0(\hat a)) \binom{0}{1} \qquad \text{and} \qquad \tau_i = \frac{dz_i}{da}(\hat a).$$

The goal of this subsection is to compare $w_i$ and $\tau_i$. Let $\tau_0 = (\tau_{0,1}, \tau_{0,2})$.

Proposition 6.1. Given $\bar\tau > 0$, there exist constants $\lambda_2 > \lambda_1 > 0$ and a small $\varepsilon > 0$ such that the following holds: If $(\hat a, b)$ is sufficiently near $(a^*, 0)$, $z_0(\hat a)$ is as above, $\|\tau_0\| < \bar\tau$ and $|\tau_{0,2}| < \varepsilon$, then for all $i \le n$,

$$\lambda_1 \le \frac{\|\tau_i\|}{\|w_i\|} \le \lambda_2.$$
We will show below that once we have $\|\tau_i\| \sim \|w_i\|$ for some $i$ with $\|w_i\|$ sufficiently large, then this relationship will hold from there on. The estimate for the initial stretch is guaranteed by our transversality condition on the 1-dimensional family $\{f_a\}$.

We recall a relevant result from 1-dimension: Let $f$ and $\{f_a\}$ be as in Sect. 1.1. Let $x_0$ be a critical point of $f$, and let $p = f(x_0)$. Since $f = f_{a^*}$, we write $x_0(a^*) = x_0$, $p(a^*) = p$, and let $a \mapsto x_0(a)$ and $a \mapsto p(a)$ be the continuations of $x_0$ and $p$ as defined in Sect. 1.1. Let $x_k(a) = f_a^k(x_0(a))$. We will use $(\cdot)'$ to denote differentiation with respect to $x$.

Lemma 6.1 ([TTY], Proposition VII.7). As $k \to \infty$,

$$Q_k(a^*) := \frac{\frac{dx_k}{da}(a^*)}{(f^{k-1})'(x_1(a^*))} \to \lambda_0 := \frac{dx_1}{da}(a^*) - \frac{dp}{da}(a^*).$$
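As a sanity check on the statement of Lemma 6.1, the following minimal sketch computes $Q_k$ for one concrete family. Everything about the example is an assumption made for illustration and is not taken from the paper: the family $f_a(x) = 1 - a x^2$, the Misiurewicz parameter $a^* = 2$ (where the critical value $1$ maps onto the repelling fixed point $-1$), and our reading of $p(a)$ as the preimage, near $1$, of the continued fixed point. The function names `q_sequence` and `lambda0_numeric` are hypothetical.

```python
import math


def q_sequence(a_star, k_max):
    """Return [Q_2, ..., Q_kmax] for the assumed family f_a(x) = 1 - a*x**2.

    Q_k = (d x_k / d a) / (f^{k-1})'(x_1), evaluated at a = a_star,
    where x_0 = 0 is the critical point and x_k = f_a^k(x_0).
    """
    x = 1.0        # x_1 = f_a(0) = 1 for every a
    dx_da = 0.0    # d x_1 / d a = 0
    deriv = 1.0    # (f^{k-1})'(x_1) for k = 1
    out = []
    for _ in range(2, k_max + 1):
        # Chain rule in a: x_{k+1}' = f'(x_k) * x_k' + (partial f / partial a)(x_k)
        dx_da = (-2.0 * a_star * x) * dx_da - x * x
        deriv *= -2.0 * a_star * x      # multiply by f'(x_k)
        x = 1.0 - a_star * x * x        # advance the orbit
        out.append(dx_da / deriv)
    return out


def lambda0_numeric(a_star, h=1e-6):
    """lambda_0 = dx_1/da - dp/da, with dp/da taken by central differences.

    Assumption: p(a) is the point near 1 that maps onto the repelling
    fixed point q(a) = (-1 - sqrt(1 + 4a)) / (2a) of f_a.
    """
    def p(a):
        q = (-1.0 - math.sqrt(1.0 + 4.0 * a)) / (2.0 * a)
        return math.sqrt((1.0 - q) / a)

    dx1_da = 0.0  # x_1(a) = 1 does not depend on a
    return dx1_da - (p(a_star + h) - p(a_star - h)) / (2.0 * h)


if __name__ == "__main__":
    qs = q_sequence(2.0, 30)
    print(qs[:3], qs[-1], lambda0_numeric(2.0))
```

In this example the two quantities agree numerically and are nonzero, which is the kind of transversality the lemma is used for; the sketch says nothing about whether this family satisfies the other standing hypotheses of Sect. 1.1.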
The transversality condition in Sect. 1.1, Step II, states that $\lambda_0 \ne 0$. We will also need the following technical lemma, the proof of which is given in Appendix B.9.

Lemma 6.2. There exist constants $K$ and $c' > 0$ such that for every $0 \le s < i$, we have

$$\|DT^{i-s}(z_s)\| \le K e^{-c' s} \|w_i\|.$$

Proof of Proposition 6.1. Since $\tau_i = DT(z_{i-1})\tau_{i-1} + \psi(z_{i-1})$, where $\psi(z) = \frac{\partial (T_a z)}{\partial a}(\hat a)$, we have inductively

$$\tau_i = DT^i(z_0)\tau_0 + \sum_{s=1}^{i} DT^{i-s}(z_s)\psi(z_{s-1}).$$

The upper estimate for $\frac{\|\tau_i\|}{\|w_i\|}$ follows from Lemma 6.2 and the uniform boundedness of $\psi(\cdot)$:

$$\frac{\|\tau_i\|}{\|w_i\|} \le \frac{\|DT^i(z_0)\tau_0\|}{\|w_i\|} + \sum_{s=1}^{i} \frac{\|DT^{i-s}(z_s)\psi(z_{s-1})\|}{\|w_i\|} < K\|\tau_0\| + K\sum_{s=1}^{\infty} e^{-c' s} := \lambda_2.$$

To obtain a lower bound for $\frac{\|\tau_i\|}{\|w_i\|}$, we pick $k_0$ large enough that $|Q_{k_0}(a^*)| > \frac{1}{2}|\lambda_0|$, where $Q_{k_0}$ and $\lambda_0$ are as in Lemma 6.1, and decompose $\tau_i$ into $\tau_i = I + II$, where

$$I = DT^i(z_0)\tau_0 + \sum_{s=1}^{k_0} DT^{i-s}(z_s)\psi(z_{s-1}), \qquad II = \sum_{s=k_0+1}^{i} DT^{i-s}(z_s)\psi(z_{s-1}).$$
Again by Lemma 6.2, we have

$$\frac{\|II\|}{\|w_i\|} < \sum_{s=k_0+1}^{\infty} K e^{-c' s}.$$

We will show $\frac{\|I\|}{\|w_i\|} > K_0^{-1}|\lambda_0|$ for some $K_0$, and assume $k_0$ is chosen so that $\sum_{s > k_0} K e^{-c' s} \ll K_0^{-1}|\lambda_0|$. Write

$$I = DT^{i-k_0}(z_{k_0})V, \qquad \text{where } V = DT^{k_0}(z_0)\tau_0 + \sum_{s=1}^{k_0} DT^{k_0-s}(z_s)\psi(z_{s-1}).$$

Claim 6.1.

$$\|V\| > \frac{1}{3}\,\frac{\|w_{k_0}\|}{\|w_1\|}\,|\lambda_0|,$$

and the second component of $V$ tends to $0$ as $(\hat a, b) \to (a^*, 0)$.

Proof of Claim 6.1. Let $z_0 \to (x_0, 0)$ as $(\hat a, b) \to (a^*, 0)$. The two terms of $V$ are estimated as follows:

(i) $\|DT^{k_0}(z_0)\tau_0\| < K|\tau_{0,2}|$ for $(\hat a, b)$ sufficiently near $(a^*, 0)$. This is because $k_0$ is a system constant, and writing $T^{k_0}_{a^*,0} = (T^1, T^2)$, we have

$$DT^{k_0}_{\hat a, b}(z_0)\tau_0 \to \left(\frac{\partial T^1}{\partial x}(x_0, 0)\tau_{0,1} + \frac{\partial T^1}{\partial y}(x_0, 0)\tau_{0,2},\ 0\right) = \left(\frac{\partial T^1}{\partial y}(x_0, 0)\tau_{0,2},\ 0\right).$$

(ii) For $(\hat a, b)$ sufficiently near $(a^*, 0)$, $z_s$ stays out of $C^{(0)}$ for $> k_0$ iterates, and

$$\frac{\sum_{s=1}^{k_0} DT^{k_0-s}(z_s)\psi(z_{s-1})}{\|w_{k_0}\|/\|w_1\|} \to \left(\frac{\sum_{s=1}^{k_0} (f^{k_0-s})'(x_s(a^*))\,\frac{d}{da}(f_a(x_{s-1}))(a^*)}{\pm (f^{k_0-1})'(x_1(a^*))},\ 0\right) = \left(\pm\sum_{s=1}^{k_0} \frac{\frac{d}{da}(f_a(x_{s-1}))(a^*)}{(f^{s-1})'(x_1(a^*))},\ 0\right),$$

which by a simple computation is equal to $(\pm Q_{k_0}(a^*), 0)$. □

Assuming that $n_0 > k_0$, so that $d_C(z_{k_0}) > \frac{1}{2}\delta_0$, we have that the slope of $e_{i-s}(z_s)$ is bounded below by some $K^{-1}$. This together with Claim 6.1 gives

$$\|DT^{i-k_0}(z_{k_0})V\| > K^{-1}\|DT^{i-k_0}(z_{k_0})w_{k_0}\|\,\frac{\|V\|}{\|w_{k_0}\|} > K_0^{-1}\|w_i\|\,|\lambda_0|.$$

□
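The inductive formula for $\tau_i$ used in the proof above can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the Hénon map $T_{a,b}(x,y) = (1 - a x^2 + y,\ b x)$ as a stand-in for $T_{a,b}$, takes an initial point $z_0$ that does not depend on $a$ (so $\tau_0 = 0$), and picks the parameter values $a = 1.4$, $b = 0.3$ purely for convenience. None of these choices, nor the helper names, come from the paper.

```python
def T(z, a, b):
    """Assumed model map: Henon map T_{a,b}(x, y) = (1 - a x^2 + y, b x)."""
    x, y = z
    return (1.0 - a * x * x + y, b * x)


def DT(z, a, b):
    """Jacobian of the model map at z, as a 2x2 tuple of rows."""
    x, _ = z
    return ((-2.0 * a * x, 1.0), (b, 0.0))


def psi(z):
    """psi(z) = partial(T_a z)/partial a for the model map."""
    x, _ = z
    return (-x * x, 0.0)


def matvec(m, v):
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def tau_by_recursion(z0, tau0, a, b, n):
    """tau_i = DT(z_{i-1}) tau_{i-1} + psi(z_{i-1}), iterated n times."""
    z, tau = z0, tau0
    for _ in range(n):
        pushed = matvec(DT(z, a, b), tau)
        p = psi(z)
        tau = (pushed[0] + p[0], pushed[1] + p[1])
        z = T(z, a, b)
    return tau


def tau_by_sum(z0, tau0, a, b, n):
    """tau_n = DT^n(z_0) tau_0 + sum_{s=1}^{n} DT^{n-s}(z_s) psi(z_{s-1})."""
    orbit = [z0]
    for _ in range(n):
        orbit.append(T(orbit[-1], a, b))

    def push(v, start, stop):
        # Apply DT^{stop-start}(z_start) to v along the stored orbit.
        for j in range(start, stop):
            v = matvec(DT(orbit[j], a, b), v)
        return v

    total = push(tau0, 0, n)
    for s in range(1, n + 1):
        term = push(psi(orbit[s - 1]), s, n)
        total = (total[0] + term[0], total[1] + term[1])
    return total


def tau_by_finite_difference(z0, a, b, n, h=1e-6):
    """Central difference of z_n(a) in the parameter a (z0 held fixed)."""
    def orbit_end(aa):
        z = z0
        for _ in range(n):
            z = T(z, aa, b)
        return z

    zp, zm = orbit_end(a + h), orbit_end(a - h)
    return ((zp[0] - zm[0]) / (2.0 * h), (zp[1] - zm[1]) / (2.0 * h))


if __name__ == "__main__":
    z0, tau0 = (0.1, 0.1), (0.0, 0.0)
    print(tau_by_recursion(z0, tau0, 1.4, 0.3, 6))
    print(tau_by_sum(z0, tau0, 1.4, 0.3, 6))
    print(tau_by_finite_difference(z0, 1.4, 0.3, 6))
```

The three computations should agree up to rounding and finite-difference error. This confirms the bookkeeping of indices in the sum only; it does not test the estimate $\lambda_1 \le \|\tau_i\|/\|w_i\| \le \lambda_2$, which depends on the parameter selection.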
We will also need an estimate on the angle between $\tau_i$ and $w_i$, which we denote by $\theta_i$. The assumptions are as in Proposition 6.1.

Lemma 6.3. If $z_i$ is free, then

$$\theta_i < \frac{K}{\|\tau_i\|}.$$
Proof.

$$|\sin\theta_i| \le \frac{1}{\|\tau_i\|}\left(\left\|\frac{w_i}{\|w_i\|} \times DT^i(z_0)\tau_0\right\| + \sum_{s=1}^{i}\left\|\frac{w_i}{\|w_i\|} \times DT^{i-s}(z_s)\psi(z_{s-1})\right\|\right)$$

$$\le \frac{1}{\|\tau_i\|}\left(b^i\,\frac{\|\tau_0\|}{\|w_i\|} + \sum_{s=1}^{i} b^{i-s}\,\frac{\|w_s\|}{\|w_i\|}\left\|\frac{w_s}{\|w_s\|} \times \psi(z_{s-1})\right\|\right) \le \frac{K}{\|\tau_i\|}\sum_{s=0}^{\infty} b^s.$$

The last inequality is valid if, for example, $\|w_s\| \le K\frac{1}{\delta}\|w_i\|$ for all $s \le i$, which is the case when $z_i$ is free. □

6.3. Initial data for critical curves. The goal of this subsection is to verify the conditions on $\tau_0$ in Proposition 6.1 for critical curves of all generations and all orders. Our plan of proof is as follows:

1. We obtain information on the slopes of critical curves of generation $i$ by comparing them to critical curves of generation $i-1$. Following [BC2], this is done using a lemma of Hadamard, which requires that the intervals of definition of the critical curves be sufficiently long. We are thus led to the following question: on how long a parameter interval can one continue a critical curve with reasonable properties?

2. As the order of a critical point tends to infinity, the length of the parameter interval on which it is defined goes to zero. This makes it necessary for us to prove our results in two steps: first to work with critical points having orders commensurate with their generations, and then to pass the bounds on to curves corresponding to higher orders.

6.3.1. Stability of critical regions. In Sections 3–5, we construct for $N = N_0, 3N_0, 3^2 N_0, \cdots$ a parameter set $\Delta_N$ such that for $a \in \Delta_N$, $\Gamma_{\theta N}$ is well defined and consists of critical points of generation $\theta N$ and order $N$. Let us denote this set by $\Gamma_{\theta N, N}$. In the discussion to follow, it will be convenient to consider $\Gamma_{i,n}$ for arbitrary $i \le n$. We define these sets formally as follows: First we fix $a \in \Delta_N$, and define $\Gamma_{i,N}$, $\theta N < i \le N$, inductively by carrying out the steps in Sect. 5 in a slightly different order. Assuming that $\Gamma_{i-1,N}$ is defined and all the points in $\partial R_0$ are controlled for $i-1$ iterates, we define $C^{(i)}$ and $\Gamma_{i,N}$. Immediately, we observe that the newly constructed critical points are controlled by $\Gamma_{\theta N, N}$. In particular, they satisfy (IA2) and (IA4) (with possibly slightly weaker constants) and can be used for binding. For free segments of $\partial R_i$ that lie in $C^{(0)}$, we may then set binding as in the proof of Proposition 5.1, and proceed to step $i+1$.

For $n$ with $N < n \le 3N$, let $\Delta_n := \{a \in \Delta_N : \text{(IA2) and (IA4) are satisfied up to time } n \text{ for orbits from } \Gamma_{\theta N}\}$. A slight extension of the argument above defines $\Gamma_{i,n}$ for all $a \in \Delta_n$ and $i \le n$.

Finally, we introduce for each $n$ the parameter set $\tilde\Delta_n$, which has the same definition as $\Delta_n$ except that in the definition of $\tilde\Delta_N$, $N = N_0, 3N_0, \cdots$, (IA2) and (IA4) are replaced by $d_C(z_j) > \frac{1}{2}e^{-\alpha j}$ and $\|w_j^*\| > \frac{1}{2}e^{cj}$. One checks easily that all the results in Sections 3–5 are valid under these slightly relaxed rules, as is the discussion in the last two paragraphs, so that $\Gamma_{i,n}$ is defined for all $a \in \tilde\Delta_n$ and $i \le n$.

We remark before proceeding further that built into our definition of $\Gamma_{i,n}$ for $\frac{N}{3} < n < N$ is the property that $z_0 \in \Gamma_{i,n}$ has all the properties of $\tilde z_0 \in \Gamma_{\theta N, N}$ (except
for the factor $\frac{1}{2}$) up to time $n$. In particular, Proposition 6.1 applies to $\hat a \in \tilde\Delta_n$ and $z_0 = z_0(\hat a) \in \Gamma_{i,n}$.

Definition 6.1. For $i \le n$, an interval $J \subset \tilde\Delta_n$ and $\hat a \in J$, we say $\Gamma_{i,n}(\hat a)$ has a smooth continuation to $J$ if there is a map $g : \Gamma_{i,n}(\hat a) \times J \to R_0$ such that
– $g(\cdot, a) = \Gamma_{i,n}(a)$ for all $a$, and
– for each $z \in \Gamma_{i,n}(\hat a)$, $a \mapsto g(z, a)$ is smooth.
Likewise one has the notion of the critical regions $C^{(i)}$ deforming continuously as $a$ ranges over $J$.

Lemma 6.4. Let $\hat a \in \Delta_n$ and $J = [\hat a - \rho^{2n}, \hat a + \rho^{2n}]$. Then $J \subset \tilde\Delta_n$; moreover, $\Gamma_{n,n}(\hat a)$ has a smooth continuation to $J$, and $C^{(i)}$, $i \le n$, deform continuously on $J$.

The structural stability of the critical regions comes from the fact that the components of $C^{(i)}$ are stacked together in a very rigid way, and their relations to the components of $C^{(i-1)}$ are equally rigid. As $a$ varies over $J$, the entire structure may move up or down by amounts $\gg b^{\frac{i}{2}}$, the maximum height of the components of $C^{(i)}$, but it takes a relatively large horizontal displacement to slide these components past each other. A proof of Lemma 6.4 is given in Appendix B.10.

6.3.2. Comparing $\tau_0$-vectors for different critical curves.

Lemma 6.5. There exists $K$ such that the following holds for all $n$: Consider $\hat a \in \Delta_n$ and $J = [\hat a - \rho^{2n}, \hat a + \rho^{2n}]$. Let $z^{(n)} \in \Gamma_{n,n}(\hat a)$, $z^{(n-1)} \in \Gamma_{n-1,n-1}(\hat a) \cap Q^{(n-1)}(z^{(n)})$, and let $z^{(n)}(a)$ and $z^{(n-1)}(a)$ be the continuations of $z^{(n)}$ and $z^{(n-1)}$ on $J$. Then

$$\left\|\frac{dz^{(n)}}{da}(a) - \frac{dz^{(n-1)}}{da}(a)\right\| < (Kb)^{\frac{n}{9}}.$$

From this lemma it follows inductively that $\left\|\frac{dz^{(n)}}{da} - \frac{dz^{(0)}}{da}\right\| < K b^{\frac{1}{9}}$, where $z^{(0)}$ is a critical point of generation 0 and order 1 lying in $Q^{(0)}(z^{(n)})$. Since there is only a finite number of critical curves of generation 0 and order 1, and for them $\tau_{0,2} = 0$, Lemma 6.5 proves that the hypotheses on $\tau_0$ in Proposition 6.1 are met for curves corresponding to all $z^{(n)} \in \Gamma_{n,n}$. It remains to pass these properties to critical curves of higher order.

Lemma 6.6. Let $m > n$, $\hat a \in \Delta_m$, and let $z_m \in \Gamma_{n,m}(\hat a)$ be the updating of $z_n \in \Gamma_{n,n}(\hat a)$ to order $m$. Then for all $a \in [\hat a - \rho^{2m}, \hat a + \rho^{2m}]$,

$$\left\|\frac{dz_m}{da}(a) - \frac{dz_n}{da}(a)\right\| < (Kb)^{\frac{n}{4}}.$$

Lemmas 6.5 and 6.6 are proved in Appendix B.10.
6.4. Dynamics of critical curves. We fix a parameter interval $J$ and a critical point $z_0$ which we assume can be smoothly continued to all of $J$. As usual, let $\gamma_i(a) = z_i(a)$. The purpose of this subsection is to make precise the parallel between the dynamics of $\gamma_0 \to \gamma_1 \to \gamma_2 \to \cdots$ and the action of $T_a^i$ on $\partial R_0$. Let $\tau_i(a) = \frac{d\gamma_i}{da}(a)$.

Lemma 6.7. Suppose Proposition 6.1 holds for all the parameters $a$ and time indices $i$ in question. Then there is a small number $k(\delta) > 0$ and an integer $i_0 > 0$ such that for all $i > i_0$, if $\gamma_i(a)$ is free and $\notin C^{(0)}$, then $|\mathrm{slope}(\tau_i(a))| < k(\delta)$ and $\tau_{i+1}(a) \approx DT_a(\gamma_i(a))\tau_i(a)$. In particular, if $\gamma_i$ is as above, then it grows exponentially in length as long as it stays outside of $C^{(0)}$.

Proof. By Proposition 6.1, there is $i_0$ such that for all $i > i_0$, $\tau_i(a)$ is very close to $w_i(a)$ both in length and in angle. Assertion (i) is immediate; (ii) follows once $\|\tau_i\|$ is sufficiently large, and the last assertion is a consequence of (ii) and Lemma 2.8. □

We assume $(a, b)$ is sufficiently near $(a^*, 0)$ that $n_0 > i_0$. The reason we assert only that $|\mathrm{slope}(\tau_i(a))| < k(\delta)$ (where $k(\delta) \gg b$) is that for a very long period at the beginning, the length of this period depending on $b$, one cannot expect $\tau_i$ to be $b$-horizontal.

Next we allow $\gamma_i$ to intersect $C^{(0)}$. For each fixed $a$, we have introduced in Sections 3–5 definitions of distance to the critical set, binding point, bound period, etc. To emphasize their dependence on $a$, we write $d_{C(a)}(\cdot)$, $\phi_a(\cdot)$ and $p_a(\cdot)$ when referring to definitions that belong to the map $T_a$. Even for a fixed map, these quantities depend sensitively on the location of the point in question; vertical displacements of $z$, for example, may dramatically change $\phi_a(z)$. In the “dynamics” of critical curves, the problem is all the more delicate, for not only does $z_i(a)$ move with $a$, the entire critical set moves as well. The goal of the next few lemmas is to establish some viable notions of $d_C(\cdot)$ and bound/free states that work in a coherent fashion for all points in $\gamma_i$.

We assume for the rest of this subsection that
(i) $J \subset \tilde\Delta_{K\alpha n}$, so that for each $a$ the binding structure is in place for points with $d_{C(a)}(\cdot) > e^{-\alpha n}$;
(ii) $z_0$ obeys (IA2) and (IA4) up to time $n$; and
(iii) all time indices are $\le n$.

In the next lemma, we let $|\cdot - \cdot|_h$ denote the horizontal distance between two points, and assume for simplicity that $\gamma_i$ is contained in one component of $C^{(0)}$.

Lemma 6.8. Suppose $|\mathrm{slope}(\tau_i)| < k(\delta)$. Then there exists $\bar z \in C^{(0)}$ such that whenever $d_{C(a)}(\gamma_i(a)) > \frac{1}{2}e^{-\alpha i}$,

$$\Big|\,|\gamma_i(a) - \bar z|_h - d_{C(a)}(\gamma_i(a))\,\Big| < K e^{-ci}\, d_{C(a)}(\gamma_i(a)).$$

Thus we may put the partition $P_{\gamma_i, \bar z}$ on $\gamma_i$ and define $d_C(\cdot) = |\cdot - \bar z|_h$ (the precise definition of $d_C(\gamma_i(a))$ is irrelevant for $a$ with $d_{C(a)}(\gamma_i(a)) < \frac{1}{2}e^{-\alpha i}$).

Lemma 6.9. Let $\gamma_i$ be as above. We assume further that $z_i(a)$ is a free return for every $a$. Then for each $\omega_0 = I_{\mu j} \in P_{\gamma_i, \bar z}$ with $|\mu| < \alpha i$, there exists $\tilde p = \tilde p(\omega_0) < K|\mu|$ such that for all $a, a'$ with $z_i(a), z_i(a') \in \omega_0$,
(a) $|z_{i+j}(a) - z_{i+j}(a')| < e^{-\beta j}$ for $j \le \tilde p$;
(b) $z_{i+\tilde p}$ is out of all fold periods, $|\mathrm{slope}(\tau_{i+\tilde p})| < k(\delta)$ and $|\omega_{\tilde p}| \ge \frac{1}{\mu^2} e^{-\beta K|\mu|}$;
(c) $\|w_{i+\tilde p}\| > K^{-1} e^{\frac{\tilde p}{3}} \|w_i\|$, and $\|\tau_{i+\tilde p}\| > K^{-1} e^{\frac{\tilde p}{3}} \|\tau_i\|$.

Lemma 6.9 allows us to define a natural notion of bound/free states for the curves $\gamma_i$ that agrees essentially with the dynamical notion previously defined for each $z_i(a)$.

Proposition 6.2. We assume the following hold for all $a \in J$ and $i \le n$:
(i) for each $i$, the entire segment $\gamma_i$ is bound or free simultaneously, and $\gamma_i$ is contained in three contiguous $I_{\mu j}$'s at all free returns;
(ii) $\gamma_n$ is a free return.
Then there exists $K$ (independent of $\gamma_0$ or $n$) such that for all $a, a' \in J$,

$$\frac{1}{K} \le \frac{\|\tau_n(a)\|}{\|\tau_n(a')\|} \le K.$$

Lemmas 6.8 and 6.9 are proved in Appendix B.11. Proposition 6.2 is proved in Appendix B.12.

6.5. Deletions on account of a single critical point. Let $J$ be a parameter interval on which $\Gamma_{\theta N, 3K\alpha N}$ is well defined, that is to say, for each (fixed) $a \in J$, all critical points of generation $\le \theta N$ have been introduced, and they obey (IA2) and (IA4) up to time $3K\alpha N$. Moreover, we assume that all of the critical points have smooth continuations on $J$. In this subsection we follow the evolution of one fixed $z_0 \in \Gamma_{\theta N, 3K\alpha N}$ and consider the set of $a \in J$ that will be excluded on account of its behavior between times $N$ and $3N$.

Let $a$ and $z_0$ be fixed. Before embarking on the main discussion, let us first review what can be said about $z_n$ and $w_n = DT_a^n(z_0)\binom{0}{1}$ for $n \le 3N$ based on the information available. (The precise location of $z_0$ will have to be “updated” as we go along; these issues have been dealt with in previous subsections and will not be discussed here.)

(i) There is no ambiguity as to whether $d_C(z_n) > e^{-\alpha n}$.
(ii) If $z_n \in C^{(0)}$ and $d_C(z_n) > e^{-\alpha n}$, then it has a binding point and the ensuing bound period has the properties in (IA5). (This uses the fact that $p < 3K\alpha N$.)
(iii) If $d_C(z_i) > e^{-\alpha i}$ for all $i < n$, then $(z_0, w_0)$ is controlled in the sense of Definition 4.2 up to time $n$. (This follows from the proof of Proposition 5.2. Notice that the argument uses only the critical structures guaranteed above.)
(iv) If for some subinterval $J' \subset J$, every $a \in J'$ has the property that $z_0$ obeys the estimate in (IA2) up to time $n$, then the discussion in Sect. 6.4 holds for the critical curve $\gamma_i$ defined on $J'$ for $i < n$. (In Sect. 6.4 we have assumed for simplicity that $z_0$ obeys the estimate in (IA4); we can do without that because the estimate $\|w_i^*\| \ge e^{\frac{c}{3} i}$, which we have from (IA2) alone, will also suffice.)

Observe (1) $\Gamma_{\theta N, 3K\alpha N}$ is essentially the minimum structure needed for the discussion above, and (2) this discussion is entirely independent of the behavior after time $3K\alpha N$ of critical points other than $z_0$. It is this independence that allows us to consider one critical point at a time up to time $3N$ and to make deletions on the basis of its behavior
alone. On the other hand, the dependence on early behavior of other critical points is a strong reminder that distinct critical orbits cannot be treated completely separately through their entire lifetimes.

We now proceed to the main topic of discussion. We assume for the rest of this subsection that $J$ is as above, that for all $a \in J$, the estimates in (IA2) and (IA4) hold for $z_0$ up to time $N$, and that $\gamma_N$, which is a free return, is $\approx I_{\mu j}$ for some $\mu$ with $|\mu| < \alpha N$.

We begin with deletions on account of (IA2). At this point, we ask the reader to go to Sect. 6.1, and to consider $\gamma_N$ in the place of $\omega_N$ in the model phase-space problem. The construction we make in the parameter case is identical to that in Sect. 6.1. For the analogy between the dynamics of critical curves and true dynamical curves, see Lemma 6.7 (outside $C^{(0)}$), Lemmas 6.8 and 6.9 (bound estimates) and Proposition 6.2 (distortion). We summarize the result in the following proposition, the proof of which we omit.

Proposition 6.3. There is a set $D_{N,z_0} \subset J$ with

$$|D_{N,z_0}| < K e^{-\frac{1}{2}\alpha N}|J|$$

such that for all $a \in J \setminus D_{N,z_0}$, the estimate in (IA2) holds for $z_0$ up to time $3N$.

The structure of $J \setminus D_{N,z_0}$, which will be relevant in Sect. 6.6, can be described as follows: In the procedure outlined in Sect. 6.1, pulling back the subdivisions at each stage to the parameter interval $J$ results in a partition defined on the subset of $J$ that has not yet been deleted at that time. Let us call these partitions $Q_{n,z_0}$, $n = N, N+1, \cdots, 3N$. Each element $J$ of $Q_{n,z_0}$ is an interval. One step later, $J$ may again be an element of $Q_{n+1,z_0}$, or it may be subdivided into shorter subintervals some of which may be discarded. For all $a \in J$, $z_i(a)$ can be thought of as having “indistinguishable” itineraries up to time $n$, that is to say, for each $i \le n$, the critical curve $\gamma_i$ defined on $J$ is either entirely outside of $C^{(0)}$ or entirely contained in some $I_{\mu j}$, and all points are either in a bound state or in a free state simultaneously. Moreover, at its last free return before time $n$, $\gamma_i$ occupies the full length of some $I_{\mu j}$.

Finally we move on to deletions on account of (IA4). We use the construction above, deleting those elements of $Q_{3N,z_0}$ that correspond to $z_i$ having an abnormally high frequency of close returns between times $N$ and $3N$.

Proposition 6.4. Given $\varepsilon > 0$, $\exists\, \delta_0 = \delta_0(\varepsilon)$ such that if $\delta < \delta_0$ and the parameters in question are sufficiently near $(a^*, 0)$ (depending on $\delta$), then the following holds: If $J$, $z_0$ and $\gamma_N$ are as above, then there is a set $E_{N,z_0} \subset J \setminus D_{N,z_0}$ with $|E_{N,z_0}| < e^{-\varepsilon n}|J \setminus D_{N,z_0}|$ such that for all $a \in J \setminus (D_{N,z_0} \cup E_{N,z_0})$, the estimates in (IA2) and (IA4) hold for $z_0$ up to time $3N$.

As is evident from its formulation, this estimate is a little more delicate than the previous one. A one-dimensional version of this result is proved in [BC2], pp. 81–86. (For an alternate proof, see [TTY].) After the discussion at the beginning of this subsection and the groundwork in Sect. 6.4, the adaptation of this result to our setting is straightforward.
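Returning briefly to the (IA2) deletions of Proposition 6.3: the bound $K e^{-\frac{1}{2}\alpha N}$ reflects the fact that the stage-wise deletion fractions in the scheme of Sect. 6.1 form a geometric series. The sketch below simply sums that series. It is our own illustration; the function name and the numerical values of $K$, $\alpha$, $N$ and $p_0$ are hypothetical placeholders, not constants from the paper, and the single distortion factor $K$ is applied uniformly to all stages as an assumption.

```python
import math


def deleted_fraction_bound(K, alpha, N, p0, stages=None):
    """Upper bound K * sum_i exp(-alpha * (N + i * p0) / 2) on the deleted fraction.

    With stages=None the closed form of the geometric series is returned;
    otherwise the first `stages` terms are summed explicitly.
    """
    first = math.exp(-0.5 * alpha * N)   # fraction deleted at the first free return
    ratio = math.exp(-0.5 * alpha * p0)  # each later stage is smaller by this factor
    if stages is None:
        return K * first / (1.0 - ratio)
    return K * sum(first * ratio ** i for i in range(stages))


if __name__ == "__main__":
    # Hypothetical constants, chosen only to make the numbers readable.
    for N in (100, 200, 400):
        print(N, deleted_fraction_bound(K=5.0, alpha=0.05, N=N, p0=3))
```

The closed form shows that the total is a constant multiple of $e^{-\frac{1}{2}\alpha N}$, the constant depending on $\alpha$ and $p_0$ but not on $N$.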
6.6. Estimating $|\Delta|$. The initial parameter set $\Delta_0$ is chosen as follows. Let $C = \{x_i\}$ be the critical set of $f$, and let $\delta_1$ be the minimum distance between $C$ and $f^n x_i$, $n > 0$. We assume that $\delta_1 \gg \delta$. Let $n_0$ be the number of iterates the critical orbits of $T_{a,b}$ are required to stay outside of $C^{(0)}$. We assume $n_0$ is as large as need be and prespecified. Then there exists $\varepsilon > 0$ such that for all $a \in [a^* - \varepsilon, a^* + \varepsilon]$, the first $n_0$ iterates of all the critical points of $f_a$ stay $> \frac{1}{2}\delta_1$ away from $C$. Choose $b$ so small that the same holds (with slightly weaker estimates) for all the generation 0 critical points. We let $\Delta_0 = [a^* - \varepsilon, a^* + \varepsilon]$, and let $b$, which may be shrunk further for other reasons, be fixed in the rest of the discussion.

We first give a rough outline of the inductive process by which the parameter set $\Delta$ is chosen. With this outline in mind, we will discuss in greater detail each individual step and then finally estimate the measure of the set of parameters deleted.

Here is the outline. Starting with $N = n_0$, the procedure for going from step $N$ to step $3N$ is as follows: Let $N_0 = [\frac{1}{\theta}]$. Consider first the case $N \le N_0$. At time $N$, we are handed a good parameter set $\Delta_N$. For $a \in \Delta_N$, we consider each $z_0 \in \Gamma_0$ separately until time $3N$, making deletions if necessary so that the estimates in (IA2) and (IA4) are obeyed. Let $\Delta_{3N,z_0}$ be the set of parameters retained by considering $z_0$ alone. Then $\Delta_{3N} = \cap_{z_0 \in \Gamma_0} \Delta_{3N,z_0}$. The critical set $\Gamma_1$ is created the first time $3N$ exceeds $N_0$. In general, for $N > N_0$, the procedure is as above with $\Gamma_{\theta N}$ in the place of $\Gamma_0$, plus an extra step at the end, namely the creation of $\Gamma_{3\theta N}$ for $a \in \Delta_{3N}$. This process is continued ad infinitum, and $\Delta := \cap_N \Delta_N$.

We now begin our detailed discussion. Let $z_0 \in \Gamma_0$ be fixed, and let $\gamma_i$ denote its associated critical curve. We wish to argue that as we “iterate”, $\gamma_i$ grows long, is roughly horizontal, and the first time it intersects $C^{(0)}$ nontrivially, the intersection contains at least one of the outermost $I_\mu$. To see that this can be arranged, consider first the case $b = 0$. There, the fact that $f_{a^*}$ is a Misiurewicz map implies that $\gamma_i$ is “hooked” onto an orbit that remains $> \delta_1$ away from the critical set for all $i$, giving the desired picture. Fix $n_1 > n_0$ such that $\gamma_i$ remains outside of $C^{(0)}$ for $i \le n_1$ and $|\gamma_{n_1}| \gg \delta$. This continues to hold for small $b > 0$. Also, for $b > 0$, it follows from Proposition 6.1 and Lemma 6.3 that there is a time after which $w_i$ and $\tau_i$ become comparable both in magnitude and in angle. By choosing $n_0$ sufficiently large, therefore, we may assume that the first time $\gamma_i$ meets $C^{(0)}$, $\gamma_i$ is a roughly horizontal curve with $|\gamma_i| > \delta$, and the part deleted (in violation of (IA2)) constitutes as small a fraction of $\Delta_0$ as need be.

Let $N$ be the first time when some deletion has taken place for at least one of the $z_0$. We stated in the outline that we are handed $\Delta_N$, but we actually have more, namely that for each $z_0 \in \Gamma_0$, we have $\Delta_{N,z_0}$, the set of parameters retained by considering $z_0$ alone up until time $N$, and a partition $Q_{N,z_0}$ on $\Delta_{N,z_0}$ obtained as in Sect. 6.5. (If no deletion has taken place for this $z_0$, then $\Delta_{N,z_0} = \Delta_0$ and $Q_{N,z_0}$ is the trivial partition.) Let $\Delta_N = \cap_{z_0 \in \Gamma_0} \Delta_{N,z_0}$.

We describe next how to go from step $N$ to step $3N$. Fix $z_0$ and $J \in Q_{N,z_0}$. If $J \cap \Delta_N = \emptyset$, then we declare $J$ to be “inactive” from here on and do not consider it further. (For purposes of estimating the measure of the set of deleted parameters, however, we regard all the inactive intervals as being in $\Delta_{n,z_0}$ for all $n \ge N$.) Assume $J \cap \Delta_N \ne \emptyset$. This does not mean $J \subset \Delta_N$, for other critical points may have created some “holes” in $J$. Let $\hat\gamma_i$ denote the critical curve defined on $J$ and let $n > N$ be the first time when part of $\hat\gamma_n$ makes a free return to $C^{(0)}$. In order to continue, we need to verify that the necessary binding structure is available. Since $|\hat\gamma_n| < 1$, we have $|J| < \lambda_1^{-1} e^{-cn}$ by Proposition 6.1; and since $\lambda_1^{-1} e^{-cN} \ll \rho^{6K\alpha N}$, we are guaranteed by Lemma 6.4 that $J \subset \tilde\Delta_{3K\alpha N}$. The lemmas in Sect. 6.4 therefore apply to give us a meaningful notion of $d_C(\cdot)$ (see also Sect. 6.5). We subdivide according
to $I_{\mu j}$-locations, defining $Q_{n,z_0}$. For those $J \in Q_{n,z_0}$ that do not meet $\Delta_N$, we again declare them to be “inactive”, and we track the active ones following the discussions in Sects. 6.4 and 6.5. The process is continued until time $3N$. It is then repeated for each one of the other critical points in $\Gamma_0$.

At a generic step $N$, then, we are handed for each $z_0 \in \Gamma_0$ or $\Gamma_{\theta N}$ a set $\Delta_{N,z_0}$, which has an active part and an inactive part. On the active part there is a partition $Q_{N,z_0}$. We track the elements of $Q_{N,z_0}$, declaring some to be “inactive” along the way and making deletions to secure (IA2) and (IA4) as in Sect. 6.5. At time $3N$, we create $\Gamma_{3\theta N}$ if necessary. The newly created critical points are handed the parameter sets and partitions of their parents. This completes the description of $\Delta_N$ for all $N = 3^i n_0$.

We turn, finally, to the problem of estimating the measure of $\Delta$. From Sect. 6.5, it follows that there exists $\alpha_1 > 0$ such that for each critical point $z_0$, $|\Delta_{N,z_0} \setminus \Delta_{3N,z_0}| < K e^{-\alpha_1 N}|\Delta_0|$. (This estimate would not have been valid if we had deleted inactive intervals.) Adding up the deletions from all the critical points, we have

$$|\Delta_0 \setminus \Delta| \le \sum_{i\,:\,3^i n_0 \le N_0} \mathrm{card}(\Gamma_0)\, K e^{-\alpha_1 n_0}|\Delta_0| + \sum_{i=1}^{\infty} \mathrm{card}(\Gamma_{3^i \theta N_0})\, K e^{-\alpha_1 3^i N_0}|\Delta_0|.$$

To estimate $\mathrm{card}(\Gamma_{\theta N})$, let $I_1, \cdots, I_r$ be the monotone intervals of $f$, and let $K_0 = \max_i \{\text{number of } I_j \text{ counted with multiplicity} : I_j \cap f(I_i) \ne \emptyset\}$.

Lemma 6.10. $\mathrm{card}(\Gamma_{\theta N}) < K_0^{\theta N}$.

Proof. Partition $\partial R_k$ into segments by orbits of critical points of generation $\le k$. Then each segment has at most one free component, and each free component meets $\le K_0$ of the monotone intervals, giving rise to $\le K_0$ new critical points. For more details, see Sect. 9.1. □

We conclude that the fraction of $\Delta_0$ deleted tends to 0 as $n_0 \to \infty$ and $b \to 0$.
In the remainder of this paper, $T$ is assumed to be $T_{a,b}$, where $(a, b)$ is a pair of “good” parameters, i.e. $(a, b) \in \Delta$, where $\Delta$ is as in Theorem 1.1.
Part II. Geometric and Statistical Results

7. Nonuniform Hyperbolic Behavior

Recall that $\Gamma$ is the set to which $\Gamma_{\theta N}$ converges as $N \to \infty$. One of the properties guaranteed by parameter selection is that orbits starting from $\Gamma$ have some hyperbolic behavior (Theorem 1.1(2)(ii)). The purpose of this section is to show that this behavior is passed on to a large set of points on the attractor and in the basin, proving Theorem 1.2 except for the assertion in (1)(iii), the proof of which we postpone to Sect. 10.4.
7.1. Control and hyperbolicity of non-critical orbits. We recapitulate the ideas developed in Sections 3–5 with a view toward proving hyperbolicity for an arbitrary (noncritical) orbit. Given arbitrary $z_0 \in R_0$, we let $0 \le n_1 < n_1 + p_1 \le n_2 < n_2 + p_2 \le n_3 < \cdots$ be such that $z_{n_j} \in C^{(0)}$ and is bound to a suitable point in $\Gamma$, $p_j$ is the ensuing bound period, and $n_{j+1}$ is the first return after $n_j + p_j$. Then:

(1) During its free periods, i.e. between times $n_j + p_j$ and $n_{j+1}$, the orbit is outside of $C^{(0)}$, where $DT^i$ is essentially uniformly hyperbolic (Lemma 2.8).

(2) During its bound periods, i.e. between times $n_j$ and $n_j + p_j$, $DT^i(z_{n_j})$ copies the derivative of its guiding orbit from $\Gamma$ (see (IA6)), which has been guaranteed through parameter selection to have some form of hyperbolicity ((IA4)).

(3) The concatenation of hyperbolic segments, however, need not result in a hyperbolic orbit, for the direction expanded at the end of one segment may be near the contractive direction of the next. Indeed, this happens at times $n_j$, when there is a “confusion” of stable and unstable directions, leading to a loss of hyperbolicity (see Sect. 3.1).

(4) The properties that guarantee that hyperbolicity is preserved through these concatenations are precisely the h-relatedness and correct splitting properties at free returns. At time $n_j$, the correct splitting of an expanded vector limits the magnitude of the loss (Lemma 2.12 and Sect. 3.3.2), while the h-relatedness of $z_{n_j}$ to some $\hat z \in \Gamma$ guarantees that the ensuing bound period is long enough for this loss to be compensated (see (IA5)).

In particular, if $z_0$ has a unit tangent vector $w_0$ such that $(z_0, w_0)$ is controlled by $\Gamma$ for all $n \ge 0$ in the sense of Definition 4.2, then

$$\limsup_{n \to \infty} \frac{1}{n} \log \|DT^n(z_0)w_0\| \ge c' > 0, \qquad (10)$$

where $e^{c'}$ is the minimum of the growth rates of $b$-horizontal vectors outside of $C^{(0)}$ and net derivative gains during bound periods. Assuming that the rate of growth outside of $C^{(0)}$ is $> e^{\frac{c}{3}}$, where $c$ is as in Theorem 1, we may take $c' = \frac{c}{3}$. We remark that in general, the growth of $\|DT^n(z_0)w_0\|$ is not regular: without any assumptions on how close to $\Gamma$ the free returns are allowed to be, i.e. without a condition in the spirit of (IA2), the loss of hyperbolicity at time $n_j$ can be arbitrarily large; for example, the lim inf in (10) can be negative.

Recall that to establish control of $(z_0, w_0)$, it suffices to look at free returns (Lemmas 4.2 and 4.5). We record below a condition at free returns that enables us to extend control through another bound-free cycle. Lemma 7.1 plays a crucial role in all the results in this section. First, we identify certain locations that are potentially problematic. For $k \ge 0$, let

$$Z^{(k)} := \left\{ z \in C^{(k)} : d_C(z) < b^{\frac{k}{20}} \right\}.$$

Lemma 7.1. Let $z_0$ and $w_0$ be arbitrary, and suppose that $(z_0, w_0)$ is controlled by $\Gamma$ up to time $k - 1$. Let $z_k$ be a free return. If $z_k \in C^{(i)} \setminus Z^{(i)}$ for some $i < \frac{5}{4}k$, then $w_k$ splits correctly.
Proof. The proof of this lemma is virtually identical to that of Proposition 5.2. Let $j = \min\{i, k\}$, so that $z_{k-j}$ makes sense. (The reason we allow $i$ to exceed $k$ has to do with the way this lemma is used.) Claims 5.1–5.3 in Proposition 5.2 continue to be valid because they rely only on the fact that $(z_0, w_0)$ is controlled. The proof here differs from that in Sect. 5 only at the end, where under present conditions we have

$$b^{\frac{j}{4}} \le b^{\frac{j}{12}} \le b^{\frac{1}{12}\cdot\frac{4}{5} i} \ll b^{\frac{i}{20}} \le d_C(z_k).$$

□
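The growth rate in (10) is the kind of quantity one can estimate numerically along a single orbit. The following minimal sketch computes a finite-time exponent $\frac{1}{n}\log\|DT^n(z_0)w_0\|$ by iterating a tangent vector and renormalizing at each step. As an assumption made only for illustration, it uses the classical Hénon map $(x, y) \mapsto (1 - a x^2 + y,\ b x)$ with $a = 1.4$, $b = 0.3$; these parameters are not known to belong to $\Delta$ (in particular $b$ is not small), so the output illustrates the computation rather than any theorem in this paper. The function name is hypothetical.

```python
import math


def finite_time_exponent(a, b, n, burn_in=1000):
    """Estimate (1/n) * log ||DT^n(z) w|| along one orbit of the Henon map.

    The tangent vector is renormalized after every step and the logarithms
    of the normalization factors are accumulated, which avoids overflow.
    """
    x, y = 0.0, 0.0
    for _ in range(burn_in):            # let the orbit settle before measuring
        x, y = 1.0 - a * x * x + y, b * x

    wx, wy = 0.0, 1.0                   # initial unit tangent vector
    log_growth = 0.0
    for _ in range(n):
        # Apply the Jacobian [[-2 a x, 1], [b, 0]] at the current point.
        wx, wy = -2.0 * a * x * wx + wy, b * wx
        x, y = 1.0 - a * x * x + y, b * x
        norm = math.hypot(wx, wy)
        log_growth += math.log(norm)
        wx, wy = wx / norm, wy / norm
    return log_growth / n


if __name__ == "__main__":
    print(finite_time_exponent(1.4, 0.3, 20000))
```

A positive value over a long window is consistent with expansion along the orbit, but a finite-time computation of this kind cannot distinguish a lim sup from a lim inf, which is exactly the distinction drawn in the remark following (10).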
7.2. Typical derivative behavior in the basin. Let m denote the 2-dimensional Lebesgue measure. Proposition 7.1. Assuming the additional regularity condition (**) in Sect. 1.2, we have m {z0 ∈ R0 : zk ∈ Z (k) infinitely often} = 0. To prove this result, we need more refined estimates on the width of Q(k) than that given in Lemma 4.1. Lemma 7.2. There exists K > 0 such that if Q(k) is a component of C (k) , and dv is the vertical distance between the two horizontal boundaries of Q(k) measured anywhere along the length of Q(k) , then (K −1 b)k+1 < dv < (Kb) 100 k . 99
Proof. First we prove the lower bound, which relies heavily on the condition (**). Let $\omega_k$ be a vertical line segment joining two points in $\partial Q^{(k)}$. For $i < k$, let $\omega_i = T^{-k+i}\omega_k$. If $\omega_0$ connects the two components of $\partial R_0$, then $d_v > (K^{-1}b)^k \cdot K^{-1}b$ since by (**), $\|DT v\| \ge K^{-1}|\det(DT)| \ge K^{-1}K_1^{-1}b$ for every unit vector $v$. If not, we will need to rule out the possibility that $\omega_0$ may be extremely short. Let $z_0, z_0' \in \omega_0 \cap \partial R_0$, and let $\gamma_0$ be the shorter of the two segments of $\partial R_0$ between $z_0$ and $z_0'$. We consider $\gamma_i := T^i\gamma_0$, and remember that points on $\partial R_0$ together with their tangent vectors are controlled (Proposition 5.1). Since $z_k$ and $z_k'$ are both free, and they do not lie on a $C^2(b)$-curve, we conclude that a critical point is created on $\gamma_i$ for some $i < k$. Let $i$ be the first time this happens. If $|z_i - z_i'| > \delta$, then $|\omega_k| > \delta(K^{-1}b)^k$. If not, then both $z_i$ and $z_i'$ are in $C^{(0)}$. Since both of their bound periods have expired by time $k$, it follows from (IA5) that $d_C(z_i)$ and $d_C(z_i')$ are $> e^{-K(k-i)}$. We claim that $d_C(z_i) + d_C(z_i')$ is approximately the horizontal distance between these two points (see Lemma 9.1 for more details). This gives $|\omega_k| > 2(e^{-K}K^{-1}b)^k$.

For the upper estimate, we pick an arbitrary $z_k \in \partial Q^{(k)}$, and borrow the argument in the proof of Claim 5.1 with $j = k$, pivoting the line $L$ at $L \cap \{x = \frac{1}{100}k\}$ (instead of $L \cap \{x = k - \frac{1}{3}j\}$) as we rotate clockwise. This gives $i_0$ with $0 \le i_0 \le \frac{1}{100}k$ such that $\|DT^i(z_{i_0})\| > \|DT\|^{-100i}$. Iterating forward once if necessary (and possibly losing a factor of $K^{-1}$ in the last estimate), we may assume that $z_{i_0} \in C^{(0)}$, so that it lies on an integral curve $\gamma_0$ of $e_{k-i_0}$ which joins the two components of $\partial R_0$. Note that $\gamma_0$ meets $\partial R_0$ only at its end points. Iterating forward, this curve brings in two segments of $\partial R_{k-i_0}$. They must lie on the two horizontal boundaries of $Q^{(k-i_0)}(z_k)$ because $\gamma_{k-i_0}$ passes through $z_k$ and intersects no other point of $\partial R_{k-i_0}$. This proves that $d_v$ measured at $z_k$ has length at most that of $\gamma_{k-i_0}$, which by Lemma 2.3 is $< (\|DT\|^{200} b)^{k-i_0} < (Kb)^{\frac{99}{100}k}$. □
Proof of Proposition 7.1. By the Borel–Cantelli Lemma, it suffices to show that $\sum_k m(T^{-k}Z^{(k)}) < \infty$. We estimate $m(T^{-k}Z^{(k)})$ by
\[ m(T^{-k}Z^{(k)}) = \sum m(T^{-k}(Q^{(k)} \cap Z^{(k)})) \le \max \frac{m(T^{-k}(Q^{(k)} \cap Z^{(k)}))}{m(T^{-k}Q^{(k)})} \cdot \sum m(T^{-k}Q^{(k)}), \]
where the summations and maximum are taken over all components $Q^{(k)}$ of $C^{(k)}$. Note also that $\sum m(T^{-k}Q^{(k)}) < 1$. Using Lemma 7.2 and the regularity of $\det(DT)$ in (**), we obtain
\[ \frac{m(T^{-k}(Q^{(k)} \cap Z^{(k)}))}{m(T^{-k}Q^{(k)})} \le K^{2k} \cdot \frac{m(Q^{(k)} \cap Z^{(k)})}{m(Q^{(k)})} \le K^{2k} \cdot \frac{(Kb)^{\frac{99}{100}k} \cdot b^{\frac{k}{20}}}{(\frac{1}{K}b)^{k+1} \cdot \rho^k} \le K^{4k}\, b^{\frac{1}{25}k} \cdot \frac{1}{b\rho^k}, \]
which decreases geometrically in $k$ as desired. □
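The last inequality in this proof is exponent bookkeeping in $b$ and $K$. A sketch of the arithmetic follows, assuming the bounds of Lemma 7.2 in the form $(K^{-1}b)^{k+1} < d_v < (Kb)^{\frac{99}{100}k}$ and $K \ge 1$; the role of $\rho^k$ as a lower bound on the length of $Q^{(k)}$ is inferred from the display rather than stated in this section.

\[ \frac{(Kb)^{\frac{99}{100}k} \cdot b^{\frac{k}{20}}}{(K^{-1}b)^{k+1} \cdot \rho^k} = K^{\frac{199}{100}k + 1} \cdot b^{\left(\frac{99}{100} + \frac{1}{20} - 1\right)k - 1} \cdot \rho^{-k} = K^{\frac{199}{100}k + 1} \cdot b^{\frac{1}{25}k} \cdot \frac{1}{b\rho^k}. \]

Multiplying by $K^{2k}$ gives the factor $K^{\frac{399}{100}k + 1}$, which is at most $K^{4k}$ once $k \ge 100$. The resulting bound is $b^{-1}\left(K^4 b^{\frac{1}{25}} \rho^{-1}\right)^k$, which is summable in $k$ provided $K^4 b^{\frac{1}{25}} < \rho$. That last condition is a smallness assumption on $b$ relative to $K$ and $\rho$ which we do not verify here.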
Proof of Theorem 1.2(2). Let $\xi_0 \in R_0$. From the discussion in Sect. 7.1, it follows that
\[ \limsup_{n\to\infty} \frac{1}{n} \log \|DT^n(\xi_0)\| \ge \frac{c}{3} \]
holds if we are able to produce $k_0 \ge 0$ and a vector $w_0$ such that if $z_0 = \xi_{k_0}$, then $(z_0, w_0)$ is controlled by $\Gamma$ for all $n \ge 0$. In light of Proposition 7.1, it suffices to consider the following two cases.

Case 1. $\xi_k \notin Z^{(k)}$ for all $k \ge 0$. We take $k_0 = 0$ and let $w_0 = \binom{0}{1}$ if $\xi_0 \in C^{(0)}$, $w_0 = \binom{1}{0}$ if $\xi_0 \notin C^{(0)}$. We assume $(z_0, w_0)$ is controlled up to time $k-1$, and let $z_k$ be a free return. The hypothesis of Lemma 7.1 is verified at time $k$ as follows: Let $j$ be the largest integer such that $z_k \in C^{(j)}$. Then if $j \ge k$, $i = k$ meets the requirements of Lemma 7.1 since $\xi_k \notin Z^{(k)}$; and if $j < k$, then $z_k$ must be in $\hat{Q}^{(j+1)} \setminus Q^{(j+1)}$ for some $Q^{(j+1)}$ since it is in $R_k$, and so we may take $i = j + 1$.

Case 2. $\xi_{k_0} \in Z^{(k_0)}$ for some $k_0$ and $\xi_k \notin Z^{(k)}$ for all $k > k_0$. Here we let $z_0 = \xi_{k_0}$ and $w_0 = \binom{0}{1}$. There is a critical point $\hat{z}$ in $Q^{(k_0)}(z_0)$ to which $z_0$ is bound for $k_1$ iterates. Since $\|DT\|^{k_1} b^{\frac{k_0}{20}} > e^{-\beta k_1}$, we have $k_1 \sim k_0\theta^{-1} \gg k_0$. During this period, we may regard $(z_0, w_0)$ as controlled by $\Gamma$. For $k \ge k_1$, the situation is identical to that in Case 1 except that $z_k \in R_{k+k_0}$ and we can only guarantee $z_k \notin Z^{(k+k_0)}$. To verify the hypothesis of Lemma 7.1 for $z_k$, we proceed as above, distinguishing between the cases $j \ge k + k_0$ and $j < k + k_0$ and noting that for $k \ge k_1$, $k + k_0 < (1 + K\theta)k$. □

Remark. The results in this paper that use (**) remain valid if (**) is replaced by

(**)' There exist $\eta \ge 1$ and $K_1, K_2 > 0$ such that for all $z \in R_0$, $K_1^{-1}b^\eta \le |\det(DT)| \le K_2 b^\eta$.

To prove this, it suffices to check that Proposition 7.1 is valid under (**)'. Observe that the results in Sect. 2.1 are abstract, so that if $\|DT^i(z_0)\| \ge \kappa^i$ for all $i \le n$, then $\|DT^i e_n\| \le (Kb^\eta\kappa^{-2})^i$ for all $i \le n$. Using this and $\|DT v\| \ge K^{-1}b^\eta$ for all $\|v\| = 1$,
one checks easily that under (**)', the conclusion of Lemma 7.2 is valid if $b$ is replaced by $b^\eta$. Moreover, the number $\frac{99}{100}$ can be replaced by $1 - \varepsilon_0$ for any prespecified $\varepsilon_0 > 0$. Choosing $\varepsilon_0$ such that $\varepsilon_0\eta < \frac{1}{20}$, we check that the proof of Proposition 7.1 goes through as is.

7.3. Uniform hyperbolicity away from C. Recall that $\Omega_\varepsilon = \{z_0 \in \Omega : d_C(z_n) \ge \varepsilon \text{ for all } n \in \mathbb{Z}\}$. The purpose of this subsection is to prove that $\Omega_\varepsilon$ is a uniformly hyperbolic invariant set (see footnote 7) for every $\varepsilon > 0$. This result together with the fact that the strength of hyperbolicity deteriorates as $\varepsilon \to 0$ justifies our identification of $C$ as the critical set and confirms that $d_C(\cdot)$ is a valid notion of “distance” to the critical set. The approximation of $\Omega$ by $\Omega_\varepsilon$ is a concrete example of the use of uniformly hyperbolic invariant sets to approximate systems that have (weak) hyperbolic properties. See [K] and [P] for results in the same spirit.

Footnote 7. Technically, $z_i \to z$ does not imply $d_C(z_i) \to d_C(z)$ when $z_i \notin C^{(k)}$ and $z \in C^{(k)}$, but let us assume $\Omega_\varepsilon$ is closed by taking its closure if necessary.

Proofs of uniform hyperbolicity often rely on a priori knowledge of invariant cones. In our setting, these cones are easily identified for $\Omega_\varepsilon$ with $\varepsilon > \sqrt{b}$; see Sect. 2.5. As $\varepsilon \to 0$, the situation becomes considerably more delicate: the stable and unstable directions at points in $\Omega_\varepsilon$ become increasingly confused, both ranging over nearly all possible directions within very small neighborhoods. Our line of proof, which does not rely on a priori knowledge of cones, can be formulated as follows:

Let $g : X \to X$ be a self-map of a compact metric space, and let $M : X \to GL(2, \mathbb{R})$ be a continuous map. For $x \in X$ and $n \ge 0$, we define $M^{(n)}(x) = M(g^{n-1}x) \cdots M(gx)M(x)$ and $M^{(-n)}(x) = M(g^{-n}x)^{-1} \cdots M(g^{-1}x)^{-1}$. It is clear what it means for the cocycle $(g, M^{(n)})$ to be uniformly hyperbolic (think of $g$ as a diffeomorphism and $M(x) = Dg(x)$). Since the condition of interest to us is projective in nature, we will state our result assuming that $M$ takes its values in $SL(2, \mathbb{R})$.

Lemma 7.3. Let $(g, M^{(n)})$ be as above. If there exist $\lambda > 1$ and $N \in \mathbb{Z}^+$ such that at each $x \in X$, there exists a unit vector $v = v(x)$ such that
\[ \|M^{(n)}(x)v\| \le \lambda^{-n} \quad \text{for all } n \ge N, \]
then $(g, M^{(n)})$ is uniformly hyperbolic.

Proof. Let $E^s(x)$ be the subspace spanned by $v(x)$, and observe that $M(x)E^s(x) = E^s(gx)$: if not, then there are two linearly independent vectors, $v_1 \in M(x)E^s(x)$ and $v_2 = v(gx)$, such that both $\|M^{(n)}(gx)v_1\|$ and $\|M^{(n)}(gx)v_2\|$ decrease exponentially as $n \to \infty$, contradicting $M \in SL(2, \mathbb{R})$. The continuity of $x \mapsto E^s(x)$ is proved similarly. Using the uniform contraction of $M^{(N)}$ on vectors in $E^s$ and the fact that $|\det(M)| = 1$, we choose $\delta_0 > 0$ such that for all $x \in X$ and $w \ne 0 \in \mathbb{R}^2$, if $\angle(w, v(x)) < \delta_0$,
then $\angle(M^{(N)}(x)w, v(g^N x)) > \frac{1}{2}\lambda^{2N}\angle(w, v(x))$. Let $C^s(x) = \{w : \angle(w, v(x)) < \delta_0\}$ and $C^u(x) = \mathbb{R}^2 \setminus C^s(x)$. We claim that $E^u(x) := \cap_{n=1}^{\infty} M^{(nN)}(g^{-nN}x)C^u(g^{-nN}x)$ is a 1-dimensional subspace. This follows if we show that $\|M^{(-nN)}w\|$ decreases exponentially as $n \to \infty$ for $w \in E^u$. The latter is a consequence of the following: $\angle(M^{(-nN)}(x)w, M^{(-nN)}(x)v(x)) > \delta_0$, $\|M^{(-nN)}(x)v(x)\| \ge \lambda^{nN}$, and $M \in SL(2, \mathbb{R})$. The $M$-invariance of $E^u$ is checked easily. □

Proposition 7.2. For every $\varepsilon > 0$, $\Omega_\varepsilon$ is uniformly hyperbolic with
\[ \|DT^i u\| \ge K_\varepsilon^{-1} e^{c' i} \quad \text{for all } u \in E^u. \]
Here $K_\varepsilon$ is a constant depending on $\varepsilon$, and $c'$ can be taken to be $\approx \frac{c}{3}$.

Proof. We fix $\varepsilon$ and let $k_\varepsilon$ be the smallest integer $k$ such that $\varepsilon > b^{\frac{k}{20}}$.

Claim 7.1. For every $\xi_0 \in \Omega_\varepsilon$, there exists $k(\xi_0) \le 2k_\varepsilon$ and a unit vector $w_0$ such that if $z_0 = \xi_{k(\xi_0)}$, then for all $i > 0$,
\[ \|DT^i(z_0)w_0\| \ge e^{\frac{c}{3} i}\, b^{\frac{k_\varepsilon}{20}} K^{-\frac{k_\varepsilon}{10}}. \]

Proof of Claim 7.1. We consider separately the following cases:

Case 1. $\xi_i \notin C^{(0)}$ for all $i \le k_\varepsilon$. In this case we let $k(\xi_0) = 0$ and $w_0 = \binom{1}{0}$.

Case 2. $\xi_{i_0} \in C^{(0)}$ for some $i_0 \le k_\varepsilon$ and $\xi_{i_0+k} \notin Z^{(k)}$ for all $k \ge 0$. We let $k(\xi_0) = i_0$ and $w_0 = \binom{0}{1}$.

Case 3. $\xi_{i_0} \in C^{(0)}$ for some $i_0 \le k_\varepsilon$ and $\xi_{i_0+k} \in Z^{(k)}$ for some $k \ge 0$. We let $k$ be the last time this happens, and choose $k(\xi_0) = i_0 + k$, $w_0 = \binom{0}{1}$. Note that $k(\xi_0) \le 2k_\varepsilon$.

In each of the three cases, we first show that $(z_0, w_0)$ is controlled by $\Gamma$ for all $n \ge 0$. This is done by verifying inductively at free returns the hypothesis of Lemma 7.1. The arguments are essentially the same as those for Theorem 1.2(2). From the control of $(z_0, w_0)$, it follows that at free returns, $\|w_i\| > e^{\frac{c}{3} i}$. Next we consider the drop in $\|w_i^*\|$ one step later. This is given by $d_C(z_i)$, which by the definition of $\Omega_\varepsilon$ is $\ge b^{\frac{k_\varepsilon}{20}}$. Further drops at bound returns are exponentially small. For comparisons between $w_i^*$- and $w_i$-vectors, since the fold period $J$ initiated at time $i$ is $\le \frac{k_\varepsilon}{10}$, we have, for $j < J$, $\|w_{i+j}\| \ge K^{-\frac{k_\varepsilon}{10}}\|w_{i+J}\| = K^{-\frac{k_\varepsilon}{10}}\|w_{i+J}^*\|$. □
Let $z_0$ be as above. From Claim 7.1, the fields of most contracted directions of sufficiently high orders are defined at $z_0$, and their uniform contractive estimates are passed on to $e_\infty := \lim_n e_n$ (see Corollary 2.1). Let $v(z_0) = e_\infty(z_0)$. For other $\xi_0 \in \Omega_\varepsilon$, let $v(\xi_0) = DT^{-k(\xi_0)}(\xi_{k(\xi_0)})v(\xi_{k(\xi_0)})$. Using the fact that $k(\xi_0) < 2k_\varepsilon$ and letting $M(z) = \frac{1}{|\det DT(z)|^{1/2}} DT(z)$, we see that the conditions of Lemma 7.3 are satisfied. Uniform hyperbolicity follows.

It remains to prove that a lower bound for $\|DT^i|E^u\|$ is as claimed. In the argument above we have produced for each $\xi_0 \in \Omega_\varepsilon$ a vector $u_0$ uniformly bounded away from $E^s(\xi_0)$ such that $\|u_i\| \ge K_\varepsilon^{-1} e^{\frac{c}{3} i}$. Since $\angle(u_n, E^u(\xi_n)) \to 0$ uniformly, we have $\|u_n\| \sim \|DT^n|E^u(\xi_0)\|$. The assertion in Theorem 1.2(i) on periodic points is proved similarly. □
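The normalization $M(z) = |\det DT(z)|^{-1/2}\, DT(z)$ used in this proof can be unwound as follows. This is a short check rather than part of the original argument; it uses only the chain rule and the multiplicativity of the determinant, with $g = T$ and $z_i = T^i z$.

\[ \det M(z) = \frac{\det DT(z)}{|\det DT(z)|} = \pm 1, \qquad M^{(n)}(z) = \Big( \prod_{i=0}^{n-1} |\det DT(z_i)|^{-1/2} \Big)\, DT(z_{n-1}) \cdots DT(z_0) = |\det DT^n(z)|^{-1/2}\, DT^n(z). \]

In particular $|\det M| = 1$, which is the property of $SL(2, \mathbb{R})$ that the proof of Lemma 7.3 relies on, and for every vector $v$ we have $\|DT^n(z)v\| = |\det DT^n(z)|^{1/2}\, \|M^{(n)}(z)v\|$. So estimates for the cocycle $M^{(n)}$ and for $DT^n$ differ only by the scalar factor $|\det DT^n(z)|^{1/2}$.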
Proof of Theorem 1.2 (1)(ii). We now prove that the deterioration of hyperbolicity on $\Omega_\varepsilon$ as $\varepsilon \to 0$ is not only a possibility but a fact. To do this, it suffices to produce a point $z \in \Omega_\varepsilon$ with the property that $\angle(E^u(z), E^s(z)) < K\varepsilon$. We can choose this point to be on the unstable manifold $W^u(\hat{z})$ of any $\hat{z} \in \Omega_\delta$. For $\xi_0 \in W^u(\hat{z})$, let $\tau_0$ be its unit tangent vector to $W^u(\hat{z})$.

Claim 7.2. For all $\xi_0 \in W^u(\hat{z})$, $(\xi_0, \tau_0)$ is controlled by $\Gamma$ for all $n \ge 0$.

Proof of Claim 7.2. It suffices to prove the result for $\xi_0 \in W^u_{loc}(\hat{z})$. Suppose that $(\xi_0, \tau_0)$ is controlled up to time $k-1$, $\xi_k$ is a free return, and $\xi_k \in C^{(j-1)} \setminus C^{(j)}$ for some $j$. Since $\xi_{k-j} \in \Omega$, it follows that $\xi_k \in R_j$, so that $\xi_k \in \hat{Q}^{(j)} \setminus Q^{(j)}$ for some $Q^{(j)}$. If $j \le k$, then Lemma 7.1 applies directly. If not, we let $z_0 = \xi_{k-j}$ and apply Lemma 7.1 to the orbit of $(z_0, \tau_0(z_0))$. □

Let $\gamma = W^u_{\delta/2}(\hat{z})$. We will show that there exists $z \in (T^n\gamma \cap \Omega_\varepsilon)$ for some $n > 0$ such that $d_C(z) < 2\varepsilon$. As $\gamma$ is iterated, it gets long and eventually meets the region $\{d_C(\cdot) < \varepsilon\}$. Let $n_0$ be the first time this happens, and let $\omega_0 \subset T^{n_0}\gamma$ correspond to some $I_{\mu j}$ in the region $\{\varepsilon \le d_C(\cdot) \le 2\varepsilon\}$. (See the beginning of Sect. 6.1 for notation.) Note that $\omega_0$ is free. We set binding for $\omega_0$ and iterate until it becomes free again at time $n_1$. We then subdivide the image into segments corresponding to $I_{\mu j}$ (by which we include pieces outside of $C^{(0)}$), and let $\omega_1$ be the longest of the divided subsegments. We iterate $\omega_1$ until it becomes free again at time $n_2$. Then divide and choose $\omega_2$ to be the longest of the subsegments etc. Let $z \in \cap_{i \ge 0} T^{-(n_i - n_0)}\omega_i$. Using Corollary 4.3, we verify that $\omega_i \cap \{d_C(\cdot) < \varepsilon\} = \emptyset$ for all $i \ge 0$, so that $z \in \Omega_\varepsilon$.

It remains to estimate $\angle(E^u(z), E^s(z))$. First, since $\tau(z)$ splits correctly, we have $\angle(E^u(z), \tau(\phi(z))) < \varepsilon_0 d_C(z) < 2\varepsilon_0\varepsilon$. Note that $\tau(\phi(z)) = e_\infty(\phi(z))$ and $E^s(z) = e_\infty(z)$. We leave it as an easy exercise to show that $\|DT^n(z)\tau_0(z_0)\| \ge 1$ for all $n > 0$ (use Claim 7.2 and Corollary 4.3), so that at both $z$ and $\phi(z)$, $\angle(e_n, e_\infty) = O(b^n)$. Let $n$ be such that $\lambda^n \sim \varepsilon$, where $\lambda$ is as in Lemma 2.2. Then $\angle(e_n(z), e_n(\phi(z))) < K\varepsilon$, and $O(b^n) \ll \varepsilon$, proving $\angle(\tau(\phi(z)), E^s(z)) < K\varepsilon$. This completes the proof. □
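The final step combines the estimates above through the triangle inequality for angles. A sketch follows, writing $K'$ for a constant not named in the original and taking $n$ as chosen in the text, so that $O(b^n) \ll \varepsilon$:

\begin{align*}
\angle(\tau(\phi(z)), E^s(z)) &= \angle(e_\infty(\phi(z)), e_\infty(z)) \\
&\le \angle(e_\infty(\phi(z)), e_n(\phi(z))) + \angle(e_n(\phi(z)), e_n(z)) + \angle(e_n(z), e_\infty(z)) \\
&< O(b^n) + K\varepsilon + O(b^n) < K'\varepsilon, \\
\angle(E^u(z), E^s(z)) &\le \angle(E^u(z), \tau(\phi(z))) + \angle(\tau(\phi(z)), E^s(z)) < 2\varepsilon_0\varepsilon + K'\varepsilon.
\end{align*}

The right side of the last line is a constant multiple of $\varepsilon$, which is the bound sought.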
8. Statistical Properties of SRB Measures

We follow [Y3] and [Y4], which put forward a scheme for obtaining statistical information for general dynamical systems with some hyperbolic properties. In this approach, one constructs reference sets and studies regular returns to these sets. Sufficient conditions in terms of return times are then given for various statistical properties. In Sect. 8.1, we indicate how this setup is arranged for the class of attractors in question. For technical details on this construction, we refer the reader to [BY2], where a similar construction is carried out for the Hénon maps.

SRB measures and their statistical properties are discussed in Sects. 8.2 and 8.4. A feature of the present setting is that depending on the transitivity properties of $T$, our attractor may admit multiple SRB measures. Obviously, the method of [Y3] and [Y4] gives information only on orbits that pass through the reference sets constructed. To complete the picture, we prove in Sect. 8.3 that all SRB measures are captured by our reference sets, and Lebesgue-almost every initial condition in the basin is accounted for.
8.1. Positive-measure horseshoes with infinitely many branches and variable return times. In [Y3], a unified way of looking at nonuniformly hyperbolic systems is proposed. This dynamical picture requires that one constructs a reference set and a return map with Markov properties. The purpose of this subsection is to recall this construction in the context of the maps under consideration, and to give a summary of the facts needed in the discussion to follow.

8.1.1. Construction of reference set. Let $\{x_1, \cdots, x_r\}$ be the set of critical points of $f$. Our reference set $\Lambda$ is the disjoint union of $2r$ Cantor sets $\Lambda_1^\pm, \cdots, \Lambda_r^\pm$, where $\Lambda_i^+$ and $\Lambda_i^-$ are located in the component of $C^{(0)}$ containing $(x_i, 0)$, one on each side of $(x_i, 0)$. We define $\Lambda_i^+$ (respectively $\Lambda_i^-$) by specifying two transversal families of curves $\Gamma_i^{+,s}$ and $\Gamma_i^{+,u}$ and letting
\[ \Lambda_i^+ = \{ z \in \gamma^u \cap \gamma^s : \gamma^u \in \Gamma_i^{+,u},\ \gamma^s \in \Gamma_i^{+,s} \}. \]

The family $\Gamma_i^{+,s}$ (no relation to the critical set $\Gamma_i$ in Sections 3–6) is defined as follows. Let $P$ be the partition in Sect. 6.1 centered at $(x_i, b) \in \partial R_0$. (To simplify notation, $\partial R_0$ in this section refers to the top boundary of $R_0$.) Let $\omega_0 \subset \partial R_0$ be the outermost $I_{\mu j}$ on the right, and let $\omega_\infty = \{z_0 \in \omega_0 : d_C(z_n) > \delta e^{-\alpha n} \text{ for all } n \ge 0\}$. Letting $m_\gamma(\cdot)$ denote the measure on a curve $\gamma$ induced by arc length, it is proved in Sect. 6.1 that $m_{\omega_0}(\omega_\infty) > 0$. For every $z_0 \in \omega_\infty$, since $\|DT^n(z_0)\tau_0\| \ge \delta e^{\frac{cn}{3}}$ for all $n \ge 0$ (use (IA5) and the definition of $\omega_\infty$), there is a stable curve of every order passing through it. These curves converge to a stable curve $\gamma^s(z_0)$ of infinite order (Sect. 2.1). Moreover, $\gamma^s(z_0)$ has slope $> K^{-1}\delta$ and connects the two boundaries of $R_0$. We define $\Gamma_i^{+,s} := \{\gamma^s(z_0) : z_0 \in \omega_\infty\}$.

To define $\Gamma_i^{+,u}$, we first let $\tilde{\Gamma}_i^{+,u}$ be the set of all free segments $\gamma$ of $\partial R_n$, all $n \ge 0$, such that $\gamma$ is three times as long as $\omega_0$ and has its midpoint vertically aligned with that of $\omega_0$. Let $\Gamma_i^{+,u}$ be the set of curves that are pointwise limits of sequences in $\tilde{\Gamma}_i^{+,u}$. We remark that since the curves in $\tilde{\Gamma}_i^{+,u}$ are $C^2(b)$, their slopes as functions in $x$ form an equicontinuous family. This implies that the curves in $\Gamma_i^{+,u}$ are at least $C^{1+Lip}$, and that the tangent vectors of curves in $\tilde{\Gamma}_i^{+,u}$ converge uniformly to the tangent vectors of curves in $\Gamma_i^{+,u}$.

Recalling that $\Lambda_i^+$ and $\Lambda_i^-$ are the Cantor sets that straddle $x_i$, we may, for convenience, choose $\Gamma_i^{-,s}$ and $\Gamma_i^{+,s}$ in such a way that their elements are paired, i.e. the $T$-image of each element in $\Gamma_i^{-,s}$ lies on a stable curve containing the $T$-image of an element of $\Gamma_i^{+,s}$, and vice versa. This completes the construction of $\Lambda = \cup_{i=1}^r \Lambda_i^\pm$. A similar construction is carried out for the Hénon maps in [BY2], Sects. 3.1–3.4.

8.1.2. Structure of return map. Next we define a return map $T^R : \Lambda \to \Lambda$ with the following properties: Topologically, $T^R : \Lambda \to \Lambda$ has the structure of an infinite horseshoe. For simplicity of notation, we write $\Lambda_i = \Lambda_i^+$ or $\Lambda_i^-$. A set $X \subset \Lambda_i$ is called an s-subset of $\Lambda_i$ if there exists a subcollection $\Gamma \subset \Gamma_i^s$ such that $X = \{z \in \gamma^s \cap \gamma^u : \gamma^s \in \Gamma,\ \gamma^u \in \Gamma_i^u\}$; u-subsets are defined similarly. If $X$ is an s-subset of $\Lambda_i$, we say $X = \Lambda_i$ mod 0 if $m_{\partial R_0}(\Lambda_i - X) = 0$.
Lemma 8.1. There is a map $T^R : \Lambda \to \Lambda$ with the following properties: every $\Lambda_i$ has a collection of pairwise disjoint s-subsets $\{\Lambda_{i,j}\}_{j=1,2,\cdots}$ with $\Lambda_i = \cup_j \Lambda_{i,j}$ mod 0 such that for each $j$,
– $T^R|\Lambda_{i,j} = T^{n_{i,j}}|\Lambda_{i,j}$ for some $n_{i,j} \in \mathbb{Z}^+$;
– $T^R(\Lambda_{i,j})$ is a u-subset of $\Lambda_k$ for some $k = k(i, j)$.

We stress that the partition of $\Lambda_i$ into $\{\Lambda_{i,j}\}$ is an infinite one, and that the return times $n_{i,j}$ are not bounded. The return time function $R : \Lambda \to \mathbb{Z}^+$ is defined to be $R|\Lambda_{i,j} = n_{i,j}$. As we will see, the tail of this function, that is, the distribution of its large values, plays a crucial role in determining the statistical properties of the system. Note that $T^R$ is not necessarily the first return map; we have settled for possibly larger return times in favor of a Markov structure. Lemma 8.1 corresponds to Proposition A(1) in [BY2]; its proof is given in Sects. 3.4 and 3.5 of [BY2].

8.1.3. Two important analytic estimates. Technical estimates corresponding to (P1)–(P5) in [Y3] or Proposition B of [BY2] are needed. Referring the reader to Sect. 5 of [BY2] for their precise statements and proofs, we state below two of the most relevant facts.

Lemma 8.2 (Distortion estimate for controlled segments). There exists $K > 0$ such that the following holds: Let $\gamma_0$ be a curve and $\tau_0$ its unit tangent vectors. We assume that
(i) for all $z_0 \in \gamma_0$, $(z_0, \tau_0)$ is controlled up to time $n - 1$;
(ii) $\gamma_i$ is bound or free simultaneously for each $i$, and $\gamma_i$ is contained in three contiguous $I_{\mu j}$ at all free returns;
(iii) $\gamma_n$ is a free return.
Then for all $z_0, z_0' \in \gamma_0$,
\[ \frac{1}{K} \le \frac{\|\tau_n(z_0)\|}{\|\tau_n(z_0')\|} \le K. \]

The proof is similar to that of Proposition 6.2 (it is, in fact, a little simpler) and will be omitted. In the construction of $T^R : \Lambda \to \Lambda$, it is important to arrange that $\gamma_0$, the shortest subsegment of $\partial R_0$ that spans $\Lambda_{i,j}$ in the u-direction, satisfies (ii) above up to time $n_{i,j}$.

Let $\cup\Gamma_i^s := \cup\{z \in \gamma^s : \gamma^s \in \Gamma_i^s\}$. If $\gamma$ and $\gamma'$ are curves transversal to the elements of $\Gamma_i^s$ and intersecting them, we define $\psi : \gamma \cap (\cup\Gamma_i^s) \to \gamma'$ by sliding along the curves in $\Gamma_i^s$, and say $\Gamma_i^s$ is absolutely continuous if for every pair of $C^2(b)$-curves $\gamma$ and $\gamma'$ as above, $\psi$ carries sets of $m_\gamma$-measure zero to sets of $m_{\gamma'}$-measure zero. Recall that if $\gamma$ is the subsegment of $\partial R_0$ in $\Gamma_i^u$, then $m_\gamma(\gamma \cap (\cup\Gamma_i^s)) > 0$; in particular, the definition above is not vacuous.

Lemma 8.3 (Absolute continuity of $\Gamma_i^s$). $\Gamma_i^s$ is absolutely continuous with
\[ \frac{1}{K} < \frac{d}{dm_{\gamma'}} \psi_*(m_\gamma | \cup\Gamma_i^s) < K \quad \text{on } \gamma' \cap (\cup\Gamma_i^s). \]
Except for one minor technical difference, the proof of Lemma 8.3 is identical to that of Sublemma 10 in Sect. 5 of [BY2]: in the latter, the transversals are taken to be curves in $\tilde{\Gamma}_i^u$, whereas we need them to be arbitrary $C^2(b)$ curves here. Clearly, it suffices to show that Sublemma 10 of [BY2] is valid with $\gamma \in \tilde{\Gamma}_i^u$ and $\gamma'$ arbitrary $C^2(b)$, and for that we need distortion estimates for the $\tau_i$-vectors on certain subsegments of $\gamma'$ ($\omega$ in the proof of Sublemma 10). We have them because these subsegments are connected to subsegments of $\gamma$ by (temporary) stable curves, and the corresponding $\tau_i$-vectors are comparable.

8.1.4. Tail of return times. Finally we state an estimate on which the statistical properties of $T$ depend crucially. Its proof is identical to that of Proposition A(4) in [BY2].

Lemma 8.4. There exists $K$ and $\theta_0 < 1$ such that for every $\Lambda_i$,
\[ m_{\partial R_0}\{z \in \partial R_0 \cap \Lambda_i : R(z) > n\} < K\theta_0^n. \]

8.2. SRB measures.

8.2.1. Construction of SRB measures. We describe below a recipe for constructing SRB measures using the reference sets $\{\Lambda_i^\pm\}$. For the definition of SRB measures, see Sect. 1.3. For more details on the technical justification of the steps below, see [Y3] or [BY2], Sect. 6.2. The construction consists of three steps.

Step 1. Construction of a $T^R$-invariant measure $\nu$ on $\cup\Lambda_k^\pm$ with absolutely continuous conditional measures on the leaves of $\Gamma^u := \cup_k \Gamma_k^{\pm,u}$. We fix some $\Lambda_i = \Lambda_i^+$ or $\Lambda_i^-$, and let $m_0 = m_{\partial R_0} | (\Lambda_i \cap \partial R_0)$. Let $\nu$ be an accumulation point of the sequence of measures
\[ \frac{1}{n} \sum_{j=0}^{n-1} (T^R)^j_* m_0, \quad n = 1, 2, \cdots. \]
Then $\nu$ is a $T^R$-invariant measure. By Lemma 8.2, the conditional measures of $(T^R)^j_* m_0$ on the curves of $\tilde{\Gamma}^u := \cup_k \tilde{\Gamma}_k^{\pm,u}$ have uniformly bounded densities. From Lemma 8.2 and the Markov property of $T^R$ (see Sect. 8.1.2), it follows that for $\gamma \in \tilde{\Gamma}^u$, the conditional densities of $(T^R)^j_* m_0$ on $\gamma$ when restricted to $\gamma \cap (\cup\Gamma^s)$ are uniformly bounded away from 0. These properties are passed on to the conditional measures of $\nu$ on the leaves of $\Gamma^u$. (The curves in $\Gamma^u$ are pairwise disjoint except possibly for a countable number of pairs; this is nothing more than a technical nuisance.)

Step 2. Construction of a $T$-invariant probability measure $\mu$ given $\nu$. It follows from the bounded densities of $\nu$, Lemma 8.3 and Lemma 8.4 that $\int_\Lambda R\, d\nu_0 < \infty$. Let
\[ \mu = \frac{1}{\int R\, d\nu_0} \sum_{j=0}^{\infty} T^j_*(\nu_0 | \{R > j\}). \]
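Both properties of $\mu$ can be read off from this formula. The following sketch assumes that $\nu_0$ is $T^R$-invariant (we take it to be the measure from Step 1, which the text there calls $\nu$), that $R \ge 1$ is integer-valued, and writes $\bar{R} := \int R\, d\nu_0$, a shorthand of ours. For the total mass, since $\sum_{j \ge 0} 1_{\{R > j\}} = R$,

\[ \mu(\text{total}) = \frac{1}{\bar{R}} \sum_{j=0}^{\infty} \nu_0\{R > j\} = \frac{1}{\bar{R}} \int R\, d\nu_0 = 1. \]

For invariance, split $\{R > j\} = \{R > j+1\} \cup \{R = j+1\}$ and use $(T^R)_*\nu_0 = \sum_{n \ge 1} T^n_*(\nu_0 | \{R = n\}) = \nu_0$:

\begin{align*}
\bar{R}\; T_*\mu &= \sum_{j \ge 0} T^{j+1}_*(\nu_0 | \{R > j+1\}) + \sum_{j \ge 0} T^{j+1}_*(\nu_0 | \{R = j+1\}) \\
&= \sum_{j \ge 1} T^{j}_*(\nu_0 | \{R > j\}) + \nu_0 = \sum_{j \ge 0} T^{j}_*(\nu_0 | \{R > j\}) = \bar{R}\, \mu,
\end{align*}

where the last line uses $\nu_0 | \{R > 0\} = \nu_0$.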
It is straightforward to check that $\mu$ is a $T$-invariant probability measure.

Step 3. Proof of SRB property. Let $\mu$ be as in Step 2. First we check that $T$ has a positive Lyapunov exponent $\mu$-a.e. At $z_0 \in (\cup\Gamma^s) \cap (\cup\tilde{\Gamma}^u)$, let $\tilde{\tau}(z_0)$ be a unit tangent vector to $\tilde{\Gamma}^u(z_0)$, the $\tilde{\Gamma}^u$-curve through $z_0$. Just as on $\omega_\infty$, we have $\|DT^n(z_0)\tilde{\tau}\| \ge \delta e^{\frac{cn}{3}}$ for all
$n \ge 0$. This uniform growth is passed on to the tangent vectors $\tau$ to $\Gamma^u$-curves at every $z \in \Lambda = \cup_k \Lambda_k^\pm$. The existence of a positive Lyapunov exponent $\mu$-a.e. follows from the fact that the orbit of $\mu$-almost every point passes through $\Lambda$. General nonuniform hyperbolic theory (see e.g. [P] or [R4]) then tells us that stable and unstable manifolds exist $\mu$-a.e.

To prove that $\mu$ is an SRB measure, we need to show that its conditional measures on unstable manifolds are absolutely continuous. Since $\mu$ is the sum of forward images of $\nu$, it suffices to prove this for $\nu$. We know from Step 1 that $\nu$ has absolutely continuous conditional measures on the leaves of $\Gamma^u$. Thus it remains to prove

Claim 8.1. For $\nu$-a.e. $z_0$, $\Gamma^u(z_0)$ is a local unstable manifold, i.e.
\[ \limsup_{n\to\infty} \frac{1}{n} \log \sup_{\xi_0 \in \Gamma^u(z_0)} |\xi_{-n} - z_{-n}| < 0. \]

Proof of Claim 8.1. From the construction of $\nu$, it follows that for $\nu$-a.e. $z_0 \in \Lambda$, there is a sequence of $\tilde{\Gamma}^u$-curves $\{\tilde{\gamma}_i\}$ such that $\tilde{\gamma}_i \to \Gamma^u(z_0)$. Let $n_i$ be such that $T^{-n_i}\tilde{\gamma}_i \subset \partial R_0$. Since $\tilde{\gamma}_i$ is free, we have that for all tangent vectors $\tilde{\tau}$ of $\tilde{\gamma}_i$, $\|DT^{-n}\tilde{\tau}\| < e^{-c'n}$ for some $c' > 0$ and $0 < n \le n_i$ (Proposition 5.1 and Lemma 4.8). These uniform estimates for backward iterates of $T$ are passed on to all tangent vectors of $\Gamma^u(z_0)$, proving that it is a local unstable manifold of $z_0$. □

8.2.2. Ergodic decomposition of SRB measures. We begin by considering the ergodic decompositions of the $T^R$-invariant measures constructed in Step 1 in Sect. 8.2.1.

Definition 8.1. Let $g : X \to X$ be a continuous map of a compact metric space, and let $\nu$ be a $g$-invariant Borel probability measure on $X$. We say $z \in X$ is generic or future-generic with respect to $\nu$ if for every continuous function $\varphi : X \to \mathbb{R}$,
\[ \frac{1}{n} \sum_{i=0}^{n-1} \varphi(z_i) \to \int \varphi\, d\nu. \]

Let $\mathcal{M}(T^R)$ be the set of all normalized invariant measures constructed in Step 1 of Sect. 8.2.1. Let $\nu \in \mathcal{M}(T^R)$, and suppose that $\nu(\Lambda_i) > 0$ for some $\Lambda_i = \Lambda_i^+$ or $\Lambda_i^-$. From the positivity of the conditional densities of $\nu$ on $\Lambda_i \cap \gamma$, Lemma 8.3, and a standard argument due to Hopf, we know that there is an ergodic component $\nu^e$ of $\nu$ such that (i) $\nu$-a.e. $z \in \Lambda_i$ is generic with respect to $\nu^e$; (ii) for every $C^2(b)$-curve $\gamma$, $m_\gamma$-a.e. $z \in \gamma \cap (\cup\Gamma_i^s)$ is generic with respect to $\nu^e$. We abbreviate this by saying $\nu^e$ “occupies” $\Lambda_i$.

Let $\mathcal{M}_e(T^R)$ denote the set of normalized ergodic components of measures in $\mathcal{M}(T^R)$. Then each $\Lambda_i^+$ (resp. $\Lambda_i^-$) is occupied by an element of $\mathcal{M}_e(T^R)$. Because the stable curves of $\Lambda_i^+$ and $\Lambda_i^-$ are joined, $\Lambda_i^+$ and $\Lambda_i^-$ are in fact occupied by the same element of $\mathcal{M}_e(T^R)$. Thus the cardinality of $\mathcal{M}_e(T^R)$ is $\le r$.

To further study the structure of $\mathcal{M}_e(T^R)$ we borrow some ideas from finite state Markov chains. Let $\Lambda_i^\pm := \Lambda_i^+ \cup \Lambda_i^-$. We think of each of the sets $\Lambda_i^\pm$, $i = 1, \cdots, r$, as a state, and write “$i \to j$” if $T^R(\Lambda_i^\pm) \cap \Lambda_j^\pm \ne \emptyset$. We say $i$ is transient if there exists $j$ such that there is a chain $i \to \cdots \to j$ but no chain with $j \to \cdots \to i$. Non-transient
states are called recurrent. The following are consequences of simple facts about directed graphs.

(a) The set of recurrent states is partitioned into equivalence classes where $i \sim j$ if there is a chain $i \to \cdots \to j$. On the union of the $\Lambda_i^\pm$ corresponding to the states in each equivalence class is supported exactly one element of $\mathcal{M}_e(T^R)$, which occupies each of these $\Lambda_i^\pm$.

(b) If $i$ is transient, then clearly $\nu(\Lambda_i^\pm) = 0$ for every $\nu \in \mathcal{M}_e(T^R)$. The following claim is a consequence of the structure of $T^R$ (Lemma 8.1) and the fact that for every transient state $j$, there exists a recurrent $k$ such that $j \to \cdots \to k$.

Claim 8.2. $\Lambda_i^\pm$ is the mod 0 union of a collection of pairwise disjoint s-subsets $\{\hat{\Lambda}_{i,J}\}_{J=1,2,\cdots}$ with the property that for each $J$, there exists $n_J > 0$ such that $(T^R)^{n_J}\hat{\Lambda}_{i,J}$ is a u-subset of some recurrent state.

The discussion in Sects. 8.2.1 and 8.2.2 is summarized as follows:

Proposition 8.1. Let $r$ be the number of critical points of $f$. Then there exist ergodic SRB measures $\mu_1, \mu_2, \cdots, \mu_{r'}$, $1 \le r' \le r$, such that for every $C^2(b)$-curve $\gamma$, $m_\gamma$-a.e. $z \in \gamma \cap (\cup\Gamma^s)$ is generic with respect to some $\mu_i$.

Proof. Let $\mathcal{M}_e(T^R) = \{\nu_1, \nu_2, \cdots, \nu_{r''}\}$. Then the $\mu_i$ in this proposition are saturations of the $\nu_j \in \mathcal{M}_e(T^R)$ in the sense of Step 2 in Sect. 8.2.1. Clearly $r' \le r'' \le r$; it may happen that $r' < r''$ because the saturations of distinct $T^R$-invariant measures may merge. The genericity assertion is proved as follows. If $k$ is a recurrent state, then it is occupied by some $\nu_j$, and hence $m_\gamma$-a.e. $z \in \gamma \cap (\cup\Gamma_k^s)$ is generic with respect to some $\mu_i$. Via Claim 8.2, the same conclusion holds if $k$ is a transient state. □

8.3. A bound on the number of ergodic SRB measures and accounting for almost every initial condition in the basin. Let $m$ denote the Lebesgue measure on $R_0$. In this subsection we prove

Proposition 8.2. Let $\{\mu_i\}$ be the ergodic SRB measures in Proposition 8.1. Then for $m$-a.e. $z_0 \in R_0$, there exists $\mu_i$ with respect to which $z_0$ is generic.

We prove this by showing that for some $n > 0$, $z_n$ lies in the local stable curve of a $\mu_i$-typical point. The definition of genericity is given in Sect. 1.3.

Proposition 8.2 serves two purposes: It accounts for the behavior of Lebesgue-a.e. initial condition in the basin of attraction and proves, at the same time, that the $\{\mu_i\}$ are all of the ergodic SRB measures of $T$ (see (ii) below), thereby putting an upper bound on this number. Propositions 8.1 and 8.2 together prove Theorem 1.3.

The following background information on general nonuniform hyperbolic theory may help put things in perspective:

(i) In general, not all attractors admit SRB measures. This is the case even when there is a great deal of hyperbolicity (see e.g. [HY]). Also, without assumptions of transitivity, the number of ergodic SRB measures on an attractor can, in theory, be countably infinite (see [Led]). For the maps considered in this paper, examples show that multiple SRB measures do occur, and the given bound is achieved.
(ii) Returning to general nonuniform theory, let $B(\mu)$ denote the measure-theoretic basin of $\mu$, i.e. the set of points generic with respect to the measure $\mu$. If $\mu$ is an ergodic SRB measure with no zero Lyapunov exponents, then $B(\mu)$ has positive Lebesgue measure (see [PS]). This is the reason why SRB measures are important in physics. It is also how we deduce from Proposition 8.2 that we have exhausted our list of ergodic SRB measures.

(iii) In general, without assumptions of transitivity, the attractor can be considerably larger than the union of the supports of its ergodic SRB measures. This happens for the maps we are considering.

(iv) In general, the union of $B(\mu)$ as $\mu$ ranges over all ergodic SRB measures need not be a full Lebesgue measure subset of the topological basin of the attractor (meaning the set of all points $z$ with $d(T^i z, \Omega) \to 0$). Measure-theoretic basins can be strictly smaller even when the SRB measure is unique or when $\cup\,\mathrm{supp}(\mu) = \Omega$. Proposition 8.2 therefore goes beyond general theory to describe a nice property of these attractors.

(v) Finally, when there is more than one ergodic SRB measure, their measure-theoretic basins can be very delicately intertwined. For the maps being considered here, we leave it as an exercise to construct examples in which there are arbitrarily small open sets meeting every $B(\mu_i)$ in a set of positive Lebesgue measure.

Proof of Proposition 8.2. Let $B$ be the set of points not generic with respect to any of the $\mu_i$. We remark that $B$ is a Borel measurable set, for genericity with respect to a given measure is determined by a countable number of test functions. Let $Z^{(k)}$ be as in Sect. 7.3. Let $Y_0 = \{z_0 \in R_0 : z_k \notin Z^{(k)} \text{ for any } k \ge 0\}$, and for $i \ge 1$, let
\[ Y_i = \{z_0 \in R_0 : z_i \in Z^{(i)} \text{ and } z_k \notin Z^{(k)} \text{ for all } k > i\}. \]

Suppose $m(B \cap Y_i) > 0$ for some $i > 0$. Then $m(B \cap T^i Y_i) > 0$, and there is a vertical line $\gamma$ with $m_\gamma(B \cap T^i Y_i) > 0$. Let $\varepsilon > 0$ be a small number. By the Lebesgue density theorem, there exists a short segment $\gamma_0 \subset \gamma$ with the property that $m_\gamma(B \cap T^i Y_i \cap \gamma_0) > (1 - \varepsilon)m_\gamma(\gamma_0)$. We will show in the next paragraphs that points generic with respect to some $\mu_i$ make up a definite fraction of $\gamma_0$, contradicting our choice of $\gamma_0$ if $\varepsilon$ is sufficiently small. (The argument we present also works if $m(B \cap Y_0 \cap C^{(0)}) > 0$. For the case $m(B \cap Y_0 \cap (R_0 \setminus C^{(0)})) > 0$, use horizontal instead of vertical lines.)

Let $\tau_0$ denote the tangent vectors to $\gamma_0$, and let $\gamma_j = T^j\gamma_0$. We regard all of $\gamma_0$ (which can be taken to be arbitrarily short) as bound to its nearest critical point, and let $n_1$ be the first time when part of $\gamma_j$ makes a free return to $C^{(0)}$. As before, let $\Lambda_k = \Lambda_k^+$ or $\Lambda_k^-$. Let $D(\Lambda_k)$ denote the smallest rectangular region bounded by $\Gamma^u$- and $\Gamma^s$-curves that contains $\Lambda_k$. If $\gamma_{n_1}$ crosses some $D(\Lambda_k)$ with two segments of at least comparable lengths extending beyond the two sides of $D(\Lambda_k)$, we consider the segment $\gamma_{n_1} \cap D(\Lambda_k)$ as having reached its final destination and take it out of circulation. We then divide what remains of $\gamma_{n_1}$ into $I_{\mu j}$ and delete those subsegments that do not contain a point of $T^{n_1}(B \cap T^i Y_i)$. Observe that for $z_0 \in \gamma_0 \cap T^i Y_i$, $(z_0, \tau_0)$ is controlled through time $n_1 - 1$, and by Lemma 7.1, $\tau_{n_1}$ splits correctly (see the proof of Theorem 1.2(2)). This is true not only for $z_0 \in \gamma_0 \cap T^i Y_i$ but also for $z_0' \in \gamma_0$ such that $z_{n_1}'$ is in the same $I_{\mu j}$ as $z_{n_1}$.

We iterate independently each one of the $I_{\mu j}$-segments that are kept. At the next free return we repeat the same procedure, namely we take out subsegments that cross some $D(\Lambda_k)$, divide the rest into $I_{\mu j}$, delete those that do not contain a point in the image of $B \cap T^i Y_i$, and observe that for the remaining segments control is extended to the next free return.
Let $\gamma_0^d = \{z_0 \in \gamma_0 : z_j \text{ is deleted at a free return for some } j > 0\}$, and let $\hat\gamma_0 = \{z_0 \in \gamma_0 : z_j \text{ reaches } D(\Lambda_k) \text{ for some } j \text{ and } k \text{ in the required manner}\}$. We note that $m_{\gamma_0}(\gamma_0 \setminus (\gamma_0^d \cup \hat\gamma_0)) = 0$. This follows from a sublemma which is the first step in the proof of Lemma 8.4 (see [BY2], Sublemma 4 and its corollary). Since $(\gamma_0 \cap B \cap T^i Y_i) \cap \gamma_0^d = \emptyset$, we have $(\gamma_0 \cap B \cap T^i Y_i) \subset \hat\gamma_0$ mod 0, and $\hat\gamma_0$ is the disjoint union of a countable number of subsegments $\{\omega\}$ with the following properties:
– each $\omega$ is mapped under some $T^{n(\omega)}$ onto a $C^2(b)$-curve that connects two $\Gamma^s$-sides of some $D(\Lambda_k)$;
– $(z_0, \tau_0)$ is controlled up to time $n(\omega)$ for every $z_0 \in \omega$.

From Lemmas 8.2, 8.3 and Proposition 8.1, it follows that there exists $c_1 > 0$ independent of the choice of $\gamma_0$ such that for each $\omega$,
$$m_\gamma\{z_0 \in \omega : z_{n(\omega)} \in \cup\Gamma^s \text{ and is generic w.r.t. some } \mu_k\} > c_1\, m_\gamma(\omega).$$
This implies that
$$m_\gamma\{z_0 \in \gamma_0 : z_0 \text{ is generic w.r.t. some } \mu_k\} > c_1\, m_\gamma(\hat\gamma_0) > c_1 (1-\varepsilon)\, m_\gamma(\gamma_0),$$
contradicting our choice of $\gamma_0$ if $c_1(1-\varepsilon) > \varepsilon$. □
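The closing inequality of this proof can be checked numerically. The following is a minimal sketch, not part of the paper: the function names are hypothetical, and it only encodes the arithmetic that the non-generic set fills more than a fraction $1-\varepsilon$ of $\gamma_0$ (so generic points fill less than $\varepsilon$), while the argument gives generic points a fraction larger than $c_1(1-\varepsilon)$.

```python
from fractions import Fraction


def generic_fraction_lower_bound(c1, eps):
    """Lower bound c1 * (1 - eps) on the fraction of gamma_0 that is generic."""
    return c1 * (1 - eps)


def is_contradiction(c1, eps):
    """True when the lower bound exceeds eps, the upper bound forced by the
    density choice of gamma_0 (non-generic points fill more than 1 - eps)."""
    return generic_fraction_lower_bound(c1, eps) > eps


def eps_threshold(c1):
    """c1 * (1 - eps) > eps holds exactly when eps < c1 / (1 + c1)."""
    c1 = Fraction(c1)
    return c1 / (1 + c1)
```

For any fixed $c_1 > 0$ the threshold $c_1/(1+c_1)$ is positive, which is why it suffices to take $\varepsilon$ small after $c_1$ has been fixed independently of $\gamma_0$.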
8.4. Correlation decay and Central Limit Theorem. We indicate how Theorem 1.4 is proved. The setup $T^R : \Lambda \to \Lambda$ is designed so that the statistical properties in question are easily read off from the tail properties of the return time function $R$. To use the results in [Y3] or [Y4] directly, however, we need to consider returns to a single recurrent state.

Let $\tilde\mu$ be one of the $\mu_j$ in Proposition 8.1, and let $\tilde\Lambda$ be one of the $\Lambda_i$ such that $\tilde\mu(\Lambda_i) > 0$. For $z \in \tilde\Lambda$, we define a return time $\tilde R(z)$ of $z$ to $\tilde\Lambda$ by $\tilde R(z) = t_0 + t_1 + \cdots + t_n$, where $t_0 = R(z)$, $t_1 = R(T^R(z))$, $\cdots$, $t_n = R((T^R)^n z)$ and $(T^R)^{n+1} z$ is the first return to $\tilde\Lambda$ under $T^R$. The results in [Y3] or [Y4] allow us to read off information on the statistical properties of $(T, \tilde\mu)$ via the asymptotics of $m_{\partial R_0}\{z \in \partial R_0 \cap \tilde\Lambda : \tilde R(z) > n\}$.

Lemma 8.5. There exist $K > 0$ and $\tilde\theta_0 < 1$ such that for every $n > 0$,
$$m_{\partial R_0}\{z \in \partial R_0 \cap \tilde\Lambda : \tilde R(z) > n\} < K \tilde\theta_0^{\,n}.$$

This lemma, which we leave as an exercise, is an easy consequence of Lemma 8.4. The results in [Y3] and [Y4] state that if the quantity estimated in Lemma 8.5 is of order $O(1/n^{2+\varepsilon})$ for some $\varepsilon > 0$, then the Central Limit Theorem holds in the context of Theorem 1.4. This condition is evidently satisfied here. They also tell us that if this quantity is exponentially small, then every mixing component of $\tilde\mu$ has exponential decay of correlations as asserted.

9. Global Geometry

9.1. Motivation. Nonuniformly hyperbolic attractors have very complicated local structures. The purpose of this section is to develop an understanding of the coarse geometry of the attractor for the maps in question, that is to say, to describe in a finite way the approximate shape and complexity of $\Omega$. To illustrate the idea of coarse geometry, consider the standard solenoid constructed from $z \mapsto z^2$. A good approximation of the attractor is given by the $k$th forward image
[Fig. 3. The geometry of $R_k$; panels (a) to (d).]
of $S^1 \times D^2$, which is a tubular neighborhood of a simple closed curve winding around the solid torus $2^k$ times. For another example, consider piecewise monotonic maps in 1-dimension. Iterates of these maps continue to be piecewise monotonic and can be understood in terms of their monotone pieces.

Returning to the maps under consideration, the standard solenoid example suggests that $R_k$ may be a good approximation of $\Omega$. In analogy with 1-dimension, one may also guess that $R_k$ is a tubular neighborhood of a simple closed curve whose $x$-coordinates vary in a piecewise monotonic fashion. The latter is false, as is evident from the sequence of pictures in Fig. 3: depicted in (a) is a section of $R_k$ lying between two $C^2(b)$-curves; (b) is the image of (a). As (b) is iterated, the horizontal distance between the tips of the two parabolas increases as shown in (c), until at some point they fall on opposite sides of a component of the critical set, resulting in (d). Since this happens to every "turn" that is created, the geometry of $R_k$ for large $k$ is quite complicated.

The purpose of this section is to introduce the idea of monotone branches as basic building blocks for understanding the global structure of $\Omega$. To each map $T$ we will associate a combinatorial tree whose edges correspond to monotone branches, and we will show that $\Omega$ has arbitrarily fine neighborhoods made up of unions of finitely many monotone branches. Moreover, the way these branches fit together will tell us exactly how, in finite approximation, $T$ differs from a 1-dimensional map.
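The 1-dimensional remark, that iterates of a piecewise monotonic map stay piecewise monotonic and are understood through their monotone pieces, can be illustrated numerically. The sketch below is an illustration only and does not come from the paper: the tent map is a hypothetical stand-in for a piecewise monotonic map, and `count_laps` is an assumed helper name. It counts monotone pieces of the $k$th iterate by sampling on a dyadic grid, on which the tent map is computed exactly in floating point.

```python
def tent(x):
    """Tent map on [0, 1]: two monotone pieces, turning point at 1/2."""
    return 1.0 - abs(2.0 * x - 1.0)


def iterate(f, x, k):
    """Return f applied k times to x."""
    for _ in range(k):
        x = f(x)
    return x


def count_laps(f, k, n_grid=4096):
    """Count the monotone pieces of the k-th iterate of f on [0, 1].

    The count is the number of sign changes of consecutive increments on a
    uniform grid, plus one. It is reliable only when every turning point of
    the iterate lies on the grid, as is the case for the tent map with a
    dyadic grid and k <= log2(n_grid).
    """
    xs = [j / n_grid for j in range(n_grid + 1)]
    ys = [iterate(f, x, k) for x in xs]
    signs = [1 if y1 > y0 else -1 for y0, y1 in zip(ys, ys[1:]) if y1 != y0]
    changes = sum(1 for s0, s1 in zip(signs, signs[1:]) if s0 != s1)
    return changes + 1
```

For the tent map the count doubles with each iterate, the 1-dimensional counterpart of the solenoid's curve winding $2^k$ times; for the maps of this paper the corresponding count of monotone branches is governed by the tree described in Sect. 9.2.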
9.2. Monotone branches. For $z_0 \in R_0$, let $O^+(z_0) = \{z_1, z_2, z_3, \cdots\}$ denote the positive orbit of $z_0$, and write $O^+(\Gamma) = \cup_{z_0 \in \Gamma}\, O^+(z_0)$.
Definition 9.1. Let $\gamma$ be a connected subsegment of $\partial R_k$. We say $\gamma$ is a (maximal) monotone segment if (i) the two end points of $\gamma$ are in $O^+(\Gamma)$; (ii) $\gamma$ does not intersect $O^+(\Gamma)$ in its interior.

When we say $\xi_i$ is an end point of a monotone segment, it will be understood that $\xi_0$ is a critical point. We record below some simple facts about monotone segments.

Lemma 9.1. Let $\gamma \subset \partial R_k$ be a monotone segment. Then:
(a) All points near the two ends of $\gamma$ are in their fold periods; the part of $\gamma$ not in a fold period (respectively bound period), if nonempty, is connected.
(b) If part of $\gamma$ is free, then its geometry is as follows: $\gamma$ consists of a relatively long $C^2(b)$-curve connecting two sets of relatively small diameters at the two ends; more precisely, there exists $p$ such that the $C^2(b)$-curve has length $> e^{-\beta p}$ while the diameters of the two small sets are $< b^{p/2}$; also, the curvature of $\partial R_k$ at the end point $\xi_i$ of $\gamma$ is $> b^{-i}$.
(c) If $\gamma$ meets $\Gamma$ in $r$ points, $r \ge 0$, then $T(\gamma)$ is the union of $r + 1$ monotone segments joined together at the $T$-images of these points.

Proof. (c) follows from the definition of a monotone segment. (a) follows from the way monotone segments are created and from the monotonicity of bound and fold periods (see the proof of Lemma 4.10). The first assertion in (b) follows from estimates on the relative sizes of the parts of $\gamma$ that are in bound versus fold periods; the second follows from the curvature formula in the proof of Lemma 2.4. □

We now begin to study the geometry of certain 2-dimensional objects.

Definition 9.2. A simply connected region $S \subset R_k$ is called a monotone branch if it is bounded by two monotone segments $\gamma, \gamma' \subset \partial R_k$ and two ends $E_\xi$ and $E_\zeta$ with the following properties:
(i) If the end points of $\gamma$ are $\xi_i$ and $\zeta_j$, then the end points of $\gamma'$ are $\xi_i'$ and $\zeta_j'$, where $\xi_0$ and $\xi_0'$ lie on the upper and lower boundaries of the same component $Q^{(k-i)}$ of $C^{(k-i)}$, and $\zeta_0$ and $\zeta_0'$ are related in the same way.
(ii) $E_\xi = T^i\{z \in Q^{(k-i)}(\xi_0) : |z - \xi_0| < b^{\frac{k-i}{4}}\}$; its time of creation is said to be $k - i$; $E_\zeta$ and its time of creation are defined analogously.
(iii) We define the age of $E_\xi$ to be $i$ and require that $i < \theta^{-1}(k - i + 1)$; there is an analogous limit on the age of $E_\zeta$.

The definitions of $E_\xi$ and $E_\zeta$ are quite arbitrary, subject only to the following considerations: We want $E_\xi$ to be large enough to contain all the critical orbits that originate from $Q^{(k-i)}(\xi_0)$. On the other hand, we want it to remain relatively small during the life span of the monotone branch, so that the phenomenon depicted in Fig. 3 does not occur. We will assume that for $i < \theta^{-1}(k - i + 1)$, $\|DT\|^i\, b^{\frac{k-i}{4}} < b^{\frac{k-i}{8}} \ll e^{-\alpha i}$, which is $< d_{\mathcal{C}}(z_i)$ for $z_0 \in \Gamma$ by (IA2) in Sect. 3; that is to say, if $S$ is a monotone branch of $R_k$, then its ends are at least a certain distance from $C^{(k)}$.

It is not always easy to visually identify monotone branches, particularly when their boundary segments are in fold periods. When part of $\gamma$ is free, it follows from Lemma 9.1(b) that $S$ consists of a (relatively long) horizontal strip with two small blobs at the two ends.
[Fig. 4. Tree of monotone branches: branches ending in • are discontinued.]
Tree structure of a class of monotone branches. Monotone branches can be constructed as follows. First we declare that $R_0$ is a monotone branch (even though it has no ends). Then if $x_i < x_{i+1}$ are adjacent critical points of the 1-dimensional map $f$, the $T$-image of $\{z = (x, y) : x_i - b^{1/4} < x < x_{i+1} + b^{1/4}\}$ is a monotone branch of $R_1$. In general, let $S$ be a monotone branch of $R_k$. If one of the ends of $S$ is at its maximum allowed age, then $S$ is "discontinued", meaning we do not iterate it further. If not, $T(S)$ is the union of a finite number of monotone branches of $R_{k+1}$. More precisely, if $S \cap C^{(k)} = \emptyset$, then $T(S)$ is a monotone branch. If $S \cap Q^{(k)} \neq \emptyset$, then $S \supset Q^{(k)}$ (in fact, $S$ extends beyond $Q^{(k)}$ by $> e^{-\alpha k}$ in both directions). If $S$ contains $r$ components of $C^{(k)}$, then $T(S)$ is the union of $r + 1$ monotone branches split roughly along the $T$-images of the middle of each of the $Q^{(k)}$ contained in $S$ (cf. Lemma 9.1(c)).

Let $\mathcal{T} = \cup_k \mathcal{T}_k$ denote the set of all monotone branches inductively constructed this way, with $\mathcal{T}_k$ consisting of branches of $R_k$. More precisely, $\mathcal{T}_0 = \{R_0\}$, and $\mathcal{T}_{k+1}$ is obtained from $\mathcal{T}_k$ via the procedure described above. We will be working exclusively with monotone branches in $\mathcal{T}$, which is a proper subset of the set of all monotone branches in Definition 9.2. The set $\mathcal{T}$ has a natural tree structure: we call the branches obtained by mapping forward and subdividing a given branch its descendants. Note that every branch in $\mathcal{T}_k$ has a unique ancestor in $\mathcal{T}_i$ for every $i < k$, but not all branches in $\mathcal{T}$ have offspring: the ones with no offspring are exactly those one of whose ends has reached its maximum allowed age.

We have elected to discontinue a branch before its geometry "deteriorates". An immediate question that arises is what happens to the part of the attractor contained in a discontinued branch. We will show in the next subsection that branches farther down the tree $\mathcal{T}$ can be used to take its place.
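The discontinuation rule can be made concrete with a small bookkeeping sketch. This is a hedged illustration, not the paper's construction: the function names are hypothetical, only the age constraint $i < \theta^{-1}(k - i + 1)$ of Definition 9.2(iii) is encoded (none of the geometry), and "maximum allowed age" is assumed to mean the largest integer $i$ satisfying that strict inequality for a fixed time of creation $k - i$.

```python
import math
from fractions import Fraction


def max_allowed_age(creation_time, theta):
    """Largest integer age i with i < (creation_time + 1) / theta.

    creation_time plays the role of k - i in Definition 9.2; theta should be
    a Fraction so that the strict inequality is decided exactly.
    """
    bound = Fraction(creation_time + 1) / Fraction(theta)
    return math.ceil(bound) - 1


def is_discontinued(k, creation_times, theta):
    """True if some end of a branch of R_k has reached its maximum allowed age.

    creation_times lists the times of creation of the ends of the branch;
    an end created at time c has age k - c.
    """
    return any(k - c >= max_allowed_age(c, theta) for c in creation_times)
```

Under these assumptions an end created at time $c$ survives for roughly $(c+1)/\theta$ iterates, so ends created later are allowed to live proportionally longer.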
We will, in fact, prove the following stronger version of Theorem 1.5.

Theorem 1.5'. One can construct special neighborhoods $\tilde R_n$ as in Theorem 1.5 using only monotone branches from $\mathcal{T}_k$, $n \le k < (1 + K\theta)n$.
9.3. Replacement of branches. Let $S \in \mathcal{T}_k$ be a branch whose ends are denoted by $E_\xi$ and $E_\zeta$. In the discussion to follow, we assume that $E_\xi$ is fairly advanced in age, meaning $(k - i) \sim \theta i$, where $i$ is the age of $E_\xi$ and $k - i$ is its time of creation. As we search for replacements for $S$, the picture we hope to have is the following. There is a finite collection of branches $\{B\} \subset \cup_{k < j \le (1+K\theta)k}\, \mathcal{T}_j$ such that
(i) the ends of $B$ are contained in those of $S$; and
(ii) if $S \in \mathcal{S}$, where $\mathcal{S} \subset \mathcal{T}$ is a cover of $\Omega$, then replacing $S$ by $\{B\}$ does not leave any part of $\Omega$ exposed.

Let $Q^{(k-i)}$ be the component of $C^{(k-i)}$ containing $T^{-i} E_\xi$. We hope to show that $T^{-i} S \subset Q^{(k-i)}$, so that the picture described above pulled back to $Q^{(k-i)}$ is as shown in Fig. 5.

We begin to systematically justify this picture. For $j = 0, 1, \cdots, i - 1$, let $S_j \in \mathcal{T}_{k-i+j}$ be the ancestor of $S$, so that $S_0$ is the monotone branch of $R_{k-i}$ containing $Q^{(k-i)}$. Let $E_0$ denote the end of $S_0$ contained in $Q^{(k-i)}$, and let $E_j = T^j E_0$. Let the other end of $S_j$ be called $E_j'$. Let $t > k - i$, and let $P \in \mathcal{T}_t$ be such that $P \cap Q^{(k-i)}$ is a horizontal strip bounded by two $C^2(b)$-curves stretching all the way across $Q^{(k-i)}$. We think of $P$ as a pre-branch with respect to $S_0$ in the sense that $P \subset S_0$ and it is not yet born when $S_0$ is created. If $P$ is not discontinued, then we let $P_1$ be the (unique) child of $P$ with one end in $E_1$, and assuming $P_1$ is not discontinued, we let $P_2$ be the child of $P_1$ with one end in $E_2$. Similarly, we define $P_3, P_4, \cdots$ up to $P_i$ if it makes sense.

Lemma 9.2. There exists $K_1$ depending on $\rho$ such that
(i) for all $j$ with $K_1(k - i) < j \le i$, $T^{-j} S_j \subset Q^{(k-i)}$;
(ii) if $P_j$ is defined for all $j \le K_1(k - i)$, then it is defined for all $j \le i$; moreover, for each $j \ge K_1(k - i)$, $P_j \subset S_j$, and the two ends of $P_j$ are contained in the two ends of $S_j$.

We isolate the following sublemma, the ideas in which are also used elsewhere. See Sect. 6.1 for notation.

Sublemma 9.1. Let one of the horizontal boundaries of $Q^{(s)}$, any $s$, be identified with $[-\rho^s, \rho^s]$, with the critical point corresponding to 0. Then for every $I_{\mu_0 j_0} \subset [-\rho^s, \rho^s]$, there exists $n < K|\mu_0|$ such that $T^n I_{\mu_0 j_0}$ traverses completely a component of $C^{(0)}$.

Proof. Let $\omega_0 = I_{\mu_0 j_0}$, and let $r_0$ be the first time when part of $\omega_0$ makes a free return with $T^{r_0}\omega_0$ containing an $I_{\mu j}$ of full length. By Corollary 4.3, either $T^{r_0}\omega_0$ contains one of the outermost $I_{\mu j}$ (which we will call $\tilde I$) or it contains some $I_{\mu_1 j_1}$ with $|\mu_1| < K\beta|\mu_0|$. In the latter case, we let $\omega_1 = I_{\mu_1 j_1}$ and continue to iterate until $r_1$ iterates later when part of $T^{r_1}\omega_1$ is free and contains either $\tilde I$ or some $I_{\mu_2 j_2}$ with $|\mu_2| < K\beta|\mu_1|$. After a finite number of iterates, we have $T^{r_q}\omega_q \supset \tilde I$.
[Fig. 5. Replacing $S$ by $\{B\}$. Labels in the figure: $Q^{(k-1)}$, $T^{-i}S$, $T^{-i}E_\xi$, $T^{-i}B$.]
From Corollary 4.3, we see that at the end of its bound period, $T^p \tilde I$ has length $\gg \delta$. Inductively define $\tilde I_{p+j} = T(\tilde I_{p+j-1}) \setminus C^{(0)}$ for $j = 1, 2, \cdots$. Then $\tilde I_{p+j}$ is a connected $C^2(b)$-curve which grows essentially exponentially until it crosses completely a component of $C^{(0)}$. Since $r_i \sim |\mu_i|$ up to the point when $T^{r_q}\omega_q \supset \tilde I$, and the growth is exponential thereafter, we conclude that the end game is reached in a total of $< K|\mu_0|$ iterates. □

Proof of Lemma 9.2.

Claim 9.1. There exists $K_1$ (depending on $\rho$) such that $T^{-K_1(k-i)} S_{K_1(k-i)} \subset Q^{(k-i)}$.

Proof of Claim 9.1. We identify the upper horizontal boundary of $Q^{(k-i)}$ with the interval $[-\rho^{k-i}, \rho^{k-i}]$, with the critical point corresponding to 0, and let $n_1$ be the smallest $n$ such that $T^n[0, \frac12\rho^{k-i}]$ intersects the horizontal boundary of some $Q^{(k-i+n)}$. From Sublemma 9.1, $n_1 < K_1(k - i)$ for some $K_1 = K(\rho)$. The claim is proved once we show that $T^{-(n_1+1)} S_{n_1+1} \subset Q^{(k-i)}$.

Let $[0, J]$ be the shortest interval such that $T^{n_1}[0, J]$ contains the entire horizontal boundary of a $Q^{(k-i+n_1)}$. Since this boundary is free, $J < \frac12\rho^{k-i} + e^{-c' n_1}\rho^{k-i+n_1}$, which is $\approx \frac12\rho^{k-i}$. Let $\hat S_{n_1}$ be the section of $R_{k-i+n_1}$ from $E_{n_1}$ to the middle of $Q^{(k-i+n_1)}$. Since $b^{\frac{k-i}{4}} K^{K_1(k-i)} \ll \rho^{k-i+K_1(k-i)}$, we have that $T^{n_1} Q^{(k-i)} \supset \hat S_{n_1}$. It remains to show $S_{n_1+1} = T(\hat S_{n_1})$, for which we need only to check that $T(\hat S_{n_1})$ is a monotone branch. To do that, it suffices to show that for $j < n_1$, $T^j[0, J]$ does not contain the horizontal boundary of any $Q^{(k-i+j)}$. Suppose it does for some $j$. By our choice of $n_1$, this can happen only if $J > \frac12\rho^{k-i}$ and $|T^j[\frac12\rho^{k-i}, J]| \ge \rho^{k-i+j}$, which is impossible, for $|T^j[\frac12\rho^{k-i}, J]| < e^{-c'(n_1-j)}\rho^{k-i+n_1}$. □

Suppose we are guaranteed that $P_{n_1}$ exists. We show next that $P_{n_1+1}$ exists and has the properties in Lemma 9.2(ii). Let $\gamma$ be the part of a horizontal boundary of $P$ that lies below $[0, J]$. From the estimates above, we know that $T^{n_1}\gamma$ is $C^0$ very near $T^{n_1}[0, J]$. Let $\hat P_{n_1}$ be the section of $T^{n_1}(P \cap Q^{(k-i)})$ that runs from $E_{n_1}$ to the middle of some $Q^{(t+n_1)} \subset Q^{(k-i+n_1)}$. We claim that $P_{n_1+1} = T(\hat P_{n_1})$. Clearly, $P_{n_1+1} \subset S_{n_1+1}$. To see that $P_{n_1+1}$ is a monotone branch, it suffices to observe that for $j < n_1$, $T^{-n_1+j}\hat P_{n_1} \cap C^{(t+j)} = \emptyset$, which is an immediate consequence of the fact that $T^{-n_1+j}\hat S_{n_1} \cap C^{(k-i+j)} = \emptyset$.

We are now ready to show that $P_j$ exists for all $j \le i$. Suppose that $P_{j-1}$ exists. The only reason why $P_j$ may not exist is that one of its ends has reached its maximum allowed age. Of the two ends of $P_{n_1+1}$, the one contained in $E_{n_1+1}$ is clearly created earlier, which means that of the two ends of $P_{j-1}$, the one contained in $E_{j-1}$ is created earlier. It suffices therefore to check that this end survives the step from $P_{j-1}$ to $P_j$. It does, because it is created later than $E_{j-1}$ and has the same age as $E_{j-1}$, and, by definition, $E_{j-1}$ has not reached its maximum allowed age.

From here on we argue inductively that the relations in Lemma 9.2(ii) between $P_j$ and $S_j$ hold from $j = n_1 + 2$ to $j = i$. Assume this is true for $j - 1$, and that $S_{j-1}$ has more than one child. Then $S_j = T(\hat S_{j-1})$, where $\hat S_{j-1}$ is the section of $S_{j-1}$ from $E_{j-1}$ to the middle of some $Q^{(k-i+j-1)}$. Since by inductive assumption $P_{j-1}$ has its ends contained in those of $S_{j-1}$, we are assured that it traverses some $Q^{(t+j-1)} \subset Q^{(k-i+j-1)}$. Letting $\hat P_{j-1}$ be the section of $P_{j-1}$ from its end in $E_{j-1}$ to the middle of $Q^{(t+j-1)}$, we see that $P_j = T(\hat P_{j-1})$ has the desired properties. This completes the proof of Lemma 9.2. □
Proof of Theorem 1.5'. Let $\mathcal{S}_0 = \{R_0\}$, and assume that for each $n \le m$, a collection of monotone branches $\mathcal{S}_n$ is selected so that $\tilde R_n := \cup_{S \in \mathcal{S}_n} S$ is a neighborhood of the attractor, and each $S \in \mathcal{S}_n$ has the following properties:
(i) $S \in \mathcal{T}_k$ for some $n \le k \le (1 + 3\theta)n$;
(ii) if an end of $S$ is of age $i$, i.e. it is created at time $k - i$, then $2\theta i \le k - i + 1$.
Note that (ii) is a more stringent requirement than the definition of monotone branches.

The collection $\mathcal{S}_{m+1}$ is defined as follows. For each $S \in \mathcal{S}_m$, if the ends of $S$ have not reached their maximum ages as allowed by (ii) above, then we put the children of $S$ in $\mathcal{S}_{m+1}$. If one of its ends has reached this age, then we choose a collection of branches $\{P\}$ to be specified in the next paragraph, construct from each $P$ a monotone branch $P_i$ as in Lemma 9.2, replace $S$ by $\{P_i\}$ and put the children of $P_i$ in $\mathcal{S}_{m+1}$.

Suppose for definiteness that $S \in \mathcal{T}_k$, and its end $E$ has reached age $i$, where
$$2\theta i = k - i + 1. \qquad (11)$$
Let $Q^{(k-i)}$ be the component of $C^{(k-i)}$ containing $T^{-i}E$. Let $\{P\}$ be the subcollection of $\mathcal{S}_{k-i+1}$ with the property that $P \cap Q^{(k-i)} \neq \emptyset$. Observe immediately that by our inductive hypotheses, $P$ is a monotone branch of $R_{\tilde k}$ for some $\tilde k$ with
$$k - i + 1 \le \tilde k \le (1 + 3\theta)(k - i + 1). \qquad (12)$$
Since $e^{-\alpha(1+3\theta)(k-i+1)} \gg \rho^{k-i}$, it follows that $P$ intersects $Q^{(k-i)}$ in a horizontal strip bounded by $C^2(b)$ curves. Note also that since the union of the elements of $\mathcal{S}_{k-i+1}$ covers $\Omega$, we have $\cup P \supset (Q^{(k-i)} \cap \Omega)$.

To justify the validity of this replacement procedure, we need to show that
(a) for each $P$ as above, $P_{K_1(k-i)}$ is well defined, where $K_1$ is as in Lemma 9.2;
(b) $P_i$ is a monotone branch of $R_j$ for some $j \le (1 + 3\theta)m$.

Suppose that an end of $P$, which is a branch of $R_{\tilde k}$, is of age $\tilde i$. Then
$$2\theta \tilde i \le \tilde k - \tilde i + 1. \qquad (13)$$
To prove (a), it suffices to verify that this end lasts another $K_1(k - i)$ iterates, i.e. $\theta[\tilde i + K_1(k - i)] \le \tilde k - \tilde i + 1$. This is true because $\theta \tilde i \le \frac12(\tilde k - \tilde i + 1)$ by (13), and
$$K_1\theta(k - i) \le K_1\theta(\tilde k + 1) = K_1\theta[(\tilde k - \tilde i + 1) + \tilde i] \le K_1\theta(\tilde k - \tilde i + 1)\left(1 + \frac{1}{2\theta}\right) \ll \frac12(\tilde k - \tilde i + 1).$$
The first inequality above is by (12) and the second by (13).

To prove (b), we need to check that the age of the end of $P_i$ that is contained in $E$, namely $\tilde k + i$, is $\le (1 + 3\theta)m$. Observe first that $i \le m$. This is because the replacement procedure described in Lemma 9.2 does not change the ages of the respective ends of the monotone branch in question. (The age of an end is equal to the "age" of the critical orbits it contains.) Thus it remains to check that
$$\tilde k \le (1 + 3\theta)(k - i + 1) = (1 + 3\theta)\,2\theta i < 3\theta i \le 3\theta m,$$
the first inequality above coming from (12) and the equality from (11). This completes the proof of Theorem 1.5'. □

We mention two bonuses of this construction. First, it can be seen inductively that for every $S \in \mathcal{S}_n$, if $S$ is a branch of $R_k$, then the two monotone segments of $\partial R_k$ that bound $S$ must necessarily be from different components of $\partial R_0$. This is used in Sect. 10.6.

Second, we claim that if $\deg(f) \neq 0$, then all of our monotone branches $S \in \mathcal{S}_m$ intersect the attractor in an essential way. Let us call a monotone branch $S$ essential if every curve connecting the two monotone segments $\gamma$ and $\gamma'$ in $\partial S$ meets $\Omega$. Observe first that $R_0 \in \mathcal{S}_0$ is essential if $\deg(f) \neq 0$. If not, then there exists a curve $\omega$ connecting the two components of $\partial R_0$ that does not meet $\Omega$. Since $\Omega = \cap_k R_k$, this implies that for some $k$, $R_k \cap \omega = \emptyset$, which is absurd since $R_k$ is not contractible. Assuming that $S \in \mathcal{S}_m$ is essential, then clearly all the monotone branches that comprise $T(S)$ are essential if no end replacements are needed in the next step. If an end replacement is required, then since the new branches are the images of parts of earlier essential branches, they are again essential.
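The last chain of inequalities in the proof of Theorem 1.5', $(1 + 3\theta)\,2\theta i < 3\theta i$, is where the smallness of $\theta$ enters. The sketch below is an arithmetic check only; the function names are hypothetical, and the threshold $\theta < 1/6$ is an observation obtained by dividing the inequality by $\theta i > 0$, not a statement taken from the paper.

```python
from fractions import Fraction


def age_chain_holds(theta):
    """Check (1 + 3*theta) * 2*theta < 3*theta, i.e. the step
    (1 + 3*theta) * 2*theta*i < 3*theta*i for any i > 0.

    Dividing by theta > 0 gives 2 + 6*theta < 3, that is theta < 1/6.
    """
    theta = Fraction(theta)
    return (1 + 3 * theta) * 2 * theta < 3 * theta


def replacement_generation_bound(theta, i):
    """Upper bound (1 + 3*theta)(k - i + 1) on k-tilde from (12),
    with k - i + 1 = 2*theta*i substituted from (11)."""
    theta = Fraction(theta)
    return (1 + 3 * theta) * (2 * theta * i)
```

For example, with the illustrative values $\theta = 1/10$ and $i = 50$, the bound on $\tilde k$ is 13, below $3\theta i = 15$.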
9.4. The coarse geometry of $\Omega$. We explain in the following sequence of pictures exactly how, in finite approximation, the geometry of $\Omega$ differs from that of a small tubular neighborhood of a single curve. These pictures are justified by Lemma 9.2.

Referring back to Fig. 3(c), we may think of the region between the parabolas as made up of two ends belonging to adjacent branches. We know from Lemma 9.2 that long before the tips of these parabolas "separate", that is, before the ends in question reach their maximum allowed age, there are pre-branches inside running parallel to these parabolas. In Fig. 6, the pre-branches are shown in grey, and the zig-zagging cut-lines represent pre-images of the critical set. These cut-lines will become "turns" before the ends in question reach their maximum allowed age. As this age is reached, the pre-branches are released. Figure 7(a) shows four newly released monotone branches grafted onto a branch created earlier. Once released, the new branches evolve independently, resulting possibly in the configuration in Fig. 7(b) (cf. Fig. 3(d)).

The boundaries of every turn (or pair of ends) created every step of the way will in time separate, releasing new branches grafted onto ones born earlier. As the new branches evolve, they create new turns, which again will last for only a finite duration of time. In terms of global geometry, this, in a sense, is the only way in which $T$ differs
[Fig. 6. Pre-branches waiting to be released.]
[Fig. 7. Newly released monotone branches evolving independently; panels (a) and (b).]
from a 1-dimensional map. Tip replacements are scheduled to take place roughly once every $\sim \log\frac{1}{b}$ iterates, so that in the limit as $b$ tends to 0, no replacement is needed, as it should be for 1-dimensional maps.

10. Symbolic Dynamics and Topological Entropy

The goals of this section are (1) to introduce a natural and unambiguous coding of all points on the attractor for the maps in question, and (2) to use this coding to obtain results on topological entropy and equilibrium states.

10.1. Coding of points on the attractor. Abusing notation slightly, let $x_1 < x_2 < \cdots < x_r < x_{r+1} = x_1$ be the critical points of $f$ in the order in which they appear on the circle, and let $C_i := \mathcal{C} \cap C_i^{(0)}$, where $C_i^{(0)}$ is the component of $C^{(0)}$ containing $x_i$. We remark that $C_i$ can be a fractal set, and that for an arbitrary $z \in R_0$ near $C_i$, it does not always make sense to think of $z$ as being located on the left or on the right of $C_i$. The goal of this subsection is to show that points on $\Omega$ are special, in that for them this left/right notion is always well defined.

Recall that if $Q^{(k)}$ is a component of $C^{(k)}$, then $\hat Q^{(k)}$ is the component of $R_k \cap C^{(k-1)}$ containing $Q^{(k)}$. In particular, $\hat Q^{(k)} \setminus Q^{(k)}$ has a left and a right component.

Lemma 10.1. The critical set $\mathcal{C}$ partitions $\Omega \setminus \mathcal{C}$ into disjoint sets $A_1, \cdots, A_r$ as follows:
– For $z = (x, y) \notin C^{(0)}$, $z \in A_i$ if and only if $x_i < x < x_{i+1}$.
– For $z \in C_i^{(0)} \setminus C_i$, let $Q^{(k)}$ be such that $z \in \hat Q^{(k)} \setminus Q^{(k)}$. Then $z \in A_i$ if it lies in the right component of $\hat Q^{(k)} \setminus Q^{(k)}$; $z \in A_{i-1}$ if it lies in the left component of $\hat Q^{(k)} \setminus Q^{(k)}$.

Proof. This lemma is an immediate consequence of our description of critical regions (Theorem 1.1(1)). The sets $\{A_i\}$ are defined by the conditions above. What sets points in $\Omega$ apart from arbitrary points in $R_0$ is that $z \in \Omega$ implies $z \in R_k$ for all $k$, so that for $z \in C^{(0)}$, there are only two possibilities: either $z \in \cap_{k \ge 0} C^{(k)}$, in which case it is a
critical point, or there is a largest $k$ such that $z \in C^{(k-1)}$. In the latter case, it follows from the geometric relation between $C^{(k)}$ and $C^{(k-1)}$ that $z \in \hat Q^{(k)} \setminus Q^{(k)}$ for some $Q^{(k)}$. □

Lemma 10.1 gives a well defined address $a(z)$ for all $z \in \Omega \setminus \mathcal{C}$. We write $a(z) = i$ if $z \in A_i$. Points in $\mathcal{C}$ have two addresses; for example, for $z \in C_i$, $a(z) =$ both $i - 1$ and $i$. This in turn allows us to attach to each $z_0 \in \Omega$ with $z_i \notin \mathcal{C}$ for all $i$ an itinerary $\iota(z_0) = (\cdots, a_{-1}, a_0, a_1, \cdots)$, where $a_i = a(z_i)$. Orbits that pass through $\mathcal{C}$ have exactly two itineraries as $T^i\mathcal{C} \cap \mathcal{C} = \emptyset$ for all $i$.

We would like to show that the symbol sequence $\iota(z_0)$ uniquely determines $z_0$. This may fail in a trivial way: Let $I_i = [x_i, x_{i+1}]$. Then our coding is clearly not unique if for some $i$, $f(I_i)$ wraps all the way around the circle, meeting some $I_j$ more than once. For simplicity of exposition we will assume this does not happen. If it does, it suffices to consider the partition on $\Omega$ whose elements correspond to the connected components of $I_i \cap f^{-1} I_j$.

10.2. Coding of monotone branches.

Coding of monotone segments of $\partial R_k$. Observe that points in $\partial R_k$ also have well-defined $a$-addresses in the spirit of Lemma 10.1: if $z \in \partial R_k \cap C^{(k)}$, then its location with respect to $\Gamma_k$ is obvious (except when $z \in \Gamma_k$). This allows us to assign in a unique way a $k$-block $[a_{-k}, \cdots, a_{-1}]$ to each monotone segment $\gamma$ of $\partial R_k$. We write $\iota(\gamma) = [a_{-k}, \cdots, a_{-1}]$.

Coding of monotone branches of $R_k$. Each $S \in \mathcal{T}_k$, $k > 0$, is associated with a block $\iota(S) = [a_{-k}, \cdots, a_{-1}]$ defined inductively as follows: Let $S \in \mathcal{T}_{k-1}$ be such that $\iota(S) = [a_{-(k-1)}, \cdots, a_{-1}]$. If $S \cap C^{(k-1)} = \emptyset$, then it lies between two components of $C^{(0)}$, say $C_i^{(0)}$ and $C_{i+1}^{(0)}$, and $\iota(T(S)) := [a_{-k}', \cdots, a_{-1}']$, where $a_{-1}' = i$ and $a_{-j}' = a_{-j+1}$ for $j > 1$. If $S \cap C^{(k-1)} \neq \emptyset$, then $S = \hat S_1 \cup \cdots \cup \hat S_n$, where $\hat S_1$ is the section of $S$ from one end to the middle of the first $Q^{(k-1)}$ that it meets, $\hat S_2$ is the section from the middle of this $Q^{(k-1)}$ to the middle of the next component of $C^{(k-1)}$ etc., and the $a_{-1}'$-entry of $\iota(T(\hat S_j))$ is defined according to the location of $\hat S_j$. Note that this coding of branches in $\mathcal{T}$ is injective, i.e. $S \neq S'$ implies $\iota(S) \neq \iota(S')$, and that if $\gamma$ and $\gamma'$ are monotone segments that bound $S$, then $\iota(\gamma) = \iota(\gamma') = \iota(S)$. Note also that the replacement procedure in Sect. 9.3 corresponds to replacing $[a_{-k}, \cdots, a_{-1}]$ by blocks of the form $[*, \cdots, *, a_{-k}, \cdots, a_{-1}]$.

Coding of arbitrary points in $R_0$. For points in certain locations of $R_0$, there is no meaningful way of assigning an address as we did in Sect. 10.1. Instead, for each $k \ge 0$, we define the $\tilde a^{(k)}$-address(es) of $z \in R_k$ as follows: $\tilde a^{(k)}(z)$ has the obvious definition if $z \notin C^{(k)}$; if $z = (x, y) \in Q^{(k)}$ for some $Q^{(k)} \subset C_i^{(0)}$, we let $\tilde a^{(k)}(z) = i$ if $x > \hat x - b^{k/4}$, where $\hat z = (\hat x, \hat y)$ is one of the critical points in $\partial Q^{(k)}$; $\tilde a^{(k)}(z) = i - 1$ if $x < \hat x + b^{k/4}$. Clearly $\tilde a^{(k)}$-addresses are not unique: an open set of points in the middle part of each $Q^{(k)} \subset C_i^{(0)}$ have as their $\tilde a^{(k)}$-addresses both $i - 1$ and $i$.

We further introduce the following notation:
$$\pi_\Omega([a_n, a_{n+1}, \cdots, a_m]) = \{z_0 \in \Omega : a(z_i) = a_i,\ n \le i \le m\};$$
$$\pi_{R_0}([a_{-k}, a_{-k+1}, \cdots, a_{-1}]) = \{z_0 \in R_k : \tilde a^{(k-i)}(z_{-i}) = a_{-i},\ 1 \le i \le k\};$$
"$\tilde a^{(k-i)}(z_{-i}) = a_{-i}$" above means $a_{-i}$ is an admissible $\tilde a^{(k-i)}$-address of $z_{-i}$.
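In the 1-dimensional limit the address and itinerary of Sect. 10.1 reduce to recording which arc between adjacent critical points an orbit visits. The sketch below illustrates only that limiting case, under stated assumptions: the circle is modeled as $[0, 1)$, `address` and `itinerary` are hypothetical helper names, and only forward itineraries are computed since a circle map need not be invertible. The fractal left/right structure of Lemma 10.1 is not modeled.

```python
from bisect import bisect_right


def address(x, crit):
    """Admissible addresses of a point x on the circle [0, 1).

    crit is a sorted list of critical points x_1 < ... < x_r in [0, 1).
    A point strictly between x_i and x_{i+1} (cyclically, x_{r+1} = x_1)
    has the single address i; the critical point x_i has both i - 1 and i.
    Addresses are 1-based and returned as a tuple.
    """
    r = len(crit)
    x = x % 1
    if x in crit:
        i = crit.index(x) + 1
        prev = r if i == 1 else i - 1
        return (prev, i)
    j = bisect_right(crit, x)  # number of critical points to the left of x
    return (r,) if j == 0 else (j,)


def itinerary(x, f, crit, n):
    """Addresses of x, f(x), ..., f^(n-1)(x) for a circle map f."""
    out = []
    for _ in range(n):
        out.append(address(x, crit))
        x = f(x) % 1
    return out
```

As a usage example under the same assumptions, the degree-zero map $f(x) = 0.5 + 0.4\sin(2\pi x) \bmod 1$ has critical points $1/4$ and $3/4$, so one would call `itinerary(x, f, [0.25, 0.75], n)`; a point whose orbit hits a critical point then receives two admissible addresses at that time, mirroring the two itineraries of orbits through $\mathcal{C}$.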
Lemma 10.2. (i) Every $S \in \mathcal{T}_k$, $k \ge 1$, is $= \pi_{R_0}(\iota(S))$ and contains a neighborhood of $\pi_\Omega(\iota(S))$.
(ii) Given $z_0 \in \Omega$ and $n \in \mathbb{Z}^+$, there exists $k$ with $n \le k \le n(1 + 3\theta)$ and $S = S(z_0, n) \in \mathcal{T}_k$ such that $z_0 \in \pi_\Omega(\iota(S))$.

Proof. That $S = \pi_{R_0}(\iota(S))$ follows inductively from the definitions of these two objects. That $S$ contains a neighborhood of $\pi_\Omega(\iota(S))$ is also obvious inductively. For (ii), we know from Theorem 1.5' that there exists $S \in \mathcal{T}_n$ with $z_0 \in S$. The only way one can have $z_0 \notin \pi_\Omega(\iota(S))$ is that at the time $S$ is created, say at time $k - i$, $T^{-i}S$ meets the mid $b^{\frac{k-i}{4}}$-section $E$ of some $Q^{(k-i)}$ and extends to the left of $E$, while $z_{-i} \in E$ and lies to the "right" of $\Gamma \cap Q^{(k-i)}$ in the sense of Lemma 10.1. Let $S_0$ be the ancestor of $S$ in $\mathcal{T}_{k-i}$, and let $S_1$ be the descendant of $S_0$ that contains the right half of $Q^{(k-i)}$. Our replacement procedure guarantees that there exists $S' \in \mathcal{T}$ that is either a descendant of $S_1$ or a replacement for a descendant of $S_1$ which contains $z_0$. □

Let
$$\Sigma := \{a = (a_i)_{i=-\infty}^{\infty} : \iota(z_0) = a \text{ for some } z_0 \in \Omega\},$$
and let $(\sigma a)_i = (a)_{i+1}$ denote the shift operator. It is easy to check that $\Sigma$ is a closed subset of $\Pi_{-\infty}^{\infty}\{1, 2, \cdots, r\}$ with $\sigma^{-1}\Sigma \subset \Sigma$. Extending our definition of $\pi_\Omega$ to infinite sequences and writing $\pi = \pi_\Omega$, we have that $\pi(a)$ is the set of all points $z_0 \in \Omega$ with $\iota(z_0) = a$. The following proposition, whose proof occupies all of the next subsection, completes the proof of Theorem 1.6.

Proposition 10.1. For every $a \in \Sigma$, $\pi(a)$ consists of exactly one point, and $\pi : \Sigma \to \Omega$ is a continuous mapping.

Let $B(z_0, \varepsilon)$ denote the ball of radius $\varepsilon$ centered at $z_0$, and let us say $S \in \mathcal{T}_k$ is compatible with $a = (a_i)$ if $\iota(S) = [a_{-k}, \cdots, a_{-1}]$. Proposition 10.1 follows immediately from Lemma 10.2(i) and Proposition 10.1' below.

Proposition 10.1'. Given $a \in \Sigma$, $z_0 \in \pi(a)$, and $\varepsilon > 0$, there exists $S \in \mathcal{T}_{n+m}$ compatible with $\sigma^n a$ such that $T^{-n}S \subset B(z_0, \varepsilon)$.

10.3. Uniqueness of point in $\Omega$ corresponding to each itinerary. We begin with a situation that resembles that in 1-dimension.

Lemma 10.3. Let $a$, $z_0$ and $\varepsilon$ be as in Proposition 10.1'. Suppose that for some $k$, the component of $R_k \cap B(z_0, \varepsilon)$ containing $z_0$, which we denote by $H$, is bounded by two $C^2(b)$ subsegments $\gamma$ and $\gamma'$ of $\partial R_k$ cutting across $B(z_0, \varepsilon)$ as shown with Hausdorff distance $(\gamma, \gamma') < \varepsilon^{10}$. Then there exists $S \in \mathcal{T}_{n+m}$ compatible with $\sigma^n a$ such that $T^{-n}S \subset H$.

Proof. Our plan of proof is as follows. Assuming $m > k$, so that $T^{-n}S \subset R_k$, we wish to block it from exiting $B(z_0, \varepsilon)$ via, say, the right boundary of $H$. To this end, we will show that for some section $H' \subset H$ as shown and $k' > 0$, $T^{k'}(H')$ is a component of $C^{(k+k')}$, so that the left and right boundaries of $H'$ have incompatible $\tilde a^{(k+k')}$-addresses. Assuming $n > k'$, it will follow (using Lemma 10.2) that $T^{-n}S$ cannot meet both the left and right boundaries of $H'$. Being connected and contained in $R_k$, $T^{-n}S$ must meet
[Fig. 8. The situation considered in Lemma 10.3. Labels in the figure: $B(z_0, \varepsilon)$, $H$, $H'$, $\gamma$, $\gamma'$, $R_k$, $z_0$.]
both boundaries of $H'$ in order to exit $B(z_0, \varepsilon)$ from the right. The left boundary of $H$ can be blocked off similarly.

The proof that $T^{k'}H'$ crosses a component of $C^{(k+k')}$ for some $k'$ is similar to that of Sublemma 9.1, but there are two differences: initially at least, we do not know the lengths of $T^j\gamma$ relative to their distances to the critical set, and we must control the shearing between $\gamma$ and $\gamma'$ as we iterate. Details of the proof follow.

Consider first the case where $z_0 \notin C^{(0)}$. Let $\gamma_0$ be a subsegment of $\gamma$ of length $\frac{\varepsilon}{2}$ located half-way between $z_0$ and the right boundary of $H$. We first describe how to locate $\gamma_0 \cap H'$. Let $n_1$ be the first time when $T^i(\gamma_0)$ meets $C^{(0)}$. If $T^{n_1}\gamma_0$ contains an $I_{\mu j}$ of full length, then we let $\gamma_1 \subset T^{n_1}\gamma_0$ correspond to the longest $I_{\mu j}$ or segment outside of $C^{(0)}$, whichever is longer. If not, we let $\gamma_1 = T^{n_1}\gamma_0$. In both cases, we let $n_2$ be the first time when part of $T^{n_2-n_1}\gamma_1$ makes a free return. Choose $\gamma_2 \subset T^{n_2-n_1}\gamma_1$ as before, let $n_3$ be the first time when part of $T^{n_3-n_2}\gamma_2$ makes a free return, and so on. Using the fact that $\partial R_n$ is controlled (Proposition 5.1), we see that the $\gamma_i$ increase in length, so that there exists some $i_0$ such that $\gamma_{i_0}$ contains an $I_{\mu j}$. From then on, the argument in Sublemma 9.1 produces an $i_1$ such that $T^{n_{i_1}-n_{i_1-1}}\gamma_{i_1-1}$ traverses a component of $C^{(0)}$.

We now proceed to construct $H'$. Letting $\tau_0$ denote unit tangent vectors to $\gamma$, we have that $\|DT^i(\xi_0)\tau_0\| \ge c > 0$ for all $\xi_0 \in \gamma_0$ and $i \le n_1$. Through each $\xi_0 \in \gamma_0$, therefore, is a stable curve of order $n_1$ connecting $\xi_0$ to a point in $\gamma'$ less than $\varepsilon^9$ away (see Sect. 2.2 and Lemma 2.9). Let $H_0$ be the region between $\gamma$ and $\gamma'$ made up of the union of these stable curves, and let $\gamma_0'$ denote the part of $\gamma'$ bounding $H_0$. Since we do not know how close $T^{n_1}\gamma_0$ gets to the critical set, we cannot continue to claim the expanding property of $\tau_0$ beyond time $n_1$. Instead, we observe that for $\xi_0 \in \gamma_1$, $\|DT^j(\xi_0)\binom{0}{1}\| \ge 1$ for $j \le n_2 - n_1$, so that through each $\xi_0 \in \gamma_1$, there is a stable curve of order $n_2 - n_1$. Assuming that these stable curves meet $T^{n_1}\gamma_0'$, we define $H_1$ to be the region between $T^{n_1}\gamma_0$ and $T^{n_1}\gamma_0'$ spanned by these curves, and check that $H_1$ can be chosen to be a subregion of $T^{n_1}H_0$.

To justify the last sentence, observe first that if $\gamma_1 \subset I_{\mu_1 j}$, then $e^{-\mu_1} > \varepsilon$. This is true regardless of whether $\gamma_1 = T^{n_1}\gamma_0$. Second, since the contractive field $e_{n_2-n_1}$ near $\gamma_1$ makes angles $\sim e^{-\mu_1}$ with $T^{n_1}\gamma_0$ and with $T^{n_1}\gamma_0'$, every point in $\gamma_1$ is connected by a stable curve to a point in $T^{n_1}\gamma_0'$ not more than a distance of $(b^{n_1}\varepsilon^9)/e^{-\mu_1} < b^{n_1}\varepsilon^8 \ll e^{-\mu_1}$ away. This allows us to define $H_1$. Finally, we may need to trim the edges of $H_1$ by a length $\sim b^{n_1}\varepsilon^8$ in order to fit it inside $T^{n_1}H_0$. This is easily done since $|\gamma_1| > \min\left(\varepsilon, \frac{1}{\mu_1^2}e^{-\mu_1}\right)$.
At time $n_2$, we again do not know how close $T^{n_2-n_1}\gamma_1$ is to the critical set, and so we use $\|DT^j\binom{0}{1}\| \ge 1$ for $j \le n_3 - n_2$ to construct new stable curves which are then used to construct $H_2$. Observe that compared to time $n_1$, the situation has improved: $|\gamma_2| \ge |\gamma_1|$, and the segments $\gamma_2$ and $\gamma_2'$ are closer than before. We construct $H_3, H_4, \cdots$, until time $n_{i_1}$, when $T^{n_{i_1}-n_{i_1-1}}H_{i_1-1} \supset Q$, a component of $C^{(k+n_{i_1})}$. Letting $k' = n_{i_1}$ and $H' = T^{-n_{i_1}}(Q)$, the proof for the first case is complete.

For $z_0 \in C^{(0)} \setminus C$, let $j$ be such that $z_0 \in \hat Q^{(j)} \setminus Q^{(j)}$. If $k$ in the statement of the lemma is $\ge j$, repeat the argument above with $n_1 = 0$. If not, replace $k$ by $j$ and $\varepsilon$ by $\min(\varepsilon, \frac12\rho^j)$ and let $n_1 = 0$. The case of $z_0 \in C$ is dealt with similarly. □

Recall that for all $z_0 \in \Omega$, at every return to $C^{(0)}$, $z_i$ is h-related, and bound and fold periods are well defined. (See Sect. 3 for definitions.)

Proof of Proposition 10.1′. Let $a \in \Sigma$ and $z_0 \in \pi(a)$ be given. We wish to arrange for the scenario in Lemma 10.3 at $z_0$, but it is not possible to do it directly when $z_0$ is near a "turn". Intuitively, in order for $z_0$ to be near a "turn", $z_{-i}$ must be near the critical set for some $i > 0$. This motivates the following considerations.

Case 1. There exists arbitrarily large $i$ such that $d_C(z_{-i}) < \rho^k$ for $k \approx K_0(\log\|DT\|)\theta i$, where $K_0$ is to be specified shortly. Let $\varepsilon > 0$ be given, and let $i$ and $k$ have the relationship above with $\|DT\|^{-i} < \varepsilon^{10}$. Let $j$ be such that $z_{-i} \in \hat Q^{(j)} \setminus Q^{(j)}$. Then $j \ge k$. We wish to apply Lemma 10.3 to $z_0' = z_{-i}$ with $\varepsilon' = \|DT\|^{-i}\varepsilon$ and $H$ bounded by $\partial R_j$. This result transported back to $z_0$ proves the proposition. To satisfy the hypotheses of Lemma 10.3 at $z_0'$, it suffices to check that the Hausdorff distance between the two horizontal boundaries of $\hat Q^{(j)}$ is $< (\|DT\|^{-i}\varepsilon)^{10}$. This is true provided $K_0$ is chosen to satisfy the inequality

$$b^{\frac{k}{4}} = \left(b^{\frac14 K_0\theta\log\|DT\|}\right)^i = \left(\|DT\|^{\frac14 K_0\theta\log b}\right)^i < \|DT\|^{-11i} < \left(\|DT\|^{-i}\varepsilon\right)^{10},$$

where the last step uses $\|DT\|^{-i} < \varepsilon^{10}$.
Case 2. Not Case 1. Note that this means that $z_{-i}$ approaches $C$ extremely slowly (if at all) as $i \to \infty$. First we observe that with $d_C(z_{-i}) \gg b^{\frac{i}{2}}$, $z_0$ is out of all fold periods from the past. To arrange for the scenario of Lemma 10.3 at $z_0$, we will show:
(i) there exist $\kappa = O(1)$ and arbitrarily large $i$ such that $\|DT^j(z_{-i})\binom{0}{1}\| \ge \kappa^j$ for all $j \le i$;
(ii) the stable curves near $z_{-i}$ when mapped forwards bring with them to $z_0$ a pair of curves from $\partial R_n$ with $z_0$ sandwiched in between;
(iii) these curves are $C^2(b)$, they have a minimum length $\varepsilon_1$ independent of $i$, and their Hausdorff distance can be made as small as need be by choosing $i$ large.
We prove (i). Leaving the $\inf_i d_C(z_{-i}) > 0$ case as an exercise, we consider $i$ with $d_C(z_{-i}) \le d_C(z_{-j})$ for all $0 < j \le i$. Suppose $d_C(z_{-i}) \approx e^{-\mu}$, so that the ensuing bound period is $> K^{-1}\mu$. Let $w_j = DT^j(z_{-i})\binom{0}{1}$, and let $z_{-i+n}$ be the next free return. Then $\|w_j\| \ge 1$ for $j \le n$. We argue that $w_n$ splits correctly: If $z_{-i+n} \in C^{(n)}$, then $d_C(z_{-i+n}) \ge d_C(z_{-i}) \approx e^{-\mu} \gg b^{\frac{1}{20}K^{-1}\mu} \ge b^{\frac{1}{20}n}$; if $z_{-i+n} \notin C^{(n)}$, then it is $\in \hat Q^{(j)} - Q^{(j)}$ for some $j < n$. In both cases, Lemma 7.1 applies, and we have $\|w^*_{n+1}\| \ge e^{\frac{cn}{3}}e^{-\mu} \ge e^{(\frac{c}{3}-K)n}$. Since the situation at subsequent free returns is clearly improved ($d_C(\cdot) \ge d_C(z_{-i})$ and the derivative has built up), we have $\|w_j\| \ge e^{(\frac{c}{3}-K)j}$ for all $j \le i$.
To prove (ii), suppose $z_{-i} \in \hat Q^{(k)} \setminus Q^{(k)}$ for some $k$. We consider the stable curve of order $i$ through $z_{-i}$ and let $\zeta_0$ be its intersection with the upper boundary of $\hat Q^{(k)}$. A subsegment $\gamma_0$ of this upper boundary centered at $\zeta_0$ is constructed by iterating forward $i$ times and trimming whenever necessary so that $T^j\gamma_0$ stays inside three consecutive $I_{\mu j}$ for all $j \le i$. Clearly, stable curves of order $i$ can be constructed through all points in $\gamma_0$, and these curves "tie together" the two subsegments of $\partial\hat Q^{(k)}$.

We leave it as an exercise to show the existence of $\varepsilon_1$ (which depends only on the slow rate of approach to $C$ in backward time). The curves brought in are subsegments of $\partial R_{k+i}$ and they are out of all fold periods. This completes the proof of Proposition 10.1′. □
10.4. Proof of Theorem 2(1)(iii). We explain how = ∪ε>0 ε follows readily from the ideas in the last two subsections and the surjectivity condition (*) in Sect. 1.2. In view of Proposition 10.1 , it suffices to show that every S ∈ T contains a point in ε for some ε > 0. Recall the way monotone branches in T are constructed. Given S ∈ T , let J > 0 be the smallest integer such that T −J S ∈ T . Then T −J S contains half of some Q(k) . Let H be the middle half of T −J S ∩ Q(k) , with length 41 ρ k . An argument similar to that in Lemma 10.3 but carried on indefinitely in time gives a sequence of domains H ⊃ H1 ⊃ H2 ⊃ · · · and a curve ω0 ⊂ ∩n≥1 Hn with the following properties: – ω0 connects the top and bottom boundaries of Q(k) ∩ T −J S; – there exists ε > 0 such that ∀z ∈ ω0 , dC (zn ) ≥ ε ∀n ≥ 0. To finish, it suffices to produce zˆ 0 ∈ ω0 such that zˆ −i ∈ C (0) ∀i > 0. Let Di be the component of R0 \ C (0) between the i th - and (i + 1)st components of C (0) , and let Dˆ i be the union of Di with the two components of C (0) adjacent to it. Then we may assume from condition (*) that for every i, there exists j such that T (Dj )∩ Dˆ i contains a horizontal strip traversing the full length of Dˆ i . Suppose ω0 ⊂ Dˆ i , and let j be as above. Then there is a subsegment ω1 ⊂ ω0 such that T −1 ω1 ⊂ Dj and connects the top and bottom boundaries of Dj . Similarly, we produce for n = 2, 3, · · · segments ωn ⊂ ωn−1 such that T −n ωn is contained in some Dj (n) and connects the two horizontal boundaries of Dj (n) . Let zˆ 0 ∈ ∩n≥0 ωn . % & 10.5. Existence of equilibrium states. This is a corollary to the symbolic dynamics we have developed. Let ϕ : R0 → R be a continuous function, and let P (T ; ϕ) denote the topological pressure of T for the potential ϕ. (See e.g. [Wa], Chapter 9, for definitions and basic facts.) 
A well known variational principle says that $P(T;\varphi) = \sup_\nu P_\nu(T;\varphi)$, where the supremum is taken over all $T$-invariant Borel probability measures $\nu$ and $P_\nu(T;\varphi) := h_\nu(T) + \int\varphi\,d\nu$, where $h_\nu(T)$ denotes the metric entropy of $T$ with respect to $\nu$. An invariant measure for which this supremum is attained is called an equilibrium state for $(T;\varphi)$. Let $\sigma : \Sigma \to \Sigma$ and $\pi : \Sigma \to \Omega$ be as in Theorem 1.6.
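The variational principle can be checked by hand in a toy model that is not the map $T$ of this paper: the full shift on finitely many symbols with a potential depending only on the zeroth coordinate. In that setting (an assumption made purely for illustration) the pressure is $\log\sum_i e^{\varphi_i}$, a Bernoulli measure with weights $p$ has $h_\nu + \int\varphi\,d\nu = -\sum_i p_i\log p_i + \sum_i p_i\varphi_i$, and the supremum is attained at $p_i \propto e^{\varphi_i}$. The function names below are hypothetical.

```python
import math

def free_energy(p, phi):
    """h_nu + integral of phi for the Bernoulli measure with weights p."""
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return entropy + sum(pi * fi for pi, fi in zip(p, phi))

def pressure(phi):
    """Pressure of the full shift for a potential depending on one symbol."""
    return math.log(sum(math.exp(fi) for fi in phi))

def gibbs_weights(phi):
    """Weights p_i proportional to exp(phi_i): the equilibrium state in this toy model."""
    z = sum(math.exp(fi) for fi in phi)
    return [math.exp(fi) / z for fi in phi]

phi = [0.3, -1.2, 0.8]          # illustrative potential values
P = pressure(phi)
equilibrium = gibbs_weights(phi)
```

Any other probability vector gives a strictly smaller value of `free_energy`, which is the finite-dimensional shadow of the statement that an equilibrium state attains the supremum.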
Proof of Corollary 1.2. Let $\varphi : R_0 \to \mathbb{R}$ be given. We need to prove that there exists $\nu$ such that $P_\nu(T;\varphi) = P(T;\varphi)$. Let $\tilde\varphi$ be the function on $\Sigma$ defined by $\tilde\varphi = \varphi\circ\pi$. Then $P(T;\varphi) = P(T|_\Omega;\varphi|_\Omega) \le P(\sigma;\tilde\varphi)$. Since $\sigma : \Sigma \to \Sigma$ has a natural finite generator without boundary, $(\sigma,\tilde\varphi)$ has an equilibrium state which we call $\tilde\nu$. Let $\nu = \pi_*\tilde\nu$. It suffices to show that $P_\nu(T|_\Omega;\varphi|_\Omega) = P_{\tilde\nu}(\sigma;\tilde\varphi)$. This follows from the fact that $\pi$ is one-to-one over $\Omega \setminus \cup T^iC$, and $\mu(\pi^{-1}(\cup T^iC)) = 0$ for any $\sigma$-invariant probability measure $\mu$ because $\sigma^i(\pi^{-1}C) \cap \pi^{-1}C = \emptyset$ for all $i \in \mathbb{Z}$, $i \ne 0$. □

Since the topological entropy of $T$, written $h_{top}(T)$, is equal to $P(T;0)$, the discussion above gives immediately

Corollary 10.1. (i) $T$ has an invariant measure of maximal entropy.
(ii) Let $N_n$ be the number of distinct blocks of symbols of length $n$ that appear in $\Sigma$. Then
$$\lim_{n\to\infty}\frac1n\log N_n = h_{top}(T).$$

10.6. Topological entropy. Topological entropy is, in general, defined in terms of open covers of arbitrarily small diameters, $\varepsilon$-separated or spanning sets. None of the standard definitions is easy to compute with. Corollary 10.1 gives a concrete way to think about this invariant for the class of dynamical systems under consideration. Three other characterizations and estimates of geometric interest are discussed here.

Recall the notion of $\tilde a^{(k)}$-addresses for $z \in R_k$ (see Sect. 10.2). For $z_0 \in R_0$, we define its (future) $\tilde a$-itinerary to be $(a_i)_0^\infty$ if for each $i$, $\tilde a^{(i)}(z_i) = a_i$. These itineraries are clearly not unique. Let $\tilde N_n$ = the number of $n$-blocks appearing in the $\tilde a$-itineraries of points in $R_0$, overcounting whenever ambiguities arise, that is, if an orbit has $j$ different admissible $\tilde a$-itineraries of length $n$, they will be counted as $j$ distinct blocks in $\tilde N_n$. Obviously, $N_n \le \tilde N_n$.

Lemma 10.4. $\displaystyle\limsup_{n\to\infty}\frac1n\log\tilde N_n \le h_{top}(T)$.

Proof. We fix some arbitrarily small $\varepsilon > 0$, and choose $n_0$ so that
$$\frac{1}{n_0}\log N_{n_0} < h_{top}(T) + \varepsilon \quad\text{and}\quad \frac{1}{n_0}\log(2n_0) < \varepsilon.$$
Let $n_1 > n_0$ be large enough that $b^{\frac{n_1}{10}}\|DT\|^{n_0} < e^{-\beta n_0}$, so that no orbit segment in $R_0$ of length $\le n_0$ can pass through the region $D := \{\xi_0 \in C^{(n_1)} : |\xi_0 - \hat z_0| < b^{\frac{n_1}{10}}$ for some $\hat z_0 \in C \cap Q^{(n_1)}(\xi_0)\}$ more than once. For each $z_0$, let $S_{z_0} = T^{-n_0}S(z_{n_0}, 2n_1)$, where $S(z_{n_0}, 2n_1)$ is as in Lemma 10.2(ii). By part (i) of the same lemma, $S_{z_0}$ is a neighborhood of $z_0$. Let $n_2 > n_1$ be such that $R_{n_2} \subset \cup_{z_0\in\Omega}S_{z_0}$. Define $\tilde N(n_2, n_2+n_0)$ = the number of distinct blocks of $[a_{n_2},\cdots,a_{n_2+n_0-1}]$ that appear in the $\tilde a$-itineraries of all points in $R_0$.

Claim 10.1. $\tilde N(n_2, n_2+n_0) \le 2n_0N_{n_0}$.
Proof of Claim 10.1. Let $\xi_0 \in R_0$, and let $(a_i)$ be any one of its $\tilde a$-itineraries. Let $\xi_{n_2} \in S_{z_0}$ for some $z_0 \in \Omega$, and let $\iota(z_0) = (b_i)$. We compare the two blocks $[a_{n_2},\cdots,a_{n_2+n_0-1}]$ and $[b_0,\cdots,b_{n_0-1}]$. The $i$th entry of the first block is an $\tilde a^{(n_2+i)}$-address of $\xi_{n_2+i}$. Since $\xi_{n_2+n_0} \in S = S(z_{n_0}, 2n_1)$, it follows from Lemma 10.2 that the $i$th entry of the second block is an $\tilde a^{(n(S)-n_0+i)}$-address of $\xi_{n_2+i}$, where $n(S)$ is such that $S \in \mathcal{T}_{n(S)}$. Since the indices in both of these $\tilde a$-addresses exceed $n_1$, they may differ only if $\xi_{n_2+i} \in D$. This can happen at most once in the time period in question. In other words, $[a_{n_2},\cdots,a_{n_2+n_0-1}]$ and $[b_0,\cdots,b_{n_0-1}]$ can differ in at most one entry, and the difference is either $+1$ or $-1$. Since $[b_0,\cdots,b_{n_0-1}]$ is one of the sequences counted in $N_{n_0}$, the claim is proved. □

Similar reasoning shows that $\tilde N(n_2+kn_0, n_2+(k+1)n_0) \le 2n_0N_{n_0}$ for all $k \ge 0$, giving $\tilde N_{n_2+kn_0} \le K^{n_2}\cdot(2n_0N_{n_0})^k$. This combined with the properties we imposed on $n_0$ at the beginning of the proof gives the desired inequality. □

To complete the proof of Theorem 1.7(i), recall that $P_n$ is the number of fixed points of $T^n$ in $\Omega$.

Lemma 10.5. $\displaystyle\lim_{n\to\infty}\frac1n\log P_n = h_{top}(T)$.
Proof. Since no point in $C$ is periodic, there is a one-to-one correspondence between the fixed points of $T^n$ and the periodic symbol sequences of period $n$ in $\Sigma$, proving "≤" in the lemma. That
$$\liminf_{n\to\infty}\frac1n\log P_n > h_{top}(T) - \varepsilon$$
for every $\varepsilon > 0$ follows from a general theorem of Katok for all $C^2$ surface diffeomorphisms [K]. □

Perhaps the most concrete geometric quantity of all is the rate of growth of the number of monotone segments of a curve such as $\partial R_0$. Our next lemma compares this growth rate to the topological entropy of $T$. Let $\partial R_0^+$ and $\partial R_0^-$ denote the two components of $\partial R_0$, and define $M_n^\pm$ = the number of monotone segments in $\partial R_n^\pm$, where "monotone segments" are as defined in Sect. 9.1.

Proof of Theorem 1.7(ii). First we prove $M_n^\pm \le \tilde N_n$. This follows from the fact that for every monotone segment $\gamma$ in $\partial R_n^\pm$, $\iota(\gamma)$ is counted in $\tilde N_n$, and the mapping $\gamma \mapsto \iota(\gamma)$ is injective. To prove the second inequality, we associate to each $n$-block $[a_{-n},\cdots,a_{-1}]$ that appears in $\Sigma$ first a point $z_0 \in \Omega$ with $a(z_{-i}) = a_{-i}$ and then a monotone branch $S = S(z_0,n)$ as in Lemma 10.2. Then $S \in \mathcal{T}_k$ for some $k$ with $n \le k \le n(1+\varepsilon_0)$, $\varepsilon_0 = 3(\log\frac1b)^{-1}$. We remarked at the end of Sect. 9.3 that every $S \in \mathcal{T}$ has a boundary component $\gamma^+$ in $\partial R_k^+$ and one in $\partial R_k^-$. We have thus defined, for each fixed $n$, a mapping from the set of $n$-blocks in $\Sigma$ to the set of monotone segments of $\partial R_k^+$, $n \le k \le n(1+\varepsilon_0)$.
This mapping is clearly injective since $\iota(\gamma^+) = \iota(S) = [*,\cdots,*,a_{-n},\cdots,a_{-1}]$, proving
$$N_n \le \sum_{n\le k\le n(1+\varepsilon_0)} M_k^+.$$
From this one deduces easily that
$$\lim_{n\to\infty}\frac1n\log N_n \le (1+\varepsilon_0)\liminf_{n\to\infty}\frac{1}{(1+\varepsilon_0)n}\log M^+_{(1+\varepsilon_0)n}.$$
□
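The characterizations of entropy through the block counts $N_n$ (Corollary 10.1) and the periodic point counts $P_n$ (Lemma 10.5) are easy to experiment with on a toy symbolic system. The sketch below uses a subshift of finite type with a hypothetical 2 × 2 transition matrix (the golden mean shift), which is only a stand-in and not the coding of the attractors in this paper; for it the entropy is known to be $\log\frac{1+\sqrt5}{2}$.

```python
import math
from itertools import product

# Toy transition matrix (golden mean shift): symbol 1 may not follow symbol 1.
A = [[1, 1],
     [1, 0]]

def count_blocks(A, n):
    """Number of admissible words of length n (the analogue of N_n)."""
    r = len(A)
    counts = [1] * r  # admissible words of length 1, indexed by last symbol
    for _ in range(n - 1):
        counts = [sum(counts[i] * A[i][j] for i in range(r)) for j in range(r)]
    return sum(counts)

def count_periodic(A, n):
    """Number of fixed points of sigma^n, i.e. trace(A^n) (the analogue of P_n)."""
    r = len(A)
    M = [[int(i == j) for j in range(r)] for i in range(r)]
    for _ in range(n):
        M = [[sum(M[i][k] * A[k][j] for k in range(r)) for j in range(r)]
             for i in range(r)]
    return sum(M[i][i] for i in range(r))

def brute_force_blocks(A, n):
    """Enumerate all words of length n and keep the admissible ones."""
    r = len(A)
    return sum(all(A[w[i]][w[i + 1]] for i in range(n - 1))
               for w in product(range(r), repeat=n))

h_top = math.log((1 + math.sqrt(5)) / 2)  # entropy of the golden mean shift
```

Both `math.log(count_blocks(A, n)) / n` and `math.log(count_periodic(A, n)) / n` approach `h_top` as `n` grows, mirroring the two limits in the text.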
Appendix A. Examples

A.1. Attractors arising from interval maps including the Hénon attractors.

Reduction of Theorem 1.8 to Theorems 1.1–1.7. Let $I_0$ be a closed interval such that $f(I) \subset \mathrm{int}(I_0) \subset I_0 \subset \mathrm{int}(I)$, and let $J_1$ and $J_2$ be the two components of $I \setminus I_0$. Choosing $b_0 \ll |J_1|, |J_2|$, one obtains easily from the formulas for $T_{a,b}$ in Sect. 1.1 that there exist $K > 0$ and $\hat\Delta := [a_0,a_1]\times(0,b_0]$ such that for all $(a,b) \in \hat\Delta$, $T_{a,b}$ maps $R := I\times[-Kb,Kb]$ strictly into $I_0\times[-Kb,Kb]$. Our plan is to replace $\partial I\times[-Kb,Kb]$ by two curves $\omega_1$ and $\omega_2$ so that each $\omega_i \subset J_i\times[-Kb,Kb]$, joins the top and bottom boundaries of $R$, and lies on the stable curve of a periodic orbit. We may assume that these periodic orbits stay outside of $C^{(0)}$. Replacing $R$ by $R_0$, the subregion of $R$ bounded by $\omega_1$ and $\omega_2$, the situation is now virtually indistinguishable from that of the annulus maps treated in Theorems 1.1–1.7: the top and bottom boundaries of $R_0$ play the role of $\partial R_0$ in the previous situation, and the left and right boundaries shrink exponentially as we iterate. (There are small differences, such as the existence of monotone branches with one end bounded by images of $\omega_i$. These differences are inessential.)

To produce $\omega_1$ and $\omega_2$, we claim that pre-periodic points of $f$ are dense in $I$. This claim is justified as follows. First, Misiurewicz maps have no homtervals, so that there is a coding of the orbits of $f$ by a subshift $\sigma : \Sigma \to \Sigma$ with the property that each element of $\Sigma$ corresponds to the itinerary of exactly one point in $I$. Second, $\Sigma$ is the closure of $\cup_n\Sigma_n$, where $\{\Sigma_n\}$ is an increasing sequence of subshifts of finite type, and third, pre-periodic points are dense in shifts of finite type. To finish, we fix pre-periodic points $p_1$ and $p_2$ of $f$ near the middle of $J_1$ and $J_2$. Shrinking $\hat\Delta$ if necessary, we may assume that for $T_{a,b}$ with $(a,b) \in \hat\Delta$, the periodic
orbits related to $p_1$ and $p_2$ persist and the stable curves through the continuation of $p_i$ have the desired properties. This is possible because the slopes of these stable curves are bounded away from zero (see Sects. 2.1 and 2.6).

Proof of Corollary 1.3. For the quadratic family, the transversality condition in Step II in Sect. 1.1 holds at all Misiurewicz points [T]. The nondegeneracy condition in Step IV is obviously satisfied. (To ensure that $f(I) \subset \mathrm{int}(I)$ for some $I$ in the case $a^* = 2$, consider $a$ slightly less than 2.)
Fig. 9. Attractors arising from homoclinic bifurcations
A.2. Homoclinic bifurcations. We verify here the conditions in Sect. 1.1 and condition (**) in Sect. 1.2 for homoclinic bifurcations in 2-dimensions, setting the stage to apply Theorems 1.1–1.7. See Sect. 1.5 for a more detailed description of the bifurcation in question. Following [PT, pp. 47–51] we assume that linearizing coordinates have been chosen in which $g_\mu$, $\mu \in [0,\mu^*]$, has the following properties:

(i) On $\{|\xi|,|\eta| < 2\}$, $g_\mu$ is the linear map
$$g_\mu(\xi,\eta) = (\sigma_\mu\xi, \lambda_\mu\eta)$$
where $0 < \lambda_\mu < 1 < \sigma_\mu$, $\lambda_\mu\sigma_\mu < 1$, and $\lambda_\mu, \sigma_\mu$ depend continuously on $\mu$.

(ii) There exists $N \in \mathbb{Z}^+$ such that $g_0^N$ maps the point $(1,0)$ to $(0,1)$, carrying the unstable curve at $(1,0)$ to a curve making a quadratic tangency with the stable curve at $(0,1)$. Near $(1,0)$, $g_\mu^N$ has the form
$$g_\mu^N(\xi,\eta) = (\alpha(\xi-1)^2 + \beta\eta + \gamma\mu + H_1(\mu,\xi,\eta),\ 1 + H_2(\mu,\xi,\eta)) \tag{14}$$
where $\alpha, \beta, \gamma \ne 0$ are constants. Furthermore, we have that at $(\mu,\xi,\eta) = (0,1,0)$, $H_1 = H_2 = 0$, $\partial_\xi H_1 = \partial_\eta H_1 = \partial_\mu H_1 = 0$ and $\partial_{\xi\xi}H_1 = \partial_{\xi\mu}H_1 = \partial_{\mu\mu}H_1 = 0$.

It is not hard to see that for each fixed $n$, $n$ large, there exist a box $B_n$ (with $\mathrm{diam}(B_n) \to 0$ as $n \to \infty$) and a range of parameters $\mu$ (also depending on $n$) such that $(g_\mu^n\circ g_\mu^N)(B_n) \subset B_n$. The attractors of interest to us have $(n+N)$ components permuted cyclically by $g_\mu$, with one of these components residing in $B_n$. To maneuver $g^n\circ g^N$ into the setting in Sect. 1.1, we apply the coordinate transformation $\Phi = \Phi_2\circ\Phi_1$ where
$$\Phi_1(\xi,\eta) = (\xi-1,\ \eta-\lambda^n),\qquad \Phi_2(\xi,\eta) = \left(-\frac{\sigma^n}{a}\xi,\ -\frac{\sigma^{2n}}{a}\eta\right).$$
The purpose of $\Phi_1$ is to shift the center of $B_n$ to the origin. The map $\Phi_2$ magnifies the attractor to unit length; its scaling in the $\eta$-direction is chosen with the standard quadratic
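family in mind. A straightforward computation yields
$$T := \Phi\circ g^n\circ g^N\circ\Phi^{-1} : \begin{pmatrix}x\\ y\end{pmatrix} \mapsto \begin{pmatrix}\frac1a[\sigma^n - \sigma^{2n}(\lambda^n+\mu)] - ax^2 + y - \frac{\sigma^{2n}}{a}H_1(\mu,\Phi^{-1}(x,y))\\[4pt] -\frac{\sigma^{2n}}{a}\lambda^nH_2(\mu,\Phi^{-1}(x,y))\end{pmatrix}.$$
Letting $a = \Psi(\mu) := \sigma^n - \sigma^{2n}(\lambda^n+\mu)$ and $\tilde H_i(a,x,y) := H_i(\mu,\Phi^{-1}(x,y))$, $i = 1,2$, we have
$$T : \begin{pmatrix}x\\ y\end{pmatrix} \mapsto \begin{pmatrix}1 - ax^2 + y - \frac{\sigma^{2n}}{a}\tilde H_1(a,x,y)\\[4pt] -\frac{\sigma^{2n}}{a}\lambda^n\tilde H_2(a,x,y)\end{pmatrix}.$$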
Since $\mu = \sigma^{-n} - a\sigma^{-2n} - \lambda^n$, the range of $a$ of interest to us, namely $a \in [1.5,2)$ (see Appendix A.1), corresponds to a subset of $(0,\mu^*]$ for $n$ large. What we have so far is a 1-parameter family $\{T_a\}$, which we regard as defined on $U := \{|x|,|y| < 2\}$. The role of $b \to 0$ here is played by $n \to \infty$. Our next task is to choose $b$ (as a function of $n$) in such a way that $T_{a,b}$ has the form
$$T_{a,b} : \begin{pmatrix}x\\ y\end{pmatrix} \mapsto \begin{pmatrix}1 - ax^2 + y + bu\\ bv\end{pmatrix},$$
where $u = u(a,x,y)$ and $v = v(a,x,y)$ have uniformly bounded $C^3$-norms. This will put us in the setting of Theorem 1.8 (see the proof of Corollary 1.3). We begin by examining the $C^3$-norms of $\sigma^{2n}\tilde H_1$ and $\sigma^{2n}\lambda^n\tilde H_2$. Using the facts that the leading terms in $H_1$ are $\eta(\xi-1+\eta+\mu)$, and that $|\xi-1| < 3\sigma^{-n}$ and $|\eta| < 3\sigma^{-2n}$ for $(\xi,\eta) \in \Phi^{-1}(U)$, we have $\|\tilde H_1\|_{C^0} = O(\sigma^{-3n})$. Similarly, $\|\tilde H_2\|_{C^0} = O(\sigma^{-n})$. Let $\partial^i$, $i = 1,2,3$, denote any one of the $i$-th partial derivatives. Using again the special form of $H_1$ and the nature of the coordinate transformations $\Phi$ and $\Psi$, we have $\partial^i\tilde H_1 = O(\sigma^{-3n})$ and $\partial^i\tilde H_2 = O(\sigma^{-n})$. Together this gives
$$\|\sigma^{2n}\tilde H_1\|_{C^3} < K\sigma^{-n},\qquad \|\sigma^{2n}\lambda^n\tilde H_2\|_{C^3} < K(\sigma\lambda)^n.$$
The following choices of b therefore give the desired result: If σ 2 λ ≤ 1, let b = σ −n . If σ 2 λ ≥ 1, let b = (σ λ)n . This completes the verification of the conditions in Sect. 1.1 for the family {Ta,b }. We finish with the observation that all the results in Sect. 1 that assume (**) are valid in the present setting: In the case σ 2 λ ≤ 1, | det(DT )| ∼ b, so (**) is satisfied. When σ λ ≥ 1, | det(DT )| ∼ (σ λ)n = bη where σ −1 = (σ λ)η . This is condition (∗∗) , a variant of (**) discussed in Sect. 7.2.
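The parameter bookkeeping in this subsection can be sketched in a few lines. The code below only checks that $\mu = \sigma^{-n} - a\sigma^{-2n} - \lambda^n$ inverts $a = \Psi(\mu)$ and encodes the stated choice of $b$; the numerical values of $\sigma$, $\lambda$, $n$ are illustrative assumptions, and the perturbation terms `u`, `v` are placeholders (with `u = 0`, `v = x` the model map is of Hénon type), not the functions arising from $H_1$, $H_2$.

```python
def Psi(mu, sigma, lam, n):
    """a = Psi(mu) = sigma^n - sigma^(2n) * (lam^n + mu)."""
    return sigma**n - sigma**(2 * n) * (lam**n + mu)

def Psi_inverse(a, sigma, lam, n):
    """mu = sigma^(-n) - a * sigma^(-2n) - lam^n."""
    return sigma**(-n) - a * sigma**(-2 * n) - lam**n

def choose_b(sigma, lam, n):
    """b = sigma^(-n) if sigma^2 * lam <= 1, and (sigma * lam)^n otherwise."""
    return sigma**(-n) if sigma**2 * lam <= 1 else (sigma * lam)**n

def T(a, b, x, y, u=lambda a, x, y: 0.0, v=lambda a, x, y: x):
    """One step of (x, y) -> (1 - a x^2 + y + b u, b v); u and v are placeholders."""
    return 1 - a * x * x + y + b * u(a, x, y), b * v(a, x, y)

# Illustrative values only: sigma > 1 > lam > 0 with sigma * lam < 1.
sigma, lam, n = 2.0, 0.3, 10
mu = Psi_inverse(1.8, sigma, lam, n)
```

For these values `mu` is a small positive number, consistent with the remark that $a \in [1.5, 2)$ corresponds to parameters in $(0,\mu^*]$ once $n$ is large.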
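B. Computational Proofs

B.1. Linear algebra (Sect. 2.1).

Sublemma B.1. Let $e$ be a unit vector in the most contracted direction of
$$M = \begin{pmatrix}A & C\\ B & D\end{pmatrix}$$
with $\|Me\| = \lambda^{\min}$. Then
$$e = \pm\frac1\rho\left(C^2 + D^2 - (\lambda^{\min})^2,\ -(AC+BD)\right), \tag{15}$$
$$Me = \pm\frac1\rho\left(-A(\lambda^{\min})^2 + D\det(M),\ -B(\lambda^{\min})^2 - C\det(M)\right), \tag{16}$$
where $\rho$ is the normalizing constant in (15). The proof is left as an easy exercise.

Proof of Lemma 2.1. Let $O_1$ and $O_2$ be orthogonal matrices such that
$$O_2M^{(i-1)}O_1 = \begin{pmatrix}\lambda^{\min}_{i-1} & 0\\ 0 & \lambda^{\max}_{i-1}\end{pmatrix}.$$
Then the tangent of the angle between $e_{i-1}$ and $e_i$ is given by the slope of the most contracted direction of the matrix
$$M_iO_2^{-1}\begin{pmatrix}\lambda^{\min}_{i-1} & 0\\ 0 & \lambda^{\max}_{i-1}\end{pmatrix} := \begin{pmatrix}A & C\\ B & D\end{pmatrix}\begin{pmatrix}\lambda^{\min}_{i-1} & 0\\ 0 & \lambda^{\max}_{i-1}\end{pmatrix} = \begin{pmatrix}\lambda^{\min}_{i-1}A & \lambda^{\max}_{i-1}C\\ \lambda^{\min}_{i-1}B & \lambda^{\max}_{i-1}D\end{pmatrix}.$$
From Sublemma B.1, we see that the slope in question is equal to
$$\frac{(AC+BD)\lambda^{\min}_{i-1}\lambda^{\max}_{i-1}}{(C^2+D^2)(\lambda^{\max}_{i-1})^2 - (\lambda^{\min}_i)^2}.$$
This is $\le \left(\frac{Kb}{\kappa^2}\right)^{i-1}$ because $\lambda^{\min}_{i-1}\lambda^{\max}_{i-1} = |\det(M^{(i-1)})| < b^{i-1}$, $\lambda^{\min}_i < (\frac{b}{\kappa})^i$ and $(C^2+D^2)(\lambda^{\max}_{i-1})^2 > K^{-1}\kappa^{2(i-1)}$, the last inequality being a consequence of the fact that $\|M^{(i)}\| > \kappa^i$ and $(A^2+B^2)(\lambda^{\min}_{i-1})^2 < K(\frac{b}{\kappa})^{2(i-1)}$. □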
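Formulas (15) and (16) of Sublemma B.1 can be checked numerically. The sketch below, with hypothetical function names, computes $(\lambda^{\min})^2$ as the smaller eigenvalue of $M^TM$, forms $e$ from (15) and $Me$ from (16), and leaves it to the caller to compare with a direct matrix-vector product. It assumes the normalizing constant $\rho$ is nonzero, which fails in degenerate cases such as $AC + BD = 0$ with $C^2 + D^2 \le A^2 + B^2$.

```python
import math

def most_contracted(A, B, C, D):
    """Return (lambda_min, e, Me) for M = [[A, C], [B, D]] using (15) and (16).

    Assumes the normalizing constant rho is nonzero.
    """
    # Entries of the symmetric matrix M^T M = [[p, q], [q, r]].
    p, q, r = A * A + B * B, A * C + B * D, C * C + D * D
    # Smaller eigenvalue of M^T M, i.e. (lambda_min)^2.
    lam_min_sq = 0.5 * ((p + r) - math.sqrt((p - r) ** 2 + 4 * q * q))
    ex, ey = r - lam_min_sq, -q              # formula (15) before normalizing
    rho = math.hypot(ex, ey)
    det = A * D - B * C
    e = (ex / rho, ey / rho)
    Me = ((-A * lam_min_sq + D * det) / rho,  # formula (16)
          (-B * lam_min_sq - C * det) / rho)
    return math.sqrt(lam_min_sq), e, Me

# Illustrative matrix with one strongly contracted direction.
A, B, C, D = 2.0, 0.1, 0.3, 0.05
lam_min, e, Me = most_contracted(A, B, C, D)
direct = (A * e[0] + C * e[1], B * e[0] + D * e[1])  # M applied to e directly
```

Here `direct` and `Me` agree up to rounding, and the length of `Me` equals `lam_min`, which is the content of the sublemma for this sample matrix.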
Before giving the proof of Corollary 2.2 we state another lemma the proof of which is also a straightforward computation. Sublemma B.2. Let A C Mi = , B D
M
(j )
=
Aj C j Bj Dj
,
j = i − 1, i.
Then ei × ei−1 =
1 ρ (i) ρ (i−1)
2 2 | det(M (i−1) )[(AC + BD)(Ci−1 + Di−1 )
(17)
+ (A2 + B 2 − C 2 − D 2 )Ci−1 Di−1 ] + !i |, where ρ (i−1) and ρ (i) are the normalizing constants for ei−1 and ei as in Sublemma B.1, and 2 min 2 !i = −(λmin i ) (Ai−1 Ci−1 + Bi−1 Di−1 ) + (λi−1 ) (Ai Ci + Bi Di ).
Observe that each of the terms in the numerator of (17) has a factor $|\det(M^{(i-1)})|$, $\lambda^{\min}_{i-1}$ or $\lambda^{\min}_i$, all of which are $\le (\frac{b}{\kappa})^{i-1}$. Observe also that if both $e_i$ and $e_{i-1}$ are nearly parallel to the $x$-axis, then $\rho^{(i)}, \rho^{(i-1)}$ are $> K^{-1}\kappa^{2i}$ (see the proof of Lemma 2.1).

Proof of Corollary 2.2. We begin with some useful derivative estimates. First, we claim that
$$\|\partial^1M^{(i)}\| < K^i. \tag{18}$$
This is because $\partial^1M^{(i)}$ is the sum of $i$ terms of the form $M_i\cdots M_{j+1}(\partial^1M_j)M_{j-1}\cdots M_1$ and the norm of this product is $< K_0^{2i}$. A similar argument gives
$$|\partial^1\det M^{(i)}| \le (Kb)^i. \tag{19}$$
Since $\lambda^{\max}_i = \|M^{(i)}\|$, it follows from (18) that $|\partial^1\lambda^{\max}_i| < K^i$; and since $\lambda^{\min}_i = |\det M^{(i)}|/\lambda^{\max}_i$, we have $|\partial^1\lambda^{\min}_i| < \left(\frac{Kb}{\kappa^2}\right)^i$. Pre-composing with a suitable orthogonal matrix as in the proof of Lemma 2.1, we may assume that $\rho^{(i)}, \rho^{(i-1)}$ are $> K^{-1}\kappa^{2i}$. The estimate for $\partial^j\theta_1$ is obtained by differentiating (15). To prove (3), we differentiate (17), and observe using the inequalities above that after differentiation, the numerator is the sum of a finite number of terms each one of which is bounded above by $\left(\frac{Kb}{\kappa^2}\right)^{i-1}$. To prove (4), we write
$$M^{(i)}e_n = M^{(i)}e_i + M^{(i)}(e_n - e_i) = M^{(i)}e_i + \sum_{k=i}^{n-1}M^{(i)}(e_{k+1} - e_k)$$
and take partial derivative one term at a time. First we have ∂ 1 M (i) (ek+1 − ek ) = ∂ 1 M (i) · (ek+1 − ek ) + M (i) · ∂ 1 (ek+1 − ek ). k because ∂ 1 M (i) ≤ K i The norm of the first term on the right side is bounded by Kb κ2 Kb k k and ek+1 − ek < κ 2 . The norm of the second term is bounded by Kb according κ2 Kb i 1 (i) to (3). It remains to show ∂ M ei < κ 2 . This follows by differentiating (16) and using the inequalities above. The proofs for j = 2, 3 are similar. % & Sublemma B.3. Let Mi and Mi be as in Lemma 2.2, let m < n2 , and write Mi,m = Mi+m Mi−1+m · · · Mm ,
Mi,m = Mi+m Mi−1+m · · · Mm .
Then < Mi,m − Mi,m
for all i, 0 ≤ i ≤ m.
1 (Kλ)m 4
(20)
Proof. Set $\rho_k = \|M_{k,m} - M'_{k,m}\|$. Then
$$M_{k+1,m} - M'_{k+1,m} = M_{k+1+m}M_{k,m} - M'_{k+1+m}M'_{k,m} = M_{k+1+m}(M_{k,m} - M'_{k,m}) + (M_{k+1+m} - M'_{k+1+m})M'_{k,m}.$$
Since $\|M'_{k,m}\| < K_0^k$ and $\|M_{k+1+m} - M'_{k+1+m}\| < \lambda^{k+m}$, we have
$$\rho_{k+1} \le K\rho_k + K^k\lambda^{m+k},$$
which implies (20). □
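The passage from the recursion $\rho_{k+1} \le K\rho_k + K^k\lambda^{m+k}$ to a bound on $\rho_k$ is a routine unrolling. A minimal sketch, assuming the worst case in which the recursion holds with equality and $0 < \lambda < 1$: unrolling gives $\rho_k = K^k\rho_0 + K^{k-1}\lambda^m\frac{1-\lambda^k}{1-\lambda}$, so $\rho_k$ is at most a constant times $K^k\lambda^m$ whenever $\rho_0 \le \lambda^m$. The function names and numerical values below are hypothetical, and the sketch does not attempt to reproduce the exact form of the bound (20).

```python
def iterate_recursion(K, lam, m, rho0, steps):
    """Iterate rho_{k+1} = K * rho_k + K^k * lam^(m + k), the equality case."""
    rho = rho0
    values = [rho]
    for k in range(steps):
        rho = K * rho + K**k * lam**(m + k)
        values.append(rho)
    return values

def closed_form(K, lam, m, rho0, k):
    """Unrolled recursion: K^k rho0 + K^(k-1) lam^m (1 - lam^k) / (1 - lam)."""
    if k == 0:
        return rho0
    geometric = (1 - lam**k) / (1 - lam)
    return K**k * rho0 + K**(k - 1) * lam**m * geometric

# Illustrative constants; rho0 plays the role of the one-step difference.
K, lam, m = 3.0, 0.1, 8
rho0 = lam**m
values = iterate_recursion(K, lam, m, rho0, m)
```

With these constants every `values[k]` stays below `K**k * lam**m * (1 + 1 / (K * (1 - lam)))`, which makes the geometric smallness in $\lambda^m$ explicit.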
Proof of Lemma 2.2. ([BC2], p. 108) We prove the assertion for all the indices that are powers of two and leave the rest as an exercise. To prove (b), write mj = 2j , and let uj =
wmj wmj
,
u j
=
wm j wm j
,
= M (mi ) w. We will show inductively that where wmj = M (mi ) w and wm j
uj × u j < λ
mj 4
.
(21)
Assume that (21) is true up to index j . Let . A = Mmj +1 −mj ,mj and A = Mm j +1 −mj ,mj
Since wmj < K mj and wmj +1 > κ mj +1 , we have 2 mj wmj +1 κ , > Auj = wmj K 2 mj m mj κ 3 κ2 j − K mj λ 4 ≥ . Au j ≥ Auj − Auj − u j ≥ K 4 K
(22) (23)
Writing A u j = A u j − A uˆ j + A uˆ j − Auˆ j + Auˆj , where uˆ j = uj if the angle between uj and u j is smaller than π2 , uˆ j = −uj otherwise, we obtain A u j ≥ Auj − Auj × u j − A − A . Using Sublemma B.3 to bound A − A , we again have m 3 κ2 j A u j ≥ . (24) 4 K We are now ready to prove (21) for index j + 1: uj +1 × u j +1 = <
Auj × A u j
Auj · A j u j Auj × Au j
Auj · A j u j
= +
Auj × (A − A + A )u j Auj · A j u j
Auj × (A − A )u j Auj · A j u j
.
The first term is fine since Auj × Au j = | det(A)|uj × u j and det(A) < bmj . To estimate the second term, we use Sublemma B.3 and (22)–(24).
To prove (a), we again let i = 2k . Then for 0 < j ≤ k, we have wm = wm A u j = wm A u j − Au j + Au j − Auˆ j + Auˆ j , j +1 j j
so that wm j +1 wm j
≥ Auj − A − Au j − Au j − uˆ j =
wmj +1 wmj
1−
wmj wmj +1
(A
− A + Au j
− uˆ j ) .
Using Sublemma B.3 to bound A − A and part (b) of this lemma to bound uˆ j − u j , we obtain wm wmj +1 j +1 ≥ (1 − 4−mj ), wmj wmj which implies (a). % & λ B.2. Stable curves (Sect. 2.2). On a ball of radius 2K centered at z0 , we have DT ≥ κ2 0 so that e1 , the field of most contracted directions of DT , is well defined. Let γ1 be the integral curve to e1 of length ∼ λ passing through z0 . λ2 -neighborhood of γ1 . For ξ ∈ B1 , let ξ be a point in To construct γ2 , let B1 be the 2K 0
λ2 λ2 Kb 2 2K0 . Then |T ξ −T z0 | ≤ |T ξ −T ξ |+|T ξ −T z0 | ≤ 2 + κ 2 λ < λ , 2 so by Lemma 2.2, DT 2 ξ ≥ κ2 . This ensures that e2 , the field of most contracted directions for DT 2 , is defined on all of B1 . Let γ2 be the integral curve through z0 in B1 . We leave it as an exercise to show that the Hausdorff distance between γ1 and γ2 is O( κb2 λ) << λ2 , so that γ2 has essentially the same length as γ1 . This uses the fact that e1 has Lipschitz constant K (Corollary 2.2) and that the angle between e1 and e2 is (Corollary 2.1). < Kb κ2 λ3 -neighborhood of γ2 and repeat the argument above to Next we let B2 be the 2K 0 get e3 and γ3 . Using the Lipschitzness of e2 and the fact that e3 × e2 ≤ ( Kb )2 , we κ2 conclude again that γ3 has essentially the same length as γ2 . This process is continued
γ1 with |ξ −ξ | <
for n steps.
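B.3. Curvature estimates (Sect. 2.3). Recall that
$$k_i(s) = \frac{\|\gamma_i'(s)\times\gamma_i''(s)\|}{\|\gamma_i'(s)\|^3}.$$
Write
$$DT = DT(\gamma_{i-1}(s)) = \begin{pmatrix}A & C\\ B & D\end{pmatrix} \quad\text{and}\quad X = \begin{pmatrix}\langle\nabla A,\gamma_{i-1}'\rangle & \langle\nabla C,\gamma_{i-1}'\rangle\\ \langle\nabla B,\gamma_{i-1}'\rangle & \langle\nabla D,\gamma_{i-1}'\rangle\end{pmatrix},$$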
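where $\langle\ ,\ \rangle$ is the usual inner product. Since $\gamma_i' = DT\cdot\gamma_{i-1}'$ and $\gamma_i'' = DT\cdot\gamma_{i-1}'' + X\cdot\gamma_{i-1}'$, we have
$$k_i = \frac{1}{\|\gamma_i'\|^3}\,\|DT\cdot\gamma_{i-1}'\times(DT\cdot\gamma_{i-1}'' + X\cdot\gamma_{i-1}')\| \le \frac{1}{\|\gamma_i'\|^3}(\mathrm{I} + \mathrm{II}), \tag{25}$$
where
$$\mathrm{I} = |\det(DT)|\cdot\|\gamma_{i-1}'\times\gamma_{i-1}''\|,\qquad \mathrm{II} = \|DT\cdot\gamma_{i-1}'\times X\cdot\gamma_{i-1}'\|.$$
Term II is degree three homogeneous in $\gamma_{i-1}'$. Moreover, the second component of each vector involved in the cross product has a factor $b$. Thus there exists $K > 0$ such that
$$k_i \le (Kb\cdot k_{i-1} + K\cdot b)\cdot\frac{\|\gamma_{i-1}'\|^3}{\|\gamma_i'\|^3}. \tag{26}$$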
Lemma 2.4 follows by recursively applying inequality (26).
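The transformation rule $\gamma_i' = DT\cdot\gamma_{i-1}'$, $\gamma_i'' = DT\cdot\gamma_{i-1}'' + X\cdot\gamma_{i-1}'$ behind (25) can be tried on a concrete example. The sketch below uses the model map $(x,y) \mapsto (1 - ax^2 + y,\ bx)$ as an assumed stand-in for $T$ and pushes forward the horizontal segment $s \mapsto (s, 0)$; for this choice the image curve is $(1 - as^2, bs)$, so its curvature can be written down directly and compared. All names and parameter values are hypothetical.

```python
import math

def curvature(dx, dy, ddx, ddy):
    """k = |gamma' x gamma''| / |gamma'|^3 for a planar curve."""
    cross = dx * ddy - dy * ddx
    return abs(cross) / math.hypot(dx, dy) ** 3

def pushforward(DT, X, d1, d2):
    """Return (DT d1, DT d2 + X d1), the first and second derivatives of the image curve."""
    (A, C), (B, D) = DT
    (XA, XC), (XB, XD) = X
    new_d1 = (A * d1[0] + C * d1[1], B * d1[0] + D * d1[1])
    new_d2 = (A * d2[0] + C * d2[1] + XA * d1[0] + XC * d1[1],
              B * d2[0] + D * d2[1] + XB * d1[0] + XD * d1[1])
    return new_d1, new_d2

# Model map (x, y) -> (1 - a x^2 + y, b x); its derivative is [[-2 a x, 1], [b, 0]].
a, b, s = 1.4, 0.3, 0.5
d1, d2 = (1.0, 0.0), (0.0, 0.0)            # derivatives of the segment (s, 0)
DT = ((-2 * a * s, 1.0), (b, 0.0))
X = ((-2 * a * d1[0], 0.0), (0.0, 0.0))    # entries <grad A, gamma'>, etc.
img_d1, img_d2 = pushforward(DT, X, d1, d2)
k_image = curvature(img_d1[0], img_d1[1], img_d2[0], img_d2[1])
```

The value `k_image` matches the curvature $2ab/(4a^2s^2 + b^2)^{3/2}$ of the parabola-like image computed by hand, and it is largest near $s = 0$, where the image curve turns.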
B.4. One-dimensional dynamics (Sect. 2.4). Let δ0 :=inf{d(f n x, ˆ C) : xˆ ∈ C, n > 0}. We begin with three easy observations: (i) There exists k0 > 0 such that for all δ < 21 δ0 , if x is such that f n x ∈ Cδ , then |(f n ) x| ≥ k0 . This is true because there is an interval (x1 , x2 ) containing x on which f n is monotone and f n (x1 , x2 ) ⊃ (xˆ − 2δ, xˆ + 2δ) for some xˆ ∈ C. It then follows from the negative Schwarzian property that restricted to f −n (xˆ − δ, xˆ + δ) ∩ (x1 , x2 ), |(f n ) | ≥ some k0 > 0 independent of x. (ii) There exists λ0 > 1 such that for all sufficiently small δ, if d(x, C) < δ, then there p exists p = p(x) such that f i x ∈ Cδ for all i < p and |(f p ) x| ≥ λ0 . This is an easy computation using the fact that the forward critical orbits of f are contained in a uniformly expanding invariant set. Let p(δ) ˆ =inf{p(x) : d(x, C) < δ}. (iii) For all sufficiently small δ, there exist N1 (δ) ∈ Z and λ1 (δ) > 1 such that if x, · · · , f n x ∈ Cδ for some n > N1 , then |(f n ) x| ≥ λn1 . This is proved in [M1]. We now prove the assertion in Lemma 2.5. Fix δ1 sufficiently small for (i)–(iii) p(δ ˆ ) above, and with the property that λ0 1 >> k0−1 . Consider δ < δ1 and an orbit segment x, · · · , f n x with f i x ∈ Cδ for i < n and f n x ∈ Cδ . To estimate (f n ) x, we let nj be the p j th time f i x ∈ Cδ1 , and let pj = p(f nj x). Then |(f pj ) (f nj x)| ≥ λ0 j , and between the times nj +pj and nj +1 , the derivative is bounded below by λ1 (δ1 )nj +1 −(nj +pj ) if nj +1 − (nj +pj ) > N1 (δ1 ), by k0 otherwise. The same estimate holds for the initial stretch up to p time n1 . Noting that the factor k0 can be absorbed into λ0 j , we see that |(f n ) x| ≥ ecˆ1 n , p(δ ˆ 1)
ˆ 1 )+N1 (δ1 ) . Also, c where ecˆ1 can be taken to be slightly smaller than (min(λ0 , λ1 (δ1 )) p(δ ˆ0 can be taken to be k0 λ−N1 (δ1 ) . This completes the proof of part (ii) of Lemma 2.5. To prove (i), let nq < n be the last time f i x ∈ Cδ1 , and observe that |(f n−nq ) (f nq x)| ≥ K −1 k0 δ if n − nq < N1 (δ1 ), ≥ K −1 δλ1 (δ1 )n−nq otherwise. % &
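For orientation, the exponential growth of $|(f^n)'x|$ can be seen explicitly in the simplest Misiurewicz example, $f(x) = 1 - 2x^2$ (the case $a^* = 2$ of the quadratic family). This is only an illustration and not part of the proof of Lemma 2.5. Writing $x = \cos\theta$ one has $f^n(x) = -\cos(2^n\theta)$ for $n \ge 1$, hence $|(f^n)'(x)| = 2^n|\sin(2^n\theta)/\sin\theta|$, and the sketch below (hypothetical names) compares this closed form with the chain-rule product along the orbit.

```python
import math

def f(x, a=2.0):
    """Quadratic map f(x) = 1 - a x^2."""
    return 1.0 - a * x * x

def df(x, a=2.0):
    """Derivative f'(x) = -2 a x."""
    return -2.0 * a * x

def derivative_along_orbit(x, n, a=2.0):
    """(f^n)'(x) by the chain rule: the product of f'(f^i x) for i < n."""
    prod = 1.0
    for _ in range(n):
        prod *= df(x, a)
        x = f(x, a)
    return prod

def closed_form_a2(theta, n):
    """For a = 2 and x = cos(theta): |(f^n)'(x)| = 2^n |sin(2^n theta) / sin(theta)|."""
    return 2.0**n * abs(math.sin(2.0**n * theta) / math.sin(theta))

theta, n = 1.0, 6
numeric = abs(derivative_along_orbit(math.cos(theta), n))
exact = closed_form_a2(theta, n)
```

The factor $|\sin(2^n\theta)|$ is small exactly when $f^n x$ is near $\pm 1$, that is, when $f^{n-1}x$ is near the critical point, which is the mechanism behind the factor $\delta$ in estimates of the kind proved above.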
B.5. Critical points inside C (0) (Sect. 2.6). Proof of Lemma 2.9. Write dq1 (s) dx(s) dy(s) = ∂x q1 (x, y) + ∂y q1 (x, y) . ds ds ds Since γ is b-horizontal, we have
dx(s) ds
q1 (s) =
≈ 1 and |
dy(s) ds
|< O(b)· |
(27)
dx(s) ds
|. By (15)
AC + BD , C 2 + D 2 − (λmin )2
(28)
so Ax C + ACx + O(b) (AC + BD)(CCx + DDx + λmin λmin x ) −2 2 2 min 2 C + D − (λ ) (C 2 + D 2 − (λmin )2 )2 := I + II,
∂x q1 (x, y) =
where $A = F_x + bu_x$, $C = F_y + bu_y$, $B = bv_x$, $D = bv_y$. We will show that $|\mathrm{I}| \ge K^{-1}$ and $|\mathrm{II}| = O(\delta)$. To estimate I, observe that the denominator is $> K^{-1}$, and that for $(x,y) \in C^{(0)}$, $|AC_x| = O(\delta)$, while $|A_xC| = |F_{xx}F_y|(1 + O(b)) \ge K^{-1}$ since $|F_y| > K^{-1}$ (non-degeneracy condition). The estimate for II follows from the fact that its denominator is $\ge K^{-1}$, and $AC + BD = O(\delta)$. □

Proof of Lemma 2.10. Using the results in Sect. 2.1 and Lemma 2.9, we have that at $\gamma(s)$ with $|s| < (Kb)^{\frac m2}$, $e_{3m}$ is defined with $|q_{3m} - q_m| < (Kb)^m$ (Lemma 2.2) and $|\frac{d}{ds}q_m| \ge K^{-1}$ (Corollary 2.2 and Lemma 2.9). Let $\tau(s)$ denote the slope of $\gamma(s)$, and assume for definiteness that $\frac{d}{ds}q_{3m} > 0$. Then
$$\begin{aligned}
q_{3m}((Kb)^{\frac m2}) - \tau((Kb)^{\frac m2}) &= (q_{3m}((Kb)^{\frac m2}) - q_m((Kb)^{\frac m2})) + (q_m((Kb)^{\frac m2}) - q_m(0))\\
&\quad + (q_m(0) - \tau(0)) + (\tau(0) - \tau((Kb)^{\frac m2}))\\
&\ge -(Kb)^m + K^{-1}(Kb)^{\frac m2} + 0 - K_1b(Kb)^{\frac m2} \ge \frac{K^{-1}}{2}(Kb)^{\frac m2}.
\end{aligned}$$
Similarly, $q_{3m}(-(Kb)^{\frac m2}) - \tau(-(Kb)^{\frac m2}) < 0$, giving a unique critical point of order $3m$ in between. □

Proof of Lemma 2.11. Let $\tau(s)$ be the slope of the tangent vector to $\gamma$ at $\gamma(s)$, and let $q_m(s)$ be the slope of $e_m$ at $\gamma(s)$. Let $\hat\tau(s)$ and $\hat q_m(s)$ denote the corresponding quantities at $\hat\gamma(s)$. First we claim that
$$|\tau(0) - \hat\tau(0)| \le 2\sqrt{\varepsilon}. \tag{29}$$
An easy calculation (which we omit) shows that if this was not the case, then $\gamma$ and $\hat\gamma$ would meet at $\gamma(s)$ for some $|s| < \sqrt{\varepsilon}$.
Let $\hat m$ be the largest integer $j \le m$ such that $4K_1\sqrt{\varepsilon} < \|DT\|^{-13j}$. Then by Lemma 2.2, $\|DT^i(\gamma(s))\| > \frac{1}{2}$ for $0 < i < \hat m$ and $s \in [-4K_1\sqrt{\varepsilon}, 4K_1\sqrt{\varepsilon}]$. This guarantees that $q_{\hat m}$ is defined everywhere on $\gamma$ and on $\hat\gamma$. Let $\hat\sigma(s) := \hat q_{\hat m}(s) - \hat\tau(s)$. We have
\[ |\hat\sigma(0)| \le |\hat q_{\hat m}(0) - q_{\hat m}(0)| + |q_{\hat m}(0) - q_m(0)| + |q_m(0) - \tau(0)| + |\tau(0) - \hat\tau(0)| < K\varepsilon + (Kb)^{\hat m} + 0 + 2\sqrt{\varepsilon} < 3\sqrt{\varepsilon}. \]
To prove the existence of a critical point of order $\hat m$ on $\hat\gamma$, we will compare the signs of $\hat\sigma$ at the two end points of $\hat\gamma$. First,
\[ \hat\sigma(4K_1\sqrt{\varepsilon}) = \hat q_{\hat m}(0) + \frac{d}{ds} q_{\hat m}(s_1) \cdot 4K_1\sqrt{\varepsilon} - \hat\tau(0) - \frac{d}{ds}\hat\tau(s_2) \cdot 4K_1\sqrt{\varepsilon} \]
for some $s_1, s_2 \in [0, 4K_1\sqrt{\varepsilon}]$. This is
\[ = \hat\sigma(0) + \Big( \frac{d}{ds} q_{\hat m}(s_1) + O(b) \Big) \cdot 4K_1\sqrt{\varepsilon}. \]
Since the second term has absolute value $> (K_1^{-1} - O(b)) \cdot 4K_1\sqrt{\varepsilon} > |\hat\sigma(0)|$, it follows that $\hat\sigma(4K_1\sqrt{\varepsilon})$ has the same sign as $\frac{d}{ds} q_1$. An analogous computation shows that $\hat\sigma(-4K_1\sqrt{\varepsilon})$ has the opposite sign as $\frac{d}{ds} q_1$. □

B.6. Growth of $w_i$ and $w_i^*$ (Sect. 4.2).

Sublemma B.4. Let $z_0$ be h-related to $\hat z_0 \in \Gamma_{\theta N}$ with bound period $p < \frac{2}{3}N$, and let $w_0 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Then for $i \le p$, $\|w_i^*\| > K^{-1} e^{c' i}$ for some $c' \approx c$.

Proof. Let $\hat w_i^*$ be as defined in (IA6). Then (IA4) and (IA6) together imply that $\|\hat w_i^*\| > \frac{c_0}{2} e^{ci}$. The only difference between $\hat w_i^*$ and $w_i^*$ is that contractive fields of order $J(\hat z_i)$ are used for splitting for the former and $J(z_i)$ for the latter at returns to $C^{(0)}$. By Lemma 4.2, $J(z_i) = J(\hat z_i) \pm 1$, so that recombination times may differ by one. This is clearly of no consequence. Assuming these times are synchronized, we observe next that $w_i^*$ has the same direction as $\hat w_i^*$. This can be seen inductively (using the nested property of fold periods). Finally, a vector split using a field of order $J$ or $J + 1$ may differ in length by a factor of $1 \pm O(b^J)$. Thus $\|w_i^*\| \ge (1 - O(b))^i \|\hat w_i^*\|$. □

Proof of Lemma 4.6. We may assume $z_i$ is in a fold period, otherwise there is nothing to prove. Let $i_1 < i \le i_2$ be the longest fold period containing $i$. By Lemma 4.4, which applies also to controlled orbits satisfying $d_C(z_j) > e^{-\alpha j}$, we have $i_2 - i_1 \le \varepsilon i$. Let $w_{i_1} = Ae + B\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ be the usual splitting. Then
\[ \|w_i^*\| \le K^{i-i_1}|B| \le K^{i-i_1}\|w_{i_2}^*\| = K^{i-i_1}\|w_{i_2}\| \le K^{i-i_1}(K^{i_2-i}\|w_i\|) \le K^{\varepsilon i}\|w_i\|. \]
The first "$\le$" uses the fact that $\|w_{j+1}^*\| / \|w_j^*\| \le$ some $K$, the second uses Sublemma B.4, and the third $\|DT\| \le K$. The reverse estimate follows from $\|w_i\| \le K^{i-i_1}\|w_{i_1}\| \le K^{i-i_1} d_C(z_{i_1})^{-1}\|w_i^*\|$ and $d_C(z_{i_1}) > e^{-\alpha i}$. □
Proof of Lemma 4.7. We give a proof in the case where $j$ exists; the other case is simpler. Let $k \le i_1 < i_1 + p_1 \le i_2 < i_2 + p_2 \le \cdots \le i_r = j < n$ be defined as follows: we let $i_1$ be the first return to $C^{(0)}$ at or after time $k$, $p_1$ the bound period of $z_{i_1}$, $i_2$ the first return after $i_1 + p_1$, and so on until $i_r = j$. Writing $k = i_0 + p_0$, we have that $\|w_n^*\| / \|w_k^*\|$ is a product of factors of the following three types:
\[ \mathrm{I} := \frac{\|w^*_{i_{s+1}}\|}{\|w^*_{i_s+p_s}\|}, \qquad \mathrm{II} := \frac{\|w^*_{i_s+p_s}\|}{\|w^*_{i_s}\|} \qquad \text{and} \qquad \mathrm{III} := \frac{\|w_n^*\|}{\|w_j^*\|}. \]
First we prove the lemma assuming that no fold period initiated before time $k$ expires between times $k$ and $n$. By Lemma 2.8, $\mathrm{I} \ge c_0 e^{c_1 (i_{s+1} - (i_s + p_s))}$. Since $w^*_{i_s}$ splits correctly, we have, by (IA5), $\mathrm{II} \ge K^{-1} e^{\frac{c}{3} p_s}$. Moreover, we may assume that $c_0$ and $K$ above can be absorbed into the exponential estimate for the bound period $[i_s, i_s + p_s]$. For III, let $J$ be the fold period initiated at time $j$. If $J > n - j$, then $\mathrm{III} \ge K^{-1} d_C(z_j) e^{c'(n-j)}$ by Sublemma B.4. If not, we split $w_j^*$ into $w_j^* = A e_{n-j} + B\begin{pmatrix} 0 \\ 1 \end{pmatrix}$, noting that $e_{n-j}$ is defined at $z_j$ by Sublemma B.4 and Lemma 4.6. Then $\mathrm{III} \ge K^{-1} d_C(z_j) e^{c'(n-j)} - (Kb)^{n-j}$. The last term is negligible because $d_C(z_j) \sim b^{J/2} \gg (Kb)^{n-j}$. Altogether, this gives
\[ \frac{\|w_n^*\|}{\|w_k^*\|} \ge K^{-1} d_C(z_j) e^{c''(n-k)} \]
for some $c'' > 0$ as claimed.

In the rest of the proof, we view contributions from fold periods initiated before time $k$ as perturbations of the estimates above, and verify that they are inconsequential. For I, we claim that for each $t$ in question,
\[ \frac{\|w_t^*\|}{\|w_{t-1}^*\|} = \frac{\|DT(z_{t-1}) w_{t-1}^*\|}{\|w_{t-1}^*\|} \big( 1 \pm O(\sqrt{b}) \big), \]
so that I has the same estimate as before with possibly a slightly smaller $c_1$. This claim follows from the fact that when a fold period initiated $J$ steps earlier expires at time $t$, the vector to rejoin the main term has magnitude $\|DT(z_{t-1}) w_{t-1}^*\| \, O(b^{J/2})$. (See Sect. 2.7.)

Next we turn to III, which is similar to and a little more complicated than II. Given $z_t$ and a vector $u$, we let $u, T_*^1(z_t)u, T_*^2(z_t)u, \cdots$ denote the vectors given by the splitting algorithm for the orbit segment beginning at $z_t$ with initial vector $u$, neglecting recombinations from fold periods initiated before time $t$. Then
\[ w_n^* = T_*^{n-j}(z_j) w_j^* + \sum_{t=j+1}^{n} T_*^{n-t}(z_t) E_t, \]
where $E_t$ is the sum of the vectors to be rejoined at time $t$. For fixed $t$, let $J$ be the shortest fold period initiated before $k$ to expire at time $t$. From Sect. 2.7, we have $\|E_t\| \le (Kb)^{J/2} \|w_t^*\|$. Also, since this fold period contains the one initiated at $j$, we have, by Sect. 4.1, $K\alpha J > (n - j)$. Together this gives
\[ \|T_*^{n-t}(z_t) E_t\| \le K^{n-t} (Kb)^{J/2} \|w_t^*\| \le \big( K b^{\frac{1}{K\alpha}} \big)^{n-j} \|w_t^*\|. \]
Assuming inductively that the assertion in the lemma has been proved for shorter time intervals, we have $\|w_t^*\| \le K d_C(z_{j_t})^{-1} \|w_n^*\|$, where $j_t$ is a return between times $t$ and $n$. Thus
\[ \Big\| \sum_{t=j+1}^{n} T_*^{n-t}(z_t) E_t \Big\| < (n - j) \big( K b^{\frac{1}{K\alpha}} \big)^{n-j} e^{\alpha(n-j)} \|w_n^*\| \ll \|w_n^*\|, \]
which together with our earlier estimate on $T_*^{n-j}(z_j) w_j^*$ gives the desired result. □
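The bookkeeping in the first half of the proof of Lemma 4.7 (no interference from fold periods initiated before time $k$) can be illustrated with a short sketch. It only multiplies out the lower bounds for the factors of types I, II and III along a given itinerary. The function name, the argument layout and all numerical values are hypothetical, and the constants `c0`, `c1`, `c`, `c_prime`, `K` stand in for the unspecified constants of the text.

```python
import math


def log_growth_lower_bound(k, n, returns, d_last, c0, c1, c, c_prime, K):
    """Lower bound for log(|w_n^*| / |w_k^*|) assembled from factors I, II, III.

    returns : list of (i_s, p_s) for s = 1..r in increasing order; the last
              entry is the return j = i_r (its bound period is not used).
    d_last  : d_C(z_j), the distance of the last return to the critical set.
    All constants are illustrative placeholders, not values from the paper.
    """
    total = 0.0
    free_start = k  # k = i_0 + p_0 is the start of the first free stretch
    for i_s, p_s in returns[:-1]:
        # type I: free stretch from i_{s-1} + p_{s-1} to i_s
        total += math.log(c0) + c1 * (i_s - free_start)
        # type II: bound period [i_s, i_s + p_s]
        total += -math.log(K) + (c / 3.0) * p_s
        free_start = i_s + p_s
    j = returns[-1][0]
    # type I: last free stretch, ending at the return j
    total += math.log(c0) + c1 * (j - free_start)
    # type III: from the return j to time n, with the loss d_C(z_j)
    total += -math.log(K) + math.log(d_last) + c_prime * (n - j)
    return total


if __name__ == "__main__":
    bound = log_growth_lower_bound(
        k=0, n=30, returns=[(5, 4), (15, 3), (25, 0)],
        d_last=0.1, c0=1.0, c1=0.5, c=0.3, c_prime=0.25, K=2.0,
    )
    print(f"log lower bound: {bound:.4f}")
```

The only factor that can be small is the one of type III, which carries the loss $d_C(z_j)$ at the last return; this is why that factor appears explicitly in the final estimate of the lemma.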
Proof of Lemma 4.8. The case where $z_k$ is not in a fold period is contained in Lemma 4.7. Let $j < k$ be the point in time when the largest fold period covering $z_k$ is initiated. Splitting $w_j = A\begin{pmatrix} 0 \\ 1 \end{pmatrix} + B e_J$ as usual, and noting that $J \ge k - j$, we have
\[ \|w_k\| \le b^{J/2} \|DT\|^{k-j} \|w_j\| + (Kb)^{k-j} \|w_j\| \ll \|w_j\|. \]
The assertion again follows from Lemma 4.7. □

Proof of Lemma 4.5. The proof proceeds inductively. Consider a bound return $z_i$, and assume that the $w_j^*$-vectors split as desired at all returns prior to time $i$. Let $\angle(\cdot, \cdot)$ denote the angle between two vectors, and let $u = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$.

Case 1. $z_i$ is in a fold period. Let $j < i$ be the largest integer such that the fold period initiated at time $j$ remains in effect at time $i$, and let $\hat z_0 = \phi(z_j)$. We will prove that
\[ \angle\big( DT^{i-j}(z_j)u, \, \tau(\phi(z_i)) \big) < \tfrac{3}{2} \varepsilon_0 \, d_C(z_i). \]
We compare this inequality to
\[ \angle\big( DT^{i-j}(\hat z_0)u, \, \tau(\phi(\hat z_{i-j})) \big) < \varepsilon_0 \, d_C(\hat z_{i-j}), \]
which we know to be true by (IA3). Suppose $\hat z_{i-j} \in C^{(k)}$. Then
– $\angle\big( DT^{i-j}(z_j)u, \, DT^{i-j}(\hat z_0)u \big) \ll e^{-\beta(i-j)} \ll d_C(z_i)$ by (IA6);
– $\angle\big( \tau(\phi(z_i)), \, \tau(\phi(\hat z_{i-j})) \big) < b^{\frac{k-1}{4}} \ll d_C(z_i)$ by Lemma 4.1;
– $|d_C(z_i) - d_C(\hat z_{i-j})| < e^{-\beta(i-j)} + b^{\frac{k-1}{4}} \ll d_C(z_i)$.

Case 2. $z_i$ is not in any fold period. In this case let $j < i$ be the last free return, so that the bound period initiated at $j$ remains in effect at $i$ and $w_i^* = DT^{i-j}(z_j) w_j^*$. We split $w_j^*(z_0) = A e_{i-j} + Bu$; $e_{i-j}(z_j)$ is defined (even though $i - j > J(z_j)$) by (IA6) and Lemma 4.6. We proceed as in Case 1 to estimate the angle of splitting for $DT^{i-j}(z_j)u$ at $z_i$. It remains to check that adding $A \cdot DT^{i-j}(z_j) e_{i-j}$ will only change the angle of $B \cdot DT^{i-j}(z_j)u$ by $\ll e^{-\alpha(i-j)} < d_C(z_i)$. This is true because
\[ \|A \cdot DT^{i-j}(z_j) e_{i-j}\| < \frac{|B|}{\angle(e, w_j^*)} \, b^{i-j} < |B| \, b^{\frac{i-j}{2}} < b^{\frac{i-j}{2}} \, \|B \cdot DT^{i-j}(z_j)u\|. \]
□
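B.7. Distortion during bound periods (Sect. 4.3). The notation and context are those of Lemma 4.9.

Sublemma B.5.
\[ K \sum_{i=1}^{\mu} \frac{\Delta_i}{d_C(z_i)} \ll 1. \]

Proof. Since $|\xi_s - z_s| < e^{-\beta s}$ for all $s < \mu$, we have $\Delta_i < 2e^{-\beta i}$. Let $h_0$ be large enough that
\[ K \sum_{i=h_0+1}^{\mu} \frac{\Delta_i}{d_C(z_i)} \le K \sum_{i=h_0+1}^{\infty} e^{-(\beta-\alpha)i} \le K e^{-(\beta-\alpha)h_0} \ll 1 \]
and assume $\delta$ is small enough that
\[ K \sum_{i=1}^{h_0} \frac{\Delta_i}{d_C(z_i)} < \Big( \frac{1}{\delta_0} K \sum_{i=1}^{h_0} \big( e^{\alpha} \|DT\| \big)^i \Big) \delta \ll 1. \]
□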
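The choice of $h_0$ in the proof of Sublemma B.5 is a geometric-tail estimate, and a small numerical sketch may make it concrete. It assumes, as the proof does, that $\Delta_i < 2e^{-\beta i}$ and $d_C(z_i) > e^{-\alpha i}$ with $\beta > \alpha$, so that each term of the tail is at most $2e^{-(\beta-\alpha)i}$. The function names, the tolerance and the sample values of $\alpha$ and $\beta$ are hypothetical and are not taken from the paper.

```python
import math


def tail_bound(alpha, beta, h0):
    """Closed form of sum_{i > h0} 2 * exp(-(beta - alpha) * i)."""
    rate = beta - alpha
    if rate <= 0:
        raise ValueError("the estimate needs beta > alpha")
    return 2.0 * math.exp(-rate * (h0 + 1)) / (1.0 - math.exp(-rate))


def smallest_h0(alpha, beta, tol):
    """Smallest h0 >= 0 whose geometric tail is below the tolerance tol."""
    h0 = 0
    while tail_bound(alpha, beta, h0) >= tol:
        h0 += 1
    return h0


if __name__ == "__main__":
    alpha, beta, tol = 0.1, 1.0, 1e-3  # illustrative values only
    h0 = smallest_h0(alpha, beta, tol)
    print(h0, tail_bound(alpha, beta, h0))
```

Once $h_0$ is fixed in this way, the remaining finitely many terms $i \le h_0$ are handled by taking $\delta$ small, which is the second half of the proof.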
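Proof of Lemma 4.9. (cf. [BC2], Lemma 7.8). Assuming the lemma for all $i < \mu$, we give the proof of (7) for step $\mu$; the bound in (8) is proved similarly.

Case 1. No fold period expires at $z_\mu$ and $\mu - 1$ is not a return time. In this case $w_\mu^*(\cdot) = DT(\cdot) w_{\mu-1}^*(\cdot)$. Writing $C = DT(z_{\mu-1})$, $C' = DT(\xi_{\mu-1})$,
\[ u = \frac{w_{\mu-1}^*(z_0)}{\|w_{\mu-1}^*(z_0)\|} \qquad \text{and} \qquad u' = \frac{w_{\mu-1}^*(\xi_0)}{\|w_{\mu-1}^*(\xi_0)\|}, \]
we have
\[ \frac{M'_\mu}{M_\mu} = \frac{M'_{\mu-1}}{M_{\mu-1}} \cdot \frac{\|C'u'\|}{\|Cu\|} \le \frac{M'_{\mu-1}}{M_{\mu-1}} \Big( 1 + \frac{\|C'u' - Cu\|}{\|Cu\|} \Big) \le \frac{M'_{\mu-1}}{M_{\mu-1}} \Big( 1 + \frac{\|C' - C\|}{\|Cu\|} + \frac{\|C(u - u')\|}{\|Cu\|} \Big). \]
Since $\|Cu\| > K^{-1}\delta$, $\|C' - C\| < K|\xi_{\mu-1} - z_{\mu-1}|$ and $\|u - u'\| \sim |\theta_{\mu-1} - \theta'_{\mu-1}| < K b^{\frac{1}{2}} \Delta_{\mu-2}$, we have
\[ \frac{M'_\mu}{M_\mu} \le \frac{M'_{\mu-1}}{M_{\mu-1}} \cdot \Big( 1 + K \frac{\Delta_{\mu-1}}{d_C(z_{\mu-1})} \Big). \]

Case 2. $\mu - 1$ is a return time. Then
\[ w_{\mu-1}^*(z_0) = A(z_{\mu-1}) \cdot e(z_{\mu-1}) + B(z_{\mu-1}) \cdot w_0. \]
Let
\[ A_0 = \frac{A(z_{\mu-1})}{\|w_{\mu-1}^*(z_0)\|}; \qquad B_0 = \frac{B(z_{\mu-1})}{\|w_{\mu-1}^*(z_0)\|}. \]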
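Then since $w_\mu^*(z_0) = B(z_{\mu-1}) \cdot DT(z_{\mu-1}) w_0$, we have
\[ \frac{M'_\mu}{M_\mu} = \frac{M'_{\mu-1}}{M_{\mu-1}} \cdot \frac{|B'_0|}{|B_0|} \cdot \frac{\|C'w_0\|}{\|Cw_0\|}. \]
Also with $|B_0| \sim d_C(z_{\mu-1})$ and $|B_0 - B'_0| \le |\theta_{\mu-1} - \theta'_{\mu-1}| + \|e - e'\|$, we get
\[ \Big| \frac{B'_0}{B_0} - 1 \Big| < K \frac{\Delta_{\mu-1}}{d_C(z_{\mu-1})}. \tag{30} \]
For the last ratio,
\[ \frac{\|C'w_0\|}{\|Cw_0\|} \le 1 + K|\xi_{\mu-1} - z_{\mu-1}|. \]
This finishes the computation for the magnitude. We record also the estimate $|A_0 - A'_0| < K\Delta_{\mu-1}$ for use in Case 3.

Case 3. There exists a return time $j$ whose fold period expires at time $\mu$. In this case
\[ w_\mu^*(z_0) = B(z_j) \cdot DT^{\mu-j}(z_j) w_0 + A(z_j) \cdot DT^{\mu-j}(z_j) e(z_j). \]
Let
\[ B_0 = \frac{B(z_j)}{\|w_j^*(z_0)\|}, \qquad A_0 = \frac{A(z_j)}{\|w_j^*(z_0)\|}, \qquad C = DT^{\mu-j}(z_j) w_0, \qquad Y = DT^{\mu-j}(z_j) e(z_j). \]
As before, all the corresponding quantities for $\xi_0$ carry a prime. Then
\[ \frac{M'_\mu}{M_\mu} = \frac{M'_j}{M_j} \cdot \frac{\|B'_0 C' + A'_0 Y'\|}{\|B_0 C + A_0 Y\|} \le \frac{M'_j}{M_j} \cdot \frac{\|C'\|}{\|C\|} \cdot \frac{|B'_0|}{|B_0|} \cdot \left( 1 + \frac{ \Big\| \frac{C'}{\|C'\|} - \frac{C}{\|C\|} \Big\| + \Big\| \frac{A'_0 Y'}{B'_0 \|C'\|} - \frac{A_0 Y}{B_0 \|C\|} \Big\| }{ \Big\| \frac{C}{\|C\|} + \frac{A_0 Y}{B_0 \|C\|} \Big\| } \right). \]
Since $\frac{|A_0|}{|B_0|} \sim \frac{1}{d_C(z_j)}$ and $\frac{\|Y\|}{\|C\|} \le d_C^2(z_j)$, it follows that $\Big\| \frac{A_0 Y}{B_0 \|C\|} \Big\| \ll 1$, giving
\[ \frac{M'_\mu}{M_\mu} \le \frac{M'_j}{M_j} \cdot \frac{\|C'\|}{\|C\|} \cdot \frac{|B'_0|}{|B_0|} \cdot \left( 1 + 2 \Big\| \frac{C'}{\|C'\|} - \frac{C}{\|C\|} \Big\| + 2 \Big\| \frac{A'_0 Y'}{B'_0 \|C'\|} - \frac{A_0 Y}{B_0 \|C\|} \Big\| \right). \]
Since both $\{z_s\}_{s=j}^{\mu}$ and $\{\xi_s\}_{s=j}^{\mu}$ are bound to a critical segment $\{\eta_s\}_{s=0}^{\mu-j}$, $\eta_0 \in \Gamma_{\theta N}$, we have
\[ \frac{\|C'\|}{\|C\|} \le 1 + K \sum_{s=1}^{\mu-j-1} \frac{\hat\Delta_s}{d_C(\eta_s)} \le 1 + K \sum_{s=1}^{\mu-j-1} \frac{\Delta_{s+j}}{d_C(z_{s+j})} = 1 + K \sum_{i=j+1}^{\mu-1} \frac{\Delta_i}{d_C(z_i)}, \]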
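where
\[ \hat\Delta_s = \sum_{j=1}^{s} (Kb)^{\frac{j}{4}} |z_{s-j} - \xi_{s-j}|. \tag{31} \]
The factor $\frac{|B'_0|}{|B_0|}$ is estimated in (30). This term has no cumulative effect because it is a one-time addition to the exponent in the distortion formula for any given return. Next
\[ \Big\| \frac{C'}{\|C'\|} - \frac{C}{\|C\|} \Big\| < \hat\theta, \]
where $\hat\theta$ is the angle between $C$ and $C'$, which is smaller than $\hat\Delta_{\mu-j-1}$. Now
\[ \Big\| \frac{A'_0 Y'}{B'_0 \|C'\|} - \frac{A_0 Y}{B_0 \|C\|} \Big\| \le \frac{|A'_0|}{|B'_0|} \cdot \frac{\|Y' - Y\|}{\|C'\|} + \Big| \frac{A'_0}{B'_0 \|C'\|} - \frac{A_0}{B_0 \|C\|} \Big| \, \|Y\|. \]
For the first term we have
\[ \frac{|A'_0|}{|B'_0|} \sim \frac{1}{d_C(z_j)}, \qquad \|Y' - Y\| \le (Kb)^{\mu-j} |\xi_j - z_j|, \]
and $\|C'\| > 1$, where the estimate on $\|Y - Y'\|$ is from (4) in Corollary 2.2. For the second term,
\[ \Big| \frac{A'_0}{B'_0 \|C'\|} - \frac{A_0}{B_0 \|C\|} \Big| \, \|Y\| \le (Kb)^{\mu-j} \cdot \frac{|A'_0|}{|B'_0|} \cdot \frac{1}{\|C'\|} \cdot \Big| 1 - \frac{A_0}{A'_0} \cdot \frac{B'_0}{B_0} \cdot \frac{\|C'\|}{\|C\|} \Big| \le \frac{(Kb)^{\mu-j}}{d_C(z_j)} \left( \frac{|A_0|}{|A'_0|} \Big| \frac{B'_0}{B_0} - 1 \Big| + \Big| \frac{A_0}{A'_0} - 1 \Big| + \Big| 1 - \frac{\|C'\|}{\|C\|} \Big| \right). \]
We again estimate term by term. For the first term,
\[ \frac{(Kb)^{\mu-j}}{d_C(z_j)} \cdot \frac{|A_0|}{|A'_0|} \Big| \frac{B'_0}{B_0} - 1 \Big| \le \frac{(Kb)^{\mu-j}}{d_C(z_j)} \cdot \frac{K\Delta_j}{d_C(z_j)} < \frac{(Kb)^{\frac{\mu-j}{2}}}{d_C(z_j)} \cdot \Delta_j \]
because $b^{\frac{\mu-j}{2}} < d_C(z_j)$ by the definition of fold period. For the second term,
\[ \frac{(Kb)^{\mu-j}}{d_C(z_j)} \Big| \frac{A_0 - A'_0}{A'_0} \Big| \le \frac{(Kb)^{\mu-j}}{d_C(z_j)} \cdot \Delta_j. \]
Finally, for the third term, we have
\[ \frac{(Kb)^{\mu-j}}{d_C(z_j)} \Big| 1 - \frac{\|C'\|}{\|C\|} \Big| \le \frac{(Kb)^{\mu-j}}{d_C(z_j)} \sum_{s=1}^{\mu-j-1} \frac{\hat\Delta_s}{d_C(z_{s+j})}, \]
where $\hat\Delta_s$ is as in (31). We also have $b^{\frac{\mu-j}{2}} < d_C(z_{s+j})$, for no fold period starting at time $s + j$ extends beyond index $\mu$. Also, $b^{\frac{\mu-j}{2}} \cdot \hat\Delta_s \le \Delta_j$ for all $s$, $0 < s < \mu - j$. Therefore the third term is again bounded by $K \frac{\Delta_j}{d_C(z_j)}$.

Observe further that if we replace $z_0$ by another point $\xi'_0$ which is bound to $z_0$, the same argument above continues to work with $\Delta_i(\xi_0, z_0)$ replaced by $\Delta_i(\xi_0, \xi'_0)$. This completes the proof. □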
B.8. Quadratic behavior (Sect. 4.3). Let $\xi_0(s)$ and $z_0$ be as in Lemma 4.11. We begin with the following

A priori estimate on $\xi_\mu(s) - z_\mu$ (cf. [BC2, pp. 144–147]). Let $t_0(s)$ be a unit tangent vector to $\gamma$ at $\xi_0(s)$, and let $t_\mu = DT^\mu t_0$. We split $t_0$ using $e_\mu$ to get $t_0 = A_0 e_\mu + B_0 \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, so that $t_\mu = A_0 DT^\mu e_\mu + B_0 w_\mu$. Writing
\[ w_\mu = w_\mu(0) + (w_\mu - w_\mu(0)) = w_\mu(0) + (w_\mu^* - w_\mu^*(0)) + (E_\mu - E_\mu(0)), \]
where
\[ E_\mu = \sum_{j \in S_\mu} A_j DT^{\mu-j} e_{J_j} \]
and $S_\mu$ is the collection of $j$ such that the fold period begun at time $j$ extends beyond time $\mu$, we have
\[ \xi_\mu(s) - z_\mu = \int_0^s t_\mu(u)\,du = w_\mu(0) \int_0^s B_0(u)\,du + \mathrm{I} + \mathrm{II} + \mathrm{III}, \tag{32} \]
where
\[ \mathrm{I} = \int_0^s A_0 DT^\mu e_\mu, \qquad \mathrm{II} = \int_0^s B_0 (w_\mu^* - w_\mu^*(0)), \qquad \mathrm{III} = \int_0^s B_0 (E_\mu - E_\mu(0)). \]
Since $A_0 \approx 1$, $|\mathrm{I}| \le (Kb)^\mu s$. We claim that
\[ |\mathrm{II}|, \, |\mathrm{III}| \le K e^{2\alpha\mu} \|w_\mu^*(0)\| \int_0^s u \Big( \sup_{i \le \mu} |z_i - \xi_i(u)| \Big) du. \]
The norm of II is estimated using the distortion estimate in Lemma 4.9. To estimate III, we have, for each $j \in S_\mu$,
\[ \|A_j DT^{\mu-j}(\xi_j) e(\xi_j) - A_j(0) DT^{\mu-j}(z_j) e(z_j)\| \le (Kb)^{\mu-j} |A_j - A_j(0)| + |A_j(0)| \, \|DT^{\mu-j}(\xi_j) e(\xi_j) - DT^{\mu-j}(z_j) e(z_j)\|. \]
From the distortion estimate in Appendix B.7,
\[ |A_j - A_j(0)| < K \|w_j^*(0)\| e^{2\alpha j} \sup_{i \le j} |z_i - \xi_i|. \]
For the second term we have $|A_j(0)| \le \|w_j^*(0)\| e^{\alpha j}$ because $w_j^*(0)$ splits correctly at time $j$, and $\|w_j^*(0)\| \le \frac{1}{\delta} e^{\alpha\mu} \|w_\mu^*(0)\|$ by Lemma 4.7. Finally, $\|DT^{\mu-j}(\xi_j) e(\xi_j) - DT^{\mu-j}(z_j) e(z_j)\| \le (Kb)^{\mu-j} |\xi_j - z_j|$ by Corollary 2.2, and $B_0(s) \approx 2K_1 s$.

Proof of Lemma 4.11. We will show that for the $\mu$ and $s$ in question, the first term in (32) is the dominating one. For those $s$ with $p = p(\xi_0(s)) \le n_0$, the entire action takes place at a distance $> \frac{1}{2}\delta_0$ from the critical set (see the remark after Definition 3.5). This case is straightforward and is left to the reader. We consider here only those $s$ with $p > n_0$. Define
\[ U_\mu := K e^{4\alpha\mu} \sup_{j \le \mu} \|w_j^*(0)\|, \]
where $K$ is the constant in the bound for II and III above. Let $\mu_0$ be large enough that $e^{7\alpha\mu_0} e^{-\beta\mu_0} \ll 1$. By taking $n_0$ sufficiently large, we may assume $U_{\mu_0} s^2 \ll 1$ for all the $s$ in question. We will show inductively first the weaker statement
(i) $|\xi_j(s) - z_j| < U_j s^2$,
and then the stronger statement
(ii) $|\xi_j(s) - z_j| = K_1 (1 \pm \varepsilon_1) \|w_j(0)\| s^2$.
Assume that (i) and (ii) have been proved for all $j < \mu$. To prove them for $j = \mu$, we need the preliminary estimate $U_\mu s^2 \ll 1$. It suffices to consider the case $\mu > \mu_0$. Observe first that for $\mu \le n_0$, one has the trivial estimate
\[ \sup_{j \le \mu} \|w_j^*(0)\| \le K \|w_{\mu-1}(0)\|, \]
and if $\mu > n_0$, then using Lemmas 4.7, 4.6 and the fact that $\frac{1}{\delta} < e^{\alpha\mu}$, one again has
\[ \sup_{j \le \mu} \|w_j^*(0)\| < e^{\alpha\mu} \|w_\mu^*(0)\| \le e^{2\alpha\mu} \|w_\mu(0)\| \le K e^{2\alpha\mu} \|w_{\mu-1}(0)\|. \]
This combined with (ii) for step $\mu - 1$ gives
\[ U_\mu s^2 \le K e^{4\alpha\mu} \big( K e^{2\alpha\mu} \|w_{\mu-1}(0)\| \big) s^2 \approx K^2 e^{6\alpha\mu} K_1^{-1} |\xi_{\mu-1}(s) - z_{\mu-1}| < K^2 e^{6\alpha\mu} K_1^{-1} e^{-\beta(\mu-1)} \ll 1. \]
Noting that $\int_0^s B_0 \approx K_1 s^2$, we see from our a priori estimate that
\[ |\xi_\mu(s) - z_\mu| \le \|w_\mu(0)\| K_1 s^2 + (Kb)^\mu s + 2U_\mu \int_0^s K u \Big[ \sup_{i < \mu} |z_i - \xi_i(u)| \Big] du. \]
With the quantity inside square brackets being $< U_\mu u^2$ by (i) from the previous step, this is
\[ < (U_\mu s^2) e^{-\alpha\mu} + (Kb)^{\frac{\mu}{2}} s^2 + K (U_\mu s^2)^2 < U_\mu s^2. \]
The proof of (ii) for step $\mu$ now follows immediately. □
B.9. Proof of Lemma 6.2 (Sect. 6.2). We begin with a scenario for which one sees easily that the assertion in this lemma holds: Suppose for $1 \le j \le i - s$, $\|DT^j(z_s)\| > \kappa^j$ for some $\kappa \gg b^{\frac{1}{2}}$, and that $z_s$ is bounded away from $C^{(0)}$. Then $e_{i-s}(z_s)$ is well defined and has slope $> K^{-1}$. Suppose, in addition, that $z_s$ is out of all fold periods, so that $w_s$ is a $b$-horizontal vector. Then
\[ \|DT^{i-s}(z_s)\| \, \|w_s\| \le K \|DT^{i-s}(z_s) w_s\| = K \|w_i\|. \]
This together with $\|w_s\| > e^{c's}$ (which follows from $\|w_s^*\| > e^{cs}$) gives the desired estimate. Now, intuitively, the behavior of $DT^j(z_s)$ is a little different just before or after a return to $C^{(0)}$. This motivates the following definition: If $t$ is a return time to $C^{(0)}$ for $z_0$, let $J_t$ denote its fold period and let
\[ I_t := (t - 5J_t, \, t + J_t). \]
Claim B.1. By modifying $I_t$ slightly to $\tilde I_t = (t - (5 \pm \varepsilon)J_t, \, t + (1 \pm \varepsilon)J_t)$, we may assume they have a nested structure.

Proof of Claim B.1. We consider $t = 0, 1, 2, \cdots$ in this order, and determine, if $t$ is a return time, what $\tilde I_t$ will be. The right end point of $\tilde I_t$ is determined by the following algorithm: Go to $t + J_t$, and look for the largest $t'$ inside the bound period initiated at time $t$ with the property that $t' - 5J_{t'} < t + J_t$. If no such $t'$ exists, then $t + J_t$ is the right end point of $\tilde I_t$. If $t'$ exists, then the new candidate end point is $t' + J_{t'}$, and the search continues. For the same reasons as in Sect. 4.1, the increments in length are exponentially small and the process terminates. As for the left end point of $\tilde I_t$, it is possible that $t - 5J_t \in \tilde I_{t'}$ for some $t'$ the bound period initiated at which time does not extend to time $t$. This means that $J_{t'} \ll J_t$, and since we assume a nested structure has been arranged for $\tilde I_{t'}$ for all $t' < t$, we simply extend the left end of $\tilde I_t$ to include the largest $\tilde I_{t'}$ that it meets. □

Let us assume this nested structure and write $I_t$ instead of $\tilde I_t$ from here on.

Claim B.2. For $s \notin \cup I_t$, we have, for all $j$ with $1 \le j < i - s$,
\[ \|w_{s+j}\| \ge b^{\frac{j}{9}} \|w_s\|. \]

Proof of Claim B.2. We fix $j$ and let $r$ be such that $z_r$ makes the deepest return between times $s$ and $s + j$. Let $j'$ be the smallest integer $\ge j$ such that $z_{s+j'}$ is outside of all fold periods. Then from Sect. 4.2, it follows that
\[ \|w_{s+j}\| \ge K^{-(j'-j)} \|w_{s+j'}\| \ge K^{-K(j'-j)} d_C(z_r) \|w_s\| \approx K^{-K(j'-j)} b^{\frac{J_r}{2}} \|w_s\|. \tag{33} \]
Case 1. $s + j \notin I_r$. In this case, $6J_r < j$ since $I_r$ is sandwiched between $s$ and $s + j$, and $j' - j \le J_r$ because $r$ is the deepest return. The rightmost quantity in (33) is therefore $> K^{-J_r} b^{\frac{J_r}{2}} \|w_s\| > b^{\frac{j}{9}} \|w_s\|$.
Case 2. $s + j \in I_r$. The argument is as above, except we only have $5J_r < j$. This completes the proof of the claim. □

As noted in the first paragraph, Claim B.2 implies the assertion in Lemma 6.2 for $s \notin \cup I_t$ provided $z_s$ is bounded away from $C^{(0)}$. For $s \le n_0$, this is always the case. For $s > n_0$, the slope of the contractive vector is only guaranteed to be $> K^{-1}\delta$, which in principle introduces a copy of $\frac{1}{\delta}$ to the right side of the inequality in Lemma 6.2. This factor, however, can be absorbed into the exponential by taking $n_0$ sufficiently large.

It remains to prove the lemma for $s \in \cup I_t$. Let $I_r$ be the maximal $I_t$-interval containing $s$. Observe that $6J_r < K\alpha\theta s$ (recall that $z_0$ obeys (IA2)) and $\|w_i\| > e^{c'i}$ for some $c' > 0$. If $i \in I_r$, then
\[ \|DT^{i-s}(z_s)\| < K^{6J_r} \ll e^{\frac{1}{2}c'i} < e^{-\frac{1}{2}c's} e^{c'i} < e^{-\frac{1}{2}c's} \|w_i\|. \]
If $i \notin I_r$, let $s' = r + J_r$. Then $s' \notin \cup I_t$, and
\[ \|DT^{i-s}(z_s)\| \le \|DT^{s'-s}(z_s)\| \cdot \|DT^{i-s'}(z_{s'})\| \le K^{6J_r} \cdot K e^{-c''s'} \|w_i\|. \]
B.10. Initial data for critical curves (Sect. 6.3).

Proof of Lemma 6.4. Let $J_i := [\hat a - \rho^{2i}, \hat a + \rho^{2i}]$. Assume for all $i < n$ that the following has been proved:
(i) $J_i \subset \tilde\Delta_i$;
(ii) $\Gamma_{i,i}(\hat a)$ has a smooth continuation on $J_i$ and $C^{(i)}$ deforms continuously;
(iii) for all $z \in \Gamma_{i,i}$, $\big| \frac{dz}{da} \big| \le K^i$.
We now prove (i)–(iii) for $i = n$. First we verify that for all $a \in J_n$ and $z_0 \in \Gamma_{n-1,n-1}$, (IA2) and (IA4) hold up to time $n$. This is true for $a = \hat a$. For $a \in J_n$, $|z_0(a) - z_0(\hat a)| < \rho^{2n} K^{n-1}$, so that $|z_j(a) - z_j(\hat a)| < \rho^{2n} K^{2n}$ for all $j \le n$. We may assume that $\rho K$ is $\ll 1$. It then follows from the discussion at the beginning of Sect. 6.3.1 that $\Gamma_{n,n}(a)$ is well defined, proving (i).

To prove (ii), we fix an arbitrary $\tilde a \in J_n$, a component $Q^{(n-1)}$ of $C^{(n-1)}$, and show that every segment of $\partial R_n(\tilde a) \cap Q^{(n-1)}(\tilde a)$ has a continuation to a segment of $\partial R_n(a) \cap Q^{(n-1)}(a)$. Let $\tilde\omega$ be a segment of this kind, and let $\omega(a) := T_a^n(2T_{\tilde a}^{-n}\tilde\omega)$, where $2T_{\tilde a}^{-n}\tilde\omega$ refers to the segment in $\partial R_0$ with the same midpoint as $T_{\tilde a}^{-n}\tilde\omega$ and two times as long. Observe that as we vary our parameter from $\tilde a$ to $a$, the segment $\omega(a)$ cannot intersect the horizontal boundaries of $Q^{(n-1)}(a)$. Thus the only way $\omega(a)$ can fail to traverse fully $Q^{(n-1)}(a)$ is that it has moved sufficiently far from $\omega(\tilde a)$ in the horizontal direction. We know this cannot happen because $|T_{\tilde a}^n - T_a^n| \le \rho^{2n} K^n$ which is $\ll \rho^n$. This proves (ii).

It remains to prove (iii). Consider $\bar z(a) = (\bar x(a), \bar y(a)) \in \Gamma_{n,n}(a)$, and let $y = \psi(x, a)$ denote the $C^2(b)$-curve in $\partial R_n(a)$ containing $\bar z(a)$. Then
\[ q_n(\bar x(a), \psi(\bar x(a), a), a) = \partial_x \psi(\bar x(a), a), \]
where $q_n(x, y, a)$ is the slope of the contractive vector of order $n$ at $z = (x, y)$. Taking derivative with respect to $a$ on both sides of the last equation, we have
\[ \partial_x q_n \cdot \frac{d\bar x}{da} + \partial_y q_n \cdot \Big( \partial_x \psi \cdot \frac{d\bar x}{da} + \partial_a \psi \Big) + \partial_a q_n = \partial_{xx} \psi \cdot \frac{d\bar x}{da} + \partial_{ax} \psi. \]
This implies
\[ \frac{d\bar x}{da} = \frac{\partial_{xa} \psi - \partial_y q_n \cdot \partial_a \psi - \partial_a q_n}{\partial_x q_n + \partial_y q_n \cdot \partial_x \psi - \partial_{xx} \psi}. \tag{34} \]
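As a sanity check on the implicit differentiation leading to (34), the following sketch compares the right side of (34) with a finite-difference derivative of the critical abscissa in a toy example. The slope field `q` and the curve `psi` below are hypothetical stand-ins chosen only so that the defining equation $q(\bar x, \psi(\bar x, a), a) = \partial_x \psi(\bar x, a)$ has a unique root; they are not the $q_n$ and $\psi$ of the paper, and the small constant `B` merely mimics the $O(b)$ size of the derivatives of $\psi$.

```python
import math

B = 0.05  # hypothetical small constant playing the role of b


def q(x, y, a):
    """Toy slope field: dq/dx = 2, dq/dy = 0.3, dq/da = 0.5."""
    return 2.0 * x + 0.3 * y + 0.5 * a


def psi(x, a):
    """Toy horizontal curve y = psi(x, a) with O(B) derivatives."""
    return B * (math.sin(x) + a * x)


def psi_x(x, a):
    return B * (math.cos(x) + a)


def critical_x(a, lo=-5.0, hi=5.0, iters=200):
    """Solve q(x, psi(x, a), a) = psi_x(x, a) by bisection (F is increasing in x)."""
    def F(x):
        return q(x, psi(x, a), a) - psi_x(x, a)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def dx_da_formula(a):
    """Right side of (34) evaluated for the toy q and psi."""
    x = critical_x(a)
    qx, qy, qa = 2.0, 0.3, 0.5
    psi_a = B * x                 # d psi / da
    psi_xa = B                    # d^2 psi / dx da
    psi_xx = -B * math.sin(x)     # d^2 psi / dx^2
    numerator = psi_xa - qy * psi_a - qa
    denominator = qx + qy * psi_x(x, a) - psi_xx
    return numerator / denominator


def dx_da_numeric(a, h=1e-5):
    """Central finite difference of the critical abscissa in the parameter a."""
    return (critical_x(a + h) - critical_x(a - h)) / (2.0 * h)


if __name__ == "__main__":
    a = 0.3
    print(dx_da_formula(a), dx_da_numeric(a))
```

In the toy example the denominator is dominated by $\partial_x q$, mirroring the remark that the denominator in (34) is bounded away from zero when $\partial_x \psi$ and $\partial_{xx} \psi$ are $O(b)$.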
Since $\partial_x \psi, \partial_{xx} \psi = O(b)$, $|\partial_x q_n| > K_1$ and $|\partial_y q_n| < K$ (see Corollary 2.2 and Lemma 2.9), the denominator on the right-hand side is bounded away from zero. In the numerator, we have $|\partial_y q_n|, |\partial_a q_n| < K$, and we need to estimate $\partial_a \psi(x, a)$ and $\partial_{ax} \psi(x, a)$. For this purpose we write the horizontal curve $y = \psi(x, a)$ in parametric form $x = X(t, a)$, $y = Y(t, a)$, where $t$ is the $x$-coordinate of $T_a^{-n}(x, y)$, i.e., $(t, \pm b) \in \partial R_0$ and $(X(t, a), Y(t, a)) = T_a^n(t, \pm b)$. Let $t = t(x, a)$ be defined by $\psi(x, a) = Y(t(x, a), a)$. Then
\[ \partial_a \psi = \partial_t Y(t, a) \cdot \partial_a t(x, a) + \partial_a Y(t, a). \]
Clearly, $|\partial_t Y(t, a)| < K^n b$ and $|\partial_a Y(t, a)| < K^n$. One way to bound $\partial_a t(x, a)$ is to write it as
\[ \partial_a t(x, a) = -\frac{\partial_a X(t, a)}{\partial_t X(t, a)}. \]
Since $|\partial_t X(t, a)| > 1$ (recall that $T_a^n|_{\partial R_0}$ is controlled), this term is also $< K^n$. Similar considerations yield $|\partial_{ax} \psi(x, a)| < K^n$. We have proved $\big| \frac{d\bar x}{da} \big| < K^n$. The corresponding estimate for $\big| \frac{d\bar y}{da} \big|$ follows immediately since
\[ \frac{d\bar y}{da} = \partial_x \psi \cdot \frac{d\bar x}{da} + \partial_a \psi. \]
We record an estimate needed in the proof of Lemmas 6.5 and 6.6. Taking derivatives with respect to $a$ one more time on both sides of (34) and estimating corresponding terms (using again Corollary 2.2 and Lemma 2.9), we obtain $\big| \frac{d^2 \bar x}{da^2} \big| < K^n$. This estimate requires that $T_{a,b}$ be $C^3$. □

Recall the following lemma due to Hadamard:

Lemma B.1 (Hadamard). Let $g \in C^2(0, L)$ be such that $|g| \le M_0$ and $|g''| < M_2$. If $4M_0 < L^2$, then $|g'| \le \sqrt{M_0}\,(1 + M_2)$.

Proof of Lemma 6.5. Let $z^{(n)} = (x^{(n)}, y^{(n)})$. For our purposes, let $g(a) = x^{(n)}(a) - x^{(n-1)}(a)$ and $L = 2\rho^{2n}$. Then $M_0 = b^{\frac{n}{4}}$ and $M_2 = K^n$. Thus $\big| \frac{dg}{da} \big| < b^{\frac{n}{8}} K^n < b^{\frac{n}{9}}$. A similar estimate holds for $y^{(n)}$. □

Proof of Lemma 6.6. Let $z_n(a) = (x^n(a), y^n(a))$, $z_m(a) = (x^m(a), y^m(a))$, and let $y = \psi(x, a)$ be the $C^2(b)$-curve segment in $\partial R_n$ containing both $z_n(a)$ and $z_m(a)$. Arguments similar to those used to prove $|\partial_{xa} \psi| < K^n$ can also be used to prove that the $C^3$-norm of $\psi$ is $< K^n$. Let $P_n$ and $Q_n$ be the numerator and denominator on the right hand side of (34), and similarly for $P_m$ and $Q_m$. Then
\[ \frac{dx^n}{da} - \frac{dx^m}{da} = \frac{P_n Q_m - P_m Q_n}{Q_n Q_m} = \frac{(P_n - P_m) Q_n + (Q_m - Q_n) P_n}{Q_n Q_m}. \]
As observed in the proof of Lemma 6.4, $|Q_m|, |Q_n| > K^{-1}$ and $|Q_m|, |P_n| < K^n$. It remains therefore to estimate $|Q_m - Q_n|$ and $|P_m - P_n|$. Let $q_n$ and $q_m$ denote the slopes of $e_n$ and $e_m$ respectively. Fixing $a$ and omitting it in the arguments of the functions below, we have
\[ |Q_m - Q_n| \le |\partial_x q_m(z_m) - \partial_x q_n(z_n)| + |\partial_y q_m(z_m) \cdot \partial_x \psi(z_m) - \partial_y q_n(z_n) \cdot \partial_x \psi(z_n)| + |\partial_{xx} \psi(z_m) - \partial_{xx} \psi(z_n)|. \]
The second difference, for example, is
\[ \le |\partial_y q_n(z_n)| \, |\partial_x \psi(z_m) - \partial_x \psi(z_n)| + |\partial_x \psi(z_m)| \, |\partial_y q_m(z_m) - \partial_y q_n(z_m)| + |\partial_x \psi(z_m)| \, |\partial_y q_n(z_m) - \partial_y q_n(z_n)|. \]
This is $< (Kb)^{\frac{n}{4}}$ since $\|\psi\|_{C^3} < K^n$, $|\partial_{xx} q|, |\partial_x \partial_y q| < K$ (Corollary 2.2), $|q_m(z_n) - q_n(z_n)| < (Kb)^n$ (Lemma 2.1) and $|z_m - z_n| < (Kb)^{\frac{n}{4}}$ (Lemma 2.10). The other terms in $|Q_m - Q_n|$ and $|P_m - P_n|$ are estimated similarly. □
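The interpolation inequality of Lemma B.1 (Hadamard), which drives the proof of Lemma 6.5, is easy to probe numerically. The sketch below assumes the bound in the form $|g'| \le \sqrt{M_0}(1 + M_2)$, which is the form consistent with the passage from $M_0 = b^{n/4}$ to $b^{n/8}$ in that proof, and tests it on the hypothetical example $g(x) = M_0 \sin(\omega x)$. The function name and all numerical values are illustrative assumptions.

```python
import math


def check_hadamard(g_prime, M0, M2, L, samples=10001):
    """Return (sup |g'| on a midpoint grid of (0, L), Hadamard bound).

    Assumes |g| <= M0 and |g''| < M2 on (0, L), and requires 4 * M0 < L**2.
    """
    if not 4.0 * M0 < L * L:
        raise ValueError("Lemma B.1 requires 4*M0 < L^2")
    bound = math.sqrt(M0) * (1.0 + M2)
    worst = max(abs(g_prime(L * (i + 0.5) / samples)) for i in range(samples))
    return worst, bound


# Hypothetical test function g(x) = M0 * sin(OMEGA * x) on (0, 1).
M0 = 1e-4
OMEGA = 30.0
M2 = 1.001 * M0 * OMEGA ** 2   # strict upper bound for |g''|
L = 1.0


def g_prime(x):
    return M0 * OMEGA * math.cos(OMEGA * x)


if __name__ == "__main__":
    worst, bound = check_hadamard(g_prime, M0, M2, L)
    print(f"sup|g'| ~ {worst:.6f}, bound = {bound:.6f}")
```

For this family the inequality reduces to $t \le 1 + t^2$ with $t = \sqrt{M_0}\,\omega$, so the bound holds for every frequency. The gain for Lemma 6.5 is that a $C^0$-small difference with controlled second derivative is also $C^1$-small, at the price of a square root.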
B.11. Dynamics of critical curves (Sect. 6.4).

Proof of Lemma 6.8. Let $\hat z_0$ be an arbitrary critical point. First we observe that as functions of $a$, $z_i(a)$ and $\hat z_0(a)$ move at very different speeds: $\big| \frac{d}{da} z_i(a) \big| \sim \|w_i(a)\| > e^{ci}$ by Proposition 6.1, whereas from Sect. 6.3 we have $\big| \frac{d}{da} \hat z_0(a) \big| < K$.

Next we consider $z_i(a) \in Q^{(k-1)}(a) \setminus Q^{(k)}(a)$ for some $k \ll i$, so that $\phi_a(z_i(a)) \in \partial Q^{(k-1)}(a)$, and study the relative movements of $z_i$, $\phi(z_i)$ and the relevant critical regions as $a$ varies. For definiteness, let us assume that $z_i$ is in the right component of $(Q^{(k-1)} \cap R_k) \setminus Q^{(k)}$ (which we call $A$), and that it moves left as $a$ increases. (See Fig. 1 in Sect. 1.2.) In horizontal distance, it follows from the first paragraph that relative to $\phi(z_i)$, $z_i$ is moving left at a speed $> K^{-1} e^{ci} - K$, which we assume to be $\gg 1$. We do not have analytic estimates on the relative vertical movements of $\phi(z_i)$ and $z_i$, but note that since $z_i \in \partial R_k$, it must enter $A$ through its right vertical boundary and exit through the left. As $z_i$ meets these vertical boundaries, it crosses them instantaneously due again to the horizontal speed differential between $z_i$ and the critical points which determine these regions.

What we have shown is that the function $a \mapsto \phi_a(z_i(a))$ is continuous except at a discrete set of points corresponding to when $z_i(a)$ crosses a vertical boundary of some $Q^{(k)}$. If $a$ and $a'$ are the entry and exit parameters for $Q^{(k-1)} \setminus Q^{(k)}$ as above, we have $|a - a'| < \rho^{k-1} (K^{-1} e^{ci} - K)^{-1}$, and consequently $|\phi_a(z_i) - \phi_{a'}(z_i)| < K \rho^{k-1} e^{-ci}$. As $z_i$ crosses the vertical boundary into $Q^{(k)}$, a jump in $\phi_a(\cdot)$ occurs due to our rule for selecting binding points; this jump is $< b^{\frac{k-1}{4}}$. As we continue to move toward the critical set, either $\gamma_i$ ends or we enter the "last" $Q^{(k)}$ available at step $i$, with $k \sim \theta i$. Let $\bar a$ be the parameter that corresponds to the end point of $\gamma_i$ or where $d_{C(\bar a)}(z_i(\bar a)) = e^{-\frac{\alpha i}{2}}$, whichever is reached first, and let $\bar z = \phi_{\bar a}(z_i(\bar a))$.

We will use $\bar z$ as our "binding point" for $\gamma_i$. The "error" in this choice for $z_i(a)$, i.e. $\big| \, |z_i(a) - \bar z|_h - d_{C(a)}(z_i(a)) \, \big|$, is less than the total variation of $a \mapsto \phi_a(z_i(a))$ between $a$ and $\bar a$. We have proved that this is $< K e^{-ci} d_{C(a)}(z_i(a)) + K b^{\frac{k-1}{4}}$, where $\rho^k \sim d_{C(a)}(z_i(a))$. □

Proof of Lemma 6.9. Let $\tilde p = \min\{ p_a(z_i(a)) : z_i(a) \in I_{\mu j} \}$. Then by Corollary 4.2(a) and the last lemma, $\tilde p < K|\mu|$. Assertion (a) in Lemma 6.9 is obvious for $j \le J$, where $J$ is the common fold period. For $J < j \le \tilde p$, we have:
\[ |z_{i+j}(a) - z_{i+j}(a')| \le \mathrm{length}(\omega_j) = \int_{\omega_0} \frac{\|\tau_{i+j}\|}{\|\tau_i\|} \sim \int_{\omega_0} \frac{\|w_{i+j}\|}{\|w_i\|}. \]
Since $z_i$ is outside of fold periods and $w_i$ splits correctly, we have, for $j > J$,
\[ \frac{\|w_{i+j}\|}{\|w_i\|} \sim e^{-\mu} \|w_j(z_i)\|. \]
Furthermore, if $\hat z_0 = \phi(z_i)$ and $\tilde p = p(z_i(\tilde a))$, then
\[ e^{-\mu} \|w_j(z_i)\| \sim e^{-\mu} \|w_j(\hat z_0)\| \sim e^{-\mu} \|w_j(\hat z_0(\tilde a))\|, \]
the first $\sim$ coming from Lemma 4.9, and the second from the fact that $|\phi_a(z_i(a)) - \phi_{\tilde a}(z_i(\tilde a))| < e^{-ci} \ll K^{-j}$ for $j < K|\mu| < K\alpha i$. Thus using the distance formula (9) in Lemma 4.11 for $T_{\tilde a}$, we have
\[ \int_{\omega_0} \frac{\|\tau_{i+j}\|}{\|\tau_i\|} \sim e^{-2\mu} \frac{1}{\mu^2} \|w_j(\hat z_0(\tilde a))\| \sim \frac{1}{\mu^2} |z_{i+j}(\tilde a) - \hat z_j(\tilde a)| < \frac{1}{\mu^2} e^{-\beta j}. \]
This completes the proof of (a); (c) follows from $\|w_{\tilde p}(z_i(a))\| \sim \|w_{\tilde p}(z_i(\tilde a))\|$ and Proposition 6.1. It remains to prove (b). From (a) we have that $z_{i+\tilde p}$ is out of all fold periods whenever $z_{i+\tilde p}(\tilde a)$ is. To show that the slopes of $\tau_{i+\tilde p}$ are $< K(\delta)$, we use Lemma 6.3: the $w_{i+\tilde p}$ vectors are $b$-horizontal, so it suffices to show that $\|w_s\| \le K \frac{1}{\delta} \|w_{i+\tilde p}\|$ for all $s < i + \tilde p$. For $s \ge i$, this is true by comparison with $a = \tilde a$; for $s < i$, $\|w_s\| \le \|w_i\|$ because $z_i$ is a free return. Finally, the small slope of $\omega_{\tilde p}$ allows us to reverse the inequalities displayed above to conclude that $|\omega_{\tilde p}| \ge \frac{1}{\mu^2} e^{-\beta \tilde p}$. □

B.12. Distortion estimate for critical curves (Sect. 6.4). Let $J$ be a parameter interval satisfying all the assumptions made in Proposition 6.2, and let $a, a' \in J$. Assume that $z_i(a)$ and $z_i(a')$ are free returns, and that they lie in the same $I_{\mu j}$ with $\mu < \alpha i$. Write $\xi_0(a) = z_i(a)$ and $w_k(\xi_0(a)) = DT_a^k(\xi_0(a)) \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Let $\tilde p = p(z_i(\tilde a))$ be the bound period and $\hat z_0(\tilde a) = \phi(z_i(\tilde a))$ the binding point in the proof of Lemma 6.9. For $k < \tilde p$, let $\{w_k^*(\xi_0(a))\}$ be given by the splitting algorithm taken with respect to the orbit segment $\{\hat z_k(\tilde a)\}_{k=0}^{\tilde p}$, and write $w_k^*(\xi_0(a)) = M_k e^{i\theta_k}(\xi_0(a))$. The corresponding quantities for $\xi_0(a') = z_i(a')$ are defined analogously.

Sublemma B.6. For $k < \tilde p$,
\[ \frac{M_k(\xi_0(a'))}{M_k(\xi_0(a))}, \ \frac{M_k(\xi_0(a))}{M_k(\xi_0(a'))} \le \exp\Big\{ K \sum_{j=1}^{k-1} \frac{\Delta_j(a, a')}{d_C(\hat z_j(a))} \Big\} \]
and
\[ |\theta_k(\xi_0(a)) - \theta_k(\xi_0(a'))| < (Kb)^{\frac{1}{2}} \Delta_{k-1}(a, a'), \]
where
\[ \Delta_j(a, a') = \sum_{s=0}^{j} (Kb)^{\frac{s}{4}} \big( |\xi_{j-s}(a) - \xi_{j-s}(a')| + |a - a'| \big). \]

Proof. The computation is similar to that in Appendix B.7, modulo the following adaptations to accommodate for the fact that different parameter values are involved in the present situation:
(i) Replace $\|DT(\xi) - DT(\xi')\| < K|\xi - \xi'|$ by $\|DT_a(\xi(a)) - DT_{a'}(\xi(a'))\| < K(|\xi(a) - \xi(a')| + |a - a'|)$.
(ii) Replace $|e - e'| < K|\xi - \xi'|$ by $|e(a) - e(a')| < K(|\xi(a) - \xi(a')| + |a - a'|)$.
(iii) Replace $|Y - Y'| < (Kb)^{\mu-j} |\xi - \xi'|$ by $|Y(a) - Y(a')| < (Kb)^{\mu-j} (|\xi(a) - \xi(a')| + |a - a'|)$. □

Next we prove a version of Sublemma B.6 with $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ replaced by $u_i(a) := \frac{w_i(z_0)(a)}{\|w_i(z_0)(a)\|}$.
Sublemma B.7.
\[ \frac{\|DT_a^{\tilde p}(\xi_0(a)) u_i(a)\|}{\|DT_{a'}^{\tilde p}(\xi_0(a')) u_i(a')\|} < \exp\Big\{ K \frac{|\xi_0(a) - \xi_0(a')|}{e^{-\mu}} \Big\}. \]

Proof. The proof uses the fact that both $u_i(a)$ and $u_i(a')$ split correctly. Writing $u_i(a) = A(a) e(a) + B(a) \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, we have
\[ DT_a^{\tilde p}(\xi_0(a)) u_i(a) = A(a) DT_a^{\tilde p}(\xi_0(a)) e(a) + B(a) w_{\tilde p}(\xi_0(a)). \]
The proof is similar to that of Case 3 of Lemma 4.9, and Sublemma B.6 is used to compare $w_p(\xi_0(a))$ and $w_p(\xi_0(a'))$. □

Proof of Proposition 6.2. In view of Proposition 6.1, it suffices to show that there exists a constant $K > 0$ such that
\[ \frac{1}{K} < \frac{\|w_n(z_0(a))\|}{\|w_n(z_0(a'))\|} < K. \]
Divide the time interval $(1, n)$ into bound and free periods according to Lemma 6.9. As usual we denote free return times as $t_k$, $1 \le k < q$, and the bound period at $t_k$ as $p_{t_k}$. Write
\[ \log \frac{\|w_n(z_0(a))\|}{\|w_n(z_0(a'))\|} = \sum_k S'_k + \sum_k S''_k, \]
where
\[ S'_k = \log \frac{\|DT_a^{p_k}(z_{t_k}(a)) u_{t_k}(a)\|}{\|DT_{a'}^{p_k}(z_{t_k}(a')) u_{t_k}(a')\|}, \qquad S''_k = \log \frac{\|DT_a^{t_{k+1}-t_k-p_k}(z_{t_k+p_k}(a)) u_{t_k+p_k}(a)\|}{\|DT_{a'}^{t_{k+1}-t_k-p_k}(z_{t_k+p_k}(a')) u_{t_k+p_k}(a')\|}. \]
First we prove that
\[ S''_k < \frac{K}{\delta} \sum_{j=t_k+p_k}^{t_{k+1}} \big( |z_j(a) - z_j(a')| + |a - a'| \big). \]
The effect of $|a - a'|$ can be ignored since $|a - a'| < e^{-cn}$. By Lemma 6.7, the slopes of $\gamma_j$ are uniformly bounded and the length of $\gamma_j$ grows exponentially, so
\[ \sum_{j=t_k+p_k}^{t_{k+1}} |z_j(a) - z_j(a')| < K |\gamma_{t_{k+1}}|. \]
Again by Lemma 6.9(b), |γtk+1 | > K|γtk |. Therefore k
k=1
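where $\gamma_{t_k} \in I_{\mu_k j_k}$. To estimate this sum, let $m(\mu) = \max\{ t_k : \mu_k = \mu \}$ for each $\mu$. Using the fact that $|\gamma_{t_{k+1}}| \ge K |\gamma_{t_k}|$, we conclude that
\[ \sum_k S'_k < K \sum_k \frac{|\gamma_{t_k}|}{e^{-\mu_k}} < K \sum_\mu \frac{|\gamma_{m(\mu)}|}{e^{-\mu}} < K \sum_\mu \frac{1}{\mu^2}. \]
This completes the proof. □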
References

[Bal] Baladi, V.: Dynamical zeta functions. In: Real and Complex Dynamical Systems, B. Branner and P. Hjorth, eds., Amsterdam: Kluwer Academic Publisher, 1995
[Bar] Barge, M.: Prime end rotation numbers associated with the Hénon maps. In: Continuum Theory and Dynamical Systems, T. West, ed., Lecture Notes in Pure and Applied Math. 149, New York: Dekker, 1993, pp. 15–34
[BC1] Benedicks, M. and Carleson, L.: On iterations of $1 - ax^2$ on $(-1, 1)$. Ann. Math. 122, 1–25 (1985)
[BC2] Benedicks, M. and Carleson, L.: The dynamics of the Hénon map. Ann. Math. 133, 73–169 (1991)
[BV] Benedicks, M. and Viana, M.: Solutions of the basin problem for certain non-uniformly hyperbolic attractors. Preprint
[BY1] Benedicks, M. and Young, L.-S.: Sinai–Bowen–Ruelle measure for certain Hénon maps. Invent. Math. 112, 541–576 (1993)
[BY2] Benedicks, M. and Young, L.-S.: Markov extensions and decay of correlations for certain Hénon maps. Asterisque (1999)
[Bi] Birkhoff, G.D.: Dynamical Systems. Providence, RI: Am. Math. Soc., 1927
[Bo] Bowen, R.: Equilibrium states and the ergodic theory of Anosov diffeomorphisms. Lecture Notes in Math., Vol. 470, Berlin–Heidelberg–New York: Springer-Verlag, 1975
[BSC1] Bunimovich, L.A., Sinai, Ya.G. and Chernov, N.I.: Markov partitions for two-dimensional hyperbolic billiards. Russ. Math. Survey 45, 105–152 (1990)
[BSC2] Bunimovich, L.A., Sinai, Ya.G. and Chernov, N.I.: Statistical properties of two-dimensional hyperbolic billiards. Russ. Math. Survey 46, 47–106 (1991)
[CE] Collet, P. and Eckmann, J.P.: Positive Liapunov exponents and absolute continuity for maps of the interval. Ergodic Theory and Dynamical Systems 3, 13–46 (1983)
[CL] Collet, P. and Levy, Y.: Ergodic properties of the Lozi mappings. Commun. Math. Phys. 93, 461–481 (1984)
[C] Cvitanović, P.: Periodic orbits as the skeleton of classical and quantum chaos. Physica D 51, 138–151 (1991)
[DRV] Diaz, L.J., Rocha, J. and Viana, M.: Strange attractors in saddle-node cycles: prevalence and globality. Invent. Math. 125, 37–74 (1996)
[G] Guckenheimer, J.: A strange strange attractor. In: Hopf Bifurcation and its Application, J.E. Marsden and M. McCracken, eds., Berlin–Heidelberg–New York: Springer-Verlag, pp. 368–381
[GH] Guckenheimer, J. and Holmes, P.: Nonlinear oscillations, dynamical systems and bifurcations of vector fields. Appl. Math. Sciences 42, Berlin–Heidelberg–New York: Springer-Verlag, 1983
[H] Hénon, M.: A two-dimensional mapping with a strange attractor. Commun. Math. Phys. 50, 69–77 (1976)
[HY] Hu, H. and Young, L.-S.: Nonexistence of SRB measures for some systems that are "almost Anosov". Ergodic Theory and Dynamical Systems 15, 67–76 (1995)
[I1] Ishii, Y.: Towards a kneading theory for Lozi mappings I: A solution of the pruning front conjecture and the first tangency problem. Nonlinearity 10, 731–747 (1997)
[I2] Ishii, Y.: Towards a kneading theory for Lozi mappings II: Monotonicity of the topological entropy and Hausdorff dimension of attractors. Commun. Math. Phys. 190, 375–394 (1997)
[J] Jakobson, M.: Absolutely continuous invariant measures for one-parameter families of one-dimensional maps. Commun. Math. Phys. 81, 39–88 (1981)
[Led] Ledrappier, F.: Propriétés ergodiques des mesures de Sinai. Publ. Math. Inst. Hautes Etud. Sci. 59, 163–188 (1984)
[Lev] Levi, M.: Qualitative analysis of periodically forced relaxation oscillations. Mem. AMS, 214, 1–147 (1981)
[K] Katok, A. and Hasselblatt, B.: Introduction to the modern theory of dynamical systems. Cambridge: Cambridge University Press, 1995
[Ma] Mañé, R.: Ergodic theory and differentiable dynamics. Berlin–Heidelberg–New York: Springer-Verlag, 1987
[dMvS] de Melo, W. and van Strien, S.: One-dimensional Dynamics. Berlin–Heidelberg–New York: Springer-Verlag, 1993
[MS] Misiurewicz, M. and Szlenk, W.: Entropy of piecewise monotone mappings. Studia Math. 67, 45–67 (1980)
[M1] Misiurewicz, M.: Absolutely continuous invariant measures for certain maps of an interval. Publ. Math. IHES. 53, 17–51 (1981)
[M2] Misiurewicz, M.: The Lozi mapping has a strange attractor. In: Nonlinear Dynamics, R.H.G. Helleman, ed., New York: New York Academy of Sciences, 1980, pp. 348–358
[Mil] Milnor, J.: On the concept of attractor. Commun. Math. Phys. 99, 177–195 (1985)
[MT] Milnor, J. and Thurston, W.: On iterated maps of the interval I and II. Preprint 1977; Published in Dynamical Systems: Proc. Univ. of Maryland 1986–87, Lect. Notes in Math., Vol. 1342, Berlin–New York: Springer, 1988, pp. 465–563
[MV] Mora, L. and Viana, M.: Abundance of strange attractors. Acta. Math. 171, 1–71 (1993)
[NS] Nowicki, T. and Van Strien, S.: Absolutely continuous invariant measures under a summability condition. Invent. Math. 105, 123–136 (1991)
[P] Pesin, Ja.B.: Characteristic Lyapunov exponents and smooth ergodic theory. Russ. Math. Surv. 32.4, 55–114 (1977)
[PP] Parry, W. and Pollicott, M.: Zeta functions and the periodic orbit structure of hyperbolic dynamics. Paris: Société Mathématique de France (Astérisque, Vol. 187–188), 1990
[PS] Pugh, C. and Shub, M.: Ergodic attractors. Trans. A. M. S. 312, 1–54 (1989)
[PT] Palis, J. and Takens, F.: Hyperbolicity & sensitive chaotic dynamics at homoclinic bifurcations. Cambridge studies in advanced mathematics, 35, Cambridge: Cambridge University Press, 1993
[R1] Ruelle, D.: A measure associated with Axiom A attractors. Am. J. Math. 98, 619–654 (1976)
[R2] Ruelle, D.: Thermodynamic Formalism. Reading, MA: Addison Wesley, 1978
[R3] Ruelle, D.: Dynamical Zeta functions for piecewise monotone maps of interval. CRM Monograph Series, Vol. 4, Providence, RI: Am. Math. Soc., 1994
[R4] Ruelle, D.: Ergodic theory of differentiable dynamical systems. Publ. Math. Inst. Hautes Étud. Sci. 50, 27–58 (1979)
[Ro] Robinson, C.: Homoclinic bifurcation to a transitive attractor of Lorenz type. Nonlinearity 2, 495–518 (1989)
[Ry] Rychlik, M.: Lorenz attractors through Sil'nikov-type bifurcation, Part I. Ergodic Theory and Dynamical Systems 10, 793–822 (1990)
[S1] Sinai, Ya.G.: Dynamical systems with elastic reflections: Ergodic properties of dispersing billiards. Russ. Math. Surveys 25, 137–189 (1970)
[S2] Sinai, Ya.G.: Gibbs measure in ergodic theory. Russian Math. Surveys 27, 21–69 (1972)
[Sm] Smale, S.: Differentiable dynamical systems. Bull. Am. Math. Soc. 73, 747–817 (1967)
[T] Tsujii, M.: A simple proof for monotonicity of entropy in the quadratic family. Ergodic Theory and Dynamical Systems, to appear
[TTY] Thieullen, P., Tresser, C. and Young, L.-S.: Positive exponent for generic 1-parameter families of unimodal maps. C.R. Acad. Sci. Paris, t. 315 Série I (1992), 69–72; J Analyse 64, 121–172 (1994)
[V] Viana, M.: Strange attractors in higher dimensions. Bull. Braz. Math. Soc. 24, 13–62 (1993)
[W1] Williams, R.: Classification of one-dimensional attractors. In: Global Analysis, Proc. Sym. in Pure Math., Vol. 14 (1970), 341–361
[W2] Williams, R.: The structure of Lorenz attractors. Turbulence Seminar Berkeley 1976/77, P. Bernard and T. Ratiu, eds., Berlin–Heidelberg–New York: Springer-Verlag, 1977, pp. 94–112
[Wa] Walters, P.: An introduction to ergodic theory. Grad. Texts in Math. Berlin–Heidelberg–New York: Springer-Verlag, 1981
[Y1] Young, L.-S.: A Bowen-Ruelle measure for certain piecewise hyperbolic maps. Trans. A.M.S. 287, 41–48 (1985)
[Y2] Young, L.-S.: Ergodic theory of differentiable dynamical systems. Real and Complex Dynamical Systems, B. Branner and P. Hjorth, eds., Amsterdam: Kluwer Acad. Press, 1995
[Y3] Young, L.-S.: Statistical properties of dynamical systems with some hyperbolicity. Ann. of Math. 147, 585–650 (1998)
[Y4] Young, L.-S.: Recurrence time and rate of mixing. Israel J. of Math. 110, 153–188 (1999)

Communicated by Ya. G. Sinai
Commun. Math. Phys. 218, 99 – 111 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Energy Correlations in O(N) Models and the Wolff Representation
Michael Campbell
Department of Mathematics, University of California, Irvine, CA 92697-3875, USA. E-mail: [email protected]
Received: 7 March 2000 / Accepted: 31 October 2000
Abstract: We prove that energy functions are positively correlated in isotropic, ferromagnetic O(N) models on an arbitrary graph. In our inductive proof, this is used to prove the strong FKG property of the Wolff representation for isotropic, ferromagnetic O(N + 1) models. This strong FKG property is then used to prove energy correlations for the O(N + 1) model. Furthermore, percolation in the Wolff representation is proved to be a necessary and sufficient condition for positivity of the spontaneous magnetization (previously known only for N = 3).

Research supported by Harry and Mary Might.

1. Introduction

Energy correlation inequalities for XY models were first proved, very elegantly, in [6] and extended in [8]. In their fullest generality, the O(N) versions of these inequalities were shown in [10] to fail in definite but infrequent cases (a four-term energy correlation is shown numerically to be negative). However, the numerical computations in [10] of the (two-term) energy-energy correlation functions were all non-negative for twenty-five graphs in the cases N = 3, 4, 5, 10. This reinforced the conjecture that energy-energy correlations in O(N) models are positive.

In a seemingly unrelated problem, Wolff's cluster algorithm [11], a Monte Carlo algorithm used to analyze spin observables, shows numerically that the associated clusters percolate precisely at the point of phase transition. This was first developed theoretically in [4] using very beautiful arguments. The connection between FKG properties of the Wolff representation and energy-energy correlations shown in [2, 3] not only relates the two problems, but also suggests, on the basis of the results in [11], that energy-energy correlation functions should be positive for isotropic O(N) models (it has been shown in [2] that they are negative for some anisotropic models).

We show here that the two go hand-in-hand. We know from a minor extension of arguments in [2, 3] that these O(N − 1) correlations are needed to get the strong FKG properties in O(N). Surprisingly, the strong FKG properties from the Wolff representation are exactly what is needed to prove the correlation inequalities in O(N), which are the principal results below (cf. Theorem 2). Accordingly, we need to develop the strong FKG property to prove energy correlations, and to further use the property to establish the connection of magnetization and percolation.

2. Notation for the Wolff Representation

Our major application of these correlation inequalities is to establish the strong FKG property of the Wolff representation (cf. (3)) for isotropic O(N) models and to connect percolation to magnetization. Some theoretical understanding of this connection appears in [1, 7, 9], but a conclusive theorem has only recently been established [2-4] and only for the cases N = 2, 3. We extend these results to all N ≥ 3. We will state the problem precisely below, using the following notations and definitions.

We consider the standard classical Heisenberg ferromagnet. Let G denote an arbitrary, finite graph with bonds B and sites S. The Hamiltonian is given by
\[ H = -\sum_{\langle i,j\rangle \in S} J_{ij}\, s_i \cdot s_j, \tag{1} \]
where $s_i = (x_i^1, \dots, x_i^N)$ is a unit vector in $\mathbb{R}^N$ and the $J_{ij} \ge 0$. The partition function is given by
\[ Z_G(\beta) = \int d^{|S|}s \; e^{-\beta H}, \]
where the integrations are with respect to the usual, normalized measure on the sphere. To define the Wolff representation, we will start by working with the $x^N$-direction. Let us write $s_i = (a_i t_i^1, \dots, a_i t_i^{N-1}, b_i \sigma_i)$, where $b_i$ is the absolute value of the projection of $s_i$ onto the $x^N$-axis, $\sigma_i$ is an Ising variable, $a_i = \sqrt{1 - b_i^2}$ and the $(t_i^1, \dots, t_i^{N-1})$ are the usual O(N − 1) variables. The idea of the Wolff representation is to construct an FK random cluster representation [5] from the Ising variables. Using $\sigma_i \sigma_j = 2\delta_{\sigma_i \sigma_j} - 1$ we have
\[ Z_G(\beta) = \int \prod_i a_i^{N-3}\, db_i \prod_{\langle i,j\rangle} e^{\beta J_{ij} b_i b_j}\; Z_b^{[I]}(2\beta)\, Z_a^{[N-1]}(\beta), \tag{2} \]
where, on the right-hand side, the dependence on G and the $J_{ij}$ has been suppressed. The terms above are defined as follows: b is a configuration of the b's, $b = \{b_i \mid i \in S\}$ (and similarly for a); the function $Z_b^{[I]}$ is the Ising partition function written in Potts form
\[ Z_b^{[I]}(2\beta) = \sum_{\sigma} \prod_{\langle i,j\rangle \in S} e^{2\beta b_i b_j J_{ij} (\delta_{\sigma_i \sigma_j} - 1)}; \]
and $Z_a^{[N-1]}(\beta)$ is the O(N − 1) partition function with couplings $J_{ij} a_i a_j$ at inverse temperature β.
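The Potts-form rewriting above, and the standard Fortuin-Kasteleyn expansion it is designed for, can be sanity-checked by brute force on a very small graph. The following is only a sketch under stated assumptions: the triangle graph, the couplings, the projections `B_PROJ` and the value of `BETA` are hypothetical illustrative data, not taken from the paper, and the function names are ours.

```python
import itertools
import math

# Hypothetical toy data (not from the paper): a triangle with
# ferromagnetic couplings J_ij >= 0 and spin projections b_i in [0, 1].
BONDS = {(0, 1): 1.0, (1, 2): 0.7, (0, 2): 0.4}
B_PROJ = [0.9, 0.5, 0.3]
BETA = 0.8


def ising_sum(beta, bonds, b):
    """Sum over sigma of exp(beta * sum_<ij> J_ij b_i b_j sigma_i sigma_j)."""
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=len(b)):
        energy = sum(J * b[i] * b[j] * sigma[i] * sigma[j]
                     for (i, j), J in bonds.items())
        total += math.exp(beta * energy)
    return total


def potts_form(two_beta, bonds, b):
    """Ising partition function in Potts form, with weights exp(K (delta - 1))."""
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=len(b)):
        exponent = sum(two_beta * J * b[i] * b[j] * ((sigma[i] == sigma[j]) - 1)
                       for (i, j), J in bonds.items())
        total += math.exp(exponent)
    return total


def count_components(n_sites, open_bonds):
    """Number of connected components (isolated sites included), via union-find."""
    parent = list(range(n_sites))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in open_bonds:
        parent[find(i)] = find(j)
    return len({find(x) for x in range(n_sites)})


def fk_expansion(two_beta, bonds, b):
    """Sum over bond configurations omega of B(omega) * 2**c(omega)."""
    bond_list = list(bonds)
    total = 0.0
    for mask in itertools.product((False, True), repeat=len(bond_list)):
        weight = 1.0
        open_bonds = []
        for is_open, (i, j) in zip(mask, bond_list):
            closed = math.exp(-two_beta * bonds[(i, j)] * b[i] * b[j])
            if is_open:
                weight *= 1.0 - closed
                open_bonds.append((i, j))
            else:
                weight *= closed
        total += weight * 2 ** count_components(len(b), open_bonds)
    return total
```

Multiplying `potts_form(2 * BETA, ...)` by the prefactor $\prod e^{\beta J_{ij} b_i b_j}$ should reproduce `ising_sum(BETA, ...)`, and `fk_expansion` should agree with `potts_form`, up to floating-point error.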
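We expand
\[ Z_b^{[I]}(2\beta) \equiv \sum_{\omega \subset B} B_{b,2\beta}(\omega)\, 2^{c(\omega)} \]
in the standard FK representation. Here
\[ B_{b,2\beta}(\omega) = \prod_{\langle i,j\rangle \in \omega} \left(1 - e^{-2\beta J_{ij} b_i b_j}\right) \prod_{\langle i,j\rangle \notin \omega} e^{-2\beta J_{ij} b_i b_j} \]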
and $c(\omega)$ is the number of connected components of the bond configuration ω. Hence a joint measure is defined on configurations (b, ω) of “spin-projections” and bonds. This is the Wolff measure and is denoted by $V_\beta^W(-)$. The bond marginal is denoted by $M_\beta(-)$ and the b-marginal by $\rho_\beta(-)$.

For specific lattices that frequently occur in applications, such as $\mathbb{Z}^d$, we need to address boundary conditions. For $\Lambda \subset \mathbb{Z}^d$, we consider free boundary conditions (for which nothing has to be said) and wired boundary conditions, meaning $b_i \sigma_i \equiv 1$ for all $i \in \partial\Lambda$. We will denote these measures with some condition ∗ on the boundary by a superscripted ∗: i.e., $M_\beta^*(-)$ and $\rho_\beta^*(-)$. Our notation for free, periodic, and wired will be f, p, and w, respectively.

3. Results

The relationship between the strong FKG property for the O(N) model and positive energy correlations in the O(N − 1) model is the content of the following. If the graph G is a rectangular subset of $\mathbb{Z}^d$ or another such suitable graph, we can also consider periodic boundary conditions.

Theorem 1. Consider a finite graph G. For the O(N) model with free (periodic) boundary conditions the corresponding measure $\rho_\beta^*(-)$ is strong FKG (which we define in the proof) if the following conditions hold in the O(N − 1) model with free (periodic) boundary conditions and arbitrary ferromagnetic couplings:
(i) $E^{[N-1]}[t_u \cdot t_v] \ge 0$,
(ii) $E^{[N-1]}[(t_u \cdot t_i)(t_v \cdot t_j)] \ge E^{[N-1]}[t_u \cdot t_i]\, E^{[N-1]}[t_v \cdot t_j]$
for all i, j, u, v ∈ S. Here $t_i = (t_i^1, \dots, t_i^{N-1})$ is a unit vector in $\mathbb{R}^{N-1}$ and $E^{[N-1]}[-]$ is the expectation with respect to the aforementioned O(N − 1) measure.

Proof. First, we will define the strong FKG property. Let u, v ∈ S and let $\varepsilon_u$, $\varepsilon_v$ denote small numbers. For fixed b, let $\delta_u$ denote the configuration that agrees with b at each site except u, where it takes on the value $b_u + \varepsilon_u$. The configuration $\delta_v$ is defined similarly and it is assumed, without loss of generality, that $b_u + \varepsilon_u$ is less than one.
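We will show that $\rho_\beta^*(-)$ is strong FKG; i.e.,
\[ \rho_\beta(b \vee \delta_u \vee \delta_v)\, \rho_\beta(b) \ge \rho_\beta(b \vee \delta_u)\, \rho_\beta(b \vee \delta_v), \tag{3} \]
where, for simplicity, we use the same notation for the density and the measure. It has been shown [4, Prop. A.1] that
\[ e^{\beta J_{uv} \varepsilon_u \varepsilon_v}\, Z^{[I]}_{b \vee \delta_u \vee \delta_v}(2\beta)\, Z^{[I]}_{b}(2\beta) \ge Z^{[I]}_{b \vee \delta_u}(2\beta)\, Z^{[I]}_{b \vee \delta_v}(2\beta). \tag{4} \]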
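Inequality (4) can be probed numerically. Writing $W(b) = \sum_\sigma \exp(\beta \sum J_{ij} b_i b_j \sigma_i \sigma_j)$, which equals the Potts-form partition function times $\prod e^{\beta J_{ij} b_i b_j}$, the statement (4) is equivalent to $W(b \vee \delta_u \vee \delta_v) W(b) \ge W(b \vee \delta_u) W(b \vee \delta_v)$, since all prefactors except $e^{\beta J_{uv} \varepsilon_u \varepsilon_v}$ cancel. The sketch below tests this on random configurations; the four-site graph, couplings and `BETA` are hypothetical choices made only for illustration, and a numerical check of this kind is of course no substitute for the proof in [4].

```python
import itertools
import math
import random

# Hypothetical toy graph on four sites (an assumption for illustration only).
BONDS = {(0, 1): 1.0, (1, 2): 0.6, (2, 3): 0.9, (0, 3): 0.3, (0, 2): 0.5}
BETA = 1.1
N_SITES = 4


def ising_weight(b):
    """W(b): sum over sigma of exp(beta * sum_<ij> J_ij b_i b_j sigma_i sigma_j)."""
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=N_SITES):
        energy = sum(J * b[i] * b[j] * sigma[i] * sigma[j]
                     for (i, j), J in BONDS.items())
        total += math.exp(BETA * energy)
    return total


def relative_lattice_gap(b, u, v, eps_u, eps_v):
    """(W(b v du v dv) W(b) - W(b v du) W(b v dv)) divided by the first product."""
    b_u = list(b)
    b_u[u] += eps_u          # raise site u only
    b_v = list(b)
    b_v[v] += eps_v          # raise site v only
    b_uv = list(b_u)
    b_uv[v] += eps_v         # raise both sites
    lhs = ising_weight(b_uv) * ising_weight(b)
    rhs = ising_weight(b_u) * ising_weight(b_v)
    return (lhs - rhs) / lhs


def smallest_gap(trials=200, seed=0):
    """Smallest relative gap over random b, random sites u != v, random raises."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(trials):
        b = [rng.uniform(0.0, 0.8) for _ in range(N_SITES)]
        u, v = rng.sample(range(N_SITES), 2)
        eps_u = rng.uniform(0.0, 1.0 - b[u])   # keep b_u + eps_u <= 1
        eps_v = rng.uniform(0.0, 1.0 - b[v])
        worst = min(worst, relative_lattice_gap(b, u, v, eps_u, eps_v))
    return worst
```

If (4) holds, `smallest_gap()` should be non-negative up to rounding error.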
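Thus, it is enough to show
\[ Z^{[N-1]}_{a(b \vee \delta_u \vee \delta_v)}(\beta)\, Z^{[N-1]}_{a(b)}(\beta) \ge Z^{[N-1]}_{a(b \vee \delta_u)}(\beta)\, Z^{[N-1]}_{a(b \vee \delta_v)}(\beta). \tag{5} \]
Since $Z_a^{[N-1]}$ is non-vanishing for all a, we can instead prove
\[ \frac{Z^{[N-1]}_{a(b \vee \delta_u \vee \delta_v)}(\beta)}{Z^{[N-1]}_{a(b \vee \delta_u)}(\beta)} \ge \frac{Z^{[N-1]}_{a(b \vee \delta_v)}(\beta)}{Z^{[N-1]}_{a(b)}(\beta)}. \tag{5$'$} \]
Define a new b-configuration, $\delta_u(t)$, which raises the configuration b at the site u by t. Hence $b_i$ is unchanged for all i, except at u, where $b_u$ becomes $b_u + t$. So if we can show that the function (suppressing β-dependence)
\[ G(t) \equiv \frac{Z^{[N-1]}_{a(b \vee \delta_u(t) \vee \delta_v)}}{Z^{[N-1]}_{a(b \vee \delta_u(t))}} \tag{6} \]
is an increasing function of t, we have (5′) by noting $G(\varepsilon_u) \ge G(0)$. But this will follow if $G'(t) \ge 0$, which amounts to proving
\[ \frac{\frac{\partial}{\partial t} Z^{[N-1]}_{a(b \vee \delta_u(t) \vee \delta_v)}}{Z^{[N-1]}_{a(b \vee \delta_u(t) \vee \delta_v)}} \ge \frac{\frac{\partial}{\partial t} Z^{[N-1]}_{a(b \vee \delta_u(t))}}{Z^{[N-1]}_{a(b \vee \delta_u(t))}}. \]
Using the same idea as above, we only need show that $F'(s) \ge 0$, where
\[ F(s) \equiv \frac{\frac{\partial}{\partial t} Z^{[N-1]}_{a(b \vee \delta_u(t) \vee \delta_v(s))}}{Z^{[N-1]}_{a(b \vee \delta_u(t) \vee \delta_v(s))}}. \tag{7} \]
The derivative of F will be positive if we show
\[ E^{[N-1]}_{a(b \vee \delta_u(t) \vee \delta_v(s))}[t_u \cdot t_v] \ge 0 \tag{8a} \]
and
\[ E^{[N-1]}_{a(b \vee \delta_u(t) \vee \delta_v(s))}[(t_u \cdot t_i)(t_v \cdot t_j)] \ge E^{[N-1]}_{a(b \vee \delta_u(t) \vee \delta_v(s))}[t_u \cdot t_i] \times E^{[N-1]}_{a(b \vee \delta_u(t) \vee \delta_v(s))}[t_v \cdot t_j]. \tag{8b} \]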
We need (3) to hold for all configurations b. And since the couplings in the O(N − 1) expectations in (8a) and (8b) are Jij ai aj , these conditions must hold for arbitrary (ferromagnetic) couplings. It has been shown in [6] that energy functions are positively correlated for N = 2. This was used in [3] to establish the strong FKG property of the Wolff representation for N = 3; namely, that the strong FKG property of Theorem 1 holds. And the above proposition shows that if we have positive energy correlations for O(N − 1) models (with arbitrary couplings), we have the strong FKG property for O(N ) models. Below, we will extend Theorem 1 for more general terms in the Hamiltonian. We say the interaction {Jij } dominates the interaction {Kij } if Jij ≥ Kij , for all i, j ∈ S. We also say the magnetic field {hi } dominates the field {ki } if hi ≥ ki for all sites i.
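The N = 2 input from [6] mentioned above can be illustrated on a three-site plane rotator model, where the energy functions are $\cos(\theta_i - \theta_j)$. The sketch below evaluates the means and the covariance of two energy functions by a periodic grid rule, fixing $\theta_0 = 0$ by global rotation invariance. The couplings, `beta` and grid size are hypothetical illustrative choices, and the function name is ours; the grid sum only approximates the integrals.

```python
import math


def xy_energy_stats(beta, couplings, pair_a, pair_b, grid=48):
    """Means and covariance of cos(theta_i - theta_j) for two pairs of sites.

    Three-site plane rotator model with weight exp(beta * sum J_ij cos(th_i - th_j));
    theta_0 is fixed to 0 and the other two angles run over a uniform periodic grid.
    """
    angles = [2.0 * math.pi * k / grid for k in range(grid)]
    z = mean_a = mean_b = mean_ab = 0.0
    for t1 in angles:
        for t2 in angles:
            theta = (0.0, t1, t2)
            weight = math.exp(beta * sum(J * math.cos(theta[i] - theta[j])
                                         for (i, j), J in couplings.items()))
            f_a = math.cos(theta[pair_a[0]] - theta[pair_a[1]])
            f_b = math.cos(theta[pair_b[0]] - theta[pair_b[1]])
            z += weight
            mean_a += weight * f_a
            mean_b += weight * f_b
            mean_ab += weight * f_a * f_b
    mean_a, mean_b, mean_ab = mean_a / z, mean_b / z, mean_ab / z
    return mean_a, mean_b, mean_ab - mean_a * mean_b


# Hypothetical ferromagnetic triangle (illustrative values only).
TRIANGLE = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 0.5}
```

For example, `xy_energy_stats(1.0, TRIANGLE, (0, 1), (1, 2))` returns the two energy expectations and their covariance; for ferromagnetic couplings all three are expected to be non-negative, in line with conditions (i) and (ii) at N = 2.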
Corollary 1. Consider adding to the isotropic (zero field) Hamiltonian of (1) the following types of terms:
(a) A magnetic field term: $\sum_i h_i \sigma_i b_i$ (i.e. h points in the $\hat{x}^N$ direction) with all the non-zero $h_i$ of the same sign, say positive.
(b) A term that modifies the coupling between the $x^N$-components, $\sum_{\langle i,j\rangle} K_{ij} b_i b_j \sigma_i \sigma_j$ with $J_{ij} + K_{ij}$ non-negative.
Then the conclusions of Theorem 1 still hold. Furthermore, if one set of K's and h's dominates a second set, the associated measures are correspondingly FKG ordered.

Proof. Neither of these terms has any effect on the O(N − 1) portion in the proof of Theorem 1. The remainder of what is needed is proved exactly as in [4, Proposition A.1]: it establishes the FKG property (when all the $J_{ij} + K_{ij}$ are non-negative) and similar considerations were shown to apply to non-zero (and non-negative) magnetic fields by considering “ghost sites”. The stated FKG dominance was the corollary to this proposition.

Corollary 2. Under the same conditions as Theorem 1, assume that the b-marginal $\rho_\beta(-)$ is strong FKG (Condition (3)). Then the measures $M_\beta(-)$ are FKG (have positive correlations).

Proof. Let $\mu_{b,2\beta}(-)$ denote the q = 2 random cluster measures as described earlier. Explicitly
\[ \mu_{b,2\beta}(\omega) \propto B_{b,2\beta}(\omega)\, 2^{c(\omega)}. \tag{9} \]
We may decompose the measure $M_\beta$ according to
\[ M_\beta(-) = \int_b d\rho_\beta(b)\, \mu_{b,2\beta}(-). \tag{10} \]
Let A and B denote increasing bond events. Then
\[ M_\beta(A \cap B) = \int_b d\rho_\beta(b)\, \mu_{b,2\beta}(A \cap B) \ge \int_b d\rho_\beta(b)\, \mu_{b,2\beta}(A)\, \mu_{b,2\beta}(B) \tag{11} \]
by the FKG property of the random cluster measures. But random cluster probabilities of increasing events are increasing functions of all the couplings, and hence of the b. Thus $\mu_{b,2\beta}(A)$ and $\mu_{b,2\beta}(B)$ are increasing functions of b, and we conclude that $M_\beta(A \cap B) \ge M_\beta(A)\, M_\beta(B)$ by our assumption.

Now we can show positive energy correlations for the O(N) model by using the strong FKG property of the Wolff representation for the O(N) model. First, we will use some preliminary notation for the proof as well as an auxiliary lemma. Given specific sites k, l, m, n ∈ S, we will write our O(N) Hamiltonian in (1) as
\[ H = H_0 + H_l + H_m + H_{lm}, \tag{12} \]
where
\[ H_0 = -\sum_{i,j \ne l,m} J_{ij}\, s_i \cdot s_j, \quad H_l = -\sum_{i \ne l,m} J_{il}\, s_i \cdot s_l, \quad H_m = -\sum_{i \ne l,m} J_{im}\, s_i \cdot s_m, \quad H_{lm} = -J_{lm}\, s_l \cdot s_m. \tag{13} \]
We also parameterize the spins with the usual spherical coordinates:
\[ s = \cos\phi^N\, \hat{x}^N + \sin\phi^N \cos\phi^{N-1}\, \hat{x}^{N-1} + \cdots + \sin\phi^N \sin\phi^{N-1} \cdots \cos\phi^3\, \hat{z} + \sin\phi^N \sin\phi^{N-1} \cdots \sin\phi^3 \sin\theta\, \hat{y} + \sin\phi^N \sin\phi^{N-1} \cdots \sin\phi^3 \cos\theta\, \hat{x}, \]
where for notational convenience, we refer to the coordinates $x^1$, $x^2$, and $x^3$ as x, y, and z, respectively. The polar angles $\phi^M$, 3 ≤ M ≤ N, take values in [0, π] and the planar angle θ ∈ [0, 2π].

Lemma 1. Consider the O(N) model (N ≥ 3) on an arbitrary finite graph with the Hamiltonian given by (1), having free or periodic boundary conditions. Let $\langle - \rangle$ be the corresponding Gibbs state for the given Hamiltonian. Then the integral $I \equiv \langle (s_k \cdot s_l)(s_m \cdot s_n) \rangle$ can be written as
\[ I = \frac{1}{Z} \int_{[0,\pi]^{|S|-2}} \prod_{i \ne l,m} d\lambda_i\; e^{-\beta H_0^1} \sin\phi_k^N \sin\phi_n^N \times \int d^{|S|}t\; e^{-\beta (H_0^{N-1} + H_l + H_m + H_{lm})}\, (t_k \cdot T_l)(T_m \cdot t_n), \tag{14} \]
where $d\lambda_i = d\phi_i^N \sin^{N-3}(\phi_i^N)$ and dt is the usual (normalized) measure on the sphere $\partial B^N = \{x \in \mathbb{R}^N \mid \|x\| = 1\}$. Here, $s_i = \cos\phi_i^N\, \hat{x}^N + \sin\phi_i^N\, t_i$ (i ≠ l, m), $t_i$, $T_l$, and $T_m$ are unit vectors in $\mathbb{R}^{N-1}$, and the Hamiltonians in (14) are
\[ \begin{aligned} H_0^1 &= -\sum_{i,j \ne l,m} J_{ij} \cos\phi_i^N \cos\phi_j^N, \\ H_0^{N-1} &= -\sum_{i,j \ne l,m} J_{ij} \sin\phi_i^N \sin\phi_j^N\; t_i \cdot t_j, \\ H_l &= -\sum_{i \ne l,m} J_{il} \sin\phi_i^N\; t_i \cdot T_l, \\ H_m &= -\sum_{i \ne l,m} J_{im} \sin\phi_i^N\; t_i \cdot T_m, \\ H_{lm} &= -J_{lm}\, T_m \cdot T_l. \end{aligned} \tag{15} \]

Proof. We can write the integral explicitly as
\[ I = \frac{1}{Z} \int ds_l \left[ \int \prod_{i \ne l} ds_i\; e^{-\beta H} (s_k \cdot s_l)(s_m \cdot s_n) \right]. \tag{16} \]
Since we have full O(N) invariance of the Hamiltonian and dot products, the integrand of the right side of (16) (in brackets) is a constant function of $s_l$. Thus we may take $s_l = \hat{x}$, the unit vector in the x-direction, and integrate out the l-degrees of freedom.
We now only have O(N − 1) invariance, perpendicular to the x-axis. To exploit this remaining symmetry, we reparameterize
\[ s_m = \cos\tilde\phi_m^N\, \hat{x} + \sin\tilde\phi_m^N \cos\tilde\phi_m^{N-1}\, \hat{x}^N + \cdots + \sin\tilde\phi_m^N \sin\tilde\phi_m^{N-1} \cdots \cos\tilde\phi_m^3\, \hat{x}^4 + \sin\tilde\phi_m^N \sin\tilde\phi_m^{N-1} \cdots \sin\tilde\phi_m^3 \sin\tilde\theta_m\, \hat{z} + \sin\tilde\phi_m^N \sin\tilde\phi_m^{N-1} \cdots \sin\tilde\phi_m^3 \cos\tilde\theta_m\, \hat{y}. \]
Using the remaining O(N − 1) symmetry, we may take $\tilde\theta_m = 0$ and $\tilde\phi_m^M = \pi/2$ for 3 ≤ M ≤ N − 1 and integrate all of these degrees of freedom out, using the same argument as with $s_l$. Note that we could also take $\tilde\theta_m = \pi$; we will use this fact later. Hence the vector
\[ s_m = \cos\tilde\phi_m^N\, \hat{x} + \sin\tilde\phi_m^N\, \hat{y}. \tag{17} \]
Now we consider the O(N − 1) system in the integral (16), as we did above (2). We see that $s_i = (\sin\phi_i^N t_i^1, \dots, \sin\phi_i^N t_i^{N-1}, \cos\phi_i^N)$, where $t_i = (t_i^1, \dots, t_i^{N-1})$ is a unit vector in $\mathbb{R}^{N-1}$. So $s_i = \cos\phi_i^N\, \hat{x}^N + \sin\phi_i^N\, t_i$ and the Hamiltonian $H_0$ splits into
\[ H_0 = H_0^1 + H_0^{N-1}, \tag{18} \]
where
\[ H_0^1 = -\sum_{i,j \ne l,m} J_{ij} \cos\phi_i^N \cos\phi_j^N, \tag{19} \]
\[ H_0^{N-1} = -\sum_{i,j \ne l,m} J_{ij} \sin\phi_i^N \sin\phi_j^N\; t_i \cdot t_j. \tag{20} \]
Furthermore, the Hamiltonians in (13) are now
\[ H_l = -\sum_{i \ne l,m} J_{il} \sin\phi_i^N\; t_i \cdot \hat{x}, \quad H_m = -\sum_{i \ne l,m} J_{im} \sin\phi_i^N\; t_i \cdot s_m, \quad H_{lm} = -J_{lm}\, s_m \cdot \hat{x}, \tag{21} \]
and the terms in the integrand are
\[ s_k \cdot s_l = \sin\phi_k^N\; t_k \cdot \hat{x}, \qquad s_m \cdot s_n = \sin\phi_n^N\; s_m \cdot t_n. \tag{22} \]
Then with $d\lambda_i = d\phi_i^N \sin^{N-3}(\phi_i^N)$ and dt the usual measure on $\partial B^{N-1}$, $ds_i = d\lambda_i\, dt_i$ and the integral in (16) is
\[ I = \frac{1}{Z} \int_{[0,\pi]^{|S|-2}} \prod_{i \ne l,m} d\lambda_i\; e^{-\beta H_0^1} \sin\phi_k^N \sin\phi_n^N \times \left[ \int d^{|S|-2}t \int d\tilde\lambda_m\; e^{-\beta (H_0^{N-1} + H_l + H_m + H_{lm})}\, (t_k \cdot \hat{x})(s_m \cdot t_n) \right]. \tag{23} \]
Let's now consider the integrand in (23),
\[ I' \equiv \int d^{|S|-2}t \int d\tilde\lambda_m\; e^{-\beta (H_0^{N-1} + H_l + H_m + H_{lm})}\, (t_k \cdot \hat{x})(s_m \cdot t_n). \tag{24} \]
We see that (24) is the integral of an O(N − 1) system except at site m. So we view (24) as a system with (N − 1)-dimensional unit vectors $t_i \in \partial B^{N-1}$ at each site i ≠ m, and an N-dimensional unit vector $s_m \in \partial B^N$ at site m that is restricted to the upper half of the x-y plane because of our previous use of symmetry. Also note that, relative to the O(N − 1) system of (24), the $\phi_i^N$ are all constant and considered part of the interaction constants in the O(N − 1) Hamiltonians in (20) and (21).

From (17), the spin $s_m$ only has one degree of freedom over which we integrate, $\tilde\phi_m^N$, and only the coordinates $x_m$ and $y_m$ are non-zero. By reintroducing new coordinates, we will replace this integral with an integral over the entire sphere $\partial B^N$ by adding appropriate delta functions. However, this sphere is relative to the O(N − 1) system in (24) (where the polar angles $\phi_i^N$, i ≠ m, are considered constant). Therefore, these new coordinates are different from those of the original sphere of $s_m$ in the given O(N) model. Specifically, we introduce the new coordinates $Z_m, X_m^4, \dots, X_m^N$ into (24), which are in the same direction as the original coordinates: $\hat{Z} = \hat{z}$, $\hat{X}^M = \hat{x}^M$, 4 ≤ M ≤ N. We can now parameterize $s_m$ with the corresponding spherical coordinates:
\[ s_m = \cos\tilde\phi_m^N\, \hat{x} + \sin\tilde\phi_m^N \cos\tilde\Phi_m^{N-1}\, \hat{X}^N + \cdots + \sin\tilde\phi_m^N \sin\tilde\Phi_m^{N-1} \cdots \cos\tilde\Phi_m^3\, \hat{X}^4 + \sin\tilde\phi_m^N \sin\tilde\Phi_m^{N-1} \cdots \sin\tilde\Phi_m^3 \sin\tilde\Theta_m\, \hat{Z} + \sin\tilde\phi_m^N \sin\tilde\Phi_m^{N-1} \cdots \sin\tilde\Phi_m^3 \cos\tilde\Theta_m\, \hat{y}. \]
We must restrict these new coordinates in (24) because of our use of symmetry (cf. (17), where $s_m$ is restricted to the upper half x-y plane, y ≥ 0). Hence, for 3 ≤ M ≤ N − 1, all polar angles in our new coordinates must be restricted to $\tilde\Phi_m^M = \pi/2$ in (24). Keeping in mind that $\tilde\phi_m^N$ is the polar angle for $x_m$ and $\tilde\Phi_m^{M-1}$ is the polar angle for $X_m^M$ (4 ≤ M ≤ N), the delta functions needed to implement this restriction are $\delta(X_m^M = 0)$, 4 ≤ M ≤ N. Likewise, to keep $s_m$ in the upper half of the x-y plane, we must impose $\tilde\Theta_m = 0$. Since we could just as well have taken $\tilde\theta_m = \pi$ in the above symmetry argument (restricting $s_m$ to the lower half of the x-y plane), we would impose $\tilde\Theta_m = \pi$ in this case. Taking half of each case, a convenient delta function is $\delta(Z_m = 0) = (1/2)\delta(\tilde\Theta_m = 0) + (1/2)\delta(\tilde\Theta_m = \pi)$. Now, we change back to the usual spherical coordinates for $s_m$ in the O(N − 1) system, resulting in the reparameterization
\[ s_m = \cos\Phi_m^N\, \hat{X}^N + \sin\Phi_m^N \cos\Phi_m^{N-1}\, \hat{X}^{N-1} + \cdots + \sin\Phi_m^N \sin\Phi_m^{N-1} \cdots \cos\Phi_m^3\, \hat{Z} + \sin\Phi_m^N \sin\Phi_m^{N-1} \cdots \sin\Phi_m^3 \sin\Theta_m\, \hat{y} + \sin\Phi_m^N \sin\Phi_m^{N-1} \cdots \sin\Phi_m^3 \cos\Theta_m\, \hat{x}. \tag{25} \]
Then $s_m = (\sin\Phi_m^N T_m^1, \dots, \sin\Phi_m^N T_m^{N-1}, \cos\Phi_m^N)$, where $T_m = (T_m^1, \dots, T_m^{N-1})$ is a unit vector in $\mathbb{R}^{N-1}$. Hence $s_m = \cos\Phi_m^N\, \hat{x}^N + T_m$ relative to the O(N − 1) system, and
\[ t_i \cdot s_m = t_i \cdot T_m. \]
We will rewrite the integral in (24), using the above coordinates for $s_m$ and multiply the integrand by the aforementioned delta functions, noting that $\delta(Z_m = 0)$ is now expressed in the coordinates of (25). The restriction to the x-y plane of $s_m$ (the use of the delta functions above in the integrand) requires that $\Phi_m^M = \pi/2$, 3 ≤ M ≤ N. Using (25), the Hamiltonians in (21) are now
\[ H_l = -\sum_{i \ne l,m} J_{il} \sin\phi_i^N\; t_i \cdot \hat{x}, \quad H_m = -\sum_{i \ne l,m} J_{im} \sin\phi_i^N\; t_i \cdot T_m, \quad H_{lm} = -J_{lm}\, T_m \cdot \hat{x}. \tag{26} \]
Consequently, we see that the O(N − 1) Hamiltonian $H^{N-1} = H^{N-1}(t_i, T_m)$, i ∈ S − {l, m}, in (24) is $H^{N-1} \equiv H_0^{N-1} + H_l + H_m + H_{lm}$ and
\[ I' = \int d^{|S|-2}t \int ds_m\, \delta(Z_m = 0) \prod_{4 \le M \le N} \delta(X_m^M = 0)\; e^{-\beta H^{N-1}}\, (t_k \cdot \hat{x})(T_m \cdot t_n). \tag{27} \]
Implementing $\delta(\Phi_m^N = \pi/2)$ (which is $\delta(Z_m = 0)$ if N = 3; or $\delta(X_m^N = 0)$ if N > 3) in (27), we replace $ds_m$ with $dT_m$, the usual measure on $\partial B^{N-1}$. Hence we can rewrite
\[ I' = \int dT_m\, \delta(Z_m = 0) \prod_{4 \le M \le N-1} \delta(X_m^M = 0) \times \left[ \int d^{|S|-2}t\; e^{-\beta H^{N-1}}\, (t_k \cdot \hat{x})(T_m \cdot t_n) \right], \tag{28} \]
where in the case N = 3, there are no delta functions in the integrand.

If N > 3, using our symmetry argument again, we see that the integrand of (28),
\[ I'' = \int d^{|S|-2}t\; e^{-\beta H^{N-1}}\, (t_k \cdot \hat{x})(T_m \cdot t_n), \tag{29} \]
has O(N − 2) symmetry in the variable $T_m$ perpendicular to the x-axis. Thus we can go back to the coordinates
\[ T_m = \cos\tilde\Phi_m^{N-1}\, \hat{x} + \sin\tilde\Phi_m^{N-1} \cos\tilde\Phi_m^{N-2}\, \hat{X}^{N-1} + \cdots + \sin\tilde\Phi_m^{N-1} \sin\tilde\Phi_m^{N-2} \cdots \cos\tilde\Phi_m^3\, \hat{X}^4 + \sin\tilde\Phi_m^{N-1} \sin\tilde\Phi_m^{N-2} \cdots \sin\tilde\Phi_m^3 \sin\tilde\Theta_m\, \hat{Z} + \sin\tilde\Phi_m^{N-1} \sin\tilde\Phi_m^{N-2} \cdots \sin\tilde\Phi_m^3 \cos\tilde\Theta_m\, \hat{y} \]
to see that the integrand $I''$ is a constant function of $\tilde\Phi_m^{N-2}, \dots, \tilde\Phi_m^3, \tilde\Theta_m$. The delta functions in (28) can be replaced with $(1/2)\delta(\tilde\Theta_m = 0) + (1/2)\delta(\tilde\Theta_m = \pi) = \delta(Z_m = 0)$, $\delta(\tilde\Phi_m^M = \pi/2) = \delta(X_m^{M+1} = 0)$, 3 ≤ M ≤ N − 2, and the integrand $I''$ is a constant function of all variables in the delta functions. Therefore we can remove the delta functions from the integral (28), and we have
\[ I' = \int d^{|S|-1}t\; e^{-\beta H^{N-1}}\, (t_k \cdot \hat{p})(T_m \cdot t_n), \tag{30} \]
where we replaced $\hat{x}$ in the integrand, $H_l$, and $H_{lm}$, setting $\hat{p} = \hat{x}$. Finally (including again the case N = 3), we see that the integral in (30) has O(N − 1) symmetry in the vector $\hat{p}$ (allowing $\hat{p}$ to vary on $\partial B^{N-1}$), and thus $I'$ is a constant function of $\hat{p}$. Consequently, we can replace $\hat{p}$ with a new vector $T_l$ and integrate over $\partial B^{N-1}$ with respect to $T_l$:
\[ I' = \int d^{|S|}t\; e^{-\beta H^{N-1}}\, (t_k \cdot T_l)(T_m \cdot t_n). \tag{31} \]
Theorem 2. Under the same conditions as Lemma 1, the energy functions are positively correlated; i.e.,
(i) $\langle s_k \cdot s_l \rangle \ge 0$,
(ii) $\langle (s_k \cdot s_l)(s_m \cdot s_n) \rangle \ge \langle s_k \cdot s_l \rangle \langle s_m \cdot s_n \rangle$
for any sites k, l, m, n ∈ S.

Proof. The proof is by induction. Assume (i) and (ii) hold for the O(N − 1) model. Immediately, by Theorem 1, the measure $\rho_\beta^*(-)$ is strong FKG for ∗ = free or ∗ = periodic boundary conditions. Now, we prove (ii) first. Let $I = \langle (s_k \cdot s_l)(s_m \cdot s_n) \rangle$. By Lemma 1, we can rewrite I as in (14). We define an O(N − 1) partition function from the integrand of (14):
\[ Z_a^{[N-1]} \equiv \int d^{|S|}t\; e^{-\beta (H_0^{N-1} + H_l + H_m + H_{lm})}, \]
where a is a configuration of sines that appear in the above Hamiltonians (cf. (15)); explicitly, $a_i = \sin\phi_i^N$, i ∈ S − {l, m}. We denote the resulting O(N − 1) expectation $E_a^{[N-1]}[-]$. By assumption, condition (ii) holds for the O(N − 1) model, so
\[ E_a^{[N-1]}[(t_k \cdot T_l)(T_m \cdot t_n)] \ge E_a^{[N-1]}[t_k \cdot T_l]\; E_a^{[N-1]}[T_m \cdot t_n]. \tag{32} \]
Hence we have
\[ I = \int_b d\rho_\beta(b)\, a_k a_n\, E_a^{[N-1]}[(t_k \cdot T_l)(T_m \cdot t_n)] \ge \int_b d\rho_\beta(b)\, a_k E_a^{[N-1]}[t_k \cdot T_l]\; a_n E_a^{[N-1]}[T_m \cdot t_n]. \tag{33} \]
From (15), we see that
\[ \begin{aligned} \frac{\partial}{\partial a_i} E_a^{[N-1]}[t_k \cdot T_l] ={}& \beta J_{il} \left( E_a^{[N-1]}[(t_k \cdot T_l)(t_i \cdot T_l)] - E_a^{[N-1]}[t_k \cdot T_l]\, E_a^{[N-1]}[t_i \cdot T_l] \right) \\ &+ \beta J_{im} \left( E_a^{[N-1]}[(t_k \cdot T_l)(t_i \cdot T_m)] - E_a^{[N-1]}[t_k \cdot T_l]\, E_a^{[N-1]}[t_i \cdot T_m] \right) \\ &+ \beta \sum_j J_{ij} a_j \left( E_a^{[N-1]}[(t_k \cdot T_l)(t_i \cdot t_j)] - E_a^{[N-1]}[t_k \cdot T_l]\, E_a^{[N-1]}[t_i \cdot t_j] \right), \end{aligned} \tag{34} \]
which is positive by assumption. Thus $a_k E_a^{[N-1]}[t_k \cdot T_l]$ is an increasing function of a, and hence a decreasing function of b. Similarly, $a_n E_a^{[N-1]}[T_m \cdot t_n]$ is a decreasing function of b. Since the measure $\rho_\beta^*(-)$ is strong FKG, the integral in (33) satisfies
\[ \int_b d\rho_\beta(b)\, a_k E_a^{[N-1]}[t_k \cdot T_l]\; a_n E_a^{[N-1]}[T_m \cdot t_n] \ge \int_b d\rho_\beta(b)\, a_k E_a^{[N-1]}[t_k \cdot T_l] \int_b d\rho_\beta(b)\, a_n E_a^{[N-1]}[T_m \cdot t_n] = \langle s_k \cdot s_l \rangle \langle s_m \cdot s_n \rangle, \tag{35} \]
where the last equality follows by reversing the arguments in Lemma 1 for each integral. Finally, combining (33) and (35), we have (ii).

To prove (i), we write the integral as
\[ \langle s_k \cdot s_l \rangle = \sum_{1 \le M \le N} \langle x_k^M x_l^M \rangle. \tag{36} \]
It is enough to examine $\langle x_k^N x_l^N \rangle_\beta = \langle b_k b_l \sigma_k \sigma_l \rangle_\beta$; the argument for the other coordinates is the same. Decomposing into clusters, it is not hard to see that there is vanishing contribution from any configuration in which k is not connected to l and that $\sigma_k = \sigma_l$ whenever it is. Hence the identity
\[ \langle x_k^N x_l^N \rangle_\beta = V_\beta^W(b_k b_l\, I_{\{k \leftrightarrow l\}}), \tag{37} \]
where I is an indicator. Since all integrand functions on the right side of (37) are positive, we have (i). In fact, (i) of Theorem 2 above is well-known in the folklore (see [10]), and we used the opportunity here to show how the Wolff representation can be used to analyze spin–spin correlation functions in the context of clusters. The true power was (ii) of Theorem 2, which gives us the strong FKG property of Theorem 1. There are further applications of these correlations in the spin system analogous to the N = 2 applications (for example, see [8]) which we will not pursue here. Rather, we will focus on the bond system. Now that we have the FKG conditions for the corresponding Wolff representations of O(N ) models, N ≥ 3, we have the following.
Theorem 3. Let $\Lambda_k$ denote a thermodynamic sequence of finite boxes; $0 \in \Lambda_{k-1} \subset \Lambda_k \subset \mathbb{Z}^d$ with $\Lambda_k \nearrow \mathbb{Z}^d$. Let [...] exists and satisfies $\Pi_\infty(\beta) \ge m(\beta) \ge K\, \Pi_\infty(\beta)$, where m(β) is the spontaneous magnetization. Here K is a finite and non-singular function of temperature and coordination number. Hence, the magnetization is positive if and only if there is percolation as defined by the condition $\Pi_\infty(\beta) > 0$. Furthermore, in the high temperature phase, the spin-spin correlation function and the magnetic susceptibility enjoy similar bounds by appropriate quantities in the graphical representation. In particular, the susceptibility is bounded above and below by “constants” times the average of the size of the cluster at the origin.

Corollary 3. Let $\Lambda_1 \subset \mathbb{Z}^d$ denote a finite set. Then $\rho^w_{\beta;\Lambda_1}(-) \ge_{\mathrm{FKG}} \rho^f_{\beta;\Lambda_1}(-)$ and similarly for the bond measures $M_\beta(-)$. Furthermore let $\Lambda_2 \subset \Lambda_1$ and let $\rho^w_{\beta;\Lambda_1}|_{\Lambda_2}(-)$ and $M^w_{\beta;\Lambda_1}|_{\Lambda_2}(-)$ denote the restrictions of the wired measures to $\Lambda_2$. Then
\[ \rho^w_{\beta;\Lambda_1}|_{\Lambda_2}(-) \le_{\mathrm{FKG}} \rho^w_{\beta;\Lambda_2}(-) \]
and
\[ M^w_{\beta;\Lambda_1}|_{\Lambda_2}(-) \le_{\mathrm{FKG}} M^w_{\beta;\Lambda_2}(-). \]
Proof. The wired measures can be constructed from the free measure, or the measure on the larger space, by the addition to the Hamiltonian of some $K_{ij}$'s and/or $h_i$'s (cf. Corollary 1) which are then taken to infinity. The stated FKG dominations follow from (the limiting version of) Corollary 1.

An immediate consequence of Corollary 3 is the existence of $\Pi_\infty$ independent of the sequence $(\Lambda_k)$. This follows from standard monotonicity arguments. In addition, we have

Corollary 4. Consider a finite graph (with no boundary conditions), in particular some finite $\Lambda \subset \mathbb{Z}^d$ with free or periodic boundary conditions. Let $\langle - \rangle_\beta$ denote the corresponding Gibbs state for the zero field Hamiltonian (Eq. (1)) or the infinite volume limit thereof (defined, if necessary, by subsequence) and $M_\beta(-)$ the associated bond measure. Then
\[ N M_\beta(i \leftrightarrow j) \ge \langle s_i \cdot s_j \rangle_\beta \ge K^2 M_\beta(i \leftrightarrow j), \]
where i ↔ j is the event that the sites i and j are in the same connected cluster and K is a non-singular function of β and coordination number. On $\mathbb{Z}^d$, in the single phase regime, similar bounds relate the susceptibility to the average cluster size.

Proof. The argument here is similar to the proof of (i) in Theorem 2, which is identical to the proof in [4]. We can examine $\langle x_i^N x_j^N \rangle_\beta = \langle b_i b_j \sigma_i \sigma_j \rangle_\beta$. Decomposing into clusters, we have
\[ \langle x_i^N x_j^N \rangle_\beta = V_\beta^W(b_i b_j\, I_{\{i \leftrightarrow j\}}), \tag{38} \]
where I is an indicator. The upper bound is obtained by the observation $b_i b_j \le 1$ and the lower bound by the FKG inequality (all integrand functions are increasing) and by considering the worst possible case of the neighbors of i and j to estimate $\langle b_i \rangle_\beta$ and $\langle b_j \rangle_\beta$. By summing the spin-spin correlation function, we get the result on the susceptibility.

Similarly, the proof of Theorem 3 uses the same arguments as the corresponding result for the XY model.

Proof of Theorem 3. The spontaneous magnetization is not smaller than the average of $x_0^N$ in any limiting state, hence the lower bound. To obtain the upper bound, we set $h_i \equiv h > 0$ and note that for a.e. h, m(h) is independent of thermodynamic state. Consequently, we may employ wired boundary conditions at h > 0 and, with a little work, exchange the h ↓ 0 and infinite volume limits. Details are in [4, proof of Theorem 4A].

Acknowledgements. I wish to thank Lincoln Chayes for introducing me to the subject of graphical representations of continuous spin systems, and for laying the groundwork in [4].
References
1. Aizenman, M.: On the Slow Decay of O(2) Correlations in the Absence of Topological Excitations: Remark on the Patrascioiu–Seiler Model. J. Stat. Phys. 77, 351–359 (1994)
2. Campbell, M.: Continuous Spin Systems and Graphical Representations. Ph.D. thesis, University of California, Los Angeles (1999)
3. Campbell, M. and Chayes, L.: The isotropic O(3) model and the Wolff representation. J. Phys. A 31, L255–L259 (1998)
4. Chayes, L.: Discontinuity of the Spin Wave Stiffness in the Two-Dimensional XY Model. Commun. Math. Phys. 197, 3, 623–640 (1998)
5. Fortuin, C.M. and Kasteleyn, P.W.: On the Random Cluster Model I: Introduction and Relation to Other Models. Physica 57, 536–564 (1972)
6. Ginibre, J.: General Formulation of Griffiths' Inequalities. Commun. Math. Phys. 16, 4, 310–328 (1970)
7. Leung, P.W. and Henley, C.L.: Percolation Properties of the Wolff Clusters in a Planar Triangular Model. Phys. Rev. B 43, 752–759 (1991)
8. Messager, A., Miracle-Sole, S. and Pfister, C.: Correlation Inequalities and Uniqueness of the Equilibrium State for the Plane Rotator Ferromagnetic Model. Commun. Math. Phys. 58, 19–29 (1978)
9. Patrascioiu, A. and Seiler, E.: Phase Structure of Two-Dimensional Spin Models and Percolation. J. Stat. Phys. 69, 573–595 (1992)
10. Sylvester, G.S.: The Ginibre Inequality. Commun. Math. Phys. 73, 105–114 (1980)
11. Wolff, U.: Collective Monte Carlo Updating for Spin-Systems. Phys. Rev. Lett. 62, 361–364 (1989)

Communicated by J. L. Lebowitz
Commun. Math. Phys. 218, 113 – 130 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
The Lp -Theory of the Spectral Shift Function, the Wegner Estimate, and the Integrated Density of States for Some Random Operators J. M. Combes1,2, , P. D. Hislop1,3, , Shu Nakamura4, 1 2 3 4
Centre de Physique Théorique† , CNRS Luminy Case 907, 13288 Marseille Cedex 9, France Département de Mathématiques, Université de Toulon et du Var, 83130 La Garde, France Mathematics Department, University of Kentucky, Lexington, KY 40506-0027, USA Graduate School of Mathematical Sciences, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo 153-8914, Japan
Received: 27 July 2000 / Accepted: 1 November 2000
Abstract: We develop the $L^p$-theory of the spectral shift function, for $p \geq 1$, applicable to pairs of self-adjoint operators whose difference is in the trace ideal $\mathcal{I}_p$, for $0 < p \leq 1$. This result is a key ingredient of a new, short proof of the Wegner estimate applicable to a wide variety of additive and multiplicative random perturbations of deterministic background operators. The proof yields the correct volume dependence of the upper bound. This implies the local Hölder continuity of the integrated density of states at energies in the unperturbed spectral gap. Under an additional condition on the single-site potential, local Hölder continuity is proved at all energies. This new Wegner estimate, together with other, standard results, establishes exponential localization for a new family of models for additive and multiplicative perturbations.

Supported in part by CNRS and NATO Grant CRG-951351. Supported in part by NSF grants INT-9015895 and DMS-9707049 and NATO Grant CRG-951351. Supported in part by JSPS grant Kiban B 09440055. † Unité Propre de Recherche 7061.

1. Introduction and Main Results

One of the original challenges in the theory of randomly perturbed media was the demonstration that solutions to the appropriate equations of motion remain localized in space. This is usually interpreted as meaning that the deterministic spectrum of the corresponding random family of self-adjoint generators of the time evolution group, $\{H_\omega \mid \omega \in \Omega\}$, is pure point almost surely in certain energy intervals. A self-adjoint operator $H$ exhibits spectral localization in an energy interval $I \subset \mathbb{R}$ if the spectrum of $H$ in $I$ is pure point.

The proof of localization for various random systems on $\mathbb{R}^d$ begins with an analysis of finite-volume perturbations $H_\Lambda = H_0 + V_\Lambda$, for a bounded region $\Lambda \subset \mathbb{R}^d$, of a self-adjoint operator $H_0$ describing the background, unperturbed situation. Two estimates on $H_\Lambda$ are needed: 1) a decay estimate on the Green's function of $H_\Lambda$ at far-separated points, with a probability converging to one as $\Lambda \to \mathbb{R}^d$, and 2) an estimate
J. M. Combes, P. D. Hislop, S. Nakamura
on the location of the eigenvalues of $H_\Lambda$. We refer to [1, 12, 13, 16, 31, 19]. This second estimate is called a Wegner estimate. A Wegner estimate is an upper bound on the probability that the spectrum of the local Hamiltonian $H_\Lambda$ intersects an $\eta$-neighborhood of a given energy $E$. A good Wegner estimate is one for which the upper bound depends linearly on the volume $|\Lambda|$, and vanishes as the size $\eta$ of the energy neighborhood shrinks to zero. The linear dependence on the volume is essential for the proof of the existence of the integrated density of states (IDS). The rate of vanishing of the upper bound as $\eta \to 0$ determines the continuity of the IDS.

In this note, we present a new proof of a good Wegner estimate modeled on Wegner's original proof [32], and some new estimates on the spectral shift function related to the single-site perturbation. This new result allows us to prove exponential localization and the local Hölder continuity of the integrated density of states for more models than previously known. In order to prove this result, we develop the $L^p$-theory of the spectral shift function for $p > 1$, which might be of independent interest.

The models that can be treated by this method are described as follows. We can treat both multiplicative (M) and additive (A) perturbations of a background self-adjoint operator $H_0^X$, for $X = M$ or $X = A$. Additively perturbed operators describe electron propagation, and multiplicatively perturbed operators describe the propagation of acoustic and electromagnetic waves. We refer to [9] for a further discussion of the physical interpretation of these operators. For the Wegner estimate, we are interested in perturbations $V_\Lambda$ of a background operator $H_0^X$ that are local with respect to a bounded region $\Lambda \subset \mathbb{R}^d$. Multiplicatively perturbed operators $H_\Lambda^M$ are of the form

$$H_\Lambda^M = A_\Lambda^{-1/2} H_0^M A_\Lambda^{-1/2}, \tag{1.1}$$

where $A_\Lambda = 1 + V_\Lambda$. We assume that $(1 + V_\Lambda)$ is invertible (cf. [9] for a discussion of this condition). Additively perturbed operators $H_\Lambda^A$ are of the form

$$H_\Lambda^A = H_0^A + V_\Lambda. \tag{1.2}$$
The unperturbed, background medium in the multiplicative case is described by a divergence form operator

$$H_0^M = -C_0 \rho_0^{1/2} \nabla \cdot \rho_0^{-1} \nabla \rho_0^{1/2} C_0, \tag{1.3}$$
where $\rho_0$ and $C_0$ are positive functions that describe the unperturbed density and sound velocity. We assume that $\rho_0$ and $C_0$ are sufficiently regular so that $C_0^\infty(\mathbb{R}^d)$ is an operator core for $H_0^M$. The unperturbed, background medium in the additive case is described by a Schrödinger operator $H_0^A$ given by

$$H_0^A = (-i\nabla - A)^2 + W, \tag{1.4}$$

where $A$ is a vector potential with $A \in L^2_{\mathrm{loc}}(\mathbb{R}^d)$, and $W = W_+ - W_-$ is a background potential with $W_- \in K_d(\mathbb{R}^d)$ and $W_+ \in K_d^{\mathrm{loc}}(\mathbb{R}^d)$.

The perturbations $V_\Lambda$ can be of the following two forms. Let $\tilde{\Lambda}$ denote the lattice points in the region $\Lambda$, so that $\tilde{\Lambda} \equiv \Lambda \cap \mathbb{Z}^d$. For any $k \in \mathbb{Z}^d$, let $\Lambda_L(k)$ denote the cube of side length $L$ centered at the point $k$. The local perturbation in the Anderson-type model is defined by:

$$V_\Lambda(x) = \sum_{i \in \tilde{\Lambda}} \lambda_i(\omega)\, u_i(x - i - \xi_i(\omega')), \tag{1.5}$$
Lp -Theory of Spectral Shift Function, Wegner Estimate, and IDS
provided the random variables $\xi_i(\omega')$, modeling thermal vibrations, are small enough so that the positivity condition (H3) (given ahead) holds. The functions $u_i$ are nonnegative and compactly supported in a neighborhood of the origin. They need not be of the form $u_i(x) = u(x)$, for some fixed $u$, since ergodicity plays no role in the Wegner estimate. For the breather-type model, the local potential is defined by

$$V_\Lambda(x) = \sum_{i \in \tilde{\Lambda}} u_i(\lambda_i(\omega)(x - i)). \tag{1.6}$$
The single-site potentials $u_i$ must satisfy a repulsivity condition stated in Hypothesis (H3). We remark that the dependence of the Anderson-type potential on the random variables $\{\lambda_k(\omega) \mid k \in \tilde{\Lambda}\}$ is linear, and hence analytic, whereas the dependence of the breather-type potential on the random variables $\{\lambda_k(\omega) \mid k \in \tilde{\Lambda}\}$ is nonanalytic when the single-site potentials $u_k$ have compact support. When the sum in (1.5) or (1.6) extends over all the lattice points $\mathbb{Z}^d$, we write $V_\omega$ for the potential and $H_\omega^X$ for the operator, with the operator for $X = A$ given by (1.2) with $V_\Lambda$ replaced by $V_\omega$, and similarly for $X = M$.

We will put conditions on the random variables $\lambda_k(\omega)$ and the single-site potentials $u_k$. We note that the Wegner estimate is a local estimate, so that for a finite region $\Lambda \subset \mathbb{R}^d$, only a finite number of single-site potentials are involved. We denote the ball of radius $R > 0$ about the origin by $B(R)$.

(H1a) The self-adjoint operator $H_0^X$ is essentially self-adjoint on $C_0^\infty(\mathbb{R}^d)$, for $X = A$ and for $X = M$. The operator $H_0^X$ is semi-bounded and has an open spectral gap. That is, there exist constants $-\infty < M_0 \leq C_0 \leq B_- < B_+ < C_1 \leq \infty$ so that $\sigma(H_0^X) \subset [M_0, \infty)$, and $\sigma(H_0^X) \cap (C_0, C_1) = (C_0, B_-] \cup [B_+, C_1)$.

(H1b) The self-adjoint operator $H_0^X$ is essentially self-adjoint on $C_0^\infty(\mathbb{R}^d)$, and $H_0^X$ is semi-bounded with $\sigma(H_0^X) \subset [M_0, \infty)$, for some $M_0 > -\infty$.

(H2) The operator $H_0^X$ is locally compact in the sense that for any $\chi \in L^\infty(\mathbb{R}^d)$ with compact support, the operator $\chi (H_0^X - M_1)^{-1}$ is compact for any $M_1 < M_0$.

(H3) The single-site potentials $u_k$, $k \in \mathbb{Z}^d$, are nonzero. For the Anderson-type model (1.5), we assume that there exists $R > 0$ so that $u_k \in C_0(B(R))$, and that $u_k \geq 0$ for each $k \in \mathbb{Z}^d$. Furthermore, we assume that the family $\{u_k \mid k \in \mathbb{Z}^d\}$ is equicontinuous. For the breather-type model (1.6), we assume that there exists $R > 0$ so that $u_k \in C^1(B(R) \setminus \{0\})$ (there may be a singularity at the origin), and that $u_k \geq 0$ for each $k \in \mathbb{Z}^d$. There exists $\varepsilon_0 > 0$ so that $-x \cdot \nabla u_k - \varepsilon_0 u_k \geq 0$. Finally, we assume that there exists $\delta > 0$ so that the family $\{x \cdot \nabla u_k \mid k \in \mathbb{Z}^d\}$ is equicontinuous on the set $\{x \mid -x \cdot \nabla u_k \leq \delta\}$.

(H3a) In addition to (H3) for the Anderson-type model, we assume that there exists $\varepsilon_1 > 0$ so that $u_k \geq \varepsilon_1$ on $\Lambda_1(0)$, where $\Lambda_r(k) \equiv \{x \in \mathbb{R}^d \mid |x_j - k_j| < r/2,\ j = 1, \ldots, d\}$.

(H4) The conditional probability distribution of $\lambda_0$, conditioned on $\lambda_0^\perp \equiv \{\lambda_i \mid i \neq 0\}$, is absolutely continuous with respect to Lebesgue measure. The density $h_0$ has compact support $[m, M]$, for some constants $(m, M)$ with $-\infty < m < M < \infty$. The density $h_0$ satisfies $\|h_0\|_\infty < \infty$, where the sup norm is defined with respect to the probability measure $\mathbb{P}$.
We refer to the review article of Kirsch [17] for a proof of the fact that hypotheses (H3)–(H4) imply the essential self-adjointness of $H_\omega^A$ on $C_0^\infty(\mathbb{R}^d)$ (see [9] for the $X = M$ case). We make two comments about the hypotheses. First, we will assume that the random variables are independent and identically distributed, but the results hold in the correlated case, and in the case that the supports of the single-site potentials are not necessarily compact (cf. [8, 20]). Second, the open spectral gap condition (H1a) for $\sigma(H_0^X)$ is not needed provided $H_0^X$ is semibounded from below. In this case, a lower-semibounded operator $H_0^X$ always has at least one open gap $(-\infty, \inf \sigma(H_0^X))$. If only hypotheses (H1b)–(H4) hold, together with the additional assumptions (H5)–(H6) given ahead, localization is proven near the bottom of the spectrum of $H_0^X$.

The main result under these hypotheses on the unperturbed operator $H_0^X$, and the local perturbation $V_\Lambda$, is the following theorem.

Theorem 1.1. Assume (H1a)–(H4). For any $E_0 \in G = (B_-, B_+)$, for any $q > 1$, and for any $\eta < \frac{1}{2} \operatorname{dist}(E_0, \sigma(H_0^X))$, there exists a finite constant $C_{E_0}$, depending on $[\operatorname{dist}(\sigma(H_0^X), E_0)]^{-1}$, the dimension $d$, and $q > 1$, such that:

$$\mathbb{P}\{\operatorname{dist}(E_0, \sigma(H_\Lambda^X)) \leq \eta\} \leq C_{E_0}\, \eta^{1/q}\, |\Lambda|. \tag{1.7}$$
For Anderson-type potentials, if we assume (H1b)–(H4), and, in addition, that the single-site potential satisfies (H3a), then the result (1.7) holds for $H_\omega^X|_\Lambda$, with Dirichlet boundary conditions on $\partial\Lambda$, and for any $E_0 \in \mathbb{R}$.

This result was proven for $H_\Lambda^A$, and for $H_\Lambda^M$, with the Anderson-type perturbation (1.5) and $q = 1$, that is, Lipschitz continuity of the integrated density of states, in [1] and [9], respectively. Although localization for $X = A$ and the breather model (1.6) was proven in [7], Theorem 1.1 is new for the breather model for $X = A$, and Theorems 1.1–1.3 are new for $X = M$ with the breather-type perturbation.

There are several prior results on the Wegner estimate for multidimensional, continuous Schrödinger operators. Kotani and Simon [25] proved a Wegner estimate with a $|\Lambda|$-dependence for Anderson models with overlapping single-site potentials. This condition was removed and extensions were made to the band-edge case in [5] and [1]. An extension to multiplicative perturbations was made in [9, 11–13]. These methods require a spectral averaging theorem (cf. [7] and references therein). Wegner's original proof [32] for Anderson models did not require spectral averaging. Following Wegner's argument, Kirsch gave a nice, short proof of the Wegner estimate in [18], but obtained a $|\Lambda|^2$-dependence. Recently, Stollmann [30] presented a short, elementary proof of the Wegner estimate for Anderson-type models with singular single-site probability distributions that are assumed to be simply Hölder continuous. He also obtains a $|\Lambda|^2$-dependence. These proofs, and the proof in this paper, do not require spectral averaging. It is not clear, however, how to extend the methods of this paper in order to prove a Wegner estimate for singular distributions with the correct volume dependence.

In order to prove a localization result, we need to assume that the family of operators $\{H_\omega^X \mid \omega \in \Omega\}$ has a deterministic spectrum $\Sigma$. For the models described here this is ensured if, for example, the single-site potentials satisfy $u_k(x) = u(x - k)$, for all $k \in \mathbb{Z}^d$. Given the Wegner estimate, the existence of localized states near the band edge for these models follows provided we make two other hypotheses.
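(H5) The density $h_0$ decays sufficiently rapidly near $m$ and near $M$ in the following sense:

$$0 < \mathbb{P}\{|\lambda - m| < \varepsilon\} \leq \varepsilon^{3d/2 + \beta}, \qquad 0 < \mathbb{P}\{|\lambda - M| < \varepsilon\} \leq \varepsilon^{3d/2 + \beta},$$

for some $\beta > 0$.

(H6) There exist constants $B'_\pm$ satisfying $B_- < B'_- < B'_+ < B_+$ such that

$$\Sigma \cap \{(B_-, B'_-) \cup (B'_+, B_+)\} \neq \emptyset.$$

In light of hypothesis (H6), we define the band edges of the almost sure spectrum $\Sigma$ near the gap $G$, as follows:

$$\tilde{B}_- \equiv \sup\{E \leq B'_- \mid E \in \Sigma\}, \qquad \text{and} \qquad \tilde{B}_+ \equiv \inf\{E \geq B'_+ \mid E \in \Sigma\}.$$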
Theorem 1.2. Assume (H1a)–(H6). There exist constants $E_\pm$ satisfying $B_- \leq E_- < \tilde{B}_-$ and $\tilde{B}_+ < E_+ \leq B_+$ such that $\Sigma \cap (E_-, E_+)$ is pure point with exponentially decaying eigenfunctions.

Theorem 1.3. Assume (H1a)–(H6), and that the model is ergodic. The integrated density of states is Hölder continuous of order $1/q$, for any $q > 1$, on the interval $(B_-, B_+)$. For Anderson-type potentials, if we assume (H1b)–(H6), ergodicity, and, in addition, that the single-site potential satisfies (H3a), then the integrated density of states is locally Hölder continuous of order $1/q$, for any $q > 1$, on $\Sigma$.

The existence of the integrated density of states for additively perturbed, infinite-volume, ergodic models like (1.2) is well-known. A textbook account is found in the lecture notes of Kirsch [17]. The same proof applies to the multiplicatively perturbed model (1.1) with minor modifications. Recently, Nakamura [26] showed the uniqueness of the IDS, in the sense that it is independent of Dirichlet or Neumann boundary conditions, in the case of Schrödinger operators with magnetic fields. The same proof applies to the multiplicatively perturbed model. It is interesting to note that the proof uses the $L^1$-theory of the spectral shift function.

We mention the recent papers of Kostrykin and Schrader [21–23] in which they construct a spectral shift density for random Schrödinger operators with Anderson-type potentials. Their idea is to use the spectral shift function $\xi_\Lambda(\lambda)$ for the pair $(H_\Lambda = H_0 + V_\Lambda, H_0)$, and to study the thermodynamic limit

$$\lim_{|\Lambda| \to \infty} \frac{\xi_\Lambda(\lambda)}{|\Lambda|} \equiv \xi(\lambda). \tag{1.8}$$

They prove that this limit (properly defined) exists and is deterministic. Furthermore, they prove that the spectral shift density $\xi(\lambda)$ is related to the IDS through the formula

$$\xi(\lambda) = N_0(\lambda) - N(\lambda), \tag{1.9}$$
where N0 (λ) is the IDS for the free Hamiltonian H0 . Their work provides an alternative proof of the existence of the IDS.
The contents of this paper are as follows. The $L^p$-theory of the spectral shift function for $p > 1$ is developed in Sect. 2. We prove Wegner's estimate in Sect. 3 along the ideas of the original argument. An application of the theory developed in Sect. 2 to the single-site spectral shift function allows us to obtain the correct volume dependence. In Sect. 4, we discuss the localization properties of the eigenfunctions of the operators $H_\Lambda^X$ corresponding to eigenvalues in the unperturbed spectral gap $G$. We also present a slightly generalized version of a comparison theorem of Kirsch, Stollmann, and Stolz [19] that allows one to use single-site potentials of arbitrarily small support. Some simple proofs of the trace class estimates used in the proof of Wegner's estimate in Sect. 3 are presented in Sect. 5. Extensions of some of the results of this paper to single-site potentials without fixed sign will appear in [14]. We mention a recent preprint of Kostrykin and Schrader [24] in which they study the density of surface states using some of the methods of this paper.

2. The $L^p$-Theory of the Spectral Shift Function, $p \geq 1$

We develop the $L^p$-theory of the spectral shift function for $p \geq 1$. Let us recall the $L^1$-theory, which can be found in the review paper of Birman and Yafaev [3], and the book of Yafaev [33]. Suppose that $H_0$ and $H$ are two self-adjoint operators on a Hilbert space $\mathcal{H}$ having the property that $V \equiv H - H_0$ is in the trace class. Under these conditions, we can define the Krein spectral shift function (SSF) $\xi(\lambda; H, H_0)$ through the perturbation determinant. Let $R_0(z) = (H_0 - z)^{-1}$, for $\operatorname{Im} z \neq 0$. We then have

$$\xi(\lambda; H, H_0) \equiv \frac{1}{\pi} \lim_{\varepsilon \to 0^+} \arg \det (1 + V R_0(\lambda + i\varepsilon)). \tag{2.1}$$

It is well-known that

$$\int_{\mathbb{R}} \xi(\lambda; H, H_0)\, d\lambda = \operatorname{Tr} V, \tag{2.2}$$

and that the SSF satisfies the $L^1$-estimate:

$$\|\xi(\cdot\,; H, H_0)\|_{L^1} \leq \|V\|_1. \tag{2.3}$$
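The identities (2.2) and (2.3) can be checked numerically in a finite-dimensional toy setting. The sketch below is only an illustration under stated assumptions: it takes $H_0$ and $V$ to be real symmetric matrices and uses the finite-dimensional identification $\xi(\lambda) = N_{H_0}(\lambda) - N_H(\lambda)$, the difference of eigenvalue counting functions (the same sign convention as in (1.9)). The helper names `ssf_steps` and `random_symmetric` are hypothetical and not taken from the paper.

```python
import numpy as np

def ssf_steps(H0, H):
    """Piecewise-constant SSF for symmetric matrices (toy assumption).

    Returns (xi, widths): the value of xi = N_{H0} - N_H on each interval
    between consecutive eigenvalues of H0 and H, and the interval lengths.
    """
    e0 = np.linalg.eigvalsh(H0)          # sorted eigenvalues of H0
    e1 = np.linalg.eigvalsh(H)           # sorted eigenvalues of H
    pts = np.sort(np.concatenate([e0, e1]))
    mids = 0.5 * (pts[:-1] + pts[1:])    # one sample point per interval
    widths = np.diff(pts)
    # Counting functions N(lambda) = #{eigenvalues <= lambda}
    xi = (np.searchsorted(e0, mids, side="right")
          - np.searchsorted(e1, mids, side="right"))
    return xi, widths

def random_symmetric(n, rng):
    a = rng.standard_normal((n, n))
    return 0.5 * (a + a.T)

rng = np.random.default_rng(0)
H0 = random_symmetric(6, rng)
V = 0.3 * random_symmetric(6, rng)
xi, widths = ssf_steps(H0, H0 + V)

integral = float(np.sum(xi * widths))            # left side of (2.2)
l1_norm = float(np.sum(np.abs(xi) * widths))     # left side of (2.3)
trace_norm = float(np.sum(np.abs(np.linalg.eigvalsh(V))))  # ||V||_1

print(integral, float(np.trace(V)))
print(l1_norm, trace_norm)
```

Because $\xi$ is a step function here, both integrals are computed exactly as finite sums over the intervals between eigenvalues; no quadrature grid is needed.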
Let $A$ be a compact operator on $\mathcal{H}$ and let $\mu_j(A)$ denote the $j$th singular value of $A$. We say that $A \in \mathcal{I}_{1/p}$, for some $p \geq 1$, if

$$\sum_j \mu_j(A)^{1/p} < \infty. \tag{2.4}$$

Clearly, this means that the singular values of $A$ converge very rapidly to zero. We define a nonnegative functional on the ideal $\mathcal{I}_{1/p}$ by

$$\|A\|_{1/p} \equiv \Big( \sum_j \mu_j(A)^{1/p} \Big)^p. \tag{2.5}$$

For $p > 1$, this functional is not a norm but satisfies

$$\|A + B\|_{1/p}^{1/p} \leq \|A\|_{1/p}^{1/p} + \|B\|_{1/p}^{1/p}. \tag{2.6}$$
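To make the distinction between (2.5) and (2.6) concrete, the following sketch evaluates both for small matrices. It is an illustration only, with hypothetical helper names: two orthogonal rank-one projections show that the triangle inequality fails for $\|\cdot\|_{1/p}$ itself when $p > 1$, while a pair of random matrices is used to spot-check the subadditivity (2.6) of the $1/p$-th power.

```python
import numpy as np

def quasi_norm_power(A, p):
    """sum_j mu_j(A)^(1/p), i.e. the quantity ||A||_{1/p}^{1/p} of (2.6)."""
    s = np.linalg.svd(A, compute_uv=False)   # singular values mu_j(A)
    return float(np.sum(s ** (1.0 / p)))

def quasi_norm(A, p):
    """The functional ||A||_{1/p} of (2.5)."""
    return quasi_norm_power(A, p) ** p

# Two orthogonal rank-one projections: the triangle inequality fails for p = 2.
P1 = np.diag([1.0, 0.0])
P2 = np.diag([0.0, 1.0])
norm_of_sum = quasi_norm(P1 + P2, 2.0)                   # (1 + 1)^2 = 4
sum_of_norms = quasi_norm(P1, 2.0) + quasi_norm(P2, 2.0) # 1 + 1 = 2

# Spot-check of (2.6) for a random pair of matrices and p = 3.
rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
B = rng.standard_normal((5, 5))
lhs = quasi_norm_power(A + B, 3.0)
rhs = quasi_norm_power(A, 3.0) + quasi_norm_power(B, 3.0)

print(norm_of_sum, sum_of_norms)
print(lhs, rhs)
```

The projection example is why the metric on $\mathcal{I}_{1/p}$ is built from $\|A - B\|_{1/p}^{1/p}$ rather than from $\|A - B\|_{1/p}$.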
If we define a metric $\rho_{1/p}(A, B) \equiv \|A - B\|_{1/p}^{1/p}$ on $\mathcal{I}_{1/p}$, then the linear space $\mathcal{I}_{1/p}$ is a complete, separable linear metric space. The finite rank operators are dense in $\mathcal{I}_{1/p}$ (cf. [2]). Since $\mathcal{I}_{1/p} \subset \mathcal{I}_1$, for all $p \geq 1$, we refer to $A \in \mathcal{I}_{1/p}$ as being super-trace class. Consequently, we can define the SSF for a pair of self-adjoint operators $H_0$ and $H$ for which $V = H - H_0 \in \mathcal{I}_{1/p}$. Our main theorem is the following.

Theorem 2.1. Suppose that $H_0$ and $H$ are self-adjoint operators so that $V = H - H_0 \in \mathcal{I}_{1/p}$, for some $p \geq 1$. Then, the SSF $\xi(\lambda; H, H_0) \in L^p(\mathbb{R})$, and satisfies the bound

$$\|\xi(\cdot\,; H, H_0)\|_{L^p} \leq \|V\|_{1/p}^{1/p}. \tag{2.7}$$
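Proof. The proof follows the standard proof in the $L^1$-theory, cf. [33]. First, we consider the simple case that $V$ is rank one. Let $V = \lambda(V) |\psi\rangle\langle\psi|$, for some $\psi \in \mathcal{H}$ with $\|\psi\| = 1$, and $\lambda(V) \in \mathbb{R}$. Note that the singular value $\mu(V) = |\lambda(V)|$. By a standard calculation, one finds that

$$\|\xi(\cdot\,; H, H_0)\|_{L^1} = \int_{\mathbb{R}} |\xi(\lambda; H, H_0)|\, d\lambda = \mu(V), \tag{2.8}$$

and the SSF is bounded,

$$|\xi(\lambda; H, H_0)| \leq 1. \tag{2.9}$$

As a consequence, since $|\xi|^p \leq |\xi|$ whenever $|\xi| \leq 1$ and $p \geq 1$, we have

$$\|\xi(\cdot\,; H, H_0)\|_{L^p} \leq \mu(V)^{1/p}. \tag{2.10}$$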
We next consider the case of a finite rank perturbation $V_N$ of the form

$$V_N = \sum_{j=1}^{N} \mu_j(V_N)\, |\psi_j\rangle\langle\psi_j|, \tag{2.11}$$

where $N < \infty$. Without loss of generality, we also assume that the coefficients $\mu_j(V_N) \geq 0$, and that $\|\psi_j\| = 1$. Let $H_N = H_0 + V_N$. We define a sequence of Hamiltonians $H_K$ by

$$H_K \equiv H_0 + \sum_{k=1}^{K} \mu_k(V_N)\, |\psi_k\rangle\langle\psi_k|, \qquad K = 1, \ldots, N. \tag{2.12}$$

The factorization of the perturbation determinant into the product of determinants of rank-one perturbations implies that the spectral shift function for $H_N$ can be written as the sum

$$\xi(\lambda; H_N, H_0) = \sum_{j=1}^{N} \xi(\lambda; H_j, H_{j-1}). \tag{2.13}$$
Each term in the sum on the right in (2.13) is the SSF for a rank one perturbation and hence satisfies the bounds (2.8)–(2.10). It follows from those bounds and the Minkowski inequality ($p \geq 1$) that

$$\|\xi(\cdot\,; H_N, H_0)\|_{L^p} = \Big( \int_{\mathbb{R}} |\xi(\lambda; H_N, H_0)|^p\, d\lambda \Big)^{1/p} \leq \sum_{j=1}^{N} \Big( \int_{\mathbb{R}} |\xi(\lambda; H_j, H_{j-1})|^p\, d\lambda \Big)^{1/p} \leq \sum_{j=1}^{N} |\mu_j(V_N)|^{1/p}, \tag{2.14}$$

from which it follows that

$$\|\xi(\cdot\,; H_N, H_0)\|_{L^p} \leq \|V_N\|_{1/p}^{1/p}. \tag{2.15}$$
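The rank-one bounds (2.8)–(2.10) and the finite-rank bound (2.15) can be illustrated for matrices. The sketch below is a toy check under stated assumptions, not part of the proof: $H_0$ is a random real symmetric matrix, $V$ is a nonnegative rank-three perturbation built from orthonormal vectors, and the SSF is identified with the difference of eigenvalue counting functions, $\xi = N_{H_0} - N_H$. The helper names `ssf_steps` and `lp_norm` are hypothetical.

```python
import numpy as np

def ssf_steps(H0, H):
    """Values of xi = N_{H0} - N_H between consecutive eigenvalues, and widths."""
    e0 = np.linalg.eigvalsh(H0)
    e1 = np.linalg.eigvalsh(H)
    pts = np.sort(np.concatenate([e0, e1]))
    mids = 0.5 * (pts[:-1] + pts[1:])
    xi = (np.searchsorted(e0, mids, side="right")
          - np.searchsorted(e1, mids, side="right"))
    return xi, np.diff(pts)

def lp_norm(xi, widths, p):
    """Exact L^p norm of the step function described by (xi, widths)."""
    return float(np.sum(np.abs(xi) ** p * widths) ** (1.0 / p))

rng = np.random.default_rng(2)
n, p = 8, 2.5
a = rng.standard_normal((n, n))
H0 = 0.5 * (a + a.T)

# Orthonormal psi_1, psi_2, psi_3 (columns of Q) and coefficients mu_j >= 0.
Q, _ = np.linalg.qr(rng.standard_normal((n, 3)))
mu = np.array([0.7, 0.4, 0.1])
V = (Q * mu) @ Q.T                      # V = sum_j mu_j |psi_j><psi_j|

# Finite-rank bound (2.15): ||xi||_{L^p} <= sum_j mu_j^(1/p).
xi, widths = ssf_steps(H0, H0 + V)
lhs = lp_norm(xi, widths, p)
rhs = float(np.sum(mu ** (1.0 / p)))

# Rank-one case (2.8)-(2.10): |xi| <= 1 and ||xi||_{L^p} = mu^(1/p) here.
V1 = mu[0] * np.outer(Q[:, 0], Q[:, 0])
xi1, widths1 = ssf_steps(H0, H0 + V1)
rank_one_norm = lp_norm(xi1, widths1, p)

print(lhs, rhs)
print(int(np.max(np.abs(xi1))), rank_one_norm, mu[0] ** (1.0 / p))
```

For a nonnegative rank-one perturbation the eigenvalues of $H_0$ and $H$ interlace, so $\xi$ takes only the values 0 and 1; this is why the rank-one $L^p$ norm in the example equals $\mu^{1/p}$ rather than merely being bounded by it.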
As for the general case, let $V \in \mathcal{I}_{1/p}$. Such a potential is the limit of a sequence of finite rank operators $V_N$, of the form given in (2.11), and satisfying

$$\|V - V_N\|_{1/p} \leq \Big( \sum_{j=N+1}^{\infty} |\mu_j(V)|^{1/p} \Big)^p. \tag{2.16}$$

Since the series is convergent, the difference vanishes as $N \to \infty$. It follows that

$$\|\xi(\cdot\,; H, H_0)\|_{L^p} \leq \lim_{N \to \infty} \sum_{j=1}^{N} |\mu_j(V)|^{1/p} = \|V\|_{1/p}^{1/p}, \tag{2.17}$$
proving the result.

D. Hundertmark and B. Simon have recently proved an optimal $L^p$-bound on the spectral shift function [15].

3. A Proof of Wegner's Estimate

In his original article [32], Wegner introduced the clever device of interchanging differentiation with respect to energy with differentiation with respect to the random variables on which the random potential depends. Since an expectation is taken in the course of the proof, these derivatives in the random variables can be removed by an integration with respect to the random variables. This requires some smoothness on the distribution of the random variables. At one point in the standard proof, a Weyl-type estimate is used to bound the trace of the smoothed-out spectral projector. We replace this with a finer estimate by first expressing the quantity in terms of a spectral shift function corresponding to a perturbation by a single-site potential, and then by estimating the $L^p$-norm of this spectral shift function, for $p > 1$, as described in Sect. 2. The proof below uses some of the modifications of Wegner's proof [32] introduced by Kirsch [18]. We note that this proof of the Wegner estimate does not require spectral averaging [7]. It does, however, rely upon some monotonicity of the eigenvalues with respect to the random variables (for comparison, see [4]).
In the case of the Anderson-type potential satisfying (H1b)–(H4), with the single-site potential satisfying (H3a), the proof proceeds along the same lines with the following modification. Instead of working with the local operator $H_\Lambda = H_0 + V_\Lambda$ on the Hilbert space $L^2(\mathbb{R}^d)$, we take $H_\Lambda = (H_0 + V_\omega)|_\Lambda$, with Dirichlet boundary conditions on $\partial\Lambda$, on $L^2(\Lambda)$. The energy $E_0$ can be chosen to be any energy and the comments concerning the gap $G$ can be omitted. Finally, the argument in parts 3 and 4 of the proof is significantly simpler. The auxiliary potential defined in (3.12) is bounded below by $\varepsilon_1 \chi_\Lambda$, where $\chi_\Lambda$ is the characteristic function of $\Lambda$, that is, the identity on $L^2(\Lambda)$. The Comparison Theorem of Kirsch, Stollmann, and Stolz [19] is not needed in this case.

Proof of Theorem 1.1.

1. Let $G = (B_-, B_+)$ be the spectral gap of $H_0$ as in Hypothesis (H1a) and fix an energy $E_0 \in G$. Since the local potential $V_\Lambda$ is a relatively compact perturbation of $H_0$, its effect is to introduce at most finitely many eigenvalues $E_j(\Lambda)$ into the gap $G$. Let $\eta > 0$ be chosen so that $[E_0 - 2\eta, E_0 + 2\eta] \subset G$. We denote by $I_\eta$ the interval $[E_0 - \eta, E_0 + \eta]$. We want to estimate

$$\mathbb{P}\{\operatorname{dist}(E_0, \sigma(H_\Lambda)) < \eta\}. \tag{3.1}$$
This probability is expressible in terms of the finite-rank spectral projector for the interval $I_\eta$ and $H_\Lambda$, which we write as $E_\Lambda(I_\eta)$. This projection is a random variable, but we suppress this in the notation. We now apply Chebyshev's inequality to the random variable $\operatorname{Tr}(E_\Lambda(I_\eta))$ and obtain

$$\mathbb{P}\{\operatorname{dist}(E_0, \sigma(H_\Lambda)) < \eta\} = \mathbb{P}\{\operatorname{Tr}(E_\Lambda(I_\eta)) \geq 1\} \leq \mathbb{E}\{\operatorname{Tr}(E_\Lambda(I_\eta))\}. \tag{3.2}$$
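The Chebyshev step (3.2) says only that the indicator of the event "at least one eigenvalue in $I_\eta$" is dominated by the number of eigenvalues in $I_\eta$. The sketch below illustrates this with a Monte Carlo estimate. It is a toy stand-in and not the paper's setting: it assumes a one-dimensional discrete Anderson model (lattice Laplacian plus i.i.d. uniform site potential) in place of the continuum operators, and the names `anderson_1d` and `count_in_window` are hypothetical.

```python
import numpy as np

def anderson_1d(L, rng):
    """Discrete Laplacian on L sites plus i.i.d. uniform[0, 1] site potential."""
    hop = -np.ones(L - 1)
    diagonal = 2.0 + rng.uniform(0.0, 1.0, L)
    return np.diag(diagonal) + np.diag(hop, 1) + np.diag(hop, -1)

def count_in_window(H, E0, eta):
    """Tr E(I_eta): number of eigenvalues of H in [E0 - eta, E0 + eta]."""
    e = np.linalg.eigvalsh(H)
    return int(np.sum(np.abs(e - E0) <= eta))

rng = np.random.default_rng(3)
E0, eta, L, samples = 1.0, 0.02, 30, 400
counts = np.array([count_in_window(anderson_1d(L, rng), E0, eta)
                   for _ in range(samples)])

prob_hit = float(np.mean(counts >= 1))   # empirical P{Tr E(I_eta) >= 1}
mean_count = float(np.mean(counts))      # empirical E{Tr E(I_eta)}

print(prob_hit, mean_count)
```

The inequality between the two printed numbers holds sample by sample, so it survives any choice of seed; the substance of the Wegner estimate lies in bounding the second number by a constant times $\eta^{1/q} |\Lambda|$.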
2. We now proceed to estimate the expectation of the trace, following the original argument of Wegner [32] as modified by Kirsch [18], and using some results of [19]. By our assumptions, there exists a constant $M_0 > -\infty$ so that $\inf \sigma(H_\Lambda) > M_0$, uniformly in $\Lambda$. Let $\rho$ be a nonnegative, smooth, monotone decreasing function such that $\rho(x) = 1$, for $x < -\eta/2$, and $\rho(x) = 0$, for $x \geq \eta/2$. Furthermore, we can assume that $\rho$ has compact support: we take $\rho(x) = 0$ for $x \ll M_0$. By the functional calculus, we have

$$\rho(H_\Lambda - E_0 - 3\eta/2) - \rho(H_\Lambda - E_0 + 3\eta/2) = \sum_{j} \{\rho(E_j(\Lambda) - E_0 - 3\eta/2) - \rho(E_j(\Lambda) - E_0 + 3\eta/2)\}\, P_j(\Lambda), \tag{3.3}$$

where, due to the support of $\rho$, the sum is over the eigenvalues $E_j(\Lambda)$ of $H_\Lambda$ in $[E_0 - 2\eta, E_0 + 2\eta]$. The operators $P_j(\Lambda)$ are the projectors onto the corresponding finite dimensional eigenspaces spanned by normalized eigenfunctions $\varphi_j$. Consequently, the difference of the two operators on the left in (3.3) is trace class. Note that this difference $\rho(H_\Lambda - E_0 - 3\eta/2) - \rho(H_\Lambda - E_0 + 3\eta/2)$ is, roughly, the number of eigenvalues of $H_\Lambda$ less than $E_0 + 2\eta$ minus the number of eigenvalues of $H_\Lambda$ less than $E_0 - 2\eta$. However, the operator $H_\Lambda$ may have continuous spectrum to the left of $E_0 - 2\eta$, so neither operator $\rho(H_\Lambda - E_0 - 3\eta/2)$, nor $\rho(H_\Lambda - E_0 + 3\eta/2)$, by itself, is in the trace class. The coefficient of $P_j(\Lambda)$ on the right side of (3.3) is always nonnegative and precisely equal to one for $E_j(\Lambda) \in I_\eta$, so we have

$$\operatorname{Tr}(E_\Lambda(I_\eta)) \leq \operatorname{Tr}(\rho(H_\Lambda - E_0 - 3\eta/2) - \rho(H_\Lambda - E_0 + 3\eta/2)). \tag{3.4}$$
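The three properties of the coefficient in (3.3) used to obtain (3.4) (nonnegative everywhere, equal to one on $I_\eta$, zero outside $[E_0 - 2\eta, E_0 + 2\eta]$) can be verified for an explicit switch function. The sketch below is one possible construction and an assumption of this illustration: it builds $\rho$ from the standard $e^{-1/t}$ bump, and for simplicity it omits the cutoff $\rho(x) = 0$ for $x \ll M_0$, which does not affect the coefficient since both terms equal one far to the left.

```python
import numpy as np

def _bump(t):
    """exp(-1/t) for t > 0 and 0 otherwise (a smooth function on the line)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    out[pos] = np.exp(-1.0 / t[pos])
    return out

def rho(x, eta):
    """Smooth, monotone decreasing switch: 1 for x <= -eta/2, 0 for x >= eta/2."""
    u = (np.asarray(x, dtype=float) + 0.5 * eta) / eta
    return 1.0 - _bump(u) / (_bump(u) + _bump(1.0 - u))

def coefficient(E, E0, eta):
    """Coefficient of P_j in (3.3) for an eigenvalue located at E."""
    return rho(E - E0 - 1.5 * eta, eta) - rho(E - E0 + 1.5 * eta, eta)

E0, eta = 0.0, 0.1
E = np.linspace(-0.5, 0.5, 2001)
c = coefficient(E, E0, eta)

inside = np.abs(E - E0) <= eta           # eigenvalue in I_eta
outside = np.abs(E - E0) >= 2.0 * eta    # eigenvalue outside [E0-2eta, E0+2eta]
print(float(c.min()), float(c[inside].min()), float(np.abs(c[outside]).max()))
```

In other words, the operator difference in (3.3) is a smoothed version of the spectral projector onto $I_\eta$ that dominates it, which is what makes the later differentiation in the energy variable possible.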
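Although the counting function for eigenvalues is not differentiable, we will use the fact that this smooth approximation is differentiable with respect to the energy.

3. Returning to the expectation of the trace in (3.4), we can now bound it above by

$$\mathbb{E}\{\operatorname{Tr}(E_\Lambda(I_\eta))\} \leq \mathbb{E}\{\operatorname{Tr}[\rho(H_\Lambda - E_0 - 3\eta/2) - \rho(H_\Lambda - E_0 + 3\eta/2)]\} \leq \mathbb{E}\Big\{ \int_{-3\eta/2}^{3\eta/2} \operatorname{Tr}\{-\rho'(H_\Lambda - E_0 - t)\}\, dt \Big\}. \tag{3.5}$$

We note that by the spectral theorem, the support of $\rho'(x)$ contributing to the trace in (3.5) lies in the interval $[-\eta/2, \eta/2]$, and, by choice of $\rho$, the derivative is nonpositive there. In order to evaluate the trace, we consider the operator $H_\Lambda$ as a function of the random coupling constants and rewrite differentiation with respect to the energy $t$ in terms of differentiation with respect to the coupling constants $\lambda_k$, $k \in \tilde{\Lambda} \equiv \Lambda \cap \mathbb{Z}^d$. For this, we use the Gohberg–Krein formula (cf. Simon [27]) giving

$$\sum_{k \in \tilde{\Lambda}} \frac{\partial}{\partial \lambda_k} \operatorname{Tr} \rho(H_\Lambda - E_0 - t) = \operatorname{Tr}\Big\{ \rho'(H_\Lambda - E_0 - t) \sum_{k \in \tilde{\Lambda}} \frac{\partial H_\Lambda}{\partial \lambda_k} \Big\}. \tag{3.6}$$

The proof of this result is similar to the proof in [27]. It uses the fact that for $g \in C_0^\infty(\mathbb{R})$, we have that $(\partial H_\Lambda / \partial \lambda_k)\, g(H_\Lambda)$ is trace class, as follows from Proposition 5.1. We use the expansion in (3.3) to evaluate the trace on the right in (3.6) and write

$$\operatorname{Tr}\Big\{ \rho'(H_\Lambda - E_0 - t) \frac{\partial H_\Lambda}{\partial \lambda_k} \Big\} = \sum_{j} \rho'(E_j(\Lambda) - E_0 - t)\, \Big\langle \varphi_j, \frac{\partial H_\Lambda}{\partial \lambda_k} \varphi_j \Big\rangle, \tag{3.7}$$

where, as above, the compactness of the support of $\rho'$ implies that the sum is finite and over the eigenvalues of $H_\Lambda$ in the interval $[E_0 - 2\eta, E_0 + 2\eta]$, including multiplicity, with normalized eigenfunctions $\varphi_j$. For the case of additive perturbations, the expectation of the derivative of $H_\Lambda$ in the state $\varphi_j$ on the right side of (3.7) is equal to

$$\Big\langle \varphi_j, \frac{\partial V_\Lambda}{\partial \lambda_k} \varphi_j \Big\rangle,$$

and for multiplicative perturbations, it is equal to

$$\Big\langle \varphi_j, \frac{\partial V_\Lambda}{\partial \lambda_k} (1 + V_\Lambda)^{-1} \varphi_j \Big\rangle.$$

We recall that it follows from the hypotheses in the multiplicative case that $(1 + V_\Lambda) > C_1 > 0$, for some constant $C_1$. Hence, in the next step, we need to estimate the sum

$$\sum_{k \in \tilde{\Lambda}} \Big\langle \varphi_j, \frac{\partial V_\Lambda}{\partial \lambda_k} \varphi_j \Big\rangle$$

from below so we can solve (3.7) for an upper bound on $-\operatorname{Tr}\{\rho'(H_\Lambda - E_0 - t)\} \geq 0$. The Anderson-type potential depends linearly on the coupling constants so that

$$\frac{\partial V_\Lambda}{\partial \lambda_k} = u_k. \tag{3.8}$$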
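For the breather-type model, the derivative is

$$\frac{\partial V_\Lambda}{\partial \lambda_k} = (x - k) \cdot \nabla u_k(\lambda_k(\omega)(x - k)). \tag{3.9}$$

It follows from this that for the Anderson-type potential, we need to estimate from below the sum

$$\sum_{k \in \tilde{\Lambda}} \langle \varphi_j, u_k \varphi_j \rangle, \tag{3.10}$$

and, for the breather-type potential, the sum

$$-\sum_{k \in \tilde{\Lambda}} \langle \varphi_j, (x - k) \cdot \nabla u_k(\lambda_k(\omega)(x - k))\, \varphi_j \rangle. \tag{3.11}$$

In light of (3.10)–(3.11), we define an auxiliary potential $\tilde{V}_\Lambda$ for the Anderson-type model by

$$\tilde{V}_\Lambda(x) \equiv \sum_{k \in \tilde{\Lambda}} \frac{\partial V_\Lambda}{\partial \lambda_k}(x) = \sum_{k \in \tilde{\Lambda}} u_k(x - k), \tag{3.12}$$

and, for the breather-type model,

$$\tilde{V}_\Lambda(x) \equiv -\sum_{k \in \tilde{\Lambda}} (x - k) \cdot \nabla u_k(\lambda_k(\omega)(x - k)). \tag{3.13}$$

We note that by (H3) and (H4) there exists a finite constant $C_1 > 0$ such that

$$|V_\Lambda(x)| \leq C_1 |\tilde{V}_\Lambda(x)|, \tag{3.14}$$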
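for any $x \in \mathbb{R}^d$, and for any subset $\Lambda \subset \mathbb{R}^d$.

4. We estimate from below the quantity $\langle \varphi_j, \tilde{V}_\Lambda \varphi_j \rangle$ using the Comparison Theorem of Kirsch, Stollmann, and Stolz (KSS) [19] proved in Sect. 4. Referring to Theorem 4.3, we must construct a comparison potential $V_0$. We do this as follows. We choose some $\varepsilon_2 > 0$ so small that

$$[E_0 - 2\eta, E_0 + 2\eta] \subset (B_- + \varepsilon_2, B_+ - \varepsilon_2), \tag{3.15}$$

and so that, for the breather-type model, $\varepsilon_2 / C_1 \leq \delta$, with $\delta$ as in (H3). We set

$$D = \{x \in \mathbb{R}^d \mid \tilde{V}_\Lambda(x) \geq \varepsilon_2 / C_1\}, \tag{3.16}$$

and we define the comparison potential $V_0$ by

$$V_0(x) = \begin{cases} V_\Lambda(x) & \text{for } x \in D^c, \\ 0 & \text{for } x \in D. \end{cases} \tag{3.17}$$

By (3.14), we then have that $|V_0(x)| \leq \varepsilon_2$, and hence $[E_0 - 2\eta, E_0 + 2\eta] \subset \rho(H_0 + V_0)$. Let $0 < \varepsilon_1 < \varepsilon_2 / C_1$, and we set

$$F = \{x \in \mathbb{R}^d \mid \tilde{V}_\Lambda(x) \leq \varepsilon_1\}. \tag{3.18}$$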
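By the equicontinuity assumption (H3), there exists a finite constant $\theta > 0$, independent of $\Lambda$, so that

$$\operatorname{dist}(D, F) \geq \theta > 0. \tag{3.19}$$

Theorem 4.3 then implies that

$$\langle \varphi_j, \tilde{V}_\Lambda \varphi_j \rangle \geq \varepsilon_1 \langle \varphi_j, (1 - \chi_F) \varphi_j \rangle \geq (\varepsilon_1 / C_2) \|\varphi_j\|^2, \tag{3.20}$$

for a finite constant $C_2 > 0$ that is independent of $\Lambda$ and $j$, and depends only on $\eta$, $\theta$, and $H_0$.

5. Returning to (3.6)–(3.7), we have the lower bound,

$$\sum_{k \in \tilde{\Lambda}} \Big\langle \varphi_j, \frac{\partial V_\Lambda}{\partial \lambda_k} \varphi_j \Big\rangle \geq C_0, \tag{3.21}$$

uniformly in $j$. Since $\rho' \leq 0$, we obtain from (3.6)–(3.7),

$$-\operatorname{Tr}\{\rho'(H_\Lambda - E_0 - t)\} \leq -C_0^{-1} \sum_{k \in \tilde{\Lambda}} \operatorname{Tr}\Big\{ \frac{\partial \rho}{\partial \lambda_k}(H_\Lambda - E_0 - t) \Big\} \leq -C_0^{-1} \sum_{k \in \tilde{\Lambda}} \frac{\partial}{\partial \lambda_k} \operatorname{Tr}\{\rho(H_\Lambda - E_0 - t) - \rho(H_0 - E_0 - t)\}. \tag{3.22}$$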
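We remark that the operator $\{\rho(H_\Lambda - E_0 - t) - \rho(H_0 - E_0 - t)\}$ is trace class since the difference $H_\Lambda - H_0 = V_\Lambda$ has compact support. With this estimate, the right side of (3.5) can be bounded above by

$$-C_0^{-1} \sum_{k \in \tilde{\Lambda}} \int_{-3\eta/2}^{3\eta/2} dt\ \mathbb{E}\Big\{ \frac{\partial}{\partial \lambda_k} \operatorname{Tr}\{\rho(H_\Lambda - E_0 - t) - \rho(H_0 - E_0 - t)\} \Big\}. \tag{3.23}$$

In order to evaluate the expectation, we select one random variable $\lambda_k$, with $k \in \tilde{\Lambda}$, and first integrate with respect to this variable using hypothesis (H4). Because of the positivity of $-\rho'$, and of the density $h_0$, we obtain

$$-\int d\lambda_k\, h_0(\lambda_k) \frac{\partial}{\partial \lambda_k} \operatorname{Tr}\{\rho(H_\Lambda - E_0 - t) - \rho(H_0 - E_0 - t)\} \leq \|h_0\|_\infty \operatorname{Tr}\{\rho(H_\Lambda^{m,k} - E_0 - t) - \rho(H_\Lambda^{M,k} - E_0 - t)\}, \tag{3.24}$$

where $H_\Lambda^{M,k}$ is the local Hamiltonian $H_\Lambda$ with the coupling constant $\lambda_k$ at the $k$th site fixed at its maximum value $M$. Similarly, $H_\Lambda^{m,k}$ denotes the local Hamiltonian with $\lambda_k$ fixed at its minimum value $m$. We note once again that the difference $\{\rho(H_\Lambda^{m,k} - E_0 - t) - \rho(H_\Lambda^{M,k} - E_0 - t)\}$ is trace class. Consequently, we are left with the task of estimating

$$\frac{\|h_0\|_\infty}{C_0} \sum_{k \in \tilde{\Lambda}} \int_{-3\eta/2}^{3\eta/2} dt \prod_{l \neq k} \int h_0(\lambda_l)\, d\lambda_l\ \operatorname{Tr}\{\rho(H_\Lambda^{m,k} - E_0 - t) - \rho(H_\Lambda^{M,k} - E_0 - t)\}. \tag{3.25}$$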
6. The expression involving the trace in (3.25) is basically the number of eigenvalues created in the interval $[E_0 - 2\eta, E_0 + 2\eta]$ by decreasing the $k$th coupling constant from the maximum value $\lambda_k^M$ to the minimum value $\lambda_k^m$. This trace can be rewritten in terms of a spectral shift function as follows. We consider explicitly the Anderson-type additive case. The other cases are treated similarly. We let $H_1 \equiv H_\Lambda^{M,k}$ be the unperturbed operator, and write,

$$H_\Lambda^{m,k} = H_1 + (\lambda_k^m - \lambda_k^M) u_k = H_1 + V. \tag{3.26}$$

Although the difference $V$ is not trace class, it does have compact support. We show in Sect. 5 that the difference of sufficiently large powers of the resolvents of $H_1$ and $H_1 + V$ is not only in the trace class, but is in the super-trace class $\mathcal{I}_{1/p}$, for all $p \geq 1$. Specifically, our assumptions imply that there exists a finite constant $M_0$ such that $(H_1 + M_0)^{-1}$ and $(H_1 + V + M_0)^{-1}$ are bounded. Let us define the function $g(\lambda) = (\lambda + M_0)^{-k}$, for $\lambda \gg M_0$. We prove that for $k > pd/2 + 2$, and $p > 1$,

$$g(H_1 + V) - g(H_1) \in \mathcal{I}_{1/p}. \tag{3.27}$$

The spectral shift function $\xi(\lambda; H_1 + V, H_1)$ is defined for the pair $(H_1, H_1 + V)$ by

$$\xi(\lambda; H_1 + V, H_1) = \operatorname{sgn}(g'(\lambda))\, \xi(g(\lambda); g(H_1 + V), g(H_1)). \tag{3.28}$$

Because the function $\rho$ has compact support, and the difference $\{g(H_1 + V) - g(H_1)\}$ is super-trace class, we can apply the Birman–Krein identity [3] to the trace in (3.25). This gives

$$\operatorname{Tr}\{\rho(H_\Lambda^{m,k} - E_0 - t) - \rho(H_\Lambda^{M,k} - E_0 - t)\} = -\int_{\mathbb{R}} \frac{d}{d\lambda} \rho(\lambda - E_0 - t)\, \xi(\lambda; H_1 + V, H_1)\, d\lambda = -\int_{\mathbb{R}} \frac{d}{d\lambda} \rho(\lambda - E_0 - t)\, \xi(g(\lambda); g(H_1 + V), g(H_1))\, d\lambda. \tag{3.29}$$
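We estimate the integral using the Hölder inequality and the $L^p$-theory of Sect. 2. Let $\tilde{\xi}(\lambda) = \xi(g(\lambda); g(H_1 + V), g(H_1))$, for notational convenience. Let $\chi$ be the characteristic function for the interval $[-\eta, \eta]$ so that we can replace $\rho'$ in (3.29) by $\chi \rho'$. We write $\tilde{\chi}(\lambda) \equiv \chi(\lambda - E_0 - t)$. For any $p > 1$, and $q$ such that $\frac{1}{p} + \frac{1}{q} = 1$, the right side of (3.29) can be bounded above by

$$\Big( \int_{\operatorname{supp} \chi} |\rho'|^q \Big)^{1/q} \Big( \int |\tilde{\xi}(\lambda) \tilde{\chi}(\lambda)|^p \Big)^{1/p} \leq C_0\, \eta^{(1-q)/q}\, \|\tilde{\xi} \tilde{\chi}\|_{L^p}. \tag{3.30}$$

Here, we integrated one power of $\rho'$ using the fact that $-\rho' > 0$ on the support of $\chi$, and we used the fact that $|\rho'| = O(\eta^{-1})$, to obtain

$$\Big( \int_{\operatorname{supp} \chi} |\rho'|^{q-1} |\rho'| \Big)^{1/q} \leq \eta^{(1-q)/q} \Big( -\int_{\operatorname{supp} \chi} \rho' \Big)^{1/q} \leq C_0\, \eta^{(1-q)/q}. \tag{3.31}$$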
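By a simple change of variables, we find

$$\|\tilde{\xi} \tilde{\chi}\|_p = \Big( \int |\xi(g(\lambda); g(H_1 + V), g(H_1))|^p\, \chi(\lambda - E_0 - t)\, d\lambda \Big)^{1/p} \leq C_1 \Big( \int_{\mathbb{R}} |\xi(\lambda; g(H_1 + V), g(H_1))|^p\, d\lambda \Big)^{1/p} \leq C_1 \|g(H_1 + V) - g(H_1)\|_{1/p}^{1/p}. \tag{3.32}$$

We recall that $V = (\lambda_k^m - \lambda_k^M) u_k$. In particular, the volume of the support of $V$ has order one, and is independent of $|\Lambda|$. We prove in Sect. 5 that the constant $\|g(H_1 + V) - g(H_1)\|_{1/p}^{1/p}$ depends only on the single-site potential $u_k$, and is independent of $|\Lambda|$. Consequently, the right side of (3.30) is bounded above by $C_0\, \eta^{(1-q)/q}$, independent of $|\Lambda|$. Inserting this bound into (3.25), the integration over $t \in [-3\eta/2, 3\eta/2]$ contributes a factor $3\eta$, so that $\eta \cdot \eta^{(1-q)/q} = \eta^{1/q}$, and the sum over $k \in \tilde{\Lambda}$ contributes a factor proportional to $|\Lambda|$. This estimate, Eqs. (3.25) and (3.5), lead us to the result

$$\mathbb{P}\{\operatorname{dist}(E_0, \sigma(H_\Lambda)) < \eta\} \leq C_W\, \eta^{1/q}\, \|g\|_\infty\, |\Lambda|, \quad \text{for any } q > 1. \tag{3.33}$$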
4. Localization Estimates on Eigenfunctions

The localization properties of the eigenfunctions of the locally-perturbed Hamiltonians $H_\Lambda$ are essential for the proof of the Wegner estimate given in Sect. 3. We present two results concerning these eigenfunctions in this section. The results here apply to any local perturbation of a background operator $H_0$ satisfying (H1a), and do not use any randomness.

Let us recall a result on the decay of the resolvent $R_0(E) = (H_0 - E)^{-1}$ of the unperturbed operator at an energy $E \in G$. This is a version of the Combes-Thomas result [10] proved in [1].

Proposition 4.1. Let $H_0$ satisfy hypothesis (H1a), and let $E \in G$. Let $d_\pm = \operatorname{dist}(B_\pm, E)$. There exists a constant $C$, depending on $d_\pm$, so that

$$|G_0(x, y; E)| \leq C e^{-\sqrt{d_- d_+}\, |x - y|}, \tag{4.1}$$

for all $x, y \in \mathbb{R}^d$ sufficiently separated.

We first present a simple result indicating the exponential decay of the eigenfunctions of $H_\Lambda$ away from the support of the local perturbation $V_\Lambda$. Let $H_0$ be a background Hamiltonian which is perturbed by a potential $V_\Lambda$ supported in a bounded region $\Lambda$ so that $H_\Lambda = H_0 + V_\Lambda$, acting on $L^2(\mathbb{R}^d)$. For any region $O \subset \mathbb{R}^d$, let $\chi_O$ be the characteristic function of the region. Similar results hold for multiplicative perturbations.

Proposition 4.2. Let $\varphi$ be an $L^2$-eigenfunction of $H_\Lambda$ satisfying $H_\Lambda \varphi = E \varphi$, with $E \in G$, the spectral gap of $H_0$ as in (H1a). Let $O$ be an open subset of $\{\operatorname{supp} V_\Lambda\}^c \subset \mathbb{R}^d$ such that $\operatorname{dist}(O, \operatorname{supp} V_\Lambda) > \delta > 0$. Then, there exist constants $A_E > 0$ and $C_E > 0$ such that

$$\|\chi_O \varphi\| \leq A_E\, e^{-C_E \delta}. \tag{4.2}$$
Proof. We begin by rewriting the eigenvalue equation as
\[ (H_0 - E)\phi = -V\phi. \tag{4.3} \]
Since the energy $E \in \rho(H_0)$, we have
\[ \phi = -R_0(E)V\phi. \tag{4.4} \]
We now multiply on the left by the characteristic function of $O$ to obtain
\[ \chi_O \phi = -\chi_O R_0(E)V\phi. \tag{4.5} \]
The decay estimate for the resolvent $R_0(E)$ given in Proposition 4.1 now yields (4.2), since $O$ and $\mathrm{supp}\, V$ are separated by a distance greater than $\delta$.

Kirsch, Stollmann, and Stolz [19] present a nice result, called here the KSS Comparison Theorem, on the localization of the eigenfunctions of the local Hamiltonian $H$ that provides more precise information about the eigenfunctions in the region $\Lambda$. The proof of this theorem is simple and follows the same ideas as in the proof of Proposition 4.2.

Theorem 4.3. Let $H_0$ and $V$ be as above and $H\phi = E\phi$ with $E \in G$ and $\phi \in L^2(\mathbb{R}^d)$. Suppose that the following two conditions are satisfied:
1. There exists a potential $V_0$ such that $E \in \rho(H_0 + V_0)$;
2. There exists a subset $F \subset \Lambda$ and a constant $\theta > 0$ so that $\mathrm{dist}(F \cup \Lambda^c, \{x \mid V(x) \ne V_0(x)\}) > \theta > 0$.
We then have
\[ \|\phi\| \le \bigl(1 + \|(H_0 + V_0 - E)^{-1} W_1\|\bigr)\,\|(1 - \chi_F)\phi\|, \tag{4.6} \]
where $W_1 \equiv [H_0, \chi_1]$, with $\chi_1$ defined in the proof, and $\chi_F$ is the characteristic function of $F$.

Proof. Let $D \equiv \{x \mid V(x) \ne V_0(x)\}$. Since $\mathrm{dist}(D, F \cup \Lambda^c) > \theta$, we can find a smoothed characteristic function $\chi_1$ such that $\chi_1|_D = 1$, and $\chi_1 = 0$ near $\partial F$ and $\partial\Lambda$, with $\mathrm{supp}\,|\nabla\chi_1| \subset F^c$. Let us define $W_1 \equiv [H_0, \chi_1]$. Note that $(1 - \chi_F)W_1 = W_1$. We then have
\[ (1-\chi_1)(H_0+V)\phi = (1-\chi_1)(H_0+V_0)\phi = (H_0+V_0)(1-\chi_1)\phi + W_1\phi = E(1-\chi_1)\phi. \tag{4.7} \]
Solving this equation for $(1-\chi_1)\phi$, we find
\[ (1-\chi_1)\phi = -(H_0+V_0-E)^{-1}W_1\phi. \tag{4.8} \]
It follows that
\[ \|(1-\chi_1)\phi\| \le \|(H_0+V_0-E)^{-1}W_1\|\,\|(1-\chi_F)\phi\|. \tag{4.9} \]
The result follows directly from this inequality since
\[ \|\phi\| \le \|(1-\chi_1)\phi\| + \|\chi_1\phi\| \le \bigl(\|(H_0+V_0-E)^{-1}W_1\| + 1\bigr)\,\|(1-\chi_F)\phi\|, \tag{4.10} \]
since $\chi_1 \le (1-\chi_F)$.
Of course, the art in applying this theorem lies in the construction of the comparison potential V0 so that the support and spectral hypotheses are satisfied.
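Returning to Proposition 4.2, the passage from (4.5) to the bound (4.2) can be sketched as follows. This is only an illustration under extra assumptions that the text does not state: $V$ is taken to be bounded, the kernel bound (4.1) is assumed to hold whenever $|x-y| \ge \delta$, and $\gamma$ is shorthand introduced here for $\sqrt{d_- d_+}$.

```latex
% Sketch only. Assumptions: V bounded, (4.1) valid for |x-y| >= \delta,
% and \gamma := \sqrt{d_- d_+} (shorthand not used in the original).
\begin{align*}
|(\chi_O \phi)(x)|
  &\le \int_{\operatorname{supp} V} |G_0(x,y;E)|\,|V(y)|\,|\phi(y)|\,dy
  && \text{by (4.5), for } x \in O \\
  &\le C\,\|V\|_\infty\, e^{-\gamma\delta/2}
     \int_{\mathbb{R}^d} e^{-\gamma |x-y|/2}\,|\phi(y)|\,dy
  && \text{since } |x-y| > \delta \text{ on } O \times \operatorname{supp} V .
\end{align*}
% Young's inequality for convolution with e^{-\gamma|\cdot|/2} \in L^1(\mathbb{R}^d) gives
\[
\|\chi_O \phi\| \le
  \underbrace{C\,\|V\|_\infty\,\|e^{-\gamma|\cdot|/2}\|_{L^1(\mathbb{R}^d)}\,\|\phi\|}_{A_E}
  \; e^{-(\gamma/2)\,\delta},
\]
% that is, (4.2) with C_E = \gamma/2 in this sketch.
```

The splitting $e^{-\gamma|x-y|} \le e^{-\gamma\delta/2}\,e^{-\gamma|x-y|/2}$ is what trades half of the decay rate for an integrable kernel; other choices of the split would give other admissible constants.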
5. Trace Estimates

We present the estimates needed in Sect. 3. We let $K_d(\mathbb{R}^d)$ denote the Kato class of potentials, and we refer the reader to Simon's article [29] for a complete description. We let $H_0$ be the Schrödinger operator
\[ H_0 = (-i\nabla - A)^2 + W, \tag{5.1} \]
where $A$ is a vector potential with $A \in L^2_{\mathrm{loc}}(\mathbb{R}^d)$, and $W = W_+ - W_-$ is a background potential with $W_- \in K_d(\mathbb{R}^d)$ and $W_+ \in K_d^{\mathrm{loc}}(\mathbb{R}^d)$. We write $H = H_0 + V$ for suitable real-valued functions $V$. We are interested in a bounded potential $V$ with compact support.

Proposition 5.1. Let $H_0$ be as above, and let $V_1$ be a Kato-class potential such that $\|V_1\|_{K_d} \le M_1$. Let $H_1 \equiv H_0 + V_1$, and let $M > 0$ be a sufficiently large constant given in the proof. Let $V$ be a Kato-class function supported in $B(R)$, the ball of radius $R > 0$ with center at the origin. Then, for any $p > 0$, we have
\[ V_{\mathrm{eff}} \equiv (H_1 + V + M)^{-k} - (H_1 + M)^{-k} \in \mathcal{I}_{1/p}, \tag{5.2} \]
provided $k > dp/2 + 2$. Under these conditions, there exists a constant $C_0$, depending on $p$, $k$, $H_0$, $M_1$, $\|V\|_{K_d}$, and $R$, so that
\[ \|V_{\mathrm{eff}}\|_{1/p} \le C_0. \tag{5.3} \]
Proof. Let $H_2 = H_1 + V$. By the diamagnetic inequality, we have
\[ |(H_1 + M)^{-1}\phi(x)| \le (-\Delta + W + V_1 + M)^{-1}|\phi|(x) \le 2(-\Delta + 1)^{-1}|\phi|(x), \tag{5.4} \]
if $M$ is sufficiently large. We also have
\[ |(H_2 + M)^{-1}\phi(x)| \le 2(-\Delta + 1)^{-1}|\phi|(x), \tag{5.5} \]
if $M$ is sufficiently large. We fix $M$ so large that these estimates hold for any $V_1$ satisfying the conditions stated above. We note that
\[ V_{\mathrm{eff}} = -\sum_{j=0}^{k-1} (H_2 + M)^{-(k-j)}\, V\, (H_1 + M)^{-j-1} = -\sum_{j=0}^{k-1} \bigl(J^{k-j}(H_2 + M)^{-(k-j)}\bigr)^{*}\, V\, \bigl(J^{j+1}(H_1 + M)^{-j-1}\bigr), \tag{5.6} \]
where $J \in C_0^\infty(B(R))$ is such that $JV = V$. By Proposition 12 of [26], we have
\[ J^{j}(H_a + M)^{-j} = \sum_{\beta=1}^{N} \prod_{\alpha=1}^{j} J_{\alpha\beta}(H_a + M)^{-1}B_{\alpha\beta}, \tag{5.7} \]
for an integer $N$ depending on $j$, where $a = 1, 2$, $J_{\alpha\beta} \in C_0^\infty(B(R))$, and the $B_{\alpha\beta}$ are uniformly bounded operators. Note that the $J_{\alpha\beta}$ are given by linear combinations of the derivatives of $J$.
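The first equality in (5.6) is the usual telescoping identity for a difference of powers, combined with the second resolvent identity. A short check, writing $R_2 = (H_2+M)^{-1}$ and $R_1 = (H_1+M)^{-1}$ (shorthand introduced here for illustration only, and assuming both resolvents exist as bounded operators for the chosen $M$):

```latex
% Shorthand (not in the original): R_2 = (H_2+M)^{-1}, R_1 = (H_1+M)^{-1}, H_2 - H_1 = V.
\begin{align*}
R_2 - R_1 &= R_2\,\bigl[(H_1+M) - (H_2+M)\bigr]\,R_1 = -R_2\,V\,R_1, \\
R_2^{\,k} - R_1^{\,k}
  &= \sum_{j=0}^{k-1} R_2^{\,k-1-j}\,(R_2 - R_1)\,R_1^{\,j}
   = -\sum_{j=0}^{k-1} R_2^{\,k-j}\,V\,R_1^{\,j+1}.
\end{align*}
% For k = 2 this reads R_2^2 - R_1^2 = -(R_2^2 V R_1 + R_2 V R_1^2).
```

The sum in the second line telescopes because consecutive terms $R_2^{\,k-j}R_1^{\,j}$ cancel in pairs, leaving only $R_2^{\,k}$ and $-R_1^{\,k}$.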
Let $q > d/4$. It is then well known that $J_{\alpha\beta}(-\Delta + 1)^{-1} \in \mathcal{I}_{2q}$. We now apply Theorem 2.13 of [28], using (5.4) and (5.5), to show that $J_{\alpha\beta}(H_a + M)^{-1} \in \mathcal{I}_{2q}$, and
\[ \|J_{\alpha\beta}(H_a + M)^{-1}\|_{\mathcal{I}_{2q}} \le 2\,\|J_{\alpha\beta}(-\Delta + 1)^{-1}\|_{\mathcal{I}_{2q}}. \tag{5.8} \]
Combining this observation with (5.6) and (5.7), we find that $V_{\mathrm{eff}} \in \mathcal{I}_{2q/k}$. We choose $k$ so large that $2q/(k-2) \le 1/p$, i.e., we can take $k > 2pq + 2 > dp/2 + 2$. This proves the proposition.

Acknowledgements. We thank W. Kirsch, A. Klein, F. Klopp, E. Kostrykin, R. Schrader, K. Sinha, P. Stollmann, and G. Stolz for useful discussions. We thank the referee for a careful reading of the manuscript and for comments that improved the paper.
References

1. Barbaroux, J.-M., Combes, J.M., Hislop, P.D.: Localization near band edges for random Schrödinger operators. Helv. Phys. Acta 70, 16–43 (1997)
2. Birman, M.S., Solomjak, M.Z.: Spectral theory of self-adjoint operators in Hilbert space. Dordrecht: D. Reidel Publishing Co., 1987
3. Birman, M.S., Yafaev, D.R.: The spectral shift function: The work of M. G. Krein and its further development. St. Petersburg Math. J. 4, 833–870 (1992)
4. Buschmann, D., Stolz, G.: Two-parameter spectral averaging and localization for non-monotoneous random Schrödinger operators. Trans. Am. Math. Soc. 353, 635–653 (2001)
5. Combes, J.M., Hislop, P.D.: Localization for some continuous random Hamiltonians in d-dimensions. J. Funct. Anal. 124, 149–180 (1994)
6. Combes, J.M., Hislop, P.D.: Landau Hamiltonians with random potentials: Localization and density of states. Commun. Math. Phys. 177, 603–629 (1996)
7. Combes, J.M., Hislop, P.D., Mourre, E.: Spectral averaging, perturbation of singular spectra, and localization. Trans. Am. Math. Soc. 348, 4883–4894 (1996)
8. Combes, J.M., Hislop, P.D., Mourre, E.: Correlated Wegner inequalities for random Schrödinger operators. In: Advances in Differential Equations and Mathematical Physics: Proceedings of the International Conference on Differential Equations and Mathematical Physics 1997, E. Carlen, E.M. Harell, M. Loss, eds. Contemporary Mathematics 217, 191–204, AMS 1998
9. Combes, J.M., Hislop, P.D., Tip, A.: Band edge localization and the density of states for acoustic and electromagnetic waves in random media. Ann. Inst. H. Poincaré 70, 381–428 (1999)
10. Combes, J.M., Thomas, L.: Asymptotic behavior of eigenfunctions for multiparticle Schrödinger operators. Commun. Math. Phys. 34, 251–270 (1973)
11. Faris, W.: A localization principle for multiplicative perturbations. J. Funct. Anal. 67, 105–114 (1986)
12. Figotin, A., Klein, A.: Localization of classical waves I: Acoustic models. Commun. Math. Phys. 180, 439–482 (1996)
13. Figotin, A., Klein, A.: Localization of classical waves II: Electromagnetic waves. Commun. Math. Phys. 184, 411–441 (1997)
14. Hislop, P.D., Klopp, F.: The Wegner estimate and the integrated density of states for some random operators with non-sign definite potentials. In preparation
15. Hundertmark, D., Simon, B.: An optimal Lp-bound on the Krein spectral shift function. Preprint 2000, to appear in J. d’Analyse Math.
16. Klopp, F.: Localization for some continuous random Schrödinger operators. Commun. Math. Phys. 167, 553–569 (1995)
17. Kirsch, W.: Random Schrödinger operators: A course. In: Schrödinger Operators, Sonderborg DK 1988, ed. H. Holden and A. Jensen, Lecture Notes in Physics 345, Berlin: Springer, 1989
18. Kirsch, W.: Wegner estimates and localization for alloy-type potentials. Math. Zeit. 221, 507–512 (1996)
19. Kirsch, W., Stollmann, P., Stolz, G.: Localization for random perturbations of periodic Schrödinger operators. Random Operators and Stochastic Equations 6, 241–268 (1998)
20. Kirsch, W., Stollmann, P., Stolz, G.: Anderson localization for random Schrödinger operators with long range interactions. Commun. Math. Phys. 195, 495–507 (1998)
21. Kostrykin, V., Schrader, R.: Scattering theory approach to random Schrödinger operators in one dimension. Rev. Math. Phys. 11, 187–242 (1999)
22. Kostrykin, V., Schrader, R.: The density of states and the spectral shift density of random Schrödinger operators. Rev. Math. Phys. 12, 807–847 (2000)
23. Kostrykin, V., Schrader, R.: Global bounds for the Lyapunov exponent and the integrated density of states of random Schrödinger operators in one dimension. Preprint 2000
24. Kostrykin, V., Schrader, R.: Regularity of the density of surface states. Preprint 2000, to appear in J. Func. Anal.
25. Kotani, S., Simon, B.: Localization in general one dimensional systems. II, Commun. Math. Phys. 112, 103–120 (1987)
26. Nakamura, S.: A remark on the Dirichlet–Neumann decoupling and the integrated density of states. Preprint 2000
27. Simon, B.: Spectral averaging and the Krein spectral shift. Proc. Am. Math. Soc. 126, 1409–1413 (1998)
28. Simon, B.: Trace Ideals and their Applications. London Mathematical Society Lecture Series 35, Cambridge: Cambridge University Press, 1979
29. Simon, B.: Schrödinger semigroups. Bull. Am. Math. Soc. N. S. 7, 447–526 (1982)
30. Stollmann, B.: Wegner estimates and localization for continuum Anderson models with some singular distributions. Archiv der Mathematik 75, 307–311 (2000)
31. von Dreifus, H., Klein, A.: A new proof of localization for the Anderson tight-binding model. Commun. Math. Phys. 124, 245–299 (1989)
32. Wegner, F.: The density of states for disordered systems. Zeit. Phy. B 44, 9–15 (1981)
33. Yafaev, D.R.: Mathematical Scattering Theory: General Theory. Translations of Mathematical Monographs 105, Providence, RI: Am. Math. Soc., 1992

Communicated by B. Simon
Commun. Math. Phys. 218, 131 – 132 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
A Proof of the Local Borg–Marchenko Theorem

Christer Bennewitz

Centre of Mathematics, Lund University, Box 118, 221 00 Lund, Sweden. E-mail: [email protected]

Received: 6 November 2000 / Accepted: 8 November 2000
Abstract: In [2], [5] a “local” version of a basic uniqueness theorem in inverse spectral theory is given. This paper gives a simple proof of this theorem.
Consider the boundary value problem
\[ M[u] = -u'' + qu = \lambda u \quad \text{on } [0, b), \tag{1} \]
\[ u(0)\cos\alpha + u'(0)\sin\alpha = 0, \tag{2} \]
where $q \in L^1_{\mathrm{loc}}[0, b)$ is real valued, $0 < b \le \infty$ and $\alpha \in [0, \pi)$ is fixed. In addition, we impose a boundary condition at the endpoint $b$, if needed for $M$ to generate a selfadjoint operator $T$ in $L^2(0, b)$. Let $\varphi$ and $\theta$ be solutions of $M[u] = \lambda u$ with initial conditions
\[ \varphi(0, \lambda) = \sin\alpha, \quad \theta(0, \lambda) = \cos\alpha, \quad \varphi'(0, \lambda) = -\cos\alpha, \quad \theta'(0, \lambda) = \sin\alpha. \]
The solutions $\varphi$ and $\theta$ are entire functions of $\lambda$ of order 1/2, locally uniformly in $x$, and there exists a function $m(\lambda)$ (the Titchmarsh–Weyl m-function), analytic outside $\mathbb{R}$ and such that $\psi(x, \lambda) = \theta(x, \lambda) - m(\lambda)\varphi(x, \lambda)$ satisfies the boundary condition at $b$ (in particular, $\psi \in L^2(0, b)$).

Recently Simon [5] proved a “local” version of a well-known uniqueness theorem due to Borg [1] and Marchenko [4]. See also [2] for a shorter proof. This note gives a very simple proof of Simon’s theorem, in the spirit of Borg’s original paper. To state it we introduce, in addition to $T$, a similar operator $\tilde T$, corresponding to the same boundary condition (2), an interval $[0, \tilde b)$, a potential $\tilde q$ and, if needed, a boundary condition at $\tilde b$. Let the corresponding m-function be $\tilde m$, and similarly let $\tilde\varphi$, $\tilde\theta$ and $\tilde\psi$ correspond to $\varphi$, $\theta$ and $\psi$. Then Simon’s theorem is the following, where the square root always means the principal root, that is, the root with a positive real part.
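Theorem 1. Suppose $a \in (0, \min(b, \tilde b)]$. Then $q = \tilde q$ on $(0, a)$ if and only if for every $\varepsilon > 0$ we have that $m(\lambda) - \tilde m(\lambda) = O\bigl(e^{-2(a-\varepsilon)\,\mathrm{Re}\sqrt{-\lambda}}\bigr)$ as $\lambda \to \infty$ along some non-real ray.

Our proof is based on two well-known facts which may be found in [1], but can also be extracted from [3, Lemma 2.1, p. 5 and (5.11), p. 136]:
\[ \varphi(x, \lambda) = \tfrac{1}{2}\bigl(\sin\alpha - \cos\alpha/\sqrt{-\lambda}\bigr)\,e^{x\sqrt{-\lambda}}\,(1 + o(1)), \tag{3} \]
\[ \varphi(x, \lambda)\psi(x, \lambda) \to 0, \tag{4} \]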
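in both cases for every $x \in (0, b)$ as $\lambda \to \infty$ along a non-real ray. Note that $\varphi(x, \lambda)\psi(x, \lambda)$ is Green’s function on the diagonal $x = y$.

Clearly (3) implies that for fixed $x$ we have $\varphi(x, \lambda)/\tilde\varphi(x, \lambda) \to 1$ as $\lambda \to \infty$ along a non-real ray, so (4) shows that $\tilde\varphi(x, \lambda)\psi(x, \lambda)$ and $\varphi(x, \lambda)\tilde\psi(x, \lambda)$ both also tend to 0 in the same way. Hence so does their difference
\[ \tilde\varphi(x, \lambda)\theta(x, \lambda) - \varphi(x, \lambda)\tilde\theta(x, \lambda) + (\tilde m(\lambda) - m(\lambda))\,\varphi(x, \lambda)\tilde\varphi(x, \lambda). \tag{5} \]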
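The expression (5) can be checked by direct expansion. The lines below only substitute the definitions $\psi = \theta - m\varphi$ and $\tilde\psi = \tilde\theta - \tilde m\tilde\varphi$ given earlier; the closing comment on the size of the last term is a sketch that assumes the asymptotics (3) for both $\varphi$ and $\tilde\varphi$.

```latex
\begin{align*}
\tilde\varphi\,\psi - \varphi\,\tilde\psi
  &= \tilde\varphi\,(\theta - m\varphi) - \varphi\,(\tilde\theta - \tilde m\,\tilde\varphi) \\
  &= \tilde\varphi\,\theta - \varphi\,\tilde\theta + (\tilde m - m)\,\varphi\,\tilde\varphi .
\end{align*}
% By (3), |\varphi(x,\lambda)\tilde\varphi(x,\lambda)| = O(e^{2x\,\mathrm{Re}\sqrt{-\lambda}}) along the ray,
% so if m - \tilde m = O(e^{-2(a-\varepsilon)\,\mathrm{Re}\sqrt{-\lambda}}), the last term is
% O(e^{-2(a-\varepsilon-x)\,\mathrm{Re}\sqrt{-\lambda}}), which tends to 0 whenever x < a - \varepsilon.
```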
The “only if” part of the theorem now follows immediately from (3), since the first two terms of (5) cancel by assumption for $x \in (0, a)$. To show the other direction of the theorem, note that for $0 < x < a$ the last term tends to 0 by assumption and (3). Thus $\tilde\varphi(x, \lambda)\theta(x, \lambda) - \varphi(x, \lambda)\tilde\theta(x, \lambda)$ tends to 0 along a non-real ray, and by symmetry also along its conjugate. This entire function is of order $\le 1/2$ and bounded on two rays, so the simplest version of the Phragmén–Lindelöf theorem shows that it must be bounded in all of $\mathbb{C}$, and so is constant by Liouville’s theorem. As the limit is 0 along the ray, it vanishes for all $x \in (0, a)$. Thus $\theta/\varphi = \tilde\theta/\tilde\varphi$. Differentiating and using that $\theta'\varphi - \theta\varphi' = 1$ it follows that $\varphi^2 = \tilde\varphi^2$. Taking the logarithmic derivative of this and differentiating once more we obtain $\varphi''/\varphi = \tilde\varphi''/\tilde\varphi$, which means that $q = \tilde q$ in $(0, a)$. The theorem is proved.

References

1. Borg, G.: Uniqueness theorems in the spectral theory of $y'' + (\lambda - q(x))y = 0$. In: Proc. 11th Scandinavian Congress of Mathematicians (Oslo), Johan Grundt Tanums Forlag, 1952, pp. 276–287
2. Gesztesy, F. and Simon, B.: On local Borg–Marchenko uniqueness results. Commun. Math. Phys. 211, 273–287 (2000)
3. Levitan, B. M. and Sargsjan, I. S.: Introduction to spectral theory: Selfadjoint ordinary differential operators. Providence, R.I.: American Mathematical Society, 1975
4. Marčenko, V. A.: Certain problems in the theory of second order differential operators. Doklady Akad. Nauk SSSR 72, 457–460 (1950), (Russian)
5. Simon, B.: A new approach to inverse spectral theory, I. fundamental formalism. Annals of Math. 150, 1–29 (1999)

Communicated by B. Simon
Commun. Math. Phys. 218, 133 – 152 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Free Fisher Information with Respect to a Completely Positive Map and Cost of Equivalence Relations

Dimitri Shlyakhtenko

Department of Mathematics, UCLA, Los Angeles, CA 90095, USA. E-mail: [email protected]

Received: 11 September 1999 / Accepted: 10 November 2000
Abstract: Given a family of isometries $v_1, \dots, v_n$ in a tracial von Neumann algebra $M$, a unital subalgebra $B \subset M$ and a completely-positive map $\eta : B \to B$ we define the free Fisher information $F^*(v_1, \dots, v_n : B, \eta)$ of $v_1, \dots, v_n$ relative to $B$ and $\eta$. Using this notion, we define the free dimension $\delta^*(v_1, \dots, v_n \between B)$ of $v_1, \dots, v_n$ relative to $B$, id. Let $R$ be a measurable equivalence relation on a finite measure space $X$. Let $M$ be the von Neumann algebra associated to $R$, and let $B \cong L^\infty(X)$ be the canonical diffuse subalgebra. If $v_1, \dots, v_n, \dots \in M$ are partial isometries arising from a treeing of this equivalence relation, then $\lim_n \delta^*(v_1, \dots, v_n \between B)$ is equal to the cost of the equivalence relation in the sense of Gaboriau and Levitt.

Research supported by MSRI

1. Introduction

Recently G. Levitt [3] introduced the notion of cost of a measurable equivalence relation $R$ on a finite measure space $X$. This notion was further studied by Gaboriau [5, 4], who used this invariant to prove, for example, that a free measure-preserving action of the free group $F(n)$ and a free measure-preserving action of the free group $F(m)$, $n \ne m$, cannot be orbit-equivalent. The cost $C(R)$ measures “how many local morphisms from $X$ to $X$ are required to generate $R$”, and is an invariant of $R$. If $\alpha$ is a free measure-preserving action of a discrete group $G$ on a finite measure space $X$, Gaboriau associated to it the cost $C(R_\alpha)$ of the equivalence relation $R_\alpha$ induced by the action. The cost of the group $C(G)$ was then defined as the infimum of $C(R_\alpha)$ over all possible actions $\alpha$ of $G$. Gaboriau showed that certain groups (e.g., cyclic groups and their free products) have a fixed price, that is, $C(R_\alpha)$ is independent of $\alpha$. He also showed that (under certain assumptions) the cost of the group is well-behaved under the free product operation: $C(G * H) = C(G) + C(H)$.
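As a small arithmetic illustration of the additivity formula, one can take for granted the value $C(G) = 1 - 1/|G|$ for a finite cyclic group $G$, which is quoted later in this introduction, together with the value $C(\mathbb{Z}) = 1$ for the infinite cyclic group (the latter is an assumption of this sketch, read off from the same formula with $1/|G| = 0$). Additivity is also assumed to apply to these free products.

```latex
% Assumptions of this sketch: C(G) = 1 - 1/|G| for finite cyclic G,
% C(\mathbb{Z}) = 1, and additivity C(G * H) = C(G) + C(H).
\begin{align*}
C(\mathbb{Z}_2 * \mathbb{Z}_3)
  &= \bigl(1 - \tfrac{1}{2}\bigr) + \bigl(1 - \tfrac{1}{3}\bigr) = \tfrac{7}{6}, \\
C(F(n)) &= C(\underbrace{\mathbb{Z} * \cdots * \mathbb{Z}}_{n \text{ factors}}) = n .
\end{align*}
```

The second line is consistent with the statement above that free actions of $F(n)$ and $F(m)$ cannot be orbit-equivalent when $n \ne m$, since the costs $n$ and $m$ then differ.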
Let $G$ be a (finitely generated) discrete group, and let $g_1, g_2, \dots, g_n \in G$. Then to each $g_i$ one can associate a unitary $u(g_i)$ in the left regular representation on $\ell^2(G)$. The weak closure of the $*$-algebra generated by all such $u(g_i)$ is called the von Neumann algebra of $G$, denoted by $L(G)$. $L(G)$ has a canonical trace (given by $\tau(u) = \langle u\delta_e, \delta_e\rangle$, where $\delta_e \in \ell^2(G)$ is the delta-function supported on the neutral element of $G$). To the family of unitaries $u(g_1), \dots, u(g_n)$ Voiculescu [7, 10, 8] has associated two numbers, $\chi(u(g_1), \dots, u(g_n))$ and $\chi^*(u(g_1), \dots, u(g_n))$, called the “microstates” and “microstates-free” free entropy. These numbers arise naturally from Voiculescu’s free probability theory. Using microstates free entropy $\chi$, he defined the free (microstates) entropy dimension $\delta(u(g_1), \dots, u(g_n))$; replacing in his definition the microstates free entropy $\chi$ by the microstates-free free entropy $\chi^*$ leads to the microstates-free free dimension $\delta^*$, which we use in this paper. The two notions (microstates-free and involving microstates) of free entropy dimension make sense for an arbitrary collection $u_1, \dots, u_n$ of unitaries in a tracial von Neumann algebra. It is not known if the two quantities $\delta$ and $\delta^*$ are different; they coincide in every example in which their values have been computed.

Voiculescu has proved that $\delta(u(g_1), \dots, u(g_n))$ is independent of the choice of generators $g_1, \dots, g_n$ of the group $G$, and therefore gives a non-trivial invariant of $G$, called the free dimension of $G$, $\delta(G)$. The corresponding invariance property for $\delta^*$ is suspected, but so far has not been proved. If $G = K * H$, and $k_1, \dots, k_n \in K$, $h_1, \dots, h_m \in H$, then both $\delta$ and $\delta^*$ are additive:
\[ \delta^*(u(k_1), \dots, u(k_n), u(h_1), \dots, u(h_m)) = \delta^*(u(k_1), \dots, u(k_n)) + \delta^*(u(h_1), \dots, u(h_m)) \]
and similarly for $\delta$.

It is remarkable that if $g$ is a generator of a cyclic group, then
\[ \delta(u(g)) = \delta^*(u(g)) = C(G) = 1 - \frac{1}{|G|}. \]
Additivity of both quantities under free products implies that a (finite) free product of cyclic groups has a sequence of generators $g_1, g_2, \dots, g_n$ for which $\delta(u(g_1), \dots, u(g_n)) = \delta^*(u(g_1), \dots, u(g_n)) = C(G)$. There are indications that the three quantities
\[ C(G), \qquad \delta(G), \qquad \delta^*(u(\text{set of generators of } G)) \]
are actually the same.

It is tempting to suggest that there is a more general notion of “cost” (and “free dimension”) which unifies the notion of cost (free dimension) of a group, and the notion of cost of an equivalence relation. Indeed, groups and equivalence relations are both special cases of groupoids. In this paper:
1. If $M$ is a tracial von Neumann algebra, $B \subset M$ a unital subalgebra, and $v_1, \dots, v_n \in M$ is an $n$-tuple of partial isometries, we define the free entropy dimension of the $n$-tuple $v_1, \dots, v_n$ relative to $(B, \mathrm{id} : B \to B)$, $\delta^*(v_1, \dots, v_n \between B)$. For $R$ an equivalence relation on a measure space $X$, a graphing (“sequence of generators”) $\phi_1, \dots, \phi_n, \dots$ gives rise to a family $v_1, \dots, v_n, \dots$ of partial isometries in $W^*(X, G)$, and hence to a number, $\delta^*(\phi_1, \dots, \phi_n, \dots) = \limsup_n \delta^*(v_1, \dots, v_n \between B)$.
2. In certain cases (when an equivalence relation $R$ is treeable), $R$ has a graphing $\phi_1, \dots, \phi_n, \dots$, giving rise to partial isometries $v_1, \dots, v_n, \dots \in W^*(X, R)$ for which $\lim_{n\to\infty} \delta^*(v_1, \dots, v_n \between B) = C(R)$.
Free Information and Cost
3. Given a sequence of generators $\phi_1, \dots, \phi_n$ one can extend the resulting isometries $v_1, \dots, v_n \in W^*(X, G)$ to unitaries $u_1, \dots, u_n \in W^*(X, G)$ in a certain way. For a treeing $\phi_1, \dots, \phi_n, \dots$, $\delta^*(v_1, \dots, v_n \between B) = \delta^*(u_1, \dots, u_n \between B)$.
4. If $\alpha$ is a free action of a discrete group $G$ on a measure space $X$, each element $g \in G$ gives rise to a unitary $w(g)$ in the crossed product algebra $L^\infty(X) \rtimes_\alpha G \cong W^*(X, R_\alpha)$. It turns out that $\delta^*(w(g_1), \dots, w(g_n) \between B) = \delta^*(u(g_1), \dots, u(g_n))$ (here $u(g_j) \in B(\ell^2(G))$ are in the group von Neumann algebra of $G$), and therefore depends only on the generators $g_1, \dots, g_n$ and not on the action $\alpha$.

The connections between $\delta^*$ and $\delta$ lead us to suspect that if $\phi_1, \dots, \phi_n, \dots$ is a graphing of an equivalence relation $R$ on $X$, then the number $\delta^*(\phi_1, \phi_2, \dots)$ depends only on $R$. The fourth statement above would then imply that if $R$ comes from a group action, any graphing of $R$ would have the same microstates-free free dimension as any sequence of generators of the group.

The remainder of the paper is organized as follows. We start with the definition of free Fisher information $F^*$ for a family of isometries $v_1, \dots, v_n \in M$ relative to a subalgebra $B \subset M$ and a completely-positive map $\eta : B \to B$. The definition involves the notion of free Fisher information for unitaries with respect to a completely positive map, and is a straightforward extension of the definitions in [9, 8, 6]. Some proofs are direct adaptations of the arguments from these papers; we include them for the convenience of the reader. Specializing to the case $\eta = \mathrm{id}$ we get as in [9] an entropy quantity $\chi^*$. Passing from free entropy to free dimension as in [7] gives a free dimension quantity, which is denoted by $\delta^*(v_1, \dots, v_n \between B)$. Next, we study the properties of $F^*$, $\chi^*$ and $\delta^*$. We conclude by exhibiting equality between cost and free dimension in certain cases.

2. Free Microstates–Free Fisher Information for Unitaries

The reader is referred to [11] for the basic notions from free probability theory.

2.1. Notations. In the remainder of the paper, $(M, \tau)$ will be a tracial non-commutative probability space, i.e., $M$ is a $W^*$-algebra, and $\tau : M \to \mathbb{C}$ is a linear functional, so that
1. $\tau(xy) = \tau(yx)$ for all $x, y \in M$ (traciality),
2. $\tau(x^*x) \ge 0$ (positivity),
3. $\tau(x^*x) = 0$ iff $x = 0$ (faithfulness).
$M$ is a bimodule over itself under right and left multiplication. The norm $\|x\|_2 = \tau(x^*x)^{1/2}$ defines a pre-Hilbert space structure on $M$. We shall fix $B \subset M$ a unital subalgebra, and denote by $CP(B)$ the set of all completely-positive maps from $B$ to $B$. Recall that $\eta \in CP(B)$ if $\eta$ is a linear map from $B$ to $B$ and its natural extension $\eta \otimes 1 : B \otimes M_{n\times n} \to B \otimes M_{n\times n}$ to $n \times n$ matrices over $B$ is positive: $\tau \otimes \mathrm{Tr}(\eta(m^*m)) \ge 0$ for all $m \in B \otimes M_{n\times n}$. We shall be frequently concerned with the case that $B$ is the commutative algebra $B = L^\infty(X, \mu)$, where $(X, \mu)$ is a finite measure space, and $\tau(f) = \int f(t)\,d\mu(t)$ for $f \in B$. In this case, $\eta \in CP(B)$ if and only if $\eta(f)$ is a.e. positive whenever $f \in B$ is an a.e. positive function.

If $\eta : B \to B$ is a completely-positive map, then $\langle a \otimes b, a' \otimes b'\rangle = \tau(b^*\eta(a^*a')b')$ defines a (possibly semi-definite) pre-Hilbert space structure on $B \otimes B$. The Hilbert
space obtained from $B \otimes B$ after separation and completion with respect to this inner product is denoted $H^2(B, \eta)$. It carries the obvious structure of a $B, B$-bimodule.

If $B \subset M$, there exists a linear completely-positive map $E_B : M \to B$, called the conditional expectation from $M$ onto $B$. Sometimes we write $E$ for $E_B$ when $B$ is understood. $E$ satisfies $E(bmb') = bE(m)b'$, $b, b' \in B$, $m \in M$. $E$ is determined by $\tau(bE(m)) = \tau(bm)$ for all $b \in B$, $m \in M$. If $\eta \in CP(B)$, it can be extended to all of $M$ by composing it with $E_B$. If $\eta \in CP(B)$, we write $H(B, \eta)$ for $H^2(M, \eta \circ E_B)$.

We call an element $u \in M$ a partial isometry if $uu^*$ and $u^*u$ are projections, i.e., $u^*uu^*u = u^*u$, $uu^*uu^* = uu^*$. We denote by $B[u] = B[u, u^*]$ the $*$-algebra generated by $B$ and $u, u^*$.

2.2. Freeness with amalgamation. If $A_1, A_2$ are subalgebras of $M$, we say that they are free with amalgamation over $B$, if whenever $a_j \in W^*(A_{i(j)}, B)$, $i(1) \ne i(2), \dots, i(n-1) \ne i(n)$ and $E_B(a_j) = 0$, one has $E_B(a_1 \cdots a_n) = 0$.

We further say that the families $(a_1^{(1)}, \dots, a_{n_1}^{(1)}), (a_1^{(2)}, \dots, a_{n_2}^{(2)}), \dots$ of elements of $M$ are free with amalgamation over $B$, if the algebras $A_j = W^*(B, \{a_i^{(j)}\}_{i=1}^{n_j})$ are free with amalgamation over $B$.

If $M$ and $N$ are two tracial probability spaces, each containing $B$, one can form their amalgamated free product $M *_B N$, which is again a tracial probability space. $M *_B N$ is generated by $N$ and $M$, with the copies of $B$ in $M$ and $N$ identified. $M$ and $N$ as subalgebras of $M *_B N$ are free with amalgamation over $B$.

2.3. Free Fisher information for an isometry. Let $u$ be a partial isometry and let $B \subset M$ be a subalgebra. Define the derivation $d_{u:B} : B[u, u^*] \to H(B[u, u^*], \eta)$ by:
1. $d_{u:B}(u) = uu^* \otimes u = (uu^*)1 \otimes u(u^*u)$, $d_{u:B}(u^*) = -u^* \otimes uu^*$,
2. $d_{u:B}(wv) = d_{u:B}(w) \cdot v + w \cdot d_{u:B}(v)$,
3. $d_{u:B}$ is $B$-bilinear.

Definition 2.1. We call a vector $\xi(u : B, \eta) \in L^2(B[u, u^*])$ conjugate to $u$ relative to $B$ with respect to $\eta$, if
\[ \langle \xi(u : B, \eta), w\rangle = \langle 1 \otimes 1, d_{u:B}(w)\rangle \tag{2.1} \]
for all $w \in B[u, u^*]$.

Definition 2.2. If $\xi(u : B, \eta)$ exists, we define the relative free Fisher information of $u$ with respect to $B$ and $\eta$ to be $F^*(u : B, \eta) = \|\xi(u : B, \eta)\|_2^2$. If the conjugate vector does not exist, we set $F^*(u : B, \eta) = +\infty$. If $B = \mathbb{C}$ and $\eta = \mathrm{id}$, we write just $F^*(u)$.

Example 2.3. Let $u = 0$. Then $d_{u:B} = 0$, since $d_{u:B}(b) = 0$ for all $b \in B$, and $d_{u:B}(u) = 0 \cdot 0 \otimes u = 0$. Hence $\xi(u : B, \eta) = 0$, and $F^*(u : B, \eta) = 0$.

Proposition 2.4. Let $D \subset B$ be a unital subalgebra. Assume that $\eta|_D \in CP(D)$. Assume that $\xi(u : B, \eta)$ exists. Then $\xi(u : D, \eta)$ exists and $\xi(u : D, \eta) = E_{L^2(D[u, u^*])}(\xi(u : B, \eta))$.
Proof. Identical to that of [6, Proposition 3.4].
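As an illustration of Definitions 2.1 and 2.2 (not taken from the original text), consider the simplest case $B = \mathbb{C}$, $\eta = \mathrm{id}$, and $u$ a Haar unitary, meaning a unitary with $\tau(u^k) = 0$ for all $k \ne 0$. The sketch assumes that the inner product on $H(B, \eta)$ reduces in this case to $\langle 1 \otimes 1, x \otimes y\rangle = \tau(x)\tau(y)$, and uses $d_{u:\mathbb{C}}(u) = 1 \otimes u$, $d_{u:\mathbb{C}}(u^*) = -u^* \otimes 1$ for a unitary. For $n \ge 1$:

```latex
\begin{align*}
d_{u:\mathbb{C}}(u^n) &= \sum_{k=0}^{n-1} u^{k} \otimes u^{\,n-k},
 &\langle 1 \otimes 1,\, d_{u:\mathbb{C}}(u^n)\rangle
 &= \sum_{k=0}^{n-1} \tau(u^{k})\,\tau(u^{\,n-k}) = 0, \\
d_{u:\mathbb{C}}\bigl((u^*)^n\bigr) &= -\sum_{k=0}^{n-1} (u^*)^{k+1} \otimes (u^*)^{\,n-1-k},
 &\langle 1 \otimes 1,\, d_{u:\mathbb{C}}((u^*)^n)\rangle
 &= -\sum_{k=0}^{n-1} \tau\bigl((u^*)^{k+1}\bigr)\,\tau\bigl((u^*)^{\,n-1-k}\bigr) = 0 .
\end{align*}
% Every summand contains a factor \tau(u^j) or \tau((u^*)^j) with j >= 1, which vanishes.
% Hence \xi = 0 satisfies (2.1) on all polynomials in u, u^*, and F^*(u) = 0 in this sketch.
```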
Proposition 2.5. Let $D \subset B$ be a subalgebra. Assume that $\eta = \eta \circ E_D$ and $u$ is $*$-free from $B$ with amalgamation over $D$. Then $\xi(u : B, \eta) \in L^2(D[u, u^*])$, and $\xi(u : D, \eta) = \xi(u : B, \eta)$. Furthermore, $F^*(u : B, \eta) = F^*(u : D, \eta)$.

Proof. If $\xi(u : B, \eta)$ exists, then also $\xi(u : D, \eta)$ exists. Hence we can assume that $\xi(u : D, \eta)$ exists. It is sufficient to verify that it satisfies (2.1) for all $w \in B[u, u^*]$. We may assume that $E_D(w) = 0$, and hence that $w = b_0 v_0 b_1 v_1 \cdots b_m$, where $b_j \in B$, $v_j \in D[u, u^*]$, $E_D(b_j) = E_D(v_j) = 0$, except that possibly $b_0$ and/or $b_m$ is equal to 1.

Assume first that at least one of $b_0, b_1$ is not 1, or that $m > 1$. Then, since $\tau(\xi(u : D, \eta)^*) = \langle \xi(u : D, \eta), 1\rangle = \langle 1 \otimes 1, 0\rangle = 0$, we get by freeness that $\langle \xi(u : D, \eta), w\rangle = 0$. If $m > 1$, then $d_{u:B}(w) \perp 1 \otimes 1$ in $H(B, \eta)$. If $m = 1$, so that $w = b_0 v_0 b_1$, assume that $d_{u:B}(v_0) = \sum_i z_i \otimes z_i'$, $z_j, z_j' \in D[u, u^*]$. Then $d_{u:B}(w) = \sum_i b_0 z_i \otimes z_i' b_1$, and
\[ \langle 1 \otimes 1, d_{u:B}(w)\rangle = \sum_i \tau(\eta(b_0 z_i) z_i' b_1) = \sum_i \tau(\eta(E_D(b_0 z_i)) E_D(z_i' b_1)) = \sum_i \tau(\eta(E_D(b_0) E_D(z_i)) E_D(z_i') E_D(b_1)), \]
which is zero if (as we assumed) either $E_D(b_0)$ or $E_D(b_1) = 0$.

It remains to verify (2.1) in the case $b_0 = b_1 = 1$ and $m = 1$, i.e., that $\langle \xi(u : D, \eta), v_0\rangle = \langle 1 \otimes 1, d_{u:B}(v_0)\rangle$. But this follows from the definition of $\xi(u : D, \eta)$ and the fact that $d_{u:B}(v_0) = d_{u:D}(v_0)$.

Proposition 2.6. Let $v \in B$ be an isometry. Assume that $u = uvv^*$ and that $uu^*, u^*u \in B$. Then $F^*(u : B, \eta) = F^*(uv : B, \eta)$.

Proof. Let $W = b_0 u^{g(1)} b_1 \cdots b_n$, and $W' = b_0 (uvv^*)^{g(1)} b_1 \cdots b_n$, where $b_i \in B$ and $g(j) \in \{\cdot, *\}$. Then $W = W'$. We first claim that $d_{u:B}(W) = d_{u:B}(W') = d_{uv:B}(W')$. The proof proceeds by induction on $n$. If $n = 0$, $d_{u:B}(b_0) = d_{u:B}(b_0) = d_{uv:B}(b_0) = 0$. If the equality holds for $n = k$, then writing $W' = b_0 (uvv^*)^{g(1)} \bar W'$, where $\bar W'$ has length $k$, and similarly $W = b_0 u^{g(1)} \bar W$, with $\bar W$ having length $k$, we have if $g(1) = \cdot$:
\begin{align*}
d_{u:B}(W) &= b_0 uu^* \otimes u\bar W + b_0 u\, d_{u:B}(\bar W), \\
d_{u:B}(W') &= b_0 uu^* \otimes uvv^*\bar W' + b_0 uvv^*\, d_{u:B}(\bar W'), \\
d_{uv:B}(W') &= b_0 uvv^*u^* \otimes uvv^*\bar W' + b_0 uvv^*\, d_{uv:B}(\bar W'),
\end{align*}
which are the same by the inductive assumption. The case that $g(1) = *$ is similar.

Assume now that $F^*(uv : B, \eta) < +\infty$, so that $\xi = \xi(uv : B, \eta)$ exists. Then
\[ \langle \xi, W\rangle = \langle \xi, W'\rangle = \langle 1 \otimes 1, d_{uv:B}(W')\rangle = \langle 1 \otimes 1, d_{u:B}(W')\rangle = \langle 1 \otimes 1, d_{u:B}(W)\rangle. \]
It follows that $\xi(u : B, \eta) = \xi$. Assuming now that $F^*(u : B, \eta) < +\infty$, we can apply the same reasoning as before, replacing $u$ by $(uv)$ and $v$ by $v^*$ to conclude that $\xi((uv)v^* : B, \eta) = \xi(uv : B, \eta)$.

Proposition 2.7. Let $v$ be an isometry, free with amalgamation over $B$ from $u$, and assume that $uu^*, u^*u \in B$. Assume that $u = uvv^*$. Assume further that $F^*(v : B, \eta) < +\infty$. Then $F^*(uv : B, \eta) \le F^*(v : B, \eta) < +\infty$. In fact, $\xi(uv : B, \eta) = E_{B[u]}(\xi(u : B, \eta))$.

Proof. Let $C = B[u]$. Then $F^*(v : B, \eta) = F^*(v : C, \eta \circ E_B) = F^*(vu : C, \eta \circ E_B)$, and $\xi(v : B, \eta) = \xi(v : C, \eta \circ E_B) = \xi(vu : C, \eta \circ E_B)$. Hence
\[ F^*(vu : B, \eta) \le F^*(vu : C, \eta \circ E_B) = F^*(v : B, \eta) < +\infty \]
and
\[ \xi(vu : B, \eta) = E_{B[u]}\,\xi(vu : C, \eta \circ E_B) = E_{B[u]}\,\xi(v : B, \eta). \]

Corollary 2.8. Let $u$ be a unitary, and $p \in B$ be an isometry. Assume that $upu^* \in B$. Then $F^*(up : B, \eta) \le F^*(u : B, \eta)$.

3. Free Fisher Information for Several Isometries

Definition 3.1. Let $B \subset M$, $\eta \in CP(B)$ be as before, and let $u_1, \dots, u_n \in M$ be isometries. Then the mutual free Fisher information of $u_1, \dots, u_n$ relative to $B$, $\eta$ is defined to be
\[ F^*(u_1, \dots, u_n : B, \eta) = \sum_j F^*(u_j : B[u_1, \dots, \hat u_j, \dots, u_n], \eta \circ E_B). \]
Proposition 3.2. $F^*(u_1, \dots, u_n) \le F^*(u_1, \dots, u_k) + F^*(u_{k+1}, \dots, u_n)$.

Proof. If $1 \le j \le k$, then
\[ F^*(u_j : B[u_1, \dots, \hat u_j, \dots, u_n]) \le F^*(u_j : B[u_1, \dots, \hat u_j, \dots, u_k]), \]
by Proposition 2.4. Similarly, for $k < j \le n$,
\[ F^*(u_j : B[u_1, \dots, \hat u_j, \dots, u_n]) \le F^*(u_j : B[u_{k+1}, \dots, \hat u_j, \dots, u_n]). \]

Proposition 3.3. If the isometries $u_1, \dots, u_k$ are free with amalgamation over $B$ from isometries $u_{k+1}, \dots, u_n$, then $F^*(u_1, \dots, u_n) = F^*(u_1, \dots, u_k) + F^*(u_{k+1}, \dots, u_n)$.
Proof. If $1 \le j \le k$, then
\[ F^*(u_j : B[u_1, \dots, \hat u_j, \dots, u_n]) = F^*(u_j : B[u_1, \dots, \hat u_j, \dots, u_k]), \]
by Proposition 2.5. Similarly, if $k < j \le n$, we get that
\[ F^*(u_j : B[u_1, \dots, \hat u_j, \dots, u_n]) = F^*(u_j : B[u_{k+1}, \dots, \hat u_j, \dots, u_n]). \]
Applying the definition of $F^*(u_1, \dots, u_n)$ proves the proposition.

Corollary 3.4. If $u_j$ are free with amalgamation over $B$, then $F^*(u_1, \dots, u_n : B, \eta) = \sum_j F^*(u_j : B, \eta)$.

Proposition 3.5. Let $u_1, \dots, u_n$ be isometries. Then:
1. If $D \subset B$ is such that $\eta(D) \subset D$, then $F^*(u_1, \dots, u_n : D, \eta) \le F^*(u_1, \dots, u_n : B, \eta)$.
2. If $D \subset B$ is such that $\eta = E_D \circ \eta \circ E_D$, and $u_1, \dots, u_n$ are $*$-free with amalgamation over $D$ from $B$, then $F^*(u_1, \dots, u_n : D, \eta) = F^*(u_1, \dots, u_n : B, \eta)$.

Proof. Follows from Propositions 2.4 and 2.5.
3.1. $F^*$ of a single unitary. In the case that $u$ is a unitary, $B = \mathbb{C}$ and $\eta = \mathrm{id}$, Voiculescu gave an explicit formula for $F^*(u) = F^*(u : \mathbb{C}, \mathrm{id})$:
\[ 3F^*(u) = -1 + \int p^3\,d\lambda, \]
where $\lambda$ is the Haar measure on $\mathbb{T} = \{z \in \mathbb{C} : |z| = 1\}$, and $p$ is the density with respect to the Haar measure of the distribution of $u$. In particular, if $u(t)$ is unitary free Brownian motion starting at identity and $0 < \alpha < \tfrac{1}{2}$, then there exist constants $C_1, C_2$, independent of $t$, for which $F^*(u(t)) \le C_1 t^{-1}$ for $0 < t \le 1$, and $F^*(u(t)) \le C_2 \exp(-t\alpha)$.

4. Free Fisher Information for Normalizing Isometries

Let $B \subset M$ be as before. Denote by $N(B)$ the normalizer of $B$, consisting of those unitaries $w \in M$ for which $wBw^* = B$. If $u \in N(B)$, then $\mathrm{Ad}_u(w) = uwu^*$ is an automorphism of $B$. We write $GN(B)$ for the “full group” of the normalizer; $GN(B)$ consists of all isometries $v \in M$ for which $v^*Bv \subset B$ and $vBv^* \subset B$. Notice that in particular $vv^*$ and $v^*v$ are assumed to be in $B$. If $u \in N(B)$, then frequently $F^*(u : B, \tau) = +\infty$. Indeed:

Proposition 4.1. If the conjugate variable $\xi = \xi(u : B, \tau)$ exists, and $u \in N(B)$, then $\tau(ubu^*c) = \tau(b)\tau(c)$ for all $b, c \in B$.
Proof. If $\xi$ were to exist, we would have, for $b, c \in B$,
\[ 0 = \langle \xi, ubu^*c\rangle = \langle 1 \otimes 1, 1 \otimes ubu^*c - ubu^* \otimes c\rangle = \tau(ubu^*c) - \tau(ubu^*)\tau(c) = \tau(ubu^*c) - \tau(b)\tau(c), \]
which would imply that $\tau(ubu^*c) = \tau(b)\tau(c)$.
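The middle step in the proof of Proposition 4.1 is an application of the Leibniz rule for $d_{u:B}$. Spelled out under the assumptions of the proposition ($u$ unitary, so $uu^* = 1$; $b, c \in B$, on which $d_{u:B}$ vanishes), and assuming that for $\eta = \tau$ the pairing with $1 \otimes 1$ reduces to $\langle 1 \otimes 1, x \otimes y\rangle = \tau(x)\tau(y)$, the computation reads:

```latex
\begin{align*}
d_{u:B}(u b u^* c)
  &= d_{u:B}(u)\cdot b u^* c \;+\; u b \cdot d_{u:B}(u^*) \cdot c \\
  &= 1 \otimes u b u^* c \;-\; u b u^* \otimes u u^* c \\
  &= 1 \otimes u b u^* c \;-\; u b u^* \otimes c, \\
\langle 1 \otimes 1,\, d_{u:B}(u b u^* c)\rangle
  &= \tau(u b u^* c) - \tau(u b u^*)\,\tau(c)
   = \tau(u b u^* c) - \tau(b)\,\tau(c).
\end{align*}
% The left-hand pairing equals <\xi, u b u^* c>, which vanishes because u b u^* c lies in B
% and d_{u:B} is zero on B.
```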
Definition 4.2. A unital subalgebra $A \subset M$ is said to be Aut-independent from the algebra $B$, if:
1. There exist generators $v_1, v_2, \dots$ of $A$, so that $v_j \in N(B)$;
2. For all $a \in A$ and $b \in B$, one has $\tau(ab) = \tau(a)\tau(b)$. Equivalently, $E_B(A) = \mathbb{C}1$.
We say that unitaries $a_1, \dots, a_n$ are Aut-independent from $B$, if $W^*(a_1, \dots, a_n)$ is Aut-independent from $B$ and $a_j \in N(B)$.

Example 4.3. Let $X$ be a finite measure space, and let $\alpha$ be a measure-preserving action of a discrete group $G$ on $X$. Let $M = L^\infty(X) \rtimes_\alpha G$, $B = L^\infty(X)$. For $g \in G$, write $u_g = u(\alpha, g)$ for the unitary in $M$ implementing the action of $g$. Then for any $g_1, \dots, g_n \in G$, $u_{g_1}, \dots, u_{g_n}$ are Aut-independent from $B$. Furthermore, $L(G) = W^*(u_g : g \in G) \subset M$ is Aut-independent from $B$.

Proposition 4.4. Let $A_1, A_2$ be two algebras, each Aut-independent from $B$. Assume that $A_1$ and $A_2$ are free with amalgamation over $B$. Then $A_1 \vee A_2$ is Aut-independent from $B$.

Proof. It is clear that $A_1 \vee A_2$ has a sequence of generators from $N(B)$. Thus we must prove that if $a \in A_1 \vee A_2$, then $E_B(a) \in \mathbb{C}$. Let $w = a_1 a_2 a_3 \cdots a_n \in A_1 \vee A_2$, so that $a_j \in A_{i(j)}$, $i(j) \ne i(j+1)$ for all $j$. Let $[w]$ be the number of $j$ for which $E_B(a_j) \ne 0$. Then we have
\[ E_B(w) = E_B(\underbrace{(a_1 - E_B(a_1))a_2 \cdots a_n}_{w_1}) + E_B(a_1)\,E_B(\underbrace{a_2 \cdots a_n}_{w_2}). \]
Since $E_B(a_1) \in \mathbb{C}$ by Aut-independence of $A_{i(1)}$ and $B$, it is sufficient to prove that $E_B(w_1)$ and $E_B(w_2) \in \mathbb{C}$. But $[w_1], [w_2] < [w]$. Hence proceeding inductively, we must prove that $E_B(w) \in \mathbb{C}$ if $[w] = 0$. But in that case $E_B(w) = 0$, by the freeness with amalgamation assumption.

Proposition 4.5. Let $D \subset B$ be a unital subalgebra. Assume that $B$ is generated by $D$ and another unital subalgebra $C$. Assume that $C[u_1, \dots, u_n]$ is Aut-independent from $D$, and $u_1, \dots, u_n \in N(D)$. Assume that $\eta \in CP(B)$ is such that $\eta|_D = \mathrm{id}$ and $\eta(C) \subset C$. Then $F^*(u_1, \dots, u_n : B, \eta) = F^*(u_1, \dots, u_n : C, \eta|_C)$.

Proof. Using the definition of $F^*$ and by repeatedly replacing the algebra $B$ by $B[u_1, \dots, \hat u_j, \dots, u_n]$, $C$ by $C[u_1, \dots, \hat u_j, \dots, u_n]$, it is sufficient to prove the proposition in the case $n = 1$. We write $u = u_1$. Since $\eta(C) \subset C$, by Proposition 2.4, we in general have $F^*(u : B, \eta) \ge F^*(u : C, \eta|_C)$.
Free Information and Cost
Thus we may assume that ξ = ξ(u : C, η|C) exists. We’ll show that ξ(u : C, η|C) = ξ(u : B, η), which would establish the proposition. Let w ∈ B[u, u∗]. We must show that

$$\langle \xi, w\rangle = \langle 1\otimes 1,\ d_{u:B}(w)\rangle_{H(B,\eta)}.$$

By assumption, u∗Du ⊂ D, uDu∗ ⊂ D. We may therefore assume that w has the form

$$w = c_0 d_0 u^{g(1)} c_1 d_1 u^{g(2)} \cdots c_n d_n,$$

where cj ∈ C, dj ∈ D, g(j) ∈ {·, ∗}, and also cj ∈ N(D). Then

$$\langle \xi, w\rangle = \langle \xi,\ c_0 u^{g(1)} c_1 \cdots c_n\, d_0' d_1' \cdots d_n'\rangle,$$

where

$$d_j' = \mathrm{Ad}^{-1}_{u^{g(n)}c_n} \cdots \mathrm{Ad}^{-1}_{u^{g(j+1)}c_{j+1}}\, \mathrm{Ad}^{-1}_{u^{g(j)}c_j}(d_j).$$

Since ξ ∈ L²(C[u, u∗]) and C[u, u∗] is Aut-independent from D, it follows that

$$\langle \xi, w\rangle = \langle \xi,\ c_0 u^{g(1)} c_1 \cdots c_n\rangle \cdot \tau(d_0' d_1' \cdots d_n').$$

On the other hand,

$$d_{u:B}(w) = \sum_k c_0 d_0 u^{g(1)} \cdots c_k d_k \begin{cases} uu^*\otimes u & \text{if } g(k) = 1\\ -u^*\otimes uu^* & \text{if } g(k) = -1 \end{cases} \times\, c_{k+1} d_{k+1} u^{g(k+1)} \cdots c_n d_n = d_{u:C}(w)\cdot d_0' d_1' \cdots d_n',$$

since a · d ⊗ b = a ⊗ d · b for all a, b ∈ B[u] and d ∈ D. Because of independence again, we find that

$$\langle 1\otimes 1,\ d_{u:B}(w)\rangle_{H(B,\eta)} = \langle 1\otimes 1,\ d_{u:C}(w)\rangle_{H(B,\eta)} \cdot \tau(d_0' d_1' \cdots d_n') = \langle \xi, w\rangle,$$

because ξ = ξ(u : C, η|C).
Corollary 4.6. Let G be a finitely-generated group and g1, . . . , gn be its generators. Let α be a measure-preserving action of G on a finite measure space X. Let u(g1), . . . , u(gn) be the unitaries in L(G) ⊂ L∞(X) ⋊_α G implementing the action. Then F∗(u(g1), . . . , u(gn) : L∞(X), id) = F∗(u(g1), . . . , u(gn)). In particular, let G be a free group, and let g1, . . . , gn be its standard generators. Then

$$F^*(u(g_1),\dots,u(g_n) : L^\infty(X), \mathrm{id}) = F^*(u(g_1),\dots,u(g_n)) = \sum_{i=1}^n F^*(u(g_i)) = 0.$$

Proof. It can easily be seen from Example 4.3 that L(G) is Aut-independent from L∞(X) inside L∞(X) ⋊_α G. This implies the first claim. In the case G is a free group, the u(gi) are free, hence F∗(u(g1), . . . , u(gn)) = Σ_i F∗(u(gi)). Since each u(gi) is a Haar unitary, it follows that F∗(u(gi)) = 0.
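The last step of the proof rests on the single-unitary formula of Sect. 3.1, 3F∗(u) = −1 + ∫ p³ dλ, which vanishes when p ≡ 1. The following is a minimal numerical sketch of that formula, not part of the paper: the function names, the midpoint quadrature and the test density 1 + 0.5 cos θ are illustrative assumptions.

```python
import math


def free_fisher_single_unitary(density, samples=20000):
    """Approximate F*(u) from 3 F*(u) = -1 + integral of p^3 d(lambda).

    `density` is p(theta), the density of the distribution of u with respect
    to normalized Haar measure on the circle, parametrized by the angle theta.
    The integral is approximated by a midpoint rule (an illustrative choice).
    """
    total = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * (k + 0.5) / samples
        total += density(theta) ** 3
    integral_p_cubed = total / samples  # d(lambda) = d(theta) / (2 pi)
    return (integral_p_cubed - 1.0) / 3.0


def haar_density(theta):
    """Density of a Haar unitary: p is identically 1."""
    return 1.0


def perturbed_density(theta):
    """A hypothetical non-uniform probability density on the circle."""
    return 1.0 + 0.5 * math.cos(theta)


if __name__ == "__main__":
    print(free_fisher_single_unitary(haar_density))
    print(free_fisher_single_unitary(perturbed_density))
```

For the perturbed density the integral of p³ can be done by hand: it equals 1 + 3 · (0.5)² · (1/2) = 1.375, so the formula gives F∗ = 0.125, strictly positive as expected for a non-Haar distribution.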
Proposition 4.7. Let D ⊂ B be a unital subalgebra of B, u1, . . . , un ∈ GN(B). Let p_i ∈ D ⊂ B, i = 1, . . . , N, N ∈ N ∪ {∞}, be orthogonal projections, so that Σ_{i=1}^N p_i = 1. Assume that [p_j, B] = {0} and [u_i, p_j] = 0 for all i = 1, . . . , n, j = 1, . . . , N. Then

$$F^*(u_1,\dots,u_n : B, E_D) = \sum_j \tau(p_j)\, F^*(p_j u_1,\dots,p_j u_n : p_j B, E_{p_j D}).$$

Proof. It is sufficient to prove the proposition for n = 1. We write u = u1. Let H = H(B[u, u∗], E_D) be the bimodule associated to E_D. Then by the assumptions made, we have the direct sum decomposition (as a Hilbert space) H = ⊕H_i, where H_i = p_i H p_i = p_i H as bimodules. Each H_i can be identified with H(p_i B p_i [p_i u p_i], E_{p_i D p_i}) by multiplying its inner product with τ(p_i)^{−1}. Similarly, K = L²(B[u]) = ⊕K_i, where K_i = p_i L²(B[u]). The space K_i can be identified with L²(p_i B p_i [p_i u p_i]) as bimodules over p_i B p_i by multiplying its inner product by τ(p_i)^{−1}. It is easily seen that d_{u:B} preserves these direct sum decompositions, so that d_{u:B} = ⊕d_i, where d_i : K_i → H_i. It is also clear that each d_i = d_{p_i u p_i : p_i B p_i}. Hence we get that ξ(u : B, E_D) = d∗_{u:B}(1 ⊗ 1) exists if and only if all ξ(p_i u p_i : p_i B p_i, E_{p_i D p_i}) = d∗_{p_i u p_i : p_i B p_i}(p_i 1 ⊗ 1) exist, and moreover

$$\xi(u : B, E_D) = \sum_i \xi(p_i u p_i : p_i B p_i, E_{p_i D p_i}).$$

It follows that

$$\begin{aligned}
F^*(u : B, E_D) = \|\xi(u : B, E_D)\|_2^2 &= \sum_i \|\xi(p_i u p_i : p_i B p_i, E_{p_i D p_i})\|_{K_i}^2\\
&= \sum_i \tau(p_i)\, \|\xi(p_i u p_i : p_i B p_i, E_{p_i D p_i})\|_{L^2(p_i B p_i [p_i u p_i])}^2\\
&= \sum_i \tau(p_i)\, F^*(p_i u p_i : p_i B p_i, E_{p_i D p_i}).
\end{aligned}$$
Remark 4.8. The same proof shows that Proposition 4.7 still holds if, rather than dealing with a discrete family of projections p_i, we are dealing with a projection-valued measure µ on X, µ(X) = 1. In that case, we get that

$$F^*(u : B, E_D) = \int_X F^*(u(t) : B(t), E_{d\mu(t)D})\, \tau(d\mu(t)),$$

where u(t) and B(t) are the disintegrations of u and B over the commutative algebra W∗(µ(A) : A ⊂ X).

The following proposition is essentially the basis of all computations of F∗ in this paper. Indeed, it reduces the computation of F∗ of several unitaries relative to a subalgebra to the computation of F∗ of a single unitary relative to C, for which there is an explicit formula (see Sect. 3.1). We mention that the situation as in the hypothesis of the proposition occurs in the case of a treeing of an equivalence relation (see later in the paper).
Proposition 4.9. Let u1, u2, . . . be unitaries in M, B ⊂ M a unital subalgebra, so that:
1. u_j ∈ N(B) for all j,
2. For each j, there are central projections p_1^{(j)}, . . . , p_{n(j)}^{(j)} ∈ B, so that [p_k^{(j)}, u_j] = 0 for all k, Σ_k p_k^{(j)} = 1, and for all k, p_k^{(j)} u_j p_k^{(j)} is Aut-independent from p_k^{(j)} B p_k^{(j)}.
3. u1, u2, . . . are free with amalgamation over B.
Then

$$F^*(u_1,\dots,u_n : B, \mathrm{id}) = \sum_{j=1}^{n}\sum_{i=1}^{n(j)} \tau(p_i^{(j)})\, F^*(p_i^{(j)} u_j p_i^{(j)})$$

and

$$\lim_{n\to\infty} F^*(u_1,\dots,u_n : B, \mathrm{id}) = \sum_{j=1}^{\infty}\sum_{i=1}^{n(j)} F^*(p_i^{(j)} u_j p_i^{(j)}).$$

Proof. Since u1, . . . , un are free with amalgamation over B, it follows that

$$F^*(u_1,\dots,u_n : B, \mathrm{id}) = \sum_{j=1}^{n} F^*(u_j : B, \mathrm{id}).$$

Next, we get that

$$F^*(u_j : B, \mathrm{id}) = \sum_i \tau(p_i^{(j)})\, F^*(p_i^{(j)} u_j p_i^{(j)} : p_i^{(j)} B, \mathrm{id}).$$

Since p_i^{(j)} B is independent from p_i^{(j)} u_j p_i^{(j)}, it follows that

$$F^*(p_i^{(j)} u_j p_i^{(j)} : p_i^{(j)} B, \mathrm{id}) = F^*(p_i^{(j)} u p_i^{(j)}).$$
This implies the statement.

5. Free Dimension for Normalizing Isometries

5.1. Free Brownian motion. Let u1, . . . , un ∈ M, B ⊂ M be as before. Consider multiplicative free Brownian motion w1(t), . . . , wn(t), such that:
1. w1(t), . . . , wn(t) are free among each other with amalgamation over B,
2. w1(t), . . . , wn(t) are free from u1, . . . , un with amalgamation over B,
3. [wj(t), B] = {0}, and {wj(t)}_j is Aut-independent from B.
To construct such a family, take w(t) to be free multiplicative Brownian motion [1] in an algebra C. Let C1, . . . , Cn be n isomorphic copies of C. Consider ∗_B(C_i ⊗ B), and let w_i(t) be w(t) ⊗ 1 in the i-th copy C_i of C.

Proposition 5.1. Let u1, . . . , un, w1(t), . . . , wn(t) be as in Sect. 5.1. Then, given 0 < α < 1/2, there exist constants C1, C2, independent of t, so that:
(a) F∗(wj(t) : B, id) < +∞; more precisely,

$$F^*(w_j(t) : B, \mathrm{id}) \le \frac{C_1}{t}, \quad 0 < t \le 1, \qquad F^*(w_j(t) : B, \mathrm{id}) \le C_2 e^{-t\alpha}, \quad 1 \le t.$$

(b) F∗(uw1(t), . . . , uwn(t)) < +∞; more precisely,

$$F^*(uw_1(t),\dots,uw_n(t)) \le n F^*(w_1(t)) \le \begin{cases} n\,\dfrac{C_1}{t}, & 0 < t \le 1,\\[4pt] n C_2 e^{-t\alpha}, & 1 \le t. \end{cases}$$

Proof. By Proposition 3.2,

$$F^*(uw_1(t),\dots,uw_n(t) : B, \mathrm{id}) \le \sum_j F^*(uw_j(t) : B, \mathrm{id}).$$

By Proposition 2.7, we get that F∗(uwj(t) : B, id) ≤ F∗(wj(t) : B, id). Since wj(t) is Aut-independent from B, we get by Proposition 4.5 that F∗(wj(t) : B, id) = F∗(wj(t)). The estimates now follow from Voiculescu’s results for a single unitary (see Sect. 3.1).

Definition 5.2. Let v1, . . . , vn ∈ M be partial isometries. Then the free entropy of v1, . . . , vn ∈ M relative to B, id is defined to be

$$\chi^*(v_1,\dots,v_n\ \ B) = \frac{1}{2}\int_0^\infty F^*(v_1 w_1(t),\dots,v_n w_n(t))\,dt.$$

By [9, Corollary 10.9], we get that if u is a single unitary, then

$$\chi^*(u\ \ \mathbb{C}) = \chi(u) = \iint \log|z_1 - z_2|\,d\mu(z_1)\,d\mu(z_2),$$

where µ is the distribution of u, supported on the unit circle.

Definition 5.3. Fix an element ψ of the β-compactification of I = (0, 1], which is not in I. The free dimension of u1, . . . , un relative to B, id is defined to be

$$\delta^*(u_1,\dots,u_n\ \ B) = \sum_j \tau(u_j^* u_j) - \lim_{t\to\psi} \frac{\chi^*(u_1 w_1(t),\dots,u_n w_n(t)\ \ B)}{\log t^{1/2}}.$$
In the case that B = C, we write simply δ ∗ (u1 , . . . , un ).
5.2. Properties of δ∗. The properties of F∗, together with Proposition 4.4, immediately give rise to the following properties of δ∗(· · · B):

Proposition 5.4. Let u1, . . . , un, B be as above. Then we have:
1. 0 ≤ δ∗(u1, . . . , un) ≤ Σ_{i=1}^n τ(u_i u_i∗).
2. If u1, . . . , uk are free from uk+1, . . . , un with amalgamation over B, then δ∗(u1, . . . , un) = δ∗(u1, . . . , uk) + δ∗(uk+1, . . . , un).
3. δ∗(u1, . . . , un) ≤ δ∗(u1, . . . , uk) + δ∗(uk+1, . . . , un).
4. If D ⊂ B is a unital subalgebra, then δ∗(u1, . . . , un B) ≤ δ∗(u1, . . . , un D) ≤ δ∗(u1, . . . , un).
5. If D ⊂ B is a unital subalgebra, so that B = D ∨ C, and C[u1, . . . , un] is Aut-independent from B, and u1, . . . , un ∈ GN(D), then δ∗(u1, . . . , un B) = δ∗(u1, . . . , un C). In particular, if u1, . . . , un are Aut-independent from B, then δ∗(u1, . . . , un B) = δ∗(u1, . . . , un).
6. If p_i, i = 1, . . . , N, N ∈ N ∪ {∞}, are orthogonal central projections in B, so that Σ_i p_i = 1, and so that [p_i, u_j] = 0 for all i, j, then

$$\delta^*(u_1,\dots,u_n\ \ B) = \sum_i \tau(p_i)\,\delta^*(p_i u_1 p_i,\dots,p_i u_n p_i\ \ p_i B p_i).$$

The same conclusion holds if µ is a projection-valued measure on X, µ(X) = 1, so that µ(Y) ∈ B is in the center of B[u1, . . . , un] for all Borel subsets Y ⊂ X:

$$\delta^*(u_1,\dots,u_n) = \int_X \delta^*(u_1(t),\dots,u_n(t)\ \ B(t))\,\tau(d\mu(t)),$$

where uj(t), B(t) is the disintegration of uj, B over W∗(µ(A) : A ⊂ X).
6. Free Dimension for One Variable

Proposition 6.1. Let u ∈ M be a unitary, and let µ be its distribution, supported on the circle T = {z : |z| = 1}. Let w(t) be multiplicative free Brownian motion, free from u. Let u_t = uw(t), and let µ_t be the distribution of u_t. Then

$$2\lim_{t\to 0} \frac{\iint \log|z-w|\,d\mu_t(z)\,d\mu_t(w)}{|\log t|} = -\sum_{t\in T} (\mu\{t\})^2.$$

Proof. We reduce the problem to the form dealt with in Voiculescu’s proof of [7, Proposition 6.3]. We know by [9, Corollary 1.7] that µ_t is absolutely continuous with respect to Haar measure on T. Denote by p_t its density; then we also know that ‖p_t‖_∞ ≤ 2/√t for t sufficiently small. We have that ‖w(t) − 1‖_∞ ≤ K t^{1/2} for some constant K. We also have that

$$\mu_{t^2}(-\pi, b - Kt) \le \mu(-\pi, b) \le \mu_{t^2}(-\pi, b + Kt); \tag{6.1}$$

this follows from the fact that µ_t is the distribution of uw(t), and ‖w(t²) − 1‖ ≤ Kt, so that the spectrum of w(t²) is contained in the arc (−Kt, Kt). The quantity ∬ log |z − w| dµ_t(z) dµ_t(w) does not change if we replace µ_t(w) by µ_t(rw) for some fixed r ∈ T. Hence we may assume that there are no atoms of µ in some arc around −1. Let −π ≤ θ ≤ π be the polar coordinate on T. We’ll from now on write dµ_t(θ) to denote the density in [−π, π] of the push-forward via θ of the measure µ_t. Notice that

$$\log|e^{i\theta} - e^{i\psi}| = \tfrac12 \log(2 - 2\cos(\theta-\psi)) = \tfrac12 \log(1 - \cos(\theta-\psi)) + \tfrac12 \log 2.$$

It will thus suffice to prove that

$$\lim_{t\to 0} \frac{\iint \log(1-\cos(\theta-\psi))\,d\mu_t(\theta)\,d\mu_t(\psi)}{|\log t|} = -\sum_{t\in T} (\mu\{t\})^2.$$

There is a δ > 0 so that for |α| < δ,

$$\frac{\alpha^2}{4} \le 1 - \cos\alpha \le \frac{\alpha^2}{2}.$$

It follows that

$$\begin{aligned}
&\frac{2}{|\log t|}\iint_{|\theta-\psi|<\delta} \log|\theta-\psi|\,d\mu_t(\theta)\,d\mu_t(\psi) - \frac{\log 4}{|\log t|}\iint_{|\theta-\psi|<\delta} 1\,d\mu_t(\theta)\,d\mu_t(\psi)\\
&\qquad\le \frac{1}{|\log t|}\iint_{|\theta-\psi|<\delta} \log(1-\cos(\theta-\psi))\,d\mu_t(\theta)\,d\mu_t(\psi)\\
&\qquad\le \frac{2}{|\log t|}\iint_{|\theta-\psi|<\delta} \log|\theta-\psi|\,d\mu_t(\theta)\,d\mu_t(\psi) - \frac{\log 2}{|\log t|}\iint_{|\theta-\psi|<\delta} 1\,d\mu_t(\theta)\,d\mu_t(\psi).
\end{aligned}$$
When 2π − δ ≥ |θ − ψ| ≥ δ, 0 ≥ log(1 − cos(θ − ψ)) ≥ log(1 − cos δ), and when |θ − ψ| ≥ δ, 0 ≥ log |θ − ψ| ≥ log δ. Hence as t → 0, the following expressions

$$|\log t|^{-1}\iint \log(1-\cos(\theta-\psi))\,d\mu_t(\theta)\,d\mu_t(\psi),$$

$$|\log t|^{-1}\iint_{|\theta-\psi|<\delta} \log(1-\cos(\theta-\psi))\,d\mu_t(\theta)\,d\mu_t(\psi) + |\log t|^{-1}\iint_{|\theta-\psi|\ge 2\pi-\delta} \log(1-\cos(\theta-\psi))\,d\mu_t(\theta)\,d\mu_t(\psi),$$

$$|\log t|^{-1}\iint_{|\theta-\psi|<\delta} 2\log|\theta-\psi|\,d\mu_t(\theta)\,d\mu_t(\psi) + |\log t|^{-1}\iint_{|\theta-\psi|\ge 2\pi-\delta} 2\log|2\pi-(\theta-\psi)|\,d\mu_t(\theta)\,d\mu_t(\psi)$$

and

$$|\log t|^{-1}\iint 2\log|\theta-\psi|\,d\mu_t(\theta)\,d\mu_t(\psi) + |\log t|^{-1}\iint 2\log|2\pi-(\theta-\psi)|\,d\mu_t(\theta)\,d\mu_t(\psi)$$

all have the same limit, if any. It is therefore sufficient to show that

$$\lim_{t\to 0} \frac{2\iint \log|\theta-\psi|\,d\mu_t(z)\,d\mu_t(w)}{|\log t|} = -\sum_{t\in T}(\mu\{t\})^2, \qquad \lim_{t\to 0} \frac{2\iint \log|2\pi-(\theta-\psi)|\,d\mu_t(\theta)\,d\mu_t(\psi)}{|\log t|} = 0,$$

i.e.,

$$\lim_{t\to 0} \frac{\iint \log|\theta-\psi|\,d\mu_t(z)\,d\mu_t(w)}{|\log t^{1/2}|} = -\sum_{r\in T}(\mu\{r\})^2, \qquad \lim_{t\to 0} \frac{\iint \log|2\pi-(\theta-\psi)|\,d\mu_t(\theta)\,d\mu_t(\psi)}{|\log t^{1/2}|} = 0.$$

Let t = s²/K², and write ν_t for µ_{t²/K²}. Then we must prove that

$$\lim_{s\to 0} \frac{\iint \log|\theta-\psi|\,d\nu_s(z)\,d\nu_s(w)}{|\log s|} = -\sum_{t\in T}(\mu\{t\})^2, \qquad \lim_{s\to 0} \frac{\iint \log|\theta-\psi|\,d\nu_s(\theta-\pi)\,d\nu_s(\psi+\pi)}{|\log s|} = 0.$$

Let v_s be the density of ν_s. We now proceed exactly as in the proof of [7, Proposition 6.3], noticing that the only estimates on ν_s needed in that proof are that: (a) ‖v_s‖_2 ≤ C s^{−1/2} and (b) ν_s(−π, b − s) ≤ µ(−π, b) ≤ ν_s(−π, b + s). Voiculescu’s proof then shows that

$$\lim_{s\to 0}\left( \frac{\iint \log|\theta-\psi|\,d\nu_s(z)\,d\nu_s(w)}{|\log s|} - \mu\otimes\mu(B) \right) = 0, \qquad \lim_{s\to 0}\left( \frac{\iint \log|\theta-\psi|\,d\nu_s(\theta-\pi)\,d\nu_s(\psi+\pi)}{|\log s|} - \mu_{-\pi}\otimes\mu_{\pi}(B) \right) = 0,$$
where B = {(t, t) : t ∈ [−π, π]} and µ_r denotes the translation of µ by r. Since µ has no atoms around π and −π, µ_{−π} ⊗ µ_π(B) = 0, while µ ⊗ µ(B) = Σ_{t∈T} (µ{t})². It remains to prove estimates (a) and (b). To prove estimate (a), it is sufficient to prove that ‖p_t‖_2 ≤ Ĉ t^{−1/4}. Since ‖p_t‖_∞ ≤ K t^{−1/2} and p_t is supported on an arc of length C t^{1/2}, it follows that

$$\|p_t\|_r^r = \int_{\operatorname{supp} p_t} |p_t(\theta)|^r\,d\theta \le C t^{1/2}\cdot K^r t^{-r/2} = C K^r t^{(1-r)/2}.$$

Hence ‖p_t‖_r ≤ Ĉ t^{(1/2)(−1+1/r)}. Setting r = 2 gives the estimate ‖p_t‖_2 ≤ Ĉ t^{−1/4}. Estimate (b) follows by substituting the definition of ν_s and s into (6.1).

Corollary 6.2. Let u ∈ M be a unitary, and let µ be its distribution, supported on the circle T = {z : |z| = 1}. Then δ∗(u) = 1 − Σ_{t∈T} µ({t})².

Corollary 6.3. Let G be a cyclic singly-generated group, and g ∈ G be its generator. Let u ∈ L(G) be the unitary implementing g. Then δ∗(u) = 1 − 1/|G|.

Proof. If |G| = ∞, the distribution of u is a Haar measure. It follows that δ∗(u) = 1. If |G| = n, then the distribution of u is equal to (1/n) Σ_{k=1}^n δ_{ω_n^k}, where ω_n is a primitive nth root of unity. We have that µ({ω_n})² = 1/n², so that Σ_{t∈T} µ({t})² = n · (1/n²) = 1/n.

Proposition 6.4. Let v be an isometry, v ∈ GN(B). Let w be a Haar unitary, independent from B and free from v with amalgamation over B. Assume that the B-valued distributions of v and vw are the same. Then δ∗(v B) = τ(vv∗).

Proof. Since v ∈ GN(B), one has in particular vv∗, v∗v ∈ B. Then δ∗(v) ≤ τ(v∗v) = τ(vv∗). Let w be a Haar unitary, independent from B and free from v with amalgamation over B. Assume that vw has the same B-valued distribution as v. Then F∗(w : B, id) = F∗(w) = 0 because w is independent from B; and by Proposition 2.4, we get that 0 ≤ F∗(vw : B, id) ≤ F∗(w : B, id) = 0. Hence F∗(vw : B, id) = 0. Let w(t) be multiplicative unitary free Brownian motion, which is independent from B and free from w, v with amalgamation over B. Then the B-valued distributions of vw(t) and vww(t) coincide. Since ww(t) is free from v with amalgamation over B and has the same B-valued distribution as w, it follows that the B-valued distributions of vw(t) and vw are the same. Hence we get that

$$\delta^*(v\ \ B) = \tau(v^*v) - \lim_{t\to\psi} \frac{F^*(vw(t) : B, \mathrm{id})}{F^*(w(t))} = \tau(v^*v) - 0.$$
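As a small worked illustration of Corollaries 6.2 and 6.3, the formula δ∗(u) = 1 − Σ µ({t})² depends only on the masses of the atoms of µ. The sketch below is not from the paper; the function names are hypothetical, and exact rational arithmetic is used only to keep the example free of rounding.

```python
from fractions import Fraction


def free_dimension_from_atoms(atom_masses):
    """Evaluate 1 - sum of squared atom masses (the formula of Corollary 6.2).

    `atom_masses` lists the masses mu({t}) of the atoms of the distribution
    of the unitary; any remaining mass is assumed to be atomless.
    """
    masses = [Fraction(m) for m in atom_masses]
    if any(m < 0 for m in masses) or sum(masses) > 1:
        raise ValueError("atom masses must be nonnegative with total at most 1")
    return 1 - sum(m * m for m in masses)


def free_dimension_cyclic(order):
    """Corollary 6.3: pass order=None for the infinite cyclic group."""
    if order is None:
        # Haar distribution: no atoms at all.
        return free_dimension_from_atoms([])
    # Uniform distribution on the n-th roots of unity: n atoms of mass 1/n.
    return free_dimension_from_atoms([Fraction(1, order)] * order)


if __name__ == "__main__":
    for n in (1, 2, 3, None):
        print(n, free_dimension_cyclic(n))
```

For a cyclic group of order n the n atoms each contribute 1/n², which recovers 1 − 1/n; the trivial group (n = 1) gives free dimension 0.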
Problem 6.5. Let u1, . . . , un ∈ GN(B). Is δ∗(u1, . . . , un B) an invariant for the pair (W∗(u1, . . . , un B), B)?

7. Free Dimension and Cost

7.1. Local morphisms. Let (X, µ) be a finite measure space. If A ⊂ X is a subset, a local morphism φ : A → φ(A) is a measure-preserving measurable monomorphism from its domain A to its range φ(A). Let R be a measurable equivalence relation on X. We say that φ is consistent with R, if φ(x) ∼_R x for almost all x ∈ A. If φ, ψ are local morphisms, then so are φ ◦ ψ and φ^{−1}; these are defined on the subsets {x ∈ dom ψ : ψ(x) ∈ dom φ} and ran φ, respectively.
7.2. Graphings and treeings. Following Gaboriau [4], we say that a set {φ_i}_{i∈I} of local morphisms is a graphing of an equivalence relation R, if a.e. in X, x ∼_R y iff

$$x = \varphi_{i_1}^{g(1)}(\varphi_{i_2}^{g(2)}(\dots \varphi_{i_n}^{g(n)}(y)\dots))$$

for some i1, . . . , in ∈ I and g(1), . . . , g(n) ∈ {±1}. A graphing is a treeing, if for any i1 ≠ i2, i2 ≠ i3, . . . and g(1), . . . , g(n) ∈ ±N, the set

$$\{x : \varphi_{i_1}^{g(1)}(\varphi_{i_2}^{g(2)}(\dots \varphi_{i_n}^{g(n)}(x)\dots)) = x\}$$

has measure zero.

7.3. Cost of an equivalence relation. The cost C(R) of R is the infimum of the numbers Σ_{i∈I} µ(dom φ_i) taken over all graphings {φ_i}_{i∈I} of R. If {φ_i}_{i∈I} is a treeing, then C(R) is given by Σ_{i∈I} µ(dom φ_i).

7.4. Free equivalence relations. Two equivalence relations R1 and R2 are called free, if for each n ≥ 2, the set of all x ∈ X for which

$$x \sim_{R_{i_1}} x_1 \sim_{R_{i_2}} \cdots \sim_{R_{i_n}} x$$

with i1 ≠ i2, i2 ≠ i3, . . . , i_{n−1} ≠ i_n has measure zero. Gaboriau proved that C(R1 ∨ R2) = C(R1) + C(R2), if R1 ∨ R2 is the equivalence relation generated by R1 and R2, and R1, R2 are free.

7.5. The von Neumann algebra of an equivalence relation. Feldman and Moore [2] associated to a measurable equivalence relation R on X a von Neumann algebra W∗(X, R). We summarize some properties of M = W∗(X, R):
1. W∗(X, R) contains L∞(X) as a unital subalgebra;
2. Every local morphism φ consistent with R defines a partial isometry v_φ ∈ M, so that v_φ∗ v_φ = χ_{dom φ}, v_φ v_φ∗ = χ_{ran φ}, and v_φ f v_φ∗ = f ◦ φ for all f ∈ L∞(dom φ) ⊂ L∞(X) (χ denotes the characteristic function of a set). Furthermore, v_φ v_ψ = v_{φψ}, v_φ∗ = v_{φ^{−1}}.
3. If φ is a local morphism, and C ⊂ dom φ, then v_{φ|C} = v_φ χ_C.
4. M is generated by all v_φ, where φ ranges over all possible local morphisms consistent with R.
5. There is a conditional expectation E from M onto L∞(X), so that τ = µ ◦ E is a trace on M. Moreover, if φ is a local morphism, then E(v_φ) = χ_{{x∈X : φ(x)=x}}.

7.6. Extensions of local morphisms. Let φ : A → G be a local morphism. Let A_∞ = {x ∈ X : x ∈ dom φ^{−n}, ∀n}. Then φ(A) · A_∞ = φ^{−1}(A) · A_∞ = A_∞. Let A^∞ = {x ∈ X : x ∈ dom φ^n, ∀n}. Let C = dom φ ∪ ran φ. Then φ(A) · C = φ^{−1}(A) · C = C. Let D = C \ A_∞ \ A^∞. Then for all x ∈ D, there exists a number n(x), so that x ∈ dom φ^{−n}, but x ∉ dom φ^{−(n+1)}. Let A_n = {x : n(x) = n}. Now define Dφ(x) = φ(x) if x ∈ A^∞, Dφ(x) = φ^{−1}(x) if x ∈ A_∞ \ A^∞, Dφ(x) = x if x ∉ C, and Dφ(x) = φ^{−1}(x) if x ∈ A_n, n > 0, Dφ(x) = φ^{m(x)}(x) if x ∈ A_0, where m(x) = max{p : x ∈ dom φ^p}.

Proposition 7.1. Let φ : A → G be a local morphism, compatible with an equivalence relation R. Then Dφ is an automorphism of X, compatible with R.

Proof. We must check that Dφ(x) ≠ Dφ(y) if x ≠ y. This is clear if at least one of (x, y) is not in C, or if x ∈ A_n, y ∈ A_m for n = m. Assume that x ∈ A_n, y ∈ A_m, but n ≠ m. For definiteness, assume that n > 0. Then Dφ(x) ∈ A_{n−1}. If m > 0, then Dφ(y) ∈ A_{m−1}, so that Dφ(x) ≠ Dφ(y). If m = 0, Dφ(y) = Dφ(x) would imply that φ^{m(y)}(y) = Dφ(x) ∈ dom φ.
Given a graphing {φ_i} of an equivalence relation R, we can associate to it unitaries u_i ∈ W∗(X, R) by taking the unitaries u_{φ_i} associated to bisections Dφ_i.

Proposition 7.2. Let {φ_i}_{i∈I} be a graphing of an equivalence relation R on a finite measure space X. Then {Dφ_i}_{i∈I} is also a graphing of R. In particular, the von Neumann algebra W∗(X, R) is generated by L∞(X) and {v_{φ_i}}_{i∈I}.

Proof. Clearly, only the first statement needs to be proved. It is clearly sufficient to prove that if x = φ_i(y), then x and y are equivalent in the equivalence relation generated by Dφ_i. From the definition of D, we have that either Dφ_i(x) = y, or Dφ_i(y) = x, unless y ∉ dom φ_i^{−1}, and φ_i^n(y) ∉ dom φ_i for some n ∈ N. In that case, we have that Dφ_i(x) = y.

Lemma 7.3. A treeing of R is a graphing {φ_i}, such that R = ∗R_i, where R_i is the equivalence relation generated by φ_i, and so that φ_i^n(x) ≠ x for all x ∈ X and n ∈ ±N.

Conjecture 7.4. Let {φ_i} be a graphing of an equivalence relation R. Let u_i be the unitary in W∗(X, R) associated to Dφ_i, and let v_i ∈ W∗(X, R) be the partial isometry associated to φ_i. Then

$$\delta^*(\{\varphi_i\}) = \lim_{n\to\infty} \delta^*(u_1, u_2, \dots, u_n\ \ L^\infty(X)) = \lim_{n\to\infty} \delta^*(v_1, v_2, \dots, v_n\ \ L^\infty(X))$$

is independent of the graphing {φ_i}.

Proposition 7.5. Let R1, R2 be two equivalence relations, which are free. Then the algebras W∗(X, R1) and W∗(X, R2) are free in W∗(X, R) with amalgamation over L∞(X).

Proof. Let φ_i^{(1)}, i ∈ I, be local morphisms compatible with R1, and φ_i^{(2)}, i ∈ I, be local morphisms compatible with R2. Then W∗(X, R1 ∨ R2) = W∗(X, R1) ∨ W∗(X, R2) is densely linearly spanned by L∞(X) and words of the form

$$w = v_{\varphi_{i_1}^{j_1}} \cdots v_{\varphi_{i_n}^{j_n}},$$

where j1 ≠ j2, j2 ≠ j3, etc., and E_{L∞(X)}(v_{φ_{i_k}^{j_k}}) = 0. We must prove that E_{L∞(X)}(w) = 0. But

$$E_{L^\infty(X)}(w) = \chi_{\{x\in X :\ \varphi_{i_1}^{j_1}\circ\cdots\circ\varphi_{i_n}^{j_n}(x) = x\}} = 0$$

by the assumption that R1 and R2 are free.

Proposition 7.6. Let {φ_i} be a treeing of an equivalence relation R. Let u_{φ_i} be the unitary associated to the bisection Dφ_i. Then

$$C(R) = \lim_{n\to\infty} \delta^*(u_{\varphi_1}, u_{\varphi_2}, \dots, u_{\varphi_n}\ \ L^\infty(X)).$$

Proof. In view of additivity of both C(R) and δ∗ under free products, it is sufficient to verify the equality for a single φ. Let A = dom φ, B = ran φ, C = A ∪ B. Let R be the equivalence relation generated by φ, and let A_n = {x ∈ X : #{y : y ∼_R x} = n}. Then C(R) = µ(A_∞) + Σ_n µ(A_n)(1 − 1/n), since the A_n are disjoint, and R is cyclic of order n on A_n. On the other hand, let p_n be the characteristic function of A_n. Then [u_φ, p_n] = 0. Moreover, u_n = p_n u_φ p_n implements a free action of a cyclic group of order n on A_n. Therefore,

$$\delta^*(u_\varphi\ \ L^\infty(X)) = \tau(p_\infty)\,\delta^*(u_\infty\ \ L^\infty(A_\infty)) + \sum_n \tau(p_n)\,\delta^*(u_n\ \ L^\infty(A_n)).$$

Since u_n is Aut-independent from L∞(A_n), it follows that δ∗(u_n L∞(A_n)) = δ∗(u_n) = 1 − 1/n. Thus C(R) = δ∗(u L∞(X)).
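The single-generator formula used in this proof, C(R) = µ(A_∞) + Σ_n µ(A_n)(1 − 1/n), is easy to evaluate once the measures of the orbit-size classes are known. The following is a minimal sketch under that assumption; the function name and the sample measures are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def cost_single_generator(mass_infinite_orbits, mass_by_orbit_size):
    """Evaluate mu(A_inf) + sum over n of mu(A_n) * (1 - 1/n).

    mass_infinite_orbits: measure of the points whose orbit is infinite.
    mass_by_orbit_size:   dict mapping a finite orbit size n >= 1 to mu(A_n).
    """
    total = Fraction(mass_infinite_orbits)
    for orbit_size, mass in mass_by_orbit_size.items():
        if orbit_size < 1:
            raise ValueError("orbit sizes must be positive integers")
        total += Fraction(mass) * (1 - Fraction(1, orbit_size))
    return total


if __name__ == "__main__":
    # Hypothetical example: half the space carries infinite orbits, a quarter
    # carries orbits of size 2, and the remaining quarter consists of fixed points.
    example = cost_single_generator(
        Fraction(1, 2), {2: Fraction(1, 4), 1: Fraction(1, 4)}
    )
    print(example)
```

In this example the fixed points contribute nothing, the size-2 orbits contribute (1/4)(1/2), and the total is 5/8, which by the proof above is also the value of δ∗(u_φ L∞(X)).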
If Conjecture 7.4 is correct, then δ∗({φ_i}) ≤ C(R). Indeed, assume that each φ_i by itself is a treeing. Let R_{φ_i} be the equivalence relation generated by φ_i. Then

$$\delta^*(\{\varphi_i\}) = \lim_{n\to\infty} \delta^*(u_{\varphi_1}, u_{\varphi_2}, \dots, u_{\varphi_n}\ \ L^\infty(X)) \le \lim_{n\to\infty} \sum_{i=1}^{n} \delta^*(u_{\varphi_i}\ \ L^\infty(X)) = \sum_i C(R_{\varphi_i}) \le C(R),$$

the last inequality because given a graphing {φ_i} we can always assume that each φ_i is a treeing by itself by shrinking the domains of φ_i.

Proposition 7.7. Let {φ_i} be a treeing of an equivalence relation R. Let v_{φ_i} be the partial isometry in W∗(X, R) associated to φ_i. Then

$$C(R) = \lim_{n\to\infty} \delta^*(v_{\varphi_1}, \dots, v_{\varphi_n}\ \ L^\infty(X)) = \sum_{j=1}^{\infty} \delta^*(v_{\varphi_j}\ \ L^\infty(X)).$$

Proof. Since {φ_i} is a treeing, v_{φ_j}, j = 1, 2, . . . are free with amalgamation over L∞(X). Hence

$$\delta^*(v_{\varphi_1}, \dots, v_{\varphi_n}\ \ L^\infty(X)) = \sum_{j=1}^{n} \delta^*(v_{\varphi_j}\ \ L^\infty(X)).$$

Since each φ_i is a treeing, the L∞(X)-valued distribution of v_{φ_i} remains the same if v_{φ_i} is replaced by v_{φ_i} w, where w is a Haar unitary, independent from L∞(X) and free from v_{φ_i} with amalgamation over L∞(X). From Proposition 6.4 we get that δ∗(v_{φ_i}) = τ(v_{φ_i}∗ v_{φ_i}) = µ(dom φ_i). Since C(R) = Σ_{j=1}^∞ µ(dom φ_j), C(R) = lim_n δ∗(v_{φ_1}, . . . , v_{φ_n}).

Problem 7.8. Choose any graphing {φ_i} of R, let u_{φ_i} be the unitary in W∗(X, R) associated to the bisection Dφ_i, and let v_{φ_i} be the partial isometry in W∗(X, R) associated to φ_i. Is it true that C(R) = lim_{n→∞} δ∗(u_{φ_1}, . . . , u_{φ_n} L∞(X)) = lim_{n→∞} δ∗(v_{φ_1}, . . . , v_{φ_n})?

If this question is answered in the affirmative, then all groups have fixed cost. Indeed, if φ_i = α_{g_i} comes from the free action of a group G with some fixed set of generators g1, g2, . . ., then δ∗(u_{φ_1}, . . . , u_{φ_n} L∞(X)) = δ∗(u_{φ_1}, . . . , u_{φ_n}), the latter depending only on g1, g2, · · · ∈ G, and independent of the action α.

Acknowledgements. This work was completed while the author was participating in the Random Matrix Program at MSRI. We would like to thank the organizers for the friendly atmosphere. We would also like to acknowledge fruitful conversations with Prof. J. Feldman, Prof. S. Popa and Prof. D.-V. Voiculescu.
References
1. Biane, P.: Free Brownian motion, free stochastic calculus and random matrices. In: Free Probability, D.-V. Voiculescu, ed., Fields Institute Communications, Vol. 12, Providence, RI: American Mathematical Society, 1997, pp. 1–19
2. Feldman, J. and Moore, C.C.: Ergodic equivalence relations, cohomology, and von Neumann algebras I, II. Trans. AMS 234, 289–359 (1977)
3. Levitt, G.: On the cost of generating an equivalence relation. Ergodic Theory Dynam. Systems 6, 1173–1181 (1995)
4. Gaboriau, D.: Coût des relations d’équivalence et des groupes. Invent. Math. 139, 41–98 (2000)
5. Gaboriau, D.: Mercuriale de groupes et de relations. C.R. Acad. Sci. Paris t. 326 Série I, 219–222 (1998)
6. Shlyakhtenko, D.: Free entropy with respect to a completely-positive map. Amer. J. Math. 122, 45–81 (2000)
7. Voiculescu, D.-V.: The analogues of entropy and of Fisher’s information measure in free probability theory II. Invent. Math. 118, 411–440 (1994)
8. Voiculescu, D.-V.: The analogues of entropy and of Fisher’s information measure in free probability, V. Invent. Math. 132, 189–227 (1998)
9. Voiculescu, D.-V.: The analogues of entropy and of Fisher’s information measure in free probability, VI. Adv. Math. 146, 101–166 (1999)
10. Voiculescu, D.-V.: A strengthened asymptotic freeness result for random matrices with applications to free entropy. IMRN 1, 41–64 (1998)
11. Voiculescu, D.-V., Dykema, K. and Nica, A.: Free random variables. CRM monograph series, Vol. 1. Providence, RI: American Mathematical Society, 1992

Communicated by A. Connes
Commun. Math. Phys. 218, 153 – 176 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Random Parking, Sequential Adsorption, and the Jamming Limit

Mathew D. Penrose

Department of Mathematical Sciences, University of Durham, South Road, Durham DH1 3LE, UK. E-mail: [email protected]

Received: 18 August 2000 / Accepted: 13 November 2000

Abstract: Identical cars are dropped sequentially from above into a large parking lot. Each car is positioned uniformly at random, subject to non-overlap with its predecessors, until jamming occurs. There have been many studies of the limiting mean coverage as the parking lot becomes large, but no complete proof that such a limit exists, until now. We prove spatial laws of large numbers demonstrating that for various multidimensional random and cooperative sequential adsorption schemes such as the one above, the jamming limit coverage is well-defined.

1. Introduction

The classical random car-parking model of Rényi (1958) goes as follows. Unit-length cars arrive sequentially at a roadside of length L. Each car parks at a position along the length of the kerb chosen uniformly at random subject to the obvious constraint of non-overlap with previously parked cars, until there are no available spaces left of length greater than 1 (“jamming” of the interval [0, L]). With the resulting (random) number of cars at jamming denoted N(L), Rényi proved that

$$\lim_{L\to\infty} \frac{E\,N(L)}{L} = \int_0^\infty \exp\left(-2\int_0^t \frac{1-e^{-u}}{u}\,du\right) dt \approx 0.748. \tag{1.1}$$

The model readily generalizes to two or more dimensions. Indeed, Rényi’s one-dimensional analysis was motivated by analogy to the less mathematically tractable question of physical sphere-packing in three dimensions. We next describe the analogous two-dimensional model as formulated by Palásti (1960).

Let A = A(s, t) := [0, s] × [0, t] be a target region in R², and let D be a specified bounded open set in R² containing the origin (for example, a square or disk). Suppose items are placed sequentially at random in the region A. Assume each successive item is a translate of D with location distributed uniformly at random over A, subject to the
constraint of non-overlap with previously placed items. The process terminates when there is no available space left large enough to contain a new item (jamming of A). Let N(A; A) denote the (random) number of items packed in at the termination time (the second A in the notation is not a misprint; its reason will become apparent later). Our main interest here is in N(A; A).

The two-dimensional model is a version of irreversible random sequential adsorption (RSA) which has become very important in the physical and biological sciences, particularly in the study of the deposition of colloidal particles or proteins on a surface. See Evans (1993) for a comprehensive survey. Other surveys include Bartelt and Privman (1991), Privman (2000), Senger et al. (2000), Talbot et al. (2000) and Wang (2000). A key feature is the irreversibility of the deposition of items, which distinguishes RSA from equilibrium models such as Gibbs distributions.

In the statistical literature, an identical model is referred to as simple sequential inhibition (SSI); see Diggle (1983) or Stoyan, Kendall and Mecke (1995). In the modelling of scheduling protocols in operations research, the model is known as on-line packing; see for example Coffman et al. (1998). Other processes for which parking-type models have been used include rock fragmentation and election results; see Evans (1993), p. 1312.

For further results on the Rényi (1958) model and its variants, see for example Ney (1962), Dvoretzky and Robbins (1964), Mannion (1979), Coffman et al. (1998), Itoh and Shepp (1999). In all of these developments, the methods used seem to be strictly one-dimensional, and in higher dimensions there have been many simulation studies but not much in the way of rigorous theory. See Solomon and Weiner (1986) for a survey of developments up to that time.

Palásti (1960) made an early study of the two-dimensional model described above, and made two conjectures that will concern us here.
The first conjecture is that if we write M(s, t) for E N(A; A), with A = A(s, t), then

$$\lambda = \lambda(D) := \lim_{s,t\to\infty} \frac{M(s,t)}{st} \quad \text{exists.} \tag{1.2}$$
The second conjecture is that in the special case where D is a unit square, the limit λ is equal to the square of the Rényi limit given in (1.1). Both conjectures readily generalize to higher dimensions. Palásti gives a partial proof of the first conjecture, which is incomplete because it relies on a plausible but unproven hypothesis. She bases her second conjecture on an “heuristic argument” which she does not provide. If the set D is taken to have unit area, the putative limiting value λ of E M(s, t)/(st) is sometimes known as the “jamming limit coverage”, and has been the subject of numerous simulation studies for various sets D. For general accounts of these, see Wang (2000), Sect. F of Evans (1993), Solomon and Weiner (1986). The jamming limit coverage has been estimated by simulations as roughly around 0.547, when D is a disk, and as roughly around 0.562 when D is a square (see Evans (1993), pp. 1293 and 1312). The latter figure indicates that Palásti’s second conjecture is most likely false. Many variants of the basic model have also been proposed, mainly in the RSA literature in the physical sciences. These include lattice models, models with more than one type of shape, models with some kind of longer-range interaction besides hard-core repulsion (known as cooperative sequential adsorption or CSA), and so forth. The extent of activity on these models is indicated by more than 200 citations to date for Evans (1993). Typical questions are concerned with the value of the jamming limit coverage, the rate of approach to jamming, the spatial distribution of items in the jamming limit, and the fluctuations of N (A; A).
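As a numerical aside on the Rényi limit (1.1), whose square (about 0.559) is the value that Palásti’s second conjecture would assign to unit squares, the one-dimensional model can be simulated exactly by splitting gaps: given the gaps at any time, the next car accepted in a gap lands uniformly in it, and the two resulting sub-gaps then evolve independently. The sketch below relies on that standard reduction; the function names, road length, number of runs and seed are illustrative assumptions rather than anything taken from the paper.

```python
import random


def renyi_jamming_count(length, rng):
    """Number of unit-length cars parked at jamming on a road [0, length].

    Uses gap splitting: a gap of size >= 1 receives one car whose left end
    is uniform on [0, gap - 1], leaving two smaller gaps to be filled
    independently. Gaps shorter than 1 can hold no car.
    """
    count = 0
    gaps = [float(length)]
    while gaps:
        gap = gaps.pop()
        if gap < 1.0:
            continue
        left = rng.uniform(0.0, gap - 1.0)  # left end of the newly parked car
        count += 1
        gaps.append(left)               # free space to the left of the car
        gaps.append(gap - 1.0 - left)   # free space to the right of the car
    return count


def estimate_renyi_density(length=2000.0, runs=20, seed=0):
    """Monte Carlo estimate of E N(L) / L for a finite road length L."""
    rng = random.Random(seed)
    total = sum(renyi_jamming_count(length, rng) for _ in range(runs))
    return total / (runs * length)


if __name__ == "__main__":
    density = estimate_renyi_density()
    print("estimated density:", density)
    print("squared density:  ", density ** 2)
```

With road lengths in the thousands the estimate should sit close to the value 0.748 in (1.1), and its square can be compared with the simulated two-dimensional figure of roughly 0.562 quoted above for squares. No analogous exact reduction is available in two dimensions, where detecting jamming is itself a nontrivial task.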
A good deal of this activity is to some extent founded on the assumption that Palásti’s first conjecture (1.2) is true. However, it seems that this has never been proved. In fact, Solomon and Weiner (1986, p. 2595) assert that the experimental evidence suggests the first conjecture is probably false. Our main purpose is to prove that on the contrary, (1.2) is true. In fact we prove a law of large numbers (LLN) which says that N (A; A)/|A| converges in mean to a constant λ, a stronger result than (1.2). We also allow for more general target sets than rectangles. The result is given in Theorem 2.1 below. The method of proof works in any dimension, and is applicable in modified form to many variants of the basic model, some of which we consider later on. As well as the total number of points, one might be concerned with more detailed information about the spatial distribution of adsorbed items. We shall present a secondary result (Theorem 2.2), which says, loosely speaking, that the local distribution of adsorbed points stabilizes as the target region becomes large. With a LLN established for N (A; A), it is natural to look for an associated central limit theorem (CLT) to show N (A; A) is approximately normally distributed. We hope to investigate this in future work. The remainder of this paper is organized as follows. Section 2 contains the formal statement of our main results as just described, and an outline of the proof and its relation to previous work on the jamming limit. Details of the proof are given in Sects. 3 and 4. In Sects. 5, 6, and 7 we extend the result to non-uniform distributions of locations of incoming items, to packings of items with randomly varying shapes or spins, and to a CSA model, respectively. Notation: for a set S in Rd , we shall say S is nonnull if its Lebesgue measure is strictly positive. We shall write |S| to mean either the number of elements of S (if S is finite) or the Lebesgue measure of S (if S is nonnull). 
Let 0 denote the origin $(0, \ldots, 0)$ of $\mathbb{R}^d$. For $K > 0$ and $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ we define the “box” $B_K(x) \subset \mathbb{R}^d$ by
$$B_K(x) := \prod_{i=1}^{d} [x_i - K,\, x_i + K].$$
Also set $\|x\| := \max_i |x_i|$, the $l_\infty$ (maximum-component) norm of $x$. For any sequence of sets $(A_n)_{n \ge 1}$ we make the definition $\liminf(A_n) := \cup_{n \ge 1} \cap_{m \ge n} A_m$. Given a bounded set $A \subset \mathbb{R}^d$, let $\partial A$ be the intersection of the closure of $A$ with that of its complement, and for $r > 0$, let $\partial_r A$ denote the set of all $x \in \mathbb{R}^d$ lying at a Euclidean distance at most $r$ from $\partial A$. If $L$ is a subset of $\mathbb{Z}^d$, let $\partial^Z L$ be the set of elements of $L$ at a Euclidean distance at most 1 from $\mathbb{Z}^d \setminus L$. If $E$ is an event in a given probability space, let $1_E$ be the indicator random variable taking the value 1 if $E$ occurs and 0 if not.

2. Basic Results

In this section we present and discuss our main results on the Palásti model of RSA. The model is as already described, except that from now on we work in $\mathbb{R}^d$ rather than $\mathbb{R}^2$ to emphasise that the arguments work in any dimension $d$. Also, the target region $A$, previously assumed to be a rectangle, is now allowed to be an arbitrary bounded Borel nonnull set in $\mathbb{R}^d$.
M. D. Penrose
Theorem 2.1. There is a constant $\lambda = \lambda(d, D) > 0$ such that if $(A_n)_{n \ge 1}$ is a sequence of bounded nonnull Borel subsets of $\mathbb{R}^d$ satisfying $|\partial_r A_n|/|A_n| \to 0$ as $n \to \infty$ for all $r > 0$, then for any $p \in [1, \infty)$,
$$\frac{N(A_n; A_n)}{|A_n|} \xrightarrow{L^p} \lambda. \tag{2.1}$$
This result is a law of large numbers for N (A; A). Later we shall briefly discuss an associated strong law of large numbers. The reader might guess that a result like Theorem 2.1 can be proved by methods based on subadditivity or near-subadditivity, as described for example in Yukich (1998). However, it is not clear whether any form of near-subadditivity holds for N (A; A) or its mean, even in one dimension. The unproven hypothesis which forms the basis of Palásti’s original proof of (1.2) is essentially a form of near-subadditivity. To proceed further, we need a few notions from point process theory. Let S be the space of locally finite subsets of Rd . For ξ ∈ S and B ⊂ Rd , let Nξ (B) denote the number of elements of ξ in B (so Nξ (·) is a counting measure). A point process on Rd is a random element ξ of S. For more details, see for example Daley and Vere-Jones (1988), Stoyan, Kendall and Mecke (1995), or Resnick (1987). A “soft” argument for the existence of a form of jamming limit coverage goes as follows. Consider a Markov process (ξt , t ≥ 0) taking values in S, in which points, once added, are never removed. Assume the probability of adding a point in the volume element dxdt (x ∈ Rd , t ∈ [0, ∞)) is equal to dxdt if there is no overlap of the translate of D centred at x with translates of D centred at existing points just prior to time t, and is equal to zero otherwise. Assume ξ0 is the empty set. Since the dynamics are spatially homogeneous, for each t, ξt is a stationary point process on the whole of Rd (meaning that at each time t, the spatial distribution of points is invariant under space-translation). By general properties of stationary point processes the point process ξt has a density λt (the mean number of points per unit area), which is clearly bounded and nondecreasing in t, so must converge to a limit λ as t → ∞. Two objections might be made to the above argument. 
The first objection is that the existence and uniqueness of a Markov process with the above dynamics is not immediately clear. This issue was discussed in Evans (1993), p. 1302, but only for lattice RSA models. The second objection is that the soft argument does not take us much closer to knowing whether the limit λ in (1.2) exists, and if it does, whether it is equal to the limit given by the soft argument. Typically, a physical system exhibiting RSA or a computer simulation thereof will be on a large but finite target region, so that it is the limit in (1.2), not the limit from the soft argument, that is arguably of greater interest.

Here is an outline of the proof of Theorem 2.1. We use the following alternative definition of $N(A; A)$. Items arrive sequentially with locations uniformly distributed over $A$, and an item is accepted unless it overlaps a previously accepted item, in which case it is rejected. Let $N(A; A)$ denote the ultimate number of accepted items. Clearly, this is equivalent to the earlier definition of $N(A; A)$. It is useful to assume that the locations in space-time of arrivals form a homogeneous Poisson point process in $A \times [0, \infty)$. Let $\xi_t^A$ denote the set of spatial locations of accepted Poisson arrivals up to time $t$. We can keep track of the accepted points via a graphical representation on the Poisson points, similar to the graphical representations long used in interacting particle systems theory on the lattice (see e.g. Durrett (1988)). Extending the Poisson process and the graphical representation to the whole of space-time, we obtain for each $t$ a stationary point process $\xi_t$ of locations of items accepted up to time $t$, which matches the informal description of
$\xi_t$ given in the soft argument above. The graphical representation enables us to estimate the difference between $\xi_t^A$ and $\xi_t$ (essentially, a boundary effect) via a comparison with a stochastic spatial growth model (first passage percolation; see Durrett (1988)), and using a counting argument to bound the growth rate. In effect, this enables us to fill in the gaps in the soft argument. Convergence of means is strengthened to convergence in the $p$th moment by an application of the Ergodic Theorem to the stationary point processes $\xi_t$ and $\xi$.

The result shows that a simulation of the Palásti RSA process on a sufficiently large square will, in principle, yield a coverage that with high probability approximates the quantity of interest λ, which is the same as the limit given by the soft argument above. A number of simulations in the literature have in fact been on the torus: it is possible to show that these, too, yield a coverage which approximates λ; loosely speaking, boundary conditions are not too important. However, we shall not go into details in the case of the torus.

Much of the literature from the physical sciences on RSA processes is effectively concerned with the point processes $\xi_t$ and the limiting point process $\xi := \cup_{t > 0}\, \xi_t$. As well as their densities, there is interest in quantities such as correlations and $n$-point density functions for these point processes; see for example Talbot et al. (2000). As in the case of the jamming limit coverage, it seems worthwhile to assure ourselves that such analyses of the infinite-space system are a good approximation to the system on a sufficiently large bounded target region. Therefore we give a second result which says that the distribution of points, for the Palásti RSA process on a bounded target set $A$, does indeed converge locally to the distribution of points generated by an RSA process in the whole of $\mathbb{R}^d$.
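As an illustration of the simulation setting just mentioned, the following minimal Python sketch runs the sequential accept/reject rule on a square target region. It assumes that D is an open disc in the plane (the text allows any bounded open D), and the function name `rsa_centres`, the default radius, the cell-list lookup and the fixed number of arrivals are all choices made for this sketch rather than part of the paper. With finitely many arrivals the configuration is only approximately jammed, so the count can only underestimate N(A; A).

```python
import random


def rsa_centres(side, radius=0.5, attempts_per_unit_area=200, seed=0):
    """Approximate RSA of open discs with centres uniform on [0, side]^2.

    Hypothetical helper for illustration. Two open discs of the given
    radius overlap exactly when their centres are closer than 2 * radius.
    Returns the list of accepted centres after a fixed number of arrivals.
    """
    rng = random.Random(seed)
    cell = 2.0 * radius          # overlapping centres lie in neighbouring cells
    grid = {}                    # (i, j) -> accepted centres in that cell
    accepted = []
    n_attempts = int(attempts_per_unit_area * side * side)
    for _ in range(n_attempts):
        x, y = rng.uniform(0.0, side), rng.uniform(0.0, side)
        ci, cj = int(x // cell), int(y // cell)
        # An arrival is rejected if it overlaps any previously accepted disc.
        blocked = any(
            (px - x) ** 2 + (py - y) ** 2 < (2.0 * radius) ** 2
            for i in (ci - 1, ci, ci + 1)
            for j in (cj - 1, cj, cj + 1)
            for (px, py) in grid.get((i, j), ())
        )
        if not blocked:
            accepted.append((x, y))
            grid.setdefault((ci, cj), []).append((x, y))
    return accepted


if __name__ == "__main__":
    for side in (5, 10):
        pts = rsa_centres(side)
        print(side, len(pts) / side ** 2)   # empirical centres per unit area
```

Comparing the printed densities for increasing side lengths gives a rough empirical view of the stabilisation that Theorem 2.1 makes precise; the sketch itself proves nothing.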
If $\zeta$ and $\zeta_n$ ($n \in \mathbb{N}$) are point processes on $\mathbb{R}^d$, we say the sequence $\zeta_n$ converges weakly to $\zeta$ if the finite-dimensional distributions converge, i.e. if for any finite collection of bounded Borel sets $B_i$ satisfying $N_\zeta(\partial B_i) = 0$ almost surely, the joint probability distributions of $N_{\zeta_n}(B_i)$ converge weakly to those of $N_\zeta(B_i)$. This is equivalent to various other definitions of weak convergence; see e.g. Daley and Vere-Jones (1988), Sect. 9.1. Our next result concerns convergence of the point processes $\xi_t^A$ already alluded to, and of the limiting point process $\xi^A$ of all points ultimately accepted in the target set $A$, as $A$ becomes large.

Theorem 2.2. There exist point processes $\xi$ and $\xi_t$, $t > 0$, such that if $(A_n)_{n \ge 1}$ is any sequence of target sets with $\liminf_{n \to \infty} A_n = \mathbb{R}^d$, the sequence of point processes $\xi^{A_n}$ converges weakly to $\xi$, and for all $t > 0$, the sequence of point processes $\xi_t^{A_n}$ converges weakly to $\xi_t$.

3. A Graphical Construction

In this section we describe a graphical construction which can be used to generate Palásti’s RSA process both on a bounded target set $A$, and on the unbounded target set $\mathbb{R}^d$. This construction will give us a rigorous demonstration of existence of the point process $\xi_t$ alluded to in the previous section. The model is generated by an input of random translates of a specified bounded open set $D \subset \mathbb{R}^d$ with $0 \in D$ (possibly a ball). On occasion we may refer to the set $x + D := \{x + y : y \in D\}$ as the translate of $D$ centred at $x$. Assume that on an underlying probability space we have a homogeneous Poisson process $\mathcal{P}$ of unit rate on $\mathbb{R}^d \times [0, \infty)$. Given a bounded nonnull Borel target set $A \subset \mathbb{R}^d$, label the points of the restriction of $\mathcal{P}$ to $A \times [0, \infty)$ as $\{(X_i, T_i)\}_{i=1}^{\infty}$ with $T_1 < T_2 <$
$T_3 < \cdots$. Let the first item be accepted, and recursively for $i = 2, 3, 4, \ldots$, let item $i$ be accepted if $(X_i + D) \cap (X_j + D) = \emptyset$ for all $j < i$ such that $j$ was accepted. Let $I(X_i, T_i; A)$ be equal to 1 if item $(X_i, T_i)$ is accepted, and zero otherwise. For Borel $B \subset \mathbb{R}^d$ and for $t > 0$ define
$$N_t(B; A) := \sum_{i=1}^{\infty} I(X_i, T_i; A)\, 1_{\{T_i \le t,\, X_i \in B\}}; \qquad N(B; A) := \lim_{t \to \infty} N_t(B; A). \tag{3.1}$$
Then $N(A; A)$ is clearly the total number of accepted items in a realization of Palásti’s model as described at the start of this article. By an elementary rescaling argument, it suffices to prove Theorems 2.1 and 2.2 under the extra assumption that $D$ is small enough to satisfy
$$D \cap (y + D) = \emptyset \quad \text{for all } y \in \mathbb{R}^d \text{ with } \|y\| \ge 1, \tag{3.2}$$
and from now on we assume (3.2). An oriented graph is a special kind of directed graph in which there is no pair of vertices $\{x, y\}$ for which both $(x, y)$ and $(y, x)$ are included as directed edges. By a circuit in an oriented graph we shall mean an oriented path that ends up where it started. We shall say that $x$ is a parent of $y$ and $y$ is an offspring of $x$ if there is an oriented edge from $x$ to $y$. By a root of an oriented graph we mean a vertex with no parents.

The graphical construction goes as follows. Make the points of the Poisson process $\mathcal{P}$ on $\mathbb{R}^d \times [0, \infty)$ into the vertices of an infinite oriented graph, denoted $\mathcal{G}$, by putting in an oriented edge $(X, T) \to (X', T')$ whenever $(X + D) \cap (X' + D) \ne \emptyset$ and $T < T'$. For completeness we also put an edge $(X, T) \to (X', T')$ whenever $(X' + D) \cap (X + D) \ne \emptyset$, $T = T'$, and $X$ precedes $X'$ in the lexicographic ordering on $\mathbb{R}^d$, but in practice the probability that $\mathcal{P}$ induces this second kind of edge is zero. It can be useful to think of the oriented graph as representing the spread of an “epidemic” through space over time; each time an individual is “born” at a Poisson point in space-time, it becomes (and stays) infected if there is an earlier infected point nearby in space (in the sense that the translates of $D$ centred at the two points overlap). This graph determines which items are to be accepted. Before describing how this works, we shall give a key lemma providing a bound for the rate of spread of the epidemic.

Divide $\mathbb{R}^d$ into unit cubes, with the centres of the cubes at the points of the integer lattice $\mathbb{Z}^d$. For $z \in \mathbb{Z}^d$ let $Q_z$ be the cube centred at $z$. By (3.2), if there is an edge of $\mathcal{G}$ from $(X, T)$ to $(Y, U)$, and $Q_x$ is the cube containing $X$ and $Q_y$ the cube containing $Y$, then $\|x - y\| \le 1$, so the cubes $Q_x$ and $Q_y$ are either the same or adjacent (possibly diagonally).
For $x, y \in \mathbb{Z}^d$, let us say that $y$ is affected by $x$ before time $t$ if there exists a (directed) path in the oriented graph that starts at some Poisson point $(X, T)$ with $X \in Q_x$, and ends at some Poisson point $(Y, U)$ with $Y \in Q_y$ and $U \le t$. In terms of the “epidemic”, $y$ is affected by $x$ before time $t$ if, on the assumption that all particles born in $Q_x$ are infected, the epidemic spreads to $Q_y$ by time $t$. Define the event $E_t(x, y)$ by
$$E_t(x, y) := \{y \text{ is affected by } x \text{ before time } t\}. \tag{3.3}$$
For $x$ and $y$ in $\mathbb{Z}^d$, we define a path $\gamma$ from $x$ to $y$ (written $\gamma : x \leadsto y$) to be any sequence $x_0 = x, x_1, x_2, \ldots, x_n = y$ of distinct elements of $\mathbb{Z}^d$, with $\|x_i - x_{i-1}\| = 1$ for each $i = 1, 2, \ldots, n$. For such a path we write $|\gamma| = n$. Given a path $\gamma = (x_0, \ldots, x_n)$, let $S_\gamma$ be the time it would take for the infection to pass from $x$ to $y$ along the path $\gamma$, if only infections along that path were allowed. More
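formally, set $S_{\gamma,0} = 0$ and recursively define random variables $S_{\gamma,j}$ for $j = 1, 2, \ldots, n$ by $S_{\gamma,j} := S_{\gamma,j-1} + W_{\gamma,j}$, where $W_{\gamma,j}$ is the time from $S_{\gamma,j-1}$ to the time of the next Poisson arrival in the cube $Q_{x_j}$. Then set $S_\gamma = S_{\gamma,n}$. Then
$$S_\gamma = \sum_{j=1}^{n} W_{\gamma,j},$$
a sum of independent exponential variables with parameter 1. The following key result says that the rate of spread of the epidemic is at most linear with high probability, and is proved using an argument adapted from first passage percolation (FPP; see for example Durrett (1988)). Recall that 0 denotes the origin of $\mathbb{Z}^d$.

Lemma 3.1. There is a constant $\delta_1 \in (0, 1)$ such that for all $r > 0$, the probability that 0 is affected from outside $[-r, r]^d$ before time $\delta_1 r$ is less than $2(3^{-r})$, or in other words,
$$P\Big[\bigcup_{y \in \mathbb{Z}^d \setminus [-r, r]^d} E_{\delta_1 r}(y, 0)\Big] \le 2(3^{-r}). \tag{3.4}$$

Proof. By Boole’s inequality, for $r, \delta > 0$ we have
$$P\Big[\bigcup_{y \in \mathbb{Z}^d \setminus [-r, r]^d} E_{\delta r}(y, 0)\Big] \le \sum_{\gamma \leadsto 0,\ |\gamma| \ge r} P[S_\gamma \le \delta r] \le \sum_{\gamma \leadsto 0,\ |\gamma| \ge r} P[S_\gamma \le \delta |\gamma|].$$
Let $W$ denote an exponential random variable with mean 1. For $\theta > 0$, by Markov’s inequality we have
$$P[S_\gamma \le \delta |\gamma|] \le \frac{E[e^{-\theta S_\gamma}]}{e^{-\theta \delta |\gamma|}} = \big(e^{\delta \theta}\, E[e^{-\theta W}]\big)^{|\gamma|}.$$
Let $\alpha = 3^{-(d+1)}$. Take $\theta > 0$ with $E[e^{-\theta W}] < \alpha/2$ and $\delta_1$ with $e^{\delta_1 \theta} < 2$. Then $P[S_\gamma \le \delta_1 |\gamma|] \le \alpha^{|\gamma|}$. Since each lattice point has $3^d - 1$ neighbours, the number of paths of length $n$ ending at the origin is at most $3^{dn}$, so
$$P\Big[\bigcup_{y \in \mathbb{Z}^d \setminus [-r, r]^d} E_{\delta_1 r}(y, 0)\Big] \le \sum_{n \ge r}\ \sum_{\gamma \leadsto 0:\ |\gamma| = n} P[S_\gamma \le \delta_1 n] \le \sum_{n \ge r} 3^{dn} \alpha^n \le 2(3^{-r}).$$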
Corollary 3.1. Let $z \in \mathbb{Z}^d$ and $t > 0$. With probability 1, $z$ is affected before time $t$ by at most finitely many $y$.

Proof. It suffices to consider the case $z = 0$. Let $r > t/\delta_1$. The probability that 0 is affected before time $t$ by infinitely many $y$ is bounded by the probability that 0 is affected before time $t$ by some $y$ outside $[-r, r]^d$, and therefore is at most $2(3^{-r})$. Since $r$ can be arbitrarily large, the result follows.
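To make the choice of constants in the proof of Lemma 3.1 concrete, here is a small Python sketch. It relies on the standard fact that E[e^{−θW}] = 1/(1 + θ) for an exponential variable W of mean 1. The particular values θ = 2/α and δ1 = (log 2)/(2θ) are just one admissible choice made for this sketch, not values taken from the paper, and the function names are hypothetical.

```python
import math


def lemma_3_1_constants(d):
    """One admissible choice of (alpha, theta, delta1) in dimension d."""
    alpha = 3.0 ** (-(d + 1))
    theta = 2.0 / alpha                     # then 1 / (1 + theta) < alpha / 2
    delta1 = 0.5 * math.log(2.0) / theta    # then exp(delta1 * theta) = sqrt(2) < 2
    return alpha, theta, delta1


def path_bound(d, n):
    """The Markov-inequality bound (e^{delta1*theta} E[e^{-theta W}])^n for a path of length n."""
    alpha, theta, delta1 = lemma_3_1_constants(d)
    laplace = 1.0 / (1.0 + theta)           # E[exp(-theta W)] for W exponential with mean 1
    return (math.exp(delta1 * theta) * laplace) ** n


def union_bound(d, r, terms=200):
    """Truncated sum over n >= r of 3^{dn} * alpha^n (each term equals 3^{-n})."""
    alpha, _, _ = lemma_3_1_constants(d)
    return sum((3.0 ** d * alpha) ** n for n in range(r, r + terms))


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, lemma_3_1_constants(d), union_bound(d, 5), 2 * 3.0 ** (-5))
```

The resulting δ1 shrinks rapidly with the dimension, which is harmless here: the lemma only needs some positive δ1, not a sharp one.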
For (X, T ) ∈ P, let C(X,T ) (the “cluster at (X, T )”) be the (random) set of ancestors of (X, T ), that is, the set of (Y, U ) ∈ P such that there is an oriented path in G from (Y, U ) to (X, T ). By Corollary 3.1, the “cluster” C(X,T ) is finite for all (X, T ) ∈ P with probability 1. It represents the set of all items that can potentially affect the acceptance status of the incoming particle represented by the Poisson point (X, T ). The method of reconstructing the set of accepted items from the graph G goes as follows. Let A ⊂ Rd be a (possibly unbounded) Borel set in Rd , and let PA denote the set P ∩ (A × [0, ∞)), i.e. the set of Poisson points that lie in A × [0, ∞). Let G|A denote the restriction of G to the vertex set PA . Recursively define subsets Fi (A), Gi (A), Hi (A) of PA , i = 1, 2, 3, . . . as follows. Let F1 (A) be the set of roots of the oriented graph G|A , and let G1 (A) be the set of offspring of roots. Set H1 (A) = F1 (A)∪G1 (A). For the next step, remove the set H1 (A) from the vertex set, and define F2 (A) and G2 (A) in the same way; so F2 (A) is the set of roots of the restriction of G to vertices in PA \ H1 (A), and G2 (A) is the set of vertices in PA \H1 (A) that are offspring of vertices in F2 (A). Set H2 (A) = F2 (A)∪G2 (A), remove the set H2 (A) from PA \ H1 (A) and repeat the process to obtain F3 (A), G3 (A), H3 (A). Continuing ad infinitum gives us subsets Fi (A), Gi (A) of PA defined for i = 1, 2, 3, . . . . It is clear from the construction that these sets are disjoint. If at some stage we run out of roots then all but finitely many of the sets Fi (A), Gi (A) will be empty. In the special case where the initial set A is taken to be the whole of Rd , let us make the abbreviations Fi for Fi (Rd ) and Gi for Gi (Rd ). Thus F1 is the set of all roots in the whole of G, G1 is the set of all offspring in G of vertices in F1 , and so on. As before, we obtain an infinite sequence of disjoint subsets F1 , G1 , F2 , G2 , . . . of P. 
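The recursive construction of the sets F_i(A) and G_i(A) can be sketched in Python for a finite collection of space-time points. The sketch assumes that D is an open disc of radius 1/2 in the plane and that arrival times are distinct; the function names `overlap`, `peel` and `sequential_accept` are hypothetical. It also implements the direct sequential accept/reject rule, so that on examples one can compare the union of the F_i with the set of accepted items.

```python
import random


def overlap(p, q, radius=0.5):
    """True if the open discs of the given radius centred at p and q intersect."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2 < (2.0 * radius) ** 2


def peel(points, radius=0.5):
    """Split the indices of points (x, y, t) into lists of sets F_1, F_2, ... and G_1, G_2, ...

    Oriented edges run from earlier to later arrivals whose discs overlap.
    """
    n = len(points)
    parents = [set() for _ in range(n)]
    offspring = [set() for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if points[a][2] < points[b][2] and overlap(points[a], points[b], radius):
                offspring[a].add(b)
                parents[b].add(a)
    remaining = set(range(n))
    F, G = [], []
    while remaining:
        # Roots of the graph restricted to the vertices not yet assigned.
        roots = {k for k in remaining if not (parents[k] & remaining)}
        # Unassigned offspring of those roots.
        kids = set()
        for k in roots:
            kids |= offspring[k] & remaining
        F.append(roots)
        G.append(kids)
        remaining -= roots | kids
    return F, G


def sequential_accept(points, radius=0.5):
    """Indices accepted by the direct rule: accept unless overlapping an accepted item."""
    accepted = []
    for k in sorted(range(len(points)), key=lambda k: points[k][2]):
        if not any(overlap(points[k], points[j], radius) for j in accepted):
            accepted.append(k)
    return set(accepted)


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.uniform(0, 4), rng.uniform(0, 4), float(k)) for k in range(200)]
    F, G = peel(pts)
    print(set().union(*F) == sequential_accept(pts))
```

The loop terminates on finite input because the earliest unassigned arrival is always a root of the restricted graph, which is the same observation used for the finite sets in the argument that follows.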
Lemma 3.2. Let $A$ be a bounded nonnull Borel set in $\mathbb{R}^d$. With probability 1, the sets $F_1(A), G_1(A), F_2(A), G_2(A), \ldots$ form a partition of $\mathcal{P}_A$, and the sets $F_1, G_1, F_2, G_2, \ldots$ form a partition of $\mathcal{P}$. Moreover, if $A$ is the target set for the Palásti RSA model, the set of accepted items is the union $\cup_{i=1}^{\infty} F_i(A)$.

Proof. First we show that every element of $\mathcal{P}_A$ is assigned to one of the sets $F_i(A)$, $G_i(A)$, so that these form a partition of $\mathcal{P}_A$. Let $t > 0$ and set $\mathcal{P}_{A,t} = \mathcal{P} \cap (A \times [0, t])$. If the (random but almost surely finite) number of points of $\mathcal{P}_{A,t}$ is $N$, then every point of $\mathcal{P}_{A,t}$ is assigned to one of the sets $F_1(A), G_1(A), \ldots, F_N(A), G_N(A)$. This is because at each iteration of the algorithm defining the sets $F_i(A)$, $G_i(A)$, the set of hitherto unassigned points of $\mathcal{P}_{A,t}$, if nonempty, contains at least one root, so at least one more element of $\mathcal{P}_{A,t}$ gets assigned at each successive iteration, until all elements of $\mathcal{P}_{A,t}$ are assigned, which must happen in at most $N$ iterations. Since the above argument applies with probability 1 for all $t$, it follows that with probability 1, every element of $\mathcal{P}_A$ gets assigned to one of the sets $F_1(A), G_1(A), F_2(A), G_2(A), \ldots$, so these sets form a partition of $\mathcal{P}_A$ as asserted.

Next we show that $\cup_{i \ge 1} F_i(A)$ is the set of accepted items in the target set $A$. A root represents an item that arrives before any overlapping items, so elements of $F_1(A)$ are accepted. Elements of $G_1(A)$ are rejected because they overlap with previously accepted “parents” in $F_1(A)$. Elements of $F_2(A)$ are accepted because they have parents only in $G_1(A)$, and elements of $G_2(A)$ are rejected because they have accepted parents in $F_2(A)$. The argument proceeds in this way, formally by an induction which we leave to the reader. This procedure for determining which items have been accepted is essentially the same as that used by Caser and Hilhorst (1995), as the basis for a rigorous bound for λ (without determining its existence).
Finally we show that the sets $F_1, G_1, F_2, G_2, \ldots$ form a partition of $\mathcal{P}$. Given $(X, T) \in \mathcal{P}$, recall that $C_{(X,T)}$ is the set of ancestors of $(X, T)$ in $\mathcal{P}$. Recursively define subsets $F_i(C_{(X,T)})$ and $G_i(C_{(X,T)})$, $i = 1, 2, \ldots$, in exactly the same manner as for $F_i(A)$, $G_i(A)$, except that now we start out with the restriction of $\mathcal{G}$ to the (finite) set of vertices $C_{(X,T)}$, rather than to $\mathcal{P}_A$. In this case the algorithm will terminate at some point when there are no roots left. Since every nonempty finite subset of $C_{(X,T)}$ contains at least one root, the algorithm does not terminate until every element of $C_{(X,T)}$ has been assigned to one of the sets $F_i(C_{(X,T)})$, $G_i(C_{(X,T)})$, so these sets form a partition. In the oriented graph $\mathcal{G}$, by definition there are no oriented edges of $\mathcal{G}$ from outside $C_{(X,T)}$ into $C_{(X,T)}$. For this reason, any root of the restriction of $\mathcal{G}$ to $C_{(X,T)}$ is also a root of the entire graph $\mathcal{G}$. Therefore $F_1(C_{(X,T)}) = F_1 \cap C_{(X,T)}$, and so $G_1(C_{(X,T)}) = G_1 \cap C_{(X,T)}$, and so on: for all $j$, $F_j(C_{(X,T)}) = F_j \cap C_{(X,T)}$ and $G_j(C_{(X,T)}) = G_j \cap C_{(X,T)}$. Since $C_{(X,T)}$ is finite, $(X, T)$ must lie in one of the sets $F_i(C_{(X,T)})$ or $G_i(C_{(X,T)})$, and therefore also in one of the sets $F_i$ or $G_i$. Therefore these sets do indeed form a partition of $\mathcal{P}$ as asserted.

Define the indicator function $I(\cdot\,; A)$ for all points of $\mathcal{P}_A$ by $I(X, T; A) = 1$ if $(X, T) \in \cup_{i=1}^{\infty} F_i(A)$ and $I(X, T; A) = 0$ if $(X, T) \in \cup_{i=1}^{\infty} G_i(A)$. By Lemma 3.2, this function is well-defined, and takes the value 1 precisely on the set of locations in space-time of those items that are accepted in the realization of the Palásti RSA model on the target set $A$, described at the start of this section. Therefore, for $B \subset \mathbb{R}^d$, the definition
$$N_t(B; A) := \sum_{j :\, X_j \in B,\ T_j \le t} I(X_j, T_j; A)$$
is consistent with (3.1).

We now define the Palásti RSA process with target set $\mathbb{R}^d$, using the analogous partition of the whole of $\mathcal{P}$. For each point $(X, T)$ of $\mathcal{P}$, set $I(X, T) = 1$ if $(X, T) \in \cup_{i=1}^{\infty} F_i$ and $I(X, T) = 0$ if $(X, T) \in \cup_{i=1}^{\infty} G_i$. Then $I(X, T)$ is well-defined for all $(X, T) \in \mathcal{P}$, by Lemma 3.2. The following observation will be useful later on.

Remark 3.1. Let $(X, T)$ be a point of $\mathcal{P}$. The modification of $\mathcal{G}$ by addition or removal of edges with neither endpoint in $C_{(X,T)}$ has no effect on the value of $I(X, T)$ given by the algorithm just described.

For bounded Borel $B \subset \mathbb{R}^d$, define the integer-valued random variables
$$N_t(B) := \sum_{\{j :\, X_j \in B,\ T_j \le t\}} I(X_j, T_j); \qquad N(B) := \lim_{t \to \infty} N_t(B). \tag{3.5}$$
Clearly $N_t(B)$ is nondecreasing in $t$. Also, $N(B)$ is finite for bounded $B$ because $D$ has strictly positive volume, so there is a finite bound on the number of translates of $D$ that can be packed into a bounded region. Let $\xi_t$ (respectively $\xi$) be the point process associated with the counting measure $N_t(\cdot)$ (respectively $N(\cdot)$). That is, re-labelling the points of $\mathcal{P}$ in arbitrary order as $\{(X_j, T_j)\}_{j=1}^{\infty}$, let $\xi_t$ and $\xi$ be the random locally finite sets in $\mathbb{R}^d$ defined by
$$\xi_t := \{X_j : I(X_j, T_j) = 1,\ T_j \le t\}; \qquad \xi := \{X_j : I(X_j, T_j) = 1\}.$$
These are now rigorously defined in terms of the Poisson process P and the graphical construction. They are stationary point processes on Rd .
4. Proof of the Basic Results

The proof of Theorem 2.1 uses the following multiparameter form of the Ergodic Theorem.

Lemma 4.1. Suppose $(E, \mathcal{E}, P_0)$ is a probability space. Let $(\Omega, \mathcal{F}, P)$ be the probability space $\Omega := E^{\mathbb{Z}^d} = \prod_{z \in \mathbb{Z}^d} E$, equipped with the smallest σ-field such that each of the co-ordinate projections $X_z : \Omega \to E$, $z \in \mathbb{Z}^d$, is measurable, and with the probability measure $P$ that makes the $E$-valued random variables $(X_z, z \in \mathbb{Z}^d)$ into independent identically distributed random elements of $E$ with common probability distribution $P_0$. Suppose $(Y_z, z \in \mathbb{Z}^d)$ is a stationary random field on $(\Omega, \mathcal{F}, P)$, meaning that $Y_0 : \Omega \to \mathbb{R}$ is a real-valued random variable on $(\Omega, \mathcal{F}, P)$, and for all $\omega \in \Omega$ and $z \in \mathbb{Z}^d$ we have $Y_z(\omega) = Y_0(\tau_z(\omega))$, where the shift operator $\tau_z$ maps $\omega = (\omega_y, y \in \mathbb{Z}^d)$ to $\tau_z(\omega) = (\omega_{y+z}, y \in \mathbb{Z}^d)$. Suppose also that $E|Y_0| < \infty$. Suppose $(A_n, n \ge 1)$ is a sequence of subsets of $\mathbb{Z}^d$ with $|\partial^Z A_n|/|A_n| \to 0$. Then
$$\frac{\sum_{z \in A_n} Y_z}{|A_n|} \xrightarrow{L^1} E\,Y_0 \quad \text{as } n \to \infty. \tag{4.1}$$

Proof. Let $e_1 = (1, 0, \ldots, 0) \in \mathbb{Z}^d$. Since $P$ is a product measure, the shift $\tau_{e_1}$ is an ergodic transformation of $\Omega$ (see Petersen (1983), pp. 57–58). Given $\varepsilon > 0$, by the classical one-parameter Ergodic Theorem, we can choose $K > 0$ such that for all $n \ge K$, the average of $Y_{e_1}, Y_{2e_1}, \ldots, Y_{ne_1}$ is within an $L^1$ distance at most $\varepsilon$ of $E[Y_0]$. Divide $A_n$ into one-dimensional intervals, by which we mean maximal subsets of $A_n$ of the form $(\mathbb{Z} \cap [a, b]) \times \{z_2\} \times \cdots \times \{z_d\}$, with $a, b, z_2, \ldots, z_d$ in $\mathbb{Z}$. Let $A_n^*$ be the union of constituent intervals of length at least $K$. Let $k_n = |A_n|$ and $k_n' = |A_n^*|$. Since $|\partial^Z A_n|/|A_n| \to 0$, $\lim_{n \to \infty} (k_n'/k_n) = 1$. Writing $\|X\|_1$ for the $L^1$-norm (expected absolute value) of any random variable $X$, we have
$$\Big\| k_n^{-1} \Big(\sum_{z \in A_n} Y_z\Big) - E[Y_0] \Big\|_1 \le k_n^{-1} \Big\| \Big(\sum_{z \in A_n^*} Y_z\Big) - k_n' E[Y_0] \Big\|_1 + k_n^{-1} \Big\| \sum_{z \in A_n \setminus A_n^*} Y_z - (k_n - k_n') E[Y_0] \Big\|_1. \tag{4.2}$$
By the choice of $K$ and translation-invariance, for each interval $I$ of length at least $K$ the average of $Y_z$, $z \in I$, is within an $L^1$-distance $\varepsilon$ of $E[Y_0]$. Therefore the first term of the right hand side of (4.2) is at most $\varepsilon$, while the second term tends to zero because $(k_n'/k_n) \to 1$. Therefore the left side of (4.2) is less than $2\varepsilon$ for large $n$, and (4.1) follows.

For each $z \in \mathbb{Z}^d$, let $\mathcal{N}_z$ denote the set of $z' \in \mathbb{Z}^d$ such that $\|z' - z\| \le 1$ (including $z$ itself). Let $Q_z^+$ denote the enlarged cube $\cup_{z' \in \mathcal{N}_z} Q_{z'}$, and define the “jamming time”
$$J_z := \inf\{t \ge 0 : N_t(Q_z^+) = N(Q_z^+)\}.$$
The value of Jz is clearly finite. At time Jz , with probability 1 the cube Qz is jammed, meaning that the set V of x ∈ Qz such that x + D does not intersect any of the accepted sets Xi + D with Ti ≤ Jz and I (Xi , Ti ) = 1 has zero Lebesgue measure. Indeed, if its
measure were not zero, then at some later time $T$ after $J_z$ there would be a Poisson arrival in the set $V$, and this item would either be accepted itself, contradicting the definition of $J_z$, or be blocked by some accepted item arriving in $Q_z^+$ between times $J_z$ and $T$, also contradicting the definition of $J_z$.

Given $A \subset \mathbb{R}^d$, let $L_A$ be the lattice discretization of $A$, by which we mean the set of $z \in \mathbb{Z}^d$ such that $Q_z \cap A \ne \emptyset$. Given also $K > 0$, let $L^o_{A,K}$ be the lattice $K$-interior of $A$ defined by
$$L^o_{A,K} := \{z \in L_A : B_K(z) \subseteq A\}, \tag{4.3}$$
and let $L^b_{A,K} = L_A \setminus L^o_{A,K}$ (a set of “boundary” lattice points). Recall from (3.3) that for $y, z \in \mathbb{Z}^d$, $E_t(y, z)$ is the event that $y$ affects $z$ before time $t$. Given $K > 0$, define the event $F_{t,K}(z)$ by
$$F_{t,K}(z) := \bigcup_{z' \in \mathcal{N}_z}\ \bigcup_{y \in \mathbb{Z}^d,\ \|y - z\| \ge K} E_t(y, z'). \tag{4.4}$$
Lemma 4.2. Suppose $A \subset \mathbb{R}^d$, suppose $K > 0$ is an integer, suppose $t > 0$, and suppose $z \in L^o_{A,K}$. Then we have the following event inclusions (ignoring events of probability zero):
$$\{N_t(Q_z; A) \ne N_t(Q_z)\} \subset F_{t,K}(z) \tag{4.5}$$
and
$$\{N_t(Q_z; A) \ne N(Q_z; A)\} \subset F_{t,K}(z) \cup \{J_z > t\}. \tag{4.6}$$

Proof. Suppose that $F_{t,K}(z)$ does not occur, i.e. there do not exist $y \in \mathbb{Z}^d$ with $\|y - z\| \ge K$ and $z' \in \mathcal{N}_z$ such that $y$ affects $z'$ before time $t$. Then for all Poisson points $(X, T) \in Q_z^+ \times [0, t]$, the cluster $C_{(X,T)}$ is contained in $B_K(z) \times [0, t]$, and therefore (since $z \in L^o_{A,K}$) is unaffected by removal of points of $\mathcal{P}$ lying outside $A \times [0, t]$. Therefore by Remark 3.1, $I(X, T) = I(X, T; A)$ for all such $(X, T)$, so $N_t(Q_z; A) = N_t(Q_z)$, and (4.5) follows. If also $J_z \le t$, the set $\{X_i \in Q_z^+ : I(X_i, T_i) = 1,\ T_i \le t\}$ (of points accepted in $Q_z^+$ before time $t$) jams $Q_z$, so no arrivals in $Q_z$ after time $t$ are accepted and hence $N_t(Q_z; A) = N(Q_z; A)$. This gives us (4.6).

In the proof of Theorem 2.1, we shall repeatedly use the fact that the packing constant $c_1$ defined by
$$c_1 := \max\{k : \exists\, \{x_1, \ldots, x_k\} \subset Q_0 \text{ with } x_1 + D, \ldots, x_k + D \text{ disjoint}\} \tag{4.7}$$
is finite, because $D$ has positive volume and is bounded.

Lemma 4.3. Let $t > 0$. Suppose $(A_n)_{n \ge 1}$ is a sequence of bounded nonnull Borel subsets of $\mathbb{R}^d$ satisfying $|\partial_r A_n|/|A_n| \to 0$ as $n \to \infty$ for all $r > 0$. Then as $n \to \infty$,
$$\frac{N(A_n)}{|A_n|} \xrightarrow{L^1} \lambda; \qquad \frac{N_t(A_n)}{|A_n|} \xrightarrow{L^1} \lambda_t, \tag{4.8}$$
where we set $\lambda := E\,N(Q_0)$ and $\lambda_t := E\,N_t(Q_0)$.
Proof. Since $|A_n| \le |L_{A_n}| \le |A_n| + |\partial_1 A_n|$, and since $|L^b_{A_n,K}| \le |\partial_{K+1} A_n|$, we have as $n \to \infty$ that
$$\frac{|L_{A_n}|}{|A_n|} \to 1; \qquad \frac{|L^b_{A_n,K}|}{|A_n|} \to 0. \tag{4.9}$$
We wish to apply the Ergodic Theorem (Lemma 4.1). Let $(\mathcal{P}_z, z \in \mathbb{Z}^d)$ be a family of independent homogeneous Poisson processes of rate 1 on $Q_0 \times [0, \infty)$. More formally, let $(E, \mathcal{E}, P_0)$ be the space of point configurations on $Q_0 \times [0, \infty)$, equipped with the probability measure corresponding to a homogeneous Poisson point process of rate 1 on $Q_0 \times [0, \infty)$, and let $(\Omega, \mathcal{F}, P)$ be the probability space $\Omega := \prod_{z \in \mathbb{Z}^d} E$ equipped with the smallest σ-field such that each of the co-ordinate projections $\mathcal{P}_z : \Omega \to E$, $z \in \mathbb{Z}^d$, is measurable, and with the probability measure $P$ that makes the $E$-valued random variables $(\mathcal{P}_z, z \in \mathbb{Z}^d)$ into independent identically distributed random elements of $E$ with common distribution $P_0$. Assume the Poisson point process $\mathcal{P}$ on $\mathbb{R}^d \times [0, \infty)$, used as the basis of the graphical construction, is generated by taking the union of the independent Poisson point processes $\sigma_z(\mathcal{P}_z)$, where $\sigma_z(\mathcal{P}_z)$ is the image of the point set $\mathcal{P}_z$ under the translation $\sigma_z$ which sends each point $(x, t) \in Q_0 \times [0, \infty)$ to $(x + z, t)$.

For $z \in \mathbb{Z}^d$, define $Y_z = N(Q_z)$. Since the graph $\mathcal{G}$ is translation-invariant, in the sense that the relation $(X, T) \to (X', T')$ is invariant under the translation $\sigma_z$ of both $(X, T)$ and $(X', T')$, the variables $(Y_z, z \in \mathbb{Z}^d)$ form a stationary random field in the sense of Lemma 4.1. By that result, recalling the definition (4.3) of $L^o_{A,K}$, we have
$$\frac{N\big(\cup_{z \in L^o_{A_n,1}} Q_z\big)}{|L^o_{A_n,1}|} = \frac{\sum_{z \in L^o_{A_n,1}} Y_z}{|L^o_{A_n,1}|} \xrightarrow{L^1} \lambda. \tag{4.10}$$
Suppose $A \subset \mathbb{R}^d$. Since $A \setminus \cup_{z \in L^o_{A,1}} Q_z$ is contained in the union of at most $|L^b_{A,1}|$ unit cubes, the definition (4.7) of the packing constant $c_1$ yields the uniform bound
$$0 \le N(A) - N\big(\cup_{z \in L^o_{A,1}} Q_z\big) \le c_1 |L^b_{A,1}|,$$
so by (4.9) and (4.10) we obtain the first convergence result in (4.8). The proof of the second convergence result is entirely analogous.

Lemma 4.4. Suppose $(A_n)_{n \ge 1}$ is a sequence of bounded nonnull Borel subsets of $\mathbb{R}^d$ satisfying $|\partial_r A_n|/|A_n| \to 0$ as $n \to \infty$ for all $r > 0$. Then for any $t \in (0, \infty)$,
$$\frac{|N_t(A_n) - N_t(A_n; A_n)|}{|A_n|} \xrightarrow{L^1} 0 \quad \text{as } n \to \infty.$$
Proof. Choose integer $K > 1$. Since $|N_t(Q_z; A) - N_t(Q_z \cap A)|$ is uniformly bounded by $c_1$ for all $z \in \mathbb{Z}^d$, using Lemma 4.2 we obtain
$$|A_n|^{-1} |N_t(A_n; A_n) - N_t(A_n)| \le |A_n|^{-1} \sum_{z \in L_{A_n}} |N_t(Q_z; A_n) - N_t(Q_z \cap A_n)| \le \frac{c_1 |L^b_{A_n,K}|}{|A_n|} + \frac{c_1 \sum_{z \in L^o_{A_n,K}} 1_{F_{t,K}(z)}}{|A_n|}.$$
In the right-hand side the first term tends to zero by (4.9) while the second term converges in $L^1$ to $c_1 P[F_{t,K}(0)]$ by the Ergodic Theorem (cf. the proof of (4.10)). Taking $K \to \infty$ and using Lemma 3.1 we obtain the result.
Lemma 4.5. Suppose $(A_n)_{n \ge 1}$ is a sequence of bounded nonnull Borel subsets of $\mathbb{R}^d$ satisfying $|\partial_r A_n|/|A_n| \to 0$ as $n \to \infty$ for all $r > 0$. Then for any $t \in (0, \infty)$,
$$\limsup_{n \to \infty} E\left[\frac{|N_t(A_n; A_n) - N(A_n; A_n)|}{|A_n|}\right] \le P[J_0 > t].$$

Proof. Let $K > 1$. By Lemma 4.2,
$$\frac{|N_t(A_n; A_n) - N(A_n; A_n)|}{|A_n|} \le \frac{\sum_{z \in L_{A_n}} |N_t(Q_z; A_n) - N(Q_z \cap A_n; A_n)|}{|A_n|} \le \frac{c_1 |L^b_{A_n,K}|}{|A_n|} + \frac{c_1 \sum_{z \in L^o_{A_n,K}} 1_{F_{t,K}(z) \cup \{J_z > t\}}}{|A_n|}.$$
In the right-hand side the mean of the first term tends to zero by (4.9), while the mean of the second term tends to $P[F_{t,K}(0) \cup \{J_0 > t\}]$ by stationarity. Taking $K \to \infty$ and using Lemma 3.1 gives us the result.

Proof of Theorem 2.1. Let $t > 0$, and let $\lambda$, $\lambda_t$ be as defined in Lemma 4.3. Since $|\lambda |A| - N(A; A)|$ is bounded by
$$(\lambda - \lambda_t)|A| + |\lambda_t |A| - N_t(A)| + |N_t(A) - N_t(A; A)| + |N_t(A; A) - N(A; A)|,$$
by combining Lemmas 4.3, 4.4, and 4.5, we obtain
$$\limsup_{n \to \infty} E\left[\frac{|\lambda |A_n| - N(A_n; A_n)|}{|A_n|}\right] \le \lambda - \lambda_t + P[J_0 > t].$$
Taking $t \to \infty$, this gives us the convergence (2.1) for $p = 1$. The general case $p \ge 1$ follows because $N(A_n)/|A_n|$ is uniformly bounded by a constant.

Almost sure convergence. One may view Theorem 2.1 as a weak law of large numbers, and ask if there is an associated strong law, with convergence in the $p$th moment replaced by almost sure convergence. The question makes sense now that we are assuming that the Palásti models on different target sets $A$ are all defined in terms of a single Poisson process $\mathcal{P}$ in space-time. In Theorem 2.1 as stated, the only requirement on the sets $A_n$ is a very mild condition of negligible boundary, but to obtain almost sure convergence we need a more restricted class of sets $A_n$. This is because we need a form of the multiparameter Ergodic Theorem with almost sure convergence. A sufficient condition would be to make the extra assumptions that (i) the sets $A_n$ are all “rectangles”, i.e. products of intervals, and (ii) $\liminf(A_n) = \mathbb{R}^d$. In that case, sets of the form $L^o_{A_n}$ will be lattice rectangles satisfying $\liminf(L_{A_n}) = \mathbb{Z}^d$, and by a multiparameter Ergodic Theorem (e.g. Dunford and Schwartz (1958), Theorem VIII.6.9) the $L^1$ convergence in the Ergodic Theorem (Lemma 4.1) can be replaced by almost sure convergence, and therefore in Lemmas 4.3 and 4.4, the $L^1$ convergence can be replaced by a.s. convergence, while Lemma 4.5 can be changed to an a.s. bound on the limsup.
Therefore, with the extra assumptions (i) and (ii) above, the proof of Theorem 2.1 can indeed be modified to give almost sure convergence. Proof of Theorem 2.2. Let ξ be the point process associated with the counting measure N (·), and let ξt be the point process associated with the counting measure Nt (·). Suppose
$(A_n)_{n\ge 1}$ is a sequence of target sets with $\liminf(A_n) = \mathbb{R}^d$. We shall show that $\xi^{A_n}$ converges weakly to $\xi$, and for all $t > 0$, $\xi_t^{A_n}$ converges weakly to $\xi_t$. With $N_t(B;A)$ and $N_t(B)$ all defined in terms of a single Poisson process $\mathcal{P}$ using the graphical representation in Sect. 3, it suffices to prove that for any bounded Borel $B \subset \mathbb{R}^d$ we have almost sure convergence
$$N_t(B;A_n) \to N_t(B); \qquad N(B;A_n) \to N(B). \tag{4.11}$$
There is no loss of generality in assuming that $B$ is contained in some unit cube $Q_z$ (since if not we simply sum the contributions of intersections of $B$ with a finite collection of such cubes), and from now on we do so. Let $t > 0$. Recall the definition of events $E_t(y,z)$ and $F_{t,K}(z)$ at (3.3) and (4.4). Define the random variable
$$M(t) := \min\{K \in \mathbb{N} : F_{t,K}(z) \text{ does not occur}\},$$
which is almost surely finite by Corollary 3.1. Then $F_{t,M(t)}(z)$ does not occur, so if $n$ is sufficiently large that $z \in L^o_{A_n,M(t)}$, then by the proof of (4.5) in Lemma 4.2, $N_t(B;A_n) = N_t(B)$. This proves the first part of (4.11).

Let $T = J_z$, which is random but almost surely finite. Then neither $F_{T,M(T)}(z)$ nor $\{J_z > T\}$ occurs, so if $n$ is sufficiently large that $z \in L^o_{A_n,M(T)}$, then by the proof of (4.6) in Lemma 4.2,
$$N(B;A_n) = N_T(B;A_n) = N_T(B) = N(B),$$
the last equality coming from the definition of $T = J_z$ as the jamming time. This gives us the second part of (4.11).
5. Non-uniform Distributions

This section is concerned with the following model. Suppose $(X_1, X_2, X_3, \ldots)$ are independent identically distributed random $d$-vectors with common probability density function $f : \mathbb{R}^d \to [0,\infty)$. Let $D$ be a specified bounded open set in $\mathbb{R}^d$ with $0 \in D$, and for $a > 0$ let $aD$ denote the set $\{ax : x \in D\}$. For each positive integer $n$, an $n$-model generating a random variable denoted $N_n^*$ is defined as follows. For $i = 1, 2, \ldots$, item $i$ is accepted in the $n$-model if $(X_i + n^{-1/d}D) \cap (X_j + n^{-1/d}D) = \emptyset$ for all previous items $j$, $j < i$, that were accepted in the $n$-model. For $B \subseteq \mathbb{R}^d$, let $N_n^*(B)$ be the number of items that arrive in $B$ and are accepted for the $n$-model, i.e. the number of $i$ for which $X_i \in B$ and item $i$ is accepted. Write simply $N_n^*$ for $N_n^*(\mathbb{R}^d)$, the total number of items accepted.

For the special case where $f$ is a uniform distribution over the unit cube (or some other fixed set with a reasonably well-behaved boundary), Theorem 2.1 and a simple spatial rescaling argument immediately give convergence of $N_n^*/n$ in the $p$th moment to $\lambda$ (the same limit as in Theorem 2.1). In the present section we seek extensions to non-uniform density functions $f$. In one dimension only, non-uniformity was incorporated into extensions of Rényi's result by Ney (1962); see also the remarks in Sect. 6.2 of Dvoretzky and Robbins (1964).

The non-uniform distribution could represent RSA of particles onto a surface with variable adhesivity, or sequential arrival of members of a territorial biological species in a previously unoccupied area of land where some parts of the terrain are more friendly than others. This form of non-homogeneous SSI has been
used in statistical simulations by Pretzsch (1995), described in English by Stoyan and Stoyan (1995).

Recall that the support of the function $f$ is the intersection of all closed sets $F \subseteq \mathbb{R}^d$ such that $\int_{\mathbb{R}^d \setminus F} f(x)\,dx = 0$. A set $A \subset \mathbb{R}^d$ is Riemann measurable if its indicator function (taking the value 1 for $x \in A$ and 0 for $x \in \mathbb{R}^d \setminus A$) is Riemann integrable.

Theorem 5.1. Suppose $f$ is Riemann integrable and $\{x \in \mathbb{R}^d : f(x) > 0\}$ is bounded and Riemann measurable with measure $|S|$. Let $\lambda = \lambda(d, D)$ be the constant appearing in Theorem 2.1. Then in the above model, for all $p \in [1,\infty)$,
$$n^{-1} N_n^* \xrightarrow{L^p} \lambda|S|. \tag{5.1}$$
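The $n$-model behind Theorem 5.1 can be sketched numerically. The following is a minimal illustration only, under assumptions that are not part of the text: $d = 1$, $D = (-1/2, 1/2)$, the density $f(x) = 2x$ on $(0, 1]$ (so $|S| = 1$), and a truncation of the infinite arrival sequence after finitely many items. The function names are hypothetical.

```python
import bisect
import math
import random


def simulate_n_model(n, sample_point, num_arrivals, rng):
    """Run the n-model in d = 1 with D = (-1/2, 1/2).

    Item i is accepted when the interval of length 1/n centred at X_i is
    disjoint from every previously accepted interval, i.e. when
    |X_i - X_j| >= 1/n for all accepted j. The infinite arrival sequence
    is truncated after num_arrivals items (an assumption of this sketch).
    """
    gap = 1.0 / n
    accepted = []  # sorted centres of the accepted items
    for _ in range(num_arrivals):
        x = sample_point(rng)
        k = bisect.bisect_left(accepted, x)
        # Only the nearest accepted neighbours on each side can overlap.
        left_ok = k == 0 or x - accepted[k - 1] >= gap
        right_ok = k == len(accepted) or accepted[k] - x >= gap
        if left_ok and right_ok:
            accepted.insert(k, x)
    return accepted


def triangular_density_sample(rng):
    """Inverse-CDF sample from the assumed density f(x) = 2x on (0, 1]."""
    return math.sqrt(1.0 - rng.random())


rng = random.Random(0)
n = 200
accepted = simulate_n_model(n, triangular_density_sample, 200 * n, rng)
ratio = len(accepted) / n
print(f"N_n^*/n is roughly {ratio:.3f}")
```

For this choice of $D$ the limit $\lambda$ is Rényi's one-dimensional parking constant (about 0.7476), so the printed ratio should land in that vicinity, although finite $n$ and the truncation both introduce error.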
We emphasize that given $d$ and $D$, the limit in (5.1) depends on $f$ only via the Lebesgue measure $|S|$ of its support. Estimates of $\lambda$ by simulation studies for $d \ge 2$ in the uniform case, and also the exact value of $\lambda$ given by (1.1) for $d = 1$ in the uniform case, are therefore also applicable via (5.1) in the non-uniform case under consideration here.

Proof. Let $\varepsilon > 0$. Define dyadic step functions $f_n^+$ and $f_n^-$ approximating $f$ from above and below, as follows. Divide $\mathbb{R}^d$ into half-open $n$-dyadic cubes, that is, sets of the form $\prod_{j=1}^d (a_j 2^{-n}, (a_j + 1)2^{-n}]$ with all $a_j \in \mathbb{Z}$. Given $x \in \mathbb{R}^d$, if $Q_n(x)$ is the $n$-dyadic cube containing $x$, let $f_n^+(x) := \sup\{f(y) : y \in Q_n(x)\}$, and $f_n^-(x) := \inf\{f(y) : y \in Q_n(x)\}$. Let $S_n^+$ be the support of $f_n^+$ and let $S_n^- := \{x \in \mathbb{R}^d : f_n^-(x) > 0\}$. Then $S_n^- \subseteq S \subseteq S_n^+$. Since $\{x : f(x) > 0\}$ is assumed Riemann measurable, it is possible to choose $n_0 = n_0(\varepsilon)$ such that
$$|S_{n_0}^+ \setminus S_{n_0}^-| < \varepsilon. \tag{5.2}$$
The restriction of $f_{n_0}^-$ to $S_{n_0}^-$ is strictly positive and takes finitely many values, so there is a constant $\delta_2 = \delta_2(\varepsilon)$ such that $f_{n_0}^-(x) \ge \delta_2$, $x \in S_{n_0}^-$. Using the Riemann integrability of $f$, choose $n_1 = n_1(\varepsilon) \ge n_0$ such that
$$\int_{\mathbb{R}^d} (f_{n_1}^+(x) - f_{n_1}^-(x))\,dx \le \varepsilon^2 \delta_2,$$
which implies by Markov's inequality that
$$|\{x \in \mathbb{R}^d : f_{n_1}^+(x) > f_{n_1}^-(x) + \varepsilon\delta_2\}| \le \varepsilon. \tag{5.3}$$
Define the "well-behaved" set
$$A_1 := \{x \in S_{n_0}^- : f_{n_1}^+(x) \le f_{n_1}^-(x) + \varepsilon\delta_2\}, \tag{5.4}$$
and the "exceptional" set
$$A_2 := (S_{n_0}^+ \setminus S_{n_0}^-) \cup \{x \in S_{n_0}^- : f_{n_1}^+(x) > f_{n_1}^-(x) + \varepsilon\delta_2\}.$$
Then
$$A_1 \subset S \subset (A_1 \cup A_2). \tag{5.5}$$
By (5.2) and (5.3), the set $A_2$ is a union of $n_0$-dyadic and $n_1$-dyadic cubes with total volume
$$|A_2| < 2\varepsilon. \tag{5.6}$$
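The dyadic envelopes $f_n^\pm$ and the Markov step leading to (5.3) can be illustrated in one dimension. This is a sketch under stated assumptions rather than part of the proof: $f$ is taken to be nondecreasing and continuous on $[0, 1]$, so that the infimum and supremum over each dyadic cell are given by the endpoint values, and the test function $x^2$ is a hypothetical, unnormalised choice.

```python
def dyadic_envelopes(f, n):
    """Return (lower, upper) values of f on the 2**n n-dyadic cells of (0, 1].

    Assumes f is nondecreasing and continuous, so that on the cell
    (a 2^-n, (a + 1) 2^-n] the infimum is f(a 2^-n) and the supremum
    is f((a + 1) 2^-n).
    """
    width = 2.0 ** -n
    lower = [f(a * width) for a in range(2 ** n)]
    upper = [f((a + 1) * width) for a in range(2 ** n)]
    return lower, upper


def envelope_gap(f, n):
    """Integral over (0, 1] of f_n^+ - f_n^-."""
    lower, upper = dyadic_envelopes(f, n)
    width = 2.0 ** -n
    return sum(u - l for l, u in zip(lower, upper)) * width


def exceptional_volume(f, n, threshold):
    """Lebesgue measure of {x : f_n^+(x) > f_n^-(x) + threshold}."""
    lower, upper = dyadic_envelopes(f, n)
    width = 2.0 ** -n
    return sum(width for l, u in zip(lower, upper) if u > l + threshold)


def square(x):
    """Hypothetical test function (not normalised to a density)."""
    return x * x


gaps = [envelope_gap(square, n) for n in range(1, 8)]
volume = exceptional_volume(square, 5, 0.03)
markov_bound = envelope_gap(square, 5) / 0.03
print(gaps, volume, markov_bound)
```

For a monotone function the gap telescopes to $(f(1) - f(0))2^{-n}$, which is why it can be driven below any prescribed $\varepsilon^2\delta_2$ by increasing $n$; the last two printed values illustrate that the exceptional volume never exceeds the Markov bound.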
By dividing each of these cubes into subcubes of volume $1/n$, for large enough $n$ we have the uniform deterministic bound
$$N_n^*(A_2) \le 3nc_1\varepsilon, \tag{5.7}$$
with the packing constant $c_1$ defined at (4.7).

Choose one of the $n_1$-dyadic cubes constituting $A_1$, and let it be denoted $A$. Let $x_0$ denote the centre of $A$. Let $g_1$ be the (constant) value taken by the function $f_{n_1}^-$ on the set $A$. Note that $g_1 \ge \delta_2$ and by (5.4),
$$g_1 \le f(x) \le g_1(1 + \varepsilon), \qquad x \in A. \tag{5.8}$$
The next aim is to estimate $N_n^*(A)$. On a suitable probability space, let $\mathcal{P}$ be a homogeneous Poisson process of rate 1 on $\mathbb{R}^d \times [0,\infty)$. Consider the Palásti RSA model, with exclusion based on overlap of translates of $D$ (rather than $n^{-1/d}D$) and with arrivals in space-time given by the points of $\mathcal{P}$. For $B \subset \mathbb{R}^d$, let $N(B)$ be the number of items accepted in $B \times [0,\infty)$, i.e. let $N(B)$ be precisely the same as defined at (3.5). Let $A_n$ denote the cube of side $n^{1/d}2^{-n_1}$ centred at the origin; note that
$$|A_n| = n|A|. \tag{5.9}$$
Let $\mathcal{P}_{1,n}$ be the restriction of $\mathcal{P}$ to $A_n$, and assume that on the same probability space, there is an independent non-homogeneous Poisson process $\mathcal{P}_{2,n}$ on $\mathbb{R}^d \times [0,\infty)$ with intensity measure $h_n(y)\,dy\,dt$ ($y \in \mathbb{R}^d$, $t > 0$), where we set
$$h_n(y) := g_1^{-1}(f(x_0 + n^{-1/d}y) - g_1), \quad y \in A_n; \qquad h_n(y) := g_1^{-1} f(x_0 + n^{-1/d}y), \quad y \in \mathbb{R}^d \setminus A_n.$$
Later on we shall use the fact that this definition and (5.8) together imply that
$$0 \le h_n(y) \le \varepsilon, \qquad y \in A_n. \tag{5.10}$$
The union of $\mathcal{P}_{1,n}$ and $\mathcal{P}_{2,n}$ is a non-homogeneous Poisson process on $\mathbb{R}^d \times [0,\infty)$ with intensity measure $g_1^{-1} f(x_0 + n^{-1/d}y)\,dy\,dt$. Consider a RSA model in which items arrive in $\mathbb{R}^d$ at points in space-time given by the points of the Poisson process $\mathcal{P}_{1,n} \cup \mathcal{P}_{2,n}$, and exclusion is based on overlaps of translates of $D$, as in the case of $N(B)$ above, and in the basic RSA model described in Sect. 3. Let $N_n^{**}(A_n)$ denote the total number of items accepted with arrivals in $A_n \times [0,\infty)$.

Define the linear transformation $\tau_n : \mathbb{R}^d \times [0,\infty) \to \mathbb{R}^d \times [0,\infty)$ which maps $(y,t)$ to $(x_0 + n^{-1/d}y, t)$ (so that the image of $A_n \times [0,\infty)$ under $\tau_n$ is $A \times [0,\infty)$). By the transformation theory of Poisson processes, as given for example in Resnick (1987, Prop. 3.7), the image of the Poisson process $\mathcal{P}_{1,n} \cup \mathcal{P}_{2,n}$ under the mapping $\tau_n$ is a non-homogeneous Poisson process on $\mathbb{R}^d \times [0,\infty)$ whose intensity measure is $(n/g_1) f(x)\,dx\,dt$; in other words, a sequence of independent variables on $\mathbb{R}^d$ with common probability density function $f$, arriving at the arrival times of a homogeneous Poisson process in $[0,\infty)$ of rate $n/g_1$. Let us assume that the input for the $n$-model is given by the positions in $\mathbb{R}^d$ of the arrivals of this transformed Poisson process, taken
in time-order. In that case, since the transformation $\tau_n$ shrinks sets by a factor of $n^{-1/d}$ and maps points in $A_n$ to points in $A$, we have the identity
$$N_n^{**}(A_n) = N_n^*(A). \tag{5.11}$$
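The deterministic part of the identity (5.11), namely that rescaling space by $n^{-1/d}$ does not change which arrivals are accepted, can be checked on a small example. The sketch below assumes $d = 1$ and $D = (-1/2, 1/2)$; the centre `x0`, the side length of $A$, and the arrival sample are hypothetical values chosen for illustration.

```python
import random


def accepted_indices(points, exclusion_range):
    """Sequentially accept points (in the given order) that keep distance
    at least exclusion_range from every previously accepted point (d = 1)."""
    kept = []
    for i, p in enumerate(points):
        if all(abs(p - points[j]) >= exclusion_range for j in kept):
            kept.append(i)
    return kept


rng = random.Random(1)
n = 400
x0 = 0.3       # hypothetical centre of the cube A
side = 0.125   # hypothetical side length of A, here 2^-3

# Arrival locations y in A_n, the interval of length n * side centred at 0.
ys = [rng.uniform(-0.5 * n * side, 0.5 * n * side) for _ in range(2000)]
# Spatial part of tau_n in d = 1: y -> x0 + y / n.
xs = [x0 + y / n for y in ys]

unit_scale = accepted_indices(ys, 1.0)    # exclusion by translates of D
shrunk = accepted_indices(xs, 1.0 / n)    # exclusion by translates of D / n
print(len(unit_scale), unit_scale == shrunk)
```

Only the geometry is exercised here; the probabilistic content of (5.11) lies in the Poisson mapping argument, which this sketch takes for granted.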
Next, we aim to approximate $N_n^{**}(A_n)$ by $N(A_n)$. Let $t > 0$ and let $K > 0$ with $(K - 2)\delta_1 > t$, with $\delta_1$ given by Lemma 3.1. For $z \in \mathbb{Z}^d$, let $U_z$ be the first arrival time of the Poisson process $\mathcal{P}_{2,n}$ in $B_K(z)$, i.e., labelling the points of $\mathcal{P}_{2,n}$ as $\{(Y_i, T_i)\}_{i=1}^\infty$,
$$U_z = \min\{T_i : Y_i \in B_K(z)\}.$$
Since at most $c_1$ items are packed into any unit cube, using notation from Sect. 4 we have
$$\begin{aligned}
E|N_n^{**}(A_n) - N(A_n)| &\le \sum_{z \in L_{A_n}} E|N_n^{**}(Q_z \cap A_n) - N(Q_z \cap A_n)| \\
&\le c_1 \sum_{z \in L^o_{A_n,K}} P[N_n^{**}(Q_z) \ne N(Q_z);\ J_z \le t;\ U_z > t] \\
&\quad + c_1 \sum_{z \in L^o_{A_n,K}} (P[J_z > t] + P[U_z \le t]) + c_1 |L^b_{A_n,K}|.
\end{aligned} \tag{5.12}$$
For $z \in L^o_{A_n,K}$ the set $B_K(z)$ is contained in $A_n$, and therefore by (5.10), the expected total number of points of $\mathcal{P}_{2,n}$ in $B_K(z) \times [0,t]$ is bounded by $(2K)^d t\varepsilon$, so that
$$P[U_z \le t] \le (2K)^d t\varepsilon, \qquad z \in L^o_{A_n,K}. \tag{5.13}$$
Turning to the first sum in the right hand side of (5.12), for $z \in L^o_{A_n,K}$ we assert that
$$P[N_n^{**}(Q_z) \ne N(Q_z);\ J_z \le t;\ U_z > t] \le P[F_{t,K-1}(z)] \tag{5.14}$$
$$\le |N_z|\, P[\cup_{\|z\| > K-2} E_t(z, 0)] \le 2(3^{d+2-K}). \tag{5.15}$$
The first inequality (5.14) is obtained by a similar argument to Lemma 4.2; if $F_{t,K-1}(z)$ does not occur, then for $(X,T) \in Q_z^+ \times [0,t]$, the cluster $C_{(X,T)}$ is contained in the cylinder $B_{K-1}(z) \times [0,t]$. If also $U_z > t$, then there are no points of $\mathcal{P}_{2,n}$ in the slightly larger cylinder $B_K(z) \times [0,t]$, and so none of these points is attached to $C_{(X,T)}$ in the graph. Therefore by Remark 3.1 the set of accepted points in $Q_z^+$ up to time $t$ is the same for the Poisson process $\mathcal{P}_{1,n} \cup \mathcal{P}_{2,n}$ as it is for $\mathcal{P}_{1,n}$, and if also $J_z \le t$, then no point is accepted in $Q_z$ after time $t$ because it is jammed. This proves the inequality (5.14). The last inequality in (5.15) comes from Lemma 3.1 and the fact that we assumed $(K - 2)\delta_1 > t$.

By (5.15) and (5.13), the left-hand side of (5.12) is bounded by
$$c_1 |L^o_{A_n,K}| \left(2(3^{d+2-K}) + P[J_0 > t] + (2K)^d t\varepsilon\right) + c_1 |L^b_{A_n,K}|. \tag{5.16}$$
By the Ergodic Theorem (Lemma 4.3), we have $N(A_n)/|A_n| \xrightarrow{L^1} \lambda$, and therefore by (5.9), (5.11) and the bound (5.16) for (5.12),
$$\begin{aligned}
\limsup_{n\to\infty} E\left|\frac{N_n^*(A)}{n} - \lambda|A|\right| &= |A| \limsup_{n\to\infty} E\left|\frac{N_n^{**}(A_n)}{|A_n|} - \lambda\right| \\
&\le |A| \limsup_{n\to\infty} \left( E\left|\frac{N_n^{**}(A_n) - N(A_n)}{|A_n|}\right| + E\left|\frac{N(A_n)}{|A_n|} - \lambda\right| \right) \\
&\le c_1 |A| \left(2 \cdot 3^{d+2-K} + P[J_0 > t] + \varepsilon(2K)^d t\right).
\end{aligned}$$
Summing over the constituent cubes $A$ of $A_1$ (a finite collection of cubes), and also using (5.5), (5.6) and (5.7), we obtain
$$\begin{aligned}
\limsup_{n\to\infty} E\left|\frac{N_n^*}{n} - \lambda|S|\right| &\le \limsup_{n\to\infty} E\left|\frac{N_n^*(A_1)}{n} - \lambda|A_1|\right| + \limsup_{n\to\infty} E\left|\frac{N_n^*(S \setminus A_1)}{n} - \lambda|S \setminus A_1|\right| \\
&\le c_1 |A_1| \left(2 \cdot 3^{d+2-K} + P[J_0 > t] + \varepsilon(2K)^d t\right) + \varepsilon(3c_1 + 2).
\end{aligned}$$
Letting $\varepsilon \to 0$, then $K \to \infty$, and then $t \to \infty$, we obtain $N_n^*/n \xrightarrow{L^1} \lambda|S|$, and convergence in the $p$th moment for $p > 1$ follows because $N_n^*/|A_n|$ is uniformly bounded by a constant.

6. Random Shapes, Random Types

In the basic Palásti model of RSA, each incoming item is a translate of the same $d$-dimensional set $D$. It is natural to ask what happens if this condition is relaxed by allowing for several kinds of set. In the case $d = 1$, variable length "cars" are incorporated into the models of Ney (1962) and Mannion (1979). In the case $d = 2$, the incoming items could be disks of variable radius, squares of variable orientation, rectangles of variable aspect ratio, and so on. Many of the recent simulation studies have been concerned with these types of RSA model. See the surveys mentioned in the introduction, for example Sects. 6.2 and 7 of Bartelt and Privman (1991).

More generally, one may have several types of item, with arbitrary exclusion rules between items of different types at different locations, not necessarily induced by each type representing a particular size or shape. For example, Itoh and Shepp (1999) have recently considered a one-dimensional parking model where there are two types, which they call "spins"; items of the same type exclude each other up to range $a$ and items of different type exclude each other up to range 1. They assert that for this model the mean number $f(x)$ of items accepted of a particular type in an interval of length $x$ is subadditive in $x$, but Shepp (personal communication) has subsequently observed that this is not really true, so some other argument is needed to show convergence of $f(x)/x$ as $x \to \infty$. The approach here provides a method to prove this, as well as higher dimensional analogues.

We extend Theorem 2.1 by allowing for items which are of random type. The type can represent the shape of an item, or its "spin", or any other property of the item.
Let $(F, \mathcal{F})$ be an arbitrary measurable space (the space of possible types), and let $P_1$ be a probability measure on $(F, \mathcal{F})$, representing the distribution of types of random inputs. Suppose $D : \mathbb{R}^d \times F \times F \to \{0, 1\}$ is a measurable function (with respect to product measure), the so-called "exclusion function", where $D(x, E, E')$ takes the value 0 if an item of type $E$ excludes any subsequently arriving item of type $E'$ with relative displacement $x$. For example, in the case of spatial exclusion by items with random shape, one could take $F$ to be a space of subsets of $\mathbb{R}^d$ and set $D(x, E, E')$ to be 0 only if $E \cap (x + E') \ne \emptyset$. A general theory of random subsets of $\mathbb{R}^d$ can be found in Matheron (1975) and in condensed form in Stoyan, Kendall and Mecke (1995). In particular, results and remarks of Matheron (1975), pp. 28, 9, and 13, imply that the function $D$ just defined really is measurable.

Assume that the exclusion function has a positive "hard core" component and a finite range. In other words, assume that there are constants $0 < r_0 < r_1$ such that for $(P_1 \times P_1)$-almost all pairs $(E, E')$, we have
$$D(x; E, E') = 0, \qquad \|x\| \le r_0$$
and
$$D(x; E, E') = 1, \qquad \|x\| \ge r_1.$$
The model goes as follows. Given a bounded nonnull Borel target set $A \subset \mathbb{R}^d$, let $X_1, X_2, \ldots$ be independent $d$-vectors uniformly distributed over $A$, and let $E_1, E_2, \ldots$ be independent random elements of $F$ with common probability distribution $P_1$. We refer to $X_i$ as the location of the $i$th incoming item, and to $E_i$ as its type. Let the first item be accepted, and recursively for $i = 2, 3, 4, \ldots$, let the $i$th item be accepted if $D(X_i - X_j, E_j, E_i) = 1$ for all $j < i$ such that $j$ was accepted; otherwise the $i$th item is rejected.

As in earlier results, it is convenient to assume Poisson input; in this case, let us take $\mathcal{P}$ to be a Poisson process on $\mathbb{R}^d \times [0,\infty) \times F$ with intensity given by the product of Lebesgue measure on $\mathbb{R}^d \times [0,\infty)$ with the probability measure $P_1$ on $F$. Each point of this Poisson process is a triple $(X, T, E)$, with $X \in \mathbb{R}^d$ representing location of an incoming item, $T \in [0,\infty)$ representing its time of arrival, and $E \in F$ its type. Then arriving items have independent identically distributed types with distribution $P_1$, and it is assumed that the input to the model as described in the preceding paragraph comes from the locations and types of the points of this Poisson process which lie in $A \times [0,\infty) \times F$, taken in time-order.

Given $B \subset \mathbb{R}^d$, let $N(B; A)$ be the total number of items accepted whose location is in $B$. This is finite because only a finite number of translates of $[-r_0/2, r_0/2]^d$ can be packed into $A$. Also of interest are other counts concerned with accepted sets. For example, one might be interested in the total volume of accepted sets in $B$ (which is trivially deduced from $N(B; A)$ in the basic Palásti model when $P_1$ is degenerate), or in the number of accepted sets of a particular kind. With this in mind, for any bounded measurable function $h : F \to \mathbb{R}$, let us define $N^h(B; A)$ to be the sum of the values of $h(E_i)$ over all accepted items in $B$, or in other words,
$$N_t^h(B; A) := \sum_{i=1}^{\infty} h(E_i)\, 1\{i \text{ accepted};\ X_i \in B,\ T_i \le t\},$$
and $N^h(B; A) := \lim_{t\to\infty} N_t^h(B; A)$.
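The acceptance rule with random types and the weighted count $N^h(B; A)$ can be sketched as follows. This is an illustration only, assuming $d = 1$, two equally likely types $\pm 1$ in the spirit of the "spin" example mentioned above, a same-type range $a = 0.5$, and a finite number of arrivals; the function names and parameter values are hypothetical.

```python
import random


def exclusion(x, earlier_type, later_type, a=0.5):
    """Exclusion function D(x; E, E'): 0 means the earlier item blocks the
    later one at relative displacement x, 1 means it does not.
    Same types exclude up to range a, different types up to range 1."""
    reach = a if earlier_type == later_type else 1.0
    return 0 if abs(x) < reach else 1


def typed_rsa(length, num_arrivals, rng):
    """Sequential adsorption on A = [0, length] with random types +1 / -1."""
    accepted = []  # list of (location, type) in order of acceptance
    for _ in range(num_arrivals):
        x = rng.uniform(0.0, length)
        e = rng.choice((+1, -1))
        if all(exclusion(x - xj, ej, e) == 1 for xj, ej in accepted):
            accepted.append((x, e))
    return accepted


def weighted_count(accepted, h, lo, hi):
    """N^h(B; A) for B = [lo, hi]: sum of h(E_i) over accepted items in B."""
    return sum(h(e) for x, e in accepted if lo <= x <= hi)


rng = random.Random(2)
length = 50.0
accepted = typed_rsa(length, 10000, rng)
total = weighted_count(accepted, lambda e: 1, 0.0, length)
plus_only = weighted_count(accepted, lambda e: 1 if e == +1 else 0, 0.0, length)
print(total, plus_only)
```

Taking $h \equiv 1$ recovers the plain count $N(B; A)$, while an indicator of one type counts accepted items of that type only. Here the hard core and range constants are $r_0 = 0.5$ and $r_1 = 1$.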
In the case where the function $h$ is the function $\mathbf{1}$ that is identically 1, we write simply $N(B; A)$ for $N^{\mathbf{1}}(B; A)$, the total number of accepted items in $B$, and $N_t(B; A)$ for $N_t^{\mathbf{1}}(B; A)$, the total number of accepted items up to time $t$. Let $\xi_t^A$ and $\xi^A$ be the point processes associated with the counting measures $N_t(\cdot; A)$ and $N(\cdot; A)$.

Theorem 6.1. There is a constant $\lambda > 0$, depending on the dimension $d$, the choice of measure $P_1$, and the function $h$, such that if $(A_n)_{n\ge 1}$ is a sequence of nonnull bounded Borel subsets of $\mathbb{R}^d$ satisfying $|\partial_r A_n|/|A_n| \to 0$ as $n \to \infty$ for all $r > 0$, then for any $p \in [1,\infty)$,
$$\frac{N^h(A_n; A_n)}{|A_n|} \xrightarrow{L^p} \lambda. \tag{6.1}$$
Moreover, there exist point processes $\xi$ and $\xi_t$, $t > 0$, such that if $(A_n)_{n\ge 1}$ is any sequence of target sets with $\liminf(A_n) = \mathbb{R}^d$, the sequence of point processes $\xi^{A_n}$ converges weakly to $\xi$, and for all $t > 0$, the sequence of point processes $\xi_t^{A_n}$ converges weakly to $\xi_t$.

In this model, the distribution of types of accepted items will typically differ from the distribution $P_1$ of types of incoming items. For example, for balls of varying sizes, the distribution of accepted items will be biased towards smaller items that can fit more easily into gaps between previously accepted items. It might also be of interest to consider a model without this feature, namely a "random shape" version of that described at the start of this article, in which every incoming item is accepted at a position uniformly distributed over feasible positions, until there are no feasible positions left. However, we do not consider such a model here.

The proof of Theorem 6.1 runs along mostly the same lines as those of Theorems 2.1 and 2.2, and we just give a sketch. As before, we may assume without loss of generality that $r_1 = 1$, so that there cannot be any overlap between items whose location is at an $l_\infty$ distance more than 1 from one another. The appropriate oriented graph structure $G$ on the points of $\mathcal{P}$ goes as follows. Draw an oriented edge from $(X, T, E)$ to $(Y, U, E')$ whenever $T < U$ and $D(Y - X; E, E') = 0$. Using this graph structure we define $I(X, T, E; A)$ in terms of the restriction of the graph to vertices in $A \times [0,\infty) \times F$, and $I(X, T, E)$ in terms of the entire graph, in a manner analogous to that used in Sect. 3. We then have
$$N^h(B; A) = \sum_{(X,T,E) \in \mathcal{P}} h(E)\, I(X, T, E; A)\, 1_{\{X \in B\}},$$
and can also define
$$N_t^h(B) := \sum_{(X,T,E) \in \mathcal{P}} h(E)\, I(X, T, E)\, 1_{\{X \in B,\, T \le t\}},$$
and
$$N_t(B) := \sum_{(X,T,E) \in \mathcal{P}} I(X, T, E)\, 1_{\{X \in B,\, T \le t\}}.$$
Also set $N^h(B) = \lim_{t\to\infty} N_t^h(B)$ and $N(B) = \lim_{t\to\infty} N_t(B)$.

Most of the modifications to the proof of Theorems 2.1 and 2.2 are straightforward. The definition of the blocking time $J_z$ in the modified setting is
$$J_z := \inf\{t \ge 0 : N_t(Q_z^+) = N(Q_z^+)\}.$$
At time $J_z$, with probability 1 the cube $Q_z$ is jammed, meaning in this case that the set $V$ of pairs $(x, \zeta) \in Q_z \times F$ such that $D(x - X; E, \zeta) = 1$ for all of the points $(X, T, E)$ of $\mathcal{P}$ such that $T \le J_z$ and $I(X, T, E) = 1$, satisfies $(\mathrm{Leb} \times P_1)(V) = 0$. This is the case because if it were not, then at some later time $U$ after $J_z$ there would be a Poisson arrival $(X', U, E')$ with $(X', E') \in V$, and this item would either be accepted itself, contradicting the definition of $J_z$, or be blocked by some item arriving between times $J_z$ and $U$, also contradicting the definition of $J_z$.

We can apply the Ergodic Theorem to get convergence of $N^h(A_n)/|A_n|$ to $E\,N^h(Q_0)$, because if $A$ is a union of cubes centred at integer lattice points, then $N^h(A)$ can be expressed as the sum over corresponding lattice points of a stationary functional of independent identically distributed Poisson processes on $Q_0 \times [0,\infty) \times F$, indexed by lattice points. Likewise, $N_t^h(A_n)/|A_n|$ tends to $E\,N_t^h(Q_0)$. We can approximate $N_t^h(A_n)$ by $N_t^h(A_n; A_n)$ and approximate $N_t^h(A_n; A_n)$ by $N^h(A_n; A_n)$ in much the same manner as before, and the remainder of the proof follows in the same way as in the earlier case.

7. Cooperative Sequential Adsorption

Cooperative sequential adsorption (CSA) is a generalization of RSA, in which an incoming item located at $x \in \mathbb{R}^d$ is accepted with a probability that depends on the local configuration of previously accepted items near to $x$. For example, in one-dimensional car-parking one can imagine that some of the more faint-hearted drivers might go away when confronted by a gap between cars that is only just big enough to contain another car. CSA models for deposition of molecules onto a surface are an attempt to incorporate intermolecular interactions more general than spatial exclusion. The Palásti RSA model is a special case of CSA in which the acceptance probability is a product over all accepted points of "hard core" indicator functions.
In this section we indicate how to adapt the proof of Theorem 2.1 to obtain a similar LLN for more general CSA models, provided the acceptance probability function has a hard core component to it and is translation-invariant with finite range. In fact, many of the CSA models proposed in the scientific literature have infinite range effects; further work will be required to deal with these in a rigorous way. As in the preceding section, we could allow for the possibility of several different types of incoming item, but for the sake of clarity we restrict our attention to the following single-type CSA model. Let r1 > 0 be the range of interactions. Let D0 ∈ (0, 1], and for k = 1, 2, 3, . . . , let Dk : ([−r1 , r1 ]d )k → [0, 1] be a Borel measurable function that is permutation-invariant, meaning that for any permutation σ : {1, 2, . . . , k} → {1, 2, . . . , k}, we have
Dk (x1 , . . . , xk ) = Dk (xσ (1) , . . . , xσ (k) ).
The function Dk represents the acceptance probability as a function of relative displacement of existing items. Given a bounded nonnull Borel target set A ⊂ Rd , suppose items arrive sequentially at locations uniformly distributed over A. If an item arrives at location x ∈ A, and the
already accepted items in $B_{r_1}(x)$ are at $x_1, \ldots, x_k$, then the new item is accepted with probability $D_k(x_1 - x, \ldots, x_k - x)$. If there are no already accepted items in $B_{r_1}(x)$, then it is accepted with probability $D_0$.

Combining the functions $D_k$, let the function $D(\mathcal{X})$, defined for all finite subsets $\mathcal{X}$ of $[-r_1, r_1]^d$, be given by
$$D(\mathcal{X}) := D_0 \text{ if } \mathcal{X} = \emptyset; \qquad D(\mathcal{X}) := D_k(x_1, \ldots, x_k) \text{ if } \mathcal{X} = \{x_1, \ldots, x_k\}.$$
We assume $D$ satisfies a "hard core" condition, by which we mean that there exists $r_0 \in (0, r_1)$ such that $D(\mathcal{X}) = 0$ if $\mathcal{X} \cap B_{r_0}(0) \ne \emptyset$. This means that no incoming item is accepted if too close to an existing accepted item.

If the hard core condition is satisfied, then the ultimate number of accepted items in $A$ is finite; let it be denoted $N(A; A)$ as usual. Following on from previous sections, the primary aim here is to give a law of large numbers for $N(A; A)$.

Theorem 7.1. There is a constant $\lambda > 0$ such that if $(A_n)_{n\ge 1}$ is a sequence of bounded nonnull Borel subsets of $\mathbb{R}^d$ with $|\partial_r A_n|/|A_n| \to 0$ as $n \to \infty$ for all $r > 0$, then for all $p \in [1,\infty)$,
$$\frac{N(A_n; A_n)}{|A_n|} \xrightarrow{L^p} \lambda. \tag{7.1}$$
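A single-type CSA process of the kind covered by Theorem 7.1 can be simulated along the following lines. The sketch assumes $d = 1$, $r_1 = 1$, $r_0 = 0.5$, $D_0 = 1$, and a hypothetical acceptance probability $q^k$ outside the hard core, where $k$ is the number of accepted neighbours within range; none of these specific choices comes from the text. Each arrival draws a uniform random number and is accepted when that number falls below the acceptance probability.

```python
import random


def acceptance_probability(displacements, r0=0.5, q=0.5):
    """Hypothetical D(X): zero inside the hard core, otherwise q**k, where k
    is the number of accepted neighbours. It depends only on the set of
    displacements, so it is permutation-invariant, and D_0 = 1."""
    if any(abs(u) < r0 for u in displacements):
        return 0.0
    return q ** len(displacements)


def csa(length, num_arrivals, rng, r1=1.0):
    """Cooperative sequential adsorption on A = [0, length] in d = 1."""
    accepted = []
    for _ in range(num_arrivals):
        x = rng.uniform(0.0, length)
        mark = rng.random()  # uniform number deciding acceptance
        near = [xj - x for xj in accepted if abs(xj - x) <= r1]
        # Strict inequality so that probability 0.0 can never accept.
        if mark < acceptance_probability(near):
            accepted.append(x)
    return accepted


rng = random.Random(4)
length = 100.0
accepted = csa(length, 20000, rng)
print(len(accepted) / length)
```

The printed ratio is a crude finite-size stand-in for $N(A; A)/|A|$; it depends on the assumed $q$ and on the truncation after finitely many arrivals.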
It is also possible to obtain a point process convergence result along the lines of Theorem 2.2, but we do not go into details.

As for preceding results, it suffices to prove Theorem 7.1 in the case $r_1 = 1$, and from now on we assume this. Also as in previous cases, we wish to extend the CSA process to the whole of $\mathbb{R}^d$. This requires a modification of the previous graphical construction.

Assume that on a suitable probability space, we have a homogeneous Poisson process $\mathcal{P}$ of unit intensity on $\mathbb{R}^d \times [0,\infty)$, and each point $(X, T)$ of the Poisson process $\mathcal{P}$ carries a mark $M(X, T)$ which is uniformly distributed on $[0, 1]$, independently of the other arrivals. In other words, we are really taking a homogeneous Poisson process on $\mathbb{R}^d \times [0,\infty) \times [0,1]$, but we are viewing the third component $M$ of each Poisson point $(X, T, M)$ as a "mark" assigned to the point $(X, T)$. The first component $X$ represents location and the second component $T$ represents time of arrival, as usual.

Make the points of the Poisson process $\mathcal{P}$ on $\mathbb{R}^d \times [0,\infty)$ into the vertices of an oriented graph, denoted $G$, by putting in an oriented edge $(X, T) \to (X', T')$ whenever $\|X' - X\| \le 1$ and $T < T'$.

For $z \in \mathbb{Z}^d$, let the cubes $Q_z$ and $Q_z^+$ be as defined in Sect. 4. For $x, y \in \mathbb{Z}^d$, let us say that $y$ is affected by $x$ before time $t$ if there exists a (directed) path in $G$ that starts at some Poisson point $(X, T)$ and ends at some Poisson point $(Y, U)$ with $X \in Q_x$, $Y \in Q_y$, and $U \le t$. Let $E_t(x, y)$ denote the event that $y$ is affected by $x$ before time $t$.

Lemma 7.1. There exists a constant $\delta_3 \in (0, 1)$, such that for all $r > 0$ and all $z \in \mathbb{Z}^d$,
$$P\left[\cup_{y \in \mathbb{Z}^d \setminus B_r(z)} E_{\delta_3 r}(y, z)\right] \le 2(3^{-r}). \tag{7.2}$$
Hence with probability 1, $z$ is affected before time $t$ by only finitely many $y$.

Proof. The proof of Lemma 3.1 carries through to the present case, to give us (7.2). The proof of the last assertion is the same as that of Corollary 3.1.
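The "affected by" relation can be computed directly on a finite sample of the Poisson process. The sketch below assumes $d = 1$, takes $Q_z = [z - 1/2, z + 1/2)$ as the convention for the cubes of Sect. 4, restricts the process to a bounded window, and treats a single point as a path of length zero; these conventions and all names are assumptions made for illustration.

```python
import math
import random


def poisson_points(length, horizon, rng):
    """Unit-rate Poisson points on [0, length) x [0, horizon], built from
    independent exponential gaps in time and uniform locations."""
    points, t = [], 0.0
    while True:
        t += rng.expovariate(length)  # total arrival rate equals `length`
        if t > horizon:
            return points
        points.append((rng.uniform(0.0, length), t))


def cell(x):
    """Index z of the unit cell Q_z = [z - 1/2, z + 1/2) containing x."""
    return math.floor(x + 0.5)


def sources_affecting(points, z, t):
    """Set of lattice sites x such that z is affected by x before time t."""
    pts = sorted((p for p in points if p[1] <= t), key=lambda p: p[1])
    reaches = [False] * len(pts)
    for i in range(len(pts) - 1, -1, -1):  # latest arrivals first
        xi, ti = pts[i]
        # A point reaches Q_z if it lies in Q_z, or if it has an oriented
        # edge to a strictly later point that reaches Q_z.
        reaches[i] = cell(xi) == z or any(
            reaches[j] and pts[j][1] > ti and abs(pts[j][0] - xi) <= 1.0
            for j in range(i + 1, len(pts))
        )
    return {cell(x) for (x, _), r in zip(pts, reaches) if r}


rng = random.Random(3)
points = poisson_points(20.0, 2.0, rng)
early = sources_affecting(points, 10, 1.0)
late = sources_affecting(points, 10, 2.0)
print(sorted(early), sorted(late))
```

The set of affecting sites can only grow with $t$, and Lemma 7.1 says that on the infinite process it stays finite almost surely; the bounded window used here makes finiteness automatic, so the sketch illustrates the definition rather than the lemma.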
Suppose $(X, T)$ is a point of $\mathcal{P}$, representing an incoming item at location $X$ at time $T$ with mark $M(X, T)$. Let us put $I(X, T; A) = 1$ if the item is accepted and $I(X, T; A) = 0$ if not. Let the rule for acceptance be as follows: set $I(X, T; A) = 1$ if $X \in A$ and
$$M(X, T) \le D\left(\{(X' - X) : (X', T') \in \mathcal{P},\ \|X' - X\| \le 1,\ T' < T,\ I(X', T'; A) = 1\}\right). \tag{7.3}$$
Otherwise, set $I(X, T; A) = 0$. This clearly generates a realization of the CSA process with target set $A$ as described earlier.

We now give an equivalent generational construction of $I(X, T; A)$ which also applies when the target set is the whole of $\mathbb{R}^d$. Given $A \subseteq \mathbb{R}^d$ (possibly the set $\mathbb{R}^d$ itself), let $\mathcal{P}_A = \mathcal{P} \cap (A \times [0,\infty))$. Let $G_A$ be the restriction of $G$ to vertices in $\mathcal{P}_A$. Let $G_0(A)$ be the set of roots of $G_A$. After removing $G_0(A)$ from the vertex set of $G_A$, let $G_1(A)$ be the set of roots of the remaining oriented graph. After removing $G_1(A)$ also, let $G_2(A)$ be the set of roots of the remaining graph. Repeating this process gives us a sequence $G_0(A), G_1(A), G_2(A), \ldots$ of disjoint subsets ("generations") of the vertex set of $G_A$. By Lemma 7.1, they actually form a partition of $G_A$.

Let $(X, T)$ be a point of $\mathcal{P}_A$. Suppose first that $(X, T) \in G_0(A)$, i.e. $(X, T)$ is a root of $G_A$; in this case, let $I(X, T; A)$ be equal to 1 if $M(X, T) \le D_0$, and be zero otherwise. Now suppose, inductively, that $(X, T)$ is in generation $G_m(A)$ and $I(X', T'; A)$ has been determined for all $(X', T')$ in generations $G_0(A), G_1(A), \ldots, G_{m-1}(A)$. Then put $I(X, T; A) = 1$ if and only if
$$M(X, T) \le D\left(\{(X' - X) : (X', T') \in \mathcal{P}_A,\ X' \in B_1(X),\ T' < T,\ I(X', T'; A) = 1\}\right).$$
Since the sets $G_0(A), G_1(A), \ldots$ form a partition of $\mathcal{P}_A$, it can be seen that in the case where $A$ is bounded, this definition of $I(X, T; A)$ agrees with that given at (7.3) earlier on. In the case where $A = \mathbb{R}^d$, this definition of $I(X, T; A)$ remains valid, and we abbreviate $I(X, T; \mathbb{R}^d)$ to $I(X, T)$. Define
$$N_t(B) := \sum_{(X,T) \in \mathcal{P}} I(X, T)\, 1_{\{X \in B,\, T \le t\}}$$
and $N(B) = \lim_{t\to\infty} N_t(B)$. For each $z \in \mathbb{Z}^d$, as in Sect. 4 define the "jamming time"
$$J_z := \inf\{t \ge 0 : N_t(Q_z^+) = N(Q_z^+)\},$$
which is finite. At time $J_z$, with probability 1 the site $z$ is jammed, meaning in this case that for Lebesgue-almost all $x \in Q_z$,
$$D(\{X - x : (X, T) \in \mathcal{P},\ X \in B_1(x),\ T \le J_z,\ I(X, T) = 1\}) = 0.$$
Indeed, if this were not the case then there would exist $\varepsilon > 0$ and a set $B \subset Q_z$ of strictly positive Lebesgue measure, such that for $x \in B$,
$$D(\{X - x : (X, T) \in \mathcal{P},\ X \in B_1(x),\ T \le J_z,\ I(X, T) = 1\}) > \varepsilon.$$
But then with probability 1, there would arrive a Poisson point $(X', U)$ with $X' \in B$, $U > J_z$ and $M(X', U) < \varepsilon$, and the item this represented would be accepted unless there had been some added point in the cube $Q_z^+$ after time $J_z$, a contradiction.

With the above ingredients in place, the proof of Theorem 7.1 follows by very much the same argument as that of Theorem 2.1, and we do not give further details.

Acknowledgement. It is a pleasure to acknowledge helpful suggestions and comments by Joe Yukich and George Stell.
References

1. Bartelt, M.C., Privman, V.: Kinetics of irreversible monolayer and multilayer sequential adsorption. Internat. J. Mod. Phys. B 5, 2883–2907 (1991)
2. Caser, S., Hilhorst, H.J.: Random sequential adsorption of hard discs and squares: Exact bounds for the covering fraction. J. Phys. A 28, 3887–3900 (1995)
3. Coffman, E.G., Flatto, L., Jelenković, P., Poonen, B.: Packing random intervals on-line. Algorithmica 22, 448–476 (1998)
4. Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes. New York: Springer, 1988
5. Diggle, P.J.: Statistical Analysis of Spatial Point Patterns. London: Academic Press, 1983
6. Dunford, N., Schwartz, J.T.: Linear Operators, Part I. New York: Wiley-Interscience, 1958
7. Durrett, R.: Lecture Notes on Particle Systems and Percolation. Pacific Grove: Wadsworth & Brooks/Cole, 1988
8. Dvoretzky, A., Robbins, H.: On the "parking" problem. Publ. Math. Res. Inst. Hung. Acad. Sci. 9, 209–225 (1964)
9. Evans, J.W.: Random and cooperative sequential adsorption. Rev. Mod. Phys. 65, 1281–1329 (1993)
10. Itoh, Y., Shepp, L.: Parking cars with spin but no length. J. Statist. Phys. 97, 209–231 (1999)
11. Mannion, D.: Random packing of an interval II. Adv. Appl. Probab. 11, 591–602 (1979)
12. Matheron, G.: Random Sets and Integral Geometry. New York: Wiley, 1975
13. Ney, P.: A random interval filling problem. Ann. Math. Stat. 33, 702–718 (1962)
14. Palásti, I.: On some random space filling problems. Publ. Math. Res. Inst. Hung. Acad. Sci. 5, 353–360 (1960)
15. Petersen, K.: Ergodic Theory. Cambridge: Cambridge University Press, 1983
16. Pretzsch, H.: Zum Einfluss des Baumverteilungsmusters auf den Bestandszuwachs. Allg. Forst- und Jagdzeitung 166, 190–201 (1995)
17. Privman, V.: Dynamics of nonequilibrium deposition. Colloids and Surfaces A 165, 231–240 (2000)
18. Rényi, A.: On a one-dimensional random space-filling problem. Publ. Math. Res. Inst. Hung. Acad. Sci. 3, 109–127 (1958)
19. Resnick, S.I.: Extreme Values, Regular Variation, and Point Processes. New York: Springer, 1987
20. Senger, B., Voegel, J.-C., Schaaf, P.: Irreversible adsorption of colloidal particles on solid substrates. Colloids and Surfaces A 165, 255–285 (2000)
21. Solomon, H., Weiner, H.: A review of the packing problem. Commun. Statist. Theor. Meth. 15, 2571–2607 (1986)
22. Stoyan, D., Kendall, W.S., Mecke, J.: Stochastic Geometry and its Applications, 2nd edition. Chichester: Wiley, 1995
23. Stoyan, D., Stoyan, H.: Non-Homogenous Gibbs process models for forestry – A case study. Biometrical J. 40, 521–531 (1998)
24. Talbot, J., Tarjus, G., Van Tassel, P.R., Viot, P.: From car parking to protein adsorption: An overview of sequential adsorption processes. Colloids and Surfaces A 165, 287–324 (2000)
25. Wang, J.-S.: Series expansion and computer simulation studies of random sequential adsorption. Colloids and Surfaces A 165, 325–343 (2000)
26. Yukich, J.E.: Probability Theory of Classical Euclidean Optimization Problems. Berlin–Heidelberg: Springer, 1998

Communicated by J. L. Lebowitz
Commun. Math. Phys. 218, 177 – 216 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Discrete Riemann Surfaces and the Ising Model

Christian Mercat

Université Louis Pasteur, Strasbourg, France. E-mail: [email protected]

Received: 23 May 2000 / Accepted: 21 November 2000
Abstract: We define a new theory of discrete Riemann surfaces and present its basic results. The key idea is to consider not only a cellular decomposition of a surface, but the union with its dual. Discrete holomorphy is defined by a straightforward discretisation of the Cauchy–Riemann equation. Many classical results of Riemann theory have a discrete counterpart: the Hodge star, harmonicity, the Hodge theorem, Weyl's lemma, the Cauchy integral formula, and the existence of holomorphic forms with prescribed holonomies. Giving a geometrical meaning to the construction on a Riemann surface, we define a notion of criticality on which we prove a continuous limit theorem. We investigate its connection with criticality in the Ising model. We set up a Dirac equation on a discrete universal spin structure and we prove that the existence of a Dirac spinor is equivalent to criticality.

Contents
1. Introduction
2. Discrete Harmonic and Holomorphic Functions
3. Criticality
4. Dirac Equation
Current address: Department of Math. and Stats, The University of Melbourne, Parkville, Victoria 3052, Australia. E-mail: [email protected]

1. Introduction

We present here a new theory of discrete analytic functions, generalising to discrete Riemann surfaces the notion introduced by Lelong–Ferrand [LF]. Although the theory defined here may be applied wherever the usual Riemann Surfaces theory can, it was primarily designed with statistical mechanics, and particularly the Ising model, in mind [McCW, ID]. Most of the results can be understood without
any prior knowledge in statistical mechanics. The other obvious fields of application in two dimensions are electrical networks, elasticity theory, thermodynamics and hydrodynamics, all fields in which continuous Riemann surfaces theory gives wonderful results.

The relationship between the Ising model and holomorphy is almost as old as the theory itself. The key connection to the Dirac equation goes back to the work of Kaufman [K] and the results in this paper should come as no surprise for workers in statistical mechanics; they knew or suspected them for a long time, in one form or another. The aim of this paper is therefore, from the statistical mechanics point of view, to define a general theory as close to the continuous theory as possible, in which claims such as “the Ising model near criticality converges to a theory of Dirac spinors” are given a precise meaning and a proof, keeping in mind that such meanings and proofs already exist elsewhere in other forms. The main new result in this context is that there exists a discrete Dirac spinor near criticality in the finite size Ising model, before the thermodynamic limit is taken. Self-duality, which enabled the first evaluations of the critical temperature [KW, Ons, Wan50], is equivalent to criticality at finite size. It is given a meaning in terms of compatibility with holomorphy.

The first idea in order to discretise surfaces is to consider cellular decompositions. Equipping a cellular decomposition of a surface with a discrete metric, that is giving each edge a length, is sufficient if one only wants to do discrete harmonic analysis. However it is not enough if one wants to define discrete analytic geometry. The basic idea of this paper is to consider not just the cellular decomposition but rather what we call its double, i.e. the pair consisting of the cellular decomposition together with its Poincaré dual.
A discrete conformal structure is then a class of metrics on the double where we retain only the ratio of the lengths of dual edges.¹ In Ising model terms, a discrete conformal structure is nothing more than a set of interaction constants on each edge separating neighbouring spins in an Ising model of a given topology. A function of the vertices of the double is said to be discrete holomorphic if it satisfies the discrete Cauchy–Riemann equation, on two dual edges $(x, x')$ and $(y, y')$,

$$\frac{f(y') - f(y)}{\ell(y, y')} = i\,\frac{f(x') - f(x)}{\ell(x, x')}.$$

This definition gives rise to a theory which is analogous to the classical theory of Riemann surfaces. We define discrete differential forms on the double, a Hodge star operator, discrete holomorphic forms, and prove analogues of the Hodge decomposition and Weyl’s lemma. We extend to our situation the notion of pole of order one and we prove existence theorems for meromorphic differentials with prescribed poles and holonomies. Similarly, we define a Green potential and a Cauchy integral formula.

Up to this point, the theory is purely combinatorial. In order to relate the discrete and continuous theories on a Riemann surface, we need to impose an extra condition on the discrete conformal structure to give its parameters a geometrical meaning. We call this semi-criticality in Sect. 3. The main result here is that the limit of a pointwise convergent sequence of discrete holomorphic functions, on a refining sequence of semi-critical cellular decompositions of the same Riemann surface, is a genuine holomorphic function on the Riemann surface. If one imposes the stronger condition of criticality on the discrete conformal structure, one can define a wedge product between functions and 1-forms which is compatible with holomorphy.

¹ By definition, a discrete Riemann surface is a discrete surface equipped with a discrete conformal structure in this sense.
Fig. 1. The discrete Cauchy–Riemann equation. [Figure: a pair of dual edges $(x, x')$ and $(y, y')$ with lengths $\ell(x, x')$ and $\ell(y, y')$.]
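The discrete Cauchy–Riemann equation stated above can be tried out numerically. The following is a minimal sketch, assuming the Lelong–Ferrand setting on the square lattice $\mathbb{Z}^2$, where the two diagonals of each unit square are taken as a pair of dual edges and edge lengths are Euclidean distances. The function names `is_discrete_holomorphic` and `holomorphic_on_patch` are hypothetical and chosen only for this illustration.

```python
def is_discrete_holomorphic(f, x, x2, y, y2, tol=1e-9):
    """Check the discrete Cauchy-Riemann equation on one pair of dual edges.

    (x, x2) is an edge and (y, y2) its dual edge, all given as complex
    numbers; lengths are Euclidean distances (an assumption of this sketch).
    """
    lhs = (f(y2) - f(y)) / abs(y2 - y)
    rhs = 1j * (f(x2) - f(x)) / abs(x2 - x)
    return abs(lhs - rhs) <= tol


def holomorphic_on_patch(f, size=4):
    """Test every unit square of Z^2 with lower-left corner in [-size, size)^2.

    For the square with corner z, the diagonals (z, z+1+i) and (z+1, z+i)
    play the role of a pair of dual edges.
    """
    for m in range(-size, size):
        for n in range(-size, size):
            z = complex(m, n)
            if not is_discrete_holomorphic(f, z, z + 1 + 1j, z + 1, z + 1j):
                return False
    return True


square_ok = holomorphic_on_patch(lambda z: z * z)        # polynomial of degree two
cube_ok = holomorphic_on_patch(lambda z: z ** 3)         # polynomial of degree three
conj_ok = holomorphic_on_patch(lambda z: z.conjugate())  # anti-holomorphic function
```

In this sketch the restriction of $z^2$ satisfies the equation on every square, while $z^3$ and $\bar z$ do not, which illustrates that discrete holomorphy is a genuinely different constraint from restriction of a holomorphic function.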
Finally, for applications of this theory to statistical physics, one needs to define a discrete analogue of spinor fields on Riemann surfaces. In Sect. 4 we first define the notion of a discrete spin structure on a discrete surface. It sheds an interesting light onto the continuous notion, allowing us to redefine it in explicit geometrical terms. In the case of a discrete Riemann surface we then define a discrete Dirac equation, generalising an equation appearing in the Ising model, and show that criticality of the discrete conformal structure is equivalent to the existence of a local massless Dirac spinor field.

In Sect. 2, we present definitions and properties of the theory which are purely combinatorial. First, in the empty boundary case, we recall the definitions of dual cellular complexes, notions of deRham cohomology. We define the double $\Lambda$, we present the discrete Cauchy–Riemann equation, the discrete Hodge star on $\Lambda$, the Laplacian and the Hodge decomposition. In Subsect. 2.2, we prove Dirichlet and Neumann theorems, the basic tools of discrete harmonic analysis. In Subsect. 2.3 we prove existence theorems for 1-forms with prescribed poles and holonomies. In Subsect. 2.4, we deal with the basic difficulty of the theory: The Hodge star is defined on $\Lambda$ while the wedge product is on another complex, the diamond ♦, obtained from $\Gamma$ or $\Gamma^*$ by the procedure of tile centering [GS87]. We prove Weyl’s lemma, Green’s identity and Cauchy integral formulae.

In Sect. 3, we define semi-criticality and criticality and prove that it agrees with the usual notion for the Ising model on the square and triangular lattices. We present Voronoï and Delaunay semi-critical maps in order to give examples and we prove the continuous limit theorem. We prove that every Riemann surface admits a critical map and give examples. On a critical map, the product between functions and 1-forms is compatible with holomorphy and yields a polynomial ring, integration and derivation of functions. We give an example showing where the problems are.

In Sect. 4, we set up the Dirac equation on discrete spin structures. We motivate the discrete universal spin structure by first showing the same construction in the continuous case. We show discrete holomorphy property for Dirac spinors, we prove that criticality is equivalent to the existence of local Dirac spinors and present a continuous limit theorem for Dirac spinors.
2. Discrete Harmonic and Holomorphic Functions

In this section, we are interested in properties of combinatorial geometry. The constructions are considered up to homeomorphisms, that is to say on a combinatorial surface, as opposed to Sect. 3 where criticality implies that the discrete geometry is embedded in a genuine Riemann surface.

2.1. First definitions. Let $\Sigma$ be an oriented surface without boundary. A cellular decomposition $\Gamma$ of $\Sigma$ is a partition of $\Sigma$ into disjoint connected sets, called cells, of three types: a discrete set of points, the vertices $\Gamma_0$; a set of non intersecting paths between vertices, the edges $\Gamma_1$; and a set of topological discs bounded by a finite number of edges and vertices, the oriented faces $\Gamma_2$. A parametrisation of each cell is chosen, faces are mapped to standard polygons of the euclidean plane, and edges to the segment $(0, 1)$; we recall particularly that for each edge, one of its two possible orientations is chosen arbitrarily. We consider only locally finite decompositions, i.e. any compact set intersects a finite number of cells. In each dimension, we define the space of chains $C_\bullet(\Gamma)$ as the $\mathbb{Z}$-module generated by the cells. The boundary operator $\partial : C_k(\Gamma) \to C_{k-1}(\Gamma)$ partially encodes the incidence relations between cells. It fulfills the boundary condition $\partial\partial = 0$.

We now describe the dual cellular decomposition $\Gamma^*$ of a cellular decomposition $\Gamma$ of a surface without boundary. We refer to [Veb] for the general definition. Though we formally use the parametrisation of each cell for the definition of the dual, its combinatorics is intrinsically well defined. To each face $F \in \Gamma_2$ we define the vertex $F^* \in \Gamma_0^*$ inside the face $F$, the preimage of the origin of the euclidean plane by the parametrisation of the face. Each edge $e \in \Gamma_1$ separates two faces, say $F_1, F_2 \in \Gamma_2$ (which may coincide), hence is identified with a segment on the boundary of the standard polygon corresponding to $F_1$, respectively $F_2$.
We define the dual edge $e^* \in \Gamma_1^*$ as the preimage of the two segments in these polygons, joining the origin to the point of the boundary mapped to the middle of $e$. It is a simple path lying in the faces $F_1$ and $F_2$, drawn between the two vertices $F_1^*$ and $F_2^*$ (which may coincide), cutting no edge but $e$, once and transversely. As the surface is oriented, to the oriented edge $e$ we can associate the oriented dual edge $e^*$ such that $(e, e^*)$ is direct at their crossing point. To each vertex $v \in \Gamma_0$, with $v_1, \ldots, v_n \in \Gamma_0$ as neighbours, we define the face $v^* \in \Gamma_2^*$ by its boundary $\partial v^* = (v, v_1)^* + \ldots + (v, v_k)^* + \ldots + (v, v_n)^*$.

Remark 1. $\Gamma^*$ is a cellular decomposition of $\Sigma$ [Veb]. If we choose a parametrisation of the cells of $\Gamma^*$, we can consider its dual $\Gamma^{**}$; it is a cellular decomposition homeomorphic to $\Gamma$ but the orientation of the edges is reversed. The bidual of $e \in \Gamma_1$ is the reversed edge $e^{**} = -e$ (see Fig. 2).

Fig. 2. Duality. [Figure panels: 0. the vertex $F^*$ dual to a face; 1. dual edges $e$ and $e^*$; 2. the face dual to a vertex $v$ with neighbours $v_1, v_2, \ldots, v_n$.]

The double $\Lambda$ of a cellular decomposition is the union of these two dual cellular decompositions. We will speak of a $k$-cell of $\Lambda$ as a $k$-cell of either $\Gamma$ or $\Gamma^*$. A discrete metric $\ell$ is an assignment of a positive number $\ell(e)$ to each edge $e \in \Lambda_1$, its length. For convenience the edge with reversed orientation, $-e$, will be assigned the same length: $\ell(-e) := \ell(e)$. Two metrics $\ell, \ell' : \Lambda_1 \to (0, +\infty)$ belong to the same discrete conformal structure if the ratios of the lengths

$$\rho(e) := \frac{\ell(e^*)}{\ell(e)} = \frac{\ell'(e^*)}{\ell'(e)}$$

on each pair of dual edges $e \in \Gamma_1$, $e^* \in \Gamma_1^*$ are equal. A function $f$ on $\Lambda$ is a function defined on the vertices of $\Gamma$ and of $\Gamma^*$. Such a function is said to be holomorphic if, for every pair of dual edges $(x, x') \in \Gamma_1$ and $(y, y') = (x, x')^* \in \Gamma_1^*$, it fulfills

$$\frac{f(y') - f(y)}{\ell(y, y')} = i\,\frac{f(x') - f(x)}{\ell(x, x')}.$$

It is the naive discretisation of the Cauchy–Riemann equation for a function $f$, which is, in local orthonormal coordinates $(x, y)$:

$$\frac{\partial f}{\partial y} = i\,\frac{\partial f}{\partial x}.$$

Here, we understand two dual edges as being orthogonal. This equation, though simple, was never considered in such a generality. It was introduced by Lelong–Ferrand [LF] for the decomposition of the plane by the standard square lattice $\mathbb{Z}^2$. Functions satisfying it are also called monodiffric functions; for background on this topic, see [Duf]. Polynomials of degree two, restricted to the square lattice, give examples of monodiffric functions. See also the works of Kenyon [Ken] and Schramm and Benjamini [BS96] who considered more than lattices.

The usual notions of deRham cohomology are useful in this setup. We said that $k$-chains are elements of the $\mathbb{Z}$-module $C_k(\Lambda)$, generated by the $k$-cells; its dual space $C^k(\Lambda) := \mathrm{Hom}(C_k(\Lambda), \mathbb{C})$ is the space of $k$-cochains. We will denote the coupling by the usual integral and functional notations: $f(x)$ for a function $f \in C^0(\Lambda)$ on a vertex $x \in \Lambda_0$; $\int_e \alpha$ for a 1-form $\alpha \in C^1(\Lambda)$ on an edge $e \in \Lambda_1$; and $\iint_F \omega$ for a 2-form $\omega \in C^2(\Lambda)$ on a face $F \in \Lambda_2$. The coboundary $d : C^k(\Lambda) \to C^{k+1}(\Lambda)$ is defined by the Stokes formula (with the same notations as before):

$$\int_{(x, x')} df := f\big(\partial(x, x')\big) = f(x') - f(x), \qquad \iint_F d\alpha := \oint_{\partial F} \alpha.$$

As the boundary operator splits onto the two dual complexes $\Gamma$ and $\Gamma^*$, the coboundary $d$ also respects the direct sum $C^k(\Lambda) = C^k(\Gamma) \oplus C^k(\Gamma^*)$. The Cauchy–Riemann equation can be written in the usual form $*df = -i\,df$ for the following Hodge star $* : C^k(\Lambda) \to C^{2-k}(\Lambda)$ defined by:

$$\int_e *\alpha := -\rho(e^*) \int_{e^*} \alpha.$$
We extend it to functions and 2-forms by:

$$\iint_F *f := f(F^*), \qquad *\omega(x) := \iint_{x^*} \omega.$$

As, by definition, for each edge $e \in \Lambda_1$, $\rho(e)\rho(e^*) = \frac{\ell(e^*)}{\ell(e)}\,\frac{\ell(e)}{\ell(e^*)} = 1$, the Hodge star fulfills on $k$-forms $*^2 = (-1)^{k(2-k)}\,\mathrm{Id}_{C^k(\Lambda)}$. It decomposes 1-forms into $-i$, respectively $+i$, eigenspaces, called type $(1, 0)$, resp. type $(0, 1)$ forms:

$$C^1(\Lambda) = C^{(1,0)}(\Lambda) \oplus C^{(0,1)}(\Lambda).$$

The associated projections are denoted:

$$\pi_{(1,0)} = \tfrac{1}{2}(\mathrm{Id} + i*) : C^1(\Lambda) \to C^{(1,0)}(\Lambda), \qquad \pi_{(0,1)} = \tfrac{1}{2}(\mathrm{Id} - i*) : C^1(\Lambda) \to C^{(0,1)}(\Lambda).$$
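The action of the Hodge star on 1-forms and the projection onto type $(1, 0)$ forms can be checked on a single pair of dual edges. The sketch below is an illustration only: it stores a 1-form as its two values on $e$ and $e^*$, uses $\rho(e^*) = 1/\rho(e)$ and $e^{**} = -e$ from the text, and the names `hodge_star` and `project_10` are hypothetical.

```python
def hodge_star(alpha_e, alpha_dual, rho_e):
    """Hodge star of a 1-form restricted to one pair of dual edges (e, e*).

    alpha_e, alpha_dual: values of the form on e and on e*.
    rho_e: the ratio rho(e) = l(e*) / l(e), so that rho(e*) = 1 / rho_e.
    Returns the values of *alpha on e and on e*.
    """
    rho_dual = 1 / rho_e
    star_on_e = -rho_dual * alpha_dual  # int_e *alpha = -rho(e*) int_{e*} alpha
    star_on_dual = rho_e * alpha_e      # e** = -e turns the minus sign into a plus
    return star_on_e, star_on_dual


def project_10(alpha_e, alpha_dual, rho_e):
    """Projection (Id + i*) / 2 onto type (1, 0) forms."""
    s_e, s_dual = hodge_star(alpha_e, alpha_dual, rho_e)
    return 0.5 * (alpha_e + 1j * s_e), 0.5 * (alpha_dual + 1j * s_dual)


rho = 0.5                 # arbitrary positive ratio for the example
alpha = (2 + 1j, -3j)     # arbitrary 1-form values on (e, e*)
star_star_alpha = hodge_star(*hodge_star(*alpha, rho), rho)
beta = project_10(*alpha, rho)
star_beta = hodge_star(*beta, rho)
```

Here `star_star_alpha` equals `-alpha`, matching $*^2 = -\mathrm{Id}$ on 1-forms, and `star_beta` equals `-1j * beta`, so the projected form lies in the $-i$ eigenspace.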
A 1-form is holomorphic if it is closed and of type $(1, 0)$: $\alpha \in \Omega^1(\Lambda) \iff d\alpha = 0$ and $*\alpha = -i\alpha$. It is meromorphic with a pole at a vertex $x \in \Lambda_0$ if it is of type $(1, 0)$ and not closed on the face $x^*$. Its residue at $x$ is defined by

$$\mathrm{Res}_x(\alpha) := \frac{1}{2i\pi} \oint_{\partial x^*} \alpha.$$

The residue theorem is merely a tautology in this context. We define $d'$, $d''$, the composition of the coboundary with the projections on eigenspaces of $*$ as its holomorphic and anti-holomorphic parts:

$$d' := \pi_{(1,0)} \circ d, \qquad d'' := \pi_{(0,1)} \circ d$$

from functions to 1-forms,

$$d' := d \circ \pi_{(0,1)}, \qquad d'' := d \circ \pi_{(1,0)}$$

from 1-forms to 2-forms and $d' = d'' = 0$ on 2-forms. They verify $d'^2 = 0$ and $d''^2 = 0$. The usual discrete Laplacian, which splits onto $\Gamma$ and $\Gamma^*$ independently, reads $\Delta := -d * d * - * d * d$ as expected. Its formula for a function $f \in C^0(\Lambda)$ on a vertex $x \in \Lambda_0$, with $x_1, \ldots, x_n$ as neighbours is

$$(\Delta f)(x) = \sum_{k=1}^{n} \rho(x, x_k)\,\big(f(x) - f(x_k)\big). \qquad (2.1)$$

As in the continuous case, it can be written in terms of $d'$ and $d''$ operators: For functions, $\Delta = i * (d'd'' - d''d')$, in particular holomorphic and anti-holomorphic functions are harmonic. The same result holds for 1-forms.
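Formula (2.1) is easy to evaluate directly. The following sketch assumes, for illustration only, that $\Gamma$ is the square lattice $\mathbb{Z}^2$ with $\rho = 1$ on every edge; the helper names (`laplacian`, `square_neighbours`, `unit_rho`) are hypothetical.

```python
def laplacian(f, x, neighbours, rho):
    """Formula (2.1): sum over the neighbours x_k of rho(x, x_k) * (f(x) - f(x_k))."""
    return sum(rho(x, xk) * (f(x) - f(xk)) for xk in neighbours(x))


def square_neighbours(v):
    """The four nearest neighbours of v = (m, n) in Z^2."""
    m, n = v
    return [(m + 1, n), (m - 1, n), (m, n + 1), (m, n - 1)]


def unit_rho(x, xk):
    """Constant weight rho = 1 (self-dual square lattice, assumed here)."""
    return 1


def re_z_squared(v):
    return v[0] ** 2 - v[1] ** 2   # real part of z^2


def abs_z_squared(v):
    return v[0] ** 2 + v[1] ** 2   # |z|^2, not harmonic


patch = [(m, n) for m in range(-3, 4) for n in range(-3, 4)]
harmonic_values = [laplacian(re_z_squared, v, square_neighbours, unit_rho) for v in patch]
other_values = [laplacian(abs_z_squared, v, square_neighbours, unit_rho) for v in patch]
```

With the sign convention of (2.1), `harmonic_values` vanishes on the whole patch, whereas `other_values` is constantly $-4$.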
In the compact case, the operator $d^* = -*d*$ is the adjoint of the coboundary with respect to the usual scalar product, $(f, g) := \sum_{x \in \Lambda_0} f(x)\,\bar g(x)$ on functions, similarly on 2-forms and

$$(\alpha, \beta) := \sum_{e \in \Lambda_1} \rho(e) \int_e \alpha \int_e \bar\beta$$

on 1-forms. It gives rise to the Hodge decomposition,

Proposition 1 (Hodge theorem). In the compact case, the $k$-forms are decomposed into orthogonal direct sums of exact, coexact and harmonic forms:

$$C^k(\Lambda) = \mathrm{Im}\, d \oplus^\perp \mathrm{Im}\, d^* \oplus^\perp \mathrm{Ker}\, \Delta,$$

harmonic forms are the closed and coclosed ones: $\mathrm{Ker}\, \Delta = \mathrm{Ker}\, d \cap \mathrm{Ker}\, d^*$. In particular the only harmonic functions are locally constant. Harmonic 1-forms are also the sum of holomorphic and anti-holomorphic ones:

$$\mathrm{Ker}\, \Delta = \mathrm{Ker}\, d' \oplus^\perp \mathrm{Ker}\, d''.$$

Beware that $\Lambda$ being disconnected, the space of locally constant functions is 2-dimensional. The function $\varepsilon$ which is $+1$ on $\Gamma$ and $-1$ on $\Gamma^*$ is chosen as the second basis vector.

The proof is algebraic and the same as in the continuous case. As the Laplacian decomposes onto the two dual graphs, this result tells also that for any harmonic 1-form $\alpha_\Gamma$ on $\Gamma$, there exists a unique harmonic 1-form $\alpha_{\Gamma^*}$ on the dual graph $\Gamma^*$ such that the couple is a holomorphic 1-form on $\Lambda$; it is simply $\alpha_{\Gamma^*} := i * \alpha_\Gamma$. These decompositions don’t hold in the non-compact case; there exist non-closed and/or non-co-closed, harmonic 1-forms.

2.2. Dirichlet and Neumann problems.

Proposition 2 (Dirichlet problem). Consider a finite connected graph $\Gamma$, equipped with a function $\rho$ on the edges, and a certain non-empty set of points $D$ marked as its boundary. For any boundary function $f_\partial : (\partial\Gamma)_0 \to \mathbb{C}$, there exists a unique function $f$, harmonic on $\Gamma_0 \setminus D$ such that $f|_{\partial\Gamma} = f_\partial$.

We refer to the usual laplacian defined by Eq. (2.1). If $f_\partial = 0$, the solution is the null function. Otherwise, it is the minimum of the strictly convex, positive functional $f \mapsto (df, df)$, proper on the non-empty affine subspace of functions which agree with $f_\partial$ on the boundary.

Definition 1. Given a cellular decomposition $\Gamma$ of a compact surface with boundary $\Sigma$, define the double $\Sigma^2 := \Sigma \cup \bar\Sigma$, union with the opposite oriented surface, along their boundary. The double $\Gamma^2$ is a cellular decomposition of the compact surface $\Sigma^2$. Consider its dual $\Gamma^{2*}$ and define $\Gamma^* := \Sigma \cap \Gamma^{2*}$. We don’t take into account the faces of $\Gamma^{2*}$ which are not completely inside $\Sigma$ but we do consider the half-edges dual to boundary edges of $\Gamma$ as genuine edges noted $(\partial\Gamma^*)_1$ and define $(\partial\Gamma^*)_0 := \Gamma^{2*}_1 \cap \partial\Sigma$ as the set of their boundary vertices. A function $\rho$ on the edges of $\Gamma$ yields an extension to $\Gamma_1^*$ by defining $\rho(e^*) := \frac{1}{\rho(e)}$.
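The Dirichlet problem of Proposition 2 amounts to a finite linear system built from Eq. (2.1). The sketch below is one possible way to solve it, not the method of the paper: it assumes a connected graph, real rational data, and uses exact Gauss–Jordan elimination. The function name `solve_dirichlet` and the example graph are hypothetical.

```python
from fractions import Fraction


def solve_dirichlet(vertices, rho, boundary_values):
    """Return f, harmonic at every non-boundary vertex, equal to the data on D.

    vertices: list of vertex labels of a connected graph.
    rho: dict mapping frozenset({u, v}) to the positive weight of edge (u, v).
    boundary_values: dict vertex -> prescribed value; its keys form the set D.
    """
    interior = [v for v in vertices if v not in boundary_values]
    index = {v: k for k, v in enumerate(interior)}
    n = len(interior)
    # Linear system: sum_k rho(x, x_k) (f(x) - f(x_k)) = 0 for interior x.
    A = [[Fraction(0)] * n for _ in range(n)]
    b = [Fraction(0)] * n
    for edge, weight in rho.items():
        u, v = tuple(edge)
        w = Fraction(weight)
        for p, q in ((u, v), (v, u)):
            if p in index:
                A[index[p]][index[p]] += w
                if q in index:
                    A[index[p]][index[q]] -= w
                else:
                    b[index[p]] += w * Fraction(boundary_values[q])
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    f = dict(boundary_values)
    for v in interior:
        f[v] = b[index[v]] / A[index[v]][index[v]]
    return f


# Example: a path 0 - 1 - 2 - 3 with weights 1, 2, 1 and boundary D = {0, 3}.
path_rho = {frozenset({0, 1}): 1, frozenset({1, 2}): 2, frozenset({2, 3}): 1}
f = solve_dirichlet([0, 1, 2, 3], path_rho, {0: 0, 3: 1})
```

For this path the solution takes the values $2/5$ and $3/5$ at the two interior vertices, as one also finds by treating the weights as conductances in series.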
Remark 2. $\Gamma^*$ is not a cellular decomposition of the surface; the half-edges dual to boundary edges do not bound any face of $\Gamma^*$.

Proposition 3 (Neumann problem). Consider a cellular decomposition $\Gamma$ of a disk, equipped with a function $\rho$ on its edges. Choose a boundary vertex $y_0 \in (\partial\Gamma^*)_0$, a value $f_0 \in \mathbb{C}$, and on the set of boundary edges $e \in (\partial\Gamma^*)_1$, not incident to $y_0$, a 1-form $\alpha$. Then there exists a unique function $f$, harmonic on $\Gamma^* \setminus (\partial\Gamma^*)_0$ such that $f(y_0) = f_0$ and $\int_e df = \int_e \alpha$ for all $e \in (\partial\Gamma^*)_1$ not incident to $y_0$.

It is a dual problem. Let $e_0^* \in (\partial\Gamma^*)_1$ be the edge incident to $y_0$ and $e_0 \in (\partial\Gamma)_1$ its dual. Consider, on the set of boundary edges $e \in (\partial\Gamma)_1$ different from $e_0$, the 1-form defined by $i * \alpha$. Integrating it along the boundary, we get a function $f_\partial$ on $(\partial\Gamma)_0$, well defined up to an additive constant. The Dirichlet theorem gives us a function $f$ harmonic on $\Gamma_0 \setminus (\partial\Gamma)_0$ corresponding to $f_\partial$. Integrating the closed 1-form $i * df$ on $\Gamma^*$ yields the desired harmonic function $f$.

Remark 3. The number of boundary points in $\Gamma$ is the same as in $\Gamma^*$, and as every harmonic function on $\Gamma$, when the surface is a disk, defines a harmonic function on $\Gamma^*$ such that their couple is holomorphic, unique up to an additive constant, the space of holomorphic functions, resp. 1-forms, on the double decomposition with boundary is $|(\partial\Lambda)_0|/2 + 1$, resp. $|(\partial\Lambda)_0|/2 - 1$ dimensional. The theorem is true for more general surfaces than a disk but the proof is different, see the author’s PhD thesis [M]. There are 2 versions of these theorems too.

2.3. Existence theorems. We have very similar existence theorems to the ones in the continuous case. We begin with the main difference:

Proposition 4. The space of discrete holomorphic 1-forms on a compact surface without boundary is of dimension twice the genus.

The Hodge theorem implies an isomorphism between the space of harmonic forms and the cohomology group of $\Lambda$. It is the direct sum of the cohomology groups of $\Gamma$ and of $\Gamma^*$ and each is isomorphic to the cohomology group of the surface which is $2g$ dimensional on a genus $g$ surface. It splits in two isomorphic parts under the type $(1, 0)$ and type $(0, 1)$ sum. As any holomorphic form is harmonic, the dimension of the space of holomorphic 1-forms is $2g$.

We can give explicit bases of this vector space as in the continuous case [Sie]. To construct them, we begin with meromorphic forms:

Proposition 5. Let $\Sigma$ be a compact surface with boundary. For each vertex $x \in \Lambda_0 \setminus \partial♦$, and a simple path $\lambda$ on $\Lambda$ going from $x$ to the boundary there exists a pair of meromorphic 1-forms $\alpha_x$, $\beta_x$ with a single pole at $x$, with residue $+1$ and which have pure imaginary, respectively real holonomies, along loops which don’t have any edge dual to an edge of $\lambda$.

Proposition 6. Let $\Sigma$ be a compact surface. For each pair of vertices $x, x' \in \Lambda_0$ with a simple path $\lambda$ on $\Lambda$ from $x$ to $x'$, there exists a unique pair of meromorphic 1-forms $\alpha_{x,x'}$, $\beta_{x,x'}$ with only poles at $x$ and $x'$, with residue $+1$ and $-1$ respectively, and which have pure imaginary, respectively real holonomies, along loops which don’t have any edge dual to an edge of $\lambda$.
In both cases, the forms are $(\mathrm{Id} + i*)df$ with $f$ a solution of a Dirichlet problem at $x$ (and $x'$) for $\alpha$ and of a Neumann problem on the surface split open along the path $\lambda$ for $\beta$. The uniqueness in the second proposition is given by the difference: the poles cancel out and yield a holomorphic 1-form with pure imaginary, resp. real holonomies, so its real part, resp. imaginary part, can be integrated into a harmonic, hence constant function. So this part is in fact null. Being a holomorphic 1-form, the other part is null too. We refer to the author’s PhD thesis [M] for details.

As in the continuous case, it allows us to construct holomorphic forms with (no poles and) prescribed holonomies:

Corollary 1. Let $A, B \in Z_1(\Lambda)$ be two non-intersecting simple loops such that there exists exactly one edge of $A$ dual to an edge of $B$ (dual loops). There exists a unique holomorphic 1-form $\Phi_{AB}$ such that $\mathrm{Re}\big(\oint_B \Phi_{AB}\big) = 1$ and $\oint_\gamma \Phi_{AB} \in i\mathbb{R}$ for every loop $\gamma$ which doesn’t have any edge dual to an edge of $A$.

We decompose $A$ in two paths $\lambda_x^y$ and $\lambda_y^x$. It gives us two 1-forms $\beta_{x,y}$ and $\beta_{y,x}$, then

$$\Phi_{AB} := \frac{1}{2i\pi}\,(\beta_{x,y} + \beta_{y,x}) \qquad (2.2)$$

fulfills the conditions.

2.4. The diamond ♦ and its wedge product. Following [Whit], we define a wedge product, on another complex, the diamond ♦, constructed out of the double $\Lambda$: Each pair of dual edges, say $(x, x') \in \Gamma_1$ and $(y, y') = (x, x')^* \in \Gamma_1^*$, defines (up to homeomorphisms) a four-sided polygon $(x, y, x', y')$ and all these constitute the faces of a cellular complex called ♦ (see Fig. 3).

Fig. 3. The diamond ♦. [Figure: a four-sided face with vertices $x$, $y$, $x'$, $y'$.]
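The construction of ♦ from pairs of dual edges described above is purely combinatorial and can be sketched in a few lines. The vertex labels and helper names (`diamond_faces`, `diamond_edges`) below are hypothetical; the example uses two edges of the square lattice, with vertices of $\Gamma$ labelled `("v", m, n)` and vertices of $\Gamma^*$ labelled `("f", m, n)` for the face whose lower-left corner is $(m, n)$.

```python
def diamond_faces(dual_pairs):
    """One quadrilateral (x, y, x', y') per pair of dual edges ((x, x'), (y, y'))."""
    return [(x, y, x2, y2) for (x, x2), (y, y2) in dual_pairs]


def diamond_edges(faces):
    """Unoriented edges of the diamond complex, collected from its faces."""
    edges = set()
    for quad in faces:
        for k in range(4):
            edges.add(frozenset((quad[k], quad[(k + 1) % 4])))
    return edges


# A horizontal edge (0,0) -> (1,0): its dual goes from the face below to the face above.
# A vertical edge (1,0) -> (1,1): its dual goes from the face on the right to the face on the left.
dual_pairs = [
    ((("v", 0, 0), ("v", 1, 0)), (("f", 0, -1), ("f", 0, 0))),
    ((("v", 1, 0), ("v", 1, 1)), (("f", 1, 0), ("f", 0, 0))),
]
faces = diamond_faces(dual_pairs)
edges = diamond_edges(faces)
# Every edge of the diamond joins a vertex of Gamma to a vertex of Gamma*.
bipartite = all({label[0] for label in edge} == {"v", "f"} for edge in edges)
```

The two quadrilaterals share the edge joining `("v", 1, 0)` and `("f", 0, 0)`, so the example has seven distinct edges, each joining a vertex of $\Gamma$ to a vertex of $\Gamma^*$.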
On the other hand, from any cellular decomposition ♦ of a surface by four-sided polygons one can reconstruct the double $\Lambda$. A difference is that $\Lambda$ may not be disconnected in two dual pieces $\Gamma$ and $\Gamma^*$; it is so if each loop in ♦ is of even length; we will restrict ourselves to this simpler case. This is not very restrictive because from a connected double, refining each quadrilateral in four smaller quadrilaterals, one gets a double disconnected in two dual pieces.

Definition 2. A discrete surface with boundary is defined by a quadrilateral cellular decomposition ♦ of an oriented surface with boundary such that its double complex $\Lambda$ is disconnected in two dual parts.

This definition is a generalisation of the more natural previous Definition 1. It allows us to consider any subset of faces of ♦ as a domain yielding a discrete surface with boundary. While any edge of $\Lambda$ has a dual edge, a vertex of $\Lambda$ has a dual face if and only if it is an inner vertex. Punctured surfaces can be understood in these terms too: An inner vertex $v \in \Lambda_0$ is a puncture if it is declared as being on the boundary and its dual face $v^*$ removed from $\Lambda_2$.

We construct a discrete wedge product, but while the Hodge star lives on the double $\Lambda$, the wedge product is defined on the diamond ♦: $\wedge : C^k(♦) \times C^l(♦) \to C^{k+l}(♦)$. It is defined by the following formulae, for $f, g \in C^0(♦)$, $\alpha, \beta \in C^1(♦)$ and $\omega \in C^2(♦)$:

$$(f \cdot g)(x) := f(x) \cdot g(x) \quad \text{for } x \in ♦_0,$$

$$\int_{(x,y)} f \cdot \alpha := \frac{f(x) + f(y)}{2} \int_{(x,y)} \alpha \quad \text{for } (x, y) \in ♦_1,$$

$$\iint_{(x_1,x_2,x_3,x_4)} \alpha \wedge \beta := \frac{1}{4} \sum_{k=1}^{4} \left( \int_{(x_{k-1},x_k)} \alpha \int_{(x_k,x_{k+1})} \beta \;-\; \int_{(x_{k+1},x_k)} \alpha \int_{(x_k,x_{k-1})} \beta \right),$$

$$\iint_{(x_1,x_2,x_3,x_4)} f \cdot \omega := \frac{f(x_1) + f(x_2) + f(x_3) + f(x_4)}{4} \iint_{(x_1,x_2,x_3,x_4)} \omega$$

for $(x_1, x_2, x_3, x_4) \in ♦_2$.

Lemma 1. $d_♦$ is a derivation with respect to this wedge product.

To take advantage of this property, one has to relate forms on ♦ and forms on $\Lambda$ where the Hodge star is defined. We construct an averaging map $A$ from $C^\bullet(♦)$ to $C^\bullet(\Lambda)$. The map is the identity for functions and defined by the following formulae for 1 and 2-forms:

$$\int_{(x,x')} A(\alpha_♦) := \frac{1}{2} \left( \int_{(x,y)} + \int_{(y,x')} + \int_{(x,y')} + \int_{(y',x')} \right) \alpha_♦, \qquad (2.3)$$

$$\iint_{x^*} A(\omega_♦) := \frac{1}{2} \sum_{k=1}^{d} \iint_{(x_k,\,y_k,\,x,\,y_{k-1})} \omega_♦, \qquad (2.4)$$

where notations are made clear in Fig. 4. With this definition, $d_\Lambda A = A\,d_♦$, but the map $A$ is neither injective nor always surjective, so we can neither define a Hodge star on ♦ nor a wedge product on $\Lambda$. An element of the kernel of $A$ is given for example by $d_♦\varepsilon$, where $\varepsilon$ is $+1$ on $\Gamma$ and $-1$ on $\Gamma^*$.

Fig. 4. Notations for (2.3) and (2.4). [Figure: left, an edge $(x, x')$ with its dual edge $(y, y')$; right, a vertex $x$ with neighbours $x_1, x_2, \ldots, x_d$ and dual vertices $y_1, y_2, \ldots, y_d$.]
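On the double itself, we have pointwise multiplication between functions, functions and 2-forms, and we construct a heterogeneous wedge product for 1-forms: with $\alpha, \beta \in C^1(\Lambda)$, define $\alpha \wedge \beta \in C^2(♦)$ by

$$\iint_{(x,y,x',y')} \alpha \wedge \beta := \int_{(x,x')} \alpha \int_{(y,y')} \beta \;+\; \int_{(y,y')} \alpha \int_{(x',x)} \beta.$$

It verifies $A(\alpha_♦) \wedge A(\beta_♦) = \alpha_♦ \wedge \beta_♦$, the first wedge product being between 1-forms on $\Lambda$ and the second between forms on ♦. Of course, we also have for integrable 2-forms:

$$\iint_{♦_2} \omega_♦ = \iint_{\Gamma_2} A(\omega_♦) = \iint_{\Gamma_2^*} A(\omega_♦) = \frac{1}{2} \iint_{\Lambda_2} A(\omega_♦).$$

And for a function $f$,

$$\iint_{♦_2} f \cdot \omega_♦ = \frac{1}{2} \iint_{\Lambda_2} A(f \cdot \omega_♦) = \frac{1}{2} \iint_{\Lambda_2} f \cdot A(\omega_♦)$$

whenever $f \cdot \omega_♦$ is integrable. Explicit calculation shows that for a function $f \in C^0(\Lambda)$, denoting by $\chi_x$ the characteristic function of a vertex $x \in \Lambda_0$, $(\Delta f)(x) = -\iint_{\Lambda_2} f \cdot *\Delta\chi_x$. So by linearity one gets Weyl’s lemma: a function $f$ is harmonic iff for any compactly supported function $g \in C^0(\Lambda)$,

$$\iint_{\Lambda_2} f \cdot *\Delta g = 0.$$

One checks also that the usual scalar product on compactly supported forms on $\Lambda$ reads as expected:

$$(\alpha, \beta) := \sum_{e \in \Lambda_1} \rho(e) \int_e \alpha \int_e \bar\beta = \iint_{♦_2} \alpha \wedge *\bar\beta.$$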
In some cases, for example, the decomposition of the plane by lattices, the averaging map $A$ is surjective. We define the inverse map $B : C^1(\Lambda) \to C^1(♦)/\mathrm{Ker}\, A$ and $\Delta_♦ := d_♦ B * d$ and we then have

Proposition 7 (Green’s identity). For two functions $f, g$ on a compact domain $D \subset ♦_2$,

$$\iint_D (f \cdot \Delta_♦ g - g \cdot \Delta_♦ f) \;-\; \oint_{\partial D} (f \cdot B{*}dg - g \cdot B{*}df) = 0.$$

This means that for any representatives of the classes in $C^1(♦)/\mathrm{Ker}\, A$ the equality holds, but each integral separately is not well defined on the classes.
2.5. Cauchy integral formula.

Proposition 8. Let $\Lambda$ be a double map and $D$ a compact region of $♦_2$ homeomorphic to a disc. Consider an interior edge $(x, y) \in D$; there exists a meromorphic 1-form $\nu_{x,y} \in C^1(D \setminus (x, y))$ such that the holonomy $\oint_\gamma \nu_{x,y}$ along a cycle $\gamma$ in $D$ only depends on its homology class in $D \setminus (x, y)$, and $\oint_{\partial D} \nu_{x,y} = 2i\pi$.

Consider the meromorphic 1-form $\mu_{x,y} = \alpha_x + \alpha_y \in C^1(\Lambda \cap D)$ defined by existence Theorem 5 on $D$. It is uniquely defined up to a global holomorphic form on $D$. Its only poles are $x$ and $y$ of residue $+1$ so it verifies a similar holonomy property, but on $\Lambda \cap D \setminus (x, y)$. We define a 1-form $\nu_{x,y}$ on $♦ \cap D \setminus R$, such that $\mu_{x,y} = A\nu_{x,y}$ in the following way: Let $\int_{(x,a)} \nu_{x,y} := \lambda$, a fixed value, and for an edge $(x', y') \in D_1$, with $x' \in \Gamma_0$, $y' \in \Gamma_0^*$, given two paths in $D$, $\lambda_x^{x'} \in C_1(\Gamma)$ and $\lambda_y^{y'} \in C_1(\Gamma^*)$ respectively from $x$ to $x'$ and from $y$ to $y'$,

$$\int_{(x',y')} \nu_{x,y} := \int_{\lambda_x^{x'}} \mu_{x,y} + \int_{(x,a)} \nu_{x,y} + \int_{\lambda_y^{y'}} \mu_{x,y} - \oint_{[\gamma]} \mu_{x,y},$$

where $[\gamma]$ is the homology class of $\lambda_x^{x'} + (x, y) + \lambda_y^{y'} + (y', x')$ on the punctured domain. $\nu_{x,y}$ is the discrete analogue of $\frac{dz}{z - z_0}$ with $z_0 = (x, y)$. It is closed on every face of $D \setminus R$. By definition, the average of $\nu_{x,y}$ on the double map $\Lambda$ is the meromorphic form $A\nu_{x,y} = \mu_{x,y}$. It allows us to state

Proposition 9 (Cauchy integral formula). Let $D$ be a compact connected subset of $♦_2$ and $(x, y) \in D_1$ two interior neighbours of $D$ with a non-empty boundary. For each function $f \in C^0(\Lambda)$,

$$\oint_{\partial D} f \cdot \nu_{x,y} = \iint_D d''f \wedge \mu_{x,y} + 2i\pi\,\frac{f(x) + f(y)}{2}.$$

The proof is straightforward: The edge $(x, y)$ bounds two faces in $D$; let $R = (abcd)$ be the rectangle made of these faces (see Fig. 5). On $D \setminus R$,

$$d_♦(f \cdot \nu_{x,y}) = d_♦ f \wedge \nu_{x,y} + f \cdot d_♦\nu_{x,y}.$$

Fig. 5. The rectangle $R$ in a domain $D$ defined by an edge $(x, y) \in ♦_1$. [Figure: the rectangle $R = (abcd)$ around the edge $(x, y)$ inside $D$.]

The $(1, 0)$ part of $df$ disappears in the wedge product against the holomorphic form $\mu_{x,y}$, so we can substitute $d_♦ f \wedge \nu_{x,y} = d''f \wedge A\nu_{x,y} = d''f \wedge \mu_{x,y}$. Integrating over $D$, as $\nu_{x,y}$ is closed on $D \setminus R$, we get:

$$\oint_{\partial D} f \cdot \nu_{x,y} = \iint_{D \setminus R} d''f \wedge \mu_{x,y} + \oint_{\partial R} f \cdot \nu_{x,y}.$$

Explicit calculus shows that

$$\oint_{\partial R} f \cdot \nu_{x,y} = \iint_R d''f \wedge \mu_{x,y} + 2i\pi\,\frac{f(x) + f(y)}{2}.$$
Remark 4. Since for all $\alpha \in C^1(♦)$, the locally constant function $\varepsilon$ defined by $\varepsilon(\Gamma) = +1$, $\varepsilon(\Gamma^*) = -1$, verifies $\varepsilon \cdot \alpha = 0$, an integral formula will give the same result for a function $f$ and $f + \lambda\varepsilon$. Therefore such a formula cannot give access to the value of the function at one point but only to its average value at an edge of ♦.

Corollary 2. For $f \in \Omega(\Lambda)$ a holomorphic function, the Cauchy integral formula reads, with the same notations,

$$\frac{f(x) + f(y)}{2} = \frac{1}{2i\pi} \oint_{\partial D} f \cdot \nu_{x,y}.$$

The Green function on the lattices (rectangular, triangular, hexagonal, Kagomé, square/octagon) is exactly known in terms of hyperelliptic functions ([Hug] and references in Appendix 3). As the potential is real, it means that the discrete Dirichlet problem on these lattices can be exactly solved this way, once the boundary values on the graph and its dual are given: if these values are real on $\Gamma$ and imaginary on its dual, the solution is real on $\Gamma$ and pure imaginary on the dual so the contributions $f(x)$ and $f(y)$ are simply the real and imaginary parts of the contour summation respectively. Unfortunately, this pair of boundary values is not independent but related by a Dirichlet to Neumann problem [CdV96].

3. Criticality

The term criticality, as well as our motivation to investigate discrete holomorphic functions, comes from statistical mechanics, namely the Ising model. A critical temperature is defined that restrains the interaction constants, interpreted here as lengths. We will see these geometrical constraints in Sect. 3.3. Technically, as far as the continuous limit theorem is concerned, a weaker property, called semi-criticality, is sufficient; it gives us a product between functions and forms. Moreover, at criticality, this product will be compatible with holomorphy.
3.1. Semi-criticality. Define $C_\theta := \{(r, t) : r \ge 0,\ t \in \mathbb{R}/\theta\mathbb{Z}\}/(0, t) \sim (0, t')$ with the metric $ds^2 := dr^2 + r^2 dt^2$ as the standard cone of angle $\theta > 0$ [Tro]. The cones can be realized by cutting and pasting paper, demonstrating their local isometry with the euclidean complex plane.

Let $\Sigma$ be a compact Riemann surface and $P \subset \Sigma$ a discrete set of points. A flat Riemannian metric with $P$ as conic singularities is an atlas $\{Z_{U_x} : U_x \to U \subset C_{\theta_x}\}_{x \in P}$ of open sets $U_x \subset \Sigma$, a neighbourhood of a singularity $x \in P$, into open sets of a standard cone, such that the singularity is mapped to the vertex of the cone and the changes of coordinates $C_{U,V} : U \cap V \to \mathbb{C}$ are euclidean isometries.

There is a lot of freedom allowed in the choice of a flat metric for a given closed Riemann surface $\Sigma$: Any finite set $P$ of points on $\Sigma$ with a set of angles $\theta_x > 0$ for every $x \in P$ such that $2\pi\chi(\Sigma) = \sum_{x \in P} (2\pi - \theta_x)$, defines uniquely a Riemannian flat metric on $\Sigma$ with these conic singularities and angles [Tro].

Consider such a flat riemannian metric on a compact Riemann surface $\Sigma$ and $(\Lambda, \ell)$ a double cellular decomposition of $\Sigma$ as before.

Definition 3. $(\Lambda, \ell)$ is a semi-critical map for this flat metric if the conic singularities are among the vertices of $\Lambda$ and ♦ can be realized such that each face $(x, y, x', y') \in ♦_2$ is mapped, by a local isometry $Z$ preserving the orientation, to a four-sided polygon $(Z(x), Z(y), Z(x'), Z(y'))$ of the euclidean plane, the segments $[Z(x), Z(x')]$ and $[Z(y), Z(y')]$ being of lengths $\ell(x, x')$, $\ell(y, y')$ respectively and forming a direct orthogonal basis. We name $\delta(\Lambda, \ell)$ the supremum of the lengths of the edges of ♦.

The local isometric maps $Z$ are discrete holomorphic. Voronoï and Delaunay complexes [PS85] are interesting examples of semi-critical dual complexes.
Any discrete set of points $Q$ on a flat Riemannian surface $\Sigma$, containing the conic singularities, defines such a pair: We first define two partitions $V$ and $D$ of $\Sigma$ into sets of three types: 2-sets, 1-sets and 0-sets, and then show that they are in fact dual cellular complexes. They are defined by a real positive function $m_Q$ on the surface, the multiplicity. Consider a point $x \in \Sigma$; as the set $Q$ is discrete, the distance $d(x, Q)$ is realized by geodesics of minimal length, generically a single one. Let $m_Q(x) \in [1, \infty)$ be the number of such geodesics. If $m_Q(x) = 1$, there exists a vertex $\pi(x) \in Q$ such that the shortest geodesic from $x$ to $\pi(x)$ is the only geodesic from $x$ to $Q$ with such a small length. The Voronoï 2-set associated to a vertex $v$ in $Q$ is $\pi^{-1}(v)$, that is to say the set of points of $\Sigma$ closer to this vertex than to any other vertex in $Q$. Each 2-set of $V$ is a connected component of $m_Q^{-1}(1)$. Likewise, the 1-sets are the connected components of $m_Q^{-1}(2)$. They are associated to pairs of points in $Q$. The 0-sets are the connected components of $m_Q^{-1}([3, +\infty))$. Generically, they are associated to three points in $Q$. $V$ is a cellular complex (see below) and the complex $D$ is its dual (generically a triangulation); its vertices are the points in $Q$, its edges are segments $(x, x')$ for $x, x' \in Q$ such that there exist points equidistant and closer to them.
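The classification of points by the multiplicity $m_Q$ can be illustrated in the simplest possible setting. The sketch below assumes the flat Euclidean plane with no conic singularity, so that geodesics are straight segments and each site contributes at most one minimal geodesic; on a general flat surface this simplification does not hold. The names `multiplicity` and `voronoi_set_type` and the tolerance `tol` are hypothetical.

```python
import math


def multiplicity(p, sites, tol=1e-9):
    """Number of sites of Q realising the distance d(p, Q), up to a tolerance."""
    dists = [math.dist(p, q) for q in sites]
    nearest = min(dists)
    return sum(1 for d in dists if d - nearest <= tol)


def voronoi_set_type(p, sites):
    """Type of Voronoi set containing p: multiplicity 1, 2, or at least 3."""
    m = multiplicity(p, sites)
    if m == 1:
        return "2-set"
    if m == 2:
        return "1-set"
    return "0-set"


sites = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
inside_cell = voronoi_set_type((0.1, 0.1), sites)   # strictly closest to (0, 0)
on_edge = voronoi_set_type((1.0, -1.0), sites)      # equidistant from (0, 0) and (2, 0)
at_vertex = voronoi_set_type((1.0, 1.0), sites)     # circumcentre of the three sites
```

In this example the three test points fall in a 2-set, a 1-set and a 0-set respectively, the last one being the generic situation of a 0-set associated to three points of $Q$.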
Proposition 10. The Voronoï partition, on a closed Riemann surface with a flat metric with conic singularities, of a given discrete set of points Q containing the conic singularities, is a cellular complex.
We have to prove that 2-sets are homeomorphic to discs, 1-sets are segments and 0-sets are points. First, 2-sets are star-shaped: for every point x closer to v ∈ Q than to any other point in Q, along a unique portion of a geodesic, the whole segment [x, v] has the same property. 2-sets are open: if x is closer to v ∈ Q than to any other point in Q then, as Q is discrete, d(x, Q \ v) − d(x, v) > 0. By the triangle inequality, every point in the open ball of half this radius centred at x is closer to v than to any other point in Q. So 2-sets are homeomorphic to discs. Let x be a point in a 1-set. It is defined by exactly two portions of geodesics D, D′ from x to y, y′ ∈ Q (y and y′ may coincide). By definition, the open ball centred at x containing D ∪ D′ does not contain any point of Q, so it can be lifted to the universal covering, where the usual rules of Euclidean geometry tell us that the set of points equidistant from y and y′ around x is a submanifold of dimension 1. As the surface is compact, if it is not a segment, it can only be a circle. Then it is easy to see that the surface is homeomorphic to a 2-sphere and that y and y′ are the only points in Q. But this is impossible because a Euclidean metric on a 2-sphere has at least three conic singularities [Tro]. The same type of argument shows that 0-sets are isolated points.
Fig. 6. The Voronoï/Delaunay decompositions associated to two points on a genus two surface
Proposition 11. Such Delaunay/Voronoï dual complexes are semi-critical maps of the surface. Hence any Riemann surface admits semi-critical maps.
The edge in V dual to (x, x′) ∈ D₁ is a segment of their perpendicular bisector (mediatrix), so it is orthogonal to (x, x′). Hence, equipped with the Euclidean length on the edges, (V, D) is a semi-critical map.
Remark 5. Apart from Voronoï/Delaunay maps, circle packings [CdV90] give another very large class of examples of interesting semi-critical decompositions (see Fig. 7).
The semi-criticality of a double map gives a coherent system of angles φ in (0, π) on the oriented edges of Λ. An edge (x, x′) ∈ Λ₁ is the diagonal of a certain diamond; φ(x, x′) is the angle of that diamond at the vertex x. In particular, φ(x, x′) ≠ φ(x′, x) a priori. They verify that for every diamond, the sum of the angles on the four directions
Fig. 7. Circle packing, the dual vertex to a face
of the two dual diagonals is 2π (see Fig. 8). Then the conic angle at a vertex is given by the sum of the angles over the incident edges.
Fig. 8. A system of angles for a semi-critical map (labels: φ(x, x′), φ(x′, x), φ(y, y′), φ(y′, y))
3.2. Continuous limit. We state the main theorem: a converging sequence of discrete holomorphic functions on a refining sequence of semi-critical maps of the same Riemann surface converges to a holomorphic function. Precisely:
Theorem 3. Let Σ be a Riemann surface and $(\Lambda_k, \ell_k)_{k\in\mathbb{N}}$ a sequence of semi-critical maps on it, with respect to the same flat metric with conic singularities. Assume that the lengths $\delta_k = \delta(\Lambda_k)$ tend to zero and that the angles at the vertices of all the faces of the $\diamondsuit_k$ are in the interval [η, 2π − η] with η > 0. Let $(f_k)_{k\in\mathbb{N}}$ be a sequence of discrete holomorphic functions $f_k \in \Omega(\Lambda_k)$, such that there exists a function f on Σ which verifies, for every converging sequence $(x_k)_{k\in\mathbb{N}}$ of points of Σ with each $x_k \in (\Lambda_k)_0$,
$$f\bigl(\lim_k x_k\bigr) = \lim_k f_k(x_k).$$
Then the function f is holomorphic on Σ.
Such a refining sequence is easy to produce (see Fig. 9) but the theorem takes into account more general sequences. A more natural refining sequence, which mixes the two
Fig. 9. Refining a semi-critical map
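dual sequences is given by a series of tile centering procedures [GS87]: if one calls ♦/2 the cellular decomposition constructed from ♦ by replacing each tile by four smaller ones of half its size, and Γ(♦/2), Γ*(♦/2) the double cellular decomposition it defines, one has Γ(♦/2) = Γ(♦) “∪” Γ*(♦) and the interesting following sequence:
$$\begin{array}{ccccccccc}
\Gamma(\diamondsuit) & \to & \diamondsuit & \to & \Gamma(\diamondsuit/2) & \to & \diamondsuit/2 & \to \cdots \to & \diamondsuit/2^n \to \cdots \\
 & \nearrow & & \searrow & & \nearrow & & \searrow & \\
\Gamma^*(\diamondsuit) & & & & \Gamma^*(\diamondsuit/2) & & & & \cdots
\end{array} \tag{3.1}$$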
The horizontal arrows correspond to tile centering procedures, and the ascending, respectively descending arrows, to tile centering, resp. edge centering procedures. This sequence is not that exciting though since locally, the graph rapidly looks like a rectangular lattice. More interesting inflation rules staying at criticality can be considered too (see Fig. 21).
The demonstration of the continuous limit theorem needs three lemmas:
Lemma 2. Let $(f_k)_{k\in\mathbb{N}}$ be a sequence of functions on an open set Ω ⊂ C such that there exists a function f on Ω verifying, for every converging sequence $(x_k)_{k\in\mathbb{N}}$ of points of Ω, $f(\lim_k x_k) = \lim_k f_k(x_k)$. Then the function f is continuous and is the uniform limit of $(f_k)$ on any compact set.
Taking a constant sequence of points, we see that $(f_k)$ converges to f pointwise. So with the notations of the theorem, $(f_k(x_k))$ converges to f(x) and $(f_l(x_k))_{l\in\mathbb{N}}$ to $f(x_k)$. Combining the two, $(f(x_k))$ converges to f(x) so f is continuous. If the convergence were not uniform on a compact set, then there would exist a converging sequence $(x_k)$ with $(f_k(x_k) - f(x_k))$ not converging to zero. But f is continuous at $x = \lim(x_k)$ and $(f_k(x_k))$ converges to f(x), which, combined, contradicts the hypotheses.
Lemma 3. Let (ABCD) be a four-sided polygon of the Euclidean plane such that its diagonals are orthogonal and the vertex angles are in [η, 2π − η] with η > 0. Let (M, M′) be a pair of points on the polygon. There exists a path on (ABCD) from M to M′ of minimal length ℓ. Then
$$\frac{MM'}{\ell} \ge \frac{\sin\eta}{4}.$$
It is a straightforward study of a function of several variables. If the two points are on the same side, MM′ = ℓ and sin η ≤ 1. If they are on adjacent sides, the extremal position with MM′ fixed is when the triangle MM′P, with P the vertex of (ABCD) between them, is isosceles. The angle at P being at least η, $MM'/\ell \ge \sin\frac{\eta}{2} > \frac{\sin\eta}{2}$. If the points are on opposite sides, the extremal configuration is given by Fig. 10.2, where $MM'/\ell = \frac{\sin\eta}{4}$.
Fig. 10. The two extremal positions: 1. M, M′ on adjacent sides; 2. M, M′ on opposite sides
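The bound of Lemma 3 can be probed numerically on the simplest polygons with orthogonal diagonals. The sketch below is an illustration only: it restricts to unit-side rhombi (an assumption, the lemma covers more general quadrilaterals), samples pairs of boundary points at random, and compares the smallest observed ratio MM′/ℓ with sin η / 4. All function names are hypothetical.

```python
import math
import random

def rhombus(angle):
    """Unit-side rhombus whose interior angles are `angle` and pi - `angle`."""
    a, b = math.cos(angle / 2), math.sin(angle / 2)
    return [(-a, 0.0), (0.0, -b), (a, 0.0), (0.0, b)]

def boundary_point(vertices, s):
    """Point at arclength s (mod 4) along the boundary of a unit-side quadrilateral."""
    s %= 4.0
    i = int(s)
    t = s - i
    (x0, y0), (x1, y1) = vertices[i], vertices[(i + 1) % 4]
    return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))

def worst_ratio(angle, samples=20000, seed=0):
    """Smallest sampled value of MM'/l, with l the shorter boundary path from M to M'."""
    rng = random.Random(seed)
    verts = rhombus(angle)
    worst = 1.0
    for _ in range(samples):
        s, t = rng.uniform(0, 4), rng.uniform(0, 4)
        path = min(abs(s - t), 4 - abs(s - t))
        if path < 1e-6:
            continue  # M and M' numerically coincide
        chord = math.dist(boundary_point(verts, s), boundary_point(verts, t))
        worst = min(worst, chord / path)
    return worst

for angle in (math.pi / 2, math.pi / 3, math.pi / 6):
    eta = min(angle, math.pi - angle)
    print(angle, worst_ratio(angle), math.sin(eta) / 4)
```

Random sampling gives evidence, not a proof; it merely shows that the sampled ratios stay above the stated bound for these rhombi.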
Lemma 4. Let (Λ, ℓ) be any double cellular decomposition and α ∈ C¹(♦) a closed 1-form. The 1-form f · α is closed for any holomorphic function f ∈ Ω(Λ) if and only if α is holomorphic.
Just check.
Proof of Theorem 3. We interpolate each function $f_k$ from the discrete set of points $(\Lambda_k)_0$ to a function $\bar f_k$ on the whole surface, linearly on the edges of $\diamondsuit_k$ and harmonically in its faces. Let $(\zeta_k)$ be a converging sequence of points in Σ. Each $\zeta_k$ is in the closure of a face of $\diamondsuit_k$. Let $x_k$, $y_k$ be the vertices where $\operatorname{Re} f_k$ attains its minimum and maximum around the face. By the maximum principle for the harmonic function $\operatorname{Re} \bar f_k$, $\operatorname{Re} f_k(x_k) \le \operatorname{Re} \bar f_k(\zeta_k) \le \operatorname{Re} f_k(y_k)$. Moreover, the distance between $x_k$ and $\zeta_k$ is at most $2\delta_k$, and likewise for $y_k$. It implies that $(x_k)$ and $(y_k)$ converge to $x = \lim(\zeta_k)$, $(f_k(x_k))$ and $(f_k(y_k))$ to f(x), and $(\operatorname{Re} \bar f_k(\zeta_k))$ to Re f(x); and similarly for the imaginary part. So, by Lemma 2, the function f is continuous, and is the uniform limit of $(\bar f_k)$ on every compact set. In particular, it is bounded on any compact set.
By the removable singularity theorem, since f is continuous hence bounded on any compact set, and since the conic singularities form a discrete set in Σ, to show that f is holomorphic we can restrict ourselves to each element U ⊂ Σ of a Euclidean atlas of the punctured surface (without conic singularities). We have an explicit coordinate z on U. Let γ be a homotopically trivial loop in U of finite length ℓ. We are going to prove that $\oint_\gamma f\,dz = 0$. The theorem of Morera then states that f is holomorphic.
Let us fix the integer k. By application of Lemma 3 on every face of $\diamondsuit_k$ crossed by γ, we construct a loop $\gamma_k \in C_1(\diamondsuit_k)$, homotopic to γ, of length $\ell(\gamma_k) \le \frac{4}{\sin\eta}\,\ell$ (see Fig. 11).
As the diameter of a face of $\diamondsuit_k$ is at most $2\delta_k$, all these faces are contained in the tubular neighbourhood of γ of diameter $4\delta_k$. Its area is $4\delta_k\ell$ and it contains the region C lying between γ and $\gamma_k$.
Fig. 11. The discretised path
Assume f is of class C¹; on the compact set C, $|\bar\partial f|$ is bounded by a number M. Applying Stokes' formula to f dz,
$$\left|\oint_{\gamma} f(z)\,dz - \oint_{\gamma_k} f(z)\,dz\right| \le \iint_{C} |\bar\partial f(z)|\,dz\wedge d\bar z \le M \times 4\delta_k\ell.$$
So $\oint_\gamma f(z)\,dz = \lim_k \oint_{\gamma_k} f(z)\,dz$. Taking a sequence of class C¹ functions converging uniformly to f on C, we prove the same result for f merely continuous, because all the paths taken into account are of bounded lengths. As $(\bar f_k)$ converges uniformly to f on C and the paths are of bounded lengths, we also have that $\bigl|\oint_{\gamma_k} (\bar f_k(z) - f(z))\,dz\bigr|$ tends to zero. But because the interpolation is linear on edges of $\diamondsuit_k$, $\oint_{\gamma_k} \bar f_k(z)\,dz = \oint_{\gamma_k} f_k\,dZ$, the second integral being the coupling between a 1-chain and a 1-cochain of $\diamondsuit_k$. But since $f_k$ and dZ are discrete holomorphic, $f_k\,dZ$ is a closed 1-form, and $\oint_{\gamma_k} f_k\,dZ = 0$. So $\oint_{\gamma_k} f(z)\,dz$ tends to zero and $\oint_\gamma f(z)\,dz = 0$.
3.3. Criticality. Proposition 12. Let α be a holomorphic 1-form; f · α is holomorphic for any holomorphic function f if and only if $\int_{(y,x)}\alpha = \int_{(x',y')}\alpha$ for each pair of dual edges (x, x′), (y, y′).
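Let (x, y, x′, y′) ∈ ♦₂ be a face of ♦. The Cauchy–Riemann equation for f · α on the pair (x, x′), (y, y′) is the nullity of:
$$\begin{aligned}
\frac{\int_{(y,y')} f\cdot\alpha}{\ell(y,y')} - i\,\frac{\int_{(x,x')} f\cdot\alpha}{\ell(x,x')}
&= \frac{1}{\ell(y,y')}\Bigl[\tfrac{f(x)+f(y)}{2}\int_{(y,x)}\alpha + \tfrac{f(x)+f(y')}{2}\int_{(x,y')}\alpha + \tfrac{f(x')+f(y)}{2}\int_{(y,x')}\alpha + \tfrac{f(x')+f(y')}{2}\int_{(x',y')}\alpha\Bigr] \\
&\quad - \frac{i}{\ell(x,x')}\Bigl[\tfrac{f(x)+f(y)}{2}\int_{(x,y)}\alpha + \tfrac{f(x)+f(y')}{2}\int_{(x,y')}\alpha + \tfrac{f(x')+f(y)}{2}\int_{(y,x')}\alpha + \tfrac{f(x')+f(y')}{2}\int_{(y',x')}\alpha\Bigr] \\
&= -\Bigl(\int_{(y,x)}\alpha + \int_{(y',x')}\alpha\Bigr)\,\frac{f(y')-f(y)}{\ell(y,y')},
\end{aligned}$$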
after having developed, used the holomorphy of α, then the holomorphy of f.
So to be able to construct, out of the holomorphic 1-forms dZ given by local flat isometries and a holomorphic function f, a holomorphic 1-form f dZ, we have to impose that for each face (x, y, x′, y′) ∈ ♦₂, Z(x) − Z(y) = Z(y′) − Z(x′). Geometrically, it means that each face of the graph ♦ is mapped by Z to a parallelogram in C. But as the diagonals of this parallelogram are orthogonal, it is a lozenge (or rhombus, or diamond).
Definition 4. A double (Λ, ℓ) of a Riemann surface is critical if it is semi-critical and each face of ♦₂ is a lozenge. Let δ(Λ) be the common length of their sides.
Remark 6. This has an intrinsic meaning on Σ: the faces of ♦ are genuine lozenges on the surface and every edge of Λ can be realized by a segment of length given by ℓ, two dual edges being orthogonal segments.
Another equivalent way to look at criticality can be useful: a double (Λ, ℓ) is critical if there exists a map Z from the universal covering of the punctured surface Σ \ P, for a finite set P ⊂ Λ₀, into C identified with the oriented Euclidean plane R², such that
– the image of an edge a of the lifted 1-skeleton $\tilde\Lambda_1$ is a linear segment of length ℓ(a),
– two dual edges are mapped to a direct orthogonal basis,
– Z is an embedding out of the vertices,
– there exists a representation ρ of the fundamental group π₁(Σ \ P) into the group of isometries of the plane respecting orientation such that, ∀γ ∈ π₁(Σ \ P), Z ∘ γ = ρ(γ) ∘ Z,
– and the lengths of all the segments corresponding to the edges of ♦ are all equal to the same δ > 0.
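The reduction of the Cauchy–Riemann expression for f · α to a single product can be checked numerically on one face. The sketch below is an illustration under stated assumptions: integrals of f · α on a ♦-edge are taken as the mean of the end values of f times the integral of α, integrals on the diagonals are taken as the sum over the two boundary paths (an overall normalisation does not affect the vanishing), and random data are generated so that f is holomorphic and α is closed and holomorphic on the face. The function name is hypothetical.

```python
import random

def cauchy_riemann_defect(seed=0):
    rng = random.Random(seed)
    rnd = lambda: complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
    avg = lambda u, v: (u + v) / 2
    lx, ly = rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0)  # l(x,x'), l(y,y')

    # Holomorphic f on the face: (f(y') - f(y)) / ly = i (f(x') - f(x)) / lx
    fx, fxp, fy = rnd(), rnd(), rnd()
    fyp = fy + 1j * ly * (fxp - fx) / lx

    # alpha on the diamond edges:
    # a = int_(y,x), b = int_(x,y'), c = int_(y,x'), d = int_(x',y')
    a, d = rnd(), rnd()
    # closed: a + b = c + d ; holomorphic: (a+b+c+d)/ly = i (-a+c+b-d)/lx
    t = -(a + d) * (1 / ly + 1j / lx) / (1 / ly - 1j / lx)  # t = b + c
    b, c = (t + d - a) / 2, (t - d + a) / 2

    int_yy = avg(fx, fy) * a + avg(fx, fyp) * b + avg(fxp, fy) * c + avg(fxp, fyp) * d
    int_xx = -avg(fx, fy) * a + avg(fx, fyp) * b + avg(fxp, fy) * c - avg(fxp, fyp) * d
    lhs = int_yy / ly - 1j * int_xx / lx
    # -(int_(y,x) alpha + int_(y',x') alpha) * (f(y') - f(y)) / l(y,y')
    rhs = -(a - d) * (fyp - fy) / ly
    return abs(lhs - rhs)

print(max(cauchy_riemann_defect(seed) for seed in range(20)))
```

The defect stays at rounding level, so f · α is holomorphic for every holomorphic f exactly when the two integrals of α over (y, x) and (x′, y′) agree, which is the condition of Proposition 12.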
The criticality of a double map gives a coherent system of angles φ in (0, π) on the unoriented edges of Λ: φ(x, x′) is the angle in the lozenge for which (x, x′) is a diagonal, at the vertex x (or x′). They verify that for every lozenge, the sum of the angles on the dual diagonals is π. Then the conic angle at a vertex is given by the sum of the angles over the incident edges.
Every discrete conformal structure (Λ, ℓ) defines a conformal structure on the associated topological surface by pasting lozenges together according to the combinatorial data (though most of the vertices will be conic singularities). Conversely,
Theorem 4. Every closed Riemann surface accepts a critical map.
Proof. We first produce critical maps for cylinders of any modulus. Consider a row of n squares and glue back its ends to obtain a cylinder; its modulus, the ratio of the square of the distance from top to bottom to its area, is 1/n. Stacking m such rows upon each other, one gets a cylinder of modulus m/n. Squares can be bent into lozenges, yielding a continuous family of cylinders of moduli ranging from zero to 2/n (see Fig. 12). Hence we can get cylinders of any modulus.
Fig. 12. Two bent rows
Dehn twists can be performed on these critical cylinders, see Fig. 13.
Fig. 13. Performing a Dehn twist
Gluing three cylinders together along their bottom (n has to be even), one can produce trinions of any modulus (see Fig. 14) and these trinions can be glued together according to any angle. Hence, every Riemann surface can be so produced [Bus].
Remark 7. An equilateral surface is a Riemann surface which can be triangulated by equilateral triangles with respect to a flat metric with conic singularities. Equilateral surfaces are the algebraic curves over $\bar{\mathbb{Q}}$ [VoSh], so they are dense among the Riemann surfaces. Cutting every equilateral triangle into nine triangles, each three times smaller (see Fig. 15), one can couple these triangles by pairs so that they form lozenges, hence a critical map.
In Figs. 16–19 are some examples of critical decompositions of the plane. In Fig. 20, a higher genus example, found in Coxeter [Cox1], of the cellular decomposition of a collection of handlebodies (the genus depends on how the sides are glued pairwise) by
Fig. 14. Gluing three cylinders into a trinion
Fig. 15. An equilateral triangle cut in nine yielding lozenges
ten regular pentagons, the centre is a branched point of order three; together with its dual, they form a critical map. It is the case for any cellular decomposition by just one regular tile when its vertices are co-cyclic. This decomposition gives rise to a critical sequence using the Penrose inflation rule [GS87]. Figure 21 illustrates this inflation rule sequence on a simpler genus two example where each outer side has to be glued with the other parallel side.
3.4. Physical interpretation. Theorem 5. A translationally invariant discrete conformal structure (Λ, ρ) on the double square or triangular/hexagonal lattice decomposition of the plane or the genus one torus is critical and flat if and only if the Ising model defined by the interaction constants $K_e := \tfrac{1}{2}\operatorname{Arcsinh}\rho_e$ on each edge e ∈ Γ₁ is critical as usually defined in statistical mechanics [McCW].
Proof. We prove it by solving another problem which contains these two particular cases, namely the translationally invariant square lattice with period two [Yam]. At a
Fig. 16. A 1-parameter family of critical deformations of the square lattice
Fig. 17. A 2-parameters family of critical deformations of the triangular/hexagonal lattices. This family, key to the solution of the triangular Ising model, induced Baxter to set up the Yang–Baxter equation [Bax]. Our notion of criticality fits beautifully into this framework
Fig. 18. The order 5 Penrose quasi crystal
Fig. 19. Lozenge patchworks
Fig. 20. Higher genus critical handlebody
Fig. 21. Sequence of critical maps of a genus two handlebody using Penrose inflation rule (panels a to d)
particular vertex, the flat critical condition on the four conformal parameters is:
$$\sum_{i=1}^{4} \arctan\rho_i = \pi,$$
which is obviously invariant by all the symmetries of the problem, including duality. When $\rho_i = \rho_{i+2}$, we get the usual period one Ising model criticality on the square lattice, $\sinh 2K_h \sinh 2K_v = 1$, and likewise when one of the four parameters degenerates to zero or infinity, the three remaining coefficients fulfill $\sinh 2K_{I}\,\sinh 2K_{II}\,\sinh 2K_{III} = \sinh 2K_{I} + \sinh 2K_{II} + \sinh 2K_{III}$, which is (a form of) the criticality condition for the triangular/hexagonal Ising model. The case shown in Fig. 16 occurs when $\rho_1 = \rho_3 = 1$, implying $\rho_2\rho_4 = 1$.
We see here that flat criticality, when the angles at conic singularities are multiples of 2π, is more meaningful than criticality in general.
This theorem is important because it shows that statistical criticality is meaningful even at the finite size level. It is well known [KW] that for lattices, it corresponds to self-duality, which has a meaning for finite systems; here we see that self-duality corresponds to a compatibility with holomorphy. In a sense, our notion of criticality defines self-duality for more complex graphs than lattices. Furthermore, we will see in Sect. 4 that criticality implies the existence of a discrete massless Dirac spinor, which is the core of the Ising model. Although we saw that criticality implies a continuous limit theorem, the thermodynamic limit is not necessary for criticality to be detected, and to have an interesting meaning. It is easy to produce higher genus flat critical maps and compute their critical temperature: the examples in Figs. 20–21 have four kinds of interactions corresponding to the diagonals of the two kinds of quadrilateral tiles. They are critical when the angles of the quadrilaterals are π/5, 2π/5, 3π/5 and 4π/5, corresponding to Ising interactions
$$\sinh 2K_n = \tan\frac{n\pi}{10}. \tag{3.2}$$
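The special cases of the flat critical condition quoted in the proof can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names and the sample parameter values are assumptions, and the couplings printed at the end are simply Eq. (3.2) evaluated with $K = \tfrac12\operatorname{Arcsinh}\rho$.

```python
import math

def is_flat_critical(rhos, tol=1e-9):
    """Flat critical condition at a vertex: the arctangents of the four parameters sum to pi."""
    return abs(sum(math.atan(r) for r in rhos) - math.pi) < tol

def ising_coupling(rho):
    """Interaction constant K = (1/2) arcsinh(rho), so that sinh(2K) = rho."""
    return 0.5 * math.asinh(rho)

# Period one square lattice: rho_1 = rho_3 and rho_2 = rho_4 = 1 / rho_1.
rho_h = 0.7
square = [rho_h, 1 / rho_h, rho_h, 1 / rho_h]
Kh, Kv = ising_coupling(rho_h), ising_coupling(1 / rho_h)
print(is_flat_critical(square), math.sinh(2 * Kh) * math.sinh(2 * Kv))

# One parameter degenerates to zero: the product of the other three equals their sum.
r1, r2 = 2.0, 3.0
r3 = (r1 + r2) / (r1 * r2 - 1)
print(is_flat_critical([r1, r2, r3, 0.0]), r1 * r2 * r3 - (r1 + r2 + r3))

# Couplings of Eq. (3.2) for the tiles of Figs. 20-21.
for n in range(1, 5):
    print(n, ising_coupling(math.tan(n * math.pi / 10)))
```

Solving for the third parameter from the other two, as done for `r3`, is just one convenient way to land on the critical surface.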
The author has made no attempt to verify these values numerically. A general way to produce such maps is, considering a critical genus one torus made up of a translationally invariant lattice, to cut two parallel segments of equal length and seam them back, interchanging their sides. This creates two conic singularities where an extra curvature of −2π is concentrated at each point, yielding a genus two handlebody. Repeating the process, we may produce critical handlebodies of arbitrarily large genus if we start with a very fine mesh. One has to beware that our continuous limit theorem applies only to fixed genus: the genus cannot grow with the refinement of the mesh. This explains why the union-jack lattice (the square lattice and its diagonals) or the three dimensional Ising model, which can be modelled as a genus mnp surface for a 2m×2n×2p cubic network, are beyond the scope of our technique as far as a continuous limit theorem is concerned. With this restriction in mind, we see that both the existence and the value of a critical temperature are essentially local properties and depend neither on the genus nor on the modulus of the handlebody. It is not the case for more interesting quantities such as the partition function, which can be obtained in principle from the discrete Dirac spinor that
Fig. 22. The diamond graph of a critical labyrinth lattice
criticality provides, defined in Sect. 4. But such a calculation is beyond the scope of this article.
Apart from the standard lattices, the critical temperature of other well known graphs can be computed using our method, for example the labyrinth [BGB], whose diamond is pictured in Fig. 22: it has the topology of the square lattice but has five different interaction strengths controlled by two binary words, labelling the columns and rows by 0's and 1's. The method also applies to new graphs such as the “street graph” depicted in Fig. 23. Its double row transfer matrix appears to be the product of three commuting transfer matrices, two triangular and a square one.
Other cases such as the Kagomé [Syo] or more generally lattices of chequered type [Uti] can be handled using a technique called electrical moves [CdV96], which enables us to move conic singularities of a flat metric around and to make them appear or disappear. This will be the subject of a subsequent article, explaining the relationship between discrete holomorphy, electrical moves and knots and links. These electrical moves act in the space of all the graphs with discrete conformal structures in a similar way to that of the Baxterisation processes in the spectral parameter space of an integrable model (see [AdABM]). We are going to see that the link with statistical mechanics is even deeper than simply pointing out a submanifold of critical systems inside the huge space of all Ising models, as the similarity with the continuous case extends to the existence of a discrete Dirac spinor near criticality.
Fig. 23. The “street” lattice
3.5. Polynomial ring. Definition 5. Let (Λ, ℓ) be a critical map. In a given flat map Z : U → C on the simply connected U, choose a vertex z₀ ∈ Λ₀, and for a holomorphic function f, define the holomorphic functions f† and f′ by the following formulae:
$$f^\dagger(z) := \varepsilon(z)\,\bar f(z),$$
where $\bar f$ denotes the complex conjugate and ε(Γ) = +1, ε(Γ*) = −1, and
$$f'(z) := \frac{4}{\delta^2}\left(\int_{z_0}^{z} f^\dagger\,dZ\right)^{\dagger}.$$
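See [Duf] for similar definitions. Notice that f′ is defined up to ε if one changes the base point.
Proposition 13. Let (Λ, ℓ) be a critical map. In a given flat map Z : U → C on the simply connected U, for every holomorphic function f ∈ Ω(Λ), df = f′ dZ. We hence call f′ the derivative of f.
Consider an edge (x, y) ∈ ♦₁, x ∈ Γ₀, y ∈ Γ*₀:
$$\begin{aligned}
f'(y) &= \frac{4}{\delta^2}\left(\int_{z_0}^{x} f^\dagger\,dZ + \int_{x}^{y} f^\dagger\,dZ\right)^{\dagger} \\
&= -f'(x) + \frac{4}{\delta^2}\left(\frac{\bar f(x) - \bar f(y)}{2}\,\bigl(Z(y) - Z(x)\bigr)\right)^{\dagger_y} \\
&= -f'(x) - \frac{2}{\delta^2}\,\bigl(f(x) - f(y)\bigr)\,\overline{\bigl(Z(y) - Z(x)\bigr)}.
\end{aligned}$$
So
$$\int_{(x,y)} f'\,dZ = \frac{f'(x)+f'(y)}{2}\,\bigl(Z(y)-Z(x)\bigr) = -\frac{f(x) - f(y)}{\delta^2}\,\overline{\bigl(Z(y) - Z(x)\bigr)}\,\bigl(Z(y) - Z(x)\bigr) = f(y) - f(x).$$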
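The single-edge computation in this proof can be replayed numerically. The sketch below is an illustration, not the author's code: `A` is an arbitrary complex number standing for the integral of f† dZ from the base point to x, the edge has length δ as criticality requires, and the function name is hypothetical.

```python
import cmath
import random

def derivative_on_edge(f_x, f_y, Z_x, Z_y, A, delta):
    """Return (f'(x), f'(y)) for x in Gamma and y in Gamma*, following Definition 5.

    A stands for the integral of f-dagger dZ from the base point z0 to x.
    """
    dagger_x = lambda g: g.conjugate()    # epsilon = +1 on Gamma
    dagger_y = lambda g: -g.conjugate()   # epsilon = -1 on Gamma*
    # Integral of f-dagger dZ along (x, y): mean of the end values times Z(y) - Z(x).
    edge = (dagger_x(f_x) + dagger_y(f_y)) / 2 * (Z_y - Z_x)
    scale = 4 / delta ** 2
    return scale * dagger_x(A), scale * dagger_y(A + edge)

rng = random.Random(1)
rnd = lambda: complex(rng.uniform(-1, 1), rng.uniform(-1, 1))
delta = 0.3
Z_x = rnd()
Z_y = Z_x + delta * cmath.exp(1j * rng.uniform(0, 2 * cmath.pi))  # |Z(y) - Z(x)| = delta
f_x, f_y, A = rnd(), rnd(), rnd()

fp_x, fp_y = derivative_on_edge(f_x, f_y, Z_x, Z_y, A, delta)
integral = (fp_x + fp_y) / 2 * (Z_y - Z_x)
print(abs(integral - (f_y - f_x)))
```

The identity on one edge holds for arbitrary end values; holomorphy of f is what makes the integral defining f′ independent of the path from the base point.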
Definition 6. Let U be a simply connected flat region and z₀ ∈ U. Define inductively the holomorphic functions
$$Z^{k}(z) := \int_{z_0}^{z} \frac{1}{k}\,Z^{k-1}\,dZ, \qquad Z^{0} := 1.$$
As the space of holomorphic functions on U is finite dimensional, these functions are not free; let $P_U$ be the minimal polynomial such that $P_U(Z) = Z^n + \ldots = 0$.
Conjecture 6. The space of holomorphic functions on U, convex, is isomorphic to $\mathbb{C}[Z]/P_U$.
We won't define here the notion of convexity, see [CdV96]. The question is whether the set $(Z^k)$ generates the space of holomorphic functions. The problem is that zeros are not localised, and as the power k of $Z^k$ increases, the set of its zeros spreads over the plane and gets out of U. Figure 24 is an example on the unit square lattice with U the square [−10, 10] ⊕ [−10, 10]i: the degree increases with k until k = 16, where four zeros get out of the square. So a definition of the degree of a function by a Gauss formula is delicate.
Fig. 24. The zeros of $Z^{16}$ get out of the square [−10, 10] ⊕ [−10, 10]i (panels: $Z^{15}$ and its zeros; $Z^{16}$ and its zeros)
4. Dirac Equation
Although we believe our theory can be applied to a lot of different problems, our motivation was to shed new light on statistical mechanics and the Ising model in particular. This statistical model has been linked with Dirac spinors since the work of Kaufman [K] and Onsager and Kaufman [KO]. We refer among others to [McCW81, SMJ, KC]. Hence we are interested in setting up a Dirac equation in the context of discrete holomorphy. To achieve this goal we first have to define the discrete analogue of the fibre bundle on which spinors live. We therefore have to define a discrete spin structure. Physics provides us with a geometric definition [KC] based on paths in a certain Z₂-homology, that we generalise to our need (higher genus, boundary, arbitrary topology). We begin by showing that such an object in the continuum is indeed a spin structure, then define the discrete object. We then set up the Dirac equation for discrete spinors, show that
it implies holomorphy and that the existence of a solution is equivalent to criticality. The Ising model gives us an object which satisfies the discrete Dirac equation, namely the fermion, C = σµ as defined in [KC], corresponding to a similar object defined previously by Kaufman [K]. It fulfills the Dirac equation at criticality, but also off criticality, corresponding to a massive Dirac spinor. We will end this article by describing off-criticality, as defined by the author's Ph.D. advisor, Daniel Bennequin.
4.1. Universal spin structure. A spin structure [Mil] on a principal fibre bundle (E, B) over a manifold B, with SO(n) as a structural group, is a principal fibre bundle (E′, B), of structural group Spin(n), and a map f : E′ → E such that the following diagram is commutative:
$$\begin{array}{cccc}
E' \times \mathrm{Spin}(n) & \longrightarrow & E' & \searrow \\
\downarrow{\scriptstyle f\times\lambda} & & \downarrow{\scriptstyle f} & \quad B \\
E \times \mathrm{SO}(n) & \longrightarrow & E & \nearrow
\end{array}$$
where λ is the standard 2-fold covering homomorphism from Spin(n) to SO(n). In this paper we consider only spin structures on the tangent bundle of a surface. On a generic Riemann surface Σ, there is not a canonical spin structure. We are going to describe a surface $\hat\Sigma$, a $2^{2-\chi(\Sigma)}$-fold covering of Σ, on which there exists a preferred spin structure. It allows us to define every spin structure on Σ as a quotient of this universal spin structure. We will treat the continuous case and then the discrete case.
Definition 7. Let Σ be a differentiable surface with a base-point y⁰; $\hat\Sigma$ is the set of pairs (z, [λ]₂), where z ∈ Σ is a point and [λ]₂ the homology of a path λ from y⁰ to z considered in the relative homology H₁(Σ, {y⁰, z}) ⊗ Z₂.
$\hat\Sigma$ is the $2^{2-\chi(\Sigma)}$-fold covering associated to the intersection H of the kernels of all the
homomorphisms from π₁(Σ) to Z₂, that is to say the quotient of the universal covering by the subgroup H ⊂ π₁(Σ) of loops whose homology is null modulo two.
Choose v⁰ a tangent vector at y⁰. For each point z ∈ Σ, define $\Sigma_z := \Sigma \setminus \{y^0, z\} \sqcup S^1 \sqcup S^1$, the blow-up of Σ at y⁰ and z (add only one circle in the case y⁰ = z). Consider the set of oriented paths in $\Sigma_z$, from the point corresponding to the vector v⁰ at y⁰ to the directions at z (the vector v⁰ is needed only when z = y⁰). Define an equivalence relation $\sim_z$ (see Fig. 25) on this set by stating that two paths λ, λ′ are equivalent if and only if λ − λ′ is a cycle and [λ − λ′]₂ = 0 in the homology H₁(Σ \ {z}, Z₂).
Fig. 25. Paths of different classes with respect to $\sim_z$ for z ≠ y⁰ (panel 1) and z = y⁰ (panel 2)
Definition 8. The universal spin structure S of Σ is the set of pairs $(z, [\lambda]_{\sim_z})$, with z ∈ Σ and $[\lambda]_{\sim_z}$ the $\sim_z$-equivalence class of the path λ from y⁰ to z in $\Sigma_z$.
Theorem 7. S is a spin structure on $\hat\Sigma$ and is the only one to which the action of the fundamental group π₁(Σ) on $\hat\Sigma$ can be lifted. Moreover it is the pull-back of any spin structure on Σ.
Proof. The proof is in three steps: we check that S is a spin structure, we define a spin structure S₀ through group theory, and we show that both are equal to a third spin structure S₁.
There is an obvious projection from S to $\hat\Sigma$ defined by $(z, [\lambda]_{\sim_z}) \mapsto (z, [\lambda]_2)$. The fibre of this projection at (z, [λ]₂) is the set of $\sim_z$-equivalence classes of paths from y⁰ to the blown-up circle at z. To each class is associated the tangent direction at z, so $S_z$ is a covering of $ST_z(\hat\Sigma)$. As H₁(Σ \ {z}, Z₂) is $2^{3-\chi(\Sigma)}$ dimensional (a loop around z is not homologically trivial), for each point in $ST(\hat\Sigma)$ there are two different lifts. The path in $S_z$ corresponding to turning around z once yields the Z₂-deck transformation. Hence S is a spin structure on $\hat\Sigma$.
Let G := π₁(Σ) and G′ := π₁(STΣ); the S¹-fibre bundle ST(Σ) → Σ induces a short exact sequence Z ↪ G′ ↠ G. Every double covering of STΣ is defined by the kernel S of a homomorphism u from G′ to Z/2; moreover, for S to be a spin structure, its intersection with the subgroup Z must be 2Z.
Likewise, the fibration $ST\hat\Sigma \to \hat\Sigma$ implies that the fundamental group $H' := \pi_1(ST\hat\Sigma)$ of the directions bundle of $\hat\Sigma$ is the subgroup of G′ over $H := \pi_1(\hat\Sigma)$,
$$\begin{array}{ccccc} Z & \to & H' & \to & H \\ \downarrow & & \downarrow & & \downarrow \\ Z & \to & G' & \to & G \end{array} \tag{4.1}$$
The intersection of the subgroups H′ and S is a well defined spin structure S₀ on $\hat\Sigma$. Indeed, consider another spin structure S′ = Ker(v : G′ → Z/2) on Σ; its intersection with Z is 2Z, hence the kernel of u − v contains the whole subgroup Z, that is to say u − v comes from a homomorphism of G to Z/2 and we have S ∩ H′ = S′ ∩ H′. In other words, S₀ is the unique spin structure on $\hat\Sigma$ which is the pull-back of a spin structure on Σ, and it is the pull-back of any spin structure.
Let z ∈ Σ be a point, and consider the set of paths in STΣ from the base point (y⁰, v⁰) to any direction at z. Consider on this set the equivalence relation $\sim'_z$ defined by fixed extremities Z/2-homology. The class $[\lambda]_{\sim'_z}$ of a path λ from (y⁰, v⁰) to (z, v) is its homology class in H₁(STΣ, {(y⁰, v⁰), (z, v)}) ⊗ Z/2. The projection STΣ ↠ Σ splits H₁(STΣ, {(y⁰, v⁰), (z, v)}) ⊗ Z/2 into
$$Z/2 \to H_1(ST\Sigma, \{(y^0, v^0), (z, v)\}) \otimes Z/2 \to H_1(\Sigma, \{y^0, z\}) \otimes Z/2, \tag{4.2}$$
hence the set S₁ of pairs $(z, [\lambda]_{\sim'_z})$ for all points z ∈ Σ and all paths λ is a spin structure on $\hat\Sigma$. Let S′ be a spin structure on Σ; it defines an element in Z/2 for each loop in STΣ. So each path in STΣ beginning at (y⁰, v⁰) defines, through the splitting (4.2), an element in S₁ which is then the pull-back of S′ to $\hat\Sigma$, hence S₀ = S₁.
On the other hand S = S₁ because there is a continuous projection from S to S₁: for an element $(z, [\lambda]_{\sim_z})$, consider a C¹-path λ ∈ $\Sigma_z$ representing the class. Lift it to a path
in STΣ by the tangent direction at each point; its class $[\lambda']_{\sim'_z}$ only depends on $[\lambda]_{\sim_z}$ and gives us an element in S₁.
4.2. Discrete spin structure. Definition 9. Let ϒ be a cellular complex of dimension two; a spin structure on ϒ is a graph ϒ′, double cover of the 1-skeleton of ϒ, such that the lift of the boundary of every face is a non-trivial double cover. They are considered up to isomorphisms. Let $S_D$ be the set of such spin structures. A spinor ψ on ϒ′ is an equivariant complex function on ϒ′ regarding the action of Z/2, that is to say, for all ξ ∈ ϒ′₀, $\psi(\bar\xi) = -\psi(\xi)$ if $\bar\xi$ represents the other lift.
Remark 8. Usually, a spinor field is a section of a spinor bundle, that is to say a square root of a tangent vector field. Here, we consider square roots of covectors; we should say cospinors.
A discrete spin structure is encoded by a representation of the cycles of ϒ, Z₁(ϒ) := Ker ∂ ∩ C₁(ϒ), into Z/2, which associates to γ ∈ Z₁(ϒ) the value µ(γ) = 0 if it can be lifted in ϒ′ to a cycle and µ(γ) = 1 if it cannot. By construction, the value of the boundary of a face is 1 and the value of a cycle which is the boundary of a 2-chain of ϒ is the number of faces enclosed, modulo two.
We are going to show that this structure is indeed a good notion of discrete spin structure. First, there are as many discrete spin structures on a surface as there are in the continuous case:
Proposition 14. On a closed connected oriented genus g surface Σ, the set $S_D$ of inequivalent discrete spin structures of a cellular decomposition ϒ is of cardinal $2^{2g}$. The space of representations of the fundamental group of the surface into Z/2 acts freely and transitively on $S_D$.
We explicitly build discrete spin structures and count them. Let T be a maximal tree of ϒ, that is to say a sub-complex of dimension one containing all the vertices of ϒ and a maximal subset of its edges such that there is no cycle in T.
Choose 2g edges $(e_k)_{1\le k\le 2g}$ in ϒ \ T such that the 2g cycles $(\gamma_k) \in Z_1(ϒ)^{2g}$ extracted from $(T \cup e_k)_{1\le k\le 2g}$ form a basis of the fundamental group of Σ (and ϒ). Let $T_+ := T \cup_k e_k$ and consider T′, the sub-complex of the dual ϒ* formed by all the edges in ϒ* not crossed by $T_+$. It is a maximal tree of ϒ*. Likewise we define $T'_+ := T' \cup_k e_k^*$.
We construct inductively a spin structure ϒ′: its first elements are a double copy of T, and we add edges without any choice to make as we take leaves out of $T'_+$. When only cycles are left, a choice concerning an edge $e_k$ has to be made, opening a cycle in $T'_+$. The process goes on until $T'_+$ is empty. These choices are completely encoded by a representation µ such as in the remark, and the 2g values $(\mu(\gamma_k))_{1\le k\le 2g}$ determine the spin structure. On the other hand, this representation defines the spin structure and there are $2^{2g}$ such different representations. Hence the choices of the maximal tree and the edges $e_k$ are irrelevant.
Because a cycle in ϒ belongs to a class in the fundamental group of the surface (up to a choice of a path to the base point, irrelevant for our matter), the representations of the fundamental group into Z/2 obviously act on spin structures: a representation ρ : π₁(Σ) → Z/2 associates to a spin structure defined by a representation µ : Z₁(ϒ) → Z/2 the spin structure defined by the representation ρ(µ) such that ρ(µ)(γ) := µ(γ) +
ρ([γ]), where [γ] ∈ π_1(Σ) is the class of the cycle γ in the fundamental group. This action is clearly free, and transitive because the set of representations is of cardinal $2^{2g}$.

Given Λ = Γ ⊔ Γ∗ a double cellular decomposition, we introduce a cellular decomposition which is the discretised version of the tangent directions bundle of both Γ and Γ∗:

Definition 10. The triple graph ϒ is a cellular complex whose vertices are unoriented edges of ♦, ϒ_0 = {{x, y} / (x, y) ∈ ♦_1}. Two vertices {x, y}, {x′, y′} ∈ ϒ_0 are neighbours in ϒ iff the edges (x, y) and (x′, y′) are incident (that is to say x = x′ or x = y′ or y = x′ or y = y′), and they bound a common face of ♦. There are two edges in ϒ for each edge in Λ. For this to be a cellular decomposition of the surface in the empty boundary case, one needs to add faces of three types, centred on vertices of Γ, of Γ∗ and on faces of ♦ (see Fig. 26).
Fig. 26. The triple graph ϒ
Remark 9. The topology of the usual tangent directions bundle is not at all mimicked by the incidence relations of ϒ: the former is 3-dimensional and the latter is a 2-cellular complex.

Let (x^0, y^0) ∈ ♦_1 be a given edge. All the complexes Γ, Γ∗, ♦, ϒ are lifted to Σ̂.

Definition 11. The discrete universal spin structure ϒ̂′ is the following 1-complex: Its vertices are of the form ((x, y), [γ_{y^0}^{y}]), where (x, y) ∈ ϒ_0 is a pair of neighbours in ♦ and γ_{y^0}^{y} is a path from y^0 to y on Γ∗, avoiding the faces x∗ and x^{0∗}. We are interested only in its relative homology class modulo two, that is to say [γ_{y^0}^{y}] ∈ H_1(Γ∗ \ x∗, {y^0, y}) ⊗ Z_2. We will denote a point by ((x, y), γ_{y^0}^{y}) and identify it with ((x, y), γ′_{y^0}^{y}) whenever γ_{y^0}^{y} and γ′_{y^0}^{y} are homologous.

Two points ((x, y), γ_{y^0}^{y}) and ((x′, y′), γ′_{y^0}^{y′}) are neighbours in ϒ̂′ if
– x = x′, (y, y′) ∈ Γ∗_1 and γ_{y^0}^{y} − γ′_{y^0}^{y′} + (y, y′) is homologous to zero in H_1(Γ∗ \ x∗) ⊗ Z_2,
– y = y′, (x, x′) ∈ Γ_1 and γ_{y^0}^{y} − γ′_{y^0}^{y′} is homologous to zero in H_1(Γ∗ \ x∗) ⊗ Z_2.
ϒ̂′ is a double covering of ϒ̂ and it is connected around each face (see Fig. 27). It is a discrete spin structure on ϒ̂ in the sense defined above. Once a basis of the fundamental group π_1(ϒ) is chosen, every representation of the homology group of Σ into Z_2 allows us to quotient this universal spin structure into a double covering of ϒ, yielding a usual spin structure ϒ′.
Fig. 27. Double covering around faces of ϒ
4.3. Dirac equation. A spinor changes sign between the two lifts in ϒ′ of a vertex of ϒ; in other words, it is multiplied by −1 when it turns around a face. The faces of ϒ which are centred on diamonds are four-sided. We set up the spin symmetry equation for a function ζ on ϒ′_0, on a positively oriented face (ξ_1, ξ_2, ξ_3, ξ_4) ∈ ϒ_2 around a diamond, lifted to an 8-term cycle (ξ_1^+, ξ_2^+, ξ_3^+, ξ_4^+, ξ_1^−, ξ_2^−, ξ_3^−, ξ_4^−) ∈ Z_1(ϒ′):

\[ \zeta(\xi_3^+) = i\,\zeta(\xi_1^+). \tag{4.3} \]

It obviously implies that ζ is a spinor, that is to say ζ(ξ_•^−) = −ζ(ξ_•^+). The coherent system of angles φ given by a semi-critical structure locally provides a spinor respecting the spin symmetry away from conic singularities: Define half angles θ on oriented edges of ϒ in the following way: each edge (ξ, ξ′) ∈ ϒ_1 cuts an edge a ∈ Λ_1; set θ(ξ, ξ′) := ±φ(a)/2 according to whether (ξ, ξ′) turns in the positive or negative direction around the diamond. Choose a base point ξ_0 ∈ ϒ′_0 and define ζ by ζ(ξ_0) = 1 and

\[ \zeta(\xi) := \exp\Big( i \sum_{\lambda \in \gamma} \theta(\lambda) \Big) \tag{4.4} \]

for any path γ from ξ_0 to ξ. The sum of the half angles is equal to π around the faces of ♦ and to half the conic angle around a vertex, so at a regular flat point we get 2π/2 = π again; hence ζ is a well-defined spinor. As diagonals of the faces of ♦ are orthogonal, ζ fulfills the spin symmetry. Moreover, if the conic angles are congruous to 2π modulo 4π, ζ can be extended to any simply connected region; if the fundamental group acts by translations, ζ is defined on the whole ϒ′.
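The construction (4.4) can be checked numerically on a single diamond. The following is a minimal Python sketch, not taken from the paper: it assumes the relation tan(φ(a)/2) = ρ(a) and the critical normalisation ρ(a)ρ(a∗) = 1 used in this paper, and it assumes that the four successive edges of the face of ϒ cross a, a∗, a, a∗ in turn. The function names are hypothetical.

```python
import cmath
import math

def half_angle(rho):
    """Half angle phi(a)/2, assuming tan(phi(a)/2) = rho(a)."""
    return math.atan(rho)

def spinor_around_diamond(rho_a):
    """Values of zeta at xi1+, xi2+, xi3+, xi4+ and at the other lift xi1-."""
    rho_dual = 1.0 / rho_a  # assumed critical case: rho(a) * rho(a*) = 1
    # Assumed ordering: successive edges of the face cross a, a*, a, a*.
    thetas = [half_angle(rho_a), half_angle(rho_dual)] * 2
    values = [1.0 + 0.0j]   # zeta = 1 at the base point xi1+
    for theta in thetas:
        # Formula (4.4): multiply by exp(i * theta) along each edge of the path.
        values.append(values[-1] * cmath.exp(1j * theta))
    return values

zeta = spinor_around_diamond(0.7)
print(abs(zeta[2] - 1j * zeta[0]))  # deviation from the spin symmetry (4.3)
print(abs(zeta[4] + zeta[0]))       # deviation from the sign change after a full turn
```

Both printed deviations should be at the level of floating-point rounding, reflecting that the half angles add up to π/2 over two consecutive edges and to π around the face.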
We are going to define a propagation equation which comes from the Ising model. It is fulfilled by the fermion defined by Kaufman [K], which is known to converge to a Dirac spinor near criticality. We will use the definition ψ = σµ given by Kadanoff and Ceva [KC]. The Dirac equation has a long history in the Ising model, beginning with the work of Kaufman [K] and Onsager and Kaufman [KO]; we refer among others to [McCW81, SMJ, KC]. The equation that we need is defined explicitly in [DD], hence we will name it the Dotsenko equation, even though it might be found elsewhere in other forms. It is fulfilled by the fermion at criticality as well as off criticality. But this equation is only a part of the full Dirac equation.

For a function ζ on ϒ′_0, with the same notations as before, and if a ∈ Λ_1 is the diagonal of the diamond between (ξ_2, ξ_3) and (ξ_4, ξ_1) (see Fig. 28):

\[ \zeta(\xi_1^+) = \sqrt{1 + \rho(a)^2}\,\zeta(\xi_2^+) - \rho(a)\,\zeta(\xi_3^+). \tag{4.5} \]

A check around the diamond shows that it also implies that ζ is a spinor: We write the Dotsenko equation in ξ_2^+ and ξ_3^+,

\[ \zeta(\xi_2^+) = \sqrt{1 + \rho(a^*)^2}\,\zeta(\xi_3^+) - \rho(a^*)\,\zeta(\xi_4^+), \]
\[ \zeta(\xi_3^+) = \sqrt{1 + \rho(a)^2}\,\zeta(\xi_4^+) - \rho(a)\,\zeta(\xi_1^-), \]

hence, as $\sqrt{1 + \rho(a)^2}\,\sqrt{1 + \rho(a^*)^2} = \rho(a) + \rho(a^*)$,

\[ \zeta(\xi_1^+) = \rho(a^*)\,\zeta(\xi_3^+) - \sqrt{1 + \rho(a^*)^2}\,\zeta(\xi_4^+) \]
\[ \phantom{\zeta(\xi_1^+)} = \rho(a^*)\big(\sqrt{1 + \rho(a)^2}\,\zeta(\xi_4^+) - \rho(a)\,\zeta(\xi_1^-)\big) - \sqrt{1 + \rho(a^*)^2}\,\zeta(\xi_4^+) = -\zeta(\xi_1^-). \]
The Dirac equation is the conjunction of the symmetry (4.3) and the Dotsenko (4.5) equations. We will see that this same equation describes the massive and massless Dirac equation, the mass measuring the distance from criticality.
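The check around the diamond can also be run numerically. Below is a small Python sketch under the assumption ρ(a)ρ(a∗) = 1 (the case in which the product of the two square roots equals ρ(a) + ρ(a∗)); the helper names and the sample values are hypothetical. Starting from arbitrary values ζ(ξ_4^+) and ζ(ξ_1^−), it applies the Dotsenko equation at ξ_3^+, ξ_2^+ and ξ_1^+ in turn and compares ζ(ξ_1^+) with −ζ(ξ_1^−).

```python
import math

def dotsenko(rho, zeta_next, zeta_after):
    """Right-hand side of (4.5): sqrt(1 + rho^2) * zeta_next - rho * zeta_after."""
    return math.sqrt(1.0 + rho * rho) * zeta_next - rho * zeta_after

def turn_around_diamond(rho_a, zeta_4_plus, zeta_1_minus):
    """Propagate from (xi4+, xi1-) back to xi1+ with three Dotsenko steps."""
    rho_dual = 1.0 / rho_a  # assumption: rho(a) * rho(a*) = 1
    zeta_3_plus = dotsenko(rho_a, zeta_4_plus, zeta_1_minus)
    zeta_2_plus = dotsenko(rho_dual, zeta_3_plus, zeta_4_plus)
    zeta_1_plus = dotsenko(rho_a, zeta_2_plus, zeta_3_plus)
    return zeta_1_plus

z1m = -0.3 + 0.5j                       # arbitrary sample value of zeta(xi1-)
z1p = turn_around_diamond(0.7, 1.0 + 2.0j, z1m)
print(abs(z1p + z1m))                   # deviation from zeta(xi1+) = -zeta(xi1-)
```

The printed deviation should be at rounding level for any positive ρ(a) and any starting values, which is the spinor property derived above.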
Fig. 28. The Dotsenko equation
Given two spinors ζ, ζ′, their pointwise product is no longer a spinor but a regular function on ϒ. As there are two edges in ϒ for each edge in Λ, there is an obvious averaging map from 1-forms on ϒ to 1-forms on Λ: We define d_ϒ ζζ′ ∈ C^1(Λ) by the following formula, with the same notation as before,

\[ 2\int_a d_\Upsilon\, \zeta\zeta' := \zeta(\xi_3)\zeta'(\xi_3) - \zeta(\xi_2)\zeta'(\xi_2) + \zeta(\xi_4)\zeta'(\xi_4) - \zeta(\xi_1)\zeta'(\xi_1). \]

d_ϒ ζζ′ is by definition an exact 1-form on ϒ, but its average is not a priori exact on Λ.
Fig. 29. The 1-form on Λ associated to two spinors.
Proposition 15. If ζ and ζ respect whether the spin symmetry or the Dotsenko equation, then dϒ ζ ζ is a closed 1-form. If ζ is a Dirac spinor and ζ fulfills the Dotsenko equation, then dϒ ζ ζ is holomorphic, dϒ ζ¯ ζ anti-holomorphic and every holomorphic 1-form on can be written this way on a simply connected domain, uniquely up to a constant. A sufficient condition for dϒ ζ ζ to be closed on is that, with the same notations as above, ζ (ξ3 )ζ (ξ3 ) − ζ (ξ2 )ζ (ξ2 ) = ζ (ξ4 )ζ (ξ4 ) − ζ (ξ1 )ζ (ξ1 ) because ∂y ∗ dϒ ζ ζ for a vertex y ∈ 0 is a sum of such differences on the edges of ϒ around y. This is so if there exists a 2 × 2-matrix A such that + + ζ (ξ3 ) ζ (ξ4 ) =A , ζ (ξ2+ ) ζ (ξ1+ ) 1 0 1 0 t a similar formula for ζ , and A A= . The solutions are of the form 0 −1 0 −1 √ O 1 + λ2 √ λ for a complex number λ ∈ C, O = ±1 and a determination A= 1 + λ2 Oλ √ of 1 + λ2 . This is the case for√the spin symmetry, λ = −i, O = +1 and for the Dotsenko equation, λ = ρ(a), O = −1, 1 + λ2 > 0. If ζ is a Dirac spinor and ζ fulfills the Dotsenko equation, then dϒ ζ ζ = ζ (ξ4+ )ζ (ξ4+ ) − ζ (ξ3+ )ζ (ξ3+ ) ∗ a = iζ (ξ2+ )( 1 + ρ(a)2 ζ (ξ3+ ) − ρ(a)ζ (ξ2+ )) − iζ (ξ1+ )ζ (ξ3+ ) = iζ (ξ2+ )( 1 + ρ(a)2 ζ (ξ3+ ) − ρ(a)ζ (ξ2+ )) − i( 1 + ρ(a)2 ζ (ξ2+ ) − ρ(a)ζ (ξ3+ ))ζ (ξ3+ ) = iρ(a) ζ (ξ3+ )ζ (ξ3+ ) − ζ (ξ2+ )ζ (ξ2+ ) = iρ(a) dϒ ζ ζ . a
So dϒ is holomorphic. Of course, d ζ¯ ζ is anti-holomorphic. Conversely, if dϒ ζ ζ is holomorphic with ζ a Dirac spinor, then ζ fulfills the Dotsenko equation. (1,0) (), define α on ϒ by the obvious map ϒ 1 Given a holomorphic 1-form α ∈ ({x,y},{y,x }) αϒ := (x,x ) α. It is a closed 1-form on ϒ because α is closed on , ζζ
so there exists a function a on any simply connected domain of ϒ0 , unique up to an additive constant, such that dϒ a = αϒ . A check shows that the only spinors ζ such
that d_ϒ ζζ′ = 0 on Λ are the ones proportional to ζ̄. It is consistent with the fact that the Dirac spinor is of constant modulus (see Eq. (4.6)). Hence the function ζ′ := a/ζ on ϒ′ is the unique spinor (up to a constant times 1/ζ ∼ ζ̄) such that d_ϒ ζζ′ = α. Notice that for ζ a Dirac spinor, the holomorphic 1-form associated to it on Λ is locally, for a given flat coordinate Z, d_ϒ ζζ = λ dZ, with λ ∈ C a certain constant.

4.4. Existence of a Dirac spinor.

Theorem 8. There exists a Dirac spinor on a double map iff it is critical for a given flat metric with conic angles congruous to 2π modulo 4π and such that the fundamental group acts by translations. The Dirac spinor is unique up to a multiplicative constant.

Proof. Let ζ be a non-zero Dirac spinor. Consider a positively oriented face (ξ_1, ξ_2, ξ_3, ξ_4) ∈ ϒ_2 around a diamond with diagonals a, a∗ as in Fig. 28, lifted to an 8-term cycle (ξ_1^+, ξ_2^+, ξ_3^+, ξ_4^+, ξ_1^−, ξ_2^−, ξ_3^−, ξ_4^−) ∈ C_1(ϒ′). The equation

\[ e^{i\varphi(a)/2} = \frac{\rho(a^*) + i}{\sqrt{1 + \rho(a^*)^2}} \]

defines an angle φ(a) ∈ (0, π) for every edge a ∈ Λ_1. The Dotsenko and symmetry equations combine into

\[ \zeta(\xi_2^+) = \frac{\rho(a) + i}{\sqrt{1 + \rho(a)^2}}\,\zeta(\xi_1^+). \tag{4.6} \]
The fact that ζ is a spinor implies that, summing the four angles around the diamond, we get $e^{i(\varphi(a) + \varphi(a^*))} = -1$. As each angle is less than π, their sum is equal to π. The same consideration around a vertex x ∈ Λ_0 yields $\exp\big(i \sum_{(x,x') \in \Lambda_1} \varphi(x,x')/2\big) = -1$. So φ is a coherent system of angles and the map is critical with conic angles congruous to 2π modulo 4π. Conversely, given φ a coherent system of angles with conic angles congruous to 2π modulo 4π, the preceding construction described by Eq. (4.4) gives the only Dirac spinor. In this case, dZ is a well defined holomorphic 1-form on the whole surface.

Corollary 9. Let (Λ, ρ) be a discrete conformal structure and P a set of vertices, containing among others the vertices v such that the sum $\sum_{e \sim v} \operatorname{Arctan} \rho(e)$, summed over all edges e incident to v, is greater than 2π. The discrete conformal structure is critical with P as conic singularities if and only if there exist Dirac spinors on every simply connected domain containing no point of P.

We define in which sense a discrete spinor converges to a continuous spinor. We don't define these spinors on specific spin structures but rather on the universal spin structure S. Consider a sequence of finer and finer critical maps such as in Theorem 3. Choose a converging sequence of base points $(x_k^0, y_k^0) \in {}^k\Upsilon_0$ on each critical map such that the direction sequence $\big(\overrightarrow{x_k^0 y_k^0} / d(x_k^0, y_k^0)\big)_k$ converges to a tangent vector (x^0, v^0).
Consider a sequence of points $(x_k, y_k) \in {}^k\Upsilon_0$, defining a sequence of points (x_k) converging to x in Σ and a converging sequence of directions $v = \lim \overrightarrow{x_k y_k} / d(x_k, y_k)$. By compactness of the circle, there exist such sequences for every point x ∈ Σ, and the criticality implies that it is in at least three directions for flat points, separated by angles less than π. The different limits allow us to identify, after a certain rank, the relative homology groups $H_1({}^k\Gamma^* \setminus x_k^*, \{y_k^0, y_k\}) \otimes \mathbb{Z}_2$ with $H_1(\Sigma_x, \{(x^0, v^0), (x, v)\}) \otimes \mathbb{Z}_2$, the classes of paths in the blown-up of Σ at x^0 and x.

Definition 12. We will say that a sequence (ζ_k)_{k∈N} of spinors converges if and only if, for any converging sequences $((x_k, y_k))_{k \in \mathbb{N}}$, with $(x_k, y_k) \in {}^k\Upsilon_0$, defining a limit tangent vector, and ([λ_k])_{k∈N} of classes of paths in ${}^k\Gamma^*$ from y_k^0 to y, avoiding the face x_k∗, the sequence of values (ζ_k(x_k, [λ_k])) converges.

Remark 10. It defines a continuous limit spinor ζ by equivariance: Let x ∈ Σ̂; the set D_x of directions in which there exist converging sequences of discrete directions is by definition a closed set. Let u, v be two boundary directions of D_x such that the entire arc A of directions between them is not in D_x. Consider [(x, [λ]_x), (x, [λ′]_x)] ⊂ S a lift of A. The circle S^1 acts on the directions, hence on the ∼_x-classes; let ψ ∈ (0, π) be the angle such that (x, e^{iψ}[λ]_x) = (x, [λ′]_x). Define ζ(x, e^{iφ}[λ]_x) := e^{iν(φ)} ζ(x, [λ]_x), where

\[ \nu(\varphi) = \frac{\varphi\,\zeta(x, [\lambda']_x)}{\psi\,\zeta(x, [\lambda]_x)} \]

(ν(φ) = φ/2 for Dirac spinors).
Theorem 10. Given a sequence of critical maps such as in Theorem 3 with Dirac spinors on all of them, they can be normed so that they converge to the usual Dirac spinor on the Riemann surface. In a local flat map Z, the square of the discrete Dirac spinor on k ϒˆ is (up to a multiplicative constant) the 1-form dZ evaluated on the edges. Hence their sequence converges.
4.5. Massive Dirac equation, discrete fusion algebra and conclusions. For completeness and motivation, we describe below the situation off-criticality where elliptic integrals come into play, and investigate a form of the discrete fusion algebra in the Ising model. This work was done by Daniel Bennequin and will be the subject of a subsequent article.

A massive system in the continuous theory is no longer conformal. In the same fashion, Daniel Bennequin defined a massive discrete system of modulus k as a discrete double graph (Λ, ρ) such that, for each pair (a, a∗) of dual edges,

\[ \rho(a)\,\rho(a^*) = \frac{1}{k}. \tag{4.7} \]

The massless case corresponds to k = 1. We showed that criticality was equivalent to a coherent system of angles φ(a) such as shown in Fig. 8, defined by tan(φ(a)/2) = ρ(a), and adding up to 2π at each vertex of the double, except at conic singularities. The Dirac
spinor was constructed using the half angles φ(a)/2. Similarly, for every edge, we define the massive “half angle” u(a) as the elliptic integral

\[ u(a) := \int_0^{\varphi(a)/2} \frac{d\phi}{\Delta'(\phi)}, \tag{4.8} \]

where the measure is deformed by

\[ k^2 + k'^2 = 1, \tag{4.9} \]
\[ \Delta(\phi) := \sqrt{1 - k^2 \sin^2 \phi}, \tag{4.10} \]
\[ \Delta'(\phi) := \sqrt{1 - k'^2 \sin^2 \phi}. \tag{4.11} \]

Using these non-circular half angles, and the corresponding “square angle” $I_{k'} := \int_0^{\pi/2} d\phi / \Delta'(\phi)$, one can construct a massive Dirac spinor wherever the following “flatness” condition is fulfilled:

\[ \sum_{a \in \partial F} \big(I_{k'} - u(a)\big) = I_{k'} \mod 4 I_{k'} \quad \text{for each face } F \in \Lambda_2, \tag{4.12} \]
\[ \sum_{a \sim v} u(a) = I_{k'} \mod 4 I_{k'} \quad \text{for each vertex } v \in \Lambda_0. \tag{4.13} \]
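The massive half angle (4.8) is an incomplete elliptic integral and is easy to evaluate numerically. The Python sketch below is an illustration only: it assumes that the integrand is 1/Δ′(ϕ) with k′² = 1 − k², that 0 < k ≤ 1, and that the upper limit is still given by φ(a)/2 = arctan ρ(a); the function names and the Simpson quadrature are choices made here, not part of the paper.

```python
import math

def delta_prime(phi, k):
    """Delta'(phi) = sqrt(1 - k'^2 sin^2 phi), with k'^2 = 1 - k^2 (assumed reading)."""
    k_prime_sq = 1.0 - k * k
    return math.sqrt(1.0 - k_prime_sq * math.sin(phi) ** 2)

def simpson(f, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def massive_half_angle(rho, k):
    """u(a) as in (4.8), assuming phi(a)/2 = arctan(rho(a)) and 0 < k <= 1."""
    upper = math.atan(rho)
    return simpson(lambda phi: 1.0 / delta_prime(phi, k), 0.0, upper)

def square_angle(k):
    """The 'square angle': the same integral taken up to pi/2."""
    return simpson(lambda phi: 1.0 / delta_prime(phi, k), 0.0, math.pi / 2)

print(massive_half_angle(0.7, 1.0) - math.atan(0.7))  # massless limit: u(a) = phi(a)/2
print(square_angle(1.0) - math.pi / 2)                # massless limit: square angle = pi/2
```

Under these assumptions the massless case k = 1 gives Δ′ ≡ 1, so u(a) reduces to the circular half angle φ(a)/2 and the square angle to π/2, which is consistent with the statement that the massless case corresponds to k = 1.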
Daniel Bennequin noticed that the fusion algebra of the Ising model could be understood at the finite level: Consider a trinion made of cylinders of a square lattice, of width m and n, glued into a cylinder of width m + n. It has been known since Kaufman [K] that, in the transfer matrix description of the Ising model, the configuration space of the Ising model on each of the three boundaries is a representation of spin groups spin(m), spin(n) and spin(m + n) respectively. If m is odd, there exists a unique irreducible representation Δ of spin(m), but when m is even, there are two irreducible representations, Δ^+ and Δ^−. A pair of pants gives us a map spin(m) × spin(n) → spin(m + n); in the case of a pair of pants of height zero, it is the inclusion given by the usual product. The representations of spin(m + n) induce representations of the product group that can be split into irreducible representations. If the three numbers are even,

\[ \Delta^+ \to \Delta^+ \otimes \Delta^+ + \Delta^- \otimes \Delta^-, \tag{4.14} \]
\[ \Delta^- \to \Delta^+ \otimes \Delta^- + \Delta^- \otimes \Delta^+, \tag{4.15} \]

while if only one of them is even,

\[ \Delta \to \Delta^+ \otimes \Delta + \Delta^- \otimes \Delta, \tag{4.16} \]

and if m and n are both odd,

\[ \Delta^+ \to \Delta \otimes \Delta, \tag{4.17} \]
\[ \Delta^- \to \Delta \otimes \Delta. \tag{4.18} \]

Let us compile these data in an array and relabel Δ by σ, Δ^+ by 1 and Δ^− by ε:

\[ \begin{array}{c|ccc} & 1 & \varepsilon & \sigma \\ \hline 1 & 1 & \varepsilon & \sigma \\ \varepsilon & \varepsilon & 1 & \sigma \\ \sigma & \sigma & \sigma & 1 + \varepsilon \end{array} \tag{4.19} \]

This is read as follows: the 1 + ε in the slot σ ⊗ σ, for example, means that the representation 1 and the representation ε of spin(m + n) both induce a factor σ ⊗ σ in the representation of the product group spin(m) × spin(n). We get exactly the fusion rules of the Ising model. The only difference compared with the continuous case is that the algebra is not closed at a finite level. The columns, rows and entries are not representations of the same group; rather, we have a product of representations of spin(n) and spin(m) as a factor of a representation of spin(n + m).

These results provide evidence that a discrete conformal field theory might be looked for: the discrete Dirac spinor at criticality is the discrete version of the conformal block associated with the field ψ, and some sort of fusion algebra can be identified at the finite level. The program we contemplate is, first, to investigate other statistical models and see if there are such patterns. If that is the case, we must then mimic in the discrete setup the vertex operator algebra of the continuous conformal theory. This can be attempted by defining a discrete operator algebra, in a similar fashion to Kadanoff and Ceva [KC], and splitting this algebra according to its discrete holomorphic and anti-holomorphic parts. The hope is that some aspects of the powerful results and techniques defined by Belavin, Polyakov and Zamolodchikov [BPZ] will still hold. A very interesting issue would be, as we have done for the Ising model, to realize the fusion rules of a theory in the discrete setup, yielding its Verlinde algebra.

Acknowledgements. The author did the main part of this work as a Ph.D student in Strasbourg, France, under the supervision of Daniel Bennequin [M]; his remarks were essential throughout the paper.
The rest was done during two postdoctoral stays, in Djursholm, Sweden, funded by a Mittag-Leffler Institute grant, and at the University of Tel Aviv, Israel, thanks to an Algebraic Lie Representations TMR network grant, no. ERB FMRX-CT97-0100. We thank M. Slupinski, M. Katz and L. Polterovich for discussions on spinors, M. Sharir for advice on Voronoï diagrams, R. Benedetti for references on flat Riemannian metrics, and B. McCoy for pointing out references in statistical mechanics and his great help in understanding the relationship between the Ising model and discrete holomorphy.
References [AdABM] Anglès d’Auriac, J.-Ch., Boukraa, S. and Maillard, J.M.: Let’s baxterise. J. Stat. Phys. 102, 641–700 (2001). hepth/0003212 [BGB] Baake, M, Grimm, U. and Baxter, R.J.: A critical Ising model on the labyrinth. Internat. J. Modern Phys. B 8, 2526 (1994): Perspectives on solvable models. 3579–3600 [Bax] Baxter, R.J.: Exactly solved models in statistical mechanics. London: Academic Press Inc. [Harcourt Brace Jovanovich Publishers], 1989; Reprint of the 1982 original [BPZ] Belavin, A.A., Polyakov, A.M. and Zamolodchikov, A.B.: Infinite conformal symmetry in twodimensional quantum field theory. Nucl. Phys. B 241, 2, 333–380 (1984) [BS96] Benjamini, I. and Schramm, O.: Random walks and harmonic functions on infinite planar graphs using square tilings. Ann. Probab. 24, 3, 1219–1238 (1996) [Bus] Buser, P.: Geometry and spectra of compact Riemann surfaces. Boston, MA: Birkhäuser Boston Inc., 1992 [CdV90] Colin de Verdière, Y.: Un principe variationnel pour les empilements de cercles. Invent. Math. 104, 655–669 (1991)
[CdV96]
Colin de Verdière, Y. Gitler, I. and Vertigan, D.: Réseaux électriques planaires, I,I. Comment. Math. Helv. 71, 1, 144–167 (1996) [Cox1] Coxeter, H.S.M.: Introduction to geometry. Wiley Classics Library, New York: John Wiley & Sons Inc., 1989; Reprint of the 1969 edition [DD] Dotsenko, V.S. and Dotsenko, V.S.: Critical behaviour of the phase transition in the 2d Ising model with impurities. Adv. in Phys.32, 2, 129–172 (1983) [Duf] Duffin, R.J.: Basic properties of discrete analytic functions. Duke Math. J. 23, 335–363 (1956) [GS87] Grünbaum, B. and Shephard, G.C.: Tilings and patterns. NewYork: W. H. Freeman and Company, 1987 [Hug] Hughes, B.D.: Random walks and random environments, Vol. 1. New York: The Clarendon Press Oxford University Press, 1995 [ID] Itzykson, C. and Drouffe, J.-M.: Statistical field theory, Vol. 2. Cambridge Monographs on Mathematical Physics, Cambridge: Cambridge University Press, 1989 [KC] Kadanoff, L.P. and Ceva, H.: Determination of an operator algebra for the two-dimensional Ising model. Phys. Rev. B (3) 3, 3918–3939 (1971) [K] Kaufman, B: Crystal statistics ii. Partition function evaluated by spinor analysis. Phys. Rev. 76, 1232 (1949) [KO] Kaufman, B and Onsager, L: Crystal statistics iii. Short range order in a binary Ising lattice. Phys. Rev. 76, 1244 (1949) [Ken] Kenyon, R.: Tilings and discrete Dirichlet problems. Israel J. Math. 105, 61–84 (1998) [KW] Kramers, H.A. and Wannier, G.H.: Statistics of the two-dimensional ferromagnet, I. Phys. Rev. (2) 60, 252–262 (1941) [LF] Lelong-Ferrand, J.: Représentation conforme et transformations à intégrale de Dirichlet bornée. Paris: Gauthier-Villars, 1955 [McCW] McCoy, B.M. and Wu, T.T.: The two-dimensional Ising model. Cambridge, Massachusetts: Harvard University Press, 1973 [McCW81] McCoy, B.M. and Wu, T.T.: Non-linear partial difference equations for the two-spin correlation function of the two-dimensional Ising model. Nucl. Phys. 
B 180, 89–115 (1981) [M] Mercat, C.: Holomorphie discrète et modèle d’Ising, PhD thesis, Université Louis Pasteur, Strasbourg, France, 1998, under the direction of Daniel Bennequin, Prépublication de l’IRMA, available at http://www-irma.u-strasbg.fr/irma/publications/1998/98014.shtml [Mil] Milnor, J: Spin structures on manifolds. Enseign. Math., II. Ser. 9, 198–203 (1963) [Ons] Onsager, L.: Crystal statistics, I. A two-dimensional model with an order-disorder transition. Phys. Rev. (2) 65, 117–149 (1944) [PS85] Preparata, F.P. and Shamos, M.I.: Computational geometry. Texts and Monographs in Computer Science, New York: Springer-Verlag, 1985 [SMJ] Sato, M, Miwa, T and Jimbo, M: Studies on holonomic quantum fields i–iv. Proc. Japan Acad. 53 (A), 1–6 (1977) [Sie] Siegel, C.L.: Topics in complex function theory. Vol. II, Wiley Classics Library, New York: John Wiley & Sons Inc., 1988. Automorphic functions and abelian integrals, translated from the German by A. Shenitzer and M. Tretkoff, with a preface by Wilhelm Magnus, Reprint of the 1971 edition. A Wiley-Interscience Publication [Syo] Syôzi, Itiro: Statistics of kagomé lattice. Progress Theoret. Phys. 6, 306–308 (1951) [Tro] Troyanov, M.: Les surfaces euclidiennes à singularités coniques. Enseign. Math. (2) 32 1, 2, 79–94 (1986) [Uti] Utiyama, T.: Statistics of two-dimensional Ising lattices of chequered types. Progress Theoret. Phys. 6, 907–909 (1951); Letter to the Editor [Veb] Veblen, O.: Analysis situs. 2. edit., New York: American Mathematical Society X, 1931 [VoSh] Voevodski˘ı, V.A. and Shabat, G.B.: Equilateral triangulations of Riemann surfaces, and curves over algebraic number fields. Dokl. Akad. Nauk SSSR 304 2, 265–268 (1989) [Wan50] Wannier, G.H.: Antiferromagnetism. The triangular Ising net. Physical Rev. (2) 79, 357–364 (1950) [Whit] Whitney, H.: Product on complexes. Annals of Math. (2) 39, 397–432 (1938) [Yam] Yamamoto, T.: On the crystal statistics of two-dimensional Ising ferromagnets. 
Progress Theoret. Phys. 6, 533–542 (1951) Communicated by M. E. Fisher
Commun. Math. Phys. 218, 217 – 232 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
The Discrete Spectrum of the Perturbed Periodic Schrödinger Operator in the Large Coupling Constant Limit Oleg Safronov Department of Mathematics, KTH, 10044 Stockholm, Sweden. E-mail:
[email protected] Received: 12 September 2000 / Accepted: 22 November 2000
Abstract: Let A be a periodic Schrödinger operator and let V_0 ≥ 0 be a decaying potential. We study the number Ñ(α) of the eigenvalues of the operator A(α) = A − αV_0 inside a fixed interval (λ_1, λ_2). We obtain an asymptotic formula for Ñ(α) as α → ∞.

0. Introduction

In a simple quantum mechanical model for impurities in a crystal, one studies the spectrum of a periodic Schrödinger operator A = −Δ + p(x), perturbed by a decaying potential −αV_0, which models the impurity (α > 0 is a coupling constant). The basic assumptions are that A has a spectral gap Λ in its essential spectrum; eigenvalues of A(α) = A − αV_0 inside Λ correspond to the eigenstates of the electrons observed in solid state physics. We study the number of eigenvalues of A(α) inside an interval (λ_1, λ_2) ⊂ Λ. The main purpose of our study is to obtain the asymptotics of this number as α → ∞. We point out that the problem has been solved only for one-dimensional Schrödinger operators in [3]. The results of the present paper deal with multidimensional Schrödinger operators. However, we consider only the case of spherically symmetric potentials V_0 ≥ 0.

Note that the eigenvalues of A(α) in Λ are continuous and monotone functions of α. Therefore it makes sense to ask how many eigenvalues of A(t) cross a fixed point λ ∈ Λ as t grows from 0 to α (excluding α). Let us denote this quantity by N_0(λ, A(α)). Then the difference N_0(λ_2, A(α)) − N_0(λ_1, A(α)) coincides with the number of eigenvalues inside the interval [λ_1, λ_2). For slowly decaying potentials

\[ V_0(x) \sim |x|^{-s}, \qquad |x| \to \infty, \tag{1} \]

with s < 2, it was shown in [1] that

\[ N_0(\lambda, A(\alpha)) \sim C(\lambda)\,\alpha^{d/s}, \qquad \alpha \to \infty, \tag{2} \]
where the coefficient C(λ) can be expressed in terms of the integrated density of states. The same situation arises when V_0 ≤ 0. Since the coefficient C(λ) in (2) depends on λ, one can easily obtain that the leading term in the asymptotic formula for the number of the eigenvalues inside (λ_1, λ_2) equals (C(λ_2) − C(λ_1))α^{d/s}. Note that if V_0 satisfies (1) with s > 2 then we have the standard Weyl asymptotics:

\[ N_0(\lambda, A(\alpha)) \sim (2\pi)^{-d}\,\omega_d\,\alpha^{d/2} \int V_0^{d/2}\,dx, \qquad \alpha \to \infty, \tag{3} \]

where the asymptotic coefficient does not depend on λ. So if one knew the second term in the asymptotics (3), one would be able to calculate the asymptotics of the difference N_0(λ_2, A(α)) − N_0(λ_1, A(α)). Unfortunately the second term in (3) has not been obtained. Therefore in order to solve the problem we need a new technique.

1. Formulation of the Main Result. Preliminary Information

1.1. Basic notation. a. Below M is a densely defined linear operator in a Hilbert space H. We denote by D(M), M∗, σ(M) the domain, the adjoint operator and the spectrum of the operator respectively. In the case when M = M∗, the symbol E_M(·) denotes the spectral measure for M. Let the spectrum of the operator M = M∗ in H be bounded from below and be discrete to the left of the point λ_0 ∈ R. Then the operator M is semibounded from below and is generated by a certain quadratic form m[·, ·] in H. We define the spectral distribution function of M:

\[ N(\lambda, M) = \operatorname{rank} E_M(-\infty, \lambda), \qquad \lambda < \lambda_0. \]

Furthermore, let d[m] be the domain of the form m. Then for the quantity N(λ, M) the following relation holds:

\[ N(\lambda, M) = \max(\dim F), \qquad F \subset d[m], \quad m[u, u] < \lambda (u, u) \ \ \forall\, u \in F,\ u \neq 0. \]

b. Below d ≥ 2, Q = [0, 1)^d, S = {x ∈ R^d : |x| = 1} and B_r = {x ∈ R^d : |x| < r}, r > 0. The notation H^s(Ω), Ω ⊂ R^d, is taken for the Sobolev classes of order s ∈ N; H_0^s(Ω) is the closure of the class C_0^∞(Ω) in the metric of H^s(Ω). For Laplace operators on domains Ω ⊂ R^d and on the sphere S we take the notations Δ, Δ_θ respectively. The gradient operator is denoted by ∇. Different estimate constants are denoted by C, c (sometimes with indices).

1.2. Let a function p ∈ L^∞(R^d) take real values and

\[ p(x + n) = p(x), \qquad x \in \mathbb{R}^d,\ n \in \mathbb{Z}^d. \]

Consider the selfadjoint operator

\[ Au = -\Delta u + pu, \qquad u \in D(A) = H^2(\mathbb{R}^d), \]

in L^2(R^d). Obviously A is an elliptic Z^d-periodic operator. The spectrum σ(A) can have gaps besides the semi-infinite one. Let Λ = (λ_−, λ_+) be a fixed gap in the spectrum σ(A) (if λ_+ = inf σ(A), then λ_− = −∞). Below we perturb the operator A so that the spectrum of the perturbed operator in Λ is discrete.
Let V : [1, +∞) → R_+ be a smooth function and

\[ \lim_{r \to \infty} r^s V(r) = v_a, \qquad 2 < s < 2 + 2/(d - 1), \quad v_a > 0. \tag{4} \]

We denote by V_0 the operator of multiplication by the function V(|x|) extended by zero for |x| < 1. We put A(α) := A − αV_0, α > 0. It is easy to see that the spectrum σ(A(α)) is discrete in Λ. Let λ_− < λ_1 < λ_2 < λ_+. The main object of our investigation is the asymptotic behavior of the value rank E_{A(α)}(λ_1, λ_2) for α → ∞.

Below we systematically use the following notations for the operators on a domain. Let Ω ⊂ R^d be an open domain. We consider the following operator in L^2(Ω):

\[ A^{(\varepsilon)}(\alpha, \Omega) = -\Delta + p - \alpha V_0, \qquad \alpha > 0; \]

the index ε takes the values D or N and characterizes the boundary (Dirichlet or Neumann) conditions on ∂Ω.

In order to formulate the main result we need to introduce the density of states for the operator A. Let x_1, x_2 ∈ R^d and l > 0. Let also Ω(l) := Ω(l; x_1, x_2) = l(Q + x_1) + x_2. Then for an arbitrary λ ∈ R the following limit exists:

\[ \rho(\lambda) = \lim_{l \to \infty} l^{-d}\, N(\lambda, A^{(\varepsilon)}(0, \Omega(l))). \]

It does not depend on x_1, x_2 and on the values of the index ε. This limit is called the integrated density of states for the operator A at the point λ.

Let λ_1, λ_2 be fixed, λ_− < λ_1 < λ_2 < λ_+. We put

\[ b(\varepsilon) = \begin{cases} \operatorname{v\text{-}sup}_{x \in \mathbb{R}^d} (p(x) - \lambda_1), & \varepsilon = D; \\ \operatorname{v\text{-}inf}_{x \in \mathbb{R}^d} (p(x) - \lambda_2), & \varepsilon = N. \end{cases} \]

We impose an additional condition on the function V:

Condition 1.1. There exists η∗ > 0 such that for all 0 < η < η∗ and α > η^{−s} the number of the roots of the equation 2b(ε)r = α(r^2 V(r))′, ε = D, N, in the interval r ∈ [1, ηα^{1/s}] does not exceed a fixed number l ∈ N.

We note that Condition 1.1 is fulfilled when the function r^{−1}(r^2 V(r))′ decreases monotonically for r > 1. In particular this condition is fulfilled for V(r) = v_a r^{−s}, r > 1. The main result of the present work gives the asymptotics of the total spectral multiplicity of the operator A(α) in the interval (λ_1, λ_2) as α → ∞.

Theorem 1.1. Let d ≥ 2 and λ_− < λ_1 < λ_2 < λ_+. Assume that (4) and Condition 1.1 are fulfilled. Then

\[ \lim_{\alpha \to \infty} \alpha^{-d/s} \operatorname{rank} E_{A(\alpha)}(\lambda_1, \lambda_2) = \int_{\mathbb{R}^d} \big( \rho(\lambda_2 + v_a |x|^{-s}) - \rho(\lambda_1 + v_a |x|^{-s}) \big)\,dx. \tag{5} \]
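Condition 1.1 can be made concrete for the model potential V(r) = v_a r^{−s}. In that case (r^2 V(r))′ = v_a(2 − s) r^{1−s}, so the equation 2b(ε)r = α(r^2 V(r))′ becomes r^s = α v_a (2 − s) / (2b(ε)), which has at most one positive root (and none unless b(ε) < 0, since s > 2). The Python sketch below only illustrates this count; the function names and the sample parameter values are hypothetical.

```python
def residual(r, b, alpha, v_a, s):
    """2*b*r - alpha * d/dr (r^2 V(r)) for the model potential V(r) = v_a * r**(-s)."""
    return 2.0 * b * r - alpha * v_a * (2.0 - s) * r ** (1.0 - s)

def condition_root(b, alpha, v_a, s):
    """Unique positive root of the equation in Condition 1.1, or None if there is none."""
    if b == 0:
        return None
    r_to_s = alpha * v_a * (2.0 - s) / (2.0 * b)  # value of r^s at a root
    if r_to_s <= 0:
        return None
    return r_to_s ** (1.0 / s)

def count_roots_in_window(b, alpha, v_a, s, eta):
    """Number of roots in [1, eta * alpha^(1/s)]; it is 0 or 1 for this potential."""
    root = condition_root(b, alpha, v_a, s)
    upper = eta * alpha ** (1.0 / s)
    return int(root is not None and 1.0 <= root <= upper)

# Hypothetical sample values: b < 0, s = 2.5 (admissible for d = 2), alpha = 1e4.
root = condition_root(-1.0, 1.0e4, 1.0, 2.5)
print(root, residual(root, -1.0, 1.0e4, 1.0, 2.5))
print(count_roots_in_window(-1.0, 1.0e4, 1.0, 2.5, 0.9))
```

For this potential the condition therefore holds with l = 1, whatever the values of η and α.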
Remark 1.1. For a fixed real number µ,

\[ \rho(\lambda + \mu) - \rho(\lambda) = O(\lambda^{(d-1)/2}), \qquad \lambda \to \infty. \]

This relation shows that the integral in the right-hand side of (5) is finite.

Remark 1.2. Now we discuss the condition s < 2 + 2/(d − 1). The “Dirichlet–Neumann bracketing” method (used in the present paper) has a limit of accuracy. Namely, it can be applied only to asymptotics of order α^q with q > (d − 1)/2. In Theorem 1.1, q = d/s, and d/s > (d − 1)/2 is equivalent to s < 2d/(d − 1) = 2 + 2/(d − 1). This leads to the discussed estimate for s.
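The bracketing idea mentioned in Remark 1.2 can be illustrated on a one-dimensional toy model, which is not the operator A of this paper: for −d²/dx² on an interval of length L, the Dirichlet eigenvalues are (kπ/L)² with k ≥ 1 and the Neumann eigenvalues are (kπ/L)² with k ≥ 0. Cutting the interval into equal cells and imposing Dirichlet or Neumann conditions at the cuts gives lower and upper bounds for the counting functions. The following Python sketch uses hypothetical function names and sample values.

```python
import math

def count_dirichlet(lam, length):
    """Number of Dirichlet eigenvalues (k*pi/L)^2, k >= 1, strictly below lam."""
    if lam <= 0:
        return 0
    k_max = math.sqrt(lam) * length / math.pi
    return max(math.ceil(k_max) - 1, 0)   # integers k with 1 <= k < k_max

def count_neumann(lam, length):
    """Number of Neumann eigenvalues (k*pi/L)^2, k >= 0, strictly below lam."""
    if lam <= 0:
        return 0
    k_max = math.sqrt(lam) * length / math.pi
    return math.ceil(k_max)               # integers k with 0 <= k < k_max

def bracketing(lam, length, cells):
    """Counts for: decoupled Dirichlet cells, whole Dirichlet, whole Neumann, decoupled Neumann cells."""
    cell = length / cells
    return (cells * count_dirichlet(lam, cell),
            count_dirichlet(lam, length),
            count_neumann(lam, length),
            cells * count_neumann(lam, cell))

lower, n_dir, n_neu, upper = bracketing(50.0, 10.0, 4)
print(lower, n_dir, n_neu, upper)
```

The four counts are ordered increasingly, and the gap between the outer bounds grows with the number of cells. This is a toy version of the boundary error that limits the accuracy of the method.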
2. Auxiliary Result

2.1. Let Y = {x ∈ R^d : 1 < |x| < ηα^{1/s}}, d ≥ 2. We write A^{(ε)}(α) = A^{(ε)}(α, Y) for brevity. From the inequality A^{(N)}(α) < A^{(D)}(α) we obtain that N(λ, A^{(D)}(α)) ≤ N(λ, A^{(N)}(α)) for an arbitrary λ ∈ R. The following statement gives an estimate of the difference of the distribution functions of the spectrum of the operators A^{(D)}(α) and A^{(N)}(α).

Proposition 2.1. Let p ∈ L^∞(R^d), 2 < s < 2 + 2/(d − 1) and λ_1 < λ_2. Assume that V satisfies the conditions of Theorem 1.1. Then there is η_0 > 0 such that for all 0 < η < η_0,

\[ \mu(\eta) := \limsup_{\alpha \to \infty} \alpha^{-d/s} \big( N(\lambda_2, A^{(N)}(\alpha)) - N(\lambda_1, A^{(D)}(\alpha)) \big) < \infty. \]

Also one has

\[ \lim_{\eta \to 0} \mu(\eta) = 0. \]

Proof of Proposition 2.1. The proof is based on the elementary separation of the variables. We put λ_D = λ_1 and λ_N = λ_2. The value N(λ_ε, A^{(ε)}(α)) is equal to the maximal dimension of the subspaces F on which

\[ \int_Y \big( |\nabla u|^2 + (p(x) - \lambda_\varepsilon) |u|^2 \big)\,dx < \alpha \int_Y V(|x|)\,|u|^2\,dx. \tag{6} \]

It is sufficient to consider subspaces F ⊂ C_0^∞(Y) (the case ε = D) and F ⊂ C^∞(Y) (the case ε = N). Let b(ε) be the same as in Sect. 1. We replace the function p in the definition of the operator A^{(ε)}(α) by b(ε) + λ_ε. Since b(D) + λ_D ≥ p(x) and b(N) + λ_N ≤ p(x), the difference

\[ N(\lambda_N, A^{(N)}(\alpha)) - N(\lambda_D, A^{(D)}(\alpha)) \tag{7} \]

can only increase. After the passage to the spherical coordinates and the substitutions r = e^t, u = v exp(−(d − 2)t/2) and p → b(ε) + λ_ε, the inequality (6) turns into

\[ \int_0^T\!\!\int_S \Big| \frac{\partial v}{\partial t} + \frac{2 - d}{2}\,v \Big|^2 d\theta\,dt + \int_0^T\!\!\int_S \big( |\nabla_\theta v|^2 + b(\varepsilon) e^{2t} |v|^2 \big)\,d\theta\,dt < \alpha \int_0^T\!\!\int_S e^{2t} V(e^t)\,|v|^2\,d\theta\,dt, \tag{8} \]

where T = (1/s) log(η^s α). Let F(D) = C_0^∞(0, T),
F(N ) = C ∞ [0, T ] and let Wj , j = 0, 1, . . . be the space of the spherical functions of order j on the sphere S. The subspaces, obtained by closing F(ε)Wj , diagonalize the quadratic forms in both parts of (8). Hence the operator (ε) A(ε) (α) is decomposed into the orthogonal sum of the operators Aj (α) operating in the corresponding “cells”. In the cell with the number j the inequality (8) is transformed into kj = dimWj copies of the inequalities for scalar functions 0
T
(d − 2) 2 (|y |2 − y + ( j + b(ε)e(2t) )|y|2 )dt 2 T <α e2t V (et )|y|2 dt, 0
y ∈ F ⊂ F(ε),
(9)
Discrete Spectrum
221
where j are the eigenvalues of the operator − θ on the sphere S. We denote by 8 = 8(α) the family of the numbers j for which (9) is fulfilled for ε = N at least for one y ∈ F(N ). Obviously the set {t ∈ [0, T ] : j + b(N )e2t ≤ αe2t V (et )} is not empty for j ∈ 8. Therefore for an arbitrary j ∈ 8,
j ≤ α supt∈[0,+∞) (e2t V (et )) + |b(N )|η2 α 2/s ,
j ∈ 8.
(10)
By the Weyl law for $\Lambda_j$ and $k_j$ we have the estimate
\[ \sum_{j\in\Phi} k_j \le C\bigl(\alpha^{\frac{d-1}{2}} + 1\bigr), \qquad 0 < \eta < 1. \tag{11} \]
Under the substitution of $F(N)$ by $C_0^\infty(0,T)$ the maximal dimension of subspaces $F$ on which (9) is fulfilled decreases at most by two. According to (11), under this procedure the difference (7) changes by a quantity of order $O(\alpha^{\frac{d-1}{2}})$ as $\alpha \to \infty$. For $F(\varepsilon) = C_0^\infty(0,T)$ the inequality (9) turns into
\[ \int_0^T \bigl(|y'|^2 + (\gamma_j + b(\varepsilon)e^{2t})|y|^2\bigr)\,dt < \alpha \int_0^T e^{2t}V(e^t)|y|^2\,dt, \qquad y \in F \subset C_0^\infty(0,T), \quad \gamma_j = \Lambda_j + \Bigl(\frac{d-2}{2}\Bigr)^2. \tag{12} \]
2.2. We denote by $Z(\varepsilon,\alpha)$ the set of numbers $j$ for which (12) is fulfilled for at least one $y \in C_0^\infty(0,T)$. For $j \in Z(\varepsilon,\alpha)$ we consider the set $\Sigma = \Sigma(\varepsilon,j,\alpha) := \{t \in [0,T] : \gamma_j + b(\varepsilon)e^{2t} \le \alpha e^{2t}V(e^t)\}$. Obviously $\Sigma \ne \emptyset$ for every $j \in Z(\varepsilon,\alpha)$. Let us clarify the structure of the set $\Sigma$. If a point $t \in (0,T)$ lies on the boundary of $\Sigma$, then
\[ \gamma_j + b(\varepsilon)e^{2t} = \alpha e^{2t}V(e^t). \tag{13} \]
For $e^t = r$ we get
\[ \gamma_j + b(\varepsilon)r^2 = \alpha r^2 V(r). \tag{14} \]
According to Rolle's theorem, between two solutions of Eq. (14) there exists a root of the equation
\[ 2b(\varepsilon)r = \alpha \frac{d}{dr}\bigl(r^2 V(r)\bigr). \tag{15} \]
According to Condition (1.1) the number of roots of Eq. (15) does not exceed a fixed number $l \in \mathbb{N}$. Therefore $\Sigma$ is a union of at most $l+2$ disjoint intervals (possibly degenerate to a point). Consequently, under the substitution of $F(\varepsilon) = C_0^\infty(0,T)$ by $C_0^\infty(\Sigma)$ the maximal dimension $N(\varepsilon,j,\alpha)$ of the subspaces $F$ on which (12) is fulfilled decreases at most by $2l+4$ (twice the number of intervals forming $\Sigma$). Since $Z(\varepsilon,\alpha) \subset \Phi$, according to (11) the value (7) changes only by $O(\alpha^{\frac{d-1}{2}})$ as $\alpha \to \infty$. Let $\Sigma = \cup_{k=1}^{l+2}\Sigma_k$, where $\Sigma_k = \Sigma_k(\varepsilon,j,\alpha)$ are disjoint intervals (possibly empty or degenerate to a point). For $F \subset C_0^\infty(\Sigma) = C_0^\infty(\Sigma_1) + \dots + C_0^\infty(\Sigma_{l+2})$ the inequality (12) decomposes into $l+2$ inequalities for the intervals $\Sigma_k$:
\[ \int_{\Sigma_k} \bigl(|y'|^2 + (\gamma_j + b(\varepsilon)e^{2t})|y|^2\bigr)\,dt < \alpha \int_{\Sigma_k} e^{2t}V(e^t)|y|^2\,dt, \qquad y \in F \subset C_0^\infty(\Sigma_k), \quad k = 1, \dots, l+2. \tag{16} \]
The maximal dimension $N_k = N_k(\varepsilon,j,\alpha)$ of the subspaces $F$ from (16) coincides with the total multiplicity of the negative spectrum of the operator
\[ -y'' + (\gamma_j + b(\varepsilon)e^{2t})y - \alpha e^{2t}V(e^t)y, \qquad y \in H^2(\Sigma_k) \cap H_0^1(\Sigma_k), \quad k = 1, \dots, l+2. \]

2.3. In the investigation of the spectrum of this operator we use the technique suggested in [3]. Let us put
\[ f = f(\varepsilon,t,\alpha) = \bigl(\alpha e^{2t}V(e^t) - \gamma_j - b(\varepsilon)e^{2t} + 1\bigr)^{1/2}, \qquad t \in \Sigma_k. \]

Proposition 2.2. For the quantity $N_k$ the following inequality holds:
\[ \Bigl|N_k(\varepsilon,j,\alpha) - \int_{\Sigma_k(\varepsilon,j,\alpha)} f(\varepsilon,t,\alpha)\,dt\Bigr| \le C(1 + \log(1+\alpha)), \tag{17} \]
where $C = C(\eta)$ does not depend on $j$ and $\alpha$.
Proof. Let $\Sigma_k = [R_1^{(k)}, R_2^{(k)}]$, $k = 1, \dots, l+2$. Let us employ the relation between the quantity $N_k$ and the properties of the solution of the problem
\[ -\varphi'' + (\gamma_j + b(\varepsilon)e^{2t})\varphi - \alpha e^{2t}V(e^t)\varphi = 0, \qquad \varphi(R_1^{(k)}) = 0, \quad \varphi'(R_1^{(k)}) = 1. \tag{18} \]
Namely, the quantity $N_k$ coincides (see [3]) with the number of zeros $\mu_k = \mu_k(\varepsilon,j,\alpha)$ of the function $\varphi$ lying strictly inside $\Sigma_k$. In order to estimate $\mu_k$ we represent $\varphi(t)$ in the polar form. We put
\[ \varphi(t) = \beta(t)\sin\xi(t), \qquad \varphi'(t) = f(t)\beta(t)\cos\xi(t). \]
Then it follows from (18) (see [3]) that
\[ \xi' = f + f'(2f)^{-1}\sin(2\xi) - f^{-1}\sin^2\xi, \qquad \xi(R_1^{(k)}) = 0. \tag{19} \]
We note also that $\beta(t) > 0$ for $t \in \Sigma_k$. Consequently the quantity $\mu_k$ coincides with the number of zeros of the function $\sin\xi(t)$. In other words, $\mu_k$ coincides with the number of roots of the equation $\xi(t) = 0 \pmod{\pi}$. In view of (19), $\xi' = f > 0$ if $\xi = \pi n$, $n \in \mathbb{Z}$. Therefore for a fixed $n \in \mathbb{N}$ the function $\xi(t)$ can take the value $\pi n$ only once. Thus
\[ \pi^{-1}\xi(R_2^{(k)}) - 1 \le \mu_k \le \pi^{-1}\xi(R_2^{(k)}). \]
Since $N_k = \mu_k$, according to (19) we obtain
\[ \Bigl|N_k - \pi^{-1}\int_{\Sigma_k} f\,dt\Bigr| \le (2\pi)^{-1}\int_{\Sigma_k} |f'/f|\,dt + \pi^{-1}|\Sigma_k| + 1, \qquad k = 1, \dots, l+2. \]
It is easy to see that the function $f'$ changes its sign on the interval $\Sigma_k$ not more than $l$ times. Therefore $\int_{\Sigma_k} |f'/f|\,dt \le 2(l+1) \max_{t\in\Sigma_k} |\log f|$. It remains to take into account that $1 \le f \le \bigl(\alpha \sup_{r>1}(r^2V(r)) + |b(N)|\eta^2\alpha^{2/s} + 1\bigr)^{1/2}$ for $t \in \Sigma_k$.
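The zero-counting step in the proof above can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it assumes the simplest case of a constant $f \equiv F$ on an interval $[0, L]$ (so that $f' = 0$), integrates the phase equation (19) with a hand-written Runge-Kutta scheme, and compares the number of multiples of $\pi$ crossed by $\xi$ with the zeros of the explicit solution $\varphi(t) = \sin(\omega t)/\omega$, $\omega = (F^2 - 1)^{1/2}$. The names `prufer_angle`, `F`, `L` and the step count are illustrative choices.

```python
import math

def prufer_angle(f, df, t0, t1, steps=20000):
    """Integrate xi' = f + f'/(2f) * sin(2 xi) - sin(xi)^2 / f, xi(t0) = 0, by classical RK4."""
    def rhs(t, xi):
        return f(t) + df(t) / (2.0 * f(t)) * math.sin(2.0 * xi) - math.sin(xi) ** 2 / f(t)

    h = (t1 - t0) / steps
    t, xi = t0, 0.0
    for _ in range(steps):
        k1 = rhs(t, xi)
        k2 = rhs(t + h / 2, xi + h * k1 / 2)
        k3 = rhs(t + h / 2, xi + h * k2 / 2)
        k4 = rhs(t + h, xi + h * k3)
        xi += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return xi

# Assumed toy data: constant f = F on [0, L], hence f' = 0.
F, L = 5.0, 10.0
xi_end = prufer_angle(lambda t: F, lambda t: 0.0, 0.0, L)

# Multiples of pi strictly between 0 and xi_end (xi_end is not a multiple of pi here).
zeros_prufer = math.ceil(xi_end / math.pi) - 1

# Exact zeros of phi(t) = sin(omega t) / omega strictly inside (0, L).
omega = math.sqrt(F * F - 1.0)
zeros_exact = math.ceil(omega * L / math.pi) - 1

# Deviation of the zero count from pi^{-1} * integral of f, and the bound from the proof.
deviation = abs(zeros_prufer - F * L / math.pi)
bound = L / math.pi + 1.0
print(zeros_prufer, zeros_exact, deviation, bound)
```

In this toy case the integral term with $f'/f$ vanishes, so the deviation is controlled by $\pi^{-1}|\Sigma_k| + 1$ alone.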
2.4. Let us set
\[ \Theta_j(\alpha) := \int_{\Sigma(N,j,\alpha)} f(N,t)\,dt - \int_{\Sigma(D,j,\alpha)} f(D,t)\,dt, \]
and $C_1 := (b(D) - b(N))^{1/2}$. The following statement gives an estimate of the quantity $\Theta_j(\alpha)$.

Proposition 2.3. For $\Theta_j(\alpha)$ the following inequality holds:
\[ \Theta_j(\alpha) \le C_1 \int_{\Sigma(N,j,\alpha)} e^t\,dt. \tag{20} \]

Proof. It follows from the definition of $f(\varepsilon,t)$ that $f(N,t)^2 = C_1^2 e^{2t} + f(D,t)^2$ for $t \in \Sigma(D,j,\alpha)$. Consequently
\[ f(N,t) - f(D,t) = \frac{C_1^2 e^{2t}}{f(N,t) + f(D,t)} \le \frac{C_1^2 e^{2t}}{f(N,t)} \le C_1 e^t, \qquad t \in \Sigma(D,j,\alpha). \]
Furthermore, the inequality $f(N,t) \le C_1 e^t$ is fulfilled for $t \in \Sigma(N,j,\alpha) \setminus \Sigma(D,j,\alpha)$, and $\Sigma(D,j,\alpha) \subset \Sigma(N,j,\alpha)$. Thus,
\[ \Theta_j(\alpha) = \int_{\Sigma(D,j,\alpha)} \bigl(f(N,t) - f(D,t)\bigr)\,dt + \int_{\Sigma(N,j,\alpha)\setminus\Sigma(D,j,\alpha)} f(N,t)\,dt \le C_1 \int_{\Sigma(N,j,\alpha)} e^t\,dt. \tag{21} \]
Above (see Subsects. 2.1, 2.2) we made transformations that either increase the difference (7) or change it by a value of order $o(\alpha^{d/s})$ as $\alpha \to \infty$. Finally this difference equals
\[ \sum_{j\in Z(N,\alpha)} k_j \sum_{k=1}^{l+2} \bigl(N_k(N,j,\alpha) - N_k(D,j,\alpha)\bigr). \tag{22} \]
According to (11) and (17) the value (22) coincides with the sum $\sum_{j\in Z(N,\alpha)} k_j\Theta_j(\alpha)$ up to an error of order $o(\alpha^{d/s})$ as $\alpha \to \infty$. Therefore
\[ N(\lambda_N, A^{(N)}(\alpha)) - N(\lambda_D, A^{(D)}(\alpha)) \le \sum_{j\in Z(N,\alpha)} k_j\Theta_j(\alpha) + o(\alpha^{d/s}), \qquad \alpha \to \infty. \tag{23} \]
At the next step of the proof we decompose $Z(N,\alpha)$ into subsets $X_n = X_n(\alpha)$. For each of these subsets we estimate the value $\sum_{j\in X_n} k_j\Theta_j(\alpha)$ separately. Then we sum these estimates over $n$. For $n \in \mathbb{N} \cup \{0\}$ and $\alpha > 0$ we define $X_n = X_n(\alpha)$ as the set of numbers $j$ for which
\[ n\alpha^{2/s} \le \Lambda_j < (n+1)\alpha^{2/s}. \tag{24} \]
Our immediate goal is to estimate the sums of the quantities $k_j = \dim W_j$ whose indices $j$ lie in $X_n(\alpha)$. For that we use the exact values
\[ \Lambda_j = j(j + d - 2), \qquad k_j = \dim W_j = C_{d+j-2}^{d-2} + C_{d+j-3}^{d-2}, \]
where $C_m^k$ denotes the binomial coefficient. From the last equation it follows that
\[ k_j \le C j^{d-2}, \qquad j \ge 1. \tag{25} \]
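The dimension formula and the bound (25) can be checked by direct computation. The sketch below is only an illustration under stated assumptions: it compares the two-binomial expression for $k_j$ with the standard difference-of-binomials form of the dimension of the space of spherical harmonics, and records the largest ratio $k_j / j^{d-2}$ over a finite range of $j$. The helper names and the tested ranges ($2 \le d \le 6$, $1 \le j \le 500$) are hypothetical choices, and the observed constant is an empirical value for that range, not a statement from the paper.

```python
from math import comb

def dim_spherical_harmonics(d: int, j: int) -> int:
    """k_j = dim W_j for d >= 2, j >= 1, via the two-binomial formula quoted in the text."""
    return comb(d + j - 2, d - 2) + comb(d + j - 3, d - 2)

def dim_via_difference(d: int, j: int) -> int:
    """Same dimension written as a difference of binomial coefficients."""
    return comb(d + j - 1, d - 1) - comb(d + j - 3, d - 1)

def worst_ratio(d: int, j_max: int = 500) -> float:
    """Largest value of k_j / j^(d-2) for 1 <= j <= j_max."""
    return max(dim_spherical_harmonics(d, j) / j ** (d - 2) for j in range(1, j_max + 1))

for d in range(2, 7):
    # Both expressions for the dimension should agree for every tested j.
    agree = all(dim_spherical_harmonics(d, j) == dim_via_difference(d, j) for j in range(1, 200))
    print(d, agree, worst_ratio(d))
```

For $d = 3$ this reproduces the familiar value $k_j = 2j + 1$.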
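Proposition 2.4. There exists a number $n_0 \in \mathbb{N}$ such that for any $n > n_0$ and $\alpha > 1$,
\[ \sum_{j\in X_n} k_j \le C\bigl(n^{(d-2)/2}\alpha^{(d-2)/s} + n^{(d-3)/2}\alpha^{(d-1)/s}\bigr), \tag{26} \]
where $C > 0$ does not depend on $n$ and $\alpha$.

Proof. Let $n \in \mathbb{N}$ and $\alpha > 1$. Then for $j \in X_n(\alpha)$,
\[ n\alpha^{2/s} \le \Lambda_j = j(j + d - 2) = j^2\bigl(1 + (d-2)/j\bigr) \le j^2(d-1). \]
Comparing the outermost parts we get
\[ j^2 \ge (d-1)^{-1} n\alpha^{2/s}, \qquad \forall\, j \in X_n(\alpha). \]
Therefore for $j \in X_n(\alpha)$,
\[ n\alpha^{2/s} \le \Lambda_j = j^2\bigl(1 + (d-2)/j\bigr) \le j^2\Bigl(1 + \frac{(d-2)(d-1)^{1/2}}{n^{1/2}\alpha^{1/s}}\Bigr). \]
Taking the square root of the outermost parts of this inequality we come to the estimate
\[ j \ge \Bigl(1 + \frac{(d-2)(d-1)^{1/2}}{n^{1/2}\alpha^{1/s}}\Bigr)^{-1/2} n^{1/2}\alpha^{1/s} =: \xi, \qquad \forall\, j \in X_n(\alpha). \]
Now let us derive an upper estimate for $j \in X_n(\alpha)$. Since for $j \in X_n(\alpha)$ we have $j^2 \le j(j + d - 2) = \Lambda_j \le (n+1)\alpha^{2/s}$, it follows that $j \le (n+1)^{1/2}\alpha^{1/s} =: \zeta$, $\forall\, j \in X_n(\alpha)$. According to (25) we get
\[ \sum_{j\in X_n} k_j \le C \sum_{\xi \le j \le \zeta} j^{d-2} \le C \int_\xi^{\zeta+1} x^{d-2}\,dx = C\Bigl(\bigl((n+1)^{1/2}\alpha^{1/s} + 1\bigr)^{d-1} - n^{(d-1)/2}\alpha^{(d-1)/s}\Bigl(1 + \frac{(d-2)(d-1)^{1/2}}{n^{1/2}\alpha^{1/s}}\Bigr)^{-(d-1)/2}\Bigr). \tag{27} \]
This leads to (26) for large $n > n_0$.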
2.5. Let $\eta$ be the same as in the statement of Proposition 2.1,
\[ v_+ := \sup_{r>1}\bigl(r^s V(r)\bigr), \qquad C_2 := |b(N)|. \]
In order to estimate a sum of the kind $\sum_{j\in X_n} k_j\Theta_j(\alpha)$ it is enough to estimate the integral on the right-hand side of (20) and to use (26). The investigation of the inequality (20) leads to the following result.

Proposition 2.5. Let $\eta > 0$ and
\[ n > \eta^2 C_2 + \eta^{-(s-2)}. \tag{28} \]
Then for $j \in X_n$,
\[ \Theta_j(\alpha) \le C_1 v_+^{1/(s-2)} \alpha^{1/s} (n - C_2\eta^2)^{-1/(s-2)}. \tag{29} \]

Proof. For $t \in \Sigma(N,j,\alpha)$ we have
\[ \alpha v_+ e^{-(s-2)t} - \Lambda_j + C_2\eta^2\alpha^{2/s} \ge \alpha e^{2t}V(e^t) - \Lambda_j - \Bigl(\frac{d-2}{2}\Bigr)^2 - b(N)e^{2t} \ge 0. \]
Let $j \in X_n$. Then $\Lambda_j \ge n\alpha^{2/s}$. Therefore
\[ \alpha v_+ e^{-(s-2)t} \ge n\alpha^{2/s} - C_2\eta^2\alpha^{2/s}, \qquad \forall\, t \in \Sigma(N,j,\alpha), \quad j \in X_n. \]
Thus, if $\Sigma(N,j,\alpha) \ne \emptyset$, then the number $\tilde T := \frac{1}{s}\log\alpha - \frac{1}{s-2}\log\bigl((n - C_2\eta^2)/v_+\bigr)$ is positive. Moreover, $\Sigma(N,j,\alpha) \subset [0, \tilde T]$. Therefore, replacing the set $\Sigma(N,j,\alpha)$ in (20) by $[0, \tilde T]$, we come to (29). In the case when $\Sigma(N,j,\alpha) = \emptyset$ the estimate (29) is fulfilled by the definition of the value $\Theta_j(\alpha)$.

Now let us sum the quantities $\sum_{j\in X_n} k_j\Theta_j(\alpha)$ over $n$. Namely, we consider the set $Y_0$, the union of all $X_n$ for which $X_n \cap Z(N,\alpha) \ne \emptyset$ and (28) is fulfilled. The following statement gives an estimate of the sum $\sum_{j\in Y_0} k_j\Theta_j(\alpha)$.
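Proposition 2.6. Under the conditions of Proposition 2.1 there exists a function $\mu_1 : (0,1) \to \mathbb{R}_+$ such that $\lim_{t\to 0}\mu_1(t) = 0$ and
\[ \sum_{j\in Y_0} k_j\Theta_j(\alpha) \le \mu_1(\eta)\alpha^{d/s} + o(\alpha^{d/s}), \qquad \alpha \to \infty, \quad 0 < \eta < 1. \]

Proof. If $X_n \subset Y_0$, then according to (28), $n \ge n_1 := [\eta^2 C_2 + \eta^{-(s-2)}] + 1$, and according to (10) and (24), $n \le n_2 := [v_+\alpha^{1-2/s} + C_2\eta^2]$. Furthermore, we multiply both parts of (29) by $k_j$ and sum over $j \in X_n$. Taking into account (26) we get
\[ \sum_{j\in X_n} k_j\Theta_j(\alpha) \le C\,\frac{n^{\frac{d-2}{2}}\alpha^{\frac{d-1}{s}} + n^{\frac{d-3}{2}}\alpha^{\frac{d}{s}}}{(n - C_2\eta^2)^{\frac{1}{s-2}}}. \]
Therefore
\[ \sum_{j\in Y_0} k_j\Theta_j(\alpha) \le C \sum_{n=n_1}^{n_2} \frac{n^{\frac{d-2}{2}}\alpha^{\frac{d-1}{s}} + n^{\frac{d-3}{2}}\alpha^{\frac{d}{s}}}{(n - C_2\eta^2)^{\frac{1}{s-2}}} \le C\alpha^{d/s} \sum_{n=n_1}^{\infty} \frac{n^{\frac{d-3}{2}}}{(n - C_2\eta^2)^{\frac{1}{s-2}}} + O\bigl(\alpha^{d/2-2/s}\bigr) \tag{30} \]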
as $\alpha \to \infty$. Since under the conditions of Proposition 2.1 we have $d/2 - 2/s < d/s$, it remains to note that $n_1 \to \infty$ when $\eta \to 0$.

2.6. Let us consider the set $Y := Z(N,\alpha) \setminus Y_0$. An analog of Proposition 2.6 for $Y$ is

Proposition 2.7. Under the conditions of Proposition 2.1 there exists a function $\mu_2 : (0,1) \to \mathbb{R}_+$ such that $\lim_{t\to 0}\mu_2(t) = 0$ and
\[ \sum_{j\in Y} k_j\Theta_j(\alpha) \le \mu_2(\eta)\alpha^{d/s} + o(\alpha^{d/s}), \qquad \alpha \to \infty, \quad 0 < \eta < 1. \]

Proof. If $X_n \cap Y \ne \emptyset$, then the inequality (28) is not fulfilled. Therefore according to (24),
\[ \Lambda_j < \bigl(1 + \eta^2 C_2 + \eta^{-(s-2)}\bigr)\alpha^{2/s} \]
for all $j \in Y$. In accordance with the Weyl law for $\Lambda_j$ and $k_j$ we get
\[ \sum_{j\in Y} k_j \le C\Bigl(1 + \alpha^{\frac{d-1}{s}}\bigl(1 + \eta^2 C_2 + \eta^{-(s-2)}\bigr)^{\frac{d-1}{2}}\Bigr). \tag{31} \]
Furthermore, since $\Sigma(N,j,\alpha) \subset [0,T]$, due to (20) we have $\Theta_j(\alpha) \le C\eta\alpha^{1/s}$. Let us multiply both parts of this inequality by $k_j$ and sum over $j \in Y$. Taking into account (31) we get
\[ \sum_{j\in Y} k_j\Theta_j(\alpha) \le C\eta\alpha^{1/s}\Bigl(1 + \alpha^{\frac{d-1}{s}}\bigl(1 + \eta^2 C_2 + \eta^{-(s-2)}\bigr)^{\frac{d-1}{2}}\Bigr). \]
This is exactly what we need.

Let us now complete the proof of Proposition 2.1. Since $Z(N,\alpha) = Y \cup Y_0$, it follows from Propositions 2.6 and 2.7 that
\[ \limsup_{\alpha\to\infty} \alpha^{-d/s} \sum_{j\in Z(N,\alpha)} k_j\Theta_j(\alpha) = \mu(\eta), \tag{32} \]
where $0 \le \mu(\eta) \le \mu_1(\eta) + \mu_2(\eta)$. Thus $\lim_{\eta\to 0}\mu(\eta) = 0$. Combining (23) with (32) we come to the statement of Proposition 2.1.
3. Proof of Theorem 1.1 3.1. In this section we prove the relation (6). Necessary auxiliary statements for this are given in Subsects. 3.1 and 3.2. In Subsect. 3.3 the proof of Theorem 1.1 is completed. Here we follow the way of the article [2] but additionally we use Proposition 2.1. For λ ∈ and α > 0 we define a quantity N0 (λ, A(α)) as a number of eigenvalues of the operator A(t) which cross the point λ with t growing from zero to α (excluding α). It is clear that for λ1 , λ2 ∈ , λ1 < λ2 , rankEA(α) [λ1 , λ2 ) = N0 (λ2 , A(α)) − N0 (λ1 , A(α)). Therefore it is enough to obtain the asymptotics of the right-hand side, as α → ∞. The following statement plays an important role in the proof of Theorem 1.1. Theorem 3.1. Let λ ∈ and ε > 0. Assume that λ − ε, λ + ε ∈ . Then there exist positive numbers r0 , α0 and c such that for r > r0 , α > α0 , the following inequalities are fulfilled: N0 (λ, A(α)) ≤ N ((λ + ε) + 0, A(N ) (α, α 1/s Br )) − N (λ + ε, A(N ) (0, α 1/s Br )) + cr d−1 α (d−1)/s , N0 (λ, A(α)) ≥ N (λ − ε, A(D) (α, α 1/s Br )) − N (λ − ε, A(D) (0, α 1/s Br )) − cr d−1 α (d−1)/s ,
(33)
(34)
where $c$ does not depend on $r$ and $\alpha$. We postpone the proof of Theorem 3.1 until Subsect. 3.4. Actually all the ideas of the proof are given in [2]. In Sect. 4 we reproduce them in the exact form. Below we use the following notation: $\|p\|_\infty = \sup_{x\in\mathbb{R}^d} |p(x)|$, $v_+ = \sup_{r>1} r^s V(r)$. For $\Omega \subset \mathbb{R}^d$ we put $L(\Omega) = \inf_{x\in\Omega} |x|$. For $L(\Omega) > 0$, $\lambda \in \mathbb{R}$ we introduce the quantity $g = g(\lambda,\Omega) = \|p\|_\infty + v_+ L(\Omega)^{-s} + |\lambda|$. The following proposition gives an asymptotic estimate for the quantity $N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\Omega))$ as $\alpha \to \infty$.

Proposition 3.1. Let $\Omega \subset \mathbb{R}^d$ be a domain with a piecewise smooth boundary. Assume that $L(\Omega) > 0$. Then for any $\lambda \in \mathbb{R}$,
\[ \limsup_{\alpha\to\infty} \alpha^{-d/s} N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\Omega)) \le (2\pi)^{-d}\omega_d\, g^{d/2}\, \mathrm{mes}(\Omega). \tag{35} \]

Proof. For $x \in \alpha^{1/s}\Omega$ we have the estimate $\alpha V(|x|) \le v_+ L(\Omega)^{-s}$. Thus $\alpha V_0(x) + \lambda - p(x) \le g$, $x \in \alpha^{1/s}\Omega$. Then it follows from the variational principle that $N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\Omega))$ does not exceed the total multiplicity of the negative spectrum of the operator $-\Delta - gI$ in $L_2(\alpha^{1/s}\Omega)$. Let us substitute $x = \alpha^{1/s}y$, $y \in \Omega$. As a result we obtain that $N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\Omega))$ does not exceed the total multiplicity of the negative spectrum of the operator $-\Delta - \alpha^{2/s}gI$ in $L_2(\Omega)$ (with appropriate boundary conditions). Now the inequality (35) follows from the Weyl law for the eigenvalues of the Laplace operator on a domain with a piecewise smooth boundary.

We fix $n \in \mathbb{Z}^d$ and $\nu > 0$. Let us consider the operator $A^{(\varepsilon)}(\alpha, \alpha^{1/s}\nu Q_n)$, where
\[ Q_n = Q + n, \qquad Q = [0,1)^d. \]
Let $x_n^{(N)}$ be the nearest and $x_n^{(D)}$ be the most remote (with respect to the origin) point of the cube $Q_n$.
Proposition 3.2. Let $\nu > 0$, $n \in \mathbb{Z}^d$ and $\lambda \in \mathbb{R}$. Assume that $|x_n^{(N)}| > 0$. Then
\[ \limsup_{\alpha\to\infty} \alpha^{-d/s} N(\lambda, A^{(N)}(\alpha, \alpha^{1/s}\nu Q_n)) \le \nu^d \rho\bigl(\lambda + v_a(\nu|x_n^{(N)}|)^{-s}\bigr), \tag{36} \]
\[ \liminf_{\alpha\to\infty} \alpha^{-d/s} N(\lambda, A^{(D)}(\alpha, \alpha^{1/s}\nu Q_n)) \ge \nu^d \rho\bigl(\lambda + v_a(\nu|x_n^{(D)}|)^{-s}\bigr). \tag{37} \]

Proof. First of all, let us assume that $V(r) = v_a r^{-s}$, $r > 1$. Then for $x \in \alpha^{1/s}\nu Q_n$, $|x| > 1$, we have
\[ v_a(\nu|x_n^{(D)}|)^{-s} \le \alpha V(|x|) \le v_a(\nu|x_n^{(N)}|)^{-s}. \]
After the replacement of $\alpha V_0(x) = \alpha V(|x|)$ by $v_a(\nu|x_n^{(N)}|)^{-s}$ (by $v_a(\nu|x_n^{(D)}|)^{-s}$), the value $N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\nu Q_n))$ can only increase (decrease). Thus the quantity $N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\nu Q_n))$ is estimated from above (the case $\varepsilon = N$) or from below (the case $\varepsilon = D$) by $M(\varepsilon,\alpha)$, which equals the total multiplicity of the negative spectrum of the operator $-\Delta + p - (\lambda + v_a(\nu|x_n^{(\varepsilon)}|)^{-s})I$ in $L_2(\alpha^{1/s}\nu Q_n)$ with $\varepsilon$-conditions on $\alpha^{1/s}\nu\,\partial Q_n$. According to the definition of the quantity $\rho(\tilde\lambda)$, $\tilde\lambda \in \mathbb{R}$,
\[ \lim_{\alpha\to\infty} \alpha^{-d/s} M(\varepsilon,\alpha) = \nu^d \rho\bigl(\lambda + v_a(\nu|x_n^{(\varepsilon)}|)^{-s}\bigr). \]
This proves (36), (37) for $V(r) = v_a r^{-s}$. In the general case, for any $\mu > 0$ there exists $\alpha_0 > 0$ such that for $\alpha > \alpha_0$,
\[ (v_a - \mu)|x|^{-s} \le V(|x|) \le (v_a + \mu)|x|^{-s}, \qquad \forall\, x \in \alpha^{1/s}\nu Q_n. \]
For the potentials in the outermost parts of this inequality the estimates of the kind (36), (37) are fulfilled with $v_a$ replaced by $v_a \pm \mu$ in the right-hand sides. It remains to let $\mu \to 0$.

The following statement gives the asymptotics of the spectral distribution function of the operator on a spherical layer that grows with $\alpha$.

Proposition 3.3. Let $0 < \delta_1 < \delta_2 < \infty$, $\Omega = \{x \in \mathbb{R}^d : \delta_1 < |x| < \delta_2\}$ and $\lambda \in \mathbb{R}$. Then
\[ \lim_{\alpha\to\infty} \alpha^{-d/s} N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\Omega)) = \int_{\delta_1<|x|<\delta_2} \rho(\lambda + v_a|x|^{-s})\,dx. \]
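Proof. We fix $\nu > 0$. Let $J$ be the set of all $n \in \mathbb{Z}^d$ for which $\nu Q_n \subset \Omega$, and let $\Omega_1 = \Omega \setminus \cup_{n\in J}(\nu Q_n)$. Then
\[ N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\Omega)) \le N(\lambda, A^{(N)}(\alpha, \alpha^{1/s}\Omega_1)) + \sum_{n\in J} N(\lambda, A^{(N)}(\alpha, \alpha^{1/s}\nu Q_n)). \tag{38} \]
Using Propositions 3.1 and 3.2 we obtain the following estimate:
\[ \limsup_{\alpha\to\infty} \alpha^{-d/s} N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\Omega)) \le C\,\mathrm{mes}(\Omega_1) + \sum_{n\in J} \nu^d \rho\bigl(\lambda + v_a(\nu|x_n^{(N)}|)^{-s}\bigr). \tag{39} \]
Let us pass to the limit as $\nu \to 0$. Obviously,
\[ \mathrm{mes}(\Omega_1) \to 0, \qquad \sum_{n\in J} \nu^d \rho\bigl(\lambda + v_a(\nu|x_n^{(N)}|)^{-s}\bigr) \to \int_{\delta_1<|x|<\delta_2} \rho(\lambda + v_a|x|^{-s})\,dx \]
as $\nu \to 0$. Therefore,
\[ \limsup_{\alpha\to\infty} \alpha^{-d/s} N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\Omega)) \le \int_{\delta_1<|x|<\delta_2} \rho(\lambda + v_a|x|^{-s})\,dx. \]
In the same way we obtain the lower estimate:
\[ \liminf_{\alpha\to\infty} \alpha^{-d/s} N(\lambda, A^{(\varepsilon)}(\alpha, \alpha^{1/s}\Omega)) \ge \int_{\delta_1<|x|<\delta_2} \rho(\lambda + v_a|x|^{-s})\,dx. \]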
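Let $\Omega \subset \mathbb{R}^d$ be a domain with a piecewise smooth boundary and $\lambda_j \in \mathbb{R}$, $j = 1, 2$. We put
\[ n(\lambda_1, \lambda_2, \Omega, \alpha) = N(\lambda_2, A^{(N)}(\alpha, \alpha^{1/s}\Omega)) - N(\lambda_1, A^{(D)}(\alpha, \alpha^{1/s}\Omega)), \qquad \alpha > 0. \]
Obviously, if $\Omega = \Omega_1 \cup \Omega_2$, then the following inequalities are fulfilled:
\[ n(\lambda_1, \lambda_2, \Omega, \alpha) \ge -n(\lambda_2, \lambda_1, \Omega_j, \alpha), \qquad j = 1, 2, \tag{40} \]
\[ n(\lambda_1, \lambda_2, \Omega, \alpha) \le n(\lambda_1, \lambda_2, \Omega_1, \alpha) + n(\lambda_1, \lambda_2, \Omega_2, \alpha). \tag{41} \]

Proposition 3.4. Let $r > 0$ and $\lambda_j \in \mathbb{R}$, $j = 1, 2$. Then
\[ \lim_{\alpha\to\infty} \alpha^{-d/s} n(\lambda_1, \lambda_2, B_r, \alpha) = \int_{|x|<r} \bigl(\rho(\lambda_2 + v_a|x|^{-s}) - \rho(\lambda_1 + v_a|x|^{-s})\bigr)\,dx. \tag{42} \]

Proof. Let us take $\Omega = B_r$, $\Omega_1 = B_\eta$, $0 < \eta < r$, and $\Omega_2 = \Omega \setminus \Omega_1$ in (41). Now we multiply the obtained inequality by $\alpha^{-d/s}$ and pass to the upper limit as $\alpha \to \infty$. According to Propositions 2.1 and 3.3 we get
\[ \limsup_{\alpha\to\infty} \alpha^{-d/s} n(\lambda_1, \lambda_2, \Omega, \alpha) \le \mu(\eta) + \int_{\eta<|x|<r} \bigl(\rho(\lambda_2 + v_a|x|^{-s}) - \rho(\lambda_1 + v_a|x|^{-s})\bigr)\,dx. \tag{43} \]
Taking $\eta \to 0$ we come to the following estimate:
\[ \limsup_{\alpha\to\infty} \alpha^{-d/s} n(\lambda_1, \lambda_2, \Omega, \alpha) \le \int_{|x|<r} \bigl(\rho(\lambda_2 + v_a|x|^{-s}) - \rho(\lambda_1 + v_a|x|^{-s})\bigr)\,dx. \tag{44} \]
In the same way we obtain the lower estimate from Eq. (40),
\[ \liminf_{\alpha\to\infty} \alpha^{-d/s} n(\lambda_1, \lambda_2, \Omega, \alpha) \ge \int_{|x|<r} \bigl(\rho(\lambda_2 + v_a|x|^{-s}) - \rho(\lambda_1 + v_a|x|^{-s})\bigr)\,dx. \tag{45} \]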
Let us combine Proposition 3.1 and 3.4. We get the following result. Proposition 3.5. Let λj ∈ , j = 1, 2, and ε > 0. Assume that λj + ε ∈ , λj − ε ∈
, j = 1, 2. Then there exists r > 0 such that lim supα→∞ α −d/s (N0 (λ2 , A(α)) − N0 (λ1 , A(α))) (ρ(λ2 + ε + va |x|−s ) − ρ(λ1 − ε + va |x|−s ))dx, ≤
(46)
lim inf α→∞ α −d/s (N0 (λ2 , A(α)) − N0 (λ1 , A(α))) (ρ(λ2 − ε + va |x|−s ) − ρ(λ1 + ε + va |x|−s ))dx. ≥
(47)
|x|
|x|
Proof. The function ρ(λ) is constant for λ ∈ . Therefore if we put va = 0 in (42), then we get lim α −d/s (N (λ2 ±ε, A(0, α 1/s Br )) − N (λ1 ±ε, A(0, α 1/s Br )))
α→∞
= mes(Br )(ρ(λ2 ±ε) − ρ(λ1 ±ε)) = 0.
(48)
We subtract the inequality (34) with λ = λ1 from (33) with λ = λ2 . We multiply the result by α −d/s and pass to the upper limit at α → ∞. According to (42) and (48) we obtain lim supα→∞ α −d/s (N0 (λ2 , A(α)) − N0 (λ1 , A(α))) ≤ lim supα→∞ α −d/s n(λ1 − ε, (λ2 + ε) + 0, Br , α) (49) −s −s (ρ(λ2 + ε + va |x| ) − ρ(λ1 − ε + va |x| ))dx. = |x|
This coincides with (46). Inequality (47) can be obtained in the same way. We just have to subtract (33) with λ = λ1 from (34) with λ = λ2 . Let us complete the proof of Theorem 1.1. In (46) and (47) we pass to the limit as r → ∞. Then we take ε → 0. The passage to the limit as ε → 0 under the integral is valid since the integrated functions in (46), (47) have a integrable majorant for ε < ε0 . As a result we get lim supα→∞ α −d/s (N0 (λ2 , A(α)) − N0 (λ1 , A(α))) ≤ (ρ(λ2 + va |x|−s ) − ρ(λ1 + va |x|−s ))dx,
(50)
lim inf α→∞ α −d/s (N0 (λ2 , A(α)) − N0 (λ1 , A(α))) ≥ (ρ(λ2 + va |x|−s ) − ρ(λ1 + va |x|−s ))dx.
(51)
It remains to note that rankEA(α) (λ1 , λ2 ) = N0 (λ2 , A(α)) − N0 (λ1 , A(α)) for λ1 ∈ ρ(A(α)). 4. Proof of Theorem 3.1 4.1. We fix an interval [a, b] satisfying the following conditions: λ − ε, λ + ε ∈ (a, b), [a, b] ⊂ .
(52)
Obviously there exist a , b ∈ R such that a < a, b > b, [a , b ] ⊂ .
(53)
Let the operators $A^{(\varepsilon)}(\alpha, \Omega)$ be the same as in Sect. 1. We put $I_r = E_{A^{(D)}(0,B_r)}(a', b')$, $r > 0$. Also let the function $\psi \in C^\infty(\mathbb{R}^d)$ satisfy the following conditions:
\[ 0 \le \psi(x) \le 1, \qquad \psi(x) = \begin{cases} 0 & \text{for } |x| < 1/2, \\ 1 & \text{for } |x| > 3/4. \end{cases} \]
We denote by $K_r$ the operator of multiplication by the function $\psi_r(x) = \psi(x/r)$, $r > 0$, in $L_2(\mathbb{R}^d)$ and we put
\[ \tilde A_r = A^{(D)}(0, B_r) + (b' - a')K_r I_r K_r, \qquad r > 0. \tag{54} \]
In [1] the following statement concerning $\tilde A_r$ is proved.

Proposition 4.1. Let the conditions (52), (53) be fulfilled and let $\tilde A_r$ be defined as in (54). Then there exists $\tilde r_0 > 0$ such that
\[ [a, b] \subset \rho(\tilde A_r), \qquad \forall\, r > \tilde r_0. \]
We put
\[ \tilde A_r(\alpha) = \tilde A_r - \alpha V_0. \tag{55} \]
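The next statement allows us to approximate the quantity $N_0(\lambda, A(\alpha))$ by similar quantities for operators on a ball $B_r$, $r > 0$. We refer to [2] for its proof.

Proposition 4.2. Let $\lambda \in (a, b)$ and let the conditions (52), (53) be fulfilled. Assume that the operator $\tilde A_r(\alpha)$ in (55) is built from $\tilde A_r$ defined in (54). Then
\[ N_0(\lambda, A(\alpha)) \ge \limsup_{r\to\infty} \bigl(N(\lambda, \tilde A_r(\alpha')) - N(\lambda, \tilde A_r)\bigr), \qquad 0 < \alpha' < \alpha; \tag{56} \]
\[ N_0(\lambda, A(\alpha)) \le \liminf_{r\to\infty} \bigl(N(\lambda, \tilde A_r(\alpha')) - N(\lambda, \tilde A_r)\bigr), \qquad 0 < \alpha < \alpha'. \tag{57} \]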
We introduce the selfadjoint operators $A_{R,r}^{\varepsilon_1,\varepsilon_2} = -\Delta + p$ in $L_2(B_r \setminus B_R)$, $r > R > 0$, where the indices $\varepsilon_1$ and $\varepsilon_2$, which take the values $D$ or $N$, characterize the boundary conditions (Dirichlet or Neumann) on $\partial B_R$ and $\partial B_r$ respectively. We also put $\tilde A_{R,r}^{\varepsilon_1,\varepsilon_2} = A_{R,r}^{\varepsilon_1,\varepsilon_2} + (b' - a')K_r I_r K_r$, $r > R > 0$. The next statement is contained in Lemma 3.2 and Lemma 4.2 of [2].

Proposition 4.3. Let $\tilde r_0$ be as in Proposition 4.1 and $\lambda \in (a, b)$. Then for $r > \tilde r_0$, $1 \le R \le r/2$ the following inequalities hold:
\[ N(\lambda, A^{(D)}(0, B_R)) + N(\lambda, \tilde A_{R,r}^{D,D}) \ge N(\lambda, \tilde A_r) - C_1 R^{d-1}, \qquad C_1 = C_1(\lambda), \tag{58} \]
\[ N(\lambda, A^{(N)}(0, B_R)) + N(\lambda, \tilde A_{R,r}^{N,D}) \le N(\lambda, \tilde A_r) + C_2 R^{d-1}, \qquad C_2 = C_2(\lambda), \tag{59} \]
where $C_1$ and $C_2$ do not depend on $r$ and $R$.

4.2. Now let us complete the proof of Theorem 3.1. First of all we establish the relation (33). Let $\alpha' > \alpha$ and $r_0 = (v_+/\varepsilon)^{1/s}$. We put $R = m\alpha'^{1/s}$, $m > r_0$. We note that $\alpha' V_0(x) < \varepsilon$ for $|x| > R$. Therefore from the variational principle it follows that
\[ N(\lambda, \tilde A_r(\alpha')) \le N(\lambda + \varepsilon, A^{(N)}(\alpha', B_R)) + N(\lambda + \varepsilon, \tilde A_{R,r}^{N,D}), \qquad R < r. \tag{60} \]
Since $\lambda + \varepsilon \in (a, b)$, we obtain from (59) that
\[ N(\lambda, \tilde A_r) = N(\lambda + \varepsilon, \tilde A_r) \ge N(\lambda + \varepsilon, A^{(N)}(0, B_R)) + N(\lambda + \varepsilon, \tilde A_{R,r}^{N,D}) - C_2 R^{d-1}, \tag{61} \]
where $C_2 = C_2(\lambda + \varepsilon)$, $r > \tilde r_0$, $1 \le R \le r/2$. Estimating the right-hand side of (57) with the help of (60), (61) we get
\[ N_0(\lambda, A(\alpha)) \le N(\lambda + \varepsilon, A^{(N)}(\alpha', B_R)) - N(\lambda + \varepsilon, A^{(N)}(0, B_R)) + C_2 R^{d-1}, \qquad R = m\alpha'^{1/s} \ge 1. \tag{62} \]
Taking $\alpha' \to \alpha$ we come to the upper estimate
\[ N_0(\lambda, A(\alpha)) \le N((\lambda + \varepsilon) + 0, A^{(N)}(\alpha, B_R)) - N(\lambda + \varepsilon, A^{(N)}(0, B_R)) + C_2 R^{d-1}, \qquad R = m\alpha^{1/s}, \tag{63} \]
which coincides with (33) for $r = m$ and $c = C_2$. Moreover, (63) holds for all $m > r_0$, $\alpha > \alpha_0 = r_0^{-s}$. The relation (34) can be obtained in the same way from (56) and (58).

References

1. Alama, S., Deift, P.A., Hempel, R.: Eigenvalue branches of the Schrödinger operator H − λW in a gap of σ(H). Commun. Math. Phys. 121, 291–321 (1989)
2. Hempel, R.: Eigenvalues in gaps and decoupling by Neumann boundary conditions. J. Math. Anal. Appl. 169, no. 1, 229–259 (1992)
3. Sobolev, A.: Weyl asymptotics for the discrete spectrum of the perturbed Hill operator. Adv. Sov. Math. 7, 159–178 (1991)

Communicated by B. Simon
Commun. Math. Phys. 218, 233 – 244 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Vacuum Nodes and Anomalies in Quantum Theories

M. Aguado, M. Asorey, J. G. Esteve

Departamento de Física Teórica, Facultad de Ciencias, Universidad de Zaragoza, 50009 Zaragoza, Spain. E-mail: [email protected]

Received: 6 April 1999 / Accepted: 21 October 2000
Abstract: We show that nodal points of ground states of some quantum systems with magnetic interactions can be identified in simple geometric terms. We analyse in detail two different archetypical systems: i) the planar rotor with a non-trivial magnetic flux and ii) the Hall effect on a torus. In the case of the planar rotor we show that the level repulsion generated by any reflection invariant potential $V$ is encoded in the nodal structure of the unique vacuum for $\theta = \pi$. In the second case we prove that the nodes of the first Landau level for unit magnetic charge appear at the crossing of the two non-contractible circles $\alpha_-$, $\beta_-$ with holonomies $h_{\alpha_-}(A) = h_{\beta_-}(A) = -1$ for any reflection invariant potential $V$. This property illustrates the geometric origin of the quantum translation anomaly.

1. Introduction

Classical configurations play different roles in the description of quantum effects. Monopoles, skyrmions, vortices, solitons, kinks and similar classical configurations help to unveil the existence of non-trivial sectors in the energy spectrum of many quantum theories. Classical configurations are also important for the description of superselection sectors and non-trivial phase structures in quantum field theories. The tunnel effect is semiclassically described by means of instantons. There is, however, another kind of classical configuration which plays a genuine quantum role in the description of some physical effects: the nodal configurations of physical states. It is known that the structure of those nodes encodes information about relevant physical properties such as the complete integrability or chaotic behaviour of the corresponding systems. Standard minimum principle arguments disfavor the appearance of nodes in ground states of quantum systems. However, the presence of CP violating interactions invalidates the use of such arguments, and the vacuum response to this kind of interaction may involve the appearance of nodes.

The existence of a non-trivial nodal structure in the vacuum states of quantum theories with CP violating interactions provides a new perspective in the analysis of the role of classical configurations in quantum theory. In some cases, the infrared behaviour of the theory is so dramatically modified by the CP violating interaction that a confining vacuum state can become non-confining. The connection between the absence of confinement and the existence of nodes in the vacuum state suggests that new classical field configurations related to the nodal structure of the quantum vacuum emerge as new candidates to play a significant role in the mechanism of confinement. This idea has been successfully exploited to show the absence of spontaneous breaking of CP symmetry at $\theta = \pi$ for various field theories [6, 7].

In this paper we analyse the connection between the appearance of nodes in ground states of quantum systems generated by CP violating interactions and some non-perturbative quantum effects. In particular, we analyse in some detail the vacuum nodal structure of the quantum planar rotor with a $\theta = \pi$ term and the quantum Hall effect on a torus. In both cases the vacuum nodal structure turns out to be intimately related to the behaviour of the corresponding ground states under CP symmetry or translation symmetry.

In order to understand the physical origin of vacuum nodes, let us briefly recall the standard argument which prevents the vanishing of the vacuum. Let us consider a quantum system evolving on a finite dimensional Riemannian manifold $(M, g)$ with Hamiltonian
\[ H = \frac{1}{2}\Delta + V(x), \tag{1.1} \]
defined by the Laplace–Beltrami operator $\Delta$ and a non-singular potential $V(x)$. Unitarity of quantum evolution requires the potential $V(x)$ to be real, $V(x) = V(x)^*$, to guarantee the hermiticity of $H$. In this case the system is invariant under time reversal, $U(T)\psi(x,t) = \psi(x,-t)^*$, because $[U(T), H] = 0$. This symmetry implies that for any energy level there is a basis of real stationary states, $\psi_n(x) = \psi_n(x)^*$. Indeed, if $\psi_n(x)$ is an eigenstate of $H$ with energy $E_n$, the state $\psi_n(x)^*$ is also an eigenstate with the same energy. If $\psi_n$ is not real, $\psi_n^* \ne \psi_n$, the states $\psi_\pm = \psi_n^* \pm \psi_n$ will have the same energy $E_n$ and will be real, irrespective of the degeneracy or not of the energy level.

If $H$ is bounded below and has a non-trivial discrete spectrum, there is a ground state $\psi_0$ whose energy attains the minimum $E_0$ of the energy spectrum. Because $V$ has no singularities on $M$ it is easy to see that $\psi_0$ cannot vanish at any point $x$ of the configuration space. Indeed, $\psi_0$ satisfies the stationary equation $H\psi_0 = E_0\psi_0$. If the set of nodal points of $\psi_0$, $N_0 = \{x \in M;\ \psi_0(x) = 0\}$, is non-trivial, the positive real function $\psi_1 = |\psi_0|$ is smooth everywhere except at the points of $N_0$. The expectation value of the Hamiltonian on the state $\psi_1$ is again $E_0$, because the delta function singularity of $\psi_1$ at $N_0$ is cancelled by the vanishing of $\psi_1$ at that point, $E_0 = \langle\psi_0|H\psi_0\rangle = \langle\psi_1|H\psi_1\rangle$; e.g. in one dimension, if there is a nodal point, $N_0 = \{x_*\}$, we have
\[ \Bigl\langle\psi_0\Bigm|\frac{d^2}{dx^2}\psi_0\Bigr\rangle = \Bigl\langle\psi_1\Bigm|\frac{d^2}{dx^2}\psi_1\Bigr\rangle - 2\int_{-\infty}^{\infty} \psi_0(x)^*\,\delta(x - x_*)\,\frac{d}{dx}\psi_0(x)\,dx = \Bigl\langle\psi_1\Bigm|\frac{d^2}{dx^2}\psi_1\Bigr\rangle. \]
Since $E_0$ is the lowest eigenvalue of $H$ this means that $\psi_1$ is also an eigenstate of $H$ with the same energy as $\psi_0$. Now, elliptic regularity implies that any eigenstate of $H$ must be smooth, thus $\psi_1$ cannot be an eigenstate because its differential is discontinuous at $N_0$. The contradiction, being caused by the assumption of the existence of nodes, disappears if $\psi_0(x) \ne 0$ for any $x \in M$.

The same argument leads to the proof of vacuum uniqueness. If the vacuum were degenerate, there would exist another ground state $\psi_1 \ne \psi_0$. Then, the ground state defined by $\chi(x) = \psi_0(x_*)\psi_1(x) - \psi_1(x_*)\psi_0(x)$ would vanish for $x = x_*$, which cannot occur by the previous argument. Both results rely heavily on the real, local and smooth characteristics of the potential $V$. Exceptions to this archetypical infrared behaviour of quantum systems can arise either by the introduction of internal degrees of freedom (e.g. spin), singular or nonlocal potentials, or complex interactions. Complex interactions are physically generated by the presence of magnetic fields. The interaction with the magnetic gauge field potential $A$ through the gauge principle of minimal coupling leads to a Hamiltonian
\[ H_A = \frac{1}{2}\Delta_A + V(x), \tag{1.2} \]
which is not invariant under time reversal, $U(T)H_A U(T) = H_{-A}$. The eigenstates are not necessarily real functions and the rest of the argument leading to the absence of nodes and uniqueness of the vacuum state fails.

2. The Planar Rotor

Let us consider the case of a charged particle moving on a circle under the action of a periodic potential $V(\varphi)$ and a non-trivial magnetic flux $\Phi = 2\pi\epsilon$ crossing through the circle. In this case, $M = S^1$ and
\[ H_\epsilon = -\frac{1}{2}(\partial_\varphi - i\epsilon)^2 + V(\varphi), \tag{2.1} \]
where $\varphi \in [-\pi, \pi)$ is the angular coordinate of the circle, and we assume that the mass and charge of the particle are $m = e = 1$.

Proposition 2.1. If the potential $V$ is reflection invariant, $V(\varphi) = V(-\varphi)$, the matrix element $K_T^\epsilon(\varphi_0, \varphi_1) = \langle\varphi_0|e^{-TH_\epsilon}|\varphi_1\rangle$ of the heat kernel operator vanishes for $\epsilon = 1/2$ when $\varphi_0 = 0$ and $\varphi_1 = \pi$, i.e.
\[ K_T^{1/2}(0, \pi) = 0. \]

Proof. In such a case the Hamiltonian (2.1) is invariant under the Bragg reflection symmetry $U(P)\psi(\varphi) = \psi(-\varphi)$ and it is always possible to find in the Hilbert space $\mathcal{H} = L^2(S^1)$ a complete basis of stationary states with definite $U(P)$ symmetry. If the energy level is not degenerate, the corresponding physical state $\psi(\varphi)$ has to be either even or odd under $U(P)$ symmetry. In the degenerate case, if $U(P)\psi$ is not the same state as $\psi$, the stationary functionals $\psi_\pm = \psi \pm U(P)\psi$ are parity even/odd, respectively. This implies that the kernel element $K_T^{1/2}(0, \pi)$ is reflection invariant,
\[ U(P)^\dagger K_T^{1/2} U(P)(0, \pi) = \sum_n U(P)\psi_n(0)^*\, U(P)\psi_n(\pi)\, e^{-E_n T} = K_T^{1/2}(0, \pi). \tag{2.2} \]
On the other hand, in the path integral representation
\[ K_T^\epsilon(\varphi_0, \varphi_1) = \int_{\varphi(0)=\varphi_0,\ \varphi(T)=\varphi_1} \delta\varphi\, \exp\Bigl\{-\int_0^T dt\,\Bigl[\tfrac{1}{2}\dot\varphi(t)^2 + i\epsilon\dot\varphi(t) + V(\varphi(t))\Bigr]\Bigr\}, \tag{2.3} \]
we have that
\[ U(P)^\dagger K_T^\epsilon U(P)(0, \pi) = K_T^{-\epsilon}(0, \pi) \tag{2.4} \]
because the $P$ transformation leaves the points $\varphi = 0$ and $\varphi = \pi$ invariant but changes the sign of the $\epsilon$-term in the exponent of the path integral, since it reverses the orientation of every path. The contribution of this term becomes $-2\pi i\epsilon(1/2 + n)$ instead of $2\pi i\epsilon(1/2 + n)$ for any trajectory $\varphi(t)$ in $S^1$ connecting $\varphi = 0$ with $\varphi = \pi$ with winding number $n$. Thus, the kernel element $K_T^\epsilon(0, \pi)$ is not invariant under reflection symmetry, unless $\epsilon = 0$ (mod $\mathbb{Z}$). In particular for $\epsilon = 1/2$, the kernel element $K_T^{1/2}(0, \pi)$ is parity odd and purely imaginary,
\[ U(P)^\dagger K_T^{1/2} U(P)(0, \pi) = K_T^{-1/2}(0, \pi) = K_T^{1/2}(0, \pi)^* = -K_T^{1/2}(0, \pi). \]
This is in disagreement with (2.2) unless the kernel element vanishes for those points, K_T^{1/2}(0, π) = 0. This property is independent of the potential term V and of the value of T. In particular, it implies that the same vanishing property holds for the restriction of K_T^{1/2}(0, π) to any energy level, e.g. the ground state. If the vacuum is non-degenerate it has to vanish either at ϕ = π or at ϕ = 0 for this particular value Φ = π of the magnetic flux (ε = 1/2).

This property of the heat kernel can also be understood in the Hamiltonian formalism. The presence of the magnetic flux has a non-trivial effect on the energy spectrum of the theory (Aharonov–Bohm effect) because of the non-simply connected character of S^1, π_1(S^1) = Z. Although the ε dependence cannot be removed by a globally defined gauge transformation, the singular gauge transformation

\xi(\varphi) = e^{-i\epsilon\varphi}\,\psi(\varphi)     (2.5)

which is uniquely defined on the domain (−π, π) but is discontinuous at ϕ = ±π, removes the ε dependence of the quantum Hamiltonian, \tilde{H} = e^{-i\epsilon\varphi} H_{\epsilon}\, e^{i\epsilon\varphi} = H_0. The ε dependence is, however, encoded in the non-trivial boundary conditions that physical states have to satisfy at the boundary ϕ± = ±π,

\xi(-\pi) = e^{-i\Phi}\,\xi(\pi).     (2.6)

In this sense the transformation trades the ε-dependence of the Hamiltonian for non-trivial boundary conditions at ϕ±.
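The path-by-path cancellation behind the vanishing of the kernel element at ε = 1/2 can be checked numerically. The following is a minimal sketch, assuming the phase of a path from 0 to π with winding number n is exp(2πiε(1/2 + n)) and that its mirror image carries the complex-conjugate phase; the function names `path_phase` and `pair_sum` are hypothetical and introduced only for this illustration.

```python
import cmath

def path_phase(eps: float, n: int) -> complex:
    """Phase exp(2*pi*i*eps*(1/2 + n)) of a path from 0 to pi with winding number n."""
    return cmath.exp(2j * cmath.pi * eps * (0.5 + n))

def pair_sum(eps: float, n: int) -> complex:
    """Sum of the phase of a path and of its reflected image (conjugate phase)."""
    z = path_phase(eps, n)
    return z + z.conjugate()

# At eps = 1/2 every pair cancels; at a generic eps it does not.
for n in range(-3, 4):
    print(n, abs(pair_sum(0.5, n)), abs(pair_sum(0.3, n)))
```

The pair sum equals 2 cos(2πε(1/2 + n)), which vanishes for every integer n when ε = 1/2, independently of the overall sign convention chosen for the phase.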
The relevant extra property which allows us to extract some information on the nodal structure of the quantum vacua is that the theory is U(P) invariant for ε = 1/2. In the Hamiltonian approach this property of the special case ε = 1/2 comes from the fact that the boundary condition (2.6) becomes an anti-periodic boundary condition, ξ(ϕ+) = −ξ(ϕ−), which is a reflection invariant condition. As discussed above, it is always possible to find a complete basis of stationary states with definite U(P) symmetry. If the energy level is not degenerate the corresponding physical state ψ(ϕ) has to be U(P) even or U(P) odd. In the degenerate case, we can have states with both parities. But, because of the anti-periodic boundary conditions, any of them satisfies U(P)ξ(π) = ξ(−π) = −ξ(π). Thus, for any parity even state ξ+ this is possible only if ξ+ vanishes at ϕ = ±π, ξ+(±π) = 0. In the same way, since for any parity odd state ξ− we have U(P)ξ−(0−) = ξ−(0+) = ξ−(0), any parity odd state vanishes at ϕ = 0, i.e. ξ−(0) = 0. This property explains the vanishing of the heat kernel element K_T^{1/2}(0, π) for any value of T, previously derived by path integral methods, because half of the states of an orthonormal basis of stationary states in L^2(S^1) vanish at ϕ = 0 whereas the other half vanish at ϕ = π.

Let us now consider the structure of the ground state ψ0.

Proposition 2.2. A planar rotor interacting with a transverse magnetic flux Φ = π and a reflection invariant non-constant potential V(ϕ) with maximum height at ϕ = π and minimum value at ϕ = 0 has a unique vacuum state ψ0 which is parity even and vanishes at ϕ = π.

Proof. Since the potential term V is non-trivial it gives a non-trivial contribution to the energy of stationary states. The states with lowest energy which are parity even vanish at ϕ = ±π, where the potential term attains its maximal value, and cannot have the same energy as parity odd states which vanish at ϕ = 0, where the potential term attains its minimal value. This feature implies that the quantum vacuum state ψ0 is non-degenerate, is parity even and vanishes at ϕ = ±π.

The splitting of energies between the ground state and the first excited state can also be understood in terms of the tunnelling effect induced by instantons. But the argument used above is completely rigorous and does not rely on any semiclassical approximation or asymptotic expansion (see [4] for an early anticipation of this behaviour of the ground state based on numerical calculations).

The existence of a non-trivial potential with such a peculiar behaviour is crucial for the proof of the existence of vacuum nodes. If V = 0, there is no splitting between the energies of even and odd states and the ground state becomes degenerate. In this case there are ground states with indefinite parity which are linear combinations of parity even and parity odd ground states and have no nodes. However, the kernel of the restriction of the operator K_T^{1/2} to the ground state subspace also vanishes for the pair of points 0 and π, as the path integral formula predicts.

The parity of the vacuum for general potentials with unique vacuum depends on the structure of the potential. Although small generic perturbations of potentials of the type considered in Proposition 2.2 preserve the even character of the vacuum, it might change for large perturbations due to the appearance of level crossings. The existence of such crossings for V = 0 guarantees the consistency of the result when the maximum and minimum of V are interchanged.

The presence of a magnetic field generates nodes in the ground state as a vacuum response to the magnetic flux crossing the circle where the system evolves. This system mimics the behaviour of 1+1 dimensional QED on a cylinder with a θ-term θ = 2πε when V = 0.
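The statement of Proposition 2.2 can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it works in the singular gauge, where the Hamiltonian is H0 = −(1/2) d²/dϕ² + V with anti-periodic boundary conditions, picks the sample potential V(ϕ) = −cos ϕ (minimum at 0, maximum at π), and uses a second-order finite-difference grid with a shifted power iteration. The grid size, iteration count and the name `rotor_ground_state` are hypothetical choices, not taken from the paper.

```python
import math

def rotor_ground_state(n_grid: int = 60, strength: float = 1.0, n_iter: int = 6000):
    """Lowest eigenvector of H0 = -(1/2) d^2/dphi^2 - strength*cos(phi) on (-pi, pi)
    with anti-periodic boundary conditions, found by shifted power iteration."""
    h = 2.0 * math.pi / n_grid
    phi = [-math.pi + j * h for j in range(n_grid)]        # phi[0] = -pi, identified with +pi
    pot = [-strength * math.cos(x) for x in phi]
    kin = 1.0 / (2.0 * h * h)                              # off-diagonal weight of the kinetic term
    shift = 4.0 * kin + strength + 1.0                     # Gershgorin bound, so shift - H is positive

    def apply_h(v):
        out = []
        for j in range(n_grid):
            left = v[j - 1] if j > 0 else -v[n_grid - 1]   # xi(-pi - h) = -xi(pi - h)
            right = v[j + 1] if j < n_grid - 1 else -v[0]  # xi(pi) = -xi(-pi)
            out.append((2.0 * kin + pot[j]) * v[j] - kin * (left + right))
        return out

    v = [1.0 + 0.5 * j / n_grid for j in range(n_grid)]    # generic start with no parity symmetry
    for _ in range(n_iter):
        hv = apply_h(v)
        w = [shift * a - b for a, b in zip(v, hv)]         # one step of power iteration on shift - H
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return phi, v

phi, xi = rotor_ground_state()
print("|xi(pi)| =", abs(xi[0]))
print("|xi(0)|  =", abs(xi[len(xi) // 2]))
```

Because the start vector has no definite parity, the node at ϕ = ±π is not imposed by hand: it emerges because the iteration converges to the parity even ground state, while the parity odd component is damped.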
3. Hall Effect in a Torus

A charged quantum particle (e = m = 1) moving on a two-dimensional torus T^2 under the action of a uniform magnetic field B = k/2π (k ∈ Z) and an external potential V is governed by the Hamiltonian

H_A = -\frac{1}{2}\Delta_A + V,     (3.1)

where Δ_A is the covariant Laplacian with respect to a U(1) gauge field A with curvature B defined on the line bundle E_k(T^2, C) with first Chern class k ∈ Z, whose sections are the quantum states. For trivial potentials, V = const, the spectrum of the Hamiltonian (3.1) is exactly solvable. It is given by the Landau levels

E_n = B\left(n + \frac{1}{2}\right), \qquad n \in \mathbb{N},

as in the infinite plane case. However, in the present case the degeneracy of each level is finite, dim H_0 = |k|, whereas in infinite volume the degeneracy is infinite for k ≠ 0. The degeneracy of the ground state E_0 depends not only on the first Chern class of the line bundle E_k(T^2, C), where physical states are defined, but also on the background metric of the torus and the form of the magnetic field [2].

3.1. Translation anomalies. In the presence of the constant magnetic field, V = const, and a symmetric metric, the classical system is translation invariant, but the quantum generators of translation symmetries given by

L_j = -i D_A^j - \epsilon_{jl}\, x^l B, \qquad j = 1, 2

suffer from an anomaly which transforms the abelian algebra R × R into the Heisenberg algebra

[L_1, L_2] = -iB     (3.2)

as a central extension with central charge B. This is easy to understand because the system has two degrees of freedom and cannot have three independent commuting operators corresponding to time and space translations (3.1), (3.2). In a T^2 torus there is an extra anomaly of translation symmetry, for if the Heisenberg algebra were a real symmetry of the quantum system the energy levels would be infinitely degenerate, since any energy level supports a representation of the symmetry algebra and any representation of the Heisenberg algebra (3.2) must be infinite dimensional, but the energy levels do have a finite degeneracy k. The presence of the anomaly is explicitly shown by the existence of a non-vanishing correlation function involving the time derivative of the would-be conserved currents l_j = \dot{x}_j − ε_{jn} x^n B associated to translation transformations

\langle \psi_0 | \dot{l}_j(t)\, \dot{x}_n(s) | \psi_0 \rangle = -i\,\theta_3(0)\,(\delta_{jn} + i\epsilon_{jn})\, \frac{B^2}{2}\, e^{iB(s-t)},
Fig. 3.1. Circles with holonomies ±1 and nodal points of quantum vacua
where

\theta_3(u) = \sum_{n=-\infty}^{\infty} e^{-\pi n^2/2}\, e^{2nui}
is the third Jacobi theta function. This is the simplest example of an anomalous symmetry in a quantum mechanical system. Notice that it is not present in the infinite volume limit.

There is an operator theory explanation for this anomaly [3, 1]. Although the generators of the translation Heisenberg algebra L_1, L_2 commute with H on the domain of functions with compact support on (0, 2π) × (0, 2π), the corresponding selfadjoint extensions do not commute, because the domain of definition of H is not preserved by the action of L_j. In this sense translation invariance is broken in the quantum system. This interpretation of the anomaly, based on the anomalous behaviour of the domain of definition of the quantum Hamiltonian under translations, was first pointed out by Esteve [8] and Manton [9]. In this case the existence of an anomalous commutator is crucial for the understanding of the finite degeneracy of energy levels in spite of the existence of a partial translation invariance.

There is a simpler geometrical interpretation of the anomaly. The quantum system is not completely specified by the magnetic flux. To define the connection A one has to specify the holonomies, h_α(A), h_β(A), along two complementary non-contractible circles of the torus α, β. Once h_α(A), h_β(A) are specified, the holonomy along any other closed loop on T^2 is completely determined because the holonomies along two homotopically equivalent circles differ by a phase factor whose exponent is twice the magnetic flux of the torus domain enclosed by them. This means that while any of the basic circles α, β sweeps the torus under a 2π translation its holonomy describes a non-contractible loop along the gauge group U(1). Thus, there are at least two non-contractible circles α+, β+ on the torus at which the holonomies of A reduce to the identity, i.e. h_{α+}(A) = h_{β+}(A) = I. In a similar way there are at least two other non-contractible circles α−, β− at which the holonomies of A are minus the identity, i.e. h_{α−}(A) = h_{β−}(A) = −I (see Fig. 3.1). These loops are unique for any connection with unit first Chern class, i.e. c(A) = k = 1. The existence of such special loops explains why translation symmetry is completely
broken for k = 1 in the Hall effect on a torus. Only translations which give a complete turn to the torus leave the Hamiltonian invariant. The translation symmetry group is then reduced from T^2 to I. For the same reasons, for higher values of k the number of closed circles with trivial holonomy in each homotopy class is equal to k. If k > 1 there are k circles in the same homotopy class with the same holonomy. This means that the continuous translation symmetry is reduced by the anomalies to a discrete quantum symmetry generated by translations by an angle 2π/k in each of the two transversal directions of the torus, i.e. the symmetry is reduced from T^2 to a central extension of Z_{k−1} × Z_{k−1}.

3.2. Parity anomaly. There is another discrete symmetry which also becomes anomalous upon quantization. Let us introduce angular coordinates on the torus, T^2 = [0, 2π) × [0, 2π). The classical system is invariant under the combined action of two reflections with respect to any pair of angles φ = (φ1, φ2),

P_{φ1}(ϕ1, ϕ2) = (2φ1 − ϕ1, ϕ2),     P_{φ2}(ϕ1, ϕ2) = (ϕ1, 2φ2 − ϕ2),

i.e. P_φ = P_{φ1} P_{φ2}. Notice that neither of the parity transformations P_{φ1}, P_{φ2} is a symmetry by itself because it reverses the orientation of the torus and, thus, the sign of the magnetic field. However, the need to specify the holonomies h_α(A), h_β(A) in the quantum theory breaks this reflection symmetry with respect to a generic point of the torus (φ1, φ2), except for the four crossing points φ++ = (φ1^+, φ2^+), φ+− = (φ1^+, φ2^−), φ−+ = (φ1^−, φ2^+), φ−− = (φ1^−, φ2^−) of the circles with holonomies I or −I (see Fig. 3.1),

α±(ϕ) = (φ1^±, ϕ),     β±(ϕ) = (ϕ, φ2^±),     ϕ ∈ [0, 2π).

Reflection with respect to any other point transforms loops into loops with different holonomy for k = 1. For k > 1 there are more crossing points of circles with holonomies I and −I; thus, the reflection symmetry group is bigger in that case. In any case the remaining quantum symmetry, U(P±±), defined by

U(P_{\pm\pm})\,\psi(\varphi_1, \varphi_2) = e^{-i(\phi_2^{\pm}\varphi_1 - \phi_1^{\pm}\varphi_2)}\, \psi\left(U(P_{\pm\pm})(\varphi_1, \varphi_2)\right)     (3.3)

is very relevant for finding the nodes of the ground states.

3.3. Vacuum structure. Since the line bundle E_k(T^2, C) is non-trivial for k ≠ 0, any section must have nodal points. This means that any energy level has a non-trivial nodal structure. For |k| > 1 the degeneracy of energy levels is k, which means that given any point φ∗ on the torus there is one state in that level with a node at φ∗. Therefore in such a case the physical meaning of the nodal configuration cannot be relevant. However, in the case k = 1 (B = 1/2π) there is no degeneracy in the energy levels and the vacuum has only one node, which certainly has to be a very distinguished classical configuration for the quantum system. The search for vacuum nodes is simplified for the case of P++-symmetric potentials, V(P++ϕ) = V(ϕ), by the following result.

Proposition 3.1. For any P++ symmetric potential V the heat kernel elements

K_T^A(\phi_{++}, \phi_{--}) = K_T^A(\phi_{+-}, \phi_{--}) = K_T^A(\phi_{-+}, \phi_{--}) = 0     (3.4)

vanish for any T if k = 1.
Proof. The basic strategy is similar to that used in the planar rotor. Because of the non-degeneracy of the energy levels, any stationary state must have a definite U(P±±)-parity symmetry with respect to the four points φ++, φ+−, φ−+, φ−−, where the circles with holonomies I and −I cross each other. There are four quantum parity symmetry transformations U(P++), U(P+−), U(P−+) and U(P−−). Although the four transformations are identical on T^2, e.g. all of them leave the four points invariant, they define four different unitary transformations in the space of quantum states. If we redefine our coordinates so that φ1^+ = φ2^+ = 0, we have that φ1^− = φ2^− = π. In such coordinates ϕ = (ϕ1, ϕ2), the gauge field with the required holonomy properties is given by

A_i = \frac{B}{2}\, \epsilon_{ij}\, \varphi^j     (3.5)

in a gauge with boundary conditions

\psi(\varphi_1 + 2\pi, \varphi_2) = e^{i\pi B \varphi_2}\, \psi(\varphi_1, \varphi_2), \qquad \psi(\varphi_1, \varphi_2 + 2\pi) = e^{-i\pi B \varphi_1}\, \psi(\varphi_1, \varphi_2).     (3.6)

Since the P++ symmetry leaves the point φ−− invariant, physical states ψ must verify U(P++)ψ(π, π) = ψ(−π, −π) = e^{iπ}ψ(π, π) = −ψ(π, π). Thus, if ψ is U(P++) even, ψ has a node at φ−−, i.e. ψ(φ−−) = 0. For the same reason, U(P++)ψ(0, 0) = ψ(0, 0), U(P++)ψ(0, π) = ψ(0, −π) = ψ(0, π), U(P++)ψ(π, 0) = ψ(−π, 0) = ψ(π, 0), which implies that P++ odd states must vanish at φ++, φ−+ and φ+−, i.e. ψ(φ++) = ψ(φ+−) = ψ(φ−+) = 0. In a similar way we get that

U(P−−)ψ(π, π) = ψ(π, π),     U(P−−)ψ(0, π) = e^{iπ/2}ψ(2π, π) = −ψ(0, π),
U(P−−)ψ(0, 0) = ψ(2π, 2π) = e^{−iπ}ψ(0, 2π) = −ψ(0, 0),     U(P−−)ψ(π, 0) = e^{−iπ/2}ψ(π, 2π) = −ψ(π, 0),

which implies that P−− odd states must vanish at φ−−, whereas P−− even states must vanish at φ++, φ−+ and φ+−. Similar properties hold for the remaining parity operators U(P−+), U(P+−). Since there is a complete basis of stationary states consisting of U(P−−) even and U(P−−) odd states, this implies the vanishing of the kernel matrix elements (3.4).

As in the planar rotor case there is an alternative derivation of the same results. It is based on the path integral approach. The method also carries enough information to identify the parity of the vacuum state. The essential feature is to prove that the matrix element K_T^A(φ++, φ−−) = ⟨φ++|e^{−T H_A}|φ−−⟩ of the euclidean time evolution kernel vanishes for any T. This property can be easily derived from the path integral representation of the heat kernel

K_T^A(\phi_{++}, \phi_{--}) = \int_{\varphi(0)=\phi_{++}}^{\varphi(T)=\phi_{--}} \delta\varphi \; h_{\varphi}^A(t)\, \exp\left\{ -\int_0^T dt \left[ \tfrac{1}{2}\dot\varphi(t)^2 + V(\varphi(t)) \right] \right\},     (3.7)

where h_ϕ^A(t) is the holonomy of the closed path ϕ(t). In the path integral representation (3.7) a path ϕ(t) connecting φ++ and φ−− transforms under the P reflection symmetry into another path Pϕ(t) which connects the same points and gives the same contribution to the real term of the exponent in the path integral. However, the contribution of both
Fig. 3.2. Paths giving opposite contributions to the path integral kernel
paths to the imaginary part is different. They contribute to the path integral with a phase factor which is exactly the holonomy of A along the paths. It is immediate to see that the ratio of both contributions h_ϕ^A(t) (h_{Pϕ}^A(t))^{−1} equals the holonomy of the closed loop obtained by the composition ϕ(t) ◦ Pϕ(T − t), which is in the homotopy class (2n1 + 1, 2n2 + 1) of α−^{2n1+1} ◦ β+^{2n2+1}. The holonomy splits, by Stokes theorem, into a factor which is the holonomy of the basic circles α−^{2n1+1} ◦ β+^{2n2+1} and the magnetic fluxes Φ1, Φ2 crossing two surfaces C1 and C2 in T^2: C1 being the domain of T^2 enclosed by the curves ϕ(t) and

\gamma_1(t) = \begin{cases} \left( (2n_1+1)\pi + (2n_1+1)\frac{2\pi}{T}\left(t - \frac{T}{2}\right),\; 0 \right) & 0 \le t \le \frac{T}{2} \\ \left( (2n_1+1)\pi,\; (2n_2+1)\pi + (2n_2+1)\frac{2\pi}{T}(t - T) \right) & \frac{T}{2} \le t \le T, \end{cases}     (3.8)

and C2 being the surface enclosed by Pϕ(T − t) and Pγ1(T − t) (see Fig. 3.2). Those contributions of magnetic fluxes are opposite and cancel each other. Thus, the contribution of h_ϕ^A(t) (h_{Pϕ}^A(t))^{−1} is reduced to the holonomy of α−^{2n1+1} ◦ β+^{2n2+1}, i.e. (−1)^{2n1+1} = −1. This means that the contributions of ϕ(t) and Pϕ(t) to the path integral are equal but with opposite signs. The contributions of both paths cancel and the argument can be repeated path by path to show that the whole path integral vanishes. In a similar way we can prove the vanishing of the kernel elements K_T^A(φ+−, φ−−) = K_T^A(φ−+, φ−−) = 0 for φ+− and φ−+, because in that case the corresponding holonomies of paths and reflected paths differ by the holonomies of the loops α−^{2n1+1} ◦ β+^{2n2} and α+^{2n1} ◦ β−^{2n2+1}, respectively. The relative negative sign is again the basis for the cancellation of the corresponding contributions to the path integral. Notice, however, that the argument cannot prove the vanishing of K_T^A(φ−+, φ++), K_T^A(φ+−, φ++) or K_T^A(φ+−, φ−+).

Let P↑↑ denote the reflection operator with respect to the crossing point φ↑↑, where the two circles with holonomy iI cross each other for k = 1.

Theorem 3.1. Let k = 1 and let V be a potential invariant under the reflections P++ and P↑↑. The ground state is unique, U(P++) even and has a node at φ−−.
Proof. In the case V = 0 the parity behaviour of the vacuum state can be obtained from an indirect argument. We know that the ground states in the absence of a potential term are holomorphic sections of the line bundle E_k(T^2, C) with Chern class c(E_k) = k (see Ref. [2] for a review and references therein). Any holomorphic section of E_k can have only k single nodes. For k = 1 only P++-even states can have a single node, at φ−−, whereas P++-odd states have at least three different nodal points. Thus, the vacuum state is P++-even and has a node at the crossing of the two circles with holonomy −I (the corresponding Abrikosov lattice has one single vortex). The same state is parity odd with respect to P−−, P+− and P−+ (the property also holds for k > 1). Those results can be explicitly checked from the exact analytic solutions
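\psi_n^l(\varphi_1, \varphi_2) = \frac{1}{2\pi} \left(\frac{k}{2}\right)^{1/4} \frac{1}{\sqrt{k\, n!\, 2^n}}\; e^{i\frac{k}{4\pi}\varphi_1\varphi_2} \sum_{m\in\mathbb{Z}} e^{im\left(\varphi_1 + \frac{2\pi l}{k}\right)}
\times H_n\!\left( \sqrt{\frac{k}{2\pi}} \left( \varphi_2 + \frac{2\pi m}{k} \right) \right) e^{-\frac{1}{4\pi k}(2\pi m + k\varphi_2)^2},     (3.9)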
l = 0, ..., k − 1. However, the symmetry arguments already introduced in the case of the planar rotor allow us to generalize this result to more general potentials, which makes it extremely useful, especially for non-exactly solvable cases. The vanishing of K_T^A(φ++, φ−−) = 0 for any T implies that

\sum_n U(P_{++})\psi_n(\phi_{++})^{*}\; U(P_{++})\psi_n(\phi_{--}) = 0
for any energy level. If the ground state is degenerate there are at least two states ψ0^+ and ψ0^− which are even and odd, respectively, with respect to P++-parity. ψ0^+ vanishes at φ−− and ψ0^− at φ−+, φ+− and φ++. We know that the kinetic term contribution is minimized in a state with a single node at φ−−. Since the potential term is reflection symmetric it has the same behaviour near the four points. Thus, the kinetic and potential energies are minimized on states with a unique node at φ−− instead of three nodes at φ−+, φ+− and φ++. From Ritz's variational argument it is obvious that ψ0^+ and ψ0^− cannot have the same energy. The existence of such a non-trivial splitting implies that the vacuum state ψ0 is unique and, thus, has a unique node at φ−−, is even with respect to P++-parity and is odd with respect to the P−+, P+− and P−− parities.

It is also easy to understand this result from perturbation theory, because V does not connect parity even states ψ+ with parity odd states ψ−. In fact, by parity symmetry ⟨ψ+|V^n|ψ−⟩ = 0, which implies that there are no corrections to the parity behaviour of the ground state at any order in perturbation theory. This result is, thus, compatible with our non-perturbative result, which holds for larger potentials and tells us that there is no level crossing of ground states whenever we keep the reflection symmetry properties of the potentials.

We have found that the vacuum state vanishes at the intersection of the only pair of circles with holonomy −I. The singularity of this point explains very explicitly why translation invariance is completely broken. Only translations by 2π can leave the quantum states and their nodes invariant. For higher values of k we get similar results, but now there are k circles with holonomy −I in each direction which cross at k^2 different
points. Any of such points is one of the k nodes of parity even states under reflections with respect to the opposite crossing point of two circles with unit holonomy. Those nodes give rise to the Abrikosov lattice of vortices in a type II planar superconductor. But now we have different states vanishing at the different intersections, which transform into each other under the remaining discrete symmetries. In addition there are linear combinations of parity even and odd states for each level which vanish elsewhere on T^2. Therefore, nothing special happens at those intersections.

Although in the k = 1 case the points φ++, φ−−, φ−+ and φ+− are distinguished points because they are the only nodes of stationary states, they do not have any physical meaning, because the breaking of translation symmetry is reflected in the fact that there is a T^2 moduli space of U(1)-connections A which generate the same magnetic field but differ by their holonomies, and for any point φ of T^2 there is a connection whose ground state vanishes at φ. All these connections and their corresponding eigenstates are obtained from those of one fixed connection by translations. Therefore the location of the vacuum nodes in this case does not have a special meaning. This is in contrast with what happens in quantum field theories, where the quantum vacua are really unique and their nodes are very special field configurations carrying, therefore, relevant dynamical information [5]–[7].

The same arguments also apply to higher genus surfaces or surfaces with holes, giving very relevant information on the structure of the Abrikosov lattice for systems with defects.

Acknowledgements. We thank Luis J. Boya and Fernando Falceto for discussions and collaboration. The work of M.A. is supported by a MEC fellowship (Spain). We acknowledge CICYT for partial financial support under grant AEN97-1680.
References
1. Aguado, M.: Estudio de Anomalías Cuánticas en el Efecto Hall. Ms. Sci. dissertation, Universidad de Zaragoza (1997); Aguado, M., Asorey, M., Esteve, J.G.: Quantum Anomalies in the Hall Effect. DFTUZ/99-009
2. Asorey, M.: Topological Phases of Quantum Theories. Chern–Simons Theory. J. Geom. Phys. 11, 63–94 (1993)
3. Asorey, M.: Classical and Quantum Anomalies in the Quantum Hall Effect. In: Proceedings of the Fall Workshop on Differential Geometry and its Applications, eds. M. Muñoz and X. Gracia, Barcelona: Universidad de Barcelona, 1993
4. Asorey, M., Esteve, J.G., Pacheco, A.F.: The planar rotor: θ-vacuum structure and some approximate methods in Quantum Mechanics. Phys. Rev. D 27, 1852–1868 (1983)
5. Asorey, M., Falceto, F., López, J.L., Luzón, G.: Nodes, Monopoles and Confinement in 2+1 Dimensional Gauge Theories. Phys. Lett. B 349, 125–130 (1995); Vacuum Structure of 2+1-Dimensional Gauge Theories. Banach Center Publications 39, 183–199 (1997)
6. Asorey, M., Falceto, F.: Vacuum Nodes in QCD at θ = π: Exact Results. Phys. Rev. Lett. 77, 3074–3077 (1996)
7. Asorey, M., Falceto, F.: Vacuum Structure of CP^N Sigma Models at θ = π. Phys. Rev. Lett. 80, 234–237 (1998)
8. Esteve, J.G.: Anomalies in conservation laws in the Hamiltonian formalism. Phys. Rev. D 34, 674 (1986)
9. Manton, N.S.: The Schwinger model and its chiral anomaly. Ann. Phys. (N.Y.) 159, 220 (1985); A model for the anomalies in gauge field theory. Santa Barbara preprint 83–164 (1983)

Communicated by R. H. Dijkgraaf
Commun. Math. Phys. 218, 245 – 262 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
WKB and Spectral Analysis of One-Dimensional Schrödinger Operators with Slowly Varying Potentials Michael Christ1, , Alexander Kiselev2, 1 Department of Mathematics, University of California, Berkeley, CA 94720-3840, USA.
E-mail:
[email protected]
2 Department of Mathematics, University of Chicago, Chicago, IL 60637, USA.
E-mail:
[email protected] Received: 27 July 2000 / Accepted: 23 October 2000
Abstract: Consider a Schrödinger operator on L^2 of the line, or of a half line with appropriate boundary conditions. If the potential tends to zero and is a finite sum of terms, each of which has a derivative of some order in L^1 + L^p for some exponent p < 2, then an essential support of the absolutely continuous spectrum equals R^+. Almost every generalized eigenfunction is bounded, and satisfies certain WKB-type asymptotics at infinity. If moreover these derivatives belong to L^p with respect to a weight |x|^γ with γ > 0, then the Hausdorff dimension of the singular component of the spectral measure is strictly less than one.
1. Introduction

The semiclassical WKB method, first proposed in [31, 15, 4], is one of the most widely used methods for approximating the wave function of a particle, and is a part of most textbooks on quantum mechanics. The principal requirement for its applicability is that the electric field potential V(x) vary slowly, allowing certain terms involving its derivatives to be neglected, provided that the difference E − V(x), where E is the energy, is bounded away from zero.

The literature on mathematically rigorous applications of the WKB idea is enormous. As an example we refer to [5, 12] for the construction of modified wave operators for long-range decaying potentials; see [21] for many further references. A typical requirement, for example, is that the potential satisfy both |V(x)| ≤ C(1 + |x|)^{−1/2−ε} and |D^α V(x)| ≤ C(1 + |x|)^{−3/2−ε} for any |α| = 1; more slowly decaying potentials require stronger decay assumptions for derivatives of higher order.

M. C. was supported in part by NSF grant DMS-9970660 and performed part of this research while on appointment as a Miller Research Professor in the Miller Institute for Basic Research in Science. A. K. was supported in part by NSF grant DMS-9801530.
A series of works was devoted to studying oscillating potentials of Wigner–von Neumann type, like a x^{−δ} sin(b x^β), and their generalizations. See [2, 3, 32] for some of the results and further references.

In this paper, we extend the scope of the WKB method to analyze the asymptotic behavior at infinity of generalized eigenfunctions in one dimension, in the case where the potential may decay only very slowly or not at all, but where at least one derivative decays at a moderate rate, including (1 + |x|)^{−α} for 1/2 < α < 1. We establish WKB-type asymptotics for almost every energy (with respect to Lebesgue measure), even though there can occur an everywhere dense set of energies for which such asymptotics fail to hold; the decay rates hypothesized are very nearly optimal in this respect. This paper is one of a series of works [7, 8] treating such asymptotics with unstable parameter dependence. As a corollary, we deduce the existence of absolutely continuous spectrum, and an upper bound on the dimension of the singular continuous spectrum. Thus certain parts of the spectrum are stable under perturbations of the free Hamiltonian (V = 0) by suitably slowly varying potentials.

Denote by ℓ^p(L^1)(R) the Banach space of all (equivalence classes of) measurable functions from R to R for which the norm

\| f \|_{\ell^p(L^1)} = \left( \sum_{k=-\infty}^{\infty} \left( \int_k^{k+1} |f(x)|\, dx \right)^{p} \right)^{1/p}
is finite. This Banach space contains L^1 + L^p. If p ≤ q, then ℓ^p(L^1) ⊂ ℓ^q(L^1).

Throughout the paper, we assume the following conditions (the hypotheses of Theorems 1.3 and 1.4 are more restrictive). Let n ≥ 0 be a nonnegative integer, and let p ∈ [1, 2) be an exponent. Let V be a measurable, real-valued function defined on the real line R. Suppose that V admits a decomposition V = V0 + Vn where^1 V0 ∈ ℓ^p(L^1), Vn is continuous and bounded, and d^n Vn/dx^n ∈ ℓ^p(L^1), in the sense of distributions. Define esslimsup_{x→+∞} V(x) to be lim sup_{x→+∞} Vn(x). Define essliminf_{x→+∞} V(x), and the corresponding quantities with +∞ replaced by −∞, likewise. Let H = −d^2/dx^2 + V(x). In the following statement, "almost every" means with respect to Lebesgue measure.

Theorem 1.1. Under the above hypotheses, for almost every E > esslimsup_{x→+∞} V(x), each solution of the generalized eigenfunction equation Hu = Eu is a bounded function of x ∈ [0, +∞). In addition, for almost every E in this same interval, WKB-type asymptotics hold in the sense that there exists a generalized eigenfunction u(x, E) satisfying u(x, E) · e^{−i (x,E)} → 1 as x → +∞, where the phase in the exponent is a certain complex-valued function of (x, E) constructed from the potential by an explicit, though somewhat complicated, recipe dependent on the index n in the hypotheses. The exponent has relatively tame behavior, in contrast to u(x, E); its derivative with respect to x, together with all partial derivatives of that derivative with respect to x and E, is uniformly bounded for x ∈ [0, ∞) and E in any compact subinterval of (esslimsup_{x→+∞} V(x), ∞). The complex conjugate of u is a second, linearly independent solution, with corresponding asymptotics.

1 This includes any potential decomposable as Σ_{k=0}^{n} V_k, where d^k V_k/dx^k ∈ ℓ^p(L^1)(R) for each k ≥ 0, and where Σ_{k=1}^{n} V_k is bounded.
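To make the ℓ^p(L^1) norm concrete, the following sketch evaluates its partial sums for the model function f(x) = (1 + |x|)^{−α}, which is an illustrative choice and not an example from the paper. Since the block integrals behave like k^{−α}, the norm is finite exactly when αp > 1. The helper names `block_integral` and `lp_l1_partial_norm` are hypothetical.

```python
def block_integral(alpha: float, k: int) -> float:
    """Exact integral of (1 + x)**(-alpha) over [k, k + 1], for k >= 0 and alpha != 1."""
    return ((k + 2.0) ** (1.0 - alpha) - (k + 1.0) ** (1.0 - alpha)) / (1.0 - alpha)

def lp_l1_partial_norm(alpha: float, p: float, n_blocks: int) -> float:
    """Partial l^p(L^1) norm of f(x) = (1 + |x|)**(-alpha) over blocks [k, k + 1),
    with -n_blocks <= k < n_blocks."""
    one_side = sum(block_integral(alpha, k) ** p for k in range(n_blocks))
    return (2.0 * one_side) ** (1.0 / p)    # f is even, so both half lines contribute equally

for n_blocks in (10**2, 10**3, 10**4):
    print(n_blocks,
          lp_l1_partial_norm(0.9, 1.9, n_blocks),   # alpha * p > 1: partial norms stay bounded
          lp_l1_partial_norm(0.5, 1.5, n_blocks))   # alpha * p < 1: partial norms keep growing
```

In the convergent case the partial norms are bounded by (2(1 + 1/(αp − 1)))^{1/p}, which follows from comparing the sum of (1 + k)^{−αp} with an integral.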
Theorem 1.2. Under the above hypotheses, for the Schrödinger operator H = −d^2/dx^2 + V(x) acting on L^2([0, ∞)), with some self-adjoint boundary condition at the origin, an essential support for the absolutely continuous spectrum of H is [esslimsup_{x→+∞} V(x), ∞). Moreover, the essential spectrum coincides with [essliminf_{x→+∞} V(x), +∞), and is purely singular in the interval from essliminf_{x→+∞} V(x) to esslimsup_{x→+∞} V(x).

Let H = −d^2/dx^2 + V(x) act on L^2(R), and define A± = esslimsup_{x→±∞} V(x), A = max(A+, A−), and a = min(A+, A−). Then an essential support for the absolutely continuous spectrum of H is [a, +∞). The absolutely continuous spectrum has multiplicity two in [A, ∞), and has multiplicity one in [a, A]. The essential spectrum of H coincides with [essliminf_{|x|→∞} V(x), ∞), and is purely singular in the interval from essliminf_{|x|→∞} V(x) to a.
As is well known [28], the almost everywhere boundedness of the generalized eigenfunctions asserted in Theorem 1.1 implies the assertions of Theorem 1.2 concerning the absolutely continuous spectrum; see [10] for an alternative approach to this implication. The assertions concerning the absence of absolutely continuous spectrum are proved as was done for the case n = 1 in [8].

The first related result in one dimension was proved by Weidmann [30], who showed that if the potential is a sum of an L^1 function and a function whose derivative is in L^1, then the spectrum on the positive semi-axis is purely absolutely continuous. Later, Behncke [1] and Stolz [27] established a result similar to our Theorems 1.1 and 1.2 for p = 1; the spectrum above esslimsup_{|x|→∞} V(x) is also purely absolutely continuous in that case. Even earlier, Ben-Artzi [2] proved a power-decaying version of the Behncke–Stolz result.

The price to be paid for a more general class of potentials is that the WKB asymptotics can actually fail to hold for a dense set of energies having positive Hausdorff dimension [24], and moreover, the point spectrum can be dense in [0, ∞) [18, 25]. The conclusions of Theorems 1.1 and 1.2 are false for p > 2; Pearson [20] has exhibited potentials satisfying V(x) → 0 as |x| → ∞, for which every derivative of V belongs to L^p for every p > 2, yet the spectrum on the positive semi-axis is purely singular; see also [14]. Theorem 1.1 remains open for p = 2.

For 1 < p ≤ 2, the conclusion concerning the absolutely continuous spectrum had been independently conjectured by Molchanov, Novitskii and Vainberg [17], who proved by a different method that a support for the absolutely continuous spectrum coincides with (0, ∞), provided that d^n V/dx^n ∈ L^2, under the supplementary hypothesis that V ∈ L^{n+2}. Certain partial results had also been obtained by Killip [13].
For $n = 0$, the main conclusion of Theorem 1.2 was obtained by Deift and Killip [11] under the hypothesis $V \in L^2 + L^1$. For potentials with more rapidly decaying derivatives, our conclusions can be strengthened. Define $p' = p/(p - 1)$.

Theorem 1.3. Suppose that $n \ge 0$, $1 \le p \le 2$, $0 < \gamma$, and $\gamma p' \le 1$. Let $V$ be a measurable, real-valued function defined on $\mathbb{R}$. Suppose that $V = V_0 + V_n$, where $V_n$ is bounded and continuous, and both $(1 + |x|)^\gamma V_0$ and $(1 + |x|)^\gamma\, d^n V_n/dx^n$ belong to $\ell^p(L^1)$. Then every solution of $Hu = Eu$ is a bounded function of $x \in \mathbb{R}^+$, for all $E > \operatorname{ess\,limsup}_{x\to+\infty} V(x)$, except for a set of values of $E$ having Hausdorff dimension $\le 1 - \gamma p'$. Moreover, except for a set of energies having Hausdorff dimension $\le 1 - \gamma p'$, there exists a generalized eigenfunction satisfying $u(x, E)\exp(-i\Psi(x, E)) \to 1$ as $x \to +\infty$.
M. Christ, A. Kiselev
Theorem 1.4. Suppose that $n \ge 0$, $1 \le p \le 2$, $0 < \gamma$, and $\gamma p' \le 1$. Let $V$ be a measurable real-valued function defined on $\mathbb{R}$, satisfying the hypotheses of Theorem 1.3. Consider the Schrödinger operator $-\frac{d^2}{dx^2} + V(x)$ acting on $L^2([0, \infty))$, with some selfadjoint boundary condition at the origin. Then the Hausdorff dimension of the singular component of its spectral measure in the interval $(\limsup_{x\to+\infty} V(x), \infty)$ does not exceed $1 - \gamma p'$. For a Schrödinger operator acting instead on $L^2(\mathbb{R})$, the singular component of its spectral measure has dimension $\le 1 - \gamma p'$ in the interval from $\min[\limsup_{x\to+\infty} V(x), \limsup_{x\to-\infty} V(x)]$ to $+\infty$.

For $n = 0$, this was proved by Remling [23] under a power decay hypothesis $V(x) = O(|x|^{-\alpha})$. Remling [24] also constructed examples for which WKB asymptotics fail to hold, for a set of energies of dimension equal to $2(1 - \alpha)$, precisely consistent with the above bound $1 - \gamma p'$, but it remains unproven that singular continuous spectrum of positive dimension can actually occur for power decaying potentials. Theorem 1.4 is a consequence of Theorem 1.3; see for instance [10] for a general criterion which yields this implication.

Theorem 1.1 can be viewed as a nonlinear analogue of a basic property of the Fourier transform. Menshov, Paley, and Zygmund showed (in different versions) that if $1 \le p < 2$ and $V \in L^p(\mathbb{R})$, then $\sup_x \bigl|\int_0^x e^{-i\lambda y} V(y)\, dy\bigr|$ is finite for almost every $\lambda \in \mathbb{R}$; the nonlinear analogue is the almost everywhere boundedness of the generalized eigenfunctions of the Schrödinger operator with potential $V$. Now if instead $V$ is a bounded function for which $d^n V/dx^n \in L^p$, then again the conclusion of Menshov, Paley, and Zygmund is valid, and is a direct consequence of the case $n = 0$ by the relation $\widehat{d^n V/dx^n}(\lambda) = i^n \lambda^n \widehat{V}(\lambda)$ (or integration by parts). Sums of finitely many functions $V$ are equally easily handled. For the nonlinear analogue, we know of no such simple way to deduce the case $n > 0$ from $n = 0$, nor to conclude that $V_1 + V_2$ can be handled if $V_1$, $V_2$ can be separately treated.

Likewise, Theorem 1.3 is related to the fact that if $(1 + |x|)^\gamma \cdot f \in \ell^p(L^1)(\mathbb{R})$ for some $p \in [1, 2]$, then $\lim_{y\to\infty} \int_0^y e^{-ix\lambda} f(x)\, dx$ exists for all $\lambda$ except for an exceptional set of Hausdorff dimension $\le \max(1 - \gamma p', 0)$. This fact follows from a simpler version of our analysis.

In an earlier paper [8], we proved Theorems 1.1 and 1.2 for $n = 1$, developing a general method for treating certain kinds of multilinear expansions on which the present paper is also based. The principal new ingredient needed to make our multilinear machinery applicable for $n > 1$ is a sufficiently good WKB-type approximation $\exp[i\Psi(x, E)]$ to the generalized eigenfunctions. A second novelty is that whereas for $n = 1$ we had been led [8] to multilinear operators mapping functions of $x$ to functions of $E$, we now obtain operators mapping functions of $(x, E)$ to functions of $E$; however, this second point had also arisen in [8], in the proof of a somewhat artificial supplementary theorem, in which $V$ itself was allowed to depend on $E$. In the present paper we discuss the new points in detail, and merely outline the less novel remainder of the argument.
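The dimension bound $1 - \gamma p'$ of Theorems 1.3 and 1.4 is easy to tabulate. The sketch below is illustrative only: the function names are hypothetical, exact rational arithmetic is a convenience, and the threshold $\gamma < \alpha - 1/2$ for $V(x) = O(|x|^{-\alpha})$ with $p = 2$ is our own reading of the weighted $\ell^2(L^1)$ hypothesis, not a statement taken from the text. Under that reading the bound approaches $2(1 - \alpha)$, the dimension quoted above for Remling's examples.

```python
from fractions import Fraction


def conjugate_exponent(p):
    """Return p' = p / (p - 1) for 1 < p <= 2."""
    p = Fraction(p)
    if not (1 < p <= 2):
        raise ValueError("expected 1 < p <= 2")
    return p / (p - 1)


def dimension_bound(gamma, p):
    """Upper bound 1 - gamma * p' on the Hausdorff dimension (needs gamma * p' <= 1)."""
    gamma = Fraction(gamma)
    bound = 1 - gamma * conjugate_exponent(p)
    if gamma <= 0 or bound < 0:
        raise ValueError("expected gamma > 0 and gamma * p' <= 1")
    return bound


def power_decay_limit(alpha):
    """Limit of the bound as gamma increases to alpha - 1/2, with p = 2.

    The threshold alpha - 1/2 is an assumption made for this illustration.
    """
    alpha = Fraction(alpha)
    return 1 - 2 * (alpha - Fraction(1, 2))


if __name__ == "__main__":
    print(dimension_bound(Fraction(1, 4), 2))
    print(power_decay_limit(Fraction(3, 4)))
```

Exact fractions are used so that the comparison with $2(1 - \alpha)$ is an identity rather than a floating-point coincidence.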
2. Simplifying a First-Order System

Our first step is to write an ordinary differential equation $-g'' + Ug = 0$ as a first-order system, and to record certain transformations that bring the system into a nearly diagonalized form. Let $\phi(x)$ be a complex-valued function to be determined later. Introduce
WKB and Spectral Analysis of 1D Schrödinger Operators
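the quantity $E(x) = i\phi'' - (\phi')^2 - U$. Write the equation for $g$ as $y' = My$, where
\[ y(x) = \begin{pmatrix} g \\ g' \end{pmatrix} \quad\text{and}\quad M = \begin{pmatrix} 0 & 1 \\ U & 0 \end{pmatrix}. \tag{2.1} \]
Substitute $y = Az$, where
\[ A = \begin{pmatrix} e^{i\phi} & e^{-i\bar\phi} \\ i\phi' e^{i\phi} & -i\bar\phi' e^{-i\bar\phi} \end{pmatrix}. \]
Then $z' = Bz$, where $B = A^{-1}MA - A^{-1}A'$. We have
\[ A^{-1} = (\phi' + \bar\phi')^{-1} \begin{pmatrix} \bar\phi' e^{-i\phi} & -ie^{-i\phi} \\ \phi' e^{i\bar\phi} & ie^{i\bar\phi} \end{pmatrix}, \]
\[ MA = \begin{pmatrix} i\phi' e^{i\phi} & -i\bar\phi' e^{-i\bar\phi} \\ U e^{i\phi} & U e^{-i\bar\phi} \end{pmatrix}, \]
\[ MA - A' = \begin{pmatrix} 0 & 0 \\ -E e^{i\phi} & -\bar E e^{-i\bar\phi} \end{pmatrix}, \]
and therefore
\[ B = (\phi' + \bar\phi')^{-1} \begin{pmatrix} iE & i\bar E e^{-i(\phi + \bar\phi)} \\ -iE e^{i(\phi + \bar\phi)} & -i\bar E \end{pmatrix}. \]
Let $\rho$ be another auxiliary function to be specified later, and substitute $z = \Lambda u$, where
\[ \Lambda = \begin{pmatrix} e^{\rho} & 0 \\ 0 & e^{\bar\rho} \end{pmatrix}, \]
to obtain $u' = Du$ with $D = \Lambda^{-1}B\Lambda - \Lambda^{-1}\Lambda'$. Since
\[ B\Lambda = (2\operatorname{Re}\phi')^{-1} \begin{pmatrix} iE e^{\rho} & i\bar E e^{-i(2\operatorname{Re}\phi) + \bar\rho} \\ -iE e^{i(2\operatorname{Re}\phi) + \rho} & -i\bar E e^{\bar\rho} \end{pmatrix}, \]
we have
\[ D = (2\operatorname{Re}\phi')^{-1} \begin{pmatrix} iE - (2\operatorname{Re}\phi')\rho' & i\bar E \exp(-2i\operatorname{Re}\phi - 2i\operatorname{Im}\rho) \\ -iE \exp(2i\operatorname{Re}\phi + 2i\operatorname{Im}\rho) & -i\bar E - (2\operatorname{Re}\phi')\bar\rho' \end{pmatrix}. \tag{2.2} \]
Choosing
\[ \rho(x) = i\int_0^x \frac{E}{2\operatorname{Re}(\phi')} \tag{2.3} \]
eliminates the diagonal entries, yielding
\[ D = (2\operatorname{Re}\phi')^{-1} \begin{pmatrix} 0 & i\bar E \exp(-2i\operatorname{Re}\phi - 2i\operatorname{Im}\rho) \\ -iE \exp(2i\operatorname{Re}\phi + 2i\operatorname{Im}\rho) & 0 \end{pmatrix}. \]
More succinctly,
\[ D = \begin{pmatrix} 0 & \dfrac{i\bar E}{2\operatorname{Re}\phi'}\, e^{-ih} \\ \dfrac{-iE}{2\operatorname{Re}\phi'}\, e^{ih} & 0 \end{pmatrix}, \tag{2.4} \]
where
\[ h(x) = 2\operatorname{Re}\phi + \int_0^x \frac{\operatorname{Re}E}{\operatorname{Re}\phi'} \tag{2.5} \]
is purely real-valued. Define
\[ \Psi = \phi - i\rho = \phi(x) + \frac{1}{2}\int_0^x \frac{E}{\operatorname{Re}\phi'}. \tag{2.6} \]
Then
\[ i\Psi' = \frac{-\phi'' - iU + i|\phi'|^2}{2\operatorname{Re}\phi'}. \tag{2.7} \]
Indeed,
\begin{align*}
(i\phi' + \rho') \cdot 2\operatorname{Re}\phi' &= i\phi' \cdot 2\operatorname{Re}\phi' + i[-(\phi')^2 + i\phi'' - U] \\
&= i\phi' \cdot 2\operatorname{Re}\phi' - i(\phi')^2 - \phi'' - iU \\
&= -\phi'' - iU + i(\operatorname{Re}\phi' + i\operatorname{Im}\phi')(2\operatorname{Re}\phi') - i\bigl[(\operatorname{Re}\phi')^2 - (\operatorname{Im}\phi')^2 + 2i\operatorname{Re}\phi' \cdot \operatorname{Im}\phi'\bigr] \\
&= -\phi'' - iU + i\bigl[2(\operatorname{Re}\phi')^2 - (\operatorname{Re}\phi')^2 + (\operatorname{Im}\phi')^2\bigr].
\end{align*}
A solution $u$ of $u' = Du$ gives rise to a solution $y$ of $y' = My$ by the substitution $y = A\Lambda u$, that is,
\[ y = \begin{pmatrix} e^{i\Psi} & e^{-i\bar\Psi} \\ i\phi' e^{i\Psi} & -i\bar\phi' e^{-i\bar\Psi} \end{pmatrix} u. \tag{2.8} \]
Let $M$, $E$, $\Psi$, $D$ be related to $U$, $\phi$ as above.

Lemma 2.1. Let a potential $U$ be given. Suppose that there exists a continuous complex-valued function $\phi$ such that $\log|\operatorname{Re}\phi'|$ is a bounded function on $[0, \infty)$. Then the function $\Psi$ defined by (2.6) has bounded imaginary part. If in addition $\phi'$ and each solution of $u' = Du$ are bounded on $[0, \infty)$, then each solution of $y' = My$ is likewise bounded.

Proof. Equation (2.7) implies that $\frac{d}{dx}\operatorname{Re}(i\Psi) = -\frac{1}{2}\frac{d}{dx}\log|\operatorname{Re}(\phi')|$, whence the first conclusion. The second then follows from (2.8).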
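The matrix algebra of this section can be spot-checked numerically at a single point. The following is only a sketch under stated assumptions: the helper names are hypothetical, the sample values of $\phi$, $\phi'$, $\phi''$ and the real number $U$ are arbitrary, and the check compares $A^{-1}(MA - A')$ with the closed form of the conjugated matrix $B$ entry by entry.

```python
import cmath


def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def inv2(P):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    (a, b), (c, d) = P
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def conjugated_generator(phi, dphi, d2phi, U):
    """Compute A^{-1} M A - A^{-1} A' from the definitions, at one point x."""
    e_plus = cmath.exp(1j * phi)
    e_minus = cmath.exp(-1j * phi.conjugate())
    A = [[e_plus, e_minus],
         [1j * dphi * e_plus, -1j * dphi.conjugate() * e_minus]]
    # x-derivative of A, obtained with the product rule
    dA = [[1j * dphi * e_plus, -1j * dphi.conjugate() * e_minus],
          [(1j * d2phi - dphi ** 2) * e_plus,
           (-1j * d2phi.conjugate() - dphi.conjugate() ** 2) * e_minus]]
    M = [[0, 1], [U, 0]]
    MA = matmul(M, A)
    diff = [[MA[i][j] - dA[i][j] for j in range(2)] for i in range(2)]
    return matmul(inv2(A), diff)


def closed_form(phi, dphi, d2phi, U):
    """Closed form of B in terms of E = i phi'' - (phi')^2 - U, for real U."""
    E = 1j * d2phi - dphi ** 2 - U
    two_re_phi = 2 * phi.real
    pref = 1 / (2 * dphi.real)
    return [[pref * 1j * E,
             pref * 1j * E.conjugate() * cmath.exp(-1j * two_re_phi)],
            [-pref * 1j * E * cmath.exp(1j * two_re_phi),
             -pref * 1j * E.conjugate()]]


def max_entry_gap(X, Y):
    """Largest absolute difference between corresponding entries."""
    return max(abs(X[i][j] - Y[i][j]) for i in range(2) for j in range(2))


if __name__ == "__main__":
    sample = (0.3 + 0.2j, 1.1 - 0.4j, 0.05 + 0.07j, -0.8)  # arbitrary values
    print(max_entry_gap(conjugated_generator(*sample), closed_form(*sample)))
```

The agreement depends on $U$ being real, since the second column of $MA - A'$ is written in terms of $\bar E$.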
3. Splitting the Potential

Let $V$ be as in Theorem 1.1. Fix an auxiliary function $\eta \in C_0^\infty(\mathbb{R})$ having compact support, identically equal to one in some neighborhood of the origin, and whose inverse Fourier transform is real-valued. Decompose
\[ V = W + \tilde V, \quad\text{where } \widehat{W}(\xi) = \widehat{V}(\xi) \cdot \eta(\xi). \tag{3.1} \]
Then $W$, $\tilde V$ are real-valued. This decomposition will reduce matters to the analysis of sums of only two types of potentials. The proof of the next elementary observation is left to the reader.

Lemma 3.1. $W \in C^\infty \cap L^\infty$, and for every $k \ge 1$, $d^k W(x)/dx^k \to 0$ as $|x| \to \infty$. Moreover $d^k W/dx^k \in L^p$ for every $k \ge n$, while $\tilde V \in \ell^p(L^1)$.

The following routine refinement will be needed; we include a proof for the reader's convenience.

Lemma 3.2. For each $1 \le k < n$, $d^k W/dx^k \in L^{q_k} \cap L^\infty$, where $q_k = pn/k$.

Proof. Let $1 \le k < n$. We claim that if $f$ is bounded and $f^{(n)} \in L^p(\mathbb{R})$, then $f^{(k)} \in L^{q_k}$. To begin, observe that there exists $C < \infty$ such that for any smooth function $f$ and any interval $I$,
\[ \|f^{(k)}\|_{L^\infty(I)} \le C|I|^{-k}\|f\|_{L^\infty(I)} + C|I|^{n-k-1}\|f^{(n)}\|_{L^1(I)}. \tag{3.2} \]
By scaling, it suffices to establish this for intervals of length one. Suppose that the right-hand side is $\le 1$, and the left-hand side is large. $f^{(n-1)}$ may be decomposed as a polynomial of degree zero, plus a function whose supremum over $I$ is a priori bounded. Iterating this, $f^{(k)}$ may be decomposed as a polynomial of degree $\le n - k$, plus a function whose supremum over $I$ is a priori bounded. Since the $L^\infty$ norm of $f^{(k)}$ is large, this last polynomial must be large in $L^\infty(I)$ norm. Consequently $f^{(k-1)}$ may be decomposed as a polynomial of degree $\le n - k + 1$ with large norm in $L^\infty(I)$, plus a function whose $L^\infty(I)$ norm is a priori bounded. Iterating this, we eventually conclude that $f$ itself equals a large polynomial of bounded degree, plus a function whose supremum norm is small; hence $\|f\|_{L^\infty(I)}$ is large, a contradiction.

Fix $1 < p < \infty$, let $q = np/k$, let $f$ be any $C^n$ function with $f^{(n)} \in L^p$, and normalize so that $\|f\|_{L^\infty} = 1$. Any point $x \in \mathbb{R}$ belongs to some interval $I$ satisfying
\[ |I|^{1-n} = \|f^{(n)}\|_{L^1(I)}; \tag{3.3} \]
for if $I$ is taken to be centered at $x$ then the right-hand side is a nondecreasing function of $|I|$, while the left-hand side decreases to zero as $|I| \to \infty$. Choose and fix a covering $\{I_j\}$ of $\mathbb{R}$ consisting of such intervals, such that no point of $\mathbb{R}$ belongs to more than two intervals $I_j$. By (3.2) and the normalization, $\|f^{(k)}\|_{L^\infty(I_j)} \le C|I_j|^{n-k-1}\|f^{(n)}\|_{L^1(I_j)}$ for each $I_j$. By (3.2), (3.3), and Hölder's inequality, for each $J = I_j$,
\begin{align*}
\|f^{(k)}\|_{L^q(J)} &\le C|J|^{1/q}\|f^{(k)}\|_{L^\infty(J)} \le C|J|^{1/q}|J|^{n-k-1}\|f^{(n)}\|_{L^1(J)} \\
&\le C|J|^{1/q}|J|^{n-k-1}\|f^{(n)}\|_{L^1(J)}^{k/n} \cdot |J|^{(1-n)(n-k)/n} \\
&\le C|J|^{1/q}|J|^{n-k-1}|J|^{(1-n)(n-k)/n}|J|^{\frac{p-1}{p}\frac{k}{n}}\|f^{(n)}\|_{L^p(J)}^{k/n} = C\|f^{(n)}\|_{L^p(J)}^{k/n}.
\end{align*}
Raising this to the power $q$ and summing over $j$ completes the proof. The result can alternatively be proved via complex interpolation.
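The final equality in the proof of Lemma 3.2 holds because the powers of $|J|$ cancel exactly when $q = np/k$. A minimal sketch of that bookkeeping, with a hypothetical function name and exact rational arithmetic, is:

```python
from fractions import Fraction


def interval_exponent(n, k, p):
    """Total power of |J| in the last line of the estimate, with q = n p / k.

    The four contributions are 1/q, n - k - 1, (1 - n)(n - k)/n from (3.3),
    and ((p - 1)/p)(k/n) from Hölder's inequality.
    """
    p = Fraction(p)
    q = Fraction(n) * p / k
    return (1 / q
            + (n - k - 1)
            + Fraction((1 - n) * (n - k), n)
            + (p - 1) / p * Fraction(k, n))


if __name__ == "__main__":
    for n in range(2, 6):
        for k in range(1, n):
            print(n, k, interval_exponent(n, k, Fraction(3, 2)))
```

Every printed exponent should be zero, which is the cancellation used before raising the estimate to the power $q$.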
4. Higher-Order WKB Approximations

We next show how to construct useful approximations $\exp(i\Psi)$, $\exp(-i\bar\Psi)$ to the generalized eigenfunctions associated to $V$, by constructing the auxiliary function $\phi$ of Sect. 2. Consider a Schrödinger equation $-g'' + (W - \lambda^2)g = 0$ with some potential $W(x)$. If $g = e^{i\Phi}$, this equation becomes $(\Phi')^2 - i\Phi'' + W - \lambda^2 = 0$. Writing $F = \Phi'$, this becomes
\[ F^2 - iF' + W - \lambda^2 = 0. \tag{4.1} \]
Throughout this discussion we assume that the real-valued function $\lambda^2 - W(x)$ is uniformly bounded below by some fixed strictly positive number, and moreover that the quantities $F_k'$, to be introduced shortly, are uniformly small. It will be a consequence of our construction that this is the case, under the hypotheses of Theorem 1.1, for all potentials $W$ and all $k$ to which this discussion is applied. The notation $\sqrt{\lambda^2 - W(x) + iF_k'(x)}$ thus refers, without ambiguity, to that branch of the square root function close to the positive square root of the positive quantity $\lambda^2 - W(x)$. We approximately solve the preceding equation by recursion: set $F_0(x, \lambda) = \lambda$ and for $k \ge 0$,
\[ F_{k+1} = \sqrt{\lambda^2 - W + iF_k'}. \tag{4.2} \]
Define the error
\[ E_k = F_k^2 - iF_k' + W - \lambda^2. \tag{4.3} \]
Substituting the recursion formula for $F_k^2$ gives the alternative expression
\[ E_k = iF_{k-1}' - iF_k'. \tag{4.4} \]
A different expression will be more useful for our purpose. (4.3) can be rewritten as $iF_k' = F_k^2 + (W - \lambda^2) - E_k$; substituting this into (4.2) gives $F_{k+1} = \sqrt{F_k^2 - E_k}$. Thus from (4.4) we deduce
\[ E_{k+1} = i\frac{d}{dx}\Bigl(F_k - \sqrt{F_k^2 - E_k}\Bigr) = i\frac{d}{dx}\left(\frac{E_k}{F_k + \sqrt{F_k^2 - E_k}}\right). \tag{4.5} \]
The first few functions $F_k$, $E_k$ are:
\begin{align*}
F_0 &= \lambda, \\
E_0 &= W, \\
F_1 &= \sqrt{\lambda^2 - W}, \\
E_1 &= \frac{i}{2}(\lambda^2 - W)^{-1/2}W', \\
F_2 &= \Bigl[(\lambda^2 - W) - \frac{i}{2}(\lambda^2 - W)^{-1/2}W'\Bigr]^{1/2}, \\
E_2 &= -\frac{1}{2}\frac{d}{dx}\left[W'(\lambda^2 - W)^{-1}\left(1 + \sqrt{1 - \frac{i}{2}(\lambda^2 - W)^{-3/2}W'}\right)^{-1}\right].
\end{align*}
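The closed forms for $E_1$ and $E_2$ can be compared numerically with the recursion. The sketch below is an illustration under assumptions of our own: the potential $W(x) = 0.2/(1 + x^2)$, the value $\lambda = 1$, the step size, and all function names are hypothetical choices, and derivatives of $F_2$ are approximated by central differences rather than computed exactly.

```python
import cmath

LAM = 1.0  # sample spectral parameter (assumed value)


def W(x):
    """Sample smooth, small, decaying potential (an assumption)."""
    return 0.2 / (1.0 + x * x)


def dW(x):
    """Exact derivative of the sample potential."""
    return -0.4 * x / (1.0 + x * x) ** 2


def ddx(f, x, h=1e-4):
    """Central-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def F1(x):
    return cmath.sqrt(LAM ** 2 - W(x))


def dF1(x):
    """Exact derivative of F1 = sqrt(lambda^2 - W)."""
    return -dW(x) / (2.0 * cmath.sqrt(LAM ** 2 - W(x)))


def F2(x):
    """Next iterate of (4.2): sqrt(lambda^2 - W + i F1')."""
    return cmath.sqrt(LAM ** 2 - W(x) + 1j * dF1(x))


def E1_from_definition(x):
    """E1 via (4.3)."""
    return F1(x) ** 2 - 1j * dF1(x) + W(x) - LAM ** 2


def E1_closed(x):
    return 0.5j * (LAM ** 2 - W(x)) ** -0.5 * dW(x)


def E2_from_recursion(x):
    """E2 via (4.4): i F1' - i F2', with F2' approximated numerically."""
    return 1j * dF1(x) - 1j * ddx(F2, x)


def G(x):
    """Bracketed expression in the closed form of E2."""
    a = LAM ** 2 - W(x)
    return dW(x) / a / (1 + cmath.sqrt(1 - 0.5j * a ** -1.5 * dW(x)))


def E2_closed(x):
    return -0.5 * ddx(G, x)


if __name__ == "__main__":
    x0 = 1.0
    print(abs(E1_from_definition(x0) - E1_closed(x0)))
    print(abs(E2_from_recursion(x0) - E2_closed(x0)))
```

Both differences should be at the level of the finite-difference error, which supports the listed formulas for this particular $W$ without proving them in general.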
In our application, the function $\phi'$ of Sect. 2 will be chosen to equal $F_n$ for some $n$. Our asymptotic expression for generalized eigenfunctions will thus be
\[ u(x, \lambda) \sim \exp(i\Psi_n(x, \lambda)), \tag{4.6} \]
where
\[ \Psi_n' = F_n - \frac{\tilde V + E_n}{2\operatorname{Re}F_n}. \tag{4.7} \]
The first two functions $\Psi_0'$, $\Psi_1'$ are:
\begin{align*}
\Psi_0' &= \lambda - (2\lambda)^{-1}V, \\
\Psi_1' &= (\lambda^2 - W)^{1/2} - \tfrac{1}{2}(\lambda^2 - W)^{-1/2}\tilde V - \tfrac{i}{4}(\lambda^2 - W)^{-1}W'.
\end{align*}
$\Psi_2'$ is already rather complicated.

In the next lemma, $W$ is a bounded, continuous real-valued function of $x \in \mathbb{R}$, and $K$ is an arbitrary compact subinterval of $(0, \infty)$, which is to remain fixed for the remainder of the proof of Theorem 1.1; all assertions are uniform in $\lambda \in K$. We will always assume that $\sup_x \max[W(x), 0]$ is as small as may be desired, and that $d^k W/dx^k$ is likewise small in $L^\infty \cap L^{q_k}$ for $1 \le k < n$ and in $L^p$ for $k \ge n$. In our eventual application, this can be achieved by replacing the original potential $V$ by a suitable potential that equals $V$ on $[N, \infty)$ for sufficiently large $N$.

Lemma 4.1. Suppose that $W \in L^\infty$, that $d^k W/dx^k \in L^p(\mathbb{R}, dx)$ for every $k \ge n$, and that the supremum of $\max[W(x), 0]$ is sufficiently small. Then $F_n \in L^\infty$, and $\operatorname{Re}(F_n - \lambda) \ge -\delta$, where $\delta > 0$ may be taken to be as small as desired. The remainder term $E_n$ belongs to $L^p(\mathbb{R}, dx)$, and moreover $\partial^m E_n/\partial\lambda^m \in L^p(\mathbb{R}, dx)$ for every $m$, uniformly in $\lambda \in K$.

Proof. Write $W^{(s)} = \partial^s W/\partial x^s$. It is a direct consequence of the recursion (4.2) that each $F_k(x, \lambda)$ may be expressed as a smooth function of $\lambda, W(x), W'(x), \ldots, W^{(k-1)}(x)$. Moreover, for $k \ge 1$, $F_k - \sqrt{\lambda^2 - W}$ will be as small as may be desired in supremum norm, provided that $W', \ldots, W^{(k-1)}$ and $\max_x(W(x), 0)$ are all sufficiently small in the senses detailed above.

We say that a monomial $\prod_m (W^{(m)})^{d_m}$ has weight $d = \sum_m d_m \cdot m/n$. Such a monomial belongs to $L^r(\mathbb{R}, dx)$, where $r^{-1} = d/p$, by Lemma 3.2.

Like $F_{k+1}$, $E_k$ may be expressed as a smooth function of $\lambda, W(x), W'(x), \ldots, W^{(k)}(x)$. More precisely, $E_k$ can be expressed as a finite sum of terms $H(W, W', \ldots, W^{(k-1)}) \cdot P(W, W', \ldots, W^{(k)})$, where $H$ is a smooth function in a neighborhood of $(-\infty, \varepsilon) \times \{0, 0, \ldots, 0\} \subset \mathbb{R} \times \mathbb{R}^{k-1}$ for a fixed constant $\varepsilon = \varepsilon(K) > 0$, and $P$ is a monomial of weight exactly $k/n$. In particular, there can be at most one factor of the highest-order derivative $W^{(k)}$.

This description of $E_k$ follows by induction on $k$ from the above description of $F_k$, together with the recursion (4.5). Consequently $E_n \in L^p$, by Lemma 3.2 and Hölder's inequality. One consequence is that the real parts of both $F_k$ and $F_k^2$ will be bounded below by a fixed strictly positive constant.
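The weight bookkeeping in the proof of Lemma 4.1 can be illustrated with exact arithmetic. This is a sketch with hypothetical names, restricted (as an assumption) to monomials in derivatives of order $1 \le m \le n$, where each $W^{(m)}$ lies in $L^{pn/m}$ and Hölder's inequality places the product in $L^r$ with $1/r = d/p$.

```python
from fractions import Fraction


def weight(monomial, n):
    """Weight d = sum_m d_m * m / n of a monomial prod_m (W^(m))^(d_m).

    `monomial` maps a derivative order m (1 <= m <= n) to its power d_m.
    """
    if any(m < 1 or m > n or d < 0 for m, d in monomial.items()):
        raise ValueError("expected orders 1 <= m <= n and powers d_m >= 0")
    return sum(Fraction(d * m, n) for m, d in monomial.items())


def lebesgue_exponent(monomial, n, p):
    """Exponent r with 1/r = d/p, for a monomial of positive weight d."""
    d = weight(monomial, n)
    if d == 0:
        raise ValueError("weight zero: no integrability is gained")
    return Fraction(p) / d


if __name__ == "__main__":
    # With n = 3, the monomial W' * W'' has weight 1/3 + 2/3 = 1, so it lies in L^p.
    print(weight({1: 1, 2: 1}, 3), lebesgue_exponent({1: 1, 2: 1}, 3, Fraction(3, 2)))
```

A monomial of weight exactly $n/n = 1$ therefore lands in $L^p$, which is the case used for $E_n$.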
5. Combining Ingredients

Fix $n \ge 1$ and assume that $V$ satisfies the hypotheses of Theorem 1.1 with this index $n$. For $E > A_+ = \operatorname{ess\,limsup}_{x\to+\infty} V(x)$, rewrite $V(x) - E = [V(x) - A_+] - \lambda^2$, where $\lambda > 0$. We will henceforth replace $V$ by $V - A_+$, and may thus assume that $\operatorname{ess\,limsup}_{x\to+\infty} V(x) = 0$. By modifying this new $V$ only on an interval $(-\infty, N]$, we may then assume that $\operatorname{ess\,limsup}_{x\to+\infty} V(x)$ is smaller than any preassigned positive quantity. Such a modification has no effect on the asymptotic behavior of the generalized eigenfunctions, as $x \to +\infty$.

Now we combine the splitting $V = W + \tilde V$ of the potential, the generalized WKB approximation of the preceding section, and the computations in Sect. 2. Decompose $V = W + \tilde V$ as in (3.1). Then $\max(0, \sup_x W(x))$ may likewise be taken to be arbitrarily small. Let $F_n(x, \lambda)$ be constructed from $W$, by iterating the recursion (4.2) to order $n$, and let $E_n = F_n^2 - iF_n' + (W - \lambda^2)$. As in (4.7), define
\[ \Psi(x, \lambda) = \int_0^x \left(F_n - \frac{\tilde V + E_n}{2\operatorname{Re}F_n}\right)(s, \lambda)\, ds. \tag{5.1} \]
Also define $\phi(x, \lambda) = \int_0^x F_n$.

$\Psi$ has bounded imaginary part. Indeed, by Lemma 4.1, $\log|\operatorname{Re}\phi'(x, \lambda)| = \log|\operatorname{Re}F_n|$ is a bounded function of $x$, for each $\lambda \in K$, provided that $V$ is sufficiently small, in the sense described in the first paragraph of this section. By Lemma 2.1, $\exp(i\Psi(x, \lambda))$ is therefore a bounded function of $x \in \mathbb{R}^+$, uniformly for every $\lambda \in K$.

The two functions $\exp(\pm i\Psi)$ are linearly independent, and the same holds for any two perturbations of them that are sufficiently small in the supremum norm near $+\infty$. Indeed, the main constituent $F_n$ of $\partial\Psi/\partial x$ has a real part that is bounded below by a positive constant. The other part, $(\tilde V + E_n)/2\operatorname{Re}F_n$, tends to zero in the sense that its $L^1$ norm on an arbitrary interval $[N, N + 1]$ approaches zero as $N \to \infty$. Linear independence thus follows from an elementary argument.

The remainder of the paper is devoted to the proof of the following result, which was mentioned in the Introduction but not formulated precisely there.

Theorem 5.1. Under the hypotheses of Theorem 1.1, for almost every $\lambda > 0 = A_+$, there exists a generalized eigenfunction $u(x, \lambda)$ such that
\[ u(x, \lambda)e^{-i\Psi(x, \lambda)} \to 1 \tag{5.2} \]
as $x \to +\infty$. $u$ and its complex conjugate are linearly independent. Under the stronger hypotheses of Theorem 1.3, the set of all $\lambda$ for which (5.2) fails to hold has Hausdorff dimension less than or equal to $1 - \gamma p'$.

Because $\Psi$ has bounded imaginary part, (5.2) implies boundedness of all generalized eigenfunctions associated to the spectral parameter $\lambda^2$, and hence Theorem 5.1 implies the main conclusions of the theorems formulated in the Introduction.

To make use of the results in Sect. 2, set $U = V - \lambda^2$. Then
\[ E(x, \lambda) = i\phi'' - (\phi')^2 - (V - \lambda^2) = -E_n(x, \lambda) - \tilde V(x). \]
We have $\partial^m E/\partial\lambda^m \in \ell^p(L^1)(\mathbb{R}, dx)$ for every $m \ge 0$, uniformly for each $\lambda$ in any compact subset of $\mathbb{R}$, and the same goes for $E/\operatorname{Re}\phi' = E/\operatorname{Re}F_n$. Define
\[ \mathcal{F}(x, \lambda) = -iE/2\operatorname{Re}F_n. \tag{5.3} \]
We know that $\mathcal{F}(\cdot, \lambda) \in \ell^p(L^1)$ uniformly for all $\lambda$ in any compact subset $K$ of $(0, \infty)$, and moreover the same goes for $\partial_\lambda^r \mathcal{F}(\cdot, \lambda)$ for all $r$. According to (2.5),
\[ h(x, \lambda) = \int_0^x \left(2\operatorname{Re}F_n + \frac{\operatorname{Re}E}{\operatorname{Re}F_n}\right). \]
In order to demonstrate the first conclusion of Theorem 1.1, it suffices to show that for almost every $\lambda$, there exists a $\mathbb{C}^2$-valued solution $u$ of $u' = Du$ such that $u(x) - \binom{1}{0} \to 0$ as $x \to +\infty$. We will deduce this from analytic machinery built up in earlier papers [8, 9]. In the final section of the paper, we will indicate the modifications needed to treat all $\lambda$ outside a set of appropriately bounded Hausdorff dimension, under the stronger hypotheses of Theorem 1.3.

6. A Nonlinear Hausdorff–Young Inequality

Fix $n$, let $1 \le p \le 2$, and fix a compact subinterval $K \subset (0, \infty)$. Let $h(x, \lambda)$ be defined as in Sect. 5. Consider the operator
\[ Sf(\lambda) = \int_{\mathbb{R}} e^{ih(x, \lambda)} f(x)\, dx. \]
Lemma 6.1. Let $V$ satisfy the hypotheses of Theorem 1.1, for a certain $n \ge 0$ and $p < \infty$, with sufficiently small norms. Let $1 \le s \le 2$, and let $q = s/(s - 1)$ be the exponent conjugate to $s$. Then there exists $C < \infty$ such that for any $f \in L^1(\mathbb{R})$,
\[ \|Sf\|_{L^q(K, d\lambda)} \le C\|f\|_{\ell^s(L^1)(\mathbb{R})}. \tag{6.1} \]
Proof. First consider the case $s = 2$. Let $\zeta$ be a real-valued, nonnegative, smooth auxiliary function that is strictly positive on $K$ and is supported in a small neighborhood of $K$. Consider
\begin{align*}
\int_K \Bigl|\int_{\mathbb{R}} e^{ih(x, \lambda)} f(x)\, dx\Bigr|^2 \zeta(\lambda)\, d\lambda
&= \int_K \Bigl(\int_{\mathbb{R}} e^{ih(x, \lambda)} f(x)\, dx\Bigr)\Bigl(\int_{\mathbb{R}} e^{-ih(y, \lambda)} \bar f(y)\, dy\Bigr) \zeta(\lambda)\, d\lambda \\
&= \iint_{\mathbb{R} \times \mathbb{R}} \Bigl(\int_K e^{i(h(x, \lambda) - h(y, \lambda))} \zeta(\lambda)\, d\lambda\Bigr) f(x)\bar f(y)\, dx\, dy.
\end{align*}
The hypotheses and earlier lemmas imply that $|h(x, \lambda) - h(y, \lambda)|$ is bounded above and below by positive constants times $|x - y|$, provided that $|x - y|$ is sufficiently large, uniformly in $\lambda$. We claim that the inner integral is bounded by $C(1 + |x - y|)^{-2}$; the conclusion (6.1) then follows directly from this, for $q = 2$. When $|x - y|$ is bounded, the estimate
is trivial. When $|x - y|$ is large, multiply and divide by $\partial_\lambda(h(x, \lambda) - h(y, \lambda))$, and integrate by parts with respect to $\lambda$ in the inner integral, integrating $\exp(i(h(x, \lambda) - h(y, \lambda))) \cdot \partial_\lambda(h(x, \lambda) - h(y, \lambda))$. Multiply and divide, then integrate by parts once more, to conclude the proof for $s = 2$. The case $s = 1$ is trivial, and the general case then follows by interpolation.

In our application, $S$ will act on $\mathcal{F}(x, \lambda)$, which itself depends on $\lambda$. This will necessitate some modification; see the proof of Proposition 7.1.

7. Summation of the Solution Series

In order to prove Theorem 5.1, we now combine the WKB-type Ansatz developed in Sects. 2–5 with Lemma 6.1 and with the machinery developed in [8, 9]. We seek to solve the equation $u' = Du$, and to find a solution such that $u(x) \to \binom{1}{0}$ as $x \to +\infty$. Writing the equation as
\[ \binom{1}{0} - u(x) = \int_x^\infty D(y)u(y)\, dy, \]
we obtain the formal series solution
\[ u(x) = \binom{1}{0} + \sum_{k=1}^\infty (-1)^k \int\!\cdots\!\int_{x \le t_1 \le t_2 \le \cdots \le t_k < \infty} D(t_1)D(t_2)\cdots D(t_k)\binom{1}{0}\, dt_k \cdots dt_2\, dt_1. \tag{7.1} \]
Introduce multilinear operators
\[ T_m(f_1, \ldots, f_m)(x, \lambda) = \int\!\cdots\!\int_{x \le t_1 \le t_2 \le \cdots \le t_m} \prod_{k=1}^m \exp\bigl(i(-1)^{m-k}h(t_k, \lambda)\bigr) f_k(t_k)\, dt_k \tag{7.2} \]
for each $m \ge 1$. Then the series solution is formally
\[ u(x) = \binom{1}{0} + \begin{pmatrix} \sum_{m=1}^\infty T_{2m}(\mathcal{F}_{2m,1}, \ldots, \mathcal{F}_{2m,2m}) \\ \sum_{m=0}^\infty T_{2m+1}(\mathcal{F}_{2m+1,1}, \ldots, \mathcal{F}_{2m+1,2m+1}) \end{pmatrix}, \]
where each $\mathcal{F}_{m,j}$ equals either $\mathcal{F}(x, \lambda)$ or its complex conjugate; the precise rule is of no consequence for our estimates. Even the operator $T_1$ is highly nonlinear, since both $h$ and $\mathcal{F}$ are nonlinear functions of the potential $V$.

It suffices to show that for almost every $\lambda \in K$, these two series converge, and define bounded functions of $x \in \mathbb{R}$. The proof below, together with arguments in [8], then demonstrates that the sum of the vector-valued series does define a solution of $u' = Du$ such that $u(x) \to \binom{1}{0}$ as $x \to +\infty$, hence gives rise to a bounded solution, not identically vanishing, of the original generalized eigenfunction equation $Hf = \lambda^2 f$.

This type of result was treated in our analysis [8] of the case $n = 1$. The principal new twist here is that the functions $\mathcal{F}$ on which our multilinear operators act now depend
on $\lambda$. This situation is much like that considered in Theorem 1.3 of [8], where $V$ itself was allowed to depend on $\lambda$. Let
\[ B = \sum_{r=0}^{1} \|\partial_\rho^r \mathcal{F}(\cdot, \rho)\|_{\ell^p(L^1)}^p. \]
As proved in [8], there exist sets $E_j^m \subset \mathbb{R}$, indexed by $1 \le m < \infty$, $1 \le j \le 2^m$, satisfying:
• $\mathbb{R} = \cup_j E_j^m$ for every $m$.
• $E_j^m \cap E_{j'}^m = \emptyset$ for every $j < j'$.
• If $j < j'$, $x \in E_j^m$, and $x' \in E_{j'}^m$, then $x < x'$.
• For every $m, j$, $E_j^m = E_{2j-1}^{m+1} \cup E_{2j}^{m+1}$.
• For every $m, j$, $\|\partial_\rho^r \mathcal{F}(\cdot, \rho)\chi_j^m\|_{\ell^p(L^1)}^p \le 2^{-m}B$ for $r = 0, 1$.

We denote by $\chi_E$ the characteristic function of the set $E$, and introduce the special notation $\chi_j^m$ for the characteristic function of the interval $E_j^m$. Fix such sets $E_j^m$ for the remainder of Sect. 7.

Define a multilinear operator $M_n$, acting on $n$ functions $g_k$ of $(x, \lambda)$, by
\[ M_n(g_1, \ldots, g_n)(x, x', \lambda) = \int\!\cdots\!\int_{x \le t_1 \le \cdots \le t_n \le x'} \prod_{k=1}^n g_k(t_k, \lambda)\, dt_k. \]
In the special case when there is a single function $g$ such that each $g_k$ is an element of the set $\{g, \bar g\}$, we write simply $M_n(g)(x, x', \lambda)$. In our application, $g(x, \lambda)$ will essentially be equal to $e^{ih} \cdot \mathcal{F}$. Define
\[ M_n^*(g_1, \ldots, g_n)(\lambda) = \sup_{x \le x' \in \mathbb{R}} |M_n(g_1, \ldots, g_n)(x, x', \lambda)|, \]
\[ M_n^*(g)(\lambda) = \sup_{x \le x' \in \mathbb{R}} |M_n(g)(x, x', \lambda)|, \]
\[ \tilde G(g, \lambda) = \sum_{m=1}^\infty m\left(\sum_{j=1}^{2^m} \Bigl|\int_{E_j^m} g(x, \lambda)\, dx\Bigr|^2\right)^{1/2}. \tag{7.3} \]
It will be useful to regard $\tilde G$ as a linear operator. To do this, introduce the Banach space $\mathcal{B}$ consisting of all complex-valued sequences $a = a(m, j)$ indexed by $1 \le m < \infty$ and $1 \le j \le 2^m$, for which $\sum_m m\bigl(\sum_j |a(m, j)|^2\bigr)^{1/2} < \infty$. Then $\tilde G(g, \lambda)$ equals the norm in $\mathcal{B}$ of the sequence $\{\int_{E_j^m} g(x, \lambda)\, dx\}$.
In Proposition 4.2 of [9] and in the proof of Theorem 1.3 of that reference it is shown² that
\[ M_n^*(g_1, \ldots, g_n)(\lambda) \le C^n \prod_{k=1}^n \tilde G(g_k, \lambda), \tag{7.4} \]
\[ M_n^*(g)(\lambda) \le C^n \frac{\tilde G(g, \lambda)^n}{\sqrt{n!}} \tag{7.5} \]
for some universal constant $C < \infty$. Moreover, there is likewise a factor of $1/\sqrt{n!}$ on the right-hand side of (7.4), if the number of distinct functions $g_k$ is bounded by any fixed constant independent of $n$. One formal consequence of (7.5) is that the series solution $u$ of $u' = Du$ satisfies
\[ \sup_x |u(x, \lambda)| \le C\exp(C\tilde G(\lambda)^2), \tag{7.6} \]
where $\tilde G$ is as defined in (7.3), with $g(x, \lambda) = \exp(ih(x, \lambda)) \cdot \mathcal{F}(x, \lambda)$.

Proposition 7.1. Let $n$, $V$, $p$ be as in the hypotheses of Theorem 1.1. Let $\mathcal{F}$ be as defined above, and suppose that for each $s, k$, $\mathcal{F}_{s,k}$ equals either $\mathcal{F}$ or $\bar{\mathcal{F}}$. Then for each $s \ge 1$, for almost every $\lambda \in K$,
\[ \lim_{x' \to \infty} \int\!\cdots\!\int_{x \le t_1 \le \cdots \le t_s \le x'} \prod_{k=1}^s e^{\pm ih(t_k, \lambda)} \mathcal{F}_{s,k}(t_k, \lambda)\, dt_k \]
exists for every $x$, and
\[ T_s^*(\lambda) = \sup_x \Bigl|\int\!\cdots\!\int_{x \le t_1 \le \cdots \le t_s < \infty} \prod_{k=1}^s e^{\pm ih(t_k, \lambda)} \mathcal{F}_{s,k}(t_k, \lambda)\, dt_k\Bigr| \]
is finite. Finally, $\sum_{s=0}^\infty T_s^*(\lambda)$ is finite for almost every $\lambda \in K$.
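One elementary way to see why a factor $1/\sqrt{n!}$ as in (7.5) leads to a bound of the shape (7.6) is the Cauchy–Schwarz inequality: writing $a^n/\sqrt{n!} = \bigl((\sqrt{2}a)^n/\sqrt{n!}\bigr) \cdot 2^{-n/2}$ gives $\sum_{n \ge 0} a^n/\sqrt{n!} \le \sqrt{2}\exp(a^2)$. This remark and the sketch below are our own illustration, with hypothetical function names, and are not the argument of [8, 9].

```python
import math


def series_partial_sum(a, terms=200):
    """Partial sum of sum_{n >= 0} a^n / sqrt(n!) for a > 0, using logarithms."""
    if a <= 0:
        raise ValueError("expected a > 0")
    total = 0.0
    for n in range(terms):
        # log(a^n / sqrt(n!)) = n log a - (1/2) log Gamma(n + 1)
        log_term = n * math.log(a) - 0.5 * math.lgamma(n + 1)
        total += math.exp(log_term)
    return total


def cauchy_schwarz_bound(a):
    """Upper bound sqrt(2) * exp(a^2) obtained from Cauchy-Schwarz."""
    return math.sqrt(2.0) * math.exp(a * a)


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0, 3.0):
        print(a, series_partial_sum(a), cauchy_schwarz_bound(a))
```

In the setting of the proposition, $a$ plays the role of $C\tilde G(g, \lambda)$, so finiteness of $\tilde G$ at a given $\lambda$ controls the whole series at that $\lambda$.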
The plus and minus signs in the exponents are not specified; these assertions are valid for all choices of signs, with uniform bounds.

Proof. Fix any compact subinterval $K$ of $(0, \infty)$, and let $q = p/(p - 1) > 2$ be the exponent conjugate to $p$. By Lemma 6.1, the mapping $f \mapsto \int_{\mathbb{R}} e^{\pm ih(x, \lambda)} f(x)\, dx$ maps $\ell^p(L^1)$ boundedly to $L^q(K)$. Therefore by Proposition 3.3 of [8] and the remark following the proof of Theorem 1.1 of [9], the $L^q(K, \mathcal{B})$ norm of $\{\int_{E_j^m} e^{\pm ih(x, \lambda)} f(x)\, dx\}$ is majorized by a fixed constant times the $\ell^p(L^1)$ norm of $f$, provided that the collection of sets $E_j^m$ is adapted to $f$, in the sense that $\|f \cdot \chi_j^m\|_{\ell^p(L^1)}^p \le 2^{-m}\|f\|_{\ell^p(L^1)}^p$. Taking $f(x)$ first equal to $\mathcal{F}(x, \rho)$ and then equal to $\partial_\rho \mathcal{F}(x, \rho)$, we conclude that for $r = 0, 1$, the $L^q(K, \mathcal{B}, d\lambda)$ norm of $\{\partial_\rho^r \int_{E_j^m} e^{\pm ih(x, \lambda)} \mathcal{F}(x, \rho)\, dx\}$ is majorized by $C\|\mathcal{F}\|_{\ell^p(L^1)} + C\|\partial_\rho \mathcal{F}\|_{\ell^p(L^1)}$, hence by a finite constant, uniformly for all $\rho \in K$. Thus $\partial_\rho^r \int_{E_j^m} e^{\pm ih(x, \lambda)} \mathcal{F}(x, \rho)\, dx \in L^q(K \times K, \mathcal{B}, d\lambda\, d\rho)$ for $r = 0, 1$. Therefore by the one-dimensional Sobolev embedding theorem, $K \ni \rho \mapsto \int_{E_j^m} e^{\pm ih(x, \lambda)} \mathcal{F}(x, \rho)\, dx$ is a continuous $\mathcal{B}$-valued function for almost every $\lambda \in K$, and the supremum over $\rho$ of its $\mathcal{B}$-norm belongs to $L^q(K, d\lambda)$. Thus we conclude that
\[ \tilde G(\mathcal{F}(\cdot, \lambda), \lambda) \in L^q(K, d\lambda). \tag{7.7} \]
From (7.5), it follows that the supremum over all pairs $x, x'$ of
\[ \Bigl|\int\!\cdots\!\int_{x \le t_1 \le \cdots \le t_s \le x'} \prod_{k=1}^s e^{\pm ih(t_k, \lambda)} \mathcal{F}_{s,k}(t_k, \lambda)\, dt_k\Bigr| \]
is finite for almost every $\lambda \in K$. Existence of the limit, as $x' \to \infty$, then follows as in the proof of Proposition 4.1 of [8]. Summability with respect to $s$ holds, because of the factor of $1/\sqrt{n!}$ in (7.5), as expressed in (7.6).

² A bound with a factor of $1/\sqrt{n!}$ also appears in the elementary theory of Volterra integral equations [29], p. 12; it appears not to be closely related to (7.5).

8. A Bound on the Set of Exceptional Energies

Assume that $V$ satisfies the hypotheses of Theorem 1.3 for some $1 < p \le 2$, $\gamma > 0$. We seek an upper bound on the Hausdorff dimension of the set of all $\lambda$ for which the WKB-type asymptotics fail to hold. Suppose that $\beta > 1 - p'\gamma$, where $p' = p/(p - 1)$. We may assume that $0 < \beta < 1$, $\gamma < 1$ and $p > 1$; otherwise there is nothing to prove, or the result is already known [1]. Let $H^\beta$ denote $\beta$-dimensional Hausdorff measure. Fix a compact subinterval $K$ of $(0, \infty)$. Throughout this section, $V$, $\mathcal{F}$, $h$ are functions of $x \ge 0$.

By its construction, the exponent $h(x, \lambda)$ defined in (2.5) satisfies $|\partial^k h(x, \lambda)/\partial\lambda^k| \le C + Cx$, uniformly in $\lambda \in K$, for every $k \ge 0$. Indeed, $\partial^{k+1}h/\partial\lambda^k\partial x$ is bounded. Likewise, $(1 + x)^\gamma$ times any partial derivative of $\mathcal{F}(x, \lambda)$ with respect to $\lambda$ belongs to $\ell^p(L^1)(\mathbb{R}^+)$. This follows from natural analogues of Lemmas 3.1 and 3.2; in particular, $(1 + x)^{\gamma k/n}\partial^k W(x, \lambda)/\partial x^k \in L^\infty \cap L^{pn/k}$ for $0 \le k \le n$, and the same holds for each of its partial derivatives with respect to $\lambda$. By subtracting a constant from $V$, we can also assume that $\operatorname{ess\,limsup}_{x\to+\infty} V(x)$ equals $0$.

As in Sect. 7, consider the formal series solution (7.1) of $u' = Du$. Define
\[ T_m(F_1, \ldots, F_m)(x, x', \lambda) = \int\!\cdots\!\int_{x \le t_1 \le t_2 \le \cdots \le t_m \le x'} \prod_{k=1}^m e^{\pm ih(t_k, \lambda)} F_k(t_k)\, dt_k. \]
Define intervals $E_j^m \subset \mathbb{R}^+$ by the same construction used in Sect. 7, but applied to $(1 + x)^\gamma \cdot \mathcal{F}(\cdot, \lambda)$. To any function $F(x, \lambda)$ associate the doubly indexed sequence of numbers
\[ g(F)(\lambda) = \Bigl\{\int_{E_j^m} e^{ih(t, \lambda)} F(t, \lambda)\, dt\Bigr\}_{m \ge 1,\ 1 \le j \le 2^m}. \]
Recall from Sect. 7 that the $\mathcal{B}$-norm of a doubly indexed sequence $a(m, j)$ is defined to be $\sum_m m\bigl[\sum_j |a(m, j)|^2\bigr]^{1/2}$. Define $\mathcal{F}^{(N)}(x, \lambda) = \mathcal{F}(x, \lambda)$ for $x \ge N$, and $= 0$ for $x < N$. A direct consequence of the definitions is that
\[ T_m(\mathcal{F}, \ldots, \mathcal{F})(x, x', \lambda) \equiv T_m(\mathcal{F}^{(N)}, \ldots, \mathcal{F}^{(N)})(x, x', \lambda) \quad\text{for all } x, x' \ge N. \tag{8.1} \]
Throughout this discussion, the exponent $h$ and the sets $E_j^m$ appearing in the definitions of $T_m$ and $g$ are defined in terms of the original potential $V$; they are independent of $N$. Define $\Lambda_c = \{\lambda \in K : \|g(\mathcal{F}^{(N)})(\lambda)\|_{\mathcal{B}} \ge c \text{ for every } N \ge 0\}$. We will prove that for any $c > 0$, $H^\beta(\Lambda_c) = 0$. Since by (7.5) and (8.1), whenever $N \ge M$,
\[ \sum_{m=1}^\infty \sup_{x, x' \ge N} |T_m(\mathcal{F}^{(N)}, \ldots, \mathcal{F}^{(N)})(x, x', \lambda)| \le C\exp\bigl(C\|g(\mathcal{F}^{(M)})(\lambda)\|_{\mathcal{B}}^2\bigr), \]
this implies that for any $c > 0$,
\[ H^\beta\Bigl\{\lambda \in K : \limsup_{x, x' \to \infty} \sum_{m=1}^\infty |T_m(\mathcal{F}(\cdot, \lambda), \ldots, \mathcal{F}(\cdot, \lambda))(x, x', \lambda)| \ge c\Bigr\} = 0. \tag{8.2} \]
As in the proofs of Proposition 7.1 of this paper, and Proposition 4.1 of [8], that suffices to establish convergence of the series defining $u$, and validity of the WKB asymptotics, for all $\lambda$ outside a set whose $H^\beta$ measure equals zero.

Let $q = p'$. We claim that $g(\mathcal{F})$ belongs to the Sobolev space $L^q_\gamma$ of all $\mathcal{B}$-valued functions having $\gamma$ derivatives in $L^q$, in a fixed neighborhood of $K$. To prove this, consider the analytic family of functions $\mathcal{F}_z(x, \lambda) = (1 + x)^z\mathcal{F}(x, \lambda)$. For $\operatorname{Re}(z) = \gamma$, $g(\mathcal{F}_z) \in L^q$, by Lemma 6.1 and Proposition 7.1. For $\operatorname{Re}(z) = \gamma - 1$, we have $\partial_\lambda g(\mathcal{F}_z) \in L^q(K, d\lambda)$. Indeed, when $\int_{E_j^m} e^{\pm ih(x, \lambda)}\mathcal{F}_z(x, \lambda)\, dx$ is differentiated with respect to $\lambda$, the derivative falls either on $\mathcal{F}$, or on the exponent $h$. In the former case, no harm is done, because each partial derivative of $\mathcal{F}$ with respect to $\lambda$ satisfies the same bounds as does $\mathcal{F}$ itself; moreover, matters are improved by the factor of $(1 + x)^z$ in the definition of $\mathcal{F}_z$, since $\operatorname{Re}(\gamma - 1) \le 0$. In the latter case, $\mathcal{F}$ is replaced by $\pm i\partial_\lambda h \cdot \mathcal{F}$. Since $\partial^k h/\partial\lambda^k = O(x)$ for every $k \ge 1$, this results in an extra $O(1 + x)$ factor; when combined with the factor of $(1 + x)^{\gamma - 1}$ in the definition of $\mathcal{F}_z$, this means that we are applying $g$ to a function all of whose $\lambda$-derivatives belong to $\ell^p(L^1)$ for each $\lambda$. Thus Lemma 6.1, in its $\lambda$-dependent version developed in the proof of Proposition 7.1, applies once more.

Moreover, the Sobolev norm of $g(\mathcal{F}^{(N)})$ tends to zero as $N \to \infty$. This follows from three facts. Firstly, in a fixed neighborhood of $K$, by the discussion in the preceding paragraph, for any fixed $m, j$, the scalar-valued function $\int_{E_j^m} e^{ih(t, \lambda)} F(t, \lambda)\, dt$ has $L^q_\gamma$ norm bounded by $C\sum_{r=0}^1 \sup_{\lambda \in K}\|(1 + |t|)^\gamma\partial_\lambda^r F(t, \lambda)\|_{\ell^p(L^1)(\mathbb{R}^+, dt)}$. Secondly, for $F = \mathcal{F}^{(N)}$, the two $\ell^p(L^1)$ norms in this last expression tend to zero as $N \to \infty$. Thirdly, the claim of the preceding paragraph remains valid if the norm on $\mathcal{B}$ is changed so that the norm of a sequence $a(m, j)$ is $\sum_m m^2\bigl(\sum_j |a(m, j)|^2\bigr)^{1/2}$ (the weight $m$ has been changed to $m^2$). Indeed, this remains true with any power of $m$, as follows from the proof of Proposition 7.1, the argument two paragraphs above, and [9].

Suppose now that for some $c > 0$, $H^\beta(\Lambda_c) > 0$. Then by Theorem II.1 of [6], there exists a finite positive measure $\mu$ with $\mu(\Lambda_c) > 0$, satisfying $\mu(I) \le |I|^\beta$ for every interval $I$. Let $N$ be large. By a potential-theoretic characterization of Sobolev spaces [26], we conclude that $\|g(\mathcal{F}^{(N)})(\lambda)\|_{\mathcal{B}} \le J * f_N(\lambda) = \int_{\mathbb{R}} J(\lambda - \rho)f_N(\rho)\, d\rho$, where $f_N$, $J$ are
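nonnegative, $f_N \in L^q(\mathbb{R})$, $\|f_N\|_{L^q} \to 0$ as $N \to \infty$, $J(\rho) \le C|\rho|^{\gamma - 1}$ for all $|\rho| \le 1$, and $J(\rho) \le C\exp(-c|\rho|)$ for $|\rho| \ge 1$.

A simple calculation using the hypothesis $\beta > 1 - p'\gamma$ in conjunction with the upper bound on $\mu(I)$ demonstrates that $J * \mu \in L^{q'} = L^p$. (Decompose $J$ as a Schwartz function plus $\sum_{r=0}^\infty J_r$, where $J_r(x)$ is supported where $|x| \le 2^{-r}$, and $\|J_r\|_{L^\infty} \le C2^{r(1 - \gamma)}$. Estimate the $L^p$ norm of $\mu * J_r$ by interpolating between simple $L^1$ and $L^\infty$ bounds, then sum over $r$.) Thus
\[ \int (J * f_N)\, d\mu = \int f_N \cdot (J * \mu)\, d\lambda \le \|f_N\|_{L^q} \cdot \|J * \mu\|_{L^{q'}}, \]
which tends to zero as $N \to \infty$. But by hypothesis and the definition of $\Lambda_c$, $\|g(\mathcal{F}^{(N)})(\lambda)\|_{\mathcal{B}} \ge c$ for every $\lambda \in \Lambda_c$ and every $N$. Therefore
\[ \int (J * f_N)\, d\mu \ge \int \|g(\mathcal{F}^{(N)})\|_{\mathcal{B}}\, d\mu \ge c\mu(\Lambda_c) > 0, \]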
a contradiction.

Communicated by B. Simon
Commun. Math. Phys. 218, 263 – 281 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Superselection Theory for Subsystems
Roberto Conti1, Sergio Doplicher2, John E. Roberts1
1 Dipartimento di Matematica, Università di Roma “Tor Vergata”, 00133 Rome, Italy
2 Dipartimento di Matematica, Università di Roma “La Sapienza”, 00185 Rome, Italy
Received: 23 December 1999 / Accepted: 25 November 2000
Abstract: An inclusion of observable nets satisfying duality induces an inclusion of canonical field nets. Any Bose net intermediate between the observable net and the field net and satisfying duality is the fixed-point net of the field net under a compact group. This compact group is its canonical gauge group if the occurrence of sectors with infinite statistics can be ruled out for the observable net and its vacuum Hilbert space is separable.

1. Introduction

In this paper, we take the view that a physical system is described by its observable net, a net $\mathcal{O} \to \mathcal{A}(\mathcal{O})$ of von Neumann algebras over double cones in Minkowski space in its vacuum representation. Our goal is to analyze subsystems.

Of course, physical intuition would lead us to believe that in physically realistic situations there will be no proper subsystems since any putative subsystem loses its identity through interaction with the ambient system. A result in this direction would be interesting as it would allow one to pinpoint many natural sets of generators. For example, one could, modulo technicalities, claim that the observable net is generated by the energy–momentum tensor density [6, 3]. Unfortunately, we have no such result.

Instead, we turn to consider what might prove to be the exceptional case where there are proper subsystems. For example, if the original system admits symmetries, i.e. if there is a nontrivial group of local automorphisms of the net leaving the vacuum state invariant, then the fixed-point net provides an example of a subsystem and one may even wonder whether every subsystem arises in this manner.

In fact, the theory of superselection sectors provides a natural mechanism for giving examples of subsystems. Each observable net $\mathcal{A}$ is contained in an associated canonical field net $\mathcal{F}$ as the fixed-point net under a compact group $G$ of gauge automorphisms. The fixed-point net $\mathcal{B}$ under a proper subgroup of $G$, containing the element changing the sign of Fermi fields, if present, can be treated as the observable net of some other physical system and $\mathcal{A}$ will then be a subsystem of $\mathcal{B}$.

Research supported by MURST, CNR–GNAFA and EU.
Here are some of the questions we would like to answer. Can one classify subsystems? What is the relation between the superselection structure of a system and that of its subsystems and how are their canonical field nets related? Earlier partial results on the classification problem can be found in [17] and [5].

We have only been able to make sensible progress on classifying subsystems by restricting ourselves to systems satisfying duality in the vacuum representation. Since there are good reasons for believing that $\mathcal{A}$ satisfies essential duality, this amounts to replacing $\mathcal{A}$ by its dual net, thus ignoring, for example, the possibility of spontaneously broken gauge symmetries. This partial solution does have the merit of reducing the classification problem to that of finding all observable nets with a given dual net.

In Sect. 2 we study inclusions of observable nets and their functorial properties. Thus if $\mathcal{A} \subset \mathcal{B}$, we get an inclusion of the corresponding categories of 1-cocycles, $Z^1(\mathcal{A}) \subset Z^1(\mathcal{B})$ and hence of the corresponding categories of transportable morphisms, $\mathcal{T}_t(\mathcal{A}) \subset \mathcal{T}_t(\mathcal{B})$, and finally restricting to finite statistics gives $\mathcal{T}_f(\mathcal{A}) \subset \mathcal{T}_f(\mathcal{B})$. We interpret this latter inclusion in terms of the associated homomorphism from the gauge group of $\mathcal{B}$ to that of $\mathcal{A}$.

In Sect. 3, we study inclusions of field nets giving conditions for the existence of an associated conditional expectation. Conditional expectations also play a decisive role in proving that an inclusion of observable algebras satisfying duality gives rise to an inclusion of the corresponding complete normal field nets.

In Sect. 4, we study intermediate nets, that is nets contained between the observable net and its canonical field net, showing that such nets are the fixed-points of $\mathcal{F}$ under a closed subgroup $L$ of the gauge group. After this, we prove a result showing that the sectors of an intermediate observable net correspond, as one would expect, to the equivalence classes of irreducible representations of $L$. This was the principal objective of this paper and is worth comparing with previous results. There are two known sets of structural hypotheses [13, 22] allowing one to conclude that a Bosonic net has no sectors. Our result would be a consequence whenever $\mathcal{F}$ were Bosonic. However, in the absence of evidence that a Bosonic canonical field net satisfies the structural hypotheses, these results of [13] and [22] have been most useful in proving the absence of sectors in examples such as free field theories. For our result, we need to exclude infinite statistics for the observable net, a weaker hypothesis with little known about its validity. In addition we have to assume that the vacuum Hilbert space of $\mathcal{A}$ is separable, as indeed it is, in practice.

The paper concludes with an appendix giving results on the harmonic analysis of the action of compact groups on von Neumann and $C^*$-algebras and on conditional expectations needed in the course of this paper.
2. Inclusions of Observable Nets

In this section, we will be considering an inclusion $\mathcal{A} \subset \mathcal{B}$ of observable (i.e. local) nets, with a view to seeing what can be said about the relation between the corresponding superselection sectors. Each observable net will be considered as acting irreducibly on its own vacuum Hilbert space (denoted in the sequel by $\mathcal{H}_{\mathcal{A}}$, $\mathcal{H}_{\mathcal{B}}$). Of course, if $\mathcal{A} \subset \mathcal{B}$, $\mathcal{H}_{\mathcal{A}}$ is naturally identified with a subspace of $\mathcal{H}_{\mathcal{B}}$. However, we start by recalling some well known facts about superselection structure, cf. [14, 16, 7, 8]. This will allow us to introduce our notation and give a few definitions and useful results.
Throughout this paper, the superselection sectors for $\mathcal{A}$ are understood to be the unitary equivalence classes of irreducible representations $\pi$ satisfying the Selection Criterion (SC)

$$\pi|_{\mathcal{A}(\mathcal{O}')} \cong \pi_0|_{\mathcal{A}(\mathcal{O}')}, \quad \mathcal{O} \in \mathcal{K},$$

with respect to a reference vacuum representation $\pi_0$ ($= \pi_0^{\mathcal{A}}$). Here $\mathcal{K}$ denotes as usual the set of double cones in Minkowski space ordered under inclusion. The representations satisfying the selection criterion are the objects of a $W^*$-category $S(\pi_0)$ whose arrows are the intertwining operators.

Recall that $\mathcal{A}^d$, the dual net of $\mathcal{A}$, is defined by $\mathcal{A}^d(\mathcal{O}) := \mathcal{A}(\mathcal{O}')'$, the commutants being taken on $\mathcal{H}_{\mathcal{A}}$. If $\mathcal{A}$ is a local net, i.e. if $\mathcal{A} \subset \mathcal{A}^d$, we have inclusions $\mathcal{A} \subset \mathcal{A}^{dd} := (\mathcal{A}^d)^d \subset \mathcal{A}^d = \mathcal{A}^{ddd}$. If $\mathcal{A}$ satisfies Haag duality, i.e. if $\mathcal{A} = \mathcal{A}^d$, we find that a $\pi$ as above is equivalent to a representation of the form $\pi_0 \circ \rho$, where $\rho$ is an endomorphism of $\mathcal{A}$ with the following properties: it is localized in a double cone $\mathcal{O}$, i.e. $\rho|_{\mathcal{A}(\mathcal{O}')} = \mathrm{id}$, and is transportable, i.e. (inner) equivalent to an endomorphism localized in any other double cone. The category of these endomorphisms and their intertwiners is equivalent to $S(\pi_0)$ and hence may be equally used to describe the superselection structure. However, it has the added advantage of being a tensor $W^*$-category, denoted by $\mathcal{T}_t$, and reveals the latent tensor structure of superselection sectors. The objects of $\mathcal{T}_t$, the set of localized and transportable endomorphisms of $\mathcal{A}$, will be denoted by $\Delta_t(\mathcal{A})$.

If we wish to relate the superselection sectors of $\mathcal{A}$ and $\mathcal{B}$ using endomorphisms we run into two evident problems. On one hand, the restriction of an endomorphism $\rho$ of $\mathcal{B}$ to $\mathcal{A}$ is not an endomorphism of $\mathcal{A}$, unless $\rho(\mathcal{A}) \subset \mathcal{A}$, although it could still be regarded as a representation. Furthermore, it is not at first sight clear how a given localized, transportable endomorphism of $\mathcal{A}$ can be extended to a similar endomorphism of $\mathcal{B}$.
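However, this problem has a canonical solution indicated by an alternative approach to superselection sectors using net cohomology. In the version in [23], net cohomology is conceived as a cohomology of partially ordered sets with coefficients in nets over the partially ordered set $\mathcal{K}$. The formal description of this cohomology may be found in [23] and we restrict ourselves here to a pedestrian account of those concepts needed in this paper.

A 0-simplex $a$ is just a double cone $\mathcal{O}$, i.e. an element of $\mathcal{K}$. A 1-simplex $b$ is an ordered set $(\mathcal{O}, \mathcal{O}_0, \mathcal{O}_1)$ of double cones with $\mathcal{O}_0 \cup \mathcal{O}_1 \subset \mathcal{O}$. Its faces are the 0-simplices $\partial_0 b = \mathcal{O}_0$, $\partial_1 b = \mathcal{O}_1$ and its support is $|b| = \mathcal{O}$. A 2-simplex $c$ is an ordered set of double cones $(\mathcal{O}, \mathcal{O}_0, \mathcal{O}_1, \mathcal{O}_2, \mathcal{O}_{01}, \mathcal{O}_{02}, \mathcal{O}_{12})$ such that $\mathcal{O} \supset \mathcal{O}_0 \cup \mathcal{O}_1 \cup \mathcal{O}_2$ and such that its faces $\partial_0 c = (\mathcal{O}_0, \mathcal{O}_{01}, \mathcal{O}_{02})$, $\partial_1 c = (\mathcal{O}_1, \mathcal{O}_{01}, \mathcal{O}_{12})$ and $\partial_2 c = (\mathcal{O}_2, \mathcal{O}_{02}, \mathcal{O}_{12})$ are 1-simplices. Its support is $|c| = \mathcal{O}$. The set of $n$-simplices is denoted by $\Sigma_n$.

Definition. A 0-cocycle of $\mathcal{A}$, a net of $C^*$-algebras over $\mathcal{K}$, is a map $z : \Sigma_0 \to \mathcal{A}$ such that $z(\partial_0 b) = z(\partial_1 b)$, $b \in \Sigma_1$, and $z(a) \in \mathcal{A}(a)$, $a \in \Sigma_0$.

Hence the set $Z^0(\mathcal{A})$ of 0-cocycles is $\cap_{\mathcal{O}} \mathcal{A}(\mathcal{O})$.

Definition. A 1-cocycle of $\mathcal{A}$ is a map $z : \Sigma_1 \to \mathcal{U}(\mathcal{A})$ such that $z(\partial_0 c) z(\partial_2 c) = z(\partial_1 c)$, $c \in \Sigma_2$, and $z(b) \in \mathcal{A}(|b|)$, $b \in \Sigma_1$.

The 1-cocycles are considered as the objects of a $C^*$-category $Z^1(\mathcal{A})$. An arrow between 1-cocycles, $w \in (z, z')$, is a mapping $w : \Sigma_0 \to \mathcal{A}$ such that

$$z(b) w(\partial_1 b) = w(\partial_0 b) z'(b), \quad b \in \Sigma_1, \qquad w(a) \in \mathcal{A}(a), \quad a \in \Sigma_0.$$

Note that if $1$ denotes the trivial 1-cocycle $1(b) = I$, $b \in \Sigma_1$, then the elements of $(1, 1)$ are just the 0-cocycles. Two objects $z$, $z'$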
of $Z^1(\mathcal{A})$ are cohomologous if $(z, z')$ contains a unitary arrow and $z$ is a 1-coboundary if it is cohomologous to $1$.

Here is an example of a 1-cocycle illustrating at the same time the relation with the theory of superselection sectors. Given a representation $\pi$ of $\mathcal{A}$ satisfying the selection criterion, pick for each $a \in \Sigma_0$ a unitary operator $V_a$ such that

$$V_a \pi(A) = \pi_0(A) V_a, \quad A \in \mathcal{A}(\mathcal{O}),\ a \subset \mathcal{O}',$$

and set

$$z(b) := V_{\partial_0 b} V_{\partial_1 b}^*, \quad b \in \Sigma_1,$$

then $z \in Z^1(\mathcal{A}^d)$. Conversely, given a 1-cocycle with values in a net $\mathcal{A}$, define for $a \in \Sigma_0$,

$$\pi_a(A) := z(b) A z(b)^*, \quad \text{provided } A \in \mathcal{A}^d(\mathcal{O}),\ b \in \Sigma_1,\ \partial_0 b = a,\ \partial_1 b \subset \mathcal{O}'.$$

One checks that $\pi_a$ gives a well defined representation of $\mathcal{A}^d$ and that $z(b) \in (\pi_{\partial_1 b}, \pi_{\partial_0 b})$. Thus a 1-cocycle gives rise to a field $a \to \pi_a$ of equivalent representations. Furthermore, $\pi_a$ is localized in $a$ in the sense that

$$\pi_a(A) = A, \quad A \in \mathcal{A}^d(\mathcal{O}),\ \mathcal{O} \subset a'.$$
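As a check on the example above, the following sketch verifies that $z(b) := V_{\partial_0 b} V_{\partial_1 b}^*$ satisfies the 1-cocycle identity. It uses only the face conventions for 2-simplices stated earlier and the unitarity of the $V_a$. The localization requirement on $z(b)$ is not addressed here.

```latex
% Faces of the 2-simplex c = (O, O_0, O_1, O_2, O_{01}, O_{02}, O_{12}):
%   \partial_0 c = (O_0, O_{01}, O_{02}),
%   \partial_1 c = (O_1, O_{01}, O_{12}),
%   \partial_2 c = (O_2, O_{02}, O_{12}),
% and for a 1-simplex b = (O, O_0, O_1): \partial_0 b = O_0, \partial_1 b = O_1.
\begin{align*}
z(\partial_0 c)\, z(\partial_2 c)
  &= \bigl(V_{\mathcal{O}_{01}} V_{\mathcal{O}_{02}}^*\bigr)
     \bigl(V_{\mathcal{O}_{02}} V_{\mathcal{O}_{12}}^*\bigr) \\
  &= V_{\mathcal{O}_{01}} \bigl(V_{\mathcal{O}_{02}}^* V_{\mathcal{O}_{02}}\bigr)
     V_{\mathcal{O}_{12}}^* \\
  &= V_{\mathcal{O}_{01}} V_{\mathcal{O}_{12}}^*
   = z(\partial_1 c).
\end{align*}
```

The product telescopes because the middle face $\mathcal{O}_{02}$ is shared by $\partial_0 c$ and $\partial_2 c$.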
Details may be found in [23], § 3.4.6, Theorem 1, Corollary 2, where it is also proved that $S(\pi_0)$ and $Z^1(\mathcal{A}^d)$ are equivalent $W^*$-categories. It follows that $S(\pi_0)$ and $S(\pi_0^{dd})$ are equivalent as $W^*$-categories, where $\pi_0^{dd}$ denotes the vacuum representation of the double dual net $\mathcal{A}^{dd}$. It should be noted that the above results on superselection sectors do not require any form of duality or even locality. But we are not able to define the tensor structure without a further hypothesis. We see, however, that essential duality, $\mathcal{A}^d = \mathcal{A}^{dd}$, will suffice for this purpose.

A variant of the above construction relates cocycles and endomorphisms. It is based on assuming relative duality

$$\mathcal{A}(\mathcal{O}) = \mathcal{A}^d(\mathcal{O}) \cap \mathcal{A}, \quad \mathcal{O} \in \mathcal{K},$$

a weaker version of the more familiar assumption of duality. This is the $C^*$-version of duality and is defined without reference to the vacuum representation. In this context, it is natural to use nets of $C^*$-algebras. As a consequence of relative duality, an endomorphism localized in $\mathcal{O}$ satisfies $\rho(\mathcal{A}(\mathcal{O}_1)) \subset \mathcal{A}(\mathcal{O}_1)$ whenever $\mathcal{O} \subset \mathcal{O}_1$. Furthermore, an intertwiner between endomorphisms localized in $\mathcal{O}$ is automatically in $\mathcal{A}(\mathcal{O})$. Relative duality suffices for a theory of transportable endomorphisms but to pass from superselection sectors to transportable endomorphisms we need duality or, at least, essential duality.

Consider an endomorphism $\rho$ of $\mathcal{A}$ such that, given $a \in \Sigma_0$, there is a unitary $\psi(a) \in \mathcal{A}$ with

$$\psi(a) \rho(A) = A \psi(a), \quad A \in \mathcal{A}(\mathcal{O}),\ \mathcal{O} \subset a'.$$

This is the analogue for endomorphisms of the selection criterion for representations. Our 1-cocycle $z(b) := \psi(\partial_0 b) \psi(\partial_1 b)^*$, $b \in \Sigma_1$, now takes values in $\mathcal{A}$. Any such 1-cocycle $z$ now defines a field of endomorphisms:

$$\rho_a(A) := z(b) A z(b)^*, \quad \text{provided } A \in \mathcal{A}(\mathcal{O}),\ b \in \Sigma_1,\ \partial_0 b = a,\ \partial_1 b \subset \mathcal{O}'.$$
$\rho_a$ is localized in $a$ in the sense that

$$\rho_a(A) = A, \quad A \in \mathcal{A}(\mathcal{O}),\ \mathcal{O} \subset a'.$$
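One way to see the localization property is sketched below. It assumes that the degenerate 1-simplex $b_a := (a, a, a)$ is admissible, which the condition $\mathcal{O}_0 \cup \mathcal{O}_1 \subset \mathcal{O}$ appears to allow; the name $b_a$ is introduced only for this illustration. It also takes for granted that $\rho_a$ does not depend on the choice of $b$, and that $\mathcal{A}$ is local, which relative duality is taken here to imply.

```latex
% Assumption: b_a := (a, a, a), so \partial_0 b_a = \partial_1 b_a = |b_a| = a.
% If \mathcal{O} \subset a', then \partial_1 b_a = a \subset \mathcal{O}',
% so b_a may be used to evaluate \rho_a on \mathcal{A}(\mathcal{O}).
% z(b_a) \in \mathcal{A}(a) is unitary and, by locality, commutes with \mathcal{A}(\mathcal{O}).
\begin{align*}
\rho_a(A) &= z(b_a)\, A\, z(b_a)^* \\
          &= A\, z(b_a)\, z(b_a)^* \\
          &= A, \qquad A \in \mathcal{A}(\mathcal{O}),\ \mathcal{O} \subset a'.
\end{align*}
```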
Since $z(b) \in (\rho_{\partial_1 b}, \rho_{\partial_0 b})$, we have a field $a \to \rho_a$ of endomorphisms in $\mathcal{T}_t$, each of which is equivalent to the $\rho$ we started from. Note that if we start with a cocycle of the form $z(b) := \psi(\partial_0 b) \psi(\partial_1 b)^*$, as above, then $\rho_a = \mathrm{Ad}\,\psi_a\, \rho$. In particular, if $\rho$ is localized in $a$ we may take $\psi_a = I$ and hence arrange that $\rho_a = \rho$. We may regard our construction as leading to an equivalence of tensor $C^*$-categories between $Z^1(\mathcal{T}_t)$ and $\mathcal{T}_t$, cf. [23], § 3.4.7, Theorem 5. As in the theory of superselection sectors, the tensor $C^*$-category $\mathcal{T}_t(\mathcal{A})$ admits a canonical permutation symmetry $\varepsilon$ (in more than two spacetime dimensions).

We can now begin to examine an inclusion $\mathcal{A} \subset \mathcal{B}$ of nets satisfying relative duality. Such an inclusion obviously induces an inclusion functor $Z^1(\mathcal{A}) \to Z^1(\mathcal{B})$. Thus a 1-cocycle in a local net $\mathcal{A}$ not only gives rise to a field $a \to \rho_a$ of endomorphisms of $\mathcal{A}$ but to a field $a \to \tilde{\rho}_a$ of endomorphisms of any relatively local net $\mathcal{B}$ extending the original field. We have seen that any element of $\Delta_t(\mathcal{A})$ arises as a value of such a field and hence admits an extension to an element of $\Delta_t(\mathcal{B})$. As the cocycle is not uniquely determined by the endomorphism, a little argument is needed to show that the extension is uniquely determined, cf. Lemma 3 of § 3.4.7 in [23].

Lemma 2.1. Let $z$ and $z'$ be two 1-cocycles of a net $\mathcal{A}$ satisfying relative duality and suppose that, for some $a \in \Sigma_0$, $\rho_a = \rho'_a$. Then, if $\mathcal{A} \subset \mathcal{B}$ is an inclusion of nets and $\mathcal{A}$ and $\mathcal{B}$ are relatively local, the endomorphisms $\tilde{\rho}_a$ and $\tilde{\rho}'_a$ of $\mathcal{B}$ induced by $z$ and $z'$ agree.

Proof. Let $b \in \Sigma_1$ with $\partial_0 b = a$, then $z(b)^* z'(b) \in (\rho'_{\partial_1 b}, \rho_{\partial_1 b})$ is an intertwiner of endomorphisms localized in $\partial_1 b$. Since $\mathcal{A}$ satisfies relative duality, $z(b)^* z'(b) \in \mathcal{A}(\partial_1 b)$ and the result follows since $\mathcal{B}$ and $\mathcal{A}$ are relatively local.

We now come to the main result of this section.

Theorem 2.2. Let $\mathcal{A} \subset \mathcal{B}$ be an inclusion of nets satisfying relative duality. Then there is an induced structure preserving inclusion of $\mathcal{T}_t(\mathcal{A})$ in $\mathcal{T}_t(\mathcal{B})$ which corresponds to the above extension on endomorphisms and to the given inclusion on intertwiners.

Proof. In view of the relation between cocycles and endomorphisms and Lemma 2.1, the only point which is not yet obvious is that the tensor structure is preserved by the inclusion. However, if $z$ and $z'$ are 1-cocycles in $\mathcal{A}$ and $a \to \rho_a$ and $a \to \rho'_a$ are the corresponding fields of endomorphisms, then

$$(z \otimes z')(b) := z(b)\, \rho_{\partial_1 b}(z'(b))$$

defines a 1-cocycle over $\mathcal{A}$ whose associated field of endomorphisms is $a \to \rho_a \rho'_a$, and the result now follows.

The extension of endomorphisms is also discussed in [2] under the name $\alpha$-induction in the context of nets of subfactors [18]. The inclusion of Theorem 2.2 will of course map the unitary operator $\varepsilon(\rho, \rho')$ in $\mathcal{T}_t(\mathcal{A})$ onto the corresponding operator for the extended endomorphisms. Furthermore,
as is obvious from the cohomological description, we shall have

$$(\rho, \rho')_{\mathcal{A}} = (\rho, \rho')_{\mathcal{B}} \cap \mathcal{A},$$

with an obvious notation. In particular, whenever $\mathcal{A}$ and $\mathcal{B}$ satisfy duality, this result is at the same time a result about superselection structure and relates the superselection structure of $\mathcal{A}$ to that of $\mathcal{B}$.

The extension of an endomorphism $\rho$ with finite statistics, $\rho \in \Delta_f(\mathcal{A})$, will again have finite statistics and we have an induced tensor $*$-functor from $\mathcal{T}_f(\mathcal{A})$ to $\mathcal{T}_f(\mathcal{B})$. In fact this also holds in the more general context provided we understand $\mathcal{T}_f$ to be the full subcategory of $\mathcal{T}_t$ having conjugates. Now $\mathcal{T}_f(\mathcal{A})$ and $\mathcal{T}_f(\mathcal{B})$ are equivalent to the tensor $W^*$-categories of finite dimensional continuous unitary representations of compact groups so that tensor $*$-functors correspond contravariantly to continuous homomorphisms between the groups in question [11]. In the context of superselection structure, the compact groups are the gauge groups. A gauge group appears as the group of automorphisms of a field net leaving the observable subnet pointwise fixed. We would like to make this homomorphism explicit and therefore consider the following situation.

We consider a commuting square of inclusions of nets $\mathcal{A}_1 \subset \mathcal{A}_2 \subset \mathcal{F}_2$ and $\mathcal{A}_1 \subset \mathcal{F}_1 \subset \mathcal{F}_2$. The $\mathcal{A}_i$ are to be considered as observable nets, the $\mathcal{F}_i$ as field nets, cf. [12], Definition 3.1 and Theorem 3.6. Thus we suppose $\mathcal{A}_i$ to have trivial relative commutant in $\mathcal{F}_i$ and $\mathcal{F}_i$ to be local relative to $\mathcal{A}_i$. We consider the subcategories $\mathcal{T}_i$ of $\mathcal{T}_f(\mathcal{A}_i)$ induced by Hilbert spaces in $\mathcal{F}_i$. The Hilbert spaces in question are unique and are supposed to generate $\mathcal{F}_i$. We let $G_i$ be the group of automorphisms of $\mathcal{F}_i$ leaving $\mathcal{A}_i$ pointwise invariant. These automorphisms leave the Hilbert spaces stable and we suppose that $G_i$ is a compact group equipped with the topology of pointwise norm convergence on these Hilbert spaces. Finally, we suppose that each irreducible representation of $G_i$ is realized on some such Hilbert space and that $\mathcal{F}_i^{G_i} = \mathcal{A}_i$. These last conditions ensure that the inclusion $\mathcal{A}_i \subset \mathcal{F}_i$ realizes $\mathcal{T}_i$ in a canonical way as a dual of $G_i$. Without wishing to get involved in further technicalities, we might say that the essence of these conditions is that the net $\mathcal{F}_i$ is a crossed product of the net $\mathcal{A}_i$ by the action of a group dual, where these terms are to be understood as adaptions to nets of von Neumann algebras of the corresponding concepts in [10].

Theorem 2.3. Under the above conditions on a commuting square of inclusions, $\mathcal{F}_1$ is stable under the action of $G_2$ and the restriction of $G_2$ to $\mathcal{F}_1$ defines a homomorphism $h$ from $G_2$ to $G_1$. The inclusion $\mathcal{A}_1 \subset \mathcal{A}_2$ induces an inclusion functor from $\mathcal{T}_1$ to $\mathcal{T}_2$ and this inclusion functor is precisely that induced by $h$. If $N$ and $K$ denote the kernel and image of $h$, respectively, then

$$\mathcal{F}_1 \vee \mathcal{A}_2 = \mathcal{F}_2^N, \qquad \mathcal{F}_1 \cap \mathcal{A}_2 = \mathcal{F}_1^K.$$

Proof. Note first that a Hilbert space $H(\rho)$ in $\mathcal{F}_1$ inducing an object $\rho$ of $\mathcal{T}_1$ must, when considered as a Hilbert space in $\mathcal{F}_2$, induce the canonical extension of $\rho$ to an object of $\mathcal{T}_f(\mathcal{A}_2)$ by relative locality. This canonical extension is thus an object of $\mathcal{T}_2$ so that we do have an induced inclusion functor from $\mathcal{T}_1$ to $\mathcal{T}_2$. It also follows that $H(\rho)$ is stable under the action of $G_2$. But such Hilbert spaces generate $\mathcal{F}_1$ so $\mathcal{F}_1$ is stable under the action of $G_2$. The restriction of an element of $G_2$ to $\mathcal{F}_1$ defines an automorphism of $\mathcal{F}_1$ leaving $\mathcal{A}_1$ pointwise invariant and is therefore an element of $G_1$. Thus restriction defines the required homomorphism $h$. Since the representation of $G_2$ on $H(\rho)$ arises by composing that of $G_1$ with $h$, $h$ induces the above inclusion functor. Now $N$, being the kernel of $h$, obviously acts trivially on $\mathcal{F}_1$ and $\mathcal{A}_2$. Now $\mathcal{F}_2^N$ is generated by the Hilbert
spaces $H(\rho)$ in $\mathcal{F}_2$ inducing objects of $\mathcal{T}_2$ and carrying irreducible representations of $G_2$ that are trivial in restriction to $N$. Regarding these as representations of $K$ and inducing up to a representation of $G_1$, bearing in mind that every irreducible representation of $G_1$ is realized within $\mathcal{F}_1$, we conclude that there is an isometry in $\mathcal{A}_2$ mapping $H(\rho)$ into $\mathcal{F}_1$. Thus $\mathcal{F}_2^N$ is generated by $\mathcal{F}_1$ and $\mathcal{A}_2$. Next, note that the $K$-invariant part of a Hilbert space of $\mathcal{F}_1$ inducing an object of $\mathcal{T}_1$ is $G_2$-invariant and hence lies in $\mathcal{A}_2$. These Hilbert spaces generate $\mathcal{F}_1^K$ and, as any element of $\mathcal{F}_1 \cap \mathcal{A}_2$ is $K$-invariant, we have $\mathcal{F}_1 \cap \mathcal{A}_2 = \mathcal{F}_1^K$, completing the proof.

3. Inclusions of Field Nets

In the last section, we have treated inclusions of observable nets. However, observable nets are frequently defined by starting with a net $\mathcal{F}$ of fields with Bose–Fermi commutation relations. From a mathematical point of view, these are simply the $\mathbb{Z}_2$-graded version of an observable net. Hence to have a basic formalism which is sufficiently flexible, we need to consider inclusions of $\mathbb{Z}_2$-graded nets.

We define a (concrete) $\mathbb{Z}_2$-graded net $\mathcal{F}$ to be a net of von Neumann algebras over $\mathcal{K}$, represented on its (vacuum) Hilbert space $\mathcal{H}_{\mathcal{F}}$, together with an involutive unitary operator $k$ inducing a net automorphism $\alpha_k$ of $\mathcal{F}$. The even (Bose) part $\mathcal{F}_+$ of $\mathcal{F}$ is the fixed-point net under $\alpha_k$, the odd (Fermi) part $\mathcal{F}_-$ changes sign under $\alpha_k$. The twisted net $\mathcal{F}^t$ is defined as $\mathcal{F}_+ + ik\mathcal{F}_-$ and is, in an obvious way, itself a $\mathbb{Z}_2$-graded net. The $\mathbb{Z}_2$-graded or twisted dual net of $\mathcal{F}$ is defined by

$$\mathcal{F}^d(\mathcal{O}) := \bigcap_{\mathcal{O}_1 \subset \mathcal{O}'} \mathcal{F}^t(\mathcal{O}_1)'.$$

It is understood to act on the same Hilbert space with the same unitary $k$. $\mathcal{F}$ satisfies twisted duality if it coincides with its twisted dual net $\mathcal{F}^d$. $\mathcal{F}$ is said to have Bose–Fermi commutation relations if

$$\mathcal{F}(\mathcal{O}_1) \subset \mathcal{F}^t(\mathcal{O}_2)', \quad \mathcal{O}_1 \subset \mathcal{O}_2',$$

or, equivalently, if $\mathcal{F} \subset \mathcal{F}^d$. If, in addition, $\mathcal{F}$ is irreducibly represented on $\mathcal{H}_{\mathcal{F}}$, we refer to the triple $\mathcal{F}, k, \mathcal{H}_{\mathcal{F}}$ as being a field net.

By an inclusion of $\mathbb{Z}_2$-graded nets we mean compatible (normal) inclusions $\mathcal{B}(\mathcal{O}) \subset \mathcal{F}(\mathcal{O})$ of von Neumann algebras together with an inclusion of Hilbert spaces $\mathcal{H}_{\mathcal{B}} \subset \mathcal{H}_{\mathcal{F}}$ compatible with the inclusion of nets and such that $k_{\mathcal{B}}$ is the restriction of $k_{\mathcal{F}}$ to $\mathcal{H}_{\mathcal{B}}$. We further require that $\mathcal{H}_{\mathcal{B}}$ be cyclic and separating for each $\mathcal{F}(\mathcal{O})$.

Typically, an observable net may be defined from a field net acted on by a compact group $G$ of net automorphisms with $\alpha_k \in G$ by taking $\mathcal{A}$ to be the fixed-point net under $G$. Under these circumstances, $\mathcal{A}$ satisfies duality if $\mathcal{F}$ satisfies twisted duality (except in one space dimension) and there is a normal conditional expectation $m$ of nets from $\mathcal{F}$ onto $\mathcal{A}$ obtained by averaging over the group. However, it has been known since the beginnings of the theory of superselection sectors that the existence of such a normal conditional expectation follows simply from the hypothesis that $\mathcal{A}$ satisfies duality, without any reference to a compact group $G$. We present here some related results.

Let $\mathcal{F}, k, \mathcal{H}_{\mathcal{F}}$ be a $\mathbb{Z}_2$-graded net and let $E$ be a projection on $\mathcal{H}_{\mathcal{F}}$, commuting with $k$ and cyclic and separating for $\mathcal{F}$, i.e. for each $\mathcal{F}(\mathcal{O})$. Let $\mathcal{F}^E(\mathcal{O}) := \mathcal{F}(\mathcal{O}) \cap \{E\}'$ and let $\mathcal{F}_E$ and $k_E$ denote the restriction of $\mathcal{F}^E$ and $k$ to the subspace $E\mathcal{H}_{\mathcal{F}}$. Then the triple $\mathcal{F}_E, k_E, E\mathcal{H}_{\mathcal{F}}$ is itself a $\mathbb{Z}_2$-graded net. If we started with a field net, we would only get a field net if we knew that $\mathcal{F}_E$ acts irreducibly on $E\mathcal{H}_{\mathcal{F}}$.
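As a side remark on the twisted net defined earlier in this section, a standard observation (not taken from the text, and using the auxiliary symbol $Z$ introduced only here) is that the twist is implemented by the unitary $Z := (1 + ik)/(1 + i)$. The sketch below checks this on even and odd elements, using only $k = k^* = k^{-1}$, $kF_+ = F_+k$ and $kF_- = -F_-k$.

```latex
% Z is unitary because k is a self-adjoint unitary (k^2 = 1):
%   Z Z^* = \tfrac12 (1 + ik)(1 - ik) = \tfrac12 (1 + k^2) = 1.
\begin{align*}
Z F_\pm Z^* &= \tfrac12 (1 + ik)\, F_\pm\, (1 - ik)
             = \tfrac12 \bigl(F_\pm + i[k, F_\pm] + k F_\pm k\bigr), \\
Z F_+ Z^*   &= \tfrac12 \bigl(F_+ + 0 + F_+\bigr) = F_+, \\
Z F_- Z^*   &= \tfrac12 \bigl(F_- + 2ik F_- - F_-\bigr) = ik F_-.
\end{align*}
% Hence Z \mathcal{F}(\mathcal{O}) Z^*
%   = \mathcal{F}_+(\mathcal{O}) + ik\,\mathcal{F}_-(\mathcal{O})
%   = \mathcal{F}^t(\mathcal{O}).
```

Under this reading, passing from $\mathcal{F}$ to $\mathcal{F}^t$ is a unitary equivalence, which is one way to see why the twisted net is again a net of von Neumann algebras.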
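We now ask whether $\mathcal{F}$ admits a conditional expectation $m$ of nets such that

$$m(F)E = EFE, \quad F \in \mathcal{F}.$$

In this case, $m$ would project onto the subnet $\mathcal{F}^E$ and be locally normal, see Lemma A.7 of the Appendix. In particular, in the case of a field net $\mathcal{F}_E$ would act irreducibly on $E\mathcal{H}_{\mathcal{F}}$.

Lemma 3.1. If $\mathcal{F}$ is a field net and $\mathcal{F}_E$ satisfies twisted duality, then there is a conditional expectation $m$ of $\mathcal{F}$ such that

$$m(F)E = EFE, \quad F \in \mathcal{F}.$$

Proof. By Corollary A.8b of the Appendix, we must show that

$$[EFE, EF'E] = 0, \quad F \in \mathcal{F}(\mathcal{O}),\ F' \in \mathcal{F}(\mathcal{O})'.$$

Now if $\mathcal{O}_1 \subset \mathcal{O}'$ and $B \in \mathcal{F}^t(\mathcal{O}_1)^E$ then $[EFE, B] = 0$. Hence $E\mathcal{F}(\mathcal{O})E \restriction E\mathcal{H}_{\mathcal{F}} \subset (\mathcal{F}_E)^d(\mathcal{O}) = \mathcal{F}_E(\mathcal{O})$. Thus there is a $G \in \mathcal{F}(\mathcal{O})$ with $GE = EG = EFE$, and $[GE, EF'E] = 0$, as required.

To have a more systematic approach, we begin by proving an analogue of Lemma A.7 of the Appendix for $\mathbb{Z}_2$-graded nets.

Lemma 3.2. Let $\mathcal{F}, k$ be a $\mathbb{Z}_2$-graded net on a Hilbert space $\mathcal{H}$. Let $E$ be a $k$-invariant projection cyclic for $\mathcal{F}$ and $\mathcal{F}^d$. Let ${}^E\mathcal{F}$ be the net defined by:

$${}^E\mathcal{F}(\mathcal{O}) := \{F \in \mathcal{F}(\mathcal{O}) : EFE \in (E\mathcal{F}^d E)^d(\mathcal{O})\}.$$

Then ${}^E\mathcal{F} \supset \mathcal{F}^E$ and is weak-operator closed. This makes ${}^E\mathcal{F}$ into a $\mathcal{F}^E$-bimodule. Given $F \in \mathcal{F}(\mathcal{O})$, there is a $m(F) \in \mathcal{F}^{ddE}(\mathcal{O})$ such that $m(F)E = EFE$ if and only if $F \in {}^E\mathcal{F}(\mathcal{O})$.

Proof. If $F \in {}^E\mathcal{F}(\mathcal{O})$, then

$$EFE \in (E\mathcal{F}^{dt}E)(\mathcal{O}_1)', \quad \mathcal{O}_1 \subset \mathcal{O}'.$$

Pick $G \in \mathcal{F}^{dt}(\mathcal{O}_1)$, then

$$EF^*EG^*GEFE \le \|EFE\|^2\, EG^*GE,$$

hence there exists $m(F) \in \mathcal{F}^{dt}(\mathcal{O}_1)'$, such that $m(F)E = EFE$. Since $E$ is separating for each $\mathcal{F}^{dt}(\mathcal{O}_1)'$ and $\mathcal{O}'$ is path-connected, $m(F)$ is independent of the choice of $\mathcal{O}_1 \subset \mathcal{O}'$. Hence $m(F) \in \cap_{\mathcal{O}_1 \subset \mathcal{O}'} \mathcal{F}^{dt}(\mathcal{O}_1)' = \mathcal{F}^{dd}(\mathcal{O})$. If $F \in \mathcal{F}(\mathcal{O})$ and there is an $m(F) \in \mathcal{F}^{ddE}(\mathcal{O})$, such that $m(F)E = EFE$ then $F \in {}^E\mathcal{F}(\mathcal{O})$. The remaining assertions are evident.

Remark. To have a closer analogy with Lemma A.7 of the Appendix, we should require that $\mathcal{F} = \mathcal{F}^{dd}$, since, after all, we require $M = M''$ in the Appendix.

Corollary 3.3. Let $\mathcal{F} = \mathcal{F}^{dd}, k, \mathcal{H}$ be a $\mathbb{Z}_2$-graded net and $E$ a projection cyclic for $\mathcal{F}$ and $\mathcal{F}^d$. Then the following conditions are equivalent.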
a) There is a conditional expectation $m$ on $\mathcal{F}$ such that $m(F)E = EFE$, $F \in \mathcal{F}$.
a′) There is a conditional expectation $m^d$ on $\mathcal{F}^d$ such that $m^d(F^d)E = EF^dE$, $F^d \in \mathcal{F}^d$.
b) $E\mathcal{F}E \subset (E\mathcal{F}^dE)^d$.
b′) $E\mathcal{F}^dE \subset (E\mathcal{F}E)^d$.
c) $({}_E\mathcal{F})^d = {}_E(\mathcal{F}^d)$.
c′) $({}_E(\mathcal{F}^d))^d = {}_E\mathcal{F}$.

Here ${}_E\mathcal{F}$, for example, denotes the restriction of $E\mathcal{F}E$ to $E\mathcal{H}$.

Proof. Suppose b) holds, then ${}^E\mathcal{F} = \mathcal{F}$ and, by Lemma 3.2, $m$ becomes a conditional expectation onto $\mathcal{F}_E$, since it is idempotent and of norm 1, giving a). Similarly, b′) implies a′). Taking duals, we see that b) and b′) are equivalent. It is clear that a) implies b) and that a′) implies b′). Now

$$({}_E\mathcal{F})^d(\mathcal{O}) = \bigcap_{\mathcal{O}_1 \subset \mathcal{O}'} ({}_E\mathcal{F}^t)(\mathcal{O}_1)' = \bigcap_{\mathcal{O}_1 \subset \mathcal{O}'} \bigl(E\mathcal{F}^t(\mathcal{O}_1)E \restriction E\mathcal{H}\bigr)',$$

so if a) holds then by Corollary A.8c of the Appendix,

$$({}_E\mathcal{F})^d(\mathcal{O}) = \bigcap_{\mathcal{O}_1 \subset \mathcal{O}'} \bigl(E\mathcal{F}^t(\mathcal{O}_1)'E\bigr) \restriction E\mathcal{H} = E \Bigl(\bigcap_{\mathcal{O}_1 \subset \mathcal{O}'} \mathcal{F}^t(\mathcal{O}_1)'\Bigr) E \restriction E\mathcal{H},$$

where we have used the fact that $E$ is separating for each $\mathcal{F}^t(\mathcal{O}_1)'$ and that $\mathcal{O}'$ is path-connected. Thus a) implies c). The implication a′) implies c′) follows by exchanging the role of $\mathcal{F}$ and $\mathcal{F}^d$. Now, trivially, c) implies b′) and, again, c′) implies b) follows.

Of course, a direct application of Corollary A.8 of the Appendix shows that the above conditions are also equivalent to

$$({}_E\mathcal{F})(\mathcal{O})' = {}_E(\mathcal{F}(\mathcal{O})'), \quad \mathcal{O} \in \mathcal{K}.$$

However, in view of the superficial similarities with c), it is worth stressing that equivalence depends on being in more than two spacetime dimensions.

What becomes clear from the above discussion is that the problem of studying the subsystems of a given system can be divided up in a natural way. We can begin with the simple class of subsystems characterized by cyclic projections $E$ and the existence of a conditional expectation as above. Let us call such subsystems full since they are the largest subsystems on their Hilbert spaces and are uniquely determined by their Hilbert spaces. If $\mathcal{A}$ is a full subsystem of $\mathcal{B}$ and $\mathcal{B}$ is itself a full subsystem of $\mathcal{F}$, then $\mathcal{A}$ is a full subsystem of $\mathcal{F}$. We see from Lemma 3.1 that a subsystem satisfying twisted duality is full. Furthermore, if $\mathcal{F}$ satisfies twisted duality, then, by Corollary 3.3, a subsystem is full if and only if it satisfies twisted duality. A second step might then be to analyze subsystems having the same Hilbert space.

In the following result, we give an analogue of Corollary A.9 of the Appendix and look at full subsystems from the point of view of the subsystem.

Lemma 3.4. Let $\mathcal{B} \subset \mathcal{F}$ be an inclusion of $\mathbb{Z}_2$-graded nets and $E$ the associated projection from $\mathcal{H}_{\mathcal{F}}$ to $\mathcal{H}_{\mathcal{B}}$, then the following conditions are equivalent.
a) There is a (necessarily unique, injective and $\mathbb{Z}_2$-graded) net morphism $\nu : \mathcal{B}^d \to \mathcal{F}^d$ such that $\nu(B)\Phi = B\Phi$, $B \in \mathcal{B}^d$, $\Phi \in \mathcal{H}_{\mathcal{B}}$.
b) $\mathcal{B}^d = \mathcal{F}^d_E$.
If the conditions are fulfilled, there is a unique normal conditional expectation $m$ of $\mathcal{F}^d$ onto $\nu(\mathcal{B}^d)$ such that $m(F)E = EFE$, $F \in \mathcal{F}^d$.

Proof. $\nu$ is obviously unique, hence $\mathbb{Z}_2$-graded, since $\mathcal{H}_{\mathcal{B}}$ is cyclic for each $\mathcal{F}^t(\mathcal{O})$, hence separating for each $\mathcal{F}^d(\mathcal{O})$. Given a), we note that $E\nu(B)E = \nu(B)E$ and replacing $B$ by $B^*$, we see that $\nu(B) \in \mathcal{F}^{dE}$. Hence $B \in \mathcal{F}^d_E$, yielding b). Conversely, if b) is satisfied, given $B \in \mathcal{B}^d(\mathcal{O})$, there is an $F \in \mathcal{F}^d(\mathcal{O})$ with $FE = EF$ and $F\Phi = B\Phi$, $\Phi \in \mathcal{H}_{\mathcal{B}}$. Hence, we may pick $\nu(B) = F$ to give a map $\nu : \mathcal{B}^d \to \mathcal{F}^d$ and it follows from uniqueness that $\nu$ is a net morphism. Now suppose the conditions are satisfied and that $F \in \mathcal{F}^d(\mathcal{O})$ and $B \in \mathcal{B}^t(\mathcal{O}_1)$ with $\mathcal{O}_1 \subset \mathcal{O}'$. Then

$$EFEB = EFBE = EBFE = BEFE.$$

Hence the restriction of $EFE$ to $\mathcal{H}_{\mathcal{B}}$ lies in $\mathcal{B}^d(\mathcal{O}) = \mathcal{F}^d_E(\mathcal{O})$ by b). The result now follows by Lemma A.7 of the Appendix.

Remarks. For an inclusion of field nets, $\mathcal{B} \subset \mathcal{F}^d_E \subset \mathcal{B}^d$, b) is trivially fulfilled if $\mathcal{B}$ satisfies twisted duality. Now suppose that $\mathcal{B}$ satisfies twisted duality for wedges then

$$\mathcal{B}^d(\mathcal{O}) = \bigcap_{W \supset \mathcal{O}} R(W),$$

where $R(W)$ denotes the von Neumann algebra associated with the wedge $W$. Now given spacelike double cones, $\mathcal{O}$ and $\mathcal{O}_1$, there is a wedge $W$ such that $\mathcal{O} \subset W \subset \mathcal{O}_1'$. Hence

$$\mathcal{B}^d(\mathcal{O}) \subset R(W) \subset (\mathcal{F}^t(\mathcal{O}_1)')_E,$$

and, taking the intersection over $\mathcal{O}_1$, we see that b) is again satisfied. If $\mathcal{B}$ satisfies essential twisted duality, i.e. if $\mathcal{B}^d = \mathcal{B}^{dd}$, then we cannot conclude from the above that b) is satisfied since we do not know that we have an inclusion $\mathcal{B}^{dd} \subset \mathcal{F}^{dd}$.

If $\mathcal{B} = \mathcal{B}^{dd}$, $\mathcal{F} = \mathcal{F}^{dd}$ and $E$ is also cyclic for each $\mathcal{F}^d(\mathcal{O})$ in Lemma 3.4, then we may deduce from Corollary 3.3 that, under the equivalent conditions of Lemma 3.4, $\mathcal{B} = \mathcal{F}_E$.

We now consider an inclusion $\mathcal{A} \subset \mathcal{B}$ of nets of local von Neumann algebras over double cones each satisfying duality in their respective Hilbert spaces $\mathcal{H}_{\mathcal{A}}$ and $\mathcal{H}_{\mathcal{B}}$. Let $E$ denote the projection of $\mathcal{H}_{\mathcal{B}}$ onto $\mathcal{H}_{\mathcal{A}}$. Then, as follows e.g. from Lemma 3.4, there is a conditional expectation of nets of von Neumann algebras $m$ of $\mathcal{B}$ onto $\mathcal{A}$ such that

$$EBE = m(B)E, \quad B \in \mathcal{B}.$$

Furthermore, the intertwiner spaces between transportable localized morphisms of $\mathcal{A}$ and their extensions to $\mathcal{B}$ are related by

$$m(\rho, \rho')_{\mathcal{B}} = (\rho, \rho')_{\mathcal{A}},$$

see the remarks following Theorem 2.2.

We now introduce the canonical field net $\mathcal{F}$ of $\mathcal{B}$ and let $m_{\mathcal{B}}$ be the associated conditional expectation from $\mathcal{F}$ onto $\mathcal{B}$. We recall [12] that the canonical field net is defined for observable nets satisfying duality and Property B. Let $\mathcal{E}$ denote the net of $C^*$-algebras generated by the Hilbert spaces in $\mathcal{F}$ implementing the transportable localized morphisms of $\mathcal{A}$. Then by Lemma A.2, the restriction of $m \circ m_{\mathcal{B}}$ to $\mathcal{E}$ is the unique conditional expectation $n$ onto the subnet $\mathcal{A}$. By [10], this shows that $\mathcal{E}$ is the $C^*$-cross product of $\mathcal{A}$ by the action of $\mathcal{T}_{\mathcal{A}}$.
Superselection Theory for Subsystems
Now let α denote the canonical action of the gauge group G of A on the net E. Then we have
\[ \int_G \alpha_g(F)\, d\mu(g) = n(F), \qquad F \in \mathcal{E}, \]
where µ denotes Haar measure on G. Let HF denote the canonical Hilbert space of F and H the Hilbert subspace generated by HA and E. Since ω ◦ m ◦ mB = ω, we have ω ◦ n = ω, where ω is a state defined by a vector of HA. It follows that states of E defined by vectors in HA are gauge invariant. Hence
\[ (\Phi, F\Psi) = (\Phi, n(F)\Psi), \qquad F \in \mathcal{E}, \quad \Phi, \Psi \in \mathcal{H}_{\mathcal{A}} . \]
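The gauge invariance invoked here can be spelled out in two lines. The following is a sketch, assuming only that µ is the normalized Haar measure of the compact group G (hence translation invariant) and that n is the average of the action α over G, as stated above; the symbol h for a fixed group element is introduced for illustration.

```latex
% Translation invariance of Haar measure: averaging absorbs any fixed h in G.
\[
n(\alpha_h(F)) = \int_G \alpha_{gh}(F)\, d\mu(g)
              = \int_G \alpha_g(F)\, d\mu(g) = n(F), \qquad h \in G .
\]
% Hence any state \omega with \omega \circ n = \omega is gauge invariant:
\[
\omega(\alpha_h(F)) = \omega\bigl(n(\alpha_h(F))\bigr)
                    = \omega\bigl(n(F)\bigr) = \omega(F), \qquad F \in \mathcal{E} .
\]
```

The identity for matrix elements between two different vectors of HA then follows from the diagonal case by polarization.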
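Given Fi ∈ E and Φi ∈ HA, i = 1, 2, . . . , n, define
\[ U_g \sum_i F_i \Phi_i := \sum_i \alpha_g(F_i)\Phi_i , \]
then
\[ \Bigl\| \sum_i \alpha_g(F_i)\Phi_i \Bigr\|^2 = \sum_{i,j} (\Phi_i, \alpha_g(F_i^* F_j)\Phi_j) = \sum_{i,j} (\Phi_i, F_i^* F_j \Phi_j) . \]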
Thus we get a unitary action of G on H. We next remark that HA is the space of G-invariant vectors in H. In fact, if Ug Φ = Φ, g ∈ G, and Φ is orthogonal to HA, then
\[ (\Phi, F\Psi) = (\Phi, \alpha_g(F)\Psi) = (\Phi, n(F)\Psi) = 0, \qquad \Psi \in \mathcal{H}_{\mathcal{A}}, \quad F \in \mathcal{E} . \]
Thus Φ = 0. We can now check easily that our data, consisting of a representation of A on H restricting to the vacuum representation on HA, a unitary action of G on H and a homomorphism ρ → Hρ from the semigroup of objects of TA, has all the properties needed to generate the canonical field net of A [12], p. 66. Since HA is cyclic for E(O), hence separating for E(O)′, the canonical field net is canonically isomorphic to O → E(O). Thus we have shown the following result.

Theorem 3.5. Let A ⊂ B be an inclusion of nets of observable algebras satisfying duality and Property B, then there is a canonical inclusion of the corresponding canonical field nets.

4. Sector Structure of Intermediate Nets

In this section, we consider an inclusion of nets A ⊂ B and examine in more detail the relation between the sectors of A and those of B. As we have little to say in general, we restrict our attention to the case that A ⊂ B ⊂ F(A). We first show that under these circumstances, B is the fixed-point net of F(A) under a closed subgroup of the gauge group G of A. To this end, we denote by Hρ the Hilbert space in F := F(A) inducing ρ ∈ f . Set Kρ := Hρ ∩ B. Then Kρ is a Hilbert space in B. We claim
a) Kρ Kσ ⊂ Kρσ ,
b) T Kρ ⊂ Kσ , if T ∈ (ρ, σ ),
R. Conti, S. Doplicher, J. E. Roberts
c) Kρ̄ = J Kρ , where J is an antiunitary from Hρ to Hρ̄ intertwining the actions of the gauge group.
Indeed a) is obvious whilst b) follows from the fact that T ∈ A. Finally, c) follows from the fact that we may define such an antiunitary J by Jψ = ψ∗ R̄, with R̄ ∈ (ι, ρρ̄) as in the definition of conjugate endomorphisms, cf. Theorem 3.3 of [8, II], or a standard solution of the conjugate equations, cf. [19]. It follows [21] that there is a unique closed subgroup L of the gauge group G such that each Kρ is precisely the fixed-points of the action of L on Hρ . We now make use of the fact that when B satisfies duality, there is a locally normal conditional expectation m from F onto B. Let ψ, ψ′ ∈ Hρ , then
\[ m(\psi)B = \rho(B)\, m(\psi), \qquad B \in \mathcal{B} . \]
Hence
\[ \psi^* m(\psi')B = \psi^* \rho(B)\, m(\psi') = B\, \psi^* m(\psi'), \]
and since B′ ∩ F(A) = CI , ψ∗ m(ψ′) ∈ CI and m(ψ′) ∈ Hρ . Since F is generated as a net of linear spaces closed in say the s-topology by the elements of the Hilbert spaces Hρ , B is generated in the same way by Kρ . Thus B is the fixed-point net under the action of L. Thus we have proved the following result.

Theorem 4.1. Let F be the canonical field net of the observable net A and B an intermediate net, A ⊂ B ⊂ F, satisfying duality, then there is a closed subgroup L of the gauge group G of A such that B = F^L .

Related results in the context of inclusions of von Neumann algebras can be found in [15].

Lemma 4.2. The following are equivalent:
a) L is a normal subgroup of G,
b) αg (B) ⊂ B, g ∈ G,
c) B is generated by Hilbert spaces inducing endomorphisms in f (A).

Proof. a) ⇒ b) is obvious. Hilbert spaces inducing endomorphisms in f (A) are G-invariant so c) ⇒ b). If b) holds then given g ∈ G and k ∈ L, B ∈ B,
\[ \alpha_{gkg^{-1}}(B) = \alpha_g \alpha_k \alpha_{g^{-1}}(B) = \alpha_g \alpha_{g^{-1}}(B) = B, \]
since α_{g^{-1}}(B) ∈ B. Thus α_{gkg^{-1}} is an automorphism of F leaving B pointwise fixed. Thus gkg^{-1} ∈ L and L is a normal subgroup, giving b) ⇒ a). Suppose a) then consider the set of Hilbert spaces in B inducing endomorphisms in f (A). These must be L-invariant and each thus carry a canonical representation of G/L and B^{G/L} = F^G = A. We know that B is generated by H ∩ B, where H is a Hilbert space inducing an element of f (A). This Hilbert space may not have support I but it is an invariant subspace for the action of G and is hence in the algebra generated by Hilbert spaces above. Obviously, c) implies b), completing the proof.

We next discuss a situation where two members of an inclusion of observable nets A ⊂ B have coinciding canonical field nets. We start with a net B and suppose that its canonical field net F has a compact gauge group K of internal symmetries with K ⊃ G, where G is the gauge group of B and then define A to be the fixed-point net F^K .
We recall that if K is spontaneously broken then A does not satisfy duality [21]. Its dual net Ad is the fixed-point net of F under the closed subgroup of unbroken symmetries and does satisfy duality. Furthermore, A and Ad have the same superselection structure. Hence in line with our strategy of considering only nets satisfying duality, we may restrict ourselves to the case that K is unbroken. We recall that, if F has the split property, then the group K max of all unitaries leaving : invariant and inducing net automorphisms of F is automatically compact in the strong operator topology [9]. In the above situation {F, A, K, HA } is a field system with gauge symmetry for A. Furthermore, ρ ∈ f (A) is induced by a finite-dimensional Hilbert space H in F since this is true of its extension to an element of f (B). But this means that every sector of A is realized on the vacuum Hilbert space of F so that F is the canonical field net of A and K is the gauge group. We have thus proved the following result Proposition 4.3. Let B be an observable net with canonical field net F and gauge group G. Suppose F has an unbroken compact group K of internal symmetries. Then the fixedpoint net F K has F as canonical field net. Finally, we consider the sector structure of an intermediate observable net A ⊂ B ⊂ F(A) satisfying duality. As we know from Theorem 4.1, B is the fixed-points of F(A) under the action of a closed subgroup L of the gauge group G. We shall suppose that the vacuum Hilbert space of A is separable, that Property B of Borchers holds for Ad and that each representation of A satisfying the selection criterion is a direct sum of irreducibles with finite statistics. We now pick a representation πˆ of B satisfying the selection criterion for B. To analyse this representation, we choose an associated 1-cocycle z as in Sect. 2. Since B satisfies duality, z(b) ∈ B(|b|) ⊂ F(|b|). If we consider z as a 1-cocycle of F, it can be used, as discussed in Sect. 
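2, to define representations π̃a of F, where
\[ \tilde\pi_a(F) := z(b)\, F\, z(b)^*, \qquad F \in \mathcal{F}(\mathcal{O}), \quad b \in \Sigma_1, \quad \partial_0 b = a, \quad \partial_1 b \subset \mathcal{O}' . \]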
Note that z(b) is a Bosonic operator in F. Restricting π˜ a first to B and then to the vacuum Hilbert space of B gives the representations πˆ a associated with z considered as a cocycle of B. Thus πˆ a is equivalent to πˆ . We now let π denote the restriction of some fixed π˜ a to A. Now if the vacuum Hilbert space of F is non-separable, then π cannot satisfy the selection criterion as its restriction to each A(O ) is equivalent to a direct sum of uncountably many copies of the identity representation of A(O ). However, we shall see that π is just a direct sum of representations satisfying the selection criterion. It suffices to show that any cyclic subrepresentation satisfies the selection criterion. Such a cyclic representation is, like π , locally normal and hence acts on a separable Hilbert space as a consequence of the following well known result. Lemma 4.4. Let A be an observable net acting on a separable vacuum Hilbert space and ω be a locally normal state. Then the GNS representation πω of A is separable. A proof may be found for example in § 5.2 of [1]. Lemma 4.5. Every cyclic subrepresentation of π satisfies the selection criterion. Proof. We turn the equivalence of representations in restriction to A(O ) into a question of the equivalence of two projections E0 and F0 in the representation of A(O ) obtained by restricting the vacuum representation πˆ 0 of F to A(O ). E0 is the projection onto the
subspace given by the vacuum sector of A. F0 is determined as follows. To be able to exploit the Borchers property, we choose a double cone O0 with O0− ⊂ O, and a unitary U such that U π(A) = πˆ 0 (A)U, A ∈ A(O0 ), and set F0 := U F U ∗ , where F corresponds to the (cyclic) subrepresentation of π , F ∈ π(A) , with separable range. Let σ, σˆ 0 and τ, τˆ0 denote the restrictions of π, πˆ 0 to A(O ) and A(O0 ), respectively. Then U ∈ (τ, τˆ0 ) ⊂ (σ, σˆ 0 ) and F0 ∈ (τˆ0 , τˆ0 ) ⊂ (σˆ 0 , σˆ 0 ). Since πˆ 0 is, in restriction to A, a direct sum of representations satisfying the selection criterion, E0 has central support I in both (τˆ0 , τˆ0 ) and (σˆ 0 , σˆ 0 ). Thus there are projections e0 and f0 with e0 ≺ E0 , f0 ≺ F0 and e0 f0 in (τˆ0 , τˆ0 ). E0 in (σˆ 0 , σˆ 0 ). Thus E0 is equivalent to the Moreover, by Property B for Ad , e0 subprojection f0 of F0 in (σˆ 0 , σˆ 0 ). Since F0 is separable and E0 has infinite multiplicity by Property B, we have E0 ≺ F0 ≺ ∞E0 E0 . Thus E0 and F0 are equivalent, completing the proof. Corollary 4.6. π is normal on A, π˜ a is normal on F and πˆ is normal on B, where the term normal refers to the vacuum representation of F. Proof. The first statement follows at once from Lemma 4.5, the second by invoking Theorem A.6 of the Appendix and the third is obvious since, as we have seen, πˆ is equivalent to a subrepresentation of the restriction π˜ a to B. Now any normal representation of B is just a direct sum of subrepresentations of the defining representation so we have proved the following result. Theorem 4.7. Let A be an observable net on a separable Hilbert space whose dual net satisfies Property B and suppose that every representation of A satisfying the selection criterion is a direct sum of irreducible representations with finite statistics. Let B be an intermediate observable net satisfying duality, i.e. A ⊂ B ⊂ F(A) and L the associated compact group as in Theorem 3.1. 
Then every representation of B satisfying the selection criterion is a direct sum of sectors with finite statistics and these are labelled by the equivalence classes of irreducible representations of L. As a particular case of this, we note that when F contains Fermi elements, then the Bose part of F is the fixed-point algebra of F under Z2 and has precisely two sectors. In the case where A has only a finite number of superselection sectors, the above result is already known, cf. [4, 20]. Theorem 4.7 has an immediate corollary. Corollary 4.8. Under the hypothesis of Theorem 4.7, the field nets of A and B coincide, F(A) = F(B). 5. Appendix In this appendix we collect together various results needed in the course of this paper. They have in common that they do not involve the net structure but typically the harmonic analysis of the action of compact groups on von Neumann algebras and C ∗ -algebras and conditional expectations. The results are looked at in terms of the structure of the category of finite-dimensional continuous, unitary representations of the group rather than the group itself. Consequently, the results transcend group theory. This degree of generality is not needed in this paper.
Lemma A.1. Let m be a conditional expectation from the ∗ -algebra B onto the ∗ subalgebra A and H a Hilbert space in B such that m(H ) ⊂ H , then m restricted to H is the orthogonal projection onto the closed subspace A ∩ H . Proof. If ψ, ψ ∈ H then ψ ∗ m(ψ ) is a scalar. Thus ψ ∗ m(ψ ) = m(ψ ∗ m(ψ )) = m(ψ)∗ m(ψ ) so m(ψ)∗ ψ = ψ ∗ m(ψ ). Hence m restricted to H is selfadjoint and as it is anyway involutive, it is the orthogonal projection onto m(H ) = A ∩ H , as required. There are some obvious corollaries of this result. Suppose that B is generated by A and a collection H of Hilbert spaces in B then there is at most one conditional expectation m of B onto A such that m(H ) ⊂ H for each H ∈ H. If we suppose that B is a C ∗ algebra then it suffices if A and H generate B as a C ∗ -algebra. If B is a von Neumann algebra and m is normal then it suffices if A and H generate B as a von Neumann algebra. These results apply in particular to the case where B is the cross product of A by the action of a dual object of a compact group. Note, too, that the hypothesis m(H ) ⊂ H is redundant if the canonical endomorphism of H maps A into itself and if A ∩ B = C. Thus “minimal” or perhaps better irreducible cross products have a unique mean. Lemma A.2. Let A ⊂ B ⊂ F be inclusions of C ∗ -algebras and mB a conditional expectation of F onto B. Let H denote a category of Hilbert spaces in F each normalizing B and A and such that mB (H ) ⊂ H for each object H of H. Let m be a conditional expectation of B onto A. Suppose that, whenever H is an object of H and σH the corresponding endomorphism, then ψ ∈ A,
ψA = σH (A)ψ,
A ∈ A,
implies ψ ∈ H . Let E denote the C ∗ -subalgebra of F generated by A and the objects H of H then m ◦ mB restricted to E is the unique conditional expectation n of E onto A with n(H ) ⊂ H for all objects H of H. Proof. The uniqueness of n holds since E is generated by A and the objects of H and since n(H ) ⊂ H for each such object H . Now taking n to be the restriction of m ◦ mB to E, n is trivially a conditional expectation onto A. If ψ ∈ H , then n(ψ)A = n(ψA) = σH (A)n(ψ),
A ∈ A,
since H normalizes A. Hence n(ψ) ∈ H by hypothesis, completing the proof.
By a partition of the identity on a Hilbert space H we mean a set Ei , i ∈ I of (selfadjoint) projections with sum the identity operator. Each element X ∈ B(H) can then be written
\[ X = \sum_{i,j} E_i X E_j \]
with convergence in say the s-topology. The set of elements for which this sum is finite forms a ∗-subalgebra B(H)I of B(H) which is a direct sum of the subspaces Ei B(H)Ej . We let sf denote the topology on the ∗-subalgebra which is the direct sum of the s-topologies on these subspaces.

Lemma A.3. Let Ei , i ∈ I be a partition of the unit on a Hilbert space H and π a representation of B(H)I on Hπ , continuous in the sf -topology when B(Hπ ) is given the s-topology. Then if π(Ei ), i ∈ I , is a partition of the identity, π extends uniquely to an s-continuous representation of B(H).
Proof. If π extends to an s-continuous representation, again denoted by π , we must have
\[ \pi(X) = \sum_{i,j} \pi(E_i X E_j), \qquad X \in B(\mathcal{H}), \]
so any extension is unique. On the other hand, this expression for π(X) is obviously defined on the dense subspace spanned by the subspaces π(Ei )Hπ , i ∈ I . Hence, it suffices to show that π(X) is bounded there. Let J be a finite subset of I and
\[ E_J := \sum_{j \in J} E_j . \]
Then the von Neumann algebra EJ B(H)EJ is a ∗-subalgebra of B(H)I so that
\[ \|\pi(E_J X E_J)\| \le \|E_J X E_J\| \le \|X\|, \qquad X \in B(\mathcal{H}), \]
and π(X) is bounded. Computing matrix elements from the dense subspace, we see that we have a representation of B(H). To see that it is normal, it suffices to show that its restriction to the compact operators is non-degenerate. However, its restriction to the compact operators on each Ei H is non-degenerate on π(Ei )Hπ . But π(Ei ) is a partition of the identity, so the result follows.

Remark. Another way of looking at the above result is that B(H) is the inductive limit of the von Neumann algebras B(EJ H) as J runs over the set of finite subsets of I , ordered under inclusion. The inductive limit is here understood in the category of von Neumann algebras with normal, but not necessarily unit-preserving ∗-homomorphisms.

We now consider a von Neumann algebra M and a faithful, normal conditional expectation m onto a von Neumann subalgebra A. Consider M as a left A-module with the A-valued scalar product m(XY ∗ ) derived from m.

Lemma A.4. A representation π of M, s-continuous in restriction to A is also s-continuous in restriction to any submodule N of finite rank.

Proof. When N has finite rank, we can find a finite orthonormal basis ψi using the Gram–Schmidt orthogonalization process. Thus for each X ∈ N , we have
\[ X = \sum_i m(X\psi_i^*)\,\psi_i . \]
Suppose Xn → X in the s-topology on N . Then π(m(Xn ψi∗ )) → π(m(Xψi∗ )) and hence π(Xn ) → π(X) as required. We will need some variant of this result where M is just a C ∗ -algebra. We could assume that N has a finite orthonormal basis or say assume that it is a finite-rank projective module where the coefficients can be chosen continuous in the s-topology. To make a bridge between Lemmas A.3 and A.4, we need another structure related to the notion of hypergroup. We consider a set and a mapping (σ, τ ) → σ ⊗ τ from × into the set of finite subsets of . We suppose further that is equipped with an involution (conjugation) σ → σ¯ with the property that ρ ∈ σ ⊗ τ if and only if τ ∈ σ¯ ⊗ ρ and if and only if σ ∈ ρ ⊗ τ¯ . Furthermore there is a distinguished element ι ∈ such that ι ⊗ σ and σ ⊗ ι both consist of the single point σ for each σ ∈ . If is as above then a C ∗ -algebra B will be said to be -graded if there are normclosed linear subspaces Bσ , σ ∈ , spanning B such that Bσ∗ = Bσ¯ and if Bσ Bτ ⊂ Bσ ⊗τ . Here Bσ ⊗τ denotes the norm-closed subspace spanned by the Bρ as ρ runs over the elements of σ ⊗ τ . Note that Bι is a C ∗ -subalgebra of B and that each Bσ is a Bι bimodule.
A representation π of a -graded C ∗ -algebra B is a representation of B on a Hilbert space H which is a direct sum of closed linear subspaces Hσ such that π(Bσ )Hτ ⊂ Hσ ⊗τ , where Hσ ⊗τ is defined in the obvious manner.
Lemma A.5. Let π be a representation of a -graded C ∗ -algebra and Eσ the projection on Hσ then Eσ π(B)Eτ ⊂ Eσ π(Bσ ⊗τ¯ )Eτ . Proof. π(Bρ )Hτ ⊂ Hρ⊗τ . Thus Eσ π(Bρ )Eτ = 0 unless σ ∈ ρ ⊗ τ , i.e. unless ρ ∈ σ ⊗ τ¯ . The obvious example of the above structure is to consider a compact group G acting on a C ∗ -algebra B and to take to be the set of equivalence classes of irreducible, continuous unitary representations of G. We now set Bσ := mσ (B), where mσ (B) := αg (B)χσ (g), B ∈ B, G
and χσ denotes the normalized trace of σ . In the same way, if (π, U ) is a covariant representation of {B, α}, we get a representation of the -graded C ∗ -algebra B by using Eσ := χσ (g)U (g) G
to define the closed linear subspace Hσ . We now put the above results together in the form of a theorem needed in the body of the text. Theorem A.6. Given a C ∗ -algebra B acting irreducibly on a Hilbert space H and a continuous unitary representation U of a compact group G inducing an action α : G → Aut(B) on B with full Hilbert spectrum, then every representation of B normal on B G is normal on B. Proof. Let A denote the fixed point algebra and let Eσ be as above. Since B is irreducible, Eσ is in the weak closure of A, so that extending π to this weak closure by normality, we have a partition π(Eσ ) of the unit in the representation space of π . Then by Lemma A.5 above, Eσ BEτ is finite-dimensional as a left A-module. Since the action has full Hilbert spectrum, i.e. every irreducible representation of G is realized on some Hilbert space in B, the argument of Lemma A.4 applies and shows that a representation π of B normal on A is normal on each Eσ BEτ . The result now follows from Lemma A.3. We come now to a result on the existence of normal conditional expectations, beginning with a simple lemma of interest in its own right. Lemma A.7. Let M be a von Neumann algebra on a Hilbert space H and E a cyclic and separating projection for M. Let ME := M ∩ {E} and E M := {M ∈ M : EME ∈ (EM E) }, then E M is a weak-operator closed ME -bimodule containing ME as a subbimodule. Given M ∈ M there is a µ(M) ∈ ME such that µ(M)E = EME if and only if M ∈ E M.
Proof. Given M ∈ E M and M ∈ M , then
EM ∗ EM ∗ M EME ≤ ||EME2 EM ∗ M E,
since EME and EM ∗ M E commute. Thus E being cyclic for M , there exists a unique bounded operator µ(M) such that µ(M)M E = M EME. Obviously, µ(M) ∈ M and a computation shows that µ(M ∗ ) = µ(M)∗ . Setting M = I , it now follows that µ(M) commutes with E. On the other hand, if µ(M)E = EME for some M ∈ M then M ∈ E M. The remaining assertions are evident. Specializing to the case that E M = M gives the following result. Corollary A.8. Let M be a von Neumann algebra on a Hilbert space H and E a cyclic and separating projection for M. Then the following conditions are equivalent. a) There is a conditional expectation µ on M such that µ(M)E = EME, M ∈ M. a ) There is a conditional expectation µ on M such that µ (M )E = EM E, M ∈ M . b) [EME, EM E] = 0. c) (E (M )) = E M. Here E M, for example, denotes the restriction of EME to EH. The conditional expectations µ and µ are automatically normal. Proof. Suppose b) holds then E M = M and by Lemma A.7, µ becomes a normal conditional expectation onto ME since it is idempotent and of norm 1. We have therefore deduced a) and by symmetry a ). Now suppose a) holds, then µ(M) is just ME and M = E M, proving b). Furthermore, its restriction to EH is µ(M)E . Thus (E M) = µ(M)E . Since µ(M) = M ∩ (E) , elements of the form M1 + M2 EM3 with Mi ∈ M form an s-dense ∗ -subalgebra in its commutant and restricting this to EH, we have proved c). Trivially, c) implies b), so the conditions of the corollary are equivalent. Remark. If σ is an (inner) automorphism of B(H), the above conditions are satisfied by σ (M) and σ (E) and the corresponding conditional expectation is σ µσ −1 . In particular, if σ (M) = M and σ (E) = E, then µσ = σ µ. Corollary A.9. Let N ⊂ M be an inclusion of von Neumann algebras on Hilbert spaces K and H, respectively. Let E, the projection from H onto K, be cyclic and separating for M, then the following conditions are equivalent. 
a) There is a (necessarily unique and injective) morphism ν : N → M such that
\[ \nu(N)\Phi = N\Phi, \qquad N \in \mathcal{N}, \quad \Phi \in \mathcal{K} . \]
b) N = M E . c) There is a conditional expectation m of M onto N such that m(M)E = EME,
M ∈ M.
Here ME denotes the restriction of M E to EH = K.
Proof. ν is obviously unique since K is cyclic for each M. Given a), we note that Eν(N )E = ν(N )E and replacing N by N ∗ , we see that ν(N ) ∈ M E . Hence N ∈ ME , yielding b). Conversely, if b) is satisfied, given N ∈ N , there is an M ∈ M with M E = EM and M + = N +. Hence, we may pick ν(N ) = M to give a map ν : N → M and it follows from uniqueness that ν is a morphism. Now if M ∈ M and N ∈ N , then by a), N commutes with the restriction of EME to K. Thus EME ∈ N ⊂ ME and c) follows from Lemma A.7. Conversely, if c) holds then N = ME and b) follows by calculating commutants. Remark. Of course, when the conditions of Corollary A.9 are satisfied, there is also a conditional expectation m of M onto ν(N ) such that m (M )E = EM E,
M ∈ M .
This follows from Corollary A.8. References 1. Baumgärtel, H., Wollenberg, M.: Causal Nets of Operator Algebras. Mathematical Aspects of Algebraic Quantum Field Theory. Berlin: Akademie Verlag, 1992 2. Böckenhauer, J., Evans, D.: Modular Invariants, Graphs and α-Induction for Nets of Subfactors I. Commun. Math. Phys. 197, 361–386 (1998) 3. Conti, R.: On the Intrinsic Definition of Local Observables. Lett. Math. Phys. 35, 237–250 (1995) 4. Conti, R.: Inclusioni di algebre di von Neumann e teoria algebrica dei campi. Ph.D. Thesis, Università di Roma Tor Vergata (1996) 5. Davidson, D.R.: Classification of Subsystems of Local Algebras. Ph.D. Thesis, University of California at Berkeley (1993) 6. Doplicher, S.: Progress and problems in algebraic quantum field theory. In S. Albeverio et al. (eds.), Ideas and Methods in Quantum and Statistical Physics Vol. 2, Cambridge: Cambridge Univ. Press, 1992 7. Doplicher, S., Haag, R., Roberts, J.E.: Fields, observables and gauge transformations I. Commun. Math. Phys. 13, 1–23 (1969); II, Commun. Math. Phys. 15, 173–200 (1969) 8. Doplicher, S., Haag, R., Roberts, J.E.: Local observables and particle statistics I. Commun. Math. Phys. 23, 199–230 (1971); II, Commun. Math. Phys. 35, 49–85 (1974) 9. Doplicher, S., Longo, R.: Standard and split inclusions of von Neumann algebras. Invent. Math. 75, 493–536 (1984) 10. Doplicher, S., Roberts, J.E.: Endomorphisms of C ∗ -algebras, cross products and duality for compact groups. Ann. Math. 130, 75–119 (1989) 11. Doplicher, S., Roberts, J.E.: A new duality theory for compact groups. Invent. Math. 98, 157–218 (1989) 12. Doplicher, S., Roberts, J.E.: Why there is a field algebra with a compact gauge group describing the superselection structure of particle physics. Commun. Math. Phys. 131, 51–107 (1990) 13. Driessler, W.: Duality and Absence of Locally Generated Superselection Sectors for CCR-Type Algebras. Commun. Math. Phys. 70, 213–220 (1979) 14. Haag, R.: Local Quantum Physics. 
2nd ed., Berlin–Heidelberg–New York: Springer-Verlag, 1996 15. Izumi, M., Longo, R., Popa, S.: A Galois correspondence for compact groups of automorphisms of von Neumann algebras with a generalization to Kac algebras. J. Funct. Anal. 155, 25–63 (1998) 16. Kastler, D. (ed.): The algebraic theory of superselection sectors: Introduction and recent results. Singapore: World Scientific, 1990 17. Langerholc, J., Schroer, B.: Can current operators determine a complete theory? Commun. Math. Phys. 4, 123–136 (1967) 18. Longo, R., Rehren, K.-H.: Nets of Subfactors. Rev. Math. Phys. 7, 567–597 (1995) 19. Longo, R., Roberts, J.E.: A Theory of Dimension. K-Theory 11, 103–159 (1997) 20. Müger, M.: On charged fields with group symmetry and degeneracies of Verlinde’s matrix S. Ann. Inst. H. Poincaré 71, 359–394 (1999) 21. Roberts, J.E.: Spontaneously broken gauge symmetries and superselection rules. In: Proceedings of the International School of Mathematical Physics, Camerino 1974, G. Gallavotti ed., Camerino: Universitá di Camerino, 1976 22. Roberts, J.E.: Net Cohomology and Its Applications to Field Theory. In: Quantum Fields–Algebras, Processes, ed. L. Streit, Wien–New York: Springer-Verlag 1980 23. Roberts, J.E.: Lectures on algebraic quantum field theory. In: [16], op. cit Communicated by A. Connes
Commun. Math. Phys. 218, 283 – 292 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Complexified Gravity in Noncommutative Spaces
Ali H. Chamseddine
Center for Advanced Mathematical Sciences (CAMS) and Physics Department, American University of Beirut, Lebanon
Received: 1 June 2000 / Accepted: 27 November 2000
Abstract: The presence of a constant background antisymmetric tensor for open strings or D-branes forces the space-time coordinates to be noncommutative. This effect is equivalent to replacing ordinary products in the effective theory by the deformed star product. An immediate consequence of this is that all fields get complexified. The only possible noncommutative Yang–Mills theory is the one with U(N) gauge symmetry. By applying this idea to gravity one discovers that the metric becomes complex. We show in this article that this procedure is completely consistent and one can obtain complexified gravity by gauging the symmetry U(1, D − 1) instead of the usual SO(1, D − 1). The final theory depends on a Hermitian tensor containing both the symmetric metric and antisymmetric tensor. In contrast to other theories of nonsymmetric gravity the action is both unique and gauge invariant. The results are then generalized to noncommutative spaces.

1. Introduction

The developments in the last two years have shown that the presence of a constant background B-field for open strings or D-branes leads to the noncommutativity of space-time coordinates ([1–7]). This can be equivalently realized by deforming the algebra of functions on the classical world volume. The operator product expansion for vertex operators is identified with the star (Moyal) product of functions on noncommutative spaces ([8, 9]). In this respect it was shown that noncommutative U(N) Yang–Mills theory does arise in string theory. The effective action in the presence of a constant B-field background is
\[ -\frac{1}{4}\int \mathrm{Tr}\left( F_{\mu\nu} * F^{\mu\nu} \right), \]
where
\[ F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu + iA_\mu * A_\nu - iA_\nu * A_\mu , \]
and the star product is defined by
\[ f(x) * g(x) = e^{\frac{i}{2}\theta^{\mu\nu}\frac{\partial}{\partial\zeta^\mu}\frac{\partial}{\partial\eta^\nu}}\, f(x+\zeta)\, g(x+\eta)\Big|_{\zeta=\eta=0} . \]
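To see concretely what the star product does, it helps to expand the exponential. The lines below are a sketch under the assumption that θ^{µν} is a constant, real, antisymmetric matrix (as for a constant background B-field); the bracket notation [ , ]_* for the star commutator is introduced here for illustration.

```latex
% First-order expansion of the exponential in \theta:
\[
f * g = f\,g + \frac{i}{2}\,\theta^{\mu\nu}\,\partial_\mu f\,\partial_\nu g + O(\theta^2) .
\]
% For the coordinate functions the series terminates after the linear term:
\[
x^\mu * x^\nu = x^\mu x^\nu + \frac{i}{2}\,\theta^{\mu\nu},
\qquad
[x^\mu, x^\nu]_* := x^\mu * x^\nu - x^\nu * x^\mu = i\,\theta^{\mu\nu} .
\]
% Complex conjugation reverses the order of the factors (real \theta):
\[
\overline{f * g} = \bar g * \bar f .
\]
```

In particular the star product of two real functions is in general not real, since its imaginary part starts with the term (1/2) θ^{µν} ∂_µ f ∂_ν g.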
This definition forces the gauge fields to become complex. Indeed the noncommutative Yang–Mills action is invariant under the gauge transformations
\[ A^g_\mu = g * A_\mu * g_*^{-1} - \partial_\mu g * g_*^{-1}, \]
where $g_*^{-1}$ is the inverse of g with respect to the star product:
\[ g * g_*^{-1} = g_*^{-1} * g = 1 . \]
The contributions of the terms $i\theta^{\mu\nu}$ in the star product force the gauge fields to be complex. Only conditions such as $A_\mu^\dagger = -A_\mu$ could be preserved under gauge transformations provided that g is unitary: $g^\dagger * g = g * g^\dagger = 1$. It is not possible to restrict $A_\mu$ to be real or imaginary to get the orthogonal or symplectic gauge groups as these properties are not preserved by the star product ([7, 10]).

I will address the question of how gravity is modified in the low-energy effective theory of open strings in the presence of background fields. It has been shown that the metric of the target space gets modified by contributions of the B-field and that it becomes nonsymmetric ([11, 7]). If we think of gravity as resulting from local gauge invariance under Lorentz transformations in the tangent manifold, then the previous reasoning would suggest that the vielbein and spin connection both get complexified with the star product. This seems inevitable as the star product appears in the operator product expansion of the string vertex operators. We are therefore led to investigate whether gravity in D dimensions can be constructed by gauging the unitary group U(1, D − 1). In this article we shall show that this is indeed possible and that one can construct a Hermitian action which governs the dynamics of a nonsymmetric complex metric. Once this is achieved, it is straightforward to give the necessary modifications to make the action noncommutative.

The plan of this paper is as follows. In Sect. 2 the action for nonsymmetric gravity based on gauging the group U(1, D − 1) is given and the structure of the theory studied. In Sect. 3 the equations of motion are solved to make connection with the second order formalism. In Sect. 4 we give the generalization to noncommutative spaces. Section 5 is the conclusion.

2. Nonsymmetric Gravity by Gauging U(1, D − 1)

Assume that we start with the U(1, D − 1) gauge fields $\omega_\mu{}^a{}_b$. The U(1, D − 1) group of transformations is defined as the set of matrix transformations leaving the quadratic form
\[ (Z^a)^\dagger\, \eta^a_b\, Z^b \]
invariant, where $Z^a$ are D complex fields and $\eta^a_b = \mathrm{diag}(-1, 1, \cdots, 1)$ with D − 1 positive entries. The dagger operator is the adjoint operator which in this case takes the complex conjugate and lowers the index (or exchanges rows and columns). The gauge fields $\omega_\mu{}^a{}_b$ must then satisfy the condition
\[ \left(\omega_\mu{}^a{}_b\right)^\dagger = -\eta^b_c\, \omega_\mu{}^c{}_d\, \eta^d_a . \]
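The condition on the gauge fields is the infinitesimal form of the invariance of the quadratic form. A short derivation in index-free matrix notation, given as a sketch: here Z is the column of the D complex fields, η the diagonal matrix above, and the symbols M (a group element) and Λ (an infinitesimal generator) are names chosen for this illustration.

```latex
% Invariance of Z^\dagger \eta Z under Z -> M Z for all Z:
\[
(MZ)^\dagger \eta\, (MZ) = Z^\dagger \eta\, Z
\quad\Longleftrightarrow\quad
M^\dagger \eta\, M = \eta .
\]
% Writing M = 1 + \Lambda and keeping first order in \Lambda:
\[
\Lambda^\dagger \eta + \eta\, \Lambda = 0
\quad\Longrightarrow\quad
\Lambda^\dagger = -\eta\, \Lambda\, \eta ,
\qquad \text{using } \eta^2 = 1 .
\]
% A gauge field takes values in the same Lie algebra, so for each \mu:
\[
\omega_\mu^\dagger = -\eta\, \omega_\mu\, \eta .
\]
```

The solutions Λ form a real vector space of dimension D², the dimension of U(1, D − 1).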
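The curvature associated with this gauge field is
\[ R_{\mu\nu}{}^a{}_b = \partial_\mu \omega_\nu{}^a{}_b - \partial_\nu \omega_\mu{}^a{}_b + \omega_\mu{}^a{}_c\, \omega_\nu{}^c{}_b - \omega_\nu{}^a{}_c\, \omega_\mu{}^c{}_b . \]
Under gauge transformations we have
\[ \tilde\omega_\mu{}^a{}_b = M^a_c\, \omega_\mu{}^c{}_d\, (M^{-1})^d_b - M^a_c\, \partial_\mu (M^{-1})^c_b , \]
where the matrices M are subject to the condition:
\[ \left(M^a_c\right)^\dagger \eta^a_b\, M^b_d = \eta^c_d . \]
The curvature then transforms as
\[ \tilde R_{\mu\nu}{}^a{}_b = M^a_c\, R_{\mu\nu}{}^c{}_d\, (M^{-1})^d_b . \]
Next we introduce the complex vielbein $e^a_\mu$ and its inverse $e^\mu_a$ defined by
\[ e^\nu_a\, e^a_\mu = \delta^\nu_\mu, \qquad e^a_\nu\, e^\nu_b = \delta^a_b , \]
which transform as
\[ \tilde e^a_\mu = M^a_b\, e^b_\mu, \qquad \tilde e^\mu_a = e^\mu_b\, (M^{-1})^b_a . \]
It is also useful to define the complex conjugates
\[ e_{\mu a} \equiv \left(e^a_\mu\right)^\dagger, \qquad e^{\mu a} \equiv \left(e^\mu_a\right)^\dagger . \]
With this, it is not difficult to see that $e^\mu_a R_{\mu\nu}{}^a{}_b\, \eta^b_c\, e^{\nu c}$ transforms to
\[ e^\mu_d\, (M^{-1})^d_a\, M^a_e\, R_{\mu\nu}{}^e{}_f\, (M^{-1})^f_b\, \eta^b_c \left((M^{-1})^l_c\right)^\dagger e^{\nu l} \]
and is thus U(1, D − 1) invariant. It is also Hermitian:
\[ \left(e^\mu_a R_{\mu\nu}{}^a{}_b\, \eta^b_c\, e^{\nu c}\right)^\dagger = -e^\nu_c\, \eta^c_b\, \eta^b_e\, R_{\mu\nu}{}^e{}_f\, \eta^f_a\, e^{\mu a} = e^\mu_a R_{\mu\nu}{}^a{}_b\, \eta^b_c\, e^{\nu c} . \]
The metric defined by
\[ g_{\mu\nu} = \left(e^a_\mu\right)^\dagger \eta^a_b\, e^b_\nu \]
satisfies the property
\[ g_{\mu\nu}^\dagger = g_{\nu\mu} . \]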
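Both properties of the metric, gauge invariance and Hermiticity, follow in one line each in matrix notation. This is a sketch, assuming the vielbein is collected into a D × D complex matrix, here called E with entries $e^a_\mu$ (row index a, column index µ), and that the dagger acts as the usual conjugate transpose; the names E and g for the matrices are introduced for illustration.

```latex
% Metric as a matrix product:
\[
g = E^\dagger\, \eta\, E .
\]
% Gauge invariance under E -> M E with M^\dagger \eta M = \eta:
\[
(ME)^\dagger\, \eta\, (ME) = E^\dagger \left( M^\dagger \eta\, M \right) E = E^\dagger\, \eta\, E = g .
\]
% Hermiticity, since \eta is real and diagonal:
\[
g^\dagger = E^\dagger\, \eta^\dagger\, E = E^\dagger\, \eta\, E = g,
\qquad\text{i.e.}\qquad
\overline{g_{\mu\nu}} = g_{\nu\mu} .
\]
```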
A. H. Chamseddine
When the metric is decomposed into its real and imaginary parts, $g_{\mu\nu} = G_{\mu\nu} + iB_{\mu\nu}$, the hermiticity property then implies the symmetries $G_{\mu\nu} = G_{\nu\mu}$, $B_{\mu\nu} = -B_{\nu\mu}$. The gauge invariant Hermitian action is given by
\[ I = \int d^D x\, \sqrt{e}\; e^\mu_a R_{\mu\nu}{}^a{}_b\, \eta^b_c\, e^{\nu c} \sqrt{e^\dagger}, \]
where $e = \det\left(e^a_\mu\right)$. This action is analogous to the first order formulation of gravity obtained by gauging the group $SO(1, D-1)$. One goes to the second order formalism by integrating out the spin connection and substituting for it its value in terms of the vielbein. The same structure is also present here and one can solve for $\omega_\mu{}^a{}_b$ in terms of the complex fields $e^a_\mu$, resulting in an action that depends only on the fields $g_{\mu\nu}$. It is worthwhile to stress that the above action, unlike others proposed to describe nonsymmetric gravity [12], is unique, except for the measure, and unambiguous. Similar ideas have been proposed in the past based on gauging the groups $O(D, D)$ [13] and $GL(D)$ [14], in relation to string duality, but the results obtained there are different from what is presented here. The ordering of the terms in writing the action is done in a way that generalizes to the noncommutative case.

The infinitesimal gauge transformations for $e^a_\mu$ are
\[ \delta e^a_\mu = \Lambda^a_b e^b_\mu, \]
which can be decomposed into real and imaginary parts by writing $e^a_\mu = e^a_{0\mu} + i e^a_{1\mu}$ and $\Lambda^a_b = \Lambda^a_{0b} + i\Lambda^a_{1b}$ to give
\[ \delta e^a_{0\mu} = \Lambda^a_{0b} e^b_{0\mu} - \Lambda^a_{1b} e^b_{1\mu}, \qquad \delta e^a_{1\mu} = \Lambda^a_{1b} e^b_{0\mu} + \Lambda^a_{0b} e^b_{1\mu}. \]
The gauge parameters satisfy the constraints $\left(\Lambda^a_b\right)^\dagger = -\eta^b_c \Lambda^c_d \eta^d_a$, which imply the two constraints
\[ \left(\Lambda^a_{0b}\right)^T = -\eta^b_c \Lambda^c_{0d} \eta^d_a, \qquad \left(\Lambda^a_{1b}\right)^T = \eta^b_c \Lambda^c_{1d} \eta^d_a. \]
From the gauge transformations of $e^a_{0\mu}$ and $e^a_{1\mu}$ one can easily show that the gauge parameters $\Lambda^a_{0b}$ and $\Lambda^a_{1b}$ can be chosen to make $e_{0\mu a}$ symmetric in $\mu$ and $a$ and $e_{1\mu\nu} = e^a_{1\mu} e^b_{0\nu} \eta_{ab}$ antisymmetric in $\mu$ and $\nu$. This is equivalent to the statement that the Lagrangian should be completely expressible in terms of $G_{\mu\nu}$ and $B_{\mu\nu}$ only, after eliminating $\omega_\mu{}^a{}_b$ through its equations of motion. In reality we have
\[ G_{\mu\nu} = e^a_{0\mu} e^b_{0\nu} \eta_{ab} + e^a_{1\mu} e^b_{1\nu} \eta_{ab}, \qquad B_{\mu\nu} = -e^a_{0\mu} e^b_{1\nu} \eta_{ab} + e^a_{1\mu} e^b_{0\nu} \eta_{ab}. \]
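In this special gauge, where we define $g_{0\mu\nu} = e^a_{0\mu} e^b_{0\nu} \eta_{ab}$ and $g_{0\mu\nu}\, g_0^{\nu\lambda} = \delta^\lambda_\mu$, and use $e^a_{0\mu}$ to raise and lower indices, we get
\[ B_{\mu\nu} = 2 e_{1\mu\nu}, \qquad G_{\mu\nu} = g_{0\mu\nu} - \tfrac{1}{4} B_{\mu\kappa} B_{\lambda\nu}\, g_0^{\kappa\lambda}. \]
The last formula appears in the metric of the effective action in open string theory [11].

3. Second Order Formulation

In the rest of this paper, we shall assume for simplicity that the metric is Euclidean, $\eta_{ab} = \delta_{ab}$. We can express the Lagrangian in terms of $e^a_\mu$ only by solving the $\omega_\mu{}^a{}_b$ equations of motion
\[ e^\mu_a e^{\nu b} \omega_\nu{}^c{}_b + e^\nu_b e^{\mu c} \omega_\nu{}^b{}_a - e^{\mu b} e^\nu_a \omega_\nu{}^c{}_b - e^\mu_b e^{\nu c} \omega_\nu{}^b{}_a = \frac{1}{\sqrt{G}}\, \partial_\nu \left( \sqrt{G} \left( e^\nu_a e^{\mu c} - e^\mu_a e^{\nu c} \right) \right) \equiv X^{\mu c}{}_a, \]
where $X^{\mu c}{}_a$ satisfy $\left(X^{\mu c}{}_a\right)^\dagger = -X^{\mu a}{}_c$ and $G = e e^\dagger$. One has to be very careful in working with a nonsymmetric metric:
\[ g_{\mu\nu} = e^a_\mu e_{\nu a}, \qquad g^{\mu\nu} = e^{\mu a} e^\nu_a, \qquad g_{\mu\nu}\, g^{\nu\rho} = \delta^\rho_\mu, \]
but $g_{\mu\nu}\, g^{\mu\rho} \neq \delta^\rho_\nu$. Care also should be taken when raising and lowering indices with the metric. Before solving the $\omega$ equations, we point out that the trace part of $\omega_\mu{}^a{}_b$ (corresponding to the $U(1)$ part in $U(D)$) must decouple from the other gauge fields. It is thus undetermined and decouples from the Lagrangian after substituting its equation of motion. It imposes a condition on the $e^a_\mu$,
\[ \frac{1}{\sqrt{G}}\, \partial_\nu \left( \sqrt{G} \left( e^\nu_a e^{\mu a} - e^\mu_a e^{\nu a} \right) \right) \equiv X^{\mu a}{}_a = 0. \]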
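We can therefore assume, without any loss in generality, that $\omega_\mu{}^a{}_b$ is traceless, $\omega_\mu{}^a{}_a = 0$. Multiplying the $\omega$-equation with $e^a_\kappa e_{\rho c}$ we get
\[ \delta^\mu_\kappa\, \omega_{\nu\rho}{}^\nu + \delta^\mu_\rho\, \omega^\nu{}_{\nu\kappa} - \omega_{\kappa\rho}{}^\mu - \omega_\rho{}^\mu{}_\kappa = X^\mu{}_{\rho\kappa}, \]
where
\[ \omega_{\mu\nu}{}^\rho = e_{\nu a} e^{\rho b} \omega_\mu{}^a{}_b, \qquad X^\mu{}_{\rho\kappa} = e_{\rho c}\, e^a_\kappa\, X^{\mu c}{}_a. \]
Contracting by first setting $\mu = \kappa$ then $\mu = \rho$ we get the two equations
\[ 3\omega_{\nu\rho}{}^\nu + \omega^\nu{}_{\nu\rho} = X^\mu{}_{\rho\mu}, \qquad \omega_{\nu\rho}{}^\nu + 3\omega^\nu{}_{\nu\rho} = X^\mu{}_{\mu\rho}. \]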
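These could be solved to give
\[ \omega_{\nu\rho}{}^\nu = \tfrac{1}{8} \left( 3X^\mu{}_{\rho\mu} - X^\mu{}_{\mu\rho} \right), \qquad \omega^\nu{}_{\nu\rho} = \tfrac{1}{8} \left( -X^\mu{}_{\rho\mu} + 3X^\mu{}_{\mu\rho} \right). \]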
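The solution of the two contracted equations can be checked mechanically. The sketch below treats the two traces of $\omega$ at fixed $\rho$ as plain unknowns w1 and w2, and the two traces of $X$ as given numbers x1 and x2; these names and the sample values are illustrative assumptions only.

```python
from fractions import Fraction

def solve_traces(x1, x2):
    """Solve 3*w1 + w2 = x1 and w1 + 3*w2 = x2 using the closed form quoted in the text."""
    w1 = Fraction(3 * x1 - x2, 8)
    w2 = Fraction(-x1 + 3 * x2, 8)
    return w1, w2

# Arbitrary sample right-hand sides.
x1, x2 = 5, -7
w1, w2 = solve_traces(x1, x2)
print(3 * w1 + w2, w1 + 3 * w2)
```

Substituting the closed form back into the left-hand sides should reproduce x1 and x2 exactly, since the arithmetic is rational.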
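Substituting these back into the $\omega$-equation we get
\[ \omega_{\kappa\rho}{}^\mu + \omega_\rho{}^\mu{}_\kappa = \tfrac{1}{8}\, \delta^\mu_\kappa \left( 3X^\nu{}_{\rho\nu} - X^\nu{}_{\nu\rho} \right) + \tfrac{1}{8}\, \delta^\mu_\rho \left( -X^\nu{}_{\kappa\nu} + 3X^\nu{}_{\nu\kappa} \right) - X^\mu{}_{\rho\kappa} \equiv Y^\mu{}_{\rho\kappa}. \]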
We can rewrite this equation after contracting with eµc eσc to get ωκρσ + eaµ eµc eσc ωρaκ = gσ µ Y µρκ ≡ Yσρκ . By writing ωρaκ = ωρνκ eνa we finally get α β γ δκ δρ δσ + g βµ gσ µ δρα δκγ ωαβγ = Yσρκ . To solve this equation we have to invert the tensor αβγ = δκα δρβ δσγ + g βµ gσ µ δρα δκγ . Mκρσ
In the conventional case when all fields are real, the metric gµν is symmetric and αβγ β g βµ gσ µ = δσ so that the inverse of Mκρσ is simple. In the present case, because of the nonsymmetry of gµν this is fairly complicated and could only be solved by a perturρ bative expansion. Writing gµν = Gµν + iBµν and from the definition g µν gνρ = δµ we get g µν = a µν + ibµν , where −1 a µν = Gµν + Bµκ Gκλ Bλν = Gµν − Gµκ Bκλ Gλσ Bσ η Gην + O(B 4 ), bµν = −2iGµκ Bκλ Gλν + Gµκ Bκλ Gλσ Bσ τ Gτρ Bρη Gην + O(B 5 ). µ
We have defined Gµν Gνρ = δρ . This implies that g µα gνα ≡ δνµ + Lµ ν,
µρ µρ σα 3 Lµ ν = iG Bρν − 2G Bρσ G Bαν + O(B ).
αβγ
The inverse of Mκρσ defined by σρκ
β
αβγ Nαβγ Mκρσ = δαα δβ δγγ
is evaluated to give σρκ
Nαβγ =
1 σ ρ κ δγ δβ δα + δβσ δαρ δγκ − δασ δγρ δβκ 2 1 κ σ ρ ρ − δβ δα Lγ + δακ δγσ Lβ − δγκ δβσ Lρα 4 1 κ σ ρ ρ + Lγ δβ δα + Lκβ δασ δγρ − Lκα δγσ δβ 4 1 κ σ ρ − δα Lγ δβ + δγκ Lσβ δαρ − δβκ Lσα δγρ + O(L2 ). 4
This enables us to write σρκ
ωαβγ = Nαβγ Yσρκ and finally γ
ωµa b = eβa eb ωµβγ . It is clear that the leading term reproduces the Einstein–Hilbert action plus contributions proportional to Bµν and higher order terms. The most difficult task is to show that the Lagrangian is completely expressible in terms of Gµν and Bµν only. The other a and ea should disappear. We have argued from the viewpoint of components of e0µ 1µ gauge invariance that this must happen, but it will be nice to verify this explicitly, to leading orders. We can check that in the flat approximation for gravity with Gµν taken to be δµν , the Bµν field gets the correct kinetic terms. First we write i a eµ = δµa + Bµa , 2 i eµa = δµa − Bµa . 2 and the inverses i eµa = δµa + Bµa , 2 i µ a ea = δµ − Bµa . 2 The ωµa a equation implies the constraint Xµa a = ∂ν eaµ eνa − eaν eµa = 0. This gives the gauge fixing condition ∂ ν Bµν = 0. We then evaluate Xµρκ = −
i ∂ρ Bκµ + ∂κ Bρµ . 2
This together with the gauge condition on Bµν gives Y µρκ =
i ∂ρ Bκµ + ∂κ Bρµ 2
and finally ωµνρ = −
i ∂µ Bνρ + ∂ν Bµρ . 2
When the $\omega_{\mu\nu\rho}$ is substituted back into the Lagrangian, and after integration by parts, one gets
\[ L = \omega_{\mu\nu\rho}\, \omega^{\nu\rho\mu} - \omega_\mu{}^{\mu\rho}\, \omega_{\nu\rho}{}^\nu = -\tfrac{1}{4} B_{\mu\nu}\, \partial^2 B^{\mu\nu}. \]
This is identical to the usual expression
\[ \tfrac{1}{12} H_{\mu\nu\rho} H^{\mu\nu\rho}, \]
where $H_{\mu\nu\rho} = \partial_\mu B_{\nu\rho} + \partial_\nu B_{\rho\mu} + \partial_\rho B_{\mu\nu}$.

We have therefore shown that in $D$ dimensions one must start with $2D^2$ real components $e^a_\mu$, subject to gauge transformations with $D^2$ real parameters. The resulting Lagrangian depends on $D^2$ fields, with $\tfrac{D(D+1)}{2}$ symmetric components $G_{\mu\nu}$ and $\tfrac{D(D-1)}{2}$ antisymmetric components $B_{\mu\nu}$. The idea of a hermitian metric was first forwarded by Einstein and Straus [15], which resulted in a nonsymmetric action for gravity, with two possible contractions of the Riemann tensor. The later developments of nonsymmetric gravity showed that the occurrence of the trace part of the spin-connection in a linear form would result in the propagation of ghosts in the field $B_{\mu\nu}$ [16]. This can be traced to the fact that there is no gauge symmetry associated with the field $B_{\mu\nu}$. For the theory to become consistent one must show that the action above has an additional gauge symmetry, which generalizes diffeomorphism invariance to complex diffeomorphism. This would protect the field $B_{\mu\nu}$ from having non-physical degrees of freedom. It is therefore essential to identify whether there are additional symmetries present in the above proposed action.

4. Noncommutative Gravity

Having shown that it is perfectly legitimate to formulate a theory of gravity with a nonsymmetric complex metric, based on the idea of gauge invariance of the group $U(1, D-1)$, it is not difficult to generalize the steps that led us to the action for complex gravity to spaces where coordinates do not commute, or equivalently, where the usual products are replaced with star products. First the gauge fields are subject to the gauge transformations
\[ \tilde\omega_\mu{}^a{}_b = M^a_c * \omega_\mu{}^c{}_d * M_{*b}^{-1d} - M^a_c * \partial_\mu M_{*b}^{-1c}, \]
where $M_{*a}^{-1b}$ is the inverse of $M^a_b$ with respect to the star product. The curvature is now
\[ R_{\mu\nu}{}^a{}_b = \partial_\mu \omega_\nu{}^a{}_b - \partial_\nu \omega_\mu{}^a{}_b + \omega_\mu{}^a{}_c * \omega_\nu{}^c{}_b - \omega_\nu{}^a{}_c * \omega_\mu{}^c{}_b, \]
which transforms according to
\[ \tilde R_{\mu\nu}{}^a{}_b = M^a_c * R_{\mu\nu}{}^c{}_d * M_{*b}^{-1d}. \]
Next we introduce the vielbeins $e^a_\mu$ and their inverse defined by
\[ e^\nu_{*a} * e^a_\mu = \delta^\nu_\mu, \qquad e^a_\nu * e^\nu_{*b} = \delta^a_b, \]
which transform to
\[ \tilde e^a_\mu = M^a_b * e^b_\mu, \qquad \tilde e^\mu_{*a} = e^\mu_{*b} * M_{*a}^{-1b}. \]
The complex conjugates for the vielbeins are defined by
\[ e_{\mu a} \equiv \left(e^a_\mu\right)^\dagger, \qquad e_*^{\mu a} \equiv \left(e^\mu_{*a}\right)^\dagger. \]
Finally we define the metric
\[ g_{\mu\nu} = \left(e^a_\mu\right)^\dagger * \eta^a_b * e^b_\nu. \]
The $U(1, D-1)$ gauge invariant Hermitian action is
\[ I = \int d^D x\, \sqrt{e} * e^\mu_{*a} * R_{\mu\nu}{}^a{}_b\, \eta^b_c * e_*^{\nu c} * \sqrt{e^\dagger}. \]
This action differs from the one considered in the commutative case by higher-derivative terms proportional to $\theta^{\mu\nu}$. It would be very interesting to see whether these terms could be reabsorbed by redefining the field $B_{\mu\nu}$, or whether the Lagrangian reduces to a function of $G_{\mu\nu}$ and $B_{\mu\nu}$ and their derivatives only. The connection of this action to the gravity action derived for noncommutative spaces based on spectral triples ([17–19]) remains to be made. In order to do this one must understand the structure of Dirac operators for spaces with deformed star products.

5. Conclusions

We have shown that it is possible to combine the tensors $G_{\mu\nu}$ and $B_{\mu\nu}$ into a complexified theory of gravity in $D$ dimensions by gauging the group $U(1, D-1)$. The Hermitian gauge invariant action is a direct generalization of the first order formulation of gravity obtained by gauging the Lorentz group $SO(1, D-1)$. The Lagrangian obtained is a function of the complex fields $e^a_\mu$ and reduces to a function of $G_{\mu\nu}$ and $B_{\mu\nu}$ only. This action is generalizable to noncommutative spaces where coordinates do not commute, or equivalently, where the usual products are deformed to star products. It is remarkable that the presence of a constant background field in open string theory implies that the metric
of the target space becomes nonsymmetric and that the tangent manifold for space-time does not have only the Lorentz symmetry but the larger $U(1, D-1)$ symmetry. The results shown here can be improved by computing the second order action to include higher order terms in the $B_{\mu\nu}$ expansion and to see if this can be put in a compact form. More work is needed to show that the theory is consistent at the non-linear level in the metric dependence, and to explore whether there is an additional hidden symmetry present that ensures that the non-physical degrees of freedom of $B_{\mu\nu}$ do not propagate. Similarly the computation has to be repeated in the noncommutative case to see whether the $\theta^{\mu\nu}$ contributions could be simplified. It is also important to determine a link between this formulation of noncommutative gravity and the Connes formulation based on the noncommutative geometry of spectral triples. To make such a connection many points have to be clarified, especially the structure of the Dirac operator for such a space. This and other points will be explored in a future publication.

References

1. Connes, A., Douglas, M.R. and Schwarz, A.: JHEP 9802, 003 (1998)
2. Douglas, M.R. and Hull, C.: JHEP 9802, 008 (1998)
3. Cheung, Y.K.E. and Krogh, M.: Nucl. Phys. B528, 185 (1998)
4. Chu, C.-S. and Ho, P.-M.: Nucl. Phys. B528, 151 (1999)
5. Schomerus, V.: JHEP 9906, 030 (1999)
6. Ardalan, F., Arfaei, H. and Sheikh-Jabbari, M.M.: JHEP 9902, 016 (1999)
7. Seiberg, N. and Witten, E.: JHEP 9909, 032 (1999)
8. Hoppe, J.: Phys. Lett. B250, 44 (1990)
9. Fairlie, D.B., Fletcher, P. and Zachos, C.K.: Phys. Lett. B218, 203 (1989)
10. Madore, J., Schraml, S., Schupp, P. and Wess, J.: hep-th/0001203
11. Callan, C.G., Lovelace, C., Nappi, C.R. and Yost, S.A.: Nucl. Phys. B288, 525 (1987)
12. Moffat, J.: J. Math. Phys. 36, 3722 (1995) and references therein
13. Maharana, J. and Schwarz, J.H.: Nucl. Phys. B390, 3 (1993)
14. Siegel, W.: Phys. Rev. D47, 5453 (1993)
15. Einstein, A. and Straus, E.: Ann. Math. 47, 731 (1946)
16. Damour, T., Deser, S. and McCarthy, J.: Phys. Rev. D47, 1541 (1993)
17. Chamseddine, A.H., Felder, G. and Fröhlich, J.: Commun. Math. Phys. 155, 109 (1993)
18. Chamseddine, A.H., Grandjean, O. and Fröhlich, J.: J. Math. Phys. 36, 6255 (1995)
19. Connes, A.: J. Math. Phys. 36, 6194 (1995)
Communicated by A. Connes
Commun. Math. Phys. 218, 293 – 313 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
The Vlasov–Poisson–Boltzmann System Near Vacuum

Yan Guo

Lefschetz Center for Dynamical Systems, Division of Applied Mathematics, Brown University, Providence, RI 02912, USA. E-mail: [email protected]

Received: 29 September 2000 / Accepted: 27 November 2000
Abstract: Global classical solutions with small amplitude are constructed for the Cauchy problem to the Vlasov–Poisson–Boltzmann system, which describes the dynamics of charged particles interacting with their self-consistent electrostatic potential as well as with themselves through collisions.

1. Introduction and Notations

Charged dilute particles (e.g., electrons) in the absence of a magnetic field can be described by the Vlasov–Poisson–Boltzmann system
\[ \partial_t f + v \cdot \nabla_x f + \nabla_x \phi \cdot \nabla_v f = Q(f, f), \tag{1} \]
\[ \Delta \phi = \rho = \int_{\mathbf{R}^3} f\, dv, \tag{2} \]
\[ f(0, x, v) = f_0(x, v), \tag{3} \]
where $f(t, x, v)$ is the distribution function for the particles at time $t \ge 0$, spatial coordinates $x = (x_1, x_2, x_3) \in \mathbf{R}^3$ and velocity $v = (v_1, v_2, v_3) \in \mathbf{R}^3$, with initial condition $f_0(x, v)$. The self-consistent electric potential $\phi$ is coupled with the distribution function $f(t, x, v)$ through Poisson's equation (2). The short-range interaction between particles is given by the standard Boltzmann collision operator $Q(f, f)$ which takes the form:
\[ Q(f, f)(v) = \int_{\mathbf{R}^3 \times S^2} B(|v - v_*|, \theta) \{ f(v') f(v'_*) - f(v) f(v_*) \}\, dv_*\, d\omega. \]
Here $\cos\theta = (v - v_*) \cdot \omega / |v - v_*|$, $\omega \in S^2$, with
\[ v' = v - (|v - v_*| \cos\theta)\, \omega, \qquad v'_* = v_* + (|v - v_*| \cos\theta)\, \omega, \]
which denote velocities after a collision of particles having velocities $v$, $v_*$ before the collision and vice versa. We denote $u = v - v_*$ so that
\[ v' = v - (u \cdot \omega)\omega \equiv v - u_\parallel, \qquad v'_* = v - u + (u \cdot \omega)\omega \equiv v - u_\perp. \tag{4} \]
Now the collision operator takes the form
\[ Q(f, f)(v) = \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta) \{ f(v') f(v'_*) - f(v) f(v - u) \}\, du\, d\omega, \tag{5} \]
with $v'$ and $v'_*$ given in (4). We make a physically reasonable assumption that the collisions between these charged particles are rather "soft":
\[ B(|u|, \theta) \le C(1 + |u|^\gamma), \qquad -2 < \gamma \le 0. \tag{6} \]
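The collision rule (4) can be sketched in a few lines. The example below is an illustration under stated assumptions, not part of the original argument: the helper names and the sample velocities are hypothetical, and a rational unit vector $\omega$ is chosen so that the arithmetic is exact. It checks that the post-collision velocities conserve momentum and kinetic energy, which is the property used later when the gain term is estimated.

```python
from fractions import Fraction as F

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def scale(c, a):
    return tuple(c * x for x in a)

def collide(v, v_star, omega):
    """Post-collision velocities v' = v - u_par and v'_* = v - u_perp, as in (4)."""
    u = sub(v, v_star)
    u_par = scale(dot(u, omega), omega)  # component of u along omega
    u_perp = sub(u, u_par)               # component of u orthogonal to omega
    return sub(v, u_par), sub(v, u_perp)

# Hypothetical sample data; omega is a unit vector since 1/9 + 4/9 + 4/9 = 1.
v = (F(1), F(-2), F(3))
v_star = (F(0), F(5), F(-1))
omega = (F(1, 3), F(2, 3), F(2, 3))

v_prime, v_star_prime = collide(v, v_star, omega)
momentum_before = add(v, v_star)
momentum_after = add(v_prime, v_star_prime)
energy_before = dot(v, v) + dot(v_star, v_star)
energy_after = dot(v_prime, v_prime) + dot(v_star_prime, v_star_prime)
print(momentum_before == momentum_after, energy_before == energy_after)
```

Both comparisons should hold for any unit vector $\omega$, because $u_\parallel$ and $u_\perp$ are orthogonal and sum to $u$.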
In many physical situations, the electro-magnetic interaction is far more important than collisions between charged particles, so that collisions are treated as high-order corrections to the collisionless Vlasov–Poisson model. This is the case, for example, of a hot dilute plasma. The goal of this article is to demonstrate this fact from a mathematical standpoint by constructing global in time smooth solutions with small amplitude to the Vlasov–Poisson–Boltzmann system (1), (2) and (3). For any given field $\nabla_x \psi(t, x)$, we define the characteristic equations to (1) as
\[ \frac{dx}{ds} = v, \qquad \frac{dv}{ds} = \nabla_x \psi(s, x). \tag{7} \]
For any point $(t, x, v)$ in $\mathbf{R}_+ \times \mathbf{R}^3 \times \mathbf{R}^3$, we denote
\[ [X(s), V(s)] \equiv [X(s; t, x, v), V(s; t, x, v)] \tag{8} \]
as the solution to (7) with $X(t; t, x, v) = x$, $V(t; t, x, v) = v$. We define
\[ |||h(t, x, v)|||_\psi = \sum_{0 \le i + j \le 1} \sup_{t, x, v} (1 + t)^{-j} \left| e^{\alpha |V(0)|^2} \left( 1 + |X(0)|^2 \right)^k \nabla_x^i \nabla_v^j h(t, x, v) \right| \tag{9} \]
for $k \ge 0$ and $\alpha > 0$. We remark that even if $|||h|||_\psi < \infty$, $\nabla_v h$ can grow linearly in $t$ as $t \to \infty$. We also define
\[ |f_0(x, v)|_{\alpha, k} \equiv \sup_{x, v} \left| e^{\alpha |v|^2} (1 + |x|^2)^k f_0(x, v) \right| + \sup_{x, v} \left| e^{\alpha |v|^2} (1 + |x|^2)^k \nabla_{x, v} f_0(x, v) \right|, \tag{10} \]
where $\nabla_{x, v}$ denotes either $\nabla_x$ or $\nabla_v$. Our main results are as follows:

Theorem 1. Assume (6). There is a $\delta_1 > 0$ such that if $0 \le f_0 \in C^1(\mathbf{R}^3 \times \mathbf{R}^3)$ and $|f_0|_{\alpha, k} \le \delta_1$, for $\alpha > 0$ and $k > 3/2$, then there is a classical solution $[f, \phi]$ (with $f \ge 0$) to the Vlasov–Poisson–Boltzmann system (1), (2) and (3). Moreover, there are constants $C_0$ and $C_1$ such that
\[ |||f|||_\phi \le C_0 |f_0|_{\alpha, k}, \tag{11} \]
\[ (1 + t) |\phi(t)|_\infty + (1 + t)^2 |\nabla_{t, x} \phi(t)|_\infty + (1 + t)^{5/2} |\nabla_x^2 \phi(t)|_\infty \le C_1 |f_0|_{\alpha, k} \tag{12} \]
for all $t \ge 0$. Furthermore, if $[g, \psi]$ (with $g \ge 0$) is another classical solution to (1), (2) and (3) which satisfies (11) and (12) with other constants $C_0^*$ and $C_1^*$, then $f \equiv g$, $\phi \equiv \psi$.
Theorem 1 is based on the construction of classical solutions to the Boltzmann equation in the presence of a given, decaying field $\nabla_x \psi(t, x)$:
\[ \partial_t f + v \cdot \nabla_x f + \nabla_x \psi \cdot \nabla_v f = Q(f, f). \tag{13} \]

Theorem 2. Assume (6) and $\nabla_x \psi(t, x) \in C(\mathbf{R}_+ \times \mathbf{R}^3)$, and for any fixed $t$, $\nabla_x^2 \psi(t, x) \in C(\mathbf{R}^3)$. Assume
\[ \sup_t \left\{ (1 + t) |\psi(t)|_\infty + (1 + t)^2 |\nabla_{t, x} \psi(t)|_\infty + (1 + t)^{5/2} |\nabla_x^2 \psi(t)|_\infty \right\} \le \varepsilon_0 \tag{14} \]
for a sufficiently small number $\varepsilon_0 > 0$. There exist $\delta_0$ and $C_0 > 0$ such that if $0 \le f_0 \in C^1(\mathbf{R}^3 \times \mathbf{R}^3)$ and $|f_0|_{\alpha, k} \le \delta_0$ with $\alpha > 0$ and $k > 3/2$, then there is a classical solution $f(t, x, v) \ge 0$ to (13) and (3) such that $|||f|||_\psi \le C_0 |f_0|_{\alpha, k}$. Furthermore, if there is another solution $g \in W^{1, \infty}(\mathbf{R}_+ \times \mathbf{R}^3 \times \mathbf{R}^3)$ to (13) and (3) which satisfies, for $0 \le s \le t$,
\[ \sup_{s, x, v} \left| g(s, x, v)\, e^{\alpha |v|^2} \right| < C(t) < \infty, \tag{15} \]
then $f \equiv g$.

To the author's knowledge, even for the Cauchy problem of the Boltzmann equation in the presence of an external field (13), there are few constructions of global classical solutions. Theorem 2 can be regarded as a natural extension of well-known results of Illner–Shinbrot [IS] (see also [BT, H, P, T]) for the Boltzmann equation in the absence of external fields, as well as of Bardos–Degond [BD] for the Vlasov–Poisson system in the absence of collision effects. To prove Theorem 2 for a given external field $\nabla_x \psi(t, x)$, for any point $(t, x, v)$, we recall the mild form of the Boltzmann equation (13) along its backward trajectory $[X(s), V(s)]$ as
\[
\begin{aligned}
f(t, x, v) - f_0(X(0), V(0)) &= \int_0^t Q(f, f)(s, X(s), V(s))\, ds \\
&= \int_0^t \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta)\, f(s, X(s), V(s) - u_\parallel)\, f(s, X(s), V(s) - u_\perp)\, du\, d\omega\, ds \\
&\quad - \int_0^t f(s, X(s), V(s)) \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta)\, f(s, X(s), V(s) - u)\, du\, d\omega\, ds \\
&\equiv N(f, f).
\end{aligned}
\tag{16}
\]
Here we decompose $u$ into $u_\parallel = (u \cdot \omega)\omega$, which is parallel to $\omega$, and $u_\perp = u - (u \cdot \omega)\omega$, which is perpendicular to $\omega$. Using this mild form (16), we apply the contraction mapping principle to prove Theorem 2, in the framework of [U2] for the case $\psi \equiv 0$. In order to estimate $Q(f, f)(s, X(s), V(s))$, which is evaluated along a trajectory, we define the following:
\[ V_\parallel(0) = V(0; s, X(s), V(s) - u_\parallel), \qquad V_\perp(0) = V(0; s, X(s), V(s) - u_\perp); \tag{17} \]
\[ X_\parallel(0) = X(0; s, X(s), V(s) - u_\parallel), \qquad X_\perp(0) = X(0; s, X(s), V(s) - u_\perp). \tag{18} \]
In the case of $\psi \equiv 0$, the following well-known identity is crucial in the construction of the global solution:
\[ |X_\parallel(0)|^2 + |X_\perp(0)|^2 = |X(0)|^2 + |X(0) + su|^2. \]
Although this is not true if $\psi \neq 0$, we observe that
\[ |X_\parallel(0)|^2 + |X_\perp(0)|^2 \ge \tfrac{1}{2} |X(0)|^2 \]
as in Lemma 2, Sect. 2. In the estimation of the mild form (Sect. 3), we cannot perform the traditional $s$-integration first, due to lack of information on the time-derivative of perturbed trajectories. Instead, we first estimate the integration over $u' = su$, and leave the $s$-integration last. Since $|X_\parallel(0)|$ and $|X_\perp(0)|$ are functions of $u'_\parallel$ and $u'_\perp$ respectively, we decompose $du'\, d\omega = 2|u'_\parallel|^{-2}\, du'_\parallel\, du'_\perp$ to obtain the desired estimate in Lemma 5. In Sect. 4, we establish Theorem 2. To estimate the derivatives, we use the decay property of the external field $\nabla_x \psi$ to balance $\nabla_v f(t, x, v)$, which grows linearly in time. Using an iterative sequence with uniform bounds based on Theorem 2, we finally construct global classical solutions to the full Vlasov–Poisson–Boltzmann system in Sect. 5.

In comparison with the renormalized solutions constructed by P.-L. Lions [L] to the same problem, the initial data $f_0$ in our case is more restrictive. On the other hand, since our solutions are constructed with more analytical properties, uniqueness can be easily derived within the same solution class. We hope that our result serves as a first step in the construction of classical solutions with small amplitude to the more fundamental Vlasov–Maxwell–Boltzmann system, in which a self-consistent magnetic field is also present and no global solutions have been constructed so far.

While there is extensive mathematical literature for the Vlasov theory over the last twenty years, there has been far more explosive mathematical work for the Boltzmann theory. Since it is impossible to list them all, the author refers to [C, CIP, G] and [U] for more references in both subjects. We merely mention that the global classical solutions near a Maxwellian were first constructed by Ukai [U1], and (weak) solutions with unrestricted size for both the Boltzmann and Vlasov–Maxwell systems were constructed by DiPerna and Lions [L]. The classical solutions with small amplitude to the relativistic Vlasov–Maxwell system were constructed by Glassey and Strauss [GS]. We shall use from time to time $|\cdot|_p$ to denote the standard $L^p$ norm.

2. Characteristics

We need to study the Boltzmann equation in the presence of a given, external field $\nabla_x \psi(t, x)$ which satisfies (14). For any given point $(t, x, v)$, recall that $[X(s), V(s)] \equiv [X(s; t, x, v), V(s; t, x, v)]$ satisfies (7) with $X(t; t, x, v) = x$ and $V(t; t, x, v) = v$. In order to study (1) in its mild formulation (16), we need some estimates of $v'$ and $v'_*$ along a trajectory (8).
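Lemma 1. Assume (14). For any fixed $(t, x, v)$, and $0 \le s_1, s_2 \le \infty$, we have
\[ |V(s_1) - V(s_2)| \le \pi \varepsilon_0, \qquad \left| |V(s_1)|^2 - |V(s_2)|^2 \right| \le (\pi + 2) \varepsilon_0. \tag{19} \]
Furthermore, $X(0; s, X(s), V(s) + \xi)$ is $C^1$ in $\xi$, and for any $0 \le s \le \infty$,
\[ \left| \frac{\partial X}{\partial \xi}(0; s, X(s), V(s) + \xi) + sI \right| = s\, O(\varepsilon_0), \tag{20} \]
\[ X(0; s, X(s), V(s) + \xi) = X(0) - s[I + O(\varepsilon_0)]\xi, \tag{21} \]
where $|O(\varepsilon_0)| \le C\varepsilon_0$ with a universal constant $C$, independent of $s$.

Proof. We fix $(t, x, v)$. Integrating $\frac{dv}{ds} = \nabla_x \psi$ over $[s_1, s_2]$, we have
\[ |V(s_1) - V(s_2)| = \left| \int_{s_1}^{s_2} \nabla_x \psi(s, X(s))\, ds \right| \le \varepsilon_0 \int_{\mathbf{R}} \frac{ds}{1 + s^2} = \pi \varepsilon_0, \]
where we have used the decay assumption (14). Multiplying $\frac{dv}{ds} = \nabla_x \psi$ by $v = \frac{dx}{ds}$, we have
\[ \frac{d}{ds} \left( \tfrac{1}{2} |V(s)|^2 \right) = \nabla_x \psi(s, X(s)) \cdot V(s) = \frac{d}{ds} \psi(s, X(s)) - \psi_t(s, X(s)). \]
Integrating over $s_1 \le s \le s_2$ yields
\[
\begin{aligned}
\left| \tfrac{1}{2} |V(s_2)|^2 - \tfrac{1}{2} |V(s_1)|^2 \right| &\le |\psi(s_2, X(s_2)) - \psi(s_1, X(s_1))| + \int_{s_1}^{s_2} |\psi_t(s, X(s))|\, ds \\
&\le 2\varepsilon_0 + \varepsilon_0 \int_{\mathbf{R}} \frac{ds}{(1 + s)^2} \le (\pi + 2) \varepsilon_0,
\end{aligned}
\]
where we have used (14). Equation (19) thus follows. To prove (20) and (21), we first denote
\[ X(\theta, \xi) \equiv X(\theta; s, X(s), V(s) + \xi), \qquad V(\theta, \xi) \equiv V(\theta; s, X(s), V(s) + \xi). \]
From the characteristic equations (7), we represent $X(0, \xi)$ in terms of $X(s)$ and $V(s) + \xi$ as
\[ V(\theta, \xi) = V(s) + \xi + \int_s^\theta \nabla_x \psi(\tau, X(\tau, \xi))\, d\tau, \]
\[ X(0, \xi) = X(s) - [V(s) + \xi] s + \int_0^s \int_\theta^s \nabla_x \psi(\tau, X(\tau; \xi))\, d\tau\, d\theta. \]
From the decay condition (14) and by Lemma 8 in the Appendix, we have
\[ \sup_{x, v} \left| \frac{\partial X}{\partial v}(\tau; s, x, v) \right| \le C(s - \tau). \]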
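Therefore by taking the $\xi$ derivative of $X(0, \xi)$, we get from (14)
\[
\begin{aligned}
\left| \frac{\partial X}{\partial \xi}(0, \xi) + sI \right| &= \left| \int_0^s \int_\theta^s \nabla_x^2 \psi(\tau, X(\tau, \xi))\, \frac{\partial X}{\partial v}(\tau; s, X(s), V(s) + \xi)\, d\tau\, d\theta \right| \\
&\le C\varepsilon_0 \int_0^s \int_\theta^s (1 + \tau)^{-5/2} (1 + s - \tau)\, d\tau\, d\theta \\
&\le C\varepsilon_0\, s \int_0^s (1 + \theta)^{-3/2}\, d\theta = s\, O(\varepsilon_0).
\end{aligned}
\]
We now prove (21). Notice that, as a function of $\xi$, $X(0, \xi) - X(0) + s\xi$ vanishes when $\xi = 0$. From the Mean Value Theorem between $\xi$ and $0$, we have
\[ |X(0, \xi) - X(0) + s\xi| \le \sup_\eta \left| \frac{\partial X}{\partial \xi}(0, \eta) + sI \right| |\xi| \le C\varepsilon_0 |\xi| s, \]
and the lemma follows.

For any fixed $(t, x, v)$, we recall (4) and (8).

Lemma 2. Assume (14). Then for $\varepsilon_0$ small enough, we have for all $0 \le s \le t$,
\[ |X_\parallel(0)|^2 + |X_\perp(0)|^2 \ge \tfrac{1}{2} |X(0)|^2. \tag{22} \]

Proof. Applying (21) to (18) with $\xi = -(u \cdot \omega)\omega = -u_\parallel$ and $\xi = -u + (u \cdot \omega)\omega = -u_\perp$ respectively, we have
\[
\begin{aligned}
|X_\parallel(0)|^2 + |X_\perp(0)|^2 &= |X(0) + s[I + O(\varepsilon_0)] u_\parallel|^2 + |X(0) + s[I + O(\varepsilon_0)] u_\perp|^2 \\
&\ge 2|X(0)|^2 + 2s X(0) \cdot u + s^2 |u|^2 - C\varepsilon_0 \left\{ s |X(0)||u| + s^2 |u|^2 \right\} \\
&\ge \tfrac{1}{2} |X(0)|^2
\end{aligned}
\]
for sufficiently small $\varepsilon_0$.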
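For orientation, the field-free case can be checked directly. The sketch below assumes $\psi \equiv 0$, so that characteristics are straight lines and (21) holds with $O(\varepsilon_0) = 0$, giving $X_\parallel(0) = X(0) + s u_\parallel$ and $X_\perp(0) = X(0) + s u_\perp$. The helper names and sample values are hypothetical. It verifies that the sum of squares equals $|X(0)|^2 + |X(0) + su|^2$ under this sign convention, and hence that the bound (22) holds.

```python
from fractions import Fraction as F

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def scale(c, a):
    return tuple(c * x for x in a)

def backward_positions(x0, u, omega, s):
    """Free streaming (psi = 0): X_par(0) = X(0) + s*u_par, X_perp(0) = X(0) + s*u_perp."""
    u_par = scale(dot(u, omega), omega)
    u_perp = tuple(a - b for a, b in zip(u, u_par))
    return add(x0, scale(s, u_par)), add(x0, scale(s, u_perp))

# Hypothetical sample data; omega is a rational unit vector.
x0 = (F(2), F(-1), F(4))
u = (F(1), F(-7), F(4))
omega = (F(1, 3), F(2, 3), F(2, 3))
s = F(5, 2)

x_par, x_perp = backward_positions(x0, u, omega, s)
lhs = dot(x_par, x_par) + dot(x_perp, x_perp)
x_shift = add(x0, scale(s, u))
rhs = dot(x0, x0) + dot(x_shift, x_shift)
print(lhs == rhs, lhs >= F(1, 2) * dot(x0, x0))
```

With a nonzero field the equality is lost, which is why Lemma 2 only keeps the lower bound.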
3. Estimates for the Mild Form

Recalling (16), we study the mild form of the Boltzmann equation (13) in the presence of a given external field $\nabla_x \psi(t, x)$. We define $N(f, g) = N_1(f, g) - N_2(f, g)$ with the gain term $N_1$ and the loss term $N_2$ as
\[ N_1(f, g) = \int_0^t \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta)\, f(s, X(s), V(s) - u_\parallel)\, g(s, X(s), V(s) - u_\perp)\, du\, d\omega\, ds, \tag{23} \]
\[ N_2(f, g) = \int_0^t f(s, X(s), V(s)) \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta)\, g(s, X(s), V(s) - u)\, du\, d\omega\, ds. \tag{24} \]
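For $t > 0$, $\alpha > 0$ and $k \ge 0$, we define a norm for $h(t, x, v)$ which depends on $\psi$ as
\[ \|h(t)\| \equiv \|h(t)\|_{\psi, \alpha, k} \equiv \sup_{x, v} \left| e^{\alpha |V(0)|^2} (1 + |X(0)|^2)^k h(t, x, v) \right|, \tag{25} \]
where $V(0) = V(0; t, x, v)$ and $X(0) = X(0; t, x, v)$. We first prove a useful elementary fact.

Lemma 3. Let $-n < \lambda_1 \le 0$ and $2\lambda_2 < -n$, then
\[ \int_{\mathbf{R}^n} |\xi - \xi'|^{\lambda_1} (1 + |\xi'|^2)^{\lambda_2}\, d\xi' \le C(1 + |\xi|)^{\lambda_1}. \tag{26} \]

Proof.
\[
\begin{aligned}
\int_{\mathbf{R}^n} |\xi - \xi'|^{\lambda_1} (1 + |\xi'|^2)^{\lambda_2}\, d\xi' &= \int_{|\xi - \xi'| \ge (|\xi| + 1)/2} + \int_{|\xi - \xi'| \le (|\xi| + 1)/2} \\
&\le C(1 + |\xi|)^{\lambda_1} \int (1 + |\xi'|^2)^{\lambda_2}\, d\xi' + \left( \frac{1 + |\xi|}{2} \right)^{2\lambda_2} \int_{|\xi - \xi'| \le (|\xi| + 1)/2} |\xi - \xi'|^{\lambda_1}\, d\xi' \\
&= C(1 + |\xi|)^{\lambda_1} + C(1 + |\xi|)^{\lambda_1 + n + 2\lambda_2} \le C(1 + |\xi|)^{\lambda_1}.
\end{aligned}
\]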
Lemma 4. Assume (14). Assume $B(|u|, \theta) \le C(1 + |u|^\gamma)$ with $-3 < \gamma \le 0$. Let $\alpha > 0$ and $k \ge 0$ in (25). Then there is a constant $C > 0$ such that for all $t \ge 0$,
\[ \|N(f, g)(t)\| \le C \int_0^t \|f(s)\| \|g(s)\|\, ds. \tag{27} \]
Proof. We first estimate the loss term $N_2(f, g)$. We fix $(t, x, v)$. Notice that for any point $(s, X(s), V(s))$ on its backward trajectory,
\[ |h(s, X(s), V(s))| \le [1 + |X(0)|^2]^{-k} e^{-\alpha |V(0)|^2} \|h(s)\| \]
from the definition (25). From (24), we have
\[
\begin{aligned}
|N_2(f, g)(t, x, v)| &= \left| \int_0^t f(s, X(s), V(s)) \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta)\, g(s, X(s), V(s) - u)\, du\, d\omega\, ds \right| \\
&\le [1 + |X(0)|^2]^{-k} e^{-\alpha |V(0)|^2} \int_0^t \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta) \|f(s)\| \|g(s)\|\, e^{-\alpha |V(0; s, X(s), V(s) - u)|^2} \\
&\qquad \times \left[ 1 + |X(0; s, X(s), V(s) - u)|^2 \right]^{-k} du\, d\omega\, ds.
\end{aligned}
\]
From (19), plugging
\[ |V(0; s, X(s), V(s) - u)|^2 = |V(s) - u|^2 + O(\varepsilon_0) \]
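into the above, we obtain
\[
\begin{aligned}
|N_2(f, g)(t, x, v)| &\le C[1 + |X(0)|^2]^{-k} e^{-\alpha |V(0)|^2} \int_0^t \|f(s)\| \|g(s)\| \\
&\qquad \times \left\{ \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta)\, e^{-\alpha |V(s) - u|^2} \left[ 1 + |X(0; s, X(s), V(s) - u)|^2 \right]^{-k} du\, d\omega \right\} ds \\
&\le C[1 + |X(0)|^2]^{-k} e^{-\alpha |V(0)|^2} \int_0^t \|f(s)\| \|g(s)\| \left\{ \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta)\, e^{-\alpha |V(s) - u|^2}\, du\, d\omega \right\} ds.
\end{aligned}
\tag{28}
\]
Notice that from $B(|u|, \theta) \le C(1 + |u|^\gamma)$ with $-3 < \gamma \le 0$, and the fact that $e^{-\alpha |u|^2} \le C(1 + |u|^2)^{-l}$ with some $l$ large, we deduce from Lemma 3
\[ \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta)\, e^{-\alpha |V(s) - u|^2}\, du\, d\omega < \infty, \]
uniformly in $s$. Thus we have
\[ [1 + |X(0)|^2]^k e^{\alpha |V(0)|^2} |N_2(f, g)(t, x, v)| \le C \int_0^t \|f(s)\| \|g(s)\|\, ds. \tag{29} \]
To study the gain term $N_1(f, g)$ in (23), we recall (17) and (18). From (23), we have
\[
\begin{aligned}
|N_1(f, g)(t, x, v)| &= \left| \int_0^t \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta)\, f(s, X(s), V(s) - u_\parallel)\, g(s, X(s), V(s) - u_\perp)\, du\, d\omega\, ds \right| \\
&\le \int_0^t \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta) \|f(s)\| \left[ 1 + |X_\parallel(0)|^2 \right]^{-k} e^{-\alpha |V_\parallel(0)|^2} \\
&\qquad \times \|g(s)\| \left[ 1 + |X_\perp(0)|^2 \right]^{-k} e^{-\alpha |V_\perp(0)|^2}\, du\, d\omega\, ds \\
&= \int_0^t \|f(s)\| \|g(s)\| \int_{\mathbf{R}^3 \times S^2} B\, (1 + |X_\parallel(0)|^2)^{-k} (1 + |X_\perp(0)|^2)^{-k} e^{-\alpha [|V_\parallel(0)|^2 + |V_\perp(0)|^2]}\, du\, d\omega\, ds.
\end{aligned}
\]
From the conservation of energy through a collision (17), $|V(s) - u_\parallel|^2 + |V(s) - u_\perp|^2 = |V(s)|^2 + |V(s) - u|^2$. From (19) in Lemma 1, we have
\[
\begin{aligned}
|V_\parallel(0)|^2 + |V_\perp(0)|^2 &= |V(s) - u_\parallel|^2 + |V(s) - u_\perp|^2 + O(\varepsilon_0) \\
&= |V(s)|^2 + |V(s) - u|^2 + O(\varepsilon_0) \\
&= |V(0)|^2 + |V(s) - u|^2 + O(\varepsilon_0).
\end{aligned}
\]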
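We thus have an upper bound for $|N_1(f, g)(t, x, v)|$ as
\[ C e^{-\alpha |V(0)|^2} \int_0^t \|f(s)\| \|g(s)\| \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta) (1 + |X_\parallel(0)|^2)^{-k} (1 + |X_\perp(0)|^2)^{-k} e^{-\alpha |V(s) - u|^2}\, du\, d\omega\, ds. \tag{30} \]
From Lemma 2, either $|X_\parallel(0)|^2$ or $|X_\perp(0)|^2$ is greater than $\tfrac{1}{4} |X(0)|^2$. Hence
\[
\begin{aligned}
\int_{\mathbf{R}^3 \times S^2} B(|u|, \theta) (1 + |X_\parallel(0)|^2)^{-k} (1 + |X_\perp(0)|^2)^{-k} e^{-\alpha |V(s) - u|^2}\, du\, d\omega &\le C(1 + |X(0)|^2)^{-k} \int_{\mathbf{R}^3 \times S^2} B(|u|, \theta)\, e^{-\alpha |V(s) - u|^2}\, du\, d\omega \\
&\le C(1 + |X(0)|^2)^{-k},
\end{aligned}
\]
where we have used Lemma 3 again, since $B(|u|, \theta) \le C(1 + |u|^\gamma)$ with $\gamma > -3$. Therefore we have
\[ \|N_1(f, g)(t)\| \le C \int_0^t \|f(s)\| \|g(s)\|\, ds. \]
Together with (29), the lemma follows.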
The following lemma treats the case for large time, which is essential for the global existence.

Lemma 5. Assume (14). Assume $B(|u|, \theta) \le C(1 + |u|^\gamma)$ with $-2 < \gamma \le 0$. Let $\alpha > 0$ and $k > 3/2$ in (25). Then there is a constant $C > 0$ such that
\[ \sup_t \|N(f, g)(t)\| \le C \sup_t \|f(t)\| \times \sup_t \|g(t)\|. \tag{31} \]
Proof. We split the $s$-integrations in (28) and (30) into two parts: $0 \le s \le 1$ and $1 \le s \le t$. Applying (27) in Lemma 4, we majorize both small-time parts $0 \le s \le 1$ in (28) and (30) by
\[ C \int_0^1 \|f(s)\| \|g(s)\|\, ds \le C \sup_s \|f(s)\| \sup_s \|g(s)\|. \]
It suffices to estimate the large time integrations $1 \le s \le t$ in (28) and (30). To estimate the loss term $N_2(f, g)$, we first perform a change of variable $u' = su$ to majorize the part of $1 \le s \le t$ in (28) by
\[ C[1 + |X(0)|^2]^{-k} e^{-\alpha |V(0)|^2} \sup_s \|f(s)\| \sup_s \|g(s)\|\, A \]
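with
\[
\begin{aligned}
A &\equiv \int_1^t \int_{\mathbf{R}^3 \times S^2} B\!\left( \frac{|u'|}{s}, \theta \right) e^{-\alpha |V(s) - u'/s|^2} \left[ 1 + |X(0; s, X(s), V(s) - u'/s)|^2 \right]^{-k} \frac{du'}{s^3}\, d\omega\, ds \\
&\le C \int_1^t \int_{S^2} \int_{\mathbf{R}^3} s^{-3 - \gamma} (1 + |u'|^\gamma) \left[ 1 + |X(0; s, X(s), V(s) - u'/s)|^2 \right]^{-k} du'\, d\omega\, ds.
\end{aligned}
\]
We have used the fact that, for $s \ge 1$,
\[ B\!\left( \frac{|u'|}{s}, \theta \right) \le C(1 + |u'|^\gamma s^{-\gamma}) \le C(1 + |u'|^\gamma) s^{-\gamma} \]
with $-2 < \gamma \le 0$. By a further change of variable $\bar u = X(0; s, X(s), V(s) - u'/s) - X(0)$, we have from (20) and (21) that $\left| \frac{d\bar u}{du'} \right| = 1 + O(\varepsilon_0)$. Therefore,
\[ \int_{\mathbf{R}^3} (1 + |u'|^\gamma) \left[ 1 + |X(0; s, X(s), V(s) - u'/s)|^2 \right]^{-k} du' \le C \int_{\mathbf{R}^3} (1 + |\bar u|^\gamma) \left[ 1 + |X(0) + \bar u|^2 \right]^{-k} d\bar u < \infty, \]
again by Lemma 3 with $\lambda_1 = 0, \gamma$ respectively, and $\lambda_2 = -k < -3/2$. We thus deduce $A \le C \int_{S^2} \int_1^\infty s^{-3 - \gamma}\, ds\, d\omega < \infty$. Together with the case for $s \le 1$, we conclude
\[ \sup_t \|N_2(f, g)(t)\| \le C \sup_t \|f(t)\| \sup_t \|g(t)\|. \]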
Similarly, to estimate the gain term N1 (f, g), we can majorize the part of s ≥ 1 integration in (30) by: C sup f (s) sup g(s)e−α|V (0)| A1 2
s
s
with A1 takes the form of A1 ≡
t 1
R3 ×S 2
B(|u|, θ)(1 + |X (0)|2 )−k
× (1 + |X⊥ (0)|2 )−k e−α|V (s)−u| dudωds. 2
Using again the change of variable of u = su
Vlasov–Poisson–Boltzmann System Near Vacuum
we get
∞
A1 ≤ 1
R3 ×S 2
B
|u | ,θ s
u 2 du × (1 + |X (0)|2 )−k (1 + |X⊥ (0)|2 )−k e−α|V (s)− s | 3 dωds (32) s t ≤ s −3−γ [1 + |u |γ ](1 + |X (0)|2 )−k (1 + |X⊥ (0)|2 )−k du dω ds. 1
R3 ×S 2
From Lemma 2, either $|X_\parallel(0)|^2$ or $|X_\perp(0)|^2$ is greater than $\tfrac{1}{4} |X(0)|^2$. Due to the symmetry, we may assume that
\[ |X_\perp(0)|^2 \ge \tfrac{1}{4} |X(0)|^2. \tag{33} \]
For fixed s, we notice from (18) that X (0) and X⊥ (0) are functions of u , only u⊥ respectively. In order to estimate (32), we decompose u = (u · ω)ω + [u − (u · ω)ω] ≡ u + u ⊥ . It suffices to consider the case u · ω > 0 due to the symmetry. The key step to estimate A1 is to rearrange the integral over R3 × S 2 in (32) as a 3D integration of u and a 2D integration of u ⊥ . Notice that du = d(u · ω)du ⊥ , and the spherical coordinates for du is du = |u |2 d|u |dω = |(u · ω)|2 d(u · ω)dω. We combine d(u · ω) and dω in (32) to get du dω = d(u · ω)du ⊥ dω = |(u · ω)|−2 du du ⊥ .
(34)
Therefore we can use the Fubini Theorem to reduce (32) to repeated integrals over u ⊥ and u :
[1 + |u |γ ](1 + |X (0)|2 )−k (1 + |X⊥ (0)|2 )−k du dω = (1 + |X (0)|2 )−k |u |−2 du [1 + |u ⊥ |γ ](1 + |X⊥ (0)|2 )−k du ⊥ .
R3 ×S 2 ,u ·ω>0
R3
(35)
R2
We have also used the fact that |u |γ ≤ |u ⊥ |γ . For the 2D u ⊥ -integral, we notice that from (21), X⊥ (0) − X(0) = X(0; s, X(s), V (s) − u⊥ ) − X(0) = [I + O(%0 )]u ⊥ . Hence by a change of variable u ⊥ → X⊥ (0), the 2Du ⊥ -integral is bounded by C [1 + |X⊥ (0) − X(0)|γ ](1 + |X⊥ (0)|2 )−k dX⊥ (0) R2 ∩{|X⊥ (0)|≥ 41 |X(0)|}
=
|X⊥ (0)−X(0)|≤ |X(0)|+1 2
+
|X⊥ (0)−X(0)|≥ |X(0)|+1 2
.
By (33), (1 + |X⊥ (0)|2 )−k ≤ C(1 + |X(0)|2 )−k , and the first integral is bounded by C(1 + |X(0)|2 )−k [(|X(0)| + 1)2 + (|X(0)| + 1)γ +2 ] = C(1 + |X(0)|2 )−k+1 . By (33) again, the second integral is bounded by (1 + |X⊥ (0)|2 )−k d[|X⊥ (0)|2 ]. C[1 + (|X(0)| + 1)γ ] |X⊥ (0)|≥ 41 |X(0)|
We thus conclude that [1 + |u ⊥ |γ ](1 + |X⊥ (0)|2 )−k du ⊥ ≤ C(1 + |X(0)|2 )−k+1 [1 + |X(0)|γ ].
(36)
R2
u
Since from (21), we have X (0) = X(0) + [I + O(%0 )]u . By changing the variable → X (0), we then estimate the 3D u -integral as
R3
|u |−2 (1 + |X (0)|2 )−k du ≤C |X(0) − X (0)|−2 (1 + |X (0)|2 )−k dX (0) R3
$\le C(1 + |X(0)|)^{-2}$ by Lemma 3 again. Together with the estimate on the $u'_\perp$-integral (36), we finally obtain $A_1 \le C(1 + |X(0)|^2)^{-k}$ as desired. We thus have, together with the case of $s \le 1$:
\[ |N_1(f, g)(t, x, v)| \le C \sup_t \|f(t)\| \sup_t \|g(t)\|\, (1 + |X(0)|^2)^{-k} e^{-\alpha |V(0)|^2}. \]
We remark that the case when $|X_\parallel(0)|^2 \ge \tfrac{1}{4} |X(0)|^2$ is similar: we split $u'_\perp = |u'_\perp| \eta$ and integrate over $d\omega = d\eta$ first with $u' = (u' \cdot \eta)\eta + [u' - (u' \cdot \eta)\eta]$.
4. Boltzmann Equation with External ψ

Assuming (14) for the external field $\psi$, we now construct a classical solution to the problem (13) and (3). We define a Banach space
\[ S = \left\{ f(t, x, v) : f(t, x, v),\ \nabla_x f(t, x, v),\ \nabla_v f(t, x, v) \in C(\mathbf{R}_+ \times \mathbf{R}^3 \times \mathbf{R}^3) \right\}, \tag{37} \]
equipped with the norm $|||h|||_\psi = \sum_{0 \le i + j \le 1} \sup_t (1 + t)^{-j} \|\nabla_x^i \nabla_v^j h(t)\|$ as in (9). We shall show that the mapping
\[ F \equiv Tf \equiv f_0(X(0), V(0)) + N(f, f) \tag{38} \]
is a contraction from $S$ to $S$, where $N(f, g)$ is defined in (16), (23) and (24). We first prove an elementary fact in ODE.
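Lemma 6. Consider a family of ODE
$$\frac{dy(s;\xi)}{ds} = G(s, y(s;\xi)), \qquad y(0;\xi) = y_0(\xi) \tag{39}$$
with $y(s;\xi)\in\mathbb{R}^n$ and $\xi\in\mathbb{R}^m$. Assume $y_0(\xi)$ is $C^1$ in $\xi$ and $G_y(s,y)$ is continuous in $y$ with
$$\sup_{0\le s\le T}|G_y(s,\cdot)|_\infty \le C_T \tag{40}$$
for all $T>0$. Then $y(s;\xi)$ is $C^1$ in $\xi$ and
$$\frac{\partial y}{\partial\xi}(s;\xi) = e^{\int_0^s G_y(\theta,\,y(\theta;\xi))\,d\theta}\,\frac{\partial y_0}{\partial\xi}(\xi). \tag{41}$$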
Proof. It suffices to verify (41), from which the continuity of $\frac{\partial y}{\partial\xi}(s;\xi)$ follows by the Lebesgue dominated convergence theorem and (40). Denote $\Delta y(s,\xi) = y(s;\xi+\Delta\xi) - y(s;\xi)$. Taking the difference of solutions of (39), we obtain
$$\frac{d\,\Delta y(s,\xi)}{ds} = G_y(s, y(s;\xi))\,\Delta y(s,\xi) + R(s,\xi),$$
where $R(s,\xi) = [G(s,y(s;\xi+\Delta\xi)) - G(s,y(s;\xi)) - G_y(s,y(s;\xi))\,\Delta y(s,\xi)]$. Therefore, we have from (40),
$$|\Delta y(s,\xi) - \Delta y_0(\xi)| \le \int_0^s C_T\,|\Delta y(\theta,\xi)|\,d\theta.$$
By the Gronwall inequality we deduce that $|\Delta y(s,\xi)| \le C|\Delta\xi|$. Furthermore,
$$\Delta y(s,\xi) = e^{\int_0^s G_y(\theta,\,y(\theta;\xi))\,d\theta}\left\{\Delta y_0(\xi) + \int_0^s e^{-\int_0^\theta G_y(\tau,\,y(\tau;\xi))\,d\tau}\,R(\theta,\xi)\,d\theta\right\}.$$
Notice that, by (40), $|R(\theta,\xi)|/|\Delta\xi|$ is uniformly bounded on bounded time intervals. By the continuity assumption on $G_y$,
$$\lim_{\Delta\xi_j\to 0} R(\theta,\Delta\xi_j)/|\Delta\xi_j| = 0$$
for $j = 1,\dots,m$. Hence by the Lebesgue dominated convergence theorem,
$$\lim_{|\Delta\xi_j|\to 0}|\Delta\xi_j|^{-1}\int_0^s e^{-\int_0^\theta G_y(\tau,\,y(\tau;\xi))\,d\tau}\,R(\Delta\xi_j)\,d\theta = 0.$$
Equation (41) thus follows.
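As a rough numerical sanity check of (41), one may restrict to the scalar case $n = m = 1$, where the exponential is an ordinary number. The sketch below is only an illustration: the right-hand side `G`, the initial datum `y0`, the step counts and the tolerance are all hypothetical choices, not taken from the text. It integrates (39) with a classical Runge–Kutta scheme, accumulates $\int_0^s G_y\,d\theta$ along the orbit, and compares the right-hand side of (41) with a centered finite difference in $\xi$.

```python
import math

def G(s, y):
    # Hypothetical right-hand side, chosen only for illustration.
    return math.sin(y) + 0.5 * s

def G_y(s, y):
    # Partial derivative of G with respect to y (bounded, as in (40)).
    return math.cos(y)

def y0(xi):
    # Hypothetical C^1 initial datum.
    return xi * xi + 0.3

def dy0(xi):
    return 2.0 * xi

def solve(xi, s_end=1.0, steps=2000):
    """RK4 for y' = G(s, y); also returns the integral of G_y along the orbit."""
    h = s_end / steps
    s, y, integral = 0.0, y0(xi), 0.0
    for _ in range(steps):
        k1 = G(s, y)
        k2 = G(s + h / 2, y + h / 2 * k1)
        k3 = G(s + h / 2, y + h / 2 * k2)
        k4 = G(s + h, y + h * k3)
        # The same stage points integrate G_y along the orbit.
        i1 = G_y(s, y)
        i2 = G_y(s + h / 2, y + h / 2 * k1)
        i3 = G_y(s + h / 2, y + h / 2 * k2)
        i4 = G_y(s + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        integral += h / 6 * (i1 + 2 * i2 + 2 * i3 + i4)
        s += h
    return y, integral

def derivative_by_formula(xi):
    # Right-hand side of (41) in the scalar case.
    _, integral = solve(xi)
    return math.exp(integral) * dy0(xi)

def derivative_by_difference(xi, d=1e-5):
    # Centered finite difference of y(1; xi) in xi.
    return (solve(xi + d)[0] - solve(xi - d)[0]) / (2 * d)
```

Comparing `derivative_by_formula(0.7)` with `derivative_by_difference(0.7)` gives a quick consistency check; the two values should agree up to discretization error.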
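Lemma 7. Assume $\nabla_x\psi(t,x)\in C(\mathbb{R}_+\times\mathbb{R}^3)$, and $\nabla_x^2\psi(t,x)\in C(\mathbb{R}^3)$ for any given $t\ge 0$. Let $\psi$ satisfy (14). Let $f_0\in C^1$ and $|f_0|_{\alpha,k}<\infty$. Let $f\in S$; then $Tf$ is $C^1(\mathbb{R}_+\times\mathbb{R}^3\times\mathbb{R}^3)$. Moreover, we have
$$|||Tf||| \le C|f_0|_{\alpha,k} + C|||f|||^2, \qquad |||Tf - Tg||| \le C\,[\,|||f||| + |||g|||\,]\times|||f-g|||. \tag{42}$$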
Proof. Step 1. $Tf\in C^1$. Recall (4). We first define
$$Q(h_1(x_1,v_1),h_2(x_2,v_2)) \equiv \int_{\mathbb{R}^3\times S^2} B(|u|,\theta)\,h_1(x_1,v_1-u_\parallel)\,h_2(x_2,v_2-u_\perp)\,du\,d\omega - \int_{\mathbb{R}^3\times S^2} B(|u|,\theta)\,h_1(x_1,v_1)\,h_2(x_2,v_2-u)\,du\,d\omega.$$
Notice that
$$|v_1-u_\parallel|^2 + |v_2-u_\perp|^2 = |u|^2+|v_1|^2+|v_2|^2 - 2v_1\cdot u_\parallel - 2v_2\cdot u_\perp \ge \tfrac12|u|^2 - 3(|v_1|^2+|v_2|^2),$$
$$|v_2-u|^2 \ge \tfrac12|u|^2 - 3|v_2|^2.$$
We therefore majorize the integrand in $Q(h_1(x_1,v_1),h_2(x_2,v_2))$ by
$$\|h_1\|_{0,\alpha,0}\,\|h_2\|_{0,\alpha,0}\,e^{3\alpha[|v_1|^2+|v_2|^2]}\,B(|u|,\theta)\,e^{-\frac{\alpha}{2}|u|^2} \tag{43}$$
$$\le C\,\|h_1\|_{\psi,\alpha,0}\,\|h_2\|_{\psi,\alpha,0}\,e^{3\alpha[|v_1|^2+|v_2|^2]}\,B(|u|,\theta)\,e^{-\frac{\alpha}{2}|u|^2}. \tag{44}$$
Since |V (0; t, xi , vi )|2 = |vi |2 + O(%0 ), for i = 1, 2. For notational simplicity, we shall use ∂x and ∂v to denote all xi and vi derivatives, i = 1, 2, 3. We now verify that for ∂ = ∂x , ∂v , ∂Q(f, f ) = Q(∂f, f ) + Q(f, ∂f ).
(45)
Without loss of generality, we just consider $\partial = \partial_v$. Notice that
$$\frac{1}{v_1-v_2}\left[Q(f,f)(v_1) - Q(f,f)(v_2)\right] = Q\!\left(\frac{f(v_1)-f(v_2)}{v_1-v_2},\,f(v_1)\right) + Q\!\left(f(v_2),\,\frac{f(v_1)-f(v_2)}{v_1-v_2}\right) = Q(\partial_v f(\bar v), f(v_1)) + Q(f(v_2),\partial_v f(\bar v)),$$
where $\bar v$ is between $v_1$ and $v_2$. Notice that $\|\partial_v f(t)\|_\psi \le |||f|||_\psi < \infty$, so from (44) with $h_1 = \partial f$, $h_2 = f$, the integrands above are bounded by $C\,B(|u|,\theta)\,e^{-\frac{\alpha}{2}|u|^2}$ for bounded $t, x, v$. Therefore we can verify (45) as $v_1\to v_2$, by the continuity of $f$ and $\partial f$ and the Lebesgue dominated convergence theorem. Furthermore, it also follows that
$$\lim_{(x_1,v_1)\to(x,v)} Q(\partial f,f)(x_1,v_1) = Q(\partial f,f)(x,v).$$
Hence from (45), $Q(f,f)$ is $C^1$ in $(x,v)$. Similarly, from (44) and the fact that $f\in C(\mathbb{R}_+\times\mathbb{R}^3\times\mathbb{R}^3)$, $Q(f,f)$ is continuous in $(t,x,v)$. We now claim that $[X(s),V(s)]$ is $C^1$ in $(t,x,v)$. In fact we can apply Lemma 6 to the characteristic equations
$$\frac{dX(s;t,x,v)}{ds} = V(s;t,x,v), \qquad \frac{dV(s;t,x,v)}{ds} = \nabla_x\psi(s,X(s;t,x,v)).$$
Notice that the initial data at $s = t$ satisfy $X_t(t;t,x,v) = v$, $V_t(t;t,x,v) = \nabla_x\psi(t,x)$, which are continuous in $t, x, v$. We deduce the claim by Lemma 6 and (14). It follows that $f_0(X(0),V(0))$ is $C^1$ in $(t,x,v)$. We thus deduce from (44) and the chain rule,
$$\partial N(f,f) = \int_0^t\left[\partial_x Q(f,f)(s,X(s),V(s))\,\partial X(s) + \partial_v Q(f,f)(s,X(s),V(s))\,\partial V(s)\right]ds,$$
$$\partial_t N(f,f) = \int_0^t\left[\partial_x Q(f,f)(s,X(s),V(s))\,\partial_t X(s) + \partial_v Q(f,f)(s,X(s),V(s))\,\partial_t V(s)\right]ds + Q(f,f),$$
where $\partial = \partial_x, \partial_v$. Moreover, $N(f,f)$ is $C^1(\mathbb{R}_+\times\mathbb{R}^3\times\mathbb{R}^3)$. This is again because, from (44), (45) and the fact that $|||f||| < +\infty$, for bounded $t, x, v$ the integrands above are bounded by $C\,B(|u|,\theta)\,e^{-\frac{\alpha}{2}|u|^2}$. Therefore, $Tf$ is $C^1$ in $(t,x,v)$ and we conclude our first step.

Step 2. Proof of (42). By Lemma 5 and recalling (10) and (16), we have
$$\sup_t\|Tf(t)\| \equiv \sup_t\|F(t)\| \le |f_0|_{\alpha,k} + C\sup_t\|f(t)\|^2. \tag{46}$$
We now estimate first order derivatives of $Tf$. From (38) we obtain equivalently
$$L(F) \equiv [\partial_t + v\cdot\nabla_x + \nabla_x\psi\cdot\nabla_v]\,F = Q(f,f).$$
Taking $x$ and $v$ derivatives of the above equation, we get
$$L(\partial_x F) = -\partial_x\nabla_x\psi\cdot\nabla_v F + Q(\partial_x f,f) + Q(f,\partial_x f), \tag{47}$$
$$L(\partial_v F) = -\partial_v v\cdot\nabla_x F + Q(\partial_v f,f) + Q(f,\partial_v f). \tag{48}$$
We first estimate (47). Fix $(t,x,v)$. Integrating over its backward trajectory $[X(s),V(s)]$ from $0$ to $t$ yields
$$\partial_x F(t,x,v) - \partial_x f_0(X(0),V(0)) = -\int_0^t \partial_x\nabla_x\psi(s,X(s))\cdot\nabla_v F(s,X(s),V(s))\,ds + N(\partial_x f,f) + N(f,\partial_x f)$$
$$= -\int_0^t \{\partial_x\nabla_x\psi(s,X(s))[1+s]\}\cdot\{[1+s]^{-1}\nabla_v F(s,X(s),V(s))\}\,ds + N(\partial_x f,f) + N(f,\partial_x f).$$
Notice that from (14), $|\nabla_x^2\psi(s,X(s))| \le \epsilon_0(1+s)^{-5/2}$. Therefore, by multiplying by $(1+|X(0)|^2)^k e^{\alpha|V(0)|^2}$ and taking the maximum norm over $t, x, v$, we have from Lemma 5,
$$\sup_t\|\nabla_x F(t)\| \le |f_0|_{\alpha,k} + \epsilon_0\int_0^\infty (1+s)^{-3/2}\,\|(1+s)^{-1}\nabla_v F(s)\|\,ds + C\,\|\nabla_x f\|\,\|f\|$$
$$\le |f_0|_{\alpha,k} + C\epsilon_0\sup_t\,(1+t)^{-1}\|\nabla_v F(t)\| + C\sup_t\|\nabla_x f(t)\|\,\sup_t\|f(t)\|. \tag{49}$$
Similarly, for (48), we have
$$\partial_v F(t,x,v) - \partial_v f_0(X(0),V(0)) = -\int_0^t \partial_v v\cdot\nabla_x F(s,X(s),V(s))\,ds + N(\partial_v f,f) + N(f,\partial_v f). \tag{50}$$
Notice that from (16), we rewrite $N(\partial_v f,f)$ as
$$N\big([1+s]\,[1+s]^{-1}\partial_v f(s),\,f(s)\big) \le (1+t)\,N\big([1+s]^{-1}\partial_v f(s),\,f(s)\big).$$
By multiplying (50) by $(1+t)^{-1}(1+|X(0)|^2)^k e^{\alpha|V(0)|^2}$ and taking the maximum over $t, x, v$, we obtain
$$\sup_t\,(1+t)^{-1}\|\nabla_v F(t)\| \le |f_0|_{\alpha,k} + \sup_t\|\nabla_x F(t)\| + C\sup_t\,(1+t)^{-1}\|\nabla_v f(t)\|\,\sup_t\|f(t)\|. \tag{51}$$
Plugging (51) into (49) we have
$$\sup_t\|\nabla_x F(t)\| \le C|f_0|_{\alpha,k} + C\epsilon_0\sup_t\|\nabla_x F(t)\| + C|||f|||^2.$$
If $\epsilon_0$ is so small that $C\epsilon_0 \le 1/2$, we can solve the above inequality to get
$$\sup_t\|\nabla_x F(t)\| \le C|f_0|_{\alpha,k} + C|||f|||^2.$$
From (51), we have
$$\sup_t\|\nabla_x F(t)\| + \sup_t\,(1+t)^{-1}\|\nabla_v F(t)\| \le C|f_0|_{\alpha,k} + C|||f|||^2.$$
Now we consider $|||F-G|||$, where $G(t,x,v) \equiv Tg = f_0(X(0),V(0)) + N(g,g)$. From $L(G) = Q(g,g)$, we have $L(F-G) = Q(f-g,f) + Q(g,f-g)$. Now repeating all the estimates we have
$$|||F-G||| \le C\,|||f-g|||\times[\,|||f||| + |||g|||\,].$$
Proof of Theorem 2. It is a standard application of the contraction mapping theorem that there is a unique fixed point $f = Tf$ of the mapping $T$ from the space $S$ to $S$ if $|f_0|_{\alpha,k}$ is small enough. Moreover, $|||f||| \le C|f_0|_{\alpha,k}$; see [U2] and [G]. From Lemma 7, $Tf$ is $C^1$, and it satisfies (13). It follows from the same proof as in [U2] that if $f_0\ge 0$ then $f\ge 0$. Now let $g$ be another $W^{1,\infty}$ solution to (13) and (3) which satisfies (15). By (19),
$$\sup_{0\le s\le t}\|g(s)\|_{\psi,\alpha,0} \le C(t) < +\infty.$$
Clearly $g$ satisfies the mild form (16) too. Then we have $[f-g](t,x,v) = N(f-g,g) + N(f,f-g)$. Applying Lemma 4 with $k = 0$ yields:
$$\|(f-g)(t)\|_{\psi,\alpha,0} \le C\int_0^t \|(f-g)(s)\|_{\psi,\alpha,0}\,\big[\|f(s)\|_{\psi,\alpha,0} + \|g(s)\|_{\psi,\alpha,0}\big]\,ds.$$
Hence $f(t)\equiv g(t)$ by the Gronwall inequality.
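The structure of the fixed-point argument can be seen in a scalar caricature. The sketch below is a hypothetical toy model, not the operator $T$ of the paper: it replaces $Tf = f_0(X(0),V(0)) + N(f,f)$ by the quadratic map $x \mapsto a + c\,x^2$, where `a` stands in for the size of the data and `c` for the constant in (42). For small `a` the iteration contracts on a small ball and the fixed point is bounded by a multiple of `a`, mirroring $|||f||| \le C|f_0|_{\alpha,k}$.

```python
import math

def T(x, a, c):
    # Toy analogue of Tf = (data term) + (quadratic collision term).
    return a + c * x * x

def fixed_point(a=0.05, c=1.0, tol=1e-14, max_iter=200):
    """Picard iteration x_{n+1} = T(x_n) starting from x_0 = 0."""
    x = 0.0
    for n in range(max_iter):
        x_new = T(x, a, c)
        if abs(x_new - x) < tol:
            return x_new, n + 1
        x = x_new
    raise RuntimeError("iteration did not converge; data too large?")

def exact_small_root(a=0.05, c=1.0):
    # Smaller root of c x^2 - x + a = 0, which exists when 4ac < 1.
    return (1.0 - math.sqrt(1.0 - 4.0 * a * c)) / (2.0 * c)
```

Calling `fixed_point()` with the default (illustrative) values returns the limit together with the number of iterations used; it can be compared with `exact_small_root()`.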
5. The Vlasov–Poisson–Boltzmann System

Proof of Theorem 1. We first construct a solution to the full Vlasov–Poisson–Boltzmann system (1), (2) and (3) through the following iterating sequence for $n = 0, 1, 2, \dots$:
$$\partial_t f^{n+1} + v\cdot\nabla_x f^{n+1} + \nabla_x\phi^n\cdot\nabla_v f^{n+1} = Q(f^{n+1},f^{n+1}), \tag{52}$$
$$\Delta\phi^{n+1} = \rho^{n+1} = \int_{\mathbb{R}^3} f^{n+1}\,dv, \tag{53}$$
$$f^{n+1}(0,x,v) = f_0(x,v),$$
starting with $\phi^0(t,x)\equiv 0$. We shall use Theorem 2 repeatedly. Recall the constants $\delta_0$, $C_0$ and $\epsilon_0$ in Theorem 2. We claim that there exists $0<\delta_1<\delta_0$ such that if $|f_0|_{\alpha,k}\le\delta_1$ then for all $n$, $[f^{n+1},\phi^n]$ are well-defined and satisfy
$$|||f^{n+1}|||_{\phi^n} \le C_0|f_0|_{\alpha,k}, \qquad (1+t)|\phi^n(t)|_\infty + (1+t)^2|\nabla_{t,x}\phi^n(t)|_\infty + (1+t)^{5/2}|\nabla_x^2\phi^n(t)|_\infty \le \epsilon_0. \tag{54}$$
Proof of the claim. We prove the claim via an induction on $n$. Since $\phi^0\equiv 0$ and $|||f^1|||_{\phi^0}\le C_1|f_0|_{\alpha,k}$ by Theorem 2, clearly for $n = 0$ the claim is valid. Now we suppose that (54) is true for $n = l-1$. We define $\Delta\phi^l = \rho^l$ and we first show that $\phi^l$ satisfies (54). For any fixed $(t,x,v)$ we denote by $[X_{l-1}(0),V_{l-1}(0)]$ its backward trajectory of $\frac{dx}{ds} = v$, $\frac{dv}{ds} = \nabla_x\phi^{l-1}$ at $s = 0$. Then
$$|\rho^l(t)|_1 = \int f^l(t,x,v)\,dx\,dv \le \|f^l(t)\|_{\phi^{l-1}}\int\left(1+|X_{l-1}(0)|^2\right)^{-k}e^{-\alpha|V_{l-1}(0)|^2}\,dx\,dv$$
$$= \|f^l(t)\|_{\phi^{l-1}}\int\left(1+|X_{l-1}(0)|^2\right)^{-k}e^{-\alpha|V_{l-1}(0)|^2}\,dX_{l-1}(0)\,dV_{l-1}(0) \le C\|f^l(t)\|_{\phi^{l-1}}, \tag{55}$$
where $\alpha>0$, $k>3/2$, and $(x,v)\to(X_{l-1}(0),V_{l-1}(0))$ is a measure preserving map. Moreover, for $t\ge 1$, from Lemma 8 and the induction hypothesis for $\phi^{l-1}$, and a change of variable $v\to X_{l-1}(0)$, we have
$$|\rho^l(t)|_\infty = \sup_x\left|\int_{\mathbb{R}^3} f^l(t,x,v)\,dv\right| \le \|f^l(t)\|_{\phi^{l-1}}\int_{\mathbb{R}^3}\left(1+|X_{l-1}(0)|^2\right)^{-k}e^{-\alpha|V_{l-1}(0)|^2}\,dv \tag{56}$$
$$= Ct^{-3}\|f^l(t)\|_{\phi^{l-1}}\int_{\mathbb{R}^3}\left(1+|X_{l-1}(0)|^2\right)^{-k}e^{-\alpha|V_{l-1}(0)|^2}\,dX_{l-1}(0) \le Ct^{-3}\|f^l(t)\|_{\phi^{l-1}}$$
for $k>3/2$. For $t\le 1$, from the induction hypothesis for $\phi^l$, $|V_{l-1}(0)|^2 = |v|^2 + O(\epsilon_0)$ by (19). Hence we have
$$|\rho^l(t)|_\infty = \sup_x\left|\int_{\mathbb{R}^3} f^l(t,x,v)\,dv\right| \le \|f^l(t)\|_{\phi^{l-1}}. \tag{57}$$
By Lemma 9 and (55), (56) and (57), we have
$$(1+t)|\phi^l(t,\cdot)|_\infty + (1+t)^2|\nabla_x\phi^l(t,\cdot)|_\infty \le C\|f^l(t)\|_{\phi^{l-1}}.$$
Notice that from the charge conservation in (52) with $n = l-1$, we have $\partial_t\rho^l + \nabla_x\cdot j^l = 0$, where the current density is given by $j^l(t,x) = \int_{\mathbb{R}^3} v f^l(t,x,v)\,dv$. Hence we have $\Delta\partial_t\phi^l = -\nabla_x\cdot j^l$ and
$$\partial_t\phi^l(x) = \frac{1}{4\pi}\int_{\mathbb{R}^3}\frac{1}{|x-y|}\,\nabla_y\cdot j^l(y)\,dy.$$
From the proof of Lemma 9, and similar estimates for $j^l$ as for $\rho^l$ in (55), (56) and (57) (replacing $f^l$ by $vf^l$), we obtain
$$|\partial_t\phi^l(t,\cdot)|_\infty \le C\,|j^l|_1^{1/3}\,|j^l|_\infty^{2/3} \le C(1+t)^{-2}\|f^l(t)\|_{\phi^{l-1}}.$$
Applying estimates (55), (56) and (57) to both $\rho^l_x$ and $j^l_x$ (instead of $\rho^l$ and $j^l$), we have
$$|\nabla_x\rho^l|_1 + |\nabla_x j^l|_1 + (1+t)^3|\nabla_x\rho^l|_\infty + |\nabla_x j^l|_\infty \le C\|\nabla_x f^l\|_{\phi^{l-1}}.$$
Therefore from Lemma 9 and $\Delta\partial_t\phi^l = -\nabla_x\cdot j^l$ again,
$$|\nabla_x^2\phi^l(t,\cdot)|_\infty \le C(1+t)^{-5/2}\,|||f^l(t)|||_{\phi^{l-1}}, \qquad |\partial_t\nabla_x\phi^l(t,\cdot)|_\infty \le C(1+t)^{-2}\,|||f^l(t)|||_{\phi^{l-1}}. \tag{58}$$
Hence $\nabla_x\phi^l\in C(\mathbb{R}_+\times\mathbb{R}^3)$. Summarizing all the previous estimates, we conclude that there is a constant $C(\epsilon_0)$ such that
$$\sup_t\left\{(1+t)|\phi^l(t)|_\infty + (1+t)^2|\nabla_{t,x}\phi^l(t)|_\infty + (1+t)^{5/2}|\nabla_x^2\phi^l(t)|_\infty\right\} \le C(\epsilon_0)\,|||f^l|||_{\phi^{l-1}} \le C(\epsilon_0)C_0|f_0|_{\alpha,k} \le C(\epsilon_0)C_0\delta_1. \tag{59}$$
If we choose $\delta_1$ so small at the beginning that $C(\epsilon_0)C_0\delta_1\le\epsilon_0$, we obtain the desired (54) for $\phi^l$. Moreover, we can apply Theorem 2 to construct $f^{l+1}\in C^1$ so that $|||f^{l+1}|||_{\phi^l}\le C_0|f_0|_{\alpha,k}$. The claim thus follows.

In particular, the decay condition (14) is valid for all $\phi^n$. Since $|V_n(0)|^2 = |v|^2 + O(\epsilon_0)$ by (19), we therefore deduce from (54) and (58) that (since $\rho^n(t,\cdot)\in C^{0,\mu}(\mathbb{R}^3)$)
$$\|f^{n+1}(t)\|_{0,\alpha,0} + \|\nabla_{x,v}f^{n+1}(t)\|_{0,\alpha,0} + |\nabla_x^2\phi^n(t)|_{C^{0,\mu}(\mathbb{R}^3)} + |\partial_t\nabla_x\phi^n(t)|_\infty \le C(t). \tag{60}$$
Here $C(t)$ is bounded on bounded time intervals, and $0<\mu<1$. This implies that $Q(f^{n+1},f^{n+1})$ is bounded for finite time by Lemma 4 with $\psi\equiv 0$, $k = 0$. Therefore, for any $T>0$, $f^{n+1}$ is bounded in $W^{1,\infty}([0,T]\times\mathbb{R}^3\times\mathbb{R}^3)$ from (52). Let $f^n\to f\in W^{1,\infty}$ and $\phi^n\to\phi\in C^2$, up to a subsequence. Now we verify that $[f,\phi]$ is a classical solution to the full nonlinear problem. From (60), letting $n\to\infty$ in (53), we have $\Delta\phi = \rho = \int f\,dv$. Hence the Poisson equation is verified. For the Boltzmann equation, notice that from (60) and (43), for fixed $(t,x,v)$, the integrands in $Q(f^{n+1},f^{n+1})$ are uniformly bounded by $C\,B(|u|,\theta)\,e^{-\frac{\alpha}{2}|u|^2}$. Since $f^{n+1}\to f$ up to a subsequence, $Q(f^{n+1},f^{n+1})\to Q(f,f)$ by the Lebesgue dominated convergence theorem. Moreover, $\nabla\phi^n\cdot\nabla_v f^{n+1}\to\nabla\phi\cdot\nabla_v f$, because $\nabla\phi^n\to\nabla\phi$ up to a subsequence from (60). Passing $n\to\infty$ in (52), we have shown that $[f,\phi]$ is a solution to the full nonlinear problem. Furthermore, $f$ is a $W^{1,\infty}$ solution to (13) with the field $\nabla_x\phi$. Clearly, from (60), $\phi(t,x)\in C^2(\mathbb{R}^3)$ for any given $t$, and $\nabla_x\phi\in C(\mathbb{R}_+\times\mathbb{R}^3)$. Moreover, as $n\to\infty$, $\phi$ satisfies (14). Hence the limit $\phi(t,x)$ satisfies the condition in Theorem 2. Applying the uniqueness part of Theorem 2 to (13) with $\nabla_x\phi$, we see that $f$ is the classical solution and (11) is also valid. Notice that (12) follows from (59) as $n\to\infty$ with $C_1 = C(\epsilon_0)C_0$.

Finally we establish the uniqueness. Let $[g,\psi]$ be another solution to (1) and (2) which satisfies (11) and (12) for some constants $C_0^*$ and $C_1^*$. Let $[X(0),V(0)]$ and $[X_g(0),V_g(0)]$ be trajectories emanating from $(t,x,v)$ at $s = 0$ with fields $\nabla_x\phi$ and $\nabla_x\psi$ respectively. Integrating over these trajectories $\frac{dv}{ds} = \nabla_x\phi$ and $\frac{dv}{ds} = \nabla_x\psi$ respectively, we deduce that for $0\le t\le T_0(C_1,\delta_1,C_1^*,\delta_1^*)$,
$$|X_g(0)-X(0)| \le \tfrac12, \qquad |V_g(0)-V(0)| \le \tfrac12.$$
Therefore from $|||g|||_\psi < +\infty$, we can replace the $\psi$-norm by the $\phi$-norm to get
$$\sup_{0\le s\le T_0}\left\{\|g(s)\|_\phi + \|\nabla_v g(s)\|_\phi\right\} < +\infty.$$
Now $f-g$ satisfies:
$$[\partial_t + v\cdot\nabla_x + \nabla\phi\cdot\nabla_v](f-g) = (\nabla_x\psi-\nabla_x\phi)\cdot\nabla_v g + Q(f-g,f) + Q(g,f-g).$$
Integrating along the trajectory associated with $\nabla_x\phi$ yields
$$(f-g)(t,x,v) = \int_0^t \nabla_x[\psi-\phi](s,X(s))\cdot\nabla_v g(s,X(s),V(s))\,ds + N(f-g,f) + N(g,f-g).$$
From (55) and (57), we have $|\nabla_x[\phi-\psi](s)|_\infty \le C(T_0)\,\|(f-g)(s)\|_\phi$. From Lemma 4, we have for $t\le T_0$,
$$\|(f-g)(t)\|_\phi \le C\Big(1 + \sup_{0\le s\le T_0}\|\nabla_v g(s)\|_\phi + \sup_{0\le s\le T_0}\|g(s)\|_\phi\Big)\int_0^t\|(f-g)(s)\|_\phi\,ds.$$
We thus deduce $f(t)\equiv g(t)$ for $0\le t\le T_0$ from the Gronwall inequality. By repeating the same arguments for $[T_0,2T_0]$, $[2T_0,3T_0]$, $\dots$, we conclude the uniqueness.

Acknowledgement. The research is supported in part by NSF grant 9971306 and an A. P. Sloan Fellowship. The author thanks a referee for helpful comments.
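6. Appendix

The following two lemmas are from [BD]. The first one is based on Proposition 1 and Corollary 1, and the second one is based on Lemma 1 in [BD].

Lemma 8. There is an $\epsilon_0>0$ such that if $|\nabla_x^2\psi(t,\cdot)|_\infty \le \epsilon_0/(1+t)^{5/2}$, then for any $0\le s\le t$, the mapping $v\to X(s;t,x,v)$ is one to one. Furthermore,
$$\left|\frac{\partial X}{\partial v}(s;t,x,v) - (s-t)I\right| + \left|\frac{\partial V}{\partial v}(s;t,x,v) - I\right| \le C\epsilon_0(t-s),$$
$$\left|\frac{\partial X}{\partial x}(s;t,x,v) - I\right| + \left|\frac{\partial V}{\partial x}(s;t,x,v)\right| \le C\epsilon_0,$$
$$\left|\det\frac{\partial X}{\partial v}(s;t,x,v)\right| \ge \frac{(t-s)^3}{2}.$$

Lemma 9. Let $\rho(x)\in L^1(\mathbb{R}^3)\cap W^{1,\infty}(\mathbb{R}^3)$. Then if $\phi(x) = \rho * \frac{1}{|x|}$ we have
$$|\phi|_\infty \le C|\rho|_\infty^{1/3}|\rho|_1^{2/3}, \qquad |\nabla_x\phi|_\infty \le C|\rho|_\infty^{2/3}|\rho|_1^{1/3},$$
$$|\nabla_x^2\phi|_\infty \le C_\theta\,|\rho|_\infty^{3(1-\theta)/(3+\theta)}\,|\nabla_x\rho|_\infty^{3\theta/(3+\theta)}\,|\rho|_1^{\theta/(3+\theta)},$$
where the last constant $C_\theta$ depends on $0<\theta<1$.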
References

[BD] Bardos, C., Degond, P.: Global existence for the Vlasov–Poisson equation in three space variables with small initial data. AIHP, Anal. Nonlinear. 2, 101–108 (1985)
[DD] Desvillettes, L., Dolbeault, J.: On long time asymptotics of the Vlasov–Poisson–Boltzmann equation. Comm. PDE. 16, no. 2–3, 451–489 (1991)
[BT] Bellomo, N., Toscani, G.: On the Cauchy problem for the nonlinear Boltzmann equation: Global existence, uniqueness and asymptotic behavior. J. Math. Phys. 26, 334–338 (1985)
[C] Cercignani, C.: The Boltzmann equation and its applications. New York: Springer, 1988
[CIP] Cercignani, C., Illner, R., Pulvirenti, M.: The mathematical theory of dilute gases. New York: Springer-Verlag, 1994
[CGP] Chvala, F., Gustafsson, T., Pettersson, R.: On solutions to the linear Boltzmann equation with external electromagnetic force. SIAM J. Math. Anal. 24, 583–602 (1993)
[G] Glassey, R.: The Cauchy Problem in Kinetic Theory. SIAM, 1996
[GS] Glassey, R., Strauss, W.: Absence of shocks in an initially dilute collisionless plasma. Commun. Math. Phys. 113, 191–208 (1987)
[H] Hamdache, K.: Existence in the large and asymptotic behavior for the Boltzmann equation. Japan J. Appl. Math. 2, 1–15 (1985)
[IS] Illner, R., Shinbrot, M.: The Boltzmann equation: Global existence for a rare gas in an infinite vacuum. Commun. Math. Phys. 95, 217–226 (1984)
[L] Lions, P.-L.: On the kinetic equations. In: Proceedings of ICM, Vol. I and II, (Kyoto, 1990), 1173–1185
[P] Polewczak, J.: Classical solution of the nonlinear Boltzmann equation in all R3: Asymptotic behavior of the solutions. J. Stat. Phys. 50, 611–632 (1988)
[T] Toscani, G.: On the non-linear Boltzmann equation in unbounded domains. Arch. Rational Mech. Anal. 95, 37–49 (1986)
[U1] Ukai, S.: On the existence of global solutions of a mixed problem for the nonlinear Boltzmann equation. Proc. Japan Acad. A 53, 179–184 (1974)
[U2] Ukai, S.: Solutions of Boltzmann Equation. In: Patterns and Waves – Qualitative Analysis of Nonlinear Differential Equations, Studies in Mathematics and Its Applications, 18, 1986, pp. 37–96

Communicated by P. Constantin
Commun. Math. Phys. 218, 315 – 331 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Around Polygons in $\mathbb{R}^3$ and $S^3$

John J. Millson¹, Jonathan A. Poritz²
1 University of Maryland, College Park, MD 20742, USA. E-mail: [email protected]
2 IBM Zurich Research Laboratory, 8803 Rüschlikon, Switzerland. E-mail: [email protected]
Received: 20 September 1999 / Accepted: 28 November 2000
Abstract: We survey certain moduli spaces in low dimensions and some of the geometric structures that they carry, and then construct identifications among all of these spaces. In particular, we identify the moduli spaces of polygons in $\mathbb{R}^3$ and $S^3$, the moduli space of restricted representations of the fundamental group of a punctured 2-sphere, the moduli space of flat connections on a punctured sphere, the moduli space of parabolic bundles on a sphere, the moduli space of weighted points on $\mathbb{CP}^1$ and the symplectic quotient of $SO(3)$ acting diagonally on $(S^2)^n$. All of these spaces depend on parameters and some of the above identifications require the parameters to be small. One consequence of this work is that these spaces are all biholomorphic with respect to the most natural complex structures they can each be given.
1. Introduction

In this paper we shall describe a coincidence that occurs among a number of moduli spaces of geometric objects in two, three, and infinite dimensions. These spaces arise in a series of very simple but apparently quite unrelated problems, and themselves carry a variety of geometric structures. Despite their disparate origins, we shall exhibit explicit maps identifying all of the spaces and hence show that they each share all of the geometric structures of their siblings. Let us here at least name the main spaces which we shall go on to identify and give a diagram displaying the maps we shall construct among them. For a vector $s = (s_1,\dots,s_n)$ of positive real numbers with $\sum_{j=1}^n s_j < 1$, we shall consider the following spaces, whose precise definitions and technical details shall be discussed in the following sections:

Research partially supported by NSF grants DMS-9205154 (Millson) and DMS-9403784 (Poritz).
• the space $P_s^{\mathbb{R}^3}$ of configurations of a polygon with side lengths $s$ in Euclidean 3-space; and the corresponding moduli space $P_s^{\mathbb{R}^3}/E^+(3)$ under the action of the group of orientation-preserving Euclidean motions;
• the symplectic manifold $(S^2)^n_s$ consisting of the product of $n$ spheres, where the $j$th has the usual symplectic form scaled by $s_j$; and its reduction $(S^2)^n_s/\!/SO(3)$ by the action of $SO(3)$;
• the configuration space $W_s^{ss}$ of $n$ points on $\mathbb{CP}^1$, semi-stable with respect to the weights $s$; and the corresponding geometric invariant theory quotient $W_s^{ss}/\!\sim$;
• $V_s^{ss}$, the space of semi-stable rank two parabolic vector bundles on $\mathbb{CP}^1$ of degree zero, parabolic degree zero and having parabolic weights $-s_j$ and $s_j$ at the $j$th parabolic point; and the moduli space $V_s^{ss}/\!\sim$;
• the space $C^{ss}_{s,\delta}$ of semi-stable, $L^2_{1,\delta}$ complex structures in a trivial rank two Hermitian vector bundle over the $n$-punctured sphere which near the $j$th puncture are asymptotic to $\bar\partial + \frac{d\bar z}{2\bar z}\otimes\begin{pmatrix} -s_j/2 & 0\\ 0 & s_j/2\end{pmatrix}$; and the moduli space $C^{ss}_{s,\delta}/\!\sim$;
• the set $F_{s,\delta}$ of flat unitary connections in $C_{s,\delta}$; and the quotient $F_{s,\delta}/G_{T,\delta}$ by the group of special unitary $L^2_{2,\delta}$ gauge transformations asymptotic to elements of the maximal torus $T\subset SU(2)$ at each puncture;
• the set $R_s$ of those representations of the fundamental group of the $n$-punctured sphere into $SU(2)$ which take the loop around the $j$th puncture to elements of $SU(2)$ with trace $2\cos(\pi s_j)$; and the corresponding moduli space $R_s/SU(2)$ under the conjugation action of $SU(2)$;
• the space $P^{S^3}_{\pi s}$ of configurations of a geodesic polygon with side lengths $\pi s$ in the sphere $S^3$; and the corresponding moduli space $P^{S^3}_{\pi s}/SO(4)$ under the action of the isometry group of the sphere.
Since there are so many configuration and moduli spaces appearing in this paper, we have attempted to use a somewhat suggestive notation: each space has as representative symbol the first letter of a word that describes the objects in question. Hence we use $P$ for polygons, $W$ for weighted points, $V$ for vector bundles, $C$ for both complex structures and connections, $F$ for flat connections and $R$ for representations. With this notation understood, it now makes sense to say that our main purpose here is to fill in the arrows in the following diagram:
$$\begin{array}{ccccccc}
P_s^{\mathbb{R}^3}/E^+(3) & \longrightarrow & (S^2)^n_s/\!/SO(3) & \longrightarrow & W_s^{ss}/\!\sim & \longrightarrow & V_s^{ss}/\!\sim \\[4pt]
P^{S^3}_{\pi s}/SO(4) & \longrightarrow & R_s/SU(2) & \longleftarrow & F_{s,\delta}/G_{T,\delta} & \longleftarrow & C^{ss}_{s,\delta}/\!\sim
\end{array}$$
In fact, the remaining sections of this paper are nothing other than a grand tour of the above diagram; sections alternate, where appropriate, between describing spaces and some of their properties and constructing maps between these spaces. Note that in the above diagram, Wsss /∼ and Vsss /∼ are naturally complex spaces and, as we shall show below, the map connecting them is a biholomorphism with respect to these structures. It shall thus follow that the above diagram consists of complex isomorphisms when all of the spaces are endowed with any of the available holomorphic structures. We should mention that this is by no means the first paper to discuss these spaces, their structures or the existence of maps between some of them; our purpose is in fact to
bring together this material in one place and to fill in a number of the missing connections between spaces and structures. In particular, Sects. 3–6 describe results of Kapovich and Millson [12], Sects. 7, 8 and part of Sect. 9 are entirely new to this paper, the rest of Sect. 9 and Sect. 10 are closely related to the work of Poritz in [16], while Sects. 11 and 12 are based on a special case of Kapovich and Millson's [11]; work of other authors on these topics is also mentioned in the references of [12], [11] and [16]. A concluding Sect. 13 summarizes what we have done, in a more accurate and complete diagram than the above, and mentions two further, more recent, related works. The first author would like to thank Aaron Bertram for useful conversations.

2. Configuration Spaces of Polygons in $\mathbb{R}^3$: $P_s^{\mathbb{R}^3}/E^+(3)$

Let $n\ge 3$ be an integer and $s = (s_1,\dots,s_n)$ an $n$-tuple of positive real numbers. Then

Definition 2.1. The configuration space $P_s^{\mathbb{R}^3}$ of polygons in $\mathbb{R}^3$ with fixed side lengths $s$ is the set of all $n$-tuples $(e_1,\dots,e_n)$ of directed line segments such that the length of $e_j$ is $s_j$ and the endpoint of $e_j$ is the beginning point of $e_{j+1\ (\mathrm{mod}\ n)}$. The moduli space of polygons is then simply $P_s^{\mathbb{R}^3}/E^+(3)$, where $E^+(3)$ acts diagonally on the $n$-tuples of line segments.

Note that it might actually be more precise to call these "labeled polygons", since we are keeping track of which line segment is first, which is second, etc. Hence two polygons will be identified by an element $g\in E^+(3)$ if and only if $g$ moves one polygon exactly onto the other, maintaining the numbering of sides. Of course, for generic choices of side lengths $s$, the only identification possible between elements of $P_s^{\mathbb{R}^3}$ would have to match corresponding sides. Let us remark that the vector $s$ of side-lengths of our polygons can be scaled by any positive real number and it will have an obvious effect on the configuration and moduli spaces. In fact, it shall be necessary in Sect. 8, below, to assume that $\sum_{j=1}^n s_j < 1$, but for the time being we need make no such restriction.

There are a number of equivalent ways to give $P_s^{\mathbb{R}^3}$ the structure of a smooth manifold. One very direct method goes as follows: each segment $e_j$ is defined by a pair of points $(e_j^{\mathrm{begin}}, e_j^{\mathrm{end}})\in S_j$, where $S_j = \{(p,q)\mid d(p,q) = s_j\}\subset\mathbb{R}^3\times\mathbb{R}^3$. Each of these $S_j$'s is a smooth manifold (in fact, a trivial $S^2$ bundle over either $\mathbb{R}^3$ factor) and $P_s^{\mathbb{R}^3}$ can be identified with the set $f^{-1}((0,\dots,0))$, where
$$f : S_1\times\cdots\times S_n\to(\mathbb{R}^3)^n : ((p_1,q_1),\dots,(p_n,q_n))\mapsto(p_2-q_1,\dots,p_1-q_n).$$
The zero set of $f$ will be smooth if we avoid certain polygons:

Definition 2.2. A polygon is said to be degenerate if it lies entirely in some line in $\mathbb{R}^3$. The set of non-degenerate polygons shall be denoted $\widetilde{P}_s^{\mathbb{R}^3}$.

Now an application of the inverse function theorem with $f$ shows that $\widetilde{P}_s^{\mathbb{R}^3}$ is a smooth submanifold of $S_1\times\cdots\times S_n$. An additional feature of the space of non-degenerate polygons is that $E^+(3)$ acts freely there, while the degenerate polygons all have stabilizer isomorphic to $SO(2)$. This means that $\widetilde{P}_s^{\mathbb{R}^3}/E^+(3)$ is a smooth manifold, while on all of $P_s^{\mathbb{R}^3}/E^+(3)$ we can at least use the quotient topology. Note, however, that for generic side lengths $s$, there are no degenerate polygons whatsoever in $P_s^{\mathbb{R}^3}$, while even if there are some in a $P_s^{\mathbb{R}^3}$, they form a closed, $E^+(3)$-invariant set of high codimension.
When a compact group acts freely on a manifold it is particularly easy to give the quotient space a manifold structure. This motivates passing to

Definition 2.3. The set of based polygons is $P_{s,0}^{\mathbb{R}^3} = \{(e_1,\dots,e_n)\in P_s^{\mathbb{R}^3}\mid e_1^{\mathrm{begin}} = 0\in\mathbb{R}^3\}$.

Then $P_s^{\mathbb{R}^3}/E^+(3)$ and $P_{s,0}^{\mathbb{R}^3}/SO(3)$ are homeomorphic, as are $\widetilde{P}_s^{\mathbb{R}^3}/E^+(3)$ and $\widetilde{P}_{s,0}^{\mathbb{R}^3}/SO(3)$, and this last space is a smooth manifold.

3. The First Gauss Map
Let us begin transforming the configuration space of polygons in $\mathbb{R}^3$ and its moduli space. The essential idea here is to build a sort of discrete Gauss map, that is, to assign to each edge of a polygon the unit vector in that direction. So consider the map
$$G_1 : P_{s,0}^{\mathbb{R}^3}\to(S^2)^n : (\dots,e_j,\dots)\mapsto\left(\dots,\frac{e_j^{\mathrm{end}}-e_j^{\mathrm{begin}}}{s_j},\dots\right).$$
To invert $G_1$, we start with some $(u_1,\dots,u_n)\in(S^2)^n$ and build edges by setting $e_1^{\mathrm{begin}} = 0$ and then inductively $e_j^{\mathrm{end}} = e_j^{\mathrm{begin}} + s_ju_j$ and $e_{j+1}^{\mathrm{begin}} = e_j^{\mathrm{end}}$. This will give us a well-defined closed polygon in $P_{s,0}^{\mathbb{R}^3}$ as long as the end of the last edge is the beginning of the first, i.e., if
$$e_1^{\mathrm{begin}} = 0 = e_n^{\mathrm{end}} = e_n^{\mathrm{begin}} + s_nu_n = e_{n-1}^{\mathrm{end}} + s_nu_n = \cdots = e_1^{\mathrm{begin}} + s_1u_1+\cdots+s_nu_n = s_1u_1+\cdots+s_nu_n.$$
Hence if we define $\mu_s : (S^2)^n\to\mathbb{R}^3 : (u_1,\dots,u_n)\mapsto s_1u_1+\cdots+s_nu_n$, then $G_1$ is an $SO(3)$-equivariant homeomorphism of $P_{s,0}^{\mathbb{R}^3}$ with $\mu_s^{-1}(0)$ and consequently also induces a diffeomorphism $G_1$ of $\widetilde{P}_{s,0}^{\mathbb{R}^3}/SO(3)$ with $\widetilde{\mu}_s^{-1}(0)/SO(3)$, where $\widetilde{\mu}_s^{-1}(0) = G_1(\widetilde{P}_{s,0}^{\mathbb{R}^3})$.
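The construction above can be made concrete in a few lines. The following is a minimal sketch under illustrative assumptions: a based polygon is stored as the list of its vertices $e_1^{\mathrm{begin}},\dots,e_n^{\mathrm{begin}},e_n^{\mathrm{end}}$, and the function names `moment_map`, `build_polygon` and `gauss_map` are hypothetical labels for $\mu_s$, the inverse of $G_1$, and $G_1$ itself. The sample data (three unit vectors at angles $0$, $2\pi/3$, $4\pi/3$ in a plane, with equal side lengths) is chosen only so that the closing condition $\mu_s(u) = 0$ holds.

```python
import math

def moment_map(u, s):
    """mu_s(u) = s_1 u_1 + ... + s_n u_n, as a 3-tuple."""
    return tuple(sum(sj * uj[i] for sj, uj in zip(s, u)) for i in range(3))

def build_polygon(u, s):
    """Vertices of the based polygon with edge directions u and side lengths s.

    Returns n + 1 points starting at the origin; the polygon closes up
    exactly when the last point equals the first, i.e. when mu_s(u) = 0.
    """
    points = [(0.0, 0.0, 0.0)]
    for sj, uj in zip(s, u):
        last = points[-1]
        points.append(tuple(last[i] + sj * uj[i] for i in range(3)))
    return points

def gauss_map(points, s):
    """Discrete Gauss map: the unit vector along each edge."""
    return [
        tuple((points[j + 1][i] - points[j][i]) / s[j] for i in range(3))
        for j in range(len(s))
    ]

# Illustrative data: an equilateral triangle with side length 0.2.
s = [0.2, 0.2, 0.2]
u = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3), 0.0)
     for k in range(3)]
polygon = build_polygon(u, s)
```

For this data `moment_map(u, s)` vanishes up to rounding, the last vertex of `polygon` returns to the origin, and `gauss_map(polygon, s)` recovers `u`.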
4. Symplectic Quotients of Products of Spheres: $(S^2)^n_s/\!/SO(3)$

Our choice of the letter $\mu_s$ for the map defined above was not an accident: it is nothing other than the moment map for the diagonal action of $SO(3)$ on $(S^2)^n$, where the latter is given the symplectic structure $s_1\pi_1^*(\mathrm{vol}) + \cdots + s_n\pi_n^*(\mathrm{vol})$. Here the $\pi_j$ are the various projections onto the factors, vol is the standard volume form on the sphere (of total volume $4\pi$) and the target $\mathbb{R}^3$ is to be thought of as the dual, via the usual inner product, of $(\mathbb{R}^3,\times)\cong(\mathfrak{so}(3),[\ ,\ ])$, where $\times$ is the cross-product.

Definition 4.1. We write $(S^2)^n_s$ for $(S^2)^n$ with the above symplectic structure, and set $\widetilde{(S^2)^n_s} = \{(u_1,\dots,u_n)\in(S^2)^n_s\mid\text{not all } u_i = \pm u_j\}$.

It now follows that the map $G_1$ of the last section is in fact a homeomorphism of $P_{s,0}^{\mathbb{R}^3}/SO(3)$ with the Marsden–Weinstein symplectic reduction $(S^2)^n_s/\!/SO(3) = \mu_s^{-1}(0)/SO(3)$. Similarly, $\widetilde{\mu}_s^{-1}(0) = \mu_s^{-1}(0)\cap\widetilde{(S^2)^n_s}$ and $G_1$ is a diffeomorphism of the smooth manifold $\widetilde{P}_{s,0}^{\mathbb{R}^3}/SO(3)$ with the symplectic manifold $\widetilde{(S^2)^n_s}/\!/SO(3)$. Furthermore, as the smooth symplectic reduction of a Kähler manifold, $\widetilde{(S^2)^n_s}/\!/SO(3)$ is itself Kähler, [7]. Kapovich and Millson have in fact shown that $(S^2)^n_s/\!/SO(3)$ can be given a $\mathbb{C}$-analytic structure even near its singular stratum, see [12] for details.

5. The First Kempf–Ness-Type Theorem
We can also think of the symplectic manifold $S^2$ as the complex algebraic variety $\mathbb{CP}^1$, and the action of $SO(3)$ on $S^2$ extends to the algebraic action of the group $PSL(2,\mathbb{C})$ of biholomorphisms of $\mathbb{CP}^1$. Certainly then the inclusions $SO(3)\hookrightarrow PSL(2,\mathbb{C})$ and $\mu_s^{-1}(0)\hookrightarrow(\mathbb{CP}^1)^n$ induce a map
$$\kappa_1 : (S^2)^n_s/\!/SO(3)\to(\mathbb{CP}^1)^n/PSL(2,\mathbb{C}).$$
In situations such as the current one, it often turns out that this map is a bijection onto the quotient of a non-empty Zariski open set in the target; this is the original theorem of Kempf and Ness [13], much elaborated by Kirwan, [14]. In fact, Kirwan's result applies to our spaces, but we shall instead give the proof of [12] which can be interpreted in our present context much more directly and concretely. For this it is instructive to think of the points of $(S^2)^n_s$ as giving purely atomic measures on the sphere, where $(u_1,\dots,u_n)\in(S^2)^n_s$ corresponds to the measure $\sum_{j=1}^n s_j\delta_{u_j}$. We have the standard

Definition 5.1. For any measure $\nu$ on $S^2$, the center of mass of $\nu$ is
$$B(\nu) = \int_{S^2} x\,d\nu(x),$$
where $x\in S^2\hookrightarrow\mathbb{R}^3$ and the integral is of this vector-valued function.

Note $B(\nu)$ is always in the closed ball $B(0,\nu(S^2))\subset\mathbb{R}^3$ and only on $\partial B(0,\nu(S^2))$ if $\nu$ is concentrated at a point. There is another center of mass $C(\nu)$, whose definition is in the work [6] of Douady and Earle, which is called the conformal center of mass and satisfies:
• $C(g_*\nu) = g(C(\nu))$ for any $g\in PSL(2,\mathbb{C})$, and
• $C(\nu) = 0$ if and only if $B(\nu) = 0$,
but which is only defined for stable measures $\nu$, where

Definition 5.2. A measure $\nu$ on $S^2$ is said to be stable (respectively, semi-stable) if the mass of any atom is less than (respectively, less than or equal to) $\frac12\nu(S^2)$.

This is exactly the tool we need. The measures corresponding to $\widetilde{\mu}_s^{-1}(0)$ are stable and have center of mass at 0. If an element $g\in PSL(2,\mathbb{C})$ leaves the center of mass of a stable measure at 0, then it also fixes the conformal center of mass at 0 and hence must lie in the stabilizer $SO(3)$ of 0 in $PSL(2,\mathbb{C})$. Conversely, by the transitivity of $PSL(2,\mathbb{C})$ on $B(0,1)$, any stable purely atomic measure can be moved until its conformal center of mass, hence also its normal center of mass, is at 0. Thus $\kappa_1$ is a bijection of $\widetilde{(S^2)^n_s}/\!/SO(3)$ with the part of $(\mathbb{CP}^1)^n/PSL(2,\mathbb{C})$ corresponding to orbits of stable measures.
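The stability condition of Definition 5.2 is easy to test for a purely atomic measure $\sum_j s_j\delta_{u_j}$. The sketch below is illustrative only: the function names are hypothetical, points are compared by exact equality (so coincident points must be given as identical tuples), and the sample weights are chosen to be exactly representable in floating point. The one step that is easy to overlook is that weights of coincident points must be added before comparing with half the total mass.

```python
from collections import defaultdict

def atom_masses(points, weights):
    """Total mass carried by each distinct point of the measure sum_j s_j delta_{u_j}."""
    masses = defaultdict(float)
    for p, w in zip(points, weights):
        masses[tuple(p)] += w
    return dict(masses)

def is_stable(points, weights):
    """Every atom has mass strictly less than half the total mass."""
    total = sum(weights)
    return all(m < total / 2 for m in atom_masses(points, weights).values())

def is_semistable(points, weights):
    """Every atom has mass at most half the total mass."""
    total = sum(weights)
    return all(m <= total / 2 for m in atom_masses(points, weights).values())

def center_of_mass(points, weights):
    """B(nu) = sum_j s_j u_j for the purely atomic measure nu."""
    return tuple(sum(w * p[i] for p, w in zip(points, weights)) for i in range(3))

# Illustrative configurations on the unit sphere.
north, south, east = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0)
weights = [0.25, 0.25, 0.5]
```

With these weights, three distinct points give a semi-stable but not stable measure (one atom carries exactly half the mass), while letting the first and third points coincide produces an atom of mass $0.75$, which is not even semi-stable.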
6. Weighted Quotients of Points on CP1 : Wsss /∼ It turns out that this image of κ1 is exactly the stable part of the weighted quotient by P SL(2, C) of the configuration space of n points on CP1 studied by Deligne and Mostow in [4]. One can motivate this construction as follows: Let us for a moment write W0 ⊂ (CP1 )n for the set of n-tuples of distinct points in CP1 . Then W0 is a quasiprojective C-algebraic variety and is closed under the free, diagonal action of P SL(2, C); the quotient W0 /P SL(2, C) is a smooth quasi-projective variety. The complication comes when we try to complete W0 by putting back in some of the P SL(2, C)-orbits in (CP1 )n W0 before taking the quotient. Deligne and Mostow [4] give a geometric invariant theory approach to this problem, depending on a choice of a vector s of positive real numbers. Using the terminology we have developed above, we can restate their definitions as follows: Definition 6.1. Let the subsets Wss and Wsss of stable and semi-stable weighted points n be those points (u , . . . , u ) ∈ W = (CP1 )n such that the corresponding in (CP1 ) 1 n cusp measure nj=1 sj δuj is stable and semi-stable, respectively; also write Ws = Wsss s Ws . There is the usual geometric invariant theory equivalence relation ∼ on Wsss which cusp on Wss is the relation given by the P SL(2, C)-orbits and on Ws is extended orbit equivalence, i.e., two points are equivalent if and only if their orbit closures intersect. Then Wss /∼ = Wss /P SL(2, C) can be given the structure of a smooth, quasi-projective C-algebraic variety and Wsss /∼ that of a projective variety; if all of the sj are rational, these are simply the geometric invariant theory quotients à la Mumford. One pleasant feature of the geometric invariant theory in this application is the existence of some particularly useful semi-stable points: Definition 6.2. The nice semi-stable points Wsnss ⊂ Wsss are those whose P SL(2, C)orbit is closed in Wsss . 
Considering the action of $PSL(2,\mathbb{C})$ on $\mathbb{CP}^1$, we see that the points of $W_s^{nss} \cap W_s^{cusp}$ correspond to measures with exactly two atoms, each having half the total mass. What makes these so nice is the fact that inclusion induces a bijection $W_s^{ss}/\sim \; = W_s^{nss}/PSL(2,\mathbb{C})$. Under $\kappa_1^{-1}$, $(W_s^{nss} \cap W_s^{cusp})/PSL(2,\mathbb{C})$ corresponds exactly to $(S^2)^n_s//SO(3) \setminus \widetilde{(S^2)^n_s}//SO(3)$, i.e., to $G_1$ of the degenerate polygons. Thus $\kappa_1$ is a homeomorphism of $(S^2)^n_s//SO(3)$ with the projective $\mathbb{C}$-algebraic variety $W_s^{ss}/\sim$ which identifies the corresponding smooth and singular parts. It is in fact easy to check that $\kappa_1$ is smooth, while in [12] it is also shown that their $\mathbb{C}$-analytic structure on $(S^2)^n_s//SO(3)$ near the singular set is mapped analytically by $\kappa_1$ to the complex structure near $W_s^{cusp}/\sim$.
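The stability conditions of Definitions 5.2 and 6.1 are purely combinatorial, so they can be illustrated with a small sketch. The function names and the use of string labels for points of $\mathbb{CP}^1$ are assumptions made here for illustration; only coincidences among the points matter for stability, so any hashable label will do.

```python
from collections import defaultdict
from fractions import Fraction


def atom_masses(weights, points):
    """Mass carried by each distinct point of the measure sum_j s_j * delta_{u_j}."""
    masses = defaultdict(Fraction)
    for s, u in zip(weights, points):
        masses[u] += Fraction(s)
    return dict(masses)


def classify(weights, points):
    """Classify a weighted configuration as in Definitions 5.2, 6.1 and 6.2.

    'stable'         : every atom has mass <  half the total
    'cusp, nice'     : exactly two atoms, each of half the total mass
    'cusp, not nice' : semi-stable, not stable, and not of the nice form
    'unstable'       : some atom has mass >  half the total
    """
    masses = atom_masses(weights, points)
    half = sum(masses.values()) / 2
    heaviest = max(masses.values())
    if heaviest < half:
        return "stable"
    if heaviest > half:
        return "unstable"
    # heaviest == half: semi-stable but not stable
    return "cusp, nice" if len(masses) == 2 else "cusp, not nice"


# Hypothetical configurations; the labels "a", "b", "c" stand for points of CP^1.
print(classify([1, 1, 1], ["a", "b", "c"]))        # three distinct points
print(classify([1, 1, 2], ["a", "a", "b"]))        # two atoms of equal mass
print(classify([1, 1, 1, 1], ["a", "a", "b", "c"]))
print(classify([1, 1, 1], ["a", "a", "b"]))
```

Exact rational arithmetic is used so that the borderline case, an atom carrying exactly half the total mass, is detected reliably.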
7. The Passage to Vector Bundles

Fix now a trivial holomorphic vector bundle $E$ of rank two over $\mathbb{CP}^1$ (not the same $\mathbb{CP}^1$ as in previous sections) and $n+1$ points $p_1, \dots, p_n, q \in \mathbb{CP}^1$. Since $E$ is trivial there is a canonical identification of each of the fibers $E|_{p_j}$ with the fiber $E|_q$ over $q$ and, choosing an isomorphism $\mathbb{C}^2 \cong E|_q$, we will thus have an identification of the points of $\mathbb{CP}^1$ (the $\mathbb{CP}^1$ of previous sections, this time) with the set of lines in $E|_q$, and hence
Around Polygons in R3 and S 3
with the set of lines in each $E|_{p_j}$. We shall write $v \leftrightarrow L$ if this identification matches a $v \in \mathbb{CP}^1$ with the line $L \subset E|_{p_j}$.

With the above choices and identifications in place and still using our $n$-vector $s$ of real numbers, we can define for every $(v_1, \dots, v_n) \in (\mathbb{CP}^1)^n$ some additional structure on $E$ as follows. For each $p_j$, we define a filtration $E|_{p_j} = E_j^1 \supsetneq E_j^2 \supsetneq 0$, where $v_j \leftrightarrow E_j^2$ is the only interesting filtration step; we also attach to $E_j^1$ the number $-s_j$ and to $E_j^2$ the number $s_j$ and call these the weights of these steps of the filtration.

Suppose we now consider a holomorphic automorphism $g$ of $E$ of trivial determinant. Since $E$ is trivial over a compact base, $g$ is constant with respect to any holomorphic trivialization. Letting $h \in SL(2,\mathbb{C})$ be the matrix of $g|_q$ with respect to our identification $\mathbb{C}^2 \cong E|_q$, we see that the action of $g$ on $E$ takes the filtrations $E_j^k$ coming from $(v_1, \dots, v_n) \in (\mathbb{CP}^1)^n$ to those coming from $(hv_1, \dots, hv_n)$. Thus our map from $(\mathbb{CP}^1)^n$ to trivial vector bundles with filtration and weight data sends orbits under $PSL(2,\mathbb{C})$ to orbits under the group of holomorphic automorphisms of $E$ which are trivial on $\det E$.

Let us check what sort of data arises on $E$ when we start with a point in $W_s^s$ or $W_s^{ss}$. So note that the mass of an atom at some point $v \in \mathbb{CP}^1$ of the measure corresponding to a $(v_1, \dots, v_n) \in (\mathbb{CP}^1)^n$ is
$$\sum_{j \,\mid\, v = v_j} s_j \;=\; \sum_{j \,\mid\, v \leftrightarrow E_j^2} s_j \;=\; \sum_{j \,\mid\, L_v|_{p_j} = E_j^2} s_j, \tag{7.1}$$
where $L_v$ is the unique line subbundle of $E$ with $L_v|_q = v$ that is constant with respect to a holomorphic trivialization of $E$, i.e., the line subbundle of degree zero whose fiber at $q$ is $v$. It is then appropriate to make the

Definition 7.1. Given a bundle $E$ with filtration and weight data as above we set for any line subbundle $L$ of $E$ the mass of $L$ to be the number
$$\sum_{j \,\mid\, L|_{p_j} = E_j^2} s_j.$$
We can now say that the filtration and weight data on $E$ that results from a point of $W_s^s$ (respectively, $W_s^{ss}$) is that data such that for all holomorphic line subbundles $L \subset E$ of degree zero the mass of $L$ is less than (respectively, less than or equal to) $\frac{1}{2}\sum_{j=1}^n s_j$.

8. Moduli of Parabolic Vector Bundles: $V_s^{ss}/\sim$

The decorated vector bundles that appeared in the last section have been studied before.

Definition 8.1. Let $F$ be a holomorphic vector bundle over a compact Riemann surface $\Sigma$ with $n$ distinguished points $p_1, \dots, p_n \in \Sigma$. A parabolic structure on $F$ is the choice for each $j$ of
• a decreasing flag $F|_{p_j} = F_j^1 \supsetneq \cdots \supsetneq F_j^{k_j} \supsetneq F_j^{k_j+1} = 0$ in the fiber over $p_j$; and
• an increasing sequence $\alpha_1^j < \cdots < \alpha_{k_j}^j$ of real numbers, called weights, one for each step of the filtration at $p_j$.
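The weight $\alpha_\ell^j$ is said to occur with multiplicity $m_\ell^j = \dim F_j^\ell / F_j^{\ell+1}$ and the parabolic degree and parabolic slope of $(F, \alpha)$ are
$$\operatorname{par-deg}(F, \alpha) = \deg F + \sum_{j=1}^{n} \sum_{\ell=1}^{k_j} m_\ell^j \alpha_\ell^j \qquad\text{and}\qquad \mu_{par}(F, \alpha) = \frac{\operatorname{par-deg}(F, \alpha)}{\operatorname{rank} F}.$$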
An isomorphism of parabolic vector bundles is an isomorphism of holomorphic bundles which takes one parabolic structure exactly to another. This definition was first given by Mehta and Seshadri in [15], but with the additional requirement that all of the weights lie in $[0, 1)$; the version here follows the constructions in [16], but it is easy to see how to generalize essentially all known results on parabolic vector bundles, mutatis mutandis, to this slightly larger context.

Given a holomorphic subbundle $G$ of a parabolic vector bundle $(F, \alpha)$, there is a natural way to induce a parabolic structure on $G$. For each point $p_j$, we let the filtration of $G|_{p_j}$ be the intersection of $G|_{p_j}$ with the filtration of $F|_{p_j}$, with repetitions removed, and at the $\ell$th filtration step we assign the weight $\beta_\ell^j = \alpha_{\ell'}^j$, where $\ell'$ is the largest index such that $G_j^\ell \subset F_j^{\ell'}$. By convention, when we speak of a parabolic subbundle, it shall be assumed to have arisen in this way. There is a good geometric invariant theory moduli space for vector bundles with parabolic structure, and it relies on the following

Definition 8.2. A parabolic vector bundle $(F, \alpha)$ is stable (respectively, semi-stable) if $\mu_{par}(G, \beta) < \mu_{par}(F, \alpha)$ (respectively, $\mu_{par}(G, \beta) \le \mu_{par}(F, \alpha)$) for all proper parabolic subbundles $(G, \beta)$ of $(F, \alpha)$.

We can now concentrate on the sets of vector bundles which are relevant to our present study. So fix $n$ points $p_1, \dots, p_n$ on $\Sigma = \mathbb{CP}^1$ and our usual vector $s$ of positive real numbers.

Definition 8.3. Denote by $V_s$ the set of parabolic vector bundles of rank two, trivial determinant and such that at each $p_j$ the filtration has two steps with weights $-s_j$ and $s_j$. Write also $V_s^s$ and $V_s^{ss}$ for the stable and semi-stable parts of $V_s$, respectively, and $V_s^{cusp}$ for $V_s^{ss} \setminus V_s^s$. We adopt the convention that isomorphisms of elements of $V_s$ must have trivial determinant and denote the resulting equivalence relation "$\mathrm{Iso}_0$".

Note that the trivial determinants and weights of opposite signs mean that the normal and parabolic degrees of bundles in $V_s$ are both zero. Also, our notation $V_s^{cusp}$ is non-standard; these are usually called properly semi-stable bundles.

Let us look at the parabolic degrees of line subbundles of elements of $V_s$. So say $L$ is a proper holomorphic subbundle of the vector bundle $F$ at some point of $V_s$. Then the induced filtration on $L|_{p_j}$ has only one step, but the weight assigned to that step
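depends on the relation of $L|_{p_j}$ to $F_j^2$: it is $s_j$ if $L|_{p_j} = F_j^2$ and $-s_j$ otherwise. Hence
$$\operatorname{par-deg} L \;=\; \deg L + \sum_{j \,\mid\, L|_{p_j} = F_j^2} s_j \;-\; \sum_{\text{the rest}} s_j \;=\; \deg L + 2 \sum_{j \,\mid\, L|_{p_j} = F_j^2} s_j \;-\; \sum_{j=1}^{n} s_j. \tag{8.1}$$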
It follows that if $F \in V_s^{ss}$ and $L$ is any proper holomorphic subbundle, then
$$\deg L \;=\; \operatorname{par-deg} L + \sum_{j=1}^{n} s_j - 2 \sum_{j \,\mid\, L|_{p_j} = F_j^2} s_j \;\le\; \sum_{j=1}^{n} s_j.$$
Hence if the vector $s$ is chosen so that this last sum is less than 1 then the degree of $L$, being an integer, will have to be non-positive, and so $F$ will in fact be semi-stable as a plain holomorphic bundle. Since all holomorphic bundles over $\mathbb{CP}^1$ of rank two and degree zero are of the form $\mathcal{O}(k) \oplus \mathcal{O}(-k)$, such a parabolic-semi-stable $F$ must be holomorphically trivial. We shall thus assume for the remainder of this paper that
$$\sum_{j=1}^{n} s_j < 1. \tag{8.2}$$
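Formula (8.1) and assumption (8.2) lend themselves to a quick arithmetic check. The following is a minimal sketch, not part of the original argument: the function names, the 0-based indexing of the marked points, and the sample weight vector are all assumptions chosen for illustration.

```python
from fractions import Fraction


def mass(s, hits):
    """Mass of a line subbundle L: the sum of s_j over the marked points
    (0-based indices in `hits`) at which L|_{p_j} equals the flag step F_j^2."""
    return sum((s[j] for j in hits), Fraction(0))


def par_deg_line(deg, s, hits):
    """Parabolic degree of L according to (8.1):
    deg L + 2 * (mass of L) - (sum of all s_j)."""
    return deg + 2 * mass(s, hits) - sum(s, Fraction(0))


# Hypothetical weights with total 3/4 < 1, so that (8.2) holds.
s = [Fraction(1, 5), Fraction(1, 4), Fraction(3, 10)]
half_total = sum(s, Fraction(0)) / 2

# A degree-zero line meeting the flag only at the third point:
# its mass is below half the total, so its parabolic degree is negative.
print(mass(s, {2}) < half_total, par_deg_line(0, s, {2}))

# A degree-zero line meeting the flag at the first two points:
# its mass exceeds half the total, so it is destabilizing.
print(mass(s, {0, 1}) > half_total, par_deg_line(0, s, {0, 1}))

# A line of degree -1 has negative parabolic degree whatever its mass,
# because the total weight is less than 1.
print(par_deg_line(-1, s, {0, 1, 2}))
```

For degree-zero lines the sign of the parabolic degree is governed entirely by whether the mass is below, equal to, or above half the total weight, which is the link between Definition 7.1 and parabolic stability used in what follows.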
The gist of the last section was to define maps which we may call $\xi : (\mathbb{CP}^1)^n \to V_s$ and the induced map $\xi : (\mathbb{CP}^1)^n/PSL(2,\mathbb{C}) \to V_s/\mathrm{Iso}_0$. If we restrict $\xi$ to $W_s^s$ (respectively, to $W_s^{ss}$), then we get elements of $V_s$ whose underlying holomorphic bundle is trivial and has the property that every holomorphic line subbundle of degree zero has mass less than (respectively, less than or equal to) $\frac{1}{2}\sum_{j=1}^n s_j$. But by (8.1) this means that every such line subbundle has parabolic degree less than zero (respectively, less than or equal to zero). Also by (8.1) and (8.2), any holomorphic line subbundle of negative degree has negative parabolic degree. Since all line subbundles of the trivial bundle are of non-positive degree, it follows that $\xi(W_s^s) \subset V_s^s$ and $\xi(W_s^{ss}) \subset V_s^{ss}$. Furthermore, it is clear from the role of the flag varieties in the constructions of [15, §4] that the points in $V_s^s/\mathrm{Iso}_0$ depend holomorphically upon the filtration data at $p_1, \dots, p_n$ (we shall also reprove this fact much more directly in the next section). This means that $\xi$ is in fact a biholomorphism of $W_s^s/PSL(2,\mathbb{C})$ with $V_s^s/\mathrm{Iso}_0$.

In order to extend $\xi$ to be a map of the semi-stable moduli space, let us again take advantage of the nice semi-stable points. So given a point of $W_s^{nss}$ with corresponding parabolic vector bundle $F \in V_s$, note that the two atoms which each have half of the total mass correspond to distinct trivial line subbundles $L_1$ and $L_2$ which by (8.1) both have parabolic degree zero, and in fact $F = L_1 \oplus L_2$ as parabolic bundles. Recall the

Definition 8.4. A parabolic vector bundle which is the direct sum of stable parabolic subbundles all of the same parabolic slope is called polystable. Write $V_s^{ps}$ for the set of polystable bundles in $V_s$.
Of course $V_s^s \subset V_s^{ps} \subset V_s^{ss}$, but the general semi-stable bundle is an extension of a line bundle of parabolic degree zero by another such, rather than merely the direct sum of two such line bundles. However, the equivalence relation used in the algebro-geometric construction of the moduli space of semi-stable parabolic bundles is exactly the isomorphism of associated graded bundles (associated to the Harder-Narasimhan filtration of a holomorphic bundle by successive maximally destabilizing subbundles; see [1]). Hence every semi-stable bundle is equivalent to one which is polystable and we can extend $\xi$ to a bijection
$$W_s^{cusp}/\sim \;\cong\; (W_s^{nss} \cap W_s^{cusp})/PSL(2,\mathbb{C}) \;\xrightarrow{\ \xi\ }\; (V_s^{ps} \cap V_s^{cusp})/\mathrm{Iso}_0 \;\cong\; V_s^{cusp}/\sim$$
and a homeomorphism $\xi : W_s^{ss}/\sim \;\to\; V_s^{ss}/\sim$.

9. A Reinterpretation in Infinite Dimensions: $V_s/\mathrm{Iso}_0 = \mathcal{C}_{s,\delta}/\mathcal{G}^{\mathbb{C}}_{A,\delta}$
In our presentation here, the issue of what exactly are the points of $V_s$ or its quotients is somewhat elusive: let us now make these points more concrete. But rather than approaching the intricacies of the algebraic geometry of these spaces, we choose to represent them as infinite-dimensional affine spaces and the resulting quotients by infinite-dimensional Lie groups. This is quite standard, since the work of Atiyah and Bott [1] and before, so our main work here will be in incorporating the parabolic structures of our bundles. First, since all holomorphic vector bundles of degree zero are isomorphic as smooth bundles, we shall work always in our trivial bundle $E = \mathbb{C}^2 \times \mathbb{CP}^1$.

Definition 9.1. A complex structure on $E$ is an operator $\bar\partial_A : \Omega^0(E) \to \Omega^{(0,1)}(E)$ satisfying the $\bar\partial$-Leibniz rule and inducing the standard $\bar\partial$ on the line bundle $\det E$. Equivalently, fixing the standard constant Hermitian structure on $E$, a (special) unitary connection is an operator $d_A : \Omega^0(E) \to \Omega^1(E)$ satisfying the usual Leibniz rule and also preserving the metric and inducing the standard $d$ on $\det E$. In either case, the space of such operators shall be denoted $\mathcal{C}$, and the subset of complex structures for which the resulting holomorphic bundle is stable shall be denoted $\mathcal{C}^s$.

We get from one kind of operator to the other by the maps $\bar\partial_A = \bar\partial + A \mapsto d + A - A^*$ and $d_A = d + A^{1,0} + A^{0,1} \mapsto \bar\partial + A^{0,1}$. While usually the space of holomorphic bundles is an affine space, our $E$ is trivial and thus we have a natural basepoint $\bar\partial$. Therefore $\mathcal{C} = \{\bar\partial + \eta \mid \eta \in sl(2,\mathbb{C}) \otimes \Omega^{(0,1)}_{\mathbb{CP}^1}\}$ is essentially a complex vector space. Next, we must consider when two complex structures are equivalent.

Definition 9.2. Two operators $\bar\partial_A, \bar\partial_B \in \mathcal{C}$ are said to give isomorphic holomorphic bundles if and only if $\bar\partial_B = g^*(\bar\partial_A) = g^{-1} \circ \bar\partial_A \circ g$ for some $g$ in the complex gauge group $\mathcal{G}^{\mathbb{C}}$ of smooth sections of the bundle $\mathrm{SAut}\,E = SL(2,\mathbb{C}) \times \mathbb{CP}^1$ of automorphisms of $E$ with trivial determinant.
The subgroup $\mathcal{G} \subset \mathcal{G}^{\mathbb{C}}$ of automorphisms which preserve the metric is called the (special) unitary gauge group. $\mathcal{C}^s$ is in fact an open, $\mathcal{G}^{\mathbb{C}}$-invariant subset of $\mathcal{C}$, so the moduli space of stable bundles is nothing other than $\mathcal{C}^s/\mathcal{G}^{\mathbb{C}}$, which inherits a complex structure as the quotient of a complex space acted upon holomorphically by a complex group. Note that when working with unitary connections, the natural group to act is $\mathcal{G}$: elements of $\mathcal{G}$ will act by pull-back
on connections, while the pull-back of a connection by an element of $\mathcal{G}^{\mathbb{C}}$ may no longer be unitary.

Let us now begin to introduce our parabolic structures. As in the last section, we will work with the vector $s$ of positive real numbers, the $n$ marked points $\{p_1, \dots, p_n\} \subset \mathbb{CP}^1$ and the parabolic structures at each point $p_j$ with weights $-s_j$ and $s_j$. What may vary in these bundles, other than the underlying holomorphic structure, is therefore the middle step in the flag at each $p_j$, a choice of a line in the fiber over $p_j$. Thus the parabolic bundles of this type are encoded precisely by $\mathcal{C} \times (\mathbb{CP}^1)^n$, and a $g \in \mathcal{G}^{\mathbb{C}}$ acts here in the usual way on the $\mathcal{C}$ factor and on the $j$th $\mathbb{CP}^1$ by $g|_{p_j} \in SL(2,\mathbb{C})$. In other words, $\mathcal{C} \times (\mathbb{CP}^1)^n$ is nothing other than $V_s$ and the map $\xi$ is simply inclusion onto the $(\mathbb{CP}^1)^n$ factor, which is certainly holomorphic, as claimed in the last section.

To do gauge theory in these spaces of bundles, as we shall in a moment, it is easier to work with a space of complex structures or connections alone, rather than also to carry along the $(\mathbb{CP}^1)^n$. But $SL(2,\mathbb{C})$ acts transitively on $\mathbb{CP}^1$ and is connected, so there are elements of $\mathcal{G}^{\mathbb{C}}$ which take any point of $\mathcal{C} \times (\mathbb{CP}^1)^n$ to one with all of the $\mathbb{CP}^1$ components equal to some standard basepoint. The stabilizer in $\mathcal{G}^{\mathbb{C}}$ of such bundles with standardized intermediate flag step at all $p_j$ is described by the

Definition 9.3. The subgroup $\mathcal{G}^{\mathbb{C}}_P = \{g \in \mathcal{G}^{\mathbb{C}} \mid g|_{p_j} \in P \ \forall j\} \subset \mathcal{G}^{\mathbb{C}}$, where $P$ is the standard parabolic subgroup of $SL(2,\mathbb{C})$, is the asymptotically parabolic complex gauge group.

We find that $(\mathcal{C} \times (\mathbb{CP}^1)^n)/\mathcal{G}^{\mathbb{C}} \cong \mathcal{C}/\mathcal{G}^{\mathbb{C}}_P$, where $\mathcal{C}$ can now be understood to be the space of parabolic bundles endowed with the same, standard filtration and weights $-s_j$ and $s_j$ at each marked point $p_j$.

We make one last simplification before introducing the actual spaces of connections and complex structures that we shall need. For this, observe that the existence of local holomorphic frames in holomorphic vector bundles amounts to saying that every complex structure can be gauged trivial in one or even several disjoint disks. In fact, as a constant local complex gauge transformation sends local holomorphic frames to local holomorphic frames, we can even achieve the local trivialization of a $\bar\partial_A \in \mathcal{C}$ by a $g \in \mathcal{G}^{\mathbb{C}}_P$ (or, for that matter, by a $g$ which is the identity at each $p_j$). It thus is appropriate to make the

Definition 9.4. Let $\mathcal{C}_c$ be the subset of $\mathcal{C}$ of elements which equal $\bar\partial$ in a neighborhood of each $p_j$ and $\mathcal{G}^{\mathbb{C}}_{P,c}$ be the subgroup of $\mathcal{G}^{\mathbb{C}}_P$ of automorphisms which are $\bar\partial$-holomorphic in some neighborhood of the set $\{p_1, \dots, p_n\}$.

(The subscript "c" in $\mathcal{C}_c$ is intended to remind the reader that it consists of operators $\bar\partial_A$ for which $\bar\partial_A - \bar\partial$ is compactly supported in $\mathbb{CP}^1 \setminus \{p_1, \dots, p_n\}$.) Then $\mathcal{C}/\mathcal{G}^{\mathbb{C}}_P \cong \mathcal{C}_c/\mathcal{G}^{\mathbb{C}}_{P,c}$.

It is still necessary to think of $\mathcal{C}_c$ as a space of parabolic bundles by imposing the weights $\pm s_j$ and standard filtration in the fiber at each $p_j$. In order to incorporate this external data directly into the complex structures, we imagine removing the marked points from the base $\mathbb{CP}^1$ entirely, and introducing a singularity into the elements of $\mathcal{C}_c$. A direct way to do this is to act on the elements of $\mathcal{C}_c$ by a fixed (very singular, non-unitary) gauge transformation $g_s$ which near each marked point $p_j$ looks like $\begin{pmatrix} r_j^{-s_j/2} & 0 \\ 0 & r_j^{s_j/2} \end{pmatrix}$, where $r_j = |z_j|$ for $z_j$ a holomorphic coordinate near $p_j$. We shall in particular denote
by $\bar\partial_X$ the image of $\bar\partial$ under the action of $g_s$, and note that near each point $p_j$
$$\bar\partial_X = \bar\partial + \begin{pmatrix} -s_j/2 & 0 \\ 0 & s_j/2 \end{pmatrix} \otimes \frac{d\bar z_j}{2\bar z_j}. \tag{9.1}$$
Note that if $\{e_1, e_2\}$ is the standard (constant, and thus $\bar\partial$-holomorphic) basis of sections coming from the trivialization of $E$, then $\{r^{s_j/2} e_1, r^{-s_j/2} e_2\}$ is a local $\bar\partial_X$-holomorphic frame near $p_j$. We can now make the

Definition 9.5. Let $\mathcal{C}_{s,c}$ be the space of holomorphic structures on the trivial bundle $E$ of rank 2 over $\mathbb{CP}^1 \setminus \{p_1, \dots, p_n\}$ inducing the standard $\bar\partial$ on $\det E$ and which on some sufficiently small disk around each $p_j$ equal $\bar\partial_X$. Likewise, let $\mathcal{G}^{\mathbb{C}}_{A,c}$ be the group of smooth automorphisms $g$ of $E$ with trivial determinant and for which $g_s g g_s^{-1}$ can be extended by an element of $A = \left\{ \begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix} \,\middle|\, a \in \mathbb{C} \setminus \{0\} \right\} \subset SL(2,\mathbb{C})$ at $p_j$ to be holomorphic in a neighborhood of $p_j$. Finally, let $\mathcal{G}_{T,c}$ be the subgroup of unitary elements of $\mathcal{G}^{\mathbb{C}}_{A,c}$.
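As a consistency check on (9.1) and on the holomorphic frame mentioned with it, the local computation can be written out. This is a sketch under two assumptions: the action convention $g^*(\bar\partial) = g^{-1} \circ \bar\partial \circ g$ of Definition 9.2, and $r = |z|$ for the local coordinate $z = z_j$ (indices are suppressed, $s = s_j$).

```latex
\begin{align*}
\bar\partial \log r
  &= \tfrac12\,\bar\partial \log(z\bar z) = \frac{d\bar z}{2\bar z},\\[4pt]
g_s^{-1}\,(\bar\partial g_s)
  &= \begin{pmatrix} r^{s/2} & 0 \\ 0 & r^{-s/2} \end{pmatrix}
     \begin{pmatrix} -\tfrac{s}{2}\, r^{-s/2} & 0 \\ 0 & \tfrac{s}{2}\, r^{s/2} \end{pmatrix}
     \bar\partial \log r
   = \begin{pmatrix} -s/2 & 0 \\ 0 & s/2 \end{pmatrix} \otimes \frac{d\bar z}{2\bar z},\\[4pt]
\bar\partial_X
  &= g_s^{-1} \circ \bar\partial \circ g_s
   = \bar\partial + g_s^{-1}\,(\bar\partial g_s),\\[4pt]
\bar\partial_X\!\left(r^{s/2} e_1\right)
  &= \tfrac{s}{2}\, r^{s/2}\, \frac{d\bar z}{2\bar z}\, e_1
     - \tfrac{s}{2}\, r^{s/2}\, \frac{d\bar z}{2\bar z}\, e_1 = 0,\\[4pt]
\bar\partial_X\!\left(r^{-s/2} e_2\right)
  &= -\tfrac{s}{2}\, r^{-s/2}\, \frac{d\bar z}{2\bar z}\, e_2
     + \tfrac{s}{2}\, r^{-s/2}\, \frac{d\bar z}{2\bar z}\, e_2 = 0.
\end{align*}
```

The third line reproduces (9.1), and the last two lines confirm that $r^{s/2} e_1$ and $r^{-s/2} e_2$ are annihilated by $\bar\partial_X$ away from the puncture.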
As our choice of $g_s$ is entirely non-canonical (we made no restrictions whatsoever upon its values far away from the marked points) the basepoint $\bar\partial_X$ is somewhat arbitrary and it is appropriate to think of $\mathcal{C}_{s,c}$ as merely an affine space.

Since we have punctured the base $\mathbb{CP}^1$, the appropriate Sobolev metric to work with becomes a much more subtle issue. In fact, it is necessary to use Sobolev spaces weighted near each $p_j$ by a power $r_j^{-\delta}$ of the local radial coordinate, where $\delta > 0$ must be chosen smaller than both $\min_j s_j$ and $\min_j (1 - s_j)$; see [16] for an extensive discussion of these analytic issues. Completing with respect to these norms gives complex structures and gauge transformations which differ from the models in Definition 9.5 by terms which decay rapidly as one approaches each point $p_j$.

Definition 9.6. We shall denote by $\mathcal{C}_{s,\delta}$ the completion of $\mathcal{C}_{s,c}$ in the $\delta$-weighted Sobolev $L^2_1$ norm and by $\mathcal{G}^{\mathbb{C}}_{A,\delta}$ and $\mathcal{G}_{T,\delta}$ the completions of $\mathcal{G}^{\mathbb{C}}_{A,c}$ and $\mathcal{G}_{T,c}$ with respect to the $L^2_{2,\delta}$ norm.

A first application of analysis in these weighted Sobolev spaces shows that every $\mathcal{G}^{\mathbb{C}}_{A,\delta}$-orbit in $\mathcal{C}_{s,\delta}$ contains elements that look like (9.1) near each $p_j$, see [16]. This is a sort of Newlander–Nirenberg theorem, which is normally trivial over a manifold of complex dimension one, but here is rendered difficult again by the singularities. It provides an important step in proving the

Proposition 9.1. There exists an identification $\mathcal{C}_c/\mathcal{G}^{\mathbb{C}}_{P,c} = \mathcal{C}_{s,\delta}/\mathcal{G}^{\mathbb{C}}_{A,\delta}$.
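Proof. Conjugation by $g_s$ gives a well-defined map $\sigma : \mathcal{C}_c \to \mathcal{C}_{s,c} \to \mathcal{C}_{s,\delta}$; certainly $\bar\partial_X \in \mathcal{C}_{s,c}$. If $\bar\partial_A, \bar\partial_B \in \mathcal{C}_c$ are equivalent by some $g \in \mathcal{G}^{\mathbb{C}}_{P,c}$, then $\sigma(\bar\partial_A)$ and $\sigma(\bar\partial_B)$ will be equivalent by $g_s^{-1} g g_s$. Say $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ is the matrix of holomorphic functions defining $g$ near $p_j$, with therefore $c(p_j) = 0$ and $a(p_j) = d(p_j)^{-1}$. Then $g_s^{-1} g g_s$ will look like $\begin{pmatrix} a & b r^{s_j} \\ c r^{-s_j} & d \end{pmatrix}$ near $p_j$. But $b r^{s_j}$ decays at least like $r^{s_j}$ towards $p_j$ and $c r^{-s_j}$ at least like $r^{1-s_j}$, both of which are in the weighted Sobolev closure we are using, so $g_s^{-1} g g_s \in \mathcal{G}^{\mathbb{C}}_{A,\delta}$.

Hence $\sigma$ induces a well-defined map $\sigma : \mathcal{C}_c/\mathcal{G}^{\mathbb{C}}_{P,c} \to \mathcal{C}_{s,\delta}/\mathcal{G}^{\mathbb{C}}_{A,\delta}$. The singular Newlander-Nirenberg theorem mentioned above tells us that $\sigma$ is surjective, so it only remains to show that it is injective. So say that $\sigma(\bar\partial_A)$ and $\sigma(\bar\partial_B)$ are equivalent by some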
element $h \in \mathcal{G}^{\mathbb{C}}_{A,\delta}$. Concentrating near one $p_j$, let $h = \begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix} + h^j$, where $h^j$ decays towards $p_j$ at least like $r^\delta$, $a$ is holomorphic near $p_j$ and $a(p_j) \neq 0$. But then $\begin{pmatrix} a & 0 \\ 0 & a^{-1} \end{pmatrix} + g_s h^j g_s^{-1}$ takes $\bar\partial_A$ to $\bar\partial_B$, and thus must be holomorphic in a punctured neighborhood of $p_j$. If $\begin{pmatrix} h^j_{11} & r^{-s_j} h^j_{12} \\ r^{s_j} h^j_{21} & h^j_{22} \end{pmatrix}$ is to be holomorphic and $h^j$ is to decay, then $h^j_{11}$ and $h^j_{22}$ must have a zero at $p_j$ while $h^j_{12}$ and $h^j_{21}$ must be of the form $b r^{s_j}$ and $c r^{-s_j}$, respectively, for $b$ and $c$ holomorphic functions on a neighborhood of $p_j$. However if $h^j$ is to decay at $p_j$ then $c(p_j)$ must equal zero while $b(p_j)$ can be any complex number; in other words, $g_s h g_s^{-1} \in \mathcal{G}^{\mathbb{C}}_{P,c}$ and we are done.

10. The Second Kempf–Ness-Type Theorem

We now have the complex group $\mathcal{G}^{\mathbb{C}}_{A,\delta}$ acting on the complex affine space $\mathcal{C}_{s,\delta}$ preserving its complex structure. $\mathcal{G}^{\mathbb{C}}_{A,\delta}$ is the complexification of the group $\mathcal{G}_{T,\delta}$, which will have to be the analogue in this infinite-dimensional situation of the compact group acting symplectically. It does indeed preserve the constant symplectic form
$$\omega(u, v) = \int_{\mathbb{CP}^1} u \wedge v,$$
where $u$ and $v$ are tangent vectors to $\mathcal{C}_{s,\delta}$ and the operation $u \wedge v$ takes the normal wedge of the form parts and the Killing form on the Lie algebra parts. In fact, Atiyah and Bott define and study this symplectic form in [1], for higher genus and without parabolic structures, but an identical argument shows that the moment map for the action of $\mathcal{G}^{\mathbb{C}}_{A,\delta}$ is nothing other than the map which assigns to a connection $d_A$ its curvature $F_A$.

Hence one would now hope, if the appropriate infinite-dimensional Kempf–Ness theorem could be proven, that the quotient $\mathcal{C}^s_{s,\delta}/\mathcal{G}^{\mathbb{C}}_{A,\delta}$ could be identified with the symplectic quotient $\mathcal{C}^s_{s,\delta}//\mathcal{G}_{T,\delta} = \mathcal{F}^{irr}_{s,\delta}/\mathcal{G}_{T,\delta}$, where $\mathcal{F}_{s,\delta}$ is the subset of flat connections (the zero set of the current moment map) in $\mathcal{C}_{s,\delta}$ while $\mathcal{F}^{irr}_{s,\delta}$ denotes the flat and irreducible connections. This was indeed done from this perspective for bundles without parabolic structure by Donaldson in [5] and by several authors for parabolic bundles, after the initial, purely algebro-geometric proof of Mehta and Seshadri in [15]. None of these proofs applies directly to our present situation, requiring either all of the parabolic weights to lie in $[0, 1)$, there to be only one parabolic point, the base Riemann surface $\Sigma$ to have genus two or greater, or several of these conditions. (But note that an analogue of the Mehta–Seshadri theorem which would suffice for our purposes is proven in the preprint [2].)

Let us instead sketch a proof which applies here, based on [16] and quoting some facts that are now known about the topology of the space of parabolic bundles. First of all, the topology of $V_s^s/\mathrm{Iso}_0$ has been extensively studied, and in particular it is known to be connected, see [8, 3] or [9]. But $V_s^s/\mathrm{Iso}_0 = \mathcal{C}^s_{s,\delta}/\mathcal{G}^{\mathbb{C}}_{A,\delta}$ and $\mathcal{G}^{\mathbb{C}}_{A,\delta}$ is connected (since $\pi_2(SL(2,\mathbb{C}))$ is trivial), so it follows that the set $\mathcal{C}^s_{s,\delta}$ of stable connections is connected.
Now an implicit function theorem argument shows that $\mathcal{G}^{\mathbb{C}}_{A,\delta} \cdot \mathcal{F}^{irr}_{s,\delta}$ is open in $\mathcal{C}^s_{s,\delta}$ and a version of Uhlenbeck compactness in our weighted spaces shows that it is also closed. (See [16] for the analytic details.) Thus every stable connection is complex gauge-equivalent to a (unitary gauge orbit of a) flat connection; we will write $\kappa_2$ for the resulting diffeomorphism of $\mathcal{C}^s_{s,\delta}/\mathcal{G}^{\mathbb{C}}_{A,\delta}$ with $\mathcal{F}^{irr}_{s,\delta}/\mathcal{G}_{T,\delta}$. In fact, [16] also shows that $\kappa_2$
can be extended to a homeomorphism $\kappa_2 : V_s^{ss}/\sim \;\to\; \mathcal{F}_{s,\delta}/\mathcal{G}_{T,\delta}$, just as at the end of Sect. 8.

11. Representations of the Fundamental Group

It is by now a classic result of modern differential geometry that the moduli space of flat connections on some manifold $M$ can be identified with the moduli space of representations of $\pi_1(M)$, by associating to a flat connection $d_A$ its holonomy representation $\mathrm{hol}(d_A)$. In our case, the image of hol will not be all representations of $\pi_1(\mathbb{CP}^1 \setminus \{p_1, \dots, p_n\})$ since we are not working with all flat connections but only those which are in $\mathcal{F}_{s,\delta}$. These are all connections which differ from the model (the unitary connection corresponding to the basepoint $\bar\partial_X \in \mathcal{C}_{s,\delta}$) by terms which decay fast enough towards the puncture that their holonomy along a small loop around each $p_j$ will be the same as the local model, i.e., conjugate to $\begin{pmatrix} e^{-\pi i s_j} & 0 \\ 0 & e^{\pi i s_j} \end{pmatrix}$; see [16] for details. Let us give these representations a name.

Definition 11.1. With $s$ fixed as usual, the restricted representation variety $\mathcal{R}_s$ of $\mathbb{CP}^1 \setminus \{p_1, \dots, p_n\}$ consists of those homomorphisms from $\pi_1(\mathbb{CP}^1 \setminus \{p_1, \dots, p_n\})$ to $SU(2)$ which take a small loop around each $p_j$ to elements of trace $2\cos(\pi s_j)$; $\mathcal{R}_s$ admits an action of $SU(2)$ by post-conjugation and the quotient $\mathcal{R}_s/SU(2)$ shall be called the moduli space of restricted representations. We shall also denote by $\mathcal{R}^{irr}_s$ the subset of irreducible elements in $\mathcal{R}_s$.

With this understood, we can say that the holonomy representation hol induces a map $\mathrm{hol} : \mathcal{F}_{s,\delta}/\mathcal{G}_{T,\delta} \to \mathcal{R}_s/SU(2)$ which, it is easy to see, is smooth on the smooth part $\mathcal{F}^{irr}_{s,\delta}/\mathcal{G}_{T,\delta}$.

In order to show that hol is a bijection, we should construct a flat connection $d_A \in \mathcal{F}_{s,\delta}$ whose holonomy representation is an arbitrary fixed $\rho \in \mathcal{R}_s$. The usual approach to this is to write $\mathbb{CP}^1 \setminus \{p_1, \dots, p_n\} = H^2_{\mathbb{R}}/\Gamma$, where $\pi_1(\mathbb{CP}^1 \setminus \{p_1, \dots, p_n\}) \xrightarrow{\ \cong\ } \Gamma \subset PSL(2,\mathbb{R})$, and to consider the connection $d_\rho$ induced on $H^2_{\mathbb{R}} \times_\rho \mathbb{C}^2$ by the trivial connection on $H^2_{\mathbb{R}} \times \mathbb{C}^2$. But if we choose coordinates $\{x_j, y_j\}$ near the cusp corresponding to the parabolic element $\gamma_j \in \Gamma$ of a fundamental domain for the $\Gamma$-action on $H^2_{\mathbb{R}}$ which look like an infinite vertical strip in the upper half-plane of width $2\pi$, as is often done, then $d_\rho$ exactly equals
$$d_{A_j} = d - C_j^{-1} \begin{pmatrix} -i s_j/2 & 0 \\ 0 & i s_j/2 \end{pmatrix} C_j \otimes dx_j$$
on a neighborhood of that cusp, where $\rho(\gamma_j) = C_j^{-1} \begin{pmatrix} e^{-\pi i s_j} & 0 \\ 0 & e^{\pi i s_j} \end{pmatrix} C_j$. If $g : H^2_{\mathbb{R}}/\Gamma \to SU(2)$ equals $C_j$ near the $j$th cusp, then $g(d_\rho)$ is the required flat connection in $\mathcal{F}_{s,\delta}$.

It shall be convenient in just a moment to have $\mathcal{R}_s$ in another form. Taking the obvious presentation of the fundamental group of a punctured sphere, we can think of a restricted representation as nothing other than a choice of $n$ elements of $SU(2)$, the $j$th from the conjugacy class of trace $2\cos(\pi s_j)$, such that the product of these elements is the identity, i.e.:
$$\mathcal{R}_s = \left\{ (g_1, \dots, g_n) \in (SU(2))^n \;\middle|\; g_1 \cdots g_n = \mathrm{Id} \ \text{and}\ \operatorname{Tr} g_j = 2\cos(\pi s_j) \ \forall j \right\}. \tag{11.1}$$
Furthermore, the reducible representations $\mathcal{R}_s \setminus \mathcal{R}^{irr}_s$ are those which are simultaneously diagonalizable, so in our present formulation of $\mathcal{R}_s$ they correspond to $n$-tuples $(g_1, \dots, g_n)$ such that all
$$g_j = X^{-1} \begin{pmatrix} e^{-\pi i s_j} & 0 \\ 0 & e^{\pi i s_j} \end{pmatrix} X \tag{11.2}$$
with the same $X$ for all $j$.

12. Returning to Polygons: A Second Gauss Map and Polygons in $S^3$

The conjugacy classes appearing in (11.1) are simply spheres centered at the identity of radius $\pi s_j$ in the spherical metric on $SU(2) \cong S^3$. Since $\sum_{j=1}^n s_j < 1$ and the $s_j$ are all positive, it follows that each $s_j < 1$, and hence there is a unique directed geodesic $\Theta_j$ from the identity of $SU(2)$ to any of the $g_j$ as in (11.1). Thus $e_j = h\Theta_j$ is the unique directed geodesic segment from $h$ to $hg_j$ and as $g_j$ runs over all elements of trace $2\cos(\pi s_j)$, $e_j$ exhausts the set of directed geodesic segments starting at $h$ and of length $\pi s_j$. This motivates us to recall the definitions of Sect. 2, generalized to a slightly different ambient space:

Definition 12.1. Let $s$ be our usual $n$-tuple of positive real numbers.
• The configuration space $P^{S^3}_{\pi s}$ of polygons in $S^3$ with fixed side lengths $\pi s$ is the set of all $n$-tuples $(e_1, \dots, e_n)$ of directed geodesic segments such that the length of $e_j$ is $\pi s_j$ and the endpoint of $e_j$ is the beginning point of $e_{j+1 \,(\mathrm{mod}\ n)}$. The moduli space of polygons is then simply $P^{S^3}_{\pi s}/SO(4)$ where $SO(4)$ acts diagonally on the $n$-tuples of geodesic segments.
• A polygon is said to be degenerate if it lies entirely in some great circle in $S^3$. The set of non-degenerate polygons shall be denoted $\widetilde{P}^{S^3}_{\pi s}$.
• The set of based polygons is $P^{S^3}_{\pi s,0} = \{(e_1, \dots, e_n) \in P^{S^3}_{\pi s} \mid e_1^{begin} = \mathrm{Id} \in SU(2) \cong S^3\}$.

Hence for a $(g_1, \dots, g_n) \in \mathcal{R}_s$ we can construct an element $(e_1, \dots, e_n) \in P^{S^3}_{\pi s,0}$ by defining $g_0 = \mathrm{Id}$ and then letting $e_j$ be the unique directed geodesic segment from $g_0 \cdots g_{j-1}$ to $g_0 \cdots g_j$ for $1 \le j \le n$. Note that if we start with a reducible representation satisfying (11.2), the corresponding polygon will lie entirely on the great circle
$$\left\{ X^{-1} \begin{pmatrix} e^{-it} & 0 \\ 0 & e^{it} \end{pmatrix} X \;\middle|\; t \in [0, 2\pi] \right\}$$
and thus be degenerate. As these constructions are clearly invertible and continuous, it follows that we have a homeomorphism $G_2^{-1} : \mathcal{R}_s \to P^{S^3}_{\pi s,0}$ which restricts to a diffeomorphism of $\mathcal{R}^{irr}_s$ with $\widetilde{P}^{S^3}_{\pi s,0}$; we use this notation $G_2^{-1}$ since the present map is something of an inverse to a non-linear generalization of the first Gauss map we defined in Sect. 3.

Finally, recall that the adjoint group of $SU(2)$ is $SO(3)$. Hence the conjugation action of $SU(2)$ on the matrices encoding a representation corresponds to the residual action of the group $SO(3)$ of isometries of $S^3$ fixing the identity, which is the basepoint of our based polygons. Thus we can finish our grand tour of moduli spaces by defining the induced homeomorphism
$$P^{S^3}_{\pi s}/SO(4) \;\cong\; P^{S^3}_{\pi s,0}/SO(3) \;\xrightarrow{\ G_2\ }\; \mathcal{R}_s/SU(2)$$
which, as usual, restricts to a diffeomorphism of $\widetilde{P}^{S^3}_{\pi s}/SO(4)$ with $\mathcal{R}^{irr}_s/SU(2)$.
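The passage from a restricted representation to a based polygon in $S^3$ can be tried numerically. The sketch below is illustrative only and rests on assumptions not made in the text: $SU(2)$ is modelled by unit quaternions (so that the trace of an element is twice its real part), the helper names are invented here, and the third side length is not prescribed but read off from $g_3 = (g_1 g_2)^{-1}$ so that the product is the identity.

```python
import math


def qmul(p, q):
    """Hamilton product of quaternions given as (a, b, c, d) = a + bi + cj + dk."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def qinv(q):
    """Inverse of a unit quaternion, which is its conjugate."""
    a, b, c, d = q
    return (a, -b, -c, -d)


def element(s, axis):
    """Unit quaternion with real part cos(pi*s): an SU(2) element of trace 2cos(pi*s)."""
    norm = math.sqrt(sum(x * x for x in axis))
    c, sn = math.cos(math.pi * s), math.sin(math.pi * s)
    return (c,) + tuple(sn * x / norm for x in axis)


def vertices(gs):
    """Vertices g_0, g_0 g_1, ..., g_0 ... g_n of the based polygon, with g_0 = Id."""
    verts = [(1.0, 0.0, 0.0, 0.0)]
    for g in gs:
        verts.append(qmul(verts[-1], g))
    return verts


def spherical_distance(p, q):
    """Geodesic distance on the unit sphere S^3 in R^4."""
    dot = sum(x * y for x, y in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))


# Two elements with prescribed traces and non-commuting axes, plus the element
# closing up the product; s3 is determined by the other two choices.
g1 = element(0.2, (1, 0, 0))
g2 = element(0.3, (0, 1, 0))
g3 = qinv(qmul(g1, g2))
s3 = math.acos(g3[0]) / math.pi

verts = vertices([g1, g2, g3])
sides = [spherical_distance(verts[j], verts[j + 1]) for j in range(3)]
print(sides, [0.2 * math.pi, 0.3 * math.pi, s3 * math.pi])
print(verts[-1])
```

Because left multiplication by a unit quaternion is an isometry of $S^3$, the distance from $g_0 \cdots g_{j-1}$ to $g_0 \cdots g_j$ depends only on the real part of $g_j$, which is why the side lengths come out as $\pi s_j$; the last vertex returning to the identity reflects the relation $g_1 g_2 g_3 = \mathrm{Id}$.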
13. Conclusion

Let us return to the diagram we had originally intended to fill in, for completeness now adding the various intermediate spaces and maps we used:
$$\widetilde{P}^{\mathbb{R}^3}_{s}/E^+(3) \;\cong\; \widetilde{P}^{\mathbb{R}^3}_{s,0}/SO(3) \;\xrightarrow{\ G_1\ }\; \widetilde{(S^2)^n_s}//SO(3) \;\xrightarrow{\ \kappa_1\ }\; W_s^s/PSL(2,\mathbb{C}) \;\xrightarrow{\ \xi\ }\; V_s^s/\mathrm{Iso}_0 \;\cong\; (\mathcal{C} \times (\mathbb{CP}^1)^n)^s/\mathcal{G}^{\mathbb{C}} \;\cong\; \mathcal{C}^s/\mathcal{G}^{\mathbb{C}}_P \;\cong\; \mathcal{C}^s_c/\mathcal{G}^{\mathbb{C}}_{P,c}$$
$$\mathcal{C}^s_c/\mathcal{G}^{\mathbb{C}}_{P,c} \;\xrightarrow{\ \sigma\ }\; \mathcal{C}^s_{s,\delta}/\mathcal{G}^{\mathbb{C}}_{A,\delta} \;\xrightarrow{\ \kappa_2\ }\; \mathcal{F}^{irr}_{s,\delta}/\mathcal{G}_{T,\delta} \;\xrightarrow{\ \mathrm{hol}\ }\; \mathcal{R}^{irr}_s/SU(2) \;\xleftarrow{\ G_2\ }\; \widetilde{P}^{S^3}_{\pi s,0}/SO(3) \;\cong\; \widetilde{P}^{S^3}_{\pi s}/SO(4)$$

Theorem 13.1. In the above diagram, all maps are diffeomorphisms and those connecting complex spaces are biholomorphisms. All maps also extend to homeomorphisms on the corresponding spaces of semi-stable points.

Let us conclude by mentioning two other maps which could be inserted in the above diagram. The first is a diffeomorphism $\widetilde{P}^{S^3}_{\pi s,0}/SO(3) \to \widetilde{P}^{\mathbb{R}^3}_{s,0}/SO(3)$ defined by M. Sargent in [17], as follows. On the level of configuration spaces of based polygons, he enlarges the $S^3$ by a factor $k \to \infty$, all the while scaling the polygons by $1/k$, getting in the limit a polygon with the original side lengths but on $\mathbb{R}^3$. As the resulting map on configuration spaces is equivariant with respect to the $SO(3)$-action on both sides, a diffeomorphism of the moduli spaces results. This construction is admirably direct, but has the disadvantage that it does not reveal whether any geometric structures are preserved.

A more sophisticated approach is taken by L. Jeffrey in [10] with a map $\widetilde{(S^2)^n_s}//SO(3) \to \mathcal{R}^{irr}_s/SU(2)$. For more general compact groups, not just $SU(2)$, she gives a symplectomorphism between the quotient of a submanifold of the product of conjugacy classes in the Lie algebra and the corresponding quotient in the group itself. Her main interest is symplectic structures, however, and not the complex structures we address here; nor does she use or address the configuration spaces of weighted points and parabolic bundle techniques which form the core of Sects. 5–10, above.
References

1. Atiyah, M.F. and Bott, R.: The Yang–Mills equations over Riemann surfaces. Philos. Trans. Roy. Soc. London Ser. A 308, 523–615 (1982)
2. Belkale, P.: Local systems on P1 − S for S a finite set. Preprint, 1999
3. Boden, H. and Hu, Y.: Variations of moduli of parabolic bundles. Math. Ann. 301, 539–559 (1995)
4. Deligne, P. and Mostow, G.: Monodromy of hypergeometric functions and non-lattice integral monodromy. Publ. Math. IHES 63, 5–90 (1986)
5. Donaldson, S.K.: A new proof of a theorem of Narasimhan and Seshadri. J. Differential Geom. 18, 269–277 (1983)
6. Douady, A. and Earle, C.: Conformally natural extensions of homeomorphisms of the circle. Acta Math. 157, 23–48 (1986)
7. Fogarty, J., Kirwan, F. and Mumford, D.: Geometric invariant theory. Enlarged 3rd ed., New York: Springer-Verlag, 1994
8. Furuta, M. and Steer, B.: Seifert-fibered homology 3-spheres and Yang–Mills equations on Riemann surfaces with marked points. Adv. Math. 96, 38–102 (1996)
9. Holla, Y.: Poincaré polynomial of the moduli spaces of parabolic bundles. Preprint, alg-geom/9902002, 1999
10. Jeffrey, L.: Extended moduli spaces of flat connections on Riemann surfaces. Math. Ann. 298, 667–692 (1994)
11. Kapovich, M. and Millson, J.J.: The relative deformation theory of representations and flat connections and deformations of linkages in constant curvature spaces. Compositio Math. 103, 287–317 (1996)
12. Kapovich, M. and Millson, J.J.: The symplectic geometry of polygons in Euclidean space. J. Differential Geom. 44, 479–513 (1996)
13. Kempf, G. and Ness, L.: The length of vectors in representation spaces. In: Algebraic Geometry, Proceedings, Copenhagen, 1978, Berlin–Heidelberg–New York: Springer-Verlag, 1979, pp. 233–243
14. Kirwan, F.: Cohomology of quotients in symplectic and algebraic geometry. Princeton, N.J.: Princeton University Press, 1984
15. Mehta, V.B. and Seshadri, C.S.: Moduli of vector bundles on curves with parabolic structures. Math. Ann. 248, 205–239 (1980)
16. Poritz, J.A.: Parabolic vector bundles and Hermitian-Yang–Mills connections over a Riemann surface. Internat. J. Math. 4, 467–501 (1993)
17. Sargent, M.A.: Diffeomorphic equivalence of configuration spaces of polygons in constant curvature spaces. Ph.D. thesis, University of Maryland, 1995

Communicated by A. Connes
Commun. Math. Phys. 218, 333 – 371 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Nahm Transform for Periodic Monopoles and N = 2 Super Yang–Mills Theory
Sergey Cherkis1, Anton Kapustin2
1 TEP, UCLA Physics Department, Los Angeles, CA 90095-1547, USA. E-mail: [email protected]
2 Institute for Advanced Study, Olden Lane, Princeton, NJ 08540, USA. E-mail: [email protected]
Received: 20 July 2000 / Accepted: 29 November 2000
Abstract: We study Bogomolny equations on R2 × S1 . Although they do not admit nontrivial finite-energy solutions, we show that there are interesting infinite-energy solutions with Higgs field growing logarithmically at infinity. We call these solutions periodic monopoles. Using the Nahm transform, we show that periodic monopoles are in one-to-one correspondence with solutions of Hitchin equations on a cylinder with Higgs field growing exponentially at infinity. The moduli spaces of periodic monopoles belong to a novel class of hyperkähler manifolds and have applications to quantum gauge theory and string theory. For example, we show that the moduli space of k periodic monopoles provides the exact solution of N = 2 super Yang–Mills theory with gauge group SU (k) compactified on a circle of arbitrary radius.
1. Introduction and Summary

1.1. The Bogomolny equation. Let $X$ be a three-dimensional oriented Riemannian manifold, $E$ be a vector bundle over $X$ with structure group $G$, $A$ be a connection on $E$, and $\phi$ be a section of $\mathrm{End}(E)$. The Bogomolny equation is the reduction of the self-duality equation to three dimensions, which reads
\[
F_A = * d_A \phi. \tag{1}
\]
Here $*$ is the Hodge star operator. In what follows we set $G = SU(2)$, with $E$ being associated with the fundamental representation of $SU(2)$. In this case $\phi$ is Hermitian and traceless. We will assume that all functions and connections are infinitely differentiable, unless specified otherwise. It is well known that for $X = \mathbb{R}^3$ with flat metric the Bogomolny equation admits finite energy solutions, so-called BPS monopoles. The energy and magnetic charge of a
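pair $(A, \phi)$ are defined as follows:
\[
E(A,\phi) = \frac{1}{4}\int_X \mathrm{Tr}\left(F_A \wedge *F_A + * d_A\phi \wedge d_A\phi\right), \qquad
m(A,\phi) = \lim_{R\to\infty}\int_{|x|=R} \frac{\mathrm{Tr}(F_A\phi)}{4\pi\|\phi\|}.
\]
Here $\|\phi\|^2 = \frac{1}{2}\mathrm{Tr}\,\phi^2$.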
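The proportionality between energy and magnetic charge stated next can be seen by completing the square in $E(A,\phi)$. The following is only a sketch, under assumptions that the text does not spell out: $** = 1$ on forms on a Riemannian three-manifold, the Bianchi identity $d_A F_A = 0$, and $\|\phi\| \to v$ uniformly on the sphere at infinity.

```latex
\begin{align*}
E(A,\phi) &= \frac{1}{4}\int_X \mathrm{Tr}\big((F_A - * d_A\phi)\wedge *(F_A - * d_A\phi)\big)
            + \frac{1}{2}\int_X \mathrm{Tr}\,(F_A\wedge d_A\phi), \\
\frac{1}{2}\int_X \mathrm{Tr}\,(F_A\wedge d_A\phi)
          &= \frac{1}{2}\int_X d\,\mathrm{Tr}\,(F_A\phi)
           = \frac{1}{2}\lim_{R\to\infty}\int_{|x|=R}\mathrm{Tr}\,(F_A\phi)
           = 2\pi v\, m(A,\phi).
\end{align*}
```

In conventions where $F_A$ is Hermitian-valued like $\phi$, the first term is nonnegative and vanishes exactly when Eq. (1) holds, which is the sense in which solutions of (1) minimize the energy at fixed charge.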
For BPS monopoles $\|\phi\|$ tends to a constant value $v$ at infinity, while $\|F_A\|$ decreases as $1/r^2$. It follows that the energy of a BPS monopole is proportional to its magnetic charge: $E(A,\phi) = 2\pi v\, m(A,\phi)$. BPS monopoles are absolute minima of the energy function $E(A,\phi)$ in a subspace with fixed magnetic charge and fixed asymptotic value of $\|\phi\|$.

1.2. Periodic monopoles. Solutions of the Bogomolny equation on $\mathbb{R}^2 \times S^1$ have not been studied previously. One of the reasons is that any monopole on $\mathbb{R}^2 \times S^1$ with a nonzero magnetic charge must have infinite energy. This happens because the magnetic field of a magnetically charged object on $\mathbb{R}^2 \times S^1$ decays only as $1/r$, where $r$ is the radial distance on $\mathbb{R}^2$. Hence the magnetic energy density decays as $1/r^2$, and its integral over $\mathbb{R}^2 \times S^1$ diverges. Since the energy of a periodic monopole is infinite, it cannot be regarded as a solitonic particle. Still, periodic monopoles do play a role in certain physical problems. For example, we will see that the centered moduli space of an $SU(2)$ periodic monopole with magnetic charge $k$ is a hyperkähler manifold of dimension $4(k-1)$. It turns out that this hyperkähler manifold coincides with the quantum Coulomb branch of the $N = 2$ super Yang–Mills theory on $\mathbb{R}^3 \times S^1$ with gauge group $SU(k)$ (see below). For $k = 2$ this manifold is a very interesting deformation of the Atiyah–Hitchin manifold (the reduced moduli space of a $k = 2$ monopole on $\mathbb{R}^3$) and is an example of a new class of asymptotically locally flat self-dual gravitational instantons. The properties of the moduli spaces of periodic monopoles will be discussed in more detail in a forthcoming paper [1].

The goal of this paper is two-fold. On one hand, we want to compute the dimension of the moduli spaces of periodic monopoles and to establish a correspondence between periodic monopoles and solutions of Hitchin equations on a cylinder. The latter correspondence is a particular instance of the Nahm transform.
On the other hand, we want to explain the relation of our results to the four-dimensional $N = 2$ gauge theories and to the brane configurations of the type first considered by Chalmers and Hanany [2] and further explored by Hanany and Witten [3], Witten [4], and many others. These brane configurations were used to find the exact Coulomb branch of the super Yang–Mills theory with eight supercharges on $\mathbb{R}^3$ and $\mathbb{R}^4$. In particular, Witten showed how to obtain the exact solution of the $N = 2$ super Yang–Mills theory on $\mathbb{R}^4$ from the physics of the M-theory fivebrane. As explained below, the Nahm transform approach not only reproduces the classical physics of the fivebrane, but goes considerably further by resumming the effects of membrane instantons.
In a companion paper [5] we study solutions of Bogomolny equations on $\mathbb{R}^2 \times S^1$ with prescribed singularities. Their moduli spaces provide more examples of novel self-dual gravitational instantons and are related to four-dimensional $N = 2$ gauge theories with matter compactified on a circle. In the remainder of this section we define periodic monopoles more precisely and formulate our main result, the correspondence between periodic monopoles and solutions of Hitchin equations on a cylinder.

1.3. Periodic Dirac monopoles. Before investigating the nonabelian Bogomolny equation on $\mathbb{R}^2 \times S^1$, it is instructive to write down solutions of the Bogomolny equation in the case $G = U(1)$. In this case there are no nontrivial smooth solutions, so we allow for singularities at some finite number of points on $\mathbb{R}^2 \times S^1$. A solution with one singularity represents a Dirac monopole on $\mathbb{R}^2 \times S^1$. For $G = U(1)$ the Bogomolny equation implies that the Higgs field satisfies the Laplace equation $\nabla^2\phi = 0$. Let $z$ be a complex affine coordinate on $\mathbb{C} \cong \mathbb{R}^2$ and $\chi \in [0, 2\pi]$ be the periodic coordinate on $S^1$. We will denote by $x$ the pair $(z, \chi)$. The solution corresponding to a Dirac monopole at $x = 0$ is given by
\[
\phi(x) = v + kV(x) \equiv v + k\,\frac{\log(4\pi) - \gamma}{2\pi}
 - \frac{k}{2}\,{\sum_{p=-\infty}^{\infty}}' \left[\frac{1}{\sqrt{|z|^2 + (\chi - 2\pi p)^2}} - \frac{1}{2\pi|p|}\right],
\]
where the prime means that for $p = 0$ the second term in the square brackets must be omitted, and $\gamma$ is Euler's constant. $V(x)$ satisfies the Laplace equation everywhere except $z = 0$, $\chi = 0 \bmod 2\pi$. Near this point $V(x)$ diverges:
\[
V(x) \sim -\frac{1}{2\sqrt{|z|^2 + |\chi|^2}} + O(1).
\]
For large $|z|$ the function $V(x)$ is given by
\[
V(x) \sim \frac{\log|z|}{2\pi} + o(1).
\]
The connection $A$ corresponding to this Higgs field has the following asymptotics for $|z| \to \infty$ (up to a gauge transformation):
\[
A_z \sim \frac{a}{z} + o(1/z), \qquad A_\chi = \frac{k}{2\pi}\arg z + b + o(1).
\]
Here $a$ and $b$ are real constants. For these formulas to define a connection on a $U(1)$ bundle, the parameter $k$ must be an integer. The magnetic charge of a $U(1)$ monopole is defined as the first Chern class of the monopole bundle restricted to the 2-torus $|z| = R$ for sufficiently large $R$. It is easy to see that the magnetic charge of the above monopole is $k$. The Higgs field of a solution describing several periodic Dirac monopoles has the form
\[
\phi(x) = v + \sum_\alpha k_\alpha V(x - x_\alpha). \tag{2}
\]
It is singular at $x = x_\alpha$ and for large $|z|$ behaves as
\[
\phi(x) \sim v + \frac{\log|z|}{2\pi}\sum_\alpha k_\alpha + o(1). \tag{3}
\]
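The two limiting behaviors of $V(x)$ quoted above can be checked numerically from the image sum. The snippet below is a minimal sketch, not part of the original paper: the function name `dirac_potential`, the symmetric truncation at `n_images`, and the sample points are all assumptions made for illustration, and the image sum is taken as $\frac{\log(4\pi)-\gamma}{2\pi} - \frac12\sum'_p[\ldots]$ with $p$ and $-p$ paired.

```python
import numpy as np

def dirac_potential(r, chi, n_images=100_000):
    """Truncated image sum for V(x) at |z| = r and circle coordinate chi.

    The p = 0 image contributes only 1/sqrt(r^2 + chi^2); for p != 0 the
    counterterm 1/(2*pi*|p|) is subtracted, as the primed sum prescribes.
    Images p and -p are paired so the truncation is symmetric.
    """
    p = np.arange(1, n_images + 1)
    paired = (1.0 / np.hypot(r, chi - 2 * np.pi * p)
              + 1.0 / np.hypot(r, chi + 2 * np.pi * p)
              - 1.0 / (np.pi * p))  # two counterterms of 1/(2*pi*p)
    total = 1.0 / np.hypot(r, chi) + paired.sum()
    return (np.log(4 * np.pi) - np.euler_gamma) / (2 * np.pi) - 0.5 * total

# Far from the monopole: compare with log|z| / (2*pi).
far = dirac_potential(50.0, 0.3)
print(far, np.log(50.0) / (2 * np.pi))

# Close to the singularity: compare with -1 / (2*sqrt(|z|^2 + chi^2)).
near = dirac_potential(1e-3, 0.0)
print(near, -1.0 / (2 * 1e-3))
```

The far-field value should agree with $\log|z|/2\pi$ up to the truncation error, since the remaining corrections decay exponentially in $|z|$; near the origin the two printed numbers differ by the $O(1)$ term.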
1.4. Asymptotics of a periodic monopole. It is well known that finite-energy solutions of $SU(2)$ Bogomolny equations on $\mathbb{R}^3$ are exponentially close to the Dirac monopole at large distances [6]. Then it is natural to require that periodic $SU(2)$ monopoles be close to the periodic Dirac monopole at large $|z|$. Accordingly, we will look for solutions of $SU(2)$ Bogomolny equations on $\mathbb{R}^2 \times S^1$ such that outside a compact set $T \subset \mathbb{R}^2 \times S^1$ one has
\[
\begin{aligned}
\phi(x) &\sim g(x)\,\sigma_3\,\phi_D(x)\,g(x)^{-1} + o(1), \\
d_A\phi(x) &\sim g(x)\,\sigma_3\,d\phi_D(x)\,g(x)^{-1} + o(1/|z|), \\
A(x) &\sim g(x)\,\sigma_3\,A_D(x)\,g(x)^{-1} + g(x)\,dg^{-1}(x) + o(1).
\end{aligned} \tag{4}
\]
Here $\sigma_3 = \mathrm{diag}\{1, -1\}$, $g(x)$ is an $SU(2)$-valued function on $(\mathbb{R}^2 \times S^1)\setminus T$, and $\phi_D$ and $A_D$ are a 0-form and a $U(1)$ connection defined by
\[
\phi_D(x) = v + k\,\frac{\log|z|}{2\pi}, \tag{5}
\]
\[
A_D(x) = b + \frac{k}{2\pi}\arg z. \tag{6}
\]
This means that up to terms vanishing at infinity a periodic $SU(2)$ monopole is gauge-equivalent to a periodic Dirac monopole with charge $k$ embedded in a $U(1)$ subgroup of $SU(2)$. The real parameters $v$ and $b$ will often appear in a combination $v + ib$. We will denote this combination $\mathfrak{v}$. Note that we implicitly set the circumference of the circle parameterized by $\chi$ to be $2\pi$. This does not entail a loss of generality, as the Bogomolny equation is invariant with respect to rescalings of the metric on $X$. One has to keep in mind that rescaling the circumference by a factor $\lambda$ requires rescaling the Higgs field $\phi$ by the same factor. Thus the “large circumference limit” is equivalent to the “large $v$ limit”. We will use these terms interchangeably.

The magnetic charge of a periodic monopole is defined in analogy with the case of monopoles on $\mathbb{R}^3$. It follows from Eq. (4) that for large enough $|z|$ the eigenvalues of $\phi$ are distinct (and opposite). Hence for large enough $|z|$ one has a well-defined line bundle $L_+ \subset E$, the eigenbundle of $\phi$ associated with the positive eigenvalue. The magnetic charge can be defined as the first Chern class of $L_+$ restricted to a 2-torus $|z| = R$, where $R$ is large enough. Thus the magnetic charge is given by the formula
\[
m(A,\phi) = \lim_{R\to\infty}\int_{|z|=R}\frac{\mathrm{Tr}(F_A\phi)}{4\pi\|\phi\|}. \tag{7}
\]
It is easy to see that the magnetic charge of a periodic monopole is nonnegative. Substituting the asymptotics Eqs. (4–6) into this formula, one finds that $m(A,\phi) = k$, so $k$ must be nonnegative too. Unlike the case of monopoles on $\mathbb{R}^3$, the energy of a monopole is infinite for $k \neq 0$.
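The substitution of the asymptotics (4–6) into (7) can be sketched as follows. This is an illustrative reading rather than the authors' computation: it assumes that $A_D$ in (6) is the $d\chi$-component of the $U(1)$ connection, that the torus $|z| = R$ is oriented so that $\int d(\arg z)\wedge d\chi = (2\pi)^2$, that $k > 0$ so $\phi_D > 0$ at large $|z|$, and that the gauge function $g(x)$ drops out of the gauge-invariant integrand.

```latex
\begin{align*}
\phi &\simeq \sigma_3\,\phi_D, &
\|\phi\| &= \sqrt{\tfrac{1}{2}\mathrm{Tr}\,\phi^2} \simeq \phi_D, \\
F_A &\simeq \sigma_3\, dA_D, &
dA_D &= \frac{k}{2\pi}\, d(\arg z)\wedge d\chi, \\
m(A,\phi) &= \lim_{R\to\infty}\int_{|z|=R}\frac{2\,\phi_D\, dA_D}{4\pi\,\phi_D}
  = \frac{1}{2\pi}\cdot\frac{k}{2\pi}\cdot(2\pi)^2 = k. &&
\end{align*}
```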
1.5. Nahm transform for periodic monopoles. Let $\Sigma$ be a Riemann surface, $V$ be a unitary vector bundle on $\Sigma$, $\hat A$ be a connection on $V$, and $\Phi$ be a section of $\mathrm{End}(V) \otimes \Omega^{1,0}_\Sigma$. Hitchin equations for $\hat A$, $\Phi$ are the equations
\[
\bar\partial_{\hat A}\Phi = 0, \qquad F_{\hat A} + \frac{i}{4}[\Phi, \Phi^\dagger] = 0,
\]
where the commutator is understood in a graded sense, i.e. $[\Phi, \Phi^\dagger] = \Phi \wedge \Phi^\dagger + \Phi^\dagger \wedge \Phi$. Hitchin equations are the reduction of the self-duality equation to two dimensions. Our main result is that there is a one-to-one correspondence (modulo gauge transformations) between $SU(2)$ periodic monopoles with magnetic charge $k$ and solutions of $U(k)$ Hitchin equations on a cylinder $\mathbb{R} \times S^1$ with the Higgs field growing exponentially at infinity. To describe the asymptotics of the Higgs field more precisely, let us regard $\mathbb{R} \times S^1$ as a strip $0 \le \mathrm{Im}\, s \le 1$ on the complex $s$-plane with the boundaries identified in an obvious manner. Then the Higgs field behaves as follows for $\mathrm{Re}\, s \to \pm\infty$:
\[
\Phi(s) \sim g_\pm(s, \bar s)\, e^{\pm\frac{2\pi s}{k}}\, \mathrm{diag}(1, \omega, \omega^2, \ldots, \omega^{k-1})\, g_\pm(s, \bar s)^{-1}\, ds. \tag{8}
\]
Here $g_\pm(s, \bar s)$ are some (multi-valued) functions with values in $U(k)$, and $\omega$ is a $k$-th root of unity. The curvature of the $U(k)$ connection, on the other hand, approaches zero as $1/|\mathrm{Re}\, s|^{3/2}$ for $\mathrm{Re}\, s \to \pm\infty$. For $k = 1$ it is easy to write down an explicit solution of Hitchin equations with this asymptotics. Then the Nahm transform implies that there exists a periodic $SU(2)$ monopole with $k = 1$. One can also argue that solutions of Hitchin equations exist for all positive $k$, and even describe their moduli space. This implies that periodic monopoles exist for all $k > 0$. It would be interesting to find an explicit formula for the periodic monopole, at least in the $k = 1$ case.
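One way to get a feel for the asymptotic form (8) is to look at the characteristic polynomial of its matrix part at $\mathrm{Re}\, s \to +\infty$. The sketch below assumes that $\omega$ is the primitive root $e^{2\pi i/k}$ (the text only says "a $k$-th root of unity") and uses the fact that conjugation by $g_\pm$ does not change a characteristic polynomial; the function name and sample values are hypothetical. Under these assumptions the polynomial collapses to $z^k - e^{2\pi s}$.

```python
import numpy as np

def asymptotic_char_poly(k, s):
    """Coefficients (highest degree first) of prod_j (z - e^{2 pi s / k} omega^j)."""
    omega = np.exp(2j * np.pi / k)  # assumed primitive k-th root of unity
    eigenvalues = np.exp(2 * np.pi * s / k) * omega ** np.arange(k)
    return np.poly(eigenvalues)

k, s = 4, 0.1 + 0.2j  # illustrative values only
coeffs = asymptotic_char_poly(k, s)

# Expected closed form under the assumption above: z^k - exp(2 pi s).
expected = np.zeros(k + 1, dtype=complex)
expected[0] = 1.0
expected[-1] = -np.exp(2 * np.pi * s)
print(np.allclose(coeffs, expected))
```

All intermediate coefficients cancel because the elementary symmetric functions of the $k$-th roots of unity vanish except for the last one.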
1.6. Outline. The paper is organized as follows. In Sect. 2 we explain the relation between periodic monopoles and N = 2 superYang–Mills theory compactified on a circle. This section requires familiarity with the physics of branes in Type II string theory. The rest of the paper does not depend on it. In Sect. 3 we show that the Nahm transform takes periodic monopoles to solutions of Hitchin equations on a cylinder. In Sect. 4 we explain how to associate algebro-geometric data to a periodic monopole. These data consist of an algebraic curve and a line bundle over it and are important in the study of the Nahm transform. On the other hand, it is well known that to every solution of the Hitchin equations one can associate so-called spectral data also consisting of an algebraic curve and a line bundle. In Sect. 5 we show that the algebro-geometric data associated to the periodic monopole coincide with the spectral data of its Nahm transform. In Sect. 6 we use this information to determine the asymptotic behavior of the solutions of Hitchin equations arising from periodic monopoles. In Sect. 7 we describe the “inverse” Nahm transform which produces a solution of the Bogomolny equation on R2 × S1 from a solution of Hitchin equations on a cylinder. In Sect. 8 we study the asymptotic behavior of the resulting solution of the Bogomolny equation and show that it is given by (4). In Sect. 9 we prove that the composition of the “direct” and “inverse” Nahm transform takes a periodic monopole to a gauge-equivalent periodic monopole. The proof is modelled on that of Schenk [7] and requires rather tedious computations. Another approach to the proof which uses
the spectral sequence technology is sketched in the Appendix. The results of Sects. 3–9 imply that the Nahm transform establishes a one-to-one correspondence between periodic monopoles and solutions of Hitchin equations on a cylinder with a particular asymptotics. In Sect. 10 we give (nonrigorous) arguments that periodic monopoles exist for all $k > 0$ and are (almost) completely determined by their spectral data. Assuming that this is true, we show in Sect. 11 that the centered moduli space of a charge $k$ periodic monopole has dimension $4(k-1)$, and describe a distinguished complex structure on it. We also argue that the centered moduli space carries a natural hyperkähler metric.

2. Periodic Monopoles and Brane Configurations

This section assumes familiarity with the Chalmers–Hanany–Witten-type brane configurations [2–4] and their use in solving quantum gauge theories with eight supercharges. Consider two parallel flat NS5-branes in Type IIB string theory. For definiteness, let us assume that their worldvolumes are given by the equations
\[
x^6 = x^7 = x^8 = x^9 = 0
\]
and
\[
x^6 = v, \qquad x^7 = x^8 = x^9 = 0.
\]
This brane configuration is BPS (preserves sixteen supercharges), and its low-energy dynamics is described by a $d = 6$ supersymmetric Yang–Mills theory with gauge group $U(2)$. Consider now a D3-brane with the worldvolume given by
\[
x^3 = x^4 = x^5 = x^7 = x^8 = x^9 = 0, \qquad 0 \le x^6 \le v.
\]
This is an open D3-brane, in the sense that its worldvolume has boundaries. This is possible because the boundaries lie on the NS5-branes. One can say that such a D3-brane is suspended between the NS5-branes. From the point of view of the Yang–Mills theory describing the NS5-branes, the suspended D3-brane is a static solution of the Yang–Mills equations of motion with a unit magnetic charge [2]. Moreover, since a suspended D3-brane preserves eight supercharges, it is a BPS soliton, and must solve the Bogomolny equation. Similarly, $k$ suspended D3-branes are described in the Yang–Mills theory by a charge $k$ monopole [2].

Let us now compactify the $x^3$ coordinate on a circle of radius $R$, i.e. let $x^3$ take values in $\mathbb{R}/(2\pi R \cdot \mathbb{Z})$ rather than in $\mathbb{R}$. In such a situation we may still consider suspended D3-branes. The same arguments as in the uncompactified case lead one to the conclusion that $k$ D3-branes are described in the Yang–Mills theory by a BPS monopole on $\mathbb{R}^2 \times S^1$ with charge $k$. The $S^1$ has circumference $2\pi R$.

Now let us apply T-duality in the $x^3$ direction. This has the effect of taking us to Type IIA string theory. We will denote the spatial coordinates in Type IIA by $y^1, \ldots, y^9$, so that $y^3$ can be identified with the Fourier dual of $x^3$, while the rest of the $y$ coordinates are identified with the corresponding $x$ coordinates. If we choose the units in which the Regge slope $\alpha'$ is unity, then $y^3$ has period $2\pi/R$. The usual T-duality rules tell us that the Type IIB NS5-branes are mapped under T-duality to the Type IIA NS5-branes with the worldvolumes given by
\[
y^6 = y^7 = y^8 = y^9 = 0
\]
and
\[
y^6 = v, \qquad y^7 = y^8 = y^9 = 0.
\]
A suspended D3-brane is mapped to a D4-brane with the worldvolume given by
\[
y^4 = y^5 = y^7 = y^8 = y^9 = 0, \qquad 0 \le y^6 \le v.
\]
This D4-brane is suspended between the NS5-branes. Such a brane configuration in Type IIA string theory was first studied by E. Witten [4] and subsequently by many other authors. The only difference with [4] is that in our case $y^3$ is a periodic variable. Witten argued that the low-energy dynamics of $k$ suspended D4-branes is described by the $d = 4$ $N = 2$ supersymmetric Yang–Mills theory with gauge group $SU(k)$. The classical gauge coupling of this theory depends on $v$: $1/g_{YM}^2 \sim v/g_{st}$. Thus we are dealing with a four-dimensional super Yang–Mills theory on $\mathbb{R}^3 \times S^1$, where the circumference of $S^1$ is given by $2\pi/R$. In the quantum theory the gauge coupling depends on the renormalization scale $\mu$: $1/g_{YM}^2(\mu) = 1/g_{YM}^2(\mu_0) + k\log(\mu/\mu_0)$. From the string theory viewpoint, taking into account quantum corrections on the D4-brane worldvolume is equivalent to taking into account the back-reaction of the D4-branes on the NS5-branes. This back-reaction results in the bending of the NS5-branes, as a consequence of which the distance between them in the $y^6$ direction starts to depend on $u = y^4 + iy^5$: $\delta x^6(u) = \mathrm{const} + k\,\mathrm{Re}\log u$.

Since D3-branes suspended between NS5-branes in Type IIB string theory are T-dual to D4-branes suspended between NS5-branes in Type IIA string theory, their moduli spaces must coincide (as Riemannian manifolds). The moduli space of the former coincides with the moduli space of $k$ periodic monopoles. The moduli space of the latter is the Coulomb branch of the $d = 4$ $N = 2$ supersymmetric Yang–Mills theory with gauge group $SU(k)$ compactified on a circle of radius $1/R$. Assuming that this correspondence is true, we may predict the dimension of the moduli space of periodic monopoles. As explained in [8], the Coulomb branch of a $d = 4$ $N = 2$ super-Yang–Mills theory on $\mathbb{R}^3 \times S^1$ is a hyperkähler manifold of dimension $4\,\mathrm{rank}(G)$, where $G$ is the gauge group. Thus the moduli space of a periodic monopole of charge $k$ must have real dimension $4(k-1)$.
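The dimension count in the last step uses only the rank of the gauge group. Written out as a short worked example (the symbol $\mathcal{M}_k$ for the moduli space is introduced here for illustration and is not the paper's notation):

```latex
\begin{align*}
\operatorname{rank} SU(k) &= k - 1, \\
\dim_{\mathbb{R}} \mathcal{M}_k &= 4\,\operatorname{rank}\big(SU(k)\big) = 4(k-1), \\
k = 2:\quad \dim_{\mathbb{R}} \mathcal{M}_2 &= 4 .
\end{align*}
```

The $k = 2$ value is consistent with the statement in Sect. 1.2 that this moduli space is a deformation of the Atiyah–Hitchin manifold, which is four-dimensional.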
A particular case of this correspondence has been known for some time from the work of Chalmers and Hanany [2]. These authors showed that the centered moduli space of $k$ monopoles on $\mathbb{R}^3$ coincides with the Coulomb branch of the $d = 3$ $N = 4$ supersymmetric Yang–Mills theory with gauge group $SU(k)$ on $\mathbb{R}^3$. This statement follows from ours in the limit $R \to \infty$. In this limit monopoles on $\mathbb{R}^2 \times S^1$ reduce to ordinary monopoles on $\mathbb{R}^3$. On the other hand, the radius of the dual circle goes to zero, and therefore the $d = 4$ $N = 2$ super-Yang–Mills theory undergoes Kaluza–Klein reduction to the $d = 3$ $N = 4$ super-Yang–Mills theory.

We pause here to explain one subtlety in the above arguments. The Coulomb branch of the $d = 3$ super-Yang–Mills theory with gauge group $SU(k)$ is related to the centered monopole moduli space [2, 3], while the Coulomb branch of the $d = 4$ gauge theory on a circle appears to be related to the uncentered moduli space of periodic monopoles. If this were the case, we would not get an exact agreement between the two statements in the limit $R \to \infty$. In fact, when considering periodic monopoles, one is forced to fix their center-of-mass if one wants to get a well-defined metric on the moduli space. The reason is that the translational zero modes of a periodic monopole are not normalizable.
This is explained in more detail in Sect. 11. In this way the contradiction is avoided. (The fact that the translational zero modes for suspended D4-branes are not normalizable was explained from the string theory point of view in [4]. This “freezing out” of the center-of-mass motion is the ultimate reason why the suspended D4-branes are described by an $SU(k)$ rather than $U(k)$ gauge theory.)

Another interesting limit is $R \to 0$. In this limit the circle on which the $d = 4$ $N = 2$ super-Yang–Mills theory is compactified becomes arbitrarily large, while the monopole interpretation loses meaning. The Coulomb branch of this theory with all quantum corrections has been determined in [9, 10]. It is a special Kähler manifold of real dimension $2(k-1)$. (Note that the dimension of the Coulomb branch jumps by a factor of two as soon as one compactifies one dimension on a circle. The reason for this is explained in [8].) The simplest way to derive the answer uses the Type IIA brane configuration with suspended D4-branes described above [4]. One notices that the metric on the Coulomb branch does not depend on the string coupling if $g_{YM}$ is kept fixed, so one can consider the limit $g_{st} \to \infty$, $v \to \infty$. In this limit Type IIA string theory reduces to $d = 11$ supergravity, and the configuration with D4-branes suspended between two NS5-branes turns into a single smooth M5-brane. The metric on the Coulomb branch with all quantum corrections taken into account can be obtained by a classical computation with an M5-brane.

It would certainly be nice if the quantum Coulomb branch of the compactified theory could also be determined by a classical computation in $d = 11$ supergravity. However, it is easy to see that this is not the case. The reason is that upon compactification on a circle there appear new kinds of instantons in the gauge theory, namely virtual BPS monopoles and dyons whose worldlines wrap the compactified circle.
In a strongly coupled string theory such effects are captured by membrane instantons. These instantons are represented by Euclidean open M2-branes whose boundaries lie on the M5-brane. Clearly, directly summing up all such instantons is a hopeless task. Nevertheless, one can give a “classical” recipe for computing the complete quantum Coulomb branch of the compactified super-Yang–Mills theory by exploiting the correspondence with periodic monopoles. Computing the metric on the moduli space of periodic monopoles is a well-defined problem which appears much simpler than summing up membrane instantons. One could hope to determine this metric using twistor methods, similarly to how it has been done for ordinary monopoles. Alternatively, one could apply the Nahm transform to make the problem more manageable. Below we show that the Nahm transform of a periodic monopole is described by Hitchin equations on C∗ . These equations are somewhat simpler than the original Bogomolny equation. The properties of the moduli space of periodic monopoles will be studied in detail in a forthcoming publication [1].
3. From Periodic Monopoles to Solutions of Hitchin Equations

In this section we show that the Nahm transform associates to every periodic $SU(2)$ monopole with charge $k$ a solution of $U(k)$ Hitchin equations on a cylinder. We follow [11] where the Nahm transform for instantons on $T^4$ is discussed. In fact, periodic monopoles can be regarded as a limiting case of instantons on $T^4$ invariant with respect to a subgroup of translations. Another closely related work is [12], where the Nahm transform for instantons on $\mathbb{R}^2 \times T^2$ is studied. We will use many of the techniques of [12] and [13].
Let the pair $(A, \phi)$ be a periodic $SU(2)$ monopole with asymptotics (4). Let $S$ be the spinor bundle on $X = \mathbb{R}^2 \times S^1$. This means that $S$ is a trivial unitary rank 2 bundle on $X$ equipped with an injective bundle morphism $\sigma : T^*X \to S \otimes S^*$ which is Hermitian and has zero trace. By a change of trivialization, one can always bring $\sigma$ to the standard form $\sigma(dx_j) = \sigma_j$, $j = 1, 2, 3$, where $\sigma_j$ are the Pauli matrices. Let $L$ be a trivial unitary line bundle on $X$ with a flat unitary connection $a$ whose monodromy around $S^1$ is $\exp(-2\pi i t)$, $t \in \mathbb{R}/\mathbb{Z}$ (these conditions define a unique connection). Consider a Dirac-type operator $D : E \otimes S \otimes L \to E \otimes S \otimes L$ of the form
\[
D = \sigma \cdot d_{A+a} - (\phi - r). \tag{9}
\]
We will be interested in its $L^2$ kernel and cokernel. Using the fact that the norm of the Higgs field $\phi$ grows logarithmically at infinity, one can show that $D$ is Fredholm for any $(r, t) \in \mathbb{R} \times \mathbb{R}/\mathbb{Z}$. Thus its $L^2$-index is independent of $r, t$. As explained in the end of this section, the index is equal to the negative of the magnetic charge $k$. The Weitzenböck formula for $D$ reads:
\[
D^\dagger D = -\nabla^2_{A+a} + (\phi - r)^2 + \sigma \cdot (d_A\phi - *F_A). \tag{10}
\]
This formula together with the Bogomolny equation imply that $D^\dagger D$ is a positive-definite operator, and therefore $D$ has a trivial $L^2$ kernel. It follows that $\mathrm{Ker}\, D^\dagger$ is a rank $k$ trivial bundle over the $(r, t)$-plane. Actually, since $t$ is a periodic variable, we get a rank $k$ bundle over a cylinder $\hat X = \mathbb{R} \times S^1 \cong \mathbb{C}^*$. This trivial bundle will be denoted $\hat E$. From the growth of $\phi$ at infinity it follows that for all $s = r + it$ the elements of $\mathrm{Ker}\, D^\dagger$ decay at least exponentially. Thus for all $s \in \mathbb{C}$ we have a well-defined Hermitian inner product on $\hat E_s$. If we choose a basis $\psi_1(x, s), \ldots, \psi_k(x, s)$, $x \in X$, of $\mathrm{Ker}\, D^\dagger$ at point $s$, then the explicit formula for the inner product is
\[
\langle \psi_\alpha, \psi_\beta \rangle = \int \psi_\alpha(x, s)^\dagger\, \psi_\beta(x, s)\, d^3x.
\]
This inner product makes $\hat E$ into a unitary bundle. Below it will be assumed that the vectors $\psi_\alpha$, $\alpha = 1, \ldots, k$, are chosen to form an orthonormal basis of $\mathrm{Ker}\, D^\dagger$ for all $s$.

Next we want to define a connection $\hat A$ on $\hat E$ and a Higgs field $\hat\phi \in \Gamma(\mathrm{End}(\hat E))$. The Higgs field at a point $s \in \hat X$ is a linear map from $\hat E_s$ to $\hat E_s$. We define this map as a composition of two maps: multiplication by $z$ and projection to $\hat E_s$. An explicit formula for $\hat\phi$ in an orthonormal basis is
\[
\hat\phi(s)_{\alpha\beta} = \int \psi_\alpha(x, s)^\dagger\, z\, \psi_\beta(x, s)\, d^3x. \tag{11}
\]
Since all $\psi_\alpha$ decay at infinity faster than any power of $z$, this is well-defined. The connection $\hat A$ on $\hat E$ is induced by the zero connection on the trivial infinite-dimensional bundle whose fiber at a point $s \in \hat X$ consists of all smooth $L^2$ sections of $E \otimes S \otimes L$. In components:
\[
\hat A_s(s)^{\alpha}{}_{\beta}\, ds = i \int \psi_\alpha(x, s)^\dagger\, ds\, \frac{\partial}{\partial s}\psi_\beta(x, s)\, d^3x. \tag{12}
\]
It is easy to see that $\hat A$ is a unitary connection on $\hat E$. As for $\hat\phi \in \Gamma(\mathrm{End}(\hat E))$, it is not Hermitian, unlike its counterpart $\phi \in \Gamma(\mathrm{End}(E))$.
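Now we will show that $\hat A$ and $\hat\phi$ satisfy Hitchin equations. We will need the following commutation relations:
\[
\begin{aligned}
{}[D, z] &= \sigma_+, & [D, \bar z] &= \sigma_-, & [D, \partial] &= -p_+, & [D, \bar\partial] &= -p_-, \\
{}[D^\dagger, z] &= -\sigma_+, & [D^\dagger, \bar z] &= -\sigma_-, & [D^\dagger, \partial] &= -p_-, & [D^\dagger, \bar\partial] &= -p_+.
\end{aligned} \tag{13}
\]
Here $\partial = \partial/\partial s$, $\bar\partial = \partial/\partial\bar s$, $\sigma_\pm = \sigma_1 \pm i\sigma_2$, $p_\pm = \frac{1}{2}(1 \pm \sigma_3)$. We will denote the projector to $\mathrm{Ker}\, D^\dagger$ by $P$. Its explicit form is
\[
P = 1 - D(D^\dagger D)^{-1}D^\dagger.
\]
The projector to the orthogonal complement of $\mathrm{Ker}\, D^\dagger$ will be denoted by $Q$:
\[
Q = D(D^\dagger D)^{-1}D^\dagger.
\]
First let us compute $\bar\partial_{\hat A}\hat\phi$:
\[
\begin{aligned}
\bar\partial_{\hat A}\hat\phi &= [P\bar\partial, Pz] \\
&= \left(P\bar\partial(1 - Q)z - Pz(1 - Q)\bar\partial\right) \\
&= P(zQ\bar\partial - \bar\partial Qz).
\end{aligned}
\]
Using the identity $PD = 0$, and keeping in mind that $\bar\partial_{\hat A}\hat\phi$ should be thought of as acting on $\mathrm{Ker}\, D^\dagger$ from the right, we can rewrite this expression as follows:
\[
\begin{aligned}
\bar\partial_{\hat A}\hat\phi &= P\left([z, D](D^\dagger D)^{-1}[D^\dagger, \bar\partial] - [\bar\partial, D](D^\dagger D)^{-1}[D^\dagger, z]\right) \\
&= P\left(\sigma_+(D^\dagger D)^{-1}p_+ + p_-(D^\dagger D)^{-1}\sigma_+\right).
\end{aligned}
\]
To go from the first line to the second line we used the commutation relations (13). The Weitzenböck formula (10) tells us that $D^\dagger D$ commutes with all $\sigma_j$, $j = 1, 2, 3$, and since $p_-\sigma_+ = \sigma_+p_+ = 0$, we get the “complex” Hitchin equation
\[
\bar\partial_{\hat A}\hat\phi = 0. \tag{14}
\]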
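The vanishing used in the last step is a two-by-two matrix fact. Written out explicitly, assuming the standard Pauli matrices (the text fixes $\sigma_j$ only up to a change of trivialization), it reads:

```latex
\[
\sigma_+ = \sigma_1 + i\sigma_2 = \begin{pmatrix} 0 & 2 \\ 0 & 0 \end{pmatrix},\quad
p_+ = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix},\quad
p_- = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}
\;\Longrightarrow\;
\sigma_+ p_+ = 0,\qquad p_-\sigma_+ = 0,
\]
\[
P\left(\sigma_+ (D^\dagger D)^{-1} p_+ + p_- (D^\dagger D)^{-1}\sigma_+\right)
 = P\,(D^\dagger D)^{-1}\left(\sigma_+ p_+ + p_-\sigma_+\right) = 0 .
\]
```

The second line relies on $(D^\dagger D)^{-1}$ commuting with every $\sigma_j$, which is where the Bogomolny equation enters through (10).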
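The curvature $\hat F$ of the connection $\hat A$ is given by
\[
\hat F = i[P\partial, P\bar\partial]\, ds \wedge d\bar s.
\]
We can simplify this as follows:
\[
\begin{aligned}
\hat F &= iP\left(\partial Q\,\bar\partial Q - \bar\partial Q\,\partial Q\right) ds \wedge d\bar s \\
&= iP\left(\partial D\,(D^\dagger D)^{-1}\,\bar\partial D^\dagger - \bar\partial D\,(D^\dagger D)^{-1}\,\partial D^\dagger\right) ds \wedge d\bar s \\
&= iP\left(p_+(D^\dagger D)^{-1}p_+ - p_-(D^\dagger D)^{-1}p_-\right) ds \wedge d\bar s \\
&= iP(D^\dagger D)^{-1}\sigma_3\, ds \wedge d\bar s.
\end{aligned} \tag{15}
\]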
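Here we again used the commutation relations (13) and the fact that $D^\dagger D$ commutes with all $\sigma_j$. On the other hand, let us compute the commutator $[\hat\phi, \hat\phi^\dagger]$:
\[
\begin{aligned}
{}[\hat\phi, \hat\phi^\dagger] &= [Pz, P\bar z] \\
&= (P\bar z Qz - PzQ\bar z) \\
&= P\left([\bar z, D](D^\dagger D)^{-1}[D^\dagger, z] - [z, D](D^\dagger D)^{-1}[D^\dagger, \bar z]\right) \\
&= P\left(\sigma_-(D^\dagger D)^{-1}\sigma_+ - \sigma_+(D^\dagger D)^{-1}\sigma_-\right) \\
&= -4P(D^\dagger D)^{-1}\sigma_3.
\end{aligned}
\]
Comparing with the expression for the curvature of $\hat A$, we obtain the “real” Hitchin equation:
\[
\hat F_{s\bar s} + \frac{i}{4}[\hat\phi, \hat\phi^\dagger] = 0. \tag{16}
\]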
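The spinor algebra behind (15) and (16) can be checked in a finite-dimensional toy model. In the sketch below, a random positive-definite matrix `G0` stands in for $(D^\dagger D)^{-1}$, and the Kronecker product makes it act trivially on the spinor index so that it commutes with all $\sigma_j$. The stand-in `G0`, the size `n`, and the omission of the overall projector $P$ are assumptions made for illustration; this is not a discretization of the actual operator.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
s1 = np.array([[0, 1], [1, 0]], dtype=complex)
s2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
s3 = np.array([[1, 0], [0, -1]], dtype=complex)
sp, sm = s1 + 1j * s2, s1 - 1j * s2        # sigma_+ and sigma_-
pp, pm = (I2 + s3) / 2, (I2 - s3) / 2      # p_+ and p_-

n = 5                                       # size of the toy "function space"
rng = np.random.default_rng(0)
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
G0 = np.linalg.inv(M.conj().T @ M + np.eye(n))   # stand-in for (D^dagger D)^{-1}
G = np.kron(G0, I2)                         # acts as the identity on spinors

def lift(sigma):
    """Embed a 2x2 spinor matrix so that it acts only on the spinor index."""
    return np.kron(np.eye(n), sigma)

# Curvature combination from (15): p_+ G p_+ - p_- G p_- = G sigma_3.
curv = lift(pp) @ G @ lift(pp) - lift(pm) @ G @ lift(pm)
# Commutator combination: sigma_- G sigma_+ - sigma_+ G sigma_- = -4 G sigma_3.
comm = lift(sm) @ G @ lift(sp) - lift(sp) @ G @ lift(sm)
# Left-hand side of (16) with the common projector P stripped off.
residual = 1j * curv + (1j / 4) * comm
print(np.abs(residual).max())
```

Replacing `G` by a matrix that does not commute with the lifted $\sigma_j$ makes the residual nonzero, which illustrates why the Bogomolny equation is needed.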
To make the last equation covariant with respect to diffeomorphisms of $\hat X$ one should think of $\Phi = \hat\phi\, ds$ as a section of $\mathrm{End}(\hat E) \otimes \Omega^{1,0}_{\hat X}$. Then the “real” Hitchin equation takes the form
\[
\hat F + \frac{i}{4}[\Phi, \Phi^\dagger] = 0, \tag{17}
\]
where the commutator is understood in the graded sense. Following [14], we can associate to any solution of Hitchin equations an algebraic curve $C$. In the present case the curve is a hypersurface in $\mathbb{C} \times \mathbb{C}^*$ defined by the equation
\[
\det(z - \hat\phi(s)) = 0. \tag{18}
\]
Here $z$ is an affine parameter on $\mathbb{C}$, while $s$ parameterizes $\hat X \cong \mathbb{C}^*$. The left-hand side of the above equation is a polynomial in $z$ of degree $k$, and it follows from the “complex” Hitchin equation that its coefficients are holomorphic functions on $\mathbb{C}^*$. This shows that the above equation defines an algebraic curve which is noncompact and is a $k$-fold cover of $\mathbb{C}^*$. The eigenvectors of $\hat\phi$ obviously form a sheaf $N$ on $C$ whose stalk at a general point is one-dimensional. The direct image of $N$ under the projection map $\pi : C \to \mathbb{C}^*$ is the bundle $\hat E$. We will call the pair $(C, N)$ the spectral data of a Hitchin pair $(\hat A, \hat\phi)$, and refer to $C$ as the Hitchin spectral curve. For a general Hitchin pair the curve $C$ is nonsingular. If this is the case, then the sheaf $N$ is a line bundle. Indeed, since $\pi_*(N)$ is a vector bundle, $N$ is a torsion free sheaf, hence a subsheaf of a locally free sheaf. But any subsheaf of a locally free sheaf on a smooth algebraic curve is locally free (this follows from the fact that a nonsingular curve has cohomological dimension one). Thus $N$ must be a line bundle.

Finally, let us justify the assertion that $\mathrm{Ind}\, D = -k$. The index can be computed using the heat kernel method. Alternatively one may use the approach of Callias [15] who computed the index of a Dirac-type operator on $\mathbb{R}^{2n+1}$ for all $n$. One can check that the proof goes through for $\mathbb{R}^2 \times S^1$. Either way, we find:
\[
\mathrm{Ind}\, D = -\lim_{R\to\infty}\int_{|z|=R}\frac{\mathrm{Tr}\left(*(d_A\phi)\,\phi\right)}{4\pi\|\phi\|}
 = -\lim_{R\to\infty}\frac{1}{2\pi}\int_{|z|=R}\frac{\partial\|\phi\|}{\partial r}\, d(\arg z) \wedge d\chi.
\]
Thus $\mathrm{Ind}\, D = -m(A, \phi) = -k$. Below we will compute the index in another way, which also provides some information on the spatial structure of the zero modes.

4. Spectral Data of a Periodic Monopole

In the previous section we showed that the Nahm transform of a charge $k$ periodic monopole is a pair $(\hat A, \hat\Phi)$, where $\hat A$ is a connection on a trivial rank $k$ bundle $\hat E$ over $\hat X = \mathbb{R} \times S^1$, $\hat\Phi$ is a section of $\mathrm{End}(\hat E) \otimes \Omega^{1,0}_{\hat X}$, and the pair $(\hat A, \hat\Phi)$ satisfies the Hitchin equations (14), (16). Since $\hat X$ is noncompact, it is important to determine the behavior of the pair $(\hat A, \hat\Phi)$ at $r = \pm\infty$. The simplest way to do this uses an algebraic curve associated to the periodic monopole. In this section we explain how to construct this curve and a line bundle over it. These algebro-geometric data associated to a periodic monopole will be called the monopole spectral data.

Let $B$ be a (nonunitary) connection on $E$ defined by $B(x) = A(x) - i\phi(x)\,d\chi$. Let $\zeta \in \mathbb{C}$. Consider a loop $\gamma_\zeta : S^1 \to X$ given by
$$\gamma_\zeta : u \mapsto (z(u), \chi(u)) = (\zeta, u), \qquad u \in \mathbb{R}/2\pi\mathbb{Z}.$$
We denote the value of $B(\gamma_\zeta(u))$ on the vector $\partial/\partial u$ by $B_u$. Suppose we want to compute the holonomy of $B$ along $\gamma_\zeta$. To do this, we must solve the matrix equation
$$\left(\frac{d}{du} - iB_u\right) V(\zeta, u) = 0 \tag{19}$$
with the initial condition $V(\zeta, 0) = 1_{2\times 2}$. The holonomy is equal to $V(\zeta, 2\pi)$. Note now that the Bogomolny equation implies
$$\left[\partial_{\bar z} - iA_{\bar z}(\zeta, u),\ \frac{d}{du} - iB_u\right] = -iF_{\bar z\chi} - \big(\partial_{\bar z}\phi - i[A_{\bar z}, \phi]\big)\big|_{z=\zeta} = 0.$$
Hence the commutator
$$W(\zeta, u) = \big[\partial_{\bar z} - iA_{\bar z},\ V(z, u)\big]\big|_{z=\zeta}$$
also satisfies the differential equation (19). On the other hand, since $V(\zeta, 0) = 1_{2\times 2}$ for all $\zeta$, $W(\zeta, 0) = 0$ for all $\zeta$. Equation (19) being first order, this means that $W(\zeta, u) = 0$ for all $\zeta \in \mathbb{C}$ and $u \in \mathbb{R}$. Recalling the definition of $W$, we see that the characteristic polynomial of $V(z, u)$ is a holomorphic function of $z$ for any $u$. Hence the function $F(w, z) = \det(w - V(z, 2\pi))$ is a holomorphic function of both $z$ and $w$. It is also easy to see that $F(w, z)$ is gauge-invariant and independent of the choice of origin on the circle parameterized by $\chi$. We define the spectral curve $\mathbf{S}$ of a periodic monopole to be the zero set of $F(w, z)$, i.e. $\mathbf{S}$ is an algebraic curve in $\mathbb{C}^2$ given by
$$\det(w - V(z, 2\pi)) = 0. \tag{20}$$
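The construction of $V(\zeta, 2\pi)$ from Eq. (19) can be illustrated numerically. The following is only a sketch under stated assumptions: the matrix function `sample_b_u` is a hypothetical traceless, non-Hermitian stand-in for $B_u$ (it does not come from an actual monopole), and the fourth-order Runge-Kutta integrator is a generic choice. It integrates $dV/du = iB_uV$ over the loop and forms the coefficients of $F(w, z) = w^2 - w\,\mathrm{Tr}\,V + \det V$ at one fixed $\zeta$.

```python
import math

def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]

def combine(a, b, c):
    """Entrywise a + c*b for 2x2 matrices."""
    return [[a[i][j] + c * b[i][j] for j in range(2)] for i in range(2)]

def sample_b_u(u):
    """Hypothetical traceless B_u along the loop (illustration only)."""
    d = 0.3 + 0.1j
    return [[d, 0.2 * math.cos(u)], [0.5j, -d]]

def holonomy(b_u, steps=2000):
    """RK4 integration of dV/du = i B_u(u) V on [0, 2*pi], V(0) = identity."""
    h = 2 * math.pi / steps
    v = [[1 + 0j, 0j], [0j, 1 + 0j]]
    zero = [[0j, 0j], [0j, 0j]]

    def rhs(u, m):
        # Right-hand side i * B_u(u) * m of Eq. (19).
        return combine(zero, matmul(b_u(u), m), 1j)

    for n in range(steps):
        u = n * h
        k1 = rhs(u, v)
        k2 = rhs(u + h / 2, combine(v, k1, h / 2))
        k3 = rhs(u + h / 2, combine(v, k2, h / 2))
        k4 = rhs(u + h, combine(v, k3, h))
        for k, c in ((k1, h / 6), (k2, h / 3), (k3, h / 3), (k4, h / 6)):
            v = combine(v, k, c)
    return v

V = holonomy(sample_b_u)
trace_v = V[0][0] + V[1][1]
det_v = V[0][0] * V[1][1] - V[0][1] * V[1][0]
# F(w, z) = w**2 - w * trace_v + det_v at this value of zeta.
print(trace_v, det_v)
```

For a traceless $B_u$ the determinant of the holonomy stays equal to one along the loop, so only the trace carries information about the curve at the chosen $\zeta$.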
Nahm Transform for Periodic Monopoles and N = 2 Super Y–M Theory
Since both $\phi$ and $A$ are traceless, $\det V(z, 2\pi) = 1$. It follows that $\mathbf{S}$ does not have common points with the set $w = 0$ in $\mathbb{C}^2$, and therefore may be regarded as a complete curve in $\mathbb{C} \times \mathbb{C}^*$, where $\mathbb{C}^*$ is the complex $w$-plane with the origin removed.

Let us examine the curve $\mathbf{S}$ more closely. Since we are dealing with $SU(2)$ monopoles, the equation of $\mathbf{S}$ is really
$$w^2 - w\,\mathrm{Tr}\,V(z, 2\pi) + 1 = 0, \tag{21}$$
i.e. $\mathbf{S}$ is a double cover of the $z$-plane. One can also show that $\mathrm{Tr}\,V(z, 2\pi)$ is a degree $k$ polynomial in $z$. Indeed, we already know that $\mathrm{Tr}\,V(z, 2\pi)$ is an entire function of $z$. Its behavior for large $z$ can be computed from the known behavior of $A$ and $\phi$ described by (4). This yields
$$\mathrm{Tr}\,V(z, 2\pi) = z^k \exp(2\pi\mathfrak{v})\,(1 + o(1)), \tag{22}$$
with $\mathfrak{v} = v + ib$. Since the function $\mathrm{Tr}\,V(z, 2\pi)$ is entire and bounded by a multiple of $|z|^k$ for large $|z|$, it must be a polynomial of degree $k$. The leading coefficient of this polynomial is determined by the asymptotic conditions imposed on the monopole (i.e. by $b$ and $v$), while the remaining $k$ coefficients are the moduli of the periodic monopole.

A periodic monopole also provides us with a coherent sheaf $M$ on $\mathbf{S}$, namely the sheaf of eigenvectors of $V(z, 2\pi)$. The stalk of $M$ at a general point is one-dimensional. The direct image of $M$ under the projection map $\pi : \mathbf{S} \to \mathbb{C}$ is of course the bundle $E$ restricted to $\chi = 0$. We will call the pair $(\mathbf{S}, M)$ the spectral data of a periodic monopole. For a general monopole the curve $\mathbf{S}$ is nonsingular. If this is the case, then $M$ is a line bundle. The reasoning leading to this conclusion is the same as for the Hitchin spectral data.

A periodic monopole with charge $k$ can be thought of as consisting of $k$ monopoles of charge 1. With the help of the spectral curve one may suggest a precise definition of the location of these constituent monopoles on $\mathbb{C}$. These are the points where the holonomy $V(z, 2\pi)$ has an eigenvalue 1, i.e. the roots of the equation $\mathrm{Tr}\,V(z, 2\pi) = 2$. Since $\mathrm{Tr}\,V(z, 2\pi)$ is a polynomial of degree $k$, for a generic monopole this equation has $k$ distinct roots $\zeta_1, \ldots, \zeta_k$. We expect that when these points are well-separated, the energy density is concentrated in their neighborhood.

If we assume that the curve $\mathbf{S}$ is nonsingular, then at $z = \zeta_\alpha$ the Jordan normal form of $V(z, 2\pi)$ is
$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}.$$
This implies that at $z = \zeta_\alpha$ the holonomy $V(z, 2\pi)$ has a single eigenvector with eigenvalue one. In other words, if we consider the restriction of $E$ to the $S^1$ given by $z = \zeta_\alpha$, and equip it with a (nonunitary) connection $B$, then this bundle has a covariantly constant section unique up to a scalar multiplication.
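The proposed locations of the constituent monopoles are easy to compute once $\mathrm{Tr}\,V(z, 2\pi)$ is known. As a minimal sketch for charge $k = 2$, assume a hypothetical polynomial $\mathrm{Tr}\,V(z, 2\pi) = a z^2 + b_1 z + c$ with $a = \exp(2\pi\mathfrak{v})$; the numerical values of $\mathfrak{v}$, $b_1$ and $c$ below are invented for illustration and are not taken from any actual monopole.

```python
import cmath
import math

def constituent_locations(a, b1, c):
    """Roots of a*z**2 + b1*z + c = 2, i.e. points where Tr V(z, 2*pi) = 2."""
    disc = cmath.sqrt(b1 * b1 - 4 * a * (c - 2))
    return [(-b1 + disc) / (2 * a), (-b1 - disc) / (2 * a)]

frak_v = 0.1 + 0.05j                  # hypothetical asymptotic parameter v + i b
a = cmath.exp(2 * math.pi * frak_v)   # leading coefficient exp(2*pi*frak_v)
b1, c = 0.3 - 0.2j, 1.5 + 0.4j        # hypothetical moduli

zetas = constituent_locations(a, b1, c)
residuals = [abs(a * z * z + b1 * z + c - 2) for z in zetas]
print(zetas)
```

At each root Eq. (21) reduces to $(w - 1)^2 = 0$, which is the double eigenvalue 1 behind the Jordan block described above.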
On the other hand, for other values of $\zeta$ the holonomy of $B$ has both eigenvalues distinct from 1, and the restriction of $E$ to the circle does not have sections covariantly constant with respect to $B$. This elementary observation plays an important role in the next section.

5. Coincidence of the Spectral Data

The purpose of this section is to demonstrate that the two kinds of spectral data defined in Sects. 3 and 4 coincide. Recall that starting from a periodic monopole $(A, \phi)$ twisted by $s = r + it \in \mathbb{C}$ we defined a unitary bundle $\hat E$ on $\mathbb{C}^*$, formed by zero-modes of the twisted Dirac operator $D^\dagger$, as well as a unitary connection on $\hat E$, and a Higgs field $\hat\phi \in \Gamma(\mathrm{End}(\hat E))$. The Higgs field $\hat\phi(s)$ was defined as a composition of multiplication by the affine coordinate $z$ on $\mathbb{C} \cong \mathbb{R}^2$ and projection to $\mathrm{Ker}(D^\dagger)$.

The coincidence of the spectral curves $\mathbf{C}$ and $\mathbf{S}$ is equivalent to the following statement: if $\zeta$ is an eigenvalue of the transformed Higgs field $\hat\phi$ at a point $s = \sigma$, then $e^{2\pi\sigma}$ is an eigenvalue of the holonomy of $B = A - i\phi\, d\chi$ around the loop $\gamma_\zeta$ which winds around the $S^1$ at $z = \zeta$. This is the statement that will be proved below. We will also show that the zero modes of the Dirac operator $D^\dagger$ are in one-to-one correspondence with the points $\zeta \in \mathbb{C}$ such that the restriction of $E$ to the circle $z = \zeta$ has a covariantly constant section (with respect to the connection $B$). As explained in the previous section, for a general monopole there are $k$ such points, so we see again that $\dim \mathrm{Ker}\, D^\dagger = k$.
5.1. Cohomological description of the Nahm transform. We proceed to reformulate the Nahm transform of Sect. 3 in cohomological terms. The benefits of such a reformulation will become apparent shortly. In particular the cohomological definition of the transformed Higgs field $\hat\phi$ is extremely simple.

Let us denote by $\Omega^{0,1}(X, E)$ the bundle on $X = \mathbb{R}^2 \times S^1$ whose sections have the form $f\, d\bar z + g\, d\chi$, where $f, g \in \Gamma(E)$. $\Omega^{0,2}(X, E)$ will denote the bundle whose sections have the form $f\, d\bar z \wedge d\chi$, where $f \in \Gamma(E)$. The bundles $\Omega^{0,1}(X, E)$ and $\Omega^{0,2}(X, E)$ are subbundles of the bundles of $E$-valued differential forms $\Omega^1(X, E)$ and $\Omega^2(X, E)$, respectively. Their names betray their origin in the Hodge decomposition of forms on $\mathbb{C}^2$. For uniformity of notation, we also set $\Omega^{0,0}(X, E) = \Gamma(E)$. Pursuing this analogy, we can identify spinor bundles $S^+(E)$ and $S^-(E)$ as follows:
$$S^+(E) = \Omega^{0,0}(X, E) \oplus \Omega^{0,2}(X, E), \qquad S^-(E) = \Omega^{0,1}(X, E). \tag{23}$$
To any trivial vector bundle $E$ on $X$ we can associate a locally free sheaf of vector spaces defined in the following way. Over the whole $X$ its space of sections is the space of smooth global sections of $E$ which belong to the Schwartz space (i.e. all of their derivatives decay faster than any negative power of $|z|$). Over any open set $O \subset X$ its space of sections is obtained by restriction from $X$. In what follows we will identify a trivial vector bundle on $X$ and the corresponding sheaf.

Let us define differentials
$$\bar D_p = d\bar z \wedge 2\left(\frac{\partial}{\partial\bar z} - iA_{\bar z}\right) + d\chi \wedge \left(\frac{\partial}{\partial\chi} - iA_\chi - \phi + s\right), \tag{24}$$
acting from $\Omega^{0,p}(X, E)$ to $\Omega^{0,p+1}(X, E)$, $p = 0, 1$. Note that $\bar D_1 \bar D_0 = 0$, as a consequence of the Bogomolny equation. Thus we have a differential complex $K$:
$$K : \quad 0 \to \Omega^{0,0} \xrightarrow{\ \bar D_0\ } \Omega^{0,1} \xrightarrow{\ \bar D_1\ } \Omega^{0,2} \to 0. \tag{25}$$
Since the operators $\bar D_p$ depend on $e^{2\pi s} \in \mathbb{C}^*$, so does the complex $K$, and it would be more precise to call it $K_s$. We will omit the subscript $s$ where this cannot lead to confusion.
Since all the bundles we are dealing with are trivial, we are free to identify $S \otimes E \cong S^+(E) \cong S^-(E)$. Then the twisted Dirac operator $D : S \otimes E \to S \otimes E$ becomes simply $D = \bar D_0 - \bar D_1^*$, and its adjoint $D^\dagger : S^- \to S^+$ is $\bar D_0^* - \bar D_1$. As explained in Sect. 3, the only $L^2$ solution of the equation $D\psi = 0$, $\psi \in \Gamma(S \otimes E)$, is the trivial one. In other words the equation
$$\bar D_0 \Psi - \bar D_1^* \Psi = 0, \qquad \Psi \in \Omega^{0,0}(X, E) \oplus \Omega^{0,2}(X, E),$$
has only the trivial $L^2$ solution. It follows that the complex (25) is exact in the first and the third terms: $H^0(K) = H^2(K) = 0$.

We want to show that $H^1(K)$ is isomorphic to the kernel of the twisted Dirac operator $D^\dagger$. In one direction this is easy: for any $\psi \in \mathrm{Ker}\, D^\dagger$ we have $\bar D_1\psi = \bar D_0^*\psi = 0$, and therefore $\psi$ is a harmonic representative of a class in $H^1(K)$. It is obvious that this map from $\mathrm{Ker}\, D^\dagger$ to $H^1(K)$ is injective. The inverse map is constructed as follows. For any representative $\theta$ of a class $[\theta] \in H^1(K)$ we have to find $\rho \in \Omega^{0,0}(X, E)$ such that
$$(\bar D_0^* - \bar D_1)(\theta + \bar D_0\rho) = 0.$$
Since $H^0(K)$ is trivial, the kernel of the operator $\bar D_0^*\bar D_0$ is trivial and the operator itself is invertible. Thus we may solve for the function $\rho$:
$$\rho = -(\bar D_0^*\bar D_0)^{-1}\bar D_0^*\theta. \tag{26}$$
This yields a map from $H^1(K)$ to $\mathrm{Ker}\, D^\dagger$. It is easy to see that it is the inverse of the map from $\mathrm{Ker}\, D^\dagger$ to $H^1(K)$ constructed above. Since $\bar D_p$ and multiplication by $z$ commute, the action of $\hat\phi$ on $H^1(K)$ is simply multiplication by $z$, without a need for a projection. This is the reason the cohomological description of $\mathrm{Ker}\, D^\dagger$ is useful.

5.2. Explicit argument. Suppose the point $(\zeta, e^{2\pi\sigma}) \in \mathbb{C} \times \mathbb{C}^*$ belongs to the Hitchin spectral curve $\mathbf{C}$. In this case there exists a nonzero vector $\Theta \in H^1(K_\sigma)$ such that
$$\hat\phi(\sigma)\,\Theta = \zeta\,\Theta. \tag{27}$$
As explained above, $\hat\phi$ acts on $H^1(K_\sigma)$ as multiplication by $z$. Let $\theta$ be a one-form representing $\Theta \in H^1(K_\sigma)$. Then Eq. (27) means that there exists $\psi \in \Omega^{0,0}(X, E)$ such that
$$(z - \zeta)\,\theta = \bar D_0\psi. \tag{28}$$
It follows that $\bar D_0\psi$ vanishes at $z = \zeta$. In particular we have
$$\left(\frac{\partial}{\partial\chi} - iA_\chi - \phi + \sigma\right)\psi\,\Big|_{z=\zeta} = 0, \tag{29}$$
i.e. the restriction of $\psi$ to the circle $z = \zeta$ is covariantly constant with respect to the connection $B + i\sigma\, d\chi$. If $\psi$ is not identically zero on the circle $z = \zeta$, this implies that the holonomy matrix $V(\zeta, 2\pi)$ has an eigenvalue equal to $e^{2\pi\sigma}$, and consequently the point $(\zeta, e^{2\pi\sigma})$ belongs to the monopole spectral curve $\mathbf{S}$.

To complete the proof of $\mathbf{C} = \mathbf{S}$ it remains to show that $\psi$ does not vanish identically on the circle $z = \zeta$. Suppose it does vanish. Then (28) implies that on the circle $z = \zeta$ we have $\frac{\partial^j}{\partial\bar z^j}\psi = 0$ for all $j \geq 0$. It follows that the function $\psi$ has the form $\psi = (z - \zeta)\,a(z, \bar z, \chi) + b(z, \bar z, \chi)$, where both $a$ and $b$ are smooth, and as $z \to \zeta$ the function $b$ approaches zero faster than any power of $|z - \zeta|$. Hence $\psi$ is divisible by $(z - \zeta)$, i.e. there exists a smooth function $\varphi \in \Omega^{0,0}(X, E)$ such that $\psi = (z - \zeta)\varphi$. Then $\theta = \bar D_0\varphi$, which contradicts the assumption that $\theta$ represents a nontrivial class in $H^1(K)$.
5.3. Cohomological argument. Consider a complex of sheaves of vector spaces:
$$0 \to E \xrightarrow{\ z-\zeta\ } E \xrightarrow{\ \mathrm{rest.}\ } E|_{z=\zeta} \to 0, \tag{30}$$
where the second map is multiplication by $z - \zeta$, and the map rest. is restriction to the circle $z = \zeta$. This complex fails to be exact in the second term. Nevertheless, as shown below, there is a long exact sequence in $\bar D$ cohomology:
$$0 \to H^0_{\bar D}(S^1, E|_{z=\zeta}) \to H^1_{\bar D}(X, E) \xrightarrow{\ z-\zeta\ } H^1_{\bar D}(X, E) \xrightarrow{\ \mathrm{rest.}\ } H^1_{\bar D}(S^1, E|_{z=\zeta}) \to 0. \tag{31}$$
Here $H^1_{\bar D}(X, E)$ is the same as $H^1(K)$, while $H^j_{\bar D}(S^1, E|_{z=\zeta})$ is the $j$th cohomology of the restriction of $K$ to the circle $z = \zeta$. Note that the restriction of $\bar D_p$ to $z = \zeta$ is simply the covariant differential with respect to the connection $B + i s\, d\chi$ restricted to $z = \zeta$.

To understand where this exact sequence comes from, it is helpful to think about solutions of self-duality equations on $\mathbb{C} \times T^2$. Periodic monopoles are a particular class of such solutions which are invariant with respect to translations in one direction on the torus. Now the variable $\chi$ gets promoted to a complex variable parameterizing the universal cover of the torus, and the operator $\bar D$ becomes simply a $\bar\partial$-operator on the bundle $E$. Thus $E$ has a structure of a holomorphic bundle over $\mathbb{C} \times T^2$. The cohomology of $\bar D$ is the Dolbeault cohomology of $E$. If we forget about noncompactness, the Dolbeault cohomology can be identified with the Čech cohomology of the holomorphic bundle $E$. On the other hand, if we work in the holomorphic category, the sequence of sheaves (30) is exact and hence induces a long exact sequence of Čech cohomology groups. In our situation, we cannot use the holomorphic interpretation. Instead, in the next subsection we derive the exact sequence (31) from a spectral sequence of a double complex.

The coincidence of the Hitchin and the monopole spectral data is an immediate consequence of the exactness of the sequence (31). Indeed, if the point $(\zeta, \exp(2\pi\sigma))$ belongs to the Hitchin spectral curve $\mathbf{C}$, then the kernel of the map $(z - \zeta)$ from $H^1(K_\sigma)$ to $H^1(K_\sigma)$ is nontrivial. But the cohomology exact sequence implies an isomorphism
$$\mathrm{Ker}(z - \zeta) \cong H^0_{\bar D}(S^1, E|_{z=\zeta}). \tag{32}$$
Therefore $H^0_{\bar D}(S^1, E|_{z=\zeta})$ is nontrivial as well. This means that the holonomy of $B$ along the circle $z = \zeta$ has $\exp(2\pi\sigma)$ as one of its eigenvalues. Thus the point $(\zeta, \exp(2\pi\sigma)) \in \mathbb{C} \times \mathbb{C}^*$ belongs to the monopole spectral curve $\mathbf{S}$. Moreover, the fibers of the spectral line bundles on $\mathbf{C}$ and $\mathbf{S}$ are given by $\mathrm{Ker}(z - \zeta)$ and $H^0_{\bar D}(S^1, E|_{z=\zeta})$, respectively. Thus we also get an isomorphism of the line bundles.
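5.4. Exactness of the cohomology sequence. Consider again the complex of sheaves
$$0 \to E \xrightarrow{\ (z-\zeta)\ } E \xrightarrow{\ \mathrm{rest.}\ } E|_{z=\zeta} \to 0, \tag{33}$$
where rest. is the restriction to $z = \zeta$. Since $\bar D_p$, $p = 0, 1$, commutes with $(z - \zeta)$ and rest., this complex is included in a double complex $D^{p,q}$:
$$
D^{p,q}: \qquad
\begin{array}{ccccc}
E \otimes \Omega^{0,2} & \xrightarrow{\ (z-\zeta)\ } & E \otimes \Omega^{0,2} & \xrightarrow{\ \mathrm{rest.}\ } & 0 \\
\uparrow {\scriptstyle \bar D_1} & & \uparrow {\scriptstyle \bar D_1} & & \uparrow \\
E \otimes \Omega^{0,1} & \xrightarrow{\ (z-\zeta)\ } & E \otimes \Omega^{0,1} & \xrightarrow{\ \mathrm{rest.}\ } & E \otimes \Omega^{0,1}|_{z=\zeta} \\
\uparrow {\scriptstyle \bar D_0} & & \uparrow {\scriptstyle \bar D_0} & & \uparrow {\scriptstyle \bar D_0} \\
E \otimes \Omega^{0,0} & \xrightarrow{\ (z-\zeta)\ } & E \otimes \Omega^{0,0} & \xrightarrow{\ \mathrm{rest.}\ } & E \otimes \Omega^{0,0}|_{z=\zeta}
\end{array}
$$
Computing the cohomology of the rows, we obtain the first term of the "vertical" spectral sequence:
$$
E_1^{p,q}: \qquad
\begin{array}{ccc}
0 & \{\eta^{0,2} \sim \eta^{0,2} + (z-\zeta)\,\omega^{0,2}\} & 0 \\
\uparrow & \uparrow {\scriptstyle \bar D_1} & \uparrow \\
0 & \{\eta^{0,1} \sim \eta^{0,1} + (z-\zeta)\,\omega^{0,1} \mid \mathrm{rest.}(\eta^{0,1}) = 0\} & 0 \\
\uparrow & \uparrow {\scriptstyle \bar D_0} & \uparrow \\
0 & \{\eta^{0,0} \sim \eta^{0,0} + (z-\zeta)\,\omega^{0,0} \mid \eta^{0,0}|_{z=\zeta} = 0\} & 0
\end{array}
$$
On the second level the "vertical" spectral sequence degenerates to zero: $E_\infty^{p,q} = 0$.

Now let us compute the "horizontal" spectral sequence. Its first term is simply the $\bar D$ cohomology of $D^{p,q}$: $E_1^{p,q} = H_{\bar D}(D^{p,q})$. The second term $E_2^{p,q}$ is given by
$$
\begin{array}{ccc}
0 & 0 & 0 \\
\mathrm{Ker}\,(z-\zeta)|_{H^1(K)} & \mathrm{Ker}\,\mathrm{rest.}/\mathrm{Im}\,(z-\zeta)|_{H^1(K)} & H^1_{\bar D}(E|_{z=\zeta})/\mathrm{Im}\,\mathrm{rest.}|_{H^1(K)} \\
0 & 0 & H^0_{\bar D}(E|_{z=\zeta})
\end{array}
$$
with the differentials $d_2$ acting from $H^0_{\bar D}(E|_{z=\zeta})$ to $\mathrm{Ker}\,(z-\zeta)|_{H^1(K)}$ and from $H^1_{\bar D}(E|_{z=\zeta})/\mathrm{Im}\,\mathrm{rest.}|_{H^1(K)}$ to the zero entry in the top left corner.

At the next level the "horizontal" spectral sequence degenerates. Since the "vertical" spectral sequence converges to zero, so should the "horizontal" one. From this we infer the isomorphisms
$$\mathrm{Ker}\,\mathrm{rest.}|_{H^1(K)} \cong \mathrm{Im}\,(z-\zeta)|_{H^1(K)}, \qquad H^1_{\bar D}(E|_{z=\zeta}) \cong \mathrm{Im}\,\mathrm{rest.}|_{H^1(K)}, \qquad \mathrm{Ker}\,(z-\zeta)|_{H^1(K)} \cong H^0_{\bar D}(E|_{z=\zeta}).$$
These isomorphisms are equivalent to the exactness of the cohomology sequence (31).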
5.5. Revisiting the index computation. Using the spectral sequence technology, we can give another proof that $\dim H^1(K) = k$. The advantage of this method of proof is that it makes it clear that the zero modes of the Dirac operator are "localized" near the points $z = \zeta_1, \ldots, \zeta_k$.

Consider a bundle morphism $M$ from $E$ to $E$ defined as multiplication by the polynomial
$$\det(z - \hat\phi|_{s=0}) = (z - \zeta_1)\cdots(z - \zeta_k).$$
Here $\zeta_1, \ldots, \zeta_k$ are the roots of the characteristic polynomial of $\hat\phi$, or equivalently the solutions of the equation $\mathrm{Tr}\,V(z, 2\pi) = 2$. Suppose all $\zeta_i$ are distinct. Let $Y$ be the restriction of the sheaf $E$ to the union of $k$ circles $z = \zeta_1, \ldots, z = \zeta_k$, and let rest. be the restriction map. Obviously, $\mathrm{rest.} \cdot M = 0$, so we get a complex
$$0 \to E \xrightarrow{\ M\ } E \xrightarrow{\ \mathrm{rest.}\ } Y \to 0.$$
As before, this complex is not exact in the middle term, but nevertheless leads to an exact cohomology sequence:
$$0 \to H^0_{\bar D}(Y) \to H^1(K) \xrightarrow{\ M\ } H^1(K) \xrightarrow{\ \mathrm{rest.}\ } H^1_{\bar D}(Y) \to 0.$$
The proof of exactness is identical to the one given above. Now note that the map $M$ sends $H^1(K)$ to zero by virtue of the Cayley–Hamilton theorem on the characteristic polynomial of a matrix. Hence $H^1(K)$ is isomorphic to $H^0_{\bar D}(Y)$. On the other hand, if all the numbers $\zeta_\alpha$ are distinct, we have
$$H^0_{\bar D}(Y) = \bigoplus_{\alpha=1}^{k} H^0_{\bar D}\big(E|_{z=\zeta_\alpha}\big) \cong \mathbb{C}^k.$$
Hence $\dim H^1(K) = k$. Moreover, we see that each of the circles $z = \zeta_\alpha$ gives rise to a vector in $H^1(K) \cong \mathrm{Ker}\, D^\dagger$, and all these vectors are linearly independent. Thus we may think of the $k$ zero modes of the Dirac operator $D^\dagger$ as "localized" in the neighborhood of $k$ circles $z = \zeta_\alpha$, $\alpha = 1, \ldots, k$.

6. Asymptotic Behavior of the Hitchin Data

The fact that the spectral curves $\mathbf{C}$ and $\mathbf{S}$ coincide provides a wealth of information about periodic monopoles. In particular, it allows us to determine the behavior of the Higgs field $\hat\phi$ for $\mathrm{Re}\, s \to \pm\infty$. Let us rewrite the equation of the curve $\mathbf{S}$ in the following form:
$$f(z) - \exp(2\pi s) - \exp(-2\pi s) = 0. \tag{34}$$
Here we used the identification of the eigenvalue $w$ of $V(z, 2\pi)$ with $\exp(2\pi s)$. Recalling the definition of the Hitchin spectral curve $\mathbf{C}$, we infer that all coefficients of the characteristic polynomial of $\hat\phi$ except $\det\hat\phi$ are independent of $s$. Furthermore, we know from Sect. 4 that $f(z)$ is a degree $k$ polynomial in $z$ with leading coefficient $\exp(2\pi\mathfrak{v})$. It follows that the determinant of $\hat\phi$ is given by
$$\det\hat\phi = (-1)^{k+1}\big(e^{2\pi s} + e^{-2\pi s}\big)\exp(-2\pi\mathfrak{v}).$$
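The form (34) of the curve makes it easy to see how many points lie over a given $z = \zeta$. With $w = \exp(2\pi s)$, Eq. (34) becomes $w^2 - f(\zeta)\,w + 1 = 0$, so there are two values of $w$ whose product is one. The sketch below solves this quadratic; the value used for $f(\zeta)$ is an arbitrary complex number chosen for illustration, and $s$ is recovered only modulo $i$ because the principal branch of the logarithm is used.

```python
import cmath
import math

def sheets_over(f_zeta):
    """Solve w + 1/w = f(zeta); return the two pairs (w, s) with w = exp(2*pi*s)."""
    disc = cmath.sqrt(f_zeta * f_zeta - 4)
    ws = [(f_zeta + disc) / 2, (f_zeta - disc) / 2]
    # Principal branch of the logarithm: s is defined modulo i.
    return [(w, cmath.log(w) / (2 * math.pi)) for w in ws]

f_zeta = 3.0 + 1.0j          # hypothetical value of f at some point zeta
(w1, s1), (w2, s2) = sheets_over(f_zeta)
print(w1, w2, s1, s2)
```

Since $w_1 w_2 = 1$, the two points have opposite values of $\mathrm{Re}\, s$, one on each end of the cylinder $\hat X$ when $|f(\zeta)|$ is large.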
Another piece of information comes from Eq. (15) which says that the curvature of $\hat A$ is proportional to the restriction of the operator $(D^\dagger D)^{-1}$ to the subspace $\mathrm{Ker}\, D^\dagger$. The operator $(D^\dagger D)^{-1}$ is an integral operator on the space of $L^2$ sections of $E \otimes S$, and from (10) it is clear that its norm vanishes in the limit $\mathrm{Re}\, s \to \pm\infty$. (In fact, it is easy to see that the norm is bounded from above by a multiple of $1/|\mathrm{Re}\, s|^{3/2}$.) Hence the curvature of $\hat A$ also goes to zero in this limit. Bogomolny equations then imply that $[\hat\phi^\dagger, \hat\phi] \to 0$ asymptotically.

Combining this with the information about the characteristic polynomial of $\hat\phi$, we infer that for $\mathrm{Re}\, s \to \pm\infty$ the Higgs field $\hat\phi$ behaves as follows:
$$\hat\phi(s) \sim -\exp\left[-\frac{2\pi}{k}\left(\mathfrak{v} + \frac{i}{2}\right)\right] \cdot g_\pm(s)\; e^{\pm\frac{2\pi s}{k}}\,(1 + o(1))\,\mathrm{diag}(1, \omega, \omega^2, \ldots, \omega^{k-1})\; g_\pm(s)^{-1}. \tag{35}$$
Here $\omega$ is a $k$th root of unity and $g_\pm(s)$ are multi-valued functions on $\hat X$ with values in $U(k)$. In order for $\hat\phi$ to be well-defined, the functions $g_\pm$ must satisfy
$$g_\pm(s + i) = g_\pm(s)\, V^{\pm 1} e^{i\beta}.$$
Here $\beta$ is a real number, and $V \in SU(k)$ is the so-called "shift" matrix:
$$V = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 1 \\
1 & 0 & 0 & \cdots & 0 & 0 \\
0 & 1 & 0 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix}. \tag{36}$$
We can reformulate these results as follows. The Nahm transform of a periodic monopole of charge $k$ is a pair $(\hat A, \hat\phi)$ satisfying the $U(k)$ Hitchin equations and the following asymptotic conditions:

(i) The functions $\mathrm{Tr}\,\hat\phi(s)^\alpha$, $\alpha = 1, \ldots, k - 1$, are bounded;
(ii) The function $\exp(\mp 2\pi s)\det\hat\phi(s)$ behaves as $(-1)^{k+1}\exp(-2\pi\mathfrak{v}) + O(\exp(\mp 2\pi s))$ for $\mathrm{Re}\, s \to \pm\infty$;
(iii) $\|F_{\bar z z}\|^2 \leq \dfrac{C}{|\mathrm{Re}\, s|^3}$.

Since the functions $\mathrm{Tr}\,\hat\phi(s)^\alpha$ and $\det\hat\phi(s)$ are holomorphic functions on $\mathbb{C}^* \cong \mathbb{R} \times S^1$ by virtue of the Hitchin equations, the first two conditions are equivalent to the statement that the spectral curve of $(\hat A, \hat\phi)$ has the form (34), with $f(z)$ being a polynomial of degree $k$ with the leading coefficient $\exp(2\pi\mathfrak{v})$, and the rest of the coefficients being arbitrary constants. In the next three sections we will show that the correspondence between the solutions of Hitchin equations satisfying (i)–(iii) and periodic monopoles is one-to-one.
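The role of the shift matrix in (36) can be checked directly. Under $s \to s + i$ the factor $e^{\pm 2\pi s/k}$ in (35) is multiplied by $e^{\pm 2\pi i/k}$, and conjugation by $V^{\pm 1}$ cyclically permutes the diagonal entries so that the product is unchanged. The sketch below verifies the identity $V\,\mathrm{diag}(1, \omega, \ldots, \omega^{k-1})\,V^{-1} = \omega^{-1}\,\mathrm{diag}(1, \omega, \ldots, \omega^{k-1})$, assuming the primitive root $\omega = e^{2\pi i/k}$ (the text only says that $\omega$ is a $k$th root of unity) and an arbitrary illustrative value $k = 5$.

```python
import cmath
import math

def shift_matrix(k):
    """k x k cyclic shift: ones below the diagonal and in the top-right corner."""
    return [[1 if (i - j) % k == 1 % k else 0 for j in range(k)] for i in range(k)]

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

k = 5                                   # illustrative charge
omega = cmath.exp(2j * math.pi / k)     # assumed primitive k-th root of unity
D = [[omega ** i if i == j else 0 for j in range(k)] for i in range(k)]
V = shift_matrix(k)
V_inv = [[V[j][i] for j in range(k)] for i in range(k)]  # transpose of a permutation matrix

conjugated = matmul(matmul(V, D), V_inv)
max_err = max(abs(conjugated[i][j] - D[i][j] / omega)
              for i in range(k) for j in range(k))
print(max_err)
```

The overall phase $e^{i\beta}$ in the periodicity condition for $g_\pm$ drops out of the conjugation and so plays no role in this check.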
7. The Inverse Nahm Transform

In this section we show how to associate a periodic $SU(2)$ monopole of charge $k$ to any solution of $U(k)$ Hitchin equations on $\hat X$ with asymptotic behavior as above. This procedure will be called the inverse Nahm transform. It will take us from Hitchin data associated with a bundle $\hat E \to \hat X$ to monopole data on a bundle $\check{\hat E} \to X$. Later on, in Sect. 9, we will show that the monopole on $\check{\hat E}$ coincides with that on $E$, so in this section we shall use a simplified notation in which the symbol $\check{\hat{\ }}$ is omitted. Since the original monopole data on $E$ are not used in this section, this should not lead to confusion.

Let $\hat E$ be a trivial unitary rank $k$ bundle over $\hat X = \mathbb{C}^*$, $\hat A$ be a connection on $\hat E$ and $\hat\phi$ be a section of $\mathrm{End}(\hat E)$. Furthermore, let the pair $(\hat A, \hat\phi)$ be a solution of $U(k)$ Hitchin equations on $\hat X$ such that $\hat\phi$ has the asymptotics as in (8). Let $\hat L$ be a trivial line bundle over $\hat X$ with a flat unitary connection $\hat a$ such that the holonomy of $\hat a$ around the positively oriented loop encircling the origin of $\mathbb{C}^*$ is $\exp(-i\chi)$. The variable $\chi$ is assumed to take values in the interval $[0, 2\pi]$. Consider a Dirac-type operator $\hat D : \hat E \otimes \hat L \otimes \mathbb{C}^2 \to \hat E \otimes \hat L \otimes \mathbb{C}^2$ given by
$$\hat D = \begin{pmatrix} -\hat\phi + z & 2\partial_{\hat A + \hat a} \\ 2\bar\partial_{\hat A + \hat a} & -\hat\phi^\dagger + \bar z \end{pmatrix}.$$
Here $z$ is a complex parameter. The operator $\hat D$ is Fredholm for any $z$ and $\chi$ because $\|\hat\phi\|$ grows without bound as $r \to \pm\infty$.

The Weitzenböck formula for $\hat D$ reads:
$$\hat D^\dagger \hat D = \begin{pmatrix} (\hat\phi^\dagger - \bar z)(\hat\phi - z) - 4\partial_{\hat A + \hat a}\bar\partial_{\hat A + \hat a} & 2\partial_{\hat A}\hat\phi^\dagger \\ 2\bar\partial_{\hat A}\hat\phi & (\hat\phi - z)(\hat\phi^\dagger - \bar z) - 4\bar\partial_{\hat A + \hat a}\partial_{\hat A + \hat a} \end{pmatrix}.$$
If the Hitchin equations are satisfied, then this formula simplifies:
$$\hat D^\dagger \hat D = -\nabla^2_{\hat A + \hat a} + \frac{1}{2}\Big((\hat\phi^\dagger - \bar z)(\hat\phi - z) + (\hat\phi - z)(\hat\phi^\dagger - \bar z)\Big).$$
This operator is clearly positive definite on the space of smooth rapidly decreasing sections of $\hat E \otimes \hat L \otimes \mathbb{C}^2$. It is easy to see that any $L^2$ eigenvector of $\hat D$ with zero eigenvalue must be smooth and decreasing faster than any negative power of $r = \mathrm{Re}\, s$, hence $\hat D$ has trivial $L^2$ kernel. Thus the dimension of the kernel of $\hat D^\dagger$ is minus the index of $\hat D$.

Computing the $L^2$ index of $\hat D$ turns out to be rather tricky. We will do it in the next section by reinterpreting $\mathrm{Ker}\, \hat D^\dagger$ as a certain cohomology group and computing it using the spectral sequence of a double complex, similarly to how it was done in Sect. 5. The result of this computation is that $\mathrm{Ker}\, \hat D^\dagger$ has dimension 2 for all $z$ and $\chi$. We conclude that $\mathrm{Ker}\, \hat D^\dagger$ forms a trivial rank 2 bundle on the manifold $X = \mathbb{C} \times S^1$ parameterized by $(z, \chi)$. Since the elements of the kernel are square-integrable, we have a well-defined Hermitian inner product on $\mathrm{Ker}\, \hat D^\dagger$ for all $z, \chi$. In this way we obtain a unitary rank 2 bundle $E$ over $X$.

Now we need to define a connection $A$ on $E$ and a traceless Hermitian section $\phi$ of $\mathrm{End}(E)$. The connection on $\mathrm{Ker}\, \hat D^\dagger$ is induced from a trivial connection on a trivial infinite-dimensional bundle on $X$ whose fiber consists of all smooth $L^2$ sections of $\hat E \otimes \hat S \otimes \hat L$. If we introduce the projectors $\hat P = 1 - \hat D(\hat D^\dagger\hat D)^{-1}\hat D^\dagger$ and $\hat Q = 1 - \hat P$, we may write
$$d_A = \hat P\, d = \hat P\left(dz\,\frac{\partial}{\partial z} + d\bar z\,\frac{\partial}{\partial\bar z} + d\chi\,\frac{\partial}{\partial\chi}\right).$$
The value of the Higgs field $\phi$ at a point $x \in X$ is a linear map $E \to E$ defined as a composition of multiplication by $r$ and projection to $\mathrm{Ker}\, \hat D^\dagger$, i.e. $\phi = \hat P r$.

It remains to show that $\phi$ and $A$ satisfy the Bogomolny equation (1). In the coordinates $z, \chi$ used above this equation is equivalent to a pair of equations
$$F_{\bar z\chi} = i\,\bar\partial_A\phi, \tag{37}$$
$$F_{\bar z z} = \frac{i}{2}\, i_{\partial/\partial\chi} \cdot d_A\phi. \tag{38}$$
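Here $\bar\partial_A$ means $i_{\partial/\partial\bar z} \cdot d_A$. To show that these equations are satisfied, we have to use the commutation relations
$$\Big[\hat D, \frac{\partial}{\partial\chi}\Big] = -i\sigma_2, \qquad \Big[\hat D, \frac{\partial}{\partial z}\Big] = -p_+, \qquad \Big[\hat D, \frac{\partial}{\partial\bar z}\Big] = -p_-, \qquad [\hat D, r] = \sigma_1,$$
$$\Big[\hat D^\dagger, \frac{\partial}{\partial\chi}\Big] = i\sigma_2, \qquad \Big[\hat D^\dagger, \frac{\partial}{\partial z}\Big] = -p_-, \qquad \Big[\hat D^\dagger, \frac{\partial}{\partial\bar z}\Big] = -p_+, \qquad [\hat D^\dagger, r] = -\sigma_1,$$
and the fact that $\hat D^\dagger\hat D$ commutes with all $\sigma_i$. We find for the curvature of $A$:
$$\begin{aligned}
F_{\bar z\chi} &= i\hat P\big(\bar\partial\hat Q\,\partial_\chi\hat Q - \partial_\chi\hat Q\,\bar\partial\hat Q\big) \\
&= i\hat P\big([\bar\partial, \hat D](\hat D^\dagger\hat D)^{-1}[\partial_\chi, \hat D^\dagger] - [\partial_\chi, \hat D](\hat D^\dagger\hat D)^{-1}[\bar\partial, \hat D^\dagger]\big) \\
&= \hat P\big(p_-(\hat D^\dagger\hat D)^{-1}\sigma_2 + \sigma_2(\hat D^\dagger\hat D)^{-1}p_+\big) \\
&= 2i\hat P(\hat D^\dagger\hat D)^{-1}\sigma_-, \\
F_{\bar z z} &= i\hat P\big(\partial\hat Q\,\bar\partial\hat Q - \bar\partial\hat Q\,\partial\hat Q\big) \\
&= i\hat P\big([\partial, \hat D](\hat D^\dagger\hat D)^{-1}[\bar\partial, \hat D^\dagger] - [\bar\partial, \hat D](\hat D^\dagger\hat D)^{-1}[\partial, \hat D^\dagger]\big) \\
&= i\hat P\big(p_+(\hat D^\dagger\hat D)^{-1}p_+ - p_-(\hat D^\dagger\hat D)^{-1}p_-\big) \\
&= i\hat P(\hat D^\dagger\hat D)^{-1}\sigma_3.
\end{aligned} \tag{39}$$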
The covariant derivatives of $\phi$ can also be easily computed:
$$\begin{aligned}
\bar\partial_A\phi &= [\hat P\bar\partial, \hat P r] \\
&= \hat P r\hat Q\bar\partial - \hat P\bar\partial\hat Q r \\
&= \hat P\big([r, \hat D](\hat D^\dagger\hat D)^{-1}[\hat D^\dagger, \bar\partial] - [\bar\partial, \hat D](\hat D^\dagger\hat D)^{-1}[\hat D^\dagger, r]\big) \\
&= \hat P\big(\sigma_1(\hat D^\dagger\hat D)^{-1}p_+ + p_-(\hat D^\dagger\hat D)^{-1}\sigma_1\big) \\
&= 2\hat P(\hat D^\dagger\hat D)^{-1}\sigma_-, \\
(i_{\partial/\partial\chi} \cdot d_A)\phi &= [\hat P\partial_\chi, \hat P r] \\
&= \hat P r\hat Q\partial_\chi - \hat P\partial_\chi\hat Q r \\
&= \hat P\big([r, \hat D](\hat D^\dagger\hat D)^{-1}[\hat D^\dagger, \partial_\chi] - [\partial_\chi, \hat D](\hat D^\dagger\hat D)^{-1}[\hat D^\dagger, r]\big) \\
&= -i\hat P\big(\sigma_1(\hat D^\dagger\hat D)^{-1}\sigma_2 - \sigma_2(\hat D^\dagger\hat D)^{-1}\sigma_1\big) \\
&= 2\hat P(\hat D^\dagger\hat D)^{-1}\sigma_3.
\end{aligned} \tag{40}$$
Comparing (39) and (40), we see that $A$ and $\phi$ indeed satisfy the Bogomolny equation.

8. Spectral Data and the Inverse Nahm Transform

In the previous section we showed that the inverse Nahm transform applied to a solution of Hitchin equations on $\hat X \cong \mathbb{C}^*$ yields a solution of the Bogomolny equation on $X \cong \mathbb{R}^2 \times S^1$. In this section we prove that if the solution of the Hitchin equations has the asymptotics (35), then the corresponding solution of the Bogomolny equation has the asymptotics (4). The use of the monopole spectral data greatly facilitates this proof. We first give a "pedestrian" proof of the coincidence of the Hitchin and monopole spectral data, and then a more conceptual one using the spectral sequence of a double complex. This spectral sequence will also be used to compute the index of the Dirac operator $\hat D$, thereby filling a gap in the derivation of Sect. 7. In fact, we will show that the zero modes of the twisted Dirac operator $\hat D_{z,\chi}$ are in one-to-one correspondence with the points on $\hat X$ where $\hat\phi$ has an eigenvalue $z$.

8.1. Cohomological interpretation of the inverse Nahm transform. In order to give a cohomological interpretation of the inverse Nahm transform, let us consider the following complex constructed from the sheaves of vector spaces $\hat\Omega^0 = \Omega^{0,0}(\hat X, \hat E)$ and $\hat\Omega^1 = \Omega^{0,1}(\hat X, \hat E)$:
$$0 \to \hat\Omega^0 \xrightarrow{\ \hat\delta_0\ } \hat\Omega^0 \oplus \hat\Omega^1 \xrightarrow{\ \hat\delta_1\ } \hat\Omega^1 \to 0, \tag{41}$$
with $\hat\delta_0$ and $\hat\delta_1$ defined by
$$\hat\delta_0 : f \mapsto \begin{pmatrix} -(\hat\phi - \zeta) f \\ 2\bar\partial_{\hat A + \hat a} f \end{pmatrix}, \qquad \hat\delta_1 : \begin{pmatrix} g_0 \\ g_1 \end{pmatrix} \mapsto -2\bar\partial_{\hat A + \hat a}\, g_0 - (\hat\phi - \zeta)\, g_1. \tag{42}$$
Since every element $g_1 \in \hat\Omega^1$ has the form $g_1 = g(s)\, d\bar s$, we can identify $\hat\Omega^0 \ni g$ with $\hat\Omega^1 \ni g_1$.
In terms of this complex $\hat K_{z,\chi}$ the Dirac operator $\hat D^\dagger$ of Sect. 7 is given by
$$\hat D^\dagger = \hat\delta_0^* - \hat\delta_1. \tag{43}$$
One can easily see that the square-integrable zero modes of $\hat D^\dagger$ are in one-to-one correspondence with the first $L^2$ cohomology of the above complex. Thus, if we denote the inverse Nahm transform of the bundle $\hat E$ by $\check{\hat E}$, we have a canonical identification of the fiber of $\check{\hat E}$ at a point $(z, \chi) \in X$ with $H^1(\hat K_{z,\chi})$. In other words, the spaces $H^1(\hat K_{z,\chi})$ form a vector bundle over $X$ which is canonically isomorphic to $\check{\hat E}$.

At this stage of the discussion it becomes crucial to keep track of the periodicity conditions along the $\chi$ and $t$ directions. Let us denote the circles parameterized by $\chi$ and $t$ by $S^1$ and $\hat S^1$, respectively. Previously we worked with one circle at a time and could choose a trivialization of any bundle on a circle so that the components of a section are periodic functions. However, if one considers a bundle on $S^1 \times \hat S^1$, both circles are in the game. If the bundle on $S^1 \times \hat S^1$ is nontrivial, then there is no trivialization in which sections are periodic functions along both periodic directions. This is in fact what happens in our case.

Let us introduce some notation. The circle $\hat S^1$ parameterizes line bundles with a unitary connection on $S^1$. Namely, a point $t \in \hat S^1$ corresponds to a line bundle with a monodromy $e^{-2\pi i t}$. We will denote this line bundle with a connection by $L_t$. If one chooses a "periodic" trivialization alluded to above, then the connection on $L_t$ is $-t\, d\chi$. Alternatively, if one chooses a "quasiperiodic" trivialization in which sections are represented by functions satisfying $f(\chi + 2\pi) = e^{2\pi i t} f(\chi)$, then the connection on $L_t$ is trivial. Conversely, $S^1$ parameterizes unitary line bundles on $\hat S^1$; the point $\chi \in S^1$ corresponds to a line bundle $\hat L_\chi$ whose monodromy is $e^{i\chi}$.

Recall now the definition of the Poincaré line bundle $P$ on $S^1 \times \hat S^1$ (see e.g. [13]). It is a line bundle with a unitary connection whose restriction to any circle $t = t_0$ is isomorphic to $L_{t_0}$ (as a line bundle with a connection), while its restriction to any circle $\chi = \chi_0$ is isomorphic to $\hat L_{\chi_0}$. The curvature of the connection on $P$ is given by $d\chi \wedge dt$. The Poincaré line bundle is nontrivial and has the first Chern class equal to 1.

If we choose a trivialization of $P$ such that the connection on $P$ is given by $A = \chi\, dt$, then its sections are represented by functions $f(\chi, t)$ satisfying $f(\chi + 2\pi, t) = e^{2\pi i t} f(\chi, t)$ and $f(\chi, t + 1) = f(\chi, t)$. Similarly, the dual line bundle $P^*$ has a connection 1-form $-\chi\, dt$, and its sections are represented by functions $f(\chi, t)$ satisfying $f(\chi + 2\pi, t) = e^{-2\pi i t} f(\chi, t)$ and $f(\chi, t + 1) = f(\chi, t)$.

By making use of $P^*$, the inverse Nahm transform can be rephrased as follows. We pull back the bundle $\hat E$ to $X \times \hat X$ using the natural projection $\hat\pi : X \times \hat X \to \hat X$. Then we twist $\hat\pi^*(\hat E)$ by a line bundle with a unitary connection $-\chi\, dt$ and trivial periodicity condition in the $t$ direction, i.e. by a line bundle $P^*$. The complex $\hat K$ is also twisted by $P^*$, as is clear from its definition, and in addition by the Higgs field $z\, ds$. Finally, we form a bundle on $X$ whose fiber over $x$ is the first cohomology group of the complex $\hat K_{z,\chi}$. The direct Nahm transform can be similarly reformulated using the line bundle $P$. (This description suggests that it is useful to think about the derived functor of the Nahm transform, which reduces to the ordinary Nahm transform when both the initial and the transformed complexes happen to have only a single nonvanishing cohomology.)
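The upshot of this discussion is that sections of $P^* \otimes \hat E$ should be thought of as vector-valued functions on $X \times \hat X$ satisfying
$$e^{2\pi i t} f(\chi + 2\pi, t) = f(\chi, t), \qquad f(\chi, t + 1) = f(\chi, t),$$
while the covariant derivatives along $\frac{\partial}{\partial\chi}$ and $\frac{\partial}{\partial t}$ are given by
$$\nabla_\chi = \frac{\partial}{\partial\chi}, \qquad \nabla_t = \frac{\partial}{\partial t} + i\chi - i\hat A_t.$$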
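8.2. Coincidence of the spectral curves: An explicit argument. Let us pick a point $(\zeta, \exp 2\pi\sigma) \in \mathbb{C} \times \mathbb{C}^*$ belonging to the monopole spectral curve $\mathbf{S}$. In other words, $\exp 2\pi\sigma$ is one of the eigenvalues of the holonomy $V(\zeta, 2\pi)$:
$$\det\big(V(\zeta, 2\pi) - e^{2\pi\sigma}\big) = 0. \tag{44}$$
This implies that there is a family of sections $\Psi$ of $\hat\Omega^0 \oplus \hat\Omega^1$ parameterized by $\chi \in \mathbb{R}$, such that $[\Psi(\chi)]$ is a nonzero element of $H^1(\hat K_{z,\chi})$ and
$$\left[\left(\frac{\partial}{\partial\chi} - r + \sigma\right)\Psi\right] = [0]. \tag{45}$$
Here the brackets designate the cohomology class in $H^1(\hat K_{z,\chi})$. We also denote $s = r + it$, as usual.

Let us unwrap Eq. (45). We will write $\Psi$ as follows:
$$\Psi(s, \chi) = \begin{pmatrix} a(\chi, s) \\ b(\chi, s) \end{pmatrix}.$$
For a fixed $\chi$ the functions $a$ and $b$ are sections of $\hat E$. We can choose the cohomology representative $\Psi(s, \chi)$ so that $a$ and $b$ are sections of $P^* \otimes \hat\pi^*(\hat E)$, and therefore satisfy
$$\begin{pmatrix} a(\chi, s) \\ b(\chi, s) \end{pmatrix} = e^{2\pi i t}\begin{pmatrix} a(\chi + 2\pi, s) \\ b(\chi + 2\pi, s) \end{pmatrix}.$$
Equation (45) means that there exists a section $h$ of $P^* \otimes \hat\pi^*(\hat E)$ such that
$$\left(\frac{\partial}{\partial\chi} - r + \sigma\right)\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} -(\hat\phi - \zeta)h \\ 2\bar\partial_{\hat A + \hat a}h \end{pmatrix}.$$
Introducing
$$F(s) = \int_0^{2\pi} e^{i\chi t}\, h(\chi, s)\, d\chi,$$
we find that
$$\big(\hat\phi(\sigma) - \zeta\big)F(\sigma) = 0.$$
If $F(\sigma)$ is nonzero, this implies that $\zeta$ is an eigenvalue of $\hat\phi(\sigma)$ and therefore the point $(\zeta, e^{2\pi\sigma})$ belongs to the Hitchin spectral curve $\mathbf{C}$. This shows that the curves coincide.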
To prove that $F(\sigma)$ is nonzero, let us assume the contrary. Then it follows from the Fredholm Alternative that the equation
$$\left(\frac{\partial}{\partial\chi} - r + \sigma\right)g = h$$
has a solution $g \in \Gamma(P^* \otimes \hat\pi^*(\hat E))$. One can easily see that $a$ and $b$ are expressible in terms of $g$ as follows:
$$\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} -(\hat\phi - \zeta)g \\ 2\bar\partial_{\hat A + \hat a}\, g \end{pmatrix}.$$
But this contradicts the assumption that $\Psi$ represents a nontrivial cohomology class in $H^1(\hat K_{z,\chi})$. Thus if $\exp(2\pi\sigma)$ is an eigenvalue of $V(\zeta, 2\pi)$, then $\zeta$ is an eigenvalue of $\hat\phi(\sigma)$.

8.3. Cohomological argument. The above argument can be conveniently rephrased in cohomological terms. Consider the manifold $S^1 \times \hat X = S^1 \times \hat S^1 \times \mathbb{R}$ and a bundle $\mathcal{E} = P^* \otimes \hat\pi^*(\hat E)$ over it. We have two operators acting on its sections, namely $\hat\delta_{\zeta,\chi}$ of Eq. (42) and $\Delta$ given by
$$\Delta = \frac{\partial}{\partial\chi} - r + \sigma. \tag{46}$$
Let us note that these operators, $\hat\delta_{\zeta,\chi}$ and $\Delta$, commute:
$$\big[\hat\delta_{\zeta,\chi}, \Delta\big] = 0. \tag{47}$$
Consider a complex of sheaves of vector spaces:
$$0 \to \mathcal{E} \xrightarrow{\ \Delta\ } \mathcal{E} \xrightarrow{\ \mathrm{int}\ } \hat E_{s=\sigma} \to 0, \tag{48}$$
where int acts as
$$\mathrm{int} : f(\chi, s) \mapsto \int_0^{2\pi} f(\chi, \sigma)\, e^{i\chi\,\mathrm{Im}\,\sigma}\, d\chi. \tag{49}$$
Even though this short sequence is not exact, by an argument similar to that in Subsect. 5.4, one can show that there still is a long exact sequence of cohomology groups. To show this, we consider a double complex whose lowest row is the complex (48), and the vertical differential is given by $\hat\delta_{\zeta,\chi}$. The first level $E_1^{p,q}$ of the spectral sequence of this double complex contains the cohomology groups $H^j_{\hat\delta_{\zeta,\chi}}(\hat X, \hat E)$, as well as maps between them. Comparison with the total cohomology of this double complex provides us with an exact sequence
$$0 \to H^0_{\hat\delta_{\zeta,\chi}}(\hat E|_{s=\sigma}) \to H^1_{\hat\delta_{\zeta,\chi}}(\hat X, \hat E) \xrightarrow{\ \Delta\ } H^1_{\hat\delta_{\zeta,\chi}}(\hat X, \hat E) \to \ldots. \tag{50}$$
From the definition of $\hat\delta_{\zeta,\chi}$ we have $H^0_{\hat\delta_{\zeta,\chi}}(\hat E|_{s=\sigma}) = \mathrm{Ker}\,(\hat\phi(\sigma) - \zeta)$. Recalling the identification of $\mathrm{Ker}\,\hat D^\dagger_{\zeta,\chi}$ with $H^1_{\hat\delta_{\zeta,\chi}}(\hat X,\hat E)$, the above exact sequence implies an isomorphism

$$\mathrm{Ker}\,\big(\hat\phi(\sigma) - \zeta\big) \cong \mathrm{Ker}\left(\frac{\partial}{\partial\chi} - r + \sigma\right)\bigg|_{\mathrm{Ker}\,\hat D^\dagger_{\zeta,\chi}}. \tag{51}$$

If the point $(\zeta, \exp(2\pi\sigma))$ belongs to the monopole spectral curve $S$, then the right-hand side of this equation is nontrivial, and therefore the point belongs to the Hitchin spectral curve as well. The converse statement is also true. Thus the two curves coincide. Moreover, the spectral line bundle $\mathrm{Ker}\,(\hat\phi(\sigma) - \zeta)$ on $C$ is identified with the kernel of the operator (46) restricted to $H^1_{\hat\delta_{\zeta,\chi}}(\hat X,\hat E)$, which is the line bundle on the monopole spectral curve $S$. Thus the line bundles on $C$ and $S$ are also isomorphic.

The isomorphism (51) also enables one to compute the index of $\hat D$. Indeed, since for $\sigma_1 \neq \sigma_2$ the kernels of the operator (46) taken at $\sigma = \sigma_1$ and at $\sigma = \sigma_2$ do not intersect, one can easily see that

$$\mathrm{Ker}\,\hat D^\dagger_{\zeta,\chi} \cong \bigoplus_{\sigma\in\hat X} \mathrm{Ker}\,\big(\hat\phi(\sigma) - \zeta\big).$$

The dimension of the right-hand side is just the number of points at which the spectral curve $C$ intersects the cylinder in $\mathbb{C}\times\mathbb{C}^*$ given by $z = \zeta$. From the equation of the curve (34) we see that there are two such points for any $\zeta$. Since the kernel of $\hat D$ is trivial, the index of $\hat D$ equals $-2$, as promised.

8.4. The asymptotic behavior of the monopole. We are now ready to show that the inverse Nahm transform produces a solution of Bogomolny equations with the asymptotics (4). By assumption, we started from a solution of Hitchin equations with the spectral curve $f(z) - \exp(2\pi s) - \exp(-2\pi s) = 0$, where $f(z)$ is a degree $k$ polynomial with leading coefficient $\exp(2\pi v)$. We proved that the monopole spectral curve is given by the same equation. This implies that $\mathrm{Tr}\,V(z,2\pi)$ grows as

$$\mathrm{Tr}\,V(z,2\pi) \sim z^k e^{2\pi v} \tag{52}$$
for large $z$, while $\det V(z,2\pi) = 1$ everywhere.

Another piece of information that we need is that $\|F_A\|^2$ is bounded by a multiple of $1/|z|^2$. This follows from (39) and a simple estimate of the norm of $(\hat D^\dagger\hat D)^{-1}$. Together with the Bogomolny equation this fact implies that $\partial_\chi\phi$ goes to zero for large $z$, while the components of the connection $A$ can be chosen to be bounded.

Let us now investigate the consequence of these two observations. First, one can show that the inverse Nahm transform yields a traceless connection and a traceless Higgs field (this is not obvious from their definition). Indeed, if the trace part of the curvature were nonzero, it would satisfy the Laplace equation (as a consequence of the Bogomolny equation) and grow at infinity, in contradiction with the above estimate. Hence the curvature is traceless and $\mathrm{Tr}\,A$ is a flat connection on $\mathbb{R}^2\times S^1$. Furthermore,
$\mathrm{Tr}\,\phi$ must be constant by virtue of the Bogomolny equation. Now, since the monopole spectral curve tells us that $\det V(z,2\pi) = 1$ everywhere, this means that $\mathrm{Tr}\,\phi = 0$ and $\mathrm{Tr}\,A$ has zero monodromy. Since $\mathrm{Tr}\,A$ is also flat, it must be gauge-equivalent to zero.

Second, the fact that $\mathrm{Tr}\,V(z,2\pi)$ grows as (52) at infinity implies that for large $z$ the eigenvalues of the Higgs field are

$$\pm\left(\frac{k}{2\pi}\log|z| + v + o(1)\right).$$
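The step from the growth of the trace to the eigenvalue asymptotics can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the monopole spectral curve in the form $w^2 - w f(z) + 1 = 0$ with $f(z) = e^{2\pi v} z^k$ (lower-order coefficients of $f$ are set to zero for simplicity), and it assumes that the eigenvalues $w_\pm$ of $V(z,2\pi)$ are related to the Higgs eigenvalues $\pm\lambda$ by $|w_\pm| = e^{\pm 2\pi\lambda}$. The function names are hypothetical.

```python
import cmath
import math


def holonomy_eigenvalues(z, k, v):
    """Roots of w^2 - f(z) w + 1 = 0 with f(z) = exp(2 pi v) z^k (assumed form of f)."""
    f = cmath.exp(2 * math.pi * v) * z ** k
    disc = cmath.sqrt(f * f - 4)
    # Choose the sign of the square root aligned with f to avoid cancellation.
    if (f.conjugate() * disc).real < 0:
        disc = -disc
    w_plus = (f + disc) / 2
    # The product of the roots is 1, which mirrors det V = 1.
    w_minus = 1 / w_plus
    return w_plus, w_minus


def higgs_eigenvalue(z, k, v):
    """Positive Higgs eigenvalue, assuming |w_plus| = exp(2 pi lambda)."""
    w_plus, _ = holonomy_eigenvalues(z, k, v)
    return math.log(abs(w_plus)) / (2 * math.pi)


def asymptotic_eigenvalue(z, k, v):
    """Leading large-|z| behaviour (k / 2 pi) log|z| + v."""
    return k / (2 * math.pi) * math.log(abs(z)) + v
```

For large $|z|$ the two functions `higgs_eigenvalue` and `asymptotic_eigenvalue` agree up to a correction that vanishes as $|z| \to \infty$, which is the $o(1)$ term in the text.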
This proves that the Higgs field has the asymptotics (4). To show that the gauge field has the correct asymptotics, it suffices to prove that the components of the connection orthogonal to $\phi$ go to zero for large $|z|$, i.e. that the $SU(2)$ monopole approaches at infinity a $U(1)$ monopole embedded in $SU(2)$. The argument for this is exactly the same as for the monopole on $\mathbb{R}^3$ [6] (it uses only the fact that at large distances $\|\phi\|$ is bounded from below by a strictly positive constant). In fact, [6] proves that the “nonabelian” components of the curvature decay exponentially fast. From the physical point of view this can be explained as follows. Since $\|\phi\| \ge 1$ for large enough $|z|$, the $SU(2)$ gauge group is broken down to $U(1)$, the Higgs effect makes all the “nonabelian” components of the gauge field massive, and they decay exponentially.

9. Closing the Circle

In this section we prove that the composition of the direct and inverse Nahm transform takes a periodic monopole to a gauge-equivalent periodic monopole. Together with the results of Sects. 3–8, this implies that there is a one-to-one correspondence between the gauge-equivalence classes of periodic $SU(2)$ monopoles with charge $k$ and gauge-equivalence classes of solutions of $U(k)$ Hitchin equations on a cylinder with the asymptotic behavior as described in Sect. 6. Our proof is modelled on the argument given by Schenk [7] for instantons on a four-torus. Another proof, similar to that given by Donaldson and Kronheimer [13] for instantons on $T^4$, is sketched in the Appendix.

The direct Nahm transform is defined in terms of square-integrable sections $\psi_1(x,s),\dots,\psi_k(x,s)$ of $E\otimes S$ which form an orthonormal basis of $\mathrm{Ker}\,D^\dagger$. Here $S$ is the spin bundle on $X$, and $D^\dagger$ is twisted by $s\in\mathbb{C}$. The sections $\psi_1(x,s),\dots,\psi_k(x,s)$ span a fiber of $\hat E$ at a point $s$, so by combining them into a matrix $E = (\psi_1,\psi_2,\dots,\psi_k)$ we obtain a section $E(x,s)$ of $\hat\pi^*(\hat E)\otimes\pi^*(E\otimes S)$. Here $\pi: X\times\hat X\to X$ and $\hat\pi: X\times\hat X\to\hat X$ are the natural projections. By definition, $E$ satisfies

$$D^\dagger E = 0. \tag{53}$$

Since $S$ is trivial and two-dimensional, we can view $E$ as a pair of bundle morphisms $(E_1, E_2)$ from $\hat\pi^*(\hat E)$ to $\pi^*(E)$. (We recall that we have Hermitian inner products on $E$ and $\hat E$ and thus can identify $E$ and $\hat E$ with their duals.) In terms of $E$ the expression for $(\hat A,\hat\phi)$ reads

$$\frac{\partial}{\partial s} - i\hat A_s = \int_X d^3x\ \mathrm{Tr}_{\rm spin}\, E^\dagger(x,s)\,\frac{\partial}{\partial s}\,E(x,s), \qquad \hat\phi = \int_X d^3x\ \mathrm{Tr}_{\rm spin}\, E^\dagger(x,s)\, z\, E(x,s). \tag{54}$$
We denote by $\frac{\partial}{\partial s}E$ the composition of $E$ and $\frac{\partial}{\partial s}$, while the derivative of $E$ with respect to $s$ will be denoted by $\left[\frac{\partial}{\partial s}, E\right]$.

In order to perform the inverse Nahm transform, we have to find a pair of sections $\hat\psi_1(s,x), \hat\psi_2(s,x)$ of $\hat\pi^*(\hat E)\otimes\mathbb{C}^2$ which span $\mathrm{Ker}\,\hat D^\dagger_{z,\chi}$ for all $(z,\chi)\in X$. In other words, if we combine them into a $k\times 2\times 2$ matrix $\hat E = (\hat\psi_1,\hat\psi_2)$, $\hat E$ must satisfy

$$\hat D^\dagger\hat E = 0, \tag{55}$$

and, with proper normalization,

$$\int_{\hat X} d^2s\ \mathrm{Tr}_{\rm spin}\,\hat E(s,x)\,\hat E^\dagger(s,x) = 1_E. \tag{56}$$

Here $1_E$ is the identity endomorphism $E\to E$. Given $\hat E$, the inverse Nahm transform $(\check{\hat A},\check{\hat\phi})$ is given by

$$d_{\check{\hat A}} = \int_{\hat X} d^2s\ \hat E^\dagger(s,x)\, d_x\, \hat E(s,x), \qquad \check{\hat\phi}(x) = \int_{\hat X} d^2s\ \hat E^\dagger(s,x)\, r\, \hat E(s,x). \tag{57}$$

The difficulty in establishing the equivalence of $(A,\phi)$ and $(\check{\hat A},\check{\hat\phi})$ lies in finding $\hat E(s,x)$ in terms of $E(x,s)$. In the case of the Nahm transform on a four-torus this was accomplished by Schenk [7], whose results we adapt to the case at hand. Let $\hat E^T$ denote $\hat E$ with the spinor indices transposed. We claim that

$$\hat E^T(s,x) = 2\sqrt{2\pi}\int_X d^3y\ E^\dagger(y,s)\,(D^\dagger D)^{-1}(y,x;s)\, e^{-i\chi_x t}, \tag{58}$$
where $t = \mathrm{Im}\,s$, as usual. In what follows it will be convenient to regard $x, y\in X$ as continuous labels and think of $E$ as an object with one continuous and three discrete labels. Integration over $x$ is then regarded as a summation over a continuous label and is not shown explicitly. The dependence on $s$ will not be shown explicitly either. In this shortened notation Eq. (58) takes the form

$$\hat E^T = 2\sqrt{2\pi}\ E^\dagger (D^\dagger D)^{-1} e^{-i\chi t}.$$

The first thing to check is whether $\hat E$ is a well-defined section of $\hat E$, that is, whether $\hat E(s+i,x) = \hat E(s,x)$. Since the twisted derivative along $\chi$ is given by $\partial_\chi - iA_\chi + it$, we see that $E(x,s+i)$ is related to $e^{-i\chi}E(x,s)$ by a $U(k)$ gauge transformation. By making an $s$-dependent change of basis in $\mathrm{Ker}\,D^\dagger$, we can always ensure that $E(x,s+i) = e^{-i\chi}E(x,s)$. Furthermore, we have

$$(D^\dagger D)^{-1}(x,y;s+i) = e^{-i\chi_x}\,(D^\dagger D)^{-1}(x,y;s)\, e^{i\chi_y}.$$

It follows that $\hat E(s+i,x) = \hat E(s,x)$.

In terms of $\sigma_\pm = \sigma_1\pm i\sigma_2$ and $p_\pm = (1\pm\sigma_3)/2$ the untwisted operators $\hat D$ and $\hat D^\dagger$ take the following form:

$$\hat D = \left(\partial_{\hat A},\ \bar\partial_{\hat A},\ \hat\phi,\ \hat\phi^\dagger\right)\begin{pmatrix}\sigma_+\\ \sigma_-\\ -p_+\\ -p_-\end{pmatrix},$$
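and

$$\hat D^\dagger = (-1)\left(\partial_{\hat A},\ \bar\partial_{\hat A},\ \hat\phi,\ \hat\phi^\dagger\right)\begin{pmatrix}\sigma_+\\ \sigma_-\\ p_-\\ p_+\end{pmatrix}. \tag{59}$$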
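The statement that $\hat E(s,x)$ given by (58) satisfies (55) is equivalent to the following identity:

$$\left[\, E^\dagger\left(\left[\frac{\partial}{\partial s}, E\right],\ \left[\frac{\partial}{\partial\bar s}, E\right],\ zE,\ \bar zE\right) E^\dagger (D^\dagger D)^{-1} - E^\dagger (D^\dagger D)^{-1}\,(0,\ 0,\ z,\ \bar z)\right]\begin{pmatrix}\sigma_+^T\\ \sigma_-^T\\ p_-^T\\ p_+^T\end{pmatrix} = 0. \tag{60}$$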
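We recall that $z$ stands for an operator of multiplication by $x_1 + ix_2$, or equivalently for an integral operator with a kernel $(x_1+ix_2)\,\delta(x,y)$. Making use of the identities

$$E E^\dagger = 1 - D(D^\dagger D)^{-1}D^\dagger, \tag{61}$$

$$\begin{aligned}
\left[\frac{\partial}{\partial s},\ (D^\dagger D)^{-1}\right] &= (D^\dagger D)^{-1}\,(-p_- D - D^\dagger p_+)\,(D^\dagger D)^{-1},\\
\left[\frac{\partial}{\partial\bar s},\ (D^\dagger D)^{-1}\right] &= (D^\dagger D)^{-1}\,(-p_+ D - D^\dagger p_-)\,(D^\dagger D)^{-1},\\
\left[z,\ (D^\dagger D)^{-1}\right] &= (D^\dagger D)^{-1}\,(-\sigma_+ D + D^\dagger\sigma_+)\,(D^\dagger D)^{-1},\\
\left[\bar z,\ (D^\dagger D)^{-1}\right] &= (D^\dagger D)^{-1}\,(-\sigma_- D + D^\dagger\sigma_-)\,(D^\dagger D)^{-1},
\end{aligned} \tag{62}$$

and $\sigma_\pm^T = \sigma_\mp$ and $p_\pm^T = p_\pm$, one can see that Eq. (60) is a consequence of a matrix identity

$$\Big[(p_-,\ p_+,\ \sigma_+,\ \sigma_-)\,D + (p_+,\ p_-,\ -\sigma_+,\ -\sigma_-)\,D^\dagger + D^\dagger\,(p_+,\ p_-,\ -\sigma_+,\ -\sigma_-)\Big]\begin{pmatrix}\sigma_-\\ \sigma_+\\ p_-\\ p_+\end{pmatrix} = 0,$$

which can be readily verified. Thus we conclude that $\hat E$ defined by Eq. (58) indeed solves $\hat D^\dagger\hat E = 0$.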
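Next we verify Eq. (56). With the help of Eqs. (62) we find

$$\mathrm{Tr}_{\rm spin}\,(D^\dagger D)^{-1}\left(1 - D(D^\dagger D)^{-1}D^\dagger\right)(D^\dagger D)^{-1} = -\frac{1}{8}\,\mathrm{Tr}_{\rm spin}\left(4\,\partial_{\bar s}\partial_s (D^\dagger D)^{-1} - \left[\bar z,\left[z,\ (D^\dagger D)^{-1}\right]\right]\right). \tag{63}$$

Combining this identity with the formula for $\hat E$, we obtain

$$\int_{\hat X} d^2s\ \mathrm{Tr}_{\rm spin}\,\hat E^\dagger(x_1,s)\,\hat E(x_2,s) = -\pi\,\mathrm{Tr}_{\rm spin}\int_{\hat X} d^2s\ e^{it(\chi_1-\chi_2)}\left(4\,\partial_{\bar s}\partial_s - (\bar z_1-\bar z_2)(z_1-z_2)\right)(D^\dagger D)^{-1}(x_1,x_2;s). \tag{64}$$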
Integrating by parts and considering the limit of $x_2$ approaching $x_1$, we see that in this limit the integral is dominated by the region of large $r = \mathrm{Re}\,s$. Thus to estimate the integral it is sufficient to consider the large $r$ limit, where $(D^\dagger D)^{-1}$ reduces to the Green's function of the operator $-\nabla^2 + r^2$. We conclude that in the limit $x_1\to x_2$ the right-hand side of Eq. (64) reduces to

$$-2\pi\, 1_E \lim_{x_1\to x_2}\int_{-\infty}^{+\infty} dr\ \left(-|x_1-x_2|^2\right)\frac{e^{-|r|\,|x_1-x_2|}}{4\pi\,|x_1-x_2|}.$$

Performing the integral over $r$, we get (56).

Now, having constructed $\hat E$, one can find the result of the inverse Nahm transform. Let us start with a few useful identities valid for any smooth section $P(s)$ of $\hat E$ which decays rapidly as $|s|\to\infty$ together with all its derivatives (i.e. belongs to the Schwartz space):

$$\begin{aligned}
\int_{\hat X} d^2s\ E(x,s)\,\partial_{\hat A+\hat a}P(s) &= \int_{\hat X} d^2s\ D(D^\dagger D)^{-1}p_-\,E\,P(s),\\
\int_{\hat X} d^2s\ E(x,s)\,\bar\partial_{\hat A+\hat a}P(s) &= \int_{\hat X} d^2s\ D(D^\dagger D)^{-1}p_+\,E\,P(s),\\
\int_{\hat X} d^2s\ \left(E(x,s)\,\hat\phi - zE(x,s)\right)P(s) &= \int_{\hat X} d^2s\ D(D^\dagger D)^{-1}\sigma_+\,E\,P(s),\\
\int_{\hat X} d^2s\ \left(E(x,s)\,\hat\phi^\dagger - \bar zE(x,s)\right)P(s) &= \int_{\hat X} d^2s\ D(D^\dagger D)^{-1}\sigma_-\,E\,P(s).
\end{aligned}$$

Now substitute into these formulas $P(s) = e^{it\chi'}\hat E^T(s,x')$, set $x' = x$, multiply them from the right by $\sigma_-$, $\sigma_+$, $p_-$, and $p_+$, respectively, and sum them up. The left-hand side of the resulting identity will be proportional to $(\hat D^\dagger\hat E)^T$ and therefore will vanish. Thus we get

$$\int_{\hat X} d^2s\ D(D^\dagger D)^{-1}\,(p_-,\ p_+,\ \sigma_+,\ \sigma_-)\ E\, e^{i\chi t}\,\hat E^T\begin{pmatrix}\sigma_-\\ \sigma_+\\ p_-\\ p_+\end{pmatrix} = 0. \tag{65}$$
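For reference, the $r$-integral used above in the verification of the normalization (56) can be carried out in one line. The following is a worked check under the assumption that the large-$r$ form of the Green's function quoted in the text is used as is; the shorthand $d = |x_1 - x_2|$ is introduced here only for brevity.

```latex
% Shorthand (not in the original): d = |x_1 - x_2| > 0.
\[
\int_{-\infty}^{+\infty} dr\, e^{-|r| d} = \frac{2}{d},
\qquad\text{so}\qquad
-2\pi\, 1_E \int_{-\infty}^{+\infty} dr\, \bigl(-d^{2}\bigr)\,
\frac{e^{-|r| d}}{4\pi d}
= 2\pi\, 1_E \cdot \frac{d^{2}}{4\pi d}\cdot\frac{2}{d}
= 1_E .
\]
```

The result is independent of $d$, so the limit $x_1\to x_2$ is trivial at this order and reproduces the right-hand side of (56).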
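Substituting $2\sqrt{2\pi}\,e^{i\chi t}(D^\dagger D)^{-1}E = (\hat E^T)^\dagger$ and using an identity

$$(p_-,\ p_+,\ \sigma_+,\ \sigma_-)\ M\begin{pmatrix}\sigma_-\\ \sigma_+\\ p_-\\ p_+\end{pmatrix} = (\sigma_+ + \sigma_-)\,\mathrm{Tr}_{\rm spin}\,M \tag{66}$$

valid for any $2\times 2$ matrix $M$, we are left with

$$\int_{\hat X} d^2s\ e^{i\chi t}\, D\, e^{-i\chi t}\,(\sigma_+ + \sigma_-)\,\mathrm{Tr}_{\rm spin}\,(\hat E^T)^\dagger\,\hat E^T = 0. \tag{67}$$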
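The algebraic identity (66) is easy to confirm by direct multiplication. The sketch below does this in plain Python, reading the left-hand side of (66) as the sum $p_- M\sigma_- + p_+ M\sigma_+ + \sigma_+ M p_- + \sigma_- M p_+$, and using the conventions $\sigma_\pm = \sigma_1 \pm i\sigma_2$ and $p_\pm = (1\pm\sigma_3)/2$ stated earlier. The helper names are illustrative and not taken from the paper.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def matadd(*matrices):
    """Entrywise sum of any number of 2x2 matrices."""
    return [[sum(m[i][j] for m in matrices) for j in range(2)] for i in range(2)]


# sigma_+ = sigma_1 + i sigma_2 and sigma_- = sigma_1 - i sigma_2
SIGMA_PLUS = [[0, 2], [0, 0]]
SIGMA_MINUS = [[0, 0], [2, 0]]
# p_+ = (1 + sigma_3) / 2 and p_- = (1 - sigma_3) / 2
P_PLUS = [[1, 0], [0, 0]]
P_MINUS = [[0, 0], [0, 1]]


def lhs_identity_66(m):
    """Row (p-, p+, sigma+, sigma-) times M times column (sigma-, sigma+, p-, p+)."""
    row = (P_MINUS, P_PLUS, SIGMA_PLUS, SIGMA_MINUS)
    col = (SIGMA_MINUS, SIGMA_PLUS, P_MINUS, P_PLUS)
    return matadd(*(matmul(matmul(r, m), c) for r, c in zip(row, col)))


def rhs_identity_66(m):
    """(sigma+ + sigma-) multiplied by the trace of M."""
    trace = m[0][0] + m[1][1]
    total = matadd(SIGMA_PLUS, SIGMA_MINUS)
    return [[trace * total[i][j] for j in range(2)] for i in range(2)]
```

Calling both functions on an arbitrary matrix, for instance `[[1 + 2j, 3], [-1j, 4]]`, gives the same off-diagonal matrix, whose nonzero entries equal twice the trace of `M`.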
This operator equation in spin space can be rewritten as four “scalar” equations which express the coincidence of $(A,\phi)$ and $(\check{\hat A},\check{\hat\phi})$. For example, the vanishing of the coefficient of $p_+$ in Eq. (67) implies

$$\int_{\hat X} d^2s\left[\left(\frac{\partial}{\partial z} - iA_z\right)\hat E_1^\dagger(s,x)\,\hat E_1(s,x) + \left(\frac{\partial}{\partial z} - iA_z\right)\hat E_2^\dagger(s,x)\,\hat E_2(s,x)\right] = 0, \tag{68}$$

where $\hat E_1$ and $\hat E_2$ are the two spinor components of $\hat E$. This equation simply says that $A_z = \check{\hat A}_z$. Writing out the coefficient of $\sigma_+$, we obtain

$$\int_{\hat X} d^2s\left[\left(\frac{\partial}{\partial\chi} - iA_\chi - \phi + r\right)\hat E_1^\dagger(s,x)\,\hat E_1(s,x) + \left(\frac{\partial}{\partial\chi} - iA_\chi - \phi + r\right)\hat E_2^\dagger(s,x)\,\hat E_2(s,x)\right] = 0, \tag{69}$$

which implies $A_\chi = \check{\hat A}_\chi$ and $\phi = \check{\hat\phi}$. This completes the proof.

10. Remarks on the Existence of Periodic Monopoles

It is intuitively plausible that periodic monopoles exist for all $k > 0$.¹ In fact, when the parameter $v$ in (5) is large, one can propose a simple way of constructing approximate solutions of Bogomolny equations on $\mathbb{R}^2\times S^1$. One considers a charge $k$ $SU(2)$ monopole on $\mathbb{R}^3$ located near the origin of $\mathbb{R}^3$. It approaches a charge $k$ Dirac monopole exponentially fast at distances larger than $1/v$ [6]. Then one can patch it with a periodic Dirac monopole solution of Sect. 1 at distances larger than $1/v$ but smaller than 1, and obtain an accurate approximation to a nonabelian periodic monopole.

To prove the existence of periodic monopoles for all $v$ and $k$ it is easier to use the correspondence between periodic monopoles and solutions of Hitchin equations on a cylinder established above. For $k = 1$ one can write down explicitly a family of solutions of Hitchin equations with required asymptotics:

$$\hat\phi(s) = e^{2\pi(v+s)} + e^{2\pi(v-s)} + c,\quad c\in\mathbb{C}, \qquad \hat A = \beta\,dt,\quad \beta\in\mathbb{R}/(2\pi\mathbb{Z}). \tag{70}$$

¹ From the string theory point of view, periodic $SU(2)$ monopoles can be identified with D4-branes suspended between two parallel NS5-branes, with one direction common to D4-branes and NS5-branes compactified on a circle (see Sect. 2 for details). This brane configuration surely exists, so one is tempted to dismiss the question of the existence of periodic monopoles as trivial. But it is far from obvious that suspended D4-branes are represented by nonsingular field configurations on the NS5-branes after a T-duality along the compact direction.
This proves that a periodic monopole of charge 1 exists. Moreover, it is easy to see that any solution of $U(1)$ Hitchin equations with the boundary conditions described in Sect. 6 is gauge-equivalent to (70). Thus a periodic monopole with $k = 1$ has three real moduli $(\mathrm{Re}\,c, \mathrm{Im}\,c, \beta)$. (Caution: we do not claim that there is a natural metric on this moduli space, and in fact we will see below that this is not true.) They arise from the translational invariance of the Bogomolny equations and parameterize $\mathbb{R}^2\times S^1 = X$. We may regard them as describing the location of the monopole on $X$.

For $k > 1$ finding solutions of Hitchin equations is harder, and we do not have a satisfactory proof of their existence. Below we merely sketch a possible approach to the proof based on the holomorphic description of solutions of Hitchin equations.

The idea of the holomorphic approach is familiar to physicists in the guise of the following principle: the space of solutions of D and F-flatness conditions modulo a compact gauge group is the same as the space of solutions of the F-flatness conditions modulo the complexified gauge group (this principle is often referred to as the Luty–Taylor theorem [16]). Let us apply this principle to our problem. The “complex” Hitchin equation is invariant with respect to the complexified gauge transformations, i.e. gauge transformations which are $GL(k,\mathbb{C})$-valued. The “real” Hitchin equation is invariant only with respect to $U(k)$ gauge transformations. Thus from the physical point of view, the “real” and “complex” Hitchin equations play the role of the D-flatness and F-flatness conditions, respectively, and it is natural to consider the space of solutions of the “complex” Hitchin equation modulo the complexified gauge group. The Hermitian inner product on the bundle $\hat E$ then plays no role, and all we have is a holomorphic bundle over $\hat X\cong\mathbb{C}^*$. The “complex” Hitchin equation says that $\hat\phi$ is a holomorphic section of $\mathrm{End}(\hat E)$. Such a pair, a holomorphic bundle $\hat E$ on $\hat X$ and a holomorphic section of $\mathrm{End}(\hat E)\otimes\Omega_{\hat X}$, is called a Higgs bundle. Obviously, we have a forgetful map from the moduli space of solutions of the full Hitchin equations to the moduli space of Higgs bundles.

It is very easy to construct Higgs bundles on $\hat X$, and in the next section we will give a rather explicit description of their moduli space. Thus if the above-mentioned map is surjective, the existence of solutions of Hitchin equations will be established.

Let us explain why it is plausible that every suitable Higgs bundle comes from a solution of Hitchin equations. In the case of Hitchin equations on a compact Riemann surface, one can prove that any stable Higgs bundle is related by a $GL(k,\mathbb{C})$ gauge transformation to a solution of Hitchin equations. The role of the stability condition is to ensure that the complexified gauge group acts freely on the Higgs bundles. One may also consider solutions of Hitchin equations on a punctured Riemann surface with “tame” singularities at the punctures [17, 18]. (“Tame” means that the eigenvalues of the Higgs field grow at most as $1/r$ as one approaches the puncture.) The corresponding holomorphic object is a Higgs bundle on the punctured Riemann surface whose Higgs field has simple poles at the punctures. Again there is a stability condition on the Higgs bundle which ensures that the complexified gauge group acts freely on its orbit, and any stable Higgs bundle on a punctured Riemann surface comes from a solution of Hitchin equations with tame singularities [17, 18].

In our case the Higgs field is not “tame” at infinity. To see this, let us make a conformal transformation $w = \exp(2\pi s)$ which maps the cylinder to $\mathbb{C}^*$. Keeping in mind that the
Higgs field is a section of $\mathrm{End}(\hat E)\otimes\Omega^{1,0}$, we see that its eigenvalues near $w = 0$ behave as $|w|^{-1-1/k}$, i.e. the singularity is not “tame”. A similar problem occurs at $w = \infty$. This means that we cannot use the results of [17, 18].

Still, the above discussion suggests that the important thing is for complexified gauge transformations to act freely on the Higgs bundles. It appears that this condition is always satisfied if the spectral curve of the Higgs bundle is given by

$$z^k + a_1 z^{k-1} + \dots + a_k - 2e^{-2\pi v}\cosh(2\pi s) = 0. \tag{71}$$
Indeed, if there were a $GL(k,\mathbb{C})$ transformation which would leave our Higgs bundle $(\hat E,\hat\phi)$ invariant, this would mean that $\hat E$ has a rank one holomorphic subbundle invariant with respect to $\hat\phi$. However, this is clearly not true for large $|\mathrm{Re}\,s|$ because of the asymptotic behavior of the eigenvalues of $\hat\phi$: they are all distinct and cyclically permuted as one goes around the circumference of the cylinder. Thus it seems plausible that any Higgs bundle whose spectral curve has the form (71) is related by a $GL(k,\mathbb{C})$ transformation to a solution of Hitchin equations with required asymptotics.

One could ask if there could be a one-to-one correspondence between solutions of Hitchin equations on a cylinder modulo $U(k)$ gauge transformations and holomorphic Higgs bundles with the spectral curve (71) modulo $GL(k,\mathbb{C})$ gauge transformations. This would be the analogue of the Luty–Taylor theorem for Hitchin equations on a cylinder. If the question is posed this way, the answer is negative. Indeed, we already saw that rank one solutions of Hitchin equations are parameterized by a complex number $c$ which describes the Higgs field, and a real number $\beta$ which parameterizes the monodromy of $\hat A$ around the circumference of the cylinder. On the other hand, the bundle $\hat E$ is holomorphically trivial, so all the information about the holomorphic Higgs bundle is described by $c$. In other words, the information about the monodromy of $\hat A$ is lost in the holomorphic picture.

In the case of Higgs bundles with “tame” singularities, the situation is similar: the information about the monodromy of $\hat A$ around the punctures is lost upon passing to a Higgs bundle. But there is a way to fix this: one has to consider Higgs bundles with “parabolic structure” at the punctures [17]. Parabolic structure essentially encodes the conjugacy class of the monodromy.
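The non-tame growth rate $|w|^{-1-1/k}$ quoted above can be recovered directly from the spectral curve (71). The following is a short worked estimate, under the assumption that only the leading terms of (71) matter near $w = 0$, that is for $\mathrm{Re}\,s\to-\infty$.

```latex
% Leading behaviour of (71) as w = exp(2 pi s) -> 0:
%   2 cosh(2 pi s) = w + 1/w ~ 1/w,  and  z^k dominates the polynomial.
\[
z^{k} \simeq e^{-2\pi v}\, w^{-1}
\quad\Longrightarrow\quad
z \simeq e^{-2\pi v/k}\, w^{-1/k}.
\]
% The Higgs field is a (1,0)-form, so its eigenvalue transforms with
% ds = dw / (2 pi w):
\[
z\, ds = \frac{z}{2\pi w}\, dw \simeq
\frac{e^{-2\pi v/k}}{2\pi}\; w^{-1-1/k}\, dw .
\]
```

A tame singularity would allow growth of at most $|w|^{-1}$, so the extra factor $|w|^{-1/k}$ is exactly what places this case outside the scope of [17, 18].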
In our case it is reasonable to conjecture that what is missing in the naive holomorphic description is precisely the information about the monodromy of $\hat A$ at infinity. In fact, we have an a priori knowledge (see Sect. 6) that the monodromy is given by $\exp(i\beta)V^{\pm 1}$, where $V$ is the “shift” matrix (36), and $\beta\in\mathbb{R}/(2\pi\mathbb{Z})$. Thus the holomorphic description misses one real parameter $\beta\in\mathbb{R}/(2\pi\mathbb{Z})$.

We are led to the following conjecture. Let $\mathcal{M}_{Hi,k}$ be the moduli space of solutions of $U(k)$ Hitchin equations on a cylinder with asymptotics (35). Let $\mathcal{M}_{HB,k}$ be the moduli space of holomorphic Higgs bundles on $\mathbb{C}^*$ whose spectral curve has the form (71). The forgetful map from $\mathcal{M}_{Hi,k}$ to $\mathcal{M}_{HB,k}$ is surjective, and moreover is a fiber bundle with fiber $S^1$.

In the next section we will test this conjecture by computing the dimension of $\mathcal{M}_{HB,k}$ and comparing with expectations from N = 2 super-Yang–Mills.

11. The Moduli Space of Periodic Monopoles

In this section we describe the moduli space $\mathcal{M}_{HB,k}$ of solutions of the complex Hitchin equation on $\mathbb{R}\times S^1$ with the spectral curve (71). Assuming that the analogue of the Luty–Taylor theorem formulated in the previous section is true, the moduli space of charge $k$
periodic monopoles is fibered over $\mathcal{M}_{HB,k}$ with fiber $S^1$. As explained below, there is an alternative way to view the relation between $\mathcal{M}_{HB,k}$ and periodic monopoles: a certain submanifold in $\mathcal{M}_{HB,k}$ of complex codimension one coincides with the centered moduli space of charge $k$ periodic monopoles. We compare our results with the expectations from string theory and discuss the existence of a hyperkähler metric on the centered moduli space of periodic monopoles.

We already know how to associate a spectral curve $C\subset\mathbb{C}\times\mathbb{C}^*$ and a coherent sheaf $N$ on it to every solution of the complex Hitchin equation. For a generic solution, the curve $C$ is nonsingular, and $N$ is a line bundle (see Sect. 3). The curve has the form $w^2 - w f(z) + 1 = 0$, where $f(z)$ is a polynomial of degree $k$ whose leading coefficient is a known constant. Thus to specify the polynomial $f(z)$ we need to specify its $k$ coefficients $a_1,\dots,a_k$. For $k = 1$ the spectral curve is rational, and its compactification is a $\mathbb{P}^1$. To understand what happens for $k > 1$, it is convenient to rewrite the equation of the curve $C$ in the form

$$\tilde w^2 = \frac{1}{4} f(z)^2 - 1, \tag{72}$$
where $\tilde w = w - f(z)/2$. This equation implies that the compactification of $C$ is a hyperelliptic curve in $\mathbb{P}^2$. It is well known that a hyperelliptic curve in $\mathbb{P}^2$ has singularities, in this case over the point $z = \infty$. Its desingularization has genus $k-1$ and will be denoted $\tilde C$. The pull-back of the line bundle $N$ to $\tilde C$ will be denoted by the same letter $N$. Since $\hat E$ is a trivial bundle, $N$ has zero degree. The moduli space of line bundles over $\tilde C$ with fixed degree is simply the Jacobian of $\tilde C$, which is an Abelian variety of dimension $k-1$.

Conversely, starting from a hyperelliptic curve $\tilde C$ and a line bundle over it, we can reconstruct the solution of the complex Hitchin equation. The bundle $\hat E$ over $\mathbb{C}^*$ is obtained by pushing forward $N$ with respect to the projection $(w,z)\mapsto w$. The Higgs field $\hat\phi\in\Gamma(\mathrm{End}(\hat E))$ is defined as follows:

$$\hat\phi: v_{w,z}\mapsto z\,v_{w,z}.$$

Thus we obtain the following description of the moduli space $\mathcal{M}_{HB,k}$ valid in an open set: it is the space of pairs $(\tilde C, N)$, where $\tilde C$ is a hyperelliptic curve of genus $k-1$ and $N$ is a degree 0 line bundle on it. Hence $\mathcal{M}_{HB,k}$ is a complex manifold of dimension $2k-1$ fibered over $\mathbb{C}^k$ by Abelian varieties of dimension $k-1$. It follows that the moduli space $\mathcal{M}_{Hi,k}$, and therefore the moduli space of periodic monopoles of charge $k$, has real dimension $4k-1$. This coincides with the dimension of the moduli space of $SU(2)$ monopoles of charge $k$ on $\mathbb{R}^3$.

Furthermore, string theory predicts that the centered moduli space of periodic monopoles has dimension $4k-4$ (see Sect. 2). Centering the monopole amounts to setting $\beta = 0$ and $a_1 = 0$, where $a_1$ is the coefficient of $z^{k-1}$ in $f(z)$. Indeed, we already explained in Sect. 4 that the positions of the constituent charge 1 monopoles on $\mathbb{R}^2$ are the roots of the equation $f(z) = 2$, so setting $a_1 = 0$ has the effect of making the center-of-mass of the monopole located at $z = 0$. It is also easy to check that a translation along $S^1$ has the effect of shifting $\beta$.
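The genus and dimension counts above can be reproduced with a few lines of exact polynomial arithmetic. The sketch below is illustrative only: it normalizes the leading coefficient of $f$ to 1 (the text only says it is a known constant), treats the curve (72) as $\tilde w^2 = P(z)/4$ with $P = f^2 - 4$, and uses the standard fact that a hyperelliptic curve whose branch polynomial has $d$ distinct roots has genus $\lfloor (d-1)/2\rfloor$. All function names are hypothetical; polynomials are lists of coefficients, highest degree first.

```python
from fractions import Fraction


def poly_trim(p):
    """Drop leading zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[0] == 0:
        p.pop(0)
    return p if p else [Fraction(0)]


def poly_mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += Fraction(a) * Fraction(b)
    return out


def poly_derivative(p):
    n = len(p) - 1
    return poly_trim([Fraction(c) * (n - i) for i, c in enumerate(p[:-1])])


def poly_rem(p, q):
    """Remainder of p divided by q (q must be nonzero)."""
    p = [Fraction(c) for c in poly_trim(p)]
    q = poly_trim(q)
    while len(p) >= len(q):
        factor = p[0] / q[0]
        for i in range(len(q)):
            p[i] -= factor * q[i]
        p.pop(0)  # the leading coefficient is now zero
    return poly_trim(p)


def is_squarefree(p):
    """True if p has no repeated roots, i.e. gcd(p, p') is a constant."""
    a, b = poly_trim(p), poly_derivative(p)
    while any(b):
        a, b = b, poly_rem(a, b)
    return len(a) == 1


def branch_polynomial(f):
    """P(z) = f(z)^2 - 4, whose roots are the branch points of the curve (72)."""
    sq = poly_mul(f, f)
    sq[-1] -= 4
    return poly_trim(sq)


def genus_of_spectral_curve(f):
    p = branch_polynomial(f)
    if not is_squarefree(p):
        raise ValueError("non-generic f: branch points collide")
    return (len(p) - 2) // 2  # floor((deg P - 1) / 2)


def moduli_dimensions(k):
    """(complex dim of M_HB, real dim of M_Hi, real dim of centered space)."""
    complex_dim_hb = k + (k - 1)            # base C^k plus Jacobian fiber
    real_dim_hi = 2 * complex_dim_hb + 1    # extra circle parametrized by beta
    return complex_dim_hb, real_dim_hi, real_dim_hi - 3
```

For example, `genus_of_spectral_curve([1, 0, 0])` treats $f(z) = z^2$ and returns the genus $k - 1 = 1$, while a non-generic choice such as $f(z) = z^2 + 2$ makes two branch points collide and raises an error.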
Thus the centered moduli space of periodic monopoles of charge $k$ is a hypersurface in $\mathcal{M}_{HB,k}$ given by the equation $a_1 = 0$. It has complex dimension $2k-2$, in agreement with
string theory predictions. This lends support to the conjectured correspondence between solutions of Hitchin equations on a cylinder and a special class of Higgs bundles. Moreover, one expects on physical grounds that the moduli space of the N = 2 super-Yang–Mills compactified on a circle has a distinguished complex structure in which it is a complex manifold fibered over $\mathbb{C}^{k-1}$ by Abelian varieties of dimension $k-1$ [9, 19]. We saw above that this is indeed true.

Let us now turn to the issue of the hyperkähler metric on the moduli space of periodic monopoles. Supersymmetry implies that the Coulomb branch of the N = 2 $SU(k)$ super-Yang–Mills theory compactified on a circle must be a complete hyperkähler manifold [8], so we expect that the hyperkähler metric exists for the centered moduli space. In contrast to monopoles on $\mathbb{R}^3$, we do not expect to have a well-defined metric on the uncentered moduli space. The reason for this is that the uncentered monopoles would correspond to a $U(k)$ gauge theory in $d = 4$, but the latter does not make sense as a quantum theory because it is not asymptotically free.

We can also explain this in a purely classical way, which does not involve quantum N = 2 super-Yang–Mills theory. The difference between the centered and the uncentered moduli spaces is that in the former case we mod out by translations of $\mathbb{R}^2\times S^1$, while in the latter case we don't. The reason why one needs to divide by the translations group to get a well-defined metric is that the tangent vectors to the moduli space corresponding to the translations on $\mathbb{R}^2$ are not normalizable, i.e. their $L^2$ norm diverges. This tangent vector is given by $(\delta A,\delta\phi) = (\partial_z A,\partial_z\phi)$. According to (5), $\partial_z\phi$ decays only as $1/z$, therefore the $L^2$ norm of this tangent vector is logarithmically divergent.

The above arguments demonstrate that there is no well-defined metric on the uncentered moduli space, but they do not prove that there is one on the centered moduli space.
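The logarithmic divergence mentioned above can be made explicit with a rough estimate. The computation below is a sketch under assumptions that are not spelled out in this section: only the Higgs part of the tangent vector is kept, the Higgs eigenvalue is replaced by its leading asymptotic form $\frac{k}{2\pi}\log|z| + v$, the circle is taken to have circumference $2\pi$, and the radial integral is cut off at $1 \le |z| \le R$.

```latex
% Leading asymptotics: phi ~ (k / 2 pi) log|z| + v, with log|z| = (1/2) log(z zbar).
\[
\partial_z \phi \simeq \frac{k}{4\pi z},
\qquad
|\partial_z\phi|^{2} \simeq \frac{k^{2}}{16\pi^{2}|z|^{2}} .
\]
% Integrate over the circle (length 2 pi) and the annulus 1 <= |z| <= R,
% with d^2 z = 2 pi rho d rho:
\[
\int_{0}^{2\pi} d\chi \int_{1}^{R} 2\pi\rho\, d\rho\;
\frac{k^{2}}{16\pi^{2}\rho^{2}}
= \frac{k^{2}}{4}\,\log R
\;\xrightarrow[R\to\infty]{}\; \infty .
\]
```

By contrast, a deformation whose Higgs-field variation decays as $1/|z|^2$ gives an integrand of order $\rho^{-3}$, which converges at large $\rho$.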
This can be argued as follows. As explained in Sect. 6, for large |z| a nonabelian periodic monopole is exponentially close to a periodic Dirac monopole embedded in SU (2). Thus to count L2 deformations it is sufficient to use the abelian asymptotics (5,6). Then it is easy to see that changing the locations of monopoles while keeping their center-of-mass fixed changes the Higgs field only by terms which decay as 1/|z|2 (this is essentially multipole expansion). Thus all such deformations have finite L2 norm. There are 3k − 3 such tangent vectors. Using the quaternionic structure of the tangent space (see below), one can show that the remaining k − 1 tangent vectors are also normalizable. From the mathematical point of view, it may be easier to count L2 deformations in the Nahm-transformed picture. Note that setting β = 0, a1 = 0 amounts to setting Tr φˆ = 0 and passing from the U (k) to SU (k) Hitchin equations. SU (k) Hitchin equations may be regarded as hyperkähler moment map equations for the action of the SU (k) gauge group on the cotangent bundle of the space of SU (k) connections on R × S1 [14]. Formally, the hyperkähler quotient construction [20] implies that the moduli space of SU (k) Hitchin equations has a hyperkähler metric. In order to prove the existence of a hyperkähler metric on the centered moduli space, it is sufficient to show that the space of L2 deformations of SU (k) Hitchin equations on a cylinder has the expected dimension 4k − 4. The properties of the hyperkähler metric on the centered moduli space of periodic monopoles will be discussed elsewhere [1]. One thing is clear though: in the limit when the circumference of S1 goes to infinity, the centered moduli space of periodic monopoles smoothly goes over to the centered moduli space of monopoles on R3 . In particular, the metric on the centered moduli space of a charge 2 periodic monopole is a deformation of the Atiyah–Hitchin metric. 
It would be very interesting to find the explicit form of this metric. On physical grounds, we expect that it is hyperkähler and asymptotically locally
flat. But unlike the Atiyah–Hitchin metric, which has an SU (2) isometry, the new metric seems to have no continuous isometries. Acknowledgements. We are grateful to Nigel Hitchin for a very helpful conversation concerning the definition of the monopole spectral data, and to Dmitri Orlov and Marcos Jardim for discussions. We also wish to thank the organizers of the workshop “The Geometry and Physics of Monopoles,” Edinburgh, August-September 1999, for creating a very stimulating atmosphere during the meeting and for providing us with an opportunity to present a preliminary version of this work. The work of S.Ch. was supported in part by NSF grant PHY9819686. The work of A.K. was supported in part by a DOE grant DE-FG02-90ER4054442.
Appendix

In Sect. 9 we proved that the composition of the Nahm transform of Sect. 3 and the inverse Nahm transform of Sect. 7 is the identity map on the gauge-equivalence classes of periodic monopole configurations. For the sake of completeness, we present here an outline of a cohomological proof of this fact in the spirit of reference [13]. Both the direct and inverse Nahm transforms, as well as a map identifying the result of the composition of the two transforms with the initial configuration, will emerge from the spectral sequence of a double complex.

Nahm transform, as described in Subsect. 5.1, is given in terms of the cohomology of the following complex:

$$0\to B^{0,0}(X,E)\xrightarrow{\ \bar D_0\ } B^{0,1}(X,E)\xrightarrow{\ \bar D_1\ } B^{0,2}(X,E)\to 0. \tag{73}$$

The differentials $\bar D_p$, $p = 0, 1$, here are twisted by $\hat x\in\hat X$. In the trivialization of the Poincaré bundle defined in Sect. 8 the operators $\bar D_0$ and $\bar D_1$ are given by:

$$\bar D_p = d\bar z\wedge 2\left(\frac{\partial}{\partial\bar z} - iA_{\bar z}\right) + d\chi\wedge\left(\frac{\partial}{\partial\chi} - iA_\chi - \phi + r\right). \tag{74}$$

Consider a trivial bundle $E\to X\times\hat X$, such that its restriction to $X\times\hat x$ is $E$ and restriction to $x\times\hat X$ is a trivial bundle with fiber $E|_x$. For each $\hat x\in\hat X$ it has an action of the operator $\bar D$ twisted by $\hat x$. Let $\Omega^p$ denote the sheaf of $E$-valued rapidly decaying $p$-forms spanned by the differentials $d\chi$ and $d\bar z$ with coefficients depending on $x$ and $\hat x$. Here by “rapidly decaying” we mean rapidly decaying both for large $|z|$ and large $\mathrm{Re}\,s$. Consider a double complex

$$C^{p,q}:\qquad
\begin{array}{ccccc}
\Omega^2 & \xrightarrow{\ \delta_0\ } & \Omega^2\oplus\Omega^2 & \xrightarrow{\ \delta_1\ } & \Omega^2\\
\big\uparrow{\scriptstyle\bar D_1} & & \big\uparrow{\scriptstyle\bar D_1} & & \big\uparrow{\scriptstyle\bar D_1}\\
\Omega^1 & \xrightarrow{\ \delta_0\ } & \Omega^1\oplus\Omega^1 & \xrightarrow{\ \delta_1\ } & \Omega^1\\
\big\uparrow{\scriptstyle\bar D_0} & & \big\uparrow{\scriptstyle\bar D_0} & & \big\uparrow{\scriptstyle\bar D_0}\\
\Omega^0 & \xrightarrow{\ \delta_0\ } & \Omega^0\oplus\Omega^0 & \xrightarrow{\ \delta_1\ } & \Omega^0,
\end{array} \tag{75}$$
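where $\delta_0$ and $\delta_1$ act as follows:

$$\delta_0: f\mapsto\begin{pmatrix} -zf\\[2pt] \left(2\dfrac{\partial}{\partial\bar s} + \dfrac{1}{2}\chi\right)f\end{pmatrix}, \qquad \delta_1: \begin{pmatrix} g_0\\ g_1\end{pmatrix}\mapsto -\left(2\frac{\partial}{\partial\bar s} + \frac{1}{2}\chi\right)g_0 - z\,g_1. \tag{76}$$

It is easy to check that $\bar D$ and $\delta$ commute. The zeroth and second cohomology of $\bar D$ vanish, and the first cohomology yields $\hat E$. Thus the cohomology of columns with respect to $\bar D$ is:

$$\tilde E_1^{p,q}:\qquad
\begin{array}{ccccc}
0 & \to & 0 & \to & 0\\
\hat B^0 & \xrightarrow{\ \hat\delta_0\ } & \hat B^0\oplus\hat B^1 & \xrightarrow{\ \hat\delta_1\ } & \hat B^1\\
0 & \to & 0 & \to & 0,
\end{array} \tag{77}$$

which contains exactly the sequence (41) defining the inverse Nahm transform. Thus on the second level the sequence degenerates to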
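$$\tilde E_2^{p,q}:\qquad
\begin{array}{ccc}
0 & 0 & 0\\
H^0_{\hat\delta}(\hat X,\hat E) & \check{\hat E} = H^1_{\hat\delta}(\hat X,\hat E) & H^2_{\hat\delta}(\hat X,\hat E)\\
0 & 0 & 0.
\end{array} \tag{78}$$

Computation of the other spectral sequence $E^{p,q}$ is exactly analogous to that of [13], the result being

$$E_2^{p,q}:\qquad
\begin{array}{ccc}
0 & 0 & 0\\
0 & 0 & 0\\
0 & 0 & E|_{x=0}.
\end{array} \tag{79}$$

Comparing the total cohomologies of the double complex (75)

$$\bigoplus_{p+q=n} E_\infty^{p,q} = \bigoplus_{p+q=n}\tilde E_\infty^{p,q}, \tag{80}$$

we conclude that $E|_{x=0} = \check{\hat E}|_{x=0}$.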
Chasing the spectral sequence we can obtain the isomorphism $\omega: \check{\hat E}|_{x=0}\to E|_{x=0}$ explicitly. Namely, an element of $\check{\hat E}|_{x=0}$ can be represented by $\alpha\in\Omega^1\oplus\Omega^1$ harmonic with respect to $\bar D$ and such that $\delta_1\alpha = \bar D\beta$ for some $\beta\in\Omega^0$. The isomorphism $\omega$ takes $\alpha$ to the value of $\beta$ at $\hat x = 0$ integrated over $X\times\{0\}$. The equation for $\beta$ is solved by $\beta = (\bar D^*\bar D)^{-1}\bar D^*\delta_1\alpha$, thus

$$\omega: \alpha\mapsto\int_{X\times\{0\}} dx\ (\bar D^*\bar D)^{-1}\left[\bar D^*,\ \delta_1\right]\alpha. \tag{81}$$

Note that $(\bar D^*\bar D)^{-1}$ is proportional to the Green's function $(D^\dagger D)^{-1}$, while the commutator $[\bar D^*,\delta_1]$ has a simple form: it maps $(g_0,g_1)\in C^{1,1} = \Omega^1\oplus\Omega^1$ to $2(g_{0,\chi} + g_{1,\bar z})$.

The point $x = 0$ was not distinguished in any natural way, and twisting the vertical operator $\delta$ of the double complex (75) by $(-x_0)\in X$ and computing the spectral sequence would lead to an isomorphism $\omega: \check{\hat E}|_{x=x_0}\to E|_{x=x_0}$. Therefore the isomorphism $\omega$ is an isomorphism of bundles on $X$. Using the above explicit formula for $\omega$, it can be checked that it commutes with the differential $\bar D$. This shows that $\check{\hat A}_{\bar z} = A_{\bar z}$ and $(i\check{\hat A}_\chi + \check{\hat\phi}) = (iA_\chi + \phi)$.
In order to conclude that the isomorphism ω takes the original monopole data (A, φ) ˇˆ φ) ˇˆ we need to say a few words regarding the naturalness of the above construction. to (A, The way to present the direct (as well as inverse) Nahm transform in cohomological terms is not unique. For example, we could have replaced D¯ with a differential ∂ ∂ ∂ (dx1 − idχ ) ∧ + dx2 − iA1 − i − iA2 + (φ − r) , − iAχ ∂x1 ∂χ ∂x1 (82)
and modified the cohomological construction accordingly. This amounts to identifying $X \cong \mathbb{R} \times \mathbb{C}^*$ instead of $X \cong \mathbb{C} \times S^1$. This arbitrariness is exactly the same as the arbitrariness in the choice of complex structure in the twistor description of monopoles, as well as in the discussion of Higgs bundles and Nonabelian Cohomology in [17]. As in the case of a four-torus [13], we could have constructed an isomorphism $\eta : \check{\hat E} \to E$ preserving an appropriate differential for each choice of the identification $X \cong \mathbb{R} \times \mathbb{C}^*$. We would have discovered then that the isomorphism $\eta$ is always given by the formula (81) and therefore coincides with $\omega$. Therefore, $\omega$ maps $(\check{\hat A}, \check{\hat\phi})$ to $(A, \phi)$.

References
1. Cherkis, S. and Kapustin, A.: New Hyperkähler Metrics From Periodic Monopoles. Work in progress
2. Chalmers, G. and Hanany, A.: Three Dimensional Gauge Theories And Monopoles. Nucl. Phys. B 489, 223 (1997) [hep-th/9608105]
3. Hanany, A. and Witten, E.: Type IIB Superstrings, BPS Monopoles, And Three-Dimensional Gauge Dynamics. Nucl. Phys. B 492, 152 (1997) [hep-th/9611230]
4. Witten, E.: Solutions of Four-Dimensional Field Theories Via M-Theory. Nucl. Phys. B 500, 3 (1997) [hep-th/9703166]
5. Cherkis, S. and Kapustin, A.: Periodic Monopoles With Singularities and N = 2 Super-QCD. To appear
6. Jaffe, A. and Taubes, C.: Vortices And Monopoles. Structure of Static Gauge Theories. Boston: Birkhäuser, 1980
7. Schenk, H.: On A Generalized Fourier Transform of Instantons over Flat Tori. Commun. Math. Phys. 116, 177–183 (1988)
8. Seiberg, N. and Witten, E.: Gauge Dynamics and Compactification to Three Dimensions. [hep-th/9607163]
9. Seiberg, N. and Witten, E.: Electric–Magnetic Duality, Monopole Condensation, and Confinement in N = 2 Supersymmetric Yang–Mills Theory. Nucl. Phys. B 426, 19 (1994) [hep-th/9407087]
10. Argyres, P.C. and Faraggi, A.E.: The Vacuum Structure and Spectrum of N = 2 Supersymmetric SU(n) Gauge Theory. Phys. Rev. Lett. 74, 3931 (1995) [hep-th/9411057]
11. Braam, P.J. and van Baal, P.: Nahm's Transformation for Instantons. Commun. Math. Phys. 122, 267 (1989)
12. Jardim, M.: Nahm Transform for Doubly-Periodic Instantons. math.dg/9910120; Spectral Curves and Nahm Transform for Doubly Periodic Instantons. math.ag/9909146
13. Donaldson, S.K. and Kronheimer, P.B.: The Geometry Of Four-Manifolds. Oxford Mathematical Monographs, New York: Clarendon Press, Oxford University Press, 1990
14. Hitchin, N.J.: The Self-duality Equations on a Riemann Surface. Proc. Lond. Math. Soc. 55, 59 (1987); Stable Bundles and Integrable Systems. Duke. Math. J. 54, 91 (1987)
15. Callias, C.: Index Theorems On Open Spaces. Commun. Math. Phys. 62, 213 (1978)
16. Luty, M.A. and Taylor, W.I.: Varieties Of Vacua in Classical Supersymmetric Gauge Theories. Phys. Rev. D 53, 3399 (1996) [hep-th/9506098]
17. Simpson, C.T.: Higgs Bundles And Local Systems. IHES Publ. Math. 75, 5 (1992); The Hodge Filtration On Nonabelian Cohomology. In: Algebraic Geometry – Santa Cruz 1995, Proc. Symp. Pure Math. 62, Part 2, Providence, RI: Am. Math. Soc., 1997, p. 217
18. Konno, H.: Construction of The Moduli Space of Stable Parabolic Higgs Bundles on a Riemann Surface. J. Math. Soc. Japan 45, 253 (1993)
19. Donagi, R. and Witten, E.: Supersymmetric Yang–Mills Theory And Integrable Systems. Nucl. Phys. B 460, 299 (1996) [hep-th/9510101]
20. Hitchin, N.J., Karlhede, A., Lindström, U. and Roček, M.: Hyperkähler Metrics and Supersymmetry. Commun. Math. Phys. 108, 535 (1987)

Communicated by A. Connes
Commun. Math. Phys. 218, 373 – 391 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
A Riemann–Roch Theorem for One-Dimensional Complex Groupoids
Denis Perrot (Allocataire de recherche MENRT)
Centre de Physique Théorique, CNRS-Luminy, Case 907, 13288 Marseille cedex 9, France. E-mail: [email protected]
Received: 1 February 2000 / Accepted: 3 December 2000

Abstract: We consider a smooth groupoid of the form $\Sigma \rtimes \Gamma$, where $\Sigma$ is a Riemann surface and $\Gamma$ a discrete pseudogroup acting on $\Sigma$ by local conformal diffeomorphisms. After defining a K-cycle on the crossed product $C_0(\Sigma) \rtimes \Gamma$ generalising the classical Dolbeault complex, we compute its Chern character in cyclic cohomology, using the index theorem of Connes and Moscovici. This involves in particular a generalisation of the Euler class constructed from the modular automorphism group of the von Neumann algebra $L^\infty(\Sigma) \rtimes \Gamma$.

1. Introduction

In a series of papers [4, 5], Connes and Moscovici proved a general index theorem for transversally (hypo)elliptic operators on foliations. After constructing K-cycles on the crossed product algebra $C_0(M) \rtimes \Gamma$, where $\Gamma$ is a discrete pseudogroup acting on the manifold $M$ by local diffeomorphisms [4], they developed a theory of characteristic classes for actions of Hopf algebras that generalise the usual Chern–Weil construction to the non-commutative case [5, 6]. The Chern character of the K-cycles in question is then captured in the periodic cyclic cohomology of a particular Hopf algebra encoding the action of the diffeomorphisms on $M$. This cyclic cohomology can be completely computed as Gelfand–Fuchs cohomology, which renders the index computable. We shall illustrate these methods with a specific example, namely the crossed product of a Riemann surface $\Sigma$ by a discrete pseudogroup $\Gamma$ of local conformal mappings. We find that the relevant characteristic classes are the fundamental class $[\Sigma]$ and a cyclic 2-cocycle on $C_c^\infty(\Sigma) \rtimes \Gamma$ generalising the (Poincaré dual of the) usual Euler class. When applied to the K-cycle represented by the Dolbeault operator of $\Sigma$, this yields a non-commutative version of the Riemann–Roch theorem. Throughout the text we also stress the crucial role played by the modular automorphism group of the von Neumann algebra $L^\infty(\Sigma) \rtimes \Gamma$.
2. The Dolbeault K-Cycle

Let $\Sigma$ be a Riemann surface without boundary and $\Gamma$ a pseudogroup of local conformal mappings of $\Sigma$ into itself. We want to define a K-cycle on the algebra $C_0(\Sigma) \rtimes \Gamma$ generalising the classical Dolbeault complex. Following [4], the first step consists in lifting the action of $\Gamma$ to the bundle $P$ over $\Sigma$, whose fiber at a point $x$ is the set of Kähler metrics corresponding to the complex structure of $\Sigma$ at $x$. By the obvious correspondence metric ↔ volume form, $P$ is the $\mathbb{R}^*_+$-principal bundle of densities on $\Sigma$. The pseudogroup $\Gamma$ acts canonically on $P$ and we consider the crossed product $C_0(P) \rtimes \Gamma$. Let $\nu$ be a smooth volume form on $\Sigma$. As in [2], this gives a weight on the von Neumann algebra $L^\infty(\Sigma) \rtimes \Gamma$ together with a representative $\sigma$ of its modular automorphism group. Moreover $\sigma$ leaves $C_0(\Sigma) \rtimes \Gamma$ globally invariant and one has
\[ C_0(P) \rtimes \Gamma = (C_0(\Sigma) \rtimes \Gamma) \rtimes_\sigma \mathbb{R}, \tag{1} \]
where the space $P$ is identified with $\Sigma \times \mathbb{R}$ thanks to the choice of the global section $\nu$. Therefore one has a Thom-Connes isomorphism [1]
\[ K_i(C_0(\Sigma) \rtimes \Gamma) \to K_{i+1}(C_0(P) \rtimes \Gamma), \qquad i = 0, 1, \tag{2} \]
and we shall obtain the desired K-homology class on $C_0(P) \rtimes \Gamma$. The reason for working on $P$ rather than $\Sigma$ is that $P$ carries quasi $\Gamma$-invariant metric structures, allowing the construction of K-cycles represented by differential hypoelliptic operators [4].

More precisely, consider the product $P \times \mathbb{R}$, viewed as a bundle over $\Sigma$ with 2-dimensional fiber. The action of $\Gamma$ extends to $P \times \mathbb{R}$ by making $\mathbb{R}$ invariant. Up to another Thom isomorphism, the K-cycle may be defined on $C_0(P \times \mathbb{R}) \rtimes \Gamma = (C_0(P) \rtimes \Gamma) \otimes C_0(\mathbb{R})$. By a choice of horizontal subspaces on the bundle $P \times \mathbb{R}$, one can lift the Dolbeault operator $\bar\partial$ of $\Sigma$. This yields the horizontal operator $Q_H = \bar\partial + \bar\partial^*$, where the adjoint $\bar\partial^*$ is taken relative to the $L^2$-norm given by the canonical invariant measure on $P \times \mathbb{R}$ (see [4] for details). Finally, consider the signature operator of the fibers, $Q_V = d_V d_V^* - d_V^* d_V$, where $d_V$ is the vertical differential. Then the sum $Q = Q_H + Q_V$ is a hypoelliptic operator representing our Dolbeault K-cycle. This construction ensures that the principal symbol of $Q$ is completely canonical, because it is related only to the fibration of $P \times \mathbb{R}$ over $\Sigma$, and hence is invariant under $\Gamma$. Another choice of horizontal subspaces does not change the leading term of the symbol of $Q$. This is basically the reason why $Q$ allows one to construct a spectral triple (of even parity) for the algebra $C_c^\infty(P \times \mathbb{R}) \rtimes \Gamma$. If $\Gamma = \mathrm{Id}$, then $C_0(P \times \mathbb{R}) = C_0(\Sigma) \otimes C_0(\mathbb{R}^2)$ and the addition of $Q_V$ to $Q_H$ is nothing else but a Thom isomorphism in K-homology
\[ K^*(C_0(\Sigma)) \to K^*(C_0(P \times \mathbb{R})) \tag{3} \]
sending the classical Dolbeault elliptic operator $\bar\partial + \bar\partial^*$ to $Q$.

Now we want to compute the Chern character of $Q$ in the periodic cyclic cohomology $H^*(C_c^\infty(P \times \mathbb{R}) \rtimes \Gamma)$ using the index theorem of [5]. We need first to construct an odd cycle by tensoring the Dolbeault complex with the spectral triple of the real line $(C_c^\infty(\mathbb{R}), L^2(\mathbb{R}), i\,\partial/\partial x)$. In this way we get a differential operator $Q' = Q + i\,\partial/\partial x$ whose Chern character lives in the cyclic cohomology of $(C_c^\infty(P) \rtimes \Gamma) \otimes C_c^\infty(\mathbb{R}^2)$. By Bott periodicity it is just the cup product
\[ \mathrm{ch}_*(Q') = \varphi \# [\mathbb{R}^2] \tag{4} \]
of a cyclic cocycle $\varphi \in HC^*(C_c^\infty(P) \rtimes \Gamma)$ by the fundamental class of $\mathbb{R}^2$. The main theorem of [5] states that $\varphi$ can be computed from Gelfand–Fuchs cohomology, after passing through the cyclic cohomology of a particular Hopf algebra. We perform the explicit computation in the remainder of the paper.

3. The Hopf Algebra and Its Cyclic Cohomology

First we reduce to the case of a flat Riemann surface, since for any groupoid $\Sigma \rtimes \Gamma$ one can find a flat surface $\Sigma'$ and a pseudogroup $\Gamma'$ acting by conformal transformations on $\Sigma'$ such that $C_0(\Sigma') \rtimes \Gamma'$ is Morita equivalent to $C_0(\Sigma) \rtimes \Gamma$ (see [5] and Sect. 5 below). Let then $\Sigma$ be a flat Riemann surface and $(z, \bar z)$ a complex coordinate system corresponding to the complex structure of $\Sigma$. Let $F$ be the $Gl(1, \mathbb{C})$-principal bundle over $\Sigma$ of frames corresponding to the conformal structure. $F$ is equipped with the coordinate system $(z, \bar z, y, \bar y)$, $y, \bar y \in \mathbb{C}^*$. A point of $F$ is the frame
\[ (y\partial_z,\ \bar y\partial_{\bar z}) \quad \text{at } (z, \bar z). \tag{5} \]
The action of a discrete pseudogroup $\Gamma$ of conformal transformations on $\Sigma$ can be lifted to an action on $F$ by pushforward on frames. More precisely, a holomorphic transformation $\psi \in \Gamma$ acts on the coordinates by
\[ z \to \psi(z), \qquad \mathrm{Dom}\,\psi \subset F, \tag{6} \]
\[ y \to \psi'(z)\,y, \qquad \psi'(z) = \partial_z \psi(z). \tag{7} \]
Let $C_c^\infty(F)$ be the algebra of smooth complex-valued functions with compact support on $F$, and consider the crossed product $\mathcal{A} = C_c^\infty(F) \rtimes \Gamma$. $\mathcal{A}$ is the associative algebra linearly generated by elements of the form $f U_\psi^*$ with $\psi \in \Gamma$, $f \in C_c^\infty(F)$, $\mathrm{supp} f \subset \mathrm{Dom}\,\psi$. We adopt the notation $U_\psi \equiv U^*_{\psi^{-1}}$ for the inverse of $U_\psi^*$. The multiplication rule
\[ f_1 U^*_{\psi_1} f_2 U^*_{\psi_2} = f_1 (f_2 \circ \psi_1) U^*_{\psi_2 \psi_1} \tag{8} \]
makes good sense thanks to the condition $\mathrm{supp} f_i \subset \mathrm{Dom}\,\psi_i$. We introduce now the differential operators
\[ X = y\partial_z, \quad Y = y\partial_y, \quad \bar X = \bar y\partial_{\bar z}, \quad \bar Y = \bar y\partial_{\bar y}, \tag{9} \]
forming a basis of the set of smooth vector fields viewed as a module over $C^\infty(F)$. These operators act on $\mathcal{A}$ in a natural way:
\[ X.(f U_\psi^*) = (X.f) U_\psi^*, \qquad Y.(f U_\psi^*) = (Y.f) U_\psi^* \tag{10} \]
and similarly for $\bar X$, $\bar Y$. We remark that the system $(z, \bar z)$ determines a smooth volume form $\frac{dz \wedge d\bar z}{2i}$ on $\Sigma$. This in turn gives a representative $\sigma$ of the modular automorphism group of $L^\infty(\Sigma) \rtimes \Gamma$, whose action on $C_c^\infty(\Sigma) \rtimes \Gamma$ reads (cf. [3] chap. III)
\[ \sigma_t(f U_\psi^*) = |\psi'|^{2it} f U_\psi^*, \qquad t \in \mathbb{R}. \tag{11} \]
We let $D$ be the derivation corresponding to the infinitesimal action of $\sigma$:
\[ D = -i \left.\frac{d}{dt}\sigma_t\right|_{t=0}, \qquad D(f U_\psi^*) = \ln|\psi'|^2\, f U_\psi^*. \tag{12} \]
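As a quick consistency check (a sketch added here, not part of the original argument), one can verify that the operator $D$ of (12) is a derivation using only the multiplication rule (8) and the chain rule. The shorthand $a_i = f_i U^*_{\psi_i}$ is introduced for convenience.

```latex
\begin{align*}
D(a_1 a_2)
 &= \ln|(\psi_2\circ\psi_1)'|^2\; f_1 (f_2\circ\psi_1)\, U^*_{\psi_2\psi_1} \\
 &= \bigl(\ln|\psi_1'|^2 + \ln|\psi_2'|^2\circ\psi_1\bigr)\, f_1 (f_2\circ\psi_1)\, U^*_{\psi_2\psi_1}
 && \text{since } (\psi_2\circ\psi_1)' = (\psi_2'\circ\psi_1)\,\psi_1' \\
 &= \bigl(\ln|\psi_1'|^2 f_1 U^*_{\psi_1}\bigr)\, f_2 U^*_{\psi_2}
  \;+\; f_1 U^*_{\psi_1}\, \bigl(\ln|\psi_2'|^2 f_2 U^*_{\psi_2}\bigr) \\
 &= D(a_1)\, a_2 + a_1\, D(a_2).
\end{align*}
```

The third line uses (8) twice, once with $\ln|\psi_1'|^2 f_1$ in place of $f_1$ and once with $\ln|\psi_2'|^2 f_2$ in place of $f_2$.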
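The operators $\delta_n$, $\bar\delta_n$, $n \ge 1$, are defined recursively
\[ \delta_n = \underbrace{[X, \ldots [X,}_{n} D] \ldots ], \qquad \bar\delta_n = \underbrace{[\bar X, \ldots [\bar X,}_{n} D] \ldots ]. \tag{13} \]
Their action on $\mathcal{A}$ is explicitly given by
\[ \delta_n(f U_\psi^*) = y^n \partial_z^n (\ln \psi')\, f U_\psi^*, \qquad \bar\delta_n(f U_\psi^*) = \bar y^n \partial_{\bar z}^n (\ln \bar\psi')\, f U_\psi^*. \tag{14} \]
Thus $\delta_n$, $\bar\delta_n$ represent in some sense the Taylor expansion of $D$. All these operators fulfill the commutation relations
\[ [Y, X] = X, \quad [Y, \delta_n] = n\delta_n, \quad [X, \delta_n] = \delta_{n+1}, \quad [\delta_n, \delta_m] = 0, \tag{15} \]
and similarly for the conjugates $\bar X$, $\bar Y$, $\bar\delta_n$. Thus $\{X, Y, \delta_n, \bar X, \bar Y, \bar\delta_n\}_{n \ge 1}$ form a basis of a (complex) Lie algebra. Let $\mathcal{H}$ be its enveloping algebra. The remarkable fact is that $\mathcal{H}$ is a Hopf algebra. First, the coproduct $\Delta : \mathcal{H} \to \mathcal{H} \otimes \mathcal{H}$ is determined by the action of $\mathcal{H}$ on $\mathcal{A}$:
\[ \Delta h(a_1 \otimes a_2) = h(a_1 a_2) \qquad \forall\, h \in \mathcal{H},\ a_i \in \mathcal{A}. \tag{16} \]
One has
\[ \Delta X = 1 \otimes X + X \otimes 1 + \delta_1 \otimes Y, \qquad \Delta Y = 1 \otimes Y + Y \otimes 1, \qquad \Delta\delta_1 = 1 \otimes \delta_1 + \delta_1 \otimes 1. \tag{17} \]
$\Delta\delta_n$ for $n > 1$ is obtained recursively from (13) using the fact that $\Delta$ is an algebra homomorphism, $\Delta(h_1 h_2) = \Delta h_1\, \Delta h_2$. Similarly for the conjugate elements. The counit $\varepsilon : \mathcal{H} \to \mathbb{C}$ satisfies simply $\varepsilon(1) = 1$, $\varepsilon(h) = 0$ $\forall\, h \ne 1$. Finally, $\mathcal{H}$ has an antipode $S : \mathcal{H} \to \mathcal{H}$, determined uniquely by the condition $m \circ (S \otimes \mathrm{Id}) \circ \Delta = m \circ (\mathrm{Id} \otimes S) \circ \Delta = \eta\varepsilon$, where $m : \mathcal{H} \otimes \mathcal{H} \to \mathcal{H}$ is the multiplication and $\eta : \mathbb{C} \to \mathcal{H}$ the unit of $\mathcal{H}$. One finds
\[ S(X) = -X + \delta_1 Y, \qquad S(Y) = -Y, \qquad S(\delta_1) = -\delta_1. \tag{18} \]
Since $S$ is an antiautomorphism, $S(h_1 h_2) = S(h_2) S(h_1)$, the values of $S(\delta_n)$, $n > 1$, follow.

We are interested now in the cyclic cohomology of $\mathcal{H}$ [5, 6]. As a space, the cochain complex $C^*(\mathcal{H})$ is the tensor algebra over $\mathcal{H}$:
\[ C^*(\mathcal{H}) = \bigoplus_{n=0}^{\infty} \mathcal{H}^{\otimes n}. \tag{19} \]
The crucial step is the construction of a characteristic map
\[ \gamma : \mathcal{H}^{\otimes n} \to C^n(\mathcal{A}, \mathcal{A}^*) \tag{20} \]
from the cochain complex of $\mathcal{H}$ to the Hochschild complex of $\mathcal{A}$ with coefficients in $\mathcal{A}^*$ [3]. First $F$ has a canonical $\Gamma$-invariant measure $dv = \frac{dz\, d\bar z\, dy\, d\bar y}{(y\bar y)^2}$. This yields a trace $\tau$ on $\mathcal{A}$:
\[ \tau(f) = \int_F f\, dv, \quad f \in C_c^\infty(F), \qquad \tau(f U_\psi^*) = 0 \ \text{ if } \psi \ne 1. \tag{21} \]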
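The $\Gamma$-invariance of the measure $dv$ can be checked directly from the lifted action (6)–(7). The following is a short worked computation under the notation of this section; primed coordinates $z' = \psi(z)$, $y' = \psi'(z)\,y$ are introduced here only for the check.

```latex
\begin{align*}
dz' &= \psi'(z)\,dz, &
dy' &= \psi'(z)\,dy + \psi''(z)\,y\,dz, \\
dz'\wedge d\bar z'\wedge dy'\wedge d\bar y'
 &= |\psi'|^{4}\; dz\wedge d\bar z\wedge dy\wedge d\bar y, &
(y'\bar y')^{2} &= |\psi'|^{4}\,(y\bar y)^{2}, \\
\frac{dz'\,d\bar z'\,dy'\,d\bar y'}{(y'\bar y')^{2}}
 &= \frac{dz\,d\bar z\,dy\,d\bar y}{(y\bar y)^{2}}.
\end{align*}
```

The $\psi''$ term in $dy'$ drops out of the four-form because it is proportional to $dz$, which already appears as a factor.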
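Then the characteristic map sends the n-cochain $h_1 \otimes \cdots \otimes h_n \in \mathcal{H}^{\otimes n}$ to the Hochschild cochain $\gamma(h_1 \otimes \cdots \otimes h_n) \in C^n(\mathcal{A}, \mathcal{A}^*)$ given by
\[ \gamma(h_1 \otimes \cdots \otimes h_n)(a_0, \ldots, a_n) = \tau(a_0 h_1(a_1) \ldots h_n(a_n)), \qquad a_i \in \mathcal{A}. \tag{22} \]
The cyclic cohomology of $\mathcal{H}$ is defined such that $\gamma$ is a morphism of cyclic complexes. One introduces the face operators $\delta^i : \mathcal{H}^{\otimes(n-1)} \to \mathcal{H}^{\otimes n}$ for $0 \le i \le n$:
\[ \begin{aligned} \delta^0(h_1 \otimes \cdots \otimes h_{n-1}) &= 1 \otimes h_1 \otimes \cdots \otimes h_{n-1}, \\ \delta^i(h_1 \otimes \cdots \otimes h_{n-1}) &= h_1 \otimes \cdots \otimes \Delta h_i \otimes \cdots \otimes h_{n-1}, \qquad 1 \le i \le n-1, \\ \delta^n(h_1 \otimes \cdots \otimes h_{n-1}) &= h_1 \otimes \cdots \otimes h_{n-1} \otimes 1, \end{aligned} \tag{23} \]
as well as the degeneracy operators $\sigma_i : \mathcal{H}^{\otimes(n+1)} \to \mathcal{H}^{\otimes n}$,
\[ \sigma_i(h_1 \otimes \cdots \otimes h_{n+1}) = h_1 \otimes \ldots \varepsilon(h_{i+1}) \cdots \otimes h_{n+1}, \qquad 0 \le i \le n. \tag{24} \]
Next, the cyclic structure is provided by the antipode $S$ and the multiplication of $\mathcal{H}$. Consider the twisted antipode $\tilde S = (\delta \otimes S) \circ \Delta$, where $\delta : \mathcal{H} \to \mathbb{C}$ is a character such that
\[ \tau(h(a) b) = \tau(a\, \tilde S(h)(b)) \qquad \forall\, a, b \in \mathcal{A}. \tag{25} \]
This last formula plays the role of ordinary integration by parts. One finds:
\[ \delta(1) = 1, \qquad \delta(Y) = \delta(\bar Y) = 1, \qquad \delta(X) = \delta(\bar X) = \delta(\delta_n) = \delta(\bar\delta_n) = 0 \quad \forall\, n \ge 1. \tag{26} \]
The definition implies $\tilde S^2 = 1$. Connes and Moscovici proved in [6] that the latter identity is sufficient to ensure the existence of a cyclicity operator $\tau_n : \mathcal{H}^{\otimes n} \to \mathcal{H}^{\otimes n}$,
\[ \tau_n(h_1 \otimes \cdots \otimes h_n) = (\Delta^{n-1} \tilde S(h_1)) \cdot h_2 \otimes \cdots \otimes h_n \otimes 1, \tag{27} \]
with $(\tau_n)^{n+1} = 1$. Now $C^*(\mathcal{H})$ endowed with $\delta^i$, $\sigma_i$, $\tau_n$ defines a cyclic complex. The Hochschild coboundary operator $b : \mathcal{H}^{\otimes n} \to \mathcal{H}^{\otimes(n+1)}$ is
\[ b = \sum_{i=0}^{n+1} (-)^i \delta^i \tag{28} \]
and Connes' operator $B : \mathcal{H}^{\otimes(n+1)} \to \mathcal{H}^{\otimes n}$ is
\[ B = \sum_{i=0}^{n} (-)^{ni} (\tau_n)^i B_0, \qquad B_0 = \sigma_n \tau_{n+1} + (-)^n \sigma_n. \tag{29} \]
They fulfill the usual relations $B^2 = b^2 = bB + Bb = 0$, so that $C^*(\mathcal{H}, b, B)$ is a bicomplex. We define the cyclic cohomology $HC^*(\mathcal{H})$ as the $b$-cohomology of the subcomplex of cyclic cochains. The corresponding periodic cyclic cohomology $H^*(\mathcal{H})$ is isomorphic to the cohomology of the bicomplex $C^*(\mathcal{H}, b, B)$ [3]. Furthermore, the definitions of $\delta^i$, $\sigma_i$, $\tau_n$ imply that $\gamma$ is a morphism of cyclic complexes. Consequently, $\gamma$ passes to cyclic cohomology
\[ \gamma : HC^*(\mathcal{H}) \to HC^*(\mathcal{A}), \tag{30} \]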
as well as to periodic cyclic cohomology
\[ \gamma : H^*(\mathcal{H}) \to H^*(\mathcal{A}). \tag{31} \]
In fact we are not interested in the frame bundle $F$ but rather in the bundle of metrics $P = F/SO(2)$, where $SO(2) \subset Gl(1, \mathbb{C})$ is the group of rotations of frames. $P$ is equipped with the coordinate chart $(z, \bar z, r)$, where the radial coordinate $r$ is obtained from the decomposition
\[ y = e^{-r + i\theta}, \qquad r \in \mathbb{R},\ \theta \in [0, 2\pi[. \tag{32} \]
The pseudogroup $\Gamma$ still acts on $P$ by
\[ z \to \psi(z), \qquad \bar z \to \overline{\psi(z)}, \qquad r \to r - \tfrac{1}{2} \ln|\psi'(z)|^2. \tag{33} \]
Define $\mathcal{A}_1 = \mathcal{A}^{SO(2)} \subset \mathcal{A}$, the subalgebra of elements of $\mathcal{A}$ invariant under the (right) action of $SO(2)$ on $F$. $\mathcal{A}_1$ is canonically isomorphic to the crossed product $C_c^\infty(P) \rtimes \Gamma$. $P$ carries a $\Gamma$-invariant measure $dv_1 = e^{2r} dz\, d\bar z\, dr$, so that there is a trace on $\mathcal{A}_1$, namely
\[ \tau_1(f) = \int_P f\, dv_1, \quad f \in C_c^\infty(P), \qquad \tau_1(f U_\psi^*) = 0 \ \text{ if } \psi \ne 1. \tag{34} \]
Thus passing to $SO(2)$-invariants yields an induced characteristic map from the relative cyclic cohomology of $\mathcal{H}$ [5]
\[ \gamma_1 : HC^*(\mathcal{H}, SO(2)) \to HC^*(\mathcal{A}_1) \tag{35} \]
given by $\gamma_1(h_1 \otimes \cdots \otimes h_n)(a_0, \ldots, a_n) = \tau_1(a_0 h_1(a_1) \ldots h_n(a_n))$, $a_i \in \mathcal{A}_1$, where $h_1 \otimes \cdots \otimes h_n$ represents an element of $HC^*(\mathcal{H}, SO(2))$. The map $\gamma_1$ generalises the classical Chern-Weil construction of characteristic classes from connections and curvatures. In the crossed product case, these classes are captured by the periodic cyclic cohomology of $\mathcal{H}$. Connes and Moscovici computed the latter as Gelfand–Fuchs cohomology. This is the subject of the next section.

4. Gelfand–Fuchs Cohomology

Let $G$ be the group of complex analytic transformations of $\mathbb{C}$. $G$ has a unique decomposition $G = G_1 G_2$, where $G_1$ is the group of affine transformations
\[ x \to ax + b, \qquad x \in \mathbb{C},\ a, b \in \mathbb{C},\ a \ne 0, \tag{36} \]
and $G_2$ is the group of transformations of the form
\[ x \to x + o(x). \tag{37} \]
Any element of $G$ is then the composition $k \circ \psi$ for $k \in G_1$, $\psi \in G_2$. Since $G_2$ is the left quotient of $G$ by $G_1$, $G_1$ acts on $G_2$ from the right: for $k \in G_1$, $\psi \in G_2$, one has $\psi \triangleleft k \in G_2$. Similarly, $G_2$ acts on $G_1$ from the left: $\psi \triangleright k \in G_1$.
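The decomposition $G = G_1 G_2$ can be made concrete on truncated Taylor series at the origin. The sketch below is only an illustration under that assumption: the function names `factor_affine` and `compose_affine` are hypothetical, and a transformation is represented by its list of Taylor coefficients.

```python
from fractions import Fraction

def factor_affine(coeffs):
    """Split phi(x) = c0 + c1*x + c2*x^2 + ... as k(psi(x)).

    k(x) = a*x + b is affine (an element of G1) and
    psi(x) = x + O(x^2) (an element of G2, truncated like the input).
    Returns (a, b, psi_coeffs).
    """
    c = [Fraction(v) for v in coeffs]
    b, a = c[0], c[1]
    if a == 0:
        raise ValueError("phi must be locally invertible: c1 != 0")
    # Dividing the non-affine part by a normalises psi'(0) to 1.
    psi = [Fraction(0), Fraction(1)] + [v / a for v in c[2:]]
    return a, b, psi

def compose_affine(a, b, psi):
    """Coefficients of k(psi(x)) = a*psi(x) + b."""
    out = [a * v for v in psi]
    out[0] += b
    return out

phi = [3, 2, 5, -4]            # phi(x) = 3 + 2x + 5x^2 - 4x^3
a, b, psi = factor_affine(phi)
recomposed = compose_affine(a, b, psi)
```

Here `b = phi(0)` and `a = phi'(0)` are forced by the normalisation $\psi(0) = 0$, $\psi'(0) = 1$, which is why the factorisation is unique.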
We remark that $G_1$ is the crossed product $\mathbb{C} \rtimes Gl(1, \mathbb{C})$. The space $\mathbb{C} \times Gl(1, \mathbb{C})$ is a prototype for the frame bundle $F$ of a flat Riemann surface. This motivates the notation $a = y$, $b = z$ for the coordinates on $G_1$. Under this identification, the left action of $G_2$ on $G_1$ corresponds to the action of $G_2$ on $F$: for a holomorphic transformation $\psi \in G_2$, one has
\[ z \to \psi(z), \qquad y \to \psi'(z)\, y, \tag{38} \]
with $\psi(0) = 0$, $\psi'(0) = 1$. Furthermore, the vector fields $X, \bar X, Y, \bar Y$ form a basis of invariant vector fields for the left action of $G_1$ on itself, i.e. a basis of the (complexified) Lie algebra of $G_1$. Its dual basis is given by the left-invariant 1-forms (Maurer–Cartan form)
\[ \omega_{-1} = y^{-1} dz, \quad \bar\omega_{-1} = \bar y^{-1} d\bar z, \quad \omega_0 = y^{-1} dy, \quad \bar\omega_0 = \bar y^{-1} d\bar y. \tag{39} \]
The left action $G_2 \triangleright G_1$ implies a right action of $G_2$ on forms by pullback. One has in particular, for $\psi \in G_2$,
\[ \omega_{-1} \circ \psi = \omega_{-1}, \qquad \omega_0 \circ \psi = \omega_0 + y\partial_z \ln \psi'\, \omega_{-1} \qquad \text{and c.c.} \tag{40} \]
Consider now the discrete crossed product $\mathcal{H}_* = C_c^\infty(G_1) \rtimes G_2$, where $G_2$ acts on $C_c^\infty(G_1)$ by pullback. As a coalgebra, $\mathcal{H}$ is dual to the algebra $\mathcal{H}_*$. One has a natural action of $\mathcal{H}$ on $\mathcal{H}_*$:
\[ X.(f U_\psi^*) = X.f\, U_\psi^*, \qquad \delta_n(f U_\psi^*) = y^n \partial_z^n \ln \psi'\, f U_\psi^*, \qquad f \in C_c^\infty(G_1),\ \psi \in G_2, \tag{41} \]
and so on with $Y, \bar X, \ldots$. The operators $\delta_n$, $\bar\delta_n$ have in fact an interpretation in terms of coordinates on the group $G_2$: for $\psi \in G_2$, $\delta_n(\psi)$ is by definition the value of the function $\delta_n(U_\psi^*) U_\psi$ at $1 \in G_1$. For any $k \in G_1$, one has
\[ [\delta_n(U_\psi^*) U_\psi](k) = \delta_n(\psi \triangleleft k). \tag{42} \]
Note that (40) can be rewritten as
\[ \omega_0 \circ \psi = \omega_0 + \delta_1(\psi \triangleleft k)\, \omega_{-1} \qquad \text{at } k \in G_1. \tag{43} \]
The Hopf subalgebra of $\mathcal{H}$ generated by $\delta_n$, $\bar\delta_n$, $n \ge 1$, corresponds to the commutative Hopf algebra of functions on $G_2$ which are polynomial in these coordinates.

Let $A$ be the complexification of the formal Lie algebra of $G$. It coincides with the jets of holomorphic and antiholomorphic vector fields of any order on $\mathbb{C}$:
\[ \partial_x,\ x\partial_x,\ \ldots,\ x^n\partial_x,\ \ldots, \qquad \partial_{\bar x},\ \bar x\partial_{\bar x},\ \ldots,\ \bar x^n\partial_{\bar x},\ \ldots, \qquad x \in \mathbb{C}. \tag{44} \]
The Lie bracket between the elements of the above basis is thus
\[ [x^n\partial_x, x^m\partial_x] = (m - n)\, x^{n+m-1}\partial_x \quad \text{and c.c.}, \qquad [x^n\partial_x, \bar x^m\partial_{\bar x}] = 0. \tag{45} \]
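The holomorphic bracket in (45) is easy to confirm mechanically. The following minimal sketch represents a polynomial vector field $p(x)\partial_x$ by the coefficient list of $p$ and uses the standard formula $[p\,\partial_x, q\,\partial_x] = (p q' - q p')\,\partial_x$; the helper names are hypothetical.

```python
def derivative(p):
    """Coefficient list of p'(x); p[i] is the coefficient of x**i."""
    return [i * p[i] for i in range(1, len(p))] or [0]

def multiply(p, q):
    """Coefficient list of the product p(x) * q(x)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def bracket(p, q):
    """[p d/dx, q d/dx] = (p q' - q p') d/dx as a coefficient list."""
    left = multiply(p, derivative(q))
    right = multiply(q, derivative(p))
    size = max(len(left), len(right))
    left += [0] * (size - len(left))
    right += [0] * (size - len(right))
    return trim([a - b for a, b in zip(left, right)])

def monomial(n, scale=1):
    """Coefficient list of scale * x**n."""
    return [0] * n + [scale]

# Example: [x d/dx, x^3 d/dx] = 2 x^3 d/dx
example = bracket(monomial(1), monomial(3))
```

The mixed bracket in (45) vanishes for a different reason: $x^n\partial_x$ and $\bar x^m\partial_{\bar x}$ act on independent variables, so no computation is needed there.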
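Define the generator of dilatations $H = x\partial_x + \bar x\partial_{\bar x}$ and of rotations $J = x\partial_x - \bar x\partial_{\bar x}$. They fulfill the properties
\[ \begin{aligned} {}[H, x^n\partial_x] &= (n-1)\, x^n\partial_x, & [H, \bar x^n\partial_{\bar x}] &= (n-1)\, \bar x^n\partial_{\bar x}, \\ [J, x^n\partial_x] &= (n-1)\, x^n\partial_x, & [J, \bar x^n\partial_{\bar x}] &= -(n-1)\, \bar x^n\partial_{\bar x}. \end{aligned} \tag{46} \]
We are interested in the Lie algebra cohomology of $A$ (see [7]). The complex $C^*(A)$ of cochains is the exterior algebra generated by the dual basis $\{\omega_n, \bar\omega_n\}_{n \ge -1}$:
\[ \omega_n(x^m\partial_x) = \delta^m_{n+1}, \quad \omega_n(\bar x^m\partial_{\bar x}) = 0, \quad \bar\omega_n(x^m\partial_x) = 0, \quad \bar\omega_n(\bar x^m\partial_{\bar x}) = \delta^m_{n+1}, \qquad \forall\, n \ge -1,\ m \ge 0, \tag{47} \]
and the coboundary operator is uniquely defined by its action on 1-cochains
\[ d\omega(X, Y) = -\omega([X, Y]) \qquad \forall\, X, Y \in A. \tag{48} \]
From [5] we know that the periodic cyclic cohomology $H^*(\mathcal{H}, SO(2))$ is isomorphic to the relative Lie algebra cohomology $H^*(A, SO(2))$, i.e. the cohomology of the basic subcomplex of cochains on $A$ relative to the Cartan operation $(L, i)$ of $J$:
\[ L_J \omega = (i_J d + d\, i_J)\omega \qquad \forall\, \omega \in C^*(A). \tag{49} \]
We say that a cochain $\omega \in C^*(A)$ is of weight $r$ if $L_H \omega = -r\omega$. Remark that
\[ L_H \omega_n = -n\omega_n, \qquad L_H \bar\omega_n = -n\bar\omega_n \qquad \forall\, n \ge -1, \tag{50} \]
so that $C^*(A)$ is the direct sum, for $r \ge -2$, of the spaces $C_r^*(A)$ of weight $r$. Since $[H, J] = 0$, $C_r^*(A)$ is stable under the Cartan operation of $J$ and we denote by $C_r^*(A, SO(2))$ the complex of basic cochains of weight $r$. Then we have
\[ C^*(A, SO(2)) = \bigoplus_{r=-2}^{\infty} C_r^*(A, SO(2)). \tag{51} \]
For any cocycle $\omega \in C_r^*(A, SO(2))$,
\[ L_H \omega = d\, i_H \omega = -r\omega, \tag{52} \]
so that $C_r^*(A, SO(2))$ is acyclic whenever $r \ne 0$. Hence $H^*(A, SO(2))$ is equal to the cohomology of the finite-dimensional subcomplex $C_0^*(A, SO(2))$. The direct computation gives
\[ \begin{aligned} H^0(A, SO(2)) &= \mathbb{C} \quad \text{with representative } 1, \\ H^2(A, SO(2)) &= \mathbb{C} \quad \text{with representative } \omega_{-1}\omega_1, \\ H^3(A, SO(2)) &= \mathbb{C} \quad \text{with representative } (\omega_{-1}\omega_1 - \bar\omega_{-1}\bar\omega_1)(\omega_0 + \bar\omega_0), \\ H^5(A, SO(2)) &= \mathbb{C} \quad \text{with representative } \omega_1\omega_{-1}\bar\omega_1\bar\omega_{-1}(\omega_0 + \bar\omega_0). \end{aligned} \tag{53} \]
The other cohomology groups vanish.

Next we construct a map $\mathcal{C}$ from $C^*(A)$ to the bicomplex $(C^{n,m}, d_1, d_2)_{n,m \in \mathbb{Z}}$ of [3] chap. III.2.δ. Let $\Omega^m(G_1)$ be the space of $m$-forms on $G_1$. $C^{n,m}$ is the space of totally antisymmetric maps $\gamma : G_2^{\,n+1} \to \Omega^m(G_1)$ such that
\[ \gamma(g_0 \triangleleft g, \ldots, g_n \triangleleft g) = \gamma(g_0, \ldots, g_n) \circ g, \qquad g_i \in G_2,\ g \in G, \tag{54} \]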
where $g_i \triangleleft g$ is given by the right action of $G$ on $G_2$, and $G$ acts on $\Omega^*(G_1)$ by pullback (left action of $G$ on $G_1$). The first differential $d_1 : C^{n,m} \to C^{n+1,m}$ is
\[ (d_1\gamma)(g_0, \ldots, g_{n+1}) = (-)^m \sum_{i=0}^{n+1} (-)^i \gamma(g_0, \ldots, \check{g}_i, \ldots, g_{n+1}), \tag{55} \]
and $d_2 : C^{n,m} \to C^{n,m+1}$ is just the de Rham coboundary on $\Omega^*(G_1)$:
\[ (d_2\gamma)(g_0, \ldots, g_n) = d(\gamma(g_0, \ldots, g_n)). \tag{56} \]
Of course $d_1^2 = d_2^2 = d_1 d_2 + d_2 d_1 = 0$. We remark that for $\gamma \in C^{n,m}$, the invariance property (54) implies
\[ \gamma(g_0, \ldots, g_n) \circ k = \gamma(g_0 \triangleleft k, \ldots, g_n \triangleleft k) \qquad \forall\, k \in G_1, \tag{57} \]
in other words the value of $\gamma(g_0, \ldots, g_n) \in \Omega^m(G_1)$ at $k$ is deduced from its value at 1.

Let us describe now the construction of $\mathcal{C}$. As a vector space, the Lie algebra $A$ is just the direct sum $\mathcal{G}_1 \oplus \mathcal{G}_2$, $\mathcal{G}_i$ being the (complexified) Lie algebra of $G_i$. The cochain complex $C^*(A)$ is then the exterior product $\Lambda A^* = \Lambda\mathcal{G}_1^* \otimes \Lambda\mathcal{G}_2^*$. One identifies $\mathcal{G}_1^*$ with the cotangent space $T_1^*(G_1)$ of $G_1$ at the identity. Since $G_2$ fixes $1 \in G_1$, there is a right action of $G_2$ on $\Lambda\mathcal{G}_1^*$ by pullback. The basis $\{\omega_{-1}, \omega_0, \bar\omega_{-1}, \bar\omega_0\}$ of $\mathcal{G}_1^*$ is represented by left-invariant one-forms on $G_1$ through the identification
\[ \omega_{-1} \to -\omega_{-1} = -y^{-1} dz, \quad \bar\omega_{-1} \to -\bar\omega_{-1} = -\bar y^{-1} d\bar z, \quad \omega_0 \to -\omega_0 = -y^{-1} dy, \quad \bar\omega_0 \to -\bar\omega_0 = -\bar y^{-1} d\bar y, \tag{58} \]
and the right action of $\psi \in G_2$ reads (cf. (40))
\[ \omega_{-1} \cdot \psi = \omega_{-1}, \qquad \omega_0 \cdot \psi = \omega_0 + \delta_1(\psi)\, \omega_{-1}. \tag{59} \]
Next, we view a cochain $\omega \in C^*(A)$ as a cochain of the Lie algebra of $G_2$ with coefficients in the right $G_2$-module $\Lambda\mathcal{G}_1^*$. It is represented by a $\Lambda\mathcal{G}_1^*$-valued right-invariant form $\mu$ on $G_2$. Then $\mathcal{C}(\omega) \in C^{*,*}$ evaluated on $(g_0, \ldots, g_n) \in G_2^{\,n+1}$ is a differential form on $G_1$ whose value at $1 \in G_1$ is
\[ \mathcal{C}(\omega)(g_0, \ldots, g_n) = \int_{\Delta(g_0, \ldots, g_n)} \mu \ \in \Lambda T_1^*(G_1), \tag{60} \]
where $\Delta(g_0, \ldots, g_n)$ is the affine simplex in the coordinates $\delta_i$, $\bar\delta_i$, with vertices $(g_0, \ldots, g_n)$. Let $\{\rho_j\}$ be a basis of left-invariant forms on $G_1$. Then
\[ \mathcal{C}(\omega)(g_0, \ldots, g_n) = \sum_j p_j(g_0, \ldots, g_n)\, \rho_j \qquad \text{at } 1 \in G_1, \tag{61} \]
where $p_j(g_0, \ldots, g_n)$ are polynomials in the coordinates $\delta_i$, $\bar\delta_i$. The invariance property (54) enables us to compute the value of $\mathcal{C}(\omega)(g_0, \ldots, g_n)$ at any $k \in G_1$,
\[ \mathcal{C}(\omega)(g_0, \ldots, g_n)(k) = \sum_j p_j(g_0 \triangleleft k, \ldots, g_n \triangleleft k)\, \rho_j \tag{62} \]
because $\rho_j \circ k = \rho_j$. Connes and Moscovici showed in [5] that $\mathcal{C}$ is a morphism from $C^*(A, d)$ to the bicomplex $(C^{n,m}, d_1, d_2)_{n,m \in \mathbb{Z}}$. In the relative case, it restricts to a morphism from $C^*(A, SO(2), d)$ to the subcomplex $(C^{n,m}_{\mathrm{bas.}}, d_1, d_2)$ of antisymmetric cochains on $G_2$ with values in the basic de Rham cohomology $\Omega^*(P) = \Omega^*(G_1/SO(2))$.

It remains to compute the image of $H^*(A, SO(2))$ by $\mathcal{C}$. We restrict ourselves to even cocycles, i.e. the unit $1 \in H^0(A, SO(2))$ and the first Chern class $c_1 \in H^2(A, SO(2))$, defined as the class
\[ c_1 = [2\omega_{-1}\omega_1]. \tag{63} \]
One has $\mathcal{C}(1) \in C^{0,0}_{\mathrm{bas.}}$. The immediate result is
\[ \mathcal{C}(1)(g_0) = 1, \qquad g_0 \in G_2. \tag{64} \]
For the first Chern class, we must transform $c_1$ into a right-invariant form on $G_2$ with values in $\Lambda T_1^*(G_1)$. We already know that $\omega_{-1}$ is represented by $-\omega_{-1} = -y^{-1} dz$, which satisfies $\omega_{-1} \circ \psi = \omega_{-1}$, $\forall\, \psi \in G_2$. Next, the Taylor expansion of an element $\psi \in G_2$ can be expressed in the coordinates $\delta_n$ thanks to the obvious formula
\[ \ln \psi'(x) = \sum_{n=1}^{\infty} \frac{1}{n!}\, \delta_n(\psi)\, x^n, \qquad \forall\, x \in \mathbb{C}. \tag{65} \]
One finds:
\[ \psi(x) = x + \frac{1}{2}\delta_1(\psi)\, x^2 + \frac{1}{3!}\bigl(\delta_2(\psi) + \delta_1(\psi)^2\bigr) x^3 + O(x^4). \tag{66} \]
It shows that the cochain $\omega_1 \in C^*(A)$ is represented by the right-invariant 1-form $\frac{1}{2} d\delta_1$ on $G_2$. Thus at $1 \in G_1$, $\mathcal{C}(c_1) \in C^{1,1}_{\mathrm{bas.}}$ is given by
\[ \mathcal{C}(c_1)(g_0, g_1) = \int_{\Delta(g_0, g_1)} -\omega_{-1}\, d\delta_1 = -\omega_{-1}\bigl(\delta_1(g_1) - \delta_1(g_0)\bigr), \qquad g_i \in G_2, \tag{67} \]
and at $k \in G_1$, the 1-form $\mathcal{C}(c_1)(g_0, g_1)$ is
\[ \mathcal{C}(c_1)(g_0, g_1) = -\omega_{-1}\bigl(\delta_1(g_1 \triangleleft k) - \delta_1(g_0 \triangleleft k)\bigr). \tag{68} \]
Since $\omega_{-1} = y^{-1} dz$ and $\delta_1(g \triangleleft k) = y\partial_z \ln g'(z)$, $z$ and $y$ being the coordinates of $k$, one has explicitly
\[ \mathcal{C}(c_1)(g_0, g_1) = -dz\bigl(\partial_z \ln g_1'(z) - \partial_z \ln g_0'(z)\bigr). \tag{69} \]
It is a basic form on $G_1$ relative to $SO(2)$, which then descends to a form on $P = G_1/SO(2)$ as expected.
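The passage from (65) to (66) is a routine power-series manipulation: exponentiate the series for $\ln\psi'$ and integrate term by term with $\psi(0) = 0$. A minimal sketch using exact rational arithmetic follows; the function names are hypothetical and the numerical values of $\delta_1$, $\delta_2$ are arbitrary test inputs.

```python
from fractions import Fraction
from math import factorial

def series_exp(a, order):
    """exp of a power series a (with a[0] == 0), truncated at x**order."""
    result = [Fraction(1)] + [Fraction(0)] * order
    for n in range(1, order + 1):
        # From e' = a' e:  n * e_n = sum_{k=1..n} k * a_k * e_{n-k}
        result[n] = sum(k * a[k] * result[n - k] for k in range(1, n + 1)) / n
    return result

def psi_from_deltas(deltas, order):
    """Coefficients of psi up to x**order, given deltas[n-1] = delta_n(psi)."""
    log_dpsi = [Fraction(0)] * order          # ln psi'(x) up to x**(order-1)
    for n, d in enumerate(deltas, start=1):
        if n < order:
            log_dpsi[n] = Fraction(d) / factorial(n)
    dpsi = series_exp(log_dpsi, order - 1)    # psi'(x)
    # Integrate term by term with psi(0) = 0.
    return [Fraction(0)] + [c / (i + 1) for i, c in enumerate(dpsi)]

d1, d2 = Fraction(3), Fraction(-5)
psi = psi_from_deltas([d1, d2], 3)
```

For these inputs the cubic coefficient comes out as $(\delta_2 + \delta_1^2)/3!$, in agreement with (66).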
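The last step is to use the map $\Phi$ of [3, Theorem 14, p. 220] from $(C^{n,m}, d_1, d_2)$ to the $(b, B)$ bicomplex of the discrete crossed product $C_c^\infty(P) \rtimes G_2$. Define the algebra
\[ \mathcal{B} = \Omega^*(P) \,\hat\otimes\, \Lambda\mathbb{C}(G_2'), \tag{70} \]
where $\Lambda\mathbb{C}(G_2')$ is the exterior algebra generated by the elements $\delta_\psi$, $\psi \in G_2$, with $\delta_e = 0$ for the identity $e$ of $G_2$. With the de Rham coboundary $d$ of $\Omega^*(P)$, $\mathcal{B}$ is a differential algebra. Now form the crossed product $\mathcal{B} \rtimes G_2$, with multiplication rules
\[ \begin{aligned} U_\psi^*\, \alpha\, U_\psi &= \alpha \circ \psi, & \alpha &\in \Omega^*(P),\ \psi \in G_2, \\ U_{\psi_1}^*\, \delta_{\psi_2}\, U_{\psi_1} &= \delta_{\psi_2 \circ \psi_1} - \delta_{\psi_1}, & \psi_i &\in G_2. \end{aligned} \tag{71} \]
Endow $\mathcal{B} \rtimes G_2$ with the differential $\tilde d$ acting on an element $b U_\psi^*$ as
\[ \tilde d(b U_\psi^*) = db\, U_\psi^* - (-)^{\partial b}\, b\, \delta_\psi U_\psi^*, \tag{72} \]
where $db$ comes from the de Rham coboundary of $\Omega^*(P)$. The map
\[ \Phi : (C^{*,*}, d_1, d_2) \to (C_c^\infty(P) \rtimes G_2, b, B) \tag{73} \]
is constructed as follows. Let $\gamma \in C^{n,m}_{\mathrm{bas.}}$. It yields a linear form $\tilde\gamma$ on $\mathcal{B} \rtimes G_2$:
\[ \begin{aligned} \tilde\gamma(\alpha \otimes \delta_{g_1} \ldots \delta_{g_n}) &= \int_P \alpha \wedge \gamma(1, g_1, \ldots, g_n), \qquad \alpha \in \Omega^*(P),\ g_i \in G_2, \\ \tilde\gamma(b U_\psi^*) &= 0 \quad \text{if } \psi \ne 1. \end{aligned} \tag{74} \]
Then $\Phi(\gamma)$ is the following $l$-cochain on $C_c^\infty(P) \rtimes G_2$, $l = \dim P - m + n$,
\[ \Phi(\gamma)(x_0, \ldots, x_l) = \frac{n!}{(l+1)!} \sum_{j=0}^{l} (-)^{j(l-j)}\, \tilde\gamma(\tilde d x_{j+1} \ldots \tilde d x_l\, x_0\, \tilde d x_1 \ldots \tilde d x_j), \qquad x_i \in C_c^\infty(P) \rtimes G_2 \subset \mathcal{B} \rtimes G_2. \tag{75} \]
The essential tool is that $\Phi$ is a morphism of bicomplexes:
\[ \Phi(d_1\gamma) = b\,\Phi(\gamma), \qquad \Phi(d_2\gamma) = B\,\Phi(\gamma). \tag{76} \]
Moreover, if $d_1\gamma = d_2\gamma = 0$, $\Phi(\gamma)$ is a cyclic cocycle. This happens in our case. Since $P$ is a 3-dimensional manifold, the image of $\mathcal{C}(1)$ under $\Phi$ is the cyclic 3-cocycle
\[ \Phi(\mathcal{C}(1))(x_0, \ldots, x_3) = \int_P x_0\, dx_1 \ldots dx_3, \qquad x_i \in C_c^\infty(P) \rtimes G_2, \tag{77} \]
where $d(f U_\psi^*) = df\, U_\psi^*$ for $f \in C_c^\infty(P)$, $\psi \in G_2$, and the integration is extended over $\Omega^*(P) \rtimes G_2$ by setting
\[ \int_P \alpha U_\psi^* = 0 \quad \text{if } \psi \ne 1,\ \alpha \in \Omega^*(P). \tag{78} \]
The image of $\gamma = \mathcal{C}(c_1)$ is more complicated to compute. One has
\[ \tilde\gamma(\alpha \otimes \delta_g) = -\int_P \alpha \wedge y^{-1} dz\, \delta_1(g \triangleleft k), \qquad \alpha \in \Omega^2(P),\ g \in G_2, \tag{79} \]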
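where $y^{-1} dz\, \delta_1(g \triangleleft k) = dz\, \partial_z \ln g'(z)$ is, of course, a 1-form on $P$. $\Phi(\gamma)$ is the cyclic 3-cocycle
\[ \begin{aligned} \Phi(\gamma)&(f_0 U_{\psi_0}^*, \ldots, f_3 U_{\psi_3}^*) \\ &= -\tilde\gamma\bigl(f_0 U_{\psi_0}^*\, df_1 U_{\psi_1}^*\, df_2 U_{\psi_2}^*\, f_3 \delta_{\psi_3} U_{\psi_3}^* + f_0 U_{\psi_0}^*\, df_1 U_{\psi_1}^*\, f_2 \delta_{\psi_2} U_{\psi_2}^*\, df_3 U_{\psi_3}^* \\ &\qquad\quad + f_0 U_{\psi_0}^*\, f_1 \delta_{\psi_1} U_{\psi_1}^*\, df_2 U_{\psi_2}^*\, df_3 U_{\psi_3}^*\bigr) \\ &= \tilde\gamma\bigl(f_0 (df_1 \circ \psi_0)(df_2 \circ \psi_1\psi_0)(f_3 \circ \psi_2\psi_1\psi_0)\, \delta_{\psi_2\psi_1\psi_0} \\ &\qquad\quad + f_0 (df_1 \circ \psi_0)(f_2 \circ \psi_1\psi_0)(df_3 \circ \psi_2\psi_1\psi_0)(\delta_{\psi_2\psi_1\psi_0} - \delta_{\psi_1\psi_0}) \\ &\qquad\quad - f_0 (f_1 \circ \psi_0)(df_2 \circ \psi_1\psi_0)(df_3 \circ \psi_2\psi_1\psi_0)(\delta_{\psi_1\psi_0} - \delta_{\psi_0})\bigr), \end{aligned} \tag{80} \]
upon assuming that $\psi_3\psi_2\psi_1\psi_0 = \mathrm{Id}$. Using the relation
\[ \delta_1(\psi \triangleleft k) = [\delta_1(U_\psi^*) U_\psi](k), \qquad \forall\, k \in G_1,\ \psi \in G_2, \tag{81} \]
the computation gives
\[ \Phi(\gamma)(x_0, \ldots, x_3) = \int_P x_0 \bigl(dx_1\, dx_2\, \delta_1(x_3) + dx_1\, \delta_1(x_2)\, dx_3 + \delta_1(x_1)\, dx_2\, dx_3\bigr)\, y^{-1} dz. \tag{82} \]
Now recall that $P$ has an invariant volume form $dv_1 = e^{2r} dz\, d\bar z\, dr$. The differential $df$ of a function on $P$ makes use of the horizontal $X = y\partial_z$, $\bar X = \bar y\partial_{\bar z}$ and vertical $Y + \bar Y = -\partial_r$ vector fields:
\[ df = y^{-1} dz\, X.f + \bar y^{-1} d\bar z\, \bar X.f - dr\, (Y + \bar Y).f. \tag{83} \]
Then using the relations (40) one sees that $\Phi(\mathcal{C}(c_1))$ is a sum of terms involving the Hopf algebra
\[ \Phi(\mathcal{C}(c_1))(x_0, \ldots, x_3) = \sum_i \int_P x_0\, h_1^i(x_1) \ldots h_3^i(x_3)\, dv_1, \tag{84} \]
where the sum $\sum_i h_1^i \otimes h_2^i \otimes h_3^i$ is a cyclic 3-cocycle of $\mathcal{H}$ relative to $SO(2)$. This follows from the existence of a characteristic map
\[ HC^*(\mathcal{H}, SO(2)) \to HC^*(C_c^\infty(P) \rtimes G_2) \tag{85} \]
and the duality between $\mathcal{H}$ and $\mathcal{H}_* = C_c^\infty(G_1) \rtimes G_2$ (cf. [5]).

Returning to the initial situation, where $F$ is the frame bundle of a flat Riemann surface $\Sigma$, and $P = F/SO(2)$ the bundle of metrics, the above computation shows that the cyclic 3-cocycle on $\mathcal{A}_1 = C_c^\infty(P) \rtimes \Gamma$,
\[ [c_1](a_0, \ldots, a_3) = \sum_i \int_P a_0\, h_1^i(a_1) \ldots h_3^i(a_3)\, dv_1, \qquad a_i \in \mathcal{A}_1, \tag{86} \]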
is the image of $\mathcal{C}(c_1)$ by the characteristic map $HC^*(\mathcal{H}, SO(2)) \to HC^*(\mathcal{A}_1)$. Also the fundamental class
\[ [P](a_0, \ldots, a_3) = \int_P a_0\, da_1\, da_2\, da_3 \tag{87} \]
is in the range of the characteristic map. Since Connes and Moscovici showed that the Gelfand–Fuchs cohomology $H^*(A, SO(2))$ is isomorphic to the periodic cyclic cohomology of $\mathcal{H}$, we have completely determined the odd part of the range of the characteristic map. We can summarize the result in the following

Proposition 1. Under the characteristic map
\[ H^*(A, SO(2)) \simeq H^*(\mathcal{H}, SO(2)) \to H^*(\mathcal{A}_1), \tag{88} \]
the unit $1 \in H^0(A, SO(2))$ maps to the fundamental class $[P]$ represented by the cyclic 3-cocycle
\[ [P](a_0, \ldots, a_3) = \int_P a_0\, da_1\, da_2\, da_3, \qquad a_i \in \mathcal{A}_1, \tag{89} \]
and the first Chern class $c_1 \in H^2(A, SO(2))$ gives the cocycle $[c_1] \in HC^3(\mathcal{A}_1)$:
\[ [c_1](a_0, \ldots, a_3) = \int_P a_0 \bigl(da_1\, da_2\, \delta_1(a_3) + da_1\, \delta_1(a_2)\, da_3 + \delta_1(a_1)\, da_2\, da_3\bigr)\, y^{-1} dz. \tag{90} \]

In Sect. 2 we considered an odd K-cycle on $C_0(P \times \mathbb{R}^2) \rtimes \Gamma$ represented by a differential operator $Q'$, which is equivalent, up to Bott periodicity, to an odd K-cycle on $C_0(P) \rtimes \Gamma$. $Q'$ is a matrix-valued polynomial in the vector fields $X$, $\bar X$, $Y + \bar Y$ and the partial derivatives along the two directions of $\mathbb{R}^2$. Its Chern character is the cup product
\[ \mathrm{ch}_*(Q') = \varphi \# [\mathbb{R}^2] \tag{91} \]
of a cyclic cocycle $\varphi \in HC^{\mathrm{odd}}(C_c^\infty(P) \rtimes \Gamma)$ by the fundamental class of $\mathbb{R}^2$. The index theorem of Connes and Moscovici states that $\varphi$ is in the range of the characteristic map (we have to assume that the action of $\Gamma$ on $\Sigma$ has no fixed point). Hence it is a linear combination of the characteristic classes $[P]$ and $[c_1]$. We shall determine the coefficients by using the classical Riemann–Roch theorem.
5. A Riemann–Roch Theorem for Crossed Products

We shall first use the Thom isomorphism in K-theory [1],
\[ K_i(C_0(\Sigma) \rtimes \Gamma) \to K_{i+1}(C_0(P) \rtimes \Gamma), \tag{92} \]
to descend the characteristic classes $[P]$ and $[c_1]$ down to the cyclic cohomology of $C_c^\infty(\Sigma) \rtimes \Gamma$. Recall that $C_0(P) \rtimes \Gamma$ is just the crossed product of $C_0(\Sigma) \rtimes \Gamma$ by the modular automorphism group $\sigma$ of the associated von Neumann algebra
\[ C_0(P) \rtimes \Gamma = (C_0(\Sigma) \rtimes \Gamma) \rtimes_\sigma \mathbb{R}. \tag{93} \]
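By homotopy we can deform $\sigma$ continuously into the trivial action. For $\lambda \in [0, 1]$, let $\sigma^\lambda_t = \sigma_{\lambda t}$, $\forall\, t \in \mathbb{R}$. Then $\sigma^1 = \sigma$, $\sigma^0 = \mathrm{Id}$ and
\[ (C_0(\Sigma) \rtimes \Gamma) \rtimes_{\mathrm{Id}} \mathbb{R} = (C_0(\Sigma) \rtimes \Gamma) \otimes C_0(\mathbb{R}). \tag{94} \]
Next, the coordinate system $(z, \bar z)$ of $\Sigma$ gives a smooth volume form $\frac{dz \wedge d\bar z}{2i}$ together with a representative of $\sigma$, whose action on the subalgebra $C_c^\infty(\Sigma) \rtimes \Gamma$ is
\[ \sigma_t(f U_\psi^*) = f\, |\psi'|^{2it}\, U_\psi^*, \qquad f \in C_c^\infty(\Sigma),\ \psi \in \Gamma, \tag{95} \]
and accordingly
\[ \sigma^\lambda_t(f U_\psi^*) = f\, |\psi'|^{2i\lambda t}\, U_\psi^*. \tag{96} \]
We remark that the algebra $(C_0(\Sigma) \rtimes \Gamma) \rtimes_{\sigma^\lambda} \mathbb{R}$ is equal to the crossed product $C_0(P) \rtimes_\lambda \Gamma$ obtained from the following deformed action of $\Gamma$ on $P$:
\[ z \to \psi(z), \qquad \bar z \to \overline{\psi(z)}, \qquad r \to r - \tfrac{1}{2} \lambda \ln |\psi'(z)|^2, \qquad \psi \in \Gamma. \tag{97} \]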
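Hence for any $\lambda \in [0, 1]$, one has a Thom isomorphism
\[ F_\lambda : K_0(C_0(\Sigma) \rtimes \Gamma) \to K_1(C_0(P) \rtimes_\lambda \Gamma), \tag{98} \]
and $F_0$ is just the connecting map $K_0(C_0(\Sigma) \rtimes \Gamma) \to K_1(S(C_0(\Sigma) \rtimes \Gamma))$. We introduce also the family $\{[P]^\lambda\}_{\lambda \in [0,1]}$ of cyclic cocycles
\[ [P]^\lambda(a_0^\lambda, \dots, a_3^\lambda) = \int_P a_0^\lambda\, da_1^\lambda \dots da_3^\lambda, \qquad \forall\, a_i^\lambda \in C_c^\infty(P) \rtimes_\lambda \Gamma. \tag{99} \]
One has $[P]^1 = [P]$ and
\[ [P]^0 = [\Sigma] \# [\mathbb{R}] \in HC^3\bigl((C_c^\infty(\Sigma) \rtimes \Gamma) \otimes C_c^\infty(\mathbb{R})\bigr), \tag{100} \]
where $[\Sigma](a_0, a_1, a_2) = \int_\Sigma a_0\, da_1\, da_2$, $\forall\, a_i \in C_c^\infty(\Sigma) \rtimes \Gamma$. Moreover for any element $[e] \in K_0(C_0(\Sigma) \rtimes \Gamma)$ such that $F_\lambda([e])$ is in the domain of definition of $[P]^\lambda$, the pairing
\[ \langle F_\lambda([e]), [P]^\lambda \rangle \tag{101} \]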
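depends continuously upon $\lambda$. Next for any $\lambda \in\, ]0, 1]$, consider the vertical diffeomorphism $\tilde\lambda$ of $P$ whose action on the coordinates $(z, \bar z, r)$ reads
\[ \tilde\lambda(z) = z, \qquad \tilde\lambda(\bar z) = \bar z, \qquad \tilde\lambda(r) = \lambda r. \tag{102} \]
Thus for $\lambda \neq 0$ one has an algebra isomorphism
\[ \chi_\lambda : C_c^\infty(P) \rtimes_\lambda \Gamma \to C_c^\infty(P) \rtimes \Gamma \tag{103} \]
by setting
\[ \chi_\lambda(f U_\psi^*) = (f \circ \tilde\lambda)\, U_\psi^* \qquad \forall\, f \in C_c^\infty(P),\ \psi \in \Gamma. \tag{104} \]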
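For any $\lambda \neq 0$,
\[ (\chi_\lambda)_* \circ F_\lambda = F_1, \tag{105} \]
\[ (\chi_\lambda)^* [P]^1 = [P]^\lambda. \tag{106} \]
Equation (105) comes from the uniqueness of the Thom map (cf. [1]), and (106) is obvious. Thus $\langle F_\lambda([e]), [P]^\lambda \rangle$ is constant for $\lambda \neq 0$, and by continuity at 0,
\[ \langle F_1([e]), [P] \rangle = \langle [e], [\Sigma] \rangle. \tag{107} \]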
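The chain of identities behind (107) can be spelled out as follows. This is only a sketch: it assumes the usual compatibility $\langle \phi_*(x), c \rangle = \langle x, \phi^*(c) \rangle$ between push-forward in K-theory and pull-back in cyclic cohomology, and that the connecting map $F_0$ pairs with the cup product $[\Sigma]\#[\mathbb{R}]$ with normalization 1. Neither normalization is stated explicitly in the text.

```latex
\begin{align*}
\langle F_\lambda([e]), [P]^\lambda\rangle
  &= \langle F_\lambda([e]), (\chi_\lambda)^*[P]^1\rangle
     && \text{by (106)}\\
  &= \langle (\chi_\lambda)_* F_\lambda([e]), [P]^1\rangle
     && \text{(naturality of the pairing, assumed)}\\
  &= \langle F_1([e]), [P]\rangle
     && \text{by (105), for } \lambda \neq 0,\\
\lim_{\lambda\to 0}\,\langle F_\lambda([e]), [P]^\lambda\rangle
  &= \langle F_0([e]), [\Sigma]\#[\mathbb{R}]\rangle
   = \langle [e], [\Sigma]\rangle
     && \text{by (100) and continuity of (101)}.
\end{align*}
```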
This shows that the image of $[P]$ by Thom isomorphism is the cyclic 2-cocycle $[\Sigma]$ corresponding to the fundamental class of $\Sigma$. In exactly the same way we show that the image of $[c_1]$ is the cyclic 2-cocycle $\tau$ defined, for $a_i = f_i U_{\psi_i}^* \in C_c^\infty(\Sigma) \rtimes \Gamma$, by
\[ \tau(a_0, a_1, a_2) = \int_\Sigma a_0 \bigl( da_1\, \partial \ln \psi_2'\, a_2 + \partial \ln \psi_1'\, a_1\, da_2 \bigr), \tag{108} \]
with $\partial = dz\, \partial_z$. Note that in the decomposition of the differential on $\Sigma$, $d = \partial + \bar\partial$, both $\partial$ and $\bar\partial$ commute with the pullbacks by the conformal transformations $\psi \in \Gamma$.

So far we have considered a flat Riemann surface $\Sigma$ and the constructions we made were relative to a coordinate system $(z, \bar z)$. We shall now remove this unpleasant feature by using the Morita equivalence [5]. In order to understand the general situation, let us first treat the particular case of the Riemann sphere $S^2 = \mathbb{C} \cup \{\infty\}$. We consider an open covering of the sphere by two planes: $S^2 = U_1 \cup U_2$, $U_1 = \mathbb{C}$, $U_2 = \mathbb{C}$, together with the glueing function $g$:
\[ g : U_1 \setminus \{0\} \to U_2 \setminus \{0\}, \qquad z \mapsto \frac{1}{z}. \tag{109} \]
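The pseudogroup of conformal transformations $\Gamma_0$ generated by $\{U_g^*, U_g\}$ acts on the disjoint union $\Sigma = U_1 \sqcup U_2$, which is flat. Then $S^2$ is described by the groupoid $\Sigma \rtimes \Gamma_0$. If $\Gamma$ is a pseudogroup of local transformations of $S^2$, there exists a pseudogroup $\Gamma'$ containing $\Gamma_0$, acting on $\Sigma$ and such that the crossed product $C^\infty(S^2) \rtimes \Gamma$ is Morita equivalent to $C_c^\infty(\Sigma) \rtimes \Gamma'$. The latter splits into four parts: it is the direct sum, for $i, j = 1, 2$, of elements of the form $f_{ij} U_{\psi_{ij}}^*$ with
\[ \psi_{ij} : U_i \to U_j \quad \text{and} \quad \mathrm{supp}\, f_{ij} \subset \mathrm{Dom}\, \psi_{ij}. \tag{110} \]
For convenience, we adopt a matricial notation for any generic element $b \in C_c^\infty(\Sigma) \rtimes \Gamma'$:
\[ b = \begin{pmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{pmatrix}, \qquad b_{ij} = f_{ij} U_{\psi_{ij}}^*. \tag{111} \]
Now the Morita equivalence is explicitly realized through the following idempotent $e \in C_c^\infty(\Sigma) \rtimes \Gamma'$:
\[ e = \begin{pmatrix} \rho_1^2 & \rho_1 \rho_2 U_g^* \\ U_g \rho_2 \rho_1 & U_g \rho_2^2 U_g^* \end{pmatrix}, \qquad e^2 = e, \tag{112} \]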
where $\{\rho_i\}_{i=1,2}$ is a partition of unity relative to the covering $\{U_i\}$:
\[ \rho_1 \in C_c^\infty(U_1), \qquad \rho_1^2 + \rho_2^2 = 1 \ \text{ on } S^2 = U_1 \cup \{\infty\}. \tag{113} \]
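The identity $e^2 = e$ in (112) can be verified entry by entry from (113). The computation below is a sketch under two assumptions that the text leaves implicit: the functions $\rho_1, \rho_2$ commute with each other, and $U_g^* U_g$ acts as the identity on the functions it is applied to (that is, on functions supported in the domain of $g$).

```latex
\begin{align*}
(e^2)_{11} &= \rho_1^2\rho_1^2 + \rho_1\rho_2\, U_g^* U_g\, \rho_2\rho_1
            = \rho_1^2(\rho_1^2 + \rho_2^2) = \rho_1^2,\\
(e^2)_{12} &= \rho_1^2\,\rho_1\rho_2\, U_g^* + \rho_1\rho_2\, U_g^* U_g\, \rho_2^2\, U_g^*
            = \rho_1\rho_2(\rho_1^2 + \rho_2^2)\, U_g^* = \rho_1\rho_2\, U_g^*,\\
(e^2)_{21} &= U_g\,\rho_2\rho_1\,\rho_1^2 + U_g\,\rho_2^2\, U_g^* U_g\, \rho_2\rho_1
            = U_g\,\rho_2(\rho_1^2 + \rho_2^2)\rho_1 = U_g\,\rho_2\rho_1,\\
(e^2)_{22} &= U_g\,\rho_2\rho_1\,\rho_1\rho_2\, U_g^* + U_g\,\rho_2^2\, U_g^* U_g\, \rho_2^2\, U_g^*
            = U_g\,\rho_2^2(\rho_1^2 + \rho_2^2)\, U_g^* = U_g\,\rho_2^2\, U_g^* .
\end{align*}
```

Each line uses only $\rho_1^2 + \rho_2^2 = 1$, which is why the partition of unity is normalized on squares rather than on the functions themselves.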
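The reduction of $C_c^\infty(\Sigma) \rtimes \Gamma'$ by $e$ is the subalgebra
\[ (C_c^\infty(\Sigma) \rtimes \Gamma')_e = \{ b \in C_c^\infty(\Sigma) \rtimes \Gamma' \mid b = be = eb \}. \tag{114} \]
Its elements are of the form
\[ ebe = \begin{pmatrix} \rho_1 c \rho_1 & \rho_1 c \rho_2 U_g^* \\ U_g \rho_2 c \rho_1 & U_g \rho_2 c \rho_2 U_g^* \end{pmatrix} \tag{115} \]
with $c = \rho_1 b_{11} \rho_1 + \rho_2 U_g^* b_{21} \rho_1 + \rho_1 b_{12} U_g \rho_2 + \rho_2 U_g^* b_{22} U_g \rho_2$. Then $c$ can be considered as an element of $C^\infty(S^2) \rtimes \Gamma$ under the identification $S^2 = U_1 \cup \{\infty\}$. $(C_c^\infty(\Sigma) \rtimes \Gamma')_e$ and $C^\infty(S^2) \rtimes \Gamma$ are isomorphic through the map
\[ \theta : C^\infty(S^2) \rtimes \Gamma \longrightarrow (C_c^\infty(\Sigma) \rtimes \Gamma')_e, \qquad a \longmapsto \begin{pmatrix} \rho_1 a \rho_1 & \rho_1 a \rho_2 U_g^* \\ U_g \rho_2 a \rho_1 & U_g \rho_2 a \rho_2 U_g^* \end{pmatrix}. \tag{116} \]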
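We are ready to compute the pullbacks of $[\Sigma]$ and $\tau \in HC^2(C_c^\infty(\Sigma) \rtimes \Gamma')$ by $\theta$. This yields the following cyclic 2-cocycles on $C^\infty(S^2) \rtimes \Gamma$:
\[ \theta^*[\Sigma] = [S^2], \]
\[ (\theta^*\tau)(a_0, a_1, a_2) = \int_{S^2} a_0 \Bigl( da_1 \bigl( \partial \ln \psi_2'\, a_2 + [a_2, \rho_2^2\, \partial \ln g'] \bigr) + \bigl( \partial \ln \psi_1'\, a_1 + [a_1, \rho_2^2\, \partial \ln g'] \bigr) da_2 \Bigr) - \int_{S^2} a_2 a_0 a_1\, d(\rho_2^2)\, \partial \ln g', \tag{117} \]
with $a_i = f_i U_{\psi_i}^* \in C^\infty(S^2) \rtimes \Gamma$. In formula (117), $S^2 = U_1 \cup \{\infty\}$ is endowed with the coordinate chart $(z, \bar z)$ of $U_1$, which gives sense to $\psi_i'(z) = \partial_z \psi_i(z)$ and $g'(z) = \partial_z g(z) = -1/z^2$, but gives singular expressions at 0 and $\infty$. We can overcome this difficulty by introducing a smooth volume form $\nu = \rho(z, \bar z) \frac{dz \wedge d\bar z}{2i}$ on $S^2$. The associated modular automorphism group $\sigma^\nu$ leaves $C^\infty(S^2) \rtimes \Gamma$ globally invariant and is expressed in the coordinates $(z, \bar z)$ by
\[ \sigma_t^\nu(f U_\psi^*) = \Bigl( \frac{\nu \circ \psi}{\nu} \Bigr)^{it} f U_\psi^* = \Bigl( \frac{\rho \circ \psi}{\rho}\, |\partial_z \psi|^2 \Bigr)^{it} f U_\psi^*, \qquad \forall\, t \in \mathbb{R}. \tag{118} \]
Define the derivation $\delta^\nu$ on $C^\infty(S^2) \rtimes \Gamma$,
\[ \delta^\nu(f U_\psi^*) \equiv -i \Bigl[ \partial, \frac{d}{dt} \sigma_t^\nu \Bigr](f U_\psi^*) \Big|_{t=0} = \Bigl[ \partial, \ln\Bigl( \frac{\rho \circ \psi}{\rho}\, |\partial_z \psi|^2 \Bigr) \Bigr](f U_\psi^*) \tag{119} \]
\[ \phantom{\delta^\nu(f U_\psi^*)} = \partial \ln \psi'\, f U_\psi^* - [\partial \ln \rho, f U_\psi^*]. \tag{120} \]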
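The passage from (119) to (120) can be checked directly. The sketch below assumes the commutation convention $U_\psi^*\, h = (h \circ \psi)\, U_\psi^*$ for functions and forms $h$, so that $[\omega, fU_\psi^*] = (\omega - \psi^*\omega)\, fU_\psi^*$ for a 1-form $\omega$; this convention is not spelled out in this section. It also uses that $\partial$ commutes with pullback by the conformal map $\psi$, and that $\overline{\psi'}$ is antiholomorphic.

```latex
\begin{align*}
\delta^\nu(fU_\psi^*)
 &= \partial\Bigl(\ln\Bigl(\tfrac{\rho\circ\psi}{\rho}\,|\partial_z\psi|^2\Bigr)\Bigr)\, fU_\psi^*
    && \text{since } [\partial, h] \text{ acts as multiplication by } \partial h \\
 &= \bigl(\partial\ln(\rho\circ\psi) - \partial\ln\rho
      + \partial\ln\psi' + \partial\ln\overline{\psi'}\,\bigr)\, fU_\psi^* \\
 &= \partial\ln\psi'\, fU_\psi^*
      - \bigl(\partial\ln\rho - \psi^*(\partial\ln\rho)\bigr)\, fU_\psi^*
    && \text{since } \partial\ln\overline{\psi'} = 0 \\
 &= \partial\ln\psi'\, fU_\psi^* - [\partial\ln\rho,\, fU_\psi^*].
\end{align*}
```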
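One has
\[ \partial \ln \psi'\, f U_\psi^* + [f U_\psi^*, \rho_2^2\, \partial \ln g'] = \delta^\nu(f U_\psi^*) + [\partial \ln \rho - \rho_2^2\, \partial \ln g', f U_\psi^*], \tag{121} \]
where the 1-form $\omega = \partial \ln \rho - \rho_2^2\, \partial \ln g'$ is globally defined, nowhere singular on $S^2$. Let $R^\nu = \partial \bar\partial \ln \rho$ be the curvature 2-form associated to the Kähler metric $\rho\, dz \otimes d\bar z$. One has the commutation rule
\[ (\bar\partial \delta^\nu + \delta^\nu \bar\partial) a = [R^\nu, a] \qquad \forall\, a \in C^\infty(S^2) \rtimes \Gamma. \tag{122} \]
Simple algebraic manipulations show that the following 2-cochain:
\[ \tau^\nu(a_0, a_1, a_2) = \int_{S^2} a_0 (da_1\, \delta^\nu a_2 + \delta^\nu a_1\, da_2) + \int_{S^2} a_2 a_0 a_1 R^\nu \tag{123} \]
is a cyclic cocycle. Moreover, $\tau^\nu$ is cohomologous to $\theta^*\tau$. To see this, let $\varphi$ be the cyclic 1-cochain
\[ \varphi(a_0, a_1) = \int_{S^2} (a_0\, da_1 - a_1\, da_0)\, \omega. \tag{124} \]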
Then for all $a_i \in C^\infty(S^2) \rtimes \Gamma$,
\[ (\tau^\nu - \theta^*\tau)(a_0, a_1, a_2) = -\int_{S^2} (a_0\, da_1\, a_2 + a_2\, da_0\, a_1 + a_1\, da_2\, a_0)\, \omega = b\varphi(a_0, a_1, a_2). \tag{125} \]
It is clear now that the construction of characteristic classes for an arbitrary (non flat) Riemann surface $\Sigma$ follows exactly the same steps as in the above example. Using an open cover with partition of unity, one gets the desired cyclic cocycles by pullback. Choose a smooth measure $\nu$ on $\Sigma$, then the associated modular group is
\[ \sigma_t^\nu(f U_\psi^*) = \Bigl( \frac{\nu \circ \psi}{\nu} \Bigr)^{it} f U_\psi^*, \qquad f U_\psi^* \in C_c^\infty(\Sigma) \rtimes \Gamma. \tag{126} \]
The corresponding derivation
\[ D^\nu(f U_\psi^*) = \ln\Bigl( \frac{\nu \circ \psi}{\nu} \Bigr) f U_\psi^* \tag{127} \]
allows one to define the noncommutative differential
\[ \delta^\nu = [\partial, D^\nu]. \tag{128} \]
Then the characteristic classes of the groupoid $\Sigma \rtimes \Gamma$ are given by $[\Sigma]$ and $[\tau^\nu] \in HC^2(C_c^\infty(\Sigma) \rtimes \Gamma)$, where $\tau^\nu$ is given by Eq. (123) with $S^2$ replaced by $\Sigma$. In the case $\Gamma = \mathrm{Id}$, the crossed product reduces to the commutative algebra $C_c^\infty(\Sigma)$ for which ($\delta^\nu = 0$)
\[ \tau^\nu(a_0, a_1, a_2) = \int_\Sigma a_0 a_1 a_2 R^\nu \tag{129} \]
is just the image of the cyclic 0-cocycle
\[ \tau_0^\nu(a) = \int_\Sigma a R^\nu \tag{130} \]
by the suspension map in cyclic cohomology $S : HC^*(C_c^\infty(\Sigma)) \to HC^{*+2}(C_c^\infty(\Sigma))$. Thus the periodic cyclic cohomology class of $\tau^\nu$ corresponds in de Rham homology to the cap product
\[ \frac{1}{2\pi i} [\tau^\nu] = c_1(\kappa) \cap [\Sigma] \in H_0(\Sigma) \tag{131} \]
of the first Chern class of the holomorphic tangent bundle $\kappa$ by the fundamental class. This motivates the following definition:

Definition 2. Let $\Sigma$ be a Riemann surface without boundary and $\Gamma$ a discrete pseudogroup acting on $\Sigma$ by local conformal transformations. Let $\nu$ be a smooth volume form on $\Sigma$, and $\sigma^\nu$ the associated modular automorphism group leaving $C_c^\infty(\Sigma) \rtimes \Gamma$ globally invariant. Then the Euler class $e(\Sigma \rtimes \Gamma)$ is the class of the following cyclic 2-cocycle on $C_c^\infty(\Sigma) \rtimes \Gamma$:
\[ \frac{1}{2\pi i} \tau^\nu(a_0, a_1, a_2) = \frac{1}{2\pi i} \int_\Sigma \bigl( a_2 a_0 a_1 R^\nu + a_0 (da_1\, \delta^\nu a_2 + \delta^\nu a_1\, da_2) \bigr), \tag{132} \]
where $\delta^\nu$ is the derivation $-i[\partial, \frac{d}{dt} \sigma_t^\nu|_{t=0}]$, and $R^\nu$ is the curvature of the Kähler metric determined by $\nu$ and the complex structure of $\Sigma$. Moreover, this cohomology class is independent of $\nu$.
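As a sanity check on the normalization $\frac{1}{2\pi i}$ in Definition 2, one can take $\Sigma = S^2$ with trivial pseudogroup and pair the cocycle with $a_0 = a_1 = a_2 = 1$. The density $\rho = (1+|z|^2)^{-2}$ in the chart $U_1 = \mathbb{C}$ is a hypothetical choice made only for this illustration, and the computation assumes that the curvature is $R^\nu = \partial\bar\partial \ln \rho$ and that $S^2$ carries the orientation $dx \wedge dy$, with $dz \wedge d\bar z = -2i\, dx \wedge dy$.

```latex
\begin{align*}
\bar\partial \ln(1+|z|^2) &= \frac{z\, d\bar z}{1+|z|^2}, \qquad
\partial\bar\partial \ln(1+|z|^2) = \frac{dz\wedge d\bar z}{(1+|z|^2)^2},\\
R^\nu = \partial\bar\partial\ln\rho
  &= -\frac{2\, dz\wedge d\bar z}{(1+|z|^2)^2}
   = \frac{4i\, dx\wedge dy}{(1+x^2+y^2)^2},\\
\frac{1}{2\pi i}\int_{S^2} R^\nu
  &= \frac{2}{\pi}\int_{\mathbb{R}^2}\frac{dx\, dy}{(1+x^2+y^2)^2}
   = \frac{2}{\pi}\cdot 2\pi\int_0^\infty \frac{r\, dr}{(1+r^2)^2}
   = \frac{2}{\pi}\cdot\pi = 2 .
\end{align*}
```

Under these assumptions the value 2 agrees with the Euler characteristic $\chi(S^2)$, as expected for the pairing of $c_1(\kappa)$ with the fundamental class.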
Now if $\Gamma = \mathrm{Id}$, the operator $Q$ of Sect. 2 defines an element of the K-homology of $\Sigma \times \mathbb{R}^2$. It corresponds to the tensor product of the classical Dolbeault complex $[\bar\partial]$ of $\Sigma$ by the signature complex $[\sigma]$ of the fiber $\mathbb{R}^2$, so that its Chern character in de Rham homology is the cup product
\[ \mathrm{ch}_*(Q) = \mathrm{ch}_*([\bar\partial]) \# \mathrm{ch}_*([\sigma]) = \bigl( [\Sigma] + \tfrac{1}{2} c_1(\kappa) \cap [\Sigma] \bigr) \# 2[\mathbb{R}^2] \in H_*(\Sigma \times \mathbb{R}^2), \tag{133} \]
which yields, by Thom isomorphism, the homology class on $\Sigma$
\[ 2[\Sigma] + c_1(\kappa) \cap [\Sigma] \in H_*(\Sigma). \tag{134} \]
Next for any $\Gamma$, we know from the last section that the Chern character of the Dolbeault K-cycle, expressed in the periodic cyclic cohomology of $C_c^\infty(\Sigma) \rtimes \Gamma$, is a linear combination of $[\Sigma]$ and $e(\Sigma \rtimes \Gamma)$. Thus we deduce immediately the following generalisation of the Riemann–Roch theorem:

Theorem 3. Let $\Sigma$ be a Riemann surface without boundary and $\Gamma$ a discrete pseudogroup acting on $\Sigma$ by local conformal mappings without fixed point. The Chern character of the Dolbeault K-cycle is represented by the following cyclic 2-cocycle on $C_c^\infty(\Sigma) \rtimes \Gamma$:
\[ \mathrm{ch}_*(Q) = 2[\Sigma] + e(\Sigma \rtimes \Gamma). \tag{135} \]
Acknowledgements. I am very indebted to Henri Moscovici for having corrected an erroneous factor in the final formula.
References

1. Connes, A.: An analogue of the Thom isomorphism for crossed products of a C*-algebra by an action of R. Adv. in Math. 39, 31–55 (1981)
2. Connes, A.: Cyclic cohomology and the transverse fundamental class of a foliation. In: Geometric methods in operator algebras, Kyoto (1983), Pitman Res. Notes in Math. 123, Harlow: Longman, 1986, pp. 52–144
3. Connes, A.: Non-commutative geometry. New-York: Academic Press, 1994
4. Connes, A., Moscovici, H.: The local index formula in non-commutative geometry. GAFA 5, 174–243 (1995)
5. Connes, A., Moscovici, H.: Hopf algebras, cyclic cohomology and the transverse index theorem. Commun. Math. Phys. 198, 199–246 (1998)
6. Connes, A., Moscovici, H.: Cyclic cohomology and Hopf algebras. Lett. Math. Phys. 48, 85 (1999)
7. Godbillon, C.: Cohomologies d’algèbres de Lie de champs de vecteurs formels. Séminaire Bourbaki, Vol. 1972/73, no 421

Communicated by A. Connes
Commun. Math. Phys. 218, 393 – 416 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
The Split Property and the Symmetry Breaking of the Quantum Spin Chain
Taku Matsui
Graduate School of Mathematics, Kyushu University, 1-10-6 Hakozaki, Fukuoka 812-8581, Japan. E-mail: [email protected]
Received: 1 February 1999 / Accepted: 5 December 2000
Abstract: We consider the relationship between symmetry breaking and the split property of pure states of quantum spin chains. We obtain a representation theoretic condition implying that the half-sided uniform mixing condition leads to symmetry breaking of translationally invariant pure states. This is a mathematical generalization of the dichotomy previously found by I. Affleck and E. Lieb and by M. Aizenman and B. Nachtergaele for ground states of a special class of Hamiltonians.
1. Introduction

The study of low-dimensional materials is an active area of research. Recent experimental and theoretical results on quasi-one-dimensional materials are stimulating. Among them, one of the surprises from the experimental side is the discovery of spin ladder materials. (See the review article [13] and the references therein.) For some time, it has been known that the spin 1/2 antiferromagnetic quantum spin chain has either slow decay of correlation or breaking of lattice translational symmetry. This observation was mainly supported by the exact solution of integrable models. However, when higher spin systems are concerned the situation seems different. In fact, Haldane’s conjecture claims that the integer spin Heisenberg models have a unique ground state with a spectral gap and exponential decay of correlation, while the half-odd-integer models behave as a massless quantum field theory. In a sense, experimental results on spin ladder materials support Haldane’s conjecture. Even though the original assertion of the conjecture is still an open problem of mathematical physics, a number of results related to the conjecture have been found in the last decade. (See, e.g., [2–4] and [19].) In [3] I. Affleck, T. Kennedy, E. Lieb and H. Tasaki produced a new type of ground state for integer spin SU(2) invariant Hamiltonians. Their models have a spectral gap and exponential decay of spatial correlation. A perturbation theory for the model of [3] was developed and a new non-local order parameter characterizing
the Haldane system is discussed in [19]. These results give mathematical support to the integer spin part of Haldane’s conjecture. In [2] I. Affleck and E. Lieb have shown that, for half-odd-integer spin, there exists an excitation with arbitrarily small energy for the antiferromagnetic Heisenberg Hamiltonian if the infinite volume ground state is unique. Their proof is based on the so-called Lieb–Mattis argument. Turning to the decay of correlation for half-odd-integer spin, M. Aizenman and B. Nachtergaele investigated a functional integral representation of other SU(2) invariant Hamiltonians and found the dichotomy of slow decay of correlation and translational symmetry breaking. See also [26] and [5] for related results. On the other hand, influenced by Haldane’s conjecture, H. J. Schultz considered the antiferromagnetic quantum spin chain on the ladder (see [29]). He argued that certain spin ladder systems with an even number of legs have exponentially decaying correlations, while the case of an odd number of legs behaves in the same manner as the antiferromagnetic Heisenberg chain with half-odd-integer spin. Only relatively recently have spin ladder materials such as $\mathrm{Sr}_{n-1}\mathrm{Cu}_n\mathrm{O}_{2n-1}$ been produced and investigated. The experimental results support the qualitative difference between systems with even and odd numbers of legs which was previously pointed out theoretically. Even though we have various mathematical and experimental results at hand, logically speaking, it is not clear whether the relation between the slow decay of correlation, translation symmetry breaking, the spectral gap and the parity of the spin is due to the special choice of Hamiltonians or to a universal underlying structure of SU(2) invariant antiferromagnetic systems. The aim of this paper is to investigate the connection between decay of correlation, symmetry breaking and the representation theoretic property of the global gauge symmetry from the viewpoint of the split property.
Our results are valid not only for half-odd-integer spin systems or spin ladder chains with an odd number of legs but also for pure states with arbitrary compact semi-simple Lie group symmetry. We will show that uniform mixing of infinite volume pure states necessitates the breakdown of either gauge or translation symmetry if a representation theoretic condition is satisfied. To explain our results in more detail, we introduce our notation. We use the $C^*$-algebraic method in our analysis. The standard references for this mathematical approach are [10] and [11]. The algebra of local observables is the infinite tensor product $A_{loc}$ of matrix algebras,
\[ A_{loc} = \bigotimes_{\mathbb{Z}} M_d(\mathbb{C}), \]
where we denote the set of all $d$ by $d$ matrices by $M_d(\mathbb{C})$. Each component of the tensor product above is specified with a lattice site $j \in \mathbb{Z}$. By $A$ we denote the $C^*$-algebraic completion of $A_{loc}$,
\[ A = \overline{A_{loc}}^{\,C^*}. \]
For the spin $s$ quantum spin chain, we set $d = 2s + 1$. The $SU(2)$ gauge action on the observable algebra is denoted by $\beta_g$ ($g \in SU(2)$). In the case of the spin ladder model, each quantum degree of freedom is attached to a site in the ladder $L_n$, where $L_n$ is a subset of $\mathbb{Z}^2$ defined by
\[ L_n = \{ (j, a) \in \mathbb{Z}^2 \mid j \in \mathbb{Z},\ a = 1, 2, 3, \dots, n \}. \]
The positive integer $n$ is called the number of legs. For simplicity we only consider the case that each site in $L_n$ is equipped with a spin 1/2 degree of freedom. The algebra of local observables is denoted by $A^{L_n}_{loc}$,
\[ A^{L_n}_{loc} = \bigotimes_{L_n} M_2(\mathbb{C}). \]
Here each component of the tensor product is specified with a lattice site $(j, a) \in L_n$. Obviously, as $C^*$-algebras, $A^{L_n}_{loc}$ and $A_{loc}$ are isomorphic provided $d = (2s + 1)^n$. A standard Hamiltonian of the spin $s$ antiferromagnetic chain is the Heisenberg Hamiltonian $H^s$,
\[ H^s = \sum_{j \in \mathbb{Z}} \sum_{\alpha = x, y, z} S_\alpha^{(j)} S_\alpha^{(j+1)}, \]
where $S_\alpha^{(j)}$ is the spin operator at the site $j$. On the other hand the Hamiltonian for the ladder case is
\[ H = J_1 \sum_{(j,a) \in L_n} \sum_{\alpha = x, y, z} S_\alpha^{(j,a)} S_\alpha^{(j+1,a)} + J_2 \sum_{j \in \mathbb{Z}} \sum_{a = 1}^{n-1} \sum_{\alpha = x, y, z} S_\alpha^{(j,a)} S_\alpha^{(j,a+1)}, \]
where $J_1$ and $J_2$ are real positive parameters. Experimentally, the ratio $J_2 / J_1$ is of order one and it may not be easy to work with rigorous expansion techniques in this region of parameters.

In Sect. 4 we discuss ground states of various Hamiltonians as examples, while, in Sect. 3, we do not consider any specific form of Hamiltonians and we examine the following uniform clustering property for the infinite volume states. By $A_R$ we denote the subalgebra of $A_{loc}$ generated by all physical observables localized in the non-negative lattice sites $j = 0, 1, 2, 3, \dots$ and in the same manner, by $A_L$ we denote the subalgebra of $A_{loc}$ generated by all physical observables localized in the negative lattice sites $j = -1, -2, -3, \dots$. Let $\tau_j$ be the lattice translation ($j$ shift to the right). Note that if $j$ is positive $\tau_j(A_R) \subset A_R$ and if $j$ is negative $\tau_j(A_L) \subset A_L$.

Definition 1.1. Let $\varphi$ be a translationally invariant state of $A$. We say that $\varphi$ is split if the following clustering condition is valid:
\[ \lim_{k \to \infty} \sup_{\|Q\| \le 1} \Bigl| \sum_j \bigl( \varphi(Q_j \tau_k(R_j)) - \varphi(Q_j) \varphi(R_j) \bigr) \Bigr| = 0, \tag{1.1} \]
where the above supremum is taken over all local observables $Q$ in $A_{loc}$ with norm at most 1, $\|Q\| \le 1$, written as a finite sum
\[ Q = \sum_j Q_j R_j, \qquad Q_j \in A_L,\ R_j \in A_R. \]
Recall that purity and translation invariance of $\varphi$ imply clustering,
\[ \lim_{k \to \infty} \Bigl| \sum_j \bigl( \varphi(Q_j \tau_k(R_j)) - \varphi(Q_j) \varphi(R_j) \bigr) \Bigr| = 0 \]
for every single $Q$. However, the split condition does not always hold. The crucial part of the above assumption is uniformity of convergence. All the examples of pure states
with exponential decay of correlation that we know satisfy the split property. By the exponential decay of correlation we mean
\[ \lim_{|k| \to \infty} e^{M|k|}\, |\varphi(Q_1 \tau_k(Q_2)) - \varphi(Q_1) \varphi(Q_2)| = 0 \]
for any local observables Q1 and Q2 and a positive constant M. The uniform cluster condition of 1.1 is valid if and only if the state ϕ is quasiequivalent to the product ψL ⊗ ψR of a state ψL of AL and another state ψR of AR . (See Sect. 2.) In this sense, the split property is a weak statistical independence of left infinite and right infinite systems. A Gibbs state of a finite range interaction is split due to the above equivalent condition of the split property. In Sect. 4 we will present examples of pure ground states with and without split property. We can also formulate the split property of a measure on a one-dimensional classical spin system, as we have only to consider absolute continuity of measures instead of quasi-equivalence of quantum states. At the moment we find it difficult to construct examples of non-split measures. We are not aware of any counterexample to the following conjecture. Conjecture 1.2. Let ϕ be a translationally invariant factor state such that for any local observables Q1 and Q2 the two point correlation function decays exponentially fast, |ϕ(Q1 τk (Q2 )) − ϕ(Q1 )ϕ(Q2 )| eM|k| = 0. (1.2) k∈Z
Then, the state ϕ is split. The main result of this paper is as follows. Theorem 1.3. Suppose that the spin s is a half odd integer and consider a SU(2) invariant pure state ϕ of A. Then, one of the following statements must be valid: (i) The state ϕ is not translationally invariant. (ii) The state ϕ is not split. The above result means that either the gauge symmetry, the split property, or the translation symmetry must break down, but our proof does not tell which one occurs in the antiferromagnetic system. This certainly depends on the choice of the state ϕ or the Hamiltonian for which the state ϕ is a ground state. Intuitively the breakdown of the split property is plausible in antiferromagnetic systems. This is because the quantum fluctuation of the antiferromagnetic system is so large that the systems behave as if in the high temperature phase and all the symmetry of the Hamiltonian remains unbroken, which forces the lack of split property. This argument explains the reason why the decay of the two-point correlation is slow for the spin ladder material with an odd number of legs. We also claim that our result is complementary to a theorem of I. Affleck and E. Lieb in [2] and that of M. Aizenman and B. Nachtergaele in [4]. In [2] I. Affleck and E. Lieb discussed the connection of the spectral gap and translation symmetry breaking for the standard antiferromagnetic Heisenberg chain with half odd integer spin. On the other hand, in [4], M. Aizenman and B. Nachtergaele considered measures (states of the classical spin) obtained by the restriction of quantum ground states for certain SU(2) invariant Hamiltonians and proved symmetry breaking under the assumption of L1 clustering. Though Theorem 1.3 looks similar to these results, the meaning of Theorem 1.3 is quite different in mathematical terms. The state ϕ in Theorem 1.3 is not necessarily a ground state for a Hamiltonian. We are talking about quantum states, but
not measures, for a particular set of classical observables. From the physical point of view, it is the Fourier transform of the time-dependent correlation function that is observed experimentally, so we believe it worthwhile to investigate correlation functions of general local observables. Theorem 1.3 can be generalized to other situations. For example, we obtain our dichotomy for the ladder model on the lattice $L_n$. If the spin degree of freedom on each site in $L_n$ is one half, we consider the case that the number of legs $n$ of $L_n$ is odd and we obtain absence of the split property for SU(2) invariant, translationally invariant pure states.

Here we state a result for a more general case. Let $G$ be a compact group. By $\hat G$ we denote the set of unitary equivalence classes of irreducible representations. Let $v(g)$ be a $d$-dimensional (not necessarily irreducible) unitary representation. We consider the infinite product type action $\beta_g$ of $G$ associated with $v(g)$.

Theorem 1.4. Let $G$ be a connected, simply connected, semi-simple compact Lie group. Suppose $\hat G$ has a partition
\[ \hat G = \bigcup_\alpha \hat G_\alpha, \qquad \hat G_\alpha \cap \hat G_\beta = \emptyset \quad (\alpha \neq \beta). \]
Furthermore, we assume the following condition: For any $\alpha$ there exists $\beta$ such that any irreducible component in the tensor product $v(g) \otimes \pi(g)$ with $\pi(g)$ in $\hat G_\alpha$ is contained in $\hat G_\beta$. Let $\varphi$ be a $G$ invariant pure state ($\varphi \circ \gamma_g = \varphi$ for any $g$ in $G$). Then, one of the following statements must be valid: (i) The state $\varphi$ is not translationally invariant. (ii) The state $\varphi$ is not split.

When $G$ is $U(1)$ our method also implies the following result.

Theorem 1.5. Consider the spin $\frac{1}{2}$ chain. Let $\varphi$ be a translationally invariant pure state which is invariant under a torus $U(1)$ of $SU(2)$, $\varphi \circ \beta_z = \varphi$ for any $z \in U(1)$. Then $\varphi$ is either a product state or a non-split state.

Due to this result we see that the ground state of the one-dimensional XY model is non-split. See Sect. 4.

Next we sketch the key point of our proof. The proof is based on three different notions: the split property, shifts of $B(H)$ and finitely correlated states. As we already mentioned, a standard argument of the quasi-local algebra tells us that a state $\varphi$ is split if and only if $\varphi$ is quasi-equivalent to the state $\varphi_L \otimes \varphi_R$, where $\varphi_L$ and $\varphi_R$ are the restrictions of $\varphi$ to $A_L$ and $A_R$. The latter condition is precisely the split condition as in local quantum field theory. (See [18, 12] and [14] and the references therein.) More precisely, let $\pi(\cdot)$ be the GNS representation associated with a state $\varphi$. Consider the von Neumann algebra $M_1 = \pi(A_R)''$ generated by $\pi(A_R)$ and the commutant $M_2 = \pi(A_L)'$ of $\pi(A_L)$. The inclusion of the von Neumann algebras $M_1 \subset M_2$ is split if it contains an intermediate type I factor $N$, $M_1 \subset N \subset M_2$. The inclusion is split if and only if the state $\varphi$ is split. The split condition was proved first for free (and, hence, locally normal) relativistic quantum fields by D. Buchholtz in [12]. The standard split property was extensively studied by S. Doplicher and R. Longo in [14], and R. Longo obtained the solution of the factorial Stone–Weierstrass conjecture in [22].
There exists a vast number of results of local QFT based on the split property. (See [18].) We should mention that our inclusion is not standard in the sense of S. Doplicher and R. Longo due to the lack of the Reeh–Schlieder property in our lattice models.
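The representation theoretic condition of Theorem 1.4 can be illustrated for $G = SU(2)$, where it plausibly specializes to the half-odd-integer spin hypothesis of Theorem 1.3. The sketch below uses notation that is not in the text: $V_j$ for the spin $j$ irreducible representation (of dimension $2j+1$), and $\hat G_0$, $\hat G_1$ for the integer and half-odd-integer classes. It relies only on the Clebsch–Gordan decomposition.

```latex
\begin{align*}
\widehat{SU(2)} &= \hat G_0 \sqcup \hat G_1, \qquad
\hat G_0 = \{V_j : j \in \mathbb{Z}_{\ge 0}\}, \quad
\hat G_1 = \{V_j : j \in \mathbb{Z}_{\ge 0} + \tfrac12\},\\
V_s \otimes V_j &\cong \bigoplus_{l=|s-j|}^{s+j} V_l
   \qquad (l \text{ increasing in integer steps}),\\
s \in \mathbb{Z}_{\ge 0}+\tfrac12,\ V_j \in \hat G_0
   &\ \Longrightarrow\ \text{every component } V_l \text{ of } V_s\otimes V_j \text{ lies in } \hat G_1,\\
s \in \mathbb{Z}_{\ge 0}+\tfrac12,\ V_j \in \hat G_1
   &\ \Longrightarrow\ \text{every component } V_l \text{ of } V_s\otimes V_j \text{ lies in } \hat G_0,\\
\text{e.g. } V_{1/2}\otimes V_{1} &\cong V_{1/2}\oplus V_{3/2},
   \qquad 2\cdot 3 = 2 + 4 .
\end{align*}
```

With $v = V_s$ and $s$ a half-odd integer, tensoring by $v$ therefore exchanges the two classes of the partition. For integer $s$ it preserves each class, which is consistent with the integer spin case being excluded from Theorem 1.3.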
When a translation invariant state ϕ with split property is pure, the restriction ϕR to AR is a type I factor state. Passing to the GNS representation and the restriction of τj (j ≥ 0) to AR we obtain a shift of B(H) in the sense of R. Powers ([28]). The close connection of shifts of the type I factor and representations of the Cuntz algebra was clarified in a series of works by O. Bratteli, P. Jorgensen, A. Kishimoto, J. Price and R. Werner. See [7–9]. They have shown that conjugacy classes of shifts of B(H) can be identified with equivalence classes of irreducible representations of the Cuntz algebra. The identification is based on the fact that any shift of B(H) with Powers index d is implemented by the generator of the Cuntz algebra Od . Then following the construction of [25], it is easy to show that a split state ϕ is a generalized finitely correlated state. The finitely correlated states which we will define later were introduced by M. Fannes, B. Nachtergaele, and R. Werner in [15]. In fact the construction of states is essentially the same as of the quantum Markov states of L. Accardi. Historically speaking, in the middle of the 1970’s, L. Accardi investigated quantum Markov states as a noncommutative generalization of classical Markov measures. His quantum Markov state is obtained by iteration of completely positive maps which play the role of the transfer matrix in classical statistical mechanics. We have already mentioned that in [3] I. Affleck, T. Kennedy, E. Lieb and H. Tasaki solved an antiferromagnetic quantum spin model as an affirmative example of the Haldane conjecture of the antiferromagnetic Heisenberg models with integer spin. They obtained the exact ground state of their model, which they called the Valence Bond Solid State. Subsequently M. Fannes, B. Nachtergaele, and R. Werner recognized that valence bond states and the quantum Markov states can be treated in one framework. 
The completely positive map producing the quantum Markov state is now obtained by iteration of an isometry (an intertwiner of Lie group representations) between finite dimensional spaces. By a Popescu system we mean the isometry and the auxiliary spaces associated with a quantum Markov state or a finitely correlated state. Popescu considered a dilation problem for a set of contractions in [27], and later his dilation and the construction of states of the Cuntz algebra were related more explicitly by O. Bratteli, P. Jorgensen, A. Kishimoto and R.F. Werner in [9]. If we do not require the finiteness of the dimensions of these auxiliary spaces in the Popescu system, any translation invariant state of the quantum spin chain is reconstructed in the same way as the purely generated finitely correlated states of M. Fannes, B. Nachtergaele, and R. Werner. In a sense, the classical counterpart to split states is now the Gibbs measure for the infinite range interaction, while the finitely correlated state of M. Fannes, B. Nachtergaele, and R. Werner corresponds to the finite range interaction. We can investigate the symmetry of split pure states using our realization as states in terms of Popescu systems. Finally we would like to mention that the symmetry of finitely correlated pure states has been investigated by R. F. Werner (for a finite dimensional auxiliary space K introduced later in this paper), who noticed the appearance of the projective representations of Proposition 3.6 (unpublished; however, see [31]).

The rest of this paper is devoted to the detailed exposition of the above idea. Section 2 is a mathematical preliminary. In Sect. 3 we consider the realization of split states as generalized finitely correlated states. The symmetry of split states is also discussed. Our proofs of Theorems 1.3, 1.4 and 1.5 will be presented at the end of Sect. 3. In Sect. 4 we exhibit a few examples of (non-)split pure states. All of them are ground states of quantum spin chains.
2. Mathematical Preliminary

We begin by introducing our notation. The quantum spin model we consider is described by a UHF $C^*$-algebra denoted by $A$. Here $A$ is the $C^*$-completion of the infinite tensor product of the algebra $M_d(\mathbb{C})$ of $d$ by $d$ complex matrices,
\[ A = \overline{\bigotimes_{\mathbb{Z}} M_d(\mathbb{C})}^{\,C^*}. \tag{2.1} \]
Each component of the tensor product is indexed by an integer $j$. Thus the element of $A$ is a local physical observable. Let $Q$ be a matrix in $M_d(\mathbb{C})$. By $Q^{(j)}$ we denote the following element of $A$:
\[ \cdots \otimes 1 \otimes 1 \otimes \underbrace{Q}_{\text{the } j\text{-th component}} \otimes 1 \otimes 1 \otimes \cdots \in A. \]
Given a subset $\Lambda$ of $\mathbb{Z}$, $A_\Lambda$ is defined as the $C^*$-subalgebra of $A$ generated by all $Q^{(j)}$ with $Q \in M_d(\mathbb{C})$, $j \in \Lambda$. We also set
\[ A_{loc} = \bigcup_{|\Lambda| < \infty} A_\Lambda, \]
where $|\Lambda|$ is the cardinality of $\Lambda$. Suppose that $\varphi$ is a state of $A$. The restriction of $\varphi$ to $A_\Lambda$ is denoted by $\varphi_\Lambda$, $\varphi|_{A_\Lambda} = \varphi_\Lambda$. We also set $A_R = A_{[0,\infty)}$, $A_L = A_{(-\infty,-1]}$, $\varphi_R = \varphi_{[0,\infty)}$ and $\varphi_L = \varphi_{(-\infty,-1]}$. The translation $\tau_j$ is an automorphism of $A$ defined by $\tau_j(Q^{(k)}) = Q^{(j+k)}$. Thus by definition $\tau_1$ leaves $A_R$ globally invariant and $\tau_{-1}$ leaves $A_L$ globally invariant. As each local subalgebra $A_\Lambda$ is a finite full matrix algebra, we can apply results of Sects. 2.6 and 4.2 of [10]. We use Theorem 2.6.10 and Corollary 2.6.11 of [10].

Proposition 2.1. (i) Suppose that a state $\varphi$ of $A$ is translationally invariant. $\varphi$ is factor (i.e. the GNS representation is a factor representation) if and only if
\[ \lim_{|j| \to \infty} \varphi(Q_1 \tau_j(Q_2)) = \varphi(Q_1) \varphi(Q_2) \tag{2.2} \]
for any $Q_1$ and $Q_2$ in $A$. (ii) Suppose $\varphi_1$ and $\varphi_2$ are factor states of $A$. Then the following conditions are equivalent. (a) $\varphi_1$ and $\varphi_2$ are quasi-equivalent. (b) For any $\varepsilon > 0$ there exists $k$ such that
\[ |\varphi_1(Q) - \varphi_2(Q)| \le \varepsilon \|Q\| \tag{2.3} \]
for any $Q$ in $A_{(-\infty,-k] \cup [k,\infty)}$.

Results of the same kind are valid for $A_R$ and $A_L$. If a state $\varphi$ of $A$ is a factor, then states $\varphi_R$ of $A_R$, $\varphi_L$ of $A_L$ and $\varphi_L \otimes \varphi_R$ of $A$ are factor as well. The following is a straightforward consequence of the above proposition.
Proposition 2.2. Let $\varphi$ be a translationally invariant pure state of $A$. The following conditions are equivalent: (i) The state $\varphi$ is split in the sense specified in Definition 1.1. (ii) $\varphi$ is quasi-equivalent to the state $\varphi_L \otimes \varphi_R$. (iii) The GNS representation of $A_R$ associated with $\varphi_R$ is a type I factor.

Proof. The equivalence of (i) and (ii) follows from Proposition 2.1.(ii). Suppose that $\varphi$ is quasi-equivalent to the state $\varphi_L \otimes \varphi_R$. As we assumed that the state $\varphi$ is pure, both $\varphi_L$ and $\varphi_R$ are type I: if one of $\varphi_L$ and $\varphi_R$ is non-type I the tensor product cannot be type I. Thus we have only to show that condition (iii) implies (ii). Let $\{\pi(\cdot), \Omega, H\}$ be the GNS triple for the state $\varphi$ of $A$, where $\pi(\cdot)$ is the representation, $\Omega$ is the GNS cyclic vector and $H$ the representation space. $\pi(A_R)''$ is a factor. To see this, set $H_R = \overline{\pi(A_R)\Omega}$, and let $\pi_R(A_R)$ be the restriction of $\pi(A_R)$ to $H_R$. Due to our assumption, the representation $\pi(\cdot)$ of $A$ is irreducible and the von Neumann algebra $\pi_R(A_R)''$ generated by $\pi_R(A_R)$ is a type I factor. The center of $\pi(A_R)''$ is trivial because it commutes with both $\pi(A_R)$ and $\pi(A_L)$. Thus any subrepresentation of $\pi(A_R)$ is quasi-equivalent to $\pi(A_R)$ itself. In particular $\pi(A_R)$ and $\pi_R(A_R)$ are quasi-equivalent and $\pi(A_R)''$ is a type I factor. For any type I factor $M$ on a separable Hilbert space $H_0$ we can find separable Hilbert spaces $H_1$, $H_2$ and a unitary $W$ from $H_0$ to $H_1 \otimes H_2$ such that
\[ W M W^{-1} = B(H_1) \otimes 1_{H_2}, \qquad W M' W^{-1} = 1_{H_1} \otimes B(H_2). \tag{2.4} \]
Due to these remarks, the state $\varphi$ of $A$ is equivalent to a product state $\psi_L \otimes \psi_R$, where $\psi_L$ (resp. $\psi_R$) is a state of $A_L$ (resp. $A_R$) which is quasi-equivalent to $\varphi_L$ (resp. $\varphi_R$).

When $\varphi_R$ is a type I factor state of $A_R$, it yields a shift of $B(H)$. So we next explain the relationship between the Cuntz algebra $O_d$ and a shift of the set $B(H)$ of all bounded linear operators on a separable Hilbert space $H$. (See [7–9] for details.) The Cuntz algebra $O_d$ is a simple $C^*$-algebra generated by isometries $S_k$ ($k = 1, \dots, d$) satisfying
\[ S_k^* S_l = \delta_{kl} 1, \qquad \sum_{k=1}^d S_k S_k^* = 1. \tag{2.5} \]
The gauge action $\gamma_U$ of the group $U(d)$ of $d$ by $d$ unitary matrices is defined via the following formula:
\[ \gamma_U(S_k) = \sum_{l=1}^d U_{lk} S_l, \]
where $U_{kl}$ is the $(k, l)$ matrix element of $U$ in $U(d)$. Consider the diagonal circle group $U(1) = \{ z \in \mathbb{C} \mid |z| = 1 \}$ and its action $\gamma_z$,
\[ \gamma_z(S_j) = z S_j \qquad (j = 1, 2, \dots, d). \tag{2.6} \]
The fixed point algebra $O_d^{U(1)}$ for this action of $U(1)$ is the UHF algebra $d^\infty$, which we will identify with $A_R = A_{[0,\infty)}$ later. Suppose that $v(g)$ is a $d$-dimensional unitary representation of $G$ with matrix elements $v(g)_{kl}$ and $\gamma_g$ is the product type action of $G$ on $A$ associated with $v(g)$. The restriction of the action $\gamma_{v(g)}$ of $O_d$ to $A_R$ and that of the product type action of $A$ to $A_R$ are identical, so, by abuse of notation, we use the same symbol $\gamma_g$.
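To make the relations (2.5) and the gauge action (2.6) concrete, here is a minimal Python sketch of one standard representation of $O_d$ on $\ell^2(\{0, 1, 2, \dots\})$, in which $S_k$ sends the basis vector $e_n$ to $e_{dn+k}$. The choice of representation, the 0-based labels $k = 0, \dots, d-1$ and all function names are assumptions made for illustration and are not taken from the text. Vectors are finitely supported and stored as dictionaries from index to coefficient.

```python
def S(k, d, vec):
    """Apply the isometry S_k: e_n -> e_{d*n + k} to a finitely supported vector."""
    return {d * n + k: c for n, c in vec.items()}

def S_adj(k, d, vec):
    """Apply S_k^*: e_m -> e_{(m - k)/d} if m = k (mod d), and e_m -> 0 otherwise."""
    return {(m - k) // d: c for m, c in vec.items() if m % d == k}

def add(u, v):
    """Sum of two vectors, dropping zero coefficients."""
    out = dict(u)
    for n, c in v.items():
        out[n] = out.get(n, 0) + c
    return {n: c for n, c in out.items() if c != 0}

def check_cuntz(d, n_max):
    """Check S_k^* S_l = delta_kl 1 and sum_k S_k S_k^* = 1 on e_0, ..., e_{n_max-1}."""
    for n in range(n_max):
        e_n = {n: 1}
        for k in range(d):
            for l in range(d):
                expected = e_n if k == l else {}
                if S_adj(k, d, S(l, d, e_n)) != expected:
                    return False
        total = {}
        for k in range(d):
            total = add(total, S(k, d, S_adj(k, d, e_n)))
        if total != e_n:
            return False
    return True

def gauge_rotated_word(d, z, vec):
    """Apply gamma_z(S_0 S_1^*) = (z S_0)(conj(z) S_1^*); requires d >= 2."""
    inner = {n: z.conjugate() * c for n, c in S_adj(1, d, vec).items()}
    return {n: z * c for n, c in S(0, d, inner).items()}

print(check_cuntz(3, 50))
```

The last helper illustrates why words $S_I S_J^*$ with $|I| = |J|$ lie in the fixed point algebra: the phases $z$ and $\bar z$ cancel when $|z| = 1$.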
Split Property and Symmetry Breaking of Quantum Spin Chain
The canonical endomorphism $\Theta$ of $O_d$ is determined by
$$\Theta(Q) = \sum_{k=1}^{d} S_k Q S_k^*, \qquad Q \in O_d. \tag{2.7}$$
It is easy to see that the restriction of the canonical endomorphism $\Theta$ of $O_d$ to $A_R$ is the lattice translation $\tau_1$.

Consider $B(H)$ and an endomorphism $\tilde\Theta$ of $B(H)$. Throughout this paper an endomorphism always means a unital normal *-endomorphism. The Powers index of the endomorphism $\tilde\Theta$ is $d$ ($d \in \{1, 2, 3, \dots, \infty\}$) if the relative commutant of $\tilde\Theta(B(H))$ in $B(H)$ is isomorphic to the type $I_d$ factor (cf. [28]). Thus when the Powers index $d$ is finite,
$$\tilde\Theta(B(H))' \cap B(H) = M_d(\mathbb{C}).$$
Any endomorphism $\tilde\Theta$ of $B(H)$ with Powers index $d$ is implemented by generators of the Cuntz algebra $O_d$. Namely, there is a non-degenerate representation of $O_d$ on $H$ such that
$$\tilde\Theta(Q) = \sum_{k=1}^{d} S_k Q S_k^*, \qquad Q \in B(H).$$
See, for example, [7]. This implementation of the endomorphism is unique up to a gauge transformation in $U(d)$.

An endomorphism $\tilde\Theta$ of $B(H)$ is called ergodic if its fixed points are trivial,
$$\{ Q \in B(H) : \tilde\Theta(Q) = Q \} = \mathbb{C} 1_H.$$
$\tilde\Theta$ is called a shift of $B(H)$ if
$$\bigcap_{n=1}^{\infty} \tilde\Theta^n(B(H)) = \mathbb{C} 1_H.$$
If $\tilde\Theta$ is a shift, it is ergodic. These notions, ergodicity and shifts, are expressed in terms of representations of the Cuntz algebra implementing the endomorphism $\tilde\Theta$: $\tilde\Theta$ is ergodic if and only if $O_d$ is weakly dense in $B(H)$, and $\tilde\Theta$ is a shift if and only if the $U(1)$ fixed point algebra $O_d^{U(1)}$ is weakly dense in $B(H)$. We explain the relation between our quantum spin systems and shifts of the type I factor.

Lemma 2.3. Suppose that $\varphi$ is a translationally invariant pure state of $A$. We assume that $\varphi$ is split. Let $\{\pi_R(A_R), \Omega_R, H_R\}$ be the GNS representation associated with $\varphi_R$ of $A_R$.
(i) The translation $\tau_1$ restricted to $A_R$ is extendible to a normal unital endomorphism $\tilde\Theta$ of the type I factor $\pi_R(A_R)''$. The normal extension $\tilde\Theta$ is a shift with Powers index $d$.
(ii) There exists a representation of the Cuntz algebra $O_d$ with canonical generators $S_j$ ($j = 1, 2, \dots, d$) satisfying (2.5), $S_j \in \pi_R(A_R)''$ ($j = 1, 2, \dots, d$) and
$$\pi_R(\tau_1(Q)) = \sum_{k=1}^{d} S_k \pi_R(Q) S_k^*, \qquad Q \in A_R. \tag{2.8}$$
T. Matsui
Proof. (i) We set $M = \pi_R(A_R)''$. $M$ is a type I factor due to Proposition 2.2. Note that $\pi_R(A_R)$ may not act irreducibly on $H_R$. Let $\{\pi(A), \Omega, H\}$ be the GNS triple associated with $\varphi$ of $A$. As $\varphi$ is translationally invariant, there exists a unitary $U$ satisfying
$$U \pi(Q) U^* = \pi(\tau_1(Q)), \qquad U\Omega = \Omega.$$
This equation gives rise to the normal extension of $\tau_1$ to $\pi(A)''$. We denote this normal extension by $\tilde\Theta$. We also denote the restriction of $\tilde\Theta$ to $\pi(A_R)''$ by the same symbol $\tilde\Theta$. Obviously we may set $\Omega_R = \Omega$ and
$$H_R = \overline{\pi(A_R)\Omega}, \qquad \pi_R(Q) = \pi(Q)|_{H_R}, \qquad Q \in A_R.$$
The normal extension $\tilde\Theta$ of $\tau_1$ on $A_R$ to $M$ is then introduced by the restriction of $\tilde\Theta$ to $M$ acting on $H_R$. It is easy to see that $\tilde\Theta$ is a shift because $\varphi_R$ is a factor state and $\tilde\Theta(M) = \pi_R(A_{[1,\infty)})''$ or, more generally, $\tilde\Theta^k(M) = \pi_R(A_{[k,\infty)})''$.

Next we show that the Powers index of $\tilde\Theta$ is $d$. It suffices to show that the relative commutant $\tilde\Theta(M)' \cap M$ of $\tilde\Theta(M)$ in $M$ is $\pi(A_{\{0\}}) = M_d(\mathbb{C})$. There exist conditional expectations $P_{[1,k)}$ ($k = 2, 3, \dots$) from $A_R$ to $A_{[1,k)^c}$. Let $Q$ be an element of $\tilde\Theta(M)' \cap M$ and take a net $Q_\alpha$ from $\pi_R(A_R)$ which converges weakly to $Q$, $w\text{-}\lim_\alpha Q_\alpha = Q$. By taking $P_{[1,k)}(Q_\alpha)$ instead of $Q_\alpha$ we can assume that $Q$ is in $\pi_R(A_{[1,k)^c})''$. Thus
$$Q \in \pi_R(A_{\{0\}})'' \vee \bigcap_{k=2,3,\dots} \pi_R(A_{[k+1,\infty)})'' = \pi_R(A_{\{0\}}).$$
In particular, the Powers index of $\tilde\Theta$ is $d$.

(ii) We first fix the matrix units $f_{ij}$ of $\pi_R(A_{\{0\}})$ ($i, j = 1, 2, 3, \dots, d$). As $\tilde\Theta$ is a shift with Powers index $d$, we have a representation of the Cuntz algebra $O_d$ in $M$ which implements $\tilde\Theta$ in the sense that
$$\tilde\Theta(Q) = \sum_{k=1}^{d} S_k Q S_k^*.$$
For the canonical generators $S_j$ of this representation, we have matrix units $e_{ij} = S_i S_j^*$ which belong to $\pi_R(A_{\{0\}}) = \tilde\Theta(M)' \cap M$. As any matrix unit system in a finite dimensional full matrix algebra $M_d(\mathbb{C})$ is unitarily conjugate to any other matrix unit system, we have a unitary $w$ in $M_d(\mathbb{C})$ such that $f_{ij} = \gamma_w(e_{ij})$ for any $i, j = 1, 2, 3, \dots, d$. By replacing $S_j$ with $\gamma_w(S_j)$ we obtain Cuntz generators $S_j$ satisfying (2.8). (Note that arbitrary elements of $A_R$ and $O_d^{U(1)}$ are obtained from $f_{ij}$ or $e_{ij}$ and the action of the canonical endomorphism.)

3. Split Pure States

In this section we present our proof of Theorems 1.3, 1.4 and 1.5. Our starting point is the observation that any translationally invariant state can be constructed from a composition of completely positive maps in the same way as the quantum Markov states of L. Accardi in [1] or the finitely correlated states of M. Fannes, B. Nachtergaele and R. Werner in [15]. Let us first recall the definition of finitely correlated states.
Definition 3.1. Let $K$ be a finite dimensional Hilbert space and denote by $B(K)$ the set of all linear operators on $K$. Suppose a completely positive map $E$ from $M_d(\mathbb{C}) \otimes B(K)$ to $B(K)$ and a normal state $\psi$ on $B(K)$ are given. We assume that $E$ is unital in the sense that $E(1_{M_d} \otimes 1_{B(K)}) = 1_{B(K)}$ and that the following invariance is valid for $E$ and $\psi$:
$$\psi(E(1_{M_d} \otimes Q)) = \psi(Q) \qquad \text{for } Q \in B(K). \tag{3.1}$$
The following equation determines a translationally invariant state $\varphi$ of $A$:
$$\varphi\bigl(Q_0^{(j)} Q_1^{(j+1)} Q_2^{(j+2)} \cdots Q_l^{(j+l)}\bigr) = \psi\bigl(E(Q_0 \otimes E(Q_1 \otimes \cdots E(Q_l \otimes 1_{B(K)}) \cdots ))\bigr). \tag{3.2}$$
The state $\varphi$ determined by (3.2) is called the finitely correlated state associated with the triple $\{B(K), E, \psi\}$.

The positivity of the functional $\varphi$ determined by (3.2) is guaranteed by the complete positivity of $E$. The operator $E$ plays the role of the transfer matrix in classical spin systems. The quantum Markov states investigated in [1] correspond to the case where the dimension $\dim K$ of $K$ is exactly $d$, while the finitely correlated states of M. Fannes, B. Nachtergaele and R. Werner in [15] correspond to the situation where $K$ is any finite dimensional space. Even when the dimension of the auxiliary space $K$ is infinite, Eq. (3.2) gives rise to a translationally invariant state of $A$. In fact, any translationally invariant state is constructed in this manner. The following construction of [9] is instructive for our argument.

Proposition 3.2. Let $\varphi$ be a translationally invariant state of $A$. There exist a Hilbert space $K$ and an isometry $V = (V_1, V_2, \dots, V_d)$ from $K$ to $\mathbb{C}^d \otimes K$, $V^* V = 1$, such that the state $\varphi$ is reconstructed via Eq. (3.2), where $E$ is the unital completely positive map determined by
$$E(Q) = V^* Q V \qquad \text{for any } Q \text{ in } M_d(\mathbb{C}) \otimes B(K), \tag{3.3}$$
and $\psi$ is a normal faithful state of the von Neumann algebra $M$ generated by the bounded operators $V_1, V_2, \dots, V_d$ on $K$. The state $\psi$ satisfies the invariance (3.1).

Proof. We consider an extension $\tilde\varphi_R$ of $\varphi_R$ to $O_d$. As $A_R$ is the fixed point subalgebra of $O_d$ under the $U(1)$ gauge action $\gamma_z$, and the canonical endomorphism $\Theta$ and $\gamma_z$ commute, we can choose $\tilde\varphi_R$ to be a $\Theta$ and $\gamma_z$ invariant extension. Let $\{\pi(O_d), \Omega, H\}$ be the GNS triple of $\tilde\varphi_R$ and consider the normal extension of $\tilde\varphi_R$ to $\pi(O_d)''$. Let $P$ be the support projection (of the normal extension) of $\tilde\varphi_R$ in $\pi(O_d)''$. Set $V_j = P \pi(S_j^*) P$ ($j = 1, 2, \dots, d$), let $K$ be the range of the projection $P$ and set $M = P \pi(O_d)'' P$. Then $V = (V_1, V_2, \dots, V_d)$ is an isometry from $K$ to $\mathbb{C}^d \otimes K$. To see this, we have only to show that
$$P \pi(S_j^*) P = \pi(S_j^*) P. \tag{3.4}$$
Equation (3.4) is a consequence of the following identity:
$$\sum_{j=1}^{d} \tilde\varphi_R\bigl((P S_j - P S_j P)(S_j^* P - P S_j^* P)\bigr) = 1 - \sum_{j=1}^{d} \tilde\varphi_R(S_j P S_j^*) = 1 - \tilde\varphi_R(\Theta(P)) = 0.$$
Due to Eq. (3.4), the von Neumann algebra $M$ is generated by $V_1, V_2, \dots, V_d$. We also have
$$V_I^* V_J = P S_I S_J^* P, \tag{3.5}$$
where $I$ and $J$ are multi-indices with $n = |I| = |J|$, $I = (i_1, i_2, \dots, i_n)$, $J = (j_1, j_2, \dots, j_n)$ and
$$V_I^* = V_{i_1}^* V_{i_2}^* \cdots V_{i_n}^*, \quad V_J = V_{j_n} \cdots V_{j_2} V_{j_1}, \quad S_I = S_{i_1} S_{i_2} \cdots S_{i_n}, \quad S_J^* = S_{j_n}^* \cdots S_{j_2}^* S_{j_1}^*.$$
Let $E$ be the normal unital completely positive map from $M_d(\mathbb{C}) \otimes M$ to $M$ determined by (3.3) and let $\psi$ be the restriction of $\tilde\varphi_R$ to $M$. Equation (3.1) follows from (the normal extension of) the $\Theta$ invariance of $\tilde\varphi_R$ and the following identity:
$$\psi(E(1_d \otimes Q)) = \tilde\varphi_R(\Theta(Q)).$$
Then due to (3.5) we have
$$\tilde\varphi_R(S_I S_J^*) = \psi(V_I^* V_J) = \tilde\varphi_R\bigl(E(e_{i_1 j_1} \otimes E(e_{i_2 j_2} \otimes \cdots E(e_{i_n j_n} \otimes 1) \cdots ))\bigr),$$
where $e_{ij}$ is the matrix unit of $M_d(\mathbb{C})$. This proves (3.2).
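As a concrete illustration of how (3.2) and (3.3) are evaluated, the following Python sketch computes expectation values for one small example. The choice $d = 3$, $K = \mathbb{C}^2$, $V_j = \sigma_j / \sqrt{3}$ (Pauli matrices) and $\psi$ equal to the normalized trace is an assumption made here for illustration only; the helper names are likewise hypothetical. With this choice $\sum_j V_j^* V_j = 1$, so $V$ is an isometry, and the normalized trace satisfies the invariance (3.1).

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def dagger(A):
    return [[complex(A[j][i]).conjugate() for j in range(len(A))] for i in range(len(A[0]))]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    return [[c * a for a in row] for row in A]

ID2 = [[1, 0], [0, 1]]
PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]
V = [scale(1 / math.sqrt(3), s) for s in PAULI]  # assumed example: V_j = sigma_j / sqrt(3)

def E(Q, R):
    """E(Q (x) R) = V^* (Q (x) R) V = sum_{i,j} Q[i][j] V_i^* R V_j, cf. (3.3)."""
    out = [[0, 0], [0, 0]]
    for i in range(3):
        for j in range(3):
            term = matmul(matmul(dagger(V[i]), R), V[j])
            out = add(out, scale(Q[i][j], term))
    return out

def psi(R):
    """Normalized trace on B(K) with K = C^2."""
    return (R[0][0] + R[1][1]) / 2

def phi(*Qs):
    """Expectation of Q_0 (x) Q_1 (x) ... (x) Q_l on consecutive sites, following (3.2)."""
    R = ID2
    for Q in reversed(Qs):  # the innermost E in (3.2) carries the last factor Q_l
        R = E(Q, R)
    return psi(R)

ID3 = [[1 if i == j else 0 for j in range(3)] for i in range(3)]
E00 = [[1 if (i, j) == (0, 0) else 0 for j in range(3)] for i in range(3)]

print(abs(phi(ID3) - 1) < 1e-12)                   # normalization
print(abs(phi(ID3, E00) - phi(E00, ID3)) < 1e-12)  # translation invariance on two sites
```

The second check uses exactly the invariance (3.1): inserting an identity factor to the left of an observable does not change its expectation.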
In [15], M. Fannes, B. Nachtergaele and R. Werner obtained an alternative reconstruction theorem for translationally invariant states. However, they used a matricially ordered vector space in place of the von Neumann algebra $M$ of our construction. Even when $\varphi$ is a factor state, the extension $\tilde\varphi_R$ to the Cuntz algebra $O_d$ is not unique. It is not necessarily a factor state either. Thus Proposition 3.2 by itself does not seem to tell us much about the state $\varphi$.

Definition 3.3. (i) Suppose that $K$ is a separable Hilbert space and that $V = (V_1, V_2, \dots, V_d)$ is a $d$-tuple of bounded operators on $K$ which gives rise to an isometry from $K$ to $\mathbb{C}^d \otimes K$. Let $M$ be the von Neumann algebra generated by $\{V_j \ (j = 1, 2, \dots, d)\}$ and $\psi$ be a normal faithful state with the invariance property
$$\psi(E(1_d \otimes Q)) = \psi(Q), \qquad Q \in M,$$
where $E$ is the normal unital completely positive map defined in (3.3). We call the triple $\{V, \psi, K\}$ a Popescu system. The state $\varphi$ determined by (3.2) is called the state generated by the Popescu system $\{V, \psi, K\}$.
(ii) When a Popescu system $\{V, \psi, K\}$ is given, we introduce operators $\tilde E$ and $E_Q$ from $M$ to $M$ via the following equations:
$$\tilde E(R) = E(1_{M_d} \otimes R), \qquad E_Q(R) = E(Q \otimes R),$$
where $Q$ is in $M_d(\mathbb{C})$ and $R$ is in $M$.
(iii) A Popescu system $\{V, \psi, K\}$ is called minimal if there exists no non-trivial von Neumann subalgebra $N$ of $M$ invariant under every $E_Q$ and containing the unit of $M$.

The following Lemma 3.4 is obvious by definition.

Lemma 3.4. The Popescu system is minimal if and only if the linear hull of $\{V_I^* V_J : |I| = |J|\}$ is weakly dense in $M$.
Proposition 3.5. Let $\varphi$ be a translationally invariant pure state of $A$. Suppose further that $\varphi$ is split. Then there exists a Popescu system $\{V, \psi, K\}$ generating $\varphi$ which is minimal, and the von Neumann algebra $M$ generated by $V_j$ ($j = 1, 2, \dots, d$) is of type I. Furthermore,
$$\lim_{n \to \infty} \tilde E^n(Q) = \psi(Q) 1, \qquad Q \in M. \tag{3.6}$$

Proof. We repeat our previous construction of a Popescu system with a different choice of extension of $\varphi_R$ to the Cuntz algebra $O_d$. Let $\{\pi(A), \Omega, H\}$ (resp. $\{\pi_R(A_R), \Omega, H_R\}$) be the GNS triple associated with $\varphi$ (resp. $\varphi_R$). As we are assuming that $\varphi$ is translationally invariant, there exists a unitary $U$ such that $U \pi(Q) U^* = \pi(\tau_1(Q))$ for $Q \in A$, $U\Omega = \Omega$, and
$$w\text{-}\lim_{|k| \to \infty} \pi(\tau_k(Q)) = \varphi(Q) 1, \qquad \lim_{|k| \to \infty} \psi(\tau_k(Q)) = \varphi(Q) \tag{3.7}$$
for any $Q \in A$ and any state $\psi$ which is quasi-equivalent to $\varphi$.

Now suppose that $\varphi$ is split. Then $\varphi_R$ is a type I factor state and the normal extension $\tilde\Theta$ of $\tau_1$ on $B = \pi_R(A_R)''$ is a shift of $B(H)$. (See Lemma 2.3.) We have a representation of the Cuntz algebra $O_d$ in $B$ such that the canonical generators $S_j$ satisfy (2.8). We denote the normal extension of $\varphi_R$ to $\pi_R(A_R)''$ by the same symbol $\varphi_R$. This $\varphi_R$ now plays the role of the extension $\tilde\varphi_R$ of the state of $A_R$ to $O_d$ in Proposition 3.2. Let $P$ be the support projection of $\varphi_R$ in the type I factor $\pi_R(A_R)''$. Let $K$ be the range of $P$ and set $V_j = P S_j^* P$. As $\varphi_R$ is a shift invariant normal state of $\pi_R(A_R)''$, we obtain $V_j = P S_j^* P = S_j^* P$ again. In particular, $V = (V_1, V_2, \dots, V_d)$ is an isometry from $K$ to $\mathbb{C}^d \otimes K$. As $\pi_R(A_R)$ is weakly dense in $\pi_R(A_R)''$, $P \pi_R(A_R) P$ is weakly dense in the von Neumann algebra $M$. As a result, our Popescu system is minimal. The von Neumann algebra $M$ is of type I as $\pi_R(A_R)''$ is of type I. Equation (3.6) follows from the mixing property of $\varphi$ expressed in (3.7),
$$w\text{-}\lim_{n \to \infty} \Theta^n(Q) = \varphi(Q) 1, \qquad Q \in \pi_R(A_R)''.$$
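The convergence (3.6) can be observed numerically in a finite dimensional toy case. The sketch below assumes, purely for illustration, the Popescu-type data $d = 3$, $K = \mathbb{C}^2$, $V_j = \sigma_j / \sqrt{3}$ and $\psi$ the normalized trace (none of which is taken from the text). For this choice $\tilde E$ fixes the identity and multiplies every traceless matrix by $-1/3$, so $\tilde E^n(R)$ approaches $\psi(R) 1$ geometrically.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

PAULI = [[[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]]]

def E_tilde(R):
    """E~(R) = E(1 (x) R) = sum_j V_j^* R V_j with V_j = sigma_j / sqrt(3) (assumed example)."""
    out = [[0, 0], [0, 0]]
    for s in PAULI:
        term = matmul(matmul(s, R), s)  # sigma_j is Hermitian, so sigma_j^* = sigma_j
        out = [[o + t / 3 for o, t in zip(ro, rt)] for ro, rt in zip(out, term)]
    return out

def psi(R):
    """Normalized trace, invariant under E~ for this example."""
    return (R[0][0] + R[1][1]) / 2

def iterate(R, n):
    for _ in range(n):
        R = E_tilde(R)
    return R

R0 = [[2, 1 + 1j], [1 - 1j, -1]]
Rn = iterate(R0, 40)
# Largest matrix entry of E~^40(R0) - psi(R0) * 1
deviation = max(abs(Rn[i][j] - (psi(R0) if i == j else 0))
                for i in range(2) for j in range(2))
print(deviation < 1e-12)
```

In the setting of Proposition 3.5 the space $K$ may be infinite dimensional and no rate of convergence is claimed there; the geometric rate is a feature of this toy example only.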
Next we consider the symmetry of split pure states. Let $G$ be a compact group and $v(g)$ be a $d$-dimensional unitary representation of $G$. By $\beta_g$ we denote the product action of $G$ on the infinite tensor product $A$ induced by $v(g)$,
$$\beta_g(Q) = (\cdots \otimes v(g) \otimes v(g) \otimes v(g) \otimes \cdots)\, Q\, (\cdots \otimes v(g)^{-1} \otimes v(g)^{-1} \otimes v(g)^{-1} \otimes \cdots)$$
for any $Q$ in $A$. On $A_R$ the gauge action $\gamma_{v(g)}$ of the Cuntz algebra $O_d$ and $\beta_g$ coincide: $\gamma_{v(g)}(Q) = \beta_g(Q)$ for any $Q$ in $A_R$.

Proposition 3.6. Let $\varphi$ be a translationally invariant pure state of $A$. Suppose that $\varphi$ is split and $G$ invariant, $\varphi(\beta_g(Q)) = \varphi(Q)$ for any $g$ in $G$ and any $Q$ in $A$. There exist a Hilbert space $K$, a projective unitary representation $u(g)$ of $G$ on $K$, a one-dimensional unitary representation $\xi(g)$, an isometry $V = (V_1, V_2, \dots, V_d)$ from $K$ to $\mathbb{C}^d \otimes K$ and a normal faithful state $\psi$ of the von Neumann algebra $M$ generated by $V_1, V_2, \dots, V_d$ such that the following statements are valid:
(i) $\{V, \psi, K\}$ is a minimal Popescu system generating $\varphi$.
(ii) $u(g)$ belongs to $M$. The state $\psi$ is invariant under the adjoint action $\mathrm{Ad}(u(g))$,
$$\psi(u(g) Q u(g)^*) = \psi(Q), \qquad Q \in M.$$
(iii) The operator $V$ intertwines the projective representation of $G$,
$$\bigl((\xi(g) v(g)) \otimes u(g)\bigr) V = V u(g). \tag{3.8}$$
Suppose, further, that $G$ is the one-dimensional torus $U(1)$ or a connected, simply connected semisimple Lie group. Then the projective representation $u(g)$ is a unitary representation.

Proof. We use the notation and construction of the Popescu system of Proposition 3.5. As the state $\varphi$ is split, we have a representation $S_j$ of the Cuntz algebra in $\pi_R(A_R)''$ which implements the shift $\tau_1$ or $\tilde\Theta$. As $\varphi_R$ is $G$ invariant, there exists a unitary representation $u_0(g)$ on $H_R$ such that
$$u_0(g) \pi_R(Q) u_0(g)^* = \pi_R(\beta_g(Q)), \qquad u_0(g)\Omega = \Omega$$
for any $g$ in $G$ and any $Q$ in $A_R$. Now consider $T_j = u_0(g) S_j u_0(g)^*$. $T_j$ also implements the same shift $\tau_1$ and $T_j^* S_i$ is in the center of $\pi_R(A_R)''$. So $T_j^* S_i$ is a scalar and there exists a unitary representation $w(g)$ of $G$ in $\mathbb{C}^d$ such that
$$u_0(g) S_j u_0(g)^* = \gamma_{w(g)}(S_j), \tag{3.9}$$
where $\gamma_{w(g)}$ is a gauge automorphism of $O_d$. Consider another gauge action of $G$ obtained by $\gamma_{v(g)}$. By definition, for any $Q$ in $O_d^{U(1)} = A_R$, we have
$$\gamma_{w(g)}(Q) = \gamma_{v(g)}(Q). \tag{3.10}$$
Thus there exists a scalar $\xi(g)$ such that
$$w(g) = \xi(g) v(g), \qquad \xi(g) \in \mathbb{C}, \quad |\xi(g)| = 1. \tag{3.11}$$
$\xi(g)$ is a one-dimensional representation of $G$ as both $w(g)$ and $v(g)$ are unitary representations of $G$. As $u_0(g)$ gives rise to a continuous action on the type I factor $\pi(A_R)'' = O_d''$, we have a projective unitary representation $u_1(g)$ of $G$ in $\pi(A_R)''$ which yields the same $G$ action on $\pi(A_R)''$. Then $u_0(g) u_1(g)^* S_j = S_j u_0(g) u_1(g)^*$. We obtain $u_0(g) u_1(g)^* \in \pi(A_R)'$ and
$$u_1(g) S_j u_1(g)^* = \gamma_{w(g)}(S_j). \tag{3.12}$$
Let $P$ be the support projection of $\varphi_R$ in $\pi_R(A_R)''$. We set $V_j = P S_j^* P$ as before. As the state $\varphi_R$ is $G$ invariant, the support projection $P$ in $\pi_R(A_R)''$ commutes with $u_0(g)$ and $u_1(g)$. Thus the reduction $P u_1(g) P$ on $K$ is a projective unitary representation of $G$. We denote this projective representation by $u(g)$, $u(g) = P u_1(g) P$. Equation (3.9) reads
$$u(g) V_j u(g)^* = \overline{\xi(g)} \sum_k \overline{v(g)_{kj}}\, V_k,$$
where $\overline{v(g)}$ is the complex conjugate of $v(g)$. The above identity is equivalent to (3.8). When $G$ is the one-dimensional torus $U(1)$ or a connected, simply connected semisimple Lie group, it is generally known that a continuous action of $G$ on a type I factor is implemented by an inner unitary representation. (See Sect. 14 of [21].)
Proof of Theorem 1.3. We use the Popescu system of Proposition 3.6. Suppose that the spin $s$ is a half odd integer and $\varphi$ is a translationally invariant, $SU(2)$ invariant, pure state. If $\varphi$ is split we can apply Proposition 3.6. When $G$ is $SU(2)$, the character $\xi(g)$ is trivial, $\xi(g) = 1$. Let $p(\mathrm{even})$ and $p(\mathrm{odd})$ (resp. $q(\mathrm{even})$ and $q(\mathrm{odd})$) be the projections onto the integer and half odd integer spin subspaces of $K$ (resp. of $\mathbb{C}^d \otimes K$). Note that
$$p(\mathrm{odd}) = \sum_{s = \frac12, \frac32, \dots} \int \overline{\mathrm{tr}_s(g)}\, u(g)\, dg, \qquad p(\mathrm{even}) = \sum_{s = 0, 1, 2, 3, \dots} \int \overline{\mathrm{tr}_s(g)}\, u(g)\, dg,$$
where $dg$ is the normalized Haar measure of $SU(2)$, $\mathrm{tr}_s(g)$ is the character (trace) of the spin $s$ irreducible representation and $\overline{\mathrm{tr}_s(g)}$ is its complex conjugate. As $u(g)$ is in $M$, $p(\mathrm{even})$ and $p(\mathrm{odd})$ are in $M$. Obviously,
$$p(\mathrm{odd}) + p(\mathrm{even}) = 1_K, \qquad q(\mathrm{odd}) + q(\mathrm{even}) = 1_{\mathbb{C}^d \otimes K}.$$
The intertwining property (3.8) of $SU(2)$ representations implies
$$q(\mathrm{even}) V = V p(\mathrm{even}), \qquad q(\mathrm{odd}) V = V p(\mathrm{odd}). \tag{3.13}$$
As the spin is a half odd integer, and tensoring with a half odd integer spin interchanges the parity of spin,
$$q(\mathrm{even}) = 1_{\mathbb{C}^d} \otimes p(\mathrm{odd}), \qquad q(\mathrm{odd}) = 1_{\mathbb{C}^d} \otimes p(\mathrm{even}).$$
Thus due to the invariance of $\psi$ under $\tilde E$ we have
$$\psi(p(\mathrm{even})) = \psi(V^* V p(\mathrm{even})) = \psi(V^* q(\mathrm{even}) V) = \psi(V^* (1_{\mathbb{C}^d} \otimes p(\mathrm{odd})) V) = \psi(p(\mathrm{odd})). \tag{3.14}$$
As $\psi(p(\mathrm{odd}) + p(\mathrm{even})) = 1$,
$$\psi(p(\mathrm{even})) = \psi(p(\mathrm{odd})) = \frac12. \tag{3.15}$$
Consider the algebra $B_0$ defined by $B_0 = p(\mathrm{even}) B p(\mathrm{even}) + p(\mathrm{odd}) B p(\mathrm{odd})$. $B_0$ is invariant under $\tilde E$ because
$$\tilde E(p(\mathrm{even}) Q_1 p(\mathrm{even})) = p(\mathrm{odd}) \tilde E(Q_1) p(\mathrm{odd}), \qquad \tilde E(p(\mathrm{odd}) Q_2 p(\mathrm{odd})) = p(\mathrm{even}) \tilde E(Q_2) p(\mathrm{even}).$$
Due to the minimality we have $B_0 = B(K)$, which implies that either $p(\mathrm{odd}) = 1$ or $p(\mathrm{even}) = 1$. However, this contradicts (3.15). This completes the proof of Theorem 1.3.
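The parity interchange used in the proof above can be made explicit with the Clebsch-Gordan rule for $SU(2)$. In the sketch below, $D_j$ denotes the spin $j$ irreducible representation (a label introduced here for illustration, not notation from the text), and we use that $\xi(g) = 1$, so that $SU(2)$ acts on $\mathbb{C}^d \otimes K$ by $v(g) \otimes u(g)$ with $v = D_s$.

```latex
\begin{align*}
  D_s \otimes D_j &\cong \bigoplus_{l = |j - s|}^{j + s} D_l ,
    \qquad l - j \in s + \mathbb{Z}, \\
  % For half odd integer s: l is an integer exactly when j is a half odd integer,
  % hence q(even) = 1 \otimes p(odd) and q(odd) = 1 \otimes p(even).
  \psi(p(\mathrm{even}))
    &= \psi\bigl(V^{*} q(\mathrm{even}) V\bigr)
     = \psi\bigl(V^{*} (1 \otimes p(\mathrm{odd})) V\bigr)
     = \psi\bigl(\tilde{E}(p(\mathrm{odd}))\bigr)
     = \psi(p(\mathrm{odd})).
\end{align*}
```

The first line is the only input beyond (3.13) and the invariance of $\psi$ under $\tilde E$; for integer $s$ the parity is preserved instead and the argument gives no constraint.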
Proof of Theorem 1.4. The proof is similar to that of Theorem 1.3. We assume that $\varphi$ is a $G$ invariant, translationally invariant, split pure state and derive a contradiction as before. Let $p_\alpha$ in $K$ and $q_\alpha$ in $\mathbb{C}^d \otimes K$ be the projections corresponding to $\hat G_\alpha$. Thus
$$p_\alpha p_\beta = 0 \ (\alpha \neq \beta), \quad \sum_\alpha p_\alpha = 1, \quad q_\alpha q_\beta = 0 \ (\alpha \neq \beta), \quad \sum_\alpha q_\alpha = 1. \tag{3.16}$$
For each $\alpha$ there exists $\beta$ such that
$$1_d \otimes p_\alpha \leq q_\beta, \qquad \psi(p_\alpha) \leq \psi(p_\beta). \tag{3.17}$$
For each $\alpha$ there exists a sequence $\alpha(j)$ such that $\alpha(0) = \alpha$ and
$$\psi(p_{\alpha(j)}) \leq \psi(p_{\alpha(j+1)}). \tag{3.18}$$
By (3.16) and (3.18) we have either (i) $\psi(p_{\alpha(j)}) = 0$ for all $j$ or (ii) $p_{\alpha(j)} = p_{\alpha(j+N)}$ for some $N$ and $\psi(p_{\alpha(j)}) = \psi(p_{\alpha(0)})$. In case (i), $p_{\alpha(j)} = 0$ due to the faithfulness of $\psi$. In case (ii), $p_\alpha \neq 0$ for at least two different $\alpha$. As a consequence, $B_0 = \sum_\alpha p_\alpha M p_\alpha$ is a non-trivial $E_Q$ invariant subalgebra of $M$. This contradicts the minimality of our Popescu system.

Proof of Theorem 1.5. We consider the representation of $U(1)$ appearing in Proposition 3.6. Here we denote the characters of $U(1)$ by $\rho(j)$ ($j = 0, \pm 1/2, \pm 1, \dots$), and the restriction of the spin $1/2$ representation of $SU(2)$ to the torus is $\rho(1/2) \oplus \rho(-1/2)$. We denote the projection onto the maximal subspace of $K$ quasi-equivalent to $\rho(j)$ by $p(j)$. The projection onto the subspace of $\mathbb{C}^2$ quasi-equivalent to $\rho(j)$ is denoted by $e(j)$. According to Proposition 3.6, the representation of $U(1)$ in $\mathbb{C}^2 \otimes K$ gives rise to the following identities:
$$e(k + \tfrac12) + e(k - \tfrac12) = 1, \qquad \sum_j p(j) = 1,$$
where $k$ originates from $\xi(g)$ of (3.8). The intertwining property of $V$ reads
$$\bigl( e(k + \tfrac12) \otimes p(j - k - \tfrac12) + e(k - \tfrac12) \otimes p(j - k + \tfrac12) \bigr) V = V p(j).$$
When $k$ is an integer we can use the even-odd argument for $SU(2)$ from our proof of Theorem 1.3, which leads to a contradiction. So we assume that $k$ is a half odd integer. We first consider the case $k = \frac12$. So we have
$$e(1) + e(0) = 1, \qquad \bigl(e(1) \otimes p(j - 1) + e(0) \otimes p(j)\bigr) V = V p(j). \tag{3.19}$$
We claim that
$$\psi\bigl(V^* (e(1) \otimes p(j - 1)) V\bigr) = \psi\bigl(V^* (e(1) \otimes p(j)) V\bigr). \tag{3.20}$$
In fact, Eq. (3.19) implies
$$\psi\bigl(V^* (e(1) \otimes p(j - 1) + e(0) \otimes p(j)) V\bigr) = \psi(V^* V p(j)) = \psi(p(j)) = \psi\bigl(V^* (e(1) \otimes p(j) + e(0) \otimes p(j)) V\bigr),$$
where we used (3.1) in the last step. Equation (3.20) implies the following equation:
$$\varphi\bigl(\tfrac12 (1 + \sigma_z^{(i)})\bigr) = \psi\bigl(V^* (e(1) \otimes 1) V\bigr) = 0 \tag{3.21}$$
for any $i$ in $\mathbb{Z}$. This is a consequence of
$$0 \leq \sum_j \psi\bigl(V^* (e(1) \otimes p(j)) V\bigr) = \psi\bigl(V^* (e(1) \otimes 1) V\bigr) \leq 1$$
and of Eq. (3.20): the summands do not depend on $j$ and their sum is finite, so each of them vanishes. Equation (3.21) tells us that the state $\varphi$ is a ground state for the Hamiltonian $H$,
$$H = \frac12 \sum_i (1 + \sigma_z^{(i)}).$$
So $\varphi$ is a product state.

We next consider the case $k > \frac12$. We have a positive integer $m = k - \frac12$ such that
$$e(m + 1) + e(m) = 1, \qquad \bigl(e(m + 1) \otimes p(j - 1) + e(m) \otimes p(j)\bigr) V = V p(j + m).$$
Then, as before, we have
$$\sum_{j=n}^{\infty} \psi\bigl(V^* (e(m + 1) \otimes p(j - 1) + e(m) \otimes p(j)) V\bigr) = \sum_{j=n}^{\infty} \psi(p(j + m)).$$
On the other hand,
$$\psi(p(j + m)) = \psi\bigl(V^* (e(m + 1) \otimes p(j + m) + e(m) \otimes p(j + m)) V\bigr).$$
It turns out that
$$\sum_{j=n}^{n+m} \psi\bigl(V^* (e(m + 1) \otimes p(j - 1)) V\bigr) + \sum_{j=n}^{n+m-1} \psi\bigl(V^* (e(m) \otimes p(j)) V\bigr) = 0. \tag{3.22}$$
As each summand in (3.22) is non-negative we have $\psi(V^* (e(m + 1) \otimes p(j)) V) = \psi(V^* (e(m) \otimes p(j)) V) = 0$. However, this contradicts the following equation:
$$1 = \sum_j \Bigl( \psi\bigl(V^* (e(m + 1) \otimes p(j)) V\bigr) + \psi\bigl(V^* (e(m) \otimes p(j)) V\bigr) \Bigr).$$
The case of negative $k$ can be handled in the same manner. Thus we have proved Theorem 1.5.
4. Examples

In this section we present a few examples of split and non-split pure states. As these are constructed as infinite volume ground states of spin models, we begin by explaining the mathematical physics definition of ground states. (For more detail, see [11].) We consider a translationally invariant Hamiltonian. For simplicity we assume that the interaction is of nearest neighbor type. So suppose a selfadjoint element $h_0 = h_0^*$ in $A_{\{0,1\}}$ is given and consider the finite volume Hamiltonian $H_{[n,m]}$ on the interval $[n, m]$ determined by
$$H_{[n,m]} = \sum_{j=n}^{m-1} h_j,$$
where we set $\tau_j(h_0) = h_j$. The formal infinite volume Hamiltonian will be denoted by
$$H = \sum_{j \in \mathbb{Z}} h_j. \tag{4.1}$$
The time evolution $\alpha_t(Q)$ of $Q \in A$ is obtained via the thermodynamic limit,
$$\alpha_t(Q) = \lim_{\Lambda \to \mathbb{Z}} e^{itH_\Lambda} Q e^{-itH_\Lambda}.$$
The generator $\delta$ of $\alpha_t$ is $\delta(Q) = [H, Q]$, which is densely defined. $\alpha_t$ will be referred to as the time evolution. Formally the one-parameter group $\alpha_t$ of automorphisms of $A$ corresponds to $e^{itH} Q e^{-itH} = \alpha_t(Q)$. See [11]. A state $\varphi$ is invariant (under $\alpha_t$) if $\varphi(\alpha_t(Q)) = \varphi(Q)$ for any $t$ and any $Q$ in $A$. The following condition is the standard definition of a ground state for infinite spin systems.

Definition 4.1. Let $\varphi$ be a state of $A$. $\varphi$ is a ground state for the Hamiltonian (4.1) if and only if
$$\varphi(Q^* [H, Q]) \geq 0 \qquad \text{for any } Q \text{ in } A_{loc}. \tag{4.2}$$
The ground state is characterized by minimization of the energy in the following sense.

Proposition 4.2. The following conditions for a state $\varphi$ of $A$ are equivalent.
(i) The state $\varphi$ is a ground state in the sense specified in Definition 4.1.
(ii) For any interval $[n, m]$,
$$\varphi(H_{[n,m]}) = \inf \psi(H_{[n,m]}), \tag{4.3}$$
where the infimum is taken over all states $\psi$ satisfying
$$\varphi(Q) = \psi(Q) \qquad \text{for any } Q \text{ in } A_{(-\infty, n+1] \cup [m-1, \infty)}.$$
Furthermore, if the state $\varphi$ is translationally invariant, the following condition is equivalent to (i) and (ii):
(iii) The state $\varphi$ minimizes the mean energy,
$$\varphi(h_0) = \inf \psi(h_0), \tag{4.4}$$
where the infimum is taken over all translationally invariant states $\psi$ of $A$.

The set of all ground states is a weakly compact convex set in the state space of $A$. An extremal ground state is a pure state.

We present examples of non-split pure states. We consider the exactly solvable XY model. The Hamiltonian $H_{XY}$ of the XY model is determined by the following equation:
$$H_{XY}(\lambda) = -\sum_{j \in \mathbb{Z}} \bigl\{ \sigma_x^{(j)} \sigma_x^{(j+1)} + \sigma_y^{(j)} \sigma_y^{(j+1)} \bigr\} - 2\lambda \sum_{j \in \mathbb{Z}} \sigma_z^{(j)}, \tag{4.5}$$
where $\lambda$ is a real parameter (an external magnetic field) and $\sigma_x^{(j)}$, $\sigma_y^{(j)}$ and $\sigma_z^{(j)}$ are the Pauli spin matrices at the site $j$. For the finite chain the XY model is equivalent to the free Fermion via the Jordan-Wigner transformation. For the infinite chain the equivalence is not literally correct due to the infinite product of $\sigma_z^{(j)}$ in the Jordan-Wigner transformation. Nevertheless we have shown in [6] that the ground state is unique. When $|\lambda| \geq 1$ it is easy to see that the ground state is a product state. If $|\lambda| < 1$ the ground state is not a product state. On the other hand the Hamiltonian $H_{XY}(\lambda)$ of the XY model is invariant under rotations around the $z$ axis. The infinitesimal generator $N$ of the rotation commutes with the Hamiltonian $H_{XY}$,
$$[H_{XY}(\lambda), N] = 0, \qquad N = \sum_{j \in \mathbb{Z}} \sigma_z^{(j)}.$$
By the uniqueness mentioned above, the ground state is invariant under the rotation around the $z$ axis. Thus due to Theorem 1.5 the ground state is non-split. When $\lambda = 0$ the one-sided restriction of the ground state gives rise to a type $III_1$ state. To see this, we briefly explain the connection between the ground state of the XY model and that of the massless free Fermion. Consider the automorphism $\Theta$ of $A$ determined by
$$\Theta(\sigma_x^{(j)}) = -\sigma_x^{(j)}, \qquad \Theta(\sigma_y^{(j)}) = -\sigma_y^{(j)}, \qquad \Theta(\sigma_z^{(j)}) = \sigma_z^{(j)} \tag{4.6}$$
for any $j$. As the Hamiltonian $H_{XY}$ of the XY model is invariant under $\Theta$, the unique ground state $\varphi$ is invariant under $\Theta$. Let $A^{(+)}$ be the fixed point subalgebra of $A$ under $\Theta$, $A^{(+)} = \{Q \in A : \Theta(Q) = Q\}$. Next consider the CAR (canonical anti-commutation relation) algebra $A_{CAR}$. It is the algebra generated by Fermion creation-annihilation operators on the lattice $\mathbb{Z}$. Here the creation-annihilation operators are denoted by $c_j^*$ and $c_j$,
$$c_j c_k + c_k c_j = 0, \qquad c_j^* c_k^* + c_k^* c_j^* = 0, \qquad c_j^* c_k + c_k c_j^* = \delta_{jk}.$$
By abuse of notation we denote the parity automorphism of the CAR algebra by the same symbol $\Theta$, $\Theta(c_j) = -c_j$, $\Theta(c_j^*) = -c_j^*$,
and the even part is denoted by $A_{CAR}^{(+)}$. We can identify $A^{(+)}$ and $A_{CAR}^{(+)}$ via the equations
$$\sigma_x^{(j)} \sigma_x^{(j+1)} = (c_j - c_j^*)(c_{j+1} + c_{j+1}^*), \qquad \sigma_z^{(j)} = 2 c_j^* c_j - 1.$$
The unique ground state of HCAR is the Fock state which we denote ω. The restriction of ω and ϕ to A(+) are the same, ω(Q) = ϕ(Q)
for Q in A(+) .
On the other hand the computation of A. Wassermann in [30] implies that the restriction (+) of ω to AR gives rise to a type III factor representation. In fact recall that the type of the representation of the quasi-free state is determined by the spectrum of the operator associated with the two point function. The positive part of the spectrum of QP Q and P QP are identical. In the present case, Q is the projection to positive integer subspace in l 2 (Z) and P is the projection to the positive energy in the one particle space. (+) Thus the state ϕR cannot be of type I as its restriction to AR is non-type I. This is a proof of the non-split property of ϕR . Theorem 4.3. Let ϕ (λ) be the unique ground state of the XY model HXY (λ). If |λ| < 1, (0) ϕ (λ) is not split. When λ is zero, ϕR gives rise to a type I I I 1 representation. Next we consider a generalization of the AKLT model of [3] which was considered in [25]. Here we only give an abstract condition for Hamiltonians. We return to the general nearest neighbor interaction (4.1). Assumption 4.4. (i) We assume that hj is positive, thus H[n,m] ≥ 0 for any [n, m]. (ii) The dimension of the kernel of H[n,m] (the multiplicity of zero eigenvalue of H[n,m] as a finite matrix) is greater than one and uniformly bounded in n and m. 1 ≤ sup dim ker H[n,m] < ∞. n,m
(4.7)
Note that the condition (i) is not a constraint. It fixes a constant which is physically irrelevant. In the condition (ii) we regard H[n,m] as a d m−n+1 by d m−n+1 matrix as we are considering nearest neighbor interaction. Definition 4.5. Suppose that hj ≥ 0. A state ϕ of A is a zero energy state if and only if for any j in Z, ϕ(hj ) = 0.
(4.8)
Zero energy states may not exist for arbitrary Hamiltonians. The geometric structure of the set of all zero energy states was investigated in [17]. If ϕ is a zero energy state; it is a ground state. If a zero energy state exists, any translationally invariant ground state is a zero energy state, however, non-translationally invariant zero energy states may exist. For example, the massive ferromagnetic XXZ model has an infinite number of non-translationally invariant zero energy states. (See [17].) The first inequality in (4.7) of Assumption 4.4. (ii) is the existence of a zero energy state. Examples of Hamiltonians with zero energy states are given in [3, 15]. The ladder case is discussed byA. K. Kolezhuk and H. J. Mikeska in [20]. The main result of the paper [25] is expressed in the present context as follows.
Theorem 4.6. Suppose that Assumption 4.4 is valid. Let $\varphi$ be a translationally invariant ground state. Assume that $\varphi$ is mixing in the sense that
$$\lim_{|j| \to \infty} \varphi(Q_1 \tau_j(Q_2)) = \varphi(Q_1) \varphi(Q_2) \tag{4.9}$$
for any local $Q_1$, $Q_2$ in $A$. Then the state $\varphi$ is split and pure. In fact, the auxiliary Hilbert space $K$ in Definition 3.1 is finite dimensional. The two point function decays exponentially fast,
$$\sup_{j \in \mathbb{Z}} \bigl| \varphi(Q_1 \tau_j(Q_2)) - \varphi(Q_1) \varphi(Q_2) \bigr|\, e^{m|j|} < \infty, \tag{4.10}$$
for any local $Q_1$ and $Q_2$.

The above result is a converse to a result of M. Fannes, B. Nachtergaele and R. Werner in [15]. They have shown that if $\varphi$ is a pure translationally invariant finitely correlated state with finite dimensional auxiliary Hilbert space $K$, there exists a projection $P$ in $A_{loc}$ such that $\varphi(P_j) = 0$, where $P_j = \tau_j(P)$. Thus $\varphi$ is a ground state for the Hamiltonian $H = \sum_j P_j$.

Finally we present an example with an infinite dimensional auxiliary space $K$. Such states can be constructed via the pure state extension of classical Gibbs measures. (See [23] and [24].) For simplicity, we only consider the spin $1/2$ case. Let $B$ be the abelian $C^*$-subalgebra generated by all $\sigma_z^{(j)}$. We set $B_\Lambda = A_\Lambda \cap B$. By the pure state extension of a classical Gibbs measure $\mu$ we mean a pure state $\varphi$ of $A$ such that the restriction of $\varphi$ to $B$ is the one-dimensional classical Gibbs measure $\mu$. To be more specific, we introduce a class of interactions of a spin $1/2$ classical spin. Let $X$ be the configuration space of the classical spin on $\mathbb{Z}$,
$$X = \prod_{\mathbb{Z}} \{1, -1\}.$$
A spin configuration will be denoted by $\sigma$, $\sigma \in X$. The projection of the spin configuration $\sigma$ to the $j$th component is denoted by $\sigma^{(j)}$. The space $C(X)$ of continuous functions on $X$ can be naturally identified with $B$ and we identify $\sigma^{(j)} = \sigma_z^{(j)}$. Any state of $C(X)$ is a Radon measure on $X$. Let $A$ be a finite subset of $\mathbb{Z}$. We set
$$\sigma_\alpha(A) = \prod_{k \in A} \sigma_\alpha^{(k)}, \qquad \alpha = x, y, z.$$
By an interaction we mean a map $A \mapsto J_A$ from finite subsets $A$ of $\mathbb{Z}$ to $\mathbb{R}$.

Assumption 4.7. We assume that the interaction $J_A$ is translationally invariant, $J_A = J_{A+k}$ for any $k$ in $\mathbb{Z}$, and that
$$\sum_{A \cap (-\infty, -1] \neq \emptyset,\ A \cap [0, \infty) \neq \emptyset} |J_A| < \infty. \tag{4.11}$$
Given a translationally invariant interaction $J_A$ satisfying (4.11), the Gibbs measure associated with $J_A$ is a measure $\mu$ formally determined by
$$\int d\mu(\sigma) f(\sigma) = \frac{1}{Z} \int \prod_{j \in \mathbb{Z}} d\sigma^{(j)}\, e^{-H(\sigma)} f(\sigma), \tag{4.12}$$
where $f(\sigma)$ is a function in $C(X)$, $Z$ is the (infinite) normalization constant and $-H(\sigma)$ is the total energy for the configuration $\sigma$,
$$H(\sigma) = \sum_A J_A \sigma(A).$$
For the infinite system the above expression is divergent and one way to characterize the Gibbs measure $\mu$ is the following equation:
$$\frac{d\mu(\sigma_j)}{d\mu(\sigma)} = \exp\Bigl( 2 \sum_{A : j \in A} J_A \sigma(A) \Bigr), \tag{4.13}$$
where $\sigma_j$ is the configuration obtained by the spin flip at site $j$ and the left-hand side $\frac{d\mu(\sigma_j)}{d\mu(\sigma)}$ is the Radon-Nikodym derivative. In [23] we considered the pure state $\varphi_\mu$ determined by
$$\varphi_\mu(\sigma_z(A) \sigma_x(B)) = \int d\mu(\sigma)\, \sigma(A) \exp\Bigl( \sum_{C : |B \cap C| \ \mathrm{odd}} J_C \sigma(C) \Bigr). \tag{4.14}$$
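The spin flip relation (4.13) is easy to verify in finite volume, where the formal weight $e^{-H(\sigma)}$ of (4.12) is well defined. The Python sketch below does this by brute force for a hypothetical interaction on a four-site open chain; the coupling values, the finite volume (which is not translationally invariant) and the function names are assumptions made only for illustration.

```python
import math
from itertools import product

def energy(sigma, J):
    """H(sigma) = sum_A J_A sigma(A), with sigma(A) the product of the spins in A."""
    return sum(J_A * math.prod(sigma[k] for k in A) for A, J_A in J.items())

def flip(sigma, j):
    """Configuration sigma_j obtained from sigma by flipping the spin at site j."""
    return tuple(-s if k == j else s for k, s in enumerate(sigma))

def flip_ratio(sigma, j, J):
    """Left-hand side of (4.13) for the finite-volume weight exp(-H)."""
    return math.exp(-energy(flip(sigma, j), J)) / math.exp(-energy(sigma, J))

def flip_formula(sigma, j, J):
    """Right-hand side of (4.13): exp(2 * sum over A containing j of J_A sigma(A))."""
    return math.exp(2 * sum(J_A * math.prod(sigma[k] for k in A)
                            for A, J_A in J.items() if j in A))

# Hypothetical couplings, keyed by the finite subset A (as a tuple of sites)
J = {(0, 1): 0.7, (1, 2): 0.7, (2, 3): 0.7, (0, 2): -0.2, (1,): 0.3}

ok = all(math.isclose(flip_ratio(s, j, J), flip_formula(s, j, J), rel_tol=1e-9)
         for s in product((1, -1), repeat=4) for j in range(4))
print(ok)
```

The check works because flipping the spin at $j$ changes the sign of $\sigma(A)$ exactly for the sets $A$ containing $j$, so the normalization constant $Z$ cancels in the ratio.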
Theorem 4.8. Let $J_A$ be a translationally invariant interaction satisfying (4.11) and $\mu$ be the (unique) Gibbs measure associated with $J_A$. Equation (4.14) gives rise to a translationally invariant pure state $\varphi_\mu$. The state $\varphi_\mu$ is split.

Sketch of Proof. Assumption (4.11) is the bounded surface energy condition, which leads to uniqueness of the Gibbs measure. When the Gibbs measure is unique, the state $\varphi_\mu$ is pure. (See [23].) The split property of $\varphi_\mu$ follows from that of the measure $\mu$. In fact set
$$V = \exp\bigl(h_{(-\infty,-1],[0,\infty)}\bigr), \qquad h_{(-\infty,-1],[0,\infty)} = \sum_{A \cap (-\infty,-1] \neq \emptyset,\ A \cap [0,\infty) \neq \emptyset} J_A \sigma(A).$$
There exist states $\psi_L$ of $A_L$ and $\psi_R$ of $A_R$ such that
$$\frac{1}{\varphi_\mu(V)}\, \varphi_\mu\bigl(V^{1/2} Q V^{1/2}\bigr) = \psi_L \otimes \psi_R(Q) \tag{4.15}$$
for any $Q$ in $A$. This is more or less obvious from the formal expression (4.12) of the Gibbs measure $\mu$.
Remark 4.9. (i) We have shown in [23] that the pure split state in Theorem 4.8 is a ground state of a suitable (possibly infinite range) Hamiltonian. For example, consider the XYZ model with Ising-like anisotropy,
$$H_{XYZ} = -\sum_j \bigl\{ \sigma_x^{(j)} \sigma_x^{(j+1)} + \delta_1 \sigma_y^{(j)} \sigma_y^{(j+1)} + \delta_2 \sigma_z^{(j)} \sigma_z^{(j+1)} \bigr\}$$
with small $|\delta_1|$ and $|\delta_2|$. There exist two pure translationally invariant ground states for $H_{XYZ}$; one of them is realized as $\varphi_\mu$ for a suitable interaction and the other is $\varphi_\mu \circ \Theta$, where $\Theta$ is the automorphism of $A$ determined by (4.6). The fact that an infinite range classical interaction enters in the ground state of a quantum finite range Hamiltonian is not a surprise. In general one-dimensional quantum models are equivalent to two-dimensional classical spin models; however, at the level of correlation functions, they are obtained via projection of two-dimensional Gibbs measures (of classical finite range interactions) to a one-dimensional line. The situation of Theorem 4.8 is that the projected measure is still a Gibbs measure of another infinite range classical interaction on the one-dimensional lattice.
(ii) A folklore is that two point correlation functions of the Gibbs measure for some long range Ising models decay as a power law, while the state of Theorem 4.6 exhibits exponential decay of correlations, so we obtain an example of a split pure state with an infinite dimensional auxiliary space $K$.

References
1. Accardi, L.: A non-commutative Markov property. (in Russian), Funkcional. Anal. i Prilozen. 9, 1–8 (1975)
2. Affleck, I., Lieb, E.H.: A Proof of Part of Haldane's Conjecture on Spin Chains. Lett. Math. Phys. 12, 57–69 (1986)
3. Affleck, I., Kennedy, T., Lieb, E.H., Tasaki, H.: Valence Bond Ground States in Isotropic Quantum Antiferromagnets. Commun. Math. Phys. 115, 477–528 (1988)
4. Aizenman, M., Nachtergaele, B.: Geometric aspects of quantum spin states. Commun. Math. Phys. 164, 17–63 (1994)
5. Aizenman, M., Goldstein, S., Lebowitz, J.: Bounded Fluctuations and Translation Symmetry Breaking in One-Dimensional Particle Systems. Preprint archived as math-ph/0007022
6. Araki, H., Matsui, T.: Ground States of the XY model. Commun. Math. Phys. 101, 213–245 (1985)
7. Bratteli, O., Jorgensen, P., Price, J.: Endomorphisms of B(H), Quantization, nonlinear partial differential equations and operator algebras. In: Proc. Sympos. Pure. Math., ed. by W. Arveson, T. Branson and I. Segal, 59, AMS, 93–138 (1996)
8. Bratteli, O., Jorgensen, P.: Endomorphisms of B(H), II: Finitely Correlated States on O_N. J. Funct. Anal. 145, 323–373 (1997)
9. Bratteli, O., Jorgensen, P., Kishimoto, A., Werner, R.: Pure states on O_d. J. Operator Theory 43, no. 1, 97–143 (2000)
10. Bratteli, O., Robinson, D.: Operator algebras and quantum statistical mechanics I. 2nd edition, Berlin–Heidelberg–New York: Springer, 1987
11. Bratteli, O., Robinson, D.: Operator algebras and quantum statistical mechanics II. 2nd edition, Berlin–Heidelberg–New York: Springer, 1997
12. Buchholz, D.: Product states for local algebras. Commun. Math. Phys. 36, 287–304 (1974)
13. Dagotto, E., Rice, T.M.: Surprise on the Way from One- to Two-Dimensional Quantum Magnets: The Ladder Materials. Science 271, 618–623 (1996)
14. Doplicher, S., Longo, R.: Standard and split inclusions of von Neumann algebras. Invent. Math. 75, 493–536 (1984)
15. Fannes, M., Nachtergaele, B., Werner, R.: Finitely Correlated States on Quantum Spin Chains. Commun. Math. Phys. 144, 443–490 (1992)
16. Fannes, M., Nachtergaele, B., Werner, R.: Finitely correlated pure states. J. Funct. Anal. 120, 511–534 (1994)
17. Gottstein, C. T., Werner, R. : Ground states of the infinite q-deformed Heisenberg ferromagnet. Preprint, Osnabrück, 1995 (Archived in cond-mat/9501123) 18. Haag, R.: Local Quantum Physics. 2nd edition, Berlin–Heidelberg–New York: Springer-Verlag, 1996 19. Kennedy, T., Tasaki, H.: Hidden symmetry breaking and the Haldane phase in S=1 quantum spin chains. Commun. Math Phys. 147, 431–484 (1992) 20. Kolezhuk, A. K., Mikeska, H. J.; Finitely Correlated Generalized Spin Ladders. Preprint (Archived in cond-mat/9803176), 1998 21. Kirillov, A.: Elements of Representation Theory. Berlin–Heidelberg–New York: Springer-Verlag, 1976 22. Longo, R.: Solution to the factorial Stone–Weierstrass conjecture. An application of standard split W ∗ inclusion. Invent. Math. 76, 145–155 (1984) 23. Matsui, T.: Gibbs measure as quantum ground states. Commun. Math. Phys. 135, 79–89 (1990) 24. Matsui, T.: Markov Semigroups which Describe the Time Evolution of Some Higher Spin Quantum Models. J. Funct. Anal. 116, 179–198 (1993) 25. Matsui, T.: A characterization of finitely correlated pure states Infinite dimensional analysis and quantum probability Vol 1, 1998, pp. 647–661 26. Nachtergaele, B.: Quasi-state decompositions for quantum spin systems. In: Grigelionis et al. (eds.), Probability Theory and Mathematical Statistics, VSP/REV, 1994, pp. 565–590, archived as cond-mat/9312012 27. Popescu, Gelu: Isometric dilations for infinite sequences of noncommuting operators. Trans. Am. Math. Soc. 316 no. 2, 523–536 (1989) 28. Powers, R. T.: An index theory for semigroups of *-endomorphisms of B(H) and type I I1 factors. Canad. J. Math. 40, 86–114 (1988) 29. Schulz. H. J.: Phase diagrams and correlation exponents for quantum spin chains of arbitrary spin quantum number. Phys. Rev. B 34, 6372–6385 (1986) 30. Wassermann, A.: Operator algebras and conformal field theory III: Fusion of positive energy representations of SU (N ) using bounded operators. Invent. Math. 133,467–538 (1998) 31. Werner, R. 
F.: Finitely Correlated Pure States. In: On Three Levels, NATO ASI Series B Vol. 324, M. Fannes, Ch. Maes, and A. Verbeure (eds.), New York: Plenum Press, 1994, pp. 193–202 Communicated by H. Araki
Commun. Math. Phys. 218, 417 – 436 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Rationality of Conformally Invariant Local Correlation Functions on Compactified Minkowski Space Nikolay M. Nikolov, Ivan T. Todorov Institute for Nuclear Research and Nuclear Energy, Theoretical Physics Division, 72 Tsarigradsko Chaussee, 1784 Sofia, Bulgaria. E-mail:
[email protected];
[email protected] Received: 20 October 2000 / Accepted: 5 December 2000
Abstract: Rationality of the Wightman functions is proven to follow from energy positivity, locality and a natural condition of global conformal invariance (GCI) in any number D of space-time dimensions. The GCI condition allows one to treat correlation functions as generalized sections of a vector bundle over the compactification $\overline{M}$ of Minkowski space $M$ and yields a strong form of locality valid for all non-isotropic intervals if assumed true for space-like separations.
Introduction The study of conformal (quantum) field theory (CFT) in four (or, in fact, any number of) space time dimensions (see, e.g., [3, 23, 11, 9, 10, 12, 6–8, 13, 15] as well as the reviews [20, 17, 18] and references therein) preceded the continuing excitement with 2dimensional (2D) CFT (for a modern textbook and references to original work – see [2]). Interest in higher dimensional CFT was revived (starting in late 1997) by the discovery of the AdS-CFT correspondence in the context of string theory and supergravity (recent advances in this crowded field can be traced back from [1]). The present paper is more conservative in scope: we try to revitalize the old program of combining conformal invariance and operator product expansion with the general principles of quantum field theory using some insight gained in the study of 2D CFT models. The main result of the paper (Theorems 3.1 and 4.1) can be formulated (omitting technicalities) as follows. We say that a QFT (obeying Wightman axioms [16, 5]) satisfies global conformal invariance (GCI) if for any conformal transformation g and for any set of points (x1 , . . . , xn ) in Minkowski space M such that their images (gx1 , . . . , gxn ) also lie in M the Wightman function W (x1 , . . . , xn ) stays invariant under g. We note that this requirement (stated more precisely in Sect. 2) is stronger than the one used (under the same name) in [6]. An important tool in using GCI is the fact that mutually non-isotropic
pairs of points form a single conformal orbit on compactified Minkowski space (Proposition 1.1). Together with local commutativity for space-like separations this implies the vanishing of the commutator $[\phi_1(x_1), \phi_2(x_2)]$ whenever the difference $x_{12} = x_1 - x_2$ is non-isotropic (it follows from Lemma 3.2). We deduce from this strong locality property combined with the energy positivity condition that the Wightman functions are rational in $x_{ij}$ (Theorem 3.1). Hilbert space positivity gives strong restrictions on the degrees of the poles of the resulting rational functions (Theorems 4.1, 4.2 and Proposition 4.3). This result severely limits the class of local QFT satisfying GCI and makes feasible the construction of general conformal invariant correlation functions in such theories. That is illustrated on a simple example in the concluding Sect. 5. The interesting problem of exploiting the presence of conserved currents and the stress energy tensor and classifying their correlation functions in a QFT satisfying GCI in 4 space-time dimensions is the subject of a separate study carried out currently in collaboration with Yassen Stanev.

1. Conformally Compactified Minkowski Space $\overline{M}$ and Its Complexification. The Orbit of Non-Isotropic Pairs of Points in $\overline{M}\times\overline{M}$

Conformally compactified Minkowski space $\overline{M}$ of dimension $D$ is a homogeneous space of the connected conformal group $C_0$ (for even $D$, $C_0$ is isomorphic to $SO_0(D,2)/\mathbb{Z}_2$; for odd $D$, $C_0 \cong SO_0(D,2)$). Unless otherwise stated we shall suppose that the dimension is an arbitrary integer $D \geq 1$. We shall use also in the exceptional cases $D = 1, 2$ the group $SO_0(D,2)$ of Möbius transformations (as the infinite dimensional conformal group present in those cases is not an invariance group of the vacuum state). The Minkowski space $M$ is embedded as a dense open subset in $\overline{M}$ in such a way that the isotropy relation in $M\times M$ extends to a conformally invariant "isotropy relation" in $\overline{M}\times\overline{M}$.
More generally, for every element $g \in SO_0(D,2)$ there exists a quadratic polynomial $\omega(x,g)$ in the coordinates $x$ in $M \cong \mathbb{R}^{D-1,1}$ such that the pseudo-euclidean interval transforms as:
\[ (gx - gy)^2 = \frac{(x-y)^2}{\omega(x,g)\,\omega(y,g)}, \tag{1.1} \]
where $x \mapsto gx$ is the nonlinear (coordinate) conformal action of $SO_0(D,2)$ on $M$ (with singularities, see Appendix A). The complement $K_\infty := \overline{M}\setminus M$, the set of points at infinity, is an isotropic $(D-1)$-cone: there is a unique point $p_\infty \in K_\infty$, the tip of the cone, such that $K_\infty$ is the set of all points $p$ in $\overline{M}$ isotropic to $p_\infty$. Thus the stabilizer $C_\infty$ of $p_\infty$ in $C_0$ leaves $M$ (and $K_\infty$) invariant; it is the Poincaré group with dilations of $M$ (also called the Weyl group).

Proposition 1.1. Any pair $(p_0, p_1)$ of mutually non-isotropic points of $\overline{M}$ can be mapped into any other such pair $(p_0', p_1')$ by a conformal transformation.

Proof. Due to the transitivity of the action of $C_0$ there are elements $g_0$ and $g_0'$ of $C_0$ which carry $p_0$ and $p_0'$ into the point $p_\infty$: $g_0 p_0 = g_0' p_0' = p_\infty$. Then the images $g_0 p_1$ and $g_0' p_1'$ of the two other points will both belong to $M$ (because the original pairs are mutually non-isotropic) and can hence be moved to one another by a translation $t$ (in $C_\infty$) which leaves $p_\infty$ invariant: $g_0' p_1' = t\, g_0 p_1$. So the element $g \in C_0$ which transforms the pair $(p_0, p_1)$ into $(p_0', p_1')$ is given by $g = g_0'^{-1}\, t\, g_0$.
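The transformation law (1.1) is easy to test numerically for one concrete group element. The sketch below takes the reflection $w x = (x^0, -\mathbf{x})/x^2$ and assumes $\omega(x, w) = x^2$ (an assumption made here for illustration; an overall sign of $\omega$ would cancel in the product), with the convention $x^2 = \mathbf{x}^2 - (x^0)^2$ so that space-like vectors have $x^2 > 0$. The function names and sample points are hypothetical.

```python
def interval(x, y=None):
    """Minkowski square of x (or of x - y): |space part|^2 - (time part)^2."""
    d = x if y is None else [a - b for a, b in zip(x, y)]
    return sum(c * c for c in d[1:]) - d[0] * d[0]

def weyl_reflection(x):
    """w x = (x^0, -x_space) / x^2, defined for x^2 != 0."""
    s = interval(x)
    return [x[0] / s] + [-c / s for c in x[1:]]

def interval_law(x, y):
    """Both sides of (1.1) for g = w, assuming omega(x, w) = x^2."""
    lhs = interval(weyl_reflection(x), weyl_reflection(y))
    rhs = interval(x, y) / (interval(x) * interval(y))
    return lhs, rhs

# Sample points in D = 4, component order (x^0, x^1, x^2, x^3).
X = [0.3, 1.0, -0.5, 2.0]    # space-like: x^2 > 0
Y = [1.5, 0.2, 0.4, -0.7]    # time-like:  y^2 < 0
lhs, rhs = interval_law(X, Y)
print(lhs, rhs)
```

For these sample points the separation $X - Y$ is space-like while the images are time-like separated, since $X^2\, Y^2 < 0$ flips the sign of the interval.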
Remark 1.1. It is important to note that we are just dealing in Proposition 1.1 with pairs of points: a continuous time-like world line cannot be mapped into a space-like one by a conformal transformation. In fact, the connected conformal group preserves causal ordering on such lines (see [4] and [14]).

We shall be primarily interested in this paper in the space-time bundle of (Fermi and) Bose fields of (half)integer dimension transforming under a representation of the finite covering $C = \mathrm{Spin}_0(D,2)$ of the conformal group $C_0$ of $\overline{M}$. (For even $D$, $C$ is a 4-fold covering of $C_0 = SO_0(D,2)/\mathbb{Z}_2$; for odd $D$ it is a 2-fold covering of $C_0 = SO_0(D,2)$.) The group $C$ acts (transitively) on $\overline{M}$ through its canonical projection on $C_0$ (its centre acting trivially).

We shall use the following atlas on $\overline{M}$. Let $x : M \cong \mathbb{R}^{D-1,1}$ be the standard Minkowski chart in $\overline{M}$; for every $g \in C$ we set $M_{(g)} := g^{-1}M$ and introduce the coordinatization map $x_{(g)} := x\circ g : M_{(g)} \cong \mathbb{R}^{D-1,1}$, thus obtaining an atlas $\{x_{(g)} ;\ g\in C\}$ over $\overline{M}$. The transformation from the coordinates $x_{(g)}$ to $x_{(g')}$ is $x_{(g')} = g' g^{-1} x_{(g)}$. (One can in fact show that $\overline{M}$ can be covered by just 3 charts of this type.) We need also an atlas on $\overline{M}^{\times n}$ in the study of $n$-point correlation functions. To this end we note that such an atlas is given by the diagonal subsystem of the Cartesian power of the above atlas:
\[ \bigl\{ (x_{1(g)},\dots,x_{n(g)}) : M_{(g)}^{\times n} \cong \mathbb{R}^{Dn} ;\ g\in C \bigr\} \tag{1.2} \]
(to simplify notation we write here and in what follows $\mathbb{R}^{D}$ instead of $\mathbb{R}^{D-1,1}$). Indeed, $\bigcup_{g\in C} M_{(g)}^{\times n} = \overline{M}^{\times n}$, which is equivalent to the statement that for every set of points $p_1,\dots,p_n \in \overline{M}$ there exists $g\in C$ such that $gp_1,\dots,gp_n \in M$. The last statement can be proven by induction in $n$ (see also the argument in Appendix C and the proof of Lemma 3.2).

Finally, let us introduce the complexification $\overline{M}_{\mathbb{C}}$ of $\overline{M}$. It is needed because of the condition of energy positivity in QFT, which implies that the vector valued distribution $F(x) = \phi(x)|0\rangle$, where $\phi(x)$ is an arbitrary (local) Wightman field [16, 5], can be viewed as the boundary value of an analytic function $F(x+iy)$ holomorphic in the (forward) tube domain $T^+$, where
\[ T^{\pm} = \{ x \pm iy ;\ x\in M,\ y\in V^{+} \}, \qquad V^{\pm} = \{ y\in M ;\ \pm y^0 > |\mathbf{y}| \}. \tag{1.3} \]
Clearly, $T^{\pm}\subset \overline{M}_{\mathbb{C}}$ and each of them is a homogeneous space of the (real) conformal group $C$ [22], the stabilizer of a point being conjugate to the maximal compact subgroup $\mathrm{Spin}(D)\times\mathrm{Spin}(2)$ of $C$.

2. Global Conformal Invariance of Wightman Functions

The Wightman functions, the vacuum expectation values of products $\phi_1(x_1)\dots\phi_n(x_n)$ of a multicomponent field $\phi(x)$, are defined as tensor valued tempered distributions on $M^{\times n}\cong\mathbb{R}^{Dn}$:
\[ W_{1\dots n}(x_1,\dots,x_n) = \langle 0|\,\phi_1(x_1)\,\phi_2(x_2)\dots\phi_n(x_n)\,|0\rangle \in \mathcal{S}'\bigl(M^{\times n}, F^{\otimes n}\bigr), \tag{2.1} \]
where $F$ is a finite dimensional (complex) vector space. In Eq. (2.1) and further we are using Faddeev's shorthand for tensor products ($\phi_1\dots\phi_n = \phi(x_1)\otimes\cdots\otimes\phi(x_n)$).
We shall assume that $C$ acts (locally) on $M\times F$:
\[ M\times F \ni (x,v) \xrightarrow{\;g\;} (gx,\ \pi_x(g)\,v) \in M\times F \quad\text{iff}\quad gx\in M, \]
where
\[ \pi_x(g)\in\mathrm{Aut}(F), \qquad \pi_x(g_1g_2) = \pi_{g_2x}(g_1)\,\pi_x(g_2), \tag{2.2} \]
πx (g) being a single-valued real analytic function defined for all pairs x ∈ M, g ∈ C for which gx ∈ M (thus the multiplicativity of πx in (2.2) holds under the provision that both g2 x and g1 g2 x belong to M). Moreover, we will assume that if g is a translation, ta x = x + a, then πx acts trivially on F , i.e., πx (ta ) = id. Note that these conditions are satisfied by the usually considered local induced representations of C (see [8] or Chapter 2 of [17]). Example 2.1. Let φ be a vector current j µ (x); then dim F = D. For x 2 = 0 the action of the Weyl reflection w (∈ C0 ) is given by: µ µ µ x 0 −x rν (x) ν x µ xν µ µ , = δ w : x , j → , r ; − j − 2 , (2.3) (x) ν ν d x2 x2 x2 x2 where d is the conformal dimension of the current j . The by a space
result differs µ (x) µ r reflection from the conformal inversion, Ir : (x µ , j µ ) → xx 2 , ν2 d j ν (which does (x ) not belong to C0 ). It is easy to verify that Ir , and hence w, leave invariant the current conservation law iff d = D − 1; indeed, this follows from the identity x x r µ ν (x) ν x 1 d +1−D ν ν ∂ j x j = + 2 . (2.4) ∂µ d j ν ν d+1 d+1 x2 x2 x2 x2 x2 x2 Note that the function πx (w) (as well as πx (Ir )) is well defined and single-valued (for all x such that x 2 = 0), iff d is an integer. Remark 2.1. The condition for a trivial action of the translations in M onto F – i.e. πx (ta ) = id – is not restrictive. If it is not satisfied we can pass to an equivalent action:
\[ \pi'_x(g) := \pi_{gx}\bigl(t_{gx}^{-1}\bigr)\,\pi_x(g)\,\pi_0(t_x) = \pi_0\bigl(t_{gx}^{-1}\, g\, t_x\bigr) \tag{2.5} \]
for which $\pi'_x(t_a) = \mathrm{id}$ does hold. It also follows from (2.5) that the growth of $\pi_x(g)$ for $x\to\infty$ is not faster than polynomial.

The fact that the map (2.2) is single valued outside its singularities allows one to treat $\pi_x(g)$ as a cocycle on a fibre bundle over $\overline{M}$ with a standard fibre $F$. More generally, for every $n = 1, 2, 3, \dots$ and the atlas (1.2) we have a cocycle:
\[ \bigl(x_{(g)}, W\bigr) \longmapsto \bigl(x_{(g')} = g'g^{-1}x_{(g)},\ \pi_{(n)}\bigl(x_{(g)}; g'g^{-1}\bigr)\,W\bigr) \in M^{\times n}\times F^{\otimes n}, \tag{2.6} \]
where $gx := (gx_1,\dots,gx_n)$; $\pi_{(n)}(x;g) := \pi_{x_1}(g)\otimes\cdots\otimes\pi_{x_n}(g)$, for $x = (x_1,\dots,x_n)$ and $g\in C$,
which gives a fibre bundle $E_n$ over $\overline{M}^{\times n}$ with a standard fibre $F^{\otimes n}$. It allows one to consider the Wightman functions $W_{1\dots n}$ as generalized sections $W$ of $E_n$ which admit for each chart $M_{(g)}^{\times n}$ of the atlas (1.2) a coordinate expression $W_{(g)}(x_1,\dots,x_n) \in \mathcal{D}'\bigl(\mathbb{R}^{Dn}, F^{\otimes n}\bigr)$ ($\mathcal{D}$ being the space of test functions of compact support and $\mathcal{D}'$ the space of distributions over it). These coordinate expressions should satisfy the consistency condition
\[ W_{(g')}\bigl(g'g^{-1}x_1,\dots,g'g^{-1}x_n\bigr) = \pi_{(n)}\bigl(x; g'g^{-1}\bigr)\,W_{(g)}(x_1,\dots,x_n) \tag{2.7} \]
(with the cocycle (2.6)). We note that Eq. (2.7) is understood locally around every $x\in\mathbb{R}^{Dn}$ such that $g'g^{-1}x\in\mathbb{R}^{Dn}$: then the transformation of the coordinates $x\mapsto g'g^{-1}x$ is a diffeomorphism and $\pi_{(n)}\bigl(x; g'g^{-1}\bigr)$ is a multiplicator. We can further define the space $\mathcal{D}_n$ of test sections for a generalized section $W$, which is actually the space of all smooth sections of the fibre bundle over $\overline{M}^{\times n}$ with a standard fibre $F^{*\otimes n}$ and the cocycle:
\[ \bigl(x_{(g)}, f\bigr) \longmapsto \bigl(x_{(g')} = g'g^{-1}x_{(g)},\ \pi^{(n)}\bigl(x_{(g)}; g'g^{-1}\bigr)\,f\bigr) \in M^{\times n}\times F^{*\otimes n}, \]
where
\[ \pi^{(n)}(x;g) := \pi_{(n)}(x;g)^{-1\,*}\; J(x;g)^{-1} \tag{2.8} \]
and $J$ is the Jacobian of the transformation $x\mapsto gx$. (For the concepts of test functions and distributions on a manifold, see also [21].)

Proposition 2.1. (a) Let the distributions $W_{(g)}$ be defined for any $g\in C$ and satisfy the consistency condition (2.7). Then there exists a unique linear functional $W$ on $\mathcal{D}_n$ that is a generalized section of the vector bundle $E_n$ with coordinate expressions $W_{(g)}$. (b) Each $W_{(g)}$ actually belongs to the subspace of tempered distributions $\mathcal{S}'\bigl(\mathbb{R}^{Dn}, F^{\otimes n}\bigr)$ of $\mathcal{D}'\bigl(\mathbb{R}^{Dn}, F^{\otimes n}\bigr)$.
Sketch of proof. (a) Since $\overline{M}^{\times n}$ is compact there exists a finite partition of unity $1 = \sum_{i=1}^{k} a_i$, where $a_i$ is a smooth function of compact support in some chart $M_{(g_i)}^{\times n}$ ($i = 1,\dots,k$). The linear functional $W$ is then uniquely defined by
\[ \langle W, f\rangle := \sum_{i=1}^{k} \bigl\langle W_{(g_i)},\ (a_i f)_{(g_i)}(x) \bigr\rangle, \tag{2.9} \]
where $(a_i f)_{(g_i)}(x)$ is the coordinate expression of the test section $a_i f$. (b) Taking a test section $f$ with a coordinate expression $f_{(g)}(x) \in \mathcal{S}\bigl(\mathbb{R}^{Dn}, F^{\otimes n}\bigr)$, we obtain this statement by Eq. (2.9) because of the continuity of the linear maps:
\[ \mathcal{S}\bigl(\mathbb{R}^{Dn}, F^{\otimes n}\bigr) \ni f_{(g)}(x) \longmapsto (a_i f)_{(g_i)}(x) \in \mathcal{D}\bigl(\mathbb{R}^{Dn}, F^{\otimes n}\bigr). \tag{2.10} \]
The action (2.2) not only defines a cocycle on En ; it also gives rise to a natural action of the conformal group C on En that is linear on the fibres. A generalized section W of
En is called conformally invariant if it does not change under the action of C. This is equivalent to requiring that W has the same coordinate expression in every chart: W(g) (x1 , . . . , xn ) = W (x1 , . . . , xn ) .
(2.11)
Because of the consistency condition (2.7) this is equivalent to the following requirement. (GCI) Global conformal invariance. πx1 (g)−1 ⊗ · · · ⊗ πxn (g)−1 W (gx1 , . . . , gxn ) = W (x1 , . . . , xn )
(2.12)
for $gx_1,\dots,gx_n\in M$. This implies, in particular, (unrestricted) Poincaré and dilation invariance. We recall that Eq. (2.12) should be interpreted locally: for a fixed $g\in C$, $\pi_{x_1}(g)^{-1}\otimes\cdots\otimes\pi_{x_n}(g)^{-1}$ should be viewed as a multiplicator (regular in the neighbourhood of $(x_1,\dots,x_n)\in M^{\times n}$) while $g : (x_1,\dots,x_n)\mapsto(gx_1,\dots,gx_n)$ is a local diffeomorphism. (That is, both sides of (2.12) are to be smeared with test functions with support in the above neighbourhood.) Thus we have a one-to-one correspondence between the tempered distributions $W(x_1,\dots,x_n)\in\mathcal{S}'\bigl(\mathbb{R}^{Dn}, F^{\otimes n}\bigr)$ satisfying the condition (GCI) and the conformal invariant generalized sections of the bundle $E_n$.

Remark 2.2. Local conformal invariance requires the existence of a continuous curve $g(\tau)\in C$, such that $g(0) = 1$, $g(1) = g$ and $(g(\tau)x_1,\dots,g(\tau)x_n)\in M^{\times n}$ for all $\tau\in[0,1]$; it is equivalent to invariance under infinitesimal conformal transformations. The (GCI) condition (2.12) does not demand the existence of such a curve; moreover, if several curves of this type exist it says that $\pi_x(g)$ is independent of the path $g(\tau)$ connecting $g$ with the group unit. The 2-point function of a hermitian scalar field of dimension $d$,
\[ W_d(x_{12}) = \frac{4^d\,\Gamma(d)}{(4\pi)^{D/2}\,\bigl(x_{12}^2 + i0\,x_{12}^0\bigr)^{d}} = \int \frac{2\pi\,\theta(p^0)\,\theta(-p^2)}{\Gamma\bigl(d+1-\frac{D}{2}\bigr)\,(-p^2)^{\frac{D}{2}-d}}\; e^{ipx}\,\frac{d^Dp}{(2\pi)^D} \tag{2.13} \]
($x_{12} = x_1 - x_2$), satisfies all Wightman axioms (including positivity for any $d \geq \frac{D}{2}-1$, $D > 2$) as well as local conformal invariance; however, it only obeys (GCI) for positive integer $d$, because of the requirement of singlevaluedness of $\pi_x(g)$ inherent in (GCI). Thus (GCI) is indeed stronger than local conformal invariance. The invariance of the $i0\,x_{12}^0$ prescription in (2.13) is implied by the following more general statement.
Proposition 2.2. Any product of 2-point like functions of type (2.13) satisfies the invariance condition (GCI):
\[ [\omega(x_1,g)]^{-\mu_1}\cdots[\omega(x_n,g)]^{-\mu_n} \prod_{1\leq j<k\leq n} \Bigl( (gx_j - gx_k)^2 + i0\,\bigl(gx_j^0 - gx_k^0\bigr) \Bigr)^{-\mu_{jk}} = \prod_{1\leq j<k\leq n} \bigl( x_{jk}^2 + i0\,x_{jk}^0 \bigr)^{-\mu_{jk}}, \tag{2.14} \]
where $\mu_{jk}\in\mathbb{Z}$, $\mu_k = \sum_{j=1}^{k-1}\mu_{jk} + \sum_{l=k+1}^{n}\mu_{kl}$, and the factors $\omega(x,g)$ are introduced in (1.1).
Away from the singularities, for $x_{ij}^2 \neq 0$, the statement follows from Eq. (1.1). Only establishing conformal invariance of the "$+i0\,x_{ij}^0$" prescription poses a problem. The somewhat technical proof of this statement is relegated to Appendix B. It is based on the fact that $\bigl(x_{jk}^2 + i0\,x_{jk}^0\bigr)^{-\mu}$ is a limit of a holomorphic function of $x_j - \zeta_k$ for $T^+ \ni \zeta_k \to x_k$ and on the conformal invariance of the forward tube (in the transformation of $\zeta_k$ in $x_j - \zeta_k \to gx_j - g\zeta_k$).
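Away from the singularities, the bookkeeping of weights in (2.14) can be checked numerically: each factor $(gx_j - gx_k)^2$ contributes $\omega(x_j,g)\,\omega(x_k,g)$ via (1.1), and $\mu_k$ collects the powers attached to $x_k$. The sketch below does this for three points and the reflection $w x = (x^0, -\mathbf{x})/x^2$, assuming $\omega(x, w) = x^2$; this choice, the sample points and the function names are illustrative assumptions, and the $i0$ prescription is not modelled.

```python
def interval(x, y=None):
    """Minkowski square of x (or of x - y): |space part|^2 - (time part)^2."""
    d = x if y is None else [a - b for a, b in zip(x, y)]
    return sum(c * c for c in d[1:]) - d[0] * d[0]

def weyl_reflection(x):
    """w x = (x^0, -x_space) / x^2, defined for x^2 != 0."""
    s = interval(x)
    return [x[0] / s] + [-c / s for c in x[1:]]

def weight_identity(points, mu):
    """Both sides of (2.14) for g = w at non-singular points.

    mu maps index pairs (j, k) with j < k to the integers mu_jk.
    """
    images = [weyl_reflection(x) for x in points]
    lhs = rhs = 1.0
    for (j, k), m in mu.items():
        lhs *= interval(images[j], images[k]) ** (-m)
        rhs *= interval(points[j], points[k]) ** (-m)
    for k in range(len(points)):
        # mu_k: sum of all mu_jk in which the index k takes part.
        mu_k = sum(m for pair, m in mu.items() if k in pair)
        lhs *= interval(points[k]) ** (-mu_k)  # assumed omega(x_k, w) = x_k^2
    return lhs, rhs

POINTS = [[0.3, 1.0, -0.5, 2.0], [1.5, 0.2, 0.4, -0.7], [-0.4, 0.6, 1.1, 0.9]]
MU = {(0, 1): 1, (0, 2): 2, (1, 2): 1}
print(weight_identity(POINTS, MU))
```

The two returned numbers agree up to rounding, which is the statement of Proposition 2.2 restricted to mutually non-isotropic real points.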
Example 2.2. Let the points $x_1, x_2$ of $M$ satisfy $x_{12}^2 > 0$ and $x_1^2\, x_2^2 < 0$. Then $wx_1, wx_2 \in M$ (where $wx$ is defined in Example 2.1) but there is no continuous curve $g(\tau)\in C$ that relates $(x_1,x_2)$ with $(wx_1,wx_2)$ in $M\times M$. Indeed,
\[ (wx_1 - wx_2)^2 = \frac{x_{12}^2}{x_1^2\, x_2^2} < 0; \]
if there were $x_i(\tau) = g(\tau)x_i\in M$, $i = 1,2$, which connect continuously $x_i$ with $wx_i$, then there would have been a point $0<\tau_0<1$ such that $(x_1(\tau_0) - x_2(\tau_0))^2 = 0$, which is impossible (cf. Remark 1.1).

3. Wightman Axioms and Global Conformal Invariance Imply Rationality

We will recall first some general properties of Wightman functions (2.1) (see for more detail [16] or [5]). Thus $W$ is a tempered distribution with values in the $n$-fold tensor power $F^{\otimes n}$ of a finite dimensional (complex) vector space $F$. For complex fields the vector $\phi\in F$ is assumed to contain along with each component $\phi_a$ also its conjugate $\phi_{\bar a} = \phi_a^{*}$. The splitting of the fields into bosonic and fermionic ones amounts to defining a $*$-invariant $\mathbb{Z}_2$-grading $F = F_0\oplus F_1$. Then the automorphisms $\pi_x(g)$ of $F$ in the conformal action (2.2) will preserve the conjugation and the $\mathbb{Z}_2$-grading of $F$. The implications of translation invariance and energy positivity and of locality can be summed up in the following conditions for $W(x_1,\dots,x_n)$.

(TS) Translation invariance and spectral condition. The Fourier transform of
\[ W(x_1,\dots,x_n) = W(x_1 - x_2,\dots,x_{n-1} - x_n) \tag{3.1} \]
(omitting the indices $1\dots n$ in both sides),
\[ \tilde W(q_1,\dots,q_{n-1}) = \int\!\cdots\!\int_{M^{\times(n-1)}} W(y_1,\dots,y_{n-1})\; e^{-i(q_1y_1+\cdots+q_{n-1}y_{n-1})}\; d^Dy_1\dots d^Dy_{n-1}, \tag{3.2} \]
has support in the product of (closed) future light cones in $M$:
\[ \mathrm{supp}\,\tilde W \subseteq \bigl(\overline{V}^{+}\bigr)^{\times(n-1)}, \qquad \overline{V}^{\pm} = \{ q\in M ;\ \pm q^0 \geq |\mathbf{q}| \}. \tag{3.3} \]
This is the relativistic (Lorentz invariant) form of energy positivity.

(L) Locality.
\[ W_{1\dots i\,i+1\dots n}(x_1,\dots,x_i,x_{i+1},\dots,x_n) = \varepsilon_{i\,i+1}\, W_{1\dots i+1\,i\dots n}(x_1,\dots,x_{i+1},x_i,\dots,x_n) \tag{3.4} \]
for $x_{i\,i+1}^2 > 0$ ($x_{i\,i+1} = x_i - x_{i+1}$), where $\varepsilon_{ij}$ ($i\neq j$) is a sign factor: $\varepsilon_{ij} = -1$ if both $i$ and $j$ refer to Fermi fields (elements of $F_1$) and $\varepsilon_{ij} = 1$ otherwise. In other words, Fermi fields anticommute among themselves while Bose fields commute with both Bose and Fermi fields for space-like separations. Permutations of field indices accompanied by the sign factors $\varepsilon_{ij}$ give rise to an action of the symmetric group $S_n$ on $F^{\otimes n}$. An $n$-point function $F : M^{\times n}\to F^{\otimes n}$ is said to be $\mathbb{Z}_2$-symmetric if it is invariant under this action combined with the corresponding permutations of the coordinates.

Remark 3.1. The restriction (2.1) on the class of distributions is not essential for theories satisfying (GCI). Indeed, if we assume $W\in\mathcal{D}'$ (or $\tilde W\in\mathcal{D}'$) then dilation invariance (which has a similar form in coordinate and in momentum space) implies $W\in\mathcal{S}'$ (2.1).

Theorem 3.1. The tempered distribution $W_{1\dots n}(x_1,\dots,x_n)$ satisfies conditions (TS), (L) and (GCI) iff it can be expressed in terms of a rational function of the type
\[ W_{1\dots n}(x_1,\dots,x_n) = P_{1\dots n}(x_1,\dots,x_n) \prod_{1\leq j<k\leq n} \bigl( x_{jk}^2 + i0\,x_{jk}^0 \bigr)^{-\mu^n_{jk}}, \qquad x_{jk} = x_j - x_k. \tag{3.5} \]
Here $\mu^n_{jk}\geq 0$ are integers, $P_{1\dots n}(x_1,\dots,x_n)$ is a polynomial with values in $F^{\otimes n}$ and the associated rational function
\[ R_{1\dots n}(x_1,\dots,x_n) = P_{1\dots n}(x_1,\dots,x_n) \prod_{1\leq j<k\leq n} \bigl( x_{jk}^2 \bigr)^{-\mu^n_{jk}} \]
is fully $\mathbb{Z}_2$-symmetric and satisfies the conformal invariance condition (2.12) as a rational function.

The proof of this theorem is based on the following

Lemma 3.2. For each set of points $(x_1,\dots,x_m,y_1,y_2)$ in $M$ such that $y_{12}^2\neq 0$ and a pair of mutually non-isotropic $y_1', y_2'$ there exists a $g\in C$ such that $gx_i\in M$ for $1\leq i\leq m$ and $y_1' = gy_1$, $y_2' = gy_2$.

We shall prove Lemma 3.2 by induction in $m$. For $m = 0$ it reduces to Proposition 1.1. Assume that it is established for some $m\geq 0$. We shall prove that it is then also valid for arbitrary $m+1$ points $x_1,\dots,x_{m+1}$ and mutually non-isotropic pairs $(y_1,y_2)$, $(y_1',y_2')$ in $M$. According to the assumption there exists a $g'\in C$ such that $g'x_i\in M$ for $1\leq i\leq m$ and $y_1' = g'y_1$, $y_2' = g'y_2$. If $g'x_{m+1}\in M$ we are done. If $p := g'x_{m+1}\in K_\infty$ then there exists an element $h$, arbitrarily close to the group unit in the stabilizer $C_{y_1',y_2'}$ ($\subset C$) of the pair $y_1', y_2'$, such that $hp\notin K_\infty$. We prove this statement in Appendix C. To complete the proof of Lemma 3.2 it remains to choose $h$ so that $hg'x_i\in M$ for $i = 1,\dots,m$. This is possible since $M$ is an open set in $\overline{M}$ and $C$ acts continuously on $\overline{M}$. Hence, $g = hg'$ satisfies the conclusion of Lemma 3.2.

We continue with the proof of Theorem 3.1. Assume first that $W$ satisfies (TS), (L) and (GCI). Lemma 3.2 and (GCI) imply the locality property (L) whenever $x_{i\,i+1}^2\neq 0$. Then $W$ will be $\mathbb{Z}_2$-symmetric in the domain $U\subset M^{\times n}$ of all $(x_1,\dots,x_n)$ which are
mutually non-isotropic. Since $W_{1\dots n}$ is a (tempered) distribution its singularities have a finite order. Therefore, there are integers $\mu^n_{ij}$ such that
\[ P_{1\dots n}(x_1,\dots,x_n) = \Bigl( \prod_{1\leq i<j\leq n} \bigl(x_{ij}^2\bigr)^{\mu^n_{ij}} \Bigr)\, W_{1\dots n}(x_1,\dots,x_n) \tag{3.6} \]
is a translation invariant distribution that is $\mathbb{Z}_2$-symmetric in the entire Cartesian product space $M^{\times n}$. The Fourier transform of $P_{1\dots n}(x_{12},\dots,x_{n-1\,n}) = P_{1\dots n}(x_1,\dots,x_n)$,
\[ \tilde P_{1\dots n}(q_1,\dots,q_{n-1}) = \int\!\cdots\!\int_{M^{\times(n-1)}} P_{1\dots n}(y_1,\dots,y_{n-1})\; e^{-i(q_1y_1+\cdots+q_{n-1}y_{n-1})}\; d^Dy_1\dots d^Dy_{n-1}, \tag{3.7} \]
is obtained from $\tilde W_{1\dots n}(q_1,\dots,q_{n-1})$ by the action of a differential operator in $q_1,\dots,q_{n-1}$ with constant coefficients; hence $\mathrm{supp}\,\tilde P_{1\dots n}(q_1,\dots,q_{n-1}) \subseteq \mathrm{supp}\,\tilde W \subseteq \bigl(\overline{V}^{+}\bigr)^{\times(n-1)}$. On the other hand, the total $\mathbb{Z}_2$-symmetry of $P$ implies
\[ P_{1\dots n}(x_1,\dots,x_n) = P_{n\dots 1}(x_n,\dots,x_1) \ \Rightarrow\ P_{1\dots n}(y_1,\dots,y_{n-1}) = \varepsilon\, P_{n\dots 1}(-y_{n-1},\dots,-y_1), \tag{3.8} \]
where $\varepsilon = \prod_i \varepsilon_{i\,i+1}$. Hence $\tilde P_{1\dots n}(q_1,\dots,q_{n-1}) = \varepsilon\,\tilde P_{n\dots 1}(-q_{n-1},\dots,-q_1)$, implying $\mathrm{supp}\,\tilde P_{n\dots 1} \subseteq \bigl(\overline{V}^{-}\bigr)^{\times(n-1)}$. Since the order of field labels is arbitrary and $\overline{V}^{+}\cap\overline{V}^{-} = \{0\}$ we conclude that $P(y_1,\dots,y_{n-1})$ is a polynomial and the same is true for $P(x_1,\dots,x_n)$. If we combine this result with the fact that $W$ admits an analytic continuation $W(\zeta_1,\dots,\zeta_{n-1})$ in the backward tube $\bigl(T^{-}\bigr)^{\times(n-1)}$ as a consequence of energy positivity then we end up with Eq. (3.5). Actually, both sides of (3.5) are equal in the domain $U$ introduced above (in accord with (3.6)). On the other hand, they have analytic continuations in the tube domain, which thus must be equal, too. Clearly, the rational function $R_{1\dots n}(x_1,\dots,x_n)$ so obtained is fully $\mathbb{Z}_2$-symmetric and conformally invariant.

Conversely, if $W$ is given by (3.5) for a completely $\mathbb{Z}_2$-symmetric and conformally invariant $R_{1\dots n}(x_1,\dots,x_n)$, then conditions (TS) and (L) follow as $W$ is the boundary value of a symmetric analytic function in the tube domain. Condition (GCI) is a corollary of Proposition 2.2 since $P_{1\dots n}$ is a regular multiplicator.

Remark 3.2. By the proof of Theorem 3.1 it is clear that the (GCI) condition is just needed for extending the locality property (3.4) to all non-isotropic separations. The rationality of the Wightman functions (of the type (3.5) without conformal invariance) is then equivalent to this strong locality property and energy positivity (TS).

4. Constraints on Pole Degrees Coming from Wightman Positivity

Up to this point we did not use Hilbert space (Wightman) positivity. Taking it into account allows one to deduce that the orders of the poles in correlation functions are uniformly bounded with respect to the number ($n$) of points.
Theorem 4.1. Let $\phi(x)$ ($= \{\phi_a(x)\}$) be a (multicomponent) field satisfying the Wightman axioms as well as the condition (GCI) for Wightman functions. Then the orders of the poles of the rational functions $W(x_1, \dots, x_n)$ are uniformly bounded, i.e., the integers $\mu^n_{jk}$ in (3.5) can be chosen independent of $n$.

Proof. Consider the vector valued distributions
$$\Psi_{1\dots k}(x_1, \dots, x_k) = \phi_1(x_1) \cdots \phi_k(x_k)\,|0\rangle. \tag{4.1}$$
It defines a continuous (finite order) map of the test function space $\mathcal{S}(M^{\times k})$ into the Hilbert space $\mathcal{H}$ ([16, Sect. 3.3] or [5, Sect. 3.3B]). Moreover, $\Psi$ is the boundary value of an analytic vector valued function $\Psi_{1\dots k}(x_1 + iy_1, \dots, x_k + iy_k)$ holomorphic for $y_k \in V_+$, $y_{i\,i+1} \in V_-$, $i = 1, \dots, k-1$. For $k = 2$ it follows from Theorem 3.1 and from the Reeh–Schlieder theorem ([16, Theorem 4-2]) that $\Psi_{12}(x_1, x_2) = \varepsilon_{12}\,\Psi_{21}(x_2, x_1)$ for any pair of mutually non-isotropic $(x_1, x_2)$. Taking into account the finite order of the distribution $\Psi_{21}$ (in $\mathcal{S}'$) we deduce that there is a positive integer $\mu$ such that the distributions
$$f_{12}(x_1, x_2) = \left(x_{12}^2\right)^{\mu} \langle\Phi|\Psi_{12}(x_1, x_2)\rangle = f_{21}(x_2, x_1) \tag{4.2}$$
for all $\Phi \in \mathcal{H}$ (with $(x_1, x_2) \in M^{\times 2}$) are $\mathbb{Z}_2$-symmetric on $M \times M$. Choosing, in particular, $\langle\Phi| = \langle 0|\phi_1(\zeta_1) \cdots \phi_k(\zeta_k)$ with $\zeta_j = x_j + iy_j$, $1 \le j \le k$, in the above tube domain of analyticity and substituting $\Psi_{12}(x_1, x_2)$ by $\Psi_{k+1\,k+2}(x_{k+1}, x_{k+2})$ we deduce that the order of the pole of $W_{1\dots k+2}(x_1, \dots, x_{k+2})$ in $x_{k+1\,k+2}^2$ does not exceed $\mu$ and is hence independent of $k$. Locality then implies that this holds for any pair of arguments.

We now proceed to estimate (i.e., to give a realistic upper bound for) the value of $\mu$ in (4.2) in the important special case of 4-dimensional space-time, in which the (4-fold covering of the) conformal group $C$ coincides with a group of $4 \times 4$ pseudounitary matrices:
$$C = \mathrm{Spin}_0(4, 2) = SU(2, 2) = \left\{ u \in SL(4, \mathbb{C}),\ u\beta u^* = \beta \right\}, \tag{4.3}$$
where
$$\beta := \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$
We shall assume that our fields $\phi$ transform under elementary induced representations of $C$ (see [8] or Sect. 2A of [17]). This means, in particular, that $\pi_x(g)$ of Eq. (2.2) provides an $x$-independent finite dimensional irreducible representation (IR) of the quantum mechanical Lorentz group $SL(2, \mathbb{C})$ with dilations:
$$\mathbb{R}_+ \times SL(2, \mathbb{C}) = \left\{ \begin{pmatrix} A & 0 \\ 0 & A^{*-1} \end{pmatrix} \in SU(2, 2),\ \rho = \det A > 0 \right\}. \tag{4.4}$$
(For $\tilde{x} = x^0 \mathbf{1} + \mathbf{x} \cdot \boldsymbol{\sigma}$, where $\sigma_1, \sigma_2, \sigma_3$ are the Pauli matrices, we have $\rho\,\widetilde{\Lambda x} = A \tilde{x} A^*$, where $(\Lambda x)^\mu = \Lambda^\mu{}_\nu x^\nu$ is a proper Lorentz transformation.) It follows that the representation $\pi(g_\rho)$ of the dilation subgroup $g_\rho : x \to \rho x$ ($\rho > 0$) is scalar:
$$g_\rho :\ \phi(x) \to \pi_x(g_\rho)^{-1} \phi(\rho x) = \rho^{d} \phi(\rho x), \qquad d > 0. \tag{4.5}$$
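As a numerical illustration of (4.3) and (4.4), the following minimal sketch checks that a block-diagonal matrix built from $A$ and $A^{*-1}$ preserves the form $\beta$. The helper names (`matmul`, `dagger`, `embed_dilation_lorentz`, `is_close`) and the sample matrix are illustrative assumptions, not notation from the text.

```python
# Sketch: check u * beta * u^dagger = beta for u = diag(A, (A^dagger)^{-1}).
# All helper names here are hypothetical and chosen for illustration only.

BETA = [[0, 0, 1, 0],
        [0, 0, 0, 1],
        [1, 0, 0, 0],
        [0, 1, 0, 0]]

def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Hermitian conjugate (written * in the text)."""
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def inverse2(a):
    """Inverse of a 2x2 matrix with non-zero determinant."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def block_diag(a, b):
    """4x4 block-diagonal matrix diag(a, b) from two 2x2 blocks."""
    return [a[0] + [0, 0], a[1] + [0, 0], [0, 0] + b[0], [0, 0] + b[1]]

def embed_dilation_lorentz(a):
    """Element of the subgroup (4.4) built from a 2x2 matrix A."""
    a = [[complex(v) for v in row] for row in a]
    return block_diag(a, inverse2(dagger(a)))

def is_close(a, b, tol=1e-12):
    """Entrywise comparison of two matrices up to a tolerance."""
    return all(abs(x - y) < tol
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))

A = [[2, 1j], [0, 1]]          # det A = 2 > 0, so rho = 2
u = embed_dilation_lorentz(A)
preserved = is_close(matmul(matmul(u, BETA), dagger(u)), BETA)
```

The lower block has to be $A^{*-1}$ rather than $A$: with $u = \mathrm{diag}(A, B)$ the off-diagonal blocks of $u\beta u^*$ are $AB^*$ and $BA^*$, which equal the identity only for $B = A^{*-1}$.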
The exponent $d$ is called the (conformal) dimension of $\phi$. Labeling, as customary, the IR of $SL(2, \mathbb{C})$ by a pair of (non-negative) half integers $(j_1, j_2)$ ($2j_1$ and $2j_2$ giving the numbers of undotted and dotted indices of the spin-tensor $\phi$) one proves that for single-valued $\pi_x(g)$ the (positive) number $d + j_1 + j_2$ should be an integer. (In other words, $d$ should be an integer for Bose fields, for which $j_1 + j_2 \in \mathbb{Z}_+$, and half odd integer for Fermi fields.)

Let us note some implications for a Wightman QFT coming from the additional condition (GCI) (in 4 dimensions). First, the Wightman functions have an analytic continuation in the domain of all mutually non-isotropic points of $M_{\mathbb{C}}$. They are rational and conformally invariant, and in particular their Euclidean restrictions, the Euclidean Green functions, satisfy the condition of weak conformal invariance studied by Lüscher and Mack in [6]. Therefore ([6], Prop. 1) there exists a unitary representation, in the Hilbert space $\mathcal{H}$ of physical states, of the quantum-mechanical conformal group $\tilde{C}$, the universal covering of $C$. Thus (following [7]) $\mathcal{H}$ can be decomposed in a direct sum or integral of irreducible representation spaces of $\tilde{C}$. It was shown by Mack in [8] that all irreducible unitary representations of $\tilde{C}$ with positive energy are field representations acting on
$$\mathcal{H}_{(j_1, j_2; d)} = \mathrm{Span}\left\{ \phi^{(j_1, j_2; d)}(x)\,|0\rangle \right\}, \tag{4.6}$$
where $\phi^{(j_1, j_2; d)}(x)$ is a field which transforms under the elementary induced representation of $\tilde{C}$ of weight $(j_1, j_2; d)$. In our case, because of the rationality and the conformal invariance on $M$ of the Wightman functions, only those $\mathcal{H}_{(j_1, j_2; d)}$ for which $d + j_1 + j_2$ is an integer take part in the decomposition of $\mathcal{H}$. Thus the decomposition of $\mathcal{H}$ will be a direct sum:
$$\mathcal{H} = \mathbb{C}|0\rangle \oplus \bigoplus_{d + j_1 + j_2 \in \mathbb{N}} N_{(j_1, j_2; d)} \otimes \mathcal{H}_{(j_1, j_2; d)}, \tag{4.7}$$
where the Hilbert space $N_{(j_1, j_2; d)}$ gives the multiplicity of $\mathcal{H}_{(j_1, j_2; d)}$ in $\mathcal{H}$. Energy positivity and unitarity restrict each dimension $d$ according to the following result:

Theorem 4.2 ([8]; see also Theorem 3.18 of [17]). If $\phi$ is an elementary conformal field of weight $(j_1, j_2; d)$ then requirement (TS) and Wightman positivity imply
$$d \ge j_1 + j_2 + 1 \ \text{ for } j_1 j_2 = 0; \qquad d \ge j_1 + j_2 + 2 \ \text{ for } j_1 j_2 > 0. \tag{4.8}$$
The proof is based, in part, on an analysis of the positivity properties of the conformally invariant 2-point function
$$\langle 0|\phi^*_{j_2 j_1}(x)\,\phi_{j_1 j_2}(y)|0\rangle = \frac{H_{2j_1 + 2j_2}(x - y)}{\left( (x - y)^2 + i0\,(x^0 - y^0) \right)^{d + j_1 + j_2}}, \tag{4.9}$$
where $H_n(x)$ is a tensor-valued homogeneous harmonic polynomial of degree $n$ that is determined (up to a normalization constant) from conformal invariance.

Example 4.1. If $J^\mu(x)$ is a conserved current in $D$ dimensions then the 2-point function $\langle 0|J^\mu(x_1) J_\nu(x_2)|0\rangle$ is proportional to $\left(x_{12}^2 + i0\,x_{12}^0\right)^{-D} \left( x_{12}^2\,\delta^\mu_\nu - 2 x_{12}^\mu (x_{12})_\nu \right)$. The harmonic polynomial $H_{2l}{}^{\mu_1 \dots \mu_l}_{\nu_1 \dots \nu_l}(x)$ appearing in the 2-point function of a rank $l$ symmetric traceless tensor is written as a symmetrized product of factors of the type $x_{12}^2\,\delta^\mu_\nu - 2 x_{12}^\mu (x_{12})_\nu$ with subtracted traces.
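The bound (4.8) is easy to tabulate. The sketch below is only an illustration of Theorem 4.2 under the reading of (4.8) given above; the function names `unitarity_bound` and `satisfies_bound` are hypothetical.

```python
from fractions import Fraction

def unitarity_bound(j1, j2):
    """Smallest dimension d allowed by Eq. (4.8) for the weight (j1, j2; d)."""
    j1, j2 = Fraction(j1), Fraction(j2)
    if j1 < 0 or j2 < 0 or (2 * j1).denominator != 1 or (2 * j2).denominator != 1:
        raise ValueError("j1 and j2 must be non-negative half integers")
    # The extra unit is 1 when one of the spins vanishes and 2 otherwise.
    return j1 + j2 + (1 if j1 * j2 == 0 else 2)

def satisfies_bound(j1, j2, d):
    """True if the weight (j1, j2; d) is compatible with Eq. (4.8)."""
    return Fraction(d) >= unitarity_bound(j1, j2)

half = Fraction(1, 2)
scalar_min = unitarity_bound(0, 0)        # free massless scalar, d = 1
spinor_min = unitarity_bound(half, 0)     # free Weyl spinor, d = 3/2
vector_min = unitarity_bound(half, half)  # conserved current in D = 4, d = 3
```

The conserved current of Example 4.1 in $D = 4$ has $d = D - 1 = 3$ and weight $(1/2, 1/2; 3)$, so it sits exactly on the second bound of (4.8).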
Proposition 4.3. Let a system of fields (in dimension $D = 4$) satisfy the conditions of Theorem 4.1. Let $\phi(x)$ and $\psi(x)$ be two fields in this system which transform under elementary induced representations of $C$ of weights $(j_1', j_2'; d')$ and $(j_1'', j_2''; d'')$, respectively. Then the pole degree $\mu$ of $\left( (x - y)^2 + i0\,(x^0 - y^0) \right)^{-\mu}$ in any Wightman function $\langle 0| \dots \phi(x) \dots \psi(y) \dots |0\rangle$ has the upper limit:
$$\mu \le \left[\!\!\left[ \frac{d' + j_1' + j_2' + d'' + j_1'' + j_2''}{2} - \frac{1 - \delta_{j_1' j_2''}\,\delta_{j_2' j_1''}\,\delta_{d' d''}}{2} \right]\!\!\right], \tag{4.10}$$
where $[[a]]$ stands for the integer part of the real number $a$ (i.e. the maximal $n \in \mathbb{Z}$ for which $n \le a$).

Proof. As was pointed out in the proof of Theorem 4.1, $\mu$ is the order of the vector-valued distribution $\phi(x)\psi(y)|0\rangle$. Then because of the decomposition (4.7) of the physical Hilbert space $\mathcal{H}$, $\mu$ does not exceed the order in $x - y$ of the two-point function $\langle 0|\phi(x)\psi(y)|0\rangle$, or the maximal order of possible (non-zero) three-point conformal invariant Wightman functions $\langle 0|\phi(x)\psi(y)\phi^{(j_2, j_1; d)}(z)|0\rangle$. The two-point function of $\phi$ and $\psi$ may be non-zero only for $(j_1', j_2') = (j_2'', j_1'')$, $d' = d''$ and then it saturates the upper limit (4.10), according to (4.9). It remains to verify (4.10) for the orders $\mu_{(j_1, j_2; d)}$ of the pole in $x - y$ of the three-point functions $\langle 0|\phi(x)\psi(y)\phi^{(j_2, j_1; d)}(z)|0\rangle$. We will use the results in [7] (Lemma 10) for the general form of such three-point functions. Thus we obtain:
$$\mu_{(j_1, j_2; d)} \le \frac{1}{2}\left( d' + d'' - d + L_{j_1 j_2} \right), \tag{4.11}$$
where $L_{j_1 j_2}$ is the maximal of the integers $l$ for which the $SL(2, \mathbb{C})$ representation $\left(\frac{l}{2}, \frac{l}{2}\right)$ occurs in the triple tensor product
$$(j_1', j_2') \otimes (j_1'', j_2'') \otimes (j_1, j_2) \tag{4.12}$$
(when such $l$ does not exist the three-point function must be zero). Since $(j_1' + j_1'' + j_1,\ j_2' + j_2'' + j_2)$ is the maximal weight occurring in the product (4.12), when $L_{j_1 j_2}$ exists it will be equal to:
$$L_{j_1 j_2} = 2 \min\left( j_1' + j_1'' + j_1,\ j_2' + j_2'' + j_2 \right). \tag{4.13}$$
This will certainly happen when $j_1' + j_1'' + j_1 = j_2' + j_2'' + j_2$. Assume, for the sake of definiteness, that
$$\frac{1}{2} L_{j_1 j_2} = j_1' + j_1'' + j_1 \le j_2' + j_2'' + j_2; \tag{4.14}$$
then from Theorem 4.2 (Eq. (4.8)) and Eqs. (4.11), (4.14) we obtain:
$$\mu_{(j_1, j_2; d)} \le \frac{1}{2}\left( d' + d'' + j_1' + j_1'' + j_2' + j_2'' \right) - \frac{1 + \theta_{j_1 j_2}}{2}, \tag{4.15}$$
where $\theta_0 = 0$, $\theta_{ij} = 1$ for $ij > 0$. We further need to maximize $-\frac{1 + \theta_{j_1 j_2}}{2}$ or to minimize $\theta_{j_1 j_2}$. If $j_1' + j_1'' + j_1 = j_2' + j_2'' + j_2$ then we can choose $(j_1, j_2) = (j, 0)$ or $(0, j)$ when $\theta_{j_1 j_2} = 0$. So we obtain the upper limit (4.10) using the condition $\mu \in \mathbb{Z}$ (owing to the rationality of Wightman functions).
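The upper limit (4.10) can be evaluated mechanically. The following sketch assumes the reading of (4.10) in which the Kronecker deltas compare the weight of $\phi$ with the conjugate weight of $\psi$ (so that the correction term vanishes exactly when the two-point function of $\phi$ and $\psi$ may be non-zero); the function name `pole_bound` and the tuple layout are illustrative.

```python
from fractions import Fraction
from math import floor

def pole_bound(weight_phi, weight_psi):
    """Upper limit (4.10) on the pole degree mu.

    Each weight is a tuple (j1, j2, d). The bound is the integer part of
    half the sum of all six labels, lowered by 1/2 unless the two weights
    are conjugate to each other (j1' = j2'', j2' = j1'', d' = d'').
    """
    j1a, j2a, da = (Fraction(v) for v in weight_phi)
    j1b, j2b, db = (Fraction(v) for v in weight_psi)
    conjugate = (j1a == j2b) and (j2a == j1b) and (da == db)
    total = da + j1a + j2a + db + j1b + j2b
    return floor(total / 2 - Fraction(0 if conjugate else 1, 2))

half = Fraction(1, 2)
# Spinor component (1/2, 0; 3/2) against a conserved current (1/2, 1/2; 3):
spinor_current = pole_bound((half, 0, Fraction(3, 2)), (half, half, 3))
# A field against its conjugate reproduces the exponent d + j1 + j2 of (4.9):
spinor_pair = pole_bound((half, 0, Fraction(3, 2)), (0, half, Fraction(3, 2)))
```

For a neutral scalar field of dimension $d$ paired with itself the function returns $d$, which is the value used for the summation range in the 4-point function of Sect. 5.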
Corollary 4.4. Under the assumptions of Proposition 4.3 for $\phi = \psi^*$ each truncated Wightman function $\langle 0| \dots \psi^*(x) \dots \psi(y) \dots |0\rangle^T$ will have a strictly smaller power $\mu$ of the pole in $(x - y)^2 + i0\,(x^0 - y^0)$ than the 2-point function $\langle 0|\psi^*(x)\psi(y)|0\rangle$. If the weight of $\psi$ is $(j_1, j_2; d)$ then
$$\mu \le d + j_1 + j_2 - 1. \tag{4.16}$$

Proof. It follows from the definition of truncated functions ([5] Sect. 3.5C) that the order of the pole in $(x - y)^2$ does not exceed the order of the vector valued distribution $\left( 1_{\mathcal{H}} - |0\rangle\langle 0| \right) \psi^*(x)\psi(y)|0\rangle$. Hence, we only need to take into account the nonvacuum contributions in the decomposition (4.7). These are determined by the 3-point functions. The estimate (4.16) then follows from Eq. (4.15).

The following example illustrates the case of $\phi \ne \psi^*$ (when the 2-point function of the fields in the product vanishes).

Example 4.2. Let $\psi$ be the free massless Dirac field transforming under the reducible representation $\left(\frac{1}{2}, 0\right) \oplus \left(0, \frac{1}{2}\right)$ of $SL(2, \mathbb{C})$ (it becomes irreducible if we extend the Lorentz group by space reflections). Its (normalized) 2-point function is given by (4.9) with $d + j_1 + j_2 = 2$ and $H_1(x) = \frac{1}{2\pi^2} x^\mu \gamma_\mu$. The leading term of the operator product expansion of $\psi$ with the conserved current $J^\nu(x) = {:}\tilde{\psi}(x)\gamma^\nu\psi(x){:}$ saturates the bound (4.10):
$$\psi(x) J^\nu(y)|0\rangle \simeq \langle 0|\psi(x)\tilde{\psi}(y)|0\rangle\,\psi(y)|0\rangle + O\!\left((x - y)^2\right) = \left\{ \frac{\gamma\cdot(x - y)}{2\pi^2 \left[ (x - y)^2 + i0\,(x^0 - y^0) \right]^2}\,\psi(y) + O\!\left((x - y)^2\right) \right\} |0\rangle. \tag{4.17}$$
This example is typical in that the leading term in the small distance expansion of the product of a charge carrying conformal field with a conserved current (or the stress energy tensor) is the term involving the same charged field. This property is a consequence of the Ward(–Takahashi) identity.

5. Cluster Property. Discussion

The conformal Hamiltonian $H$ is the Hermitian generator of the conformal Lie algebra corresponding to $J_{-1,0}$ (i.e., the generator of the rotations in the Euclidean $(-1, 0)$-plane). The significance of both $H$ and the associated conformal time variable $\tau$ has been repeatedly stressed by I. Segal (see, e.g. [14]) who has pointed out, in particular, that energy positivity with respect to the usual relativistic energy operator $P^0$ implies positivity of $H$ because of the relation
$$H = P^0 + U_w P^0 U_w^{-1}, \tag{5.1}$$
where $U_w$ is the representation of the Weyl group element $w$ (of Example 2.1). Wightman axioms (including the uniqueness of the vacuum state) together with the GCI postulate imply that the spectrum of $H$ is contained in $\mathbb{Z}_+$ (because of the rationality of Wightman functions) and that there is a unique state in the Hilbert space $\mathcal{H}$ (the state space of our QFT), the vacuum, corresponding to eigenvalue 0 of $H$. In fact, these properties of the conformal Hamiltonian in the vacuum superselection sector, conversely, imply rationality of correlation functions of local observable fields. In terms of the contraction semigroup $e^{-tH}$, $t \ge 0$, exploited in [6] the cluster decomposition property can be formulated as follows (cf. [6, Sect. 5]).
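Proposition 5.1. For any pair of vectors
$$\langle\Psi_1| = \langle 0|\,\phi_1(x_1) \cdots \phi_k(x_k), \qquad |\Psi_2\rangle = \phi_1(y_1) \cdots \phi_l(y_l)\,|0\rangle \tag{5.2}$$
the following limiting factorization property holds:
$$\lim_{t \to \infty} \langle\Psi_1|\,e^{-tH}\,|\Psi_2\rangle = \langle\Psi_1|0\rangle \langle 0|\Psi_2\rangle. \tag{5.3}$$
Equation (5.3) can be rewritten in terms of Wightman functions as
$$\left( 1 \otimes \dots \otimes 1 \otimes \pi_{y_1}\!\left(e^{-tH}\right) \otimes \dots \otimes \pi_{y_l}\!\left(e^{-tH}\right) \right) W_{k+l}\!\left( x_1, \dots, x_k, e^{-tH} y_1, \dots, e^{-tH} y_l \right) \xrightarrow[t \to \infty]{} W_k(x_1, \dots, x_k)\, W_l(y_1, \dots, y_l). \tag{5.4}$$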
The limit in Eq. (5.4) may be given two different (valid) interpretations: first, as a limit for every fixed set of points $(x_1, \dots, x_k)$ and $(y_1, \dots, y_l)$ in the corresponding tube domains; second, as a limit of rational functions (because of the rationality of both sides of (5.4)). Note that $\{e^{-tH},\ t \ge 0\}$ is a subsemigroup of the complex conformal group $C_{\mathbb{C}}$. Theorem 3.1 implies that in a QFT satisfying GCI the correlation functions viewed as rational functions would satisfy (5.4) for any semigroup $h e^{-tH} h^{-1}$ conjugate to $e^{-tH}$ in $C_{\mathbb{C}}$. This is true, in particular, for the two opposite dilation semigroups:
$$U_\rho\,\phi_i(y_i)\,U_\rho^{-1} = \rho^{d_i} \phi_i(\rho y_i), \quad i = 1, \dots, l, \quad \text{for } \rho \ge 1 \text{ or } \rho \le 1. \tag{5.5}$$
As a corollary of Proposition 5.1 we have:
$$\rho^{d_1 + \dots + d_l}\, W_{k+l}(x_1, \dots, x_k, \rho y_1, \dots, \rho y_l) \xrightarrow[\rho^{\pm 1} \to 0]{} W_k(x_1, \dots, x_k)\, W_l(y_1, \dots, y_l). \tag{5.6}$$
Combined with locality this gives ([5] Sect. 3.5C)

Proposition 5.2. For any splitting of the arguments $(x_1, \dots, x_n)$ of a truncated Wightman function $W_n^T$ into two disjoint subsets $\{x_{i_1}, \dots, x_{i_k}\}$ and $\{x_{j_1}, \dots, x_{j_l}\}$, $k + l = n$, we have
$$\rho^{d_{j_1} + \dots + d_{j_l}}\, W_n^T\!\left( x_{i_1}, \dots, x_{i_k}, \rho x_{j_1}, \dots, \rho x_{j_l} \right) \longrightarrow 0 \quad \text{for } \rho^{\pm 1} \to 0. \tag{5.7}$$

The general principles of QFT together with GCI are so restrictive that they allow, in principle, the computation of conformally invariant correlation functions. We shall illustrate this fact by writing down the general 4-point function of a neutral scalar field $\phi$ of low conformal dimension.

Proposition 5.3. The general 4-point function of a neutral scalar field $\phi$ of (an integer) dimension $d$ satisfying (TS), (L), (GCI) and the implications of positivity contained in Proposition 4.3 has the form
$$W(x_1, x_2, x_3, x_4) = D_d(x_1, x_2, x_3, x_4)\, P(\eta_1, \eta_2),$$
$$D_d(x_1, x_2, x_3, x_4) = \left( \frac{x_{13}^2\, x_{24}^2}{\left(x_{12}^2 + i0\,x_{12}^0\right) \left(x_{23}^2 + i0\,x_{23}^0\right) \left(x_{34}^2 + i0\,x_{34}^0\right) \left(x_{14}^2 + i0\,x_{14}^0\right)} \right)^{d}, \tag{5.8}$$
where $\eta_{1,2}$ are the cross-ratios
$$\eta_1 = \frac{x_{12}^2\, x_{34}^2}{\left(x_{13}^2 + i0\,x_{13}^0\right) \left(x_{24}^2 + i0\,x_{24}^0\right)}, \qquad \eta_2 = \frac{x_{14}^2\, x_{23}^2}{\left(x_{13}^2 + i0\,x_{13}^0\right) \left(x_{24}^2 + i0\,x_{24}^0\right)}, \tag{5.9}$$
$P$ is a polynomial in $\eta_1, \eta_2$ of overall degree $2d$,
$$P(\eta_1, \eta_2) = \sum_{\substack{\mu \ge 0,\ \nu \ge 0 \\ \mu + \nu \le 2d}} C_{\mu\nu}\, \eta_1^{\mu} \eta_2^{\nu}. \tag{5.10}$$
Locality implies invariance of $P$ under the 6 element dihedral group $D_3$ ($\cong S_3$):
$$\begin{aligned}
s_{12} &: \quad \eta_2^{2d}\, P\!\left( \frac{\eta_1}{\eta_2}, \frac{1}{\eta_2} \right) = P(\eta_1, \eta_2), \\
s_{23} &: \quad \eta_1^{2d}\, P\!\left( \frac{1}{\eta_1}, \frac{\eta_2}{\eta_1} \right) = P(\eta_1, \eta_2), \\
s_{13} = s_{12} s_{23} s_{12} = s_{23} s_{12} s_{23} &: \quad P(\eta_2, \eta_1) = P(\eta_1, \eta_2)
\end{aligned} \tag{5.11}$$
($s_{ij}$ standing for the permutation of the arguments $i, j$ of the rational function $W$). The normalization of $P$ is related to the normalization of the 2-point function of $\phi$:
$$W(x_1, x_2) = \frac{N_d}{\left( x_{12}^2 + i0\,x_{12}^0 \right)^{d}} \ \Longrightarrow\ P(1, 0) = N_d^2 = P(0, 1). \tag{5.12}$$
The truncated 4-point function admits a similar representation,
$$W^T(x_1, x_2, x_3, x_4) = D_d(x_1, x_2, x_3, x_4)\, P_T(\eta_1, \eta_2), \tag{5.13}$$
where the polynomial $P_T$ has the properties (5.10) and (5.11) of $P$ while instead of (5.12) it satisfies
$$P_T(1, 0) = P_T(0, 1) = 0. \tag{5.14}$$

Proof. The general form (5.8) is implied by the fact that the cross ratios (5.9) form a basis of (rational) invariants of 4 points. The summation limits in (5.10) follow from Proposition 4.3. The $i0\,x_{jk}^0$ prescription in $\left( x_{jk}^2 + i0\,x_{jk}^0 \right)^{-\lambda}$ reflects energy positivity. Equations (5.12) and (5.14) are consequences of Eqs. (5.6) and (5.7), respectively.
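The transformation rules behind (5.11) can be checked on explicit points. The sketch below evaluates the cross-ratios (5.9) and the prefactor (5.8) for four sample points with exact rational arithmetic, ignoring the $i0$ prescription (all intervals of the sample points are non-zero), and compares the values before and after permuting the arguments. The function names and the sample points are illustrative assumptions.

```python
from fractions import Fraction

def msq(x, y):
    """Interval (x - y)^2, space-like positive; the i0 terms are ignored."""
    diff = [a - b for a, b in zip(x, y)]
    return Fraction(-diff[0] ** 2 + sum(c ** 2 for c in diff[1:]))

def cross_ratios(x1, x2, x3, x4):
    """The pair (eta_1, eta_2) of Eq. (5.9)."""
    den = msq(x1, x3) * msq(x2, x4)
    eta1 = msq(x1, x2) * msq(x3, x4) / den
    eta2 = msq(x1, x4) * msq(x2, x3) / den
    return eta1, eta2

def prefactor(x1, x2, x3, x4, d):
    """The factor D_d of Eq. (5.8) for an integer dimension d."""
    num = msq(x1, x3) * msq(x2, x4)
    den = msq(x1, x2) * msq(x2, x3) * msq(x3, x4) * msq(x1, x4)
    return (num / den) ** d

# Sample points (x^0, x^1, x^2, x^3); chosen so that no interval vanishes.
x1, x2, x3, x4 = (0, 1, 0, 0), (0, 0, 2, 0), (0, 0, 0, 3), (1, 1, 1, 1)
d = 2
eta1, eta2 = cross_ratios(x1, x2, x3, x4)
D = prefactor(x1, x2, x3, x4, d)

# s12 exchanges x1 and x2; s23 exchanges x2 and x3; s13 exchanges x1 and x3.
s12_ok = (cross_ratios(x2, x1, x3, x4) == (eta1 / eta2, 1 / eta2)
          and prefactor(x2, x1, x3, x4, d) == eta2 ** (2 * d) * D)
s23_ok = (cross_ratios(x1, x3, x2, x4) == (1 / eta1, eta2 / eta1)
          and prefactor(x1, x3, x2, x4, d) == eta1 ** (2 * d) * D)
s13_ok = (cross_ratios(x3, x2, x1, x4) == (eta2, eta1)
          and prefactor(x3, x2, x1, x4, d) == D)
```

Since $W = D_d P$ must be unchanged under each permutation, the factors $\eta_2^{2d}$ and $\eta_1^{2d}$ picked up by $D_d$ are exactly the ones that appear in front of $P$ in (5.11).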
The simplest special cases, $d = 1, 2$, show that the range of summation in (5.10) can be further restricted by combining the operator product expansion implied by (5.8) with positivity. Indeed, the most general polynomials $P$ satisfying (5.11) and (5.12) for the above values of $d$ are
$$P_1(\eta_1, \eta_2) = C_1 \left[ (1 - \eta_1 - \eta_2)^2 - 4\eta_1\eta_2 \right] + N_1^2 \left( \eta_1 + \eta_2 + \eta_1\eta_2 \right), \tag{5.15}$$
$$\begin{aligned}
P_2(\eta_1, \eta_2) ={}& C_{20} \left[ \left( 1 - \eta_1^2 - \eta_2^2 \right)^2 - 4\eta_1^2\eta_2^2 \right] + C_{21} \left[ \eta_1 (1 - \eta_1)^2 + \eta_2 (1 - \eta_2)^2 + \eta_1\eta_2 (\eta_1 - \eta_2)^2 \right] \\
&+ N_2^2 \left( \eta_1^2 + \eta_2^2 + \eta_1^2\eta_2^2 \right) + C_2\, \eta_1\eta_2 (1 + \eta_1 + \eta_2).
\end{aligned} \tag{5.16}$$
On the other hand, for $d = 1$ the 2-point Wightman function $W(x_1, x_2) = W_1(x_{12})$ satisfies the d'Alembert equation $\Box W_1(x) = 0$. It then follows from the above cited Reeh–Schlieder theorem and from Wightman positivity that $\Box\phi(x) = 0$. Consequently, $C_1 = 0$ in (5.15) and we end up with a free field theory. More generally, the vanishing of $C_1$, as well as that of $C_{20}$ and $C_{21}$, follows from Corollary 4.4. (We owe this remark to Yassen Stanev.) A systematic study of the implications of operator product expansions combined with positivity is relegated to the sequel of this paper (announced in the Introduction).

Acknowledgements. We thank Yassen Stanev for his inquisitive questioning and for helpful discussions. The authors acknowledge partial support by the Bulgarian National Council for Scientific Research under contract F–828.
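Appendix A. The Klein–Dirac Quadric

Viewed as a homogeneous space of the group $SO_0(D, 2)$, compactified $D$-dimensional Minkowski space $\overline{M}$ can be defined as a projective isotropic cone (or quadric) of signature $(D, 2)$ (see [3]). Points in $\overline{M}$ are identified with isotropic rays in $\mathbb{R}^{D,2}$, two collinear (isotropic) vectors, $\vec{\xi}$ and $\lambda\vec{\xi}$ ($\lambda \ne 0$), representing the same point in $\overline{M}$:
$$\overline{M} = Q / \mathbb{R}^*, \qquad Q := \left\{ \vec{\xi} \in \mathbb{R}^{D,2};\ \vec{\xi} \ne 0,\ \vec{\xi}^{\,2} := \boldsymbol{\xi}^2 + \xi_D^2 - \xi_0^2 - \xi_{-1}^2 = 0 \right\}. \tag{A.1}$$
Here $\mathbb{R}^*$ is the multiplicative group of non-zero reals, $\boldsymbol{\xi}^2 := \xi_1^2 + \dots + \xi_{D-1}^2$. In this representation of $\overline{M}$, the embedding of Minkowski space $M$ in $\overline{M}$ is given by the Klein–Dirac compactification formulae:
$$M \ni x \longmapsto \left\{ \lambda\vec{\xi}_x \right\} \in \overline{M}, \quad \text{where} \quad \vec{\xi}_x = x^\mu \vec{e}_\mu + \frac{1 + x^2}{2}\, \vec{e}_{-1} + \frac{1 - x^2}{2}\, \vec{e}_D \tag{A.2}$$
and $\vec{e}_1, \dots, \vec{e}_D, \vec{e}_{-1}, \vec{e}_0$ is the standard basis in $\mathbb{R}^{D,2}$. Then we have the following expression for the pseudo-euclidean interval:
$$(x - y)^2 = -2\,\vec{\xi}_x \cdot \vec{\xi}_y = \left( \vec{\xi}_x - \vec{\xi}_y \right)^2. \tag{A.3}$$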
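Equations (A.1)–(A.3) can be verified on sample points. The following sketch uses exact rational arithmetic; the component ordering $(\xi_0, \xi_1, \dots, \xi_{D-1}, \xi_{-1}, \xi_D)$ and the function names are illustrative conventions, not taken from the text.

```python
from fractions import Fraction

def minkowski_dot(x, y):
    """Minkowski product with space-like positive signature (-, +, ..., +)."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def embed(x):
    """Representative xi_x of Eq. (A.2), stored as (x^0, ..., x^{D-1}, xi_{-1}, xi_D)."""
    x = [Fraction(c) for c in x]
    x2 = minkowski_dot(x, x)
    return x + [(1 + x2) / 2, (1 - x2) / 2]

def dot(xi, eta):
    """Inner product of signature (D, 2) used in Eq. (A.1)."""
    *a, a_m1, a_D = xi
    *b, b_m1, b_D = eta
    return minkowski_dot(a, b) - a_m1 * b_m1 + a_D * b_D

x, y = (1, 2, 0, 3), (0, 1, 1, 1)
xi_x, xi_y = embed(x), embed(y)
diff = [a - b for a, b in zip(x, y)]
xi_diff = [a - b for a, b in zip(xi_x, xi_y)]

interval = minkowski_dot(diff, diff)      # (x - y)^2
from_product = -2 * dot(xi_x, xi_y)       # middle expression of (A.3)
from_difference = dot(xi_diff, xi_diff)   # right-hand side of (A.3)
```

Both equalities in (A.3) rest on the isotropy $\vec{\xi}_x^{\,2} = 0$, which the embedding (A.2) guarantees for every $x$.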
Equation (A.3) shows that two points $p_1, p_2 \in \overline{M}$ are (mutually) isotropic if the inner product $\vec{\xi}_1 \cdot \vec{\xi}_2$ of (any of) their representatives $\vec{\xi}_1, \vec{\xi}_2$ is zero. Thus it is obviously a $SO_0(D, 2)$-invariant relation in $\overline{M} \times \overline{M}$. The point $p_\infty \in \overline{M}$ can be defined as $p_\infty = \{\lambda\vec{\xi}_\infty\}$, where $\vec{\xi}_\infty = \vec{e}_D - \vec{e}_{-1}$. Then the representatives $\vec{\xi}_x$ of the points $x$ of Minkowski space $M$ are characterized by the normalization $\vec{\xi}_x \cdot \vec{\xi}_\infty = 1$ so that the set $K_\infty$ of points at infinity is indeed
$$K_{p_\infty} \equiv K_\infty = \left\{ p \in \overline{M};\ (p, p_\infty) = 0, \text{ i.e., } \vec{\xi} \cdot \vec{\xi}_\infty = 0 \text{ for } p = \{\lambda\vec{\xi}\},\ p_\infty = \{\lambda\vec{\xi}_\infty\} \right\}. \tag{A.4}$$
As an illustration of the transitivity property used in the proof of Proposition 1.1 of Sect. 1 we note that the rotation by $\pi$ in the $(\xi_{-1}, \xi_0)$-plane interchanges the origin in $M$ with $p_\infty$ (this is the Weyl reflection used in Examples 2.1 and 2.2).

Equation (A.3) also allows one to compute the conformal factor $\omega(x, g)\,\omega(y, g)$ multiplying the interval $(x - y)^2$ under the action of $g \in SO_0(D, 2)$. Indeed, $\omega(x, g)$ can be defined by:
$$g\vec{\xi}_x = \omega(x, g)\, \vec{\xi}_{gx} \quad \text{for} \quad \vec{\xi}_x \cdot \vec{\xi}_\infty = 1 = \vec{\xi}_{gx} \cdot \vec{\xi}_\infty \ \Rightarrow\ \omega(x, g) = g\vec{\xi}_x \cdot \vec{\xi}_\infty \tag{A.5}$$
and is, hence, a second degree polynomial in $x$. Here $\vec{\xi} \to g\vec{\xi}$ is the linear action of the group $SO_0(D, 2)$ on $\mathbb{R}^{D,2}$, while $x \to gx$ is the nonlinear action of $SO_0(D, 2)$ on $M$ which can be computed from (A.5). We then find (cf. (1.1)):
$$(gx - gy)^2 = -2\, \vec{\xi}_{gx} \cdot \vec{\xi}_{gy} = -2\, \frac{g\vec{\xi}_x \cdot g\vec{\xi}_y}{\omega(x, g)\, \omega(y, g)} = \frac{(x - y)^2}{\omega(x, g)\, \omega(y, g)}. \tag{A.6}$$
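As a worked instance of (A.5) and (A.6), the sketch below takes $g$ to be the rotation by $\pi$ in the $(\xi_{-1}, \xi_0)$-plane mentioned above. For this choice a short calculation from (A.2) gives $\omega(x, g) = -x^2$, and the code checks (A.6) on two sample points with exact rational arithmetic. The component ordering $(\xi_0, \xi_1, \dots, \xi_{D-1}, \xi_{-1}, \xi_D)$ and all function names are illustrative assumptions.

```python
from fractions import Fraction

def minkowski_dot(x, y):
    """Minkowski product with space-like positive signature (-, +, ..., +)."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def embed(x):
    """Representative xi_x of Eq. (A.2), stored as (x^0, ..., x^{D-1}, xi_{-1}, xi_D)."""
    x = [Fraction(c) for c in x]
    x2 = minkowski_dot(x, x)
    return x + [(1 + x2) / 2, (1 - x2) / 2]

def weyl_reflection(xi):
    """Rotation by pi in the (xi_{-1}, xi_0)-plane: both components change sign."""
    out = list(xi)
    out[0] = -out[0]     # xi_0
    out[-2] = -out[-2]   # xi_{-1}
    return out

def omega(x):
    """Conformal factor (A.5): g xi_x . xi_inf, which equals xi_{-1} + xi_D of g xi_x."""
    g_xi = weyl_reflection(embed(x))
    return g_xi[-2] + g_xi[-1]

def act(x):
    """Nonlinear action x -> gx read off from g xi_x = omega(x, g) xi_{gx}."""
    g_xi = weyl_reflection(embed(x))
    w = omega(x)
    return [c / w for c in g_xi[:-2]]

def interval(x, y):
    diff = [Fraction(a) - Fraction(b) for a, b in zip(x, y)]
    return minkowski_dot(diff, diff)

x, y = (1, 2, 0, 3), (0, 1, 1, 1)
lhs = interval(act(x), act(y))                 # (gx - gy)^2
rhs = interval(x, y) / (omega(x) * omega(y))   # right-hand side of (A.6)
```

In this example $gx = (x^0, -\mathbf{x})/x^2$, so the map is singular exactly on the light cone $x^2 = 0$, where $\omega(x, g)$ vanishes; the image of the origin is a multiple of $\vec{\xi}_\infty$.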
So, the elements of the group $SO_0(D, 2)$ do indeed act conformally on $M$ outside their singularities. Since the dimension of $SO_0(D, 2)$ is equal to that of the conformal Lie algebra coming from the Liouville theorem (see, e.g., [17, Appendix A]), $SO_0(D, 2)$ must be locally isomorphic to the conformal group $C_0$.

The double cover $\widetilde{M}$ of $\overline{M}$ (A.1) is diffeomorphic to the product of the $(D-1)$-sphere, $S^{D-1}$, with the unit circle $S^1$; $\overline{M}$ is obtained from it by identifying opposite points:
$$\widetilde{M} = S^{D-1} \times S^1, \quad S^{D-1} = \left\{ (\boldsymbol{\xi}, \xi_D);\ \boldsymbol{\xi}^2 + \xi_D^2 = 1 \right\}, \quad S^1 = \left\{ (\xi_{-1}, \xi_0);\ \xi_{-1}^2 + \xi_0^2 = 1 \right\}, \quad \overline{M} = \widetilde{M} / \mathbb{Z}_2 = \widetilde{M} \big/ \left\{ \vec{\xi} \cong -\vec{\xi} \mid \vec{\xi} \in \widetilde{M} \right\}. \tag{A.7}$$

Observation. It follows from (A.7) that compactified Minkowski space $\overline{M}$ is orientable for even $D$ and non-orientable for odd $D$.

Equation (A.1) admits a complex version, $\overline{M}_{\mathbb{C}} = Q_{\mathbb{C}} / \mathbb{C}^*$, where $Q_{\mathbb{C}}$ is the complexification of the quadric $Q$ and $\mathbb{C}^*$ is the multiplicative group of (non-zero) complex numbers. $\overline{M}_{\mathbb{C}}$ is a homogeneous space of the complexified conformal group $C_{\mathbb{C}}$. $\overline{M}_{\mathbb{C}}$ and $C_{\mathbb{C}}$ admit a unique antianalytic involution $*$ whose fixed points belong to $\overline{M}$ and $C$, respectively. In the above realization we have $p^* = \{\lambda\vec{\xi}^{\,*}\}$, $(gp)^* = g^* p^*$, where $\vec{\xi}^{\,*}$ is the (componentwise) complex conjugate of $\vec{\xi}$ and $g^*$ is the complex conjugate of the matrix $g \in SO_0(D, 2; \mathbb{C})$.

Appendix B. Proof of Proposition 2.2

We shall prove the statement by induction on the number of points. For $n = 2$ the 2-point vacuum correlator is a boundary value of the analytic function $\left( (x - \zeta)^2 \right)^{-\mu}$ holomorphic for $(x, \zeta) \in M \times T_+$ which satisfies (GCI) as a consequence of (1.1) and of the conformal invariance of the forward tube. (In fact, this case can also be covered by our induction inference if we interpret the (GCI) property for $n = 1$ as the identity $1 = 1$.) Assume now that the statement is established for all $(n-1)$-point functions for some $n > 1$. To prove it for the $n$-point functions we fix a $g \in C$ and $n$ open sets $U_1, \dots, U_n$ in $M$ whose closures $\overline{U}_1, \dots, \overline{U}_n$ are compact and such that the mapping $g : x \to gx$ has no singularity for $x \in \overline{U}_i$. So we will establish that Eq. (2.14) is locally valid for $(x_1, \dots, x_n) \in U_1 \times \dots \times U_n$. The distribution in the right-hand side of (2.14) is then the boundary value for $\mathrm{Im}\,\zeta \to 0$, $\zeta \in T_+$, of the analytic function
$$W_n(x_1, \dots, x_{n-1}, \zeta) = W_{n-1}(x_1, \dots, x_{n-1}) \prod_{j=1}^{n-1} \left( (x_j - \zeta)^2 \right)^{-\mu_{jn}}, \qquad \zeta \in T_+ \tag{B.1}$$
(with $W_{n-1}$ again given by the right-hand side of (2.14) for $n$ substituted by $n - 1$). Applying the induction assumption to $W_{n-1}$ and Eq. (1.1) to the other factors in the product in the right-hand side of (B.1) (treating them as a multiplier) we deduce
$$W_n(x_1, \dots, x_{n-1}, x_n + iy) = \omega(x_1, g)^{-\mu_1} \cdots \omega(x_{n-1}, g)^{-\mu_{n-1}}\, \omega(x_n + iy, g)^{-\mu_n} \times W_{n-1}(gx_1, \dots, gx_{n-1}) \prod_{j=1}^{n-1} \left( \left( gx_j - gx_n - \zeta_y(gx_n) \right)^2 \right)^{-\mu_{jn}}, \tag{B.2}$$
where $\zeta_y(gx_n) := g(x_n + iy) - gx_n \in T_+$ because of the conformal invariance of $T_+$. We also have the limit:
$$\lim_{V_+ \ni y \to 0} W_{n-1}(x_1, \dots, x_{n-1}) \prod_{j=1}^{n-1} \left( \left( x_j - x_n - \zeta_y(x_n) \right)^2 \right)^{-\mu_{jn}} = W_{n-1}(x_1, \dots, x_{n-1}) \prod_{j=1}^{n-1} \left( (x_j - x_n)^2 + i0\,(x_j^0 - x_n^0) \right)^{-\mu_{jn}}, \tag{B.3}$$
which combined with (B.2) gives (2.14) (due to the continuity of multiplication of a distribution with a multiplier).
Appendix C. Completion of the Proof of Lemma 3.2

It remains to prove the following statement: If the points $y_1, y_2 \in M$ are non-isotropic, $p \in K_\infty$ and $C_{y_1, y_2}$ is the stabilizer of the pair $y_1, y_2$ in $SO_0(D, 2)$, then in any neighbourhood of unity in $C_{y_1, y_2}$ there exists an element $h$ such that $hp \notin K_\infty$.

In the case $D = 1$, $K_\infty$ consists of a single point, $p_\infty$, and it is not stable for $C_{y_1, y_2}$ so one does not need the argument below. To prove the above statement for $D \ge 2$ we shall use the representatives $\vec{\xi}_1, \vec{\xi}_2$ and $\vec{\xi}$ of the points $y_1, y_2$ and $p$ (cf. (A.2)) in the quadric (A.1). We have $\vec{\xi}_1 \cdot \vec{\xi}_2 \ne 0 \ne \vec{\xi}_\infty \cdot \vec{\xi}_a$ for $a = 1, 2$; $\vec{\xi} \cdot \vec{\xi}_\infty = 0$ (since $\{\lambda\vec{\xi}\} \in K_\infty$). The metric in $\mathrm{Span}\{\vec{\xi}_1, \vec{\xi}_2\}$ being non-degenerate (since $\vec{\xi}_1 \cdot \vec{\xi}_2 \ne 0$ while $\vec{\xi}_1^{\,2} = \vec{\xi}_2^{\,2} = 0$) there is a pseudo-orthogonal decomposition
$$\mathbb{R}^{D,2} = \mathrm{Span}\{\vec{\xi}_1, \vec{\xi}_2\} \oplus \{\vec{\xi}_1, \vec{\xi}_2\}^{\perp}. \tag{C.1}$$
Let $\vec{\xi}_\infty = \vec{\xi}_\infty' + \vec{\xi}_\infty''$, $\vec{\xi} = \vec{\xi}' + \vec{\xi}''$ ($\vec{\xi}_{(\infty)}' \in \mathrm{Span}\{\vec{\xi}_1, \vec{\xi}_2\}$) be the corresponding decompositions of $\vec{\xi}_\infty$ and $\vec{\xi}$. Since $\vec{\xi}_\infty' \cdot \vec{\xi}_{1,2} = \vec{\xi}_\infty \cdot \vec{\xi}_{1,2} \ne 0$ and $\vec{\xi}_\infty^{\,2} = 0$ then $\vec{\xi}_\infty'^{\,2} \ne 0$ and $\vec{\xi}_\infty''^{\,2} \ne 0$. It follows also that $\vec{\xi}'' \ne 0$ (i.e. $\vec{\xi}$ does not belong to $\mathrm{Span}\{\vec{\xi}_1, \vec{\xi}_2\}$ since $\vec{\xi}^{\,2} = 0$ and $\vec{\xi} \cdot \vec{\xi}_\infty = 0$). Let $G$ be the connected component of the orthogonal group of the subspace $\{\vec{\xi}_1, \vec{\xi}_2\}^{\perp}$ ($G \cong SO_0(D - 1, 1)$). Clearly, $G$ is a subgroup of the stabilizer of the points $y_i = \{\lambda\vec{\xi}_i\}$, $i = 1, 2$:
$$G \subset C_{y_1, y_2} = C_{\mathrm{Span}\{\vec{\xi}_1, \vec{\xi}_2\}} := SO_0\!\left( \mathrm{Span}\{\vec{\xi}_1, \vec{\xi}_2\} \right) \times SO_0\!\left( \{\vec{\xi}_1, \vec{\xi}_2\}^{\perp} \right).$$
For $h$ varying in $G$, $h\vec{\xi}_\infty''$ moves on a hyperboloid in $\{\vec{\xi}_1, \vec{\xi}_2\}^{\perp}$. Consequently, the (real valued) function $f(h) = h\vec{\xi} \cdot \vec{\xi}_\infty = \vec{\xi}' \cdot \vec{\xi}_\infty' + \vec{\xi}'' \cdot h^{-1}\vec{\xi}_\infty''$ cannot be a constant. On the other hand it is real analytic (in fact, algebraic) and $f(1) = 0$. Hence, in any (arbitrarily small) neighbourhood of the group unit in $G$ (and also in $C_{y_1, y_2}$) there exist elements $h$ such that $f(h) \ne 0$, i.e. $hp \notin K_\infty$.

References
1. Aharony, O., Gubser, S.S., Maldacena, J., Ooguri, H., Oz, Y.: Large N field theories, string theory and gravity. hep-th/9905111, Phys. Rep. 323, 183–386 (2000)
2. Di Francesco, P., Mathieu, P., Senechal, D.: Conformal field theories. Berlin–Heidelberg–New York: Springer, 1996
3. Dirac, P.A.M.: Wave equations in conformal space. Ann. Math. 37, 429–442 (1936)
4. Flato, M., Sternheimer, D.: Remarques sur les automorphismes causals de l'espace temps. Comptes Rendus Acad. Sci., Paris A 263, 935–936 (1966)
5. Jost, R.: The general theory of quantized fields. Providence, R.I.: Am. Math. Soc. Publ., 1965
6. Lüscher, M., Mack, G.: Global conformal invariance in quantum field theory. Commun. Math. Phys. 41, 203–234 (1975)
7. Mack, G.: Convergence of operator product expansions on the vacuum in conformal invariant quantum field theory. Commun. Math. Phys. 53, 155–184 (1977)
8. Mack, G.: All unitary representations of the conformal group SU(2, 2) with positive energy. Commun. Math. Phys. 55, 1–28 (1977)
9. Mack, G., Symanzik, K.: Currents, stress tensor and generalized unitarity in conformal invariant quantum field theory. Commun. Math. Phys. 27, 247–281 (1972)
10. Mack, G., Todorov, I.T.: Conformal invariant Green functions without ultra-violet divergences. Phys. Rev. D8, 1764–1787 (1973)
11. Polyakov, A.M.: Conformal symmetry of critical fluctuations. Zh. ETF Pis. Red. 12, 538 (1970) (transl.: JETP Lett. 12, 381 (1970))
12. Polyakov, A.M.: Nonhamiltonian approach in the conformal invariant quantum field theory. Zh. Eksp. Teor. Fiz. 66, 23–42 (1974) (transl.: Sov. Phys.-JETP 39, 10–18 (1974))
13. Ruhl, W., Yunn, B.C.: The transformation behaviour of fields in conformally covariant quantum field theory. Forts. Phys. 25, 83–99 (1977)
14. Segal, I.E.: Causally oriented manifolds and groups. Bull. Am. Math. Soc. 77, 958–959 (1971)
15. Stanev, Ya.S.: Stress-energy tensor and U(1)-current operator product expansion in conformal QFT. Bulg. J. Phys. 15, 93–107 (1988)
16. Streater, R.F., Wightman, A.S.: PCT, spin and statistics and all that. Second Printing. Reading, MA: Benjamin/Cummings, 1978
17. Todorov, I.T.: Local field representations of the conformal group and their applications. In: Streit, L. (ed.) Mathematics + Physics, Lectures on recent results. Vol. 1, Singapore: World Scientific, 1985, pp. 195–338
18. Todorov, I.T.: Infinite dimensional Lie algebras in conformal QFT models. In: Barut, A.O. and Doebner, H.-D. (eds.) Conformal groups and related symmetries. Physical results and mathematical background, Lecture Notes in Physics 261, Berlin–Heidelberg–New York: Springer, 1986, pp. 387–443
19. Todorov, I.T.: Conformal description of spinning particles. Trieste Notes in Physics. Berlin et al.: Springer, 1986
20. Todorov, I.T., Mintchev, M.C., Petkova, V.B.: Conformal invariance in quantum field theory. Pisa: Scuola Normale Superiore, 1978, pp. 1–274
21. Treves, F.: Topological vector spaces, distributions and kernels. N.Y.: Acad. Press, 1967; for a brief review, see Treves, F.: Introduction to pseudodifferential operators and Fourier integral operators. Vol. 1, Sect. I.7. N.Y., London: Plenum Press, 1980
22. Uhlmann, A.: Remarks on the future tube. Acta Phys. Pol. 24, 293 (1963); The closure of Minkowski space. ibid. pp. 295–296; Some properties of the future tube. preprint KMU-HEP 7209 (Leipzig, 1972)
23. Wess, J.: The conformal invariance in quantum field theory. Nuovo Cimento 18, 1086–1107 (1960)

Communicated by H. Araki
Commun. Math. Phys. 218, 437 – 457 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Fedosov Deformation Quantization as a BRST Theory
M. A. Grigoriev1, S. L. Lyakhovich2
1 Lebedev Physics Institute, Russian Academy of Sciences, Moscow, 117924, Russia
2 Department of Physics, Tomsk State University, Tomsk, Russia
Received: 28 April 2000 / Accepted: 6 December 2000
Abstract: The relationship is established between the Fedosov deformation quantization of a general symplectic manifold and the BFV-BRST quantization of constrained dynamical systems. The original symplectic manifold $M$ is presented as a second class constrained surface in the fibre bundle $T^*_\rho M$ which is a certain modification of a usual cotangent bundle equipped with a natural symplectic structure. The second class system is converted into the first class one by continuation of the constraints into the extended manifold, being a direct sum of $T^*_\rho M$ and the tangent bundle $TM$. This extended manifold is equipped with a nontrivial Poisson bracket which naturally involves two basic ingredients of Fedosov geometry: the symplectic structure and the symplectic connection. The constructed first class constrained theory, being equivalent to the original symplectic manifold, is quantized through the BFV-BRST procedure. The existence theorem is proven for the quantum BRST charge and the quantum BRST invariant observables. The adjoint action of the quantum BRST charge is identified with the Abelian Fedosov connection while any observable, being proven to be a unique BRST invariant continuation for the values defined in the original symplectic manifold, is identified with the Fedosov flat section of the Weyl bundle. The Fedosov fibrewise star multiplication is thus recognized as a conventional product of the quantum BRST invariant observables.

1. Introduction

Different trends are recognized among the approaches to quantization of systems whose classical mechanics is based on the Poisson bracket. In physics, the quantization strategy evolves, in a sense, in the opposite direction to the main stream developing in mathematics. From the physical viewpoint, the phase manifold is usually treated as a constraint surface in a flat manifold or in a manifold whose geometric structure is rather simpler than that of the constraint surface.
And the efforts are not directed to reduce the dynamics on the curved shell before quantization. The matter is that the physical models should usually possess an explicit relativistic covariance and space-time locality, whereas the
reduction to the constraint surface usually breaks both. So, the main trend in physics is to quantize the system as it originally occurs, i.e. with constraints. The reduction is achieved in quantum theory by means of restrictions imposed to the class of admissible observables and states. The most sophisticated quantization scheme developed in this direction is the BFV method [1] (for review see [2]) based on the idea of the BRST symmetry. The method allows, in principle, to quantize any first class constraint theory, with the exception of the special case of the so-called infinitely reducible constraints. As to the second class constrained theories, various methods are known to adopt the BFV-BRST approach for the case. In this paper we turn to the idea to convert the second class theory into the first class extending the phase manifold by extra degrees of freedom which are going to be eventually gauged out by the introduced gauge symmetry related to the effective first class constraints. A number of the general conversion schemes is known today [3–6]. The conversion ideas are widely applied in practical physical problems concerning quantization of the second class constrained systems. The mathematical insight into the quantization problem always starts with the reduced Poisson manifold where the constraints, if they could be originally present, have already been resolved. The general concept of the deformation quantization was introduced in Ref. [11, 12]. The existence of the star product on the general symplectic manifold was proven in Ref. [13] where the default was ascertained for the cohomological obstructions to the deformation of the associative multiplication. Independently, Fedosov suggested the explicit construction of the star product on any symplectic manifold [14] (see also the subsequent book [15]). Now the general statement regarding the existence of the star product for the most general Poisson manifold is established by Kontsevich [16]. 
Recently the Kontsevich quantization formula was also supplied with an interesting physical explanation [17]. However, in the case of symplectic manifolds the Fedosov construction of the star product seems to be most useful in applications. The advantage is in the explicit description of the algebra of quantum observables. In the Fedosov approach the quantum observable algebra is the space of the flat sections of the Weyl algebra bundle over the symplectic manifold, with the multiplication being the fibrewise Weyl product. The Fedosov star-product allows a generalisation to the case of super-Poisson bracket [18].

We mention that the deformation quantization structures are now coming to gauge field theory not only as a tool of quantizing but rather as a means of constructing new classical models, e.g. gauge theories on noncommutative spaces [7] and higher-spin interactions [8]. The recent developments have also revealed a deep relationship between the strings and the Yang–Mills theory on the noncommutative spaces [9, 10].

The BRST approach to the quantization of the systems with the geometrically nontrivial phase space was initiated by Batalin and Fradkin who suggested presenting the original symplectic manifold as a second class constraint surface embedded into the linear symplectic space [19].1 From the viewpoint of the BFV method, the question of deformation quantization of the general symplectic manifold was considered in Ref. [20] where one could actually observe (although it was not explicitly mentioned in the paper) that the generating equations for the Abelian conversion [6], applied to the embedding of the second class constraints of Ref. [19, 21], naturally involve the characteristic structures of the Fedosov geometry: the symmetric symplectic connection and the curvature. However, to our knowledge, the relationship has not been established yet between the second class constraint approach of [19–21] and the Fedosov construction.

1 The global geometric properties of this embedding were not in the focus of the original papers [19–21]. In the present paper we suggest a slightly different constrained embedding of the symplectic manifold with an explicit account for the global geometry, although our basic goal is beyond the geometry of the embedding itself.

In this paper we show that the Fedosov quantization scheme can be completely derived from the BFV-BRST quantization of the constrained dynamical systems. First the symplectic manifold $M$ is extended to the fibre bundle $T^*_\rho M$, being a certain modification of the usual cotangent bundle, which still carries the canonical symplectic structure. The original manifold $M$ is identified with the second class constrained surface in $T^*_\rho M$. This allows one to view the Poisson bracket on the base manifold as the Dirac bracket associated to the second class constraints. Further, the second class constraints are converted into the first class ones in the spirit of the Abelian conversion procedure [6]. In the case at hand, we choose the conversion variables to be the coordinates on the fibres of the tangent bundle over the symplectic manifold. The phase space of the converted system, in distinction to the direct application of the conventional conversion scheme [6] exploited in Ref. [20], is equipped with a natural nonlinear symplectic structure. This symplectic structure involves the initial symplectic form and a symmetric symplectic connection. Remarkably, these structures are known as those determining the so called Fedosov geometry [22]. In its turn the Jacobi identity for the Poisson bracket, being defined in this extended manifold, encodes all the respective compatibility conditions for the Fedosov manifold. So, the embedding and converting procedure makes transparent the relationship between the constrained Hamiltonian dynamics and Fedosov's geometry.
We quantize the resulting gauge invariant system, which is globally equivalent to the original symplectic one, according to the standard BFV quantization prescription. As the extended phase space of the BFV quantization is a geometrically nontrivial symplectic manifold, quantizing it directly is a problem. Fortunately, to proceed with the BFV scheme when the constraints have the special structure of the case at hand, one needs to define the quantization of only a certain subalgebra of functions. Namely, we consider the subalgebra $\mathcal{A}$ of functions at most linear in the momenta, which is closed w.r.t. the associative multiplication and the Poisson bracket. This subalgebra contains all the BRST observables, the BRST charge and the ghost charge. Unlike for the entire algebra of functions on the extended phase space, the construction of the star-multiplication in $\mathcal{A}$ is evident. At the quantum level we arrive at the quantum BRST charge $\Omega$ satisfying the quantum nilpotency condition $[\Omega, \Omega] = 0$. The algebra of quantum observables is thus the zero-ghost-number cohomology of $\mathrm{Ad}\,\Omega$. This algebra, viewed as a vector space, is isomorphic to the algebra of classical observables. The noncommutative product of the algebra of quantum BRST observables is carried over to the space of functions on the symplectic manifold, giving a deformation quantization.

This approach allows us to identify all the basic structures of Fedosov's method as those of the BRST theory. In particular, the auxiliary variables $y^i$ (which appear in Ref. [14] as the generators of the Weyl algebra) turn out to be the conversion variables, while the basic one-forms $dx^i$ on the symplectic manifold should be identified with the ghost variables associated with the converted constraints. Further, the Fedosov flat connection $D$ is the adjoint action of the quantum BRST charge $\Omega$; the flat sections of the Weyl bundle are thus nothing but the BRST cohomology.
Under this identification, the statements of the Fedosov quantization regarding the existence of the Abelian connection and the lift of functions from the symplectic manifold to flat sections of the Weyl bundle can be recognized as the standard existence theorems of the BRST theory.
M. A. Grigoriev, S. L. Lyakhovich
2. Representation of a General Symplectic Manifold as a Constrained Hamiltonian Dynamics

In this section we first represent a general symplectic manifold $M$, $\dim(M) = N$, as a second class constraint surface embedded into the fibre bundle $T^*_\rho M$, $\dim(T^*_\rho M) = 2N$, equipped with a globally defined canonical symplectic structure. Next we develop a procedure to convert the second class constraints into first class ones by extending the manifold $T^*_\rho M$ to the direct sum $T^*_\rho M \oplus TM$, which possesses a nontrivial Poisson structure. This structure generates, in a sense, all the structure relations of the symplectic geometry of the original symplectic manifold $M$. The extra degrees of freedom introduced with this embedding are effectively gauged out by the gauge symmetry related to the effective first class constraints. Finally we construct the classical BRST embedding for the effective first class system, which serves as a starting point for the BFV-BRST quantization of the symplectic manifold carried out in the next section.

2.1. Second class constraint formulation of the symplectic structure. Let $M$ be a symplectic manifold with symplectic form $\omega$. Denote by $\{\cdot,\cdot\}_M$ the respective Poisson bracket on $M$. Let $x^i$ be a local coordinate system on $M$. In the local coordinates the symplectic form and the Poisson bracket read

$$\omega = \omega_{ij}(x)\,dx^i \wedge dx^j, \qquad d\omega = 0, \tag{2.1}$$

$$\{a(x), b(x)\}_M = \omega^{ij}(x)\,\frac{\partial a(x)}{\partial x^i}\frac{\partial b(x)}{\partial x^j}, \qquad \omega^{ij}\omega_{jk} = \delta^i_k. \tag{2.2}$$
Let $\Gamma$ be a symmetric symplectic connection on $M$, which always exists (for details of the geometry based on this connection see [22]). In the local coordinates $x^i$ we have

$$\frac{\partial}{\partial x^i}\omega_{jk} - \Gamma^l_{ij}\omega_{lk} - \Gamma^l_{ik}\omega_{jl} = 0, \qquad \nabla_i\Big(\frac{\partial}{\partial x^j}\Big) = \Gamma^k_{ij}\frac{\partial}{\partial x^k}. \tag{2.3}$$

Introduce the curvature tensor $R^k_{\ l;ij}$ of $\Gamma$ by $R^k_{\ l;ij}\frac{\partial}{\partial x^k} = [\nabla_i, \nabla_j]\frac{\partial}{\partial x^l}$. In the local coordinates it reads

$$R^k_{\ l;ij} = \partial_i\Gamma^k_{jl} + \Gamma^n_{jl}\Gamma^k_{in} - \partial_j\Gamma^k_{il} - \Gamma^n_{il}\Gamma^k_{jn}. \tag{2.4}$$

In symplectic geometry it is convenient to use the coefficients $\Gamma_{ijk}$ and $R_{kl;ij}$ defined by $\Gamma_{ijk} = \omega_{in}\Gamma^n_{jk}$ and $R_{kl;ij} = \omega_{kn}R^n_{\ l;ij}$. The curvature tensor $R_{kl;ij}$ satisfies the corresponding Bianchi identities:

$$\nabla_m R_{kl;ij} + \ldots = 0. \tag{2.5}$$

The following properties of the symmetric symplectic connection are known (see e.g. [22]): $\Gamma_{ijk}$ is totally symmetric in each Darboux coordinate system, and the curvature tensor has the symmetry property

$$R_{kl;ij} = R_{lk;ij}. \tag{2.6}$$

This can be seen immediately by choosing a coordinate system in which the $\omega_{ij}$ are constant.
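Consider an open covering of $M$ by contractible open sets. In each domain $U_\alpha$ the symplectic form can be represented as

$$\omega = d\rho^\alpha, \qquad \omega_{ij} = \partial_i\rho^\alpha_j - \partial_j\rho^\alpha_i, \tag{2.7}$$

where $\rho^\alpha = \rho^\alpha_i\,dx^i$ is the symplectic potential in $U_\alpha$. In the overlap $U_{\alpha\beta} = U_\alpha \cap U_\beta$ we have

$$\rho^\alpha - \rho^\beta = \phi^{\alpha\beta}, \qquad d\phi^{\alpha\beta} = 0. \tag{2.8}$$

The transition 1-forms $\phi^{\alpha\beta}$ obviously satisfy

$$\phi^{\alpha\beta} + \phi^{\beta\alpha} = 0, \qquad \phi^{\alpha\beta} + \phi^{\beta\gamma} + \phi^{\gamma\alpha} = 0 \tag{2.9}$$

in the overlaps $U_\alpha \cap U_\beta$ and $U_\alpha \cap U_\beta \cap U_\gamma$ respectively. Given an atlas $\{U_\alpha\}$ and the symplectic potential $\rho^\alpha$ defined in each domain $U_\alpha$, one can construct an affine bundle $T^*_\rho M$ over $M$. Namely, for each domain $U_\alpha$ with local coordinates $x^i_\alpha$ (the index $\alpha$ indicates that $x^i_\alpha$ are the coordinates on $U_\alpha$) choose the fibre to be $\mathbb{R}^N$ ($N = \dim(M)$) with coordinates $p^\alpha_i$. In the overlap $U_\alpha \cap U_\beta$ we prescribe the following transition law:

$$p^\alpha_i = p^\beta_j\,\frac{\partial x^j_\beta}{\partial x^i_\alpha} + \phi^{\alpha\beta}_i \tag{2.10}$$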
(summation over the repeated indices $\alpha, \beta, \ldots$ is not implied). Here the coefficients $\phi^{\alpha\beta}_i$ of the 1-form $\phi^{\alpha\beta}$ are introduced by $\phi^{\alpha\beta} = \phi^{\alpha\beta}_i\,dx^i_\alpha$. It is easy to check that the transition law (2.10) satisfies the standard conditions in the overlaps of two and three domains and thus determines $T^*_\rho M$ as a bundle. The difference between the usual cotangent bundle $T^*M$ and $T^*_\rho M$ is that the structure group of the former is $GL(N, \mathbb{R})$ while that of the latter is a group of affine transformations of $\mathbb{R}^N$. As the transformation law (2.10) of the variables $p_i$ differs from that of the coordinates on the fibres of the standard cotangent bundle only by a closed 1-form, $T^*_\rho M$ is also equipped with the canonical symplectic form $dp_i \wedge dx^i$. In particular, the corresponding Poisson bracket also has the canonical form

$$\{f, g\} = \frac{\partial f}{\partial x^i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial x^i}. \tag{2.11}$$
An important feature of this construction is that the surface $L$ defined by the equations

$$\theta_i(x, p) \equiv \rho_i - p_i = 0 \tag{2.12}$$

is a submanifold in $T^*_\rho M$. Indeed, these equations can be regarded as determining a smooth section of $T^*_\rho M$. Moreover, considered as a manifold, $L$ is isomorphic to the original manifold $M$: $L$ is a section of the bundle $T^*_\rho M$ and $M$ is its base, so the restriction of the projection $\pi : T^*_\rho M \to M$ to $L \subset T^*_\rho M$ obviously establishes an isomorphism between $L$ and $M$. Note also that the quantities $\theta_i$ transform as the coefficients of a 1-form.
From the viewpoint of Hamiltonian constrained dynamics, the $\theta_i$ are second class constraints, as their Poisson brackets in $T^*_\rho M$ form an invertible matrix:

$$\{\theta_i, \theta_j\} = \omega_{ij}. \tag{2.13}$$

The Dirac bracket in $T^*_\rho M$ built from the constraints (2.12),

$$\{f(x,p), g(x,p)\}_D \equiv \{f(x,p), g(x,p)\} - \{f(x,p), \theta_i\}\,\omega^{ij}\,\{\theta_j, g(x,p)\}, \tag{2.14}$$

can be considered as a Poisson bracket defined on the constraint surface $L$. As the Dirac bracket is nondegenerate on the constraint surface $L$, the latter is a symplectic manifold. One can see that $L$ is isomorphic to $M$ when each is considered as a symplectic manifold. Indeed, any function $f(x, p)$ on $T^*_\rho M$ can be reduced on $L$ to the function $f_0(x) = f(x, p)|_{p_i = \rho_i(x)}$, while the function $f_0(x)$ can be understood as defined on the original manifold $M$. The Dirac bracket (2.14) between any functions $f(x, p)$, $g(x, p)$ coincides on the constraint surface $L$ determined by the constraints (2.12) with the Poisson bracket between their projections to $M$:

$$\{f(x,p), g(x,p)\}_D\big|_{p_i = \rho_i(x)} = \{f_0(x), g_0(x)\}_M, \qquad f_0(x) = f(x,p)|_{p_i = \rho_i(x)}, \quad g_0(x) = g(x,p)|_{p_i = \rho_i(x)}. \tag{2.15}$$
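The relations (2.13) to (2.15) can be checked numerically in a small example. The following is a minimal sketch under stated assumptions, not part of the original construction: it takes $M = \mathbb{R}^2$ with the hypothetical symplectic potential $\rho = (0,\; x^1 + (x^1)^3/3)$, so that $\omega_{12} = 1 + (x^1)^2$, and approximates derivatives by central finite differences. All function names, the test functions `f` and `g`, and the sample point are illustrative choices.

```python
import math

H = 1e-5  # finite-difference step (illustrative choice)

def rho(x):
    # Hypothetical symplectic potential on R^2: rho = (0, x1 + x1^3/3),
    # so that omega_12 = d_1 rho_2 - d_2 rho_1 = 1 + x1^2, cf. (2.7).
    return [0.0, x[0] + x[0] ** 3 / 3.0]

def omega_inverse(x):
    # Matrix omega^{ij} fixed by omega^{ij} omega_{jk} = delta^i_k, cf. (2.2).
    w12 = 1.0 + x[0] ** 2
    return [[0.0, -1.0 / w12], [1.0 / w12, 0.0]]

def gradient(fun, z):
    # Central-difference gradient of fun at the point z (list of floats).
    grad = []
    for k in range(len(z)):
        up, down = list(z), list(z)
        up[k] += H
        down[k] -= H
        grad.append((fun(up) - fun(down)) / (2.0 * H))
    return grad

def canonical_bracket(f, g, z):
    # Canonical bracket (2.11) at z = [x1, x2, p1, p2].
    df, dg = gradient(f, z), gradient(g, z)
    return sum(df[i] * dg[2 + i] - df[2 + i] * dg[i] for i in range(2))

def theta(i):
    # Constraint theta_i = rho_i(x) - p_i from (2.12).
    return lambda z: rho(z[:2])[i] - z[2 + i]

def dirac_bracket(f, g, z):
    # Dirac bracket (2.14) built from the constraints theta_i.
    w_inv = omega_inverse(z[:2])
    correction = sum(
        canonical_bracket(f, theta(i), z) * w_inv[i][j]
        * canonical_bracket(theta(j), g, z)
        for i in range(2) for j in range(2)
    )
    return canonical_bracket(f, g, z) - correction

def bracket_on_M(f, g, x):
    # Bracket (2.2) of the restrictions f0(x) = f(x, rho(x)), g0(x) = g(x, rho(x)).
    f0 = lambda pt: f(list(pt) + rho(pt))
    g0 = lambda pt: g(list(pt) + rho(pt))
    df, dg = gradient(f0, x), gradient(g0, x)
    w_inv = omega_inverse(x)
    return sum(w_inv[i][j] * df[i] * dg[j] for i in range(2) for j in range(2))

# Two arbitrary test functions of (x1, x2, p1, p2).
f = lambda z: z[0] * z[3] + z[1] ** 2
g = lambda z: z[2] * z[0] + math.sin(z[1]) * z[3]

x = [0.3, -0.7]
z = x + rho(x)  # a point on the constraint surface p = rho(x)

lhs = dirac_bracket(f, g, z)                          # left-hand side of (2.15)
rhs = bracket_on_M(f, g, x)                           # right-hand side of (2.15)
theta_bracket = canonical_bracket(theta(0), theta(1), z)  # should match omega_12
```

Within finite-difference accuracy, `lhs` and `rhs` agree and `theta_bracket` reproduces $\omega_{12}(x)$. The check is only a numerical illustration in one chart; it says nothing about the global bundle structure.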
This obvious fact provides the equivalence of the constrained dynamics in $T^*_\rho M$ and the Hamiltonian one in $M$. The quantization problem for the symplectic manifold $M$ is thereby equivalent to the quantization of the second class constrained theory in $T^*_\rho M$.

There is another possibility to represent an arbitrary symplectic manifold as a Hamiltonian dynamics subject to second class constraints.² Instead of changing the structure group of the cotangent bundle $T^*M$ one may modify the symplectic structure on $T^*M$. Namely, consider the following symplectic form on $T^*M$:

$$\omega_{\mathrm{mod}} = dp_i \wedge dx^i + \pi^*\omega, \tag{2.16}$$

where $dp_i \wedge dx^i$ is the canonical symplectic structure on $T^*M$, $\omega$ is the symplectic 2-form on $M$, and $\pi^*$ is the pullback associated with the bundle projection $\pi : T^*M \to M$. The modified Poisson brackets between $x$, $p$ in $T^*M$ read

$$\{x^i, p_j\}_{\mathrm{mod}} = \delta^i_j, \qquad \{p_i, p_j\}_{\mathrm{mod}} = \omega_{ij}, \tag{2.17}$$

with all others vanishing. The original symplectic manifold $M$ can be viewed as the zero section of the cotangent bundle $T^*M$ equipped with the symplectic structure (2.16). As the Poisson brackets (2.17) between the momenta $p_i$ form an invertible matrix, the zero section $p_i = 0$ is a second class constraint surface in $T^*M$. This gives a different way of representing $M$ as a second class constrained system, which is obviously equivalent to the one we use in this paper. To trace the equivalence between these two second class constrained systems, it is sufficient to change the variables $p_i \to p_i - \rho_i(x)$.

² We use this representation in the paper [23]. Independently, this possibility has been pointed out by the referee of the present paper. We thank the referee for a careful reading of the manuscript, valuable comments, and suggestions.
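The change of variables $p_i \to p_i - \rho_i(x)$ can be verified in one line. The following is a sketch that assumes only the canonical bracket (2.11) and the local relation (2.7); the symbol $\tilde p_i$ is introduced here purely for illustration and does not appear in the original text.

```latex
\begin{align*}
\tilde p_i &= p_i - \rho_i(x), \qquad \{x^i, \tilde p_j\} = \delta^i_j, \\
\{\tilde p_i, \tilde p_j\}
  &= -\{p_i, \rho_j\} - \{\rho_i, p_j\}
   = \partial_i \rho_j - \partial_j \rho_i
   = \omega_{ij}.
\end{align*}
```

These are the brackets (2.17), and the constraints (2.12) become $\theta_i = -\tilde p_i = 0$, i.e. the zero section in the new variables.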
2.2. Conversion to the first class. In this section we suggest a procedure to convert the second class constraints (2.12) into first class ones. The procedure explicitly accounts for the geometry of the original manifold $M$ and makes transparent the relationship between the BRST and Fedosov constructions.

We now enlarge the phase space further, embedding $T^*_\rho M$ into $T^*_\rho M \oplus TM$. Let $y^i$ be the natural coordinates on the fibres of the tangent bundle $TM$. In order to equip the extended phase space $T^*_\rho M \oplus TM$ with a Poisson bracket one has to engage an additional structure, a symplectic connection. In view of the properties (2.3), (2.4) and (2.5) one can equip $T^*_\rho M \oplus TM$ with a symplectic structure. Indeed, let the bracket operation $\{\cdot, \cdot\}$ on $T^*_\rho M \oplus TM$ be given by

$$\begin{aligned}
\{y^i, y^j\} &= \omega^{ij}, & \{x^i, p_j\} &= \delta^i_j, \\
\{p_i, y^j\} &= \Gamma^j_{il}\,y^l, & \{p_i, p_j\} &= \tfrac{1}{2} R_{mn;ij}\,y^m y^n, \\
\{x^i, x^j\} &= 0, & \{x^i, y^j\} &= 0.
\end{aligned} \tag{2.18}$$

Then (2.18) is a Poisson bracket provided $\Gamma$ is a symmetric symplectic connection. Considering the Jacobi identities for the Poisson brackets (2.18) between $y^i, y^j, p_k$; $y^i, p_j, p_k$; and $p_i, p_j, p_k$ one arrives at (2.3), (2.4) and (2.5) respectively. In this sense, the Poisson bracket (2.18) may be viewed as a generating structure for the Fedosov geometry, which is based precisely on the relations (2.3), (2.4) and (2.5).

In what follows, instead of smooth functions $C^\infty(T^*_\rho M \oplus TM)$ on $T^*_\rho M \oplus TM$, we consider formal power series in $y^i$ with coefficients in $C^\infty(T^*_\rho M)$. Moreover, we restrict the coefficients to be polynomials in $p_i$. The reason is that the $y^i$ serve as "conversion variables" and one has to allow formal power series in $y$. As the $p_i$ play the role of momenta, it is a usual technical restriction in physics to allow only polynomials in $p_i$. Thus, speaking about "functions" on $T^*_\rho M \oplus TM$, we mean sections of the appropriate vector bundle over $M$. The Poisson bracket (2.18) is well defined on the algebra of these "functions".
There is a simple formula which clarifies the geometrical meaning of this Poisson bracket:

$$\{p_i, f(x, y)\} = -\nabla_i f(x, y), \tag{2.19}$$

where the function $f(x, y)$ (a formal power series in $y$) is understood on the r.h.s. of (2.19) as an inhomogeneous symmetric tensor field on $M$, i.e.

$$f(x, y) = \sum_{k=0}^{\infty} f_{i_1 \ldots i_k}(x)\, y^{i_1} \cdots y^{i_k}, \qquad \{f(x, y), p_j\} = \sum_{k=0}^{\infty} \nabla_j f_{i_1 \ldots i_k}(x)\, y^{i_1} \cdots y^{i_k}. \tag{2.20}$$

The goal of the conversion procedure is to continue the second class constraints $\theta_i(x, p)$ (2.12), which are functions on $T^*_\rho M$, into $T^*_\rho M \oplus TM$,

$$\theta_i \to T_i(x, p, y), \qquad T_i|_{y=0} = \theta_i,$$

in such a way that the $T_i$ are first class in the extended manifold. Thus we look for functions $T_i$ such that

$$\{T_i, T_j\} = 0, \qquad T_i|_{y=0} = \theta_i. \tag{2.21}$$
We also require the $T_i$ to transform as the coefficients of a 1-form under a change of coordinates on $M$, as the original constraints $\theta_i$ (2.12) have the same transformation property. The existence of the Abelian conversion is established by the following³

Proposition 2.1. Equation (2.21) has a solution.

Proof. Let us look for a solution to Eq. (2.21) in the form of an explicit power series expansion in the variables $y^i$,

$$T_i = \sum_{r=0}^{\infty} \tau^r_i. \tag{2.22}$$

It turns out that it is sufficient to consider functions $\tau^r_i$, $r \ge 1$, which do not depend on the momenta $p_i$:

$$\tau^0_i = \theta_i, \qquad \tau^r_i = \tau^r_{i\,j_1 \ldots j_r}(x)\, y^{j_1} \cdots y^{j_r}. \tag{2.23}$$

The functions $\tau^r_{i\,j_1 \ldots j_r}(x)$ can be considered as the coefficients of a tensor field on $M$ that is symmetric w.r.t. all indices except the first one. In the zeroth and first order we respectively have

$$\omega_{ij} + \tau^1_{il}\,\omega^{lk}\,\tau^1_{jk} = 0, \qquad \nabla_{[i}\tau^1_{j]k} + 2\,\tau^2_{[i|lk|}\,\omega^{lm}\,\tau^1_{j]m} = 0, \tag{2.24}$$

with $[i, j]$ standing for antisymmetrisation in $i$, $j$. There is a particular solution to these equations:

$$\tau^1_{ik} = -\omega_{ik}, \qquad \tau^2_{ijk} = 0. \tag{2.25}$$
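That the "minimal" choice (2.25) solves the low-order conditions can be seen directly. The short check below is a sketch that reads the zeroth-order condition of (2.24) as $\omega_{ij} + \tau^1_{il}\omega^{lk}\tau^1_{jk} = 0$ and uses only $\omega_{il}\omega^{lk} = \delta_i^k$ (the matrices in (2.2) are mutually inverse) together with the covariant constancy of $\omega$ expressed by (2.3).

```latex
\begin{align*}
\omega_{ij} + \tau^1_{il}\,\omega^{lk}\,\tau^1_{jk}
  &= \omega_{ij} + \omega_{il}\,\omega^{lk}\,\omega_{jk}
   = \omega_{ij} + \delta_i^k\,\omega_{jk}
   = \omega_{ij} + \omega_{ji} = 0, \\
\nabla_{[i}\tau^1_{j]k}
  &= -\nabla_{[i}\omega_{j]k} = 0
  \quad\text{with}\quad \tau^2_{ijk} = 0 .
\end{align*}
```

The first line is the zeroth-order condition; the second shows that the first-order condition holds trivially once $\tau^2$ vanishes.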
Taking $\tau^1_{ij} = -\omega_{ij}$ one can in fact consider more general solutions for $\tau^2_{ijk}$. In this case, the second equation of (2.24) implies that $\Gamma_{ijk} + \tau^2_{ijk}$ are the coefficients of a symmetric symplectic connection on $M$. This arbitrariness in the solution of (2.24) can be absorbed by a redefinition of the symmetric symplectic connection entering the Poisson bracket (2.18). The ambiguity in $\tau^1_{il}$ might reflect additional geometrical structures on $M$. As we will see below, the standard Fedosov construction of the star-product on $M$ corresponds to the "minimal" solution (2.25). However, we consider here a general solution to (2.24). Taking any fixed solution to Eq. (2.24) for $\tau^1_i$, $\tau^2_i$ one sees that in the $r$-th ($r \ge 2$) order in $y$ Eq. (2.21) implies

$$\{\tau^1_{[i}, \tau^{r+1}_{j]}\} + B^r_{ij} = 0, \tag{2.26}$$

where the quantities $B^r_{ij}$ are given by

$$B^2_{ij} = \nabla_{[i}\tau^2_{j]} + \{\tau^2_i, \tau^2_j\} + \tfrac{1}{2} R_{mn;ij}\,y^m y^n, \qquad B^r_{ij} = \nabla_{[i}\tau^r_{j]} + \sum_{t=0}^{r-2} \{\tau^{r-t}_i, \tau^{t+2}_j\}, \quad r \ge 3. \tag{2.27}$$

³ The local proof of the respective existence theorem is known [6]. However, we give here a proof with due regard to the global geometry, which is based on the Poisson bracket (2.18).

Now relations (2.26) are to be considered as the equations determining $\tau^{r+1}_i$. We need the following
Lemma 2.1. Let the quantity $A_{ij}(x, y)$ be such that $A_{ij} + A_{ji} = 0$ and $\{\tau^1_i, A_{jk}\} + \mathrm{cycle}(i, j, k) = 0$. Then there exist $C_i$ such that

$$A_{ij} = \{\tau^1_{[i}, C_{j]}\}. \tag{2.28}$$

The statement is an obvious generalisation of the standard Poincaré Lemma. In the case where $\tau^1_{ik} = -\omega_{ik}$, it is precisely the Poincaré Lemma. It follows from the lemma that Eq. (2.26) has a solution iff $B^r_{ij}$ satisfies

$$\{\tau^1_i, B^r_{jk}\} + \mathrm{cycle}(i, j, k) = 0. \tag{2.29}$$

To show that this is the case let us introduce the partial sum

$$T^s_i = \sum_{t=0}^{s} \tau^t_i \tag{2.30}$$

and consider the expression

$$\{T^s_i, T^s_j\} = \sum_{t=1}^{s-1} \big( \{\tau^1_{[i}, \tau^{t+1}_{j]}\} + B^t_{ij} \big) + B^s_{ij} + \ldots, \tag{2.31}$$

where $\ldots$ denotes terms of order higher than $s$ in $y^i$. Assume that Eqs. (2.26) hold for $1 \le r \le s - 1$. Singling out the contribution of order $s - 1$ from the Jacobi identity

$$\{T^s_i, \{T^s_j, T^s_k\}\} + \mathrm{cycle}(i, j, k) = 0, \tag{2.32}$$

one arrives at

$$\{\tau^1_i, B^s_{jk}\} + \mathrm{cycle}(i, j, k) = 0. \tag{2.33}$$
It follows from the lemma that for $r = s$ Eq. (2.26), considered as an equation for $\tau^{r+1}$, admits a solution. By induction, Eq. (2.21) also admits a solution, at least locally. To show that the solution exists globally we construct a particular solution to Eq. (2.26) for $r = s$:

$$\tau^{s+1}_i = -\frac{2}{s+2}\, B^s_{ij}\,(K^{-1})^j_m\, y^m, \qquad K^j_i = \tau^1_{il}\,\omega^{lj}. \tag{2.34}$$

This solution satisfies the condition

$$\tau^{s+1}_i\,(K^{-1})^i_j\, y^j = 0, \tag{2.35}$$

which does not depend on the choice of local coordinates on $M$. Given a fixed first term $\tau^1_i$, Eq. (2.35) can be considered as a condition on the solution to Eq. (2.21). It is easy to see that the solution to (2.21) is unique provided the condition (2.35) is imposed. Indeed, the general solution to Eq. (2.26) is given by

$$\tau^{s+1}_i = \bar\tau^{s+1}_i + \{\tau^1_i, C^s\}, \tag{2.36}$$

where $\bar\tau^{s+1}_i$ is the particular solution (2.34) and $C^s = C^s(x, y)$ is an arbitrary function. One can check that condition (2.35) implies that the second term in (2.36) vanishes. Choosing $\tau_i$ to satisfy (2.35) in each domain $U_\alpha$ one gets a global solution to Eq. (2.21).
Thus we have arrived at a first class constrained system, with the constraints being (2.22). An observable of the first class constrained system is a function $A(x, y, p)$ satisfying

$$\{T_i, A\} = V_i^j\, T_j \tag{2.37}$$

for some functions $V_i^j(x, y, p)$. Observables $A(x, y, p)$ and $B(x, y, p)$ are said to be equivalent iff their difference is proportional to the constraints, i.e.

$$A - B = V^i\, T_i \tag{2.38}$$

for some functions $V^i(x, y, p)$. In each equivalence class of observables there is a unique representative which does not depend on the momenta $p_i$. Indeed, let $A(x, y, p)$ be an observable. Since it is a polynomial in $p$ it can be rewritten as

$$A = a(x, y) + a_1^i(x, y)\, T_i + \ldots, \tag{2.39}$$

where $\ldots$ stands for higher (but finite) orders in $T$. It follows from (2.37) that $a(x, y)$ satisfies

$$\{T_i, a(x, y)\} = 0. \tag{2.40}$$
Now we are going to show that the Poisson algebra of inequivalent observables is isomorphic to the algebra of functions on $M$.

Proposition 2.2. Equation (2.40) has a unique solution $a(x, y)$ satisfying $a(x, 0) = a_0(x)$ for any given function $a_0 \in C^\infty(M)$.

The proof is directly analogous to that of Proposition 2.1. Given two solutions $a(x, y)$ and $b(x, y)$ of Eq. (2.40) corresponding to the boundary conditions $a(x, 0) = a_0(x)$ and $b(x, 0) = b_0(x)$, one can check that

$$\{a, b\}|_{y=0} = \{a_0, b_0\}_M. \tag{2.41}$$

This establishes the isomorphism between the Poisson algebra of observables of the first class theory and the Poisson algebra of functions on the symplectic manifold $M$. It shows that the constructed first class constrained system is equivalent to the original unconstrained system on $M$.

2.3. An extended Poisson bracket and the BRST charge. According to the BFV quantization prescription we have to extend the phase space by introducing a Grassmann odd ghost variable $C^i$ for each constraint $T_i$ and the ghost momenta $\mathcal{P}_i$ canonically conjugate to $C^i$. We require the ghost variables $C^i$ and $\mathcal{P}_i$ to transform under a change of local coordinates on the base $M$ as the components of a vector field and a 1-form respectively. Thus in the intersection $U_\alpha \cap U_\beta$ one has

$$C^i_\alpha = C^j_\beta\, \frac{\partial x^i_\alpha}{\partial x^j_\beta}, \qquad \mathcal{P}^\alpha_i = \mathcal{P}^\beta_j\, \frac{\partial x^j_\beta}{\partial x^i_\alpha}. \tag{2.42}$$
Further define the following Poisson brackets on the extended phase space:

$$\{C^i, \mathcal{P}_j\} = \delta^i_j, \qquad \{C^i, C^j\} = 0, \qquad \{\mathcal{P}_i, \mathcal{P}_j\} = 0, \tag{2.43}$$

the brackets between ghosts and other variables vanish, and the brackets among $x$, $y$, $p$ keep their form (2.18). If the momenta $p_i$ still transformed according to (2.10), the Poisson bracket relations would not be invariantly defined. In order to make them invariant we modify the transformation properties of the momenta $p_i$: in the overlap $U_\alpha \cap U_\beta$ of coordinate neighborhoods the transition law (2.10) is modified by a ghost contribution as follows:

$$p^\alpha_i = p^\beta_j\, \frac{\partial x^j_\beta}{\partial x^i_\alpha} + \phi^{\alpha\beta}_i + \mathcal{P}^\beta_l\, C^k_\beta\, \frac{\partial x^j_\alpha}{\partial x^k_\beta}\, \frac{\partial^2 x^l_\beta}{\partial x^i_\alpha\, \partial x^j_\alpha}. \tag{2.44}$$
One can easily check that the Poisson brackets (2.43) preserve their form under a change of coordinates on $M$ and the corresponding change of the other variables. Thus the extended Poisson bracket is globally defined.

Let us explain the geometry of the extended phase space constructed above. Let $\rho$ also denote the pullback of the symplectic potential $\rho$ from the base $M$ to the odd tangent bundle $\Pi TM$ over $M$ (we view the ghost variables $C^i$ as natural coordinates on the fibres of $\Pi TM$ over $M$). It was shown in Sect. 2.1 that, given a (locally defined) 1-form $\rho_\alpha$ in each coordinate neighborhood $U_\alpha$ of a manifold $M$, one can construct a modified cotangent bundle $T^*_\rho M$ over $M$. If in addition the 1-form is such that $d\rho_\alpha = d\rho_\beta$ in the intersection $U_\alpha \cap U_\beta$, the modified cotangent bundle is equipped with the canonical symplectic structure. Applying this construction to $\Pi TM$ with the (locally defined) 1-form $\rho$ one arrives at the affine bundle $T^*_\rho(\Pi TM)$. Now it is easy to see that the Poisson bracket (2.43) is nothing but the canonical Poisson bracket on the modified cotangent bundle $T^*_\rho(\Pi TM)$, while the variable $p_i$, canonically conjugate to $x^i$, has the transition law (2.44). Finally, the whole extended phase space $E$ of the BFV formulation of the converted system is the vector bundle

$$E = T^*_\rho(\Pi TM) \oplus TM, \tag{2.45}$$

with $y^i$ being the natural coordinates on the fibres of $TM$ (here $T^*_\rho(\Pi TM)$ is considered as a vector bundle over $M$ and $\oplus$ denotes the direct sum of vector bundles). It goes without saying that "functions" on $E$ are formal power series in $y^i$. In what follows we denote the algebra of "functions" on the extended phase space by $\mathcal{F}(E)$.

According to the BFV quantization procedure we assign ghost degrees to each variable:

$$\mathrm{gh}(C^i) = 1, \qquad \mathrm{gh}(\mathcal{P}_i) = -1, \qquad \mathrm{gh}(x^i) = \mathrm{gh}(y^i) = \mathrm{gh}(p_i) = 0. \tag{2.46}$$
Thus the Poisson bracket carries vanishing ghost number. A ghost charge $G$ can be realized as

$$G = C^i \mathcal{P}_i. \tag{2.47}$$

Indeed,

$$\{G, C^i\} = C^i, \qquad \{G, \mathcal{P}_i\} = -\mathcal{P}_i, \qquad \{G, x^i\} = \{G, y^i\} = \{G, p_i\} = 0. \tag{2.48}$$
The BRST charge of the converted system is given by

$$\Omega = C^i T_i. \tag{2.49}$$

It satisfies the nilpotency condition

$$\{\Omega, \Omega\} = 0 \tag{2.50}$$

w.r.t. the extended Poisson bracket. The BRST charge carries unit ghost number:

$$\{G, \Omega\} = \Omega. \tag{2.51}$$

Relations (2.50) and (2.51) are known as the BRST algebra. A BRST observable is a function $A$ satisfying

$$\{\Omega, A\} = 0, \qquad \mathrm{gh}(A) = 0. \tag{2.52}$$
A BRST observable of the form $\{\Omega, B\}$ is called trivial. The algebra of inequivalent observables (i.e. the quotient of all observables modulo trivial ones) is thus the zero-ghost-number cohomology of the classical BRST operator $\{\Omega, \cdot\}$. The Poisson bracket on the extended phase space obviously determines a Poisson bracket in the BRST cohomology. Thus the space of inequivalent BRST observables is a Poisson algebra.

Proposition 2.3. The Poisson algebra of inequivalent observables of the BFV theory with the BRST charge (2.49) and ghost charge (2.47) is isomorphic to the Poisson algebra of functions on the symplectic manifold $M$.

Proof. Let $\mathcal{A}_0 \subset \mathcal{F}(E)$ be the algebra of functions depending on $x, y, C$ only. Any function from $\mathcal{A}_0$ is a pullback of some function on $\Pi TM \oplus TM$ to the entire extended phase space $E$ (2.45). Let also $A = A(x, y, p, C, \mathcal{P})$ be a BRST observable. Then one can check that there exist functions $a \in \mathcal{A}_0$ and $\Psi \in \mathcal{F}(E)$ such that

$$A = a + \{\Omega, \Psi\}, \qquad \mathrm{gh}(a) = 0, \qquad \{a, \Omega\} = 0. \tag{2.53}$$

As a matter of fact $a$ cannot depend on $C^i$, as it has zero ghost number. Finally, it follows from Proposition 2.2 that the Poisson algebra of functions on $M$ is isomorphic to the Poisson algebra of zero-ghost-number BRST invariant functions from $\mathcal{A}_0$.

Thus at the classical level the initial Hamiltonian dynamics on $M$ is equivalently represented as the BFV theory.

3. Quantization and Quantum Observables

In this section we find a Poisson subalgebra $\mathcal{A}$ in the algebra of functions on the extended phase space which contains all the physical observables and the generators of the BRST algebra. Thus instead of quantizing the entire extended phase space it is sufficient to quantize just the Poisson subalgebra $\mathcal{A}$. This subalgebra can be easily quantized, which results in a quantum BRST formulation of the effective first class constrained theory. The quantum BRST observables of the constructed system are isomorphic to the space of functions on the symplectic manifold $M$. This isomorphism carries the star-multiplication from the algebra of quantum observables to the algebra of functions on $M$, thus giving a deformation quantization of $M$. It turns out that the star-multiplication of the quantum BRST observables is the fibrewise multiplication of the Fedosov flat sections of the Weyl algebra bundle over $M$. Finally we interpret all the basic objects of the Fedosov deformation quantization as those of the BRST theory.
3.1. Quantization of the extended phase space. Consider the Poisson subalgebra $\mathcal{A} \subset \mathcal{F}(E)$ which is generated by the subalgebra $\mathcal{A}_0$ (the subalgebra of functions depending on $x, y, C$ only) and the elements

$$P = C^i \theta_i \equiv C^i(\rho_i(x) - p_i), \qquad G = C^i \mathcal{P}_i, \qquad P, G \in \mathcal{F}(E). \tag{3.1}$$

The reason for considering $\mathcal{A}$ is that it is a minimal Poisson subalgebra of $\mathcal{F}(E)$ which contains, at least classically, all the BRST observables and both the BRST and ghost charges (recall that $G$ is precisely the ghost charge, while the BRST charge can be represented in the form $\Omega = P + \Omega'$, with $\Omega'$ being some element of $\mathcal{A}_0$). A general homogeneous element $a$ of $\mathcal{A}$ has the form

$$a = P^m G^n a(x, y, C), \qquad m = 0, 1, \quad n = 0, 1, \ldots, N, \quad N = \dim(M), \quad a \in \mathcal{A}_0. \tag{3.2}$$

Note that the algebra $\mathcal{A}$ is not free; it can be considered as the quotient of the free algebra generated (as a supercommutative algebra) by $\mathcal{A}_0$ and the elements $P$, $G$ modulo the relations

$$P^m C^{i_1} \cdots C^{i_{N-k-m+1}}\, G^k = 0, \qquad k = 0, 1, \ldots, N+1, \quad m = 0, 1, \quad N - k - m + 1 \ge 0, \quad N = \dim(M). \tag{3.3}$$

The definition of $\mathcal{A}$ is in fact invariant in the sense that it is independent of the choice of coordinates on $M$. The basic Poisson bracket relations in $\mathcal{A}$ read

$$\begin{aligned}
\{P, P\} &= R + \omega, & \{G, P\} &= P, & \{P, a\} &= \nabla a, \\
\{G, G\} &= 0, & \{G, a\} &= C^i \frac{\partial a}{\partial C^i}, & \{a, b\} &= \omega^{ij} \frac{\partial a}{\partial y^i} \frac{\partial b}{\partial y^j},
\end{aligned} \tag{3.4}$$

where $a(x, y, C)$ and $b(x, y, C)$ are arbitrary elements of $\mathcal{A}_0$, $\nabla = C^i \nabla_i$ is a covariant differential in $\mathcal{A}_0$, and

$$R = \tfrac{1}{2} R_{kl;ij}\, C^i C^j y^k y^l, \qquad \omega = \omega_{ij}\, C^i C^j, \qquad R, \omega \in \mathcal{A}_0, \tag{3.5}$$

are the curvature of the covariant differential $\nabla$ and the symplectic form respectively. It is easy to see that $\mathcal{A}$ is closed w.r.t. the Poisson bracket and thus it is a Poisson algebra.

There is an almost obvious star product which realizes a deformation quantization of $\mathcal{A}$ as a Poisson algebra. The explicit construction of the star product in $\mathcal{A}$ is presented in Appendix 4. In order to proceed with the BRST quantization of our system we will actually engage the star multiplication in $\mathcal{A}_0 \subset \mathcal{A}$ given by the Weyl star-product

$$(a \star b)(x, y, C) = \exp\Big( -\frac{i\hbar}{2}\, \omega^{ij} \frac{\partial}{\partial y_1^i} \frac{\partial}{\partial y_2^j} \Big)\, a(x, y_1, C)\, b(x, y_2, C)\Big|_{y_1 = y_2 = y}, \tag{3.6}$$

and the following commutation relations in $\mathcal{A}$:

$$\begin{aligned}
[P, a] &= -i\hbar\, \nabla a, & [P, f(G)] &= i\hbar\, P\, \frac{\partial f}{\partial G}, \\
[P, P] &= -i\hbar\,(R + \omega), & [G, a] &= -i\hbar\, C^i \frac{\partial}{\partial C^i}\, a,
\end{aligned} \tag{3.7}$$

for any element $a \in \mathcal{A}_0$ and a function $f(G)$ depending on $G$ only. In what follows $\mathcal{A}$ and $\mathcal{A}_0$, considered as associative algebras with respect to the star-multiplication, will be denoted by $\mathcal{A}^q$ and $\mathcal{A}^q_0$ respectively.
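The Weyl star-product (3.6) can be made concrete on polynomials. The following is a minimal sketch under simplifying assumptions that are not in the original text: a single fibre with two variables $y^1, y^2$, a constant $\omega^{12} = -\omega^{21} = w$, no dependence on $x$ or $C$, and polynomials stored as dictionaries. The function names (`deriv`, `accumulate`, `star`, `subtract`) are hypothetical helpers chosen for this illustration.

```python
from math import comb, factorial

# A polynomial in y1, y2 and hbar is a dict {(a, b, k): coeff},
# standing for the sum of coeff * y1**a * y2**b * hbar**k.

def deriv(poly, n1, n2):
    # Apply (d/dy1)^n1 (d/dy2)^n2 to poly.
    out = {}
    for (a, b, k), c in poly.items():
        if a >= n1 and b >= n2:
            factor = (factorial(a) // factorial(a - n1)) * (factorial(b) // factorial(b - n2))
            key = (a - n1, b - n2, k)
            out[key] = out.get(key, 0) + c * factor
    return out

def accumulate(target, p, q, hbar_power, scale):
    # Add scale * hbar**hbar_power * p * q (ordinary product) to target.
    for (a1, b1, k1), c1 in p.items():
        for (a2, b2, k2), c2 in q.items():
            key = (a1 + a2, b1 + b2, k1 + k2 + hbar_power)
            target[key] = target.get(key, 0) + scale * c1 * c2

def y_degree(poly):
    # Highest total degree in y1, y2 (0 for the empty polynomial).
    return max((a + b for (a, b, _k) in poly), default=0)

def star(p, q, w=1.0):
    # Weyl star product (3.6) for constant omega^{12} = -omega^{21} = w.
    result = {}
    for n in range(min(y_degree(p), y_degree(q)) + 1):
        prefactor = (-0.5j * w) ** n / factorial(n)
        for k in range(n + 1):
            # Binomial term of (d1 (x) d2 - d2 (x) d1)^n with k factors d1 (x) d2.
            sign = (-1) ** (n - k)
            accumulate(result, deriv(p, k, n - k), deriv(q, n - k, k),
                       n, prefactor * comb(n, k) * sign)
    return {key: c for key, c in result.items() if abs(c) > 1e-12}

def subtract(p, q):
    out = dict(p)
    for key, c in q.items():
        out[key] = out.get(key, 0) - c
    return {key: c for key, c in out.items() if abs(c) > 1e-12}

y1 = {(1, 0, 0): 1.0}
y2 = {(0, 1, 0): 1.0}

# Star-commutator of the fibre coordinates: expected to be -i * hbar * omega^{12}.
commutator = subtract(star(y1, y2), star(y2, y1))

# Associativity check on three sample polynomials.
a = {(2, 0, 0): 1.0}                      # y1^2
b = {(1, 1, 0): 1.0}                      # y1 * y2
c = {(0, 2, 0): 1.0, (1, 0, 0): 1.0}      # y2^2 + y1
associator = subtract(star(star(a, b), c), star(a, star(b, c)))
```

With these conventions the star-commutator of $y^1$ and $y^2$ comes out as $-i\hbar\,\omega^{12}$, in line with the rule that the commutators in (3.7) are $-i\hbar$ times the Poisson brackets in (3.4), and the associator of the sample polynomials vanishes.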
3.2. The quantum BRST charge. At the classical level, all the physical observables and generators of the BRST algebra (the ghost charge $G$ and the BRST charge $\Omega$) belong to $\mathcal{A}$. Thus to perform the BRST quantization of the first class constrained system one may restrict oneself to the quantum counterpart $\mathcal{A}^q$ of the Poisson algebra $\mathcal{A}$. Consider the relations of the BRST algebra⁴

$$[\hat\Omega, \hat\Omega] \equiv 2\,\hat\Omega \star \hat\Omega = 0, \qquad [\hat G, \hat\Omega] = \hat\Omega, \qquad \hat\Omega, \hat G \in \mathcal{A}^q. \tag{3.8}$$

The first equation implies the nilpotency of the adjoint action $D$ of $\hat\Omega$ defined by

$$D A = \frac{i}{\hbar}\, [\hat\Omega, A], \qquad A \in \mathcal{A}^q. \tag{3.9}$$

Note that $D$ preserves the subalgebra $\mathcal{A}^q_0 \subset \mathcal{A}^q$ and therefore $D$ can be considered as an odd nilpotent differential in $\mathcal{A}^q_0$.

Let us show the existence of a quantum BRST charge satisfying (3.8) whose classical limit coincides with the classical BRST charge $\Omega$ from the previous section. Instead of finding $\hbar$-corrections to the classical BRST charge it is convenient to construct $\hat\Omega$ at the quantum level from the very beginning. In order to formulate the boundary conditions to be imposed on $\hat\Omega$, and for technical convenience, we introduce a useful degree [20, 14]. Namely, we prescribe the following degrees to the variables:

$$\deg(x^i) = \deg(C^j) = 0, \qquad \deg(p_i) = \deg(\mathcal{P}_i) = 2, \qquad \deg(y^i) = 1, \qquad \deg(\hbar) = 2. \tag{3.10}$$

The star-commutator in $\mathcal{A}^q$ evidently preserves the degree.

Let us expand $\hat\Omega$ into the sum of homogeneous components

$$\hat\Omega = \sum_{r=0}^{\infty} \Omega^r, \qquad \deg(\Omega^r) = r. \tag{3.11}$$

Given a classical BRST charge (2.49) which starts as $\Omega = C^i \rho_i - C^i p_i + C^i \tau^1_{ij} y^j + C^i \tau^2_{ijl} y^j y^l + \ldots$, one can formulate the boundary condition on the solution of (3.8) as follows:

$$\Omega^0 = C^i \rho_i, \qquad \Omega^1 = C^i \tau^1_{ij}\, y^j, \qquad \Omega^2 = -C^i p_i + C^i \tau^2_{ijl}\, y^j y^l. \tag{3.12}$$

Proposition 3.1. Equations (3.8), considered as equations for $\hat\Omega$, have a solution satisfying the boundary condition (3.12).

⁴ Here we have introduced a separate notation $\hat G$ for the quantum ghost charge because it can differ from the classical ghost charge $G = C^i \mathcal{P}_i$ by an imaginary constant $\frac{i\hbar N}{2}$. This constant, of course, cannot contribute to the commutation relations.
Proof. Equation (3.8) evidently holds in the lowest order w.r.t. the degree (3.10) provided the respective classical BRST charge satisfies $\{\Omega, \Omega\} = 0$. At the classical level the higher order terms in the expansion of $\Omega$ w.r.t. $y$ do not depend on the momenta $p_i$, $\mathcal{P}_i$. Thus these terms belong to $\mathcal{A}_0$. It is useful to assume that the same occurs at the quantum level:

$$\Omega^r \in \mathcal{A}^q_0, \qquad r \ge 3. \tag{3.13}$$

In degree $r + 2$ ($r \ge 2$) Eq. (3.8) implies

$$\delta\Omega^{r+1} + B^r = 0, \tag{3.14}$$

where the quantity $B^r$ is determined by $\Omega^t$, $t \le r$:

$$B^r = \frac{i}{2\hbar} \sum_{t=0}^{r-2} [\Omega^{t+2}, \Omega^{r-t}], \qquad \deg(B^r) = r, \tag{3.15}$$

and $\delta : \mathcal{A}^q_0 \to \mathcal{A}^q_0$ stands for

$$\delta a = \frac{i}{\hbar}\, [\Omega^1, a] = C^i \tau^1_{ij}\, \omega^{jl} \frac{\partial}{\partial y^l}\, a, \qquad a \in \mathcal{A}^q_0. \tag{3.16}$$

Note that $\delta$ is obviously nilpotent in $\mathcal{A}^q_0$. It follows from the nilpotency of $\delta$ that the compatibility condition for Eq. (3.14) is $\delta B^r = 0$. In fact it is also a sufficient condition for Eq. (3.14) to admit a solution. Indeed, the cohomology of the differential $\delta$ is trivial when evaluated on functions at least linear in $C$. To show this, we construct the "contracting homotopy" $\delta^{-1}$. Namely, let $\delta^{-1}$ be defined by its action on a homogeneous element

$$a_{pq} = a_{i_1, \ldots, i_p;\, j_1, \ldots, j_q}(x)\, y^{i_1} \cdots y^{i_p}\, C^{j_1} \cdots C^{j_q} \tag{3.17}$$

by

$$\delta^{-1} a_{pq} = \frac{1}{p+q}\, y^i\, (K^{-1})^j_i\, \frac{\partial}{\partial C^j}\, a_{pq}, \quad p + q \ne 0, \qquad \delta^{-1} a_{00} = 0, \tag{3.18}$$

where $(K^{-1})^j_i$ is inverse to $K^j_i = \tau^1_{il}\, \omega^{lj}$. For any $a = a(x, y, C)$ we have

$$a|_{y=C=0} + \delta\delta^{-1} a + \delta^{-1}\delta a = a. \tag{3.19}$$
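The homotopy identity (3.19) can be checked by hand on the simplest nontrivial element. The computation below is a sketch assuming the "minimal" solution $\tau^1_{ik} = -\omega_{ik}$, for which $K^j_i = -\delta^j_i$, so that $\delta = -C^l\,\partial/\partial y^l$ and $\delta^{-1} = -\frac{1}{p+q}\, y^k\, \partial/\partial C^k$ (left derivative). The test element $a = a_{ij}(x)\, y^i C^j$ with $p = q = 1$ is a choice made here for illustration.

```latex
\begin{align*}
\delta^{-1} a &= -\tfrac{1}{2}\, a_{ik}\, y^i y^k,
  &\delta\delta^{-1} a &= \tfrac{1}{2}\,(a_{ij} + a_{ji})\, y^i C^j, \\
\delta a &= -a_{lj}\, C^l C^j,
  &\delta^{-1}\delta a &= \tfrac{1}{2}\,(a_{ij} - a_{ji})\, y^i C^j, \\
a|_{y=C=0} &= 0,
  &\delta\delta^{-1} a + \delta^{-1}\delta a &= a_{ij}\, y^i C^j = a .
\end{align*}
```

The symmetric part of $a_{ij}$ is recovered by $\delta\delta^{-1}$ and the antisymmetric part by $\delta^{-1}\delta$, which is the usual mechanism of the Poincaré lemma.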
Since $B^r$ is quadratic in $C$, the first term in (3.19) vanishes for $a = B^r$, and $\delta B^r = 0$ then implies $B^r = \delta\delta^{-1} B^r$, which in turn implies that Eq. (3.14) admits a solution. Let us show that the necessary condition $\delta B^r = 0$ is fulfilled. To this end assume that the $\Omega^r$ satisfy (3.14) for $r \le s$. Then the Jacobi identity

$$\Big[ \sum_{t=0}^{s} \Omega^t, \Big[ \sum_{t=0}^{s} \Omega^t, \sum_{t=0}^{s} \Omega^t \Big] \Big] = 0 \tag{3.20}$$

implies in degree $s + 3$ that $\delta B^s = 0$. The particular solution to (3.14) for $r = s$ evidently reads

$$\Omega^{s+1} = -\delta^{-1} B^s. \tag{3.21}$$
Iteratively applying this procedure one can construct a solution to Eq. (3.8), at least locally. To show that Eq. (3.8) admits a global solution we note that the operators $\delta$ and $\delta^{-1}$ are defined in a coordinate independent way. This implies that the particular solution (3.21) does not depend on the choice of the local coordinate system and thus is a global solution.

The quantum BRST charge constructed above obviously satisfies

$$\delta^{-1} \Omega^r = 0, \qquad r \ge 3, \tag{3.22}$$

which can be considered as an additional condition on the solution to Eq. (3.8). One can actually show that the solution to Eq. (3.8) is unique provided condition (3.22) is imposed on $\hat\Omega$.

Thus we have shown how to construct the quantum BRST charge associated with the first class constraints $T_i$. The operator $\delta$, which is extensively used in the proof, plays a crucial role in the BRST formalism. In the case of a first class constrained system, the counterpart of $\delta$ is known as the Koszul-Tate differential associated with the constraint surface [24–26], while in the Lagrangian BV quantization [27] the respective counterpart of $\delta$ is the Koszul-Tate differential associated with the stationary surface.
3.3. An algebra of the quantum BRST observables and the star-multiplication. Observables in the BFV quantization are recognized as zero-ghost-number values closed w.r.t. the adjoint action D of the BRST charge $\hat\Omega$, modulo exact ones; $\hat a$ is an observable iff
$$D\hat a \equiv \frac{i}{\hbar}[\hat\Omega, \hat a]_\star = 0, \qquad [\hat G, \hat a]_\star = 0. \tag{3.23}$$
Two observables are said to be equivalent iff their difference is D-exact. The space of inequivalent observables is thus the zero-ghost-number cohomology of D. Initially, the classical observables are the functions on the symplectic manifold M. Now M is embedded into the extended phase space E (2.45). According to the BFV prescription, the quantum extension of the initial observable a is an operator (a symbol in our case) $\hat a$ of the quantum converted system that solves Eqs. (3.23) subject to the boundary condition
$$\hat a|_{y=0} = a_0(x). \tag{3.24}$$
Let us consider the algebra of quantum BRST observables in $A^q$. We first study observables in $A^q_0 \subset A^q$.

Proposition 3.2. Equations (3.23) have a unique solution belonging to $A^q_0$ for each initial observable $a_0 = a_0(x)$.

Proof. Consider an expansion of $\hat a$ in homogeneous components,
$$\hat a = \sum_{r=0}^{\infty} a^r, \qquad \deg(a^r) = r. \tag{3.25}$$
The boundary condition (3.24) implies $a^0 = a_0(x)$. Equation (3.23) obviously holds in the first degree. In the higher degrees we have
$$\delta a^{r+1} + B^r = 0, \tag{3.26}$$
where $B^r$ is given by
$$B^r = \frac{i}{\hbar} \sum_{t=0}^{r-2} [\Omega^{t+2}, a^{r-t}]_\star, \qquad \deg(B^r) = r, \tag{3.27}$$
and $\Omega^t$ are the terms of the expansion (3.11) of $\hat\Omega$ w.r.t. degree. Similarly to the proof of Proposition 3.1, the necessary and sufficient condition for (3.26) to admit a solution is $\delta B^r = 0$. To show that the condition indeed holds, assume that Eqs. (3.26) hold for all $r \le s$. Then consider the identity
$$[\hat\Omega, [\hat\Omega, \hat a]_\star]_\star = 0. \tag{3.28}$$
Singling out the contribution of degree $s+3$ we arrive at
$$\delta B^s = 0. \tag{3.29}$$
The particular solution to Eq. (3.26) for $r = s$ is
$$a^{s+1} = -\delta^{-1} B^s. \tag{3.30}$$
Iteratively applying this procedure one arrives at the particular solution to Eq. (3.23) satisfying the boundary condition $\hat a|_{y=0} = a_0(x)$.

Finally let us show the uniqueness. Taking into account that $a^{s+1}$ belongs to $A^q_0$ and $\mathrm{gh}(a^{s+1}) = 0$, we conclude that $a^{s+1}$ does not depend on C. Thus the general solution to Eq. (3.26) is given by
$$a^{s+1} = -\delta^{-1} B^s + C^{s+1}(x, \hbar). \tag{3.31}$$
It is easy to see that the boundary condition requires $C^{s+1}(x, \hbar) = 0$ (recall that $(\delta^{-1} B^n)|_{y=0} = 0$). Since Eq. (3.23) is linear, it has a unique solution $\hat a$ satisfying the boundary condition $\hat a|_{y=0} = a_0(x, \hbar)$ even if the initial observable $a_0$ is allowed to depend formally on $\hbar$. □

It follows from Proposition 3.2 that the space of inequivalent quantum observables coincides with the space of classical observables (functions on M) tensored by formal power series in $\hbar$. In other words, the space of zero-ghost-number cohomology of D evaluated in $A^q_0$ is isomorphic to $C^\infty(M) \otimes [[\hbar]]$, where $C^\infty(M)$ is the algebra of functions on M and $[[\hbar]]$ denotes the space of formal power series in $\hbar$. In fact an even stronger statement holds.

Theorem 3.1. The space of inequivalent quantum BRST observables, i.e. the zero-ghost-number cohomology evaluated in $A^q$, is isomorphic to $C^\infty(M) \otimes [[\hbar]]$.
A proof follows from the observation that each zero-ghost-number cohomology class from $A^q$ has a representative in $A^q_0$. Further, Proposition 3.2 implies that the representative is unique and provides us with an explicit isomorphism between $C^\infty(M) \otimes [[\hbar]]$ and the space of inequivalent quantum observables.

Since the BRST differential $D = \frac{i}{\hbar}[\hat\Omega, \cdot\,]_\star$ is an inner derivation in $A^q$, the star multiplication in $A^q$ determines the star multiplication in the space of quantum BRST cohomology. Making use of the isomorphism from Theorem 3.1 one can equip $C^\infty(M) \otimes [[\hbar]]$ with the associative multiplication, thereby determining a star product on M. As the algebra of quantum BRST observables is a deformation of the Poisson algebra of classical ones, which in turn is isomorphic to the Poisson algebra $C^\infty(M)$, the star product on M satisfies the standard correspondence conditions
$$(\hat a \star \hat b)|_{y=\hbar=0} = a_0 b_0, \qquad \frac{i}{\hbar}[\hat a, \hat b]_\star\big|_{y=\hbar=0} = \{a_0, b_0\}_M. \tag{3.32}$$
Here $\hat a$ and $\hat b$ are the symbols obtained by means of Proposition 3.2 starting from $a_0, b_0 \in C^\infty(M)$.

3.4. BFV-Fedosov correspondence. To establish the correspondence with the Fedosov construction of the star product we note that the quantum algebra $A^q_0$, consisting of functions of x, y, C (recall that in the BRST approach functions of the auxiliary variables y are formal power series in y), is precisely the algebra of sections of the Weyl algebra bundle from [14], provided one identifies the ghosts $C^i$ with the basis 1-forms $dx^i$.

Let us consider the quantum BRST charge $\hat\Omega$ corresponding to the boundary condition (2.25). The adjoint action of $\hat\Omega$ on $A^q_0$,
$$Da \equiv \frac{i}{\hbar}[\hat\Omega, a]_\star = \Big(C^i \nabla_i - C^i \frac{\partial}{\partial y^i}\Big) a + \frac{i}{\hbar}\Big[\sum_{t=3}^{\infty} \Omega^t, a\Big]_\star, \qquad a \in A^q_0, \tag{3.33}$$
is precisely the Fedosov connection in the Weyl algebra bundle. Indeed, in the Fedosov-like notations
$$\delta = C^i \frac{\partial}{\partial y^i}, \qquad \partial = C^i \nabla_i, \qquad r = \sum_{t=3}^{\infty} \Omega^t \in A_0, \tag{3.34}$$
the star-commutator with the quantum BRST charge $\hat\Omega$ can be rewritten as
$$Da \equiv \frac{i}{\hbar}[\hat\Omega, a]_\star = (\partial - \delta) a + \frac{i}{\hbar}[r, a]_\star, \tag{3.35}$$
which makes transparent the identification of $\frac{i}{\hbar}[\hat\Omega, \cdot\,]_\star$ with the Fedosov connection in the Weyl algebra bundle. In particular, the zero-curvature condition is precisely the BFV quantum master equation $[\hat\Omega, \hat\Omega]_\star = 0$.

It follows from Theorem 3.1 that each equivalence class of the quantum BRST observables has a unique representative in $A^q_0$. Thus the inequivalent quantum BRST observables are the flat sections of the Weyl algebra bundle. In turn, the star multiplication (3.6) of quantum BRST observables (considered as the BRST invariant functions from $A^q_0$) is nothing but the Weyl product of the Fedosov flat sections of the Weyl algebra bundle.

There is a certain distinction between BRST and Fedosov quantization. Unlike the Fedosov Abelian connection, the adjoint action of the BRST charge can be realized as an inner derivation of the associative algebra $A^q$. In particular, in the BRST approach the covariant differential D is strictly flat.
4. Conclusion

Let us summarize the results of this paper. First we construct a global embedding of a general symplectic manifold M into the modified cotangent bundle $T^*_\rho M$ as a second class constrained surface. Then we elaborate a globally defined procedure which converts the second class constrained system into a first class one; this allows us to construct the BRST description of the Hamiltonian dynamics on the original symplectic manifold. We explicitly establish the structure of the classical BRST cohomology in this theory and perform a straightforward quantum deformation of the classical Poisson algebra which contains all the observables and the BRST algebra generators. As all the values on the original symplectic manifold are identified with the observables of the BRST theory, we have thus quantized the general symplectic manifold. Finally, we establish a detailed relationship between the quantum BFV-BRST theory of symplectic manifolds and Fedosov's deformation quantization.

The construction of the BRST embedding of the second class constrained theory, being done by means of the cohomological technique, allows one to recognize the conversion procedure as a kind of deformation of the classical Poisson algebra of the second class system. This deformation differs essentially from the one usually studied in relation to switching on the interactions in classical gauge theories [28], although the cohomological technique is quite similar. As soon as the classical deformation has been performed, the problem of the quantum deformation becomes transparent in the theory. Thus in the BRST approach a part of the deformation quantization problem is transformed, in a sense, into the problem of another deformation, classical in essence, while the quantization itself is almost obvious in the classically deformed system.

Acknowledgements. We are grateful to I. A. Batalin for fruitful discussions on various problems considered in this paper.
We also wish to thank V. A. Dolgushev, A. V. Karabegov, A. M. Semikhatov, A. A. Sharapov, I. Yu. Tipunin and I. V. Tyutin. The work of MAG is partially supported by the RFBR grant 99-01-00980, INTAS-YSF-98-156, Russian Federation President Grant 99-15-96037 and Landau Scholarship Foundation, Forschungszentrum Jülich. The work of SLL is supported by the RFBR grant 00-02-17956.
Appendix A. Star-Multiplication in A

Here we present an explicit form of the star product in the Poisson algebra A. Recall that A is a Poisson subalgebra of F(E) generated by $A_0$ (the Poisson algebra of functions depending on x, y, C only) and the elements $P = C^i \theta_i$, $G = C^i P_i$. The star product in $A_0$ can be defined by the Weyl multiplication (3.6). As for general elements of A, let us first consider the P-independent ones. For general P-independent elements A(x, y, C, G) and B(x, y, C, G) we postulate
$$A(x, y, C, G) \star B(x, y, C, G) = \exp\Big\{-i\hbar\Big(G \frac{\partial}{\partial G_1}\frac{\partial}{\partial G_2} + C^i \frac{\partial}{\partial C_2^i}\frac{\partial}{\partial G_1}\Big)\Big\}\, A(x, y, C_1, G_1) \star B(x, y, C_2, G_2)\Big|_{C_1 = C_2 = C,\; G_1 = G_2 = G}, \tag{A.1}$$
where the ⋆ on the r.h.s. is the Weyl multiplication (3.6) acting on y only. Finally, taking into account the commutation relations (3.7), for the P-dependent elements one can choose
$$\begin{aligned}
(PA) \star B &= P(A \star B),\\
A \star (PB) &= (-1)^{p(A)} P(A \star B) - i\hbar\, P\Big(\frac{\partial A}{\partial G} \star B\Big) + i\hbar\,\big((\nabla A) \star B\big) - (i\hbar)^2 \Big(\nabla \frac{\partial A}{\partial G}\Big) \star B,\\
(PA) \star (PB) &= (-1)^{p(A)} \frac{i\hbar}{2}(R + \omega) \star A \star B + \frac{(i\hbar)^2}{2}(R + \omega) \star \frac{\partial A}{\partial G} \star B\\
&\quad + i\hbar\, P\big((\nabla A) \star B\big) - (i\hbar)^2 P\Big(\Big(\nabla \frac{\partial A}{\partial G}\Big) \star B\Big),
\end{aligned} \tag{A.2}$$
where A and B are general P-independent elements from A and the star product on the right-hand sides of Eqs. (A.2) is that given by (A.1). The associativity of the multiplication (A.1) and (A.2) can be verified directly. This star multiplication can be thought of as that of P, x, y, C, G-symbols which is also a Weyl symbol w.r.t. the y-variables. If one considered the star product in A as that on functions explicitly depending on p, P, then it would correspond to the p, x, y, C, P-symbol.

References

1. Fradkin, E.S. and Vilkovisky, G.A.: Quantization of relativistic systems with constraints. Phys. Lett. B 55, 224–226 (1975); Batalin, I.A. and Vilkovisky, G.A.: Relativistic S-matrix of dynamical systems with boson and fermion constraints. Phys. Lett. B 69, 309–312 (1977); Batalin, I.A. and Fradkin, E.S.: A generalized canonical formalism and quantization of reducible gauge theories. Phys. Lett. B 122, 157–164 (1983); Batalin, I.A. and Fradkin, E.S.: Operatorial quantization of relativistic dynamical systems subject to first class constraints. Phys. Lett. B 128, 303–308 (1983); Batalin, I.A., Fradkin, E.S.: Operatorial quantization of dynamical systems subject to constraints. A further study of the construction. Ann. Inst. Henri Poincaré 49, 145–214 (1988)
2. Henneaux, M. and Teitelboim, C.: Quantization of gauge systems. Princeton, NJ: Princeton University Press, 1992, 520 p.
3. Faddeev, L.D., Shatashvili, S.L.: Realization of the Schwinger term in the Gauss law and the possibility of correct quantization of a theory with anomalies. Phys. Lett. B 167, 225 (1986)
4. Batalin, I.A. and Fradkin, E.S.: Operator Quantization of Dynamical Systems with Irreducible First and Second Class Constraints. Phys. Lett. B 180, 157 (1986)
5. Batalin, I.A. and Fradkin, E.S.: Operatorial Quantization of Dynamical Systems Subject to Second Class Constraints. Nucl. Phys. B 279, 514 (1987)
6. Batalin, I.A. and Tyutin, I.V.: Existence theorem for the effective gauge algebra in the generalized canonical formalism with Abelian conversion of second class constraints. Int. J. Mod. Phys. A 6, 3255 (1991)
7. Connes, A. and Rieffel, M.: Yang–Mills for noncommutative two-tori. Contemp. Math. Oper. Alg. Math. Phys. 62, AMS, 1987
8. Vasiliev, M.A.: Higher Spin Gauge Theories: Star-Product and AdS Space. Contributed article to Gelfand's Memorial Volume, M. Shifman, ed., Singapore: World Scientific, hep-th/9910096
9. Seiberg, N. and Witten, E.: Noncommutative Geometry and String Theory. JHEP 9909 (1999) 032, hep-th/9908142
10. Connes, A., Douglas, M.R. and Schwarz, A.: Noncommutative Geometry and Matrix Theory: Compactification on tori. JHEP 9802:003 (1998), hep-th/9711162
11. Berezin, F.: Quantization. Izv. Mat. Nauk 38, 1109–1165 (1974)
12. Bayen, F., Flato, M., Fronsdal, C., Lichnerowicz, A. and Sternheimer, D.: Deformation Theory and Quantization. 1. Deformations of Symplectic Structures. Ann. Phys. 111, 61 (1978)
13. De Wilde, M. and Lecomte, P.B.A.: Existence of star-products and of formal deformations of the Poisson Lie algebra of arbitrary symplectic manifolds. Lett. Math. Phys. 7, 487–496 (1983)
14. Fedosov, B.V.: A Simple Geometrical Construction of Deformation Quantization. J. Diff. Geom. 40, 213–238 (1994)
15. Fedosov, B.: Deformation quantization and index theory. Berlin: Akademie Verlag, 1996, 325 p., (Mathematical topics 9)
16. Kontsevich, M.: Deformation quantization of Poisson manifolds, I. q-alg/9709040
17. Cattaneo, A.S. and Felder, G.: A path integral approach to the Kontsevich quantization formula. q-alg/9902090
18. Bordemann, M.: On the deformation quantization of super-Poisson brackets. Preprint Freiburg FR-THEP-96/8, q-alg/9605038 (May 1996)
19. Batalin, G. and Fradkin, E.S.: Operator Quantization of Dynamical Systems with Curved Phase Space. Nucl. Phys. B 326, 701 (1989)
20. Fradkin, E.S. and Linetsky, V.Y.: BFV approach to geometric quantization. Nucl. Phys. B 431, 569 (1994); BFV Quantization on Hermitian Symmetric Spaces. Nucl. Phys. B 444, 577–601 (1995)
21. Batalin, I.A., Fradkin, E.S. and Fradkina, T.E.: Generalized Canonical Quantization of Dynamical Systems with Constraints and Curved Phase Space. Nucl. Phys. B 332, 723 (1990)
22. Gelfand, I., Retakh, V. and Shubin, M.: Fedosov Manifolds. dg-ga/9707024
23. Batalin, I.A., Grigoriev, M.A. and Lyakhovich, S.L.: Star Product for Second Class Constraint Systems from a BRST Theory. hep-th/0001089
24. Henneaux, M. and Teitelboim, C.: BRST Cohomology in Classical Mechanics. Commun. Math. Phys. 115, 213–230 (1988); Fisch, J., Henneaux, M., Stasheff, J. and Teitelboim, C.: Existence, Uniqueness and Cohomology of the Classical BRST Charge with Ghosts of Ghosts. Commun. Math. Phys. 120, 379–407 (1989)
25. Henneaux, M.: Hamiltonian Form of the Path Integral for Theories with a Gauge Freedom. Phys. Rep. 126, 1 (1985)
26. Dubois-Violette, M.: Dynamic Constraint Systems: Homologic Approach. Ann. Inst. Fourier Grenoble 37, 45–47 (1987)
27. Batalin, I.A. and Vilkovisky, G.A.: Gauge algebra and quantization. Phys. Lett. B 102, 27 (1981); Quantization of gauge theories with linearly dependent generators. Phys. Rev. D 28, 2567 (1983)
28. Henneaux, M.: Consistent interactions between gauge fields: The cohomological approach. Talk given at Conference on Secondary Calculus and Cohomological Physics, Moscow, Russia, 24–31 Aug. 1997, hep-th/9712226; Barnich, G., Brandt, F. and Henneaux, M.: Local BRST cohomology in gauge theories. hep-th/0002245

Communicated by A. Connes
Commun. Math. Phys. 218, 459 – 477 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Surface States and Spectra
Vojkan Jakšić1,*, Yoram Last2,3
1 Department of Mathematics, Johns Hopkins University, 3400 N. Charles Street, 404 Krieger Hall, Baltimore, MD 21218, USA
2 Institute of Mathematics, The Hebrew University, 91904 Jerusalem, Israel
3 Department of Mathematics, California Institute of Technology, Pasadena, CA 91125, USA
Received: 18 September 2000 / Accepted: 21 November 2000
Abstract: Let $\mathbb{Z}^{d+1}_+ = \mathbb{Z}^d \times \mathbb{Z}_+$, let $H_0$ be the discrete Laplacian on the Hilbert space $l^2(\mathbb{Z}^{d+1}_+)$ with a Dirichlet boundary condition, and let V be a potential supported on the boundary $\partial\mathbb{Z}^{d+1}_+$. We introduce the notions of surface states and surface spectrum of the operator $H = H_0 + V$ and explore their properties. Our main result is that if the potential V is random and if the disorder is either large or small enough, then in dimension two H has no surface spectrum on $\sigma(H_0)$ with probability one. To prove this result we combine Aizenman–Molchanov theory with techniques of scattering theory.
1. Introduction

This paper is a direct continuation of [JL1] and deals with the following model: Let $d \ge 1$ be given, and let $\mathbb{Z}^{d+1}_+ = \mathbb{Z}^d \times \mathbb{Z}_+$, where $\mathbb{Z}_+ = \{0, 1, \cdots\}$. We denote the points in $\mathbb{Z}^{d+1}_+$ by (n, x), for $n \in \mathbb{Z}^d$ and $x \in \mathbb{Z}_+$. Let $H_0$ be the discrete (centered) Laplacian on the Hilbert space $\mathcal{H} := l^2(\mathbb{Z}^{d+1}_+)$ with a Dirichlet boundary condition. The operator $H_0$ acts as
$$(H_0\psi)(n, x) = \begin{cases} \sum_{|n-n'|_+ + |x-x'| = 1} \psi(n', x') & \text{if } x > 0,\\[2pt] \psi(n, 1) + \sum_{|n-n'|_+ = 1} \psi(n', 0) & \text{if } x = 0, \end{cases}$$
where $|n|_+ = \sum_{j=1}^{d} |n_j|$. Let V be a potential supported on the boundary $\partial\mathbb{Z}^{d+1}_+ = \mathbb{Z}^d$ (that is, V acts as $(V\psi)(n, x) = V(n, x)\psi(n, x)$ and $V(n, x) = 0$ if $x > 0$) and
$$H = H_0 + V. \tag{1.1}$$
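To make the action of $H_0$ and of the boundary potential concrete, the following sketch applies $H = H_0 + V$ to a finitely supported ψ in the case d = 1. The function name `apply_H` and the dictionary encoding of ψ and V are illustrative assumptions, not part of the paper.

```python
def apply_H(psi, V=None):
    """Apply H = H0 + V to a finitely supported psi on Z x Z_+ (the case d = 1).

    psi: dict {(n, x): value} with x >= 0.
    V:   dict {n: value}, the boundary potential at the sites (n, 0).
    H0 sums psi over nearest neighbours; the Dirichlet condition simply
    drops the missing neighbour at x = -1.
    """
    V = V or {}
    out = {}
    for (n, x), val in psi.items():
        # H0 is symmetric, so each site passes its value to its neighbours.
        for site in ((n - 1, x), (n + 1, x), (n, x + 1), (n, x - 1)):
            if site[1] >= 0:
                out[site] = out.get(site, 0) + val
        # V acts only on the boundary layer x = 0.
        if x == 0 and n in V:
            out[(n, x)] = out.get((n, x), 0) + V[n] * val
    return out

boundary = apply_H({(0, 0): 1.0})           # delta function on the boundary
bulk = apply_H({(0, 2): 1.0})               # delta function in the bulk
with_V = apply_H({(0, 0): 1.0}, {0: 2.5})   # boundary potential V(0) = 2.5
print(len(boundary), len(bulk), with_V[(0, 0)])
```

A boundary site has three neighbours and a bulk site four, so every row of $H_0$ has at most 2(d+1) = 4 unit entries, in line with the bound $\|H_0\| \le 2(d+1)$.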
* On leave from Department of Mathematics and Statistics, University of Ottawa, 585 King Edward Avenue, Ottawa, ON, K1N 6N5, Canada.
The model (1.1) and the questions we will study are motivated by the physics of disordered surfaces (see [JMP1, KP, P]). In the first part of the paper we propose a dynamical definition of the surface states and surface spectrum of the operator H. We remark that there are some alternative approaches in the literature (see, e.g., [JMP1, DS]) and they will be compared with our proposal in [JMP2]. In the second part of the paper we consider the case where the boundary potential V is a random process on $\mathbb{Z}^d$. This case is of particular physical importance. Our main result is that if $d + 1 = 2$ and if the disorder is either large or small enough, then with probability one H has no surface spectra on $\sigma(H_0)$. It is known that under these conditions the spectrum of H outside $\sigma(H_0)$ is (with probability one) pure point with exponentially decaying eigenfunctions [JM2]. Thus, in particular, our result rules out the existence of propagating surface states in dimension two if the disorder is large or small enough.

The proof of our main result (Theorems 1.4 and 1.5 below) combines Aizenman–Molchanov theory [AM] with techniques of scattering theory. More precisely, we use Aizenman–Molchanov theory to prove a "localization" estimate for matrix elements of the resolvent $(H - z)^{-1}$ along the boundary $\partial\mathbb{Z}^{d+1}_+$. Such an estimate implies that propagation of wave packets along the boundary is suppressed. We then combine this estimate with techniques of scattering theory to show that wave packets with energies in $\sigma(H_0)$ must dissolve into the bulk and that there are thus no surface spectra on $\sigma(H_0)$. A perhaps surprising aspect of our argument is that Aizenman–Molchanov theory is used to establish a result which is in spirit opposite to "localization" of wave packets. We will comment further on this point in Sect. 1.2.

1.1. Surface states and spectra. Surface states of the model (1.1) are wave packets which remain localized near the boundary $\partial\mathbb{Z}^{d+1}_+$ for all time. This heuristic description can be made mathematically rigorous as follows: For $R \in \mathbb{Z}_+$ we set
$$\Gamma_R := \{(n, x) : n \in \mathbb{Z}^d,\ 0 \le x \le R\}, \qquad \bar\Gamma_R := \mathbb{Z}^{d+1}_+ \setminus \Gamma_R.$$
We denote by $1_R$, $\bar 1_R$ the characteristic functions of the sets $\Gamma_R$, $\bar\Gamma_R$, and use the same symbols for the corresponding multiplication operators (which are orthogonal projections). Obviously, $1 = 1_R + \bar 1_R$. All the results of this section hold for an arbitrary boundary potential V. We say that a vector ψ is a surface state of the operator H if
$$\lim_{R\to\infty} \liminf_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \|1_R e^{-itH}\psi\|^2\, dt = \|\psi\|^2. \tag{1.2}$$
We denote the set of all surface states by $\mathcal{H}_s(H)$. Condition (1.2) is equivalent to
$$\lim_{R\to\infty} \limsup_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \|\bar 1_R e^{-itH}\psi\|^2\, dt = 0, \tag{1.3}$$
and from (1.3) it follows easily that $\mathcal{H}_s(H)$ is a closed subspace of $\mathcal{H}$ invariant under H.

Notation. In the sequel we denote by $1_\Theta(H)$ the spectral projection of H onto a Borel set $\Theta \subset \mathbb{R}$. We will also use the shorthand $c_d = 2(d + 1)$, so $\sigma(H_0) = [-c_d, c_d]$.
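The equivalence of conditions (1.2) and (1.3) rests on a single identity, spelled out here as a routine step. Writing $1_R$ and $\bar 1_R$ for the projections onto the strip $0 \le x \le R$ and onto its complement, and using only that these are complementary orthogonal projections and that $e^{-itH}$ is unitary:

```latex
\|\psi\|^2 = \|e^{-itH}\psi\|^2
           = \|1_R e^{-itH}\psi\|^2 + \|\bar 1_R e^{-itH}\psi\|^2
\quad\Longrightarrow\quad
\frac{1}{2T}\int_{-T}^{T} \|1_R e^{-itH}\psi\|^2\,dt
  = \|\psi\|^2 - \frac{1}{2T}\int_{-T}^{T} \|\bar 1_R e^{-itH}\psi\|^2\,dt .
```

Taking $\liminf_{T\to\infty}$ on the left therefore corresponds to $\limsup_{T\to\infty}$ of the time average on the right, which is how the liminf in (1.2) turns into the limsup in (1.3).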
We recall that for any V, $[-c_d, c_d] \subset \sigma(H)$ (see [JL1]). The basic property of the set $\mathcal{H}_s(H)$ is:

Theorem 1.1. $\mathrm{Ran}\, 1_{\mathbb{R}\setminus(-c_d, c_d)}(H) \subset \mathcal{H}_s(H)$.

The surface spectrum of the operator H, $\sigma_s(H)$, is defined by $\sigma_s(H) = \sigma(H|_{\mathcal{H}_s})$. Obviously, $\sigma_s(H)$ is a closed set and $\sigma_s(H) \subset \sigma(H)$. Theorem 1.1 yields that $\sigma(H) \setminus \sigma(H_0) \subset \sigma_s(H)$. An important question is whether or not H has some surface spectra on $\sigma(H_0)$. The only known examples where this happens are periodic potentials and some special boundary potentials for which H has embedded eigenvalues in $\sigma(H_0)$ [MV]. In this paper we will describe criteria under which H has no surface spectra on $\sigma(H_0)$.

In [CS, JL1] it was proven that for any boundary potential V, the wave operators
$$\Omega^\pm = \mathrm{s}\text{-}\lim_{t\to\mp\infty} e^{itH} e^{-itH_0} \tag{1.4}$$
exist. Our next result identifies their ranges. We adopt the shorthand $\langle t\rangle = (1 + t^2)^{1/2}$. Let
$$\mathcal{T} := \big\{\psi : \forall\, R, k \ge 0,\ \|1_R e^{-itH}\psi\| = O(\langle t\rangle^{-k})\big\}, \qquad \mathcal{F} := \Big\{\psi : \forall\, R \ge 0,\ \int_{\mathbb{R}} \|1_R e^{-itH}\psi\|^2\, dt < \infty\Big\}.$$
Clearly, $\mathcal{T}$ and $\mathcal{F}$ are linear spaces and $\mathcal{T} \subset \mathcal{F}$.

Theorem 1.2. $\mathrm{Ran}\,\Omega^\pm = \overline{\mathcal{T}} = \overline{\mathcal{F}}$. Moreover, $\mathrm{Ran}\,\Omega^\pm \subset \mathcal{H}_s(H)^\perp$.

We recall that the wave operators $\Omega^\pm$ are complete on a Borel set $\Theta \subset \mathbb{R}$ if $\mathrm{Ran}\, 1_\Theta(H) \subset \mathrm{Ran}\,\Omega^\pm$. Therefore, Theorem 1.2 yields:

Corollary 1.3. If the wave operators $\Omega^\pm$ are complete on Θ, then $\sigma_s(H) \cap \Theta = \emptyset$.

In [JL1] we have introduced the notion of resonant spectrum, $\mathcal{R}(H)$, and we have shown that the wave operators $\Omega^\pm$ are complete on $\sigma(H) \setminus \mathcal{R}(H)$. Thus, $\sigma_s(H) \subset \mathcal{R}(H)$. Various estimates on the location of the set $\mathcal{R}(H)$ are given in [JL1].

1.2. Random boundary potentials. Let Ω be the set of all boundary potentials,
$$\Omega = \mathbb{R}^{\mathbb{Z}^d} = \mathop{\times}_{\mathbb{Z}^d} \mathbb{R},$$
let $\mathcal{B}$ be the Borel σ-algebra in Ω and dP a probability measure on $(\Omega, \mathcal{B})$ of the form
$$dP = \mathop{\times}_{\mathbb{Z}^d} d\mu,$$
where $d\mu = p(x)\,dx$ is an absolutely continuous (w.r.t. Lebesgue measure) probability measure on $\mathbb{R}$. In what follows we use the symbol E(f) for the expectation of a random variable f and assume that $d + 1 = 2$.
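The main result of this paper is:

Theorem 1.4. Let $U_{\mathrm{per}}$ be a periodic boundary potential and
$$H = H_0 + U_{\mathrm{per}} + \lambda V, \qquad V \in \Omega,$$
where λ is a real constant. Assume that $\langle x\rangle^\alpha p(x) \in L^1(\mathbb{R}) \cap L^\infty(\mathbb{R})$ for some $\alpha > 2/3$. Then there is a constant $\Lambda > 0$ such that for $|\lambda| > \Lambda$, $R \ge 0$ and $\psi \in \mathcal{H}$,
$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} E\big(\|1_R e^{-itH} 1_{(-c_d, c_d)}(H)\psi\|^2\big)\, dt = 0. \tag{1.5}$$
In particular, for $|\lambda| > \Lambda$,
$$\sigma_s(H) \cap (-c_d, c_d) = \emptyset \quad P\text{-a.s.} \tag{1.6}$$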
Remark 1. That (1.5) implies (1.6) is seen as follows. Let $\mathcal{N}$ be a countable, dense set in $\mathcal{H}$. Fatou's lemma and (1.5) yield that there is a set $\tilde\Omega \subset \Omega$ of full P-measure such that for $V \in \tilde\Omega$ and $\psi \in \mathcal{N}$,
$$\lim_{R\to\infty} \liminf_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \|1_R e^{-itH} 1_{(-c_d, c_d)}(H)\psi\|^2\, dt = 0. \tag{1.7}$$
Since for fixed V the set of ψ's for which (1.7) holds is closed, (1.7) holds for $V \in \tilde\Omega$ and $\psi \in \overline{\mathcal{N}} = \mathcal{H}$.

Remark 2. The constant |λ| is a measure of the disorder.

Remark 3. The operator $H_0 + U_{\mathrm{per}}$ may have some surface spectra on $\sigma(H_0)$. For example, if $\forall\, n$, $U_{\mathrm{per}}(n, 0) = a$ and $|a| > 1$, then $\sigma_s(H_0 + U_{\mathrm{per}}) = [-2d, 2d] + a + a^{-1}$. One can also show that this surface spectrum is stable under short-range perturbations, see [JMP2].

Remark 4. The technical conditions of Theorem 1.4 can be relaxed in several ways. The method of the proof allows for more general densities (non-i.i.d. and correlated random variables), the background potential does not have to be periodic, etc. We opted for conditions which cover most physically interesting situations, while allowing for a technically simple exposition of the results and the proofs.

If the background potential $U_{\mathrm{per}}$ is equal to zero, then we can also deal with the weak coupling regime.

Theorem 1.5. Let $H = H_0 + \lambda V$, $V \in \Omega$, and assume that $\langle x\rangle^\alpha p(x)$ is in $L^1(\mathbb{R})$ for some $\alpha > 2/3$ and in $L^\infty(\mathbb{R})$ for some $\alpha > 5/3$. Then there is a constant $\Lambda > 0$ such that for $|\lambda| < \Lambda$, $R \ge 0$ and $\psi \in \mathcal{H}$,
$$\lim_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} E\big(\|1_R e^{-itH} 1_{(-c_d, c_d)}(H)\psi\|^2\big)\, dt = 0.$$
In particular, for $|\lambda| < \Lambda$, $\sigma_s(H) \cap (-c_d, c_d) = \emptyset$ P-a.s.
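The example in Remark 3 following Theorem 1.4 can be checked directly. The following is a sketch under the assumption of a plane-wave ansatz along the boundary with exponential decay into the bulk; the ansatz and the momentum variable k are illustrative and not taken from the text. For the constant boundary potential $U(n, 0) = a$ one finds:

```latex
\psi_k(n, x) = e^{i k\cdot n}\, a^{-x}, \qquad k \in [-\pi, \pi]^d .
% bulk rows, x > 0:
\big((H_0 + U)\psi_k\big)(n, x) = \Big(2\sum_{j=1}^{d}\cos k_j + a^{-1} + a\Big)\psi_k(n, x),
% boundary row, x = 0:
\big((H_0 + U)\psi_k\big)(n, 0) = \psi_k(n, 1) + 2\sum_{j=1}^{d}\cos k_j\,\psi_k(n, 0) + a\,\psi_k(n, 0)
  = \Big(2\sum_{j=1}^{d}\cos k_j + a^{-1} + a\Big)\psi_k(n, 0).
```

The function $\psi_k$ decays in x exactly when $|a| > 1$, and the energies $2\sum_j \cos k_j + a + a^{-1}$ sweep out the interval $[-2d, 2d] + a + a^{-1}$ quoted in the remark.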
Remark 1. If $\Lambda\|V\| \le 1$ P-a.s., then for $|\lambda| < \Lambda$, the wave operators (1.4) are complete on $\sigma(H_0)$ P-a.s. [JL1], and by Corollary 1.3 H has P-a.s. no surface spectra on $\sigma(H_0)$. Thus, for bounded random variables the above theorem could be an empty statement. If, however, $\mathrm{supp}\,\mu = \mathbb{R}$ (for example, the random variables {V(n)} have a Cauchy or Gaussian distribution), then it is not known whether the wave operators (1.4) are complete for any $\lambda \neq 0$. In such situations, Theorem 1.5 is a new result.

Remark 2. The cases $U_{\mathrm{per}} = 0$ and $U_{\mathrm{per}} \neq 0$ may describe two very different physical situations. See [JM2, JM3] for a discussion.

We finish this section with a brief discussion of Theorems 1.4 and 1.5. Roughly, the physical reason why these theorems hold is that in the strong coupling regime (with $U_{\mathrm{per}}$ possibly different from zero) and in the weak coupling regime (with $U_{\mathrm{per}} = 0$) the propagation of wave packets along the boundary is suppressed. This forces wave packets with energies outside $\sigma(H_0)$ to turn into bound states, which is the reason why we have only pure point spectrum outside $\sigma(H_0)$ in these regimes [JM1, JM2]. However, since the spectrum of H in $\sigma(H_0)$ is P-a.s. purely a.c. [JL1, JL2], wave packets with energies in that regime must propagate, and they are thus expelled into the bulk.

Notation. Throughout the paper we will use the shorthands
$$R(z) := (H - z)^{-1}, \qquad R((m, y), (n, x); z) := (\delta_{(m,y)} | (H - z)^{-1} \delta_{(n,x)}).$$
The above heuristic argument can be made mathematically rigorous as follows. Let (a, b) be a given energy interval and $0 < s < 1$. For $m, n \in \mathbb{Z}^d$ and $x \in \mathbb{Z}_+$ let
$$\gamma_x(m, n) := \sup_{e \in (a,b),\ \varepsilon \neq 0} E\big(|R((m, 0), (n, x); e + i\varepsilon)|^s\big).$$
The estimate one seeks is that for all $x \ge 0$ there are positive constants C and δ such that $\forall\, m, n$,
$$\gamma_x(m, n) \le C \langle n - m\rangle^{-d-\delta}. \tag{1.8}$$
If $(a, b) \cap \sigma(H_0) = \emptyset$, then (1.8) implies that on (a, b) the operator H has P-a.s. only pure point spectrum [JM2]. (For this result it suffices that (1.8) holds for x = 0.) On the other hand, if $(a, b) \subset \sigma(H_0)$, then (1.8) implies that relation (1.5) holds and that H has P-a.s. no surface spectra on (a, b) (see Sect. 3.2).

The question remains how the estimate (1.8) can be established. Let us consider the case x = 0. One can show that for any $z \in \mathbb{C} \setminus \sigma(H_0)$ there exists an operator $h_0(z)$, which acts on $l^2(\mathbb{Z}^d)$ as convolution by a sequence j(n, z),
$$(h_0(z)\psi)(n) = \sum_k j(n - k, z)\psi(k),$$
so that
$$(\delta_{(m,0)} | (H - z)^{-1} \delta_{(n,0)}) = (\delta_m | (h(z) - z)^{-1} \delta_n),$$
where $h(z) = h_0(z) + V$. Thus, the estimate (1.8) for x = 0 is the usual localization estimate of Aizenman–Molchanov [AM] for the operator h(z). The difficulties in establishing such an estimate stem from the fact that $h_0(z)$ is a long-range energy-dependent Laplacian, and the existing techniques are effective only if for some $\delta > 0$ and $\forall\, n$,
$$t(n) := \sup_{e \in (a,b),\ \varepsilon \neq 0} |j(n, e + i\varepsilon)| = O(\langle n\rangle^{-d-\delta}).$$
However, it turns out that $t(n) \sim \langle n\rangle^{-(d+2)/2}$, and we have an appropriate decay only if $d + 1 = 2$. This is the reason why we are able to prove Theorems 1.4 and 1.5 only in dimension two. We do not know whether this restriction is technical or whether new physical phenomena emerge in dimensions $d + 1 > 2$.

2. Deterministic Results

2.1. Proof of Theorem 1.1. The following result will be used on several occasions in this paper.

Lemma 2.1. Let A be a bounded self-adjoint operator on $\mathcal{H}$. Then, for any $\psi \in \mathcal{H}$,
$$\frac{1}{2T} \int_{-T}^{T} \|A e^{-itH}\psi\|^2\, dt \le C\varepsilon \max_{\pm} \int_{\mathbb{R}} \|A R(e \pm i\varepsilon)\psi\|^2\, de,$$
where $\varepsilon = T^{-1}$ and $C = e^2/2\pi$.

Proof. The result follows from the well-known identity (see, e.g., [RS], Sect. XIII.7)
$$2\pi \int_{\mathbb{R}} e^{-2\varepsilon|t|} \|A e^{-itH}\psi\|^2\, dt = \int_{\mathbb{R}} \big(\|A R(e + i\varepsilon)\psi\|^2 + \|A R(e - i\varepsilon)\psi\|^2\big)\, de$$
and the estimate
$$\frac{1}{2T} \int_{-T}^{T} \|A e^{-itH}\psi\|^2\, dt \le \frac{e^2 \varepsilon}{2} \int_{\mathbb{R}} e^{-2\varepsilon|t|} \|A e^{-itH}\psi\|^2\, dt,$$
where $\varepsilon = T^{-1}$. □

The following lemma is an immediate consequence of Proposition 3.1 of [JL1]. We use the shorthand $R_0(z) = (H_0 - z)^{-1}$.

Lemma 2.2. If $[a, b] \cap \sigma(H_0) = \emptyset$ then there exists $\gamma > 0$ such that
$$\sup_{e \in [a,b],\ \varepsilon \neq 0} \|\bar 1_R R_0(e + i\varepsilon) 1_0\| \le C e^{-\gamma R}.$$
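The second estimate in the proof of Lemma 2.1, and with it the value of the constant C, follows from an elementary bound, recorded here as a sketch. Write ε for $T^{-1}$ and $f(t) = \|A e^{-itH}\psi\|^2 \ge 0$. For $|t| \le T$ one has $e^{-2\varepsilon|t|} \ge e^{-2}$, hence

```latex
\frac{1}{2T}\int_{-T}^{T} f(t)\,dt
  \le \frac{e^{2}}{2T}\int_{-T}^{T} e^{-2\varepsilon|t|} f(t)\,dt
  \le \frac{e^{2}\varepsilon}{2}\int_{\mathbb{R}} e^{-2\varepsilon|t|} f(t)\,dt
  = \frac{e^{2}\varepsilon}{4\pi}\int_{\mathbb{R}}
      \big(\|A R(e+i\varepsilon)\psi\|^2 + \|A R(e-i\varepsilon)\psi\|^2\big)\,de
  \le \frac{e^{2}}{2\pi}\,\varepsilon\,\max_{\pm}\int_{\mathbb{R}} \|A R(e\pm i\varepsilon)\psi\|^2\,de .
```

The equality is the identity quoted from [RS], and the last step bounds the sum of the two integrals by twice their maximum, which gives $C = e^2/2\pi$. Note that $e^2$ in the constant is the square of Euler's number, while e under the integral is the energy variable.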
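Lemma 2.3. Let Θ be a compact set such that $\Theta \cap \sigma(H_0) = \emptyset$. Then there exist constants $C > 0$ and $\gamma > 0$ such that for all $R \ge 0$ and $n \in \mathbb{Z}^d$,
$$\limsup_{\varepsilon \to 0} |\varepsilon| \int_{\mathbb{R}} \|\bar 1_R R(e + i\varepsilon) 1_\Theta(H) \delta_{(n,0)}\|^2\, de \le C e^{-\gamma R}.$$

Proof. Let [a, b] be an interval such that Θ is properly contained in [a, b] and $[a, b] \cap \sigma(H_0) = \emptyset$. Then there exists a constant $C_1$ such that $\forall\, e \in \mathbb{R} \setminus [a, b]$,
$$\sup_{\varepsilon} \|R(e + i\varepsilon) 1_\Theta(H)\| \le C_1/\mathrm{dist}(e, \Theta).$$
Thus, it suffices to show that
$$\limsup_{\varepsilon \to 0} |\varepsilon| \int_a^b \|\bar 1_R R(e + i\varepsilon) 1_\Theta(H) \delta_{(n,0)}\|^2\, de \le C e^{-\gamma R}.$$
Since $V = 1_0 V$ and
$$\sup_{\varepsilon \neq 0} |\varepsilon| \int_a^b \|V R(e + i\varepsilon) 1_\Theta(H) \delta_{(n,0)}\|^2\, de \le C_2,$$
the result follows from the resolvent identity and Lemma 2.2. □

We are now ready to finish:

Proof of Theorem 1.1. Let Θ be a compact set such that $\Theta \cap \sigma(H_0) = \emptyset$. It follows from Lemmas 2.1 and 2.3 that for any $n \in \mathbb{Z}^d$,
$$\limsup_{T\to\infty} \frac{1}{2T} \int_{-T}^{T} \|\bar 1_R e^{-itH} 1_\Theta(H) \delta_{(n,0)}\|^2\, dt = O(e^{-\gamma R}).$$
Thus, for all $n \in \mathbb{Z}^d$, $1_\Theta(H)\delta_{(n,0)} \in \mathcal{H}_s(H)$. Since the set $\{\delta_{(n,0)} : n \in \mathbb{Z}^d\}$ is cyclic for H (see [JL1]), we have that $\mathrm{Ran}\, 1_{\mathbb{R}\setminus[-c_d, c_d]}(H) \subset \mathcal{H}_s(H)$. Finally, either $1_{\{\pm c_d\}}(H) = 0$ or $\pm c_d$ is an eigenvalue of H. In either case, $\mathrm{Ran}\, 1_{\{\pm c_d\}}(H) \subset \mathcal{H}_s(H)$ and Theorem 1.1 follows. □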
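2.2. Ranges of $\Omega^\pm$. In this section we prove Theorem 1.2. Let
$$\mathcal{D} := \big\{\psi : \forall\, R, k \ge 0,\ \|1_R e^{-itH_0}\psi\| = O(\langle t\rangle^{-k})\big\}.$$
Obviously, $\mathcal{D}$ is a linear subspace of $\mathcal{H}$. Moreover, in [JL1] (Prop. 3.12) we have shown that
$$\overline{\mathcal{D}} = \mathcal{H}. \tag{2.1}$$
Theorem 1.2 follows from the next three propositions.

Proposition 2.4. $\mathrm{Ran}\,\Omega^\pm \subset \overline{\mathcal{T}}$.

Proof. We consider the $\Omega^-$ case; a similar argument applies to the $\Omega^+$ case. Let T be the linear operator defined by
$$T\delta_{(n,x)} = \begin{cases} -\delta_{(n,1)} & \text{if } x = 0,\\ \delta_{(n,0)} & \text{if } x = 1,\\ 0 & \text{if } x > 1. \end{cases} \tag{2.2}$$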
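Note that $\|T\| = 1$ and
$$H \bar 1_0 - \bar 1_0 H_0 = [H_0, \bar 1_0] = T. \tag{2.3}$$
Fix $\psi \in \mathcal{D}$. Since
$$\lim_{t\to\infty} e^{itH} 1_0 e^{-itH_0}\psi = 0,$$
we have
$$\Omega^-\psi - \bar 1_0\psi = \int_0^\infty \frac{d}{d\tau}\, e^{i\tau H} \bar 1_0 e^{-i\tau H_0}\psi\, d\tau = i \int_0^\infty e^{i\tau H} T e^{-i\tau H_0}\psi\, d\tau.$$
Thus,
$$\|1_R e^{-itH} \Omega^-\psi\|^2 = \|1_R \Omega^- e^{-itH_0}\psi\|^2 \le A(t) + 2\|1_R e^{-itH_0}\psi\|^2, \tag{2.4}$$
where
$$A(t) = 2\Big(\int_0^\infty \|T e^{-i(\tau + t)H_0}\psi\|\, d\tau\Big)^2.$$
Since $T = T 1_1$, it follows from the definition of $\mathcal{D}$ and (2.4) that for any $\psi \in \mathcal{D}$, $R \ge 0$ and $k \ge 0$,
$$\|1_R e^{-itH} \Omega^-\psi\|^2 = O(\langle t\rangle^{-k}).$$
Thus, $\Omega^- \mathcal{D} \subset \mathcal{T}$, and relation (2.1) yields the result. □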
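Proposition 2.5. $\mathcal{F} \subset \mathrm{Ran}\,\Omega^\pm$.

Proof. We again consider only the $\Omega^-$ case. Obviously, to prove the statement it suffices to show that for every $\psi \in \mathcal{F}$ the limit
$$\lim_{t\to\infty} e^{itH_0} e^{-itH}\psi \tag{2.5}$$
exists. First, we observe that for $\psi \in \mathcal{F}$,
$$\lim_{t\to\infty} e^{itH} 1_0 e^{-itH}\psi = 0. \tag{2.6}$$
Indeed, let $w(t) = e^{itH} 1_0 e^{-itH}\psi$. Then $\int_{\mathbb{R}} \|w(t)\|^2\, dt < \infty$, and, since $[H, 1_0] = [H_0, 1_0]$, $\|w'(t)\| \le 2\|H_0\|$. This yields that $\lim_{t\to\infty} w(t) = 0$ (see, e.g., Exercise 6.2 in [RS]). Thus, to prove the existence of the limit (2.5) it suffices to prove the existence of
$$\lim_{t\to\infty} e^{itH_0} \bar 1_0 e^{-itH}\psi.$$
We fix $\psi \in \mathcal{F}$ and set
$$\Phi(t) := e^{itH_0} \bar 1_0 e^{-itH}\psi.$$
Let $\phi \in \mathcal{H}$ be arbitrary. Then
$$\frac{d}{dt}(\phi | \Phi(t)) = i(e^{-itH_0}\phi | T e^{-itH}\psi),$$
where T is given by (2.2). Since $T = T 1_1 = 1_1 T$, we have that for $t > s$,
$$|(\phi | \Phi(t) - \Phi(s))| \le \Big(\int_{\mathbb{R}} \|1_1 e^{-i\tau H_0}\phi\|^2\, d\tau\Big)^{1/2} \Big(\int_s^t \|1_1 e^{-i\tau H}\psi\|^2\, d\tau\Big)^{1/2}.$$
Since $1_1$ is $H_0$-smooth (see Lemma 3.7 in [JL1]), there is a constant C, independent of φ, such that
$$\Big(\int_{\mathbb{R}} \|1_1 e^{-i\tau H_0}\phi\|^2\, d\tau\Big)^{1/2} \le C\|\phi\|.$$
Therefore,
$$\|\Phi(t) - \Phi(s)\| \le C \Big(\int_s^t \|1_1 e^{-i\tau H}\psi\|^2\, d\tau\Big)^{1/2}.$$
If $\psi \in \mathcal{F}$, then the integrand on the right-hand side of the last equation is in $L^1(\mathbb{R})$. Therefore, $\Phi(t)$ is Cauchy as $t \to \infty$ and $\lim_{t\to\infty} \Phi(t)$ exists. □

Proposition 2.6. $\mathcal{F} \perp \mathcal{H}_s(H)$.

Proof. Let $\phi \in \mathcal{F}$ and $\psi \in \mathcal{H}_s(H)$ be two normalized vectors. Then
$$(\phi | \psi) = (1_R e^{-itH}\phi | 1_R e^{-itH}\psi) + (\bar 1_R e^{-itH}\phi | \bar 1_R e^{-itH}\psi),$$
which yields
$$|(\phi | \psi)| \le \|1_R e^{-itH}\phi\| + \|\bar 1_R e^{-itH}\psi\|.$$
Thus,
$$|(\phi | \psi)| \le \Big(\frac{1}{2T} \int_{-T}^{T} \|1_R e^{-itH}\phi\|^2\, dt\Big)^{1/2} + \Big(\frac{1}{2T} \int_{-T}^{T} \|\bar 1_R e^{-itH}\psi\|^2\, dt\Big)^{1/2},$$
and
$$|(\phi | \psi)| \le \limsup_{T\to\infty} \Big(\frac{1}{2T} \int_{-T}^{T} \|1_R e^{-itH}\phi\|^2\, dt\Big)^{1/2} + \lim_{R\to\infty} \limsup_{T\to\infty} \Big(\frac{1}{2T} \int_{-T}^{T} \|\bar 1_R e^{-itH}\psi\|^2\, dt\Big)^{1/2} = 0. \qquad \square$$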
3. Surface Spectra in Dimension Two

In this section we prove Theorems 1.4 and 1.5. In what follows we assume that $d + 1 = 2$. We use the letter C for a generic constant which may change from estimate to estimate.
3.1. Preliminaries. In this section we collect some technical results about the model (1.1). Most of them are similar to the results already discussed in [JL1, JM1, JM2, JM3] and we refer the reader to any of these papers for additional information. All the results of this section, except for Lemma 3.4, hold for an arbitrary boundary potential V . Let T be the unit circle. We denote by φ the points in T and by dφ the usual Lebesgue measure. Let C± = {z : ±Imz > 0} and, for z ∈ C± , let λ(φ, z) be the root of the quadratic equation X + X −1 + 2 cos φ = z, which satisfies |λ(φ, z)| < 1. Explicitly, for z ∈ C± , 1 λ(φ, z) = 2 cos φ − z − (2 cos φ − z)2 − 4± , 2 where the branch of square root is fixed by √ √ 2 w ± = x + iy ± = |w| + x ± i |w| − x , 2
(3.1)
±Imw > 0.
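The branch convention (3.1) and the selection of the root with modulus less than one can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: the function names `sqrt_branch`, `small_root` and `quadratic_residual` are hypothetical, and the small root is picked by comparing moduli of the two roots of the quadratic rather than through the closed-form expression.

```python
import cmath
import math


def sqrt_branch(w: complex) -> complex:
    """Square root of w following the convention (3.1); assumes Im w != 0.

    The sign of the imaginary part of the result follows the sign of Im w.
    """
    x, r = w.real, abs(w)
    sign = 1.0 if w.imag > 0 else -1.0
    re = math.sqrt(max(r + x, 0.0))
    im = math.sqrt(max(r - x, 0.0))
    return (math.sqrt(2.0) / 2.0) * complex(re, sign * im)


def small_root(phi: float, z: complex) -> complex:
    """Root X of X + 1/X + 2 cos(phi) = z with |X| < 1; assumes Im z != 0."""
    a = z - 2.0 * math.cos(phi)
    d = cmath.sqrt(a * a - 4.0)
    r1, r2 = (a + d) / 2.0, (a - d) / 2.0  # the two roots satisfy r1 * r2 = 1
    return r1 if abs(r1) < abs(r2) else r2


def quadratic_residual(phi: float, z: complex) -> float:
    """Absolute residual of the quadratic equation at the selected root."""
    x = small_root(phi, z)
    return abs(x + 1.0 / x + 2.0 * math.cos(phi) - z)
```

Since the two roots multiply to one and neither lies on the unit circle when $\mathrm{Im}\, z \neq 0$, exactly one of them has modulus less than one, which is why comparing moduli suffices in this sketch.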
The function $\lambda(\phi, z)$ extends by continuity from $\mathbb{T} \times \mathbb{C}_\pm$ to $\mathbb{T} \times \overline{\mathbb{C}_\pm}$. We denote the values of these extensions along the real axis by $\lambda(\phi, e \pm i0)$.

Lemma 3.1. For any $x \in \mathbb{Z}_+$ there is a constant $C_x$ such that for all $n$,
\[ \sup_{z \in \mathbb{C}_\pm} \left|\int_{\mathbb{T}} e^{-in\phi}\lambda(\phi, z)^x\, d\phi\right| \le C_x \langle n\rangle^{-3/2}. \tag{3.2} \]

Proof. In [JL1] we have shown that
\[ (\delta_{(0,0)}|(H_0 - z)^{-1}\delta_{(n,x)}) = -(2\pi)^{-1/2}\int_{\mathbb{T}} e^{-in\phi}\lambda(\phi, z)^{x+1}\, d\phi. \tag{3.3} \]
Moreover, the proof of Lemma 2.9 in [JM2] yields that for some $D_x$ and all $n$,
\[ \sup_{|e| > 4} \left|\int_{\mathbb{T}} e^{-in\phi}\lambda(\phi, e)^x\, d\phi\right| \le D_x \langle n\rangle^{-2}. \]
Thus, the maximum modulus principle and the standard exponential estimate on the Green function (3.3) yield that (3.2) follows if for some $C_x$ and all $n$,
\[ \sup_{|e| \le 4} \left|\int_{\mathbb{T}} e^{-in\phi}\lambda(\phi, e \pm i0)^x\, d\phi\right| \le C_x \langle n\rangle^{-3/2}. \]
The proof of this fact is a calculus exercise. We indicate below the main steps of this calculation. For notational simplicity, we consider the cases $x = 1$, $+i0$, and $e \ge 0$. We need to estimate the integral
\[ I_n = \int_{\mathbb{T}} e^{-in\phi}\sqrt{(2\cos\phi - e)^2 - 4}_{\,+}\, d\phi \]
for $e \in [0, 4]$ and $|n| \gg 1$. The change of variable $u = \cos\phi$ and integration by parts yield that $|I_n| \le (A_{n+} + A_{n-})/|n|$, where
\[ A_{n\pm} = \left|\int_{-1}^{1} e^{-in\cos^{-1}u}\, \frac{2(2u \pm e)}{\sqrt{(2u \pm e)^2 - 4}_{\,+}}\, du\right|. \]
We estimate $A_{n+}$. Let $0 < \delta \ll 1$ and
\[ S(\delta) = \{u \in [-1, 1] : \mathrm{dist}(u, \pm 1 - e/2) \le \delta\}. \]
Let $\Phi_{n+}$ be the integrand in $A_{n+}$. One can easily see that
\[ \left|\int_{S(\delta)} \Phi_{n+}(u)\, du\right| \le C\sqrt{\delta}. \]
If $S^c(\delta) = [-1, 1] \setminus S(\delta)$, then
\[ \left|\int_{S^c(\delta)} \Phi_{n+}(u)\, du\right| = \frac{1}{|n|}\left|\int_{S^c(\delta)} \frac{2(2u + e)}{\sqrt{(2u + e)^2 - 4}_{\,+}}\, \sqrt{1 - u^2}\, \frac{d}{du}e^{-in\cos^{-1}u}\, du\right|, \]
and integration by parts yields
\[ \left|\int_{S^c(\delta)} \Phi_{n+}(u)\, du\right| \le \frac{C}{|n|\sqrt{\delta}}. \]
Therefore, for some constant $C$ independent of $e$,
\[ |I_n| \le \frac{C}{|n|}\left(\sqrt{\delta} + \frac{1}{|n|\sqrt{\delta}}\right). \]
Setting $\delta = |n|^{-1}$, we derive the statement. □

Let
\[ j(n, z) = (2\pi)^{-1}\int_{\mathbb{T}} e^{-in\phi}(\lambda(\phi, z) + 2\cos\phi)\, d\phi, \tag{3.4} \]
and let $h_0(z)$ be the operator of convolution by $j(n, z)$. This operator acts on $l^2(\mathbb{Z})$ as follows:
\[ (h_0(z)\psi)(n) = \sum_k j(n - k, z)\psi(k). \]
Let $h(z) = h_0(z) + V$ and $H = H_0 + V$, where $V$ is a boundary potential. Note that $h(z)$ acts on $l^2(\mathbb{Z})$ while $H$ acts on $l^2(\mathbb{Z}^2_+)$. Let
\[ \hat{R}((m, 0), (\phi, x); z) = (2\pi)^{-1/2}\sum_n R((m, 0), (n, x); z)e^{in\phi}. \]
For the proof of the next two lemmas we refer the reader to [JL1, JM1].
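Lemma 3.2. For any $n, m \in \mathbb{Z}$ and $z \in \mathbb{C}_\pm$,
\[ (\delta_{(m,0)}|(H - z)^{-1}\delta_{(n,0)}) = (\delta_m|(h(z) - z)^{-1}\delta_n). \]

Lemma 3.3. For any $x \in \mathbb{Z}_+$ and $z \in \mathbb{C}_\pm$,
\[ \hat{R}((m, 0), (\phi, x); z) = \hat{R}((m, 0), (\phi, 0); z)\lambda(\phi, z)^x. \]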
Our next lemma deals with a random potential $V$. Let $(\Omega, \mathcal{B})$ be as in Sect. 1.2 and $P$ be an arbitrary probability measure on $(\Omega, \mathcal{B})$.

Lemma 3.4. Let $(a, b)$ be an interval such that for some $2/3 < s < 1$, $\delta > 0$ and all $m, n \in \mathbb{Z}$,
\[ \sup_{e \in (a,b),\, \epsilon \neq 0} E(|R((m, 0), (n, 0); e + i\epsilon)|^s) < C\langle n - m\rangle^{-1-\delta}. \]
Then for all $x \in \mathbb{Z}_+$ there is a constant $C_x$ such that
\[ \sup_{e \in (a,b),\, \epsilon \neq 0} E(|R((m, 0), (n, x); e + i\epsilon)|^s) < C_x\langle n - m\rangle^{-q}, \tag{3.5} \]
where $q = \min(1 + \delta, 3s/2)$.

Proof. Let
\[ t_x(n, z) = (2\pi)^{-1}\int_{\mathbb{T}} e^{-in\phi}\lambda(\phi, z)^x\, d\phi. \]
Then by Lemma 3.3,
\[ R((m, 0), (n, x); z) = \sum_k t_x(n - k, z)R((m, 0), (k, 0); z), \]
and it follows that
\[ E(|R((m, 0), (n, x); z)|^s) \le \sum_k |t_x(n - k, z)|^s E(|R((m, 0), (k, 0); z)|^s). \]
The result now follows from Lemma 3.1 and (3.5). □
The last result we will need is:

Lemma 3.5. Let $\delta > 0$ be given. Then there is a constant $C$ such that for $l > 0$ and $m, n \in \mathbb{Z}$,
\[ \sum_{k_1, \dots, k_l \in \mathbb{Z}} \left(\langle k_1 - m\rangle\langle k_2 - k_1\rangle \cdots \langle k_l - k_{l-1}\rangle\langle n - k_l\rangle\right)^{-1-\delta} \le C^l\langle n - m\rangle^{-1-\delta}. \]

Proof. An elementary induction. □
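The base case $l = 1$ of Lemma 3.5 can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\langle n\rangle = (1 + n^2)^{1/2}$, truncates the sum over $k$ to a finite window, and compares against the explicit constant $2^{2+\delta}\sum_j \langle j\rangle^{-1-\delta}$, which comes from the standard splitting argument (for every $k$, either $\langle k - m\rangle$ or $\langle n - k\rangle$ is at least $\langle n - m\rangle/2$) and is not a constant stated in the paper. The names `bracket`, `lattice_sum` and `conv_ratio` are hypothetical.

```python
import math


def bracket(n: float) -> float:
    """Japanese bracket <n> = (1 + n^2)^(1/2) (assumed convention)."""
    return math.sqrt(1.0 + n * n)


def lattice_sum(p: float, radius: int) -> float:
    """Sum of <j>^(-p) over |j| <= radius."""
    return sum(bracket(j) ** -p for j in range(-radius, radius + 1))


def conv_ratio(n: int, m: int, delta: float, cutoff: int = 2000) -> float:
    """Truncated sum_k (<k-m><n-k>)^(-1-delta), divided by <n-m>^(-1-delta)."""
    p = 1.0 + delta
    total = sum(
        bracket(k - m) ** -p * bracket(n - k) ** -p
        for k in range(-cutoff, cutoff + 1)
    )
    return total * bracket(n - m) ** p
```

For $|n|, |m| \le 50$ the ratio returned by `conv_ratio` stays below `2 ** (2 + delta) * lattice_sum(1 + delta, cutoff + 50)`, uniformly in $n - m$, which is the content of the lemma for $l = 1$; the general case follows by iterating this bound.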
3.2. The property P. Throughout this section we assume that the conditions of Theorem 1.4 hold. Let $(a, b) \subset (-c_d, c_d)$. We say that the property P holds on $(a, b)$ if for all $R \ge 0$ and $\psi \in \mathcal{H}$,
\[ \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} E\left(\|1_R e^{-itH}1_{(a,b)}(H)\psi\|^2\right) dt = 0. \]
To prove Theorem 1.4 it suffices to show that the property P holds on every interval $(a, b)$ properly contained in $(-c_d, c_d)$. In what follows we fix an interval $(a, b)$ properly contained in $(-c_d, c_d)$ and $s$ such that $2/3 < s < 1$ and $\langle x\rangle^s p(x) \in L^1(\mathbb{R}) \cap L^\infty(\mathbb{R})$. The goal of this section is to prove

Theorem 3.6. Assume that for some $\delta > 0$ and all $m, n \in \mathbb{Z}$,
\[ \sup_{e \in (a,b),\, \epsilon \neq 0} E(|R((m, 0), (n, 0); e + i\epsilon)|^s) \le C\langle n - m\rangle^{-1-\delta}. \tag{3.6} \]
Then the property P holds on $(a, b)$.

First, we need the following result of Graf [Gr].

Lemma 3.7. Assume that (3.6) holds. Then
\[ \sup_{e \in (a,b),\, \epsilon \neq 0} |\epsilon|\, E(|R((m, 0), (n, 0); e + i\epsilon)|^2) \le C\langle n - m\rangle^{-1-\delta}. \tag{3.7} \]

Remark. This result is not stated in [Gr] in the above form. However, it is an immediate consequence of the proof of Lemma 3 in [Gr].

We denote by $\mathcal{A}$ the set of all $C_0^\infty$-functions with support in $(a, b)$.

Lemma 3.8. Assume that for all $\chi \in \mathcal{A}$, $R \ge 0$ and $n$,
\[ \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} E\left(\|1_R e^{-itH}\chi(H)\delta_{(n,0)}\|^2\right) dt = 0. \tag{3.8} \]
Then the property P holds on $(a, b)$.

Proof. The result follows from the fact that the set $\{\delta_{(n,0)} : n \in \mathbb{Z}\}$ is cyclic for $H = H_0 + V$ for any $V$. □

Lemma 3.9. Assume that $\mathrm{supp}\,\chi \subset (a, b)$. Then for any $\phi, \psi \in \mathcal{H}$,
\[ \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} E\left(|(\phi|e^{-itH}\chi(H)\psi)|^2\right) dt = 0. \]

Proof. By the result of [JL1], the spectrum of $H$ is $P$-a.s. purely a.c. on $(a, b)$, and it follows from the Wiener theorem that
\[ \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} |(\phi|e^{-itH}\chi(H)\psi)|^2\, dt = 0 \qquad P\text{-a.s.} \]
This relation and the Lebesgue dominated convergence theorem yield the statement. □
For any $l \ge 0$, let $1_{R,l}$ be the characteristic function of the set $\{(n, x) : |n| \ge l,\ 0 \le x \le R\}$. We use the same notation for the corresponding multiplication operator.

Lemma 3.10. Assume that for all $\chi \in \mathcal{A}$, $R \ge 0$ and $n$,
\[ \lim_{l\to\infty}\limsup_{\epsilon\to 0}\, |\epsilon|\int_{\mathbb{R}} E\left(\|1_{R,l}R(e + i\epsilon)\chi(H)\delta_{(n,0)}\|^2\right) de = 0. \tag{3.9} \]
Then the property P holds on $(a, b)$.

Proof. Since $1_R - 1_{R,l}$ is a finite rank operator, by Lemma 3.9
\[ \lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} E\left(\|(1_R - 1_{R,l})e^{-itH}\chi(H)\delta_{(n,0)}\|^2\right) dt = 0, \]
and so, by Lemma 3.8, if for all $\chi$ with $\mathrm{supp}\,\chi \subset (a, b)$, $R$ and $n$,
\[ \lim_{l\to\infty}\limsup_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} E\left(\|1_{R,l}e^{-itH}\chi(H)\delta_{(n,0)}\|^2\right) dt = 0, \]
then P holds on $(a, b)$. The result now follows from Lemma 2.1. □
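We are now ready to finish:

Proof of Theorem 3.6. We will prove that relation (3.9) holds for all $\chi \in \mathcal{A}$, all $R$ and all $n$. Let $\chi$ be given and let $[c, d]$ be an interval such that $\mathrm{supp}\,\chi \subset [c, d] \subset (a, b)$. Since for $e \notin [c, d]$,
\[ \sup_{\epsilon \neq 0} E\left(\|1_{R,l}R(e + i\epsilon)\chi(H)\delta_{(n,0)}\|^2\right) \le C(\mathrm{dist}(e, \mathrm{supp}\,\chi))^{-2}, \]
to prove (3.9) it suffices to show that
\[ \lim_{l\to\infty}\limsup_{\epsilon\to 0}\, |\epsilon|\sup_{e \in (a,b)} E\left(\|1_{R,l}R(e + i\epsilon)\chi(H)\delta_{(n,0)}\|^2\right) = 0. \tag{3.10} \]
Let $\tilde{\chi}$ be an almost analytic extension of $\chi$. (For the basic facts about almost analytic extensions we refer the reader to [Da].) By the Helffer–Sjöstrand formula,
\[ \chi(H) = \frac{1}{\pi}\int_{\mathbb{C}} \frac{\partial\tilde{\chi}(z)}{\partial\bar{z}}R(z)\, dx\, dy. \]
It follows that for any $w \in \mathbb{C}$,
\[ 1_{R,l}R(w)\chi(H) = \frac{1}{\pi}\int_{\mathbb{C}} A(w, z)\, dx\, dy, \]
where
\[ A(w, z) = \frac{\partial\tilde{\chi}(z)}{\partial\bar{z}}\, \frac{1}{z - w}\, 1_{R,l}(R(w) - R(z)). \tag{3.11} \]
We will make use of the following properties of $\tilde{\chi}$ (see [Da]):

(a) $\mathrm{supp}\,\tilde{\chi}$ is a compact set.
(b) $\sup_{z \in \mathrm{supp}\,\tilde{\chi}} \left|\dfrac{\partial\tilde{\chi}(z)}{\partial\bar{z}}\right| = O(|\mathrm{Im}\, z|)$.

Setting $w = e + i\epsilon$ we derive from (3.11) that
\[ \|1_{R,l}R(w)\chi(H)\delta_{(n,0)}\| \le \|1_{R,l}R(w)\delta_{(n,0)}\|\, D_1(w) + D_2(w), \]
where
\[ D_1(w) = \frac{1}{\pi}\int_{\mathbb{C}} \left|\frac{\partial\tilde{\chi}(z)}{\partial\bar{z}}\right| \frac{1}{|w - z|}\, dx\, dy, \]
and
\[ D_2(w) = \frac{1}{\pi}\int_{\mathbb{C}} \left|\frac{\partial\tilde{\chi}(z)}{\partial\bar{z}}\right| \frac{\|R(z)\delta_{(n,0)}\|}{|w - z|}\, dx\, dy. \]
Now (a) and (b) yield that
\[ \sup_{w \in \mathbb{C}} D_1(w) < \infty, \qquad \sup_{w \in \mathbb{C}} D_2(w) < \infty, \]
and so
\[ \sup_{e \in (a,b)} E\left(\|1_{R,l}R(e + i\epsilon)\chi(H)\delta_{(n,0)}\|^2\right) \le C\sup_{e \in (a,b)} E\left(\|1_{R,l}R(e + i\epsilon)\delta_{(n,0)}\|^2\right) + O(1). \tag{3.12} \]
Let $q = \min(3s/2, 1 + \delta) > 1$. Using Lemma 3.4, (3.7), and (3.12), we conclude that
\[ \limsup_{\epsilon\to 0}\, |\epsilon|\sup_{e \in (a,b)} E\left(\|1_{R,l}R(e + i\epsilon)\chi(H)\delta_{(n,0)}\|^2\right) \le C\sum_{|m| > l}\langle n - m\rangle^{-q} = O(l^{1-q}), \]
and (3.10) follows. □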
3.3. Localization estimate in the strong coupling regime. In this and the next section we use the Aizenman–Molchanov technique to show that the key estimate (3.7) holds if the disorder is either large or small enough (we follow an elegant presentation of Aizenman–Molchanov theory in [S], see also [JM2]). Throughout this section we assume that the assumptions of Theorem 1.4 hold.

Theorem 3.11. For any $2/3 < s < 1$ there is a constant $\Lambda_s > 0$ such that for $|\lambda| > \Lambda_s$ and all $m, n \in \mathbb{Z}$,
\[ \sup_{z \in \mathbb{C}_\pm} E\left(|R((m, 0), (n, 0); z)|^s\right) \le C\langle n - m\rangle^{-3s/2}. \]
Theorem 1.4 is an immediate consequence of Theorems 3.6 and 3.11. The rest of this section is devoted to the proof of Theorem 3.11. To simplify the notation, we assume that $m = 0$ and adopt the shorthand
\[ R(n; z) := R((0, 0), (n, 0); z). \tag{3.13} \]
A similar argument applies to the other values of $m$. In what follows we fix $s \in (2/3, 1)$. It follows from Lemma 3.2 that the function $R(n; z)$ satisfies the equation
\[ \sum_k j(n - k, z)R(k; z) + (U_{\mathrm{per}}(n) + \lambda V(n) - z)R(n; z) = \delta_{0n}. \]
Then
\[ E\left(|U_{\mathrm{per}}(n) + \lambda V(n) - z|^s|R(n; z)|^s\right) \le \delta_{0n} + \sum_k |j(n - k, z)|^s E\left(|R(k; z)|^s\right). \tag{3.14} \]
The decoupling lemma of Aizenman–Molchanov (see [AM, AG, Gr, M, JM2]) yields that there is a constant $k_s$,
\[ k_s \ge 2^{-(1+s)}(1 - s)\|p\|_\infty^{-s}, \]
such that
\[ k_s|\lambda|^s E(|R(n; z)|^s) \le E\left(|U_{\mathrm{per}}(n) + \lambda V(n) - z|^s|R(n; z)|^s\right). \tag{3.15} \]
Let $g(n, z) := E(|R(n, z)|^s)$, and let $g(z)$ be the sequence $\{g(n, z)\}$. Note that $g(z) \in l^\infty(\mathbb{Z})$ ($|g(n, z)| \le |\mathrm{Im}\, z|^{-s}$). Relations (3.14) and (3.15) yield that
\[ (1 - k_s^{-1}|\lambda|^{-s}T(z))g(z) \le k_s^{-1}|\lambda|^{-s}\delta_0, \tag{3.16} \]
where $T(z)$ is the operator of convolution by $|j(n, z)|^s$. It follows from Lemma 3.1 and (3.4) that
\[ C_1 := \sup_{z \in \mathbb{C}_\pm}\|T(z)\|_\infty = \sup_{z \in \mathbb{C}_\pm}\sum_n |j(n, z)|^s < \infty, \]
where $\|\cdot\|_\infty$ is the $l^\infty$ operator norm. Thus, if $k_s|\lambda|^s > C_1$, then the operator $1 - k_s^{-1}|\lambda|^{-s}T(z)$ is invertible,
\[ (1 - k_s^{-1}|\lambda|^{-s}T(z))^{-1} = \sum_{m=0}^{\infty}(k_s^{-1}|\lambda|^{-s})^m T(z)^m, \tag{3.17} \]
and $(1 - k_s^{-1}|\lambda|^{-s}T(z))^{-1}$ is positivity preserving (since $T(z)$ is). These observations and relation (3.16) yield that
\[ g(z) \le k_s^{-1}|\lambda|^{-s}(1 - k_s^{-1}|\lambda|^{-s}T(z))^{-1}\delta_0, \]
and
\[ g(n, z) \le k_s^{-1}|\lambda|^{-s}\sum_{m=0}^{\infty}(k_s^{-1}|\lambda|^{-s})^m(\delta_0|T(z)^m\delta_n). \tag{3.18} \]
By Lemma 3.1 and (3.4), there is a constant $C_2$ such that
\[ \sup_{z \in \mathbb{C}_\pm}|j(n, z)|^s \le C_2\langle n\rangle^{-3s/2}. \]
Now let $C_3$ be the constant which appears in Lemma 3.5 for $1 + \delta = 3s/2$. If in addition to (3.17) we have that $k_s|\lambda|^s > C_2C_3$, then (3.18) and Lemma 3.5 show that for some $C$ and all $n$,
\[ \sup_{z \in \mathbb{C}_\pm} g(n, z) \le C\langle n\rangle^{-3s/2}. \]
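The mechanism behind this bound (a positivity preserving Neumann series whose terms inherit the polynomial decay of the kernel) can be illustrated on a finite lattice. The sketch below is a toy model under explicit assumptions: the kernel $|j(n, z)|^s$ is replaced by $\langle n\rangle^{-p}$ with $p = 3s/2$ and $\langle n\rangle = (1 + n^2)^{1/2}$, the parameter `kappa` stands in for $k_s^{-1}|\lambda|^{-s}$, and the constant $2^{p+1}C_1$ used for comparison is our own crude value from the splitting argument, not the paper's $C_2C_3$. The names `bracket`, `lattice_sum` and `neumann_series` are hypothetical.

```python
import math


def bracket(n: float) -> float:
    """Japanese bracket <n> = (1 + n^2)^(1/2) (assumed convention)."""
    return math.sqrt(1.0 + n * n)


def lattice_sum(p: float, radius: int) -> float:
    """Sum of <j>^(-p) over |j| <= radius."""
    return sum(bracket(j) ** -p for j in range(-radius, radius + 1))


def neumann_series(p: float, kappa: float, radius: int = 60, terms: int = 40) -> dict:
    """Partial sum of sum_m kappa^m T^m delta_0 on the sites |n| <= radius.

    T is convolution by the model kernel <n>^(-p), truncated to the window.
    """
    sites = range(-radius, radius + 1)
    kernel = {d: bracket(d) ** -p for d in range(-2 * radius, 2 * radius + 1)}
    term = {n: (1.0 if n == 0 else 0.0) for n in sites}  # m = 0 term: delta_0
    total = dict(term)
    for _ in range(terms):
        # next term: kappa * T applied to the previous term
        term = {n: kappa * sum(kernel[n - k] * term[k] for k in sites) for n in sites}
        for n in sites:
            total[n] += term[n]
    return total
```

When `kappa * 2 ** (p + 1) * lattice_sum(p, 2 * radius) < 1`, every entry of the result is nonnegative and, for $n \neq 0$, bounded by `kappa / (1 - kappa * C3) * bracket(n) ** -p` with `C3 = 2 ** (p + 1) * lattice_sum(p, 2 * radius)`, mirroring how (3.18) and Lemma 3.5 combine.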
These estimates yield Theorem 3.11 with $\Lambda_s = (\max(C_1, C_2C_3)/k_s)^{1/s}$.

3.4. Localization estimate in the weak coupling regime. Throughout this section we assume that the assumptions of Theorem 1.5 hold. Let $\alpha > 5/3$ be such that $\langle x\rangle^\alpha p(x) \in L^\infty(\mathbb{R})$.

Theorem 3.12. For any $s \in (2/3, \min(1, \alpha - 1))$ there is a constant $\Lambda_s > 0$ such that for $|\lambda| < \Lambda_s$ and all $m, n \in \mathbb{Z}$,
\[ \sup_{z \in \mathbb{C}_\pm} E\left(|R((m, 0), (n, 0); z)|^s\right) \le C\langle n - m\rangle^{-3s/2}. \]

Remark. The restriction on $s$ in terms of $\alpha$ is a technical condition which we need to ensure that the constant $K_s$ in the Aizenman–Molchanov decoupling lemma (relation (3.20) below) is finite. For some other conditions which also ensure that $K_s < \infty$ see [A, AM, JM2].

Theorem 1.5 follows from Theorems 3.6 and 3.12. The rest of this section is devoted to the proof of Theorem 3.12. We again assume that $m = 0$, adopt the shorthand (3.13), and fix $s \in (2/3, \min(1, \alpha - 1))$. The resolvent identity and Lemma 3.2 yield that
\[ R(n, z) = j(n, z) - \sum_k j(n - k, z)V(k)R(k, z). \]
Thus,
\[ E(|R(n, z)|^s) \le |j(n, z)|^s + \sum_k |j(n - k, z)|^s E(|V(k)|^s|R(k, z)|^s). \tag{3.19} \]
The decoupling lemma of Aizenman–Molchanov yields that there is a finite constant $K_s$ such that for $z \in \mathbb{C}_\pm$,
\[ E(|V(k)|^s|R(k, z)|^s) \le |\lambda|^sK_sE(|R(k, z)|^s). \tag{3.20} \]
Let $g(n, z)$, $g(z)$, $T(z)$, $C_1$, $C_2$ and $C_3$ be as in the previous section, and let $j_s(z)$ be the sequence $\{|j(n, z)|^s\}$. Note that $j_s(z) = T(z)\delta_0$. Relations (3.19) and (3.20) yield that
\[ (1 - K_s|\lambda|^sT(z))g(z) \le j_s(z) = T(z)\delta_0. \]
Thus, if
\[ K_s|\lambda|^s < C_1^{-1}, \tag{3.21} \]
then the operator $1 - K_s|\lambda|^sT(z)$ is invertible on the Banach space $l^\infty(\mathbb{Z})$ and its inverse is positivity preserving. Hence,
\[ g(z) \le (1 - K_s|\lambda|^sT(z))^{-1}T(z)\delta_0 \le K_s^{-1}|\lambda|^{-s}(1 - K_s|\lambda|^sT(z))^{-1}\delta_0, \]
and
\[ g(n, z) \le K_s^{-1}|\lambda|^{-s}\sum_{m=0}^{\infty}(K_s|\lambda|^s)^m(\delta_0|T(z)^m\delta_n). \]
As in the previous section, if in addition to (3.21) we have that $K_s|\lambda|^s < (C_2C_3)^{-1}$, then for some $C$ and all $n$,
\[ \sup_{z \in \mathbb{C}_\pm} g(n, z) \le C\langle n\rangle^{-3s/2}. \]
This yields Theorem 3.12 with
\[ \Lambda_s = \left(\min(C_1^{-1}, (C_2C_3)^{-1})/K_s\right)^{1/s}. \]

Acknowledgements. We would like to thank S. Molchanov, L. Pastur, B. Simon and B. Vainberg for useful discussions and A. Kelm for comments on the manuscript. VJ's work was partially supported by NSERC. YL's work was partially supported by NSF grant DMS-9801474, by ISF grant 447/99, and by an Allon fellowship.
References

[A] Aizenman, M.: Localization at weak disorder: Some elementary bounds. Rev. Math. Phys. 6, 1163 (1994)
[AM] Aizenman, M., Molchanov, S.: Localization at large disorder and at extreme energies: An elementary derivation. Commun. Math. Phys. 157, 245 (1993)
[AG] Aizenman, M., Graf, G.-M.: Localization bounds for an electron gas. J. Phys. A 31, 6783 (1998)
[CS] Chahrour, A., Sahbani, J.: On the spectral and scattering theory of the Schrödinger operator with surface potential. Rev. Math. Phys. 12, 561 (2000)
[CFKS] Cycon, H., Froese, R., Kirsch, W., Simon, B.: Schrödinger Operators. Berlin–Heidelberg: Springer-Verlag, 1987
[Gr] Graf, G.-M.: Anderson localization and the space-time characteristic of continuum states. J. Stat. Phys. 75, 337 (1994)
[Da] Davies, E.B.: Spectral Theory and Differential Operators. Cambridge: Cambridge University Press, 1995
[DS] Davies, E.B., Simon, B.: Scattering theory for systems with different spatial asymptotics on the left and right. Commun. Math. Phys. 63, 277 (1978)
[JL1] Jakšić, V., Last, Y.: Corrugated surfaces and a.c. spectrum. Rev. Math. Phys. 12, 1465 (2000)
[JL2] Jakšić, V., Last, Y.: Spectral structure of Anderson type Hamiltonians. Invent. Math. 141, 561 (2000)
[JM1] Jakšić, V., Molchanov, S.: On the surface spectrum in dimension two. Helv. Phys. Acta 71, 629 (1999)
[JM2] Jakšić, V., Molchanov, S.: Localization of surface spectra. Commun. Math. Phys. 208, 153 (1999)
[JM3] Jakšić, V., Molchanov, S.: Wave operators for the surface Maryland model. J. Math. Phys. 41, 4452 (2000)
[JM4] Jakšić, V., Molchanov, S.: On the spectrum of the surface Maryland model. Lett. Math. Phys. 45, 185 (1998)
[JMP1] Jakšić, V., Molchanov, S., Pastur, L.: On the propagation properties of surface waves. In Wave Propagation in Complex Media, IMA Vol. Math. Appl. 96, 143 (1998)
[JMP2] Jakšić, V., Molchanov, S., Pastur, L.: In preparation
[KP] Khoruzenko, B.A., Pastur, L.: The localization of surface states: An exactly solvable model. Physics Reports 288, 109 (1997)
[M] Molchanov, S.: Lectures on Random Media. In: Lectures on Probability, ed. P. Bernard, Lecture Notes in Mathematics, 1581, Heidelberg: Springer-Verlag, 1994
[MV] Molchanov, S., Vainberg, B.: Unpublished
[P] Pastur, L.: Surface waves: Propagation and localization. Journées “Équations aux dérivées partielles” (Saint-Jean-de-Monts, 1995), Exp. No. VI, École Polytech. Palaiseau (1995)
[RS] Reed, M., Simon, B.: Methods of Modern Mathematical Physics, IV. Analysis of Operators. London: Academic Press, 1978
[S] Simon, B.: Spectral analysis of rank one perturbations and applications. CRM Proc. Lecture Notes, Providence, RI: 8, 1995

Communicated by B. Simon
Commun. Math. Phys. 218, 479 – 511 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Quiescent Cosmological Singularities
Lars Andersson1, Alan D. Rendall2
1 Department of Mathematics, Royal Institute of Technology, 100 44 Stockholm, Sweden. E-mail: [email protected]
2 Max-Planck-Institut für Gravitationsphysik, Am Mühlenberg 1, 14424 Golm, Germany. E-mail: [email protected]
Received: 17 January 2000 / Accepted: 27 November 2000
Abstract: The most detailed existing proposal for the structure of spacetime singularities originates in the work of Belinskii, Khalatnikov and Lifshitz. We show rigorously the correctness of this proposal in the case of analytic solutions of the Einstein equations coupled to a scalar field or stiff fluid. More specifically, we prove the existence of a family of spacetimes depending on the same number of free functions as the general solution which have the asymptotics suggested by the Belinskii–Khalatnikov–Lifshitz proposal near their singularities. In these spacetimes a neighbourhood of the singularity can be covered by a Gaussian coordinate system in which the singularity is simultaneous and the evolution at different spatial points decouples. 1. Introduction The singularity theorems of Penrose and Hawking are among the best known theoretical results in general relativity. They guarantee the existence of spacetime singularities under rather general circumstances but say little about the structure of the singularities they predict. In the literature there are heuristic approaches to describing the structure of singularities, notably that of Belinskii, Khalatnikov and Lifshitz (BKL), described in [15, 4, 6] and their references. The BKL work indicates that generic singularities are oscillatory and therefore, in a certain sense, complicated. This complexity may explain why it has not been possible to determine the structure of the singularities by rigorous mathematical arguments. According to the BKL analysis, the presence of oscillatory behaviour in solutions of the Einstein equations coupled to some matter fields is to a great extent independent of the details of the matter content. There are, however, exceptions. It was pointed out by Belinskii and Khalatnikov [5] that a massless scalar field can change the situation dramatically, producing singularities without oscillations. A massless scalar field is closely related to a stiff fluid, i.e. 
a perfect fluid with pressure equal to energy density, as will be explained in more detail below. Barrow [3] exploited the singularity structure of solutions
of the Einstein equations coupled to a stiff fluid for a description of the early universe he called “quiescent cosmology”. We will refer to singularities where oscillatory behaviour is absent due to the matter content of spacetime as quiescent singularities. Recently the Einstein equations coupled to a scalar field have once again been a source of interest, this time in the context of string cosmology. A formal low energy limit of string theory gives rise to the Einstein equations coupled to various matter fields. Under simplifying assumptions the collection of matter fields can be reduced to a single scalar field, the dilaton. The field equations are then equivalent to the standard Einstein-scalar field equations. (Note, however, that the metric occurring in this formulation of the equations is not the physical metric.) The structure of the singularity in these models plays a role in the so-called pre-big bang scenario. For more information on these matters the reader is referred to the work of Buonanno, Damour and Veneziano [7] where, in particular, formal expansions around singularities in solutions of the Einstein-scalar field equations were obtained in the spherically symmetric case. In view of the above facts, the Einstein-scalar field equations and, more generally, the Einstein-stiff fluid equations represent an opportunity to prove something about the structure of spacetime singularities in a context simpler than that encountered in the case of the vacuum Einstein equations or the Einstein equations coupled to a perfect fluid with a softer equation of state. In this paper we take this opportunity and prove the existence of a family of solutions of the Einstein-scalar field equations whose singularities can be described in detail and are quiescent. These spacetimes are very general in the sense that no symmetry is assumed and they depend on as many free functions as the general solution of the Einstein-scalar field equations. 
They have an initial singularity near which they can be approximated by solutions of a simpler system of differential equations, the velocity dominated system. Like the full system, it consists of constraints and evolution equations. The evolution equations contain no spatial derivatives and are thus a system of ordinary differential equations. This is an expression of the idea of BKL that the evolution at different spatial points decouples near the singularity.

The structure of the paper is as follows. In the second section we recall the Einstein-scalar field and Einstein-stiff fluid equations and define the corresponding velocity dominated systems. This allows the main theorems to be stated. They assert the existence of a unique solution of the Einstein-scalar field equations or Einstein-stiff fluid equations asymptotic to a given solution of the velocity dominated system. The proofs of the existence and uniqueness theorems are described in the third section. In the fourth section the main analytical tool used in these proofs, the theory of Fuchsian systems, is presented. The algebraic machinery needed for the application of the Fuchsian theory is set up in the fifth section. This provides the basis for the estimates of spatial curvature and other important quantities in the section which follows. The seventh section treats relevant aspects of the constraints. The paper concludes with a discussion of what can be learned from the results of the paper and what generalizations are desirable.

Throughout the paper the scalar field and stiff fluid cases are treated in parallel. These are independent except in Sect. 7, where the propagation of constraints for the scalar field is deduced from the corresponding statement for the stiff fluid. Hence concentrating on the scalar field case on a first reading would give a good idea of the main features of the proofs.
2. The Main Results

Let ${}^{(4)}g_{\alpha\beta}$ be a Lorentz metric on a four-dimensional manifold $M$ which is diffeomorphic to $(0, T) \times S$ for a three-dimensional manifold $S$. Let a point of $M$ be denoted by $(t, x)$, where $t \in (0, T)$ and $x \in S$. It will be important in the following to express the geometrical quantities of interest in terms of a local frame $\{e_a\}$ on $S$. Note that this frame does not depend on $t$. Let $\{\theta^a\}$ denote the coframe dual to $\{e_a\}$. Throughout the paper lower case Latin indices refer to components in this frame, except where other conventions are introduced explicitly. Suppose that the metric takes the form:
\[ -dt^2 + g_{ab}(t)\theta^a \otimes \theta^b, \tag{1} \]
where $g_{ab}(t)$ denotes the one-parameter family of Riemannian metrics on $S$ defined by the metrics induced on the hypersurfaces $t = $ constant by the metric ${}^{(4)}g_{\alpha\beta}$. A function $t$ such that the metric takes the form (1) is called a Gaussian time coordinate. In this case, the second fundamental form of a hypersurface $t = $ constant is given by $k_{ab} = -\frac{1}{2}\partial_t g_{ab}$.

2.1. The Einstein-matter equations. The Einstein field equations coupled to matter can be written in the following equivalent 3 + 1 form. The constraints are:
\[ R - k_{ab}k^{ab} + (\mathrm{tr}\, k)^2 = 16\pi\rho, \tag{2a} \]
\[ \nabla^a k_{ab} - e_b(\mathrm{tr}\, k) = 8\pi j_b. \tag{2b} \]
The evolution equations are:
\[ \partial_t g_{ab} = -2k_{ab}, \tag{3a} \]
\[ \partial_t k^a{}_b = R^a{}_b + (\mathrm{tr}\, k)k^a{}_b - 8\pi\left(S^a{}_b - \tfrac{1}{2}\delta^a_b\, \mathrm{tr}\, S\right) - 4\pi\rho\,\delta^a_b. \tag{3b} \]
Here $R$ is the scalar curvature of $g_{ab}$ and $R_{ab}$ its Ricci tensor. The quantities $\rho$, $j_a$ and $S_{ab}$ are projections of the energy-momentum tensor. Their explicit forms in the cases of interest in this paper will be given below.

The energy-momentum tensor of a scalar field is given by
\[ T_{\alpha\beta} = \nabla_\alpha\phi\nabla_\beta\phi - \tfrac{1}{2}(\nabla_\gamma\phi\nabla^\gamma\phi)\,{}^{(4)}g_{\alpha\beta}. \tag{4} \]
The Einstein equations can be written in the equivalent form ${}^{(4)}R_{\alpha\beta} = 8\pi\nabla_\alpha\phi\nabla_\beta\phi$, where ${}^{(4)}R_{\alpha\beta}$ is the Ricci tensor of ${}^{(4)}g_{\alpha\beta}$. We have $\rho = T_{00}$, $j_a = -T_{0a}$ and $S_{ab} = T_{ab}$ so that in the case of a scalar field it follows from (4) that:
\[ \rho = \tfrac{1}{2}\left[(\partial_t\phi)^2 + g^{ab}e_a(\phi)e_b(\phi)\right], \tag{5a} \]
\[ j_b = -\partial_t\phi\, e_b(\phi), \tag{5b} \]
\[ S_{ab} = e_a(\phi)e_b(\phi) + \tfrac{1}{2}\left[(\partial_t\phi)^2 - g^{cd}e_c(\phi)e_d(\phi)\right]g_{ab}. \tag{5c} \]
Note that it follows from the Einstein-scalar field equations as a consequence of the Bianchi identity that $\phi$ satisfies the wave equation ${}^{(4)}g^{\alpha\beta}\nabla_\alpha\nabla_\beta\phi = 0$. This has the 3 + 1 form:
\[ -\partial_t^2\phi + (\mathrm{tr}\, k)\partial_t\phi + \Delta\phi = 0. \tag{6} \]
The constraints and evolution equations are together equivalent to the full Einstein-scalar field equations. In the following we will work with the 3+1 formulation of the equations rather than the four-dimensional formulation.

A stiff fluid is a perfect fluid with pressure equal to energy density. As we will see, it is closely related to the scalar field. The energy-momentum tensor of a stiff fluid is
\[ T_{\alpha\beta} = \mu\left(2u_\alpha u_\beta + {}^{(4)}g_{\alpha\beta}\right), \tag{7} \]
where $\mu$ is the energy density of the fluid in a comoving frame and $u^\alpha$ is the four-velocity. The Euler equations are obtained by substituting this expression into the equation $\nabla_\alpha T^{\alpha\beta} = 0$. The relation between the scalar field and the stiff fluid is as follows. Given a solution of the Einstein equations coupled to a scalar field, where the gradient of the scalar field $\phi$ is everywhere timelike, define $\mu = -(1/2)\nabla_\alpha\phi\nabla^\alpha\phi$ and $u^\alpha = \pm(-\nabla_\beta\phi\nabla^\beta\phi)^{-1/2}\nabla^\alpha\phi$. Here the sign is chosen so that $u^\alpha$ is future pointing, and so can be interpreted as the four-velocity of a fluid. Then the energy-momentum tensor defined by (7) is equal to the energy-momentum tensor of the scalar field. Since the latter is divergence-free we see that the fluid variables just defined together with the original metric define a solution of the Einstein equations coupled to a stiff fluid. In the spacetimes of interest in the following, the gradient of $\phi$ is always timelike near the singularity, so that the condition on the gradient is not a restriction in that situation.

The matter terms needed for the Einstein equations are given in the stiff fluid case by
\[ \rho = \mu(1 + 2|u|^2), \tag{8a} \]
\[ j_b = 2\mu(1 + |u|^2)^{1/2}u_b, \tag{8b} \]
\[ S_{ab} = \mu(2u_au_b + g_{ab}). \tag{8c} \]
Here $|u|^2 = g_{ab}u^au^b$. The Euler equations can be written in the following 3+1 form:
\[ \begin{aligned} \partial_t\mu - 2(\mathrm{tr}\, k)\mu = {} & -2|u|^2\partial_t\mu - 4\mu u^a\partial_tu_a - 2\mu k_{ab}u^au^b + 2(\mathrm{tr}\, k)\mu|u|^2 - 2e_a(\mu)\left(1 + |u|^2\right)^{1/2}u^a \\ & - \mu\left(1 + |u|^2\right)^{-1/2}u^a\nabla_a(u_bu^b) - 2\mu\left(1 + |u|^2\right)^{1/2}\nabla^au_a, \end{aligned} \tag{9a} \]
\[ \begin{aligned} \partial_tu_a + (\mathrm{tr}\, k)u_a = {} & -\mu^{-1}\left[\partial_t\mu - 2(\mathrm{tr}\, k)\mu\right]u_a + \left(1 + |u|^2\right)^{-1}u^bu_a\partial_tu_b \\ & - \left(1 + |u|^2\right)^{-1/2}\left[\left(u_au^b + (1/2)\delta_a^b\right)\mu^{-1}e_b(\mu) + \nabla^cu_c\, u_a + u^c\nabla_cu_a\right]. \end{aligned} \tag{9b} \]
2.2. The velocity dominated system. We will prove the existence of a large class of solutions of the Einstein-scalar field equations and Einstein-stiff fluid equations whose singularities we can describe in great detail. These singularities are of the type known as velocity dominated. (Cf. [9] and [11].) This means that near the singularity the solution can be approximated by a solution of a simpler system, the velocity dominated system. Like the 3+1 version of the full equations it consists of constraints and evolution equations. Solutions of the velocity dominated system will always be written with a left superscript zero and the convention is adopted that the indices of all quantities with this superscript are moved with the velocity dominated metric ${}^0g_{cd}$. The velocity dominated equations will now be written out explicitly. The constraints are:
\[ -{}^0k_{ab}\,{}^0k^{ab} + (\mathrm{tr}\,{}^0k)^2 = 16\pi\,{}^0\rho, \tag{10a} \]
\[ \nabla^a({}^0k_{ab}) - e_b(\mathrm{tr}\,{}^0k) = 8\pi\,{}^0j_b. \tag{10b} \]
The evolution equations are:
\[ \partial_t\,{}^0g_{ab} = -2\,{}^0k_{ab}, \tag{11a} \]
\[ \partial_t\,{}^0k^a{}_b = (\mathrm{tr}\,{}^0k)\,{}^0k^a{}_b - 8\pi\left({}^0S^a{}_b - \tfrac{1}{2}\delta^a_b\,\mathrm{tr}\,{}^0S\right) - 4\pi\,{}^0\rho\,\delta^a_b. \tag{11b} \]
In these equations the matter terms are not identical to those in the full equations but have been obtained from those by discarding certain terms. In the case of a scalar field the (truncated) energy-momentum tensor components are given by
\[ {}^0\rho = \tfrac{1}{2}(\partial_t\,{}^0\phi)^2, \tag{12a} \]
\[ {}^0j_b = -\partial_t\,{}^0\phi\, e_b({}^0\phi), \tag{12b} \]
\[ {}^0S_{ab} = \tfrac{1}{2}(\partial_t\,{}^0\phi)^2\,({}^0g_{ab}). \tag{12c} \]
The scalar field satisfies the equation $-\partial_t^2({}^0\phi) + (\mathrm{tr}\,{}^0k)\partial_t\,{}^0\phi = 0$. It is important to note that the velocity dominated evolution equations are ordinary differential equations. However the constraints still include partial differential equations. In the stiff fluid case the (truncated) energy-momentum tensor components are
\[ {}^0\rho = {}^0\mu, \tag{13a} \]
\[ {}^0j_b = 2\,{}^0\mu\,{}^0u_b, \tag{13b} \]
\[ {}^0S_{ab} = {}^0\mu\,{}^0g_{ab}. \tag{13c} \]
The velocity dominated Euler equations are
\[ \partial_t\,{}^0\mu - 2(\mathrm{tr}\,{}^0k)\,{}^0\mu = 0, \tag{14a} \]
\[ \partial_t\,{}^0u_a + (\mathrm{tr}\,{}^0k)\,{}^0u_a = -(1/2)\,{}^0\mu^{-1}e_a({}^0\mu). \tag{14b} \]
In contrast to the scalar field case, this is not a system of ordinary differential equations. However it has a hierarchical ODE structure in the sense that if the ODE for ${}^0\mu$ is solved and the result substituted into the other equations an ODE system for the ${}^0u_a$ results.

Substituting the expressions for the truncated energy-momentum tensor into the velocity dominated system shows that the matter terms in the velocity dominated evolution equation for ${}^0k_{ab}$ cancel both for the scalar field and the stiff fluid, leaving $\partial_t\,{}^0k^a{}_b = (\mathrm{tr}\,{}^0k)\,{}^0k^a{}_b$. Taking the trace of this equation gives $\partial_t(\mathrm{tr}\,{}^0k) = (\mathrm{tr}\,{}^0k)^2$. This has the general solution $\mathrm{tr}\,{}^0k = (C - t)^{-1}$. If we wish an initial singularity, as signalled by the blow-up of $\mathrm{tr}\,{}^0k$, to occur at $t = 0$ then $\mathrm{tr}\,{}^0k = -t^{-1}$. Going back to the equation for ${}^0k^a{}_b$ we see that $t({}^0k^a{}_b)$ is independent of time. Thus all components of the mixed form of the second fundamental form are proportional to $t^{-1}$. At any given spatial point we can simultaneously diagonalize ${}^0g_{ab}$ and ${}^0k_{ab}$ by a suitable choice of frame. The matrix of components of the metric in this frame is diagonal with the diagonal elements being proportional to powers of $t$. This form of the metric is that originally used by BKL. Its disadvantage is that in general this frame cannot be chosen to depend smoothly on the spatial point. (There are difficulties when there are changes in the multiplicity of the eigenvalues of ${}^0k^a{}_b$.) This is one reason why a different formulation is used in this paper.

The velocity dominated matter equations can be solved exactly to give
\[ {}^0\phi(t, x) = A(x)\log t + B(x) \tag{15} \]
for given functions $A$ and $B$ on $S$ and
\[ {}^0\mu(t, x) = A^2(x)t^{-2}, \tag{16} \]
\[ {}^0u_a(t, x) = t\log t\,(A(x))^{-1}e_a(A(x)) + tB_a(x) \tag{17} \]
for given quantities $A(x)$ and $B_a(x)$ on $S$.

2.3. Statement of the main theorems. The main theorems can now be stated.

Theorem 1. Let $S$ be a three-dimensional analytic manifold and let ${}^0g_{ab}(t)$, ${}^0k_{ab}(t)$, ${}^0\phi(t)$ be a $C^\omega$ solution of the velocity dominated Einstein-scalar field equations on $S \times (0, \infty)$ such that $t\,\mathrm{tr}\,{}^0k = -1$ and each eigenvalue $\lambda$ of $-t\,{}^0k^a{}_b$ is positive. Then there exists an open neighbourhood $U$ of $S \times \{0\}$ in $S \times [0, \infty)$ and a unique $C^\omega$ solution $(g_{ab}(t), k_{ab}(t), \phi(t))$ of the Einstein-scalar field equations on $U \cap (S \times (0, \infty))$ such that for each compact subset $K \subset S$ there are positive real numbers $\zeta, \beta, \alpha^a{}_b$, with $\zeta < \beta < \alpha^a{}_b$, for which the following estimates hold uniformly on $K$:

1. ${}^0g^{ac}g_{cb} = \delta^a_b + o(t^{\alpha^a{}_b})$,
2. $k^a{}_b = {}^0k^a{}_b + o(t^{-1+\alpha^a{}_b})$,
3. $\phi = {}^0\phi + o(t^\beta)$,
4. $\partial_t\phi = \partial_t\,{}^0\phi + o(t^{-1+\beta})$,
5. ${}^0g^{ac}e_f(g_{cb}) = o(t^{\alpha^a{}_b - \zeta})$,
6. $e_a(\phi) = e_a({}^0\phi) + o(t^{\beta - \zeta})$.

Note that the condition on $\mathrm{tr}\,{}^0k$ can always be arranged by means of a time translation and that the condition on the eigenvalues of ${}^0k^a{}_b$ is satisfied provided it holds for a single value of $t > 0$. The positivity condition on the eigenvalues together with the velocity dominated Hamiltonian constraint imply that $A^2$ must be strictly positive in the velocity dominated solution. Thus vacuum solutions are ruled out by the hypotheses of this theorem. If an analogous analysis were done for the Einstein equations coupled to other matter models, for instance a perfect fluid with equation of state $p = k\rho$, $k < 1$, then in many cases, including that of the fluid just mentioned, the matter would make no contribution to the velocity dominated Hamiltonian constraint, so that it would not be possible to prove an analogous theorem. This reflects the fact that for those matter models an oscillatory approach to the singularity is predicted by the BKL analysis. The scalar field is an exception, as is the stiff fluid which will be discussed next.
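The remark that $A^2$ must be strictly positive can be made concrete in an eigenframe at a single spatial point. Writing $p_1, p_2, p_3$ for the eigenvalues of $-t\,{}^0k^a{}_b$, the condition $t\,\mathrm{tr}\,{}^0k = -1$ gives $p_1 + p_2 + p_3 = 1$, and inserting (12a) with (15) into the Hamiltonian constraint (10a) gives $1 - \sum_i p_i^2 = 8\pi A^2$. The following minimal sketch evaluates this relation; the function name `scalar_amplitude_squared` and the pointwise reduction are our own illustration, not notation from the paper.

```python
import math


def scalar_amplitude_squared(p):
    """A^2 implied by (10a), (12a), (15) for eigenvalues p of -t k^a_b.

    Requires sum(p) == 1, which encodes t * tr k = -1.
    """
    if abs(sum(p) - 1.0) > 1e-12:
        raise ValueError("eigenvalues must sum to 1")
    return (1.0 - sum(q * q for q in p)) / (8.0 * math.pi)
```

If all $p_i$ are positive and sum to one, then $\sum_i p_i^2 < 1$ and the function returns a positive value. A vacuum Kasner choice such as $(-1/3, 2/3, 2/3)$ gives $A^2 = 0$ but necessarily has a negative eigenvalue, which is how the hypotheses exclude vacuum.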
Theorem 2. Let S be a three-dimensional analytic manifold and let $({}^0g_{ab}(t), {}^0k_{ab}(t), {}^0\mu(t), {}^0u_a)$ be a $C^\omega$ solution of the velocity dominated Einstein-stiff fluid equations on S × (0, ∞) such that $t\,\mathrm{tr}\,{}^0k = -1$ and each eigenvalue λ of $-t\,{}^0k^a_b$ is positive. Then there exists an open neighbourhood U of S × {0} in S × [0, ∞) and a unique $C^\omega$ solution $(g_{ab}(t), k_{ab}(t), \mu(t), u_a(t))$ of the Einstein-stiff fluid equations on U ∩ (S × (0, ∞)) such that for each compact subset K ⊂ S there are positive real numbers $\zeta, \beta_1, \beta_2, \alpha^a_b$, with $\zeta < \beta_2 < \beta_1 < \alpha^a_b$, for which the following estimates hold uniformly on K:
1. ${}^0g^{ac}g_{cb} = \delta^a_b + o(t^{\alpha^a_b})$,
2. $k^a_b = {}^0k^a_b + o(t^{-1+\alpha^a_b})$,
3. $\mu = {}^0\mu + o(t^{-2+\beta_1})$,
4. $u_a = {}^0u_a + o(t^{1+\beta_2})$,
5. ${}^0g^{ac}e_f(g_{cb}) = o(t^{\alpha^a_b-\zeta})$.

The interest of these theorems depends very much on what information is available on constructing solutions of the velocity dominated system. Suppose that a solution of the velocity dominated constraints is given for some $t = t_0 > 0$. The velocity dominated evolution equations constitute a system of ordinary differential equations which can be solved with these initial data. It follows from the remarks above that the solution exists globally on the interval (0, ∞). If we define
\[ {}^0C = -{}^0k_{ab}\,{}^0k^{ab} + (\mathrm{tr}\,{}^0k)^2 - 16\pi\,{}^0\rho, \tag{18} \]
\[ {}^0C_b = \nabla^a({}^0k_{ab}) - e_b(\mathrm{tr}\,{}^0k) - 8\pi\,{}^0j_b, \tag{19} \]
then the velocity dominated evolution equations imply that:
\[ \partial_t{}^0C + 2t^{-1}({}^0C) = 0, \tag{20} \]
\[ \partial_t{}^0C_a + t^{-1}({}^0C_a) = \tfrac{1}{2}e_a({}^0C). \tag{21} \]
To prove this it is necessary to use the following equations for the matter quantities, which can be derived from the velocity dominated matter equations in both the scalar field and stiff fluid cases.
\[ \partial_t{}^0\rho = 2(\mathrm{tr}\,{}^0k)\,{}^0\rho, \tag{22} \]
\[ \partial_t{}^0j_a = (\mathrm{tr}\,{}^0k)\,{}^0j_a - e_a({}^0\rho). \tag{23} \]
Since 0 C vanishes at t = t0 the evolution equation for 0 C implies that it vanishes everywhere. Then the evolution equation for 0 Ca , together with the fact that it vanishes for t = t0 implies that 0 Ca vanishes everywhere. To sum up, if the velocity dominated constraints are satisfied at some time and the velocity dominated evolution equations are satisfied everywhere then the velocity dominated constraints are satisfied everywhere. Thus in order to have a parametrization of the general solution of the velocity dominated system, it is enough to obtain a parametrization of solutions of the velocity dominated constraints. The latter question will be treated in Sect. 7.
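The propagation argument for the Hamiltonian constraint can be illustrated numerically. The following is a minimal sketch, assuming only the scalar equation (20) at a fixed spatial point; the RK4 integrator and every name in it are illustrative choices and not part of the paper. It shows that zero data at $t_0$ stay zero in both time directions, while nonzero data grow like $(t_0/t)^2$ towards the singularity.

```python
# Toy check of the constraint-propagation argument for Eq. (20):
#   dC/dt + (2/t) C = 0   =>   C(t) = C(t0) * (t0 / t)**2.
# The integrator and all names are illustrative assumptions.

def rhs(t, c):
    """Right-hand side of dC/dt = -2 C / t."""
    return -2.0 * c / t

def integrate(c0, t0, t1, steps=2000):
    """Integrate from t0 to t1 (t1 may be smaller than t0) with classical RK4."""
    h = (t1 - t0) / steps
    t, c = t0, c0
    for _ in range(steps):
        k1 = rhs(t, c)
        k2 = rhs(t + 0.5 * h, c + 0.5 * h * k1)
        k3 = rhs(t + 0.5 * h, c + 0.5 * h * k2)
        k4 = rhs(t + h, c + h * k3)
        c += (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        t += h
    return c

def exact(c0, t0, t1):
    """Closed-form solution of Eq. (20)."""
    return c0 * (t0 / t1) ** 2

t0 = 1.0
zero_backward = integrate(0.0, t0, 0.1)      # towards the singularity
zero_forward = integrate(0.0, t0, 5.0)       # away from the singularity
nonzero_backward = integrate(1e-3, t0, 0.1)  # a constraint violation is amplified
```

The same reasoning applies to (21) once ${}^0C$ is known to vanish, since the equation for ${}^0C_a$ then becomes homogeneous.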
3. Framework of the Proofs

In this section the proofs of Theorems 1 and 2 are outlined. Only the general logical structure of the proof is explained here and the hard technical parts of the argument are left to later sections. These results will be referred to as required in this section. The first step is to make a suitable ansatz for the desired solution. This essentially means giving names to the remainder terms occurring in the statements of the main theorems. Assume that a velocity dominated solution is given as in those statements. Then a solution is sought in the form:
\[ g_{ab} = {}^0g_{ab} + {}^0g_{ac}\,t^{\alpha^c_b}\gamma^c_b, \tag{24a} \]
\[ k_{ab} = g_{ac}({}^0k^c_b + t^{-1+\alpha^c_b}\kappa^c_b). \tag{24b} \]
In the following the summation convention applies only to repeated tensor indices and not to non-tensorial quantities like $\alpha^a_b$. Thus in the above equations there is a summation on the index c but none on the index b. Matter fields are sought in the form:
\[ \phi = {}^0\phi + t^{\beta}\psi \tag{25} \]
\[ \mu = {}^0\mu + t^{-2+\beta_1}\nu, \tag{26a} \]
and
\[ u_a = {}^0u_a + t^{1+\beta_2}v_a, \tag{26b} \]
respectively. The Einstein-scalar field equations (2), (3), (5) and (6) can be rewritten as equations for $\gamma^a_b$, $\kappa^a_b$ and $\psi$. Similarly the Einstein-stiff fluid equations can be written as equations for $\gamma^a_b$, $\kappa^a_b$, $\nu$ and $v_a$. This system of equations (for either choice of matter model) will be called the first reduced system. Since $\gamma^a_b$ and $\kappa^a_b$ are mixed tensors, there is no direct way to express the fact that they originated from symmetric tensors. Instead it must be shown that when the first reduced system is solved with suitable asymptotic conditions as t → 0 then the quantities $g_{ab}$ and $k_{ab}$ defined by the above equations are in fact symmetric as a consequence of the differential equations and the initial conditions. When allowing non-symmetric tensors $g_{ab}$ and $k_{ab}$, we need to establish some conventions in order to make the definition of the first reduced system unambiguous. Firstly, define $g^{bc}$ as the unique tensor which satisfies $g_{ab}g^{bc} = \delta_a^c$. Next, use the convention that indices on tensors are lowered by contraction with the second index of $g_{ab}$ and raised with the first index of $g^{bc}$. This maintains the usual properties of index manipulations in the case of symmetric $g_{ab}$ as far as possible. The covariant derivatives in the equations are expressed in terms of the connection coefficients in the frame $\{e_a\}$ in an unambiguous way. The definition of the connection coefficients is extended to the case of a non-symmetric tensor $g_{ab}$ by fixing the order of indices according to
\[ g_{cd}\Gamma^d_{ab} = \tfrac{1}{2}\bigl[e_a(g_{bc}) + e_b(g_{ac}) - e_c(g_{ab}) + \gamma^d_{ab}g_{cd} - \gamma^d_{ac}g_{bd} - \gamma^d_{bc}g_{ad}\bigr], \tag{27} \]
where $\gamma^c_{ab} = \theta^c([e_a, e_b])$ are the structure functions of the frame. Finally, in order to define the Ricci tensor in the evolution equation for $k_{ab}$ we define $R_{ab}$ to be the Ricci tensor of the symmetric part ${}^Sg_{ab} = (1/2)(g_{ab} + g_{ba})$ of $g_{ab}$. In Lemma 3 it is shown that given a solution of the velocity dominated system as in the statement of one of the main theorems 1–2, any solution of the first reduced system which
satisfies points 1–6 of Theorem 1, in the scalar field case, and points 1–5 of Theorem 2, in the stiff fluid case, gives rise to symmetric tensors $g_{ab}$ and $k_{ab}$, and thus to a solution of the Einstein-matter evolution equations. It then follows from Lemma 7 and the remarks following it that the Einstein-matter constraints are also satisfied. It follows that to prove the main theorems it is enough to prove the existence and uniqueness of solutions of the first reduced system of the form given in the main theorems.

The existence theorem for the first reduced system will be proved using the theory of Fuchsian systems. Since the form of this theory we use in the following concerns a system of first order equations it is not immediately applicable, due to the occurrence of second order derivatives of the metric and scalar field. It is necessary to introduce some suitable new variables representing spatial derivatives of the basic variables. Define $\lambda^a_{bc} = t^{\zeta}e_c(\gamma^a_b)$ and, in the scalar field case, $\omega_a = t^{\zeta}e_a(\psi)$ and $\chi = t\partial_t\psi + \beta\psi$, where ζ and β are positive constants. The evolution equations satisfied by these quantities are given explicitly in (46b) and (48). With the help of the new variables these equations together with the first reduced system can be written as a first order system. Call the result the second reduced system. It is easy to show, using the evolution equations for differences like $\lambda^a_{bc} - t^{\zeta}e_c(\gamma^a_b)$ which follow from the second reduced system, that the first and second reduced systems give rise to the same sets of solutions under the assumptions of the main theorems, together with corresponding assumptions on the new variables. Thus it suffices to solve the second reduced system, which is of first order. If it can be shown that the second reduced system is Fuchsian, then the main theorems follow from Theorem 3. In fact it is enough to show that the restriction of the system to a neighbourhood of an arbitrary point of S is Fuchsian.
For the local solutions thus obtained can be pieced together to get a global solution. Moreover asymptotic estimates as in the statements of the main theorems follow from corresponding local statements, since a compact subset of S can be covered by finitely many of the local neighbourhoods. In Sects. 5 and 6 it is shown that the second reduced system is Fuchsian on some neighbourhood of each point of S for a suitable choice of the constants $\alpha^a_b$, $\beta$, $\beta_1$, $\beta_2$ and $\zeta$ depending on the given velocity dominated solution. This requires a detailed analysis of the degree of singularity of all terms in the second reduced system, in particular that of the Ricci tensor. The result of the latter is Lemma 6. This is the hardest part of the proof. Due to the above considerations, it is clear that the main theorems follow directly from Theorem 3.

4. Fuchsian Systems

The proofs of Theorems 1 and 2 rely on a result of Kichenassamy and Rendall [14] on Fuchsian systems which uses a method going back to Baouendi and Goulaouic [2]. The result of [14] will now be recalled. It concerns a system of the form:
\[ t\frac{\partial u}{\partial t} + A(x)u = f(t, x, u, u_x). \tag{28} \]
Here u(t, x) is a function on an open subset of R × Rn with values in Rk and A(x) is a C ω matrix-valued function. The derivatives of u with respect to the x variables are denoted by ux . The function f is defined on (0, T0 ] × U1 × U2 , where U1 is an open subset of Rn and U2 is an open subset of Rk+nk , and takes values in Rk . We assume that A(x) is defined on U1 . In this and later sections it will be useful to have some terminology for comparing the sizes of certain expressions.
Definition 1. Let F(t, x, p), G(t, x, p) be functions on $(0, T_0] \times U_1 \times U_2$, where $U_1$, $U_2$ are open subsets of $\mathbb{R}^n$ and $\mathbb{R}^N$ respectively. Then we will say that $F \preceq G$ if for every compact $K \subset U_1 \times U_2$, there is a constant C such that $|F(t, x, p)| \le C|G(t, x, p)|$ for $t \in (0, T_0]$, $(x, p) \in K$.

In the particular case that G is just a function of t we will often use the familiar notation F = O(G(t)) to replace $F \preceq G$. The notation F = o(G(t)) will also be used to indicate that F/G tends to zero uniformly on compact subsets of $U_1 \times U_2$ as t → 0. In the theorem on Fuchsian systems the function f is supposed to be regular in a sense which will now be explained. To do this we need the notion of a function which is continuous in t and analytic in other (complex) variables. This means by definition that it should be a continuous function of all variables, that the first order partial derivatives with respect to all variables other than t should exist and be continuous, and that the Cauchy-Riemann equations should be satisfied in these variables. For further remarks on this concept see [16]. Assume that there is an open subset $\tilde U$ of $\mathbb{C}^{n+k+nk}$ whose intersection with the real section is equal to $U_1 \times U_2$ and a function $\tilde f$ on $(0, T_0] \times \tilde U$ continuous in t and analytic in the remaining arguments whose restriction to $(0, T_0] \times \mathbb{R}^{n+k+nk}$ is equal to f. The function f is called regular if it has an analytic continuation $\tilde f$ of the kind just described, and if there is some θ > 0 such that $\tilde f$ and its first derivatives with respect to the arguments u and $u_x$ are $O(t^\theta)$ as t → 0, in the sense introduced above. For a matrix A with entries $A^a_b$ let $\|A\| = \sup\{\|Ax\| : \|x\| = 1\}$ (operator norm) and $\|A\|_\infty = \max_{a,b}|A^a_b|$ (maximum norm). Since these two norms are equivalent the operator norm could be replaced by the maximum norm in the statement of the theorem which follows and in Lemma 1 below. However in the proof of that lemma below the use of the operator norm is important.

Theorem 3.
Suppose that the function f is regular, A(x) has an analytic continuation to an open set $\tilde U_1$ whose intersection with the real section is $U_1$, and there is a constant C such that $\|\sigma^{A(x)}\| \le C$ for $x \in \tilde U_1$ and 0 < σ < 1. Then Eq. (28) has a unique solution u defined near t = 0 which is continuous in t and analytic in x and tends to zero as t → 0. If $\tilde f$ is analytic for t > 0 then this solution is also analytic in t for t > 0.

Remark 1. Under the hypotheses of the theorem spatial derivatives of any order of u are also o(1) as t → 0. This follows directly from the proof of the theorem in [14].

Remark 2. If the coefficients of the equation depend analytically on a parameter and are suitably regular, then the solution depends analytically on the parameter. It suffices to treat the parameter as an additional spatial variable.

The statement of the theorem is not identical to that given in [14] but the proof is just the same. We can write $f(t, x, u, u_x) = t^\theta g(t, x, u, u_x)$ for some bounded function g. By replacing t as time variable by $t^\theta$, it can be assumed without loss of generality that θ = 1. Then the iteration used in the proof given in [14] converges to the desired solution under the regularity hypothesis we have made on f. For the applications in this paper an extension of this result which applies to equations slightly more general than (28) will be required. These have the form
\[ t\frac{\partial u}{\partial t} + A(x)u = f(t, x, u, u_x) + g(t, x, u)\,t\frac{\partial u}{\partial t}. \tag{29} \]
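Before turning to the generalized form, a scalar toy instance of (28) may help fix ideas. The sketch below assumes the ordinary differential equation $t u' + a u = t(1 + u)$ with a constant a > 0 (so that θ = 1), and iterates the inverse of $t\,d/dt + a$ on polynomials; this is an illustration in the spirit of the existence proof and is not claimed to be the iteration of [14]. The function names and the choice of right-hand side are hypothetical.

```python
# Toy scalar instance of Eq. (28):  t u' + a u = f(t, u),  f(t, u) = t (1 + u),  a > 0.
# The operator (H w)(t) = t**(-a) * integral_0^t s**(a-1) w(s) ds inverts t d/dt + a
# on functions vanishing at t = 0 and sends t**m to t**m / (m + a).
# Iterating u_{n+1} = H[f(., u_n)] from u_0 = 0 is an illustrative assumption.

def iterate(a, n_iter):
    """Return coefficients c[m] of u_n(t) = sum_m c[m] * t**m (with c[0] = 0)."""
    coeffs = [0.0]  # u_0 = 0
    for _ in range(n_iter):
        # f(t, u) = t + t*u: shift the coefficients of u up by one power and add t.
        f = [0.0, 1.0] + coeffs[1:]
        coeffs = [0.0] + [f[m] / (m + a) for m in range(1, len(f))]
    return coeffs

def evaluate(coeffs, t):
    """Evaluate the polynomial with the given coefficients at t."""
    return sum(c * t**m for m, c in enumerate(coeffs))

def residual(a, coeffs):
    """Coefficients, power by power, of t u' + a u - t (1 + u) for the polynomial u."""
    n = len(coeffs)
    res = []
    for m in range(n + 1):
        c_m = coeffs[m] if m < n else 0.0
        c_prev = coeffs[m - 1] if m >= 1 else 0.0
        forcing = 1.0 if m == 1 else 0.0
        res.append((m + a) * c_m - forcing - c_prev)
    return res

a = 0.5
coeffs = iterate(a, 8)
```

In this toy model every other solution differs from the one constructed by a multiple of $t^{-a}$ times a nonvanishing factor, which does not tend to zero as t → 0; this is the scalar counterpart of the uniqueness statement. The generalized form (29) differs from (28) only by the extra term $g\,t\,\partial u/\partial t$.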
If we have an equation of this form where f and g are regular then an analogous existence and uniqueness result holds. For we can rewrite (29) in the form
\[ t\frac{\partial u}{\partial t} + A(x)u = [I - (I - g(t, x, u))^{-1}]A(x)u + (I - g(t, x, u))^{-1}f(t, x, u, u_x). \tag{30} \]
If we call the right-hand side of this equation $h(t, x, u, u_x)$ then it satisfies the conditions required of $f(t, x, u, u_x)$ in Theorem 3. In other words h is regular. For if g(t, x, u) is $O(t^\theta)$ then $(I - g(t, x, u))^{-1}$ is O(1) and $[I - (I - g(t, x, u))^{-1}]$ is $O(t^\theta)$. In the following it will be necessary to verify the regularity hypothesis for some particular systems, and some general remarks which can be used to simplify this task will now be made. In these systems we always have
\[ f(t, x, u, u_x) = \sum_{i=1}^{m} t_i F_i(x, v(t, x), u, u_x), \tag{31} \]
where the $F_i$ are analytic functions on an open subset V of $\mathbb{R}^{n+l+k+nk}$ which includes $U_1 \times \{0\} \times U_2$, $t_1, \dots, t_m$ are some functions of t and v(t, x) is a given function with values in $\mathbb{R}^l$. The $t_i$ are continuous functions on $(0, T_0]$ which tend to zero as t → 0 at least as fast as some positive power of t and are analytic for t > 0. The function v(t, x) is the restriction of an analytic function on $(0, T_0] \times \tilde U_1$ to real values of its arguments. Each component of v tends to zero as t → 0, uniformly on $\tilde U_1$. These properties ensure that f is regular. For the functions $F_i$ have analytic continuations to some open neighbourhood $\tilde V$ of V in $\mathbb{C}^{n+l+k+nk}$. In the examples we will meet the functions $t_i$ are either positive powers of t or positive powers of t times positive powers of log t. The role of v(t, x) is played by the functions $t_j$, the components of the velocity dominated metric ${}^0g_{ab}$, their spatial derivatives of first and second order, and the components of the inverse metric multiplied by suitable powers of t so that the product vanishes in the limit t → 0. We saw in the last section that ${}^0g_{ab}$ can be written in the form $t^{2K}$ which, for each fixed t, is an entire function of K. Thus $t^{2K(x)}$ is analytic on any region where K(x) is analytic. The velocity dominated metric and its derivatives all tend to zero uniformly on compact sets as t → 0 while the same is true of the inverse metric multiplied by a suitable power of t. Next a criterion will be given which allows the hypothesis on the matrix A in Theorem 3 to be checked in many cases.

Lemma 1. Let A(x) be a k × k matrix-valued continuous function defined on a compact subset of $\mathbb{R}^n$. If there is a constant α such that, for each eigenvalue λ of A(x) at any point of the given compact set, Re λ > α, then there is a constant C such that the estimate $\|t^{A(x)}\| \le Ct^{\alpha}$ holds for t small and positive and x in the compact set.

Proof. The general case can be reduced to the case α = 0 by the following computation:
\[ \|t^{A - \alpha I}\| \le \|t^{A}\|\,t^{-\alpha}. \tag{32} \]
In the rest of the proof only the case α = 0 will be considered. Without the parameter dependence the result could easily be proved by reducing the matrix to Jordan canonical form. The difficulty with a parameter is that the reduction to canonical form is in general not a continuous process. At least it can be concluded from the continuity properties of the eigenvalues that there is a β > 0 such that all eigenvalues satisfy Reλ > β. Let
$s = -\log t$. Then the problem is to show that for a fixed $s_0$ there is a constant C > 0 such that $\|e^{-sA}\| \le C$ for all $s > s_0$. By scaling t we may suppose without loss of generality that $s_0 = 0$. For each x we can conclude by reduction to canonical form that there exists a value $s_x$ of s such that the inequality $\|e^{-s_xA(x)}\| < e^{-\beta s_x/2}$ holds. By continuity of the exponential function there is an open neighbourhood $U_x$ of x where this continues to hold for the given value of $s_x$. Let $C_x = \sup\{\|e^{-sA(y)}\| : s \in [0, s_x],\ y \in U_x\}$. It follows that for any $s \in [0, \infty)$ and any $y \in U_x$ we have
\[ \|e^{-sA(y)}\| \le C_x\|e^{-s_xA(y)}\|^{[s/s_x]} \le C_x e^{-\beta s_x[s/s_x]/2} \le C_x. \tag{33} \]
By compactness, it is possible to pass to a subcover consisting of a finite number of the sets $U_x$ and letting C be the maximum of the corresponding $C_x$ we obtain the required estimate.

Remark 3. More generally, an analogous estimate is obtained if A(x) is the direct sum of a matrix B(x) whose eigenvalues have positive real parts and the zero matrix. This is obvious since in that case $t^{A(x)}$ is the direct sum of $t^{B(x)}$ and the identity.

5. Setting up the Reduced Equations

In this section we introduce the adapted frame $\{e_a\}$ and the auxiliary exponents $\{q_a\}$ which will be used in the curvature estimate. Let $x_0$ be given. Let ${}^0g_{ab}$, ${}^0k^a_b$ be solutions of the velocity dominated evolution and constraint equations (11) and (10). Let $p_a$ be the eigenvalues of $K^a_b = -t\,{}^0k^a_b$. Assume that $\{p_a\}$ are such that $p_a(x_0) > 0$, a = 1, 2, 3, $\sum_a p_a = 1$ (Kasner condition) and $p_a$ are ordered so that $p_a \le p_b$ for $a \le b$. Fix an initial time $t_0 \in (0, 1)$. We will in the following restrict our considerations to $t \in (0, 1)$. If K has a double eigenvalue at $x_0$, then in general the eigenvalues and eigenvectors of K are not analytic in a neighbourhood of $x_0$, and therefore in general it is not possible to introduce an analytic frame diagonalizing K in a given neighbourhood [12, §II.5.6]. We will avoid this problem by using the fact that if $p_a$ is a double eigenvalue of $K(x_0)$, the eigenprojector for the pair of eigenvalues corresponding to $p_a$ is analytic in a neighbourhood of $x_0$ [12, §II.1.4]. This means in particular that it is possible to choose an analytic frame which is adapted to the eigenspace of the pair of eigenvalues. This will play a central role in what follows. Choose numbers $\alpha_0, B > 0$ so that $B = \alpha_0/4 < \min_a\{p_a(x_0)\}/40$.

1. Cases I, II, III: We will distinguish between the following cases:
(I) (near Friedmann) $\max_{a,b}|p_a - p_b| < B/2$, a = 1, 2, 3.
(II) (near double eigenvalue) $\max_{a,b}|p_a - p_b| > B/2$, and $|p_{a'} - p_{b'}| < B/2$ for some pair $a', b'$, $a' \ne b'$. Denote by $p_\perp$ the distinguished exponent not equal to $p_{a'}, p_{b'}$.
(III) (diagonalizable) $\min_{a \ne b}|p_a - p_b| > B/2$.

By reducing B if necessary we can make sure that condition I, II or III holds at $x_0$ if the maximum multiplicity of an eigenvalue at $x_0$ is three, two or one respectively. The conditions I, II, III are open, and hence there is an open neighbourhood $U_0 \ni x_0$ such that, if condition I, II or III holds at $x_0$, the condition holds in $U_0$ and further, for $x \in U_0$, $\min_a\{p_a(x)\} > 20B$.

2. Auxiliary exponents $\{q_a\}$: Let $U_0 \ni x_0$ be as in point 1. We will define analytic functions $q_a$, a = 1, 2, 3, called auxiliary exponents, in $U_0$, with the properties:
(a) $q_a > 0$ (positivity),
(b) $q_a \le q_b$ if $a \le b$ (ordering),
(c) $\sum_a q_a = 1$ (Kasner).

The auxiliary exponents $q_a$ will be defined in terms of the eigenvalues $p_a$ of K depending on whether at $x_0$ we are in case I, II, or III. Let $q_a(x_0) = p_a(x_0)$, a = 1, 2, 3, and choose $q_a$ on $U_0$ satisfying the positivity, ordering and Kasner conditions such that in cases I, II, III, the following holds:
(I) $q_a = 1/3$, a = 1, 2, 3,
(II) $q_\perp = p_\perp$, $q_{a'} = q_{b'} = \tfrac{1}{2}(1 - q_\perp)$,
(III) $q_a = p_a$.

Then $q_a$ are analytic on $U_0$. Note that it follows from the definition of $q_a$ that $q_1 \ge \min_i\{p_i\}$ and that $\max_a|q_a - p_a| < B/2$.

3. The frame $\{e_a\}$: In each case I, II, III define an analytic frame $\{e_a\}$ with dual frame $\{\theta^a\}$, by the following prescription: $\{e_a\}$ is an ON frame w.r.t. ${}^0g(t_0)$, and in case II, III the following additional conditions hold:
(II) $e_\perp$ is the eigenvector of K corresponding to $q_\perp$ and $e_{a'}, e_{b'}$ span the eigenspace of K corresponding to the eigenvalues $p_{a'}, p_{b'}$.
(III) $e_a$ are eigenvectors of K corresponding to the eigenvalues $q_a$.

We will call $\{q_a\}$, $\{e_a\}$, $\{\theta^a\}$ satisfying the above conditions adapted. In the following we will work in a neighbourhood $U_0$ defined as above and assume that we are given adapted $\{q_a\}$, $\{e_a\}$, $\{\theta^a\}$ on $U_0$. The role of $\alpha^a_b$ will be to shift the spectrum of the system matrix to be positive. We need to choose $\alpha_0$ larger than B to compensate for the fact that the $q_a$ are not the exact eigenvalues of K. In the following $T_{ab}$ will denote frame components of the tensor T w.r.t. the frame $\{e_a\}$. Define the rescaled frame $\tilde e_a = t^{-q_a}e_a$ with dual frame $\tilde\theta^a = t^{q_a}\theta^a$ and denote by $\tilde T_{ab}$ the $\tilde e_a$ frame components of the tensor T. Then we have
\[ \tilde T_{ab} = t^{-q_a-q_b}T_{ab}, \qquad \tilde T^a_b = t^{q_a-q_b}T^a_b. \tag{34} \]
It follows from the definitions that in case III,
\[ K^a_b = \delta^a_b q_b, \qquad {}^0k^a_b = -t^{-1}\delta^a_b q_b, \tag{35} \]
while in case II, the tensors K, ${}^0k$ and ${}^0g$ are block diagonal in the frame $\{e_a\}$,
\[ K^{a'}_{\perp} = 0, \qquad {}^0k^{a'}_{\perp} = 0, \qquad {}^0g_{a'\perp} = 0. \tag{36} \]
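For $s \in \mathbb{R}$, let $(s)_+ = \max(s, 0)$, and for $a, b \in \{1, 2, 3\}$ let
\[ \alpha^a_b = 2(q_b - q_a)_+ + \alpha_0, \qquad \tilde\alpha^a_b = |q_b - q_a| + \alpha_0. \tag{37} \]
In view of the relation $2(s)_+ - s = |s|$ we have
\[ \alpha^a_b + q_a - q_b = \tilde\alpha^a_b. \tag{38} \]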
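The bookkeeping in (37) and (38) is easy to check numerically. The following sketch uses arbitrary sample exponents and an arbitrary value of $\alpha_0$ (both assumptions made only for illustration) and verifies (38) for all index pairs. It also evaluates the combination $\alpha^a_e + \alpha^e_b - \alpha^a_b$, which by the triangle inequality is never smaller than $\alpha_0$.

```python
# Numerical sanity check of (37)-(38) for sample exponents.
# The values of q and alpha0 are arbitrary illustrative choices.

def plus(s):
    """Positive part (s)_+ = max(s, 0)."""
    return max(s, 0.0)

def alpha(q, a, b, alpha0):
    """alpha^a_b = 2 (q_b - q_a)_+ + alpha0, as in (37)."""
    return 2.0 * plus(q[b] - q[a]) + alpha0

def alpha_tilde(q, a, b, alpha0):
    """alpha-tilde^a_b = |q_b - q_a| + alpha0, as in (37)."""
    return abs(q[b] - q[a]) + alpha0

def max_defect(q, alpha0):
    """Largest violation of (38) over all index pairs (a, b)."""
    idx = range(len(q))
    return max(
        abs(alpha(q, a, b, alpha0) + q[a] - q[b] - alpha_tilde(q, a, b, alpha0))
        for a in idx for b in idx
    )

def min_triangle_gap(q, alpha0):
    """Smallest value of alpha^a_e + alpha^e_b - alpha^a_b over all (a, b, e)."""
    idx = range(len(q))
    return min(
        alpha(q, a, e, alpha0) + alpha(q, e, b, alpha0) - alpha(q, a, b, alpha0)
        for a in idx for b in idx for e in idx
    )

q = [0.2, 0.3, 0.5]   # positive, ordered, summing to 1
alpha0 = 0.02
defect = max_defect(q, alpha0)
gap = min_triangle_gap(q, alpha0)
```

Here `defect` vanishes up to rounding and `gap` equals $\alpha_0$ up to rounding, the minimum being attained for instance when all three indices coincide.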
The following identities are an immediate consequence of Eqs. (35) and (36), together with the fact that in case I, $q_a = 1/3$, a = 1, 2, 3.
\[ {}^0g_{ab}\,t^{\alpha^b_c} = {}^0g_{ab}\,t^{\alpha^a_c}, \tag{39a} \]
\[ {}^0g_{ab}\,t^{q_b} = {}^0g_{ab}\,t^{q_a}. \tag{39b} \]
Note that in (39) no summation over indices is implied. Lemma 1 implies the following estimate for the rescaled frame components of ${}^0\tilde g$.
Lemma 2.
\[ \|{}^0\tilde g_{ab}\|_\infty \le Ct^{-B}, \qquad \|{}^0\tilde g^{ab}\|_\infty \le Ct^{-B}. \tag{40} \]
Proof. Let $Q^a_b = \delta^a_b q_b$ and write $\tilde G_{ab}$ for the matrix of rescaled components ${}^0\tilde g_{ab}$. A direct computation starting from the matrix form of the velocity dominated evolution equation gives
\[ t\partial_t\tilde G_{ab} = 2\tilde G_{ac}(K^c_b - Q^c_b), \]
which has the solution
\[ \tilde G_{ab}(t) = \tilde G_{ac}(t_0)\left(\frac{t}{t_0}\right)^{2(K^c_b - Q^c_b)}. \]
From the definition of $q_a$ and the frame $e_a$ the spectrum of K − Q is of the form $p_a - q_a$. Therefore since $|p_a - q_a| < B/2$, we find that the spectrum of 2(K − Q) is contained in the interval (−B, B). Therefore it follows from Lemma 1 that $\|\tilde G\|_\infty \le Ct^{-B}$. Similarly, using the fact that $\tilde G^{-1}$ satisfies the equation
\[ t\partial_t(\tilde G^{-1})^{ab} = 2(Q^a_c - K^a_c)(\tilde G^{-1})^{cb}, \]
an application of Lemma 1 yields the estimate $\|\tilde G^{-1}\|_\infty \le Ct^{-B}$.
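The way Lemma 1 enters here can be illustrated on the simplest non-diagonalizable example. The sketch below assumes a 2 × 2 Jordan block with eigenvalue μ in (−B, B), for which the matrix power $t^{M}$ is known in closed form; the values of μ and B are arbitrary illustrative choices. The logarithm coming from the nilpotent part is absorbed because μ lies strictly above −B.

```python
import math

# Illustration of the bound used in the proof of Lemma 2: for the Jordan block
#   M = [[mu, 1], [0, mu]]  with  -B < mu < B,
# one has  t**M = exp(log(t) * M) = t**mu * [[1, log t], [0, 1]],
# so the maximum norm of t**M equals t**mu * max(1, |log t|).
# The values of mu and B are illustrative assumptions.

def max_norm_t_power(mu, t):
    """Maximum norm of t**M for the Jordan block M above (closed form)."""
    return t**mu * max(1.0, abs(math.log(t)))

def sup_ratio(mu, B, ts):
    """Largest value of t**B * ||t**M||_max over the sample times ts."""
    return max(t**B * max_norm_t_power(mu, t) for t in ts)

B = 0.1
mu = -0.05  # plays the role of an eigenvalue of 2(K - Q)
ts = [10.0 ** (-k / 4.0) for k in range(1, 400)]  # from about 0.56 down to about 1e-100
C = sup_ratio(mu, B, ts)
```

For these values $t^{B}\|t^{M}\|_\infty = t^{0.05}\max(1, |\log t|)$, whose supremum over (0, 1) is $20/e$, so the sampled constant `C` stays below that number however small t becomes.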
We are now ready to describe the ansatz which will be used to write the Einstein-scalar field and Einstein-stiff fluid systems in Fuchsian form. Assume that a solution of the velocity dominated constraint and evolution equations, ${}^0g_{ab}$, ${}^0k_{ab}$, with adapted frame, coframe, and auxiliary exponents $\{e_a\}$, $\{\theta^a\}$, $\{q_a\}$, is given. Let $\alpha^a_b$ be defined by (37). Let ζ = B/200. We will consider metrics and second fundamental forms g, k of the form
\[ g_{ab} = {}^0g_{ab} + {}^0g_{ac}\,t^{\alpha^c_b}\gamma^c_b, \qquad \gamma^c_b = o(1), \tag{41a} \]
\[ g^{ab} = {}^0g^{ab} + t^{\alpha^a_b}\bar\gamma^a_c\,{}^0g^{cb}, \qquad \bar\gamma^a_c = o(1), \tag{41b} \]
\[ e_c(\gamma^a_b) = t^{-\zeta}\lambda^a_{bc}, \qquad \lambda^a_{bc} = o(1), \tag{41c} \]
\[ k_{ab} = g_{ac}({}^0k^c_b + t^{-1+\alpha^c_b}\kappa^c_b), \qquad \kappa^c_b = o(1). \tag{41d} \]
The form (41b) is a consequence of (41a). To see this, note that ${}^0g^{ac}g_{cb} = \delta^a_b + t^{\alpha^a_b}\gamma^a_b$ and that $g^{ac}\,{}^0g_{cb} = ({}^0g^{ac}g_{cb})^{-1}$. Thus the desired result follows from the matrix identity $(I + A)^{-1} = I - A + (I + A)^{-1}A^2$ and the fact that, using $2(x)_+ - x = |x|$, it can be concluded that
\[ \alpha^a_e + \alpha^e_b - \alpha^a_b = |q_e - q_a| + |q_b - q_e| - |q_b - q_a| + \alpha_0 \ge \alpha_0. \tag{42} \]
The latter relation shows that each component of the square of $\gamma^a_b$ vanishes faster than the corresponding component of $\gamma^a_b$ itself. Let β = B/100. In addition to (41) we will use the following ansatz for the scalar field:
\[ \phi = {}^0\phi + t^{\beta}\psi, \qquad \psi = o(1), \tag{43a} \]
\[ e_a(\psi) = t^{-\zeta}\omega_a, \qquad \omega_a = o(1), \tag{43b} \]
\[ t\partial_t\psi + \beta\psi = \chi, \qquad \chi = o(1), \tag{43c} \]
and for the stiff fluid case,
\[ \mu = {}^0\mu + t^{-2+\beta_1}\nu, \qquad \nu = o(1), \tag{44a} \]
\[ u_a = {}^0u_a + t^{1+\beta_2}v_a, \qquad v_a = o(1). \tag{44b} \]
Equations (41), (43) and (44) with the exception of (41c) and (43b) will be used to derive the first reduced form of the field equations, and Eqs. (41c) and (43b) for the spatial derivatives of $\gamma^a_b$ and $\psi$ will be used to derive the second reduced system. Note that in view of (34), we have
\[ \tilde g_{ab} = {}^0\tilde g_{ab} + {}^0\tilde g_{ac}\,t^{\tilde\alpha^c_b}\gamma^c_b, \tag{45a} \]
\[ \tilde g^{ab} = {}^0\tilde g^{ab} + t^{\tilde\alpha^a_b}\bar\gamma^a_c\,{}^0\tilde g^{cb}, \tag{45b} \]
\[ \tilde k_{ab} = \tilde g_{ac}({}^0\tilde k^c_b + t^{-1+\tilde\alpha^c_b}\kappa^c_b). \tag{45c} \]
We use the following conventions throughout:
– indices on velocity dominated fields ${}^0g_{ab}$, ${}^0k_{ab}$, ${}^0u_a$ are raised and lowered with ${}^0g_{ab}$, while indices on other tensors are raised and lowered with $g_{ab}$,
– the dynamic tensor fields $\gamma^a_b$, $\kappa^a_b$ in $g_{ab}$, $k_{ab}$ are always used in mixed form and only in $\{e_a\}$ frame components,
– the dynamic 1-form $v_a$ in the velocity field $u_a$ is always used with lower index.

5.1. The reduced Einstein–matter system. In this section, we describe the first reduced system for the Einstein-scalar field evolution equations, derived from (3) using the ansatz given by Eqs. (41) and (43) for $g_{ab}$, $k_{ab}$, $\phi$ in terms of the velocity dominated solution ${}^0g_{ab}$, ${}^0k_{ab}$, ${}^0\phi$, the auxiliary exponents $\{q_a\}$ and the dynamical fields $\gamma^a_b$, $\kappa^a_b$, $\psi$. Similarly, we describe the first reduced system for the Einstein-stiff fluid evolution equations obtained using Eqs. (44). For convenience we use the term “Einstein-matter system” to describe the Einstein-scalar field and Einstein-stiff fluid system collectively. The tensor $g_{ab}$ of the form (41a) is not a priori symmetric, but it will follow from Lemma 3 that the solution to the Fuchsian form of the Einstein-matter evolution equations will be symmetric. It is convenient to introduce the symmetrized tensor
\[ {}^Sg_{ab} = \tfrac{1}{2}(g_{ab} + g_{ba}). \]
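Let ${}^SR_{ab}$ be the Ricci tensor computed w.r.t. the symmetrized metric ${}^Sg_{ab}$, see Sect. 6 for details. By substituting into the evolution equations (3) with $R_{ab}$ replaced by ${}^SR_{ab}$, defined in terms of $\gamma^a_b$ and $\lambda^a_{bc}$, we get the following system for $\gamma^a_b$, $\kappa^a_b$, $\lambda^a_{bc}$:
\[ t\partial_t\gamma^a_b + \alpha^a_b\gamma^a_b + 2\kappa^a_b + 2\gamma^a_e(t\,{}^0k^e_b) - 2(t\,{}^0k^a_e)\gamma^e_b = -2t^{\alpha^a_e + \alpha^e_b - \alpha^a_b}\gamma^a_e\kappa^e_b, \tag{46a} \]
\[ t\partial_t\lambda^a_{bc} = t^{\zeta}e_c(t\partial_t\gamma^a_b) + \zeta t^{\zeta}e_c(\gamma^a_b), \tag{46b} \]
\[ t\partial_t\kappa^a_b + \alpha^a_b\kappa^a_b - (t\,{}^0k^a_b)(\mathrm{tr}\,\kappa) = t^{\alpha_0}(\mathrm{tr}\,\kappa)\kappa^a_b + t^{2-\alpha^a_b}({}^SR^a_b - M^a_b), \tag{46c} \]
where $M_{ab}$ is given by
\[ M_{ab} = 8\pi e_a(\phi)e_b(\phi) \quad \text{for the Einstein-scalar field system}, \tag{47a} \]
\[ M_{ab} = 16\pi\mu u_a u_b \quad \text{for the Einstein-stiff fluid system}. \tag{47b} \]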
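$M^a_b$ will be estimated in Sect. 6. Note that the power of t occurring on the right-hand side of Eq. (46a) is positive due to (42). The wave equation (6) becomes the following system of equations for $\psi$, $\omega_a$, $\chi$:
\[ t\partial_t\psi + \beta\psi - \chi = 0, \tag{48a} \]
\[ t\partial_t\omega_a = t^{\zeta}[e_a(\chi) + (\zeta - \beta)e_a(\psi)], \tag{48b} \]
\[ t\partial_t\chi + \beta\chi = t^{\alpha_0 - \beta}\,\mathrm{tr}\,\kappa\,(A + t^{\beta}\chi) + t^{2-\beta}\Delta\,{}^0\phi + t^{2-\zeta}\nabla^a\omega_a. \tag{48c} \]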
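For orientation, Eq. (48b) can be recovered directly from the definitions $\omega_a = t^{\zeta}e_a(\psi)$ and $\chi = t\partial_t\psi + \beta\psi$. The short computation below is a sketch under the assumptions that the frame vectors $e_a$ are independent of t (so that they commute with $t\partial_t$) and that ζ and β are constants; the same pattern, applied to $\lambda^a_{bc} = t^{\zeta}e_c(\gamma^a_b)$, gives (46b).

```latex
\begin{align*}
t\partial_t\omega_a
  &= t\partial_t\bigl(t^{\zeta}e_a(\psi)\bigr)
   = \zeta t^{\zeta}e_a(\psi) + t^{\zeta}e_a(t\partial_t\psi) \\
  &= \zeta t^{\zeta}e_a(\psi) + t^{\zeta}e_a(\chi - \beta\psi)
   = t^{\zeta}\bigl[e_a(\chi) + (\zeta-\beta)e_a(\psi)\bigr], \\
t\partial_t\lambda^a_{bc}
  &= t\partial_t\bigl(t^{\zeta}e_c(\gamma^a_b)\bigr)
   = t^{\zeta}e_c(t\partial_t\gamma^a_b) + \zeta t^{\zeta}e_c(\gamma^a_b).
\end{align*}
```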
Let $U = (\gamma^a_b, \kappa^a_b, \psi, \chi, \lambda^a_{bc}, \omega_a)$. Then we can write the second reduced system in the Einstein-scalar field case, which consists of Eqs. (46) and (48), in the form $t\partial_tU + AU = F(t, x, U, U_x)$ for a matrix A and a function F. We will prove that this system is in Fuchsian form. The Einstein-stiff fluid equations will be treated in a similar way. However, due to the complexity of the equations in that case, they will not be written out more explicitly than is absolutely necessary to understand the essential features of their structure. The second reduced system in the stiff fluid case can be brought into the generalized Fuchsian form
\[ t\partial_tU + AU = F(t, x, U, U_x) + G(t, x, U)\partial_tU \tag{49} \]
already introduced in Sect. 4. In order to do this it is useful to introduce some abbreviations for certain terms in (9) so that the equations become
\[ \partial_t\mu - 2(\mathrm{tr}\,k)\mu = -2|u|^2\partial_t\mu - 4\mu u^a\partial_tu_a + F_1, \tag{50a} \]
\[ \partial_tu_a + (\mathrm{tr}\,k)u_a + (1/2)\mu^{-1}e_a(\mu) = -\mu^{-1}[\partial_t\mu - 2(\mathrm{tr}\,k)\mu]u_a + (1 + |u|^2)^{-1}u^bu_a\partial_tu_b - [(1 + |u|^2)^{-1/2} - 1]\mu^{-1}e_a(\mu) + F_2. \tag{50b} \]
The expressions $F_1$ and $F_2$ contain only terms which can be incorporated into F in (49). Next the ansatz (44) must be substituted into these equations. The result is:
\[ t\partial_t\nu + \beta_1\nu = -2t^{3-\beta_1}(1 + t\,\mathrm{tr}\,k)\,{}^0\mu + 2(1 + t\,\mathrm{tr}\,k)\nu + t^{3-\beta_1}[\partial_t\mu - 2(\mathrm{tr}\,k)\mu], \tag{51a} \]
\[ t\partial_tv_a + \beta_2v_a = -(1 + t\,\mathrm{tr}\,k)v_a + t^{-\beta_2}[\partial_tu_a + (\mathrm{tr}\,k)u_a + (1/2)\,{}^0\mu^{-1}e_a({}^0\mu)]. \tag{51b} \]
The expressions on the left-hand side of the above form of the Euler equations written in terms of the basic variables $\mu$ and $u_a$ occur on the right-hand sides of the above evolution equations for $\nu$ and $v_a$. In order to get a fully explicit form it would be necessary to substitute for these expressions and then express the final result in terms of $\nu$ and $v_a$. This is, however, neither necessary nor even helpful for the analysis to be done here. Next we will consider the matrix A and prove that A is a direct sum of a matrix with spectrum bounded from below by a positive number, with a zero matrix. (The arguments in the scalar field and stiff fluid cases are very similar.) It is therefore of a form such that the theory presented in Sect. 4 applies. In addition we must show that $F(t, x, U, U_x) = O(t^{\delta})$ for some δ > 0 and, in the stiff fluid case, that G(t, x, U) satisfies a similar estimate. This will be done in the next section. The matrix A is block diagonal and therefore it is enough to consider each block separately. The rows and columns of A corresponding to $\lambda^a_{bc}$, $\omega_a$ are zero, and therefore this A is the direct sum of a matrix corresponding to $\gamma$, $\kappa$, $\psi$, $\chi$, with a zero matrix in the scalar field case and the direct sum of a matrix corresponding to $\gamma$, $\kappa$, $\nu$, $v_a$ with a
zero matrix in the stiff fluid case. We now consider the spectrum of this matrix. The submatrix corresponding to $\gamma$, $\kappa$ is upper block triangular. The $\gamma, \gamma$ block is given by $\gamma^a_b \mapsto \alpha^a_b\gamma^a_b + 2[\gamma, t\,{}^0k]^a_b$. To estimate the spectrum of this, it is necessary to consider the cases I, II, III separately. Working in a frame which diagonalizes ${}^0k$, $t\,{}^0k^a_b = -\delta^a_b p_b$, and hence in this case $2[\gamma, t\,{}^0k]^a_b = -2(p_b - p_a)\gamma^a_b$. Therefore, in case III, we get using the definition of $\alpha^a_b$,
\[ \alpha^a_b\gamma^a_b + 2[\gamma, t\,{}^0k]^a_b = \bigl(2[(p_b - p_a)_+ - (p_b - p_a)] + \alpha_0\bigr)\gamma^a_b, \]
and hence using $(x)_+ - x \ge 0$ for all $x \in \mathbb{R}$, the spectrum of the $\gamma, \gamma$ block is bounded from below by $\alpha_0$ in case III. Next consider case I. In this case, $\alpha^a_b = \alpha_0 = 4B$ and the spectrum of $\gamma^a_b \mapsto 2[\gamma, t\,{}^0k]^a_b$ is bounded from below by −B, since $|p_a - p_b| < B/2$, which shows that the spectrum of the $\gamma, \gamma$ block is bounded from below by 3B in case I. Finally, in case II, ${}^0k^a_b$ is block diagonal in the adapted frame. The spectrum of $\gamma^a_b \mapsto 2[\gamma, t\,{}^0k]^a_b$ consists of $2(p_{a'} - p_{b'})$, $2(p_{a'} - p_\perp)$, and 0. Now using the definition of $\alpha^a_b$ for case II and arguing as above, we get the lower bound 3B for the spectrum of the $\gamma, \gamma$ block in case II. Therefore the spectrum of the $\gamma, \gamma$ block of A is bounded from below by 3B. Next we consider the $\kappa, \kappa$ block. This is of the form $\kappa^a_b \mapsto \alpha^a_b\kappa^a_b - (t\,{}^0k^a_b)\,\mathrm{tr}\,\kappa$. First consider the action on the trace-free part of $\kappa^a_b$. Then the spectrum is given by $\alpha^a_b \ge \alpha_0$. On the other hand, restricting to the trace part of $\kappa^a_b$, which is diagonal, we see that the spectrum is $\alpha_0 + 1$. Therefore the spectrum of the $\kappa, \kappa$ block is bounded from below by $\alpha_0$. The $\psi, \chi$ block is of the form
\[ \begin{pmatrix} \beta & -1 \\ 0 & \beta \end{pmatrix} \]
which has spectrum β > 0. The $\nu, v_a$ block is diagonal with eigenvalues $\beta_1$ and $\beta_2$. Therefore, in view of the facts that 3B > β > 0, $\beta_1 > 0$ and $\beta_2 > 0$, the desired properties of the spectrum of A have been verified.
Given a solution $U = (\gamma^a_b, \kappa^a_b, \psi, \chi, \lambda^a_{bc}, \omega_a)$ of the reduced system for the Einstein-scalar field equations, define $g_{ab}$, $k_{ab}$ and $\phi$ by (41) and (43). Similarly, given a solution $U = (\gamma^a_b, \kappa^a_b, \nu, v_a, \lambda^a_{bc})$ of the reduced system for the Einstein-stiff fluid equations, define $(g_{ab}, k_{ab}, \mu, u_a)$ by (41) and (44). If it can be shown that $g_{ab}$ and $k_{ab}$ are symmetric then a solution of the Einstein-scalar field equations is obtained. The next lemma gives sufficient conditions for this to be true.

Lemma 3. Let a solution of the velocity dominated Einstein-matter system be given on S × (0, ∞) with all eigenvalues of $-t\,{}^0k^a_b$ positive and $t\,\mathrm{tr}\,{}^0k = -1$. Let U be a solution of the reduced system for the Einstein-scalar field or Einstein-stiff fluid system corresponding to the given velocity dominated solution, with U = o(1). Define $(g_{ab}, k_{ab}, \phi)$ by (41) and (43) in the scalar field case, and define $(g_{ab}, k_{ab}, \mu, u_a)$ by (41) and (44) in the stiff fluid case. Then $g_{ab}$ and $k_{ab}$ are symmetric.
Proof. From the evolution equation for $\gamma^a_b$ and the definitions of $g_{ab}$ and $k_{ab}$ it follows that $\partial_tg_{ab} = -2k_{ab}$. Similarly an equation close to the usual evolution equation for $k^a_b$ can be recovered from (46c). It differs from the usual one only in the fact that $R^a_b$ is replaced by ${}^SR^a_b$. From these equations we can derive the equations:
\[ \partial_t(g_{ab} - g_{ba}) = -2(k_{ab} - k_{ba}), \tag{52a} \]
\[ \partial_t(k_{ab} - k_{ba}) = (\mathrm{tr}\,k)(k_{ab} - k_{ba}). \tag{52b} \]
It follows from the assumptions on $\gamma^a_b$ and $\kappa^a_b$ together with the definition of $k_{ab}$, that the components of $k_{ab} - k_{ba}$ are $o(t^{-1+\eta})$ for some η > 0. Hence the quantity $E_{ab} = t^{1-\eta}(k_{ab} - k_{ba})$ tends to zero as t → 0. It satisfies the equation:
\[ t\partial_tE_{ab} + \eta E_{ab} = (t\,\mathrm{tr}\,k + 1)E_{ab}. \tag{53} \]
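Equation (53) is a one-line consequence of (52b) and the definition of $E_{ab}$. The computation is sketched below; it assumes only that η is a constant.

```latex
\begin{align*}
t\partial_tE_{ab}
  &= t\partial_t\bigl(t^{1-\eta}(k_{ab}-k_{ba})\bigr)
   = (1-\eta)\,t^{1-\eta}(k_{ab}-k_{ba}) + t^{2-\eta}\,\partial_t(k_{ab}-k_{ba}) \\
  &= (1-\eta)E_{ab} + t\,(\mathrm{tr}\,k)\,E_{ab},
\end{align*}
```

so that $t\partial_tE_{ab} + \eta E_{ab} = (t\,\mathrm{tr}\,k + 1)E_{ab}$. This is a linear homogeneous equation of the form (28) for $E_{ab}$, with A = η > 0, and it has the solution $E_{ab} = 0$.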
From Theorem 3 we conclude that $E_{ab} = 0$. Thus $k_{ab}$ is symmetric. It then follows immediately from (52a) and the fact that $g_{ab} = o(1)$ that $g_{ab}$ is also symmetric.

6. Curvature Estimates

Let ${}^SR_{ab}$ be the Ricci tensor computed w.r.t. the symmetrized metric ${}^Sg_{ab} = \tfrac{1}{2}(g_{ab} + g_{ba})$. In order to get a Fuchsian form for the Einstein–matter evolution equations, we need the following estimate for the frame components of ${}^SR^a_b$,
\[ t^{2-\alpha^a_b}\,{}^SR^a_b = O(t^{\delta}), \quad \text{for some } \delta \in (0, B). \tag{54} \]
In doing the estimates we will use the notion of comparing the size of functions introduced in Definition 1. In proving that the second reduced system $t\partial_t u + A(x)u = f(t, x, u, u_x)$ is in Fuchsian form, one essential step is to prove an estimate of the form $f \preceq t^{\delta}$ for some $\delta > 0$. In the present section, we accomplish this task for the expression $t^{2-\alpha^a{}_b}\,{}^S R^a{}_b$, which is now considered as a function $r(t, x, v(x, t), u, u_x)$, where $v(x, t)$ is defined in terms of the solution ${}^0g_{ab}$, ${}^0k_{ab}$ to the velocity dominated system and the data $\{e_a\}$, $\{\theta^a\}$, $\{q_a\}$, etc. defined in Sect. 5, and $u$ consists of the variables $\gamma^a{}_b$, $\lambda^a{}_{bc}$. In terms of the relation $\preceq$, the goal is to prove
$$t^{2-\alpha^a{}_b}\,{}^S R^a{}_b \preceq t^{\delta}, \quad \text{for some } \delta \in (0, B). \tag{55}$$
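By assumption, $\alpha_0 = 4B$, so $\alpha^a{}_b - 2B = 2(q_b - q_a)_+ + \alpha_0/2$. Therefore, the arguments that apply to $\alpha^a{}_b$ also apply to $\alpha^a{}_b - 2B$. The symmetrized metric tensor satisfies
$${}^S\tilde g_{ab} = {}^0\tilde g_{ab} + t^{\tilde\alpha^a{}_b - 2B}\,{}^0\tilde g_{ac}\,{}^S\gamma^c{}_b, \qquad {}^S\gamma^c{}_b = o(1), \tag{56a}$$
$${}^S\tilde g^{ab} = {}^0\tilde g^{ab} + t^{\tilde\alpha^a{}_b - 2B}\,{}^S\bar\gamma^a{}_c\,{}^0\tilde g^{cb}, \qquad {}^S\bar\gamma^a{}_c = o(1). \tag{56b}$$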
Quiescent Cosmological Singularities
497
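To see this, note the identity
$${}^S\gamma^c{}_b = \frac{1}{2}\,t^{2B}\left(\gamma^c{}_b + {}^0\tilde g^{cd}\,{}^0\tilde g_{bf}\,\gamma^f{}_d\right),$$
which in view of Lemma 2 shows that ${}^S\gamma^c{}_b = o(1)$. The argument that ${}^S\bar\gamma^a{}_c = o(1)$ is the same as for $\bar\gamma^a{}_c = o(1)$.
It is convenient to estimate the rescaled frame components. Note $R^a{}_b = t^{-q_a+q_b}\tilde R^a{}_b$. Hence in view of (38) we need to consider $t^{2-\tilde\alpha^a{}_b}\tilde R^a{}_b$. Using Lemma 2 and (41) gives
$$\|t^{2-\tilde\alpha^a{}_b}\tilde R^a{}_b\|_\infty \le C t^{-B}\,\|t^{2-\tilde\alpha^a{}_b}\tilde R_{ab}\|_\infty.$$
To see this, we compute using (39a) and (45b),
$$\begin{aligned}
\|t^{2-\tilde\alpha^a{}_b}\tilde R^a{}_b\|_\infty &= \|t^{2-\tilde\alpha^a{}_b}\,\tilde g^{ac}\tilde R_{cb}\|_\infty \\
&\le \|t^{2-\tilde\alpha^a{}_b}\,{}^0\tilde g^{ac}\tilde R_{cb}\|_\infty + \|t^{2-\tilde\alpha^a{}_b+\tilde\alpha^a{}_c}\,\bar\gamma^a{}_d\,{}^0\tilde g^{dc}\tilde R_{cb}\|_\infty \\
&\le C\,\|t^{2-B-\tilde\alpha^c{}_b}\tilde R_{cb}\|_\infty,
\end{aligned}$$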
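where we used the triangle inequality in the form $-\tilde\alpha^a{}_b + \tilde\alpha^a{}_c \ge -\tilde\alpha^c{}_b$.
Let $\gamma^c_{ab} = \theta^c([e_a, e_b])$ be the structure coefficients of the frame $\{e_a\}$. The structure coefficients $\tilde\gamma^c_{ab} = \tilde\theta^c([\tilde e_a, \tilde e_b])$ of the frame $\tilde e_a$ are given by
$$\tilde\gamma^c_{ab} = t^{q_c-q_a-q_b}\gamma^c_{ab} - \log(t)\left(t^{-q_a}e_a(q_b)\delta^c_b - t^{-q_b}e_b(q_a)\delta^c_a\right). \tag{57}$$
It is convenient to define $\tilde\Gamma_{abc} = \langle\nabla_{\tilde e_a}\tilde e_b, \tilde e_c\rangle$. Then $\tilde\Gamma_{abc}$ is given in terms of $\tilde g_{ab}$ by
$$2\tilde\Gamma_{abc} = \tilde e_a(\tilde g_{bc}) + \tilde e_b(\tilde g_{ac}) - \tilde e_c(\tilde g_{ab}) + \tilde\gamma^d_{ab}\tilde g_{cd} - \tilde\gamma^d_{ac}\tilde g_{bd} - \tilde\gamma^d_{bc}\tilde g_{ad} \tag{58}$$
and $\tilde R_{dcab}$ is given in terms of $\tilde\Gamma_{abc}$ and $\tilde\gamma^c_{ab}$ by
$$\begin{aligned}
\tilde R_{dcab} &= \tilde e_a\tilde\Gamma_{bcd} - \tilde e_b\tilde\Gamma_{acd} - \tilde\gamma^f_{ab}\tilde\Gamma_{fcd} - \tilde g^{fg}\tilde\Gamma_{bcf}\tilde\Gamma_{adg} + \tilde g^{fg}\tilde\Gamma_{acf}\tilde\Gamma_{bdg} \\
&= (R1)_{dcab} - (R2)_{dcab} - (R3)_{dcab} - (R4)_{dcab} + (R5)_{dcab}.
\end{aligned} \tag{59}$$
Let
$$z_{ab} = \begin{cases} 0, & \text{if } a = b, \\ 1, & \text{if } a \ne b. \end{cases}$$
Define
$$Z_{abc}(t) = t^{-q_a} + t^{-q_b} + t^{-q_c} + t^{q_a-q_b-q_c}z_{bc} + t^{q_b-q_c-q_a}z_{ca} + t^{q_c-q_a-q_b}z_{ab}. \tag{60}$$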
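As an illustration of definition (60), the following Python sketch lists the exponents of $t$ occurring in $Z_{abc}$ and picks out the smallest one, which dominates as $t \to 0$. It is only a numerical sanity check under stated assumptions: the sample exponents are hypothetical, the ordering $q_1 \le q_2 \le q_3$ with $q_1 + q_2 + q_3 = 1$ is assumed, and the function names are not from the paper.

```python
from fractions import Fraction
from itertools import product


def z_exponents(q, a, b, c):
    """Exponents of t appearing in Z_abc(t), eq. (60); a, b, c are 1-based."""
    qa, qb, qc = q[a - 1], q[b - 1], q[c - 1]
    exps = [-qa, -qb, -qc]
    # z_xy = 1 only when the two indices differ, so a term is present
    # only in that case.
    if b != c:
        exps.append(qa - qb - qc)
    if c != a:
        exps.append(qb - qc - qa)
    if a != b:
        exps.append(qc - qa - qb)
    return exps


def leading_exponent(q, a, b, c):
    """Smallest exponent in Z_abc, i.e. the dominant power as t -> 0."""
    return min(z_exponents(q, a, b, c))


# Hypothetical sample exponents (assumption): ordered and summing to 1.
q = (Fraction(1, 5), Fraction(3, 10), Fraction(1, 2))
bound = q[0] - q[1] - q[2]
worst = min(leading_exponent(q, a, b, c)
            for a, b, c in product((1, 2, 3), repeat=3))
```

For these sample values no index triple produces an exponent below $q_1 - q_2 - q_3$, so every $Z_{abc}$ is bounded by a multiple of $t^{q_1-q_2-q_3}$ on $0 < t \le 1$.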
Note that $(R1)_{dcab}, \ldots, (R5)_{dcab}$, $z_{ab}$, $Z_{abc}$ are not tensors. In the rest of this section, the frame $\{e_a\}$ is fixed and the estimates will be done for tensor components in this frame. We will use the following lemma as the starting point of the estimates in this section.
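Lemma 4.
$$\tilde g_{ab} \preceq t^{-B}, \tag{61a}$$
$$\tilde g^{ab} \preceq t^{-B}, \tag{61b}$$
$$\tilde e_a\tilde g_{bc} \preceq t^{-q_a-2B}, \tag{61c}$$
$$\tilde e_a\tilde e_b\tilde g_{cd} \preceq t^{-q_a-q_b-3B}, \tag{61d}$$
$$\tilde g^{ab}t^{-q_b} \preceq t^{-B-q_a}, \tag{61e}$$
$$\tilde e_c(t^{q_a}\tilde g_{ab}) \preceq t^{-q_c+q_b-2B}, \tag{61f}$$
$$\tilde g^{ab}Z_{abc} \preceq t^{-B}\left(t^{-q_h} + t^{-q_c} + t^{q_c-2q_h}\right), \quad h = \min(a, b), \tag{61g}$$
$$\tilde g^{bd}\tilde\gamma^c_{ab} \preceq t^{-2B}\left(t^{q_c-q_a-q_d} + t^{-q_a} + t^{-q_d}\right). \tag{61h}$$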
Proof. The inequalities (61a) and (61b) are immediate from Lemma 2, (45) and Definition 1. Recalling that the variables $u$, $u_x$ occurring in the second reduced system contain $\gamma^a{}_b$, $\lambda^a{}_{bc}$ and their first order derivatives gives (61c) and (61d). (Here the inequality $\zeta < B$ has been used.) The estimate (61e) follows from Lemma 2, (39) and (45) together with the triangle inequality in the form $-q_b \le -q_a + |q_a - q_b|$. The estimate (61f) follows from (39) and (45) together with the observation that since ${}^0g_{ab}$ is block diagonal for $x \in U_0$, $e_c\,{}^0g_{ab}$ is also block diagonal. The estimates (61g) and (61h) follow in a similar way starting from (60) and (57). In (61h) a $\log(t)$ term is dominated by $t^{-B}$.
The following lemma gives estimates of $\tilde\Gamma_{abc}$ in terms of $Z_{abc}$:
Lemma 5.
$$\tilde\Gamma_{abc} \preceq t^{-2B}Z_{abc}, \tag{62a}$$
$$\tilde\Gamma_{abb} \preceq t^{-2B-q_a}, \tag{62b}$$
$$\tilde\Gamma_{aab} \preceq t^{-2B}\left(t^{-q_a} + t^{-q_b}\right), \tag{62c}$$
$$\tilde\Gamma_{aba} \preceq t^{-2B}\left(t^{-q_a} + t^{-q_b}\right), \tag{62d}$$
$$\tilde e_a\tilde\Gamma_{bcd} \preceq t^{-3B-q_a}Z_{bcd}. \tag{62e}$$
Proof. First observe that $\gamma^f_{bc} \preceq z_{bc}$. From (57) and (61) we have
$$\tilde\gamma^f_{bc}\tilde g_{fa} \preceq t^{-2B}\left(t^{q_a-q_b-q_c}z_{bc} + t^{-q_b} + t^{-q_c}\right).$$
Now noting $\tilde e_a(\tilde g_{bc}) \preceq t^{-q_a-2B}$, (62a) follows. To estimate $\tilde\Gamma_{abb}$ we compute
$$\tilde\Gamma_{abb} = \frac{1}{2}\,t^{-q_a}e_a\tilde g_{bb} \preceq t^{-2B-q_a}.$$
The estimates for $\tilde\Gamma_{aab}$, $\tilde\Gamma_{aba}$ follow directly from (62a) and the definition of $Z_{abc}$. Finally we consider (62e). Expanding out $\tilde e_a\tilde\Gamma_{bcd}$, we see that it contains (up to permutations of the indices) terms of the form $\tilde e_a\tilde e_b(\tilde g_{cd})$, $(\tilde e_a\tilde\gamma^f_{cd})\tilde g_{fb}$ and $\tilde\gamma^f_{cd}\tilde e_a\tilde g_{fb}$. The first and second types of terms are estimated using (61d) and (61e), using the form of $\tilde\gamma^f_{cd}$. Finally, the third type of term is estimated using (61f).
An important consequence of the Kasner relation $\sum_a q_a = 1$ is
$$2 + 2(q_1 - q_2 - q_3) = 4q_1, \tag{63}$$
which implies
$$t^{2+2(q_j-q_k-q_l)} \preceq t^{4q_1} \quad \text{if at least one of } k, l \text{ is different from } 3. \tag{64}$$
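The step from (63) to (64) is a comparison of exponents, and it can be checked by brute force over all index choices. The sketch below does this in exact rational arithmetic. It assumes the ordering $q_1 \le q_2 \le q_3$, which is not restated in this passage, and uses hypothetical sample exponents; `kasner_gap` is an illustrative name, not notation from the paper.

```python
from fractions import Fraction
from itertools import product


def kasner_gap(q):
    """Minimum of 2 + 2(q_j - q_k - q_l) - 4 q_1 over admissible (j, k, l).

    A non-negative result means every admissible exponent in (64) is at
    least 4 q_1, so the corresponding power of t is bounded by t**(4 q_1)
    for 0 < t <= 1.
    """
    q1, q2, q3 = q
    # Assumptions: Kasner relation and ordering of the exponents.
    assert q1 + q2 + q3 == 1 and q1 <= q2 <= q3
    # Identity (63) holds exactly under the Kasner relation.
    assert 2 + 2 * (q1 - q2 - q3) == 4 * q1
    gaps = []
    for j, k, l in product(range(3), repeat=3):
        if k == 2 and l == 2:
            continue  # excluded case: both k and l equal to 3
        gaps.append(2 + 2 * (q[j] - q[k] - q[l]) - 4 * q1)
    return min(gaps)


# Hypothetical sample exponents (assumption).
sample = (Fraction(1, 5), Fraction(3, 10), Fraction(1, 2))
gap = kasner_gap(sample)
```

The minimum gap is zero and is attained at $(j, k, l) = (1, 2, 3)$, which is exactly the case (63); the excluded choice $k = l = 3$ is the only one that can give a negative gap.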
The strategy will be to eliminate as much as possible the occurrence of repeated negative exponents, in order to be able to use this relation. We make note of the following useful relations:
$$Z_{abc} \preceq t^{q_1-q_2-q_3}, \tag{65a}$$
$$Z_{hhc} \preceq t^{-q_h} + t^{-q_c}, \tag{65b}$$
$$e_aZ_{bcd} \preceq t^{-B}Z_{bcd}. \tag{65c}$$
The estimates (65a) and (65b) are immediate from (60). For the rest of this section, we will assume that $g_{ab}$ is symmetric and is of the form given by (41a). Let $R_{ab}$ be the Ricci tensor defined with respect to $g_{ab}$. Under these assumptions we will estimate $R^a{}_b$. The estimate then applies after a small modification to ${}^S R^a{}_b$.
We now proceed to estimate the rescaled components of the Ricci tensor $\tilde R_{ad} = \tilde g^{bc}\tilde R_{dcab}$. Corresponding to the terms $(R1), \ldots, (R5)$ we have
$$\tilde R_{ad} = (Ric1)_{ad} - (Ric2)_{ad} - (Ric3)_{ad} - (Ric4)_{ad} + (Ric5)_{ad}.$$
We make the following simplifying observations:
– By the symmetry $\tilde R_{ad} = \tilde R_{da}$ we can assume without loss of generality that $a \le d$.
– $\tilde R_{dcab}$ is skew symmetric in the first and second pair of indices, therefore we can assume without loss of generality that $c \ne d$, $b \ne a$.
Therefore, in the following, we can without loss of generality use the following convention: The indices $a, b, c, d$ satisfy the relations
$$a \le d, \qquad c \ne d, \qquad a \ne b. \tag{66}$$
We will now estimate $\tilde R_{ad}$ by considering each term $(Ric1)_{ad}, \ldots, (Ric5)_{ad}$ in turn. The estimate we will actually prove is of the form $t^{2+q_a-q_d}\tilde R_{ad} \preceq t^{4q_1-6B}$, which will imply the needed estimate for ${}^S R_{ad}$.
6.1. (Ric1).
$(Ric1)_{ad} = \tilde g^{bc}\tilde e_a\tilde\Gamma_{bcd}$, where we are summing over repeated indices. Let $F = t^{2+q_a-q_d}(Ric1)_{ad}$. Using Lemma 5 and (61g), we have
$$\begin{aligned}
F &\preceq t^{2-4B}\,t^{q_a-q_d}\,t^{-q_a}\left(t^{-q_c} + t^{-q_d} + t^{q_d-2q_c}\right) \\
&\preceq t^{2-4B}\left(t^{-q_d-q_c} + t^{-2q_d} + t^{-2q_c}\right) \\
&\preceq t^{4q_1-4B} \preceq t^{4q_1-6B}.
\end{aligned}$$
6.2. (Ric2).
$(Ric2)_{ad} = \tilde g^{bc}\tilde e_b\tilde\Gamma_{acd}$. Let $F = t^{2+q_a-q_d}(Ric2)_{ad}$. By Lemma 5 and (61e),
$$F \preceq t^{2-4B}\,t^{q_a-q_d-q_c}Z_{acd}.$$
By (66), $d \ne c$, and using (65a) and (64) this gives
$$F \preceq t^{4q_1-4B} \preceq t^{4q_1-6B},$$
which is the required estimate.
6.3. (Ric3).
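$(Ric3)_{ad} = \tilde g^{bc}\tilde\gamma^f_{ab}\tilde\Gamma_{fcd}$. Let $F = t^{2+q_a-q_d}(Ric3)_{ad}$. We estimate using (61h) and Lemma 5,
$$\begin{aligned}
F &\preceq t^{2-4B}\,t^{q_a-q_d}\left(t^{q_f-q_a-q_c} + t^{-q_a} + t^{-q_c}\right)Z_{fcd} \\
&\preceq t^{2-4B}\left(t^{q_f-q_d-q_c} + t^{-q_d} + t^{q_a-q_d-q_c}\right)Z_{fcd} && \text{(use } c \ne d \text{ by (66), (65a) and (64))} \\
&\preceq t^{4q_1-4B} \preceq t^{4q_1-6B}.
\end{aligned}$$
6.4. (Ric4).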
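$(Ric4)_{ad} = \tilde g^{bc}\tilde g^{fg}\tilde\Gamma_{bcf}\tilde\Gamma_{adg}$. Let $F = t^{2+q_a-q_d}(Ric4)_{ad}$. We have by Lemma 5,
$$F \preceq t^{2+q_a-q_d-4B}\,\tilde g^{bc}\tilde g^{fg}Z_{bcf}Z_{adg}.$$
In case $a = d$ this gives, with $h = \min(b, c)$, $m \in \{f, g\}$, using (65) and (61),
$$t^2(Ric4)_{aa} \preceq t^{2-6B}\left(t^{-q_h} + t^{-q_m} + t^{q_m-2q_h}\right)\left(t^{-q_a} + t^{-q_m}\right) \preceq t^{4q_1-6B}.$$
Next we consider the case $a < d$. In case $g = d$, Lemma 5 together with (61e) and (61g) gives
$$\begin{aligned}
F &\preceq t^{2+q_a-q_d}\,\tilde g^{bc}\tilde g^{fd}\tilde\Gamma_{bcf}\,t^{-2B-q_a} \\
&\preceq t^{2-6B}\left(t^{-q_f-q_h} + t^{-2q_f} + t^{-2q_h}\right), \quad h = \min(b, c), \\
&\preceq t^{4q_1-6B}.
\end{aligned}$$
In case $a = g$ we have, arguing as above,
$$\begin{aligned}
F &\preceq t^{2+q_a-q_d}\,\tilde g^{bc}\tilde g^{fa}\tilde\Gamma_{bcf}\,t^{-2B}\left(t^{-q_a} + t^{-q_d}\right) \\
&\preceq t^{2-5B}\,\tilde g^{fa}\left(t^{-q_h} + t^{-q_f} + t^{q_f-2q_h}\right)\left(t^{-q_d} + t^{q_a-2q_d}\right), \quad h = \min(b, c), \\
&\preceq t^{2-6B}\left(t^{-q_h} + t^{-q_m} + t^{q_m-2q_h}\right)\left(t^{-q_d} + t^{q_m-2q_d}\right), \quad m = \min(a, f).
\end{aligned}$$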
From $h = \min(b, c)$ and $c \ne d$, which holds by (66), we find that either $h < 3$ or $d < 3$ must hold. Using this it follows using (64) that in case $a = g < d$, $F \preceq t^{4q_1-6B}$. It remains to consider the case when $a < d$ and $a, d, g$ are distinct. In this case, the estimates used above give
$$\begin{aligned}
F &\preceq t^{2+q_a-q_d-6B}\left(t^{-q_h} + t^{-q_g} + t^{q_g-2q_h}\right)Z_{adg}, \quad h = \min(b, c), \\
&\preceq t^{2-6B}\left(t^{-q_h} + t^{-q_g} + t^{q_g-2q_h}\right)\left(t^{-q_d} + t^{q_a-2q_d} + t^{q_a-q_d-q_g} + t^{2q_a-2q_d-q_g} + t^{-q_g} + t^{q_g-2q_d}\right).
\end{aligned}$$
By construction, $h < 3$ or $d < 3$ must hold, which in conjunction with the fact that in the present case $g \ne d$ gives, using (64), $F \preceq t^{4q_1-6B}$. The above proves that the required estimate
$$t^{2+q_a-q_d}(Ric4)_{ad} \preceq t^{4q_1-6B}$$
holds.
6.5. (Ric5).
$(Ric5)_{ad} = \tilde g^{bc}\tilde g^{fg}\tilde\Gamma_{acf}\tilde\Gamma_{bdg}$. The estimate for $(Ric5)_{ad}$ is the most complicated, and will be done in several steps. We review the steps which will be used here. In each step the conditions on the indices $a, c, f, b, d, g$ are shown to imply the required estimate and so the case in question can be excluded from consideration in all later steps. Recall that $a \le d$ may be assumed and also note by (66) we may assume without loss of generality that $a \ne b$, $c \ne d$. The steps we will use are:
1. $a = d$ can be excluded, so $a < d$ may be assumed.
2. $g = d$ can be excluded, so $g \ne d$ may be assumed.
3. $g = a$ can be excluded, so $g \ne a$ may be assumed.
4. $b = d$ can be excluded, so $b \ne d$ may be assumed.
5. $a = c$ can be excluded, so $a \ne c$ may be assumed.
When all the above claims are verified, we may restrict our considerations to the indices satisfying the conditions
$$a < d, \quad a \ne b, \quad c \ne d, \quad g \ne d, \quad g \ne a, \quad b \ne d, \quad a \ne c. \tag{67}$$
These conditions imply that the indices $\{a, d, g\}$, $\{a, d, c\}$ and $\{a, d, b\}$ are distinct, so (67) implies $g = b = c$, as all indices take values in $\{1, 2, 3\}$. Therefore the required estimate for $(Ric5)_{ad}$ will hold if we can verify that it holds under (67) in conjunction with the condition $g = b = c$, which is the final step. Let $F = t^{2+q_a-q_d}(Ric5)_{ad}$.
Case $a = d$: In case $a = d$, Lemma 5 and (65a) give
$$F \preceq t^{2-6B}\,t^{2(q_1-q_2-q_3)} \preceq t^{4q_1-6B}.$$
Therefore we may assume $a < d$ in the following. Further by (66), $a \ne b$ and $c \ne d$.
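Case $g = d$: Next consider the case $g = d$. Then using Lemma 5, (60) and (61e) we have
$$\begin{aligned}
F &= t^{2+q_a-q_d}\,\tilde g^{bc}\tilde g^{fd}\tilde\Gamma_{acf}\tilde\Gamma_{bdd} \\
&\preceq t^{2+q_a-q_d}\,\tilde g^{bc}\tilde g^{fd}\tilde\Gamma_{acf}\,t^{-2B-q_b} \\
&\preceq t^{2-6B+q_a-q_d-q_c}\left(t^{-q_a} + t^{-q_c} + t^{-q_d} + t^{q_a-q_c-q_d} + t^{q_c-q_d-q_a} + t^{q_d-q_a-q_c}\right) \\
&\preceq t^{2-6B}\left(t^{-q_d-q_c} + t^{q_a-q_d-2q_c} + t^{q_a-2q_d-q_c} + t^{2(q_a-q_d-q_c)} + t^{-2q_d} + t^{-2q_c}\right).
\end{aligned}$$
By (66), $d \ne c$; this gives $F \preceq t^{4q_1-6B}$ in the case $g = d$.
Case $g = a$: Next consider the case $g = a$. Then we have
$$\begin{aligned}
F &= t^{2+q_a-q_d}\,\tilde g^{bc}\tilde g^{fa}\tilde\Gamma_{acf}\tilde\Gamma_{bda} && \text{(use Lemma 5 and (61g))} \\
&\preceq t^{2-5B}\,t^{q_a-q_d}\,\tilde g^{bc}\left(t^{-q_a} + t^{-q_c} + t^{q_c-2q_a}\right)Z_{bda} && \text{(use Lemma 2)} \\
&\preceq t^{2-6B}\left(t^{-q_d} + t^{q_a-q_d-q_c} + t^{q_c-q_a-q_d}\right)Z_{bda} && \text{(use (65a) and } a < d,\ d \ne c) \\
&\preceq t^{4q_1-6B}.
\end{aligned}$$
At this stage we may assume
$$a < d, \quad a \ne b, \quad c \ne d, \quad g \ne d, \quad g \ne a. \tag{68}$$
Case $b = d$: Next consider the case $b = d$. In this case
$$\begin{aligned}
F &= t^{2+q_a-q_d}\,\tilde g^{dc}\tilde g^{fg}\tilde\Gamma_{acf}\tilde\Gamma_{ddg} \\
&\preceq t^{2+q_a-q_d}\,\tilde g^{dc}\tilde g^{fg}\tilde\Gamma_{acf}\,t^{-2B}\left(t^{-q_d} + t^{-q_g}\right) \\
&\preceq t^{2-6B}\left(t^{q_a-q_d-q_c} + t^{q_a-q_d-q_g}\right)Z_{acf} && \text{(use } d \ne c \text{ and } d \ne g \text{ from (68))} \\
&\preceq t^{4q_1-6B}.
\end{aligned}$$
Case $a = c$: Next consider the case $a = c$. In this case we have using Lemma 5
$$\begin{aligned}
F &\preceq t^{2+q_a-q_d}\,\tilde g^{ba}\tilde g^{fg}\,t^{-2B}\left(t^{-q_a} + t^{-q_f}\right)t^{-2B}Z_{bdg} && \text{(use (61e))} \\
&\preceq t^{2-6B}\left(t^{-q_d} + t^{q_a-q_d-q_g}\right)Z_{bdg} && \text{(use } g \ne d \text{ from (68) and (65a))} \\
&\preceq t^{4q_1-6B}.
\end{aligned}$$
At this stage we may assume
$$a < d, \quad a \ne b, \quad c \ne d, \quad g \ne d, \quad g \ne a, \quad b \ne d, \quad a \ne c. \tag{69}$$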
As discussed above, if we can prove that the required estimate holds under the condition $g = c = b$ we are done.
Case $g = b = c$: Next consider the case $g = b = c$. In this case, after making the substitutions $g = c$ and $b = c$,
$$\begin{aligned}
F &= t^{2+q_a-q_d}\,\tilde g^{cc}\tilde g^{fc}\tilde\Gamma_{acf}\tilde\Gamma_{cdc} && \text{(use Lemma 5)} \\
&\preceq t^{2-3B}\,t^{q_a-q_d}\,\tilde g^{fc}\tilde\Gamma_{acf}\left(t^{-q_d} + t^{-q_c}\right).
\end{aligned} \tag{70}$$
To estimate $F$ we must now consider the cases $f = d$, $f = c$, $f = a$ separately.
Case $g = b = c$ and $f = d$: In case $g = b = c$ and $f = d$, we have from (70),
$$\begin{aligned}
F &\preceq t^{2-3B}\,t^{q_a-q_d}\,\tilde g^{dc}\tilde\Gamma_{acd}\left(t^{-q_d} + t^{-q_c}\right) && \text{(use (61e) and Lemma 5)} \\
&\preceq t^{2-6B}\,t^{q_a-q_d-q_c}Z_{acd} && \text{(use } c \ne d \text{ from (69) and (65a))} \\
&\preceq t^{4q_1-6B}.
\end{aligned}$$
Therefore we may exclude condition (69) in conjunction with $f = d$ from our considerations.
Case $g = b = c$ and $f = c$: In case $g = b = c$ and $f = c$ we have from (70),
$$\begin{aligned}
F &\preceq t^{2-3B}\,t^{q_a-q_d}\,\tilde g^{cc}\tilde\Gamma_{acc}\left(t^{-q_d} + t^{-q_c}\right) && \text{(use Lemma 5)} \\
&\preceq t^{2-4B}\,t^{q_a-q_d}\,t^{-2B-q_a}\left(t^{-q_d} + t^{-q_c}\right) = t^{2-6B}\left(t^{-2q_d} + t^{-q_d-q_c}\right) \\
&\preceq t^{4q_1-6B}.
\end{aligned}$$
Case $g = b = c$ and $f = a$: The only remaining case is $g = b = c$ and $f = a$. In this case we get from (70) using Lemma 5,
$$\begin{aligned}
F &\preceq t^{2-5B}\,t^{q_a-q_d}\,\tilde g^{ac}Z_{aca}\left(t^{-q_d} + t^{-q_c}\right) && \text{(use (65b) and (61g))} \\
&\preceq t^{2-6B}\left(t^{q_a-2q_d} + t^{q_a-q_d-q_c}\right)t^{-q_a} \\
&\preceq t^{2-6B}\left(t^{-2q_d} + t^{-q_d-q_c}\right) \\
&\preceq t^{4q_1-6B}.
\end{aligned}$$
Therefore it now follows that under (69), the required estimate $F \preceq t^{4q_1-6B}$ holds and hence by the above argument it follows that this estimate holds under (66). This proves for a symmetric metric satisfying (41) the estimate
$$t^{2-\alpha^a{}_b}R^a{}_b \preceq t^{4q_1-6B}. \tag{71}$$
We wish to apply this to the symmetrized metric ${}^S g_{ab}$, which has the property that the rescaled symmetrized metric ${}^S\tilde g_{ab}$ satisfies (56). The estimate (71) translates to an estimate for a metric satisfying (56) after replacing $\alpha_0$ by $\alpha_0 - 2B$, which by the definition of $\alpha_0$ satisfies $\alpha_0 - 2B > B > 0$. Therefore we get in view of the discussion at the beginning of this section
$$t^{2-\tilde\alpha^a{}_d}\,{}^S\tilde R^a{}_d \preceq t^{4p_1-9B-\alpha_0} \quad \text{or} \quad t^{2-\alpha^a{}_d}\,{}^S R^a{}_d \preceq t^{4p_1-9B-\alpha_0}.$$
Now recall $B = \alpha_0/4 = \min_a\{p_a(x_0)\}/40$ and $q_1 > 20B$ by construction. This gives
$$t^{2-\alpha^a{}_d}\,{}^S R^a{}_d \preceq t^{4q_1-13B} \preceq t^{3q_1} \preceq t^{3p_1},$$
where we used the fact that $q_1 \ge p_1$ by construction. This finishes the proof of
Lemma 6 (Curvature estimate).
$$t^{2-\alpha^a{}_b}\,{}^S R^a{}_b \preceq t^{3p_1}.$$
Now some estimates will be obtained for matter variables. These will be used to check that the right-hand side of the second reduced system has the properties required for a Fuchsian system. Let $w_a$ be a one-form with the property that $w_a \preceq t^{q_a}$. We have
$$\begin{aligned}
t^2\tilde g^{ab}\nabla_a w_b &= t^2\tilde g^{ab}\tilde e_a w_b - t^2\tilde g^{ab}\tilde g^{fg}\tilde\Gamma_{abf}w_g \\
&\preceq t^{2-B-2q_a} + t^{2-4B}\,t^{q_1-q_2-q_3}\,t^{-q_g} \\
&\preceq t^{4q_1-4B} \preceq t^{4p_1-4B}.
\end{aligned}$$
Here it has been assumed that $w_a$ behaves in a suitable way upon taking derivatives. In the context of the matter variables this will be the case for the relevant choices of $w_a$, namely $e_a(\phi)$ and $t^{-1}u_a$. Note that this estimate requires no use of cancellations, since it only uses the relation (65a) and not (61g). In this situation the estimate for a given quantity is never more difficult than that for the corresponding velocity dominated part, since the difference between the two is always of higher order. This gives the estimates required for the matter equations in the scalar field case.
For the stiff fluid some more work is needed. The aim now is to estimate the terms on the right-hand side of Eqs. (51) by a positive power of $t$. For most of these terms no cancellations are required to get the desired estimate. There are only two exceptions to this and they will be discussed explicitly now. The first is the following combination which arises if the evolution equation for $v_a$ is written out explicitly:
$${}^0\mu^{-1}e_a({}^0\mu) - \mu^{-1}e_a(\mu). \tag{72}$$
This expression is equal to
$$t^{\beta_1}\left[-A^{-1}\nu\left(1 + A^{-1}\nu t^{\beta_1}\right)^{-1}A^{-1}\left(\nabla_a A + \nabla_a\nu\right) + A^{-1}t^{\beta_1}\nabla_a\nu\right] \tag{73}$$
which is $O(t^{\beta_1})$. The contribution of this expression to the right-hand side of the evolution equation for $v_a$ is as a consequence $O(t^{\beta_1-\beta_2})$, which shows that it is necessary to choose $\beta_2 < \beta_1$. The second expression where a cancellation is necessary is $1 + t\,\mathrm{tr}\,k$. Now $\mathrm{tr}\,k = -t^{-1} + \mathrm{tr}\,\kappa\,t^{-1+\alpha_0}$ and hence $1 + t\,\mathrm{tr}\,k = \mathrm{tr}\,\kappa\,t^{\alpha_0}$. It follows that this expression is $O(t^{\alpha_0})$.
The analysis of the other terms is rather straightforward, although lengthy, and will not be carried out explicitly here. However some comments may be useful. The terms which are a priori most difficult to estimate are those involving covariant derivatives of $u_a$. For those it is convenient to use the components in the rescaled frame $\tilde e_a$. For all other terms the original frame $e_a$ can be used straightforwardly. In order that all terms can be estimated by a positive power of $t$ it suffices to choose $\beta_1$ and $\beta_2$ small enough. One possible choice is $\beta_2 < \beta_1 < q_1 - 5B$.
In order to show that the reduced systems for the Einstein-scalar field system and the Einstein-stiff fluid system are in Fuchsian form, we need to show that $t^{2-\alpha^a{}_b}M^a{}_b = o(t^{\delta})$ for some $\delta > 0$. We consider first the Einstein-scalar field case. In this case,
$$M^a{}_b = g^{ac}e_c(\phi)e_b(\phi).$$
Arguing as above for the estimate of $t^{2-\alpha^a{}_b}\,{}^S R^a{}_b$, we have
$$t^{2-\alpha^a{}_b}M^a{}_b \preceq t^{2-\tilde\alpha^a{}_b-B}\tilde M_{ab}.$$
Therefore it is enough to show
$$t^{2-\tilde\alpha^a{}_b}\tilde M_{ab} \preceq t^{2B}.$$
By definition $\tilde M_{ab} = t^{-q_a-q_b}e_a(\phi)e_b(\phi)$ and hence $\tilde M_{ab} \preceq t^{-q_a-q_b}$. This gives, using $\tilde\alpha^a{}_b = |q_a - q_b|$,
$$t^{2-\tilde\alpha^a{}_b}\tilde M_{ab} \preceq t^{3q_1}.$$
For the scalar field case this gives, together with the above,
$$t^{2-\alpha^a{}_b}\left({}^S R^a{}_b - M^a{}_b\right) \preceq t^{\delta}, \quad \text{for some } \delta > 0,$$
which is the estimate required for proving that the second reduced system for the Einstein-scalar field system is in Fuchsian form. The argument for the Einstein-stiff fluid system is very similar.
7. The Constraints
The main aim of this section is to show that if a solution of the evolution equations is given which corresponds to a solution of the velocity dominated equations as in Theorem 1 or 2 then it satisfies the full constraints. It will also be shown how the existence of a large class of solutions of the velocity dominated constraints can be demonstrated. The first result on the propagation of the constraints relies on rough computations which prove the result in the case where all $p_a$ are close to $1/3$. An analytic continuation argument then gives the general case.
Lemma 7. Let $({}^0g_{ab}, {}^0k_{ab}, {}^0\phi)$ be a solution of the velocity dominated system as in the hypotheses of Theorem 1 with $|p_a - 1/3| < \alpha_0/10$. Let $(\gamma^a{}_b, \kappa^a{}_b, \psi)$ be a solution of (46) and (48) modelled on this velocity dominated solution and define $g_{ab}$, $k_{ab}$ and $\phi$ by (24). Suppose that this solution satisfies the properties 1.–6. of the conclusions of Theorem 1. Then the Einstein constraints are also satisfied.
Proof. Define:
$$C = -k_{ab}k^{ab} + (\mathrm{tr}\,k)^2 - R - 16\pi\rho, \tag{74}$$
$$C_b = \nabla_a k^a{}_b - \nabla_b(\mathrm{tr}\,k) - 8\pi j_b. \tag{75}$$
These quantities satisfy the evolution equations
$$\partial_t C - 2(\mathrm{tr}\,k)C = \nabla^a C_a, \tag{76}$$
$$\partial_t C_a - (\mathrm{tr}\,k)C_a = \frac{1}{2}\nabla_a C. \tag{77}$$
Define rescaled quantities by $\bar C = t^{2-\eta_1}C$ and $\bar C_a = t^{1-\eta_2}C_a$ for some positive real numbers $\eta_1$ and $\eta_2$. Then the above equations can be written in the form:
$$t\partial_t\bar C + \eta_1\bar C = 2[1 + t\,\mathrm{tr}\,k]\bar C - t^{2-\eta_1+\eta_2}\nabla^a\bar C_a, \tag{78}$$
$$t\partial_t\bar C_a + \eta_2\bar C_a = [1 + t\,\mathrm{tr}\,k]\bar C_a - (1/2)\,t^{\eta_1-\eta_2}\nabla_a\bar C. \tag{79}$$
Choose $\eta_1$ and $\eta_2$ so that $\eta_1 - \eta_2 > 0$. The aim is to apply Theorem 3 to show that $\bar C$ and $\bar C_a$ vanish. In order to do this we should show that these two quantities vanish as fast as a positive power of $t$ as $t \to 0$, that $1 + t\,\mathrm{tr}\,k$ vanishes like a positive power of $t$ and that the term $\nabla^a\bar C_a$ is not too singular. In obtaining these estimates it is necessary to use the behaviour of the derivatives of the solution mentioned in the remark following Theorem 3. Note that since the velocity dominated constraints are satisfied by assumption, it is enough to estimate the differences of the constraint quantities corresponding to the velocity dominated and full solutions, since these are in fact equal to $C$ and $C_a$. By property 2. of the conclusions of Theorem 1 it follows that $1 + t\,\mathrm{tr}\,k = O(t^{\alpha_0})$, which gives one of the desired statements. Similarly it follows that
$$-k^{ab}k_{ab} + (\mathrm{tr}\,k)^2 = -({}^0k^{ab})({}^0k_{ab}) + (\mathrm{tr}\,{}^0k)^2 + O(t^{-2+\alpha_0}). \tag{80}$$
It follows from the curvature estimates done in Sect. 6 that the scalar curvature is also $O(t^{-2+\alpha_0})$. Now consider the expression $\rho - {}^0\rho$. The components of the inverse metric can be estimated by $t^{-2+40\alpha_0}$, so that the terms in $\rho$ involving spatial derivatives can be estimated by $t^{-2+\alpha_0}$ as well. The difference of the time derivatives can be estimated by $t^{-2+\beta}$. These are the estimates for the Hamiltonian constraint that will be needed.
The estimates just carried out were independent of the restriction on the $p_a$ in the hypotheses of the lemma. The following estimates for the momentum constraint are of a cruder type and do use the restriction. First note that the metric and its inverse can be estimated by the powers of $t$ equal to $2p_1$ and $-2p_3$ respectively. It follows from the assumption on the $p_a$ in the hypotheses of the lemma that $2p_1 > 2/3 - \alpha_0/5$ and $-2p_3 > -2/3 - \alpha_0/5$. Using (27) then shows that the connection coefficients can be estimated in terms of the power $-2\alpha_0/5$. The effect on the order of a term of taking a divergence can be estimated by the powers $-2\alpha_0/5$ and $-2/3 - 3\alpha_0/5$ for upper and lower indices respectively. The gradient of the mean curvature is $O(t^{-1+\alpha_0}\log t)$ while the difference of $j_a$ is $O(t^{-1+\beta}\log t)$. The difference of the divergence of the second fundamental form produces the power $-1 + 3\alpha_0/5$. Evidently the last power and that containing $\beta$ are the limiting ones and determine the estimate for $C_a$. Similarly the divergence of $C_a$ can be estimated by the powers $-5/3$ and $-5/3 + \beta - 3\alpha_0/5$. Note that it follows from the definition of $\alpha_0$ that $\alpha_0 < 1/30$. Thus given the hypothesis of the lemma it can be concluded that $\eta_1$ and $\eta_2$ can be chosen in such a way that all terms on the right-hand side of the propagation equations for the constraint quantities vanish like positive powers of $t$. Thus these equations are Fuchsian and the conclusion follows from Theorem 3.
The analogue of this lemma with the scalar field replaced by a stiff fluid is also true and can be proved in the same way. Next the restriction on the exponents $p_a$ will be lifted. Consider a solution $({}^0g_{ab}, {}^0k_{ab}, {}^0\mu, {}^0v_a)$ of the velocity dominated constraints for the Einstein-stiff fluid system. Let ${}^0\hat k_{ab}$ be the trace-free part of ${}^0k_{ab}$. The velocity dominated constraints become:
$$-{}^0\hat k^{ab}\,{}^0\hat k_{ab} + (2/3)(\mathrm{tr}\,{}^0k)^2 = 16\pi\mu, \tag{81}$$
$$\nabla_a\,{}^0\hat k^a{}_b = 8\pi\mu u_b. \tag{82}$$
Now let ${}^\lambda k_{ab} = (1 - \lambda)\,{}^0\hat k_{ab} + (1/3)(\mathrm{tr}\,{}^0k)\,{}^0g_{ab}$ and
$${}^\lambda\mu = (1/16\pi)\left[-(1 - \lambda)^2({}^0\hat k^{ab})({}^0\hat k_{ab}) + (2/3)(\mathrm{tr}\,{}^0k)^2\right], \tag{83}$$
$${}^\lambda u_b = 2(1 - \lambda)\nabla^a\,{}^0\hat k_{ab}\left[-(1 - \lambda)^2({}^0\hat k^{ab})({}^0\hat k_{ab}) + (2/3)(\mathrm{tr}\,{}^0k)^2\right]^{-1}. \tag{84}$$
Then ${}^0g_{ab}$, ${}^\lambda k_{ab}$, ${}^\lambda\mu$, ${}^\lambda u_a$ is a one parameter family of solutions of the velocity dominated constraints which depends analytically on the parameter $\lambda$. There exists a corresponding family of solutions of the velocity dominated evolution equations which also depends analytically on $\lambda$. Next, Theorem 3 provides a corresponding analytic family of solutions of the full evolution equations. (Cf. the second remark following that theorem.) These define constraint quantities depending analytically on $\lambda$. For $\lambda$ close to one Lemma 7 shows that these quantities are zero. Hence by analyticity they are zero for all values of $\lambda$, including $\lambda = 0$. This means that the conclusion of Lemma 7 holds for all positive $p_a$ and a stiff fluid. Since any solution of the second reduced system for a scalar field defines a solution of the second reduced system for a stiff fluid, this extension also holds for the scalar field.
A variant of the conformal method for solving the full Einstein constraints can be used to analyse the velocity dominated constraints. Consider the following set of free data: a Riemannian metric $\bar g_{ab}$, a symmetric trace-free tensor $\sigma_{ab}$ on $S$ and two scalar
functions $\bar\phi$ and $\bar\phi_t$ on $S$. Next consider the following ansatz:
$$g_{ab} = \omega^4\bar g_{ab}, \tag{85}$$
$$k_{ab} = -(1/3)\,t_0^{-1}g_{ab} + \omega^{-2}l_{ab}, \tag{86}$$
$$\phi = \omega^{-2}\bar\phi, \tag{87}$$
$$\phi_t = \omega^{-4}\bar\phi_t, \tag{88}$$
where
$$l_{ab} = \sigma_{ab} + \nabla_a W_b + \nabla_b W_a - (2/3)\,\bar g_{ab}\bar g^{cd}\nabla_c W_d. \tag{89}$$
Putting this into (12a) and defining $\bar\rho = \frac{1}{2}(\bar\phi_t)^2$ gives $\rho = \omega^{-8}\bar\rho$, a relation well known from the usual conformal method. As a result of the Hamiltonian constraint the function $\omega$ satisfies the following algebraic analogue of the Lichnerowicz equation:
$$-\omega^{-12}\,l_{ab}l_{cd}\bar g^{ac}\bar g^{bd} + \frac{2}{3}\,t_0^{-2} - 16\pi\omega^{-8}\bar\rho = 0. \tag{90}$$
Solving this comes down to looking for positive roots of the equation $a\zeta^3 + b\zeta^2 - c = 0$, where $a$ and $b$ are non-negative and $c$ is positive. The derivative of the function on the left-hand side of this equation is $\zeta(3a\zeta + 2b)$. Thus unless $a$ and $b$ are both zero the derivative has no positive roots. Moreover the function tends to plus infinity for large $\zeta$ and is negative at $\zeta = 0$. Hence the equation has a unique solution for each $a$ and $b$ not both zero and if $a$ and $b$ depend analytically on some parameter then the solution does so too. If $a$ and $b$ are both zero then of course there is no positive solution. In the case of interest here $a$ and $b$ are both positive. The function $\omega$ is given by $\omega = E(l_{ab}l_{cd}\bar g^{ac}\bar g^{bd}, t_0, \bar\rho)$, where the analytic function $E$ is defined as the solution of the algebraic Lichnerowicz equation. The momentum constraint implies the elliptic equation
$$\bar g^{as}\nabla_a\left[\nabla_s W_b + \nabla_b W_s - (2/3)\,\bar g_{sb}\bar g^{cd}\nabla_c W_d\right] = 8\pi\left[\bar j_b - 2\bar\phi_t\bar\phi\,\omega\nabla_b\omega\right] - \nabla_a\sigma^a{}_b \tag{91}$$
for $W_a$. Here $\bar j_a = \bar\phi_t\nabla_a\bar\phi$. Note that, when $\omega$ is expressed in terms of the function $E$ of the basic variables, it depends on the first derivatives of $W_a$. Thus the expression $\nabla_a\omega$ involves second derivatives of $W_a$ and is not simply a lower order term.
Consider now the linearization of (91) with respect to $W_a$, where $\omega$ has been reexpressed using $E$. In particular, consider the linearization in the particular case where $\bar g_{ab}$ is the metric of constant negative curvature on a compact hyperbolic manifold, the tensor $\sigma_{ab}$ is zero, $\bar\phi$ and $\bar\phi_t$ are constant and the background value of $W_a$ is zero. Because $\omega$ is a function of an expression quadratic in $W_a$, the right-hand side of (91) makes no contribution to the linearization. Since $\bar g_{ab}$ has no non-trivial conformal Killing vectors it follows from the standard theory of the York operator that the operator obtained by linearization of the equation (91) is invertible as a map between appropriate Sobolev spaces.
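The uniqueness argument for the positive root of $a\zeta^3 + b\zeta^2 - c = 0$ given above translates directly into a bracketing root finder, since the cubic is negative at $\zeta = 0$ and strictly increasing for $\zeta > 0$. The following Python sketch illustrates this with plain bisection. It is not the construction used in the paper: the function name `lichnerowicz_root` and the tolerance are illustrative assumptions, and the identification $\zeta = \omega^{-4}$, $a = l_{ab}l_{cd}\bar g^{ac}\bar g^{bd}$, $b = 16\pi\bar\rho$, $c = (2/3)t_0^{-2}$ is one reading of (90) that the text does not spell out.

```python
def lichnerowicz_root(a, b, c, tol=1e-12):
    """Unique positive root of a*z**3 + b*z**2 - c = 0.

    Requires a, b >= 0 (not both zero) and c > 0, as in the text.
    Plain bisection; relies on the cubic being negative at z = 0 and
    strictly increasing for z > 0.
    """
    if a < 0 or b < 0 or (a == 0 and b == 0) or c <= 0:
        raise ValueError("need a, b >= 0 (not both zero) and c > 0")

    def f(z):
        return a * z**3 + b * z**2 - c

    lo, hi = 0.0, 1.0
    while f(hi) < 0:      # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Under the assumed identification zeta = omega**(-4), the conformal
# factor would be recovered as omega = zeta ** (-0.25).
zeta = lichnerowicz_root(1.0, 1.0, 2.0)   # root of z^3 + z^2 - 2 is z = 1
omega = zeta ** (-0.25)
```

Bisection is chosen here only because it mirrors the monotonicity argument; any bracketing method would do, and the analytic dependence of the root on $a$ and $b$ asserted in the text is not something a numerical sketch can show.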
Then an application of the implicit function theorem gives solutions of (91) for arbitrary choices of the free data sufficiently close (with respect to a Sobolev norm) to the particular free data at which the linearization was carried out. This shows the existence of solutions of the velocity dominated constraints which are as general as the solutions of the full Einstein constraints (at least in the crude sense of function counting). The conformal method can be applied in a similar way in the stiff fluid case and it turns out to be easier than in the scalar field case. This might seem paradoxical, since the scalar
field problem can be identified with a subcase of the stiff fluid problem. The explanation is that it is difficult to identify which free data in the procedure for constructing stiff fluid data, which will be presented, correspond to data for a scalar field. The ansatz used is $\mu = \omega^{-8}\bar\mu$ and $u_a = \omega^2\bar u_a$. This gives the scaling $\rho = \omega^{-8}\bar\rho$ and $j_a = \omega^{-6}\bar j_a$ which is often used in the conformal method. The quantities describing the geometry are scaled as in the case of the scalar field. The equations for $\omega$ and $W_a$ are very similar in both cases, with the notable difference that in the stiff fluid case the term involving the derivative of $\omega$ is missing from the equation for $W_a$. This means that the equation for $W_a$ is independent of $\omega$ and can be solved by standard theory, as long as the metric $\bar g_{ab}$ has no conformal Killing vectors. Once this has been done the algebraic equation for $\omega$ can be solved straightforwardly.
8. Discussion
We have shown the existence of a family of solutions of the Einstein equations coupled to a scalar field or a stiff fluid whose singularity structure we can analyse. No symmetry assumptions are made and the solutions are general in the sense that they depend on the same number of free functions as general initial data for the same system on a regular Cauchy surface. These solutions agree with the picture of general spacetime singularities proposed by Belinskii, Khalatnikov and Lifshitz in two important ways. Firstly, the evolution at different spatial points decouples, in the sense that the solutions of the full equations are approximated near the singularity by a solution of a system of ordinary differential equations. Secondly there exists a Gaussian coordinate system which covers a neighbourhood of the singularity in which the singularity is situated at $t = 0$. It is easily seen that the curvature invariant $R_{\alpha\beta}R^{\alpha\beta} = 64\pi^2(\nabla_\alpha\phi\nabla^\alpha\phi)^2$ blows up uniformly for $t \to 0$.
In fact the leading term is proportional to $A^4(x)t^{-4}$ and in the solutions we consider $A$ can never vanish, as a consequence of the Hamiltonian constraint. Thus these singularities are all consistent with the strong cosmic censorship hypothesis. The mean curvature of the hypersurfaces of constant Gaussian time tends uniformly to infinity as $t \to 0$ so that the singularity is crushing in the sense of [10]. It then follows from well-known results that a neighbourhood of the singularity can be covered by a foliation consisting of constant mean curvature hypersurfaces. This is the most general class of spacetimes in which all these suggested properties of general spacetimes have been demonstrated. A subclass of these spacetimes is covered by the results of Anguige and Tod [1]. The connection between their results and those of the present paper deserves to be examined more closely but intuitively their spacetimes should correspond to the case where, in our notation, the $p_i$ are everywhere equal to $1/3$. The spacetimes constructed have been shown to be general in the sense of function counting. It would, however, be desirable to prove that the assumption of analyticity of the data can be replaced by smoothness and that, this having been done, the spacetimes constructed include all those arising from a non-empty open set of initial data on a regular Cauchy surface which, in particular, contains the initial data for a Friedmann model. This would be a statement on the stability of the Friedmann singularity. A model for this kind of generalization is provided by the work of Kichenassamy [13] on nonlinear wave equations. It was indicated in the introduction that the results on the Einstein-scalar equations can be interpreted in more than one way. The interpretation which has been emphasized here is that where the metric occurring in this system is considered to be the physical
metric. In the interpretation in terms of string cosmology the physical metric is (up to a multiplicative constant) $e^{\phi}g_{\mu\nu}$. This means that for $A(x)$ sufficiently negative the limit $t \to 0$ does not correspond to a singularity at all, but rather to a phase which lasts for an infinite proper time. It is the time reverse of this situation which plays a role in the pre-big bang model [7]. Another interpretation is in terms of the vacuum field equations in Brans-Dicke theory. This is very similar to the string cosmology case, with the difference that the conformal factor $e^{\phi}$ is replaced by $e^{C\phi}$, where $C$ is a constant which depends on the Brans-Dicke coupling constant. All the results in this paper have concerned the case of three space dimensions. There are reasons to believe that if the space dimension is at least ten then the vacuum Einstein equations allow stable quiescent singularities, similar in some ways to those of the Einstein-scalar field equations in three space dimensions [8]. The techniques developed in this paper might allow this to be proved rigorously. It would also be interesting to know what happens to the picture when further matter fields are added. There are several possibilities here. One is to add some other field, not directly coupled to the scalar field, to the Einstein-scalar field system. A second is to reinstate some of the extra fields (axion, moduli) which have been discarded in passing from the low energy limit of string theory to the Einstein-dilaton theory. A third is to add extra matter fields to the Brans-Dicke theory. Another direction in which the results on the Einstein-scalar field and Einstein-stiff fluid equations could be generalized is to start with situations where the solution has one Kasner exponent negative and investigate whether it moves (in the direction towards the singularity) towards the region where all Kasner exponents are non-negative.
If this were true, then the singularities in generic solutions of these equations could be quiescent. The set of initial data concerned would be not just open, but also dense. This question is sufficiently difficult that it would seem advisable to first try and investigate it rigorously in the spatially homogeneous case. Acknowledgement. This research was supported in part by the Swedish Natural Sciences Research Council (SNSRC), contract no. F-FU 4873-307, and the US National Science Foundation under Grant No. PHY9407194. Part of the work was done while the authors were enjoying the hospitality of the Institute for Theoretical Physics, Santa Barbara. We gratefully acknowledge stimulating discussions with V. Moncrief which had an important influence on the development of the strategy used in this work.
References
1. Anguige, K., Tod, K. P.: Isotropic cosmological singularities 1: Polytropic perfect fluid spacetimes. Ann. Phys. (NY) 276, 257–293 (1999)
2. Baouendi, M. S., Goulaouic, C.: Remarks on the abstract form of nonlinear Cauchy–Kowalewski theorems. Comm. P. D. E. 2, 1151–1162 (1977)
3. Barrow, J. D.: Quiescent cosmology. Nature 272, 211–215 (1978)
4. Belinskii, V. A., Khalatnikov, I. M. and Lifshitz, E. M.: Oscillatory approach to a singular point in the relativistic cosmology. Adv. Phys. 19, 525–573 (1970)
5. Belinskii, V. A. and Khalatnikov, I. M.: Effect of scalar and vector fields on the nature of the cosmological singularity. Sov. Phys. JETP 36, 591–597 (1973)
6. Belinskii, V. A., Khalatnikov, I. M. and Lifshitz, E. M.: A general solution of the Einstein equations with a time singularity. Adv. Phys. 31, 639–667 (1982)
7. Buonanno, A., Damour, T. and Veneziano, G.: Pre-big bang bubbles from the gravitational instability of generic string vacua. Nucl. Phys. B 543, 275–320 (1999)
8. Demaret, J., Henneaux, M. and Spindel, P.: Non-oscillatory behaviour in vacuum Kaluza-Klein cosmologies. Phys. Lett. B 164, 27–30 (1985)
9. Eardley, D., Liang, E. and Sachs, R.: Velocity-dominated singularities in irrotational dust cosmologies. J. Math. Phys. 13, 99–106 (1972)
10. Eardley, D., Smarr, L.: Time functions in numerical relativity: marginally bound dust collapse. Phys. Rev. D 19, 2239–2259 (1979)
11. Isenberg, J. and Moncrief, V.: Asymptotic behaviour of the gravitational field and the nature of singularities in Gowdy spacetimes. Ann. Phys. (NY) 199, 84–122 (1990)
12. Kato, T.: Perturbation theory for linear operators. Second ed., Berlin: Springer-Verlag, 1976
13. Kichenassamy, S.: The blow-up problem for exponential nonlinearities. Comm. P. D. E. 21, 125–162 (1996)
14. Kichenassamy, S. and Rendall, A. D.: Analytic description of singularities in Gowdy spacetimes. Class. Quantum Grav. 15, 1339–1355 (1998)
15. Lifshitz, E. M. and Khalatnikov, I. M.: Investigations in relativistic cosmology. Adv. Phys. 12, 185–249 (1963)
16. Rendall, A. D.: Convergent and divergent perturbation series and the post-Minkowskian approximation scheme. Class. Quantum Grav. 7, 803–812 (1990)

Communicated by P. Sarnak
Commun. Math. Phys. 218, 513 – 536 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Natural Energy Bounds in Quantum Thermodynamics Daniele Guido1 , Roberto Longo2 1 Dipartimento di Matematica, Università della Basilicata, 85100 Potenza, Italy 2 Dipartimento Matematica, Università di Roma “Tor Vergata”, 00133 Roma, Italy
Received: 2 October 2000 / Accepted: 5 December 2000
Abstract: Given a stationary state for a noncommutative flow, we study a boundedness condition, depending on a parameter β > 0, which is weaker than the KMS equilibrium condition at inverse temperature β. This condition is equivalent to a holomorphic property closely related to the one recently considered by Ruelle and D’Antoni–Zsido and shared by a natural class of non-equilibrium steady states. Our holomorphic property is stronger than Ruelle’s one and thus selects a restricted class of non-equilibrium steady states. We also introduce the complete boundedness condition and show this notion to be equivalent to the Pusz–Woronowicz complete passivity property, hence to the KMS condition. In Quantum Field Theory, the β-boundedness condition can be interpreted as the property that localized state vectors have energy density levels increasing β-subexponentially, a property which is similar in the form and weaker in the spirit than the modular compactness-nuclearity condition. In particular, for a Poincaré covariant net of C∗ algebras on Minkowski spacetime, the β-boundedness property, β ≥ 2π , for the boosts is shown to be equivalent to the Bisognano–Wichmann property. The Hawking temperature is thus minimal for a thermodynamical system in the background of a Rindler black hole within the class of β-holomorphic states. More generally, concerning the Killing evolution associated with a class of stationary quantum black holes, we characterize KMS thermal equilibrium states at Hawking temperature in terms of the boundedness property and the existence of a translation symmetry on the horizon. 0. Introduction In this paper we shall discuss a property for a state which is invariant under a given one-parameter automorphism group of a C∗ -algebra. This property has two essentially equivalent descriptions either as a boundedness condition or as a holomorphic condition. Work partially supported by MURST and GNAFA-INDAM.
The boundedness property has a natural interpretation within Quantum Field Theory, being somehow similar to the Haag–Swieca [22] compactness or Buchholz–Wichmann [10] nuclearity conditions, while the holomorphic condition is a weakening of the KMS thermal equilibrium condition, related to the conditions recently considered by Ruelle [34] and D’Antoni–Zsido [11]; thus it naturally pertains to the context of Quantum Statistical Mechanics. It is then natural to discuss our property in a context where both of the above subjects coexist: Black Hole Thermodynamics (see [40]). Our main results relate this property to the KMS condition.

Quantum field theory and the boundedness property. Let us consider a Quantum Field Theory on the Minkowski spacetime and let $\mathfrak{A}(S)$ be the C∗-algebra on the vacuum Hilbert space $\mathcal{H}$ generated by the observables localized in the region $S$. Clearly pure states on the quasi-local observable algebra in the vacuum folium are given by unit vectors of $\mathcal{H}$, unique up to a phase, under the correspondence $\xi \in \mathcal{H}$, $\|\xi\| = 1 \longmapsto \omega_\xi$, where $\omega_\xi$ is the expectation functional $\omega_\xi(X) \equiv (X\xi, \xi)$. Denote by $L(S)$ the set of vector states localized in $S$, namely

$$L(S) \equiv \{\xi \in \mathcal{H},\ \|\xi\| = 1 : \omega_\xi|_{\mathfrak{A}(S')} = \omega_0|_{\mathfrak{A}(S')}\},$$

with $\omega_0 = \omega_\Omega$ the vacuum state. It is easy to see that $\xi \in L(S)$ if and only if there exists an isometry $W \in \mathfrak{A}(S')'$ such that $\xi = W\Omega$; indeed $W$ is the closure of the map $X'\Omega \mapsto X'\xi$, $X' \in \mathfrak{A}(S')$. Therefore, if Haag duality holds for $S$, namely $\mathcal{A}(S) = \mathfrak{A}(S')'$, where $\mathcal{A}(S) = \mathfrak{A}(S)''$ is the weak closure of $\mathfrak{A}(S)$, we have

$$L(S) = \{W\Omega : W \in \mathcal{A}(S),\ W^*W = 1\}. \tag{0.1}$$
The β-boundedness condition demands that all vectors in $L(S)$ have energy density levels increasing β-subexponentially, namely

$$\int e^{-\lambda\beta}\, d\mu_\xi(\lambda) < +\infty, \qquad \forall\, \xi \in L(S), \tag{0.2}$$

where $\mu_\xi(B) = \int_B d(E(\lambda)\xi, \xi)$ is the spectral measure associated by the Hamiltonian $H$ to $\xi$, $H = \int \lambda\, dE(\lambda)$ being the spectral resolution of $H$. By the spectral theorem, Eq. (0.2) is equivalent to $L(S) \subset D(e^{-\frac{\beta}{2}H})$. Since every operator $X \in \mathcal{A}(S)$ with $\|X\| < 1$ is a convex combination of unitaries in $\mathcal{A}(S)$ (see e.g. [29]), the β-boundedness condition is equivalent to $\mathcal{A}(S)\Omega \subset D(e^{-\frac{\beta}{2}H})$ and turns out to be equivalent to

$$e^{-\frac{\beta}{2}H}\,\mathfrak{B}_1\Omega \ \text{ is a bounded set} \tag{0.3}$$

for some, hence for all, weakly dense ∗-subalgebra $\mathfrak{B}$ of $\mathcal{A}(S)$, where $\mathfrak{B}_1$ denotes the unit ball of $\mathfrak{B}$. Equation (0.2) already shows an interesting aspect of the β-boundedness property in Quantum Field Theory, namely that, as mentioned, it can be formulated much in analogy with the compactness-nuclearity [22, 10].
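The step attributed to the spectral theorem can be spelled out in one line. What follows is only a sketch in the notation of this paragraph (with $H = \int \lambda\, dE(\lambda)$ and $\xi$ a unit vector); it introduces no assumption beyond those already stated.

```latex
\[
\bigl\| e^{-\frac{\beta}{2}H}\xi \bigr\|^{2}
  = \int \bigl(e^{-\frac{\beta}{2}\lambda}\bigr)^{2}\, d(E(\lambda)\xi,\xi)
  = \int e^{-\lambda\beta}\, d\mu_{\xi}(\lambda),
\]
\[
\text{so that}\qquad
\xi \in D\bigl(e^{-\frac{\beta}{2}H}\bigr)
\iff
\int e^{-\lambda\beta}\, d\mu_{\xi}(\lambda) < +\infty .
\]
```

Since $e^{-\lambda\beta}$ is bounded for $\lambda \ge 0$, the condition only restricts the weight that $\mu_\xi$ places on the negative part of the spectrum of $H$.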
Now, we have not yet specified which region $S$ is supposed to be. It could be the entire Minkowski spacetime $M$, but an unbounded region such as a spacelike cone will already contain the relevant information. As $H$ commutes with all translations, property (0.3) for $S$ implies the same property for the translated regions $\{S + x,\ x \in \mathbb{R}^{d+1}\}$, hence for the quasi-local algebra. It is then almost immediate that the β-boundedness condition is equivalent to $H$ being semibounded, thus, by Poincaré covariance, to the positivity of the energy-momentum. Although instructive, the boundedness condition would certainly have a limited interest if it were confined to the above situation. Such a condition acquires however a deeper role when the Hamiltonian is not positive, as is the case in thermodynamical contexts. To better illustrate this point, we need to make a digression in Statistical Mechanics and discuss a property equivalent to the boundedness condition.

β-holomorphy and non-equilibrium states. As is well known, the thermal equilibrium states in Quantum Statistical Mechanics at finite volume are the Gibbs states, while at infinite volume they are KMS [21] states, the proper generalization of the former, which are defined as follows. Let $\mathfrak{A}$ be a C∗-algebra, α a one-parameter group of automorphisms of $\mathfrak{A}$ and $\mathfrak{B}$ a dense ∗-subalgebra of $\mathfrak{A}$. A state ω of $\mathfrak{A}$, namely a normalized positive linear functional of $\mathfrak{A}$, is a KMS state for α at inverse temperature β > 0 if, for any $X, Y \in \mathfrak{B}$, the function
(a) $F_{X,Y}(t) \equiv \omega(\alpha_t(X)Y)$ extends to a function in $A(S_\beta)$,
(b) $F_{X,Y}(t + i\beta) = \omega(Y\alpha_t(X))$,
where $A(S_\beta)$ is the algebra of functions analytic in the strip $S_\beta = \{0 < \operatorname{Im} z < \beta\}$, bounded and continuous on the closure $\overline{S_\beta}$. The consideration of the subalgebra $\mathfrak{B}$ is indeed unnecessary (but convenient for future reference) as properties (a) and (b) then hold for all $X, Y \in \mathfrak{A}$.
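As a finite-volume illustration of properties (a) and (b), one can check the KMS boundary relation numerically for a Gibbs state on matrices. The sketch below is not taken from the paper: the function name `gibbs_kms_check`, the matrix model, and in particular the sign convention $\alpha_t(X) = e^{-itH} X e^{itH}$ (chosen here so that (b) holds on the strip $0 < \operatorname{Im} z < \beta$ for the state $\omega(X) = \operatorname{Tr}(e^{-\beta H}X)/\operatorname{Tr}(e^{-\beta H})$) are assumptions made for illustration.

```python
import numpy as np

def gibbs_kms_check(H, X, Y, beta, t):
    """Return (F(t + i*beta), omega(Y alpha_t(X))) for the Gibbs state of H.

    Assumed conventions (illustrative, not from the paper):
      omega(A)   = Tr(exp(-beta H) A) / Tr(exp(-beta H))
      alpha_z(A) = exp(-i z H) A exp(i z H), extended to complex z
      F(z)       = omega(alpha_z(X) Y)
    """
    evals, V = np.linalg.eigh(H)  # H is assumed Hermitian

    def exp_H(c):
        # exp(c * H) for a complex scalar c, via the eigendecomposition of H
        return (V * np.exp(c * evals)) @ V.conj().T

    rho = exp_H(-beta)
    rho = rho / np.trace(rho).real  # normalized Gibbs density matrix

    def omega(A):
        return np.trace(rho @ A)

    def alpha(z, A):
        return exp_H(-1j * z) @ A @ exp_H(1j * z)

    lhs = omega(alpha(t + 1j * beta, X) @ Y)  # F_{X,Y}(t + i*beta)
    rhs = omega(Y @ alpha(t, X))              # omega(Y alpha_t(X))
    return lhs, rhs


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    H = (A + A.conj().T) / 2
    X = rng.normal(size=(3, 3))
    Y = rng.normal(size=(3, 3))
    lhs, rhs = gibbs_kms_check(H, X, Y, beta=0.5, t=0.2)
    print(abs(lhs - rhs))  # difference between the two sides of (b)
```

In finite dimensions every function $F_{X,Y}$ is entire, so property (a) is automatic; the check is only meant to make the boundary relation (b) concrete.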
Let us now consider non-equilibrium statistical mechanics and in particular the physical situation recently considered by Ruelle [34]. There a quantum system $\Sigma$ is interacting with a set of infinite reservoirs $R_k$ that are in equilibrium at different temperatures $\beta_k^{-1}$. The system $\Sigma$ may be acted upon by a force that we assume to be time independent. In this context, at least if $\Sigma$ is finite, a natural class of stationary non-equilibrium states occurs, the non-equilibrium steady states. If we denote as above the observable C∗-algebra by $\mathfrak{A}$ and the time evolution automorphism group by α, a non-equilibrium steady state ω of $\mathfrak{A}$ satisfies property (a) in the KMS condition, for all $X, Y$ in a dense ∗-subalgebra $\mathfrak{B}$, with $\beta = \min_k \beta_k$, but not necessarily property (b). States with property (a) have also been independently discussed by D’Antoni and Zsido [11], see also [44]. A typical example of a non-KMS state satisfying property (a) is provided by the tensor product of KMS states at different temperatures; in this case the parameter β is clearly the minimum of the inverse temperatures. Further examples are obtained by considering a KMS state with respect to a bounded perturbation of the dynamics, showing a certain stability of the holomorphic property. In these examples the states satisfy property (a). Because of property (b), KMS states also satisfy the bound
(c) $|F_{X,Y}(t + i\beta)| \le C\,\|X\|\,\|Y\|$
for some constant $C > 0$ and all $X, Y \in \mathfrak{B}$; indeed one may take $C = 1$. We shall show that the bound (c) automatically occurs if property (a) holds true for all the elements of a C∗-algebra $\mathfrak{A}$.
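The claim that a KMS state satisfies the bound (c) with $C = 1$ follows directly from property (b); a short sketch of the estimate, using only that a state has norm one and that automorphisms of a C∗-algebra are isometric:

```latex
\[
|F_{X,Y}(t+i\beta)|
  \;=\; |\omega(Y\alpha_t(X))|
  \;\le\; \|Y\alpha_t(X)\|
  \;\le\; \|Y\|\,\|\alpha_t(X)\|
  \;=\; \|X\|\,\|Y\| .
\]
```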
We say that an α-invariant state ω is β-holomorphic if Properties (a) and (c) hold for all $X, Y$ in a dense ∗-subalgebra $\mathfrak{B}$ or, equivalently, if Property (a) holds for all $X, Y$ in the C∗-algebra $\mathfrak{A}$. Since the bound (c) does not necessarily hold for the Non-Equilibrium Steady States considered by Ruelle [34], β-holomorphic states form a subclass of the class of such non-equilibrium states that, in a sense, are closer to equilibrium.

KMS and complete β-boundedness. Because of the bound (c), the β-holomorphic property for a state ω turns out to be equivalent to the $\frac{\beta}{2}$-boundedness property for the Hamiltonian $H$ in the GNS representation. The two properties are thus two different aspects of the same notion. Furthermore, if the constant $C$ is equal to 1, we may use an inequality of Pisier [31], improved by Haagerup [23], to get the inequality $e^{\mp\beta H} \le 1 + \Delta^{\pm 1}$, where $\Delta$ is the modular operator associated with the GNS vector that, for simplicity, we are assuming to be separating for the weak closure $\pi_\omega(\mathfrak{A})''$. Thus the β-holomorphic property with $C = 1$ entails that the Hamiltonian $H$ is dominated, in the above sense, by the thermal equilibrium Hamiltonian $\log\Delta$. A better understanding of the β-holomorphic property is then obtained by comparing it with the passivity condition of Pusz–Woronowicz [32], which is an expression of the second principle of thermodynamics. As is known, the passivity condition is weaker than the KMS condition, while the complete passivity turns out to be equivalent to the KMS property at some inverse temperature (possibly 0 or +∞). In analogy, we define the complete β-holomorphic property and show this property to be equivalent to complete passivity, thus to the KMS condition. We now return to Quantum Field Theory in a more general context.

Black hole thermodynamics: Minimality of the Hawking temperature. During the past thirty years, a theory of black hole thermodynamics has been developed much in analogy with classical thermodynamics, see [40].
In this new context the thermodynamical functions acquire a new meaning, for example the entropy is proportional to the area of the black hole [2], yet a “generalized second law of thermodynamics” holds. As derived by Hawking [24], the temperature appearing in this formula is a true physical temperature, in other words black holes do emit thermal radiation, provided quantum effects are taken into account. This effect, or its closely related Unruh effect in Rindler spacetime [39], has been noticed [36] to be essentially equivalent to the Bisognano–Wichmann property in Quantum Field Theory. Let now a Quantum Field Theory on the Minkowski spacetime $M$ be specified by the algebras $\mathfrak{A}(O)$ of the observables localized in the regions $O$. As is known, Rindler spacetime can be identified with a wedge region $W$ of $M$, say $W = \{x \in \mathbb{R}^{d+1} : x_1 > |x_0|\}$. Thus $W$ represents the exterior of a Rindler black hole. The pure Lorentz transformations in the $x_1$-direction on $M$ leave $W$ globally invariant and thus give rise to a one-parameter automorphism group α of the C∗-algebra $\mathfrak{A}(W)$. The Bisognano–Wichmann theorem shows that, if $\mathfrak{A}$ is generated by a Wightman field, then the restriction to $\mathfrak{A}(W)$ of the vacuum state satisfies the KMS condition with respect to α at inverse temperature β = 2π. One can then explain the Unruh effect on this basis, see [36].
We shall show that the β-boundedness property, β ≥ 2π, in the above Quantum Field Theory context, actually implies the KMS property at β = 2π, namely the state is at thermal equilibrium. In particular, for a Poincaré covariant net of local observable algebras on the Minkowski spacetime, we obtain a characterization of the Bisognano–Wichmann property: there should exist a spacelike cone $S$ contained in the wedge $W$ and a weakly dense ∗-algebra $\mathfrak{B}$ of $\mathcal{A}(S) = \mathfrak{A}(S)''$ such that

$$e^{-\pi K}\,\mathfrak{B}_1\Omega \ \text{ is a bounded set} \tag{0.4}$$

of the underlying Hilbert space. Here $\Omega$ is the vacuum vector and $K$ is the generator of the boost unitary group corresponding to $W$, namely the Killing Hamiltonian for the Rindler space $W$. Note that, if the Bisognano–Wichmann property and the split property [12] hold, then also the modular compactness condition holds true, namely $e^{-\lambda K}\mathfrak{B}_1\Omega$ is a compact set for all $0 < \lambda < \pi$ [8] ($e^{-2\pi K}$ is then the modular operator associated with $(\mathcal{A}(W), \Omega)$). Thus the boundedness condition (0.4) is very similar in form to the modular compactness-nuclearity condition [8, 9]. Stating our result in the setting of Rindler spacetime, we obtain the following: if a state ω is β-holomorphic with respect to the Killing evolution, with parameter $\beta^{-1}$ less than or equal to the Hawking temperature, then ω is indeed a KMS state at Hawking temperature. This minimality character of the Hawking temperature may have further physical interpretation, which is however limited by the fact that only β-holomorphic states appear in the context, see the conclusion at the end of this paper. We then extend our analysis to stationary black holes described by a globally hyperbolic spacetime with bifurcate Killing horizon, making use of the net of observables localized on the horizon, as in [19, 28]. We show that, assuming the β-boundedness, the KMS condition is equivalent to the existence of a translation symmetry on the horizon, cf. [45]. Our paper is organized in two sections. In the first one we deal with C∗-algebras and automorphisms or endomorphisms; we discuss there the basic structure provided by the β-holomorphy and β-boundedness condition, that we later apply in Quantum Field Theory in Sect. 2. In this second section we first discuss our results in Minkowski, or Rindler, spacetime; then we study the corresponding structure for one-dimensional nets and we then apply these results to the case of a Quantum Field Theory on a spacetime with bifurcate Killing horizon.

1. Holomorphic States and the Boundedness Property

In this section we discuss the basic structure of the holomorphy and of the boundedness properties. In the next section we shall apply our results to the Quantum Field Theory context.

1.1. General properties. Let $\mathcal{H}$ be a Hilbert space, $\Omega \in \mathcal{H}$ a vector and $K$ a selfadjoint operator on $\mathcal{H}$. We shall say that a linear subspace $S$ of $B(\mathcal{H})$ is β-bounded with respect to $K$ and $\Omega$ if $S\Omega \subset D(e^{-\beta K})$, the domain of $e^{-\beta K}$, and the linear map $\Phi_\beta : X \in S \mapsto e^{-\beta K}X\Omega \in \mathcal{H}$ is bounded.
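Note that if $S$ is β-bounded then it is β′-bounded for all $0 < \beta' \le \beta$; indeed, if $E$ is the spectral projection of $K$ relative to the interval $[0, +\infty)$, then

$$\begin{aligned}
\|e^{-\beta' K}X\Omega\|^2 &= \|Ee^{-\beta' K}X\Omega\|^2 + \|(1 - E)e^{-\beta' K}X\Omega\|^2 \\
&\le \|X\Omega\|^2 + \|(1 - E)e^{-\beta K}X\Omega\|^2 \\
&\le \|X\|^2 + \|e^{-\beta K}X\Omega\|^2 \le 1 + \|\Phi_\beta\|^2,
\end{aligned} \tag{1.1}$$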
for all $X \in S_1$, where $S_1$ denotes the unit ball of $S$.

Lemma 1.1. Let $S$ be a β-bounded linear subspace of $B(\mathcal{H})$ for some β > 0. Then the restriction of $\Phi = \Phi_\beta$ to $S_1$ is continuous with the weak-operator topology on $S$ and the weak topology on $\mathcal{H}$. If $S$ is norm closed and $S\Omega$ is contained in $D(e^{-\beta K})$, then $S$ is automatically β-bounded.

Proof. Assume then that $\Phi$ is bounded. To check the continuity, let $X_i \in S_1$ be a net weakly convergent to 0. Then for all $\xi \in D(e^{-\beta K})$ we have

$$\lim_i\,(\xi, \Phi(X_i)) = \lim_i\,(\xi, e^{-\beta K}X_i\Omega) = \lim_i\,(e^{-\beta K}\xi, X_i\Omega) = 0,$$
as $D(e^{-\beta K})$ is dense in $\mathcal{H}$ and $\{\Phi(X_i)\}_i$ is bounded by assumption, it follows that $\Phi(X_i) \to 0$ weakly. It remains to show that $\Phi$ is bounded if $S$ is norm closed. We shall show that $\Phi$ is closable, thus $\Phi$ will be bounded by the closed graph theorem. Let $\{X_n\}$ be a sequence in $S$ and $\eta \in \mathcal{H}$ be such that $X_n \to 0$ and $\Phi(X_n) \to \eta$ in norm. As $\{X_n\}$ is a bounded set, by the just proved weak continuity of $\Phi$ we have that $\Phi(X_n) \to 0$ weakly, thus $\eta = 0$.

Lemma 1.2. Let $\mathcal{M}$ be a von Neumann algebra on the Hilbert space $\mathcal{H}$, $\Omega$ a vector, $K$ a selfadjoint operator of $\mathcal{H}$ and β > 0. The following are equivalent:
(i) There exists a weakly dense ∗-subalgebra $\mathfrak{B}$ of $\mathcal{M}$ which is β-bounded (w.r.t. $K$ and $\Omega$);
(ii) $\mathcal{M}$ is β-bounded;
(iii) $\mathfrak{A}\Omega \subset D(e^{-\beta K})$, where $\mathfrak{A}$ is some weakly dense C∗-subalgebra of $\mathcal{M}$.
In this case $\Phi_\beta|_{\mathfrak{B}}$ and $\Phi_\beta|_{\mathcal{M}}$ have the same norm.

Proof. Assuming in (i) that $e^{-\beta K}\mathfrak{B}_1\Omega$ is contained in the ball of radius $C > 0$, we shall show that the same is true for $e^{-\beta K}\mathcal{M}_1\Omega$, i.e. (ii) holds. Let $X \in \mathcal{M}_1$. By the Kaplansky density theorem [38], there exists a net of operators $X_i \in \mathfrak{B}_1$ strongly convergent to $X$. Since $\|e^{-\beta K}X_i\Omega\| \le C$, we may assume, possibly restricting to a subnet, that $e^{-\beta K}X_i\Omega$ weakly converges to $\eta \in \mathcal{H}$, $\|\eta\| \le C$. Now take $\xi \in D(e^{-\beta K})$. We have

$$(\xi, \eta) = \lim_i\,(\xi, e^{-\beta K}X_i\Omega) = \lim_i\,(e^{-\beta K}\xi, X_i\Omega) = (e^{-\beta K}\xi, X\Omega).$$

Since $e^{-\beta K}$ is self-adjoint, this means that $X\Omega \in D(e^{-\beta K})$ and $e^{-\beta K}X\Omega = \eta$, i.e. $\|e^{-\beta K}\mathcal{M}_1\Omega\| \le C$. Now (ii) ⇒ (iii) and (iii) ⇒ (i) follow by Lemma 1.1 as $\mathfrak{A}$ is norm closed.
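The monotonicity in β recorded in (1.1) can be sanity-checked numerically in finite dimensions, where all domain questions disappear. The sketch below is illustrative only: the helper name `monotonicity_gap` and the use of random matrices are assumptions, and the check covers just the first two lines of the estimate (1.1), namely $\|e^{-\beta' K}X\Omega\|^2 \le \|X\Omega\|^2 + \|e^{-\beta K}X\Omega\|^2$ for $0 < \beta' \le \beta$.

```python
import numpy as np

def monotonicity_gap(K, X, omega, beta, beta_prime):
    """Return ||X w||^2 + ||exp(-beta K) X w||^2 - ||exp(-beta' K) X w||^2.

    K is assumed Hermitian, w = omega is a vector, and 0 < beta_prime <= beta.
    The estimate (1.1) says that this gap is non-negative.
    """
    evals, V = np.linalg.eigh(K)
    # Squared moduli of the components of X*omega in an eigenbasis of K.
    weights = np.abs(V.conj().T @ (X @ omega)) ** 2
    lhs = np.sum(np.exp(-2.0 * beta_prime * evals) * weights)
    rhs = np.sum(weights) + np.sum(np.exp(-2.0 * beta * evals) * weights)
    return rhs - lhs


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    A = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
    K = (A + A.conj().T) / 2            # Hermitian, with spectrum of both signs
    X = rng.normal(size=(5, 5))
    w = rng.normal(size=5)
    w = w / np.linalg.norm(w)
    print(monotonicity_gap(K, X, w, beta=1.0, beta_prime=0.4))
```

The reason the gap is non-negative is visible eigenvalue by eigenvalue: for $\lambda \ge 0$ one has $e^{-2\beta'\lambda} \le 1$, while for $\lambda < 0$ one has $e^{-2\beta'\lambda} \le e^{-2\beta\lambda}$, which is exactly the splitting by the projection $E$ used above.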
Let now $\mathfrak{A}$ be a unital C∗-algebra, α a one-parameter automorphism group of $\mathfrak{A}$ and ω an α-invariant state of $\mathfrak{A}$. We shall always assume that the maps $t \in \mathbb{R} \mapsto \omega(\alpha_t(X)Y)$ are continuous for all $X, Y \in \mathfrak{A}$. Denote by $(\mathcal{H}, \pi, \Omega)$ the GNS triple associated with ω and by $U$ the one-parameter unitary group on $\mathcal{H}$ implementing $\pi \cdot \alpha$:

$$U(t)\pi(X)\Omega = \pi(\alpha_t(X))\Omega, \qquad X \in \mathfrak{A}.$$

From our continuity assumption, it follows at once that $U$ is strongly continuous. Denote by $K$ the infinitesimal generator of $U$.

Proposition 1.3. If ω is a pure state, $\pi(\mathfrak{A})$ is β-bounded with respect to $K$ and $\Omega$ if and only if the spectrum of $K$ is bounded below.

Proof. If ω is pure, then $\mathcal{M} \equiv \pi(\mathfrak{A})'' = B(\mathcal{H})$, hence, if the boundedness condition holds, $e^{-\beta K}B(\mathcal{H})_1\Omega$ is bounded. As $B(\mathcal{H})_1\Omega = \mathcal{H}_1$, it follows that $e^{-\beta K}$ is bounded, thus $K$ is semibounded. The converse is obvious.

Let $\mathfrak{B}$ be a ∗-subalgebra of $\mathfrak{A}$. The state ω of $\mathfrak{A}$ is β-holomorphic on $\mathfrak{B}$ if ω is α-invariant and for every $X, Y \in \mathfrak{B}$ the function $F_{X,Y}(t) = \omega(\alpha_t(X)Y)$ is the boundary value of a function holomorphic in the strip $S_\beta = \{z : 0 < \operatorname{Im} z < \beta\}$, continuous in $\overline{S_\beta}$. Denote by $A(S_\beta)$ the algebra of functions holomorphic in $S_\beta$, bounded and continuous in its closure $\overline{S_\beta}$.

Proposition 1.4. Let $\mathfrak{A}$, α, ω, $K$ be as before, $\mathfrak{B}$ a ∗-subalgebra of $\mathfrak{A}$. Then ω is β-holomorphic on $\mathfrak{B}$ if and only if $\pi(\mathfrak{B})\Omega \subset D(e^{-\frac{\beta}{2}K})$. In this case $F_{X,Y}$ extends to a function in $A(S_\beta)$, for all $X, Y \in \mathfrak{B}$.

Proof. Immediate by Lemma 1.17.
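Theorem 1.5. Let $\mathfrak{A}$, α, ω, $K$ be as before, $\mathfrak{B}$ a ∗-subalgebra of $\mathfrak{A}$. Then $\pi(\mathfrak{B})$ is $\frac{\beta}{2}$-bounded w.r.t. $K$ and $\Omega$ if and only if ω is β-holomorphic on $\mathfrak{B}$ and

$$|F_{X,Y}(t + i\beta)| \le C\,\|X\|\,\|Y\|, \tag{1.2}$$

for some constant $C > 0$. If moreover $\mathfrak{B}$ is norm closed, then $\pi(\mathfrak{B})$ is β/2-bounded w.r.t. $K$ and $\Omega$ if and only if ω is β-holomorphic on $\mathfrak{B}$.

Proof. Clearly $F_{Y^*,X}(t) = (e^{itK}\pi(X)\Omega, \pi(Y)\Omega)$ for all $X, Y \in \mathfrak{A}$. If ω is β-holomorphic on $\mathfrak{B}$ and $X, Y \in \mathfrak{B}$, then $F_{Y^*,X}$ is a boundary value of a function in $A(S_\beta)$, thus by Lemma 1.17 $\pi(\mathfrak{B})\Omega \subset D(e^{-\frac{\beta}{2}K})$. If moreover the bound (1.2) holds, then

$$\|\Phi_{\beta/2}|_{\mathfrak{B}}\|^2 = \sup_{X,Y \in \mathfrak{B}_1} |(e^{-\frac{\beta}{2}K}\pi(X)\Omega, e^{-\frac{\beta}{2}K}\pi(Y)\Omega)| = \sup_{X,Y \in \mathfrak{B}_1} |F_{Y^*,X}(i\beta)| \le C,$$

thus $\mathfrak{B}$ is β/2-bounded.

Conversely, if $\mathfrak{B}$ is β/2-bounded, then $\pi(\mathfrak{B})\Omega \subset D(e^{-\frac{\beta}{2}K})$ and the same computation done above yields, by Lemma 1.17, that $F_{X,Y} \in A(S_\beta)$ for all $X, Y \in \mathfrak{B}$ and

$$\sup_{X,Y \in \mathfrak{B}_1} |F_{X,Y}(t + i\beta)| = \sup_{X,Y \in \mathfrak{B}_1} |F_{X,Y}(i\beta)| = \|\Phi_{\beta/2}|_{\mathfrak{B}}\|^2,$$

so the bound (1.2) holds with $C = \|\Phi_{\beta/2}|_{\mathfrak{B}}\|^2$.

Since by Prop. 1.4 ω is β-holomorphic on $\mathfrak{B}$ iff $\pi(\mathfrak{B})\Omega \subset D(e^{-\frac{\beta}{2}K})$, the rest follows by Lemma 1.2.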
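The identification of the boundary value at $z = i\beta$ with a scalar product of the vectors $e^{-\frac{\beta}{2}K}\pi(\cdot)\Omega$, used twice in the proof of Theorem 1.5, can be sketched as follows. This is only a formal outline under the assumption that both $\pi(X)\Omega$ and $\pi(Y)\Omega$ lie in $D(e^{-\frac{\beta}{2}K})$; the analytic continuation itself is the content of Lemma 1.17.

```latex
\[
F_{Y^*,X}(z) = \bigl(e^{izK}\pi(X)\Omega,\ \pi(Y)\Omega\bigr)
\quad\Longrightarrow\quad
F_{Y^*,X}(i\beta)
  = \bigl(e^{-\frac{\beta}{2}K}\pi(X)\Omega,\ e^{-\frac{\beta}{2}K}\pi(Y)\Omega\bigr)
  = \bigl(\Phi_{\beta/2}(\pi(X)),\ \Phi_{\beta/2}(\pi(Y))\bigr),
\]
\[
\sup_{X,Y\in\mathfrak{B}_1}
\bigl|\bigl(\Phi_{\beta/2}(\pi(X)),\Phi_{\beta/2}(\pi(Y))\bigr)\bigr|
  = \Bigl(\sup_{X\in\mathfrak{B}_1}\|\Phi_{\beta/2}(\pi(X))\|\Bigr)^{2}
  = \bigl\|\Phi_{\beta/2}|_{\mathfrak{B}}\bigr\|^{2}.
\]
```

The second line uses the Cauchy–Schwarz inequality for the upper bound and the choice $Y = X$ for the lower bound.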
Of course KMS states at inverse temperature β > 0 are β′-holomorphic for all $0 < \beta' \le \beta$ on all of $\mathfrak{A}$ and satisfy the bound (1.2).

Corollary 1.6. With the above notations, let ω be β-holomorphic on $\mathfrak{B}$. Then ω is β-holomorphic on the closure $\overline{\mathfrak{B}}$ iff the bound (1.2) holds.

Proof. Immediate by Lemma 1.2 and Theorem 1.5.
1.2. Complete β-holomorphy and KMS condition. We begin by recalling the following inequality (1.3), which is due to Pisier, with the improved constant due to Haagerup, see [30].

Theorem 1.7 ([31, 23]). Let $\Phi$ be a bounded linear map from a C∗-algebra $\mathfrak{A}$ to a Hilbert space $\mathcal{H}$. Then there exist two states ϕ and ψ on $\mathfrak{A}$ such that

$$\|\Phi(X)\|^2 \le \|\Phi\|^2\,(\varphi(X^*X) + \psi(XX^*)), \qquad X \in \mathfrak{A}. \tag{1.3}$$

In the special case where $\mathfrak{A}$ is unital and $\|\Phi\| = \|\Phi(1)\|$, one may take $\varphi = \psi = \|\Phi\|^{-2}(\Phi(\cdot), \Phi(1))$ in Eq. (1.3).

The special case in the last part of the statement is obtained during Haagerup’s proof of the inequality (1.3); such proof is not difficult and can be found in [30], Thm 7.3.

Corollary 1.8. Let $\mathcal{M}$ be a von Neumann algebra on a Hilbert space $\mathcal{H}$, $\Omega \in \mathcal{H}$ a cyclic unit vector for $\mathcal{M}$ and $U(t) = e^{itK}$ an $\Omega$-fixing one-parameter unitary group on $\mathcal{H}$ implementing automorphisms of $\mathcal{M}$. If $\|e^{-\beta K}\mathcal{M}_1\Omega\| \le 1$, then

$$e^{-2\beta K} \le 1 + \Delta E, \tag{1.4}$$

where $E \in \mathcal{M}$ is the projection onto $\mathcal{H}_0 \equiv \overline{\mathcal{M}'\Omega}$ and $\Delta$ is the modular operator on $\mathcal{H}_0$ associated with $(E\mathcal{M}E, \Omega)$.

Proof. As the map $\Phi_\beta : X \in \mathcal{M} \mapsto e^{-\beta K}X\Omega \in \mathcal{H}$ satisfies $\|\Phi_\beta\| = \|\Phi_\beta(1)\| = 1$, the inequality (1.3) holds with ϕ = ψ = ω (see Theorem 1.7), where $\omega(X) = (\Phi_\beta(X), \Phi_\beta(1)) = (X\Omega, \Omega)$, namely

$$\|e^{-\beta K}X\Omega\|^2 \le \|X\Omega\|^2 + \|X^*\Omega\|^2, \qquad X \in \mathcal{M}. \tag{1.5}$$

Assuming first that $\Omega$ is also separating, i.e. $E = 1$, if $X \in \mathcal{M}$ and $X\Omega \in D(\Delta)$ we then have

$$(e^{-\beta K}X\Omega, e^{-\beta K}X\Omega) \le (X\Omega, X\Omega) + (\Delta X\Omega, X\Omega) = ((1 + \Delta)X\Omega, X\Omega).$$

Since $U$ implements automorphisms of $\mathcal{M}$ and $U(t)\Omega = \Omega$, by the modular theory $U(t)$ and $\Delta^{is}$ commute, thus there exists a strongly dense subalgebra $\mathfrak{B}$ of $\mathcal{M}$ such that $\mathfrak{B}\Omega$ is a core for every continuous function of $K$ or of $\log\Delta$. Taking $X \in \mathfrak{B}$, the above inequality gives

$$(e^{-2\beta K}X\Omega, X\Omega) \le ((1 + \Delta)X\Omega, X\Omega),$$
thus $e^{-2\beta K} \le 1 + \Delta$ as $\mathfrak{B}\Omega$ is a core for both $e^{-2\beta K}$ and $1 + \Delta$. In general, if $E \ne 1$, we may consider the reduced von Neumann algebra $E\mathcal{M}E$ on $\mathcal{H}_0$. Since $\Omega$ is separating for $E\mathcal{M}E$ and $K$ commutes with $E$, the above shows that

$$e^{-2\beta K}E \le E + \Delta E. \tag{1.6}$$

Thus, by Eq. (1.5), we have for all $X \in \mathcal{M}$,

$$\|e^{-\beta K}(1 - E)X\Omega\|^2 \le \|(1 - E)X\Omega\|^2 + \|X^*(1 - E)\Omega\|^2 = \|(1 - E)X\Omega\|^2,$$

that entails

$$e^{-2\beta K}(1 - E) \le 1 - E. \tag{1.7}$$

Combining the inequalities (1.6) and (1.7) we get the desired inequality (1.4).
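The final combination step in the proof of Corollary 1.8 can be written out explicitly. The sketch below uses only that $E$ commutes with $K$, as stated in the proof, so that $e^{-2\beta K}$ splits along $E$ and $1 - E$:

```latex
\[
e^{-2\beta K}
  = e^{-2\beta K}E + e^{-2\beta K}(1-E)
  \;\le\; (E + \Delta E) + (1 - E)
  \;=\; 1 + \Delta E ,
\]
```

where the first summand is estimated by (1.6) and the second by (1.7).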
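Remark 1.9. In general, the tensor product of bounded maps from C∗-algebras to Hilbert spaces is not bounded. Indeed, if $\mathcal{M}$ is a von Neumann algebra with a cyclic and separating vector $\Omega$, the map $\Phi_\alpha : X \in \mathcal{M} \mapsto \Delta^\alpha X\Omega$ is bounded for any $\alpha \in [0, 1/2]$, but $\Phi_\alpha \otimes \Phi_\beta$ is not necessarily bounded if $\alpha \ne \beta$. As an example, let $\mathcal{M} \simeq B(\mathcal{H})$ be a type $I_\infty$ factor acting by right multiplication on the Hilbert space $L^2(\mathcal{H})$ of the Hilbert-Schmidt operators on $\mathcal{H}$, $h$ a positive Hilbert-Schmidt operator of norm 1 in $L^2(\mathcal{H})$. Then $\Delta X = h^2 X h^{-2}$, $X \in L^2(\mathcal{H})$, and $\Phi_\alpha(X) = h^{2\alpha} X h^{1-2\alpha}$, $X \in \mathcal{M}$. As a consequence, if $h^{2\alpha}$ and $h^{1-2\alpha}$ are both Hilbert-Schmidt, the norm of the map $\Phi_\alpha$ is bounded by $\|h^{2\alpha}\|_2 \cdot \|h^{1-2\alpha}\|_2$. Therefore,

$$\|\Phi_\alpha \otimes \Phi_\beta\| \le \|h^{2\alpha}\|_2 \cdot \|h^{1-2\alpha}\|_2 \cdot \|h^{2\beta}\|_2 \cdot \|h^{1-2\beta}\|_2.$$

For example, if the eigenvalue sequence of $h$ is $\{2^{-n}\}_{n \in \mathbb{N}}$, $\Phi_\alpha \otimes \Phi_\beta$ is bounded for every $\alpha, \beta \in (0, 1/2)$. On the other hand, denoting by $F$ the unitary operator in $\mathcal{M} \otimes \mathcal{M}$ that, under the isomorphism with $B(\mathcal{H} \otimes \mathcal{H})$, is defined by $F\,\xi \otimes \eta = \eta \otimes \xi$, one has $F(X \otimes Y) = (Y \otimes X)F$, hence

$$\Phi_\alpha \otimes \Phi_\beta(F) = (h^{2\alpha} \otimes h^{2\beta})\,F\,(h^{1-2\alpha} \otimes h^{1-2\beta}) = (h^{1+\varepsilon} \otimes h^{1-\varepsilon})\,F,$$

where $\varepsilon = 2\alpha - 2\beta$. Thus, if $h^{1-\varepsilon}$ or $h^{1+\varepsilon}$ is not Hilbert-Schmidt, $\Phi_\alpha \otimes \Phi_\beta$ is unbounded. For example, if the eigenvalue sequence of $h$ is $\{\frac{1}{\sqrt{n}\,\log n}\}_{n \in \mathbb{N}}$, then $\Phi_\alpha \otimes \Phi_\beta$ is bounded if and only if $\alpha = \beta$.

Note that, if in Corollary 1.8 the vector $\Omega$ is separating for $\mathcal{M}$, then by modular theory $J\Delta J = \Delta^{-1}$ and $JKJ = -K$, where $J$ is the modular conjugation of $(\mathcal{M}, \Omega)$, thus the inequality $e^{2\beta K} \le 1 + \Delta^{-1}$ holds too, therefore

$$e^{\mp 2\beta K} \le 1 + \Delta^{\pm 1},$$

showing that the β-boundedness condition with constant $C = 1$ sets a bound on the Hamiltonian $K$ by the equilibrium Hamiltonian $\log\Delta$. It is then natural to look for further conditions that entail $K$ to be proportional to $\log\Delta$. Let $\mathfrak{A}$ be a C∗-algebra acting on a Hilbert space $\mathcal{H}$ with a cyclic vector $\Omega \in \mathcal{H}$ and $U(t) = e^{itK}$ an $\Omega$-fixing one-parameter unitary group on $\mathcal{H}$ implementing automorphisms of $\mathfrak{A}$.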
We shall say that the C∗ -algebra A is completely β-bounded with respect to K
and $\Omega$ if $\|\Phi_\beta \otimes \Phi_\beta \otimes \cdots \otimes \Phi_\beta\| \le 1$, for all finitely many tensor products of $\Phi_\beta$ with itself, with $\Phi_\beta : X \in \mathfrak{A} \mapsto e^{-\beta K}X\Omega \in \mathcal{H}$. Here we consider the spatial tensor product norm on $\mathfrak{A} \otimes \mathfrak{A} \otimes \cdots \otimes \mathfrak{A}$ and, of course, the Hilbert space tensor norm on $\mathcal{H} \otimes \mathcal{H} \otimes \cdots \otimes \mathcal{H}$. Note that, by Lemma 1.2, the complete β-boundedness condition can be equivalently formulated in terms of the von Neumann algebra $\mathcal{M} = \mathfrak{A}''$.

Corollary 1.10. Let $\mathcal{M}$ be a von Neumann algebra acting on $\mathcal{H}$ with a cyclic and separating vector $\Omega$ and $U(t) = e^{itK}$ an $\Omega$-fixing one-parameter unitary group implementing automorphisms of $\mathcal{M}$. If $\mathcal{M}$ is completely β-bounded with respect to $K$ and $\Omega$, then $2\beta K = -T\log\Delta$, where $T$ is a positive linear operator, $0 \le T \le 1$, commuting with $\Delta$, $K$ and the modular conjugation $J$ of $(\mathcal{M}, \Omega)$.

Proof. By Corollary 1.8 applied to the n-fold tensor product, we have

$$e^{-2\beta K} \otimes \cdots \otimes e^{-2\beta K} \le 1 + \Delta \otimes \cdots \otimes \Delta.$$

By the modular theory $K$ and $\Delta$ commute, thus a simple application of the Gelfand-Naimark theorem implies $e^{-2n\beta K} \le 1 + \Delta^n$ for all $n$, thus $e^{-2\beta K} \le \sqrt[n]{1 + \Delta^n}$. Taking the limit as $n \to \infty$ we obtain $e^{-2\beta K} \le \max(1, \Delta)$. As $JKJ = -K$ and $J\log\Delta J = -\log\Delta$, the above inequality also entails that $e^{2\beta K} \le \max(1, \Delta^{-1})$. Taking logarithms, as $K$ and $\Delta$ commute, these inequalities imply respectively $-2\beta KE_+ \le \log\Delta\,E_+$ and $-2\beta KE_- \ge \log\Delta\,E_-$, where $E_{+/-}$ are the spectral projections of $\log\Delta$ corresponding to the positive/negative half-line. The projection $E_0$ onto the kernel of $\log\Delta$ clearly commutes with $\Delta$ and $K$, thus denoting by $T$ the closure of $-2\beta K(\log\Delta)^{-1}(1 - E_0)$, $T$ is a bounded positive linear contraction, $0 \le T \le 1$, commuting both with $\Delta$ and $K$. As $JKJ = -K$, $J\log\Delta J = -\log\Delta$ and $JE_0J = E_0$, we also have $JTJ = T$.

We now recall a weak form of the characterization of the KMS property in terms of the Roepstorff–Araki–Sewell auto-correlation lower bound and the Pusz–Woronowicz passivity condition, see [7].

Theorem 1.11 ([1, 33, 32]).
Let $\mathcal{M}$ be a von Neumann algebra, $\Omega$ a cyclic and separating vector and $U(t) = e^{itK}$ an $\Omega$-fixing one-parameter unitary group implementing automorphisms of $\mathcal{M}$. Consider the following properties:
(i) $(KX\Omega, X\Omega) \ge 0$ for all $X \in \mathcal{M}_{sa}$ in the domain of the derivation $[\,\cdot\,, K]$, where $\mathcal{M}_{sa}$ denotes the selfadjoint real subspace of $\mathcal{M}$.
(ii) $K = -\lambda\log\Delta$ for some $\lambda \in [0, +\infty)$.
Then (ii) ⇒ (i). If the group of automorphisms of $\mathcal{M}$ preserving $(\,\cdot\,\Omega, \Omega)$ is ergodic then (i) ⇒ (ii).

Note that case (i) in the above theorem cannot hold with $K$ a non-trivial positive operator (ground state) because then $K$ is affiliated to $\mathcal{M}$ by a theorem of Borchers [6] (see also [26]), thus $K = 0$ because $\Omega$ is separating. We shall need to test the above inequality in (i) for more vectors.
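Lemma 1.12. Let $\mathcal{M}$ be a von Neumann algebra and $\Omega$ a cyclic and separating vector. Then

$$-(\log\Delta\,\xi, \xi) \ge 0 \tag{1.8}$$

for all vectors $\xi \in \mathcal{K} \cap D(\log\Delta)$, where $\mathcal{K}$ is the real Hilbert subspace given by $\mathcal{K} \equiv \overline{\mathcal{M}_{sa}\Omega}$.

Proof. Let $S = J\Delta^{1/2}$ be the Tomita operator and $E_n$ be the spectral projection of $\log\Delta$ corresponding to the interval $(-n, n)$. Then $E_n$ commutes with the (real, unbounded) projection $P = \frac{1}{2}(1 + S)$ onto $\mathcal{K}$ (see Lemma 1.13) because it commutes with $S = J\Delta^{1/2}$: indeed $E_n$ commutes both with $\Delta$ and with $J$ (being a real even function of $\log\Delta$). Denoting by $\mathcal{M}(-n, n)$ the space of elements of $\mathcal{M}$ whose spectrum under the modular group $\sigma_t = \mathrm{Ad}\,\Delta^{it}$ lies in $(-n, n)$ and by $\mathcal{M}_{sa}(-n, n)$ the real subspace of selfadjoint elements of $\mathcal{M}(-n, n)$, we have

$$\mathcal{M}_{sa}(-n, n)\Omega = \tfrac{1}{2}(1 + S)\mathcal{M}(-n, n)\Omega = P\mathcal{M}(-n, n)\Omega. \tag{1.9}$$

Let $\mathcal{H}_S$ denote the Hilbert space $D(S)$ equipped with the S-graph scalar product

$$(\xi, \eta)_S \equiv (\xi, \eta) + (S\xi, S\eta) = (\xi, \eta) + (\Delta^{1/2}\xi, \Delta^{1/2}\eta), \qquad \xi, \eta \in D(S),$$

and notice that any subset $\mathcal{S}$ of $E_n\mathcal{H}$ is contained in $\mathcal{H}_S$ and its closure $\overline{\mathcal{S}}$ in $\mathcal{H}$ coincides with its closure in $\mathcal{H}_S$ (because the restriction of $\Delta^{1/2}$ to $E_n\mathcal{H}$ is bounded). Notice also that, as a linear operator of $\mathcal{H}_S$, $P$ is bounded, indeed $P$ is the (real) orthogonal projection of $\mathcal{H}_S$ onto $\mathcal{K}$. Therefore, on the Hilbert space $\mathcal{H}_S$, we have by Eq. (1.9) that

$$\overline{\mathcal{M}_{sa}(-n, n)\Omega} = \overline{P\mathcal{M}(-n, n)\Omega} = PE_n\mathcal{H}_S = E_nP\mathcal{H}_S = E_n\mathcal{K}.$$

As the inequality (1.8) holds for all $\xi \in \mathcal{M}_{sa}(-n, n)\Omega$, it then holds for all $\xi \in E_n\mathcal{K}$. Given $\xi \in \mathcal{K} \cap D(\log\Delta)$, the sequence of vectors $\xi_n \equiv E_n\xi$ belongs to $\overline{\mathcal{M}_{sa}(-n, n)\Omega} = E_n\mathcal{K}$ and

$$\|\xi_n - \xi\| \to 0, \qquad \|\log\Delta\,\xi_n - \log\Delta\,\xi\| \to 0,$$

therefore $-(\log\Delta\,\xi, \xi) = -\lim_n\,(\log\Delta\,\xi_n, \xi_n) \ge 0$.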
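The inequality (1.8) can be illustrated in a finite-dimensional model. The following sketch rests on assumptions that are not part of the text: $\mathcal{M}$ is taken to be the $n \times n$ matrices acting by left multiplication on the $n \times n$ matrices with the Hilbert-Schmidt scalar product, $\Omega = \rho^{1/2}$ for a faithful density matrix $\rho$, and the modular operator is $\Delta A = \rho A \rho^{-1}$ (the same form as $\Delta X = h^2 X h^{-2}$ in Remark 1.9 with $h = \rho^{1/2}$). The function name is hypothetical.

```python
import numpy as np

def minus_log_delta_expectation(rho, X):
    """Return -(log(Delta) xi, xi) for xi = X rho^{1/2}.

    Assumed matrix model (illustrative only):
      Hilbert space : n x n matrices with (A, B) = Tr(B^* A)
      Omega         : rho^{1/2}, rho a positive definite density matrix
      log(Delta) A  : log(rho) A - A log(rho)
    For self-adjoint X the returned value should be non-negative.
    """
    p, V = np.linalg.eigh(rho)                 # rho assumed positive definite
    log_rho = (V * np.log(p)) @ V.conj().T
    sqrt_rho = (V * np.sqrt(p)) @ V.conj().T
    xi = X @ sqrt_rho                          # the vector X * Omega
    log_delta_xi = log_rho @ xi - xi @ log_rho
    # Hilbert-Schmidt pairing of log(Delta) xi with xi; real because
    # log(Delta) is self-adjoint for this scalar product.
    return -np.trace(xi.conj().T @ log_delta_xi).real


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    rho = A @ A.conj().T + 0.1 * np.eye(4)
    rho = rho / np.trace(rho).real
    B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    X = (B + B.conj().T) / 2                   # self-adjoint element
    print(minus_log_delta_expectation(rho, X))
```

In an eigenbasis of $\rho$ with eigenvalues $p_i$ the returned quantity equals $\frac{1}{2}\sum_{i,j}|X_{ij}|^2 (p_i - p_j)(\log p_i - \log p_j)$, which is non-negative term by term because the logarithm is increasing.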
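Lemma 1.13. Let $\mathcal{M}$ be a von Neumann algebra with a cyclic separating vector $\Omega$ and let $S$ be the associated Tomita operator, i.e. the closure of $X\Omega \mapsto X^*\Omega$, $X \in \mathcal{M}$. Then $\overline{\mathcal{M}_{sa}\Omega} = \{\xi \in D(S) : S\xi = \xi\}$.

Proof. As is well known, $S^* = F$, where $F$ is the Tomita operator associated with $\mathcal{M}'$ and $\Omega$. With $\mathcal{K} \equiv \overline{\mathcal{M}_{sa}\Omega}$, let $\tilde{S}$ be the operator given by $\tilde{S} : \xi + i\eta \mapsto \xi - i\eta$, $\xi, \eta \in \mathcal{K}$, and define analogously $\tilde{F}$ with respect to $\mathcal{M}'_{sa}$. Then $\tilde{S} \supset S$ and $\tilde{F} \supset F$ and $\tilde{S}^* \supset \tilde{F}$, thus $\tilde{S} = S$ and $\tilde{F} = F$.

For completeness we give a generalization of the inequality (1.8) that, at the same time, gives a direct proof of it.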
D. Guido, R. Longo
Proposition 1.14. Let H be a complex Hilbert space and 𝒦 a standard real Hilbert subspace, namely 𝒦 ∩ i𝒦 = {0} and 𝒦 + i𝒦 is dense in H. Then

  −(log Δ ξ, ξ) ≥ 0   (1.10)

for all vectors ξ ∈ 𝒦 ∩ D(log Δ), where Δ is the modular operator on H associated with 𝒦.

Proof. Since the kernel of log Δ is invariant under J and Δ, one can decompose 𝒦 as a direct sum of two components, one corresponding to the kernel of log Δ, and one to its orthogonal complement, thus the inequality (1.10) can be proved for each component separately. Since the inequality is obviously satisfied on the kernel of log Δ, we may just suppose the kernel of log Δ to be trivial. Now we give an explicit description of the vectors of 𝒦 (cf. [14]) which allows an immediate verification of the inequality (1.10). Let us choose a selfadjoint antiunitary C commuting with J and Δ, and set U = JC, so that U log Δ U = − log Δ. Then denote by L the real vector space of C-invariant vectors in the spectral subspace {log Δ > 0} and by ψ± the maps

  ψ+ : y ∈ L → U cos Θ y + sin Θ y,
  ψ− : y ∈ L → iU cos Θ y − i sin Θ y,

where the operator Θ is defined by |log Δ| = −2 log tan Θ/2, σ(Θ) ⊆ [0, π/2]. Since U maps the spectral space {log Δ > 0} onto the spectral space {log Δ < 0}, both ψ+ and ψ− are isometries, and a simple calculation shows that their ranges are real-orthogonal. Moreover, decomposing H as {log Δ < 0} ⊕ {log Δ > 0}, one can show that any solution of the equation Sx = x can be written as a sum ψ+(y) + ψ−(z), namely the map ψ− + ψ+ : L ⊕_R L → 𝒦 is an isometric isomorphism of real Hilbert spaces. Moreover, for any y, z ∈ L, (ψ∓(y), log Δ ψ±(z)) is purely imaginary, so the cross terms cancel and

  ((ψ−(y) + ψ+(z)), log Δ (ψ−(y) + ψ+(z))) = (ψ−(y), log Δ ψ−(y)) + (ψ+(z), log Δ ψ+(z)),   (1.11)

namely the inequality should be checked on ψ+(L) and ψ−(L) separately. Finally,

  (ψ−(y), log Δ ψ−(y)) = −(y, cos Θ log Δ y) ≤ 0,

since cos Θ and log Δ are commuting positive operators on L, and the same holds on the range of ψ+.

Theorem 1.15. Let M be a von Neumann algebra with a cyclic and separating vector Ω and U(t) = e^{itK} a Ω-fixing one-parameter unitary group implementing automorphisms of M. If M is completely β-bounded with respect to K and Ω, then either K = 0 or Ad U(t) satisfies the KMS condition at some inverse temperature β0 ≥ β; indeed β0 is the greatest β > 0 such that M is completely β-bounded.

Proof. Set 2βK = T log Δ as in Cor. 1.10. Note first that 𝒦 = [M_sa Ω]⁻ (the closure of M_sa Ω) is equal to {ξ ∈ H : Sξ = ξ}, where S = JΔ^{1/2} is the Tomita operator. As T commutes both with Δ and J, the same is true for T^{1/2} and thus T^{1/2} commutes with S. It follows that

  T^{1/2} 𝒦 ⊂ 𝒦.
Energy Bounds in Quantum Thermodynamics
We then have for all ξ ∈ 𝒦,

  2β(Kξ, ξ) = (T log Δ ξ, ξ) = (log Δ T^{1/2}ξ, T^{1/2}ξ) ≤ 0.

Clearly the same is true if we replace the von Neumann algebra M by ⊗_Z M (infinite tensor product with respect to the constant sequence of vectors Ω_n ≡ Ω), the vector Ω by ⊗_Z Ω_n and U(t) by ⊗_Z U(t). The permutation shift then acts in a strong cluster fashion on the latter system. By the Pusz–Woronowicz theorem 1.11 either K = 0 or this latter system satisfies the KMS condition at some inverse temperature β > 0. Clearly the KMS condition then holds true also for the original system. To prove the last assertion we have to show that, given β > 1, M is not completely β-bounded with respect to − log Δ and Ω. Indeed, if this were not the case, we would have Δ^β ≤ max(1, Δ), by Theorem 1.3, which is not possible if β > 1 unless Δ = 1, in which case also K = 0.

Let A be a C∗-algebra, α a one-parameter automorphism group and ω an α-invariant state. At this point it is natural to say that ω is completely β-holomorphic if the state ω ⊗ · · · ⊗ ω of the n-fold (spatial) tensor product A ⊗ · · · ⊗ A is β-holomorphic with constant C = 1 for all n ∈ N. We then have:

Theorem 1.16. Let A be a C∗-algebra, α a non-trivial one-parameter automorphism group and ω an α-invariant state. The following are equivalent:
(i) ω is completely β-holomorphic;
(ii) ω satisfies the KMS condition at inverse temperature β_max = sup{β > 0 : ω is completely β-holomorphic} (β_max = +∞ means that ω is a ground state).

Proof. (ii) ⇒ (i). If ω satisfies the KMS condition at inverse temperature β > 0 then it is β-holomorphic with constant C = 1. It is also immediate by the inequality (1.2) that ω is not completely β′-holomorphic if β′ > β. For the same reason it is not completely β′-holomorphic if β′ > β. If ω is a ground state, then ω is obviously completely β-holomorphic for all β > 0.

(i) ⇒ (ii). By considering the GNS representation of ω, it is sufficient to show that Theorem 1.15 holds true without assuming that Ω is separating, but allowing ω to be a ground state. Indeed let E ∈ M be the projection onto H0 = [M′Ω]⁻, the closure of M′Ω. Clearly EME is β-bounded on H0 with respect to K|_{H0} and Ω. Hence Ad U(t)|_{H0} implements the rescaled modular group of (EME, Ω). Thus β′KE = − log Δ, for some β′ ≥ β, where Δ is the modular operator of EME acting on H0. Moreover, by Corollary 1.8, we have e^{−βK} ≤ 1 + ΔE. In particular e^{−βK}(1 − E) ≤ (1 − E), thus K(1 − E) ≥ 0. Thus β′K = − log Δ E + L, where L ≡ β′K(1 − E) is positive. By the completely β-holomorphic assumption, the same relation holds true by replacing the system with its tensor product by itself. This is possible only in two cases: either log Δ = 0, thus K > 0 and ω is a ground state, or L = 0, namely Ω is cyclic and separating and β′K = − log Δ.
1.2.1. Appendix. The domain of the analytic generator. We collect here a few properties for selfadjoint operators needed in the text.

Lemma 1.17. Let K be a selfadjoint operator on the Hilbert space H, β > 0, and ξ a vector in H. The following are equivalent:
(i) ξ ∈ D(e^{−βK/2});
(ii) The function t ∈ R → (e^{itK}ξ, ξ) extends to a function continuous in S̄_β and holomorphic in S_β;
(iii) For all η ∈ D(e^{−βK/2}), the function t ∈ R → (e^{itK}ξ, η) extends to a function in A(S_β).
In this case

  anal.cont._{t→iβ} (e^{itK}ξ, ξ) = ||e^{−βK/2}ξ||².

Proof. Let E(λ) be the family of projections associated with K by the spectral theorem, namely K = ∫ λ dE(λ). Then ξ ∈ D(e^{−βK/2}) if and only if ∫ e^{−λβ} d||E(λ)ξ||² < ∞, i.e. e^{−λβ} ∈ L¹(µ), where µ(V) = ∫_V d(E(λ)ξ, ξ) is the finite Borel spectral measure associated with ξ. By the next sublemma (with a change of sign of β) this holds iff t → µ̂(−t) ≡ ∫ e^{itλ} dµ(λ) is the boundary value of a function holomorphic in S_β and continuous in S̄_β, therefore (i) ⇔ (ii). If (ii) holds, then t → µ̂(−t) extends to a function in A(S_β), namely µ̂(−t) is bounded in the strip S_β, because

  |∫ e^{izλ} dµ(λ)| ≤ ∫ e^{−(Im z)λ} dµ(λ) ≤ µ([0, ∞)) + ∫ e^{−βλ} dµ(λ) for all z ∈ S̄_β.

If ξ, η ∈ D(e^{−βK/2}), the function

  f(z) ≡ (e^{itK} e^{−sK}ξ, e^{−sK}η), z = t + is ∈ S̄_β,

can be checked to belong to A(S_β) by standard methods, thus (ii) ⇔ (iii). The last assertion follows by the spectral theorem.

Sublemma 1.18. Let µ be a finite Borel measure on R, µ̂ its Fourier transform and β ∈ R. The function t ∈ R → e^{βt} belongs to L¹(µ) if and only if µ̂ is the boundary value of a function holomorphic in S_β and continuous in S̄_β. This function is automatically bounded, namely it belongs to A(S_β).

Proof. If λ ∈ R → e^{βλ} belongs to L¹(µ), then also λ ∈ R → e^{−izλ} belongs to L¹(µ) for all 0 ≤ Im z ≤ β and µ̂(z) = ∫ e^{−izλ} dµ(λ) defines a function in the strip S̄_β, that can be easily seen to belong to A(S_β). Conversely suppose µ̂ to be the boundary value of a function holomorphic in S_β and continuous in S̄_β. Decompose µ as µ+ + µ−, where the first term is supported in the positive axis, the second in the negative axis. We have e^{βt} ∈ L¹(R, µ−) for any positive β, hence µ̂− is holomorphic in the upper half plane. Therefore we may restrict to the case where µ is supported in the positive axis. In this case µ̂ is holomorphic in the lower half plane and in the strip S_β, and is continuous on the real line both from above and from below, therefore µ̂ extends to a holomorphic function on {Im z < β}, continuous on the boundary. Set ϕ(x) = µ̂(ix). Then ϕ is
analytic in x < β, continuous on x ≤ β, and for x ≤ 0 is given by ϕ(x) = ∫ e^{xλ} dµ(λ). If x < 0, the dominated convergence theorem entails ϕ^{(n)}(x) = ∫ λ^n e^{xλ} dµ(λ), hence by monotone convergence we obtain ϕ^{(n)}(0) = ∫ λ^n dµ(λ). The analyticity implies that for 0 ≤ x < β,

  ϕ(x) = Σ_{n=0}^{∞} (x^n / n!) ∫ λ^n dµ(λ) = ∫ e^{xλ} dµ(λ),

where the last equality follows by monotone convergence. Again by monotone convergence and the continuity of ϕ on the boundary we get ϕ(β) = ∫ e^{βλ} dµ(λ), namely e^{βt} ∈ L¹(µ).

We shall also need the following proposition.

Proposition 1.19. Let U(t) = e^{itK} be a one-parameter unitary group on a Hilbert space H and ϕ : R → C a locally bounded Borel function. If D ⊂ D(ϕ(K)) is a dense, U-invariant linear space, then D is a core for ϕ(K).

Proof. By replacing ϕ with |ϕ|, we may assume that ϕ is non-negative. Let ξ ∈ H be a vector orthogonal to (ϕ(K) + 1)D. We have to show that ξ = 0. If f is a function in the Schwartz space S(R), we have

  ((ϕ(K) + 1)f(K)η, ξ) = ∫ f̃(t)((ϕ(K) + 1)e^{−itK}η, ξ) dt = 0,   (1.12)

for all η ∈ D, where f̃ denotes the Fourier anti-transform of f. If f is a bounded Borel function with compact support, we may choose a sequence of smooth functions f_n with compact support such that f_n(K) → f(K) weakly, thus the first term in Eq. (1.12) vanishes for such an f. If g is a bounded Borel function with compact support, we may write g(λ) = (ϕ(λ) + 1)(ϕ(λ) + 1)^{−1}g(λ), therefore (g(K)η, ξ) = 0. We can then choose a sequence g_n of such functions such that g_n(K) → 1 strongly. It follows that (η, ξ) = 0 for all η ∈ D, hence ξ = 0 because D is dense.

1.3. Case where further symmetries are present. We now examine the case where the one-parameter unitary group α considered before extends to a unitary representation of the “ax + b” group, where positive translations implement endomorphisms of the algebra.

Proposition 1.20. Let M be a von Neumann algebra on the Hilbert space H, Ω a cyclic vector for M and U a Ω-fixing one-parameter unitary group on H, with generator K, implementing automorphisms of M. Assume furthermore that there is a one-parameter unitary group T on H such that

  T(a)MT(−a) ⊂ M, ∀a ≥ 0

and satisfying the commutation relations

  U(t)T(a)U(−t) = T(e^t a), a, t ∈ R.   (1.13)

If there is a dense ∗-subalgebra B of M such that BΩ ⊂ D(e^{−πK}) and either
– U(t)BU(−t) = B and T(a)BT(−a) ⊂ B, t ∈ R, a ∈ R+, or
– B is π-bounded with respect to K and Ω,
then the generator H of T is positive.

Proof. Note first that, as a consequence of the commutation relations (1.13), we have T(a)Ω = Ω for all a ∈ R [17]. By the criterion given in Proposition 1.25 below, it will suffice to construct a core D for e^{−πK} such that T(a)D ⊂ D for all a ≥ 0. The set D ≡ BΩ is contained in the domain of e^{−πK}, and clearly

  T(a)D = T(a)BT(−a)Ω ⊂ BΩ = D, a ≥ 0.

If moreover B is AdU-invariant, then

  U(t)D = U(t)BU(−t)Ω = BΩ = D, t ∈ R.

We may thus apply Lemma 1.24 below to conclude that D is a core for e^{−πK}. On the other hand, if B is π-bounded, then also M is π-bounded by Lemma 1.2, thus we are in the previous case as M is AdU-invariant.

Corollary 1.21. In the previous Prop. 1.20, suppose that B is π-bounded and further that Ω is separating for M and CΩ are the only T-invariant vectors. Then ω ≡ (· Ω, Ω) is a KMS state for α = AdU at inverse temperature β = 2π. Namely the modular operator associated with (M, Ω) is Δ = e^{−2πK}.

Proof. Since U implements automorphisms of M, it commutes with the modular operator Δ associated with (M, Ω), thus U(2πt)Δ^{it} is a one-parameter group of unitaries. We denote by L its self-adjoint generator. By the modular theory U also commutes with the modular conjugation J associated with (M, Ω), thus Je^{πK}J = e^{−πK}. We then have, for all X ∈ M_1,

  ||e^{−πL}XΩ|| = ||e^{πK}Δ^{1/2}XΩ|| = ||e^{πK}JX∗Ω|| = ||Je^{−πK}X∗Ω|| ≤ C.   (1.14)

We shall now show that the above bound holds for all X in the unit ball of the ∗-algebra C ≡ ∪_a T(a)MT(−a). By Borchers theorem [5], Δ^{it} has the same commutation relations (1.13) as U(t) with T(a), thus e^{itL} commutes with T(a). Therefore if X ∈ M_1,

  ||e^{−πL}T(a)XT(−a)Ω|| = ||e^{−πL}T(a)XΩ|| = ||T(a)e^{−πL}XΩ|| = ||e^{−πL}XΩ|| ≤ C,   (1.15)

namely the π-boundedness property with respect to L holds for C. Now, by the following Lemma 1.23, C is irreducible on H, thus L is semi-bounded by Cor. 1.3. As JLJ = −L, L is indeed a bounded operator. We now follow an argument in [10]. By the Kadison–Sakai derivation theorem (cf. [38]), there exists a selfadjoint element h ∈ M, indeed a minimal positive one, such that

  e^{ith}Xe^{−ith} = e^{itL}Xe^{−itL}, X ∈ M,

and indeed Δ^{it}hΔ^{−it} = h by the canonicity of the minimal positive choice for h. Therefore Δ^{it}hΩ = hΩ, t ∈ R, and this implies T(a)hΩ = hΩ, a ∈ R [17], thus hΩ ∈ CΩ by the uniqueness of the T-invariant vector. As Ω is separating, h ∈ R+, thus h = 0 as h is minimal. It follows that Δ^{it} = U(−2πt) for all t ∈ R.

Note that, by an argument of Driessler, see [26], the von Neumann algebra M in the above corollary is a III_1-factor, unless dim H ≤ 1.
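The commutation relations (1.13) used throughout this subsection already restrict the possible spectrum of the generator of T, which is why only its sign is at stake in Proposition 1.20. A short derivation, writing T(a) = e^{iaH} with H the generator as in Proposition 1.20; the conclusion is a standard consequence of (1.13) and is not stated in this form in the text:

```latex
\begin{align*}
U(t)\,e^{iaH}\,U(-t) = e^{i e^{t} a H}\quad (a,t\in\mathbb{R})
 \;&\Longrightarrow\; U(t)\,H\,U(-t) = e^{t} H \\
 \;&\Longrightarrow\; \operatorname{sp}(H) = e^{t}\,\operatorname{sp}(H)
 \quad\text{for all } t\in\mathbb{R}.
\end{align*}
```

Since sp(H) is closed and invariant under every positive dilation, it can only be {0}, [0, ∞), (−∞, 0] or R, and proving positivity of H amounts to excluding the last two possibilities.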
1.3.1. Appendix. Spectral and irreducibility properties. We begin by recalling a simple lemma.

Lemma 1.22. [12]. Let C be a ∗-algebra on a Hilbert space H with cyclic vector Ω and E be the one-dimensional projection onto CΩ. The ∗-algebra generated by C and E is irreducible.

Proof. Let X ∈ B(H) commute with C and E. Then X ∈ C′ and XΩ = XEΩ = EXΩ = λΩ for some λ ∈ C. As Ω is separating for C′, then X = λ and this entails the thesis.

Lemma 1.23. [26] Let C be a ∗-algebra on a Hilbert space H, T an n-parameter unitary group, n ≥ 1, such that T(x)CT(−x) = C for x ∈ R^n. If the spectrum of T is asymmetric, namely sp(T) ∩ −sp(T) = {0}, and Ω is a vector which is cyclic for C and unique T-invariant, then C is irreducible.

Proof. Let M be the weak closure of C. Clearly T(x)MT(−x) = M, hence, by a theorem of Borchers [6], see also [15], T(x) ∈ M. By the mean ergodic theorem, the one-dimensional projection E onto CΩ belongs to the von Neumann algebra generated by {T(x), x ∈ R^n}, hence to M. Then M = B(H) by Lemma 1.22.

Lemma 1.24. Let B be a ∗-algebra, Ω a cyclic vector for B and U(t) = e^{itK} a one-parameter, Ω-fixing unitary group implementing automorphisms of B. If BΩ ⊂ D(e^{−βK}) for some β > 0, then BΩ is a core for e^{−βK}.

Proof. Set D = BΩ and apply Proposition 1.19.

We now recall the criterion for the positivity of the energy discussed in [3].

Proposition 1.25. [3]. Let U and T be one-parameter unitary groups on a Hilbert space H satisfying the commutation relations (1.13). The following are equivalent:
(i) the generator of T is positive;
(ii) there exists a core D for e^{−πK} such that T(a)D ⊂ D for some (hence for all) a > 0, where K is the generator of U.

2. Minimality of the Hawking Temperature

We now apply our results in the Quantum Field Theory context.
2.1. A characterization of the Bisognano–Wichmann property. In the following we shall consider a Poincaré covariant net of von Neumann algebras on the Minkowski spacetime in the vacuum representation, indeed an inclusion preserving map S → A(S) associating a von Neumann algebra acting on a given Hilbert space H with each spacelike cone S in the Minkowski spacetime Rd+1 , d ≥ 1, satisfying the following properties:
• There exists a representation U of the Poincaré group P_+^↑ such that

  U(g)A(S)U(g)∗ = A(gS), g ∈ P_+^↑.

• There exists a unique unit vector Ω which is invariant under the action of the Poincaré group. Ω is cyclic and separating for the von Neumann algebras associated with wedge regions.

Here a spacelike cone S is the cone generated by a double cone and a point in the interior of its spacelike complement. Note that we assume neither that A is local nor that the positivity of the energy holds. This last property will indeed follow from our boundedness condition. With each wedge region W we associate the one-parameter group Λ_W of Lorentz boosts preserving W. In this way, denoting by Δ_W the modular operator associated with the von Neumann algebra A(W) and Ω, the Bisognano–Wichmann relations [4] take the form

  Δ_W^{it} = U(Λ_W(−2πt)),   (2.1)
where W is a wedge. Clearly, by Poincaré covariance, Eq. (2.1) holds for all wedges if it holds for a particular one. If W is a wedge we denote by K_W the generator of the one-parameter unitary group U(Λ_W(t)). We now need the following geometric observation, whose proof is straightforward:

Lemma 2.1. If S is an open convex cone of R^n, the set of its translates {S + x : x ∈ R^n} is directed with respect to inclusion.

Theorem 2.2. Let W be a wedge, S a spacelike cone contained in W and B a weakly dense ∗-subalgebra of A(S). The following are equivalent:
(i) The Bisognano–Wichmann relation Δ_W = e^{−2πK_W} holds.
(ii) e^{−πK_W}B_1Ω is bounded and the energy-momentum spectrum lies in the forward light cone V̄_+.
(iii) e^{−πK_W}A(W)_1Ω is bounded.
(iv) ||e^{−πK_W}A(W)_1Ω|| ≤ 1.
If moreover the boundary of S intersects the edge of W in a half-line, then in (ii) the π-boundedness is sufficient, namely it implies the spectrum condition.

Proof. In this proof we drop the subscript W on the operators associated with W.

(i) ⇒ (iv): As Δ^{1/2} = e^{−πK}, then ||e^{−πK}XΩ|| = ||Δ^{1/2}XΩ|| = ||JX∗Ω|| for all X ∈ A(W), which immediately implies (iv).

(iv) ⇒ (iii) is obvious.

(iii) ⇒ (ii): We only need to show the spectrum condition. By Poincaré covariance it is sufficient to show the positivity of the generator of a one-parameter group T of light-like translations associated with W; this satisfies T(a)A(W)T(−a) ⊂ A(W), a ≥ 0, and the commutation relations (1.13) with U(t) ≡ e^{itK}. Thus the positivity property follows by the criterion in Proposition 1.25.

(ii) ⇒ (i): Since U(Λ(t)) implements automorphisms of A(W), it commutes with Δ^{is}, i.e. U(Λ(2πt))Δ^{it} is a one-parameter group of unitaries. Denote by L its self-adjoint generator and note that, since by the Tomita–Takesaki theorem e^{itL} commutes with the modular conjugation J of (A(W), Ω), we have for all X ∈ A(S)_1,

  ||e^{−L/2}XΩ|| = ||e^{−πK}Δ^{1/2}XΩ|| = ||e^{−πK}X∗Ω|| ≤ C,

namely A(S) is ½-bounded with respect to L and Ω. Moreover, by Borchers commutation relations [5] and by the Tomita–Takesaki theorem, e^{itL} commutes with all translations. Therefore, denoting by T the translation unitary group,

  ||e^{−L/2}A(S + x)_1Ω|| = ||e^{−L/2}T(x)A(S)_1Ω|| = ||e^{−L/2}A(S)_1Ω|| ≤ C,

namely ||e^{−L/2}XΩ|| ≤ C for all X ∈ ∪_x A(S + x) with ||X|| ≤ 1. By Lemma 2.1 ∪_x A(S + x) is a ∗-subalgebra of B(H), which is also translation invariant. By Lemma 1.23 ∪_x A(S + x) is thus irreducible, thus L is semi-bounded by Lemma 1.3. The rest now follows as in the proof of Corollary 1.21.

It remains to show the last assertion. Let us then assume that S intersects the edge of W in a half-line. By Lemma 1.2 the boundedness of e^{−πK}B_1Ω implies that also e^{−πK}A(S)_1Ω is bounded. Let T(s) be the one-parameter unitary group of translations along the edge of W. Clearly T(s) commutes with K, therefore if ||e^{−πK}A(S)_1Ω|| ≤ C, then

  ||e^{−πK}A(S + s)_1Ω|| = ||e^{−πK}T(s)A(S)_1Ω|| = ||T(s)e^{−πK}A(S)_1Ω|| = ||e^{−πK}A(S)_1Ω|| ≤ C,   (2.2)

namely the π-boundedness condition holds for ∪_s A(S + s). As ∪_s A(S + s) is a dense ∗-subalgebra of A(W), by Lemma 1.2 ||e^{−πK}A(W)_1Ω|| ≤ C, thus we obtain all the properties in the statement by the above proof.

Let now M be a von Neumann algebra on a Hilbert space H and Ω a cyclic and separating vector for M. For a vector ξ ∈ H we set ϕ_ξ ≡ (· Ω, ξ)|_M. We shall say that a set Q ⊂ H is L¹-metrically nuclear with respect to M if the set of linear functionals {ϕ_ξ ∈ M_∗ : ξ ∈ Q} is a metrically nuclear subset of M_∗. By using a characterization of the split property of Fidaleo [13], we obtain the following.

Corollary 2.3. Let W be a wedge region and S a spacelike cone contained in W. Consider the following properties:
(i) The Bisognano–Wichmann property Δ = e^{−2πK_W} for W.
(ii) The split property for A(S) ⊂ A(W).
(iii) The set e^{−λK_W}A(S)_1Ω is compact for every 0 < λ < π, L¹-metrically nuclear with respect to A(W) for λ = 1/2, and the diameter of e^{−λK_W}A(S)_1Ω is uniformly bounded for 0 < λ < π.
Then (i) & (ii) ⇔ (iii).
Proof. (i)&(ii) ⇒ (iii): By the split property the set Δ^λ A(S)_1Ω is compact for 0 < λ < 1/2 [8] and metrically nuclear for λ = 1/4 [13]. By the KMS property ||Δ^λ XΩ|| ≤ 1 for all X ∈ A(W)_1 and 0 ≤ λ ≤ 1/2 (see [8]); we omit the suffix W on Δ and K.

(iii) ⇒ (i)&(ii): Assume first that the underlying Hilbert space H is separable. Let X ∈ A(S)_1 and choose a sequence {ξ_n} norm dense in H_1; the function F_n(z) ≡ (e^{izK}XΩ, ξ_n) is bounded and holomorphic in the open strip S_π. Indeed, by the uniform boundedness assumption, |F_n(z)| ≤ C for some constant C > 0 independent of X and ξ_n. As F_n ∈ H^∞(S_π) the limit lim_{λ→π} F_n(t + iλ) exists except for t in a set E_n ⊂ R of Lebesgue measure zero. Choose t_0 ∉ ∪_n E_n; as ||e^{i(t_0+iλ)K}XΩ|| ≤ C, the weak limit lim_{λ→π} e^{i(t_0+iλ)K}XΩ exists, thus also the weak limit lim_{λ→π} e^{−λK}XΩ exists. By the spectral theorem, this implies XΩ ∈ D(e^{−πK}) and ||e^{−πK}XΩ|| ≤ C, namely A(S) is π-bounded with respect to K and Ω, and this entails (i) by the previous theorem. If H is non-separable, it is sufficient to apply the above argument to the separable Hilbert subspace generated by f(K)XΩ as f varies in the complex continuous functions on R vanishing at infinity. Then (ii) follows because the L¹-metrical nuclearity of Δ^{1/2}A(S)_1Ω implies the split property for A(S) ⊂ A(W) [13].

Remark 2.4. If the net A is local, then the modular conjugation J_W for wedges has a geometric action too [16] if Theorem 2.2 holds.
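The bound ||Δ^λ XΩ|| ≤ 1 attributed to the KMS property in the proof of Corollary 2.3 can be recovered in two elementary steps. The sketch below uses only the relation Δ^{1/2}XΩ = JX∗Ω and the spectral theorem; the symbol µ, introduced here for illustration, denotes the spectral measure of Δ in the vector ξ = XΩ, and Ω is taken to be a unit vector.

```latex
\begin{align*}
\|\Delta^{1/2}X\Omega\| &= \|JX^{*}\Omega\| = \|X^{*}\Omega\| \le \|X\| ,\\[2pt]
\|\Delta^{\lambda}\xi\|^{2} = \int s^{2\lambda}\,d\mu(s)
 &\le \Big(\int d\mu\Big)^{1-2\lambda}\Big(\int s\,d\mu(s)\Big)^{2\lambda}
 = \|\xi\|^{2(1-2\lambda)}\,\|\Delta^{1/2}\xi\|^{4\lambda},
 \qquad 0\le\lambda\le\tfrac12 .
\end{align*}
```

The second line is Hölder's inequality with exponents 1/(1 − 2λ) and 1/(2λ). For ||X|| ≤ 1 both factors on the right are at most 1, which gives the stated bound and, at λ = 1/2, the implication (i) ⇒ (iv) of Theorem 2.2.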
2.2. One-dimensional nets. It is convenient to give explicitly a version of the above results in the context of nets of von Neumann algebras on the real line. Let I denote the set of bounded open non-empty intervals of R. We shall consider a net of von Neumann algebras on R, namely an inclusion preserving map I ∈ I → C(I) from I to the von Neumann algebras on a given Hilbert space H. If E ⊂ R, we denote by C(E) the C∗-algebra generated by {C(I) : I ∈ I, I ⊂ E} and by C(E) the weak closure of C(E). We shall further assume that there exist two one-parameter unitary groups T and U implementing translations and dilations, namely for any I ∈ I, a, t ∈ R,

  T(a)C(I)T(a)∗ = C(I + a),   (2.3)
  U(t)C(I)U(t)∗ = C(e^t I),   (2.4)

satisfying the commutation relations (1.13) and leaving invariant a unique unit vector Ω (the vacuum), which is cyclic and separating for C(0, ∞). We denote by K the infinitesimal generator of U.

Proposition 2.5. The following are equivalent:
(i) The Bisognano–Wichmann relation Δ = e^{−2πK} holds, where Δ is the modular operator associated with (C(0, ∞), Ω).
(ii) There exists a > 0 and a dense ∗-subalgebra B of C(0, a) such that e^{−πK}B_1Ω is bounded.
(iii) There exists a > 0 and a dense ∗-subalgebra B of C(a, ∞) such that e^{−πK}B_1Ω is bounded.
(iv) There exists I ∈ I and a dense ∗-subalgebra B of C(I) such that e^{−πK}B_1Ω is bounded and the generator of T is a positive operator.
(v) e^{−πK}C(0, ∞)_1Ω is a bounded set.

Proof. The proof is similar to the one of Theorem 2.2. We only notice that in this case condition (ii) refers to a bounded interval (0, a). This is possible because ||e^{−πK}B_1Ω|| ≤ C ⇒ ||e^{−πK}C(0, a)_1Ω|| ≤ C and, by scaling the interval, this gives

  ||e^{−πK}C(0, e^t a)_1Ω|| = ||e^{itK}e^{−πK}C(0, a)_1Ω|| = ||e^{−πK}C(0, a)_1Ω|| ≤ C

for any t, thus the boundedness condition holds for C(0, ∞).

Remark 2.6. If locality is further assumed in Prop. 2.5, then the net C extends to a conformal net on S¹ [18].

2.3. Globally hyperbolic spacetimes. We shall now discuss our results for a class of stationary black hole spacetimes, namely globally hyperbolic spacetimes with bifurcate Killing horizon. As we shall see, for nets with a boundedness property the KMS property is equivalent to the existence of a translation symmetry for the net on the horizon.

Let V be a (d + 1)-dimensional globally hyperbolic spacetime with a bifurcate Killing horizon. An example is given by the Schwarzschild–Kruskal manifold. We denote by h+ and h− the two codimension 1 submanifolds that constitute the horizon h = h+ ∪ h−. We denote by L and R the “left and right wedges”. Let κ = κ(V) be the surface gravity, namely, denoting by χ the Killing vector field, the equation ∇g(χ, χ) = −2κχ on h, with g the metric tensor, defines a function κ on h, that is actually constant on h [25]. If V is the Schwarzschild–Kruskal manifold, then κ(V) = 1/(4m), where m is the mass of the black hole. In this case R is the exterior of the Schwarzschild black hole. In what follows R is the actual spacetime and V is to be regarded as a completion of R.

Let A(O) be the von Neumann algebra on a Hilbert space H of the observables localized in the bounded diamond O ⊂ R. R ⊂ V is a Λ-invariant region and we assume that the Killing flow Λ_t of V gives rise to a one-parameter unitary group U(t) = e^{iKt} implementing automorphisms α_t of the quasi-local C∗-algebra A(R) such that α_t(A(O)) = A(Λ_t(O)). We now consider a locally normal α-invariant state ϕ on A(R). The net A is assumed to be already in the GNS representation of ϕ, hence ϕ is represented by a cyclic vector ξ. Let us denote by R_a the wedge R “shifted by” a ∈ R along, say, h+ (see [19]). If I = (a, b) is a bounded interval contained in R+, we set

  C(I) = A(R_a) ∩ A(R_b)′, 0 < a < b.
One obtains in this way a net of von Neumann algebras localized on the horizon parametrized by the intervals of (0, ∞), where the Killing automorphism group α acts covariantly by dilations, cf. Eq. (2.4). We denote by C(0, ∞) the C ∗ -algebra generated by all C(a, b), b > a > 0. We shall say that a one-parameter unitary group T implements translations on the horizon if Eqs. (2.3) and (1.13) are satisfied.
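For orientation, the Schwarzschild–Kruskal value κ = 1/(4m) quoted above can be combined with the standard relation T_H = κ/2π between surface gravity and Hawking temperature. Geometric units G = c = ħ = k_B = 1 are assumed here, and the SI expression is the usual textbook form rather than something taken from the text:

```latex
\[
\beta_0 = \frac{2\pi}{\kappa} = 8\pi m ,
\qquad
T_H = \frac{1}{\beta_0} = \frac{1}{8\pi m}
\quad\Big(\text{in SI units: } T_H = \frac{\hbar c^{3}}{8\pi G M k_B}\Big).
\]
```

Heavier black holes therefore have a larger β_0, that is, a lower Hawking temperature.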
Corollary 2.7. Assume that C(0, a) is β/2-bounded w.r.t. K and ξ for some β ≥ β0 and a > 0, where β0 = 2π/κ is the inverse of the Hawking temperature. The following are equivalent:
(i) ϕ|_{C(0,∞)} is a KMS state at Hawking inverse temperature β0.
(ii) There exists a one-parameter unitary group T implementing translations on the horizon.
In this case the generator of T is positive and the net extends to a conformal net on the line.

Proof. (i) ⇒ (ii). The inclusion R_1 ⊂ R_0 is half-sided modular, therefore the translations (with positive generator) can be constructed as in [42], cf. also Proposition 4A.2 in [19].

(ii) ⇒ (i). First we extend the net C to all intervals in R setting C(a, b) = T(a)∗C(0, b − a)T(a), a < 0. Clearly T and U act as translations and dilations on the net, therefore Proposition 2.5 applies, hence the generator of T is positive. Conformal invariance then follows by [43].

3. Conclusion

We have seen that states with the β-boundedness conditions are indeed thermal equilibrium states in certain Quantum Field Theory contexts. In general the β-boundedness condition selects particular non-equilibrium steady states, whose meaning is not completely clear. One may be tempted to use the β-boundedness condition to define the “local temperature” of an observable X as the inverse of sup{β > 0 : XΩ ∈ D(e^{−βK/2})} with the notations in the text, cf. [11]. This suggests the following physical interpretation. If a thermodynamical system Σ sits in the background of the black hole and interacts with heat reservoirs at temperature less than the Hawking temperature, then the black hole is a predominant heat bath for Σ and the state is a thermal equilibrium state at the black hole background temperature. In particular, the Hawking temperature is minimal and one cannot cool the system Σ down by letting it interact with an infinite reservoir at lower temperature. However, the above discussion relies on the assumption that only β-holomorphic states enter in the game. The validity of the above picture relies on a clarification of the role of the β-holomorphic states, namely how large their class is and how close they are to equilibrium states.

Acknowledgements. We would like to thank C. D’Antoni for conversations.
References 1. Araki, H., Sewell, G.L.: KMS condition and local thermodynamic stability of quantum lattice systems. Commun. Math. Phys. 52, 103–109 (1977) 2. Bekenstein, J.D.: Generalized second law of thermodynamics in black hole physics. Phys. Rev. D 9, 3292–3300 (1974) 3. Bertozzini, P., Conti, R., Longo, R.: Covariant sectors and positivity of the energy. Commun. Math. Phys. 141, 471–492 (1998)
Energy Bounds in Quantum Thermodynamics
535
4. Bisognano, J.J., Wichmann, E.H.: On the duality condition for quantum fields. J. Math. Phys. 17, 303 (1976) 5. Borchers, H.-J.: The CPT-theorem in two-dimensional theories of local observables. Commun. Math. Phys. 143, 315 (1992) 6. Borchers, H.-J.: Energy and momentum as observables in Quantum Field Theory. Commun. Math. Phys. 2, 49 (1966) 7. Bratteli, O., Robinson, O.: Operator algebras and quantum statistical mechanics. 2. Equilibrium states. Models in quantum statistical mechanics. 2nd ed., Berlin: Springer, 1997 8. Buchholz, D., D’Antoni, C., Longo, R.: Nuclear maps and modular structures I. General properties. J. Funct. Anal. 88, 223–250 (1990) 9. Buchholz, D., D’Antoni, D., Longo, R.: Nuclear maps and modular structures II. Applications to quantum field theory. Commun. Math. Phys. 129, 115–138 (1990) 10. Buchholz, D., Wichmann, E.H.: Causal independence and the energy-level density of states in local quantum field theory. Commun. Math. Phys. 106, 321–344 (1986) 11. D’Antoni, C., Zsido, L.: In preparation 12. Doplicher, S., Longo, R.: Standard and split inclusions of von Neumann algebras. Invent. Math. 73, 493–536 (1984) 13. Fidaleo, F.: Operator space structures and the split property. J. Operator Theory 31, 207–218 (1994) 14. Figliolini, F., Guido, D.: On the type of second quantization factors. J. Operator Theory 31, 229–252 (1994) 15. Florig, M.: On Borchers’ theorem. Lett. Math. Phys. 46, 289–293 (1998) 16. Guido, D., Longo, R.: An algebraic spin and statistics theorem. Commun. Math. Phys. 172, 517–533 (1995) 17. Guido, D., Longo, R.: The conformal spin and statistics theorem. Commun. Math. Phys. 181, 11–35 (1996) 18. Guido, D., Longo, R., Wiesbrock, H.-W.: Extension of conformal nets and superselection structures. Commun. Math. Phys. 192, 217–244 (1998) 19. Guido, D., Longo, R., Roberts, J.E., Verch, R.: Charged sectors, spin and statistics in quantum field theory on curved space-times. Rev. Math. Phys. (to appear), math-ph/9906019 20. 
Haag, R.: Local quantum physics. 2nd ed., Berlin–Heidelberg–New York: Springer, 1996 21. Haag, R., Hugenoltz, N.M., Winnik, M.: On the equilibrium states in quantum statistical mechanics. Commun. Math. Phys. 5, 215–236 (1967) 22. Haag, R., Swieca, J.A.: When does a quantum field theory describe particles? Commun. Math. Phys. 1, 308–320 (1965) 23. Haagerup, U.: Solution of the similarity problem for for cyclic representations of C∗ -algebras. Annals of Math. 118, 215–240 (1983) 24. Hawking, S.W.: Particle creation by black holes. Commun. Math. Phys. 43, 199 (1975) 25. Kay, B.S., Wald, R.M.: Theorems on the uniqueness and thermal properties of stationary, nonsingular, quasifree states on space-times with a bifurcate killing horizon. Phys. Rept. 207, 49–136 (1991) 26. Longo, R.: Algebraic and modular structure of von Neumann algebras of physics. Proc. Symp. Pure Math. 38 Part 2, 551 (1982) 27. Longo, R.: An analogue of the Kac–Wakimoto formula and black hole conditional entropy. Commun. Math. Phys. 186, 451–479 (1997) 28. Longo, R.: Notes for a quantum index theorem. Preprint, March 2000 29. Pedersen, G.K.: C∗ -algebras and their automorphism group. London: Academic Press, 1979 30. Pisier, G.: Similarity problems and completely bounded maps. Lecture Notes in Mathematics 1618, Berlin: Springer-Verlag, 1996 31. Pisier, G.: Grothendieck’s theorem for noncommutative C∗ -algebras with an appendix to Grothendieck’s constants. J. Funct. Anal. 29, 397–415 (1978) 32. Pusz, W., Woronowicz, S.L.: Passive states and KMS states for general quantum systems. Commun. Math. Phys. 58, 273–290 (1978) 33. Roepstorff, G.: Correlation inequalities in quantum statistical mechanics and their applications to Kondo problem. Commun. Math. Phys. 46, 253–262 (1976) 34. Ruelle, D.: Natural nonequilibrium states in quantum statistical mechanics. J. Statist. Phys. 98, 57–75 (2000)
Commun. Math. Phys. 218, 537 – 568 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
The Attractor of Renormalization and Rigidity of Towers of Critical Circle Maps
Michael Yampolsky
Mathematics Department, University of Toronto, 100 St. George Street, Toronto, Ontario, M5S 3G3, Canada. E-mail: [email protected]
Received: 16 August 1999 / Accepted: 6 December 2000
Abstract: We demonstrate the existence of a global attractor $\mathcal{A}$ with a Cantor set structure for the renormalization of critical circle mappings. The set $\mathcal{A}$ is invariant under a generalized renormalization transformation, whose action on $\mathcal{A}$ is conjugate to the two-sided shift with a countable alphabet.

1. Introduction

The empirical discovery of universality phenomena in transition to chaos in one-dimensional dynamical systems in the late 1970's made a great impact on the subject of one-dimensional dynamics. A deep and somewhat mysterious analogy relates these phenomena to the critical phenomena in Statistical Physics. Motivated by this analogy, Feigenbaum and independently Coullet & Tresser adapted the physical Renormalization Group method to the study of universality in one-dimensional dynamics. In the particular case of dynamics of unimodal maps of the interval, they introduced a renormalization operator which acts by replacing a dynamical system with its appropriately rescaled first return map, and offered an explanation of the appearance of universality in terms of hyperbolic properties of this operator. In his seminal work [Sul2, MvS] Sullivan introduced the methods of complex-analytic dynamics and geometry into the study of renormalization, and rigorously proved some of the conjectured universalities in the unimodal case. Subsequent works of McMullen [McM2] and Lyubich [Lyu3, Lyu4] completed the unimodal picture, proving the conjectures of Feigenbaum–Coullet–Tresser, and thus providing a mathematical basis for the universality phenomena in unimodal dynamics.

This paper is concerned with the other central example of universality in one-dimensional dynamics, found in critical circle maps. A critical circle map $f$ is an orientation preserving self-homeomorphism of the circle $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ of class $C^3$ with a single critical point $c$. A further assumption is made that the critical point is of cubic type. This means that for a lift $\bar f : \mathbb{R} \to \mathbb{R}$ with critical points at integer translates of $\bar c$,
\[ \bar f(x) - \bar f(\bar c) = (x - \bar c)^3 \bigl(\mathrm{const} + O(x - \bar c)\bigr). \]
Well-known examples of critical circle maps are given by the projections to $\mathbb{T}$ of the analytic homeomorphisms of the line
\[ A_\theta(x) = x + \theta - \frac{1}{2\pi} \sin 2\pi x. \]
These maps form the Arnold (or standard) family of critical circle maps.

Recall that every homeomorphism $f : \mathbb{T} \to \mathbb{T}$ has a well-defined rotation number $\rho(f) \in \mathbb{T}$ given by $\lim_{n\to\infty} \bar f^n(x)/n \bmod \mathbb{Z}$ (the value of this expression is clearly independent of the choices of a lift $\bar f$ and $x \in \mathbb{R}$). When $\rho(f)$ is irrational, it captures the combinatorics of the orbits of $f$, since the map $f$ is semi-conjugate to the rigid rotation $x \mapsto x + \rho(f) \bmod \mathbb{Z}$ in this case (see e.g. [MvS]). Yoccoz has shown in [Yoc] that a critical circle map $f$ has no wandering intervals, hence for such maps the semiconjugacy becomes a conjugacy. It will be useful to us to associate to a rotation number its continued fraction expansion with positive terms
\[ \rho(f) = \cfrac{1}{r_0 + \cfrac{1}{r_1 + \cfrac{1}{r_2 + \cdots}}}, \]
which we will further abbreviate as $\rho(f) = [r_0, r_1, \ldots]$ for typographical convenience. The sequence $\{r_i\}$ is infinite if and only if $\rho(f)$ is irrational; it is uniquely defined in this case. For $B \in \mathbb{N}$ we shall say that $f$ is of a combinatorial type bounded by $B$ if $\rho(f) \notin \mathbb{Q}$ and all the terms in its continued fraction are not greater than $B$.

The renormalization operator $\mathcal{R}$ for critical circle maps first appeared in [ORSS], and in a slightly different version in [FKS]. Among the many authors who further advanced its study let us note the work of Lanford [Lan1, Lan2], where the main conjectures about $\mathcal{R}$ were formulated in full generality. Due to a technical difficulty, discussed further in this paper, the action of the operator $\mathcal{R}$ is not defined on the critical circle maps themselves, but on the objects called critical commuting pairs. A simple glueing construction associates the former to the latter; there is, however, no one-to-one correspondence. The precise definitions will be given in the next section. For the moment the reader should think of commuting pairs as a natural extension of the class of critical circle maps; in particular, they also come equipped with rotation numbers. The renormalization is defined for all commuting pairs whose rotation number is non-zero; the action of $\mathcal{R}$ on the rotation numbers is the well-known Gauss map of the interval $(0, 1]$ given by $G(\rho) = \{1/\rho\}$. Since this map forgets the first digit of the continued fraction expansion of $\rho$, the commuting pairs with irrational rotation numbers are infinitely renormalizable, while those with rational rotation numbers can only be renormalized a finite number of times.

The main open question of the study of renormalization is a Hyperbolicity Conjecture for $\mathcal{R}$, which we will formulate below; it first appeared in [Lan2]. The first attempts to prove parts of the Conjecture were confined to the framework of one-dimensional smooth dynamics; however, the analytic methods naturally came into play.
More specifically, in 1986 Eckmann and H. Epstein [EE] constructed a class of real-analytic maps invariant under $\mathcal{R}$, the so-called Epstein class $\mathcal{E}$. It was further shown by Yoccoz and others (see [dF1]) that renormalizations of smooth critical commuting pairs converge to $\mathcal{E}$, and by a recent result of de Faria and de Melo [dFdM1] the convergence occurs at a geometric rate. In [dF1, dF2] de Faria has adapted Sullivan's seminal work on unimodal renormalization to the setting of critical circle mappings. In particular, he has defined holomorphic
extensions of commuting pairs in the Epstein class analogous to the Douady–Hubbard quadratic-like maps. He then used Sullivan's techniques to demonstrate the existence of complex a priori bounds for such extensions in the case of a bounded combinatorial type, and obtained the following convergence result:

Theorem 1.1 ([dF1, dF2]). Let $f_1$ and $f_2$ be two critical circle maps in $\mathcal{E}$ with the same rotation numbers in $\mathbb{R} \setminus \mathbb{Q}$ of a bounded combinatorial type. Then
\[ \mathrm{dist}_{C^r}(\mathcal{R}^{\circ n} f_1, \mathcal{R}^{\circ n} f_2) \to 0 \]
for all $0 \le r < \infty$. As a consequence, for any $B \ge 1$ there exists a closed $\mathcal{R}$-invariant set $\mathcal{A}_B$ of critical commuting pairs such that for any $f \in \mathcal{E}$ whose type is bounded by $B$, $\mathcal{R}^{\circ n} f \to \mathcal{A}_B$. The action of $\mathcal{R}$ on $\mathcal{A}_B$ is conjugated to the two-sided shift with $B$ symbols.

In a recent work de Faria and de Melo [dFdM2] employed McMullen's towers techniques to demonstrate that the convergence to the attractor $\mathcal{A}_B$ happens at a geometric rate. In [Ya1] we showed the existence of complex a priori bounds for circle maps with an arbitrary irrational rotation number, using methods developed in [LY]. This result enables us in the present paper to construct the whole renormalization horseshoe $\mathcal{A}$ without restrictions on the combinatorial type.

Before formulating our results let us introduce some further notation. Let $\Sigma$ be the space of bi-infinite sequences of natural numbers, and denote by $\sigma : \Sigma \to \Sigma$ the shift on this space:
\[ \sigma : (r_i)_{-\infty}^{\infty} \mapsto (r_{i+1})_{-\infty}^{\infty}. \]
Let us also complement the natural numbers with the symbol $\infty$ with the conventions $\infty + x = \infty$, $1/\infty = 0$; and denote by $\bar\Sigma$ the space $(\mathbb{N} \cup \{\infty\})^{\mathbb{Z}}$, and by $\bar\Sigma_+$ the space $(\mathbb{N} \cup \{\infty\})^{\mathbb{N}}$. Our main result is the following:

Theorem A. There exists an $\mathcal{R}$-invariant set $\mathcal{I}$ of commuting pairs with irrational rotation numbers with the following properties. The action of $\mathcal{R}$ on $\mathcal{I}$ is bijective. Moreover, there is a homeomorphism
\[ i : \Sigma \to \mathcal{I} \]
such that if $\zeta = i(\ldots, r_{-k}, \ldots, r_{-1}, r_0, r_1, \ldots, r_k, \ldots)$ then $\rho(\zeta) = [r_0, r_1, \ldots, r_k, \ldots]$ and the action of $\mathcal{R}$ on $\mathcal{I}$ is conjugate to the shift:
\[ i^{-1} \circ \mathcal{R} \circ i = \sigma. \]
The set $\mathcal{I}$ is pre-compact in Carathéodory topology (see Sect. 2 for the definition), its closure $\mathcal{A}$ is the attractor for the renormalization operator:
\[ \mathcal{R}^{\circ n} \zeta \to \mathcal{A}, \]
for all $\zeta \in \mathcal{E}$ with $\rho(\zeta) \in \mathbb{R} \setminus \mathbb{Q}$, where the convergence is understood in the sense of Carathéodory topology (it implies, in particular, that the analytic extensions of the renormalized pairs converge uniformly on compact sets). More precisely, for any pair $\zeta' \in \mathcal{A}$ with $\rho(\zeta) = \rho(\zeta')$ we have
\[ \mathrm{dist}(\mathcal{R}^n \zeta, \mathcal{R}^n \zeta') \to 0 \]
for the $C^0$ distance between the analytic extensions of the renormalized pairs on an open neighborhood of the interval of definition.
The proof of the above theorem is inspired by the argument of McMullen for the convergence of unimodal renormalizations [McM2]. Following [McM2] we consider the geometric limits $\mathcal{T}$ of various rescalings of sequences of renormalizations $(\zeta, \mathcal{R}\zeta, \ldots, \mathcal{R}^{\circ n}\zeta)$ of increasing lengths. The objects $\mathcal{T}$ should be thought of as bi-infinite towers of nested dynamical systems in the plane. They are naturally equipped with rotation numbers which are bi-infinite sequences in $\bar\Sigma$. The proof of the theorem relies on the following uniqueness statement:

For each bi-infinite sequence $\bar s \in \bar\Sigma$ there exists a unique (up to a real homothety) bi-infinite tower with $\bar s$ as its rotation number.

We point out that a parallel statement has been proved by Hinkle [Hin] in the setting of unimodal maps with essentially bounded combinatorics. An important difference with the situation considered by McMullen and its adaptation to the case of circle maps carried out in [dFdM2] is the presence of parabolic commuting pairs in a limiting tower. Forward-infinite towers with parabolic elements which occur as geometric limits in a broad class of complex-analytic dynamical systems were considered in the dissertation of A. Epstein [Ep1], who proved a rigidity theorem for such objects. It is conceivable that the constructions could be modified appropriately so that the results of Epstein could be applied directly in the setting of critical circle maps; we use some ergodic arguments to replace them.

The above mentioned parabolic elements of the attractor $\mathcal{A}$ are the pairs $\zeta$ in the closure of $\mathcal{I}$ on which renormalization is not defined. Necessarily, $\rho(\zeta) = 0$, and, as we shall see below, $\zeta$ possesses a unique fixed point, which has eigenvalue 1. We call any commuting pair with this property parabolic, and construct a family of natural extensions of $\mathcal{R}$ to parabolic commuting pairs, the parabolic renormalizations $\mathcal{P}_\theta$, $\theta \in \mathbb{T}$. We may extend Theorem A to parabolic pairs as follows. For every parabolic commuting pair $\zeta \in \mathcal{E}$ and every $\rho \in \mathbb{T}$ there exists a unique $\theta(\zeta, \rho)$ with the property that $\rho(\mathcal{P}_{\theta(\zeta,\rho)} \zeta) = \rho$, and moreover, if $\rho$ is rational then $\mathcal{P}_{\theta(\zeta,\rho)} \zeta$ is also a parabolic pair. Let us define
\[ \mathcal{X} = \overline{\{\zeta \in \mathcal{E},\ \rho(\zeta) \in \mathbb{R} \setminus \mathbb{Q}\}} \subset \mathcal{E}, \]
and blow up the boundary of this space by replacing every boundary element $\zeta$ by $\zeta \times \bar\Sigma_+$. To topologize this space, let us replace every $\zeta \in \mathcal{X}$ with $\rho(\zeta) \in \mathbb{R} \setminus \mathbb{Q}$ with $(\zeta, i^{-1}(\zeta))$, and in the case $\zeta \in \partial\mathcal{X}$ and $\rho(\zeta) = [r_1, \ldots, r_k]$, for $\bar s = (s_1, s_2, \ldots)$ let us express the corresponding point in the blown-up space as $(\zeta, \bar t)$, where
\[ \bar t = (r_1, \ldots, r_k, s_1, s_2, \ldots). \]
Now we use the product topology on $\mathcal{X} \times \bar\Sigma_+$ to topologize the blow-up of $\mathcal{X}$, and call the resulting space $\hat{\mathcal{X}}$. Let $\hat{\mathcal{A}} \subset \hat{\mathcal{X}}$ be the blow-up of $\mathcal{A}$. Define the generalized renormalization operator $\mathcal{G} : \hat{\mathcal{X}} \to \hat{\mathcal{X}}$ setting $\mathcal{G}(\zeta) = \mathcal{R}(\zeta)$ in the case $\rho(\zeta) \in \mathbb{R} \setminus \mathbb{Q}$, and
\[ \mathcal{G}(\zeta, (r_1, r_2, \ldots)) = (\mathcal{P}_\theta(\zeta), (r_2, r_3, \ldots)), \]
where $\theta = \theta(\zeta, [r_1, r_2, \ldots])$. With our definitions we have the following:

Theorem B. The set $\hat{\mathcal{A}}$ is invariant under $\mathcal{G}$. The map $i : \Sigma \to \mathcal{I}$ extends homeomorphically to $\bar\Sigma \to \hat{\mathcal{A}}$, so that $\mathcal{G} \circ i = i \circ \sigma$. The generalized renormalizations $\mathcal{G}^n(x)$ converge to $\hat{\mathcal{A}}$ for any $x \in \hat{\mathcal{X}}$.

As a consequence of the above theorem we obtain the following analogue of the golden mean renormalization fixed point:

Theorem C. There exists a commuting pair $\zeta_0 \in \mathcal{A}$ such that for all maps $f$ with $\rho(f) = [r_0, r_1, r_2, \ldots] \in \mathbb{R} \setminus \mathbb{Q}$ with $r_n \to \infty$ we have
\[ \mathcal{R}^{\circ n} f \to \zeta_0. \]
Moreover, let $\mathcal{E}^0 \subset \mathcal{E}$ be the set of Epstein pairs with zero rotation numbers, and for $\zeta \in \mathcal{E}^0$ denote $\mathcal{P}^0(\zeta) = \mathcal{P}_{\theta(\zeta)}(\zeta)$, where $\theta(\zeta)$ is selected so that the renormalized pair is again in $\mathcal{E}^0$ (as we shall see further, such $\theta(\zeta)$ exists and is unique). Then $\zeta_0$ is a fixed point of the operator $\mathcal{P}^0 : \mathcal{E}^0 \to \mathcal{E}^0$, and $\mathcal{P}^0 \zeta \to \zeta_0$ for all $\zeta \in \mathcal{E}^0$.

Further developments. Since this paper appeared as a Stony Brook preprint in 1998, some new developments in the subject have taken place. In our new work [Ya2] we were able to circumvent the principal difficulty in defining renormalization for analytic critical circle maps, and construct a new renormalization operator, acting on maps rather than on commuting pairs. This renormalization extends to analytic maps of the annulus in a small neighborhood of a critical circle map, becoming an analytic operator on a Banach manifold. Using these properties we showed that the periodic orbits of renormalization are hyperbolic, with one-dimensional unstable directions, which implies universality for all periodic rotation numbers. The global hyperbolicity of the full renormalization horseshoe constructed in this paper remains a subject of further study.

2. Preliminaries

Some notations. The notation $D_r(z)$ will stand for the Euclidean disk with center $z \in \mathbb{C}$ and radius $r$. The unit disk $D_1(0)$ will be denoted $\mathbb{D}$. For two points $a$ and $b$ in the circle $\mathbb{T}$ which are not diametrically opposite, $[a, b]$ will denote the shorter of the two arcs connecting them; $|[a, b]|$ will denote the length of the arc. For two points $a, b \in \mathbb{R}$, $[a, b]$ will denote the closed interval with endpoints $a$, $b$ without specifying their order. The plane $(\mathbb{C} \setminus \mathbb{R}) \cup J$ with the parts of the real axis not contained in the interval $J \subset \mathbb{R}$ removed will be denoted $\mathbb{C}_J$. We use dist and diam to denote the Euclidean distance and diameter in $\mathbb{C}$. For $K > 1$ we call two real numbers $x$ and $y$ $K$-commensurable if $K^{-1} \le |x|/|y| \le K$. Two sets $X$ and $Y$ in $\mathbb{C}$ are $K$-commensurable if their diameters are. In accordance with the established terminology, we shall say that a quantity is definite if it is greater than a universal positive constant. A set $B$ is contained well inside of a set $A \subset \mathbb{C}$ if $A \setminus B$ contains an annulus with definite modulus. Similarly, an interval $I \subset \mathbb{R}$ is contained well inside of another interval $J$ if there exists a universal constant $K > 0$ such that for each component $L$ of $J \setminus I$ we have $|L| > K|I|$.

Commuting pairs and renormalization of critical circle maps. Consider a critical circle mapping $f$ with a rotation number $\rho$, and let
\[ \rho(f) = [r_0, r_1, r_2, \ldots] \tag{2.1} \]
be its (possibly finite) continued fraction expansion. We shall always assume that the critical point of $f$ is at $0 \in \mathbb{R}/\mathbb{Z}$. An iterate $f^k(0)$ is called a closest return of the critical point if the arc $[0, f^k(0)]$ contains no other iterates $f^i(0)$ with $i < k$. Examples of closest returns are given by the truncated continued fractions
\[ p_m/q_m = [r_0, r_1, \ldots, r_{m-1}]. \tag{2.2} \]
Recall that these fractions are best rational approximations of $\rho(f)$ and their denominators are given recursively by $q_{m+1} = r_m q_m + q_{m-1}$, $q_0 = 1$, $q_1 = r_0$. Elementary considerations (see e.g. [MvS]) imply that the iterates $f^{q_m}(0)$ form a sequence of closest
returns of the critical point. For every $m \ge 1$ such that the continued fraction expansion of $\rho(f)$ has at least $m$ terms set $I_m \equiv [0, f^{q_m}(0)]$. An important consequence of Świątek–Herman real a priori bounds ([Sw1, He]) is that when $\rho(f)$ is irrational, there exists $M = M(f)$ such that for all $m \ge M$ the intervals $I_m$ and $I_{m+1}$ are $K$-commensurable, with a universal bound $K > 1$. For a more detailed statement of this and some proofs see [dFdM1].

The general strategy of defining a renormalization of a given dynamical system (cf. [Lyu2]) is to select a piece of its phase space, rescale it to the "original" size, and then consider the first return map to this piece. Historically, for a circle map $f$ the union of arcs $A_m = I_m \cup I_{m+1}$ is chosen as the domain for the return map. The first return map $R_m : A_m \to A_m$ is defined piecewise by $f^{q_m}$ on $I_{m+1}$ and by $f^{q_{m+1}}$ on $I_m$. To recover a critical circle map we may identify the neighborhoods of points $f^{q_m}(0)$ and $f^{q_{m+1}}(0)$ by the iterate $f^{q_{m+1} - q_m}$. This identification transforms the arc $A_m$ into a $C^3$-smooth closed one-dimensional manifold $\tilde A_m$, and $R_m$ projects to a smooth homeomorphism $\tilde R_m : \tilde A_m \to \tilde A_m$ with a critical point at 0. However, the manifold $\tilde A_m$ does not possess a canonical affine structure; the choice of a smooth identification $\phi : \tilde A_m \to \mathbb{R}/\mathbb{Z}$ gives rise to a plethora of different critical circle maps, all smoothly conjugate. The above discussion illustrates why the space of critical circle maps is not suited for defining a renormalization transformation, and motivates the introduction of the following objects:

Definition 2.1. A commuting pair $\zeta = (\eta, \xi)$ consists of two $C^3$-smooth orientation preserving interval homeomorphisms $\eta : I_\eta \to \eta(I_\eta)$, $\xi : I_\xi \to \xi(I_\xi)$, where
(I) $I_\eta = [0, \xi(0)]$, $I_\xi = [\eta(0), 0]$;
(II) The maps $\eta$ and $\xi$ have $C^3$-homeomorphic extensions to open neighborhoods of their intervals of definition such that $\eta \circ \xi = \xi \circ \eta$ wherever both sides are defined;
(III) $\xi \circ \eta(0) \in I_\eta$;
(IV) $\eta'(x) \ne 0 \ne \xi'(y)$, for all $x \in I_\eta \setminus \{0\}$, and all $y \in I_\xi \setminus \{0\}$.

Commuting pairs were first used to define the renormalization transformation by Ostlund, Rand, Sethna, and Siggia [ORSS]. Feigenbaum, Kadanoff, and Shenker [FKS] defined renormalization by means of a slightly different formalism.

A critical commuting pair is a commuting pair $(\eta, \xi)$ whose maps can be decomposed near zero as $\eta = h_\eta \circ Q \circ H_\eta$, and $\xi = h_\xi \circ Q \circ H_\xi$, where $h_\eta$, $h_\xi$, $H_\eta$, $H_\xi$ are real analytic diffeomorphisms and $Q(x) = x^3$. We shall further require that $\eta$ analytically extends to an open interval $P_\eta$ such that $\eta(P_\eta) \supset [\eta(0), \xi(0)]$ in which it has two cubic critical points at 0 and at $\xi(\eta^{-1}(0))$ and no other critical points; and similarly that $\xi$ analytically extends to an open interval $P_\xi \ni 0$ with $\xi(P_\xi) \supset [\eta(0), \xi(0)]$, and has a single critical point 0 in this interval. We state the last two assumptions for technical convenience; they will automatically be satisfied for renormalizations of analytic critical circle maps as defined further. We will denote by $\mathcal{C}$ the space of critical commuting pairs with $I_\eta = [0, 1]$. Various natural topologies on this space are discussed in [dFdM1].

Let $f$ be a critical circle mapping, whose rotation number $\rho$ has a continued fraction expansion (2.1) with at least $m + 1$ terms. Let $p_m$ and $q_m$ be as in (2.2). The pair of iterates $f^{q_{m+1}}$ and $f^{q_m}$ restricted to the circle arcs $I_m$ and $I_{m+1}$ correspondingly can be viewed as a critical commuting pair in the following way. Let $\bar f$ be the lift of $f$ to the real line satisfying $\bar f'(0) = 0$, and $0 < \bar f(0) < 1$, and let $\bar I_m \subset \mathbb{R}$ denote the closed interval adjacent to zero which projects down to the interval $I_m$. Let $T : \mathbb{R} \to \mathbb{R}$ denote the translation $x \mapsto x + 1$. Let $\eta : \bar I_m \to \mathbb{R}$, $\xi : \bar I_{m+1} \to \mathbb{R}$ be given by $\eta \equiv T^{-p_{m+1}} \circ \bar f^{q_{m+1}}$, $\xi \equiv T^{-p_m} \circ \bar f^{q_m}$. Then the pair of maps $(\eta|\bar I_m, \xi|\bar I_{m+1})$ forms a critical commuting pair.
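The combinatorics of closest returns described above can be checked numerically. The following is a minimal sketch, not part of the original argument. It uses the rigid rotation $x \mapsto x + \rho$ as a stand-in for a critical circle map with the same irrational rotation number, which is harmless for questions about the order of points in an orbit because the two maps are topologically conjugate. The recursion for $p_m$ is the standard companion of the stated recursion for $q_m$ and is an assumption here, as are all function names. The sketch detects "record" returns (closer to 0 than every earlier iterate, on either side), which are in particular closest returns in the sense above.

```python
import math

def convergents(digits):
    """Return lists (p, q) with p[m]/q[m] = [r_0, ..., r_{m-1}].

    Uses q_{m+1} = r_m q_m + q_{m-1}, q_0 = 1, q_1 = r_0 (as in the text)
    and the analogous recursion for p with p_0 = 0, p_1 = 1 (assumed).
    """
    p, q = [0, 1], [1, digits[0]]
    for r in digits[1:]:
        p.append(r * p[-1] + p[-2])
        q.append(r * q[-1] + q[-2])
    return p, q

def record_return_times(rho, n_max):
    """Times k <= n_max at which k*rho mod 1 is closer to 0 than all earlier iterates."""
    best, times = 1.0, []
    for k in range(1, n_max + 1):
        x = k * rho
        d = abs(x - round(x))          # distance from k*rho to the nearest integer
        if d < best:
            best = d
            times.append(k)
    return times

def pair_endpoints(rho, p, q, m):
    """Endpoints (eta(0), xi(0)) for the lift x -> x + rho:
    eta = T^{-p_{m+1}} o f^{q_{m+1}}, xi = T^{-p_m} o f^{q_m}."""
    return q[m + 1] * rho - p[m + 1], q[m] * rho - p[m]

# Golden mean rotation number, [1, 1, 1, ...] in the notation of the text.
golden = (math.sqrt(5.0) - 1.0) / 2.0
p, q = convergents([1] * 9)
print(q)
print(record_return_times(golden, 60))
print(pair_endpoints(golden, p, q, 3))
```

In this example the record return times coincide with the distinct denominators $q_m$, and $\eta(0)$ and $\xi(0)$ lie on opposite sides of 0, as required by condition (I) of Definition 2.1.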
Henceforth we shall simply denote this commuting pair by
\[ (f^{q_{m+1}}|I_m,\ f^{q_m}|I_{m+1}), \tag{2.3} \]
which allows us to readily identify the dynamics of the above commuting pair with that of the critical circle map, at the cost of a minor abuse of notation.

For a commuting pair $\zeta = (\eta, \xi)$ we will denote by $\tilde\zeta$ the pair $(\tilde\eta|\tilde I_\eta, \tilde\xi|\tilde I_\xi)$ where tilde means rescaling by a linear factor $\mathrm{sign}(\xi(0))/|I_\eta|$. In accordance with the notation of [dFdM1], we define the height $\chi(\zeta)$ of a critical commuting pair $\zeta = (\eta, \xi)$ as the natural number $r$ for which
\[ 0 \in [\eta^r(\xi(0)), \eta^{r+1}(\xi(0))]. \]
If no such $r$ exists, we set $\chi(\zeta) = \infty$; in this case the commuting pair $\zeta$ has zero rotation number and hence the map $\eta|I_\eta$ has a fixed point. For a pair $\zeta$ with $\chi(\zeta) = r < \infty$ one verifies directly that the mappings $\eta|[0, \eta^r(\xi(0))]$ and $\eta^r \circ \xi|I_\xi$ again form a commuting pair.

Definition 2.2. The renormalization of a real commuting pair $\zeta = (\eta, \xi)$ is the commuting pair
\[ \mathcal{R}\zeta = \bigl(\widetilde{\eta^r \circ \xi}\,|\,\tilde I_\xi,\ \tilde\eta\,|\,[0, \widetilde{\eta^r(\xi(0))}]\bigr). \]
The non-rescaled pair $(\eta^r \circ \xi|I_\xi, \eta|[0, \eta^r(\xi(0))])$ will be referred to as the prerenormalization $p\mathcal{R}\zeta$ of the commuting pair $\zeta = (\eta, \xi)$.

For a pair $\zeta$ we define its rotation number $\rho(\zeta) \in [0, 1]$ to be equal to the continued fraction $[r_0, r_1, \ldots]$, where $r_i = \chi(\mathcal{R}^i \zeta)$. In this definition $1/\infty$ is understood as 0, hence a rotation number is rational if and only if only finitely many renormalizations of $\zeta$ are defined; if $\chi(\zeta) = \infty$, $\rho(\zeta) = 0$. Thus defined, the rotation number of a commuting pair can be viewed as a rotation number in the usual sense via the already familiar gluing construction. Namely, given a critical commuting pair $\zeta = (\eta, \xi)$ we can again regard the interval $I = [\eta(0), \xi \circ \eta(0)]$ as a circle, identifying $\eta(0)$ and $\xi \circ \eta(0)$, and define $f_\zeta : I \to I$ by
\[ f_\zeta(x) = \begin{cases} \xi \circ \eta(x) & \text{for } x \in [\eta(0), 0], \\ \eta(x) & \text{for } x \in [0, \xi \circ \eta(0)]. \end{cases} \tag{2.4} \]
We glue $\eta(0)$ to $\xi \circ \eta(0)$ by the mapping $\xi$, which by Condition (II) above extends to a smooth homeomorphism of open neighborhoods. The quotient of the interval $I$ is a $C^3$-smooth circle $M$, and the commutation condition implies that the mapping $f_\zeta$ projects down to a smooth homeomorphism $F_\zeta : M \to M$. Identifying $M$ with the circle by a diffeomorphism $\phi : M \to \mathbb{T}$ we recover a critical circle mapping
\[ f^\phi = \phi \circ F_\zeta \circ \phi^{-1}. \]
The critical circle mappings corresponding to two different choices of $\phi$ are conjugated by a diffeomorphism, and thus we recover a smooth conjugacy class of circle mappings from a critical commuting pair. It is immediately seen that:

Proposition 2.1. The rotation number of mappings in the above constructed conjugacy class is equal to $\rho(\zeta)$.
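The way heights generate the continued fraction of the rotation number can be seen in a toy model. The sketch below is illustrative only and rests on assumptions that are not in the text: it takes the pair of translations $\eta(x) = x - a$, $\xi(x) = x + 1$ with $0 < a \le 1$, which satisfies conditions (I) to (IV) of Definition 2.1 but is not a critical commuting pair. For this pair $\eta^r(\xi(0)) = 1 - ra$, so the height is $\lfloor 1/a \rfloor$, and rescaling the prerenormalization gives a pair of the same form with $a$ replaced by $(1 - ra)/a = \{1/a\}$. The function names are hypothetical, and exact rational arithmetic is used so that the procedure terminates.

```python
from fractions import Fraction
from math import floor

def renormalize_translation_pair(a):
    """One renormalization step for the toy pair eta(x) = x - a, xi(x) = x + 1.

    Returns (height, new_a). new_a is None when eta^r(xi(0)) = 0, i.e. the
    toy pair degenerates and no further renormalization is attempted.
    """
    r = floor(1 / a)                 # largest r with eta^r(xi(0)) = 1 - r*a >= 0
    remainder = 1 - r * a            # eta^r(xi(0)), length of the new short interval
    if remainder == 0:
        return r, None
    return r, remainder / a          # rescale so that the long interval has length 1

def heights(a):
    """Successive heights of the toy pair; finite when a is rational."""
    digits = []
    while a is not None:
        r, a = renormalize_translation_pair(a)
        digits.append(r)
    return digits

def continued_fraction_value(digits):
    """Value of [r_0, r_1, ...] = 1/(r_0 + 1/(r_1 + ...)) for a finite list."""
    value = Fraction(0)
    for r in reversed(digits):
        value = 1 / (r + value)
    return value

def gauss_map(rho):
    """G(rho) = {1/rho} on (0, 1]."""
    return 1 / rho - floor(1 / rho)

a = Fraction(5, 13)
print(heights(a))
print(continued_fraction_value(heights(a)))
print(renormalize_translation_pair(a)[1], gauss_map(a))
```

For the toy pair the list of heights is the continued fraction expansion of $a$, and one renormalization step acts on $a$ by the Gauss map.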
To us the advantage of defining $\rho(\zeta)$ using a sequence of heights is in having a way of distinguishing the commuting pairs with rotation numbers 0 and 1; this way we also remove the ambiguity in prescribing a continued fraction expansion to the other rational rotation numbers. For $\rho = [r_0, r_1, \ldots] \in (0, 1]$ let us set
\[ G(\rho) = [r_1, r_2, \ldots] = \left\{ \frac{1}{\rho} \right\}, \]
where $\{x\}$ denotes the fractional part of a real number $x$ ($G$ is usually referred to as the Gauss map). As follows from the definition,
\[ \rho(\mathcal{R}\zeta) = G(\rho(\zeta)) \]
for a real commuting pair $\zeta$ with $\rho(\zeta) \ne 0$.

The renormalization of the real commuting pair (2.3), associated to some critical circle map $f$, is the rescaled pair $(\widetilde{f^{q_{m+2}}}|\tilde I_{m+1}, \widetilde{f^{q_{m+1}}}|\tilde I_{m+2})$. Thus for a given critical circle map $f$ the renormalization operator recovers the (rescaled) sequence of the first return maps:
\[ \bigl\{ (\widetilde{f^{q_{i+1}}}|\tilde I_i,\ \widetilde{f^{q_i}}|\tilde I_{i+1}) \bigr\}_{i=1}^{\infty}. \]
Let us denote by $\mathcal{C}^\infty$ the subset of commuting pairs $\zeta \in \mathcal{C}$ with $\chi(\zeta) = \infty$. The following is verified directly from the definitions:

Proposition 2.2. We have $\mathcal{R} : \mathcal{C} \setminus \mathcal{C}^\infty \to \mathcal{C}$.

Let us also note that

Proposition 2.3. The map $\mathcal{R} : \mathcal{C} \setminus \mathcal{C}^\infty \to \mathcal{C}$ is injective.

Proof. Let $\zeta = (\eta, \xi)$ be a pair in $\mathcal{R}(\mathcal{C} \setminus \mathcal{C}^\infty)$. The desired statement will follow if we show that there exists a unique value of $r \in \mathbb{N}$ such that there is an element $\zeta_{-1} \in \mathcal{C} \setminus \mathcal{C}^\infty$ with height $\chi(\zeta_{-1}) = r$ for which $\mathcal{R}\zeta_{-1} = \zeta$. Indeed, in this case $\zeta_{-1}$ will be determined uniquely as the rescaled pair $(\xi, \xi^{-r} \circ \eta)$. To prove the uniqueness of $r$ consider an arbitrary commuting pair $\zeta_{-1}$ as above. Note that for $j < \chi(\zeta_{-1})$ the commuting pair $(\gamma_j, \sigma_j) = (\xi, \xi^{-j} \circ \eta)$ does not have a critical point at $\sigma_j \circ \gamma_j^{-1}(0)$ and hence is not in $\mathcal{C}$. On the other hand for $j > \chi(\zeta_{-1})$ the pair of maps $(\gamma_j, \sigma_j)$ is not a commuting pair, since $\sigma_j$ has a singularity at $\gamma_j(0)$.

Obviously, the analyticity in the definition of the critical commuting pairs is the key requirement for injectivity; in the smooth category such a statement would not hold. A parallel result for the unimodal maps is non-trivial, and was proved by de Melo and van Strien in [MvS].

Renormalization Hyperbolicity Conjecture. We would like to discuss briefly the general Renormalization Hyperbolicity conjecture and its connection to the existence of the attractor of the renormalization operator.

Renormalization Hyperbolicity Conjecture. The space of critical commuting pairs may be endowed with the structure of an infinite-dimensional smooth manifold with respect to which the transformation $\mathcal{R}$ is uniformly hyperbolic, with one-dimensional unstable direction.
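Fig. 1. [Figure not reproduced. Labels in the figure: $S_1$, $S_r$, $S_\infty$, $P_\infty$, $\mathcal{P}$, $\mathcal{R}_1$, $\mathcal{R}_r$.]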
In this generality the conjecture is due to Lanford [Lan2]. To illustrate its consequences let us consider the following sketch of the action of $\mathcal{R}$. Let $\rho = \phi(\theta) : \mathbb{T} \to \mathbb{T}$ be a monotone continuous function such that $\phi^{-1}(\rho)$ is a point for $\rho$ irrational and an interval otherwise. An example of such a function is the dependence $\theta \mapsto \rho(f_\theta)$ of the rotation number of a standard map on the parameter. Imagine the relevant space of commuting pairs as an infinite-dimensional cylinder $\mathcal{C} = \mathbb{T} \times C$, where the rotation number of a commuting pair $\zeta(\theta, \cdot)$ with the equatorial coordinate $\theta \in \mathbb{T}$ is $\rho(\theta)$. The cylinder $\mathcal{C}$ is partitioned into strips (cf. Fig. 1)
\[ S_r = \{\zeta \in \mathcal{C} \mid \rho(\zeta) = [r, r_1, \ldots]\} \]
for $r = 1, \ldots, \infty$. A boundary component of the strip $S_\infty$ is the hypersurface $P_\infty \subset S_\infty$ with the property that a pair $\zeta \in P_\infty$ if and only if it has a fixed point with unit eigenvalue. The sets $S_r$ accumulate on $P_\infty$ in the clockwise direction. It is natural to think of the transformation $\mathcal{R} : \mathcal{C} \setminus S_\infty \to \mathcal{C}$ as being defined piecewise, given on each $S_r$, $r \ne \infty$, by the formula:
\[ \mathcal{R}_r : (\eta, \xi) \mapsto (\eta, \eta^r \circ \xi). \]
The operator $\mathcal{R}_r$ uniformly expands the strip $S_r$ in the equatorial direction, and uniformly contracts it in the complementary directions, mapping it onto a thin cylinder intersecting all the strips. The invariant set $\mathcal{I}$ is seen in this picture as intersections of the "boxes"
\[ \mathcal{R}_{r_{-1}}\bigl(S_{r_{-1}} \cap \mathcal{R}_{r_{-2}}(S_{r_{-2}} \cap \cdots (\mathcal{R}_{r_{-n}} S_{r_{-n}}) \cdots )\bigr) \cap S_{r_0} \cap \mathcal{R}_{r_0}^{-1}\bigl(S_{r_1} \cap \mathcal{R}_{r_1}^{-1}(S_{r_2} \cap \cdots \mathcal{R}_{r_{n-1}}^{-1}(S_{r_n}) \cdots )\bigr). \]
The parabolic renormalization $\mathcal{P}$ which we shall define below complements the action of $\mathcal{R}$ by "blowing up" the hypersurface $P_\infty$ to an equatorial cylinder. The statement of the Hyperbolicity Conjecture can be generalized to include this transformation.
Fig. 2. [Figure not reproduced. Labels in the figure: $\eta$, $\xi$, $U$, $V$, $D$, $\Delta$, $0$.]
Holomorphic commuting pairs. Following [dF1, dF2] we say that a real commuting pair $(\eta, \xi)$ extends to a holomorphic commuting pair $\mathcal{H}$ (cf. Fig. 2) if there exist four $\mathbb{R}$-symmetric topological disks $\Delta$, $D$, $U$, $V$, and a holomorphic mapping $\nu$, such that
• $\bar D, \bar U, \bar V \subset \Delta$, $\bar U \cap \bar V = \{0\}$; $U \setminus D$, $V \setminus D$, $D \setminus U$, and $D \setminus V$ are nonempty connected and simply-connected sets, $U \supset I_\eta$, $V \supset I_\xi$;
• mappings $\eta : U \to \Delta \cap \mathbb{C}_{\eta(J_U)}$ and $\xi : V \to \Delta \cap \mathbb{C}_{\xi(J_V)}$ are onto and univalent, where $J_U = U \cap \mathbb{R}$, $J_V = V \cap \mathbb{R}$;
• $\nu : D \to \Delta \cap \mathbb{C}_{\nu(J_D)}$ is a three-fold branched covering with a single critical point at zero, where $J_D = D \cap \mathbb{R}$;
• $\eta$ and $\xi$ have holomorphic extensions to a certain neighborhood of the origin where both $\eta \circ \xi$ and $\xi \circ \eta$ are defined, and $\eta \circ \xi(z) = \xi \circ \eta(z) = \nu(z)$ for all $z$ in that neighborhood.

As the following proposition shows, the mapping $\nu$ is nothing else but the composition of $\eta$ and $\xi$:

Proposition 2.4 (Proposition II.1, [dF2]). Under the above conditions, the mappings $\eta$ and $\xi$ have analytic extensions to $U \cup D$ and $V \cup D$ correspondingly. Moreover, $\eta : D \to V \cap \mathbb{C}_{[\xi^{-1}(\eta(0)), 0]}$ and $\xi : D \to U$ are three-fold branched covering maps, and $\nu = \eta \circ \xi$.

Set $\Omega = D \cup U \cup V$. One immediately observes that if a commuting pair $\zeta = (\eta, \xi)$ with a finite height has a holomorphic pair extension $\mathcal{H} : \Omega \to \Delta$, then there exists a holomorphic commuting pair $\mathcal{G} : \Omega' \to \Delta$ whose restriction coincides with $\mathcal{R}\zeta$. We shall refer to the commuting pair $\zeta$ as underlying $\mathcal{H}$, and write $\zeta = \mathcal{H} \cap \mathbb{R}$. We say that a real commuting pair $\zeta = (\eta, \xi)$ with an irrational rotation number has complex a priori bounds, if all its renormalizations $\mathcal{R}^n \zeta$ extend to holomorphic commuting pairs $\mathcal{H}_n : \Omega_n \to \Delta_n$ with definite moduli, that is
\[ \inf \operatorname{mod}(\Delta_n \setminus \Omega_n) > 0. \]
The shadow of the holomorphic commuting pair is the piecewise holomorphic mapping $S_{\mathcal{H}} : \Omega \to \Delta$, given by
\[ S_{\mathcal{H}}(z) = \begin{cases} \eta(z), & z \in U, \\ \xi(z), & z \in V, \\ \xi \circ \eta(z), & z \in D \setminus (U \cup V). \end{cases} \]
The shadow of a holomorphic pair captures its dynamics in the following sense:
Proposition 2.5 (Prop. II.4, [dF2]). Given a holomorphic commuting pair $\mathcal{H}$ as above, consider its shadow $S_{\mathcal{H}}$. Let $I = \Omega \cap \mathbb{R}$, and $X = I \cup S_{\mathcal{H}}^{-1}(I)$. Then:
• The restriction of $S_{\mathcal{H}}$ to $\Omega \setminus X$ is a regular three-fold covering onto $\Delta \setminus \mathbb{R}$.
• The orbits of $S_{\mathcal{H}}$ and $\mathcal{H}$ coincide as sets.

We will say that two holomorphic commuting pairs $\mathcal{H} : \Omega_{\mathcal{H}} \to D_{\mathcal{H}}$ and $\mathcal{G} : \Omega_{\mathcal{G}} \to D_{\mathcal{G}}$ are conjugate if there is a homeomorphism $h : D_{\mathcal{G}} \to D_{\mathcal{H}}$ such that
\[ S_{\mathcal{G}} = h^{-1} \circ S_{\mathcal{H}} \circ h. \]
We will usually write simply $\mathcal{G} = h^{-1} \circ \mathcal{H} \circ h$, meaning that $h$ conjugates the corresponding elements of the two holomorphic pairs. We define the filled Julia set $K(\mathcal{H})$ of a holomorphic commuting pair $\mathcal{H}$ as the closure of the collection of all points which do not escape $\Omega$ under iteration of $S_{\mathcal{H}}$. This set is clearly compact and connected; it is full by the Maximum principle. Its boundary is the Julia set of $\mathcal{H}$, denoted by $J(\mathcal{H})$.

Holomorphic commuting pairs in the standard family. For each $0 \le \theta < 1$ let $A_\theta$ be the entire mapping given by
\[ A_\theta(z) = z + \theta - \frac{1}{2\pi} \sin(2\pi z). \]
Since $A_\theta \circ T = T \circ A_\theta$, where $T$ is the unit translation $z \mapsto z + 1$, each $A_\theta$ is a lift of a holomorphic self-mapping $f_\theta$ of the cylinder $\mathbb{C}/\mathbb{Z} \cong \mathbb{C}^*$. As each $A_\theta$ is real analytic and satisfies $A_\theta'(x) > 0$ for $x \in \mathbb{R} \setminus \mathbb{Z}$, the restriction $f_\theta|\mathbb{T}$ is a critical circle map, whose rotation number we will denote $\rho(\theta)$. These restrictions are usually referred to as the standard family (or Arnold's family) of circle homeomorphisms. Elementary considerations of monotone dependence on parameter (see e.g. [MvS]) imply that $\theta \mapsto \rho(\theta)$ is a continuous non-decreasing map of $\mathbb{T}$ onto itself. Whenever $t \in \mathbb{T}$ is irrational, $\rho^{-1}(t)$ is a single point. For $t = p/q$ the set $\rho^{-1}(t)$ is a closed interval; for every parameter value in the interior of this interval the homeomorphism $f_\theta$ has two period $q$ orbits, an attracting one and a repelling one, which collide to a single parabolic orbit at the two endpoints of the interval (see [EKT]).

De Faria (cf. [dF1, dF2]) demonstrated, using some explicit estimates on the growth of the standard maps, that renormalizations of maps $f_\theta$ can be extended to holomorphic commuting pairs. Before formulating his result, let us briefly describe the geometry of a mapping $A_\theta$. The preimage of the real axis under $A_\theta$ consists of the axis itself together with the family of analytic curves
\[ \Gamma_\pm^k : \operatorname{Re}(z) = k \pm \frac{1}{2\pi} \arccos \frac{2\pi |\operatorname{Im}(z)|}{\sinh(2\pi |\operatorname{Im}(z)|)}, \]
where $k \in \mathbb{Z}$. For each $k$ the curves $\Gamma_\pm^k$ meet at the critical point $c_k = k$ and are both asymptotic to the vertical lines $\operatorname{Re}(z) = k \pm 1/4$. Note that each $c_k$ is of cubic type. The curves $\Gamma_\pm^k$ and $\mathbb{R}$ partition the complex plane into simply-connected regions each of which is univalently mapped onto $\mathbb{H}$ or $-\mathbb{H}$ by $A_\theta$. Now denote by $U_n$ the connected component of the preimage $(A_\theta^{q_n})^{-1}(\mathbb{H})$ whose boundary contains the point $T^{-p_{n+1}}(A_\theta^{q_{n+1}}(0))$. The closure of the union of $U_n$ and its reflection in the $x$-axis will be denoted $\hat U_n$. Similarly let $V_n$ be the component of $(A_\theta^{q_{n+1}})^{-1}(\mathbb{H})$ with $\operatorname{cl} V_n \ni T^{-p_n}(A_\theta^{q_n}(0))$, and set $\hat V_n$ to be the union of $\operatorname{cl} V_n$ with its vertical reflection.
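Several of the elementary facts about $A_\theta$ used above can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not part of the original text; all function names are hypothetical. It estimates $\rho(\theta)$ by iterating the lift (the error of such an estimate is of order $1/n$), checks that points with $\cos(2\pi x) = 2\pi|y|/\sinh(2\pi|y|)$ are mapped to the real axis, which follows from $\operatorname{Im} A_\theta(x + iy) = y - \frac{1}{2\pi}\cos(2\pi x)\sinh(2\pi y)$, and checks the cubic behaviour $A_\theta(x) - A_\theta(0) = \frac{2\pi^2}{3}x^3 + O(x^5)$ at the critical point.

```python
import cmath
import math

TWO_PI = 2.0 * math.pi

def arnold(theta, z):
    """A_theta(z) = z + theta - sin(2*pi*z)/(2*pi) for complex z."""
    return z + theta - cmath.sin(TWO_PI * z) / TWO_PI

def rotation_number(theta, n=2000):
    """Estimate rho(theta) as A_theta^n(0)/n (accuracy of order 1/n)."""
    x = 0.0
    for _ in range(n):
        x = x + theta - math.sin(TWO_PI * x) / TWO_PI
    return x / n

def preimage_curve_point(k, y, sign):
    """Point x + i*y (y != 0) with cos(2*pi*x) = 2*pi*|y| / sinh(2*pi*|y|),
    on the branch x = k + sign * arccos(...)/(2*pi), sign = +1 or -1."""
    t = TWO_PI * abs(y)
    ratio = min(1.0, t / math.sinh(t))   # clamp guards against rounding above 1
    return complex(k + sign * math.acos(ratio) / TWO_PI, y)

def cubic_coefficient(theta, x):
    """(A_theta(x) - A_theta(0)) / x**3 for small real x != 0."""
    return (arnold(theta, x) - arnold(theta, 0.0)).real / x ** 3

theta = 0.3
print(rotation_number(0.2), rotation_number(0.3), rotation_number(0.5))
print(arnold(theta, preimage_curve_point(0, 0.5, +1)))
print(cubic_coefficient(theta, 1e-2), 2.0 * math.pi ** 2 / 3.0)
```

The rotation number estimate is non-decreasing in $\theta$, the image of a point on the curve has vanishing imaginary part up to rounding, and the cubic coefficient approaches $2\pi^2/3$ as $x \to 0$.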
M. Yampolsky
Lemma 2.6 ([dF1, dF2]). Set η = T^{-p_{n+1}} ∘ A_θ^{q_{n+1}} and ξ = T^{-p_n} ∘ A_θ^{q_n}. For all sufficiently large R the preimages U_{n,R} = η^{-1}(D_R(0)) ∩ Û_n and V_{n,R} = ξ^{-1}(D_R(0)) ∩ V̂_n are compactly contained in D_R(0). Thus the pair of maps (η, ξ) extends to a holomorphic pair with range D_R(0). Moreover, the modulus of this holomorphic pair tends to infinity with R.

The significance of these examples in the Renormalization Theory is due to a rigidity property of standard maps:

Lemma 2.7 (see Lemma IV.8 [dF2]). Every real-symmetric holomorphic self-map of C/Z which is topologically conjugate to a member of the family {f_θ}, with a critical point at 0, itself belongs to this family.

Combining Lemma 2.7 with the above described properties of the dependence θ ↦ ρ(θ) we have the following:

Lemma 2.8. Let g be a real-symmetric self-map of the cylinder C/Z with a critical point at 0, topologically conjugate to a map f_θ. If ρ(θ) is irrational, or f_θ has a periodic orbit with eigenvalue one on the circle, then g ≡ f_θ.

The first statement of the above lemma, and the fact that, as a finite-type analytic map of the cylinder, f_θ does not have wandering domains (see [Keen, Mak]), imply the following (cf. [dF1, dF2]):

Lemma 2.9. If ρ(θ) is irrational, then f_θ admits no non-trivial, symmetric, invariant Beltrami differentials entirely supported on its Julia set.

The above properties of standard maps carry over to general holomorphic commuting pairs via quasiconformal straightening arguments.

A topology on a space of branched coverings. Consider the collection B of all triplets (U, u, f), where U ⊂ C is a topological disk different from the whole plane, u ∈ U, and f : U → C is a three-fold analytic branched covering map, with the only branch point at u. We will endow B with a topology as follows (cf. [McM1]). Let {(U_n, u_n)} be a sequence of open connected regions U_n ⊂ C with marked points u_n ∈ U_n.
Recall that this sequence Carathéodory converges to a marked region (U, u) if:
• u_n → u ∈ U, and
• for any Hausdorff limit point K of the sequence Ĉ \ U_n, U is a component of Ĉ \ K.

For a simply connected U ⊂ C and u ∈ U let R_(U,u) : D → U denote the Riemann mapping with normalization R_(U,u)(0) = u, R′_(U,u)(0) > 0. By a classical result of Carathéodory, the Carathéodory convergence of simply-connected regions (U_n, u_n) → (U, u) is equivalent to the locally uniform convergence of the Riemann mappings R_(U_n,u_n) to R_(U,u). For positive numbers B_1, B_2, B_3 and compact subsets K_1 and K_2 of the open unit disk D, let the neighborhood U_{B_1,B_2,B_3,K_1,K_2}(U, u, f) of an element (U, u, f) ∈ B be the set of all (V, v, g) ∈ B for which:
• |u − v| < B_1,
• sup_{z ∈ K_1} |R_(V,v)(z) − R_(U,u)(z)| < B_2,
• R_(U,u)(K_2) ⊂ V, and sup_{z ∈ R_(U,u)(K_2)} |f(z) − g(z)| < B_3.
Attractor of Renormalization
One verifies that the sets U_{B_1,B_2,B_3,K_1,K_2}(U, u, f) form a base of a topology on B, which we will call the Carathéodory topology. This topology is clearly Hausdorff, and the convergence of a sequence (U_n, u_n, f_n) to (U, u, f) is equivalent to Carathéodory convergence of the marked regions (U_n, u_n) → (U, u) together with locally uniform convergence f_n → f.

Epstein class. An orientation preserving interval homeomorphism g : I = [0, a] → g(I) = J belongs to the Epstein class E if it has a cubic critical point at 0, and extends to an analytic three-fold branched covering map of a topological disk G ⊃ I onto the double-slit plane C_L, where L ⊃ J. Any map g in the Epstein class can be decomposed as

\[ g = Q_c \circ h, \tag{2.5} \]

where Q_c(z) = z³ + c, and h : I → [0, b] is a univalent map h : G → Δ(h) onto the complex plane with six slits, which triple covers C_L under the cubic map Q_c(z). For any s ∈ (0, 1), let us introduce a smaller class E_s ⊂ E of Epstein mappings g : I = [0, a] → J ⊂ L for which both |I| and dist(I, J) are s^{-1}-commensurable with |J|, the length of each component of L \ J is at least s|J|, and g′(a) > s. We will often refer to the space E as the Epstein class, and to each E_s as an Epstein class. We say that a commuting pair (η, ξ) ∈ C belongs to the (an) Epstein class if both of its maps do. It immediately follows from the definitions that:

Lemma 2.10. The space of commuting pairs in the Epstein class E is invariant under renormalization.

We shall denote by H the space of holomorphic commuting pairs H : Ω → Δ whose underlying real commuting pair (η, ξ) is in the Epstein class. In this case both maps η and ξ extend to triple branched coverings η̂ : Û → Δ ∩ C_{η(J_η)} and ξ̂ : V̂ → Δ ∩ C_{ξ(J_ξ)} respectively. We will topologize H by identifying it with a subset of B × B by H ↦ (Û, 0, η̂) × (V̂, 0, ξ̂). It is immediately seen that:

Proposition 2.11. The holomorphic commuting pairs based on maps in the standard family belong to H.

Let us make a note of an important compactness property of E_s.

Lemma 2.12. Let s ∈ (0, 1). The collection of normalized maps g ∈ E_s with I = [0, 1], with marked domains (U, 0), is sequentially compact with respect to Carathéodory convergence.

Proof. Let g_n : I = [0, 1] → J_n ⊂ L_n be a normalized sequence in E_s. These maps extend to triple branched coverings g_n : G_n → C_{L_n}, and can be decomposed as in (2.5): g_n = Q_n ∘ h_n, where Q_n(z) = Q_{c_n}(z) = z³ + c_n, and h_n : I → [0, b] is a univalent map G_n → Δ(h_n). By passing to a subsequence we can ensure that the intervals L_n converge to an interval L, and c_n → c. Then (Δ(h_n), 0) Carathéodory converges to a slit domain (Δ, 0). Since J_n is s^{-1}-commensurable with I_n, h_n(1) is bounded. As g_n′(1) > s > 0, the derivatives (h_n^{-1})′(h_n(1)) are bounded from above. The points h_n(1) stay away from the boundary of Δ, and it follows from the Koebe Distortion Theorem that {h_n^{-1}} form a normal family in Δ. Let us select a locally uniformly converging subsequence h_n^{-1}. Since h_n(I) have bounded length, the limit of this sequence is a non-constant, and therefore univalent,
function h^{-1} : Δ → G. Let R_n : D → G_n be the normalized Riemann map, R_n(0) = 0, R_n′(0) > 0. It can be decomposed as R_n = h_n^{-1} ∘ R̃_n, where R̃_n is the normalized Riemann map D → Δ(h_n). By the above, the maps R_n converge locally uniformly to the Riemann map R : D → G, which is equivalent to Carathéodory convergence (G_n, 0) → (G, 0). Finally, note that the convergence of h_n^{-1} implies that there is a point c ∈ I where the derivatives of h_n are bounded from above. It follows from the Koebe theorem that {h_n} is a normal family in G, and hence h_n → h ≡ (h^{-1})^{-1}.

The above proof also yields:

Lemma 2.13. For any s ∈ (0, 1), there exists a domain O_s ⊃ [0, 1], such that for any g ∈ E_s with normalization I = [0, 1], the univalent map h in (2.5) is well-defined and has K(s)-bounded distortion in O_s.

We will further refer to the above property by saying that a map g ∈ E_s is cubic up to bounded distortion. The Epstein class of critical circle maps was first introduced by Eckmann and Epstein [EE] as an invariant subspace for the renormalization operator; it was further shown in [EE] that this class contains a fixed point of R with golden mean rotation number. It can be shown using real distortion estimates that renormalizations of any smooth circle map converge to the Epstein class, and by recent work of de Faria and de Melo [dFdM1], the rate of convergence is geometric in the C^r metric. We shall only use the following, weaker statement:

Lemma 2.14. Let f ∈ C^r (r ≥ 3) be a critical circle map with an irrational rotation number. Then the collection of real commuting pairs (f^{q_{m+1}}|I_m, f^{q_m}|I_{m+1}) is precompact in the C^{r−1} topology, and all its partial limits are contained in E_s, for some universal constant s > 0, independent of the original map f.

In particular, for a critical circle map f ∈ E there exists σ > 0 such that all its renormalizations are contained in E_σ. Moreover, the constant σ can be chosen independent of f, after skipping the first few renormalizations.
Lemma 2.15. Let ζ = (η, ξ) ∈ E be a critical commuting pair with ρ(ζ) = 0, which appears as a limit of a sequence {ζ_n} ⊂ E with ρ(ζ_n) ∈ R \ Q. Then the map η has a unique fixed point in the interval I_η, which is necessarily parabolic, with multiplier one.

Proof. The existence of a fixed point in I_η is clear. Since there are arbitrarily small real perturbations of η without a fixed point on the real line, any such point must be parabolic. Let us show that the fixed point is unique. We thank P. Jones for suggesting the following argument. Assume that a and b are two distinct fixed points of η in I_η. Since η is in the Epstein class, it has a well-defined inverse branch ϕ in C_[a,b]. As a and b are fixed points, ϕ([a, b]) = [a, b]. Since ϕ ≠ Id, there exists x_0 ∈ [a, b] with ϕ(x_0) ≠ x_0. Without loss of generality, assume that ϕ(x_0) > x_0. Let dist_P(·, ·) denote the Poincaré distance in C_[a,b]. Set B = dist_P(ϕ(x_0), x_0). By the Schwarz Lemma, dist_P(ϕ(x), x) > B for all x < x_0. On the other hand, dist_P(ϕ(x), x) < const · (x − a)^{-1}. Therefore, ϕ′(a) = 1 + B > 1, a contradiction.

A commuting pair ζ = (η, ξ) ∈ E will be called parabolic if the map η has a unique fixed point in I_η, which has a unit multiplier; this point will usually be denoted p_η. Note that, by virtue of its uniqueness, p_η has to be globally attracting on one side for the interval homeomorphism η|I_η; it is globally attracting on the other side under η^{-1}.

Complex bounds. The main analytic result of the paper [Ya1] was establishing complex a priori bounds for Epstein critical circle maps in the following form:
Theorem 2.16 ([Ya1]). For any s ∈ (0, 1) there exist N = N(s) and D = D_r(0) such that the following holds. Let f ∈ E_s be a critical circle map whose rotation number ρ(f) has the continued fraction representation with at least N + 1 terms. Denote by Ω_m ∋ 0 the domain which triple covers C_{f^{q_{m+1}}(I_m)} under f^{q_{m+1}}. Then

\[ \frac{\operatorname{dist}(f^{q_{m+1}}(z), 0)}{|f^{q_{m+1}}(I_m)|} + d > c \left( \frac{\operatorname{dist}(z, 0)}{|I_m|} \right)^{3}, \quad \text{for all } z \in \Omega_m \text{ with } f^{q_{m+1}}(z) \in D. \tag{2.6} \]

The constants c and d in the above inequality are universal.

As an immediate consequence, for m sufficiently large, we can choose a Euclidean disc Δ ≡ D_{r_m}(0) around the origin, commensurable with I_m, such that the preimages U_m = f^{-q_{m+1}}(D_{r_m}) ∩ Ω_m and V_m = f^{-q_m}(D_{r_m}) ∩ Ω_{m−1} are contained in a concentric disc with a smaller radius r′_m, and moreover, r_m/r′_m is greater than some fixed value K > 1. Thus the real commuting pair (f^{q_{m+1}}|I_m, f^{q_m}|I_{m+1}) extends to a holomorphic pair with range Δ and definite modulus:

\[ \mathcal{H} : (f^{q_{m+1}} : U_m \to \Delta,\; f^{q_m} : V_m \to \Delta). \tag{2.7} \]
For µ ∈ (0, 1) let H(µ) denote the space of holomorphic commuting pairs H : Ω_H → Δ_H, with mod(Δ_H \ Ω_H) > µ, min(|I_η|, |I_ξ|) > µ and diam(Δ_H) < 1/µ.

Lemma 2.17. For each µ ∈ (0, 1) the space H(µ) is sequentially pre-compact, with every limit point contained in H(µ/2).

Proof. Let {H_n} be a sequence of holomorphic pairs in H(µ), with ranges Δ_n. Let η̂_n : Û_n → Δ_n and ξ̂_n : V̂_n → Δ_n be the three-fold extensions of η_n, ξ_n. By passing to a subsequence let us ensure that the Riemann maps R_(U_n,0) converge locally uniformly to a map R. It easily follows from the Koebe Distortion Theorem that R is non-constant and thus a univalent map R ≡ R_(U,0). The family η̂_n is normal in U by Montel's Theorem, and passing to a subsequence again, we have (Û_n, 0, η̂_n) → (U, 0, η̂). The convergence of ξ̂_n is ensured in the same fashion. Clearly, the resulting pair has modulus greater than µ/2, and thus is in H(µ/2).

Straightening theorems for holomorphic commuting pairs. The existence of extensions to holomorphic pairs with complex a priori bounds for high renormalizations of Epstein circle maps, together with an analysis of the shapes of the domains of these extensions, was used in [Ya1] to construct quasiconformal conjugacies between high renormalizations of circle maps with the same rotation number, with universally bounded dilatation. Let ζ be an at least n times renormalizable critical commuting pair. Let us say that the pair of numbers ϑ_n(ζ) = (r_{n−1}, r_{n−2}) forms the history of the pair R^n ζ.

Theorem 2.18 ([Ya1]). For each s ∈ (0, 1) there exists N = N(s) such that the following holds. Let ζ_1 = (η_1, ξ_1) and ζ_2 = (η_2, ξ_2) be two critical commuting pairs in E_s with irrational rotation numbers. Assume that their Nth renormalizations have the same rotation number and the same history. Then for every n ≥ N the renormalizations R^n ζ_1 and R^n ζ_2 extend to holomorphic commuting pairs which are K-quasiconformally conjugate with a universal dilatation bound K.
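The rotation numbers in Theorems 2.16 and 2.18 enter through their continued fraction expansions and the associated return times q_m. As an illustration only, here is a minimal sketch of the standard recurrences for the convergents p_n/q_n. It assumes the convention ρ = [r_0, r_1, ...] = 1/(r_0 + 1/(r_1 + ···)) with p_0 = 0, q_0 = 1; the function name `convergents` is a hypothetical helper, not notation from the paper.

```python
def convergents(r):
    """Return [(p_0, q_0), (p_1, q_1), ...] for rho = [r_0, r_1, ...].

    Assumed convention: p_0 = 0, q_0 = 1 and
        p_{n+1} = r_n * p_n + p_{n-1},   q_{n+1} = r_n * q_n + q_{n-1},
    started from the auxiliary values p_{-1} = 1, q_{-1} = 0.
    """
    p_prev, p = 1, 0  # p_{-1}, p_0
    q_prev, q = 0, 1  # q_{-1}, q_0
    out = [(p, q)]
    for r_n in r:
        p_prev, p = p, r_n * p + p_prev
        q_prev, q = q, r_n * q + q_prev
        out.append((p, q))
    return out


if __name__ == "__main__":
    # Golden mean rotation number [1, 1, 1, ...]: the q_n are Fibonacci numbers.
    for p_n, q_n in convergents([1] * 8):
        print(f"{p_n}/{q_n}")
```

With this convention the first convergent is p_1/q_1 = 1/r_0, and the denominators q_n are exactly the iterate counts appearing in the pairs (f^{q_{m+1}}, f^{q_m}).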
Fig. 3. Illustration to Proposition 2.19 (the domain U and the region W)
For commuting pairs of bounded type this theorem was proved by de Faria [dF1, dF2], with "K" depending on the bound on the rotation number. The argument for the unbounded case is quite delicate, and is based on the complex a priori bounds, as well as the following observation on the geometry of the domain of definition of a holomorphic commuting pair:

Proposition 2.19 ([Ya1]). There exists B > 0 such that for each s ∈ (0, 1) there exists N = N(s) with the following property. Let f be a critical circle map in E_s with ρ(f) ∉ Q. For every n ≥ N the commuting pair (η, ξ) = R^n f extends to a holomorphic commuting pair H with domains U, V, and D and range Δ, so that η : U → Δ ∩ C_{η(J_U)}, where J_U = U ∩ R, and ξ : V → Δ ∩ C_{ξ(J_V)}, where J_V = V ∩ R, are univalent maps. Moreover, for any point z ∈ U \ U_δ([0, f^{q_n − q_{n+1}}(0)]) either Im z > B|[0, f^{q_n − q_{n+1}}(0)]| or z is contained in a possibly empty set W with the following properties. The set W is non-empty only if the iterate f^{q_n}|[f^{q_{n−1}}(0), 0] is a sufficiently small perturbation of a parabolic map. In this case W is an R-symmetric topological disk whose inner diameter is commensurable with [0, f^{q_n − q_{n+1}}(0)]; and there exists a fundamental domain C for the Douady coordinate of f^{q_n} such that the univalent image W = f^{q_{n−1}}(W) is obtained by repeated translation C_0 = C, C_1, C_2, ..., C_k = C of C under f^{q_n}. The interiors of all the (crescent-shaped) regions C_i, except for the last one, are disjoint from C. The intersection (f^{q_{n−1}}(∂U) ∩ W) ∩ H is a connected simple curve Γ (a "horocycle") obtained by translation of a fundamental arc γ ⊂ C under f^{q_n}. A parallel statement holds for the domain V as well.

The proposition is illustrated in Fig. 3. Note that by compactness of the Epstein class the equatorial annulus which the projections of Γ and −Γ cut out on the Douady cylinder of f^{q_n} has definite modulus.
One of the key points in quadratic-like renormalization theory is the Douady–Hubbard Straightening Theorem, which claims that every quadratic-like map is conjugated to a quadratic polynomial via a quasiconformal homeomorphism which is conformal on the Julia set (a so-called hybrid equivalence). In renormalization theory of holomorphic commuting pairs, the role of quadratic polynomials is played by holomorphic pairs generated by standard maps (Lemma 2.6). Below we will establish certain rigidity properties for these pairs, and prove a version of the Straightening Theorem for holomorphic commuting pairs. Lemma 2.20. Let fθ be a map in the standard family. As before, let Aθ denote its lift to the plane, and T (z) = z + 1. Let k ∈ N ∪ {∞} denote the length of the continued
fraction expansion of the rotation number ρ(f_θ). Suppose that for some n < k, the critical commuting pair

\[ \zeta_{\theta,n} = (\eta = T^{-p_{n+1}} \circ A_\theta^{q_{n+1}},\; \xi = T^{-p_n} \circ A_\theta^{q_n}) \tag{2.8} \]

(where p_n/q_n is the nth convergent of ρ(f_θ)) extends to a holomorphic commuting pair H : Ω → Δ. Then any H-invariant Beltrami differential µ with support in K(H) can be extended off Ω to a Beltrami differential µ̂ with support in K(A_θ) which is invariant under A_θ and T (that is, µ̂ projects to a Beltrami differential on the quotient cylinder C/Z, invariant under f_θ).
Proof. Set g_n = T^{-p_n} ∘ A_θ^{q_n} : C → C. Recall that if ρ(f_θ) = [r_0, r_1, ...] then q_{n+1} = r_n q_n + q_{n−1}, with q_0 = 1, q_1 = r_0, and p_{n+1} = r_n p_n + p_{n−1}, with p_0 = 0, p_1 = 1. Let µ be as above, and consider the Beltrami differential µ̂ obtained by pulling µ back by various inverse branches of g_{n+1} = T^{-p_{n+1}} ∘ A_θ^{q_{n+1}} and g_n = T^{-p_n} ∘ A_θ^{q_n}. Observe that µ̂ is invariant under g_{n−1} = g_{n+1} ∘ g_n^{-r_n}. Arguing inductively, we see that µ̂ is invariant under g_0 = A_θ and g_1 = T^{-1} ∘ A_θ^{r_0}, and thus under T as well.

Theorem 2.21 (Straightening). Let ζ ∈ E be a critical commuting pair with an irrational rotation number. Then there exists N such that for all n > N the renormalization R^n(ζ) is K-quasiconformally conjugate to a holomorphic pair H, whose underlying real commuting pair is ζ_{θ,n} (2.8) for some θ ∈ [0, 1), n ∈ N. The constant K is universal, and the conjugacy is conformal on the filled Julia set K(H).

Proof. The existence of the conjugacy is guaranteed by Theorem 2.18. Its conformality on the filled Julia set follows from Lemmas 2.20 and 2.9.

Theorem 2.22 (Straightening of limiting pairs). Let ζ ∈ E be a critical commuting pair with an irrational rotation number. Assume that there is a sequence {n_k} ⊂ N such that some holomorphic pair extensions H_k of R^{n_k} ζ converge in H to a pair H : Ω_H → Δ_H, such that H|R has a parabolic periodic orbit. Then we can find θ ∈ [0, 1) and N ∈ N such that the critical commuting pair ζ_{θ,N} extends to a holomorphic commuting pair G : Ω_G → Δ_G and there exists a K-quasiconformal map φ : Δ_H → Δ_G which is a conjugacy: H = φ^{-1} ∘ G ∘ φ. The constant K is universal, and φ is conformal on the filled Julia set K(H).

Proof. The existence of G, K-quasiconformally conjugate to H, is guaranteed by Theorem 2.18 and compactness of quasiconformal maps. Let ψ : Δ_H → Δ_G be a conjugacy. Consider a new complex structure µ on Δ_G given by the pullback (ψ^{-1})*σ_0 on K(G) and the standard structure σ_0 elsewhere. By Lemma 2.20 the structure µ can be extended to a global structure µ̂ invariant under A_θ and T. Let q : C → C be the solution of the Beltrami equation ∂̄q/∂q = µ̂ fixing 0, 1, and ∞. By Lemma 2.8, q ∘ G ∘ q^{-1} = G. The map φ = q ∘ ψ is the conjugacy with the desired properties.

3. Parabolic Points, Their Perturbations, and Parabolic Renormalization

General theory. We begin with a brief review of the theory of parabolic bifurcations, as applied in particular to an interval map in the Epstein class. For a more comprehensive exposition the reader is referred to [Do]; supporting technical details may be found in [Sh]. Fix a map η_0 ∈ E having a parabolic fixed point p with unit multiplier.
Theorem 3.1 (Fatou Coordinates). There exist topological discs U^A and U^R, called attracting and repelling petals, whose union is a punctured neighborhood of the parabolic periodic point p such that

\[ \eta_0(\bar U^A) \subset U^A \cup \{p\}, \quad \text{and} \quad \bigcap_{k=0}^{\infty} \eta_0^{k}(\bar U^A) = \{p\}, \]
\[ \eta_0^{-1}(\bar U^R) \subset U^R \cup \{p\}, \quad \text{and} \quad \bigcap_{k=0}^{\infty} \eta_0^{-k}(\bar U^R) = \{p\}, \]

where η_0^{-1} is the univalent branch fixing p. Moreover, there exist injective analytic maps L^A : U^A → C and L^R : U^R → C, unique up to post-composition by translations, such that L^A(η_0(z)) = L^A(z) + 1 and L^R(η_0(z)) = L^R(z) − 1. The Riemann surfaces C^A = U^A/η_0 and C^R = U^R/η_0 are conformally equivalent to the cylinder C/Z.

We denote by π_A : U^A → C^A and π_R : U^R → C^R the natural projections. The quotients C^A and C^R are customarily referred to as Écalle–Voronin cylinders. The real axis projects to natural equators E^A ⊂ C^A and E^R ⊂ C^R. By Liouville's Theorem, any conformal transit homeomorphism τ : C^A → C^R fixing the ends ± is a translation. Lifting it produces a map τ̄ : U^A → C satisfying τ ∘ π_A = π_R ∘ τ̄. We will write τ ≡ τ_θ, and τ̄ = τ̄_θ, where L^R ∘ τ̄ ∘ (L^A)^{-1}(z) ≡ z + θ mod Z. For z ∈ U^A ∩ U^R, we define the Écalle–Voronin transformation E(π_R(z)) = π_A(z). We may regard C^A and C^R as Riemann spheres with distinguished points ± filling in the punctures. The Écalle–Voronin transformation extends to an analytic map of the neighborhoods W(+), W(−) of the two ends of C^R. Composing with a transit homeomorphism yields an analytic dynamical system F_τ = τ ∘ E : W(+) ∪ W(−) → C^R with fixed points ±. The product of the corresponding eigenvalues is clearly independent of τ; noting that each of the components W(±) is mapped onto the whole C^R and applying the Schwarz Lemma we conclude that

\[ |F_\tau'(+)| \cdot |F_\tau'(-)| > 1. \tag{3.1} \]
For an Epstein map η in a sufficiently small neighborhood of η_0 the parabolic point splits into a complex conjugate pair of repelling fixed points p_η ∈ H and p̄_η with multipliers e^{2πi±α(η)}. In this situation one may still speak of attracting and repelling petals:
Lemma 3.2 (Douady Coordinates). There exists a neighborhood U(η_0) ⊂ E of the map η_0 such that for any η ∈ U(η_0) with |arg α(η)| < π/4, there exist topological discs U_η^A and U_η^R whose union is a neighborhood of p, and injective analytic maps

\[ L^A_\eta : U^A_\eta \to \mathbb{C} \quad \text{and} \quad L^R_\eta : U^R_\eta \to \mathbb{C} \]

unique up to post-composition by translations, such that

\[ L^A_\eta(\eta(z)) = L^A_\eta(z) + 1 \quad \text{and} \quad L^R_\eta(\eta(z)) = L^R_\eta(z) - 1. \]

The quotients C_η^A = U_η^A/η and C_η^R = U_η^R/η are Riemann surfaces conformally equivalent to C/Z.

Note that the Holomorphic Index Formula (see e.g. [Mil]) implies that the condition on the arguments of the eigenvalues of the repelling fixed points of η is automatically satisfied for all maps in a sufficiently small neighborhood of η_0. An arbitrary choice of real basepoints a ∈ U^A and r ∈ U^R enables us to specify the Fatou and Douady coordinates uniquely, by requiring that L^A(a) = L^A_η(a) = 0, and L^R(r) = L^R_η(r) = 0. The following fundamental theorem first appeared in [DH]:

Theorem 3.3. With these normalizations we have

\[ L^A_\eta \to L^A \quad \text{and} \quad L^R_\eta \to L^R \]

uniformly on compact subsets of U^A and U^R respectively. Moreover, select the smallest n(η) ∈ N for which η^{n(η)}(a) ≥ r. Then

\[ \eta^{n(\eta)}(z) = (L^R_\eta)^{-1} \circ T_{\theta(\eta)+K} \circ L^A_\eta(z) \]

wherever both sides are defined. In this formula T_a(z) denotes the translation z ↦ z + a, θ(η) ∈ [0, 1) is given by

\[ \theta(\eta) = 1/\alpha(\eta) + \underset{\alpha(\eta) \to \infty}{o(1)} \bmod 1, \]

and the real constant K is determined by the choice of the basepoints a, r. Thus for a sequence {η_k} ⊂ U(η_0) converging to η_0, the iterates η_k^{n(η_k)} converge locally uniformly if and only if there is a convergence θ(η_k) → θ, and the limit in this case is a certain lift of the transit homeomorphism τ_θ for the parabolic map η_0.

Parabolic renormalization of commuting pairs. Let ζ = (η, ξ) ∈ E be a parabolic commuting pair. Denote by p ∈ I_η the parabolic fixed point of η. The pair ζ has zero rotation number and is, therefore, not renormalizable. In what follows we will use the above discussed local theory of parabolic germs to describe a parabolic renormalization construction for a parabolic pair ζ, which will naturally supplement the usual renormalization procedure. A parallel construction for parabolic quadratic-like maps is due to Douady and Devaney [DD]; see also [BDDS] for a published version. Let U^A, U^R, C^A, and C^R be as above. Fix an arbitrary θ ∈ T and consider the transfer isomorphism τ ≡ τ_θ : C^A → C^R which preserves the equators (i.e. a rigid rotation). Let N ≥ 0 be such that η^N(ξ(0)) ∈ C^A. Fix a branch π_R^{-1} and take the smallest M for which

\[ \eta^M \circ \pi_R^{-1} \circ \tau \circ \pi_A \circ \eta^N \circ \xi(0) \in [0, \eta(0)]. \]
Fig. 4. Parabolic renormalization of a commuting pair
Clearly, the composition γ ≡ η^M ∘ π_R^{-1} ∘ τ ∘ π_A ∘ η^N ∘ ξ has a well-defined extension to the whole interval [0, η(0)] which is independent of the choice of the branch of π_R^{-1}.

Definition 3.1. The parabolic renormalization of the commuting pair ζ = (η, ξ), corresponding to θ ∈ T, is the normalized pair P_θ ζ = (γ|[η(0),0], η|[0,γ(0)]).

We illustrate this procedure in Fig. 4. Again, we will refer to the non-rescaled commuting pair (γ, η) as the pre-parabolic renormalization pP_θ ζ of the pair ζ. As an immediate consequence of continuity of Douady coordinates (Theorem 3.3) we have the following:

Lemma 3.4. Let ζ_k = (η_k, ξ_k) ∈ E be a converging sequence of renormalizable commuting pairs, ζ_k → ζ = (η, ξ), with rotation numbers ρ_k → 0. Assume that their renormalizations also converge, Rζ_k → ζ̃. Then for some choice of θ ∈ T we have P_θ ζ = ζ̃.

Similarly to Proposition 2.3 we have:

Proposition 3.5. For any ζ ∈ C there exists at most one parabolic commuting pair ζ_{-1} ∈ C^∞ and a unique choice of θ ∈ T such that P_θ ζ_{-1} = ζ.

Proof. Set ζ = (η, ξ) and suppose that there exists a parabolic commuting pair ζ_{-1} = (ξ, γ) with pP_θ ζ_{-1} = ζ for some choice of θ. In this case, η = ξ^M ∘ τ̄ ∘ ξ^N ∘ γ for a choice of the local lift τ̄ of the transit map τ = τ_θ, and some M, N ∈ N. The monotonicity of the dependence of the composition ξ^M ∘ τ̄ ∘ ξ^N ∘ γ on θ implies that the choice of θ is unique for given ζ_{-1}. The uniqueness of ζ_{-1} is now proved as in Proposition 2.3.
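The long transit through the "gate" of a perturbed parabolic point, which underlies Theorem 3.3 and Lemma 3.4, can be seen in a toy model. The sketch below is an illustration under stated assumptions and not the construction of the paper: it uses the real quadratic family x ↦ x + x² + ε as a hypothetical stand-in for a perturbation of a parabolic map (for ε > 0 the fixed point at 0 splits into the complex pair ±i√ε with multipliers 1 ± 2i√ε), and it counts the iterates needed to pass from a basepoint a < 0 to a basepoint r > 0, the analogue of n(η).

```python
import math


def transit_time(eps, a=-0.5, r=0.5):
    """Smallest n with f^n(a) >= r for the toy map f(x) = x + x*x + eps, eps > 0.

    The map, the basepoints a and r, and this function name are illustrative
    assumptions; they play the roles of eta, a, r and n(eta) in the text.
    """
    if eps <= 0:
        raise ValueError("eps must be positive, otherwise the orbit never passes 0")
    x, n = a, 0
    while x < r:
        x = x + x * x + eps
        n += 1
    return n


if __name__ == "__main__":
    for eps in (1e-2, 1e-4, 1e-6):
        n = transit_time(eps)
        # For small eps the product n * sqrt(eps) approaches pi.
        print(f"eps = {eps:g}   n = {n}   n*sqrt(eps) = {n * math.sqrt(eps):.4f}")
```

In this model the transit time grows like π/√ε, so arbitrarily small changes of ε shift it by whole iterates; this is the toy counterpart of the requirement that the phases θ(η_k) converge before the iterates η_k^{n(η_k)} can converge.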
Moreover, we have

Proposition 3.6. The sets R(C) and ∪_{θ∈T} P_θ({ζ ∈ C^∞, ζ is parabolic}) are disjoint.

Proof. Let ζ = (η, ξ) ∈ C and assume that there exists a parabolic critical commuting pair ζ_{-1} and a choice of θ ∈ T such that P_θ(ζ_{-1}) = ζ. Then for every r ∈ N the map ξ does not have a critical point at ξ^{-r} ∘ η(0), and hence (ξ, ξ^{-(r−1)} ∘ η) ∉ C.

4. Towers of Holomorphic Commuting Pairs

4.1. Forward-infinite towers.

Definition 4.1. A forward tower of Epstein holomorphic commuting pairs is an element of the product space H^N, for some N ≤ ∞, T = (H_i)_{i=1}^N, for which the following holds. If ζ_i = (η_i, ξ_i) denotes the real commuting pair underlying H_i, then either χ(ζ_i) ≠ ∞, and ζ̃_{i+1} = pRζ_i, or χ(ζ_i) = ∞, and ζ̃_{i+1} = pP_{θ_i} ζ_i, for some choice of θ_i ∈ T, for all i ≤ N.

The pair H_1 will be referred to as the base pair of the tower T. We shall denote by T the space of all towers. For an element T ∈ T its rotation number ρ(T) is the sequence ρ(T) = (χ(ζ_1), χ(ζ_2), ...). The domain and range of the tower T are the domain Ω_1 and range Δ_1 of the base pair H_1. The dynamics F(T) associated to the tower T is the collection of all finite compositions f = h_1 ∘ ··· ∘ h_k, where h_n is an element of H_i or a local lift of some τ_{θ_i}. For a point z ∈ Ω_1, the orbit of z under T is the set O_T(z) of images of z under all elements of F(T) which are defined at z. We say that z is an escaping point if O_T(z) ∩ (Δ_1 \ Ω_1) ≠ ∅. Non-escaping points form the filled Julia set K(T); its boundary is the Julia set J(T). Two towers T_1 = (H_i^1)_{i=1}^N and T_2 = (H_i^2)_{i=1}^N with domains Ω_1 and Ω_2 are conjugate (quasiconformally, conformally, etc.) if there is a homeomorphism φ between open neighborhoods of K(T_1), K(T_2) having the appropriate smoothness, satisfying

\[ H_i^2 \circ \phi = \phi \circ H_i^1 \quad \text{for all } i \le N. \]
We will mostly be concerned with forward-infinite towers, that is the case N = ∞. The successive renormalizations of a holomorphic pair with an irrational rotation number provide one type of example of such towers. A more general approach to constructing forward-infinite towers is the following. Let ζ^k = (η^k, ξ^k) be a sequence of Epstein commuting pairs with irrational rotation numbers, which can be extended to holomorphic commuting pairs H^k ∈ H. Hence for every k and i ∈ N the ith pre-renormalization pR^i ζ^k can also be extended to a holomorphic pair H_i^k. By complex a priori bounds we may ensure (by passing to a subsequence in k) that for every i ∈ N the pairs H_i^k → Ĥ_i as k → ∞. Then by Lemma 3.4, the sequence T̂ = (Ĥ_1, Ĥ_2, ..., Ĥ_i, ...) forms a tower. Such towers will be referred to as forward-infinite limiting towers. Theorem 3.3 implies the following:
Proposition 4.1. Let T̂ be a limiting tower constructed as above. For any map f ∈ F(T̂), and an open subset W compactly contained in the domain of definition of f, there exists a sequence of analytic maps {f_l} defined on W with f_l ⇒ f in the uniform metric, where each f_l is a finite composition of elements of various H_i^k.

We will further distinguish the case when the ζ^k are pre-renormalizations of the same commuting pair, ζ^k = pR^{n_k} ζ, by referring to T̂ as a renormalization limiting tower. Finally, we will say that a limiting tower T̂ is a standard tower if its base pair H_1 has the property H_1|R = ζ_{θ,n} as in (2.8) for some θ, n. In view of the Straightening Theorems the following observation will be of importance to us:

Theorem 4.2 (Combinatorial rigidity of standard towers). Let T = (H_i) and T̃ = (H̃_i) be two standard towers. Assume that the base real commuting pairs ζ_1 = H_1|R and ζ̃_1 = H̃_1|R and the rotation numbers ρ(T) and ρ(T̃) coincide. Then H_i|R = H̃_i|R for all i ∈ N.

In the case when ρ(ζ_1) is irrational the claim is obvious. In the case when ρ(ζ_1) is rational we need to prove the following:

Lemma 4.3. Let ζ ∈ E be a parabolic commuting pair. Let {r_i}_1^m, m ≤ ∞, be a sequence of positive integers, ending in ∞ if m is finite, and set ρ = [r_1, ...]. Then there exists a unique choice of θ ∈ T for which ρ(P_θ ζ) = [r_1, ...], and in the case when ρ ∈ Q (that is m < ∞) R^{m−1}(P_θ ζ) is a parabolic pair.

Proof. We need to introduce some further notation. Let p denote the parabolic fixed point of η, and let C^A and C^R be as in Sect. 3. The dynamics of the commuting pair (η, ξ) induces a natural return map of the equators of the Écalle cylinders S : E^R → E^A. For a choice of the transit isomorphism τ_θ : C^A → C^R fixing the equators the composition f_θ = τ_θ ∘ S : E^A → E^A is a critical circle map. It is easily seen that f_θ is topologically conjugate to the circle map f_{P_θ ζ} as in (2.4), and thus ρ(f_θ) = ρ(P_θ ζ). For every x ∈ T the dependence θ ↦ f_θ(x) is strictly monotone. The standard considerations (see e.g. [MvS]) imply that ψ : θ ↦ ρ(f_θ) is a non-decreasing degree one continuous map of the circle; moreover ψ^{-1}(ρ) is a single point for each irrational ρ. It remains therefore to consider the case when ρ = p/q is rational. The existence of θ satisfying the conditions of the lemma follows from Lemma 3.4; let us establish its uniqueness. Let F_θ : R → R be a lift of f_θ having singularities at integer points, with F_θ(0) ∈ [0, 1). The map f_θ has rotation number p/q if and only if F_θ^q(x) = x + p for some x ∈ R. If θ satisfies the conditions of the lemma, the pair P_θ ζ has a unique periodic orbit. The uniqueness of the orbit implies that F_θ^q(x) ≤ x + p for all x, or F_θ^q(x) ≥ x + p for all x. Monotone dependence of f_θ on θ now implies that the value of θ realizing each of these two possibilities is unique. Moreover, since the family f_θ contains no rigid rotations, the two values are distinct. Finally we note that by our conventions the continued fraction expansions of ρ(P_θ ζ) in these two situations will differ.

Let us formulate a version of the Straightening Theorem for renormalization limiting towers:
Theorem 4.4. Any two forward-infinite renormalization limiting towers with the same rotation number are conjugate by a quasiconformal homeomorphism.

Let us remark that in the case when the towers contain no parabolic elements, the theorem is an immediate consequence of Theorem 2.21. In this case, the conjugacy is conformal on the filled Julia sets of the towers. The conformality statement for the complementary case will be established further in this section.

Proof of Theorem 4.4. Let T = (H_1, H_2, ...) be a limiting tower for a sequence T_i = (H_1^i, H_2^i, ...), where H_1^i|R = R^{k_i} ζ for ζ ∈ E, ρ(ζ) ∉ Q. Select a standard tower S with ρ(S) = ρ(T). To prove our claim, it suffices to show that T and S are conjugate by a quasiconformal homeomorphism. To this end, denote by ζ̂ = ζ_{θ,n} the base pair of S, and let ζ̂_i = ζ_{θ_i,n} be a sequence of perturbations ζ̂_i → ζ̂ such that ρ(ζ̂_i) = ρ(R^{k_i} ζ). Let S_i denote the standard tower with base pair ζ̂_i. By complex a priori bounds and Theorem 2.21, the towers T_i and S_i are K-quasiconformally conjugate with a uniform bound K. The claim follows from the compactness of quasiconformal mappings.

4.2. Bi-infinite towers. A bi-infinite tower is an element of the product space H^Z, T = (..., H_{-n}, ..., H_0, ..., H_n, ...), with the following properties. For each i ∈ Z, setting ζ_i = (η_i, ξ_i) = H_i ∩ R, we either have χ(ζ_i) ≠ ∞ and ζ̃_{i+1} = pRζ_i; or χ(ζ_i) = ∞ and ζ̃_{i+1} = pP_{θ_i} ζ_i for some choice of θ_i ∈ T. The rotation number of T is naturally defined to be the bi-infinite sequence ρ(T) = (..., χ(ζ_{-i}), ..., χ(ζ_0), ..., χ(ζ_i), ...).

Examples of bi-infinite towers can be constructed from renormalizations of Epstein commuting pairs with irrational rotation numbers as follows. Let ζ_k ∈ E with ρ(ζ_k) ∈ R \ Q, and select a sequence of positive integers i_k ↗ ∞. Let

\[ T_k = (H^k_{-i_k}, H^k_{-i_k+1}, \dots, H^k_0, H^k_1, \dots) \]

be a forward-infinite tower for which

\[ H^k_{j-i_k}|_{\mathbb{R}} = \lambda_k^{-1} \circ p\mathcal{R}^j \zeta_k \circ \lambda_k, \]

where the homothety λ_k is chosen so that H_0^k = R^{i_k} ζ_k. By passing to a subsequence in k we may ensure that H_n^k → H_n as k → ∞ for all n ∈ Z. Then the sequence T = (H_n)_{-∞}^{∞} is a bi-infinite tower. We call such towers limiting, and if in addition ζ_k = R^{n_k} ζ for some ζ ∈ E, we say that T is a bi-infinite limiting renormalization tower. Our main result transforms into the following rigidity theorem for bi-infinite limiting renormalization towers:
Theorem 4.5 (Tower Rigidity). Let $T^1 = (H_k^1)_{-\infty}^{\infty}$ and $T^2 = (H_k^2)_{-\infty}^{\infty}$ be two limiting renormalization towers with the same rotation number. Then there exists a homothety $\lambda : \mathbb{R} \to \mathbb{R}$ such that for all $k \in \mathbb{Z}$ we have
$$H_k^1|_{\mathbb{R}} = \lambda^{-1} \circ H_k^2|_{\mathbb{R}} \circ \lambda.$$
The proof of Theorem 4.5 is broken into two steps. We first establish
M. Yampolsky
Theorem 4.6. Let $T^1 = (H_k^1)_{-\infty}^{\infty}$ and $T^2 = (H_k^2)_{-\infty}^{\infty}$ be two bi-infinite limiting renormalization towers with the same rotation number. Then there exists a quasiconformal homeomorphism $\phi : \mathbb{C} \to \mathbb{C}$ conjugating the towers:
$$H_k^1 = \phi^{-1} \circ H_k^2 \circ \phi, \quad \text{for all } k \in \mathbb{Z}.$$

Proof. By Theorem 4.4 the towers
$$T_n^1 = (H_n^1, H_{n+1}^1, \dots) \quad \text{and} \quad T_n^2 = (H_n^2, H_{n+1}^2, \dots)$$
are quasiconformally conjugate by a homeomorphism $\phi_n : \Delta_n^1 \to \Delta_n^2$ of the domains of $H_n^1$ and $H_n^2$ respectively, whose dilatation is bounded by a universal constant. By complex a priori bounds, the domains $\Delta_n^i$ fill out the plane; choosing a diagonal subsequence of maps $\phi_{n_i}$ converging in each $\Delta_n$ we obtain a limiting quasiconformal mapping $\phi : \mathbb{C} \to \mathbb{C}$ with the required properties.

We now proceed to formulate

Theorem 4.7. A bi-infinite limiting renormalization tower admits no nontrivial invariant Beltrami differentials.

Theorem 4.7 readily implies Theorem 4.5, since the Beltrami differential of the quasiconformal conjugacy $\phi$ produced by Theorem 4.6 is invariant under the tower $T^1$, and hence equal to zero almost everywhere. Thus the conjugacy $\phi : \mathbb{C} \to \mathbb{C}$ is conformal and therefore a homothety. The proof of Theorem 4.7 is based on hyperbolic metric expansion techniques developed by McMullen in [McM2] and used by Hinkle in a context parallel to ours in [Hin]. It will occupy the remainder of this section.
4.3. Expansion of the hyperbolic metric. We fix a bi-infinite renormalization limiting tower $T = (H_k)_{-\infty}^{\infty}$, $H_k|_{\mathbb{R}} = \zeta_k = (\eta_k, \xi_k)$. We will refer to $h \in F(T)$ as a map of level $k$ if $h$ is either an element of the holomorphic commuting pair $H_k$, that is one of the maps $\eta_k$, $\xi_k$, or $\eta_k \circ \xi_k$, or a lift of a transit homeomorphism $\tau_\theta$ associated to the parabolic point of $\eta_k$. The domain of $h$, denoted by $D(h)$, will in the first case denote the domain of the extension of $h$ to a degree three proper map onto the plane; in the second case it will be the domain of the maximal extension of $h$ provided by the Fatou coordinates. Given a hyperbolic Riemann surface $X$ we shall denote by $d_X(\cdot, \cdot)$ the Poincaré distance on $X$. For a differentiable map $f : X \to Y$ of hyperbolic Riemann surfaces, $\|f'(x)\|_{X,Y}$ will stand for the norm of the derivative with respect to the hyperbolic metrics on the pre-image and the image; we will simply write $\|f'(x)\|$ if $X = Y = \mathbb{C} \setminus \mathbb{R}$. The expansion properties of bi-infinite towers rely on the following lemma (see [McM2]):

Lemma 4.8. There exists a continuous and increasing function $C(s) < 1$ with $C(s) \to 0$ as $s \to 0$ such that for the inclusion $\iota$ of a hyperbolic Riemann surface $X$ into a hyperbolic Riemann surface $Y$, $\|\iota'(z)\|_{X,Y} < C(s)$, where $s = d_Y(z, Y \setminus X)$.

The following estimate for the variation of the norm of the derivative of an analytic map of hyperbolic Riemann surfaces is a consequence of the Koebe Distortion Theorem:
Lemma 4.9 (Corollary 2.27 [McM1]). Let $f : X \to Y$ be an analytic map between hyperbolic Riemann surfaces with nowhere vanishing derivative. Suppose, in addition, that $X \subset Y$. Then for $z_1, z_2 \in X$ we have
$$\|f'(z_1)\|_{Y,Y}^{1/\alpha} \le \|f'(z_2)\|_{Y,Y} \le \|f'(z_1)\|_{Y,Y}^{\alpha},$$
where $\alpha = \exp(K d_Y(z_1, z_2))$ with a universal constant $K > 0$.

We now apply the above expansion principles to the setting of limiting renormalization towers:

Lemma 4.10. Let $T = (H_1, H_2, \dots)$ be a limiting renormalization tower, $h \in F(T)$. As before, denote by $\Omega_n = U_n \cup V_n \cup D_n$ and $\Delta_n$ the domain and range of the holomorphic pair $H_n$. We have the following:
(I) $\|h'(z)\| \ge 1$ for any $z \in D(h) \setminus \mathbb{R}$;
(II) There exists a universal constant $C > 1$ such that $\|h'(z)\| \ge C$ if $z \in \Omega_n \setminus \mathbb{R}$ and $h$ is an element of level $n$ for which $h(z) \in \Delta_n \setminus \Omega_n$.

Proof. Let us consider a holomorphic commuting pair $H : \Omega \to \Delta$ whose underlying real commuting pair $\zeta = H|_{\mathbb{R}}$ is in the Epstein class, and has an irrational rotation number. We begin by establishing (I) and (II) for the elements of $H$. To simplify the notation, let us further assume that $\zeta = (\eta, \xi)$ is a pre-renormalization of a critical circle mapping $f$, that is,
$$\eta = f^{q_{n+1}} : U \to \Delta \cap \mathbb{C}_{\eta(J_U)}, \qquad \xi = f^{q_n} : V \to \Delta \cap \mathbb{C}_{\xi(J_V)}.$$
Consider first the map $\eta : U \to \Delta \cap \mathbb{C}_{\eta(J_U)}$. By the convention we have made, $D(\eta)$ will denote the domain of its three-fold extension $\eta : D(\eta) \to \mathbb{C}_{\eta(J_U)}$. Set $D \equiv D(\eta) \setminus \eta^{-1}\mathbb{R} \subset \mathbb{C} \setminus \mathbb{R}$, and $\iota : D \hookrightarrow \mathbb{C} \setminus \mathbb{R}$. By Lemma 4.8,
$$\|\iota'(z)\|_{D, \mathbb{C} \setminus \mathbb{R}} < C(s) < 1, \quad \text{where } s = \mathrm{dist}_{\mathbb{C} \setminus \mathbb{R}}(z, \partial D).$$
On the other hand, $\eta : D \to \mathbb{C} \setminus \mathbb{R}$ is a local isometry with respect to the hyperbolic metrics, that is $\|\eta'(z)\|_{D, \mathbb{C} \setminus \mathbb{R}} = 1$. Thus, $\|\eta'(z)\| > C(s)^{-1} > 1$, which proves (I) for the map $\eta$.

Let us now establish (II), that is, show that for $z \in U$ with $\eta(z) \in \Delta \setminus \Omega$, $\|\eta'(z)\| > C > 1$ for some fixed $C$. Note that $U \cap \mathbb{R} = J_U = [0, f^{q_n - q_{n+1}}(0)]$, and $\eta([0, f^{q_n - q_{n+1}}(0)]) = [f^{q_{n+1}}(0), f^{q_n}(0)]$, which is well inside $\Omega$. This, real a priori bounds, and the Koebe Theorem, imply that for some universal $\delta > 0$, $f^{q_{n+1}}(U_\delta([0, f^{q_n - q_{n+1}}(0)])) \subset \Omega$. Now let $z$ be a point of $U$ with $\eta(z) = f^{q_{n+1}}(z) \notin \Omega$. In the notations of Proposition 2.19, let $z \notin W$. Then $\mathrm{dist}_{\mathbb{C} \setminus \mathbb{R}}(z, \partial D) < s_0$ with some universal bound $s_0$ and we have (II) for $z$. Otherwise, $w = f^{q_{n-1}}(z)$ is contained in one of $C_i$. Let $m$ denote the iterate $c' = f^{m q_n}(c) \in C'$. Compactness considerations on the shape of $D(f^{q_n})$ now imply $\mathrm{dist}_{\mathbb{C} \setminus \mathbb{R}}(c', \partial D(f^{q_n})) > s_1 > 0$ for some universal $s_1$. Hence $\|Df^{q_n}(c')\| > C(s_1) > 1$. Combining this and property (I) with the chain rule for the decomposition $f^{q_{n+1}} = f^{q_n} \circ \cdots \circ f^{q_n} \circ f^{q_{n-1}}$ we have $\|\eta'(z)\| > C(s_1) > 1$. The statement (I) is proved in the same way for $\xi$ and $\eta \circ \xi$; (II) is obvious for $\eta \circ \xi$ and is proved in an identical fashion for $\xi$. Returning to the original claim, since every $H_i$ can be perturbed to an element of $\mathbf{H}$ with an irrational rotation number by an arbitrarily small perturbation, the statement follows by continuity.
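The mechanism behind part (I) can be checked numerically in a toy model. The sketch below is only an illustration under stated assumptions, not the setting of the lemma: it replaces $\mathbb{C} \setminus \mathbb{R}$ by the unit disk $Y$, takes $X$ to be the concentric disk of radius $r$, and uses the conformal isomorphism $f(z) = z/r$ from $X$ onto $Y$. The function names are hypothetical. Since $f$ is an isometry from $X$ to $Y$ and the inclusion of $X$ into $Y$ contracts, $f$ expands the metric of $Y$ by exactly the reciprocal of the contraction factor.

```python
# Toy model (an assumption for illustration): Y = unit disk, X = disk of radius r < 1,
# and f(z) = z / r, which maps X conformally onto Y.

def rho_disk(z, radius=1.0):
    """Density of the hyperbolic metric (curvature -1) of the disk {|z| < radius}."""
    return 2.0 * radius / (radius ** 2 - abs(z) ** 2)

def norm_inclusion(z, r):
    """||iota'(z)||_{X,Y} for the inclusion iota of X into Y."""
    return rho_disk(z, 1.0) / rho_disk(z, r)

def norm_f_XY(z, r):
    """||f'(z)||_{X,Y}; f is an isometry from X onto Y, so this should be 1."""
    return (1.0 / r) * rho_disk(z / r, 1.0) / rho_disk(z, r)

def norm_f_YY(z, r):
    """||f'(z)||_{Y,Y}: the derivative norm measured in the metric of Y on both sides."""
    return (1.0 / r) * rho_disk(z / r, 1.0) / rho_disk(z, 1.0)

r = 0.5
samples = [0.0, 0.1 + 0.2j, -0.3j, 0.45]   # points of X
for z in samples:
    # contraction of the inclusion, and the resulting expansion of f
    print(z, norm_inclusion(z, r), norm_f_YY(z, r))
```

In this model the expansion factor is smallest at the center, where it equals $1/r$, and grows without bound as $z$ approaches the boundary of $X$, which mirrors the role of the distance $s$ in Lemma 4.8.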
4.4. The structure of the filled Julia set of a forward-infinite limiting renormalization tower. In this section we prove the following theorem:

Theorem 4.11. The filled Julia set $K(T)$ of a limiting renormalization tower $T$ has no interior, $K(T) = J(T)$. Moreover, the repelling periodic orbits of the elements of $F(T)$ are dense in $J(T)$.

Lemma 4.12. Let $T$ be a limiting renormalization tower. Then every non-repelling periodic orbit of $T$ contains the parabolic point of some $\zeta_n$ in $T$.

Proof. Since the claim of the lemma is certainly true for periodic orbits in the real line, let us assume that $z_1, \dots, z_n$ is a periodic orbit of a map $h \in F(T)$ disjoint from $\mathbb{R}$. By Lemma 4.10, $\|Dh^{\circ n}(z_1)\| > 1$, which implies that the orbit $z_1, \dots, z_n$ is repelling.

Let us first establish the density of the repelling orbits in $J(T)$:

Lemma 4.13. Repelling periodic orbits of maps in $F(T)$ are dense in $J(T)$. Moreover, for any point $z \in J(T)$ there exists an element $h \in F(T)$ mapping a neighborhood of $z$ onto the domain of $T$.

Proof. Let $W$ be an open neighborhood of a point $p \in J(T)$. Clearly, there is an element $h \in F(T)$ such that $h(W)$ intersects the Julia set of a holomorphic pair $H \in T$. Let us perturb $H$, if necessary, to a pair $H'$ with an irrational rotation number, so that $W$ still intersects $J(H')$. By reducing $W$, if necessary, we may assume that $W \cap \mathbb{R} = \emptyset$. By the Straightening Theorem, there exists a holomorphic commuting pair $G \in \mathbf{H}$ with $G|_{\mathbb{R}} = \zeta_{\theta,n}$, such that $G$ and $H'$ are quasiconformally conjugate. Denote by $G$ a holomorphic extension of the standard pair $\zeta_{\theta,0}$ quasiconformally conjugate to $H'$. The conjugacy maps $W$ to a neighborhood $\tilde W$ intersecting the Julia set of the Arnold map $A_\theta$. Thus there is an iterate of $A_\theta$ mapping $\tilde W$ over the whole domain of $G$. The considerations of convergence imply the existence of an element $h_W \in F(T)$ mapping $W$ onto the domain of $T$. The existence of a repelling periodic point of $T$ in $W$ follows from Lemma 4.12.

To prove Theorem 4.11 it remains to show that $K(T)$ has no interior. Let us call a component $U$ of the interior of $K(T)$ wandering if it is disjoint from its forward images under maps of $F(T)$. A non-wandering component will be called periodic. A modification of the argument used in the above lemma implies that a component of the interior $\mathring{K}(T)$ does not intersect the Julia set of any of the holomorphic pairs $H$ forming $T$. This implies that every map $h \in F(T)$ defined on a subset of $U$ is defined on all of $U$. Thus for a periodic component $U$ there exists an $h \in F(T)$ with $h(U) = U$. The disjointness from the Julia sets also implies that $U \cap \mathbb{R} = \emptyset$ for every $U \subset \mathring{K}(T)$.

Lemma 4.14. The filled Julia set of a limiting renormalization tower has no wandering interior components.

Proof. In view of Theorem 4.4 it is sufficient to give a proof in the case of a standard tower. Let $T$ be such a tower, and let $\zeta_{\theta,n}$ be its base pair. Assume that the statement of the lemma does not hold for $T$. The filled Julia set of $T$ naturally embeds into $\mathbb{C}/\mathbb{Z}$, and we may consider the grand orbit $\{U_i\}$ of the wandering domain under $F(T)$ and Arnold’s
map $f_\theta : \mathbb{C}/\mathbb{Z} \to \mathbb{C}/\mathbb{Z}$. Since the interior components of $K(T)$ do not intersect the real line, every domain $U_i$ in the grand orbit is disjoint from $\mathbb{T}$, and the branches mapping $U_i \to U_j$ are all univalent. The appropriate case of Sullivan’s No Wandering Domains Theorem can be directly translated to our setting to construct non-trivial quasiconformal deformations of the tower $T$ which do not deform the base pair $\zeta_{\theta,n}$ (by Lemma 2.8). This contradicts Theorem 4.2.

We recall the following fundamental principles of dynamics on hyperbolic Riemann surfaces (see e.g. [Lyu1]).

Proposition 4.15. Let $h : U \to U$ be an analytic transform of a hyperbolic Riemann surface. Then one of the following four possibilities holds:
(I) $h$ has an attracting or superattracting fixed point in $U$ to which all points converge;
(II) all orbits tend to infinity;
(III) $h$ is conformally conjugate to an irrational rotation of a disk, a punctured disk or an annulus;
(IV) $h$ is a conformal homeomorphism of finite order.

The next proposition expands on the second possibility:

Proposition 4.16. Let $h$ be an analytic transform of a hyperbolic domain $U \subset \hat{\mathbb{C}}$, continuous up to the boundary of $U$. Suppose that the set of fixed points of $h$ on $\partial U$ is totally disconnected. If the second possibility of Proposition 4.15 occurs, there is a fixed point $\alpha \in \partial U$ such that $h^n(z) \to \alpha$ as $n \to \infty$ for every $z$.

The following lemma rules out the existence of periodic components for a limiting renormalization tower, thus completing the proof of Theorem 4.11.

Lemma 4.17. The filled Julia set of a limiting renormalization tower has no periodic components.

Proof. Assume the contrary. Let $U \subset \mathring{K}(T)$ be a periodic component, and denote by $h$ the element of $F(T)$ fixing $U$. Let us consider the possibilities of Proposition 4.15. The expansion properties of $h$ (Lemma 4.10) rule out the cases (III) and (IV). The case (I) is ruled out by Lemma 4.12. The only remaining possibility is (II). Since $h$ is not the identity map, the fixed points of $h$ in $\partial U$ are isolated, and Proposition 4.16 implies that all orbits in $U$ converge to a fixed point $p$ in $\partial U$. Again applying Lemma 4.12 we see that $p \in \mathbb{R}$ is the parabolic fixed point of one of the commuting pairs forming $T$. Denote by $\tilde U$ the projection of the domain $U$ onto the repelling Fatou cylinder $C_R$ of $p$. The domain $\tilde U$ contains the ends $\pm$ of the cylinder. On the other hand, the return map of $C_R$ is repelling at the ends (3.1). This implies that $\tilde U$ contains a preimage of the equator $\mathbb{R}/\mathbb{Z}$, and thus $U$ intersects $F(T)^{-1}(\mathbb{R})$. The last statement contradicts the assumption that $U$ is an interior component of $K(T)$.

4.5. Quasiconformal rigidity of limiting towers: Proof of Theorem 4.7.

Theorem 4.18. A standard tower admits no non-trivial invariant Beltrami differentials on its filled Julia set.

Before giving the proof of the theorem let us state the following
Lemma 4.19 (cf. Lemma 1.8 [Lyu1]). Let $T$ be a limiting renormalization tower. The group $G$ of homeomorphisms of $K(T)$ which commute with all maps $h \in F(T)$ is totally disconnected.

Proof. Let $\phi \in G$ be a map in the connected component of the identity. Suppose $z_0 \in K(T)$ is a repelling periodic point with $h(z_0) = z_0$ for some $h \in F(T)$. Since the solutions of $h(z) = z$ are isolated, $\phi$ fixes $z_0$. The claim now follows from density of repelling cycles, Theorem 4.11.

Proof of Theorem 4.18. Indeed, let $T = (H_i)_{i=1}^{\infty}$ be a standard tower, that is, $H_1|_{\mathbb{R}} = \zeta_{\theta,n}$ for some $\theta$, $n$. By Lemma 2.20, any $T$-invariant Beltrami differential $\mu$ extends to a Beltrami differential $\hat\mu$ on $K(f_\theta)$ invariant under $f_\theta : \mathbb{C}/\mathbb{Z} \to \mathbb{C}/\mathbb{Z}$. Let us extend $\hat\mu$ to the complement of the grand orbit of $K(f_\theta)$ under $f_\theta$ in $\mathbb{C}/\mathbb{Z}$ as a trivial Beltrami differential, and for every $t \in [0, 1]$ denote by $\phi_t : \mathbb{C}/\mathbb{Z} \to \mathbb{C}/\mathbb{Z}$ the solution of the Beltrami equation $\phi_t^* \sigma_0 = t\hat\mu$, fixing zero. By Lemma 2.8, for every $t \in [0, 1]$ we have $\phi_t \circ f_\theta \circ \phi_t^{-1} \equiv f_\theta$. By Theorem 4.2 the towers $\phi_t \circ T \circ \phi_t^{-1}$ and $T$ coincide. Thus each map in the continuous family $\phi_t$ commutes with all $h \in F(T)$ on the filled Julia set of $T$. As $\phi_0 \equiv \mathrm{Id}$, Lemma 4.19 implies that for every $t$, $\phi_t(z) \equiv \mathrm{Id}$ on the closure of the grand orbit of $K(T)$ under $f_\theta$. Everywhere else $\phi_t$ is conformal, hence by the Bers Sewing Lemma, it is conformal everywhere, and thus $\phi_t \equiv \mathrm{Id}$. This implies that $\hat\mu$ is trivial almost everywhere, and the proof is complete.

Combining this with Theorem 4.4 we have a corollary:

Theorem 4.20. A forward-infinite limiting renormalization tower supports no invariant Beltrami differentials on its filled Julia set.

For a bi-infinite tower $T = (H_k)_{-\infty}^{\infty}$ let $T^n$ denote the truncated tower $(H_k)_{-n}^{\infty}$. Also denote by $\Omega_n$ and $\Delta_n$ the domain and range of the pair $H_n$.

Theorem 4.21. Let $T = (H_k)_{-\infty}^{\infty}$ be a limiting renormalization bi-infinite tower. Then
$$\lim_{n \to -\infty} J(T^n) = \mathbb{C}$$
in Hausdorff topology.

Proof. Take any $z \notin \cup_{n<0} J(T^n)$. Then the orbit $O_T(z)$ escapes every domain $\Omega_n$ for large negative $n$. That is, there exists an infinite sequence of points $z_{n_k} \in O_T(z)$ for $n_k \to -\infty$ such that $z_{n_k} \in \Omega_{n_k}$ and $H_{n_k}(z_{n_k}) \notin \Omega_{n_k}$. Note that by real a priori bounds and compactness of $\mathbf{H}$, the difference $|n_k - n_{k+1}|$ is bounded. By Lemma 4.10 (II) we have $\|H'_{n_k}(z_{n_k})\| > C > 1$ for some universal constant $C$. Moreover, as seen from the proof of the same lemma, there is an element $h_{n_k} \in F(T^{n_k - 1})$ such that $\mathrm{dist}_{\mathbb{C} \setminus \mathbb{R}}(h_{n_k}(z_{n_k}), J(T^{n_k - 1})) < s$ for some universal value of $s > 0$. Let us arrange the points $\{\zeta_k = h_{n_k}(z_{n_k})\}$ into an infinite orbit $\zeta_0 = z$, $\zeta_1 = g_0(\zeta_0)$, $\zeta_2 = g_1(\zeta_1), \dots$, with $g_k \in F(T)$. By Lemma 4.10 (I) we have $\|g'_k(\zeta_k)\| > C$. Denote by $\alpha_k$ the hyperbolic geodesic in $\mathbb{C} \setminus \mathbb{R}$ of length $l(\alpha_k) < s$ connecting $\zeta_k$ to $J(T^{n_k - 1})$. Let $\alpha'_{n_k}$ be the connected component of $g_0^{-1} \circ \dots \circ g_k^{-1}(\alpha_{n_k})$ containing $z = \zeta_0$. Since the Julia set of a tower is invariant, it is enough to show that $l(\alpha'_{n_k}) \to 0$ as $n_k \to -\infty$. Indeed, $\|D g_k \circ \cdots \circ g_0\| > C^k$. By Lemma 4.9 this inequality holds along $\alpha'_{n_k}$ with $C$ replaced by another universal constant $C_1 > 1$, and hence $\alpha'_{n_k}$ shrinks to 0.
Recall that a measurable line field is a measurable Beltrami differential $u(z)$ with $|u(z)| = 1$ on a set of positive measure and $u(z) \equiv 0$ elsewhere. If $\mu$ is a non-trivial Beltrami differential, then $\mu/|\mu|$ is a measurable line field. We say that a line field $u(z)$ is invariant under a tower $T$ if for any $h \in F(T)$, $(h^*u)/u$ is a real-valued function. A measurable line field $u(z)$ is univalent on an open set $U$ if $u = h^*(d\bar z/dz)$ a.e. for a univalent map $h : U \to \mathbb{C}$.

Proof of Theorem 4.7. Suppose that $T = (H_k)_{-\infty}^{\infty}$ is a bi-infinite limiting renormalization tower, and $\mu$ is a nontrivial invariant Beltrami differential of $T$. Let $u(z)$ denote the corresponding invariant line field. By Theorem 4.20, $u(z)$ is not supported in $\cup J(T^n)$. Let $z_0$ be a point of almost continuity of $u(z)$, $z_0 \notin J(T^n)$ for any $n$. Denote again by $z_{n_k} \in \Omega_{n_k}$ the elements of $O_T(z_0)$ with $H_{n_k}(z_{n_k}) \notin \Omega_{n_k}$. As seen from the proof of Lemma 4.10, there is an element $h_{n_k} \in F(T^{n_k - 1})$ such that the Euclidean distance from $\zeta_k = h_{n_k}(z_{n_k})$ to $\mathbb{R}$ is commensurable with $|\Omega_{n_k} \cap \mathbb{R}|$. Let $T_{n_k} = X_{n_k} \circ T \circ X_{n_k}^{-1}$, where $X_{n_k}(z) = z/|\Omega_{n_k} \cap \mathbb{R}|$. By compactness of $\mathbf{H}$, we may ensure, passing to a subsequence, that elements of $T_{n_k}$ converge to holomorphic pairs $H'_i$ forming a bi-infinite tower $T'$. By choosing a further subsequence we may assume that $X_{n_k}^{-1}\zeta_k \to w$, and the rescaled line fields $u(X_{n_k}^{-1}(z))$ $w^*$-converge to a measurable line field $u'$, with $h^*u' = u'$ for all $h \in F(T')$. Let $D$ be a small disk around $w$ in $\mathbb{C} \setminus \mathbb{R}$, and denote by $D_{n_k}$ its image under the homothety $X_{n_k}$. The diameters of all $D_{n_k}$ in the hyperbolic metric of $\mathbb{C} \setminus \mathbb{R}$ are equal. Denote by $D'_{n_k}$ the univalent preimage $h_{n_k}^{-1}(D_{n_k})$ for some $h_{n_k} \in F(T)$, containing the point $z$. Lemma 4.10 together with Lemma 4.9 imply that Euclidean diameters of $D'_{n_k}$ shrink to 0, and $z$ is well inside $D'_{n_k}$. As $z_0$ is a point of almost continuity of $u$, the line field $u$ is almost constant in $D'_{n_k}$. Since $z_0$ is well inside $D'_{n_k}$, the rescaled branches $h_{n_k}((z - z_0)/|h'(z_0)|)$ form a normal family in a neighborhood of 0. Passing to a limit, we see that the line field $u'$ is univalent on an open neighborhood $\hat D$ of $w$ (cf. [McM1, Theorem 5.16]). By Theorem 4.21 there is an $i$ such that $J(H'_i) \cap \hat D \ne \emptyset$. By Lemma 4.13 the orbit of $\hat D$ by $T'_i$ covers all of $\Delta_i$. By invariance, this means that $u'$ coincides with a locally univalent line field around 0 and $(H'_i)^2(0)$, which implies contradictory behaviour of $u'$ around $H'_i(0)$.
5. Conclusions

Proof of Theorem A. Construction of the set $A$. Due to the compactness of $\mathbf{H}$ (Lemma 2.17) and complex a priori bounds, for each bi-infinite string $\bar s \in \bar\Sigma$ there exists a bi-infinite limiting renormalization tower $T = (H_i)_{-\infty}^{\infty}$ with the rotation number $\bar s$. In view of Theorem 4.5 such a tower is unique. We thus set
$$A = \{\zeta_{\bar s}\}_{\bar s \in \bar\Sigma},$$
and define $i : \bar\Sigma \to A$ by $i : \bar s \mapsto \zeta_{\bar s}$. With these definitions we have $I = i(\Sigma)$. By Proposition 2.3 and Theorem 4.5 the mapping $i : \Sigma \to I$ is an injection. The required invariance property $\mathcal{R}(I) = I$ immediately follows from the definition.
Convergence to $A$. Let $\zeta \in E$ be any commuting pair with $\rho(\zeta) \in \mathbb{R} \setminus \mathbb{Q}$, and let $\hat\zeta$ be a limit point of the sequence $\mathcal{R}^n\zeta$. By compactness of $\mathbf{H}$, $\hat\zeta$ is the base pair of a bi-infinite limiting renormalization tower and therefore $\hat\zeta \in A$. By compactness of $\mathbf{H}$, there exists an open neighborhood $G$ of the origin, such that the maximal domains of definition of the elements of $\mathcal{R}^n\zeta$ contain $G$ for all large $n$. Now let $\zeta' \in A$ be a commuting pair with $\rho(\zeta) = \rho(\zeta')$. We will show that $\mathrm{dist}(\mathcal{R}^n\zeta, \mathcal{R}^n\zeta') \to 0$, where the distance is measured as the maximum of the distance between the analytic extensions of the elements of the renormalized pairs in the $C^0$-metric on $G$. Otherwise, there exists a sequence $n_k \to \infty$ and $B > 0$ such that $\mathrm{dist}(\mathcal{R}^{n_k}\zeta, \mathcal{R}^{n_k}\zeta') > B$. Passing to a further subsequence we may ensure that $\mathcal{R}^{n_k}\zeta \to \zeta_1$, $\mathcal{R}^{n_k}\zeta' \to \zeta_2$ and $\zeta_1$ and $\zeta_2$ are the base pairs of two bi-infinite limiting renormalization towers $T_1$, $T_2$ with $\rho(T_1) = \rho(T_2)$. By construction, the towers $T_1$, $T_2$ are not homothetic, which contradicts Theorem 4.5.

Continuity of $i$. We will actually show the continuity of the map $i : \bar\Sigma \to A$, arguing by contradiction. Suppose that there is a converging sequence $\{\bar s_n\} \subset \bar\Sigma$, $\bar s_n \to \bar s \in \bar\Sigma$, and assume that $i(\bar s_n) \not\to \zeta_{\bar s}$. By definition of the set $A$, the commuting pair $\zeta_{\bar s}$ is the base pair of a bi-infinite limiting renormalization tower $T$ with $\rho(T) = \bar s$. On the other hand, each of the pairs $\zeta_{\bar s_n}$ is also a base pair of a bi-infinite limiting renormalization tower $T_n$ with $\rho(T_n) = \bar s_n$. Using compactness of $\mathbf{H}$, we may pass to a subsequence $n_k$ so that $T_{n_k} \to T'$. The bi-infinite tower $T'$ has the same rotation number as $T$, but by our assumption its base pair is not $\zeta_{\bar s}$. This is in direct contradiction with Theorem 4.5.

Continuity of $i^{-1}$. We adapt the argument given by Lyubich in the context of unimodal maps (see [Lyu4]). Set $A_r = \{\zeta \in A \mid \chi(\zeta) = r\}$; this set is obviously closed, hence compact. Let us first prove that $\mathcal{R} : I \to I$ is a homeomorphism. Consider a pair $\zeta \in I$, $\chi(\zeta) \ne \infty$. Then in a neighborhood of $\zeta$, $\mathcal{R} \equiv \mathcal{R}_{\chi(\zeta)}$. Hence $\mathcal{R}$ is continuous at $\zeta$. To show continuity of $\mathcal{R}^{-1}$, consider a pair $\zeta \in I$ and set $\zeta_* = (\mathcal{R}|_I)^{-1}(\zeta)$. Suppose $\chi(\zeta_*) = r$. Since renormalization is injective, $\mathcal{R}(A_j) \not\ni \zeta$ if $j \ne r$. Since $A_r$ is compact, the image $\mathcal{R}(A_j)$ misses some neighborhood of $\zeta$. Assume now that there is a sequence $\mathcal{R}(\zeta_k) \to \zeta$, where $\zeta_k$ are contained in distinct strips $A_{j_k}$. Then necessarily $j_k \to \infty$. Passing to a subsequence we may ensure that $\zeta_k$ converge to a parabolic pair $\hat\zeta \in A$, in which case $\mathcal{P}_\tau(\hat\zeta) = \zeta$ for some choice of $\tau$. This contradicts Proposition 3.6. It follows that there is a neighborhood of $\zeta$ which misses all the images $\mathcal{R}(A_j)$ with $j \ne r$. Hence on this neighborhood $(\mathcal{R}|_A)^{-1} = (\mathcal{R}|_{A_r})^{-1}$. The latter map is continuous since $A_r$ is compact. Let us now demonstrate that $i^{-1} : I \to \Sigma$ is continuous. Let $\zeta = \zeta_{\bar s} \in I$ with $\bar s = (r_i)_{-\infty}^{\infty}$. Since $\mathcal{R}$ is a homeomorphism, all pairs $\zeta' \in I$ in a small neighborhood of $\zeta$ have the same digits $(r_{-n}, \dots, r_n)$. This is equivalent to the continuity of $i^{-1}$.

Proof of Theorem B. Observe first that by Lemma 4.3 the transformation $\mathcal{G}$ is well-defined. The continuity of the map $i : \bar\Sigma \to \hat A$ follows from Theorem 4.5 as above, that of the inverse is clear. The conjugacy is evident from the definitions and Theorem 4.5.
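The combinatorial side of these statements, namely that renormalization acts on rotation numbers by shifting continued fraction digits, can be seen in a small computation. The sketch below is only an illustration and makes two assumptions that are not taken from the text: it uses the standard convention $[r_0, r_1, \dots] = 1/(r_0 + 1/(r_1 + \cdots))$, and it works with a rational number so that the arithmetic is exact, whereas the theorems concern irrational rotation numbers. The names `cf_digits` and `gauss_map` are hypothetical.

```python
from fractions import Fraction

def gauss_map(x):
    """Gauss map x -> fractional part of 1/x, for a rational x in (0, 1)."""
    inv = 1 / x
    return inv - (inv.numerator // inv.denominator)

def cf_digits(x, max_digits=20):
    """Continued fraction digits [r0, r1, ...] of a rational x in (0, 1)."""
    digits = []
    while x != 0 and len(digits) < max_digits:
        inv = 1 / x
        digits.append(inv.numerator // inv.denominator)  # integer part of 1/x
        x = gauss_map(x)                                 # drop the leading digit
    return digits

rho = Fraction(13, 31)
print(cf_digits(rho))             # digits of rho
print(cf_digits(gauss_map(rho)))  # the same digits with the first one removed
```

Applying the Gauss map removes the first digit and leaves the remaining ones unchanged, which is the one-sided shadow of the two-sided shift to which $\mathcal{R}$ is conjugated on $I$.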
Let us address the convergence statement. Select an arbitrary element $(\zeta, \bar s) \in \hat X$, where $\bar s = (r_i)_1^{\infty}$, and consider a sequence of perturbations $\zeta_k \to \zeta$, where $\zeta_k \in E$ and $\rho(\zeta_k) = [r_0^k, r_1^k, \dots] \in \mathbb{R} \setminus \mathbb{Q}$. We will also require that $r_n^k \to r_n$ as $k \to \infty$ for $n \in \mathbb{N}$ and $r_0^k \to \chi(\zeta)$. In view of Lemma 3.4, we may assume by choosing a further subsequence that
$$(\mathcal{R}^n\zeta_k, (r_{n+1}^k, r_{n+2}^k, \dots)) \to \mathcal{G}^n(\zeta, \bar s) \quad \text{as } k \to \infty.$$
The existence of uniform real bounds for maps in the Epstein class implies that for $n > N_0$ the family $\{\mathcal{R}^n\zeta_k\}$ is sequentially pre-compact in $E$, which means that the sequence $\{\mathcal{G}^n(\zeta, \bar s)\}$ is pre-compact as well. Let $(\zeta^*, \bar t)$ be any limit point of $\{\mathcal{G}^n(\zeta, \bar s)\}$; then for some choice of $n_i$ we have $\mathcal{R}^{n_i}\zeta_k \to \zeta^*$. Choosing a "diagonal" subsequence $\zeta_{k_j}$ we may ensure that $\mathcal{R}^{n_i + L}\zeta_{k_j}$ also converge for all $L \in \mathbb{Z}$. Thus $\zeta^*$ is a base map of a bi-infinite renormalization limiting tower, and hence $\zeta^* \in A$.

Proof of Theorem C. Define the pair $\zeta_0$ to be the base pair of the bi-infinite renormalization tower with rotation number $(\dots, \infty, \infty, \infty, \dots)$. The theorem is just a particular case of Theorem B.

Acknowledgements. I would like to express my gratitude to A. Epstein and B. Hinkle; the discussions with them have greatly assisted me in producing this paper. Ben Hinkle has a parallel work [Hin] on unimodal maps with essentially bounded combinatorics, and I appreciate his willingness to let me consult his manuscript before it was finished. I wish to thank O. Lanford, whose numerous helpful comments on an earlier version of this paper helped me to streamline some of the arguments and to improve the exposition. I also would like to thank E. de Faria, who graciously provided me with preliminary copies of his papers with W. de Melo. Finally, my thanks go to Mikhail Lyubich for numerous stimulating conversations and constant moral support.
References

[BDDS] Buff, X., Douady, A., Devaney, R., Sentenac, P.: Baby Mandelbrot sets are born in cauliflowers. In: The Mandelbrot set, theme and variations, London Math. Soc. Lecture Note Ser. 274, Cambridge: Cambridge Univ. Press, 2000, pp. 19–36
[DD] Devaney, R.L., Douady, A.: Homoclinic bifurcations and infinitely many Mandelbrot sets. Preprint
[Do] Douady, A.: Does a Julia set depend continuously on the polynomial? In: Complex dynamical systems: The mathematics behind the Mandelbrot set and Julia sets. ed. R.L. Devaney, Proc. of Symposia in Applied Math., Vol 49, Providence, RI: Am. Math. Soc., 1994, pp. 91–138
[DH] Douady, A., Hubbard, J.H.: Etude dynamique des polynômes complexes, I–II. Pub. Math. d’Orsay, 1984
[dF1] de Faria, E.: Proof of universality for critical circle mappings. Thesis, CUNY, 1992
[dF2] de Faria, E.: Asymptotic rigidity of scaling ratios for critical circle mappings. Ergodic Theory Dynam. Systems 19 no. 4, 995–1035 (1999)
[dFdM1] de Faria, E., de Melo, W.: Rigidity of critical circle mappings I. J. Eur. Math. Soc. (JEMS) 1 no. 4, 339–392 (1999)
[dFdM2] de Faria, E., de Melo, W.: Rigidity of critical circle mappings II. J. Am. Math. Soc. 13 no. 2, 343–370 (2000)
[Ep1] Epstein, A.: Towers of finite type complex analytic maps. PhD Thesis, CUNY, 1993
[Ep2] Epstein, A.: Counterexamples to the quadratic mating conjecture. Manuscript in preparation
[EKT] Epstein, A., Keen, L., Tresser, C.: The set of maps $F_{a,b} : x \mapsto x + a + \frac{b}{2\pi}\sin(2\pi x)$ with any given rotation interval is contractible. Commun. Math. Phys. 173, 313–333 (1995)
[EE] Eckmann, J.-P., Epstein, H.: On the existence of fixed points of the composition operator for circle maps. Commun. Math. Phys. 107, 213–231 (1986)
[FKS] Feigenbaum, M., Kadanoff, L., Shenker, S.: Quasi-periodicity in dissipative systems. A renormalization group analysis. Physica D 5, 370–386 (1982)
[He] Herman, M.: Conjugaison quasi-symétrique des homéomorphismes analytiques du cercle à des rotations. Manuscript
[Hin] Hinkle, B.: Parabolic limits of renormalization. Ergodic Theory Dynam. Systems 20 no. 1, 173–229 (2000)
[Keen] Keen, L.: Dynamics of holomorphic self-maps of $\mathbb{C}^*$. In: Holomorphic functions and moduli I, (ed. D. Drasin et al.), New York: Springer-Verlag, 1988
[Lan1] Lanford, O.E.: Renormalization group methods for critical circle mappings with general rotation number. In: VIIIth International Congress on Mathematical Physics (Marseille, 1986), Singapore: World Sci. Publishing, 1987, pp. 532–536
[Lan2] Lanford, O.E.: Renormalization group methods for critical circle mappings. Nonlinear evolution and chaotic phenomena. NATO Adv. Sci. Inst. Ser. B: Phys. 176, New York: Plenum, 1988, pp. 25–36
[Lyu1] Lyubich, M.: The dynamics of rational transforms: The topological picture. Russ. Math. Surveys 41, 35–95 (1986)
[Lyu2] Lyubich, M.: Renormalization ideas in conformal dynamics. In: Cambridge Seminar “Current Developments in Math.”, May 1995. Cambridge, MA: International Press, 1995, pp. 155–184
[Lyu3] Lyubich, M.: Feigenbaum–Coullet–Tresser Universality and Milnor’s Hairiness Conjecture. Ann. of Math. (2) 149 no. 2, 319–420 (1999)
[Lyu4] Lyubich, M.: Almost every real quadratic map is either regular or stochastic. IMS at Stony Brook Preprint #1997/8
[LY] Lyubich, M., Yampolsky, M.: Dynamics of quadratic polynomials: Complex bounds for real maps. Ann. Inst. Fourier (Grenoble) 47 no. 4, 1219–1255 (1997)
[Mak] Makienko, P.: Iterations of analytic functions in $\mathbb{C}^*$. (Russian) Dokl. Akad. Nauk SSSR 297 no. 1, 35–37 (1987); translation in Soviet Math. Dokl. 36 no. 3, 418–420 (1988)
[McM1] McMullen, C.: Complex dynamics and renormalization. Annals of Math. Studies 135, Princeton, NJ: Princeton Univ. Press, 1994
[McM2] McMullen, C.: Renormalization and 3-manifolds which fiber over the circle. Annals of Math. Studies, Princeton, NJ: Princeton University Press, 1996
[Mil] Milnor, J.: Dynamics in one complex variable. Introductory lectures. Braunschweig: Friedr. Vieweg & Sohn, 1999
[MvS] de Melo, W., van Strien, S.: One dimensional dynamics. Berlin–Heidelberg–New York: Springer-Verlag, 1993
[ORSS] Ostlund, S., Rand, D., Sethna, J., Siggia, E.: Universal properties of the transition from quasiperiodicity to chaos in dissipative systems. Physica D 8, 303–342 (1983)
[Sul1] Sullivan, D.: Quasiconformal homeomorphisms and dynamics, topology and geometry. In: Proc. ICM-86, Berkeley, V. II, pp. 1216–1228
[Sul2] Sullivan, D.: Bounds, quadratic differentials, and renormalization conjectures. AMS Centennial Publications. 2: Mathematics into Twenty-first Century. Providence, RI: AMS, 1992
[Sh] Shishikura, M.: The Hausdorff dimension of the boundary of the Mandelbrot set and Julia sets. Ann. of Math. (2) 147 no. 2, 225–267 (1998)
[Sw1] Świątek, G.: Rational rotation numbers for maps of the circle. Commun. Math. Phys. 119, 109–128 (1988)
[Ya1] Yampolsky, M.: Complex bounds for renormalization of critical circle maps. Ergodic Theory Dynam. Systems 19 no. 1, 227–257 (1999)
[Ya2] Yampolsky, M.: Hyperbolicity of renormalization of critical circle maps. IHÉS Preprint M/00/50, 2000
[Yoc] Yoccoz, J.-C.: Il n’y a pas de contre-exemple de Denjoy analytique. C.R. Acad. Sci. Paris 298, série I, 141–144 (1984)
Communicated by Ya. G. Sinai
Commun. Math. Phys. 218, 569 – 607 (2001)
Communications in
Mathematical Physics
Droplet States in the XXZ Heisenberg Chain
Bruno Nachtergaele, Shannon Starr
Department of Mathematics, University of California, Davis, CA 95616-8633, USA.
E-mail: [email protected]; [email protected]
Received: 5 September 2000 / Accepted: 8 December 2000
Copyright © 2000 by the authors. Reproduction of this article in its entirety, by any means, is permitted for non-commercial purposes.

Abstract: We consider the ground states of the ferromagnetic XXZ chain with spin up boundary conditions. The ground state of this model, restricted to a sector with a fixed number of down spins, describes a droplet of down spins in an environment of up spins. We find the exact energy and the states that describe these droplets in the limit of an infinite number of down spins. We prove that there is a gap in the spectrum above the droplet states. As the XXZ Hamiltonian has a gap above the fully magnetized ground states as well, this means that the droplet states (for sufficiently large droplets) form an isolated band. The width of this band tends to zero in the limit of infinitely large droplets. We also prove the analogous results for finite chains with periodic boundary conditions and for the infinite chain.

1. Introduction

Droplet states have been studied in considerable detail for the Ising model [6, 12, 4], where they play an important role in understanding dynamical phenomena [13]. In this paper we consider the spin-1/2 ferromagnetic XXZ Heisenberg chain and prove that the bottom of its spectrum consists of an isolated nearly flat band of droplet states in a sense made precise below. The Hamiltonian for a chain of $L$ spins acts on the Hilbert space $\mathcal{H}_L = \mathbb{C}^2_1 \otimes \cdots \otimes \mathbb{C}^2_L$ as the sum of nearest-neighbor interactions
$$H^{XXZ}_{[1,L]} = \sum_{x=1}^{L-1} H^{XXZ}_{x,x+1}$$
of the form
$$H^{XXZ}_{x,x+1} = -\Delta^{-1}\Big(\vec S_x \cdot \vec S_{x+1} - \frac{1}{4}\Big) - (1 - \Delta^{-1})\Big(S^3_x S^3_{x+1} - \frac{1}{4}\Big). \tag{1.1}$$
Here $S^i_x$ ($i = 1, 2, 3$) are the spin matrices, acting on $\mathbb{C}^2_x$, extended by unity to $\mathcal{H}_L$, and normalized so that they have eigenvalues $\pm 1/2$. The anisotropy parameter, $\Delta$, is always assumed to be $> 1$. To formulate the results and also for the proofs, we need to consider the following combinations of boundary fields for systems defined on an arbitrary interval: for $\alpha, \beta = \pm 1, 0$, and $[a, b] \subset \mathbb{Z}$, define
$$H^{\alpha\beta}_{[a,b]} = \sum_{x=a}^{b-1} H^{XXZ}_{x,x+1} - A(\Delta)(\alpha S^3_a + \beta S^3_b), \tag{1.2}$$
where $A(\Delta) = \frac{1}{2}\sqrt{1 - \Delta^{-2}}$. Note that $H^{00}_{[1,L]} = H^{XXZ}_{[1,L]}$.
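The definitions (1.1) and (1.2) can be made concrete on a short chain. The sketch below is an illustration only, not the authors' method: it stores a vector as a dictionary from spin configurations to amplitudes and applies $H^{\alpha\beta}$ directly, using the fact that (1.1) vanishes on a parallel pair of spins and, on an antiparallel pair, has diagonal entry $1/2$ and exchange amplitude $-1/(2\Delta)$. The names `boundary_field` and `apply_H`, the chain length 4 and the value $\Delta = 2$ are assumptions made for the example.

```python
import math

def boundary_field(delta):
    """A(Delta) = (1/2) sqrt(1 - Delta^{-2}) from (1.2)."""
    return 0.5 * math.sqrt(1.0 - delta ** -2)

def apply_H(vec, delta, alpha=0, beta=0):
    """Apply H^{alpha beta} of (1.2) to a vector {configuration: amplitude}.

    A configuration is a tuple with entries +1 (up spin) or -1 (down spin).
    """
    A = boundary_field(delta)
    out = {}
    for state, amp in vec.items():
        # Boundary term -A (alpha S^3_a + beta S^3_b), with S^3 = +-1/2.
        diag = -A * (alpha * state[0] + beta * state[-1]) / 2.0
        for x in range(len(state) - 1):
            if state[x] != state[x + 1]:
                # Antiparallel bond: diagonal 1/2, exchange amplitude -1/(2 Delta).
                diag += 0.5
                flipped = list(state)
                flipped[x], flipped[x + 1] = flipped[x + 1], flipped[x]
                key = tuple(flipped)
                out[key] = out.get(key, 0.0) - amp / (2.0 * delta)
            # Parallel bonds contribute nothing.
        out[state] = out.get(state, 0.0) + diag * amp
    return out

delta = 2.0
all_up = (1, 1, 1, 1)
one_down = (1, -1, 1, 1)

e_free = apply_H({all_up: 1.0}, delta)[all_up]            # H^{00} on the all-up state
e_droplet = apply_H({all_up: 1.0}, delta, 1, 1)[all_up]   # H^{++} on the all-up state
image = apply_H({one_down: 1.0}, delta, 1, 1)             # H^{++} on a one-down state
print(e_free, e_droplet, sorted(image))
```

In this example the all-up state has energy 0 for $H^{00}$ and $-A(\Delta)$ for $H^{++}$, and the image of a configuration with one down spin only involves configurations with one down spin, so the number of down spins is preserved.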
As all the Hamiltonians $H^{\alpha\beta}_{[a,b]}$ commute with the total third component of the spin, it makes sense to study their ground states restricted to a subspace of fixed number of down spins. The subspace for a chain of $L$ spins consisting of the states with $n$ down spins will be denoted by $\mathcal{H}_{L,n}$, for $0 \le n \le L$. In all cases the ground state is then unique. The Hamiltonians with $+-$ and $-+$ boundary fields have been studied extensively and have kink and antikink ground states respectively [1, 7, 9, 11, 10, 5, 3]. The unique ground states for a chain on $[a, b] \subset \mathbb{Z}$, in the sector with $n$ down spins, will be denoted by $\psi^{\alpha\beta}_{[a,b]}(n)$, $0 \le n \le b - a + 1$. For $\alpha\beta = +-, -+$, they are given by
$$\psi^{+-}_{[a,b]}(n) = \sum_{a \le x_1 < \cdots < x_n \le b} q^{\sum_{k=1}^{n}(b+1-x_k)} \prod_{k=1}^{n} S^-_{x_k} \left|\uparrow \dots \uparrow\right\rangle_{[a,b]}, \tag{1.3}$$
$$\psi^{-+}_{[a,b]}(n) = \sum_{a \le x_1 < \cdots < x_n \le b} q^{\sum_{k=1}^{n}(x_k+1-a)} \prod_{k=1}^{n} S^-_{x_k} \left|\uparrow \dots \uparrow\right\rangle_{[a,b]}, \tag{1.4}$$
where $\Delta = (q + q^{-1})/2$, $0 < q < 1$. Note that the norm of these vectors depends on the length (but not on the position) of the interval $[a,b]$ (see (A.3)). There is a uniform lower bound for the spectral gap above these ground states [9], a property that will be essential in the proofs.

Here, we are interested in the ground states of the Hamiltonian with $++$ boundary fields, which we refer to as the droplet Hamiltonian, in the regime where there is a sufficiently large number of down spins. This includes, but is not limited to, the case where there is a fixed density $\rho$, $0 < \rho \le 1$, of down spins in a system with $++$ boundary conditions. We prove that under these conditions the ground states contain one droplet of down spins in a background of up spins. From the mathematical point of view there is an important distinction between the kink Hamiltonian and the droplet Hamiltonian, which is that the droplet Hamiltonian does not possess $SU_q(2)$ symmetry. In contrast to the kink Hamiltonian, where explicit formulae are known for the ground states in finite volumes, no such explicit analytic formulae are known for the droplet Hamiltonian for general $L$. Therefore, we rely primarily on energy estimates, and our main results are formulated as estimates that become exact only in the limit $n, L \to \infty$. This is natural as, again unlike for the kink ground states, there is no immediate infinite-volume description of the droplet states. We find the exact energy of an infinite droplet and an approximation of the droplet ground states that
Droplet States in XXZ Heisenberg Chain
Fig. 1. Diagram of a typical droplet as the tensor product of a kink and antikink. [Diagram not reproduced; its labels are kink, droplet, antikink, and the sites 1, x, L.]
becomes exact in the thermodynamic limit. We also prove that all states with the energy of the droplet are necessarily droplet states, again, in the thermodynamic limit. For the droplet Hamiltonians this means that the droplet states are all the ground states, and that there is a gap above them. One can also interpret this as saying that all excitations of the fully magnetized ground states of the XXZ chain, with sufficiently many overturned spins and not too high an energy, are droplet states.

1.1. Main result. The main result of this paper is the approximate calculation of the ground state energy, the ground state space, and a lower bound for the spectral gap of the operator $H^{++}_{[1,L]}$ restricted to the sector $\mathcal{H}_{L,n}$. If the results were exact, we would have an eigenvalue $E_0$, a subspace $\mathcal{H}^0_{L,n} \subset \mathcal{H}_{L,n}$, and a positive number $\gamma$, such that
$$H^{++}_{[1,L]}\,\mathrm{Proj}(\mathcal{H}^0_{L,n}) = E_0\,\mathrm{Proj}(\mathcal{H}^0_{L,n})$$
and
$$H^{++}_{[1,L]}\,\mathrm{Proj}(\mathcal{H}_{L,n}) \ge E_0\,\mathrm{Proj}(\mathcal{H}_{L,n}) + \gamma\left(\mathrm{Proj}(\mathcal{H}_{L,n}) - \mathrm{Proj}(\mathcal{H}^0_{L,n})\right).$$
We will always use the notation $\mathrm{Proj}(V)$ to mean orthogonal projection onto a subspace $V$. Our results are approximations, with increasing accuracy as $n$ tends to infinity, independent of $L$. First, we identify the proposed ground state space. For $n \ge 0$ and $\lfloor n/2 \rfloor \le x \le L - \lceil n/2 \rceil$ define
$$\xi_{L,n}(x) = \psi^{+-}_{[1,x]}(\lfloor n/2 \rfloor) \otimes \psi^{-+}_{[x+1,L]}(\lceil n/2 \rceil). \tag{1.5}$$
For any real number $x$, $\lfloor x \rfloor$ is the greatest integer $\le x$, and $\lceil x \rceil$ is the least integer $\ge x$. The typical magnetization profile of $\xi_{L,n}(x)$ is shown in Fig. 1. We define the space of approximate ground states as follows:
$$K_{L,n} = \mathrm{span}\{\xi_{L,n}(x) : \lfloor n/2 \rfloor \le x \le L - \lceil n/2 \rceil\}.$$
$K_{L,n}$ is the space of "approximate" droplet states with $n$ down spins for a finite chain of length $L$. An interval of length $n$ can occur in $L - n + 1$ positions inside a chain of length $L$. This explains why $\dim K_{L,n} = L - n + 1$. Alternatively, we could use the following definition of approximate droplet states:
$$\xi_{L,n}(x) = \left[S^{\mathrm{antikink},+}_{[1,L]}\right]^{x - \lfloor n/2 \rfloor} \left[S^{\mathrm{kink},+}_{[1,L]}\right]^{L - \lceil n/2 \rceil - x} |\downarrow\cdots\downarrow\rangle,$$
where $S^{\mathrm{kink},+}_{[1,L]}$ is the $SU_q(2)$ raising operator (see, e.g., (2.5b) of [9]), and $S^{\mathrm{antikink},+}_{[1,L]}$ is the left-right reflection of $S^{\mathrm{kink},+}_{[1,L]}$. Yet another option for the droplet states is to take the exact ground states of the Hamiltonians $H_{[1,L]} = H^{+-}_{[1,x]} + H^{-+}_{[x,L]}$, which have a pinning field at position $x$, and for which exact expressions for the ground states can be obtained.
One can show that suitable linear combinations of these states differ in norm from the $\xi_{L,n}(x)$ by no more than $O(q^n)$. We will only use the states $\xi_{L,n}(x)$ defined in (1.5), as they have a more intuitive interpretation as a tensor product of a kink and an antikink state.

Theorem 1.1. a) There exists a constant $C < \infty$ such that
$$\left\| \left(H^{++}_{[1,L]} - A(\Delta)\right) \mathrm{Proj}(K_{L,n}) \right\| \le C q^n.$$
The constant $C$ depends only on $q$, not on $L$ or $n$.
b) There exists a sequence $\epsilon_n$, with $\lim_{n\to\infty} \epsilon_n = 0$, such that
$$H^{++}_{[1,L]}\,\mathrm{Proj}(\mathcal{H}_{L,n}) \ge (A(\Delta) - 2Cq^n)\,\mathrm{Proj}(\mathcal{H}_{L,n}) + (\gamma - \epsilon_n)\left(\mathrm{Proj}(\mathcal{H}_{L,n}) - \mathrm{Proj}(K_{L,n})\right),$$
where $\gamma = 1 - \Delta^{-1}$. The sequence $\epsilon_n$ can be chosen to decay at least as fast as $n^{-1/4}$, independent of $L$.
For $H^{XXZ}_{[1,L]}$, which is the one without boundary terms, the large-droplet states are not separated in the spectrum from other excitations such as the spin waves, i.e., the band of continuous spectrum due to spin wave excitations overlaps with the states of droplet type. Although similar results should hold for boundary fields of larger magnitude, the value $A(\Delta)$ of the boundary fields in the droplet Hamiltonian is particularly convenient for at least two reasons: 1) it allows us to write the Hamiltonian as a sum of kink and antikink Hamiltonians, which is the basis for many of our arguments; 2) the energy of a droplet in the center of the chain is the same as for a droplet attached to the boundary. This allows us to construct explicitly the subspace of all droplet states asymptotically in the thermodynamic limit.

Although our main results are about infinite droplets, i.e., they are asymptotic properties of finite droplets in the limit of their size tending to infinity, we can extract from our proofs estimates of the corrections for finite size droplets. This allows the following reformulation of the main result in terms of the eigenvalues near the bottom of the spectrum and the corresponding eigenprojection. Let $\lambda_{L,n}(1) \le \lambda_{L,n}(2) \le \dots$ be the eigenvalues of $H^{++}_{[1,L]}$ restricted to the sector $\mathcal{H}_{L,n}$. Let $\psi^{++}_{L,n}(1), \psi^{++}_{L,n}(2), \dots$ be the corresponding eigenstates, and define
$$\mathcal{H}^k_{L,n} = \mathrm{span}\{\psi^{++}_{L,n}(j) : 1 \le j \le k\}.$$

Theorem 1.2. a) We have the following information about the spectrum of $H^{++}_{[1,L]}$ restricted to $\mathcal{H}_{L,n}$:
$$\lambda_{L,n}(1), \dots, \lambda_{L,n}(L-n+1) \in [A(\Delta) - O(q^n),\ A(\Delta) + O(q^n)],$$
and
b) $\lambda_{L,n}(L-n+2) \ge A(\Delta) + \gamma - O(n^{-1/4})$.
c) We have the following information about the eigenspace for the low-energy states, $\lambda_{L,n}(1), \dots, \lambda_{L,n}(L-n+1)$:
$$\left\| \mathrm{Proj}(K_{L,n}) - \mathrm{Proj}(\mathcal{H}^{L-n+1}_{L,n}) \right\| = O(q^{n/2}).$$
Fig. 2. Spectrum for $H^{++}_{[1,12]}$ when $\Delta = 2.125$ ($q = 1/4$). [Plot not reproduced; the panels show the multiplicity of ground states and $E/A(\Delta)$, each against $n = 0, \dots, 12$.]
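Equivalently
$$\sup_{0 \ne \psi \in K_{L,n}}\ \inf_{\psi' \in \mathcal{H}^{L-n+1}_{L,n}} \frac{\|\psi - \psi'\|^2}{\|\psi\|^2} = O(q^n),$$
$$\sup_{0 \ne \psi' \in \mathcal{H}^{L-n+1}_{L,n}}\ \inf_{\psi \in K_{L,n}} \frac{\|\psi - \psi'\|^2}{\|\psi'\|^2} = O(q^n).$$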
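The band structure described in Theorem 1.2 can be explored numerically on a short chain. The sketch below diagonalizes $H^{++}_{[1,L]}$ in the sector with $n$ down spins and separates the $L - n + 1$ lowest eigenvalues from the next level, in units of $A(\Delta)$. The function names, the dense-matrix construction, and the choice $L = 8$, $n = 4$ are assumptions made here for illustration; the theorem is asymptotic in $n$, so small chains only indicate the trend.

```python
import numpy as np

SZ = np.array([[0.5, 0.0], [0.0, -0.5]])  # S^3 in the basis (up, down)
SP = np.array([[0.0, 1.0], [0.0, 0.0]])   # S^+ : down -> up


def site_op(op, x, L):
    """Embed a 2x2 matrix at site x (1-indexed) of an L-site chain."""
    out = np.eye(1)
    for y in range(1, L + 1):
        out = np.kron(out, op if y == x else np.eye(2))
    return out


def droplet_hamiltonian(L, delta):
    """H^{++}_{[1,L]}: XXZ bonds (1.1) plus the boundary fields of (1.2)."""
    ident = np.eye(2 ** L)
    H = np.zeros((2 ** L, 2 ** L))
    for x in range(1, L):
        s3s3 = site_op(SZ, x, L) @ site_op(SZ, x + 1, L)
        hop = site_op(SP, x, L) @ site_op(SP.T, x + 1, L)  # S^+_x S^-_{x+1}
        flip = 0.5 * (hop + hop.T)
        H += (-(s3s3 + flip - 0.25 * ident) / delta
              - (1.0 - 1.0 / delta) * (s3s3 - 0.25 * ident))
    A = 0.5 * np.sqrt(1.0 - delta ** -2.0)
    H -= A * (site_op(SZ, 1, L) + site_op(SZ, L, L))
    return H, A


def sector_spectrum(L, n, q):
    """Eigenvalues of H^{++}_{[1,L]} restricted to n down spins."""
    delta = 0.5 * (q + 1.0 / q)
    H, A = droplet_hamiltonian(L, delta)
    # A basis index has one set bit per down spin (up = 0, down = 1).
    idx = [i for i in range(2 ** L) if bin(i).count("1") == n]
    return np.linalg.eigvalsh(H[np.ix_(idx, idx)]), A, delta


L, n = 8, 4
evals, A, delta = sector_spectrum(L, n, q=0.25)
band = evals[: L - n + 1] / A        # candidates for the droplet band
first_above = evals[L - n + 1] / A   # first level above it
```

Comparing `band` with 1 and `first_above` with $1 + \gamma / A(\Delta)$ for increasing $n$ gives a rough picture of how quickly the asymptotic statements set in.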
Figure 2 illustrates the spectrum for a specific choice of $L$ and $q$. Note that Theorem 1.2 also implies that, for any sequence of states with energies converging to $A(\Delta)$, the distances of these states to the subspaces $K_{L,n}$ must converge to zero.

The remainder of the paper is organized as follows. Section 2 reviews some preliminary properties of the Hamiltonians that appear in the paper: a simple estimate for the gap above the ground state of the XXZ Hamiltonian on an open chain without boundary terms, the spectral gap for the Hamiltonian with kink and antikink boundary terms, and a preliminary lower bound for the energy of a droplet state. The proof of the main theorems is given in Sects. 3, 4, and 5. First, in Sect. 3, we calculate the energy of the proposed droplet states $\xi_{L,n}(x)$, defined in (1.5). We also prove that these states are approximate eigenstates.
In Sect. 4, we prove a basic estimate which shows that, given a state $\psi$ of the chain on $[1, L]$, with energy $E$, there exists an interval $J \subset [1, L]$, of length $|J|$, where the state is fully polarized (i.e., all up or all down spins) with high probability. We obtain the following lower bound for this probability:
$$\mathrm{Prob}_\psi[\text{the spins in } J \text{ are all up or all down}] \ge 1 - \text{Constant} \times |J| \times \frac{E}{L}.$$
The meaning of this bound is clear. For fixed energy $E$, as $L$ increases it becomes more and more likely that there exists an interval $J$, of given length $|J|$, where the system is in the all up or all down state. Of course, the location of the interval $J$ in $[1, L]$ depends on $\psi$. The spectral gap of the model enters through the constant. An estimate of this kind should be expected to hold for any ferromagnetic model with a gap, as the interaction encourages like spins to aggregate.

Section 5 contains the most intricate part of the proof. We implement the idea that the presence of an interval of all up or all down spins in a state allows one to decouple the action of the Hamiltonians on the subsystems to the left and the right of this interval. If the spins in the interval are down, the Hamiltonian decouples into a sum of a kink and an antikink Hamiltonian, for which it is known that there is a spectral gap. If the spins in the interval are up, we do not immediately obtain an estimate for the gap, but we can repeat the argument for the two decoupled subsystems. If there is a sufficiently large number of down spins in the original system, this procedure must lead to an interval of down spins, and hence an estimate for the gap, after a finite number of iterations.

We will also prove, in Sect. 6, the analogous statements for rings and for the infinite chain with a large but finite number of down spins. Some calculations that are used in the proofs are collected in two appendices.

2. Properties of the XXZ Hamiltonians

In this section, we collect all the Hamiltonians that appear in the paper, and describe some of their properties. The first Hamiltonian we consider is
$$H^{XXZ}_{[1,L]} = \sum_{x=1}^{L-1} H^{XXZ}_{x,x+1}, \tag{2.1}$$
where
$$H^{XXZ}_{x,x+1} = -\Delta^{-1}\left(\vec{S}_x \cdot \vec{S}_{x+1} - \tfrac14\right) - (1 - \Delta^{-1})\left(S^{(3)}_x S^{(3)}_{x+1} - \tfrac14\right). \tag{2.2}$$
$\Delta > 1$ is the anisotropy parameter. Note that for $\Delta = 1$ it is the isotropic Heisenberg model, and for $\Delta = \infty$ it is the Ising model. The diagonalization of $H^{XXZ}_{x,x+1}$, considered as an operator on the four-dimensional space $\mathbb{C}^2_x \otimes \mathbb{C}^2_{x+1}$, is
$$H^{XXZ}_{x,x+1}: \qquad \begin{array}{c|c} \text{eigenvalue} & \text{eigenvector} \\ \hline 0 & |\uparrow\uparrow\rangle,\ |\downarrow\downarrow\rangle \\ \tfrac12 (1 - \Delta^{-1}) & \tfrac{1}{\sqrt2}\left(|\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle\right) \\ \tfrac12 (1 + \Delta^{-1}) & \tfrac{1}{\sqrt2}\left(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\right) \end{array} \tag{2.3}$$
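The two nonzero eigenvalues in (2.3) can be read off from the $2 \times 2$ block of $H^{XXZ}_{x,x+1}$ on the antiparallel states. The short computation below is a worked check rather than part of the original argument; it uses only the definition of the interaction together with $\vec{S}_x \cdot \vec{S}_{x+1} = S^3_x S^3_{x+1} + \tfrac12 (S^+_x S^-_{x+1} + S^-_x S^+_{x+1})$.

```latex
% On span{|up,down>, |down,up>} one has S^3_x S^3_{x+1} = -1/4, so the
% diagonal entries are  Delta^{-1}/2 + (1 - Delta^{-1})/2 = 1/2,
% and the spin-flip term contributes -1/(2 Delta) off the diagonal.
\[
H^{XXZ}_{x,x+1}\Big|_{\{\uparrow\downarrow,\ \downarrow\uparrow\}}
  = \begin{pmatrix} \tfrac12 & -\tfrac{1}{2\Delta} \\[2pt] -\tfrac{1}{2\Delta} & \tfrac12 \end{pmatrix},
\qquad
\text{with eigenvalues } \tfrac12\,(1 \mp \Delta^{-1})
\text{ on } \tfrac{1}{\sqrt2}\left(|\uparrow\downarrow\rangle \pm |\downarrow\uparrow\rangle\right).
\]
```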
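Let us define
$$P^\sigma_{x,x+1} = \mathbb{1}_1 \otimes \cdots \otimes \mathbb{1}_{x-1} \otimes |\sigma\sigma\rangle\langle\sigma\sigma| \otimes \mathbb{1}_{x+2} \otimes \cdots \otimes \mathbb{1}_L$$
for $\sigma = \uparrow, \downarrow$, and
$$P_{x,x+1} = P^\uparrow_{x,x+1} + P^\downarrow_{x,x+1}. \tag{2.4}$$
Then, clearly,
$$H^{XXZ}_{x,x+1} \ge \tfrac12 (1 - \Delta^{-1})(\mathbb{1} - P_{x,x+1}). \tag{2.5}$$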
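Lemma 2.1. The ground state energy for $H^{XXZ}_{[1,L]}$ is 0, and the ground state space is $\mathrm{span}\{|\uparrow\cdots\uparrow\rangle, |\downarrow\cdots\downarrow\rangle\}$. The following bound holds:
$$H^{XXZ}_{[1,L]} \ge \tfrac12 (1 - \Delta^{-1}) \left(\mathbb{1} - \mathrm{Proj}(\mathrm{span}\{|\uparrow\cdots\uparrow\rangle, |\downarrow\cdots\downarrow\rangle\})\right). \tag{2.6}$$

Proof. The fact that $|\uparrow\cdots\uparrow\rangle$ and $|\downarrow\cdots\downarrow\rangle$ are annihilated by $H^{XXZ}_{[1,L]}$ follows trivially from the fact that they are annihilated by each pairwise interaction $H^{XXZ}_{x,x+1}$. So, in fact, these states are frustration-free ground states. Next,
$$H^{XXZ}_{[1,L]} \ge \tfrac12 (1 - \Delta^{-1}) \sum_{x=1}^{L-1} (\mathbb{1} - P_{x,x+1}),$$
by (2.1) and (2.5). We observe that each $P_{x,x+1}$ is an orthogonal projection. Moreover, $P_{x,x+1}$ commutes with $P_{y,y+1}$ for every $x$ and $y$. So
$$\mathbb{1} - \prod_{x=1}^{L-1} P_{x,x+1} = \sum_{x=1}^{L-1} \left(\prod_{y=1}^{x-1} P_{y,y+1}\right) (\mathbb{1} - P_{x,x+1}) \le \sum_{x=1}^{L-1} (\mathbb{1} - P_{x,x+1}).$$
But $\prod_{x=1}^{L-1} P_{x,x+1} = \mathrm{Proj}(\mathrm{span}\{|\uparrow\cdots\uparrow\rangle, |\downarrow\cdots\downarrow\rangle\})$, which proves (2.6). □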
All the other Hamiltonians we consider, namely $H^{\alpha\beta}_{[1,L]}$ for $\alpha, \beta = \pm 1, 0$, defined in (1.2), are perturbations of $H^{XXZ}_{[1,L]}$ by boundary fields. The Hamiltonian $H^{+-}_{[1,L]}$ is known as the kink Hamiltonian, and $H^{-+}_{[1,L]}$ is the antikink Hamiltonian. These two models are distinguished because they each possess a quantum group symmetry, for the quantum group $SU_q(2)$. It should be mentioned that the representation of $SU_q(2)$ on $\mathcal{H}_L$ which commutes with $H^{+-}_{[1,L]}$ is different from the representation which commutes with $H^{-+}_{[1,L]}$. These Hamiltonians are also distinguished because, like $H^{XXZ}_{[1,L]}$, they can be written as sums of nearest-neighbor interactions and all their ground states are frustration-free. We will give a formula, sufficient for our purposes, for the ground states of $H^{+-}_{[1,L]}$ and $H^{-+}_{[1,L]}$, respectively. First define the sectors of fixed total down-spins so that $\mathcal{H}_{L,0} = \mathrm{span}\{|\uparrow\cdots\uparrow\rangle\}$, and for $n = 1, \dots, L$,
$$\mathcal{H}_{L,n} = \mathrm{span}\Big\{\prod_{i=1}^{n} S^-_{x_i} |\uparrow\cdots\uparrow\rangle : 1 \le x_1 < x_2 < \cdots < x_n \le L\Big\}.$$
Thus, $S^3_{\mathrm{tot}}\,\mathrm{Proj}(\mathcal{H}_{L,n}) = (\tfrac{L}{2} - n)\,\mathrm{Proj}(\mathcal{H}_{L,n})$. Then $H^{+-}_{[1,L]}$ and $H^{-+}_{[1,L]}$ each have $L + 1$ ground states, one for each sector. Let $\psi^{+-}_{[1,L]}(n)$ and $\psi^{-+}_{[1,L]}(n)$ be these ground states, normalized as given in (1.3) and (1.4). The spectral gap is known to exist for each sector $\mathcal{H}_{L,n}$, $n = 1, \dots, L - 1$, and to be independent of $n$. Specifically, in [9] the following was proved.
Proposition 2.2. For the $SU_q(2)$ invariant Hamiltonian $H^{+-}_{[1,L]}$, $L \ge 2$, and $\Delta \ge 1$ one has
$$\gamma_L := \inf\left\{ \frac{\langle\psi | H^{+-}_{[1,L]} \psi\rangle}{\langle\psi | \psi\rangle} : \psi \in \mathcal{H}_{L,n},\ \psi \ne 0,\ \langle\psi | \psi^{+-}_{[1,L]}(n)\rangle = 0 \right\} = 1 - \Delta^{-1} \cos(\pi/L).$$
In particular
$$\gamma_L \ge 1 - \Delta^{-1},$$
for all $L \ge 2$, and in addition the spectral gap above any of the ground state representations of the GNS Hamiltonian for the infinite chain is exactly $1 - \Delta^{-1}$.

We will define $\gamma = 1 - \Delta^{-1}$, which is the greatest lower bound of all $\gamma_L$, and the spectral gap for the infinite chain. A result identical with this one holds for the $H^{-+}_{[1,L]}$ spin chain, which may be obtained using spin-flip or reflection symmetry.

There are important differences between the droplet Hamiltonian, $H^{++}_{[1,L]}$, and the kink Hamiltonian, which we briefly explain. Since $H^{++}_{[1,L]}$ commutes with $S^3_{\mathrm{tot}}$, it makes sense to block diagonalize it with respect to the sectors $\mathcal{H}_{L,n}$, $n = 0, \dots, L$. If we consider the spectrum of $H^{++}_{[1,L]}$ on the sector $\mathcal{H}_{L,n}$ for $L$ and $n$ both large, we will see that there are $L + 1 - n$ eigenvalues in a very small interval about $A(\Delta)$. Then there is a gap above $A(\Delta)$ of width approximately $\gamma$, with error at most $O(n^{-1/4})$, which is free of any eigenvalues. This is different from the case of the kink and antikink Hamiltonians, where the ground state in each sector is nondegenerate, with a uniform spectral gap above. In our case, the ground state is nondegenerate only because the translation invariance is broken in the finite systems. As $L \to \infty$, the translation invariance is restored and the lowest eigenvalue in each sector becomes infinitely degenerate. Therefore, as is done in Theorem 1.2, it is natural to consider the spectral projection corresponding to the $L + 1 - n$ lowest eigenvalues as opposed to just the ground state space.

Before beginning to prove the main theorem, we will observe some simple facts about the droplet Hamiltonian. First, the two-site Hamiltonian $H^{++}_{x,x+1}$ restricted to $\mathbb{C}^2_x \otimes \mathbb{C}^2_{x+1}$ is diagonalized as follows:
$$H^{++}_{x,x+1}: \qquad \begin{array}{c|c} \text{eigenvalue} & \text{eigenvector} \\ \hline -A(\Delta) & |\uparrow\uparrow\rangle \\ \tfrac12 (1 - \Delta^{-1}) & \tfrac{1}{\sqrt2}\left(|\uparrow\downarrow\rangle + |\downarrow\uparrow\rangle\right) \\ A(\Delta) & |\downarrow\downarrow\rangle \\ \tfrac12 (1 + \Delta^{-1}) & \tfrac{1}{\sqrt2}\left(|\uparrow\downarrow\rangle - |\downarrow\uparrow\rangle\right) \end{array} \tag{2.7}$$
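Proposition 2.2 lends itself to a direct numerical check on very short chains. The sketch below computes, for the kink Hamiltonian $H^{+-}_{[1,L]}$ restricted to each sector with $1 \le n \le L - 1$ down spins, the ground state energy and the gap to the next eigenvalue, and compares the gap with $1 - \Delta^{-1}\cos(\pi/L)$. The helper names and the dense-matrix construction are assumptions made here; only the formulas (1.1), (1.2) and the statement of the proposition come from the text.

```python
import numpy as np

SZ = np.array([[0.5, 0.0], [0.0, -0.5]])  # S^3 in the basis (up, down)
SP = np.array([[0.0, 1.0], [0.0, 0.0]])   # S^+ : down -> up


def site_op(op, x, L):
    """Embed a 2x2 matrix at site x (1-indexed) of an L-site chain."""
    out = np.eye(1)
    for y in range(1, L + 1):
        out = np.kron(out, op if y == x else np.eye(2))
    return out


def kink_hamiltonian(L, delta):
    """H^{+-}_{[1,L]}: XXZ bonds (1.1) minus A(Delta) (S^3_1 - S^3_L)."""
    ident = np.eye(2 ** L)
    H = np.zeros((2 ** L, 2 ** L))
    for x in range(1, L):
        s3s3 = site_op(SZ, x, L) @ site_op(SZ, x + 1, L)
        hop = site_op(SP, x, L) @ site_op(SP.T, x + 1, L)  # S^+_x S^-_{x+1}
        flip = 0.5 * (hop + hop.T)
        H += (-(s3s3 + flip - 0.25 * ident) / delta
              - (1.0 - 1.0 / delta) * (s3s3 - 0.25 * ident))
    A = 0.5 * np.sqrt(1.0 - delta ** -2.0)
    H -= A * (site_op(SZ, 1, L) - site_op(SZ, L, L))
    return H


def kink_gap(L, n, delta):
    """(ground energy, gap) of H^{+-}_{[1,L]} in the sector with n down spins."""
    H = kink_hamiltonian(L, delta)
    # One set bit in the basis index per down spin (up = 0, down = 1).
    idx = [i for i in range(2 ** L) if bin(i).count("1") == n]
    evals = np.linalg.eigvalsh(H[np.ix_(idx, idx)])
    return evals[0], evals[1] - evals[0]


L, delta = 3, 2.125
results = {n: kink_gap(L, n, delta) for n in range(1, L)}
predicted_gap = 1.0 - np.cos(np.pi / L) / delta
```

For $L = 3$ the sector with one down spin is a $3 \times 3$ problem whose eigenvalues are $0$ and $1 \pm \tfrac{1}{2\Delta}$, so the comparison can also be done by hand.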
Note that it is not true that $H^{++}_L$ is the sum of $H^{++}_{x,x+1}$ for all nearest neighbor pairs $x, x+1 \in [1, L]$, as was the case for $H^{+-}_L$ and $H^{XXZ}_L$. Instead the following identities are true:
$$H^{++}_L = H^{+-}_{[1,x]} + H^{++}_{x,x+1} + H^{-+}_{[x+1,L]}, \tag{2.8}$$
$$\phantom{H^{++}_L} = H^{+-}_{[1,x]} + H^{++}_{[x,L]}, \tag{2.9}$$
$$\phantom{H^{++}_L} = H^{++}_{[1,x]} + H^{-+}_{[x,L]}, \tag{2.10}$$
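for $1 \le x \le L - 1$. These identities should be kept in mind since they allow us to cut the droplet spin chain at the sites $x, x+1$. This vague notion will be explained in detail in Sect. 5. The diagonalization of $H^{--}_{x,x+1}$ is the same as the diagonalization of $H^{++}_{x,x+1}$ above, except that $\uparrow$ and $\downarrow$ are interchanged for each of the eigenvectors. Now we state an obvious (but poor) preliminary lower bound for $\lambda_{L,n}(1)$.

Proposition 2.3. The ground state energy of $H^{++}_L$ on $\mathcal{H}_L$ is $-A(\Delta)$, and the ground state space is $\mathrm{span}\{|\uparrow\cdots\uparrow\rangle\}$. Moreover,
$$\frac{\langle\psi | H^{++}_{[1,L]} \psi\rangle}{\langle\psi | \psi\rangle} \ge -A(\Delta) + \tfrac12 (1 - \Delta^{-1}) \quad \text{for all nonzero } \psi \perp |\uparrow\cdots\uparrow\rangle. \tag{2.11}$$

Proof. First, $H^{++}_{[1,L]} \ge -A(\Delta)\mathbb{1}$ because $H^{XXZ}_{[1,L]} \ge 0$ and $-A(\Delta)(S^{(3)}_1 + S^{(3)}_L) \ge -A(\Delta)\mathbb{1}$. It is also clear that $H^{++}_{[1,L]} |\uparrow\cdots\uparrow\rangle = -A(\Delta) |\uparrow\cdots\uparrow\rangle$, and $H^{++}_L |\downarrow\cdots\downarrow\rangle = A(\Delta) |\downarrow\cdots\downarrow\rangle$, in agreement with (2.11). Because $|\uparrow\cdots\uparrow\rangle$ and $|\downarrow\cdots\downarrow\rangle$ are eigenvectors of the self-adjoint operator $H^{++}_{[1,L]}$, all that remains is to check that (2.11) holds on $\mathrm{span}\{|\uparrow\cdots\uparrow\rangle, |\downarrow\cdots\downarrow\rangle\}^\perp$. But this is true by Lemma 2.1, since $H^{++}_L \ge -A(\Delta) + H^{XXZ}_L$ and $H^{XXZ}_L \ge \tfrac12 (1 - \Delta^{-1})$ on $\mathrm{span}\{|\uparrow\cdots\uparrow\rangle, |\downarrow\cdots\downarrow\rangle\}^\perp$. □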
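The cutting identities (2.8) to (2.10) and the bound of Proposition 2.3 can be checked numerically on a short chain. The following sketch does so for $L = 5$; the helper `h_ab` and the other names are assumptions introduced for this illustration, and the dense construction is only meant for small $L$.

```python
import numpy as np

SZ = np.array([[0.5, 0.0], [0.0, -0.5]])  # S^3 in the basis (up, down)
SP = np.array([[0.0, 1.0], [0.0, 0.0]])   # S^+ : down -> up


def site_op(op, x, L):
    """Embed a 2x2 matrix at site x (1-indexed) of an L-site chain."""
    out = np.eye(1)
    for y in range(1, L + 1):
        out = np.kron(out, op if y == x else np.eye(2))
    return out


def h_ab(alpha, beta, a, b, L, delta):
    """H^{alpha beta}_{[a,b]} of (1.2), embedded in the chain [1, L]."""
    ident = np.eye(2 ** L)
    H = np.zeros((2 ** L, 2 ** L))
    for x in range(a, b):
        s3s3 = site_op(SZ, x, L) @ site_op(SZ, x + 1, L)
        hop = site_op(SP, x, L) @ site_op(SP.T, x + 1, L)  # S^+_x S^-_{x+1}
        flip = 0.5 * (hop + hop.T)
        H += (-(s3s3 + flip - 0.25 * ident) / delta
              - (1.0 - 1.0 / delta) * (s3s3 - 0.25 * ident))
    A = 0.5 * np.sqrt(1.0 - delta ** -2.0)
    H -= A * (alpha * site_op(SZ, a, L) + beta * site_op(SZ, b, L))
    return H


L, delta = 5, 2.125
A = 0.5 * np.sqrt(1.0 - delta ** -2.0)
H_pp = h_ab(+1, +1, 1, L, L, delta)

max_err = 0.0
for x in range(1, L):
    cut_28 = (h_ab(+1, -1, 1, x, L, delta) + h_ab(+1, +1, x, x + 1, L, delta)
              + h_ab(-1, +1, x + 1, L, L, delta))
    cut_29 = h_ab(+1, -1, 1, x, L, delta) + h_ab(+1, +1, x, L, L, delta)
    cut_210 = h_ab(+1, +1, 1, x, L, delta) + h_ab(-1, +1, x, L, L, delta)
    for cut in (cut_28, cut_29, cut_210):
        max_err = max(max_err, np.abs(cut - H_pp).max())

# Proposition 2.3: lowest eigenvalue -A, everything else above -A + gamma/2.
evals = np.linalg.eigvalsh(H_pp)
ground_energy, first_excited = evals[0], evals[1]
gap_bound = 0.5 * (1.0 - 1.0 / delta)
```

The identities hold because the boundary fields at the cut sites cancel pairwise; the numerical check simply confirms that bookkeeping.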
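We now begin the actual proof of Theorems 1.1 and 1.2.

3. Evaluation of $H^{++}_{[1,L]}$ on Droplet States

We begin by proving part (a) of Theorem 1.1. This is straightforward because we have closed expressions for each $\xi_{L,n}(x)$ and for $H^{++}_{[1,L]}$. The heart of the proof is a number of computations which show that $\xi_{L,n}(x)$ and $\xi_{L,n}(y)$ are approximately orthogonal with respect to the inner product $\langle * | * \rangle$ as well as $\langle * | H^{++}_{[1,L]} * \rangle$ and $\langle * | (H^{++}_{[1,L]})^2 * \rangle$, when $x \ne y$ and $n$ is large enough. Specifically,
$$\frac{|\langle \xi_{L,n}(x) | \xi_{L,n}(y) \rangle|}{\|\xi_{L,n}(x)\| \cdot \|\xi_{L,n}(y)\|} \le \frac{q^{n|y-x|}}{f_q(\infty)} \quad \text{for all } x, y; \tag{3.1}$$
$$\frac{|\langle \xi_{L,n}(x) | H^{++}_{[1,L]} \xi_{L,n}(y) \rangle|}{\|\xi_{L,n}(x)\| \cdot \|\xi_{L,n}(y)\|} \le \frac{q^{n|y-x|}}{f_q(\infty)} \quad \text{if } x \ne y; \tag{3.2}$$
$$\frac{|\langle \xi_{L,n}(x) | (H^{++}_{[1,L]})^2 \xi_{L,n}(y) \rangle|}{\|\xi_{L,n}(x)\| \cdot \|\xi_{L,n}(y)\|} \le \frac{q^{n|y-x|}}{f_q(\infty)} \quad \text{if } |x - y| \ge 2. \tag{3.3}$$
Here $f_q(\infty)$ is a number arising in partition theory [2],
$$f_q(\infty) = \prod_{n=1}^{\infty} (1 - q^{2n}).$$
(It is usually written as $(q^2; q^2)_\infty$.) The important fact is that $f_q(\infty) \in (0, 1]$ for $q \in [0, 1)$. We need one more piece of information, which is that
$$\frac{\|(H^{++}_{[1,L]} - A(\Delta))\, \xi_{L,n}(x)\|^2}{\|\xi_{L,n}(x)\|^2} \le \frac{2 q^{2\lfloor n/2 \rfloor}}{1 - q^{2\lfloor n/2 \rfloor}}. \tag{3.4}$$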
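Since $f_q(\infty)$ appears in every bound of this section, it can help to have its numerical value at hand. The sketch below truncates the infinite product once further factors are indistinguishable from 1 in double precision; the function names and the truncation rule are assumptions made here for illustration.

```python
def f_q_infinity(q, tol=1e-17, max_terms=100000):
    """Approximate f_q(inf) = prod_{n>=1} (1 - q^(2n)) for 0 <= q < 1."""
    if not 0.0 <= q < 1.0:
        raise ValueError("q must satisfy 0 <= q < 1")
    product = 1.0
    for n in range(1, max_terms + 1):
        term = q ** (2 * n)
        if term < tol:  # remaining factors equal 1 to double precision
            break
        product *= 1.0 - term
    return product


def overlap_bound(q, n, separation):
    """Right-hand side of (3.1): q^(n |y - x|) / f_q(inf)."""
    return q ** (n * separation) / f_q_infinity(q)


f_quarter = f_q_infinity(0.25)  # the value relevant for q = 1/4
```

For $q = 1/4$ the product is close to 1, so the constants $1/f_q(\infty)$ in (3.1) to (3.3) are mild; they grow as $q \to 1$.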
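To prove this, we refer to Eq. (6.7) of [5]. In that paper, it is proved that
$$\frac{\|P^\downarrow_L \psi^{-+}_{[1,L]}(n)\|^2}{\|\psi^{-+}_{[1,L]}(n)\|^2} < q^{2(L-n)}\, \frac{1 - q^{2n}}{1 - q^{2L}} \le \frac{q^{2(L-n)}}{1 - q^{2(L-n)}},$$
where $P^\sigma_x = \mathbb{1}_1 \otimes \cdots \otimes \mathbb{1}_{x-1} \otimes |\sigma\rangle\langle\sigma| \otimes \mathbb{1}_{x+1} \otimes \cdots \otimes \mathbb{1}_L$ for $\sigma = \uparrow, \downarrow$. Using spin-flip and reflection symmetry, we obtain
$$\frac{\|P^\uparrow_L \psi^{+-}_{[1,L]}(n)\|^2}{\|\psi^{+-}_{[1,L]}(n)\|^2} < \frac{q^{2n}}{1 - q^{2n}}, \qquad \frac{\|P^\uparrow_1 \psi^{-+}_{[1,L]}(n)\|^2}{\|\psi^{-+}_{[1,L]}(n)\|^2} < \frac{q^{2n}}{1 - q^{2n}}.$$
Since $\xi_{L,n}(x) = \psi^{+-}_{[1,x]}(\lfloor n/2 \rfloor) \otimes \psi^{-+}_{[x+1,L]}(\lceil n/2 \rceil)$, we then have the bounds
$$\frac{\|P^\uparrow_x \xi_{L,n}(x)\|^2}{\|\xi_{L,n}(x)\|^2} \le \frac{q^{2\lfloor n/2 \rfloor}}{1 - q^{2\lfloor n/2 \rfloor}}, \qquad \frac{\|P^\uparrow_{x+1} \xi_{L,n}(x)\|^2}{\|\xi_{L,n}(x)\|^2} \le \frac{q^{2\lceil n/2 \rceil}}{1 - q^{2\lceil n/2 \rceil}}. \tag{3.5}$$
Now $H^{++}_{[1,L]} \xi_{L,n}(x) = H^{++}_{x,x+1} \xi_{L,n}(x)$, because of the identity (2.8), and the fact that
$$H^{+-}_{[1,x]} \xi_{L,n}(x) = H^{-+}_{[x+1,L]} \xi_{L,n}(x) = 0.$$
By (2.7), we estimate
$$0 \le (H^{++}_{x,x+1} - A(\Delta))^2 \le P^\uparrow_x + P^\uparrow_{x+1},$$
which, together with (3.5), proves (3.4).

We are now poised to prove Theorem 1.1(a). We state the argument, which is very simple, as a lemma. It is useful to do it this way, because we will repeat the argument twice more in the proofs of Theorems 6.1 and 6.2.

Lemma 3.1. Let $\{f_n : n \in \mathbb{Z}\}$ be a family of states, normalized so that $\|f_n\| = 1$ for all $n$, but not necessarily orthogonal. Suppose, however, that there are constants $C < \infty$ and $\epsilon < 1$ such that $|\langle f_n | f_m \rangle| \le C \epsilon^{|n-m|}$ for all $m, n$. If $(1 + 2C)\epsilon < 1$, then
$$\Big\| \sum_{n \in \mathbb{Z}} \mathrm{Proj}(f_n) - \mathrm{Proj}(\mathrm{span}(\{f_n : n \in \mathbb{Z}\})) \Big\| \le \frac{2C\epsilon}{1 - \epsilon}. \tag{3.6}$$
Suppose that $X$ is a self-adjoint operator such that for some $r < \infty$ we have $\|X f_n\| \le r$ for all $n$, and for some $C' < \infty$, $N \in \mathbb{N}$ we have $|\langle X f_n | X f_m \rangle| \le C' \epsilon^{|n-m|}$ whenever $|n - m| \ge N$. Then
$$\| X \cdot \mathrm{Proj}(\mathrm{span}(\{f_n : n \in \mathbb{Z}\})) \| \le \left[ \frac{(2N-1) r^2 + \frac{2C' \epsilon^N}{1 - \epsilon}}{1 - \frac{2C\epsilon}{1 - \epsilon}} \right]^{1/2}. \tag{3.7}$$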
The same results hold if {fn } is a finite family, in which case the bounds are even smaller.
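Proof. Define $F = \sum_{n=-\infty}^{\infty} |f_n\rangle\langle f_n|$. Define $E$, an infinite matrix such that $E_{mn} = \langle f_m | f_n \rangle$. Let $\{e_n : n \in \mathbb{Z}\}$ be an orthonormal family in any Hilbert space, and let $A = \sum_n |f_n\rangle\langle e_n|$. Then $E = A^* A$ and $F = A A^*$. For simplicity let $\mathcal{F} = \mathrm{cl}(\mathrm{span}(\{f_n : n \in \mathbb{Z}\}))$, and let $\mathcal{E} = \mathrm{cl}(\mathrm{span}(\{e_n : n \in \mathbb{Z}\}))$. We consider $A : \mathcal{E} \to \mathcal{F}$. Then we calculate
$$\| A^* A - \mathbb{1}_{\mathcal{E}} \| \le \sup_m \sum_{n \ne m} |E_{mn}| \le \frac{2C\epsilon}{1 - \epsilon}.$$
Since $2C\epsilon < 1 - \epsilon$, this shows that $A$ is bounded and $A^* A$ is invertible. Under the invertibility condition, it is true that $A A^*$ is also invertible on $\mathcal{F}$, and considering this as its domain, $\sigma(A A^*) = \sigma(A^* A)$. If we let $E$ and $F$ operate on proper superspaces of $\mathcal{E}$ and $\mathcal{F}$, then they will be identically zero on the orthogonal complements. But it is still true that $\sigma(E) \setminus \{0\} = \sigma(A A^*) = \sigma(A^* A) = \sigma(F) \setminus \{0\}$. In particular, if we let $P_{\mathcal{F}}$ be the orthogonal projection onto $\mathcal{F}$, then
$$\| F - P_{\mathcal{F}} \| = \| A^* A - \mathbb{1}_{\mathcal{E}} \| \le \frac{2C\epsilon}{1 - \epsilon}.$$
This proves (3.6). To prove the second part, let $\psi = \sum_n \alpha_n f_n$ be a state in $\mathcal{F}$. Let $\phi = \sum_n \alpha_n e_n$. Then
$$\|\psi\|^2 = \langle \phi | A^* A \phi \rangle \ge \Big(1 - \frac{2C\epsilon}{1 - \epsilon}\Big) \sum_n |\alpha_n|^2. \tag{3.8}$$
We calculate
$$\|X\psi\|^2 = \sum_{m,n} \overline{\alpha}_m \alpha_n \langle X f_m | X f_n \rangle \le \sum_n |\alpha_n|^2 \cdot \sup_m \sum_n |\langle X f_m | X f_n \rangle|.$$
Breaking the sum into two pieces yields, for any $m \in \mathbb{Z}$,
$$\sum_n |\langle X f_m | X f_n \rangle| \le \sum_{n:\, |m-n| < N} |\langle X f_m | X f_n \rangle| + \sum_{n:\, |m-n| \ge N} |\langle X f_m | X f_n \rangle| \le (2N - 1) r^2 + \frac{2C' \epsilon^N}{1 - \epsilon}.$$
So, using (3.8), we have
$$\|X\psi\|^2 \le \frac{(2N - 1) r^2 + \frac{2C' \epsilon^N}{1 - \epsilon}}{1 - \frac{2C\epsilon}{1 - \epsilon}}\, \|\psi\|^2$$
for any nonzero $\psi \in \mathcal{F}$. This proves (3.7). □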
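The first part of Lemma 3.1 can be illustrated on a toy family of nearly orthogonal vectors. In the sketch below, $f_n \propto e_n + a\, e_{n+1}$ in $\mathbb{R}^{\mathrm{num}+1}$, so that only neighboring overlaps are nonzero and the hypothesis holds with $C = 1$ and $\epsilon = a/(1 + a^2)$. The family, the parameter values, and the function name are all assumptions chosen for this example; they are unrelated to the droplet states themselves.

```python
import numpy as np


def gram_check(num=20, a=0.1):
    """Compare ||sum_n Proj(f_n) - Proj(span{f_n})|| with the bound (3.6)."""
    dim = num + 1
    f = np.zeros((num, dim))
    for n in range(num):
        f[n, n] = 1.0
        f[n, n + 1] = a
    f /= np.linalg.norm(f, axis=1, keepdims=True)  # normalize each f_n

    E = f @ f.T   # Gram matrix E_mn = <f_m | f_n>
    F = f.T @ f   # sum of the rank-one projections |f_n><f_n|

    # Orthogonal projection onto span{f_n}, from an orthonormal basis (QR).
    Q, _ = np.linalg.qr(f.T)
    P = Q @ Q.T

    eps = a / (1.0 + a * a)   # overlap of neighboring vectors
    C = 1.0
    lhs = np.linalg.norm(F - P, 2)                  # spectral norm
    gram_dev = np.linalg.norm(E - np.eye(num), 2)   # ||A*A - 1||
    bound = 2.0 * C * eps / (1.0 - eps)
    return lhs, gram_dev, bound


lhs, gram_dev, bound = gram_check()
```

The equality of `lhs` and `gram_dev` reflects the step $\|F - P_{\mathcal{F}}\| = \|A^*A - \mathbb{1}_{\mathcal{E}}\|$ in the proof, which rests on $AA^*$ and $A^*A$ sharing their nonzero spectrum.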
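Now to prove Theorem 1.1(a), we note that the hypotheses of the lemma are met. Namely, take $f_x = \xi_{L,n}(x)$. By (3.1), we have $|\langle f_x | f_y \rangle| \le C \epsilon^{|x-y|}$, where $C = f_q(\infty)^{-1}$ and $\epsilon = q^n$. We set $X = H^{++}_{[1,L]} - A(\Delta)$. Then by (3.1), (3.2) and (3.3), we have $\langle X f_x | X f_y \rangle \le C' \epsilon^{|x-y|}$, for $|x - y| \ge 2$, where $C' = 4/f_q(\infty)$. (Since $A(\Delta) \le 1$, $1 + 2A(\Delta) + A(\Delta)^2 \le 4$.) By (3.4), we have $\|X \xi_x\| \le r$ for all $x$, where $r^2 = 2q^{2\lfloor n/2 \rfloor}/(1 - q^{2\lfloor n/2 \rfloor})$. Therefore, by Lemma 3.1, and some trivial estimations,
$$\left\| (H^{++}_{[1,L]} - A(\Delta)) \cdot \mathrm{Proj}(K_{L,n}) \right\| \le \frac{2\sqrt{2}\, q^{\lfloor n/2 \rfloor}}{(1 - 3q^{2\lfloor n/2 \rfloor})\, f_q(\infty)}. \tag{3.9}$$
The lemma also gives us the following result:
$$\Big\| \mathrm{Proj}(K_{L,n}) - \sum_{x=\lfloor n/2 \rfloor}^{L - \lceil n/2 \rceil} \mathrm{Proj}(\xi_{L,n}(x)) \Big\| \le \frac{2q^n}{(1 - q^n)\, f_q(\infty)}. \tag{3.10}$$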
This will prove useful in Sect. 5, because it is a precise statement of just how orthogonal our proposed states $\xi_{L,n}(x)$ are to each other.

4. Existence of Fully Polarized Intervals

We know that the ground states of the kink Hamiltonian exhibit a localized interface such that to the left of the interface nearly all spins are observed in the $\downarrow$ state, and to the right nearly all spins are observed in the $\uparrow$ state. The interface has a thickness due to quantum fluctuations. A similar phenomenon occurs with the antikink Hamiltonian but with left and right reversed or, alternatively, with $\uparrow$ and $\downarrow$ reversed. We might hope that the ground state of the droplet Hamiltonian will also contain an interval (or several intervals) with nearly all $\uparrow$- or all $\downarrow$-spins. This is the case, and we prove it next.

Definition 4.1. For any finite interval $J \subset \mathbb{Z}$ define the orthogonal projections
$$P^\uparrow_J = |\uparrow\cdots\uparrow\rangle\langle\uparrow\cdots\uparrow|_J \otimes \mathbb{1}_{I \setminus J}, \qquad P^\downarrow_J = |\downarrow\cdots\downarrow\rangle\langle\downarrow\cdots\downarrow|_J \otimes \mathbb{1}_{I \setminus J}, \qquad P_J = P^\uparrow_J + P^\downarrow_J.$$
We also define, for any operator $X$ and any nonzero state $\psi$, the Rayleigh quotient
$$\rho(\psi, X) = \frac{\langle\psi | X \psi\rangle}{\langle\psi | \psi\rangle}.$$

Proposition 4.2. Suppose $\psi \in \mathcal{H}_L$ is a nonzero state, and let $E = \rho(\psi, H^{XXZ}_L)$. Given $l < L$, there is a subinterval $J = [a, a + l - 1] \subset [1, L]$ satisfying the bound
$$\frac{\|P_J \psi\|^2}{\|\psi\|^2} \ge 1 - \frac{2E}{\gamma \lfloor L/l \rfloor}. \tag{4.1}$$
Moreover, denoting
$$\epsilon := \frac{2E}{\gamma \lfloor L/l \rfloor},$$
then as long as $\epsilon < 1$, we have the following bound:
$$\rho(P_J \psi, H^{XXZ}_{[1,L]}) \le \frac{E}{1 - \epsilon} + 2\Delta^{-1} \sqrt{\frac{\epsilon}{1 - \epsilon}}. \tag{4.2}$$
Proof. Partition $[1, L]$ into $r = \lfloor L/l \rfloor$ intervals $J_1, \dots, J_r$, each of length $\ge l$. If $J_i = [a_i, a_{i+1} - 1]$ then
$$H^{XXZ}_L = \sum_{i=1}^{r} H^{XXZ}_{J_i} + \sum_{i=2}^{r} H^{XXZ}_{a_i - 1, a_i} \ge \sum_{i=1}^{r} H^{XXZ}_{J_i}.$$
By Lemma 2.1,
$$\rho(\psi, H^{XXZ}_{J_i}) \ge \frac{\gamma}{2} \left(1 - \rho(\psi, P_{J_i})\right).$$
So
$$E \ge \frac{\gamma}{2} \sum_{i=1}^{r} \left(1 - \rho(\psi, P_{J_i})\right) \ge \frac{\gamma}{2}\, r\, \min_i \left(1 - \rho(\psi, P_{J_i})\right).$$
In other words,
$$\rho(\psi, P_{J_i}) \ge 1 - \frac{2E}{\gamma r},$$
for some $i$. Since $[a_i, a_i + l - 1] \subset J_i$, $P_{J_i} \le P_{[a_i, a_i + l - 1]}$. Let $J = [a_i, a_i + l - 1]$; then (4.1) holds.

Note that for any orthogonal projection $P$ and any operator $H$ we have the decomposition $H = PHP + (1 - P)H(1 - P) + [P, [P, H]]$. If $H$ is nonnegative, then $(1 - P)H(1 - P)$ is as well. Hence $PHP \le H - [P, [P, H]]$. On the other hand, it is obvious that $P[P, [P, H]]P = (1 - P)[P, [P, H]](1 - P) = 0$, which implies
$$\rho(\psi, PHP) \le \rho(\psi, H) + 2 \|[P, [P, H]]\|\, \frac{\|P\psi\|\, \|(1 - P)\psi\|}{\|\psi\|^2}$$
for any nonzero $\psi$. Moreover,
$$\rho(P\psi, H) = \frac{\rho(\psi, PHP)}{\rho(\psi, P)} \le \frac{\rho(\psi, H)}{\rho(\psi, P)} + 2 \|[P, [P, H]]\| \sqrt{\frac{\rho(\psi, 1 - P)}{\rho(\psi, P)}}. \tag{4.3}$$
In our particular case, where $H = H^{XXZ}_L$ and $P = P_J$, (4.3) and (4.1) imply
$$\rho(P_J \psi, H^{XXZ}_L) \le \frac{E}{1 - \epsilon} + 2 \|[P_J, [P_J, H^{XXZ}_L]]\| \sqrt{\frac{\epsilon}{1 - \epsilon}}. \tag{4.4}$$
All that remains is to calculate $\|[P_J, [P_J, H^{XXZ}_{[1,L]}]]\|$. Notice that
$$[P_J, [P_J, H^{XXZ}_{[1,L]}]] = \sum_{x \in [1, L-1]}\ \sum_{\alpha, \beta \in \{\uparrow, \downarrow\}} [P^\alpha_J, [P^\beta_J, H^{XXZ}_{x,x+1}]],$$
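and that $H^{XXZ}_{x,x+1}$ commutes with $P^\beta_J$ for all $x, x+1$ except $a - 1, a$ and $b, b + 1$. (We define $b = a + l - 1$.) Straightforward computations yield
$$[P^\beta_J, H^{XXZ}_{a-1,a}] = -\frac{1}{2\Delta}\, \mathbb{1}_{[1,a-2]} \otimes \left( |\bar\beta \beta\rangle\langle \beta \bar\beta| - |\beta \bar\beta\rangle\langle \bar\beta \beta| \right) \otimes P^\beta_{[a+1,b]} \otimes \mathbb{1}_{[b+1,L]}$$
and
$$[P^\beta_J, H^{XXZ}_{b,b+1}] = -\frac{1}{2\Delta}\, \mathbb{1}_{[1,a-1]} \otimes P^\beta_{[a,b-1]} \otimes \left( |\beta \bar\beta\rangle\langle \bar\beta \beta| - |\bar\beta \beta\rangle\langle \beta \bar\beta| \right) \otimes \mathbb{1}_{[b+2,L]},$$
where $\bar\uparrow = \downarrow$ and $\bar\downarrow = \uparrow$. It is easy to deduce that $[P^\alpha_J, [P^\beta_J, H^{XXZ}_L]]$ is zero unless $\alpha = \beta$. ($[P^\beta_J, H^{XXZ}_{a-1,a}]$ has a tensor factor $P^\beta_{[a+1,b]}$ and $P^\alpha_J$ has a tensor factor $P^\alpha_{[a+1,b]}$, which implies $[P^\alpha_J, [P^\beta_J, H^{XXZ}_{a-1,a}]]$ is zero unless $\alpha = \beta$. The term $[P^\alpha_J, [P^\beta_J, H^{XXZ}_{b,b+1}]]$ is treated similarly.) Another straightforward computation yields
$$[P^\beta_J, [P^\beta_J, H^{XXZ}_{a-1,a}]] = -\frac{1}{2\Delta}\, \mathbb{1}_{[1,a-2]} \otimes \left( |\bar\beta \beta\rangle\langle \beta \bar\beta| + |\beta \bar\beta\rangle\langle \bar\beta \beta| \right) \otimes P^\beta_{[a+1,b]} \otimes \mathbb{1}_{[b+1,L]}$$
and
$$[P^\beta_J, [P^\beta_J, H^{XXZ}_{b,b+1}]] = -\frac{1}{2\Delta}\, \mathbb{1}_{[1,a-1]} \otimes P^\beta_{[a,b-1]} \otimes \left( |\beta \bar\beta\rangle\langle \bar\beta \beta| + |\bar\beta \beta\rangle\langle \beta \bar\beta| \right) \otimes \mathbb{1}_{[b+2,L]}.$$
So
$$[P_J, [P_J, H^{XXZ}_L]] = -\frac{1}{2\Delta} \left( \mathbb{1}_{[1,a-2]} \otimes A_{a-1,a} \otimes P_{[a+1,b]} \otimes \mathbb{1}_{[b+1,L]} + \mathbb{1}_{[1,a-1]} \otimes P_{[a,b-1]} \otimes A_{b,b+1} \otimes \mathbb{1}_{[b+2,L]} \right),$$
where $A = |\uparrow\downarrow\rangle\langle\downarrow\uparrow| + |\downarrow\uparrow\rangle\langle\uparrow\downarrow|$. In particular $\|A\| = 1$, so that
$$\| \mathbb{1}_{[1,a-2]} \otimes A_{a-1,a} \otimes P_{[a+1,b]} \otimes \mathbb{1}_{[b+1,L]} \| = 1, \quad \text{and} \quad \| \mathbb{1}_{[1,a-1]} \otimes P_{[a,b-1]} \otimes A_{b,b+1} \otimes \mathbb{1}_{[b+2,L]} \| = 1.$$
Thus $\|[P_J, [P_J, H^{XXZ}_L]]\| \le \Delta^{-1}$, which along with (4.4) proves (4.2). □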
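The operator decomposition used at the start of this argument follows from $P^2 = P$ alone. For completeness, a short verification (a worked check, not part of the original text):

```latex
% Expand the double commutator using P^2 = P:
\[
[P,[P,H]] = P(PH - HP) - (PH - HP)P = PH + HP - 2PHP .
\]
% Expand the two diagonal blocks:
\[
PHP + (1-P)H(1-P) = H - PH - HP + 2PHP .
\]
% Adding the two lines gives H = PHP + (1-P)H(1-P) + [P,[P,H]].
% The double commutator is purely off-diagonal with respect to P:
\[
P\,[P,[P,H]]\,P = PHP + PHP - 2PHP = 0, \qquad
(1-P)\,[P,[P,H]]\,(1-P) = 0 ,
\]
% the second identity holding because every term of PH + HP - 2PHP
% has a factor P adjacent to one of the factors (1 - P).
```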
In the following corollary, we show that essentially the same results hold for any bounded perturbation of $H^{XXZ}_{[1,L]}$.

Corollary 4.3. Suppose $H_L$ is a bounded operator on $\mathcal{H}_L$ with
$$M = \| H_L - H^{XXZ}_{[1,L]} \|.$$
Let $E < \infty$ and $\psi \in \mathcal{H}_L$ be a nonzero state with $\rho(\psi, H_L) \le E$. Given any subinterval $K \subset [1, L]$ and $l < |K|$, there is a sub-subinterval $J \subset K$ of length $l$, satisfying the bound
$$\| \psi - P_J \psi \|^2 \le \epsilon \|\psi\|^2, \tag{4.5}$$
where
$$\epsilon = \frac{2(E + M)}{\gamma \lfloor |K|/|J| \rfloor}.$$
This statement is nonvacuous when $\epsilon < 1$. Also under the assumption that $\epsilon < 1$, we have the bound
$$\langle\psi | H_L \psi\rangle \ge \langle P_J \psi | H_L P_J \psi\rangle - \left( \epsilon M + 2(\Delta^{-1} + 2M) \sqrt{\epsilon (1 - \epsilon)} \right) \|\psi\|^2. \tag{4.6}$$

Proof. Since $\| H_L - H^{XXZ}_{[1,L]} \| = M$, it is clear that
$$\rho(\psi, H^{XXZ}_K) \le \rho(\psi, H^{XXZ}_{[1,L]}) \le E + M.$$
So Proposition 4.2 implies (4.5). To prove (4.6) notice that for any operator $H$, any orthogonal projection $P$, and any nonnegative operator $\tilde{H}$,
$$\begin{aligned} H - PHP &= (1 - P)H(1 - P) + [P, [P, H]] \\ &= (1 - P)\tilde{H}(1 - P) + (1 - P)(H - \tilde{H})(1 - P) + [P, [P, \tilde{H}]] + [P, [P, H - \tilde{H}]] \\ &\ge (1 - P)(H - \tilde{H})(1 - P) + [P, [P, \tilde{H}]] + [P, [P, H - \tilde{H}]]. \end{aligned}$$
So, for any nonzero $\psi$,
$$\rho(\psi, H - PHP) \ge -\| H - \tilde{H} \|\, \rho(\psi, 1 - P) - 2 \left( \|[P, [P, \tilde{H}]]\| + 2 \| H - \tilde{H} \| \right) \rho(\psi, P)^{1/2}\, \rho(\psi, 1 - P)^{1/2}.$$
Setting $H = H_L$, $\tilde{H} = H^{XXZ}_L$ and $P = P_J$ we have
$$\rho(\psi, H_L) - \rho(\psi, P_J H_L P_J) \ge -\epsilon M - 2(\Delta^{-1} + 2M) \sqrt{\epsilon (1 - \epsilon)}.$$
Since
$$\rho(P_J \psi, H_L) = \frac{\rho(\psi, P_J H_L P_J)}{\rho(\psi, P_J)} \le \frac{\rho(\psi, P_J H_L P_J)}{1 - \epsilon},$$
the corollary is proved. □

5. Remainder of the Proof

We will now prove Theorem 1.1(b). Let us henceforth denote $\mathrm{Proj}(\mathrm{span}\{\phi\})$ simply by $\mathrm{Proj}(\phi)$ for any nonzero state $\phi$. We observe by (3.10) that there are constants $C_0(q)$ and $N_0(q)$, such that
$$\Big\| \mathrm{Proj}(K_{L,n}) - \sum_{x=\lfloor n/2 \rfloor}^{L - \lceil n/2 \rceil} \mathrm{Proj}(\xi_L(x, n)) \Big\| \le C_0(q)\, q^n$$
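whenever $n \ge N_0(q)$. By (3.10), $N_0(q) = 1$ and $C_0(q) = (1 - q)^{-1} f_q(\infty)^{-1}$. Suppose we exhibit a sequence $\epsilon_n$, with $\lim_{n\to\infty} \epsilon_n = 0$, such that
$$H^{++}_{[1,L]}\, \mathrm{Proj}(\mathcal{H}_{L,n}) \ge (A(\Delta) - \epsilon_n)\, \mathrm{Proj}(\mathcal{H}_{L,n}) + \gamma \Big( \mathrm{Proj}(\mathcal{H}_{L,n}) - \sum_{x=\lfloor n/2 \rfloor}^{L - \lceil n/2 \rceil} \mathrm{Proj}(\xi_L(x, n)) \Big). \tag{5.1}$$
We know, by Theorem 1.1(a), that $(H^{++}_{[1,L]} - A(\Delta))\, \mathrm{Proj}(K_{L,n})$ is bounded above and below by $\pm C q^n \mathbb{1}$. Then we would know
$$H^{++}_{[1,L]}\, \mathrm{Proj}(\mathcal{H}_{L,n}) \ge (A(\Delta) - 2Cq^n)\, \mathrm{Proj}(\mathcal{H}_{L,n}) + (\gamma - \epsilon_n) \left( \mathrm{Proj}(\mathcal{H}_{L,n}) - \mathrm{Proj}(K_{L,n}) \right).$$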
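So to prove Theorem 1.1(b), it suffices to verify that there is a sequence $\epsilon_n$ satisfying (5.1). We will prove this fact in this section. We find it convenient to consider an arbitrary gap $\lambda$, $0 \le \lambda < \gamma$. Define $\epsilon_\lambda(L, n)$ to be the smallest nonnegative number such that
$$\langle\psi | H^{++}_{[1,L]} \psi\rangle \ge (A(\Delta) - \epsilon_\lambda(L, n)) \|\psi\|^2 + \lambda \Big\langle \psi \Big| \Big[ \mathbb{1} - \sum_{x=\lfloor n/2 \rfloor}^{L - \lceil n/2 \rceil} \mathrm{Proj}(\xi_L(x, n)) \Big] \psi \Big\rangle$$
holds for all $\psi \in \mathcal{H}_{L,n}$. We also define
$$\epsilon'_\lambda(L, n) = \max_{n \le L' \le L} \epsilon_\lambda(L', n), \qquad \epsilon_\lambda(\infty, n) = \lim_{L\to\infty} \epsilon'_\lambda(L, n), \qquad \bar\epsilon_\lambda(n) = \sup_{n' \ge n} \epsilon_\lambda(\infty, n').$$
If we can prove that for every $\lambda < \gamma$, $\lim_{n\to\infty} \bar\epsilon_\lambda(n) = 0$, then we will have proved Theorem 1.1(b). Given $0 \le q < 1$, define
$$N_1(q) = \left( \frac{5 - 4q + \sqrt{(6 - 5q)(4 - 3q)}}{1 - q} \right)^2.$$
Suppose $n > N_1(q)$ and $L \ge n$. (The requirement that $n > N_1(q)$ allows us to apply Corollary 4.3 effectively, i.e., with $\epsilon < 1$.) Define an interval $K = [\tfrac14 L, \tfrac34 L]$, and suppose $\psi \in \mathcal{H}_{L,n}$ is a nonzero state with $\rho(\psi, H^{++}_{[1,L]}) \le A(\Delta) + \gamma$. Then by Corollary 4.3 and the requirement that $n > N_1(q)$, we can find an interval $J \subset K$ such that $|J| = L^{1/2}$,
$$\| \psi - P_J \psi \|^2 \le C_1(q) L^{-1/2} \|\psi\|^2, \tag{5.2}$$
and
$$\langle\psi | H^{++}_{[1,L]} \psi\rangle \ge \langle P_J \psi | H^{++}_{[1,L]} P_J \psi\rangle - C_2(q) L^{-1/4} \|\psi\|^2, \tag{5.3}$$
where
$$C_1(q) = \frac{8}{1 - q} \left( 1 - 2 n_1(q)^{-1/2} - n_1(q)^{-1} \right)^{-1}, \qquad C_2(q) = \frac{(1 + 3q)(3 - q)}{2(1 + q^2)}\, C_1(q)^{1/2}.$$
Let $J = [a, b]$. We need to extend our definition of $\mathcal{H}_{L,n}$ in the following way. For integers $s \le t$, let $\mathcal{H}_{[s,t]} = \mathbb{C}^2_s \otimes \mathbb{C}^2_{s+1} \otimes \cdots \otimes \mathbb{C}^2_t$. For $0 \le r \le t - s + 1$, let
$$\mathcal{H}_{[s,t],r} = \mathrm{span}\Big\{ \prod_{i=1}^{r} S^-_{x_i} |\uparrow\cdots\uparrow\rangle_{[s,t]} : s \le x_1 < x_2 < \cdots < x_r \le t \Big\}.$$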
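So $\mathcal{H}_L = \mathcal{H}_{[1,L]}$ in the new notation, and $\mathcal{H}_{L,n} = \mathcal{H}_{[1,L],n}$. We are free to decompose
$$\psi = \sum_{n_1, n_2, n_3} \psi(n_1, n_2, n_3),$$
where $\psi(n_1, n_2, n_3) \in \mathcal{H}_{[1,a-1],n_1} \otimes \mathcal{H}_{[a,b],n_2} \otimes \mathcal{H}_{[b+1,L],n_3}$. The condition that $\psi \in \mathcal{H}_{L,n}$ implies $\psi(n_1, n_2, n_3) \ne 0$ only if $(n_1, n_2, n_3) \in [0, a - 1] \times [0, b - a + 1] \times [0, L - b]$, and $n_1 + n_2 + n_3 = n$. Also, since the range of $P_J$ is precisely the direct sum of all those triples $\mathcal{H}_{[1,a-1],n_1} \otimes \mathcal{H}_{[a,b],n_2} \otimes \mathcal{H}_{[b+1,L],n_3}$ such that $n_2 \in \{0, |J|\}$, we can restrict attention to those states $\psi(n_1, n_2, n_3)$ satisfying the same condition. Therefore, let $\psi^\uparrow(j) = \psi(j, 0, n - j)$, and $\psi^\downarrow(j) = \psi(j, |J|, n - j - |J|)$. Then $\psi^\uparrow(j)$ lies in the range of $P^\uparrow_J$ and $\psi^\downarrow(j)$ lies in the range of $P^\downarrow_J$, and
$$P_J \psi = \sum_{j=0}^{n} \psi^\uparrow(j) + \sum_{j=0}^{n - |J|} \psi^\downarrow(j).$$
Let $Q(n_1, n_2, n_3) = \mathrm{Proj}(\mathcal{H}_{[1,a-1],n_1} \otimes \mathcal{H}_{[a,b],n_2} \otimes \mathcal{H}_{[b+1,L],n_3})$. Then it is easy to see that, for distinct triples,
$$Q(n_1, n_2, n_3)\, H^{++}_{[1,L]}\, Q(m_1, m_2, m_3) = 0$$
except when $(n_1 - m_1, n_2 - m_2, n_3 - m_3)$ equals $(\pm 1, \mp 1, 0)$ or $(0, \pm 1, \mp 1)$. But if $n_2, m_2 \in \{0, |J|\}$ (and $|J| > 1$), then the condition of the previous line can never be met. Therefore
$$\langle P_J \psi | H^{++}_{[1,L]} P_J \psi\rangle = \sum_{j=0}^{n} \langle \psi^\uparrow(j) | H^{++}_{[1,L]} \psi^\uparrow(j)\rangle + \sum_{j=0}^{n - |J|} \langle \psi^\downarrow(j) | H^{++}_{[1,L]} \psi^\downarrow(j)\rangle, \tag{5.4}$$
just as
$$\| P_J \psi \|^2 = \sum_{j=0}^{n} \| \psi^\uparrow(j) \|^2 + \sum_{j=0}^{n - |J|} \| \psi^\downarrow(j) \|^2. \tag{5.5}$$
We will next bound each of the terms on the right-hand side of (5.4).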
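Let $x = a + \lfloor |J|/2 \rfloor = \lfloor (a + b + 1)/2 \rfloor$. Since $x, x + 1 \in J$, consulting (2.7), we have
$$H^{++}_{x,x+1} \psi^\downarrow(j) = A(\Delta)\, \psi^\downarrow(j).$$
Then, by (2.8), it is clear that
$$\langle \psi^\downarrow(j) | H^{++}_{[1,L]} \psi^\downarrow(j)\rangle \ge A(\Delta) \| \psi^\downarrow(j) \|^2 + \langle \psi^\downarrow(j) | (H^{+-}_{[1,x]} + H^{-+}_{[x+1,L]}) \psi^\downarrow(j)\rangle.$$
By Proposition 2.2,
$$\langle \psi^\downarrow(j) | (H^{+-}_{[1,x]} + H^{-+}_{[x+1,L]}) \psi^\downarrow(j)\rangle \ge \gamma \left\langle \psi^\downarrow(j) \left| \left( \mathbb{1} - \mathrm{Proj}(\psi^{+-}_{[1,x]}(j') \otimes \psi^{-+}_{[x+1,L]}(n - j')) \right) \right. \psi^\downarrow(j) \right\rangle,$$
where $j' = j + \lfloor |J|/2 \rfloor + 1$. Also, defining $\tilde{x}_j = a - 1 + \lfloor n/2 \rfloor - j$, we know by (A.9),
$$\left\| \mathrm{Proj}(\psi^{+-}_{[1,x]}(j') \otimes \psi^{-+}_{[x+1,L]}(n - j')) - \mathrm{Proj}(\xi_{L,n}(\tilde{x}_j)) \right\| \le C_3(q)\, q^{|J|/2},$$
where $C_3(q) = 4(1 - q^2)^{-1/2}$. Therefore,
$$\langle \psi^\downarrow(j) | H^{++}_{[1,L]} \psi^\downarrow(j)\rangle \ge (A(\Delta) - C_3 q^{|J|/2}) \| \psi^\downarrow(j) \|^2 + \gamma \left\langle \psi^\downarrow(j) \left| \left( \mathbb{1} - \mathrm{Proj}(\xi_{L,n}(\tilde{x}_j)) \right) \right. \psi^\downarrow(j) \right\rangle. \tag{5.6}$$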
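Next, we bound $\langle \psi^\uparrow(j) | H^{++}_{[1,L]} \psi^\uparrow(j)\rangle$ in the case that $1 \le j \le \lfloor n/2 \rfloor$. The case $\lceil n/2 \rceil \le j \le n - 1$ will be the same by symmetry. Referring to (2.9),
$$\langle \psi^\uparrow(j) | H^{++}_{[1,L]} \psi^\uparrow(j)\rangle = \langle \psi^\uparrow(j) | H^{+-}_{[1,x]} \psi^\uparrow(j)\rangle + \langle \psi^\uparrow(j) | H^{++}_{[x,L]} \psi^\uparrow(j)\rangle.$$
Now, since $\psi^\uparrow(j) \in \mathcal{H}_{[1,x-1],j} \otimes \mathcal{H}_{[x,L],n-j}$, we may bound
$$\langle \psi^\uparrow(j) | H^{++}_{[x,L]} \psi^\uparrow(j)\rangle \ge (A(\Delta) - \epsilon_\lambda(L - x + 1, n - j)) \| \psi^\uparrow(j) \|^2.$$
By the definition of $\epsilon_\lambda(\cdot)$ and $\bar\epsilon_\lambda(\cdot)$,
$$\epsilon_\lambda(L - x + 1, n - j) \le \bar\epsilon_\lambda(n - j) \le \bar\epsilon_\lambda(\lceil n/2 \rceil),$$
since $n - j \ge \lceil n/2 \rceil$. So
$$\langle \psi^\uparrow(j) | H^{++}_{[x,L]} \psi^\uparrow(j)\rangle \ge (A(\Delta) - \bar\epsilon_\lambda(\lceil n/2 \rceil)) \| \psi^\uparrow(j) \|^2. \tag{5.7}$$
By Proposition 2.2,
$$\langle \psi^\uparrow(j) | H^{+-}_{[1,x]} \psi^\uparrow(j)\rangle \ge \gamma \left\langle \psi^\uparrow(j) \left| \left( (\mathbb{1} - \mathrm{Proj}(\psi^{+-}_{[1,x]}(j))) \otimes \mathbb{1}_{[x+1,L]} \right) \right. \psi^\uparrow(j) \right\rangle.$$
We can prove
$$\left\langle \psi^\uparrow(j) \left| \left( \mathrm{Proj}(\psi^{+-}_{[1,x]}(j)) \otimes \mathbb{1}_{[x+1,L]} \right) \right. \psi^\uparrow(j) \right\rangle \le \frac{q^{|J|}}{f_q(\infty)} \| \psi^\uparrow(j) \|^2. \tag{5.8}$$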
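Indeed, since $\psi^{\uparrow}(j) \in \mathcal{H}_{[1,a-1],j} \otimes \mathcal{H}_{[a,x],0} \otimes \mathcal{H}_{[x+1,L],n-j}$ we have
\[
\langle \psi^{\uparrow}(j) | \mathrm{Proj}(\psi^{+-}_{[1,x]}(j)) \otimes 1_{[x+1,L]}\, \psi^{\uparrow}(j) \rangle
\le \|\psi^{\uparrow}(j)\|^2 \times \| \mathrm{Proj}(\psi^{+-}_{[1,x]}(j))\, \mathrm{Proj}(\mathcal{H}_{[1,a-1],j} \otimes \mathcal{H}_{[a,x],0}) \|^2;
\]
so it suffices to check
\[
\| \mathrm{Proj}(\psi^{+-}_{[1,x]}(j))\, \mathrm{Proj}(\mathcal{H}_{[1,a-1],j} \otimes \mathcal{H}_{[a,x],0}) \|^2 \le \frac{q^{|J|}}{f_q(\infty)}.
\]
But, by a computation,
\begin{align*}
\| \mathrm{Proj}(\psi^{+-}_{[1,x]}(j))\, \mathrm{Proj}(\mathcal{H}_{[1,a-1],j} \otimes \mathcal{H}_{[a,x],0}) \|^2
&= \frac{\| \mathrm{Proj}(\mathcal{H}_{[1,a-1],j} \otimes \mathcal{H}_{[a,x],0})\, \psi^{+-}_{[1,x]}(j) \|^2}{\| \psi^{+-}_{[1,x]}(j) \|^2} \\
&= \frac{\| q^{j(x-a+1)}\, \psi^{+-}_{[1,a-1]}(j) \otimes \psi^{+-}_{[a,x]}(0) \|^2}{\| \psi^{+-}_{[1,x]}(j) \|^2} \\
&= q^{2j(|J|/2+1)} \begin{bmatrix} a-1 \\ j \end{bmatrix}_{q^2} \Big/ \begin{bmatrix} x \\ j \end{bmatrix}_{q^2} \\
&\le \frac{q^{|J|}}{f_q(\infty)}.
\end{align*}
The last calculation is deduced from Eqs. (A.1) and (A.2), and note that it is necessary that $j \ge 1$. From this we conclude
\[
\langle \psi^{\uparrow}(j) | H^{+-}_{[1,x]} \psi^{\uparrow}(j) \rangle
\ge \gamma \Big(1 - \frac{q^{|J|}}{f_q(\infty)}\Big) \|\psi^{\uparrow}(j)\|^2. \tag{5.9}
\]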
Combining this with (5.7), we have ++ ψ ↑ (j ) ψ ↑ (j )|H[1,L]
≥ A() −
λ (n/2) + γ
1−
q |J | fq (∞)
ψ ↑ (j )2
(5.10)
as long as $1 \le j \le n/2$. A symmetric argument yields the same bound for the case that $n/2 \le j \le n - 1$. For $j = 0$, note that
\[
\psi^{\uparrow}(0) = |{\uparrow \ldots \uparrow}\rangle_{[1,x]} \otimes \psi_{[x+1,L]},
\]
for some $\psi_{[x+1,L]} \in \mathcal{H}_{[x+1,L],n}$. Also, by (2.8),
\[
H^{++}_{[1,L]} = H^{+-}_{[1,x+1]} + H^{++}_{[x+1,L]} \ge H^{++}_{[x+1,L]}.
\]
So ++ ++ ψ ↑ (0)|H[1,L] ψ ↑ (0) ≥ ψ[x+1,L] |H[x+1,L] ψ[x+1,L]
≥ (A() −
2 λ (L − x, n))ψ[x+1,L]
| + λψ[x+1,L]
1−
L−n/2 x=x+n/2 ˜
Proj(ξ[x+1,L],n (x) ˜ ψ[x+1,L] .
We can replace ψ[x+1,L] 2 by ψ ↑ (0)2 . Also, since ψ[x+1,L] ∈ H[x+1,b],0 ⊗ H[b+1,L],n ,
it is true that Proj(ξ[x+1,L] (x, ˜ n))ψ[x+1,L] =0
unless x˜ ≥ b + n/2. Furthermore, Proj(H[1,x],0 ⊗ H[x+1,L],n )ξ[x+1,L],n (x) ˜ = |↑ . . . ↑[1,x] ⊗ ξ[x+1,L],n (x). ˜ Therefore ψ[x+1,L] |
L−n/2
Proj(ξ[x+1,L],n (x))ψ ˜ [x+1,L]
x=b+n/2 ˜
= ψ ↑ (0)|
L−n/2 x=b+n/2 ˜
ξ[x+1,L],n (x) ˜ 2 ↑ Proj(ξ[1,L],n (x))ψ ˜ (0). ξ[1,L],n (x) ˜ 2
But it is very easy to see that ξ[x+1,L],n (x) ˜ 2 ≤ ξ[1,L],n (x) ˜ 2 . So ψ
↑
++ (0)|H[1,L] ψ ↑ (0)
≥
A() −
λ
3 L, n ψ ↑ (0)2 4
+ λψ ↑ (0)| 1 −
L−n/2
Proj(ξL (x)) ˜ ψ ↑ (0).
(5.11)
x=b+n/2 ˜
By an analogous argument ++ ψ ↑ (n)|H[1,L] ψ ↑ (n) ≥ (A() −
3 ↑ 2 λ ( L, n))ψ (n)
4 a−1−n/2
+ λψ ↑ (n)|
Proj(ξL,n (x)) ˜ ψ ↑ (n).
(5.12)
x=n/2 ˜
Let us summarize the proof so far. We began with a state $\psi \in \mathcal{H}_{L,n}$. By Corollary 4.3, we found an interval $J$ such that $P_J\psi$ is a good approximation to $\psi$. We decomposed $P_J\psi$ according to whether $\psi$ is in the range of $P_J^{\uparrow}$ or $P_J^{\downarrow}$, and by the number of downspins to the left of $J$. We split the states $\psi^{\sigma}(j)$ into five classes ($\sigma = \downarrow$; $\sigma = \uparrow$, $j = 0$; $\sigma = \uparrow$, $1 \le j \le n/2$; $\sigma = \uparrow$, $n/2 \le j \le n - 1$; $\sigma = \uparrow$, $j = n$) and gave some spectral gap estimates for each. The only piece of the proof left is an induction argument, and one other thing: a proof that all of the spectral gap estimates for each of the states $\psi^{\sigma}(j)$ can be combined to a single spectral gap estimate for $P_J\psi$. Specifically, while the $\psi^{\sigma}(j)$ are orthogonal with respect to $\langle * | * \rangle$ and $\langle * | H^{++}_{[1,L]} * \rangle$, it is not true that they are orthogonal with respect to $\langle * | \mathrm{Proj}(\xi_{L,n}(\tilde{x})) * \rangle$ for every $\tilde{x}$. The trick is that they are nearly orthogonal with respect to the projection for specific choices of $\tilde{x}$: namely, if $\tilde{x} \in I_1 \cup I_2 \cup I_3$, where
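$I_1 = [n/2,\ a - 1 - n/2]$, $I_2 = [a - 1 - n/2 + |J|,\ b + n/2 - |J|]$ and $I_3 = [b + n/2,\ L - n/2]$. We will prove in Appendix B that, in fact,
\begin{align*}
\Big\langle P_J\psi \Big| \sum_{\tilde{x} \in I_1 \cup I_2 \cup I_3} \mathrm{Proj}(\xi_{L,n}(\tilde{x}))\, P_J\psi \Big\rangle
\ge{}& - C_4(q)\, q^{|J|}\, \|P_J\psi\|^2
+ \sum_{j=0}^{n-|J|} \langle \psi^{\downarrow}(j) | \mathrm{Proj}(\xi_{L,n}(\tilde{x}_j))\, \psi^{\downarrow}(j) \rangle \\
&+ \sum_{\tilde{x}=b+n/2}^{L-n/2} \langle \psi^{\uparrow}(0) | \mathrm{Proj}(\xi_{L,n}(\tilde{x}))\, \psi^{\uparrow}(0) \rangle
+ \sum_{\tilde{x}=n/2}^{a-1-n/2} \langle \psi^{\uparrow}(n) | \mathrm{Proj}(\xi_{L,n}(\tilde{x}))\, \psi^{\uparrow}(n) \rangle,
\end{align*}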
for some C4 (q) < ∞, as long as n ≥ N4 (q). Equations (5.4)–(5.12) together with the result of Appendix B imply ++ Pj ψ|H[1,L] PJ ψ ≥ (A() − η)PJ ψ2 + λPJ ψ| 1 −
Proj(ξL (x, ˜ n)) PJ ψ,
x∈I ˜ 1 ∪I2 ∪I3
where η ≤ (C3 (q) + C4 (q))q |J |/2 + max{0,
λ (n/2) − (γ
− λ),
3 λ ( L, n)}.
4
Since each term −λProj(ξL,n (x)), ˜ for x˜ ∈ (I1 ∪ I2 ∪ I3 ) gives a negative contribution to the expectation, we can add those terms to the inequality: ++ PJ ψ ≥ (A() − η)PJ ψ2 Pj ψ|H[1,L] L−n/2 + λPJ ψ| 1 − Proj(ξL (x, ˜ n)) PJ ψ. x=n/2 ˜
Using (3.10), and the fact that 1 − P ≤ 1, for any projection P , we have L−n/2 Proj(ξL,n (x)) ˜ ≤1+ 1 − x=n/2 ˜
2q n . (1 − q)fq (∞)
This and (5.2), (5.3) and (5.13) imply ++ ψ ≥ (A() − ψ|H[1,L]
λ (L, n))ψ
+ λψ| 1 −
L−n/2 x=n/2 ˜
2
Proj(ξL,n (x)) ˜ ψ,
(5.13)
where, for some C5 (q) and C6 (q), λ (L, n)
≤ η + A()C1 (q)L−1/2 + C2 (q)L−1/4 2q n C1 (q)1/2 L−1/4 + 2λ 1 + (1 − q n )fq (∞) √
≤ C5 (q)q 2 L + C6 (q)L−1/4 % + max 0, λ (n/2) − (γ − λ), 1
λ
&
3 L, n 4
.
We have not stated the exact dependence of C5 (q) and C6 (q) on q, though it can be deduced from our previous calculations. The important fact is that there exists N5 (q), such that if n ≥ N5 (q), then the above holds with C5 (q) and C6 (q) both finite, positive numbers. From this, it follows √ 1 4 2 L + C6 (q)L−1/4 + max{0, (n/2) + λ − γ , (L, n)}, (q)q L, n ≤ C 5 λ λ λ 3 and k−1 k−1 r/4 k √ 1√ 4 3 n [(4/3)r/2 −1] n −1/4 2 n, n ≤ C5 (q)q q + C6 (q)n λ 3 4 r=1 r=1 ' ( + max 0, λ (n/2) + λ − γ , λ (n, n) . Note λ (n, n) = 0, because Hn,n is one-dimensional, and the single vector ξn,n (n/2) = ++ ξn (n/2 , n) = A()ξn (n/2 , n). Therefore, |↓ . . . ↓ satisfies H[1,L] λ (∞, n)
1√
≤ C5 q 2
n
∞
k/2 −1]√n
q [(4/3)
+ C6 n−1/4
k=1 + max{0, λ (n/2) + λ − γ }.
∞ k/4 3 k=1
4
Taking the lim sup as $n \to \infty$, we find
\[
\epsilon_\lambda(\infty) \le \max\{0,\ \epsilon_\lambda(\infty) + \lambda - \gamma\}.
\]
For $\lambda < \gamma$ this implies $\epsilon_\lambda(\infty)$ either equals zero or $+\infty$. But, by Proposition 2.3, $\epsilon_\lambda(\infty) < A(\Delta)$. So $\epsilon_\lambda(\infty) = 0$, as desired, for every $\lambda < \gamma$. By the Cantor diagonal argument, there is a sequence $\epsilon_n$ satisfying (5.1), constructed from the $\epsilon_\lambda(n)$, with $\lambda \to \gamma$ and $n \to \infty$. So Theorem 1.1(a) is proved. Theorem 1.2 is a reformulation of the same result, so it needs no proof.

6. Results for the Ring and the Infinite Chain

6.1. The spin ring. The spin ring (periodic spin chain) has state space $\mathcal{H}_L$ and is defined by the Hamiltonian
\[
H^{XXZ}_{\mathbb{Z}/L} = \sum_{x=1}^{L-1} H^{XXZ}_{x,x+1} + H^{XXZ}_{L,1}.
\]
We define a periodic droplet with $n$ down spins
\[
\xi_{\mathbb{Z}/L,n}(0) = \xi_{L,n}(L/2) = \psi^{+-}_{[1,L/2]}(n/2) \otimes \psi^{-+}_{[L/2+1,L]}(n/2).
\]
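There are $L - 1$ additional droplet states
\[
\xi_{\mathbb{Z}/L,n}(x) = T^x\, \xi_{\mathbb{Z}/L,n}(0) \qquad (x = 1, \ldots, L - 1),
\]
where $T$ is the unitary operator on $\mathcal{H}_L$ such that $T(v_1 \otimes v_2 \otimes \cdots \otimes v_L) = v_L \otimes v_1 \otimes \cdots \otimes v_{L-1}$. Let $\mathcal{K}_{\mathbb{Z}/L,n}$ be the span of $\xi_{\mathbb{Z}/L,n}(0), \ldots, \xi_{\mathbb{Z}/L,n}(L-1)$. Let
\[
\lambda_{\mathbb{Z}/L,n}(1) \le \lambda_{\mathbb{Z}/L,n}(2) \le \ldots \le \lambda_{\mathbb{Z}/L,n}\Big(\tbinom{L}{n}\Big)
\]
be the ordered eigenvalues of $H^{XXZ}_{\mathbb{Z}/L}$ acting on the invariant subspace $\mathcal{H}_{L,n}$, and let $\mathcal{H}^{k}_{\mathbb{Z}/L,n}$ be the span of the first $k$ eigenvectors.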
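The action of the translation $T$ on product basis vectors can be made concrete with a short sketch. The representation of a basis vector as a tuple of `'u'`/`'d'` labels and the helper names `shift`, `shift_power` and `configurations` are illustrative assumptions, not notation from the paper; the sketch only shows that $T$ permutes basis configurations cyclically, preserves the number of down spins, and satisfies $T^L = 1$ on them.

```python
from itertools import combinations


def shift(config):
    """Apply T to a basis configuration: (v1, ..., vL) -> (vL, v1, ..., v_{L-1})."""
    return (config[-1],) + config[:-1]


def shift_power(config, x):
    """Apply T x times to a basis configuration."""
    for _ in range(x):
        config = shift(config)
    return config


def configurations(L, n):
    """Yield all spin configurations of length L with exactly n down spins ('d')."""
    for downs in combinations(range(L), n):
        yield tuple('d' if i in downs else 'u' for i in range(L))
```

Since $T$ only permutes sites, every translate $T^x \xi_{\mathbb{Z}/L,n}(0)$ stays in the sector with $n$ down spins, which is why all $L$ droplet states live in $\mathcal{H}_{L,n}$.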
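Theorem 6.1. For $1 \le n \le L - 1$,
\[
\lambda_{\mathbb{Z}/L,n}(1), \ldots, \lambda_{\mathbb{Z}/L,n}(L) \in [2A(\Delta) - O(q^n + q^{L-n}),\ 2A(\Delta) + O(q^n + q^{L-n})].
\]
Also,
\[
\liminf_{\substack{n, L \\ \min(n, L-n) \to \infty}} \lambda_{\mathbb{Z}/L,n}(L+1) \ge 2A(\Delta) + \gamma.
\]
Finally,
\[
\| \mathrm{Proj}(\mathcal{K}_{\mathbb{Z}/L,n}) - \mathrm{Proj}(\mathcal{H}^{L}_{\mathbb{Z}/L,n}) \| = O(q^n + q^{L-n}).
\]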
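Proof. We first prove that
\[
\| (H^{XXZ}_{\mathbb{Z}/L} - 2A(\Delta))\, \mathrm{Proj}(\mathcal{K}_{\mathbb{Z}/L,n}) \| = O(q^n + q^{L-n}). \tag{6.1}
\]
It is easy to see that, just as for the droplets on an interval,
\[
\frac{|\langle \xi_{\mathbb{Z}/L,n}(x) | \xi_{\mathbb{Z}/L,n}(y) \rangle|}{\|\xi_{\mathbb{Z}/L,n}(x)\| \cdot \|\xi_{\mathbb{Z}/L,n}(y)\|} \le \frac{q^{n \cdot d(x,y)}}{f_q(\infty)} \quad \text{for all } x, y; \tag{6.2}
\]
\[
\frac{|\langle \xi_{\mathbb{Z}/L,n}(x) | H^{XXZ}_{\mathbb{Z}/L}\, \xi_{\mathbb{Z}/L,n}(y) \rangle|}{\|\xi_{\mathbb{Z}/L,n}(x)\| \cdot \|\xi_{\mathbb{Z}/L,n}(y)\|} \le \frac{q^{n \cdot d(x,y)}}{f_q(\infty)} \quad \text{if } x \ne y; \tag{6.3}
\]
\[
\frac{|\langle \xi_{\mathbb{Z}/L,n}(x) | (H^{XXZ}_{\mathbb{Z}/L})^2\, \xi_{\mathbb{Z}/L,n}(y) \rangle|}{\|\xi_{\mathbb{Z}/L,n}(x)\| \cdot \|\xi_{\mathbb{Z}/L,n}(y)\|} \le \frac{q^{n \cdot d(x,y)}}{f_q(\infty)} \quad \text{if } d(x,y) \ge 2; \tag{6.4}
\]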
where d(x, y) = min(|x − y|, |x + y − L|). In fact, using the same tools as in Appendix A, we can calculate exactly, for 0 ≤ x ≤ L/2, " # " # ρ(ξZ/L,n (0), T x ) = q nx
k
L/2−x n/2−k
"
L/2 n/2
q2
#
q2
"
L/2−x n/2−k L/2 n/2
#
q2
2 x q k(L+2k) . k
q2
It is verifiable that this satisfies the bounds above. The other expectations $\rho(\xi_{\mathbb{Z}/L,n}(0), H^{XXZ}_{\mathbb{Z}/L,n} T^x)$ and $\rho(\xi_{\mathbb{Z}/L,n}(0), (H^{XXZ}_{\mathbb{Z}/L,n})^2 T^x)$ are similar. Applying Lemma 3.1 proves (6.1).
Now we prove that, considering $H^{XXZ}_{\mathbb{Z}/L}$ acting on the invariant subspace $\mathcal{H}_{\mathbb{Z}/L,n}$,
\[
H^{XXZ}_{\mathbb{Z}/L} \ge (2A(\Delta) - \epsilon_n - \epsilon_{L-n})\, 1 + \gamma\, (1 - \mathrm{Proj}(\mathcal{K}_{L,n})), \tag{6.5}
\]
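where $\lim_{n \to \infty} \epsilon_n = 0$. To do this, we use Corollary 4.3. There exists an $L_0(q)$ and $C_0(q)$ such that, if $L > L_0(q)$ then for any $\psi \in \mathcal{H}_{L,n}$ with $\rho(\psi, H^{XXZ}_{\mathbb{Z}/L}) \le 2A(\Delta) + \gamma$, Corollary 4.3 guarantees the existence of a "subinterval" $J \subset \mathbb{Z}/L$ satisfying $|J| = 2\lfloor L^{1/2} \rfloor$, $\|P_J\psi - \psi\| \le C_0(q) L^{-1/2}$, and
\[
\langle \psi | H^{XXZ}_{\mathbb{Z}/L} \psi \rangle \ge \langle P_J\psi | H^{XXZ}_{\mathbb{Z}/L} P_J\psi \rangle - C_0(q) L^{-1/4} \|\psi\|^2.
\]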
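We can take $L_0(q) = (7 - 6q + 3q^2)^2/(1 - q)^4$ and $C_0(q) = (5 + 18q + 5q^2) L_0(q)^{1/4}/(2 + 2q^2)$. By "subinterval", we mean that there exists an interval $J' \subset \mathbb{Z}$ such that $J \equiv J' \pmod{L}$. Without loss of generality, we assume $J = [1, \ldots, \lfloor L^{1/2} \rfloor] \cup [L + 1 - \lfloor L^{1/2} \rfloor, L]$. Next,
\[
\langle P_J\psi | H^{XXZ}_{\mathbb{Z}/L} P_J\psi \rangle
= \langle P_J^{\uparrow}\psi | H^{XXZ}_{\mathbb{Z}/L} P_J^{\uparrow}\psi \rangle
+ \langle P_J^{\downarrow}\psi | H^{XXZ}_{\mathbb{Z}/L} P_J^{\downarrow}\psi \rangle
\]
and $\|P_J\psi\|^2 = \|P_J^{\uparrow}\psi\|^2 + \|P_J^{\downarrow}\psi\|^2$. We estimate $\langle P_J^{\uparrow}\psi | H^{XXZ}_{\mathbb{Z}/L} P_J^{\uparrow}\psi \rangle$ first. Of course, $H^{XXZ}_{\mathbb{Z}/L} = H^{--}_{L,1} + H^{++}_{[1,L]}$, and since $H^{--}|{\uparrow\uparrow}\rangle = A(\Delta)|{\uparrow\uparrow}\rangle$, we see that $H^{XXZ}_{\mathbb{Z}/L} P_J^{\uparrow}\psi = (A(\Delta) + H^{++}_{[1,L]}) P_J^{\uparrow}\psi$. Then using Theorem 1.1(b), and adding the constant $A(\Delta)$ contributed by $H^{--}_{L,1}$,
\[
\langle P_J^{\uparrow}\psi | H^{XXZ}_{\mathbb{Z}/L} P_J^{\uparrow}\psi \rangle
\ge (2A(\Delta) - \epsilon(n)) \|P_J^{\uparrow}\psi\|^2
+ \gamma \langle P_J^{\uparrow}\psi | (1 - \mathrm{Proj}(\mathcal{K}_{L,n})) P_J^{\uparrow}\psi \rangle,
\]
where $\lim_{n \to \infty} \epsilon(n) = 0$. But
\begin{align*}
P_J^{\uparrow}\, \mathrm{Proj}(\mathcal{K}_{L,n})\, P_J^{\uparrow}
&= P_J^{\uparrow} \sum_{x=n/2}^{L-n/2} \mathrm{Proj}(\xi_{L,n}(x))\, P_J^{\uparrow} + O(q^n) \\
&= P_J^{\uparrow} \sum_{x=\lfloor L^{1/2} \rfloor + n/2}^{L+1-\lfloor L^{1/2} \rfloor - n/2} \mathrm{Proj}(\xi_{L,n}(x))\, P_J^{\uparrow} + O(q^n) \\
&\le P_J^{\uparrow} \sum_{x=0}^{L-1} \mathrm{Proj}(\xi_{\mathbb{Z}/L,n}(x))\, P_J^{\uparrow} - O(q^n) \\
&= P_J^{\uparrow}\, \mathrm{Proj}(\mathcal{K}_{\mathbb{Z}/L,n})\, P_J^{\uparrow} - O(q^n),
\end{align*}
where by $A = B + O(q^n)$, we mean $\|A - B\| = O(q^n)$, and by $A \ge B - O(q^n)$, we mean $B - A \le O(q^n)\, 1$. We omit the calculations here. So
\[
\langle P_J^{\uparrow}\psi | H^{XXZ}_{\mathbb{Z}/L} P_J^{\uparrow}\psi \rangle
\ge (2A(\Delta) - \epsilon(n) - O(q^n)) \|P_J^{\uparrow}\psi\|^2
+ \gamma \langle P_J^{\uparrow}\psi | (1 - \mathrm{Proj}(\mathcal{K}_{\mathbb{Z}/L,n})) P_J^{\uparrow}\psi \rangle.
\]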
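Symmetrically,
\[
\langle P_J^{\downarrow}\psi | H^{XXZ}_{\mathbb{Z}/L} P_J^{\downarrow}\psi \rangle
\ge (2A(\Delta) - \epsilon(L - n) - O(q^{L-n})) \|P_J^{\downarrow}\psi\|^2
+ \gamma \langle P_J^{\downarrow}\psi | (1 - \mathrm{Proj}(F\mathcal{K}_{\mathbb{Z}/L,L-n})) P_J^{\downarrow}\psi \rangle,
\]
where $F : \mathcal{H}_{L,L-n} \to \mathcal{H}_{L,n}$ denotes the spin-flip. But $\mathcal{K}_{\mathbb{Z}/L,n} = F\mathcal{K}_{\mathbb{Z}/L,L-n}$. Also, $\| P_J^{\downarrow}\, \mathrm{Proj}(\mathcal{K}_{\mathbb{Z}/L,n})\, P_J^{\uparrow} \| = O(q^n + q^{L-n})$. So, for any $\psi \in \mathcal{H}_{L,n}$,
\[
\langle \psi | H^{XXZ}_{\mathbb{Z}/L} \psi \rangle
\ge \big(2A(\Delta) - [\epsilon_n + \epsilon_{L-n} + O(q^n + q^{L-n}) + O(L^{-1/4})]\big) \|\psi\|^2
+ \gamma \langle \psi | (1 - \mathrm{Proj}(\mathcal{K}_{\mathbb{Z}/L,n})) \psi \rangle. \tag{6.6}
\]
Equations (6.1) and (6.6) together imply the theorem. □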
6.2. The infinite spin chain. Let |> = |. . . ↑↑↑ . . .Z be a vacuum state, and define HZ,n = cl(span{Sx−1 Sx−2 . . . Sx−n |> : x1 < x2 < · · · < xn }), where cl(.) is the l 2 -closure. This is a separable Hilbert space, and HZXXZ =
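\[
\sum_{x=-\infty}^{\infty} H^{XXZ}_{x,x+1}
\]
is a densely defined, self-adjoint operator. This Hamiltonian defines the infinite spin chain. We check that the series does converge. In fact
\[
0 \le H^{XXZ}_{x,x+1} \le \tfrac{1}{2}(1 + \Delta^{-1})(\hat{N}_x + \hat{N}_{x+1}),
\]
where $\hat{N}_x = (\tfrac{1}{2} - S^3_x)$ counts the number of down spins at $x$. But $\sum_{x=-\infty}^{\infty} \hat{N}_x \equiv n$ on $\mathcal{H}_{\mathbb{Z},n}$. So the series does converge, and $\|H^{XXZ}_{\mathbb{Z}}\| \le n(1 + \Delta^{-1})$. We define the droplet states
\[
\xi_{\mathbb{Z},n}(x) = \psi^{+-}_{(-\infty,x]}(n/2) \otimes \psi^{-+}_{[x+1,\infty)}(n/2);
\]
and let $\mathcal{K}_{\mathbb{Z},n}$ be the $l^2$-closure of $\mathrm{span}\{\xi_{\mathbb{Z},n}(x) : x \in \mathbb{Z}\}$.

Theorem 6.2. The following bounds exist for the infinite spin chain:
\[
\| (H^{XXZ}_{\mathbb{Z}} - 2A(\Delta))\, \mathrm{Proj}(\mathcal{K}_{\mathbb{Z},n}) \| = O(q^n),
\]
and, considering $H^{XXZ}_{\mathbb{Z}}$ as an operator on $\mathcal{H}_{\mathbb{Z},n}$,
\[
H^{XXZ}_{\mathbb{Z}} \ge (2A(\Delta) - \epsilon_n)\, 1 + \gamma\, (1 - \mathrm{Proj}(\mathcal{K}_{\mathbb{Z},n})),
\]
where $\epsilon_n$ is a sequence with $\lim_{n \to \infty} \epsilon_n = 0$.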
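Proof. The proof that
\[
\| (H^{XXZ}_{\mathbb{Z}} - 2A(\Delta))\, \mathrm{Proj}(\mathcal{K}_{\mathbb{Z},n}) \| = O(q^n) \tag{6.7}
\]
is essentially the same as in Sect. 3. One fact we should check is that for each $\xi_{\mathbb{Z},n}(x)$, $\| (H^{XXZ}_{\mathbb{Z}} - 2A(\Delta))\, \xi_{\mathbb{Z},n}(x) \|^2 = O(q^n)$. We observe that
\begin{align*}
H^{XXZ}_{[-L,L]}\, \xi_{\mathbb{Z},n}(0)
&= \big( H^{+-}_{[-L,0]} + H^{-+}_{[1,L]} + H^{++}_{0,1} + A(\Delta)(S^3_{-L} + S^3_L) \big)\, \xi_{\mathbb{Z},n}(0) \\
&= \big( H^{++}_{0,1} + A(\Delta)(S^3_{-L} + S^3_L) \big)\, \xi_{\mathbb{Z},n}(0).
\end{align*}
But as before,
\[
\| (H^{++}_{0,1} - A(\Delta))\, \xi_{\mathbb{Z},n}(0) \|^2 \le O(q^n)\, \|\xi_{\mathbb{Z},n}(0)\|^2.
\]
An obvious fact is
\[
\| (S^3_{-L} + S^3_L - 1)\, \xi_{\mathbb{Z},n}(0) \|^2 \le O(q^{L-n})\, \|\xi_{\mathbb{Z},n}(0)\|^2.
\]
Taking $L \to \infty$ yields the desired result. We have the usual orthogonality estimates
\begin{align*}
\frac{|\langle \xi_{\mathbb{Z},n}(x) | \xi_{\mathbb{Z},n}(y) \rangle|}{\|\xi_{\mathbb{Z},n}(x)\| \cdot \|\xi_{\mathbb{Z},n}(y)\|} &\le \frac{q^{n|x-y|}}{f_q(\infty)}, \\
\frac{|\langle \xi_{\mathbb{Z},n}(x) | H^{XXZ}_{\mathbb{Z}}\, \xi_{\mathbb{Z},n}(y) \rangle|}{\|\xi_{\mathbb{Z},n}(x)\| \cdot \|\xi_{\mathbb{Z},n}(y)\|} &\le \frac{q^{n|x-y|}}{f_q(\infty)} \quad \text{for } x \ne y, \\
\frac{|\langle \xi_{\mathbb{Z},n}(x) | (H^{XXZ}_{\mathbb{Z}})^2\, \xi_{\mathbb{Z},n}(y) \rangle|}{\|\xi_{\mathbb{Z},n}(x)\| \cdot \|\xi_{\mathbb{Z},n}(y)\|} &\le \frac{q^{n|x-y|}}{f_q(\infty)} \quad \text{for } |x - y| \ge 2.
\end{align*}
In fact, the estimate of $\langle \xi_{\mathbb{Z},n}(x) | \xi_{\mathbb{Z},n}(y) \rangle$ follows by (A.9), taking the limit $L \to \infty$, and the other estimates are consequences. Applying Lemma 3.1 proves (6.7). For the second part, suppose $\psi \in \mathcal{H}_{\mathbb{Z},n}$. Then
\[
\rho(\psi, H^{XXZ}_{\mathbb{Z}}) = \lim_{L \to \infty} \rho(\psi, H^{XXZ}_{[-L,L]}).
\]
Furthermore $H^{XXZ}_{[-L,L]} = H^{++}_{[-L,L]} + A(\Delta)(S^3_{-L} + S^3_L)$, and
\[
\lim_{L \to \infty} \langle \psi | (S^3_{-L} + S^3_L) \psi \rangle = \|\psi\|^2
\]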
by virtue of the fact that n, the total number of down spins in the state ψ, is finite. Essentially the same fact is restated as limL→∞ ψL = ψ, where ψL = Proj(H(−∞,−L−1],0 ⊗ H[−L,L],n ⊗ H[L+1,∞),0 )ψ. Let us define ?L,n = Proj(H(−∞,−L−1],0 ⊗ K[−L,L],n ⊗ H[L+1,∞),0 )ψ, where K[−L,L],n is the droplet state subspace for the finite chain. By Theorem 1.1(b), ++ ψL |H[−L,L] ψL ≥ (2A() − (n))ψL 2
+ γ ψL |(1 − ?L,n )ψL .
Since ψL → ψ in the norm-topology, as L → ∞, all we need to check is that ?L,n converges weakly to Proj(KZ,n ). It helps to break up ?L,n into two pieces, ' ?L,n = Proj span |. . . ↑(−∞,−L−1] ⊗ ξ[−L,L],n (x) ⊗ |↑ . . .[L+1,∞) : ( − L/2 + n/2 ≤ x ≤ L/2 − n/2 , and ?L,n = ?L,n − ?L,n . Define φL,n (x) = |. . . ↑(−∞,−L−1] ⊗ ξ[−L,L],n (x) ⊗ |↑ . . .[L+1,∞) . Note that for any sequence xL such that xL ∈ [− L/2 + n/2 , L/2 − n/2], we have lim ρ(φL,n (xL ), KZ,n ) = 1. L→∞
The reason is that φL,n (x) − ξZ (x, n) = O(q L/2 ) because the left and right interfaces of the droplet in φL,n (x) are a distance at least L/2 from the left and right endpoints of the interval [−L, L], and the probability of finding an overturned spin decays q-exponentially with the distance from the interface. For the same reason, for any fixed x ∈ Z, limL→∞ ρ(ξZ (x, n), ?L,n ) = 1. These two facts imply that ?L,n converges weakly to Proj(KZ,n ). Now ?L,n converges weakly to zero, because every state in ?L,n has over half its downspins concentrated in the annulus [−L, L] \ [− L/2 + n/2 , L/2 − n/2], and the inner radius tends to infinity. This means that w − limL→∞ ?L,n = Proj(KZ,n ), as claimed. Thus, taking the appropriate limits,
\[
\langle \psi | H^{XXZ}_{\mathbb{Z}} \psi \rangle \ge (2A(\Delta) - \epsilon(n)) \|\psi\|^2 + \gamma \langle \psi | (1 - \mathrm{Proj}(\mathcal{K}_{\mathbb{Z},n})) \psi \rangle,
\]
which finishes the proof of the theorem. □
Appendix A

In this section we carry out several calculations, whose results are needed in the main body of the paper, but whose proofs are not very enlightening for understanding the main arguments. The definitions of the kink states, $\psi^{+-}_{[a,b]}(n)$, and the antikink states, $\psi^{-+}_{[a,b]}(n)$, are given in (1.3) and (1.4). One nice feature of these states is that they are governed by a quantum Clebsch-Gordan formula, due to the $SU_q(2)$ symmetry of $H^{\alpha\beta}_{[a,b]}$, $\alpha\beta = +-, -+$. By this we mean the following: Suppose $a \le x \le b$. Then,
\[
\psi^{+-}_{[a,b]}(n) = \sum_{k} \psi^{+-}_{[a,x]}(k) \otimes \psi^{+-}_{[x+1,b]}(n - k)\, q^{(b-x)k}, \tag{A.1}
\]
\[
\psi^{-+}_{[a,b]}(n) = \sum_{k} \psi^{-+}_{[a,x]}(k) \otimes \psi^{-+}_{[x+1,b]}(n - k)\, q^{(x+1-a)(n-k)}. \tag{A.2}
\]
We let the sum in $k$ run over all integers $k$, with the understanding that $\psi^{+-}_{[a,b]}(n) = \psi^{-+}_{[a,b]}(n) = 0$, if $n < 0$ or $n > b - a + 1$. One need not refer to the quantum group to
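understand this decomposition; it is enough just to check the definitions. We can also see from the definitions that
\[
\langle \psi^{\alpha\beta}_{[a,b]}(m) | \psi^{\alpha\beta}_{[a,b]}(n) \rangle = \delta_{m,n} \begin{bmatrix} b - a + 1 \\ n \end{bmatrix}_{q^2} q^{n(n+1)}, \tag{A.3}
\]
\[
\langle \psi^{\alpha\beta}_{[a,b]}(m) | \psi^{\beta\alpha}_{[a,b]}(n) \rangle = \delta_{m,n} \binom{b - a + 1}{n} q^{n(b-a+2)}, \tag{A.4}
\]
for $\alpha\beta = +-, -+$. The combinatorial prefactor in (A.3) is a $q$-binomial coefficient (in this case a $q^2$-binomial coefficient), also known as a Gauss polynomial; the prefactor in (A.4) is an ordinary binomial coefficient. The most important feature, for us, is the $q$-binomial formula
\[
\prod_{k=1}^{L} (1 + q^{2k} x) = \sum_{n=0}^{L} \begin{bmatrix} L \\ n \end{bmatrix}_{q^2} q^{n(n+1)} x^n.
\]
At this point let us introduce another useful combinatorial quantity, $f_q(n)$, defined for $n = 0, 1, 2, \ldots, \infty$:
\[
f_q(n) = \prod_{k=1}^{n} (1 - q^{2k}).
\]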
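The $q$-binomial formula and the quantity $f_q(n)$ are easy to check numerically. The following is a minimal sketch, not part of the original argument: the function names are illustrative, and the Gauss polynomial is computed from its standard product representation in base $q^2$.

```python
from math import prod


def f_q(q, n):
    """f_q(n) = prod_{k=1}^{n} (1 - q^(2k)); the empty product gives f_q(0) = 1."""
    return prod(1.0 - q ** (2 * k) for k in range(1, n + 1))


def q2_binomial(q, n, k):
    """Gauss polynomial [n choose k] in base q^2; zero unless 0 <= k <= n."""
    if k < 0 or k > n:
        return 0.0
    numerator = prod(1.0 - q ** (2 * (n - i)) for i in range(k))
    denominator = prod(1.0 - q ** (2 * (i + 1)) for i in range(k))
    return numerator / denominator


def product_side(q, L, x):
    """Left-hand side of the q-binomial formula: prod_{k=1}^{L} (1 + q^(2k) x)."""
    return prod(1.0 + q ** (2 * k) * x for k in range(1, L + 1))


def sum_side(q, L, x):
    """Right-hand side: sum_n [L choose n]_{q^2} q^(n(n+1)) x^n."""
    return sum(q2_binomial(q, L, n) * q ** (n * (n + 1)) * x ** n
               for n in range(L + 1))
```

For any $0 < q < 1$ the two sides agree up to floating-point rounding, and `f_q(q, n)` decreases with `n` while staying positive.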
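For a fixed $q \in [0, 1)$, the sequence $f_q(n)$ is clearly monotone decreasing, and $f_q(\infty) > 0$. We note that
\[
\begin{bmatrix} n \\ k \end{bmatrix}_{q^2} = \frac{f_q(n)}{f_q(k)\, f_q(n - k)},
\]
which means that for $0 \le k \le n$,
\[
1 \le \begin{bmatrix} n \\ k \end{bmatrix}_{q^2} \le \frac{1}{f_q(\infty)}.
\]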
The first result we wish to prove is that +− −+ +− −+ (m) ⊗ ψ[x+1,x+y+r] (n + k)|ψ[1,x+r] (m + k) ⊗ ψ[x+r+1,x+y+r] (n) ψ[1,x] " # " # r x y = q m(m+k+1)+n(n+k+1)+k(r+1) . m n q2 k q2
This is very simple. From (A.1) and (A.2), +− −+ +− −+ (m) ⊗ ψ[x+1,x+y+r] (n + k)|ψ[1,x+r] (m + k) ⊗ ψ[x+r+1,x+y+r] (n) ψ[1,x] +− −+ +− = q r(n+k−j ) ψ[1,x] (m) ⊗ ψ[x+1,x+r] (j ) ⊗ ψ[x+r+1,x+y+r] (n + k − j )| j,l
+− +− −+ |q r(m+k−l) ψ[1,x] (l) ⊗ ψ[x+1,x+r] (m + k − l) ⊗ ψ[x+r+1,x+y+r] (n) +− +− = q r(m+n+2k−l−j ) ψ[1,x] (m)|ψ[1,x] (l) j,l
−+ +− × ψ[x+1,x+r] (j )|ψ[x+1,x+r] (m + k − l)
−+ −+ (n + k − j )|ψ[x+r+1,x+y+r] (n). × ψ[x+r+1,x+y+r]
(A.5)
Consulting (A.3) and (A.4), we see that the only choice of j and l for which none of the inner-products vanishes is j = l = k. Plugging in these values for j and l and using the formulae for the inner-products yields (A.5). We can use (A.3) to normalize the inner-product in the following way, +− −+ +− −+ (m) ⊗ ψ[x+1,x+y+r] (n + k)|ψ[1,x+r] (m + k) ⊗ ψ[x+r+1,x+y+r] (n) ψ[1,x]
+− −+ +− −+ ψ[1,x] (m) ⊗ ψ[x+1,x+y+r] (n + k) · ψ[1,x+r] (m + k) ⊗ ψ[x+r+1,x+y+r] (n) # " # " # " # $" r x+r x y y+r q (m+n+k)(r−k) . (A.6) = m + k q2 n + k q2 m q2 n q2 k
We wish to specialize this formula in two ways. First, by setting k = r we have +− −+ +− −+ ψ[1,x] (m) ⊗ ψ[x+1,x+y+r] (n + r)|ψ[1,x+r] (m + r) ⊗ ψ[x+r+1,x+y+r] (n)
+− −+ +− −+ ψ[1,x] (m) ⊗ ψ[x+1,x+y+r] (n + r) · ψ[1,x+r] (m + r) ⊗ ψ[x+r+1,x+y+r] (n) " # " # # # " $" x y y+r x+r . (A.7) = m q2 n q2 m + r q2 n + r q2
Second, by setting k = 0, we have +− −+ +− −+ ψ[1,x] (m) ⊗ ψ[x+1,x+y+r] (n)|ψ[1,x+r] (m) ⊗ ψ[x+r+1,x+y+r] (n)
+− −+ +− −+ ψ[1,x] (m) ⊗ ψ[x+1,x+y+r] (n) · ψ[1,x+r] (m) ⊗ ψ[x+r+1,x+y+r] (n) " # " # " # " # $ x y y+r x+r q (m+n)r . = m q2 n q2 n q2 m q2
(A.8)
To estimate (A.7), we notice that " # " # $" # " # x x+r y y+r m q2 m q2 m + r q2 n + r q2 =
fq (x) fq (m + r) fq (y) fq (n + r) · · · . fq (x + r) fq (m) fq (y + r) fq (n)
This quantity is at most 1 (when r = 0). To get a lower bound we observe that the first and third ratios on the right-hand side are greater than 1, while the product of the second and third is easily bounded r fq (m + r) fq (n + r) (1 − q 2(m+k) )−1 (1 − q 2(n+k) )−1 · ≥ fq (m) fq (n) k=1 −1 −1 q 2(n+1) q 2(m+1) 1− ≥ 1− . 1 − q2 1 − q2
Inserting the inequality to (A.7), +− −+ +− −+ (m) ⊗ ψ[x+1,x+y+r] (n + r)|ψ[1,x+r] (m + r) ⊗ ψ[x+r+1,x+y+r] (n) ψ[1,x]
+− −+ +− −+ ψ[1,x] (m) ⊗ ψ[x+1,x+y+r] (n + r) · ψ[1,x+r] (m + r) ⊗ ψ[x+r+1,x+y+r] (n) −1/2 −1/2 q 2(m+1) q 2(n+1) ≥ 1− 1− . 1 − q2 1 − q2
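This leads to a useful formula. If $\psi$ and $\phi$ are normalized states then $\| \mathrm{Proj}(\psi) - \mathrm{Proj}(\phi) \| = \sqrt{1 - |\langle \psi | \phi \rangle|^2}$. Thus,
\[
\| \mathrm{Proj}(\psi^{+-}_{[1,x]}(m) \otimes \psi^{-+}_{[x+1,x+y+r]}(n + r)) - \mathrm{Proj}(\psi^{+-}_{[1,x+r]}(m + r) \otimes \psi^{-+}_{[x+r+1,x+y+r]}(n)) \|
\le \sqrt{\frac{8 q^2 (q^{2m} + q^{2n})}{1 - q^2}}.
\]
In particular, changing notation to match the body of the paper,
\[
\| \mathrm{Proj}(\psi^{+-}_{[1,x]}(n_1) \otimes \psi^{-+}_{[x+1,L]}(n_2)) - \mathrm{Proj}(\xi_{L,n_1+n_2}(\tilde{x})) \| \le \frac{4 q^{\min(n_1,n_2)+1}}{\sqrt{1 - q^2}}, \tag{A.9}
\]
where $\tilde{x} = x + (n_2 - n_1)/2$. To estimate (A.8), we begin again by observing
\[
\begin{bmatrix} x \\ m \end{bmatrix}_{q^2} \begin{bmatrix} y \\ n \end{bmatrix}_{q^2} \Big/ \begin{bmatrix} x+r \\ m \end{bmatrix}_{q^2} \begin{bmatrix} y+r \\ n \end{bmatrix}_{q^2}
= \frac{f_q(x)}{f_q(x + r)} \cdot \frac{f_q(x - m + r)}{f_q(x - m)} \cdot \frac{f_q(y)}{f_q(y + r)} \cdot \frac{f_q(y - n + r)}{f_q(y - n)}.
\]
By the monotonicity of $f_q(x)$ in $x$, we have
\[
f_q(\infty)^2 \le \begin{bmatrix} x \\ m \end{bmatrix}_{q^2} \begin{bmatrix} y \\ n \end{bmatrix}_{q^2} \Big/ \begin{bmatrix} x+r \\ m \end{bmatrix}_{q^2} \begin{bmatrix} y+r \\ n \end{bmatrix}_{q^2} \le \frac{1}{f_q(\infty)^2}.
\]
From this it follows
\[
\frac{\langle \psi^{+-}_{[1,x]}(m) \otimes \psi^{-+}_{[x+1,x+y+r]}(n) | \psi^{+-}_{[1,x+r]}(m) \otimes \psi^{-+}_{[x+r+1,x+y+r]}(n) \rangle}{\| \psi^{+-}_{[1,x]}(m) \otimes \psi^{-+}_{[x+1,x+y+r]}(n) \| \cdot \| \psi^{+-}_{[1,x+r]}(m) \otimes \psi^{-+}_{[x+r+1,x+y+r]}(n) \|}
= C(x, y, m, n, r)\, q^{(m+n)r}, \tag{A.10}
\]
where
\[
f_q(\infty) \le C(x, y, m, n, r) \le \frac{1}{f_q(\infty)}.
\]
In particular, we have the useful bound
\[
\frac{|\langle \xi_{L,n}(x) | \xi_{L,n}(y) \rangle|}{\|\xi_{L,n}(x)\| \cdot \|\xi_{L,n}(y)\|} \le \frac{q^{n|y-x|}}{f_q(\infty)}. \tag{A.11}
\]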
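The projection identity used on the way to (A.9), $\| \mathrm{Proj}(\psi) - \mathrm{Proj}(\phi) \| = \sqrt{1 - |\langle \psi | \phi \rangle|^2}$ for normalized states, can be checked numerically. The sketch below is illustrative only: it assumes real vectors, and it estimates the operator norm of the symmetric difference matrix by power iteration on its square; all helper names are hypothetical.

```python
from math import sqrt


def normalize(v):
    """Return v scaled to unit Euclidean norm."""
    norm = sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def projector_difference(psi, phi):
    """Matrix of Proj(psi) - Proj(phi) for real unit vectors psi and phi."""
    d = len(psi)
    return [[psi[i] * psi[j] - phi[i] * phi[j] for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def operator_norm(m, iterations=200):
    """Operator norm of a real symmetric matrix via power iteration on m*m."""
    v = normalize([1.0 + 0.37 * i for i in range(len(m))])
    for _ in range(iterations):
        w = mat_vec(m, mat_vec(m, v))
        if all(abs(c) < 1e-300 for c in w):
            return 0.0  # m annihilates the iterate: treat the norm as zero
        v = normalize(w)
    return sqrt(sum(c * c for c in mat_vec(m, v)))
```

The difference of two rank-one projections has eigenvalues $\pm\sqrt{1 - |\langle \psi | \phi \rangle|^2}$ on the span of $\psi$ and $\phi$ and vanishes on the orthogonal complement, which is why the iteration settles after a single step.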
This is the first in a series of three inequalities needed for Sect. 3. Next, we need a bound for
\[
\frac{|\langle \xi_{L,n}(x) | H^{++}_{[1,L]}\, \xi_{L,n}(y) \rangle|}{\|\xi_{L,n}(x)\| \cdot \|\xi_{L,n}(y)\|}.
\]
It turns out that this is well approximated by the normalized inner-product above. The reason is that, while $H^{++}_{[1,L]}$ is not a small operator in general, when acting on the droplet states it reduces to just one nearest-neighbor interaction: $H^{++}_{[1,L]}\, \xi_{L,n}(x) = H^{++}_{x,x+1}\, \xi_{L,n}(x)$. To exploit this we return to the notation above, and observe that as long as $r \ge 1$,
\begin{align*}
&\langle \psi^{+-}_{[1,x]}(m) \otimes \psi^{-+}_{[x+1,x+y+r]}(n + k) | \psi^{+-}_{[1,x+r]}(m + k) \otimes \psi^{-+}_{[x+r+1,x+y+r]}(n) \rangle \\
&\quad = \sum_{j,l} q^{(r+2)m + n + k - 3j + (r-2)l}\, \langle \psi^{+-}_{[1,x-1]}(m - j) | \psi^{+-}_{[1,x-1]}(m - j) \rangle \\
&\qquad \times \langle \psi^{+-}_{\{x\}}(j) \otimes \psi^{-+}_{\{x+1\}}(l) | \psi^{+-}_{[x,x+1]}(j + l) \rangle \\
&\qquad \times \langle \psi^{-+}_{[x+2,x+y+r]}(n + k - l) | \psi^{+-}_{[x+2,x+r]}(k - l) \otimes \psi^{-+}_{[x+r+1,x+y+r]}(n) \rangle. \tag{A.12}
\end{align*}
This is derived just as before, using Eqs. (A.1)–(A.4). Note
\[
\psi^{+-}_{\{x\}}(j) = \psi^{-+}_{\{x\}}(j) = q^{j} (S^-_x)^j\, |{\uparrow}\rangle_x.
\]
The usefulness of this formula is in the fact that −+ ++ +− +− (j ) ⊗ ψ{x+1} (l)|Hx,x+1 ψ[x,x+1] (j + l)| |ψ{x}
+− −+ +− ≤ ψ{x} (j ) ⊗ ψ{x+1} (l)|ψ[x,x+1] (j + l).
Indeed, the formula for the right-hand side is +− −+ +− (j ) ⊗ ψ{x+1} (l)|ψ[x,x+1] (j + l) = q 2j +3l , ψ{x}
while the left-hand side is +− −+ ++ +− (j ) ⊗ ψ{x+1} (l)|Hx,x+1 ψ[x,x+1] (j + l) j l ψ{x}
0 0 −A() q 2 (1 − q)2 2(1 + q 2 ) q 4 (1 − q 2 ) 1 0 − 2(1 + q 2 )
0 1
1 1 A()q 5
Thus, +− −+ ++ +− −+ |ψ[1,x] (m) ⊗ ψ[x+1,x+y+r] (n + k)|Hx,x+1 ψ[1,x+r] (m + k) ⊗ ψ[x+r+1,x+y+r] (n)| +− +− ≤ q (r+2)m+n+k−3j +(r−2)l ψ[1,x−1] (m − j )|ψ[1,x−1] (m − j ) j,l
+− −+ ++ +− × |ψ{x} (j ) ⊗ ψ{x+1} (l)|Hx,x+1 ψ[x,x+1] (j + l)|
−+ +− −+ × ψ[x+2,x+y+r] (n + k − l)|ψ[x+2,x+r] (k − l) ⊗ ψ[x+r+1,x+y+r] (n) +− +− ≤ q (r+2)m+n+k−3j +(r−2)l ψ[1,x−1] (m − j )|ψ[1,x−1] (m − j ) j,l
+− −+ +− × ψ{x} (j ) ⊗ ψ{x+1} (l)ψ[x,x+1] (j + l)
−+ +− −+ × ψ[x+2,x+y+r] (n + k − l)|ψ[x+2,x+r] (k − l) ⊗ ψ[x+r+1,x+y+r] (n)
+− −+ +− −+ = ψ[1,x] (m) ⊗ ψ[x+1,x+y+r] (n + k)|ψ[1,x+r] (m + k) ⊗ ψ[x+r+1,x+y+r] (n).
This result, in conjunction with (A.11), gives
\[
\frac{|\langle \xi_{L,n}(x) | H^{++}_{[1,L]}\, \xi_{L,n}(y) \rangle|}{\|\xi_{L,n}(x)\| \cdot \|\xi_{L,n}(y)\|} \le \frac{q^{n|y-x|}}{f_q(\infty)}, \tag{A.13}
\]
whenever $|x - y| \ge 1$. The requirement that $|x - y| \ge 1$ comes from the fact that $r$ must be at least one for (A.12) to hold true. Now a similar argument works to bound $|\langle \xi_{L,n}(x) | (H^{++}_{[1,L]})^2\, \xi_{L,n}(y) \rangle|$ by $\langle \xi_{L,n}(x) | \xi_{L,n}(y) \rangle$. Specifically, we note
\[
\langle \xi_{L,n}(x) | (H^{++}_{[1,L]})^2\, \xi_{L,n}(y) \rangle = \langle \xi_{L,n}(x) | H^{++}_{x,x+1} H^{++}_{y,y+1}\, \xi_{L,n}(y) \rangle
\]
as long as $|x - y| \ge 2$. Then the same argument as above can show that
\[
|\langle \xi_{L,n}(x) | H^{++}_{x,x+1} H^{++}_{y,y+1}\, \xi_{L,n}(y) \rangle| \le \langle \xi_{L,n}(x) | \xi_{L,n}(y) \rangle.
\]
Thus we have
\[
\frac{|\langle \xi_{L,n}(x) | (H^{++}_{[1,L]})^2\, \xi_{L,n}(y) \rangle|}{\|\xi_{L,n}(x)\| \cdot \|\xi_{L,n}(y)\|} \le \frac{q^{n|y-x|}}{f_q(\infty)}, \tag{A.14}
\]
whenever $|x - y| \ge 2$.

Appendix B

In this section we derive a single result. We need the following definitions, some of which appeared previously in the paper. Given an arbitrary finite subset $A \subset \mathbb{Z}$, let $\mathcal{H}_A$ be the $|A|$-fold tensor product $\bigotimes_{x \in A} \mathbb{C}^2_x$, the space of all spin states on $A$. The subspace of all vectors $\psi \in \mathcal{H}_A$ with exactly $n$ down spins is denoted $\mathcal{H}_{A,n}$. For any subset $A_1 \subset A$, we can define $Q_{A_1,n}$ to be the projection onto the subspace of $\mathcal{H}_A$ consisting of those vectors with exactly $n$ down spins in $A_1$. So, $Q_{A_1,n} = \mathrm{Proj}(\mathcal{H}_{A_1,n} \otimes \mathcal{H}_{A \setminus A_1})$. We also
define $P_{A_1} = Q_{A_1,0} + Q_{A_1,|A_1|}$. It is the projection onto the span of vectors such that on $A_1$ they have all up spins or all down spins, but nothing else. Now, let $0 \le n < L$. Suppose $J = [a, b]$ is a subinterval of $[1, L]$. We define the projections:
\[
G^{\uparrow}_j = Q_{[1,a-1],j}\, Q_{J,0}\, Q_{[b+1,L],n-j},
\]
\[
G^{\downarrow}_j = Q_{[1,a-1],j}\, Q_{J,|J|}\, Q_{[b+1,L],n-j-|J|}.
\]
Then, for any $\psi \in \mathcal{H}_{[1,L],n}$,
\[
P_J\psi = \sum_{j=0}^{n} G^{\uparrow}_j \psi + \sum_{j=0}^{n-|J|} G^{\downarrow}_j \psi.
\]
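On the level of basis configurations, the decomposition of $P_J$ into the projections $G^{\uparrow}_j$ and $G^{\downarrow}_j$ is a simple classification, sketched below. The encoding of a configuration as a tuple of down-spin sites and the helper names `classify` and `decomposition_counts` are assumptions made for this illustration only.

```python
from itertools import combinations


def classify(downs, a, b):
    """Classify a configuration, given by its down-spin sites, relative to J = [a, b].

    Returns ('up', j) if J carries no down spins, ('down', j) if J is completely
    filled with down spins, and None otherwise. Here j is the number of down
    spins strictly to the left of J.
    """
    in_J = sum(1 for s in downs if a <= s <= b)
    j = sum(1 for s in downs if s < a)
    if in_J == 0:
        return ('up', j)
    if in_J == b - a + 1:
        return ('down', j)
    return None


def decomposition_counts(L, n, a, b):
    """Count basis configurations on [1, L] with n down spins in each class."""
    counts = {}
    for downs in combinations(range(1, L + 1), n):
        label = classify(downs, a, b)
        if label is not None:
            counts[label] = counts.get(label, 0) + 1
    return counts
```

Every configuration in the range of $P_J$ receives exactly one label, and the `'down'` labels only occur for $0 \le j \le n - |J|$, matching the two sums in the decomposition.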
We recall the definition of droplet states: For n/2 ≤ x ≤ L − n/2, +− −+ ξL,n (x) = ψ[1,x] (n/2) ⊗ ψ[x+1,L] (n/2), +− −+ (n/2) and ψ[x+1,L] (n/2) are the kink and antikink states defined in where ψ[1,x] (1.3) and (1.4). Let ?x = Proj(ξL,n (x)). Define the intervals
I1 = [n/2 , a − n/2 − 1], I2 = [b − n/2 , a − 1 + n/2], I3 = [b + n/2 , L − n/2]. Some of these intervals may be empty. We have the following result. There exists an N (q) ∈ N and a C(q) < ∞, such that as long as n ≥ N (q) ↓ ↓ PJ ?x PJ ≥ G↑n ?x G↑n + Ga−1+n/2−x ?x Ga−1+n/2−x x∈I1 ∪I2 ∪I3
x∈I1
+
x∈I2
↑ ↑ G 0 ?x G 0
− C(q)q |J | PJ Proj(H[1,L],n ).
x∈I3
To prove this we group certain projections, Gσj , and certain projections, ?x , together. Let G1 =
n−|J | j =0
↓ Gj ,
X1 =
n−|J |
?a−1+n/2−j ;
j =0
↑
G2 = G0 ,
X2 =
L−n/2 x=b+n/2
G3 =
n/2−1 j =1
↑
Gj ;
↑
G4 = Gn/2 ; G5 =
n−1 j =n/2+1
↑
Gj ;
?x ;
G6 =
G↑n ,
X6 =
a−1−n/2
?x .
x=n/2
To prove the claim it suffices to prove Xi Gj ≤ O(q |J | ) for i = j , and G1 X1 G1 −
n−|J | j =0
↓
↓
Gj · ?a−1+n/2−j · Gj ≤ O(q |J | ).
(B.1)
We will explain how this may be done now. By our definition, each Gi may be written k∈Ei Gσk i , and each Xj may be written x∈Fj ?x , for intervals Ei , Fj , possibly empty, and σi ∈ {↑, ↓}. Thus, letting σ = σi ,
=
k,l∈Ei x,y∈Fj
k,l∈Ei x,y∈Fj
=
(Xj Gi )∗ (Xj Gi ) =
k,l∈Ei x,y∈Fj
Gσk ?x ?y Gσl
Gσk ·
|ξL,n (x)ξL,n (x)| |ξL,n (y)ξL,n (y)| · · Gσl ξL,n (x)|ξL,n (x) ξL,n (y)|ξL,n (y)
Gσk ·
|Gσk ξL,n (x) Gσl ξL,n (y)| ξL,n (x)|ξL,n (y) · · · Gσl . ξL,n (x) ξL,n (x) · ξL,n (y) ξL,n (y)
Applying Cauchy–Schwarz we deduce that jσ Gk ψ Gl ψMkl i , Xj Gi ψ2 ≤ k,l∈Ei
where jσ
Mkl =
Gσ ξL,n (x) Gσl ξL,n (y) |ξL,n (x)|ξL,n (y)| k · · . ξL,n (x) ξL,n (x) · ξL,n (y) ξL,n (y)
x,y∈Fj
Since the projections Gσk are mutually orthogonal to one another, Gσk ψ2 Gi ψ2 = k∈Ei
Thus,
jσ
i ) . Xj Gi ψ2 ≤ Gi ψ2 · (Mk,l∈E i kl
Of course, Gi ψ2 ≤ ψ2 , because Gi is a projection. So jσ
Xj Gi ≤ (Mkl i )k,l∈Ei 1/2 . jσ
We now discuss how to bound (Mkl i )k,l∈Ei . We can bound the inner-product ξl,n (x)|ξl,n (y) by (A.9). So jσ
Mkl ≤
q n|x−y| Gσ ξL,n (x) Gσ ξL,n (y)| k l · · . fq (∞) ξL,n (x) ξL,n (y)
x,y∈Fj
Then, using the operator norm with respect to l ∞ , jσ
jσ
(Mkl )k,l∈Ei ≤ (Mkl )k,l∈Ei ∞ q n|x−y| Gσ ξL,n (x) Gσ ξL,n (y)| k l · · . ≤ sup fq (∞) ξL,n (x) ξL,n (y) k∈Ei l∈Ei x,y∈Fj
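To proceed, we need to estimate $\|G^{\sigma}_l\, \xi_{L,n}(x)\| / \|\xi_{L,n}(x)\|$ for each $\sigma$, $l$ and $x$. In fact, no estimation is required; we can perform the computation exactly. Let us explain how this is done. The operator $G^{\sigma}_l$ falls in the following class of projections. Suppose we have some partition $\mathcal{P}$ of $[1, L]$, composed of intervals $[x_{j-1} + 1, x_j]$, where $0 = x_0 < x_1 < \cdots < x_r = L$, and suppose we have a vector $\mathbf{n} = (n_1, \ldots, n_r)$, where $0 \le n_j \le x_j - x_{j-1}$ and $\sum_{j=1}^{r} n_j = n$. Then we can define the projection
\[
Q_{\mathcal{P},\mathbf{n}} := \prod_{j=1}^{r} Q_{[x_{j-1}+1, x_j], n_j}.
\]
The operators $G^{\sigma}_l$ are of this form, where the partition has three intervals $[1, a - 1]$, $[a, b]$ and $[b + 1, L]$, and $\mathbf{n} = (j, 0, n - j)$ or $\mathbf{n} = (j, |J|, n - j - |J|)$, depending on whether $\sigma$ is $\uparrow$ or $\downarrow$. We can reduce the problem of computing $\|Q_{\mathcal{P},\mathbf{n}}\, \xi_{L,n}(x)\|$ to one of computing $\|Q_{\mathcal{P}_1,\mathbf{n}_1}\, \psi^{+-}_{[1,x]}(n/2)\|$ and $\|Q_{\mathcal{P}_2,\mathbf{n}_2}\, \psi^{-+}_{[x+1,L]}(n/2)\|$ for some partitions and vectors $\mathcal{P}_1$, $\mathcal{P}_2$, $\mathbf{n}_1$ and $\mathbf{n}_2$. To accomplish this, let $k$ be the integer such that $x_{k-1} + 1 \le x < x_k$. Define the refined partition $\mathcal{P}'$, where $x'_j = x_j$ for $j < k$, $x'_k = x$, and $x'_j = x_{j-1}$ for $j > k$, and define the $(r + 1)$-vector $\mathbf{n}'$ by $n'_j = n_j$ for $j < k$, $n'_k = n/2 - \sum_{j=1}^{k-1} n_j$, $n'_{k+1} = n_k - n'_k$, and $n'_j = n_{j-1}$ for $j > k + 1$. Since $\xi_{L,n}(x)$ has a definite number of downspins, $n/2$, to the left of $x$ and a definite number of downspins, $n/2$, to the right of $x + 1$, the vector $Q_{\mathcal{P},\mathbf{n}}\, \xi_{L,n}(x)$ is the same as $Q_{\mathcal{P}',\mathbf{n}'}\, \xi_{L,n}(x)$. In fact, since $\xi_{L,n}(x) = \psi^{+-}_{[1,x]}(n/2) \otimes \psi^{-+}_{[x+1,L]}(n/2)$, we know
\[
Q_{\mathcal{P}',\mathbf{n}'}\, \xi_{L,n}(x) = (Q_{\mathcal{P}_1,\mathbf{n}_1}\, \psi^{+-}_{[1,x]}(n/2)) \otimes (Q_{\mathcal{P}_2,\mathbf{n}_2}\, \psi^{-+}_{[x+1,L]}(n/2)),
\]
where $\mathcal{P}_1$ is the partition consisting of the first $k$ parts of $\mathcal{P}'$, $\mathcal{P}_2$ is the remainder partition, $\mathbf{n}_1 = (n'_1, \ldots, n'_k)$ and $\mathbf{n}_2 = (n'_{k+1}, \ldots, n'_{r+1})$. Therefore,
\[
\frac{\|Q_{\mathcal{P},\mathbf{n}}\, \xi_{L,n}(x)\|}{\|\xi_{L,n}(x)\|}
= \frac{\|Q_{\mathcal{P}_1,\mathbf{n}_1}\, \psi^{+-}_{[1,x]}(n/2)\|}{\|\psi^{+-}_{[1,x]}(n/2)\|} \cdot \frac{\|Q_{\mathcal{P}_2,\mathbf{n}_2}\, \psi^{-+}_{[x+1,L]}(n/2)\|}{\|\psi^{-+}_{[x+1,L]}(n/2)\|}.
\]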
We now present the formula for the two quantities on the right-hand side of the equation. The key to the computation is the decomposition formulae of (A.1) and (A.2). These have trivial generalizations. Specifically, for $x_0 < x_1 < \cdots < x_r$,
\[
\psi^{+-}_{[x_0+1,x_r]}(n) = \sum_{\substack{(n_1,\ldots,n_r) \\ n_1+\cdots+n_r=n}} q^{n x_r - (n_1 x_1 + \cdots + n_r x_r)} \bigotimes_{j=1}^{r} \psi^{+-}_{[x_{j-1}+1,x_j]}(n_j), \tag{B.2}
\]
\[
\psi^{-+}_{[x_0+1,x_r]}(n) = \sum_{\substack{(n_1,\ldots,n_r) \\ n_1+\cdots+n_r=n}} q^{(n_1 x_0 + \cdots + n_r x_{r-1}) - n x_0} \bigotimes_{j=1}^{r} \psi^{-+}_{[x_{j-1}+1,x_j]}(n_j). \tag{B.3}
\]
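The prefactor in (B.2) can be checked configuration by configuration. The sketch below rests on an assumption that is not restated in this appendix: that a kink state on $[x_0+1, x_r]$ gives a configuration with down spins at sites $s_1, \ldots, s_n$ the coefficient $q^{\sum_i (x_r + 1 - s_i)}$, a convention consistent with the norm formula (A.3). All function names are illustrative.

```python
from itertools import combinations


def kink_exponent(downs, right_end):
    """Assumed exponent of q for a configuration of a kink state ending at right_end."""
    return sum(right_end + 1 - s for s in downs)


def factorized_exponent(downs, edges):
    """Exponent predicted by (B.2) for edges = [x_0, x_1, ..., x_r]."""
    n = len(downs)
    total = n * edges[-1]  # the term n * x_r in the prefactor
    for j in range(1, len(edges)):
        in_bin = [s for s in downs if edges[j - 1] < s <= edges[j]]
        total -= len(in_bin) * edges[j]           # subtract n_j * x_j
        total += kink_exponent(in_bin, edges[j])  # local kink state on bin j
    return total


def check_b2(edges, n):
    """Compare both exponents for every configuration with n down spins."""
    sites = range(edges[0] + 1, edges[-1] + 1)
    return all(kink_exponent(d, edges[-1]) == factorized_exponent(d, edges)
               for d in combinations(sites, n))
```

Under this convention, (B.2) says that moving the reference point of each bin from $x_r$ to $x_j$ costs exactly $n_j (x_r - x_j)$ powers of $q$.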
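From this one can easily calculate
\[
\frac{\|Q_{\mathcal{P},\mathbf{n}}\, \psi^{+-}_{[x_0+1,x_r]}(n)\|^2}{\|\psi^{+-}_{[x_0+1,x_r]}(n)\|^2}
= \frac{\prod_{j=1}^{r} \begin{bmatrix} x_j - x_{j-1} \\ n_j \end{bmatrix}_{q^2}}{\begin{bmatrix} x_r - x_0 \\ n \end{bmatrix}_{q^2}}\; q^{\sum_{j=1}^{r} n_j (2(x_r - x_j) - (n - n_j))}, \tag{B.4}
\]
\[
\frac{\|Q_{\mathcal{P},\mathbf{n}}\, \psi^{-+}_{[x_0+1,x_r]}(n)\|^2}{\|\psi^{-+}_{[x_0+1,x_r]}(n)\|^2}
= \frac{\prod_{j=1}^{r} \begin{bmatrix} x_j - x_{j-1} \\ n_j \end{bmatrix}_{q^2}}{\begin{bmatrix} x_r - x_0 \\ n \end{bmatrix}_{q^2}}\; q^{\sum_{j=1}^{r} n_j (2(x_{j-1} - x_0) - (n - n_j))}. \tag{B.5}
\]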
We notice the following interesting fact. The exponent of $q$ in the formulas above has the following interpretation. The most probable locations of the downspins for the kink state $\psi^{+-}_{[1,L]}(n)$ are in the interval $[L + 1 - n, L]$. Suppose we place marbles in these places and ask for the minimum transport required to move these marbles so that $n_j$ of the marbles lie in the bin $[x_{j-1} + 1, x_j]$ for each $j$. Then this is precisely the exponent of $q$ in (B.4). To state this in symbols
\[
\sum_{j=1}^{r} n_j (2(x_r - x_j) - (n - n_j))
= \min\Big\{ \sum_{x=1}^{L} |f(x) - x| : f \in \mathrm{Perm}([1, L]),\ \#\big(f([L + 1 - n, L]) \cap [x_{j-1} + 1, x_j]\big) = n_j,\ j = 1, \ldots, r \Big\}.
\]
The exponent of $q$ in (B.5) has a similar interpretation, except that the marbles initially occupy the sites of $[1, n]$ instead of $[L + 1 - n, L]$. Having said how one can perform the computations of $\|G^{\sigma}_j\, \xi_{L,n}(x)\|$, we now state our results. The following notation is convenient:
\[
\langle * \rangle_{L,n,x} := \frac{\langle \xi_{L,n}(x) | * \, \xi_{L,n}(x) \rangle}{\langle \xi_{L,n}(x) | \xi_{L,n}(x) \rangle}.
\]
This is the expectation value of an observable with respect to the droplet state $\xi_{L,n}(x)$.
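Both the ratio (B.4) and the marble interpretation of its exponent can be tested by brute force on small chains. The sketch below is only an illustration under stated assumptions: it takes $x_0 = 0$, it assumes the kink state weights a configuration with down spins at sites $s_i$ by $q^{\sum_i (L + 1 - s_i)}$ (consistent with (A.3), but not restated in this appendix), and all function names are hypothetical.

```python
from itertools import combinations, permutations
from math import prod


def q2_binomial(q, n, k):
    """Gauss polynomial [n choose k] in base q^2; zero unless 0 <= k <= n."""
    if k < 0 or k > n:
        return 0.0
    return (prod(1.0 - q ** (2 * (n - i)) for i in range(k))
            / prod(1.0 - q ** (2 * (i + 1)) for i in range(k)))


def bin_counts(sites, edges):
    """Number of the given sites in each bin [x_{j-1}+1, x_j]."""
    return [sum(1 for s in sites if edges[j] < s <= edges[j + 1])
            for j in range(len(edges) - 1)]


def exponent_b4(edges, occ):
    """Exponent of q in (B.4): sum_j n_j (2 (x_r - x_j) - (n - n_j))."""
    n = sum(occ)
    return sum(nj * (2 * (edges[-1] - edges[j + 1]) - (n - nj))
               for j, nj in enumerate(occ))


def ratio_formula(q, edges, occ):
    """Right-hand side of (B.4) with x_0 = edges[0] = 0."""
    bins = prod(q2_binomial(q, edges[j + 1] - edges[j], nj)
                for j, nj in enumerate(occ))
    return bins / q2_binomial(q, edges[-1], sum(occ)) * q ** exponent_b4(edges, occ)


def ratio_bruteforce(q, edges, occ):
    """Left-hand side of (B.4), summing assumed squared amplitudes directly."""
    L, n = edges[-1], sum(occ)
    total = kept = 0.0
    for downs in combinations(range(1, L + 1), n):
        weight = q ** (2 * sum(L + 1 - s for s in downs))
        total += weight
        if bin_counts(downs, edges) == list(occ):
            kept += weight
    return kept / total


def min_transport(edges, occ):
    """Minimum of sum_x |f(x) - x| over permutations meeting the bin constraint."""
    L, n = edges[-1], sum(occ)
    marbles = range(L + 1 - n, L + 1)
    costs = [sum(abs(f[x - 1] - x) for x in range(1, L + 1))
             for f in permutations(range(1, L + 1))
             if bin_counts([f[x - 1] for x in marbles], edges) == list(occ)]
    return min(costs)  # assumes the occupations are feasible
```

The permutation search grows factorially with $L$, so it is only meant for chains of a handful of sites; the comparison of `ratio_formula` with `ratio_bruteforce` scales much better.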
Gj
L,n,x
=
# # " L−b a−1−x n − j q2 r q2 " # q 2(n−j )(|J |+r) . L−x n/2 q 2
We make the convention that " # n =0 k q2
if k < 0
or k > n.
Thus the formula above is zero unless 0 ≤ r ≤ a − 1 − x.
• If 0 ≤ x ≤ a − 1 and σ =↓ let r = a − 1 − x − j + n/2. Then # " # " L−b a−1−x n − j − |J | q 2 r q2 ↓ " # Gj = q 2(n−j )r . L,n,x L−x n/2 q 2 • If a ≤ x ≤ b and σ =↑, the answer is zero unless j = n/2, and # " # " L−b a−1 n/2 q 2 n/2 q 2 ↑ # " # q 2[n/2(x−a+1)+n/2(b−x)] . Gn/2 =" L,n,x x L−x n/2 q 2 n/2 q 2 • If a ≤ x ≤ b and σ =↓, the answer is zero unless j = n/2 − x + a − 1, and # " # " L−b a−1 x − n/2 q 2 L − x − n/2 q 2 ↓ " # " # Gn/2 = . L,n,x x L−x n/2 q 2 n/2 q 2 • If b + 1 ≤ x ≤ L and σ =↑, let r = x − b − n/2 + j . Then # " # " x−b a−1 r j q2 q 2 2j (|J |+r) ↑ " # Gj = q . L,n,x x n/2 q 2 • If b + 1 ≤ x ≤ L and σ =↓, let x − a + 1 − n/2 + j . Then # " # " x−b a−1 r j q2 q 2 2j (|J |+r) ↑ " # Gj = q . L,n,x x n/2 q 2 The rest of the computations proceed directly from these observations. Note that each q 2 -binomial coefficient can be bounded above by fq (∞)−1 , but one should remember to restrict the indices j and x to those for which none of the q 2 -binomial coefficients vanish. Our results are the following: • As mentioned above, it is easy to check that X1 G2 = X1 G6 = X2 G6 = X2 G5 = X6 G2 = X6 G3 = 0. Simply put, if one consults the formulae in the paragraph, each of the products above is composed of ?x Gσj for which the q-binomial coefficients vanish.
• A simultaneous bound for X1 G3 2 and X1 G5 2 is C(q)q 2|J | , where C(q) = • X1 G4 2 ≤
1 fq (∞)3
2 + 8q . (1 − q)4 fq (∞)3 |J | +
1 + q n/2 1 − q n/2
2
q 2|J |n/2 .
• We bound X2 G1 2 and X6 G1 2 , simultaneously, by C(q)q 2(|J |−1) , where 2
C(q) =
1 fq
(∞)3 (1 − q |J | )2 (1 − q 2(|J |−1) )
.
The reason the bound is so small is that it is actually equal to zero, if |J | > n, as can be understood by counting downspins to the left and right of x. • Both X2 G3 2 and X6 G5 2 can each be bounded by C(q)q 2|J | , where C(q) =
q2 fq
(∞)3 (1 − q)2 (1 − q |J |+2 )
.
• Both X2 G4 2 and X6 G4 2 can each be bounded by 1 + q 2n/2 1 1 + q 2n/2 q 4n/2(|J |+n/2) . + fq (∞)3 1 − q 2n/2 1 − q 2n/2 That accounts for all of the necessary computations except one, which we now carry out. We show in this paragraph that n−|J | ↓ 4q |J | ↓ G1 X1 G1 − ≤ G ? G . (B.6) a−1+n/2+j j j 3 |J | 2 fq (∞) (1 − q ) j =0
In this case we can define xj = a − 1 + n/2 + j , for each 0 ≤ j ≤ n − |J |, and we have 1 ↓ Gj q |J |·|x−xj | . ≤ L,n,x fq (∞) This is understood because |x − xj | downspins must be moved all the way across the ↓ droplet in order to change the basic interval for ξ(x) into a state compatible with Gj . Thus, proceeding in the same way as before, we obtain n−|J | ↓ ↓ G1 X1 G1 − G ? G j a−1+n/2+j j ≤ M, j =0 where Mjj = 0 for each j , and Mj k ≤
1 q |J |·|x−xj |+|J |·|x−xk | 2 fq (∞) x∈I2
when j = k. By extending the indices x to cover all integers, and by translating so that xj is the new origin of x, we have Mj k ≤
1 q |J |·|x|+|J |·|x+j −k| . 3 fq (∞) x
The series is easily calculated as
q
|J |·|x|+|J |·|x+j −k|
=q
|J |·|j −k|
x
1 + q 2|J |j − k| + 1 − q 2|J |
.
So, for any fixed j , we have
Mj k
∞ 1 + q 2|J 2 |J |l l+ . ≤ q fq (∞)2 1 − q 2|J | l=1
k∈Z k=j
This sum is then easily computed as ∞ 1 + q 2|J | 4q |J | = q |J |l l + . 2|J | 1−q (1 − q |J | )2 l=1
From this we obtain (B.6). Acknowledgements. This material is based on work supported by the National Science Foundation under Grant No. DMS0070774.
References 1. Alcaraz, F.C., Salinas, S.R., Wreszinski, W.F.: Anisotropic ferromagnetic quantum domains. Phys. Rev. Lett. 75, 930–933 (1995) 2. Andrews, G.: The Theory of Partitions. Volume 2 of Encyclopedia of Mathematics and its Applications, Reading, MA: Addison Wesley, 1976 3. Bach, K.T. and Macris, N.: On kink states of ferromagnetic chains. Physica A 279, 386–397 (2000) 4. Bodineau, T., Ioffe, D., and Velenik, Y.: Rigorous probabilistic analysis of equilibrium crystal shapes. J. Math. Phys. 41, 1033–1098 (2000) 5. Bolina, O., Contucci, P., Nachtergaele, B.: Path Integral Representation for Interface States of the Anisotropic Heisenberg Model. Rev. Math. Phys. 12, 1325–1344 (2000), archived as math-ph/9908004 6. Dobrushin, R.L., Kotecký, R., and Shlosman, S.: Wulff construction: A global shape from local interaction. AMS translations series, Vol 104, Providence R.I.: AMS, 1992 7. Gottstein, C.-T., Werner, R.F.: Ground states of the infinite q-deformed Heisenberg ferromagnet. Preprint archived as cond-mat/9501123 8. Kassel, C.: Quantum Groups. New York, NY: Springer Verlag, 1995 9. Koma, T., Nachtergaele, B.: The spectral gap of the ferromagnetic XXZ chain. Lett. Math. Phys. 40, 1–16 (1997) 10. Koma, T., and Nachtergaele, B.: The complete set of ground states of the ferromagnetic XXZ chains. Adv. Theor. Math. Phys., 2, 533–558 (1998), archived as cond-mat/9709208 11. Matsui, T.: On ground states of the one-dimensional ferromagnetic XXZ model. Lett. Math. Phys. 37, 397 (1996) 12. Pfister, C.E.: Large deviations and phase separation in the two dimensional Ising model. Helv. Phys. Acta 64, 953–1054 (1991) 13. Schonmann, R.H. and Shlosman, S.: Wulff droplets and the metastable relaxation of kinetic Ising models. Commun. Math. Phys. 194, 389–462 (1998) Communicated by M. Aizenman
Commun. Math. Phys. 218, 609 – 632 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Gravity on Finite Groups Leonardo Castellani Dipartimento di Scienze e Tecnologie Avanzate, East Piedmont University, Italy and Dipartimento di Fisica Teorica and I.N.F.N, Via P. Giuria 1, 10125 Torino, Italy. E-mail:
[email protected] Received: 11 November 1999 / Accepted: 11 December 2000
Abstract: Gravity theories are constructed on finite groups G. A self-consistent review of the differential calculi on finite G is given, with some new developments. The example of a bicovariant differential calculus on the nonabelian finite group $S_3$ is treated in detail, and used to build a gravity-like field theory on $S_3$.

Supported in part by EEC under TMR contract ERBFMRX-CT96-0045.

1. Introduction

The algebraic treatment of differential calculus in terms of Hopf structures allows one to extend the usual differential geometric quantities (connection, curvature, metric, vielbein, etc.) to a variety of interesting spaces that include quantum groups, noncommutative spacetimes (i.e. quantum cosets), and discrete spaces. In this paper we concentrate on a particular sort of discrete spaces, i.e. finite group “manifolds”. As we will discuss, these spaces can be visualized as collections of points, corresponding to the finite group elements, and connected by oriented links according to the particular differential calculus we build on them. Although functions $f \in \mathrm{Fun}(G)$ on finite groups G commute, the calculi that are constructed on Fun(G) by algebraic means are in general noncommutative, in the sense that differentials do not commute with functions, and the exterior product does not coincide with the usual antisymmetrization of the tensor product.

The physical motivations for finding differential calculi on finite groups are at least threefold in our opinion:

(i) The possibility of using finite group spaces as internal spaces for Kaluza–Klein compactifications of supergravity or superstring theories. Harmonic analysis on such spaces is far simpler than on the usual smooth manifolds (coset spaces, Calabi–Yau spaces, etc.) or orbifolds. We note in this respect that compactification of D = 5 Yang–Mills theory on the finite group space $Z_2$ yields precisely the Higgs potential, and gives it a geometric raison d’être. In fact Connes’ reconstruction of the standard model in terms of noncommutative geometry [1] can presumably be recovered as Kaluza–Klein compactification of Yang–Mills theory on an appropriate discrete internal space.

(ii) Field theories on discrete structures are interesting per se: many statistical models are of this sort and the tools offered by differential calculi on these structures can be of use in the study of integrable models, see for example ref. [2].

(iii) Gauge and gravity theories on finite group spaces may be used as lattice approximations. For example the action for pure Yang–Mills $F \wedge {*F}$ considered on the finite group space $Z_N \times Z_N \times Z_N \times Z_N$ yields the usual Wilson action of lattice gauge theories, and $N \to \infty$ gives the continuum limit [3]. New lattice theories can be found by choosing different finite groups.

Here we propose an action for a toy theory of gravity on the smallest nonabelian finite group $S_3$. In fact the same type of action can be used for any finite group. Taking $Z_N \times Z_N \times Z_N \times Z_N$ yields a discretized version of gravity, in the same spirit as refs. [4] (where however no action principle was used).

In Sect. 2 a review of the differential calculus on finite groups is presented. Most of this material is not new, and draws on the treatment of refs. [5, 6], where the Hopf algebraic approach of Woronowicz [7] for the construction of differential calculi is adapted to the setting of finite groups. Some developments on Lie derivative, diffeomorphisms and integration are new. The general theory is illustrated in the case of $S_3$ in Sect. 3. The “softening” of the rigid finite group manifold is discussed in Sect. 4, together with the application to a gravity-like field theory on $S_3$.

2. Differential Calculus on Finite Groups

2.1. Fun(G) as a Hopf algebra. Let G be a finite group of order n with generic element g and unit e. Consider Fun(G), the set of complex functions on G.
An element $f$ of Fun(G) is specified by its values $f_g \equiv f(g)$ on the group elements $g$, and can be written as
$$ f = \sum_{g \in G} f_g\, x^g, \qquad f_g \in \mathbb{C}, \tag{2.1} $$
where the functions $x^g$ are defined by
$$ x^g(g') = \delta^g_{g'}. \tag{2.2} $$
Thus Fun(G) is an n-dimensional vector space, and the n functions $x^g$ provide a basis. Fun(G) is also a commutative algebra, with the usual pointwise sum and product [$(f + h)(g) = f(g) + h(g)$, $(f \cdot h)(g) = f(g)h(g)$, $(\lambda f)(g) = \lambda f(g)$, $f, h \in \mathrm{Fun}(G)$, $\lambda \in \mathbb{C}$] and unit $I$ defined by $I(g) = 1$, $\forall g \in G$. In particular:
$$ x^g x^{g'} = \delta_{g,g'}\, x^g, \qquad \sum_{g \in G} x^g = I. \tag{2.3} $$
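The group structure of G induces a Hopf algebra structure on Fun(G), with coproduct $\Delta$, coinverse $\kappa$ and counit $\varepsilon$ defined by group multiplication, inverse and unit as:
$$ \Delta(f)(g, g') = f(gg'), \qquad \Delta : \mathrm{Fun}(G) \to \mathrm{Fun}(G) \otimes \mathrm{Fun}(G), \tag{2.4} $$
$$ \kappa(f)(g) = f(g^{-1}), \qquad \kappa : \mathrm{Fun}(G) \to \mathrm{Fun}(G), \tag{2.5} $$
$$ \varepsilon(f) = f(e), \qquad \varepsilon : \mathrm{Fun}(G) \to \mathbb{C}. \tag{2.6} $$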
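In the first line we have used $\mathrm{Fun}(G \times G) \approx \mathrm{Fun}(G) \otimes \mathrm{Fun}(G)$ [indeed a basis for functions on $G \times G$ is given by $x^{g_1} \otimes x^{g_2}$, $g_1, g_2 \in G$]. On the basis functions $x^g$ the costructures take the form:
$$ \Delta(x^g) = \sum_{h \in G} x^h \otimes x^{h^{-1}g}, \qquad \kappa(x^g) = x^{g^{-1}}, \qquad \varepsilon(x^g) = \delta^g_e. \tag{2.7} $$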
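The coproduct is related to the pullback induced by left or right multiplication of G on itself. Consider the left multiplication by $g_1$:
$$ L_{g_1} g_2 = g_1 g_2, \qquad \forall\, g_1, g_2 \in G. \tag{2.8} $$
This induces the left action (pullback) $\mathcal{L}_{g_1}$ on Fun(G):
$$ \mathcal{L}_{g_1} f(g_2) \equiv f(g_1 g_2)|_{g_2}, \qquad \mathcal{L}_{g_1} : \mathrm{Fun}(G) \to \mathrm{Fun}(G), \tag{2.9} $$
where $f(g_1 g_2)|_{g_2}$ means $f(g_1 g_2)$ seen as a function of $g_2$. For the basis functions we find easily:
$$ \mathcal{L}_{g_1} x^g = x^{g_1^{-1} g}. \tag{2.10} $$
Introducing the mapping $\mathcal{L} : \mathrm{Fun}(G) \to \mathrm{Fun}(G \times G) \approx \mathrm{Fun}(G) \otimes \mathrm{Fun}(G)$:
$$ (\mathcal{L} f)(g_1, g_2) \equiv (\mathcal{L}_{g_1} f)(g_2) = f(g_1 g_2)|_{g_2} \tag{2.11} $$
we see that
$$ \mathcal{L} = \Delta. \tag{2.12} $$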
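Thus the coproduct mapping $\Delta$ on the function $f$ encodes the information on all the left actions $\mathcal{L}_g$, $g \in G$ applied to $f$, without reference to a particular $g$ (“point of the group manifold”). It also encodes the information on right actions. Indeed one can define the right action $\mathcal{R}$ on Fun(G) as:
$$ (\mathcal{R} f)(g_1, g_2) \equiv (\mathcal{R}_{g_1} f)(g_2) = f(g_2 g_1)|_{g_2}. \tag{2.13} $$
Introducing the flip operator $\tau : \mathrm{Fun}(G \times G) \to \mathrm{Fun}(G \times G)$:
$$ (\tau u)(g_1, g_2) \equiv u(g_2, g_1), \qquad u \in \mathrm{Fun}(G \times G) \tag{2.14} $$
it is easy to find that:
$$ \mathcal{R} = \tau \circ \Delta. \tag{2.15} $$
For the basis functions:
$$ \mathcal{R}_{g_1} x^g = x^{g g_1^{-1}}, \qquad \mathcal{R} x^g = \tau \circ \Delta(x^g) = \sum_{h \in G} x^{h^{-1} g} \otimes x^h. \tag{2.16} $$
Finally:
$$ \mathcal{L}_{g_1} \mathcal{L}_{g_2} = \mathcal{L}_{g_2 g_1}, \qquad \mathcal{R}_{g_1} \mathcal{R}_{g_2} = \mathcal{R}_{g_1 g_2}, \qquad \mathcal{L}_{g_1} \mathcal{R}_{g_2} = \mathcal{R}_{g_2} \mathcal{L}_{g_1}. \tag{2.17} $$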
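The relations of this subsection can be checked by brute force on a small group. The sketch below is an illustration only: it takes $G = S_3$ realized as permutation tuples, stores a function on G as a dictionary, and tests (2.7), (2.10), (2.16) and (2.17). All helper names (`mul`, `inv`, `x`, `left`, `right`, `coproduct`) are ours and are not notation from the text.

```python
from itertools import permutations

# S3 as permutation tuples; mul(a, b) is the composition "a after b".
G = list(permutations(range(3)))


def mul(a, b):
    return tuple(a[b[i]] for i in range(3))


def inv(a):
    r = [0, 0, 0]
    for i, ai in enumerate(a):
        r[ai] = i
    return tuple(r)


def x(g):
    """Basis function x^g of Eq. (2.2), stored as a dict g' -> value."""
    return {k: 1 if k == g else 0 for k in G}


def left(g1, f):
    """Pullback (L_{g1} f)(g2) = f(g1 g2), Eq. (2.9)."""
    return {g2: f[mul(g1, g2)] for g2 in G}


def right(g1, f):
    """Pullback (R_{g1} f)(g2) = f(g2 g1), Eq. (2.13)."""
    return {g2: f[mul(g2, g1)] for g2 in G}


def coproduct(f):
    """Delta(f)(g, g') = f(g g'), Eq. (2.4), as a function on G x G."""
    return {(a, b): f[mul(a, b)] for a in G for b in G}


def check_all():
    for g in G:
        # Eq. (2.7): Delta(x^g) = sum_h x^h (x) x^{h^-1 g}
        rhs = {(a, b): sum(x(h)[a] * x(mul(inv(h), g))[b] for h in G)
               for a in G for b in G}
        assert coproduct(x(g)) == rhs
        for g1 in G:
            assert left(g1, x(g)) == x(mul(inv(g1), g))    # Eq. (2.10)
            assert right(g1, x(g)) == x(mul(g, inv(g1)))   # Eq. (2.16)
    f = {g: i * i + 1 for i, g in enumerate(G)}  # arbitrary test function
    for g1 in G:
        for g2 in G:
            # Eq. (2.17)
            assert left(g1, left(g2, f)) == left(mul(g2, g1), f)
            assert right(g1, right(g2, f)) == right(mul(g1, g2), f)
            assert left(g1, right(g2, f)) == right(g2, left(g1, f))
    return True


print(check_all())
```

Note the reversed order $g_2 g_1$ in the composition of left pullbacks, which the check reproduces.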
2.2. First order differential calculus. Differential calculi can be constructed on Hopf algebras A by algebraic means, using the costructures of A [7]. In the case of finite groups G, differential calculi on A = Fun(G) have been discussed in refs. [5, 6]. Here we review some of the results, and present new developments. A first-order differential calculus on A is defined by

(i) A linear map $d : A \to \Gamma$, satisfying the Leibniz rule
$$ d(ab) = (da)b + a(db), \qquad \forall\, a, b \in A. \tag{2.18} $$
The “space of 1-forms” $\Gamma$ is an appropriate bimodule on A, which essentially means that its elements can be multiplied on the left and on the right by elements of A [more precisely $\Gamma$ is a left module if $\forall\, a, b \in A$, $\forall\, \rho, \rho' \in \Gamma$ we have: $a(\rho + \rho') = a\rho + a\rho'$, $(a + b)\rho = a\rho + b\rho$, $a(b\rho) = (ab)\rho$, $I\rho = \rho$. Similarly one defines a right module. A left and right module is a bimodule if $a(\rho b) = (a\rho)b$]. From the Leibniz rule $da = d(Ia) = (dI)a + I\,da$ we deduce $dI = 0$.

(ii) The possibility of expressing any $\rho \in \Gamma$ as
$$ \rho = \sum_k a_k\, db_k \tag{2.19} $$
for some $a_k, b_k$ belonging to A.

To build a first order differential calculus on Fun(G) we need to extend the algebra A = Fun(G) to a differential algebra of elements $x^g$, $dx^g$ (it is sufficient to consider the basis elements and their differentials). Note however that the $dx^g$ are not linearly independent. In fact from $0 = dI = d(\sum_{g \in G} x^g) = \sum_{g \in G} dx^g$ we see that only $n - 1$ differentials are independent. Every element $\rho = a\,db$ of $\Gamma$ can be expressed as a linear combination (with complex coefficients) of terms of the type $x^g dx^{g'}$. Moreover $\rho b \in \Gamma$ (i.e. $\Gamma$ is also a right module) since the Leibniz rule and the multiplication rule (2.3) yield the commutations:
$$ dx^g\, x^{g'} = -x^g\, dx^{g'} + \delta^g_{g'}\, dx^g \tag{2.20} $$
allowing one to reorder functions to the left of differentials. There are $n(n - 1)$ independent terms $x^g dx^{g'}$, since there are $n - 1$ independent $dx^g$. A convenient independent set was chosen in ref. [5] by taking all the terms $e_{g,g'} \equiv x^g dx^{g'}$ with $g \neq g'$. Within this set one can choose any subset, defining a consistent first order differential algebra of elements $x^g$, $e_{g,g'}$. These different choices can be described by oriented graphs, whose vertices are elements $g$ of G, and where an oriented line from $g$ to $g'$ means that the term $x^g dx^{g'}$ exists in the subset [5].

Partial derivatives, “curved” indices. Consider the differential of a function $f \in \mathrm{Fun}(G)$:
$$ df = \sum_{g \in G} f_g\, dx^g = \sum_{g \neq e} f_g\, dx^g + f_e\, dx^e = \sum_{g \neq e} (f_g - f_e)\, dx^g \equiv \sum_{g \neq e} \partial_g f\, dx^g. \tag{2.21} $$
We have used $dx^e = -\sum_{g \neq e} dx^g$ (from $\sum_{g \in G} dx^g = 0$). The partial derivatives of $f$ have been defined in analogy with the usual differential calculus, and are given by
$$ \partial_g f = f_g - f_e = f(g) - f(e). \tag{2.22} $$
Not unexpectedly, they take here the form of finite differences (discrete partial derivatives at the origin e). The partial derivatives satisfy the modified Leibniz rule:
$$ \partial_g (f f') = (\partial_g f)\, f'(g) + f(e)\, \partial_g f'. \tag{2.23} $$
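A minimal sketch of the finite-difference derivatives (2.22) and of the modified Leibniz rule (2.23), again on $S_3$ as a stand-in group. The two test functions are arbitrary illustrative data, and the helper names are ours.

```python
from itertools import permutations

G = list(permutations(range(3)))  # the six elements of S3
E = (0, 1, 2)                     # unit element e


def partial(g, f):
    """Curved partial derivative of Eq. (2.22): the number f(g) - f(e)."""
    return f[g] - f[E]


def pointwise_product(f, fp):
    return {g: f[g] * fp[g] for g in G}


def leibniz_holds(f, fp):
    """Check Eq. (2.23) for every g in G."""
    ffp = pointwise_product(f, fp)
    return all(
        partial(g, ffp) == partial(g, f) * fp[g] + f[E] * partial(g, fp)
        for g in G
    )


# Two arbitrary integer-valued test functions (illustrative data only).
f = {g: 2 * i + 1 for i, g in enumerate(G)}
fp = {g: i * i - 3 for i, g in enumerate(G)}
print(leibniz_holds(f, fp))
```

The asymmetry of (2.23), with $f'$ evaluated at $g$ and $f$ at $e$, is exactly what makes the identity hold for a finite difference.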
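2.3. Left and right covariance. A differential calculus is left or right covariant if the left or right action of G ($\mathcal{L}_g$ or $\mathcal{R}_g$) commutes with the exterior derivative d. Requiring left and right covariance in fact defines the action of $\mathcal{L}_g$ and $\mathcal{R}_g$ on differentials: $\mathcal{L}_g\, db \equiv d(\mathcal{L}_g b)$, $\forall\, b \in \mathrm{Fun}(G)$ and similarly for $\mathcal{R}_g\, db$. More generally, on elements of $\Gamma$ (one-forms) we define $\mathcal{L}_g$ as:
$$ \mathcal{L}_g (a\, db) \equiv (\mathcal{L}_g a)\, \mathcal{L}_g\, db = (\mathcal{L}_g a)\, d(\mathcal{L}_g b), \tag{2.24} $$
and similarly for $\mathcal{R}_g$. For example the left and right action on the differentials $dx^g$ is given by:
$$ \mathcal{L}_g (dx^{g_1}) \equiv d(\mathcal{L}_g x^{g_1}) = dx^{g^{-1} g_1}, \qquad \mathcal{R}_g (dx^{g_1}) \equiv d(\mathcal{R}_g x^{g_1}) = dx^{g_1 g^{-1}}. \tag{2.25} $$
In the same spirit as in the previous section, we can introduce mappings $\Delta_L : \Gamma \to A \otimes \Gamma$ and $\Delta_R : \Gamma \to \Gamma \otimes A$ that encode the information about all left or right translations:
$$ \Delta_L (a \rho b) = \Delta(a) \Delta_L(\rho) \Delta(b), \qquad \Delta_L (db) = (\mathrm{id} \otimes d) \Delta(b) \qquad \forall\, a, b \in A,\ \rho \in \Gamma, \tag{2.26} $$
$$ \Delta_R (a \rho b) = \Delta(a) \Delta_R(\rho) \Delta(b), \qquad \Delta_R (db) = (d \otimes \mathrm{id}) \Delta(b) \qquad \forall\, a, b \in A,\ \rho \in \Gamma. \tag{2.27} $$
To see their relation with $\mathcal{L}_g$ and $\mathcal{R}_g$, consider their action on the basic terms $x^{g_1} dx^{g_2} \in \Gamma$:
$$ \Delta_L (x^{g_1} dx^{g_2}) = \Delta(x^{g_1}) (\mathrm{id} \otimes d) \Delta(x^{g_2}) = \sum_{h \in G} x^h \otimes x^{h^{-1} g_1} dx^{h^{-1} g_2}, \tag{2.28} $$
$$ \Delta_R (x^{g_1} dx^{g_2}) = \Delta(x^{g_1}) (d \otimes \mathrm{id}) \Delta(x^{g_2}) = \sum_{h \in G} x^{g_1 h^{-1}} dx^{g_2 h^{-1}} \otimes x^h. \tag{2.29} $$
Defining $(f \otimes \rho)[g] \equiv f(g) \rho \equiv (\rho \otimes f)[g]$, with $f \in A = \mathrm{Fun}(G)$, $\rho \in \Gamma$, $g \in G$, we deduce:
$$ \Delta_L (x^{g_1} dx^{g_2})[g] = x^{g^{-1} g_1} dx^{g^{-1} g_2} = \mathcal{L}_g (x^{g_1} dx^{g_2}), \tag{2.30} $$
$$ \Delta_R (x^{g_1} dx^{g_2})[g] = x^{g_1 g^{-1}} dx^{g_2 g^{-1}} = \mathcal{R}_g (x^{g_1} dx^{g_2}), \tag{2.31} $$
so that the relations we were looking for are simply
$$ \Delta_L (\rho)[g] = \mathcal{L}_g \rho, \qquad \Delta_R (\rho)[g] = \mathcal{R}_g \rho, \qquad \forall\, \rho \in \Gamma. \tag{2.32} $$
Computing $\Delta_L$ and $\Delta_R$ on the basic differentials yields:
$$ \Delta_L (dx^{g_1}) \equiv (\mathrm{id} \otimes d) \Delta(x^{g_1}) = \sum_{h \in G} x^h \otimes dx^{h^{-1} g_1}, \tag{2.33} $$
$$ \Delta_R (dx^{g_1}) \equiv (d \otimes \mathrm{id}) \Delta(x^{g_1}) = \sum_{h \in G} dx^h \otimes x^{h^{-1} g_1}. \tag{2.34} $$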
In the following we will mainly use the pullbacks $\mathcal{L}_g$ and $\mathcal{R}_g$, rather than the more cumbersome mappings $\Delta_L$ and $\Delta_R$. The reason we have introduced them is to make contact with the notations of general Hopf algebras, where the notion of “point on the manifold” may not exist. The reader not interested in Hopf algebra formalism can simply ignore the discussions involving $\Delta_L$ or $\Delta_R$.

A differential calculus is called bicovariant if it is both left and right covariant.

Finally, consider the left action on $e_{g_1,g_2} = x^{g_1} dx^{g_2}$ given by Eq. (2.30). We see that excluding from the differential algebra the element $e_{g_1,g_2}$ implies the exclusion of all the elements $e_{hg_1,hg_2}$, $h \in G$, that is the exclusion of a subset of $x\,dx$ elements corresponding to an orbit under left multiplication of the couples $(g_1, g_2)$. We call this subset the $e_{g_1,g_2}$ left orbit. Thus the left-covariant differential calculi on Fun(G) are obtained from the universal one (where none of the $e_{g_1,g_2}$ is excluded) by excluding one or more $e_{g_1,g_2}$ left orbits [5]. Analogous considerations hold for right-invariant calculi.

2.4. Left and right-invariant one-forms. As for Lie group manifolds, also in the case of finite groups one can construct left and right invariant one-forms, which provide a basis (“vielbein basis” or cotangent basis) for the vector space of one-forms. Following the usual definition, left-invariant one-forms $\theta$ are elements of $\Gamma$ satisfying:
$$ \mathcal{L}_g \theta = \theta. \tag{2.35} $$
In terms of the $\Delta_L$ mapping this means:
$$ \Delta_L \theta = I \otimes \theta \tag{2.36} $$
(use (2.32) and $I(g) = 1$). It is a simple matter, via Eq. (2.28), to show that the one-forms:
$$ \theta^g \equiv \sum_{h \in G} x^{hg}\, dx^h = \sum_{h \in G} x^h\, dx^{hg^{-1}}, \tag{2.37} $$
are indeed left-invariant: $\Delta_L \theta^g = I \otimes \theta^g$, or equivalently $\mathcal{L}_h \theta^g = \theta^g$. The relations (2.37) can be inverted:
$$ dx^h = \sum_{g \in G} (x^{hg} - x^h)\, \theta^g. \tag{2.38} $$
From $\sum_{g \in G} dx^g = 0$ one finds:
$$ \sum_{g \in G} \theta^g = \sum_{g \in G} \sum_{h \in G} x^h\, dx^{hg^{-1}} = \sum_{h \in G} x^h \sum_{g \in G} dx^{hg^{-1}} = 0. \tag{2.39} $$
We can take as the basis of the cotangent space the $n - 1$ linearly independent left-invariant one-forms $\theta^g$ with $g \neq e$. This basis corresponds to the “universal” differential calculus [5]. Smaller sets of $\theta^g$ can be chosen as a basis (see below). Notice that in the definition of $\theta^g$ the whole $e_{g,e}$ orbit is involved, cf. (2.37). Thus the left-invariant one-forms are in 1-1 correspondence with the $e_{g,e}$ left orbits: removing the $e_{g,e}$ left orbit means removing $\theta^g$. All left-covariant differential calculi are therefore obtained by excluding (i.e. setting to zero) some of the $\theta^g$. The remaining $\theta^g$ ($g \neq e$) constitute a basis for the bimodule $\Gamma$. The $x$, $\theta$ commutations (bimodule relations) are easily found:
$$ x^h dx^g = x^h \theta^{g^{-1}h} = \theta^{g^{-1}h} x^g \quad (h \neq g) \quad \Rightarrow \quad x^h \theta^g = \theta^g x^{hg^{-1}} \quad (g \neq e), \tag{2.40} $$
implying the general commutation rule between functions and left-invariant one-forms:
$$ f \theta^g = \theta^g\, \mathcal{R}_g f. \tag{2.41} $$
Thus functions do commute between themselves (i.e. Fun(G) is a commutative algebra) but do not commute with the basis of one-forms $\theta^g$. In this sense the differential geometry of Fun(G) is noncommutative, the noncommutativity being milder than in the case of quantum groups $\mathrm{Fun}_q(G)$ (which are noncommutative algebras). Analogous results hold for right invariant one-forms $\omega^g$, the corresponding formulae being:
$$ \omega^g = \sum_{h \in G} x^{gh}\, dx^h, \qquad \Delta_R\, \omega^g = \omega^g \otimes I, \tag{2.42} $$
$$ f \omega^g = \omega^g\, \mathcal{L}_g f. \tag{2.43} $$
From the expressions of $\theta^g$ and $\omega^g$ in terms of $x\,dx$, one finds the relations
$$ \theta^g = \sum_{h \in G} x^h\, \omega^{ad(h)g}, \qquad \omega^g = \sum_{h \in G} x^h\, \theta^{ad(h^{-1})g}. \tag{2.44} $$
For a bicovariant calculus the right action on $\theta^g$ is given by (use the definitions of $\Delta_R$ and of $\theta^g$):
$$ \Delta_R\, \theta^g = \sum_{h \in G} \theta^{ad(h)g} \otimes x^h, \qquad \text{or} \qquad \mathcal{R}_h \theta^g = \theta^{ad(h)g}, \tag{2.45} $$
where ad is the adjoint action of G on G, i.e. $ad(h)g \equiv hgh^{-1}$. Then bicovariant calculi are in 1-1 correspondence with unions of conjugacy classes (different from $\{e\}$) [5]: if $\theta^g$ is set to zero, one must set to zero all the $\theta^{ad(h)g}$, $\forall\, h \in G$ corresponding to the whole conjugation class of $g$. As in [5] we denote by $G'$ the subset corresponding to the union of conjugacy classes that characterizes the bicovariant calculus on G ($G' = \{ g \in G \mid \theta^g \neq 0 \}$). Unless otherwise indicated, hereafter repeated indices are summed on $G'$. A bi-invariant (i.e. left and right invariant) one-form $\Theta$ is obtained by summing on all $\theta^g$ or $\omega^g$ with $g \neq e$:
$$ \Theta = \sum_{g \neq e} \theta^g = \sum_{g \neq e} \omega^g. \tag{2.46} $$
Note 2.1. Since $\sum_{g \in G} \theta^g = 0 = \sum_{g \in G} \omega^g$, cf. (2.39), we have also $\theta^e = -\Theta = \omega^e$.
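2.5. Exterior product. For a bicovariant differential calculus on a Hopf algebra A an exterior product, compatible with the left and right actions of G, can be defined by means of a bimodule automorphism $\Lambda$ in $\Gamma \otimes \Gamma$ that generalizes the ordinary permutation operator:
$$ \Lambda(\theta \otimes \omega) = \omega \otimes \theta, \tag{2.47} $$
where $\theta$ and $\omega$ are respectively left and right invariant elements of $\Gamma$ [7]. Bimodule automorphism means that
$$ \Lambda(a \eta) = a \Lambda(\eta), \tag{2.48} $$
$$ \Lambda(\eta b) = \Lambda(\eta) b \tag{2.49} $$
for any $\eta \in \Gamma \otimes \Gamma$ and $a, b \in A$. The tensor product between elements $\rho, \rho' \in \Gamma$ is defined to have the properties $\rho a \otimes \rho' = \rho \otimes a \rho'$, $a(\rho \otimes \rho') = (a\rho) \otimes \rho'$ and $(\rho \otimes \rho')a = \rho \otimes (\rho' a)$. Left and right actions on $\Gamma \otimes \Gamma$ are defined by$^1$:
$$ \Delta_L (\rho \otimes \rho') \equiv \rho_1 \rho'_1 \otimes \rho_2 \otimes \rho'_2, \qquad \Delta_L : \Gamma \otimes \Gamma \to A \otimes \Gamma \otimes \Gamma, \tag{2.50} $$
$$ \Delta_R (\rho \otimes \rho') \equiv \rho_1 \otimes \rho'_1 \otimes \rho_2 \rho'_2, \qquad \Delta_R : \Gamma \otimes \Gamma \to \Gamma \otimes \Gamma \otimes A, \tag{2.51} $$
where $\rho_1$, $\rho_2$, etc., are (a customary short-hand notation) defined by
$$ \Delta_L (\rho) = \rho_1 \otimes \rho_2, \qquad \rho_1 \in A,\ \rho_2 \in \Gamma, \tag{2.52} $$
$$ \Delta_R (\rho) = \rho_1 \otimes \rho_2, \qquad \rho_1 \in \Gamma,\ \rho_2 \in A. \tag{2.53} $$
Left-invariance on $\Gamma \otimes \Gamma$ is naturally defined as $\Delta_L (\rho \otimes \rho') = I \otimes \rho \otimes \rho'$ (similar to the definition for right-invariance), so that, for example, $\theta^i \otimes \theta^j$ is left-invariant, and is in fact a left-invariant basis for $\Gamma \otimes \Gamma$ if $\{\theta^i\}$ is a left-invariant basis for $\Gamma$. The definition of $\mathcal{L}_g$ and $\mathcal{R}_g$ on tensor products $\Gamma \otimes \cdots \otimes \Gamma$ is straightforward; for example:
$$ \mathcal{L}_g (\rho \otimes \rho') \equiv \Delta_L (\rho \otimes \rho')[g] = \rho_1 \rho'_1 (g)\, \rho_2 \otimes \rho'_2 = \mathcal{L}_g \rho \otimes \mathcal{L}_g \rho', \tag{2.54} $$
$$ \mathcal{R}_g (\rho \otimes \rho') \equiv \Delta_R (\rho \otimes \rho')[g] = \rho_1 \otimes \rho'_1\, \rho_2 \rho'_2 (g) = \mathcal{R}_g \rho \otimes \mathcal{R}_g \rho', \tag{2.55} $$
where the last equality in both equations is derived after expanding the generic form $\rho$ on the $\theta^i$ basis ($\rho = f_i \theta^i$) and likewise for $\rho'$. In particular $\mathcal{L}_h (\theta^i \otimes \theta^j) = \theta^i \otimes \theta^j$, $\mathcal{R}_h (\theta^i \otimes \theta^j) = \theta^{ad(h)i} \otimes \theta^{ad(h)j}$.

Note 2.2. In general $\Lambda^2 \neq 1$, since $\Lambda(\omega^j \otimes \theta^i)$ is not necessarily equal to $\theta^i \otimes \omega^j$. By linearity, $\Lambda$ can be extended to the whole of $\Gamma \otimes \Gamma$.

Note 2.3. $\Lambda$ is invertible and commutes with the left and right action of G, i.e. $\Delta_L \Lambda(\rho \otimes \rho') = (\mathrm{id} \otimes \Lambda) \Delta_L (\rho \otimes \rho') = \rho_1 \rho'_1 \otimes \Lambda(\rho_2 \otimes \rho'_2)$, and similar for $\Delta_R$. Equivalently: $\mathcal{L}_g \Lambda(\rho \otimes \rho') = \Lambda[\mathcal{L}_g (\rho \otimes \rho')] = \Lambda(\mathcal{L}_g \rho \otimes \mathcal{L}_g \rho')$ and similar for $\mathcal{R}_g$. Therefore $\Lambda(\theta^i \otimes \theta^j)$ is left-invariant, and can be expanded on the left-invariant basis $\theta^i \otimes \theta^j$:
$$ \Lambda(\theta^i \otimes \theta^j) = \Lambda^{ij}{}_{kl}\, \theta^k \otimes \theta^l. \tag{2.56} $$

$^1$ More generally, we can define the action of $\Delta_L$ on $\Gamma \otimes \Gamma \otimes \cdots \otimes \Gamma$ as
$$ \Delta_L (\rho \otimes \rho' \otimes \cdots \otimes \rho'') \equiv \rho_1 \rho'_1 \cdots \rho''_1 \otimes \rho_2 \otimes \rho'_2 \otimes \cdots \otimes \rho''_2, \qquad \Delta_L : \Gamma \otimes \Gamma \otimes \cdots \otimes \Gamma \to A \otimes \Gamma \otimes \Gamma \otimes \cdots \otimes \Gamma, $$
$$ \Delta_R (\rho \otimes \rho' \otimes \cdots \otimes \rho'') \equiv \rho_1 \otimes \rho'_1 \cdots \otimes \rho''_1 \otimes \rho_2 \rho'_2 \cdots \rho''_2, \qquad \Delta_R : \Gamma \otimes \Gamma \otimes \cdots \otimes \Gamma \to \Gamma \otimes \Gamma \otimes \cdots \otimes \Gamma \otimes A. $$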
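The exterior product is defined as:
$$ \rho \wedge \rho' \equiv \rho \otimes \rho' - \Lambda(\rho \otimes \rho'), \tag{2.57} $$
$$ \theta^i \wedge \theta^j \equiv W^{ij}{}_{kl}\, \theta^k \otimes \theta^l = \theta^i \otimes \theta^j - \Lambda^{ij}{}_{kl}\, \theta^k \otimes \theta^l, \tag{2.58} $$
where $\rho, \rho' \in \Gamma$ and $\{\theta^i\}$ = left-invariant basis for $\Gamma$. Notice that, given the matrix $\Lambda^{ij}{}_{kl}$, we can compute the exterior product of any $\rho, \rho' \in \Gamma$, since any $\rho \in \Gamma$ is expressible in terms of $\theta^i$. In the case A = Fun(G), we find
$$ \begin{aligned} \Lambda(\theta^g \otimes \theta^{g'}) &= \Lambda\Big(\theta^g \otimes \sum_{h \in G} x^h \omega^{ad(h)g'}\Big) = \sum_{h \in G} x^{hg}\, \Lambda(\theta^g \otimes \omega^{ad(h)g'}) \\ &= \sum_{h \in G} x^{hg}\, \omega^{ad(h)g'} \otimes \theta^g = \sum_{h,h' \in G} x^{hg} x^{h'}\, \theta^{ad(h'^{-1}h)g'} \otimes \theta^g \\ &= \sum_{h \in G} x^{hg}\, \theta^{ad(g^{-1})g'} \otimes \theta^g = \theta^{ad(g^{-1})g'} \otimes \theta^g, \end{aligned} \tag{2.59} $$
and the $\Lambda^{ij}{}_{kl}$ matrix takes the form:
$$ \Lambda^{g_1,g_2}{}_{h_1,h_2} = \delta^{g_1}_{h_2}\, \delta^{ad(g_1^{-1})g_2}_{h_1}. \tag{2.60} $$
Then the exterior product of two left-invariant basic one-forms is given by:
$$ \theta^{g_1} \wedge \theta^{g_2} = \theta^{g_1} \otimes \theta^{g_2} - \theta^{g_1^{-1} g_2 g_1} \otimes \theta^{g_1}. \tag{2.61} $$
Note that:
$$ \theta^g \wedge \theta^g = 0 \qquad \text{(no sum on } g). \tag{2.62} $$
This familiar formula holds in the case of finite groups, but not for a general Hopf algebra. We can generalize the definition (2.58) to exterior products of n one-forms:
$$ \theta^{i_1} \wedge \ldots \wedge \theta^{i_n} \equiv W^{i_1 .. i_n}{}_{j_1 .. j_n}\, \theta^{j_1} \otimes \cdots \otimes \theta^{j_n} \tag{2.63} $$
or in short-hand notation:
$$ \theta^1 \wedge \ldots \wedge \theta^n = W_{1..n}\, \theta^1 \otimes \cdots \otimes \theta^n, \tag{2.64} $$
where the labels $1 \ldots n$ in $W$ refer to index couples. The numerical coefficients $W_{1 \ldots n}$ are given through a recursion relation
$$ W_{1 \ldots n} = I_{1 \ldots n}\, W_{1 \ldots n-1}, \tag{2.65} $$
where
$$ I_{1 \ldots n} = 1 - \Lambda_{n-1,n} + \Lambda_{n-2,n-1} \Lambda_{n-1,n} \ldots - (-1)^n \Lambda_{12} \Lambda_{23} \cdots \Lambda_{n-1,n} \tag{2.66} $$
and $W_1 = 1$. The space of n-forms $\Gamma^{\wedge n}$ is therefore defined as in the usual case but with the new permutation operator $\Lambda$, and can be shown to be a bicovariant bimodule (see for ex. [10]), with left and right action defined as for $\Gamma \otimes \cdots \otimes \Gamma$ with the tensor product replaced by the wedge product.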
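To make (2.59)–(2.62) concrete, here is a small sketch that acts with $\Lambda$ on index pairs $(g_1, g_2)$ labelling $\theta^{g_1} \otimes \theta^{g_2}$. As an assumption for illustration it uses $G = S_3$ and takes $G'$ to be the conjugacy class of the three transpositions; the function names are ours.

```python
from itertools import permutations

G = list(permutations(range(3)))
E = (0, 1, 2)


def mul(a, b):
    return tuple(a[b[i]] for i in range(3))


def inv(a):
    r = [0, 0, 0]
    for i, ai in enumerate(a):
        r[ai] = i
    return tuple(r)


def ad(h, g):
    """Adjoint action ad(h) g = h g h^-1."""
    return mul(mul(h, g), inv(h))


def braid(g1, g2):
    """Eq. (2.59) on basis labels: (g1, g2) -> (ad(g1^-1) g2, g1)."""
    return (ad(inv(g1), g2), g1)


def wedge(g1, g2):
    """Eq. (2.61) as a dict {(a, b): coefficient of theta^a (x) theta^b}."""
    coeffs = {}
    for key, c in (((g1, g2), 1), (braid(g1, g2), -1)):
        coeffs[key] = coeffs.get(key, 0) + c
    return {k: c for k, c in coeffs.items() if c != 0}


# Assumed choice of G': the transpositions (elements of order two) of S3.
G_PRIME = [g for g in G if g != E and mul(g, g) == E]


def checks():
    # Eq. (2.62): theta^g ^ theta^g = 0 for every g != e.
    assert all(wedge(g, g) == {} for g in G if g != E)
    # Lambda permutes the pairs built from a single conjugacy class.
    pairs = {(a, b) for a in G_PRIME for b in G_PRIME}
    assert {braid(a, b) for a, b in pairs} == pairs
    # Note 2.2: Lambda squared is not the identity in general.
    a, b = G_PRIME[0], G_PRIME[1]
    assert braid(*braid(a, b)) != (a, b)
    return True


print(checks())
```

Because $\Lambda$ acts here as a permutation of basis pairs, properties such as $\Lambda^2 \neq 1$ can be read off directly from its orbits.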
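2.6. Exterior derivative. With the exterior product we can define the exterior derivative
$$ d : \Gamma \to \Gamma \wedge \Gamma, \tag{2.67} $$
$$ d(a_k\, db_k) = da_k \wedge db_k, \tag{2.68} $$
which can easily be extended to $\Gamma^{\wedge n}$ ($d : \Gamma^{\wedge n} \to \Gamma^{\wedge (n+1)}$), and has the following properties:
$$ d(\rho \wedge \rho') = d\rho \wedge \rho' + (-1)^k \rho \wedge d\rho', \tag{2.69} $$
$$ d(d\rho) = 0, \tag{2.70} $$
$$ \Delta_L (d\rho) = (\mathrm{id} \otimes d) \Delta_L (\rho) \quad \text{or} \quad \mathcal{L}_g (d\rho) = d(\mathcal{L}_g \rho), \tag{2.71} $$
$$ \Delta_R (d\rho) = (d \otimes \mathrm{id}) \Delta_R (\rho) \quad \text{or} \quad \mathcal{R}_g (d\rho) = d(\mathcal{R}_g \rho), \tag{2.72} $$
where $\rho \in \Gamma^{\wedge k}$, $\rho' \in \Gamma^{\wedge n}$, $\Gamma^{\wedge 0} \equiv \mathrm{Fun}(G)$. The last two properties express the fact that d commutes with the left and right action of G.

2.7. Tangent vectors. In (2.21) we expressed $df$ in terms of the differentials $dx^g$. Using (2.38) we find the expansion of $df$ on the basis of the left-invariant one-forms $\theta^g$:
$$ df = \sum_{g \in G} f_g\, dx^g = \sum_{g \in G} f_g \sum_{h \in G} (x^{gh} - x^g)\, \theta^h = \sum_{h \in G} (\mathcal{R}_{h^{-1}} f - f)\, \theta^h \equiv \sum_{h \in G} (t_h f)\, \theta^h \tag{2.73} $$
so that the “flat” partial derivatives $t_h f$ are given by
$$ t_h f = \mathcal{R}_{h^{-1}} f - f. \tag{2.74} $$
Note that $t_h f$ are really functions $\in \mathrm{Fun}(G)$, whereas the “curved” partial derivatives of Eq. (2.22) are numbers. The Leibniz rule for the flat partial derivatives $t_g$ reads:
$$ t_g (f f') = (t_g f)\, \mathcal{R}_{g^{-1}} f' + f\, t_g f'. \tag{2.75} $$
In analogy with ordinary differential calculus, the operators $t_g$ appearing in (2.73) are called (left-invariant) tangent vectors, and in our case are given by
$$ t_g = \mathcal{R}_{g^{-1}} - \mathrm{id}. \tag{2.76} $$
They satisfy the composition rule:
$$ t_g t_{g'} = \sum_h C^h{}_{g,g'}\, t_h, \tag{2.77} $$
where the structure constants are:
$$ C^h{}_{g,g'} = \delta^h_{g'g} - \delta^h_g - \delta^h_{g'} \tag{2.78} $$
and are ad(G) invariant:
$$ C^{ad(h)g_1}{}_{ad(h)g_2,\, ad(h)g_3} = C^{g_1}{}_{g_2,g_3}. \tag{2.79} $$
Clearly we can expand $df$ also on the right-invariant basis $\omega^g$ and define (right-invariant) tangent vectors $\tilde{t}_h$ from $df = \sum_h (\tilde{t}_h f)\, \omega^h$, whose explicit operator expression is:
$$ \tilde{t}_g = \mathcal{L}_{g^{-1}} - \mathrm{id}. \tag{2.80} $$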
Note 2.4. The exterior derivative on any $f \in \mathrm{Fun}(G)$ can be expressed as a commutator of $f$ with the bi-invariant one-form $\Theta$:
$$ df = [\Theta, f], \tag{2.81} $$
as one proves by using (2.41) and (2.73).

Note 2.5. From the fusion rules (2.77) we deduce the “deformed Lie algebra” (cf. refs. [7–10]):
$$ t_{g_1} t_{g_2} - \Lambda^{g_3,g_4}{}_{g_1,g_2}\, t_{g_3} t_{g_4} = \mathbf{C}^h{}_{g_1,g_2}\, t_h, \tag{2.82} $$
where the $\mathbf{C}$ structure constants are given by:
$$ \mathbf{C}^g{}_{g_1,g_2} \equiv C^g{}_{g_1,g_2} - \Lambda^{g_3,g_4}{}_{g_1,g_2}\, C^g{}_{g_3,g_4} = C^g{}_{g_1,g_2} - C^g{}_{g_2,\, g_2 g_1 g_2^{-1}} = \delta^{ad(g_2^{-1})g}_{g_1} - \delta^g_{g_1}, \tag{2.83} $$
and besides property (2.79) they also satisfy:
$$ \mathbf{C}^g{}_{g_1,g_2} = \mathbf{C}^{g_1}{}_{g,\, g_2^{-1}}. \tag{2.84} $$
Moreover the following identities hold:

(i) deformed Jacobi identities:
$$ \mathbf{C}^k{}_{h_1,g_1} \mathbf{C}^{h_2}{}_{k,g_2} - \Lambda^{g_3,g_4}{}_{g_1,g_2}\, \mathbf{C}^k{}_{h_1,g_3} \mathbf{C}^{h_2}{}_{k,g_4} = \mathbf{C}^k{}_{g_1,g_2} \mathbf{C}^{h_2}{}_{h_1,k}, \tag{2.85} $$
(ii) fusion identities:
$$ \mathbf{C}^k{}_{h_1,g} \mathbf{C}^{h_2}{}_{k,g'} = C^h{}_{g,g'}\, \mathbf{C}^{h_2}{}_{h_1,h}. \tag{2.86} $$
Thus the $\mathbf{C}$ structure constants are a representation (the adjoint representation) of the tangent vectors $t$.

Note 2.6. The fusion rules (2.77) also allow one to associate an ordinary (i.e. not deformed) Lie algebra to the finite group G; the corresponding structure constants are simply twice the antisymmetric part (in the indices $g_1, g_2$) of $C^g{}_{g_1,g_2}$.

2.8. Cartan–Maurer equations, connection and curvature. From the definition (2.37) and Eq. (2.40) we deduce the Cartan–Maurer equations:
$$ d\theta^g + \sum_{g_1,g_2} C^g{}_{g_1,g_2}\, \theta^{g_1} \wedge \theta^{g_2} = 0, \tag{2.87} $$
where the structure constants $C^g{}_{g_1,g_2}$ are those given in (2.78). Parallel transport of the vielbein $\theta^g$ can be defined as in ordinary Lie group manifolds:
$$ \nabla \theta^g = -\omega^g{}_{g'} \otimes \theta^{g'}, \tag{2.88} $$
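where $\omega^{g_1}{}_{g_2}$ is the connection one-form:
$$ \omega^{g_1}{}_{g_2} = \Gamma^{g_1}{}_{g_3,g_2}\, \theta^{g_3}. \tag{2.89} $$
Thus parallel transport is a map from $\Gamma$ to $\Gamma \otimes \Gamma$; by definition it must satisfy:
$$ \nabla(a\rho) = (da) \otimes \rho + a \nabla \rho, \qquad \forall\, a \in A,\ \rho \in \Gamma, \tag{2.90} $$
and it is a simple matter to verify that this relation is satisfied with the usual parallel transport of Riemannian manifolds. As for the exterior differential, $\nabla$ can be extended to a map $\nabla : \Gamma^{\wedge n} \otimes \Gamma \to \Gamma^{\wedge (n+1)} \otimes \Gamma$ by defining:
$$ \nabla(\varphi \otimes \rho) = d\varphi \otimes \rho + (-1)^n \varphi \nabla \rho, \qquad \forall\, \varphi \in \Gamma^{\wedge n}. \tag{2.91} $$
Requiring parallel transport to commute with the left and right action of G means:
$$ \mathcal{L}_h (\nabla \theta^g) = \nabla(\mathcal{L}_h \theta^g) = \nabla \theta^g, \tag{2.92} $$
$$ \mathcal{R}_h (\nabla \theta^g) = \nabla(\mathcal{R}_h \theta^g) = \nabla \theta^{ad(h)g}. \tag{2.93} $$
Recalling that $\mathcal{L}_h (a\rho) = (\mathcal{L}_h a)(\mathcal{L}_h \rho)$ and $\mathcal{L}_h (\rho \otimes \rho') = (\mathcal{L}_h \rho) \otimes (\mathcal{L}_h \rho')$, $\forall\, a \in A$, $\rho, \rho' \in \Gamma$ (and similar for $\mathcal{R}_h$), and substituting (2.88) yields respectively:
$$ \Gamma^{g_1}{}_{g_3,g_2} \in \mathbb{C} \tag{2.94} $$
and
$$ \Gamma^{ad(h)g_1}{}_{ad(h)g_3,\, ad(h)g_2} = \Gamma^{g_1}{}_{g_3,g_2}. \tag{2.95} $$
Therefore the same situation arises as in the case of Lie groups, for which parallel transport on the group manifold commutes with left and right action iff the connection components are ad(G)-conserved constant tensors. As for Lie groups, condition (2.95) is satisfied if one takes $\Gamma$ proportional to the structure constants. In our case, we can take any combination of the $C$ or $\mathbf{C}$ structure constants, since both are ad(G) conserved constant tensors. As we see below, the $C$ constants can be used to define a torsionless connection, while the $\mathbf{C}$ constants define a parallelizing connection.

As usual, the curvature arises from $\nabla^2$:
$$ \nabla^2 \theta^g = -R^g{}_{g'} \otimes \theta^{g'}, \tag{2.96} $$
$$ R^{g_1}{}_{g_2} \equiv d\omega^{g_1}{}_{g_2} + \omega^{g_1}{}_{g_3} \wedge \omega^{g_3}{}_{g_2}. \tag{2.97} $$
The torsion $R^g$ is defined by:
$$ R^{g_1} \equiv d\theta^{g_1} + \omega^{g_1}{}_{g_2} \wedge \theta^{g_2}. \tag{2.98} $$
Using the expression of $\omega$ in terms of $\Gamma$ and the Cartan–Maurer equations yields
$$ \begin{aligned} R^{g_1}{}_{g_2} &= \big( -\Gamma^{g_1}{}_{h,g_2}\, C^h{}_{g_3,g_4} + \Gamma^{g_1}{}_{g_3,h}\, \Gamma^h{}_{g_4,g_2} \big)\, \theta^{g_3} \wedge \theta^{g_4} \\ &= \big( -\Gamma^{g_1}{}_{h,g_2}\, \mathbf{C}^h{}_{g_3,g_4} + \Gamma^{g_1}{}_{g_3,h}\, \Gamma^h{}_{g_4,g_2} - \Gamma^{g_1}{}_{g_4,h}\, \Gamma^h{}_{g_4 g_3 g_4^{-1},\, g_2} \big)\, \theta^{g_3} \otimes \theta^{g_4}, \end{aligned} \tag{2.99} $$
$$ \begin{aligned} R^{g_1} &= \big( -C^{g_1}{}_{g_2,g_3} + \Gamma^{g_1}{}_{g_2,g_3} \big)\, \theta^{g_2} \wedge \theta^{g_3} \\ &= \big( -\mathbf{C}^{g_1}{}_{g_2,g_3} + \Gamma^{g_1}{}_{g_2,g_3} - \Gamma^{g_1}{}_{g_3,\, g_3 g_2 g_3^{-1}} \big)\, \theta^{g_2} \otimes \theta^{g_3}. \end{aligned} \tag{2.100} $$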
Thus a connection satisfying:
$$ \Gamma^{g_1}{}_{g_2,g_3} - \Gamma^{g_1}{}_{g_3,\, g_3 g_2 g_3^{-1}} = \mathbf{C}^{g_1}{}_{g_2,g_3} \tag{2.101} $$
corresponds to a vanishing torsion $R^g = 0$ and could be referred to as a “Riemannian” connection. On the other hand, the choice:
$$ \Gamma^{g_1}{}_{g_2,g_3} = \mathbf{C}^{g_1}{}_{g_3,\, g_2^{-1}} \tag{2.102} $$
corresponds to a vanishing curvature $R^g{}_{g'} = 0$, as can be checked by using the fusion equations (2.86) and property (2.84). Then (2.102) can be called the parallelizing connection: finite groups are parallelizable.
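The index gymnastics of (2.83)–(2.86) and of the two special connections is easy to get wrong, so a brute-force check is useful. The sketch below is written under explicit assumptions: $G = S_3$, the universal calculus (index set $G' = G \setminus \{e\}$), `C` for the constants (2.78) and `Cbold` for the deformed constants given by the last expression in (2.83). It verifies (2.83), (2.84), (2.86), and that the choice (2.102) makes the coefficient of $\theta^{g_3} \wedge \theta^{g_4}$ in the first line of (2.99) vanish. Since (2.101) with $\Gamma = C$ is literally the middle equality of (2.83), the first check also covers the torsionless case.

```python
from itertools import permutations, product

G = list(permutations(range(3)))
E = (0, 1, 2)
GP = [g for g in G if g != E]  # assumed index set G' (universal calculus)


def mul(a, b):
    return tuple(a[b[i]] for i in range(3))


def inv(a):
    r = [0, 0, 0]
    for i, ai in enumerate(a):
        r[ai] = i
    return tuple(r)


def ad(h, g):
    return mul(mul(h, g), inv(h))


def delta(a, b):
    return 1 if a == b else 0


def C(up, g1, g2):
    """C^{up}_{g1,g2} of Eq. (2.78)."""
    return delta(up, mul(g2, g1)) - delta(up, g1) - delta(up, g2)


def Cbold(up, g1, g2):
    """Deformed constants: last expression of Eq. (2.83)."""
    return delta(ad(inv(g2), up), g1) - delta(up, g1)


def check_deformed_constants():
    for g, g1, g2 in product(GP, repeat=3):
        # Eq. (2.83); also Eq. (2.101) for the choice Gamma = C.
        assert Cbold(g, g1, g2) == C(g, g1, g2) - C(g, g2, ad(g2, g1))
        # Eq. (2.84)
        assert Cbold(g, g1, g2) == Cbold(g1, g, inv(g2))
    return True


def check_fusion():
    """Eq. (2.86)."""
    for h1, h2, g, gp in product(GP, repeat=4):
        lhs = sum(Cbold(k, h1, g) * Cbold(h2, k, gp) for k in GP)
        rhs = sum(C(h, g, gp) * Cbold(h2, h1, h) for h in GP)
        assert lhs == rhs
    return True


def gamma_flat(up, g2, g3):
    """Gamma^{up}_{g2,g3} = Cbold^{up}_{g3, g2^-1}, Eq. (2.102)."""
    return Cbold(up, g3, inv(g2))


def check_flat():
    """Coefficient of theta^{g3} ^ theta^{g4} in Eq. (2.99), first line."""
    for g1, g2, g3, g4 in product(GP, repeat=4):
        coeff = (-sum(gamma_flat(g1, h, g2) * C(h, g3, g4) for h in GP)
                 + sum(gamma_flat(g1, g3, h) * gamma_flat(h, g4, g2) for h in GP))
        assert coeff == 0
    return True


print(check_deformed_constants(), check_fusion(), check_flat())
```

For a bicovariant calculus on a proper union of conjugacy classes the index sums are restricted further, and these checks would need to be revisited; the sketch makes no claim about that case.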
2.9. Tensor transformations and covariant derivative. Under the familiar transformation of the connection 1-form:
$$ (\omega^i{}_j)' = a^i{}_k\, \omega^k{}_l\, (a^{-1})^l{}_j + a^i{}_k\, d(a^{-1})^k{}_j \tag{2.103} $$
the curvature 2-form transforms homogeneously:
$$ (R^i{}_j)' = a^i{}_k\, R^k{}_l\, (a^{-1})^l{}_j. \tag{2.104} $$
The transformation rule (2.103) can be seen as induced by the change of basis $\theta'^i = a^i{}_j \theta^j$, with $a^i{}_j$ invertible x-dependent matrix (use Eq. (2.90) with $a\rho = a^i{}_j \theta^j$).

The covariant derivative $D$ of a function $\phi_i$ transforming as $\phi'_i = \phi_j (a^{-1})^j{}_i$ (i = contravariant index) is defined as follows:
$$ D\phi_i \equiv d\phi_i - \phi_j\, \omega^j{}_i \tag{2.105} $$
(or equivalently by $\nabla \phi \equiv D\phi_i \otimes \theta^i$, with $\phi = \phi_i \theta^i$), and indeed transforms homogeneously: $(D\phi_i)' = (D\phi_j)(a^{-1})^j{}_i$. Similarly on a function $\varphi^i$ with a covariant index transforming as $(\varphi^i)' = a^i{}_j \varphi^j$ the covariant derivative is:
$$ D\varphi^i \equiv d\varphi^i + \omega^i{}_j\, \varphi^j \tag{2.106} $$
and transforms as $(D\varphi^i)' = a^i{}_j (D\varphi^j)$. Then $D$ on the scalar $\phi_i \varphi^i$ reduces to $d$, if one defines $D$ to satisfy the Leibniz rule: $D(\phi_i \varphi^i) = (D\phi_i)\varphi^i + \phi_i D(\varphi^i)$.

The generalization of $D$ on tensors $T$ with an arbitrary number of covariant and contravariant indices is not straightforward, see for example refs. [5, 4] for a discussion. Although a consistent definition of parallel transport for tensors is clearly important, we will not need it in the following.
2.10. Metric. The metric tensor $\gamma$ can be defined as an element of $\Gamma \otimes \Gamma$:
$$ \gamma = \gamma_{i,j}\, \theta^i \otimes \theta^j. \tag{2.107} $$
Requiring it to be invariant under left and right action of G means:
$$ \mathcal{L}_h (\gamma) = \gamma = \mathcal{R}_h (\gamma) \tag{2.108} $$
or equivalently, by recalling $\mathcal{L}_h (\theta^i \otimes \theta^j) = \theta^i \otimes \theta^j$, $\mathcal{R}_h (\theta^i \otimes \theta^j) = \theta^{ad(h)i} \otimes \theta^{ad(h)j}$:
$$ \gamma_{i,j} \in \mathbb{C}, \qquad \gamma_{ad(h)i,\, ad(h)j} = \gamma_{i,j}. \tag{2.109} $$
These properties are analogous to the ones satisfied by the Killing metric of Lie groups, which is indeed constant and invariant under the adjoint action of the Lie group. On finite G there are various choices of biinvariant metrics. One can simply take $\gamma_{i,j} = \delta_{i,j}$, or $\gamma_{i,j} = C^k{}_{l,i}\, C^l{}_{k,j}$, or the “distance” matrix defined in Sect. 3. Note that we are not insisting here on a covariantly conserved metric (i.e. a metric compatible connection, see ref. [5]). For any biinvariant metric $\gamma_{i,j}$ there are tensor transformations $a^i{}_j$ under which $\gamma_{i,j}$ is invariant, i.e.:
$$ a^h{}_{h'}\, \gamma_{h,k}\, a^k{}_{k'} = \gamma_{h',k'} \quad \Leftrightarrow \quad a^h{}_{h'}\, \gamma_{h,k} = \gamma_{h',k'}\, (a^{-1})^{k'}{}_k. \tag{2.110} $$
These transformations are simply given by the matrices that rotate the indices according to the adjoint action of G:
$$ a^h{}_{h'}(g) = \delta^{ad(\alpha(g))h}_{h'}, \tag{2.111} $$
where $\alpha(g) : G \to G$ is an arbitrary mapping. Then these matrices are functions of G via this mapping, and their action leaves $\gamma$ invariant because of its biinvariance (2.109). Indeed substituting these matrices in (2.110) yields:
$$ a^h{}_{h'}(g)\, \gamma_{h,k}\, a^k{}_{k'}(g) = \gamma_{ad([\alpha(g)]^{-1})h',\, ad([\alpha(g)]^{-1})k'} = \gamma_{h',k'}, \tag{2.112} $$
proving the invariance of $\gamma$. Consider now a contravariant vector $\varphi^i$ transforming as $(\varphi^i)' = a^i{}_j (\varphi^j)$. Then using (2.110) one can easily see that
$$ (\varphi^k \gamma_{k,i})' = \varphi^{k'} \gamma_{k',i'}\, (a^{-1})^{i'}{}_i, \tag{2.113} $$
i.e. the vector $\varphi_i \equiv \varphi^k \gamma_{k,i}$ indeed transforms as a covariant vector.
2.11. Lie derivative and diffeomorphisms. The notion of diffeomorphisms, or general coordinate transformations, is fundamental in gravity theories. Is there such a notion in the setting of differential calculi on Hopf algebras? The answer is affirmative, and has been discussed in detail in refs. [8–10]. As for differentiable manifolds, it relies on the existence of the Lie derivative. Let us review the situation for Lie group manifolds. The Lie derivative lti along a left-invariant tangent vector ti is related to the infinitesimal right translations generated by ti : 1 [Rexp[εti ] ρ − ρ], ε ρ being an arbitrary tensor field. Introducing the coordinate dependence lti ρ = lim
ε→0
(2.114)
1 [ρ(y + εti ) − ρ(y)] (2.115) ε→0 ε identifies the Lie derivative lti as a directional derivative along ti . Note the difference in meaning of the symbol ti in the r.h.s. of these two equations: a group generator in the first, and the corresponding tangent vector in the second. To find the natural generalization of the Lie derivative in the case of finite groups, we express formula (2.114) in a completely algebraic notation: lti ρ(y) = lim
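$$l_{t_i}\rho = (\mathrm{id}\otimes t_i)\,\Delta_R(\rho). \qquad (2.116)$$
This expression is well defined for any Hopf algebra. In particular for finite groups (2.116) takes the form:
$$l_{t_g}\rho = [R_{g^{-1}}\rho - \rho] \qquad (2.117)$$
so that the Lie derivative is simply given by
$$l_{t_g} = R_{g^{-1}} - \mathrm{id} = t_g \qquad (2.118)$$
cf. the definition of $t_g$ in (2.76). For example
$$l_{t_g}(\theta^{g_1}\otimes\theta^{g_2}) = \theta^{\mathrm{ad}(g^{-1})g_1}\otimes\theta^{\mathrm{ad}(g^{-1})g_2} - \theta^{g_1}\otimes\theta^{g_2}. \qquad (2.119)$$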
As in the case of differentiable manifolds, the Cartan formula for the Lie derivative acting on p-forms holds:
$$l_{t_g} = i_{t_g}\,d + d\,i_{t_g} \qquad (2.120)$$
(see Appendix A). Exploiting this formula, diffeomorphisms (Lie derivatives) along generic tangent vectors V can also be consistently defined via the operator:
$$l_V = i_V\,d + d\,i_V. \qquad (2.121)$$
This requires a suitable definition of the contraction operator $i_V$ along generic tangent vectors V, discussed in Appendix A. We have then a way of defining “diffeomorphisms” along arbitrary (and x-dependent) tangent vectors for any tensor ρ:
$$\delta\rho = l_V\rho \qquad (2.122)$$
and of testing the invariance of candidate lagrangians under the generalized Lie derivative.
2.12. Haar measure and integration. Since we want to be able to define actions (integrals on p-forms) we must now define integration of p-forms on finite groups. Let us start with integration of functions f. We define the integral map h as a linear functional h : Fun(G) → C satisfying the left and right invariance conditions:
$$h(L_g f) = h(f) = h(R_g f). \qquad (2.123)$$
Then this map is uniquely determined (up to a normalization constant), and is simply given by the “sum over G” rule:
$$h(f) = \sum_{g\in G} f(g). \qquad (2.124)$$
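The invariance of the “sum over G” rule can be checked numerically on a small group. The following is a minimal sketch for G = S3, assuming the conventions $(L_g f)(x) = f(gx)$ and $(R_g f)(x) = f(xg)$ and a left-to-right product; the helper names (`mul`, `left_translate`, `right_translate`) are illustrative and do not come from the text.

```python
from itertools import permutations

# Elements of S3 as tuples p, where p[i] is the image of i (0-indexed).
G = list(permutations(range(3)))

def mul(x, y):
    """Product xy read left to right (apply x first, then y); an assumed convention."""
    return tuple(y[x[i]] for i in range(3))

def h(f):
    """The 'sum over G' rule; f is a dict from group elements to numbers."""
    return sum(f[g] for g in G)

def left_translate(g, f):
    """Assumed convention: (L_g f)(x) = f(gx)."""
    return {x: f[mul(g, x)] for x in G}

def right_translate(g, f):
    """Assumed convention: (R_g f)(x) = f(xg)."""
    return {x: f[mul(x, g)] for x in G}

# An arbitrary test function on S3 (values 1, 4, 9, 16, 25, 36).
f = {g: (i + 1) ** 2 for i, g in enumerate(G)}

# Translations only permute the summands, so the sum is unchanged.
for g in G:
    assert h(left_translate(g, f)) == h(f) == h(right_translate(g, f))
```

The check does not depend on the chosen convention for the translations: any left or right multiplication is a bijection of G, so it merely reorders the terms of the sum.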
Next we turn to define the integral of a p-form. Within the differential calculus we have a basis of left-invariant 1-forms, which may allow the definition of a biinvariant volume element. In general for a differential calculus with n independent tangent vectors, there is an integer p ≥ n such that the linear space of p-forms is 1-dimensional, and (p + 1)-forms vanish identically. We will see explicit examples in the next section. This means that every product of p basis one-forms $\theta^{g_1}\wedge\theta^{g_2}\wedge\ldots\wedge\theta^{g_p}$ is proportional to one of these products, that can be chosen to define the volume form vol:
$$\theta^{g_1}\wedge\theta^{g_2}\wedge\ldots\wedge\theta^{g_p} = \varepsilon^{g_1,g_2,\ldots g_p}\,\mathrm{vol}, \qquad (2.125)$$
where $\varepsilon^{g_1,g_2,\ldots g_p}$ is the proportionality constant. Note that the volume p-form is obviously left invariant. We can prove that it is also right invariant with the following argument. Suppose that vol is given by $\theta^{h_1}\wedge\theta^{h_2}\wedge\ldots\wedge\theta^{h_p}$, where $h_1, h_2, \ldots h_p$ are given group element labels. Then the right action on vol yields:
$$R_g[\theta^{h_1}\wedge\ldots\wedge\theta^{h_p}] = \theta^{\mathrm{ad}(g)h_1}\wedge\ldots\wedge\theta^{\mathrm{ad}(g)h_p} = \varepsilon^{\mathrm{ad}(g)h_1,\ldots\mathrm{ad}(g)h_p}\,\mathrm{vol}. \qquad (2.126)$$
Recall now that the “epsilon tensor” ε is necessarily made out of products of the Λ tensor of Eq. (2.58), defining the wedge product. This tensor is invariant under the adjoint action ad(g), and so is the ε tensor. Therefore $\varepsilon^{\mathrm{ad}(g)h_1,\ldots\mathrm{ad}(g)h_p} = \varepsilon^{h_1,\ldots h_p} = 1$ and $R_g\,\mathrm{vol} = \mathrm{vol}$. This will be verified in the examples of the next section. Having identified the volume p-form it is natural to set
$$\int f\,\mathrm{vol} \equiv h(f) = \sum_{g\in G} f(g) \qquad (2.127)$$
and define the integral on a p-form ρ as:
$$\int\rho = \int\rho_{g_1,\ldots g_p}\,\theta^{g_1}\wedge\ldots\wedge\theta^{g_p} = \int\rho_{g_1,\ldots g_p}\,\varepsilon^{g_1,\ldots g_p}\,\mathrm{vol} \equiv \sum_{g\in G}\rho_{g_1,\ldots g_p}(g)\,\varepsilon^{g_1,\ldots g_p}. \qquad (2.128)$$
Due to the biinvariance of the volume form, the integral map $\int : \wedge^p \to \mathbf{C}$ satisfies the biinvariance conditions:
$$\int L_g\rho = \int\rho = \int R_g\rho. \qquad (2.129)$$
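Moreover, under the assumption that $d(\theta^{g_2}\wedge\ldots\wedge\theta^{g_p}) = 0$, i.e. that any exterior product of p − 1 left-invariant one-forms θ is closed, the important property holds:
$$\int df = 0 \qquad (2.130)$$
with f any (p − 1)-form: $f = f_{g_2,\ldots g_p}\,\theta^{g_2}\wedge\ldots\wedge\theta^{g_p}$. This property, which allows integration by parts, has a simple proof. Rewrite $\int df$ as:
$$\int df = \int (d f_{g_2,\ldots g_p})\,\theta^{g_2}\wedge\ldots\wedge\theta^{g_p} + \int f_{g_2,\ldots g_p}\,d(\theta^{g_2}\wedge\ldots\wedge\theta^{g_p}). \qquad (2.131)$$
The second term in the r.h.s. vanishes by assumption. Using now (2.73) and (2.127):
$$\int df = \int (t_{g_1} f_{g_2,\ldots g_p})\,\theta^{g_1}\wedge\theta^{g_2}\wedge\ldots\wedge\theta^{g_p} = \int [R_{g_1^{-1}} f_{g_2,\ldots g_p} - f_{g_2,\ldots g_p}]\,\varepsilon^{g_1,\ldots g_p}\,\mathrm{vol}$$
$$= \sum_{g\in G}\varepsilon^{g_1,\ldots g_p}\,[R_{g_1^{-1}} f_{g_2,\ldots g_p}(g) - f_{g_2,\ldots g_p}(g)] = 0. \qquad (2.132)$$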
Note that, when the volume form belongs to a nontrivial cohomology class (that is d(vol) = 0 but vol ≠ dρ) the term $d(\theta^{g_2}\wedge\ldots\wedge\theta^{g_p})$ must vanish: otherwise, being a p-form, it should be proportional to vol, and this would contradict vol ≠ dρ.

3. Bicovariant Calculus on S3

In this section we illustrate the general theory on the particular example of the permutation group S3. Elements: a = (12), b = (23), c = (13), ab = (132), ba = (123), e. Multiplication table (the entry in row x and column y is the product xy):

         e    a    b    c    ab   ba
    e    e    a    b    c    ab   ba
    a    a    e    ab   ba   b    c
    b    b    ba   e    ab   c    a
    c    c    ab   ba   e    a    b
    ab   ab   c    a    b    ba   e
    ba   ba   b    c    a    e    ab

Nontrivial conjugation classes: I = [a, b, c], II = [ab, ba]. There are 3 bicovariant calculi BCI , BCII , BCI+II corresponding to the possible unions of the conjugation classes [5]. They have respectively dimension 3, 2 and 5. We examine here the BCI and BCII calculi.
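The multiplication table and the conjugation classes of S3 can be recomputed directly from the permutations. The sketch below assumes that products are read left to right (apply the first factor first), which is the convention under which a = (12), b = (23) give ab = (132) and ba = (123); the representation of permutations as tuples and all function names are illustrative choices.

```python
# Permutations of {1,2,3} stored 0-indexed: p[i] is the image of i.
E, A, B, C = (0, 1, 2), (1, 0, 2), (0, 2, 1), (2, 1, 0)   # e, (12), (23), (13)

def mul(x, y):
    """Product xy read left to right (apply x first, then y); an assumed convention."""
    return tuple(y[x[i]] for i in range(3))

def inv(x):
    """Inverse permutation."""
    out = [0, 0, 0]
    for i, xi in enumerate(x):
        out[xi] = i
    return tuple(out)

AB, BA = mul(A, B), mul(B, A)
names = {E: "e", A: "a", B: "b", C: "c", AB: "ab", BA: "ba"}
order = [E, A, B, C, AB, BA]

# Row x, column y holds the product xy.
table = [[names[mul(x, y)] for y in order] for x in order]
for x, row in zip(order, table):
    print(f"{names[x]:>3} | " + " ".join(f"{s:>2}" for s in row))

def conjugacy_class(h):
    """All elements g h g^-1 for g in S3."""
    return frozenset(mul(mul(g, h), inv(g)) for g in order)

classes = {conjugacy_class(h) for h in order}
```

Besides the trivial class of e, the set `classes` contains exactly the two classes I = [a, b, c] and II = [ab, ba] used to build the calculi.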
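3.1. BCI differential calculus. Basis of the 3-dimensional vector space of one-forms:
$$\theta^{a},\quad \theta^{b},\quad \theta^{c}. \qquad (3.1)$$
Basis of the 4-dimensional vector space of two-forms:
$$\theta^{a}\wedge\theta^{b},\quad \theta^{b}\wedge\theta^{c},\quad \theta^{a}\wedge\theta^{c},\quad \theta^{c}\wedge\theta^{b}. \qquad (3.2)$$
Every wedge product of two θ can be expressed as a linear combination of the basis elements:
$$\theta^{b}\wedge\theta^{a} = -\theta^{a}\wedge\theta^{c} - \theta^{c}\wedge\theta^{b},\qquad \theta^{c}\wedge\theta^{a} = -\theta^{a}\wedge\theta^{b} - \theta^{b}\wedge\theta^{c}. \qquad (3.3)$$
Basis of the 3-dimensional vector space of three-forms:
$$\theta^{a}\wedge\theta^{b}\wedge\theta^{c},\quad \theta^{a}\wedge\theta^{c}\wedge\theta^{b},\quad \theta^{b}\wedge\theta^{a}\wedge\theta^{c}, \qquad (3.4)$$
and we have:
$$\theta^{c}\wedge\theta^{b}\wedge\theta^{a} = -\theta^{c}\wedge\theta^{a}\wedge\theta^{c} = -\theta^{a}\wedge\theta^{c}\wedge\theta^{a} = \theta^{a}\wedge\theta^{b}\wedge\theta^{c},$$
$$\theta^{b}\wedge\theta^{c}\wedge\theta^{a} = -\theta^{b}\wedge\theta^{a}\wedge\theta^{b} = -\theta^{a}\wedge\theta^{b}\wedge\theta^{a} = \theta^{a}\wedge\theta^{c}\wedge\theta^{b}, \qquad (3.5)$$
$$\theta^{c}\wedge\theta^{a}\wedge\theta^{b} = -\theta^{c}\wedge\theta^{b}\wedge\theta^{c} = -\theta^{b}\wedge\theta^{c}\wedge\theta^{b} = \theta^{b}\wedge\theta^{a}\wedge\theta^{c}.$$
Basis of the 1-dimensional vector space of four-forms:
$$\mathrm{vol} = \theta^{a}\wedge\theta^{b}\wedge\theta^{a}\wedge\theta^{c}, \qquad (3.6)$$
and we have:
$$\theta^{g_1}\wedge\theta^{g_2}\wedge\theta^{g_3}\wedge\theta^{g_4} = \varepsilon^{g_1,g_2,g_3,g_4}\,\mathrm{vol}, \qquad (3.7)$$
where the nonvanishing components of the ε tensor are:
$$\varepsilon_{abac} = \varepsilon_{acab} = \varepsilon_{cbca} = \varepsilon_{cacb} = \varepsilon_{babc} = \varepsilon_{bcba} = 1, \qquad (3.8)$$
$$\varepsilon_{baca} = \varepsilon_{caba} = \varepsilon_{abcb} = \varepsilon_{cbab} = \varepsilon_{acbc} = \varepsilon_{bcac} = -1. \qquad (3.9)$$
Note the interesting property
$$f\,\mathrm{vol} = \mathrm{vol}\,f, \qquad \forall f\in \mathrm{Fun}(G) \qquad (3.10)$$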
due to $R_a R_b R_a R_c = R_{abac} = R_e = \mathrm{id}$. Cartan–Maurer equations:
$$d\theta^{a} + \theta^{b}\wedge\theta^{c} + \theta^{c}\wedge\theta^{b} = 0,$$
$$d\theta^{b} + \theta^{a}\wedge\theta^{c} + \theta^{c}\wedge\theta^{a} = 0, \qquad (3.11)$$
$$d\theta^{c} + \theta^{a}\wedge\theta^{b} + \theta^{b}\wedge\theta^{a} = 0.$$
The exterior derivative on any three-form of the type θ ∧ θ ∧ θ vanishes, as one can easily check by using the Cartan–Maurer equations and the equalities between exterior products given above. Then, as shown in the previous section, integration of a total differential vanishes on the “group manifold” of S3 corresponding to the BCI bicovariant calculus. This “group manifold” has three independent directions, associated to the cotangent basis $\theta^{a}, \theta^{b}, \theta^{c}$. Note however that the volume element is of order four in the left-invariant one-forms θ.
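The index strings of the nonvanishing ε components in (3.8) and (3.9) share a simple group-theoretic pattern, which the sketch below tests. It is only a consistency check on the labels and not a derivation of the ε tensor: the selection rule used here (no two adjacent labels equal, and the four labels multiply to e, as abac does) is a hypothesis tried against the listed components, and the left-to-right product and helper names are assumptions.

```python
from itertools import product

E, A, B, C = (0, 1, 2), (1, 0, 2), (0, 2, 1), (2, 1, 0)   # e, a=(12), b=(23), c=(13)
letters = {"a": A, "b": B, "c": C}

def mul(x, y):
    """Product xy read left to right (apply x first, then y); an assumed convention."""
    return tuple(y[x[i]] for i in range(3))

def word_product(word):
    """Multiply the group elements labelled by the characters of word."""
    out = E
    for ch in word:
        out = mul(out, letters[ch])
    return out

# Index strings of the nonvanishing components listed in (3.8) and (3.9).
plus = {"abac", "acab", "cbca", "cacb", "babc", "bcba"}
minus = {"baca", "caba", "abcb", "cbab", "acbc", "bcac"}

# Hypothetical selection rule: no two adjacent labels equal, and product equal to e.
candidates = {
    "".join(w)
    for w in product("abc", repeat=4)
    if all(w[i] != w[i + 1] for i in range(3)) and word_product(w) == E
}

assert candidates == plus | minus
```

The rule selects exactly the twelve listed index strings. It says nothing about the signs, which are fixed by the relations (3.3) and (3.5).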
3.2. BCII differential calculus. Basis of the 2-dimensional vector space of one-forms:
$$\theta^{ab},\quad \theta^{ba}. \qquad (3.12)$$
Basis of the 1-dimensional vector space of two-forms:
$$\mathrm{vol} = \theta^{ab}\wedge\theta^{ba} = -\theta^{ba}\wedge\theta^{ab} \qquad (3.13)$$
so that:
$$\theta^{g_1}\wedge\theta^{g_2} = \varepsilon^{g_1,g_2}\,\mathrm{vol}, \qquad (3.14)$$
where the ε tensor is the usual 2-dimensional Levi–Civita tensor. Again f vol = vol f since abba = e. Cartan–Maurer equations:
$$d\theta^{ab} = 0,\qquad d\theta^{ba} = 0. \qquad (3.15)$$
Thus the exterior derivative on any one-form $\theta^{g}$ vanishes and integration of a total differential vanishes on the group manifold of S3 corresponding to the BCII bicovariant calculus. This group manifold has two independent directions, associated to the cotangent basis $\theta^{ab}, \theta^{ba}$.

3.3. Visualization of the S3 group “manifold”. We can draw a picture of the group manifold of S3. It is made out of 6 points, corresponding to the group elements and identified with the functions $x^{e}, x^{a}, x^{b}, x^{c}, x^{ab}, x^{ba}$.

BCI – calculus: From each of the six points $x^{g}$ one can move in three directions, associated to the tangent vectors $t_a, t_b, t_c$, reaching three other points whose “coordinates” are
$$R_a x^{g} = x^{ga},\qquad R_b x^{g} = x^{gb},\qquad R_c x^{g} = x^{gc}. \qquad (3.16)$$
The 6 points and the “moves” along the 3 directions are illustrated in Fig. 1. The links are not oriented since the three group elements a, b, c coincide with their inverses.

BCII – calculus: From each of the six points $x^{g}$ one can move in two directions, associated to the tangent vectors $t_{ab}, t_{ba}$, reaching two other points whose “coordinates” are
$$R_{ab} x^{g} = x^{gba},\qquad R_{ba} x^{g} = x^{gab}. \qquad (3.17)$$
The 6 points and the “moves” along the 2 directions are illustrated in Fig. 1. The arrow convention on a link labeled (in italics) by a group element h is as follows: one moves in the direction of the arrow via the action of $R_h$ on $x^{g}$ (in this case h = ab). To move in the opposite direction just take the inverse of h. The pictures in Fig. 1 characterize the bicovariant calculi BCI and BCII on S3, and were drawn in Ref. [5] as examples of digraphs, used to characterize different calculi on sets. Here we emphasize their geometrical meaning as finite group “manifolds”.
[Figure: two graphs on the six points e, a, b, c, ab, ba, with links labeled by group elements. Left panel: S3 manifold (BCI). Right panel: S3 manifold (BCII).]
Fig. 1. S3 group manifold, and moves of the points under the group action
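The graphs of Fig. 1 can also be explored numerically. The sketch below builds the links of the BCI and BCII graphs from the moves g → gh, counts the minimum number of links between any two points by breadth-first search, and computes an exact determinant to test invertibility of the resulting matrix of distances. The left-to-right product, the function names, and the `unreachable=0` default (mirroring the arbitrary choice of zero distance between disconnected points) are assumptions of this illustration.

```python
from collections import deque
from fractions import Fraction

E, A, B, C = (0, 1, 2), (1, 0, 2), (0, 2, 1), (2, 1, 0)   # e, a=(12), b=(23), c=(13)

def mul(x, y):
    """Product xy read left to right (apply x first, then y); an assumed convention."""
    return tuple(y[x[i]] for i in range(3))

AB, BA = mul(A, B), mul(B, A)
order = [E, A, B, C, AB, BA]          # rows and columns ordered as e, a, b, c, ab, ba

def distance_matrix(generators, unreachable=0):
    """Breadth-first search on the graph with links g -- gh, h in generators."""
    rows = []
    for start in order:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            g = queue.popleft()
            for h in generators:
                nxt = mul(g, h)
                if nxt not in dist:
                    dist[nxt] = dist[g] + 1
                    queue.append(nxt)
        rows.append([dist.get(g, unreachable) for g in order])
    return rows

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n, sign, result = len(m), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            sign = -sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [u - factor * v for u, v in zip(m[r], m[col])]
    return sign * result

dist_I = distance_matrix([A, B, C])     # BCI: moves along a, b, c
dist_II = distance_matrix([AB, BA])     # BCII: moves along ab, ba
```

Under these assumptions `det(dist_I)` evaluates to 112 and `det(dist_II)` to 4, so both matrices are invertible; in the BCI graph every point is at distance 1 from three points and at distance 2 from the remaining two.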
3.4. Distance matrix. We define a distance matrix $(\mathrm{dist})_{i,j}$ between two points i, j of the finite group manifold as the minimum number of links that connects them. It is easy to verify that the graphs in Fig. 1 (and more generally any graph corresponding to a bicovariant calculus on a finite G) are ad(G) invariant, and therefore dist itself is ad(G) invariant and can be taken as a biinvariant metric. In the case of the connected manifold corresponding to BCI the distance matrix is invertible and given by:
$$\mathrm{dist} = \begin{pmatrix} 0 & 1 & 1 & 1 & 2 & 2 \\ 1 & 0 & 2 & 2 & 1 & 1 \\ 1 & 2 & 0 & 2 & 1 & 1 \\ 1 & 2 & 2 & 0 & 1 & 1 \\ 2 & 1 & 1 & 1 & 0 & 2 \\ 2 & 1 & 1 & 1 & 2 & 0 \end{pmatrix}, \qquad (3.18)$$
where rows and columns are ordered as e, a, b, c, ab, ba. For the disconnected BCII graph we must define also the distance between two disconnected points, and we arbitrarily set it to zero. The resulting distance matrix is also invertible.

4. Softening G and “Gravity” on S3

We have in mind to construct a dynamical theory of vielbein fields whose vacuum solution describes the G manifold. In particular let us take G = S3 in the BCI setting. Then the dynamical fields of the theory are collected in the 1-form vielbein $V^{g}$, which is not left-invariant any more since it is a deformation of $\theta^{g}$:
$$V^{a} = \sum_{g\in G} V^{a}{}_{g}(x)\,\theta^{g},\qquad V^{b} = \sum_{g\in G} V^{b}{}_{g}(x)\,\theta^{g},\qquad V^{c} = \sum_{g\in G} V^{c}{}_{g}(x)\,\theta^{g}. \qquad (4.1)$$
In addition we consider also the “spin connection” 1-form $\omega^{g_1}{}_{g_2}$ as an independent field (first order formulation). The field equations will determine the expression of ω in terms of the vielbein field.
We try then to mimic the Einstein–Cartan action of general relativity:
$$A = \int V^{g_3}{}_{h_3}\,V^{g_4}{}_{h_4}\,\varepsilon_{g_1,g,g_3,g_4}\,\gamma^{g,g_2}\,R^{g_1}{}_{g_2}\wedge\theta^{h_3}\wedge\theta^{h_4}, \qquad (4.2)$$
where $R^{g_1}{}_{g_2}$, the “soft” curvature, is given in terms of ω as in Eq. (2.97) and the indices of the ε tensor (defined in (3.7)) are lowered with the (bi-invariant) metric $\gamma_{g_1,g_2} = \delta_{g_1,g_2}$. The integrand is a 4-form (since we chose the BCI calculus on S3): thus the action is formally identical to the one of general relativity. The one-forms θ are the vielbeins of S3 discussed in the previous section: they are given one-form fields without dynamics.

Invariances of A. Consider the field transformations:
$$(V^{g}{}_{h})' = a^{g}{}_{g'}\,V^{g'}{}_{h}, \qquad (4.3)$$
$$(\omega^{g}{}_{h})' = a^{g}{}_{g'}\,\omega^{g'}{}_{h'}\,(a^{-1})^{h'}{}_{h} + a^{g}{}_{h'}\,d(a^{-1})^{h'}{}_{h}. \qquad (4.4)$$
Requiring A to be invariant under these transformations sets some conditions on the x-dependent “rotation” matrix a. Recalling that the curvature transforms as $(R^{g}{}_{h})' = a^{g}{}_{g'}\,R^{g'}{}_{h'}\,(a^{-1})^{h'}{}_{h}$, we find the transformed action A′:
$$A' = \int V^{k_3}{}_{h_3}\,V^{k_4}{}_{h_4}\,\varepsilon_{g_1,g,g_3,g_4}\,\gamma^{g,g_2}\,a^{g_3}{}_{k_3}\,a^{g_4}{}_{k_4}\,a^{g_1}{}_{k_1}\,R^{k_1}{}_{k_2}\,(a^{-1})^{k_2}{}_{g_2}\wedge\theta^{h_3}\wedge\theta^{h_4}. \qquad (4.5)$$
In the case of usual general relativity, the Lorentz metric and the Levi–Civita tensor are conserved under local Lorentz rotations, and this implies the invariance of the action under local Lorentz transformations. Here the ε tensor is the one given in (3.9); moreover the x-dependent a matrix elements do not commute with the two-form $R^{k_1}{}_{k_2}$ as in the usual case. We will show that if the entries of the a matrices are taken to be the functions of Eq. (2.111) satisfying an additional periodic condition, then the action is invariant under the transformations (4.4).

First, for the adjoint matrices of (2.111) we have $\gamma^{g,g_2}(a^{-1})^{k_2}{}_{g_2} = a^{g}{}_{g_2}\,\gamma^{g_2,k_2}$ because of (2.110). Suppose then that the a matrix entries satisfy a “two unequal links” periodic condition:
$$a^{i}{}_{j} = R_{gg'}\,a^{i}{}_{j},\qquad g \neq g'. \qquad (4.6)$$
Then we can bring the $a^{g}{}_{g_2}$ term to the left of the curvature two-form (the g ≠ g′ in (4.6) is due to $\theta^{g}\wedge\theta^{g} = 0$), and we see that the action A is invariant if:
$$\varepsilon_{g_1,g_2,g_3,g_4}\,a^{g_1}{}_{k_1}\,a^{g_2}{}_{k_2}\,a^{g_3}{}_{k_3}\,a^{g_4}{}_{k_4} = \varepsilon_{k_1,k_2,k_3,k_4}, \qquad (4.7)$$
i.e. if ε is a conserved tensor. But as we have already argued in Sect. 2.12 this is the case when the a matrix rotates the indices according to the adjoint action of G. Hence we have an action invariant under the local ad(G) transformations, in analogy with the local Lorentz rotations of general relativity.
This action is also invariant under the analogue of general coordinate transformations. Indeed diffeomorphisms along a generic tangent vector v are generated by the Lie derivative $l_v = d\,i_v + i_v\,d$. Then under diffeomorphisms the variation of A is given by
$$\delta A = \int l_v(\text{4-form}) = \int [\,d\,i_v(\text{4-form}) + i_v\,d(\text{4-form})\,] = 0, \qquad (4.8)$$
since $d(\text{4-form}) = 0$ and $\int d(\text{3-form}) = 0$.
Field equations. The field equations are obtained by varying the action with respect to the dynamical fields $V^{g}{}_{h}$, $\omega^{g}{}_{h}$. The $\delta\omega^{g}{}_{h}$ variation yields an equation relating $\omega^{g}{}_{h}$ to (first derivatives of) the vielbein and its inverse, as in the usual zero-torsion condition of ordinary Einstein–Cartan gravity. Varying with respect to $V^{g}{}_{h}$ leads to the analogues of Einstein equations:
$$V^{g_3}{}_{h_3}\,\gamma^{g_0,g_2}\,R^{g_1}{}_{g_2,h_1,h_2}\left[\varepsilon_{g_1,g_0,g,g_4}\,\varepsilon^{h_1,h_2,h,h_4} + \varepsilon_{g_1,g_0,g_3,g}\,\varepsilon^{h_1,h_2,h_3,h}\right] = 0, \qquad (4.9)$$
where the curvature components $R^{g_1}{}_{g_2,h_1,h_2}$ are defined by $R^{g_1}{}_{g_2} = R^{g_1}{}_{g_2,h_1,h_2}\,\theta^{h_1}\wedge\theta^{h_2}$.

Note 4.1. The analogue of a cosmological term
$$\int \varepsilon_{g_1,g_2,g_3,g_4}\,V^{g_1}{}_{h_1}\,V^{g_2}{}_{h_2}\,V^{g_3}{}_{h_3}\,V^{g_4}{}_{h_4}\,\theta^{h_1}\wedge\theta^{h_2}\wedge\theta^{h_3}\wedge\theta^{h_4} \qquad (4.10)$$
is invariant under the local tangent rotations (4.4) because of property (4.7), and under diffeomorphisms because of (4.8). Adding this term to the action (4.2) allows “vacuum” solutions with $R^{g_1}{}_{g_2,h_1,h_2} \neq 0$.

Note 4.2. The same action (4.2) can be used in the case of the finite group $Z_N \times Z_N \times Z_N \times Z_N$. Here the situation simplifies: for example the ε tensor becomes the usual Levi–Civita tensor, the basic one-forms anticommute, etc. One then obtains a discretized gravity of the type discussed in ref. [4], with some differences. The field equations are derived from a variational principle, the local symmetry involves functions arbitrary up to a (“two unequal links”) periodicity condition, no procedure is used to “localize” the components of the curvature tensor, and no use of the left symmetric tensor product $\otimes_L$ is made.

Note 4.3. The analogue of the topological action:
$$\int R^{g_1}{}_{g_2}\wedge R^{g_2}{}_{g_1} \qquad (4.11)$$
has all the invariances discussed above, even without the two-links periodic condition on the matrices a(x), thanks to [vol, f] = 0.

Note 4.4. On the closure of diffeomorphisms: The product of two generic vector fields $V = a^{i}t_i$, $W = b^{j}t_j$ (with a, b ∈ Fun(G)) is again a vector field:
$$(a^{i}t_i)(b^{j}t_j) = \left[C^{k}{}_{i,j}\,a^{i}(R_{i^{-1}}b^{j}) + a^{i}(t_i b^{k})\right] t_k \qquad (4.12)$$
as can be proved on functions, using the fusion rules (2.77) and the definitions of the Appendix. Likewise the commutator closes
$$[a^{i}t_i, b^{j}t_j] = \left[C^{k}{}_{i,j}\,a^{i}(R_{i^{-1}}b^{j}) + a^{i}(t_i b^{k}) - (a \leftrightarrow b)\right] t_k. \qquad (4.13)$$
However the product and the commutator of two Lie derivatives along generic vector fields do not close on p-forms with p ≥ 1, as can be checked on the basis $\theta^{i}$. Thus the “diffeomorphisms” that leave our action invariant do not close on a Lie algebra (with structure functions), but are freely generated by the Lie derivatives $l_V$. Only the Lie derivatives along left-invariant tangent vectors (and constant combinations thereof) close according to the fusion rules:
$$l_{t_g}\,l_{t_{g'}} = C^{h}{}_{g,g'}\,l_{t_h}. \qquad (4.14)$$
A similar formula holds for Lie derivatives along right-invariant tangent vectors. Acknowledgements. I have benefited from discussions with P. Aschieri and F. Mueller-Hoissen.
A. Lie Derivative Along Generic Tangent Fields

As discussed in ref. [10] a generic vector field V can be written in terms of the left-invariant tangent vectors $t_i$ of Eq. (2.75) as $V = a^{i}t_i$, $a^{i}\in \mathrm{Fun}(G)$ and defined to act on any b ∈ Fun(G) as:
$$V b = (a^{i}t_i)b \equiv a^{i}(t_i b). \qquad (A.1)$$
We denote by Ξ the space of vector fields V. The product between elements of Fun(G) and left-invariant tangent vectors $t_i$ generalizes to the whole Ξ: (aV)b ≡ a(Vb), and
$$(a + b)V = aV + bV;\qquad (ab)V = a(bV);\qquad (\lambda a)V = \lambda(aV),\quad \lambda\in\mathbf{C}. \qquad (A.2)$$
Ξ is the analogue of the space of derivations on Fun(G). Indeed:
$$V(a + b) = V(a) + V(b),\qquad V(\lambda a) = \lambda V(a)\qquad \text{Linearity}, \qquad (A.3)$$
$$V(ab) \equiv (c^{i}t_i)(ab) = t_i(a)(R_{i^{-1}}b)\,c^{i} + a\,V(b)\qquad \text{Leibniz rule}. \qquad (A.4)$$
Inner derivative. The $i_V$ contraction operator is defined by the following properties ($V \equiv c^{i}t_i$, a ∈ Fun(G), λ ∈ C):
$$i_V(\rho) = i_{t_i}(\rho)\,c^{i}, \qquad (A.5)$$
$$i_V(a) = 0, \qquad (A.6)$$
$$i_V(\theta^{i}) = c^{i}, \qquad (A.7)$$
$$i_V(\rho\wedge\rho') = i_{t_i}(\rho)\wedge(R_{i^{-1}}\rho')\,c^{i} + (-1)^{\deg(\rho)}\,\rho\wedge i_V(\rho'), \qquad (A.8)$$
$$i_V(a\rho + \rho') = a\,i_V(\rho) + i_V(\rho'), \qquad (A.9)$$
$$i_V(\rho a) = i_{t_i}(\rho)(R_{i^{-1}}a)\,c^{i}, \qquad (A.10)$$
$$i_{\lambda V} = \lambda\,i_V. \qquad (A.11)$$
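Lie derivative. Definition:
$$l_V \equiv i_V\,d + d\,i_V. \qquad (A.12)$$
Properties ($V \equiv c^{i}t_i$; a, b, $c^{i}$ ∈ Fun(G)):
$$l_V(a) = V(a), \qquad (A.13)$$
$$l_V\,d\rho = d\,l_V\rho, \qquad (A.14)$$
$$l_V(\lambda\rho + \rho') = \lambda\,l_V(\rho) + l_V(\rho'), \qquad (A.15)$$
$$l_{bV}(\rho) = (l_V\rho)\,b - (-1)^{\deg(\rho)}\,i_V(\rho)\wedge db, \qquad (A.16)$$
$$l_V(\rho\wedge\rho') = \rho\wedge l_V(\rho') + l_{t_i}(\rho)\wedge(R_{i^{-1}}\rho')\,c^{i} + (-1)^{\deg(\rho')}\,i_{t_i}(\rho)\wedge(R_{i^{-1}}\rho')\,dc^{i}, \qquad (A.17)$$
$$l_{t_i}(\theta^{j}) = C^{j}{}_{k,i}\,\theta^{k}. \qquad (A.18)$$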
The proof that the Lie derivative $l_{t_i}$ defined as $l_{t_i}\rho = (\mathrm{id}\otimes t_i)\Delta_R(\rho)$ in (2.116) is equal to $l_{t_i} = i_{t_i}d + d\,i_{t_i}$ is done by induction on the generic p-form $a_{i_1\ldots i_p}\,\theta^{i_1}\wedge\ldots\wedge\theta^{i_p}$, checking first that both definitions give the same result on $\theta^{j}$ (see refs. [8, 10]).

References

1. Connes, A.: Noncommutative geometry and reality. J. Math. Phys. 36, 6194 (1995); Connes, A. and Lott, J.: Particle models and noncommutative geometry. Nucl. Phys. Proc. Suppl. B 18, 29 (1991)
2. Dimakis, A. and Müller-Hoissen, F.: Bidifferential calculi and integrable models. math-ph/9908015
3. Dimakis, A., Müller-Hoissen, F. and Striker, T.: Non-commutative differential calculus and lattice gauge theory. J. Phys. A 26, 1927 (1993)
4. Dimakis, A. and Müller-Hoissen, F.: Discrete Riemannian Geometry. J. Math. Phys. 40, 1518 (1999), gr-qc/9808023
5. Bresser, K., Müller-Hoissen, F., Dimakis, A. and Sitarz, A.: Noncommutative geometry of finite groups. J. Phys. A 29, 2705 (1996), q-alg/9509004
6. Bonechi, F., Giachetti, R., Maciocco, R., Sorace, E. and Tarlini, M.: Cohomological Properties of Differential Calculi on Hopf Algebras. In: Proceedings of the Symposium on Quantum Groups of the International Colloquium GROUP21, Goslar 1996, q-alg/9612019
7. Woronowicz, S.L.: Differential calculus on compact matrix pseudogroups (Quantum groups). Commun. Math. Phys. 122, 125 (1989)
8. Aschieri, P. and Castellani, L.: An introduction to non-commutative differential geometry on quantum groups. Int. J. Mod. Phys. A 8, 1667 (1993)
9. Castellani, L.: Differential calculus on ISOq(N), quantum Poincaré algebra and q-gravity. Commun. Math. Phys. 171, 383 (1995), hep-th 9312179; The lagrangian of q-Poincaré gravity. Phys. Lett. B 327, 22 (1994), hep-th 9402033
10. Aschieri, P.: On the geometry of inhomogeneous quantum groups. Ph.D Thesis, Scuola Normale Superiore di Pisa (1998), math.QA/9805119

Communicated by A. Connes
Commun. Math. Phys. 218, 633 – 663 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Twisted Traces of Quantum Intertwiners and Quantum Dynamical R-Matrices Corresponding to Generalized Belavin–Drinfeld Triples P. Etingof 1,2 , O. Schiffmann3 1 Columbia University, Mathematics Department, 2990 Broadway, New York, NY 10027, USA 2 MIT, Mathematics Department, 77 Massachusetts Ave., Cambridge, MA 02139, USA.
E-mail:
[email protected]
3 Yale University, Mathematics Department, 10 Hillhouse Ave., New Haven, CT 06510, USA.
E-mail:
[email protected] Received: 20 March 2000 / Accepted: 11 December 2000
Abstract: We consider weighted traces of products of intertwining operators for quantum groups $U_q(\mathfrak{g})$, suitably twisted by a “generalized Belavin–Drinfeld triple”. We derive two commuting sets of difference equations: the (twisted) Macdonald–Ruijsenaars system and the (twisted) quantum Knizhnik–Zamolodchikov–Bernard (qKZB) system. These systems involve the nonstandard quantum R-matrices defined in a previous joint work with T. Schedler ([ESS]). When the generalized Belavin–Drinfeld triple comes from an automorphism of the Lie algebra g, we also derive two additional sets of difference equations, the dual Macdonald–Ruijsenaars system and the dual qKZB equations.

1. Introduction

This paper is a continuation of [ES1] and [EV2]. In [EV2], A. Varchenko and the first author considered weighted traces of products of intertwining operators for quantum groups $U_q(\mathfrak{g})$, where g is a simple Lie algebra. They showed that the generating function $F_{V_1,\ldots,V_N}(\lambda,\mu)$ of such traces (where λ, µ are complex weights for g) satisfies four commuting systems of difference equations – the Macdonald–Ruijsenaars (MR) system, the quantum Knizhnik–Zamolodchikov–Bernard (qKZB) system, the dual MR system, and the dual qKZB system. The first two systems are systems of difference equations with respect to λ, which involve Felder’s trigonometric dynamical R-matrix depending on λ. The second two systems are systems of difference equations with respect to µ, which are obtained from the first two by the transformation $\lambda \to \mu$, $V_i \to V^{*}_{N-i+1}$. Such a symmetry is explained by the fact that the function $F_{V_1,\ldots,V_N}(\lambda,\mu)$ is invariant under this transformation.

If the quantum group $U_q(\mathfrak{g})$ is replaced with the Lie algebra g, these results are replaced with their classical analogs ([EV2]). Namely, the MR and qKZB equations are replaced by the classical MR and KZB equations, which are differential equations involving Felder’s classical trigonometric dynamical r-matrix.
The dual MR and KZB equations retain roughly the same form, but involve the rational quantum dynamical
r-matrix rather than the trigonometric one. Thus, the symmetry between λ and µ is destroyed.

In [ES1], we generalized the classical MR and KZB equations to the case when the trace is twisted using a “generalized Belavin–Drinfeld triple”, i.e. a pair of subdiagrams $\Gamma_1, \Gamma_2$ of the Dynkin diagram of g together with an isomorphism $T : \Gamma_1 \to \Gamma_2$ between them. It turned out that such twisted traces also satisfy differential equations which involve a dynamical r-matrix, namely the one attached to the triple $(\Gamma_1, \Gamma_2, T)$ by the second author in [S].

After [ES1] was finished, we wanted to generalize its results to the quantum case. It was clear to us that to express the result we would need an explicit quantization of classical dynamical r-matrices from [S]. Therefore, we hoped that attempts to quantize the results of [ES1] using the approach of [EV2] could help us obtain such a quantization (which was unknown even for the usual Belavin–Drinfeld classical r-matrices). This program did, in fact, succeed, and the quantization of dynamical r-matrices from [S] was recently obtained in [ESS]. We note that Fronsdal also suggested a (different) solution to the quantization problem of classical Belavin–Drinfeld r-matrices in [Fro].

In this paper, using the results of [ESS] and methods of [EV2], we generalize the difference equations from [EV2] to the twisted case; this also provides a quantum generalization of [ES1]. Namely, we deduce difference equations with respect to the weight λ for the generating function of the twisted traces for $U_q(\mathfrak{g})$: the twisted MR and qKZB equations. Not surprisingly, they involve the dynamical R-matrix constructed in [ESS]. In the case when T is an automorphism of the whole Dynkin diagram of g, we also deduce the twisted dual MR and qKZB equations, i.e. the difference equations with respect to the other weight µ. These equations involve the usual (Felder’s) dynamical R-matrix, but differ from the equations of [EV2] by explicit occurrence of T. Thus, we see that for T ≠ 1, there is no symmetry between λ and µ. If T is not an automorphism, we do not expect the existence of the dual equations. This is explained at the end of Sect. 2.

Replacing $U_q(\mathfrak{g})$ with g, we obtain the classical limit of these results. The twisted MR and qKZB equations become their classical analogs from [ES1]. The dual equations retain their form, but the trigonometric R-matrices are replaced by their rational limits.

Finally, we adapt the construction of the quantum dynamical R-matrices from [ESS] to the case when g is an arbitrary symmetrizable Kac–Moody algebra. This yields quantizations of the classical dynamical r-matrices from [ES1] in the case of Kac–Moody algebras. Again, the generating functions for twisted traces of intertwiners for $U_q(\mathfrak{g})$ satisfy a set of difference equations involving these quantum dynamical R-matrices, and a set of dual difference equations if in addition T is an automorphism of the Dynkin diagram.

In the next paper, we plan to generalize these results to the case of affine algebras, when traces take values in finite-dimensional representations. This involves dynamical R-matrices with spectral parameters. In particular, we plan to obtain a trace representation of solutions of the elliptic qKZ equation (with Belavin’s elliptic R-matrix), and compute its monodromy.

Remark. The elliptic qKZ equation is important in statistical mechanics (see [JM]). For its classical version, the trace representation of solutions and monodromy are obtained in [E1]. The problem of quantizing the results of [E1] was suggested to the first author by his advisor I. Frenkel as a topic for his PhD thesis in 1992. After this the first author tried to quantize the results of [E1] (see [E2]) but obtained only partial results.
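1.1. Notations. Let g be a simple complex Lie algebra with a fixed polarization $\mathfrak{g} = \mathfrak{n}_- \oplus \mathfrak{h} \oplus \mathfrak{n}_+$. Let Γ (resp. Δ) be the Dynkin diagram (resp. the root system) of g. Denote by $(a_{ij})$ the Cartan matrix of g and let $d_i$ be relatively prime positive integers such that $(d_i a_{ij})$ is a symmetric matrix. Let ( , ) be the nondegenerate invariant symmetric form for which (α, α) = 2 if α is a short root. Let $\{e_\alpha, f_\alpha\}_{\alpha\in\Delta^+}$ be a Chevalley basis of $\mathfrak{n}_- \oplus \mathfrak{n}_+$, normalized in such a way that $(e_\alpha, f_\alpha) = 1$ for all α and set $h_\alpha = [e_\alpha, f_\alpha]$. We also let $\Omega \in \mathfrak{g}\otimes\mathfrak{g}$ and $\Omega_{\mathfrak{h}} \in \mathfrak{h}\otimes\mathfrak{h}$ be the inverse elements of the restriction of ( , ) to g and h respectively.

Let $q = e^{\frac12\hbar}$, where ħ is a formal variable. For any operator A we set $q^{A} = e^{\hbar A/2}$. Let $U_q(\mathfrak{g})$ be the Drinfeld–Jimbo quantized enveloping algebra of g. It is a C[[ħ]]-Hopf algebra with generators $E_\alpha$, $F_\alpha$, α ∈ Γ and $q^{h}$, h ∈ h subject to the following set of relations:
$$q^{x+y} = q^{x}q^{y},\quad x, y\in\mathfrak{h},$$
$$q^{h}E_{\alpha_j}q^{-h} = q^{\alpha_j(h)}E_{\alpha_j},\qquad q^{h}F_{\alpha_j}q^{-h} = q^{-\alpha_j(h)}F_{\alpha_j},$$
$$E_{\alpha_i}F_{\alpha_j} - F_{\alpha_j}E_{\alpha_i} = \delta_{ij}\,\frac{q^{d_i h_{\alpha_i}} - q^{-d_i h_{\alpha_i}}}{q^{d_i} - q^{-d_i}},$$
$$\sum_{k=0}^{1-a_{ij}} (-1)^{k}\begin{bmatrix} 1-a_{ij} \\ k \end{bmatrix}_{q^{d_i}} E_{\alpha_i}^{1-a_{ij}-k}\,E_{\alpha_j}\,E_{\alpha_i}^{k} = 0,\qquad i\neq j,$$
$$\sum_{k=0}^{1-a_{ij}} (-1)^{k}\begin{bmatrix} 1-a_{ij} \\ k \end{bmatrix}_{q^{d_i}} F_{\alpha_i}^{1-a_{ij}-k}\,F_{\alpha_j}\,F_{\alpha_i}^{k} = 0,\qquad i\neq j,$$
where as usual
$$\begin{bmatrix} n \\ k \end{bmatrix}_{q} = \frac{[n]_q!}{[k]_q!\,[n-k]_q!},\qquad [n]_q! = [1]_q\cdot[2]_q\cdots[n]_q,\qquad [n]_q = \frac{q^{n} - q^{-n}}{q - q^{-1}}.$$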
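The q-numbers entering the Serre relations are easy to evaluate for a numerical value of q. The sketch below is only an illustration: it treats q as a rational number (rather than the formal $e^{\hbar/2}$ of the text) so that the arithmetic is exact, and the function names, including `serre_coefficients`, are hypothetical helpers introduced here.

```python
from fractions import Fraction

def q_int(n, q):
    """[n]_q = (q^n - q^-n) / (q - q^-1), for q not equal to 0, 1 or -1."""
    q = Fraction(q)
    return (q**n - q**(-n)) / (q - 1 / q)

def q_factorial(n, q):
    """[n]_q! = [1]_q [2]_q ... [n]_q, with the empty product [0]_q! = 1."""
    out = Fraction(1)
    for k in range(1, n + 1):
        out *= q_int(k, q)
    return out

def q_binomial(n, k, q):
    """[n choose k]_q = [n]_q! / ([k]_q! [n-k]_q!)."""
    return q_factorial(n, q) / (q_factorial(k, q) * q_factorial(n - k, q))

def serre_coefficients(a_ij, d_i, q):
    """Signed coefficients (-1)^k [1-a_ij choose k]_{q^{d_i}} for k = 0..1-a_ij."""
    m = 1 - a_ij
    return [(-1) ** k * q_binomial(m, k, Fraction(q) ** d_i) for k in range(m + 1)]

# Example: a_ij = -1, d_i = 1 gives the coefficients 1, -[2]_q, 1.
coeffs = serre_coefficients(-1, 1, 2)
```

For $a_{ij} = -1$ and $d_i = 1$ the coefficients are $1, -(q + q^{-1}), 1$, so the Serre relation reads $E_{\alpha_i}^{2}E_{\alpha_j} - (q + q^{-1})E_{\alpha_i}E_{\alpha_j}E_{\alpha_i} + E_{\alpha_j}E_{\alpha_i}^{2} = 0$.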
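Comultiplication Δ, antipode S and counit ε in $U_q(\mathfrak{g})$ are given by
$$\Delta(E_{\alpha_i}) = E_{\alpha_i}\otimes q^{d_i h_{\alpha_i}} + 1\otimes E_{\alpha_i},\qquad \Delta(F_{\alpha_i}) = F_{\alpha_i}\otimes 1 + q^{-d_i h_{\alpha_i}}\otimes F_{\alpha_i},\qquad \Delta(q^{h}) = q^{h}\otimes q^{h},$$
$$S(E_{\alpha_i}) = -E_{\alpha_i}q^{-d_i h_{\alpha_i}},\qquad S(F_{\alpha_i}) = -q^{d_i h_{\alpha_i}}F_{\alpha_i},\qquad S(q^{h}) = q^{-h},$$
$$\varepsilon(E_{\alpha_i}) = \varepsilon(F_{\alpha_i}) = 0,\qquad \varepsilon(q^{h}) = 1.$$
Let $U_q(\mathfrak{n}_\pm)$ be the subalgebra generated by $(E_\alpha)_{\alpha\in\Gamma}$ and $(F_\alpha)_{\alpha\in\Gamma}$ respectively. It is known that $U_q(\mathfrak{g})$ is quasitriangular, with R-matrix $R \in q^{\Omega_{\mathfrak{h}}}\,U_q(\mathfrak{n}_+)\,\hat\otimes\,U_q(\mathfrak{n}_-)$. Here $\hat\otimes$ denotes the completion with respect to the principal grading of $U_q(\mathfrak{n}_\pm)$.

1.2. Generalized Belavin–Drinfeld triples and classical dynamical r-matrices. Let l ⊂ h be a subalgebra on which ( , ) is nondegenerate. Let $(x_i)_{i\in I}$ be an orthonormal basis of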
l and let $(x^{i})_{i\in I}$ be the dual basis of $\mathfrak{l}^*$. The classical dynamical Yang–Baxter equation with respect to l is the following equation:
$$\sum_i\left(x_i^{(1)}\frac{\partial r^{23}(\lambda)}{\partial x^{i}} - x_i^{(2)}\frac{\partial r^{13}(\lambda)}{\partial x^{i}} + x_i^{(3)}\frac{\partial r^{12}(\lambda)}{\partial x^{i}}\right) + [r^{12}(\lambda), r^{13}(\lambda)] + [r^{12}(\lambda), r^{23}(\lambda)] + [r^{13}(\lambda), r^{23}(\lambda)] = 0, \qquad (1.1)$$
where $r(\lambda) : \mathfrak{l}^* \to (\mathfrak{g}\otimes\mathfrak{g})^{\mathfrak{l}}$ is a meromorphic function. Solutions of (1.1) relevant to the theory of Poisson–Lie groupoids (see [EV1, ES2, Xu]) are those satisfying the generalized unitarity condition, i.e. $r(\lambda) + r^{21}(\lambda)$ is constant and belongs to $(S^{2}\mathfrak{g})^{\mathfrak{g}}$. In [S] the second author classified all such solutions r(λ) which are non-skewsymmetric (that is, $r(\lambda) + r^{21}(\lambda) \neq 0$). Up to isomorphism and gauge transformations, they are labeled by the following combinatorial data called generalized Belavin–Drinfeld triples.

Definition. A generalized Belavin–Drinfeld triple is a triple $(\Gamma_1, \Gamma_2, T)$ where $\Gamma_1, \Gamma_2 \subset \Gamma$ and $T : \Gamma_1 \xrightarrow{\sim} \Gamma_2$ is an orthogonal isomorphism.

Let $(\Gamma_1, \Gamma_2, T)$ be a generalized Belavin–Drinfeld triple. Set
$$\mathfrak{l} = \Big(\sum_{\alpha\in\Gamma_1}\mathbf{C}\,(\alpha - T(\alpha))\Big)^{\perp} \subset \mathfrak{h}.$$
Note that l is spanned by real elements so that the restriction of ( , ) to l is nondegenerate. Let $\mathfrak{h}_0 \subset \mathfrak{h}$ be the orthogonal complement to l in h and let $\Omega_{\mathfrak{h}_0} \in \mathfrak{h}_0\otimes\mathfrak{h}_0$ be the element inverse to the form ( , ). The following lemmas are proved in [ES1].

Lemma 1.1. There exists a unique Lie algebra homomorphism $B : \mathfrak{b}_- \to \mathfrak{b}_-$ (resp. $B^{-1} : \mathfrak{b}_+ \to \mathfrak{b}_+$) such that $B(f_\alpha) = f_{T(\alpha)}$, $B(h_\alpha) = h_{T(\alpha)}$ if $\alpha\in\Gamma_1$, $B(f_\alpha) = 0$ if $\alpha\notin\Gamma_1$ (resp. $B^{-1}(e_\alpha) = e_{T^{-1}(\alpha)}$, $B^{-1}(h_\alpha) = h_{T^{-1}(\alpha)}$ if $\alpha\in\Gamma_2$, $B^{-1}(e_\alpha) = 0$ if $\alpha\notin\Gamma_2$), and $B^{\pm1}(h) = h$ if h ∈ l. Moreover the restriction of B to h is an orthogonal operator.

Remark. We use the symbol $B^{-1}$ for notational convenience only. The operators B and $B^{-1}$ are only inverse to each other when restricted to h, when T is an automorphism of Γ.

Lemma 1.2 (Cayley transform). For any $x\in\mathfrak{h}_0$, there exists a unique element $C_T(x)\in\mathfrak{h}_0$ such that for all $\alpha\in\Gamma_1$ one has $(\alpha - T\alpha, C_T(x)) = (\alpha + T\alpha, x)$. The linear operator $C_T : \mathfrak{h}_0 \to \mathfrak{h}_0$ is skew-symmetric.

The classical dynamical r-matrix associated to $(\Gamma_1, \Gamma_2, T)$ is
$$r_T(\lambda) = -r_0^{21} + \sum_{\alpha,\,l\geq 1} e^{-l(\alpha,\lambda)}\,e_\alpha\wedge B^{l}f_\alpha + \frac12\,(C_T\otimes 1)\,\Omega_{\mathfrak{h}_0}, \qquad (1.2)$$
where $r_0 = \frac12\Omega_{\mathfrak{h}} + \sum_\alpha e_\alpha\otimes f_\alpha$ is the standard classical r-matrix.
1.3. Quantum dynamical R-matrices. In our joint work with Travis Schedler [ESS] we obtain an explicit quantization of the r-matrices $r_T(\lambda)$. Namely, we construct a trigonometric function $\widetilde{R}_T(\lambda) : \mathfrak{l}^* \to U_q(\mathfrak{g})\otimes U_q(\mathfrak{g})$ (tensor product in the category of topologically free C[[ħ]]-modules) such that $\widetilde{R}_T(\lambda) \equiv 1 - \hbar\,r_T(\lambda) \mod (\hbar^{2})$ which satisfies the quantum dynamical Yang–Baxter equation
$$\widetilde{R}_T^{12}\big(\lambda - \tfrac12\hbar h^{(3)}\big)\,\widetilde{R}_T^{13}\big(\lambda + \tfrac12\hbar h^{(2)}\big)\,\widetilde{R}_T^{23}\big(\lambda - \tfrac12\hbar h^{(1)}\big) = \widetilde{R}_T^{23}\big(\lambda + \tfrac12\hbar h^{(1)}\big)\,\widetilde{R}_T^{13}\big(\lambda - \tfrac12\hbar h^{(2)}\big)\,\widetilde{R}_T^{12}\big(\lambda + \tfrac12\hbar h^{(3)}\big). \qquad (1.3)$$
In the above equation we used the standard notation for shifts in the dynamical variable: for instance, if S(λ) is any meromorphic function $\mathfrak{l}^* \to U_q(\mathfrak{g})^{\otimes 2}$ we set $S(\lambda - \tfrac12\hbar h^{(3)}) = S(\lambda) - \tfrac12\hbar\sum_i \frac{\partial S}{\partial y^{i}}(\lambda)\,y_i^{(3)} + \ldots$ (the Taylor expansion), where $y_1, \ldots, y_r$ is a basis of l and $y^{1}, \ldots, y^{r}$ is the dual basis of $\mathfrak{l}^*$.

The construction is based on the following result. Let $I_\pm \subset U_q(\mathfrak{b}_\pm)$ be the kernels of the projections $U_q(\mathfrak{b}_\pm) \to U_q(\mathfrak{h})$. Also set $Z = \frac12((1 - C_T)\otimes 1)\,\Omega_{\mathfrak{h}_0}$. The maps $B : U_q(\mathfrak{b}_-) \to U_q(\mathfrak{b}_-)$ and $B^{-1} : U_q(\mathfrak{b}_+) \to U_q(\mathfrak{b}_+)$ are defined in the same fashion as in the classical case (see Lemma 1.1). To simplify notations we will write $q_i^{A}$ for $(q^{A})_i$ for any operator A (the operator $q^{A}$ acting on the i-th component of a tensor product).

Theorem 1.1 ([ESS]). There exists a unique trigonometric rational function $J_T : \mathfrak{l}^* \to (U_q(\mathfrak{b}_-)\otimes U_q(\mathfrak{b}_+))^{\mathfrak{l}}$ such that
1. $J_T - q^{Z} \in I_-\otimes I_+$,
2. $J_T(\lambda)$ satisfies the modified ABRR equation:
$$R^{21}\,q_1^{2\lambda}\,B_1(J_T(\lambda)) = J_T(\lambda)\,q_1^{2\lambda}\,q^{\Omega_{\mathfrak{l}}}. \qquad (1.4)$$
Moreover $J_T(\lambda)$ satisfies the shifted 2-cocycle condition:
$$J_T^{12,3}(\lambda)\,J_T^{12}\big(\lambda + \tfrac12 h^{(3)}\big) = J_T^{1,23}(\lambda)\,J_T^{23}\big(\lambda - \tfrac12 h^{(1)}\big). \qquad (1.5)$$
The quantum dynamical R-matrix $\widetilde{R}_T(\lambda)$ is obtained by twisting R by $J_T(\lambda)$:
$$R_T(\lambda) = J_T^{-1}(\lambda)\,R^{21}\,J_T^{21}(\lambda),\qquad \widetilde{R}_T(\lambda) = R_T^{21}\Big(\frac{\lambda}{\hbar}\Big).$$
Note that the polarization we use here is opposite to the polarization used in [ESS], where the twist $J_T(\lambda)$ was an element of $U_q(\mathfrak{b}_+)\otimes U_q(\mathfrak{b}_-)$. One aim of this paper is to provide a representation-theoretic interpretation of the quantum dynamical R-matrix $R_T(\lambda)$. This is done in terms of twisted traces of quantum intertwiners and of the systems of difference equations satisfied by them.
2. Twisted Traces of Quantum Intertwiners

2.1. Definition. Let $M_\mu$ be the Verma module over $U_q(\mathfrak{g})$ with highest weight $\mu \in \mathfrak{h}^*$ and let $v_\mu$ be a highest weight vector. We will also consider the graded dual Verma module $M_\mu^*$ and let $v_\mu^*$ be its lowest weight vector satisfying $\langle v_\mu^*, v_\mu \rangle = 1$. Let $V$ be a finite-dimensional $U_q(\mathfrak{g})$-module and let $V = \bigoplus_\nu V[\nu]$ be its weight space decomposition. The following result is well known (see e.g. [ES2]):

Lemma 2.1. Suppose that $M_\mu^*$ is irreducible. Then the map
$$\mathrm{Hom}_{U_q(\mathfrak{g})}(M_\mu, M_\lambda \otimes V) \to V[\mu - \lambda], \qquad \Phi \mapsto \langle v_\lambda^*, \Phi v_\mu \rangle$$
is an isomorphism.

Conversely, for every weight $\nu$ and every homogeneous vector $v \in V[\nu]$ we will denote by $\Phi_\mu^v : M_\mu \to M_{\mu - \nu} \otimes V$ the unique intertwiner satisfying $\langle v_{\mu - \nu}^*, \Phi_\mu^v v_\mu \rangle = v$. It will be convenient to consider all the operators $\Phi_\mu^v$ simultaneously by setting
$$\Phi_\mu^V = \sum_{v \in \mathcal{B}} \Phi_\mu^v \otimes v^* \in \mathrm{Hom}_{\mathbb{C}}\Big(M_\mu, \bigoplus_\nu M_{\mu - \nu} \otimes V \otimes V^*\Big),$$
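where $\mathcal{B}$ is any homogeneous basis of $V$.

Let $(\Gamma_1, \Gamma_2, T)$ be a generalized Belavin–Drinfeld triple. Let $\mathfrak{l}, \mathfrak{h}_0, C_T, \dots$ have the same meanings as in Sect. 1. Finally, let $\mu, \mu'$ be weights satisfying the following relations:
$$(\mu, \alpha) = (\mu', T(\alpha)) \quad \text{for all } \alpha \in \Gamma_1. \tag{2.1}$$
We define a linear map $B : M_\mu \to M_{\mu'}$ by $u \cdot v_\mu \mapsto B(u) \cdot v_{\mu'}$ for all $u \in U_q(\mathfrak{n}_-)$. Now consider finite-dimensional $U_q(\mathfrak{g})$-modules $V_1, \dots, V_N$ and let $v_1 \in V_1[\mu_1], \dots, v_N \in V_N[\mu_N]$ be homogeneous vectors such that $\bar{\mu} := \sum_i \mu_i \in \mathfrak{l}^\perp$. The set of pairs of weights $(\mu, \mu')$ satisfying (2.1) and for which we have $\mu' - \mu = \bar{\mu}$ is an $\mathfrak{l}^*$-torsor $\tilde{\mathfrak{l}}^*$. For any such pair $(\mu, \mu')$ and for $\lambda \in \mathfrak{l}^*$, we define the following formal power series in $(V_1 \otimes \cdots \otimes V_N)^{\mathfrak{l}} \otimes q^{2(\lambda,\mu)}\mathbb{C}[[q^{-2(\lambda,\alpha_1)}, \dots, q^{-2(\lambda,\alpha_r)}]]$ by analogy with [EV2]:
$$\Psi^T_{v_1,\dots,v_N}(\lambda, \mu) = \mathrm{Tr}|_{M_\mu}\Big(\Phi^{v_1}_{\mu' - \sum_{i=2}^N \mu_i} \cdots \Phi^{v_N}_{\mu'}\, B q^{2\lambda}\Big)$$
and
$$\Psi^T_{V_1,\dots,V_N}(\lambda, \mu) = \sum_{v_i \in \mathcal{B}_i} \Psi^T_{v_1,\dots,v_N}(\lambda, \mu) \otimes v_N^* \otimes \cdots \otimes v_1^*,$$
where $\mathcal{B}_i$ is any homogeneous basis of $V_i$. It is clear that $\Psi^T_{V_1,\dots,V_N}(\lambda, \mu)$ takes values in $(V_1 \otimes \cdots \otimes V_N)^{\mathfrak{l}} \otimes (V_N^* \otimes \cdots \otimes V_1^*)^{\mathfrak{l}}$.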
2.2. The main results. Our main result is that the functions 2VT1 ,... ,VN (λ, µ) satisfy some interesting difference equations. These difference equations are more conveniently expressed after some renormalizations. Set 1 JT (λ) = JT −λ − ρ + h(1) + h(2) , 2
1 JT (λ) = JT λ + h(1) + h(2) . 2
Put QT (λ) = m21 (1 ⊗ S −1 )(JT (λ)) = m21 (1 ⊗ S −1 )(JT (λ)). We will denote simply by Q(λ) the element corresponding to the trivial triple ( , , I d). Also set RT (λ) = 21 21 JT−1 (λ)R21 JT21 (λ) and RT (λ) = J−1 T (λ)R JT (λ). Define J1···N (λ) = J1,2···N (λ) · · · JN−1,N (λ). T T T Finally, let
−1 δqT (λ) = Tr |M−ρ (Bq 2λ )
be the twisted Weyl denominator. The explicit expression for δqT (λ) is as follows. Let
3 be the subset of 1 ∩ 2 consisting of roots which return to their original position after applying T several times, and let 3 be the set of positive roots which are linear combinations of roots from 3 . For each α ∈ 3 let Nα be the order of the action of B on α. Consider the Lie algebra g and define θα ∈ C by B Nα uα = θα uα for any uα ∈ g[α]. Then (see [ES1]) δqT (λ) = q 2(ρ,λ) 1 − θα q −2Nα (α,λ) . α∈ 3 /B
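As a sanity check on the product formula above, the following minimal sketch treats only the simplest case, assuming $\mathfrak{g} = \mathfrak{sl}_2$ and the trivial triple ($T = \mathrm{Id}$, so that $B$ acts as the identity and $\theta_\alpha = N_\alpha = 1$). The function names and the numerical values of $q$ and $\lambda$ are illustrative choices, not taken from the paper. It compares the inverse of a truncated trace over $M_{-\rho}$ with the product expression.

```python
# Sketch for g = sl_2 and the trivial triple (T = Id), so B acts as the identity.
# Weights are stored as multiples of the fundamental weight omega = alpha/2,
# normalised so that (alpha, alpha) = 2; then (a*omega, b*omega) = a*b/2 and rho = omega.

def pairing(a, b):
    """Invariant form (a*omega, b*omega)."""
    return a * b / 2.0

def verma_trace(q, lam, mu, terms=400):
    """Truncation of Tr|_{M_mu} q^{2 lam}; the weights of M_mu are mu - 2k for k >= 0."""
    return sum(q ** (2 * pairing(lam, mu - 2 * k)) for k in range(terms))

def weyl_denominator(q, lam):
    """Product formula q^{2(rho, lam)} (1 - q^{-2(alpha, lam)}) with theta = N = 1."""
    rho, alpha = 1, 2  # rho = omega, alpha = 2*omega
    return q ** (2 * pairing(rho, lam)) * (1 - q ** (-2 * pairing(alpha, lam)))

# Arbitrary test values with |q^{-2(alpha, lam)}| < 1, so the geometric series converges.
q, lam = 1.7, 1.3
inverse_trace = 1.0 / verma_trace(q, lam, mu=-1)  # mu = -rho
product_formula = weyl_denominator(q, lam)
```

In this case the trace is a geometric series, which is why the two quantities agree up to truncation error; the twisted case would additionally require the action of $B$ on root vectors, which this sketch does not model.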
Define the renormalized trace function by (∗N) (∗1) FVT1 ,... ,VN (λ, µ) = Q−1 µ + h(∗1···∗N) ⊗ · · · ⊗ Q−1 µ + h(∗1) × ϕVT1 ,... ,VN (λ, −µ − ρ), where
ϕVT1 ,... ,VN (λ, µ) = J1···N (λ)−1 2VT1 ,... ,VN (λ, µ)δqT (λ). T
Let W be a finite-dimensional Uq (g)-module. Consider the following difference operator acting on functions l∗ → (V1 ⊗ · · · ⊗ VN )l : T DW = Tr |W [ν] (RT )W V1 (λ + h(2···N) ) · · · (RT )W VN (λ) Tν , (2.2) ν
where $T_\nu f(\lambda) = f(\lambda + \nu)$. In the above, we only consider the trace of the "diagonal block" of the product of R-matrices appearing in (2.2), i.e. the part that preserves $W[\nu]$.

Theorem 2.1 (Twisted Macdonald–Ruijsenaars equations).
$$D_W^T F^T_{V_1,\dots,V_N}(\lambda, \mu) = \chi_W(q^{-2\mu})\, F^T_{V_1,\dots,V_N}(\lambda, \mu), \tag{2.3}$$
where $\chi_W(x) = \sum_\nu \dim W[\nu]\, x^\nu$ is the character of $W$ and where $D_W^T$ acts on the variable $\lambda$.
This theorem is proved in Sect. 3. For each j ∈ {1, . . . , N} consider the two following operators: −2µ−Ch −2h −2 q∗j,∗1 · · · q∗j,∗jh−1 , −1 j +1,j Nj RT · · · RT (λ)−1 j λ + h(j +2···N) j 1 j,j −1 × RT λ + h(2···j −1) + h(j +1···N) · · · RT λ + h(j +1···N) ,
DjT = q∗j KjT =
(2.4)
(2.5)
where Ch = m12 (h ) ∈ U (h) is the quadratic Casimir element for h, and where
j f (λ) = f (λ + h(j ) ). Theorem 2.2 (Twisted qKZB equations). The function FVT1 ,... ,VN (λ, µ) satisfies the following difference equation for all j = 1, . . . , N: FVT1 ,... ,VN (λ, µ) = DjT ⊗ KjT FVT1 ,... ,VN (λ, µ). (2.6) This theorem is proved in Sect. 4. Now suppose that ( 1 , 2 , T ) is a complete triple, i.e 1 = 2 = and T is an automorphism. In this case, the functions FVT (λ, µ) satisfy in addition some dual difference equations, with respect to the variable µ. In a complete triple the maps B : Uq (b− ) → Uq (b− ) and B −1 : Uq (b+ ) → Uq (b+ ) come from an automorphism B : Uq (g) → Uq (g). Let d be the order of B and let B ⊂ Aut (Uq (g)) be the subgroup generated by B. Let W be any finite-dimensional Uq (g)-module. We denote by W B the twist of W by B: as a vector space W = W B and the Uq (g)-action is given by u · w = B −1 (u)w. Now suppose that W " W B as Uq (g)-modules and let us fix an intertwiner in HomUq (g) (W, W B ) ⊂ AutC (W ) of order d. This endows W with the structure of a module over B Uq (g). Consider the following difference operator acting on functions with values in (VN∗ ⊗ · · · ⊗ V1∗ )l : ∗ ∗ ∨,T DW = Tr |W [ν] RW VN (µ + h(∗1···∗N−1) ) · · · RW V1 (µ)BW T∨ (2.7) ν, ν
where T∨ ν f (µ) = f (µ + ν). Theorem 2.3 (Dual twisted Macdonald–Ruijsenaars equations). ∨,T T FV1 ,... ,VN (λ, µ) = Tr |W h0 (q −2λ B)FVT1 ,... ,VN (λ, µ). DW
(2.8)
Remark. Any B-invariant finite-dimensional Uq (g) module is a direct sum of modules V ν0 := ν∈Bν0 Vν , where Vν is the irreducible highest weight module of highest weight ν and where ν0 is dominant integral. It is easy to see that both sides of (2.8) identically vanish when W = V ν0 and ν0 ∈ l∗ (i.e. when B(ν0 ) = ν0 ). The twisted character Tr |W h0 (q 2λ B) can be expressed explicitly when W = Vν with
ν ∈ l∗ . Consider the quotient Dynkin diagram = {α = α∈α α | α ∈ /T } which forms a set of simple roots in l∗ with respect to the restriction of ( , ) to l∗ . Namely, if
= {α1 , . . . , αn } is of type An and if T is the “flip” T : αi → αn+1−i then /T is Bk , where n = 2k − 1 or n = 2k; if = D4 and T is the rotation of order three around
the trivalent root then /T = G2 ; if = Dk and T is the symmetry of order two around the trivalent root then = Ck−1 ; and if = E6 and T is the symmetry around the trivalent root then /T = F4 . Let be the root system of . Let g be the simple complex Lie algebra associated to . Note that any weight ν ∈ l∗ is naturally a weight for . However, the scalar product on l∗ is not the usual one corresponding to the root system . For instance we have (α, α) = 2Nα if Nα is the number of elements in the
T -orbit α and if any two elements of that orbit are orthogonal. For λ = α cα α ∈ l∗
2Nα set λ = α (α,α) cα α. Proposition 2.1. For any ν ∈ l∗ we have Tr |W h0 q 2λ B = χV ν q 2λ , where V ν is the irreducible g-module of highest weight ν. Theorem 2.3 and Proposition 2.1 are proved in Sect. 5. Now, define for each j ∈ {1, . . . , N} the following operators: −2λ−Cl −l − qj,j +1 · · · qj,N l ,
Dj∨,T = qj
−1 · · · R∗1,∗j (µ)−1 B∗ −1 (j ) × Kj∨,T = R∗j −1,∗j µ + h(∗1···∗j −2) R∗j,∗N µ + h(∗j +1···∗N−1) + h(∗1···∗j −1) · · · R∗j,∗j +1 µ + h(∗1···∗j −1) , (2.9) where Cl = m12 (l ) ∈ U (l) and where B∗ −1 (j ) f (µ) = f µ + B −1 (h(∗j ) ) . Theorem 2.4 (Dual twisted qKZB equations). The functions FVT1 ,... ,VN (λ, µ) satisfy the following difference equation for each j = 1 . . . , N: BVj BV∗ ∗ FVT1 ,... ,Vj ,... ,VN (λ, µ) = (Dj∨,T ⊗ Kj∨,T )FVT ,... ,V B ,... ,V (λ, µ). j
1
j
N
(2.10)
This theorem is proved in Sect. 6.

Remark 1. For T = Id, Theorems 2.1–2.4 appear in [EV2].

Remark 2. We do not expect the dual equations to exist for non-complete triples. This can be explained in the following way. Suppose that $\mathfrak{g}$ is an affine Lie algebra and that T = Id, so that $r_T(\lambda, z)$ is the Felder elliptic dynamical R-matrix. In that case it is known that the dual trigonometric qKZB equations without spectral parameter can be interpreted as the monodromy of the flat connection on the torus defined by the classical (elliptic) KZB equations (see [Ki]). One can show that this is true for any elliptic dynamical R-matrix. On the other hand, it was proved in [ES1], Proposition 4.2, that the classical dynamical R-matrix with spectral parameter $r_T(\lambda, z)$ associated to an affine Lie algebra and a triple $(\Gamma_1, \Gamma_2, T)$ is elliptic only when $T$ is an automorphism; for general triples, it is partially elliptic and partially trigonometric (for instance, it is purely trigonometric when $T$ is nilpotent). This shows that the monodromy of these KZB equations should be defined only for complete triples, and hence the existence of the dual equations should be expected only for them.
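The quotient Dynkin diagrams listed before Proposition 2.1 can be checked mechanically. The sketch below is an illustration under stated assumptions, not part of the paper: it represents a simply-laced diagram by its Gram matrix (2 on the diagonal, -1 on edges), forms the orbit sums $\bar{\alpha} = \sum_{\alpha \in \bar{\alpha}} \alpha$, and computes their Gram and Cartan matrices. The helper names `gram_matrix` and `fold`, the node numbering, and the Cartan convention $a_{ij} = 2(\bar{\alpha}_i, \bar{\alpha}_j)/(\bar{\alpha}_i, \bar{\alpha}_i)$ are choices made here for illustration.

```python
from fractions import Fraction

def gram_matrix(n, edges):
    """Gram matrix (alpha_i, alpha_j) of a simply-laced diagram on n nodes."""
    g = [[2 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        g[i][j] = g[j][i] = -1
    return g

def fold(gram, orbits):
    """Gram and Cartan matrices of the orbit sums of simple roots."""
    def form(o1, o2):
        # Bilinear form evaluated on two orbit sums.
        return sum(gram[i][j] for i in o1 for j in o2)
    folded = [[form(a, b) for b in orbits] for a in orbits]
    m = len(orbits)
    cartan = [[Fraction(2 * folded[i][j], folded[i][i]) for j in range(m)]
              for i in range(m)]
    return folded, cartan

# A_3 with the flip alpha_i -> alpha_{4-i} (nodes indexed from 0).
a3 = gram_matrix(3, [(0, 1), (1, 2)])
a3_gram, a3_cartan = fold(a3, [[0, 2], [1]])

# D_4 with the order-three rotation around the trivalent node (node 1 here).
d4 = gram_matrix(4, [(0, 1), (2, 1), (3, 1)])
d4_gram, d4_cartan = fold(d4, [[0, 2, 3], [1]])
```

In both examples the non-trivial orbit consists of mutually orthogonal roots, so its sum has squared length $2N_\alpha$ (4 for $A_3$, 6 for $D_4$). The product of the two off-diagonal Cartan entries is 2 in the first case and 3 in the second, i.e. a double bond (type $B_2$) and a triple bond (type $G_2$), in agreement with the list in the text.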
Remark 3. The above theorems are also valid for the specialized quantum group Uq (g), which is obtained from the formal quantum group when we take h¯ ∈ C∗ \ {iR} to be a complex number. In that case, it is more convenient to consider the twist JT (λ) as an endomorphism of the functor F : Rep(Uq (g)) × Rep(Uq (g)) → Vec which assigns to any two finite-dimensional Uq (g)-modules V and W the vector space V ⊗ W . Here Rep(Uq (g)) is the category of finite-dimensional Uq (g)-modules and Vec is the category of finite-dimensional C-vector spaces. For instance, Eq. (1.5) means that for every three representations U, V , W and vectors u ∈ U , v ∈ V and w ∈ W with respective weights λu , λv and λw we have 1 1 JT (λ)12,3 (λ)JT12 λ + λw (u ⊗ v ⊗ w) = JT1,23 (λ)JT23 λ − λu (u ⊗ v ⊗ w). 2 2 3. The Twisted Macdonald–Ruijsenaars Equations The proof of Theorem 2.1 is an extension of the proof of Theorem 1.1 of [EV2] to the case of an arbitrary generalized Belavin–Drinfeld triple. From now on we fix such a triple ( 1 , 2 , T ). We first note that the notion of radial part generalizes straightforwardly to the twisted setting: Proposition 3.1. Let V be a finite-dimensional Uq (g)-module. For any X ∈ Uq (g) there T (with respect to the variable λ) acting on formal exists a unique difference operator DX l 2(λ,µ) −2(λ,α 1 ) , . . . , , q −2(λ,αr ) ]], λ ∈ l ∗ such that we have C[[q power series in V ⊗ q T Tr |Mµ 0Vµ Bq 2λ . Tr |Mµ 0Vµ XBq 2λ = DX T is called the twisted radial part of X. The operator DX For any finite-dimensional Uq (g)-module W set
$$C_W = \mathrm{Tr}|_W (1 \otimes \pi_W)\big(R^{21} R (1 \otimes q^{2\rho})\big).$$
It is well known (see [D, R]) that the map $W \mapsto C_W$ defines a homomorphism from the Grothendieck ring of the category of finite-dimensional $U_q(\mathfrak{g})$-modules to the center of $U_q(\mathfrak{g})$. Set $\mathcal{M}_W^T = D^T_{C_W}$.

Proposition 3.2. We have
(i) $\mathcal{M}_W^T \mathcal{M}_V^T = \mathcal{M}_V^T \mathcal{M}_W^T$ for all $V$, $W$,
(ii) $\mathcal{M}_W^T \Psi_V^T(\lambda, \mu) = \chi_W(q^{2(\mu+\rho)})\, \Psi_V^T(\lambda, \mu)$, where $\chi_W(x) = \sum_\nu \dim W[\nu]\, x^\nu$ is the character of $W$.

Proof. See [EK, EV2]. □

Let us now proceed to explicitly compute the operator $\mathcal{M}_W^T$. Let $V$ be a finite-dimensional $U_q(\mathfrak{g})$-module. Introduce the following function with values in $V \otimes V^* \otimes U_q(\mathfrak{g})$, with components labeled as 1, ∗1 and 2 respectively:
$$Z_V(\lambda, \mu) = \mathrm{Tr}|_{M_\mu}\big(\Phi^V_{\mu'} R^{20} B_0 q_0^{2\lambda}\big).$$
Lemma 3.1. We have
$$Z_V(\lambda, \mu) = J_T^{12}(\lambda)\, \Psi_V^T\big(\lambda + \tfrac{1}{2}h^{(2)}, \mu\big). \tag{3.1}$$
Proof. First we note that, by pulling the R-matrix around the trace and using the intertwining property together with the fact that B1−1 R = B2 R we obtain ZV (λ, µ) = R21 q12λ Tr |Mµ (0Vµ (B2−1 R20 )B0 q02λ ) = R21 q12λ B2−1 Tr |Mµ (0Vµ R20 B0 q02λ ) = R21 q12λ B2−1 ZV (λ, µ). On the other hand, using the defining equation for JT (λ), the relation B2−1 JT (λ) = B1 JT (λ) and the l-invariance of 2VT (λ, µ) we have 1 1 R21 q12λ B2−1 JT12 (λ)2VT (λ + h(2) , µ) = R21 q12λ B1 JT12 (λ) 2VT λ + h(2) , µ 2 2 1 = JT12 (λ)q12λ q12l 2VT λ + h(2) , µ 2 1 = JT12 (λ)2VT λ + h(2) , µ . 2 The lemma now follows from the fact both sides of (3.1) satisfy the relation Y = R21 q12λ B2−1 Y and are of the form Y = q Z 2VT (λ + 21 h(2) , µ) + l.o.t, where l.o.t stands for terms of strictly positive degree in component 2. % & Consider the following function with values in V ⊗ V ∗ ⊗ Uq (g) ⊗ Uq (g) (with components labeled as 1, ∗1, 2 and 3 respectively): XV (λ, µ) = Tr |Mµ 0Vµ R20 q02λ B0 (R03 )−1 .
Lemma 3.2. We have 1 1 XV (λ, µ) = JT3,12 (λ)JT12 λ − h(3) 2VT λ + (h(2) − h(3) ), µ JT32 (λ)−1 . (3.2) 2 2 Proof. Moving R03 around the trace, using the quantum Yang–Baxter equation for R and the B-invariance property of R again, we get XV (λ, µ) = R13 Tr |Mµ (0Vµ (R03 )−1 R20 q02λ B0 ) = R13 R23 Tr |Mµ (0Vµ R20 (R03 )−1 q02λ B0 )(R23 )−1 = R13 R23 q32λ Tr |Mµ (0Vµ R20 q02λ (R03 )−1 B0 )q3−2λ (R23 )−1 = R13 R23 q32λ B3 Tr |Mµ (0Vµ R20 q02λ B0 (R03 )−1 ) q3−2λ (R23 )−1 = R13 R23 q32λ B3 XV (λ, µ) q3−2λ (R23 )−1 .
On the other hand, using the modified ABRR equation (1.4) we have 1 1 R13 R23 q32λ B3 JT3,12 (λ) JT12 λ − h(3) 2VT λ + h(2) − h(3) , µ 2 2 −1 × B3 JT32 (λ)−1 q3−2λ R23 1 1 = 1 R13 q32λ B3 JT31 (λ) JT12 λ − h(3) 2VT λ + h(2) − h(3) , µ 2 2 −2λ −l 32 × q3 q23 JT (λ)−1 1 1 = 1 JT31 (λ)q32λ q31l JT12 λ − h(3) 2VT λ + h(2) − h(3) , µ 2 2 − × q3−2λ q23 l JT32 (λ)−1 1 1 = JT3,12 q32λ q31l q32l JT12 λ − h(3) 2VT λ + h(2) − h(3) , µ 2 2 − × q3−2λ q23 l JT32 (λ)−1 1 1 = JT3,12 JT12 λ − h(3) 2VT λ + h(2) − h(3) , µ JT32 (λ)−1 . 2 2 3,12 −1 Now set X(λ) = JT XV (λ)JT32 (λ). By the above and by Lemma 3.1, both X(λ) 1 (3) and ZV (λ − 2 h ) satisfy the equation
q32λ q31l q32l Y = Y q32λ q23l and are both of the form Y = JT12 (λ − 21 h(3) )2VT (λ + 21 (h(2) − h(3) )) + l.o.t. Hence X(λ) = ZV (λ − 21 h(3) ) and the lemma is proved. % & Corollary 3.1. We have Tr |Mµ (0Vµ R20 (R03 )−1 q 2λ B0 ) 1 (3) 1 (2) 2λ 3,12 12 (3) 32 −1 −2λ = B3 q3 JT (λ)JT (λ − h )2V (λ + (h − h ), µ)JT (λ) q3 . 2 2 Now let W be any finite-dimensional Uq (g)-module. By Corollary 3.1, we get MTW 2VT (λ, µ) = Tr |Mµ (0Vµ CW q02λ B0 ) 2ρ = Tr |Mµ Tr |W (RW 0 R0W qW )0Vµ q02λ B0 ) 2ρ = Tr |W Tr |Mµ m23 (R20 S3 (R03 )−1 q3 )0Vµ q02λ B0 2ρ (3.3) = Tr |W m23 S3 Tr |Mµ (R20 (R03 )−1 0Vµ q02λ B0 ) q2 1 (1) (2) = Tr |W dk (λ)ci ⊗ q −2λ dk (λ)di λ + h(2) × 2 i,j,k × 2VT (λ + h(W ) ) dj (λ)q 2λ S(B(cj ))S(B(ck ))q 2ρ W ,
where m : Uq (g) ⊗ Uq (g) → Uq (g) is the multiplication map, JT (λ) = i ci ⊗ di (λ),
J −1 (λ) = i ci ⊗di (λ) and where we used Sweedler’s notation for coproducts: (x) =
T (1) x ⊗ x (2) .
Let us set R =
i
ai ⊗ bi , R−1 =
Lemma 3.3. We have
where u =
i
j
i
ai ⊗ bi .
dj (λ)q 2λ S(B(cj )) = q Cl P (λ)S(u)q 2λ ,
S(bi )ai is the Drinfeld element and P (λ) =
j
(3.4)
dj (λ)S −1 (cj ).
Proof. This is obtained by applying m21 (S ⊗ 1) to the relation (B1 JT (λ))−1 q1−2λ = q −l q1−2λ JT−1 (λ)R21 , which itself follows from (1.4) and the B-invariance of JT (λ). & % Substituting (3.4) in (3.3) yields (1) (2) MTW 2VT (λ, µ) = dk (λ)ci Tr |W [ν] q Cl P (λ)q 2λ S −1 (B(ck ))q −2λ dk (λ) i,k,ν
(3.5)
1 × di (λ + ν)S(u)q 2ρ 2VT (λ + ν). 2
Lemma 3.4. We have (1) (2) dk (λ) ⊗ q 2λ S −1 (B(ck ))q −2λ dk (λ) k
=
(1) (2) (aj )(1) dk (λ)q −l +1⊗Cl S −1 (ck )S −1 (bj )(aj )(2) dk (λ) 2 .
(3.6)
jk
Proof. From the modified ABRR relation we get q12λ B1 (JT (λ))q1−2λ = (R21 )−1 JT (λ)q l . Applying 1 ⊗ yields
q12λ B1 (JT1,23 (λ))q1−2λ = (R23,1 )−1 JT1,23 (λ)q12l q13l , which can be written as (1) (2) q12λ B(ck )q1−2λ ⊗ dk (λ) ⊗ dk (λ) k
=
ik
(1)
(2)
(bi ⊗ (ai )(1) ⊗ (ai )(2) ) × (ck ⊗ dk (λ) ⊗ dk (λ))q12l q13l .
Equation (3.6) is now obtained by applying m13 (S −1 ⊗ 1 ⊗ 1). % & We introduce the following notation. For any linear operator H (λ) ∈ End (V1 ⊗ · · · ⊗ VN ) we set H (λ + hˆ (i) )(v1 ⊗ · · · ⊗ vN ) = Hν (λ + ν)(v1 ⊗ · · · ⊗ vN ), ν
where Hν (λ) : V1 ⊗ · · · ⊗ Vi ⊗ · · · ⊗ VN → V1 ⊗ · · · ⊗ Vi [ν] ⊗ · · · ⊗ VN is the block of H (λ) with image Vi [ν] in the i th component. In other words, we replace hˆ (i) by the weight in the i th component after the action of H .
Lemma 3.5. The following identities hold:
(1)
(i) ⊗ S −1 (bj )(aj )(2) = ak ⊗ ubk , j (aj )
(ii) R23 JT1,23 (λ) = JT1,32 (λ)R23 ,
(1) (2) (iii) S(ci )di (λ) ⊗ di (λ) = S(QT )(λ − 21 h(2) )1 JT−1 (λ + 21 hˆ (1) ). Proof. Equalities (i) and (iii) are proved in the same fashion as in [EV2]. Equality (ii) & follows from the relation R = op R. %
Corollary 3.2. We have (1) (2) dk (λ) ⊗ q 2λ S −1 (B(ck ))q −2λ dk (λ)
1 (1) 21 −1 1 ˆ (2) = q −l −1⊗Cl u−1 λ − (J λ + R. h h S(Q ) ) T 2 T 2 2 2
Proof. Use (i), (ii) and (iii) successively, as in [EV2], (2.32).
& %
By Corollary 3.2 and using the relation B1 JT (λ) = B2−1 JT (λ), we can rewrite (3.5) as follows: MTW 2VT (λ, µ) 1 (1) 21 −1 1 ˆ (2) λ − (J λ + R h h Tr |W [ν] P2 (λ)q −l −1⊗Cl u−1 S(Q ) ) = T 2 T 2 2 2 ν 1 2ρ m12 l × (ci )1 di λ + ν S(u)2 q2 q2 2VT (λ + ν, µ) 2 2 WV ˜ = Tr |W [ν] G(λ)(R (λ) 2V (λ + ν, µ), T) ν
(3.7) ˜ ˜ where G(λ) = q −2ρ P (λ)S(QT )(λ). We now proceed to compute G(λ). ˜ Proposition 3.3. We have G(λ) =
δqT (λ+h) . δqT (λ)
Proof. The following lemma is proved as in [EV2]: ˜ Lemma 3.6. We have P (λ) = Q−1 T (λ + h), i.e. G(λ) = G(λ + h), where G(λ) = −1 −2ρ q QT (λ)S(QT )(λ − h). A direct (though lengthy) computation shows that (G(λ)) = JT (λ) G(λ + h(2) ) ⊗ G(λ) J−1 T (λ)
(3.8)
(see [M] for a detailed proof of this in the nondynamical case; the dynamical case is analogous). In particular, replacing λ by hλ¯ , we have RJT
λ λ + hh(2) λ λ ¯ G ⊗G J−1 T h¯ h¯ h¯ h¯ λ λ λ + hh(1) λ ¯ −1 21 = JT G ⊗G J21 R T h¯ h¯ h¯ h¯
which can be rewritten as λ λ + hh(2) λ λ λ + hh(1) ¯ ¯ 21 λ R21 G ⊗ G = G ⊗ G R . (3.9) T T h¯ h¯ h¯ h¯ h¯ h¯ Let us now expand RT hλ¯ and G hλ¯ around h¯ = 0: RT
λ h¯
= 1 + hr(λ) + O(h¯ 2 ), ¯
λ 2 G = 1 + hg ¯ 1 (λ) + O(h¯ ), h¯
where g1 (λ) = (G(λ/h) ¯ − 1)/h¯ ∈ Uq (g)/hU ¯ q (g) " U (g). Note that by (3.8) we have 0 (g1 (λ)) = g1 (λ) ⊗ 1 + 1 ⊗ g1 (λ) (where 0 is the usual coproduct on U g), which implies that g1 (λ) ∈ g. But since G(λ) is of l-weight zero, g1 (λ) ∈ h. Now, by (3.9), we have ∂g1 (λ) xi ∧ = [r(λ), g1 (λ) ⊗ 1 + 1 ⊗ g1 (λ)], ∂xi i
where (xi ) is a basis of l. In particular, [r(λ), g1 (λ) ⊗ 1 + 1 ⊗ g1 (λ)] ∈ G2 h. But this implies that [r(λ), g1 (λ) ⊗ 1 + 1 ⊗ g1 (λ)] = 0. Thus g1 : l∗ → l is a closed f1 ( hλ¯ ) 1-form on l∗ and there exists functions f1 (λ) and g2 (λ) = h12 G( hλ¯ ) − ∈ λ−h¯ h ¯
f1 (
h¯
)
Uq (g)/hU ¯ q (g) " U (g). By the same argument, g2 (λ) is a closed 1-form. Continuing in (λ) . It this process, we finally obtain a function f defined on l∗ such that G(λ) = f f(λ−h) remains to determine f (λ) explicitly. For this, apply (3.7) and Proposition 3.2 (ii) to the 2(µ+ρ,λ) case of the trivial representation V = C. Then 2VT (λ, µ) = q δ T (λ) and RV W = 1. We q get f (λ + ν) ν
f (λ)
|W [ν]
dim W [ν]
2(µ+ρ) q 2(µ+ρ,λ) q 2(µ+ρ,λ+ν) q = χ . W δqT (λ + ν) δqT (λ)
As in [EV2], Corollary 2.16 we conclude that one can take f (λ) = δqT (λ). Theorem 2.1 now follows from (3.7), Proposition 3.2 (ii), Proposition 3.3 and from the following easily checked fusion identity: 1···N (2···N) J1···N (λ)−1 R0,1···N JT (λ + h(0) ) = R01 · · · R0N T λ+h T T (λ) . T
(3.10)
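The scalar identity used above to fix $f(\lambda)$ can be checked numerically in the simplest case. The following is a minimal sketch, assuming $\mathfrak{g} = \mathfrak{sl}_2$, the trivial triple ($T = \mathrm{Id}$), $V = \mathbb{C}$, and $W$ the irreducible module of dimension $n + 1$. All function names and numerical values are illustrative choices rather than anything from the paper. With $f = \delta$, the factors $\delta(\lambda + \nu)$ cancel and the left-hand side reduces to the character times the trace function.

```python
# Sketch for sl_2 with T = Id: weights are multiples of omega = alpha/2,
# (a*omega, b*omega) = a*b/2, and rho = omega.

def pairing(a, b):
    """Invariant form (a*omega, b*omega)."""
    return a * b / 2.0

def delta(q, lam):
    """sl_2 Weyl denominator q^{2(rho, lam)} (1 - q^{-2(alpha, lam)})."""
    return q ** (2 * pairing(1, lam)) * (1 - q ** (-2 * pairing(2, lam)))

def psi(q, lam, mu):
    """Trace function for the trivial module V = C: q^{2(mu + rho, lam)} / delta(lam)."""
    return q ** (2 * pairing(mu + 1, lam)) / delta(q, lam)

def weights(n):
    """Weights of the (n+1)-dimensional irreducible module, each of multiplicity one."""
    return [n - 2 * k for k in range(n + 1)]

def character(q, n, mu):
    """chi_W(q^{2(mu + rho)}) as a sum over the weights nu of q^{2(mu + rho, nu)}."""
    return sum(q ** (2 * pairing(mu + 1, nu)) for nu in weights(n))

# Arbitrary test values; lam is non-integral so that delta(lam + nu) never vanishes.
q, lam, mu, n = 1.7, 1.3, 0.4, 3
# Left side: sum over nu of f(lam + nu)/f(lam) * psi(lam + nu), with the choice f = delta.
lhs = sum(delta(q, lam + nu) / delta(q, lam) * psi(q, lam + nu, mu) for nu in weights(n))
rhs = character(q, n, mu) * psi(q, lam, mu)
```

This only confirms that $f = \delta$ is a consistent choice in the untwisted rank-one case; it does not replace the argument of [EV2], Corollary 2.16.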
4. The Twisted qKZB Equations We will first prove that the twisted qKZB equations hold for two finite-dimensional Uq (g)-modules V and W . As in the preceding section, we start with several preliminary lemmas. Consider the following function with values in W ⊗ V ⊗ V ∗ ⊗ W ∗ ⊗ Uq (g), with components labeled as 1, 2, ∗2, ∗1 and 3 respectively: ZW V (λ, µ) = Tr |Mµ (0W R30 0Vµ q02λ B0 ). µ +h(∗2)
Lemma 4.1. We have
$$Z_{WV}(\lambda, \mu) = (R^{32})^{-1} J_T^{12,3}(\lambda)\, \Psi^T_{WV}\big(\lambda + \tfrac{1}{2}h^{(3)}, \mu\big). \tag{4.1}$$
Proof. Moving the R-matrix around the trace and using the cyclicity property, we get ZW V (λ, µ) = R31 Tr |Mµ (0W 0V q 2λ B0 R30 ) µ +h(∗2) µ 0 2λ −1 B3 Tr |Mµ (0W R30 0Vµ q02λ B0 ) = R31 q12 µ +h(∗2) 2λ (B3−1 R32 )B3−1 ZW V (λ, µ). = R31 q12
On the other hand, 1 (3) 2λ T (B3−1 R32 )B3−1 (R32 )−1 JT12,3 (λ)2W R31 q12 V (λ + h , µ) 2
1 (3) 2λ T = R31 q12 B12 JT12,3 (λ)2W V λ + h ,µ . 2
From the modified ABRR equation it follows that
2λ 2λ B12 (JT12,3 (λ)) = JT12,3 (λ)q13l q23l q12 . R3,12 q12
Using the coproduct formula R3,12 = R32 R31 and the l-invariance of ZV W (λ, µ), we see that 1 (3) 2λ T R31 q12 (B3−1 R32 )B3−1 (R32 )−1 JT12,3 (λ)2W λ + h ,µ V 2
1 (3) T = (R32 )−1 JT12,3 (λ)2W V λ + h ,µ . 2
2λ B −1 X and are of the form Thus both sides of (4.1) satisfy the equation X = R31 q12 3
Z 2 T (λ + 1 h(3) , µ) + l.o.t. But it is easy to see that such an X is unique X = q32h q12,3 VW 2 and the lemma follows. % &
Now set Z˜ W V (λ, µ) = m32 S3 (ZW V (λ, µ)) = Tr |Mµ (0W (R20 )−1 0Vµ q02λ B0 ). µ +h(∗2) Lemma 4.2. We have 1 (1) −1 1 1 ˆ (2) T Z˜ W V (λ, µ) = S(u)−1 JT λ − hˆ (2) 2W V λ − h , µ . (4.2) 2 QT λ − h 2 2 2 2 Proof. From (4.1) it follows that 1 (3) 32 −1 12,3 T ˜ ZV W (λ, µ) = m32 S3 (R ) JT (λ)2W V λ + h 2 (1) 1 ˆ (2) (2) T = cj ⊗ S(dj (λ))S(ai )bi cj 2W V λ − h ,µ . 2 ij
To conclude the proof of the lemma we use the following relations: i S(ai )bi = S(u−1 ), S(u−1 )x = S 2 (x)S(u−1 ) for all x ∈ Uq (g) and (1) 1 (1) 1 ˆ (2) (2) −1 −1 λ− h . cj ⊗ S (dj (λ))cj = QT λ − h JT 2 2 2 j
This last equation is obtained by applying m32 (1 ⊗ 1 ⊗ S −1 ) to the cocycle identity (1.5). % & Consider the following function with values in V ⊗ V ∗ ⊗ Uq (g): YV (λ, µ) = Tr |Mµ (0Vµ (R02 )−1 q02λ B0 ).
Lemma 4.3. We have
$$Y_V(\lambda, \mu) = q_1^{-2\lambda} J_T^{21}(\lambda)\, \Psi_V^T\big(\lambda - \tfrac{1}{2}h^{(2)}, \mu\big). \tag{4.3}$$
Proof. A computation similarto the one in Lemma 4.1 shows that both YV (λ, µ) and q1−2λ JT21 (λ)2VT λ − 21 h(2) , µ satisfy the equation X = (R12 )−1 q12λ B2−1 X Z 2 T λ − 1 h(2) , µ + l.o.t. It is easy to see that such and are of the form X = q1−2λ q12 V 2 an X is unique. % & We will also need the following two-representations analogue of YV (λ, µ): YW V (λ, µ) = Tr |Mµ (0W 0V (R03 )−1 q02λ B0 ). µ +h(∗2) µ
Lemma 4.4. We have 1 (3) T YW V (λ, µ) = (R12,3 )−1 JT3,12 (λ)2W λ − h ,µ . V 2 Proof. One checks that both sides of (4.4) are solutions of the equation 2λ −1 X = (R12,3 )−1 q12 B3 X −
−
Z q Z 2 T (λ − 1 h(3) ) + l.o.t. of the form X = q12 h q13 h q31 32 W V 2
& %
Finally, we introduce a function: 2ρ Y˜W V (λ, µ) = m32 q2 S3 (YW V (λ, µ)) m (q 0Vµ R03 )q02λ B0 ). = Tr |Mµ (0W µ +h(∗2) 32 2 2ρ
(4.4)
Lemma 4.5. We have 1 1 21 −1 T Y˜W V (λ, µ) = q 2ρ u−1 S(QT ) λ − h R JT 2W V λ + hˆ (2) , µ . 2 2 2
(4.5)
Proof. By (4.4) we have Y˜W V (λ, µ) =
(1) (2) (ai )(1) dj (λ) ⊗ S(cj )S(bi )q 2ρ (ai )(2) dj (λ) ij
1 ˆ (2) T × 2W V λ + h ,µ . 2 Now we use the following relations: q 2ρ x = S 2 (x)q 2ρ for any x ∈ Uq (g), (ai )(1) ⊗ S −1 (bi )(ai )(2) = ai ⊗ S −1 (bi )u−1 , i
(4.6)
i
and (1 ⊗ S)R−1 = R. We get Y˜W V (λ, µ) = (q 2ρ u−1 )2
ij
1 ˆ (2) (1) (2) T ai dj (λ) ⊗ S(cj )bi dj (λ) 2W V λ + h ,µ . 2
Using the identity R12 JT3,12 (λ) = JT3,21 (λ)R12 and i
1 1 (1) (2) S(ci )di (λ) ⊗ di (λ) = S(QT ) λ − h(2) JT−1 λ − hˆ (1) , 1 2 2
which is obtained by applying m12 (S⊗1⊗1) to (1.5), we can further simplify Y˜W V (λ, µ): Y˜W V (λ, µ)
1 1 1 ˆ (2) T = (q 2ρ u−1 )2 S(QT ) λ − h(1) (JT21 )−1 λ + hˆ (2) R12 2W V λ+ h 2 2 2 2 1 1 ˆ (2) = (q 2ρ u−1 )2 S(QT ) λ − h(1) R21 T λ+ h 2 2 2 1 1 ˆ (2) T × JT−1 λ + hˆ (2) 2W λ + h ,µ , V 2 2
which proves the lemma. % & We need one last technical result: Lemma 4.6. We have
$$m_{21}\big(q_1^{2\rho}\, \Phi^V_{\mu'} R^{02}\big) = q_{*1}^{-2(\mu'+\rho) - \sum_i x_i^2}\, (R^{10})^{-1} \Phi^V_{\mu'}, \tag{4.7}$$
where $(x_i)_i$ is an orthonormal basis of $\mathfrak{h}$.
Proof. Let Z = u−1 q 2ρ . This is a ribbon element of Uq (g) (see [D]). Thus 0Vµ Z = (Z)0Vµ = R21 R(Z ⊗ Z)0Vµ . But by (4.6) we have m21 (q1 0Vµ R02 ) =m21 (q1 R02 R12 )0Vµ 2ρ
2ρ
2ρ
−2ρ
=R01 m21 (q1 Rq1
)q1 0Vµ 2ρ
=R(1 ⊗ Z)0Vµ . On the other hand, it is easy to see that Z|Mν = q (2ρ+ν,ν) . The lemma now follows by a direct computation. % & We are now in position to prove Theorem 2.2. From (4.7), we have W −2(µ +ρ)− x 2 i i 0µ +h(∗2) (R20 )−1 0Vµ q02λ B0 Tr q ∗2 |Mµ 2ρ = Tr |Mµ 0W m (q 0Vµ R03 )q02λ B0 . µ +h(∗2) 32 2 In other words,
(q −2(µ +ρ)−
2 i xi
)∗2 Z˜ W V (λ, µ) = Y˜W V (λ, µ).
Using (4.2), (4.5), the relation uS(u−1 ) = q 4ρ , Proposition 3.3 and the definition of T (λ) we finally obtain ϕW V
(q −2(µ +ρ)−
2 i xi
T (2) 21 T )∗2 ϕW V (λ − h , µ) = RT (λ)ϕW V (λ, µ).
(4.8)
From this we derive the qKZB equation with N representations in the following way. We start with the easily checked fusion identities −1 1,23 23 12 (3) 13 (1) J23 T (λ) RT (λ)JT (λ + h ) = RT (λ + h )RT (λ),
(3) −1 12,3 12 J12 T (λ + h ) RT (λ)JT (λ)
=
13 (2) R23 T (λ)RT (λ + h ).
(4.9) (4.10)
Now, from (4.8) with W = V1 ⊗ · · · ⊗ Vj and V = Vj +1 ⊗ · · · ⊗ VN we get
2 q −2(µ+ρ)+ xi q 2 xi ⊗xi
∗j +1,... ,∗N ∗j +1,... ,∗N,∗1,... ,∗j 1···j j +1···N (j +1...N) T × JT (λ)JT (λ − h )ϕV1 ,...VN (λ − h(j +1···N) , µ) j +1...N,1...j 1···j j +1...N = RT (λ)JT (λ + h(j +1...N) )JT (λ)ϕVT1 ,...VN (λ, µ).
(4.11)
By (4.9) this implies that
2 1...j −1 q 2 xi ⊗xi λ + h(j ) JT q −2(µ+ρ)+ xi ∗j +1...∗N ∗j +1,...∗N,∗1,...∗j j +1...N (j +1...N) × JT λ−h ϕVT1 ,... ,VN λ − h(j +1...N) , µ j +1...N,1...j −1 j +1...N,j 1...j −1 λ + h(j ) RT λ + h(j ...N) = RT (λ)JT j +1...N
× JT
(λ)ϕVT1 ,... ,VN (λ, µ).
(4.12)
On the other hand, by (4.9) with W = V1 ⊗ · · · ⊗ Vj −1 and V = Vj ⊗ · · · ⊗ VN and using (4.10) we also have
2 (q −2(µ+ρ)+ xi )∗j ...∗N q 2 xi ⊗xi ∗j,...∗N,∗1,...∗j −1 1...j −1 j +1...N × JT (λ)JT λ + h(1...j −1) ϕVT1 ,... ,VN λ − h(j ...N) , µ j +1...N,1...j −1 j,1...j −1 1...j −1 j +1...N λ + h(j +1...N) JT = RT (λ)RT (λ + h(j ...N) )JT (λ) × ϕVT1 ,... ,VN (λ, µ). (4.13) Applying the operator j to both sides of (4.13) we get
2 q −2(µ+ρ)+ xi q 2 xi ⊗xi ∗j ...∗N
∗j,...∗N,∗1,...∗j −1
1...j −1 j +1...N × JT (λ + h(j ) )JT (λ + h(1...j ) )ϕVT1 ,... ,VN
λ − h(j +1...N) , µ
j,1...j −1 λ + h(j ) j RT λ + h(j +1...N) 1...j −1 j +1...N × JT (λ) λ + h(j ...N) JT j +1...N,1...j −1
= RT
(4.14)
× ϕVT1 ,... ,VN (λ, µ). Comparing (4.12) with (4.14) and using the l-invariance of R we obtain
2
1...j −1 q −2(µ+ρ)+ xi q 2 xi ⊗xi λ + h(j ...N) JT ∗j j +1...N,j
∗j,∗1...∗j −1 j +1...N
(λ)JT (λ)ϕVT1 ,... ,VN (λ, µ) j +1...N j,1...j −1 = JT λ + h(j ) j RT λ + h(j +1...N) 1,...j −1 λ + h(j ...N) ϕVT1 ,... ,VN (λ, µ), × JT × RT
(4.15)
which can be rewritten as −1
2
j +1...N (q 2 xi ⊗xi )∗j,∗1...∗j −1 JT λ + h(j ) q −2(µ+ρ)+ xi
=
∗j j +1...N,j j +1...N × RT (λ)JT (λ)ϕVT1 ,... ,VN (λ, µ) −1 1...j −1 j,1...j −1 RT λ + h(j +1...N) λ + h(j +1...N)
j JT 1,...j −1
× JT
λ + h(j ...N) ϕVT1 ,... ,VN (λ, µ).
(4.16)
Finally, taking into account identities (4.9) and (4.10), we have
2
q −2(µ+ρ)+ xi q 2 xi ⊗xi ∗j ∗j,∗1...∗j −1 j +1,j Nj λ + h(j +2...N) ϕVT1 ,... ,VN (λ, µ) × RT (λ) · · · RT j1 = j RT λ + h(2...j −1) + h(j +1...N) · · · jj −1
× RT
(λ + h(j +1...N) )ϕVT1 ,... ,VN (λ, µ).
(4.17)
The proof of Theorem 2.2 is now obtained by replacing µ by −µ−ρ and by rewriting (4.17) in terms of FVT1 ,... ,VN (λ, µ). 5. The Twisted Dual Macdonald–Ruijsenaars Equation In this section we let ( 1 , 2 , T ) be a complete generalized Belavin–Drinfeld triple. Let W be a finite-dimensional Uq (g)-module such that W " W B and let us consider W as a B Uq (g)-module (see Sect. 2). For generic values of µ, the tensor product Mµ ⊗ W decomposes as a direct sum of Verma modules, and ην : W [ν] ⊗ Mµ+ν → Mµ ⊗ W, w ⊗ y → 0w µ+ν (y) is an isomorphism onto the isotypic component corresponding to Mµ+ν . The following lemma is straightforward: Lemma 5.1. We have (B ⊗ B) ◦ ην = ηB(ν) ◦ (B ⊗ B). Now let V be any finite-dimensional Uq (g)-module and consider the composition PV ⊗V ∗ ,W RV W 0VB(µ) (B ⊗ B)ην : W [ν] ⊗ Mµ+ν → MB(µ)+h(V ) ⊗ W ⊗ V ⊗ V ∗ . By [EV2], Proposition 3.1, we have PV ⊗V ∗ ,W RV W 0VB(µ) ην = R W V (B(µ + ν))t2 0VB(µ+ν) . It follows from Lemma 5.1 that PV ⊗V ∗ ,W RV W 0VB(µ) (B ⊗ B)ην = ηh(W ) R W V (B(µ + ν))t2 0VB(µ+ν) (B ⊗ B),
(5.1)
where R(λ) = RI d (λ) is the quantum dynamical R-matrix corresponding to the trivial triple ( , , I d) and where t2 means transposition in the second component (so that 2λ R W V (B(µ + ν))t2 acts on W ⊗ V ∗ ). Now let us multiply both sides of (5.1) by qM µ ⊗W and sum over all values of ν. This yields 2λ PV ⊗V ∗ ,W RV W 0VB(µ) (B ⊗ B)qM µ ⊗W
= ηR W ⊗V (B(µ + h(W ) ))(B ⊗ B)q 2λ η−1 ,
(5.2)
∼ where η = ν ην : ν W [ν] ⊗ Mµ+ν → Mµ ⊗ W . Let us take the trace in the Verma modules and in W . Using the fact that R ∈ q h Uq (n+ ) ⊗ Uq (n− ) and that ν − B(ν) is never a linear combination of negative roots, we obtain (V ) Tr |W [ν] (R W V (B(µ + ν))t2 B)ϕVT (λ, µ + ν). Tr |W (q 2λ+h B)ϕVT (λ, µ) = ν
It is clear that (V )
Tr |W (q 2λ+h
B) =
ν,B(ν)=ν
(V )
Tr |W [ν] (q 2λ+h
B) = Tr |W h0 (q 2λ B).
Hence from (5.2) we get Tr |(W ∗ )h0 (q −2λ B ∗ )ϕVT (λ, µ) =
ν
∗ WV Tr |W ∗ [−ν] (BW (B(µ + ν))t1 t2 )ϕVT (λ, µ + ν), ∗R
which can be rewritten in terms of FVT (λ, µ) as Tr |(W ∗ )h0 (q −2λ B)FVT (λ, µ) ∗ WV Tr |W ∗ [ν] Q−1 (B(µ + ν))t1 t2 Q|V ∗ (B(µ + ν)) FVT (λ, µ + ν). = |V ∗ (B(µ))BW ∗ R ν∈l∗
(5.3) Finally, using the formula RW V (λ)t1 t2 = Q(λ) ⊗ Q(λ − h(1) ) RW ∗ V ∗ (λ − h(1) − h(2) ) Q−1 (λ − h(2) ) ⊗ Q−1 (λ) (5.4) (see [EV2], (3.12)) and using the fact that Q is of weight zero, we simplify (5.3) to Tr |(W ∗ )h0 (q −2λ )FVT (λ, µ) =
ν
=
ν
=
ν
(5.5)
∗ Tr |W ∗ [ν] BW ∗ QW ∗ (B(µ + ν))RW ∗ V ∗ (B(µ + ν) − ν
−h
(2)
)
(2) T d × Q−1 W ∗ (B(µ + ν) − h ) FV (λ, µ + ν)
−1 ∗ T Tr |W ∗ [ν] QW ∗ (B(µ + ν))RW ∗ V ∗ (µ)BW ∗ QW ∗ (B(µ + ν)) FV (λ, µ + ν) T ∗ Tr |W ∗ [ν] RW ∗ V ∗ (µ)BW ∗ FV (λ, µ + ν).
(5.6)
The twisted dual Macdonald–Ruijsenaars equations for an arbitrary number of modules V1 , . . . VN can now be deduced from (5.6) and from the fusion identity (3.10). Theorem 2.3 is proved. Proof of Proposition 2.1. Let W and W be the Weyl groups of and respectively. By the Bernstein–Gelfand–Gelfand resolution, we have Tr |Vν (q 2λ B) =
(−1)l(w) Tr |Mw(ν+ρ)−ρ q 2λ B .
w∈W
Denote by sα the simple reflection corresponding to the simple root α ∈ . The group generated by B acts on W by B(sα ) = sT α . It follows from the facts that W acts simply transitively on the sets of simple roots and that ν is dominant that B(w(ν + ρ) − ρ) =
w(ν + ρ) − ρ if and only if B(w) = w. Moreover, WB is naturally isomorphic to W. Hence, (−1)l(w) Tr |Mw(ν+ρ)−ρ q 2λ B = (−1)l(w) Tr |Mw(ν+ρ)−ρ q 2λ B w∈W
w∈W B
=
(−1)l(w)
w∈W
q 2(λ,w(ν+ρ)−ρ) . + 1 − θα q −2(α,λ) α∈
(5.7)
Let ωα be the fundamental weight corresponding to α ∈ . It is easy to check that
{ωα = (α,α) α∈α ωα , α ∈ } is the set of fundamental weights of . Thus (2λ, w(ν + 2Nα
ρ) − ρ) = (2λ, w(ν + ρ) − ρ), where ρ = α ωα . Hence, by the Weyl character formula for g we have Tr |Vν (q 2λ B) = χV ν (q 2λ ) where
δ q (λ) , δqT (λ)
1 − q −2(α,λ)
δ q (λ) = q 2(ρ,λ)
+
α∈
is the (usual) Weyl denominator for . Setting ν = 0 we see that in fact δ q (λ) = δqT (λ). The proposition follows. % & 6. The Twisted Dual qKZB Equations In this section we prove Theorem 2.4. As in the preceding section, let T be an automorphism of and let V1 , . . . VN be finite-dimensional Uq (g)-modules. We will extensively use the following two identities which are proved in [EV3]: V ∗ ∗ −1 V W = R−1 R21 (µ)0Vµ+h(W ∗ ) 0W 0W ∗ 0 µ = R21 R (µ) 0µ+h(W ∗ ) 0µ µ+h(V ) µ
(6.1)
for any two modules V , W . Consider
1 VN 2λ 2VT1 ,... ,VN (λ, µ) = Tr |Mµ 0VB(µ)+h B (∗2···∗N ) · · · 0B(µ) q
and move the j th intertwiner to the right using (6.1). We get 2 TV1 ,... ,VN (λ, µ)
∗ (∗j +2···∗N) −1 ∗ = Rj +1,j · · · RN,j qj2λ Rj,j ) · · · Rj,N (B(µ))−1 +1 (B(µ) + h VN 2λ Vj 1 × Tr |Mµ 0VB(µ)+h 0B(µ) B . (6.2) (∗2···∗N ) · · · 0B(µ)+h(∗j ) q
Now, we have VN 2λ Vj 1 0B(µ) B Tr |Mµ 0VB(µ)+h (∗2···∗N ) · · · 0B(µ)+h(∗j ) q V
N = BVj B ∗ ∗ Tr Mµ+h(∗j ) 0 j (∗j ) Vj
∗ Tr Mµ 0 = BVj B ∗ ∗ ∗j Vj
B(µ+h
Vj
)+
B(µ)+
N
i=1,i=j
i=1,i=j
h(∗)i
2λ N · · · 0VB(µ+h B (∗j ) ) q
VN 2λ · · · 0 q B , (∗i) B(µ)
h
(6.3)
where we note Vj = VjB for simplicity. Finally, we move 0Vj to the right back to its original position, thereby completing a cycle. By (6.1) we obtain −1
V Tr |Mµ 0 j
B(µ)+
=
N
i=1,i=j
h(∗i)
N · · · 0VB(µ) q 2λ B
−1 ∗ R−1 j,1 · · · Rj,j −1 R1,j B(µ) +
N
h
(∗i)
i=2,i=j
· · · Rj∗−1,j
N
B(µ) +
h
(∗i)
i=j +1
× 2VT1 ,...V ,...VN (λ, µ).
(6.4)
j
Combining (6.2), (6.3) and (6.4) yields the following relation: 2 TV1 ,... ,VN (λ, µ) −1 = Rj +1,j · · · RN,j qj2λ (Bj R−1 j,1 ) · · · (Bj Rj,j −1 ) × ×
∗ Rj,j +1
B(µ) +
N
h(∗i)
−1
−1 ∗ · · · Rj,N B(µ) B∗ −1 (j )
i=j +2
∗ × Bj∗ R1,j B(µ) +
N i=2,i=j
h(∗i) · · · Bj∗ Rj∗−1,j B(µ) +
(6.5)
N
h(∗i)
×
i=j +1
× BVj B ∗ ∗ 2VT1 ,... ,V ,...VN (λ, µ). Vj
j
Let us replace µ by −µ − ρ and let us rewrite this equation in terms of F T (λ, µ). We get F TV1 ,... ,VN (λ, µ) −1 −1 1···N −1 2λ 1···N = JT (λ) Rj +1,j · · · RN,j qj (Bj Rj,1 ) · · · (Bj Rj,j −1 )(Bj JT (λ)) ×
N −1 (∗i) Q−1 B(µ) − (B(µ)) ⊗ · · · ⊗ Q h ∗1 ∗N i=2
N −1 ∗ × R∗j,j +1 B(µ) − h(∗i) × R∗j,N (B(µ))−1 B∗−1 −1 (j ) Bj
(6.6)
i=j +2
N N × R∗1,j B(µ) − h(∗i) · · · Bj∗ R∗j −1,j B(µ) − h(∗i) i=2,i=j
i=j +1
× Bj Q∗N (B(µ)) ⊗ · · · ⊗ Q∗1 B(µ) −
N i=2,i=j
× BVj B ∗ ∗ FVT1 ,... ,V ,... ,VN (λ, µ). Vj
j
h
(∗i)
−B
−1
(h
(∗j )
)
Inverting, we obtain B Vj B ∗ ∗ FVT1 ,... ,V ,... ,VN (λ, µ) Vj
j
1···N = (Bj J1···N (λ))−1 (Bj Rj,1...j −1 )qj−2λ R−1 (λ) T j +1...N,j JT N −1 (∗i) −1 (∗j ) × Bj Q−1 B(µ) − (B(µ)) ⊗ · · · ⊗ Q h − B (h ) ∗1 ∗N i=2,i=j
N N (∗i) ∗ ∗−1 B(µ) − · · · × B B(µ) − × Bj∗ R∗−1 h R h(∗i) j 1,j j −1,j i=j +1
(6.7)
i=2,i=j
N × B∗ −1 (j ) R∗j,N (B(µ)) · · · R∗j,j +1 B(µ) − h(∗i) i=j +2
N × Q∗N (B(µ)) ⊗ · · · ⊗ Q∗1 B(µ) − h(∗i) × FVT1 ,... ,VN (λ, µ). i=2
Using (5.4) together with the fact that µ = B(µ) −
h(∗i) it is easy to check that
N −1 (∗i) −1 (∗j ) Kj∨,T = Bj Q−1 B(µ) − (B(µ)) ⊗ · · · ⊗ Q h − B (h ) ∗1 ∗N i=2,i=j
N N (∗i) ∗ ∗−1 (∗i) B(µ) − B(µ) − × · · · × B × Bj∗ R∗−1 h R h j 1,j j −1,j i=j +1
i=2,i=j
N × B∗ −1 (j ) R∗j,N (B(µ)) · · · R∗j,j +1 B(µ) − h(∗i) i=j +2
N × Q∗N (B(µ)) ⊗ · · · ⊗ Q∗1 B(µ) − h(∗i) . i=2
Finally, we have 1···N Dj∨,T = (Bj J1···N (λ))−1 (Bj Rj,1...j −1 )qj−2λ R−1 (λ) T j +1...N,j JT
(6.8)
when applied to (V1 ⊗ · · · ⊗ VN )l . The proof of this last equality is similar to the proof of [EV2] (4.10): it is enough to check (6.8) for N = 3, for which it follows from the modified ABRR equation (1.4). This concludes the proof of Theorem 2.4.
7. The Classical Limits Let us now examine the classical limits of Theorems 2.1–2.4, that is, the behavior of (a suitable renormalization of) the functions FVT1 ,... ,VN (λ, µ) when h¯ → 0. In that limit, the quantum group Uq (g) becomes the usual enveloping algebra U (g). We will denote by 0, QcT , RcT , JcT , . . . the classical limits of the operators constructed in Sect. 2.1, obtained when we replace Uq (g) by U (g). Let V1 , . . . VN be finite-dimensional Uq (g)-modules and let V1c , . . . VNc be the corresponding U (g) modules. Let us fix a generalized Belavin–Drinfeld triple ( 1 , 2 , T ) and set 2VT1,c,... ,VN = Tr |Mµc 0
V1c
N
µ +
(∗i) i=2 h
Vc
· · · 0µN Be−λ .
c (Be −λ ))−1 . We define the classical limit of the function Also set δ T (λ) = (Tr |M−ρ FVT1 ,... ,VN (λ, µ) as
FVT1,c,... ,VN (λ, µ) := lim FVT1 ,... ,VN h¯ →0
λ h¯
,µ .
The following result is clear from the definitions. Lemma 7.1. We have FVT1,c,... ,VN (λ, µ)
= δ T (λ)[Qc∗N (µ + h(∗1···∗N) )−1 ⊗ · · · ⊗ Qc∗1 (µ + h(∗1) )−1 ]2VT1,c,... ,VN (λ, −µ − ρ).
The classical analogue of Proposition 3.1 is as follows. Proposition 7.1. Let V be any finite-dimensional U (g)-module and let X ∈ U (g). T acting on functions l ∗ → V l such that (i) There exists a unique differential operator dX T Tr |Mµ (0Vµ Be−λ ). Tr |Mµ (0Vµ XBe−λ ) = dX T dT = dT dT . (ii) If X, Y belong to the center of U (g) then dX Y Y X
Unfortunately, there is no convenient classical analogue of the Drinfeld-Reshetikhin construction of central elements in Uq (g), and therefore no convenient explicit compuT in general. However, this can be done when X = m () is the tation of the operator dX 12 quadratic Casimir, which yields the following classical analogue of Theorem 2.1 (which is proved directly in [ES1], Theorem 3.2). Theorem 7.1 ([ES1]). The function FVT1,c,... ,VN (λ, µ) satisfies the following second order differential equation: i∈I1
r ∂2 − ST (λ)|Vl ⊗Vn FVT1,c,...VN (λ, µ) = (µ, µ)FVT1,c,... ,VN (λ, µ), ∂xi2 l,n=1
(7.1)
where (xi )i∈I1 (resp. (xi )i∈I2 ) is an orthonormal basis of l (resp. of h0 ) and where ST (λ) =
∞ ∞
e(s+v)(α,λ) (B s fα ⊗ B −v eα + B −v eα ⊗ B s fα )
α k=0 v=1
−
1 − CT 2
i∈I2
xi ⊗
1 − CT xi . 2
Theorem 7.1 can also be deduced from Theorem 2.1 by expanding in powers of h. ¯ The classical limit of Theorem 2.2 are the twisted (trigonometric) KZB equations. Theorem 7.2 ([ES1]). The function FVT1,c,... ,VN (λ, µ) satisfies the following system of differential equations, for j = 1, . . . N: ∂ xi|Vj + rT (λ)|Vj ⊗Vl − rT (λ)|Vl ⊗Vj FVT1,c,... ,VN (λ, µ) ∂xi i∈I1
l>j
l<j
j −1 1 = (µ + Ch )|Vj∗ + (h )|Vi∗ ⊗Vj∗ FVT1,c,... ,VN (λ, µ). 2
(7.2)
l=1
This theorem is proved in [ES1] but can again also be deduced from Theorem 2.2 by expanding in powers of h. ¯ Finally, when T is an automorphism of the Dynkin diagram we consider classical limits of the dual Macdonald–Ruijsenaars and dual qKZB equations. Let W be a B∨,T ,c invariant finite-dimensional g-module and let DW denote the difference operator given by formula (2.7) when Uq (g) is replaced by U (g). Theorem 7.3. We have ∨,T ,c T ,c DW FV1 ,... ,VN = Tr |W h0 (e−λ B)FVT1,c,... ,VN .
Similarly, let Kj∨,T ,c be the classical limit of Kj∨,T , i.e the difference operator given by formula (2.9) when q = 1 (and hence R(µ) is just the classical exchange matrix evaluated at −µ − ρ, see [EV3]). Theorem 7.4. For j = 1, . . . N we have BVj BV∗ ∗ FVT1,c,... ,VN = (e−λ )|Vj Kj∨,T ,c FVT ,c,... ,V B ,... ,V . j
1
j
N
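8. Extension to Kac–Moody Algebras
In this section we briefly explain how to adapt the construction of [ESS] to Kac–Moody algebras and how to generalize Theorems 2.1–2.4 to this setting. Let $A = (a_{ij})$ be a symmetrizable generalized Cartan matrix of size $n$ and rank $l$. Let $(\mathfrak{h}, \Gamma, \check{\Gamma})$ be a realization of $A$, i.e., $\mathfrak{h}$ is a complex vector space of dimension $2n - l$, $\Gamma = \{\alpha_1, \dots, \alpha_n\} \subset \mathfrak{h}^*$ and $\check{\Gamma} = \{h_1, \dots, h_n\} \subset \mathfrak{h}$ are linearly independent sets and $\langle \alpha_j, h_i \rangle = a_{ij}$. Let $\mathfrak{g} = \mathfrak{n}_- \oplus \mathfrak{h} \oplus \mathfrak{n}_+$ be the Kac–Moody algebra associated to $A$, i.e., $\mathfrak{g}$ is generated by elements $e_i, f_i$, $i = 1, \dots, n$ and $\mathfrak{h}$ with relations
\[ [e_i, f_j] = \delta_{ij} h_i, \qquad [h, h'] = 0, \qquad [h, e_i] = \langle \alpha_i, h \rangle e_i, \qquad [h, f_i] = -\langle \alpha_i, h \rangle f_i \qquad (h, h' \in \mathfrak{h}), \]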
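together with the Serre relations (see [K]). Let $( \, , \, )$ be a nondegenerate invariant bilinear form on $\mathfrak{g}$. Let $\Omega_{\mathfrak{h}}$ be the inverse element to the restriction of $( \, , \, )$ to $\mathfrak{h}$. For every root $\alpha \in \mathfrak{h}^*$ we set $\alpha^* = (1 \otimes \alpha)\Omega_{\mathfrak{h}}$. Let $U_q(\mathfrak{g})$ be the quantum Kac–Moody algebra. It is defined by the same relations as in Sect. 1, where now $(a_{ij})$ is the generalized Cartan matrix $A$.
Construction of the twist. Let $(\Gamma_1, \Gamma_2, T)$ be a generalized Belavin–Drinfeld triple. As before we set $\mathfrak{l} = \bigl(\sum_{\alpha \in \Gamma_1} \mathbb{C}(\alpha - T\alpha)\bigr)^{\perp}$ and $\mathfrak{h}_0 = \mathfrak{l}^{\perp} \subset \mathfrak{h}$. We will say that $(\Gamma_1, \Gamma_2, T)$ is nondegenerate if the restriction of $( \, , \, )$ to $\mathfrak{l}$ is, and we make this assumption from now on. Let $\mathfrak{h}_1 \subset \mathfrak{h}$ (resp. $\mathfrak{h}_2 \subset \mathfrak{h}$) be the subspace spanned by simple roots $\alpha \in \Gamma_1$ (resp. $\alpha \in \Gamma_2$). In [ESS] we obtained an explicit construction of a twist $J_T(\lambda)$ for simple complex Lie algebras. An important observation there was that $\mathfrak{h} = \mathfrak{h}_1 + \mathfrak{l}$, which makes it possible to extend $B$ to an orthogonal automorphism of $\mathfrak{h}$, and to define maps $B^{\pm 1} : U_q(\mathfrak{b}_{\mp}) \to U_q(\mathfrak{b}_{\mp})$. However, in general we only have $\mathfrak{h}_1 + \mathfrak{l} \subset \mathfrak{h}$ but $\mathfrak{h}_1 + \mathfrak{l} \neq \mathfrak{h}$, and thus it is necessary to modify the construction in [ESS], which is done below. The following lemma is obvious.
Lemma 8.1. There exist unique algebra morphisms $B : U_q(\mathfrak{n}_- \oplus \mathfrak{h}_1) \to U_q(\mathfrak{n}_- \oplus \mathfrak{h}_2)$ and $B^{-1} : U_q(\mathfrak{n}_+ \oplus \mathfrak{h}_2) \to U_q(\mathfrak{n}_+ \oplus \mathfrak{h}_1)$ such that $B(F_\alpha) = F_{T\alpha}$, $B(h_\alpha) = h_{T\alpha}$ if $\alpha \in \Gamma_1$, $B(F_\alpha) = 0$ if $\alpha \in \Gamma \setminus \Gamma_1$, and $B^{-1}(E_\alpha) = E_{T^{-1}\alpha}$, $B^{-1}(h_\alpha) = h_{T^{-1}\alpha}$ if $\alpha \in \Gamma_2$, $B^{-1}(E_\alpha) = 0$ if $\alpha \in \Gamma \setminus \Gamma_2$.
Let $\alpha, \beta \in \Gamma$. Write $\alpha \to \beta$ if there exists $l \ge 0$ such that $T^l(\alpha) = \beta$. We extend this relation to $\mathbb{Z}_+\Gamma$ by setting $\alpha \to \beta$ if there exist $\alpha_1, \dots, \alpha_r, \beta_1, \dots, \beta_r \in \Gamma$ such that $\alpha_i \to \beta_i$ for $i = 1, \dots, r$ and $\alpha = \sum_i \alpha_i$, $\beta = \sum_i \beta_i$. It is easy to see that this relation is transitive, i.e., if $\alpha \to \beta$ and $\beta \to \gamma$ then $\alpha \to \gamma$. Set
\[ (\mathbb{Z}_+\Gamma)_{\to\alpha} = \{\sigma \in \mathbb{Z}_+\Gamma,\ \sigma \to \alpha\}, \qquad (\mathbb{Z}_+\Gamma)_{\alpha\to} = \{\sigma \in \mathbb{Z}_+\Gamma,\ \alpha \to \sigma\}. \]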
Now let us consider the space
+ + ∗ ∗ IT = Uq (n− )[−α]q (Z →α ) ⊗ Uq (n+ )[β]q (−Z β→ ) ⊂ Uq (b− ) ⊗ Uq (b+ ). β→α
Lemma 8.2. The space IT is stable under the actions of B ⊗ 1, 1 ⊗ B −1 and Ad(q h ). Proof. Note that the actions of (B ⊗ 1) and (1 ⊗ B −1 ) are well-defined on IT as B(Uq (n− )[−α]) = 0 if α ∈ Z+ 1 and B −1 (Uq (n+ )[β]) = 0 if β ∈ Z+ 2 . It is clear that (B ⊗ 1)IT ⊂ IT and (1 ⊗ B −1 )IT ⊂ IT . The last claim in the lemma follows easily from the formula ∗
∗
Ad(q h )(uα ⊗ vβ ) = uα q β ⊗ q −α vβ if uα ∈ Uq (n− )[−α] and vβ ∈ Uq (n+ )[β].
& %
Note that the Cayley transform CT : h0 → h0 is still well-defined in the Kac–Moody setting. Set Z = 21 ((1 − CT ) ⊗ 1)h0 . Let I T be the completion of IT with respect to ∗ the principal gradings in Uq (b± ) and let I T be the subspace consisting of elements of strictly negative degree in the first component and strictly positive degree in the second component.
Theorem 8.1. There exists a unique element JT0 (λ) : l∗ → (1 + I T )l such that R21 q12λ B1 JT0 (λ) = JT0 (λ)q12λ q h .
(8.1)
Moreover JT (λ) := JT0 (λ)q Z satisfies the 2-cocycle relation 1 1 JT12,3 (λ)JT12 λ + h(3) = JT1,23 (λ)JT23 λ − h(1) . 2 2 Proof. The first statement is proved exactly as in [ESS]. We write JT0 (λ) = 1 +
0,i 0,i i≥0 JT (λ), where JT (λ) has degree i in the first component. Then (8.1) is equivalent to a system of equations labelled by j ≥ 1, Ad(q h q12λ )B1 JT (λ) = JT (λ) + . . . , 0,j
0,j
where . . . stands for terms involving JT0,i (λ) with i < j . But the operator Ad(q h q12λ ) 0,j B1 − 1 is invertible on ITl for generic λ and JT (λ) can be computed recursively. The second claim is proved as [ESS], Sect. 4. We consider the three-components versions of (8.1):
R21 R31 q12λ B1 XT0 (λ) = XT0 (λ)q12h q13h , R32 R21 q3−2λ B3−1 XT0 (λ) =
XT0 (λ)q12h q13h ,
(8.2) (8.3)
acting on (a suitable completion of) the space
+ ∗ + ∗ Uq (n− )[−α]q (Z →α ) ⊗ Uq (g)[β] ⊗ Uq (n+ )[γ ]q (−Z γ → ) , α,β,γ
where the sum runs over all triples (α, β, γ ) such that β can be written as β = β + − β − , Z (J 0 (λ + where β + + γ → β − + α. It is not difficult to show that (JT0 (λ))1,23 Ad q1,23 T 1 (1) 23 1 (3) 12 0 Z 0 12,3 h )) and (J (λ)) Ad q (J (λ+ h )) are two solutions of (8.2) and (8.3) 12,3 T T 2 2 with the same degree zero terms (in component 1 or in component 3). This implies that they are equal (see [ESS, Lemma 4.3]). % & Now let V1 , . . . , VN be Uq (g)-modules from the category O. Define the renormalized twisted trace functions FVT1 ,... ,VN (λ, µ) in the same way as in Sect. 2. Note that all the operators JT (λ), RT (λ), QT (λ), . . . are well-defined on any module from the category O when considered as formal powers series in q 2(λ,µ) C[[q −(λ,αi ) , q −(µ,αi ) ]]αi ∈ . Operators DW for affine algebras g are defined in some particular situation in [E3]. Theorem 8.2. The function FVT1 ,... ,VN (λ, µ) satisfies the following difference equation for all j = 1, . . . , N: FVT1 ,... ,VN (λ, µ) = (DjT ⊗ KjT )FVT1 ,... ,VN (λ, µ), where DjT and KjT are defined by (2.4) and (2.5).
(8.4)
Theorem 8.3. Let T be an automorphism of . The functions FVT1 ,... ,VN (λ, µ) satisfy the following difference equation for each j = 1 . . . , N: BVj BV∗ ∗ FVT1 ,... ,Vj ,... ,VN (λ, µ) = (Dj∨,T ⊗ Kj∨,T )FVT ,... ,V B ,... ,V (λ, µ), j
1
j
N
(8.5)
where Dj∨,T and Kj∨,T are defined by (2.9). The above two theorems are proved in the same way as Theorems 2.2 and 2.4 respectively. Similarly, let W be an integrable highest weight Uq (g)-module (resp. a B-invariant integrable highest weight Uq (g)-module) and let V1 , . . . , VN be Uq (g)-modules from the category O. Theorem 8.4. T T FV1 ,... ,VN (λ, µ) = χW (q −2µ )FVT1 ,... ,VN (λ, µ), DW
(8.6)
T is defined by (2.2). where DW
Theorem 8.5. Let T be an automorphism of . Then ∨,T T DW FV1 ,... ,VN (λ, µ) = Tr |W h0 (q −2λ B)FVT1 ,... ,VN (λ, µ),
(8.7)
∨,T is defined by (2.7). where DW
The proof of the above two theorems is the same as in the finite-dimensional case.
Remark. The integrability condition on the module W is not essential. The classical limits of Theorems 8.2–8.5 are analogous to the corresponding classical limits of Theorems 2.1–2.4 in Sect. 7.
Acknowledgements. The first author was partially supported by the NSF grant DMS-9700477. The work of both authors was partly done when they were employed by the Clay Mathematics Institute as CMI Prize Fellows. O.S. would like to thank the MIT Mathematics Department for its hospitality.
References
[D] Drinfeld, V.G.: On almost cocommutative Hopf algebras. Leningrad Math. J. 1 no. 2, 321–342 (1990)
[E1] Etingof, P.: Representations of affine Lie algebras, elliptic r-matrix systems, and special functions. Commun. Math. Phys. 159, 471–502 (1990)
[E2] Etingof, P.: Difference equations with elliptic coefficients and quantum affine algebras. Preprint hep-th/9312057 (1993)
[E3] Etingof, P.: Central elements for quantum affine algebras and affine Macdonald’s operators. Math. Res. Lett. 2, 611–628 (1995)
[EK] Etingof, P., Kirillov, A. Jr.: Macdonald’s polynomials and representations of quantum groups. Math. Res. Lett. 1(3), 279–296 (1994)
[ESS] Etingof, P., Schedler, T., Schiffmann, O.: Explicit quantization of dynamical r-matrices for finite-dimensional simple Lie algebras. J. Am. Math. Soc. 13 no. 3, 595–609 (2000)
[ES1] Etingof, P., Schiffmann, O.: Twisted traces of intertwiners for Kac–Moody algebras and classical dynamical r-matrices corresponding to generalized Belavin–Drinfeld triples. Math. Res. Lett. 6, 593–612 (1999)
[ES2] Etingof, P., Schiffmann, O.: Lectures on the dynamical Yang–Baxter equations. Preprint math.QA/9908064, to appear in the Proceedings of the Durham Quantum Groups Conference, July 1999
[EV1] Etingof, P., Varchenko, A.: Geometry and classification of solutions of the classical dynamical Yang–Baxter equation. Commun. Math. Phys. 192, 77–120 (1998)
[EV2] Etingof, P., Varchenko, A.: Traces of intertwiners for quantum groups and difference equations, I. Duke Math. J. 104, 391–432 (2000)
[EV3] Etingof, P., Varchenko, A.: Exchange dynamical quantum groups. Commun. Math. Phys. 205, 19–52 (1999)
[FTV] Felder, G., Tarasov, V., Varchenko, A.: Monodromy of solutions of the elliptic quantum Knizhnik–Zamolodchikov–Bernard difference equations. Internat. J. Math. 10 no. 8, 943–975 (1999)
[FR] Frenkel, I., Reshetikhin, N.: Quantum affine algebras and holonomic difference equations. Commun. Math. Phys. 146, 1–60 (1992)
[Fro] Fronsdal, C.: Quasi-Hopf deformations of quantum groups. Lett. Math. Phys. 40 no. 2, 117–134 (1997)
[JM] Jimbo, M., Miwa, T.: Algebraic analysis of solvable lattice models. CBMS Regional Conference Series in Mathematics, 85. Providence, RI: American Mathematical Society, 1995
[K] Kac, V.: Infinite-dimensional Lie algebras. Cambridge: Cambridge University Press, 1990
[Ki] Kirillov, A. Jr.: Traces of intertwining operators and Macdonald’s polynomials. PhD Thesis, Yale University (1995)
[M] Majid, S.: Foundations of quantum group theory. Cambridge: Cambridge University Press, 1995
[R] Reshetikhin, N.Y.: Quasitriangular Hopf algebras and invariants of tangles. Leningrad Math. J. 1 no. 2, 491–513 (1990)
[S] Schiffmann, O.: On classification of dynamical r-matrices. Math. Res. Lett. 5 no. 1–2, 13–30 (1998)
[Xu] Xu, P.: Quantum groupoids. math.QA/9905192 (1999)
Communicated by T. Miwa
Commun. Math. Phys. 218, 665 – 685 (2001)
Communications in
Mathematical Physics
© Springer-Verlag 2001
Symmetry and Resonance in Periodic FPU Chains
Bob Rink
Mathematical Institute, Utrecht University, PO Box 80,010, 3508 TA Utrecht, The Netherlands. E-mail: [email protected]
Received: 25 July 2000 / Accepted: 20 December 2000
Abstract: The symmetry and resonance properties of the Fermi–Pasta–Ulam chain with periodic boundary conditions are exploited to construct a near-identity transformation bringing this Hamiltonian system into a particularly simple form. This “Birkhoff–Gustavson normal form” retains the symmetries of the original system and we show that in most cases this allows us to view the periodic FPU Hamiltonian as a perturbation of a nondegenerate Liouville integrable Hamiltonian. According to the KAM theorem this proves the existence of many invariant tori on which motion is quasiperiodic. Experiments confirm this qualitative behaviour. We note that one cannot expect this in lower-order resonant Hamiltonian systems. So the periodic FPU chain is an exception and its special features are caused by a combination of special resonances and symmetries.
1. Introduction
The n-particle FPU chain with periodic boundary conditions is a model for point masses moving on a circle with nonlinear forces acting between the nearest neighbours. It is in fact the n degrees of freedom Hamiltonian system on $\mathbb{R}^{2n}$ induced by the real-analytic Hamiltonian
\[ H = \sum_{j \in \mathbb{Z}/n\mathbb{Z}} \frac{1}{2} p_j^2 + V(q_{j+1} - q_j), \tag{1.1} \]
in which $V : \mathbb{R} \to \mathbb{R}$ is a real-analytic potential energy function of the form
\[ V(x) = \frac{1}{2!} x^2 + \frac{\alpha}{3!} x^3 + \frac{\beta}{4!} x^4 + \dots . \tag{1.2} \]
The $\alpha, \beta, \dots$ are real parameters measuring the nonlinearity in the forces between the particles in the chain.
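As a concrete illustration of (1.1) and (1.2), the following minimal sketch evaluates the periodic FPU Hamiltonian numerically. It assumes that the potential is truncated after the quartic term; the function names `V` and `fpu_hamiltonian` are hypothetical and not taken from the paper.

```python
def V(x, alpha=0.0, beta=0.0):
    """Potential (1.2), truncated after the quartic term (an assumption)."""
    return x**2 / 2 + alpha * x**3 / 6 + beta * x**4 / 24


def fpu_hamiltonian(q, p, alpha=0.0, beta=0.0):
    """Periodic FPU Hamiltonian (1.1); particle indices are taken modulo n."""
    n = len(q)
    kinetic = sum(pj**2 for pj in p) / 2
    potential = sum(V(q[(j + 1) % n] - q[j], alpha, beta) for j in range(n))
    return kinetic + potential
```

For example, `fpu_hamiltonian([1.0, 0.0, 0.0], [0.0, 0.0, 0.0])` sums the harmonic energy of the two stretched bonds next to the displaced particle.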
Numerically, the FPU system was first studied by E. Fermi, J. Pasta and S. Ulam, see [4]. These authors used the chain as a model for a string of which the elements interact in a nonlinear way. They expected that in the presence of small nonlinearities, the chain would show ergodic behaviour, meaning that almost all orbits densely fill up an energy-level set of the Hamiltonian. Ergodicity would eventually lead to an equal distribution of energy between the various Fourier modes of the system, a concept called thermalisation. FPU’s now famous numerical experiment was intended to investigate at what timescale thermalisation would take place. The result was astonishing: it turned out that there was no sign of thermalisation at all. Putting initially all the energy in one Fourier mode, they observed that this energy was shared by only a few other modes; the remaining modes were hardly excited. Additionally, within a not too long time the system returned close to its initial state. On increasing the strength of the nonlinearity, this recurrence occurred even earlier. Later computations, e.g. described in [9], confirmed that the same phenomena can also be observed in very large periodic chains. Empirical evidence was found that for small total energy, normal mode energies are hardly shared. Ergodic behaviour can only be observed when the energy level passes a certain critical value.
In 1965 an article of Zabusky and Kruskal appeared, cf. [18]. These authors considered the Korteweg–de Vries equation as a continuum limit of the FPU chain and numerically found the first indications for the stable behaviour of solitary waves, thereby suggesting an explanation for the striking data of the FPU experiment. In 1967, Gardner, Greene, Kruskal and Miura ([6]) discovered infinitely many conserved quantities for the KdV equation, which should account for the regular behaviour of its solutions. Reference [11] contains a good overview of these results. They are suggestive, but do not provide a full explanation of FPU’s observations as the impact of the transition from a discrete to a continuous chain has never been analysed.
There is another, possibly correct explanation for the quasiperiodic behaviour of the FPU system. It is based on the Kolmogorov–Arnol’d–Moser (KAM) theorem (cf. [2]) and, unlike the Zabusky–Kruskal argument, it should work especially well for chains with a low number of particles. As is well-known (cf. [2]), the general solution of an n degrees of freedom Liouville integrable Hamiltonian system is constrained to move in an n-dimensional torus and is not at all ergodic but periodic or quasiperiodic. The KAM theorem states that most of the invariant tori of a nondegenerate integrable system persist under small Hamiltonian perturbations. Thus many authors, starting with Izrailev and Chirikov in [7], have stated that the KAM theorem explains the observations of the FPU experiment. This reasoning seems plausible, but, as was clearly pointed out by Ford in [5], it is still completely unclear why the FPU system should be a perturbation of such a nondegenerate integrable system. This gap in the theory was recently mentioned again in the book of Weissert ([17]).
What does “nondegenerate” mean here? Let us consider the frequency map ω, which assigns to each n-dimensional invariant torus of a Liouville integrable system the n-dimensional vector of frequencies of the (quasi)periodic motion on this torus. An integrable system is called “nondegenerate” if ω is a local diffeomorphism. The KAM theorem holds for perturbations of these nondegenerate integrable systems. But it is no exception for an integrable system to be degenerate. A common example is the harmonic oscillator of which the frequency map is constant: the harmonic oscillator is highly degenerate. And indeed, perturbations of it are known that are ergodic even on low-energy level sets of the Hamiltonian.
Ford gives a nice example of such a perturbation in his review article [5]. We conclude that, although the FPU Hamiltonian can be considered
as a perturbation of an integrable system – namely the harmonic oscillator – the KAM theorem does not apply that easily! The only serious attempt to overcome this problem was made in 1971 in a paper by T. Nishida, [10]. This author considers the FPU chain with fixed endpoints and cubic nonlinearities. He explains that, under the assumption that the linear frequencies of the chain are nonresonant, there is a nonlinear symplectic near-identity transformation of phase space, called the Birkhoff transformation, with the following property: written out in the new coordinates, the Hamiltonian function of the chain turns out to be a quadratic function of the normal mode energies plus a higher order perturbation. His article consists mainly of a computation of this normal form, that is the quadratic function of the normal mode energies. Finally, Nishida verifies the KAM condition for this perturbation of an integrable system. This proves that most solutions of low energy are quasiperiodic ... if the frequencies are non-resonant! Nishida refers to an unpublished result of Izumi in which they are proven to be so if the number of particles n in the chain is prime or a power of 2. Otherwise, the frequencies are resonant, so he is in trouble. No solution is given. The aim of this paper is first of all to compute all lower order resonances in the eigenvalues of the FPU chain. This immediately increases the amount of chains for which Nishida’s statements are valid. But we will mainly focus on the periodic chain. The periodic chain has two important properties: first of all its eigenvalues display many obvious 1:1-resonances. But on the other hand, the periodic chain has nice symmetry properties and it turns out that these symmetry properties overrule all the possible nontrivial lower order resonances in the eigenvalues. 
The conclusion is that, independent of the number of particles in the chain, we find a near-identity transformation of phase-space that brings the FPU Hamiltonian into a very simple form, the so-called Birkhoff–Gustavson normal form. This normal form is nondegenerately integrable in many cases. It must be stressed that it is highly exceptional that one can do this for a resonant Hamiltonian system such as the periodic FPU chain. The current paper intends to make clear that the special symmetry, eigenvalue and resonance characteristics of the periodic FPU system play a crucial role in the construction of the Birkhoff–Gustavson normal form. It turns out that these characteristics cause the nondegenerate near-integrability of the chain. The conclusion is that the KAM theorem applies because of these resonance and symmetry properties: the quasiperiodic behaviour that Fermi, Pasta and Ulam observed is in some sense an exceptional feature of the FPU system.
1.1. Outline of the paper. This paper is a continuation of [13] in which normal forms of small chains are computed and the KAM theorem is verified. We generalize and explain the results of [13] in this paper. In Sects. 2–6 the necessary theory is formulated. We start with an investigation of the eigenvalues (Sect. 2) and the discrete symmetries (Sect. 4) of the periodic FPU chain. The concept of a Birkhoff–Gustavson normal form as an approximation of a Hamiltonian system is explained in Sect. 5. It will be shown that normal forms for the periodic FPU chain exist that inherit its symmetry properties. In the appendix, which is based on notes of Beukers, number theory is used to compute all lower order resonances in the eigenvalues. We exploit this in Sects. 7 and 8 to prove Theorem 8.2, which forms the core of this paper: it gives the restrictions that the Birkhoff–Gustavson normal form of any Hamiltonian with the same eigenvalues and symmetries as the periodic FPU chain, is subject to.
These restrictions on the normal form allow us to point out many near-integrals of the chain in Sect. 9. We finish with an analysis of the β-chain, which is proved to be near-integrable in Sect. 10. The KAM nondegeneracy condition can easily be checked when the β-chain contains an odd number of particles. Some open questions are formulated for the even β-chain.
2. Phonons
To establish the sign conventions that we shall stick to during our analysis, some basic definitions follow here. For further reading on Hamiltonian systems and a thorough explanation of these concepts, the reader is referred to [1]. We shall be considering Hamiltonian systems of differential equations on $\mathbb{R}^{2n}$, the elements of which are denoted by $(q, p) = (q_1, \dots, q_n, p_1, \dots, p_n)$. On $\mathbb{R}^{2n}$ the symplectic form $\sigma := \sum_{j=1}^{n} dq_j \wedge dp_j$ is defined. Endowed with this symplectic form, $\mathbb{R}^{2n}$ is a symplectic space. Any Hamiltonian function $H : \mathbb{R}^{2n} \to \mathbb{R}$ induces a Hamiltonian vector field $X_H$ on $\mathbb{R}^{2n}$ which is defined by $\sigma(X_H, \cdot) = dH$. Furthermore, for any two Hamiltonians $F$ and $G$ the Poisson brackets are defined as $\{F, G\} := \sigma(X_F, X_G) = dF \cdot X_G = -dG \cdot X_F$.
Keeping these definitions in mind, we now start our analysis of the periodic FPU chain. In order to simplify the equations of motion induced by the periodic FPU Hamiltonian (1.1), we apply a well-known Fourier transformation $(q, p) \mapsto (\bar q, \bar p)$. For $1 \le j < \frac{n}{2}$ define
\[ \bar q_j = \sqrt{\frac{2}{n}} \sum_{k=1}^{n} \cos\Bigl(\frac{2jk\pi}{n}\Bigr) q_k, \qquad \bar p_j = \sqrt{\frac{2}{n}} \sum_{k=1}^{n} \cos\Bigl(\frac{2jk\pi}{n}\Bigr) p_k, \]
\[ \bar q_{n-j} = \sqrt{\frac{2}{n}} \sum_{k=1}^{n} \sin\Bigl(\frac{2jk\pi}{n}\Bigr) q_k, \qquad \bar p_{n-j} = \sqrt{\frac{2}{n}} \sum_{k=1}^{n} \sin\Bigl(\frac{2jk\pi}{n}\Bigr) p_k. \tag{2.1} \]
Furthermore, define
\[ \bar q_n = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} q_k, \qquad \bar p_n = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} p_k, \tag{2.2} \]
and if $n$ is even,
\[ \bar q_{\frac{n}{2}} = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} (-1)^k q_k, \qquad \bar p_{\frac{n}{2}} = \frac{1}{\sqrt{n}} \sum_{k=1}^{n} (-1)^k p_k. \tag{2.3} \]
The new coordinates $(\bar q, \bar p)$ are known as “phonons”. The transformation to phonons is symplectic, that is $\sigma = \sum_{j=1}^{n} d\bar q_j \wedge d\bar p_j$. For a proof, cf. [12] or [13]. In phonon coordinates, the Hamiltonian reads
\[ H = \sum_{j=1}^{n} \frac{1}{2} (\bar p_j^2 + \omega_j^2 \bar q_j^2) + H_3(\bar q_1, \dots, \bar q_{n-1}) + H_4(\bar q_1, \dots, \bar q_{n-1}) + \dots, \tag{2.4} \]
in which $H_k$ ($k = 2, 3, \dots$) denotes the $k$-th order part of $H$; for $j = 1, \dots, n$, the numbers $\omega_j$ are the eigenvalues of the linear periodic FPU problem:
\[ \omega_j := 2 \sin\Bigl(\frac{j\pi}{n}\Bigr). \tag{2.5} \]
Exact expressions for $H_3$ and $H_4$ in terms of the $\bar q_j$ can be found in the literature, cf. [12]. We do not repeat them. The linearised equations are the equations induced by $H_2$. They read:
\[ \dot{\bar q}_j = \bar p_j, \qquad \dot{\bar p}_j = -\omega_j^2 \bar q_j. \tag{2.6} \]
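The phonon transformation (2.1)–(2.3) and the eigenvalues (2.5) can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original argument: it builds the coefficient matrix of the map $q \mapsto \bar q$, tests that it is orthogonal (so that applying the same matrix to $q$ and $p$ preserves $\sigma$), and expresses the quadratic potential $\sum_k (q_{k+1} - q_k)^2$ in phonon coordinates. All function names are hypothetical.

```python
import math


def phonon_matrix(n):
    """Rows hold the coefficients of q̄_1, ..., q̄_n from (2.1)-(2.3)."""
    rows = [[0.0] * n for _ in range(n)]
    for j in range(1, n + 1):
        for k in range(1, n + 1):
            if j == n:                      # centre of mass, (2.2)
                c = 1 / math.sqrt(n)
            elif 2 * j == n:                # alternating mode, (2.3), n even
                c = (-1) ** k / math.sqrt(n)
            elif 2 * j < n:                 # cosine modes, (2.1)
                c = math.sqrt(2 / n) * math.cos(2 * j * k * math.pi / n)
            else:                           # sine modes q̄_{n-j'}, j' = n - j
                c = math.sqrt(2 / n) * math.sin(2 * (n - j) * k * math.pi / n)
            rows[j - 1][k - 1] = c
    return rows


def frequencies(n):
    """The numbers ω_j = 2 sin(jπ/n) of (2.5), for j = 1, ..., n."""
    return [2 * math.sin(j * math.pi / n) for j in range(1, n + 1)]


def is_orthogonal(rows, tol=1e-12):
    n = len(rows)
    return all(
        abs(sum(rows[a][k] * rows[b][k] for k in range(n)) - (a == b)) < tol
        for a in range(n)
        for b in range(n)
    )


def quadratic_potential_in_phonons(n):
    """Matrix of the form sum_k (q_{k+1} - q_k)^2 in the coordinates q̄."""
    m = phonon_matrix(n)
    # Apply the periodic second-difference operator to every row of m.
    lm = [[2 * r[k] - r[(k + 1) % n] - r[(k - 1) % n] for k in range(n)] for r in m]
    return [[sum(m[a][k] * lm[b][k] for k in range(n)) for b in range(n)] for a in range(n)]
```

Up to rounding, `quadratic_potential_in_phonons(n)` should be diagonal with entries $\omega_j^2$, which is the quadratic part of (2.4). The 1:1 resonances $\omega_j = \omega_{n-j}$ mentioned in the introduction show up as repeated values in `frequencies(n)`.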
The $\bar q_j, \bar p_j$ ($1 \le j \le n-1$) are harmonics with frequency $\omega_j$; $\bar p_n$ is constant, whereas $\bar q_n$ increases with constant speed (note that $\omega_n = 0$). In fact, the linearised equations are Liouville integrable, with integrals $E_j := \frac{1}{2}(\bar p_j^2 + \omega_j^2 \bar q_j^2)$. The nonlinear equations ($\alpha$ or $\beta$ unequal to zero) are much harder to analyse. The $E_j$ are for instance no longer constants of motion.
3. Reduction of a Continuous Symmetry Group
From (2.4) we see that $H$ is independent of $\bar q_n$, even if $\alpha, \beta, \dots \neq 0$. This implies that $\bar p_n$ is an integral of $H$. The set $\bar p_n^{-1}(\{0\})$ defines a $2n-1$ dimensional hyperplane in $\mathbb{R}^{2n}$, invariant under the flow of both $X_H$ and $X_{\bar p_n}$. The flow of $X_{\bar p_n} = \frac{\partial}{\partial \bar q_n}$ induces a symplectic $\mathbb{R}$-action on this hyperplane. The time-$t$ flow $e^{tX_{\bar p_n}}$ is actually given by
\[ e^{tX_{\bar p_n}} : \sum_{j=1}^{n} \Bigl( \bar q_j \frac{\partial}{\partial \bar q_j} + \bar p_j \frac{\partial}{\partial \bar p_j} \Bigr) \mapsto \sum_{j=1}^{n} \Bigl( \bar q_j \frac{\partial}{\partial \bar q_j} + \bar p_j \frac{\partial}{\partial \bar p_j} \Bigr) + t \frac{\partial}{\partial \bar q_n}, \tag{3.1} \]
or written out in the original coordinates:
\[ e^{tX_{\frac{1}{\sqrt{n}} \sum_k p_k}} : \sum_{j=1}^{n} \Bigl( q_j \frac{\partial}{\partial q_j} + p_j \frac{\partial}{\partial p_j} \Bigr) \mapsto \sum_{j=1}^{n} \Bigl( \bigl(q_j + \tfrac{t}{\sqrt{n}}\bigr) \frac{\partial}{\partial q_j} + p_j \frac{\partial}{\partial p_j} \Bigr). \tag{3.2} \]
The orbits of this flow are the lines $(\bar q, \bar p) + \mathbb{R} \frac{\partial}{\partial \bar q_n}$. It is clear that the $2n-2$ dimensional hyperplane $\bar q_n^{-1}(\{0\}) \cap \bar p_n^{-1}(\{0\}) \cong \mathbb{R}^{2n-2}$ is transversal to these orbits. Therefore, $\mathbb{R}^{2n-2}$ is a model for the space $\bar p_n^{-1}(\{0\})/\mathbb{R}$ of $X_{\bar p_n}$-orbits lying in $\bar p_n^{-1}(\{0\})$. $\mathbb{R}^{2n-2}$ inherits the symplectic structure $\tilde\sigma := \sum_{j=1}^{n-1} d\bar q_j \wedge d\bar p_j$ from $\mathbb{R}^{2n}$. And since the FPU Hamiltonian $H$ is constant on the orbits of the flow of $X_{\bar p_n}$, $H$ reduces to a Hamiltonian on $\mathbb{R}^{2n-2}$ given by
\[ H = \sum_{j=1}^{n-1} \frac{1}{2} (\bar p_j^2 + \omega_j^2 \bar q_j^2) + H_3(\bar q_1, \dots, \bar q_{n-1}) + H_4(\bar q_1, \dots, \bar q_{n-1}) + \dots . \tag{3.3} \]
The reduced Hamiltonian (3.3) represents the periodic FPU system from which the centre of mass motion has been eliminated. Since $\omega_j^2 > 0$ ($1 \le j \le n-1$), we conclude with the Morse-Lemma (cf. [1]) that the level sets of $H$ are $2n-3$ dimensional spheres around the origin of $\mathbb{R}^{2n-2}$. And since $H$ is a constant of motion for the flow of $X_H$, we see that the origin is a stable stationary point for the reduced system induced by the reduced Hamiltonian (3.3).
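The conservation of $\bar p_n$ that underlies this reduction can be illustrated in the original coordinates: the forces $\dot p_j = -\partial H/\partial q_j$ telescope around the ring, so the total momentum $\sqrt{n}\,\bar p_n = \sum_j p_j$ has zero time derivative. The sketch below assumes the quartic truncation of (1.2); the names `dV`, `forces` and `total_force` are hypothetical.

```python
def dV(x, alpha=0.0, beta=0.0):
    """Derivative of the potential (1.2), truncated after the quartic term."""
    return x + alpha * x**2 / 2 + beta * x**3 / 6


def forces(q, alpha=0.0, beta=0.0):
    """dp_j/dt = -dH/dq_j for the periodic chain; indices are taken modulo n."""
    n = len(q)
    return [
        dV(q[(j + 1) % n] - q[j], alpha, beta) - dV(q[j] - q[j - 1], alpha, beta)
        for j in range(n)
    ]


def total_force(q, alpha=0.0, beta=0.0):
    """Time derivative of the total momentum; it should vanish up to rounding."""
    return sum(forces(q, alpha, beta))
```

Because the forces depend only on the differences $q_{j+1} - q_j$, they are also unchanged by the shift $q_j \mapsto q_j + t/\sqrt{n}$ of (3.2).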
4. Discrete Symmetries Apart from the continuous family of symmetries of the previous section, the FPU Hamiltonian has some discrete symmetries. These have important dynamical consequences. The first discrete symmetry is a rotation symmetry. Let T : R2n → R2n denote the circle permutation, the unique linear mapping defined by T :
∂ ∂ → , ∂qj ∂qj −1
∂ ∂ → . ∂pj ∂pj −1
(4.1)
$T$ is symplectic: $T^*\sigma = \sigma$. Furthermore, note that $T$ leaves $H$ invariant: $T^*H := H \circ T = H$. This implies that the Hamiltonian vector field $X_H$ induced by $H$ is equivariant under $T$: $DT \cdot X_H = X_H \circ T$. In other words: if $\gamma: \mathbb{R} \to \mathbb{R}^{2n}$ is an integral curve of $X_H$, then $T \circ \gamma: \mathbb{R} \to \mathbb{R}^{2n}$ is an integral curve of $X_H$. This is why we call $T$ a symmetry of $H$. The same thing holds for the powers of $T$. The group $\langle T \rangle := \{\mathrm{Id}, T, T^2, \dots, T^{n-1}\} \cong \mathbb{Z}/n\mathbb{Z}$ is a discrete symmetry group of $H$. We can point out another discrete symmetry, namely the reflection $S: \mathbb{R}^{2n} \to \mathbb{R}^{2n}$, which is the unique linear mapping sending

$$S: \quad \frac{\partial}{\partial q_j} \mapsto -\frac{\partial}{\partial q_{n-j}}, \qquad \frac{\partial}{\partial p_j} \mapsto -\frac{\partial}{\partial p_{n-j}}. \tag{4.2}$$
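The action of $T$ and $S$ from (4.1) and (4.2) can be checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the usual periodic FPU energy $\sum_j \frac12 p_j^2 + W(q_{j+1} - q_j)$ with $W(x) = \frac{1}{2}x^2 + \frac{\alpha}{3!}x^3 + \frac{\beta}{4!}x^4$ (the precise form of (1.1) is an assumption here), and the names `fpu_hamiltonian`, `T`, `S` and `power` are illustrative. In coordinates, $T$ sends $(q_k)$ to $(q_{k+1})$ and $S$ sends $(q_k)$ to $(-q_{n-k})$, indices taken modulo $n$.

```python
import random

def fpu_hamiltonian(q, p, alpha, beta):
    """Assumed periodic FPU energy: sum of p_j^2/2 + W(q_{j+1} - q_j), indices mod n."""
    n = len(q)
    def W(x):
        return x**2 / 2 + alpha * x**3 / 6 + beta * x**4 / 24
    return sum(p[j]**2 / 2 + W(q[(j + 1) % n] - q[j]) for j in range(n))

def T(x):
    """Rotation in coordinates: (T x)_k = x_{k+1}."""
    return x[1:] + x[:1]

def S(x):
    """Reflection in coordinates: (S x)_k = -x_{n-k}, indices mod n."""
    n = len(x)
    return [-x[(-k) % n] for k in range(n)]

def power(f, m, x):
    """Apply the map f to x, m times."""
    for _ in range(m):
        x = f(x)
    return x

random.seed(0)
n, alpha, beta = 7, 0.3, 1.1          # arbitrary test values
q = [random.uniform(-1, 1) for _ in range(n)]
p = [random.uniform(-1, 1) for _ in range(n)]

h0 = fpu_hamiltonian(q, p, alpha, beta)
h_T = fpu_hamiltonian(T(q), T(p), alpha, beta)   # H o T
h_S = fpu_hamiltonian(S(q), S(p), alpha, beta)   # H o S
relation_holds = S(T(q)) == power(T, n - 1, S(q))  # ST = T^{n-1} S
order_T = power(T, n, q) == q                      # T^n = Id
```

Both transformed energies agree with `h0` up to rounding, and the two permutation identities hold exactly, in line with the dihedral relation $ST = T^{n-1}S$.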
$S$ is again a symplectic symmetry: $S^*\sigma = \sigma$ and $S^*H = H$. The group $\langle S \rangle := \{\mathrm{Id}, S\} \cong \mathbb{Z}/2\mathbb{Z}$, whereas the full discrete symmetry group $\langle T, S \rangle := \{\mathrm{Id}, T, T^2, \dots, T^{n-1}, S, ST, ST^2, \dots, ST^{n-1}\} \cong D_n$ is called the "nth dihedral group"; its group structure is determined by the relation $ST = T^{n-1}S$. The vector field $X_H$ is equivariant under the elements of $\langle T, S \rangle$, that is, $\langle T, S \rangle$ maps integral curves of $X_H$ to integral curves of $X_H$. The reader should note that $T$ and $S$ leave $\bar q_n^{-1}(\{0\}) \cap \bar p_n^{-1}(\{0\})$ invariant. Therefore, $T$ and $S$ reduce to linear symplectic mappings on $\mathbb{R}^{2n-2}$ that leave the reduced Hamiltonian invariant.$^1$

5. Normalisation

We shall study the reduced FPU system (3.3) using Birkhoff–Gustavson normalisation. In fact, we shall construct a near-identity transformation of phase-space allowing us to write the FPU Hamiltonian in "normal form", meaning that it can be seen as a perturbation of a rather simple system. The study of the truncated normal form, that is this simpler system, leads to important conclusions for the original FPU system. For instance, the solutions of the truncated normal form are approximations of low-energetic solutions of the original system valid on a long time-scale. Integrals of the truncated normal form are near-integrals of the original system: on orbits of low energy, they are almost conserved

$^1$ The FPU Hamiltonian also has a reversing symmetry, namely the mapping $R: \mathbb{R}^{2n} \to \mathbb{R}^{2n}$ given by

$$R: \quad \frac{\partial}{\partial q_j} \mapsto \frac{\partial}{\partial q_j}, \qquad \frac{\partial}{\partial p_j} \mapsto -\frac{\partial}{\partial p_j}. \tag{4.3}$$

$R$ leaves the FPU Hamiltonian invariant, i.e. $R^*H = H$. $R$ is anti-symplectic in the sense that $R^*\sigma = -\sigma$. This implies that the vector field $X_H$ is anti-equivariant under $R$: $DR \cdot X_H = -X_H \circ R$. In other words: if $\gamma: \mathbb{R} \to \mathbb{R}^{2n}$ is an integral curve of $X_H$, then $R \circ \gamma \circ (-\mathrm{Id}): \mathbb{R} \to \mathbb{R}^{2n}$ is an integral curve of $X_H$. Since $R$ leaves $\bar q_n^{-1}(\{0\}) \cap \bar p_n^{-1}(\{0\})$ invariant, $R$ reduces to an anti-symplectic mapping on $\mathbb{R}^{2n-2}$ leaving the reduced Hamiltonian invariant. More information on reversing symmetries can be found in [8].
for a long time. See [15] for an explanation and explicit statements. Furthermore, the truncated normal form can help us understand bifurcation phenomena. And last but not least, if the truncated normal form of the FPU chain is integrable in a nondegenerate way, then the FPU chain is a perturbation of a nondegenerate integrable system. We may apply the KAM theorem then and conclude that almost all low-energetic solutions of (3.3) are quasiperiodic and move on tori. Conclusions of this type were drawn for the first time in [13].

The setting of normalisation is the following: Let $P_k$ be the set of all homogeneous $k$-th degree polynomials in $(\bar q_1, \dots, \bar q_{n-1}, \bar p_1, \dots, \bar p_{n-1})$. The set of all power series without linear part, $P := \bigoplus_{k \ge 2} P_k$, is a Lie-algebra with the Poisson bracket. For each $h \in P$ the adjoint representation $\mathrm{ad}_h: P \to P$ is the linear operator defined by $\mathrm{ad}_h(H) = \{h, H\}$. Note that whenever $h \in P_k$, then $\mathrm{ad}_h: P_l \to P_{k+l-2}$. The flow $e^{tX_h}$ of a Hamiltonian vector field $X_h$ induced by $h \in P - P_2$ is a symplectic near-identity transformation in $\mathbb{R}^{2n-2}$. For its action on an arbitrary Hamiltonian $H \in P$ we have $\frac{d}{dt}(e^{tX_h})^*H = dH \cdot X_h = -\mathrm{ad}_h(H)$. This is a linear differential equation in $P$ of which the solution is $(e^{tX_h})^*H = e^{-t\,\mathrm{ad}_h}H$. In particular the near-identity "Lie-transformation" $e^{-X_h} = \mathrm{Id} - X_h + \dots$ transforms $H$ into

$$H' := (e^{-X_h})^*H = e^{\mathrm{ad}_h}H = H + \{h, H\} + \frac{1}{2}\{h, \{h, H\}\} + \dots \tag{5.1}$$
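The series (5.1) can be evaluated exactly on polynomials. The sketch below is an illustration under assumptions, not the computation of the paper: it uses one degree of freedom with $H = \frac12(q^2 + p^2) + q^3$, the bracket convention $\{f, g\} = \partial_q f\,\partial_p g - \partial_p f\,\partial_q g$ (so that $\{q, p\} = 1$), and the cubic generator $h = q^2p + \frac23 p^3$, which is chosen such that $\{H_2, h\} = q^3$. Polynomials are stored as dictionaries mapping exponents $(i, j)$ of $q^i p^j$ to coefficients; all function names are hypothetical.

```python
from fractions import Fraction

def bracket(f, g):
    """Poisson bracket {f, g} = f_q g_p - f_p g_q on polynomials in (q, p).
    On monomials: {q^a p^b, q^c p^d} = (a*d - b*c) q^(a+c-1) p^(b+d-1)."""
    out = {}
    for (a, b), x in f.items():
        for (c, d), y in g.items():
            k = a * d - b * c
            if k:
                key = (a + c - 1, b + d - 1)
                out[key] = out.get(key, 0) + k * x * y
    return {m: v for m, v in out.items() if v != 0}

def truncate(f, max_degree):
    """Drop all monomials of total degree above max_degree."""
    return {m: v for m, v in f.items() if sum(m) <= max_degree}

def lie_transform(h, H, max_degree):
    """Truncation of e^{ad_h} H = sum_m (ad_h)^m H / m! for a cubic generator h.
    Since ad_h raises the degree by one, truncating at every step is exact."""
    result = dict(H)
    term = dict(H)
    m = 0
    while term:
        m += 1
        term = truncate(bracket(h, term), max_degree)
        term = {k: v / m for k, v in term.items()}
        for k, v in term.items():
            result[k] = result.get(k, 0) + v
    return {k: v for k, v in result.items() if v != 0}

half = Fraction(1, 2)
H = {(2, 0): half, (0, 2): half, (3, 0): Fraction(1)}   # H_2 + H_3
h = {(2, 1): Fraction(1), (0, 3): Fraction(2, 3)}       # {H_2, h} = q^3
transformed = lie_transform(h, H, max_degree=4)
cubic_part = {m: v for m, v in transformed.items() if sum(m) == 3}
```

In this example `cubic_part` is empty, so the cubic term has been transformed away, at the price of the new quartic terms $-\frac32 q^4 - 3q^2p^2$ that appear in `transformed`.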
Let us denote the $k$-th order part of the Hamiltonian $H$, that is the projection of $H$ on $P_k$, by $H_k$. If for instance $h \in P_3$, then we obtain the formula $H'_k = \sum_{m=0}^{k-2} \frac{1}{m!}(\mathrm{ad}_h)^m(H_{k-m})$. We just gathered all terms of equal degree in formula (5.1). Assume now, as is the case for the reduced FPU Hamiltonian, that $\mathrm{ad}_{H_2}: P_k \to P_k$ is semisimple (i.e. complex-diagonalisable) for every $k \ge 2$. Then $P_k = \ker \mathrm{ad}_{H_2} \oplus \mathrm{im}\, \mathrm{ad}_{H_2}$. In particular $H_3$ is uniquely decomposed as $H_3 = f_3 + g_3$, with $f_3 \in \ker \mathrm{ad}_{H_2}$, $g_3 \in \mathrm{im}\, \mathrm{ad}_{H_2}$. Now choose an $h_3 \in P_3$ such that $\mathrm{ad}_{H_2}(h_3) = g_3$. One could for example choose $h_3 = \tilde g_3 := (\mathrm{ad}_{H_2}|_{\mathrm{im}\, \mathrm{ad}_{H_2}})^{-1}(g_3)$. But clearly the choice $h_3 = \tilde g_3 + p_3$ suffices for any $p_3 \in \ker \mathrm{ad}_{H_2} \cap P_3$. For the new Hamiltonian $H'$ we calculate from (5.1) that $H'_2 = H_2$, $H'_3 = f_3 \in \ker \mathrm{ad}_{H_2}$, $H'_4 = H_4 + \{h_3, H_3 - \frac{1}{2} g_3\}$, etc. Indeed, $\{h_3, H_2\} = -\mathrm{ad}_{H_2}(h_3) = -g_3$, so that $H'_3 = H_3 + \{h_3, H_2\} = f_3$ and $H'_4 = H_4 + \{h_3, H_3\} + \frac{1}{2}\{h_3, \{h_3, H_2\}\} = H_4 + \{h_3, H_3 - \frac{1}{2} g_3\}$. But now we can again write $H'_4 = f_4 + g_4$ with $f_4 \in \ker \mathrm{ad}_{H_2}$, $g_4 \in \mathrm{im}\, \mathrm{ad}_{H_2}$ and it is clear that by a suitable choice of $h_4 \in P_4$ the Lie-transformation $e^{-X_{h_4}}$ transforms our $H'$ into $H''$ for which $H''_2 = H_2$, $H''_3 = f_3 \in \ker \mathrm{ad}_{H_2}$ and $H''_4 = f_4 \in \ker \mathrm{ad}_{H_2}$. Continuing in this way, we can for any finite $r \ge 3$ find a sequence of symplectic near-identity transformations $e^{-X_{h_3}}, \dots, e^{-X_{h_r}}$ with the property that $e^{-X_{h_k}}$ only changes the $H_l$ with $l \ge k$, whereas the composition $e^{-X_{h_3}} \circ \dots \circ e^{-X_{h_r}}$ transforms $H$ into $\overline{H}$ with the property that $\overline{H}_k$ Poisson commutes with $H_2$ for every $2 \le k \le r$. $\overline{H}$ is called a normal form of $H$ of order $r$. Its study can give us useful information on low-energetic solutions of the original Hamiltonian $H$. More on normalisation by Lie-transformations can be found in [3].

6. Normal Forms and Discrete Symmetry

In Sect. 4 we investigated the discrete symmetries of the periodic FPU Hamiltonian. We saw that they reduce to symmetries of the reduced FPU system on $\mathbb{R}^{2n-2}$. In this section we show how one can construct normal forms of the reduced FPU Hamiltonian
that have the same symmetry properties as the reduced FPU Hamiltonian itself. The author acknowledges Hans Duistermaat for bringing this crucial point to his attention and for stressing that it could lead to interesting conclusions. We shall see that it does so in Sect. 8 and further. The symmetry properties are captured in the definition of the symmetric subspace of $P$: $P^{ST} := \{f \in P \mid S^*f = f,\ T^*f = f\}$. Note that the FPU Hamiltonian is in $P^{ST}$. The next observation is that $S^*$ and $T^*$ are Lie-algebra automorphisms of $P$:

$$S^*\{f, g\} = \{S^*f, S^*g\}, \qquad T^*\{f, g\} = \{T^*f, T^*g\}, \tag{6.1}$$
simply because $T$ and $S$ are symplectic. Now take $f \in P^{ST}$ and $g \in P^{ST}$. Then from (6.1) it follows that $S^*\{f, g\} = \{S^*f, S^*g\} = \{f, g\}$ and $T^*\{f, g\} = \{T^*f, T^*g\} = \{f, g\}$. This means that $P^{ST}$ is a Lie-subalgebra of $P$: if $f, g \in P^{ST}$, then $\{f, g\} \in P^{ST}$. Alternatively stated: if $h \in P^{ST}$, then $\mathrm{ad}_h: P^{ST} \to P^{ST}$. In particular, $e^{\mathrm{ad}_h}: P^{ST} \to P^{ST}$. Since $\mathrm{ad}_{H_2}$ leaves $P^{ST}$ invariant, we know that $P^{ST} = (\ker \mathrm{ad}_{H_2} \cap P^{ST}) \oplus (\mathrm{im}\, \mathrm{ad}_{H_2} \cap P^{ST})$. So if we decompose the third order part of the FPU Hamiltonian as $H_3 = f_3 + g_3$ with $f_3 \in \ker \mathrm{ad}_{H_2}$, $g_3 \in \mathrm{im}\, \mathrm{ad}_{H_2}$, then $f_3, g_3 \in P_3^{ST}$ automatically. $h_3 = \tilde g_3 = (\mathrm{ad}_{H_2}|_{\mathrm{im}\, \mathrm{ad}_{H_2}})^{-1}(g_3)$ is the unique element of $\mathrm{im}\, \mathrm{ad}_{H_2} \cap P_3^{ST}$ for which $\mathrm{ad}_{H_2}(h_3) = g_3$. But since $\tilde g_3 \in P_3^{ST}$, we find that $H' = (e^{-X_{\tilde g_3}})^*H = e^{\mathrm{ad}_{\tilde g_3}}H \in P^{ST}$. Of course the choice $h_3 = \tilde g_3 + p_3$ also suffices for any $p_3 \in \ker \mathrm{ad}_{H_2} \cap P_3^{ST}$. It should be clear that continuing this procedure, we can produce normal forms $\overline{H} \in P^{ST}$ of $H$ up to any finite order.$^2$
7. Simultaneous Diagonalisation

From (6.1) we infer that

$$(T^* \circ \mathrm{ad}_{H_2})(f) = T^*\{H_2, f\} = \{T^*H_2, T^*f\} = \{H_2, T^*f\} = (\mathrm{ad}_{H_2} \circ T^*)(f). \tag{7.1}$$

So $\mathrm{ad}_{H_2}$ and $T^*$ commute on $P_k$. Therefore $\mathrm{ad}_{H_2}$ leaves the eigenspaces of $T^*$ invariant and we can diagonalise $\mathrm{ad}_{H_2}$ and $T^*$ simultaneously. This allows us to calculate the subspace $P_k \cap \ker \mathrm{ad}_{H_2} \cap \ker(T^* - \mathrm{Id}) \subset P_k$ in which $\overline{H}_k$ is contained and helps us formulate some important restrictions on the normal form of the FPU Hamiltonian.

$^2$ Although the bookkeeping is a bit harder, one can extend the previous argument to prove that the normal forms can also be chosen invariant under $R^*$. For a complete proof, cf. [3].
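In order to perform this simultaneous diagonalisation, we introduce the "superphonons" $(z, \zeta)$. For $1 \le j < \frac{n}{2}$, define:

$$\begin{aligned}
z_j &:= \frac{1}{2}(\bar p_j - i\bar p_{n-j}) + \frac{i\omega_j}{2}(\bar q_j - i\bar q_{n-j}) = \frac{1}{\sqrt{2n}} \sum_{k=1}^{n} e^{-\frac{2\pi i jk}{n}}(p_k + i\omega_j q_k), \\
\zeta_j &:= \frac{1}{2i\omega_j}(\bar p_j + i\bar p_{n-j}) - \frac{1}{2}(\bar q_j + i\bar q_{n-j}) = \frac{1}{i\omega_j\sqrt{2n}} \sum_{k=1}^{n} e^{\frac{2\pi i jk}{n}}(p_k - i\omega_j q_k), \\
z_{n-j} &:= -\frac{1}{2}(\bar p_j - i\bar p_{n-j}) + \frac{i\omega_j}{2}(\bar q_j - i\bar q_{n-j}) = -\frac{1}{\sqrt{2n}} \sum_{k=1}^{n} e^{-\frac{2\pi i jk}{n}}(p_k - i\omega_j q_k), \\
\zeta_{n-j} &:= \frac{1}{2i\omega_j}(\bar p_j + i\bar p_{n-j}) + \frac{1}{2}(\bar q_j + i\bar q_{n-j}) = \frac{1}{i\omega_j\sqrt{2n}} \sum_{k=1}^{n} e^{\frac{2\pi i jk}{n}}(p_k + i\omega_j q_k),
\end{aligned} \tag{7.2}$$

and if $n$ is even:

$$\begin{aligned}
z_{\frac{n}{2}} &:= \frac{1}{\sqrt{2}}(\bar p_{\frac{n}{2}} + i\omega_{\frac{n}{2}} \bar q_{\frac{n}{2}}) = \frac{1}{\sqrt{2n}} \sum_{k=1}^{n} (-1)^k (p_k + i\omega_{\frac{n}{2}} q_k), \\
\zeta_{\frac{n}{2}} &:= \frac{1}{\sqrt{2}\, i\omega_{\frac{n}{2}}}(\bar p_{\frac{n}{2}} - i\omega_{\frac{n}{2}} \bar q_{\frac{n}{2}}) = \frac{1}{i\omega_{\frac{n}{2}}\sqrt{2n}} \sum_{k=1}^{n} (-1)^k (p_k - i\omega_{\frac{n}{2}} q_k).
\end{aligned} \tag{7.3}$$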
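One checks that $\{z_j, z_k\} = \{\zeta_j, \zeta_k\} = 0$ and $\{z_j, \zeta_k\} = \delta_{jk}$, the Kronecker delta. So our superphonons define canonical coordinates, i.e. $\tilde\sigma = \sum_{j=1}^{n-1} dz_j \wedge d\zeta_j$. From (4.1) we infer that $T^*q_j = q_{j+1}$ and $T^*p_j = p_{j+1}$, where $q_j, p_j: \mathbb{R}^{2n} \to \mathbb{R}$ are the coordinate functions. So from (7.2) we see that

$$T^*: \quad z_j \mapsto e^{\frac{2\pi i j}{n}} z_j, \quad \zeta_j \mapsto e^{-\frac{2\pi i j}{n}} \zeta_j, \quad z_{n-j} \mapsto e^{\frac{2\pi i j}{n}} z_{n-j}, \quad \zeta_{n-j} \mapsto e^{-\frac{2\pi i j}{n}} \zeta_{n-j}, \quad z_{\frac{n}{2}} \mapsto -z_{\frac{n}{2}} \quad \text{and} \quad \zeta_{\frac{n}{2}} \mapsto -\zeta_{\frac{n}{2}}. \tag{7.4}$$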
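We conclude that $T^*$ acts diagonally on $(z, \zeta)$-coordinates. And it acts diagonally on monomials in $(z, \zeta)$: if $\Theta, \theta \in \{0, 1, 2, \dots\}^{n-1}$ are multi-indices, then

$$T^*: \quad z^\Theta \zeta^\theta \mapsto e^{\frac{2\pi i \mu(\Theta, \theta)}{n}} z^\Theta \zeta^\theta, \tag{7.5}$$

$\mu$ being defined as:

$$\mu(\Theta, \theta) := \sum_{1 \le j < \frac{n}{2}} j(\Theta_j + \Theta_{n-j} - \theta_j - \theta_{n-j}) + \frac{n}{2}(\Theta_{\frac{n}{2}} - \theta_{\frac{n}{2}}) \mod n. \tag{7.6}$$

On the other hand one calculates:

$$H_2 = \sum_{1 \le j < \frac{n}{2}} i\omega_j (z_j\zeta_j - z_{n-j}\zeta_{n-j}) + i\omega_{\frac{n}{2}} z_{\frac{n}{2}} \zeta_{\frac{n}{2}}. \tag{7.7}$$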
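So we also diagonalised $\mathrm{ad}_{H_2}$ with respect to monomials:

$$\mathrm{ad}_{H_2}: \quad z^\Theta \zeta^\theta \mapsto \nu(\Theta, \theta)\, z^\Theta \zeta^\theta, \tag{7.8}$$

in which $\nu$ is defined as

$$\nu(\Theta, \theta) := \sum_{1 \le j < \frac{n}{2}} i\omega_j(\theta_j - \theta_{n-j} - \Theta_j + \Theta_{n-j}) + i\omega_{\frac{n}{2}}(\theta_{\frac{n}{2}} - \Theta_{\frac{n}{2}}). \tag{7.9}$$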
Monomials $z^\Theta \zeta^\theta$ commuting with $H_2$, the ones for which $\nu(\Theta, \theta) = 0$, are called resonant monomials. They are particularly important because they cannot be normalised away.

8. Restrictions for Symmetric Normal Forms

From Sect. 6 we know that we can transform the periodic FPU Hamiltonian into a discrete symmetric normal form of any desired order. Suppose we did so up to order $r$. Then $\overline{H}_k \in P_k \cap \ker \mathrm{ad}_{H_2} \cap \ker(T^* - \mathrm{Id})$ for any $2 \le k \le r$. But since both $T^*$ and $\mathrm{ad}_{H_2}$ act diagonally in $(z, \zeta)$-coordinates, we know that this $\overline{H}_k$ must be a linear combination of monomials $z^\Theta \zeta^\theta$ for which

$$|\Theta| + |\theta| = k, \qquad \mu(\Theta, \theta) = 0 \mod n \qquad \text{and} \qquad \nu(\Theta, \theta) = 0. \tag{8.1}$$
Extra restrictions on $\overline{H}_k$, with which we shall deal later, arise from the fact that $\overline{H}_k$ can be chosen in the even smaller set $P^{ST}$.$^3$ But first we investigate which $\Theta$ and $\theta$ satisfy (8.1). Because the eigenvalues $i\omega_j$ in (7.9) are of the form $2i \sin(\frac{j\pi}{n})$, this is actually a number-theoretical question that we shall solve for $|\Theta| + |\theta| = 2, 3, 4$. The quadratic case, i.e. $|\Theta| + |\theta| = 2$, is easy: since all the $\omega_j$ ($1 \le j \le \frac{n}{2}$) are different, we find from $\nu(\Theta, \theta) = 0$ that the Lie-subalgebra $P_2 \cap \ker \mathrm{ad}_{H_2} \subset P_2$ is spanned by the monomials

$$z_j\zeta_j, \quad z_{n-j}\zeta_{n-j}, \quad z_j z_{n-j}, \quad \zeta_j\zeta_{n-j} \quad (1 \le j < \tfrac{n}{2}) \quad \text{and} \quad z_{\frac{n}{2}}\zeta_{\frac{n}{2}}. \tag{8.2}$$
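$T^*$ acts diagonally on these basis-elements as follows:

$$T^*: \quad z_j\zeta_j \mapsto z_j\zeta_j, \quad z_{n-j}\zeta_{n-j} \mapsto z_{n-j}\zeta_{n-j}, \quad z_{\frac{n}{2}}\zeta_{\frac{n}{2}} \mapsto z_{\frac{n}{2}}\zeta_{\frac{n}{2}}, \quad z_j z_{n-j} \mapsto e^{\frac{4\pi i j}{n}} z_j z_{n-j}, \quad \zeta_j\zeta_{n-j} \mapsto e^{-\frac{4\pi i j}{n}} \zeta_j\zeta_{n-j}. \tag{8.3}$$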
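The Lie-subalgebra $P_2 \cap \ker \mathrm{ad}_{H_2} \cap \ker(T^* - \mathrm{Id}) = \mathrm{span}\{z_j\zeta_j, z_{n-j}\zeta_{n-j}, z_{\frac{n}{2}}\zeta_{\frac{n}{2}}\}$ is abelian. From (4.2) and (7.2) we calculate the action of $S^*$ on the coordinate-functions:

$$S^*: \quad z_j \mapsto -i\omega_j \zeta_{n-j}, \quad \zeta_j \mapsto \frac{1}{i\omega_j} z_{n-j}, \quad z_{n-j} \mapsto i\omega_j \zeta_j, \quad \zeta_{n-j} \mapsto \frac{-1}{i\omega_j} z_j, \quad z_{\frac{n}{2}} \mapsto -z_{\frac{n}{2}}, \quad \zeta_{\frac{n}{2}} \mapsto -\zeta_{\frac{n}{2}}. \tag{8.4}$$

$^3$ and invariant under $R^*$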
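So the action on the basis-elements reads:

$$S^*: \quad z_j\zeta_j \mapsto -z_{n-j}\zeta_{n-j}, \quad z_{n-j}\zeta_{n-j} \mapsto -z_j\zeta_j, \quad z_{\frac{n}{2}}\zeta_{\frac{n}{2}} \mapsto z_{\frac{n}{2}}\zeta_{\frac{n}{2}}, \quad z_j z_{n-j} \mapsto \omega_j^2\, \zeta_j\zeta_{n-j}, \quad \zeta_j\zeta_{n-j} \mapsto \frac{1}{\omega_j^2}\, z_j z_{n-j}. \tag{8.5}$$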
We conclude that the Lie-subalgebra $P_2^{ST} \cap \ker \mathrm{ad}_{H_2}$ is spanned by the quadratics $z_j\zeta_j - z_{n-j}\zeta_{n-j}$ and $z_{\frac{n}{2}}\zeta_{\frac{n}{2}}$. Note that $H_2$ itself is indeed a linear combination of these quadratics. The analysis is harder if we consider the cases $|\Theta| + |\theta| = 3, 4$. With the use of number theory, the proof of the following theorem is given in the appendix.

Theorem 8.1. i) The set of multi-indices $(\Theta, \theta) \in \{0, 1, 2, \dots\}^{2n-2}$ for which $|\Theta| + |\theta| = 3$, $\mu(\Theta, \theta) = 0 \mod n$ and $\nu(\Theta, \theta) = 0$ is empty. ii) The set of multi-indices $(\Theta, \theta) \in \{0, 1, 2, \dots\}^{2n-2}$ for which $|\Theta| + |\theta| = 4$, $\mu(\Theta, \theta) = 0 \mod n$ and $\nu(\Theta, \theta) = 0$ is contained in the set given by the relations $\theta_j - \theta_{n-j} - \Theta_j + \Theta_{n-j} = \theta_{\frac{n}{2}} - \Theta_{\frac{n}{2}} = 0$.

Theorem 8.1 has some major implications. We shall investigate these now and they will be summarised in Theorem 8.2. From i) we see that $P_3^{ST} \cap \ker \mathrm{ad}_{H_2} \subset P_3 \cap \ker \mathrm{ad}_{H_2} \cap \ker(T^* - \mathrm{Id}) = \{0\}$. First of all, this implies that we can always transform away $H_3$ from the periodic FPU Hamiltonian: $\overline{H}_3 = 0$. This is an unexpected result. Consider for example the chain with 6 particles, which satisfies a third order resonance relation: $\omega_1 : \omega_3 : \omega_5 = 1 : 2 : 1$. For systems with a third order resonance relation one can generally not expect $\overline{H}_3$ to be trivial. But, as was observed for the first time in [13], it is trivial for the 6 particles chain. One could say that the $1 : 2 : 1$-resonance is not active at $H_3$-level. We now know that for the periodic FPU chain no resonance will ever be active at the $H_3$-level. This simplification is caused by the symmetries of the FPU system. Secondly, we conclude from i) that the $h_3$ of Sect. 6 is uniquely determined by the requirement that it be in $P_3^{ST}$. This in turn uniquely determines $\overline{H}_4$. From ii) we infer that any element of $P_4 \cap \ker \mathrm{ad}_{H_2} \cap \ker(T^* - \mathrm{Id})$ must be a linear combination of products of two of the basis-elements in (8.2). Note however that not all these products are really $T^*$-invariant and that the full normal form is even invariant under $S^*$.
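For small chains, the two statements of Theorem 8.1 can be confirmed by brute force. The sketch below is only a numerical sanity check under assumptions, not a substitute for the proof in the appendix: it encodes every coordinate $z_j, \zeta_j, z_{n-j}, \zeta_{n-j}$ by its weight in (7.6) and the sign with which $\omega_j = 2\sin(\frac{j\pi}{n})$ enters (7.9), enumerates all monomials of a given degree, and tests $\nu = 0$ in floating point with a tolerance. The function names and the range $3 \le n \le 12$ are choices made for this illustration.

```python
import math
from itertools import combinations_with_replacement

def variables(n):
    """Each coordinate as (mu weight, frequency index j, sign of omega_j in nu/i)."""
    out = []
    for j in range(1, (n - 1) // 2 + 1):          # 1 <= j < n/2
        out += [(j, j, -1),    # z_j
                (-j, j, +1),   # zeta_j
                (j, j, +1),    # z_{n-j}
                (-j, j, -1)]   # zeta_{n-j}
    if n % 2 == 0:
        out += [(n // 2, n // 2, -1),      # z_{n/2}
                (-(n // 2), n // 2, +1)]   # zeta_{n/2}
    return out

def resonant_invariant_monomials(n, degree, tol=1e-9):
    """Monomials of the given degree with mu = 0 mod n and nu = 0, returned as
    dictionaries {j: net integer coefficient of omega_j in nu/i}."""
    omega = {j: 2 * math.sin(j * math.pi / n) for j in range(1, n // 2 + 1)}
    found = []
    for mono in combinations_with_replacement(variables(n), degree):
        mu = sum(v[0] for v in mono) % n
        coeff = {}
        for _, j, sign in mono:
            coeff[j] = coeff.get(j, 0) + sign
        nu = sum(c * omega[j] for j, c in coeff.items())
        if mu == 0 and abs(nu) < tol:
            found.append(coeff)
    return found

cubic = {n: resonant_invariant_monomials(n, 3) for n in range(3, 13)}
quartic_ok = all(
    all(c == 0 for c in coeff.values())
    for n in range(3, 13)
    for coeff in resonant_invariant_monomials(n, 4)
)
```

For instance, $n = 12$ has the cubic resonance $\omega_5 = \omega_1 + \omega_3$, but the corresponding monomials all have $\mu \neq 0 \mod 12$, so they do not show up in `cubic[12]`.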
We work out these extra restrictions now. The question which products of the basis-elements (8.2) are invariant under $T^*$ is easy to answer with the help of the formulas (8.3). Clearly, all products of $z_j\zeta_j$, $z_{n-j}\zeta_{n-j}$ and $z_{\frac{n}{2}}\zeta_{\frac{n}{2}}$ are. $T^*$ multiplies the terms $(z_j\zeta_j)(z_k z_{n-k})$, $(z_j\zeta_j)(\zeta_k\zeta_{n-k})$, $(z_{n-j}\zeta_{n-j})(z_k z_{n-k})$, $(z_{n-j}\zeta_{n-j})(\zeta_k\zeta_{n-k})$, $(z_{\frac{n}{2}}\zeta_{\frac{n}{2}})(z_k z_{n-k})$ and $(z_{\frac{n}{2}}\zeta_{\frac{n}{2}})(\zeta_k\zeta_{n-k})$ with a factor $e^{\pm\frac{4\pi i k}{n}} \neq 1$, so these terms are not invariant under $T^*$. $T^*$ multiplies $(z_j z_{n-j})(\zeta_k\zeta_{n-k})$ by $e^{\frac{4\pi i (j-k)}{n}}$, which is 1 if and only if $2(j - k) = 0 \mod n$. But because $1 \le j, k < \frac{n}{2}$, the condition is $2(j - k) = 0$, i.e. $j = k$. Thus we end up with a term that we already had: $(z_j z_{n-j})(\zeta_j\zeta_{n-j}) = (z_j\zeta_j)(z_{n-j}\zeta_{n-j})$. Finally, the terms $(z_j z_{n-j})(z_k z_{n-k})$ and $(\zeta_j\zeta_{n-j})(\zeta_k\zeta_{n-k})$ are multiplied by a factor $e^{\pm\frac{4\pi i (j+k)}{n}}$, which is 1 if and only if $2(j + k) = 0 \mod n$. But since $1 \le j, k < \frac{n}{2}$, the only possibility is that $2(j + k) = n$, that is $n$ must be even and $j + k = \frac{n}{2}$. This concludes our search for fourth order monomials invariant under $T^*$ and Poisson commuting with $H_2$. We shall check now which combinations of these terms are also invariant under $S^*$. The action of $S^*$ on $P_2 \cap \ker \mathrm{ad}_{H_2}$ can be diagonalised in real coordinates. For this
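purpose, besides our familiar complex basis, we also define the following real basis-elements for $P_2 \cap \ker \mathrm{ad}_{H_2}$. For $1 \le j < \frac{n}{2}$, let

$$\begin{aligned}
a_j &:= i(z_j\zeta_j - z_{n-j}\zeta_{n-j}) = \frac{1}{2\omega_j}(\bar p_j^2 + \bar p_{n-j}^2 + \omega_j^2 \bar q_j^2 + \omega_j^2 \bar q_{n-j}^2), \\
b_j &:= i(z_j\zeta_j + z_{n-j}\zeta_{n-j}) = \bar p_j \bar q_{n-j} - \bar p_{n-j} \bar q_j, \\
c_j &:= \frac{1}{\omega_j}(\omega_j^2 \zeta_j\zeta_{n-j} + z_j z_{n-j}) = \frac{1}{2\omega_j}(\bar p_{n-j}^2 - \bar p_j^2 + \omega_j^2 \bar q_{n-j}^2 - \omega_j^2 \bar q_j^2), \\
d_j &:= \frac{i}{\omega_j}(\omega_j^2 \zeta_j\zeta_{n-j} - z_j z_{n-j}) = \frac{1}{\omega_j}(\bar p_j \bar p_{n-j} + \omega_j^2 \bar q_j \bar q_{n-j}),
\end{aligned} \tag{8.6}$$

and if $n$ is even

$$a_{\frac{n}{2}} := i z_{\frac{n}{2}}\zeta_{\frac{n}{2}} = \frac{1}{2\omega_{\frac{n}{2}}}(\bar p_{\frac{n}{2}}^2 + \omega_{\frac{n}{2}}^2 \bar q_{\frac{n}{2}}^2).$$

Note that these basis-elements are subject to the relation

$$a_j^2 = b_j^2 + c_j^2 + d_j^2 \tag{8.7}$$

and that $H_2$ can easily be expressed as

$$H_2 = \sum_{1 \le j \le \frac{n}{2}} \omega_j a_j. \tag{8.8}$$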
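Relation (8.7) is a polynomial identity in the real coordinates and can be tested directly from the right-hand sides of (8.6). The following sketch does so in exact rational arithmetic at random points; the function name `quadratics` and the sampling ranges are arbitrary choices for this illustration. It also records the immediate consequence $|b_j| \le a_j$.

```python
from fractions import Fraction
import random

def quadratics(qj, qn, pj, pn, w):
    """Real quadratics a, b, c, d of (8.6) for one pair (j, n-j), with w = omega_j."""
    a = (pj**2 + pn**2 + w**2 * qj**2 + w**2 * qn**2) / (2 * w)
    b = pj * qn - pn * qj
    c = (pn**2 - pj**2 + w**2 * qn**2 - w**2 * qj**2) / (2 * w)
    d = (pj * pn + w**2 * qj * qn) / w
    return a, b, c, d

random.seed(1)
identity_holds = True
bound_holds = True
for _ in range(500):
    w = Fraction(random.randint(1, 20), 10)                        # positive frequency
    point = [Fraction(random.randint(-10, 10), 7) for _ in range(4)]
    a, b, c, d = quadratics(*point, w)
    identity_holds = identity_holds and (a * a == b * b + c * c + d * d)
    bound_holds = bound_holds and (abs(b) <= a)
```

Because the arithmetic is exact, `identity_holds` being true means that (8.7) is satisfied without any rounding tolerance at every sampled point.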
Our definitions diagonalise the action of $S^*$:

$$S^*: \quad a_j \mapsto a_j, \quad a_{\frac{n}{2}} \mapsto a_{\frac{n}{2}}, \quad b_j \mapsto -b_j, \quad c_j \mapsto c_j, \quad d_j \mapsto -d_j. \tag{8.9}$$
The products $a_j a_k$, $a_{\frac{n}{2}} a_j$ and $b_j b_k$ are invariant under $S^*$ and $T^*$. The products $a_j b_k$ and $a_{\frac{n}{2}} b_k$ are not invariant under $S^*$, although they are under $T^*$. It is left as an easy exercise for the reader to prove that the only configuration for other terms to appear is $d_j d_{\frac{n}{2}-j} - c_j c_{\frac{n}{2}-j}$. We summarize the results of this section in the following theorem:

Theorem 8.2. Let $H$ be the reduced periodic FPU Hamiltonian (3.3). There is a fourth order normal form $\overline{H}$ of $H$ which is invariant under $T^*$ and $S^*$.$^4$ For this normal form we have $\overline{H}_3 = 0$, whereas $\overline{H}_4$ is a linear combination of the fourth order terms $a_j a_k$, $b_j b_k$ ($1 \le j, k < \frac{n}{2}$) and if $n$ is even $a_{\frac{n}{2}} a_j$ ($1 \le j \le \frac{n}{2}$) and $d_j d_{\frac{n}{2}-j} - c_j c_{\frac{n}{2}-j}$ ($1 \le j \le \frac{n}{4}$).

9. Near-Integrals or Integrals of the Truncated Normal Form

In the previous section we proved that the truncated fourth order normal form of the periodic FPU Hamiltonian is subject to many restrictions, as indicated in Theorem 8.2. This enables us to point out some integrals for the truncated normal form. These are near-integrals of the periodic FPU chain: quantities that are nearly conserved by the flow of the original chain (3.3) for a long time, cf. [15].

$^4$ and $R^*$
In order to be able to compute these integrals, we first write down the commutation relations between the real basis-elements (8.6). They are given by

$$\{b_j, c_j\} = 2d_j, \qquad \{b_j, d_j\} = -2c_j, \qquad \{c_j, d_j\} = 2b_j. \tag{9.1}$$
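The relations (9.1) can be verified pointwise from the real expressions in (8.6). The sketch below is an independent check under an assumed convention: it takes $\{f, g\} = \sum_k \partial_{\bar q_k} f\, \partial_{\bar p_k} g - \partial_{\bar p_k} f\, \partial_{\bar q_k} g$ (consistent with $\{z_j, \zeta_k\} = \delta_{jk}$ and $\tilde\sigma = \sum d\bar q_j \wedge d\bar p_j$), writes out the gradients of $a_j, b_j, c_j, d_j$ by hand, and compares both sides in exact rational arithmetic. All names are illustrative.

```python
from fractions import Fraction
import random

def values_and_gradients(qj, qn, pj, pn, w):
    """Values and gradients, in the order (qj, qn, pj, pn), of a, b, c, d from (8.6)."""
    vals = {
        "a": (pj**2 + pn**2 + w**2 * (qj**2 + qn**2)) / (2 * w),
        "b": pj * qn - pn * qj,
        "c": (pn**2 - pj**2 + w**2 * (qn**2 - qj**2)) / (2 * w),
        "d": (pj * pn + w**2 * qj * qn) / w,
    }
    grads = {
        "a": (w * qj, w * qn, pj / w, pn / w),
        "b": (-pn, pj, qn, -qj),
        "c": (-w * qj, w * qn, -pj / w, pn / w),
        "d": (w * qn, w * qj, pn / w, pj / w),
    }
    return vals, grads

def bracket(gf, gg):
    """{f, g} = f_q . g_p - f_p . g_q, from gradients ordered (qj, qn, pj, pn)."""
    return gf[0] * gg[2] + gf[1] * gg[3] - gf[2] * gg[0] - gf[3] * gg[1]

random.seed(2)
relations_hold = True
for _ in range(200):
    w = Fraction(random.randint(1, 9), 5)
    x = [Fraction(random.randint(-9, 9), 4) for _ in range(4)]
    v, g = values_and_gradients(*x, w)
    relations_hold = relations_hold and all([
        bracket(g["b"], g["c"]) == 2 * v["d"],
        bracket(g["b"], g["d"]) == -2 * v["c"],
        bracket(g["c"], g["d"]) == 2 * v["b"],
        bracket(g["a"], g["b"]) == 0,
        bracket(g["a"], g["c"]) == 0,
        bracket(g["a"], g["d"]) == 0,
    ])
```

The last three comparisons illustrate that $a_j$ commutes with $b_j$, $c_j$ and $d_j$, which is what makes the $a_j$ integrals of the truncated normal form.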
All the other Poisson brackets between basis-elements give 0. These relations lead to the following conclusions:

9.1. The odd chain.

Corollary 9.1. If $n$ is odd, then the truncated normal form $H_2 + \overline{H}_4$ of the periodic FPU chain is Liouville integrable with the quadratic integrals $a_j, b_j$ ($1 \le j \le \frac{n-1}{2}$).

Proof. $H_2$ is a linear combination of the quadratics $a_j$ and $\overline{H}_4$ is a linear combination of the fourth order terms $a_j a_k$ and $b_j b_k$. The $a_j$ and $b_k$ Poisson commute with all these terms and with each other.

It is well-known (cf. [2]) that the integrals of a $2n-2$-dimensional Liouville integrable Hamiltonian system generally define $n-1$-dimensional invariant tori. Let's see what these tori look like here and analyse the integral map $F: \mathbb{R}^{2n-2} \to \mathbb{R}^{n-1}$ that maps $(\bar q, \bar p) \mapsto (a, b)$:

Proposition 9.2.

$$\mathrm{im}\, F = \{(a, b) \in \mathbb{R}^{n-1} \mid a_j \ge 0,\ |b_j| \le a_j\}. \tag{9.2}$$

For $(a, b) \in (\mathrm{im}\, F)^{\mathrm{int}} = \{(a, b) \in \mathbb{R}^{n-1} \mid a_j > 0,\ |b_j| < a_j\}$, $F^{-1}(\{(a, b)\})$ is a smooth $n-1$-dimensional torus.

Proof. Clearly, $\mathrm{im}\, a_j = [0, \infty)$. The level set of $a_j$ is, for $a_j > 0$, the cartesian product of $\mathbb{R}^{2n-6}$ and a 3-dimensional sphere in $\mathbb{R}^4$ with radius $\sqrt{2a_j}$. Let us consider $b_j$ restricted to the level set of $a_j$. To compute its critical points, we use the method of Lagrange multipliers: $(\bar q, \bar p)$ is a critical point iff there is a constant $\lambda$ such that $Da_j(\bar q, \bar p) = \lambda\, Db_j(\bar q, \bar p)$, that is $(\omega_j \bar q_j, \omega_j \bar q_{n-j}, \frac{1}{\omega_j}\bar p_j, \frac{1}{\omega_j}\bar p_{n-j}) = \lambda(-\bar p_{n-j}, \bar p_j, \bar q_{n-j}, -\bar q_j)$. From this we infer that $\lambda = \pm 1$. For $\lambda = 1$, we find $\bar p_{n-j} = -\omega_j \bar q_j$, $\bar p_j = \omega_j \bar q_{n-j}$. In these points we have $b_j = a_j$. $\lambda = -1$ gives $\bar p_{n-j} = \omega_j \bar q_j$, $\bar p_j = -\omega_j \bar q_{n-j}$, so $b_j = -a_j$. These are the extreme values of $b_j$ on the level set of $a_j$, giving (9.2). We also learn from this that if $a_j > 0$ and $|b_j| < a_j$, then $Da_j$ and $Db_j$ are independent. So if $(a, b) \in (\mathrm{im}\, F)^{\mathrm{int}}$, then all $Da_j$ and $Db_k$ are independent on $F^{-1}(\{(a, b)\})$. According to a theorem of Arnol'd (cf. [2]) such a level set must be a torus.

In order to compute the flow on these tori, we make the explicit transformation to action-angle coordinates $(\bar q, \bar p) \mapsto (a, b, \phi, \psi)$ as follows. Let $\arg: \mathbb{R}^2 - \{(0, 0)\} \to \mathbb{R}/2\pi\mathbb{Z}$ be the argument function, $\arg: (r\cos\Phi, r\sin\Phi) \mapsto \Phi$. Then, for $1 \le j \le \frac{n-1}{2}$, define

$$\begin{aligned}
a_j &:= \frac{1}{2\omega_j}(\bar p_j^2 + \bar p_{n-j}^2 + \omega_j^2 \bar q_j^2 + \omega_j^2 \bar q_{n-j}^2), \\
b_j &:= \bar p_j \bar q_{n-j} - \bar p_{n-j} \bar q_j, \\
\phi_j &:= \frac{1}{2}\arg(-\bar p_{n-j} - \omega_j \bar q_j,\ \bar p_j - \omega_j \bar q_{n-j}) + \frac{1}{2}\arg(\bar p_{n-j} - \omega_j \bar q_j,\ \bar p_j + \omega_j \bar q_{n-j}), \\
\psi_j &:= \frac{1}{2}\arg(-\bar p_{n-j} - \omega_j \bar q_j,\ \bar p_j - \omega_j \bar q_{n-j}) - \frac{1}{2}\arg(\bar p_{n-j} - \omega_j \bar q_j,\ \bar p_j + \omega_j \bar q_{n-j}).
\end{aligned} \tag{9.3}$$
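Note that these are well-defined as long as $(a, b) \in (\mathrm{im}\, F)^{\mathrm{int}}$. With the formula $d\arg(x, y) = \frac{x\,dy - y\,dx}{x^2 + y^2}$, one can verify that the $(a, b, \phi, \psi)$ are canonical coordinates: $\tilde\sigma = \sum_{j=1}^{n-1} d\bar q_j \wedge d\bar p_j = \sum_{1 \le j \le \frac{n-1}{2}} d\phi_j \wedge da_j + d\psi_j \wedge db_j$.

The truncated normal form $H_2 + \overline{H}_4$ is a function of the actions $a_j, b_j$. Its induced equations of motion therefore read:

$$\dot a_j = \dot b_j = 0, \qquad \dot\phi_j = \omega_j + \frac{\partial \overline{H}_4}{\partial a_j}(a, b), \qquad \dot\psi_j = \frac{\partial \overline{H}_4}{\partial b_j}(a, b), \tag{9.4}$$

which are very simple. In order to verify that the truncated normal form $H_2 + \overline{H}_4$ is nondegenerate, we examine the frequency map $\omega$, which assigns to each invariant torus the frequencies of the flow on it:

$$\omega: (a, b) \mapsto \left(\omega_1 + \frac{\partial \overline{H}_4}{\partial a_1}(a, b), \dots, \omega_{\frac{n-1}{2}} + \frac{\partial \overline{H}_4}{\partial a_{\frac{n-1}{2}}}(a, b),\ \frac{\partial \overline{H}_4}{\partial b_1}(a, b), \dots, \frac{\partial \overline{H}_4}{\partial b_{\frac{n-1}{2}}}(a, b)\right),$$

and $\omega$ is a local diffeomorphism iff both the constant derivative matrices $\frac{\partial^2 \overline{H}_4}{\partial a_j \partial a_k}$ and $\frac{\partial^2 \overline{H}_4}{\partial b_j \partial b_k}$ are invertible. We will explicitly check this for the odd β-chain in Sect. 10. The situation is more difficult in the case of the even chain.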
9.2. The even chain.

Corollary 9.3. If $n$ is even, then the truncated normal form $H_2 + \overline{H}_4$ of the periodic FPU chain has the quadratic integrals $a_j$ ($1 \le j \le \frac{n}{2}$) and $b_j - b_{\frac{n}{2}-j}$ ($1 \le j < \frac{n}{4}$).

Proof. $H_2$ is a linear combination of the quadratics $a_j$ ($1 \le j \le \frac{n}{2}$), whereas $\overline{H}_4$ is a linear combination of the fourth order terms $a_j a_k$ ($1 \le j, k \le \frac{n}{2}$), $b_j b_k$ ($1 \le j, k < \frac{n}{2}$) and $d_j d_{\frac{n}{2}-j} - c_j c_{\frac{n}{2}-j}$ ($1 \le j \le \frac{n}{4}$). The $a_j$ clearly commute with all these terms. So do the terms $b_j - b_{\frac{n}{2}-j}$: $\{b_j - b_{\frac{n}{2}-j}, b_k\} = \{b_j - b_{\frac{n}{2}-j}, a_k\} = \{b_j - b_{\frac{n}{2}-j}, a_{\frac{n}{2}}\} = 0$. But one also verifies from (9.1) that

$$\{b_j - b_{\frac{n}{2}-j},\ c_j c_{\frac{n}{2}-j} - d_j d_{\frac{n}{2}-j}\} = c_{\frac{n}{2}-j}\{b_j, c_j\} - c_j\{b_{\frac{n}{2}-j}, c_{\frac{n}{2}-j}\} - d_{\frac{n}{2}-j}\{b_j, d_j\} + d_j\{b_{\frac{n}{2}-j}, d_{\frac{n}{2}-j}\} = 2c_{\frac{n}{2}-j} d_j - 2c_j d_{\frac{n}{2}-j} + 2d_{\frac{n}{2}-j} c_j - 2d_j c_{\frac{n}{2}-j} = 0.$$

If $n$ is even, then the $n-1$-degrees of freedom Hamiltonian $H_2 + \overline{H}_4$ has at least $\frac{3n-4}{4}$ (if 4 divides $n$) or $\frac{3n-2}{4}$ (if 4 does not divide $n$) quadratic integrals. These are near-integrals for the original chain (3.3). We have not yet found a complete system of integrals for the truncated normal form though. We will do so for the even β-chain in Sect. 10.2.
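The count of quadratic integrals for the even chain follows from the index ranges in Corollary 9.3: there are $\frac{n}{2}$ integrals $a_j$ and one integral $b_j - b_{\frac{n}{2}-j}$ for every $1 \le j < \frac{n}{4}$. A small sketch of this arithmetic (the function name is chosen for this illustration):

```python
def quadratic_integrals_even_chain(n):
    """Number of integrals a_j (1 <= j <= n/2) plus b_j - b_{n/2-j} (1 <= j < n/4)."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    count_a = n // 2
    count_b = sum(1 for j in range(1, n) if 4 * j < n)
    return count_a + count_b

# Compare with the n - 1 degrees of freedom of the reduced chain.
counts = {n: (quadratic_integrals_even_chain(n), n - 1) for n in range(4, 21, 2)}
```

For $n = 8$, for example, this gives 5 quadratic integrals against 7 degrees of freedom, so two further integrals would be needed for Liouville integrability.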
10. The Normal Form of the β-Chain

In this section we present the explicit normal form of the periodic FPU Hamiltonian in the case that $H_3 = 0$, i.e. $\alpha = 0$ in (1.1). This chain, which has no cubic terms, is usually referred to as the β-chain. A calculation of the normal form of order 4 is relatively easy in this case, because one does not have to transform away $H_3$ first. The calculation is still tedious though and that is why we do not present it. The reader can find an example of a similar computation in [13]. The following theorem is a major generalisation of the result in [13], which in turn is a restatement, with a much more efficient proof, of a theorem in the PhD thesis of Sanders ([14]).

Theorem 10.1. If $\alpha = 0$, then in the periodic FPU chain one has

$$\overline{H}_4 = \frac{\beta}{n}\left\{ \sum_{0<k<l<\frac{n}{2}} \frac{\omega_k\omega_l}{4}\, a_k a_l + \sum_{0<k<\frac{n}{2}} \frac{\omega_k^2}{32}(3a_k^2 - b_k^2) + \frac{1}{4} a_{\frac{n}{2}}^2 + \frac{1}{2} a_{\frac{n}{2}} \sum_{0<k<\frac{n}{2}} \omega_k a_k + \dots \right\}. \tag{10.1}$$

In formula (10.1) it is understood that terms with the subscript $\frac{n}{2}$ and $\frac{n}{4}$ only appear if 2 respectively 4 divides $n$.
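10.1. The odd β-chain.

In formula (10.1) we see again what was already predicted in Theorem 8.2, namely that $\overline{H}_4$ is a linear combination of the terms $a_j a_k$ and $b_j b_k$ ($1 \le j, k \le \frac{n-1}{2}$). According to Corollary 9.1 this normal form is integrable, the $a_j$ and $b_j$ being the (quadratic) integrals. To check the nondegeneracy condition, we compute the second order derivative matrices of $\overline{H}_4$ with respect to the action variables $a_j$ and $b_j$:

$$\frac{\partial^2 \overline{H}_4}{\partial a_j \partial a_k} = \frac{\beta}{16n} \begin{pmatrix} 3\omega_1^2 & 4\omega_1\omega_2 & \cdots & 4\omega_1\omega_{\frac{n-1}{2}} \\ 4\omega_2\omega_1 & 3\omega_2^2 & \cdots & 4\omega_2\omega_{\frac{n-1}{2}} \\ \vdots & \vdots & \ddots & \vdots \\ 4\omega_{\frac{n-1}{2}}\omega_1 & 4\omega_{\frac{n-1}{2}}\omega_2 & \cdots & 3\omega_{\frac{n-1}{2}}^2 \end{pmatrix}, \tag{10.2}$$

$$\frac{\partial^2 \overline{H}_4}{\partial b_j \partial b_k} = -\frac{\beta}{16n} \begin{pmatrix} \omega_1^2 & & & \\ & \omega_2^2 & & \\ & & \ddots & \\ & & & \omega_{\frac{n-1}{2}}^2 \end{pmatrix}. \tag{10.3}$$

$\frac{\partial^2 \overline{H}_4}{\partial b_j \partial b_k}$ is clearly nondegenerate. But so is $\frac{\partial^2 \overline{H}_4}{\partial a_j \partial a_k}$. This can be proved by applying elementary row and column operations to (10.2), thus reducing it to upper-triangular form. This yields an expression for the determinant that is unequal to 0. Indeed, with $m = \frac{n-1}{2}$, $D = \mathrm{diag}(\omega_1, \dots, \omega_m)$ and $J$ the $m \times m$ matrix with all entries equal to 1, the matrix in (10.2) equals $\frac{\beta}{16n} D(4J - \mathrm{Id})D$, whose determinant is $(-1)^{m-1}(4m - 1)\left(\frac{\beta}{16n}\right)^m \prod_{j=1}^{m} \omega_j^2$. We conclude that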
the reduced periodic β-chain with an odd number of particles can, after a near-identity transformation, be written as a perturbation of a nondegenerate integrable Hamiltonian system. Therefore, the KAM theorem (cf. [2]) applies:

Theorem 10.2. If $n$ is odd, $\alpha = 0$ and $\beta \neq 0$, then almost all low-energy solutions of the reduced periodic FPU chain (3.3) are periodic or quasiperiodic and move on invariant tori. In fact, the relative measure of all these tori lying inside the small ball $\{0 \le H \le \varepsilon\}$ goes to 1 as $\varepsilon$ goes to 0.

It should also be possible to write down an expression for the normal form if $\alpha \neq 0$. The nondegeneracy condition can be checked again then. But the computation of this normal form is very nasty: transforming away $H_3$ we obtain the transformed $H'_4 = H_4 + \frac{1}{2}\{(\mathrm{ad}_{H_2}|_{\mathrm{im}\, \mathrm{ad}_{H_2}})^{-1}(H_3), H_3\}$, which thereafter has to be normalised to produce $\overline{H}_4$. The result is most likely that for almost all $\alpha$ and $\beta$ the nondegeneracy condition holds and the KAM theorem applies. Without computation this is clear for $|\alpha| \ll |\beta|$, because in this situation the coefficients of the normal form (10.1) change only slightly and the invertible matrices form an open set in the set of all matrices.

10.2. The even β-chain.

It is a surprise that in the normal form of the even β-chain no terms $b_j b_k$ ($j \neq k$) arise, see formula (10.1). This leads to the following remarkable conclusion:

Corollary 10.3. If $n$ is even and $\alpha = 0$, $\beta \neq 0$, then the truncated normal form $H_2 + \overline{H}_4$ of the reduced periodic FPU chain (3.3) is Liouville integrable. The integrals are the quadratics $a_j$ ($1 \le j \le \frac{n}{2}$), $b_j - b_{\frac{n}{2}-j}$ ($1 \le j < \frac{n}{4}$) and $d_{\frac{n}{4}}$ (if $n$ is a multiple of 4) and the fourth order terms $\omega_k^2 b_k^2 + \omega_{\frac{n}{2}-k}^2 b_{\frac{n}{2}-k}^2 + 4\omega_k \omega_{\frac{n}{2}-k}(c_k c_{\frac{n}{2}-k} - d_k d_{\frac{n}{2}-k})$ ($1 \le k < \frac{n}{4}$).
Proof. This follows from simply computing all the Poisson brackets, using (9.1) and the fact that the Poisson bracket is a derivation.

Only the $a_j$, $b_j - b_{\frac{n}{2}-j}$ and $d_{\frac{n}{4}}$ induce a $2\pi$-periodic flow and can therefore be seen as actions after some symplectic action-angle transformation. It is an open problem to construct the remaining action variables. Thereafter one could differentiate $\overline{H}_4$ with respect to them and verify the KAM nondegeneracy condition. One exceptional case is easier: the β-problem with 4 particles. Its truncated normal form reads:

$$H_2 + \overline{H}_4 = \sqrt{2}\, a_1 + 2a_2 + \frac{\beta}{4}\left(\frac{1}{8} a_1^2 + \frac{1}{4} a_2^2 + \frac{\sqrt{2}}{2} a_1 a_2 + \frac{1}{8} d_1^2\right), \tag{10.4}$$

which has the commuting integrals $a_1$, $a_2$ and $d_1$. The frequency map is

$$\omega: (a_1, a_2, d_1) \mapsto \left(\sqrt{2} + \frac{\beta}{16} a_1 + \frac{\sqrt{2}\beta}{8} a_2,\ 2 + \frac{\beta}{8} a_2 + \frac{\sqrt{2}\beta}{8} a_1,\ \frac{\beta}{16} d_1\right). \tag{10.5}$$

$\omega$ is nondegenerate, since

$$\frac{\partial \omega}{\partial(a_1, a_2, d_1)} = \frac{\beta}{4} \begin{pmatrix} \frac{1}{4} & \frac{\sqrt{2}}{2} & 0 \\ \frac{\sqrt{2}}{2} & \frac{1}{2} & 0 \\ 0 & 0 & \frac{1}{4} \end{pmatrix} \tag{10.6}$$
is invertible. So a theorem similar to Theorem 10.2 holds for the β-chain with 4 particles. It is unclear what happens for the even chain if $\alpha \neq 0$. The truncated normal form might not be integrable. On the other hand we already found about $\frac{3n}{4}$ integrals. And in [13] it was already shown that the normal forms of the α-β-chain with up to 6 particles are Liouville integrable.

11. Discussion

The lesson that we can learn from this analysis is that the characteristic features of the Fermi-Pasta-Ulam chain, such as quasiperiodicity and nonergodicity, are not just a property shared by all low-energy solutions of resonant Hamiltonian systems. On the contrary: the periodic FPU chain is a rather special system possessing particular symmetries and eigenvalues. These cause or may cause nondegenerate integrability of the Birkhoff–Gustavson normal form of the chain, which in turn implies that the KAM theorem (cf. [2]) is applicable. The appendix of this paper contains a calculation of all possible lower order resonances in the eigenvalues of the FPU chain. That is, it lists all the possible $n \in \mathbb{N}$ for which there is a vector $k = (k_1, \dots, k_n)$ with the property that $|k| := \sum_j |k_j| = 3$ or $|k| = 4$ and $\sum_j k_j \sin(j\pi/2n) = 0$. Clearly, the conclusions of Nishida (see the introduction and reference [10]) are valid for all chains with:

1) fixed endpoints.
2) $n - 1$ free particles, i.e. the 0th and the $n$th particle are forced to be at rest.
3) $\alpha = 0$, $\beta \neq 0$.
4) there is no $k \in \mathbb{Z}^{n-1}$ with $|k| = 4$ and $\sum_j k_j \sin(j\pi/2n) = 0$.
It is a little exercise to verify that this means that $n$ may have any value except multiples of 15 and 21. The following questions remain unanswered:

1. Can we, as for the periodic chain, find symmetry-like properties for the chain with fixed endpoints that overrule the resonances in the eigenvalues that occur if $n$ is a multiple of 15 or 21? This would make the proof of Nishida's "conjecture" complete.
2. From Corollary 9.1 we know that the truncated normal form of the odd FPU chain is integrable. In Sect. 10.1 we checked a nondegeneracy condition for the odd β-chain and were able to apply the KAM theorem. Can we also compute the normal form of the odd chain if $\alpha \neq 0$? Is it really nondegenerate?
3. What is the reason that the truncated normal form of the even β-chain is integrable, as we know from Corollary 10.3? Is there some hidden symmetry-like property of the FPU chain that prevents terms $b_j b_k$ ($j \neq k$) from appearing in the truncated normal form, thus causing the integrability?
4. Is it possible to explicitly construct action-angle coordinates for the truncated normal form of the even β-problem, globally or locally, and verify the KAM nondegeneracy condition?
5. What about the even α-β chain? As indicated in Corollary 9.3 its truncated normal form has a lot of conserved quantities. But is it also really Liouville integrable? If yes, then there is a big chance for the KAM theorem to work. And otherwise: can we find a counterexample of an even α-β chain with many ergodic orbits of low energy?

Where the first and third questions are of a rather philosophical nature, the other three involve tough computations. Answers might be given in a subsequent paper.
A. Proof of Theorem 8.1 This appendix is based on notes of Frits Beukers. Its main intention is to prove Theorem 8.1. Some algebra is used that might be uncommon to the reader, but fortunately the conclusions of Theorem 8.1 and Theorem 8.2 are easily understood.
A.1. Sums of roots of unity. We are interested in solving the resonance equation ν(-, θ) = 0, that is we want to find vanishing sums of the eigenvalues ±iωj = ±2i sin( jnπ ). A study of these sums is possible if we first consider sums of roots of unity. Fix N ∈ N. We study the equation ζ1 +ζ2 +· · ·+ζN = 0 in the unknown roots of unity ζi . The solutions will be determined modulo permutation of the terms and multiplication by a common root of unity. We also assume that there are no vanishing subsums, that is i∈I ζi = 0 for all I ⊂ {1, . . . , N}, |I | < N . We first state our basic tool. Let K be a k field generated over Q by roots of unity. Let pk be a prime power and let ζ := e2πi/p . p Suppose ζ ∈ K and ζ ∈ K. Proposition A.1. The minimal polynomial of ζ over the field K is given by X p − ζ p if k ≥ 2 and Xp−1 + X p−2 + · · · + X + 1 if k = 1. For the proof of this proposition we refer to [16], §60-61. To return to our problem let us choose M ∈ N minimal so that (ζi /ζj )M = 1 for all i, j = 1, 2, · · · , N . Since we can multiply every term of our relation with ζ1−1 and put ζi = ζi /ζ1 we may as well assume that all ζi are M th roots of unity. Let p k be a primary factor of M. Set M = M/p and write ζi = ζ˜i ζ ni where ζ˜i ∈ K := Q(e2πi/M ) and ni ∈ {0, 1, 2, . . . , p − 1}. Then, according to Prop. A.1, the minimal polynomial of ζ over K is Xp − ζ p if k ≥ 2 and Xp−1 + X p−2 + · · · + X + 1 if k = 1. We now rewrite our relation in the following form: p−1
ζ˜i ζ s = 0.
(R)
s=0 ni =s
If $k \geq 2$, the minimal polynomial of $\zeta$ over $K$ is $X^p - \zeta^p$. In particular this means that there exist no non-trivial $K$-linear relations between $1, \zeta, \ldots, \zeta^{p-1}$. So the relation (R) implies that all coefficients are zero, hence $\sum_{n_i = s} \tilde\zeta_i \zeta^s = 0$ for all $s = 0, 1, 2, \ldots, p-1$. By the minimal choice of $M$, at least two of the exponents $n_i, n_j$ should be different. Hence the assumption $k \geq 2$ leads automatically to vanishing subsums.

Let us now assume $k = 1$. Then the minimal polynomial of $\zeta$ over $K$ is $X^{p-1} + X^{p-2} + \cdots + X + 1$. This means that any $K$-linear relation between $1, \zeta, \ldots, \zeta^{p-1}$ must have all of its coefficients equal. Hence, (R) implies that all sums

$$\sum_{n_i = s} \tilde\zeta_i \tag{P}$$

have the same value $\sigma$. Since we do not want vanishing subsums, we necessarily have $\sigma \neq 0$. This in turn implies that each of the summations contains at least one term, and so $p \leq N$. This puts a bound on our search range.
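The two conclusions of this section (the order $M$ is square free, and every prime divisor $p$ of $M$ satisfies $p \leq N$) can be tested by brute force for small cases. The following is a minimal sketch, not part of the original argument: it works in floating point with an assumed tolerance `TOL`, normalizes each sum so that it contains the root 1, and all function names are hypothetical.

```python
import cmath
from functools import reduce
from itertools import combinations, combinations_with_replacement
from math import gcd

TOL = 1e-9  # assumed numerical threshold for "this sum vanishes"


def root_sum(exponents, M):
    """Sum of exp(2*pi*i*e/M) over the given exponents."""
    return sum(cmath.exp(2j * cmath.pi * e / M) for e in exponents)


def has_vanishing_subsum(exponents, M):
    """True if some nonempty proper subset of the terms sums to zero."""
    n = len(exponents)
    return any(
        abs(root_sum(sub, M)) < TOL
        for size in range(1, n)
        for sub in combinations(exponents, size)
    )


def minimal_vanishing_sums(M, N):
    """N-term vanishing sums of M-th roots of unity that contain the
    root 1 and have no vanishing subsums, as sorted exponent tuples."""
    found = []
    for rest in combinations_with_replacement(range(M), N - 1):
        exps = (0,) + rest
        if abs(root_sum(exps, M)) < TOL and not has_vanishing_subsum(exps, M):
            found.append(exps)
    return found


def order(exps, M):
    """Smallest M0 with (zeta_i/zeta_j)^M0 = 1 for all i, j."""
    return M // reduce(gcd, exps, M)


def prime_factors(m):
    """Prime factors of m with multiplicity, by trial division."""
    factors, p = [], 2
    while m > 1:
        while m % p == 0:
            factors.append(p)
            m //= p
        p += 1
    return factors


def satisfies_bounds(exps, M):
    """Order is square free and each of its prime divisors is <= N."""
    primes = prime_factors(order(exps, M))
    return len(primes) == len(set(primes)) and all(p <= len(exps) for p in primes)


if __name__ == "__main__":
    for M, N in [(12, 2), (12, 3), (12, 4), (30, 5)]:
        sums = minimal_vanishing_sums(M, N)
        print(M, N, sums, all(satisfies_bounds(e, M) for e in sums))
```

The choice $M = 12$ includes the non-square-free factor 4, so the search also probes the case $k \geq 2$, which according to the argument above can only produce sums with vanishing subsums.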
Symmetry and Resonance in Periodic FPU Chains
A.2. Explicit computations. In this section we compute vanishing sums of roots of unity having no vanishing subsums. It should be noted that the solutions are given modulo permutation of terms and multiplication by a common root of unity. For each of the specific values of $N$ we shall be considering, we denote by $M$ the smallest number such that $(\zeta_i/\zeta_j)^M = 1$ for all $i, j$. From the previous section we know that $M$ is square free and that $p \leq N$ for all prime divisors $p$ of $M$. Furthermore, we also note that if $M$ divides 6, then it is easy to see that the only possible relations without vanishing subsums are $1 - 1 = 0$ and $1 + \delta + \delta^2 = 0$, where $\delta = e^{2\pi i/3}$. So we shall assume that there is a prime $\geq 5$ dividing $M$. Since $N \geq p \geq 5$, the first interesting case to be considered is $N = 5$.

N = 5. We have $5 | M$. Then (P) partitions our sum in precisely five parts, each with equal sum. Hence $1 + \eta + \eta^2 + \eta^3 + \eta^4 = 0$, where $\eta = e^{2\pi i/5}$.

N = 6. Then $p \leq 5$, hence $5 | M$. Then (P) partitions our sum in four parts of length 1 and one of length 2. Hence we see that $-\delta - \delta^2 + \eta + \eta^2 + \eta^3 + \eta^4 = 0$ is the solution.

N = 7. Then $p \leq 7$. If $7 | M$ then necessarily $1 + \epsilon + \epsilon^2 + \epsilon^3 + \epsilon^4 + \epsilon^5 + \epsilon^6 = 0$, where $\epsilon = e^{2\pi i/7}$. Suppose 5 is the largest prime dividing $M$. Then (P) gives a partitioning in 31111 or 22111. The first gives rise to solutions with zero subsums. The second gives rise to the solutions $(-\delta - \delta^2)(1 + \eta) + \eta^2 + \eta^3 + \eta^4 = 0$ and $(-\delta - \delta^2)(1 + \eta^2) + \eta + \eta^3 + \eta^4 = 0$.

N = 8. Then $p \leq 7$. If $7 | M$ then (P) implies that we have a partitioning 2111111 and $-\delta - \delta^2 + \epsilon + \epsilon^2 + \epsilon^3 + \epsilon^4 + \epsilon^5 + \epsilon^6 = 0$. Suppose 5 is the largest prime dividing $M$. Then (P) gives a partitioning 41111, 32111 or 22211. The first two give rise only to vanishing subsums. The last gives rise to $(-\delta - \delta^2)(1 + \eta^i + \eta^j) + \eta^k + \eta^l = 0$, where $\{i, j, k, l\} = \{1, 2, 3, 4\}$.

A.3. Sums of the $i\omega_j$. We are interested in vanishing sums of the eigenvalues $\pm i\omega_j = \pm 2i \sin(j\pi/n)$. So we look for all solutions of $\zeta_1 + \cdots + \zeta_N = 0$ such that together with each $\zeta_i$, minus its complex conjugate, $-\zeta_i^{-1}$, also occurs. Since we shall only be interested in sums of 3 or 4 eigenvalues $i\omega_j$, we restrict ourselves to $N = 6, 8$. We include sums with vanishing subsums, except vanishing subsums of the form $\zeta - \zeta = 0$, since these give rise to vanishing subsums of $i\omega_j$'s. So all vanishing subsums of roots of unity must have length at least three.

N = 6. To bring our relation without zero subsums in the desired form, we have to multiply it by $\pm i$ and we derive

$$2i \sin(\pi/6) + 2i \sin(\pi/10) - 2i \sin(3\pi/10) = 0.$$

Now we look at relations with vanishing subsums. There can only be two vanishing subsums of length three. Hence $(\zeta_1 + \zeta_2)(1 + \delta + \delta^2) = 0$ with $\zeta_1, \zeta_2$ arbitrary. It is necessary and sufficient to assume that $\zeta_1 \zeta_2 = -1$. This gives $(\zeta - \zeta^{-1})(1 + \delta + \delta^2) = 0$, where $\zeta$ is arbitrary. Hence,

$$2i \sin(\pi r) + 2i \sin(\pi(r + 2/3)) + 2i \sin(\pi(r + 4/3)) = 0,$$

where $r$ is an arbitrary rational number.
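The two $N = 6$ relations can be checked numerically. The sketch below is an illustration only: it drops the common factor $i$, evaluates the sines in floating point, and samples a few arbitrarily chosen rationals $r$ for the one-parameter family. The helper names are hypothetical.

```python
import math
from fractions import Fraction


def sine_sum(terms):
    """Sum of sign * 2*sin(pi*r) over (sign, r) pairs.

    The common factor i of the eigenvalues is dropped, so a resonance
    relation holds when this real number is zero.
    """
    return sum(sign * 2.0 * math.sin(math.pi * float(r)) for sign, r in terms)


# Relation obtained from the N = 6 sum without vanishing subsums.
RELATION_N6 = [
    (+1, Fraction(1, 6)),
    (+1, Fraction(1, 10)),
    (-1, Fraction(3, 10)),
]


def family_n6(r):
    """One-parameter family coming from two vanishing subsums of length three."""
    return [(+1, r), (+1, r + Fraction(2, 3)), (+1, r + Fraction(4, 3))]


if __name__ == "__main__":
    print("isolated relation residual:", abs(sine_sum(RELATION_N6)))
    for r in (Fraction(1, 7), Fraction(3, 11), Fraction(5, 4)):
        print("family residual at r =", r, ":", abs(sine_sum(family_n6(r))))
```

Residuals at the level of machine rounding are consistent with the identities. A numerical check of this kind cannot replace the algebraic argument, which is what shows that the list is complete.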
B. Rink
N = 8. Let us first see what we get from our relations without zero subsums. We find

$$2i \sin(\pi/6) + 2i \sin(3\pi/14) - 2i \sin(\pi/14) - 2i \sin(5\pi/14) = 0,$$
$$2i \sin(\pi/6) + 2i \sin(13\pi/30) - 2i \sin(7\pi/30) - 2i \sin(3\pi/10) = 0,$$
$$2i \sin(\pi/6) + 2i \sin(\pi/30) - 2i \sin(11\pi/30) + 2i \sin(\pi/10) = 0.$$

Any relation with vanishing subsums must have subsums both of length 4, or subsums of lengths 3 and 5. The first case cannot occur, but the second yields $\zeta_1(1 + \delta + \delta^2) + \zeta_2(1 + \eta + \cdots + \eta^4) = 0$. Both $\zeta_1, \zeta_2$ must be purely imaginary and have opposite sign. So we can take $\zeta_1 = -\zeta_2 = i$, hence

$$2i \sin(\pi/2) - 2i \sin(\pi/6) + 2i \sin(\pi/10) - 2i \sin(3\pi/10) = 0.$$

Note that this constitutes a list of all possible nontrivial lower order resonance relations in the eigenvalues of the FPU-chain.
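The passage from a vanishing sum of roots of unity to a sine relation can be made concrete. In the sketch below each root is written as $e^{2\pi i t}$ with $t$ an exact fraction, the whole sum is multiplied by $i$ (a shift of $t$ by $1/4$), and every root $\zeta$ is paired with $-\zeta^{-1}$ (the map $t \mapsto 1/2 - t$ modulo 1), so that each pair contributes $\zeta - \zeta^{-1} = 2i \sin(2\pi t)$. This is an illustrative sketch with hypothetical names, applied to the $N = 8$ relation built from seventh roots of unity. The final comparison with the listed relations is done in floating point.

```python
import math
from collections import Counter
from fractions import Fraction


def pair_up(ts):
    """Multiply the roots exp(2*pi*i*t) by i and pair zeta with -1/zeta.

    Returns one representative t per pair; raises ValueError if the
    rotated sum is not closed under zeta -> -1/zeta.
    """
    rotated = [(t + Fraction(1, 4)) % 1 for t in ts]
    pool = Counter(rotated)
    reps = []
    for t in rotated:
        if pool[t] == 0:
            continue  # already used as the partner of an earlier root
        pool[t] -= 1
        partner = (Fraction(1, 2) - t) % 1
        if pool[partner] == 0:
            raise ValueError(f"no partner for t = {t}")
        pool[partner] -= 1
        reps.append(t)
    return reps


def sine_sum(terms):
    """Sum of sign * 2*sin(pi*r) over (sign, r) pairs (factor i dropped)."""
    return sum(sign * 2.0 * math.sin(math.pi * float(r)) for sign, r in terms)


# -delta - delta^2 + eps + ... + eps^6, with -delta = exp(2*pi*i*5/6)
# and -delta^2 = exp(2*pi*i/6); eps = exp(2*pi*i/7).
ROOTS_7 = [Fraction(5, 6), Fraction(1, 6)] + [Fraction(k, 7) for k in range(1, 7)]

# The four N = 8 relations as listed in the text, as (sign, r) with sin(pi*r).
RELATIONS_N8 = [
    [(+1, Fraction(1, 6)), (+1, Fraction(3, 14)), (-1, Fraction(1, 14)), (-1, Fraction(5, 14))],
    [(+1, Fraction(1, 6)), (+1, Fraction(13, 30)), (-1, Fraction(7, 30)), (-1, Fraction(3, 10))],
    [(+1, Fraction(1, 6)), (+1, Fraction(1, 30)), (-1, Fraction(11, 30)), (+1, Fraction(1, 10))],
    [(+1, Fraction(1, 2)), (-1, Fraction(1, 6)), (+1, Fraction(1, 10)), (-1, Fraction(3, 10))],
]

if __name__ == "__main__":
    derived = sorted(2.0 * math.sin(2.0 * math.pi * float(t)) for t in pair_up(ROOTS_7))
    listed = sorted(s * 2.0 * math.sin(math.pi * float(r)) for s, r in RELATIONS_N8[0])
    print("derived terms:", derived)
    print("listed terms: ", listed)
    for rel in RELATIONS_N8:
        print("residual:", abs(sine_sum(rel)))
```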
A.4. Proof of Theorem 8.1. We indicate how Theorem 8.1 can be proved using the previous paragraphs. From the first relation in Sect. A.3 we infer that $i\omega_{n/6} + i\omega_{n/10} - i\omega_{3n/10} = 0$ if $n$ is a multiple of 30. So multi-indices $\Theta, \theta$ can be found such that $|\Theta| + |\theta| = 3$ and $\pm\nu(\Theta, \theta) = i\omega_{n/6} + i\omega_{n/10} - i\omega_{3n/10} = 0$. But for this $\Theta$ and $\theta$, we must have that $\mu(\Theta, \theta) = \pm\frac{n}{6} \pm \frac{n}{10} \pm \frac{3n}{10}$, of which one easily verifies that it is unequal to 0 modulo $n$. One finds the same result for the other third order relation of the previous section. The verification is not hard, but needs more bookkeeping because of the appearance of the arbitrary rational. The conclusion is that for all multi-indices $\Theta, \theta$ with $|\Theta| + |\theta| = 3$ and $\nu(\Theta, \theta) = 0$, we have that $\mu(\Theta, \theta) \neq 0 \bmod n$. This proves the first part of Theorem 8.1, which actually states that $P_3 \cap \ker \mathrm{ad}_{H_2}$ is too small to have a nontrivial intersection with $\ker(T^* - \mathrm{Id})$.

The proof of the second part of Theorem 8.1 is not harder. For $|\Theta| + |\theta| = 4$, there are a number of trivial solutions to the equation $\nu(\Theta, \theta) = 0$, namely those of the form $i\omega_j - i\omega_j + i\omega_k - i\omega_k = 0$. These give rise to the $\Theta$ and $\theta$ mentioned in Theorem 8.1. All the other solutions to $\nu = 0$ must be of the form mentioned in Sect. A.3 under the item "N = 8". From the first relation we see for instance that $i\omega_{n/6} + i\omega_{3n/14} - i\omega_{n/14} - i\omega_{5n/14} = 0$ if $n$ is a multiple of 42. So multi-indices $\Theta, \theta$ with $|\Theta| + |\theta| = 4$ can be found such that $\pm\nu(\Theta, \theta) = i\omega_{n/6} + i\omega_{3n/14} - i\omega_{n/14} - i\omega_{5n/14} = 0$. But for these multi-indices, one must have that $\mu(\Theta, \theta) = \pm\frac{n}{6} \pm \frac{3n}{14} \pm \frac{n}{14} \pm \frac{5n}{14} \neq 0 \bmod n$. One finds the same conclusion for the other relations under the item "N = 8". This proves the second part of Theorem 8.1.
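The claim that these signed index sums never vanish modulo $n$ is a finite check for each relation. The sketch below enumerates all sign choices for the isolated relations of Sect. A.3 (the one-parameter family with the arbitrary rational is left out). It takes the $\pm$ expressions in the text at face value as the possible values of $\mu$; the precise definition of $\mu$ belongs to Sect. 8 and is not reproduced here, so that reading is an assumption, as are the names used.

```python
from fractions import Fraction
from itertools import product

# For each relation: (smallest admissible n, indices j expressed as j/n).
RELATIONS = {
    "third order": (30, [Fraction(1, 6), Fraction(1, 10), Fraction(3, 10)]),
    "fourth order, 7th roots": (
        42, [Fraction(1, 6), Fraction(3, 14), Fraction(1, 14), Fraction(5, 14)]),
    "fourth order, 5th roots (a)": (
        30, [Fraction(1, 6), Fraction(13, 30), Fraction(7, 30), Fraction(3, 10)]),
    "fourth order, 5th roots (b)": (
        30, [Fraction(1, 6), Fraction(1, 30), Fraction(11, 30), Fraction(1, 10)]),
    "fourth order, subsums 3 + 5": (
        30, [Fraction(1, 2), Fraction(1, 6), Fraction(1, 10), Fraction(3, 10)]),
}


def mu_residues(n, fracs):
    """All values of (+-j_1 +- j_2 ...) mod n for the indices j = frac * n."""
    indices = []
    for f in fracs:
        j = f * n
        if j.denominator != 1:
            raise ValueError(f"n = {n} does not make {f} * n an integer")
        indices.append(int(j))
    return {
        sum(s * j for s, j in zip(signs, indices)) % n
        for signs in product((1, -1), repeat=len(indices))
    }


def never_zero_mod_n(n, fracs):
    """True if no choice of signs gives a sum divisible by n."""
    return 0 not in mu_residues(n, fracs)


if __name__ == "__main__":
    for name, (base, fracs) in RELATIONS.items():
        ok = all(never_zero_mod_n(base * m, fracs) for m in range(1, 6))
        print(name, "-> nonzero mod n for n = base, ..., 5*base:", ok)
```

Since scaling $n$ by a factor $m$ scales every index by the same factor, it would suffice to test the smallest admissible $n$. The loop over multiples is kept only as a sanity check.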
Acknowledgement. The author thanks Frits Beukers, Richard Cushman, Hans Duistermaat, Reinout Quispel, Theo Tuwankotta and Ferdinand Verhulst for useful ideas, comments and discussions.
References
[1] Abraham, R., Marsden, J.E.: Foundations of Mechanics. Reading, MA: The Benjamin/Cummings Publ. Co., 1987
[2] Arnol'd, V.I., Avez, A.: Ergodic problems of classical mechanics. Reading, MA: Benjamin, 1968
[3] Churchill, R.C., Kummer, M., Rod, D.L.: On averaging, reduction, and symmetry in Hamiltonian systems. J. Diff. Eq. 49, 359–414 (1983)
[4] Fermi, E., Pasta, J., Ulam, S.: Los Alamos Report LA-1940. In: E. Fermi, Collected Papers, 2, 1955, pp. 977–988
[5] Ford, J.: The Fermi–Pasta–Ulam problem: Paradox turns discovery. Phys. Rep. 213, 271–310 (1992)
[6] Gardner, C.S., Greene, J.M., Kruskal, M.D., Miura, R.M.: Method for solving the Korteweg–de Vries equation. Phys. Rev. Lett. 19, 1095–1097 (1967)
[7] Izrailev, F.M., Chirikov, B.V.: Statistical properties of a nonlinear string. Sov. Phys. Dokl. 11 No. 1, 30–32 (1966)
[8] Lamb, J.S.W., Roberts, J.A.G.: Time-reversal symmetry in dynamical systems: a survey. Phys. D 112, 1–39 (1998)
[9] Livi, R., Pettini, M., Ruffo, S., Sparpaglione, M.: Equipartition threshold in nonlinear large Hamiltonian systems: The Fermi–Pasta–Ulam model. Phys. Rev. A 31, 1039–1045 (1985)
[10] Nishida, T.: A note on an existence of conditionally periodic oscillation in a one-dimensional anharmonic lattice. Mem. Fac. Eng. Univ. Kyoto 33, 27–34 (1971)
[11] Palais, R.S.: The symmetries of solitons. Bull. Am. Math. Soc. 34, 339–403 (1997)
[12] Poggi, P., Ruffo, S.: Exact solutions in the FPU oscillator chain. Phys. D 103, 251–272 (1997)
[13] Rink, B., Verhulst, F.: Near-integrability of periodic FPU-chains. Physica A 285, 467–482 (2000)
[14] Sanders, J.A.: On the theory of nonlinear resonance. Thesis, University of Utrecht, 1979
[15] Verhulst, F.: Symmetry and integrability in Hamiltonian normal forms. In: Symmetry and Perturbation Theory, Bambusi, D., Gaeta, G. (eds.), Firenze: Quaderni GNFM, 1998
[16] Van der Waerden, B.L.: Algebra I. Heidelberger Taschenbücher, Berlin–Heidelberg–New York: Springer, 1966
[17] Weissert, T.W.: The genesis of simulation in dynamics; pursuing the FPU problem. New York: Springer, 1997
[18] Zabusky, N.J., Kruskal, M.D.: Interaction of "Solitons" in a collisionless plasma and the recurrence of initial states. Phys. Rev. Lett. 15, 240–243 (1965)

Communicated by G. Gallavotti