Communications in Mathematical Physics - Volume 214


Author: M. Aizenman (Chief Editor)


Commun. Math. Phys. 214, 1 – 56 (2000)

Communications in Mathematical Physics

© Springer-Verlag 2000

Modular-Invariance of Trace Functions in Orbifold Theory and Generalized Moonshine

Chongying Dong (1), Haisheng Li (2), Geoffrey Mason (1)

(1) Department of Mathematics, University of California, Santa Cruz, CA 95064, USA
(2) Department of Mathematical Sciences, Rutgers University, Camden, NJ 08102, USA

Received: 7 January 1999 / Accepted: 14 March 2000

Abstract: The goal of the present paper is to provide a mathematically rigorous foundation to certain aspects of the theory of rational orbifold models in conformal field theory, in other words the theory of rational vertex operator algebras and their automorphisms. Under a certain finiteness condition on a rational vertex operator algebra V which holds in all known examples, we determine the precise number of g-twisted sectors for any automorphism g of V of finite order. We prove that the trace functions and correlation functions associated with such twisted sectors are holomorphic functions in the upper half-plane and, under suitable conditions, afford a representation of the modular group of the type prescribed in string theory. We establish the rationality of conformal weights and central charge. In addition to conformal field theory itself, where our conclusions are required on physical grounds, there are applications to the generalized Moonshine conjectures of Conway–Norton–Queen and to equivariant elliptic cohomology.

Contents

1. Introduction
2. Vertex Operator Algebras
3. Twisted Modules
4. P-Functions and Q-Functions
5. The Space of 1-Point Functions on the Torus
6. The Differential Equations
7. Formal 1-Point Functions
8. Correlation Functions
9. An Existence Theorem for g-Twisted Modules
10. The Main Theorems
11. Rationality of Central Charge and Conformal Weights
12. Condition $C_2$
13. Applications to the Moonshine Module

Supported by NSF grants DMS-9303374, DMS-9700923 and faculty research funds granted by the University of California at Santa Cruz.
Supported by NSF grant DMS-9616630.
Supported by NSF grants DMS-9401272, DMS-9700909 and faculty research funds granted by the University of California at Santa Cruz.

1. Introduction

The goal of the present paper is to give a mathematically rigorous study of rational orbifold models; more precisely, we study the existence and modular-invariance of twisted sectors of rational vertex operator algebras. The idea of orbifolding a vertex operator algebra with respect to an automorphism, and in particular the introduction of twisted sectors, goes back to some of the earliest papers in the subject [FLM1, Le, FLM2, FLM3, DHVW, DHVV, FFR, D3], while the question of modular-invariance underlies the whole enterprise. Apart from a few exceptions such as [DGM], the physical literature tends to treat the existence and modular-invariance of twisted sectors as axioms, while mathematical work has been mainly limited to studying special cases such as affine algebras and lattice theories [KP, Le, FLM2] and fermionic orbifolds [DM1]. Under some mild finiteness conditions on a rational vertex operator algebra V we will, among other things, establish the following:

(A) The precise number of inequivalent, simple g-twisted sectors that V possesses.
(B) Modular-invariance (in a suitable sense) of the characters of twisted sectors.

In order to facilitate the following discussion we assume that the reader has a knowledge of the basic definitions concerning vertex operator algebras as given, for example, in [FLM3, FHL, DM1] and below in Sects. 2 and 3. Let us suppose that V is a vertex operator algebra. There are several approaches to what it means for V to be rational, each of them referring to finiteness properties of V of various kinds (cf. [MS, HMV, AM] for example). Our own approach is as follows (cf. [DLM2, DLM3]): following [Z], an admissible V-module is a certain linear space

$$M = \bigoplus_{n=0}^{\infty} M(n) \tag{1.1}$$

which admits an action of V by vertex operators which satisfy certain axioms, the most important of which is the Jacobi identity. However, the homogeneous subspaces M(n) are not assumed to be finite-dimensional. As explained in [DLM2], this definition includes as a special case the idea of an (ordinary) V-module, which is a graded linear space of the shape

$$M = \bigoplus_{n \in \mathbb{C}} M_n \tag{1.2}$$

such that M admits an appropriate action of V by vertex operators and such that each $M_n$ has finite dimension, $M_{\lambda+n} = 0$ for fixed $\lambda \in \mathbb{C}$ and sufficiently small integer n, and each $M_n$ is the n-eigenspace of the L(0)-operator. Although one ultimately wants to establish the rationality of the grading of such modules, it turns out to be convenient to allow the gradings to be a priori more general. One then has to prove that the grading is rational after all. Incidentally, the terms “simple module” and “sector” are synonymous in this context.

We call V a rational vertex operator algebra in case each admissible V-module is completely reducible, i.e., a direct sum of simple admissible modules. We have proved in [DLM3] that this assumption implies that V has only finitely many inequivalent simple admissible modules; moreover, each such module is in fact an ordinary simple V-module. These results are special cases of results proved in (loc.cit.) in which one considers the same set-up, but relative to an automorphism g of V of finite order. Thus one has the notions of g-twisted V-module, admissible g-twisted V-module and g-rational vertex operator algebra V, the latter being a vertex operator algebra all of whose admissible g-twisted modules are completely reducible. We do not give the precise definitions here (cf. Sect. 3), noting only that if g has order T then a simple g-twisted V-module has a grading of the shape

$$M = \bigoplus_{n=0}^{\infty} M_{\lambda + n/T} \tag{1.3}$$

with $M_\lambda \neq 0$ for some complex number λ which is called the conformal weight of M. This is an important invariant of M which plays a rôle in the theory of the Verlinde algebra, for example. The basic result (loc.cit.) is that a g-rational vertex operator algebra has only finitely many inequivalent, simple, admissible g-twisted modules, and each of them is an ordinary simple g-twisted module.

Although the theory of twisted modules includes that of ordinary modules, which corresponds to the case g = 1, it is nevertheless common and convenient to refer to the untwisted theory if g = 1, and to the twisted theory otherwise. It seems likely that if V is rational then in fact it is g-rational for all automorphisms g of finite order, but this is not known. Nevertheless, our first main result shows that there is a close relation between the untwisted and twisted theories. To explain this, we first recall standard notation for the vertex operator determined by v:

$$Y(v, z) = \sum_{n \in \mathbb{Z}} v(n) z^{-n-1} \tag{1.4}$$

so that each v(n) is a linear operator on V. Define $C_2(V)$ to be the subspace of V spanned by the elements u(−2)v for u, v in V. We say that V satisfies Condition $C_2$ if $C_2(V)$ has finite codimension in V. This is closely related to Zhu’s condition C [Z], which also includes the hypothesis that V is a sum of highest weight modules for the Virasoro algebra. Zhu asserted in [Z], and we verify in Sect. 12, that Condition $C_2$ holds for a number of the most familiar rational vertex operator algebras. Next, any automorphism h of V induces in a natural way (cf. Sect. 4 of [DM2] and Sect. 3 below) a bijection from the set of isomorphism classes of (simple, admissible) g-twisted modules to the corresponding set of $hgh^{-1}$-twisted modules, so that if g and h commute then one may consider the set of h-stable g-twisted modules. In particular we have the set of h-stable ordinary (or untwisted) modules, which includes the vertex operator algebra V itself. Our first set of results may now be stated as follows:

Theorem 1.1. Suppose that V is a rational vertex operator algebra which satisfies Condition $C_2$. Then the following hold:
(i) The central charge of V and the conformal weight of each simple V-module are rational numbers.
(ii) If g is an automorphism of V of finite order then the number of inequivalent, simple g-twisted V-modules is at most equal to the number of g-stable simple untwisted V-modules, and is at least 1 if V is simple.
(iii) If V is g-rational, the number of inequivalent, simple g-twisted V-modules is precisely the number of g-stable simple untwisted V-modules.

We actually prove an extension of this result in which V is also assumed to be $g^i$-rational for all integers i (g as in (ii)), where the conclusion is that each simple $g^i$-twisted V-module has rational conformal weight.

There is a second variation of this theme involving the important class of holomorphic vertex operator algebras. This means that V is assumed to be both simple and rational, and moreover V is assumed to have a unique simple module, which is necessarily the adjoint module V itself. Familiar examples include the Moonshine Module [B2, FLM3] and vertex operator algebras associated to positive-definite, even, unimodular lattices [B2], [FLM3]. Proof of rationality and holomorphy of these particular vertex operator algebras can be found in [D1, D2] and [DLM2]. We establish

Theorem 1.2. Suppose that V is a holomorphic vertex operator algebra which satisfies Condition $C_2$, and that g is an automorphism of V of finite order. Then the following hold:
(i) V possesses a unique simple g-twisted V-module up to isomorphism, call it V(g).
(ii) The conformal weight of V(g) is a rational number.

We turn now to a discussion of the general question of modular-invariance. One is concerned with various trace functions, the most basic of which is the formal character of a (simple) g-twisted sector M. If M has grading (1.3) we define the formal character as

$$\operatorname{char}_q M = q^{\lambda - c/24} \sum_{n=0}^{\infty} \dim M_{\lambda+n/T}\, q^{n/T}, \tag{1.5}$$

where c is the central charge and q a formal variable. More generally, if M is an h-stable g-twisted sector as before, then h induces a linear map on M which we denote by φ(h), and one may consider the corresponding graded trace

$$Z_M(g, h) = q^{\lambda - c/24} \sum_{n=0}^{\infty} \operatorname{tr}_{M_{\lambda+n/T}} \phi(h)\, q^{n/T}. \tag{1.6}$$

The linear map φ(h) is only determined up to a nonzero scalar and therefore $Z_M(g, h)$ is also only defined up to a nonzero scalar. Similarly, up to a nonzero scalar $Z_M(g, h)$ is independent of the choice of g-twisted sector in the isomorphism class of M. The choice of such a scalar does not interfere with any of the proofs and results in the present paper.

As is well-known, it is important to consider these trace functions as special cases of so-called (g, h) correlation functions. These may be defined for homogeneous elements v ∈ V of weight k and any pair of commuting elements (g, h) via

$$T_M(v, g, h) = q^{\lambda - c/24} \sum_{n=0}^{\infty} \operatorname{tr}_{M_{\lambda+n/T}} \bigl(v(k-1)\phi(h)\bigr)\, q^{n/T}. \tag{1.7}$$

Note that v(k − 1) induces a linear map on each homogeneous subspace of M. If we take v to be the vacuum vector then (1.7) reduces to (1.6).


In the special case when V is holomorphic and the twisted sector V(g) is unique up to equivalence, we set

$$Z_{V(g)}(g, h) = Z(g, h), \qquad T_{V(g)}(v, g, h) = T(v, g, h). \tag{1.8}$$

Note that in this situation, the uniqueness of V(g) shows that V(g) is h-stable whenever g and h commute. As discussed in [DM1, DM2], this allows us to consider φ as a projective representation of the centralizer C(g) on V(g); in particular (1.7) is defined for all commuting pairs (g, h). Linearizing this projective representation to an ordinary representation of a covering group of C(g) involves the choice of a 2-cocycle with coefficients in $\mathbb{C}^*$. Such a choice corresponds to a choice of the scalar in defining φ(h), as discussed above.

One can also consider these trace functions less formally. Taking q to be the usual local parameter at infinity in the upper half-plane

$$\mathfrak{h} = \{\tau \in \mathbb{C} \mid \operatorname{im} \tau > 0\}, \tag{1.9}$$

i.e., $q = q_\tau = e^{2\pi i \tau}$, we will see that the trace functions $T_M(v, g, h, \tau)$ converge to holomorphic functions in $\mathfrak{h}$ under suitable conditions on V. By extending $T_M$ linearly to the whole of V one obtains a function

$$T_M : V \times P(G) \times \mathfrak{h} \to P^1(\mathbb{C}), \tag{1.10}$$

where P(G) is the set of commuting pairs of elements in G. We take $\Gamma = SL(2, \mathbb{Z})$ to operate on $\mathfrak{h}$ in the usual way via Möbius transformations, that is

$$\gamma : \tau \mapsto \frac{a\tau + b}{c\tau + d}, \qquad \gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma, \tag{1.11}$$

and we let it act on the right of P(G) via

$$(g, h)\gamma = (g^a h^c, g^b h^d). \tag{1.12}$$
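As an illustration of (1.12), the following minimal sketch checks that the rule is a right action in the simplest setting, assuming that g and h lie in a cyclic group of order N written additively, so that a pair is stored as two residues mod N. The names `act`, `matmul`, `is_right_action`, `S` and `T` are hypothetical helpers introduced only for this check; they are not notation from the paper.

```python
from itertools import product

def act(pair, gamma, N):
    """Apply (g, h) -> (g^a h^c, g^b h^d) in Z/N, written additively."""
    (x, y), ((a, b), (c, d)) = pair, gamma
    return ((a * x + c * y) % N, (b * x + d * y) % N)

def matmul(g1, g2):
    """Product of two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = g1
    (e, f), (g, h) = g2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

# Standard generators of SL(2, Z)
S = ((0, -1), (1, 0))
T = ((1, 1), (0, 1))

def is_right_action(N, gens=(S, T)):
    """Check ((g, h) gamma1) gamma2 == (g, h)(gamma1 gamma2) on all pairs in Z/N."""
    for pair in product(range(N), repeat=2):
        for g1, g2 in product(gens, repeat=2):
            if act(act(pair, g1, N), g2, N) != act(pair, matmul(g1, g2), N):
                return False
    return True
```

In this additive picture S sends (g, h) to (h, g^{-1}) and T sends (g, h) to (g, gh), and the check succeeds because the rule is a row vector multiplied on the right by the matrix γ.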

Zhu has introduced in [Z] a second vertex operator algebra associated in a certain way to V; it has the same underlying space, however the grading is different. We are concerned with elements v in V which are homogeneous of weight k, say, with regard to the second vertex operator algebra. We write this as wt[v] = k. For such v we define an action of the modular group Γ on $T_M$ in a familiar way, namely

$$T_M|\gamma\,(v, g, h, \tau) = (c\tau + d)^{-k}\, T_M(v, g, h, \gamma\tau). \tag{1.13}$$
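The weight-k rule in (1.13) composes correctly because the automorphy factor $j(\gamma, \tau) = c\tau + d$ satisfies $j(\gamma_1\gamma_2, \tau) = j(\gamma_1, \gamma_2\tau)\, j(\gamma_2, \tau)$. The sketch below checks this standard identity numerically at a sample point of the upper half-plane; the function names are hypothetical, and the check concerns only the factor $(c\tau + d)$, not the trace functions themselves.

```python
def mobius(gamma, tau):
    """Moebius transformation tau -> (a*tau + b) / (c*tau + d)."""
    (a, b), (c, d) = gamma
    return (a * tau + b) / (c * tau + d)

def j(gamma, tau):
    """Automorphy factor c*tau + d."""
    (a, b), (c, d) = gamma
    return c * tau + d

def matmul(g1, g2):
    """Product of two 2x2 integer matrices given as nested tuples."""
    (a, b), (c, d) = g1
    (e, f), (g, h) = g2
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def cocycle_defect(g1, g2, tau):
    """|j(g1 g2, tau) - j(g1, g2 tau) * j(g2, tau)|, which should vanish."""
    return abs(j(matmul(g1, g2), tau) - j(g1, mobius(g2, tau)) * j(g2, tau))

S = ((0, -1), (1, 0))
T = ((1, 1), (0, 1))
```

For any function f on the upper half-plane, this identity is what gives $(f|\gamma_1)|\gamma_2 = f|(\gamma_1\gamma_2)$ when $f|\gamma$ is defined by the factor $(c\tau + d)^{-k}$ as in (1.13).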

We state some of our main results concerning modular-invariance.

Theorem 1.3. Suppose that V is a vertex operator algebra which satisfies Condition $C_2$, and let G be a finite group of automorphisms of V.
(i) For each triple (v, g, h) ∈ V × P(G) and for each h-stable g-twisted sector M, the trace function $T_M(v, g, h, \tau)$ converges to a holomorphic function in $\mathfrak{h}$.
(ii) Suppose in addition that V is g-rational for each g ∈ G. Let v ∈ V satisfy wt[v] = k. Then the space of (holomorphic) functions in $\mathfrak{h}$ spanned by the trace functions $T_M(v, g, h, \tau)$ for all choices of g, h and M is a (finite-dimensional) Γ-module with respect to the action (1.13).


More precisely, if γ ∈ Γ then we have an equality

$$T_M|\gamma\,(v, g, h, \tau) = \sum_{W} \sigma_W\, T_W(v, (g, h)\gamma, \tau), \tag{1.14}$$

where (g, h)γ is as in (1.12) and W ranges over the $g^a h^c$-twisted sectors which are $g^b h^d$-stable. The constants $\sigma_W$ depend only on g, h, γ and W.

Theorem 1.4. Suppose that V is a holomorphic vertex operator algebra which satisfies Condition $C_2$, and let G be a cyclic group of automorphisms of V. If (g, h) ∈ P(G), v ∈ V satisfies wt[v] = k, and γ ∈ Γ, then T(v, g, h, τ) is a holomorphic function in $\mathfrak{h}$ which satisfies

$$T|\gamma\,(v, g, h, \tau) = \sigma(g, h, \gamma)\, T(v, (g, h)\gamma, \tau) \tag{1.15}$$

for some constant σ(g, h, γ).

Note that if V is as in Theorem 1.4, then $\operatorname{char}_q V$ is a modular function on $SL(2, \mathbb{Z})$, possibly with character. It follows from this that the central charge of V is an integer divisible by 8.

We can summarize some of the results above by saying that the functions $T_M(v, g, h, \tau)$ and T(v, g, h, τ) are generalized modular forms of weight k in the sense of [KM]. This means essentially that each of these functions and each of their transforms under the modular group have q-expansions in rational powers of q with bounded denominators, and that up to scalar multiples there are only a finite number of such transforms. One would like to show that each of these functions is, in fact, a modular form of weight k and some level N in the usual sense of being invariant under the principal congruence subgroup Γ(N). This will require further argument, as it is shown in (loc.cit.) that there are generalized modular forms which are not modular forms in the usual sense. All we can say in general at the moment is that $T_M(v, g, h, \tau)$ and T(v, g, h, τ) have finite level in the sense of Wohlfahrt [Wo]. This is true of all generalized modular forms, and means that $T_M(v, g, h, \tau)$ and T(v, g, h, τ) and each of their Γ-transforms are invariant under the group Δ(N) for some N, where we define Δ(N) to be the normal closure of $\begin{pmatrix} 1 & N \\ 0 & 1 \end{pmatrix}$ in Γ.

Let us emphasize the differences between Theorem 1.3 (ii) and Theorem 1.4. In the former we assume g-rationality, which in practice is hard to verify, even for known vertex operator algebras. In Theorem 1.4 there is no such assumption, however we have to limit ourselves to cyclic pairs (g, h) in P(G). One expects Theorem 1.4 to hold for all commuting pairs (g, h).

In [N], Simon Norton conjectured that (1.15) holds in the special case that v is the vacuum vector and V is the Moonshine Module [FLM3] whose automorphism group is the Monster (loc.cit.). His argument was based on extensive numerical evidence in Conway-Norton’s famous paper Monstrous Moonshine [CN] which was significantly expanded in the thesis of Larissa Queen [Q]. A little later it was given a string-theoretic interpretation in [DGH]. We will see in Sect. 12 that the Moonshine Module satisfies Condition $C_2$ (a proof is also outlined in [Z]), so that Theorem 1.4 applies. Norton also conjectured that each Z(g, h, τ) is either constant or a hauptmodul, the latter being a modular function (weight zero) on some discrete subgroup $\Gamma_0$ of $SL(2, \mathbb{R})$ of genus zero such that the modular function in question generates the full field of meromorphic modular functions on $\Gamma_0$. By utilizing the results of Borcherds [B2] which establish the original Moonshine conjectures in [CN] for the Moonshine Module we obtain


Theorem 1.5 (Generalized Moonshine). Let $V^\natural$ be the Moonshine Module, and let g be an element of the Monster simple group $\mathbb{M}$. The following hold:
(i) $V^\natural$ has a unique g-twisted sector $V^\natural(g)$.
(ii) The formal character $\operatorname{char}_q V^\natural(g)$ is a hauptmodul.
(iii) More generally, if g and h in $\mathbb{M}$ generate a cyclic subgroup then the graded trace Z(g, h, τ) of φ(h) on $V^\natural(g)$ is a hauptmodul.

These results essentially establish the Conway–Norton–Queen conjectures about the Monster for cyclic pairs (g, h). As we have said, the spaces $V^\natural(g)$ support faithful projective representations of the corresponding centralizer $C_{\mathbb{M}}(g)$ of g in $\mathbb{M}$. It was these representations that were conjectured to exist in [N] and [Q], and they are of considerable interest in their own right. See [DLM1] for more information in some special cases. There also appear to be some remarkable connections with the work of Borcherds and Ryba [Ry, BR, B3] on modular moonshine which we hope to consider elsewhere.

Finally, we mention that Theorem 1.4 can be considerably strengthened, indeed the best possible results can be established, if we assume that g has small order. For simplicity we state only a special case:

Theorem 1.6. Let V, v, k be as in Theorem 1.4, and assume that the central charge of V is divisible by 24. Suppose that g has order p = 2 or 3. Then the following hold:
(i) The conformal weight λ of the (unique) g-twisted sector V(g) is in $\frac{1}{p^2}\mathbb{Z}$.
(ii) The graded trace T(v, 1, g, τ) is a modular form of weight k and level $p^2$.
(iii) We have $\lambda \in \frac{1}{p}\mathbb{Z}$ if, and only if, T(v, g, h, τ) is a modular form on the congruence subgroup $\Gamma_0(p)$.

These results follow from the previous theorems, and will be proved elsewhere. They can be used to make rigorous some of the assumptions commonly made by physicists (cf. [M, S, T1, T2, T3, V] for example) in the theory of $\mathbb{Z}_p$-orbifolds, and we hope that Theorem 1.6 may provide the basis for a more complete theory of such orbifolds.

We have already mentioned the work of Zhu [Z] on several occasions, and it is indeed this paper to which we are mainly indebted intellectually. In essence, we are going to prove an equivariant version of the theory laid down by Zhu (loc.cit.), though even in the special case that he was studying our work yields improvements on his results. In particular, Zhu’s hypothesis that V is a sum of highest weight modules for the Virasoro algebra is eliminated in the present paper, and our notion of rationality, developed in [DLM3], is qualitatively weaker than that of Zhu [Z]. Nevertheless, the broad outline of our proof follows that of [Z].

The equivariant refinement of Zhu’s theory began with our paper [DLM3] which also plays a basic rôle in the present paper. In that paper we constructed so-called twisted Zhu algebras $A_g(V)$ which are associative algebras associated to a vertex operator algebra V and automorphism g of finite order. They have the property that, at least for suitable classes of vertex operator algebras, the module category for $A_g(V)$ and the category of g-twisted modules for V are equivalent. This reduces the construction of g-twisted sectors to the corresponding problem for $A_g(V)$ (not a priori known to be non-zero!). As in [Z], the rôle of the finiteness condition $C_2$ is to show that the (g, h) correlation functions that we have considered above satisfy certain differential equations of regular-singular type.
These differential equations have coefficients which are essentially modular forms on a congruence subgroup, a fact which is ultimately attributable to the Jacobi identity satisfied by the vertex operators. One attempts to characterize the space of correlation functions as those solutions of the differential equation which possess other technical features related to properties of V and $A_g(V)$. This space is essentially what is sometimes referred to as the (g, h)-conformal block, and our results follow from the technical assertion that, under suitable circumstances, the (g, h)-conformal block is indeed spanned by the (g, h)-correlation functions. This whole approach to conformal blocks is inspired by [Z], but is more complicated when twisted sectors are involved.

We point out that the holomorphy of trace functions follows from the fact that they are solutions of suitable differential equations; moreover, the Frobenius–Fuchs theory of differential equations with regular-singular points leads to the representation of elements of the conformal block as q-expansions. One attempts to identify coefficients of such q-expansions with the trace function defined by some $A_g(V)$-module, and at the same time show that elements of the conformal blocks are free of logarithmic singularities. The point is that the Frobenius–Fuchs theory plays a critical rôle, as it does also in [Z].

The paper is organized as follows: after some preliminaries in Sects. 2 and 3, we take up in Sect. 4 the study of certain modular and Jacobi-type forms, the main goal being to write down the transformation laws which they satisfy. Our methods (and probably the results too) will be well-known to experts in elliptic functions, but it is fascinating to see how classical topics such as Eisenstein series and Bernoulli distributions play a rôle in an abstract theory of vertex operator algebras. In Sect. 5 we introduce the space of abstract (g, h) 1-point functions associated to a vertex operator algebra and establish that it affords an action by the modular group (Theorem 5.4).
In Sections 6 and 7 we continue the study of such functions and in particular write down the differential equation that they satisfy and the general shape of the solutions in terms of q-expansions and logarithmic singularities. Next we prove in Sect. 8 (Theorems 8.1 and 8.7) that if g and h commute then distinct h-stable g-twisted sectors give rise to linearly independent trace functions which lie in the (g, h)-conformal block, and in Sect. 9 we give as an application of the ideas developed so far a general existence theorem for twisted sectors. Section 10 contains the main theorems which give sufficient conditions under which the (g, h)-conformal block is spanned by trace functions. Having reached Sect. 11, we have enough information to be able to apply the methods and results of Anderson and Moore [AM], and this leads to the rationality results stated above in Theorems 1.1 and 1.2 as well as the applications to modular-invariance and to the generalized Moonshine Conjectures, which are discussed in Sect. 13. We also point out how one can use Theorem 1.4 to describe not only other correlation functions but also “Monstrous Moonshine of weight k.” Sect. 12 establishes Condition $C_2$ for a number of well-known rational vertex operator algebras, so that our theory applies to all of these examples.

2. Vertex Operator Algebras

The definition of vertex operator algebra [FLM3] entails a Z-graded complex vector space

$$V = \bigoplus_{n \in \mathbb{Z}} V_n \tag{2.1}$$

satisfying $\dim V_n < \infty$ for all n and $V_n = 0$ for $n \ll 0$. If $v \in V_n$ we write wt v = n and say that v is homogeneous and has (conformal) weight n. For each v ∈ V there are linear operators $v(n) \in \operatorname{End} V$, $n \in \mathbb{Z}$, which are assembled into a vertex operator

$$Y(v, z) = \sum_{n \in \mathbb{Z}} v(n) z^{-n-1} \in (\operatorname{End} V)[[z, z^{-1}]]$$

which is linear in v. Various axioms are imposed. For u, v ∈ V,

(i)
$$u(n)v = 0 \quad \text{for } n \text{ sufficiently large}. \tag{2.2}$$

(ii) There is a distinguished vacuum element $\mathbf{1} \in V_0$ satisfying

$$Y(\mathbf{1}, z) = 1 \tag{2.3}$$

and

$$Y(v, z)\mathbf{1} = v + \sum_{n \geq 2} v(-n)\mathbf{1}\, z^{n-1}. \tag{2.4}$$

(iii) There is a distinguished conformal vector $\omega \in V_2$ with generating function

$$Y(\omega, z) = \sum_{n \in \mathbb{Z}} L(n) z^{-n-2}$$

such that the component operators generate a copy of the Virasoro algebra represented on V with central charge $c \in \mathbb{C}$. That is,

$$[L(m), L(n)] = (m - n)L(m + n) + \frac{1}{12}(m^3 - m)\,\delta_{m+n,0}\, c. \tag{2.5}$$

Moreover we have

$$V_n = \{v \in V \mid L(0)v = nv\}, \tag{2.6}$$

$$\frac{d}{dz} Y(v, z) = Y(L(-1)v, z). \tag{2.7}$$

(iv) The Jacobi identity holds, that is

$$z_0^{-1}\delta\!\left(\frac{z_1 - z_2}{z_0}\right) Y(u, z_1)Y(v, z_2) - z_0^{-1}\delta\!\left(\frac{z_2 - z_1}{-z_0}\right) Y(v, z_2)Y(u, z_1) = z_2^{-1}\delta\!\left(\frac{z_1 - z_0}{z_2}\right) Y(Y(u, z_0)v, z_2) \tag{2.8}$$

where $\delta(z) = \sum_{n \in \mathbb{Z}} z^n$ and $(z_i - z_j)^n$ is expanded as a formal power series in $z_j$. Throughout the paper, $z_0, z_1, z_2$, etc. are independent commuting formal variables.

Such a vertex operator algebra may be denoted by the 4-tuple $(V, Y, \mathbf{1}, \omega)$ or, more usually, by V.

Zhu has introduced a second vertex operator algebra $(V, Y[\ ], \mathbf{1}, \tilde{\omega})$ associated to V in Theorem 4.2.1 of [Z]. It plays a crucial rôle in Zhu’s theory and also in the present paper, so we give some details of the construction (see footnote 1).

Footnote 1: Our formulae differ from those of Zhu by a factor of $2\pi i$ in the exponent $e^{z\,\mathrm{wt}\,v}$.

The conformal vector $\tilde{\omega}$ is defined to be $\omega - \frac{c}{24}$. The vertex operators Y[v, z] are defined for homogeneous v via the equality

$$Y[v, z] = Y(v, e^z - 1)\, e^{z\,\mathrm{wt}\,v} = \sum_{n \in \mathbb{Z}} v[n] z^{-n-1}. \tag{2.9}$$

For integers i, m, p with i, m ≥ 0 we may define c(p, i, m) in either of two equivalent ways:

$$\binom{p - 1 + z}{i} = \sum_{m=0}^{i} c(p, i, m) z^m, \tag{2.10}$$

$$m! \sum_{i=m}^{\infty} c(p, i, m) z^i = (\log(1 + z))^m (1 + z)^{p-1}, \tag{2.11}$$

where, as usual,

$$\log(1 + z) = \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n} z^n. \tag{2.12}$$
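The equivalence of (2.10) and (2.11) can be checked by machine for small arguments. The sketch below, which uses exact rational arithmetic, computes c(p, i, m) once from the binomial polynomial in (2.10) and once from the truncated power series in (2.11); the helper names `poly_mul`, `c_from_binomial` and `c_from_log` are hypothetical and exist only for this comparison.

```python
from fractions import Fraction
from math import factorial

def poly_mul(a, b, order=None):
    """Multiply coefficient lists (index = power of z), optionally truncating at z^order."""
    n = len(a) + len(b) - 1 if order is None else order + 1
    out = [Fraction(0)] * n
    for i, x in enumerate(a):
        for k, y in enumerate(b):
            if i + k < n:
                out[i + k] += x * y
    return out

def c_from_binomial(p, i, m):
    """c(p, i, m) read off from (2.10): binom(p-1+z, i) as a polynomial in z."""
    poly = [Fraction(1)]
    for k in range(i):
        # multiply by the factor (p - 1 - k + z)
        poly = poly_mul(poly, [Fraction(p - 1 - k), Fraction(1)])
    coeffs = [x / factorial(i) for x in poly]
    return coeffs[m] if m < len(coeffs) else Fraction(0)

def c_from_log(p, i, m):
    """c(p, i, m) read off from (2.11), with all series truncated at z^i."""
    log_series = [Fraction(0)] + [Fraction((-1) ** (n - 1), n) for n in range(1, i + 1)]
    # (1+z)^(p-1) via the generalized binomial series (p - 1 may be negative)
    series, coeff = [], Fraction(1)
    for n in range(i + 1):
        series.append(coeff)
        coeff = coeff * (p - 1 - n) / (n + 1)
    for _ in range(m):
        series = poly_mul(series, log_series, order=i)
    return series[i] / factorial(m)
```

The two routes agree because both sides are the coefficient of $w^m z^i$ in $(1+z)^{p-1+w} = (1+z)^{p-1}\exp(w\log(1+z))$.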

Using a change of variable we calculate that

$$v[m] = \mathrm{Res}_z\, Y[v, z] z^m \tag{2.13}$$

$$\phantom{v[m]} = \mathrm{Res}_z\, Y[v, \log(1 + z)] (\log(1 + z))^m (1 + z)^{-1}, \tag{2.14}$$

i.e.,

$$v[m] = \mathrm{Res}_z\, Y(v, z) (\log(1 + z))^m (1 + z)^{\mathrm{wt}\,v - 1}. \tag{2.15}$$

So if m ≥ 0 then

$$v[m] = m! \sum_{i=m}^{\infty} c(\mathrm{wt}\,v, i, m)\, v(i). \tag{2.16}$$

For example we have

$$v[0] = \sum_{i=0}^{\infty} \binom{\mathrm{wt}\,v - 1}{i} v(i). \tag{2.17}$$

We also write

$$Y[\tilde{\omega}, z] = \sum_{n \in \mathbb{Z}} L[n] z^{-n-2}. \tag{2.18}$$

For example, one has

$$L[-2] = \omega[-1] - \frac{c}{24}, \tag{2.19}$$

$$L[-1] = L(-1) + L(0), \tag{2.20}$$

$$L[0] = L(0) + \sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n(n + 1)} L(n). \tag{2.21}$$
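Formula (2.21) can be cross-checked against (2.15). The sketch below assumes the identifications $L[0] = \omega[1]$ and $\omega(i) = L(i-1)$, which follow from comparing the mode expansions of $Y[\tilde{\omega}, z]$ and $Y(\omega, z)$ with (2.9) and (1.4) and from $Y[\mathbf{1}, z] = 1$. With wt ω = 2 and m = 1, the coefficient of L(n) in L[0] is then the coefficient of $z^{n+1}$ in $(1+z)\log(1+z)$. The function names are hypothetical.

```python
from fractions import Fraction

def l0_coefficients(nmax):
    """Coefficient of L(n) in L[0] for n = 0..nmax, read off from (1+z)*log(1+z)."""
    # log(1+z) up to z^(nmax+1); list index = power of z
    log_series = [Fraction(0)] + [Fraction((-1) ** (k - 1), k) for k in range(1, nmax + 2)]
    # multiply the series by (1 + z)
    prod = [log_series[i] + (log_series[i - 1] if i > 0 else 0) for i in range(nmax + 2)]
    # omega(i) = L(i - 1), so the coefficient of z^(n+1) multiplies L(n)
    return [prod[n + 1] for n in range(nmax + 1)]

def l0_closed_form(n):
    """Coefficient of L(n) on the right-hand side of (2.21)."""
    return Fraction(1) if n == 0 else Fraction((-1) ** (n - 1), n * (n + 1))
```

The same reasoning with m = 0 and (2.17) gives $\omega[0] = \omega(0) + \omega(1)$, which is (2.20).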


Care must be taken to distinguish between the notion of conformal weight in the original vertex operator algebra and in $(V, Y[\ ], \mathbf{1}, \tilde{\omega})$. If v ∈ V is homogeneous in the latter vertex operator algebra we denote its conformal weight by wt[v], and set

$$V_{[n]} = \{v \in V \mid L[0]v = nv\}. \tag{2.22}$$

In general, $V_n$ and $V_{[n]}$ are distinct, though it follows from (2.21) that for each N we have

$$\bigoplus_{n \leq N} V_n = \bigoplus_{n \leq N} V_{[n]}. \tag{2.23}$$

3. Twisted Modules

Let V be a vertex operator algebra. An automorphism g of the vertex operator algebra V is a linear automorphism of V preserving $\mathbf{1}$ and ω such that the actions of g and Y(v, z) on V are compatible in the sense that $gY(v, z)g^{-1} = Y(gv, z)$ for v ∈ V. Let Aut V be the group of automorphisms of V. If g ∈ Aut V has finite order T, say, there are various classes of g-twisted V-modules of concern to us (cf. [Le, FLM2, DLM2, DLM3]).

A weak g-twisted V-module is a C-linear space M equipped with a linear map $V \to (\operatorname{End} M)[[z^{1/T}, z^{-1/T}]]$, $v \mapsto Y_M(v, z) = \sum_{n \in \mathbb{Q}} v(n) z^{-n-1}$, satisfying the following. For v ∈ V, w ∈ M,

$$v(m)w = 0 \quad \text{if } m \text{ is sufficiently large}; \tag{3.1}$$

$$Y_M(\mathbf{1}, z) = 1; \tag{3.2}$$

set

$$V^r = \{v \in V \mid gv = e^{-2\pi i r/T} v\} \tag{3.3}$$

for 0 ≤ r ≤ T − 1. Then

$$Y_M(v, z) = \sum_{n \in r/T + \mathbb{Z}} v(n) z^{-n-1} \quad \text{for } v \in V^r; \tag{3.4}$$

and the twisted Jacobi identity holds:

$$z_0^{-1}\delta\!\left(\frac{z_1 - z_2}{z_0}\right) Y_M(u, z_1)Y_M(v, z_2) - z_0^{-1}\delta\!\left(\frac{z_2 - z_1}{-z_0}\right) Y_M(v, z_2)Y_M(u, z_1) = z_2^{-1}\left(\frac{z_1 - z_0}{z_2}\right)^{-r/T} \delta\!\left(\frac{z_1 - z_0}{z_2}\right) Y_M(Y(u, z_0)v, z_2) \tag{3.5}$$

if $u \in V^r$. We often write $(M, Y_M)$ for this module. It can be shown (cf. Lemma 2.2 of [DLM2, DLM3]) that $Y_M(\omega, z)$ has component operators which still satisfy (2.5) and (2.7). If we take g = 1, we get a weak V-module.


For $m \in \mathbb{C}$ we use the notation $\iota_{z_1,z_2}(z_1 - z_2)^m$ to denote the expansion of $(z_1 - z_2)^m$ in terms of the nonnegative integral powers of $z_2$:

$$\iota_{z_1,z_2}\bigl((z_1 - z_2)^m\bigr) = \sum_{i \geq 0} \binom{m}{i} (-1)^i z_1^{m-i} z_2^i, \tag{3.6}$$

where

$$\binom{m}{i} = \frac{m(m-1)\cdots(m-i+1)}{i!}.$$
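The expansion (3.6) is a statement about formal series, but it can be sanity-checked numerically: when numbers with $|z_2| < |z_1|$ are substituted, the partial sums should approach $(z_1 - z_2)^m$. The sketch below does this for integer m only, so that the arithmetic stays exact; the names `gen_binom` and `iota_partial_sum` are hypothetical.

```python
from fractions import Fraction

def gen_binom(m, i):
    """Generalized binomial coefficient m(m-1)...(m-i+1)/i!."""
    out = Fraction(1)
    for k in range(i):
        out = out * (m - k) / (k + 1)
    return out

def iota_partial_sum(m, z1, z2, terms):
    """Sum of the first `terms` terms of the expansion (3.6), for integer m."""
    return sum(gen_binom(m, i) * (-1) ** i * z1 ** (m - i) * z2 ** i
               for i in range(terms))
```

For a nonnegative integer m the sum terminates after m + 1 terms and is exact; for m = −1 it reduces to the geometric series for $1/(z_1 - z_2)$.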

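It follows from the twisted Jacobi identity (3.5) that

$$\mathrm{Res}_{z_1} Y_M(u, z_1)Y_M(v, z_2)\, z_1^{r/T}\, \iota_{z_1,z_2}\bigl((z_1 - z_2)^m\bigr) z_2^n - \mathrm{Res}_{z_1} Y_M(v, z_2)Y_M(u, z_1)\, z_1^{r/T}\, \iota_{z_2,z_1}\bigl((z_1 - z_2)^m\bigr) z_2^n = \mathrm{Res}_{z_1 - z_2} Y_M(Y(u, z_1 - z_2)v, z_2)\, \iota_{z_2,z_1-z_2}\bigl(z_1^{r/T}\bigr) (z_1 - z_2)^m z_2^n \tag{3.7}$$

for $m \in \mathbb{Z}$, $n \in \mathbb{C}$. Here, $\iota_{z_2,z_1-z_2}\, z_1^{r/T} = \sum_{i \geq 0} \binom{r/T}{i} z_2^{r/T - i} (z_1 - z_2)^i$.

An admissible g-twisted V-module is a weak g-twisted V-module M which carries a $\frac{1}{T}\mathbb{Z}_+$-grading

$$M = \bigoplus_{n \in \frac{1}{T}\mathbb{Z}_+} M(n) \tag{3.8}$$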

which satisfies the following:

$$v(m)M(n) \subseteq M(n + \mathrm{wt}\,v - m - 1) \tag{3.9}$$

for homogeneous v ∈ V. We may and do assume that M(0) ≠ 0 if M ≠ 0. Again if g = 1 we have an admissible V-module.

An (ordinary) g-twisted V-module is a weak g-twisted V-module M which carries a C-grading induced by the spectrum of L(0). That is, we have

$$M = \bigoplus_{\lambda \in \mathbb{C}} M_\lambda, \tag{3.10}$$

where $M_\lambda = \{w \in M \mid L(0)w = \lambda w\}$. Moreover we require that $\dim M_\lambda$ is finite and for fixed λ, $M_{\frac{n}{T} + \lambda} = 0$ for all small enough integers n. If g = 1 we have an ordinary V-module. If M is a simple g-twisted V-module, then

$$M = \bigoplus_{n=0}^{\infty} M_{\lambda + n/T} \tag{3.11}$$

for some $\lambda \in \mathbb{C}$ such that $M_\lambda \neq 0$. We define λ as the conformal weight of M. The vertex operator algebra V is called g-rational in case every admissible g-twisted V-module is completely reducible, i.e., a direct sum of simple admissible g-twisted modules.

Remark 3.1. There is a subtlety in the definition of these twisted modules. Namely, the definition of $V^r$ given in (3.3) is not quite the same as that which we have used elsewhere ([DLM3, DM1], etc.). Previously we set $V^r = \{v \in V \mid gv = e^{2\pi i r/T} v\}$, so in effect we have interchanged the notions of g- and $g^{-1}$-twisted modules. The reason for doing this is that it makes the theorem of modular-invariance (Theorem 5.4 below) have the expected form.


Theorem 3.2 ([DLM3]). Suppose that V is a g-rational vertex operator algebra, where g ∈ Aut V has finite order. Then the following hold:
(a) Every simple admissible g-twisted V-module is an ordinary g-twisted V-module.
(b) V has only finitely many isomorphism classes of simple admissible g-twisted modules.

Let V be a vertex operator algebra with g an automorphism of V of finite order T. In [DLM3] a certain associative algebra $A_g(V)$ was introduced which plays a basic rôle in the present work. In case g = 1, $A_1(V) = A(V)$ was introduced and used extensively in [Z]. We need to review certain aspects of these constructions here. Let $V^r$ be as in (3.3). For $u \in V^r$ and v ∈ V we define

$$u \circ_g v = \mathrm{Res}_z \frac{(1 + z)^{\mathrm{wt}\,u - 1 + \delta_r + \frac{r}{T}}}{z^{1 + \delta_r}}\, Y(u, z)v, \tag{3.12}$$

$$u *_g v = \begin{cases} \mathrm{Res}_z \left( Y(u, z) \dfrac{(1 + z)^{\mathrm{wt}\,u}}{z}\, v \right) & \text{if } r = 0, \\ 0 & \text{if } r > 0, \end{cases} \tag{3.13}$$

where $\delta_r = 1$ if r = 0 and $\delta_r = 0$ if r ≠ 0. Extend $\circ_g$ and $*_g$ to bilinear products on V. We let $O_g(V)$ be the linear span of all $u \circ_g v$.

Theorem 3.3 ([DLM3, Z]). The quotient $A_g(V) = V/O_g(V)$ is an associative algebra with respect to $*_g$.

Note that if g = 1 then A(V) is nonzero, whereas if g ≠ 1 the analogous assertion may not be true. But if $A_g(V) \neq 0$ then the vacuum element maps to the identity of $A_g(V)$, and the conformal vector maps into the center of $A_g(V)$.

Let M be a weak g-twisted V-module. Define $\Omega(M) = \{w \in M \mid u(\mathrm{wt}\,u - 1 + n)w = 0,\ u \in V,\ n > 0\}$. For homogeneous u ∈ V define

$$o(u) = u(\mathrm{wt}\,u - 1) \tag{3.14}$$

(sometimes called the zero mode of u).

Theorem 3.4 ([DLM3]). Let M be a weak g-twisted V-module. The following hold:
(a) The map $v \mapsto o(v)$ induces a representation of the associative algebra $A_g(V)$ on Ω(M).
(b) If M is a simple admissible g-twisted V-module then Ω(M) = M(0) is a simple $A_g(V)$-module. Moreover, $M \mapsto M(0)$ induces a bijection between (isomorphism classes of) simple admissible g-twisted V-modules and simple $A_g(V)$-modules.

When combined with Theorem 3.2 one finds

Theorem 3.5 ([DLM3]). Suppose that V is a vertex operator algebra with an automorphism g of finite order, and that V is g-rational (possibly g = 1). Then the following hold:
(a) $A_g(V)$ is a finite-dimensional, semi-simple associative algebra (possibly 0).
(b) The map $M \mapsto \Omega(M)$ induces an equivalence between the category of ordinary g-twisted V-modules and the category of finite-dimensional $A_g(V)$-modules.


There are various group actions that we need to explain. Let $g, h$ be automorphisms of $V$ with $g$ of finite order. If $(M, Y_g)$ is a weak $g$-twisted module for $V$ there is a weak $hgh^{-1}$-twisted $V$-module $(M, Y_{hgh^{-1}})$, where for $v \in V$ we define

$$Y_{hgh^{-1}}(v, z) = Y_g(h^{-1}v, z). \tag{3.15}$$

This defines a left action of $\mathrm{Aut}(V)$ on weak twisted modules and on isomorphism classes of weak twisted modules. Symbolically, we write

$$h \circ (M, Y_g) = (M, Y_{hgh^{-1}}) = h \circ M, \tag{3.16}$$

where we sometimes abuse notation slightly by identifying $(M, Y_g)$ with the isomorphism class that it defines. The action (3.16) induces an action

$$h \circ \Omega(M) = \Omega(h \circ M). \tag{3.17}$$

Next, it follows easily from definitions (3.12) and (3.13) that the action of $h$ on $V$ induces an isomorphism of associative algebras

$$h : A_g(V) \to A_{hgh^{-1}}(V), \qquad v \mapsto hv, \tag{3.18}$$

which then induces a functor

$$h : A_{hgh^{-1}}(V)\text{-mod} \to A_g(V)\text{-mod}. \tag{3.19}$$

To describe (3.19), let $(N, *_{hgh^{-1}})$ be a left $A_{hgh^{-1}}(V)$-module (extending the notation of (3.13)). Then $h \circ (N, *_{hgh^{-1}}) = (N, *_g)$, where, for $n \in N$, $v \in V$,

$$v *_{hgh^{-1}} n = h^{-1}v *_g n. \tag{3.20}$$

Now if $(M, Y_g)$ is as before then (3.17) and (3.19) both define actions of $h$ on $\Omega(M)$; they are the same. For if $v \in V$ and we consider the image of $v$ in $A_{hgh^{-1}}(V)$, it acts on $\Omega(h \circ M)$ via the zero mode $o_{hgh^{-1}}(v)$ of $v$ in the vertex operator $Y_{hgh^{-1}}(v, z) = Y_g(h^{-1}v, z)$. In other words, if $m \in \Omega(h \circ M) = \Omega(M)$, then $o_{hgh^{-1}}(v)m = o_g(h^{-1}v)m$, which is precisely what (3.20) says.

We say that the $g$-twisted $V$-module $M$ is $h$-invariant if $h \circ M \cong M$. The set of all such automorphisms, the stabilizer of $M$, is a subgroup $C$ of $\mathrm{Aut}\,V$. There is a projective representation of $C$ on $M$ induced by the action (3.16). See [DM1] or [DM2] for more information on this point. Via (3.17) this induces a projective representation of $C$ on $\Omega(M)$.

Next we discuss the $C_2$-condition in more detail. Let $V$ be a vertex operator algebra and $M$ a $V$-module. We define

$$C_2(M) = \{v(-2)m \mid v \in V,\ m \in M\}. \tag{3.21}$$

We say that M satisfies condition C2 in case C2 (M) has finite codimension in M. The most important case is that in which M is taken to be V itself. Proposition 3.6. Suppose that V satisfies condition C2 , and let g be an automorphism of V of finite order. Then the algebra Ag (V ) has finite dimension.


Proof. Note from (3.21) that $C_2(V)$ is a $\mathbb{Z}$-graded subspace of $V$. Since the codimension of $C_2(V)$ is finite, there is an integer $k$ such that $V = C_2(V) + W$, where $W$ is the sum of the first $k$ homogeneous subspaces of $V$. We will show that $V_m \subset W + O_g(V)$ for each $m \in \mathbb{Z}$, in which case $V = W + O_g(V)$ and $\dim A_g(V) \le \dim W$.

We proceed by induction on $m$. Recall that $V^r$ is the eigenspace of $g$ with eigenvalue $e^{-2\pi i r/T}$ for $0 \le r \le T - 1$, where $g$ has order $T$. Since $C_2(V)$ is a homogeneous and $g$-invariant subspace of $V$ then we may write any $c \in C_2(V) \cap V_m$ in the form

$$c = \sum_{i=1}^{n} u_i(-2)v_i \tag{3.22}$$

for homogeneous elements $u_i, v_i \in V$ satisfying $u_i \in V^r$ for some $r = r(i)$ and $\mathrm{wt}\,u_i + \mathrm{wt}\,v_i + 1 = m$.

Suppose first that $u_i \in V^r$ with $r = 0$. According to (3.12), $O_g(V)$ contains

$$\mathrm{Res}_z\, Y(u_i, z)\frac{(1+z)^{\mathrm{wt}\,u_i}}{z^2}\,v_i = \sum_{j \ge 0}\binom{\mathrm{wt}\,u_i}{j}u_i(j-2)v_i. \tag{3.23}$$

Now $\mathrm{wt}(u_i(j-2)v_i) = \mathrm{wt}\,u_i + \mathrm{wt}\,v_i - j + 1 = m - j$, so if $j \ge 1$ then $u_i(j-2)v_i \in W + O_g(V)$ by induction. But then $W + O_g(V)$ also contains the remaining summand $u_i(-2)v_i$ of (3.23).

Now suppose that $u_i \in V^r$ with $r \ge 1$. By Lemma 2.2 (i) of [DLM3] we have that $O_g(V)$ contains the element $\mathrm{Res}_z\, Y(u_i, z)\frac{(1+z)^{\mathrm{wt}\,u_i - 1 + r/T}}{z^2}\,v_i$, and we conclude once again that $u_i(-2)v_i$ lies in $W + O_g(V)$. So we have shown that all summands of (3.22) lie in $W + O_g(V)$. The proposition follows.

Proposition 3.7. Suppose that $V$ satisfies condition $C_2$, and let $g$ be an automorphism of $V$ of finite order. If $A_g(V) \neq 0$ then $V$ has a simple $g$-twisted $V$-module.

Proof. $A_g(V)$ has finite dimension by Proposition 3.6. Now the result follows from Theorem 9.1 of [DLM3].

The following lemma will be used in Sect. 12.

Lemma 3.8. Let $M$ be a $V$-module. Then $C_2(M)$ is invariant under the operators $v(0)$ and $v(-1)$ for any $v \in V$.

Proof. Consider $u(-2)w \in C_2(M)$ for $u \in V$ and $w \in M$. Then for $k = 0, -1$,

$$v(k)u(-2)w = u(-2)v(k)w + \sum_{i=0}^{\infty}\binom{k}{i}(v(i)u)(-2+k-i)w \in C_2(M)$$

as required.

4. P-Functions and Q-Functions

We study certain functions, which we denote by $P$ and $Q$, which play a rôle in later sections. The $P$-functions are essentially Jacobi forms [EZ] and the $Q$-functions are certain modular forms. The main goal is to write down the transformation laws of these functions under the action of the modular group $\Gamma = SL(2, \mathbb{Z})$.


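Let $\mathfrak{h}$ denote the upper half plane $\mathfrak{h} = \{z \in \mathbb{C} \mid \mathrm{im}\,z > 0\}$ with the usual left action of $\Gamma$ via Möbius transformations

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\tau = \frac{a\tau + b}{c\tau + d}. \tag{4.1}$$

$\Gamma$ also acts on the right of $S^1 \times S^1$ via

$$(\mu, \lambda)\begin{pmatrix} a & b \\ c & d \end{pmatrix} = (\mu^a\lambda^c, \mu^b\lambda^d). \tag{4.2}$$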
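The right action (4.2) can be made concrete on torsion points by working additively. The following is a minimal sketch, not part of the original text: it assumes a torsion point $(\mu, \lambda) = (e^{2\pi i x}, e^{2\pi i y})$ is stored as the pair of rationals $(x, y)$ modulo 1, and the helper names `act` and `mul` are hypothetical.

```python
from fractions import Fraction

def act(t, m):
    """Right action (4.2), written additively on exponents.

    t = (x, y) stands for (mu, lam) = (e^{2 pi i x}, e^{2 pi i y});
    m = (a, b, c, d) stands for the matrix [[a, b], [c, d]].
    """
    x, y = t
    a, b, c, d = m
    # (mu^a lam^c, mu^b lam^d) becomes (a x + c y, b x + d y) modulo 1
    return ((a * x + c * y) % 1, (b * x + d * y) % 1)

def mul(m1, m2):
    """Ordinary 2x2 matrix product m1 * m2."""
    a, b, c, d = m1
    e, f, g, h = m2
    return (a * e + b * g, a * f + b * h, c * e + d * g, c * f + d * h)

S = (0, -1, 1, 0)
T = (1, 1, 0, 1)
t = (Fraction(1, 3), Fraction(1, 4))   # mu = e^{2 pi i/3}, lam = e^{2 pi i/4}

# A right action satisfies (t . m1) . m2 == t . (m1 m2).
right_action_ok = all(
    act(act(t, m1), m2) == act(t, mul(m1, m2))
    for m1 in (S, T) for m2 in (S, T)
)
```

In this encoding `act(t, S)` corresponds to $(\lambda, \mu^{-1})$ and `act(t, T)` to $(\mu, \mu\lambda)$.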

Let $t$ be a torsion point of $S^1 \times S^1$. Thus $t = (\mu, \lambda)$ with $\mu = e^{2\pi i j/M}$ and $\lambda = e^{2\pi i l/N}$ for integers $j, l, M, N$ with $M, N > 0$. For each integer $k = 1, 2, \cdots$ and each $t$ we define a function $P_k$ on $\mathbb{C} \times \mathfrak{h}$ as follows:

$$P_k(\mu, \lambda, z, q_\tau) = P_k(\mu, \lambda, z, \tau) = \frac{1}{(k-1)!}\sideset{}{'}\sum_{n \in \frac{j}{M} + \mathbb{Z}}\frac{n^{k-1}q_z^n}{1 - \lambda q_\tau^n}, \tag{4.3}$$

where the sign $'$ means omit the term $n = 0$ if $(\mu, \lambda) = (1, 1)$. Here and below we write $q_x = e^{2\pi i x}$.

Remark 4.1. (i) (4.3) converges uniformly and absolutely on compact subsets of the region $|q_\tau| < |q_z| < 1$.
(ii) Theorem 4.2 holds also for $(\mu, \lambda) = (1, 1)$ in case $k \ge 3$ but not if $k = 1, 2$ (cf. [Z]).

We will prove

Theorem 4.2. Suppose that $(\mu, \lambda) \neq (1, 1)$. Then

$$P_k\left(\mu, \lambda, \frac{z}{c\tau + d}, \gamma\tau\right) = (c\tau + d)^k P_k((\mu, \lambda)\gamma, z, \tau)$$

for all $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma$.

We can reformulate Theorem 4.2 as follows: for suitable functions $F(\mu, \lambda, z, \tau)$ on $(\mathbb{Q}/\mathbb{Z})^2 \times \mathbb{C} \times \mathfrak{h}$, and for an integer $k$, we set

$$F|_k\gamma(\mu, \lambda, z, \tau) = (c\tau + d)^{-k}F\left((\mu, \lambda)\gamma^{-1}, \frac{z}{c\tau + d}, \gamma\tau\right). \tag{4.4}$$

As is well-known, this defines a right action of $\Gamma$ on such functions $F$. Theorem 4.2 says precisely that $P_k$ is an invariant of this action. So it is enough to prove the theorem for the two standard generators $S = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ of $\Gamma$. If $\gamma = T$ then Theorem 4.2 reduces to the assertion $P_k(\mu, \lambda, z, \tau + 1) = P_k(\mu, \mu\lambda, z, \tau)$, which follows immediately from definition (4.3). We also note the equality

$$\frac{d}{dz}P_k(\mu, \lambda, z, \tau) = 2\pi i k P_{k+1}(\mu, \lambda, z, \tau). \tag{4.5}$$

So if Theorem 4.2 holds for $k$ then it holds for $k + 1$ by (4.5) and the chain rule. These observations reduce us to proving Theorem 4.2 in the case that $k = 1$ and $\gamma = S$, when it can be restated in the form


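Theorem 4.3. If $(\mu, \lambda) \neq (1, 1)$ then

$$P_1\left(\mu, \lambda, \frac{z}{\tau}, \frac{-1}{\tau}\right) = \tau P_1(\lambda, \mu^{-1}, z, \tau).$$

We will need to make use of several other functions in the proof of Theorem 4.3. First there is the usual Eisenstein series $G_2(\tau)$ with $q$-expansion

$$G_2(\tau) = \frac{\pi^2}{3} + 2(2\pi i)^2\sum_{n=1}^{\infty}\frac{nq_\tau^n}{1 - q_\tau^n} \tag{4.6}$$

and well-known transformation law

$$G_2(\gamma\tau) = (c\tau + d)^2G_2(\tau) - 2\pi i c(c\tau + d), \qquad \gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma. \tag{4.7}$$

(For these and other facts about elliptic functions, the reader may consult [La].) Let

$$\wp_1(z, \tau) = G_2(\tau)z + \pi i\frac{q_z + 1}{q_z - 1} + 2\pi i\sum_{n=1}^{\infty}\left(\frac{q_\tau^n q_z^{-1}}{1 - q_\tau^n q_z^{-1}} - \frac{q_z q_\tau^n}{1 - q_z q_\tau^n}\right). \tag{4.8}$$

The function $\wp_1(z, \tau)$ is not elliptic, but satisfies

$$\wp_1(z + 1, \tau) = \wp_1(z, \tau) + G_2(\tau), \tag{4.9}$$

$$\wp_1(z + \tau, \tau) = \wp_1(z, \tau) + G_2(\tau)\tau - 2\pi i, \tag{4.10}$$

$$\wp_1\left(\frac{z}{c\tau + d}, \gamma\tau\right) = (c\tau + d)\wp_1(z, \tau), \qquad \gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma. \tag{4.11}$$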
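Two of the laws quoted so far lend themselves to a quick numerical spot check: the translation law $P_k(\mu, \lambda, z, \tau + 1) = P_k(\mu, \mu\lambda, z, \tau)$ and the case $\gamma = S$ of (4.7). The sketch below is illustrative only. It truncates the series (4.3) and (4.6), assumes $0 < j < M$ so that the index $n = 0$ never occurs, and uses hypothetical helper names (`q_pow`, `P`, `G2`) and arbitrarily chosen sample points inside the region $|q_\tau| < |q_z| < 1$.

```python
import cmath
from math import factorial

def q_pow(x, n):
    """q_x^n = e^{2 pi i x n}; the branch is fixed by this formula."""
    return cmath.exp(2j * cmath.pi * x * n)

def P(k, j, M, lam, z, tau, terms=40):
    """Truncation of (4.3) for mu = e^{2 pi i j/M}, assuming 0 < j < M."""
    total = 0j
    for r in range(-terms, terms + 1):
        n = r + j / M                      # n runs over j/M + Z
        total += n ** (k - 1) * q_pow(z, n) / (1 - lam * q_pow(tau, n))
    return total / factorial(k - 1)

def G2(tau, terms=200):
    """Truncation of the q-expansion (4.6)."""
    qt = q_pow(tau, 1)
    s = sum(n * qt ** n / (1 - qt ** n) for n in range(1, terms + 1))
    return cmath.pi ** 2 / 3 + 2 * (2j * cmath.pi) ** 2 * s

# Sample data (assumed): |q_tau| < |q_z| < 1 holds since 0 < im z < im tau.
z, tau = 0.1 + 0.2j, 0.3 + 1.1j
j, M = 1, 3
mu = cmath.exp(2j * cmath.pi * j / M)
lam = 1j                                   # a 4th root of unity

translation_gap = abs(P(2, j, M, lam, z, tau + 1) - P(2, j, M, mu * lam, z, tau))
g2_gap = abs(G2(-1 / tau) - (tau ** 2 * G2(tau) - 2j * cmath.pi * tau))
```

Both gaps should be at the level of floating-point rounding; this is a consistency check on the formulas as stated, not a proof.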

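Now introduce a further $P$-type function

$$P_\lambda(z, \tau) = 2\pi i\sum_{n \in \mathbb{Z},\, n \neq 0}\frac{q_z^n}{1 - \lambda q_\tau^n},$$

where $\lambda$ is a root of unity.

Lemma 4.4. Suppose that $|q_\tau| < |q_z| < 1$ and that $\lambda^N = 1$. Then

$$P_\lambda(z, \tau) = \sum_{k=0}^{N-1}\lambda^k\left(G_2(N\tau)(z + k\tau) - \wp_1(z + k\tau, N\tau) - \pi i\right).$$

Proof. As $|\lambda q_\tau^n| < 1$ for $n \ge 1$ then

$$\begin{aligned}
\sum_{n=1}^{\infty}\frac{q_z^n}{1 - \lambda q_\tau^n} &= \sum_{n=1}^{\infty}\sum_{m=0}^{\infty}q_z^n\lambda^m q_\tau^{mn} \\
&= \sum_{m=0}^{\infty}\sum_{n=0}^{\infty}\lambda^m q_z q_\tau^m (q_z q_\tau^m)^n \\
&= \sum_{m=0}^{\infty}\frac{\lambda^m q_z q_\tau^m}{1 - q_z q_\tau^m} \qquad (\text{as } |q_z q_\tau^m| < 1 \text{ for } m \ge 0) \\
&= \frac{q_z}{1 - q_z} + \sum_{m=1}^{\infty}\frac{\lambda^m q_z q_\tau^m}{1 - q_z q_\tau^m}.
\end{aligned} \tag{4.12}$$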
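The resummation (4.12) is a rearrangement of a double geometric series, so it can be checked directly on truncated sums. This is a small illustrative sketch; the function names and the sample values of $\lambda$, $z$, $\tau$ are assumptions chosen so that $|q_\tau| < |q_z| < 1$.

```python
import cmath

def q(x):
    """q_x = e^{2 pi i x}."""
    return cmath.exp(2j * cmath.pi * x)

def lhs_412(lam, z, tau, terms=80):
    """Left side of (4.12): sum over n >= 1 of q_z^n / (1 - lam q_tau^n)."""
    qz, qt = q(z), q(tau)
    return sum(qz ** n / (1 - lam * qt ** n) for n in range(1, terms + 1))

def rhs_412(lam, z, tau, terms=80):
    """Right side of (4.12): q_z/(1 - q_z) plus the sum over m >= 1."""
    qz, qt = q(z), q(tau)
    tail = sum(lam ** m * qz * qt ** m / (1 - qz * qt ** m)
               for m in range(1, terms + 1))
    return qz / (1 - qz) + tail

lam = 1j                        # lam^4 = 1
z, tau = 0.1 + 0.2j, 0.3 + 1.1j
gap_412 = abs(lhs_412(lam, z, tau) - rhs_412(lam, z, tau))
```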


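Using $|q_z^{-1}q_\tau^m| < 1$ for $m \ge 1$, a similar calculation yields

$$\sum_{n=1}^{\infty}\frac{\lambda^{-1}q_z^{-n}q_\tau^n}{1 - \lambda^{-1}q_\tau^n} = \sum_{m=1}^{\infty}\frac{\lambda^{-m}q_z^{-1}q_\tau^m}{1 - q_z^{-1}q_\tau^m}.$$

From this and (4.12) we get

$$(2\pi i)^{-1}P_\lambda(z, \tau) = \frac{q_z}{1 - q_z} + \sum_{m=1}^{\infty}\left(\frac{\lambda^m q_z q_\tau^m}{1 - q_z q_\tau^m} - \frac{\lambda^{-m}q_z^{-1}q_\tau^m}{1 - q_z^{-1}q_\tau^m}\right). \tag{4.13}$$

Next, use the expansion

$$\sum_{m=0}^{\infty}\frac{\lambda^m q_z q_\tau^m}{1 - q_z q_\tau^m} = \sum_{n=0}^{\infty}\sum_{k=0}^{N-1}\frac{\lambda^k q_z q_\tau^{nN+k}}{1 - q_z q_\tau^{nN+k}} = \sum_{k=0}^{N-1}\lambda^k\sum_{n=0}^{\infty}\frac{q_{z+k\tau}q_{N\tau}^n}{1 - q_{z+k\tau}q_{N\tau}^n}$$

and a similar expression for the second term under the summation sign in (4.13) to see that

$$P_\lambda(z, \tau) = 2\pi i\sum_{k=0}^{N-1}\lambda^k\left(\sum_{n=0}^{\infty}\frac{q_{z+k\tau}q_{N\tau}^n}{1 - q_{z+k\tau}q_{N\tau}^n} - \sum_{n=1}^{\infty}\frac{q_{z+k\tau}^{-1}q_{N\tau}^n}{1 - q_{z+k\tau}^{-1}q_{N\tau}^n}\right). \tag{4.14}$$

Using the formula (4.8) for $\wp_1(z + k\tau, N\tau)$, the lemma follows readily from (4.14).

For a root of unity $\lambda$, set

$$\varepsilon(\lambda) = \begin{cases} \dfrac{1}{1 - \lambda}, & \lambda \neq 1, \\[4pt] 0, & \lambda = 1. \end{cases} \tag{4.15}$$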

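Now we are ready for the proof of Theorem 4.3. Let $\nu = e^{2\pi i/M}$ with $\mu$ and $\lambda$ as before. For $t \in \mathbb{Z}$ we then have

$$\sum_{j=1}^{M}\nu^{jt}P_1(\nu^j, \lambda, z, \tau) = \sum_{j=1}^{M}\nu^{jt}\sideset{}{'}\sum_{n \in \frac{j}{M} + \mathbb{Z}}\frac{q_z^n}{1 - \lambda q_\tau^n} = \sideset{}{'}\sum_{n \in \mathbb{Z}}\frac{q_{\frac{z+t}{M}}^n}{1 - \lambda q_{\frac{\tau}{M}}^n}.$$

From (4.12) and (4.15) we conclude that

$$\sum_{j=1}^{M}\nu^{jt}P_1(\nu^j, \lambda, z, \tau) = \frac{1}{2\pi i}P_\lambda\left(\frac{z+t}{M}, \frac{\tau}{M}\right) + \varepsilon(\lambda). \tag{4.16}$$

Regarding this as a system of linear equations in $P_1(\nu^j, \lambda, z, \tau)$ for $t = 0, 1, \ldots, M - 1$, we may invert to find that (with $\mu = \nu^j$)

$$P_1(\mu, \lambda, z, \tau) = \frac{1}{2\pi i M}\sum_{t=0}^{M-1}\mu^{-t}\left(P_\lambda\left(\frac{z+t}{M}, \frac{\tau}{M}\right) + 2\pi i\,\varepsilon(\lambda)\right).$$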


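Now use Lemma 4.4 to obtain

$$P_1(\mu, \lambda, z, \tau) = \frac{1}{2\pi i M}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\mu^{-t}\lambda^k\left(G_2\left(\frac{N\tau}{M}\right)\frac{t + k\tau}{M} - \wp_1\left(\frac{z + t + k\tau}{M}, \frac{N\tau}{M}\right)\right) + \frac{\varepsilon(\lambda)}{M}\sum_{t=0}^{M-1}\mu^{-t}. \tag{4.17}$$

(We used $(\mu, \lambda) \neq (1, 1)$ to eliminate some terms in (4.17).) So now we get, using (4.7), (4.11) and (4.17):

$$\begin{aligned}
P_1\left(\mu, \lambda, \frac{z}{\tau}, \frac{-1}{\tau}\right) &= \frac{1}{2\pi i M}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\mu^{-t}\lambda^k\left(G_2\left(\frac{-N}{M\tau}\right)\frac{-k + t\tau}{M\tau} - \wp_1\left(\frac{z - k + t\tau}{M\tau}, \frac{-N}{M\tau}\right)\right) + \frac{\varepsilon(\lambda)}{M}\sum_{t=0}^{M-1}\mu^{-t} \\
&= \frac{1}{2\pi i M}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\mu^{-t}\lambda^k\left(\left[\left(\frac{M\tau}{N}\right)^2G_2\left(\frac{M\tau}{N}\right) - 2\pi i\frac{M\tau}{N}\right]\frac{-k + t\tau}{M\tau} - \frac{M\tau}{N}\wp_1\left(\frac{z - k + t\tau}{N}, \frac{M\tau}{N}\right)\right) + \frac{\varepsilon(\lambda)}{M}\sum_{t=0}^{M-1}\mu^{-t}.
\end{aligned} \tag{4.18}$$

Case 1. $\mu \neq 1 \neq \lambda$. Here (4.18) simplifies to

$$\begin{aligned}
P_1\left(\mu, \lambda, \frac{z}{\tau}, \frac{-1}{\tau}\right) &= -\frac{\tau}{2\pi i N}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\mu^{-t}\lambda^k\wp_1\left(\frac{z - k + t\tau}{N}, \frac{M\tau}{N}\right) \\
&= -\frac{\tau}{2\pi i N}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\mu^{-t}\lambda^k\wp_1\left(\frac{z + N - k + t\tau}{N}, \frac{M\tau}{N}\right)
\end{aligned}$$

using (4.9). From (4.17) this is indeed equal to $\tau P_1(\lambda, \mu^{-1}, z, \tau)$, as required.

Case 2. $\lambda \neq 1 = \mu$. This time (4.18) reads

$$P_1\left(\mu, \lambda, \frac{z}{\tau}, \frac{-1}{\tau}\right) = \frac{1}{2\pi i M}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\lambda^k\left(\left[\left(\frac{M\tau}{N}\right)^2G_2\left(\frac{M\tau}{N}\right) - \frac{2\pi i M\tau}{N}\right]\frac{-k}{M\tau} - \frac{M\tau}{N}\wp_1\left(\frac{z - k + t\tau}{N}, \frac{M\tau}{N}\right)\right) + \varepsilon(\lambda). \tag{4.19}$$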


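It is easy to check that $\frac{1}{N}\sum_{k=0}^{N-1}k\lambda^k = -\varepsilon(\lambda)$, so that (4.19) simplifies to read

$$\begin{aligned}
P_1\left(\mu, \lambda, \frac{z}{\tau}, \frac{-1}{\tau}\right) &= -\frac{\tau}{2\pi i N}\sum_{t=0}^{M-1}\sum_{k=0}^{N-1}\lambda^k\left(G_2\left(\frac{M\tau}{N}\right)\frac{k}{N} + \wp_1\left(\frac{z - k + t\tau}{N}, \frac{M\tau}{N}\right)\right) \\
&= \frac{\tau}{2\pi i N}\sum_{l=0}^{N-1}\sum_{t=0}^{M-1}\lambda^{-l}\left(G_2\left(\frac{M\tau}{N}\right)\frac{l}{N} - \wp_1\left(\frac{z + l + t\tau}{N}, \frac{M\tau}{N}\right)\right) \\
&= \tau P_1(\lambda, \mu^{-1}, z, \tau).
\end{aligned}$$

This completes the discussion of Case 2. The final case $\lambda = 1 \neq \mu$ is completely analogous, and we accordingly omit details. This completes the proof of Theorem 4.3, hence also that of Theorem 4.2.

We discuss some aspects of Bernoulli polynomials. Recall [La, Ra] that these polynomials $B_k(x)$ are defined by the generating function

$$\frac{te^{tx}}{e^t - 1} = \sum_{k=0}^{\infty}B_k(x)\frac{t^k}{k!}. \tag{4.20}$$

For example

$$B_0(x) = 1, \quad B_1(x) = x - \frac{1}{2}, \quad B_2(x) = x^2 - x + \frac{1}{6}. \tag{4.21}$$

We will need the following identities (loc.cit.)

$$\sum_{a=0}^{N-1}(a + x)^{k-1} = \frac{1}{k}\left(B_k(x + N) - B_k(x)\right), \tag{4.22}$$

$$B_k(1 - x) = (-1)^kB_k(x). \tag{4.23}$$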
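The identities (4.22) and (4.23) can be verified exactly for small $k$ with rational arithmetic. The sketch below is an illustration rather than part of the original argument: it builds $B_k(x)$ from the Bernoulli numbers via the standard recurrence $\sum_{i=0}^{m}\binom{m+1}{i}B_i = 0$ (with $B_1 = -1/2$, matching (4.20) at $x = 0$), and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def bernoulli_numbers(n):
    """B_0..B_n with B_1 = -1/2, via sum_{i<=m} C(m+1, i) B_i = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, n + 1):
        B.append(-sum(comb(m + 1, i) * B[i] for i in range(m)) / (m + 1))
    return B

def bernoulli_poly(k, x):
    """B_k(x) = sum_i C(k, i) B_i x^(k-i)."""
    B = bernoulli_numbers(k)
    return sum(comb(k, i) * B[i] * x ** (k - i) for i in range(k + 1))

x, N = Fraction(2, 5), 4

# (4.22): sum_{a=0}^{N-1} (a + x)^(k-1) = (B_k(x + N) - B_k(x)) / k
identity_422 = all(
    sum((a + x) ** (k - 1) for a in range(N))
    == (bernoulli_poly(k, x + N) - bernoulli_poly(k, x)) / k
    for k in range(1, 7)
)

# (4.23): B_k(1 - x) = (-1)^k B_k(x)
identity_423 = all(
    bernoulli_poly(k, 1 - x) == (-1) ** k * bernoulli_poly(k, x)
    for k in range(0, 7)
)
```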

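Proposition 4.5. If $\mu = e^{2\pi i j/M}$ with $1 \le j \le M$ and $k \ge 2$ then

$$\frac{1}{(2\pi i)^k}\sum_{0 \neq m \in \mathbb{Z}}\frac{\mu^m}{m^k} = \frac{-B_k(j/M)}{k!}.$$

Proof. This is a typical sort of calculation which we give using results from [La]. Now

$$\begin{aligned}
\sum_{0 \neq m \in \mathbb{Z}}\mu^m/m^k &= \sum_{m=1}^{\infty}\left(\frac{\mu^m}{m^k} + (-1)^k\frac{\mu^{-m}}{m^k}\right) \\
&= \sum_{t=1}^{M}\sum_{n=0}^{\infty}\frac{\mu^t + (-1)^k\mu^{-t}}{(Mn + t)^k} \\
&= M^{-k}\sum_{t=1}^{M}\zeta(k, t/M)(\mu^t + (-1)^k\mu^{-t}),
\end{aligned}$$

where $\zeta(k, x)$ is the Hurwitz zeta-function

$$\zeta(k, x) = \sum_{n=0}^{\infty}\frac{1}{(n + x)^k}.$$

For an $M$th root of unity $\alpha$ define $f_\alpha : \mathbb{Z}/M\mathbb{Z} \to \mathbb{C}$ by $f_\alpha(n) = \alpha^n$. Set

$$\xi(k, f_\alpha) = M^{-k}\sum_{n=1}^{M}f_\alpha(n)\zeta(k, n/M).$$

In fact, one can use this definition for any function $f : \mathbb{Z}/M\mathbb{Z} \to \mathbb{C}$. Thus

$$\sum_{0 \neq m \in \mathbb{Z}}\frac{\mu^m}{m^k} = \xi(k, f_\mu) + (-1)^k\xi(k, f_{\mu^{-1}}).$$

Define

$$f_j(n) = \begin{cases} 1, & n \equiv j \pmod M, \\ 0, & n \not\equiv j \pmod M, \end{cases}$$

and

$$\hat{f}_j(n) = \sum_{a=1}^{M}f_j(a)e^{-2\pi i an/M}.$$

Then we get $\hat{f}_j(n) = \mu^{-n}$. Use (loc.cit. Theorem 2.1, p. 245) to get (remember $k \ge 2$)

$$\begin{aligned}
\xi(1 - k, f_j) &= (2\pi i)^{-1}\left(\frac{2\pi}{M}\right)^{1-k}\Gamma(k)\left(\xi(k, f_\mu)e^{\pi i(1-k)/2} - \xi(k, f_{\mu^{-1}})e^{-\pi i(1-k)/2}\right) \\
&= (2\pi i)^{-k}M^{k-1}(k - 1)!\left(\xi(k, f_\mu) + (-1)^k\xi(k, f_{\mu^{-1}})\right).
\end{aligned}$$

($\Gamma$ is the gamma-function here!) So now we have

$$\sum_{0 \neq m \in \mathbb{Z}}\frac{\mu^m}{m^k} = \frac{(2\pi i)^kM^{1-k}}{(k - 1)!}\xi(1 - k, f_j).$$

On the other hand by definition,

$$\xi(1 - k, f_j) = M^{k-1}\sum_{n=1}^{M}f_j(n)\zeta(1 - k, n/M) = M^{k-1}\zeta(1 - k, j/M)$$

so

$$\sum_{0 \neq m \in \mathbb{Z}}\frac{\mu^m}{m^k} = \frac{(2\pi i)^k}{(k - 1)!}\zeta(1 - k, j/M).$$

Moreover (loc.cit. Corollary on p. 243)

$$\zeta(1 - k, j/M) = -\Gamma(k)\,\mathrm{Res}_z\frac{z^{-k}e^{zj/M}}{e^z - 1} = -(k - 1)!\,B_k(j/M)/k! = -B_k(j/M)/k.$$

So finally

$$\sum_{0 \neq m \in \mathbb{Z}}\frac{\mu^m}{m^k} = -\frac{(2\pi i)^k}{k!}B_k(j/M).$$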
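Proposition 4.5 is the classical Fourier expansion of the Bernoulli polynomials, and a truncated sum makes the statement easy to test. The following sketch is only a numerical illustration: the cutoff, the tolerances implied by it, and the names `fourier_side` and `bernoulli_side` are assumptions, and for $k = 2$ the truncated series converges only at the rate $1/\text{cutoff}$.

```python
import cmath
from fractions import Fraction
from math import comb, factorial

def bernoulli_poly(k, x):
    """B_k(x) from the Bernoulli numbers (B_1 = -1/2), using exact rationals."""
    B = [Fraction(1)]
    for m in range(1, k + 1):
        B.append(-sum(comb(m + 1, i) * B[i] for i in range(m)) / (m + 1))
    return sum(comb(k, i) * B[i] * x ** (k - i) for i in range(k + 1))

def fourier_side(k, j, M, cutoff=20000):
    """(2 pi i)^(-k) times the sum of mu^m / m^k over 0 < |m| <= cutoff."""
    total = 0j
    for m in range(1, cutoff + 1):
        # reduce j*m mod M first so the phase stays accurate for large m
        phase = cmath.exp(2j * cmath.pi * ((j * m) % M) / M)   # mu^m
        total += (phase + (-1) ** k * phase.conjugate()) / m ** k
    return total / (2j * cmath.pi) ** k

def bernoulli_side(k, j, M):
    """-B_k(j/M) / k!, the right side of Proposition 4.5."""
    return -float(bernoulli_poly(k, Fraction(j, M))) / factorial(k)

gap_k2 = abs(fourier_side(2, 1, 3) - bernoulli_side(2, 1, 3))
gap_k3 = abs(fourier_side(3, 1, 3) - bernoulli_side(3, 1, 3))
```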

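We next introduce the $Q$-functions. With $\mu = e^{2\pi i j/M}$ and $\lambda = e^{2\pi i l/N}$, we define for $k = 1, 2, \cdots$ and $(\mu, \lambda) \neq (1, 1)$,

$$\begin{aligned}
Q_k(\mu, \lambda, q_\tau) &= Q_k(\mu, \lambda, \tau) \\
&= \frac{1}{(k-1)!}\sum_{n \ge 0}\frac{\lambda(n + j/M)^{k-1}q_\tau^{n+j/M}}{1 - \lambda q_\tau^{n+j/M}} + \frac{(-1)^k}{(k-1)!}\sum_{n \ge 1}\frac{\lambda^{-1}(n - j/M)^{k-1}q_\tau^{n-j/M}}{1 - \lambda^{-1}q_\tau^{n-j/M}} - \frac{B_k(j/M)}{k!}.
\end{aligned} \tag{4.24}$$

Here $(n + j/M)^{k-1} = 1$ if $n = 0$, $j = 0$ and $k = 1$. Similarly, $(n - j/M)^{k-1} = 1$ if $n = 1$, $j = M$ and $k = 1$. For good measure we also set

$$Q_0(\mu, \lambda, \tau) = -1. \tag{4.25}$$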

We need to justify the notation, which suggests that $Q_k(\mu, \lambda, \tau)$ depends only on $\tau$ and the residue classes of $j$ and $l$ modulo $M$ and $N$ respectively. To see this, note that if we provisionally denote by $Q'_k(\mu, \lambda, \tau)$ the value of (4.24) in which $j$ is replaced by $j + M$, then we find that

$$Q'_k(\mu, \lambda, \tau) - Q_k(\mu, \lambda, \tau) = \frac{1}{(k-1)!}(j/M)^{k-1} - \frac{B_k(j/M + 1)}{k!} + \frac{B_k(j/M)}{k!} = 0,$$

the last equality following from (4.22).

We are going to prove

Theorem 4.6. If $k \ge 0$ then $Q_k(\mu, \lambda, \tau)$ is a holomorphic modular form of weight $k$. If $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in \Gamma$ it satisfies

$$Q_k(\mu, \lambda, \gamma\tau) = (c\tau + d)^kQ_k((\mu, \lambda)\gamma, \tau).$$

As usual one needs to deal with the cases $k = 1, 2$ of Theorem 4.6 separately. To this end, for each element $a = (a_1, a_2) \in \mathbb{Q}^2/\mathbb{Z}^2$, we recall the Klein and Hecke forms (loc.cit.) defined as follows:

$$\begin{aligned}
g_a(\tau) &= -q_\tau^{B_2(a_1)/2}e^{2\pi i a_2(a_1 - 1)/2}(1 - q_{a_1\tau + a_2})\prod_{n=1}^{\infty}(1 - q_\tau^nq_{a_1\tau + a_2})(1 - q_\tau^nq_{a_1\tau + a_2}^{-1}), \\
h_a(\tau) &= 2\pi i\left(a_1 - \frac{1}{2} - \frac{q_{a_1\tau + a_2}}{1 - q_{a_1\tau + a_2}} - \sum_{m=1}^{\infty}\left[\frac{q_\tau^mq_{a_1\tau + a_2}}{1 - q_\tau^mq_{a_1\tau + a_2}} - \frac{q_\tau^mq_{a_1\tau + a_2}^{-1}}{1 - q_\tau^mq_{a_1\tau + a_2}^{-1}}\right]\right).
\end{aligned} \tag{4.26}$$

Using (4.21) we easily find

Proposition 4.7. Let $a = (j/M, l/N) \notin \mathbb{Z}^2$. Then


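(i) $h_a(\tau) = -2\pi i Q_1(\mu, \lambda, \tau)$,
(ii) $\frac{d}{d\tau}(\log g_a(\tau)) = -2\pi i Q_2(\mu, \lambda, \tau)$.

Now we can prove Theorem 4.6 in the case $k = 1, 2$. For $k = 1$ we use (i) above together with Theorem 2 (i) and H3 of (loc.cit. p. 248). Similarly if $k = 2$ we use the calculation of (loc.cit. p. 251 et seq.).

We now consider the case $k \ge 3$. In this case the result is a consequence of the following:

Theorem 4.8. If $k \ge 3$ then

$$Q_k(\mu, \lambda, \tau) = \frac{1}{(2\pi i)^k}\sideset{}{'}\sum_{m_1, m_2 \in \mathbb{Z}}\frac{\lambda^{-m_1}\mu^{m_2}}{(m_1\tau + m_2)^k},$$

where $'$ indicates that $(m_1, m_2) \neq (0, 0)$.

Proof. The non-constant part of $\sum'\frac{\lambda^{-m_1}\mu^{m_2}}{(m_1\tau + m_2)^k}$ is equal to

$$\begin{aligned}
&\sum_{m_2 \in \mathbb{Z}}\left(\sum_{m_1=1}^{\infty}\frac{\lambda^{-m_1}\mu^{m_2}}{(m_1\tau + m_2)^k} + (-1)^k\sum_{m_1=1}^{\infty}\frac{\lambda^{m_1}\mu^{m_2}}{(m_1\tau - m_2)^k}\right) \\
&\quad= \sum_{m_1=1}^{\infty}\sum_{t=1}^{M}\sum_{n \in \mathbb{Z}}\mu^t\left(\frac{\lambda^{-m_1}}{(m_1\tau + Mn + t)^k} + (-1)^k\frac{\lambda^{m_1}}{(m_1\tau - Mn - t)^k}\right) \\
&\quad= M^{-k}\sum_{m_1=1}^{\infty}\sum_{t=1}^{M}\sum_{n \in \mathbb{Z}}\mu^t\left(\frac{\lambda^{-m_1}}{(m_1\tau/M + n + t/M)^k} + (-1)^k\frac{\lambda^{m_1}}{(m_1\tau/M - n - t/M)^k}\right).
\end{aligned}$$

Use (loc.cit. p. 155) to get this equal to

$$\begin{aligned}
&M^{-k}\frac{(-1)^k(2\pi i)^k}{(k-1)!}\sum_{m_1=1}^{\infty}\sum_{t=1}^{M}\mu^t\sum_{n=1}^{\infty}\left(n^{k-1}\lambda^{-m_1}q_{m_1\tau/M + t/M}^n + (-1)^kn^{k-1}\lambda^{m_1}q_{m_1\tau/M - t/M}^n\right) \\
&\quad= (-1)^kM^{-k}\frac{(2\pi i)^k}{(k-1)!}\sum_{m_1=1}^{\infty}\sum_{t=1}^{M}\mu^t\sum_{n=1}^{\infty}\left(n^{k-1}\lambda^{-m_1}\nu^{tn}q_{\tau/M}^{m_1n} + (-1)^kn^{k-1}\lambda^{m_1}\nu^{-tn}q_{\tau/M}^{m_1n}\right) \\
&\quad= (-1)^kM^{-k}\frac{(2\pi i)^k}{(k-1)!}\sum_{t=1}^{M}\mu^t\sum_{n=1}^{\infty}\left(\sum_{d \mid n}d^{k-1}\left(\lambda^{-n/d}\nu^{td} + (-1)^k\lambda^{n/d}\nu^{-td}\right)\right)q_{\tau/M}^n
\end{aligned}$$

(where $\nu = e^{2\pi i/M}$). Using orthogonality relations for roots of unity, this is equal to

$$\begin{aligned}
&\frac{M^{1-k}(-2\pi i)^k}{(k-1)!}\sum_{n=1}^{\infty}\left(\sum_{\substack{d \mid n \\ d \equiv -j \ (\mathrm{mod}\ M)}}d^{k-1}\lambda^{-n/d} + (-1)^k\sum_{\substack{d \mid n \\ d \equiv j \ (\mathrm{mod}\ M)}}d^{k-1}\lambda^{n/d}\right)q_{\tau/M}^n \\
&\quad= \frac{M^{1-k}(2\pi i)^k}{(k-1)!}\sum_{n=0}^{\infty}\sum_{m=1}^{\infty}\left(\lambda^m(Mn + j)^{k-1}q_{\tau/M}^{m(Mn+j)} + (-1)^k\lambda^{-m}(Mn + M - j)^{k-1}q_{\tau/M}^{m(Mn+M-j)}\right) \\
&\quad= \frac{M^{1-k}(2\pi i)^k}{(k-1)!}\sum_{n=0}^{\infty}\left(\frac{\lambda(Mn + j)^{k-1}q_{\tau/M}^{Mn+j}}{1 - \lambda q_{\tau/M}^{Mn+j}} + (-1)^k\frac{\lambda^{-1}(Mn + M - j)^{k-1}q_{\tau/M}^{Mn+M-j}}{1 - \lambda^{-1}q_{\tau/M}^{Mn+M-j}}\right),
\end{aligned}$$

where $1 \le j \le M$. Together with Proposition 4.5 we now get

$$\begin{aligned}
\frac{1}{(2\pi i)^k}\sideset{}{'}\sum\frac{\lambda^{-m_1}\mu^{m_2}}{(m_1\tau + m_2)^k} &= -\frac{B_k(j/M)}{k!} + \frac{M^{1-k}}{(k-1)!}\sum_{n=0}^{\infty}\left(\frac{\lambda(Mn + j)^{k-1}q_{\tau/M}^{Mn+j}}{1 - \lambda q_{\tau/M}^{Mn+j}} + (-1)^k\frac{\lambda^{-1}(Mn + M - j)^{k-1}q_{\tau/M}^{Mn+M-j}}{1 - \lambda^{-1}q_{\tau/M}^{Mn+M-j}}\right) \\
&= Q_k(\mu, \lambda, \tau).
\end{aligned}$$

This completes the proof of Theorem 4.8 and hence also Theorem 4.6.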
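The transformation law of Theorem 4.6 can be spot-checked numerically from the series (4.24). The sketch below is an illustration under stated assumptions: it takes $k = 3$, represents $\mu$ by a pair $(j, M)$ with $1 \le j \le M$ and passes $\lambda$ as a complex number, and it interprets $q_\tau^{e}$ as $e^{2\pi i \tau e}$. The function names and the sample point are hypothetical choices, not part of the original text.

```python
import cmath
from fractions import Fraction
from math import comb, factorial

def bernoulli_poly(k, x):
    """B_k(x) from the Bernoulli numbers (B_1 = -1/2), using exact rationals."""
    B = [Fraction(1)]
    for m in range(1, k + 1):
        B.append(-sum(comb(m + 1, i) * B[i] for i in range(m)) / (m + 1))
    return sum(comb(k, i) * B[i] * x ** (k - i) for i in range(k + 1))

def Q(k, j, M, lam, tau, terms=60):
    """Truncation of (4.24) for mu = e^{2 pi i j/M}, 1 <= j <= M, (mu, lam) != (1, 1)."""
    x = j / M

    def qp(e):
        return cmath.exp(2j * cmath.pi * tau * e)      # q_tau^e

    first = sum(lam * (n + x) ** (k - 1) * qp(n + x) / (1 - lam * qp(n + x))
                for n in range(0, terms))
    # lam^{-1} X / (1 - lam^{-1} X) is rewritten as X / (lam - X)
    second = sum((n - x) ** (k - 1) * qp(n - x) / (lam - qp(n - x))
                 for n in range(1, terms))
    const = float(bernoulli_poly(k, Fraction(j, M))) / factorial(k)
    return (first + (-1) ** k * second) / factorial(k - 1) - const

tau = 0.2 + 1.0j
mu = cmath.exp(2j * cmath.pi / 3)      # j = 1, M = 3
lam = -1                               # l = 1, N = 2

# gamma = S: (mu, lam) S = (lam, mu^{-1}) and c tau + d = tau
s_gap = abs(Q(3, 1, 3, lam, -1 / tau) - tau ** 3 * Q(3, 1, 2, 1 / mu, tau))
# gamma = T: (mu, lam) T = (mu, mu lam) and c tau + d = 1
t_gap = abs(Q(3, 1, 3, lam, tau + 1) - Q(3, 1, 3, mu * lam, tau))
```

Both gaps should sit at rounding level for this choice of $\tau$; the check says nothing about the cases $k = 1, 2$, which the text treats separately.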

We remark the well-known fact that if we take $(\mu, \lambda) = (1, 1)$ in Theorem 4.8 we obtain the Eisenstein series of weight $k$ as long as $k \ge 3$ is even. Thus for $k \ge 4$ even,

$$G_k(\tau) = \sideset{}{'}\sum_{m_1, m_2 \in \mathbb{Z}}\frac{1}{(m_1\tau + m_2)^k} = (2\pi i)^k\left(\frac{-B_k(0)}{k!} + \frac{2}{(k-1)!}\sum_{n=1}^{\infty}\sigma_{k-1}(n)q^n\right), \tag{4.27}$$

where $\sigma_{k-1}(n) = \sum_{d \mid n}d^{k-1}$. For this range of values of $k$, $G_k(\tau)$ is a modular form on $SL(2, \mathbb{Z})$ of weight $k$. We make use of the normalized Eisenstein series

$$E_k(\tau) = \frac{1}{(2\pi i)^k}G_k(\tau) = \frac{-B_k(0)}{k!} + \frac{2}{(k-1)!}\sum_{n=1}^{\infty}\sigma_{k-1}(n)q^n \tag{4.28}$$

for even $k \ge 2$. (Warning: this is not the same notation as used in (loc.cit.), for example.)


We also utilize the differential operator $\partial_k$ which acts on holomorphic functions on $\mathfrak{h}$ via

$$\partial_kf(\tau) = \frac{1}{2\pi i}\frac{df(\tau)}{d\tau} + kE_2(\tau)f(\tau). \tag{4.29}$$

One checks using (4.7) that

$$(\partial_kf)|_{k+2}\gamma = \partial_k(f|_k\gamma). \tag{4.30}$$
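As a sanity check on the normalizations in (4.28) and (4.29), one can apply $\partial_4$ to $E_4$ at the level of $q$-expansions. By (4.30) the result is a modular form of weight 6 on $SL(2, \mathbb{Z})$, hence a multiple of $E_6$; in the present normalization the classical Ramanujan identity suggests the constant 14. The sketch below verifies this on the first few coefficients with exact rationals. The function names are hypothetical and the Bernoulli numbers $B_2, B_4, B_6$ are hard-coded.

```python
from fractions import Fraction
from math import factorial

BERNOULLI = {2: Fraction(1, 6), 4: Fraction(-1, 30), 6: Fraction(1, 42)}

def sigma(p, n):
    """sigma_p(n) = sum of d^p over the divisors d of n."""
    return sum(d ** p for d in range(1, n + 1) if n % d == 0)

def eisenstein(k, order):
    """First `order` q-expansion coefficients of E_k as normalized in (4.28)."""
    coeffs = [-BERNOULLI[k] / factorial(k)]
    coeffs += [Fraction(2, factorial(k - 1)) * sigma(k - 1, n)
               for n in range(1, order)]
    return coeffs

def partial_k(k, f):
    """Coefficients of (1/(2 pi i)) f' + k E_2 f, i.e. (4.29) on q-expansions."""
    order = len(f)
    e2 = eisenstein(2, order)
    out = []
    for n in range(order):
        conv = sum(e2[i] * f[n - i] for i in range(n + 1))   # (E_2 f)_n
        out.append(n * f[n] + k * conv)   # (1/(2 pi i)) d/dtau acts as q d/dq
    return out

lhs = partial_k(4, eisenstein(4, 12))
rhs = [14 * c for c in eisenstein(6, 12)]
```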

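We complete this section with a discussion of some further functions related to $P$ and $Q$ which occur later. Again we take $\mu = e^{2\pi i j/M}$ and $\lambda = e^{2\pi i l/N}$. Set for $k \ge 1$,

$$\bar{P}_k(\mu, \lambda, z, q_\tau) = \bar{P}_k(\mu, \lambda, z, \tau) = \frac{1}{(k-1)!}\sideset{}{'}\sum_{n \in j/M + \mathbb{Z}}\frac{n^{k-1}z^n}{1 - \lambda q_\tau^n} \tag{4.31}$$

with

$$\bar{P}_0 = 0. \tag{4.32}$$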

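Recall (3.6). We shall need the following result in Sect. 8.

Proposition 4.9. If $m \in \mathbb{Z}$, $\mu = e^{2\pi i j/M}$, $k \ge 0$ and $(\mu, \lambda) \neq (1, 1)$, then

$$\begin{aligned}
Q_k(\mu, \lambda, \tau) + \frac{1}{k!}B_k(1 - m + j/M) &= \mathrm{Res}_z\left(\iota_{z,z_1}((z - z_1)^{-1})\,z_1^{m - j/M}z^{-m + j/M}\,\bar{P}_k\left(\mu, \lambda, \frac{z_1}{z}, \tau\right)\right) \\
&\quad - \mathrm{Res}_z\left(\lambda\,\iota_{z_1,z}((z - z_1)^{-1})\,z_1^{m - j/M}z^{-m + j/M}\,\bar{P}_k\left(\mu, \lambda, \frac{z_1q_\tau}{z}, \tau\right)\right).
\end{aligned}$$

Proof. The result is clear if $k = 0$, so take $k \ge 1$. The first of the two residues we must evaluate is equal to

$$\begin{aligned}
&\mathrm{Res}_z\left(\frac{1}{(k-1)!\,z}\sum_{n=0}^{\infty}\sum_{r \in \mathbb{Z}}\left(\frac{z_1}{z}\right)^{n + m - j/M}\frac{(r + j/M)^{k-1}\left(\frac{z_1}{z}\right)^{r + j/M}}{1 - \lambda q_\tau^{r + j/M}}\right) \\
&\quad= \frac{1}{(k-1)!}\sum_{n=0}^{\infty}\frac{(-n - m + j/M)^{k-1}}{1 - \lambda q_\tau^{-n - m + j/M}} \\
&\quad= \frac{(-1)^k}{(k-1)!}\sum_{n=m}^{\infty}\frac{\lambda^{-1}(n - j/M)^{k-1}q_\tau^{n - j/M}}{1 - \lambda^{-1}q_\tau^{n - j/M}}.
\end{aligned}$$

Similarly the second residue is equal to

$$\frac{-1}{(k-1)!}\sum_{n=1-m}^{\infty}\frac{\lambda(n + j/M)^{k-1}q_\tau^{n + j/M}}{1 - \lambda q_\tau^{n + j/M}}.$$

Comparing with the definition of $Q_k(\mu, \lambda, \tau)$, we see that we are reduced to establishing that

$$\frac{1}{k!}\left(B_k(1 - m + j/M) - B_k(j/M)\right) = \begin{cases}
\displaystyle\frac{-1}{(k-1)!}\sum_{n=0}^{-m}\frac{\lambda(n + j/M)^{k-1}q_\tau^{n + j/M}}{1 - \lambda q_\tau^{n + j/M}} + \frac{(-1)^k}{(k-1)!}\sum_{n=m}^{0}\frac{\lambda^{-1}(n - j/M)^{k-1}q_\tau^{n - j/M}}{1 - \lambda^{-1}q_\tau^{n - j/M}}, & m \le 0, \\[10pt]
\displaystyle\frac{1}{(k-1)!}\sum_{n=1-m}^{-1}\frac{\lambda(n + j/M)^{k-1}q_\tau^{n + j/M}}{1 - \lambda q_\tau^{n + j/M}} - \frac{(-1)^k}{(k-1)!}\sum_{n=1}^{m-1}\frac{\lambda^{-1}(n - j/M)^{k-1}q_\tau^{n - j/M}}{1 - \lambda^{-1}q_\tau^{n - j/M}}, & m \ge 2, \\[10pt]
0, & m = 1.
\end{cases} \tag{4.33}$$

The case $m = 1$ is trivial. Assume next that $m \le 0$. Then the two summations in (4.33) are equal to

$$\frac{-1}{(k-1)!}\sum_{n=0}^{-m}\frac{\lambda(n + j/M)^{k-1}q_\tau^{n + j/M}}{1 - \lambda q_\tau^{n + j/M}} - \frac{1}{(k-1)!}\sum_{n=m}^{0}\frac{(-n + j/M)^{k-1}}{\lambda q_\tau^{-n + j/M} - 1} = \frac{1}{(k-1)!}\sum_{n=0}^{-m}(n + j/M)^{k-1}.$$

Now the desired result follows from (4.22). Similarly, if $m \ge 2$ the two summands in (4.33) sum to

$$\frac{(-1)^k}{(k-1)!}\sum_{n=1}^{m-1}(n - j/M)^{k-1}.$$

Two applications of (4.22) then yield the desired result.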

5. The Space of 1-Point Functions on the Torus

The following notation will be in force for some time:

(a) $V$ is a vertex operator algebra.
(b) $g, h \in \mathrm{Aut}(V)$ have finite order and satisfy $gh = hg$.
(c) $A = \langle g, h \rangle$.
(d) $g$ has order $T$, $h$ has order $T_1$ and $A$ has exponent $N = \mathrm{lcm}(T, T_1)$.
(e) $\Gamma(T, T_1)$ is the subgroup of matrices $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ in $SL(2, \mathbb{Z})$ satisfying $a \equiv d \equiv 1 \pmod N$, $b \equiv 0 \pmod T$, $c \equiv 0 \pmod{T_1}$.
(f) $M(T, T_1)$ is the ring of holomorphic modular forms on $\Gamma(T, T_1)$; it is naturally graded $M(T, T_1) = \oplus_{k \ge 0}M_k(T, T_1)$, where $M_k(T, T_1)$ is the space of forms of weight $k$. We also set $M(1) = M(1, 1)$.
(g) $V(T, T_1) = M(T, T_1) \otimes_{\mathbb{C}} V$.
(h) $O(g, h)$ is the $M(T, T_1)$-submodule of $V(T, T_1)$ generated by the following elements, where $v \in V$ satisfies $gv = \mu^{-1}v$, $hv = \lambda^{-1}v$:

$$v[0]w, \quad w \in V, \quad (\mu, \lambda) = (1, 1), \tag{5.1}$$

$$v[-2]w + \sum_{k=2}^{\infty}(2k - 1)E_{2k}(\tau) \otimes v[2k - 2]w, \quad (\mu, \lambda) = (1, 1), \tag{5.2}$$

$$v, \quad (\mu, \lambda) \neq (1, 1), \tag{5.3}$$

$$\sum_{k=0}^{\infty}Q_k(\mu, \lambda, \tau) \otimes v[k - 1]w, \quad (\mu, \lambda) \neq (1, 1). \tag{5.4}$$
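The congruence conditions in item (e) are exactly those that make $\gamma$ fix the pair $(g, h)$ under $(g, h)\gamma = (g^ah^c, g^bh^d)$. The sketch below illustrates this with a membership test. It is an assumption-laden toy model: the element $g^xh^y$ is recorded only through the exponent pair $(x \bmod T, y \bmod T_1)$, and the names `in_gamma` and `twist` are hypothetical.

```python
from math import gcd

def in_gamma(m, T, T1):
    """Membership test for Gamma(T, T1) as described in item (e)."""
    (a, b), (c, d) = m
    N = T * T1 // gcd(T, T1)               # N = lcm(T, T1)
    return (a * d - b * c == 1
            and (a - 1) % N == 0 and (d - 1) % N == 0
            and b % T == 0 and c % T1 == 0)

def twist(m, T, T1):
    """Exponent pairs of (g, h) m = (g^a h^c, g^b h^d), reduced mod T and T1."""
    (a, b), (c, d) = m
    return ((a % T, c % T1), (b % T, d % T1))

T, T1 = 2, 3
gamma = ((7, 2), (3, 1))                   # determinant 7 - 6 = 1
S = ((0, -1), (1, 0))

fixed = twist(gamma, T, T1)                # (g, h) is recovered
swapped = twist(S, T, T1)                  # (h, g^{-1}), and g^{-1} = g since T = 2
```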

Here, notation for modular forms is as in Sect. 4. These definitions are sensible because of the following:

Lemma 5.1. $M(T, T_1)$ is a Noetherian ring which contains each $E_{2k}(\tau)$, $k \ge 2$, and each $Q_k(\mu, \lambda, \tau)$, $k \ge 0$, for $\mu$, $\lambda$ a $T$th, resp. $T_1$th, root of unity.

Proof. It is well-known that the ring of holomorphic modular forms on any congruence subgroup of $SL(2, \mathbb{Z})$ is Noetherian. So the first statement holds. Each $E_{2k}$ is a modular form on $SL(2, \mathbb{Z})$, whereas the containment $Q_k(\mu, \lambda, \tau) \in M_k(T, T_1)$ follows from Theorem 4.6.

Lemma 5.2. Suppose that $V$ satisfies Condition $C_2$. Then $V(T, T_1)/O(g, h)$ is a finitely generated $M(T, T_1)$-module.

Proof. Since $C_2(V)$ is a graded subspace of $V$ of finite codimension, there is an integer $m$ such that $V_n \subset C_2(V)$ whenever $n > m$. Let $M$ be the $M(T, T_1)$-submodule of $V(T, T_1)$ generated by $W = \oplus_{n \le m}V_n$. The lemma will follow from the assertion that $V(T, T_1) = M + O(g, h)$. This will be established by proving that if $v \in V_{[k]}$ (cf. (2.22)) then $v \in M + O(g, h)$. If $k \le m$ then $v \in W$ by (2.23) and we are done, so we may take $k > m$. Since $V_{[k]} \subset C_2(V) + W$ then we may write $v$ in the form

$$v = w + \sum_{i=1}^{p}a_i(-2)b_i$$

with $a_i, b_i \in V$ homogeneous in the vertex operator algebra $(V, Y[\ ])$ such that $\mathrm{wt}[a_i] + \mathrm{wt}[b_i] + 1 = k$. Clearly, it suffices to show that $a_i(-2)b_i \in M + O(g, h)$. We may also assume that $ga_i = \mu^{-1}a_i$, $ha_i = \lambda^{-1}a_i$ for suitable scalars $\mu, \lambda$.

Suppose first that $(\mu, \lambda) = (1, 1)$. From (5.2) we see that $O(g, h)$ contains

$$a_i[-2]b_i + \sum_{l=2}^{\infty}(2l - 1)E_{2l}(\tau) \otimes a_i[2l - 2]b_i.$$

Since $\mathrm{wt}[a_i[2l - 2]b_i] = k - 2l$, then the sum

$$\sum_{l=2}^{\infty}(2l - 1)E_{2l}(\tau) \otimes a_i[2l - 2]b_i$$

lies in $M + O(g, h)$ by the inductive hypothesis, whence so too does $a_i[-2]b_i$.


On the other hand, it follows from (2.9) that we have

$$v(n) = v[n] + \sum_{j > n}\alpha_jv[j]$$

for $v \in V$, $j \in \mathbb{Z}$ and scalars $\alpha_j$. In particular we get

$$a_i(-2)b_i = a_i[-2]b_i + \sum_{j > -2}\alpha_ja_i[j]b_i.$$

Having already shown that each of the summands $a_i[j]b_i$ lies in $M + O(g, h)$, $j \ge -2$, we get $a_i(-2)b_i \in M + O(g, h)$ as desired.

Now suppose that $(\mu, \lambda) \neq (1, 1)$. In this case (5.4) tells us that $O(g, h)$ contains the element

$$-a_i[-1]b_i + \sum_{l=1}^{\infty}Q_l(\mu, \lambda, \tau) \otimes a_i[l - 1]b_i$$

(cf. (4.25)). More to the point, $O(g, h)$ also contains the same expression with $a_i$ replaced by $L[-1]a_i$. Since $(L[-1]a_i)[t] = -ta_i[t - 1]$ by (2.7), we see that $O(g, h)$ contains the element

$$a_i[-2]b_i + \sum_{l=1}^{\infty}(l - 1)Q_l(\mu, \lambda, \tau) \otimes a_i[l - 2]b_i.$$

Now we proceed as before to conclude that $a_i(-2)b_i \in M + O(g, h)$.

There is a natural grading on $V(T, T_1)$. Namely, the subspace of elements of degree $n$ is defined to be

$$V(T, T_1)_n = \oplus_{k+l=n}M_k(T, T_1) \otimes V_{[l]}. \tag{5.5}$$

Observe that $O(g, h)$ is a graded subspace of $V(T, T_1)$.

Lemma 5.3. Suppose $V$ satisfies Condition $C_2$. If $v \in V$ then there is $m \in \mathbb{N}$ and elements $r_i(\tau) \in M(T, T_1)$, $0 \le i \le m - 1$, such that

$$L[-2]^mv + \sum_{i=0}^{m-1}r_i(\tau) \otimes L[-2]^iv \in O(g, h). \tag{5.6}$$

Proof. Let $I$ be the $M(T, T_1)$-submodule of $V(T, T_1)/O(g, h)$ generated by $\{L[-2]^iv,\ i \ge 0\}$. Since $M(T, T_1)$ is a Noetherian ring, Lemma 5.2 tells us that $I$ is finitely generated and so some relation of the form (5.6) must hold.

We now define the space of $(g, h)$ 1-point functions $C_1(g, h)$ to be the $\mathbb{C}$-linear space consisting of functions $S : V(T, T_1) \times \mathfrak{h} \to \mathbb{C}$ which satisfy

(C1) $S(v, \tau)$ is holomorphic in $\tau$ for $v \in V(T, T_1)$.
(C2) $S(v, \tau)$ is $M(T, T_1)$-linear in the sense that $S$ is $\mathbb{C}$-linear in $v$ and satisfies

$$S(f \otimes v, \tau) = f(\tau)S(v, \tau) \tag{5.7}$$

for $f \in M(T, T_1)$ and $v \in V$.


(C3) $S(v, \tau) = 0$ if $v \in O(g, h)$.
(C4) If $v \in V$ satisfies $gv = hv = v$ then

$$S(L[-2]v, \tau) = \partial S(v, \tau) + \sum_{l=2}^{\infty}E_{2l}(\tau)S(L[2l - 2]v, \tau). \tag{5.8}$$

In (5.8), $\partial S$ is the operator which is linear in $v$ and satisfies

$$\partial S(v, \tau) = \partial_kS(v, \tau) = \frac{1}{2\pi i}\frac{d}{d\tau}S(v, \tau) + kE_2(\tau)S(v, \tau) \tag{5.9}$$

for $v \in V_{[k]}$ (cf. (4.29)).

Theorem 5.4 (Modular-Invariance). For $S \in C_1(g, h)$ and $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{Z})$ define

$$S|\gamma(v, \tau) = S|_k\gamma(v, \tau) = (c\tau + d)^{-k}S(v, \gamma\tau) \tag{5.10}$$

for $v \in V_{[k]}$, and extend linearly. Then $S|\gamma \in C_1((g, h)\gamma)$.

Proof. We need to verify that $S|\gamma$ satisfies (C3)-(C4) with $(g, h)\gamma = (g^ah^c, g^bh^d)$ in place of $(g, h)$.

Step 1. $S|\gamma$ vanishes on $O((g, h)\gamma)$. Pick $v, w \in V$ homogeneous in $(V, Y[\ ])$ and with $g^ah^cv = \mu^{-1}v$, $g^bh^dv = \lambda^{-1}v$. Suppose to begin with that $(\mu, \lambda) = (1, 1)$. We must show that $S|\gamma(u, \tau) = 0$ when $u$ is one of the elements (5.1) and (5.2). This follows easily from (5.10), the equality $S(u, \tau) = 0$, and the fact that $E_{2k}$ is modular of weight $2k$.

Now assume that $(\mu, \lambda) \neq (1, 1)$. If $gv = \alpha^{-1}v$ and $hv = \beta^{-1}v$ then $(\alpha, \beta) \neq (1, 1)$, so that certainly $S|\gamma(v, \tau) = (c\tau + d)^{-\mathrm{wt}[v]}S(v, \gamma\tau) = 0$. So it remains to show that $S|\gamma(u, \tau) = 0$ for

$$u = \sum_{k=0}^{\infty}Q_k(\mu, \lambda, \tau) \otimes v[k - 1]w.$$

First note that we have $(\alpha, \beta) = (\mu, \lambda)\gamma^{-1}$. Then with Lemma 5.1 and Theorem 4.6 we calculate that

$$\begin{aligned}
S|\gamma(u, \tau) &= \sum_{k=0}^{\infty}Q_k(\mu, \lambda, \tau)S|\gamma(v[k - 1]w, \tau) \\
&= \sum_{k=0}^{\infty}Q_k(\mu, \lambda, \tau)(c\tau + d)^{-\mathrm{wt}[v] - \mathrm{wt}[w] + k}S(v[k - 1]w, \gamma\tau) \\
&= (c\tau + d)^{-\mathrm{wt}[v] - \mathrm{wt}[w]}\sum_{k=0}^{\infty}Q_k(\alpha, \beta, \gamma\tau)S(v[k - 1]w, \gamma\tau) \\
&= (c\tau + d)^{-\mathrm{wt}[v] - \mathrm{wt}[w]}\sum_{k=0}^{\infty}S(Q_k(\alpha, \beta, \gamma\tau) \otimes v[k - 1]w, \gamma\tau),
\end{aligned}$$

which is indeed 0 since $S \in C_1(g, h)$.


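Step 2. $S|\gamma$ satisfies (5.8). First note that if $g^ah^cv = g^bh^dv = v$ then also $gv = hv = v$. Then we calculate using (4.30) that

$$\begin{aligned}
S|\gamma(L[-2]v, \tau) &= (c\tau + d)^{-\mathrm{wt}[v] - 2}S(L[-2]v, \gamma\tau) \\
&= (c\tau + d)^{-\mathrm{wt}[v] - 2}\left(\partial S(v, \gamma\tau) + \sum_{k=2}^{\infty}E_{2k}(\gamma\tau)S(L[2k - 2]v, \gamma\tau)\right) \\
&= (\partial_{\mathrm{wt}[v]}S)|_{\mathrm{wt}[v]+2}\gamma(v, \tau) + \sum_{k=2}^{\infty}(c\tau + d)^{2k - \mathrm{wt}[v] - 2}E_{2k}(\tau)S(L[2k - 2]v, \gamma\tau) \\
&= \partial_{\mathrm{wt}[v]}(S|_{\mathrm{wt}[v]}\gamma)(v, \tau) + \sum_{k=2}^{\infty}E_{2k}(\tau)S|\gamma(L[2k - 2]v, \tau).
\end{aligned}$$

This completes the proof of Step 2, and with it that of the theorem.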

6. The Differential Equations

In this section we study certain differential equations which are satisfied by functions $S(v, \tau)$ in the space of $(g, h)$ 1-point functions. The idea is to exploit Lemma 5.3 together with (5.8). We fix an element $S \in C_1(g, h)$.

Lemma 6.1. Let $v \in V$ and suppose that $V$ satisfies Condition $C_2$. There are $m \in \mathbb{N}$ and $r_i(\tau) \in M(T, T_1)$, $0 \le i \le m - 1$, such that

$$S(L[-2]^mv, \tau) + \sum_{i=0}^{m-1}r_i(\tau)S(L[-2]^iv, \tau) = 0. \tag{6.1}$$

Proof. Combine Lemma 5.3 together with (C2) and (C3).

In the following we extend (5.9) by setting for $f \in M_l(T, T_1)$, $v \in V_{[k]}$,

$$\partial S(f \otimes v, \tau) = \partial_{k+l}(S(f \otimes v, \tau)) = \partial_{k+l}(f(\tau)S(v, \tau)) \tag{6.2}$$

(cf. (4.29)). Then define

$$\partial^iS(f \otimes v, \tau) = \partial_{k+l+2(i-1)}(\partial^{i-1}S(f \otimes v, \tau)) \tag{6.3}$$

for $i \ge 1$. Note that

$$\partial S(f \otimes v, \tau) = (\partial_lf(\tau))S(v, \tau) + f(\tau)\partial S(v, \tau). \tag{6.4}$$

Moreover $\partial_lf(\tau) \in M_{l+2}(T, T_1)$ as we see from (4.30).

The simplest case to study is that corresponding to a primary field, i.e. a vector $v$ which is a highest weight vector for the Virasoro algebra in $(V, Y[\ ])$. Thus $v$ satisfies $L[n]v = 0$ for $n > 0$. We assume that this holds until further notice. First note that we have

$$S(L[-2]v, \tau) = \partial S(v, \tau). \tag{6.5}$$


This follows from (5.8) if $gv = hv = v$. In general, it is a consequence of this special case, the linearity of $S(v, \tau)$ in $v$, and the identity $S(w, \tau) = 0$ if $gw = \mu^{-1}w$, $hw = \lambda^{-1}w$ and $(\mu, \lambda) \neq (1, 1)$. This latter equality follows from (5.3) and (C3). In the same way, we find from (5.8) that

$$S(L[-2]^{i+1}v, \tau) = \partial S(L[-2]^iv, \tau) + \sum_{k=2}^{\infty}E_{2k}(\tau)S(L[2k - 2]L[-2]^iv, \tau). \tag{6.6}$$

Using the Virasoro algebra relation we easily find that for $i \in \mathbb{N}$ and $k \ge 2$ there are scalars $\alpha_{ijk}$, $0 \le j \le i - 1$, such that

$$L[2k - 2]L[-2]^iv = \sum_{j=0}^{i-1}\alpha_{ijk}L[-2]^jv, \tag{6.7}$$

so that (6.6) becomes

$$S(L[-2]^{i+1}v, \tau) = \partial S(L[-2]^iv, \tau) + \sum_{j=0}^{i-1}\sum_{k=2}^{\infty}\alpha_{ijk}E_{2k}(\tau)S(L[-2]^jv, \tau). \tag{6.8}$$

Now proceeding by induction on $i$, the case $i = 1$ being (6.5), one proves

Lemma 6.2. Suppose that $L[n]v = 0$ for $n > 0$. Then for $i \ge 1$ there are elements $f_j(\tau) \in M(1)$, $0 \le j \le i - 1$, such that

$$S(L[-2]^iv, \tau) = \partial^iS(v, \tau) + \sum_{j=0}^{i-1}f_j(\tau)\partial^jS(v, \tau). \tag{6.9}$$

Combine Lemmas 6.2 and 6.1 to obtain

Lemma 6.3. Suppose that $V$ satisfies Condition $C_2$, and that $v \in V$ satisfies $L[n]v = 0$ for $n > 0$. Then there are $m \in \mathbb{N}$ and $g_i(\tau) \in M(T, T_1)$, $0 \le i \le m - 1$, such that

$$\partial^mS(v, \tau) + \sum_{i=0}^{m-1}g_i(\tau)\partial^iS(v, \tau) = 0. \tag{6.10}$$

Bearing in mind the definition of $\partial$ (cf. (5.9), (6.3)), (6.10) may be reformulated as follows:

Proposition 6.4. Let $R = R(T, T_1)$ be the ring of holomorphic functions generated by $E_2(\tau)$ and $M(T, T_1)$. Suppose that $V$ satisfies Condition $C_2$, and that $v \in V$ satisfies $L[n]v = 0$ for $n > 0$. Then there are $m \in \mathbb{N}$ and $r_i(\tau) \in R(T, T_1)$, $0 \le i \le m - 1$, such that

$$\left(q_{\frac{1}{T}}\frac{d}{dq_{\frac{1}{T}}}\right)^mS(v, \tau) + \sum_{i=0}^{m-1}r_i(\tau)\left(q_{\frac{1}{T}}\frac{d}{dq_{\frac{1}{T}}}\right)^iS(v, \tau) = 0, \tag{6.11}$$

where $q_{\frac{1}{T}} = e^{2\pi i\tau/T}$.

We observe here only that

$$q\frac{d}{dq} = \frac{1}{T}q_{\frac{1}{T}}\frac{d}{dq_{\frac{1}{T}}} = \frac{1}{2\pi i}\frac{d}{d\tau}.$$

Now (6.11) is a homogeneous linear differential equation with holomorphic coefficients $r_i(\tau) \in R$, and such that 0 is a regular singular point. The forms in $R(T, T_1)$ have Fourier expansions at $\infty$ which are power series in $q_{\frac{1}{T}}$ because they are invariant under $\begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix}$. We are therefore in a position to apply the theory of Frobenius–Fuchs concerning the nature of the solutions to such equations. A good reference for the elementary aspects of this theory is [I], but the reader may also consult [AM] where they arise in a context related to that of the present paper. We will also need some results from (loc. cit.) in Sect. 11. Frobenius–Fuchs theory tells us that $S(v, \tau)$ may be expressed in the following form: for some $p \ge 0$,

$$S(v, \tau) = \sum_{i=0}^{p}(\log q_{\frac{1}{T}})^iS_i(v, \tau), \tag{6.12}$$

where

$$S_i(v, \tau) = \sum_{j=1}^{b(i)}q^{\lambda_{ij}}S_{i,j}(v, \tau), \tag{6.13}$$

$$S_{i,j}(v, \tau) = \sum_{n=0}^{\infty}a_{i,j,n}(v)q^{n/T} \tag{6.14}$$

are holomorphic on the upper half-plane, and

$$\lambda_{i,j_1} \not\equiv \lambda_{i,j_2} \pmod{\tfrac{1}{T}\mathbb{Z}} \tag{6.15}$$

for $j_1 \neq j_2$. We are going to prove

Theorem 6.5. Suppose that $V$ satisfies Condition $C_2$. For every $v \in V$, the function $S(v, \tau) \in C_1(g, h)$ can be expressed in the form (6.12)-(6.15). Moreover, $p$ is bounded independently of $v$.

We begin by proving by induction on $k$ that if $v \in V_{[k]}$ then $S(v, \tau)$ has an expression of the type (6.12). We have already shown this if $v$ is a highest weight vector for the Virasoro algebra and in particular if $v$ is in the top level $V_{[t]}$ of $(V, Y[\ ])$, i.e., if $V_{[t]} \neq 0$ and $V_{[s]} = 0$ for $s < t$. This begins the induction. The proof is an elaboration of the previous case. We may assume that $gv = hv = v$.

Lemma 6.6. Suppose that $l \ge 2$ and $i \ge 0$. Then there are scalars $\alpha_{ijl}$ and $w_{ijl} \in V_{[2i+2-2l-2j+k]}$, $0 \le j \le i - 1$, such that

$$L[2l - 2]L[-2]^iv = L[-2]^iL[2l - 2]v + \sum_{j=0}^{i-1}\alpha_{ijl}L[-2]^jw_{ijl}. \tag{6.16}$$

Moreover $\mathrm{wt}[w_{ijl}] \le \mathrm{wt}[v]$ with equality only if $w_{ijl} = v$.


Proof. By induction on i + l, the case i = 0 being trivial. Now we calculate

$$\begin{aligned}
L[2l-2]L[-2]^{i+1}v &= \Bigl(L[-2]L[2l-2] + 2l\,L[2l-4] + \delta_{l,2}\frac{c}{2}\Bigr)L[-2]^{i}v \\
&= L[-2]^{i+1}L[2l-2]v + \sum_{j=0}^{i-1}\alpha_{ijl}\,L[-2]^{j+1}w_{ijl} + 2l\,L[2l-4]L[-2]^{i}v + \delta_{l,2}\frac{c}{2}L[-2]^{i}v.
\end{aligned}$$

Either l = 2 or the inductive hypothesis applies to $L[2l-4]L[-2]^{i}v$, and in either case the lemma follows.

Now use (5.8) and Lemma 6.6 to see that

$$S(L[-2]^{i+1}v,\tau) = \partial S(L[-2]^{i}v,\tau) + \sum_{l=2}^{\infty}E_{2l}(\tau)\Bigl(S(L[-2]^{i}L[2l-2]v,\tau) + \sum_{j=0}^{i}\alpha_{ijl}\,S(L[-2]^{j}w_{ijl},\tau)\Bigr). \tag{6.17}$$

Note that (6.17) is the appropriate generalization of (6.8). By induction based on (6.17) we find

Lemma 6.7. For i ≥ 1 we have

$$S(L[-2]^{i}v,\tau) = \partial^{i}S(v,\tau) + \sum_{j=0}^{i-1}f_{ij}(\tau)\,\partial^{j}S(v,\tau) + \sum_{j=0}^{i-1}\sum_{l}g_{ijl}(\tau)\,\partial^{j}S(w_{ijl},\tau), \tag{6.18}$$

where $f_{ij}(\tau), g_{ijl}(\tau) \in M(1)$ and $\mathrm{wt}[w_{ijl}] < \mathrm{wt}[v]$.

The analogue of Lemma 6.3 is now

Lemma 6.8. There is m ∈ N such that

$$\partial^{m}S(v,\tau) + \sum_{i=0}^{m-1}g_i(\tau)\,\partial^{i}S(v,\tau) + \sum_{j=0}^{m}\sum_{l}h_{jl}(\tau)\,\partial^{j}S(w_{jl},\tau) = 0 \tag{6.19}$$

for $g_i(\tau), h_{jl}(\tau) \in M(T, T_1)$, and $\mathrm{wt}[w_{jl}] < \mathrm{wt}[v]$.

We are now in a position to complete the proof that S(v, τ) has an expression of the form (6.12)-(6.15). By induction this is true of the terms $S(w_{jl},\tau)$ in (6.19), and hence the third summand on the l.h.s. of (6.19) also has such an expression. Thus as before we may view (6.19) as a differential equation of regular singular type, this time inhomogeneous, namely,

$$\Bigl(q_{1/T}\frac{d}{dq_{1/T}}\Bigr)^{m}S(v,\tau) + \sum_{i=0}^{m-1}r_i(\tau)\Bigl(q_{1/T}\frac{d}{dq_{1/T}}\Bigr)^{i}S(v,\tau) + \sum_{i=0}^{p}(\log q_{1/T})^{i}u_i(v,\tau) = 0, \tag{6.20}$$

where $r_i(\tau) \in M(T, T_1)$ (cf. Proposition 6.4) and $u_i(v,\tau)$ satisfies (6.13)-(6.15). One easily sees (cf. [I, AM]) that the functions $(\log q_{1/T})^{i}u_i(v,\tau)$, 0 ≤ i ≤ p, are themselves solutions of a differential equation of regular singular type (6.11) with


coefficients analytic in the upper half plane. Let us formally state this by saying that they are solutions of the differential equation $L_1 f = 0$, where $L_1$ is a suitable linear differential operator with 0 as regular singular point and coefficients analytic in the upper half plane. Now (6.20) takes the form $L_2 S + f = 0$ for the corresponding linear differential operator $L_2$, so that we get $L_1 L_2 S = 0$. But $L_1 L_2$ is once again a linear differential operator of the appropriate type, so again the Frobenius–Fuchs theory allows us to conclude that S(v, τ) indeed satisfies (6.12)-(6.15).

It remains to prove that the integer p in (6.12) can be bounded independently of v. Indeed, we showed in Lemma 5.2 that if $W = \oplus_{n\le m}V_n$ and $V_n \subset C_2(V)$ for n > m then $V(T, T_1)/O(g,h)$ is generated as $M(T, T_1)$-module by W. So for v ∈ V we have $v \equiv \sum_i f_i(\tau)\otimes w_i \pmod{O(g,h)}$, where $\{w_i\}$ is a basis for W, whence $S(v,\tau) = \sum_i f_i(\tau)S(w_i,\tau)$ since S vanishes on O(g, h). Clearly then, we may take p to be the maximum of the corresponding integers determined by $S(w_i,\tau)$. This completes the proof of Theorem 6.5.

7. Formal 1-Point Functions

Although we dealt with holomorphic functions in Sect. 6, the arguments were all formal in nature. In this short section we record a consequence of this observation. We identify elements of $M(T, T_1)$ with their Fourier expansions at ∞, which lie in the ring of formal power series $\mathbb{C}[[q_{1/T}]]$. Similarly, the functions $E_{2k}(\tau)$, k ≥ 1, are considered to lie in $\mathbb{C}[[q]]$. The operator ∂ (cf. (5.9), (6.2)) operates on these and other power series via the identification

$$\frac{1}{2\pi i}\frac{d}{d\tau} = \frac{q_{1/T}}{T}\frac{d}{dq_{1/T}}.$$
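As a quick check of this identification (a worked step added for the reader, not part of the original argument), recall that $q = e^{2\pi i\tau}$ and $q_{1/T} = e^{2\pi i\tau/T}$, so that $q = q_{1/T}^{T}$. The chain rule then gives the operator identity, and the operator acts diagonally on monomials; the exponent m below is a symbol introduced only for this sketch.

```latex
% Assumes q = e^{2\pi i\tau} and q_{1/T} = e^{2\pi i\tau/T}, hence q = q_{1/T}^{T}.
% The exponent m is a placeholder introduced for this sketch.
\begin{align*}
\frac{dq}{dq_{1/T}} = T\,q_{1/T}^{\,T-1}
\quad\Longrightarrow\quad
q\frac{d}{dq} &= \frac{q_{1/T}^{\,T}}{T\,q_{1/T}^{\,T-1}}\,\frac{d}{dq_{1/T}}
  = \frac{q_{1/T}}{T}\,\frac{d}{dq_{1/T}}, \\
\frac{q_{1/T}}{T}\,\frac{d}{dq_{1/T}}\; q_{1/T}^{\,m} &= \frac{m}{T}\; q_{1/T}^{\,m}
  \qquad (m \in \mathbb{C}), \\
\frac{1}{2\pi i}\,\frac{d}{d\tau}\; e^{2\pi i\tau m/T} &= \frac{m}{T}\; e^{2\pi i\tau m/T}.
\end{align*}
```

In particular $q^{\lambda+n/T} = q_{1/T}^{\,T\lambda+n}$ is an eigenvector with eigenvalue λ + n/T, so the differentiation part of ∂ can be applied term by term to formal series of the shape considered in this section.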

A formal (g, h) 1-point function is a map

$$S : V(T, T_1) \to P,$$

where P is the space of formal power series of the form

$$q^{\lambda}\sum_{n=0}^{\infty}a_n q^{n/T} \tag{7.1}$$

for some λ ∈ C, and which satisfies the formal analogues of (C2)-(C4) in Sect. 5. We will establish

Theorem 7.1. Suppose that S is a formal (g, h) 1-point function. Then S defines an element of $C_1(g,h)$, also denoted by S, via the identification

$$S(v,\tau) = S(v,q), \qquad q = q_\tau = e^{2\pi i\tau}. \tag{7.2}$$

The main point is to show that if S is a formal (g, h) 1-point function, and if v ∈ V is such that

$$S(v,q) = q^{\lambda}\sum_{n=0}^{\infty}a_n q^{n/T}$$

then $q_\tau^{\lambda}\sum_{n=0}^{\infty}a_n q_\tau^{n/T}$ is holomorphic in τ. We prove this as in Sect. 6, namely by first showing that if v is a highest weight vector for the Virasoro algebra then S(v, q) satisfies a differential equation of type (6.11). Since the coefficients are holomorphic in the upper half-plane $\mathfrak{h}$, the Frobenius–Fuchs theory tells us that S(v, q) has the desired convergence.


Proceeding by induction on wt[v], in the general case we arrive at an inhomogeneous differential equation of type (6.20). Again convergence of S(v, q) follows from the Frobenius–Fuchs theory. Since the proofs of these assertions are precisely the same as those of Sect. 6, we omit further discussion.

8. Correlation Functions

In this section we start to relate the theory of 1-point functions to that of twisted V-modules. We keep the notation (a)-(h) introduced at the beginning of Sect. 5, and introduce now a simple g-twisted V-module $M = M(g) = \oplus_{n=0}^{\infty}M_{\lambda+n/T}$ (cf. (3.11)). We further assume that h leaves M stable, that is $h\circ M \cong M$. As remarked in Sect. 3, there is a projective representation on M of the stabilizer (in Aut V) of M, and we let φ(h) be a linearized action on M of the element corresponding to h. This all means (cf. (3.15), (3.16)) that if v ∈ V operates on M via the vertex operator $Y_M(v,z)$ then we have (as operators on M)

$$\phi(h)\,Y_M(v,z)\,\phi(h)^{-1} = Y_M(hv,z). \tag{8.1}$$

We define $M' = \oplus_{n=0}^{\infty}M'_{\lambda+n/T}$ to be the restricted dual of M, so that $M'_n = \mathrm{Hom}_{\mathbb{C}}(M_n,\mathbb{C})$ and there is a pairing $\langle\,,\rangle : M'\times M \to \mathbb{C}$ such that $\langle M'_n, M_m\rangle = 0$ if m ≠ n. With this notation, a (g, h) 1-point correlation function is essentially a trace function, namely

$$\mathrm{tr}\,Y_M(v,z)q^{L(0)} = \sum_{w,w'}\langle w', Y_M(v,z)q^{L(0)}w\rangle, \tag{8.2}$$

where w ranges over a homogeneous basis of M, w′ ranges over the dual basis of M′, and q is indeterminate. As Laurent series we have

$$\mathrm{tr}\,Y_M(v,z)q^{L(0)} = \sum_{w,w'}\sum_{n\in\frac{1}{T}\mathbb{Z}}\langle w', v(n)q^{L(0)}w\rangle\, z^{-n-1}. \tag{8.3}$$

It is easy to see that the trace function is independent of the choice of basis. Now we introduce the function T which is linear in v ∈ V, and defined for homogeneous v ∈ V as follows:

$$T(v) = T_M(v,(g,h),q) = z^{\mathrm{wt}\,v}\,\mathrm{tr}\,Y_M(v,z)\phi(h)q^{L(0)-c/24}. \tag{8.4}$$

Here c is the central charge of V. Next observe that for $m\in\frac{1}{T}\mathbb{Z}$, v(m) maps $M_n$ to $M_{n+\mathrm{wt}\,v-m-1}$. So unless m = wt v − 1, we have $\langle w', v(m)\phi(h)w\rangle = 0$. So only the zero mode o(v) = v(wt v − 1) contributes to the sum in (8.3). Thus T(v) is independent of z, and

$$T(v) = q^{\lambda-c/24}\sum_{n=0}^{\infty}\mathrm{tr}_{M_{\lambda+n/T}}\,o(v)\phi(h)\,q^{n/T}. \tag{8.5}$$

We could equally write

$$T_M(v) = \mathrm{tr}_M\,o(v)\phi(h)\,q^{L(0)-c/24}. \tag{8.6}$$

We are going to prove


Theorem 8.1. $T(v)\in C_1(g,h)$.

The strategy is to prove that T is a formal (g, h) 1-point function, then invoke Theorem 7.1. Certainly T(v) has the correct shape as a power series in q (cf. (7.1)). So we must establish that T(v) satisfies the formal analogues of (C2)-(C4). We can impose $M(T, T_1)$-linearity (C2) by extension of scalars. As we shall explain, the proof of (C4) is contained in Zhu’s paper [Z]. So it remains to discuss (C3), that is we must show that T(v) vanishes on O(g, h), i.e., on the elements of type (5.1)-(5.4). Again we shall later explain that (5.1) and (5.2) may be deduced from results in [Z], so we concentrate on (5.3) and (5.4). To this end, let us now fix a homogeneous v ∈ V such that $gv = \mu^{-1}v$, $hv = \lambda^{-1}v$ and (µ, λ) ≠ (1, 1). We need to establish

Lemma 8.2. T(v) = 0.

Theorem 8.3. $\sum_{k=0}^{\infty}Q_k(\mu,\lambda,\tau)\,T(v[k-1]w) = 0$ for any w ∈ V.

The proof of Lemma 8.2 is easy. We have already seen that only the zero mode o(v) of v contributes a possible non-zero term in the calculation of T(v). On the other hand, if µ ≠ 1 then from (3.4) we see that o(v) = 0. So Lemma 8.2 certainly holds if µ ≠ 1. Suppose that λ ≠ 1. We have $T(v) = \mathrm{tr}_M\,o(v)\phi(h)q^{L(0)-c/24} = \mathrm{tr}_M\,\phi(h)o(v)q^{L(0)-c/24}$, i.e.,

$$\mathrm{tr}\,Y_M(v,z)\phi(h)q^{L(0)} = \mathrm{tr}\,\phi(h)Y_M(v,z)q^{L(0)}. \tag{8.7}$$

But (8.1) yields

$$\phi(h)Y_M(v,z) = \lambda^{-1}\,Y_M(v,z)\phi(h). \tag{8.8}$$

As λ ≠ 1, (8.7) and (8.8) yield $\mathrm{tr}\,Y_M(v,z)\phi(h)q^{L(0)} = 0$. This completes the proof of Lemma 8.2.

The proof of Theorem 8.3 is harder. We first need to define n-point correlation functions. These are multi-linear functions $T(v_1,\dots,v_n)$, $v_i\in V$, defined for $v_i$ homogeneous via

$$\begin{aligned}
T(v_1,\dots,v_n) &= T((v_1,z_1),\dots,(v_n,z_n),(g,h),q) \\
&= z_1^{\mathrm{wt}\,v_1}\cdots z_n^{\mathrm{wt}\,v_n}\,\mathrm{tr}\,Y_M(v_1,z_1)\cdots Y_M(v_n,z_n)\phi(h)q^{L(0)-c/24}.
\end{aligned} \tag{8.9}$$

We only need the case n = 2. We will prove

Theorem 8.4. Let $v, v_1\in V$ be homogeneous with $gv = \mu^{-1}v$, $hv = \lambda^{-1}v$ and (µ, λ) ≠ (1, 1). Then

$$T(v,v_1) = \sum_{k=1}^{\infty}\bar P_k\Bigl(\mu,\lambda,\frac{z_1}{z},q\Bigr)T(v[k-1]v_1), \tag{8.10}$$

$$T(v_1,v) = \lambda\sum_{k=1}^{\infty}\bar P_k\Bigl(\mu,\lambda,\frac{z_1}{z}q,q\Bigr)T(v[k-1]v_1), \tag{8.11}$$

where $\bar P_k$ is as in (4.31).


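We start with

Lemma 8.5. Let $k\in\frac{1}{T}\mathbb{Z}$. Then

$$(1-\lambda q^{k})\,\mathrm{tr}\,v(\mathrm{wt}\,v-1+k)\,Y_M(v_1,z_1)\phi(h)q^{L(0)} = \sum_{i=0}^{\infty}\binom{\mathrm{wt}\,v-1+k}{i}z_1^{\mathrm{wt}\,v-1+k-i}\,\mathrm{tr}\,Y_M(v(i)v_1,z_1)\phi(h)q^{L(0)}, \tag{8.12}$$

$$(1-\lambda q^{k})\,\mathrm{tr}\,Y_M(v_1,z_1)\,v(\mathrm{wt}\,v-1+k)\phi(h)q^{L(0)} = \lambda q^{k}\sum_{i=0}^{\infty}\binom{\mathrm{wt}\,v-1+k}{i}z_1^{\mathrm{wt}\,v-1+k-i}\,\mathrm{tr}\,Y_M(v(i)v_1,z_1)\phi(h)q^{L(0)}. \tag{8.13}$$

Proof. We have

$$\mathrm{tr}\,v(\mathrm{wt}\,v-1+k)Y_M(v_1,z_1)\phi(h)q^{L(0)} = \mathrm{tr}\,[v(\mathrm{wt}\,v-1+k),Y_M(v_1,z_1)]\phi(h)q^{L(0)} + \mathrm{tr}\,Y_M(v_1,z_1)v(\mathrm{wt}\,v-1+k)\phi(h)q^{L(0)}. \tag{8.14}$$

From (8.8) we get $v(\mathrm{wt}\,v-1+k)\phi(h) = \lambda\,\phi(h)v(\mathrm{wt}\,v-1+k)$; moreover, $v(\mathrm{wt}\,v-1+k)q^{L(0)} = q^{k}q^{L(0)}v(\mathrm{wt}\,v-1+k)$. Hence

$$\mathrm{tr}\,Y_M(v_1,z_1)v(\mathrm{wt}\,v-1+k)\phi(h)q^{L(0)} = \lambda q^{k}\,\mathrm{tr}\,v(\mathrm{wt}\,v-1+k)Y_M(v_1,z_1)\phi(h)q^{L(0)}. \tag{8.15}$$

Using the relation

$$[v(m),Y_M(v_1,z_1)] = \sum_{i=0}^{\infty}\binom{m}{i}z_1^{m-i}\,Y_M(v(i)v_1,z_1),$$

which is a consequence of the Jacobi identity (2.3), we get

$$[v(\mathrm{wt}\,v-1+k),Y_M(v_1,z_1)] = \sum_{i=0}^{\infty}\binom{\mathrm{wt}\,v-1+k}{i}z_1^{\mathrm{wt}\,v-1+k-i}\,Y_M(v(i)v_1,z_1). \tag{8.16}$$

Both parts of the lemma follow from (8.15)-(8.16).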
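Spelled out, the passage from (8.14)-(8.16) to (8.12) and (8.13) is a one-line linear solve. The abbreviations X, X′, C and m below are introduced here for illustration only and do not appear in the original.

```latex
% Abbreviations (hypothetical, for this sketch only), with m = \mathrm{wt}\,v - 1 + k:
%   X  = \mathrm{tr}\, v(m)\, Y_M(v_1,z_1)\, \phi(h)\, q^{L(0)},
%   X' = \mathrm{tr}\, Y_M(v_1,z_1)\, v(m)\, \phi(h)\, q^{L(0)},
%   C  = \mathrm{tr}\, [v(m), Y_M(v_1,z_1)]\, \phi(h)\, q^{L(0)}.
\begin{align*}
X  &= C + X'            && \text{by (8.14)},\\
X' &= \lambda q^{k}\, X && \text{by (8.15)},\\
(1-\lambda q^{k})\,X &= C,
  \qquad (1-\lambda q^{k})\,X' = \lambda q^{k}\,C, \\
C  &= \sum_{i=0}^{\infty}\binom{m}{i}\, z_1^{\,m-i}\,
      \mathrm{tr}\, Y_M(v(i)v_1,z_1)\,\phi(h)\,q^{L(0)} && \text{by (8.16)}.
\end{align*}
```

Substituting the last line into the third gives (8.12) and (8.13) respectively.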


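Now we turn to the proof of (8.10) of Theorem 8.4. Using (8.12) in the last lemma and setting $\mu = e^{2\pi ir/T}$ we have

$$\begin{aligned}
T(v,v_1) &= T((v,z),(v_1,z_1),(g,h),q) \\
&= z^{\mathrm{wt}\,v}z_1^{\mathrm{wt}\,v_1}\,\mathrm{tr}\,Y_M(v,z)Y_M(v_1,z_1)\phi(h)q^{L(0)-c/24} \\
&= z^{\mathrm{wt}\,v}z_1^{\mathrm{wt}\,v_1}\sum_{k\in\mathbb{Z}+\frac{r}{T}}z^{-\mathrm{wt}\,v-k}\,\mathrm{tr}\,v(\mathrm{wt}\,v-1+k)Y_M(v_1,z_1)\phi(h)q^{L(0)-c/24} \\
&= z_1^{\mathrm{wt}\,v_1}\sum_{k\in\mathbb{Z}+\frac{r}{T}}z^{-k}(1-\lambda q^{k})^{-1}\sum_{i=0}^{\infty}\binom{\mathrm{wt}\,v-1+k}{i}z_1^{\mathrm{wt}\,v-1+k-i}\,\mathrm{tr}\,Y_M(v(i)v_1,z_1)\phi(h)q^{L(0)-c/24} \\
&= \sum_{k\in\mathbb{Z}+\frac{r}{T}}\Bigl(\frac{z_1}{z}\Bigr)^{k}(1-\lambda q^{k})^{-1}\sum_{i=0}^{\infty}\binom{\mathrm{wt}\,v-1+k}{i}T(v(i)v_1) \\
&= \sum_{k\in\mathbb{Z}+\frac{r}{T}}\Bigl(\frac{z_1}{z}\Bigr)^{k}(1-\lambda q^{k})^{-1}\sum_{i=0}^{\infty}\sum_{m=0}^{i}c(\mathrm{wt}\,v,i,m)\,k^{m}\,T(v(i)v_1) \\
&= \sum_{i=0}^{\infty}\sum_{m=0}^{i}m!\,\bar P_{m+1}(\mu,\lambda,z_1/z,q)\,c(\mathrm{wt}\,v,i,m)\,T(v(i)v_1) \\
&= \sum_{m=0}^{\infty}\bar P_{m+1}(\mu,\lambda,z_1/z,q)\,T(v[m]v_1),
\end{aligned}$$

where we have used (2.10) and (4.31). This is precisely (8.10) of Theorem 8.4. Equation (8.11) follows in the same way by using (8.13).

Before proving Theorem 8.3 we still need

Lemma 8.6. We have

$$\sum_{k=0}^{\infty}\frac{1}{k!}B_k(1-\mathrm{wt}\,v+r/T)\,v[k-1] = \sum_{i=0}^{\infty}\binom{r/T}{i}v(i-1).$$

Proof. The l.h.s. of the equality is equal to

$$\begin{aligned}
\mathrm{Res}_w\,Y[v,w]\frac{e^{(1-\mathrm{wt}\,v+r/T)w}}{e^{w}-1} &= \mathrm{Res}_w\,Y(v,e^{w}-1)\frac{e^{(1+r/T)w}}{e^{w}-1} \\
&= \mathrm{Res}_z\,Y(v,z)\frac{(1+z)^{r/T}}{z} \\
&= \sum_{i=0}^{\infty}\binom{r/T}{i}v(i-1)
\end{aligned}$$

as required.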
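For readers who want the intermediate steps of this proof, the computation rests on the generating function of the Bernoulli polynomials and on the change of variable $z = e^{w}-1$. The sketch below assumes the conventions used in the first step of the proof, namely $Y[v,w] = Y(v,e^{w}-1)e^{w\,\mathrm{wt}\,v} = \sum_n v[n]w^{-n-1}$ for homogeneous v; the symbol x is a shorthand introduced here.

```latex
% Shorthand (not in the original): x = 1 - \mathrm{wt}\,v + r/T.
\begin{align*}
\frac{w\,e^{xw}}{e^{w}-1} = \sum_{k\ge 0} B_k(x)\,\frac{w^{k}}{k!}
 \quad&\Longrightarrow\quad
 \mathrm{Res}_w\, Y[v,w]\,\frac{e^{xw}}{e^{w}-1}
 = \sum_{k\ge 0}\frac{B_k(x)}{k!}\,v[k-1], \\
z = e^{w}-1,\ dz = e^{w}\,dw
 \quad&\Longrightarrow\quad
 \mathrm{Res}_w\, Y(v,e^{w}-1)\,\frac{e^{(1+r/T)w}}{e^{w}-1}
 = \mathrm{Res}_z\, Y(v,z)\,\frac{(1+z)^{r/T}}{z}, \\
\mathrm{Res}_z \sum_{n} v(n)\,z^{-n-1}\sum_{i\ge 0}\binom{r/T}{i} z^{\,i-1}
 &= \sum_{i\ge 0}\binom{r/T}{i}\,v(i-1).
\end{align*}
```

The last line simply picks out the coefficient of $z^{-1}$, which forces n = i − 1.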


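Proof of Theorem 8.3. Combine Lemma 8.6 and Proposition 4.9 to get

$$\begin{aligned}
\sum_{k=0}^{\infty}Q_k(\mu,\lambda,q)\,T(v[k-1]w)
&= \sum_{k=1}^{\infty}\mathrm{Res}_z\Bigl(\iota_{z,z_1}((z-z_1)^{-1})\,z_1^{\mathrm{wt}\,v-r/T}z^{-\mathrm{wt}\,v+r/T}\,\bar P_k\bigl(\mu,\lambda,\tfrac{z_1}{z},q\bigr)\Bigr)T(v[k-1]w) \\
&\quad - \lambda\sum_{k=1}^{\infty}\mathrm{Res}_z\Bigl(\iota_{z_1,z}((z-z_1)^{-1})\,z_1^{\mathrm{wt}\,v-r/T}z^{-\mathrm{wt}\,v+r/T}\,\bar P_k\bigl(\mu,\lambda,\tfrac{z_1q}{z},q\bigr)\Bigr)T(v[k-1]w) \\
&\quad - \sum_{i=0}^{\infty}\binom{r/T}{i}T(v(i-1)w).
\end{aligned}$$

On the other hand, use (3.7) to obtain

$$\begin{aligned}
\sum_{i=0}^{\infty}\binom{r/T}{i}T(v(i-1)w)
&= \sum_{i=0}^{\infty}\binom{r/T}{i}z_1^{\mathrm{wt}\,v+\mathrm{wt}\,w-i}\,\mathrm{tr}\,Y_M(v(i-1)w,z_1)\phi(h)q^{L(0)-c/24} \\
&= \sum_{i=0}^{\infty}\binom{r/T}{i}z_1^{\mathrm{wt}\,v+\mathrm{wt}\,w-i}\,\mathrm{Res}_{z-z_1}(z-z_1)^{i-1}\,\mathrm{tr}\,Y_M(Y(v,z-z_1)w,z_1)\phi(h)q^{L(0)-c/24} \\
&= \mathrm{Res}_{z-z_1}\,\iota_{z_1,z-z_1}(z-z_1)^{-1}z_1^{\mathrm{wt}\,v+\mathrm{wt}\,w}\Bigl(\frac{z}{z_1}\Bigr)^{r/T}\mathrm{tr}\,Y_M(Y(v,z-z_1)w,z_1)\phi(h)q^{L(0)-c/24} \\
&= \mathrm{Res}_z\,\iota_{z,z_1}(z-z_1)^{-1}\Bigl(\frac{z_1}{z}\Bigr)^{\mathrm{wt}\,v-r/T}T(v,w) - \mathrm{Res}_z\,\iota_{z_1,z}(z-z_1)^{-1}\Bigl(\frac{z_1}{z}\Bigr)^{\mathrm{wt}\,v-r/T}T(w,v),
\end{aligned}$$

which by Theorem 8.4 is equal to

$$\sum_{k=1}^{\infty}\mathrm{Res}_z\,\iota_{z,z_1}(z-z_1)^{-1}\Bigl(\frac{z_1}{z}\Bigr)^{\mathrm{wt}\,v-r/T}\bar P_k\bigl(\mu,\lambda,\tfrac{z_1}{z},q\bigr)T(v[k-1]w) - \lambda\sum_{k=1}^{\infty}\mathrm{Res}_z\,\iota_{z_1,z}(z-z_1)^{-1}\Bigl(\frac{z_1}{z}\Bigr)^{\mathrm{wt}\,v-r/T}\bar P_k\bigl(\mu,\lambda,\tfrac{z_1q}{z},q\bigr)T(v[k-1]w).$$

This completes the proof of the theorem.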

In order to complete the proof of Theorem 8.1 we need to explain how (5.8), and the fact that T(v) vanishes on (5.1) and (5.2), follow from [Z]. These results concern the case in which the critical vector v satisfies gv = hv = v. Thus v lies in the invariant sub vertex operator algebra $V^A$. Now Zhu’s proof of (5.8), for example, is quite general in the sense that it does not depend on any special properties of V. In particular, his argument


applies to $V^A$, which is what we need. (Note that M is a module for $V^A$.) Similarly, Zhu’s argument establishes that T vanishes on (5.1) and (5.2) in the case that g and h both fix v and w. On the other hand we may certainly assume that w is an eigenvector for g and h. If gw = αw, hw = βw and (α, β) ≠ (1, 1), we have already seen (cf. the proof of Lemma 8.2) that T(v[2k − 2]w) = 0, so it is clear in this case that T vanishes on (5.2). This completes the proof of Theorem 8.1.

Theorem 8.7. Let $M^1, M^2, \dots$ be inequivalent simple g-twisted V-modules, each of which is h-stable. Let $T_1, T_2, \dots$ be the corresponding trace functions (8.6). Then $T_1, T_2, \dots$ are linearly independent elements of $C_1(g,h)$.

Proof. Suppose false. Then we may choose notation so that for some m ∈ N there are non-zero scalars $c_1, c_2, \dots, c_m$ such that

$$c_1T_1 + \cdots + c_mT_m = 0. \tag{8.17}$$

Let $\Omega_i$ be the top level of $M^i$, 1 ≤ i ≤ m, and let $\lambda_i$ be the conformal weight of $M^i$ (cf. Sect. 3). Thus $M^i$ is graded by $\frac{1}{T}\mathbb{Z}$,

$$M^i = \oplus_{n=0}^{\infty}M^i_{\lambda_i+n/T}$$

and $\Omega_i = M^i_{\lambda_i}$. Define a partial order << on the $\lambda_i$ by declaring that

$$\lambda_i << \lambda_j \ \text{ if and only if } \ \lambda_j-\lambda_i\in\frac{1}{T}\mathbb{Z}_+. \tag{8.18}$$

We may, and shall, assume that $\lambda_1$ is minimal with respect to <<. By Theorem 3.4 the $\Omega_i$ realize inequivalent irreducible representations of the algebra $A_g(V)$. Moreover from the discussion following (3.20) the $A_g(V)$-modules $\Omega_i$ are h-invariant, and the corresponding trace functions $\mathrm{tr}|_{\Omega_i}\,o(v)\phi(h)$ are linearly independent. Thus we may choose v so that $\mathrm{tr}|_{\Omega_1}\,o(v)\phi(h) = 1$, $\mathrm{tr}|_{\Omega_i}\,o(v)\phi(h) = 0$ for i > 1. Because of our assumption on $\lambda_1$, applying this to (8.17) yields $c_1 = 0$, contradiction.

9. An Existence Theorem for g-Twisted Modules

We will prove

Theorem 9.1. Suppose that V is a simple vertex operator algebra which satisfies Condition $C_2$, and that g ∈ Aut V has finite order. Then V has at least one simple g-twisted module.

The idea is to prove that $A_g(V)\neq 0$. Then the theorem is a consequence of Proposition 3.7. We start with more general considerations that we shall need in Sect. 10. Let (g, h) be a pair of commuting elements in Aut(V).

Lemma 9.2. Let v ∈ V satisfy $gv = \mu^{-1}v$, $hv = \lambda^{-1}v$. Then the following hold:
(a) The constant term of $\sum_{k=0}^{\infty}Q_k(\mu,\lambda,q)\,v[k-1]w$ is equal to $-v\circ_g w$ if µ ≠ 1.
(b) The constant term of $\sum_{k=0}^{\infty}Q_k(\mu,\lambda,q)\,(L[-1]v)[k-1]w$ is equal to $-v\circ_g w$ if µ = 1, λ ≠ 1.
(c) The constant term of $v[-2]w + \sum_{k=2}^{\infty}(2k-1)E_{2k}(q)\,v[2k-2]w$ is $v\circ_g w$ if µ = λ = 1.

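Proof. As usual, (c) follows as in the corresponding proof in [Z]. The proof of (a) is similar to that of Lemma 8.6. For from (4.24) we see that if we take $\mu = e^{2\pi ir/T}$ with 1 ≤ r ≤ T then the constant term of the expression in (a) is equal to the following (take v homogeneous):

$$\begin{aligned}
-\sum_{k=0}^{\infty}\frac{B_k(r/T)}{k!}\,v[k-1]w &= -\mathrm{Res}_z\,Y[v,z]w\,\frac{e^{rz/T}}{e^{z}-1} \\
&= -\mathrm{Res}_z\,Y(v,e^{z}-1)w\,\frac{e^{(\mathrm{wt}\,v+r/T)z}}{e^{z}-1} \\
&= -\mathrm{Res}_z\,Y(v,z)w\,\frac{(1+z)^{\mathrm{wt}\,v-1+r/T}}{z}
\end{aligned}$$

and by (3.12) this is exactly $-v\circ_g w$ if r ≠ T. As for (b), we replace v by L[−1]v = L(−1)v + L(0)v (cf. (2.20)) and set r = T in the foregoing. From (4.24) the constant term of

$$\sum_{k=0}^{\infty}Q_k(1,\lambda,q)\,v[k-1]w$$

is

$$-\sum_{k=0}^{\infty}\frac{B_k(1)}{k!}\,v[k-1]w + \frac{1}{1-\lambda}\,v[0]w = -\mathrm{Res}_z\,Y(v,z)w\,\frac{(1+z)^{\mathrm{wt}\,v}}{z} + \frac{1}{1-\lambda}\,v[0]w.$$

Note that (L[−1]v)[0]w = 0. Then the constant term of the expression in (b) is equal to

$$\begin{aligned}
&-\mathrm{Res}_z\,Y(L(-1)v,z)w\,\frac{(1+z)^{\mathrm{wt}\,v+1}}{z} - \mathrm{wt}\,v\;\mathrm{Res}_z\,Y(v,z)w\,\frac{(1+z)^{\mathrm{wt}\,v}}{z} \\
&\quad= -\mathrm{Res}_z\Bigl(\frac{d}{dz}Y(v,z)\Bigr)w\,\frac{(1+z)^{\mathrm{wt}\,v+1}}{z} - \mathrm{wt}\,v\;\mathrm{Res}_z\,Y(v,z)w\,\frac{(1+z)^{\mathrm{wt}\,v}}{z} \\
&\quad= \mathrm{Res}_z\,Y(v,z)w\,\frac{d}{dz}\frac{(1+z)^{\mathrm{wt}\,v+1}}{z} - \mathrm{wt}\,v\;\mathrm{Res}_z\,Y(v,z)w\,\frac{(1+z)^{\mathrm{wt}\,v}}{z} \\
&\quad= -\mathrm{Res}_z\,Y(v,z)w\,\frac{(1+z)^{\mathrm{wt}\,v}}{z^{2}} \\
&\quad= -v\circ_g w.
\end{aligned}$$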

This completes the proof of the lemma.
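The only steps in part (b) that are not purely formal bookkeeping are an integration by parts under the residue and one derivative. As a sketch, writing a = wt v and f for an arbitrary Laurent series (both shorthands introduced here, not in the original), the identities used are:

```latex
% Shorthand (not in the original): a = \mathrm{wt}\,v, and f(z) is any formal Laurent series.
\begin{align*}
\mathrm{Res}_z\Bigl(\tfrac{d}{dz}Y(v,z)\Bigr)w\, f(z)
 &= -\,\mathrm{Res}_z\, Y(v,z)w\, f'(z)
 \qquad\text{(a total derivative has zero residue)},\\
\frac{d}{dz}\,\frac{(1+z)^{a+1}}{z}
 &= \frac{(a+1)(1+z)^{a}}{z} - \frac{(1+z)^{a+1}}{z^{2}}
 = \frac{a\,(1+z)^{a}}{z} - \frac{(1+z)^{a}}{z^{2}},\\
\mathrm{Res}_z\, Y(v,z)w\Bigl[\frac{d}{dz}\frac{(1+z)^{a+1}}{z} - \frac{a\,(1+z)^{a}}{z}\Bigr]
 &= -\,\mathrm{Res}_z\, Y(v,z)w\,\frac{(1+z)^{a}}{z^{2}}.
\end{align*}
```

The middle line follows by writing $(1+z)^{a+1} = (1+z)^{a}(1+z)$ and collecting the terms over $z^{2}$.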

Now take $S\in C_1(g,h)$ and assume that S ≠ 0. After Theorem 6.5 we may choose p so that (6.12)-(6.15) hold for all v ∈ V, and such that $S_p\neq 0$. We may further choose notation such that $\lambda_{p,1}$ is minimal among all $\lambda_{p,j}$ with respect to the partial order (8.18) and $a_{p,1,0}(v)\neq 0$ for some v ∈ V. Setting

$$S_{p,1}(v,\tau) = \alpha(v) + \sum_{n=1}^{\infty}a_{p,1,n}(v)\,q^{n/T} \tag{9.1}$$


defines a function α : V → C which is not identically zero. Because S(v, τ) is linear in v, α is a linear functional on V.

Lemma 9.3. α vanishes on $O_g(V)$.

Proof. We know that S vanishes on O(g, h), hence on the elements (5.1)-(5.4). Using Lemma 9.2 leads to the required vanishing conditions. For example, if µ ≠ 1 in the notation of Lemma 9.2 then

$$\sum_{k=0}^{\infty}Q_k(\mu,\lambda,\tau)\,S(v[k-1]w) = 0. \tag{9.2}$$

This identity holds if S is replaced by $S_p$, and then if $S_p$ is in turn replaced by $S_{p,1}$. Hence

$$\sum_{k=0}^{\infty}Q_k(\mu,\lambda,\tau)\Bigl(\alpha(v[k-1]w) + \sum_{n=1}^{\infty}a_{p,1,n}(v[k-1]w)\,q^{n/T}\Bigr) = 0. \tag{9.3}$$

The constant term in (9.3), necessarily zero, is equal to $\alpha(-v\circ_g w) = 0$ by Lemma 9.2, so that α vanishes on $v\circ_g w$ if gv ≠ v. The other vanishing conditions follow similarly.

To complete the proof of Theorem 9.1 we must show that there is some non-zero element S of $C_1(g,h)$ (for suitable h). For then we know that the function α is nonvanishing on V but vanishes on $O_g(V)$, whence $V\neq O_g(V)$ and $A_g(V) = V/O_g(V)$ is non-zero, as required. Consider $C_1(1,g)$: V is itself a g-stable simple V-module, so that the corresponding trace function $T_V(v,(1,g),q)$ lies in $C_1(1,g)$ by Theorem 8.1. So $C_1(1,g)\neq 0$, and since $\begin{pmatrix}0 & -1\\ 1 & 0\end{pmatrix}$ induces a linear isomorphism between $C_1(1,g)$ and $C_1(g,1)$ by Theorem 5.4 then also $C_1(g,1)\neq 0$. This completes the proof of Theorem 9.1.

10. The Main Theorems

We continue to use the notation introduced in Sect. 5. The next theorem is decisive.

Theorem 10.1. Suppose that V is g-rational and satisfies Condition $C_2$, and let $M^1,\dots,M^m$ be all of the inequivalent, simple, h-stable, g-twisted V-modules. Let $T_1,\dots,T_m$ be the corresponding trace functions (8.6). Then $T_1,\dots,T_m$ form a basis of $C_1(g,h)$.

There are several important corollaries.

Theorem 10.2. Suppose that V is rational and satisfies Condition $C_2$. Suppose further that the group ⟨g, h⟩ generated by g and h is cyclic with generator k. Then the dimension of $C_1(g,h)$ is equal to the number of inequivalent, k-stable, simple V-modules. In particular, the number of inequivalent, simple g-twisted V-modules is at most equal to the number of g-stable, simple V-modules, with equality if V is g-rational.

Notice that parts (ii) and (iii) of Theorem 1.1 are included in Theorem 10.2 together with Theorem 9.1. Recall next that a simple vertex operator algebra V is called holomorphic in case V is rational and if V is the unique simple V-module.


Theorem 10.3. Suppose that V is holomorphic and satisfies Condition $C_2$. For each automorphism g of V of finite order, there is a unique simple g-twisted V-module V(g). Moreover if ⟨g, h⟩ is cyclic then $C_1(g,h)$ is spanned by $T_{V(g)}(v,g,h,q)$.

This establishes Theorem 1.2 (i). First we show how Theorems 10.2 and 10.3 follow from Theorem 10.1. In the situation of Theorem 10.2 we have ⟨g, h⟩ = ⟨k⟩. Since V is rational, Theorem 10.1 tells us that $C_1(1,k)$ has a basis consisting of the trace functions $T_M(v,1,k,q)$ where M ranges over the inequivalent, k-stable, simple V-modules. We can find γ ∈ SL(2, Z) such that (g, h)γ = (1, k). By the theorem of modular-invariance, γ induces a linear isomorphism from $C_1(g,h)$ to $C_1(1,k)$. Theorem 10.2 follows from this together with Theorem 8.7.

As for Theorem 10.3, since V is holomorphic it is certainly rational, so Theorem 10.2 applies. So if ⟨g, h⟩ = ⟨k⟩ is cyclic then $\dim C_1(g,h)$ is equal to 1 since V is the only simple V-module and it is certainly k-stable. By Theorem 9.1, V has at least one simple g-twisted V-module, call it V(g), and from Theorem 8.7 there can be no more than one since $\dim C_1(g,1) = 1$. So V(g) is unique, hence h-stable whenever gh = hg. So $T_{V(g)}(v,g,h,q)$ spans $C_1(g,h)$ by Theorem 8.1. This establishes Theorem 10.3.

Turning to the proof of Theorem 10.1, we consider first an arbitrary function $S\in C_1(g,h)$. We have seen in Sect. 9 that S can be represented as

$$S(v,\tau) = \sum_{i=0}^{p}(\log q_{1/T})^{i}S_i(v,\tau) \tag{10.1}$$

for fixed p and all v ∈ V, with each $S_i$ satisfying (6.13)-(6.15). We will prove

Proposition 10.4. Each $S_i$ is a linear combination of the functions $T_1,\dots,T_m$.

Proposition 10.5. $S_i = 0$ if i > 0.

Theorem 10.1 obviously follows from these two propositions. First we show that Proposition 10.5 follows from Proposition 10.4. To this end, pick v ∈ V such that $gv = \mu^{-1}v$, $hv = \lambda^{-1}v$. It suffices to show that $pS_p(v,\tau) = 0$. If (µ, λ) ≠ (1, 1) this follows from (C3) and (5.3), so we may assume gv = hv = v. Set

$$w = L[-2]v - \sum_{k=1}^{\infty}E_{2k}(\tau)\,L[2k-2]v. \tag{10.2}$$

From (5.8) and (5.9) we get

$$S(w,\tau) = \frac{q_{1/T}}{T}\frac{d}{dq_{1/T}}S(v,\tau). \tag{10.3}$$

Now Proposition 10.4 combined with Theorem 8.1 tells us that (10.3) is satisfied by each $S_i$. Then we calculate that

$$S(w,\tau) = \sum_{i=0}^{p}\Bigl(\frac{i}{T}(\log q_{1/T})^{i-1}S_i(v,\tau) + (\log q_{1/T})^{i}S_i(w,\tau)\Bigr). \tag{10.4}$$
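Equation (10.4) is the product rule applied to the expansion (10.1). A short sketch, writing D for the operator on the right of (10.3) (a shorthand introduced here) and assuming, as the text states, that each $S_i$ satisfies (10.3):

```latex
% Shorthand (not in the original): D = \frac{q_{1/T}}{T}\frac{d}{dq_{1/T}},
% so that D(\log q_{1/T}) = 1/T.
\begin{align*}
S(w,\tau) = D\,S(v,\tau)
 &= \sum_{i=0}^{p}\Bigl( D\bigl[(\log q_{1/T})^{i}\bigr]\,S_i(v,\tau)
      + (\log q_{1/T})^{i}\, D\,S_i(v,\tau)\Bigr) \\
 &= \sum_{i=0}^{p}\Bigl( \frac{i}{T}\,(\log q_{1/T})^{i-1}\, S_i(v,\tau)
      + (\log q_{1/T})^{i}\, S_i(w,\tau)\Bigr).
\end{align*}
```

Comparing this with the expansion (10.1) written for w in place of v, the coefficient of $(\log q_{1/T})^{p-1}$ on the right picks up the extra term $\frac{p}{T}S_p(v,\tau)$.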


We may identify the parts of (10.1) which involve a given power $(\log q_{1/T})^{i}$. Taking i = p − 1, we see that

$$S_{p-1}(w,\tau) = \frac{p}{T}S_p(v,\tau) + S_{p-1}(w,\tau),$$

so that $pS_p(v,\tau) = 0$, as desired.

We turn our attention to the proof of Proposition 10.4. We assume without loss that $S_p\neq 0$ and that each $S_{p,j}\neq 0$ (cf. (6.13)). We are then in the situation that was in effect in Sect. 9. We adopt the notation (9.1). It was shown that α : V → C vanishes on $O_g(V)$, and thus defines a linear functional α : $A_g(V)$ → C. We continue this line of reasoning, and now prove

Lemma 10.6. Suppose that u, v ∈ V satisfy hu = ρu, hv = σv, ρ, σ ∈ C. Then

$$\alpha(u*_g v) = \rho\,\delta_{\rho\sigma,1}\,\alpha(v*_g u). \tag{10.5}$$

Proof. We may assume that gu = ξu and gv = νv for scalars ξ, ν. If ξ or ν is not equal to 1 then u (resp. v) lies in $O_g(V)$ (cf. Lemma 2.1 of [DLM3]), whence so too do $u*_g v$ and $v*_g u$ by Theorem 3.3. So in this case both sides of (10.5) are equal to 0. So we may assume gu = u, gv = v. Similarly, $u*_g v$ is an eigenvector for h with eigenvalue ρσ, so if ρσ ≠ 1 then $u*_g v$ and $v*_g u$ lie in O(g, h) by (5.3). Then $S(u*_g v) = S(v*_g u) = 0$ by (C3), which again leads to both sides of (10.5) being 0. So we may assume that ρσ = 1 and try to prove that $\alpha(u*_g v) = \rho\,\alpha(v*_g u)$. Now we know from [Z] (also see Lemma 2.2 (iii) of [DLM3]) that if $V^g$ is the space of g-invariants of V then for u homogeneous

$$u*_g v - v*_g u \equiv \mathrm{Res}_z\,Y(u,z)v\,(1+z)^{\mathrm{wt}\,u-1} \pmod{O(V^g)}.$$

Using (2.17) we get

$$u*_g v - v*_g u \equiv \sum_{i=0}^{\infty}\binom{\mathrm{wt}\,u-1}{i}u(i)v \equiv u[0]v \pmod{O(V^g)}. \tag{10.6}$$

Now certainly $O(V^g)\subset O_g(V)$ (loc. cit.), and if ρ = 1 then u[0]v ∈ O(g, h) by (5.1). So in this case (10.6) leads to $\alpha(u*_g v - v*_g u) = 0$ as desired. So we may take ρ ≠ 1. In this case we follow the calculation of Lemma 9.2. Bearing in mind that gu = u and hu = ρu with ρ ≠ 1, we see from the proof of Lemma 9.2 (b) that the constant term of

$$\sum_{k=0}^{\infty}Q_k(1,\rho^{-1},q)\,u[k-1]v$$

is

$$-u*_g v + \frac{1}{1-\rho^{-1}}\,u[0]v.$$

Since S vanishes on $\sum_{k=0}^{\infty}Q_k(1,\rho^{-1},q)\,u[k-1]v\in O(g,h)$, this shows that

$$\alpha(u*_g v) = \frac{1}{1-\rho^{-1}}\,\alpha(u[0]v).$$

However (10.6) still applies, so that

$$\alpha(u*_g v) = \frac{1}{1-\rho^{-1}}\bigl(\alpha(u*_g v)-\alpha(v*_g u)\bigr),$$

which is equivalent to the desired result.
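The final equivalence is elementary. For completeness, here it is with the shorthands $x = \alpha(u*_g v)$ and $y = \alpha(v*_g u)$ (names introduced here, not in the original), assuming that ρ differs from 1 so that the factor $1/(1-\rho^{-1})$ is defined:

```latex
% Shorthands (not in the original): x = \alpha(u *_g v), y = \alpha(v *_g u); assumes \rho \neq 1.
\begin{align*}
x = \frac{1}{1-\rho^{-1}}\,(x-y)
 &\iff (1-\rho^{-1})\,x = x - y \\
 &\iff \rho^{-1}\,x = y \\
 &\iff x = \rho\,y .
\end{align*}
```

This is (10.5) in the remaining case ρσ = 1.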

We will need

Lemma 10.7. Let A be a finite-dimensional semi-simple algebra over C with decomposition $A = \oplus_{i\in I}A_i$ into simple components. Let h : A → A be an automorphism of A of finite order, and suppose that F : A → C is a linear map which satisfies

$$F(ab) = \rho\,\delta_{\rho\sigma,1}\,F(ba) \tag{10.7}$$

whenever ha = ρa and hb = σb, ρ, σ ∈ C. Then F can be written as a linear combination with scalars $\alpha_j$:

$$F(a) = \sum_{j\in J}\alpha_j\,\mathrm{tr}_{W_j}(a\gamma_j), \tag{10.8}$$

where in (10.8), $\{A_j\}_{j\in J}$ ranges over the h-invariant simple components of A, $W_j$ is the simple A-module such that $A_jW_j = W_j$, and $\gamma_j\in A_j^{*}$ satisfies $ha = \gamma_ja\gamma_j^{-1}$ for $a\in A_j$.

Remark. The existence of $\gamma_j$ is the Skolem–Noether Theorem.

Proof. Proceed by induction on the order of h and cardinality of I. The group ⟨h⟩ permutes the $A_i$ among themselves, and the conditions of the lemma apply to any h-invariant sum of $A_i$. So we may assume that ⟨h⟩ is transitive in its action on the $A_i$. First assume that there are at least two components. Then there are no h-invariant components, so we must show that F = 0 in this case. If σ ≠ 1 and hb = σb, taking a = 1 in (10.7) shows that F(b) = 0. So we only need show that F is zero on the algebra $A^h$ of h-invariants of A. If there is 1 ≠ k ∈ ⟨h⟩ such that k fixes each $A_i$ then the algebra of k-invariants B is a semi-simple algebra admitting ⟨h⟩/⟨k⟩. By induction we see that F(B) = 0, so we are done as $A^h\subset B$. So without loss there is no such k. So h has order |I|, the number of components. Thus if $A_0$ is the first component then we may set $A_i = h^iA_0$, 0 ≤ i ≤ |I| − 1. Then

$$A^h = \Bigl\{\sum_{i=0}^{|I|-1}h^ix \;\Big|\; x\in A_0\Bigr\} \cong A_0.$$

By (10.7) F(ab) = F(ba) for a, b ∈ $A^h$, so F is a trace function, i.e., $F(a) = \alpha\,\mathrm{tr}_W\,a$ for some α ∈ C, W the simple $A^h$-module. So we must show $F(1_A) = 0$. Let λ be an |I|th root of unity, let u, v ∈ $A_0$ be units such that $uv = 1_{A_0}$, and let

$$a = \sum_{i=0}^{|I|-1}\lambda^{i}h^{i}(u),\qquad b = \sum_{i=0}^{|I|-1}\lambda^{-i}h^{i}(v).$$


Then $ba = ab = \sum_{i=0}^{|I|-1}h^{i}(u)h^{i}(v) = \sum_i h^{i}(uv) = 1_A$. On the other hand $ha = \lambda^{-1}a$, $hb = \lambda b$ and λ ≠ 1. So (10.7) yields $F(1_A) = F(ab) = F(ba) = \lambda F(ab)$. So $F(1_A) = 0$ as desired.

This reduces us to the case that A is itself a simple algebra. Pick $\gamma\in A^{*}$ such that $h(a) = \gamma a\gamma^{-1}$ and consider $F_1 : A\to\mathbb{C}$ defined by $F_1(a) = F(a\gamma^{-1})$. If ha = ρa and ρ ≠ 1 then $F(a\gamma^{-1}) = 0$ by (10.7), so $F_1(a) = 0$ for such a. On the other hand we get for hb = σb,

$$F_1(ab) = F(ab\gamma^{-1}) = \rho\,\delta_{\rho\sigma,1}F(b\gamma^{-1}a) = \rho\,\delta_{\rho\sigma,1}F(b\gamma^{-1}a\gamma\gamma^{-1}) = \delta_{\rho\sigma,1}F(ba\gamma^{-1}) = \delta_{\rho\sigma,1}F_1(ba).$$

From this we conclude that $F_1(ab) = F_1(ba)$ for all a, b ∈ A. So $F_1$ is a trace function $F_1(a) = \alpha\,\mathrm{tr}_W\,a$, so that $F(a) = \alpha\,\mathrm{tr}_W\,a\gamma$. This completes the proof of the lemma.

Now we return to the situation of Lemma 10.6. From (3.18) h induces an automorphism of $A_g(V)$ via h : v ↦ hv, and since V is g-rational, then $A_g(V)$ is semi-simple and Lemma 10.7 applies. From the discussion in Sect. 3, the h-invariant components of $A_g(V)$ correspond precisely to the h-invariant simple $A_g(V)$-modules, and these correspond to the h-invariant simple g-twisted V-modules. For such a simple $A_g(V)$-module Ω we have $\phi(h)o(v)\phi(h)^{-1} = o(hv)$ (cf. (8.1)), o(v) being the corresponding zero mode (3.14). Also $o(hv) = \gamma\,o(v)\gamma^{-1}$ if γ represents h in the sense of Lemma 10.7. So γ and φ(h) differ by a scalar when considered as operators on Ω. By Lemmas 10.6 and 10.7 we get

Lemma 10.8. The linear function α : $A_g(V)$ → C can be represented in the form

$$\alpha(v) = \sum_j\alpha_j\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h), \tag{10.9}$$

where $\alpha_j$ are scalars and the spaces $\Omega(M^j)$ range over the top levels of the h-invariant simple g-twisted V-modules $M^j$. Recall that we have

$$S_p(v,\tau) = \sum_{j=1}^{b}q^{\lambda_{p,j}}S_{p,j}(v,\tau) \tag{10.10}$$

with $S_{p,1}$ as in (9.1).

Lemma 10.9. Suppose that $\alpha_j\neq 0$ in (10.9). Then the conformal weight of the corresponding g-twisted module $M^j$ is equal to $\lambda_{p,1} + c/24$.

Proof. We use the method of proof of Proposition 10.5 once more. For v ∈ V, let w = w(v) be as in (10.2). Thus (10.3) holds whenever $S\in C_1(g,h)$. Applying (10.3) with $S = T_{M^j}$ (cf. (8.4)) and considering leading terms yields

$$\mathrm{tr}_{\Omega(M^j)}\,o(w)\phi(h) = (\lambda_j - c/24)\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h), \tag{10.11}$$

where $\lambda_j$ is the conformal weight of $M^j$. Similarly applying (10.3) to S itself and considering the leading term of $S_p$ yields

$$\alpha(w) = \lambda_{p,1}\,\alpha(v). \tag{10.12}$$


Using (10.5), we find that for v ∈ V,

$$\begin{aligned}
\lambda_{p,1}\sum_j\alpha_j\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h) = \lambda_{p,1}\,\alpha(v) = \alpha(w) &= \sum_j\alpha_j\,\mathrm{tr}_{\Omega(M^j)}\,o(w)\phi(h) \\
&= \sum_j\alpha_j(\lambda_j - c/24)\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h).
\end{aligned}$$

The linear independence of characters of $A_g(V)$ implies that $\lambda_{p,1}\alpha_j = \alpha_j(\lambda_j - c/24)$. The lemma follows.

We are ready for the final argument. We have in the previous notation

$$\begin{aligned}
q^{\lambda_{p,1}}S_{p,1}(v,\tau) &= q^{\lambda_{p,1}}\Bigl(\sum_j\alpha_j\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h) + \sum_{n=1}^{\infty}a_{p,1,n}(v)\,q^{n/T}\Bigr) \\
&= \sum_j q^{\lambda_j-c/24}\alpha_j\,\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h) + q^{\lambda_{p,1}}\sum_{n=1}^{\infty}a_{p,1,n}(v)\,q^{n/T}.
\end{aligned}$$

Now also

$$T_{M^j}(v,\tau) = q^{\lambda_j-c/24}\Bigl(\mathrm{tr}_{\Omega(M^j)}\,o(v)\phi(h) + \sum_{n=1}^{\infty}\mathrm{tr}_{M^j_{\lambda_j+n/T}}\,o(v)\phi(h)\,q^{n/T}\Bigr).$$

So we see that the function

$$S'(v,\tau) = S(v,\tau) - (\log q_{1/T})^{p}\sum_j\alpha_j\,T_{M^j}(v,\tau)$$

again has the form (6.12)-(6.15), but the leading term of the piece corresponding to $S'_p$ now has a higher degree than $S_p$ itself. We now continue the argument, replacing S with S′ and $S_p$ with $S'_p$. We find, since each $T_{M^j}$ already lies in $C_1(g,h)$, and since there are only finitely many $M^j$, that indeed $S_p$ is a linear combination of $T_{M^j}$. But our argument applies equally well to each $S_i$, so each $S_i$ is a linear combination of $T_{M^j}$. This completes the proof of Proposition 10.4.

11. Rationality of Central Charge and Conformal Weights

Recall from (3.11) that a simple g-twisted V-module M has grading of the form $M = \oplus_{n=0}^{\infty}M_{\lambda+n/T}$ for some λ ∈ C called the conformal weight of M. We will show that, under suitable rationality conditions on V, the conformal weight λ of M is a rational number. We prove even more, namely

Theorem 11.1. Suppose that V is a holomorphic vertex operator algebra which satisfies Condition $C_2$, and let g ∈ Aut V have finite order. Let V(g) be the unique simple g-twisted V-module whose existence is guaranteed by Theorem 10.3. Then the conformal weight of V(g) is rational, and the central charge c of V is also rational.


Theorem 11.2. Suppose that V is a vertex operator algebra which satisfies Condition $C_2$, and let g ∈ Aut V have finite order. Suppose that V is $g^i$-rational for all integers i. Then each simple $g^i$-twisted V-module has rational conformal weight, and the central charge c of V is rational.

Theorem 11.3. Suppose that V is a rational vertex operator algebra which satisfies Condition $C_2$. Then each simple V-module has rational conformal weight, and the central charge of V is rational.

These theorems complete the proofs of Theorems 1.1 and 1.2. Note that Theorem 11.3 is simply a restatement of Theorem 11.2 in the special case that g = 1. We will prove Theorems 11.1 and 11.2 simultaneously. Indeed, at this point in the paper the proof follows from ideas in a paper of Anderson and Moore [AM]: we have only to assemble the relevant facts.

First observe that to prove Theorem 11.2 it suffices to show that each simple g-twisted V-module has rational conformal weight, and that c is rational. With this in mind, let f(q) be one of the following q-expansions: $q^{-c/24}\sum_{n\ge n_0}(\dim V_n)\,q^{n}$, where $V = \oplus_{n\ge n_0}V_n$; $q^{\lambda-c/24}\sum_{n\ge n_0}(\dim V(g)_{\lambda+n/T})\,q^{n/T}$ with λ the conformal weight of V(g), where V(g) is either the unique simple g-twisted V-module in the situation of Theorem 11.1, or any simple g-twisted V-module in the situation of Theorem 11.2. Let U be the SL(2, Z)-module of holomorphic functions on the upper half-plane $\mathfrak{h}$ generated by f(q). In each case U is a finite-dimensional C-linear space, and the elements of U have q-expansions in (not necessarily rational) powers of q. This assertion follows from Theorems 10.1 and 10.3. This puts us in the position of being able to apply methods and results of Anderson and Moore (loc. cit.). The argument proceeds as follows.

Denote by λ = λ(τ) the usual Picard function which generates the field of rational functions on the compactification of $\mathfrak{h}/\Gamma(2)$. With $E = \frac{d}{d\lambda}$, there are unique meromorphic functions $k_i$ such that U is precisely the space of solutions of the differential equation

$$E^{n}y + \sum_{i=0}^{n-1}k_i\,E^{i}y = 0. \tag{11.1}$$

The $k_i$ are then in C(λ) (Proposition 1 of (loc. cit.)). For a given φ ∈ Aut(C), and for r(q) ∈ U, let $r^{\phi}$ be as defined in (loc. cit.). By the Frobenius–Fuchs theory, the $r^{\phi}$ are then q-expansions of the solutions of the φ-transform of (11.1), namely

$$E^{n}y + \sum_{i=0}^{n-1}k_i^{\phi}\,E^{i}y = 0. \tag{11.2}$$

We claim that the solutions of (11.2) also afford a representation of SL(2, Z). First note that since each ki lies C(λ) then the actions of φ and γ ∈ SL(2, Z) on C(λ) commute: this follows from the well-known formulae for the action of the modular group on λ. Then if y|γ is the γ -image of a solution y of (11.1) we find that (y|γ )φ |γ −1 is a solution of (11.2). The claim follows from this observation. Now f (q) ∈ U has the form f = q λ−c/24 an q n/T , n≥N


where $\lambda = 0$ and $T = 1$ in the first case, and where $a_n \in \mathbb{Z}$ in all cases. Then

$$f^{\phi} = q^{\phi(\lambda - c/24)} \sum_{n \ge N} a_n q^{n/T},$$

i.e.,

$$f^{\phi} = q^{\phi(\lambda - c/24) - (\lambda - c/24)} f. \tag{11.3}$$

One now applies $S$ to both sides of (11.3) to obtain

$$f^{\phi}|S = e^{-\alpha/\tau} f|S, \tag{11.4}$$

where $\alpha = 2\pi i(\phi(\lambda - c/24) - (\lambda - c/24))$. On the other hand, we showed above that both $f^{\phi}|S$ and $f|S$ have $q$-expansions. This leads to a contradiction by using the limit argument of (loc. cit.) unless $\alpha = 0$. As this holds for all $\phi$, and a complex number fixed by every automorphism of $\mathbb{C}$ is rational, we conclude that $c/24 \in \mathbb{Q}$ and $\lambda - c/24 \in \mathbb{Q}$, which completes the proofs of the theorems.

Let us formalize the situation which prevails in case $V$ is a holomorphic vertex operator algebra which satisfies Condition $C_2$ and is equipped with a finite group $G$ of automorphisms. Let $V(g)$ be the unique simple $g$-twisted $V$-module (Theorem 10.3) for $g \in G$, and let $C(g) = \{h \in G \mid gh = hg\}$ be the centralizer of $g$ in $G$. According to Theorem 11.1 the grading of $V(g)$ has the form $V(g) = \bigoplus_{n=0}^{\infty} V(g)_{\lambda+n/T}$, where $g$ has order $T$ and $\lambda = \lambda(g) \in \mathbb{Q}$. As $V(g)$ is unique, it admits a (projective) representation of $C(g)$, so for any $h \in C(g)$ we may consider the trace function $T_{V(g)}(v, g, h, q)$. As usual, this is really only defined up to a nonzero scalar. By Theorem 10.3 this trace function spans the 1-dimensional space $C_1(g, h)$ if $\langle g, h\rangle$ is cyclic. The shape of the trace function is as in (8.5), with $\lambda - c/24 \in \mathbb{Q}$. Thus it has a $q$-expansion with rational powers of $q$ of bounded denominator, and is holomorphic as a function on $\mathfrak{h}$.

We now assume that $\langle g, h\rangle$ is cyclic and fix $v \in V_{[k]}$. It follows from Theorem 5.4 that $T_{V(g)}|_k\gamma\,(v, g, h, \tau) = (c\tau + d)^{-k}\, T_{V(g)}(v, g, h, \gamma\tau)$ lies in $C_1((g, h)\gamma, \tau)$ and hence is a scalar multiple of $T_{V(g^a h^c)}(v, (g, h)\gamma, \tau)$. Here $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ lies in $SL(2,\mathbb{Z})$. Thus there are scalars $\sigma(g, h, \gamma)$ such that the following holds:

$$(c\tau + d)^{-k}\, T_{V(g)}(v, g, h, \gamma\tau) = \sigma(g, h, \gamma)\, T_{V(g^a h^c)}(v, (g, h)\gamma, \tau). \tag{11.5}$$

Equation (11.5) together with the rationality of the corresponding $q$-expansions says precisely that each $T_{V(g)}(v, g, h, \tau)$ is a generalized modular form of weight $k$ in the language of [KM]. Note that Theorem 1.4 follows from these results. Theorem 1.3 follows in the same way.

A case of particular interest is when we take $v$ to be the vacuum element $\mathbf{1}$, in which case $k = 0$. In this case

$$Z(g, h) = T_{V(g)}(\mathbf{1}, g, h, \tau) \tag{11.6}$$

(see (1.6)-(1.8)). This is essentially the graded trace of $\phi(h)$ on the $g$-twisted module $V(g)$, sometimes called a partition function or McKay–Thompson series. In this case, we have proved


Theorem 11.4. Let $V$ be a holomorphic vertex operator algebra which satisfies Condition $C_2$, and let $G$ be a finite group of automorphisms of $V$. For each pair of commuting elements $(g, h)$ which generates a cyclic group, $Z(g, h)$ is a generalized modular function (i.e., of weight zero) which is holomorphic on $\mathfrak{h}$ and satisfies

$$\gamma : Z(g, h) \mapsto \sigma(g, h, \gamma)\, Z((g, h)\gamma) \tag{11.7}$$

for $\gamma \in SL(2,\mathbb{Z})$.

12. Condition C2

In order to be able to apply the preceding results to known vertex operator algebras, we need to verify that Condition $C_2$ is satisfied. We do this for some of the best known rational vertex operator algebras in this section. Refer to Sect. 3 for the definition of Condition $C_2$.

Lemma 12.1. If $V$ is a vertex operator algebra and $M$ is a $V$-module, then $C_2(M)$ contains $v(-n)M$ for all $v \in V$ and $n \ge 2$.

Proof. This follows from definition (3.21) together with the equality $(L(-1)^m v)(-2) = (m + 1)!\, v(-m - 2)$.
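The equality used in the proof of Lemma 12.1 can be checked by iterating the derivative property of vertex operators. The sketch below assumes the standard relation $(L(-1)u)(n) = -n\,u(n-1)$, which is not restated in this section, and tracks only the scalar coefficient; the function name is illustrative.

```python
from math import factorial


def coefficient_of_shifted_mode(m: int, n: int = -2) -> int:
    """Return c such that (L(-1)^m v)(n) = c * v(n - m).

    Assumes the derivative property (L(-1)u)(n) = -n * u(n - 1),
    applied m times; only the scalar coefficient is tracked.
    """
    coeff = 1
    mode = n
    for _ in range(m):
        coeff *= -mode  # (L(-1)u)(mode) = -mode * u(mode - 1)
        mode -= 1
    return coeff


for m in range(6):
    # Lemma 12.1 uses (L(-1)^m v)(-2) = (m + 1)! v(-m - 2)
    assert coefficient_of_shifted_mode(m) == factorial(m + 1)
```

Since the coefficient $(m+1)!$ is never zero, every $v(-m-2)M$ with $m \ge 0$ lies in the span of the elements $u(-2)M$, which is how the lemma is used later in this section.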

Lemma 12.2. Let $V_1, \ldots, V_k$ be vertex operator algebras such that for each $i$, all simple $V_i$-modules satisfy Condition $C_2$. Then the same is true for the tensor product vertex operator algebra $V_1 \otimes \cdots \otimes V_k$.

Proof. See [FHL] for tensor product vertex operator algebras and their modules. We may assume that $k = 2$. One knows (loc. cit.) that the simple $V_1 \otimes V_2$-modules are precisely those of the form $M_1 \otimes M_2$ with $M_i$ a simple $V_i$-module. If $v \in V_1$ then $(v \otimes \mathbf{1})(-2) = v(-2) \otimes \mathrm{id}$, from which it follows that $C_2(M_1 \otimes M_2)$ contains $C_2(M_1) \otimes M_2$. Similarly it contains $M_1 \otimes C_2(M_2)$. The lemma follows immediately.

Now we discuss Condition $C_2$ for the most well-known rational vertex operator algebras, namely,

(i) The vertex operator algebra $L(c_{p,q}, 0)$ associated with the (discrete series) simple Virasoro algebra $Vir$-module of highest weight 0 and central charge $c = c_{p,q} = 1 - \frac{6(p-q)^2}{pq}$ ([DMZ, FZ, Wa]).
(ii) The moonshine module $V^{\natural}$ ([B1, FLM3]).
(iii) The vertex operator algebra $V_L$ associated with a positive definite even lattice $L$ ([B1, D1, FLM3]).
(iv) The vertex operator algebra $L(k, 0)$ associated to a $\hat{\mathfrak{g}}$-module of highest weight 0 and positive integral level $k$, $\mathfrak{g}$ a simple Lie algebra ([DL, FZ, Li]).

Lemma 12.3. $L(c_{p,q}, 0)$ satisfies Condition $C_2$.
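For orientation, the central charge formula in item (i) is easy to tabulate with exact rational arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the highest weights are computed from the standard Kac formula $h_{r,s} = \frac{(ps - qr)^2 - (p-q)^2}{4pq}$, which is an assumption here since it is not stated in this section. For $(p, q) = (3, 4)$ it recovers the central charge $1/2$ of the algebra $L(\frac12, 0)$ used in Proposition 12.4, together with the weights $0$, $\frac12$, $\frac1{16}$ of its simple modules.

```python
from fractions import Fraction


def central_charge(p: int, q: int) -> Fraction:
    """c_{p,q} = 1 - 6 (p - q)^2 / (p q), as in item (i)."""
    return 1 - Fraction(6 * (p - q) ** 2, p * q)


def kac_weights(p: int, q: int) -> set:
    """Highest weights h_{r,s} for 1 <= r < p, 1 <= s < q.

    Uses the standard Kac formula (an assumption; it is not part of
    the text above).
    """
    return {
        Fraction((p * s - q * r) ** 2 - (p - q) ** 2, 4 * p * q)
        for r in range(1, p)
        for s in range(1, q)
    }


print(central_charge(3, 4))       # 1/2
print(sorted(kac_weights(3, 4)))  # the three weights 0, 1/16, 1/2
```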


Proof. Set $L = L(c_{p,q}, 0)$. It is a quotient of the corresponding Verma module $M = M(c_{p,q}, 0)$ and we have $M \cong U(Vir^-) \cdot \mathbf{1}$ (cf. [FZ]) where $Vir^- = \bigoplus_{n=1}^{\infty} \mathbb{C}L(-n)$ and the $L(n)$ are the usual generators of the Virasoro algebra $Vir$. Now $Y(\omega, z) = \sum_{n \in \mathbb{Z}} L(n) z^{-n-2}$, so that $C_2(L)$ contains $L(-n)L$ for all $n \ge 3$ by Lemma 12.1. We have $L = M/J$ and $J$ contains two singular vectors [FF]. The first is $L(-1) \cdot \mathbf{1}$, which shows that $C_2(L)$ contains $U(Vir^-)L(-1)\mathbf{1}$. From this we see that $L = C_2(L) + \sum_{k=0}^{\infty} \mathbb{C}L(-2)^k \mathbf{1}$.

The second singular vector has the form

$$v = L(-2)^{pq}\mathbf{1} + \sum a_{n_1,\ldots,n_r} L(-n_1 - 2) \cdots L(-n_r - 2)\mathbf{1}, \tag{12.1}$$

where the sum ranges over certain $(n_1, \ldots, n_r) \in \mathbb{Z}_+^r$ with $n_1 + \cdots + n_r \ne 0$, $a_{n_1,\ldots,n_r} \in \mathbb{C}$ (cf. Eq. (3.11) of [DLM2]). From the previous paragraph we see that the terms under the summation sign in (12.1) each lie in $C_2(L)$, whence also $L(-2)^{pq}\mathbf{1}$ lies in $C_2(L)$. By Lemma 3.8, $C_2(L)$ is invariant under $L(-2)$. We conclude that $L = C_2(L) + \sum_{k=0}^{pq-1} \mathbb{C}L(-2)^k \mathbf{1}$, and the lemma follows.

The following result was stated without proof in [Z].

Proposition 12.4. The moonshine module $V^{\natural}$ satisfies Condition $C_2$.

Proof. Let $U$ be the tensor product $L(\frac12, 0)^{\otimes 48}$. It is shown in [DMZ] that $V^{\natural}$ contains a sub vertex operator algebra isomorphic to $U$. Moreover when considered as a $U$-module, $V^{\natural}$ is a direct sum of finitely many simple $U$-modules. Suppose that each simple module for $L(\frac12, 0)$ satisfies Condition $C_2$. Then this is true also for $U$ by Lemma 12.2, so that the space spanned by $u(-2)v$ for $u \in U$ and $v \in V^{\natural}$ already has finite codimension in $V^{\natural}$. So it suffices to show that the simple $L(\frac12, 0)$-modules indeed satisfy Condition $C_2$.

The proof of this latter assertion is similar to that of Lemma 12.3. Apart from $L = L(\frac12, 0)$ itself there are just two other simple modules for $L$, namely $L(\frac12, \frac12)$ and $L(\frac12, \frac1{16})$ [DMZ]. Let $M(\frac12, h)$ ($h = \frac12, \frac1{16}$) be the corresponding Verma module with $L(\frac12, h) = M(\frac12, h)/J_h$. As in the proof of Lemma 12.3 we have $L(-n)L(\frac12, h) \subset C_2(L(\frac12, h))$ for $n \ge 3$, so that $L(\frac12, h) = C_2(L(\frac12, h)) + \sum_{a,b \ge 0} \mathbb{C}L(-2)^a L(-1)^b v_h$, where $v_h$ is a highest weight vector. Moreover $J_h$ contains two singular vectors (cf. [FF]). One of them is $(L(-2) - \frac34 L(-1)^2)v_{1/2}$ or $(L(-2) - \frac43 L(-1)^2)v_{1/16}$ (cf. [DMZ]), from which we conclude that $L(\frac12, h) = C_2(L(\frac12, h)) + \sum_{b \ge 0} \mathbb{C}L(-1)^b v_h$. Because of the existence of a second singular vector we see that $C_2(L(\frac12, h))$ necessarily contains $L(-1)^b v_h$ for big enough $b$, whence our claim follows.

The reader is referred to [FLM3] for the construction of $V_L$ and associated notation, which we use in the next result.

Proposition 12.5. $V_L$ satisfies Condition $C_2$.

Proof. Set $H = L \otimes_{\mathbb{Z}} \mathbb{C}$. Then it is easy to see that

$$V_L = C_2(V_L) + S(H \otimes t^{-1}) \otimes \mathbb{C}\{L\}. \tag{12.2}$$


Let $0 \ne \alpha$, $\gamma \in L$ and set $\beta = \gamma - \alpha$. Then $C_2(V_L)$ contains $u(-2)v$, where we take $u = L(-1)^k \iota(a)$ ($k \ge 0$) and $v = \iota(b)$, where $a, b, c \in \hat{L}$ are such that $\bar{a} = \alpha$, $\bar{b} = \beta$ and $c = ab$. Then $\bar{c} = \gamma$. So $C_2(V_L)$ contains

$$\mathrm{Res}_z\, z^{-k-2}\, Y(\iota(a), z)\iota(b) = \mathrm{Res}_z\, z^{-k-2} \exp\Big(\sum_{n<0} \frac{-\alpha(n)}{n} z^{-n}\Big) \exp\Big(\sum_{n>0} \frac{-\alpha(n)}{n} z^{-n}\Big)\, a z^{\alpha} \iota(b).$$

Now $a z^{\alpha} \iota(b) = z^{\langle\alpha,\beta\rangle} \iota(c)$. Using (12.2) we now see that $C_2(V_L)$ contains $\mathrm{Res}_z\, z^{-k-2} e^{\alpha(-1)z} z^{\langle\alpha,\beta\rangle} \iota(c)$, and we conclude that $C_2(V_L)$ contains

$$\alpha(-1)^{1+k-\langle\alpha,\beta\rangle} \iota(c) \tag{12.3}$$

whenever $k \ge 0$ and $k \ge \langle\alpha, \beta\rangle - 1$, so that the exponent in (12.3) is non-negative. So if $\langle\alpha, \beta\rangle \ge 1$ we may choose $k$ appropriately to see that $C_2(V_L)$ contains $\iota(c)$. So we have shown that $C_2(V_L)$ contains $\iota(c)$, and hence $S(H \otimes t^{-1}) \otimes \iota(c)$ by Lemma 3.8, unless $\langle\alpha, \gamma\rangle \le \langle\alpha, \alpha\rangle$ for all $\alpha \in L$. Let $\Gamma$ denote the set of $\gamma \in L$ with this latter property. Now $\Gamma$ is a finite set. Fix a $\mathbb{Z}$-basis $B$ of $L$ and let $M = \max_{\gamma \in \Gamma, \beta \in B}(1 - \langle\beta, \gamma - \beta\rangle)$. From (12.3) we see that $\beta(-1)^M \otimes \iota(c) \in C_2(V_L)$ for all $\beta \in B$, $\gamma \in \Gamma$. Now from the above calculations, we see that $C_2(V_L)$ contains $S^r(H \otimes t^{-1}) \otimes \mathbb{C}\{L\}$ for all big enough integers $r$ and also $C_2(V_L)$ contains $S(H \otimes t^{-1}) \otimes \mathbb{C}\iota(c)$ for all $c \in \hat{L}$ such that $\bar{c} \in L \setminus \Gamma$. It follows from (12.2) that indeed $C_2(V_L)$ has finite codimension in $V_L$.

Proposition 12.6. Let $k$ be a positive integer and $\mathfrak{g}$ a complex simple Lie algebra. Then the vertex operator algebra $L(k, 0)$ associated to $\mathfrak{g}$ and $k$ satisfies Condition $C_2$.

Proof. See [FZ] for the vertex operator algebra structure of $L(k, 0)$ and also the corresponding Verma module $M(k, 0)$. By definition

$$M(k, 0) = U(\hat{\mathfrak{g}}) \otimes_{U(\bigoplus_{n=0}^{\infty} t^n \otimes \mathfrak{g} + \mathbb{C}c)} \mathbb{C} \cong U\Big(\bigoplus_{n=1}^{\infty} t^{-n} \otimes \mathfrak{g}\Big)$$

(linearly). Then $L = L(k, 0)$ is the quotient of $M(k, 0)$ by the maximal $\hat{\mathfrak{g}}$-submodule. For $a \in \mathfrak{g}$, $Y(a, z) = \sum_{n \in \mathbb{Z}} a(n) z^{-n-1}$, so $C_2(L)$ contains $a(-n)L$ for all $a \in \mathfrak{g}$ and all $n \ge 2$ by Lemma 12.1. Thus $L = C_2(L) + U(t^{-1} \otimes \mathfrak{g})\mathbf{1}$. It is enough to show that $C_2(L)$ contains $a_1(-1)^{m_1} \cdots a_d(-1)^{m_d}\mathbf{1}$ whenever $m_i \ge 0$ and $m_1 + \cdots + m_d$ is large enough; here $a_1, \ldots, a_d$ is a basis of $\mathfrak{g}$. By Lemma 3.6 of [DLM2] we may choose the $a_i$ so that $[Y(a_i, z_1), Y(a_i, z_2)] = 0$, $Y(a_i, z)^{3k+1} = 0$ for each $i$. Now the constant term in $Y(a_i, z)^{3k+1}\mathbf{1}$ is equal to $a_i(-1)^{3k+1}\mathbf{1} + r$ where $r$ is a sum of products of the form $a_i(n_1)^{e_1} \cdots a_i(n_{3k+1})^{e_{3k+1}}\mathbf{1}$ with some $n_j \le -2$. Since the operators $a_i(n_j)$ commute, $r \in C_2(L)$. Hence $a_i(-1)^{3k+1}\mathbf{1} \in C_2(L)$. We can now conclude that $C_2(L)$ contains $a_1(-1)^{m_1} \cdots a_d(-1)^{m_d}\mathbf{1}$ whenever $m_i \ge 3k + 1$ for some $i$. The proposition follows immediately.


13. Applications to the Moonshine Module

We now apply our results to the study of the conjectures of Conway–Norton–Queen as discussed in the Introduction. Recall that the moonshine module $V^{\natural}$ is a vertex operator algebra whose automorphism group is precisely the Monster $\mathbb{M}$. See [B1, FLM3, G] for details. The first author proved in [D2] that $V^{\natural}$ has a unique simple module, namely $V^{\natural}$ itself, and in [DLM3] we showed that in fact $V^{\natural}$ is rational, that is, every admissible $V^{\natural}$-module is completely reducible. Thus $V^{\natural}$ is holomorphic. It also satisfies Condition $C_2$ (Prop. 12.4). By Theorem 10.3 we conclude that there is a unique simple $g$-twisted $V^{\natural}$-module $V^{\natural}(g)$ for each $g \in \mathbb{M}$.

For each pair of commuting elements $(g, h)$ in $\mathbb{M}$, recall from Sect. 11 that $Z(g, h, \tau) = Z(g, h)$ is the corresponding partition function. The function $Z(1, h)$ is precisely the graded character of $h \in \mathbb{M}$ on $V^{\natural}$. By the results of Borcherds [B2], which confirm the original Conway–Norton conjecture [CN], each $Z(1, h)$ is a Hauptmodul, in fact the Hauptmodul conjectured in [CN]. We use this to prove the next result, which completes the proof of Theorem 1.5.

Theorem 13.1. The following hold:
(i) There is a scalar $\sigma = \sigma(g)$ such that the graded dimension $Z(g, 1, \tau)$ of $V^{\natural}(g)$ is equal to $\sigma Z(1, g, S\tau)$. In particular, $Z(g, 1, \tau)$ is a Hauptmodul.
(ii) More generally, if a commuting pair $g, h \in \mathbb{M}$ generates a cyclic group then $Z(g, h, \tau)$ is a Hauptmodul.

Proof. Suppose that $\langle g, h\rangle = \langle k\rangle$. Then there is $\gamma \in SL(2,\mathbb{Z})$ such that $(g, h)\gamma^{-1} = (1, k)$. By Theorem 11.4 we have $Z(g, h, \tau) = \sigma Z(1, k, \gamma\tau)$ for some scalar $\sigma$. Since $Z(1, k, \tau)$ is a Hauptmodul, so too is $Z(g, h, \tau)$. If $h = 1$ then we may take $k = g$ and $\gamma = S$. Both parts of the theorem now follow.

More is known in special cases. Huang has shown [H] that if $g$ is of type 2B (in ATLAS notation) then in fact the constant $\sigma(g)$ in part (i) is equal to 1. This also follows from our results and [FLM3]. Similarly, if $g$ is of type 2A it is shown in [DLM1], on the basis of Theorem 13.1, that again $\sigma(g) = 1$. In this case, the precise description of $Z(g, h, \tau)$ for $gh = hg$ and $h$ of odd order is given in [DLM1].

As discussed in [DM1] and [DM2], the uniqueness of $V^{\natural}(g)$ leads to a projective representation of the centralizer $C_{\mathbb{M}}(g)$ on $V^{\natural}(g)$. These are discussed in some detail, in the case that $g$ has order 2, in (loc. cit.). The conjectures of Conway–Norton–Queen state that there are (projective) representations of $C_{\mathbb{M}}(g)$ on suitably graded spaces such that all graded traces are either Hauptmoduln or zero. There is no doubt that the twisted modules $V^{\natural}(g)$ are the desired spaces. One would still like to show that $\sigma(g) = 1$ in part (i) of the theorem for all $g \in \mathbb{M}$, and to compute the McKay–Thompson series $Z(g, h, \tau)$ for $\langle g, h\rangle$ not cyclic.

Finally, we calculate some correlation functions. We fix a holomorphic vertex operator algebra $V$ satisfying Condition $C_2$. Recall that $T(v, (g, h), \tau)$ is the $(g, h)$ 1-point correlation function associated with $v$ and a pair of commuting elements $g, h \in \mathrm{Aut}(V)$ as defined in (8.4). If $\mathrm{wt}[v] = k$ then we have seen that $T(v, (1, 1), \tau)$ is a generalized modular form, holomorphic in the upper half-plane $\mathfrak{h}$. In fact, Eq. (11.5) tells us that $T(v, (1, 1), \tau)$ spans a 1-dimensional $SL(2,\mathbb{Z})$-module under the action (1.13), that is we have

$$T|\gamma\,(v, (1, 1), \tau) = \sigma(\gamma)\, T(v, (1, 1), \tau) \tag{13.1}$$

for some character $\sigma : SL(2,\mathbb{Z}) \to \mathbb{C}^*$. We use this to prove


Lemma 13.2. Let $V$ be a holomorphic vertex operator algebra which satisfies Condition $C_2$ and let $v \in V$ satisfy $\mathrm{wt}[v] = k$. If $k$ is odd, then the correlation function $T(v, (1, 1), \tau)$ is identically zero.

Proof. We observe that the $q$-expansion of $T(v, (1, 1), \tau)$ lies in $\mathbb{C}[[q^{1/3}, q^{-1/3}]]$. This is because $V$ is holomorphic and so $8 \mid c$ (cf. the remark following Theorem 1.4). It follows that $T$ acts on $T(v, (1, 1), \tau)$ as multiplication by a cube root of unity, and since $T$ covers the abelianization of $SL(2,\mathbb{Z})$, the kernel of the character $\sigma$ has index dividing 3. Since $S^4 = \mathrm{id}$, the order of $\sigma(S)$ divides both 4 and 3, so $S$, and in particular $S^2$, lies in the kernel of $\sigma$. Now setting $\gamma = S^2$ in (13.1) yields $T(v, (1, 1), \tau) = T|S^2\,(v, (1, 1), \tau) = (-1)^k\, T(v, (1, 1), \tau)$. The lemma follows.
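The matrix facts behind this proof are small enough to verify directly. The following sketch assumes the usual generators $S = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ and $T = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ of $SL(2,\mathbb{Z})$ (the section does not write them out) and uses hypothetical helper names. It checks that $S^2 = -I$, $S^4 = I$ and $(ST)^3 = S^2$, and that the automorphy factor $(c\tau + d)^{-k}$ of $S^2$ is $(-1)^k$, which is the sign appearing in the last step of the proof.

```python
def matmul(a, b):
    """Multiply two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


# Standard generators of SL(2, Z) (assumed conventions).
S = ((0, -1), (1, 0))
T = ((1, 1), (0, 1))
IDENTITY = ((1, 0), (0, 1))
MINUS_IDENTITY = ((-1, 0), (0, -1))

S2 = matmul(S, S)
S4 = matmul(S2, S2)
ST = matmul(S, T)
ST3 = matmul(ST, matmul(ST, ST))


def automorphy_factor(gamma, tau, k):
    """Return (c*tau + d)^(-k) for gamma = ((a, b), (c, d))."""
    (_, _), (c, d) = gamma
    return (c * tau + d) ** (-k)


tau = complex(0.3, 1.7)  # arbitrary sample point in the upper half-plane

assert S2 == MINUS_IDENTITY
assert S4 == IDENTITY
assert ST3 == S2
# S^2 fixes tau, so T|S^2 differs from T only by this factor:
for k in range(1, 7):
    assert abs(automorphy_factor(S2, tau, k) - (-1) ** k) < 1e-12
```

Because $S^2$ acts trivially on $\tau$, the identity $T = (-1)^k T$ forces the correlation function to vanish for odd $k$ once $\sigma(S^2) = 1$ is known.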

Now let $G$ be a finite group of automorphisms of $V$. Then $T(v, 1, g, \tau)$ is essentially the graded trace of $o(v)g$ on $V$ for $g \in G$. If we choose $v$ to be the conformal vector $\tilde{\omega}$ of $(V, Y[\,])$ then $\mathrm{wt}[\tilde{\omega}] = 2$, so $T(\tilde{\omega}, 1, g, \tau)$ is a form of weight 2. It is easy to describe:

Lemma 13.3. $T(\tilde{\omega}, 1, g, \tau) = \frac{1}{2\pi i} \frac{d}{d\tau} Z(1, g, \tau)$.

Proof. One could proceed by setting $\tilde{\omega} = \omega - c/24$ and using $Y(\omega, z) = \sum_n L(n) z^{-n-2}$, but it is simpler to use (5.8) and (5.9), as we may because $g\tilde{\omega} = \tilde{\omega}$. As $\tilde{\omega} = L[-2]\mathbf{1}$, the lemma follows.

If we write $V = \bigoplus_n V_n$, then of course we have

$$Z(1, g, \tau) = q^{-c/24} \sum_n (\mathrm{tr}|_{V_n}\, g)\, q^n,$$

$$T(\tilde{\omega}, 1, g, \tau) = q^{-c/24} \sum_n (n - c/24)(\mathrm{tr}|_{V_n}\, g)\, q^n,$$

and we may think of $T(\tilde{\omega}, 1, g, \tau)$ as arising from a sequence of virtual characters of $G$. That is, instead of "Moonshine of weight 0," one now has "Moonshine of weight 2." This is relevant because of the work of Devoto [De] in which such things are interpreted as being elements of degree 2 in the elliptic cohomology of $BG$.

Similarly suppose that $v \in V$ satisfies $\mathrm{wt}[v] = k$ with $gv = v$ for all $g \in G$. Then $G$ commutes with $o(v)$ in its action on each $V_n$, so each eigenspace of the semisimple part $o(v)_s$ of $o(v)$ on $V_n$ is a $G$-module and gives rise to a "generalized module" for $G$, i.e., of the form $\sum_i \lambda_{in} V_{ni}$ with $\lambda_{in} \in \mathbb{C}$ the distinct eigenvalues of $o(v)_s$ on $V_n$ and $V_{ni}$ the corresponding eigenspaces of $o(v)_s$. In this way, the pair $(V, v)$ gives rise to a sequence of generalized modules $\sum_{n,i} \lambda_{in} V_{ni}$ for $G$ such that the corresponding trace functions $T(v, 1, g, \tau)$ are modular forms of weight $k$. This is "Moonshine of weight $k$," and together with the analogues for the twisted sectors gives rise to elements of $\mathrm{Ell}^k\, BG$ as in [De]. Actually, this is not quite what we have proved, because in [De] there are additional arithmetic requirements. It seems likely that the appropriate conditions do hold, but that remains to be investigated. The 1-point correlation functions for the Moonshine Module are completely described in a forthcoming paper [DM3].
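The two $q$-expansions displayed after Lemma 13.3 differ only by the factor $(n - c/24)$ on each coefficient, which is what $\frac{1}{2\pi i}\frac{d}{d\tau}$ does to $q^{n - c/24}$. The sketch below illustrates this termwise rule; the function name is hypothetical, and the sample input (the first graded dimensions $1, 0, 196884, 21493760$ of the moonshine module with $c = 24$ and $g = 1$) is an illustrative assumption that is not quoted from this section.

```python
from fractions import Fraction


def weight_two_coefficients(traces, c):
    """Coefficients of T(omega~, 1, g, tau) from those of Z(1, g, tau).

    traces[n] is tr(g | V_n) for n = 0, 1, 2, ...; the result lists the
    coefficients (n - c/24) * tr(g | V_n) of q^(n - c/24).
    """
    shift = Fraction(c, 24)
    return [(n - shift) * a for n, a in enumerate(traces)]


# Illustrative input (assumption): graded dimensions of the moonshine
# module, so that Z(1, 1, tau) = q^-1 + 0 + 196884 q + 21493760 q^2 + ...
dims = [1, 0, 196884, 21493760]
print(weight_two_coefficients(dims, 24))
```

With these inputs the resulting coefficients are $-1, 0, 196884, 42987520$, that is, the coefficients of $q\,\frac{d}{dq}$ applied to $q^{-1} + 196884q + 21493760q^2$.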


References

[AM] Anderson, G. and Moore, G.: Rationality in conformal field theory. Commun. Math. Phys. 117, 441–450 (1988)
[B1] Borcherds, R.E.: Vertex algebras, Kac-Moody algebras, and the Monster. Proc. Natl. Acad. Sci. USA 83, 3068–3071 (1986)
[B2] Borcherds, R.E.: Monstrous moonshine and monstrous Lie superalgebras. Invent. Math. 109, 405–444 (1992)
[B3] Borcherds, R.E.: Modular Moonshine III. Preprint
[BR] Borcherds, R.E. and Ryba, A.: Modular Moonshine II. Duke Math. J. 83, 435–459 (1996)
[CN] Conway, J.H. and Norton, S.P.: Monstrous Moonshine. Bull. London Math. Soc. 12, 308–339 (1979)
[De] Devoto, J.: Equivariant cohomology and finite groups. Michigan Math. J. 43, 3–32 (1996)
[DVVV] Dijkgraaf, R., Vafa, C., Verlinde, E. and Verlinde, H.: The operator algebra of orbifold models. Commun. Math. Phys. 123, 485–526 (1989)
[DGH] Dixon, L., Ginsparg, P., Harvey, J.A.: Beauty and the beast: Super-conformal symmetry in a Monster module. Commun. Math. Phys. 119, 221–241 (1986)
[DHVW] Dixon, L., Harvey, J.A., Vafa, C. and Witten, E.: Strings on orbifolds. Nucl. Phys. B 261, 620–678 (1985); Strings on orbifolds II. Nucl. Phys. B 274, 285–314 (1986)
[DGM] Dolan, L., Goddard, P. and Montague, P.: Conformal field theory of twisted vertex operators. Nucl. Phys. B 338, 529 (1990)
[D1] Dong, C.: Vertex algebras associated with even lattices. J. Algebra 160, 245–265 (1993)
[D2] Dong, C.: Representations of the moonshine module vertex operator algebra. Contemporary Math. 175 (1994)
[D3] Dong, C.: Twisted modules for vertex algebras associated with even lattice. J. of Algebra 165, 91–112 (1994)
[DL] Dong, C. and Lepowsky, J.: Generalized Vertex Algebras and Relative Vertex Operators. Progress in Math. Vol. 112, Boston: Birkhäuser, 1993
[DLM1] Dong, C., Li, H. and Mason, G.: Some twisted modules for the moonshine vertex operator algebras. Contemp. Math. 193, 25–43 (1996)
[DLM2] Dong, C., Li, H. and Mason, G.: Regularity of rational vertex operator algebras. Adv. in Math. 132, 148–166 (1997)
[DLM3] Dong, C., Li, H. and Mason, G.: Twisted representations of vertex operator algebras. Math. Ann. 310, 571–600 (1998)
[DM1] Dong, C. and Mason, G.: Nonabelian orbifolds and the boson-fermion correspondence. Commun. Math. Phys. 163, 523–559 (1994)
[DM2] Dong, C. and Mason, G.: Vertex operator algebras and moonshine: A survey. Adv. Studies in Pure Math. 24, 101–136 (1996)
[DM3] Dong, C. and Mason, G.: Monstrous moonshine of higher weight. math.QA/9803116
[DMZ] Dong, C., Mason, G. and Zhu, Y.: Discrete series of the Virasoro algebra and the moonshine module. Proc. Symp. Pure. Math. American Math. Soc. 56 II, 295–316 (1994)
[EZ] Eichler, M. and Zagier, D.: On the Theory of Jacobi Forms I. Progress in Math. Vol. 55, Boston: Birkhäuser, 1985
[FF] Feigin, B.L. and Fuchs, D.B.: Verma modules over the Virasoro algebra. Lecture Notes in Math., Vol. 1060, Berlin–New York: Springer-Verlag, 1984
[FFR] Feingold, A.J., Frenkel, I.B. and Ries, J.F.X.: Spinor construction of vertex operator algebras, triality and $E_8^{(1)}$. Contemp. Math. 121 (1991)
[FHL] Frenkel, I.B., Huang, Y. and Lepowsky, J.: On axiomatic approaches to vertex operator algebras and modules. Memoirs Am. Math. Soc. 104 (1993)
[FLM1] Frenkel, I.B., Lepowsky, J. and Meurman, A.: A natural representation of the Fischer-Griess Monster with the modular function J as character. Proc. Natl. Acad. Sci. USA 81, 3256–3260 (1984)
[FLM2] Frenkel, I.B., Lepowsky, J. and Meurman, A.: Vertex operator calculus. In: Mathematical Aspects of String Theory, Proc. 1986 Conference, San Diego. ed. by S.-T. Yau, Singapore: World Scientific, 1987, pp. 150–188
[FLM3] Frenkel, I.B., Lepowsky, J. and Meurman, A.: Vertex Operator Algebras and the Monster. Pure and Applied Math. Vol. 134, London–New York: Academic Press, 1988
[FZ] Frenkel, I.B. and Zhu, Y.: Vertex operator algebras associated to representations of affine and Virasoro algebras. Duke Math. J. 66, 123–168 (1992)
[G] Griess, R.: The Friendly Giant. Invent. Math. 69, 1–102 (1982)
[HMV] Harvey, J., Moore, G. and Vafa, C.: Quasicrystalline compactification. Nucl. Phys. B 304, 269–290 (1988)
[H] Huang, Y.: A non-meromorphic extension of the moonshine module vertex operator algebra. Contemp. Math. 193, 123–148 (1996)
[I] Ince, E.: Ordinary Differential Equations. London: Dover Publications, Inc., 1956
[KP] Kac, V. and Peterson, D.: Infinite-dimensional Lie algebras, theta functions and modular forms. Advances in Math. 53, 125–264 (1984)
[KM] Knopp, M. and Mason, G.: Generalized modular forms. In preparation
[K] Koblitz, N.: Introduction to Elliptic Curves and Modular Forms. New York: Springer-Verlag, 1984
[La] Lang, S.: Introduction to Modular Forms. New York: Springer-Verlag, 1976
[Le] Lepowsky, J.: Calculus of twisted vertex operators. Proc. Natl. Acad. Sci. USA 82, 8295–8299 (1985)
[Li] Li, H.: Local systems of vertex operators, vertex superalgebras and modules. J. Pure Appl. Alg. 109, 143–195 (1996)
[M] Montague, P.: Orbifold constructions and the classification of self-dual c = 24 conformal field theory. Nucl. Phys. B 428, 233–258 (1994)
[MS] Moore, G. and Seiberg, N.: Classical and quantum conformal field theory. Commun. Math. Phys. 123, 177–254 (1989)
[N] Norton, S.: Generalized moonshine. Proc. Symp. Pure. Math., American Math. Soc. 47, 208–209 (1987)
[Q] Queen, L.: Some relations between finite groups, Lie groups and modular functions. Ph.D. Thesis, University of Cambridge, 1980
[Ra] Rademacher, H.: Topics in Analytic Number Theory. New York: Springer-Verlag, 1973
[Ry] Ryba, A.: Modular Moonshine? Contemp. Math. 193, 307–336 (1996)
[S] Schellekens, A.N.: Meromorphic c = 24 Conformal Field Theories. Commun. Math. Phys. 153, 159 (1993)
[T1] Tuite, M.: Monstrous moonshine from orbifolds. Commun. Math. Phys. 146, 277–309 (1992)
[T2] Tuite, M.: On the relationship between monstrous moonshine and the uniqueness of the moonshine module. Commun. Math. Phys. 166, 495–532 (1995)
[T3] Tuite, M.: Generalized moonshine and abelian orbifold constructions. Contemp. Math. 193, 353–368 (1996)
[V] Vafa, C.: Modular invariance and discrete torsion on orbifolds. Nucl. Phys. B 273, 592 (1986)
[Wa] Wang, W.: Rationality of Virasoro vertex operator algebras. Duke Math. J. IMRN, 71 1, 197–211 (1993)
[Wo] Wohlfahrt, K.: An extension of F. Klein's level concept. Illinois J. Math. 8, 529–535 (1964)
[Z] Zhu, Y.: Modular invariance of characters of vertex operator algebras. J. Am. Math. Soc. 9, 237–302 (1996)

Communicated by T. Miwa

Commun. Math. Phys. 214, 57 – 89 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Random Matrix Theory and ζ(1/2 + it)

J. P. Keating^{1,2}, N. C. Snaith^{1}

1 School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK
2 BRIMS, Hewlett-Packard Laboratories, Filton Road, Stoke Gifford, Bristol BS34 6QZ, UK

Received: 20 December 1999 / Accepted: 24 March 2000

Abstract: We study the characteristic polynomials $Z(U, \theta)$ of matrices $U$ in the Circular Unitary Ensemble (CUE) of Random Matrix Theory. Exact expressions for any matrix size $N$ are derived for the moments of $|Z|$ and $Z/Z^*$, and from these we obtain the asymptotics of the value distributions and cumulants of the real and imaginary parts of $\log Z$ as $N \to \infty$. In the limit, we show that these two distributions are independent and Gaussian. Costin and Lebowitz [15] previously found the Gaussian limit distribution for $\mathrm{Im} \log Z$ using a different approach, and our result for the cumulants proves a conjecture made by them in this case. We also calculate the leading order $N \to \infty$ asymptotics of the moments of $|Z|$ and $Z/Z^*$. These CUE results are then compared with what is known about the Riemann zeta function $\zeta(s)$ on its critical line $\mathrm{Re}\, s = 1/2$, assuming the Riemann hypothesis. Equating the mean density of the non-trivial zeros of the zeta function at a height $T$ up the critical line with the mean density of the matrix eigenvalues gives a connection between $N$ and $T$. Invoking this connection, our CUE results coincide with a theorem of Selberg for the value distribution of $\log \zeta(1/2 + iT)$ in the limit $T \to \infty$. They are also in close agreement with numerical data computed by Odlyzko [29] for large but finite $T$. This leads us to a conjecture for the moments of $|\zeta(1/2 + it)|$. Finally, we generalize our random matrix results to the Circular Orthogonal (COE) and Circular Symplectic (CSE) Ensembles.

1. Introduction

We investigate the distribution of values taken by the characteristic polynomials

$$Z(U, \theta) = \det(I - U e^{-i\theta}) \tag{1}$$

of $N \times N$ unitary matrices $U$ with respect to the circular unitary ensemble (CUE) of random matrix theory (RMT). Our motivation is that it has been conjectured that the limiting distribution of the non-trivial zeros of the Riemann zeta function (and other L-functions), on the scale of their mean spacing, is the same as that of the eigenphases $\theta_n$ of matrices in the CUE in the limit as $N \to \infty$ [28, 29, 31]. Hence the distribution of values taken by the zeta function might be expected to be related to those of $Z(U, \theta)$, averaged over the CUE.

The Riemann zeta function is defined by

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_p \Big(1 - \frac{1}{p^s}\Big)^{-1} \tag{2}$$

for $\mathrm{Re}\, s > 1$, and then by analytic continuation to the rest of the complex plane. It has infinitely many non-trivial zeros in the critical strip $0 < \mathrm{Re}\, s < 1$. The Riemann Hypothesis (RH) states that all of these non-trivial zeros lie on the critical line $\mathrm{Re}\, s = 1/2$; that is, $\zeta(1/2 + it) = 0$ has non-trivial solutions only when $t = t_n \in \mathbb{R}$.

Montgomery [28] has conjectured that the two-point correlations between the heights $t_n$ (assumed real), on the scale of the mean asymptotic spacing $2\pi/\log t_n$, in the limit $n \to \infty$, are the same as those which exist between the eigenvalues of random complex hermitian matrices in the limit as the matrix size tends to infinity. Such matrices form the Gaussian Unitary Ensemble (GUE) of RMT. The GUE correlations are in turn the same as those of the phases $\theta_n$ of the eigenvalues of $N \times N$ unitary matrices, on the scale of their mean separation $2\pi/N$, averaged over the CUE, in the limit $N \to \infty$. (For a review of the spectral statistics of random matrices, see [27].) This conjecture is supported by a theorem, also due to Montgomery [28], which implies that, in the appropriate limits, the Fourier transform of the two-point correlation function of the Riemann zeros coincides over a restricted range with the corresponding CUE result. It is also supported by extensive numerical computations [29].

Both the conjecture and Montgomery's theorem (again for restricted ranges) extend to all $n$-point correlations [30]. There is also strong numerical evidence in support of this generalization; for example, the distribution of spacings between adjacent zeros, measured in units of the mean spacing, appears to have the same limit as for the CUE [29]. Furthermore, heuristic calculations based on a Hardy-Littlewood conjecture for the pair correlation of the primes imply the validity of the generalized conjecture for all $n$, without restriction on the correlation range [24, 7, 9].

Thus all available evidence suggests that, in the limit as $N \to \infty$, local (i.e. short-range) statistics of the scaled (to have unit mean spacing) zeros $w_n = t_n \frac{1}{2\pi} \log \frac{t_n}{2\pi}$, defined by averaging over the zeros up to the $N$th, coincide with the corresponding statistics of the similarly scaled eigenphases $\phi_n = \theta_n \frac{N}{2\pi}$, defined by averaging over the CUE of $N \times N$ unitary matrices. This then implies that locally-determined statistical properties of $\zeta(s)$, high up the critical line, might be modelled by the corresponding properties of $Z(\theta)$, averaged over the CUE. One of our aims here is to explore this link by comparing certain RMT calculations with the following theorem and conjecture concerning the value distribution of $\zeta(1/2 + it)$.

First, according to a theorem of Selberg [33, 29], for any rectangle $E$ in $\mathbb{R}^2$,

$$\lim_{T \to \infty} \frac{1}{T} \left| \left\{ t : T \le t \le 2T,\ \frac{\log \zeta(1/2 + it)}{\sqrt{(1/2) \log\log T}} \in E \right\} \right| = \frac{1}{2\pi} \iint_E e^{-(x^2 + y^2)/2}\, dx\, dy; \tag{3}$$


that is, in the limit as $T$, the height up the critical line, tends to infinity, the value distributions of the real and imaginary parts of $\log \zeta(1/2 + iT)/\sqrt{(1/2) \log\log T}$ each tend independently to a Gaussian with unit variance and zero mean. Interestingly, Odlyzko's computations for these distributions when $T \approx t_{10^{20}}$ show systematic deviations from this limiting form [29]. For example, increasing moments of both the real and imaginary parts diverge from the Gaussian values. We review this data in more detail in Sect. 3.

Second, it is a long-standing conjecture that $f(\lambda)$, defined by

$$\lim_{T \to \infty} \frac{1}{(\log T)^{\lambda^2}} \frac{1}{T} \int_0^T |\zeta(1/2 + it)|^{2\lambda}\, dt = f(\lambda)\, a(\lambda), \tag{4}$$

where

$$a(\lambda) = \prod_p \left\{ (1 - 1/p)^{\lambda^2} \left( \sum_{m=0}^{\infty} \left( \frac{\Gamma(\lambda + m)}{m!\, \Gamma(\lambda)} \right)^2 p^{-m} \right) \right\}, \tag{5}$$

exists, and a much-studied problem is then to determine the values it takes, in particular for integer $\lambda$ (see, for example, [33, 21]). Obviously $f(0) = 1$. It is also known that $f(1) = 1$ [17] and $f(2) = 1/12$ [20]. Based on number-theoretical arguments, Conrey and Ghosh have conjectured that $f(3) = 42/9!$ [13], and Conrey and Gonek that $f(4) = 24024/16!$ [14]. Conrey and Ghosh have obtained a lower bound for $f$ when $\lambda \ge 0$ [12], and Heath-Brown [18] has obtained an upper bound for $0 < \lambda < 2$.

We now state our main results, all of which hold for $\theta \in \mathbb{R}$.

(i) For $\mathrm{Re}\, s > -1$,

$$M_N(s) = \left\langle |Z(U, \theta)|^s \right\rangle_{U(N)} = \prod_{j=1}^{N} \frac{\Gamma(j)\, \Gamma(j + s)}{(\Gamma(j + s/2))^2}, \tag{6}$$

where the average is over the CUE of $N \times N$ unitary matrices, that is over the group $U(N)$ with respect to the normalized translation-invariant (Haar) measure [34, 27]. Clearly the result extends by analytic continuation to the rest of the complex $s$-plane. It follows from (6) that, for integers $k \ge 0$, $M_N(2k)$ is a polynomial in $N$ of degree $k^2$.

(ii) For $s \in \mathbb{C}$,

$$L_N(s) = \left\langle \left( \frac{Z(U, \theta)}{Z^*(U, \theta)} \right)^{s/2} \right\rangle_{U(N)} = \prod_{j=1}^{N} \frac{(\Gamma(j))^2}{\Gamma(j + s/2)\, \Gamma(j - s/2)}, \tag{7}$$

where $\arg Z(U, \theta)$ is defined by continuous variation along $\theta - i\epsilon$, starting at $-i\epsilon$, in the limit $\epsilon \to 0$, assuming $\theta$ is not equal to any of the eigenphases $\theta_n$, with $\log Z(U, \theta - i\epsilon) \to 0$ as $\epsilon \to \infty$. Thus $\mathrm{Im} \log Z(U, \theta)$ has a jump discontinuity of size $\pi$ when $\theta = \theta_n$.

(iii) The value distributions of the real and imaginary parts of $\log Z(U, \theta)/\sqrt{(1/2) \log N}$ each tend independently to a Gaussian with zero mean and unit variance in the limit as $N \to \infty$. This corresponds directly to Selberg's theorem (3) for $\log \zeta(1/2 + it)$ if we identify the mean density of the eigenangles $\theta_n$, $N/2\pi$, with the mean density of the Riemann zeros at a height $T$ up the critical line, $\frac{1}{2\pi} \log \frac{T}{2\pi}$; that is if

$$N = \log \frac{T}{2\pi}. \tag{8}$$


This is a natural connection to make between matrix size and position on the critical line, because the mean eigenvalue density is the only parameter in the theory of spectral statistics for the circular and Gaussian ensembles of RMT. The central limit theorem for $\mathrm{Im}\log Z$ was first proved by Costin and Lebowitz [15] for the characteristic polynomials of matrices in the GUE (see also [32] for a review of related results). Our proof is new, and goes further in that it allows us to compute the cumulants.

(iv) Let $Q_n(N)$ be the $n$th cumulant of the distribution of values of $\mathrm{Re}\log Z$, defined with respect to the CUE, and let $R_n(N)$ be the corresponding cumulant for $\mathrm{Im}\log Z$. Then

$$ Q_n(N) = \frac{2^{n-1}-1}{2^{n-1}} \sum_{j=1}^{N} \psi^{(n-1)}(j), \tag{9} $$

and

$$ R_n(N) = \begin{cases} \dfrac{(-1)^{1+n/2}}{2^{n-1}} \displaystyle\sum_{j=1}^{N} \psi^{(n-1)}(j), & n \text{ even}, \\[2ex] 0, & n \text{ odd}, \end{cases} \tag{10} $$

where $\psi$ is a polygamma function. Thus $Q_1(N) = R_1(N) = 0$. It is straightforward to obtain a complete (large $N$) asymptotic expansion for these cumulants. For example,

$$ Q_2(N) = \left\langle (\mathrm{Re}\log Z)^2 \right\rangle_{U(N)} = \frac{1}{2}\log N + \frac{1}{2}(\gamma+1) + \frac{1}{24N^2} + O(N^{-4}), \tag{11} $$

$$ Q_n(N) = (-1)^n\,\frac{2^{n-1}-1}{2^{n-1}}\,\zeta(n-1)\,\Gamma(n) + O(N^{2-n}), \qquad n \ge 3, \tag{12} $$

and

$$ R_2(N) = \left\langle (\mathrm{Im}\log Z)^2 \right\rangle_{U(N)} = Q_2(N) = \frac{1}{2}\log N + \frac{1}{2}(\gamma+1) + \frac{1}{24N^2} + O(N^{-4}), \tag{13} $$

$$ R_{2k}(N) = \frac{(-1)^{k+1}}{2^{2k-1}}\,\zeta(2k-1)\,\Gamma(2k) + O(N^{2-2k}), \qquad k > 1. \tag{14} $$

The fact that $R_{2k}(N)$ tends to a constant as $N \to \infty$ when $k > 1$ proves a conjecture made by Costin and Lebowitz [15].

(v) It follows from (6) that

$$ f_{\rm CUE}(\lambda) = \lim_{N\to\infty} \frac{1}{N^{\lambda^2}} \left\langle |Z(U,\theta)|^{2\lambda} \right\rangle_{U(N)} = \frac{G^2(1+\lambda)}{G(1+2\lambda)}, \tag{15} $$

where $G$ denotes the Barnes $G$-function [3], and hence that $f_{\rm CUE}(0) = 1$ (trivial) and

$$ f_{\rm CUE}(k) = \prod_{j=0}^{k-1} \frac{j!}{(j+k)!} \tag{16} $$

for integers $k \ge 1$. Thus, for example, $f_{\rm CUE}(1) = 1$, $f_{\rm CUE}(2) = 1/12$, $f_{\rm CUE}(3) = 42/9!$ and $f_{\rm CUE}(4) = 24024/16!$. $f_{\rm CUE}(k)$ is the coefficient of $N^{k^2}$ in $M_N(2k)$, which, as noted above, is a polynomial in $N$ of degree $k^2$. The coefficients of the lower-order terms can also be calculated explicitly. Similarly,

$$ \lim_{N\to\infty} N^{\lambda^2} L_N(2\lambda) = G(1-\lambda)\,G(1+\lambda). \tag{17} $$
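The integer cases of (6) and (16) can be checked exactly, because for $s = 2k$ every Gamma function in the product has an integer argument. The following is a minimal sketch in Python, not part of the original derivation; the function names `moment_exact` and `f_cue` are illustrative choices.

```python
from fractions import Fraction
from math import factorial


def moment_exact(N, k):
    """M_N(2k) from the product (6), exact because Gamma(n) = (n - 1)! for integer n."""
    result = Fraction(1)
    for j in range(1, N + 1):
        # Gamma(j) * Gamma(j + 2k) / Gamma(j + k)^2
        result *= Fraction(factorial(j - 1) * factorial(j + 2 * k - 1),
                           factorial(j + k - 1) ** 2)
    return result


def f_cue(k):
    """Leading coefficient (16): prod_{j=0}^{k-1} j! / (j + k)!."""
    result = Fraction(1)
    for j in range(k):
        result *= Fraction(factorial(j), factorial(j + k))
    return result


if __name__ == "__main__":
    # M_N(2) telescopes to N + 1, a polynomial of degree k^2 = 1.
    print([moment_exact(N, 1) for N in range(1, 6)])
    # Values quoted in the text for k = 2, 3, 4.
    print(f_cue(2), f_cue(3) == Fraction(42, factorial(9)),
          f_cue(4) == Fraction(24024, factorial(16)))
    # M_N(2k) / N^(k^2) approaches f_CUE(k) as N grows.
    for N in (10, 100, 1000):
        print(N, float(moment_exact(N, 2) / N ** 4), float(f_cue(2)))
```

Exact rational arithmetic is used here so that the comparison with $42/9!$ and $24024/16!$ involves no rounding.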

The results listed above allow us to compute the value distributions of $\mathrm{Re}\log Z$, $\mathrm{Im}\log Z$, and $|Z|$, for any $N$, and to derive explicit asymptotics for these distributions when $N \to \infty$.

In comparing our random-matrix results with what is known about the zeta function, we find the following. First, the value distributions of $\mathrm{Re}\log Z$ and $\mathrm{Im}\log Z$ coincide with Odlyzko's numerical data for the corresponding distributions of the values of the zeta function at a height $T$ up the critical line if we make the identification (8). This implies that, with respect to its local statistics, the zeta function behaves like a finite polynomial of degree $N$ given by (8). The value distribution of $|Z|$ is similarly in agreement with our numerical data for that of $|\zeta(1/2+it)|$.

It is important at this stage to remark that Montgomery's conjecture (and its generalization) refers to the short-range correlations (i.e. correlations on the scale of mean separation) between the Riemann zeros at a height $T$ up the critical line, in the limit as $T \to \infty$. The finite-$T$ correlations take the form of a sum of two contributions, one being the random-matrix limit and the other representing long-range deviations which may be expressed as a sum over the primes [4, 25, 5]. This is also known to be the case for the second moment of $\mathrm{Im}\log\zeta(1/2+it)$. Specifically, Goldston [16] has proved, under the assumption of RH and Montgomery's conjecture, that as $T \to \infty$,

$$ \frac{1}{T}\int_0^T (\mathrm{Im}\log\zeta(1/2+it))^2\,dt = \frac{1}{2}\log\log\frac{T}{2\pi} + \frac{1}{2}(\gamma+1) + \sum_{m=2}^{\infty}\sum_{p} \frac{(1-m)}{m^2}\,\frac{1}{p^m} + o(1). \tag{18} $$

Here the first two terms on the right-hand side agree with those in (13) if we again make the identification (8). The same general behaviour also holds for the higher moments of $\log\zeta$. It is plausible then that the moments of $|\zeta(1/2+it)|$ (which are determined by long-range correlations between the zeros) asymptotically split into a product of two terms, one coming from random matrix theory and the other from the primes. Taken together with the fact that $f_{\rm CUE}(k) = f(k)$ for $k = 1, 2$, and, conjecturally, for $k = 3, 4$, this leads us to conjecture that

$$ f(\lambda) = f_{\rm CUE}(\lambda) \tag{19} $$

for all $\lambda$ where the moments are defined. This is further supported by other heuristic arguments, and by the fact that the product of $a(\lambda)$ and our formula (6) for the moments of $|Z(U,\theta)|$ matches Odlyzko's numerical data for the moments of $|\zeta(1/2+it)|$ over the range $0 < \lambda \le 2$, where we can compare them, again making the identification (8).

These results were first announced in lectures at the Erwin Schrödinger Institute in Vienna in September 1998, and at the Mathematical Sciences Research Institute in Berkeley in June 1999.

The structure of this paper is as follows. We derive the CUE results listed above in Sect. 2, and then compare them with numerical data (almost all taken from [29]) for the Riemann zeta-function in Sect. 3. Our conjecture (19) is also discussed in more detail in this section. In Sect. 4 we state the analogues of the CUE results for the other circular ensembles of RMT, namely the Circular Orthogonal (COE) and Circular Symplectic (CSE) Ensembles. Numerical evidence suggests that the eigenvalues of the Laplacian on certain compact (non-arithmetic) surfaces of constant negative curvature are asymptotically the same as those of matrices in the COE, and so our results might be expected to describe the associated Selberg zeta functions. More generally, it has been suggested that in the semiclassical ($\hbar \to 0$) limit the quantum eigenvalue statistics of all generic, classically chaotic systems are related to those of the RMT ensembles (COE for time-reversal symmetric integer-spin systems, CUE for non-time-reversal integer-spin systems, and CSE for half-integer-spin systems) [10], and our results might then be expected to apply to the corresponding quantum spectral determinants. It is worth noting in this respect that extensive numerical evidence supports the conclusion that for classically chaotic systems the value distribution of the fluctuating part of the spectral counting function (which is proportional to the imaginary part of the logarithm of the spectral determinant) tends to a Gaussian in the semiclassical limit [6, 2].

Finally, it is worth remarking that Montgomery's conjecture extends to many other classes of L-functions, and hence our results are expected to apply to them too, in the same way. However, Katz and Sarnak [22, 23] have conjectured that correlations between the zeros low down on the critical line, defined by averaging over L-functions within certain particular families, are described not by averages over the CUE, that is, over the unitary group $U(N)$, but by averages over other classical compact groups, for example the orthogonal group $O(N)$ or the unitary symplectic group $USp(2N)$. Thus the value distributions within these families close to the symmetry point $t = 0$ on the critical line will also be described by averages over the corresponding groups. We shall present our results in this case in a second paper [26].

2. CUE Random Matrix Polynomials

2.1. Generating functions. All of our CUE random-matrix results follow from the formulae (6) and (7) for the generating functions $M_N(s)$ and $L_N(s)$, and our goal in this section is to derive these expressions.

Consider first $M_N(s)$. We start with the representation of $Z(U,\theta)$ in terms of the eigenvalues $e^{i\theta_n}$ of $U$:

$$ Z(U,\theta) = \prod_{n=1}^{N} \left( 1 - e^{i(\theta_n-\theta)} \right). \tag{20} $$

The CUE average can then be performed using the joint probability density for the eigenphases $\theta_n$, $((2\pi)^N N!)^{-1} \prod_{j<m} \left| e^{i\theta_j} - e^{i\theta_m} \right|^2$ [34, 27]. Thus

$$ \left\langle |Z|^s \right\rangle_{U(N)} = \frac{1}{(2\pi)^N N!} \int_0^{2\pi}\!\!\cdots\!\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<m\le N} \left| e^{i\theta_j} - e^{i\theta_m} \right|^2 \times \left| \prod_{n=1}^{N} \left( 1 - e^{i(\theta_n-\theta)} \right) \right|^{s}. \tag{21} $$


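This integral can be evaluated exactly using Selberg's formula (see, for example, Chapter 17 of [27]):

$$ \begin{aligned} J(a,b,\alpha,\beta,\gamma,N) &= \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} \prod_{1\le j<\ell\le N} |x_j - x_\ell|^{2\gamma} \prod_{j=1}^{N} (a+ix_j)^{-\alpha}(b-ix_j)^{-\beta}\,dx_j \\ &= \frac{(2\pi)^N}{(a+b)^{(\alpha+\beta)N-\gamma N(N-1)-N}} \prod_{j=0}^{N-1} \frac{\Gamma(1+\gamma+j\gamma)\,\Gamma(\alpha+\beta-(N+j-1)\gamma-1)}{\Gamma(1+\gamma)\,\Gamma(\alpha-j\gamma)\,\Gamma(\beta-j\gamma)}, \end{aligned} \tag{22} $$

where $a$, $b$, $\alpha$, $\beta$ and $\gamma$ are complex numbers, $\mathrm{Re}\,a$, $\mathrm{Re}\,b$, $\mathrm{Re}\,\alpha$ and $\mathrm{Re}\,\beta$ are all greater than zero, $\mathrm{Re}(\alpha+\beta) > 1$ and

$$ -\frac{1}{N} < \mathrm{Re}\,\gamma < \min\left( \frac{\mathrm{Re}\,\alpha}{N-1},\ \frac{\mathrm{Re}\,\beta}{N-1},\ \frac{\mathrm{Re}(\alpha+\beta-1)}{2(N-1)} \right). \tag{23} $$

To see this, note that (21) can be written in the form

$$ \left\langle |Z|^s \right\rangle_{U(N)} = \frac{2^{N(N-1)}\,2^{sN}}{N!\,(2\pi)^N} \int_0^{2\pi}\!\!\cdots\!\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<m\le N} \left| \sin(\theta_j/2 - \theta_m/2) \right|^2 \times \prod_{n=1}^{N} \left| \sin(\theta_n/2 - \theta/2) \right|^{s}. \tag{24} $$

Clearly this integral is independent of $\theta$ (as it must be, since we are averaging over all unitary matrices) and so we set $\theta = 0$. Using $\sin(\theta_j - \theta_m) = \sin\theta_j\cos\theta_m - \cos\theta_j\sin\theta_m$, we then have

$$ \left\langle |Z|^s \right\rangle_{U(N)} = \frac{2^{N^2+sN}}{N!\,(2\pi)^N} \int_0^{\pi}\!\!\cdots\!\int_0^{\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<m\le N} \left| \cot\theta_m - \cot\theta_j \right|^2 \times \left( \prod_{n=1}^{N} \sin^2\theta_n \right)^{N-1} \prod_{k=1}^{N} |\sin\theta_k|^{s}. \tag{25} $$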

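Finally, the change of variables $x_n = \cot\theta_n$ gives

$$ \begin{aligned} \left\langle |Z|^s \right\rangle_{U(N)} &= \frac{2^{N^2+sN}}{N!\,(2\pi)^N} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} dx_1\cdots dx_N \prod_{1\le j<m\le N} |x_m - x_j|^2 \times \prod_{n=1}^{N} \left( (1+ix_n)(1-ix_n) \right)^{-N-s/2} \\ &= \frac{2^{N^2+sN}}{N!\,(2\pi)^N}\, J(1,1,N+s/2,N+s/2,1,N) \\ &= \prod_{j=1}^{N} \frac{\Gamma(j)\,\Gamma(s+j)}{(\Gamma(j+s/2))^2}, \end{aligned} \tag{26} $$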


provided $\mathrm{Re}\,s > -1$, which is just the result (6). Clearly the product (26) has an analytic continuation to the rest of the complex plane.

Consider next $L_N(s)$. Note first that, according to the definition given in the Introduction,

$$ \left( \frac{Z}{Z^{*}} \right)^{1/2} = \exp\left( i\,\mathrm{Im}\log Z(U,\theta) \right) = \exp\left( -i \sum_{n=1}^{N}\sum_{m=1}^{\infty} \frac{\sin[(\theta_n-\theta)m]}{m} \right), \tag{27} $$

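where for each value of $n$, the sum of sines lies in $(-\pi, \pi]$. Hence, again using the joint probability density of the eigenphases $\theta_n$,

$$ \left\langle \left( \frac{Z}{Z^{*}} \right)^{s/2} \right\rangle_{U(N)} = \frac{1}{N!\,(2\pi)^N} \int_0^{2\pi}\!\!\cdots\!\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<m\le N} \left| e^{i\theta_j} - e^{i\theta_m} \right|^2 \times \exp\left( -is \sum_{n=1}^{N}\sum_{k=1}^{\infty} \frac{\sin[(\theta_n-\theta)k]}{k} \right). \tag{28} $$

As before, this integral is independent of $\theta$, and so we set $\theta = 0$. The sum in (28) can be evaluated using

$$ \sum_{k=1}^{\infty} \frac{\sin kx}{k} = \frac{\pi - x}{2}, \qquad \text{for } 0 < x < 2\pi. \tag{29} $$

Note that this relation keeps the sine sum within the range $(-\pi, \pi]$ prescribed by the definition of the logarithm. Substituting (29) into (28) then yields

$$ \left\langle \left( \frac{Z}{Z^{*}} \right)^{s/2} \right\rangle_{U(N)} = \frac{2^{N(N-1)}}{N!\,(2\pi)^N} \int_0^{2\pi}\!\!\cdots\!\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<m\le N} \left| \sin(\theta_j/2 - \theta_m/2) \right|^2 \times \prod_{n=1}^{N} \exp\left( -\frac{is}{2}(\pi - \theta_n) \right). \tag{30} $$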

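Making the transformation $\phi_j = \theta_j/2 - \pi/2$ and using the identity $\sin(\phi_j - \phi_m) = (\tan\phi_j - \tan\phi_m)\cos\phi_j\cos\phi_m$ gives

$$ \left\langle \left( \frac{Z}{Z^{*}} \right)^{s/2} \right\rangle_{U(N)} = \frac{2^{N^2}}{N!\,(2\pi)^N} \int_{-\pi/2}^{\pi/2}\!\!\cdots\!\int_{-\pi/2}^{\pi/2} d\phi_1\cdots d\phi_N \prod_{1\le j<m\le N} \left| \tan\phi_j - \tan\phi_m \right|^2 \prod_{n=1}^{N} (\cos^2\phi_n)^{N-1} \times \prod_{k=1}^{N} (\cos\phi_k + i\sin\phi_k)^{s}. \tag{31} $$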


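Finally, changing variables to $x_j = \tan\phi_j$, we have that

$$ \begin{aligned} \left\langle \left( \frac{Z}{Z^{*}} \right)^{s/2} \right\rangle_{U(N)} &= \frac{2^{N^2}}{N!\,(2\pi)^N} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} dx_1\cdots dx_N \prod_{1\le j<m\le N} |x_j - x_m|^2 \prod_{n=1}^{N} \left( \frac{1}{1+x_n^2} \right)^{N} \times \prod_{k=1}^{N} \left( \frac{1}{\sqrt{1+x_k^2}} + i\,\frac{x_k}{\sqrt{1+x_k^2}} \right)^{s} \\ &= \frac{2^{N^2}}{N!\,(2\pi)^N} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} dx_1\cdots dx_N \prod_{1\le j<m\le N} |x_j - x_m|^2 \prod_{n=1}^{N} \left( \frac{1}{1+x_n^2} \right)^{N} \times \prod_{k=1}^{N} \left( \frac{\sqrt{1+ix_k}}{\sqrt{1-ix_k}} \right)^{s}. \end{aligned} \tag{32} $$

This is in the form of Selberg's integral (22) with $a = b = 1$, $\alpha = N - s/2$, $\beta = N + s/2$ and $\gamma = 1$ (the condition (23) is satisfied when $|s| < 2$) and so we have

$$ \left\langle \left( \frac{Z}{Z^{*}} \right)^{s/2} \right\rangle_{U(N)} = \prod_{j=1}^{N} \frac{(\Gamma(j))^2}{\Gamma(j+s/2)\,\Gamma(j-s/2)}, \tag{33} $$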

as required.

2.2. Value distribution of Re log Z. All information about the value distribution of $\mathrm{Re}\log Z$ can be obtained from the generating function $M_N(s)$: the moments may be obtained from the coefficients in the Taylor expansion of $M_N$ about $s = 0$,

$$ M_N(s) = \sum_{j=0}^{\infty} \frac{\left\langle (\log|Z|)^j \right\rangle_{U(N)}}{j!}\, s^j; \tag{34} $$

the corresponding cumulants $Q_j(N)$ are related to the Taylor coefficients of $\log M_N$,

$$ \log M_N(s) = \sum_{j=1}^{\infty} \frac{Q_j(N)}{j!}\, s^j; \tag{35} $$

and the probability density for the values taken by $\mathrm{Re}\log Z$,

$$ \rho_N(x) = \left\langle \delta(\log|Z| - x) \right\rangle_{U(N)}, \tag{36} $$

is given by

$$ \rho_N(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iyx} M_N(iy)\,dy. \tag{37} $$

We now analyse these general formulae using the explicit expression (6) for $M_N(s)$. Differentiating $\log M_N(s)$, we have that

$$ Q_n(N) = \frac{2^{n-1}-1}{2^{n-1}} \sum_{j=1}^{N} \psi^{(n-1)}(j), \tag{38} $$


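where

$$ \psi^{(n)}(z) = \frac{d^{n+1}\log\Gamma(z)}{dz^{n+1}} \tag{39} $$

are the polygamma functions. Thus it follows immediately that

$$ Q_1(N) = \left\langle \log|Z| \right\rangle_{U(N)} = 0. \tag{40} $$

Furthermore, substituting the well-known integral representation for the polygamma functions [1], when $n \ge 2$,

$$ \begin{aligned} Q_n(N) &= \frac{2^{n-1}-1}{2^{n-1}}\,(-1)^n \sum_{j=1}^{N} \int_0^{\infty} \frac{t^{n-1}e^{-jt}}{1-e^{-t}}\,dt \\ &= \frac{2^{n-1}-1}{2^{n-1}}\,(-1)^n \int_0^{\infty} \frac{t^{n-1}e^{-t}}{1-e^{-t}}\,\frac{1-e^{-Nt}}{1-e^{-t}}\,dt \\ &= \frac{2^{n-1}-1}{2^{n-1}}\,(-1)^n \int_0^{\infty} \frac{e^{-t}}{1-e^{-t}} \left( (n-1)t^{n-2} - (n-1)t^{n-2}e^{-Nt} + N t^{n-1}e^{-Nt} \right) dt, \end{aligned} \tag{41} $$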

where the last equality follows from an integration by parts.

Consider first the second cumulant $Q_2(N)$. Rearranging the integrand in the final equality of (41),

$$ Q_2(N) = \frac{1}{2} \int_0^{\infty} \left( \frac{1-e^{-Nt}}{1-e^{-t}}\,e^{-t} + Nt\,\frac{e^{-(N+1)t}}{1-e^{-t}} \right) dt, \tag{42} $$

and so, re-expanding the terms written as fractions to give geometric series and integrating these term-by-term, we have that

$$ Q_2(N) = \left\langle (\log|Z|)^2 \right\rangle_{U(N)} = \frac{1}{2}\sum_{n=1}^{N} \frac{1}{n} + \frac{N}{2} \sum_{n=N+1}^{\infty} \frac{1}{n^2}. \tag{43} $$

The large-$N$ asymptotics can then be obtained by substituting

$$ \sum_{k=1}^{n} \frac{1}{k} = \gamma + \log n + \frac{1}{2n} - \sum_{k=2}^{\infty} \frac{A_k}{n(n+1)\cdots(n+k-1)}, \tag{44} $$

where $A_k = \frac{1}{k}\int_0^1 x(1-x)(2-x)(3-x)\cdots(k-1-x)\,dx$, into the first sum and applying the Euler–Maclaurin formula to the second. Any number of terms in the expansion in inverse powers of $N$ can be calculated in this way; for example

$$ Q_2(N) = \left\langle (\log|Z|)^2 \right\rangle_{U(N)} = \frac{1}{2}\log N + \frac{1}{2}(\gamma+1) + \frac{1}{24N^2} - \frac{1}{80N^4} + O\left( \frac{1}{N^6} \right). \tag{45} $$


Consider next the cumulants $Q_n(N)$ when $n \ge 3$. We now write

$$ Q_n(N) = \frac{2^{n-1}-1}{2^{n-1}}\,(-1)^n \int_0^{\infty} \left( (n-1)\frac{t^{n-2}}{e^{t}-1} + \left( N t^{n-1} - (n-1)t^{n-2} \right) \frac{e^{-Nt}}{e^{t}-1} \right) dt. \tag{46} $$

The first term, which is independent of $N$, can be integrated explicitly using a well-known representation of the zeta-function [33]. Changing variables in the second to $y = tN$ then gives

$$ Q_n(N) = (-1)^n\,\frac{2^{n-1}-1}{2^{n-1}} \left( \Gamma(n)\zeta(n-1) + \frac{1}{N^{n-1}} \int_0^{\infty} \left( y^{n-1} - (n-1)y^{n-2} \right) e^{-y}\,\frac{1}{e^{y/N}-1}\,dy \right). \tag{47} $$

The $N$-dependent term in this equation clearly vanishes in the limit as $N \to \infty$. Its large-$N$ asymptotics can be obtained by expanding $(e^{y/N}-1)^{-1}$ in powers of $y/N$ and then integrating term-by-term; for example

$$ Q_n(N) = \frac{2^{n-1}-1}{2^{n-1}}\,(-1)^n \left( \Gamma(n)\zeta(n-1) - \frac{(n-3)!}{N^{n-2}} \right) + O(N^{1-n}). \tag{48} $$

It follows immediately from the fact that $Q_n(N)/(Q_2(N))^{n/2} \to 0$ as $N \to \infty$ for all $n > 2$ that the value distribution of $\mathrm{Re}\log Z/\sqrt{Q_2(N)}$ tends to a Gaussian in this limit. Specifically, we have from (37) and the definition of the cumulants that if

$$ \tilde\rho_N(x) = \sqrt{Q_2(N)}\;\rho_N\!\left( \sqrt{Q_2(N)}\,x \right) \tag{49} $$

then

$$ \tilde\rho_N(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left( -iyx - \frac{y^2}{2} - \frac{iQ_3 y^3}{3!\,Q_2^{3/2}} + \frac{Q_4 y^4}{4!\,Q_2^2} + \cdots \right) dy. \tag{50} $$

Hence all terms in the exponential that involve higher powers of $y$ than $y^2$ vanish in the limit as $N \to \infty$. Evaluating the resulting Gaussian integral then gives

$$ \lim_{N\to\infty} \tilde\rho_N(x) = \frac{1}{\sqrt{2\pi}} \exp\left( \frac{-x^2}{2} \right). \tag{51} $$

The large-$N$ asymptotics describing the approach to this limit can be obtained by retaining more terms in (50). There are several ways to do this. One is to expand the exponential of all terms that involve higher powers of $y$ than $y^2$ as a series in increasing powers of $y$, so that

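$$ \begin{aligned} \tilde\rho_N(x) &= \frac{1}{\sqrt{2\pi}} \exp\left( \frac{-x^2}{2} \right) + \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iyx} e^{-y^2/2} \Bigg[ \left( \frac{Q_3 (iy)^3}{3!\,Q_2^{3/2}} + \frac{Q_4 (iy)^4}{4!\,Q_2^2} + \cdots \right) \\ &\qquad + \frac{1}{2!} \left( \frac{Q_3 (iy)^3}{3!\,Q_2^{3/2}} + \frac{Q_4 (iy)^4}{4!\,Q_2^2} + \cdots \right)^{2} + \frac{1}{3!} \left( \frac{Q_3 (iy)^3}{3!\,Q_2^{3/2}} + \frac{Q_4 (iy)^4}{4!\,Q_2^2} + \cdots \right)^{3} + \cdots \Bigg]\,dy \\ &= \frac{1}{\sqrt{2\pi}} \exp\left( \frac{-x^2}{2} \right) + \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iyx} e^{-y^2/2} \left( \frac{A_3 (iy)^3}{Q_2^{3/2}} + \frac{A_4 (iy)^4}{Q_2^{2}} + \frac{A_5 (iy)^5}{Q_2^{5/2}} + \cdots \right) dy, \end{aligned} \tag{52} $$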

where the coefficients $A_n(N)$ are defined in terms of combinations of the cumulants $Q_n(N)$ with $n \ge 3$ (for example, $A_3 = Q_3/3!$). Integrating term-by-term then gives

$$ \tilde\rho_N(x) = \frac{1}{\sqrt{2\pi}} \exp\left( \frac{-x^2}{2} \right) + \frac{1}{\sqrt{2\pi}} \sum_{m=3}^{\infty} \frac{A_m}{Q_2^{m/2}}\, e^{-x^2/2} \times \sum_{p=0}^{m} \binom{m}{p} x^{p} \begin{cases} i^{m-p}(m-p-1)!!, & m-p \text{ even} \\ 0, & m-p \text{ odd} \end{cases} \tag{53} $$

from which it follows that the deviation from the Gaussian limit is of the order of $(\log N)^{-3/2}$ (because $A_n(N) \to$ constant as $N \to \infty$).

It may be seen from (53) that it is only in the limit as $N \to \infty$ that $\tilde\rho_N(x)$ becomes even in $x$: when $N$ is finite it is asymmetric about $x = 0$. This can be traced back to the fact that the series in the exponential in (50) involves both even and odd powers of $y$. Indeed, the dominant $N \to \infty$ asymptotics can also be computed by retaining only the $y^3$ term in the exponential (and not expanding the exponential as a series itself). Thus

$$ \tilde\rho_N(x) \sim \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left( -ixy - \frac{y^2}{2} - \frac{iQ_3 y^3}{3!\,Q_2^{3/2}} \right) dy, \tag{54} $$

and this integral can then be computed exactly in terms of the Airy function $\mathrm{Ai}(z)$, giving

$$ \tilde\rho_N(x) \sim \left( \frac{-2}{Q_3} \right)^{1/3} \sqrt{Q_2}\; \exp\left( \frac{Q_2^{3}}{3Q_3^{2}} + \frac{x\,Q_2^{3/2}}{Q_3} \right) \mathrm{Ai}\left( \frac{Q_2^{2}}{2^{2/3} Q_3^{4/3}} + \frac{2^{1/3}\,x\,\sqrt{Q_2}}{Q_3^{1/3}} \right), \tag{55} $$

which itself is manifestly asymmetric in $x$.

Finally, we note that the formulae derived above lead directly to corresponding expressions for the moments, since these may be related to the cumulants by taking the exponential of the right-hand side of (35), re-expanding as a Taylor series in powers of $s$, and equating the coefficients with those in (34). Thus, for example, it is straightforward to see that

$$ \left\langle (\log|Z|)^n \right\rangle_{U(N)} = \left. \frac{d^n}{ds^n} M_N(s) \right|_{s=0} = \begin{cases} (2k-1)!!\,\left\langle (\log|Z|)^2 \right\rangle_{U(N)}^{k} + O((\log N)^{k-2}) & \text{if } n = 2k \\ O((\log N)^{k-1}) & \text{if } n = 2k+1 \end{cases}, \tag{56} $$

where the second moment is given by (45). This again implies that the limiting distribution is Gaussian.
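The cumulant formulae of this section are straightforward to evaluate numerically. The sketch below is illustrative only: it assumes the standard series $\psi^{(m)}(j) = (-1)^{m+1} m! \sum_{k \ge j} k^{-(m+1)}$ for the polygamma functions at positive integers, hard-codes the values of $\zeta(2)$, $\zeta(3)$ and $\zeta(4)$, and uses helper names (`polygamma_sum`, `Q`) that are not from the text. It compares the exact $Q_2(N)$ with the expansion (45), and forms the normalized third and fourth moments at $N = 42$, which can be set against the CUE column of Table 1.

```python
import math

EULER_GAMMA = 0.5772156649015329
# zeta(2), zeta(3), zeta(4); zeta(3) is entered numerically.
ZETA = {2: math.pi ** 2 / 6, 3: 1.2020569031595943, 4: math.pi ** 4 / 90}


def polygamma_sum(n, N):
    """sum_{j=1}^N psi^{(n-1)}(j) for n = 2, 3, 4.

    Swapping the order of summation gives
    sum_j sum_{k>=j} k^-n = sum_{k<=N} k^(1-n) + N * sum_{k>N} k^-n.
    """
    head = math.fsum(k ** (1 - n) for k in range(1, N + 1))
    tail = ZETA[n] - math.fsum(k ** (-n) for k in range(1, N + 1))
    return (-1) ** n * math.factorial(n - 1) * (head + N * tail)


def Q(n, N):
    """Cumulant Q_n(N) of Re log Z, formula (38)."""
    return (2 ** (n - 1) - 1) / 2 ** (n - 1) * polygamma_sum(n, N)


def Q2_asymptotic(N):
    """Right-hand side of (45) without the O(N^-6) remainder."""
    return (0.5 * math.log(N) + 0.5 * (EULER_GAMMA + 1)
            + 1 / (24 * N ** 2) - 1 / (80 * N ** 4))


if __name__ == "__main__":
    N = 42
    q2, q3, q4 = Q(2, N), Q(3, N), Q(4, N)
    print("Q2 exact vs (45):", q2, Q2_asymptotic(N))
    # For a zero-mean variable: third moment = Q3, fourth moment = Q4 + 3 Q2^2.
    print("normalized 3rd moment:", q3 / q2 ** 1.5)
    print("normalized 4th moment:", 3 + q4 / q2 ** 2)
```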


2.3. Value distribution of Im log Z. In the same way as for the real part, all information about the value distribution of $\mathrm{Im}\log Z$ is contained in the generating function $L_N(s)$. Thus,

$$ L_N(-it) = \sum_{j=0}^{\infty} \frac{\left\langle (\mathrm{Im}\log Z)^j \right\rangle_{U(N)}}{j!}\, t^j, \tag{57} $$

and similarly for the corresponding cumulants $R_j$,

$$ \log L_N(-it) = \sum_{j=1}^{\infty} \frac{R_j(N)}{j!}\, t^j, \tag{58} $$

where $L_N(s)$ is given by (7). Likewise, the probability density for the values taken by $\mathrm{Im}\log Z$,

$$ \sigma_N(x) = \left\langle \delta(\mathrm{Im}\log Z - x) \right\rangle_{U(N)}, \tag{59} $$

is given by

$$ \sigma_N(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iyx} L_N(y)\,dy. \tag{60} $$

All of the results of the previous section then extend immediately to $\mathrm{Im}\log Z$. Thus, taking the logarithm of (7) and differentiating,

$$ R_n(N) = \frac{(-i)^n}{2^n} \sum_{j=1}^{N} \left( -\psi^{(n-1)}(j) + (-1)^{n-1}\psi^{(n-1)}(j) \right) = \begin{cases} 0 & \text{if } n \text{ odd} \\ \dfrac{(-1)^{n/2+1}}{2^{n-1}} \displaystyle\sum_{j=1}^{N} \psi^{(n-1)}(j) & \text{if } n \text{ even} \end{cases}. \tag{61} $$

The fact that all of the odd cumulants are zero implies that all of the odd moments are also zero. This is the main difference compared to the case of $\mathrm{Re}\log Z$. For the even cumulants we have

$$ R_{2m}(N) = \frac{(-1)^{m+1}}{2^{2m-1}-1}\, Q_{2m}(N), \tag{62} $$

and so the asymptotics computed in the previous section apply immediately in this case too. Thus

$$ R_2(N) = \left\langle (\mathrm{Im}\log Z)^2 \right\rangle_{U(N)} = Q_2(N) = \frac{1}{2}\log N + \frac{1}{2}(\gamma+1) + \frac{1}{24N^2} + O\left( \frac{1}{N^4} \right), \tag{63} $$

and for $m > 1$,

$$ R_{2m}(N) = \frac{(-1)^{m+1}}{2^{2m-1}} \left( \Gamma(2m)\zeta(2m-1) - \frac{(2m-3)!}{N^{2m-2}} \right) + O(N^{1-2m}). \tag{64} $$


The fact that, for $m \ge 2$, $R_{2m}/R_2^{m} \to 0$ as $N \to \infty$ implies that the value distribution of $\mathrm{Im}\log Z/\sqrt{R_2(N)}$ tends to a Gaussian in the limit. This was first proved by Costin and Lebowitz [15] for the GUE of random matrices. Specifically, they proved that the fluctuating part of the eigenvalue counting function has a limiting value distribution that is Gaussian. The connection comes because the two functions are the same, up to multiplication by $\pi$; specifically, if $n(U,a,b)$ denotes the number of eigenvalues of $U$ with $a < \theta_n < b$, then

$$ n(U,a,b) = \frac{(b-a)N}{2\pi} + \frac{1}{\pi}\,\mathrm{Im}\log\frac{Z(U,b)}{Z(U,a)}, \tag{65} $$

assuming that none of the eigenphases coincides with the end-points of the range. In addition, Costin and Lebowitz conjectured that, for $m \ge 2$, $R_{2m}(N) \to$ constant when $N \to \infty$. Our asymptotic formula (64) proves this for averages over the CUE, and provides the value of the constant. Wieand [35] has independently given a proof of the central limit theorem for $n(U,a,b) - (b-a)N/(2\pi)$ in the CUE case.

The asymptotics of the approach to the Gaussian can be calculated from (58) and (60). Defining

$$ \tilde\sigma_N(x) = \sqrt{R_2(N)}\;\sigma_N\!\left( \sqrt{R_2(N)}\,x \right), \tag{66} $$

we have that

$$ \begin{aligned} \tilde\sigma_N(x) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left( -iyx - \frac{y^2}{2} \right) \times \exp\left( \frac{R_4\,y^4}{R_2^{2}\,4!} - \frac{R_6\,y^6}{R_2^{3}\,6!} + \cdots \right) dy \\ &= \frac{1}{\sqrt{2\pi}} \exp\left( \frac{-x^2}{2} \right) + \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left( -iyx - \frac{y^2}{2} \right) \times \left( \frac{C_4\,y^4}{R_2^{2}} + \frac{C_6\,y^6}{R_2^{3}} + \frac{C_8\,y^8}{R_2^{4}} + \cdots \right) dy, \end{aligned} \tag{67} $$

where the coefficients $C_n(N)$ are defined in terms of the cumulants $R_{2m}(N)$ with $m > 1$; for example $C_4(N) = R_4(N)/4!$. Thus, integrating term-by-term,

$$ \tilde\sigma_N(x) = \frac{1}{\sqrt{2\pi}} \exp\left( \frac{-x^2}{2} \right) + \frac{1}{\sqrt{2\pi}} \sum_{m=2}^{\infty} \frac{C_{2m}}{R_2^{m}}\, e^{-x^2/2} \times \sum_{p=0}^{2m} \binom{2m}{p} (-ix)^{p} \begin{cases} (2m-p-1)!! & \text{if } 2m-p \text{ is even} \\ 0 & \text{if } 2m-p \text{ is odd} \end{cases}. \tag{68} $$

In this case $\tilde\sigma_N(x)$ is an even function of $x$ for all $N$, and not just in the limit as $N \to \infty$. This is a consequence of the fact that all of the odd cumulants are identically zero. It follows from (68) that the deviation from the Gaussian limit is of the order of $(\log N)^{-2}$, and so is asymptotically smaller than in the case of $\mathrm{Re}\log Z$.
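The counting-function identity (65) can be checked directly for any fixed set of eigenphases. The following sketch assumes eigenphases in $[0, 2\pi)$ and end-points $0 \le a < b < 2\pi$, and fixes the branch of the logarithm as in (27) and (29), so that each eigenphase contributes $-(\pi - x)/2$ with $x = (\theta_n - \theta) \bmod 2\pi$. The function names and the sample eigenphases are illustrative, not taken from the text.

```python
import math

TWO_PI = 2.0 * math.pi


def im_log_Z(eigenphases, theta):
    """Im log Z(U, theta), with the branch fixed by (27) and (29)."""
    return sum(-(math.pi - ((t - theta) % TWO_PI)) / 2.0 for t in eigenphases)


def count_via_65(eigenphases, a, b):
    """Right-hand side of (65): smooth part plus fluctuating part."""
    N = len(eigenphases)
    fluctuation = (im_log_Z(eigenphases, b) - im_log_Z(eigenphases, a)) / math.pi
    return (b - a) * N / TWO_PI + fluctuation


def count_direct(eigenphases, a, b):
    """Number of eigenphases strictly between a and b."""
    return sum(1 for t in eigenphases if a < t < b)


if __name__ == "__main__":
    phases = [0.3, 1.1, 2.0, 2.5, 4.0, 5.9]   # arbitrary sample eigenphases
    for a, b in [(1.0, 3.0), (0.5, 6.0), (3.0, 3.5)]:
        print((a, b), count_direct(phases, a, b), count_via_65(phases, a, b))
```

Each eigenphase inside $(a, b)$ shifts the fluctuating part by $2\pi \cdot \frac{1}{2\pi} = 1$ relative to one outside the interval, which is how the jump discontinuities of $\mathrm{Im}\log Z$ turn into an integer count.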


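Finally, the expressions derived above for the cumulants may again be used to deduce information about the moments. We have already noted that the odd moments are identically zero. For the even moments we find the usual Gaussian relationship:

$$ \left\langle (\mathrm{Im}\log Z)^{2k} \right\rangle_{U(N)} = (2k-1)!!\,\left\langle (\mathrm{Im}\log Z)^2 \right\rangle_{U(N)}^{k} + O((\log N)^{k-2}), \tag{69} $$

where the asymptotics of the second moment are given by (63).

2.4. Independence. We have shown in Sects. 2.2 and 2.3 that the values of both $\mathrm{Re}\log Z$ and $\mathrm{Im}\log Z$ have a Gaussian limit distribution as $N \to \infty$. Our purpose in this section is to show that they are also independent in this limit. The generating function for the joint distribution is

$$ \begin{aligned} \left\langle |Z|^{t} e^{is(\mathrm{Im}\log Z)} \right\rangle_{U(N)} &= \frac{1}{N!\,(2\pi)^N} \int_0^{2\pi}\!\!\cdots\!\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<k\le N} \left| e^{i\theta_j} - e^{i\theta_k} \right|^2 \\ &\qquad \times \left| \prod_{n=1}^{N} \left( 1 - e^{i(\theta_n-\theta)} \right) \right|^{t} \exp\left( -is \sum_{l=1}^{N}\sum_{m=1}^{\infty} \frac{\sin[(\theta_l-\theta)m]}{m} \right) \\ &= \frac{1}{N!\,(2\pi)^N} \int_0^{2\pi}\!\!\cdots\!\int_0^{2\pi} d\theta_1\cdots d\theta_N \prod_{1\le j<k\le N} \left| e^{i\theta_j} - e^{i\theta_k} \right|^2 \\ &\qquad \times \left| \prod_{n=1}^{N} \left( 1 - e^{i\theta_n} \right) \right|^{t} \exp\left( -is \sum_{l=1}^{N}\sum_{m=1}^{\infty} \frac{\sin(\theta_l m)}{m} \right). \end{aligned} \tag{70} $$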

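Making the same transformations as in Sect. 2.1,

$$ \begin{aligned} \left\langle |Z|^{t} e^{is(\mathrm{Im}\log Z)} \right\rangle_{U(N)} &= \frac{2^{N^2}\,2^{tN}}{N!\,(2\pi)^N} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} dx_1\cdots dx_N \prod_{1\le j<k\le N} |x_j - x_k|^2 \\ &\qquad \times \prod_{n=1}^{N} \left( \frac{1}{1+x_n^2} \right)^{N+t/2} \times \prod_{l=1}^{N} \left( \frac{1}{\sqrt{1+x_l^2}} + i\,\frac{x_l}{\sqrt{1+x_l^2}} \right)^{s} \\ &= \frac{2^{N^2}\,2^{tN}}{N!\,(2\pi)^N} \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} dx_1\cdots dx_N \prod_{1\le j<k\le N} |x_j - x_k|^2 \\ &\qquad \times \prod_{n=1}^{N} (1+ix_n)^{-N-t/2+s/2} (1-ix_n)^{-N-t/2-s/2} \\ &= \frac{2^{N^2}\,2^{tN}}{N!\,(2\pi)^N}\, J(1,1,N+t/2-s/2,N+t/2+s/2,1,N) \\ &= \prod_{j=1}^{N} \frac{\Gamma(j)\,\Gamma(t+j)}{\Gamma(j+t/2+s/2)\,\Gamma(j+t/2-s/2)}. \end{aligned} \tag{71} $$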

The conditions on the validity of Selberg’s integral translate into the restrictions t/2 + s/2 > −1, t/2 − s/2 > −1 and t > −1.
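For $N = 1$ the joint generating function (71) reduces to a one-dimensional integral that can be evaluated by simple quadrature, which gives an independent check of the Gamma-function product. The sketch below is illustrative: it sets $\theta = 0$, uses $|Z| = 2\sin(\theta_1/2)$ and $\mathrm{Im}\log Z = (\theta_1 - \pi)/2$ for $0 < \theta_1 < 2\pi$ (as follows from (27) and (29)), and applies a midpoint rule; the function names and the number of quadrature points are arbitrary choices.

```python
import math


def joint_gf_formula(t, s, N=1):
    """Right-hand side of (71) for real t and s with positive Gamma arguments."""
    log_value = 0.0
    for j in range(1, N + 1):
        log_value += (math.lgamma(j) + math.lgamma(t + j)
                      - math.lgamma(j + t / 2 + s / 2)
                      - math.lgamma(j + t / 2 - s / 2))
    return math.exp(log_value)


def joint_gf_quadrature(t, s, points=20000):
    """N = 1 average of |Z|^t exp(i s Im log Z) over the eigenphase, midpoint rule."""
    h = 2.0 * math.pi / points
    total = 0.0
    for k in range(points):
        x = (k + 0.5) * h
        # The imaginary part integrates to zero by the symmetry x -> 2*pi - x,
        # so only the cosine is kept.
        total += (2.0 * math.sin(x / 2.0)) ** t * math.cos(s * (x - math.pi) / 2.0)
    return total * h / (2.0 * math.pi)


if __name__ == "__main__":
    for t, s in [(2.0, 0.0), (1.0, 0.5), (1.0, 1.0)]:
        print((t, s), joint_gf_quadrature(t, s), joint_gf_formula(t, s))
```

Setting $s = 0$ recovers $M_1(t)$ from (6), and setting $t = 0$ recovers $L_1(s)$ from (7).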


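Next we expand the logarithm of the generating function as a series in powers of $s$ and $t$:

$$ \begin{aligned} &\sum_{j=1}^{N} \left( \log\Gamma(j) + \log\Gamma(t+j) - \log\Gamma(j+t/2+s/2) - \log\Gamma(j+t/2-s/2) \right) \\ &\quad = \alpha_{00} + \alpha_{10}t + \alpha_{01}s + \frac{\alpha_{20}}{2}t^2 + \alpha_{11}ts + \frac{\alpha_{02}}{2}s^2 + \frac{\alpha_{30}}{3!}t^3 + \frac{\alpha_{21}}{2!\,1!}t^2 s + \frac{\alpha_{12}}{2!\,1!}ts^2 + \frac{\alpha_{03}}{3!}s^3 + \cdots, \end{aligned} \tag{72} $$

where

$$ \alpha_{n0} = Q_n(N), \tag{73a} $$

$$ \alpha_{0n} = i^n R_n(N), \tag{73b} $$

and for $n \ne 0$ and $m \ne 0$,

$$ \begin{aligned} \alpha_{mn} &= \left. \frac{\partial^m}{\partial t^m} \left\{ \frac{1}{2^n} \sum_{j=1}^{N} \left( -\psi^{(n-1)}(j+t/2+s/2) + (-1)^{n-1}\psi^{(n-1)}(j+t/2-s/2) \right) \right\} \right|_{(0,0)} \\ &= \left. \frac{1}{2^n}\,\frac{1}{2^m} \sum_{j=1}^{N} \left( -\psi^{(n+m-1)}(j+t/2+s/2) + (-1)^{n-1}\psi^{(n+m-1)}(j+t/2-s/2) \right) \right|_{(0,0)} \\ &= \begin{cases} 0 & \text{if } n \text{ odd} \\ \dfrac{-1}{2^{n+m-1}} \displaystyle\sum_{j=1}^{N} \psi^{(n+m-1)}(j) & \text{if } n \text{ even} \end{cases}. \end{aligned} \tag{74} $$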

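The joint value distribution is then given by

$$ \begin{aligned} \tau_N(x,y) &= \left\langle \delta(\log|Z| - x)\,\delta(\mathrm{Im}\log Z - y) \right\rangle_{U(N)} \\ &= \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(itx+isy)} \left\langle e^{it\log|Z|}\, e^{is\,\mathrm{Im}\log Z} \right\rangle_{U(N)} dt\,ds \\ &= \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(itx+isy)} \times \prod_{j=1}^{N} \frac{\Gamma(j)\,\Gamma(it+j)}{\Gamma(j+it/2+s/2)\,\Gamma(j+it/2-s/2)}\,dt\,ds \\ &= \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} e^{-(itx+isy)} \exp\Big( \alpha_{10}\,it + \alpha_{01}s + \frac{\alpha_{20}}{2}(it)^2 + \alpha_{11}\,its + \frac{\alpha_{02}}{2}s^2 \\ &\qquad + \frac{\alpha_{30}}{3!}(it)^3 + \frac{\alpha_{21}}{2!\,1!}(it)^2 s + \frac{\alpha_{12}}{2!\,1!}\,its^2 + \frac{\alpha_{03}}{3!}s^3 + \cdots \Big)\,dt\,ds. \end{aligned} \tag{75} $$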


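Hence, using $\alpha_{10} = \alpha_{01} = \alpha_{11} = 0$,

$$ \alpha_{20} = -\alpha_{02} = \frac{1}{2}\log N + \frac{1}{2}(\gamma+1) + \frac{1}{24N^2} - \frac{1}{80N^4} + O\left( \frac{1}{N^6} \right), \tag{76} $$

and $\alpha_{mn} = O(1)$ for $m + n \ge 3$, which follows from a comparison with the cumulants of $\mathrm{Re}\log Z$, the scaled joint distribution

$$ \tilde\tau_N(x,y) = \sqrt{Q_2(N)R_2(N)}\;\tau_N\!\left( \sqrt{Q_2(N)}\,x,\ \sqrt{R_2(N)}\,y \right) \tag{77} $$

satisfies

$$ \begin{aligned} \tilde\tau_N(x,y) &= \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \exp\Big( -ivx - iwy - \frac{v^2}{2} - \frac{w^2}{2} + \frac{\alpha_{30}}{3!\,\alpha_{20}^{3/2}}(iv)^3 + \frac{\alpha_{21}}{2!\,\alpha_{20}^{3/2}}(iv)^2 w \\ &\qquad + \frac{\alpha_{12}}{2!\,\alpha_{20}^{3/2}}\,ivw^2 + \frac{\alpha_{03}}{3!\,\alpha_{20}^{3/2}}\,w^3 + \cdots \Big)\,dv\,dw \\ &= \frac{1}{4\pi^2} \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \exp\left( -ivx - iwy - \frac{v^2}{2} - \frac{w^2}{2} \right) \times \left( 1 + O\left( \frac{1}{(\log N)^{3/2}} \right) \right) dv\,dw. \end{aligned} \tag{78} $$

Thus

$$ \lim_{N\to\infty} \tilde\tau_N(x,y) = \frac{1}{4\pi^2} \int_{-\infty}^{\infty} e^{-ivx - \frac{v^2}{2}}\,dv \int_{-\infty}^{\infty} e^{-iwy - \frac{w^2}{2}}\,dw = \frac{1}{2\pi} \exp\left( \frac{-x^2}{2} \right) \exp\left( \frac{-y^2}{2} \right). \tag{79} $$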

Therefore, as claimed, the limiting value distributions of the real and imaginary parts of $\log Z$ are independent and Gaussian as $N \to \infty$.

2.5. Asymptotics of the generating functions. Our goal in this section is to derive the leading-order asymptotics of the generating functions $M_N(s)$ and $L_N(s)$ as $N \to \infty$. The results are most easily stated in terms of the Barnes $G$-function [3], defined by

$$ G(1+z) = (2\pi)^{z/2}\, e^{-[(1+\gamma)z^2+z]/2} \prod_{n=1}^{\infty} \left[ (1+z/n)^{n}\, e^{-z+z^2/(2n)} \right], \tag{80} $$

which has the following important properties:

$$ G(1) = 1, \qquad G(z+1) = \Gamma(z)\,G(z), \tag{81} $$

and

$$ \log G(1+z) = (\log(2\pi) - 1)\frac{z}{2} - (1+\gamma)\frac{z^2}{2} + \sum_{n=3}^{\infty} (-1)^{n-1}\zeta(n-1)\frac{z^n}{n}, \tag{82} $$


where the sum converges for $|z| < 1$. It follows from the definition (80) that $G(z)$ is an entire function of order two and that $G(1+z)$ has zeros at the negative integers, $-n$, with multiplicity $n$ ($n = 1, 2, 3, \ldots$).

Consider first $M_N(s)$. Define

$$ f_{\rm CUE}(s/2) = \lim_{N\to\infty} \frac{M_N(s)}{N^{(s/2)^2}} = \lim_{N\to\infty} \frac{1}{N^{(s/2)^2}} \prod_{j=1}^{N} \frac{\Gamma(j)\,\Gamma(j+s)}{(\Gamma(j+s/2))^2}. \tag{83} $$

We claim that

$$ f_{\rm CUE}(s/2) = \frac{(G(1+s/2))^2}{G(1+s)}. \tag{84} $$

To prove this, we use the fact that for $|s| < 1$,

$$ f_{\rm CUE}(s/2) = \exp\left( \left( \frac{s}{2} \right)^{2} (\gamma+1) + \sum_{j=3}^{\infty} (-s)^{j}\, \frac{2^{j-1}-1}{2^{j-1}}\, \frac{\zeta(j-1)}{j} \right), \tag{85} $$

which follows from (35), (40), (45), and (48). Comparing this to

$$ \begin{aligned} \log\left( \frac{(G(1+s/2))^2}{G(1+s)} \right) &= 2\log G(1+s/2) - \log G(1+s) \\ &= 2(\log(2\pi)-1)\frac{s}{4} - 2(1+\gamma)\frac{s^2}{8} + 2\sum_{n=3}^{\infty} (-1)^{n-1}\zeta(n-1)\frac{s^n}{2^n n} \\ &\qquad - (\log(2\pi)-1)\frac{s}{2} + (1+\gamma)\frac{s^2}{2} - \sum_{n=3}^{\infty} (-1)^{n-1}\zeta(n-1)\frac{s^n}{n} \\ &= (1+\gamma)\frac{s^2}{4} + \sum_{n=3}^{\infty} \frac{2^{n-1}-1}{2^{n-1}}\,\zeta(n-1)\frac{(-s)^n}{n}, \end{aligned} \tag{86} $$

which also holds for $|s| < 1$, we see that (84) holds when $|s| < 1$, and hence, by analytic continuation, in the rest of the complex $s$-plane. It follows that $f_{\rm CUE}(s/2)$ is a meromorphic function of order two with a pole of order $2k-1$ at each odd negative integer $s = -(2k-1)$, for $k = 1, 2, 3, \ldots$.

The value of $f_{\rm CUE}(n)$, where $n$ is an integer, can be calculated directly from (84), since we have from (81) that

$$ G(n) = \prod_{j=1}^{n-1} \Gamma(j), \qquad n = 2, 3, 4, \ldots. \tag{87} $$


Thus

$$ f_{\rm CUE}(n) = \frac{(G(1+n))^2}{G(1+2n)} = \frac{\left( \prod_{j=1}^{n} \Gamma(j) \right)^{2}}{\prod_{m=1}^{2n} \Gamma(m)} = \frac{\prod_{j=1}^{n} \Gamma(j)}{\prod_{m=n+1}^{2n} \Gamma(m)} = \prod_{j=0}^{n-1} \frac{j!}{(j+n)!}, \tag{88} $$

for $n = 1, 2, \ldots$. Inspired by a talk by one of us (JPK) at the Mathematical Sciences Research Institute, Berkeley, in June 1999, in which this result was discussed, Brézin and Hikami have since checked that the same formula holds for the integer moments of a wider class of random-matrix characteristic polynomials, including the GUE [11].

The leading-order asymptotics of $L_N(s)$ can be obtained in the same way. In this case we claim that

$$ \lim_{N\to\infty} L_N(s)\,N^{s^2/4} = G(1-s/2)\,G(1+s/2). \tag{89} $$

To prove this we note that

$$ \lim_{N\to\infty} L_N(s)\,N^{s^2/4} = \lim_{N\to\infty} N^{s^2/4} \prod_{j=1}^{N} \frac{(\Gamma(j))^2}{\Gamma(j+s/2)\,\Gamma(j-s/2)} = \exp\left( -(\gamma+1)\left( \frac{s}{2} \right)^{2} - \sum_{j=2}^{\infty} \frac{\zeta(2j-1)\,s^{2j}}{2^{2j}\,j} \right), \tag{90} $$

where the second equality follows from (58), (61), (63), and (64). We also have that

$$ \begin{aligned} \log\left( G(1-s/2)\,G(1+s/2) \right) &= \log G(1-s/2) + \log G(1+s/2) \\ &= -(\log(2\pi)-1)\frac{s}{4} - (1+\gamma)\frac{s^2}{8} + \sum_{n=3}^{\infty} (-1)^{n-1}\zeta(n-1)\frac{(-s)^n}{2^n n} \\ &\qquad + (\log(2\pi)-1)\frac{s}{4} - (1+\gamma)\frac{s^2}{8} + \sum_{n=3}^{\infty} (-1)^{n-1}\zeta(n-1)\frac{s^n}{2^n n} \\ &= -(1+\gamma)\frac{s^2}{4} + 2\sum_{n=2}^{\infty} (-1)^{2n-1}\zeta(2n-1)\frac{(-s)^{2n}}{2^{2n}(2n)} \\ &= -(1+\gamma)\frac{s^2}{4} - \sum_{n=2}^{\infty} \frac{\zeta(2n-1)\,s^{2n}}{2^{2n}\,n}, \end{aligned} \tag{91} $$

when $|s/2| < 1$. Thus (89) holds for $|s/2| < 1$, and hence, by analytic continuation, in the rest of the complex $s$-plane. It follows that $\lim_{N\to\infty} L_N(s)\,N^{s^2/4}$ has zeros of order $n$ at $s = \pm 2n$ for $n = 1, 2, \ldots$.
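The limits (84) and (89) can be examined numerically at a non-integer value of $s$ by comparing the finite-$N$ products (6) and (7) with the series (82) for $\log G(1+z)$. The sketch below is illustrative and restricted to $|s| < 1$, where the series converges; the truncation parameters, the Euler–Maclaurin tail used for $\zeta(k)$, and all function names are assumptions of this example rather than part of the text.

```python
import math

EULER_GAMMA = 0.5772156649015329


def zeta(k, cutoff=200):
    """zeta(k) for integer k >= 2: direct sum below `cutoff` plus an Euler-Maclaurin tail."""
    head = math.fsum(n ** -k for n in range(1, cutoff))
    m = float(cutoff)
    tail = m ** (1 - k) / (k - 1) + 0.5 * m ** -k + k * m ** (-k - 1) / 12.0
    return head + tail


def log_barnes_g(z, n_max=80):
    """log G(1 + z) from the series (82), truncated at n_max; requires |z| < 1."""
    total = (math.log(2 * math.pi) - 1) * z / 2 - (1 + EULER_GAMMA) * z * z / 2
    for n in range(3, n_max + 1):
        total += (-1) ** (n - 1) * zeta(n - 1) * z ** n / n
    return total


def log_scaled_M(s, N):
    """log( M_N(s) / N^{(s/2)^2} ), with M_N(s) from the product (6)."""
    terms = (math.lgamma(j) + math.lgamma(j + s) - 2 * math.lgamma(j + s / 2)
             for j in range(1, N + 1))
    return math.fsum(terms) - (s / 2) ** 2 * math.log(N)


def log_scaled_L(s, N):
    """log( L_N(s) N^{s^2/4} ), with L_N(s) from the product (7)."""
    terms = (2 * math.lgamma(j) - math.lgamma(j + s / 2) - math.lgamma(j - s / 2)
             for j in range(1, N + 1))
    return math.fsum(terms) + (s * s / 4) * math.log(N)


if __name__ == "__main__":
    s, N = 0.5, 4000
    print("M side:", log_scaled_M(s, N), 2 * log_barnes_g(s / 2) - log_barnes_g(s))
    print("L side:", log_scaled_L(s, N), log_barnes_g(-s / 2) + log_barnes_g(s / 2))
```

According to (48) and (64), the finite-$N$ corrections should decay like $1/N$ for $M_N$ and like $1/N^2$ for $L_N$, so the second comparison is expected to agree to more digits than the first at the same $N$.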


3. ζ(1/2 + it)

Our aim now is to compare the CUE results for $Z(U,\theta)$ derived in the previous sections with the behaviour of the Riemann zeta function on its critical line. First, we have to identify the analogue of the matrix size $N$, which is the one parameter that appears in the CUE formulae. With this in mind we note that under the identification

$$ N = \log\frac{T}{2\pi}, \tag{92} $$

the fact that the value distributions of $\mathrm{Re}\log Z/\sqrt{\tfrac{1}{2}\log N}$ and $\mathrm{Im}\log Z/\sqrt{\tfrac{1}{2}\log N}$ tend independently to Gaussians with zero mean and unit variance in the limit as $N \to \infty$ coincides precisely with Selberg's theorem (3). (Of course, the fact that $\log Z$ has zero mean is a consequence of its definition: we could multiply the determinant in (1) by any function with no zeros, for example a constant, but this would correspond to a trivial shift of the mean.) In the random matrix theory of spectral statistics, the natural parameter is the mean eigenvalue separation. For the eigenphases $\theta_n$ of $U$, this is $2\pi/N$. In the same way, the mean spacing between the Riemann zeros $t_n$ at a height $T$ up the critical line, $2\pi/\log(T/2\pi)$, is the only property of the zeta function that appears in Montgomery's conjecture and its generalizations. Equation (92) corresponds to equating these two parameters.

As already mentioned in the Introduction, Odlyzko's computations of the value distributions of both the real and imaginary parts of $\log\zeta(1/2+it)$, for ranges of $t$ near to the $10^{20}$th zero (that is, $t \approx 1.5202 \times 10^{19}$), exhibit striking deviations from the Gaussian limit guaranteed by Selberg's theorem [29]. In Figs. 1 and 2 we show some of Odlyzko's data, for the real and imaginary parts respectively, normalized as in (3), together with the Gaussian. It is apparent that the deviations are larger for $\mathrm{Re}\log\zeta$, and that in this case the value distribution is not symmetric about zero. This deviation can be quantified by comparing the moments of these distributions with the corresponding Gaussian values.
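As a small illustration of the identification (92), the matrix size corresponding to a given height can be evaluated directly. This is a sketch only; the function name `matrix_size` is an illustrative choice, and the height used is the one quoted above for the $10^{20}$th zero.

```python
import math


def matrix_size(T):
    """Matrix size N associated with height T via the identification (92)."""
    return math.log(T / (2.0 * math.pi))


if __name__ == "__main__":
    T = 1.5202e19                      # height quoted in the text
    N = matrix_size(T)
    print("N =", N, "nearest integer:", round(N))
    # Scale factors used to normalize the two Gaussian limits.
    print("sqrt((1/2) log log T) =", math.sqrt(0.5 * math.log(math.log(T))))
    print("sqrt((1/2) log N)     =", math.sqrt(0.5 * math.log(N)))
```

Because $N$ depends only logarithmically on $T$, even a zero as high as the $10^{20}$th corresponds to a modest matrix size.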
These moments are listed in Tables 1 ($\mathrm{Re}\log\zeta$) and 2 ($\mathrm{Im}\log\zeta$). Again, it is clear from the size of the odd moments that the distribution is not symmetric about zero in the case of $\mathrm{Re}\log\zeta$.

We begin by comparing these data with the CUE results derived in Sects. 2.2 and 2.3. The matrix size $N$ corresponding, via (92), to the height of the $10^{20}$th zero is about 42 (the results we now present are not sensitive to the precise value). In Figs. 1 and 2 we also plot the CUE value distributions for $\mathrm{Re}\log Z$ and $\mathrm{Im}\log Z$ corresponding to $N = 42$, computed by direct numerical evaluation of the Fourier integrals in (37), using (6), and (60), using (7). The $N = 42$ random matrix curves are clearly much closer to the data than the limiting Gaussians ($N = \infty$). This is even more apparent in Fig. 3, where we show minus the logarithm of the value distributions plotted in Fig. 1. Similarly, we also give in Tables 1 and 2 the values of the CUE moments, normalized in the same way (so that the second moment takes the value one). These confirm the improved agreement. In this context we recall two relevant facts about the deviations of the CUE value-distributions from their Gaussian limiting forms: first, these deviations are larger for $\mathrm{Re}\log Z$ than for $\mathrm{Im}\log Z$; and second, in the case of $\mathrm{Re}\log Z$ they are not symmetric (even) about zero for $N$ finite, whereas for $\mathrm{Im}\log Z$ they are.

As was already pointed out in the Introduction, random matrix theory cannot give a complete description of the finite-$T$ distribution of values of $\log\zeta(1/2+it)$, because it contains no information about the long-range zero-correlations associated with the primes. These can be computed separately, using the methods of [4]. For the moments of


[Figure 1: plot not reproduced; curves labelled CUE, Riemann Zeta and Gaussian.]

Fig. 1. The CUE value distribution for $\mathrm{Re}\log Z$ with $N = 42$, Odlyzko's data for the value distribution of $\mathrm{Re}\log\zeta(1/2+it)$ near the $10^{20}$th zero (taken from [29]), and the standard Gaussian, all scaled to have unit variance

Table 1. Moments of $\mathrm{Re}\log\zeta(1/2+it)$, calculated over two ranges (labelled a and b) near the $10^{20}$th zero ($T \simeq 1.520 \times 10^{19}$) (taken from [29]), compared with the CUE moments of $\mathrm{Re}\log Z$ with $N = 42$ and the Gaussian moments, all scaled to have unit variance

Moment    ζ a)        ζ b)        CUE         Normal
1         0.0         0.0         0.0         0
2         1.0         1.0         1.0         1
3         −0.53625    −0.55069    −0.56544    0
4         3.9233      3.9647      3.89354     3
5         −7.6238     −7.8839     −7.76965    0
6         38.434      39.393      38.0233     15
7         −144.78     −148.77     −145.043    0
8         758.57      765.54      758.036     105
9         −4002.5     −3934.7     −4086.92    0
10        24060.5     22722.9     25347.77    945

Table 2. Moments of Im log ζ(1/2 + it) near the $10^{20}$th zero ($T = 1.520 \times 10^{19}$) (taken from [29]) compared with the CUE moments for Im log Z when N = 42 and the Gaussian moments, all scaled to have unit variance

Moment   ζ                CUE         Normal
1        −6.3 × 10^−6     0.0         0
2        1.0              1.0         1
3        −4.7 × 10^−4     0.0         0
4        2.831            2.87235     3
5        −9.1 × 10^−3     0.0         0
6        12.71            13.29246    15
7        −0.140           0.0         0
8        76.57            83.76939    105

Fig. 2. The CUE value distribution for Im log Z with N = 42, Odlyzko’s data for Im log ζ(1/2 + it) near the $10^{20}$th zero (taken from [29]), and the standard Gaussian, all scaled to have unit variance

Fig. 3. Minus the logarithm of the value distributions (CUE, zeta and Gaussian) plotted in Fig. 1

log ζ(1/2 + it), the results take the same form as Goldston’s formula (18): the long-range contributions may be expressed as convergent sums over the primes. These prime-sums all have the property that, if each prime p is replaced by $p^{\gamma}$, they vanish in the limit γ → ∞. We give explicit formulae below, but first turn to the moments of |ζ(1/2 + it)|.

We expect a relationship between the moments of |ζ(1/2 + it)|, defined by averaging over t, and those of |Z(U, θ)|, averaged over the CUE; but clearly the moments of |ζ(1/2 + it)| are related to those of Re log ζ(1/2 + it) by exponentiation, and so it is natural to anticipate a long-range contribution in the form of a multiplicative factor given by a convergent product over the primes. We are thus led to a connection resembling the conjecture (4). The precise form of the prime product in (4) can, in fact, be recovered using heuristic arguments similar to those of [8] and [25] (essentially by substituting for ζ(1/2 + it) the prime product (2), truncated to include only primes with p < T/2π, and treating these prime-contributions as being independent). However, our main focus here is on the CUE component, and so we merely observe that if each prime p in (5) is replaced by $p^{\gamma}$, then a(λ) → 1 in the limit as γ → ∞. This leads us to conjecture, again invoking (92), that f(λ), defined by (4), is equal to $f_{\mathrm{CUE}}(\lambda)$, defined by (15). Based on the results of Section 2.5, we thus conjecture that

\[ f(\lambda) = \frac{(G(1+\lambda))^2}{G(1+2\lambda)}, \tag{93} \]

and

\[ f(n) = \prod_{j=0}^{n-1} \frac{j!}{(j+n)!}. \tag{94} \]
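As a quick consistency check on (93) and (94), the following minimal sketch evaluates the product in (94) with exact rational arithmetic and compares it with the Barnes-function form (93) at integer arguments, using only the standard fact that G(m) = 0!·1!···(m−2)! for positive integers m. The function names are illustrative and not taken from the paper.

```python
from fractions import Fraction
from math import factorial


def f_product(n: int) -> Fraction:
    """Right-hand side of (94): prod_{j=0}^{n-1} j! / (j+n)!."""
    value = Fraction(1)
    for j in range(n):
        value *= Fraction(factorial(j), factorial(j + n))
    return value


def barnes_g(m: int) -> int:
    """Barnes G-function at a positive integer: G(m) = prod_{k=0}^{m-2} k!."""
    value = 1
    for k in range(m - 1):
        value *= factorial(k)
    return value


def f_barnes(n: int) -> Fraction:
    """Right-hand side of (93) at lambda = n: G(1+n)^2 / G(1+2n)."""
    return Fraction(barnes_g(1 + n) ** 2, barnes_g(1 + 2 * n))


# Values quoted in the text for n = 1, 2, 3, 4.
quoted = {
    1: Fraction(1),
    2: Fraction(1, 12),
    3: Fraction(42, factorial(9)),
    4: Fraction(24024, factorial(16)),
}

for n, value in quoted.items():
    print(n, f_product(n), f_barnes(n), value)
```

Working with `Fraction` avoids any rounding, so the comparison with 42/9! and 24024/16! is exact rather than approximate.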

The main evidence in support of this conjecture is, as already noted in the Introduction, that (94) coincides with the known values f(1) = 1 [17] and f(2) = 1/12 [20], and agrees with other conjectures (based on number-theoretical calculations) that f(3) = 42/9! [13] and f(4) = 24024/16! [14] (this last conjecture and ours were announced independently at the Erwin Schrödinger Institute in Vienna, in September 1998). In addition, we can compare with numerical data. Odlyzko has computed

\[ r(\lambda, H) = \frac{1}{H (\log T)^{\lambda^2}} \int_T^{T+H} |\zeta(1/2+it)|^{2\lambda}\, dt \tag{95} \]

for T close to $t_{10^{20}}$ [29]. It is obviously natural to compare this to

\[ r_{\mathrm{CUE}}(\lambda) = \frac{1}{N^{\lambda^2}}\, M_N(2\lambda)\, a(\lambda), \tag{96} \]

with N satisfying (92). The results, shown in Table 3, would appear to support the conjecture.

We can also test our conjecture by returning to the moments of Re log ζ(1/2 + it). Based on the arguments of the previous paragraph, we expect that as T → ∞,

\[ \frac{1}{T} \int_0^T (\mathrm{Re}\log\zeta(1/2+it))^k\, dt \sim \frac{d^k}{ds^k} \left[ M_N(s)\, a(s/2) \right]_{s=0}, \tag{97} \]

where N is related to T via (92). The resulting expressions incorporate both the random matrix and the prime contributions. A comparison with Odlyzko’s data may be made by computing the moments using (97) with N = 42. These values are listed in Table 4 (in this case, unlike in Table 1, the moments have not been normalized, in order to focus on the subdominant role played by the primes). They clearly match the data more closely than the CUE values.

The moments of Im log ζ(1/2 + it) can be treated in the same way. These are obviously related to derivatives of the generating function $L_N(s)$. Applying the same heuristic method which underpins (4) leads us to conjecture that

\[ \frac{1}{T} \int_0^T (\mathrm{Im}\log\zeta(1/2+it))^k\, dt \sim (-i)^k \frac{d^k}{ds^k} \left[ L_N(s)\, b(s/2) \right]_{s=0}, \tag{98} \]

where

\[ b(\lambda) = \prod_p \left\{ (1 - 1/p)^{-\lambda^2} \sum_{n=0}^{\infty} \frac{\Gamma(1+\lambda)\,\Gamma(1-\lambda)}{\Gamma(1+\lambda-n)\,\Gamma(1-\lambda-n)\, n!\, n!}\, p^{-n} \right\}. \tag{99} \]

Moments calculated using (98) with N = 42 are listed in Table 5 (again, unlike in Table 2, these have not been scaled), together with Odlyzko’s data. In this case too, the prime contribution leads to a noticeable improvement compared to the CUE values. It is also simple to check that for k = 2 (98) coincides with (18), and that for k = 3 and k = 4 it agrees with heuristic calculations based on the methods of [4, 7] and [9]. The conjecture corresponding to (4) is then that

\[ \lim_{T\to\infty} (\log T)^{\lambda^2}\, \frac{1}{T} \int_0^T \left( \frac{\zeta(1/2+it)}{\zeta(1/2-it)} \right)^{\lambda} dt = G(1-\lambda)\, G(1+\lambda)\, b(\lambda), \tag{100} \]

where we have used (89).

Finally, it is also instructive to examine the distribution of values of |Z|,

\[ P_N(w) = \langle \delta(w - |Z|) \rangle_{U(N)}. \tag{101} \]

Table 3. Comparison of r(λ, H), calculated numerically for the Riemann zeta function near the $10^{20}$th zero (taken from [29]), the corresponding CUE quantity (N = 42), with and without the prime product a(λ), and the lower bound on the leading order coefficient [12], $C_1(\lambda)$

λ     CUE with        r(λ, H)   C1(λ)           CUE      % error CUE    % error CUE
      prime product             (lower bound)            with primes
0.1   1.011           1.004     1.0042          1.0129   0.741          0.886
0.2   1.038           1.034     1.0172          1.0430   0.395          0.870
0.3   1.071           1.067     1.0381          1.0803   0.423          1.25
0.4   1.105           1.098     1.064           1.1171   0.649          1.74
0.5   1.133           1.123     1.0904          1.1466   0.914          2.10
0.6   1.151           1.135     1.1113          1.1631   1.37           2.25
0.7   1.152           1.132     1.1195          1.1616   1.77           2.26
0.8   1.133           1.107     1.1076          1.1386   2.38           2.85
0.9   1.091           1.06      1.069           1.0925   2.92           3.07
1.0   1.024           0.989     1.0             1.0238   3.52           3.52
1.1   0.933           0.896     0.901           0.9350   4.16           4.35
1.2   0.822           0.787     0.776           0.8307   4.48           5.55
1.3   0.699           0.667     0.637           0.7167   4.89           7.45
1.4   0.571           0.544     0.494           0.5996   4.99           10.2
1.5   0.446           0.426     0.36            0.4858   4.65           14.0
1.6   0.333           0.319     0.246           0.3806   4.27           19.3
1.7   0.237           0.229     0.157           0.2880   3.37           25.8
1.8   0.158           0.156     0.092           0.2103   1.41           34.8
1.9   0.100           0.101     0.05            0.1480   0.542          46.5
2.0   0.0602          0.0624    0.025           0.1003   3.53           60.7
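The "CUE" column of Table 3 (the one without the prime product) can be reproduced from (96) by setting a(λ) = 1 and taking N = 42. The sketch below is one way to do this; it assumes that $M_N(s)$ is the β = 2 case of the product formula (110), and it works with log-gamma functions to avoid overflow. The helper names are illustrative.

```python
from math import exp, lgamma, log


def log_moment_cue(n: int, s: float) -> float:
    """log M_N(s), using the beta = 2 case of the product formula (110)."""
    return sum(
        lgamma(1 + j) + lgamma(1 + s + j) - 2 * lgamma(1 + s / 2 + j)
        for j in range(n)
    )


def r_cue(lam: float, n: int = 42) -> float:
    """M_N(2*lambda) / N^(lambda^2): (96) with the prime product set to 1."""
    return exp(log_moment_cue(n, 2 * lam) - lam * lam * log(n))


for lam in (0.5, 1.0, 1.5, 2.0):
    print(f"lambda = {lam:.1f}   r_CUE = {r_cue(lam):.4f}")
```

At λ = 1 the product telescopes to (N + 1)/N = 43/42, which gives a hand check on the implementation.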

Table 4. Moments of Re log ζ(1/2 + it) near the $10^{20}$th zero ($T \approx 1.520 \times 10^{19}$) (averages in a) and b) taken over different intervals) compared with Re log Z when N = 42 with and without the prime contributions

Moment   ζ a)           ζ b)           CUE + primes   CUE
1        −0.001595      0.000549       0.0            0.0
2        2.5736         2.51778        2.56939        2.65747
3        −2.2263        −2.19591       −2.21609       −2.44955
4        25.998         25.1283        26.017         27.4967
5        −81.2144       −79.2332       −81.2922       −89.4481
6        655.921        628.48         663.493        713.597
7        −3966.46       −3765.29       −4052.98       −4437.47
8        33328.6        30385.5        34808.2        37806
9        −282163        −250744        −304267        −332278
10       2.271 × 10^6   2.298 × 10^6   3.082 × 10^6   3.359 × 10^6

Table 5. Moments of Im log ζ(1/2 + it) near the $10^{20}$th zero ($T \approx 1.520 \times 10^{19}$) compared with Im log Z when N = 42, with and without the prime contributions

Moment   ζ               CUE + primes   CUE
1        −1.0 × 10^−5    0.0            0.0
2        2.573           2.569          2.657
3        −1.9 × 10^−3    0.0            0.0
4        18.74           18.69          20.28
5        −0.097          0.0            0.0
6        216.5           215.6          249.5
7        −3.8            0.0            0.0
8        3355            3321           4178

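Obviously

\[ P_N(w) = \frac{1}{2\pi w} \int_{-\infty}^{\infty} e^{-is\log w}\, M_N(is)\, ds. \tag{102} \]

We can approximate this for large N in the same manner as for Re log Z:

\[ P_N(w) = \frac{1}{2\pi w} \int_{-\infty}^{\infty} \exp\left( -is\log w - Q_2 s^2/2! - iQ_3 s^3/3! + Q_4 s^4/4! + \cdots \right) ds \]

\[ = \frac{1}{2\pi w\sqrt{Q_2}} \int_{-\infty}^{\infty} \exp\left( \frac{-is\log w}{\sqrt{Q_2}} - \frac{s^2}{2} - \frac{iQ_3 s^3}{Q_2^{3/2}\, 3!} + \cdots \right) ds \]

\[ \sim \frac{1}{2\pi w\sqrt{Q_2}} \int_{-\infty}^{\infty} \exp\left( \frac{-is\log w}{\sqrt{Q_2}} - \frac{s^2}{2} \right) ds \tag{103} \]

\[ = \frac{1}{w\sqrt{2\pi Q_2}} \exp\left( \frac{-\log^2 w}{2Q_2} \right), \tag{104} \]

which is valid when w is fixed and N → ∞, and more generally if $\log w \gg -\frac{1}{2}\log N$, the lower bound being determined by the first pole of $M_N(s)$.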

Fig. 4. The CUE value distribution of |Z|, corresponding to N = 12, with numerical data for the value distribution of |ζ(1/2 + it)| near $t = 10^6$

For any finite N we can plot $P_N(w)$ numerically by direct evaluation of (102). This is done in Fig. 4 together with data for the value distribution of |ζ(1/2 + it)| when $t \approx 10^6$, which corresponds via (92) to N = 12. As w → 0, $P_N(w)$ tends to a constant for a given N, the value of which can be calculated by noting that the contribution to the integral is dominated by the pole of $M_N(is)$ (at s = i) closest to the real axis. Hence

\[ \lim_{w\to 0} P_N(w) = \frac{1}{\Gamma(N)} \prod_{j=1}^{N} \frac{\Gamma(j)^2}{\Gamma(j-1/2)^2}. \tag{105} \]

If N is large, this is asymptotic to [19]

\[ N^{1/4} (G(1/2))^2 = \exp\left( \frac{1}{12}\log 2 + 3\zeta'(-1) - \frac{1}{2}\log\pi \right) N^{1/4}. \tag{106} \]
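The two expressions (105) and (106) are easy to compare numerically. The following sketch evaluates the exact product (105) through log-gamma functions and the large-N form (106); the value of ζ′(−1) is hard-coded as an assumption (it equals 1/12 − log A, with A the Glaisher–Kinkelin constant), and the function names are illustrative.

```python
from math import exp, lgamma, log, pi

# zeta'(-1), hard-coded; equals 1/12 - log(A), A the Glaisher-Kinkelin constant.
ZETA_PRIME_AT_MINUS_ONE = -0.1654211437


def density_at_zero(n: int) -> float:
    """Exact value (105): (1/Gamma(N)) * prod_{j=1}^{N} (Gamma(j)/Gamma(j-1/2))^2."""
    log_value = -lgamma(n)
    for j in range(1, n + 1):
        log_value += 2.0 * (lgamma(j) - lgamma(j - 0.5))
    return exp(log_value)


def density_at_zero_asymptotic(n: int) -> float:
    """Large-N form (106): N^(1/4) * G(1/2)^2."""
    log_g_half_squared = (
        log(2.0) / 12.0 + 3.0 * ZETA_PRIME_AT_MINUS_ONE - 0.5 * log(pi)
    )
    return n ** 0.25 * exp(log_g_half_squared)


for n in (12, 42, 1000):
    print(n, density_at_zero(n), density_at_zero_asymptotic(n))
```

For N = 12, the matrix size used in Fig. 4, the exact product should land close to the value of $P_{12}(0)$ quoted in the text, while the asymptotic form is already within about a percent of it.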

Based on the previous discussion of its moments, it is natural to expect that as t → ∞ the way in which the primes contribute to the value distribution of |ζ(1/2 + it)| is given by

\[ \tilde{P}_N(w) = \frac{1}{2\pi w} \int_{-\infty}^{\infty} e^{-is\log w}\, a(is/2)\, M_N(is)\, ds. \tag{107} \]

Consequently,

\[ \tilde{P}_N(0) = a(-1/2)\, P_N(0). \tag{108} \]

We find that a(−1/2) ≈ 0.919, $P_{12}(0)$ ≈ 0.671, and so a(−1/2)$P_{12}(0)$ ≈ 0.617, which is indeed close to the numerically computed value of the probability density at zero, 0.613. Away from w = 0, in the region where (104) is valid, the stationary point of (107) is at $s^* = -i\log w/Q_2$, so $a(is^*/2) = a(\log w/(2Q_2))$. Since a(0) = 1, when $|\log w| \ll Q_2$, $a(\log w/(2Q_2))$ is close to 1 and so the contribution from the prime product recedes to the extremes of the distribution when N is large.

4. COE and CSE Results

Our main focus in this paper has been on the CUE of random matrix theory. However, the methods and results of Sect. 2 extend immediately to the other circular ensembles – the Circular Orthogonal Ensemble (COE) and the Circular Symplectic Ensemble (CSE) [27] – and for completeness we outline the form these generalizations take.

Let Z now represent the characteristic polynomial of an N × N matrix U in either the CUE (β = 2), the COE (β = 1), or the CSE (β = 4). The generalization of (21) is that

\[ \langle |Z|^s \rangle_{\mathrm{RMT}} = \frac{(\beta/2)!^N}{(N\beta/2)!\,(2\pi)^N} \int_0^{2\pi} \cdots \int_0^{2\pi} d\theta_1 \cdots d\theta_N \prod_{1\le j<m\le N} \left| e^{i\theta_j} - e^{i\theta_m} \right|^{\beta} \times \left| \prod_{p=1}^{N} \left( 1 - e^{i(\theta_p-\theta)} \right) \right|^s, \tag{109} \]

where the average is over the appropriate ensemble. Exactly the same method as was applied in Sect. 2.1 leads to

\[ M_N(\beta, s) = \langle |Z|^s \rangle_{\mathrm{RMT}} = \prod_{j=0}^{N-1} \frac{\Gamma(1+j\beta/2)\,\Gamma(1+s+j\beta/2)}{(\Gamma(1+s/2+j\beta/2))^2}. \tag{110} \]

It follows from expanding $\log M_N(\beta, s)$ as a series in powers of s that the cumulants of the distribution of values taken by Re log Z are given by

\[ Q_n^{\beta}(N) = \frac{2^{n-1}-1}{2^{n-1}} \sum_{j=0}^{N-1} \psi^{(n-1)}(1+j\beta/2). \tag{111} \]

As in the CUE case, $Q_1^{\beta}(N) = 0$. Replacing the polygamma functions by their integral representations and interchanging the integral and the sum in (111) provides the leading-order asymptotics

\[ Q_2^{\beta}(N) = \frac{1}{2} \sum_{j=0}^{N-1} \frac{1}{1+j\beta/2} + O(1) = \frac{1}{\beta}\log N + O(1). \tag{112} \]

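For $Q_n^{\beta}(N)$, n ≥ 3, the sum in (111) converges as N → ∞. Its value in the limit is

\[ Q_n^{\beta}(\infty) \equiv \lim_{N\to\infty} Q_n^{\beta}(N) = \frac{2^{n-1}-1}{2^{n-1}} (-1)^n \int_0^{\infty} \frac{e^{-t}\, t^{n-1}}{(1-e^{-t})(1-e^{-\beta t/2})}\, dt \]

\[ = \frac{2^{n-1}-1}{2^{n-1}} (-1)^n \sum_{r=0}^{\infty} \sum_{s=0}^{\infty} \int_0^{\infty} e^{-(1+s+\beta r/2)t}\, t^{n-1}\, dt \]

\[ = \frac{2^{n-1}-1}{2^{n-1}} (-1)^n \sum_{r=0}^{\infty} \sum_{s=1}^{\infty} \Gamma(n)\, (s+\beta r/2)^{-n}. \tag{113} \]

When β = 4, the number of ways in which s + 2r = k is k/2 if k is even, and (k + 1)/2 if k is odd. Thus

\[ Q_n^{4}(\infty) = \frac{2^{n-1}-1}{2^{n-1}} (-1)^n \Gamma(n) \left[ \sum_{k=1}^{\infty} \frac{k}{(2k-1)^n} + \sum_{k=1}^{\infty} \frac{k}{(2k)^n} \right] \]

\[ = \frac{2^{n-1}-1}{2^{n-1}} (-1)^n \Gamma(n)\, \frac{1}{2} \left[ \sum_{k=1}^{\infty} \frac{2k-1}{(2k-1)^n} + \sum_{k=1}^{\infty} \frac{1}{(2k-1)^n} + \sum_{k=1}^{\infty} \frac{2k}{(2k)^n} \right] \]

\[ = \frac{2^{n-1}-1}{2^{n}} (-1)^n \Gamma(n) \left[ \zeta(n-1) + \left( 1 - \frac{1}{2^n} \right) \zeta(n) \right]. \tag{114} \]

Similarly,

\[ Q_n^{1}(\infty) = (2^{n-1}-1)(-1)^n \Gamma(n) \left[ \zeta(n-1) - \left( 1 - \frac{1}{2^n} \right) \zeta(n) \right]. \tag{115} \]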

The asymptotic convergence of these cumulants ensures that the distribution of values taken by Re log Z is Gaussian in the limit N → ∞ (with unit variance if normalized with respect to $Q_2^{\beta}(N)$) when the zeros are distributed with COE or CSE statistics, just as it was for the CUE. All of the calculations carried out for the CUE transfer immediately to the other two ensembles by replacing $Q_n$ with $Q_n^{\beta}$.

A similar equivalence holds for Im log Z. We have

\[ L_N(\beta, s) = \langle e^{is\,\mathrm{Im}\log Z} \rangle_{\mathrm{RMT}} = \prod_{j=0}^{N-1} \frac{(\Gamma(1+j\beta/2))^2}{\Gamma(j\beta/2+1+s/2)\,\Gamma(j\beta/2+1-s/2)}, \tag{116} \]

from which it follows that

\[ R_n^{\beta}(N) = \begin{cases} 0 & \text{if } n \text{ odd} \\ \dfrac{(-1)^{n/2+1}}{2^{n-1}} \displaystyle\sum_{j=0}^{N-1} \psi^{(n-1)}(1+j\beta/2) & \text{if } n \text{ even.} \end{cases} \tag{117} \]

Comparison with (111) then shows that $R_2^{\beta}(N) = Q_2^{\beta}(N)$, and that the value distribution of Im log Z has a Gaussian limit in all three cases.

In order to generalize the results of Sect. 2.5, we need the next term in the asymptotic expansion (112) for $Q_2^{\beta}(N)$. Applying the recurrence formula for the polygamma function,

\[ \psi^{(1)}(z+1) = \psi^{(1)}(z) - \frac{1}{z^2}, \tag{118} \]

we have in the CSE case that

\[ Q_2^{4}(N) = \frac{1}{2} \sum_{j=0}^{N-1} \psi^{(1)}(1+2j) \]

\[ = \frac{1}{2} \left[ \psi^{(1)}(1) + \sum_{j=1}^{N-1} \left( \psi^{(1)}(1) - \sum_{m=1}^{2j} \frac{1}{m^2} \right) \right] \]

\[ = \frac{N}{2}\psi^{(1)}(1) - \frac{1}{2} \sum_{k=1}^{N-1} \frac{N-k}{(2k-1)^2} - \frac{1}{2} \sum_{k=1}^{N-1} \frac{N-k}{(2k)^2} \]

\[ = \frac{1}{4}\log N + \frac{1}{4}(1+\gamma) + \frac{1}{4}\log 2 + \frac{3}{16}\zeta(2) + O(N^{-1}). \tag{119} \]

As $Q_2^{4}(N) = R_2^{4}(N)$, this also gives us the second cumulant for Im log Z.

In the COE case we follow a very similar procedure, except that as we now have polygamma functions of half-integers, we need to consider the cases of even and odd N separately. We start with N even, relating the polygamma functions of integers back to $\psi^{(1)}(1)$ and those with half-integer argument to $\psi^{(1)}(1/2)$, and find that

\[ Q_2^{1}(N) = \frac{1}{2} \sum_{j=0}^{N-1} \psi^{(1)}(1+j/2) \]

\[ = \frac{1}{2} \left[ \frac{N}{2}\psi^{(1)}(1) + \frac{N}{2}\psi^{(1)}(1/2) - \sum_{k=1}^{N/2} \frac{4(N/2-k+1)}{(2k-1)^2} - \sum_{k=1}^{(N-2)/2} \frac{N/2-k}{k^2} \right] \]

\[ = \log N + 1 + \gamma - \frac{3}{4}\zeta(2) + O(N^{-1}). \tag{120} \]
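The constants in (119) and (120) can be checked numerically from the defining sums. The sketch below is an illustration rather than part of the original derivation: it implements the trigamma function with the recurrence (118) plus a standard asymptotic series, sums $Q_2^{\beta}(N) = \frac{1}{2}\sum_j \psi^{(1)}(1+j\beta/2)$ directly, and compares with the asymptotic forms, taking the CSE constant to be (1/4)(1 + γ) + (1/4) log 2 + (3/16) ζ(2) and the COE constant to be 1 + γ − (3/4) ζ(2). All function names are illustrative.

```python
from math import log, pi

EULER_GAMMA = 0.5772156649015329
ZETA_2 = pi ** 2 / 6


def trigamma(x: float) -> float:
    """psi^(1)(x) for x > 0: recurrence (118) upward, then an asymptotic series."""
    total = 0.0
    while x < 10.0:
        total += 1.0 / (x * x)  # psi1(x) = psi1(x + 1) + 1/x^2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 * (1.0 / 42.0 - inv2 / 30.0))
    )
    return total + series


def second_cumulant(beta: int, n: int) -> float:
    """Q_2^beta(N) = (1/2) * sum_{j=0}^{N-1} psi^(1)(1 + j*beta/2), from (111)."""
    return 0.5 * sum(trigamma(1.0 + j * beta / 2.0) for j in range(n))


def cse_asymptotic(n: int) -> float:
    """(119) without the O(1/N) remainder."""
    return 0.25 * log(n) + 0.25 * (1 + EULER_GAMMA) + 0.25 * log(2) + 3 * ZETA_2 / 16


def coe_asymptotic(n: int) -> float:
    """(120) without the O(1/N) remainder."""
    return log(n) + 1 + EULER_GAMMA - 0.75 * ZETA_2


for n in (10, 100, 2000):
    print(n,
          second_cumulant(4, n) - cse_asymptotic(n),
          second_cumulant(1, n) - coe_asymptotic(n))
```

The printed differences should shrink roughly like 1/N, consistent with the stated remainder terms.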

The calculation for odd N is very similar and the result is the same. Once again $Q_2^{1}(N) = R_2^{1}(N)$.

The procedure for calculating the leading-order coefficient of $|Z|^s$ or $(Z/Z^*)^{s/2}$ for averages over the CSE and COE ensembles is also very similar to that already detailed for the CUE. In these cases we need

\[ \log\Gamma(1+z) = -z\gamma + \sum_{n=2}^{\infty} (-1)^n \zeta(n)\, \frac{z^n}{n}, \tag{121} \]

valid for |z| < 1, as well as the expansion (82) for the Barnes G-function.

Using (114) and (119), we have that

\[ f_{\mathrm{CSE}}(s) = \lim_{N\to\infty} \frac{M_N(4,s)}{N^{s^2/8}} = \exp\Bigg( \left[ \frac{1}{4}(1+\gamma) + \frac{1}{4}\log 2 + \frac{3}{16}\zeta(2) \right] \frac{s^2}{2} + \sum_{n=3}^{\infty} (-1)^n \left[ \frac{1}{2}\zeta(n-1) - \frac{1}{2^n}\zeta(n-1) + \frac{1}{2}\zeta(n) - \frac{1}{2^n}\zeta(n) - \frac{1}{2^{n+1}}\zeta(n) + \frac{1}{4^n}\zeta(n) \right] \frac{s^n}{n} \Bigg), \tag{122} \]

and from (121) and (82) we see that

\[ \log G(1+s/2) + \frac{1}{2}\log\Gamma(1+s) + \log\Gamma(1+s/4) - \frac{1}{2}\log G(1+s) - \frac{1}{2}\log\Gamma(1+s/2) - \log\Gamma(1+s/2) \]

\[ = \left[ \frac{1}{4}(1+\gamma) + \frac{3}{16}\zeta(2) \right] \frac{s^2}{2} + \sum_{n=3}^{\infty} (-1)^n \left[ \frac{1}{2}\zeta(n-1) - \frac{1}{2^n}\zeta(n-1) + \frac{1}{2}\zeta(n) - \frac{1}{2^n}\zeta(n) - \frac{1}{2^{n+1}}\zeta(n) + \frac{1}{4^n}\zeta(n) \right] \frac{s^n}{n}. \tag{123} \]

Thus

\[ f_{\mathrm{CSE}}(s) = 2^{s^2/8}\, \frac{G(1+s/2)\,\Gamma(1+s/4)\,\sqrt{\Gamma(1+s)}}{\sqrt{G(1+s)\,\Gamma(1+s/2)}\;\Gamma(1+s/2)}, \tag{124} \]

for |s| < 1. It then follows by analytic continuation that the equality holds for all s. The above combination of gamma and G-functions also has the correct poles and zeros, namely a pole of order k at negative integers of the form −(2k − 1) and a zero of order 1 at −(4k − 2), where k = 1, 2, 3, . . . . The coefficients which reduce to rational numbers, as for the 2k-th moments in the CUE case, are those where s = 4k for positive integers k. With the help of (87) we see that

\[ f_{\mathrm{CSE}}(4k) = \frac{2^k}{(2k-1)!!\, \prod_{j=1}^{2k-1} (2j-1)!!}. \tag{125} \]

This can also be checked directly by writing $M_N(4, s)$ as a polynomial of order $2k^2$ in N.

For the imaginary part of the log of Z we have, in the CSE case, that

\[ \lim_{N\to\infty} L_N(4,s) \times N^{s^2/8} = \exp\Bigg( -\left[ \frac{1}{4}(1+\gamma) + \frac{1}{4}\log 2 + \frac{3}{16}\zeta(2) \right] \frac{s^2}{2} - \sum_{n=2}^{\infty} \left[ \frac{1}{2^{2n}}\zeta(2n-1) + \frac{1}{2^{2n}}\zeta(2n) - \frac{1}{4^{2n}}\zeta(2n) \right] \frac{s^{2n}}{2n} \Bigg), \tag{126} \]

and the expansions of the gamma and G-functions allow us to show that

\[ \lim_{N\to\infty} L_N(4,s) \times N^{s^2/8} = 2^{-s^2/8} \sqrt{ \frac{G(1+s/2)\,G(1-s/2)\,\Gamma(1+s/4)\,\Gamma(1-s/4)}{\Gamma(1+s/2)\,\Gamma(1-s/2)} }, \tag{127} \]

which has zeros of order k at ±(4k − 2) and also k-th order zeros at ±4k, k = 1, 2, 3, . . . , just as an examination of $L_N(4, s)$ indicates it should.

Moving on to the COE, we have in this case that

\[ \lim_{N\to\infty} \frac{M_N(1,s)}{N^{s^2/2}} = \exp\Bigg( \left[ 1+\gamma-\frac{3}{4}\zeta(2) \right] \frac{s^2}{2} + \sum_{n=3}^{\infty} (-1)^n \left[ 2^{n-1}\zeta(n-1) - \zeta(n-1) - 2^{n-1}\zeta(n) + \frac{3}{2}\zeta(n) - \frac{1}{2^n}\zeta(n) \right] \frac{s^n}{n} \Bigg). \tag{128} \]

Comparing this to

\[ \log G(1+s) + \frac{3}{2}\log\Gamma(1+s) - \frac{1}{2}\log G(1+2s) - \frac{1}{2}\log\Gamma(1+2s) - \log\Gamma(1+s/2) \]

\[ = \left[ 1+\gamma-\frac{3}{4}\zeta(2) \right] \frac{s^2}{2} + \sum_{n=3}^{\infty} (-1)^n \left[ 2^{n-1}\zeta(n-1) - \zeta(n-1) - 2^{n-1}\zeta(n) + \frac{3}{2}\zeta(n) - \frac{1}{2^n}\zeta(n) \right] \frac{s^n}{n} \tag{129} \]

when |s| < 1/2, it follows that

\[ f_{\mathrm{COE}}(s) = \lim_{N\to\infty} \frac{M_N(1,s)}{N^{s^2/2}} = \frac{\Gamma(1+s)\, G(1+s)\, \sqrt{\Gamma(1+s)}}{\Gamma(1+s/2)\, \sqrt{G(1+2s)\,\Gamma(1+2s)}}, \tag{130} \]

in this range, and hence, by analytic continuation, for all s. This expression has simple poles at s = −(2k − 1) and a pole of order k at s = −(2k + 1)/2, with k = 1, 2, 3, . . . . We find rational values of this coefficient when s = 2k:

\[ f_{\mathrm{COE}}(2k) = \prod_{j=1}^{k} \frac{(2j-1)!}{(2k+2j-1)!}. \tag{131} \]

Again, this can be verified by computing the leading order term of $M_N(1, 2k)$, which turns out to be a polynomial of order $2k^2$ in N.
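The rational values (125) and (131) can be compared with the finite-N moments directly. The sketch below is a numerical illustration under the assumption that $M_N(\beta, s)$ is given by the product (110) and that the normalizing power of N is $s^2/(2\beta)$, which reproduces $N^{s^2/8}$ for the CSE and $N^{s^2/2}$ for the COE. The function names are illustrative.

```python
from fractions import Fraction
from math import exp, factorial, lgamma, log


def double_factorial(m: int) -> int:
    """m!! for odd m >= 1 (returns 1 for m <= 1)."""
    value = 1
    while m > 1:
        value *= m
        m -= 2
    return value


def f_cse(k: int) -> Fraction:
    """(125): f_CSE(4k) = 2^k / ((2k-1)!! * prod_{j=1}^{2k-1} (2j-1)!!)."""
    denominator = double_factorial(2 * k - 1)
    for j in range(1, 2 * k):
        denominator *= double_factorial(2 * j - 1)
    return Fraction(2 ** k, denominator)


def f_coe(k: int) -> Fraction:
    """(131): f_COE(2k) = prod_{j=1}^{k} (2j-1)! / (2k+2j-1)!."""
    value = Fraction(1)
    for j in range(1, k + 1):
        value *= Fraction(factorial(2 * j - 1), factorial(2 * k + 2 * j - 1))
    return value


def scaled_moment(beta: int, s: float, n: int) -> float:
    """M_N(beta, s) / N^(s^2 / (2*beta)), with M_N(beta, s) taken from (110)."""
    log_m = 0.0
    for j in range(n):
        a = 1.0 + j * beta / 2.0
        log_m += lgamma(a) + lgamma(a + s) - 2.0 * lgamma(a + s / 2.0)
    return exp(log_m - s * s / (2.0 * beta) * log(n))


N = 100_000
for k in (1, 2):
    print("CSE", k, float(f_cse(k)), scaled_moment(4, 4 * k, N))
    print("COE", k, float(f_coe(k)), scaled_moment(1, 2 * k, N))
```

For k = 1 both products telescope by hand, giving $M_N(4,4) = (2N+1)(N+1)$ and $M_N(1,2) = (N+2)(N+3)/6$, so the leading coefficients 2 and 1/6 provide an independent check.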

Finally,

\[ \lim_{N\to\infty} L_N(1,s) \times N^{s^2/2} = \exp\Bigg( -\left[ 1+\gamma-\frac{3}{4}\zeta(2) \right] \frac{s^2}{2} - \sum_{n=2}^{\infty} \left( \zeta(2n-1) - \zeta(2n) + \frac{1}{2^{2n}}\zeta(2n) \right) \frac{s^{2n}}{2n} \Bigg) = \sqrt{ \frac{G(1+s)\,G(1-s)\,\Gamma(1+s)\,\Gamma(1-s)}{\Gamma(1+s/2)\,\Gamma(1-s/2)} }, \tag{132} \]

with the correct zeros of order k at ±2k and order k at ±(2k + 1), where k = 1, 2, 3, . . . .

Acknowledgements. We are grateful to Peter Sarnak, for a number of very helpful suggestions, to Brian Conrey, for several stimulating discussions, and, in particular, to Andrew Odlyzko, for giving us access to his numerical data. NCS also wishes to acknowledge generous funding from NSERC and the CFUW in Canada.

References

1. Abramowitz, M. and Stegun, I.A.: Handbook of Mathematical Functions. New York: Dover Publications, Inc., 1965
2. Aurich, R., Bolte, J. and Steiner, F.: Universal signatures of quantum chaos. Phys. Rev. Lett. 73, 1356–1359 (1994)
3. Barnes, E.W.: The theory of the G-function. Q. J. Math. 31, 264–314 (1900)
4. Berry, M.V.: Semiclassical formula for the number variance of the Riemann zeros. Nonlinearity 1, 399–407 (1988)
5. Berry, M.V. and Keating, J.P.: The Riemann zeros and eigenvalue asymptotics. SIAM Rev. 41, 236–266 (1999)
6. Bogomolny, E. and Schmit, C.: Semiclassical computations of energy levels. Nonlinearity 6, 523–547 (1993)
7. Bogomolny, E.B. and Keating, J.P.: Random matrix theory and the Riemann zeros I; three- and four-point correlations. Nonlinearity 8, 1115–1131 (1995)
8. Bogomolny, E.B. and Keating, J.P.: Gutzwiller’s trace formula and spectral statistics: Beyond the diagonal approximation. Phys. Rev. Lett. 77, 1472–1475 (1996)
9. Bogomolny, E.B. and Keating, J.P.: Random matrix theory and the Riemann zeros II; n-point correlations. Nonlinearity 9, 911–935 (1996)
10. Bohigas, O., Giannoni, M.J. and Schmit, C.: Characterization of chaotic quantum spectra and universality of level fluctuation. Phys. Rev. Lett. 52, 1–4 (1984)
11. Brézin, E. and Hikami, S.: Characteristic polynomials of random matrices. Preprint, 1999
12. Conrey, J.B. and Ghosh, A.: On mean values of the zeta-function. Mathematika 31, 159–161 (1984)
13. Conrey, J.B. and Ghosh, A.: On mean values of the zeta-function, iii. In: Proceedings of the Amalfi Conference on Analytic Number Theory, Università di Salerno, 1992
14. Conrey, J.B. and Gonek, S.M.: High moments of the Riemann zeta-function. Preprint, 1998
15. Costin, O. and Lebowitz, J.L.: Gaussian fluctuation in random matrices. Phys. Rev. Lett. 75 (1), 69–72 (1995)
16. Goldston, D.A.: On the function S(T) in the theory of the Riemann zeta-function. J. Number Theory 27, 149–177 (1987)
17. Hardy, G.H. and Littlewood, J.E.: Contributions to the theory of the Riemann zeta-function and the theory of the distribution of primes. Acta Mathematica 41, 119–196 (1918)
18. Heath-Brown, D.R.: Fractional moments of the Riemann zeta-function, ii. Quart. J. Math. Oxford (2) 44, 185–197 (1993)
19. Hughes, C.P., Keating, J.P. and O’Connell, N.: On the characteristic polynomial of a random unitary matrix. Preprint, 2000
20. Ingham, A.E.: Mean-value theorems in the theory of the Riemann zeta-function. Proc. Lond. Math. Soc. 27, 273–300 (1926)
21. Ivic, A.: Mean values of the Riemann zeta function. Bombay: Tata Institute of Fundamental Research, 1991
22. Katz, N.M. and Sarnak, P.: Random Matrices, Frobenius Eigenvalues and Monodromy. Providence, RI: AMS, 1999
23. Katz, N.M. and Sarnak, P.: Zeros of zeta functions and symmetry. Bull. Am. Math. Soc. 36, 1–26 (1999)
24. Keating, J.P.: The Riemann zeta function and quantum chaology. In: G. Casati, I. Guarneri, and U. Smilansky (eds.), Quantum Chaos. Amsterdam: North-Holland, 1993, pp. 145–85
25. Keating, J.P.: Periodic orbits, spectral statistics, and the Riemann zeros. In: I.V. Lerner, J.P. Keating, and D.E. Khmelnitskii (eds.), Supersymmetry and trace formulae: chaos and disorder. New York: Plenum, 1999, pp. 1–15
26. Keating, J.P. and Snaith, N.C.: Random matrix theory and L-functions at s = 1/2. Commun. Math. Phys. 214, 91–110 (2000)
27. Mehta, M.L.: Random Matrices. London: Academic Press, second edition, 1991
28. Montgomery, H.L.: The pair correlation of the zeta function. Proc. Symp. Pure Math. 24, 181–93 (1973)
29. Odlyzko, A.M.: The $10^{20}$th zero of the Riemann zeta function and 70 million of its neighbors. Preprint, 1989
30. Rudnick, Z. and Sarnak, P.: Zeros of principal L-functions and random-matrix theory. Duke Math. J. 81, 269–322 (1996)
31. Sarnak, P.: Quantum chaos, symmetry and zeta functions. Curr. Dev. Math., 84–115 (1997)
32. Soshnikov, A.: Level spacings distribution for large random matrices: Gaussian fluctuations. Ann. of Math. 148, 573–617 (1998)
33. Titchmarsh, E.C.: The Theory of the Riemann Zeta Function. Oxford: Clarendon Press, second edition, 1986
34. Weyl, H.: Classical Groups. Princeton: Princeton University Press, 1946
35. Wieand, K.: Eigenvalue Distributions of Random Matrices in the Permutation Group and Compact Lie Groups. PhD thesis, Harvard University, 1998

Communicated by P. Sarnak

Commun. Math. Phys. 214, 91 – 110 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Random Matrix Theory and L-Functions at s = 1/2

J. P. Keating1,2, N. C. Snaith1

1 School of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, UK
2 BRIMS, Hewlett-Packard Laboratories, Filton Road, Stoke Gifford, Bristol BS34 6QZ, UK

Received: 1 February 2000 / Accepted: 24 March 2000

Abstract: Recent results of Katz and Sarnak [8, 9] suggest that the low-lying zeros of families of L-functions display the statistics of the eigenvalues of one of the compact groups of matrices U (N ), O(N ) or U Sp(2N ). We here explore the link between the value distributions of the L-functions within these families at the central point s = 1/2 and those of the characteristic polynomials Z(U, θ) of matrices U with respect to averages over SO(2N ) and U Sp(2N ) at the corresponding point θ = 0, using techniques previously developed for U (N ) in [10]. For any matrix size N we find exact expressions for the moments of Z(U, 0) for each ensemble, and hence calculate the asymptotic (large N ) value distributions for Z(U, 0) and log Z(U, 0). The asymptotic results for the integer moments agree precisely with the few corresponding values known for L-functions. The value distributions suggest consequences for the non-vanishing of L-functions at the central point.

1. Introduction

For L-functions, as for the well-known Riemann zeta function, it is conjectured, and widely believed, that the non-trivial zeros lie on the line Re s = 1/2; this being the generalised Riemann hypothesis (GRH). Whereas high on this critical line the zeros of any given L-function appear to have the same statistical distribution as the eigenvalues of the group U(N) of (large) N × N unitary matrices endowed with Haar measure (otherwise known as the Circular Unitary Ensemble (CUE) of random matrix theory) [13, 14, 12], Katz and Sarnak [8, 9] have proposed that the low-lying zeros of families of L-functions follow the statistics of the eigenvalues not always of U(N), but in some cases of the compact groups O(N) or USp(2N). This is supported by the fact that for the function field analogue the equivalent of the Riemann Hypothesis is known to be true, and Katz and Sarnak have discovered that zeta functions over function fields have zero statistics which show exactly the behaviour just described.

Conrey and Farmer [3] have extended this idea that the low-lying zeros of families of L-functions show particular statistics to the study of the mean values of the L-functions $L_f(s)$ within families at the central point s = 1/2. They have found evidence that the symmetry type to which the low-lying zeros subscribe also determines the behaviour of these mean values. In particular, they conjecture that in general, as Q → ∞,

\[ \frac{1}{Q^*} \sum_{\substack{f\in\mathcal{F} \\ c(f)\le Q}} V(L_f(\tfrac{1}{2}))^k \sim g_k\, \frac{a(k)}{\Gamma(1+B(k))}\, (\log Q^A)^{B(k)}, \tag{1} \]

where they choose V(z) depending on the symmetry type ($V(z) = |z|^2$ for unitary symmetry and V(z) = z for the orthogonal or symplectic case); A is a symmetry-dependent constant; the family, F, over which the average is performed is considered to be partially ordered by the conductor, c(f), of each L-function; and the sum is over the $Q^*$ elements with c(f) ≤ Q. The symmetry type of the family manifests itself in the expectation that it alone determines the values of $g_k$ and B(k). These functions are thus universal, being independent of the details of the particular family in question. a(k), on the other hand, is expected to depend on the specific family involved.

Previously [10] we have studied mean values of the Riemann zeta function, which is defined by the Euler product over prime numbers or a Dirichlet sum,

\[ \zeta(s) = \prod_p \left( 1 - \frac{1}{p^s} \right)^{-1} = \sum_{n=1}^{\infty} \frac{1}{n^s}, \tag{2} \]

for Re s > 1, and by analytic continuation in the rest of the complex plane. We conjectured that the moments of ζ(1/2 + it) high on the critical line t ∈ R factor into a part which is specific to the Riemann zeta function, and a universal component which is the corresponding moment of the characteristic polynomial Z(U, θ) of matrices in U(N), defined with respect to an average over the CUE. The connection between N and the height T up the critical line corresponds to equating the mean density of eigenvalues N/2π with the mean density of zeros $\frac{1}{2\pi}\log\frac{T}{2\pi}$. This idea has subsequently been applied by Brézin and Hikami [2] to other random matrix ensembles, and by Coram and Diaconis [4] to other statistics.

Our purpose here is to extend these calculations to SO(2N) and USp(2N), and to compare the results with what is known about the L-functions. (Only SO(2N) is relevant, because a family of L-functions governed by O(N) falls approximately into two halves; one displaying even symmetry about s = 1/2, and the other odd symmetry. This latter class contributes zero to averages at the central value, while the zero statistics of the former are expected to follow those of SO(2N).) We therefore consider the value distribution of the characteristic polynomial of 2N × 2N orthogonal or unitary symplectic matrices,

\[ Z(U,\theta) = \prod_{n=1}^{N} \left( 1 - e^{i(\theta_n-\theta)} \right) \left( 1 - e^{i(-\theta_n-\theta)} \right), \tag{3} \]

averaged over these groups when θ = 0, which is the symmetry point for the eigenvalues, just as s = 1/2 is for the L-function zeros. In each case we derive an explicit expression for $\langle Z(U,0)^s \rangle$, valid for all N, and from this obtain the leading order N → ∞ asymptotics, together with a simplified formula when s is an integer. We also derive the value

distributions of log Z(U, 0) and Z(U, 0). Comparing with results for various families of L-functions suggests that, as for ζ(s), random matrix theory determines the universal part of (1). This then provides further support for the programs of Katz and Sarnak, and Conrey and Farmer.

2. Symplectic Symmetry

2.1. Random matrices in USp(2N). We are interested here in the group of symplectic unitary matrices, USp(2N). These are 2N × 2N matrices, U, with $UU^{\dagger} = 1$ and $U^t J U = J$, where $J = \begin{pmatrix} 0 & I_N \\ -I_N & 0 \end{pmatrix}$ and $I_N$ is the N × N identity matrix. For these matrices, the eigenvalues lie on the unit circle and come in complex conjugate pairs. Thus the characteristic polynomial related to such a matrix with eigenvalues $e^{i\theta_1}, e^{-i\theta_1}, e^{i\theta_2}, e^{-i\theta_2}, \ldots, e^{i\theta_N}, e^{-i\theta_N}$ takes the form (3).

Our first step will be to calculate the moments of Z(U, 0) defined by averaging with respect to Haar measure. The joint probability density function of the eigenvalues is thus [17]

\[ N_{Sp} \prod_{1\le i<j\le N} \left( \sin\frac{\theta_i-\theta_j}{2}\, \sin\frac{\theta_i+\theta_j}{2} \right)^2 \prod_{k=1}^{N} \sin^2\theta_k = N_{Sp} \prod_{1\le i<j\le N} \left( \frac{1}{2}(\cos\theta_j - \cos\theta_i) \right)^2 \prod_{k=1}^{N} \sin^2\theta_k, \tag{4} \]

with normalization constant

\[ N_{Sp} = 2^{-3N} \prod_{j=1}^{N} \frac{\Gamma(1+N+j)}{\Gamma(1+j)\,(\Gamma(1/2+j))^2} = \frac{2^{2N^2-2N}}{\pi^N N!}. \tag{5} \]

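Noting that

\[ Z(U,0) = \prod_{n=1}^{N} \left| 1 - e^{i\theta_n} \right|^2 = 2^{2N} \prod_{n=1}^{N} \sin^2(\theta_n/2) = 2^{N} \prod_{n=1}^{N} (1 - \cos\theta_n), \tag{6} \]

we proceed to

\[ \langle Z(U,0)^s \rangle_{USp(2N)} = N_{Sp}\, 2^{Ns} \int_0^{2\pi} \cdots \int_0^{2\pi} d\theta_1 \cdots d\theta_N \prod_{1\le i<j\le N} \left( \frac{1}{2}(\cos\theta_j - \cos\theta_i) \right)^2 \prod_{k=1}^{N} \sin^2\theta_k \prod_{n=1}^{N} (1-\cos\theta_n)^s \]

\[ = N_{Sp}\, 2^{Ns+2N-N^2} \int_0^{\pi} \cdots \int_0^{\pi} d\theta_1 \cdots d\theta_N \prod_{1\le i<j\le N} (\cos\theta_j - \cos\theta_i)^2 \prod_{k=1}^{N} \sin^2\theta_k \prod_{n=1}^{N} (1-\cos\theta_n)^s, \tag{7} \]

which, after the transformation $x_j = \cos\theta_j$, becomes

\[ \langle Z(U,0)^s \rangle_{USp(2N)} = N_{Sp}\, 2^{Ns+2N-N^2} \int_{-1}^{1} \cdots \int_{-1}^{1} dx_1 \cdots dx_N \prod_{1\le i<j\le N} (x_j - x_i)^2 \prod_{k=1}^{N} (1-x_k)^{1/2+s} (1+x_k)^{1/2}. \tag{8} \]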

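There is a form of Selberg’s integral (detailed in [11]) which states that

\[ \int_{-1}^{1} \cdots \int_{-1}^{1} \prod_{1\le j<l\le n} |x_j - x_l|^{2\gamma} \prod_{j=1}^{n} (1-x_j)^{\alpha-1} (1+x_j)^{\beta-1}\, dx_j = 2^{\gamma n(n-1)+n(\alpha+\beta-1)} \prod_{j=0}^{n-1} \frac{\Gamma(1+\gamma+j\gamma)\,\Gamma(\alpha+j\gamma)\,\Gamma(\beta+j\gamma)}{\Gamma(1+\gamma)\,\Gamma(\alpha+\beta+\gamma(n+j-1))}, \tag{9} \]

if Re α > 0, Re β > 0 and $\mathrm{Re}\,\gamma > -\min\left( \frac{1}{n}, \frac{\mathrm{Re}\,\alpha}{n-1}, \frac{\mathrm{Re}\,\beta}{n-1} \right)$. In our case γ = 1, α = 3/2 + s and β = 3/2, so

\[ \langle Z(U,0)^s \rangle_{USp(2N)} = N_{Sp}\, 2^{Ns+2N-N^2}\, 2^{N^2+N+Ns} \prod_{j=0}^{N-1} \frac{\Gamma(2+j)\,\Gamma(3/2+s+j)\,\Gamma(3/2+j)}{\Gamma(2)\,\Gamma(3+s+N+j-1)} \]

\[ = 2^{2Ns} \prod_{j=1}^{N} \frac{\Gamma(1+N+j)\,\Gamma(1/2+s+j)}{\Gamma(1/2+j)\,\Gamma(1+s+N+j)} \equiv M_{Sp}(N,s). \tag{10} \]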

We now consider the coefficients $c_j$ in the expansion

\[ M_{Sp}(N,s) = e^{c_1 s + c_2 s^2/2 + c_3 s^3/3! + c_4 s^4/4! + \cdots}. \tag{11} \]

These coefficients are the cumulants of the distribution of log Z(U, 0), because $M_{Sp}(N,s)$ is the generating function for the moments of this distribution:

\[ M_{Sp}(N,s) = \sum_{j=0}^{\infty} \langle (\log Z(U,0))^j \rangle_{USp(2N)}\, \frac{s^j}{j!}. \tag{12} \]

Since

\[ \log M_{Sp}(N,s) = 2sN\log 2 + \sum_{j=1}^{N} \big( \log\Gamma(1/2+s+j) + \log\Gamma(1+N+j) - \log\Gamma(1/2+j) - \log\Gamma(1+s+N+j) \big), \tag{13} \]

we find that

\[ c_1 = \left. \frac{d}{ds}\log M_{Sp}(N,s) \right|_{s=0} = 2N\log 2 + \sum_{j=1}^{N} \big( \psi(1/2+j) - \psi(1+N+j) \big) \tag{14} \]

is the first cumulant, while the higher ones are given by

\[ c_n = \left. \frac{d^n}{ds^n}\log M_{Sp}(N,s) \right|_{s=0} = \sum_{j=1}^{N} \left( \psi^{(n-1)}(1/2+j) - \psi^{(n-1)}(1+N+j) \right), \tag{15} \]

where $\psi^{(j)}(z) = \frac{d^{j+1}}{dz^{j+1}}\log\Gamma(z)$ is a polygamma function. We seek the behaviour of these cumulants for large N. For the first we use the asymptotic formula

\[ \psi(z) \sim \log z - \frac{1}{2z} - \sum_{n=1}^{\infty} \frac{B_{2n}}{2n z^{2n}}, \tag{16} \]

which holds when z → ∞ with |arg z| < π, where $B_{2n}$ are the Bernoulli numbers. Also, we need the integral form of the digamma function,

\[ \psi(z) + \gamma = \int_0^{\infty} \frac{e^{-t} - e^{-zt}}{1-e^{-t}}\, dt. \tag{17} \]

Applying (17), we obtain

\[ c_1 = 2N\log 2 + \sum_{j=1}^{N} \left[ \int_0^{\infty} \frac{e^{-t} - e^{-(j+1/2)t}}{1-e^{-t}}\, dt - \gamma - \int_0^{\infty} \frac{e^{-t} - e^{-(j+N+1)t}}{1-e^{-t}}\, dt + \gamma \right] \]

\[ = 2N\log 2 + \sum_{j=1}^{N} \int_0^{\infty} \frac{e^{-(j+N+1)t} - e^{-(j+1/2)t}}{1-e^{-t}}\, dt. \tag{18} \]

We now interchange the summation and integration and perform the sum explicitly so that we can integrate by parts to arrive at

\[ c_1 = 2N\log 2 - (N+1) \int_0^{\infty} \frac{e^{-(N+2)t}}{1-e^{-t}}\, dt + (2N+1) \int_0^{\infty} \frac{e^{-(2N+2)t}}{1-e^{-t}}\, dt + \frac{1}{2} \int_0^{\infty} \frac{e^{-3t/2}}{1-e^{-t}}\, dt - (N+1/2) \int_0^{\infty} \frac{e^{-(N+3/2)t}}{1-e^{-t}}\, dt. \]

96

J. P. Keating, N. C. Snaith

Converting back to polygamma function notation via (17), then applying (16) as $N$ becomes large, we see that

$$\begin{aligned}
c_1 &= 2N\log 2 + (N+1)\psi(N+2) - (2N+1)\psi(2N+2) - \tfrac{1}{2}\psi(3/2) + (N+1/2)\psi(N+3/2) \\
&= \tfrac{1}{2}\log N + \tfrac{\gamma}{2} + O(N^{-1}).
\end{aligned} \tag{19}$$
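The large-$N$ form in (19) can be compared with the exact sum (14). The sketch below is only an illustration under stated assumptions: it uses the standard recurrence $\psi(z+1) = \psi(z) + 1/z$ together with the special values $\psi(1/2) = -\gamma - 2\log 2$ and $\psi(1) = -\gamma$, so no external library is needed. The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant, called gamma in the text

def c1_exact(n: int) -> float:
    """First cumulant from (14): 2N log 2 + sum_j [psi(1/2 + j) - psi(1 + N + j)]."""
    psi_half = -EULER_GAMMA - 2.0 * math.log(2.0)                       # psi(1/2)
    psi_int = -EULER_GAMMA + sum(1.0 / k for k in range(1, n + 2))      # psi(N + 2)
    total = 2.0 * n * math.log(2.0)
    for j in range(1, n + 1):
        psi_half += 2.0 / (2 * j - 1)      # psi(j + 1/2) via psi(z + 1) = psi(z) + 1/z
        total += psi_half - psi_int        # psi_int currently equals psi(N + 1 + j)
        psi_int += 1.0 / (n + 1 + j)       # advance to psi(N + 2 + j)
    return total

def c1_asymptotic(n: int) -> float:
    """Leading large-N form from (19), with the O(1/N) remainder dropped."""
    return 0.5 * math.log(n) + 0.5 * EULER_GAMMA

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, c1_exact(n), c1_asymptotic(n))
```

The difference between the two columns shrinks roughly like $1/N$, as the error term in (19) suggests.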

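For the second cumulant we need to use the asymptotic formula for higher polygamma functions, valid as $z \to \infty$ with $|\arg z| < \pi$,

$$\psi^{(n)}(z) \sim (-1)^{n-1} \left[ \frac{(n-1)!}{z^n} + \frac{n!}{2z^{n+1}} + \sum_{k=1}^{\infty} B_{2k} \frac{(2k+n-1)!}{(2k)!\, z^{2k+n}} \right]. \tag{20}$$

There is also an integral formula for the higher polygamma functions which will prove useful:

$$\psi^{(n)}(z) = (-1)^{n-1} \int_0^{\infty} \frac{t^n e^{-zt}}{1 - e^{-t}}\, dt. \tag{21}$$

This leads us to

$$\begin{aligned}
c_2 &= \sum_{j=1}^{N} \left[ \psi^{(1)}(j+1/2) - \psi^{(1)}(1+N+j) \right] \\
&= \sum_{j=1}^{N} \left[ \int_0^{\infty} \frac{t e^{-(j+1/2)t}}{1 - e^{-t}}\, dt - \int_0^{\infty} \frac{t e^{-(1+N+j)t}}{1 - e^{-t}}\, dt \right].
\end{aligned} \tag{22}$$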

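Again we interchange the order of the summation and integration, perform the sum and integrate by parts. The result, expressed in terms of polygamma functions, is

$$\begin{aligned}
c_2 &= -\psi(3/2) - \tfrac{1}{2}\psi^{(1)}(3/2) + \psi(N+3/2) + (N+1/2)\psi^{(1)}(N+3/2) \\
&\quad + \psi(N+2) + (N+1)\psi^{(1)}(N+2) - \psi(2N+2) - (2N+1)\psi^{(1)}(2N+2) \\
&= \log N + 1 + \gamma + \log 2 - \tfrac{3}{2}\zeta(2) + O(N^{-1}).
\end{aligned} \tag{23}$$

The higher cumulants follow in a similar manner:

$$\begin{aligned}
c_n &= \sum_{j=1}^{N} \left[ (-1)^n \int_0^{\infty} \frac{t^{n-1} e^{-(1/2+j)t}}{1 - e^{-t}}\, dt - (-1)^n \int_0^{\infty} \frac{t^{n-1} e^{-(1+N+j)t}}{1 - e^{-t}}\, dt \right] \\
&= (-1)^n \int_0^{\infty} \frac{t^{n-1} e^{-(3/2)t}}{1 - e^{-t}}\, \frac{1 - e^{-Nt}}{1 - e^{-t}}\, dt - (-1)^n \int_0^{\infty} \frac{t^{n-1} e^{-(2+N)t}}{1 - e^{-t}}\, \frac{1 - e^{-Nt}}{1 - e^{-t}}\, dt,
\end{aligned} \tag{24}$$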


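and so

$$\begin{aligned}
\lim_{N\to\infty} c_n &= (-1)^n \int_0^{\infty} \frac{t^{n-1} e^{-(3/2)t}}{(1 - e^{-t})(1 - e^{-t})}\, dt \\
&= (-1)^n \left[ \left. -\frac{t^{n-1} e^{-(3/2)t}}{1 - e^{-t}} \right|_0^{\infty} + (n-1) \int_0^{\infty} \frac{t^{n-2} e^{-(1/2)t}}{1 - e^{-t}}\, dt - \frac{1}{2} \int_0^{\infty} \frac{t^{n-1} e^{-(1/2)t}}{1 - e^{-t}}\, dt \right] \\
&= -(n-1)\psi^{(n-2)}(1/2) - \tfrac{1}{2}\psi^{(n-1)}(1/2) \\
&= (-1)^n (n-1)! \left[ (2^{n-1} - 1)\zeta(n-1) - \tfrac{1}{2}(2^n - 1)\zeta(n) \right].
\end{aligned} \tag{25}$$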

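These expressions for the cumulants, inserted into (11), allow us to write the leading order coefficient of the moment $M_{Sp}(N,s)$ as

$$\begin{aligned}
f_{Sp}(s) &\equiv \lim_{N\to\infty} \frac{M_{Sp}(N,s)}{N^{s/2+s^2/2}} \\
&= \exp\left( \frac{\gamma}{2} s + \left( 1 + \gamma + \log 2 - \tfrac{3}{2}\zeta(2) \right) \frac{s^2}{2} + \sum_{n=3}^{\infty} \left[ (-1)^n (2^{n-1} - 1)\zeta(n-1) - (-1)^n \tfrac{1}{2}(2^n - 1)\zeta(n) \right] \frac{s^n}{n} \right).
\end{aligned} \tag{26}$$

This coefficient can be expressed as a combination of gamma functions and the Barnes G-function [1, 16], which is defined by

$$G(1+z) = (2\pi)^{z/2} e^{-[(1+\gamma)z^2 + z]/2} \prod_{n=1}^{\infty} \left[ (1 + z/n)^n e^{-z + z^2/(2n)} \right], \tag{27}$$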

and has zeros at the negative integers, $-n$, with multiplicity $n$ ($n = 1, 2, 3, \ldots$). Other properties useful to us are that

$$G(1) = 1, \qquad G(z+1) = \Gamma(z)\, G(z), \tag{28}$$

and furthermore, for $|z| < 1$,

$$\log G(1+z) = (\log(2\pi) - 1)\frac{z}{2} - (1+\gamma)\frac{z^2}{2} + \sum_{n=3}^{\infty} (-1)^{n-1} \zeta(n-1) \frac{z^n}{n}. \tag{29}$$

Combining this with

$$\log\Gamma(1+z) = -\gamma z + \sum_{n=2}^{\infty} \zeta(n) \frac{(-z)^n}{n}, \tag{30}$$


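which holds for $|z| < 1$, we see that, for $|s| < 1/2$,

$$\begin{aligned}
&\log G(1+s) - \tfrac{1}{2}\log G(1+2s) - \tfrac{1}{2}\log\Gamma(1+2s) + \tfrac{1}{2}\log\Gamma(1+s) \\
&\quad = \frac{\gamma}{2} s + (1+\gamma)\frac{s^2}{2} - \frac{3}{4}\zeta(2) s^2 + \sum_{n=3}^{\infty} \left[ (-1)^n (2^{n-1} - 1)\zeta(n-1) - (-1)^n \tfrac{1}{2}(2^n - 1)\zeta(n) \right] \frac{s^n}{n}.
\end{aligned} \tag{31}$$

A comparison with (26) shows that

$$f_{Sp}(s) = 2^{s^2/2} \times \frac{G(1+s)\sqrt{\Gamma(1+s)}}{\sqrt{G(1+2s)\,\Gamma(1+2s)}}, \tag{32}$$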

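for $|s| < 1/2$, and hence by analytic continuation for all $s$. For integer moments the formula is simpler. Using (28) we see that

$$G(n) = \prod_{j=1}^{n-1} \Gamma(j), \tag{33}$$

and so for integer $n$,

$$\begin{aligned}
f_{Sp}(n) &= 2^{n^2/2}\, \frac{G(1+n)\sqrt{\Gamma(1+n)}}{\sqrt{G(1+2n)\,\Gamma(1+2n)}} \\
&= 2^{n^2/2}\, \frac{\sqrt{\Gamma(1+n)}\, \prod_{j=1}^{n} \Gamma(j)}{\sqrt{\prod_{j=1}^{2n} \Gamma(j)\; \Gamma(1+2n)}} \\
&= 2^{n^2/2}\, \frac{\sqrt{n!}\, \prod_{j=1}^{n} \Gamma(j)}{\sqrt{\prod_{j=1}^{2n+1} (j-1)!}} \\
&= 2^{n^2/2}\, \frac{\sqrt{n!}\, \prod_{j=1}^{n} (j-1)!}{\sqrt{2^{2n-1}\, 3^{2n-2}\, 4^{2n-3} \cdots (2n-1)^2\, 2n}} \\
&= 2^{n^2/2}\, \frac{\sqrt{1}\sqrt{2} \cdots \sqrt{n-1}\sqrt{n}\; \prod_{j=1}^{n} (j-1)!}{\sqrt{2}\sqrt{4} \cdots \sqrt{2n}\; \sqrt{2^{2n-2}\, 3^{2n-2}\, 4^{2n-4} \cdots (2n-2)^2 (2n-1)^2}} \\
&= 2^{n^2/2}\, \frac{1^{n-1}\, 2^{n-2} \cdots (n-2)^2 (n-1)}{2^{n/2}\; 2^{n-1}\, 3^{n-1}\, 4^{n-2}\, 5^{n-2} \cdots (2n-2)(2n-1)} \\
&= 2^{n^2/2}\, \frac{1}{2^{n/2}\; 2^{\sum_{j=1}^{n-1} j}\; \prod_{j=1}^{n} (2j-1)!!} \\
&= \left[ \prod_{j=1}^{n} (2j-1)!! \right]^{-1}.
\end{aligned} \tag{34}$$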

Following the ideas developed in [10], these integer coefficients have also been calculated independently by Brézin and Hikami [2].
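The reduction from (32) to (34) can be checked numerically at small integers. The following sketch is an illustration only: it evaluates the Barnes G-function at integer arguments through the product (33), and the helper names (`log_barnes_g`, `f_sp_barnes`, `f_sp_product`) are hypothetical.

```python
import math

def log_barnes_g(n: int) -> float:
    """log G(n) for integer n >= 1, using G(n) = prod_{j=1}^{n-1} Gamma(j) from (33)."""
    return sum(math.lgamma(j) for j in range(1, n))

def f_sp_barnes(n: int) -> float:
    """f_Sp(n) evaluated from the Barnes-G expression (32) at integer n."""
    log_f = (0.5 * n * n * math.log(2.0)
             + log_barnes_g(1 + n) + 0.5 * math.lgamma(1 + n)
             - 0.5 * log_barnes_g(1 + 2 * n) - 0.5 * math.lgamma(1 + 2 * n))
    return math.exp(log_f)

def double_factorial(m: int) -> int:
    """m!! for odd m >= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def f_sp_product(n: int) -> float:
    """f_Sp(n) from the closed form (34): 1 / prod_{j=1}^{n} (2j-1)!!."""
    prod = 1
    for j in range(1, n + 1):
        prod *= double_factorial(2 * j - 1)
    return 1.0 / prod

if __name__ == "__main__":
    for n in range(1, 6):
        print(n, f_sp_barnes(n), f_sp_product(n))
```

The two columns should agree to floating-point accuracy; working in logarithms avoids overflow in $G(1+2n)$ for larger $n$.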


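Having the generating function, $M_{Sp}(N,s)$, it is a short step to find the value distributions of both $\log Z(U,0)$ and $Z(U,0)$ itself. The distribution of $\log Z(U,0)/\log N$ is

$$\begin{aligned}
\langle \delta(x - \log Z(U,0)/\log N) \rangle_{USp(2N)} &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \left\langle e^{-iy(x - \log Z(U,0)/\log N)} \right\rangle_{USp(2N)} dy \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iyx} M_{Sp}(N, iy/\log N)\, dy
\end{aligned} \tag{35}$$

$$\begin{aligned}
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iyx}\, e^{c_1 iy/\log N + c_2 (iy/\log N)^2/2 + c_3 (iy/\log N)^3/3! + \cdots}\, dy \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iyx} \exp\left[ \left( \tfrac{1}{2}\log N + O(1) \right) \frac{iy}{\log N} - (\log N + O(1)) \frac{y^2}{2(\log N)^2} - i\,O(1) \frac{y^3}{3!(\log N)^3} + \cdots \right] dy.
\end{aligned} \tag{36}$$

Thus

$$\lim_{N\to\infty} \langle \delta(x - \log Z(U,0)/\log N) \rangle_{USp(2N)} = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iyx + iy/2}\, dy = \delta(x - 1/2), \tag{37}$$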

and so the distribution of values of $\log Z(U,0)/\log N$ tends to a delta function centred at $x = 1/2$. If we instead retain the $y^2$ term in the exponent in (36), we have the central limit theorem

$$\lim_{N\to\infty} \left\langle \delta\left( x + \frac{c_1}{\sqrt{c_2}} - \frac{\log Z(U,0)}{\sqrt{c_2}} \right) \right\rangle_{USp(2N)} = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right), \tag{38}$$

where $c_1$ and $c_2$ are related to $N$ by (19) and (23), respectively. For finite $N$, the exact distribution is of course given by (35), where $M_{Sp}(N,s)$ is defined by (10).

It is not difficult to determine as well the distribution of the values of $Z(U,0)^{1/\log N}$. Changing variables in (35) results in

$$\left\langle \delta\left( x - Z(U,0)^{1/\log N} \right) \right\rangle_{USp(2N)} = \frac{1}{2\pi x} \int_{-\infty}^{\infty} x^{-iy} M_{Sp}(N, iy/\log N)\, dy, \tag{39}$$

and so

$$\lim_{N\to\infty} \left\langle \delta\left( x - Z(U,0)^{1/\log N} \right) \right\rangle_{USp(2N)} = \frac{1}{2\pi x} \int_{-\infty}^{\infty} e^{-iy\log x} e^{iy/2}\, dy = \frac{1}{x}\, \delta(\log x - 1/2). \tag{40}$$

Alternatively, we can examine the value distribution of $Z(U,0)$. Denoting it by $P_{Sp}(N,x)$, we see that

$$P_{Sp}(N,x) = \frac{1}{2\pi x} \int_{-\infty}^{\infty} x^{-iy} M_{Sp}(N, iy)\, dy. \tag{41}$$


Although $P_{Sp}(N,x)$ does not have a limiting distribution as $N \to \infty$, we suggest the approximation

$$\begin{aligned}
P_{Sp}(N,x) &\approx \frac{1}{2\pi x} \int_{-\infty}^{\infty} \exp\left( -iy\log x + i c_1 y - \frac{c_2 y^2}{2} \right) dy \\
&= \frac{1}{x\sqrt{2\pi c_2}} \exp\left( -\frac{(\log x - c_1)^2}{2c_2} \right),
\end{aligned} \tag{42}$$

and plot it, for two values of $N$, in Fig. 1 along with the exact distribution (41). It should be noted that the approximation (42) is valid when $x$ is fixed and $N \to \infty$, and more generally is expected to be a good approximation when $\log x \gg -\log N$ and $N$ is large, the lower bound being determined by the first pole of $M_{Sp}(N,s)$.
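The approximation (42) is a log-normal density in $x$. As a rough illustration, the sketch below evaluates it with only the leading large-$N$ forms of $c_1$ and $c_2$ from (19) and (23), dropping the $O(1/N)$ terms (a simplification made here, not in the paper), and checks by trapezoidal quadrature in the variable $u = \log x$ that it integrates to one. All function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def lognormal_density(x: float, c1: float, c2: float) -> float:
    """Right-hand side of (42): Gaussian in log x with mean c1 and variance c2."""
    return (math.exp(-(math.log(x) - c1) ** 2 / (2.0 * c2))
            / (x * math.sqrt(2.0 * math.pi * c2)))

def leading_cumulants_sp(n: int):
    """Leading large-N forms of c1 and c2 from (19) and (23), O(1/N) terms dropped."""
    zeta2 = math.pi ** 2 / 6.0
    c1 = 0.5 * math.log(n) + 0.5 * EULER_GAMMA
    c2 = math.log(n) + 1.0 + EULER_GAMMA + math.log(2.0) - 1.5 * zeta2
    return c1, c2

def total_mass(c1: float, c2: float, steps: int = 4000, width: float = 10.0) -> float:
    """Trapezoidal integral of the density over x, carried out in u = log x."""
    sigma = math.sqrt(c2)
    lo, hi = c1 - width * sigma, c1 + width * sigma
    h = (hi - lo) / steps
    acc = 0.0
    for i in range(steps + 1):
        u = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        acc += weight * lognormal_density(math.exp(u), c1, c2) * math.exp(u)
    return acc * h

if __name__ == "__main__":
    for n in (6, 42):
        c1, c2 = leading_cumulants_sp(n)
        print(n, c1, c2, total_mass(c1, c2))
```

Integrating in $\log x$ rather than $x$ is a design choice: the integrand becomes an ordinary Gaussian, so a uniform grid resolves it well.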

[Figure 1: panels (a) and (b); vertical axis P(x), from 0 to 0.2; horizontal axis x, from 0 to 8; curves labelled "exact" and "asymptotic".]

Fig. 1. Distribution of the values of Z(U, 0) for matrices in U Sp(2N ), (a) N = 6, (b) N = 42. The solid curve is the exact distribution (41) and the dashed curve is the large N approximation (42)

It may be seen from Fig. 1 that $P_{Sp}(N,0) = 0$. Although the approximation (42) also tends to zero as $x \to 0$, it does not predict the correct rate of approach. This may instead be obtained by examining the poles of the integrand of

$$P_{Sp}(N,x) = \frac{1}{2\pi x} \int_{-\infty}^{\infty} x^{-iy}\, 2^{2iNy} \prod_{j=1}^{N} \frac{\Gamma(1+N+j)\,\Gamma(1/2+iy+j)}{\Gamma(1/2+j)\,\Gamma(1+iy+N+j)}\, dy. \tag{43}$$

These poles occur at the points $y = i(2k+1)/2$ and are of order $k$, for $k = 1, 2, \ldots, N$, then of order $N$ for all higher $k$. Due to the factor $x^{-iy}$ it is evident that in the limit $x \to 0$, the lowest pole, that at $y = (3i)/2$, gives the dominant contribution. From the residue at that lowest pole we thus find that as $x \to 0$,

$$P_{Sp}(N,x) \sim x^{1/2}\, 2^{-3N}\, \frac{1}{\Gamma(N)} \prod_{j=1}^{N} \frac{\Gamma(1+N+j)\,\Gamma(j)}{\Gamma(1/2+j)\,\Gamma(N+j-1/2)}. \tag{44}$$

2.2. L-functions with symplectic symmetry. In the Introduction we gave a brief description of the mean values at s = 1/2 for families of L-functions and the relation of these to the symmetry type displayed by the low-lying zeros. Here we consider the case of symplectic symmetry in more depth.


If we again use Conrey and Farmer's notation, as in (1), then in the symplectic case they have $V(z) = z$ and find that $B(k) = \frac{1}{2}k(k+1)$ [3]. They also list several families which are conjectured to have low-lying zeros with symplectic symmetry, the simplest of which is the Dirichlet L-functions, $L(s, \chi_d)$, where $\chi_d$ is a quadratic Dirichlet character. The sum (1) is then over all such characters with conductor $|d| \le D$: as $D \to \infty$,

$$\frac{1}{D^*} \sum_{|d| \le D} L\left( \tfrac{1}{2}, \chi_d \right)^k \sim g_k\, \frac{a(k)}{\Gamma\left( 1 + \frac{1}{2}k(k+1) \right)}\, \left( \log D^{1/2} \right)^{\frac{1}{2}k(k+1)}, \tag{45}$$

where $D^*$ is the number of quadratic characters included in the sum. For this case the first few values of $g_k$ for integer $k$ have been found using number-theoretic techniques to be [7, 15, 3] $g_1 = 1$, $g_2 = 2$, $g_3 = 2^4$ and, by conjecture, $g_4 = 3 \cdot 2^8$. It might be expected that $g_k$ is related to the random matrix moment values calculated in Sect. 2.1, since it is believed to be purely symmetry-determined. Our purpose now is to provide evidence in favour of this. Making the identification

$$N = \log(Q^A), \tag{46}$$

and recalling that as $N \to \infty$, $M_{Sp}(N,k) \sim f_{Sp}(k)\, N^{\frac{1}{2}k(k+1)}$, we conjecture that for symplectic families of L-functions

$$\frac{g_k}{\Gamma\left( 1 + \frac{1}{2}k(k+1) \right)} = f_{Sp}(k). \tag{47}$$
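Relation (47) can be tested against the quoted values of $g_k$ with exact rational arithmetic. The sketch below assumes the closed form $f_{Sp}(n) = [\prod_{j=1}^{n}(2j-1)!!]^{-1}$ from (34) and uses $\Gamma(1 + m) = m!$ for integer $m$. The function names are illustrative only.

```python
from fractions import Fraction
from math import factorial

def double_factorial(m: int) -> int:
    """m!! for odd m >= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def f_sp(n: int) -> Fraction:
    """f_Sp(n) = 1 / prod_{j=1}^{n} (2j-1)!!, the integer-moment coefficient (34)."""
    prod = 1
    for j in range(1, n + 1):
        prod *= double_factorial(2 * j - 1)
    return Fraction(1, prod)

def g_sp(k: int) -> Fraction:
    """g_k predicted by (47): Gamma(1 + k(k+1)/2) * f_Sp(k)."""
    return factorial(k * (k + 1) // 2) * f_sp(k)

if __name__ == "__main__":
    for k in range(1, 5):
        print(k, f_sp(k), g_sp(k))
```

For $k = 1, \ldots, 4$ this reproduces $g_k = 1, 2, 2^4, 3 \cdot 2^8$, the values quoted from the number-theoretic literature.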

Following the arguments of [10], the relation between N and Q should arise from equating the mean densities of zeros. For the L-functions we need the density near s = 1/2 because we are dealing with the L-functions just at this point. In the case of L-functions with quadratic Dirichlet characters, (45), the mean density at a fixed height up the critical line increases like $\frac{1}{2\pi}\log|d|$ as $|d| \to \infty$. Since the mean density of eigenvalues of a matrix in USp(2N) is $N/\pi$, we equate $N = (1/2)\log D$, and obtain exactly the proposed relation, since A = 1/2 in this case.

It is then striking that the first few values of $f_{Sp}$ at the integers, $f_{Sp}(1) = 1$, $f_{Sp}(2) = \frac{1}{3}$, $f_{Sp}(3) = \frac{1}{45}$ and $f_{Sp}(4) = \frac{1}{4725}$, agree precisely, via (47), with the values that Conrey and Farmer report for the symplectic L-functions.

Further evidence in favour of (47) is the success of a very similar conjecture relating moments of the Riemann zeta function to averages over U(N) [10]. The only difference is that in the case of ζ(s) the average was along the critical line rather than over a family of functions. This is not a significant difference, however, and Conrey and Farmer in fact suggest that we think of the Riemann zeta function as a unitary family (with zeros showing the statistics of the eigenvalues of matrices from U(N)) in its own right, where we are averaging over special values of the family {ζ(1/2 + it)} as t ranges over the real numbers.

The validity of the conjecture (47) would imply many results on the value distribution of the central values of symplectic L-functions. The distribution for the logarithm of symplectic families of L-functions, for example, is expected to behave for asymptotically large Q in the same way as that of the characteristic polynomial Z, always remembering


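that N must be related to the L-function parameter via the density of zeros. This is because the conjecture (47) can also be written as

$$\lim_{Q\to\infty} \frac{1}{(\log Q^A)^{\frac{1}{2}k(k+1)}}\, \frac{1}{Q^*} \sum_{\substack{f \in F \\ c(f) \le Q}} L_f\left( \tfrac{1}{2} \right)^k = a(k) \times \lim_{N\to\infty} \frac{M_{Sp}(N,k)}{N^{\frac{1}{2}k(k+1)}}, \tag{48}$$

so the value distribution of $\log L_f(\frac{1}{2})/\log\log Q^A$ defined by averages with $c(f) \le Q$, would be, for large Q and making the identification (46),

$$V_{Sp}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ixy}\, a(iy/\log N)\, M_{Sp}(N, iy/\log N)\, dy, \tag{49}$$

leading to

$$\lim_{N\to\infty} V_{Sp}(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} a(0)\, e^{-ixy + iy/2}\, dy. \tag{50}$$

Since $a(0) = 1$, we see that this would imply that the distribution of $\log L_f(\frac{1}{2})/\log\log Q^A$ is asymptotic to $\delta(x - 1/2)$, in just the same way as for $\log Z(U,0)/\log N$. Following the same line of argument, we suggest that

$$\begin{aligned}
\lim_{Q\to\infty} \left\langle \delta\left( x + \frac{\tilde{c}_1}{\sqrt{\tilde{c}_2}} - \frac{\log L_f(\frac{1}{2})}{\sqrt{\tilde{c}_2}} \right) \right\rangle_F
&= \lim_{N\to\infty} \frac{1}{2\pi} \int_{-\infty}^{\infty} a\left( \frac{iy}{\sqrt{c_2}} \right) e^{-iyx - iyc_1/\sqrt{c_2}}\, e^{iyc_1/\sqrt{c_2} - y^2/2 + c_3 (iy)^3/(c_2^{3/2}\, 3!) + \cdots}\, dy \\
&= \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right),
\end{aligned} \tag{51}$$

where $\langle \cdot \rangle_F$ denotes an average over a family F of L-functions, as in (1), and $\tilde{c}_1$ and $\tilde{c}_2$ are given by (19) and (23), respectively, again with the identification (46). If we now turn to the distribution of values of $L_f(\frac{1}{2})$ itself, $W_{Sp}(x)$, we can close the contour of

$$W_{Sp}(x) = \frac{1}{2\pi x} \int_{-\infty}^{\infty} x^{-is}\, a(is)\, M_{Sp}(N, is)\, ds \tag{52}$$

around the poles and obtain, as $x \to 0$, the dominant contribution from the pole at $s = (3i)/2$:

$$W_{Sp}(x) \sim x^{1/2}\, a(-3/2)\, 2^{-3N}\, \frac{1}{\Gamma(N)} \prod_{j=1}^{N} \frac{\Gamma(1+N+j)\,\Gamma(j)}{\Gamma(1/2+j)\,\Gamma(N+j-1/2)}. \tag{53}$$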

This is of particular note in the light of recent interest in the non-vanishing of the central values of L-functions, see for example [15, 6, 5] and references therein. Clearly (53) implies that as long as $a(-3/2)$ is finite for a particular family of symplectic L-functions, the probability that the central value of those L-functions lies in the range $(0, x)$ decreases like $x^{3/2}$ as $x \to 0$.


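3. Orthogonal Symmetry

3.1. Random matrices in SO(2N). We now consider the characteristic polynomial of matrices from the group SO(2N). Here the eigenphases also come in complex conjugate pairs, so Z(U, θ) takes the form (3), and the average at the point θ = 0 is, once again using the joint probability density function for the eigenphases dictated by Haar measure [17],

$$\begin{aligned}
\langle Z(U,0)^s \rangle_{SO(2N)} &= N_O \int_0^{2\pi} \cdots \int_0^{2\pi} d\theta_1 \cdots d\theta_N \prod_{1 \le i < j \le N} \left[ \tfrac{1}{2} (\cos\theta_j - \cos\theta_i) \right]^2 \times 2^{Ns} \prod_{n=1}^{N} (1 - \cos\theta_n)^s \\
&= N_O\, 2^{2N - N^2 + Ns} \int_0^{\pi} \cdots \int_0^{\pi} d\theta_1 \cdots d\theta_N \prod_{1 \le i < j \le N} (\cos\theta_j - \cos\theta_i)^2 \prod_{n=1}^{N} (1 - \cos\theta_n)^s \\
&= N_O\, 2^{2N - N^2 + Ns} \int_{-1}^{1} \cdots \int_{-1}^{1} dx_1 \cdots dx_N \prod_{1 \le i < j \le N} (x_j - x_i)^2 \prod_{n=1}^{N} (1 - x_n^2)^{-1/2} (1 - x_n)^s,
\end{aligned} \tag{54}$$

with a normalization constant

$$N_O = 2^{-N} \prod_{j=1}^{N} \frac{\Gamma(N+j-1)}{\Gamma(1+j)\, (\Gamma(j-1/2))^2} = \frac{2^{2N^2 - 4N + 1}}{\pi^N N!}. \tag{55}$$

We use the Selberg integral again, this time with $\gamma = 1$, $\alpha = s + 1/2$ and $\beta = 1/2$, obtaining

$$\begin{aligned}
\langle Z(U,0)^s \rangle_{SO(2N)} &= N_O\, 2^{2N - N^2 + Ns} \cdot 2^{N^2 - N + sN + N/2 + N/2 - N} \prod_{j=0}^{N-1} \frac{\Gamma(2+j)\,\Gamma(s+1/2+j)\,\Gamma(1/2+j)}{\Gamma(2)\,\Gamma(s+1+N+j-1)} \\
&= N_O\, 2^{N + 2Ns} \prod_{j=1}^{N} \frac{\Gamma(1+j)\,\Gamma(s+j-1/2)\,\Gamma(j-1/2)}{\Gamma(s+N+j-1)} \\
&= 2^{2Ns} \prod_{j=1}^{N} \frac{\Gamma(N+j-1)\,\Gamma(s+j-1/2)}{\Gamma(j-1/2)\,\Gamma(s+j+N-1)} \equiv M_O(N,s).
\end{aligned} \tag{56}$$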
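The final product in (56) is easy to evaluate numerically. The sketch below is an illustration under the assumption of real $s > -1/2$ (so that every gamma-function argument is positive); `log_m_o` and `m_o` are hypothetical names.

```python
import math

def log_m_o(n: int, s: float) -> float:
    """log M_O(N, s) from the Gamma-function product in (56); assumes real s > -1/2."""
    total = 2.0 * n * s * math.log(2.0)
    for j in range(1, n + 1):
        total += (math.lgamma(n + j - 1) + math.lgamma(s + j - 0.5)
                  - math.lgamma(j - 0.5) - math.lgamma(s + j + n - 1))
    return total

def m_o(n: int, s: float) -> float:
    """M_O(N, s), i.e. the average of Z(U,0)^s over SO(2N)."""
    return math.exp(log_m_o(n, s))

if __name__ == "__main__":
    for n in (1, 5, 50):
        # Normalization, first moment, and second moment divided by N.
        print(n, m_o(n, 0.0), m_o(n, 1.0), m_o(n, 2.0) / n)
```

For $s = 1$ the product telescopes to 2 for every $N$, and for $s = 2$ it telescopes to $4N + 2$, so the ratio $M_O(N,2)/N$ approaches 4 as $N$ grows.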


As for $M_{Sp}(N,s)$, $M_O(N,s)$ is the generating function for the moments of the log of $Z(U,0)$, this time for the orthogonal ensemble, so if we write

$$M_O(N,s) = e^{q_1 s + q_2 s^2/2 + q_3 s^3/3! + q_4 s^4/4! + \cdots}, \tag{57}$$

then the parameters $q_j$ are the cumulants of the value distribution of $\log Z(U,0)$. These cumulants can be obtained by taking derivatives of

$$\log M_O(N,s) = 2Ns\log 2 + \sum_{j=1}^{N} \big( \log\Gamma(N+j-1) + \log\Gamma(s+j-1/2) - \log\Gamma(j-1/2) - \log\Gamma(s+j+N-1) \big), \tag{58}$$

thus producing

$$\begin{aligned}
q_1 &= \frac{d}{ds}\log M_O(N,s)\Big|_{s=0} = 2N\log 2 + \sum_{j=1}^{N} \big( \psi(j-1/2) - \psi(j+N-1) \big), \\
q_n &= \frac{d^n}{ds^n}\log M_O(N,s)\Big|_{s=0} = \sum_{j=1}^{N} \big( \psi^{(n-1)}(j-1/2) - \psi^{(n-1)}(j+N-1) \big),
\end{aligned} \tag{59}$$

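for $n = 2, 3, 4, \ldots$. The asymptotic behaviour of these cumulants for large $N$ may be recovered using the techniques of Sect. 2.1. Starting with $q_1$ and using (16) and (17),

$$\begin{aligned}
q_1 &= 2N\log 2 + \sum_{j=1}^{N} \left[ \int_0^{\infty} \frac{e^{-t} - e^{-(j-1/2)t}}{1 - e^{-t}}\, dt - \gamma - \int_0^{\infty} \frac{e^{-t} - e^{-(j+N-1)t}}{1 - e^{-t}}\, dt + \gamma \right] \\
&= 2N\log 2 + \sum_{j=1}^{N} \int_0^{\infty} \frac{e^{-(j+N-1)t} - e^{-(j-1/2)t}}{1 - e^{-t}}\, dt.
\end{aligned} \tag{60}$$

At this point we interchange the sum and integral, evaluate the sum and integrate by parts, resulting in

$$\begin{aligned}
q_1 &= 2N\log 2 - (N-1)\int_0^{\infty} \frac{e^{-Nt}}{1 - e^{-t}}\, dt + (2N-1)\int_0^{\infty} \frac{e^{-2Nt}}{1 - e^{-t}}\, dt \\
&\quad - \frac{1}{2}\int_0^{\infty} \frac{e^{-t/2}}{1 - e^{-t}}\, dt - (N-1/2)\int_0^{\infty} \frac{e^{-(N+1/2)t}}{1 - e^{-t}}\, dt \\
&= 2N\log 2 + (N-1)\psi(N) - (2N-1)\psi(2N) + \tfrac{1}{2}\psi(1/2) + (N-1/2)\psi(1/2+N) \\
&= -\tfrac{1}{2}\log N - \tfrac{\gamma}{2} + O\left( \tfrac{1}{N} \right).
\end{aligned} \tag{61}$$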


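The second cumulant is determined similarly (with the help of (20) and (21)) to be

$$\begin{aligned}
q_2 &= \sum_{j=1}^{N} \left[ \int_0^{\infty} \frac{t e^{-(j-1/2)t}}{1 - e^{-t}}\, dt - \int_0^{\infty} \frac{t e^{-(j+N-1)t}}{1 - e^{-t}}\, dt \right] \\
&= \int_0^{\infty} \frac{t e^{-t/2} (1 - e^{-Nt})}{(1 - e^{-t})^2}\, dt - \int_0^{\infty} \frac{t e^{-Nt} (1 - e^{-Nt})}{(1 - e^{-t})^2}\, dt \\
&= \int_0^{\infty} \frac{e^{-t/2}}{1 - e^{-t}}\, dt + \frac{1}{2}\int_0^{\infty} \frac{t e^{-t/2}}{1 - e^{-t}}\, dt - \int_0^{\infty} \frac{e^{-(N+1/2)t}}{1 - e^{-t}}\, dt + (N-1/2)\int_0^{\infty} \frac{t e^{-(N+1/2)t}}{1 - e^{-t}}\, dt \\
&\quad - \int_0^{\infty} \frac{e^{-Nt}}{1 - e^{-t}}\, dt + (N-1)\int_0^{\infty} \frac{t e^{-Nt}}{1 - e^{-t}}\, dt + \int_0^{\infty} \frac{e^{-2Nt}}{1 - e^{-t}}\, dt - (2N-1)\int_0^{\infty} \frac{t e^{-2Nt}}{1 - e^{-t}}\, dt \\
&= -\psi(1/2) + \tfrac{1}{2}\psi^{(1)}(1/2) + \psi(N+1/2) + (N-1/2)\psi^{(1)}(N+1/2) \\
&\quad + \psi(N) + (N-1)\psi^{(1)}(N) - \psi(2N) - (2N-1)\psi^{(1)}(2N) \\
&= \log N + 1 + \gamma + \log 2 + \tfrac{3}{2}\zeta(2) + O\left( \tfrac{1}{N} \right).
\end{aligned} \tag{62}$$

Finally, all the higher cumulants converge asymptotically to a constant,

$$\begin{aligned}
\lim_{N\to\infty} q_n &= \lim_{N\to\infty} \left[ (-1)^n \int_0^{\infty} \frac{t^{n-1} e^{-t/2}}{1 - e^{-t}}\, \frac{1 - e^{-Nt}}{1 - e^{-t}}\, dt - (-1)^n \int_0^{\infty} \frac{t^{n-1} e^{-Nt}}{1 - e^{-t}}\, \frac{1 - e^{-Nt}}{1 - e^{-t}}\, dt \right] \\
&= (-1)^n \int_0^{\infty} \frac{t^{n-1} e^{-t/2}}{(1 - e^{-t})^2}\, dt.
\end{aligned} \tag{63}$$

Evaluating the integral in (63) by integrating by parts and then rewriting it as a pair of polygamma functions,

$$\begin{aligned}
\lim_{N\to\infty} q_n &= -(n-1)\psi^{(n-2)}(-1/2) + \tfrac{1}{2}\psi^{(n-1)}(-1/2) \\
&= (-1)^n (n-1)! \left[ (2^{n-1} - 1)\zeta(n-1) + \tfrac{1}{2}(2^n - 1)\zeta(n) \right].
\end{aligned} \tag{64}$$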

It is thus clear from (57) and the asymptotic form of the cumulants that the leading order coefficient of the moments of Z(U, 0) is


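$$\begin{aligned}
f_O(s) &\equiv \lim_{N\to\infty} \frac{M_O(N,s)}{N^{s^2/2 - s/2}} \\
&= \exp\left( -\frac{\gamma}{2} s + \left( 1 + \gamma + \log 2 + \tfrac{3}{2}\zeta(2) \right) \frac{s^2}{2} + \sum_{n=3}^{\infty} \left[ (-1)^n (2^{n-1} - 1)\zeta(n-1) + (-1)^n \tfrac{1}{2}(2^n - 1)\zeta(n) \right] \frac{s^n}{n} \right).
\end{aligned} \tag{65}$$

Examining the product form of $M_O(N,s)$ we see that the coefficient is expected to have poles of order $k$ at $s = -(2k-1)/2$, for $k = 1, 2, 3, \ldots$. Using (29) and (30), we see that, for $|s| < 1/2$,

$$\begin{aligned}
&\log G(1+s) - \tfrac{1}{2}\log G(1+2s) + \tfrac{1}{2}\log\Gamma(1+2s) - \tfrac{1}{2}\log\Gamma(1+s) \\
&\quad = -\frac{\gamma}{2} s + \left( 1 + \gamma + \tfrac{3}{2}\zeta(2) \right) \frac{s^2}{2} + \sum_{n=3}^{\infty} \left[ (-1)^n (2^{n-1} - 1)\zeta(n-1) + (-1)^n \tfrac{1}{2}(2^n - 1)\zeta(n) \right] \frac{s^n}{n},
\end{aligned} \tag{66}$$

and comparing with (65) we thus find that

$$f_O(s) = 2^{s^2/2} \times \frac{G(1+s)\sqrt{\Gamma(1+2s)}}{\sqrt{G(1+2s)\,\Gamma(1+s)}}, \tag{67}$$

for $|s| < 1/2$, and hence by analytic continuation in the rest of the complex plane. This clearly has the correct combination of poles. This leading order coefficient reduces for integer moments, again using (28), to

$$\begin{aligned}
f_O(n) &= 2^{n^2/2}\, \frac{\prod_{j=1}^{n} \Gamma(j)\; \sqrt{\Gamma(1+2n)}}{\sqrt{\prod_{j=1}^{2n} \Gamma(j)\; \Gamma(1+n)}} \\
&= 2^{n^2/2}\, \frac{\sqrt{(2n)!}\, \prod_{j=1}^{n} (j-1)!}{\sqrt{2^{2n-2}\, 3^{2n-3} \cdots (2n-2)^2 (2n-1)\, n!}} \\
&= 2^{n^2/2}\, \frac{\sqrt{2^n n!}\, \sqrt{(2n-1)!!}\, \prod_{j=1}^{n} (j-1)!}{2^{n-1}\, 4^{n-2} \cdots (2n-2)\; \sqrt{3^{2n-3}\, 5^{2n-5} \cdots (2n-1)\, n!}} \\
&= 2^{n^2/2}\, \frac{\sqrt{2^n}\, \prod_{j=1}^{n} (j-1)!}{2^{\sum_{j=1}^{n-1} j}\; 1^{n-1}\, 2^{n-2} \cdots (n-1)\; \sqrt{3^{2n-4}\, 5^{2n-6} \cdots (2n-3)^2}} \\
&= 2^{n^2/2}\, \frac{1^{n-1}\, 2^{n-2} \cdots (n-2)^2 (n-1)\; 2^{n/2}}{2^{n(n-1)/2}\; 3^{n-2}\, 5^{n-3} \cdots (2n-3)\; 1^{n-1}\, 2^{n-2} \cdots (n-1)} \\
&= 2^n \left[ \prod_{j=1}^{n-1} (2j-1)!! \right]^{-1}.
\end{aligned} \tag{68}$$

This result was also obtained independently in [2].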


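Once more, we can examine the value distribution of $Z(U,0)$ and its logarithm. The value distribution of $\log Z(U,0)/\log N$ is

$$\begin{aligned}
\langle \delta(x - \log Z(U,0)/\log N) \rangle_{SO(2N)} &= \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iyx} M_O(N, iy/\log N)\, dy \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} \exp\left[ -iyx + \left( -\tfrac{1}{2}\log N + O(1) \right) \frac{iy}{\log N} - (\log N + O(1)) \frac{y^2}{2(\log N)^2} - i\,O(1) \frac{y^3}{3!(\log N)^3} + \cdots \right] dy,
\end{aligned} \tag{69}$$

yielding the limiting distribution

$$\lim_{N\to\infty} \langle \delta(x - \log Z(U,0)/\log N) \rangle_{SO(2N)} = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iyx - iy/2}\, dy = \delta(x + 1/2). \tag{70}$$

This is a delta distribution as in the symplectic case, but this time centred at $x = -1/2$. Keeping the $y^2$ term in the exponent in the integral above leads to the central limit theorem:

$$\lim_{N\to\infty} \left\langle \delta\left( x + \frac{q_1}{\sqrt{q_2}} - \frac{\log Z(U,0)}{\sqrt{q_2}} \right) \right\rangle_{SO(2N)} = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right). \tag{71}$$

The value distribution of $Z(U,0)^{1/\log N}$ is similarly straightforward to compute. We see that

$$\left\langle \delta\left( x - Z(U,0)^{1/\log N} \right) \right\rangle_{SO(2N)} = \frac{1}{2\pi x} \int_{-\infty}^{\infty} x^{-iy} M_O(N, iy/\log N)\, dy, \tag{72}$$

and so

$$\lim_{N\to\infty} \left\langle \delta\left( x - Z(U,0)^{1/\log N} \right) \right\rangle_{SO(2N)} = \frac{1}{2\pi x} \int_{-\infty}^{\infty} e^{-iy\log x} e^{-iy/2}\, dy = \frac{1}{x}\, \delta(\log x + 1/2). \tag{73}$$

We also examine the distribution simply of $Z(U,0)$, $P_O(N,x)$. As

$$P_O(N,x) = \frac{1}{2\pi x} \int_{-\infty}^{\infty} x^{-iy} M_O(N, iy)\, dy, \tag{74}$$

we can make the approximation

$$\begin{aligned}
P_O(N,x) &\approx \frac{1}{2\pi x} \int_{-\infty}^{\infty} \exp\left( -iy\log x + i q_1 y - \frac{q_2 y^2}{2} \right) dy \\
&= \frac{1}{x\sqrt{2\pi q_2}} \exp\left( \frac{-(\log x - q_1)^2}{2q_2} \right),
\end{aligned} \tag{75}$$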

[Figure 2: panels (a) and (b); vertical axis P(x), from 0 to 0.6; horizontal axis x, from 0 to 8; curves labelled "exact" and "asymptotic".]

Fig. 2. Distribution of the values of Z(U, 0) for matrices in the group SO(2N ), with (a) N = 6 and (b) N = 42. The solid curve is the exact distribution (74) and the dashed curve is the large N approximation in (75)

valid as $N \to \infty$ when $x$ is fixed (and, like (42), expected to be a good approximation when $\log x \gg -\log N$ and $N$ is large). The result (75) is plotted in Fig. 2 for N = 6 and N = 42 along with the numerically calculated exact distribution, from (74). Unlike the symplectic case (and unlike the approximation (75)), $P_O(N,x)$ diverges as $x \to 0$. This can be seen by considering the poles of the integrand, which occur at $i/2, 3i/2, 5i/2, \ldots$. Once again it is the lowest pole, the simple one at $i/2$, that dominates the integral as $x \to 0$. In this case we find that

$$P_O(N,x) \sim x^{-1/2}\, 2^{-N}\, \frac{1}{\Gamma(N)} \prod_{j=1}^{N} \frac{\Gamma(N+j-1)\,\Gamma(j)}{\Gamma(j-1/2)\,\Gamma(j+N-3/2)}, \tag{76}$$

in that limit.

3.2. L-functions with orthogonal symmetry. We now turn our attention to families of L-functions with a symmetry governed by an ensemble of orthogonal matrices. L-functions of this type fall into two categories, even and odd, which are related to the ensembles SO(2N) and SO(2N + 1) respectively. Of the L-functions comprising an orthogonal family, approximately one half will have even symmetry, and the other half odd symmetry, these latter vanishing at s = 1/2. Examples of such families are given in [3]. Referring to (1), in the orthogonal case $V(z) = z$ and $B(k) = \frac{1}{2}k(k-1)$. As in the symplectic case, the first few of the coefficients $g_k$ with integer coefficients have been calculated. The known values are $g_1 = 1$, $g_2 = 2$, $g_3 = 2^3$ and it is conjectured that $g_4 = 2^7$ [3]. With N taking the place of $\log(Q^A)$, we conjecture this time that

$$\lim_{Q\to\infty} \frac{1}{(\log Q^A)^{\frac{1}{2}k(k-1)}}\, \frac{1}{Q^*} \sum_{\substack{f \in F \\ c(f) \le Q}} L_f\left( \tfrac{1}{2} \right)^k = a(k) \times f_O(k)/2. \tag{77}$$
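Conjecture (77) can be compared with the quoted values of $g_k$ using exact rational arithmetic. The sketch below assumes the closed form $f_O(n) = 2^n [\prod_{j=1}^{n-1}(2j-1)!!]^{-1}$ from (68), uses $\Gamma(1 + m) = m!$ for integer $m$, and takes the comparison to be $g_k / \Gamma(1 + \frac{1}{2}k(k-1)) = f_O(k)/2$, which is how (77) reads once it is matched against the general form (1). The function names are illustrative only.

```python
from fractions import Fraction
from math import factorial

def double_factorial(m: int) -> int:
    """m!! for odd m >= 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def f_o(n: int) -> Fraction:
    """f_O(n) = 2^n / prod_{j=1}^{n-1} (2j-1)!!, the integer-moment coefficient (68)."""
    prod = 1
    for j in range(1, n):
        prod *= double_factorial(2 * j - 1)
    return Fraction(2 ** n, prod)

def g_o(k: int) -> Fraction:
    """g_k predicted by (77): Gamma(1 + k(k-1)/2) * f_O(k) / 2."""
    return factorial(k * (k - 1) // 2) * f_o(k) / 2

if __name__ == "__main__":
    for k in range(1, 5):
        print(k, f_o(k), g_o(k))
```

For $k = 1, \ldots, 4$ this gives $g_k = 1, 2, 2^3, 2^7$. The division by two reflects the odd-symmetry half of the family, whose central values vanish.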

The right-hand side is divided by two because the random matrix average was just over SO(2N ), whereas the sum over central values of the L-functions contains an equal number of functions contributing zero to the average; namely the L-functions with odd


symmetry about s = 1/2. Once again, we expect the relation (46) to follow from equating the density of zeros of the L-functions and the density of eigenphases of the matrices.

Having posed the conjecture (77), we check it against the known values of $g_k$. It is clear that the first four coefficients $f_O(1) = 2$, $f_O(2) = 4$, $f_O(3) = \frac{8}{3}$ and $f_O(4) = \frac{16}{45}$ satisfy conjecture (77); that is $g_k / \Gamma(1 + \frac{1}{2}k(k-1)) = f_O(k)/2$, for k = 1, 2, 3, 4.

As for the symplectic case, we can examine what (77) implies about the value distributions of L-functions and their logarithms. Since the L-functions with odd symmetry are zero at s = 1/2, we now restrict ourselves to averages over the orthogonal L-functions with even symmetry. These are expected to satisfy (77) without the factor 1/2 on the right-hand side. The value distribution of $\log L_f(\frac{1}{2})/\log\log Q^A$ for L-functions with even symmetry (defined by averaging as in (77)) is expected to be given, for large Q, and with the identification (46), by

$$V_O(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ixs}\, a(is/\log N)\, M_O(N, is/\log N)\, ds, \tag{78}$$

and following the argument laid out for the symplectic case, this converges to $\delta(x + 1/2)$ as $N \to \infty$. We can once again state a conjectural central limit theorem, this time for averages over a family F of L-functions with $c(f) \le Q$ governed by the symmetry SO(2N):

$$\begin{aligned}
\lim_{Q\to\infty} \left\langle \delta\left( x + \frac{\tilde{q}_1}{\sqrt{\tilde{q}_2}} - \frac{\log L_f(\frac{1}{2})}{\sqrt{\tilde{q}_2}} \right) \right\rangle_F
&= \lim_{N\to\infty} \frac{1}{2\pi} \int_{-\infty}^{\infty} a\left( \frac{iy}{\sqrt{q_2}} \right) e^{-iyx - iyq_1/\sqrt{q_2}}\, e^{iyq_1/\sqrt{q_2} - y^2/2 + q_3 (iy)^3/(q_2^{3/2}\, 3!) + \cdots}\, dy \\
&= \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right),
\end{aligned} \tag{79}$$

where $\tilde{q}_1$ and $\tilde{q}_2$ are related to (61) and (62), respectively, via (46). For the value distribution of $L_f(\frac{1}{2})$ itself, which the conjecture suggests for large Q is

$$W_O(x) = \frac{1}{2\pi x} \int_{-\infty}^{\infty} x^{-is}\, a(is)\, M_O(N, is)\, ds, \tag{80}$$

we expect that near $x = 0$,

$$W_O(x) \sim x^{-1/2}\, a(-1/2)\, 2^{-N}\, \frac{1}{\Gamma(N)} \prod_{j=1}^{N} \frac{\Gamma(N+j-1)\,\Gamma(j)}{\Gamma(j-1/2)\,\Gamma(j+N-3/2)}; \tag{81}$$

the contribution to the integral (80) from the simple pole at s = i/2. For L-functions with even symmetry from an orthogonal family for which $a(-1/2) \neq 0$, this analysis therefore suggests that the likelihood that the central value vanishes is integrably singular, and that the probability of a value in the range $(0, x)$ vanishes as $x^{1/2}$ when $x \to 0$.

Acknowledgements. It is a pleasure to thank David Farmer and, especially, Brian Conrey for suggesting these calculations and for numerous helpful comments during the course of the research. We are grateful also to Peter Sarnak for stimulating discussions. NCS also owes a great deal of gratitude to NSERC and the CFUW in Canada for their generous funding.


References

1. Barnes, E.W.: The theory of the G-function. Q. J. Math. 31, 264–314 (1900)
2. Brézin, E. and Hikami, S.: Characteristic polynomials of random matrices. Preprint, 1999
3. Conrey, J.B. and Farmer, D.W.: Mean values of L-functions and symmetry. Preprint, 1999
4. Coram, M. and Diaconis, P.: New tests of the correspondence between unitary eigenvalues and the zeros of Riemann's zeta function. Preprint, 1999
5. Iwaniec, H., Luo, W. and Sarnak, P.: Low lying zeros of families of L-functions. Preprint, 1999
6. Iwaniec, H. and Sarnak, P.: The non-vanishing of central values of automorphic L-functions and Siegel's zeros. Preprint, 1997
7. Jutila, M.: On the mean value of L(1/2, χ) for real characters. Analysis 1, 149–161 (1981)
8. Katz, N.M. and Sarnak, P.: Random Matrices, Frobenius Eigenvalues and Monodromy. Providence, RI: AMS, 1999
9. Katz, N.M. and Sarnak, P.: Zeros of zeta functions and symmetry. Bull. Am. Math. Soc. 36, 1–26 (1999)
10. Keating, J.P. and Snaith, N.C.: Random matrix theory and ζ(1/2 + it). Commun. Math. Phys. 214, 57–89 (2000)
11. Mehta, M.L.: Random Matrices. London: Academic Press, second edition, 1991
12. Rubinstein, M.: Evidence for a Spectral Interpretation of Zeros of L-functions. PhD thesis, Princeton University, 1998
13. Rudnick, Z. and Sarnak, P.: Principal L-functions and random matrix theory. Duke Math. J. 81 (2), 269–322 (1996)
14. Rumely, R.: Numerical computations concerning ERH. Math. Comp. 61, 415–440 (1993)
15. Soundararajan, K.: Non-vanishing of quadratic Dirichlet L-functions at s = 1/2. Preprint, 1999
16. Vignéras, M.-F.: L'equation fonctionnelle de la fonction zeta de Selberg du groupe modulaire PSL(2, Z). Asterisque 61, 235–249 (1979)
17. Weyl, H.: Classical Groups. Princeton, NJ: Princeton University Press, 1946

Communicated by P. Sarnak

Commun. Math. Phys. 214, 111 – 135 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Characteristic Polynomials of Random Matrices

Edouard Brézin1, Shinobu Hikami2

1 Laboratoire de Physique Théorique de l'École Normale Supérieure, Unité Mixte de Recherche 8549 du Centre National de la Recherche Scientifique et de l'École Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France. E-mail: [email protected]
2 Department of Basic Sciences, University of Tokyo, Meguro-ku, Komaba 3-8-1, Tokyo 153, Japan. E-mail: [email protected]

Received: 1 October 1999 / Accepted: 18 May 2000

Abstract: Number theorists have studied extensively the connections between the distribution of zeros of the Riemann ζ-function, and of some generalizations, and the statistics of the eigenvalues of large random matrices. It is interesting to compare the average moments of these functions in an interval to their counterpart in random matrices, which are the expectation values of the characteristic polynomials of the matrix. It turns out that these expectation values are quite interesting. For instance, the moments of order 2K scale, for unitary invariant ensembles, as the density of eigenvalues raised to the power $K^2$; the prefactor turns out to be a universal number, i.e. it is independent of the specific probability distribution. An equivalent behaviour and prefactor had been found, as a conjecture, within number theory. The moments of the characteristic determinants of random matrices are computed here as limits, at coinciding points, of multi-point correlators of determinants. These correlators are in fact universal in Dyson's scaling limit in which the difference between the points goes to zero, the size of the matrix goes to infinity, and their product remains finite.

1. Introduction

The correlation functions of the eigenvalues of large N × N matrices are known to exhibit a number of universal features in the large-N limit. For instance in the Dyson limit [1, 2], when the distances between these eigenvalues, measured in units of the local spacing, become of order 1/N, the correlation functions, as well as the level spacing distribution, become universal, i.e. independent of the specific probability measure. For finite differences, upon a smoothing of the distribution, the two-point correlation function is again universal [3, 4]. The short distance universality was also shown to extend to external source problems [5–8], in which an external matrix is coupled to the random matrix.
In this article, we study the average of the characteristic polynomials, whose zeros are the eigenvalues of the random matrix. The probability distribution of the characteristic


polynomial det(λ − X) of a random matrix X, a polynomial of degree N in λ, may be characterized by its moments $\langle \det^K(\lambda - X) \rangle$, or better by its correlation functions $\left\langle \prod_{l=1}^{K} \det(\lambda_l - X) \right\rangle$.

This study is motivated by various conjectures which appeared recently in number theory for the zeros of the Riemann ζ-function and its generalizations known as L-functions [12]. Indeed the characteristic polynomials, as well as the zeta-functions, have their zeros on a straight line, and these zeros obey the same statistical distribution. For the 2K-th moment of the Riemann ζ-function (K is a positive integer), it has been conjectured [9, 10] that

$$\frac{1}{T} \int_0^T dt\, \left| \zeta\left( \tfrac{1}{2} + it \right) \right|^{2K} \sim \gamma_K\, a_K\, (\log T)^{K^2}, \tag{1}$$

where $a_K$ is a number related to the Dirichlet coefficient (the divisor function) $d_K(n)$, and

$$\gamma_K = \prod_{l=0}^{K-1} \frac{l!}{(l+K)!}. \tag{2}$$

The explicit formula for $a_K$ is given in the Appendix, together with summation formulae for the Dirichlet coefficients, which are related to (1). In this work we shall compute the equivalent of (1) for random matrices, show that the density of states ρ(λ) replaces log T, and that the same number $\gamma_K$ is universally present. For the negative moments, similar conjectures have been proposed, with a cut-off parameter δ for avoiding divergences [11], and we show here how to obtain these negative moments for random matrices.

Several types of L-functions have been introduced [12], which correspond to the three standard classes of random matrices. The conjecture for the average of the moments (1) has been extended to these L-functions [13]. The average is taken as a sum over the discriminant d, for instance, for the Dirichlet $L(\frac{1}{2}, \chi_d)$ function. The relations between the distributions of the eigenvalues of the random matrix theory and the statistical distribution of the zeros of the various L-functions have also been studied [12, 14]. Our aim in this article is to clarify the universality of the moments of the characteristic polynomials for these three classes.

The circular unitary ensemble has been studied earlier by Keating and Snaith [10], who did obtain the $\gamma_K$ in (2) from their calculation. However this ensemble has a constant density of states, and furthermore it does not allow one to study the universality of these properties. In this work we have considered a Gaussian ensemble and non-Gaussian extensions, instead of the circular ensemble, to verify both the explicit dependence on the density of states and the universality of the coefficient $\gamma_K$. In the process of the derivation, we have found it necessary to start with the K-point functions $\left\langle \prod_{l=1}^{K} \det(\lambda_l - X) \right\rangle$, which are shown to be themselves universal in the large-N Dyson limit, in which $N(\lambda_i - \lambda_j)$ is held fixed. The moments are then simply the limit of these functions when all the Dyson variables vanish.
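The universal prefactor $\gamma_K$ of (2) is a rational number that can be tabulated exactly. The following short sketch (the function name `gamma_k` is an illustrative choice) evaluates the product with Python's `fractions` module.

```python
from fractions import Fraction
from math import factorial

def gamma_k(k: int) -> Fraction:
    """gamma_K = prod_{l=0}^{K-1} l! / (l + K)!, the coefficient defined in (2)."""
    value = Fraction(1)
    for l in range(k):
        value *= Fraction(factorial(l), factorial(l + k))
    return value

if __name__ == "__main__":
    for k in range(1, 5):
        print(k, gamma_k(k))
```

The first values are $\gamma_1 = 1$, $\gamma_2 = 1/12$ and $\gamma_3 = 1/8640$; exact fractions are used because the denominators grow very quickly with $K$.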


2. Correlation Functions of Characteristic Polynomials We consider random M × M Hermitian matrices X with a normalized probability distribution 1 exp −N TrV(X), Z

P (X) =

(3)

in which V is a given polynomial. It will turn out to be convenient to distinguish here M and N , but we later restrict ourselves to a large N and large M limit, with limM/N = 1. Let us consider the correlation function of K distinct characteristic polynomials: K FK (λ1 , · · · , λK ) = det(λα − X) , (4) α=1

in which the bracket denotes an expectation value with the weight (3). Integrating as usual over the unitary group, we obtain

$$F_K(\lambda_1, \cdots, \lambda_K) = \frac{1}{Z_M} \int \prod_{i=1}^{M} d\mu(x_i)\, \Delta^2(x_1, \cdots, x_M) \prod_{\alpha=1}^{K} \prod_{i=1}^{M} (\lambda_\alpha - x_i), \tag{5}$$

in which $d\mu(x)$ denotes the measure $d\mu(x) = dx\, \exp[-N V(x)]$, $\Delta$ the Vandermonde determinant $\Delta(x_1, \cdots, x_M) = \prod_{1 \le i < j \le M} (x_i - x_j)$, and $Z_M$ the normalization constant

$$Z_M = \int \prod_{i=1}^{M} d\mu(x_i)\, \Delta^2(x_1, \cdots, x_M). \tag{6}$$

We now use the obvious identity

$$\Delta(x_1, \cdots, x_M) \prod_{\alpha=1}^{K} \prod_{i=1}^{M} (\lambda_\alpha - x_i) = \frac{\Delta(x_1, \cdots, x_M; \lambda_1, \cdots, \lambda_K)}{\Delta(\lambda_1, \cdots, \lambda_K)}, \tag{7}$$

and represent the Vandermonde determinants $\Delta(x_1, \cdots, x_M)$ and $\Delta(x_1, \cdots, x_M; \lambda_1, \cdots, \lambda_K)$ as determinants of arbitrary polynomials whose coefficients of highest degree are equal to unity (the so-called monic polynomials)

$$p_n(x) = x^n + \text{lower degree}. \tag{8}$$

Then

$$\Delta(x_1, \cdots, x_M) = \det p_n(x_m) \tag{9}$$

($n$ runs from zero to $M-1$ and $m$ from one to $M$), and

$$\Delta(x_1, \cdots, x_M; \lambda_1, \cdots, \lambda_K) = \det p_a(u_b) \tag{10}$$

in which $a$ runs from zero to $M+K-1$, $b$ from one to $M+K$, and $u_b$ stands for $x_b$ if $b \le M$, or for $\lambda_{b-M}$ if $M < b \le M+K$.
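The identity (7) can be checked on small exact examples. The sketch below is only an illustration: the function names and the sample values are assumptions, and the Vandermonde is taken with the ordering $\prod_{a<b}(u_b - u_a)$, which is the one that equals $\det[u_b^{\,a}]$ as in (9). With the opposite ordering the two sides of (7) differ at most by the overall sign $(-1)^{MK}$, which is irrelevant wherever $\Delta^2$ appears.

```python
from fractions import Fraction
from itertools import combinations

def vandermonde(us):
    """Product over a < b of (us[b] - us[a]); equals det[u_b^a] for monomials."""
    prod = Fraction(1)
    for a, b in combinations(range(len(us)), 2):
        prod *= us[b] - us[a]
    return prod

def identity_sides(xs, lams):
    """Return (lhs, rhs) of identity (7) for the ordering chosen above."""
    lhs = vandermonde(xs)
    for lam in lams:
        for x in xs:
            lhs *= lam - x
    # Delta(x; lambda) is the Vandermonde of the concatenated list (x, lambda).
    rhs = vandermonde(xs + lams) / vandermonde(lams)
    return lhs, rhs

xs = [Fraction(v) for v in (-2, 1, 3)]   # M = 3 sample eigenvalues (hypothetical)
lams = [Fraction(v) for v in (5, 7)]     # K = 2 sample arguments (hypothetical)
lhs, rhs = identity_sides(xs, lams)
print(lhs == rhs)
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a floating-point tolerance.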

E. Brézin, S. Hikami

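Choosing now the polynomials orthogonal with respect to the measure $d\mu$:

$$\int p_n(x)\, p_m(x)\, d\mu(x) = h_n \delta_{nm}, \tag{11}$$

we may easily integrate over the $M$ eigenvalues

$$\int \prod_{i=1}^{M} d\mu(x_i)\, \Delta(x_1, \cdots, x_M; \lambda_1, \cdots, \lambda_K)\, \Delta(x_1, \cdots, x_M) = M! \prod_{n=0}^{M-1} h_n\, \det p_\alpha(\lambda_\beta), \tag{12}$$

in which $\alpha$ runs from $M$ to $M+K-1$ and $\beta$ from 1 to $K$. Similarly the normalization factor $Z_M$ is given by

$$Z_M = \int \prod_{i=1}^{M} d\mu(x_i)\, \Delta^2(x_1, \cdots, x_M) = M! \prod_{n=0}^{M-1} h_n. \tag{13}$$

We thus end up with

$$F_K(\lambda_1, \cdots, \lambda_K) = \frac{1}{\Delta(\lambda_1, \cdots, \lambda_K)} \det \begin{pmatrix} p_M(\lambda_1) & p_{M+1}(\lambda_1) & \cdots & p_{M+K-1}(\lambda_1) \\ p_M(\lambda_2) & p_{M+1}(\lambda_2) & \cdots & p_{M+K-1}(\lambda_2) \\ \vdots & & & \vdots \\ p_M(\lambda_K) & p_{M+1}(\lambda_K) & \cdots & p_{M+K-1}(\lambda_K) \end{pmatrix}. \tag{14}$$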

If we are concerned simply with the moments of the distribution of a single characteristic polynomial, we obtain from (14),

$$\mu_K(\lambda) = F_K(\lambda, \cdots, \lambda) = \left\langle [\det(\lambda - X)]^K \right\rangle = \frac{(-1)^{K(K-1)/2}}{\prod_{l=0}^{K-1} l!} \det \begin{pmatrix} p_M(\lambda) & p_{M+1}(\lambda) & \cdots & p_{M+K-1}(\lambda) \\ p'_M(\lambda) & p'_{M+1}(\lambda) & \cdots & p'_{M+K-1}(\lambda) \\ \vdots & & & \vdots \\ p^{(K-1)}_M(\lambda) & p^{(K-1)}_{M+1}(\lambda) & \cdots & p^{(K-1)}_{M+K-1}(\lambda) \end{pmatrix}. \tag{15}$$

These expressions are all exact, but in the next section we shall be concerned with the large-$N$ limit. Then (i) the interesting case is that of even $K$, since for odd $K$ the result is oscillatory (for instance, for $K = 1$, $\mu_1(\lambda) = p_M(\lambda)$); (ii) it will turn out that, even if we are interested simply in the moments $\mu_K(\lambda)$, it is more convenient to study first the large-$N$ limit of the $F_K$ with distinct $\lambda_i$ and afterwards let them approach a single $\lambda$. The results that will be derived later for those $F_K$'s and $\mu_K$'s will be shown to be universal in the Dyson limit, in which $N$ goes to infinity, $\lambda_i - \lambda_j$ goes to zero for any pair $i, j$, and the products $N(\lambda_i - \lambda_j)$ remain finite. We first derive explicit formulae for the Gaussian case, and show later that they do apply to any random matrix distribution $P(X)$ of the form (3).

3. The Gaussian Case

We now specialize the result (14) of the previous section to the Gaussian distribution of $M \times M$ Hermitian matrices

$$P(X) = \frac{1}{Z_M} \exp\left(-\frac{N}{2}\,\mathrm{Tr}\,X^2\right), \tag{16}$$

with

$$M = N - K \tag{17}$$

(the reason for this choice of $M$ will be clarified in the next section). Then the polynomials that we have introduced are Hermite polynomials, and with our normalizations,

$$H_n(x) = \frac{(-1)^n}{N^n}\, e^{N x^2/2} \left(\frac{d}{dx}\right)^{n} e^{-N x^2/2} = x^n + \text{l.d.}, \tag{18}$$

and

$$h_n = \frac{n!}{N^n} \sqrt{\frac{2\pi}{N}}. \tag{19}$$
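The normalizations (18) and (19) can be verified exactly for small $n$. The sketch below is an illustration under stated assumptions: it builds the monic polynomials from the three-term recurrence $H_{k+1}(x) = x H_k(x) - (k/N) H_{k-1}(x)$, which follows from (18) by rescaling the standard probabilists' Hermite recurrence, and it uses the Gaussian moments $\int x^{2k} e^{-N x^2/2} dx = \sqrt{2\pi/N}\,(2k-1)!!/N^k$. All function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def monic_hermite(n, N):
    """Coefficients (lowest degree first) of H_n for the weight exp(-N x^2/2),
    from the recurrence H_{k+1} = x H_k - (k/N) H_{k-1}."""
    prev, cur = [Fraction(0)], [Fraction(1)]
    for k in range(n):
        nxt = [Fraction(0)] + cur                 # x * H_k
        for i, c in enumerate(prev):
            nxt[i] -= Fraction(k, N) * c          # -(k/N) * H_{k-1}
        prev, cur = cur, nxt
    return cur

def gaussian_moment(p, N):
    """Integral of x^p exp(-N x^2/2) dx, in units of sqrt(2 pi / N)."""
    if p % 2:
        return Fraction(0)
    val = Fraction(1)
    for j in range(1, p, 2):                      # (p - 1)!!
        val *= j
    return val / Fraction(N) ** (p // 2)

def inner(p, q, N):
    """Inner product of two coefficient lists, in units of sqrt(2 pi / N)."""
    return sum(a * b * gaussian_moment(i + j, N)
               for i, a in enumerate(p) for j, b in enumerate(q))

N = 5
H = [monic_hermite(n, N) for n in range(6)]
norms = [inner(H[n], H[n], N) for n in range(6)]  # expected: n!/N^n, as in (19)
print(norms)
```

The diagonal inner products reproduce $n!/N^n$, i.e. $h_n$ of (19) once the common factor $\sqrt{2\pi/N}$ is restored, and the off-diagonal ones vanish.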

The integral representation

$$H_n(x) = \frac{(-1)^n\, n!}{N^n} \oint \frac{dz}{2i\pi}\, \frac{e^{-N(z^2/2 + xz)}}{z^{n+1}} \tag{20}$$

over a contour which circles around the origin in the $z$-plane turns out to be well adapted. Repeated use of this formula in the result (14) yields

$$F_{2K}(\lambda_1, \cdots, \lambda_{2K}) = \frac{(-1)^K \prod_{l=0}^{2K-1} (M+l)!}{\Delta(\lambda_1, \cdots, \lambda_{2K})\, N^{K(2M+2K-1)}} \oint \prod_{l=1}^{2K} \frac{dz_l}{2i\pi\, z_l^{M+l}} \exp\left(-N \sum_{l=1}^{2K} \frac{z_l^2}{2}\right) \times \det\left(e^{-N\lambda_a z_b}\right). \tag{21}$$

We can expand the determinant on the r.h.s. and keep only one of the $(2K)!$ terms, antisymmetrizing instead the integration variables $z_l$. This gives

$$F_{2K}(\lambda_1, \cdots, \lambda_{2K}) = \frac{(-1)^K \prod_{l=0}^{2K-1} (M+l)!}{\Delta(\lambda_1, \cdots, \lambda_{2K})\, N^{K(2M+2K-1)}} \oint \prod_{l=1}^{2K} \frac{dz_l}{2i\pi\, z_l^{M+2K}} \exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l\right)\right] \Delta(z_1, \cdots, z_{2K}). \tag{22}$$

This expression for the expectation value of a product of $2K$ characteristic polynomials, as an integral over $2K$ complex variables, is exact for finite $N$ and $M$. We are now in a position to study the large-$N$ limit through a saddle point integration over each $z_l$. Since we have chosen $M + K = N$, each $z$ has a weight $\frac{1}{z^K} \exp\left[-N(z^2/2 + \lambda z + \log z)\right]$, which presents two saddle points $z_\pm$, solutions of the equation $z^2 + \lambda z + 1 = 0$, i.e. with the parametrization

$$\lambda = 2 \sin\phi, \tag{23}$$

when $\lambda$ lies on the support of the asymptotic Wigner semi-circle of the density of levels,

$$z_+ = i e^{i\phi}, \qquad z_- = -i e^{-i\phi}. \tag{24}$$
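As a quick numerical sanity check of (23) and (24), one can verify that $z_\pm$ solve $z^2 + \lambda z + 1 = 0$, that $z_+ z_- = 1$, and that the two saddle points carry weights of equal modulus. This is a minimal sketch with hypothetical function names and an arbitrary sample angle; the principal branch of the logarithm is assumed.

```python
import cmath
import math

def saddle_points(phi):
    """Return (lam, z_plus, z_minus) for the parametrization lam = 2 sin(phi)."""
    lam = 2 * math.sin(phi)
    z_plus = 1j * cmath.exp(1j * phi)
    z_minus = -1j * cmath.exp(-1j * phi)
    return lam, z_plus, z_minus

def action(z, lam):
    """f(z) = z^2/2 + lam*z + log z, so that each weight is exp(-N f(z))."""
    return z * z / 2 + lam * z + cmath.log(z)

lam, zp, zm = saddle_points(0.4)          # sample angle inside the support
residual_plus = zp * zp + lam * zp + 1    # should vanish
residual_minus = zm * zm + lam * zm + 1   # should vanish
modulus_gap = action(zp, lam).real - action(zm, lam).real
print(abs(residual_plus), abs(residual_minus), abs(zp * zm - 1), abs(modulus_gap))
```

Equal real parts of $f(z_+)$ and $f(z_-)$ mean that neither saddle point is exponentially suppressed relative to the other.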

Therefore there are, a priori, $2^{2K}$ saddle-points at which the moduli of the weight $\exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)\right]$ are the same. However, it is only when $\sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)$ is real (in the Dyson limit in which the differences between the $\lambda$'s are small) that the oscillations, which damp the result, are not present. Therefore we keep only the $\binom{2K}{K}$ saddle-points in which we take $K$ solutions of type $z_+$ and $K$ of type $z_-$. We are now interested in Dyson's short-distance limit. Defining

$$\lambda = \frac{1}{2K} \sum_{l=1}^{2K} \lambda_l, \tag{25}$$

and the density of eigenvalues at this point

$$\rho(\lambda) = \frac{1}{2\pi} \sqrt{4 - \lambda^2} = \frac{1}{\pi} \cos\phi, \tag{26}$$

we introduce the scaling variables

$$x_a = 2\pi N \rho(\lambda)(\lambda_a - \lambda), \qquad \text{with} \quad \sum_{a=1}^{2K} x_a = 0, \tag{27}$$

which are kept finite in this limit. Then the fluctuations around a saddle-point may be taken all at the point $\lambda$, and they yield a factor

$$\left(\frac{2\pi}{N}\right)^{K} \left[(1 - z_+^2)(1 - z_-^2)\right]^{-K/2} = (N\rho(\lambda))^{-K}. \tag{28}$$
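The fluctuation factor (28) follows from the second derivative $f''(z) = 1 - 1/z^2$ of $f(z) = z^2/2 + \lambda z + \log z$ together with $z_+ z_- = 1$, which gives $f''(z_+) f''(z_-) = (1 - z_-^2)(1 - z_+^2) = 4\cos^2\phi = (2\pi\rho)^2$. The following sketch checks the identity numerically; the function name and the sample values of $\phi$, $N$ and $K$ are assumptions chosen only for illustration.

```python
import cmath
import math

def fluctuation_factor(phi, N, K):
    """Left-hand side of (28): K Gaussian integrals at z_+ and K at z_-."""
    z_plus = 1j * cmath.exp(1j * phi)
    z_minus = -1j * cmath.exp(-1j * phi)
    curvature = (1 - z_plus ** 2) * (1 - z_minus ** 2)   # equals 4 cos^2(phi)
    return (2 * math.pi / N) ** K * curvature ** (-K / 2)

phi, N, K = 0.3, 200, 3
rho = math.cos(phi) / math.pi                 # density (26)
lhs = fluctuation_factor(phi, N, K)
rhs = (N * rho) ** (-K)
print(abs(lhs - rhs) / rhs)
```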

We must now take into account the various factors in (22) at these saddle-points. In the Dyson limit the factor $\prod_{l=1}^{2K} z_l^K$ which remained in the denominator may be replaced by one, since at a given $\lambda$ one has $z_+ z_- = 1$. The only delicate factor is thus

$$\frac{\Delta(z_1, \cdots, z_{2K})}{\Delta(\lambda_1, \cdots, \lambda_{2K})} \exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)\right],$$

which we must first compute at one of the $\binom{2K}{K}$ saddle-points, and then sum over the saddle-points. We consider first the saddle-point

$$z_l(\lambda_l) = z_+(\lambda_l), \quad l = 1, \cdots, K, \qquad z_l(\lambda_l) = z_-(\lambda_l), \quad l = K+1, \cdots, 2K. \tag{29}$$

If we expand the weight in the Dyson limit, one finds

$$\exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)\right] = \exp\left[N K \left(1 + \frac{\lambda^2}{2}\right)\right] \times \exp\left[-N \left(\sum_{l=1}^{K} (\lambda_l - \lambda)\, z_+(\lambda) + \sum_{l=K+1}^{2K} (\lambda_l - \lambda)\, z_-(\lambda)\right)\right] \tag{30}$$

(we have used $\frac{d}{d\lambda}\left[\frac{1}{2} z_\pm^2(\lambda) + \lambda z_\pm + \log z_\pm\right] = z_\pm$). Therefore at that saddle-point, in terms of the scaling variables (27),

$$\exp\left[-N \sum_{l=1}^{2K} \left(\frac{z_l^2}{2} + \lambda_l z_l + \log z_l\right)\right] = \exp\left[N K \left(1 + \frac{\lambda^2}{2}\right)\right] \exp\left(-i \sum_{l=1}^{K} x_l\right). \tag{31}$$
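Two ingredients of the expansion (30) can be checked numerically: the derivative identity $\frac{d}{d\lambda} f(z_\pm(\lambda), \lambda) = z_\pm$ for $f = z^2/2 + \lambda z + \log z$, and the value $f(z_+) + f(z_-) = -(1 + \lambda^2/2)$ that produces the prefactor $\exp[NK(1 + \lambda^2/2)]$. The sketch below uses a central finite difference; the function names, the step size and the sample $\lambda$ are assumptions.

```python
import cmath
import math

def z_plus(lam):
    return 1j * cmath.exp(1j * math.asin(lam / 2))

def z_minus(lam):
    return -1j * cmath.exp(-1j * math.asin(lam / 2))

def action(z, lam):
    """f(z) = z^2/2 + lam*z + log z (principal branch)."""
    return z * z / 2 + lam * z + cmath.log(z)

def d_action(branch, lam, h=1e-6):
    """Central finite difference of lam -> f(z(lam), lam) along one branch."""
    return (action(branch(lam + h), lam + h)
            - action(branch(lam - h), lam - h)) / (2 * h)

lam = 0.7
err_plus = abs(d_action(z_plus, lam) - z_plus(lam))
err_minus = abs(d_action(z_minus, lam) - z_minus(lam))
pair_sum = action(z_plus(lam), lam) + action(z_minus(lam), lam)
err_sum = abs(pair_sum + (1 + lam ** 2 / 2))
print(err_plus, err_minus, err_sum)
```

The derivative identity holds because the explicit $z$-dependence drops out at a saddle point, so only the explicit $\lambda$-derivative $\partial f/\partial\lambda = z$ survives.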

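Let us consider now the ratio of Vandermonde determinants at that same saddle-point:

$$\frac{\Delta(z_1, \cdots, z_{2K})}{\Delta(\lambda_1, \cdots, \lambda_{2K})} = \prod_{1 \le l < m \le K} \frac{z_+(\lambda_l) - z_+(\lambda_m)}{\lambda_l - \lambda_m} \times \prod_{K+1 \le l < m \le 2K} \frac{z_-(\lambda_l) - z_-(\lambda_m)}{\lambda_l - \lambda_m} \times \prod_{1 \le l \le K,\ K+1 \le m \le 2K} \frac{z_+(\lambda_l) - z_-(\lambda_m)}{\lambda_l - \lambda_m}. \tag{32}$$

In the scaling limit, this factor becomes

$$\frac{\Delta(z_1, \cdots, z_{2K})}{\Delta(\lambda_1, \cdots, \lambda_{2K})} = \left(\frac{dz_+}{d\lambda} \frac{dz_-}{d\lambda}\right)^{K(K-1)/2} (2i \cos\phi)^{K^2} \prod_{1 \le l \le K,\ K+1 \le m \le 2K} \frac{1}{\lambda_l - \lambda_m} = (N i)^{K^2} (2\pi\rho(\lambda))^{K + K^2} \prod_{1 \le l \le K,\ K+1 \le m \le 2K} \frac{1}{x_l - x_m}. \tag{33}$$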

Leaving aside for the moment the overall factors which do not change at the various saddle-points, we note the result from this particular one, which is

$$\prod_{1 \le l \le K,\ K+1 \le m \le 2K} \frac{1}{x_l - x_m}\, \exp\left(-i \sum_{l=1}^{K} x_l\right),$$

and consider summing over the $\binom{2K}{K}$ saddle-points. The sum is best done in the form of an integral over $K$ variables. Indeed, if we consider

$$I(x_1, \cdots, x_{2K}) = \frac{(-1)^{K(K-1)/2}}{K!} \oint \prod_{\alpha=1}^{K} \frac{du_\alpha}{2i\pi}\, \exp\left(-i \sum_{\alpha=1}^{K} u_\alpha\right) \frac{\Delta^2(u_1, \cdots, u_K)}{\prod_{\alpha=1}^{K} \prod_{l=1}^{2K} (u_\alpha - x_l)} \tag{34}$$

over a contour in which each $u_\alpha$ circles around the $x$'s, we recover exactly the contribution of the previous saddle-point by choosing $u_1 = x_1, \cdots, u_K = x_K$, or any permutation of those $K$ $x$'s. In view of the Vandermonde in the numerator, all the $u$'s have to be different, and thus there are indeed $\binom{2K}{K}$ poles to be added, which reconstruct exactly the sum over the saddle-points that we needed to perform. Collecting the various factors that came on the way, we end up with the final formula

$$\exp\left(-\frac{N}{2} \sum_{l=1}^{2K} V(\lambda_l)\right) F_{2K}(\lambda_1, \cdots, \lambda_{2K}) = (2\pi N \rho(\lambda))^{K^2}\, \frac{\exp(-NK)}{K!} \times \oint \prod_{\alpha=1}^{K} \frac{du_\alpha}{2\pi}\, \exp\left(-i \sum_{\alpha=1}^{K} u_\alpha\right) \frac{\Delta^2(u_1, \cdots, u_K)}{\prod_{\alpha=1}^{K} \prod_{l=1}^{2K} (u_\alpha - x_l)}. \tag{35}$$
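The statement that the contour integral (34) reproduces the sum over the $\binom{2K}{K}$ saddle-points can be tested numerically for $K = 2$. The sketch below is an illustration only: it evaluates the double contour integral with the trapezoid rule on a circle enclosing all four poles and compares it with the explicit sum over 2-element subsets. The function names, the radius, the number of quadrature points and the sample $x$ values (chosen to sum to zero, as in (27)) are assumptions.

```python
import cmath
from itertools import combinations

def saddle_sum(xs, K):
    """Sum over K-subsets S of exp(-i sum_{l in S} x_l)
    divided by prod_{l in S, m not in S} (x_l - x_m)."""
    n = len(xs)
    total = 0j
    for S in combinations(range(n), K):
        term = cmath.exp(-1j * sum(xs[l] for l in S))
        for l in S:
            for m in range(n):
                if m not in S:
                    term /= xs[l] - xs[m]
        total += term
    return total

def contour_integral_K2(xs, radius=3.0, n=64):
    """I(x_1..x_4) of (34) for K = 2, by the trapezoid rule on |u| = radius."""
    pts = [radius * cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

    def one_body(u):
        denom = 1
        for x in xs:
            denom *= u - x
        return u * cmath.exp(-1j * u) / denom   # extra u from du = i u dtheta

    w = [one_body(u) for u in pts]
    total = 0j
    for a in range(n):
        for b in range(n):
            total += w[a] * w[b] * (pts[a] - pts[b]) ** 2
    sign = -1                                    # (-1)^{K(K-1)/2} for K = 2
    return sign * total / (2 * n * n)            # 1/K! = 1/2, (1/n)^2 from the rule

xs = [0.5, -0.2, 0.3, -0.6]
gap = abs(contour_integral_K2(xs) - saddle_sum(xs, 2))
print(gap)
```

The trapezoid rule converges geometrically here because the integrand is analytic on the circle, so a modest number of points already gives agreement close to machine precision.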

If we specialize to $K = 1$ one finds

$$\exp\left(-\frac{N}{2}\,(V(\lambda_1) + V(\lambda_2))\right) F_2(\lambda_1, \lambda_2) = e^{-N} (2\pi N \rho(\lambda))\, \frac{\sin x}{x} \tag{36}$$

with $x = \pi N \rho(\lambda)(\lambda_1 - \lambda_2)$, in which we recover the well-known Dyson kernel, which characterizes the correlation between eigenvalues, and whose universality has been much discussed over recent years. Note the dependence of this function on $(N\rho(\lambda))^{K^2}$. This $K = 1$ result (36) is indeed equal to $(2\pi e^{-N}) K(\lambda_1, \lambda_2)$, where the kernel $K(\lambda_1, \lambda_2)$ is

$$K(\lambda_1, \lambda_2) = \frac{\sin[\pi N \rho(\lambda)(\lambda_1 - \lambda_2)]}{\pi(\lambda_1 - \lambda_2)}. \tag{37}$$
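For $K = 1$ the integral in (35) has two simple poles, at $u = x_1 = x$ and $u = x_2 = -x$, and the residues give $\sin x / x$ directly. The sketch below checks this, and also the arithmetic relation between (36) and the kernel (37); function names and sample values are hypothetical.

```python
import cmath
import math

def k1_integral(x):
    """K = 1 case of the integral in (35), by residues:
    contour integral of du/(2 pi) e^{-iu} / ((u - x)(u + x))."""
    residues = cmath.exp(-1j * x) / (2 * x) + cmath.exp(1j * x) / (-2 * x)
    return 1j * residues                 # du/(2 pi) = i * du/(2 i pi)

def dyson_kernel(lam1, lam2, N, rho):
    """Sine kernel of (37)."""
    d = lam1 - lam2
    return math.sin(math.pi * N * rho * d) / (math.pi * d)

x = 0.8
err_sine = abs(k1_integral(x) - math.sin(x) / x)

N, rho, lam1, lam2 = 50, 0.3, 0.11, 0.10
xx = math.pi * N * rho * (lam1 - lam2)
err_kernel = abs(2 * math.pi * N * rho * math.sin(xx) / xx
                 - 2 * math.pi * dyson_kernel(lam1, lam2, N, rho))
print(err_sine, err_kernel)
```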

(In the next section we return to the normalizations. It will be explained how the extra factor $2\pi e^{-N}$ is cancelled by the normalization constant $h_{N-1}$.) We can now specialize this formula to the moments of the distribution of the characteristic polynomial, by letting all the $\lambda$'s approach each other, i.e. letting the $x$'s vanish. Before we do that, we should point out that the procedure to obtain these moments is in fact subtle. In principle we could have set all the $\lambda$'s equal at an early stage of the calculation. If we returned for instance to (21) we might have replaced the limit of $\det(e^{-N\lambda_a z_b}) / \Delta(\lambda_1, \cdots, \lambda_{2K})$ by $\Delta(z_1, \cdots, z_{2K})$ (up to a factor), but then the saddle-point method to obtain the large-$N$ limit becomes quite problematic. Indeed the Vandermonde of the $z$'s at the saddle-point vanishes and it is necessary to go far beyond the Gaussian integration. However it is now straightforward to obtain this moment from (35). We obtain

$$\exp(-N K V(\lambda))\, F_{2K}(\lambda, \cdots, \lambda) = (2\pi N \rho(\lambda))^{K^2}\, \frac{\exp(-NK)}{K!} \oint \prod_{\alpha=1}^{K} \frac{du_\alpha}{2\pi}\, \exp\left(-i \sum_{\alpha=1}^{K} u_\alpha\right) \frac{\Delta^2(u_1, \cdots, u_K)}{\prod_{\alpha=1}^{K} u_\alpha^{2K}}. \tag{38}$$

Expanding the Vandermonde determinant into a sum over permutations, we find

$$\oint \prod_{\alpha=1}^{K} \frac{du_\alpha}{2\pi}\, \exp\left(-i \sum_{\alpha=1}^{K} u_\alpha\right) \frac{\Delta^2(u_1, \cdots, u_K)}{\prod_{\alpha=1}^{K} u_\alpha^{2K}} = (-1)^{K(K-1)/2} \sum_{P,Q} (-1)^{(P+Q)}\, \frac{1}{(2K - P_0 - Q_0 - 1)!} \cdots \frac{1}{(2K - P_{K-1} - Q_{K-1} - 1)!}, \tag{39}$$

in which $P$ and $Q$ are permutations of the integers $(0, \cdots, K-1)$. Therefore

$$\oint \prod_{\alpha=1}^{K} \frac{du_\alpha}{2\pi}\, \exp\left(-i \sum_{\alpha=1}^{K} u_\alpha\right) \frac{\Delta^2(u_1, \cdots, u_K)}{\prod_{\alpha=1}^{K} u_\alpha^{2K}} = (-1)^{K(K-1)/2}\, K! \det_{0 \le i,j \le K-1} \frac{1}{(2K - i - j - 1)!} = K! \prod_{l=0}^{K-1} \frac{l!}{(K+l)!}, \tag{40}$$

and thus finally

$$\exp(-N K V(\lambda))\, F_{2K}(\lambda, \cdots, \lambda) = (2\pi N \rho(\lambda))^{K^2}\, e^{-NK} \prod_{l=0}^{K-1} \frac{l!}{(K+l)!}. \tag{41}$$
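The determinant evaluation used in (40) can be confirmed exactly for small $K$. The sketch below is an illustration with hypothetical function names: it computes $(-1)^{K(K-1)/2} \det[1/(2K - i - j - 1)!]$ in rational arithmetic and compares it with $\prod_{l=0}^{K-1} l!/(K+l)!$.

```python
from fractions import Fraction
from math import factorial

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [row[:] for row in matrix]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def gamma_from_determinant(K):
    """(-1)^{K(K-1)/2} det[1/(2K - i - j - 1)!], indices 0..K-1, as in (40)."""
    m = [[Fraction(1, factorial(2 * K - i - j - 1)) for j in range(K)]
         for i in range(K)]
    return (-1) ** (K * (K - 1) // 2) * det(m)

def gamma_from_product(K):
    """prod_{l=0}^{K-1} l!/(K+l)!, the universal number in (41)."""
    val = Fraction(1)
    for l in range(K):
        val *= Fraction(factorial(l), factorial(K + l))
    return val

values = [gamma_from_product(K) for K in range(1, 5)]
print(values)
```

For instance the $K = 2$ value is $1/12$, obtained from the $2 \times 2$ determinant $\frac{1}{3!}\cdot\frac{1}{1!} - \frac{1}{2!}\cdot\frac{1}{2!} = -\frac{1}{12}$ and the sign $(-1)^1$.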

4. Normalizations and Universality

We have studied in the previous section a Gaussian ensemble of random matrices and found that the result (41) for the moment involves $(2\pi N \rho(\lambda))^{K^2}$ times a number. One would like to see how general this result is, as far as the dependence on the density of states is concerned, as well as for the normalization. We shall see that this behaviour is quite general and that, given a proper normalization, the prefactor is also universal. Indeed let us recall how the $K$-point correlation functions of the eigenvalues are defined in an ensemble of Hermitian $N \times N$ matrices $X$ with a probability weight proportional to $\exp[-N\,\mathrm{Tr}\,V(X)]$. In [1] one finds

$$R_K(\lambda_1, \cdots, \lambda_K) = \frac{N!}{(N-K)!}\, \frac{1}{Z_N} \int d\lambda_{K+1} \cdots d\lambda_N\, \exp\left(-N \sum_{l=1}^{N} V(\lambda_l)\right) \Delta^2(\lambda_1, \cdots, \lambda_N). \tag{42}$$

Comparing with our initial definitions (5) we see that one has the relation

$$R_K(\lambda_1, \cdots, \lambda_K) = \frac{N!}{(N-K)!}\, \frac{Z_{N-K}}{Z_N}\, \exp\left(-N \sum_{l=1}^{K} V(\lambda_l)\right) \Delta^2(\lambda_1, \cdots, \lambda_K) \times F_{2K}(\lambda_1, \lambda_1, \cdots, \lambda_K, \lambda_K); \tag{43}$$

the r.h.s. reduces, up to a normalization, to our previous product of characteristic functions of $(N-K) \times (N-K)$ matrices, each one being repeated twice. On the other hand it is well known ([1]) that this $K$-point function may be expressed in terms of a kernel $K_N$ as

$$R_K(\lambda_1, \cdots, \lambda_K) = \det_{1 \le i,j \le K} K_N(\lambda_i, \lambda_j), \tag{44}$$

and without entering into the precise definition of $K_N$ in terms of orthogonal polynomials, one should simply recall that $K_N(\lambda, \mu) / \rho((\lambda + \mu)/2)$ is universal in the Dyson limit ([4]) ($\lambda - \mu$ goes to zero, $N$ goes to infinity, $N(\lambda - \mu)$ finite), i.e. it is independent of the polynomial $V$ which defines the probability measure.

Therefore we define a modified weight, and modified moments,

$$\mathcal{F}_{2K}(\lambda_1, \lambda_2, \cdots, \lambda_{2K}) = \frac{N!}{(N-K)!}\, \frac{Z_{N-K}}{Z_N}\, \exp\left(-\frac{N}{2} \sum_{l=1}^{2K} V(\lambda_l)\right) F_{2K}(\lambda_1, \lambda_2, \cdots, \lambda_{2K}) \tag{45}$$

and

$$M_{2K}(\lambda) = \frac{N!}{(N-K)!}\, \frac{Z_{N-K}}{Z_N}\, \{\exp[-N K V(\lambda)]\}\, F_{2K}(\lambda, \lambda, \cdots, \lambda). \tag{46}$$

The universality of level correlations implies the universality of $M_{2K}$. Therefore we have to return to the Gaussian case, in order to take into account this new normalization, and then the result will be universal. From (13) we have

$$\frac{N!}{(N-K)!}\, \frac{Z_{N-K}}{Z_N} = \frac{1}{\prod_{n=N-K}^{N-1} h_n}, \tag{47}$$

and, given the explicit expression (19) of $h_n$ for the Gaussian case, we find, in the large-$N$ limit,

$$\frac{N!}{(N-K)!}\, \frac{Z_{N-K}}{Z_N} = (2\pi)^{-K} e^{NK}. \tag{48}$$
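The large-$N$ statement (48) can be probed numerically from (47) and (19). The sketch below works with logarithms to avoid overflow; by Stirling's formula each $h_{N-j}$ approaches $2\pi e^{-N}$ with a relative correction of order $1/N$. Function names and the sample values of $N$ and $K$ are assumptions.

```python
import math

def log_h(n, N):
    """log of h_n = n!/N^n * sqrt(2 pi / N), eq. (19)."""
    return math.lgamma(n + 1) - n * math.log(N) + 0.5 * math.log(2 * math.pi / N)

def log_normalization(N, K):
    """log of N!/(N-K)! * Z_{N-K}/Z_N = -sum_{n=N-K}^{N-1} log h_n, eq. (47)."""
    return -sum(log_h(n, N) for n in range(N - K, N))

def log_asymptotic(N, K):
    """log of (2 pi)^(-K) * exp(N K), eq. (48)."""
    return -K * math.log(2 * math.pi) + N * K

N, K = 10000, 3
gap = abs(log_normalization(N, K) - log_asymptotic(N, K))
print(gap)
```

The printed gap is the logarithm of the ratio of the two sides of (48); it shrinks proportionally to $1/N$ as $N$ grows at fixed $K$.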

With this normalization the universal moment $M_{2K}(\lambda)$ is given by

$$M_{2K}(\lambda) = (2\pi)^{-K} (2\pi N \rho(\lambda))^{K^2} \prod_{l=0}^{K-1} \frac{l!}{(K+l)!}. \tag{49}$$

In fact this connection between the usual correlation functions and the expectation values of a product of characteristic functions, (43) and (44), allows one to recover directly the moment $M_{2K}(\lambda)$, by using the universal expression for the kernel $K(\lambda_i, \lambda_j)$ in the Dyson limit,

$$K(\lambda_i, \lambda_j) = \frac{\sin[\pi N \rho (\lambda_i - \lambda_j)]}{\pi(\lambda_i - \lambda_j)}. \tag{50}$$

The integral representation, over $2K$ variables describing contours around the $K$ poles $\lambda_l$,

$$\frac{\det_{1 \le i,j \le K} K(\lambda_i, \lambda_j)}{\Delta^2(\lambda_1, \cdots, \lambda_K)} = \frac{1}{K!} \oint \prod_{l=1}^{K} \frac{du_l}{2\pi i} \frac{dv_l}{2\pi i}\, \frac{\Delta(u_1, \cdots, u_K)\, \Delta(v_1, \cdots, v_K)}{\prod_{i=1}^{K} \prod_{j=1}^{K} (u_i - \lambda_j)(v_i - \lambda_j)} \prod_{i=1}^{K} K(u_i, v_i) \tag{51}$$

allows one to write easily the limit in which all the $\lambda$'s are equal:

$$\lim \frac{\det_{1 \le i,j \le K} K(\lambda_i, \lambda_j)}{\Delta^2(\lambda_1, \cdots, \lambda_K)} = \frac{1}{K!} \oint \prod_{l=1}^{K} \frac{du_l}{2\pi i} \frac{dv_l}{2\pi i}\, \frac{\Delta(u_1, \cdots, u_K)\, \Delta(v_1, \cdots, v_K)}{\prod_{i=1}^{K} (u_i - \lambda)^K (v_i - \lambda)^K} \times \prod_{i=1}^{K} K(u_i, v_i). \tag{52}$$

Since the kernel is a Toeplitz matrix, i.e. $K(\lambda_i, \lambda_j) = K(\lambda_i - \lambda_j)$, one can shift the $u$'s and the $v$'s by $\lambda$ and the r.h.s. becomes independent of $\lambda$. In the case of the sine kernel we obtain, in the limit in which all the $\lambda$'s are equal,

$$\frac{1}{K!} \oint \prod_{l=1}^{K} \frac{dx_l}{2\pi i} \frac{dv_l}{2\pi i}\, \frac{\Delta(v_1, \cdots, v_K)\, \Delta(x_1, \cdots, x_K)}{\prod_{i=1}^{K} \left[(v_i + x_i)^K v_i^K\right]} \prod_{i=1}^{K} \frac{\sin(\pi N \rho x_i)}{\pi x_i} = \frac{(2\pi\rho N)^{K^2}}{(2\pi)^K} \prod_{l=0}^{K-1} \frac{l!}{(l+K)!}. \tag{53}$$

We have indeed recovered, for any function $V$ defining the probability distribution, the universal moment (49).

5. Large N Asymptotics

Rather than starting, as in the previous sections, with an exact expression for the correlation functions of characteristic functions, and at the end letting $N$ go to infinity, we may use a different method to investigate directly the large-$N$ limit for the moments of their distribution. This method applies for a general probability distribution of the form (3) and it may also be extended to the more general case of an external matrix source coupled to the matrix $X$ [7] in this distribution. It turns out that here again it is necessary to consider first $F_{2K}$ for different $\lambda_j$'s, and let all the $\lambda_j$'s approach the same $\lambda$ at the end of the calculation. From (5), we have

$$\frac{\partial}{\partial \lambda_i} \ln F_{2K} = M G_\lambda(\lambda_i), \tag{54}$$

where $G_\lambda(\lambda_i)$ is the resolvent,

$$G_\lambda(\lambda_i) = \frac{1}{M} \left\langle \mathrm{Tr}\, \frac{1}{\lambda_i - X} \right\rangle. \tag{55}$$

The bracket here denotes an expectation value with a weight which includes both $P(X)$ and $\prod_{l=1}^{2K} \det(\lambda_l - X)$. We assume that the asymptotic spectrum of the eigenvalues $x_i$ of $X$ fills a single interval $[\alpha, \beta]$ in the large-$M$ limit. (It is sufficient to consider the single-cut case, since we are interested in Dyson short-distance universality, which involves only the local statistics.) Therefore $G_\lambda(z)$ is also analytic in a plane cut along the interval $[\alpha, \beta]$, and

$$G_\lambda(x \pm i0) = \hat{G}_\lambda(x) \mp i\pi\rho_\lambda(x), \tag{56}$$

where $\hat{G}_\lambda(x) = [G_\lambda(x + i0) + G_\lambda(x - i0)]/2$. The saddle point equation in the large-$M$ limit becomes

$$2M G_\lambda(z) - N V'(z) + \sum_{j=1}^{2K} \frac{1}{z - \lambda_j} = 0. \tag{57}$$

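The last term of (57) is of relative order $1/N$ and thus we have to solve this Riemann-Hilbert problem to this order. At leading order, we have $2\hat{G}(x) = V'(x)$, and up to order $1/N$,

$$G_\lambda(z) = G(z) + \frac{1}{N} \left[ C_G(z) + \sum_{i=1}^{2K} C_{\lambda_i}(z) \right]. \tag{58}$$

From the saddle point equation (57), we have $\hat{C}_G(x) = (N - M)\hat{G}(x)$ and $\hat{C}_{\lambda_i}(x) = \frac{1}{2(\lambda_i - x)}$. We now set $M = N - K$. The functions $C_G(z)$ and $C_{\lambda_i}(z)$ are uniquely determined from their analyticity in a plane cut from $\alpha$ to $\beta$, and their fall-off as $1/z^2$ for large $z$ (since both $G_\lambda(z)$ and $G(z)$ behave as $1/z$ at infinity). The result is

$$C_G(z) = K G(z) - \frac{K}{\sqrt{(z - \alpha)(z - \beta)}}, \qquad C_{\lambda_i}(z) = \frac{1}{2\sqrt{(z - \alpha)(z - \beta)}} \left[ 1 - \frac{\sqrt{(z - \alpha)(z - \beta)} - \sqrt{(\lambda_i - \alpha)(\lambda_i - \beta)}}{z - \lambda_i} \right]. \tag{59}$$

These expressions lead to

$$(N - K) G_\lambda(\lambda_i) = N G(\lambda_i) - \frac{1}{2} \frac{d}{d\lambda_i} \log \sqrt{(\lambda_i - \alpha)(\lambda_i - \beta)} - \frac{1}{2} \sum_{j=1, j \ne i}^{2K} \frac{1}{\lambda_i - \lambda_j} \left[ 1 - \frac{\sqrt{(\lambda_j - \alpha)(\lambda_j - \beta)}}{\sqrt{(\lambda_i - \alpha)(\lambda_i - \beta)}} \right]. \tag{60}$$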

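Since there is a branch cut between $\alpha$ and $\beta$, one must specify whether $\lambda_i$ approaches the real axis from above or from below. The sign of the square root on both sides of the cut will be denoted $\epsilon_i$. There are then a priori $2^{2K}$ saddle points corresponding to the different choices of $\epsilon_i$. For each choice of the $\epsilon_i$'s, we have

$$\frac{\partial}{\partial \lambda_i} \log \tilde{F}_\epsilon = \epsilon_i N i \pi \rho(\lambda_i) - \frac{1}{2} \frac{d}{d\lambda_i} \log \sqrt{(\lambda_i - \alpha)(\beta - \lambda_i)} - \frac{1}{2} \sum_{j=1, j \ne i}^{2K} \frac{1}{\lambda_i - \lambda_j} \left[ 1 - \frac{\epsilon_j \sqrt{(\lambda_j - \alpha)(\beta - \lambda_j)}}{\epsilon_i \sqrt{(\lambda_i - \alpha)(\beta - \lambda_i)}} \right], \tag{61}$$

where $\tilde{F}_\epsilon$ means the value of $F_{2K}$ for given $\epsilon_j$'s multiplied by a factor $\exp\left(-\frac{N}{2} \sum_i V(\lambda_i)\right)$. Introducing the parametrization $\phi(x)$, defined by $x = \frac{1}{2}(\alpha + \beta) - \frac{1}{2}(\beta - \alpha) \cos\phi(x)$ and $\frac{1}{2}(\beta - \alpha) \sin\phi(x) = \sqrt{(x - \alpha)(\beta - x)}$, we have

$$\frac{d}{d\lambda_i} \log \sin \frac{\epsilon_i \phi(\lambda_i) - \epsilon_j \phi(\lambda_j)}{2} = \frac{\epsilon_i}{2\sqrt{(\lambda_i - \alpha)(\beta - \lambda_i)}}\, \frac{\epsilon_i \sqrt{(\lambda_i - \alpha)(\beta - \lambda_i)} + \epsilon_j \sqrt{(\lambda_j - \alpha)(\beta - \lambda_j)}}{\lambda_i - \lambda_j}. \tag{62}$$

Thus we obtain $\tilde{F}_\epsilon$ by integration,

$$\tilde{F}_\epsilon = C_\epsilon \prod_{i<j}^{2K} \frac{\sin \frac{\epsilon_i \phi(\lambda_i) - \epsilon_j \phi(\lambda_j)}{2}}{\lambda_i - \lambda_j} \prod_{i=1}^{2K} \frac{1}{\sqrt{\sin\phi(\lambda_i)}} \times \prod_{i=1}^{2K} \exp\left[ \epsilon_i\, i N \pi \int_{\lambda_0}^{\lambda_i} \rho(x)\, dx \right]. \tag{63}$$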

We have to sum over all the saddle-point contributions, i.e. sum over all the different choices of $\epsilon_j$'s. We focus now on the Dyson limit in which the differences $\lambda_i - \lambda_j$ are all of order $1/N$. Among the $2^{2K}$ possibilities, we retain only the $\binom{2K}{K}$ solutions in which half of the $\epsilon_l$ are positive, and the remaining half are negative. Otherwise, the exponential factor in the final result gives very rapid oscillations in the large-$N$ limit. This situation is thus exactly similar to that of the previous section. Again the sum over the $\binom{2K}{K}$ saddle-points is conveniently written as a contour integral

$$\tilde{F} = \frac{1}{K!} \oint \frac{du_1\, du_2 \cdots du_K}{(2\pi i)^K}\, \frac{\prod_{n<m} (u_n - u_m)^2}{\prod_{n=1}^{K} \prod_{j=1}^{2K} (u_n - \lambda_j)}\, \cos\left[ \pi N \rho \left( \sum_{j=1}^{2K} \lambda_j - 2 \sum_{n=1}^{K} u_n \right) \right]. \tag{64}$$

When we set all the $\lambda_j = \lambda$, this becomes

$$\tilde{F} = \frac{1}{K!} \oint \prod_{i=1}^{K} \frac{du_i}{2\pi i}\, \frac{\prod_{i<j} (u_i - u_j)^2}{\prod_{i=1}^{K} u_i^{2K}}\, \cos\left( 2\pi N \rho \sum_{n=1}^{K} u_n \right) \tag{65}$$

and we recover the result (38). However in this method, since we re-integrated the logarithmic derivative of $F_{2K}$, the constant of integration remains undetermined. We may fix this constant by the same requirement that we have used in the previous section, and the final result agrees then with the previous calculation.

6. Symplectic Group Sp(N)

We have studied up to now unitary invariant measures, characterized for the probability law of the eigenvalues by the factor $|\Delta(x_1, \cdots, x_M)|^\beta$. We could also consider the Gaussian orthogonal ensemble (GOE, with $\beta = 1$) or the Gaussian symplectic ensemble (GSE, with $\beta = 4$). If we took the GOE for instance, we could immediately relate the correlation functions of characteristic determinants to the correlations of the eigenvalues, as in (43) (except that since $\beta$ is one no doubling of the $\lambda$'s is needed), and therefore relate the universality of the moments to the Dyson universal limit. Remaining still with the unitary $\beta = 2$ class, in Cartan's classification of symmetric spaces, we find ensembles which are invariant under Sp(N) or O(N). One of the physical applications of random Sp(N) matrices is the statistics of the energy levels inside a superconductor vortex [8]. In number theory, it is known that some generalizations of Riemann's $\zeta$-function, such as the Dirichlet L-function $L(s, \chi_d)$, where $\chi_d$ is a quadratic Dirichlet character mod $|d|$, present a spectrum of low-lying zeros on the line $\mathrm{Re}\, s = 1/2$, which agrees with the statistics of the eigenvalues of the Sp(N) random matrix theory [12, 14]. In this Sp(N) invariant symmetric space, the eigenvalues always appear in pairs of positive and negative real numbers. Due to this fact, a new universality class governs the correlations of the eigenvalues near the origin, i.e. near $s = 1/2$ (whereas in the bulk one recovers the previous unitary class). Therefore we study now the new universality class, which governs the new scaling near the origin. We thus consider random Hermitian matrices $X$, which are $2M \times 2M$ and satisfy the condition

$$X^T J + J X = 0, \tag{66}$$

where $J$ is

$$J = \begin{pmatrix} 0 & 1_M \\ -1_M & 0 \end{pmatrix}. \tag{67}$$

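The unitary symplectic group is the subgroup of $SU(2M)$ consisting of $2M \times 2M$ unitary matrices satisfying the symplectic constraint

$$U^T = -J U^\dagger J. \tag{68}$$

The integration over this unitary symplectic group for $F_K(\lambda_1, \cdots, \lambda_K)$ gives [8]

$$F_K(\lambda_1, \cdots, \lambda_K) = \left\langle \prod_{\alpha=1}^{K} \det(\lambda_\alpha - X) \right\rangle = \frac{1}{Z_M} \int \prod_{i=1}^{M} d\mu(x_i)\, \Delta^2(x_1^2, \cdots, x_M^2) \prod_{i=1}^{M} x_i^2 \prod_{\alpha=1}^{K} \prod_{i=1}^{M} (\lambda_\alpha^2 - x_i^2). \tag{69}$$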

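Repeating the analysis of Sect. 2, $F_K(\lambda_1, \cdots, \lambda_K)$ is given again by a determinantal form as in (14). Changing variables from $x_i^2$ to $y_i$ and denoting $\mu_i = \lambda_i^2$, we have, up to the normalization,

$$F_K(\mu_1, \cdots, \mu_K) = \int_0^\infty \prod_{i=1}^{M} dy_i \prod_{i=1}^{M} y_i^{1/2} \prod_{i<j} (y_i - y_j)^2 \prod_{\alpha=1}^{K} \prod_{i=1}^{M} (\mu_\alpha - y_i)\, e^{-N \sum_i y_i}. \tag{70}$$

The orthogonal monic polynomials for this measure are the Laguerre polynomials $L_n^{(1/2)}(y)$, which are defined by

$$L_n^{(1/2)}(y) = \frac{(-1)^n e^{Ny}}{\sqrt{y}\, N^n} \left(\frac{d}{dy}\right)^{n} \left(y^{n + 1/2} e^{-Ny}\right) = n!\, \frac{(-1)^n}{N^n} \oint \frac{du}{2\pi i}\, \frac{(1+u)^{n + 1/2}}{u^{n+1}}\, e^{-Nuy}, \tag{71}$$

normalized as required to $L_n^{(1/2)}(y) = y^n + \text{lower degree}$. The orthogonality condition is

$$\int_0^\infty dy\, e^{-Ny} \sqrt{y}\, L_n^{(1/2)}(y)\, L_m^{(1/2)}(y) = h_n \delta_{n,m} \tag{72}$$

with $h_n = n!\, \Gamma(n + \frac{3}{2}) / N^{2n + 3/2}$, and $h_{N-1} \simeq 2\pi e^{-2N}$ in the large-$N$ limit. From (14), we have

$$F_K(\mu_1, \cdots, \mu_K) = (-1)^{K(M + \frac{K-1}{2})}\, \frac{\prod_{l=0}^{K-1} (M+l)!}{N^{K(M + \frac{K}{2} - \frac{1}{2})}}\, \frac{1}{\Delta(\mu)} \oint \prod_{i=1}^{K} \frac{dz_i}{2\pi i} \prod_{l=1}^{K} \frac{(1 + z_l)^{M + K - \frac{1}{2}}}{z_l^{M+K}}\, e^{-N \sum_\alpha z_\alpha \mu_\alpha} \prod_{i<j} \left( \frac{z_i}{1 + z_i} - \frac{z_j}{1 + z_j} \right). \tag{73}$$

We now set $M = N - K$, and the factor $\prod_{l=0}^{K-1} (M+l)! / N^{K(M + K/2 - 1/2)}$ is equal to $(2\pi N)^{K/2} e^{-KN}$, up to corrections of relative order $1/N$ in the large-$N$ limit. The large-$N$ limit is governed by the saddle-point equations $z_l^2 + z_l + \frac{1}{\mu_l} = 0$. In the following we study the scaling vicinity of the origin, in which all the $\mu_l$'s scale as $1/N^2$. Then $z_l$ at the saddle-point may be expanded as

$$z_l \simeq \frac{i\epsilon_l}{\sqrt{\mu_l}} - \frac{1}{2} + O(\sqrt{\mu_l}), \tag{74}$$

where $\epsilon_l = \pm 1$. Noting that $\prod_{i<j} \left( \frac{z_i}{1 + z_i} - \frac{z_j}{1 + z_j} \right) = \prod_{i<j} (-z_i^2 \mu_i + z_j^2 \mu_j) \simeq i^{K(K-1)/2} \prod_{i<j} (\epsilon_i \lambda_i - \epsilon_j \lambda_j)$ ($\epsilon_i = \pm 1$), and combining it with the Vandermonde $\Delta(\lambda^2)$, we are left with a factor $\prod_{i<j} \frac{1}{\epsilon_i \lambda_i + \epsilon_j \lambda_j}$ in this scaling limit. We have also the exponential $e^{-N \sum_\alpha z_\alpha \mu_\alpha} = e^{-iN \sum_i \epsilon_i \lambda_i}$. We have again to sum over all the saddle-points, which are characterized by the signs $\epsilon_i = \pm 1$, and to include the factor due to the fluctuations near the saddle-point. The Gaussian fluctuations yield a factor $(2\pi / (-2i\epsilon_i \lambda_i^3 N))^{1/2}$. Then $(1 + z_l)^{-1/2} \simeq (\lambda_l / (\epsilon_l i))^{1/2}$. There is a factor $1/(2\pi i)^K$ in addition. We have an extra $\epsilon_i$ due to the contour direction, which goes through two saddle points; one is in the positive imaginary half-plane and the other in the negative half-plane. When $K = 2$, and $\lambda_1$ and $\lambda_2$ are of order $1/N$, we obtain

$$F_2(\lambda_1, \lambda_2) = \frac{2\pi e^{-2N}}{\lambda_1 \lambda_2}\, K_{SP}(\lambda_1, \lambda_2), \tag{75}$$

with the kernel $K_{SP}(\lambda_1, \lambda_2)$ given by

$$K_{SP}(\lambda_1, \lambda_2) = \frac{\sin[N(\lambda_1 - \lambda_2)]}{2\pi(\lambda_1 - \lambda_2)} - \frac{\sin[N(\lambda_1 + \lambda_2)]}{2\pi(\lambda_1 + \lambda_2)}. \tag{76}$$

The coefficient $(2\pi) e^{-2N}$ is cancelled by the normalization factor $1/h_{N-1}$. Putting $\lambda_1 = \lambda_2 = 0$, we have, neglecting the factor $2\pi e^{-2N}$, $F_2(0) \simeq \frac{1}{2\pi} \frac{4}{3!} N^3$.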
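The value at the origin quoted above follows from expanding the two sine terms of (76) to third order: $K_{SP}(\lambda_1, \lambda_2)/(\lambda_1 \lambda_2) \to N^3/(3\pi) = \frac{1}{2\pi}\frac{4}{3!} N^3$. The sketch below checks this numerically at small but nonzero arguments; the function names and sample values are assumptions, and the two arguments are taken distinct to avoid a division by zero in the first term.

```python
import math

def k_sp(l1, l2, N):
    """Kernel (76)."""
    a, b = l1 - l2, l1 + l2
    return (math.sin(N * a) / (2 * math.pi * a)
            - math.sin(N * b) / (2 * math.pi * b))

def f2_stripped(l1, l2, N):
    """K_SP / (l1 l2): F_2 of (75) with the factor 2 pi e^{-2N} removed."""
    return k_sp(l1, l2, N) / (l1 * l2)

N = 10
value = f2_stripped(1e-4, 2e-4, N)
limit = N ** 3 / (3 * math.pi)          # (1/2pi) * (4/3!) * N^3
print(value, limit)
```

The cubic growth in $N$, compared with the linear growth of the bulk sine kernel at coincident points, reflects the vanishing of $K_{SP}$ when either argument is zero.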

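For general $K$, $F_K(\lambda_1, \cdots, \lambda_K)$ becomes in the scaling limit

$$F_K(\lambda_1, \cdots, \lambda_K) = (-1)^{K(N - K + \frac{K-1}{2})}\, (2\pi N)^{\frac{K}{2}} e^{-NK}\, (i)^{\frac{K}{2}(K-1)} \left(\frac{\pi}{N}\right)^{\frac{K}{2}} \frac{1}{(2\pi i)^K} \times \sum_{\epsilon} \frac{e^{-iN \sum_{i=1}^{K} \epsilon_i \lambda_i}}{\prod_{i} \epsilon_i \lambda_i \prod_{i<j} (\epsilon_i \lambda_i + \epsilon_j \lambda_j)}. \tag{77}$$

The sum over all the saddle-points, characterized by $\epsilon_i = \pm 1$, is conveniently written as a contour integral,

$$I = \sum_{\epsilon} \frac{e^{-iN \sum_i \epsilon_i \lambda_i}}{\prod_{i<j} (\epsilon_i \lambda_i + \epsilon_j \lambda_j) \prod_{i=1}^{K} (\epsilon_i \lambda_i)} = (-1)^{\frac{K}{2}(K-1)}\, \frac{2^K}{K!} \oint \cdots \oint \prod_{i=1}^{K} \frac{du_i}{2\pi i}\, \frac{\Delta(u^2)\, \Delta(u)}{\prod_{i=1}^{K} \prod_{j=1}^{K} (u_i^2 - \lambda_j^2)}\, e^{-iN \sum_{i=1}^{K} u_i}, \tag{78}$$

where the contour encloses $u_i = \pm\lambda_j$. We may now set $\lambda_j = \lambda$, and keeping track of various coefficients, we obtain the $K$th moment $F_K(\lambda, \cdots, \lambda)$. For general $\lambda$, the result has a complicated form, but when $\lambda = 0$, it becomes a number

$$F_K(0, \cdots, 0) = (-1)^{K(N-1)}\, (i)^{\frac{K}{2}(K-3)}\, N^{\frac{K}{2}(K+1)}\, \frac{2^{K/2} e^{-NK}}{K!} \times \oint \prod_{i=1}^{K} \frac{du_i}{2\pi i}\, \frac{\Delta(u^2)\, \Delta(u)}{\prod_{i=1}^{K} u_i^{2K}}\, e^{-i \sum_{i=1}^{K} u_i}. \tag{79}$$

This representation allows one to compute the $K$th moment at the origin. By the expansion of the Vandermonde determinants, similarly to (39), (79) is reduced to a determinant form. With the normalization $\tilde{F}_K(0) = (2\pi)^{-\frac{K}{2}} e^{KN} F_K(0)$, we have

$$\tilde{F}_K(0) = (-1)^{KN} \prod_{l=1}^{K} \frac{l!}{(2l)!}\, \frac{(2N)^{\frac{K}{2}(K+1)}}{\pi^{\frac{K}{2}}}. \tag{80}$$

Comparing to the result of the unitary case in (49), we notice that the exponent of $N$ is different and that the universal coefficient is again given by a product of ratios of factorials. For $F_{2K}(\lambda_1, \lambda_1, \cdots, \lambda_K, \lambda_K)$, the even $2K$th moment may be obtained again from $\tilde{F}_{2K}(\lambda_1, \lambda_1, \cdots, \lambda_K, \lambda_K) = \det[K(\lambda_i, \lambda_j)] / (\Delta^2(\lambda^2) \prod_i \lambda_i^2)$; using the expression for the kernel (76), we have for the $2K$th moment,

$$\frac{\det[K(\lambda_i, \lambda_j)]}{\Delta^2(\lambda^2) \prod_i \lambda_i^2} = \frac{2^K}{K!} \oint \prod_{i=1}^{K} \frac{du_i}{2\pi i} \frac{dv_i}{2\pi i}\, \frac{\Delta(u^2)\, \Delta(v^2)}{\prod_{i=1}^{K} \prod_{j=1}^{K} (u_i^2 - \lambda_j^2) \prod_{i=1}^{K} \prod_{j=1}^{K} (v_i^2 - \lambda_j^2)} \times \frac{1}{(2\pi)^K} \prod_{i=1}^{K} \frac{\sin[N(u_i - v_i)]}{u_i - v_i}. \tag{81}$$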

For general $\lambda$, the result has a complicated form, but again here one can compute from there the values at $\lambda = 0$. The result agrees with the previous expression of $\tilde{F}_{2K}(0)$ in (80). One may also use the large-$N$ asymptotic analysis as in Sect. 5 and rederive the results as the same sum (78) over the saddle points.

7. Orthogonal Group O(N)

We discuss here the O(2N) case, which is different from Sp(N) (whereas O(2N + 1) has a structure which is similar to Sp(N) [8]). In number theory, for example, the twisted L-function, $L_\tau(s, \chi_d)$, presents a spectrum of low-lying zeros, which agrees with the statistics of the eigenvalues of the O(2N) random matrix theory [12, 14]. Then in terms of eigenvalues

$$F_K(\lambda_1, \cdots, \lambda_K) = \left\langle \prod_{\alpha=1}^{K} \det(\lambda_\alpha - X) \right\rangle = \frac{1}{Z_M} \int \prod_{i=1}^{M} d\mu(x_i)\, \Delta^2(x_1^2, \cdots, x_M^2) \prod_{\alpha=1}^{K} \prod_{i=1}^{M} (\lambda_\alpha^2 - x_i^2). \tag{82}$$

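The difference between the symplectic and orthogonal cases is due to the absence of the factor $\prod_i x_i^2$. Using the analysis of Sect. 2, $F_K(\lambda_1, \cdots, \lambda_K)$ is given by the determinantal form as in (14). Changing variables from $x_i^2$ to $y_i$ and denoting $\mu_i = \lambda_i^2$, we have, up to the normalization,

$$F_K(\mu_1, \cdots, \mu_K) = \int_0^\infty \prod_{i=1}^{M} dy_i \prod_{i=1}^{M} y_i^{-1/2} \prod_{i<j} (y_i - y_j)^2 \prod_{\alpha=1}^{K} \prod_{i=1}^{M} (\mu_\alpha - y_i)\, e^{-N \sum_i y_i}. \tag{83}$$

The orthogonal polynomials for this case are the Laguerre polynomials $L_n^{(-1/2)}(y)$, which are defined by

$$L_n^{(-1/2)}(y) = (-1)^n \sqrt{y}\, \frac{e^{Ny}}{N^n} \left(\frac{d}{dy}\right)^{n} \left(y^{n - 1/2} e^{-Ny}\right) = n!\, \frac{(-1)^n}{N^n} \oint \frac{du}{2\pi i}\, \frac{(1+u)^{n - 1/2}}{u^{n+1}}\, e^{-Nuy}, \tag{84}$$

normalized as $L_n^{(-1/2)}(y) = y^n + \text{lower degree}$. The orthogonality condition is

$$\int_0^\infty dy\, e^{-Ny}\, \frac{1}{\sqrt{y}}\, L_n^{(-1/2)}(y)\, L_m^{(-1/2)}(y) = h_n \delta_{n,m} \tag{85}$$

with $h_n = n!\, \Gamma(n + \frac{1}{2}) / N^{2n + 1/2}$, and $h_{N-1} \simeq 2\pi e^{-2N}$ in the large-$N$ limit. From (14), we have, similarly to the Sp(N) case,

$$F_K(\mu_1, \cdots, \mu_K) = (-1)^{K(M + \frac{K-1}{2})}\, \frac{\prod_{l=0}^{K-1} (M+l)!}{N^{K(M + \frac{K}{2} - \frac{1}{2})}}\, \frac{1}{\Delta(\mu)} \oint \prod_{i=1}^{K} \frac{dz_i}{2\pi i} \prod_{l=1}^{K} \frac{(1 + z_l)^{M + K - \frac{3}{2}}}{z_l^{M+K}}\, e^{-N \sum_\alpha z_\alpha \mu_\alpha} \prod_{i<j} \left( \frac{z_i}{1 + z_i} - \frac{z_j}{1 + z_j} \right). \tag{86}$$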

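We set $M = N - K$, and the factor $\prod_{l=0}^{K-1} (M+l)! / N^{K(M + K/2 - 1/2)}$ is equal to $(2\pi N)^{K/2} e^{-KN}$, up to corrections of relative order $1/N$ in the large-$N$ limit. The saddle point $z_l$ is the same as in (74). The only difference is the extra factor $(1 + z_l)^{-1} \simeq \frac{\lambda_l}{i\epsilon_l}$. When $K = 2$, we obtain

$$F_2(\lambda_1, \lambda_2) = 2\pi e^{-2N} K_O(\lambda_1, \lambda_2), \tag{87}$$

with the kernel $K_O(\lambda_1, \lambda_2)$ given by

$$K_O(\lambda_1, \lambda_2) = \frac{\sin[N(\lambda_1 - \lambda_2)]}{2\pi(\lambda_1 - \lambda_2)} + \frac{\sin[N(\lambda_1 + \lambda_2)]}{2\pi(\lambda_1 + \lambda_2)}. \tag{88}$$

The factor $(2\pi) e^{-2N}$ is cancelled by the normalization factor $1/h_{N-1} \simeq (2\pi)^{-1} e^{2N}$. Putting $\lambda_1 = \lambda_2 = 0$, we have, neglecting the factor $2\pi e^{-2N}$, $F_2(0) \simeq \frac{1}{2\pi}(2N)$. For general $K$, $F_K(\lambda_1, \cdots, \lambda_K)$ becomes in the scaling limit

$$F_K(\lambda_1, \cdots, \lambda_K) = (-1)^{K(N - K + \frac{K-1}{2})}\, (2\pi N)^{\frac{K}{2}} e^{-NK}\, (i)^{\frac{K}{2}(K-3)} \left(\frac{\pi}{N}\right)^{\frac{K}{2}} \frac{1}{(2\pi i)^K} \times \sum_{\epsilon} \frac{e^{-iN \sum_i \epsilon_i \lambda_i}}{\prod_{i<j} (\epsilon_i \lambda_i + \epsilon_j \lambda_j)}. \tag{89}$$

The sum over all the saddle-points, characterized by $\epsilon_i = \pm 1$, is conveniently written as a contour integral,

$$I = \sum_{\epsilon} \frac{e^{-iN \sum_i \epsilon_i \lambda_i}}{\prod_{i<j} (\epsilon_i \lambda_i + \epsilon_j \lambda_j)} = (-1)^{\frac{K}{2}(K-1)}\, \frac{2^K}{K!} \oint \cdots \oint \prod_{i=1}^{K} \frac{du_i}{2\pi i}\, \frac{\Delta(u^2)\, \Delta(u) \prod_{i=1}^{K} u_i}{\prod_{i=1}^{K} \prod_{j=1}^{K} (u_i^2 - \lambda_j^2)}\, e^{-iN \sum_{i=1}^{K} u_i}, \tag{90}$$

where the contour encloses $u_i = \pm\lambda_j$. We may now set $\lambda_j = \lambda$, and keeping track of various coefficients, we obtain the $K$th moment $F_K(\lambda, \cdots, \lambda)$. For general $\lambda$, the result has a complicated form, but when $\lambda = 0$, it becomes a number

$$F_K(0, \cdots, 0) = (-1)^{K(N-1)}\, (i)^{\frac{K}{2}(K-5)}\, N^{\frac{K}{2}(K-1)}\, \frac{2^{K/2} e^{-NK}}{K!} \times \oint \prod_{i=1}^{K} \frac{du_i}{2\pi i}\, \frac{\Delta(u^2)\, \Delta(u)}{\prod_{i=1}^{K} u_i^{2K-1}}\, e^{-i \sum_{i=1}^{K} u_i}. \tag{91}$$

The normalization factor is $(2\pi)^{-\frac{K}{2}} e^{KN}$ for $F_K(\lambda)$. Denoting the normalized $K$th moment by $\tilde{F}_K(\lambda)$, we have

$$\tilde{F}_K(0) = (-1)^{KN} \prod_{l=1}^{K-1} \frac{l!}{(2l)!}\, \frac{(2N)^{\frac{K}{2}(K-1)}}{\pi^{\frac{K}{2}}}. \tag{92}$$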
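A simple consistency check on the closed forms (80) and (92) is to evaluate them at $K = 2$ and compare with the kernel values at the origin: $\frac{1}{2\pi}\frac{4}{3!}N^3 = N^3/(3\pi)$ for Sp(N) and $\frac{1}{2\pi}(2N) = N/\pi$ for O(2N). The sketch below computes only the rational coefficient multiplying the power of $N$ and $\pi^{-K/2}$, ignoring the overall sign $(-1)^{KN}$; the function names are hypothetical.

```python
from fractions import Fraction
from math import factorial

def sp_moment_coeff(K):
    """Coefficient of N^{K(K+1)/2} / pi^{K/2} in (80), sign omitted."""
    c = Fraction(2) ** (K * (K + 1) // 2)
    for l in range(1, K + 1):
        c *= Fraction(factorial(l), factorial(2 * l))
    return c

def o_moment_coeff(K):
    """Coefficient of N^{K(K-1)/2} / pi^{K/2} in (92), sign omitted."""
    c = Fraction(2) ** (K * (K - 1) // 2)
    for l in range(1, K):
        c *= Fraction(factorial(l), factorial(2 * l))
    return c

sp2 = sp_moment_coeff(2)   # compare with N^3/(3 pi): coefficient 1/3
o2 = o_moment_coeff(2)     # compare with N/pi: coefficient 1
print(sp2, o2)
```

Both $K = 2$ coefficients agree with the kernel limits, which supports the reading of the exponents $\frac{K}{2}(K+1)$ and $\frac{K}{2}(K-1)$ in the two formulas.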

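We have $\tilde{F}_{2K}(\lambda_1, \lambda_1, \cdots, \lambda_K, \lambda_K) = \det[K(\lambda_i, \lambda_j)] / \Delta^2(\lambda^2)$. Using the expression for the kernel (88), we obtain for the $2K$th moment in the orthogonal O(2N) case,

$$\frac{\det[K(\lambda_i, \lambda_j)]}{\Delta^2(\lambda^2)} = \frac{2^K}{K!} \oint \prod_{i=1}^{K} \frac{du_i}{2\pi i} \frac{dv_i}{2\pi i}\, \frac{\Delta(u^2)\, \Delta(v^2) \prod_{i=1}^{K} (u_i v_i)}{\prod_{i=1}^{K} \prod_{j=1}^{K} (u_i^2 - \lambda_j^2) \prod_{i=1}^{K} \prod_{j=1}^{K} (v_i^2 - \lambda_j^2)} \times \frac{1}{(2\pi)^K} \prod_{i=1}^{K} \frac{\sin[N(u_i - v_i)]}{u_i - v_i}. \tag{93}$$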

Inserting $\lambda_i = 0$, we find a result consistent with (91).

8. Negative Moments

In the number theory literature one finds various moments in which powers of the zeta-function appear in the denominator [11]. The equivalent for random matrices would be to consider expectation values of the form $\left\langle \prod_{l=1}^{K} \det(\lambda_l - X)^{\epsilon_l} \right\rangle$ in which the $\epsilon$'s are $\pm 1$. One cannot use the techniques introduced above any more but, at least in the Gaussian case, it is easy to obtain exact expressions through the use of auxiliary integrations, over both commuting and anti-commuting variables. We first rederive our previous results for positive moments (i.e. $\epsilon_l = +1$ for all $l$'s). Let us introduce $M$ Grassmann variables $\bar{c}_a, c_a$ and an integration normalized to

$$\int \frac{d\bar{c}\, dc}{\pi}\, \bar{c} c = 1. \tag{94}$$

Then, for a Hermitian $M \times M$ matrix $X$, one has

$$\det(\lambda - X) = N^{-M} \int \prod_{a=1}^{M} \frac{d\bar{c}_a\, dc_a}{i\pi}\, \exp\left\{ iN \sum_{a,b} \left[ \bar{c}_a (\lambda \delta_{a,b} - X_{a,b}) c_b \right] \right\}. \tag{95}$$

A product $\prod_{l=1}^{K} \det(\lambda_l - X)$ is represented by a product of $K$ integrals of the type (95). At the end the random matrix $X$ appears in an expression of the form

$$\left\langle \exp\left( -iN \sum_{l=1}^{K} \sum_{a,b=1}^{M} X_{ab}\, \bar{c}_a^{(l)} c_b^{(l)} \right) \right\rangle. \tag{96}$$

With the Gaussian probability weight (16) we have

$$\left\langle \exp(iN\, \mathrm{Tr} A X) \right\rangle = \exp\left( -\frac{N}{2}\, \mathrm{Tr} A^2 \right), \tag{97}$$

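and thus

$$\left\langle \prod_{l=1}^{K} \det(\lambda_l - X) \right\rangle = N^{-MK} \int \prod_{l=1}^{K} \prod_{a=1}^{M} \frac{d\bar{c}_a^{(l)}\, dc_a^{(l)}}{i\pi}\, \exp\left[ iN \sum_{l=1}^{K} \lambda_l \gamma_{ll} + \frac{N}{2} \sum_{l,m=1}^{K} \gamma_{lm} \gamma_{ml} \right] \tag{98}$$

with

$$\gamma_{lm} = \sum_{a=1}^{M} \bar{c}_a^{(l)} c_a^{(m)}. \tag{99}$$

We can use an auxiliary $K \times K$ Hermitian matrix $B$ to replace the quadratic terms in $\gamma$ by

$$\exp\left( \frac{N}{2}\, \mathrm{Tr}\, \gamma^2 \right) = \left( \frac{N}{2\pi} \right)^{K^2/2} \int d^{K^2} B\, \exp\left( N\, \mathrm{Tr}\, \gamma B - \frac{N}{2}\, \mathrm{Tr}\, B^2 \right). \tag{100}$$

We are left with an integral over the Grassmannian variables

$$N^{-MK} \int \prod_{a=1}^{M} \prod_{l=1}^{K} \frac{d\bar{c}_a^{(l)}\, dc_a^{(l)}}{i\pi}\, \exp\left[ N \sum_{l,m=1}^{K} (i\lambda_l \delta_{lm} + B_{lm}) \sum_{a=1}^{M} \bar{c}_a^{(l)} c_a^{(m)} \right] = \left[ \det_{1 \le l,m \le K} (\lambda_l \delta_{lm} - i B_{lm}) \right]^{M}. \tag{101}$$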

We end up with an integral over a K × K Hermitian matrix B:

\[ \Big\langle \prod_{1}^{K} \det(\lambda_l - X) \Big\rangle = \Big( \frac{N}{2\pi} \Big)^{K^2/2} \int d^{K^2}B\; \{ \det(\lambda_l \delta_{lm} - iB_{lm}) \}^{M} \exp\Big( -\frac{N}{2}\, \mathrm{Tr}\, B^2 \Big). \tag{102} \]

Therefore, from this method as well, we have reduced the correlations of the characteristic functions of the matrix to an integral over $K^2$ variables. If one is interested in the moments, i.e. $\lambda_l = \lambda$ for all l's, one may take as variables the eigenvalues $b_l$ of B (which yields a factor $\Delta^2(\lambda_1, \cdots, \lambda_K)$), and recover the previous expressions. For unequal $\lambda_l$'s, one must first shift the matrix B by the diagonal matrix $i\lambda_l \delta_{lm}$, and then integrate out the unitary group SU(K) by the Itzykson–Zuber formula [15–17], to reduce it, as before, to an integral over K variables (a slightly different integral, but one which may be handled in the large-N limit in an identical fashion). In the case of negative moments the method is identical, except that we now need ordinary commuting variables instead of Grassmannian ones. Indeed, starting from

\[ \frac{1}{\det(\lambda - X \pm i\epsilon)} = N^{M} \int \prod_{1}^{M} \frac{d\phi_a^{*}\, d\phi_a}{\pm i\pi}\, \exp\Big\{ \pm iN \sum_{a,b} \big[ \phi_a^{*} (\lambda\delta_{a,b} - X_{a,b} \pm i\epsilon\delta_{a,b}) \phi_b \big] \Big\}, \tag{103} \]


one can introduce, for each factor $(\det(\lambda_l - X))^{\epsilon_l}$, an integration over M complex variables $(\phi_a^{*}, \phi_a)$ if $\epsilon_l = -1$, or over M complex Grassmannian variables $(\bar c_a, c_a)$ if $\epsilon_l = +1$. The expectation value with the Gaussian weight P(X) is then immediate. Of course for the negative moments, one must pay attention to the sign of the infinitesimal imaginary part of the λ's since there is a cut on the real axis along the support of Wigner's semi-circle. Although the method is obvious and elementary, the notation can become cumbersome and, rather than working out the most general case with arbitrary choices for the signs of the imaginary parts, we restrict ourselves to an example. If we consider only negative powers, we may follow the same steps as above for positive powers, and we find

\[ \Big\langle \prod_{1}^{K} \frac{1}{\det(\lambda_l - X + i\epsilon)} \Big\rangle = \Big( \frac{N}{2\pi} \Big)^{K^2/2} \int d^{K^2}B\; \{ \det(\lambda_l \delta_{lm} - B_{lm} + i\epsilon\delta_{lm}) \}^{-M} \times \exp\Big( -\frac{N}{2}\, \mathrm{Tr}\, B^2 \Big). \tag{104} \]

When all the λ's are equal the r.h.s. simplifies to an integral over K variables

\[ \int \prod_{1}^{K} db_l\; \frac{\Delta^2(b_1, \cdots, b_K)}{\prod_{1}^{K} (\lambda - b_l + i\epsilon)^{M}}\, \exp\Big( -\frac{N}{2} \sum_{1}^{K} b_l^2 \Big). \]

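For unequal $\lambda_l$'s, after a shift of the matrix B and the integration over SU(K), one obtains

\[ \Big\langle \prod_{1}^{K} \frac{1}{\det(\lambda_l - X + i\epsilon)} \Big\rangle = \Big( \frac{N}{2\pi} \Big)^{K(K+1)/2} \exp\Big( -\frac{N}{2} \sum_{1}^{K} \lambda_l^2 \Big) \times \int \prod_{1}^{K} \frac{db_l}{(b_l - i\epsilon)^{M}}\; \frac{\Delta(b_1, \cdots, b_K)}{\Delta(\lambda_1, \cdots, \lambda_K)}\, \exp\Big\{ -N \sum_{1}^{K} \Big( \frac{1}{2} b_l^2 + b_l \lambda_l \Big) \Big\}, \tag{105} \]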

from which one could repeat easily the analysis of Sect. 3.

9. Discussion

We have discussed the universal expressions for the moments of the characteristic polynomials in a random matrix theory, where the ensembles belong to the unitary family (β = 2). We have shown that these universalities are related to the universality of the kernel in Dyson's short distance limit. Since the statistics of the zeros of the ζ-function follow the universal behavior of the Gaussian unitary ensemble (GUE) [12, 19], the power moments of the ζ-function also have to follow the universal behavior of GUE. We have studied here the characteristic polynomial, which corresponds to the ζ-function on the critical line, and we have found a universal behavior for the moments of the characteristic polynomial. The universal number (49) appears indeed in the average of the moment of the ζ-function, which was conjectured as (2), $\gamma_K = \prod_{l=0}^{K-1} l!/(K+l)!$.
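The product formula for $\gamma_K$ quoted above is easy to tabulate exactly. The following is a minimal sketch; the function name `gamma_K` is a hypothetical helper, not notation from the paper.

```python
from fractions import Fraction
from math import factorial

def gamma_K(K):
    """Exact value of prod_{l=0}^{K-1} l! / (K + l)! for a positive integer K."""
    value = Fraction(1)
    for l in range(K):
        value *= Fraction(factorial(l), factorial(K + l))
    return value

# First few values of the universal coefficient.
for K in (1, 2, 3):
    print(K, gamma_K(K))
```

Exact rational arithmetic is used because the values decrease factorially fast, so floating point would lose the structure of the denominators almost immediately.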


Our method of splitting the singularity by the introduction of the distinct $\lambda_i$ may be applied directly to the average of the power moment of the Riemann ζ-function. We consider the average of the product of $\zeta(s_i)$, $s_i = \frac{1}{2} \pm i(\lambda_i + t)$,

\[ F = \frac{1}{T} \int_0^T \prod_{i=1}^{2K} \zeta(s_i)\, dt, \tag{106} \]

where we choose K positive $\lambda_i$'s and K negative ones. If, at the end of the calculation, we set all the positive $\lambda_i$'s equal to λ and the negative ones to −λ, one recovers the 2K-th moment of the modulus of the ζ-function. When T is large, the leading and the next-to-leading terms of the derivative of ln F with respect to $\lambda_i$ are presumably given by

\[ \frac{\partial \ln F}{\partial \lambda_i} \sim \pm i \ln T - \sum_{j \neq i} \frac{1}{\lambda_i - \lambda_j}, \tag{107} \]

where the pole in the second term appears when two distinct $\lambda_i$ coincide, one of the $\lambda_i$'s with a plus sign and the other one with a minus sign. In the Appendix a discussion of the assumptions leading to (107) is given. Then, after integration, we have, following a line of arguments similar to those of Sect. 5,

\[ F = \frac{c}{K!} \oint \prod_{i=1}^{K} \frac{du_i}{2\pi i}\; \frac{\Delta^2(u)}{\prod_{i=1}^{K} \prod_{j=1}^{2K} (u_i - \lambda_j)}\; e^{-i \sum_{i=1}^{K} u_i \ln T}. \tag{108} \]

Therefore, if we let the $\lambda_i$'s coincide, we recover the integral (39), which provides the universal coefficient $\gamma_K$. The coefficient c is not determined by this method, which starts with the logarithmic derivative of F, and an extra normalization condition is needed. In (A.7) it will be argued that a coefficient $a_K$ is present in the result, which is the residue at s = 1 of a function $g_K(s)$ defined in the Appendix; it is thus plausible that the coefficient c in (108) is nothing but $c = a_K$. We have also investigated negative moments, as in (105). This result may apply to the mean value of negative moments of the ζ-function. Indeed, the exponent $K^2$ of log T for negative integer K has been conjectured [11]. For the symplectic and orthogonal cases, the Sp(N) and O(2N) ensembles, there may also be a correspondence between the random matrix results (80), (92) and the average values of certain L-functions, with the same $\gamma_K$, as far as there is universality. Existing conjectures [13] for the moments of the L-functions show the same exponents $\frac{K}{2}(K+1)$ and $\frac{K}{2}(K-1)$ for the symplectic and the orthogonal cases, and the conjectured values of $\gamma_K$ agree with our results (80) and (92).

Acknowledgement. This work was supported by the CREST of JST, and one of us (E.B.) is happy to thank the organizers of the third CREST meeting for the invitation extended to him.

Appendix: Summation Formula for the Riemann Zeta-Function

The Riemann ζ-function is given by

\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s} = \prod_{p} \frac{1}{1 - \frac{1}{p^s}}, \tag{A.1} \]


where p is a prime number. The K-th power of this function is written as

\[ [\zeta(s)]^K = \sum_{n=1}^{\infty} \frac{d_K(n)}{n^s} = \prod_{p} \Big( 1 + \frac{d_K(p)}{p^s} + \frac{d_K(p^2)}{p^{2s}} + \cdots \Big), \tag{A.2} \]

where $d_K(n)$ is the K-th Dirichlet coefficient. When n is a power of a prime number, $d_K(p^j) = \Gamma(K+j)/(\Gamma(K)\, j!)$ (this follows easily from the definition of the Dirichlet coefficient, $d_K(n) = \sum_{n_1 \cdots n_K = n} 1$).
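The closed form for $d_K(p^j)$ can be compared with a brute-force count of ordered factorizations. This is a small sketch for integer K, where $\Gamma(K+j)/(\Gamma(K)\, j!)$ equals the binomial coefficient $\binom{K+j-1}{j}$; the function name `d_K` mirrors the text, while the recursive implementation is an illustrative assumption.

```python
from math import comb

def d_K(K, n):
    """Number of ordered K-tuples of positive integers whose product is n."""
    if K == 1:
        return 1
    # Choose the first factor m (a divisor of n), then factor n // m into K - 1 parts.
    return sum(d_K(K - 1, n // m) for m in range(1, n + 1) if n % m == 0)

# Compare with the closed form on prime powers p^j.
for K in range(1, 5):
    for j in range(0, 6):
        assert d_K(K, 2 ** j) == comb(K + j - 1, j)
```

The count is independent of the prime p, as the formula states, since only the exponent j is being distributed among the K factors.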

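We consider now the average of (A.2) on the critical line $s = \frac{1}{2} + it$ over a large interval T,

\[ \frac{1}{T} \int_0^T dt\, \Big| \zeta\Big( \frac{1}{2} + it \Big) \Big|^{2K} = \frac{1}{T} \int_0^T dt\, \Big| \sum_{n=1}^{\infty} \frac{d_K(n)}{n^{\frac{1}{2} + it}} \Big|^2. \tag{A.3} \]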

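Expanding the sum $\big| \sum_{n=1}^{\infty} d_K(n)/n^s \big|^2$, which appears in (A.3), we first examine the diagonal terms,

\[ \sum_{n=1}^{\infty} \frac{d_K^2(n)}{n^s} = \prod_{p} \Big( 1 + \frac{d_K^2(p)}{p^s} + \frac{d_K^2(p^2)}{p^{2s}} + \cdots \Big) = \prod_{p} (1 - p^{-s})^{-K^2} \Big( 1 - \frac{K^2 (K-1)^2}{4}\, p^{-2s} + \cdots \Big) = [\zeta(s)]^{K^2} g_K(s), \tag{A.4} \]

where

\[ g_K(s) = \prod_{p} \Big\{ (1 - p^{-s})^{K^2} \sum_{j=0}^{\infty} \frac{d_K^2(p^j)}{p^{js}} \Big\}. \tag{A.5} \]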
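The local factor in (A.4) and (A.5) can be expanded mechanically. The sketch below, in which all function names are hypothetical helpers, checks that the term linear in $x = p^{-s}$ cancels and that the quadratic coefficient is $-K^2(K-1)^2/4$, and then evaluates the Euler product for $g_K(1)$ numerically over the primes below a cutoff. For K = 2 the local factor reduces to $1 - p^{-2s}$, so $g_2(1)$ should approach $1/\zeta(2) = 6/\pi^2$; the cutoff and the series truncation are arbitrary choices.

```python
from math import comb, pi

def local_factor_coeffs(K, order):
    """Coefficients in x of (1 - x)^(K^2) * sum_j d_K(p^j)^2 x^j, up to x^order."""
    binom = [(-1) ** i * comb(K * K, i) for i in range(order + 1)]   # (1 - x)^(K^2)
    series = [comb(K + j - 1, j) ** 2 for j in range(order + 1)]     # d_K(p^j)^2
    return [sum(binom[i] * series[n - i] for i in range(n + 1))
            for n in range(order + 1)]

def primes_up_to(n):
    """Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i in range(n + 1) if sieve[i]]

def g_K_at_1(K, prime_bound=10_000, terms=200):
    """Truncated Euler product (A.5) at s = 1 (numerical approximation only)."""
    product = 1.0
    for p in primes_up_to(prime_bound):
        x = 1.0 / p
        series = sum(comb(K + j - 1, j) ** 2 * x ** j for j in range(terms))
        product *= (1.0 - x) ** (K * K) * series
    return product

print(local_factor_coeffs(3, 2))      # constant, linear and quadratic coefficients
print(g_K_at_1(2), 6 / pi ** 2)       # the two numbers should be close
```

The vanishing of the linear coefficient is what makes the product (A.5) converge at s = 1, which is the analyticity property used in the text that follows.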

The function $g_K(s)$ is an analytic function of s, including the point s = 1. Let us examine the contribution of these diagonal terms given by (A.4) to (A.3). Their contribution is conveniently found if we apply the following inversion formula (Perron formula):

\[ B(s) = \sum_{n=1}^{\infty} b_n n^{-s}, \qquad f(x) = \sum_{n \le x} b_n. \tag{A.6} \]

Then, we have

\[ f(x) = \frac{1}{2\pi i} \int_{c - i\infty}^{c + i\infty} B(s)\, x^s s^{-1}\, ds, \tag{A.7} \]


in which c is some arbitrary real positive number. Substituting $b_n = d_K^2(n)$ and $B(s) = \zeta^{K^2}(s)\, g_K(s)$, we obtain, from the residue of the singularity at s = 1,

\[ \sum_{n \le x} d_K^2(n) = \frac{g_K(1)}{\Gamma(K^2)}\, x \log^{K^2 - 1} x + O\big( x \log^{K^2 - 2} x \big). \tag{A.8} \]

By a partial summation, this approximate calculation yields

\[ \sum_{n \le T} \frac{d_K^2(n)}{n} \sim \frac{a_K}{\Gamma(K^2 + 1)} \log^{K^2} T, \tag{A.9} \]

where $a_K = g_K(1)$. From these formulae, it is seen that the contribution of the diagonal terms to the average of the K-th power moment of the ζ-function does take the asymptotic form of (1). However, neglecting the off-diagonal terms, we failed to reproduce the proper coefficient $\gamma_K$, whose understanding clearly requires the off-diagonal products in (A.4) as well. A lower bound for the 2K-th moment is known [18]:

\[ \int_{T - Y}^{T + Y} \Big| \zeta\Big( \frac{1}{2} + it \Big) \Big|^{2K} dt \gg Y \log^{K^2} Y, \tag{A.10} \]

where $\log^{\epsilon} T \le Y \le T$. An upper bound seems difficult to obtain, and (1) remains a conjecture, except for the K = 1 and K = 2 cases, for which it has been derived. We note here the results and the conjecture of Montgomery [19] about the density of the zeros of the Riemann ζ-function and their correlation. When γ is a zero on the critical line, $\zeta(\frac{1}{2} + i\gamma) = 0$,

\[ \sum_{0 < \gamma \le T} 1 \ge \Big( \frac{2}{3} + o(1) \Big) \frac{T}{2\pi} \log T, \tag{A.11} \]

\[ \sum_{\substack{0 < \gamma, \gamma' \le T \\ \alpha/L \le \gamma - \gamma' \le \beta/L}} 1 = (1 + o(1)) \Big\{ \int_{\alpha}^{\beta} \Big[ 1 - \Big( \frac{\sin \pi u}{\pi u} \Big)^2 \Big] du + \delta(\alpha, \beta) \Big\}\, T L, \tag{A.12} \]

where L = log T/(2π), and δ(α, β) = 1 for 0 ∈ [α, β], and zero otherwise. Then (A.11) is equivalent to the average density of states in (1), with $\gamma_K = a_K = 1$ for K = 1, and (A.12) is equivalent to the pair correlation function in random matrix theory. Let us present the arguments which lead to the conjectured formula (107); we first assume that $\lambda_1 - \lambda_2 \sim O((\ln T)^{-1})$ for large T. The diagonal approximation for the product of $\zeta(s_1)$ and $\zeta(s_2)$, which earlier gave the expected behaviour for the moment, but with a wrong coefficient, may thus be applied here again, since we are taking a logarithmic derivative, which is insensitive to overall normalizations. Within this assumption,


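we obtain

\[ \frac{\partial}{\partial \lambda_1} \log F = \frac{\partial}{\partial \lambda_1} \ln \Big[ \frac{1}{T} \int_1^T dt \sum_{n=1}^{\infty} \frac{1}{n^{\frac{1}{2} + i\lambda_1 + it}} \sum_{n=1}^{\infty} \frac{1}{n^{\frac{1}{2} - i\lambda_2 - it}}\; \zeta(s_3) \cdots \zeta(s_{2K}) \Big] \]

\[ \sim \frac{\partial}{\partial \lambda_1} \ln \Big[ \sum_{n} \frac{1}{n^{1 + i(\lambda_1 - \lambda_2)}}\; \frac{1}{T} \int_1^T dt\, \zeta(s_3) \cdots \zeta(s_{2K}) \Big] \]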
We have considered up to now what happens when $\lambda_1 - \lambda_2$ is small, but we should repeat the same arguments for the Dyson limit in which all pairs $\lambda_1 - \lambda_j$ are of order $(\log T)^{-1}$. Therefore, when one sums over all possible combinations, one obtains (107).

References

1. Mehta, M.L.: Random Matrices. New York: Academic Press, 1991
2. Dyson, F.J.: J. Math. Phys. 13, 90–97 (1972)
3. Ambjorn, J. and Makeenko, Yu.M.: Mod. Phys. Lett. A 5, 1753–1764 (1990)
4. Brézin, E. and Zee, A.: Nucl. Phys. B 402, 613–627 (1993)
5. Brézin, E. and Hikami, S.: Nucl. Phys. B 479, 697–706 (1996)
6. Brézin, E. and Hikami, S.: Phys. Rev. E 56, 264–269 (1997)
7. Zinn-Justin, P.: Commun. Math. Phys. 194, 631–650 (1998)
8. Brézin, E., Hikami, S. and Larkin, A.I.: Phys. Rev. B 60, 3589–3602 (1999)
9. Conrey, J.B. and Gonek, S.M.: High Moments of the Riemann Zeta-function. Preprint
10. Keating, J. and Snaith, N.: Random matrix theory and some zeta-function moments. Lecture at Erwin Schrödinger Institute, Sept., 1998; J. Keating, MSRI workshop, Random Matrices and their Applications, June 1999
11. Gonek, S.M.: Mathematika 36, 71–88 (1989)
12. Katz, N.M. and Sarnak, P.: Random matrices, Frobenius eigenvalues, and monodromy. AMS Colloquium publications, Vol. 45, Providence, RI: AMS, 1999
13. Conrey, J.B. and Farmer, D.W.: Mean Values of L-Functions and Symmetry. AIM preprints, 1999
14. Rubinstein, M.O.: PhD thesis, Princeton University, 1998
15. Harish-Chandra: Proc. Nat. Acad. Sci. 42, 252–253 (1956)
16. Itzykson, C. and Zuber, J.-B.: J. Math. Phys. 21, 411–421 (1980)
17. Duistermaat, J.J. and Heckman, G.H.: Invent. Math. 69, 259–268 (1982)
18. Ramachandra, K.: Hardy-Ramanujan J. 3, 1–24 (1980)
19. Montgomery, H.L.: Proc. Symp. Pure. Math. 24, 181–193 (1973)

Communicated by P. Sarnak

Commun. Math. Phys. 214, 137 – 189 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Scalar Curvature Deformation and a Gluing Construction for the Einstein Constraint Equations

Justin Corvino

Department of Mathematics, Stanford University, Stanford, CA 94305-2125, USA. E-mail: [email protected]

Received: 8 November 1999 / Accepted: 27 March 2000

Abstract: On a compact manifold, the scalar curvature map at generic metrics is a local surjection [F-M]. We show that this result may be localized to compact subdomains in an arbitrary Riemannian manifold. The method is extended to establish the existence of asymptotically flat, scalar-flat metrics on $\mathbb{R}^n$ (n ≥ 3) which are spherically symmetric, hence Schwarzschild, at infinity, i.e. outside a compact set. Such metrics provide Cauchy data for the Einstein vacuum equations which evolve into nontrivial vacuum spacetimes which are identically Schwarzschild near spatial infinity.

1. Introduction

The main goal of this paper is to construct certain types of solutions to the Einstein constraint equations whose existence was heretofore an open question. We start by reviewing some notions from general relativity which will help motivate the statement of the main problem.

The basic object of study in general relativity is a Lorentzian manifold $(\bar M^4, \bar g)$ (with metric signature (−, +, +, +)) satisfying Einstein's equation

\[ \mathrm{Ric}(\bar g) - \frac{1}{2} R(\bar g)\, \bar g = 8\pi T. \tag{1} \]

In this equation, $\mathrm{Ric}(\bar g)$ and $R(\bar g)$ denote the Ricci and scalar curvatures, respectively, of the metric $\bar g$, and T is the stress-energy tensor of matter; we consider the vacuum case T = 0. It is well known that Einstein's equation admits an initial value formulation, in which the vacuum initial data consist of a manifold $M^3$, a Riemannian metric g and a symmetric (0, 2)-tensor K on M. The Gauss and Codazzi equations provide the constraints upon g and K in order that they form, respectively, the induced metric and second fundamental form of M inside a spacetime $(\bar M, \bar g)$; in the vacuum case these constraints are [W]

\[ R(g) - \|K\|^2 + H^2 = 0, \tag{2} \]
\[ \mathrm{div}(K) - \nabla H = 0. \tag{3} \]

Current address: Department of Mathematics, Brown University, 151 Thayer Street, Box 1917, Providence, RI 02912, USA. E-mail: [email protected]

Here H denotes the mean curvature $g^{ij} K_{ij}$, and all quantities are computed with respect to g. From Eq. (2), we see that the scalar curvature R(g) is nonnegative in the maximal (H = 0) case, and R(g) is zero in the time-symmetric (K = 0) case. When M is closed and H is constant, a parametrization of solutions to the vacuum constraints has been obtained [I]. The first solution of the Einstein equation (1) (with T = 0) was obtained by Schwarzschild in 1916, with the metric given by

\[ ds^2 = -\Big( 1 - \frac{2m}{r} \Big) dt^2 + \Big( 1 - \frac{2m}{r} \Big)^{-1} dr^2 + r^2\, d\Omega^2 \]

in standard spherical coordinates. The singularity at r = 2m is only a coordinate singularity, and the metric can be extended to all of $\mathbb{R} \times (\mathbb{R}^3 \setminus \{0\})$ (see (4) for a formula for the spacelike part), but the spacetime does contain the first example of a black hole singularity ("at" r = 0). The spacetime is spherically symmetric, and the t = 0 slice is a Riemannian metric on $\mathbb{R}^3 \setminus \{0\}$ which is a time-symmetric solution to the vacuum constraints. This spacelike Schwarzschild slice is actually a complete Riemannian manifold, as near r = 0 the metric is asymptotically flat (AF). It can be shown by a simple ODE argument [H-E] that the only spherically symmetric, AF solutions to the constraint R(g) = 0 are isometric to these Schwarzschild slices; in fact this ODE argument shows that this is true even locally in spherically symmetric subdomains of a manifold. Note that except for the trivial case when m = 0 (flat $\mathbb{R}^3$), these Schwarzschild metrics do not extend to all of $\mathbb{R}^3$. Actually this discussion extends to $\mathbb{R}^n$ (n ≥ 3). We remark that by spherically symmetric initial data we mean data for which there is a subgroup of the isometry group isomorphic to SO(n), whose orbits are hypersurfaces. We note that the Schwarzschild metric has an analogue in higher dimensions, given by metrics which are conformal to the flat metric on $\mathbb{R}^n \setminus \{c\}$:

\[ g^{S}_{(m,c),ij}(x) = \Big( 1 + \frac{m}{(n-1)\, r^{n-2}} \Big)^{\frac{4}{n-2}} \delta_{ij} \tag{4} \]

in standard coordinates, with r = |x − c|. For later use we have included the center-of-mass parameter c. As above, we again find that these are the only examples of symmetric, scalar-flat initial data. In fact Birkhoff's theorem [H-E] states that any spherically symmetric domain in a vacuum spacetime is isometric to a domain in the maximally extended Schwarzschild spacetime; in particular this shows that spherically symmetric spacetimes are static (see the paragraph before Prop. 2.7 for a short discussion of staticity related to the issues in this paper); note that spherical symmetry in an (n + 1)-dimensional spacetime is imposed by a subgroup of the isometry group isomorphic to SO(n) whose orbits are (n − 1)-dimensional spacelike submanifolds. It has been a well-known open problem [S-Y1, p. 371] to determine whether there exist nontrivial metrics with zero scalar curvature on $\mathbb{R}^n$ which are spherically symmetric (hence Schwarzschild) outside a compact set ("at infinity"). The same could be posed


on punctured space, where "nontrivial" would mean "not Schwarzschild". The nonexistence of such metrics would imply that symmetry at infinity is somehow a rigid condition, in the sense that symmetry at infinity would force symmetry everywhere. Bray [Br1] demonstrates the existence of metrics g on $\mathbb{R}^n$ with R(g) ≥ 0 which are Schwarzschild at infinity, which he used in his approach to the Penrose Conjecture. The problem of keeping the scalar curvature precisely zero requires more analysis, which we carry out in the present work. We have discovered that indeed symmetry at infinity is not rigid, and in fact we have constructed scalar-flat metrics which are Schwarzschild at infinity and have considerable flexibility in the interior (see Theorem 2). By solving the Cauchy problem for the Einstein equation, we obtain small-time existence for nontrivial spacetimes which are Schwarzschild at spatial infinity. One may be initially inclined to believe that symmetry at infinity is a very rigid condition. For example, it is easy to show (by unique continuation for harmonic functions) that there is no conformally flat AF metric on $\mathbb{R}^n$ which is symmetric at infinity and scalar-flat. If the proposed metric were obtained by deforming a metric which is conformally flat at infinity, then the condition that the metric be symmetric means that in some coordinate chart, all the higher "moments" in the expansion of the conformal factor into spherical harmonics (see Sect. 5.4) must be made to vanish, a seemingly strong condition. In the Newtonian case, for example, it is elementary to show that for the Newtonian potential of a mass density ρ to be symmetric at infinity, the density ρ must be orthogonal to the nonconstant homogeneous harmonic polynomials, in which case we see ρ is somewhat special. So one might also think a special construction would carry over to initial data for GR, with the energy of the field in place of the matter.
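The scalar-flatness of the conformally flat Schwarzschild metrics (4) can be illustrated numerically. The sketch below relies on the standard fact, not derived in this paper, that a metric $u^{4/(n-2)}\delta$ has scalar curvature $-\frac{4(n-1)}{n-2}\, u^{-\frac{n+2}{n-2}} \Delta u$, so that it is scalar-flat exactly when u is harmonic. The helper names and the finite-difference step are illustrative assumptions.

```python
def conformal_factor(r, m, n):
    """u(r) = 1 + m / ((n - 1) r^(n - 2)), the conformal factor in (4)."""
    return 1.0 + m / ((n - 1) * r ** (n - 2))

def radial_laplacian(u, r, n, h=1e-3):
    """Flat Laplacian u'' + (n - 1)/r * u' of a radial function, by central differences."""
    second = (u(r + h) - 2.0 * u(r) + u(r - h)) / h ** 2
    first = (u(r + h) - u(r - h)) / (2.0 * h)
    return second + (n - 1) / r * first

# The conformal factor should be harmonic away from the center in every dimension n >= 3.
for n in (3, 4, 5):
    for r in (1.5, 2.0, 5.0):
        lap = radial_laplacian(lambda s: conformal_factor(s, 1.0, n), r, n)
        print(n, r, lap)   # each value should be zero up to discretization error
```

The same check fails for a generic radial conformal factor, which is one way to see why scalar-flatness combined with spherical symmetry singles out the Schwarzschild family.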
In fact, it appears that these metrics which are symmetric at infinity are in some sense generic amongst AF scalar-flat metrics, and we will make this more precise below (see Sect. 4).

The method we use to produce these metrics is to glue an AF metric, which is scalar-flat and conformally flat at infinity, to a Schwarzschild metric, at a "large" radius R. The gluing occurs in an annular region, in which the scalar curvature is near zero, decaying like a power of $\frac{1}{R}$ in the annular region (see Sect. 5). Naturally we would then like to deform the scalar curvature to be precisely zero, without jeopardizing the symmetry at infinity. The problem of scalar curvature deformation on closed manifolds has been addressed by Fischer and Marsden [F-M], who showed that on a closed manifold the scalar curvature is generically a local surjection; that is, given a function S which is the scalar curvature of a generic metric g, then any function sufficiently near S is also realizable as the scalar curvature of a metric near g. They also prove this near the flat metric on Euclidean space, using weighted Sobolev spaces to control the behavior of functions and tensors at infinity. The method of proof is the use of a Hodge-type elliptic splitting theorem along with the implicit function theorem. In fact as we shall soon see (Sect. 2), the linearization $L_g$ of the scalar curvature operator has an $L^2$-adjoint $L_g^*$ which has injective symbol. By elliptic theory, the appropriate Sobolev function space then splits as $\mathrm{ran}\, L_g \oplus \ker L_g^*$. Hence the generic condition which is sufficient to show local surjectivity is that $L_g^*$ have trivial kernel. What we do is generalize this to a compact domain in a Riemannian manifold $(M^n, g)$ (n ≥ 2), in the sense that given a sufficiently small compactly supported deformation of the scalar curvature, we achieve it as the scalar curvature of a metric which differs from the initial metric only in a compact set.
This localized scalar curvature deformation is obtained much in the spirit of the Fischer-Marsden paper, but it requires a bit more delicate analysis to produce a solution with compact support. We also mention here the


related result of Lohkamp [L], who showed using different means that it is possible to push the scalar curvature down a specified amount on a compact set. Although our result allows the scalar curvature to move in either direction, we can of course only expect to move it a small amount.

The basic outline of the paper follows. In Sect. 2, we define basic notation, prove several preliminary statements, and then state the two main theorems we will prove. In Sect. 3, we give the proof of the local scalar curvature deformation. The proof is contained in several subsections. First, we obtain the basic weighted elliptic estimate (10) of the overdetermined elliptic operator $L_g^*$, under the assumption that $\ker L_g^*$ is trivial. This is not a "standard" estimate, in the sense that although the function spaces considered allow growth at a certain rate at ∂Ω, the estimate is an absolute estimate on the domain Ω, without boundary terms. This estimate then allows us to demonstrate the local surjectivity of $L_g$, by solving $L_g h = \phi$ using a direct variational method in suitable weighted $L^2$-Sobolev spaces. These variational solutions have nice decay properties at the boundary of the domain (engineered by the weighting), and this allows us in the following subsection to use Picard iteration (in Hölder spaces via Schauder estimates) to obtain appropriate solutions of the nonlinear problem. To emphasize, the difficulty in simply applying standard methods is our insistence on the deformation of the original metric to have specified compact support, with control on the deformation near ∂Ω. Solving the linearized problem for $L_g$ in spaces in which the tensors decay at ∂Ω suggests that we consider for the domain of the adjoint $L_g^*$ functions that can grow at the boundary ∂Ω at some rate specified by a weight function. In Sect. 4 we recall the notion of the mass of an asymptotically flat manifold.
We also give an application (as indicated by Rick Schoen) of the local curvature deformation to a conjecture about Bartnik’s quasi-local mass: we show that minimal mass metrics (when they exist) are in fact static. In the final Sect. 5, we obtain the existence of scalar-flat metrics on Rn which are Schwarzschild at infinity, by using the aforementioned gluing followed by a local deformation in the annular gluing region. We mention that since the gluing is localized, the method applies to any conformally flat end of a scalar-flat AF manifold. The proof is not just a straightforward application of the local curvature deformation, however, as an obstruction appears which precludes us from being able to simply deform scalar curvatures which are close to zero in the annulus to precisely zero. This obstruction arises as the gluing is done at “large” radii in which the metric is approaching the flat metric δ, and we have an “approximate kernel” coming from the kernel of L∗δ . So the proof accounts for this problem by identifying effective parameters which give the added flexibility to enable us to make the scalar curvature zero, namely the mass m and center c of the Schwarzschild metric which we use for the gluing. The procedure is to note that transverse to this approximate kernel our basic elliptic estimates remain uniform. So we can indeed solve a projected problem: we can adjust the scalar curvature to lie in the approximate kernel. This is performed in an analogous fashion to the above deformation, and there are several subsections which mirror the above steps: elliptic estimate, variational solution of the linearized problem, and Picard iteration. The final step is to incorporate the parameters to show that the component of the scalar curvature lying in the approximate kernel can be made zero. This actually turns out to be slightly delicate, and it is written out in careful detail. 
The proof actually shows that the construction works for metrics which are conformally flat to leading order.


2. Preliminaries

2.1. Basic notation. Let Ω denote a compactly contained subdomain of a smooth manifold $M^n$ (n ≥ 3). We will specify regularity conditions on the boundary as needed, and we will generally assume ∂Ω is at least $C^2$. We list here some notation, and we define some function spaces which we will find useful.

• $\mathrm{Ric}(g) = R_{ij}$ and $R(g) = g^{ij} R_{ij}$ denote the Ricci and scalar curvatures, respectively, of a Riemannian metric g on M; we use the Einstein summation convention throughout, as well as the convention of using a semicolon to denote covariant differentiation and a comma to denote partial differentiation.
• We let $d\mu_g$ denote the volume measure induced by g, $d\sigma_g$ the induced surface measure on submanifolds, dx the Lebesgue measure on Euclidean space, and dξ the Euclidean surface measure.
• $S^{(0,2)}$ denotes the space of symmetric (0,2)-tensor fields.
• $\mathcal{H}^k$ denotes the subspace of $S^{(0,2)}$ consisting of those measurable tensors which are square integrable along with the first k weak covariant derivatives; with the standard $\mathcal{H}^k$-inner product induced by the metric g, $\mathcal{H}^k$ becomes a Hilbert space. $H^k$ is defined similarly for functions, and the spaces $\mathcal{H}^k_{loc}$ ($H^k_{loc}$) are defined as the spaces of tensors (functions) which are in $\mathcal{H}^k$ ($H^k$) on each compact subset.
• $\mathcal{M}^k$ ($k > \frac{n}{2}$) denotes the open subset of $\mathcal{H}^k$ of Riemannian metrics, and $\mathcal{M}^{k,\alpha}$ denotes the open subset of metrics in $C^{k,\alpha}$.
• Let ρ be a smooth positive function on Ω. Define $L^2_\rho(\Omega)$ to be the set of locally-$L^2$ functions f such that $f\rho^{1/2} \in L^2(\Omega)$. The pairing $\langle f, g \rangle_{L^2_\rho(\Omega)} = \langle f\rho^{1/2}, g\rho^{1/2} \rangle_{L^2(\Omega)}$ makes $L^2_\rho(\Omega)$ a Hilbert space. Define $\mathcal{L}^2_\rho(\Omega)$ similarly for tensor fields.
• Let $H^k_\rho(\Omega)$ be the Hilbert space of $L^2_\rho(\Omega)$ functions whose covariant derivatives up through order k are also $L^2_\rho(\Omega)$. The inner product is defined in the natural way, incorporating the $L^2_\rho(\Omega)$-pairings on all the derivatives. Define $\mathcal{H}^k_\rho(\Omega)$ similarly for tensor fields.
• Let $C^{k,\alpha}_\rho(\Omega)$ (0 < α < 1) denote the subset of $C^{k,\alpha}(\Omega)$ comprised of functions which are also in $L^2_\rho(\Omega)$. This is a Banach space with norm denoted $\|\cdot\|_{k,\alpha,\rho} = \|\cdot\|_{k,\alpha} + \|\cdot\|_{L^2_\rho}$. (Unless noted otherwise, norms will be taken over Ω.) Define $\mathcal{C}^{k,\alpha}_\rho(\Omega)$ analogously.

2.2. Some basic propositions. We note the following lemmas hold [F-M, Be].

Lemma 2.1. The scalar curvature map is a smooth map of Banach manifolds, as a map $R : \mathcal{M}^{l+2}(\Omega) \to H^l(\Omega)$ ($l + 2 > \frac{n}{2} + 1$), or $R : \mathcal{M}^{k+2,\alpha}(\Omega) \to C^{k,\alpha}(\Omega)$ (k ≥ 0). The linearization $L_g$ of the scalar curvature operator is given by

\[ L_g(h) = -\Delta_g(\mathrm{tr}_g(h)) + \mathrm{div}(\mathrm{div}(h)) - g(h, \mathrm{Ric}(g)), \]

in either of the above spaces.


Remark. Note that we have taken the Laplacian with negative eigenvalues, i.e.

\[ \Delta_g f = \frac{1}{\sqrt{|g|}}\, \partial_i \big( g^{ij} \sqrt{|g|}\, \partial_j f \big) = -\Delta_H f, \]

where $\Delta_H$ is the Hodge–de Rham Laplacian dδ + δd on forms; $\delta = (-1)^{n(p+1)+1} \ast d \ast$ on p-forms, and "∗" denotes the Hodge star. Also, "div" denotes the divergence operator, which on $S^{(0,2)}$, for example, is given by $\mathrm{div}(h)_i = g^{jk} h_{ij;k}$.

Lemma 2.2. The formal $L^2$-adjoint $L_g^*$ of $L_g$ is given by

\[ L_g^*(f) = -(\Delta_g f)\, g + \mathrm{Hess}(f) - f\, \mathrm{Ric}(g). \]

Proposition 2.3. If $\ker L_g^*$ is nonzero in $H^2_{loc}(\Omega)$, then the scalar curvature is constant in Ω.

Remark. For this proposition, $C^4$-regularity on g is enough to enable us to apply Aronszajn's theorem as it appears in [A], as well as the appropriate elliptic regularity statement. It can be weakened at least to $C^3$, as noted at the end of [A].

Proof ([F-M]). Taking the trace of the equation $L_g^* f = 0$ yields

\[ (n - 1)\, \Delta_g f = -R(g)\, f, \tag{5} \]

and hence

\[ \mathrm{Hess}(f) = \Big( \mathrm{Ric}(g) - \frac{R(g)}{n - 1}\, g \Big) f. \tag{6} \]

By elliptic regularity, $f \in H^3_{loc}(\Omega)$, so we can take the divergence of the equation $L_g^* f = 0$. To do this, we note the following identities; here $\mathcal{L}$ denotes the Lie derivative, so that $\mathcal{L}_X(g) = X_{i;j} + X_{j;i}$:

\[ \mathcal{L}_{\nabla\psi}(g) = 2\, \mathrm{Hess}(\psi), \qquad \mathrm{div}(\psi g) = d\psi, \qquad \mathrm{div}(\mathcal{L}_X(g)) = -\Delta_H X^{\flat} - d\delta X^{\flat} + 2\, \mathrm{Ric}(g) \cdot X, \]

where the second formula follows from the fact that g is parallel for the Levi-Civita connection, and where the third formula follows from the Bochner–Weitzenböck formula $\Delta_H X^{\flat} = \Delta X^{\flat} + \mathrm{Ric}(g) \cdot X$, the Ricci identity $X_{i;jk} - X_{i;kj} = X_l R_{kj}{}^{l}{}_{i}$, and the formula for dδ in terms of the connection, $d\delta X^{\flat} = -d(\mathrm{div}(X))$; here $X_i = g_{ij} X^j$, and the rough Laplacian Δ is given by $\Delta X^{\flat}_i = -g_{il}\, g^{jk} X^{l}{}_{;jk}$. Thus we have

\[ \mathrm{div}(\mathrm{Hess}\, \psi) = \frac{1}{2} \big( -\Delta_H d\psi - d\delta d\psi + 2\, \mathrm{Ric}(g) \cdot \nabla\psi \big) = \frac{1}{2} \big( -d\Delta_H \psi - d(\delta d + d\delta)\psi + 2\, \mathrm{Ric}(g) \cdot \nabla\psi \big) = d\Delta_g \psi + \mathrm{Ric}(g) \cdot \nabla\psi. \]


Hence upon taking the divergence of $L_g^* f = 0$, we obtain

\[ 0 = -d\Delta_g f + \big( d\Delta_g f + \mathrm{Ric}(g) \cdot \nabla f \big) - \mathrm{Ric}(g) \cdot \nabla f - f\, \mathrm{div}(\mathrm{Ric}(g)) = -\frac{1}{2}\, f\, d(R(g)) \]

by the Bianchi identity. From this we see that if $f(x_0) \neq 0$, then $d(R(g))(x_0) = 0$. To see what happens on the zero-set of f, we let γ be a geodesic starting from some $x_0$, and define h(t) = f(γ(t)). Then

\[ h''(t) = \mathrm{Hess}_{\gamma(t)} f(\gamma'(t), \gamma'(t)) = \Big( \mathrm{Ric}(g) - \frac{R(g)}{n - 1}\, g \Big)(\gamma'(t), \gamma'(t))\; h(t). \tag{7} \]

If both $f(x_0) = 0$ and $df(x_0) = 0$, then the above linear ODE for h has zero initial data, and so h ≡ 0. This implies that f is identically zero near the point $x_0$; since f satisfies the elliptic equation $(n - 1)\Delta_g f = -R(g) f$, we have by Aronszajn's unique continuation theorem [A] that f ≡ 0; note that as remarked above, the principal coefficients in the equation are sufficiently smooth ($C^{2,1}$) for g at least $C^4$. This shows that if $f \not\equiv 0$ and $f(x_0) = 0$, then $df(x_0) \neq 0$; hence zero is a regular value for f. Thus the zero-set of f is a codimension-one submanifold Σ, an embedded hypersurface in Ω. Clearly then d(R(g)) ≡ 0, so R(g) must be constant on Ω.

Corollary 2.4. $\dim \ker L_g^*$ is finite in $H^2_{loc}(\Omega)$; in fact the dimension does not exceed (n + 1).

Proof. There is only a finite-dimensional space of possible initial data of the form $(f(x_0), \nabla f(x_0))$ for the linear ODE above satisfied by elements in the kernel of $L_g^*$.

Proposition 2.5. $H^2_{loc}$ elements in $\ker L_g^*$ are actually in $C^2(\overline{\Omega})$.

Proof. We assume the metric is at least $C^3$ and the boundary is at least $C^2$, say, so that it can be covered by geodesically convex neighborhoods $U_i$ in which geodesics starting in Ω touch the boundary $\partial\Omega \cap U_i$ at most once. As we saw in (7), any solution f satisfies an ODE along geodesics. By the smooth dependence of solutions of the ODE on the initial values, it is easy to see that f is in $C^2(\Omega)$, as the exponential map is $C^2$. This is true up to the boundary, by considering geodesics initiating at points $p_i \in \Omega \cap U_i$ going to boundary points, and using the uniqueness theorem for ODE to show we can extend f across ∂Ω.

Proposition 2.6. The zero-set of a nontrivial element f in the kernel of $L_g^*$ is a totally geodesic hypersurface in Ω.

Proof. As shown above, the zero-set Σ is an embedded hypersurface, and so we can choose local coordinates $(x^i)$ adapted to Σ, with $x^1, \ldots, x^{n-1}$ coordinates on Σ. The gradient ∇f is a nowhere-zero normal vector field on Σ. Let $\bar D$ denote the ambient connection. Then since the Levi-Civita connection is compatible with the metric g and $g\big( \frac{\partial}{\partial x^j}, \nabla f \big) \equiv 0$ for j < n, we have on Σ,

\[ g\Big( \bar D_{\frac{\partial}{\partial x^i}} \frac{\partial}{\partial x^j},\, \nabla f \Big) = -g\Big( \frac{\partial}{\partial x^j},\, \bar D_{\frac{\partial}{\partial x^i}} \nabla f \Big) = 0. \]


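This follows since, as shown above, $\mathrm{Hess}\, f = \big( \mathrm{Ric}(g) - \frac{R(g)}{n-1}\, g \big) f$, and moreover $\bar D_{\frac{\partial}{\partial x^i}} \nabla f$ is a vector whose k-th component is $g^{jk} f_{;ij}$, which then clearly vanishes on Σ. Thus the second fundamental form of Σ vanishes, as $\bar D_{\frac{\partial}{\partial x^i}} \frac{\partial}{\partial x^j}$ has no normal component along Σ.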

The next proposition shows that the kernel of $L_g^*$ is related to static spacetimes in general relativity. A static spacetime is a four-dimensional Lorentzian manifold which possesses a timelike Killing field and a spacelike hypersurface which is orthogonal to the integral curves of this Killing field. In this case coordinates can be chosen so that the metric $\bar g$ is a warped product of the hypersurface (with metric $g$) and a time interval, where the warping factor $f$ is independent of time [W], i.e. $\bar g = -f^2\,dt^2 + g$.

Proposition 2.7. Let $(M^n, g)$ be a Riemannian manifold. Then $f$ is a nontrivial element in the kernel of $L_g^*$ if and only if the warped product metric $\bar g \equiv -f^2\,dt^2 + g$ is Einstein. (Note that if $f$ has zeros, then $\bar g$ degenerates on the codimension-one submanifold $f^{-1}(0)$.)

Proof. We work on a component of the open set where $f$ is nowhere-zero. The well-known formulas for the curvature tensor of the warped product $(B \times_f F, \bar g)$ are as follows, for vectors $X, Y$ tangent to the base $B$ and $V, W$ tangent to the $d$-dimensional fiber $F$, and with $\operatorname{Ric} = \operatorname{Ric}(\bar g)$ [O]:

$$\operatorname{Ric}(X, Y) = \operatorname{Ric}_B(X, Y) - \frac{d}{f}\operatorname{Hess} f(X, Y),$$
$$\operatorname{Ric}(X, V) = 0,$$
$$\operatorname{Ric}(V, W) = \operatorname{Ric}_F(V, W) - \bar g(V, W)\Big(\frac{\Delta_g f}{f} + (d-1)\frac{\bar g(\nabla f, \nabla f)}{f^2}\Big).$$

If we apply these formulas here, where $d = 1$ and so $\operatorname{Ric}_F = 0$, we obtain the following using $L_g^* f = 0$ (5),(6):

$$\operatorname{Ric}(X, Y) = \operatorname{Ric}(g)(X, Y) - \frac{\operatorname{Hess} f(X, Y)}{f} = -\frac{\Delta_g f}{f}\, g(X, Y) = -\frac{\Delta_g f}{f}\, \bar g(X, Y),$$
$$\operatorname{Ric}(X, V) = 0 = \bar g(X, V),$$
$$\operatorname{Ric}(V, W) = -\frac{\Delta_g f}{f}\, \bar g(V, W).$$

Since $\Delta_g f = -\frac{R(g)}{n-1} f$, we see that $\bar g$ is Einstein, with $\operatorname{Ric}(\bar g) = \frac{R(g)}{n-1}\, \bar g$. Note that from this equation it is well-known that $R(g)$ is constant on any component of $M$, consistent with Proposition 2.3. Furthermore, if the metric $g$ is scalar-flat (so it solves the time-symmetric constraints), then $\bar g$ solves the vacuum Einstein equation. The converse follows analogously.
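The algebra behind the identities used in this proof can be written out in a few lines. The sketch below assumes the formula $L_g^* f = -(\Delta_g f)\,g + \operatorname{Hess} f - f\operatorname{Ric}(g)$ for the formal adjoint, which is the form recalled in the proof of Theorem 3; it is meant as a reader's check, not as part of the original argument.

```latex
% Trace of L_g^* f = 0, using tr_g g = n, tr_g Hess f = \Delta_g f, tr_g Ric(g) = R(g):
0 = -n\,\Delta_g f + \Delta_g f - f R(g)
\;\Longrightarrow\; \Delta_g f = -\frac{R(g)}{n-1}\, f .
% Substituting back into L_g^* f = 0:
\operatorname{Hess} f = f\operatorname{Ric}(g) + (\Delta_g f)\, g
= \Big(\operatorname{Ric}(g) - \frac{R(g)}{n-1}\, g\Big) f ,
\qquad
\operatorname{Ric}(g) - \frac{\operatorname{Hess} f}{f} = -\frac{\Delta_g f}{f}\, g
\quad (f \neq 0).
```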


We thus call Riemannian metrics with $\ker L_g^*$ nontrivial static, and note that the spacetime metric associated with a nontrivial element $f \in \ker L_g^*$ may degenerate at the zero-locus of $f$, which for example might correspond to horizons; we want to include this possibility, but note that some authors use “static” for metrics which admit $f \in \ker L_g^*$ with empty zero-locus. The zero-locus $f^{-1}(0)$ is quite special; we have seen that it is totally geodesic, but also it is easy to see that no closed (compact and boundaryless) component of it can be the boundary of an open set, since then the maximum principle for harmonic functions and unique continuation would furnish a contradiction. For the Schwarzschild metric $g^S = \big(1 + \frac{m}{2r}\big)^4 \delta$ on $\mathbb{R}^3 \setminus \{0\}$, $r = |x|$, the function $\frac{1 - \frac{m}{2r}}{1 + \frac{m}{2r}}$ is in $\ker L_{g^S}^*$ and vanishes precisely on the horizon $r = \frac{m}{2}$.

Riemannian metrics which are static and are smooth enough so that the above lemmas hold are known to be analytic, at least in certain local coordinate charts, and possibly away from a set of measure zero. The way to see this is to rewrite the equation $L_g^* f = 0$ as a system for the function $f$ and the metric $g$. To be precise we have the following proposition, whose proof we include for completeness.
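As a consistency check on the Schwarzschild example, the potential and the spatial metric can be combined as in Proposition 2.7. The block below is only a sketch: the name $f_S$ is introduced here for illustration, and the resulting line element is the standard isotropic-coordinate form of the Schwarzschild spacetime.

```latex
% f_S is an illustrative name for the static potential of g^S (not notation from the text).
f_S(r) = \frac{1 - \frac{m}{2r}}{1 + \frac{m}{2r}}, \qquad
\bar g = -f_S^2\, dt^2 + \Big(1 + \frac{m}{2r}\Big)^4 \delta .
% Zero-locus: f_S(r) = 0 \iff 1 - \frac{m}{2r} = 0 \iff r = \frac{m}{2}.
% g^S is scalar-flat (its conformal factor 1 + m/2r is harmonic on R^3 \setminus \{0\}),
% so Proposition 2.7 gives Ric(\bar g) = 0 away from r = m/2.
```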

Proposition 2.8. Let $g$ be a metric which is at least $C^2$ and satisfies Proposition 2.3. If $g$ is static, and $f$ is a nontrivial element of the kernel of $L_g^*$, then at each point $p$ with $f(p) \neq 0$, there is a neighborhood and a coordinate chart in which $f$ and $g$ are analytic.

Proof. Equations (5) and (6) above show how to rewrite the equation $L_g^* f = 0$ as a square system for $f$ and $g$:

$$\Delta_g f + \frac{R_0}{n-1} f = 0, \tag{8}$$
$$f \operatorname{Ric}(g) - \operatorname{Hess}_g f - \frac{R_0}{n-1} f g = 0. \tag{9}$$

Here we have used Proposition 2.3 to write $R(g) = R_0$, a constant. This is a second order nonlinear (quasilinear) analytic system for $(f, g)$. It is not quite elliptic; the action of the diffeomorphism group is the obstruction to the ellipticity of the Ricci curvature operator. To overcome this, then, we pick a particularly nice coordinate system in which to work, a harmonic coordinate system, one in which the coordinate functions $x^i$ satisfy $\Delta_g x^i = 0$, $i = 1, \ldots, n$. This can be arranged, as the local solvability of elliptic equations [Be,T] allows us to solve near $p$ for harmonic functions $x^i$ with $x^i(p) = 0$ and independent gradients at $p$. In these coordinates we compute the components $R_{ij}$ of the Ricci tensor, and we will see that the expression we get is semilinear elliptic in the components of $g$. In fact, if we let $\Gamma^k = g^{ij}\Gamma^k_{ij}$, then

$$\operatorname{Ric}(g)_{ij} - \frac12\big(g_{ri}\Gamma^r_{,j} + g_{rj}\Gamma^r_{,i}\big) \approx \Gamma^k_{ij,k} - \Gamma^k_{ik,j} - \frac12 g_{ri} g^{st}\Gamma^r_{st,j} - \frac12 g_{rj} g^{st}\Gamma^r_{st,i},$$

where “$\approx$” means “equal to, up to terms in $g_{ij}$ and $g_{ij,k}$”. Computing further we see

$$\Gamma^k_{ij,k} - \Gamma^k_{ik,j} \approx \frac{g^{km}}{2}\Big[\big(g_{im,jk} + g_{jm,ik} - g_{ij,mk}\big) - \big(g_{im,kj} + g_{km,ij} - g_{ik,mj}\big)\Big] \approx \frac{g^{km}}{2}\big(g_{jm,ik} - g_{ij,mk} - g_{km,ij} + g_{ik,mj}\big),$$

and

$$\frac12 g_{ri} g^{st}\Gamma^r_{st,j} + \frac12 g_{rj} g^{st}\Gamma^r_{st,i} \approx \frac12 g_{ri} g^{st}\frac{g^{rq}}{2}\big(g_{sq,tj} + g_{tq,sj} - g_{st,qj}\big) + \frac12 g_{rj} g^{st}\frac{g^{rq}}{2}\big(g_{sq,ti} + g_{tq,si} - g_{st,qi}\big) = \frac12 g^{st}\big(g_{si,tj} + g_{sj,ti} - g_{st,ij}\big).$$

Subtracting the previous two equations yields

$$\operatorname{Ric}(g)_{ij} - \frac12\big(g_{ri}\Gamma^r_{,j} + g_{rj}\Gamma^r_{,i}\big) \approx -\frac12 g^{st} g_{ij,st}.$$

Finally, note that $\Delta_g x^r = -\Gamma^r$, so that in harmonic coordinates $\Gamma^r = 0$ and the highest-order term in the formula for the Ricci curvature is just

$$\operatorname{Ric}(g)_{ij} \approx -\frac12 g^{st} g_{ij,st}.$$

Note that the non-linear system above has the linearization at $(f, g)$ with leading-order term of the form

$$(h, k_{ij}) \mapsto \Big(g^{ij} h_{,ij},\; -f\,\frac{g^{st}}{2}\, k_{ij,st} - h_{,ij}\Big).$$

This clearly has elliptic symbol, at all $p$ where $f(p) \neq 0$. Hence applying the standard theory for analytic elliptic systems [Mo], the regularity follows.

2.3. Statements of the main theorems. We would like to define a weight function $\rho$ on $\Omega$ whose behavior near $\partial\Omega$ is determined by the distance to $\partial\Omega$. It is well-known (assuming the metric is at least $C^k$) that a $C^k$ hypersurface $\Sigma$ has a closed tubular neighborhood $N_{\epsilon_0}(\Sigma)$ on which there is a $C^{k-1}$ map $\Pi : N_{\epsilon_0}(\Sigma) \to \Sigma$, the nearest-point projection [Gr, pp. 13–39], [Si, pp. 42–46], [G-T, pp. 354–357]. We let $\Sigma_r = \{x \in N_{\epsilon_0}(\Sigma) : d(x, \Sigma) = r\}$ denote the level sets of the function $d(\cdot, \Sigma)$, the distance to $\Sigma$, for $0 < r \le \epsilon_0$. The functions $\Pi$ and $d$ are related by the exponential on the normal bundle $\nu$ of $\Sigma$ by the equation $x = \exp_\nu(\Pi(x), d(x) e_n)$. If $x \in N_{\epsilon_0}(\Sigma)$, then the unit-speed geodesic $\gamma$ from $\Pi(x)$ to $x$ realizes $d$ along $\gamma$, by the triangle inequality. Thus $|\nabla d| \equiv 1$, and clearly $\nabla d$ is the parallel transport of $e_n$ along such a $\gamma$, and so it is $C^{k-1}$. So the function $d$ and hence these level sets will have regularity $C^k$, so the second fundamental form will be controlled by that of $\Sigma$ as long as $k \ge 2$ [Gr, Si, G-T]. Now we let $\Sigma = \partial\Omega$, and we only take level sets inside $\Omega$. Now let $\rho \le 1$ be a positive function on $\Omega$, depending only on $d$ and sharply asymptotic to $d^N$ near the $C^{k+3}$-boundary $\partial\Omega$, where $N \gg 1$, and $d \equiv d(\cdot, \partial\Omega)$ is $C^{k+3}$; moreover we will later find a $d_0$ depending on $\Omega$ for which we would like $\nabla\rho$ to vanish at distance $d_0$ or greater from $\partial\Omega$ (see after Eq. (14)).
Furthermore we may arrange that $|\nabla^l \rho| \le C d^{N-l}$, $0 \le l \le k+3$, and that $\rho$ tends monotonically to zero with decreasing distance to the boundary $\partial\Omega$. To be definite we could let $\tilde\rho$ be a smooth function on $\mathbb{R}_+$ which is identically $x^N$ near $x = 0$, and which increases monotonically and levels out at height 1 for $x \ge d_0$; let $\rho(x) = \tilde\rho(d(x))$. We use the weight $\rho^{-1}$, which blows up at the boundary $\partial\Omega$, to form weighted $L^2$-spaces of tensors, which by design will be forced to decay suitably at $\partial\Omega$.
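One way to realize such a profile is sketched below. The cutoff point $a$ and the density $\psi$ are names introduced here purely for illustration; the text only requires the stated properties of the profile, not this particular construction.

```latex
% Assumption of this sketch: 0 < a < \min(1, d_0), so that a^N < 1.
% Choose a smooth \psi \ge 0 on [0,\infty) with
%   \psi(s) = N s^{N-1}  for 0 \le s \le a,
%   \psi(s) = 0          for s \ge d_0,
%   \int_0^{d_0} \psi(s)\,ds = 1
% (possible because \int_0^a N s^{N-1}\,ds = a^N < 1 leaves mass to place on [a, d_0]).
\tilde\rho(x) = \int_0^x \psi(s)\,ds , \qquad \rho = \tilde\rho \circ d .
% Then \tilde\rho(x) = x^N for x \le a, \tilde\rho is nondecreasing,
% 0 < \tilde\rho \le 1 for x > 0, and \tilde\rho \equiv 1 for x \ge d_0.
```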


Theorem 1. Let $\Omega \subset M$ be a compactly contained $C^{k+3}$-domain in a Riemannian manifold $(M^n, g_0)$, with $n \ge 2$ and $g_0$ a $C^{k+4,\alpha}$-metric. Suppose that the linearization $L_{g_0}$ of the scalar curvature map $R : C^{k+4,\alpha}(\Omega) \to C^{k+2,\alpha}(\Omega)$ has an injective formal $L^2$-adjoint $L_{g_0}^*$ at $g_0$, where we can consider $L_{g_0}^* : H^2_{loc}(\Omega) \to L^2_{loc}(\Omega)$. Then there is an $\epsilon > 0$ such that for any function $S \in C^{k,\alpha}(\Omega)$ for which $(S - R(g_0)) \in C^{k,\alpha}_{\rho^{-1}}(\Omega)$ with the support of $(S - R(g_0))$ contained in $\overline{\Omega}$ and with $\|S - R(g_0)\|_{k,\alpha,\rho^{-1}} < \epsilon$, there is a $C^{k+2,\alpha}$-metric $g$ with $R(g) = S$ in $\Omega$ and $g \equiv g_0$ outside $\Omega$.

We also prove a smooth ($C^\infty$) version of this theorem (Theorem 4).

Theorem 2. Let $g$ be any smooth, asymptotically flat, scalar-flat metric on $\mathbb{R}^n$, conformally flat at infinity, with positive mass $m_0$. Given any compact set $K$, there is a smooth scalar-flat metric on $\mathbb{R}^n$ which is Schwarzschild at infinity and agrees with the original metric $g$ inside $K$.

We get finite regularity if we assume the metric has a certain finite regularity, just as above: see Theorem 11. As we remarked in the introduction, the method works as well for metrics conformally flat to leading order (Corollary 5.18).

3. The Local Deformation of the Scalar Curvature

3.1. The Basic Estimate. We now prove an elliptic estimate in the $\rho$-weighted Sobolev spaces in a $C^2$-domain. Since $\rho$ decays at $\partial\Omega$, the boundary conditions are imposed by $\rho^{-1}$, not $\rho$; tensors in $\rho^{-1}$-weighted spaces will have to decay at a certain rate at the boundary, whereas growth at a certain rate at the boundary is allowed in the $\rho$-weighted spaces. We prove the Basic Estimate of Theorem 3 by first using the overdeterminedness of the operator $L_g^*$ to obtain an absolute elliptic estimate (Proposition 3.1), and then incorporating the weight $\rho$ by integrating the analogous estimate (Proposition 3.2) near the boundary. We remark that we get a global estimate, even though the functions may grow near the boundary. We prove the $H^2_\rho$ case (which is the case we will need), and then point out the transparent modifications to other Sobolev spaces.

Theorem 3. Suppose the kernel of $L_g^*$ is trivial in $H^2_{loc}(\Omega)$. Then there is a constant $C = C(n, g, \Omega, \rho)$, uniform for metrics $C^2$-near $g$, so that for $f \in H^2_{loc}(\Omega)$,

$$\|f\|_{H^2_\rho(\Omega)} \le C \|L_g^*(f)\|_{L^2_\rho(\Omega)}. \tag{10}$$

Proof. We again note the trace of $L_g^* f = -(\Delta_g f)\, g + \operatorname{Hess} f - f \operatorname{Ric}(g)$ is just $-(n-1)\Delta_g f - f R(g)$. So $\Delta_g f$ is pointwise controlled by the tensor $L_g^* f$, up to the zero-order term. The full Hessian, then, is similarly controlled by $L_g^* f$. Thus by integration we have the following proposition, which does not require the triviality of the kernel of $L_g^*$.

Proposition 3.1. For $f \in H^2_{loc}(\Omega)$,

$$\|f\|_{H^2(\Omega)} \le C(n, g, \Omega)\big(\|L_g^* f\|_{L^2(\Omega)} + \|f\|_{L^2(\Omega)}\big). \tag{11}$$
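The pointwise control behind this estimate can be made explicit. The following is a sketch of the algebra only, assuming the formula for $L_g^* f$ recalled above; the constant is not tracked carefully.

```latex
% Solve the trace identity for the Laplacian, then L_g^* f for the Hessian:
\Delta_g f = -\frac{1}{n-1}\big(\operatorname{tr}_g(L_g^* f) + f R(g)\big), \qquad
\operatorname{Hess} f = L_g^* f + (\Delta_g f)\, g + f \operatorname{Ric}(g).
% Hence, with C depending only on n and a bound for |Ric(g)| on \Omega,
|\Delta_g f| + |\operatorname{Hess} f| \le C\big(|L_g^* f| + |f|\big) \quad \text{pointwise}.
```

Integrating the square of this inequality bounds the second derivatives in $L^2$; the first derivatives are then handled by interpolation.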


Proof. We use the basic interpolation inequality [G-T]

$$\|f\|_{H^1(\Omega)} \le \frac12 \|f\|_{H^2(\Omega)} + C(n, g, \Omega)\|f\|_{L^2(\Omega)}. \tag{12}$$

Along with the previous discussion, it is clear we can bound the $L^2$-norms of $f$ and its first two derivatives by the right-hand side in (11).

Remark. This looks innocuous at first, but recall that no boundary conditions have been imposed. The key is the overdeterminedness of $L_g^*$, by which we have enough flexibility to estimate the full Hessian of $f$ without performing the standard cut-off arguments using ellipticity, and hence we get the estimate on all of $\Omega$.

The following proposition uses the no-kernel condition on $L_g^*$.

Proposition 3.2. For $\epsilon \ge 0$ sufficiently small, and for any $f \in H^2_{loc}(\Omega)$,

$$\|f\|_{H^2(\Omega_\epsilon)} \le C \|L_g^* f\|_{L^2(\Omega_\epsilon)},$$

where $C$ is independent of $\epsilon$.

Proof. Since $L_g^*$ has trivial $H^2_{loc}$-kernel in $\Omega$, we show that there is an $\epsilon > 0$ small enough so that $L_g^*$ has no $H^2$-kernel in $\Omega_\epsilon$. To see this, let $\Omega_i =$
f H 2 (< ) ≤ C L∗g f L2 (< ) . We claim furthermore that the constant C above is independent of < small. Suppose the contrary. Then there is a sequence i L∗g fi L2 (i ) . By normalizing fi H 2 (i ) to 1, we have L∗g fi L2 (i ) < 1/ i. Since ∂i is C 1,1 (in fact C 2 ), we can extend fi to H 2 () in such a way that for all i, fi H 2 () ≤ C1 . Remark. Here C1 is independent of

By applying Rellich again, and relabeling indices, we have a function $\phi \in H^1(\Omega)$ with $f_i \to \phi$ in $H^1(\Omega)$. Moreover for any $\eta \in C_c^\infty(\Omega)$,

$$0 = \lim_{i\to\infty} \langle \eta, L_g^* f_i \rangle_{L^2} = \lim_{i\to\infty} \langle L_g \eta, f_i \rangle_{L^2} = \langle L_g \eta, \phi \rangle_{L^2}.$$

Hence $L_g^* \phi = 0$ weakly. By elliptic regularity (or a direct application of the estimate (11)), $\phi \in H^2_{loc}(\Omega)$. We show that $\phi$ is not identically zero, which is a contradiction. By Proposition 3.1,

$$\|f\|_{H^2(\Omega_\epsilon)} \le C\big(\|L_g^* f\|_{L^2(\Omega_\epsilon)} + \|f\|_{L^2(\Omega_\epsilon)}\big), \tag{13}$$

and we explicitly point out that $C = C(n, g, \Omega)$ is independent of $\epsilon$ small, since the reasoning in the above remark applies as well to the constant in the interpolation inequality (12). By the $H^1$-convergence of $\{f_i\}$, there is an $i_1$ so that for $i \ge i_1$, $\|f_i - \phi\|_{L^2(\Omega)} < \frac{1}{4C}$. Moreover, we can choose $i_1$ large enough so that $i \ge i_1$ implies $\|L_g^* f_i\|_{L^2(\Omega_i)} < \frac{1}{4C}$. Plugging into the above estimate (13) we get $\|f_i - \phi\|_{H^2(\Omega_i)} < 1/2$, for $i$ large; note we have strongly used the independence of $C$ on $i$ large. This shows that $\phi$ cannot be zero by the normalization of the $f_i$.

Remark. The constant $C$ in Proposition 3.2 is of course also independent of $g$ in a sufficiently small neighborhood in $C^2$, since the operator norm $\|L_g^* - L_{g_1}^*\|$ on $H^2(\Omega)$ can be made arbitrarily small for $g_1$ sufficiently close to $g$, as can the difference in the volume forms and connection coefficients used in the norms in the estimate, and also the preceding remark applies as well. We write out a similar remark in Proposition 5.5.

We now use Proposition 3.2 to finish off the proof of Theorem 3. Define $F$ and $G$ by letting $F = f^2 + |\nabla f|^2 + |\operatorname{Hess} f|^2$, and $G = |L_g^* f|^2$. We now employ the weight $\rho$, which depends only on $d$ and is monotonic in distance to the boundary, and which is identically 1 on $\Omega_{d_0}$, where Proposition 3.2 is true for $\epsilon \le d_0 \le \epsilon_0$. Thus we can integrate to get the following:

$$\int_0^{d_0} \tilde\rho'(\epsilon)\Big(\int_{\Omega_{d_0}} F\, d\mu_g + \int_{\{\epsilon < d < d_0\}} F\, d\mu_g\Big)\, d\epsilon \le C \int_0^{d_0} \tilde\rho'(\epsilon)\Big(\int_{\Omega_{d_0}} G\, d\mu_g + \int_{\{\epsilon < d < d_0\}} G\, d\mu_g\Big)\, d\epsilon.$$
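We recall the co-area formula for Lipschitz functions $\phi$ with $\operatorname{ess\,inf} |\nabla\phi| > 0$ [E-G]:

$$\int_{\{\epsilon_1 < \phi < \epsilon_2\}} G\, d\mu_g = \int_{\epsilon_1}^{\epsilon_2} \int_{\{\phi = s\}} \frac{G}{|\nabla\phi|}\, d\sigma_g\, ds, \tag{14}$$

where again $d\mu_g$ is the volume measure for $g$, $d\sigma_g$ is the area measure on hypersurfaces, and $G \in L^1(d\mu_g)$. So with $\phi = d$, and using the fact that $|\nabla d| = 1$ near $\partial\Omega$ (Sect. 2.3), we get

$$\int_0^{d_0} \tilde\rho'(\epsilon) \int_{\{\epsilon < d < d_0\}} F\, d\mu_g\, d\epsilon = \int_0^{d_0} \tilde\rho'(\epsilon)\Big(\int_\epsilon^{d_0} \int_{\{d = s\}} \frac{F}{|\nabla d|}\, d\sigma_g\, ds\Big)\, d\epsilon.$$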


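We integrate by parts to get (since the boundary terms vanish):

$$\int_0^{d_0} \tilde\rho'(\epsilon) \int_{\{\epsilon < d < d_0\}} F\, d\mu_g\, d\epsilon = \int_0^{d_0} \tilde\rho(\epsilon) \int_{\{d = \epsilon\}} \frac{F}{|\nabla d|}\, d\sigma_g\, d\epsilon = \|f\|^2_{H^2_\rho(A_{(0,d_0)})}.$$

Let $C_0 = \int_0^{d_0} \tilde\rho'(\epsilon)\, d\epsilon > 0$. Applying the same procedure as above to the integral of $G$ yields the estimate

$$C_0 \|f\|^2_{H^2_\rho(\Omega_{d_0})} + \|f\|^2_{H^2_\rho(A_{(0,d_0)})} \le C\Big(C_0 \|L_g^* f\|^2_{L^2_\rho(\Omega_{d_0})} + \|L_g^* f\|^2_{L^2_\rho(A_{(0,d_0)})}\Big),$$

since $\rho \equiv 1$ in $\Omega_{d_0}$; this finishes the Basic Estimate.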
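The exchange of integrals in this last step can be checked directly. The sketch below uses Tonelli's theorem in place of the integration by parts, which gives the same identity when all integrands are nonnegative; the symbol $\psi$ is introduced here for illustration only, and the region $\{0 < d < d_0\}$ is written out explicitly.

```latex
% \psi(s) := \int_{\{d = s\}} F\, d\sigma_g \ge 0   (illustrative notation; |\nabla d| = 1 near \partial\Omega).
% \tilde\rho' \ge 0 because \tilde\rho is monotone, and \tilde\rho(0) = 0.
\int_0^{d_0} \tilde\rho'(\epsilon) \int_\epsilon^{d_0} \psi(s)\, ds\, d\epsilon
  = \int_0^{d_0} \psi(s) \int_0^{s} \tilde\rho'(\epsilon)\, d\epsilon\, ds
  = \int_0^{d_0} \tilde\rho(s)\, \psi(s)\, ds
  = \int_{\{0 < d < d_0\}} \rho\, F\, d\mu_g ,
% where the last equality is the co-area formula (14) with \phi = d.
% Likewise C_0 = \tilde\rho(d_0) - \tilde\rho(0) = 1 when \tilde\rho levels out at height 1 for x \ge d_0.
```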

By simple modifications, we see the method extends to other $L^2$-weighted Sobolev spaces $H^k_\rho$. In fact it is equally clear that the estimate may be extended to the $L^p$-Sobolev spaces, $p > 1$; here we can take $f \in L^p_\rho(\Omega)$ to mean $f \rho^{1/p} \in L^p(\Omega)$. As before, we can define $W^{k,p}_\rho(\Omega)$, and for tensors $L^p_\rho(\Omega)$, $W^{k,p}_\rho(\Omega)$, in the obvious manner.

Proposition 3.3. Let $\Omega$ be a $C^k$-domain, and $g$ a $C^k$-metric ($k \ge 2$). If $L_g^*$ has trivial $H^2_{loc}(\Omega)$-kernel, then there is a $C$ so that for all $f \in W^{k,p}_{loc}(\Omega)$ ($p > 1$),

$$\|f\|_{W^{k,p}_\rho(\Omega)} \le C \|L_g^* f\|_{W^{k-2,p}_\rho(\Omega)}.$$

Proof. The proof exactly mirrors the previous proof, using the $L^p$-estimates and the appropriate interpolation and compactness results for the $W^{k,p}$-spaces [G-T, Ma].

In the appendix we include another way to get the Basic Estimate which illustrates again that the key is the overdeterminedness of $L_g^*$.

3.2. Variational method. In this section we show how solutions to $L_g(h) = \phi$ for $\phi \in L^2_{\rho^{-1}}(\Omega)$ can be obtained from standard variational arguments. We assume the triviality condition on the kernel of $L_g^*$ as in the Basic Estimate (10). For $f \in L^2_\rho(\Omega)$, consider the functional $\mathcal{F}$ defined on $H^2_\rho(\Omega)$ by

$$\mathcal{F}(u) = \int_\Omega \Big(\frac12 |L_g^* u|^2 - f u\Big) \rho\, d\mu_g.$$

Let $\mu_f = \inf_{u \in H^2_\rho(\Omega)} \mathcal{F}(u)$. Letting $u \equiv 0$, we see $\mu_f \le 0$.

We first show $\mu_f$ is finite:

Lemma 3.4. For any $f \in L^2_\rho(\Omega)$, $\mu_f > -\infty$.


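Proof. By the Basic Estimate (10), there is a $C > 0$ such that $\|u\|^2_{H^2_\rho} \le C \|L_g^* u\|^2_{L^2_\rho}$. Thus

$$\mathcal{F}(u) \ge \frac{1}{2C} \|u\|^2_{H^2_\rho} - \int_\Omega f u \rho\, d\mu_g \ge \frac{1}{2C} \|u\|^2_{H^2_\rho} - \|f\|_{L^2_\rho} \|u\|_{L^2_\rho} \ge \frac{1}{2C} \|u\|^2_{H^2_\rho} - \|f\|_{L^2_\rho} \|u\|_{H^2_\rho}.$$

Since this lower bound is quadratic in $\|u\|_{H^2_\rho}$, it is clear that $\mu_f > -\infty$.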
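The quadratic lower bound can be made quantitative by completing the square. In this sketch $t$ and $A$ are shorthand introduced here, and $C$ is the constant appearing in the displayed inequality of the proof.

```latex
% Shorthand: t = \|u\|_{H^2_\rho} \ge 0,  A = \|f\|_{L^2_\rho}.
\frac{1}{2C}\, t^2 - A\, t = \frac{1}{2C}\big(t - CA\big)^2 - \frac{C}{2} A^2 \;\ge\; -\frac{C}{2} A^2 ,
\qquad\text{so}\qquad \mu_f \ge -\frac{C}{2}\, \|f\|^2_{L^2_\rho} .
% If moreover \mathcal{F}(u) \le 0, then t^2/(2C) \le A t, i.e. \|u\|_{H^2_\rho} \le 2C\, \|f\|_{L^2_\rho},
% which is why a minimizing sequence can be taken bounded in H^2_\rho.
```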

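Corollary 3.5. For any $f \in L^2_\rho(\Omega)$, $\mu_f = \lim_{i\to\infty} \mathcal{F}(u_i)$ for some sequence $\{u_i\}$ with $\{\|u_i\|^2_{H^2_\rho(\Omega)}\}$ bounded.

Recall the following two elementary facts:

Fact 1. If $X$ is a normed linear space and $x_n$ converges weakly to $x$, written $x_n \overset{w}{\rightharpoonup} x$, then $\|x\| \le \liminf_{n\to\infty} \|x_n\|$.

Proof. Let $x^*$ denote an element of the dual $X^*$. By Hahn–Banach, $\|x\| = \sup_{\|x^*\| = 1} \langle x, x^* \rangle$. Now with $\|x^*\| = 1$,

$$\langle x, x^* \rangle = \lim_{n\to\infty} \langle x_n, x^* \rangle \le \liminf_{n\to\infty} \|x_n\| \cdot 1.$$

Fact 2. Let $X, Y$ be normed linear spaces, and $T : X \to Y$ a bounded linear transformation. If $x_n \overset{w}{\rightharpoonup} x$ in $X$, then $T(x_n) \overset{w}{\rightharpoonup} T(x)$ in $Y$.

Proof. Let $T^* : Y^* \to X^*$ be the adjoint of $T$. Then for any $y^* \in Y^*$, $\langle T(x_n), y^* \rangle = \langle x_n, T^*(y^*) \rangle \to \langle x, T^*(y^*) \rangle = \langle T(x), y^* \rangle$.

Proposition 3.6. For any $f \in L^2_\rho(\Omega)$ there is a minimizer $u \in H^2_\rho(\Omega)$, i.e. $\mathcal{F}(u) = \mu_f$.

Proof. By the previous corollary, $\mu_f = \lim_{i\to\infty} \mathcal{F}(u_i)$, where $\{\|u_i\|^2_{H^2_\rho(\Omega)}\}$ is bounded. By Banach–Alaoglu and the reflexivity of Hilbert space (the Riesz representation theorem), there is a subsequence $u_{i_k} \overset{w}{\rightharpoonup} u$ in $H^2_\rho(\Omega)$. By Fact 2 above, then, $L_g^* u_{i_k} \overset{w}{\rightharpoonup} L_g^* u$ in $L^2_\rho(\Omega)$, and (via inclusion) $u_{i_k} \overset{w}{\rightharpoonup} u$ in $L^2_\rho(\Omega)$. Thus it follows from Fact 1 above that $\|L_g^* u\|_{L^2_\rho} \le \liminf_{k\to\infty} \|L_g^* u_{i_k}\|_{L^2_\rho}$. Hence,

$$\mu_f = \lim_{k\to\infty} \int_\Omega \Big(\frac12 |L_g^* u_{i_k}|^2 - f u_{i_k}\Big) \rho\, d\mu_g$$
$$\ge \liminf_{k\to\infty} \int_\Omega \frac12 |L_g^* u_{i_k}|^2 \rho\, d\mu_g - \limsup_{k\to\infty} \int_\Omega f u_{i_k} \rho\, d\mu_g$$
$$= \liminf_{k\to\infty} \frac12 \|L_g^* u_{i_k}\|^2_{L^2_\rho} - \int_\Omega f u \rho\, d\mu_g$$
$$\ge \frac12 \|L_g^* u\|^2_{L^2_\rho} - \int_\Omega f u \rho\, d\mu_g = \mathcal{F}(u).$$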

The next step is to derive the Euler–Lagrange equation for the functional $\mathcal{F}$, minimized by $u$ with $\mathcal{F}(u) = \mu_f$. So let $\eta \in C_c^\infty(\Omega)$. Then

$$0 = \frac{d}{dt}\Big|_{t=0} \mathcal{F}(u + t\eta) = \frac{d}{dt}\Big|_{t=0} \int_\Omega \Big(\frac12 |L_g^*(u + t\eta)|^2 - f(u + t\eta)\Big) \rho\, d\mu_g$$
$$= \frac{d}{dt}\Big|_{t=0} \Big(\frac12 \int_\Omega \sum_{i,j} (L_g^* u + t L_g^* \eta)^2_{ij}\, \rho\, d\mu_g\Big) - \int_\Omega f \eta \rho\, d\mu_g \quad \text{(in a local orthonormal frame)}$$
$$= \int_\Omega \big(L_g^* u \cdot L_g^* \eta - f\eta\big) \rho\, d\mu_g.$$

Thus for all $\eta \in C_c^\infty(\Omega)$, $\int_\Omega (L_g^* \eta \cdot \rho L_g^* u)\, d\mu_g = \int_\Omega f \rho\, \eta\, d\mu_g$. This is simply the weak formulation of

$$L_g(\rho L_g^* u) = f\rho.$$

Note that $\rho L_g^* u$ is a symmetric tensor.

Corollary 3.7. If $f$ is not identically zero in $\Omega$, then $\mu_f < 0$. Also, when $f = 0$, the function $u = 0$ is the unique minimizer, so this variational procedure produces only the trivial solution of $L_g h = 0$.

Proof. If $\mu_f = 0$, then $u \equiv 0$ would be a minimizer, contradicting the previous equation. Moreover, if $f$ were zero, then the functional $\mathcal{F}$ is nonnegative:

$$\mathcal{F}(u) = \frac12 \int_\Omega |L_g^* u|^2 \rho\, d\mu_g \ge 0.$$

The minimum value is attained by $u = 0$, and since $L_g^*$ has trivial kernel, this is the only solution.

The operator $w \mapsto Pw = L_g(\rho L_g^* w)$ is self-adjoint, and we have the distributional equalities $\langle f\rho, \eta \rangle_{L^2} = \langle L_g(\rho L_g^* u), \eta \rangle_{L^2} = \langle Pu, \eta \rangle_{L^2} = \langle u, P\eta \rangle_{L^2}$, for $\eta \in C_c^\infty(\Omega)$. As $\rho$ and $g$ are sufficiently smooth, we can write $P$ as a fourth-degree elliptic operator not in divergence-form, and elliptic regularity yields our weak solution $u \in H^4_{loc}(\Omega)$.
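Because the functional is quadratic, the first variation computed above can be read off from an exact expansion. The following sketch writes $\langle\cdot,\cdot\rangle$ for the pointwise inner product on symmetric tensors induced by $g$ and uses $\mathcal{F}$ for the functional of this section.

```latex
% For u, \eta \in H^2_\rho(\Omega) and t \in \mathbb{R}:
\mathcal{F}(u + t\eta) = \mathcal{F}(u)
  + t \int_\Omega \big(\langle L_g^* u, L_g^* \eta \rangle - f\eta\big)\, \rho\, d\mu_g
  + \frac{t^2}{2} \int_\Omega |L_g^* \eta|^2\, \rho\, d\mu_g .
% At a minimizer the coefficient of t must vanish, which is the weak form of L_g(\rho L_g^* u) = f\rho.
% The coefficient of t^2 is nonnegative, reflecting the convexity of \mathcal{F}.
```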


Thus we can solve $L_g(\rho L_g^* u) = f\rho$ for any $f \in L^2_\rho(\Omega)$ with $u \in H^4_{loc}(\Omega)$. With $\rho > 0$ in $\Omega$, the map $H : L^2_\rho(\Omega) \to L^2_{\rho^{-1}}(\Omega)$ which maps $f \mapsto f\rho$ is an isometric isomorphism. Hence for any $\phi \in L^2_{\rho^{-1}}(\Omega)$ there is a $u \in H^2_\rho(\Omega) \cap H^4_{loc}(\Omega)$ with $L_g(\rho L_g^* u) = \phi$; so $h = \rho L_g^* u$ solves the problem posed at the start of this section. If we had strong enough estimates on $u$ to show $\rho L_g^* u \in H^2_{\rho^{-1}}(\Omega)$, we could just apply the inverse function theorem to produce a solution of the nonlinear problem $R(g + h) = S$ of Theorem 1, with $h$ of compact support. What we do instead in the next section is apply Schauder theory to get estimates on $u$ in Hölder spaces, and then apply Picard’s method directly to yield strong solutions to the nonlinear problem.
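The isometry claim is a one-line computation. The sketch assumes the weighted norms are defined by integrating against the indicated weight, as in the estimate for $h_0$ in the proof of Proposition 3.8.

```latex
\|f\rho\|^2_{L^2_{\rho^{-1}}(\Omega)}
  = \int_\Omega (f\rho)^2\, \rho^{-1}\, d\mu_g
  = \int_\Omega f^2\, \rho\, d\mu_g
  = \|f\|^2_{L^2_\rho(\Omega)} ,
% with inverse map \phi \mapsto \phi\,\rho^{-1}, well defined because \rho > 0 in \Omega.
```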

3.3. Solving the nonlinear problem via Picard’s method. The problem we would like to solve is the following: given an initial metric $g_0$ and a function $S$ close to $R(g_0)$ (in some appropriate norm) and equal to $R(g_0)$ outside $\Omega$, find a tensor $h \in \mathcal{S}^{0,2}$ with $R(g_0 + h) = S$ and $h \equiv 0$ outside $\Omega$. In this section we show we can do this using Picard’s method, assuming that $L_{g_0}^*$ has trivial kernel. Since $R : \mathcal{M}^{k+2,\alpha} \to C^{k,\alpha}$ is a smooth function between open sets in Banach spaces, we have by Taylor’s formula $R(g_0 + h) = R(g_0) + L_{g_0}(h) + O(\|h\|^2_{k+2,\alpha})$, where

$$\sup_{h \neq 0} \frac{\|O(\|h\|^2_{k+2,\alpha})\|_{k,\alpha}}{\|h\|^2_{k+2,\alpha}} = C(g_0).$$

By the Taylor remainder formula, we can take $C$ to be uniform for metrics $C^{k+2,\alpha}$-near $g_0$, and since the quantities are local in nature, $C$ can be taken uniform over any domain inside $\Omega$. We then solve the linear problem $L_{g_0}(h_0) = S - R(g_0)$ via the variational method of the previous section. Let $g_1 = g_0 + h_0$; by appropriate control on $(S - R(g_0))$ we will show that $h_0$ is small in an appropriate Hölder space (by Schauder theory), so that $g_1$ is indeed a metric. Note that $(R(g_1) - S) = O(\|h_0\|^2_{k+2,\alpha})$. At this point the most natural technique to employ would be Newton’s method, in which we would linearize about the new metric $g_1$ and produce a tensor $h_1$ and another metric $g_2 = g_1 + h_1$, with $(R(g_2) - S) = O(\|h_1\|^2_{k+2,\alpha})$. As with $h_0$, then, $h_1$ can be controlled to be small by $(R(g_1) - S)$, so we see that each stage would represent a quadratic improvement on the previous stage. Unfortunately, we will see there seems to be a problem with this: a priori we seem to experience a derivative loss each time we linearize about a new metric (see the discussion after (18)). To overcome this difficulty, we employ a Picard approach, only linearizing at $g_0$; this produces additional error terms which we show can be controlled by the quadratic improvement. We note that the norms below will be taken in the metric $g_0$.

Proposition 3.8. Let $g_0$ be a $C^{k+4,\alpha}$-metric such that the operator $L_{g_0}^*$ has trivial kernel in $H^2_{loc}(\Omega)$. Then there is a constant $C = C(g_0, n, \Omega, k, \rho)$, uniform for metrics near $g_0$ in $C^{k+4,\alpha}$, and an $\epsilon > 0$ (sufficiently small) so that if $S \in C^{k,\alpha}(\Omega)$ with $(S - R(g_0)) \in C^{k,\alpha}_{\rho^{-1}}(\Omega)$ and with $\|S - R(g_0)\|_{k,\alpha,\rho^{-1}} < \epsilon$, then upon solving $L_{g_0}(h_0) = (S - R(g_0))$ via the variational method of the previous section, and letting $g_1 = g_0 + h_0$, we have $\|h_0\|_{k+2,\alpha,\rho^{-1}} \le C \|S - R(g_0)\|_{k,\alpha,\rho^{-1}} < C\epsilon$ (and hence for small $\epsilon$, $g_1$ is indeed a metric), and $\|S - R(g_1)\|_{k,\alpha,\rho^{-1}} \le C \|S - R(g_0)\|^2_{k,\alpha,\rho^{-1}} < C\epsilon^2$. Moreover, the metric $g_1$ is $C^{k+2,\alpha}$.

Proof. We let $u_0$ represent a solution of the variational problem $\mathcal{F}(u_0) = \mu_f \le 0$, where here $f\rho = S - R(g_0)$. Let $h_0 = \rho L_{g_0}^* u_0$, so that $L_{g_0}(h_0) = S - R(g_0)$. We use the variational formula to get control of $\|u_0\|_{H^2_\rho}$. Indeed, since $\mu_f \le 0$,

$$\int_\Omega (S - R(g_0)) u_0\, d\mu_g \ge \frac12 \|L_{g_0}^* u_0\|^2_{L^2_\rho} \ge \frac{1}{2C^2} \|u_0\|^2_{H^2_\rho},$$

where $C$ is from the Basic Estimate. By the Cauchy–Schwarz inequality, we have

$$\int_\Omega (S - R(g_0)) u_0\, d\mu_g = \int_\Omega (S - R(g_0)) \rho^{-\frac12}\, u_0\, \rho^{\frac12}\, d\mu_g \le \|S - R(g_0)\|_{L^2_{\rho^{-1}}} \|u_0\|_{L^2_\rho} \le \|S - R(g_0)\|_{L^2_{\rho^{-1}}} \|u_0\|_{H^2_\rho}.$$

Combining this with the previous inequality yields a constant $C$ so that

$$\|u_0\|_{H^2_\rho} \le C \|S - R(g_0)\|_{L^2_{\rho^{-1}}}. \tag{15}$$
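The combination of the two inequalities is elementary but worth spelling out, since it shows how the constant in (15) depends on the constant of the Basic Estimate. The letters $a$ and $b$ are shorthand introduced here.

```latex
% Shorthand: a = \|u_0\|_{H^2_\rho},  b = \|S - R(g_0)\|_{L^2_{\rho^{-1}}}.
\frac{1}{2C^2}\, a^2 \;\le\; \int_\Omega (S - R(g_0))\, u_0\, d\mu_g \;\le\; b\, a
\quad\Longrightarrow\quad a \le 2C^2\, b ,
% so (15) holds with constant 2C^2, where C is the constant in the Basic Estimate (10).
```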

This obviously yields an $L^2_{\rho^{-1}}$-bound on $h_0$ as follows:

$$\|h_0\|^2_{L^2_{\rho^{-1}}} = \int_\Omega |h_0|^2 \rho^{-1}\, d\mu_g = \int_\Omega \rho^2 |L_{g_0}^* u_0|^2 \rho^{-1}\, d\mu_g \tag{16}$$
$$\le C \|u_0\|^2_{H^2_\rho} \le C \|S - R(g_0)\|^2_{L^2_{\rho^{-1}}}. \tag{17}$$

Now we get Hölder regularity and estimates on the solution $u_0$ via the Schauder theory [G-T, Mo]. Indeed, if $0 < d_1 \le 1$ and $\Omega' \subset\subset \Omega$ with $d(\Omega', \partial\Omega) \ge d_1$, i.e. $\Omega' \subset \Omega_{d_1}$, there is a constant $C$ such that

$$\|u_0\|_{k+4,\alpha,\Omega'} \le C d_1^{-k-4-\alpha}\big(\|u_0\|_{L^2(\Omega)} + \|P(u_0)\|_{k,\alpha,\Omega}\big). \tag{18}$$

Here the constant $C$ depends only on $n$, $k$, $\alpha$, $\Omega$, and $g_0$, with the dependence on $g_0$ via the constant of ellipticity of $P$ and on the $C^{k,\alpha}(\Omega)$-norms of the coefficients of $P$; in particular, the constant $C$ is uniform for metrics $C^{k+4,\alpha}$-near $g_0$, since the Ricci term from $L_g^*$ is differentiated twice in computing $P$. Note that we only get $(k+2)$-regularity on the tensor $h_0$ (see (19)), so that the coefficients in our subsequent iterations would drop in regularity. We will overcome this by linearizing only at the initial metric.


We note that the dependence on the domain just comes from the finitely many charts we use to cover $\overline{\Omega}$, and within each chart the local estimate is independent of diameter bounds from above (see Corollary 6.3 of [G-T]); this will be important in (19) and (22) below, where the $C$ can thus be chosen uniformly in $d_1$ and $d$ respectively and independent of $\Omega' \subset\subset \Omega$ and $B \subset\subset B'$. This gives an interior Hölder bound on $h_0$ on $\Omega' \subset \Omega_{2d_1}$ ($d_1 \le 1$):

$$\|h_0\|_{k+2,\alpha,\Omega'} = \|\rho L_{g_0}^* u_0\|_{k+2,\alpha,\Omega'} \le C \|u_0\|_{k+4,\alpha,\Omega'}$$
$$\le C d_1^{-k-4-\alpha} \|u_0\|_{L^2(\Omega_{d_1})} + C d_1^{-k-4-\alpha} \|S - R(g_0)\|_{k,\alpha}$$
$$\le C d_1^{-k-4-\alpha} d_1^{-\frac N2} \|u_0\|_{L^2_\rho(\Omega)} + C d_1^{-k-4-\alpha} \|S - R(g_0)\|_{k,\alpha,\rho^{-1}}$$
$$\le C d_1^{-\frac N2 - k - 4 - \alpha} \|S - R(g_0)\|_{k,\alpha,\rho^{-1}}. \tag{19}$$

Note that in the first inequality we used bounds on the derivatives of $\rho$ (see Sect. 2.3), and in the third we used the inequality

$$\|u_0\|_{L^2(\Omega_{d_1})} \le C d_1^{-\frac N2} \|u_0\|_{L^2_\rho(\Omega)}. \tag{20}$$

This follows from the asymptotic behavior of $\rho$ on $\Omega$, since there is a constant $C > 0$ such that $C^{-1} d^N \le \rho \le C d^N$ on $\Omega$; hence

$$\int_{\Omega_{d_1}} u_0^2\, d\mu_g \le C d_1^{-N} \int_\Omega u_0^2\, \rho\, d\mu_g.$$

We can estimate the right-hand side of this inequality by $\|S - R(g_0)\|$ by the above estimate (15). We combine this interior estimate with an analysis of the behavior near the boundary. Let $x \in \Omega$ with $d(x, \partial\Omega) = d \le 1$, and let $B = B(x, \frac d2)$, and $B' = B(x, \frac{3d}{4})$. Then clearly as above

$$\|u_0\|_{L^2(B')} \le C d^{-\frac N2} \|u_0\|_{L^2_\rho(\Omega)}. \tag{21}$$

Again the right-hand side is bounded by $\|S - R(g_0)\|$ by (15), and the Schauder estimate considerations as above yield the following:

$$\|h_0\|_{k+2,\alpha,B} = \|\rho L_{g_0}^* u_0\|_{k+2,\alpha,B} \le C d^{N-k-2-\alpha} \|u_0\|_{k+4,\alpha,B}$$
$$\le C d^{N-2k-6-2\alpha} \|u_0\|_{L^2(B')} + C d^{N-2k-6-2\alpha} \|S - R(g_0)\|_{k,\alpha}$$
$$\le C d^{N-2k-6-2\alpha} d^{-\frac N2} \|u_0\|_{L^2_\rho(\Omega)} + C d^{N-2k-6-2\alpha} \|S - R(g_0)\|_{k,\alpha,\rho^{-1}}$$
$$\le C d^{\frac N2 - 2k - 6 - 2\alpha} \|S - R(g_0)\|_{k,\alpha,\rho^{-1}}. \tag{22}$$

So we have an interior bound and a local bound near the boundary $\partial\Omega$ of $\|h_0\|_{k+2,\alpha}$ by $\|S - R(g_0)\|_{k,\alpha,\rho^{-1}}$. Clearly for $N$ large enough, $h_0$ and its first $k+2$ derivatives decay to zero at the boundary. (This was effected by the weight $\rho$.) Moreover it is clear by this estimate that $g_1 = g_0 + h_0$ is $C^{k+2,\alpha}$ on $M$.


Now to achieve an overall bound on $\|h_0\|_{k+2,\alpha}$ on $\Omega$ we study the boundary. Cover the boundary with a finite collection of balls $B_i$ of radius $r_i$, $r_i < 1$, centered on $\partial\Omega$. These balls cover some closed $\epsilon_0$-neighborhood $N_{\epsilon_0}(\partial\Omega)$ of $\partial\Omega$. Let $x$ and $y$ be arbitrary points inside $B_i \cap \Omega$. Either the distance $d(x, y) < \max\big(\frac{d(x,\partial\Omega)}{2}, \frac{d(y,\partial\Omega)}{2}\big)$, in which case $y \in B(x, d(x)/2)$ (or vice versa) and so the Hölder quotient is bounded as in the previous estimate (22), or else $d(x, y) \ge \max\big(\frac{d(x,\partial\Omega)}{2}, \frac{d(y,\partial\Omega)}{2}\big)$, in which case the following inequality holds:

$$\frac{|\nabla^{(k+2)} h_0(x) - \nabla^{(k+2)} h_0(y)|}{d(x, y)^\alpha} \le 2^\alpha \Big(\frac{|\nabla^{(k+2)} h_0(x)|}{d(x, \partial\Omega)^\alpha} + \frac{|\nabla^{(k+2)} h_0(y)|}{d(y, \partial\Omega)^\alpha}\Big).$$

Each of these last two terms can be bounded using the previous estimate (22), so that the Hölder quotient is bounded by $C d^{\frac N2 - 2k - 6 - 3\alpha} \|S - R(g_0)\|_{k,\alpha,\rho^{-1}}$. If $(x_i, y_i)$ is a sequence of points representing a maximizing sequence of Hölder quotients, then since $h_0$ and its first $(k+2)$ derivatives are bounded in terms of $\|S - R(g_0)\|_{k,\alpha,\rho^{-1}}$, we can achieve a similar bound on the Hölder quotients of $h_0$ by $\|S - R(g_0)\|_{k,\alpha,\rho^{-1}}$, unless perhaps the distance between the points decreases below a certain level, say $\min(\delta_0, \epsilon_0^2)$, where $\delta_0$ is a Lebesgue number for the covering $\{B_i\}$ of the neighborhood $N_{\epsilon_0}(\partial\Omega)$. If $(x_i, y_i)$ is such a pair of points, then either $x_i, y_i \in \Omega_{\epsilon_0}$, or (by renaming if necessary) $x_i \in \Omega_{\epsilon_0}$ and $y_i \in N_{\epsilon_0}(\partial\Omega)$, or $x_i, y_i \in N_{\epsilon_0}(\partial\Omega)$. In any of these cases it is clear that we can achieve the desired estimate: in the first case by the interior estimate (19) on $\Omega_{\epsilon_0/2}$, in the second case also by this estimate (since the points must be close to $\partial(\Omega_{\epsilon_0})$), and in the third case by the above analysis near the boundary (22), since $x_i$ and $y_i$ will be contained in some ball $B_j$. Combining this with the previous $L^2_{\rho^{-1}}$-bound on $h_0$ (17), we see indeed that

$$\|h_0\|_{k+2,\alpha,\rho^{-1}} \le C \|S - R(g_0)\|_{k,\alpha,\rho^{-1}}. \tag{23}$$

Finally, we recall that $\|S - R(g_1)\|_{k,\alpha} \le C \|h_0\|^2_{k+2,\alpha}$, by the Taylor formula, that is, $(S - R(g_1))$ is quadratic in $\|h_0\|_{k+2,\alpha}$, on $\Omega$ and also locally on $B \subset\subset \Omega$. We then have from the above (23) that $\|S - R(g_1)\|_{k,\alpha} \le C \|S - R(g_0)\|^2_{k,\alpha,\rho^{-1}}$. Moreover, near $\partial\Omega$ ($d \le 1$),

$$|S - R(g_1)| \le C d^{N - 4k - 12 - 4\alpha} \|S - R(g_0)\|^2_{k,\alpha,\rho^{-1}}. \tag{24}$$

Thus

$$|S - R(g_1)|^2 \le C^2 d^{2N - 8k - 24 - 8\alpha} \|S - R(g_0)\|^4_{k,\alpha,\rho^{-1}}, \tag{25}$$

and hence indeed $(S - R(g_1)) \in L^2_{\rho^{-1}}$, and also $\|S - R(g_1)\|_{k,\alpha,\rho^{-1}} \le C \|S - R(g_0)\|^2_{k,\alpha,\rho^{-1}}$.

Remark. Estimates (24) and (25) clearly illustrate how the quadratic convergence compensates for the loss in decay from differentiation. Indeed as $R(g_1)$ involves two derivatives of the tensor $h_0$ we might expect to lose some decay near $\partial\Omega$. Indeed the quadratic convergence yields a high enough power of $d$ in (25) to keep $(S - R(g_1))$ in $L^2_{\rho^{-1}}$.
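How large $N$ must be for the power of $d$ in (25) to be integrable against $\rho^{-1}$ can be estimated as follows. This is a rough sketch: it assumes $\rho^{-1} \le C d^{-N}$ near the boundary and that the level sets $\{d = s\}$ have uniformly bounded area for $0 < s < d_0$, so that the co-area formula (14) reduces the question to a one-dimensional integral.

```latex
\int_{\{0 < d < d_0\}} d^{\,2N - 8k - 24 - 8\alpha}\, \rho^{-1}\, d\mu_g
  \;\le\; C \int_0^{d_0} s^{\,N - 8k - 24 - 8\alpha}\, ds \;<\; \infty
  \quad\text{provided}\quad N > 8k + 23 + 8\alpha .
% The exponent must exceed -1:  N - 8k - 24 - 8\alpha > -1.
```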


Proof of Theorem 1. We iterate the procedure of linear correction. As was noted above (following Eq. (18)), we only linearize about $g_0$ due to the apparent loss in differentiability. Having found $h_0$ as above, we can repeat the procedure to find a symmetric tensor $h_1$ so that $L_{g_0}(h_1) = S - R(g_1)$. Let $g_2 = g_1 + h_1$. By the preceding estimates, we see that for small enough $\epsilon$, $g_2$ will indeed be a $C^{k+2,\alpha}$-metric. We estimate $(S - R(g_2))$. Indeed we write

$$R(g_2) = R(g_1) + L_{g_1}(h_1) + O(\|h_1\|^2_{k+2,\alpha}) = S + [L_{g_1}(h_1) - L_{g_0}(h_1)] + O(\|h_1\|^2_{k+2,\alpha}). \tag{26}$$

So we do not quite have quadratic convergence, and we need to estimate the additional error term $[L_{g_1}(h_1) - L_{g_0}(h_1)]$. The map $g \mapsto L_g$ is a smooth map $\mathcal{M}^{k+2,\alpha} \to \mathcal{L}(C^{k+2,\alpha}, C^{k,\alpha})$, where $\mathcal{L}$ denotes the space of bounded linear maps. So we can estimate

$$\|L_{g_1}(h_1) - L_{g_0}(h_1)\|_{k,\alpha} \le C_2 \|h_0\|_{k+2,\alpha} \|h_1\|_{k+2,\alpha} \le C_2 C^3 \|S - R(g_0)\|^3_{k,\alpha,\rho^{-1}}.$$

Here $C$ is from Proposition 3.8, $C_2$ is uniform for $g_1$ $C^{k+2,\alpha}$-near $g_0$, and we have used Proposition 3.8 to estimate $\|h_1\|_{k+2,\alpha} \le C \|S - R(g_1)\|_{k,\alpha,\rho^{-1}} \le C^2 \|S - R(g_0)\|^2_{k,\alpha,\rho^{-1}}$. Although we do not have a quadratic improvement at the next step, we show that this decay is enough to allow the iteration to proceed. In fact, we define $h_m$ and $g_m$ recursively by taking $g_m = g_0 + \sum_{l=0}^{m-1} h_l = g_{m-1} + h_{m-1}$, and letting $h_m$ be a variational solution of $L_{g_0}(h_m) = S - R(g_m)$; this assumes we have enough control to keep $g_m$ a metric; we then show we can control $h_m$ so that $g_{m+1}$ is a metric, and moreover the sequence $g_m$ converges in $\mathcal{M}^{k+2,\alpha}$.

Proposition 3.9. Suppose that in the above iteration procedure we have recursively obtained $h_0, \ldots, h_{m-1}$ and that $g_0, \ldots, g_m$ are $C^{k+2,\alpha}$-metrics. Let $C$ be as in Proposition 3.8 and suppose that there is a constant $K$ and a $\delta$ with $0 < \delta < 1$ so that for all $l < m$,

$$\|h_l\|_{k+2,\alpha,\rho^{-1}} \le CK \|S - R(g_0)\|^{(1+l\delta)}_{k,\alpha,\rho^{-1}},$$

and for all $j \le m$,

$$\|S - R(g_j)\|_{k,\alpha,\rho^{-1}} \le K \|S - R(g_0)\|^{(1+j\delta)}_{k,\alpha,\rho^{-1}}.$$

Then for sufficiently small $\|S - R(g_0)\|_{k,\alpha,\rho^{-1}}$ (independent of $m$), the iteration can proceed to the next step and the above inequalities persist for $l = m$ and $j = m + 1$.

Proof. By Proposition 3.8, we see that for $\|S - R(g_0)\|_{k,\alpha,\rho^{-1}}$ sufficiently small, we have the base cases for the induction proof, and we also have the metrics $g_j$ near $g_0$; in this way a second-order Taylor remainder constant $C_1$ for $g \mapsto R(g)$ and a first-order Taylor remainder constant $C_2$ for $g \mapsto L_g$ can be taken independent of $g_1, \ldots, g_m$. Moreover we remark that since we are solving the variational problem about $g_0$ only, the proof of (23) yields $\|h_m\|_{k+2,\alpha,\rho^{-1}} \le C \|S - R(g_m)\|_{k,\alpha,\rho^{-1}}$ for fixed $C$ independent of $m$. So we have


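$$\|h_m\|_{k+2,\alpha,\rho^{-1}} \le C \|S - R(g_m)\|_{k,\alpha,\rho^{-1}} \le CK \|S - R(g_0)\|^{(1+m\delta)}_{k,\alpha,\rho^{-1}}. \tag{27}$$

As in (22) above, a variational solution $h_m$ of $L_{g_0}(h_m) = S - R(g_m)$ will decay at the boundary so that $g_{m+1}$ will be $C^{k+2,\alpha}$. From the equation

$$R(g_{m+1}) = R(g_m) + L_{g_m}(h_m) + O(\|h_m\|^2_{k+2,\alpha}) = S + \sum_{p=0}^{m-1} [L_{g_{p+1}}(h_m) - L_{g_p}(h_m)] + O(\|h_m\|^2_{k+2,\alpha}), \tag{28}$$

we have

$$\|S - R(g_{m+1})\|_{k,\alpha} \le C_1 \|h_m\|^2_{k+2,\alpha} + C_2 \sum_{p=0}^{m-1} \|h_p\|_{k+2,\alpha} \|h_m\|_{k+2,\alpha}$$
$$\le C_1 (CK)^2 \|S - R(g_0)\|^{(2+2m\delta)}_{k,\alpha,\rho^{-1}} + C_2 (CK) \|S - R(g_0)\|_{k,\alpha,\rho^{-1}} \times \sum_{p=0}^{m-1} \big(\|S - R(g_0)\|^{\delta}_{k,\alpha,\rho^{-1}}\big)^p\, (CK) \|S - R(g_0)\|^{(1+m\delta)}_{k,\alpha,\rho^{-1}}$$
$$= \|S - R(g_0)\|^{(1+(m+1)\delta)}_{k,\alpha,\rho^{-1}} \Big(C_1 (CK)^2 \|S - R(g_0)\|^{(1+(m-1)\delta)}_{k,\alpha,\rho^{-1}} + C_2 (CK)^2 \|S - R(g_0)\|^{1-\delta}_{k,\alpha,\rho^{-1}} \sum_{p=0}^{m-1} \big(\|S - R(g_0)\|^{\delta}_{k,\alpha,\rho^{-1}}\big)^p\Big). \tag{29}$$

Observe that

$$\sum_{p=0}^{m-1} \big(\|S - R(g_0)\|^{\delta}_{k,\alpha,\rho^{-1}}\big)^p = \frac{1 - \|S - R(g_0)\|^{m\delta}_{k,\alpha,\rho^{-1}}}{1 - \|S - R(g_0)\|^{\delta}_{k,\alpha,\rho^{-1}}}.$$

For $\|S - R(g_0)\|_{k,\alpha,\rho^{-1}}$ below 1 and bounded away from 1, this expression is uniformly bounded, say by $C_3$ for all $m$. We can choose $\|S - R(g_0)\|_{k,\alpha,\rho^{-1}} < 1$ small enough to bound $C_1 (CK)^2 \|S - R(g_0)\|^{1-\delta}_{k,\alpha,\rho^{-1}}$ and $C_3 C_2 (CK)^2 \|S - R(g_0)\|^{1-\delta}_{k,\alpha,\rho^{-1}}$ appropriately, say for sake of Proposition 3.9, by $\frac K3$. Finally, we bound $\|S - R(g_{m+1})\|_{L^2_{\rho^{-1}}}$. We do this exactly as in the previous theorem (proof of Prop. 3.8). To be precise, inside a set $\Omega' \subset\subset \Omega$, a uniform distance from the boundary, we can take some $M$ with $1 \le \rho^{-1} \le M$ in $\Omega'$, and we have from (29) that

$$\|S - R(g_{m+1})\|_{L^2_{\rho^{-1}}(\Omega')} \le \|S - R(g_0)\|^{(1+(m+1)\delta)}_{k,\alpha,\rho^{-1}} \big(M \operatorname{vol}(\Omega)\big)^{\frac12} \Big(C_1 (CK)^2 \|S - R(g_0)\|^{1-\delta}_{k,\alpha,\rho^{-1}} + C_3 C_2 (CK)^2 \|S - R(g_0)\|^{1-\delta}_{k,\alpha,\rho^{-1}}\Big). \tag{30}$$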


Clearly for $\|S - R(g_0)\|_{k,\alpha,\rho^{-1}} < 1$ sufficiently small, this can be made less than $\frac K6 \|S - R(g_0)\|^{(1+(m+1)\delta)}_{k,\alpha,\rho^{-1}}$. We choose $\Omega'$ once and for all so that

$$C \int_{\Omega \setminus \Omega'} d^{2N - 8k - 24 - 8\alpha} \rho^{-1}\, d\mu_g \le 1. \tag{31}$$

The boundary estimate of $|S - R(g_{m+1})|$ (as in (24)) is done by using Eq. (28) together with the induction hypothesis and the estimate (22):

$$\|h_j\|_{k+2,\alpha,B} \le C d^{\frac N2 - 2k - 6 - 2\alpha} \|S - R(g_j)\|_{k,\alpha,\rho^{-1}}.$$

We plug this into (28), and what we get is estimate (29) with an additional factor of $d^{N - 4k - 12 - 4\alpha}$. We use this estimate to integrate $|S - R(g_{m+1})|^2 \rho^{-1}$ near the boundary (also using (31)), and on the interior we have the estimate of the right-hand side of (30).

Proposition 3.9 shows that the series $\sum_{l=0}^{\infty} h_l$ converges geometrically to some “small” $h \in C^{k+2,\alpha}_{\rho^{-1}}$, and hence $g_m$ converges in $\mathcal{M}^{k+2,\alpha}$ to $g = g_0 + h$ with $R(g) = S$. This completes the proof of Theorem 1.

By using an exponentially decaying weight, we can actually solve the problem smoothly. In particular, let $\rho \le 1$ be a positive function on $\Omega$ which is sharply asymptotic to $e^{-1/d}$ near $\partial\Omega$. Furthermore we may assume $|\nabla^k\rho| \le C(k)\,d^{-2k}e^{-1/d}$, that $\rho$ tends monotonically to zero with decreasing distance to $\partial\Omega$ near $\partial\Omega$, and that $\rho \equiv 1$ in a specified $\Omega_{d_0} \subset\subset \Omega$ (see Sect. 2.3 and the end of Sect. 3.1). Note that the restriction in smoothness of the previous result was essentially due to that of the initial metric, so it sufficed to use a weight decreasing like a power. Without a restriction on the smoothness of the initial metric, we use a weight which decays faster than any power of the distance to the boundary.

Theorem 4. Let $\Omega \subset M$ be a compactly contained domain in a Riemannian manifold $(M, g_0)$, with $g_0$ a $C^\infty$-metric, and $\partial\Omega$ smooth. Suppose that the linearization $L_{g_0}$ of the scalar curvature map $R : C^\infty(\Omega) \to C^\infty(\Omega)$ has an injective formal $L^2$-adjoint $L^*_{g_0}$ at $g_0$, where we can consider $L^*_{g_0} : H^2_{loc}(\Omega) \to L^2_{loc}(\Omega)$. Then there is an $\epsilon > 0$ and a sequence 0, such that for any smooth function $S$ with $\|S - R(g_0)\|_{0,\alpha,\rho^{-1}} < \epsilon$ and with $\|d^{2k}(S - R(g_0))\|_{k,\alpha,N_k} <$

quasilinear elliptic fourth-order equation for $u$, and since both $S$ and $g_0$ are smooth, we can easily see that standard elliptic bootstrapping will yield smoothness of $u$ and hence of $g$.

The Basic Estimate for $L^*_{g_0}$, namely $\|f\|_{H^2_\rho(\Omega)} \le C\|L^*_{g_0}(f)\|_{L^2_\rho(\Omega)}$, still holds, since we really only used the monotonicity properties of $\rho$ in the proof. Hence the variational method for solving $L_{g_0}(h) = \phi$ still works. So we have sequences $u_m$ and $h_m = \rho L^*_{g_0}u_m$. The estimate (15) yields a constant $C_0$ independent of $m$ so that
\[
\|u_m\|_{H^2_\rho} \le C_0\|S - R(g_m)\|_{L^2_{\rho^{-1}}}.
\tag{32}
\]
Clearly Proposition 3.9 goes through almost unchanged for our exponentially decaying weight $\rho$ (the boundary estimate for $h_m$ is given below) and along with (32) yields
\[
\|u_m\|_{H^2_\rho} \le C_0K\|S - R(g_0)\|^{(1+m\delta)}_{0,\alpha,\rho^{-1}} \le C_0K\epsilon^{(1+m\delta)}.
\tag{33}
\]

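So the sequence of $u_m$ is summable in $H^2_\rho(\Omega)$. The Schauder estimate then yields, for $\Omega' \subset \Omega_{2d_1}$,
\[
\begin{aligned}
\Big\|\sum_{m=p+1}^{q}u_m\Big\|_{4,\alpha,\Omega'}
&\le C\Big(\Big\|\sum_{m=p+1}^{q}u_m\Big\|_{L^2(\Omega_{d_1})} + \Big\|\sum_{m=p+1}^{q}L_{g_0}h_m\Big\|_{0,\alpha}\Big)\\
&\le C\Big(\Big\|\sum_{m=p+1}^{q}u_m\Big\|_{L^2_\rho(\Omega)} + \Big\|\sum_{m=p+1}^{q}h_m\Big\|_{2,\alpha}\Big).
\end{aligned}
\]
This shows that $u = \sum_{m=0}^{\infty}u_m \in C^{4,\alpha}$, uniformly on compact subdomains of $\Omega$, and so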

indeed $h = \rho L^*_{g_0}u$. Now by simple computation, the top-order part of $R(g_0 + \rho L^*_{g_0}u)$ is
\[
\rho\,g^{ij}g^{pm}\big(-g_0^{ab}(g_0)_{ip}\,u_{,abmj} + g_0^{ab}(g_0)_{ij}\,u_{,abmp}\big).
\]
Hence the principal symbol computed in normal coordinates for $g$ at a point is
\[
\rho\,g_0^{ab}\xi_a\xi_b\Big(\sum_{p=1}^{n}\xi_p^2\sum_{i=1}^{n}(g_0)_{ii} - \sum_{i,p=1}^{n}(g_0)_{ip}\,\xi_i\xi_p\Big),
\]
which is easily seen to be nonvanishing for $\xi \ne 0$. So $u$ is smooth on $\Omega$ by elliptic bootstrapping on the quasilinear equation $R(g_0 + \rho L^*_{g_0}u) = S$.

What remains to be shown, then, is that all the derivatives of $h$ decay at the boundary $\partial\Omega$. So let $x \in \Omega$ with $d(x, \partial\Omega) = d < 1$, and let $\eta_1 < \eta_2 < 1$, so that $B = B(x, \eta_1 d) \subset B' = B(x, \eta_2 d) \subset N_k$. We note the estimate
\[
\|u_m\|_{L^2(B')} \le C\,e^{\frac{1}{2d(1-\eta_2)}}\|u_m\|_{L^2_\rho(\Omega)}.
\]


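Plugging this into the Schauder estimate we obtain (where the constant $C$ here also depends on $n$, $k$, $\alpha$, $\Omega$, $\rho$ and $g_0$ as before; see (18) and (19))
\[
\begin{aligned}
\Big\|\sum_{m=p+1}^{q}h_m\Big\|_{k+2,\alpha,B}
&= \Big\|\rho L^*_{g_0}\Big(\sum_{m=p+1}^{q}u_m\Big)\Big\|_{k+2,\alpha,B}\\
&\le C d^{-2(k+2)-1-\alpha}\,e^{-\frac{1}{(1+\eta_1)d}}\Big\|\sum_{m=p+1}^{q}u_m\Big\|_{k+4,\alpha,B}\\
&\le C d^{-3k-9-2\alpha}\,e^{-\frac{1}{(1+\eta_1)d}}\Big\|\sum_{m=p+1}^{q}u_m\Big\|_{L^2(B')}
 + C d^{-3k-9-2\alpha}\,e^{-\frac{1}{(1+\eta_1)d}}\Big\|\sum_{m=p+1}^{q}(S - R(g_m))\Big\|_{k,\alpha,N_k}\\
&\le C d^{-3k-9-2\alpha}\,e^{\frac{-1+2\eta_2+\eta_1}{2d(1+\eta_1)(1-\eta_2)}}\Big\|\sum_{m=p+1}^{q}u_m\Big\|_{L^2_\rho(\Omega)}
 + C d^{-3k-9-2\alpha}\,e^{-\frac{1}{(1+\eta_1)d}}\Big\|\sum_{m=p+1}^{q}(S - R(g_m))\Big\|_{k,\alpha,N_k}.
\end{aligned}
\]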

Here C depends on η1 and η2 , which we fix so that 2η2 + η1 < 1. The first term on the right converges to zero by the summability of the um . Moreover, for
the boundary, for any k ∈ Z+ .
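
The exponent multiplying the $L^2_\rho(\Omega)$ term in the boundary estimate above can be checked directly. This is a minimal worked computation, assuming (as the surrounding estimates suggest) that the factor $e^{-1/((1+\eta_1)d)}$ bounds $\rho$ on $B$ and that the factor $e^{1/(2d(1-\eta_2))}$ comes from the $L^2(B')$ estimate of $u_m$:

```latex
-\frac{1}{(1+\eta_1)d} + \frac{1}{2d(1-\eta_2)}
  = \frac{-2(1-\eta_2) + (1+\eta_1)}{2d(1+\eta_1)(1-\eta_2)}
  = \frac{-1 + 2\eta_2 + \eta_1}{2d(1+\eta_1)(1-\eta_2)} .
```

With $2\eta_2 + \eta_1 < 1$ the numerator is negative, so this exponential tends to zero as $d \to 0$ faster than any power of $d$, which is what allows it to absorb the prefactor $d^{-3k-9-2\alpha}$.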

4. The Mass of an Asymptotically Flat Manifold

In this section we recall the notion of the mass of an asymptotically flat Riemannian manifold which we will need in the next section; we also give an application to Bartnik's quasi-local mass.

A complete Riemannian manifold $(M^n, g)$ is called asymptotically flat (AF) if there is a compact set $K \subset M$ such that $M \setminus K$ is a union of a (finite) number of "ends", each diffeomorphic to the exterior of the unit ball in $\mathbb{R}^n$, with decay conditions on the metric: if $N$ denotes an end, and $H : N \to \mathbb{R}^n \setminus B_1$ is an AF coordinate chart, we require the tensor $(H_*g - \delta)$ to decay suitably, for example $(H_*g - \delta)(x) = O_2(|x|^{-p})$ and $R(g) = O(|x|^{-q})$, where $q > n$ and $p > (n-2)/2$ [S1]. Here we define $O_k(R^l)$ as the set of functions $f$ with $\frac{\partial^{|\alpha|}f}{\partial x^\alpha} = O(R^{l-|\alpha|})$ for all $\alpha$ with $|\alpha| \le k$; similarly we define $O_k(R^l)$ for tensor fields. We note that these conditions certainly hold for metrics which are conformally flat and scalar-flat at infinity (see Sect. 3.4), in which case $(H_*g - \delta)(x) = O_\infty(|x|^{-(n-2)})$. For optimal conditions on the decay, see [Ba1].


In this setting we can define the ADM mass $m$ of an end as
\[
m = m_{ADM} = \frac{1}{4\omega_{n-1}}\lim_{r\to\infty}\int_{|x|=r}\sum_{i,j}\big(g_{ij,i} - g_{ii,j}\big)\nu^j\,d\xi,
\tag{34}
\]
where $\omega_{n-1}$ is the volume of the standard unit $S^{n-1}$, $d\xi$ denotes Euclidean surface measure, $\nu^j$ denotes the Euclidean unit normal, and we have abused notation by denoting $H_*g$ by "$g$" [A-D-M]. The existence of the limit, as well as the fact that the limit is independent of the AF chart, is demonstrated in [Ba1]. It is a simple computation to show that if the metric $g$ is conformally flat and scalar-flat, then the mass $m$ appears in the expansion of the conformal factor in spherical harmonics at infinity (see Sect. 5.4)
\[
u = 1 + \frac{A}{|x|^{n-2}} + O(|x|^{-(n-1)})
\]
as $A = \frac{m}{n-1}$.

A fundamental result about the mass is the Positive Mass Theorem (PMT):

Theorem 5. Let $(M^n, g)$ be AF with $R(g) \ge 0$. Then the ADM mass of each end is nonnegative, and is zero if and only if $(M^n, g)$ is isometric to $(\mathbb{R}^n, \delta)$.

The first proof for $n = 3$ appears in [S-Y2, S-Y4], and a proof for $n \le 7$ is given in [S1]. A proof for spin manifolds is found in [Wi]. A sharpening of the PMT known as the Penrose Inequality has recently been attained by Bray [Br2] and Huisken–Ilmanen [H-I] independently. We state the version from [Br2]. First note that a horizon is a stable minimal $S^2$, and outermost minimal spheres are stable [H-I].

Theorem 6. Let $(M^3, g)$ be AF with $R(g) \ge 0$. Let $m$ be the ADM mass of an AF end, and let $A$ be the total surface area of the outermost minimal spheres in this end. Then
\[
m \ge \sqrt{\frac{A}{16\pi}}.
\]
Equality holds if and only if the metric is isometric to the Schwarzschild metric of mass $m = \sqrt{\frac{A}{16\pi}}$ outside the horizon.

The physical motivation is that the outermost spheres represent horizons of black holes whose mass contribution is through the above area expression; the inequality gives a lower bound for the total mass in terms of the masses of the black holes. See [Br1, Br2] for more discussion of these ideas.

Remark. We do not explicitly use these theorems, so we have not stated the optimal decay conditions on the metric. There is one fact used to simplify the analysis in the proofs of the above theorems which we would like to state. Bray [Br1, Br2] remarked that this follows from conformal analysis carried out in [S-Y3] and [S1].


Theorem 7. Let $(M^n, g)$ be AF with nonnegative (zero) scalar curvature. Given any $\epsilon > 0$ there exists a metric $g_0$ with nonnegative (zero) scalar curvature which outside a compact set is conformally flat and scalar-flat, and moreover satisfies the $\epsilon$-quasi-isometry condition on $M$: for all vectors $v$ in the tangent bundle of $M$,
\[
1 - \epsilon \le \frac{g_0(v, v)}{g(v, v)} \le 1 + \epsilon.
\]
Moreover, the masses of each AF end with respect to the two metrics differ at most by $\epsilon$.

So scalar-flat initial data can be approximated $\epsilon$-quasi-isometrically by data which is conformally flat at infinity. It is in this sense that we comment in the introduction that the construction in Theorem 2 of data which is Schwarzschild at infinity is generic. The pointwise approximation in Theorem 7 is also good enough to control the mass, but of course one may wish to work in topologies of metrics which exhibit more control on other metric invariants.

4.1. Applications to quasi-local mass. In general relativity there remains the problem of finding a suitable definition for the mass-energy of an extended body (region), a quantity which measures both the local matter density and the gravitational field. For infinitesimal bodies there is the stress-energy tensor $T$, which appears in the Einstein equation $\mathrm{Ric}(\bar g) - \frac{R(\bar g)}{2}\bar g = 8\pi T$. For isolated systems modelled by asymptotically flat spaces, the ADM mass measures the energy of the system. Bartnik [Ba2] attempts to define the quasi-local mass of a domain $\Omega$ with nonnegative scalar curvature; the curvature condition comes from thinking of these domains as subdomains of time-symmetric initial data for the general constraints (not necessarily vacuum) satisfying the weak energy condition [H-E]. Bartnik further restricts the domain to have no horizons [Ba3]. We call a manifold admissible for $\Omega$ if it is AF with one end, has nonnegative scalar curvature, has no horizons, and contains the given domain $\Omega$ isometrically. We define
\[
m_B(\Omega) = \inf\{m_{ADM}(g) : g \text{ admissible}\}.
\]
We note that Huisken–Ilmanen [H-I] make a similar definition where horizons are allowed on $\partial\Omega$. A metric which attains this infimum is called a minimal mass metric, and the extension is a minimal mass extension. It is an important open problem to determine for which $\Omega$ minimal mass metrics exist.
Several basic properties hold, two of which were recently proved by Huisken–Ilmanen:

(i) $m_B(\Omega)$ is well-defined.
(ii) $m_B(\Omega) > 0$ unless $\Omega$ is locally isometric to $\mathbb{R}^3$, in which case $m_B(\Omega) = 0$.
(iii) If $\Omega_1 \subset \Omega_2$, then $m_B(\Omega_1) \le m_B(\Omega_2)$.
(iv) If $\Omega_k$ is an exhaustion of $M$, then $\lim_{k\to\infty} m_B(\Omega_k) = m_{ADM}$.

As remarked by Bartnik in [Ba2], (iii) is elementary and (i) follows from the PMT. Properties (ii) and (iv) appear in [H-I], where appropriate conditions on the exhaustion $\Omega_k$ are stated. In the next theorem the local scalar curvature deformation of Theorem 1 is used to prove part of the static metric conjecture of Bartnik [Ba2, Ba3]. (Of course the harder part is establishing the existence of minimal mass metrics.)


Theorem 8. Minimal mass extensions of a domain $\Omega$ are static.

Remark. We assume the boundary $\partial\Omega$ is smooth enough to solve a boundary value problem for the conformal Laplacian and apply Green's formulas (see the proof).

Proof. The first step is to see that the scalar curvature is zero outside the given domain $\Omega$. To see this, we do a conformal change by solving the boundary value problem for the conformal Laplacian $L_0$,
\[
L_0u = \Delta_g u - \frac{n-2}{4(n-1)}R(g)u = 0,\qquad u > 0,\qquad u = 1 \text{ on } \partial\Omega,\qquad u \to 1 \text{ at } \infty.
\]
The analysis which justifies this can be found in [Ba1, L-P, S-Y2], as can the expansion of the solution $u$ at infinity,
\[
u = 1 + \frac{A}{|x|} + O(|x|^{-2}).
\]
We let $u \equiv 1$ in $\Omega$. By the maximum principle, $A < 0$. By direct computation, one can show the mass of $u^4g$ is less than that of $g$, contradicting minimality.

One point that should be made here is that, as Bartnik mentions [Ba1], the extension may not be smooth across $\partial\Omega$, though it is clearly seen to be Lipschitz. However, we should check that $R(u^{\frac{4}{n-2}}g) \ge 0$ distributionally across $\partial\Omega$. Indeed, for any smooth nonnegative $\phi$ of compact support we have by Green's identity (where the normal derivatives $D_n$ are taken in the outward direction to $\Omega$)
\[
\begin{aligned}
\int_M uL_0\phi\,d\mu_g
&= \int_\Omega L_0\phi\,d\mu_g + \int_{\Omega^c}uL_0\phi\,d\mu_g\\
&= -\int_M \frac{n-2}{4(n-1)}\,u\,R(g)\phi\,d\mu_g + \int_\Omega \Delta_g\phi\,d\mu_g + \int_{\Omega^c}u\,\Delta_g\phi\,d\mu_g\\
&= \int_{\partial\Omega}D_n\phi\,d\sigma_g + \int_{\Omega^c}\phi\,\Delta_g u\,d\mu_g - \int_{\partial\Omega}\big(uD_n\phi - \phi D_n^+u\big)\,d\sigma_g - \int_M \frac{n-2}{4(n-1)}\,u\,R(g)\phi\,d\mu_g.
\end{aligned}
\]
Here we write $D_n^+u$ to mean the outward-pointing normal derivative computed using the values of $u$ outside of $\Omega$. Now since $u = 1$ and $D_n^+u \le 0$ on $\partial\Omega$, and $\Delta_g u = \frac{n-2}{4(n-1)}R(g)u$ on $\Omega^c$, we indeed have the desired inequality:
\[
\int_M uL_0\phi\,d\mu_g = \int_{\partial\Omega}\phi\,D_n^+u\,d\sigma_g - \frac{n-2}{4(n-1)}\int_\Omega \phi\,R(g)\,u\,d\mu_g \le 0.
\]
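
The cancellation behind this last inequality can be spelled out. The following is a sketch that uses only the relations stated in the proof ($u \equiv 1$ in the domain and on its boundary, and $\Delta_g u = c_n R(g)u$ on the complement); the abbreviation $c_n = \frac{n-2}{4(n-1)}$ and the symbol $\Omega$ for the domain are notation adopted here for brevity.

```latex
\begin{aligned}
\int_M u L_0\phi\, d\mu_g
&= \int_{\partial\Omega} D_n\phi\, d\sigma_g
 + \int_{\Omega^c} \phi\,\Delta_g u\, d\mu_g
 - \int_{\partial\Omega} \bigl(u D_n\phi - \phi D_n^+ u\bigr)\, d\sigma_g
 - c_n\int_M u R(g)\phi\, d\mu_g \\
&= \int_{\partial\Omega} \phi D_n^+ u\, d\sigma_g
 + c_n\int_{\Omega^c} \phi R(g) u\, d\mu_g
 - c_n\int_M u R(g)\phi\, d\mu_g
 && (u = 1 \text{ on } \partial\Omega) \\
&= \int_{\partial\Omega} \phi D_n^+ u\, d\sigma_g
 - c_n\int_{\Omega} \phi R(g) u\, d\mu_g
 && (\text{the integrals over } \Omega^c \text{ cancel}).
\end{aligned}
```

Both remaining terms are nonpositive because $\phi \ge 0$, $D_n^+u \le 0$ on the boundary, $u \equiv 1$ in the domain, and $R(g) \ge 0$.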

It is now elementary to see the metric must be static outside $\Omega$. From the argument in the proof of Proposition 3.2 we see that if it were not static there would be some compactly contained domain $U \subset\subset M \setminus \Omega$ on which it were not static. We can use the local scalar curvature deformation to bump the scalar curvature up a bit on $U$, leaving the metric unchanged outside $U$. Since the structure at infinity is unchanged, so is the mass, and hence this is a minimal metric as well. But the above conformal change pushes the mass lower, since the scalar curvature is positive on $U$, contradicting minimality.


Corollary 4.1. Suppose $\Omega_1 \subset \Omega_2$ and suppose $\Omega_2$ admits a minimal mass extension. If $m_B(\Omega_1) = m_B(\Omega_2)$, then $\Omega_2 \setminus \Omega_1$ is static.

As we noted before Prop. 2.8, if nontrivial solutions of $L^*_gf = 0$ vanish somewhere, the static spacetime metric might degenerate on the zero-locus of $f$. However this zero-locus is somewhat special. In the present situation, consider $M$ topologically $\mathbb{R}^3$ and $\partial\Omega$ connected. Then there aren't any closed (compact and boundaryless) components $\Sigma_0 \subset M \setminus \Omega$ of $f^{-1}(0)$. For suppose $\Omega$ is in the infinite component of $M \setminus \Sigma_0$; then the harmonic function $f$ vanishes on the boundary of a region on which $f$ is defined; the maximum principle implies that $f$ vanishes identically on an open set, and hence is identically zero by unique continuation, a contradiction. On the other hand, suppose $\Omega$ is inside the bounded component of $M \setminus \Sigma_0$. We have that $\Sigma_0$ is minimal; if it were not stable, it could be used as a barrier to produce a stable minimal surface between it and asymptotically-flat infinity; this stable minimal surface would be a horizon, contradicting the no-horizon condition.

5. Scalar-Flat Metrics, Schwarzschild at Infinity

5.1. The basic setup. We construct scalar-flat metrics on $\mathbb{R}^n$ which are Schwarzschild at infinity by using a gluing procedure. Let $g$ be any smooth AF metric on $\mathbb{R}^n$ with nonzero mass $m_0$. Let $\phi : \mathbb{R} \to \mathbb{R}$ be a smooth cutoff function, such that $0 \le \phi \le 1$, $\phi(t) \equiv 1$ for $t \le (1 + \delta)$ and $\phi(t) \equiv 0$ for $t \ge (2 - \delta)$ for some small $\delta$. Let $H : \mathbb{R}^n \setminus K \to \mathbb{R}^n \setminus B_1$ be a fixed AF coordinate chart, in which $(H_*g - \delta) \in O_k(|x|^{-(n-2)})$, $k \ge 2$ in $\mathbb{Z}_+ \cup \{\infty\}$ (this decay is achieved for example by metrics which are conformally flat and scalar-flat at infinity with $k = \infty$, which are types of metrics we will be gluing). Now define a metric $\tilde g$ on $\mathbb{R}^n$ as follows: let $\tilde g = g$ on $K$, and on $\mathbb{R}^n \setminus K$ we define $\tilde g$ in the chart $H$:
\[
\tilde g(x) = \tilde g_R(x) =
\begin{cases}
g(x), & |x| \le R,\\
\phi\big(\tfrac{|x|}{R}\big)\,g(x) + \big(1 - \phi\big(\tfrac{|x|}{R}\big)\big)\,g^S_{(m,c)}(x), & R \le |x| \le 2R,\\
g^S_{(m,c)}(x), & |x| \ge 2R.
\end{cases}
\]
$\tilde g$ is then as smooth as $g$, provided of course that $|c| < R$. Note that the gluing occurs on an annular region $A_R = \{x : R < |x| < 2R\}$. We scale the metric to make the gluing occur on a fixed annular region $A_1 = \{x : 1 < |x| < 2\}$. So we let $g_R(x) = \tilde g(Rx)$; note that we should write $g_{R,(m,c)}$, but we will suppress $(m, c)$ until needed. We study the difference between $g_R$ and the standard metric $\delta$ on the annulus $A_1$, always working in the fixed AF coordinate chart. We have on $A_R$,
\[
\tilde g_{ij}(x) - \delta_{ij}(x) = \phi\big(\tfrac{|x|}{R}\big)\,[g_{ij}(x) - \delta_{ij}(x)] + \big(1 - \phi\big(\tfrac{|x|}{R}\big)\big)\,[g^S_{ij}(x) - \delta_{ij}(x)] = O(R^{-(n-2)}),
\tag{35}
\]
\[
\tilde g_{ij,k} = \phi'\big(\tfrac{|x|}{R}\big)\,\frac{x^k}{R|x|}\,[g_{ij}(x) - g^S_{ij}(x)] + \phi\big(\tfrac{|x|}{R}\big)\,g_{ij,k}(x) + \big(1 - \phi\big(\tfrac{|x|}{R}\big)\big)\,g^S_{ij,k}(x) = O(R^{-(n-1)}).
\tag{36}
\]


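Inductively one sees that
\[
\frac{\partial^{|\beta|}}{\partial x^\beta}\big(\tilde g_{ij} - \delta_{ij}\big) = O(R^{-(n-2+|\beta|)}),
\tag{37}
\]
where $0 \le |\beta| \le k$. Therefore on $A_1$ we have that
\[
\frac{\partial^{|\beta|}}{\partial x^\beta}\big((g_R)_{ij} - \delta_{ij}\big) = O(R^{-(n-2)}),
\tag{38}
\]
where $0 \le |\beta| \le k$. So we see that if $(H_*g - \delta) \in O_{m+1}(|x|^{-(n-2)})$, then $\|g_R - \delta\|_{C^{m,\alpha}(A_1)} = O(R^{-(n-2)})$, for $0 < \alpha < 1$. Moreover, we have from Eq. (38) above that the Christoffel symbols of $g_R$ decay:
\[
\Gamma^k_{ij} = \frac{1}{2}\,g_R^{km}\big((g_R)_{im,j} + (g_R)_{jm,i} - (g_R)_{ij,m}\big) = O(R^{-(n-2)}),
\]
and likewise $\Gamma^k_{ij,l} = O(R^{-(n-2)})$.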
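
The passage from (37) to (38) is a chain-rule computation. A short sketch, assuming only the scaling $g_R(x) = \tilde g(Rx)$ defined above and writing $y = Rx$ (a variable introduced here for illustration), so that $x \in A_1$ corresponds to $y \in A_R$:

```latex
\frac{\partial^{|\beta|}}{\partial x^{\beta}}\bigl((g_R)_{ij} - \delta_{ij}\bigr)(x)
  = R^{|\beta|}\,
    \frac{\partial^{|\beta|}}{\partial y^{\beta}}\bigl(\tilde g_{ij} - \delta_{ij}\bigr)(y)\Big|_{y = Rx}
  = R^{|\beta|}\, O\bigl(R^{-(n-2+|\beta|)}\bigr)
  = O\bigl(R^{-(n-2)}\bigr).
```

Each derivative in $x$ gains a factor of $R$, which exactly compensates the extra decay of the corresponding derivative in (37); this is why every derivative up to order $k$ has the same rate in (38).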

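Thus the Ricci and scalar curvature also decay:
\[
\mathrm{Ric}(g_R)_{ij} = \Gamma^k_{ij,k} - \Gamma^k_{ik,j} + \big(\Gamma^k_{kl}\Gamma^l_{ij} - \Gamma^k_{jl}\Gamma^l_{ik}\big) = O(R^{-(n-2)}),
\tag{39}
\]
\[
R(g_R) = g_R^{ij}\Big(\Gamma^k_{ij,k} - \Gamma^k_{ik,j} + \big(\Gamma^k_{kl}\Gamma^l_{ij} - \Gamma^k_{jl}\Gamma^l_{ik}\big)\Big) = O(R^{-(n-2)}).
\tag{40}
\]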

The idea is to let $R$ tend to infinity, doing the gluing further out in the AF region. Then the scalar curvature $R(g_R)$ will be tending to zero; if for any $R$ it were ever precisely zero, we'd be done. If it never hit zero exactly, then since the scalar curvature on $\partial A_1$ is zero, we see that $R(g_R)$ is then not constant on $A_1$, although it is almost the constant zero. In particular then, the metric $g_R$ is not static on $A_1$, so we can use Theorem 1 to perturb the scalar curvature even closer to zero, without destroying the outer symmetry, or the given original metric on the inside. The question is, of course, whether it is possible to perturb the metric in $A_1$ to be precisely scalar-flat. Obviously this is a question of the radius of surjectivity about $R(g_R)$ for $R$ tending to infinity. This in turn is related to the elliptic estimate (10) for the adjoint $L^*_{g_R}$ of the linearization of the scalar curvature operator
\[
\|f\|_{H^2_\rho(A_1)} \le C_R\,\|L^*_{g_R}f\|_{L^2_\rho(A_1)}.
\]
The constant $C_R$ cannot be taken to be uniform in $R$, because at the flat metric $\delta$, $L^*_\delta$ has nontrivial kernel. In fact, using the identity (6) for solutions of $L^*_gf = 0$,
\[
\mathrm{Hess}\,f = \Big(\mathrm{Ric}(g) - \frac{R(g)}{n-1}\,g\Big)f,
\]
we see that $\ker L^*_\delta$ is just the span $K$ of the linear functions $1, x^1, \ldots, x^n$ (note we have included the constants in our term "linear"). This obstruction is one we will deal with directly.
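
The description of the kernel at the flat metric follows in one line from identity (6). A brief worked version, where the coefficients $a_0, \ldots, a_n$ are constants named here only for illustration:

```latex
\mathrm{Ric}(\delta) = 0,\quad R(\delta) = 0
\;\Longrightarrow\;
\mathrm{Hess}\, f = 0
\;\Longrightarrow\;
\frac{\partial^2 f}{\partial x^i\,\partial x^j} = 0 \ \text{ for all } i, j
\;\Longrightarrow\;
f = a_0 + \sum_{i=1}^{n} a_i x^i
\quad\text{on the (connected) annulus } A_1 .
```

This gives an $(n+1)$-dimensional kernel, matching the $n+1$ parameters $(m, c)$ available in the gluing.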


5.2. The approximate kernel. The problem we have run into is simply that the limiting case of our gluing procedure is flat space, which is static. The Basic Estimate (10) of Theorem 3 therefore is not uniform in $R$, and hence our demonstration of the local surjectivity of the scalar curvature does not suffice to allow us to perturb the metric to a scalar-flat one, at least not without some more work. The technique we use here is to account for the kernel $K$ of $L^*_\delta$ at the flat metric through geometric means, identifying geometric quantities which correspond to this kernel. In this setting we say that the set $K$ forms an approximate kernel; for each $R$ we have assumed that $L^*_{g_R}$ has no kernel, but these operators are approaching $L^*_\delta$, which has kernel $K$. What we will do is show that if we work transverse to $K$, we do keep uniform estimates, and we use these to annihilate the transverse component of the scalar curvature. Finally we of course have to use some geometric means to then annihilate the component of the scalar curvature which remains in $K$. This technique is a common and powerful tool in many geometric gluing problems; for example see [Bu, K, P, S2].

We now identify a space transverse to $K$ in $L^2(A_1)$. Note that we will be dealing with measures $dx$ and $d\mu_g$ simultaneously, where $g$ will be some (smooth) metric on $A_1$. We first note that the weight $\rho$ on $A_1$ is decaying at $\partial A_1$ as before, so that the set $K$ is generally not contained in $L^2_{\rho^{-1}}(A_1)$. We modify $K$ slightly to a set $K_* \subset H^2_{\rho^{-1}}(A_1)$ by letting $K_* = \mathrm{span}\{\zeta, \zeta x^1, \ldots, \zeta x^n\}$; here $\zeta$ is a smooth, spherically symmetric bump function of compact support in $A_1$, $0 \le \zeta \le 1$, which we can take to be identically 1 in a neighborhood about the sphere $\{x : |x| = 3/2\}$. Let "$\perp$" denote the $L^2(A_1)$-orthogonal complement, with respect to whichever measure $d\mu_g$ or $dx$ is being used. Then we have the following:

Proposition 5.1. $L^2(A_1) = K_* \oplus_{L^2(dx)} K^\perp$.

Proof.
Clearly $K_*$ is $(n+1)$-dimensional (as $\zeta$ is continuous and nontrivial), so we just need to check that $K_* \cap K^\perp = \{0\}$. So suppose that $\big(c_0\zeta + \sum_{i=1}^{n}c_ix^i\zeta\big) \in K^\perp$. Then since $1 \in K$,
\[
\int_{A_1}\Big(c_0\zeta + \sum_{i=1}^{n}c_ix^i\zeta\Big)\,dx = 0.
\tag{41}
\]
Since $\zeta$ is symmetric on $A_1$, we have by symmetry,
\[
\int_{A_1}x^i\zeta\,dx = 0,\qquad i = 1, \ldots, n.
\tag{42}
\]
As $\zeta \ge 0$ on $A_1$, the previous two equations yield $c_0 = 0$. Now, since $x^j \in K$, we now have
\[
\int_{A_1}\Big(c_j(x^j)^2\zeta + \sum_{i\ne j}c_ix^ix^j\zeta\Big)\,dx = 0.
\tag{43}
\]
Of course by symmetry again we have
\[
\int_{A_1}x^ix^j\zeta\,dx = 0,\qquad i \ne j,
\tag{44}
\]
so that
\[
\int_{A_1}c_j(x^j)^2\zeta\,dx = 0.
\]
Again since $\zeta \ge 0$ on $A_1$, $c_j = 0$.

Corollary 5.2. $L^2(A_1) = K \oplus_{L^2(dx)} K_*^\perp$.

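Corollary 5.3. For $g$ sufficiently $C^0$-near $\delta$ we have
\[
L^2(A_1) = K_* \oplus_{L^2(d\mu_g)} K^\perp,
\tag{45}
\]
\[
L^2(A_1) = K \oplus_{L^2(d\mu_g)} K_*^\perp.
\tag{46}
\]
Proof. Suppose $\big(c_0\zeta + \sum_{i=1}^{n}c_ix^i\zeta\big) \in K^\perp$. There is a constant $M > 0$ so that for any $g$ uniformly near $\delta$,
\[
\min_{j=1,\ldots,n}\Big\{\int_{A_1}\zeta\,d\mu_g,\ \int_{A_1}\zeta\,(x^j)^2\,d\mu_g\Big\} \ge M.
\]
Now take some $\epsilon \in (0, 1)$, and note that Eqs. (42), (44) above imply that for $g$ sufficiently near $\delta$,
\[
\max_{i,j=1,\ldots,n;\ i\ne j}\Big\{\Big|\int_{A_1}\zeta\,x^j\,d\mu_g\Big|,\ \Big|\int_{A_1}\zeta\,x^ix^j\,d\mu_g\Big|\Big\} \le \frac{\epsilon M}{n}.
\]
Then the conditions that $\big(c_0\zeta + \sum_{i=1}^{n}c_ix^i\zeta\big) \in K^\perp$,
\[
\int_{A_1}\Big(c_0\zeta + \sum_{i=1}^{n}c_ix^i\zeta\Big)\,d\mu_g = 0,
\]
\[
\int_{A_1}\Big(c_0\zeta x^j + c_j(x^j)^2\zeta + \sum_{i\ne j}c_ix^ix^j\zeta\Big)\,d\mu_g = 0,
\]
imply that for each $i = 0, \ldots, n$,
\[
|c_i| \le \epsilon\max_{j=0,\ldots,n,\ j\ne i}|c_j|.
\]
Thus all $c_i$ must be zero.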
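
The final implication of this proof can be made explicit. A short argument, using only the inequalities $|c_i| \le \epsilon\max_{j \ne i}|c_j|$ with the small parameter $\epsilon \in (0,1)$ chosen above; the index $i_0$ is notation introduced here:

```latex
\text{Choose } i_0 \text{ with } |c_{i_0}| = \max_{0 \le j \le n} |c_j| . \quad\text{Then}\quad
|c_{i_0}| \;\le\; \epsilon \max_{j \ne i_0} |c_j| \;\le\; \epsilon\, |c_{i_0}|
\quad\Longrightarrow\quad (1 - \epsilon)\,|c_{i_0}| \le 0 .
```

Since $\epsilon < 1$, this forces $c_{i_0} = 0$, and because $|c_{i_0}|$ is the largest of the coefficients, every $c_i$ vanishes.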

5.3. The projected problem. We now consider the projected problem
\[
\pi_{K_*^\perp}R(g) = 0,
\tag{47}
\]
where $\pi_{K_*^\perp} : L^2(A_1, d\mu_g) \to L^2(A_1, d\mu_g)$ denotes orthogonal projection onto $K_*^\perp$. The plan is to first perturb an appropriate $g_R$ in $A_1$ to some $\bar g_R$ to solve this projected problem, thereby achieving $R(\bar g_R) \in K_*$. (We will then later show that $(m, c)$ can be chosen so that $R(\bar g_R) = R(\bar g_{R,(m,c)}) = 0$.) We solve the projected problem as before. The first item we establish is a Basic Estimate, which is used to solve the linearization variationally, and finally we use Picard iteration to solve the nonlinear problem $\pi_{K_*^\perp}R(g) = 0$.


5.3.1. The basic estimate. We define the space $H^2_{*,loc}(A_1)$ as the subset of all $u \in H^2_{loc}(A_1)$ such that for any domain compactly contained in $A_1$ we have
\[
\int u\,dx = 0 = \int u\,x^i\,dx,
\]
for $i = 1, \ldots, n$. Then we have the following (in which the measure is $dx$):

Theorem 9. There is a constant $C$ such that for all $f \in H^2_{*,loc}(A_1)$,
\[
\|f\|_{H^2_\rho(A_1)} \le C\,\|L^*_\delta f\|_{L^2_\rho(A_1)}.
\]
Proof. By Proposition 3.1 we have on $A_{1,\epsilon} = \{x : 1 + \epsilon < |x| < 2 - \epsilon\}$
\[
\|f\|_{H^2(A_{1,\epsilon})} \le C\big(\|L^*_\delta f\|_{L^2(A_{1,\epsilon})} + \|f\|_{L^2(A_{1,\epsilon})}\big).
\tag{48}
\]
From this we conclude that on $A_{1,\epsilon}$,
\[
\|f\|_{H^2(A_{1,\epsilon})} \le C\,\|L^*_\delta f\|_{L^2(A_{1,\epsilon})}.
\tag{49}
\]
This follows by a standard argument: if it were false, there would be a sequence $f_i$ in $H^2_{*,loc}(A_1)$ with $\|f_i\|_{H^2(A_{1,\epsilon})} = 1$, $\|L^*_\delta f_i\|_{L^2(A_{1,\epsilon})} \to 0$, and (by Rellich) an $f \in L^2(A_{1,\epsilon})$ with $f_i \to f$ in $L^2(A_{1,\epsilon})$. Then we have $L^*_\delta f = 0$ weakly in $A_{1,\epsilon}$; by elliptic regularity, $f$ is actually smooth, and hence a strong solution, and hence $f$ is linear. Now the estimate (48) implies that $f_i \to f$ in $H^2(A_{1,\epsilon})$, and hence $\|f\|_{H^2(A_{1,\epsilon})} = 1$. But since then $f$ must also be $L^2$-orthogonal to the linear functions, we obtain the contradiction that $f$ is zero.

We can argue just as in the proof of Proposition 3.2 that $C$ in the previous estimate is uniform in $\epsilon$ small. We should only make one small remark on this point: in that proof we construct a nontrivial $\phi \in H^2_{loc}(A_1)$ as an $H^1$-limit of $f_i$ such that $L^*_\delta\phi = 0$; these $f_i$ are $L^2$-orthogonal to the linear functions on a sequence of annuli exhausting $A_1$. Hence $\phi$ is $L^2$-orthogonal to the linear functions. This is then the contradiction. We can now work in the weight $\rho$ just as in the proof of Theorem 3.

We actually would like a Banach space on which we can perform the variational procedure. First we note that the bump function $\zeta$ is indeed in $H^2_{\rho^{-1}}(A_1)$, in fact it is compactly supported in $A_1$, so the definition below makes sense. We define for each $g$,
\[
H^2_{*,\rho}(A_1) \equiv \Big\{u \in H^2_\rho(A_1) : \int_{A_1}u\,\zeta\,d\mu_g = 0 = \int_{A_1}u\,x^i\zeta\,d\mu_g,\ i = 1, \ldots, n\Big\}.
\]
Lemma 5.4. The space $H^2_{*,\rho}(A_1)$ is a Hilbert subspace of $H^2_\rho(A_1)$.

Proof. Let the sequence $u_i$ be an $H^2_\rho$-Cauchy sequence in $H^2_{*,\rho}(A_1)$. Let $u$ denote the $H^2_\rho$-limit of the $u_i$. By Cauchy–Schwarz we see that $u \in H^2_{*,\rho}(A_1)$:
\[
\begin{aligned}
\Big|\int_{A_1}u\,x^j\zeta\,d\mu_g\Big|
&= \Big|\int_{A_1}u_i\,x^j\zeta\,d\mu_g - \int_{A_1}u\,x^j\zeta\,d\mu_g\Big|\\
&\le \int_{A_1}|u_i - u|\,|x^j|\,\zeta\,d\mu_g\\
&\le \|x^j\zeta\rho^{-1/2}\|_{L^2(A_1,d\mu_g)}\,\|u_i - u\|_{L^2_\rho(A_1,d\mu_g)} \to 0.
\end{aligned}
\]

The Basic Estimate (10) extends similarly to the space $H^2_{*,\rho}(A_1)$.

Theorem 10. There is a constant $C$ so that for all $u \in H^2_{*,\rho}(A_1)$,
\[
\|u\|_{H^2_\rho(A_1)} \le C\,\|L^*_\delta u\|_{L^2_\rho(A_1)}.
\tag{50}
\]

Proof. The proof is the same as the preceding theorem, as any solution $u$ to $L^*_\delta u = 0$ which is orthogonal to $K_*$ is trivial.

Finally we note that these estimates hold uniformly for metrics $g$ which are $C^2$-close to the flat metric $\delta$.

Proposition 5.5. There is an $\epsilon_0 > 0$ and a constant $C$ so that for $C^2$-metrics $g$ which are $\epsilon_0$-near $\delta$ in $C^2$ we have for all $u \in H^2_{*,\rho}(A_1)$,
\[
\|u\|_{H^2_\rho(A_1)} \le C\,\|L^*_gu\|_{L^2_\rho(A_1)}.
\tag{51}
\]
Proof. We first remark that the set of functions which comprise $H^2_\rho(A_1)$ is clearly the same for the metric $g$ as for $\delta$. Moreover, for $g$ $\epsilon$-near $\delta$ in $C^2$, the difference in the volume forms, the Christoffel symbols, and the curvature of $g$ are bounded near zero by some $C(\epsilon)$, where $C(\epsilon) \to 0$ as $\epsilon \downarrow 0$. Hence we have that
\[
\|u\|_{H^2_\rho(A_1,g)} \le \|u\|_{H^2_\rho(A_1,\delta)} + C(\epsilon)\,\|u\|_{H^2_\rho(A_1,g)},
\tag{52}
\]
\[
\|L^*_\delta u\|_{L^2_\rho(A_1,\delta)} \le \|L^*_gu\|_{L^2_\rho(A_1,g)} + C(\epsilon)\,\|u\|_{H^2_\rho(A_1,g)}.
\tag{53}
\]

From this and the Basic Estimate (50), which is uniform in $g$ $C^2$-near $\delta$, the proposition follows.

5.3.2. Variational method. The linearization $R_g$ of the projected operator $\pi_{K_*^\perp}R(g)$ is simply $\pi_{K_*^\perp}L_g$. Since the operator $R_g(\rho R^*_g)$ is formally self-adjoint, we may expect to be able to map $K_*^\perp$ onto $K_*^\perp$ (restricted to suitably weighted spaces), with uniform estimates for $g$ near $\delta$, using a restricted variational problem.

Indeed for $f \in L^2_\rho(A_1)$ we consider the functional $G$ defined on $H^2_{*,\rho}(A_1)$ by
\[
G(u) = \int_{A_1}\Big(\frac{1}{2}\,|R^*_gu|^2 - fu\Big)\rho\,d\mu_g.
\]
Let $\mu_f = \inf_{u \in H^2_{*,\rho}(A_1)}G(u)$. Again we see $\mu_f \le 0$.
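
The infimum is also bounded below, which is what makes the minimization meaningful. The following is a sketch under stated assumptions: it uses the Basic Estimate (51), the identification of $L^*_g$ with $R^*_g$ on $H^2_{*,\rho}(A_1)$ that the text establishes via (54), and the assumption that the $H^2_\rho$ norm dominates the $L^2_\rho$ norm; the shorthand $X = \|R^*_gu\|_{L^2_\rho(A_1)}$ is introduced here for illustration.

```latex
\begin{aligned}
\Bigl|\int_{A_1} f u\,\rho\, d\mu_g\Bigr|
  &\le \|f\|_{L^2_\rho}\,\|u\|_{L^2_\rho}
   \le \|f\|_{L^2_\rho}\,\|u\|_{H^2_\rho}
   \le C\,\|f\|_{L^2_\rho}\,X , \\
G(u) &\ge \tfrac12 X^2 - C\,\|f\|_{L^2_\rho}\,X
   \;\ge\; -\tfrac12\, C^2\,\|f\|_{L^2_\rho}^2 , \\
-\tfrac12\, C^2\,\|f\|_{L^2_\rho}^2 &\le \mu_f \le G(0) = 0 .
\end{aligned}
```

The middle line minimizes the quadratic $\tfrac12X^2 - C\|f\|_{L^2_\rho}X$ over $X \ge 0$; the same inequality shows that a minimizing sequence stays bounded in $H^2_\rho$.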

We note that for $\phi \in H^2_{*,\rho}(A_1)$, we have for any $\psi \in C_c^\infty(A_1) \cap S^{0,2}$,
\[
\langle R^*_g(\phi), \psi\rangle_{L^2(d\mu_g)} = \langle\phi, R_g(\psi)\rangle_{L^2(d\mu_g)} = \langle\phi, L_g(\psi)\rangle_{L^2(d\mu_g)} = \langle L^*_g(\phi), \psi\rangle_{L^2(d\mu_g)},
\tag{54}
\]
where the second equality follows since $\phi$ is already $L^2$-orthogonal to $K_*$. So we see that on $H^2_{*,\rho}(A_1)$ we have $L^*_g = R^*_g$; thus for our variational problem the Basic Estimate (51) will apply, and will yield the existence of a minimizer $u \in H^2_{*,\rho}(A_1)$, just as done before.

The next step is to compute the Euler–Lagrange equation. Let $\eta \in C_c^\infty(A_1)$ be orthogonal to $K_*$. Then we have
\[
\begin{aligned}
0 &= \frac{d}{dt}\Big|_{t=0}G(u + t\eta) = \frac{d}{dt}\Big|_{t=0}\int_{A_1}\Big(\frac{1}{2}\,|R^*_g(u + t\eta)|^2 - f(u + t\eta)\Big)\rho\,d\mu_g\\
&= \frac{d}{dt}\Big|_{t=0}\Big(\frac{1}{2}\int_{A_1}\sum_{i,j}\big(R^*_gu + tR^*_g\eta\big)^2_{ij}\,\rho\,d\mu_g\Big) - \int_{A_1}f\eta\,\rho\,d\mu_g \qquad\text{(in a local orthonormal frame)},\\
0 &= \int_{A_1}\big(R^*_gu \cdot R^*_g\eta - f\eta\big)\rho\,d\mu_g.
\end{aligned}
\tag{55}
\]

This is almost just the weak formulation of $R_g(\rho R^*_gu) = f\rho$. The only problem seems to be that $\eta$ is not arbitrary in $C_c^\infty(A_1)$. However, we see that since the image of $R_g$ lies in $K_*^\perp$, we ought to take $f\rho \in L^2_{*,\rho^{-1}}(A_1, d\mu_g) \equiv L^2_{\rho^{-1}}(A_1, d\mu_g) \cap K_*^\perp$; this is a Hilbert space, as the proof of Lemma 5.4 extends to show that integration against $\zeta$ or $x^i\zeta$ is a bounded linear functional on $L^2_{\rho^{-1}}(A_1, d\mu_g)$. We have the following lemmas.

Lemma 5.6. $C_c^\infty(A_1) \cap K_*^\perp$ is $L^2(A_1, d\mu_g)$-dense in $K_*^\perp$.

Proof. Let $f \in K_*^\perp$. Since $C_c^\infty(A_1)$ is dense in $L^2(A_1, d\mu_g)$, we can find a sequence $\eta_i \in C_c^\infty(A_1)$ with $\eta_i \to f$ in $L^2(A_1, d\mu_g)$. We decompose $\eta_i = \eta_{1i} + \eta_{2i}$, with $\eta_{1i} \in C_c^\infty(A_1) \cap K_*^\perp$ and $\eta_{2i} \in K_*$. But clearly then by orthogonality, $\eta_{2i} \to 0$.

Lemma 5.7. For $\eta \in K_*$, $R^*_g\eta = 0$.

Proof. For $\phi \in C_c^\infty(A_1) \cap S^{0,2}$, $\langle R^*_g\eta, \phi\rangle_{L^2(d\mu_g)} = \langle\eta, R_g(\phi)\rangle_{L^2(d\mu_g)} = 0$.

Combining the previous two lemmas with Eq. (55), we see we have the weak formulation of $R_g(\rho R^*_gu) = f\rho$; note that $\rho R^*_gu \in S^{0,2}$, i.e. it is a symmetric $(0, 2)$-tensor. In the next section we get strong estimates on this solution and use these to solve the nonlinear problem.

5.3.3. Strong estimates and the nonlinear problem. In this section we show that the basic facts needed for the Picard iteration to solve the nonlinear scalar curvature problem hold also for the projected problem. The first fact is Taylor's formula, for which we use the following simple lemma.

Lemma 5.8. The map $\pi_{K_*^\perp} : C^{k,\alpha}(A_1) \to C^{k,\alpha}(A_1)$ is a continuous linear (and hence smooth) map of Banach spaces.

Proof. The integral maps $\phi \mapsto \int_{A_1}\phi\,\zeta\,d\mu_g$ and $\phi \mapsto \int_{A_1}\phi\,x^i\zeta\,d\mu_g$, $i = 1, \ldots, n$, are continuous on $C^{k,\alpha}(A_1)$. By Gram–Schmidt it is possible to write the map $\pi_{K_*^\perp}$ as the identity minus a linear combination of the above integral maps.


Thus Taylor's formula applies to the map $\pi_{K_*^\perp}R : \mathcal{M}^{k+2,\alpha}(A_1) \to C^{k,\alpha}(A_1)$:
\[
\pi_{K_*^\perp}R(g_0 + h) = \pi_{K_*^\perp}R(g_0) + R_{g_0}(h) + O(\|h\|^2_{k+2,\alpha}),
\tag{56}
\]
where
\[
\sup_{h \ne 0}\frac{\|O(\|h\|^2_{k+2,\alpha})\|_{k,\alpha}}{\|h\|^2_{k+2,\alpha}} = C(g_0).
\]
We can arrange for $C$ to be uniform for metrics which are $C^{k+2,\alpha}$-near $g_0$, or in particular near $\delta$. We then solve the linear problem $R_{g_0}(h_0) = -\pi_{K_*^\perp}R(g_0)$. This is the analogue of our previous problem (Sect. 3.3) with $S = 0$. We can solve this variationally with $h_0 = \rho R^*_{g_0}u_0 = \rho L^*_{g_0}u_0$ with $u_0 \in H^2_{*,\rho}(A_1)$. Hence the Basic Estimate applies to $u_0$ as before, and it is easy to observe that the only obstruction to applying the previous argument used in the proof of Proposition 3.8 to this setting are the analogous Schauder estimates (18), (19), and (22). We now derive the analogues of these estimates.

Proposition 5.9. For $\Omega' \subset\subset \Omega$ with $d(\Omega', \partial\Omega) \ge d > 0$ and $d \le 1$, there is a constant $C > 0$ so that
\[
\|u_0\|_{k+4,\alpha,\Omega'} \le C d^{-k-4-\alpha}\big(\|u_0\|_{L^2(\Omega)} + \|R_g(\rho R^*_gu_0)\|_{k,\alpha,\Omega}\big).
\tag{57}
\]
The constant $C = C(n, k, \alpha, \Omega, g_0)$, where the dependence on $g_0$ is on a bound on the constant of ellipticity and on $C^{k,\alpha}$ bounds on the coefficients, and the dependence on $\Omega$ is as before (see (18)). Here $K_*$ is a fixed finite-dimensional subspace of $C_c^\infty(\Omega)$ whose functions are all supported in $\Omega'$.

Proof. Since all norms on finite-dimensional spaces are equivalent, we have for all $\phi \in K_*$
\[
\|\phi\|_{k,\alpha,\Omega} \le C\,\|\phi\|_{L^2(\Omega')}.
\tag{58}
\]
Clearly this $C$ is uniform for metrics near $g_0$, since the norms on each side of the inequality in (58) are respectively uniformly equivalent for $g$ near $g_0$, as the volume forms and geodesic distance functions are uniformly comparable. With this in mind, we have from (18),
\[
\begin{aligned}
\|u_0\|_{k+4,\alpha,\Omega'}
&\le C d^{-k-4-\alpha}\big(\|u_0\|_{L^2(\Omega)} + \|L_g(\rho L^*_gu_0)\|_{k,\alpha,\Omega}\big)\\
&\le C d^{-k-4-\alpha}\big(\|u_0\|_{L^2(\Omega)} + \|R_g(\rho R^*_gu_0)\|_{k,\alpha,\Omega} + \|\pi_{K_*}L_g(\rho L^*_gu_0)\|_{k,\alpha,\Omega}\big)\\
&\le C d^{-k-4-\alpha}\big(\|u_0\|_{L^2(\Omega)} + \|R_g(\rho R^*_gu_0)\|_{k,\alpha,\Omega} + \|\pi_{K_*}L_g(\rho L^*_gu_0)\|_{L^2(\Omega')}\big)\\
&\le C d^{-k-4-\alpha}\big(\|u_0\|_{L^2(\Omega)} + \|R_g(\rho R^*_gu_0)\|_{k,\alpha,\Omega} + \|u_0\|_{4,\alpha,\Omega'}\big).
\end{aligned}
\tag{59}
\]


Since by Arzela–Ascoli the inclusion map $C^{k+4,\alpha}(\Omega') \hookrightarrow C^{4,\alpha}(\Omega')$ is compact (for $k > 0$), and the inclusion $C^{4,\alpha}(\Omega') \hookrightarrow L^2(\Omega')$ is continuous, Ehrling's Lemma [R-R] yields the following interpolation inequality: for any $\epsilon > 0$, there is a $C(\epsilon)$, for which for any $\phi \in C^{k+4,\alpha}(\Omega')$,
\[
\|\phi\|_{4,\alpha,\Omega'} \le \epsilon\,\|\phi\|_{k+4,\alpha,\Omega'} + C(\epsilon)\,\|\phi\|_{L^2(\Omega')}.
\tag{60}
\]
Clearly we can choose $\epsilon$ small enough so that the Proposition follows from (60). As with (58), $C(\epsilon)$ can also be taken uniform near $g_0$.

The proof of the analogues of estimates (19) and (22) proceeds as in the proof of (22) and as in the preceding paragraph, except that the interpolation is applied to $C^{k+4,\alpha}(\Omega') \hookrightarrow C^{4,\alpha}(\Omega') \hookrightarrow L^2_\rho(\Omega')$. Therefore we see that if we let the radius $R$ be large enough, we can arrange by the metric decay estimates of Eq. (38) to have the metric $g_R$ close to the flat metric in $A_1$, so that the scalar curvature is close to zero there; indeed, as we showed above, $R(g_R) = O(R^{-(n-2)})$. In fact, keeping in mind that $R(g_R) = 0$ near the boundary of the annulus (by the cut-off), we have
\[
\|\pi_{K_*^\perp}R(g_R)\|_{k,\alpha,\rho^{-1}} \le C\,\|R(g_R)\|_{k,\alpha,\rho^{-1}} = O(R^{-(n-2)}).
\tag{61}
\]
We will be a bit more precise about this later. With this estimate in mind, the previous method applies to yield the following solution of the projected problem (47). We use the ideas of Theorem 4, which illustrate that we can solve the problem in a low regularity class ($k = 0$ below) and bootstrap up; using an exponential weight we can in fact solve in the smooth category as well.

Theorem 11. Given any $C^{k+4,\alpha}$ asymptotically flat, scalar-flat metric $g$ on $\mathbb{R}^n$ with a chart $H$ at infinity in which $H_*g - \delta = O_{K+1}(|x|^{-(n-2)})$, and given any $(m, c) \in \mathbb{R}^{n+1}$, then for sufficiently large $R$ there is a $C^{k+2,\alpha}$ metric $\bar g_{R,(m,c)} = g_{R,(m,c)} + h$ with $R(\bar g_{R,(m,c)}) \in K_*$. The tensor $h$ is obtained variationally as above, and in fact from the proof of Proposition 3.8 it follows that
\[
\|h\|_{k+2,\alpha,\rho^{-1}} \le C\,\|R(g_R)\|_{k,\alpha,\rho^{-1}} = O(R^{-(n-2)}).
\tag{62}
\]
If $g$ is smooth, this can be done smoothly just as in Theorem 1.

Remark. These two preceding estimates are independent of $m$ and $c$ bounded in some fixed region. Also the theorem clearly holds for smooth scalar-flat metrics $g$ which are conformally flat at infinity. The assumption that the decay to the flat metric is in $O_3$ is sufficient to get convergence to the flat metric in $C^{2,\alpha}$ on the annular region, and thus we keep control of the Taylor constants during the iteration for $k = 0$, and then we can bootstrap regularity as before.

So we actually have a map $I$ on a subset of $\mathbb{R}^{n+1}$: $(m, c) \mapsto \pi_KR(\bar g_{R,(m,c)})$. We wish to conclude that for suitable choice of $(m, c)$, this map is zero, and hence by the decomposition in Corollary 5.3, the scalar curvature will vanish. In the next section we do this by computing the related map $I_0$, which projects $R(\bar g_{R,(m,c)})$ onto each of the elements $1, x^1, \ldots, x^n$, and we show that this map vanishes for appropriate $(m, c)$. By scaling, then, we will have constructed a scalar-flat metric on $\mathbb{R}^n$ which preserves the original metric $g$ inside a large ball and is Schwarzschild at infinity.


5.4. Computing the map $I_0$. In this section we compute the map $I_0 : \{(m, c) : |c| < R\} \to \mathbb{R}^{n+1}$. We prove the map hits zero by a degree argument, in which we show that the map is a local diffeomorphism which surrounds the origin, at the point which corresponds to the mass $m_0$ and center $c_0$ of the initial metric. For clarity we will do some of the computations explicitly only for the case $n = 3$; the modifications for general $n > 3$ will be straightforward, and the relevant changes are recorded in the proof of Corollary 5.18.

As we stated before (Theorem 7), up to an $\epsilon$-quasi-isometry, we may take an asymptotically flat, scalar-flat metric $g$ to be conformally flat at infinity, $g_{ij} = u^{4/(n-2)}\delta_{ij}$, with $u > 0$ and $u \sim 1$ at infinity. The conformal formula for the scalar curvature,
\[ R(g) = -4\,\frac{n-1}{n-2}\,u^{-\frac{n+2}{n-2}}\,(\Delta u), \]
shows that $u$ is harmonic at infinity on $\mathbb{R}^n$ in the standard metric. Since $u \sim 1$ at infinity, $(u - 1)$ can be expanded in spherical harmonics at infinity, which are the spherical harmonics under a Kelvin transform [A-B-R]. Thus for $n = 3$ we have the expansion
\[ u(x) = 1 + \frac{A}{|x|} + \frac{Bx}{|x|^3} + \frac{Cy}{|x|^3} + \frac{Dz}{|x|^3} + O_\infty(|x|^{-3}). \tag{63} \]

We have the following lemma, which picks out a unique center about which the asymptotics of $u$ are a bit better.

Lemma 5.10. If $g$ is AF, scalar-flat, and conformally flat at infinity with nonzero mass, then in each AF chart there is a unique $c_0$ such that
\[ u(x - c_0) = 1 + \frac{A}{|x|} + O_\infty(|x|^{-3}). \tag{64} \]

Proof. We denote $c = (c_1, c_2, c_3)$. Plugging into the expansion (63) we have
\[ u(x - c) = 1 + \frac{A}{|x - c|} + \frac{B(x - c_1)}{|x - c|^3} + \frac{C(y - c_2)}{|x - c|^3} + \frac{D(z - c_3)}{|x - c|^3} + O_\infty(|x - c|^{-3}). \tag{65} \]

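It is easy to see by the binomial formula that
\[ \begin{aligned} \frac{1}{|x - c|} &= \big(|x|^2 - 2c\cdot x + |c|^2\big)^{-1/2} \\ &= \frac{1}{|x|}\left(1 - \frac{2c\cdot x}{|x|^2} + \frac{|c|^2}{|x|^2}\right)^{-1/2} \\ &= \frac{1}{|x|}\left(1 + \frac{c\cdot x}{|x|^2} + O_\infty(|x|^{-2})\right), \end{aligned} \tag{66} \]
and
\[ \frac{1}{|x - c|^3} = \frac{1}{|x|^3}\left(1 + \frac{3c\cdot x}{|x|^2}\right) + O_\infty(|x|^{-5}). \tag{67} \]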


Plugging this into the formula (65), we obtain
\[ u(x - c) = 1 + \frac{A}{|x|} + \frac{A\,c\cdot x + (B, C, D)\cdot x}{|x|^3} + O_\infty(|x|^{-3}). \]
As the mass is nonzero, we have $A > 0$ (see Sect. 4), so that we can choose the $c_i$ so that $A(c_1, c_2, c_3) = -(B, C, D)$.
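The choice of center in Lemma 5.10 can be checked term by term. The following is a sketch in the notation of (63) and (65) through (67); the shorthand $b = (B, C, D)$ is introduced here only for illustration and does not appear in the text.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Shorthand (an assumption of this sketch): b = (B, C, D), so that (63) reads
%   u(x) = 1 + A/|x| + b.x/|x|^3 + O_\infty(|x|^{-3}).
\begin{align*}
\frac{A}{|x-c|} &= \frac{A}{|x|} + \frac{A\,c\cdot x}{|x|^3} + O_\infty(|x|^{-3})
  && \text{by (66)},\\
\frac{b\cdot(x-c)}{|x-c|^3} &= \frac{b\cdot x}{|x|^3} + O_\infty(|x|^{-3})
  && \text{by (67)},\\
u(x-c) &= 1 + \frac{A}{|x|} + \frac{(A\,c + b)\cdot x}{|x|^3} + O_\infty(|x|^{-3}).
\end{align*}
% The |x|^{-2} term vanishes for all x exactly when A c + b = 0,
% that is, c = -b/A, which is possible only because A is nonzero.
\end{document}
```

Uniqueness of the center is visible in the last line: the linear form $(Ac + b)\cdot x$ vanishes identically only for the single value $c = -b/A$.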

Corollary 5.11. For AF metrics $g$ which are conformally flat and scalar-flat at infinity, there is a coordinate chart in which $g$ takes the form $v^4\delta$, where
\[ v(x) - \left(1 + \frac{m}{2|x|}\right) \in O_\infty(|x|^{-3}). \]

Proof. Expressing $g$ in a conformally flat chart at infinity and performing the computation in Lemma 5.10, we can choose a large ball in the conformally flat chart which is centered at $(c_1, c_2, c_3)$.

We now relate the center to an expression similar to the mass integral; this ties in scalar curvature to the center.

Lemma 5.12. Suppose $g$ is an AF metric with $R(g) = 0$ which is conformally flat at infinity. In an appropriate chart at infinity we write $g = u^4\delta$, with
\[ u(x) = 1 + \frac{A}{r} + \frac{Bx}{r^3} + \frac{Cy}{r^3} + \frac{Dz}{r^3} + O_\infty(r^{-3}), \]
where $r = |x|$. Then
\[ (B, C, D) = \frac{3}{64\pi}\lim_{R\to\infty}\int_{S_R} x \sum_{i,j}(g_{ij,i} - g_{ii,j})\,\nu^j\,d\xi, \tag{68} \]

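where $S_R$ is the Euclidean sphere of radius $R \gg 1$ centered at the origin, $\nu$ is the Euclidean normal to $S_R$, and $d\xi$ is the Euclidean surface measure on $S_R$.

Proof.
\[ \begin{aligned} \int_{S_R} x \sum_{i,j}(g_{ij,i} - g_{ii,j})\,\nu^j\,d\xi &= \int_{S_R} x \sum_{i,j}\left(4u^3\frac{\partial u}{\partial x^i}\delta_{ij} - 4u^3\frac{\partial u}{\partial x^j}\right)\nu^j\,d\xi \\ &= -8\int_{S_R} x\,u^3 \sum_j \frac{\partial u}{\partial x^j}\,\nu^j\,d\xi. \end{aligned} \]
Since $\sum_j \frac{\partial u}{\partial x^j}\nu^j = \frac{\partial u}{\partial r}$, and $r \equiv R$ on $S_R$,
\[ \begin{aligned} \int_{S_R} x \sum_{i,j}(g_{ij,i} - g_{ii,j})\,\nu^j\,d\xi &= -8\int_{S_R} x\,u^3\left(-\frac{A}{R^2} - 2\,\frac{Bx + Cy + Dz}{R^4} + O(R^{-4})\right)d\xi \\ &= -8\int_{S_R} x\left(1 + \frac{3A}{R} + O(R^{-2})\right)\left(-\frac{A}{R^2} - 2\,\frac{Bx + Cy + Dz}{R^4} + O(R^{-4})\right)d\xi. \end{aligned} \]
Thus we have
\[ \int_{S_R} x \sum_{i,j}(g_{ij,i} - g_{ii,j})\,\nu^j\,d\xi = -8\int_{S_R} x\left(-\frac{A}{R^2} - 2\,\frac{Bx + Cy + Dz}{R^4} - \frac{3A^2}{R^3} + O(R^{-4})\right)d\xi. \]
By symmetry, the first component of the vector integral reduces in the limit to
\[ -8\int_{S_R}\left(-\frac{2Bx^2}{r^4}\right)d\xi = \frac{16B}{R^4}\int_{S_R} x^2\,d\xi = 16B\int_{S^2} x^2\,d\xi = 16B\,\frac{\operatorname{Area}(S^2)}{3} = \frac{64\pi}{3}\,B. \]
The other components follow analogously.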
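The symmetry step at the end of the proof of Lemma 5.12 uses only elementary integrals over the sphere. A short sketch of the bookkeeping, in the same notation and for $n = 3$:

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% On the unit sphere S^2 the coordinate functions x, y, z are interchanged
% by rotations, so their squares have equal integrals.
\begin{align*}
\int_{S^2} x^2\,d\xi = \int_{S^2} y^2\,d\xi = \int_{S^2} z^2\,d\xi
  &= \frac13\int_{S^2}(x^2+y^2+z^2)\,d\xi
   = \frac{\operatorname{Area}(S^2)}{3} = \frac{4\pi}{3},\\
\int_{S_R} x^2\,d\xi &= R^4\int_{S^2} x^2\,d\xi = \frac{4\pi R^4}{3},\\
\int_{S_R} x\,d\xi = \int_{S_R} xy\,d\xi = \int_{S_R} xz\,d\xi &= 0
  \qquad\text{(odd under } x\mapsto -x\text{)},\\
-8\int_{S_R} x\Big(-\frac{2Bx}{R^4}\Big)\,d\xi
  &= \frac{16B}{R^4}\cdot\frac{4\pi R^4}{3} = \frac{64\pi}{3}\,B .
\end{align*}
% The terms in A are constant multiples of x on S_R and integrate to zero;
% the O(R^{-4}) remainder contributes O(R^{-4}) * R * Area(S_R) = O(R^{-1}),
% which vanishes in the limit R -> infinity.
\end{document}
```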

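We recall the notation $\tilde g = \tilde g_R = \tilde g_{R,(m,c)}$, the glued metric on $A_R$, and $g_R = g_{R,(m,c)}$, the scaled glued metric on $A_1$, and finally $\bar g = \bar g_{R,(m,c)} = g_R + h$, where $h$ is the deformation tensor supported in $A_1$ obtained as in the previous section. Now, the mass integral, as well as the integrand in Lemma 5.12 after an application of the divergence theorem, involves the term $\sum_{i,j}(g_{ij,ij} - g_{ii,jj})$. For the metric $\tilde g$ this is essentially the scalar curvature. More precisely we compute
\[ \begin{aligned} R(\tilde g) &= \tilde g^{ij}\Big(\Gamma^k_{ij,k} - \Gamma^k_{ik,j} + \big(\Gamma^k_{kl}\Gamma^l_{ij} - \Gamma^k_{jl}\Gamma^l_{ik}\big)\Big) \\ &= \tilde g^{ij}\Big[\tfrac12\,\tilde g^{km}{}_{,k}\big(\tilde g_{im,j} + \tilde g_{jm,i} - \tilde g_{ij,m}\big) + \tfrac12\,\tilde g^{km}\big(\tilde g_{im,jk} + \tilde g_{jm,ik} - \tilde g_{ij,mk}\big) \\ &\qquad - \tfrac12\,\tilde g^{km}{}_{,j}\big(\tilde g_{im,k} + \tilde g_{km,i} - \tilde g_{ik,m}\big) - \tfrac12\,\tilde g^{km}\big(\tilde g_{im,kj} + \tilde g_{km,ij} - \tilde g_{ik,mj}\big) \\ &\qquad + \big(\Gamma^k_{kl}\Gamma^l_{ij} - \Gamma^k_{jl}\Gamma^l_{ik}\big)\Big]. \end{aligned} \]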


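By Eq. (37), we have that
\[ \begin{aligned} R(\tilde g) &= \tfrac12\,\tilde g^{ij}\tilde g^{km}\big(\tilde g_{jm,ik} - \tilde g_{ij,mk} - \tilde g_{km,ij} + \tilde g_{ik,mj}\big) + O(R^{-4}) \\ &= \tilde g^{ij}\tilde g^{km}\big(\tilde g_{ik,mj} - \tilde g_{ij,mk}\big) + O(R^{-4}) \\ &= \big(\delta^{ij} + O(R^{-1})\big)\big(\delta^{km} + O(R^{-1})\big)\big(\tilde g_{ik,mj} - \tilde g_{ij,mk}\big) + O(R^{-4}). \end{aligned} \]
Hence,
\[ R(\tilde g) = \sum_{i,j}\big(\tilde g_{ij,ij} - \tilde g_{ii,jj}\big) + O(R^{-4}). \tag{69} \]
Similarly,
\[ R(g_R) = \sum_{i,j}\big((g_R)_{ij,ij} - (g_R)_{ii,jj}\big) + O(R^{-2}). \tag{70} \]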

With these computational lemmas at hand, we calculate the first integral associated to the map $I_0$. We first use Lemma 5.10 to fix our coordinate chart at infinity about the center, which forces the integral computed in the previous Lemma 5.12 to tend to zero as $R$ tends to infinity. Effectively we have zeroed out the center of mass. If the metric has mass $m_0 > 0$, we will look at the map $I_0$ near the point $(m_0, 0)$.

Proposition 5.13.
\[ \int_{A_1} R(\bar g_R)\,d\mu_{g_R} = 16\pi\,\frac{m - m_0}{R} + o(R^{-1}), \tag{71} \]

where $o(R^{-1})$ is independent of $m$ and $c$ bounded near $(m_0, 0)$.

Proof.
\[ \int_{A_1} R(\bar g_R)\,d\mu_{g_R} = \int_{A_1}\Big[R(g_R) + L_{g_R}(h) + O\big(\|h\|^2_{k+2,\alpha,A_1}\big)\Big]\,d\mu_{g_R}. \]
We compute these three terms separately. The first fact we note is that we have an inequality of the form
\[ \big|O\big(\|h\|^2_{k+2,\alpha,A_1}\big)\big| \le C\,\|h\|^2_{k+2,\alpha,A_1}, \]
where $C$ is independent of $R \gg 1$. The reason is that this term comes as the second-order Taylor remainder term of the form $\tfrac12 D^2R|_{(g_R + \eta)}(h, h)$, where $\eta = th$ for some $t \in (0, 1)$, and the fact that $g_R$ is bounded near $\delta$. Hence by Eq. (62), $O(\|h\|^2_{k+2,\alpha,A_1}) = O(R^{-2})$; this term is pointwise uniformly controlled and only improves with increasing $R$. Hence,
\[ \int_{A_1} O\big(\|h\|^2_{k+2,\alpha,A_1}\big)\,d\mu_{g_R} \le C\int_{A_1} O(R^{-2})\,d\mu_{g_R} = O(R^{-2}). \]


We now consider the term $\int_{A_1} L_{g_R}(h)\,d\mu_{g_R}$. As $h$ decays at the boundary of the annulus, we can integrate by parts to get
\[ \int_{A_1} L_{g_R}(h)\,d\mu_{g_R} = \int_{A_1} h\cdot L^*_{g_R}(1)\,d\mu_{g_R}. \]
Now since $L^*_{g_R}(1) = -\operatorname{Ric}(g_R) = O(R^{-1})$, we also have
\[ \int_{A_1} L_{g_R}(h)\,d\mu_{g_R} = O(R^{-2}). \tag{72} \]

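Finally we compute $\int_{A_1} R(g_R)\,d\mu_{g_R}$. We first note that using Eq. (38) and Taylor's formula, we see that
\[ \Big|1 - \sqrt{\det\,(g_R)_{ij}}\Big| = O(R^{-1}). \tag{73} \]
Thus from (70), we have
\[ \begin{aligned} \int_{A_1} R(g_R)\,d\mu_{g_R} &= \int_{A_1}\sum_{i,j}\big((g_R)_{ij,ij} - (g_R)_{ii,jj}\big)\,dx + O(R^{-2}) \\ &= \int_{A_R} R^2\sum_{i,j}\big(\tilde g_{ij,ij} - \tilde g_{ii,jj}\big)\,\frac{dx}{R^3} + O(R^{-2}) \\ &= R^{-1}\int_{\partial A_R}\sum_{i,j}\big(\tilde g_{ij,i} - \tilde g_{ii,j}\big)\,\nu^j\,d\xi + O(R^{-2}) \\ &= 16\pi\,\frac{m - m_0}{R} + o(R^{-1}) + O(R^{-2}), \end{aligned} \tag{74} \]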

where both lower-order terms are independent of $m$ and $c$ bounded, as are the previous lower-order terms.

The computation of the other integrals is a bit more delicate. We will want a more precise version of (62), the bound on $h$ via the bound on the scalar curvature of the glued metric. In particular, we want a bound on the $R^{-1}$-dependence. We also want a precise bound of the $R^{-2}$-dependence on $\big|R(g_R) - \sum_{i,j}\big((g_R)_{ij,ij} - (g_R)_{ii,jj}\big)\big|$. Precisely, we have the following three lemmas.

Lemma 5.14. There are smooth functions $\tilde\gamma_{R,k}$, $\tilde\gamma_{R,jk}$ on $A_R$, bounded independent of $R \gg 1$ and $m$ and $c$ bounded, such that on $A_R$,
\[ \tilde g_{ii,k} = (\tilde g_R)_{ii,k} = -2m\,\frac{x^k}{|x|^3} + \frac{\tilde\gamma_{R,k}}{R^2}(m - m_0) + O(R^{-3}), \tag{75} \]
\[ \tilde g_{ii,jk} = (\tilde g_R)_{ii,jk} = -\frac{2m}{|x|^3}\,\delta_{jk} + 6m\,\frac{x^j x^k}{|x|^5} + \frac{\tilde\gamma_{R,jk}}{R^3}(m - m_0) + O(R^{-4}). \tag{76} \]
Hence by scaling, we get functions $\gamma_{R,k}$, $\gamma_{R,jk}$ on $A_1$, bounded independent of $R \gg 1$ and $m$ and $c$ bounded, such that on $A_1$,
\[ (g_R)_{ii,k} = -\frac{2m}{R}\,\frac{x^k}{|x|^3} + \frac{\gamma_{R,k}}{R}(m - m_0) + O(R^{-2}), \tag{77} \]
\[ (g_R)_{ii,jk} = -\frac{2m}{R}\,\frac{1}{|x|^3}\,\delta_{jk} + \frac{6m}{R}\,\frac{x^j x^k}{|x|^5} + \frac{\gamma_{R,jk}}{R}(m - m_0) + O(R^{-2}). \tag{78} \]

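Proof. Recall that (Sects. 5.1, 5.4)
\[ \tilde g_{ij} = \left[\phi\Big(\frac{|x|}{R}\Big)u^4 + \Big(1 - \phi\Big(\frac{|x|}{R}\Big)\Big)\Big(1 + \frac{m}{2|x - c|}\Big)^4\right]\delta_{ij}. \tag{79} \]
Also recall that we have zeroed out the $|x|^{-2}$-term from the expansion of $u$; this is not essential, but it does simplify the calculation slightly. We have
\[ \begin{aligned} \tilde g_{ii,k} &= \phi'\Big(\frac{|x|}{R}\Big)\frac{x^k}{R|x|}\left[u^4 - \Big(1 + \frac{m}{2|x - c|}\Big)^4\right] \\ &\quad + \phi\Big(\frac{|x|}{R}\Big)\left[4u^3u_{,k} - 4\Big(1 + \frac{m}{2|x - c|}\Big)^3\Big(\frac{m}{2|x - c|}\Big)_{,k}\right] \\ &\quad + 4\Big(1 + \frac{m}{2|x - c|}\Big)^3\Big(\frac{m}{2|x - c|}\Big)_{,k}. \end{aligned} \]
Expanding, we get
\[ \begin{aligned} \tilde g_{ii,k} &= \frac{\phi'\big(\frac{|x|}{R}\big)x^k}{R|x|}\left[\Big(1 + \frac{m_0}{2|x|} + O(|x|^{-3})\Big)^4 - \Big(1 + \frac{m}{2|x|} + \frac{mc\cdot x}{2|x|^3} + O(|x|^{-3})\Big)^4\right] \\ &\quad + 4\phi\Big(\frac{|x|}{R}\Big)\Bigg[\Big(1 + \frac{m_0}{2|x|} + O(|x|^{-3})\Big)^3\Big(-\frac{m_0}{2|x|^2}\frac{x^k}{|x|} + O(|x|^{-4})\Big) \\ &\qquad - \Big(1 + \frac{m}{2|x|} + \frac{mc\cdot x}{2|x|^3} + O(|x|^{-3})\Big)^3 \\ &\qquad\quad \times\Big(-\frac{m}{2|x|^2}\frac{x^k}{|x|} - \frac{3m\,c\cdot x}{2|x|^4}\frac{x^k}{|x|} + \frac{mc^k}{2|x|^3} + O(|x|^{-4})\Big)\Bigg] \\ &\quad + 4\Big(1 + \frac{m}{2|x|} + \frac{mc\cdot x}{2|x|^3} + O(|x|^{-3})\Big)^3 \\ &\qquad \times\Big(-\frac{m}{2|x|^2}\frac{x^k}{|x|} - \frac{3m\,c\cdot x}{2|x|^4}\frac{x^k}{|x|} + \frac{mc^k}{2|x|^3} + O(|x|^{-4})\Big). \end{aligned} \]
Simplifying we get
\[ \tilde g_{ii,k} = \frac{\phi'\big(\frac{|x|}{R}\big)x^k}{R|x|}\left[-2\,\frac{m - m_0}{|x|} + O\Big(\frac{1}{|x|^2}\Big)\right] + 4\phi\Big(\frac{|x|}{R}\Big)\left[\frac{m - m_0}{2|x|^2}\frac{x^k}{|x|} + O(|x|^{-3})\right] - 2m\,\frac{x^k}{|x|^3} + O(|x|^{-3}). \]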

This clearly yields the first equation of the lemma. The second one follows by another differentiation, as we can differentiate the preceding equation, with the $R^{-1}$-order increasing in each term by one.

This next lemma actually follows from Lemmas 5.14 and 5.16, but we give another simple proof.

Lemma 5.15. There is a constant $C > 0$ independent of $R \gg 1$ and $m$ and $c$ bounded, and for each $R$ a smooth function $\tilde F_R$ on $A_R$ such that
\[ |\tilde F_R| \le C\,\frac{|m - m_0|}{R^3}, \tag{80} \]
\[ R(\tilde g_R) = \tilde F_R + O(R^{-4}); \tag{81} \]
and hence with $F_R(x) \equiv R^2\tilde F_R(Rx)$,
\[ |F_R| \le C\,\frac{|m - m_0|}{R}, \tag{82} \]
\[ R(g_R) = F_R + O(R^{-2}). \tag{83} \]

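Proof. Recall that the scalar curvature behaves under conformal transformation in three dimensions as follows:
\[ R(U^4g_0) = U^{-5}\big(R(g_0)U - 8\Delta_{g_0}(U)\big). \]
Simple computation shows
\[ R(v\delta) = \frac32\,v^{-3}\sum_i v_{,i}^2 - 2v^{-2}\sum_i v_{,ii}. \]
Hence with
\[ v = \tilde g_{ii} = \phi\Big(\frac{|x|}{R}\Big)u^4 + \Big(1 - \phi\Big(\frac{|x|}{R}\Big)\Big)\Big(1 + \frac{m}{2|x - c|}\Big)^4 \]
the result follows from the previous lemma, since $v = 1 + O_\infty(R^{-1})$, and since
\[ \begin{aligned} \Delta_\delta v &= \sum_{i=1}^3\left(-\frac{2m}{|x|^3} + 6m\,\frac{(x^i)^2}{|x|^5} + \frac{\tilde\gamma_{R,ii}}{R^3}(m - m_0)\right) + O(R^{-4}) \\ &= \sum_{i=1}^3\frac{\tilde\gamma_{R,ii}}{R^3}(m - m_0) + O(R^{-4}). \end{aligned} \]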
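The "simple computation" behind the formula for $R(v\delta)$ can be spelled out. The sketch below assumes $n = 3$ and $g_0 = \delta$ (so $R(g_0) = 0$), and writes $U = v^{1/4}$; it is offered as a check, not as the author's own derivation.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Assumptions of this sketch: n = 3, g_0 = \delta flat, v = U^4 > 0.
\begin{align*}
\nabla U &= \tfrac14\,v^{-3/4}\,\nabla v,\\
\Delta U &= \tfrac14\,v^{-3/4}\,\Delta v - \tfrac{3}{16}\,v^{-7/4}\,|\nabla v|^2,\\
R(v\delta) = R(U^4\delta) &= -8\,U^{-5}\,\Delta U
   = -8\,v^{-5/4}\Big(\tfrac14\,v^{-3/4}\,\Delta v - \tfrac{3}{16}\,v^{-7/4}\,|\nabla v|^2\Big)\\
  &= \tfrac32\,v^{-3}\,|\nabla v|^2 - 2\,v^{-2}\,\Delta v .
\end{align*}
% With |\nabla v|^2 = \sum_i v_{,i}^2 and \Delta v = \sum_i v_{,ii} this is the
% displayed formula. Since v = 1 + O(R^{-1}) and |\nabla v|^2 = O(R^{-4}) on A_R,
% the gradient term is O(R^{-4}) and the Laplacian term carries the
% (m - m_0)/R^3 dependence.
\end{document}
```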


Lemma 5.16. There is a constant $C > 0$ independent of $R \gg 1$ and $m$ and $c$ bounded, and for each $R$ a smooth function $\tilde G_R$ on $A_R$ such that on $A_R$,
\[ |\tilde G_R| \le C\,\frac{|m - m_0|}{R^4}, \tag{84} \]
\[ R(\tilde g_R) = \sum_{i,j}\big((\tilde g_R)_{ij,ij} - (\tilde g_R)_{ii,jj}\big) + \frac{6m^2}{|x|^4} + \tilde G_R + O(R^{-5}); \tag{85} \]
and hence with $G_R(x) \equiv R^2\tilde G_R(Rx)$,
\[ |G_R| \le C\,\frac{|m - m_0|}{R^2}, \tag{86} \]
\[ R(g_R) = \sum_{i,j}\big((g_R)_{ij,ij} - (g_R)_{ii,jj}\big) + \frac{1}{R^2}\,\frac{6m^2}{|x|^4} + G_R + O(R^{-3}). \tag{87} \]

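Proof. We estimate each of the terms in the expression for the scalar curvature
\[ \begin{aligned} R(\tilde g) = \tilde g^{ij}\Big[&\tfrac12\,\tilde g^{km}{}_{,k}\big(\tilde g_{im,j} + \tilde g_{jm,i} - \tilde g_{ij,m}\big) + \tfrac12\,\tilde g^{km}\big(\tilde g_{im,jk} + \tilde g_{jm,ik} - \tilde g_{ij,mk}\big) \\ &- \tfrac12\,\tilde g^{km}{}_{,j}\big(\tilde g_{im,k} + \tilde g_{km,i} - \tilde g_{ik,m}\big) - \tfrac12\,\tilde g^{km}\big(\tilde g_{im,kj} + \tilde g_{km,ij} - \tilde g_{ik,mj}\big) \\ &+ \Gamma^k_{kl}\Gamma^l_{ij} - \Gamma^k_{jl}\Gamma^l_{ik}\Big]. \end{aligned} \]
In fact, since the metric $\tilde g = v\delta$, the Christoffel symbols are easily computed: $\Gamma^i_{ii} = \frac12\frac{v_{,i}}{v}$, $\Gamma^j_{ii} = -\frac12\frac{v_{,j}}{v}$ for $i \ne j$, $\Gamma^i_{ij} = \frac12\frac{v_{,j}}{v}$ for $i \ne j$, and $\Gamma^i_{jk} = 0$ when all the indices are distinct. Hence we can easily compute the final term in the expression for the scalar curvature, valid for any $n$:
\[ \tilde g^{ij}\big(\Gamma^k_{kl}\Gamma^l_{ij} - \Gamma^k_{jl}\Gamma^l_{ik}\big) = (2 - n)(n - 1)\,\frac{|\nabla v|^2}{4v^3}, \]
where the norm of the gradient is taken in the flat metric. The derivatives of the Christoffel symbols are also easy to compute, for example
\[ \Gamma^k_{ik,j} = \frac12\left(\frac{v_{,ij}}{v} - \frac{v_{,i}v_{,j}}{v^2}\right). \]
The other type of derivative of a Christoffel symbol we use is of the form $\Gamma^k_{ij,k}$, for which we have the following possibilities: $\Gamma^k_{ij,k} = 0$ for distinct $i, j, k$; $\Gamma^k_{kk,k} = \frac12\Big(\frac{v_{,kk}}{v} - \frac{v_{,k}^2}{v^2}\Big)$; $\Gamma^k_{ii,k} = -\frac12\Big(\frac{v_{,kk}}{v} - \frac{v_{,k}^2}{v^2}\Big)$ for $i \ne k$; and finally $\Gamma^k_{ik,k} = \frac12\Big(\frac{v_{,ik}}{v} - \frac{v_{,i}v_{,k}}{v^2}\Big)$ for $i \ne k$.

Thus computing the first and third terms in the expression of $R(\tilde g)$ we get $(n - 1)\frac{|\nabla v|^2}{v^3}$. Returning to the case $n = 3$ for simplicity, we use Lemma 5.14 to estimate all these terms:
\[ \frac32\,\frac{|\nabla v|^2}{v^3} = (m - m_0)\,O(R^{-4}) + 6\sum_{k=1}^3 m^2\,\frac{(x^k)^2}{|x|^6} + O(R^{-5}). \]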


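We now compute the remaining terms:
\[ \begin{aligned} \tfrac12\,\tilde g^{ij}\tilde g^{km}&\big(\tilde g_{jm,ik} - \tilde g_{ij,mk} - \tilde g_{km,ij} + \tilde g_{ik,mj}\big) \\ &= \tilde g^{ij}\tilde g^{km}\big(\tilde g_{ik,mj} - \tilde g_{ij,mk}\big) \\ &= v^{-2}\sum_{i,k}\big(\tilde g_{ik,ik} - \tilde g_{ii,kk}\big) \\ &= \left(1 - 2\left[\phi\Big(\frac{|x|}{R}\Big)\frac{2(m_0 - m)}{|x|} + \frac{2m}{|x|}\right] + O(|x|^{-2})\right)\sum_{i,k}\big(\tilde g_{ik,ik} - \tilde g_{ii,kk}\big) \\ &= \sum_{i,k}\big(\tilde g_{ik,ik} - \tilde g_{ii,kk}\big) + 4\,\frac{(m - m_0)}{|x|}\,\phi\Big(\frac{|x|}{R}\Big)\sum_{i,k}\big(\tilde g_{ik,ik} - \tilde g_{ii,kk}\big) \\ &\quad - 4\,\frac{m}{|x|}\sum_{i,k}\big(\tilde g_{ik,ik} - \tilde g_{ii,kk}\big) + O(R^{-5}). \end{aligned} \]
The second term in the final equation above is of the right form, since $\sum_{i,k}\big(\tilde g_{ik,ik} - \tilde g_{ii,kk}\big) = O(R^{-3})$. Finally the third term reduces to
\[ 4\,\frac{m}{|x|}\sum_{i \ne k}\tilde g_{ii,kk} = 4\,\frac{m}{|x|}\,2\Delta v = 0. \]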

With these lemmas at hand, we are ready to compute the rest of the map $I_0$.

Proposition 5.17. There is a constant $K$ such that
\[ K\int_{A_1} x\,R(\bar g_{R,(m,c)})\,d\mu_{g_R} = -\frac{mc}{R^2} + Q_R(m, c) + o(R^{-2}), \]
where $o(R^{-2})$ is independent of $m$ and $c$ bounded, and where
\[ |Q_R| \le C\,\frac{|m - m_0|}{R^2}, \]
where $C$ is independent of $R \gg 1$ and of $m$ and $c$ bounded.

Proof.
\[ \begin{aligned} \int_{A_1} x^l R(\bar g_{R,(m,c)})\,d\mu_{g_R} &= \int_{A_1} x^l\Big[R(g_R) + L_{g_R}(h) + O\big(\|h\|^2_{k+2,\alpha,A_1}\big)\Big]\,d\mu_{g_R} \\ &= \int_{A_1} x^l R(g_R)\,d\mu_{g_R} + \int_{A_1} L^*_{g_R}(x^l)\cdot h\,d\mu_{g_R} + O\big(\|h\|^2_{k+2,\alpha,A_1}\big). \end{aligned} \tag{88} \]
As before, we note that $L^*_{g_R}(x^l) = O(R^{-1})$. Moreover we can estimate $h$ by using Lemma 5.15 and the estimate (62):
\[ \|h\|_{k+2,\alpha,A_1} \le C\,\|R(g_R)\|_{k,\alpha,\rho^{-1}} \le C\,\frac{|m - m_0|}{R} + O(R^{-2}). \]


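This allows us to estimate the last two terms in (88). To estimate the first term we use Eq. (73), which says the volume forms of $\delta$ and $g_R$ are equal to order $R^{-1}$, and we use Lemmas 5.15 and 5.16 to get
\[ \begin{aligned} \int_{A_1} x^l R(g_R)\,d\mu_{g_R} &= \int_{A_1} x^l R(g_R)\,dx + (m - m_0)\,O(R^{-2}) + O(R^{-3}) \\ &= \int_{A_1} x^l\sum_{i,j}\big((g_R)_{ij,ij} - (g_R)_{ii,jj}\big)\,dx + (m - m_0)\,O(R^{-2}) + O(R^{-3}). \end{aligned} \]
The other term, $R^{-2}\frac{6m^2}{|x|^4}$, integrates to zero against $x^l$ by symmetry. Finally, we apply scaling and the divergence theorem to yield
\[ \begin{aligned} \int_{A_1} x^l\sum_{i,j}\big((g_R)_{ij,ij} - (g_R)_{ii,jj}\big)\,dx &= \int_{A_R}\frac{x^l}{R}\,R^2\sum_{i,j}\big(\tilde g_{ij,ij} - \tilde g_{ii,jj}\big)\,\frac{dx}{R^3} \\ &= R^{-2}\int_{\partial A_R} x^l\sum_{i,j}\big(\tilde g_{ij,i} - \tilde g_{ii,j}\big)\,\nu^j\,d\xi - R^{-2}\int_{A_R}\sum_i\big(\tilde g_{il,i} - \tilde g_{ii,l}\big)\,dx \\ &= R^{-2}\int_{\partial A_R} x^l\sum_{i,j}\big(\tilde g_{ij,i} - \tilde g_{ii,j}\big)\,\nu^j\,d\xi + R^{-2}\int_{\partial A_R} 2\tilde g_{ii}\,\nu^l\,d\xi. \end{aligned} \]
We compute the second integral. By the normalization of the center and by symmetry, the contribution from the inner boundary sphere $S_R$ is $O(R^{-3})$. On the outer boundary $S_{2R}$,
\[ \tilde g_{ii} = \Big(1 + \frac{m}{2|x - c|}\Big)^4 = \Big(1 + \frac{m}{2|x|} + \frac{m\,c\cdot x}{2|x|^3} + O(|x|^{-3})\Big)^4. \]
By symmetry again,
\[ 2R^{-2}\int_{\partial A_R}\tilde g_{ii}\,\nu^l\,d\xi = 2R^{-2}\int_{S_{2R}} 2mc^l\,\frac{x^l}{(2R)^3}\,\frac{x^l}{2R}\,d\xi + o(R^{-2}) = \frac{16\pi mc^l}{3R^2} + o(R^{-2}). \]
Finally, we computed in Lemmas 5.10 and 5.12 that
\[ R^{-2}\int_{\partial A_R} x^l\sum_{i,j}\big(\tilde g_{ij,i} - \tilde g_{ii,j}\big)\,\nu^j\,d\xi = -\frac{32\pi mc^l}{3R^2} + o(R^{-2}). \]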
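Adding the two boundary contributions at the end of the proof of Proposition 5.17 gives the leading term explicitly. The sketch below is an inference from the displayed formulas for $n = 3$, not a statement made in the text; in particular the value of $K$ it suggests assumes that $K$ multiplies the integral on the left-hand side of the Proposition, as in the map $I_R$ used in the proof of Theorem 2.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Leading-order bookkeeping for n = 3; terms depending on (m - m_0) are
% left inside Q_R and not tracked here.
\begin{align*}
\int_{A_1} x^l \sum_{i,j}\big((g_R)_{ij,ij} - (g_R)_{ii,jj}\big)\,dx
  &= -\frac{32\pi m c^l}{3R^2} + \frac{16\pi m c^l}{3R^2} + o(R^{-2})\\
  &= -\frac{16\pi}{3}\,\frac{m c^l}{R^2} + o(R^{-2}).
\end{align*}
% Comparing with  K \int_{A_1} x R(\bar g) d\mu = -mc/R^2 + Q_R + o(R^{-2})
% suggests (under the reading stated in the lead-in)
\[
K\cdot\frac{16\pi}{3} = 1, \qquad\text{i.e.}\qquad K = \frac{3}{16\pi}.
\]
\end{document}
```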


Remark. The normalization of the inner center to zero is not necessary. If we had not normalized it at the start, we would have a difference of corresponding terms in $m_0c_0$. The following argument would go through just the same.

We can at last prove Theorem 2.

Proof. By previous considerations, it suffices to show that the map
\[ I_R : (m, c) \mapsto \left(\frac{1}{16\pi}\,R\int_{A_1} R(\bar g)\,d\mu_{g_R},\ -KR^2\int_{A_1} x\,R(\bar g)\,d\mu_{g_R}\right) \]
hits zero for some $R$ and $|c| < R$. We have shown that
\[ I_R(m, c) = (m - m_0,\ mc) + (0,\ \mathcal{R}_R) + o(1), \tag{89} \]
with $|\mathcal{R}_R| = (m - m_0)\,O(1)$. The leading first term is a diffeomorphism on some neighborhood $U$ of $(m_0, 0)$ which hits the origin there. Consider a small box $B \subset U$ surrounding $(m_0, 0)$ in which for each $l$,
\[ \mu_0 \equiv \max_B |m - m_0| \ll \max_B |c^l| \equiv \mu_l. \tag{90} \]

By simple considerations of the boundary of this thin box, it is clear that the map $I_R$ will take some point inside this box to the origin, for $R$ sufficiently large (in which case the $o(1)$-term is then a small perturbation). Indeed, disregarding the $o(1)$-term, the map sends the hyperplanes $m = m_0 \pm \mu_0$ to the hyperplanes $x^0 = \pm\mu_0$, and the hyperplanes $c^l = \pm\mu_l$ are mapped close to the hyperplanes $x^l = \pm m_0\mu_l$, respectively, by (90). Now $R$ can be taken sufficiently large. By homotopy invariance of the local degree [Au, p. 135], the origin will still have degree one for the map $I_R$ on $B$, so the origin is indeed in the image $I_R(B)$.

Corollary 5.18. Theorem 2 remains true under the assumption that the metric $g$ is Schwarzschild up to order $O_3(|x|^{-(n-1)})$, i.e.
\[ g = \Big(1 + \frac{m_0}{(n - 1)|x|^{n-2}}\Big)^{4/(n-2)}\delta + O_3(|x|^{-(n-1)}). \]

Proof. The proof is just to trace the steps in the preceding argument and see that everything goes through. We can write the metric $\tilde g = \tilde g_R$ as $\tilde g = \phi k + v\delta$, where $k = O_3(|x|^{-(n-1)})$ and
\[ v = \phi\,\Big(1 + \frac{m_0}{(n - 1)|x|^{n-2}}\Big)^{4/(n-2)} + (1 - \phi)\,\Big(1 + \frac{m}{(n - 1)|x - c|^{n-2}}\Big)^{4/(n-2)}, \]
where here $\phi = \phi\big(\frac{|x|}{R}\big)$. So the metric is a perturbation of a metric for which the previous procedure works. We check that the various statements go through relatively unchanged. We record a few here; indeed it is easy to check the analogue of Lemma 5.14:
\[ \tilde g_{ip,k} = (\tilde g_R)_{ip,k} = \left(-\frac{4m}{n - 1}\,\frac{x^k}{|x|^n} + \frac{\tilde\gamma_{R,k}}{R^{(n-1)}}(m - m_0)\right)\delta_{ip} + O(R^{-n}), \tag{91} \]
\[ \tilde g_{ip,jk} = (\tilde g_R)_{ip,jk} = \left(-\frac{4m}{(n - 1)|x|^n}\,\delta_{jk} + \frac{4nm}{n - 1}\,\frac{x^jx^k}{|x|^{(n+2)}} + \frac{\tilde\gamma_{R,jk}}{R^n}(m - m_0)\right)\delta_{ip} + O(R^{-(n+1)}). \tag{92} \]
Similarly, Lemma 5.16 becomes
\[ R(\tilde g_R) = \sum_{i,j}\big((\tilde g_R)_{ij,ij} - (\tilde g_R)_{ii,jj}\big) + \frac{4m^2(6 - n)}{n - 1}\,\frac{1}{|x|^{2n-2}} + (m - m_0)\,O(R^{-(2n-2)}) + O(R^{-(2n-1)}). \tag{93} \]

To prove this you just keep track of the perturbation terms from $\phi k$. There is one more point that should be mentioned. There is an extra term which we need to account for when computing the projection of the perturbed scalar curvature into the linear functions $x^l$. Since we only really have the mass term in the expansion of the metric $g$, we have to redefine the initial center $c_0$. Indeed from the proof of Proposition 5.17 we get an analogous equality in $n$ dimensions:
\[ \begin{aligned} \int_{A_1} x^l\sum_{i,j}\big((g_R)_{ij,ij} - (g_R)_{ii,jj}\big)\,dx &= \int_{A_R}\frac{x^l}{R}\,R^2\sum_{i,j}\big(\tilde g_{ij,ij} - \tilde g_{ii,jj}\big)\,\frac{dx}{R^n} \\ &= R^{-(n-1)}\int_{\partial A_R} x^l\sum_{i,j}\big(\tilde g_{ij,i} - \tilde g_{ii,j}\big)\,\nu^j\,d\xi - R^{-(n-1)}\int_{A_R}\sum_i\big(\tilde g_{il,i} - \tilde g_{ii,l}\big)\,dx. \end{aligned} \]
We can of course apply the divergence theorem to the second term. What we obtain on the outer boundary is precisely what we got before (see the proof of Proposition 5.17), which in the $n$-dimensional case is
\[ \frac{4\omega_{n-1}mc^l}{nR^{(n-1)}}. \]
This tells us we should define the center $c_0$ of our initial metric $g$ by
\[ -\frac{4(n - 2)\omega_{n-1}}{n}\,m\,c_0^l = \lim_{R\to\infty}\left[\int_{S_R} x^l\sum_{i,j}\big(g_{ij,i} - g_{ii,j}\big)\,\nu^j\,d\xi - \int_{S_R}\sum_i\big(g_{il}\nu^i - g_{ii}\nu^l\big)\,d\xi\right]. \]
The degree argument then proceeds as above.

We emphasize that the main point is that the mass term is spherically symmetric, which allows the above integrals to converge.


A. Appendix

We would like to mention another way to do the estimate in Proposition 3.1 (for $n \ge 3$) which illustrates the flexibility we get from the overdetermined nature of $L^*_g$; it is more complicated, but it may be worthy of a comment here. We begin by restricting the operator $L^*_g$ to a codimension-one submanifold $\Sigma$ of $\Omega$. We denote by $g_\Sigma$ the induced metric $g|_{T\Sigma}$ on $\Sigma$, and by $\operatorname{Hess}_\Sigma$ and $\Delta_\Sigma$ the Hessian and Laplacian on $(\Sigma, g_\Sigma)$. Define $L_\Sigma$ to be the operator $L^*_{g_\Sigma}$ on $\Sigma$:
\[ L_\Sigma f \equiv -(\Delta_\Sigma f)\,g_\Sigma + \operatorname{Hess}_\Sigma f - f\operatorname{Ric}(g_\Sigma). \]
Note that clearly $L_\Sigma$ has injective symbol for $n \ge 3$ (again, for example, consider the trace of the symbol). We now examine the differences between the full tensors with respect to $g$ and the corresponding tensors on $\Sigma$. Choose a local orthonormal frame $\{e_1, e_2, \ldots, e_n\}$ on a neighborhood of $p \in \Sigma$, with $\{e_1, e_2, \ldots, e_{n-1}\}$ a local orthonormal frame for $\Sigma$, and $e_n$ orthogonal to $\Sigma$. Let $\bar D$ and $\bar R$ denote the ambient connection and curvature tensor, respectively, let $\mathrm{II} = (h_{ij})$ denote the second fundamental form, and $H$ the mean curvature, and finally let $\langle X, Y\rangle = g(X, Y)$. Then the following identities hold on $\Sigma$:
\[ \begin{aligned} (-\Delta_g f + \Delta_\Sigma f) &= -\sum_{i=1}^n\big(e_ie_if - (\bar D_{e_i}e_i)f\big) + \sum_{i=1}^{n-1}\big(e_ie_if - (D_{e_i}e_i)f\big) \\ &= -e_ne_nf + (\bar D_{e_n}e_n)f + \sum_{i=1}^{n-1}\big((\bar D_{e_i}e_i)f - (D_{e_i}e_i)f\big) \\ &= -e_ne_nf + (\bar D_{e_n}e_n)f + H\,e_nf. \end{aligned} \]
For $1 \le i, j \le n - 1$,
\[ (\operatorname{Hess} f - \operatorname{Hess}_\Sigma f)_{ij} = \big(e_ie_jf - (\bar D_{e_i}e_j)f\big) - \big(e_ie_jf - (D_{e_i}e_j)f\big) = -h_{ij}\,e_nf, \tag{94} \]
and
\[ \begin{aligned} -\operatorname{Ric}(g)(e_i, e_j) + \operatorname{Ric}(g_\Sigma)(e_i, e_j) &= -\operatorname{Ric}(g)(e_i, e_j) + \sum_{k=1}^{n-1}\langle R(e_i, e_k)e_j, e_k\rangle \\ &= -\operatorname{Ric}(g)(e_i, e_j) + \sum_{k=1}^{n-1}\langle\bar R(e_i, e_k)e_j, e_k\rangle \\ &\quad + \sum_{k=1}^{n-1}\big[\langle\mathrm{II}(e_k, e_k), \mathrm{II}(e_i, e_j)\rangle - \langle\mathrm{II}(e_i, e_k), \mathrm{II}(e_j, e_k)\rangle\big] \quad\text{(by the Gauss equation)} \\ &= -\langle\bar R(e_i, e_n)e_j, e_n\rangle + \sum_{k=1}^{n-1}h_{kk}h_{ij} - \sum_{k=1}^{n-1}h_{ik}h_{jk}. \end{aligned} \]


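Thus the following holds for $1 \le i, j \le n - 1$:
\[ \big(L^*_g f|_{T\Sigma} - L_\Sigma f\big)_{ij} = \big[-e_ne_nf + (\bar D_{e_n}e_n)f + H(e_nf)\big]\delta_{ij} - h_{ij}\,e_nf + f\Big[-\bar R_{injn} + \sum_{k=1}^{n-1}\big(h_{kk}h_{ij} - h_{ik}h_{jk}\big)\Big]. \tag{95} \]
We will also want to consider Hessian terms of the form $f_{;in}$, $f_{;nn}$ (components involving the normal to $\Sigma$) to estimate the full Hessian $\operatorname{Hess} f$, not just $\operatorname{Hess}_\Sigma f$. To this end, note
\[ (L^*_g f)_{ii} = -\sum_{j=1}^n f_{;jj} + f_{;ii} - f\operatorname{Ric}(g)(e_i, e_i). \tag{96} \]
This implies
\[ f_{;nn} = -(L^*_g f)_{ii} - \sum_{\substack{j=1 \\ j \ne i}}^{n-1} f_{;jj} - f\operatorname{Ric}(g)(e_i, e_i), \qquad (i < n), \tag{97} \]
and similarly,
\[ f_{;in} = (L^*_g f)_{in} + f\operatorname{Ric}(g)(e_i, e_n), \qquad (i < n). \tag{98} \]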

Recall that $\Sigma_r = \{x \in \Omega : d(x, \Sigma) = r\}$ is smooth and that the second fundamental form is bounded near that of $\partial\Omega$ for $0 < r < \epsilon_0$ (see Sect. 2.3).

Lemma A.1. Let $f \in H^2_{loc}(\Omega)$. Then for a.e. $r \in (0, \epsilon_0]$,
\[ \int_{\Sigma_r}|\operatorname{Hess} f|^2\,d\sigma_g \le C\int_{\Sigma_r}\big(|L^*_g f|^2 + f^2 + |\nabla f|^2\big)\,d\sigma_g, \]
where $\operatorname{Hess} f$, $\nabla f$, and $L^*_g f$ refer to the full tensors in $(\Omega, g)$, and where $C$ is a constant uniform for $r \in (0, \epsilon_0]$ and uniform for metrics $C^2$-near $g$.

Proof. Since $L_\Sigma$ has injective symbol and $f \in H^2(\Sigma_r)$ for a.e. $r \in (0, \epsilon_0]$ by the co-area formula, the standard elliptic estimate yields a constant $C > 0$ so that
\[ \|f|_{\Sigma_r}\|_{H^2(\Sigma_r)} \le C\big(\|L_\Sigma f\|_{L^2(\Sigma_r)} + \|f\|_{L^2(\Sigma_r)}\big). \]
The constant $C$ can be taken independent of $r \in (0, \epsilon_0]$ and of metrics sufficiently near $g$, since then the coefficients of $L_\Sigma$ will be uniformly controlled within $N_{\epsilon_0}(\Omega)$, as will the volume forms and connection coefficients used in the norms in the above estimate. Hence by the above computation (95) of $(L^*_g f|_{T(\Sigma_r)} - L_\Sigma f)$, we have (since the second fundamental form is controlled)
\[ \|L_\Sigma f\|_{L^2(\Sigma_r)} \le \|L^*_g f\|_{L^2(\Sigma_r)} + C\|f\|_{L^2(\Sigma_r)} + C\|\nabla f\|_{L^2(\Sigma_r)}; \]
here we use (96) and (97) to see that $e_ne_nf$ (from (95)) can be replaced by terms in $f$ and $\nabla f$ plus
\[ -\frac{1}{n - 1}\sum_{i=1}^{n-1}(L^*_g f)_{ii} + \frac{n - 2}{n - 1}\,(L^*_g f)_{nn}. \]
Hence
\[ \|f|_{\Sigma_r}\|_{H^2(\Sigma_r)} \le C\big(\|L^*_g f\|_{L^2(\Sigma_r)} + \|f\|_{L^2(\Sigma_r)} + \|\nabla f\|_{L^2(\Sigma_r)}\big). \]

Combining this with the above computation (94), (97), (98) of $\operatorname{Hess} f$ yields the lemma.

Now we can integrate this up to any "annular" subset $A$ of the form $A = A(\epsilon_1, \epsilon_2) = \{x \in \Omega : 0 \le \epsilon_1 \le d(x, \partial\Omega) \le \epsilon_2 \le \epsilon_0\}$ using the co-area formula:
\[ \|f\|_{H^2(A)} \le C\big(\|L^*_g f\|_{L^2(A)} + \|f\|_{L^2(A)} + \|\nabla f\|_{L^2(A)}\big). \]
We now use the interpolation inequality [G-T]
\[ C\|f\|_{H^1(A)} \le \frac12\|f\|_{H^2(A)} + C(n, g, A)\|f\|_{L^2(A)}. \tag{99} \]
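The way the interpolation inequality (99) removes the gradient term from the co-area estimate is a standard absorption argument. A brief sketch follows, assuming $\|f\|_{H^2(A)} < \infty$ so that the subtraction is legitimate; the constant $C$ is the one from the co-area estimate, which is why (99) is stated with that same $C$ on its left-hand side.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Absorption step; assumes \|f\|_{H^2(A)} is finite.
\begin{align*}
\|f\|_{H^2(A)}
 &\le C\big(\|L_g^* f\|_{L^2(A)} + \|f\|_{L^2(A)}\big) + C\,\|\nabla f\|_{L^2(A)}\\
 &\le C\big(\|L_g^* f\|_{L^2(A)} + \|f\|_{L^2(A)}\big) + C\,\|f\|_{H^1(A)}\\
 &\le C\big(\|L_g^* f\|_{L^2(A)} + \|f\|_{L^2(A)}\big)
      + \tfrac12\,\|f\|_{H^2(A)} + C(n,g,A)\,\|f\|_{L^2(A)}
      && \text{by (99)}.
\end{align*}
% Subtract (1/2)\|f\|_{H^2(A)} from both sides and multiply by 2:
\[
\|f\|_{H^2(A)} \le 2C\,\|L_g^* f\|_{L^2(A)} + 2\big(C + C(n,g,A)\big)\,\|f\|_{L^2(A)} .
\]
\end{document}
```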

This yields
\[ \|f\|_{H^2(A)} \le C\big(\|L^*_g f\|_{L^2(A)} + \|f\|_{L^2(A)}\big). \]
We use this estimate for $A = A(0, \epsilon_0)$, and combine it with the interior estimate
\[ \|f\|_{H^2(\Omega_{\epsilon_0/2})} \le C\big(\|L^*_g f\|_{L^2(\Omega)} + \|f\|_{L^2(\Omega)}\big), \tag{100} \]
where we let $\Omega_\epsilon \equiv \{x \in \Omega : d(x, \partial\Omega) > \epsilon\}$. This finishes the proof of Proposition 3.1. The main idea here is that the estimate we would like to prove is valid on a closed submanifold, and the overdeterminedness allows us to use this to get the estimate on a neighborhood of the submanifold, since we can control normal derivatives appropriately.

Acknowledgements. I would first like to thank my advisor Rick Schoen for all of the wisdom and patience he showed me during the preparation of my dissertation. I would like to acknowledge the ARCS Foundation which partly supported the research. I thank Hubert Bray for several discussions on mathematical relativity, and in particular on the quasi-local mass and possible connections with scalar curvature deformation. Finally, many thanks go to Adrian Butscher for making many helpful comments on the original manuscript.

References

[A-D-M] Arnowitt, R., Deser, S., Misner, C.: Coordinate invariance and energy expressions in general relativity. Phys. Rev. 122, 997–1006 (1961)
[A] Aronszajn, N.: A unique continuation theorem for solutions of elliptic partial differential equations or inequalities of second order. J. Math. Pures Appl. 36, 235–249 (1957)
[Au] Aubin, T.: Some Nonlinear Problems in Riemannian Geometry. New York: Springer-Verlag, 1998
[A-B-R] Axler, S., Bourdon, P., Ramey, W.: Harmonic Function Theory. New York: Springer-Verlag, 1992
[Ba1] Bartnik, R.: The Mass of an Asymptotically Flat Manifold. Comm. Pure and Appl. Math. 39, 661–693 (1986)
[Ba2] Bartnik, R.: New Definition of Quasilocal Mass. Phys. Rev. Lett. 62, 2346–2348 (1989)
[Ba3] Bartnik, R.: Energy in General Relativity. In: Yau, S.-T. (ed.) Tsing Hua Lectures on Geometry and Analysis. Cambridge, MA: International Press, 1997, pp. 5–27
[Be] Besse, A.L.: Einstein Manifolds. Berlin: Springer-Verlag, 1987
[Br1] Bray, H.L.: The Penrose Inequality in General Relativity and Volume Comparison Theorems Involving Scalar Curvature. Thesis, Stanford University, 1997
[Br2] Bray, H.L.: The Proof of the Riemannian Penrose Inequality Using the Positive Mass Theorem. Thesis, Stanford University, 2000
[Bu] Butscher, A.: Deformation Theory of Minimal Lagrangian Submanifolds. In preparation
[E-G] Evans, L.C., Gariepy, R.F.: Measure Theory and Fine Properties of Functions. Boca Raton: CRC Press, Inc., 1992
[F-M] Fischer, A.E., Marsden, J.E.: Deformations of the Scalar Curvature. Duke Math. J. 42, 519–547 (1975)
[G-T] Gilbarg, D., Trudinger, N.S.: Elliptic Partial Differential Equations of the Second Order. New York: Springer-Verlag, 1983
[Gr] Gray, A.: Tubes. Redwood City, CA: Addison-Wesley, 1990
[H-E] Hawking, S.W., Ellis, G.F.R.: The Large-Scale Structure of Spacetime. Cambridge: Cambridge University Press, 1973
[H-I] Huisken, G., Ilmanen, T.: The Inverse Mean Curvature Flow and the Riemannian Penrose Inequality. Preprint
[I] Isenberg, J.: Constant Mean Curvature Solutions of the Einstein Constraint Equations on Closed Manifolds. Class. Quant. Grav. 12, 2249–2274 (1995)
[K] Kapouleas, N.: Complete Constant Mean Curvature Surfaces in Euclidean Three-space. Ann. of Math. 131, 239–330 (1990)
[L-P] Lee, J.M., Parker, T.H.: The Yamabe Problem. Bull. AMS 17, 37–91 (1987)
[L] Lohkamp, J.: Scalar Curvature and Hammocks. Math. Ann. 313, no. 3, 384–407 (1999)
[Ma] Maz'ja, V.G.: Sobolev Spaces. Berlin: Springer-Verlag, 1985
[Mo] Morrey, C.B., Jr.: Multiple Integrals in the Calculus of Variations. Berlin: Springer-Verlag, 1966
[O] O'Neill, B.: Semi-Riemannian Geometry. San Diego: Academic Press, Inc., 1983
[P] Pollack, D.: The Extent of Nonuniqueness for the Yamabe Problem. Thesis, Stanford University, 1991
[R-R] Renardy, M., Rogers, R.C.: An Introduction to Partial Differential Equations. New York: Springer-Verlag, 1993
[S1] Schoen, R.M.: Variational Theory for the Total Scalar Curvature Functional for Riemannian Metrics and Related Topics. In: Giaquinta, M. (ed.) Topics in the Calculus of Variations. Lecture Notes in Math., 1365, Berlin: Springer-Verlag, 1987, pp. 120–154
[S2] Schoen, R.M.: The Existence of Weak Solutions with Prescribed Singular Behavior for a Conformally Invariant Scalar Equation. Comm. Pure and Appl. Math. 41, 317–392 (1988)
[S-Y1] Schoen, R.M., Yau, S.-T.: Lectures on Differential Geometry. Cambridge, MA: International Press, 1994
[S-Y2] Schoen, R.M., Yau, S.-T.: On the Proof of the Positive Mass Conjecture in General Relativity. Commun. Math. Phys. 65, 45–76 (1979)
[S-Y3] Schoen, R.M., Yau, S.-T.: The Energy and Linear Momentum of Spacetimes in General Relativity. Commun. Math. Phys. 79, 47–51 (1981)
[S-Y4] Schoen, R.M., Yau, S.-T.: Proof of the Positive Mass Theorem. II. Commun. Math. Phys. 79, 231–260 (1981)
[Si] Simon, L.: Theorems on Regularity and Singularity of Energy Minimizing Maps. Basel: Birkhäuser Verlag, 1996
[T] Taylor, M.E.: Partial Differential Equations III. New York: Springer-Verlag, 1996
[W] Wald, R.M.: General Relativity. Chicago: U. Chicago Press, 1984
[Wi] Witten, E.: A new proof of the positive energy theorem. Commun. Math. Phys. 80, 381–402 (1981)

Communicated by H. Nicolai

Commun. Math. Phys. 214, 191 – 200 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Limiting Case of the Sobolev Inequality in BMO, with Application to the Euler Equations

Hideo Kozono, Yasushi Taniuchi

Mathematical Institute, Tohoku University, Sendai 980-8578, Japan

Received: 2 March 1999 / Accepted: 29 March 2000

Dedicated to Professor Masayasu Mimura on the occasion of his 60th birthday

Abstract: We shall prove a logarithmic Sobolev inequality by means of the BMO-norm in the critical exponents. As an application, we shall establish a blow-up criterion of solutions to the Euler equations.

1. Introduction

The purpose of this paper is to establish an $L^\infty$-estimate of functions in terms of the BMO norm and the logarithm of a norm of higher derivatives. It is well known that in $\mathbb{R}^n$ the Sobolev space $W^{s,p}$ with $sp > n$ is continuously embedded into $L^\infty$. This is not true in the space $W^{k,r}$ for $kr = n$. Brezis–Gallouet [3] and Brezis–Wainger [4] investigated the relation between $L^\infty$, $W^{k,r}$ and $W^{s,p}$ and proved that there holds the embedding
\[ \|f\|_{L^\infty} \le C\Big(1 + \log^{\frac{r-1}{r}}\big(1 + \|f\|_{W^{s,p}}\big)\Big), \qquad sp > n, \tag{1.1} \]
provided $\|f\|_{W^{k,r}} \le 1$ for $kr = n$. Then Ozawa [10, 11] gave deep and systematic treatments and clarified the relation between (1.1), the Gagliardo–Nirenberg inequality and the Trudinger–Moser one. The estimate (1.1) was applied to prove existence of global solutions to the nonlinear Schrödinger equation ([3, 5]). A similar embedding for vector functions $u$ with $\operatorname{div} u = 0$ was investigated by Beale–Kato–Majda [1],
\[ \|\nabla u\|_{L^\infty} \le C\Big(1 + \|\operatorname{rot} u\|_{L^\infty}\big(1 + \log^+\|u\|_{W^{s+1,p}}\big) + \|\operatorname{rot} u\|_{L^2}\Big), \qquad sp > n, \tag{1.2} \]
where $\log^+ a = \log a$ if $a \ge 1$, $= 0$ if $0 < a < 1$. In [1], they made use of (1.2) to give a blow-up criterion of solutions to the Euler equations. The difference between these two embeddings stems from the bound of $f$ in $W^{k,r}$ for $kr = n$ and that of $\operatorname{rot} u$ in $L^\infty$. However, both of these bounds control $f$ and $\nabla u$ in the common space BMO. In this paper, we will show a corresponding embedding estimate in $L^\infty$ by means of the BMO-norm which covers (1.2). As an application of our


estimate, we will extend the blow-up criterion of solutions to the Euler equations which was originally given by Beale–Kato–Majda [1]. It is proved in [1] that the $L^\infty$ norm of vorticity controls breakdown of smooth solutions for the 3-D Euler equations. We will generalize such a criterion to the BMO-norm. The advantage of using the BMO-space lies in the fact that Riesz transforms are bounded in BMO, but not in $L^\infty$. This fact enables us to prove the same criterion not only by the vorticity but also by the deformation tensor (see Ponce [12]). Our first result now reads:

Theorem 1. Let $1 < p < \infty$ and let $s > n/p$. There is a constant $C = C(n, p, s)$ such that the estimate
\[ \|f\|_{L^\infty} \le C\Big(1 + \|f\|_{BMO}\big(1 + \log^+\|f\|_{W^{s,p}}\big)\Big) \tag{1.3} \]
holds for all $f \in W^{s,p}$.

Remark. Compared with (1.2), we do not need to add $\|f\|_{L^2}$ to the right-hand side of (1.3). This makes it easier to derive an a priori estimate of solutions to the Euler equations than in Beale–Kato–Majda [1].

We next consider the Euler equations for the incompressible fluid motion in $\mathbb{R}^n$ for $n \ge 2$:
\[ \begin{cases} \dfrac{\partial u}{\partial t} + u\cdot\nabla u + \nabla p = 0, \quad \operatorname{div} u = 0 & \text{in } x \in \mathbb{R}^n,\ t > 0, \\ u|_{t=0} = a. \end{cases} \tag{E} \]
It is proved by Kato–Lai [7] and Kato–Ponce [8] that for every $a \in W^{s,p}$ for $s > n/p + 1$ with $\operatorname{div} a = 0$, there are $T > 0$ and a unique solution $u$ of (E) on the interval $[0, T)$ in the class
\[ u \in C([0, T); W^{s,p}) \cap C^1([0, T); W^{s-2,p}). \tag{1.4} \]

The time interval $T$ of existence of the solution $u$ depends only on $\|a\|_{W^{s,p}}$. It is an interesting question whether the solution $u(t)$ really blows up as $t \uparrow T$. Our result on (E) reads as follows.

Theorem 2. Let $1 < p < \infty$, $s > n/p + 1$. Suppose that $u$ is the solution of (E) in the class (1.4). If either
\[ \int_0^T\|\operatorname{rot} u(t)\|_{BMO}\,dt\ (\equiv M_0) < \infty \tag{1.5} \]
or
\[ \int_0^T\|\operatorname{Def} u(t)\|_{BMO}\,dt\ (\equiv M_1) < \infty \tag{1.6} \]
holds, then $u$ can be continued to the solution in the class (1.4) on the interval $[0, T')$ for some $T' > T$, where $\operatorname{rot} u$ and $\operatorname{Def} u$ denote the vorticity and the deformation tensor of $u$, respectively.


An immediate consequence of the above theorem is

Corollary 1. Let $u$ be the solution of (E) in the class (1.4) on the interval $[0, T)$ for $1 < p < \infty$, $s > n/p + 1$. Assume that $T$ is maximal, i.e., $u$ cannot be continued to the solution in the class (1.4) on $[0, T')$ for any $T' > T$. Then both
\[ \int_0^T\|\operatorname{rot} u(t)\|_{BMO}\,dt = \infty \quad\text{and}\quad \int_0^T\|\operatorname{Def} u(t)\|_{BMO}\,dt = \infty \]
hold. In particular, we have
\[ \limsup_{t\uparrow T}\|\operatorname{rot} u(t)\|_{BMO} = \infty \quad\text{and}\quad \limsup_{t\uparrow T}\|\operatorname{Def} u(t)\|_{BMO} = \infty. \]

Remarks. 1. Beale–Kato–Majda [1], Ponce [12] and Kato–Ponce [8] obtained the same continuation principle of solutions as in Theorem 2 under the stronger assumption in $L^\infty$.

2. Theorem 2 also holds for the Navier–Stokes equations. However, for the Navier–Stokes equations, on account of the viscosity term, a sharper estimate of solutions holds than (2.17) below. Moreover, we can formulate the continuation principle for $u$ itself in $L^2(0, T; BMO)$. For details, see [9].
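The "in particular" part of Corollary 1 follows from the divergence of the two integrals by a short contrapositive argument. The following is a sketch for the vorticity only; the bound $K$ and the intermediate time $t_0$ are auxiliary symbols introduced here and are not notation of the paper. It assumes, as in the class (1.4), that $t \mapsto \|\operatorname{rot} u(t)\|_{BMO}$ is bounded on every compact subinterval of $[0, T)$.

```latex
% Suppose, to the contrary, that limsup_{t -> T} ||rot u(t)||_{BMO} is finite.
% Then there are K < infinity and t_0 < T with ||rot u(t)||_{BMO} <= K on [t_0, T).
\[
\int_0^T \|\operatorname{rot} u(t)\|_{BMO}\,dt
  = \int_0^{t_0} \|\operatorname{rot} u(t)\|_{BMO}\,dt
  + \int_{t_0}^{T} \|\operatorname{rot} u(t)\|_{BMO}\,dt
  \le \int_0^{t_0} \|\operatorname{rot} u(t)\|_{BMO}\,dt + K\,(T - t_0) < \infty .
\]
% This contradicts the divergence of the integral stated in Corollary 1,
% so the limsup must be infinite.
```

The same argument applies verbatim with $\operatorname{Def} u$ in place of $\operatorname{rot} u$.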

2. Proof of the Theorems

2.1. Proof of Theorem 1. We shall make use of the Littlewood–Paley decomposition; there exists a non-negative function $\varphi \in \mathcal{S}$ ($\mathcal{S}$; the Schwartz class) such that $\operatorname{supp} \varphi \subset \{2^{-1} \le |\xi| \le 2\}$ and such that $\sum_{k=-\infty}^{\infty} \varphi(2^{-k}\xi) = 1$ for $\xi \ne 0$. See Bergh–Löfström [2, Lemma 6.1.7]. Let us define $\phi_0$ and $\phi_1$ as

\[ \phi_0(\xi) = \sum_{k=1}^{\infty} \varphi(2^{k}\xi) \quad\text{and}\quad \phi_1(\xi) = \sum_{k=-\infty}^{-1} \varphi(2^{k}\xi), \]

respectively. Then we have that $\phi_0(\xi) = 1$ for $|\xi| \le 1/2$, $\phi_0(\xi) = 0$ for $|\xi| \ge 1$ and that $\phi_1(\xi) = 0$ for $|\xi| \le 1$, $\phi_1(\xi) = 1$ for $|\xi| \ge 2$. It is easy to see that for every positive integer $N$ there holds the identity

\[ \phi_0(2^{N}\xi) + \sum_{k=-N}^{N} \varphi(2^{-k}\xi) + \phi_1(2^{-N}\xi) = 1, \qquad \xi \ne 0. \tag{2.1} \]
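The identity (2.1) can be checked by re-indexing the two tails so that every term takes the form $\varphi(2^{-m}\xi)$. The sketch below uses only the definitions of $\phi_0$ and $\phi_1$ given above; the summation index $m$ is introduced here for the computation.

```latex
% Re-index both tails in terms of m.
\begin{align*}
\phi_0(2^{N}\xi) &= \sum_{k=1}^{\infty} \varphi(2^{k+N}\xi)
                  = \sum_{m=-\infty}^{-N-1} \varphi(2^{-m}\xi) && (m = -(k+N)), \\
\phi_1(2^{-N}\xi) &= \sum_{k=-\infty}^{-1} \varphi(2^{k-N}\xi)
                  = \sum_{m=N+1}^{\infty} \varphi(2^{-m}\xi) && (m = N-k).
\end{align*}
% Adding the middle block -N <= m <= N recovers the full partition of unity.
\[
\phi_0(2^{N}\xi) + \sum_{m=-N}^{N} \varphi(2^{-m}\xi) + \phi_1(2^{-N}\xi)
  = \sum_{m=-\infty}^{\infty} \varphi(2^{-m}\xi) = 1, \qquad \xi \neq 0 .
\]
```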

Since $C_0^\infty$ is dense in $W^{s,p}$ and since $W^{s,p}$ is continuously embedded in BMO, it suffices to prove (1.3) for $f \in C_0^\infty$. For such $f$ we have the representation

\[ f(x) = \int_{y \in \mathbb{R}^n} K(x - y) \cdot \nabla f(y)\,dy \quad\text{with}\quad K(y) = \frac{1}{n\omega_n}\,\frac{y}{|y|^n} \]

for all $x \in \mathbb{R}^n$, where $\omega_n$ denotes the volume of the unit ball in $\mathbb{R}^n$. By (2.1) we decompose $f$ into three parts:

\[ f(x) = \int_{y \in \mathbb{R}^n} K(x - y)\Bigl\{ \phi_0(2^{N}(x - y)) + \sum_{k=-N}^{N} \varphi(2^{-k}(x - y)) + \phi_1(2^{-N}(x - y)) \Bigr\} \cdot \nabla f(y)\,dy \equiv f_0(x) + g(x) + f_1(x) \tag{2.2} \]

for all $x \in \mathbb{R}^n$.

Step 1. Estimate of $f_0$. Let us first consider the case $s \ge 1$. Since $s > n/p$, we can take $q$ and $q'$ so that $1/p - (s - 1)/n \le 1/q < 1/n$, $1/q' = 1 - 1/q$. Then there holds the Sobolev embedding $W^{s,p} \subset W^{1,q}$, so we have by the Hölder inequality that

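\[
\begin{aligned}
|f_0(x)| &= \Bigl| \int_{y \in \mathbb{R}^n} K(x - y)\,\phi_0(2^{N}(x - y)) \cdot \nabla f(y)\,dy \Bigr| \\
&\le \Bigl( \int_{y \in \mathbb{R}^n} |K(x - y)\,\phi_0(2^{N}(x - y))|^{q'}\,dy \Bigr)^{1/q'} \Bigl( \int_{y \in \mathbb{R}^n} |\nabla f(y)|^{q}\,dy \Bigr)^{1/q} \\
&\le C \Bigl( \int_{y \in \mathbb{R}^n} \frac{\phi_0(2^{N}(x - y))^{q'}}{|x - y|^{(n-1)q'}}\,dy \Bigr)^{1/q'} \|\nabla f\|_{L^q} \\
&\le C \Bigl( \int_{|x - y| \le 2^{-N}} \frac{1}{|x - y|^{(n-1)q'}}\,dy \Bigr)^{1/q'} \|f\|_{W^{s,p}} \\
&\le C \Bigl( \int_0^{2^{-N}} r^{-(n-1)q' + n - 1}\,dr \Bigr)^{1/q'} \|f\|_{W^{s,p}} \\
&= C\,2^{-N(1 - n/q)}\,\|f\|_{W^{s,p}}
\end{aligned}
\tag{2.3}
\]

for all $x \in \mathbb{R}^n$. We next consider the case $n/p < s < 1$. Let $H(y) \equiv K(y)\phi_0(y)$. For $\lambda > 0$, we define $H_\lambda(y) = H(\lambda y)$. Then we have

\[
\begin{aligned}
f_0(x) &= \int_{y \in \mathbb{R}^n} K(y)\,\phi_0(2^{N} y) \cdot \nabla f(x - y)\,dy \\
&= 2^{N(n-1)} \int_{y \in \mathbb{R}^n} K(2^{N} y)\,\phi_0(2^{N} y) \cdot \sigma\tau_{-x} \nabla f(y)\,dy \\
&= -2^{N(n-1)} \bigl( H_{2^N}, \nabla \sigma \tau_{-x} f \bigr)_{L^2}
\end{aligned}
\]

for all $x \in \mathbb{R}^n$, where $(\tau_h f)(y) = f(y - h)$, $(\sigma f)(y) = f(-y)$ and $(\cdot, \cdot)_{L^2}$ denotes the inner product in $L^2(\mathbb{R}^n)$. By integration by parts, from the above identity we obtain the identity

\[ f_0(x) = 2^{N(n-1)} \bigl( \operatorname{div} (-\Delta)^{-s/2} H_{2^N},\ \sigma\tau_{-x} (-\Delta)^{s/2} f \bigr)_{L^2}, \qquad x \in \mathbb{R}^n. \tag{2.4} \]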


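On the other hand, there holds

\[ \nabla (-\Delta)^{-s/2} H \in L^{p'} \quad\text{for } p' = p/(p - 1). \tag{2.5} \]

Indeed, since

\[ (-\Delta)^{-s/2} H(x) = C \int_{|y| \le 1} \frac{1}{|x - y|^{n-s}}\,K(y)\phi_0(y)\,dy, \qquad x \in \mathbb{R}^n, \tag{2.6} \]

we have for $|x| \ge 2$,

\[
\begin{aligned}
|\nabla (-\Delta)^{-s/2} H(x)| &= C \Bigl| \int_{|y| \le 1} \frac{x - y}{|x - y|^{n-s+2}}\,K(y)\phi_0(y)\,dy \Bigr| \\
&\le C \int_{|y| \le 1} \frac{1}{|x - y|^{n-s+1}}\,\frac{1}{|y|^{n-1}}\,dy \\
&\le \frac{C}{|x|^{n-s+1}}
\end{aligned}
\tag{2.7}
\]

with $C = C(n, s)$. For $|x| < 1$, we make use of another representation of (2.6) such as

\[ (-\Delta)^{-s/2} H(x) = \frac{1}{|x|^{n-1-s}} \cdot h(x) \quad\text{with}\quad h(x) = C \int_{y \in \mathbb{R}^n} \frac{1}{\bigl| \frac{x}{|x|} - y \bigr|^{n-s}}\,\frac{y}{|y|^n}\,\phi_0(|x| y)\,dy. \tag{2.8} \]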

For each $0 < |x| < 1$, we denote by $\Pi_x$ the 2-dimensional plane spanned by $x$ and $e_1 \equiv (1, 0, \cdots, 0)$. Let $e_x$ be a unit vector in $\Pi_x$ with $e_1 \cdot e_x = 0$ so that the pair $\{e_1, e_x\}$ is an orthonormal basis of $\Pi_x$. Furthermore, taking another $n - 2$ unit vectors $e_x^{(3)}, \cdots, e_x^{(n)}$, we may assume that $\{e_1, e_x, e_x^{(3)}, \cdots, e_x^{(n)}\}$ is an orthonormal basis in $\mathbb{R}^n$. Let us define an orthogonal linear transformation $S_x$ in such a way that

\[ S_x e_1 = \cos\theta_x \cdot e_1 - \sin\theta_x \cdot e_x, \quad S_x e_x = \sin\theta_x \cdot e_1 + \cos\theta_x \cdot e_x, \quad S_x e_x^{(j)} = e_x^{(j)}, \quad j = 3, \cdots, n, \]

where $\theta_x$ is the angle between $x$ and $e_1$. Since $\phi_0$ is a radially symmetric function, we have by changing the variable $y \to y' = S_x y$ that

\[ h(x) = C \int_{y \in \mathbb{R}^n} \frac{1}{|e_1 - y|^{n-s}}\,\frac{S_x^{t} y}{|y|^n}\,\phi_0(|x| y)\,dy, \]

and hence there holds

\[ |h(x)| \le C \int_{y \in \mathbb{R}^n} \frac{1}{|e_1 - y|^{n-s}}\,\frac{1}{|y|^{n-1}}\,dy \le C \tag{2.9} \]

for all $0 < |x| < 1$ with $C = C(n, s)$. Since $\cos\theta_x = x_1/|x|$, $\sin\theta_x = \pm\bigl(\sum_{j=2}^{n} x_j^2\bigr)^{1/2}/|x|$, there holds

\[ \Bigl| \frac{\partial}{\partial x_j} \Bigl( \frac{S_x^{t} y}{|y|^n} \Bigr) \Bigr| \le \frac{C}{|x|}\,\frac{1}{|y|^{n-1}}, \quad j = 1, \cdots, n, \quad\text{for all } 0 < |x| < 1,\ y \in \mathbb{R}^n \]

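with $C = C(n)$ independent of $x$, $y$, which yields

\[
\begin{aligned}
|\nabla h(x)| &\le C \int_{y \in \mathbb{R}^n} \frac{1}{|e_1 - y|^{n-s}} \Bigl( \Bigl| \nabla_x \frac{S_x^{t} y}{|y|^n} \Bigr| + \frac{1}{|y|^{n-2}}\,|\nabla\phi_0(|x| y)| \Bigr)\,dy \\
&\le \frac{C}{|x|} \int_{y \in \mathbb{R}^n} \frac{1}{|e_1 - y|^{n-s}}\,\frac{1}{|y|^{n-1}}\,dy + C \int_{\frac{1}{2|x|} \le |y| \le \frac{1}{|x|}} \frac{1}{|e_1 - y|^{n-s}}\,\frac{1}{|y|^{n-2}}\,dy \\
&\le \frac{C}{|x|}
\end{aligned}
\tag{2.10}
\]

for all $0 < |x| < 1$ with $C = C(n, s)$. Notice that $\operatorname{supp} \nabla\phi_0 \subset \{1/2 < |\xi| < 1\}$. Then it follows from (2.8), (2.9) and (2.10) that

\[ |\nabla (-\Delta)^{-s/2} H(x)| \le \frac{C}{|x|^{n-s}} \quad\text{for all } |x| < 1. \tag{2.11} \]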

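Since $n/p < s < 1$, from (2.7) and (2.11) we obtain (2.5). Since

\[ \|(-\Delta)^{\frac{1-s}{2}} H_{2^N}\|_{L^{p'}} = 2^{N(1 - s - n/p')}\,\|(-\Delta)^{\frac{1-s}{2}} H\|_{L^{p'}} = 2^{N(1 + n/p - n - s)}\,\|(-\Delta)^{\frac{1-s}{2}} H\|_{L^{p'}}, \]

it follows from (2.4) and the Hölder inequality that

\[
\begin{aligned}
|f_0(x)| &\le 2^{N(n-1)}\,\|\operatorname{div} (-\Delta)^{-s/2} H_{2^N}\|_{L^{p'}}\,\|\sigma\tau_{-x} (-\Delta)^{s/2} f\|_{L^p} \\
&\le C\,2^{N(n-1)}\,\|(-\Delta)^{\frac{1-s}{2}} H_{2^N}\|_{L^{p'}}\,\|(-\Delta)^{s/2} f\|_{L^p} \\
&\le C\,2^{-N(s - n/p)}\,\|f\|_{W^{s,p}}
\end{aligned}
\tag{2.12}
\]

for all $x \in \mathbb{R}^n$. Now, by (2.3) and (2.12) we have in both cases

\[ \|f_0\|_{L^\infty} \le C\,2^{-N\beta}\,\|f\|_{W^{s,p}} \quad\text{with } \beta = \operatorname{Min.}\{1 - n/q,\ s - n/p\}, \tag{2.13} \]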

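where $C = C(n, p, s)$ is independent of $N$.

Step 2. Estimate of $g$. For each $x \in \mathbb{R}^n$, we take $b_k(x)$ so that

\[ b_k(x) = \frac{1}{|B(x, 2^{k+1})|} \int_{B(x, 2^{k+1})} f(y)\,dy, \qquad k = 0, \pm 1, \cdots, \pm N, \]

where $B(x, R)$ denotes the ball centered at $x$ with radius $R$ and $|B|$ is the volume of $B$. Since $\operatorname{supp} \varphi(2^{-k}(x - \cdot)) \subset \{y \in \mathbb{R}^n;\ 2^{k-1} \le |y - x| \le 2^{k+1}\}$, we have by integration by parts

\[
\begin{aligned}
|g(x)| &= \Bigl| \sum_{k=-N}^{N} \int_{y \in \mathbb{R}^n} K(x - y)\,\varphi(2^{-k}(x - y)) \cdot \nabla_y \bigl( f(y) - b_k(x) \bigr)\,dy \Bigr| \\
&= \Bigl| \sum_{k=-N}^{N} \int_{y \in \mathbb{R}^n} \operatorname{div}_y \bigl( K(x - y)\,\varphi(2^{-k}(x - y)) \bigr) \bigl( f(y) - b_k(x) \bigr)\,dy \Bigr| \\
&\le C \sum_{k=-N}^{N} \int_{2^{k-1} \le |y - x| \le 2^{k+1}} \Bigl( \frac{1}{|x - y|^n} + \frac{2^{-k}}{|x - y|^{n-1}} \Bigr) |f(y) - b_k(x)|\,dy \\
&\le C \sum_{k=-N}^{N} \frac{1}{2^{(k+1)n}} \int_{2^{k-1} \le |y - x| \le 2^{k+1}} |f(y) - b_k(x)|\,dy \\
&\le C \sum_{k=-N}^{N} \frac{1}{|B(x, 2^{k+1})|} \int_{B(x, 2^{k+1})} |f(y) - b_k(x)|\,dy \\
&\le C N \|f\|_{BMO}
\end{aligned}
\]

for all $x \in \mathbb{R}^n$, which implies that

\[ \|g\|_{L^\infty} \le C N \|f\|_{BMO} \tag{2.14} \]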

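with $C = C(n)$ independent of $N$.

Step 3. Estimate of $f_1$. Integrating by parts, we have by a direct calculation

\[
\begin{aligned}
|f_1(x)| &= \Bigl| \int_{y \in \mathbb{R}^n} \operatorname{div}_y \bigl( K(x - y)\,\phi_1(2^{-N}(x - y)) \bigr) f(y)\,dy \Bigr| \\
&\le \Bigl| \int_{y \in \mathbb{R}^n} \operatorname{div} K(x - y)\,\phi_1(2^{-N}(x - y))\,f(y)\,dy \Bigr| + 2^{-N} \Bigl| \int_{y \in \mathbb{R}^n} K(x - y) \cdot \nabla\phi_1(2^{-N}(x - y))\,f(y)\,dy \Bigr| \\
&\le C \int_{2^N \le |x - y|} |x - y|^{-n} |f(y)|\,dy + C\,2^{-N} \int_{2^N \le |x - y| \le 2^{N+1}} |x - y|^{1-n} |f(y)|\,dy \\
&\le C \Bigl( \int_{2^N \le |x - y|} |x - y|^{-np'}\,dy \Bigr)^{1/p'} \|f\|_{L^p} + C\,2^{-N} \Bigl( \int_{2^N \le |x - y| \le 2^{N+1}} |x - y|^{-(n-1)p'}\,dy \Bigr)^{1/p'} \|f\|_{L^p} \\
&\le C \Bigl\{ \Bigl( \int_{2^N}^{\infty} r^{-np' + n - 1}\,dr \Bigr)^{1/p'} + 2^{-N} \Bigl( \int_{2^N}^{2^{N+1}} r^{-(n-1)p' + n - 1}\,dr \Bigr)^{1/p'} \Bigr\} \|f\|_{L^p} \\
&\le C\,2^{-N \cdot \frac{n}{p}}\,\|f\|_{L^p}
\end{aligned}
\]

for all $x \in \mathbb{R}^n$, which yields

\[ \|f_1\|_{L^\infty} \le C\,2^{-N \cdot \frac{n}{p}}\,\|f\|_{L^p} \tag{2.15} \]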

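with $C = C(n, p)$ independent of $N$. Now it follows from (2.2) and (2.13)–(2.15) that

\[ \|f\|_{L^\infty} \le C \bigl( 2^{-\gamma N} \|f\|_{W^{s,p}} + N \|f\|_{BMO} \bigr) \tag{2.16} \]

with $\gamma = \operatorname{Min.}\{1 - n/q,\ s - n/p,\ n/p\}$, where $C = C(n, s, p)$ is independent of $N$ and $f$. If $\|f\|_{W^{s,p}} \le 1$, then we may take $N = 1$; otherwise, we take $N$ so large that the first term of the right-hand side of (2.16) is dominated by 1, i.e., $N \equiv \Bigl[ \dfrac{\log \|f\|_{W^{s,p}}}{\gamma \log 2} \Bigr] + 1$ ($[\cdot]$; Gauss symbol) and (2.16) becomes

\[ \|f\|_{L^\infty} \le C \Bigl\{ 1 + \|f\|_{BMO} \Bigl( \frac{\log \|f\|_{W^{s,p}}}{\gamma \log 2} + 1 \Bigr) \Bigr\}. \]

In both cases, (1.3) holds. This proves Theorem 1.

Remark. There is a simple alternative proof for (2.14). Indeed, we have

\[ g(x) = \sum_{k=-N}^{N} (\operatorname{div} \Psi)_{2^k} * f(x), \qquad x \in \mathbb{R}^n, \]

where $\Psi(x) = K(x)\varphi(x)$ and $\psi_t(x) = t^{-n}\psi(x/t)$ for $t > 0$. Since $\Psi \in \mathcal{S}$ with the property that

\[ \int_{\mathbb{R}^n} \operatorname{div} \Psi(x)\,dx = 0, \]

it follows from Stein [13, Chap. IV, 4.3.3] that

\[ \|g\|_{L^\infty} \le \sum_{k=-N}^{N} \|(\operatorname{div} \Psi)_{2^k} * f\|_{L^\infty} \le \sum_{k=-N}^{N} \sup_{t > 0} \|(\operatorname{div} \Psi)_t * f\|_{L^\infty} \le C N \|f\|_{BMO}, \]

which yields (2.14).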
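Returning to the choice of $N$ in the proof of Theorem 1, the following sketch spells out why the stated $N$ turns (2.16) into (1.3) in the case $\|f\|_{W^{s,p}} > 1$. The abbreviation $L$ is introduced here for readability and is not notation of the paper.

```latex
% Put L = log ||f||_{W^{s,p}} / (gamma log 2) > 0 and N = [L] + 1.
% Since [L] <= L < [L] + 1, we have L < N <= L + 1.
\[
2^{-\gamma N}\,\|f\|_{W^{s,p}} \le 2^{-\gamma L}\,\|f\|_{W^{s,p}}
  = \exp\bigl(-\log \|f\|_{W^{s,p}}\bigr)\,\|f\|_{W^{s,p}} = 1 ,
\]
\[
N\,\|f\|_{BMO} \le \Bigl( \frac{\log \|f\|_{W^{s,p}}}{\gamma \log 2} + 1 \Bigr) \|f\|_{BMO} .
\]
% Inserting both bounds into (2.16):
\[
\|f\|_{L^\infty} \le C \Bigl\{ 1 + \|f\|_{BMO} \Bigl( \frac{\log^{+} \|f\|_{W^{s,p}}}{\gamma \log 2} + 1 \Bigr) \Bigr\}
  \le C' \bigl( 1 + \|f\|_{BMO}\,(1 + \log^{+} \|f\|_{W^{s,p}}) \bigr),
\qquad C' = C \max\Bigl( 1, \frac{1}{\gamma \log 2} \Bigr).
\]
```

In the remaining case $\|f\|_{W^{s,p}} \le 1$ one has $\log^+ \|f\|_{W^{s,p}} = 0$, and $N = 1$ in (2.16) gives the same form of bound directly.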

2.2. Proof of Theorem 2. It is proved by Kato–Lai [7] and Kato–Ponce [8] that for the given initial data $a \in W^{s,p}$ for $s > 1 + n/p$, the time interval $T$ of the existence of the solution $u$ to (E) in the class (1.4) depends only on $\|a\|_{W^{s,p}}$. Hence by the standard argument of continuation of local solutions, it suffices to establish an a priori estimate for $u$ in $W^{s,p}$ in terms of $a$, $T$, $M_0$ or $a$, $T$, $M_1$ according to (1.5) or (1.6). Indeed, we shall show that the solution $u(t)$ in the class (1.4) is subject to the following estimate:

\[ \sup_{0 < t < T} \|u(t)\|_{W^{s,p}} \le \bigl( \|a\|_{W^{s,p}} + e \bigr)^{\alpha_j} \exp(C T \alpha_j) \quad\text{with } \alpha_j = e^{C M_j}, \quad j = 0, 1, \tag{2.17} \]

where $C = C(n, p, s)$ is a constant independent of $a$ and $T$. We shall first prove (2.17) under (1.5). It follows from the commutator estimate in $L^p$ given by Kato–Ponce [8, Proposition 4.2] that

\[ \|u(t)\|_{W^{s,p}} \le \|a\|_{W^{s,p}} \exp\Bigl( C \int_0^t \|\nabla u(\tau)\|_{L^\infty}\,d\tau \Bigr), \qquad 0 < t < T, \tag{2.18} \]

where $C = C(n, p, s)$. In case $p = 2$, i.e., in the $W^{s,2}$-estimate, this can be done more directly as in Beale–Kato–Majda [1, p. 64, Eq. (14)]. By the Biot–Savart law, we have a representation of $\nabla u$ in terms of $\omega \equiv \operatorname{rot} u$ as

\[ \frac{\partial u}{\partial x_j} = R_j (R \times \omega), \qquad j = 1, \cdots, n, \tag{2.19} \]

where $R = (R_1, \cdots, R_n)$, $R_j = \dfrac{\partial}{\partial x_j} (-\Delta)^{-\frac{1}{2}}$ denote the Riesz transforms. Since $R$ is a bounded operator in BMO, this yields

\[ \|\nabla u\|_{BMO} \le C \|\omega\|_{BMO} \tag{2.20} \]

with $C = C(n)$. Hence it follows from (2.20) and Theorem 1 that

\[ \|\nabla u(t)\|_{L^\infty} \le C \bigl( 1 + \|\omega(t)\|_{BMO}\,(1 + \log^+ \|u(t)\|_{W^{s,p}}) \bigr) \tag{2.21} \]

for all $0 < t < T$ with $C = C(n, p, s)$. Substituting (2.21) into (2.18), we have

\[ \|u(t)\|_{W^{s,p}} + e \le \bigl( \|a\|_{W^{s,p}} + e \bigr) \exp\Bigl( C \int_0^t \bigl\{ 1 + \|\omega(\tau)\|_{BMO} \log( \|u(\tau)\|_{W^{s,p}} + e ) \bigr\}\,d\tau \Bigr) \]

for all $0 < t < T$. Defining $z(t) \equiv \log( \|u(t)\|_{W^{s,p}} + e )$, we obtain from the above estimate

\[ z(t) \le z(0) + C T + C \int_0^t \|\omega(\tau)\|_{BMO}\,z(\tau)\,d\tau, \qquad 0 < t < T. \]

Now (1.5) and the Gronwall inequality yield

\[ z(t) \le (z(0) + C T) \exp\Bigl( C \int_0^t \|\omega(\tau)\|_{BMO}\,d\tau \Bigr) \le (z(0) + C T)\,\alpha_0 \]

for all $0 < t < T$ with $C = C(n, p, s)$, which, after exponentiating both sides, implies (2.17) for $j = 0$. Next, assume (1.6). Instead of (2.19) we make use of another representation

\[ \frac{\partial u_l}{\partial x_j} = R_j \Bigl( \sum_{k=1}^{n} R_k \operatorname{Def} u_{kl} \Bigr), \qquad j, l = 1, \cdots, n, \]

where $\operatorname{Def} u_{kl} = \dfrac{\partial u_k}{\partial x_l} + \dfrac{\partial u_l}{\partial x_k}$. Hence again by the boundedness of Riesz transforms in BMO, there holds

\[ \|\nabla u\|_{BMO} \le C \|\operatorname{Def} u\|_{BMO}. \tag{2.22} \]

Then by (2.22) and Theorem 1 we have similarly to (2.21) that

\[ \|\nabla u(t)\|_{L^\infty} \le C \bigl( 1 + \|\operatorname{Def} u(t)\|_{BMO}\,(1 + \log^+ \|u(t)\|_{W^{s,p}}) \bigr) \]

for all $0 < t < T$ with $C = C(n, p, s)$. It is easy to see that the rest of the argument is parallel to that of the case when (1.5) holds, so we get also (2.17) for $j = 1$. This proves Theorem 2.

Acknowledgement. The authors would like to express their thanks to Professor Takayoshi Ogawa for his valuable suggestions.

References

1. Beale, J.T., Kato, T., Majda, A.: Remarks on the breakdown of smooth solutions for the 3-D Euler equations. Commun. Math. Phys. 94, 61–66 (1984)
2. Bergh, J., Löfström, J.: Interpolation spaces, An introduction. Berlin-New York-Heidelberg: Springer-Verlag, 1976
3. Brezis, H., Gallouet, T.: Nonlinear Schrödinger evolution equations. Nonlinear Anal. TMA 4, 677–681 (1980)
4. Brezis, H., Wainger, S.: A note on limiting cases of Sobolev embeddings and convolution inequalities. Comm. Partial Differential Equations 5, 773–789 (1980)
5. Hayashi, N., von Wahl, W.: On the global strong solutions of coupled Klein-Gordon-Schrödinger equations. J. Math. Soc. Japan 39, 489–497 (1987)
6. John, F., Nirenberg, L.: On functions of bounded mean oscillation. Comm. Pure Appl. Math. 14, 415–426 (1961)
7. Kato, T., Lai, C.Y.: Nonlinear evolution equations and the Euler flow. J. Func. Anal. 56, 15–28 (1984)
8. Kato, T., Ponce, G.: Commutator estimates and the Euler and Navier–Stokes equations. Comm. Pure Appl. Math. 41, 891–907 (1988)
9. Kozono, H., Taniuchi, Y.: Bilinear estimates in BMO and the Navier–Stokes equations. To appear in Math. Z.
10. Ozawa, T.: On critical cases of Sobolev's inequalities. J. Funct. Anal. 127, 259–269 (1995)
11. Ozawa, T.: Characterization of Trudinger's inequality. J. of Inequal. & Appl. 1, 369–374 (1997)
12. Ponce, G.: Remarks on a paper by J. T. Beale, T. Kato and A. Majda. Commun. Math. Phys. 98, 349–353 (1985)
13. Stein, E. M.: Harmonic Analysis. Princeton, NJ: Princeton University Press, 1993

Communicated by H. Araki

Commun. Math. Phys. 214, 201 – 226 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Inviscid Limits of the Complex Ginzburg–Landau Equation

Philippe Bechouche1,2, Ansgar Jüngel2,3

1 Departamento de Matemática Aplicada, Facultad de Ciencias, Universidad de Granada, 18071 Granada, Spain
2 Laboratoire Dieudonné, Université de Nice, BP 71, 06108 Nice Cedex 2, France. E-mail: [email protected]
3 Fachbereich Mathematik, Technische Universität Berlin, Strasse des 17. Juni 136, 10623 Berlin, Germany

Received: 2 April 1999 / Accepted: 29 March 2000

Abstract: In the inviscid limit the generalized complex Ginzburg–Landau equation reduces to the nonlinear Schrödinger equation. This limit is proved rigorously with H 1 data in the whole space for the Cauchy problem and in the torus with periodic boundary conditions. The results are valid for nonlinearities with an arbitrary growth exponent in the defocusing case and with a subcritical or critical growth exponent at the level of L2 in the focusing case, in any spatial dimension. Furthermore, optimal convergence rates are proved. The proofs are based on estimates of the Schrödinger energy functional and on Gagliardo–Nirenberg inequalities.

1. Introduction

The description of spatial pattern formation or chaotic dynamics in continuum systems, in particular fluid dynamical systems, is a challenging task in theoretical physics and applied mathematics. Due to the complexity of the corresponding nonlinear evolution equations, simpler model equations, for which the mathematical issues can be solved with greater success, have been derived. The complex Ginzburg–Landau equation is one of these equations. It models the evolution of the amplitude of perturbations to steady-state solutions at the onset of instability [23, 31]. Originally discovered by Ginzburg and Landau for a phase transition in superconductivity [17], this equation was subsequently derived for instability waves in hydrodynamics such as the nonlinear growth of Rayleigh–Bénard convective rolls [33, 34], the appearance of Taylor vortices in the Couette flow between counter-rotating cylinders [39], and the development of Tollmien–Schlichting waves in plane Poiseuille flows [2]. The equation is also used to model the transition to turbulence in chemical reactions [19, 20, 24] or to describe pattern formation [8, 32].

Current address (A. Jüngel): Universität Konstanz, Fachbereich für Mathematik und Statistik, Fach D 193, 78457 Konstanz, Germany. E-mail: [email protected]

The generalized complex Ginzburg–Landau (CGL) equation for the function $u(x, t)$ is given by

\[ \partial_t u = (a + i\nu)\Delta u + R u - (b + i\mu) f(u) \quad\text{in } \Omega,\ t > 0, \tag{1.1} \]
\[ u(x, 0) = u_0(x) \quad\text{for } x \in \Omega. \tag{1.2} \]

We consider this equation either in the whole space $\Omega = \mathbb{R}^d$ or in the torus $\Omega = \mathbb{T}^d = (\mathbb{R}/2\pi\mathbb{Z})^d$. In the latter case we prescribe periodic boundary conditions. The parameters $a$, $b$, $\nu$, $\mu$, $R$ are real with $R \ge 0$, $a > 0$ and $b > 0$. Without loss of generality, we take $\nu > 0$. The interaction term $f(u)$ is a nonlinear function of $u$, a typical form of which is

\[ f(u) = u |u|^{2\sigma}, \qquad u \in \mathbb{C}, \tag{1.3} \]

for some $\sigma > 0$. Usually, the CGL equation with cubic nonlinearity $\sigma = 1$ has been considered in physics. Equation (1.1) reduces to the nonlinear Schrödinger (NLS) equation for $a = b = 0$,

\[ \partial_t u = i\nu \Delta u + R u - i\mu f(u) \quad\text{in } \Omega,\ t > 0, \tag{1.4} \]

(more commonly with $R = 0$) and to the nonlinear heat equation with dissipative nonlinearity for $\nu = \mu = 0$, so that one may expect for it a mixture of the properties that occur in those two cases. Now, the self-focusing mechanism and its resulting finite-time blow-up for solutions of the NLS equation (with $R = 0$) is well understood both physically and mathematically [26, 30, 36]. More precisely, necessary conditions for the possible formation of finite-time singularities are $\mu < 0$ and $\sigma \ge 2/d$. In this case, the interaction is attractive, and when $\sigma \ge 2/d$ this attraction is strong enough to lead to singularities. When $\mu > 0$, the interaction is repulsive, and there is no tendency for the particles to concentrate. Therefore, it is of particular mathematical interest to ask how "far" the solution of the CGL equation (1.1) is from the solution of the NLS equation (1.4), in order to relate the results for the NLS equation to the behavior of the solution to the CGL equation.

In this paper we answer this question. More precisely, we prove rigorously the inviscid limit $a \to 0$, $b \to 0$ in Eq. (1.1) and give optimal convergence rates for the difference of the solutions to Eqs. (1.1) and (1.4). Optimal convergence rates for the difference of the solutions to Eqs. (1.1) and (1.4) have also been given by Wu in [41]; however, his estimates do not allow the limit $a \to 0$, $b \to 0$, except in the case of linear interactions. The results of this paper are valid for any nonlinearity satisfying some structure conditions (see below).

Bartuccelli et al. [1] obtained asymptotic bounds for the CGL equation (with periodic boundary conditions) which suggest that large intermittent fluctuations can occur in two or three space dimensions near the inviscid limit. In one space dimension, however, only small deviations from mean quantities are possible. Numerical studies of the CGL equation near the inviscid limit have been performed in [18, 38].
Another paper which relates the CGL equation to the NLS equation is [25]. The authors show that so-called traveling holes of the CGL equation are singular perturbations of nonlinear Schrödinger dark solitons. Furthermore, Levermore recently studied the inviscid limit in terms of wavenumber modes [27].

In order to describe the main results of this paper we suppose that

\[ 0 < a < a_0, \quad 0 < b < b_0, \quad \nu > 0, \quad R \ge 0, \quad \mu \in \mathbb{R}, \quad d \ge 1, \tag{1.5} \]

and, for the sake of presentation, we assume that the nonlinearity $f(u)$ is given by (1.3). In Sect. 2 we present results for more general nonlinearities. We show the inviscid limit for nonlinearities with arbitrary growth exponent when $\mu \ge 0$ and with subcritical exponent at the level of $L^2$ when $\mu < 0$. Here, the term "subcritical at the level of $L^2$" indicates that $\sigma < 2/d$. More precisely, we assume that

\[ \text{if } \mu \ge 0:\quad 0 < \sigma < \infty; \tag{1.6} \]
\[ \text{if } \mu < 0:\quad \text{(i) } \sigma < 2/d, \text{ or (ii) } \sigma = 2/d \text{ and } \|u_0\|_{L^2} \text{ is small enough.} \tag{1.7} \]

Let $\varepsilon = (a, b)$, let the initial datum satisfy $u_0 \in V \overset{\mathrm{def}}{=} H^1(\Omega) \cap L^{2\sigma+2}(\Omega)$, $T > 0$, and let $u_\varepsilon$ be the solution to the CGL equation (1.1)–(1.2) in $\Omega \times (0, T)$ (see Sect. 2 for a review on existence results). Furthermore, when $\mu < 0$, let $(b a^{-d\sigma/2})$ be bounded as $\varepsilon \to 0$.

Theorem 1.1. Let the above assumptions hold. Then there exists a subsequence of $(u_\varepsilon)$ (not relabeled) such that, as $\varepsilon \to 0$, $u_\varepsilon$ converges to a function $u$ which solves the NLS equation (1.4), (1.2). The convergence is strong in the space $L^\infty(0, T; L^r_{loc}(\Omega))$, where $r < 2d/(d - 2)$, and weak-∗ in the spaces $L^\infty(0, T; H^1(\Omega))$ and $W^{1,\infty}(0, T; V^*)$. If the NLS equation is uniquely solvable (e.g., when $\sigma \le 2/(d - 2)$; cf. [6]) then the whole sequence $(u_\varepsilon)$ converges.

For the convergence rates, let $u_\varepsilon$ be the solution to the CGL equation (1.1), (1.2) with initial datum $u_{\varepsilon 0}$ and let $u$ be the solution to the NLS equation (1.4), (1.2) with initial datum $u_0$. We assume that $u_{\varepsilon 0}, u_0 \in H^n(\Omega)$ with $n > d/2$ and $\|u_{\varepsilon 0} - u_0\|_{L^2} = O(a) + O(b)$ as $\varepsilon \to 0$.

Theorem 1.2. Let the above assumptions hold. Then there exists a constant $T^* > 0$ independent of $a$, $b$ such that for all $0 \le t < T^*$,

\[ \|(u_\varepsilon - u)(t)\|_{L^2} = O(a) + O(b) \quad\text{as } \varepsilon \to 0. \]

Moreover, we can take $T^* = T$ if $d = 1$.

The growth condition for the nonlinearity in the above results is sharp. Indeed, when $\Omega = \mathbb{R}^d$ and $u_0 \in H^1(\Omega) \cap L^{2\sigma+2}(\Omega)$, there exists a global in time weak solution to the NLS equation if the conditions (1.6)–(1.7) hold. Moreover, if $\sigma \ge 2/d$ then there exist solutions that blow up in finite time (see [6, Rem. 6.5.1, Thm. 9.4.1]). Theorems 1.1 and 1.2 are corollaries of the more general results of Sect. 2 (see Theorems 2.1, 2.2, 2.4, and 2.5).

In the following we explain our method of proof of the inviscid limits. The basic idea is to treat the CGL equation as a perturbation of the NLS equation. Now, the NLS flow (with $R = 0$) conserves the mass $M$ and the energy $E$, which are defined by

\[ M(t) = \int_\Omega |u(t)|^2\,dx, \tag{1.8} \]
\[ E(t) = \frac{\nu}{2} \int_\Omega |\nabla u(t)|^2\,dx + \frac{\mu}{2} \int_\Omega F(|u(t)|^2)\,dx, \tag{1.9} \]

and $F$ is given by

\[ F(s) = \frac{1}{\sigma + 1}\,s^{\sigma+1}, \qquad s \ge 0. \]
204

P. Bechouche, A. Jüngel

One may expect that in the case of the CGL equation, these quantities are also bounded in terms of M(0), E(0), and Rt (if R > 0). This is indeed true. Let u be a weak solution to the CGL equation. A simple computation shows that, when µ ≥ 0 (see Sect. 3), ∂t M ≤ 2RM, ∂t E ≤ 2R(σ + 1)E, for all t > 0, which implies that M(t) ≤ M(0)e2Rt ,

E(t) ≤ E(0)e2R(σ +1)t .

When µ ≥ 0 we immediately obtain H 1 and L2σ +2 bounds for u(t) uniformly in a, b. In the case µ < 0 we need, for instance, to control the integral µ µ F (|u(t)|2 )dx = − |u(t)|2σ +2 dx − 2 2(σ + 1) with the H 1 norm of u(t). This is achieved through the Gagliardo–Nirenberg inequality (see Lemma 2.6) uL2σ +2 ≤ C(σ, d)uθH 1 u1−θ , L2

where the interpolation parameter θ satisfies θ = σ d/(2σ + 2). Notice that 0 < θ < 1 if and only if 0 < σ < 2/(d − 2). This inequality gives, for 0 ≤ t ≤ T , ν µ ν +2 E(t) + M(t) = u(t)2H 1 + u(t)2σ L2σ +2 2 2 2(σ + 1) ν (2σ +2)θ (2σ +2)(1−θ) ≥ u(t)2H 1 + µC u(t)H 1 u(t)L2 . 2 Because u(t)L2 is uniformly bounded, this estimate provides an upper bound for u(t)H 1 uniformly in a, b, whenever E(t) is uniformly bounded and the exponent of the H 1 norm of u(t) in the second term of the right-hand side is less than in the first term, which is the case if and only if σ < 2/d. For the limit case σ = 2/d, the above estimate gives a uniform bound for u(t)H 1 provided that u(t)L2 or, equivalently, u0 L2 is small enough. It is clear that the uniform H 1 bound for u(t) is the key estimate in the proof of Theorem 1.1. In a similar way, one can obtain uniform H n bounds for u(t) (n > 1) by estimating the quantities ∂t u2H n (see Sects. 5 and 7). These bounds can be only derived for small enough t > 0, since the use of the Gagliardo–Nirenberg inequality leads to inequalities of the type ∂t u2H n ≤ C1 + C2 u2+α Hn ,

with positive α = α(σ, d). For the proof of Theorem 1.2 we consider the equation satisfied by the difference uε −u, where uε with ε = (a, b) solves the CGL equation and u solves the NLS equation. In order to estimate the term f (uε )−f (u) we need uniform L∞ bounds for the solutions which are obtained via the Sobolev embedding theorem from the uniform H n bounds with n > d/2. This paper is organized as follows. In Sect. 2 we recall some existence results for the CGL equation and state our main theorems. Section 3 is concerned with the uniform H 1 estimate for the solutions of the CGL equation. The first convergence result is proved in Sect. 4. A uniform bound in H 2 is derived in Sect. 5. Finally, the proof of the optimal convergence rate is presented in Sect. 6 (for at most three spatial dimensions) and Sect. 7 (for any spatial dimension).
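The role of the threshold $\sigma < 2/d$ in the $H^1$ bound for $\mu < 0$ can be made explicit as follows. This is a sketch under the notation of the Introduction; the abbreviations $X$, $K$ and the constant $C'$ are introduced here and do not appear in the paper.

```latex
% Abbreviations for this sketch: X = ||u(t)||_{H^1}, and K is a uniform upper
% bound for ||u(t)||_{L^2}^{2 sigma + 2 - d sigma}.
% With theta = sigma d / (2 sigma + 2) the two exponents are
\[
(2\sigma+2)\,\theta = d\sigma , \qquad (2\sigma+2)(1-\theta) = 2\sigma + 2 - d\sigma .
\]
% For mu < 0 the lower bound for E(t) + (nu/2) M(t) therefore reads
\[
\frac{\nu}{2}\,X^{2} - |\mu|\,C K\,X^{d\sigma} \le E(t) + \frac{\nu}{2}\,M(t) .
\]
% Case d sigma < 2: Young's inequality absorbs the lower-order power,
% with a constant C' depending only on nu, |mu| C K and d sigma.
\[
|\mu|\,C K\,X^{d\sigma} \le \frac{\nu}{4}\,X^{2} + C'
\quad\Longrightarrow\quad
\frac{\nu}{4}\,X^{2} \le E(t) + \frac{\nu}{2}\,M(t) + C' .
\]
% Case d sigma = 2: both terms are quadratic in X, so a bound survives only if
\[
|\mu|\,C K < \frac{\nu}{2} ,
\]
% which is a smallness condition on the L^2 norm, since here K bounds ||u(t)||_{L^2}^{2 sigma}.
```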

2. Preliminaries and Main Results

The (global in time) existence and uniqueness of solutions to the generalized CGL equation (1.1)–(1.2) is shown by Ginibre and Velo for $\Omega = \mathbb{R}^d$ [14–16] and by Doering, Gibbon and Levermore for $\Omega = \mathbb{T}^d$ [9] (also see the review [28]) under some assumptions on the nonlinearity. In particular, assume the following:

Let $\Omega = \mathbb{R}^d$ or $\Omega = \mathbb{T}^d$ ($d \ge 1$), $u_0 \in V \overset{\mathrm{def}}{=} H^1(\Omega) \cap L^{2\sigma+2}(\Omega)$, $T > 0$, $0 < \sigma < \infty$ and, in the case $\mu < 0$, let either (i) $\sigma < 2/d$, or (ii) $\sigma = 2/d$ and $\|u_0\|_{L^2}$ be small enough. (2.1)

Note that $V = H^1(\Omega)$ if $\sigma \le 2/(d - 2)$. We impose the following assumption on the nonlinearity $f(u)$:

\[ f(u) = u\,g(|u|^2), \qquad u \in \mathbb{C}, \tag{2.2} \]

where the function $g(s)$ satisfies: $g \in C^1([0, \infty); [0, \infty))$ and there exist $\sigma, \gamma, \Gamma, \Gamma' > 0$ such that for all $s \ge 0$,

\[ \gamma s^{\sigma} \le g(s) \le \Gamma (1 + s^{\sigma}), \qquad 0 \le s g'(s) \le \Gamma' (1 + s^{\sigma}). \tag{2.3} \]

For instance, the functions $g(s) = s^{\sigma}$, $g(s) = s^{\sigma} + \log(1 + s)$, or $g(s) = s e^{s}/(1 + e^{s})$ satisfy the condition (2.3). In order to define the mass (1.8) and energy (1.9) we introduce

\[ F(s) = \int_0^s g(\xi)\,d\xi, \qquad s \ge 0. \]

Under the above assumptions there exists a unique solution $u$ to (1.1)–(1.2) satisfying

\[ u \in H^1(0, T; L^2(\Omega)) \cap C^0([0, T]; H^1(\Omega) \cap L^{2\sigma+2}(\Omega)) \cap L^2(0, T; H^2(\Omega)). \]

This follows from the a priori estimates of Sect. 3 and the compactness arguments of [9, 14]. Furthermore, if additionally $u_0 \in H^n(\Omega)$ ($n \in \mathbb{N}$), then the solution $u$ satisfies

\[ u \in H^1(0, T; H^{n-1}(\Omega)) \cap C^0([0, T]; H^n(\Omega)) \cap L^2(0, T; H^{n+1}(\Omega)). \]

This regularity result can be achieved by applying, for instance, Theorems II.3.2 and II.3.3 in [40, Ch. I.3.3]. Concerning the existence and uniqueness of solutions to the NLS equation (1.4), (1.2) we refer to [6, 7, 13] for $\Omega = \mathbb{R}^d$ and to [3, 4, 21] for $\Omega = \mathbb{T}^d$ (see also [22] and the reviews in [11, 12]).

Our first main result is concerned with the convergence of the solution $u_\varepsilon$ of the CGL equation with initial datum $u_0$ and parameter $\varepsilon = (a, b)$ to a solution $u$ of the NLS equation. In the case $\mu < 0$ we need an additional technical assumption on the parameters $a$ and $b$:

Let $(b a^{-d\sigma/2})$ be bounded as $(a, b) \to 0$. (2.4)

Theorem 2.1. Let the assumptions (1.5), (2.1)–(2.4) hold. Furthermore, let $\Omega = \mathbb{T}^d$. Let $u_\varepsilon$ be the solution to the CGL equation (1.1)–(1.2). Then there exists a subsequence of $u_\varepsilon$ (not relabeled) such that, as $\varepsilon \to 0$,

\[
\begin{aligned}
u_\varepsilon &\rightharpoonup u && \text{in } L^\infty(0, T; H^1(\Omega)) \text{ weakly-}*, \\
\partial_t u_\varepsilon &\rightharpoonup \partial_t u && \text{in } L^\infty(0, T; V^*) \text{ weakly-}*, \\
u_\varepsilon &\to u && \text{in } L^\infty(0, T; L^r_{loc}(\Omega)) \text{ strongly}, \\
f(u_\varepsilon) &\to f(u) && \text{in } L^\infty(0, T; L^q_{loc}(\Omega)) \text{ strongly},
\end{aligned}
\]

where $r < 2d/(d - 2)$ ($r < \infty$ if $d \le 2$); $2/(2\sigma + 1) < q < (2\sigma + 2)/(2\sigma + 1)$ when $\mu \ge 0$; $q = 2d/(d + 2)$ when $\mu < 0$; and $u \in L^\infty(0, T; H^1(\Omega)) \cap W^{1,\infty}(0, T; H^{-1}(\Omega)) \cap L^{(2\sigma+1)q}(\Omega \times (0, T))$ solves the NLS equation (1.4), (1.2).

Theorem 2.2. Let the assumptions of Theorem 2.1 hold, but assume that $\Omega = \mathbb{R}^d$ and that $g(s) \le \Gamma s^{\sigma}$ for $s \ge 0$. Then the same conclusion as in Theorem 2.1 holds.

The proof of this result is based on a uniform bound for $u_\varepsilon$ in $L^\infty(0, T; H^1(\Omega))$, which allows us to bound the term $f(u_\varepsilon)$. In one space dimension, the Sobolev embedding $H^1(\Omega) \hookrightarrow L^\infty(\Omega)$ therefore provides a uniform bound for $u_\varepsilon$ in $L^\infty(\Omega \times (0, T))$ from which we can conclude an $O(\sqrt{a}) + O(\sqrt{b})$ convergence rate for the difference $u_\varepsilon - u$ in $L^\infty(0, T; L^2(\Omega))$ under the condition that the initial data satisfy

\[ \|u_{\varepsilon 0} - u_0\|_{L^2} = O(\sqrt{a}) + O(\sqrt{b}) \quad\text{as } a, b \to 0. \tag{2.5} \]

Theorem 2.3. Let $d = 1$ and assume that the hypotheses (1.5), (2.1)–(2.5) hold. Then we have for all $0 \le t \le T$,

\[ \|(u_\varepsilon - u)(t)\|_{L^2} = O(\sqrt{a}) + O(\sqrt{b}) \quad\text{as } a, b \to 0. \]

More precisely, there exist two constants $M_1(T), M_2(T) > 0$ depending on the data but not on $a$, $b$ such that for all $0 \le t \le T$,

\[ \|(u_\varepsilon - u)(t)\|_{L^2} \le \|u_{\varepsilon 0} - u_0\|_{L^2} + M_1(T)\sqrt{a} + M_2(T)\sqrt{b}. \]

In order to get the optimal convergence rate $O(a) + O(b)$ for $(u_\varepsilon - u)(t)$ in $L^2(\Omega)$, we have to impose stronger assumptions on the nonlinearity $g$. Suppose that $g \in C^2([0, \infty); [0, \infty))$ and there exist $\sigma \ge 1/2$, $\Gamma_1, \Gamma_1' > 0$ such that for all $s \ge 0$,

\[ 0 \le s g'(s) \le \Gamma_1 (s^{1/2} + s^{\sigma}), \qquad 0 \le s^2 g''(s) \le \Gamma_1' (s^{1/2} + s^{\sigma}). \tag{2.6} \]

It can be easily checked that the condition on $g'$ in (2.6) implies that

\[ 0 \le s g'(s) \le \Gamma_2 (1 + s^{\sigma}) \quad\text{for all } s \ge 0, \]

for some $\Gamma_2 > 0$. Thus the condition on $g'$ in (2.6) is stronger than the corresponding condition in (2.3).

Theorem 2.4. Let $1 \le d \le 3$ and $u_{\varepsilon 0}, u_0 \in H^2(\Omega)$ and assume the conditions (1.5), (2.1)–(2.6). Then, if $\|u_{\varepsilon 0} - u_0\|_{L^2} = O(a) + O(b)$, there exists a constant $T^* > 0$ depending on the data but not on $a$, $b$ such that for all $0 \le t < T^*$,

\[ \|(u_\varepsilon - u)(t)\|_{L^2} = O(a) + O(b) \quad\text{as } a, b \to 0. \]

Moreover, we can take $T^* = T$ if $d = 1$.

More precisely, there exist two constants $M_3(t), M_4(t) > 0$ which are independent of $a$, $b$ such that for all $0 \le t < T^*$,

\[ \|(u_\varepsilon - u)(t)\|_{L^2} \le \|u_{\varepsilon 0} - u_0\|_{L^2} + M_3(t)\,a + M_4(t)\,b. \]

The constant $T^* > 0$ depends on $d$, $\sigma$, $R$, $\Gamma_1$, $|b_0 + i\mu|$, $C_4$, $C_6$ and $\|u_0\|_{L^2}$ (see (5.8)). See (5.3) and (5.6) for the definitions of $C_4$ and $C_6$.

Our last result is concerned with the optimal convergence rate in any space dimension. In order to avoid cumbersome assumptions on the function $g$, we suppose that

\[ g(s) = s^{\sigma} \quad\text{for some } \sigma > 0. \tag{2.7} \]

Theorem 2.5. Let $d \ge 1$, $u_{\varepsilon 0}, u_0 \in H^n(\Omega)$ with $n \in \mathbb{N}$, $n > d/2$, and suppose that (1.5), (2.1)–(2.4) and (2.7) hold. Then, if $\|u_{\varepsilon 0} - u_0\|_{L^2} = O(a) + O(b)$, there exists a constant $T^* > 0$ depending on the data but not on $a$, $b$ such that for all $0 \le t < T^*$,

\[ \|(u_\varepsilon - u)(t)\|_{L^2} = O(a) + O(b) \quad\text{as } a, b \to 0. \]

In the proofs of the main theorems we employ the Gagliardo–Nirenberg inequality (see, e.g., [6, p. 21] or [10, p. 242]):

Lemma 2.6. Let $\Omega = \mathbb{R}^d$, or $\Omega = \mathbb{T}^d$, or let $\Omega \subset \mathbb{R}^d$ be a bounded domain with $\partial\Omega \in C^{0,1}$, and $d, m \ge 1$. Furthermore, let $1 \le p, q, r \le \infty$, $j \in \mathbb{N} \cup \{0\}$ with $j < m$ and $\theta \in [j/m, 1]$ such that

\[ \frac{1}{p} = \frac{j}{d} + \theta \Bigl( \frac{1}{r} - \frac{m}{d} \Bigr) + (1 - \theta)\,\frac{1}{q}, \]

provided that $m - j - d/r$ is not a nonnegative integer (else take $\theta = j/m$). Then there exists $G = G(d, m, j, \theta, q, r)$ such that for all $u \in W^{m,r}(\Omega) \cap L^q(\Omega)$,

\[ \sum_{|\alpha| = j} \|D^{\alpha} u\|_{L^p} \le G\,\|u\|_{W^{m,r}}^{\theta}\,\|u\|_{L^q}^{1-\theta}. \]

Frequently, we use the inequality

\[ \|v\|_{H^2} \le C_0 \bigl( \|v\|_{L^2} + \|\Delta v\|_{L^2} \bigr) \tag{2.8} \]

which is valid for all $v \in H^2(\Omega)$ with $\Omega = \mathbb{R}^d$ or $\Omega = \mathbb{T}^d$.

3. A Uniform Bound in $H^1$

We show that the mass $M$ and the energy $E$ (see (1.8), (1.9), respectively) of the solution $u$ to the CGL equation (1.1)–(1.2) are bounded uniformly in $a$, $b$. In the following we assume that (1.5), (2.2)–(2.3) hold.

Lemma 3.1. Let $u_0 \in L^2(\Omega)$. For all $0 \le t \le T$, it holds $M(t) \le K_0(T)$, where $K_0(T) = e^{2RT} \|u_0\|_{L^2}^2$.

Proof. The proof of this lemma is standard. Indeed, multiply the equation (1.1) by $u^*$, integrate over $\Omega$, integrate by parts and take the real part to get

\[ \frac{1}{2}\,\partial_t \|u(t)\|_{L^2}^2 + a \|\nabla u\|_{L^2}^2 + b \int_\Omega |u|^2 g(|u|^2)\,dx = R \|u\|_{L^2}^2. \]

An application of Gronwall's lemma gives the assertion.
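Lemma 3.2. Let $u_0 \in H^1(\Omega) \cap L^{2\sigma+2}(\Omega)$, $\mu \ge 0$, and $\sigma < \infty$. Then, for all $0 \le t \le T$, $E(t) \le K_1(T)$, where

\[ K_1(T) = \bigl( E(0) + \mu R \Gamma T K_0(T) \bigr)\,e^{2R \max(1, (\sigma+1)\Gamma\gamma^{-1}) T}. \]

Proof. For every solution $u$ to (1.1), (1.2), we get

\[
\begin{aligned}
E(t) - E(0) &= \operatorname{Re} \int_0^t \int_\Omega (-\nu \Delta u + \mu f(u))\,\partial_t u^*\,dx\,d\tau \\
&= -\nu a \int_0^t \|\Delta u\|_{L^2}^2\,d\tau - \mu b \int_0^t \|f(u)\|_{L^2}^2\,d\tau + \nu R \int_0^t \|\nabla u\|_{L^2}^2\,d\tau + \mu R \operatorname{Re} \int_0^t \int_\Omega f(u) u^*\,dx\,d\tau \\
&\quad - \mu \operatorname{Re} \int_0^t \int_\Omega (a - i\nu) \nabla f(u) \cdot \nabla u^*\,dx\,d\tau - \nu \operatorname{Re} \int_0^t \int_\Omega (b - i\mu) \nabla f(u)^* \cdot \nabla u\,dx\,d\tau \\
&= -\nu a \int_0^t \|\Delta u\|_{L^2}^2\,d\tau - \mu b \int_0^t \int_\Omega |u|^2 g(|u|^2)^2\,dx\,d\tau + \nu R \int_0^t \|\nabla u\|_{L^2}^2\,d\tau + \mu R \int_0^t \int_\Omega |u|^2 g(|u|^2)\,dx\,d\tau \\
&\quad - (a\mu + b\nu) \operatorname{Re} \int_0^t \int_\Omega \nabla f(u) \cdot \nabla u^*\,dx\,d\tau.
\end{aligned}
\tag{3.1}
\]

The fourth term on the right-hand side of (3.1) is estimated as follows. We use (2.3) to obtain

\[
\begin{aligned}
\mu R \int_0^t \int_\Omega |u|^2 g(|u|^2)\,dx\,d\tau &\le \mu R \Gamma \int_0^t \int_\Omega (|u|^2 + |u|^{2\sigma+2})\,dx\,d\tau \\
&\le \mu R \Gamma \int_0^t \|u\|_{L^2}^2\,d\tau + \mu R \Gamma \gamma^{-1} (\sigma + 1) \int_0^t \int_\Omega F(|u|^2)\,dx\,d\tau.
\end{aligned}
\]

For the last integral we use $g'(s) \ge 0$ and $|\operatorname{Re}((\nabla u^*)^2 u^2)| \le |\nabla u|^2 |u|^2$:

\[
\begin{aligned}
-(a\mu + b\nu) \operatorname{Re} \int_0^t \int_\Omega \nabla f(u) \cdot \nabla u^*\,dx\,d\tau
&= -(a\mu + b\nu) \int_0^t \int_\Omega \bigl[ g(|u|^2) |\nabla u|^2 + g'(|u|^2) |u|^2 |\nabla u|^2 + g'(|u|^2) \operatorname{Re}((\nabla u^*)^2 u^2) \bigr]\,dx\,d\tau \\
&\le -(a\mu + b\nu) \int_0^t \int_\Omega g(|u|^2) |\nabla u|^2\,dx\,d\tau \\
&\le 0.
\end{aligned}
\]

Therefore, we get from (3.1),

\[
\begin{aligned}
E(t) &\le E(0) + \nu R \int_0^t \|\nabla u\|_{L^2}^2\,d\tau + \mu R \Gamma \int_0^t \|u\|_{L^2}^2\,d\tau + \mu R \Gamma \gamma^{-1} (\sigma + 1) \int_0^t \int_\Omega F(|u|^2)\,dx\,d\tau \\
&\le E(0) + \mu R \Gamma K_0(T) T + 2R \max(1, \Gamma\gamma^{-1}(\sigma + 1)) \int_0^t E(\tau)\,d\tau,
\end{aligned}
\]

using Lemma 3.1. From Gronwall's lemma, we conclude $E(t) \le K_1(T)$. The lemma is proved.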
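The passage from $|u|^{2\sigma+2}$ to $F(|u|^2)$ in the estimate of the fourth term of (3.1) rests on the lower bound in (2.3). The sketch below makes this step explicit; here $\Gamma$ denotes the upper constant in (2.3), a symbol chosen for this sketch because the original glyph is not legible in the source.

```latex
% Lower bound in (2.3): g(xi) >= gamma xi^sigma. Integrating over (0, s):
\[
F(s) = \int_0^s g(\xi)\,d\xi \ge \gamma \int_0^s \xi^{\sigma}\,d\xi = \frac{\gamma}{\sigma+1}\,s^{\sigma+1}
\quad\Longrightarrow\quad
s^{\sigma+1} \le (\sigma+1)\,\gamma^{-1} F(s), \qquad s \ge 0 .
\]
% Upper bound in (2.3): g(s) <= Gamma (1 + s^sigma). Multiplying by s:
\[
s\,g(s) \le \Gamma \bigl( s + s^{\sigma+1} \bigr) \le \Gamma s + \Gamma \gamma^{-1} (\sigma+1)\,F(s) .
\]
% With s = |u|^2, integrating over Omega and (0, t) and multiplying by mu R >= 0:
\[
\mu R \int_0^t\!\!\int_\Omega |u|^2 g(|u|^2)\,dx\,d\tau
  \le \mu R \Gamma \int_0^t \|u\|_{L^2}^2\,d\tau
  + \mu R \Gamma \gamma^{-1} (\sigma+1) \int_0^t\!\!\int_\Omega F(|u|^2)\,dx\,d\tau .
\]
```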

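Remark 3.3. The condition (2.3) on $g'$ can be weakened. Indeed, instead of assuming $s \cdot g'(s) \ge 0$ for $s \ge 0$ it suffices to assume that

\[ s g'(s) \ge -\frac{1}{2}\,g(s) \quad\text{for } s \ge 0. \tag{3.2} \]

In order to see this we have to estimate the last term on the right-hand side of (3.1):

\[
\begin{aligned}
-(a\mu + b\nu) \operatorname{Re} \int_0^t \int_\Omega \nabla f(u) \cdot \nabla u^*\,dx\,d\tau
&\le (a\mu + b\nu) \int_0^t \int_\Omega |\nabla u|^2 \bigl[ -g(|u|^2) + |u|^2 \bigl( |g'(|u|^2)| - g'(|u|^2) \bigr) \bigr]\,dx\,d\tau \\
&\le (a\mu + b\nu) \int_0^t \int_\Omega |\nabla u|^2 \bigl[ -g(|u|^2) + g(|u|^2) \bigr]\,dx\,d\tau = 0.
\end{aligned}
\]

In the case $\mu < 0$ we need more restrictive conditions on $\sigma$:

Lemma 3.4. Let $u_0 \in H^1(\Omega) \cap L^{2\sigma+2}(\Omega)$, $\mu < 0$, and either (i) $\sigma < 2/d$, or (ii) $\sigma = 2/d$ and $\|u_0\|_{L^2}$ is small enough. Then, for all $0 \le t \le T$, $\|\nabla u(t)\|_{L^2}^2 \le K_2(T)$, where $K_2(T)$ depends on the given data, on $T$ and on $b a^{-d\sigma/2}$.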

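We use the Gagliardo–Nirenberg inequality (see Lemma 2.6) in some special cases: There exist $G_1, G_2, G_3, G_4 > 0$ such that for all $u \in H^2(\Omega)$,

\[ \|u\|_{L^{2\sigma+2}} \le G_1 \|u\|_{H^1}^{\theta_1} \|u\|_{L^2}^{1-\theta_1}, \qquad \theta_1 = \frac{d\sigma}{2\sigma + 2} \in (0, 1), \tag{3.3} \]
\[ \|u\|_{L^{2\sigma+2}} \le G_2 \|u\|_{H^2}^{\theta_2} \|u\|_{L^2}^{1-\theta_2}, \qquad \theta_2 = \frac{d\sigma}{4\sigma + 4} \in \Bigl( 0, \frac{1}{2} \Bigr), \tag{3.4} \]
\[ \|u\|_{L^{4\sigma+2}} \le G_3 \|u\|_{H^2}^{\theta_3} \|u\|_{L^2}^{1-\theta_3}, \qquad \theta_3 = \frac{d\sigma}{4\sigma + 2} \in (0, 1), \tag{3.5} \]
\[ \|\nabla u\|_{L^{2\sigma+2}} \le G_4 \|u\|_{H^2}^{\theta_4} \|u\|_{L^2}^{1-\theta_4}, \qquad \theta_4 = \frac{\sigma(2 + d) + 2}{4\sigma + 4} \in \Bigl( \frac{1}{2}, 1 \Bigr). \tag{3.6} \]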
These inequalities are valid for all σ ≤ 2/(d − 2) if d > 2 and σ < ∞ if d ≤ 2. Proof of Lemma 3.4. Again we start from the relation (3.1). Since µ < 0, we get t ν ∇u(t)2L2 + aν u2L2 dτ 2 0 ν |µ| 2 ≤ ∇u0 L2 + F (|u(t)|2 ) dx 2 2 t t 2 2 2 + |µ|b |u| g(|u| ) dxdτ + νR ∇u2L2 dτ (3.7) 0 0 t − (aµ + bν)Re ∇f (u) · ∇u∗ dxdτ. 0

We have to estimate the second, third and fifth term on the right-hand side. For the second term we use the Hypothesis (2.3), the Gagliardo–Nirenberg inequality (3.3) and Lemma 3.1: |µ| F (|u(t)|2 dx 2 |µ| ≤ . (|u(t)|2 + (σ + 1)−1 |u(t)|2σ +2 ) dx 2 |µ| |µ|. 2σ +2 (2σ +2)θ1 (2σ +2)(1−θ1 ) ≤ u(t)H 1 u(t)L2 .K0 (T ) + G 2 2σ + 2 1 +2 |µ|.G2σ |µ| +2−dσ 1 .K0 (T ) + (u(t)dσ ≤ + ∇u(t)dσ )u(t)2σ . L2 L2 L2 2 σ +1 In the last step we have used the inequality (x + y)(2σ +2)θ1 = (x + y)dσ ≤ 2(x dσ + y dσ )

for x, y ≥ 0,

since dσ ≤ 2. Again using Lemma 3.1 we get |µ| 2

F (|u(t)|2 ) dx ≤

+2 |µ|.G2σ |µ| 1 .K0 (T ) + K0 (T )σ +1 2 σ +1

+

+2 |µ|.G2σ +2−dσ 1 u(t)2σ . ∇u(t)dσ L2 L2 σ +1

(3.8)


Now we have to distinguish the cases dσ < 2 and dσ = 2. Let first dσ < 2. Then, by Young’s inequality, +2 |µ|.G2σ +2−dσ 1 u(t)2σ ∇u(t)dσ L2 L2 σ +1 +2 dσ |µ|.G2σ 1 ∇u(t)2L2 ≤ε 2 σ +1 1 dσ/(2−dσ ) 2 − dσ |µ|.G2σ +2 (4σ +4−2dσ )/(2−dσ ) 1 + u(t)L2 ε 2 σ +1 +2 and taking ε = ν(σ + 1)/(2dσ |µ|.G2σ ), we conclude 1 ν |µ| F (|u(t)|2 ) dx ≤ C1 + ∇u(t)2L2 , 2 4

(3.9)

where def

C1 =

+2 |µ|.G2σ |µ| 1 (3.10) .K0 (T ) + K0 (T )σ +1 2 σ +1 2dσ |µ|.G2σ +2 dσ/(2−dσ ) (2 − dσ )|µ|.G2σ +2 1 1 + K0 (T )(2σ +2−dσ )/(2−dσ ) . ν(σ + 1) 2σ + 2

In the case dσ = 2 we choose u0 L2 small enough such that +2 |µ|.G2σ ν 1 ≤ . e2σ RT u0 2σ L2 σ +1 4

Hence, by (3.8), +2 |µ|.G2σ |µ| ν |µ| 1 K0 (T )σ +1 + ∇u(t)2L2 F (|u(t)|2 ) dx ≤ .K0 (T ) + σ +1 2 2 4 ν 2 ≤ C1 + ∇u(t)L2 , 4 and C1 > 0 is given by (3.10). The third term on the right-hand side of (3.7) is estimated by employing the assumption (2.3), the Gagliardo–Nirenberg inequality (3.5), and the relation (2.8) t |u|2 g(|u|2 )2 dxdτ b|µ| 0 t ≤ 2b|µ|. 2 (|u|2 + |u|4σ +2 ) dxdτ 0 t (4σ +2)θ3 (4σ +2)(1−θ3 ) +2 2 ≤ 2b|µ|. T K0 (T ) + 2b|µ|. 2 G4σ uH 2 uL2 dτ 3 0 t +2−dσ ≤ 2b|µ|. 2 T K0 (T ) + 4b|µ|. 2 (G3 C0θ3 )4σ +2 (udσ + udσ )u4σ dτ L2 L2 L2 0 θ3 4σ +2 (G3 C0 ) T K0 (T )2σ +1

≤ 2b|µ|. T K0 (T ) + 4b|µ|. t +2−dσ + 4b|µ|. 2 (G3 C0θ3 )4σ +2 udσ u4σ dτ. L2 L2 2

2

0

(3.11)


Again, we have to distinguish the cases dσ < 2 and dσ = 2. If dσ < 2 we use Young’s inequality t dσ t 4σ +2−dσ dσ uL2 uL2 dτ ≤ ε u2L2 dτ 2 0 0 1 dσ/(2−dσ ) 2 − dσ t (8σ +4−2dσ )/(2−dσ ) + uL2 dτ ε 2 0 and choose ε=

ν a 2 b 4|µ|. dσ (G3 C θ3 )4σ +2 0

to get b|µ|

t 0

aν |u| g(|u| ) dxdτ ≤ C2 + 2 2

t

2 2

0

u2L2 dτ,

(3.12)

where C2 = 2b0 |µ|. 2 T K0 (T ) + 4b0 |µ|. 2 (G3 C0θ3 )4σ +2 T K0 (T )2σ +1 def

+ 2(ba −dσ/2 )2/(2−dσ ) |µ|. 2 (G3 C0θ3 )4σ +2 (2 − dσ ) (3.13) dσ/(2−dσ ) T K0 (T )(4σ +2−dσ )/(2−dσ ) . × 4ν −1 |µ|. 2 dσ (G3 C0θ3 )4σ +2 In the case dσ = 2 choose u0 L2 such that +2−dσ 4|µ|. 2 (G3 C0θ3 )4σ +2 e(4σ +2−dσ )RT u0 4σ ≤ L2

νa . 2b

Clearly, we have to assume that the quotient b/a remains bounded as a, b → 0. Thus, we conclude from (3.11) t b|µ| |u|2 g(|u|2 )2 dxdτ 0 aν t ≤ 2b|µ|. 2 T K0 (T ) + 4b|µ|. 2 (G3 C0θ3 )4σ +2 T K0 (T )2σ +1 + u2L2 dτ 2 0 aν t ≤ C2 + u2L2 dτ, (3.14) 2 0 and C2 is given by (3.13). It remains to estimate the last term on the right-hand side of (3.7): t −(aµ + bν)Re ∇f (u) · ∇u∗ dxdτ 0 t ≤ a|µ| [g(|u|2 )|∇u|2 + 2g (|u|2 )|u|2 |∇u|2 ] dxdτ 0 t ≤ a|µ|(. + 2. ) (1 + |u|2σ )|∇u|2 dxdτ (3.15) 0 t t = a|µ|(. + 2. ) ∇u2L2 dτ + a|µ|(. + 2. ) |u|2σ |∇u|2 dxdτ. 0

0


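The last integral is estimated by employing the Hölder inequality and the Gagliardo–Nirenberg inequalities (3.4) and (3.6):

$$
\int_0^t\!\!\int_\Omega |u|^{2\sigma}|\nabla u|^2\,dx\,d\tau
\le \int_0^t \|u\|_{L^{2\sigma+2}}^{2\sigma} \|\nabla u\|_{L^{2\sigma+2}}^2\,d\tau
\le G_2^{2\sigma} G_4^2 \int_0^t \|u\|_{H^2}^{1+\alpha_1} \|u\|_{L^2}^{1+\alpha_2}\,d\tau,
$$

where

$$
\alpha_1 \overset{\mathrm{def}}{=} 2\sigma\theta_2 + 2\theta_4 - 1 = \frac{d\sigma}{2} \in (0, 1], \qquad
\alpha_2 \overset{\mathrm{def}}{=} 2\sigma(1-\theta_2) + 2(1-\theta_4) - 1 = \frac{(4-d)\sigma}{2} \in (-1, 3],
$$

since $\sigma \le 2/d$. The inequalities (2.8) and

$$
(x + y)^{1+\alpha_1} \le 2\big(x^{1+\alpha_1} + y^{1+\alpha_1}\big) \quad \text{for all } x, y \ge 0
$$

imply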

a|µ|(. + 2. )

t

|u|2σ |∇u|2 dxdτ t 2σ 2 1 1 2 ≤ 2a|µ|(. + 2. )G2 G4 (u1+α + u1+α )u1+α dτ L2 L2 L2 0

0

2 σ +1 ≤ 2a|µ|(. + 2. )G2σ 2 G4 T K0 (T ) t 2 1 2 + 2a|µ|(. + 2. )G2σ u1+α u1+α dτ. 2 G4 L2 L2

(3.16)

0

If σ < 2/d then 1 + α1 < 2, and we can apply Young’s inequality to the last integral: t 1 + α1 t 1+α2 1 u1+α u dτ ≤ ε u2L2 dτ L2 L2 2 0 0 1 (1+α1 )/(1−α1 ) 1 − α t 1 (2+2α )/(1−α1 ) + uL2 1 dτ ε 2 0 1 + α1 t ≤ε u2L2 dτ 2 0 1 (1+α1 )/(1−α1 ) 1 − α 1 T K0 (T )(1+α1 )/(1−α1 ) , + ε 2 and taking ε=

ν , 2 2|µ|(1 + α1 )(. + 2. )G2σ 2 G4

we obtain from (3.16) a|µ|(. + 2. )

t 0

|u|2σ |∇u|2 dxdτ ≤ C3 +

aν 2

0

t

u2L2 dτ,


where

def 2 σ +1 C3 = a0 |µ|(. + 2. )G2σ 1 G4 T 2K0 (T )

+ (1 − α1 )ε −(1+α1 )/(1−α1 ) K0 (T )(1+α1 )/(1−α1 ) .

(3.17)

In the case σ = 2/d we get 1 + α1 = 2. Choose u0 L2 small enough such that t aν t 1+α1 1+α2 2σ 2 2a|µ|(. + 2. )G2 G4 uL2 uL2 dτ ≤ u2L2 dτ. 2 0 0 Hence, the estimate (3.15) becomes t −(aµ + bν)Re ∇f (u) · ∇u∗ dxdτ 0 t aν t ≤ a0 |µ|(. + 2. ) ∇u2L2 dτ + C3 + u2L2 dτ. 2 0 0

(3.18)

Now we insert the inequalities (3.9), (3.12), and (3.18) into (3.7) to get ν ν ∇u(t)2L2 ≤ ∇u0 2L2 + C1 + C2 + C3 4 2 t + νR + a0 |µ|(. + 2. ) ∇u2L2 dτ. 0

Notice that C2 depends on ba −dσ/2 . An application of Gronwall’s inequality yields ∇u(t)2L2 ≤ K2 (T ), where

0 ≤ t ≤ T,

def K2 (T ) = 2∇u0 2L2 + 4ν −1 (C1 + C2 + C3 ) × exp 4T (R + ν −1 a0 |µ|(. + 2. )) .

This proves the lemma.

(3.19)

Lemmas 3.1, 3.2 and 3.4 yield a uniform $H^1$-bound for the solutions to (1.1), (1.2), provided that $ba^{-d\sigma/2}$ remains bounded as $a, b \to 0$ in the case $\mu < 0$:

Corollary 3.5. Let (2.1)–(2.3) hold. In the case $\mu < 0$ we also have to require that $ba^{-d\sigma/2}$ remains bounded as $a, b \to 0$. Then there exists a constant $K_3(T) > 0$, only depending on the data and on $T$, but not on $a$, $b$, such that for every solution $u$ to (1.1), (1.2) it holds $\|u(t)\|_{H^1}^2 \le K_3(T)$ for all $0 \le t \le T$.

Remark 3.6. The constant $K_3(T)$ is defined by

$$
K_3(T) = \begin{cases} K_0(T) + K_1(T) & \text{if } \mu \ge 0, \\ K_0(T) + K_2(T) & \text{if } \mu < 0. \end{cases}
$$

The precise values of $K_0(T)$ and $K_1(T)$ are given in Lemmas 3.1 and 3.2, respectively. The constant $K_2(T)$ is defined in (3.19), with constants $C_1$, $C_2$, $C_3$ which are introduced in (3.10), (3.13) and (3.17), respectively.


Remark 3.7. The above corollary also holds true if µ ≥ 0 and the second chain of inequalities in condition (2.3) is replaced by 1 − g(s) ≤ s · g (s) ≤ . (1 + s σ ) 2

for s ≥ 0

(see Remark 3.3).

Remark 3.8. The estimates of Lemmas 3.1 and 3.2 are rather standard, whereas the estimate of Lemma 3.4 seems to be new, in the sense that the particular estimate and the dependence on the parameters have not been presented before.

4. Proof of Theorems 2.1 and 2.2

Let $u_\varepsilon$ be a solution to (1.1)–(1.2). Let $p = 2d/(d-2)$ ($p < \infty$ if $d \le 2$). When $\mu \ge 0$, we choose $2/(2\sigma+1) < q < (2\sigma+2)/(2\sigma+1)$. When $\mu < 0$, take $q = 2d/(d+2)$; then $q$ is the conjugate exponent to $p$, i.e. $1/p + 1/q = 1$, and the condition $\sigma < 2/d$ implies $(2\sigma+1)q < p$, showing that the embedding $H^1(\Omega) \hookrightarrow L^{(2\sigma+1)q}(\Omega)$ is continuous. In order to give uniform bounds for $f(u_\varepsilon)$ we have to consider the cases $\Omega = \mathbb{R}^d$ and $\Omega = \mathbb{T}^d$ separately. First let $\Omega = \mathbb{T}^d$ and $\mu \ge 0$. From (2.3) we get, for all $0 \le t \le T$,

f (uε (t))Lq ≤ .

q

2σ q

1/q

|uε (t)| (1 + |uε (t)| ) dx

q (2σ +1)q 1/q ≤ . uε (t)Lq + uε (t)L(2σ +1)q +1 , ≤ c uε (t)L2 + uε (t)2σ L2σ +2 where here and in the following we denote by c, ci positive constants independent of ε = (a, b) and t. In the last step we have used that q < 2, hence L2 () 2→ Lq (), and L2σ +2 () 2→ L(2σ +1)q (), because is bounded. When µ < 0, we obtain as above (since q = 2d/(d + 2) < 2) q (2σ +1)q 1/q f (uε (t))Lq ≤ . uε (t)Lq + uε (t)L(2σ +1)q +1 . ≤ c uε (t)L2 + uε (t)2σ 1 H Applying Lemma 3.2 and Corollary 3.5, we conclude that (f (uε )) is uniformly bounded in L∞ (0, T ; Lq ()) when = Td . Now let = Rd . Then the above arguments do not apply since the injection q L () 2→ L2 () is generally not valid. Here we need the condition g(s) ≤ .s σ . Then, if µ ≥ 0, +1 ≤ c. f (uε (t))Lq ≤ .uε (t)2σ L(2σ +1)q The last inequality follows from the fact that the uniform bounds on uε (t) in L2 () and L2σ +2 () imply a uniform bound in L(2σ +1)q () since 2 < (2σ + 1)q < 2σ + 2, by interpolation. When µ < 0, we obtain +1 +1 ≤ cuε (t)2σ ≤ c. f (uε (t))Lq ≤ .uε (t)2σ H1 L(2σ +1)q


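Applying Corollary 3.5, we conclude that $(f(u_\varepsilon))$ is uniformly bounded in $L^\infty(0, T; L^q(\Omega))$. Since $V^* = H^{-1}(\Omega) + L^{(2\sigma+2)/(2\sigma+1)}(\Omega)$, this implies that (see, e.g., [6, p. 217])

$$
\|\partial_t u_\varepsilon(t)\|_{V^*} \le c\,\|u_\varepsilon(t)\|_V \le c.
$$

This shows that the sequence $(u_\varepsilon)$ is uniformly bounded in $L^\infty(0, T; H^1(\Omega))$ and $W^{1,\infty}(0, T; V^*)$. From the above estimates we get the following weak convergence results for a subsequence of $(u_\varepsilon)$, which is not relabeled:

$$
\begin{aligned}
u_\varepsilon &\rightharpoonup u && \text{in } L^\infty(0, T; H^1(\Omega)) \text{ weakly-}*, \\
\partial_t u_\varepsilon &\rightharpoonup \partial_t u && \text{in } L^\infty(0, T; V^*) \text{ weakly-}*, \\
f(u_\varepsilon) &\rightharpoonup f && \text{in } L^\infty(0, T; L^q(\Omega)) \text{ weakly-}*.
\end{aligned}
$$

It remains to identify the limit $f$. For this, let $\omega \subset \Omega$ be an arbitrary bounded open set. Then $(u_\varepsilon)$ is also bounded in $L^\infty(0, T; H^1(\omega))$ and $W^{1,\infty}(0, T; V^*(\omega))$, where $V^*(\omega) = H^{-1}(\omega) + L^{(2\sigma+2)/(2\sigma+1)}(\omega)$. By Lemma 9.4.6 of [6], there exists a subsequence, still denoted by $(u_\varepsilon)$, such that for every $t > 0$,

$$
u_\varepsilon(x, t) \to u(x, t) \quad \text{for almost every } x \in \omega,
$$

and $u \in L^\infty(0, T; H^1(\omega)) \cap W^{1,\infty}(0, T; V^*(\omega))$. Applying the mean value theorem to both $\operatorname{Re}(f)$ and $\operatorname{Im}(f)$, we get for all $u, v \in \mathbb{C}$,

$$
f(u) - f(v) = \int_0^1 \Big[ (u - v)\big(g(|w_\lambda|^2) + |w_\lambda|^2 g'(|w_\lambda|^2)\big) + (u - v)^* w_\lambda^2\, g'(|w_\lambda|^2) \Big]\,d\lambda, \tag{4.1}
$$

where $w_\lambda = \lambda u + (1 - \lambda)v$. Thus, using Hölder's inequality and condition (2.3), for all $0 \le s, t \le T$,

$\|f(u_\varepsilon(t)) - f(u(t))\|_{L^q(\omega)}$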

1 q 1/q q 2σ ≤ (. + 2. ) |uε (t) − u(t)| (1 + |λuε (t) + (1 − λ)u(t)| )dλ dx

≤ (. + 2. ) ≤ c1

ω

ω

ω

0

|uε (t) − u(t)|

q

1 + max(1, 2

2σ −1

)(|uε (t)|

|uε (t) − u(t)|q (1 + |uε (t)|2σ + |u(t)|2σ )q dx

≤ c2 uε (t) − u(t)Lq (ω) + c2

q

ω

2σ

q

+ |u(t)| ) 2σ

1/q dx

1/q

|uε (t) − u(t)| (|uε (t)|

2σ q

+ |u(t)|

2σ q

1/q )dx

≤ c2 uε (t) − u(t)Lq (ω) 2σ + c2 uε (t)2σ + u(t) uε (t) − u(t)L(2σ +1)q (ω) , (2σ +1)q (2σ +1)q L L where c1 = (. + 2. ) max(1, 22σ −1 ) and c2 = 31−1/q c1 . First, consider µ ≥ 0. Then, since (uε (t)) is bounded in L(2σ +1)q () (see above), f (uε (t)) − f (u(t))Lq (ω) ≤ c2 uε (t) − u(t)Lq (ω) + c3 uε (t) − u(t)L(2σ +1)q (ω) .


We know that $u_\varepsilon(t) \to u(t)$ a.e. in $\omega$ and $(u_\varepsilon(t))$ is uniformly bounded in $L^{2\sigma+2}(\omega)$. Hence, by Lemma 1.3 of [29, Ch. 1], $u_\varepsilon(t) \to u(t)$ in $L^s(\omega)$ strongly for all $s < 2\sigma + 2$. In particular, we can choose $s = q$ and $s = (2\sigma+1)q$. Hence,

$$
f(u_\varepsilon) \to f(u) \quad \text{in } L^\infty(0, T; L^q(\omega)) \text{ strongly.} \tag{4.2}
$$

Now, let µ < 0. Then, by Corollary 3.5, f (uε (t)) − f (u(t))Lq (ω) ≤ c2 uε (t) − u(t)Lq (ω) + 2c2 sup uε (t)2σ u (t) − u(t)L(2σ +1)q (ω) H 1 () ε 0
≤ c2 uε (t) − u(t)Lq (ω) + c4 uε (t) − u(t)Lr (ω) . To see the last step observe that, since (2σ + 1)q < p, there exists r < p such that (2σ + 1)q ≤ r. The pointwise a.e. convergence of (uε (x, t)) and the uniform bound on (uε ) in L∞ (0, T ; H 1 (ω)) imply the strong convergence of (uε ) to u in L∞ (0, T ; Lr (ω)) with r < p = 2d/(d − 2), and hence the convergence (4.2) holds. Therefore, we get f (u) = f

in $\omega \times (0, T)$ for every bounded $\omega \subset \Omega$, hence almost everywhere in $\Omega \times (0, T)$. Now, the limit in the equation satisfied for $u_\varepsilon$ can be performed and we see that $u$ solves the nonlinear Schrödinger equation (1.4). Finally, notice that the initial value is satisfied in the sense of $V^*$.

5. A Uniform Bound in $H^2(\Omega)$

In this section we derive a uniform bound in $H^2(\Omega)$ for the solution $u$ of the CGL equation for space dimensions $d \le 3$. First we prove the existence of a uniform bound in one space dimension.

Lemma 5.1. Let the conditions (2.1)–(2.2), (2.4), (2.6) hold. Furthermore, let $d = 1$ and $u_0 \in H^2(\Omega)$. Then there exists a constant $K_4(T) > 0$, only depending on the data and on $T$, but not on $a$, $b$, such that $\|\partial_{xx} u(t)\|_{L^2}^2 \le K_4(T)$ for all $0 \le t \le T$.

Proof. The first part of the proof is valid for any space dimension. We impose the restriction $d = 1$ later for the estimation of the $L^\infty$ bound of $u$. Taking the Laplacian of Eq. (1.1) we obtain

$$
\partial_t \Delta u = (a + i\nu)\Delta^2 u + R\Delta u - (b + i\mu)\Big[ g(|u|^2)\Delta u + 2g'(|u|^2)\nabla u\cdot\nabla|u|^2 + u\,g''(|u|^2)(\nabla|u|^2)^2 + u\,g'(|u|^2)\Delta|u|^2 \Big].
$$


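Multiplying this equation by $\Delta u^*$, taking the real part and integrating over $\Omega$ yields

$$
\begin{aligned}
\frac{1}{2}\partial_t \|\Delta u\|_{L^2}^2
&= -a\|\nabla\Delta u\|_{L^2}^2 + R\|\Delta u\|_{L^2}^2 - b \int_\Omega g(|u|^2)|\Delta u|^2\,dx \\
&\quad - 2\operatorname{Re} \int_\Omega (b + i\mu)\,g'(|u|^2)\nabla u\cdot\nabla|u|^2\,\Delta u^*\,dx \\
&\quad - \operatorname{Re} \int_\Omega (b + i\mu)\,u\,g''(|u|^2)(\nabla|u|^2)^2\,\Delta u^*\,dx \\
&\quad - \operatorname{Re} \int_\Omega (b + i\mu)\,u\,g'(|u|^2)\Delta|u|^2\,\Delta u^*\,dx.
\end{aligned}
\tag{5.1}
$$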

The fourth term on the right-hand side can be estimated by using (2.6) and the inequality |∇|u|2 | ≤ 2|∇u||u|: −2Re (b + iµ)g (|u|2 )∇u · ∇|u|2 u∗ dx ≤ 4.1 |b + iµ| (1 + |u|2σ −1 )|∇u|2 |u|dx

−1 2 ≤ 4.1 |b + iµ|(1 + u2σ L∞ )∇uL4 uL2 .

The fifth term on the right-hand side of (5.1) is treated similarly −Re (b + iµ)ug (|u|2 )(∇|u|2 )2 u∗ dx ≤ 4.1 |b + iµ| (1 + |u|2σ −1 )|∇u|2 |u|dx

−1 2 ≤ 4.1 |b + iµ|(1 + u2σ L∞ )∇uL4 uL2 .

In the last term of (5.1) we use the inequality ||u|2 | ≤ 2|u||u| + |∇u|2 to get −Re (b + iµ)ug (|u|2 )|u|2 u∗ dx ≤ 2.1 |b + iµ| (1 + |u|2σ −1 )(2|u||u| + |∇u|2 )|u|dx

−1 2 ∞ ≤ 2.1 |b + iµ|(1 + u2σ L∞ )(2uL2 uL + ∇uL4 )uL2 .

Hence, we obtain from Eq. (5.1), −1 2 ∞ ∂t u2L2 ≤ 2Ru2L2 + 4.1 |b + iµ|(1 + u2σ L∞ )uL uL2 −1 2 + 8(2.1 + .1 )|b + iµ|(1 + u2σ L∞ )∇uL4 uL2 .

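The following Gagliardo–Nirenberg inequality holds for $d \le 4$:

$$
\|\nabla u\|_{L^4} \le G_5 \|\nabla u\|_{H^1}^{d/4} \|\nabla u\|_{L^2}^{1-d/4}.
$$

By employing Lemma 3.2 and Corollary 3.5 this gives

$$
\begin{aligned}
\|\nabla u\|_{L^4}^2 &\le G_5^2 \|\nabla u\|_{H^1}^{d/2} \|\nabla u\|_{L^2}^{2-d/2} \\
&\le G_5^2 \big(\|\nabla u\|_{L^2}^2 + \|\Delta u\|_{L^2}^2\big)^{d/4} \|\nabla u\|_{L^2}^{2-d/2} \\
&\le G_5^2 K_3^{1-d/4} \max(1, K_3^{d/4})\big(1 + \|\Delta u\|_{L^2}^{d/2}\big),
\end{aligned}
$$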


and we can continue to estimate −1 2 ∞ ∂t u2L2 ≤ [2R + 4.1 |b0 + iµ|(1 + u2σ L∞ )uL ]uL2 d/2

−1 + C4 (1 + u2σ L∞ )(1 + uL2 )uL2 ,

(5.2)

where we have set C4 = 8(2.1 + .1 )|b0 + iµ|G25 K3 def

1−d/4

d/4

max(1, K3 ).

(5.3)

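In order to bound the $L^\infty$ norm of $u$, we have to impose the condition $d = 1$. Then it holds

$$
\|u\|_{L^\infty} \le G \|u\|_{H^1}^{1/2} \|u\|_{L^2}^{1/2} \tag{5.4}
$$

with $G = \sqrt{2}$ if $\Omega = \mathbb{R}$ and $G = 1/\sqrt{\pi}$ if $\Omega = \mathbb{T}$. Thus, by Lemma 3.1 and Corollary 3.5,

$$
\|u\|_{L^\infty} \le (4K_0K_3)^{1/4},
$$

and using the elementary inequalities

$$
\|\partial_{xx} u\|_{L^2} \le \frac{1}{2}\big(1 + \|\partial_{xx} u\|_{L^2}^2\big), \qquad
\|\partial_{xx} u\|_{L^2}^{3/2} \le \frac{1}{4}\big(1 + 3\|\partial_{xx} u\|_{L^2}^2\big),
$$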

we obtain from (5.2) ∂t ∂xx u2L2 ≤ 2R + 4.1 |b + iµ|(1 + (4K0 K3 )σ/2−1/4 )(4K0 K3 )1/4 ∂xx u2L2 3 5 + C4 1 + (4K0 K3 )σ/2−1/4 + ∂xx u2L2 4 4 3 σ/2−1/4 ≤ C4 (1 + (4K0 K3 ) ) + C5 ∂xx u2L2 , 4 where C5 = 2R + 4.1 |b + iµ|(1 + (4K0 K3 )σ/2−1/4 )(4K0 K3 )1/4 5 + C4 (1 + (4K0 K3 )σ/2−1/4 ). 4 def

By Gronwall’s lemma, we conclude that ∂xx u(t)2L2 ≤ K4 (T )

for all 0 ≤ t ≤ T ,

where 3 def K4 (T ) = ∂xx u0 2L2 + C4 (1 + (4K0 K3 )σ/2−1/4 )eC5 T . 4 This finishes the proof.


In two and three space dimensions we have the following result:

Lemma 5.2. Let the conditions (2.1)–(2.2), (2.4), (2.6) hold. Furthermore, let $1 \le d \le 3$ and $u_0 \in H^2(\Omega)$. Then there exist constants $K_5(t)$, $T^* > 0$ depending on the data but not on $a$, $b$ such that $\|\Delta u(t)\|_{L^2}^2 \le K_5(t)$ for all $0 \le t < T^*$.

Remark 5.3. The constant $K_5(t)$ satisfies $\lim_{t \to T^*} K_5(t) = +\infty$.

Proof. We start from the inequality (5.2) which holds for $1 \le d \le 4$. For space dimensions $d \ge 2$, we cannot use the $H^1$ norm for estimating the $L^\infty$ norm of $u$ as in the proof of Lemma 5.1. Instead we employ the Gagliardo–Nirenberg inequality

$$
\|u\|_{L^\infty} \le G_6 \|u\|_{H^2}^{d/4} \|u\|_{L^2}^{1-d/4}, \tag{5.5}
$$

which holds for $d \le 3$. Thus, using the inequality (2.8) we get

$$
\|u\|_{L^\infty} \le G_6 C_0^{d/4} \big(K_0^{d/8} + \|\Delta u\|_{L^2}^{d/4}\big) K_0^{1/2-d/8}
\le C_6 \big(1 + \|\Delta u\|_{L^2}^{d/4}\big),
$$

where we have set

$$
C_6 \overset{\mathrm{def}}{=} G_6 C_0^{d/4} K_0^{1/2-d/8} \max(1, K_0^{d/8}). \tag{5.6}
$$

The above estimate together with the inequality (5.2) yields d/4

∂t u2L2 ≤ 2Ru2L2 + 4.1 |b0 + iµ|C6 (1 + C62σ −1 (1 + uL2 )2σ −1 ) d/4

×(1 + uL2 )u2L2

d/4

d/2

+ C4 (1 + C62σ −1 (1 + uL2 )2σ −1 )(1 + uL2 )uL2 . Since d < 4 the largest exponent of uL2 is 2 + σ d/2, and therefore, there exist two constants L1 , L2 > 0 only depending on d, σ, R, .1 , |b0 + iµ|, C4 , and C6 such that 2+σ d/2

∂t u2L2 ≤ L1 + L2 uL2

.

(5.7)

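From the Gronwall-type inequality of the appendix we conclude that

$$
\|\Delta u(t)\|_{L^2}^2 \le K_5(t) \quad \text{for all } 0 \le t < T^*,
$$

where

$$
K_5(t) \overset{\mathrm{def}}{=} \left[ \left( \|\Delta u_0\|_{L^2}^2 + \Big(\frac{L_1}{L_2}\Big)^{4/(4+\sigma d)} \right)^{-\sigma d/4} - \frac{\sigma d}{4} L_2 t \right]^{-4/\sigma d} - \Big(\frac{L_1}{L_2}\Big)^{4/(4+\sigma d)}
$$

and

$$
T^* \overset{\mathrm{def}}{=} \frac{4}{\sigma d L_2} \left( \|\Delta u_0\|_{L^2}^2 + \Big(\frac{L_1}{L_2}\Big)^{4/(4+\sigma d)} \right)^{-\sigma d/4}. \tag{5.8}
$$

This proves the lemma.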
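The bound $K_5(t)$ and the time $T^*$ are an instance of the Gronwall-type Lemma 8.1 of the Appendix with $y = \|\Delta u\|_{L^2}^2$, $\alpha = 1 + \sigma d/4$, $A = L_2$ and $B = L_1$. As a rough numerical sanity check (a sketch only; the function names and the sample parameter values are hypothetical), one can integrate the extremal equation $y' = Ay^\alpha + B$ with a classical Runge–Kutta scheme and compare the result with the bound of the lemma on $[0, T^*)$.

```python
def shift_constant(A, B, alpha):
    """C = (B/A)**(1/alpha), the shift used in the proof of Lemma 8.1."""
    return (B / A) ** (1.0 / alpha)


def blow_up_time(y0, A, B, alpha):
    """T* = [(alpha - 1) * A * (y0 + C)**(alpha - 1)]**(-1)."""
    C = shift_constant(A, B, alpha)
    return 1.0 / ((alpha - 1.0) * A * (y0 + C) ** (alpha - 1.0))


def upper_bound(t, y0, A, B, alpha):
    """Bound of Lemma 8.1, valid for 0 <= t < T*."""
    C = shift_constant(A, B, alpha)
    base = (y0 + C) ** (1.0 - alpha) + (1.0 - alpha) * A * t
    return base ** (-1.0 / (alpha - 1.0)) - C


def solve_extremal(y0, A, B, alpha, t_end, steps=2000):
    """Classical RK4 for the extremal case y' = A * y**alpha + B, y(0) = y0."""
    def rhs(y):
        return A * y ** alpha + B

    h = t_end / steps
    y = y0
    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs(y + 0.5 * h * k1)
        k3 = rhs(y + 0.5 * h * k2)
        k4 = rhs(y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return y
```

For example, with $y(0) = A = B = 1$ and $\alpha = 2$ the exact solution is $\tan(t + \pi/4)$, which blows up at $\pi/4$, while the lemma guarantees a bound only up to $T^* = 1/2$. The bound is therefore safe but not sharp.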


Remark 5.4. Lemma 5.2 can be improved slightly by replacing the Gagliardo–Nirenberg inequality (5.5) by the inequality [40]

$$
\|u\|_{L^\infty} \le B_0 \|u\|_{H^1}^{1/2} \|u\|_{H^2}^{1/2} \tag{5.9}
$$

if $d = 3$, and by the Brézis–Gallouet inequality [5, 35]

$$
\|u\|_{L^\infty} \le B_1 \|u\|_{H^1} \big(1 + \ln(1 + B_2\|u\|_{H^2})\big)^{1/2} \tag{5.10}
$$

if $d = 2$, in the sense that the final time $T^* > 0$ can be chosen larger than in (5.8). Indeed, from (5.2) we obtain for $d = 3$, using (5.9),

$$
\partial_t \|\Delta u\|_{L^2}^2 \le B_3 + B_4 \|\Delta u\|_{L^2}^{2+\sigma},
$$

and for $d = 2$, employing the elementary inequality

$$
\ln(1 + x) \le C_\delta (1 + x^\delta) \quad \text{for } x \ge 0,\ \delta > 0,
$$

and the Brézis–Gallouet inequality (5.10),

$$
\partial_t \|\Delta u\|_{L^2}^2 \le B_5 + B_\delta \|\Delta u\|_{L^2}^{2+\delta},
$$

for any $\delta > 0$. Now Lemma 8.1 of the Appendix applies. Since the exponents of the right-hand sides of the above two estimates are smaller than the corresponding exponent in (5.7) (if $\delta < \sigma$), we can choose $T^* > 0$ larger than in (5.8).

Remark 5.5. The estimates of Lemmas 5.1 and 5.2 seem to be new in the presented form.

6. Proof of Theorems 2.3 and 2.4

Proof of Theorem 2.3. The difference $w = u_\varepsilon - u$ satisfies the equation

$$
\partial_t w = i\nu\Delta w - i\mu\big(f(u_\varepsilon) - f(u)\big) + Rw + a\Delta u_\varepsilon - b f(u_\varepsilon).
$$

Multiplying this equation by $w^*$, integrating over $\Omega$, taking the real part and finally using the formula (4.1) gives

$$
\frac{1}{2}\partial_t \|w\|_{L^2}^2 = \mu \operatorname{Im} \int_\Omega\!\int_0^1 w_\lambda^2\, g'(|w_\lambda|^2)(w^*)^2\,d\lambda\,dx + R\|w\|_{L^2}^2
+ a\operatorname{Re} \int_\Omega \Delta u_\varepsilon\, w^*\,dx - b\operatorname{Re} \int_\Omega f(u_\varepsilon) w^*\,dx,
$$

where wλ = λuε + (1 − λ)u. Therefore, by assumption (2.3), 2 ∂t wL2 ≤ 2|µ|. (1 + max(1, 22σ −1 )(|uε |2σ + |u|2σ ))|w|2 dx + 2Rw2L2 + 2a uε w ∗ dx 2σ + 2b. (1 + |uε | )|uε ||w|dx

2σ 2 ≤ 2|µ|. (1 + max(1, 22σ −1 )(uε 2σ L∞ + uL∞ ))wL2 + 2Rw2L2 + 2a uε w ∗ dx

ε + 2b.(1 + uε 2σ L∞ )u L2 wL2 .

(6.1)


Now, let d = 1. Then, using the inequality (5.4) and observing that G ≤

√

2, we get

uε 2L∞ ≤ G2 (K0 K3 )1/2 ≤ (4K0 K3 )1/2 , u2L∞ ≤ G2 (K0 K3 )1/2 ≤ (4K0 K3 )1/2 , where we also have used Lemma 3.1 and Corollary 3.5. Hence, since 1/2

wL2 ≤ uε L2 + uL2 ≤ 2K0 , 1/2

wH 1 ≤ uε H 1 + uH 1 ≤ 2K3 , we obtain from (6.1) ∂t w2L2 ≤ 2|µ|. (1 + max(2, 22σ )(4K0 K3 )σ/2 )w2L2 + 2Rw2L2 + 2a∂x uε L2 ∂x wL2 + 4b.(1 + (4K0 K3 )σ/2 )K0

≤ C7 w2L2 + a · 4K3 + b · 4.K0 (1 + (4K0 K3 )1/2 ), where we have set C7 = 2R + 2|µ|. (1 + max(2, 4σ )(4K0 K3 )σ/2 ). def

(6.2)

Applying Gronwall’s lemma, we conclude that w(t)2L2 ≤ w(0)2L2 + a · 4K3 eC7 T + b · 4.K0 (1 + (4K0 K3 )σ/2 )eC7 T , which proves the theorem.

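Proof of Theorem 2.4. We start from the inequality (6.1) which is valid for any $d \ge 1$. In order to bound the $L^\infty$ norm of $u_\varepsilon$ and $u$ we use the Gagliardo–Nirenberg inequality (5.5), the inequality (2.8) and Lemmas 5.1 and 5.2:

$$
\|u_\varepsilon\|_{L^\infty} \le G_6 \|u_\varepsilon\|_{H^2}^{d/4} \|u_\varepsilon\|_{L^2}^{1-d/4}
\le G_6 C_0^{d/4} \big(\|u_\varepsilon\|_{L^2}^{d/4} + \|\Delta u_\varepsilon\|_{L^2}^{d/4}\big) \|u_\varepsilon\|_{L^2}^{1-d/4} \le C_8,
$$

where

$$
C_8 \overset{\mathrm{def}}{=} G_6 C_0^{d/4} K_0^{(4-d)/8} \big(K_0^{d/8} + \max(K_4, K_5)^{d/8}\big). \tag{6.3}
$$

Moreover,

$$
\|u\|_{L^\infty} \le C_8.
$$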

Then the estimate (6.1) yields ∂t w2L2 ≤ 2R + 2|µ|. (1 + max(2, 4σ )C82σ ) w2L2 1/2

+ 2auε L2 wL2 + 2b.(1 + C82σ )K0 wL2 1/2

≤ 2C9 w2L2 + 2[a max(K4 , K5 )1/2 + b.(1 + C82σ )K0 ]wL2 , where C9 = R + |µ|. (1 + max(2, 4σ )C82σ ). def

(6.4)


Therefore, 1/2

∂t wL2 ≤ C9 wL2 + a · max(K4 , K5 )1/2 + b · .(1 + C82σ )K0 , and we obtain from Gronwall’s lemma w(t)L2 ≤ w(0)L2 + a · max(K4 , K5 )1/2 eC9 t + b · .(1 + C82σ )K0 eC9 t , 1/2

for all 0 ≤ t < T ∗ (and 0 ≤ t ≤ T if d = 1). This finishes the proof.

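7. Proof of Theorem 2.5

First we prove the following uniform bound in $H^n(\Omega)$ for the solution $u$ to (1.1)–(1.2).

Lemma 7.1. Under the assumptions of Theorem 2.5 there exist constants $T^* > 0$ and $K_6(t) > 0$ independent of $a$, $b$ such that

$$
\|u(t)\|_{H^n} \le K_6(t) \quad \text{for all } 0 \le t < T^*.
$$

Proof. Let $D^\alpha u$ denote the partial derivative of $u$ according to the multi-index $\alpha$. Applying $D^\alpha$ with $|\alpha| = n$ to the CGL equation (1.1), multiplying this equation by $D^\alpha u^*$, taking the real part and integrating over $\Omega$ yields

$$
\partial_t \sum_{|\alpha|=n} \int_\Omega |D^\alpha u|^2\,dx
= -2a \sum_{|\alpha|=n+1} \int_\Omega |D^\alpha u|^2\,dx + 2R \sum_{|\alpha|=n} \int_\Omega |D^\alpha u|^2\,dx
- 2\operatorname{Re}\,(b + i\mu) \sum_{|\alpha|=n} \int_\Omega D^\alpha(|u|^{2\sigma}u)\,D^\alpha u^*\,dx.
$$

Using the notations

$$
\nabla^n u = (D^\alpha u)_{|\alpha|=n}, \qquad \|\nabla^n u\|_{L^2}^2 = \sum_{|\alpha|=n} \int_\Omega |D^\alpha u|^2\,dx,
$$

we obtain

$$
\partial_t \|\nabla^n u\|_{L^2}^2 \le 2R\|\nabla^n u\|_{L^2}^2 + 2|b_0 + i\mu| \left| \int_\Omega \nabla^n(|u|^{2\sigma}u)\cdot\nabla^n u^*\,dx \right|. \tag{7.1}
$$

From [28, p. 183] we get the existence of a constant $c(d) > 0$ only depending on the space dimension, such that the last term of the above inequality is majorized by

$$
2|b_0 + i\mu|\,c(d)\,\|\nabla^n u\|_{L^2}^2 \|u\|_{L^\infty}^{2\sigma}.
$$

The $L^\infty$ norm of $u$ is estimated by using the Gagliardo–Nirenberg inequality

$$
\|u\|_{L^\infty} \le G_7 \|u\|_{H^n}^{\theta} \|u\|_{L^2}^{1-\theta}, \tag{7.2}
$$

with $\theta = d/2n < 1$, so that we conclude from (7.1)

$$
\partial_t \|\nabla^n u\|_{L^2}^2 \le 2R\|\nabla^n u\|_{L^2}^2 + 2G_7^{2\sigma} c(d)|b_0 + i\mu| K_0^{\sigma(1-\theta)} \|u\|_{H^n}^{2\sigma\theta} \|\nabla^n u\|_{L^2}^2.
$$

There exists a constant $\tilde{C}_0 > 0$ such that

$$
\|u\|_{H^n} \le \tilde{C}_0 \big(\|u\|_{L^2} + \|\nabla^n u\|_{L^2}\big).
$$

Hence, introducing $y(t) = \|\nabla^n u(t)\|_{L^2}^2$, this gives

$$
\partial_t y \le (2R + C_{10}K_0^{\sigma\theta})\,y + C_{10}\,y^{1+\sigma\theta},
$$

where

$$
C_{10} \overset{\mathrm{def}}{=} 2G_7^{2\sigma} c(d)|b_0 + i\mu| K_0^{\sigma(1-\theta)} \tilde{C}_0^{2\sigma\theta} \max(1, 2^{2\sigma\theta-1}).
$$

Employing Hölder's inequality, there exist two constants $D_1, D_2 > 0$ such that

$$
\partial_t y \le D_1 + D_2\,y^{1+\sigma\theta}.
$$

Lemma 8.1 of the Appendix implies that for $0 \le t < T^*$,

$$
y(t) \le K_6(t) \overset{\mathrm{def}}{=} \left[ \left( y(0) + \Big(\frac{D_1}{D_2}\Big)^{1/(1+\sigma\theta)} \right)^{-\sigma\theta} - \sigma\theta D_2 t \right]^{-1/\sigma\theta} - \Big(\frac{D_1}{D_2}\Big)^{1/(1+\sigma\theta)},
$$

where $T^* > 0$ is given by

$$
T^* \overset{\mathrm{def}}{=} \left[ \sigma\theta D_2 \left( y(0) + \Big(\frac{D_1}{D_2}\Big)^{1/(1+\sigma\theta)} \right)^{\sigma\theta} \right]^{-1}.
$$

The lemma is proved.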

Proof of Theorem 2.5. The proof is very similar to the proof of Theorem 2.4. We only have to use the Gagliardo–Nirenberg inequality (7.2) instead of (5.5) and to apply Lemma 7.1 in order to obtain a uniform L∞ bound on uε .

8. Appendix

Lemma 8.1. Let the function $y : (0, T) \to \mathbb{R}$ be absolutely continuous and satisfy the following differential inequality:

$$
y' \le Ay^\alpha + B \quad \text{for } t > 0,
$$

with $\alpha > 1$, $A, B > 0$ and $y(0) > 0$. Then there exists a constant $T^* > 0$ such that for all $0 \le t < T^*$,

$$
y(t) \le \left[ \left( y(0) + \Big(\frac{B}{A}\Big)^{1/\alpha} \right)^{1-\alpha} + (1 - \alpha)At \right]^{-1/(\alpha-1)} - \Big(\frac{B}{A}\Big)^{1/\alpha},
$$

where $T^* > 0$ is given by

$$
T^* = \left[ (\alpha - 1)A \big( y(0) + (B/A)^{1/\alpha} \big)^{\alpha-1} \right]^{-1}.
$$

Proof. Via the change of variable $z = y + C$, where $C = (B/A)^{1/\alpha}$, we get from $(y + C)^\alpha \ge y^\alpha + C^\alpha$ the differential inequality

$$
z' - Az^\alpha \le 0.
$$

This inequality can be integrated:

$$
z(t) \le \big[ z(0)^{1-\alpha} + (1 - \alpha)At \big]^{-1/(\alpha-1)},
$$

for all $0 \le t < T^*$. The lemma follows.

Acknowledgements. The authors would like to thank David Levermore for stimulating discussions. The authors acknowledge support from the DAAD-PROCOPE Program, the DAAD “Acciones Integradas” Program, and from the TMR Project “Asymptotic Methods in Kinetic Theory”, grant number ERB-FMBX-CT97-0157. The second author is partly supported by the Gerhard-Hess Program of the Deutsche Forschungsgemeinschaft, grant number JU359/3-1.

References

1. Bartuccelli, M., Constantin, P., Doering, C., Gibbon, J. and Gisselfält, M.: On the possibility of soft and hard turbulence in the complex Ginzburg–Landau equation. Physica D 44, 421–444 (1990)
2. Blennerhassett, P.: On the generation of waves by wind. Philos. Trans. Roy. Soc. London, Ser. A 298, 451–494 (1980)
3. Bourgain, J.: Exponential sums and nonlinear Schrödinger equations. Geom. Funct. Anal. 3, 157–178 (1993)
4. Bourgain, J.: Fourier transform restriction phenomena for certain lattice subsets and applications to nonlinear evolution equations. Geom. Funct. Anal. 3, 109–156 (1993)
5. Brézis, H. and Gallouet, T.: Nonlinear Schrödinger evolution equations. Nonlin. Anal. TMA 4, 677–681 (1980)
6. Cazenave, T.: An Introduction to Nonlinear Schrödinger Equations. Textos de Métodos Matemáticos 26. Rio de Janeiro: Universidade Federal do Rio de Janeiro, 3rd edition, 1996
7. Cazenave, T. and Weissler, F.: The Cauchy problem for the critical nonlinear Schrödinger equation in H^s. Nonlin. Anal. TMA 14, 807–836 (1990)
8. Cross, M. and Hohenberg, P.: Pattern formation outside of equilibrium. Rev. of Modern Phys. 65, 851–1089 (1993)
9. Doering, C., Gibbon, J. and Levermore, C.D.: Weak and strong solutions of the complex Ginzburg-Landau equation. Physica D 71, 285–318 (1994)
10. Gilbarg, D. and Trudinger, N.S.: Elliptic Partial Differential Equations of Second Order. Berlin: Springer, 2nd edition, 1983
11. Ginibre, J.: Le problème de Cauchy pour des EDP semi-linéaires périodiques en variables d’espace. Astérisque 237, 163–187 (1996)
12. Ginibre, J.: An introduction to nonlinear Schrödinger equations. In: Agemi et al., editor, Nonlinear waves, volume 10 of Gakkutosho Gakuto Int. Ser., Tokyo, 1997, pp. 85–133
13. Ginibre, J. and Ozawa, T.: Long range scattering for non-linear Schrödinger and Hartree equations in space dimension n ≥ 2. Commun. Math. Phys. 151, 619–645 (1993)
14. Ginibre, J. and Velo, G.: The Cauchy problem in local spaces for the complex Ginzburg-Landau equation. I. Compactness methods. Physica D 95, 191–228 (1996)
15. Ginibre, J. and Velo, G.: The Cauchy problem in local spaces for the complex Ginzburg-Landau equation. II. Contraction methods. Commun. Math. Phys. 187, 45–79 (1997)
16. Ginibre, J. and Velo, G.: Generalized estimates and Cauchy problem for the logarithmic complex Ginzburg-Landau equation. J. Math. Phys. 38, 2475–2482 (1997)
17. Ginzburg, V. and Landau, L.: On the theory of superconductivity. Zh. Eksp. Fiz. 20, 1064 (1950). English transl. in: Men of Physics: L. D. Landau, Vol. I, D. Ter Haar (ed.), New York: Pergamon Press, 1965, pp. 546–568
18. Goldman, D. and Sirovich, L.: The one-dimensional complex Ginzburg–Landau equation in the low dissipation limit. Nonlinearity 7, 417–439 (1994)


19. Huber, G. and Alstrom, P.: Universal decay of vortex density in two dimensions. Physica A 195, 448–456 (1993)
20. Huber, G., Alstrom, P. and Bohr, T.: Nucleation and transients at the onset of vortex turbulence. Phys. Rev. Lett. 69, 2380–2383 (1992)
21. Kastenholz, M., Lange, H. and Nieder, D.: Periodic solutions of nonlinear Schrödinger equations. Portug. Math. 46, 517–537 (1989)
22. Kenig, C., Ponce, G. and Vega, L.: Small solutions in nonlinear Schrödinger equations. Annales I.H.P., Anal. nonlin. 10, 255–288 (1993)
23. Kirrmann, P., Schneider, G. and Mielke, A.: The validity of modulation equations for extended systems with cubic nonlinearities. Proc. Roy. Soc. Edinb., Sect. A 122, 85–91 (1992)
24. Kuramoto, Y.: Chemical Oscillations, Waves and Turbulence. Volume 19 of Series in Synergetics. New York: Springer, 1984
25. Lega, J. and Fauve, S.: Traveling hole solutions to the complex Ginzburg-Landau equation as perturbation of nonlinear Schrödinger dark solitons. Physica D 102, 234–252 (1979)
26. LeMesurier, B., Papanicolau, G., Sulem, C. and Sulem, P.: Local structure of the self-focusing singularity of the nonlinear Schrödinger equation. Physica D 32, 210–226 (1988)
27. Levermore, C.D.: Work in preparation, 2000
28. Levermore, C.D. and Oliver, M.: The complex Ginzburg-Landau equation as a model problem. In: P. Deift, editor, Dynamical Systems and Probabilistic Methods in Partial Differential Equations. Volume 31 of Lectures in Appl. Math. Providence, RI: AMS, 1996, pp. 141–190
29. Lions, J.L.: Quelques méthodes de résolution des problèmes aux limites non linéaires. Paris: Dunod, 1969
30. Merle, F. and Tsutsumi, Y.: L²-concentration of blow-up solutions for the nonlinear Schrödinger equation with critical power nonlinearity. J. Diff. Eqs. 84, 205–214 (1990)
31. Mielke, A. and Schneider, G.: Derivation and justification of the complex Ginzburg-Landau equation as a modulation equation. In: P. Deift et al., editor, Dynamical Systems and Probabilistic Methods in Partial Differential Equations. Volume 31 of Lectures in Appl. Math. Providence, RI: AMS, 1996, pp. 191–216
32. Newell, A., Passot, T. and Lega, J.: Order parameter equations for patterns. Annual Rev. Fluid Mech. 25, 399–453 (1993)
33. Newell, A. and Whitehead, J.: Finite bandwidth, finite amplitude convection. J. Fluid Mech. 38, 279–303 (1969)
34. Newell, A. and Whitehead, J.: Review of the finite bandwidth concept. In: H. Leipholz, editor, Proceedings of the Internat. Union of Theor. and Appl. Math. Berlin: Springer, 1971, pp. 284–289
35. Ozawa, T.: On critical cases of Sobolev’s inequalities. J. Funct. Anal. 127, 259–269 (1995)
36. Rasmussen, I. and Rypdal, K.: Blow-up in Schrödinger equations I – A general review. Physica Scripta 33, 498–504 (1987)
37. Simon, J.: Compact sets in the space L^p(0, T; B). Ann. Math. Pura Appl. 146, 65–96 (1987)
38. Stark, D.: Structure and turbulence in the complex Ginzburg–Landau equation with a nonlinearity of arbitrary order. PhD thesis, University of Arizona, Tucson, USA, 1995
39. Stuart, J. and Di Prima, R.: The Eckhaus and Benjamin-Feir resonance mechanisms. Proc. Roy. Soc. Lond. Ser. A 362, 27–41 (1978)
40. Temam, R.: Infinite-Dimensional Dynamical Systems in Mechanics and Physics. New York: Springer, 1988
41. Wu, J.: The inviscid limit of the complex Ginzburg-Landau equation. J. Diff. Eqs. 142, 413–433 (1998)

Communicated by J. L. Lebowitz

Commun. Math. Phys. 214, 227 – 232 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Calculation of a Certain Determinant

Madan Lal Mehta¹, Rong Wang²

¹ S.Ph.T., C.E. de Saclay, 91191 Gif-sur-Yvette Cedex, France; member of C.N.R.S.
² Institute of Biophysics, Chinese Academy of Sciences, Beijing, P.R. China

Received: 30 August 1999 / Accepted: 29 March 2000

Abstract: The $n \times n$ determinant $\det[(a + j - i)\Gamma(b + j + i)]$ is evaluated. This completes the calculation of the Mellin transform of the probability density of the determinant of a random quaternion self-dual matrix taken from the Gaussian symplectic ensemble. The inverse Mellin transform then gives the latter probability density itself.

1. Introduction and Results

Matrices whose elements, or real parameters determining the elements, are Gaussian random variables have been extensively studied for the statistical properties of their spectra [1]. One particular property is the probability density of the determinant (PDD) of such matrices. The PDD of $n \times n$ real symmetric matrices was derived by Nyquist et al. [2] to be a Meijer G-function. The PDD of $n \times n$ matrices belonging to the following classes was considered recently by Normand and one of the authors [3]: (i) complex hermitian, (ii) complex, (iii) quaternion real self-dual, (iv) quaternion real, and (v) real symmetric matrices. The method used was to compute the Mellin transform of the PDD and then invert it to get the PDD itself. The Mellin transform in case (i) turned out to be a determinant with elements $\Gamma(a + i + j)$, simple enough to evaluate. Thus the PDD for matrices of cases (i) and (ii) was found to be either a Meijer G-function or a linear combination of two Meijer G-functions. For case (iii) the Mellin transform of the PDD was found to depend on determinants of matrices with elements $[(a + j - i)\Gamma(b + j + i)]$ and $a = 0$ or $1/2$, $b = s/2$ or $s \pm 1/2$. These determinants are evaluated here, thus completing the derivation of the PDD of matrices in case (iii). The question of the PDD of matrices in cases (iv) and (v) remains open.

In this brief note we evaluate the determinant of any matrix

$$
[(a + j - i)\Gamma(b + j + i)]_{i,j=0,1,\ldots,n-1} \tag{1}
$$

or its pfaffian when $a = 0$. The result is

$$
M_n(a, b) = \det[(a + j - i)\Gamma(b + j + i)]_{i,j=0,1,\ldots,n-1} \tag{2}
$$

$$
= D_n \prod_{i=0}^{n-1} i!\,\Gamma(b + i), \tag{3}
$$

$$
M_{2n}(0, b) = \operatorname{Pf}[(j - i)\Gamma(b + j + i)]_{i,j=0,1,\ldots,2n-1} = \prod_{i=0}^{n-1} (2i + 1)!\,\Gamma(b + 2i + 1), \tag{4}
$$

where

$$
D_n = \det[a\delta_{i,j} - \delta_{i,j+1} + j(b + i)\delta_{i,j-1}] \tag{5}
$$

$$
= \text{coeff. of } (z^n/n!) \text{ in } (1 - z)^{-(b+a)/2}(1 + z)^{-(b-a)/2} \tag{6}
$$

$$
= \sum_{k=0}^{n} (-1)^k \binom{n}{k} \Big(\frac{b - a}{2}\Big)_k \Big(\frac{b + a}{2}\Big)_{n-k} \tag{7}
$$

with Pochhammer's symbol

$$
(a)_n = \Gamma(a + n)/\Gamma(a). \tag{8}
$$

The expression for $D_n$ simplifies when $a$ is a small integer or when $a = b$.

The final result for the probability density $g_n(y)$ of the determinant of an $n \times n$ random quaternion self-dual matrix is as follows. It is convenient to consider the even and odd parts $g_n^{\pm}(y) = \frac{1}{2}[g_n(y) \pm g_n(-y)]$ of $g_n(y)$ separately. The Mellin transform of the even part $g_n^{+}(y)$ was found [3] to be a constant times $M_n\big(\frac{1}{2}, \frac{s}{2}\big)$, while that of the odd part $g_n^{-}(y)$ was found to be a constant times $\operatorname{Pf}\big[(j - i)\Gamma\big(j + i + \frac{s+1}{2}\big)\big] \times \operatorname{Pf}\big[(j - i)\Gamma\big(j + i + \frac{s-1}{2}\big)\big]$ if $n$ is even, and zero if $n$ is odd. The Mellin transforms of $g_n^{\pm}(y)$ are thus seen to be either a product of Gamma functions or a linear combination of these products. The $g_n^{\pm}(y)$ and their sum $g_n(y)$ themselves are thus either a Meijer G-function or a linear combination of two Meijer G-functions. However, it is cumbersome to write their expressions except for small values of $n$. The corresponding problem for random real symmetric matrices or for quaternion real matrices with Gaussian element densities remains open, as noted earlier.
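The evaluation (2)–(3), together with the finite sum (7) for $D_n$, can be checked in exact rational arithmetic for small $n$. The sketch below rests on one normalization that is an assumption of this illustration: since $\Gamma(b + i + j) = \Gamma(b)\,(b)_{i+j}$, both sides of (3) are divided by $\Gamma(b)^n$, so only Pochhammer symbols appear. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def poch(x, n):
    """Pochhammer symbol (x)_n = x (x+1) ... (x+n-1), with (x)_0 = 1."""
    result = Fraction(1)
    for m in range(n):
        result *= x + m
    return result


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    rows = [row[:] for row in matrix]
    n = len(rows)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if rows[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            rows[c], rows[pivot] = rows[pivot], rows[c]
            result = -result
        result *= rows[c][c]
        for r in range(c + 1, n):
            factor = rows[r][c] / rows[c][c]
            for k in range(c, n):
                rows[r][k] -= factor * rows[c][k]
    return result


def scaled_determinant(n, a, b):
    """det[(a+j-i) (b)_{i+j}], i.e. M_n(a, b) divided by Gamma(b)**n."""
    return det([[(a + j - i) * poch(b, i + j) for j in range(n)]
                for i in range(n)])


def d_n(n, a, b):
    """D_n computed from the finite sum (7)."""
    return sum((-1) ** k * comb(n, k) * poch((b - a) / 2, k) * poch((b + a) / 2, n - k)
               for k in range(n + 1))


def scaled_closed_form(n, a, b):
    """D_n * prod_i i! (b)_i, i.e. the right side of (3) divided by Gamma(b)**n."""
    product = Fraction(1)
    for i in range(n):
        product *= factorial(i) * poch(b, i)
    return d_n(n, a, b) * product
```

For instance, for $n = 2$ both sides reduce to $(a^2 + b)\,b$ after the normalization, which agrees with $D_2 = a^2 + b$ read off from the tridiagonal determinant (5). Passing `a` and `b` as `Fraction` objects keeps every comparison exact.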

2. Evaluation of the Determinant

We will need the following lemma.

Lemma. For $j$ a non-negative integer and $A$ and $B$ complex numbers one has the identity

$$F(j,A,B) \equiv \sum_{k=0}^{j} (-1)^k \binom{j}{k} (A+j-k)_k\,(B)_{j-k} = (B-A-j+1)_j. \tag{9}$$

Proof. The lemma is trivial for $j = 0$ and easy to verify for $j = 1$. Suppose that it is true for some positive integer $j$. Then for the next integer

$$\begin{aligned}
F(j+1,A,B) &= \sum_{k=0}^{j+1} (-1)^k \binom{j+1}{k} (A+j+1-k)_k\,(B)_{j+1-k} \\
&= \sum_{k=0}^{j+1} (-1)^k \left[\binom{j}{k} + \binom{j}{k-1}\right] (A+j+1-k)_k\,(B)_{j+1-k} \\
&= B \sum_{k} (-1)^k \binom{j}{k} (A+1+j-k)_k\,(B+1)_{j-k} \\
&\quad - (A+j) \sum_{k} (-1)^{k-1} \binom{j}{k-1} (A+j-k+1)_{k-1}\,(B)_{j-k+1} \\
&= B\,F(j,A+1,B+1) - (A+j)\,F(j,A,B) \\
&= (B-A-j)\,F(j,A,B) = (B-A-j)_{j+1},
\end{aligned} \tag{10}$$

where the last line uses $F(j,A+1,B+1) = F(j,A,B) = (B-A-j+1)_j$, which holds by the induction hypothesis. And so the lemma is true for every positive integer $j$. Note that the binomial coefficient $\binom{j}{k}$ is zero if the integer $k$ is either negative or greater than $j$. Actually, $k! = \Gamma(k+1)$ can be replaced by $\infty$ whenever $k$ is a negative integer.

Corollary.

$$\sum_{k=0}^{j} (-1)^k \binom{j}{k} (j-k)\,(A+j-k)_k\,(B)_{j-k} = jB\,(B-A-j+2)_{j-1}. \tag{11}$$

Proof.

$$\begin{aligned}
\text{Left hand side} &= jB \sum_{k=0}^{j-1} (-1)^k \binom{j-1}{k} (A+j-k)_k\,(B+1)_{j-1-k} \\
&= jB\,F(j-1,A+1,B+1) = jB\,(B-A-j+2)_{j-1}.
\end{aligned} \tag{12}$$
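The lemma (9) and its corollary (11) are finite identities between polynomials in $A$ and $B$, so they can be spot-checked with exact rational arithmetic. The following is a minimal sketch under that assumption; the function names `poch`, `F` and `corollary_lhs` are illustrative and not from the paper.

```python
from fractions import Fraction
from math import comb


def poch(x, n):
    """Pochhammer symbol (x)_n = x (x+1) ... (x+n-1), with (x)_0 = 1."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def F(j, A, B):
    """Left-hand side of the lemma, Eq. (9)."""
    return sum((-1) ** k * comb(j, k) * poch(A + j - k, k) * poch(B, j - k)
               for k in range(j + 1))


def corollary_lhs(j, A, B):
    """Left-hand side of the corollary, Eq. (11)."""
    return sum((-1) ** k * comb(j, k) * (j - k)
               * poch(A + j - k, k) * poch(B, j - k)
               for k in range(j + 1))
```

Evaluating `F(j, A, B)` at a few rational points and comparing with `poch(B - A - j + 1, j)` confirms the closed form for the chosen values; it does not replace the induction proof.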

Determinant. The determinant of a matrix is not changed if we add to any of its rows (columns) an arbitrary linear combination of the other rows (columns). Replacing the $j$th column $C_j = (a+j-i)\,\Gamma(b+j+i)$ by

$$\begin{aligned}
C'_j &= \sum_{k=0}^{j} (-1)^k \binom{j}{k} \frac{\Gamma(b+j)}{\Gamma(b+j-k)}\, C_{j-k} \\
&= (a-i) \sum_{k=0}^{j} (-1)^k \binom{j}{k} (b+j-k)_k\,\Gamma(b+i+j-k) \\
&\quad + \sum_{k=0}^{j} (-1)^k \binom{j}{k} (j-k)\,(b+j-k)_k\,\Gamma(b+i+j-k)
\end{aligned} \tag{13}$$

$$\begin{aligned}
&= \Gamma(b+i)\,[(a-i)\,F(j,b,b+i) + j(b+i)\,F(j-1,b+1,b+i+1)] \\
&= \Gamma(b+i)\,[(a-i)\,(i-j+1)_j + j(b+i)\,(i-j+2)_{j-1}] \\
&= \Gamma(b+i) \left[(a-i)\frac{i!}{(i-j)!} + j(b+i)\frac{i!}{(i-j+1)!}\right],
\end{aligned} \tag{14}$$

where in the third line above we have used the lemma and its corollary. Taking out the factors $\Gamma(b+i)$ one has

$$M_n(a,b) = \prod_{i=0}^{n-1} \Gamma(b+i)\; \det\left[(a-i)\frac{i!}{(i-j)!} + j(b+i)\frac{i!}{(i-j+1)!}\right]. \tag{15}$$

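Now replace the row

$$R_i = (a-i)\frac{i!}{(i-j)!} + j(b+i)\frac{i!}{(i-j+1)!} \tag{16}$$

by the linear combination

$$\begin{aligned}
R'_i &= \sum_{k=0}^{i} (-1)^k \binom{i}{k} R_{i-k} \\
&= \sum_{k=0}^{i} (-1)^k \binom{i}{k} \left[(a-i+k)\frac{(i-k)!}{(i-k-j)!} + j(b+i-k)\frac{(i-k)!}{(i-k-j+1)!}\right]
\end{aligned} \tag{17}$$

$$\begin{aligned}
&= i! \sum_{k=0}^{i} (-1)^k \Big[\frac{a-i}{k!\,(i-j-k)!} + \frac{1}{(k-1)!\,(i-j-k)!} + \frac{j(b+i)}{k!\,(i-k-j+1)!} - \frac{j}{(k-1)!\,(i-k-j+1)!}\Big] \\
&= i! \Big[\frac{a-i}{(i-j)!} \sum_{k=0}^{i} (-1)^k \binom{i-j}{k} - \frac{1}{(i-j-1)!} \sum_{k=0}^{i} (-1)^{k-1} \binom{i-j-1}{k-1} \\
&\qquad + \frac{j(b+i)}{(i-j+1)!} \sum_{k=0}^{i} (-1)^k \binom{i-j+1}{k} + \frac{j}{(i-j)!} \sum_{k=0}^{i} (-1)^{k-1} \binom{i-j}{k-1}\Big] \\
&= i!\,[(a-i)\,\delta_{i,j} - \delta_{i,j+1} + j(b+i)\,\delta_{i,j-1} + j\,\delta_{i,j}] \\
&= i!\,[a\,\delta_{i,j} - \delta_{i,j+1} + j(b+i)\,\delta_{i,j-1}].
\end{aligned} \tag{18}$$

Thus

$$M_n(a,b) = D_n \prod_{i=0}^{n-1} i!\,\Gamma(b+i), \tag{19}$$

where

$$D_n = \det[a\,\delta_{i,j} - \delta_{i,j+1} + j(b+i)\,\delta_{i,j-1}]. \tag{20}$$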

Expanding by the last row and last column, one gets the recurrence relation

$$D_{n+1} = a D_n + n(b+n-1) D_{n-1}, \tag{21}$$

$$D_0 = 1, \quad D_1 = a, \quad D_2 = a^2 + b, \quad \dots\,. \tag{22}$$

To find $D_n$ introduce the generating function

$$f(z) = \sum_{n=0}^{\infty} \frac{z^n}{n!} D_n. \tag{23}$$

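Multiplying Eq. (21) on both sides by $z^n/n!$ and summing over $n$ from 0 to $\infty$, one has

$$f'(z) = a f(z) + b z f(z) + z^2 f'(z), \tag{24}$$

since

$$\sum_{n=0}^{\infty} D_{n+1} \frac{z^n}{n!} = \frac{d}{dz} \sum_{n=0}^{\infty} D_n \frac{z^n}{n!} \tag{25}$$

and

$$\begin{aligned}
\sum_{n=0}^{\infty} n(b+n-1) D_{n-1} \frac{z^n}{n!} &= b z \sum_{n=1}^{\infty} D_{n-1} \frac{z^{n-1}}{(n-1)!} + z^2 \sum_{n=2}^{\infty} D_{n-1} \frac{z^{n-2}}{(n-2)!} \\
&= b z f(z) + z^2 f'(z),
\end{aligned} \tag{26}$$

hence

$$(1 - z^2) f'(z) = (a + bz) f(z) \tag{27}$$

or

$$f(z) = (1-z)^{-(b+a)/2}\,(1+z)^{-(b-a)/2}. \tag{28}$$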
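As a consistency check on Eqs. (21)-(28), the sketch below compares $D_n$ obtained from the three-term recurrence with the coefficient of $z^n/n!$ in $(1-z)^{-(b+a)/2}(1+z)^{-(b-a)/2}$, written as the finite sum (7). It is an illustrative sketch with hypothetical function names, using exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def poch(x, n):
    """Pochhammer symbol (x)_n."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def D_recurrence(n_max, a, b):
    """[D_0, ..., D_{n_max}] from D_{n+1} = a D_n + n (b+n-1) D_{n-1}."""
    d = [Fraction(1), Fraction(a)]
    for n in range(1, n_max):
        d.append(a * d[n] + n * (b + n - 1) * d[n - 1])
    return d[: n_max + 1]


def D_series(n, a, b):
    """Coefficient of z^n/n! in f(z), i.e. the finite sum of Eq. (7)."""
    lo, hi = (Fraction(b) - a) / 2, (Fraction(b) + a) / 2
    return sum((-1) ** k * comb(n, k) * poch(lo, k) * poch(hi, n - k)
               for k in range(n + 1))
```

With $a = b$ the generating function reduces to $(1-z)^{-a}$, so both routines should return $(a)_n$.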

This gives Eqs. (3) and (7). If $a = 0$,

$$f(z) = (1 - z^2)^{-b/2} = \sum_{n=0}^{\infty} \frac{z^{2n}}{n!} \left(\frac{b}{2}\right)_n \tag{29}$$

so that the determinant (2) of an $n \times n$ matrix is zero when $n$ is odd, as it should be, and when $n$ is even it is

$$M_{2n}(0,b) = \frac{(2n)!}{n!} \left(\frac{b}{2}\right)_n \prod_{i=0}^{2n-1} i!\,\Gamma(b+i) = \left[\prod_{i=0}^{n-1} (2i+1)!\,\Gamma(b+2i+1)\right]^2. \tag{30}$$

The pfaffian is the square root of this and its sign can be fixed by looking at one of the terms.
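The two product forms in Eq. (30) and the sign of the pfaffian can be checked in a small case. The sketch below assumes integer $b \ge 1$ (so that $\Gamma(m) = (m-1)!$) and, for the sign, expands the $4 \times 4$ pfaffian explicitly as $A_{01}A_{23} - A_{02}A_{13} + A_{03}A_{12}$. All function names are illustrative.

```python
from fractions import Fraction
from math import factorial


def gamma_int(m):
    """Gamma(m) for a positive integer m."""
    return factorial(m - 1)


def poch(x, n):
    """Pochhammer symbol (x)_n."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def det_formula(n, b):
    """First form of Eq. (30): (2n)!/n! (b/2)_n prod_{i<2n} i! Gamma(b+i)."""
    value = Fraction(factorial(2 * n), factorial(n)) * poch(Fraction(b, 2), n)
    for i in range(2 * n):
        value *= factorial(i) * gamma_int(b + i)
    return value


def pfaffian_formula(n, b):
    """Product form of the pfaffian: prod_{i<n} (2i+1)! Gamma(b+2i+1)."""
    value = 1
    for i in range(n):
        value *= factorial(2 * i + 1) * gamma_int(b + 2 * i + 1)
    return value


def pfaffian_4x4(b):
    """Direct pfaffian of the 4 x 4 matrix [(j-i) Gamma(b+i+j)]."""
    def A(i, j):
        return (j - i) * gamma_int(b + i + j)
    return A(0, 1) * A(2, 3) - A(0, 2) * A(1, 3) + A(0, 3) * A(1, 2)
```

For $b = 1$ the direct expansion gives $120 - 192 + 108 = 36$, which agrees with the product $1!\,\Gamma(2)\cdot 3!\,\Gamma(4) = 36$ including the sign.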

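When $a = b$,

$$f(z) = (1-z)^{-a} \quad \text{and} \quad D_n = (a)_n. \tag{31}$$

When $a$ is a small integer, expression (7) for $D_n$ simplifies. For example, when $a = 1$,

$$f(z) = (1+z)(1-z^2)^{-(b+1)/2} = (1+z) \sum_{n=0}^{\infty} \frac{z^{2n}}{n!} \left(\frac{b+1}{2}\right)_n \tag{32}$$

and

$$D_{2n} = \frac{(2n)!}{n!} \left(\frac{b+1}{2}\right)_n, \tag{33}$$

$$D_{2n+1} = \frac{(2n+1)!}{n!} \left(\frac{b+1}{2}\right)_n. \tag{34}$$

Or when $a = 2$,

$$f(z) = (1+z)^2 (1-z^2)^{-(b+2)/2} = (1+z)^2 \sum_{n=0}^{\infty} \frac{z^{2n}}{n!} \left(\frac{b+2}{2}\right)_n \tag{35}$$

and

$$D_{2n} = \frac{(2n)!}{n!} \left(\frac{b+2}{2}\right)_n + \frac{(2n)!}{(n-1)!} \left(\frac{b+2}{2}\right)_{n-1}, \tag{36}$$

$$D_{2n+1} = 2\,\frac{(2n+1)!}{n!} \left(\frac{b+2}{2}\right)_n. \tag{37}$$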
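The closed forms (33), (34), (36) and (37) can be compared with the recurrence (21) term by term. This is a sketch only, assuming integer $b$ so that the half-integer Pochhammer arguments are exact fractions; the function names are hypothetical, and the second term of (36) is taken to be absent for $n = 0$.

```python
from fractions import Fraction
from math import factorial


def poch(x, n):
    """Pochhammer symbol (x)_n."""
    result = Fraction(1)
    for i in range(n):
        result *= x + i
    return result


def D_recurrence(n_max, a, b):
    """[D_0, ..., D_{n_max}] from D_{n+1} = a D_n + n (b+n-1) D_{n-1}."""
    d = [Fraction(1), Fraction(a)]
    for n in range(1, n_max):
        d.append(a * d[n] + n * (b + n - 1) * d[n - 1])
    return d[: n_max + 1]


def D_a1(m, b):
    """Eqs. (33)-(34): D_m for a = 1, with m = 2n or m = 2n + 1."""
    n = m // 2
    return Fraction(factorial(m), factorial(n)) * poch(Fraction(b + 1, 2), n)


def D_a2(m, b):
    """Eqs. (36)-(37): D_m for a = 2, with m = 2n or m = 2n + 1."""
    n, odd = divmod(m, 2)
    beta = Fraction(b + 2, 2)
    if odd:
        return 2 * Fraction(factorial(m), factorial(n)) * poch(beta, n)
    value = Fraction(factorial(m), factorial(n)) * poch(beta, n)
    if n >= 1:
        value += Fraction(factorial(m), factorial(n - 1)) * poch(beta, n - 1)
    return value
```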

Larger integer values of $a$ can be treated in the same way.

Acknowledgements. We are thankful to Dr. LauYun Kau, who brought to our attention reference [4], convincing us that the determinant is not that difficult to compute, and to Dr. Krattenthaler for helpful comments. We also thank Dr. J.-M. Normand for reading the manuscript.

References

1. See for example, Mehta, M.L.: Random matrices. New York: Academic Press, 1991
2. Nyquist, H., Rice, S.O. and Riordan, J.: The distribution of random determinants. Q. Appl. Math. 12, 97–104 (1954)
3. Mehta, M.L. and Normand, J.-M.: Probability density of the determinant of a random hermitian matrix. J. Phys. A: Math. Gen. 31, 5377–5391 (1998), Eqs. (2.14), (2.26), (A.22), (A.23)
4. Krattenthaler, C.: Advanced determinant calculus. Preprint www:http://radon.mat.univie.ac.at/People/kratt and private communication

Communicated by H. Araki

Commun. Math. Phys. 214, 233 – 247 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Universality of Correlations of Levels with Discrete Statistics

Edouard Brézin, Vladimir Kazakov

Laboratoire de Physique Théorique de l’École Normale Supérieure, 24 rue Lhomond, 75231 Paris Cedex 05, France. E-mail: [email protected]; [email protected]

Received: 26 October 1999 / Accepted: 7 July 2000

Abstract: We study the statistics of a system of N random levels with integer values, in the presence of a logarithmic repulsive potential of Dyson type. This problem arises in sums over representations (Young tableaux) of GL(N) in various matrix problems and in the study of statistics of partitions for the permutation group. The model is generalized to include an external source and its correlators are found in closed form for any N. We reproduce the density of levels in the large N and double scaling limits and the universal correlation functions in Dyson’s short-distance scaling limit. We also study the statistics of small levels.

1. Introduction and Definition of the Generalized Model

The theory of random matrices often leads to the consideration of character expansions, i.e. sums over irreducible representations of groups such as GL(N) or U(N) [1–9]. Similar sums also play a role in probability theory [10], in particular in recent studies of the distribution of cycles in the group of permutations [11–15]. This has led to the study of probability distribution functions (PDF) for N non-negative integers $h_1, \dots, h_N$ of the form:

$$P_\alpha(h_1, \dots, h_N) \sim \Delta^2(h)\, \alpha^{\sum_{k=1}^{N} h_k}, \tag{1}$$

where $\Delta(h) = \prod_{m>j} (h_m - h_j)$, and $0 < \alpha < 1$ is a real parameter. Let us generalize this distribution to an ensemble characterized by N parameters $\alpha_1, \dots, \alpha_N$, in the following way:

$$P_{[\alpha_1, \dots, \alpha_N]}(h_1, \dots, h_N) = \frac{1}{Z}\, \Delta(h)\, \chi_h(\{\alpha\}), \tag{2}$$

Unité Mixte de Recherche 8549 du Centre National de la Recherche Scientifique et de l’École Normale Supérieure

where $\chi_h(\alpha) = \det_{k,j}\left(\alpha_k^{h_j}\right) / \Delta(\alpha)$ is the Weyl character of a diagonal GL(N) group element $(\alpha_1, \dots, \alpha_N)$ in a given irreducible representation R. R is fixed in terms of the highest weights $m_k = h_k + k - N$, $k = 1, \dots, N$. (In principle the corresponding $h_k$’s are strictly ordered, but in view of the symmetry of the weight (1) this restriction may be ignored in the sums.) The constant Z is defined by the normalization condition

$$Z = \sum_{h_1=0}^{\infty} \cdots \sum_{h_N=0}^{\infty} \Delta(h)\, \chi_h(\{\alpha\}). \tag{3}$$

Note that in the limit of coinciding α’s, $\alpha_k \to \alpha$, the (non-normalized) distribution (2) reduces to (1); this follows from the well-known formula:

$$\chi_h(\alpha) \to \alpha^{\sum_{k=1}^{N} m_k} \dim\{m\} = \frac{\Delta(h)}{\prod_{k=0}^{N-1} k!}\; \alpha^{\sum_k (h_k + k - N)}, \tag{4}$$

where dim{m} is the dimension of a representation given by the highest weights m1 , · · · , mN . There are several reasons for this generalization. First it leads to simple exact formulae, as in the case of random matrices coupled to an external matrix source [16, 17]; the N parameters αk ’s play here the role of the eigenvalues of the source. Here also, even when the source is a simple multiple of the identity, meaning now that all the αk ’s are equal to a single α, the final formulae are explicit and simple. Furthermore these parameters αk provide a powerful check of universality. Indeed it is found here that at generic points, at which the density of levels is non-singular, the correlations are, in the proper Dyson scaling limit, insensitive to the specific probability distribution. Singular points fall also into universal classes, as they do for the usual matrix models [16]. Varying the external parameters αk one can indeed tune various singular classes: examples are given in the subsequent sections. The machinery developed for integrable systems, and used in [11–15] for the single-parameter model (1), ought to be applicable to our generalized model (2) as well. Finally the distribution probability (1) provides a natural measure on the S∞ group of arbitrary permutations; the mk ’s with k = 1, 2, · · · , N , are the lengths of the cycles of a permutation class consisting exactly of N cycles. One can also interpret this distribution probability as defined on the (infinite) set of all Young tableaux with mk boxes in the k th row. In our case, (2) is a natural multi-parametric generalization of (1), which may now be interpreted as a specific coloring of Young tableaux. The boxes of the Young tableau characterizing a permutation are colored in N colors in a way which is explained below; the k th color is weighted with αk . 
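For instance, for N = 2, there are two rows of length (highest weights) $m_1 \ge m_2$ in the corresponding Young tableau characterizing a class of permutations with 2 cycles, and for the character we have the following finite sum of positive terms:

$$\begin{aligned}
\chi_{m_1,m_2}(\alpha_1, \alpha_2) &= \sum_{k=0}^{m_1 - m_2} \alpha_1^{m_1 - k}\, \alpha_2^{m_2 + k} \\
&= \alpha_1^{m_1} \alpha_2^{m_2} + \alpha_1^{m_1 - 1} \alpha_2^{m_2 + 1} + \cdots + \alpha_1^{m_2 + 1} \alpha_2^{m_1 - 1} + \alpha_1^{m_2} \alpha_2^{m_1}.
\end{aligned} \tag{5}$$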
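For N = 2 the finite sum (5) can be compared with the determinant form of the Weyl character. The sketch below is illustrative only: it uses $h_k = m_k + N - k$, i.e. $(h_1, h_2) = (m_1 + 1, m_2)$, and divides by $\alpha_1 - \alpha_2$, which fixes one overall sign convention for the Vandermonde factor (an assumption; the opposite convention flips numerator and denominator together). Function names are hypothetical.

```python
from fractions import Fraction


def character_sum(m1, m2, a1, a2):
    """Finite sum of Eq. (5) for highest weights m1 >= m2."""
    return sum(a1 ** (m1 - k) * a2 ** (m2 + k) for k in range(m1 - m2 + 1))


def character_det(m1, m2, a1, a2):
    """Ratio of 2 x 2 determinants with shifted weights h = (m1 + 1, m2)."""
    h1, h2 = m1 + 1, m2
    return (a1 ** h1 * a2 ** h2 - a1 ** h2 * a2 ** h1) / (a1 - a2)
```

Setting both parameters to one in `character_sum` counts the terms and returns $m_1 - m_2 + 1$, the dimension of the representation.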

This may be interpreted as a coloring of a tableau, in which the first $m_1 - k$ boxes of the upper row have color 1 and all the other boxes have color 2; we sum over k with a factor $\alpha_1^{\#1} \alpha_2^{\#2}$, where #1, #2 are the numbers of boxes (“areas”) of the Young tableau of a given color. To generalize this formula to any N, we have to expand a general character into a sum of monomials in the α’s (generators of the maximal torus of GL(N)):

$$\chi_{\{m\}} = \sum_{\{l\} \in \{m\}} n_{\{l\}} \prod_{k=1}^{N} (\alpha_k)^{l_k}, \tag{6}$$

where the finite sum over the positive integers $(l_1, \dots, l_N)$ characterizing the elements of the Lie algebra of the maximal torus, or the weights of representation R, is restricted in a specific way by the shape of the corresponding Young tableau of the representation $R(m_1, \dots, m_N)$. The positive integers $n_{\{l\}}$ are called the multiplicities of those weights. To make all this more explicit, and to give it a nice probabilistic interpretation, let us introduce the Gelfand–Tseytlin scheme (GT scheme), which provides an orthonormal basis of states for a given representation R of GL(N) characterized by the highest weights $m_i$, $i = 1, \dots, N$. Every state vector of R is characterized by N(N−1)/2 positive integers $m_{kj}$, $k = 1, \dots, N$, $j = 1, \dots, k$. The first ones, the $m_{N,i} \equiv m_i$, $i = 1, \dots, N$, are simply the highest weights. The subsequent ones are given integers, restricted by the inequalities:

$$m_{j,i} \ge m_{j-1,i} \ge m_{j,i+1}, \quad i = 1, \dots, j. \tag{7}$$

The basis vectors of this representation are thus characterized by a triangular array usually depicted as [18]

$$\begin{array}{ccccc}
m_{N,1} & m_{N,2} & \cdots & & m_{N,N} \\
 & m_{N-1,1} & \cdots & m_{N-1,N-1} & \\
 & & \ddots & & \\
 & & m_{1,1} & &
\end{array} \tag{8}$$

in which any m in a given row lies in between the two m’s next to it in the previous row. We will use the definition of a character as a trace of a group element in its diagonal form, for a given representation R: $\chi_R(\{\alpha\}) = \mathrm{Tr}_R \prod_{j=1}^{N} \alpha_j^{T^R_{jj}}$, where $T^R_{jj}$ is a diagonal generator in the Lie algebra of this representation. In the GT basis the values of $T^R_{jj}$ are given as (see for example [18])

$$T^R_{jj} \equiv l_j = \sum_{i=1}^{j} m_{j,i} - \sum_{i=1}^{j-1} m_{j-1,i}. \tag{9}$$

The expansion (6) is now explicitly given by the formula:

$$\chi_R(\{\alpha\}) = \sum_{\{GT\}_R} \prod_{j=1}^{N} \alpha_j^{\,l_j\{m\}}, \tag{10}$$

where the sum is taken over all GT schemes (states) satisfying the restrictions (7). Of course in this sum the same monomial may appear several times. The number of times each monomial enters in the sum is the multiplicity $n_{\{l\}}$ of Eq. (6). This formula may be given an interesting probabilistic geometrical interpretation, in terms of sums over the colorings of Young tableaux. Let us define a coloring in the following way (see Fig. 1 for an illustration of the N = 3 case). Color number one is given to the last $m_{N,1} - m_{N-1,1}$ boxes of the first row, the last $m_{N,2} - m_{N-1,2}$ boxes of the second row, etc., and the $m_{N,N}$ boxes of the last row. Color number 2 is given to the boxes numbered $m_{N-2,1} + 1$ to $m_{N-1,1}$ in the first row, $m_{N-2,2} + 1$ to $m_{N-1,2}$ in the second row, etc., and the first $m_{N-1,N-1}$ boxes of the row before last. Continuing in this way, the Nth color is given to the first $m_{1,1}$ boxes of the first row. At the end, the Young tableau is a “brick wall” made out of colored bricks, each brick at an upper level resting on zero, one or two lower bricks of lower color indices, as depicted in Fig. 1.

[Figure 1: colored Young tableau for N = 3 with GT entries $m_{11} = 4$; $m_{21} = 6$, $m_{22} = 3$; $m_{31} = 9$, $m_{32} = 5$, $m_{33} = 2$. The areas covered by the three colors are $l_1 = m_{11}$, $l_2 = m_{21} + m_{22} - m_{11}$ and $l_3 = m_{31} + m_{32} + m_{33} - m_{21} - m_{22}$.]

Fig. 1. Coloring of the Young tableau for the term in the distribution probability proportional to $\alpha_1^{l_1} \alpha_2^{l_2} \alpha_3^{l_3}$

Note that if all the αk ’s reduce to αk = 1, one is simply summing in (10) over each state with weight one ; therefore one recovers the dimension of the representation dimR as stated in (4). If one substitutes the expression (10) for the character into (2), one obtains a probability for the colored Young tableaux (related to permutations of colored objects). In the next sections, we derive explicit (for any N) formulae for the correlators of the distribution (2) and study various scaling limits.
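The sum over GT schemes in Eq. (10) is easy to enumerate for small tableaux. The following sketch (hypothetical helper names, exact rational weights) generates all triangular arrays obeying the interlacing conditions (7), computes the weights $l_j$ of Eq. (9), and sums the monomials. It is meant as an illustration of the construction, not as an efficient algorithm.

```python
from fractions import Fraction
from itertools import product


def gt_patterns(top):
    """Yield GT schemes as lists of rows, from the top row down to length 1."""
    top = tuple(top)
    if len(top) == 1:
        yield [top]
        return
    # Each entry of the next row lies between its two upper neighbours.
    ranges = [range(top[i + 1], top[i] + 1) for i in range(len(top) - 1)]
    for lower in product(*ranges):
        for rest in gt_patterns(lower):
            yield [top] + rest


def weights(pattern):
    """(l_1, ..., l_N) with l_j = sum(row j) - sum(row j-1), cf. Eq. (9)."""
    sums = [sum(row) for row in reversed(pattern)]  # rows of length 1, 2, ...
    return tuple(s - p for s, p in zip(sums, [0] + sums[:-1]))


def character(top, alphas):
    """Sum over GT schemes of prod_j alpha_j^{l_j}, cf. Eq. (10)."""
    total = Fraction(0)
    for pattern in gt_patterns(top):
        term = Fraction(1)
        for alpha, l in zip(alphas, weights(pattern)):
            term *= Fraction(alpha) ** l
        total += term
    return total
```

For the scheme of Fig. 1, `weights([(9, 5, 2), (6, 3), (4,)])` gives the three colored areas, and setting all α’s to one in `character((9, 5, 2), (1, 1, 1))` counts the schemes, which should reproduce the dimension given by the Weyl formula, $\Delta(h)/(0!\,1!\,2!)$ with $h = (11, 6, 2)$.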


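2. Density of Discrete Levels

The density of levels is defined as

$$\rho(\sigma) = \frac{1}{N} \Big\langle \sum_k \delta_{N\sigma, h_k} \Big\rangle = \oint \frac{dt}{2i\pi\, t^{N\sigma+1}}\, \frac{1}{N} \Big\langle \sum_k t^{h_k} \Big\rangle, \tag{11}$$

in which the brackets denote the averaging with respect to the probability distribution (2). The variable $N\sigma$ is a priori an integer but, in the large N limit, σ will be considered as a continuous variable. Let us calculate $U_1(t) \equiv \frac{1}{N} \big\langle \sum_k t^{h_k} \big\rangle_h$ by performing explicitly the sums over the h’s. Using the antisymmetry of $\Delta(h)$ in (2) we can rewrite it as

$$U_1(t) = C \sum_k \sum_{\{h\}} t^{h_k}\, \Delta(h)\, \frac{\det_{m,j} \alpha_m^{h_j}}{\Delta(\alpha)} = \frac{C}{\Delta(\alpha)} \sum_k \sum_P (-)^P \sum_{\{h\}} \prod_m h_m^{P_m - 1} \left(\alpha_m^{(k)}\right)^{h_m}, \tag{12}$$

where the determinant is represented as a sum over permutations P of the levels $h_1, \dots, h_N$ and by definition

$$\alpha_m^{(k)} = t^{\delta_{m,k}}\, \alpha_m. \tag{13}$$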

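The sums over h’s are now independent and can be calculated using the formula:

$$\sum_{h=0}^{\infty} h^p \alpha^h = (\alpha \partial_\alpha)^p\, \frac{1}{1-\alpha} = \frac{p!}{1-\alpha}\; Q_p\!\left(\frac{1}{1-\alpha}\right),$$

where $Q_p(x)$ is a polynomial of degree p, whose coefficient of highest degree is one. Since for any set of such polynomials $\det_{p,q} Q_{p-1}(x_q) = \Delta(x_1, \dots, x_N)$ we obtain from (12):

$$U_1(t) = \frac{C}{\Delta(\alpha)} \sum_k \Delta\!\left(\frac{1}{1-\alpha^{(k)}}\right) \prod_m \left(1 - \alpha_m^{(k)}\right)^{-1} = \frac{1}{N} \sum_k \left(\frac{1-\alpha_k}{1 - t\alpha_k}\right)^{N} {\prod_j}' \frac{t\alpha_k - \alpha_j}{\alpha_k - \alpha_j}, \tag{14}$$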

where the prime in the product means that the term k = j is omitted. The constant C has been determined through the normalization condition $U_1(1) = 1$. This quantity has an elegant contour integral representation, similar to the one found in [16] for integrals over hermitian random matrices with an external matrix source. Indeed, the integral

$$U_1(t) = \frac{1}{N(t-1)} \oint \frac{du}{2i\pi u} \left(\frac{1-u}{1-tu}\right)^{N} \prod_j \frac{tu - \alpha_j}{u - \alpha_j}, \tag{15}$$

over a contour in the complex u-plane which surrounds the N poles $\alpha_j$ (and not the origin), reproduces the sum (14). For the density of levels (11), since

$$\rho(\sigma) = \oint \frac{dt}{2i\pi}\, \frac{U_1(t)}{t^{N\sigma+1}}, \tag{16}$$

where the integration contour encircles the vicinity of the origin, we obtain, using (15):

$$\rho(\sigma) = \frac{1}{N} \oint \frac{dt}{2i\pi\, t^{N\sigma+1}(t-1)} \oint \frac{du}{2i\pi u} \left(\frac{1-u}{1-tu}\right)^{N} \prod_j \frac{tu - \alpha_j}{u - \alpha_j}. \tag{17}$$
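The finite sum (14) for $U_1(t)$ can be compared numerically with a brute-force average over the distribution (2). The sketch below does this for N = 2 only, truncating the sums over $h_1, h_2$ at an assumed cutoff (harmless when all $\alpha_k$ and $t\alpha_k$ are well below one). The overall constant and the sign convention of the Vandermonde factors cancel in the ratio. Function names and the cutoff are illustrative assumptions.

```python
def u1_formula(t, alphas):
    """Right-hand side of Eq. (14) for arbitrary N = len(alphas)."""
    n = len(alphas)
    total = 0.0
    for k, ak in enumerate(alphas):
        term = ((1 - ak) / (1 - t * ak)) ** n
        for j, aj in enumerate(alphas):
            if j != k:  # primed product: the term j = k is omitted
                term *= (t * ak - aj) / (ak - aj)
        total += term
    return total / n


def u1_brute_force(t, a1, a2, cutoff=200):
    """Direct average of (t^h1 + t^h2)/2 with weight Delta(h) det[alpha^h], N = 2."""
    numerator = denominator = 0.0
    for h1 in range(cutoff):
        for h2 in range(cutoff):
            weight = (h1 - h2) * (a1 ** h1 * a2 ** h2 - a1 ** h2 * a2 ** h1)
            denominator += weight
            numerator += weight * (t ** h1 + t ** h2) / 2
    return numerator / denominator
```

A call such as `u1_brute_force(0.8, 0.3, 0.5)` should agree with `u1_formula(0.8, (0.3, 0.5))` to high accuracy, and `u1_formula(1.0, ...)` returns one by construction.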

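Similarly for the resolvent, defined as $G(z) = \frac{1}{N} \Big\langle \sum_{k=1}^{N} \frac{1}{z - h_k/N} \Big\rangle$, we can use

$$G(z) = \int_0^\infty d\tau\, e^{-\tau z}\, U_1(e^{\tau/N}) \tag{18}$$

and obtain the integral representation:

$$G(z) = \frac{1}{N} \int_0^\infty \frac{d\tau\, e^{-\tau z}}{e^{\tau/N} - 1} \oint \frac{du}{2i\pi u} \left(\frac{1-u}{1 - e^{\tau/N} u}\right)^{N} \prod_j \frac{e^{\tau/N} u - \alpha_j}{u - \alpha_j}. \tag{19}$$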

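After the change $t \to t/u$ in (17) we get ($\sigma = p/N$):

$$\rho(p/N) = \frac{1}{N} \oint \frac{dt}{2i\pi\, t^{p+1}} \prod_j \frac{t - \alpha_j}{t - 1} \oint \frac{du\, u^{p+1}}{2i\pi} \prod_j \frac{u - 1}{u - \alpha_j}\; \frac{1}{t - u} \tag{20}$$

or, inflating the contour of integration in u and changing $u \to 1/w$:

$$\rho(p/N) = \frac{1}{N} \oint \frac{dt}{2\pi\, t^{p+1}} \prod_j \frac{\alpha_j - t}{1 - t} \oint \frac{dw}{2\pi\, w^{p+2}} \prod_j \frac{1 - w}{1 - \alpha_j w}\; \frac{1}{tw - 1}. \tag{21}$$

Expanding the last factor we get a finite sum representation for the correlator:

$$\rho(p) = -\frac{1}{N} \sum_{k=0}^{p} L_N^{p-k}(\alpha)\, \hat{L}_N^{p-k+1}(\alpha), \tag{22}$$

where the polynomials in α’s $L_N^q(\alpha)$ and $\hat{L}_N^q(\alpha)$ are defined as

$$L_N^q(\{\alpha\}) = \oint \frac{dx}{2\pi i\, x^{q+1}} \prod_j \frac{\alpha_j - x}{1 - x}, \tag{23}$$

$$\hat{L}_N^q(\{\alpha\}) = \oint \frac{dx}{2\pi i\, x^{q+1}} \prod_j \frac{1 - x}{1 - \alpha_j x}. \tag{24}$$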

The contours for both (23, 24) encircle the pole at x = 0. Note that this expression is exact, for any N and any set of α’s. In particular for the density, in the simplest case $\alpha_1 = \alpha_2 = \cdots = \alpha_N = \alpha$, most studied in the literature, this leads to a simple integral:

$$\rho(p/N) = -\frac{1}{N} \oint \frac{dt}{2\pi\, t^{p+1}}\, \frac{(t - \alpha)^N}{(t - 1)^N} \oint \frac{du\, u^{p+1}}{2\pi}\, \frac{(u - 1)^N}{(u - \alpha)^N}\; \frac{1}{t - u} \tag{25}$$

or, after inflating the contour and changing $u \to \alpha/w$,

$$\rho(p/N) = -\oint \frac{du}{2i\pi u} \oint \frac{dt}{2i\pi\, t^{N\sigma+1}(t-1)} \left[\frac{(1-u)(\alpha - tu)}{(1 - tu)(u - \alpha)}\right]^{N}. \tag{26}$$

The polynomial (23) reduces now to:

$$L_N^q(\alpha) = \oint \frac{dx}{2\pi i\, x^{q+1}} \left(\frac{\alpha - x}{1 - x}\right)^{N} \tag{27}$$

and the two sets of polynomials are related here simply by

$$L_N^q(\alpha) = \alpha^{N-q}\, \hat{L}_N^q(\alpha). \tag{28}$$
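Since the contours in (23), (24) and (27) encircle only x = 0, $L_N^q$ and $\hat{L}_N^q$ are simply Taylor coefficients, and relation (28) for equal α’s can be checked with truncated power series. The sketch below is an illustration with hypothetical function names; it works with exact fractions.

```python
from fractions import Fraction


def ratio_power_coeffs(c0, c1, d, N, order):
    """Taylor coefficients up to x^order of ((c0 + c1 x) / (1 - d x))^N."""
    coeffs = [Fraction(1)] + [Fraction(0)] * order
    for _ in range(N):
        # Multiply by the numerator (c0 + c1 x), truncated at x^order.
        coeffs = [c0 * coeffs[k] + (c1 * coeffs[k - 1] if k else 0)
                  for k in range(order + 1)]
        # Divide by (1 - d x): out[k] = in[k] + d * out[k-1].
        for k in range(1, order + 1):
            coeffs[k] += d * coeffs[k - 1]
    return coeffs


def L(alpha, N, order):
    """L_N^q for q = 0..order: coefficients of ((alpha - x)/(1 - x))^N."""
    return ratio_power_coeffs(alpha, -1, 1, N, order)


def L_hat(alpha, N, order):
    """hat L_N^q for q = 0..order: coefficients of ((1 - x)/(1 - alpha x))^N."""
    return ratio_power_coeffs(1, -1, alpha, N, order)
```

The relation follows from rescaling $x \to \alpha x$ in (27), and the code merely confirms it coefficient by coefficient for chosen values of α and N.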

In the next sections we will study the large N limit of this density and, for the case of equal α’s, compare it with a direct computation based on the solution of a Riemann-Hilbert problem given in the appendix.

3. Pair Correlator of Discrete Levels

The simplest interesting correlator, which is conjectured to obey a short distance universality for nearby levels, is the pair correlator:

$$U_2(t_1, t_2) \equiv \frac{1}{N^2} \Big\langle \sum_{k \ne l} t_1^{h_k}\, t_2^{h_l} \Big\rangle \tag{29}$$

or rather its connected part:

$$K_2(t_1, t_2) \equiv U_2(t_1, t_2) - U_1(t_1)\, U_1(t_2). \tag{30}$$

A similar calculation yields

$$U_2(t_1, t_2) = \frac{1}{N^2} \sum_{k,l} \frac{(t_1\alpha_k - t_2\alpha_l)(\alpha_k - \alpha_l)}{(t_1\alpha_k - \alpha_l)(\alpha_k - t_2\alpha_l)} \prod_{m \ne k} \frac{\alpha_m - t_1\alpha_k}{\alpha_m - \alpha_k} \prod_{m \ne l} \frac{\alpha_m - t_2\alpha_l}{\alpha_m - \alpha_l} \left(\frac{1 - \alpha_k}{1 - \alpha_k t_1}\right)^{N} \left(\frac{1 - \alpha_l}{1 - \alpha_l t_2}\right)^{N}, \tag{31}$$

from which one derives again a contour integral representation over two complex variables (similar to the representation for the hermitian matrix model found in [17]):

$$K_2(t_1, t_2) = -\frac{1}{N^2} \oint\!\!\oint \frac{du\, dv}{(2i\pi)^2}\, \frac{1}{(u - t_2 v)(v - t_1 u)} \left(\frac{1-u}{1 - t_1 u}\right)^{N} \left(\frac{1-v}{1 - t_2 v}\right)^{N} \prod_j \frac{(t_1 u - \alpha_j)(t_2 v - \alpha_j)}{(u - \alpha_j)(v - \alpha_j)}. \tag{32}$$

Finally, for the density correlator

$$\rho(p, q) = \Big\langle \sum_k \delta_{p, h_k} \sum_m \delta_{q, h_m} \Big\rangle = \oint \frac{dt_1}{2i\pi\, t_1^{p+1}} \oint \frac{dt_2}{2i\pi\, t_2^{q+1}}\, K_2(t_1, t_2), \tag{33}$$

we obtain (after changing variables from $t_1$ to $t_1/u$ and $t_2$ to $t_2/v$) a factorized formula, again similar to the hermitian matrix model [17]:

$$\rho(p, q) = -R(p, q)\, R(q, p), \tag{34}$$

where

$$R(p, q) = \oint\!\!\oint \frac{du\, dt\; t^{-p}\, u^{q-p-1}}{(2\pi)^2 (1 - t)} \left(\frac{u - 1}{t - 1}\right)^{N} \prod_j \frac{t - \alpha_j}{u - \alpha_j} \tag{35}$$

or, going back to the previous non-scaled variables:

$$R(p, q) = -\oint\!\!\oint \frac{du\, dt\; t^{-p}\, u^{q}}{(2\pi)^2 (u - t)} \prod_j \frac{(t - \alpha_j)(u - 1)}{(t - 1)(u - \alpha_j)}. \tag{36}$$

Another useful form of R(p, q) can be obtained (similar to (21)) by inflating the contour of integration in u in (36) and changing $u \to 1/w$ to catch the poles at w = 0:

$$R(p, q) = \oint\!\!\oint \frac{dw\, dt\; t^{-p}\, w^{-q}}{(2\pi)^2} \prod_j \frac{(\alpha_j - t)(1 - w)}{(1 - t)(1 - \alpha_j w)}\; \frac{1}{1 - tw}. \tag{37}$$

Expanding the last factor we get a finite sum representation for the correlator:

$$R(p, q) = \alpha^{-N+q} \sum_{k=0}^{\inf(p,q)} L_N^{p-k}(\alpha)\, \hat{L}_N^{q-k}(\alpha), \tag{38}$$

where $L_N^q(\alpha)$ and $\hat{L}_N^q(\alpha)$ are given by (23) and (24). In the next sections we will study these formulae in the large N limit, for the case of large Young tableaux.

4. Study of the Density in the Large N Limit

In the large N limit the formula for the resolvent (19) gives

$$G(\sigma) = \int_0^\infty \frac{d\tau}{\tau} \oint \frac{du}{2\pi u} \exp\left\{-\tau \left[\sigma - \frac{u}{1-u} - u G_0(u)\right]\right\}, \tag{39}$$

where $G_0(u) = \frac{1}{N} \sum_{j=1}^{N} \frac{1}{u - \alpha_j}$ is the resolvent of the distribution of the parameters $\alpha_j$ of our problem. Integration over τ leads to the equation:

$$G(\sigma) = \oint \frac{du}{2\pi u} \log\left[\sigma - \frac{u}{1-u} - u G_0(u)\right]. \tag{40}$$

Differentiating the last equation in σ, taking into account the contribution of the pole at

$$\sigma = \frac{u}{1-u} + u G_0(u), \tag{41}$$

and integrating back in σ we obtain an explicit equation for the resolvent

$$G(\sigma) = \ln u(\sigma), \tag{42}$$

where $u(\sigma)$ is a solution of Eq. (41). The constant of integration in (42) is chosen to be zero in order to match the asymptotics $\sigma \to \infty$ ($u \to 1$). If we eliminate u from Eqs. (41) and (42) we obtain the following functional equation for $G(\sigma)$:

$$e^{G(\sigma)} = 1 + \frac{1}{\sigma} + \frac{1}{\sigma}\, G_0\!\left(e^{G(\sigma)}\right). \tag{43}$$

This functional formula is very similar to those found by direct analysis of the saddle point equations in [4, 5] for the heat kernel on the group SU(N) in the large N limit (see also [6, 7] for many other similar results). For the particular case $\alpha_1 = \alpha_2 = \cdots = \alpha_N = \alpha$ we have $G_0(u) = \frac{1}{u - \alpha}$ and $G(\sigma)$ is given by

$$G(\sigma) = \log\left\{\frac{1}{2\sigma}\left[(\alpha + 1)\sigma - (1 - \alpha) - (1 - \alpha)\sqrt{\left(\sigma - \frac{1 - \sqrt{\alpha}}{1 + \sqrt{\alpha}}\right)\left(\sigma - \frac{1 + \sqrt{\alpha}}{1 - \sqrt{\alpha}}\right)}\,\right]\right\}. \tag{44}$$

Since on the real axis

$$G(\sigma \pm i0) = \mathrm{Re}\, G(\sigma) \pm i\pi\rho(\sigma), \tag{45}$$

we see from (44) that

$$\rho(\sigma) = 1, \quad \text{for } 0 < \sigma < b, \tag{46}$$

$$\rho(\sigma) = \log\left\{\frac{1}{2\sigma}\left[(\alpha + 1)\sigma - (1 - \alpha) - (1 - \alpha)\sqrt{(\sigma - b)(a - \sigma)}\,\right]\right\}, \quad \text{for } b < \sigma < a, \tag{47}$$

where

$$a = \frac{1 + \sqrt{\alpha}}{1 - \sqrt{\alpha}}, \qquad b = \frac{1 - \sqrt{\alpha}}{1 + \sqrt{\alpha}}. \tag{48}$$

This coincides with the direct saddle point calculation of the Appendix and reproduces an old result of A. Vershik and S. Kerov [10] on the limiting shape of the Young tableau. Another interesting problem corresponds to the statistics of the weights close to the upper or lower end of the distribution [12, 13] etc., studied in the next section.
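A quick numerical sanity check of the end points (48) is that the density integrates to one. The sketch below uses the arctangent form of the density on $(b, a)$ quoted in Eq. (78) of the Appendix, together with $\rho = 1$ on $(0, b)$, and a plain midpoint rule; the step count and function names are illustrative assumptions.

```python
from math import atan, pi, sqrt


def endpoints(alpha):
    """End points (a, b) of Eq. (48) for 0 < alpha < 1."""
    s = sqrt(alpha)
    return (1 + s) / (1 - s), (1 - s) / (1 + s)


def density(x, alpha):
    """Saturated on (0, b], arctangent form of Eq. (78) on (b, a), zero beyond."""
    a, b = endpoints(alpha)
    if x <= 0 or x >= a:
        return 0.0
    if x <= b:
        return 1.0
    return (2 / pi) * atan(sqrt(b * (a - x) / (a * (x - b))))


def total_weight(alpha, steps=20000):
    """Integral of the density over (0, a) with a midpoint rule on (b, a)."""
    a, b = endpoints(alpha)
    h = (a - b) / steps
    middle = sum(density(b + (i + 0.5) * h, alpha) for i in range(steps)) * h
    return b + middle
```

Analytically, the substitution $y = \sqrt{b(a-x)/(a(x-b))}$ turns the normalization integral into $(2/\pi)\int_0^\infty dy/(a y^2 + b) = 1/\sqrt{ab}$, which equals one because $ab = 1$.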

5. Double Scaling Limits

The density function contains three special points (“end points”): $z_c = 0, a, b$, near which it exhibits a universal behavior. We will study the close vicinity of the points a, b such that the deviation $\Delta z$ from one of these points scales with N according to the power law $\Delta z \sim N^{-2/3}$ typical of the double scaling limit for the generic distributions of the eigenvalues of matrix models [19]. The scaling for $\alpha \to 1$ in the vicinity $z \sim b \sim 0$ will be different and will be considered afterwards. It is useful to put in (19) $t = e^{\tau/N} = 1 + \theta/N$. This leads to the integral representation:

$$G(z) = \frac{1}{N} \int_0^\infty \frac{d\theta}{\theta\, (1 + \theta/N)^{zN+1}} \oint \frac{du}{2\pi u} \left(1 - \frac{u}{1-u}\frac{\theta}{N}\right)^{-N} \prod_j \left(1 + \frac{u}{u - \alpha_j}\frac{\theta}{N}\right). \tag{49}$$

Exponentiating the integrand and expanding in 1/N we get:

$$\begin{aligned}
G(z) = \int_0^\infty \frac{d\theta}{\theta\,(1 + \theta/N)} \oint \frac{du}{2\pi u} \exp\Big\{ &-\theta\left[z - \frac{u}{1-u} - u G_0(u)\right] + \frac{\theta^2}{2N}\left[z + \frac{u^2}{(1-u)^2} + u^2 G_0'(u)\right] \\
&+ \frac{\theta^3}{3N^2}\left[-z + \frac{u^3}{(1-u)^3} + \frac{1}{2} u^3 G_0''(u)\right] + \cdots \Big\}.
\end{aligned} \tag{50}$$

The end points a, b are defined by Eq. (41). The singular behavior arises when the term in (50) linear in θ develops a double zero at the end point (i.e., when its derivative in u is zero):

$$G_0(u) + u G_0'(u) + \frac{1}{(1-u)^2} = 0. \tag{51}$$

One sees immediately that if the conditions (41) and (51) are satisfied the term quadratic in θ is also zero. In the vicinity of the end points z = a or z = b we can expand the exponential in powers of $\epsilon = z - a$ (or respectively $z - b$) and $\delta u = u - u_c$:

$$\frac{\partial}{\partial z} G(z) = -\int_0^\infty d\theta \oint \frac{du}{2i\pi u} \exp\left\{-\theta\left[\epsilon - \frac{1}{2} A (\delta u)^2\right] + \frac{\theta^2}{2N}\left(\epsilon + A u_c\, \delta u\right) + A u_c^2\, \frac{\theta^3}{6N^2} + \cdots\right\} \tag{52}$$

with $A = -\frac{2 z_c}{u_c^2} + \frac{2 u_c}{(1 - u_c)^3} + u_c G_0''$. In the double scaling regime, when $\epsilon \sim N^{-2/3}$, the term in $\theta^2/N$ is negligible, and after Gaussian integration in u along the contour of the stationary phase, we obtain an Airy-like function of $N^{2/3}\epsilon$,

$$\frac{\partial}{\partial z} G(z) \simeq \int_0^\infty \frac{d\theta}{\theta^{1/2}}\, e^{-N^{2/3} \epsilon\, \theta - B\theta^3}. \tag{53}$$

This gives for the density a double scaling expression:

$$\rho(\sigma) \sim \epsilon^{1/2} f(N^{2/3} \epsilon), \tag{54}$$

where the function f(x) of the double scaling parameter can be defined nonperturbatively by an appropriate change of the integration contour in (53). For the last end point of the distribution, namely the vicinity of z = 0, in the simple large N limit ρ(z) = 1 around this point. But this will no longer be so in the special double scaling limit in which α approaches 1 and the interval (0, b), on which ρ remains fixed to one, shrinks to zero. We will consider this special limit in Sect. 7.
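For equal α’s the end-point conditions can be solved in closed form: with $G_0(u) = 1/(u - \alpha)$, condition (51) reduces to $(u - \alpha)^2 = \alpha (1 - u)^2$, so $u_c = \pm\sqrt{\alpha}$, and Eq. (41) then maps these critical points to $a$ and $b$ of Eq. (48). The sketch below checks this with exact fractions, taking $\sqrt{\alpha}$ rational as an illustrative assumption; function names are hypothetical.

```python
from fractions import Fraction


def G0(u, alpha):
    """Resolvent of the alpha distribution when all alphas coincide."""
    return 1 / (u - alpha)


def dG0(u, alpha):
    """Derivative of G0 with respect to u."""
    return -1 / (u - alpha) ** 2


def sigma_of_u(u, alpha):
    """Right-hand side of Eq. (41)."""
    return u / (1 - u) + u * G0(u, alpha)


def stationarity(u, alpha):
    """Left-hand side of Eq. (51); vanishes at the critical points u_c."""
    return G0(u, alpha) + u * dG0(u, alpha) + 1 / (1 - u) ** 2


def quadratic_term(z, u, alpha):
    """Coefficient of theta^2/(2N) in Eq. (50)."""
    return z + u ** 2 / (1 - u) ** 2 + u ** 2 * dG0(u, alpha)
```

With $\sqrt{\alpha} = 1/3$, `sigma_of_u` evaluated at $u = \pm 1/3$ returns 2 and 1/2, which are $(1 \pm \sqrt{\alpha})/(1 \mp \sqrt{\alpha})$, and the quadratic coefficient vanishes at both points as stated after Eq. (51).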

6. Study of the Density Correlator for Nearby Levels

Now we shall study the quantity R(p, q) in the large N limit for a separation of p, q of order one: |p − q| = 1, 2, 3, · · · . The result is expected to be universal (if we express p − q in terms of the local level spacing $\rho(p)^{-1}$). The study of the large N limit of R(p, q) is very similar to the study of the density ρ(p) in the previous section. Namely we put again $t = e^{\tau/N}$ and retain in (36) only the terms of the order ∼ 1. The integration over τ yields the following large N limit of R:

$$R(p, q) = \frac{[u_+(q/N)]^{|p-q|} - [u_-(q/N)]^{|p-q|}}{|p - q|}, \tag{55}$$

where $u_\pm(\sigma) = e^{G(\sigma)}$ are two conjugated solutions of (41) corresponding to two choices of the sign in (45) when approaching the real σ-axis. Hence we get on the real axis:

$$R(p, q) = 2i\, e^{2\,\mathrm{Re}\, G(q/N)}\, \frac{\sin\left(\pi \rho(q/N)\, |p - q|\right)}{|p - q|}. \tag{56}$$

The level correlation function (34) is thus

$$\rho(p, q) \sim \frac{\sin^2\left(\pi \rho(q/N)\, |p - q|\right)}{|p - q|^2}. \tag{57}$$

It reproduces the well-known universal formula for the level correlation function with Dyson repulsion law (for the correlations of eigenvalues of the unitary ensemble of matrices). It does not change even on the saturated part of the corresponding Young tableau where ρ(q/N) = 1. The only difference with respect to the hermitian ensemble is that in our case of discrete levels this correlation function makes sense only at discrete values of the distance between levels |p − q| = 1, 2, 3, · · · . The authors of [12–15] came to the same conclusion in the particular case of equal α’s. Our generalization to any collection of $\alpha_k$’s shows once again a remarkable universality of the classical result (57).


7. A Special Large N Limit: Small Weights in Very Large Young Tableaux

In this section we study a new singular scaling regime, in which all the $\alpha_k \to 1$ and $N \to \infty$, so that the parameter $\rho = \frac{1}{N} \sum_k N(1 - \alpha_k)$ remains finite. In this limit the polynomial $L_N^{(p)}(\alpha)$ defined by (23) becomes:

$$L_N^{(p)}(\alpha) \to \oint \frac{dt}{2\pi i\, t^{p+1}}\, e^{\rho/(t-1)} = e^{-\rho} \oint \frac{dz\, (1+z)^{p-1}}{2\pi i\, z^{p+1}}\, e^{-\rho z} = e^{-\rho} M_p(\rho), \tag{58}$$

with

$$M_p(\rho) = L_p(\rho) - L_{p-1}(\rho) \tag{59}$$

in which $L_n(\rho)$ is the standard Laguerre polynomial of order n, normalized to $L_n(0) = 1$. Similarly, in this limit,

$$\hat{L}_N^{(p)}(\alpha) \to M_p(\rho). \tag{60}$$
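Equation (59) can be verified by reading off the contour integral in (58) as a Taylor coefficient: $M_p(\rho)$ is the coefficient of $z^p$ in $(1+z)^{p-1} e^{-\rho z}$. The sketch below compares that coefficient with $L_p(\rho) - L_{p-1}(\rho)$, using the explicit sum $L_n(\rho) = \sum_k \binom{n}{k} (-\rho)^k / k!$ for the Laguerre polynomials and the convention $L_{-1} = 0$ (an assumption needed for $p = 0$). Function names are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def laguerre(n, rho):
    """Standard Laguerre polynomial L_n(rho), with L_n(0) = 1 and L_{-1} = 0."""
    if n < 0:
        return Fraction(0)
    return sum(Fraction(comb(n, k), factorial(k)) * (-rho) ** k
               for k in range(n + 1))


def M_from_contour(p, rho):
    """Coefficient of z^p in (1+z)^(p-1) exp(-rho z), cf. Eq. (58)."""
    if p == 0:
        return Fraction(1)  # (1+z)^(-1) exp(-rho z) = 1 + O(z)
    return sum(Fraction(comb(p - 1, p - k), factorial(k)) * (-rho) ** k
               for k in range(p + 1))
```

For $p \ge 1$ the term linear in ρ of `M_from_contour(p, rho)` is $-\rho$, consistent with the small-ρ behaviour used later in this section.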

The formula (22) becomes:

$$P_\infty(p) = e^{-\rho} \sum_{k=0}^{p} M_k(\rho)\, M_{k+1}(\rho). \tag{61}$$

In the limit $\rho \to 0$ of ultra large Young tableaux, we have $M_p(\rho) \simeq -\rho$ and the formula (61) gives explicitly

$$P_\infty(p) = \rho - (p + 1)\rho^2 + O(\rho^3). \tag{62}$$

In full analogy with these formulae for the density we can deduce from (38) the correlation function of the small weights p and q for large Young tableaux:

$$R_\infty(p, q) = \sum_{k=0}^{\inf(p,q)} M_{p-k}(\rho)\, M_{q-k}(\rho) \tag{63}$$

which gives for the limit $\rho \to 0$ of ultra large Young tableaux

$$R_\infty(p, q) = -\rho + \inf(p, q)\,\rho^2 + O(\rho^3). \tag{64}$$

8. Conclusion In this work we have studied discrete statistics of the kind (2). Such character expansions (CE) appear in many situations in which one sums over the irreducible representations of GL(N ): the hermitian matrix models, circular ensembles, in various partition functions and loop averages of gauge theories, such as those encountered in two-dimensional quantum chromodynamics [1, 2]. The systematic large N saddle point analysis of various CE’s was first proposed in [3] in relation to the calculation of partition functions and Wilson loops of QCD2 on the sphere, and developed further in [4, 5, 7]. It appeared to be the only effective method in the study of the combinatorics of the so called dually weighted planar graphs (DWG) [6] first introduced in [8].


Although we studied only a particular example of such models, our results suggest that the universal properties of various correlation functions observed for our model will hold for many other models defined as multiple sums over discrete levels (random ensembles on the Young tableaux). The explicit formulae, exact for finite N, which have been obtained above for the density of levels and for the correlation functions, provide a powerful handle on the study of various universal scaling limits. It would be nice to apply our methods to more complicated characteristics of the model, such as the probability distribution of the length of various rows of the Young tableau (which in the case of equal α’s shows, according to the authors of [11], a remarkable relation to the Painlevé II equation). Unfortunately, for such a quantity, we have not succeeded in writing it in the form of a finite contour integral representation, similar to that used in our paper for the correlation functions.

Note Added. After our paper was sent to the preprint data base we were informed by A. Okunkov about his recent results on the generating functions of correlators in a similar model [20].

9. Appendix: Density of Levels by a Direct Saddle Point Calculation

It is interesting to compare the result obtained in Sect. 4 with a direct calculation in the large N limit. For the simplest model in which all the α’s are equal, the method is straightforward; it is based, as usual, on the solution of a Riemann-Hilbert problem. The probability weight of a given set of random h’s is proportional by definition to $\alpha^{\sum_k h_k}\, \Delta^2(h_1, \dots, h_N)$. We write it in terms of the density distribution

$$\rho(x) = \frac{1}{N} \sum_{k=1}^{N} \delta(x - h_k/N); \tag{65}$$

this weight is proportional to $\exp N^2 \left[\log\alpha \int dx\, x\, \rho(x) + \int\!\!\int dx\, dy\, \log|x - y|\, \rho(x)\rho(y)\right]$. The similarity with the unitary ensembles of random matrices, with their characteristic logarithmic repulsion, is manifest. In the large N limit the weight is maximum for a density which satisfies the condition

$$\log\alpha + 2\, P\!\!\int dy\, \frac{\rho(y)}{x - y} = 0. \tag{66}$$

This condition holds on the support of the measure, which is yet to be determined. We shall make the ansatz (proposed in [3] in a similar situation) that ρ(x) remains equal to one for 0 < x < b, is some function of x in the interval b < x < a, vanishes at x = a and remains zero for x > a. The consistency of these conditions will be checked later. The equation for ρ(x) on the interval (b, a) becomes

$$\log\sqrt{\alpha} + \log\frac{x}{x - b} + P\!\!\int_b^a dy\, \frac{\rho(y)}{x - y} = 0. \tag{67}$$

We introduce the resolvent

$$G(z) = \int_0^a dy\, \frac{\rho(y)}{z - y} \tag{68}$$

246

E. Brézin, V. Kazakov

and

H (z) =

a b

dy

ρ(y) z − y.

The function H (z) is analytic in the cut-plane from b to a. On the cut it satisfies √ x α = 0. ReH(x) + log x−b One verifies readily that the function √ a −1 dx 1 x α H (z) = (z − a)(z − b) log √ π x−b (x − b)(a − x) b z−x

(69)

(70)

(71)

is the only fuction which satisfies the analycity requirements and the behaviour at infinity. If we consider a large circle in the complex u-plane, the behavior at infinity implies that √ du 1 u α = 0. (72) log √ z − u (u − b)(u − a) u−b Shrinking now the contour, and collecting the various singularities, we obtain √ b dx z α 1 H (z) = − log − (z − a)(z − b) . √ z−b z − x (a − x)(b − x) 0 The integration is then elementary and we end up with √ √ b(z − a) + a(z − b) 1 G(z) = − log α − 2 log . √ 2 z(a − b) fixes a and b; one finds √ √ 1+ α 1− α a= √ and b = √ . 1− α 1+ α

The behavior at infinity G(z) ∼

One finds then G(z) = log

(73)

(74)

1 z

1 (1 + α)z − (1 − α) − (1 − α) z − a)(z − b) , 2αz

(75)

(76)

in agreement with the result of Sect. 5. The density of levels is then obtained as

$$\rho(x) = -\frac{1}{\pi}\,\mathrm{Im}\,G(x+i0), \tag{77}$$

and one verifies our assumptions: ρ(x) = 1 for 0 < x < b, ρ(x) vanishes for x > a, and in the interval (b, a)

$$\rho(x) = \frac{2}{\pi}\arctan\sqrt{\frac{b(a-x)}{a(x-b)}}. \tag{78}$$

This result was first obtained by A. Vershik [10].

Acknowledgements. One of us (V.K.) thanks A. Vershik and L. Pastur for useful discussions and valuable bibliographical comments.


References

1. Migdal, A.: Sov. Phys. JETP 42, 413 (1975)
2. Rusakov, B.: Mod. Phys. Lett. A 5, 693 (1990)
3. Douglas, M. and Kazakov, V.: hep-th/9305047, Phys. Lett. B 319, 219 (1993)
4. Kazakov, V. and Wynter, T.: hep-th/9410087, Nucl. Phys. B 440, 407 (1995)
5. Gross, D. and Matytsin, A.: hep-th/9410054, Nucl. Phys. B 437, 541 (1995)
6. Kazakov, V., Staudacher, M. and Wynter, T.: hep-th/9502132, Commun. Math. Phys. 177, 451 (1996); hep-th/9506174, Commun. Math. Phys. 179, 235 (1996); hep-th/9601069, Nucl. Phys. B 471, 309 (1996)
7. Kostov, I., Staudacher, M. and Wynter, T.: hep-th/9703189, Commun. Math. Phys. 191, 283 (1998)
8. Di Francesco, Itzykson, C.: Ann. Inst. Henri Poincaré 59, 117 (1993)
9. Kazakov, V. and Zinn-Justin, P.: hep-th/9808043, Nucl. Phys. B 546, 647 (1999)
10. Vershik, A.M. and Kerov, S.V.: Soviet. Math. Dokl. 18, 527 (1977); Funct. Anal. Appl. 19, 21 (1983)
11. Baik, J., Deift, P. and Johansson, K.: math/9810105, math/9901118
12. Borodin, A., Olshanski, G.: math.CO/9905189
13. Borodin, A., Olshanski, G.: math.RT/9904010
14. Johansson, K.: math.CO/9906120
15. Johansson, K.: math.CO/9903134
16. Kazakov, V.: Nucl. Phys. B 354, 614 (1991)
17. Brézin, E. and Hikami, S.: cond-mat/9702213, Phys. Rev. E 56, 264 (1997)
18. Zhelobenko, D. P.: Compact Lie groups and their representations. VIII: Providence, RI: American Mathematical Society, 1973
19. Brézin, E. and Kazakov, V.: Phys. Lett. B 236, 144 (1990)
20. Okunkov, A.: math.CO/9903176, math.RT/9907127; Borodin, A., Okunkov, A., Olshanski, G.: math.CO/9905032

Communicated by A. Jaffe

Commun. Math. Phys. 214, 249 – 286 (2000)


On the Law of Addition of Random Matrices

L. Pastur1,2,*, V. Vasilchuk2,3

1 Centre de Physique Théorique de CNRS, Luminy–case 907, 13288 Marseille, France
2 U.F.R. de Mathématiques, Université Paris 7, 2, place Jussieu, 75251 Paris Cedex 05, France
3 Mathematical Division, Institute for Low Temperature Physics, 47, Lenin Ave., 310164, Kharkov, Ukraine

Received: 27 October 1999 / Accepted: 22 March 2000

* On leave from the U.F.R. de Mathématiques, Université Paris 7.

Abstract: The normalized eigenvalue counting measure of the sum of two Hermitian (or real symmetric) matrices $A_n$ and $B_n$, rotated independently with respect to each other by a random unitary (or orthogonal) Haar-distributed matrix $U_n$ (i.e. $A_n + U_n^* B_n U_n$), is studied in the limit of large matrix order n. Convergence in probability to a limiting nonrandom measure is established. A functional equation for the Stieltjes transform of the limiting measure in terms of the limiting eigenvalue measures of $A_n$ and $B_n$ is obtained and studied.

1. Introduction

The paper deals with the eigenvalue distribution of the sum of two n × n Hermitian or real symmetric random matrices as n → ∞. Namely, we express the limiting normalized counting measure of eigenvalues of the sum via the same measures of its two terms, assuming that the latter exist and that the terms are randomly rotated with respect to one another by a unitary or an orthogonal random matrix uniformly distributed over the group U(n) or O(n) respectively.

One may mention several motivations for the problem. First, it can be regarded in the context of the general problem of describing the eigenvalues of the sum of two matrices in terms of the eigenvalues of the two terms of the sum. The latter problem dates back at least to the paper of H. Weyl [33], and is related to a number of interesting questions of combinatorics, geometry, algebra, etc. (see e.g. the review [8] for recent results and references). The problem is also of considerable interest for mathematical physics because of its evident links with spectral theory and quantum mechanics (perturbation theory in particular). It is clear that one cannot expect in general a simple and closed expression for the eigenvalues of the sum of two given matrices via the eigenvalues of the terms. Hence, it is natural to look for a “generic” asymptotic answer, studying a randomized version of the problem in which at least one of the two terms is random and both behave rather regularly as n → ∞. Particular results of this type were given in [16, 19], where it was proved that under certain conditions the normalized eigenvalue counting measure of the sum converges in probability to a nonrandom limit that can be found as the unique solution of a certain functional equation determined by both terms of the sum. Thus, a randomized version of the problem admits a rather constructive and explicit solution in certain cases. These results were developed in several directions (see e.g. [9]–[11] and the recent work [21]).

Similar problems arose recently in operator algebra studies, known now as free (non-commutative) probability (see [28, 31, 29] for results and references). In particular, the notions of the R-transform and of the free convolution of measures were introduced by Voiculescu and allowed the limiting eigenvalue distributions of the sum to be given in a rather general and simple form.

From the point of view of random matrix theory the problem that we are going to consider is a version of the problem of the deformation (see e.g. [7] for this term) of a given random matrix (that can be a non-random matrix in particular) by another random matrix in the case when the “randomness” of the latter includes as an independent part the random choice of the basis in which this matrix is diagonal. We will discuss this topic in more detail in Sect. 2.

In this paper we present a simple method of deriving functional equations for the limiting eigenvalue distribution in a rather general situation. The method is based on certain differential identities for expectations of smooth matrix functions with respect to the normalized Haar measure of U(n) (or O(n)) and on elementary matrix identities, the resolvent identity first of all.
The basic idea is the same as in [16, 19]: to study not the moments of the counting measure, as it was proposed in the pioneering paper by Wigner [34], but rather its Stieltjes (called also the Cauchy or the Borel) transform, playing the role of an appropriate generating (or characteristic) function of the moments. However, the technical implementation of the idea in this paper is different and simpler than in [16, 19] (see Remark 1 after Theorem 2.1). The paper is organized as follows. In Sect. 2 we present and discuss our main results (Theorem 2.1). In Sect. 3 we prove Theorems 3.1 and 3.2 giving the solution of the problem under the conditions of the uniform in n boundedness of the fourth moments of the normalized counting measure of the terms. These conditions are more restrictive than those for our principal result, given in Theorem 2.1. Their advantage is that they allow us to use the main ingredients of our approach in more transparent form, free of technicalities. In Sect. 4 we prove Theorem 2.1, whose main condition is the uniform boundedness of the first absolute moment of the normalized counting measure of one of the two terms of the sum. In Sect. 5 we study certain properties of solutions of the functional equation and of the limiting counting measure. In Sect. 6 we discuss topics related to our main result and our technique.
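The remark that the Stieltjes transform acts as a generating function of the moments can be made concrete with a small numerical sketch. For a measure with compact support one has $s(z) = -\sum_{k\ge 0} m_k z^{-(k+1)}$ for |z| larger than the radius of the support. The atoms, weights and the point z below are arbitrary illustrative choices, not data from the paper.

```python
# Sketch: Stieltjes transform of a discrete probability measure versus its
# truncated moment expansion  s(z) ~ -sum_{k=0}^{K} m_k / z^(k+1).
# Atoms, weights and z are hypothetical values chosen for illustration.

def stieltjes(atoms, weights, z):
    """s(z) = sum_i w_i / (lambda_i - z) for a discrete measure."""
    return sum(w / (lam - z) for lam, w in zip(atoms, weights))

def moment(atoms, weights, k):
    """k-th moment m_k = sum_i w_i * lambda_i**k."""
    return sum(w * lam ** k for lam, w in zip(atoms, weights))

def moment_series(atoms, weights, z, order):
    """Truncated expansion, valid for |z| > max |lambda_i|."""
    return -sum(moment(atoms, weights, k) / z ** (k + 1) for k in range(order + 1))

atoms = [-1.0, 0.5, 2.0]
weights = [0.2, 0.5, 0.3]
z = 10.0 + 5.0j

exact = stieltjes(atoms, weights, z)
approx = moment_series(atoms, weights, z, 30)
error = abs(exact - approx)
```

The series only converges outside the support, whereas s(z) itself is defined for every non-real z, which is one reason for working with the transform rather than with the moments.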

2. Model and Main Result

We consider the ensemble of n-dimensional Hermitian (or real symmetric) random matrices $H_n$ of the form

$$H_n = H_{1,n} + H_{2,n}, \qquad H_{1,n} = V_n^* A_n V_n, \quad H_{2,n} = U_n^* B_n U_n. \tag{2.1}$$


We assume that $A_n$ and $B_n$ are random Hermitian (or real symmetric) matrices having arbitrary distributions, $V_n$ and $U_n$ are unitary (or orthogonal) random matrices uniformly distributed over the unitary group U(n) (or over the orthogonal group O(n)) with respect to the Haar measure, and $A_n$, $B_n$, $V_n$ and $U_n$ are mutually independent. For the sake of definiteness we will restrict ourselves to the case of Hermitian matrices and the group U(n). The results for symmetric matrices and for the group O(n) have the same form, although their proof is technically more involved (see Sect. 6).

We are interested in the asymptotic behavior as n → ∞ of the normalized eigenvalue counting measure (NCM) $N_n$ of the ensemble (2.1), defined for any Borel set $\Delta \subset \mathbb{R}$ by the formula

$$N_n(\Delta) = \frac{\#\{\lambda_i \in \Delta\}}{n}, \tag{2.2}$$

where $\lambda_i$, i = 1, ..., n, are the eigenvalues of $H_n$. The problem was studied recently in [31, 26, 30] in the context of free (non-commutative) probability. In particular, it follows from the results of [26] that if the matrices $A_n$ and $B_n$ are non-random, their norms are uniformly bounded in n, i.e. their NCMs $N_{1,n}$ and $N_{2,n}$ have compact supports uniformly in n, and if these measures have weak limits as n → ∞,

$$N_{1,n} \to N_1, \qquad N_{2,n} \to N_2, \tag{2.3}$$

then the NCM (2.2) of the random matrix (2.1) converges weakly with probability 1 to a non-random measure N. Besides, if

$$f(z) = \int_{-\infty}^{\infty} \frac{N(d\lambda)}{\lambda - z}, \quad \mathrm{Im}\, z > 0, \tag{2.4}$$

is the Stieltjes transform of this limiting measure and

$$f_r(z) = \int_{-\infty}^{\infty} \frac{N_r(d\lambda)}{\lambda - z}, \quad r = 1, 2, \tag{2.5}$$

are the Stieltjes transforms of $N_r$, r = 1, 2, of (2.3), then according to [18] f(z) satisfies the functional equation

$$f(z) = f_1(z + R_2(f(z))), \tag{2.6}$$

where $R_2(f)$ is defined by the relation

$$z = -\frac{1}{f_2(z)} - R_2(f_2(z)) \tag{2.7}$$

and is known as the R-transform of the measure $N_2$ of (2.3) (see Remark 3 after Theorem 2.1 and [31, 29] for the definition and properties of this transform, taking into account that our definition (2.7) differs from that of [31] by the sign). The proof of this result in [26, 18] was based on the asymptotic analysis of the expectations $m_k^{(n)}$ of the moments of the measure (2.2). Since, according to the spectral theorem and the definition (2.2),

$$m_k^{(n)} = E\{M_k^{(n)}\}, \qquad M_k^{(n)} = n^{-1}\,\mathrm{Tr}\, H_n^k, \tag{2.8}$$

one can study the averaged moments $m_k^{(n)}$ by computing asymptotically the expectations of the traces of the powers of (2.1) divided by n, i.e. of the corresponding multiple sums. This direct method dates back to the classic paper by Wigner [34]. It requires a considerable amount of combinatorial analysis, the existence of all moments of the measures $N_{1,2}^{(n)}$ and their rather regular behavior as n → ∞, in order to obtain the convergence of the expectations (2.8) for all integer k and to guarantee that the limiting moments determine the corresponding measure uniquely. By using this method it was proved in [26, 18] that the expectation of $N_n$ converges to the limit determined by (2.6)–(2.7), and in [26] that the variance $\mathrm{Var}\{M_k^{(n)}\} = E\{(M_k^{(n)})^2\} - E^2\{M_k^{(n)}\}$ admits the bound

$$\mathrm{Var}\{M_k^{(n)}\} \le \frac{C_k}{n^2}, \tag{2.9}$$

where $C_k$ is independent of n. This bound evidently yields the convergence of all moments with probability 1, and thereby the weak convergence with probability 1 of the random measures (2.2) to the non-random limit determined by (2.6), (2.7). The convergence with probability 1 here and below is understood as that in the natural probability space

$$\Omega = \prod_n \Omega_n, \tag{2.10}$$

where $\Omega_n$ is the probability space of matrices (2.1), that is, the product of the respective spaces of $A_n$ and $B_n$ and two copies of the group U(n) for $U_n$ and $V_n$.

In this paper we obtain the analogous result under weaker assumptions and by using a method that does not involve combinatorics. This is because we work with the Stieltjes transforms of the measures (2.2) and (2.3) and derive directly the functional equations for their limits and the bound analogous to (2.9) for the rate of their convergence (rather well known in random matrix theory, see e.g. [23, 11]) by using certain simple identities for expectations of matrix functions with respect to the Haar measure (Proposition 3.2 below) and elementary facts on resolvents of Hermitian matrices. The Stieltjes transform was first used in studies of the eigenvalue distribution of random matrices in the paper [16] and proved to be an efficient tool in the field (see e.g. [9–14, 19–21, 24, 25]). We list the properties of the Stieltjes transform that we will need below (see e.g. [1]).

Proposition 2.1. Let m be a non-negative measure normalized to unity and let

$$s(z) = \int \frac{m(d\lambda)}{\lambda - z}, \quad \mathrm{Im}\, z \ne 0, \tag{2.11}$$

be the Stieltjes transform of m (here and below integrals without limits denote integrals over the whole axis). Then:

(i) s(z) is analytic in $\mathbb{C} \setminus \mathbb{R}$ and

$$|s(z)| \le |\mathrm{Im}\, z|^{-1}; \tag{2.12}$$

(ii)

$$\mathrm{Im}\, s(z)\,\mathrm{Im}\, z > 0, \quad \mathrm{Im}\, z \ne 0; \tag{2.13}$$

(iii)

$$\lim_{y\to\infty} y\,|s(iy)| = 1; \tag{2.14}$$

(iv) for any continuous function φ with compact support we have the inversion (Frobenius–Perron) formula

$$\int \varphi(\lambda)\, m(d\lambda) = \lim_{\varepsilon\to 0}\frac{1}{\pi}\int \varphi(\lambda)\,\mathrm{Im}\, s(\lambda + i\varepsilon)\, d\lambda; \tag{2.15}$$

(v) conversely, any function verifying (2.12)–(2.14) is the Stieltjes transform of a non-negative measure normalized to unity, and this one-to-one correspondence between measures and their Stieltjes transforms is continuous if one uses the topology of weak convergence for measures and the topology of convergence on compact sets of $\mathbb{C} \setminus \mathbb{R}$ for their Stieltjes transforms.

We formulate now our main result. Since the eigenvalues of a Hermitian matrix are unitary invariant, we can replace the matrices (2.1) by

$$H_n = A_n + U_n^* B_n U_n, \tag{2.16}$$

where $A_n$, $B_n$ and $U_n$ are the same as in (2.1). However, it is useful to keep in mind that the problem is symmetric in $A_n$ and $B_n$. We prove

Theorem 2.1. Let $H_n$ be the random n × n matrix of the form (2.1). Assume that the normalized eigenvalue counting measures $N_{r,n}$, r = 1, 2, of the matrices $A_n$ and $B_n$ converge weakly in probability as n → ∞ to non-random nonnegative measures $N_r$, r = 1, 2, normalized to 1, and that

$$\sup_n \int |\lambda|\, E N^*_{r,n}(d\lambda) \equiv m_1 < \infty, \tag{2.17}$$

where $N^*_{r,n}$ is one of the measures $N_{1,n}$ or $N_{2,n}$. Then the normalized eigenvalue counting measure $N_n$ of $H_n$ converges in probability to a non-random nonnegative measure N, normalized to 1, whose Stieltjes transform (2.4) is a unique solution of the system

$$f(z) = f_1\!\left(z - \frac{\Delta_2(z)}{f(z)}\right), \qquad f(z) = f_2\!\left(z - \frac{\Delta_1(z)}{f(z)}\right), \qquad f(z) = \frac{1 - \Delta_1(z) - \Delta_2(z)}{-z} \tag{2.18}$$

in the class of functions f(z) satisfying (2.12)–(2.14) and functions $\Delta_r(z)$, r = 1, 2, analytic for Im z ≠ 0 and satisfying the conditions

$$\Delta_{1,2}(z) \to 0 \quad\text{as}\quad \mathrm{Im}\, z \to \infty, \tag{2.19}$$

where $f_r(z)$, r = 1, 2, are the Stieltjes transforms (2.5) of the measures $N_r$, r = 1, 2, and E{·} denotes the expectation with respect to the probability measure generated by $A_n$, $B_n$, $U_n$ and $V_n$.
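To get a feeling for the system (2.18), it may help to check it on one explicitly solvable case. The sketch below assumes, for illustration only, that $N_1 = N_2 = \tfrac12(\delta_{-1} + \delta_{+1})$; for this choice the free convolution is known to be the arcsine law on [−2, 2], with Stieltjes transform $f(z) = -1/\sqrt{z^2 - 4}$. By the symmetry of the problem we take $\Delta_1 = \Delta_2 = \Delta$ and read Δ off the third equation. The function names and the test points are ours.

```python
import cmath

def f_bernoulli(zeta):
    """Stieltjes transform of the measure (delta_{-1} + delta_{+1}) / 2."""
    return 0.5 * (1.0 / (-1.0 - zeta) + 1.0 / (1.0 - zeta))

def f_arcsine(z):
    """Candidate solution -1/sqrt(z^2 - 4); for Im z > 0 this product of
    principal square roots is the branch behaving like z at infinity."""
    return -1.0 / (cmath.sqrt(z - 2.0) * cmath.sqrt(z + 2.0))

def residuals(z):
    f = f_arcsine(z)
    delta = 0.5 * (1.0 + z * f)               # third equation, Delta_1 = Delta_2
    r1 = abs(f - f_bernoulli(z - delta / f))  # first (= second) equation
    r3 = abs(f - (1.0 - 2.0 * delta) / (-z))  # third equation, as a consistency check
    return r1, r3, abs(delta)

points = [0.3 + 1.0j, -1.5 + 0.2j, 2.5 + 0.05j, 10.0j]
max_residual = max(max(residuals(z)[:2]) for z in points)
delta_far = residuals(10.0j)[2]               # should be small, cf. (2.19)
```

The residuals vanish up to rounding, and Δ is already small at z = 10i, in line with condition (2.19). This is only a consistency check for one pair of measures, not a substitute for the proof.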


The theorem will be proved in Sect. 4. Here we make several remarks related to the theorem (see also Sect. 5).

Remark 1. The historically first example of a random matrix ensemble representable in the form (2.16) was proposed in [16] and has the form

$$H_{m,n} = H_{0,n} + \sum_{i=1}^{m} \tau_i P_{q_i}, \tag{2.20}$$

where $H_{0,n}$ is a non-random n × n Hermitian matrix such that its normalized eigenvalue counting measure converges weakly to a limiting non-negative measure $N_0$ normalized to 1, $\tau_i$, i = 1, ..., m, are i.i.d. random variables and $P_{q_i}$ are the orthogonal projections on unit vectors $q_i$, i = 1, ..., m, that are independent of one another and of $\{\tau_i\}_{i=1}^m$, and uniformly distributed over the unit sphere in $\mathbb{C}^n$ (in fact, in [16] a more general class of independent random vectors was considered, but we restrict ourselves here to unit vectors in order to have an example of an ensemble of the form (2.1)). It is clear that the matrix

$$\sum_{i=1}^{m} \tau_i P_{q_i} \tag{2.21}$$

can be written in the form $U_n^* B_n U_n$ of the second term of (2.1) or (2.16). According to [16] the NCM of the random matrix (2.21) converges in probability as n → ∞, m → ∞, m/n → c ≥ 0 to a non-random nonnegative measure normalized to 1 whose Stieltjes transform $f_{MP}(z)$ satisfies the equation

$$f_{MP}(z) = -\left(z - c\int \frac{\tau\,\sigma(d\tau)}{1 + \tau f_{MP}(z)}\right)^{-1}, \tag{2.22}$$

where σ is the probability law of $\tau_i$ in (2.20). Assume that σ has a finite first moment,

$$\int |\tau|\,\sigma(d\tau) < \infty. \tag{2.23}$$

Then, taking (2.21) as the second term of (2.1), we get, in view of the inequality

$$E\int |\lambda|\, N_{2,n}(d\lambda) \le n^{-1}\sum_{i=1}^{m} E\{|\tau_i|\} = \frac{m}{n}\,E\{|\tau|\} < \infty,$$

the condition (2.17) of Theorem 2.1. Applying then Theorem 2.1, in which $f_2(z)$ is given by (2.22), we obtain from the two last equations of the system (2.18) that

$$\frac{\Delta_2(z)}{f(z)} = c\int \frac{\tau\,\sigma(d\tau)}{1 + \tau f(z)}.$$

This and the first equation of (2.18) yield the functional equation for the Stieltjes transform of the limiting eigenvalue distribution of the ensemble (2.20),

$$f(z) = f_0\!\left(z - c\int \frac{\tau\,\sigma(d\tau)}{1 + \tau f(z)}\right), \tag{2.24}$$


where $f_0(z)$ is the Stieltjes transform of the limiting NCM $N_0$ of the non-random matrix $H_{0,n}$. This equation was obtained in [16] by another method, whose main ingredient was a careful analysis of the changes of the resolvent of the matrices (2.20) induced by the addition of the (m + 1)th term, i.e. by a rank-one perturbation. This allowed the authors to prove that the sequence $g_{i,n}(z) = n^{-1}\mathrm{Tr}(H_{i,n} - z)^{-1}$, i = 1, ..., m, converges in probability to the non-random limit f(z, t), $z \in \mathbb{C}\setminus\mathbb{R}$, t ∈ [0, 1], as n → ∞, m → ∞, i → ∞, m/n → c, i/m → t, and that the limiting function f(z, t) satisfies the quasilinear PDE

$$\frac{\partial f}{\partial t} + c\,\frac{\tau(t)}{1 + \tau(t) f}\,\frac{\partial f}{\partial z} = 0, \qquad f(z, 0) = f_0(z), \tag{2.25}$$

where τ(t) is the inverse of the probability distribution $\sigma(\tau) = P\{\tau_i \le \tau\}$. It can be shown that the solution of (2.25) at t = 1 coincides with the solution of (2.24) [16]. Equation (2.25) with τ(t) ≡ const is a particular case of the so-called complex Burgers equation which appeared in free probability [31], where the random matrices (2.20) provide an analytic model for the stationary processes with free increments, just as in conventional probability the heat equation and sums of i.i.d. random variables are an important ingredient of the theory of random processes with independent increments.

Remark 2. Consider the ensemble known as the deformed Gaussian ensemble [19]:

$$H_n = H_{0,n} + M_n, \tag{2.26}$$

where $H_{0,n}$ is a non-random matrix such that its normalized eigenvalue counting measure converges weakly to the limit $N_0$ and $M_n = \{M_{jk}\}_{j,k=1}^n$ is a random Hermitian matrix whose matrix elements $M_{jk}$ are complex Gaussian random variables satisfying the conditions

$$M_{jk} = \overline{M_{kj}}, \qquad E\{M_{jk}\} = 0, \qquad E\{M_{j_1 k_1}\overline{M_{j_2 k_2}}\} = \frac{2w^2}{n}\,\delta_{j_1 j_2}\delta_{k_1 k_2}. \tag{2.27}$$

In other words, the ensemble is defined by the distribution

$$P(dM) = Z_n^{-1}\exp\Big\{-\frac{n}{4w^2}\,\mathrm{Tr}\, M^2\Big\}\, dM, \qquad dM = \prod_{j=1}^{n} dM_{jj}\prod_{1\le j<k\le n} d\,\mathrm{Re}\, M_{jk}\; d\,\mathrm{Im}\, M_{jk}, \tag{2.28}$$

where $Z_n$ is the normalization constant. The distribution defines the Gaussian Unitary Ensemble (GUE) [17]. This is why the ensemble (2.26) is called the deformed GUE [7]. It is known [17] that $M_n$ can be written in the form

$$M_n = U_n^* \Lambda_n U_n, \tag{2.29}$$

where $U_n$ are unitary matrices whose probability law is the Haar measure on U(n) and $\Lambda_n$ is a diagonal random matrix independent of $U_n$ whose normalized eigenvalue counting measure converges with probability 1 to the semicircle law. The Stieltjes transform $f_{sc}(z)$ of the latter satisfies the simple functional equation [19]

$$f_{sc}(z) = -\big(z + 2w^2 f_{sc}(z)\big)^{-1}, \tag{2.30}$$


whose solution yields the semicircle law of Wigner

$$N_{sc}(d\lambda) = (4\pi w^2)^{-1}\sqrt{8w^2 - \lambda^2}\;\chi_{[-2\sqrt{2}w,\,2\sqrt{2}w]}(\lambda)\, d\lambda, \tag{2.31}$$

where $\chi_{[a,b]}(\lambda)$ is the indicator of the interval $[a, b] \subset \mathbb{R}$. It is easy to see that $E\{n^{-1}\mathrm{Tr}\, M_n^2\} = 2w^2 < \infty$. Denoting by $N_{sc,n}$ the NCM of the random matrices defined by (2.28), we can rewrite this inequality in the form

$$\int_{-\infty}^{\infty} \lambda^2\, E\{N_{sc,n}(d\lambda)\} < \infty. \tag{2.32}$$

Thus, if we use (2.29) as the second term in (2.16), it will satisfy condition (2.17). Taking $f_{sc}(z)$ as $f_2(z)$ in (2.18) we find from the two last equations of the system that $\Delta_2(z)/f(z) = -2w^2 f(z)$, and then the first equation of (2.18) takes the form

$$f(z) = f_0(z + 2w^2 f(z)), \tag{2.33}$$

where $f_0(z)$ is the Stieltjes transform of the limiting counting measure of the matrices $H_{0,n}$. This functional equation determining the limiting eigenvalue distribution of the deformed GUE was found by another method in [19] (see also [12]) for random matrices (2.26) in which $M_n$ has independent (modulo the Hermitian symmetry conditions) entries, for (2.28) in particular.

Remark 3. Consider now a probability measure m(dλ) and assume that its second moment $m_2$ is finite. In this case we can write the Stieltjes transform s(z) of m in the form $s(z) = -(z + \Sigma(z))^{-1}$, where Σ(z) is the Stieltjes transform of a non-negative measure whose total mass is $m_2$ (to prove this fact one can use, for example, the general integral representation [1] for functions satisfying (2.13)). Since $s'(z) = z^{-2}(1 + o(1))$, z → ∞, then, according to the local inversion theorem, there exists a unique functional inverse z(s) of s(z) defined and analytic in a neighborhood of zero and assuming its values in a neighborhood of infinity. Denote

$$\Sigma(z(s)) = R_m(s) \tag{2.34}$$

and, following Voiculescu [31], call $R_m(s)$ the R-transform of the probability measure m. By using the R-transforms $R_{1,2}$ of the measures $N_{1,2}$ we can rewrite the first two equations of system (2.18) in the form

$$\frac{\Delta_{1,2}(z)}{f(z)} = \frac{1}{f(z)} + z + R_{2,1}(f(z)) = -R(f(z)) + R_{2,1}(f(z)), \tag{2.35}$$

where R denotes the R-transform of the limiting normalized counting measure N of the ensemble (2.1) (the measure whose Stieltjes transform is f). These relations and the third equation of system (2.18) lead to the remarkably simple expression of R via $R_1$ and $R_2$,

$$R(f) = R_1(f) + R_2(f), \tag{2.36}$$


that “linearizes” the rather complex system (2.18). The relation was obtained by Voiculescu in the context of C*-algebra studies (see [31, 29] for results and references). Thus, one can regard the system (2.18) as a version of the binary operation on measures defined by (2.36) and known as the non-commutative convolution. A simple precursor of relation (2.36), containing the functional inverses of f and $f_{1,2}$ for real z lying outside of the support of $N_0$ in (2.24), was used in [16] (see also [25]) to locate the support of N in terms of the support of $N_0$ in the case of the ensemble (2.20). The simplest form of the relation (2.36), for the case when both measures are semicircle measures (2.31), i.e. when $R_{1,2} = 2w_{1,2}^2 f$, was indicated in [19]. A formal derivation of relation (2.36) for the case when both matrices $H_1$ and $H_2$ are distributed according to the laws

$$P_{1,2}^{(n)}(dH) = Z_{1,2}^{(n)}\exp\{-nV_{1,2}(H)\}\, dH, \tag{2.37}$$

where $V_{1,2}: \mathbb{R} \to \mathbb{R}_+$ are polynomials of an even degree, was given in [36]. The derivation is based on perturbation theory with respect to the non-quadratic part of $V_{1,2}$, and the R-transform is related to the sum of irreducible diagrams of the formal perturbation series. The existence of the limiting eigenvalue counting measure for the random matrix ensemble (2.37) was rigorously proved in [6] for a rather broad class of functions V (not necessarily polynomials). It was also proved that the normalized counting measure (2.2) converges in probability to the limiting measure. The form (2.29) of the matrices of the ensemble (2.37) can be deduced from known results on the ensemble (2.37) (see e.g. [5]) in the same way as for the GUE (2.28), where $V(\lambda) = \lambda^2/4w^2$ (see [17]). Condition (2.17) follows from the results of [6, 21]. Thus we can apply Theorem 2.1 to obtain rigorously relation (2.36) in the case when the matrices $H_r$, r = 1, 2, in (2.1) are distributed according to (2.37).

Remark 4. The problem of addition of random Hermitian (real symmetric) matrices has natural multiplicative analogues in the case of positive definite Hermitian (real symmetric) or unitary (orthogonal) matrices. Namely, assuming that $A_n$ and $B_n$ are positive definite matrices and $U_n$ is the unitary (orthogonal) Haar distributed random matrix, we can consider the positive definite random matrix

$$H_n = A_n^{1/2}\, U_n^* B_n U_n\, A_n^{1/2}. \tag{2.38}$$

Likewise, if $S_n$ and $T_n$ are unitary (orthogonal) matrices and $U_n$ is as above, we can consider the random unitary matrices

$$V_n = S_n U_n^* T_n U_n. \tag{2.39}$$

In the latter case the normalized eigenvalue counting measure is defined as $n^{-1}$ times the number of eigenvalues belonging to a Borel set of the unit circle. In both cases (2.38) and (2.39) one can study the limiting properties of the NCMs of the respective random matrices provided that the “input” matrices $A_n$, $B_n$, $S_n$ and $T_n$ have limiting eigenvalue distributions. The first examples of ensembles of the above forms, as multiplicative analogues of the ensemble (2.20), were proposed in [16], where the respective functional equations analogous to (2.24) were derived. A general class of random matrix ensembles of these forms was studied in free probability [28, 31, 2], where the notions of the S-transform and the free multiplicative convolution of measures were proposed and used to give a general form of the limiting eigenvalue distributions of the products (2.38) and (2.39). It will be shown in the subsequent paper [27] that a version of the method of this paper leads to results analogous to those given in Theorem 2.1 above.
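Returning to the additive case of Remarks 2 and 3, the semicircle example can be checked numerically. The sketch below is an illustration under our own choice of parameters (the values of $w_1$, $w_2$, z and the number of quadrature steps are hypothetical). It compares the closed-form root of (2.30) with a direct quadrature of the density (2.31), and then checks that the semicircle law with $w^2 = w_1^2 + w_2^2$ solves (2.33) when $f_0$ is itself a semicircle transform, as the additivity (2.36) of $R = 2w^2 f$ suggests.

```python
import cmath
import math

SQRT8 = 2.0 * math.sqrt(2.0)

def f_semicircle(z, w):
    """Root of f = -1/(z + 2 w^2 f) with Im f > 0 for Im z > 0."""
    root = cmath.sqrt(z - SQRT8 * w) * cmath.sqrt(z + SQRT8 * w)
    return (-z + root) / (4.0 * w * w)

def f_from_density(z, w, steps=20000):
    """Midpoint-rule Stieltjes transform of the density (2.31)."""
    edge = SQRT8 * w
    h = 2.0 * edge / steps
    total = 0.0
    for i in range(steps):
        lam = -edge + (i + 0.5) * h
        density = math.sqrt(8.0 * w * w - lam * lam) / (4.0 * math.pi * w * w)
        total += density / (lam - z) * h
    return total

z, w1, w2 = 0.5 + 1.0j, 0.7, 1.1

f1 = f_semicircle(z, w1)
equation_error = abs(f1 + 1.0 / (z + 2.0 * w1 * w1 * f1))   # residual of (2.30)
density_error = abs(f1 - f_from_density(z, w1))            # (2.31) vs (2.30)

# (2.33) with f_0 the semicircle transform of width w1 and M_n of width w2.
f = f_semicircle(z, math.sqrt(w1 * w1 + w2 * w2))
additivity_error = abs(f - f_semicircle(z + 2.0 * w2 * w2 * f, w1))
```

The quadrature error is governed by the square-root behavior of the density at the edges, so only a few digits of agreement should be expected from `density_error`, while the two algebraic residuals vanish up to rounding.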


3. Convergence with Probability 1 for Non-Random $A_n$ and $B_n$

As the first step of the proof of Theorem 2.1 we prove the following

Theorem 3.1. Let $H_n$ be the random n × n matrix of the form (2.1) in which $A_n$ and $B_n$ are non-random Hermitian matrices, and $U_n$ and $V_n$ are random independent unitary matrices, each distributed according to the Haar measure on U(n) normalized to 1. Assume that the normalized counting measures $N_{r,n}$, r = 1, 2, of the matrices $A_n$ and $B_n$ converge weakly as n → ∞ to nonnegative measures $N_r$, r = 1, 2, normalized to 1, and that

$$\sup_n \int \lambda^4\, N_{r,n}(d\lambda) = m_4 < \infty, \quad r = 1, 2. \tag{3.1}$$

Then the normalized eigenvalue counting measure (2.2) of $H_n$ converges with probability 1 to a non-random measure normalized to 1 whose Stieltjes transform (2.4) is a unique solution of the system (2.18) in the class of functions f(z), $\Delta_r(z)$, r = 1, 2, analytic for Im z ≠ 0 and satisfying conditions (2.12)–(2.14) and (2.19) respectively.

Remark 1. The theorem generalizes the results of [26], proved under the condition that the supports of the NCMs $N_{r,n}$, r = 1, 2, of $A_n$ and $B_n$ are uniformly bounded in n.

Remark 2. By mimicking the proof of the Glivenko–Cantelli theorem (see e.g. [15]), one can prove that the random distribution functions $N_n(\lambda) = N_n(]-\infty, \lambda[)$ corresponding to the measures (2.2) converge uniformly with probability 1 to the distribution function $N(\lambda) = N(]-\infty, \lambda[)$ corresponding to the measure N:

$$P\Big\{\lim_{n\to\infty}\,\sup_{\lambda\in\mathbb{R}} |N_n(\lambda) - N(\lambda)| = 0\Big\} = 1.$$

We present now our technical means. First is a collection of elementary facts of linear algebra.

Proposition 3.1. Let $\mathcal{M}_n$ be the algebra of linear transformations of $\mathbb{C}^n$ into itself (n × n complex matrices) equipped with the norm induced by the Euclidean norm of $\mathbb{C}^n$. We have:

(i) if $M \in \mathcal{M}_n$ and $\{M_{jk}\}_{j,k=1}^n$ is the matrix of M in any orthonormalized basis of $\mathbb{C}^n$, then

$$|M_{jk}| \le \|M\|; \tag{3.2}$$

(ii) if $\mathrm{Tr}\, M = \sum_{j=1}^{n} M_{jj}$, then

$$|\mathrm{Tr}\, M_1 M_2| \le (\mathrm{Tr}\, M_1 M_1^*)^{1/2}(\mathrm{Tr}\, M_2 M_2^*)^{1/2}, \tag{3.3}$$

where $M^*$ is the Hermitian conjugate of M, and if P is a positive definite transformation, then

$$|\mathrm{Tr}\, MP| \le \|M\|\,\mathrm{Tr}\, P; \tag{3.4}$$

(iii) for any Hermitian transformation M its resolvent

$$G(z) = (M - z)^{-1} \tag{3.5}$$

is defined for all non-real z, Im z ≠ 0,

$$\|G(z)\| \le |\mathrm{Im}\, z|^{-1}, \tag{3.6}$$

and if $\{G_{jk}(z)\}_{j,k=1}^n$ is the matrix of G(z) in any orthonormalized basis of $\mathbb{C}^n$, then

$$|G_{jk}(z)| \le |\mathrm{Im}\, z|^{-1}; \tag{3.7}$$

(iv) if $M_1$ and $M_2$ are two Hermitian transformations and $G_r(z)$, r = 1, 2, are their resolvents, then

$$G_2(z) = G_1(z) - G_1(z)(M_2 - M_1)G_2(z) \tag{3.8}$$

(the resolvent identity);

(v) if $G(z) = (M - z)^{-1}$ is regarded as a function of M, then the derivative $G'(z)$ of G(z) with respect to M verifies the relation

$$G'(z)\cdot X = -G(z) X G(z) \tag{3.9}$$

for any Hermitian $X \in \mathcal{M}_n$, and, in particular,

$$\|G'(z)\| \le \|G(z)\|^2 \le |\mathrm{Im}\, z|^{-2}. \tag{3.10}$$
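As a quick sanity check of items (iii) and (iv), the resolvent identity (3.8) and the entrywise bound (3.7) can be verified numerically for a pair of 2 × 2 Hermitian matrices. The matrices and the point z below are arbitrary illustrative choices, and the helper functions are ours; the sketch uses only plain Python lists.

```python
def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub(a, b):
    """Entrywise difference of two 2x2 matrices."""
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def resolvent(m, z):
    """G(z) = (M - z)^(-1) for a 2x2 matrix, via the adjugate formula."""
    a, b = m[0][0] - z, m[0][1]
    c, d = m[1][0], m[1][1] - z
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Two Hermitian matrices and a non-real spectral parameter (illustrative values).
m1 = [[1.0, 0.5 - 0.2j], [0.5 + 0.2j, -0.3]]
m2 = [[0.2, -1.0j], [1.0j, 2.0]]
z = 0.4 + 0.7j

g1, g2 = resolvent(m1, z), resolvent(m2, z)
rhs = sub(g1, matmul(matmul(g1, sub(m2, m1)), g2))   # right-hand side of (3.8)

identity_error = max(abs(g2[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
max_entry = max(abs(g2[i][j]) for i in range(2) for j in range(2))
bound = 1.0 / abs(z.imag)                            # right-hand side of (3.7)
```

Here `identity_error` is zero up to rounding and `max_entry` does not exceed `bound`, as (3.7) requires.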

Here is our main technical tool.

Proposition 3.2. Let $\Phi: \mathcal{M}_n \to \mathbb{C}$ be a continuously differentiable function. Then the following relation holds for any $M \in \mathcal{M}_n$ and any Hermitian element $X \in \mathcal{M}_n$:

$$\int_{U(n)} \Phi'(U^* M U)\cdot[X, U^* M U]\, dU = 0, \tag{3.11}$$

where

$$[M_1, M_2] = M_1 M_2 - M_2 M_1 \tag{3.12}$$

is the commutator of $M_1$ and $M_2$ and the symbol

$$\int_{U(n)} \dots\, dU \tag{3.13}$$

denotes integration over U(n) with respect to the normalized Haar measure dU.

Proof. To prove (3.11) we use the right shift invariance of the Haar measure, $dU = d(UU_0)$, $\forall U_0 \in U(n)$, according to which the integral

$$\int_{U(n)} \Phi\big(e^{-i\varepsilon X}\, U^* M U\, e^{i\varepsilon X}\big)\, dU$$

is independent of ε for any Hermitian $X \in \mathcal{M}_n$. Thus its derivative with respect to ε at ε = 0 is zero. Since $\frac{d}{d\varepsilon}\, e^{-i\varepsilon X} A\, e^{i\varepsilon X}\big|_{\varepsilon=0} = -i[X, A]$, this derivative is, up to the factor −i, the l.h.s. of (3.11).


Proposition 3.3. System (2.18) has a unique solution in the class of functions f(z), $\Delta_{1,2}(z)$ analytic for Im z ≠ 0 and satisfying conditions (2.12)–(2.14) and (2.19).

Proof. Assume that there exist two solutions $(f', \Delta'_{1,2})$ and $(f'', \Delta''_{1,2})$ of the system. Denote $\delta f = f' - f''$, $\delta\Delta_{1,2} = \Delta'_{1,2} - \Delta''_{1,2}$. Then, by using (2.18) and the integral representation (2.5) for $f_{1,2}$, we obtain the linear system for $\delta\varphi = z\,\delta f$ and for $\delta\Delta_{1,2}$,

$$\delta\varphi\,(1 - a_1(z)) + b_1(z)\,\delta\Delta_1 = 0, \qquad \delta\varphi\,(1 - a_2(z)) + b_2(z)\,\delta\Delta_2 = 0, \qquad \delta\varphi - \delta\Delta_1 - \delta\Delta_2 = 0, \tag{3.14}$$

where

$$a_1 = \frac{\Delta''_1}{f' f''}\, I_2, \qquad b_1 = \frac{z}{f'}\, I_2, \qquad I_2 = I_2\big(z - \Delta'_1/f',\; z - \Delta''_1/f''\big), \tag{3.15}$$

$$I_2(z', z'') = \int \frac{N_2(d\lambda)}{(\lambda - z')(\lambda - z'')}, \tag{3.16}$$

and $a_2$, $b_2$ can be obtained from $a_1$ and $b_1$ by replacing $N_2$ and $\Delta_1$ by $N_1$ and $\Delta_2$ in the above formulas. For any $y_0 > 0$ consider the domain

$$E(y_0) = \{z \in \mathbb{C} : |\mathrm{Im}\, z| \ge y_0,\; |\mathrm{Re}\, z| \le |\mathrm{Im}\, z|\}. \tag{3.17}$$

If s(z) is the Stieltjes transform (2.11) of a probability measure m, then we have for $z \in E(y_0)$ and any M > 0

$$\left|\int \frac{\lambda\, m(d\lambda)}{\lambda - z}\right| = \left|\int_{|\lambda|\le M} + \int_{|\lambda|>M}\right| \le \frac{M}{y_0} + 2\int_{|\lambda|>M} m(d\lambda),$$

i.e.

$$z s(z) = -1 + o(1), \quad z \to \infty,\; z \in E(y_0). \tag{3.18}$$

Analogously, by using this asymptotic relation and condition (2.19) we obtain that for z → ∞, $z \in E(y_0)$,

$$z^2 I_{1,2}(z) = 1 + o(1), \qquad a_{1,2}(z) = o(1), \qquad b_{1,2}(z) = -1 + o(1).$$

Thus the determinant $b_1 b_2 + b_1 + b_2 - (a_2 b_1 + a_1 b_2)$ of system (3.14) is asymptotically equal to −1. We conclude that if $y_0$ in (3.17) is big enough, then system (3.14) has only the trivial solution, i.e. system (2.18) is uniquely solvable.

In what follows we use the notation

$$\int_{U(n)} \dots\, dU = \langle \dots \rangle. \tag{3.19}$$

Proof of Theorem 3.1. Because of unitary invariance of eigenvalues of Hermitian matrices we can assume without loss of generality that the unitary matrix V in (2.1) is set to unity, i.e. we can work with the random matrix (2.16). We will omit below the subindex

On the Law of Addition of Random Matrices

261

n in all cases when it will not lead to confusion. Write the resolvent identity (3.8) for the pair (H1 , H ) of (2.1): G(z) = G1 (z) − G1 (z)H2 G(z),

(3.20)

where G(z) = (H1 + H2 − z)−1 , G1 (z) = (H1 − z)−1 . Consider the matrix gn (z)G(z), where gn (z) =

1 TrG(z) = n

Nn (dλ) , Im z = 0 λ−z

(3.21)

is the Stieltjes transform of random measure (2.2). The resolvent identity (3.20) leads to the relation gn (z)G(z) = gn (z)G1 (z) − G1 (z)gn (z)H2 G(z). (3.22) By using Proposition 3.2 with the matrix element (H1 + M − z)−1 ac as =(M) we have in view of (3.9) and (3.11), (3.12), (G[X, H2 ]G)ac = 0. Choosing the Hermitian matrix X with only (a, b)th and (b, a)th non-zero entries, we obtain Gaa (H2 G)bc = (GH2 )aa Gbc . Applying to this relation the operation n−1

n a=1

(3.23)

and taking into account the definition

(3.21) of gn (z) we rewrite the last relation in the form gn (z)H2 G(z) = δ2,n (z)G(z), where δ2,n (z) =

1 TrH2 G(z). n

(3.24)

Thus we can rewrite (3.22) as gn (z)G(z) = gn (z)G1 (z) − G1 (z)δ2,n (z)G(z).

(3.25)

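Introduce now the centralized quantities

$$g_n^\circ(z) = g_n(z) - f_n(z), \quad \delta_{2,n}^\circ(z) = \delta_{2,n}(z) - \Delta_{2,n}(z), \tag{3.26}$$

where

$$f_n(z) = \langle g_n(z) \rangle, \quad \Delta_{2,n}(z) = \langle \delta_{2,n}(z) \rangle. \tag{3.27}$$

With these notations (3.25) becomes

$$f_n(z) \langle G(z) \rangle = f_n(z) G_1(z) - \Delta_{2,n}(z) G_1(z) \langle G(z) \rangle + R_{1,n}(z), \tag{3.28}$$

where

$$R_{1,n}(z) = -\langle g_n^\circ(z) G(z) \rangle - G_1(z) \langle \delta_{2,n}^\circ(z) G(z) \rangle. \tag{3.29}$$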

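Besides, since

$$n^{-1} \operatorname{Tr} H^2 = n^{-1} \operatorname{Tr} (H_1 + H_2)^2 \le 2 n^{-1} \operatorname{Tr} H_1^2 + 2 n^{-1} \operatorname{Tr} H_2^2 = 2 \int \lambda^2 N_{1,n}(d\lambda) + 2 \int \lambda^2 N_{2,n}(d\lambda) \le 4 m_2 \le 4 m_4^{1/2}, \tag{3.30}$$

we have

$$\mu_2 \equiv \sup_n \left( n^{-1} \operatorname{Tr} H^2 \right) = \sup_n \int \lambda^2 N_n(d\lambda) \le 4 m_2 \le 4 m_4^{1/2} < \infty. \tag{3.31}$$

Thus

$$g_n(z) = \int \frac{N_n(d\lambda)}{\lambda - z} = -\frac{1}{z} + \tilde g_n(z), \quad \text{where} \quad \tilde g_n(z) = \int \frac{\lambda N_n(d\lambda)}{(\lambda - z) z}.$$

In view of (3.31),

$$|z \tilde g_n(z)| \le |\operatorname{Im} z|^{-1} \int |\lambda| N_n(d\lambda) \le |\operatorname{Im} z|^{-1} m_4^{1/4},$$

i.e. the asymptotic relation

$$g_n^{-1}(z) = -z \left( 1 + O\left( \frac{1}{|\operatorname{Im} z|} \right) \right), \quad \operatorname{Im} z \to \infty, \tag{3.32}$$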

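holds uniformly in $n$. We have also the simple bound

$$|g_n(z)| \le |\operatorname{Im} z|^{-1} \tag{3.33}$$

following from (3.4) and (3.7) and, in addition, according to Proposition 3.1 and (3.24), the relations

$$|\delta_{2,n}(z)| \le m_4^{1/4} |\operatorname{Im} z|^{-1}, \tag{3.34}$$

$$z \delta_{2,n}(z) = n^{-1} \operatorname{Tr} H_2 z G(z) = n^{-1} \operatorname{Tr} H_2 (-1 + H G(z)). \tag{3.35}$$

Hence, in view of (3.31),

$$|z \delta_{2,n}(z)| \le (n^{-1} \operatorname{Tr} H_2^2)^{1/2} + (n^{-1} \operatorname{Tr} H_2^2)^{1/2} (n^{-1} \operatorname{Tr} H^2 G(z) G^*(z))^{1/2} \le m_4^{1/4} + 2 m_4^{1/2} / y_0,$$

i.e. $z \delta_{2,n}(z)$ is uniformly bounded in $n$. As a result of the above bounds we have for $|\operatorname{Im} z| \ge y_0$, uniformly in $n$,

$$\| \Delta_{2,n}(z) f_n^{-1}(z) G_1(z) \| = O\left( \frac{1}{y_0} \right), \quad y_0 \to \infty, \tag{3.36}$$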

i.e. the matrix $1 + \Delta_{2,n}(z) f_n^{-1}(z) G_1(z)$ is invertible uniformly in $n$, and there is $y_0$ independent of $n$ such that for $|\operatorname{Im} z| \ge y_0$,

$$\| (1 + \Delta_{2,n}(z) f_n^{-1}(z) G_1(z))^{-1} \| \le 2. \tag{3.37}$$

Thus (3.28) is equivalent to

$$\langle G(z) \rangle = (1 + \Delta_{2,n}(z) f_n^{-1}(z) G_1(z))^{-1} G_1(z) + (1 + \Delta_{2,n}(z) f_n^{-1}(z) G_1(z))^{-1} f_n^{-1}(z) R_{1,n}(z),$$

or to

$$\langle G(z) \rangle = G_1\left( z - \Delta_{2,n}(z) f_n^{-1}(z) \right) + (1 + \Delta_{2,n}(z) f_n^{-1}(z) G_1(z))^{-1} f_n^{-1}(z) R_{1,n}(z).$$

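Applying to this relation the operation $n^{-1} \operatorname{Tr}$ we obtain

$$f_n(z) = f_{1,n}\left( z - \Delta_{2,n}(z) f_n^{-1}(z) \right) + r_{1,n}(z), \tag{3.38}$$

where

$$f_{1,n}(z) = n^{-1} \operatorname{Tr} G_1(z) = \int \frac{N_{1,n}(d\lambda)}{\lambda - z} \tag{3.39}$$

is the Stieltjes transform of the normalized counting measure of $H_{1,n}$ in (2.1) and

$$r_{1,n}(z) = n^{-1} \operatorname{Tr} (1 + \Delta_{2,n}(z) f_n^{-1}(z) G_1(z))^{-1} f_n^{-1}(z) R_{1,n}(z), \tag{3.40}$$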

where $R_{1,n}(z)$ is defined in (3.29). We show in Theorem 3.2 below that there exist a sufficiently big $y_0 > 0$ and $C(y_0) > 0$, both independent of $n$, such that if $z \in E(y_0)$, where $E(y_0)$ is defined in (3.17), then the variances

$$v_1(z) = \langle |g_n^\circ(z)|^2 \rangle, \quad v_2(z) = \langle |\delta_{2,n}^\circ(z)|^2 \rangle \tag{3.41}$$

admit the bounds

$$v_1(z) \le \frac{C(y_0)}{n^2}, \quad v_2(z) \le \frac{C(y_0)}{n^2}. \tag{3.42}$$

These bounds, Proposition 3.1, (3.37), and the Schwarz inequality for the expectation $\langle \dots \rangle$ imply that uniformly in $n$ and in $z \in E(y_0)$,

$$|r_{1,n}(z)| \le \frac{2 C^{1/2}(y_0)}{n} (1 + y_0^{-1}) \left\langle |f_n^{-2}(z)| \, n^{-1} \operatorname{Tr} G(z) G^*(z) \right\rangle^{1/2}.$$

In view of (3.27), (3.32) and the identity $z G(z) = -1 + H G(z)$ we have

$$f_n^{-1}(z) G(z) = -z (1 + O(y_0^{-1})) G(z) = (1 + O(y_0^{-1})) (1 - H G(z)),$$

and since, by (3.3), (3.4) and (3.30),

$$|n^{-1} \operatorname{Tr} H G(z)| \le y_0^{-1} (n^{-1} \operatorname{Tr} H^2)^{1/2} \le 2 m_4^{1/4} y_0^{-1}, \quad |n^{-1} \operatorname{Tr} H^2 G(z) G^*(z)| \le 4 m_4^{1/2} y_0^{-2},$$

we obtain that for $z \in E(y_0)$,

$$|r_{1,n}(z)| \le \frac{C_1(y_0)}{n}, \tag{3.43}$$

where $C_1(y_0)$ is independent of $n$ and is bounded in $y_0$. Furthermore, the bounds (3.33) and (3.34) imply that the sequences $\{f_n(z)\}$ and $\{\Delta_{2,n}(z)\}$ are analytic and bounded uniformly in $n$ for $|\operatorname{Im} z| \ge y_0 > 0$. Thus the sequences are compact with respect to uniform convergence on compacts of the domain

$$D(y_0) = \{ z \in \mathbb{C} : |\operatorname{Im} z| \ge y_0 > 0 \}. \tag{3.44}$$

In addition, according to the hypothesis of the theorem, the normalized counting measures $N_{1,n}$ of the matrices $H_{1,n}$ converge weakly to a limiting probability measure $N_1$. Hence, their Stieltjes transforms (3.39) converge uniformly on compacts of (3.44) to the Stieltjes transform $f_1$ of $N_1$. Hence, if $y_0 > 0$ is large enough, there exist two functions $f$ and $\Delta_2$, analytic in (3.44), verifying the relation

$$f(z) = f_1\left( z - \frac{\Delta_2(z)}{f(z)} \right), \quad |\operatorname{Im} z| \ge y_0.$$

This is the first equation of system (2.18). The second equation of the system follows from the argument above in which the roles of $H_1$ and $H_2$ are interchanged; in particular, the quantity $\langle n^{-1} \operatorname{Tr} H_1 G(z) \rangle$ is denoted $\Delta_{1,n}(z)$. As for the third equation, it is just the limiting form of the identity

$$\langle n^{-1} \operatorname{Tr} (H_{1,n} + H_{2,n} - z) G(z) \rangle = 1. \tag{3.45}$$

Thus, we have derived system (2.18). Its unique solubility in the domain (3.17), where $y_0$ is large enough, is proved in Proposition 3.3. Besides, all three functions $f_n$, $\Delta_{r,n}$, $r = 1, 2$, defined in (3.27) are a priori analytic for $|\operatorname{Im} z| > 0$. Hence, their limits $f$, $\Delta_r$, $r = 1, 2$, are also analytic for non-real $z$. In view of the weak compactness of probability measures and the continuity of the one-to-one correspondence between non-negative measures and their Stieltjes transforms (see Proposition 2.1(v)) there exists a unique non-negative measure $N$ such that $f$ admits the representation (2.4). The measure $N$ is a probability measure in view of (3.32) and (2.14).

We conclude that the whole sequence $\{f_n\}$ of expectations (3.27) of the Stieltjes transforms $g_n$ (3.21) of the measures (2.2) converges uniformly on compacts of $D(y_0)$, where $D(y_0)$ is defined in (3.44), to the limiting function $f$ verifying (2.18). This result, Theorem 3.2 and the Borel–Cantelli lemma imply that the sequence $\{g_n(z)\}$ converges with probability 1 to $f(z)$ for any fixed $z \in D(y_0)$. Since the convergence of a uniformly bounded sequence of analytic functions on a countable set having an accumulation point in their common domain of definition implies the uniform convergence of the sequence on any compact of the domain, we obtain the convergence of $g_n$ to $f$ with probability 1 on any compact of $D(y_0)$. Due to the continuity of the one-to-one correspondence between probability measures and their Stieltjes transforms (see Proposition 2.1(v)), the normalized counting measure (2.2) of the eigenvalues of the random matrix (2.1) converges weakly with probability 1 to the non-random measure $N$ whose Stieltjes transform (2.4) satisfies (2.18).
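The convergence with probability 1 rests on the $O(n^{-2})$ variance bound for $g_n(z)$ invoked above (Theorem 3.2). The following Monte Carlo sketch illustrates that decay; it is only an illustration under our own assumptions: the equally spaced spectra of $A$ and $B$, the sizes 40 and 160, the point $z = 3i$, the sample count and the helper `haar_unitary` do not come from the paper.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-distributed unitary matrix (assumed helper, QR-based)."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def stieltjes_variance(n, z, samples, rng):
    """Empirical variance of g_n(z) over Haar-distributed U, for fixed A and B."""
    a = np.linspace(-1.0, 1.0, n)   # illustrative spectrum of A
    b = np.linspace(-1.0, 1.0, n)   # illustrative spectrum of B
    g = np.empty(samples, dtype=complex)
    for s in range(samples):
        u = haar_unitary(n, rng)
        h = np.diag(a) + (u.conj().T * b) @ u   # A + U* B U
        g[s] = np.mean(1.0 / (np.linalg.eigvalsh(h) - z))
    return np.mean(np.abs(g - g.mean()) ** 2)

rng = np.random.default_rng(1)
z = 3.0j
var_small = stieltjes_variance(40, z, 200, rng)
var_large = stieltjes_variance(160, z, 200, rng)

# If the variance behaves like C / n^2, these two numbers are comparable.
scaled = (40**2 * var_small, 160**2 * var_large)
```

Increasing $n$ by a factor of 4 should reduce the empirical variance by roughly a factor of 16, up to sampling error.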


Theorem 3.2. Let $H_n$ be the random matrix of the form (2.1) satisfying the condition of Theorem 3.1. Denote

$$g_n(z) = n^{-1} \operatorname{Tr} (H_n - z)^{-1}, \quad \delta_{r,n}(z) = n^{-1} \operatorname{Tr} H_{r,n} (H_n - z)^{-1}, \quad r = 1, 2. \tag{3.46}$$

Then there exist $y_0$ and $C(y_0)$, both positive and independent of $n$, such that the variances of the random variables (3.46) admit the bounds

$$\langle |g_n(z) - \langle g_n(z) \rangle|^2 \rangle \le \frac{C(y_0)}{n^2}, \tag{3.47}$$

$$\langle |\delta_{r,n}(z) - \langle \delta_{r,n}(z) \rangle|^2 \rangle \le \frac{C(y_0)}{n^2}, \quad r = 1, 2, \tag{3.48}$$

if $z \in E(y_0)$, where $E(y_0)$ is defined in (3.17).

Proof. Because of the symmetry of the problem with respect to $H_1$ and $H_2$ in (2.1) it suffices to prove (3.48) for, say, $\delta_{2,n}(z)$. Besides, we will use below the notations $g(z)$ and $\delta(z)$ for $g_n(z)$ and $\delta_{2,n}(z)$ and the notations 1 and 2 for two values $z_1$ and $z_2$ of the complex spectral parameter $z$. We assume that $|\operatorname{Im} z_{1,2}| \ge y_0 > 0$. We will use the same approach as in the proof of Theorem 3.1, i.e. we will derive and study certain relations obtained by using Proposition 3.2 and the resolvent identity. Consider the matrix

$$V_1 = \langle g^\circ(1) G(2) \rangle, \tag{3.49}$$

where $g^\circ(1) = g(1) - \langle g(1) \rangle$. It is clear that $n^{-1} \operatorname{Tr} V_1$ for $z_1 = z$ and $z_2 = \bar z$ is the variance (3.47) that we denoted by $v_1(z)$ in (3.41):

$$\langle |g^\circ(z)|^2 \rangle = n^{-1} \operatorname{Tr} V_1 \big|_{z_1 = z, \, z_2 = \bar z} = v_1(z). \tag{3.50}$$

In view of the resolvent identity (3.20) for the pair $(H_1, H)$ we have

$$V_1 = -G_1(2) W, \tag{3.51}$$

$$W = \langle g^\circ(1) H_2 G(2) \rangle. \tag{3.52}$$

Applying Proposition 3.2 to the function $\Phi(M) = G^\circ_{aa}(1) (M G(2))_{cd}$, where $G(z) = (H_1 + M - z)^{-1}$ and

$$G^\circ(z) = G(z) - \langle G(z) \rangle = (H_1 + M - z)^{-1} - \int_{U(n)} (H_1 + U^* B U - z)^{-1} dU,$$

we obtain the relation

$$\left\langle -(G(1)[X, H_2]G(1))_{aa} (H_2 G(2))_{cd} + G^\circ_{aa}(1) ([X, H_2] G(2))_{cd} - G^\circ_{aa}(1) (H_2 G(2) [X, H_2] G(2))_{cd} \right\rangle = 0,$$

where the operation $[\dots, \dots]$ is defined in (3.12). Choosing as $X$ the Hermitian matrix having only the $(c, j)$th and $(j, c)$th non-zero entries, we obtain from the above relation the following one:

$$\big\langle -G_{ac}(1) (H_2 G(1))_{ja} (H_2 G(2))_{cd} + (G(1) H_2)_{ac} G_{ja}(1) (H_2 G(2))_{cd} + G^\circ_{aa}(1) \delta_{cc} (H_2 G(2))_{jd} - G^\circ_{aa}(1) (H_2)_{cc} G_{jd}(2) - G^\circ_{aa}(1) (H_2 G(2))_{cc} (H_2 G(2))_{jd} + G^\circ_{aa}(1) (H_2 G(2) H_2)_{cc} G_{jd}(2) \big\rangle = 0.$$

Applying to this relation the operation $n^{-2} \sum_{a,c}$ and taking into account that

$$g^\circ = n^{-1} \sum_a G^\circ_{aa},$$

we have

$$\langle n^{-2} [G^2(1), H_2] H_2 G(2) \rangle + \langle g^\circ(1) H_2 G(2) \rangle + \langle g^\circ(1) k(2) G(2) \rangle - \langle g^\circ(1) \delta(2) H_2 G(2) \rangle = 0, \tag{3.53}$$

where

$$k(z) = n^{-1} \operatorname{Tr} K(z), \quad K(z) = B G_U(z) B - B, \quad G_U(z) = U G(z) U^*. \tag{3.54}$$

Introducing the centralized quantity (cf. (3.26))

$$k^\circ = k - \langle k \rangle, \tag{3.55}$$

and using our notations (3.24) and (3.27), we can rewrite (3.53) as

$$(1 - \Delta(2)) W = -\langle k(2) \rangle V_1 + R, \tag{3.56}$$

where

$$R = \langle g^\circ(1) \delta^\circ(2) H_2 G(2) \rangle - \langle g^\circ(1) k^\circ(2) G(2) \rangle - T_1, \tag{3.57}$$

and $T_1 = \langle n^{-2} [G^2(1), H_2] H_2 G(2) \rangle$. In view of the uniform in $n$ bound (3.36), the function $1 - \Delta(z)$ is bounded away from zero uniformly in $n$. Thus we have from (3.51), (3.52) and (3.56),

$$V_1 = -\left( 1 - \langle k(2) \rangle (1 - \Delta(2))^{-1} G_1(2) \right)^{-1} (1 - \Delta(2))^{-1} G_1(2) R. \tag{3.58}$$

According to (3.54), (3.6) and (3.1), we have uniformly in $n$,

$$|k(z)| \le y_0^{-1} n^{-1} \operatorname{Tr} B^2 + |n^{-1} \operatorname{Tr} B| \le y_0^{-1} m_4^{1/2} + m_4^{1/4} < \infty. \tag{3.59}$$

This bound and the universal bound (3.6) imply that the matrix $1 - \langle k(z) \rangle (1 - \Delta(z))^{-1} G_1(z)$ is invertible uniformly in $n$ if $|\operatorname{Im} z| \ge y_0$ and $y_0$ is large enough, and hence the matrix

$$Q = \left( 1 - \langle k(z) \rangle (1 - \Delta(z))^{-1} G_1(z) \right)^{-1} (1 - \Delta(z))^{-1} G_1(z)$$

admits the following bound for $|\operatorname{Im} z| \ge y_0$ and sufficiently large $y_0$:

$$\|Q\| \le \frac{C}{y_0}, \tag{3.60}$$

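where $C$ is an absolute constant. Setting now in (3.58) $z_1 = z$, $z_2 = \bar z$ and applying to this relation the operation $n^{-1} \operatorname{Tr}$, we obtain in the l.h.s. the variance $v_1(z)$ because of (3.50). As for the r.h.s., its terms can be estimated as follows in view of (3.57):

(i)
$$|\langle g^\circ(1) \delta^\circ(2) n^{-1} \operatorname{Tr} Q H_2 G(2) \rangle| \le \alpha_{12}(y_0) v_1^{1/2} v_2^{1/2}, \tag{3.61}$$

where $v_2$ is defined in (3.41), because, according to (3.1), (3.3), (3.6) and (3.60),

$$|n^{-1} \operatorname{Tr} Q H_2 G(2)| \le (n^{-1} \operatorname{Tr} Q^* Q)^{1/2} (n^{-1} \operatorname{Tr} H_2^2 G(2) G^*(2))^{1/2} \le C y_0^{-2} m_4^{1/4} \equiv \alpha_{12}(y_0); \tag{3.62}$$

(ii)
$$|\langle g^\circ(1) k^\circ(2) n^{-1} \operatorname{Tr} Q G(2) \rangle| \le \alpha_{13}(y_0) v_1^{1/2} v_3^{1/2}, \tag{3.63}$$

where

$$v_3 = \langle |k^\circ(z)|^2 \rangle, \tag{3.64}$$

because

$$|n^{-1} \operatorname{Tr} Q G(2)| \le (n^{-1} \operatorname{Tr} Q^* Q)^{1/2} (n^{-1} \operatorname{Tr} G(2) G^*(2))^{1/2} \le C y_0^{-2} \equiv \alpha_{13}(y_0); \tag{3.65}$$

(iii)
$$|\langle n^{-3} \operatorname{Tr} (Q [G^2(1), H_2] H_2 G(2)) \rangle| \le C m_4^{1/2} y_0^{-4} n^{-2} \equiv \frac{\beta_1(y_0)}{n^2}.$$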

Thus we obtain the inequality

$$v_1 \le \alpha_{12}(y_0) v_1^{1/2} v_2^{1/2} + \alpha_{13}(y_0) v_1^{1/2} v_3^{1/2} + \frac{\beta_1(y_0)}{n^2}, \tag{3.66}$$

where $\alpha_{12}$, $\alpha_{13}$ and $\beta_1$ are independent of $n$ and vanish as $y_0 \to \infty$. Now we are going to derive analogous inequalities for $v_2$ and $v_3$ defined in (3.41) and in (3.64) and to obtain the system

$$v_i \le \sum_{j=1, \, j \ne i}^{3} \alpha_{ij} v_i^{1/2} v_j^{1/2} + \frac{\beta_i(y_0)}{n^2}, \quad i = 1, 2, 3. \tag{3.67}$$

To get the second inequality of the system we consider the matrix (cf. (3.49))

$$V_2 = \langle \delta^\circ(1) H_2 G(2) \rangle. \tag{3.68}$$

Applying to $V_2$ the operation $n^{-1} \operatorname{Tr}$ and setting $z_1 = z$, $z_2 = \bar z$, we obtain the variance $v_2$ of (3.41). On the other hand, using Proposition 3.2 for the function $\Phi(M) = (M G(1))^\circ_{aa} (M G(2))_{cd}$, we obtain, after performing in essence the same procedure as that used in the derivation of (3.53), in particular, choosing the Hermitian matrix $X$ with only the $(c, j)$th and $(j, c)$th non-zero entries,

$$v_2 = -\langle g(2) \delta^\circ(1) k(2) \rangle + \langle \delta^\circ(1) \delta^2(2) \rangle - T_2, \tag{3.69}$$

where

$$T_2 = n^{-3} \langle \operatorname{Tr} ([G_U(1), K(1)] B G(2)) \rangle \tag{3.70}$$

and $K(z)$, $k(z)$ are defined in (3.54). Using again the centralized quantities (3.26) and (3.55), we can write

$$\langle g(2) \delta^\circ(1) k(2) \rangle = \langle g^\circ(2) \delta^\circ(1) k(2) \rangle + \langle g(2) \rangle \langle \delta^\circ(1) k^\circ(2) \rangle$$

and

$$\langle \delta^\circ(1) \delta^2(2) \rangle = \langle \delta^\circ(1) \delta^\circ(2) \delta(2) \rangle + \langle \delta^\circ(1) \delta^\circ(2) \rangle \langle \delta(2) \rangle.$$

Thus, in view of (3.33), (3.34), (3.59), and the Schwarz inequality we have the bounds

$$|\langle g(2) \delta^\circ(1) k(2) \rangle| \le v_1^{1/2} v_2^{1/2} m_4^{1/4} (1 + m_4^{1/4} y_0^{-1}) + v_2^{1/2} v_3^{1/2} y_0^{-1},$$

and

$$|\langle \delta^\circ(1) \delta^2(2) \rangle| \le 2 v_2 m_4^{1/4} y_0^{-1}.$$

These bounds and the analogously obtained bound for $T_2$ in (3.70) lead, for $m_4^{1/4} y_0^{-1} \le 1/4$, to the second inequality of (3.67), in which

$$\alpha_{21}(y_0) = 4 m_4^{1/4}, \quad \alpha_{23}(y_0) = 2 y_0^{-1}, \quad \beta_2 = 8 m_4^{1/4} y_0^{-2}. \tag{3.71}$$

To obtain the third inequality of (3.67) we may use the same scheme as above applied to the matrix $V_3 = \langle k^\circ(1) K(2) \rangle$ (cf. (3.49) and (3.68)). However, this requires rather tedious computations and the existence of the sixth moment $m_6$ of the measure $N_{2,n}$, bounded uniformly in $n$. For this reason we consider the quantity

$$\langle n^{-1} \operatorname{Tr} (B G_U(1) B)^\circ G_U(2) B \rangle, \tag{3.72}$$

where $G_U(z)$ is defined in (3.54). As before we would like to obtain for this quantity a certain relation, based on the invariance of the Haar measure with respect to the group shifts. To this end we introduce the following function of the unitary matrix $U$: $(B U G(1) U^* B)^\circ_{aa} (U G(2) U^* B)_{cd}$, where $G(z) = (H_1 + U^* B U - z)^{-1}$, and we use the analogue of (3.11) obtained from the left shift invariance of the Haar measure. This leads to the relation (cf. (3.53) and (3.69))

$$\langle k^\circ(1) g(2) K(2) \rangle + \langle k^\circ(1) \delta(2) G_U(2) B \rangle - \langle k^\circ(1) G_U(2) B \rangle - T_3 = 0, \tag{3.73}$$

where $T_3 = n^{-2} \langle G_U(1) B K(1) G_U(2) B - K(1) B G_U(1) G_U(2) B \rangle$. We multiply (3.73) by $B$ from the left and introduce again the centralized quantities $g^\circ$, $\delta^\circ$ and $k^\circ$ defined in (3.26) and (3.55). We obtain

$$(1 - \Delta(2) - f(2) B) \langle k^\circ(1) K(2) \rangle = -\langle k^\circ(1) g^\circ(2) B K(2) \rangle + \langle k^\circ(1) \delta^\circ(2) B G_U(2) B \rangle + B T_3.$$

In view of (3.32) and (3.36) the imaginary part of the function $1 - \Delta(z)$ is bounded away from zero uniformly in $n$ if $|\operatorname{Im} z|$ is large enough. Since $B$ is a Hermitian matrix, the matrix

$$S = (1 - \Delta(2) - f(2) B)^{-1} \tag{3.74}$$

admits the bound

$$\|S\| = |f(2)|^{-1} \cdot \| ((1 - \Delta(2)) f^{-1}(2) - B)^{-1} \| \le |f(2)|^{-1} \left| \operatorname{Im} \frac{1 - \Delta(2)}{f(2)} \right|^{-1}.$$

By using (3.28) and (3.34) we find that for $z \in E(y_0)$, where $E(y_0)$ is defined in (3.17) with sufficiently big $y_0$, we have the uniform in $n$ inequality $|f(2) \operatorname{Im} (1 - \Delta(2)) f^{-1}(2)| \ge 1/2$, i.e.

$$\|S\| \le 2. \tag{3.75}$$

This leads to the relation

$$V_3 \equiv \langle k^\circ(1) K(2) \rangle = -\langle k^\circ(1) g^\circ(2) S B K(2) \rangle + \langle k^\circ(1) \delta^\circ(2) S B G_U(2) B \rangle + S B T_3. \tag{3.76}$$

We apply to this relation the operation $n^{-1} \operatorname{Tr}$, set $z_1 = z$, $z_2 = \bar z$ and estimate the contribution of the first two terms of the r.h.s. of (3.76) as above, using in addition (3.75). We obtain

$$|n^{-1} \operatorname{Tr} S B K(2)| \le 4 m_4^{1/2} \equiv \alpha_{31}(y_0), \quad |n^{-1} \operatorname{Tr} S B G_U(2) B| \le 4 m_4^{1/2} y_0^{-1} \equiv \alpha_{32}(y_0). \tag{3.77}$$

To estimate the third term of the r.h.s. of (3.76) we use the identity

$$S B = -f^{-1}(2) + (1 - \Delta(2)) f^{-1}(2) S,$$

the asymptotic relations (3.32) and (3.34) and the bound (3.75). This yields the bound $\|S B\| \le 4 y_0$. By using this bound and the same reasoning as in obtaining the other bounds above, we obtain

$$|n^{-1} \operatorname{Tr} S B T_3| \le \frac{C m_4}{y_0^2 n^2} \equiv \frac{\beta_3}{n^2},$$

where $C$ is an absolute constant.

Let us introduce the new variables

$$u_1 = y_0 v_1^{1/2}, \quad u_2 = v_2^{1/2}, \quad u_3 = v_3^{1/2}. \tag{3.78}$$

Then we obtain from (3.67) and (3.62), (3.65), (3.71), and (3.77) the system

$$u_i^2 \le \sum_{j=1, \, j \ne i}^{3} a_{ij} u_i u_j + \frac{\gamma_i}{n^2}, \tag{3.79}$$

in which the coefficients $\{a_{ij}, i \ne j\}$ have the form $a_{ij} = y_0^{-1} b_{ij}$, where the $b_{ij}$ are bounded in $y_0$ and in $n$ as $y_0 \to \infty$ and $n \to \infty$. By choosing $y_0$ sufficiently big (and then fixing it) we can guarantee that $0 \le a_{ij} \le 1/4$, $i \ne j$. Thus, summing the three relations (3.79), we can write the result in the form $(\hat a u, u) \le \gamma / n^2$, where $\gamma = \gamma_1 + \gamma_2 + \gamma_3$ and $(\hat a)_{ij} = \delta_{ij} - (1 - \delta_{ij})/4$, $i, j = 1, 2, 3$. Since the minimum eigenvalue of the matrix $\hat a$ is 1/2, we obtain from (3.78) the bounds (3.47) and (3.48).

4. Convergence in Probability

In this section we prove Theorem 2.1. Since, according to Theorem 3.2, the randomness of $U_n$ in (2.1) (or (2.16)) already allows us to prove that the variance of the Stieltjes transform of the NCM (2.2) vanishes as $n \to \infty$, we have only to prove that the additional randomness due to the matrices $A_n$ and $B_n$ in (2.1) does not destroy this property. We will prove this fact first for $A_n$ and $B_n$ whose norms are uniformly bounded in $n$ (see Lemma 4.1 below), and then we will treat the general case of Theorem 2.1 by using a certain truncation procedure.

Proposition 4.1. Let $\{m_n\}$ be a sequence of random non-negative unit measures on the line and $\{s_n\}$ be the sequence of their Stieltjes transforms (2.11). Then the sequence $\{m_n\}$ converges weakly in probability to a non-random non-negative unit measure $m$ if and only if the sequence $\{s_n\}$ converges in probability for any fixed $z$ belonging to a compact $K \subset \{z \in \mathbb{C} : \operatorname{Im} z > 0\}$ to the Stieltjes transform $f$ of the measure $m$.

Proof. Let us prove first the necessity. According to the hypothesis, for any continuous function $\varphi(\lambda)$ having compact support we obtain

$$\lim_{n \to \infty} \mathbf{P}\left\{ \left| \int \varphi(\lambda) m(d\lambda) - \int \varphi(\lambda) m_n(d\lambda) \right| > \varepsilon \right\} = 0. \tag{4.1}$$

Let $\chi(\lambda)$ be a continuous function that is equal to 1 if $|\lambda| < A$ and is equal to 0 if $|\lambda| > A + 1$ for some $A > 0$. Then

$$|s(z) - s_n(z)| \le \left| \int \frac{\chi(\lambda) m(d\lambda)}{\lambda - z} - \int \frac{\chi(\lambda) m_n(d\lambda)}{\lambda - z} \right| + \frac{2}{\min\{\operatorname{dist}\{z, \pm A\}\}}.$$

According to (4.1) the first term in the r.h.s. of this inequality converges in probability to zero. Since $A$ is arbitrary, we obtain the required assertion. To prove sufficiency we assume that for any $z \in K$,

$$\lim_{n \to \infty} \mathbf{P}\{ |s(z) - s_n(z)| > \varepsilon \} = 0. \tag{4.2}$$

This relation and the inequality (cf. (2.12))

$$|s_n(z)| \le \max_{z \in K} |\operatorname{Im} z|^{-1} \equiv y_0^{-1} < \infty \tag{4.3}$$

imply that

$$\lim_{n \to \infty} \mathbf{E}\{ |s(z) - s_n(z)| \} = 0, \tag{4.4}$$

i.e. the sequence $\{s_n(z) - s(z)\}$ converges to zero in mean. We have also the inequality

$$|s_n'(z)| \le y_0^{-2} < \infty. \tag{4.5}$$

Inequalities (4.3) and (4.5) imply that the sequence $\{s_n\}_{n=1}^{\infty}$ of random analytic functions is uniformly bounded and equicontinuous. Thus, for any $\eta > 0$ we can construct in $K$ a finite $\eta$-network, i.e. a set $\{z_l\}_{l=1}^{p(\eta)}$ such that for any $z \in K$ there exists $z_l$ satisfying the inequality $|z - z_l| \le \eta$. Then we have for $\phi_n(z) \equiv s_n(z) - s(z)$, $S_l = \{z : |z - z_l| \le \eta\}$, and $\eta = y_0^2 \varepsilon / 2$, where $\varepsilon$ is arbitrary,

$$\sup_K |\phi_n(z)| = \max_{l = 1, \dots, p(\eta)} \sup_{z \in K \cap S_l} |\phi_n(z)| \le \varepsilon + \sum_{l=1}^{p(\eta)} |\phi_n(z_l)|,$$

and hence

$$\mathbf{E}\{ \sup_K |\phi_n(z)| \} \le \varepsilon + \sum_{l=1}^{p(\eta)} \mathbf{E}\{ |\phi_n(z_l)| \}.$$

This inequality and (4.4) imply that

$$\lim_{n \to \infty} \mathbf{E}\{ \sup_{z \in K} |s(z) - s_n(z)| \} = 0. \tag{4.6}$$

Assume now that the statement is false, i.e. the sequence $\{m_n\}$ does not converge weakly in probability to $m$. This means that there exist a continuous function $\varphi$ of compact support, a subsequence $\{n_k\}$ and some $\varepsilon > 0$ such that

$$\lim_{n_k \to \infty} \mathbf{P}\left\{ \left| \int \varphi(\lambda) m(d\lambda) - \int \varphi(\lambda) m_{n_k}(d\lambda) \right| \ge \varepsilon \right\} = \xi > 0. \tag{4.7}$$

On the other hand, we have from (4.6) and the Tchebyshev inequality that for any $r$ there exists an integer $n(r)$ such that for $n \ge n(r)$,

$$\mathbf{P}\left\{ \sup_{z \in K} |\phi_n(z)| \le r^{-1} \right\} \ge 1 - \xi/2. \tag{4.8}$$

Hence, one can select from the sequence $\{n_k\}$ a subsequence $\{n_{k'}\}$ such that inequalities (4.7) and (4.8) are both satisfied. Denote by $A$ and by $B$ the events whose probabilities are written in the l.h.s. of (4.7) and (4.8). Then $\mathbf{P}\{A \cap B\} \ge \mathbf{P}\{A\} + \mathbf{P}\{B\} - 1 \ge \xi/2$.

Hence, for any $n_{k'}$ there exists a realization $\omega_{n_{k'}}$ belonging to both sets $A$ and $B$, i.e. for which both inequalities

$$\left| \int \varphi(\lambda) m(d\lambda) - \int \varphi(\lambda) m_{n_{k'}}(d\lambda) \right| \ge \varepsilon, \quad \sup_{z \in K} |\phi_{n_{k'}}(z)| \le r^{-1} \tag{4.9}$$

are valid. In view of the compactness of the family of the random analytic functions $\{s_n\}$ with respect to the uniform convergence in $K$ and the weak compactness of the family of random measures $\{m_n\}$, there exist a subsequence $\{n_{k''}\}$ of $\{n_{k'}\}$ and a subsequence of realizations $\{\omega_{n_{k''}}\}$ such that the subsequence $\{m_{n_{k''}}\}$ corresponding to these realizations converges weakly to a certain measure $m'$, and we have in view of (4.7),

$$\left| \int \varphi(\lambda) m(d\lambda) - \int \varphi(\lambda) m'(d\lambda) \right| \ge \varepsilon > 0. \tag{4.10}$$

On the other hand, in view of (4.9) and the continuity of the correspondence between measures and their Stieltjes transforms (see Proposition 2.1(v)), the subsequence $\{s_{n_{k''}}\}$ converges uniformly on $K$ to $s(z)$, the Stieltjes transform of the measure $m$. This is incompatible with (4.10), because of the one-to-one correspondence between measures and their Stieltjes transforms.

Remark 1. Since the Stieltjes transforms of non-negative measures normalized to unity are analytic and bounded for non-real $z$, we can replace the requirement of their convergence for any $z$ belonging to a certain compact set of $\mathbb{C}_\pm$ by the convergence for any $z$ belonging to any interval of the imaginary axis, i.e. for $z = iy$, $y \in [y_1, y_2]$, $y_1 > 0$.

Remark 2. The argument used in the proof of the proposition proves also that if $\{m_n\}$ is a sequence of random non-negative measures converging weakly in probability to a non-random non-negative measure $m$, then the Stieltjes transforms $s_n$ of $m_n$ and the Stieltjes transform $s$ of $m$ are related as follows:

$$\lim_{n \to \infty} \mathbf{E}\{ \sup_{z \in K} |s_n(z) - s(z)| \} = 0 \tag{4.11}$$

for any compact set $K$ of $\mathbb{C}_\pm$.

Lemma 4.1. Let $H_n$ be the random $n \times n$ matrix of the form (2.1) in which $A_n$ and $B_n$ are random Hermitian matrices, $U_n$ and $V_n$ are random unitary matrices, each distributed according to the Haar measure on $U(n)$ normalized to unity, and $A_n$, $B_n$, $U_n$ and $V_n$ are mutually independent. Assume that the normalized counting measures $N_{r,n}$, $r = 1, 2$, of the matrices $A_n$ and $B_n$ converge in probability as $n \to \infty$ to non-random non-negative unit measures $N_r$, $r = 1, 2$, respectively and that

$$\sup_n \|A_n\| \le T < \infty, \quad \sup_n \|B_n\| \le T < \infty. \tag{4.12}$$

Then the normalized counting measure of $H_n$ converges in probability to a non-random unit measure $N$ whose Stieltjes transform $f(z)$ is a unique solution of system (2.18) in the class of functions $f(z)$, $\Delta_r(z)$, $r = 1, 2$, analytic for $\operatorname{Im} z \ne 0$ and satisfying conditions (2.12)–(2.14) and (2.19).

Proof. In view of Proposition 4.1 it suffices to show that $\lim_{n \to \infty} \mathbf{E}\{ |g_n(z) - f(z)| \} = 0$ for any $z$ belonging to a certain compact set of $\mathbb{C}_\pm$. Moreover, according to Remark 1 after Proposition 4.1, we can restrict ourselves to a certain interval of the imaginary axis, i.e. to

$$z = iy, \quad y \in [y_1, y_2], \quad 0 < y_1 < y_2 < \infty. \tag{4.13}$$

Since condition (4.12) of the lemma evidently implies condition (3.1) of Theorem 3.1 and Theorem 3.2, all the results obtained in these theorems are valid in our case for any fixed realization of the random matrices $A_n$ and $B_n$. In addition, all $n$-independent estimating quantities entering various bounds in the proofs of these theorems and depending on the fourth moment $m_4$ in (3.1) and on $y_0$ will depend now on $T$ and on $y_1$ and $y_2$ in (4.13), but not on particular realizations of the random matrices $A_n$ and $B_n$. We will denote below all these quantities simply by the single symbol $C$, which may have a different value in different formulas. In particular, denoting as above by $\langle \dots \rangle$ the expectation with respect to the Haar measure and using (3.42), we can write that

$$\mathbf{E}\{ |g_n(z) - \langle g_n(z) \rangle| \} \le \mathbf{E}\{ v_1^{1/2}(z) \} \le \frac{C}{n}.$$

Thus, it suffices to show that

$$\lim_{n \to \infty} \mathbf{E}\{ |\langle g_n(z) \rangle - f(z)| \} = 0, \quad z = iy, \quad y \in [y_1, y_2], \tag{4.14}$$

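where $y_1$ is big enough. Introduce the quantities

$$\gamma_n(y) = iy (\langle g_n(iy) \rangle - f(iy)), \quad \gamma_{r,n}(y) = \langle \delta_{r,n}(iy) \rangle - \Delta_r(iy), \quad r = 1, 2. \tag{4.15}$$

By using the second equation of system (2.18) we can write the identity

$$\gamma_n(y) = iy [f_2(iy - t_{1,n}(y)) - f_2(iy - t_1(y))] + \varepsilon_{1,n}(y), \tag{4.16}$$

where

$$\varepsilon_{1,n}(y) = iy [\langle g_n(iy) \rangle - f_2(iy - t_{1,n}(y))], \tag{4.17}$$

$$t_{1,n}(y) = \frac{\langle \delta_{1,n}(iy) \rangle}{\langle g_n(iy) \rangle}, \quad t_1(y) = \frac{\Delta_1(iy)}{f(iy)}. \tag{4.18}$$

We have

$$\mathbf{E}\{ |\varepsilon_{1,n}(y)| \} \le y_2 \mathbf{E}\{ |\langle g_n(iy) \rangle - g_{2,n}(iy - t_{1,n}(y))| \} + y_2 \mathbf{E}\{ |g_{2,n}(iy - t_{1,n}(y)) - f_2(iy - t_{1,n}(y))| \}. \tag{4.19}$$

The analogues of (3.38), (3.39) in our case are:

$$\langle g_n(z) \rangle = g_{2,n}\left( z - \langle \delta_{1,n}(z) \rangle \langle g_n(z) \rangle^{-1} \right) + \tilde r_{1,n}(z),$$

where

$$g_{2,n}(z) = n^{-1} \operatorname{Tr} G_2(z) = \int \frac{N_{2,n}(d\lambda)}{\lambda - z} \tag{4.20}$$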

is the Stieltjes transform of the random NCM $N_{2,n}$ of $H_{2,n}$,

$$\tilde r_{1,n}(z) = -\left\langle g_n^\circ(z) n^{-1} \operatorname{Tr} P^{-1} \langle g_n(z) \rangle^{-1} G(z) \right\rangle - \left\langle \delta_{1,n}^\circ(z) n^{-1} \operatorname{Tr} P^{-1} \langle g_n(z) \rangle^{-1} G_2(z) G(z) \right\rangle,$$

the symbol $\langle \dots \rangle$ denotes the expectation with respect to the Haar measure on $U(n)$, $P = 1 - G_2(z) t_{1,n}(z)$, and

$$g_n^\circ(z) = g_n(z) - \langle g_n(z) \rangle, \quad \delta_{1,n}^\circ(z) = \delta_{1,n}(z) - \langle \delta_{1,n}(z) \rangle \tag{4.21}$$

are the respective random variables centralized by the partial expectations with respect to the Haar measure. In addition, we have the analogue of (3.43),

$$|\tilde r_{1,n}(z)| \le \frac{C}{n}.$$

This leads to the following bound for the first term in the r.h.s. of (4.19):

$$\mathbf{E}\{ |\langle g_n(iy) \rangle - g_{2,n}(iy - t_{1,n}(y))| \} \le \mathbf{E}\{ |\tilde r_{1,n}(iy)| \} \le \frac{C}{n}.$$

To show that the second term also vanishes as $n \to \infty$, we use the analogues of (3.32) and (3.36),

$$\left| g_{1,n}(iy) + \frac{1}{iy} \right| \le \frac{T}{y^2}, \quad |\delta_{2,n}(iy)| \le \frac{T}{y},$$

which imply that

$$|t_{1,n}(y)| \le 2T, \tag{4.22}$$

if $y_1$ is big enough. Thus

$$\mathbf{E}\{ |g_{2,n}(iy - t_{1,n}(y)) - f_2(iy - t_{1,n}(y))| \} \le \sup_{|\zeta| \le 2T} \mathbf{E}\{ |g_{2,n}(iy + \zeta) - f_2(iy + \zeta)| \}.$$

The r.h.s. of this inequality tends to zero as $n \to \infty$ in view of the hypothesis of Theorem 2.1 and Remark 2 after Proposition 4.1. Thus, there exist $0 < y_1 < y_2 < \infty$ such that for all $y \in [y_1, y_2]$, $\lim_{n \to \infty} \mathbf{E}\{ |\varepsilon_{1,n}(y)| \} = 0$. Analogous arguments show that $\lim_{n \to \infty} \mathbf{E}\{ |\varepsilon_{2,n}(y)| \} = 0$, where $\varepsilon_{2,n}(y)$ is defined in (4.17) and in (4.18) with the indices 1 and 2 interchanged. Thus we have

$$\lim_{n \to \infty} \mathbf{E}\{ |\varepsilon_{r,n}(y)| \} = 0, \quad r = 1, 2. \tag{4.23}$$

Consider now the first term in the r.h.s. of (4.16). In view of (2.5) we can write this term in the form

$$iy [f_2(iy - t_{1,n}(y)) - f_2(iy - t_1(y))] = -\frac{\langle \delta_{1,n} \rangle}{f \langle g_n \rangle} I_2 \gamma_n + \frac{iy}{f} I_2 \gamma_{1,n} = -a_1 \gamma_n + b_1 \gamma_{1,n}, \tag{4.24}$$

where $I_2$, $a_1$ and $b_1$ are defined by formulas (3.15) and (3.16), in which the two pairs $(\Delta_1, f)$ appearing there have to be replaced by $(\Delta_1, f)$ and $(\langle \delta_{1,n} \rangle, \langle g_n \rangle)$ respectively. Denote by $\Phi = \{\Phi_{ij}\}_{i,j=1}^{3}$ the matrix defined by the l.h.s. of system (3.14) and by $O = \{O_i\}_{i=1}^{3}$ the vector with components $O_1 = \gamma_n$, $O_2 = \gamma_{1,n}$, $O_3 = \gamma_{2,n}$. Then we have from (4.16), (4.23) and (4.24),

$$\mathbf{E}\{ |(\Phi O)_1| \} \le \mathbf{E}\{ |\varepsilon_{1,n}| \}. \tag{4.25}$$

Interchanging in the above argument the indices 1 and 2 we obtain also that

$$\mathbf{E}\{ |(\Phi O)_2| \} \le \mathbf{E}\{ |\varepsilon_{2,n}| \}. \tag{4.26}$$

Besides, applying to the identity $G(z)(H_1 + H_2 - z) = 1$ the operation $\langle n^{-1} \operatorname{Tr} \dots \rangle$ and subtracting from the result the third equation of system (2.18), we obtain one more relation,

$$\mathbf{E}\{ |(\Phi O)_3| \} = 0. \tag{4.27}$$

It follows from the proof of Proposition 3.3 that the matrix $\Phi$ is invertible if $y_1$ is big enough. Denote by $\| \dots \|_1$ the $l^1$-norm of $\mathbb{C}^3$ and by $\| \dots \|$ the induced matrix norm. Then we have

$$\mathbf{E}\{ \|O\|_1 \} \le \mathbf{E}\{ \|\Phi^{-1} \Phi O\|_1 \} \le \mathbf{E}^{1/2}\{ \|\Phi^{-1}\|^2 \} \mathbf{E}^{1/2}\{ \|\Phi O\|_1^2 \}. \tag{4.28}$$

It follows from our arguments above that all entries of the matrices $\Phi$ and $\Phi^{-1}$ and all components of the vector $O$ are bounded uniformly in $n$ and in realizations of the random matrices $A_n$, $B_n$, $U_n$ and $V_n$ in (2.1). Thus we have

$$\|\Phi^{-1}\| \le \sum_{i,j=1}^{3} |(\Phi^{-1})_{ij}| \le C, \quad \|\Phi O\|_1 \le \sum_{i,j=1}^{3} |\Phi_{ij}| |O_j| \le C.$$

These bounds and (4.25)–(4.28) imply that

$$\mathbf{E}\{ \|O\|_1 \} \le C^{3/2} \left( \mathbf{E}\{ |\varepsilon_{1,n}| \} + \mathbf{E}\{ |\varepsilon_{2,n}| \} \right)^{1/2}.$$

In view of (4.23) this inequality implies (4.14), i.e. the assertion of the lemma.

Now we extend the result of Lemma 4.1 to the case of unbounded $A_n$ and $B_n$ having limiting NCMs with finite first moments. We will apply the truncation technique standard in probability, whose random matrix version was already used in [16, 19].

Proof of Theorem 2.1. Without loss of generality we can assume that

$$\sup_n \int |\lambda| \mathbf{E}\{ N_{1,n}(d\lambda) \} \le m_1 < \infty. \tag{4.29}$$

For any $T > 0$ introduce the matrices $A_n^T$ and $B_n^T$, replacing the eigenvalues of $A_n$ and $B_n$ lying in $]T, \infty[$ by $T$ and the eigenvalues lying in $]-\infty, -T]$ by $-T$. Denote by $N_{r,n}^T$, $r = 1, 2$, the NCM of $A_n^T$ and $B_n^T$. It is clear that for any $T > 0$ and $r = 1, 2$, the sequence $\{N_{r,n}^T\}_{n \ge 1}$ converges weakly in probability to the measures $N_r^T$ as $n \to \infty$, where the $N_r^T$ are analogously defined via $N_r$ and have their supports in $[-T, T]$, and that for each $r = 1, 2$ the sequence $\{N_r^T\}_{T \ge 1}$ converges weakly to $N_r$ as $T \to \infty$. Denote by $N_n^T$ the NCM of $H_n^T = H_{1,n}^T + H_{2,n}^T = V_n^* A_n^T V_n + U_n^* B_n^T U_n$. According to linear algebra, if $M_r$, $r = 1, 2$, are two Hermitian $n \times n$ matrices, then

$$\operatorname{rank}(M_1 + M_2) \le \operatorname{rank} M_1 + \operatorname{rank} M_2, \tag{4.30}$$

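and if $\{\mu_{r,l}\}_{l=1}^{n}$, $r = 1, 2$, are the eigenvalues of $M_r$, $r = 1, 2$, then for any Borel set $\Delta \subset \mathbb{R}$,

$$|\#\{\mu_{1,l} \in \Delta\} - \#\{\mu_{2,l} \in \Delta\}| \le \operatorname{rank}(M_1 - M_2).$$

By using these facts we find that

$$|N_n(\Delta) - N_n^T(\Delta)| \le \frac{1}{n} \operatorname{rank}(H_n - H_n^T) \le \frac{1}{n} \operatorname{rank}(A_n - A_n^T) + \frac{1}{n} \operatorname{rank}(B_n - B_n^T) \le N_{1,n}(\mathbb{R} \setminus ]-T, T[) + N_{2,n}(\mathbb{R} \setminus ]-T, T[), \tag{4.31}$$

valid for any Borel set $\Delta \subset \mathbb{R}$. As a result, the Stieltjes transform $g_n^T$ of $N_n^T$ and the Stieltjes transform $g_n$ of $N_n$ are related as follows:

$$|g_n^T(z) - g_n(z)| \le \frac{\pi}{|\operatorname{Im} z|} \left( N_{1,n}(\mathbb{R} \setminus ]-T, T[) + N_{2,n}(\mathbb{R} \setminus ]-T, T[) \right),$$

hence

$$\mathbf{E}\{ |g_n^T(z) - g_n(z)| \} \le \frac{\pi}{|\operatorname{Im} z|} \left( \mathbf{E}\{ N_{1,n}(\mathbb{R} \setminus ]-T, T[) \} + \mathbf{E}\{ N_{2,n}(\mathbb{R} \setminus ]-T, T[) \} \right), \tag{4.32}$$

and

$$\lim_{n \to \infty} \mathbf{E}\{ N_{r,n}(\mathbb{R} \setminus ]-T, T[) \} \le 1 - N_r(]-T, T[) = o(1), \quad T \to \infty.$$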
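The truncation estimate (4.31) and the resulting bound on the Stieltjes transforms can be observed on a single realization. The sketch below checks them for half-lines $\Delta = ]-\infty, x]$ only. Everything specific in it is our assumption: the heavy-tailed (Cauchy) spectra, the size $n = 200$, the level $T = 10$, the grid of points $x$ and the helper `haar_unitary`.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-distributed unitary matrix (assumed helper, QR-based)."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

rng = np.random.default_rng(2)
n, T = 200, 10.0
a = rng.standard_cauchy(n)   # unbounded spectrum of A_n (illustrative)
b = rng.standard_cauchy(n)   # unbounded spectrum of B_n (illustrative)
U, V = haar_unitary(n, rng), haar_unitary(n, rng)

def spectrum(a_vals, b_vals):
    """Eigenvalues of V* diag(a) V + U* diag(b) U, in ascending order."""
    h = (V.conj().T * a_vals) @ V + (U.conj().T * b_vals) @ U
    return np.linalg.eigvalsh(h)

lam = spectrum(a, b)                                     # eigenvalues of H_n
lam_T = spectrum(np.clip(a, -T, T), np.clip(b, -T, T))   # eigenvalues of H_n^T

# N_{1,n}(R \ ]-T,T[) + N_{2,n}(R \ ]-T,T[)
tail_mass = np.mean(np.abs(a) >= T) + np.mean(np.abs(b) >= T)

# Counting measures of half-lines ]-inf, x] on a grid of x
xs = np.linspace(-30.0, 30.0, 3001)
F = np.searchsorted(lam, xs, side="right") / n
F_T = np.searchsorted(lam_T, xs, side="right") / n
max_cdf_gap = np.max(np.abs(F - F_T))

# Stieltjes transforms of N_n and N_n^T at one non-real point
z = 0.5 + 1.0j
stieltjes_gap = abs(np.mean(1.0 / (lam - z)) - np.mean(1.0 / (lam_T - z)))
```

One expects `max_cdf_gap <= tail_mass` and `stieltjes_gap <= pi * tail_mass / |Im z|`, in line with (4.31) and the display preceding (4.32).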

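Since the norms of the matrices $H_1^T$ and $H_2^T$ are bounded, the results of Lemma 4.1 are applicable to the function $g_n^T(z)$, so that, in particular, for any non-real $z$ it converges in probability as $n \to \infty$ to a function $f^T(z)$ satisfying the system

$$f^T(z) = f_1^T\left( z - \frac{\Delta_2^T(z)}{f^T(z)} \right), \quad f^T(z) = f_2^T\left( z - \frac{\Delta_1^T(z)}{f^T(z)} \right), \quad f^T(z) = \frac{1 - \Delta_1^T(z) - \Delta_2^T(z)}{-z}.$$

In addition, since $\mathbf{E}\{ g_n^T(z) \}$ and $\mathbf{E}\{ \delta_{1,n}^T(z) \}$ are bounded uniformly in $n$ and $T$ for $z \in E(y_0)$:

$$|\mathbf{E}\{ g_n^T(z) \}| \le \frac{1}{y_0}, \quad |\mathbf{E}\{ \delta_{1,n}^T(z) \}| \le \frac{1}{y_0} \int |\lambda| \mathbf{E}\{ N_{1,n}^T(d\lambda) \} \le \frac{1}{y_0} \int |\lambda| \mathbf{E}\{ N_{1,n}(d\lambda) \} \le \frac{m_1}{y_0},$$

we have

$$|f^T(z)| \le \frac{1}{y_0}, \quad |\Delta_1^T(z)| \le \frac{m_1}{y_0}. \tag{4.33}$$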

Thus, there exists a sequence $T_k \to \infty$ such that the sequences of analytic functions $\{f^{T_k}(z)\}$ and $\{\Delta_1^{T_k}(z)\}$ converge uniformly on any compact set of the domain $E(y_0)$ of (3.17). In addition, the measures $N_r^{T_k}$, $r = 1, 2$, converge weakly to the limiting measures $N_r$, $r = 1, 2$. Hence, there exist three analytic functions $f(z)$, $\Delta_1(z)$ and $\Delta_2(z) = z f(z) + 1 - \Delta_1(z)$ verifying (2.18). Besides, because of (4.33) and (3.1), for $z \in E(y_0)$ we have

$$|\Delta_1(z)| \le \frac{m_1}{y_0}, \quad \text{and} \quad \Delta_2(z) = o(1) \text{ as } y_0 \to \infty.$$

As a result of the relations above, $f(z)$ and $\Delta_r(z)$, $r = 1, 2$, satisfy the conditions of Proposition 3.3, hence they are defined uniquely. Furthermore, we have

$$\mathbf{E}\{ |g_n(z) - f(z)| \} \le \mathbf{E}\{ |g_n(z) - g_n^{T_k}(z)| \} + \mathbf{E}\{ |g_n^{T_k}(z) - f^{T_k}(z)| \} + |f^{T_k}(z) - f(z)|.$$

Hence, in view of (4.32), of the arguments above on the convergence of $f^{T_k}$ to $f$, and of Lemma 4.1, we conclude that for each $z \in E(y_0)$,

$$\lim_{n \to \infty} \mathbf{E}\{ |g_n(z) - f(z)| \} = 0.$$

In view of Proposition 4.1 this implies that the NCM (2.2) of the random matrices (2.1) converges weakly in probability as $n \to \infty$ to the non-random measure whose Stieltjes transform is a unique solution of system (2.18).

5. Properties of the Solution

Here we will consider several simple properties of the limiting eigenvalue counting measure described by Theorem 2.1, i.e. the measure whose Stieltjes transform is a solution of (2.18) satisfying (2.12)–(2.14). We refer the reader to the works [31, 2, 4, 3] and references therein for a rather complete collection of results on properties of the measure resulting from the binary operation in the space of probability measures defined by a version of system (2.18). This binary operation is called free additive convolution.

(i) Assume that the supports of the limiting eigenvalue measures of the matrices $A_n$ and $B_n$ are bounded, i.e. there exist $-\infty < a_r, b_r < \infty$, $r = 1, 2$, such that

$$\operatorname{supp} N_r \subset [a_r, b_r], \quad r = 1, 2. \tag{5.1}$$

Then

$$\operatorname{supp} N \subset [a_1 + a_2, b_1 + b_2]. \tag{5.2}$$

Proof. Denote by $\{\lambda_l\}_{l=1}^{n}$ and by $\{\lambda_{r,l}\}_{l=1}^{n}$, $r = 1, 2$, the eigenvalues of $H_n$ and $H_{r,n}$ in (2.1) respectively. Then, according to linear algebra (cf. (4.31)),

$$\#\{\lambda_l \in \mathbb{R} \setminus [a_1 + a_2, b_1 + b_2]\} \le \#\{\lambda_{1,l} \in \mathbb{R} \setminus [a_1, b_1]\} + \#\{\lambda_{2,l} \in \mathbb{R} \setminus [a_2, b_2]\}.$$

In view of Theorem 2.1 and (5.1) this leads to the relation $N(\mathbb{R} \setminus [a_1 + a_2, b_1 + b_2]) = 0$, i.e. to (5.2).

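(ii) Examples.

1. Consider the case when $A_n = B_n$, i.e. $N_1 = N_2$. In this case system (2.18) will have the form

$$f(z) = f_1\left( \frac{z}{2} - \frac{1}{2 f(z)} \right). \tag{5.3}$$

Take $N_1 = N_2 = \alpha \delta_0 + (1 - \alpha) \delta_a$, where $0 \le \alpha \le 1$, $a > 0$ and $\delta_\lambda$ is the unit measure concentrated at $\lambda \in \mathbb{R}$. Then

$$f_1(z) = \frac{-\alpha}{z} + \frac{1 - \alpha}{a - z}$$

and (2.18) reduces to the quadratic equation $z(z - 2a) f^2 + 2a(1 - 2\alpha) f - 1 = 0$, whose solution satisfying (2.12)–(2.14) is

$$f(z) = \frac{-a(1 - 2\alpha) - \sqrt{(z - \lambda_+)(z - \lambda_-)}}{z(z - 2a)}, \quad \lambda_\pm = a \left( 1 \pm 2 \sqrt{\alpha(1 - \alpha)} \right).$$

By using (2.15) we find that the limiting measure in this case has the form

$$N = (2\alpha - 1)_+ \delta_0 + (1 - 2\alpha)_+ \delta_{2a} + N^*, \tag{5.4}$$

where $x_+ = \max(0, x)$, and

$$N^*(d\lambda) = \frac{1}{\pi} \frac{\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}}{\lambda (2a - \lambda)} \chi_{[\lambda_-, \lambda_+]}(\lambda) d\lambda \tag{5.5}$$

is the absolutely continuous measure of mass $1 - |1 - 2\alpha|$. Here $\chi_\Delta(\lambda)$ is the indicator of the set $\Delta \subset \mathbb{R}$. In the cases $\alpha = 0, 1$ the measure (5.4) is $\delta_{2a}$ and $\delta_0$ respectively, and in the case $\alpha = 1/2$ the measure (5.4) has no atoms, but only the square root singularities

$$N^*(d\lambda) = \frac{1}{\pi \sqrt{\lambda (2a - \lambda)}} \chi_{[0, 2a]}(\lambda) d\lambda. \tag{5.6}$$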
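Example 1 lends itself to a direct numerical comparison. In the sketch below the limiting Stieltjes transform is taken as the root of the quadratic equation $z(z - 2a) f^2 + 2a(1 - 2\alpha) f - 1 = 0$ with positive imaginary part, and it is compared with the empirical transform of $A + U^* B U$ for one Haar-distributed $U$. The values $n = 400$, $a = 1$, $\alpha = 0.3$, $z = 1 + 1.5i$, the tolerance used to detect the atom at $2a$, and the helper names are our assumptions.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-distributed unitary matrix (assumed helper, QR-based)."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def limiting_stieltjes(z, a, alpha):
    """Root of z(z-2a) f^2 + 2a(1-2alpha) f - 1 = 0 with the larger imaginary part."""
    roots = np.roots([z * (z - 2 * a), 2 * a * (1 - 2 * alpha), -1.0])
    return roots[np.argmax(roots.imag)]

rng = np.random.default_rng(3)
n, a, alpha = 400, 1.0, 0.3
n_zero = int(round(alpha * n))

# Common spectrum of A and B: alpha*n zeros and (1-alpha)*n values equal to a
spec = np.concatenate([np.zeros(n_zero), np.full(n - n_zero, a)])
U = haar_unitary(n, rng)
H = np.diag(spec) + (U.conj().T * spec) @ U   # A + U* B U
lam = np.linalg.eigvalsh(H)

z = 1.0 + 1.5j
g_n = np.mean(1.0 / (lam - z))          # empirical Stieltjes transform
f = limiting_stieltjes(z, a, alpha)     # limiting Stieltjes transform

# Fraction of eigenvalues sitting at 2a, to compare with (1 - 2 alpha)_+
atom_at_2a = np.mean(np.abs(lam - 2 * a) < 1e-8)
```

The eigenvalues should lie in $[0, 2a]$, in agreement with (5.2), `g_n` should be close to `f`, and `atom_at_2a` should be close to $(1 - 2\alpha)_+$.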

Formulas (5.3)–(5.6) show that:
– the result (5.2) is optimal with respect to the endpoints of the supports of the measures $N_r$, $r = 1, 2$, and $N$;
– if $N_1 = N_2$ have atoms of mass $\mu > 1/2$ at the same point, then the measure $N$ also has an atom, of mass $2\mu - 1$ (for general results of this type see [3]).

However, in general the support of $N$ is strictly included in the sum of the supports of the measures $N_r$, $r = 1, 2$, i.e. the inclusion (5.2) is strict. This can be illustrated by the following two examples.

2. Take again $N_1 = N_2$, where now
\[ N_1(d\lambda) = \frac{1}{\pi\sqrt{a^2 - \lambda^2}}\,\chi_{[-a, a]}(\lambda)\,d\lambda \]

On the Law of Addition of Random Matrices


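is the arcsine law. This measure corresponds to the matrix ensemble (2.37) with
\[ V(\lambda) = \begin{cases} 0, & |\lambda| < 1, \\ \infty, & |\lambda| > 1. \end{cases} \tag{5.7} \]
In this case the equation in (5.3) is again quadratic and leads to
\[ N(d\lambda) = \frac{2\sqrt{3a^2 - \lambda^2}}{\pi(4a^2 - \lambda^2)}\,\chi_{[-\sqrt{3}a, \sqrt{3}a]}(\lambda)\,d\lambda. \]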
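As a sanity check on Example 2, the following sketch integrates the density numerically. It assumes the density carries the prefactor 2, i.e. 2√(3a² − λ²)/(π(4a² − λ²)) on [−√3 a, √3 a]; under this assumption the total mass is one and the second moment equals a², twice the variance a²/2 of the arcsine law on [−a, a]. The function name is illustrative.

```python
import math

def moment(k, a, n=4000):
    """k-th moment of the assumed density, using lambda = sqrt(3)*a*cos(t)."""
    c = math.sqrt(3.0) * a
    step = math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * step
        lam = c * math.cos(t)
        # density(lam) * |d lam / dt|, with sqrt(3a^2 - lam^2) = c*sin(t)
        weight = 2.0 * (c * math.sin(t)) ** 2 / (math.pi * (4.0 * a * a - lam * lam))
        total += lam ** k * weight * step
    return total

print(moment(0, 1.0))  # total mass
print(moment(2, 1.0))  # second moment, to be compared with a^2
```

The support [−√3 a, √3 a] is strictly smaller than [−2a, 2a], which is the point of the example.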

3. In the next example we take
\[ N_r(d\lambda) = \frac{1}{4\pi a_r^2}\sqrt{8a_r^2 - \lambda^2}\;\chi_{[-2\sqrt{2}a_r,\, 2\sqrt{2}a_r]}(\lambda)\,d\lambda, \quad r = 1, 2, \]
i.e. both measures are semicircle laws (2.31). Then it is easy to find that $N$ is also a semicircle measure, with the parameter $a^2 = a_1^2 + a_2^2$. This case was indicated in [19]. It can be easily deduced from the law of addition of the R-transforms of Voiculescu [31], because in this case $R_r(f) = 2a_r^2 f$. For further properties of the measure $N$ in the case when one of $N_r$, $r = 1, 2$, is the semicircle law see [14, 4].

(iii) Suppose that one of the measures $N_r(d\lambda)$, $r = 1, 2$, is absolutely continuous with respect to the Lebesgue measure, say $N_1(d\lambda) = \rho_1(\lambda)\,d\lambda$, and
\[ \bar\rho_1 = \operatorname*{ess\,sup}_{\lambda \in \mathbb{R}} |\rho_1(\lambda)| < \infty. \]
Then $N$ is also absolutely continuous with respect to the Lebesgue measure, i.e. $N(d\lambda) = \rho(\lambda)\,d\lambda$, and
\[ \operatorname*{ess\,sup}_{\lambda \in \mathbb{R}} |\rho(\lambda)| \le \bar\rho_1 < \infty. \tag{5.8} \]

Proof. Indeed, since the function $z_1^* = z - \Delta_2(z)/f(z)$ is analytic for non-real $z$, the number of its zeros in any compact set of $\mathbb{C} \setminus \mathbb{R}$ is finite. Thus, for any $\lambda \in \mathbb{R}$ there exists a sequence $\{z_n\}$ of non-real numbers such that $z_n \to \lambda$ as $n \to \infty$ and $\operatorname{Im} z_n^* \ne 0$. Hence, we have from the first equation of system (2.18), for $z_n^* = \lambda_n^* + i\varepsilon_n^*$,
\[ \frac{1}{\pi}\operatorname{Im} f(z_n) = \frac{1}{\pi}\int \frac{\varepsilon_n^*\,\rho_1(\mu)\,d\mu}{(\mu - \lambda_n^*)^2 + (\varepsilon_n^*)^2} \le \bar\rho_1\,\frac{1}{\pi}\int \frac{\varepsilon_n^*\,d\mu}{(\mu - \lambda_n^*)^2 + (\varepsilon_n^*)^2} = \bar\rho_1. \]
This relation and the inversion formula (2.15) yield (5.8). For more general results in this direction see the recent paper [3].
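The key step in the proof of (5.8) is that the Poisson kernel has unit mass, so smoothing a bounded density with it cannot exceed the supremum of the density. A minimal numerical sketch, using the semicircle density of Example 3 as a hypothetical choice of ρ1 (function names and sample points are illustrative):

```python
import math

def semicircle_density(lam, a):
    """Density of the semicircle law of Example 3 with parameter a."""
    r2 = 8.0 * a * a - lam * lam
    return math.sqrt(r2) / (4.0 * math.pi * a * a) if r2 > 0.0 else 0.0

def poisson_smoothing(lam_star, eps, a, n=20000):
    """(1/pi) * integral of eps*rho(mu) / ((mu - lam_star)^2 + eps^2) d mu (midpoint rule)."""
    edge = 2.0 * math.sqrt(2.0) * a
    step = 2.0 * edge / n
    total = 0.0
    for i in range(n):
        mu = -edge + (i + 0.5) * step
        total += eps * semicircle_density(mu, a) / ((mu - lam_star) ** 2 + eps * eps)
    return total * step / math.pi

a = 1.0
sup_rho = 1.0 / (math.sqrt(2.0) * math.pi * a)  # value of the density at lambda = 0
for lam_star in (0.0, 1.0, 2.5):
    print(lam_star, poisson_smoothing(lam_star, 0.05, a), sup_rho)
```

Each smoothed value stays below the supremum of the density, in line with the inequality in the proof.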


6. Discussion

In this section we comment on several topics related to those studied above.

1. In this paper we deal with Hermitian and unitary matrices, i.e. we assume that the matrices $A_n$ and $B_n$ in (2.1) are Hermitian and $U_n$ and $V_n$ are unitary. It is also natural to consider the case of real symmetric $A_n$ and $B_n$ and orthogonal $U_n$ and $V_n$. This case can be handled by using the analogue of formula (3.11) for the orthogonal group $O(n)$. Indeed, it is easy to see that this analogue has the form
\[ \int_{O(n)} \Phi'(O^T M O) \cdot [X, O^T M O]\,dO = 0, \]
where $O^T$ is the transpose of $O$ and $X$ is a real symmetric matrix. By using this formula we obtain, instead of (3.23),
\[ \langle G_{aa}(H_2 G)_{bc}\rangle + \langle G_{ab}(H_2 G)_{ac}\rangle = \langle (G H_2)_{aa} G_{bc}\rangle + \langle (G H_2)_{ab} G_{bc}\rangle. \]
The second terms on both sides of this formula give two additional terms $-n^{-1} G^T H_2 G + n^{-1} H_2 G^T G$ (cf. (3.40)). These terms, however, produce an asymptotically vanishing contribution because, in view of (3.3), (3.6) and (3.37), we have
\[ \bigl| n^{-2}\operatorname{Tr}(1 + \Delta_{2,n} f_n^{-1} G_1)^{-1} G_1 (-G^T H_2 G + H_2 G^T G) \bigr| \le \frac{2}{n y_0^3}\, m_4^{1/4}. \]
Similar terms, also negligible as $n \to \infty$, appear in the analogues of formulas (3.53), (3.69) and (3.73) of the proof of Theorem 3.2. As a result, we obtain in this case the same system (2.18), defining the Stieltjes transform of the limiting eigenvalue counting measure of the analogue of (2.1) with real symmetric $A_n$ and $B_n$ and orthogonal Haar-distributed $U_n$ and $V_n$.

2. As was mentioned in the Introduction, our main result, Theorem 2.1, can be viewed as an extension of the result by Speicher [26], obtained by the moment method and valid for matrices $A_n$ and $B_n$ in (2.1) that are bounded uniformly in $n$. Both results are analogues for randomly rotated matrices of old results of [16, 19] (see (2.24) and (2.33)) on the form of the limiting eigenvalue counting measure of the sum of an arbitrary matrix and certain random matrices (see (2.20) and (2.26)), in particular, Gaussian random matrices (2.28). In this case, however, there exists another model, proposed by Wegner [32], that combines properties of random matrices, having all entries roughly of the same order, and of random operators, whose entries decay sufficiently fast in the distance from the principal diagonal (see e.g. [22]). A simple, but rather non-trivial version of the Wegner model corresponds to the selfadjoint operator $H$ acting in $l^2(\mathbb{Z}^d) \times \mathbb{C}^n$ and defined by the matrix
\[ H(x, j; y, k) = v(x - y)\,\delta_{jk} + \delta(x - y)\, f_{jk}(x), \tag{6.1} \]
where $x, y \in \mathbb{Z}^d$, $j, k = 1, \dots, n$, $\delta(x)$ is the d-dimensional Kronecker symbol,
\[ v(-x) = v(x), \qquad \sum_{x \in \mathbb{Z}^d} |v(x)| < \infty, \tag{6.2} \]


and $f(x) = \{f_{jk}(x)\}_{j,k=1}^n$, $x \in \mathbb{Z}^d$, are independent for different $x$ and identically distributed $n \times n$ Hermitian matrices, whose distribution for any $x$ is given by (2.28). According to [32] (see also [14]), the asymptotic properties of the operator (6.1) as $n \to \infty$ resemble, in many respects, the asymptotic properties of the matrices (2.28). The "free" analogue of the Wegner model was proposed in [18]. In this case the i.i.d. matrices $f(x)$ have the form
\[ f(x) = U_n^*(x)\, B_n\, U_n(x), \tag{6.3} \]
where $B_n$ is as in (2.1) and $U_n(x)$, $x \in \mathbb{Z}^d$, are i.i.d. unitary $n \times n$ matrices whose distribution is given by the Haar measure on $U(n)$. By using a version of the moment method, similar to that of paper [26], or, rather, its formal scheme, the authors derived the limiting form of
\[ E\Bigl\{ n^{-1} \sum_{j=1}^n G(x, j; y, j) \Bigr\}, \]
where $G(x, j; y, k)$ is the matrix (the Green function) of the resolvent $(H - z)^{-1}$ of (6.1)–(6.3). The authors also found a certain second moment of the Green function. This moment is necessary to compute the a.c. conductivity via the Kubo formula. Because of the moment method, the results of [18] are valid for matrices $B_n$ in (6.3) that are bounded uniformly in $n$, similarly to the results for the matrices (2.1) obtained in [26]. By using a natural extension of the differentiation formula (3.11) and the technique developed in [14] to analyze the Wegner model, the results of paper [18] can be extended to the case of arbitrary matrices $B_n$ in (6.3), because in this case condition (6.2) plays the role of condition (2.17) of Theorem 2.1.

3. As was mentioned before, asymptotic properties of random matrices are of considerable interest in certain branches of operator algebra theory and in the related branch of non-commutative probability theory known as free probability (see [28, 31, 30] and references therein). Here large random matrices are an important example of asymptotically free non-commutative random variables, providing a sufficiently rich analytic model of the abstract notion of freeness of elements of an operator algebra. The most widely used examples of asymptotically free families of non-commutative random variables are Gaussian random matrices and unitary Haar-distributed random matrices. The proof of asymptotic freeness of unitary matrices given in [28, 31] reduces to that for complex Gaussian matrices and is based on the observation that the unitary part of the polar decomposition of a complex Gaussian matrix with independent entries is a Haar-distributed unitary matrix. This method requires certain technicalities because of the singularity of the polar decomposition at zero. On the other hand, the differentiation formula (3.11) allows one to prove similar statements directly. Here is an example of results of this type (related results are proved in [35]).

Theorem 6.1. Let $k$ be a positive integer and $\{T_{r,n}\}_{r=1}^k$ be a set of $n \times n$ matrices such that
\[ \sup_{r \le k;\ k, l, n \in \mathbb{N}} n^{-1}\operatorname{Tr}(T_{r,n}^* T_{r,n})^l < \infty, \tag{6.4} \]
and let $U_n$ be a unitary Haar-distributed random matrix. If for any $k \in \mathbb{N}$,
\[ \lim_{n \to \infty} n^{-1}\operatorname{Tr} T_{r,n} = 0, \quad r = 1, \dots, k, \tag{6.5} \]


then for any set of non-zero integers $\{m_r\}_{r=1}^k$ such that $\sum_{r=1}^k m_r = 0$,
\[ \lim_{n \to \infty} \bigl\langle n^{-1}\operatorname{Tr} U_n^{m_1} T_{1,n} \cdots U_n^{m_k} T_{k,n} \bigr\rangle = 0, \tag{6.6} \]
where $\langle \cdot \rangle$ denotes the integration with respect to the Haar measure over $U(n)$.

Remark 1. The theorem is trivially true in the case when $\sum_{r=1}^k m_r \ne 0$.
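In the two subsequent lemmas we omit the subindex $n$.

Lemma 6.1. Let $\{T_i\}_{i=1}^k$ be a set of $n \times n$ matrices and let $U$ be a Haar-distributed unitary matrix. Then for any set of non-zero integers $\{m_i\}_{i=1}^k$ with $\sum_{i=1}^k m_i = 0$ the following identity holds:
\[
\begin{aligned}
\bigl\langle n^{-1}\operatorname{Tr} U^{m_1} T_1 \cdots U^{m_k} T_k \bigr\rangle = {} & -\sum_{l_1 = 2}^{m_1} \bigl\langle n^{-1}\operatorname{Tr} U^{l_1 - 1}\; n^{-1}\operatorname{Tr}(U^{m_1 - l_1 + 1} T_1 \cdots U^{m_k} T_k) \bigr\rangle \\
& - \sum_{r \in \{2, \dots, k\},\, m_r > 0}\ \sum_{l_r = 1}^{m_r} \bigl\langle n^{-1}\operatorname{Tr}(U^{m_1} T_1 \cdots T_{r-1} U^{l_r - 1})\; n^{-1}\operatorname{Tr}(U^{m_r - l_r + 1} T_r \cdots U^{m_k} T_k) \bigr\rangle \\
& + \sum_{r \in \{2, \dots, k\},\, m_r < 0}\ \sum_{l_r = 1}^{-m_r} \bigl\langle n^{-1}\operatorname{Tr}(U^{m_1} T_1 \cdots T_{r-1} U^{-l_r})\; n^{-1}\operatorname{Tr}(U^{m_r + l_r} T_r \cdots U^{m_k} T_k) \bigr\rangle.
\end{aligned}
\tag{6.7}
\]

Proof. Without loss of generality assume that $m_1 > 0$. Then, using the analogue of formula (3.11) for the average $\langle [U^{m_1} T_1 \cdots U^{m_k} T_k]_{ab} \rangle$, we obtain for any Hermitian $X$,
\[
\begin{aligned}
& \sum_{r \in \{1, \dots, k\},\, m_r > 0}\ \sum_{l_r = 1}^{m_r} \bigl\langle [U^{m_1} T_1 \cdots T_{r-1} U^{l_r - 1} X U^{m_r - l_r + 1} T_r \cdots U^{m_k} T_k]_{ab} \bigr\rangle \\
& \quad - \sum_{r \in \{2, \dots, k\},\, m_r < 0}\ \sum_{l_r = 1}^{-m_r} \bigl\langle [U^{m_1} T_1 \cdots T_{r-1} U^{-l_r} X U^{m_r + l_r} T_r \cdots U^{m_k} T_k]_{ab} \bigr\rangle = 0.
\end{aligned}
\tag{6.8}
\]
Choosing as $X$ the Hermitian matrix having only the $(c, d)$th and $(d, c)$th non-zero entries, setting then $a = c$ and $b = d$ and applying to the result the operation $n^{-2}\sum_{a,b}$, we obtain (6.7).

Lemma 6.2. Under the conditions (6.4) and (6.5) the variance $D = \langle |\xi^\circ|^2 \rangle$ of the random variable
\[ \xi = n^{-1}\operatorname{Tr} L, \qquad L = U^{m_1} T_1 \cdots U^{m_k} T_k, \tag{6.9} \]
is of the order $n^{-2}$ as $n \to \infty$.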


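Proof. Using the same technique as that in Lemma 6.1 for $L_{ab} L_{cd}$ we obtain the relation
\[
\begin{aligned}
D = {} & -\sum_{l_1 = 2}^{m_1} \bigl\langle \xi^\circ\, n^{-1}\operatorname{Tr} U^{l_1 - 1}\; n^{-1}\operatorname{Tr}(U^{m_1 - l_1 + 1} T_1 \cdots U^{m_k} T_k) \bigr\rangle \\
& - \sum_{r \in \{2, \dots, k\},\, m_r > 0}\ \sum_{l_r = 1}^{m_r} \bigl\langle \xi^\circ\, n^{-1}\operatorname{Tr}(U^{m_1} T_1 \cdots T_{r-1} U^{l_r - 1})\; n^{-1}\operatorname{Tr}(U^{m_r - l_r + 1} T_r \cdots U^{m_k} T_k) \bigr\rangle \\
& + \sum_{r \in \{2, \dots, k\},\, m_r < 0}\ \sum_{l_r = 1}^{-m_r} \bigl\langle \xi^\circ\, n^{-1}\operatorname{Tr}(U^{m_1} T_1 \cdots T_{r-1} U^{-l_r})\; n^{-1}\operatorname{Tr}(U^{m_r + l_r} T_r \cdots U^{m_k} T_k) \bigr\rangle \\
& + n^{-2}\,\Phi,
\end{aligned}
\tag{6.10}
\]
where
\[
\begin{aligned}
\Phi = {} & -\sum_{r \in \{1, \dots, k\},\, m_r > 0}\ \sum_{l_r = 1}^{m_r} \bigl\langle n^{-1}\operatorname{Tr}(U^{m_r - l_r + 1} T_r \cdots T_k U^{m_1} T_1 \cdots T_{r-1} U^{l_r - 1})^* L \bigr\rangle \\
& + \sum_{r \in \{2, \dots, k\},\, m_r < 0}\ \sum_{l_r = 1}^{-m_r} \bigl\langle n^{-1}\operatorname{Tr}(U^{m_r + l_r} T_r \cdots T_k U^{m_1} T_1 \cdots T_{r-1} U^{-l_r})^* L \bigr\rangle.
\end{aligned}
\]
We have obviously for $k = m = 1$,
\[ \bigl\langle n^{-1}\operatorname{Tr}(UT)^\circ\; n^{-1}\operatorname{Tr}(UT) \bigr\rangle \le \frac{1}{n^2}\, n^{-1}\operatorname{Tr}(T T^*). \]
We proceed further by induction. In view of condition (6.4) and Proposition 3.1 we have the bound
\[ \bigl\langle |n^{-1}\operatorname{Tr}(U^{m_1} T_{r_1} \cdots U^{m_p} T_{r_p})|^2 \bigr\rangle \le C, \tag{6.11} \]
where $C$ may depend only on $p$. Now, since $\langle n^{-1}\operatorname{Tr} U^l \rangle = 0$, $l \ne 0$, the summands of the first term on the r.h.s. of (6.10) can be estimated as follows:
\[ \bigl| \bigl\langle \xi^\circ\, n^{-1}\operatorname{Tr}(U^{l_1 - 1})\; n^{-1}\operatorname{Tr}(U^{m_1 - l_1 + 1} T_1 \cdots U^{m_k} T_k) \bigr\rangle \bigr| \le C \sqrt{D}\, \sqrt{\bigl\langle |n^{-1}\operatorname{Tr}(U^{l_1})^\circ|^2 \bigr\rangle}. \tag{6.12} \]
Likewise, by using the cyclic property of the trace, the identity $\langle a^\circ b c \rangle = \langle a^\circ b^\circ c \rangle + \langle a^\circ c^\circ \rangle \langle b \rangle$, the Schwarz inequality, and (6.11), we obtain for the second term on the right-hand side of (6.10) the following estimates for $r \ge 2$:
\[
\begin{aligned}
& \bigl| \bigl\langle \xi^\circ\, n^{-1}\operatorname{Tr}(U^{m_1} T_1 \cdots T_{r-1} U^{l_r - 1})\; n^{-1}\operatorname{Tr}(U^{m_r - l_r + 1} T_r \cdots U^{m_k} T_k) \bigr\rangle \bigr| \\
& \quad \le C \sqrt{D} \Bigl( \sqrt{\bigl\langle |n^{-1}\operatorname{Tr}(U^{m_1 + l_r - 1} T_1 \cdots U^{m_{r-1}} T_{r-1})^\circ|^2 \bigr\rangle} + \sqrt{\bigl\langle |n^{-1}\operatorname{Tr}(U^{m_r - l_r + 1} T_r \cdots U^{m_k} T_k)^\circ|^2 \bigr\rangle} \Bigr).
\end{aligned}
\tag{6.13}
\]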


The third term on the right-hand side of (6.10) can be estimated analogously. The fourth term is of the order $1/n^2$ in view of (6.9). By the induction hypothesis the expectations under the square roots on the r.h.s. of (6.13) and (6.12) are of the order $n^{-2}$. This leads to the inequality
\[ D \le \frac{C_1}{n}\sqrt{D} + \frac{C_2}{n^2}, \]
where $C_1$ and $C_2$ are independent of $n$. This implies the bound $D = O(n^{-2})$.

Proof of Theorem 6.1. We use Lemma 6.1 and again induction. We have first
\[ \bigl\langle n^{-1}\operatorname{Tr} U^m T_1 U^{-m} T_2 \bigr\rangle = n^{-1}\operatorname{Tr} T_1\; n^{-1}\operatorname{Tr} T_2 \to 0, \quad n \to \infty. \]
In the general case we use Lemma 6.2 to factorize asymptotically the moments on the r.h.s. of (6.7). In the resulting relation the expressions $\langle n^{-1}\operatorname{Tr} U^{m_{r_1}} T_{r_1} \cdots U^{m_{r_s}} T_{r_s} \rangle$ are zero for any collection $(T_{r_1}, \dots, T_{r_s})$ and any $n$ if $\sum_{i=1}^s m_{r_i} \ne 0$, and tend to zero as $n \to \infty$ if $\sum_{i=1}^s m_{r_i} = 0$, in view of the induction hypothesis and condition (6.5). This leads to (6.6).

Remark 2. A simple version of the above arguments allows us to prove that the normalized counting measure of Haar-distributed unitary matrices converges with probability one to the uniform distribution on the unit circle. Indeed, consider again the Stieltjes transform $g_n$ of this measure, supported now on the unit circle. By the spectral theorem for unitary matrices we have
\[ g_n(z) = n^{-1}\operatorname{Tr} G(z), \qquad G(z) = (U - z)^{-1}, \quad |z| \ne 1. \tag{6.14} \]
We can then obtain the following identities:
\[ \bigl\langle \operatorname{Tr} G^2(z) U \bigr\rangle = 0, \qquad \bigl\langle g_n(z)\, n^{-1}\operatorname{Tr} G(u) U \bigr\rangle = 0, \tag{6.15} \]
\[ \bigl\langle g_n(z_1)\, n^{-1}\operatorname{Tr} G(z_1) U\, g(z_2) \bigr\rangle + n^{-3} \bigl\langle \operatorname{Tr} G(z_1) G(z_2) U G(z_2) \bigr\rangle = 0. \tag{6.16} \]
By using the obvious relations $G'(z) = G^2(z)$, $G(0) = U^{-1}$, $G(\infty) = 0$, we obtain from the first of the identities (6.15)
\[ f_n(z) \equiv \langle g_n(z) \rangle = \begin{cases} 0, & |z| < 1, \\ -z^{-1}, & |z| > 1. \end{cases} \]
This relation shows that the expectation of the normalized counting measure of $U$ is the uniform distribution on the unit circle, a fact that follows easily from the shift invariance of the Haar measure. Now the second identity (6.15) and (6.16) lead to the bound
\[ \bigl\langle |g_n(z)|^2 \bigr\rangle \le \frac{C(r_0)}{n^2}, \quad |z| \le r_0, \]
where $C(r_0)$ is independent of $n$ and finite if $r_0$ is small enough. This bound and arguments analogous to those used in the proof of Theorem 3.1 imply that the normalized eigenvalue counting measure of unitary Haar distributed random matrices converges


with probability one to the uniform distribution on the unit circle. This fact, as well as the analogous fact for the orthogonal group, can be deduced from the works by Dyson (see e.g. [17]), where the joint probability distribution of all n eigenvalues of the Haar distributed unitary or orthogonal matrices was found and studied. This technique is more powerful but also more complex than that used above, which is based on rather elementary means.

Acknowledgements. V. Vasilchuk is thankful to the Laboratoire de Physique Mathématique et Géométrie de l’Université, Paris-7, for hospitality and to the Ministère des Affaires Etrangères de France for the financial support.

References

1. Akhiezer, N.I., Glazman, I.M.: Theory of Linear Operators in Hilbert Space. New York: Frederick Ungar, 1963
2. Bercovici, H., Voiculescu, D.: Free convolution of measures with unbounded support. Indiana University Math. J. 42, 733–773 (1993)
3. Bercovici, H., Voiculescu, D.: Regularity questions for free convolution of measures. In: Bercovici, H., et al. (eds.) Nonselfadjoint Operator Algebras, Operator Theory, and Related Topics. Basel: Birkhäuser, 1998, pp. 37–47
4. Biane, P.: On the free convolution with a semicircular distribution. Indiana Univ. Math. J. 46, 705–717 (1997)
5. Bessis, B., Itzykson, C., Zuber, J.-B.: Quantum field theory technique in graphical enumeration. Adv. Appl. Math. 1, 109–157 (1980)
6. Boutet de Monvel, A., Pastur, L., Shcherbina, M.: On the statistical mechanics approach in the random matrix theory: Integrated density of states. J. Stat. Phys. 79, 585–611 (1995)
7. Brody, T.A., French, J., Melo, P., Pandy, A., Wong S.: Random matrix physics: Spectrum and strength fluctuations. Rev. Mod. Phys. 53, 385–480 (1981)
8. Fulton, W.: Eigenvalues, invariant factors, highest weights, and Schubert calculus. Bull. Math. Soc. 37, 200–249 (2000)
9. Girko, V.L.: Random Matrices (Sluchainye matricy). Kiev: Vyshcha Shkola, 1975 (in Russian)
10. Girko, V.L.: Theory of Random Determinants. Dordrecht: Kluwer Academic Publishers, 1990
11. Khorunzhenko, B., Khorunzhy, A., Pastur, L.: Asymptotic properties of large random matrices with independent entries. J. Math. Phys. 37, 5033–5060 (1996)
12. Khoruzhenko, B., Khorunzhy, A., Pastur, L., Shcherbina, M.: Large-n limit in the statistical mechanics and the spectral theory of disordered systems. In: Domb, C., Lebowitz, J. (eds.) Phase Transitions and Critical Phenomena 15. New York: Academic Press, 1992, pp. 74–239
13. Khorunzhy, A.: Eigenvalue distribution of large random matrices with correlated entries. Mat. Phyz. Anal. Geom. 3, 80–101 (1996)
14. Khorunzhy, A., Pastur, L.: Limits of infinite interaction radius, dimensionality and the number of components for random operators with off-diagonal randomness. Commun. Math. Phys. 153, 605–646 (1993)
15. Loeve, M.: Probability Theory. Berlin: Springer, 1977
16. Marchenko, V.A., Pastur, L.A.: Distribution of eigenvalues for some sets of random matrices. Math. USSR, Sb. 1, 457–483 (1967)
17. Mehta, M.: Random Matrices. Boston: Academic Press, 1991
18. Neu, P., Speicher, R.: Rigorous mean field model for CPA: Anderson model with free random variables. J. Stat. Phys. 80, 1279–1308 (1995)
19. Pastur, L.A.: On the spectrum of random matrices. Teor. Math. Phys. 10, 67–74 (1972)
20. Pastur, L.: Eigenvalue distribution of random matrices. Annales l’Inst. H. Poincare 64, 325–337 (1996)
21. Pastur, L.: A simple approach to global regime of the random matrix theory. In: Miracle-Sole, S., Ruiz, J., Zagrebnov, V. (eds.) Mathematical Results in Statistical Mechanics. Singapore: World Scientific, 1999, pp. 429–454
22. Pastur, L., Figotin, A.: Spectra of Random and Almost Periodic Operators. Berlin: Springer, 1992
23. Pastur, L., Shcherbina, M.: Universality of the local eigenvalue statistics for a class of unitary invariant random matrix ensembles. J. Stat. Phys. 86, 109–147 (1997)
24. Silverstein, J.: Strong convergence of the empirical distribution of eigenvalues of large dimensional random matrices. J. Multivariate Anal. 55, 331–339 (1995)
25. Silverstein, J.W., Choi, S.I.: Analysis of the limiting spectral distribution of large dimensional random matrices. J. Multivariate Anal. 54, 295–309 (1995)
26. Speicher, R.: Free convolution and the random sum of matrices. Publ. Res. Inst. Math. Sci. 29, 731–744 (1993)
27. Vasilchuk, V.: On the law of multiplication of random matrices. Submitted to the Math. Physics, Analysis and Geometry
28. Voiculescu, D.V.: Limit laws for random matrices and free products. Invent. Math. 104, 201–220 (1991)
29. Voiculescu, D.V. (ed.): Free Probability Theory. Providence: American Mathematical Society, 1997
30. Voiculescu, D.: A strengthened asymptotic freeness result for random matrices with applications to free entropy. Int. Math. Res. Not. 1, 41–62 (1998)
31. Voiculescu, D.V., Dykema, K.J., Nica, A.: Free Probability Theory. A Noncommutative Probability Approach to Free Products with Applications to Random Matrices, Operator Algebras and Harmonic Analysis on Free Groups. Providence: American Mathematical Society, 1992
32. Wegner, F.: Disordered systems with n-orbitals per site: n → ∞ limit. Phys. Rev. 19, 783–792 (1979)
33. Weyl, H.: Das asymptotische Verteilungsgesetz der Eigenwerte lineare partieller differential Gleichungen. Math. Ann. 71, 441–479 (1912)
34. Wigner, E.: On the distribution of the roots of certain symmetric matrices. Ann. of Math. 67, 325–327 (1958)
35. Xu, F.: A random matrix model from two dimensional Yang-Mills theory. Commun. Math. Phys. 190, 287–307 (1997)
36. Zee, A.: Law of addition in random matrix theory. Nucl. Phys. B474, 726–744 (1996)

Communicated by Ya. G. Sinai

Commun. Math. Phys. 214, 287 – 313 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Natural Convection with Dissipative Heating

Y. Kagei1, M. Růžička2, G. Thäter3

1 Graduate School of Mathematics, Kyushu University 36, Fukuoka 812-8581, Japan
2 Institute of Applied Mathematics, University of Freiburg, Eckerstr. 1, 79104 Freiburg, Germany
3 Mathematical Seminar, University of Bonn, Nußallee 15, 53115 Bonn, Germany

Received: 26 April 1999 / Accepted: 29 March 2000

Abstract: We derive a system of equations which models the motion of linear viscous fluids that undergo isochoric motions in isothermal processes but not necessarily isochoric ones in non-isothermal processes. The main point is that, in contrast to the usual Oberbeck–Boussinesq approximation, we do not neglect dissipative heating. We study Rayleigh–Bénard convection for our system and investigate existence, uniqueness and regularity of solutions and conditions for the stability of the motionless state.

1. Introduction

The well-known Oberbeck–Boussinesq [20, 2] approximation was designed as a simplified model for the thermomechanical response of linear viscous fluids undergoing isochoric motions in isothermal processes but not necessarily isochoric ones in non-isothermal processes. Its roots stem from the end of the last century. Nevertheless, its justification from the point of view of continuum mechanics was missing until 1996, when Rajagopal, Růžička and Srinivasa derived the Oberbeck–Boussinesq approximation from the full thermodynamical theory for Navier–Stokes fluids by introducing a new method of treating the constraint of mechanical incompressibility in [23]. Though there is no doubt concerning its usefulness and a lot of attention has been paid to its investigation¹, when analyzing convection in the earth’s upper mantle (see [4, 9, 19, 27, 28]), in astrophysics (see [1]) or within fast rotating configurations (see [3, 17]) the Oberbeck–Boussinesq approximation seems not appropriate due to its neglect of dissipative heating (see also [6, 7, 21]). Moreover, the rapid technological advances in the fields of jet propulsion, nuclear power and hypersonic flight lead to new applications where the three primary factors governing natural convection, namely body force, fluid volumetric expansion coefficient and temperature differences, all can greatly exceed previously considered bounds.

¹ For more detailed references and interesting historical remarks we refer to [11] and the introduction of [23], for a non-Newtonian model to [18].


Several attempts have been made to single out under which circumstances dissipative heating must be taken into account and to find a scheme which is adequate for those processes while still as simple as possible.

The authors of [1] follow a very general approach to model the dynamics of interstellar media. Filtering out fast acoustic motions, which are evidently compressible, they investigate density variations in weakly compressible flows. A collection of incompressible approximations is derived by taking the kinematic limit in compressible fluid dynamics according to two small non-dimensional parameters. These schemes model different situations depending on how these parameters are chosen. In particular, dissipative heating can be included (see Eqs. (8) there).

Another idea is that the effect of viscous dissipation in natural convection is appreciable when the induced kinetic energy becomes appreciable compared to the amount of heat transferred, and that this can be measured by the so-called Dissipation number introduced by Ostrach [21] in 1957 and used in [4, 6, 7, 9, 27, 28]. It relates the product of thermal expansion coefficient, acceleration due to the applied body force and length scale to the specific heat at constant pressure:
\[ Di \equiv \frac{\alpha g L}{c_p}. \]
It depends neither on the Prandtl number nor on the Grashof number, the latter being generally thought to be the important parameter for laminar instability and flow transition. In particular the authors of [4, 9, 27] investigate processes in the earth’s upper mantle, where $Di \sim 0.12 \dots 0.5$ (data² taken from [19, 4]). This means that dissipative heating influences but does not dominate the corresponding flow processes. For example, local increases in the temperature of the earth’s mantle due to dissipative heating could possibly explain the high heat flow behind island arcs and the variable composition of oceanic basalts erupted at ridges (see [9]).
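The Dissipation number is a one-line computation. The sketch below evaluates Di = αgL/c_p for round illustrative values; these numbers are assumptions chosen only to show the order of magnitude and are not taken from [19, 4].

```python
def dissipation_number(alpha, g, length, c_p):
    """Di = alpha * g * L / c_p (dimensionless).

    alpha  : thermal expansion coefficient [1/K]
    g      : acceleration due to the body force [m/s^2]
    length : length scale L [m]
    c_p    : specific heat at constant pressure [J/(kg K)]
    """
    return alpha * g * length / c_p

# Hypothetical mantle-like values (illustrative only).
di = dissipation_number(alpha=3e-5, g=9.8, length=7e5, c_p=1.2e3)
print(di)
```

With these inputs Di is of order 0.1, so dissipative heating is noticeable but not dominant; for a laboratory-scale layer (L of the order of centimetres) the same formula gives a value many orders of magnitude smaller.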
Though all results quoted above are partially justified by experimental data, up to now the question

How to model the situation where dissipative heating has to be taken into account while staying as close as possible to the usual Boussinesq scheme?

is still open. A reasonable derivation of such a scheme from the point of view of continuum mechanics is missing, just as it was for the Oberbeck–Boussinesq approximation up to [23]. Moreover, it is not even clear what the right system looks like in general. In [23] one finds a higher order approximation including dissipative heating by taking into account higher order terms in the power series. But unfortunately the equations which hold on the approximation levels cannot be re-written as a system for some approximate quantities. For that we start from the balance laws for mass, linear momentum and energy and the constitutive relations, as used in [23]. One crucial point is to find the adequate non-dimensionalization for velocity, time and pressure, since representative values for these quantities are not obvious from the configuration. As usual in natural convection processes we assume that buoyancy and gravity forces are driving the convection in equilibrium and that the pressure gradient balances the body force term. This means in particular that the body forces must have a potential. But this is no serious restriction for applications. The second peculiarity is the perturbation parameter ε. We choose it as a measure of incompressibility (slightly different from the one in [23]). However, the main difference from [23] is that we introduce an additional non-dimensional number, the Dissipation number Di, while in [23] only the perturbation parameter ε and the Reynolds and Prandtl numbers are used. The consequence is a different balance of the terms. However, of course our small ε (which means small compressibility) leads to the zero divergence condition for the velocity, and we stay close in this sense to the usual Oberbeck–Boussinesq approximation. Though it is known that dissipative heating and compressibility are of the same order as long as the Grüneisen constant is approximately one (which holds for the usual liquids and gases, see [4, 27, 28]), for possible non-standard applications (for example if g or L are very large) we find it more convenient to avoid any coupling between the considerations for Di and ε. Our new approximation³ for the unknowns velocity v, pressure p and temperature θ reads:
\[
\begin{aligned}
\operatorname{div} v &= 0, \\
\varepsilon^3 \dot v - \frac{1}{\sqrt{Gr}}\,\Delta v + \nabla p &= (1 - \varepsilon^3 \theta)\, b, \\
\tilde C(\theta)\,\dot\theta - Di\Bigl(\theta + \frac{\theta^b + \theta^t}{2(\theta^b - \theta^t)}\Bigr)\, v \cdot b - \frac{1}{Pr\sqrt{Gr}}\,\Delta\theta &= \frac{2 Di}{\sqrt{Gr}}\, D(v) \cdot D(v).
\end{aligned}
\tag{1.1}
\]
Here b is the density of body forces, Gr and Pr are the Grashof and Prandtl numbers and $\tilde C$ stands for the specific heat; $\theta^t$ denotes the temperature at the top and $\theta^b$ at the bottom of the layer. Note that the “addition” of the dissipative heating term $D(v) \cdot D(v)$ to the usual Boussinesq scheme must be balanced by the term $\bigl(\theta + \frac{\theta^b + \theta^t}{2(\theta^b - \theta^t)}\bigr)\, v \cdot b$. This was already observed in [1, 21, 27, 28]; nevertheless all the systems derived there are different from (1.1) due to the different non-dimensionalizations and techniques used. The validity of our approximation is clearly delineated for concrete applications: one must compute the size of Gr, Di and ε only.

In Sect. 3 we investigate system (1.1) in the framework of Rayleigh–Bénard convection. We consider the corresponding initial-boundary value problem for an infinite fluid layer $\mathbb{R}^2 \times (0, 1)$ in a gravitational field, i.e. $b = (0, 0, -1)$. We are interested in rectangular cellular solutions and treat bounded parts of the layer with periodic boundary conditions in the two unbounded directions and set $v = 0$ on $\{x_3 = 0, 1\}$. Due to our non-dimensionalization, $\theta = -\frac12$ on $\{x_3 = 1\}$ and $\theta = \frac12$ on $\{x_3 = 0\}$. We subtract the trivial solution $\{v_0, \theta_0\} = \{0, \frac12 - x_3\}$; nonetheless for the new unknown $\theta - \theta_0$ we keep the notation θ. Our results are proved by energy estimates and perturbation arguments. If the specific heat $\tilde C = \text{const.}$ (cf. Theorem 3.3 (iii)), we show that there exists a unique solution in the class⁴
\[ v \in C([0, T]; V) \cap L^2(0, T; D(A)), \qquad \theta \in C([0, T]; L^2(\Omega)) \cap L^2(0, T; H_0^1(\Omega)). \]
If $\tilde C \ne \text{const.}$ (cf. Theorem 3.3 (i)), we are forced to control higher order derivatives and the corresponding unique solution is found in the class
\[
\begin{aligned}
& v \in C([0, T]; D(A)), \quad \partial_t v \in C([0, T]; H) \cap L^2(0, T; V), \\
& \theta \in C([0, T]; H^2(\Omega) \cap H_0^1(\Omega)), \quad \partial_t \theta \in C([0, T]; L^2(\Omega)) \cap L^2(0, T; H_0^1(\Omega)).
\end{aligned}
\]
In general T depends on the size of the initial data and physical parameters. Global existence is connected to stability and thus to the critical eigenvalues of the linearized operator at the motionless state (see Sect. 4).

² Different values are due to an uncertainty with respect to the value of α.
³ For the precise definition of all quantities we refer to Sect. 2.
⁴ See the beginning of Sect. 3 for the definition of the spaces used.


For the usual Boussinesq system (i.e. Di = 0) it is well known that the crucial parameter for stability is the critical Rayleigh number $R^c$: the motionless state is unconditionally stable for $R < R^c$ while it is unstable if $R > R^c$ (cf. [10, 15]). For Di > 0, Theorem 3.3 (ii) states that if Di and the initial values are sufficiently small, the motionless state remains stable as long as $R < R^c(Di)$, while $R > R^c(Di)$ causes instability. Here $R^c(Di) > R^c$, which means that $0 < Di \ll 1$ makes the process more stable. This is reasonable since an effect of Di > 0 at a fixed Rayleigh number is to reduce thermal convection (see [27]), and the corresponding decrease in convective heat transfer is directly coupled to increased stability.

Concerning systems different from (1.1) but including dissipative heating, we note that in [16] global weak solutions are constructed for a system which is, roughly speaking, the system (1.1) with b = 0, constant $\tilde C$ and without the term $Di\bigl(\theta + \frac{\theta^b + \theta^t}{2(\theta^b - \theta^t)}\bigr)\, v \cdot b$. The uniqueness and regularity of weak solutions are open, as in the theory of the three-dimensional Navier–Stokes equations. See also [12] for the two-dimensional Rayleigh–Bénard convection with dissipative heating and [13] for such a system (including moreover the angular momentum equation) in general bounded domains.

Let us finally introduce some notation. In what follows boldface lower-case letters always stand for vectors and vector valued functions, whereas boldface capital letters represent tensor valued functions, i.e. $v = (v_1, v_2, v_3)$, $T = (T_{kl})_{k,l=1}^3$; $T^T \equiv (T_{lk})_{k,l=1}^3$. All quantities are considered at points $x = (x_1, x_2, x_3) \in \mathbb{R}^3$ and at a certain time t. We use the abbreviations $\partial_k \equiv \frac{\partial}{\partial x_k}$, $\partial_t \equiv \frac{\partial}{\partial t}$ (analogously for $\partial_\theta$, ...), $\partial_{\theta\theta} \equiv \partial_\theta \partial_\theta$, $\operatorname{div} v \equiv \partial_1 v_1 + \partial_2 v_2 + \partial_3 v_3$ and $\nabla\rho \equiv (\partial_1 \rho, \partial_2 \rho, \partial_3 \rho)$. The dot between two quantities denotes the corresponding scalar product, whereas the superposed dot is the usual material time derivative:
\[
\begin{aligned}
v \cdot w &\equiv \sum_{k=1}^3 v_k w_k, & \dot\theta &\equiv \partial_t \theta + v \cdot \nabla\theta, \\
T \cdot L &= \sum_{k,l=1}^3 T_{kl} L_{kl}, & \dot b &\equiv \partial_t b + (v \cdot \nabla) b = \partial_t b + \sum_{k=1}^3 v_k \partial_k b.
\end{aligned}
\]

The trace of a tensor D is denoted by tr D, and $D^2 \equiv D \cdot D$. For the identity tensor we write I. For more details we refer to [26].

2. Derivation of the Approximation

2.1. Governing equations and assumptions. For the convenience of the reader we include the basic steps, similar to those in [23], of the derivation of the governing equations we will use. Henceforth let ρ denote the density, v the velocity field, T the Cauchy stress tensor, b the density of external body forces, e the specific internal energy, L the velocity gradient, r the radiant heating, θ the temperature, η the entropy and q the heat flux vector. The starting point for our analysis is the balance of mass, linear momentum and energy
\[
\begin{aligned}
\dot\rho + \rho \operatorname{div} v &= 0, \\
\operatorname{div} T + \rho\, b &= \rho\, \dot v, \\
\rho\, \dot e - T \cdot L - \operatorname{div} q + \rho\, r &= 0,
\end{aligned}
\tag{2.1}
\]
and the Second Law of Thermodynamics in the form of the Clausius–Duhem inequality:
\[ \rho\, \dot\eta - \operatorname{div} \frac{q}{\theta} + \rho\, \frac{r}{\theta} \ge 0. \tag{2.2} \]

In addition to these balance equations we have to provide meaningful⁵ constitutive relations. Precisely, we assume:

(i) The fluid shall undergo isochoric motions in isothermal processes. This means in particular

$$\operatorname{div} v = \alpha(\theta)\dot\theta, \tag{2.3}$$

where $\alpha$ is the thermal expansion coefficient. For the sake of simplicity let $\alpha$ be constant in the following (it is obvious how to handle non-constant $\alpha$ if necessary). Eqs. (2.3) and (2.1)$_1$ lead to $\alpha\dot\theta = -\dot\rho/\rho$, which can easily be integrated to find an expression for $\rho$ in terms of $\theta$:

$$\rho = \rho_r e^{-\alpha(\theta-\theta_r)}, \tag{2.4}$$

where $\rho_r$ and $\theta_r$ are constants and correspond to some convenient reference state where density and temperature are known. Since the density is already determined by (2.4) we treat the mechanical pressure $p$ as the third independent variable, which together with the velocity $v$ and the temperature $\theta$ has to be determined from the system of equations. For this the stress $T$ is represented as

$$T \equiv -pI + \hat T,\qquad p \equiv -\tfrac13\operatorname{tr} T. \tag{2.5}$$
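As a small numerical illustration of the density law (2.4), the following sketch compares the exponential expression with its first-order expansion around the reference temperature. The reference values `RHO_R`, `THETA_R` and `ALPHA` are hypothetical, water-like numbers chosen only for illustration; they are not taken from the paper.

```python
import math


def density(theta, rho_r, theta_r, alpha):
    """Density law (2.4): rho = rho_r * exp(-alpha * (theta - theta_r))."""
    return rho_r * math.exp(-alpha * (theta - theta_r))


def density_linearized(theta, rho_r, theta_r, alpha):
    """First-order expansion of (2.4) around the reference temperature."""
    return rho_r * (1.0 - alpha * (theta - theta_r))


# Hypothetical reference state (assumed values, for illustration only).
RHO_R = 998.0    # reference density in kg/m^3
THETA_R = 293.0  # reference temperature in K
ALPHA = 1.0e-4   # thermal expansion coefficient in 1/K

if __name__ == "__main__":
    for d_theta in (1.0, 10.0, 50.0):
        exact = density(THETA_R + d_theta, RHO_R, THETA_R, ALPHA)
        approx = density_linearized(THETA_R + d_theta, RHO_R, THETA_R, ALPHA)
        rel_err = abs(exact - approx) / exact
        print(f"d_theta={d_theta:5.1f} K  exact={exact:.4f}  "
              f"linear={approx:.4f}  rel_err={rel_err:.2e}")
```

The relative error of the linearization behaves like $(\alpha(\theta-\theta_r))^2/2$, which is why the product $\alpha\vartheta$ is a natural smallness parameter.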

Inserting (2.3) and (2.5) in (2.1)$_2$ and taking the divergence of the resulting equation, one can find an equation for $p$ (cf. formula (2.16) in [23]).

(ii) We neglect radiant heating (i.e. $r = 0$) and for the heat flux we suppose the Fourier law $q = \kappa\nabla\theta$ ($\kappa > 0$ being the thermometric conductivity of the material). Introducing the specific Helmholtz free energy

$$\psi \equiv e - \theta\eta \tag{2.6}$$

we can rewrite (2.2) as

$$-\rho(\eta\dot\theta + \dot\psi) + T\cdot L + \kappa\frac{|\nabla\theta|^2}{\theta} \ge 0, \tag{2.7}$$

where we also used (2.1)$_3$. Due to (2.4) we see $\psi = \psi(\theta)$ and thus we deduce from (2.7) with the help of (2.3) and (2.5) that

$$-\rho\Bigl(\eta + \frac{\alpha p}{\rho} + \partial_\theta\psi\Bigr)\dot\theta + \hat T\cdot L + \kappa\frac{|\nabla\theta|^2}{\theta} \ge 0.$$

The linearity of this inequality in $\dot\theta$ implies that $\eta = -\partial_\theta\psi - \alpha p/\rho$, which together with (2.6) yields $e = \psi - \alpha\theta p/\rho - \theta\,\partial_\theta\psi$, and (2.1)$_3$ reads

$$-(\rho\theta\,\partial_{\theta\theta}\psi + \alpha^2\theta p)\dot\theta - \alpha\theta\dot p - \kappa\Delta\theta = \hat T\cdot L. \tag{2.8}$$

⁵ I.e. reflecting the properties of the material and in accordance with (2.2).

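(iii) The extra stress $\hat T$ shall depend linearly on $D \equiv \tfrac12(L + L^{\top})$. Together with the Stokes law this leads to

$$\hat T = \hat T(\theta, D) \equiv -\tfrac23\mu(\theta)(\operatorname{tr} D)I + 2\mu(\theta)D, \tag{2.9}$$

where $\mu$ is the viscosity. Henceforth let $\mu$ be constant for simplicity. Employing the assumptions (i)–(iii), the system (2.1) finally transforms to

$$\begin{aligned}
\operatorname{div} v &= \alpha\dot\theta,\\
\rho\dot v - \mu\bigl(\Delta v + \tfrac13\nabla\operatorname{div} v\bigr) + \nabla p &= \rho b,\\
(-\rho\theta\,\partial_{\theta\theta}\psi - \alpha^2\theta p)\dot\theta - \alpha\theta\dot p - \kappa\Delta\theta &= 2\mu\bigl(D^2 - \tfrac13|\operatorname{tr} D|^2\bigr).
\end{aligned} \tag{2.10}$$

We will set $C(\theta) \equiv -\rho\theta\,\partial_{\theta\theta}\psi$. This is the specific heat. The Clausius–Duhem inequality simplifies to the reduced dissipation inequality

$$\hat T\cdot L + \kappa\frac{|\nabla\theta|^2}{\theta} \ge 0.$$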

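2.2. Non-dimensionalization. The typical situation for which the Oberbeck–Boussinesq approximation is designed is the heating of a fluid layer from below in a gravitational field with a certain difference between the temperature $\theta^t$ at the top and $\theta^b$ at the bottom. Let $U$ be the representative velocity, and $L$, $t_0$ and $\pi$ a typical length, time and pressure; $\rho_r$ is the reference density (used in (2.4)) and $g$ the gravitational acceleration or the acceleration due to the applied body force, respectively. We set $\vartheta \equiv \theta^b - \theta^t$, $\Theta \equiv \frac{1}{2\vartheta}(\theta^b + \theta^t)$ and $c_0 \equiv C(\Theta\vartheta)$, and introduce the dimensionless variables:

$$\bar x \equiv \frac{x}{L},\quad \bar t \equiv \frac{t}{t_0},\quad \bar v \equiv \frac{v}{U},\quad \bar\rho \equiv \frac{\rho}{\rho_r},\quad \bar p \equiv \frac{p}{\pi},\quad \bar C \equiv \frac{C}{c_0},\quad \bar b \equiv \frac{b}{g},\quad \bar\theta \equiv \frac{\theta}{\vartheta} - \Theta.$$

We will not choose $U$, $L$ and $t_0$ independently of one another but set $t_0 = L/U$. Moreover we fix $\theta_r = \vartheta\Theta$. If the data are smooth enough we find

$$\bar\rho = e^{-\alpha\vartheta\bar\theta} \sim 1 - \alpha\vartheta\bar\theta. \tag{2.11}$$

Note that $C(\theta) = C(\vartheta(\bar\theta + \Theta))$. We set $\tilde C(\bar\theta) \equiv \frac{1}{c_0}C(\vartheta(\bar\theta + \Theta))$ and let $\nu \equiv \mu/\rho_r$ be the kinematic viscosity. The system of equations for the dimensionless quantities and differential operators then becomes (we skip all bars for convenience):

$$\begin{aligned}
\operatorname{div} v &= \alpha\vartheta\,\dot\theta,\\
(1 - \alpha\vartheta\theta)\dot v - \frac{\nu}{UL}\Bigl(\Delta v + \tfrac13\nabla(\operatorname{div} v)\Bigr) + \frac{\pi}{\rho_r U^2}\nabla p &= \frac{gL}{U^2}(1 - \alpha\vartheta\theta)b,\\
\bigl(\tilde C c_0 UL - \alpha^2\pi UL\,\vartheta(\theta + \Theta)p\bigr)\dot\theta - \alpha\pi UL(\theta + \Theta)\dot p - \kappa\Delta\theta &= \frac{2\nu\rho_r U^2}{\vartheta}\bigl(D^2 - \tfrac13|\operatorname{tr} D|^2\bigr).
\end{aligned} \tag{2.12}$$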

A priori there is no obvious representative velocity. The same is true for the pressure. Natural convection processes are reflected when $\frac{gL}{U^2}\alpha\vartheta = O(1)$ in the body force term, and consequently we set (as in [6, 9, 21]⁶)

$$U \equiv \sqrt{gL\alpha\vartheta}.$$

This means that the gravitational potential energy is assumed to be completely transformed into kinetic energy (see [9]). Our perturbation parameter shall measure the compressibility, i.e.

$$\varepsilon^3 \equiv \alpha\vartheta. \tag{2.13}$$

It is worth noticing that this perturbation parameter was already chosen by Oberbeck [20]. For most liquids $\alpha < 10^{-4}\,\mathrm{K}^{-1}$ (see [4, 9, 19]) and thus $\varepsilon$ is small for a large range of temperature differences. Inserting these definitions for $U$ and $\varepsilon$ into (2.12) we observe that from now on only measurable quantities occur. Let us introduce the non-dimensional numbers (see [6, 21])

$$\mathrm{Gr} \equiv \frac{1}{\nu^2}gL^3\alpha\vartheta,\qquad \mathrm{Di} \equiv \frac{\alpha gL\rho_r}{c_0},\qquad \mathrm{Pr} \equiv \frac{\nu}{\kappa}c_0. \tag{2.14}$$

From this and (2.13) it follows that

$$\mathrm{Gr} = \varepsilon^3\frac{gL^3}{\nu^2},\qquad \mathrm{Di} = \varepsilon^3\frac{gL\rho_r}{c_0\vartheta}.$$
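The definitions (2.13) and (2.14) can be evaluated directly from measurable data. The sketch below does this for one set of hypothetical, water-like values (all numbers are assumptions for illustration). It treats `c0` as a heat capacity per unit volume, which is our reading of $C(\theta) \equiv -\rho\theta\,\partial_{\theta\theta}\psi$ and makes $\mathrm{Di}$ and $\mathrm{Pr}$ dimensionless.

```python
import math


def dimensionless_numbers(g, L, alpha, vartheta, nu, kappa, rho_r, c0):
    """Evaluate eps, U, Gr, Di, Pr and Re as defined in (2.13) and (2.14)."""
    eps = (alpha * vartheta) ** (1.0 / 3.0)      # eps^3 = alpha * vartheta
    U = math.sqrt(g * L * alpha * vartheta)      # representative velocity
    Gr = g * L**3 * alpha * vartheta / nu**2     # Grashof number
    Di = alpha * g * L * rho_r / c0              # dissipation number
    Pr = nu * c0 / kappa                         # Prandtl number
    Re = U * L / nu                              # Reynolds number
    return {"eps": eps, "U": U, "Gr": Gr, "Di": Di, "Pr": Pr, "Re": Re}


# Hypothetical laboratory-scale data (assumed values, not from the paper).
PARAMS = dict(
    g=9.81,         # m/s^2
    L=0.1,          # m
    alpha=1.0e-4,   # 1/K
    vartheta=10.0,  # K, temperature difference
    nu=1.0e-6,      # m^2/s
    kappa=0.6,      # W/(m K)
    rho_r=1000.0,   # kg/m^3
    c0=4.18e6,      # J/(m^3 K), volumetric heat capacity (assumption)
)

if __name__ == "__main__":
    for name, value in dimensionless_numbers(**PARAMS).items():
        print(f"{name:>3} = {value:.4e}")
```

For such a small-scale water layer $\mathrm{Di}$ comes out many orders of magnitude below one, so this particular data set is not in the regime $\mathrm{Di} = O(1)$ targeted by the model; the sketch only shows how the numbers are formed.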

The Grashof number $\mathrm{Gr}$ measures the relation of body forces to viscous forces and the Prandtl number viscous versus inertial forces. This means $\mathrm{Pr}\sqrt{\mathrm{Gr}}$ is a measure for conduction over convection of energy. Note that with our representative velocity the Reynolds number is $\mathrm{Re} \equiv \frac{1}{\nu}UL = \sqrt{\mathrm{Gr}}$. In order to retain on the one hand both inertial and viscous effects and on the other hand effects due to dissipative heating, we must ensure that $\mathrm{Di}$ and $\mathrm{Gr}$ are of order one compared to $\varepsilon$. We discuss the consequences of this requirement in Remark 2.1. Moreover, in (2.12)$_2$ the coefficient $\frac{gL}{U^2}$ in front of the body force term is the only one of order $\varepsilon^{-3}$. To allow nonzero body forces we set $\pi \equiv \rho_r gL$, since then also the pressure term is of the order $\varepsilon^{-3}$. Finally, our Eqs. (2.12) transform to

$$\begin{aligned}
\operatorname{div} v &= \varepsilon^3\dot\theta,\\
\varepsilon^3(1 - \varepsilon^3\theta)\dot v - \frac{\varepsilon^3}{\sqrt{\mathrm{Gr}}}\Bigl(\Delta v + \tfrac13\nabla(\operatorname{div} v)\Bigr) + \nabla p &= (1 - \varepsilon^3\theta)b,\\
\bigl(\tilde C - \varepsilon^3\mathrm{Di}(\theta + \Theta)p\bigr)\dot\theta - \mathrm{Di}(\theta + \Theta)\dot p - \frac{1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\Delta\theta &= \frac{2\mathrm{Di}}{\sqrt{\mathrm{Gr}}}\Bigl(D^2 - \frac{|\operatorname{tr} D|^2}{3}\Bigr).
\end{aligned} \tag{2.15}$$

⁶ For different choices of $U$ see [4, 9, 19, 27, 28].

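Approximation scheme. We want to expand the quantities $v$, $\theta$ and $p$ into power series with respect to the perturbation parameter $\varepsilon$ in the form

$$v = \sum_{k=0}^{\infty}\varepsilon^k v_k,\qquad \theta = \sum_{k=0}^{\infty}\varepsilon^k\theta_k,\qquad p = \sum_{k=0}^{\infty}\varepsilon^k p_k. \tag{2.16}$$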

We attach the boundary conditions of the quantities to the zero order term and set the boundary conditions to be zero for the higher order terms. We replace all quantities in the system (2.15) by these power series and look at the lowest powers of $\varepsilon$. This means in particular that we are not interested in terms with large exponents of $\varepsilon$. Moreover we assume that all quantities are as smooth as we want. We see from Eq. (2.15)$_1$ that $\operatorname{div} v_0 = \operatorname{div} v_1 = 0$. Hence we may skip the terms $\nabla\operatorname{div} v$ and $\theta\dot v$ in (2.15)$_2$ for our analysis. Analogously, in (2.15)$_3$ we neglect the terms $\mathrm{Di}(\theta + \Theta)p\,\dot\theta$ and $\operatorname{tr} D$. Thus,

$$\varepsilon^3\bigl(\dot v_0 + \varepsilon(\partial_t v_1 + (v_0\cdot\nabla)v_1 + (v_1\cdot\nabla)v_0) + \cdots\bigr) - \frac{\varepsilon^3}{\sqrt{\mathrm{Gr}}}\Delta\Bigl(\sum_{k=0}^{\infty}\varepsilon^k v_k\Bigr) + \sum_{k=0}^{\infty}\varepsilon^k\nabla p_k = \Bigl(1 - \varepsilon^3\sum_{k=0}^{\infty}\varepsilon^k\theta_k\Bigr)b$$

(to avoid confusion, in the following we use the superposed dot only at the "zero level"). At $\varepsilon^0$ we note $\nabla p_0 = b$. Therefore we may only take body forces which have a potential, to ensure the solvability of this equation. Nevertheless, this is no serious restriction. For $\varepsilon^1$ and $\varepsilon^2$ we see $\nabla p_1 = \nabla p_2 = 0$, which together with the zero boundary conditions leads to $p_2 = p_1 = 0$. At $\varepsilon^3$ we observe

$$\dot v_0 - \frac{1}{\sqrt{\mathrm{Gr}}}\Delta v_0 + \nabla p_3 = -\theta_0 b.$$

$\varepsilon^4$ yields

$$\partial_t v_1 + (v_0\cdot\nabla)v_1 + (v_1\cdot\nabla)v_0 - \frac{1}{\sqrt{\mathrm{Gr}}}\Delta v_1 + \nabla p_4 = -\theta_1 b, \tag{2.17}$$

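and $\varepsilon^5$ finally provides

$$\partial_t v_2 + (v_0\cdot\nabla)v_2 + (v_1\cdot\nabla)v_1 + (v_2\cdot\nabla)v_0 - \frac{1}{\sqrt{\mathrm{Gr}}}\Delta v_2 + \nabla p_5 = -\theta_2 b. \tag{2.18}$$

Next we analyze (2.15)$_3$. For the specific heat we insert

$$\tilde C(\theta) \sim \tilde C(\theta_0) + \varepsilon\,\tilde C'(\theta_0)\sum_{k=1}^{\infty}\varepsilon^{k-1}\theta_k,$$

where $\tilde C' \equiv \partial_\theta\tilde C$, and obtain

$$\begin{aligned}
&\Bigl[\tilde C(\theta_0) + \varepsilon\,\tilde C'(\theta_0)\sum_{k=1}^{\infty}\varepsilon^{k-1}\theta_k\Bigr]\bigl[\dot\theta_0 + \varepsilon(\partial_t\theta_1 + v_0\cdot\nabla\theta_1 + v_1\cdot\nabla\theta_0) + \cdots\bigr]\\
&\quad - \mathrm{Di}\Bigl(\sum_{k=0}^{\infty}\varepsilon^k\theta_k + \Theta\Bigr)\bigl[\dot p_0 + \varepsilon(\partial_t p_1 + v_1\cdot\nabla p_0 + v_0\cdot\nabla p_1) + \cdots\bigr]\\
&\quad - \frac{1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\sum_{k=0}^{\infty}\varepsilon^k\Delta\theta_k = \frac{2\mathrm{Di}}{\sqrt{\mathrm{Gr}}}\,D\Bigl(\sum_{k=0}^{\infty}\varepsilon^k v_k\Bigr)\cdot D\Bigl(\sum_{k=0}^{\infty}\varepsilon^k v_k\Bigr).
\end{aligned}$$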
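The order-by-order bookkeeping used here is the Cauchy product of power series: the coefficient of $\varepsilon^n$ in a product collects all index pairs $(i, j)$ with $i + j = n$. The following sketch (helper names are ours, for illustration only) lists the pairs for the convective term $(v\cdot\nabla)v$ and checks the rule on scalar series. Since the convective term carries a prefactor $\varepsilon^3$, its $\varepsilon^2$ coefficient is the one entering the $\varepsilon^5$ balance.

```python
def cauchy_product_terms(order):
    """Index pairs (i, j) with i + j == order.

    For the convective term these are the products (v_i . grad) v_j that
    contribute to the coefficient of eps**order in (v . grad) v.
    """
    return [(i, order - i) for i in range(order + 1)]


def series_product(a, b, max_order):
    """Coefficients up to eps**max_order of the product of two power series.

    `a` and `b` are lists of scalar coefficients, lowest order first.
    """
    c = [0.0] * (max_order + 1)
    for i, a_i in enumerate(a[: max_order + 1]):
        for j, b_j in enumerate(b[: max_order + 1 - i]):
            c[i + j] += a_i * b_j
    return c


if __name__ == "__main__":
    for n in range(3):
        terms = " + ".join(f"(v{i} . grad) v{j}" for i, j in cauchy_product_terms(n))
        print(f"eps^{n}: {terms}")
```

At second order the three contributions are $(v_0\cdot\nabla)v_2$, $(v_1\cdot\nabla)v_1$ and $(v_2\cdot\nabla)v_0$.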

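We see at $\varepsilon^0$,

$$\tilde C(\theta_0)\dot\theta_0 - \mathrm{Di}(\theta_0 + \Theta)\dot p_0 - \frac{1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\Delta\theta_0 = \frac{2\mathrm{Di}}{\sqrt{\mathrm{Gr}}}D(v_0)\cdot D(v_0).$$

It is reasonable (though for the following not essential) to assume that $b$ does not depend on time. Then the relation $\nabla p_0 = b$ leads to $\dot p_0 = v_0\cdot b$ (otherwise, since $b = \nabla\Phi$ for some scalar field $\Phi$, we would find $\dot p_0 = \dot\Phi$), and $\varepsilon^1$ gives

$$\begin{aligned}
&\tilde C(\theta_0)(\partial_t\theta_1 + v_0\cdot\nabla\theta_1 + v_1\cdot\nabla\theta_0) + \tilde C'(\theta_0)\theta_1\dot\theta_0 - \mathrm{Di}\,\theta_1 v_0\cdot b\\
&\quad - \mathrm{Di}(\theta_0 + \Theta)(\partial_t p_1 + v_1\cdot\nabla p_0 + v_0\cdot\nabla p_1) - \frac{1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\Delta\theta_1 = \frac{4\mathrm{Di}}{\sqrt{\mathrm{Gr}}}D(v_0)\cdot D(v_1).
\end{aligned} \tag{2.19}$$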

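At $\varepsilon^2$ we finally find

$$\begin{aligned}
&\tilde C(\theta_0)(\partial_t\theta_2 + v_0\cdot\nabla\theta_2 + v_1\cdot\nabla\theta_1 + v_2\cdot\nabla\theta_0)\\
&\quad + \tilde C'(\theta_0)\bigl[\theta_2\dot\theta_0 + \theta_1(\partial_t\theta_1 + v_0\cdot\nabla\theta_1 + v_1\cdot\nabla\theta_0)\bigr] + \tfrac12\tilde C''(\theta_0)\theta_1^2\dot\theta_0\\
&\quad - \mathrm{Di}\,\theta_2 v_0\cdot b - \mathrm{Di}\,\theta_1(\partial_t p_1 + v_1\cdot b + v_0\cdot\nabla p_1)\\
&\quad - \mathrm{Di}(\theta_0 + \Theta)(\partial_t p_2 + v_0\cdot\nabla p_2 + v_1\cdot\nabla p_1 + v_2\cdot b) - \frac{1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\Delta\theta_2\\
&= \frac{2\mathrm{Di}}{\sqrt{\mathrm{Gr}}}\bigl[2D(v_0)\cdot D(v_2) + D(v_1)\cdot D(v_1)\bigr].
\end{aligned} \tag{2.20}$$

Since the solutions are smooth we deduce from the unique solvability of (2.17) and (2.19) that $v_1 \equiv 0$, $\theta_1 \equiv 0$, $p_4 \equiv 0$, since they fulfill the equations and have zero boundary values. Similarly, from (2.18) and (2.20) and the known values for $v_1$, $\theta_1$ and $p_2$ we find that $v_2 \equiv 0$, $\theta_2 \equiv 0$, $p_5 \equiv 0$. One computes that the next $\varepsilon$-levels yield non-trivial contributions. Thus, setting $v \equiv v_0$, $\theta \equiv \theta_0$ and $p \equiv p_0 + \varepsilon^3 p_3$ we obtain

$$\begin{aligned}
\operatorname{div} v &= 0,\\
\varepsilon^3\Bigl(\dot v - \frac{1}{\sqrt{\mathrm{Gr}}}\Delta v\Bigr) + \nabla p &= (1 - \varepsilon^3\theta)b,\\
\tilde C(\theta)\dot\theta - \mathrm{Di}(\theta + \Theta)\,v\cdot b - \frac{1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\Delta\theta &= \frac{2\mathrm{Di}}{\sqrt{\mathrm{Gr}}}D(v)\cdot D(v).
\end{aligned} \tag{2.21}$$

As in [23] the approximation combines different levels in the temperature and the velocity equation, and the velocity has zero divergence. In (2.21) both $v$ and $\theta$ are accurate to order $\varepsilon^3$ and $p$ to order $\varepsilon^6$ (since $v_1 = v_2 \equiv 0$, $\theta_1 = \theta_2 \equiv 0$, $p_4 = p_5 \equiv 0$).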

Remark 2.1. As already mentioned, our model (2.21) is derived under the assumption that $\mathrm{Gr} = O(1)$ and $\mathrm{Di} = O(1)$, which means $\frac{gL^3}{\nu^2} = O(\varepsilon^{-3})$ and $\frac{gL\rho_r}{c_0\vartheta} = O(\varepsilon^{-3})$. We would like to emphasize that this is easy to check for a concrete flow problem since all necessary data can be measured. Moreover, examples for which the above requirement is satisfied are quoted in our introduction (cf. [3, 4, 9, 17, 27, 28]). Note that if one requires that $\mathrm{Gr} = O(1)$ and $\mathrm{Di} = O(\varepsilon)$ one obtains the usual Oberbeck–Boussinesq scheme (cf. [23]) (hence, in this situation $\frac{gL^3}{\nu^2} = O(\varepsilon^{-3})$ and $\frac{gL\rho_r}{c_0\vartheta} = O(\varepsilon^{-2})$).

Remark 2.2. Appropriately adapting the strategy outlined in this section one can find a variety of different systems for different situations. We have restricted ourselves to the study of the case when dissipative heating is not negligible. Note that we have not been interested in any question related to the convergence of the series in (2.16) or the limiting process as $\varepsilon \to 0$.

3. Existence and Stability of Solutions

3.1. Problem and main results. In this section we consider our system (2.21) within the context of Rayleigh–Bénard convection. We impose that $v$ and $\theta$ are $l_j$-periodic in $x_j$, where $l_j$ ($j = 1, 2$) are positive numbers, and

$$v = 0 \ \text{on } \{x_3 = 0, 1\},\qquad \theta = -\tfrac12 \ \text{on } \{x_3 = 1\},\qquad \theta = \tfrac12 \ \text{on } \{x_3 = 0\}. \tag{3.1}$$

We are mainly interested in gravity, i.e. $b = (0, 0, -1)$. Nevertheless, to avoid possible confusion about signs, from now on $b$ always means $(0, 0, 1)$. One easily notices that

$$v^0 = 0,\qquad \theta^0 = \tfrac12 - x_3\qquad\text{and}\qquad \nabla p^0 = \bigl(1 - \varepsilon^3(\tfrac12 - x_3)\bigr)(0, 0, -1)$$

is a solution of (2.21) and (3.1) which represents the motionless state and pure conduction. We will investigate the deviation from pure conduction and introduce the quantity $\tilde\theta \equiv \theta - \theta^0$, which is zero at the top and the bottom of the layer. Moreover, we set $\varepsilon^3\nabla\tilde p \equiv \nabla p - \nabla p^0$, $\hat\Theta \equiv \tfrac12 + \Theta$ and $\eta \equiv \bigl(\tilde C(\tilde\theta + \tfrac12 - x_3)\bigr)^{-1}$ for convenience. Note that $\hat\Theta > 1$ and $\eta$ depends not only on $\tilde\theta$ but also on $x_3$. After these changes and skipping all tildes, (2.21) is transformed to

$$\begin{aligned}
\operatorname{div} v &= 0,\\
\dot v - \frac{1}{\sqrt{\mathrm{Gr}}}\Delta v + \nabla p &= \theta b,\\
\dot\theta - \frac{\eta}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\Delta\theta + \mathrm{Di}\,\eta(\hat\Theta - x_3 + \theta)v_3 - v_3 &= \frac{2\mathrm{Di}\,\eta}{\sqrt{\mathrm{Gr}}}D(v)\cdot D(v),
\end{aligned} \tag{3.2}$$

$$v = 0,\quad \theta = 0\quad\text{on } \{x_3 = 0, 1\}, \tag{3.3}$$

and our initial boundary value problem is now posed under periodic boundary conditions in the $x_1$- and $x_2$-directions, and the initial condition

$$v|_{t=0} = v_0,\qquad \theta|_{t=0} = \theta_0. \tag{3.4}$$

Next we introduce some function spaces and operators.

For a Banach space $X$ we denote its norm by $\|\cdot\|_X$. For simplicity we write the norm of $L^q(D)$ as $\|\cdot\|_q$, where $D$ is a domain and $1 \le q \le \infty$. Here and in what follows $(\cdot,\cdot)$ denotes the scalar product of $L^2(\Omega)$ or $L^2(\Omega)^3$, respectively. Let $\Omega = T_{l_1,l_2}\times(0,1)$, where $T_{l_1,l_2} = \mathbb{R}^2/(l_1\mathbb{Z}\times l_2\mathbb{Z})$. For $m \in \mathbb{N}$, $H^m(\Omega)$ denotes the $m$th order $L^2$-Sobolev space on $\Omega$. We denote by $C_0^\infty(\Omega)$ the set of all $C^\infty$-functions $\theta$ on $\Omega$ with values in $\mathbb{R}$ satisfying $\theta = 0$ near the boundaries $\{x_3 = 0, 1\}$. $H_0^1(\Omega)$ is the $H^1$-closure of $C_0^\infty(\Omega)$. $C_{0,\sigma}^\infty(\Omega)$ denotes the set of solenoidal vector fields on $\Omega$ whose components are in $C_0^\infty(\Omega)$. We write $H$ and $V$ for the closures of $C_{0,\sigma}^\infty(\Omega)$ in $L^2(\Omega)^3$ and $H^1(\Omega)^3$, respectively. We denote by $P$ the orthogonal projector from $L^2(\Omega)^3$ to $H$. The Stokes operator $A$ is defined by $A \equiv -P\Delta$ with domain $D(A) \equiv H^2(\Omega)^3\cap V$.

Definition 3.1. Let $T > 0$ and $\{v_0, \theta_0\} \in D(A)\times(H^2(\Omega)\cap H_0^1(\Omega))$. A pair $\{v, \theta\}$ is called a strong solution of problem (3.2)–(3.4) on $[0, T]$ if

$$v \in C([0,T]; D(A)),\quad \partial_t v \in C([0,T]; H)\cap L^2(0,T; V),$$
$$\theta \in C([0,T]; H^2(\Omega)\cap H_0^1(\Omega)),\quad \partial_t\theta \in C([0,T]; L^2(\Omega))\cap L^2(0,T; H_0^1(\Omega)),$$

and $\{v, \theta\}$ fulfills the equations

$$\partial_t v + P((v\cdot\nabla)v) + \frac{1}{\sqrt{\mathrm{Gr}}}Av = P(\theta b), \tag{3.5}$$

and (3.2)$_3$ on $(0, T)$ together with the initial conditions (3.4).

We introduce the Rayleigh number $R \equiv \mathrm{Gr}\,\mathrm{Pr}$ and set $\lambda \equiv \sqrt{R}$.

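Remark 3.2. To understand how the Rayleigh number enters our estimates we perform the following transformation:

$$v \to \mathrm{Gr}^{-1/2}v,\qquad t \to \mathrm{Gr}^{1/2}t,\qquad \theta \to \mathrm{Pr}^{1/2}\mathrm{Gr}^{-1/2}\theta,\qquad p \to \mathrm{Gr}^{-1}p.$$

For simplicity⁷ let $\eta = 1$. Our equations then become

$$\begin{aligned}
\partial_t v + P((v\cdot\nabla)v) + Av - \lambda P(\theta b) &= 0,\\
\dot\theta - \frac{\Delta\theta}{\mathrm{Pr}} + \frac{\lambda\mathrm{Di}}{\mathrm{Pr}}(\hat\Theta - x_3)v_3 - \frac{\lambda}{\mathrm{Pr}}v_3 + \mathrm{Di}\,\theta v_3 &= \frac{2\mathrm{Di}}{\lambda}D(v)\cdot D(v).
\end{aligned} \tag{3.6}$$

In the first energy estimates these equations are scalarly multiplied by $v$ and $\mathrm{Pr}\,\theta$, respectively. The occurring $(v_3, \theta)$-terms have a factor $\lambda$ and there is a critical number up to which it is possible to absorb them at the left-hand side. For the usual Oberbeck–Boussinesq system, i.e. in the case that $\mathrm{Di} = 0$, the related critical Rayleigh number $R_c$ is (cf. [24, 4]):

$$\frac{1}{\sqrt{R_c}} \equiv \sup\Bigl\{\frac{2(v_3, \theta)}{\|\nabla v\|_2^2 + \|\nabla\theta\|_2^2}\;;\ \{v, \theta\} \in V\times H_0^1 - \{0\}\Bigr\}. \tag{3.7}$$

From (3.7) it follows that for $\lambda < \sqrt{R_c}$,

$$\|\nabla v\|_2^2 + \|\nabla\theta\|_2^2 - 2\lambda(v_3, \theta) \ge \Bigl(1 - \frac{\lambda}{\sqrt{R_c}}\Bigr)\bigl(\|\nabla v\|_2^2 + \|\nabla\theta\|_2^2\bigr).$$

So in case $\mathrm{Di} = 0$, if $\lambda < \sqrt{R_c}$ the solution $\{v(t), \theta(t)\}$ tends to zero as $t \to \infty$. The case $\mathrm{Di} > 0$ is analyzed in Theorem 3.3.

⁷ Due to our non-dimensionalization, if $\eta = \text{const.}$ then $\eta = 1$.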

Theorem 3.3. (i) Assume that $\eta$ is a smooth function satisfying

$$0 < d_1 \le \eta(\theta, x_3) \le d_2,\qquad \Bigl|\frac{d}{d\theta}\eta(\theta)\Bigr| \le d_3\qquad\text{for all } 0 \le x_3 \le 1 \tag{3.8}$$

for some constants $d_j$ ($j = 1, 2, 3$). Then for each $\{v_0, \theta_0\} \in D(A)\times(H^2(\Omega)\cap H_0^1(\Omega))$, there exist a positive number $T_0$ and a unique strong solution $\{v, \theta\}$ on $[0, T_0]$, where $T_0$ depends on the $H^2$-norm of $\{v_0, \theta_0\}$, $d_j$ ($j = 1, 2, 3$) and physical parameters.

(ii) In addition to (3.8), assume also that

$$\tilde C(\theta^0) = \eta(0) = 1. \tag{3.9}$$

Then there exist $\zeta_0 > 0$, $R_c(\mathrm{Di})$ and $\delta > 0$ such that if $0 \le \mathrm{Di} \le \zeta_0$ and $R < R_c(\mathrm{Di})$, then for each $\{v_0, \theta_0\}$ with $E(0) \equiv \|Av_0\|_2^2 + \|\theta_0\|_{H^2(\Omega)}^2 \le \delta$, the strong solution $\{v, \theta\}$ exists on $[0, \infty)$ and satisfies

$$\|Av(t)\|_2^2 + \|\partial_t v(t)\|_2^2 + \|\theta(t)\|_{H^2(\Omega)}^2 + \|\partial_t\theta(t)\|_2^2 \le C_1 e^{-\beta t}E(0)$$

for some positive constants $C_1$ and $\beta$, which means that the motionless state is asymptotically stable. Here, $\delta \to 0$ and $\beta \to 0$ as $R \to R_c(\mathrm{Di})$. Furthermore, $R_c(0) = R_c$ and $R_c(\mathrm{Di}) > R_c$ for $0 < \mathrm{Di} \le \zeta_0$. If $R > R_c(\mathrm{Di})$ then the motionless state is unstable.

(iii) Assume that $\eta = 1$. Then for each $\{v_0, \theta_0\} \in V\times L^2(\Omega)$ there exists a unique solution $\{v, \theta\}$ of problem (3.2)–(3.4) in the class

$$v \in C([0,T]; V)\cap L^2(0,T; D(A)),\qquad \theta \in C([0,T]; L^2(\Omega))\cap L^2(0,T; H_0^1(\Omega))$$

for some $T = T(\|\nabla v_0\|_2^2 + \|\theta_0\|_2^2) > 0$. Moreover, the solution $\{v, \theta\}$ exists globally in time if $0 \le \mathrm{Di} \le \zeta_0$, $R < R_c(\mathrm{Di})$ and $\|\nabla v_0\|_2^2 + \|\theta_0\|_2^2 \le \delta$. In this case, for some positive constants $C_1$ and $\beta$ it holds that

$$\|\nabla v(t)\|_2^2 + \|\theta(t)\|_2^2 \le C_1 e^{-\beta t}\bigl(\|\nabla v_0\|_2^2 + \|\theta_0\|_2^2\bigr).$$

Remark 3.4. (i) Under the assumption (3.8) likewise, for each $\{v_0, \theta_0\} \in D(A)\times H_0^1(\Omega)$ there exists a solution of problem (3.2)–(3.4) in the class

$$\begin{aligned}
&v \in C([0,T]; D(A)),\quad \partial_t v \in C([0,T]; H)\cap L^2(0,T; V),\\
&\theta \in C([0,T]; H_0^1(\Omega))\cap L^2(0,T; H^2(\Omega)),\quad \partial_t\theta \in L^2(0,T; L^2(\Omega))
\end{aligned} \tag{3.10}$$

for some $T = T(\|Av_0\|_2^2 + \|\nabla\theta_0\|_2^2) > 0$. Moreover, if (3.9) and $0 \le \mathrm{Di} \le \zeta_0$ hold while $R < R_c(\mathrm{Di})$, then this solution exists globally in time provided that $\|Av_0\|_2^2 + \|\nabla\theta_0\|_2^2 \le \delta$ for some $\delta > 0$. In this case $\{v, \theta\}$ satisfies

$$\|Av(t)\|_2^2 + \|\nabla\theta(t)\|_2^2 \le C_1 e^{-\beta t}\bigl(\|\nabla v_0\|_2^2 + \|\theta_0\|_2^2\bigr).$$

However, the uniqueness of such solutions in the class (3.10) is an open problem if $\eta$ is not constant (cf. also Remark 3.10).

(ii) Condition (3.9) means that in principle the non-constant specific heat is connected to the deviation from pure conduction. Without (3.9) the analysis of the linearized operator at the motionless state becomes much more complicated since $\eta(0)$ depends on $x_3$.

(iii) Theorem 4.1 determines the values of $\zeta_0$ and $R_c(\mathrm{Di})$. They are related to estimate (3.19), which is crucial for global existence and stability.

(iv) One could also consider the viscosity and the thermal conductivity to be non-constant, namely to depend on $\theta$, so that $\kappa\Delta\theta$ is replaced by $\operatorname{div}(\kappa(\theta)\nabla\theta)$ and $\nu\Delta v$ by $\operatorname{div}(\nu(\theta)\nabla v)$. Assume that the bounds (3.8) and

$$0 < \nu_1 \le \nu(\theta) \le \nu_2,\quad |\nu'(\theta)| \le \nu_3,\qquad 0 < \kappa_1 \le \kappa(\theta) \le \kappa_2,\quad |\kappa'(\theta)| \le \kappa_3$$

hold for some constants $d_j$, $\nu_j$ and $\kappa_j$ ($j = 1, 2, 3$). In this case the same assertions as in Theorem 3.3 can be proved in a similar manner provided the analogues of (3.9) hold for $\kappa$ and $\nu$.

3.2. Preliminaries. In what follows the letter $C$ denotes constants which may vary from line to line and may depend on the periods $l_1$, $l_2$, $\mathrm{Pr}$, $\mathrm{Gr}$, $d_j$ ($j = 1, 2, 3$) and $\mathrm{Di}$ if not specified. Mostly without explicit reference to them we constantly use the Hölder, the Poincaré and the Young inequality⁸ and the well-known estimates, valid for $v \in D(A)$ and $\theta \in H^2(\Omega)\cap H_0^1(\Omega)$,

$$\|v\|_{H^2} \le C(l_1, l_2)\|Av\|_2,\qquad \|\theta\|_{H^2} \le C(l_1, l_2)\|\Delta\theta\|_2.$$

The following lemmata will be applied to estimate the nonlinearities.

Lemma 3.5. There exists a constant $C = C(l_1, l_2) > 0$ such that:

(i)
$$\|v_1\cdot\nabla v_2\|_2 \le C\|\nabla v_1\|_2\|\nabla v_2\|_2^{1/2}\|Av_2\|_2^{1/2}\qquad (v_1 \in V,\ v_2 \in D(A)),$$
$$\|v\cdot\nabla\theta\|_2 \le C\|\nabla v\|_2\|\nabla\theta\|_2^{1/2}\|\Delta\theta\|_2^{1/2}\qquad (v \in V,\ \theta \in H^2(\Omega)\cap H_0^1(\Omega)).$$

(ii) For $\gamma > 0$, $\theta \in H_0^1(\Omega)$, $v_1 \in D(A)$, $v_2 \in V$,
$$|(\eta(\theta, x_3)D(v_1)\cdot D(v_2), \theta)| \le Cd_2\|\nabla v_1\|_2^{1/2}\|Av_1\|_2^{1/2}\|\nabla v_2\|_2\|\nabla\theta\|_2 \le \gamma\|\nabla\theta\|_2^2 + Cd_2^2\gamma^{-1}\|\nabla v_1\|_2\|Av_1\|_2\|\nabla v_2\|_2^2.$$

(iii) For $v = (v_1, v_2, v_3) \in V$, $\theta_1 \in H_0^1(\Omega)$, $\theta_2 \in L^2(\Omega)$,
$$(\eta(\theta, x_3)\theta_1 v_3, \theta_2) \le Cd_2\|\nabla\theta_1\|_2\|v\|_2^{1/2}\|\nabla v\|_2^{1/2}\|\theta_2\|_2.$$

(iv) $\|D(v_1)\cdot D(v_2)\|_2 \le C\|Av_1\|_2\|Av_2\|_2^{1/2}\|\nabla v_2\|_2^{1/2}\qquad (v_1, v_2 \in D(A)).$

(v) For $\gamma > 0$, $\theta_1, \theta_2 \in H^2(\Omega)\cap H_0^1(\Omega)$,
$$-(\eta(\theta_1, x_3)\Delta\theta_2, \theta_2) \ge (d_1 - \gamma)\|\nabla\theta_2\|_2^2 - Cd_3\bigl(\|\theta_2\|_2^{1/2}\|\nabla\theta_2\|_2^{3/2}\|\Delta\theta_1\|_2 + d_3\gamma^{-1}\|\theta_2\|_2^2\bigr).$$

⁸ For $\gamma > 0$, $1 < r < \infty$ it holds that $ab \le \gamma^{1-r}a^r + \gamma b^{r'}$, where $\frac1r + \frac1{r'} = 1$.

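Proof. (ii) Due to $\eta(\theta, x_3) \le d_2$ and the inequalities (see [5, Th. 10.1])

$$\|\theta\|_6 \le C\|\nabla\theta\|_2\qquad\text{and}\qquad \|\nabla v\|_3 \le C\|\nabla v\|_2^{1/2}\|v\|_{H^2}^{1/2} \le C\|\nabla v\|_2^{1/2}\|Av\|_2^{1/2},$$

we see

$$\begin{aligned}
|(\eta(\theta, x_3)D(v_1)\cdot D(v_2), \theta)| &\le d_2\|D(v_1)\|_3\|D(v_2)\|_2\|\theta\|_6\\
&\le Cd_2\|\nabla v_1\|_2^{1/2}\|Av_1\|_2^{1/2}\|\nabla v_2\|_2\|\nabla\theta\|_2\\
&\le \gamma\|\nabla\theta\|_2^2 + Cd_2^2\gamma^{-1}\|\nabla v_1\|_2\|Av_1\|_2\|\nabla v_2\|_2^2.
\end{aligned}$$

The inequalities (i), (iii) and (iv) can be proved similarly. Relation (v) is shown in the following way:

$$\begin{aligned}
-(\eta(\theta_1, x_3)\Delta\theta_2, \theta_2) &= (\eta(\theta_1, x_3)\nabla\theta_2, \nabla\theta_2) + (\nabla\theta_2, \theta_2\nabla\eta(\theta_1, x_3))\\
&\ge d_1\|\nabla\theta_2\|_2^2 - d_3\|\nabla\theta_2\|_2\|\theta_2\|_3\|\nabla\theta_1\|_6 - d_3\|\nabla\theta_2\|_2\|\theta_2\|_2\\
&\ge (d_1 - \gamma)\|\nabla\theta_2\|_2^2 - Cd_3\bigl(\|\theta_2\|_2^{1/2}\|\nabla\theta_2\|_2^{3/2}\|\Delta\theta_1\|_2 + d_3\gamma^{-1}\|\theta_2\|_2^2\bigr)
\end{aligned}$$

for arbitrary $\gamma > 0$. This completes the proof.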
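The weighted Young inequality of footnote 8, $ab \le \gamma^{1-r}a^r + \gamma b^{r'}$ with $1/r + 1/r' = 1$, is what allows gradient terms to be absorbed with a small factor $\gamma$ at the cost of a large constant. The following sketch checks the inequality on random samples; it is a numerical sanity check under our own choice of sampling ranges, not a proof.

```python
import random


def young_rhs(a, b, gamma, r):
    """Right-hand side gamma**(1-r) * a**r + gamma * b**r', with 1/r + 1/r' = 1."""
    r_conj = r / (r - 1.0)
    return gamma ** (1.0 - r) * a**r + gamma * b**r_conj


def check_young(samples=10000, seed=0):
    """Return True if a*b <= young_rhs(...) holds on all random samples."""
    rng = random.Random(seed)
    for _ in range(samples):
        a = rng.uniform(0.0, 10.0)
        b = rng.uniform(0.0, 10.0)
        gamma = rng.uniform(0.01, 10.0)
        r = rng.uniform(1.1, 8.0)
        # Small tolerance guards against floating-point rounding only.
        if a * b > young_rhs(a, b, gamma, r) * (1.0 + 1e-12) + 1e-12:
            return False
    return True


if __name__ == "__main__":
    print("Young inequality holds on all samples:", check_young())
```

Choosing `r = 2` gives the familiar form $ab \le \gamma^{-1}a^2 + \gamma b^2$, which accounts for the factors $\gamma^{-1}$ appearing in Lemma 3.5 (ii) and (v).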

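Lemma 3.6. There exists $C = C(l_1, l_2) > 0$ such that the following inequalities hold if $\gamma > 0$, $\theta, \theta_1 \in H_0^1(\Omega)$, $\theta_2 \in H^2(\Omega)\cap H_0^1(\Omega)$, $v \in V$, $v_1, v_2 \in D(A)$, where $\eta' \equiv \frac{d}{d\theta}\eta(\theta, x_3)$:

(i) $|(\eta'\theta_1\Delta\theta_2, \theta_1)| \le \gamma\|\nabla\theta_1\|_2^2 + Cd_3^4\gamma^{-3}\|\theta_1\|_2^2\|\Delta\theta_2\|_2^4,$

(ii) $|(v\cdot\nabla\theta_1, \theta)| \le \gamma\|\nabla\theta\|_2^2 + C\gamma^{-1}\|\nabla v\|_2^2\|\nabla\theta_1\|_2\|\theta_1\|_2,$

(iii) $|(\eta'\theta D(v_1)\cdot D(v_2), \theta)| \le \gamma\|\nabla\theta\|_2^2 + Cd_3^2\gamma^{-1}\|\theta\|_2^2\|Av_1\|_2^2\|Av_2\|_2^2.$

Proof.

$$\begin{aligned}
|(\eta'\theta_1\Delta\theta_2, \theta_1)| &\le d_3\|\theta_1\|_6\|\Delta\theta_2\|_2\|\theta_1\|_3 \le Cd_3\|\nabla\theta_1\|_2^{3/2}\|\theta_1\|_2^{1/2}\|\Delta\theta_2\|_2\\
&\le \gamma\|\nabla\theta_1\|_2^2 + Cd_3^4\gamma^{-3}\|\theta_1\|_2^2\|\Delta\theta_2\|_2^4,\\
|(v\cdot\nabla\theta_1, \theta)| &= |-(v\cdot\nabla\theta, \theta_1)| \le \|v\|_6\|\nabla\theta\|_2\|\theta_1\|_3 \le C\|\nabla v\|_2\|\nabla\theta\|_2\|\nabla\theta_1\|_2^{1/2}\|\theta_1\|_2^{1/2}\\
&\le \gamma\|\nabla\theta\|_2^2 + C\gamma^{-1}\|\nabla v\|_2^2\|\nabla\theta_1\|_2\|\theta_1\|_2,\\
|(\eta'\theta D(v_1)\cdot D(v_2), \theta)| &\le d_3\|\theta\|_3^2\|D(v_1)\|_6\|D(v_2)\|_6 \le Cd_3\|\nabla\theta\|_2\|\theta\|_2\|Av_1\|_2\|Av_2\|_2\\
&\le \gamma\|\nabla\theta\|_2^2 + Cd_3^2\gamma^{-1}\|\theta\|_2^2\|Av_1\|_2^2\|Av_2\|_2^2.
\end{aligned}$$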

In the proof of Theorem 3.3 we apply the energy method⁹ since, especially for $\eta = \text{const.}$, it is the simplest possibility and very near to physics. Namely, we introduce the following energies:

$$\begin{aligned}
E_0(t) = E_0[v, \theta](t) &\equiv \sup_{0\le s\le t}\bigl[\|v(s)\|_2^2 + \|\theta(s)\|_2^2\bigr] + \int_0^t\|\nabla v(s)\|_2^2 + \|\nabla\theta(s)\|_2^2\,ds,\\
E_1(t) = E_1[v](t) &\equiv \sup_{0\le s\le t}\bigl[\|\nabla v(s)\|_2^2\bigr] + \int_0^t\|Av(s)\|_2^2\,ds,\\
E_2(t) = E_2[v, \theta](t) &\equiv \sup_{0\le s\le t}\bigl[\|Av(s)\|_2^2 + \|\partial_t v(s)\|_2^2 + \|\nabla\theta(s)\|_2^2\bigr]\\
&\quad + \int_0^t\|\nabla\partial_t v(s)\|_2^2 + \|\Delta\theta(s)\|_2^2 + \|\partial_t\theta(s)\|_2^2\,ds,\\
E_3(t) = E_3[\theta](t) &\equiv \sup_{0\le s\le t}\bigl[\|\Delta\theta(s)\|_2^2 + \|\partial_t\theta(s)\|_2^2\bigr] + \int_0^t\|\nabla\partial_t\theta(s)\|_2^2\,ds,
\end{aligned}$$

$$\begin{aligned}
\hat E(t) = \hat E[v, \theta](t) &\equiv E_0(t) + E_1(t), &\hat E(0) &\equiv \|v_0\|_2^2 + \|\nabla v_0\|_2^2 + \|\theta_0\|_2^2,\\
\tilde E(t) = \tilde E[v, \theta](t) &\equiv \hat E(t) + E_2(t), &\tilde E(0) &\equiv \hat E(0) + \|Av_0\|_2^2 + \|\nabla\theta_0\|_2^2,\\
E(t) = E[v, \theta](t) &\equiv \tilde E(t) + E_3(t), &E(0) &\equiv \tilde E(0) + \|\Delta\theta_0\|_2^2.
\end{aligned}$$

Even if $\eta = 1$, due to the term $D\cdot D$ we cannot close our estimates without controlling $\int_0^t\|Av\|_2^2\,ds$ (see the calculations up to (3.14)). Consequently, the energy $\hat E(t)$ is related to the class of solutions in Theorem 3.3 (iii) ($\eta = 1$).

The crucial difference in the case $\eta \ne \text{const.}$ is that, due to Lemma 3.5 (v), the term $\int_0^t\|\Delta\theta(s)\|_2^2\,ds$ is involved from the very beginning (i.e. while investigating $E_0$). The corresponding energy estimate creates $\int_0^t\|Av\|_2^r\,ds$ with $r > 2$ on the right-hand side and thus we must find a bound for $\sup\|Av\|_2$. Through the equation of motion, $Av$ is connected to $\partial_t v$. All these higher order derivatives are included in $E_2(t)$. $E_3(t)$ is needed for the corresponding uniqueness proof (see Remark 3.10).

For the proof of global existence, we use the weighted energies

$$N_0(t) \equiv \sup_{0\le s\le t}\bigl[e^{\beta s}(\|v(s)\|_2^2 + \|\theta(s)\|_2^2)\bigr] + \beta\int_0^t e^{\beta s}\bigl(\|\nabla v(s)\|_2^2 + \|\nabla\theta(s)\|_2^2\bigr)ds,$$

related to $E_0(t)$, and $N_j(t)$ ($j = 1, 2, 3$), $\hat N(t)$, $\tilde N(t)$ and $N(t)$ corresponding to $E_j(t)$, $\hat E(t)$, $\tilde E(t)$ and $E(t)$, respectively.

⁹ For a review of energy methods within the framework of regularity theory for Navier–Stokes equations, see [8, 25].

3.3. Plan of the proof. Throughout the proof we denote $\mathrm{Di}$ by $\zeta$. First the existence of strong solutions is shown and after that uniqueness. We construct a strong solution by the Faedo–Galerkin method. Let $\{\omega_n\}$ be the orthonormal basis of $H$ consisting of the eigenfunctions of $A$. Let $\{\tau_n\}$ be the orthonormal basis of $L^2(\Omega)$ consisting of the eigenfunctions of $B$, where $B$ is the operator defined by $B \equiv -\Delta$ with domain of definition $D(B) \equiv H^2(\Omega)\cap H_0^1(\Omega)$. For each $n$ we look for an approximate solution $\{v^n, \theta^n\}$ of (3.2)–(3.4) of the form

$$v^n(t) \equiv \sum_{j=1}^{n}f_{n,j}(t)\omega_j,\qquad \theta^n(t) \equiv \sum_{j=1}^{n}g_{n,j}(t)\tau_j$$

satisfying $v^n(0) = v_0^n$, $\theta^n(0) = \theta_0^n$ and

$$\begin{aligned}
(\partial_t v^n, \omega_j) + \frac{1}{\sqrt{\mathrm{Gr}}}(Av^n, \omega_j) + (v^n\cdot\nabla v^n, \omega_j) &= (\theta^n b, \omega_j),\\
(\partial_t\theta^n, \tau_j) + \zeta(\eta(\hat\Theta - x_3 + \theta^n)v_3^n, \tau_j) - \Bigl(\frac{\eta(\theta^n)}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\Delta\theta^n, \tau_j\Bigr) - (v_3^n, \tau_j) + (v^n\cdot\nabla\theta^n, \tau_j) &= \frac{2\zeta}{\sqrt{\mathrm{Gr}}}(\eta D(v^n)\cdot D(v^n), \tau_j),
\end{aligned} \tag{3.11}$$

where $\{v_0^n, \theta_0^n\}$ is the orthogonal projection in $H\times L^2(\Omega)$ of $\{v_0, \theta_0\}$ onto the space spanned by $\{\omega_1, \tau_1\}, \dots, \{\omega_n, \tau_n\}$. The existence of $\{v^n, \theta^n\}$ on some interval $[0, T_n)$ follows from the standard argument for ordinary differential equations. The first step in our proof is to show

Proposition 3.7. There is a positive constant $T_0 = T_0(E(0))$ such that if the approximate solution $\{v^n, \theta^n\}$ exists on $[0, T_0]$, then for all $t \in [0, T_0]$,

$$E[v^n, \theta^n](t) \le C\bigl(E(0) + E(0)^2\bigr),$$

where $C$ is a positive constant independent of $n$. The same assertion holds for $\tilde E[v^n, \theta^n](t)$ and $\tilde E(0)$ and, when $\eta = 1$, also for $\hat E[v^n, \theta^n](t)$ and $\hat E(0)$.

By Proposition 3.7 one can see that each approximate solution $\{v^n, \theta^n\}$ exists on $[0, T_0]$, and the convergence of $\{v^n, \theta^n\}$ to a strong solution $\{v, \theta\}$ on $[0, T_0]$ follows by the standard arguments. The assertion for $\tilde E[v^n, \theta^n](t)$ in Proposition 3.7 proves the existence of solutions in the class (3.10) of Remark 3.4 (i). Similarly the assertion for $\hat E[v^n, \theta^n](t)$ provides the existence of solutions in Theorem 3.3 (iii) when $\eta = 1$.

An intermediate step is to estimate the $L^2$-norm of $u \equiv \{v, \theta\}$. Here we formulate the result for $\eta = 1$ only.

Proposition 3.8. Assume that $\eta = 1$. Let $0 \le \zeta \le \zeta_0$ and $R < R_c(\zeta)$, where $\zeta_0$ and $R_c(\zeta)$ are positive numbers given by Theorem 4.1 below. Suppose that the solution $\{v, \theta\}$ exists on $[0, T]$ for some $T > 0$. Then there are $\beta_0 > 0$ and $C > 0$ such that if $0 < \beta \le \beta_0$ it holds for all $t \in [0, T]$,

$$\|u(t)\|_{H\times L^2(\Omega)}^2 \le Ce^{-2\beta t}\Bigl\{\|u(0)\|_{H\times L^2(\Omega)}^2 + \frac{1}{\beta^2}N_0(t)^2 + \frac{1}{\beta^2}N_1(t)^2\Bigr\}.$$

Here the constant $C$ is independent of $\beta$, and $\beta_0 \to 0$ as $R \to R_c(\mathrm{Di})$.

Now we are ready to show the global existence and stability of solutions by applying the energy method and Proposition 3.8.

Proposition 3.9. (i) Assume that (3.8) holds. Let $\beta > 0$ be fixed. There is a positive constant $T_1 = T_1(E(0)) \le T_0$ such that for all $t \in [0, T_1]$,

$$N(t) \le C\bigl(E(0) + E(0)^2\bigr),$$

where $C$ is a positive constant independent of $T_1$. The same holds for $\tilde N(t)$ and $\tilde E(0)$, and also, when $\eta = 1$, for $\hat N(t)$ and $\hat E(0)$.

(ii) Let $0 \le \zeta \le \zeta_0$, $R < R_c(\zeta)$ and let (3.8) and (3.9) hold. Then there exist positive constants $\beta$, $\delta$ and $C_1$, which do not depend on $T$, such that

$$N(T) \le C_1\bigl(E(0) + E(0)^2\bigr)$$

if $N(T) \le \delta$. The same holds for $\tilde N(t)$ and $\tilde E(0)$, and for $\hat N(t)$ and $\hat E(0)$ if $\eta = 1$. Again $\delta \to 0$ and $\beta \to 0$ as $R \to R_c(\zeta)$.

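3.4. Outline of the uniqueness proof. If $\eta = 1$, solutions in the class of Theorem 3.3 (iii) can be handled in the same way as in [13]. So we consider only the case that $\eta$ is not constant. The uniqueness of strong solutions (in the sense of Definition 3.1) is shown with the usual method and we omit most details. Let $\{v_1, \theta_1\}$, $\{v_2, \theta_2\}$ be two solutions with the same initial value. We set $v \equiv v_1 - v_2$ and $\theta \equiv \theta_1 - \theta_2$. With the help of the estimate $\|\theta\|_\infty \le C\|\Delta\theta\|_2^{3/4}\|\theta\|_2^{1/4}$ we derive

$$|((\eta(\theta_1) - \eta(\theta_2))\Delta\theta_3, \theta_4)| \le d_3\|\theta_1 - \theta_2\|_\infty\|\Delta\theta_3\|_2\|\theta_4\|_2 \le Cd_3\|\Delta\theta\|_2^{3/4}\|\theta\|_2^{1/4}\|\Delta\theta_3\|_2\|\theta_4\|_2 \tag{3.12}$$

and

$$\begin{aligned}
&|(\eta(\theta_1)D(v_1)\cdot D(v_1) - \eta(\theta_2)D(v_2)\cdot D(v_2), \theta)|\\
&\quad\le |((\eta(\theta_1) - \eta(\theta_2))D(v_1)\cdot D(v_1), \theta)| + |(\eta(\theta_2)D(v)\cdot D(v_1 + v_2), \theta)|\\
&\quad\le C\bigl(d_3\|\Delta\theta\|_2^{3/4}\|\theta\|_2^{1/4}\|Av_1\|_2^{3/2}\|\nabla v_1\|_2^{1/2}\|\theta\|_2\\
&\qquad + \|Av\|_2^{3/4}\|\nabla v\|_2^{1/4}\|A(v_1 + v_2)\|_2^{3/4}\|\nabla(v_1 + v_2)\|_2^{1/4}\|\theta\|_2\bigr).
\end{aligned}$$

One can then see that

$$\frac{d}{dt}H(t) \le F(t)H(t),$$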

where $H(t) = (\|v\|_2^2 + \|\nabla v\|_2^2 + \|\theta\|_2^2 + \|\nabla\theta\|_2^2)(t)$ with $H(0) = 0$ and $F(t)$ is some function in $L^1(0, T)$. It is now easy to deduce that $H(t) \equiv 0$, and the proof of our theorems is finished.

Remark 3.10. The reason that we cannot prove uniqueness in the class (3.10) of Remark 3.4 (i) is the estimate (3.12). It leads to the term $\|\Delta\theta_2\|_2^8$ in $F(t)$, for which we need that $\theta \in C([0, T]; H^2(\Omega))$ in the existence proof.

3.5. Proofs of the Propositions.

Proof of Proposition 3.7. Let $\eta = 1$. Our idea is to show that if $\{v^n, \theta^n\}$ exists on $[0, T]$, then there is $C > 0$ independent of $T$ such that for $t \in [0, T]$,

$$\hat E[v^n, \theta^n](t) \le \hat E(0) + C\bigl\{T\hat E[v^n, \theta^n](t) + (T^{1/2} + T)\hat E[v^n, \theta^n](t)^2\bigr\}. \tag{3.13}$$

The statement of Proposition 3.7 for $\hat E$ follows from (3.13) by taking $T$ appropriately small. For the other energies one proceeds similarly. To show (3.13) we multiply (3.11)$_2$ by $g_{n,j}(t)$ and add the resulting equations for $j = 1, \dots, n$, obtaining ($\eta = 1$)

$$\frac12\frac{d}{dt}\|\theta^n\|_2^2 - \frac{1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}(\Delta\theta^n, \theta^n) = -\zeta((\hat\Theta - x_3)v_3^n, \theta^n) + (v_3^n, \theta^n) - \zeta(\theta^n v_3^n, \theta^n) + \frac{2\zeta}{\sqrt{\mathrm{Gr}}}(D(v^n)\cdot D(v^n), \theta^n).$$
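Note that $(v^n\cdot\nabla\theta^n, \theta^n) = 0$. With Lemma 3.5 we then find

$$\frac12\frac{d}{dt}\|\theta^n\|_2^2 + \frac{1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\|\nabla\theta^n\|_2^2 \le F(v^n, \theta^n),$$

where

$$F(v, \theta) \equiv C\bigl(\|v\|_2\|\theta\|_2 + \|\nabla\theta\|_2\|\theta\|_2\|\nabla v\|_2^{1/2}\|v\|_2^{1/2} + \|Av\|_2^{1/2}\|\nabla v\|_2^{3/2}\|\nabla\theta\|_2\bigr).$$

From (3.11)$_1$ we obtain in a similar way

$$\frac12\frac{d}{dt}\|v^n\|_2^2 + \frac{1}{\sqrt{\mathrm{Gr}}}\|\nabla v^n\|_2^2 = (v_3^n, \theta^n) \le \|v^n\|_2\|\theta^n\|_2.$$

It then follows after application of Young's inequality that

$$\frac{d}{dt}\bigl(\|v^n\|_2^2 + \|\theta^n\|_2^2\bigr) + \frac{2}{\sqrt{\mathrm{Gr}}}\|\nabla v^n\|_2^2 + \frac{d_1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\|\nabla\theta^n\|_2^2 \le \tilde F(v^n, \theta^n), \tag{3.14}$$

where the terms with $\|\nabla\theta\|_2^2$ were absorbed at the left and

$$\tilde F(v, \theta) \equiv C\bigl(\|v\|_2^2 + \|\theta\|_2^2 + \|\theta\|_2^2\|\nabla v\|_2\|v\|_2 + \|Av\|_2\|\nabla v\|_2^3\bigr).$$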
Now for fixed t > 0 we integrate (3.14) over 0 < s ≤ t. Since F˜ is t s [0, s], where non-negative for all s ∈ [0, t] we deduce 0 F˜ dτ ≤ 0 F˜ dτ. Moreover, for t < T (skipping the superscript n) we calculate

t 1/2 t 1/2 t Av2 ∇v32 ds ≤ sup ∇v22 Av22 ds ∇v22 ds 0≤s≤t

0

≤

0

sup ∇v22

3/2

0≤s≤t

≤ T 1/2

0

0

sup ∇v22

2

0

t

Av22 ds +

0≤s≤t t

t 0

1/2

T 1/2

Av22 ds

2

,

θ 22 ∇v2 v2 ds ≤ T sup θ 22 ( sup v22 + sup ∇v22 ). 0≤s≤t

0≤s≤t

0≤s≤t

We thus obtain

ˆ n , θ n ](t)2 . E0 [vn , θ n ](t) ≤ E0 (0) + C T E0 [vn , θ n ](t) + (T 1/2 + T )E[v

(3.15)

In a similar manner, testing¹⁰ (3.11)$_1$ by $Av^n$ one derives

$$\frac{d}{dt}\|\nabla v^n\|_2^2 + \frac{1}{\sqrt{\mathrm{Gr}}}\|Av^n\|_2^2 \le C\bigl(\|Av^n\|_2\|\nabla v^n\|_2^3 + \|\theta^n\|_2^2\bigr),$$

$$E_1[v^n, \theta^n](t) \le E_1(0) + C\bigl\{TE_0[v^n, \theta^n](t) + T^{1/2}\hat E[v^n, \theta^n](t)^2\bigr\}. \tag{3.16}$$

This and (3.15) yield (3.13) and the proof for $\eta = 1$ is complete.

If $\eta \ne \text{const.}$ there occurs an additional term $d_3\|\nabla\theta\|_2^2\|\Delta\theta\|_2^2 + d_3^2\|\theta\|_2^2$ in $F(v, \theta)$ above (see Lemma 3.5 (v)). In principle this quantity is treated with energy estimates for higher order derivatives appearing in $E_2(t)$.

¹⁰ I.e. we multiply by $\lambda_j g_{n,j}(t)$ and add up, where $\lambda_j$ is the $j$th eigenvalue of $A$.

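More precisely, we test (3.11)$_2$ by $-\Delta\theta^n$. Then we differentiate (3.11)$_1$ with respect to $t$ and test the resulting equation by $\partial_t v^n$. In addition to the technique already introduced we must replace quantities through expressions derived from the equations, namely

$$\begin{aligned}
\|\partial_t v\|_2^2 &\le C\bigl(\|Av\|_2^2 + \|v\cdot\nabla v\|_2^2 + \|\theta\|_2^2\bigr) \le C\bigl(\|Av\|_2^2 + \|Av\|_2^4 + \|\theta\|_2^2\bigr),\\
\|Av\|_2^2 &\le C\bigl(\|\partial_t v\|_2^2 + \|Av\|_2^4 + \|\theta\|_2^2\bigr),\\
\|\partial_t\theta\|_2^2 &\le C\bigl(\|\Delta\theta\|_2^2 + \|Av\|_2^2\|\nabla\theta\|_2^2 + \|v\|_2^2 + \|Av\|_2^4\bigr),\\
\partial_t v(0) &= -\mathrm{Gr}^{-1/2}Av_0 - P(v_0\cdot\nabla v_0) + P(\theta_0 b).
\end{aligned}$$

The last relation provides $\|\partial_t v(0)\|_2^2 \le \tilde E(0) + \tilde E(0)^2$. For $E_3(t)$, Eq. (3.11)$_2$ is differentiated with respect to $t$ and then the resulting equation is tested by $\partial_t\theta^n$. Analogously to $\partial_t v(0)$ one treats $\partial_t\theta(0)$. The precise estimates for $E_2(t)$ and $E_3(t)$ are technical, but straightforward applying Lemma 3.6. Details are left to the interested reader. This completes the proof of Proposition 3.7.

Proof of Proposition 3.8. For simplicity, throughout this proof $\|\cdot\|$ denotes the norm of $H\times L^2(\Omega)$: $\|u\| \equiv (\|v\|_2^2 + \|\theta\|_2^2)^{1/2}$. Recall that we are considering the case $\eta = 1$. We first rewrite the system as

$$\frac{du}{dt} + Lu + B_1(u) + B_2(u) + B_3(u) = 0,\qquad u(0) = u_0. \tag{3.17}$$

Here

$$Lu \equiv \begin{pmatrix}\frac{1}{\sqrt{\mathrm{Gr}}}Av - P(\theta b)\\[4pt] -\frac{1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\Delta\theta + (\zeta(\hat\Theta - x_3) - 1)v_3\end{pmatrix}, \tag{3.18}$$

$$B_1(u) \equiv \begin{pmatrix}P(v\cdot\nabla v)\\ v\cdot\nabla\theta\end{pmatrix},\qquad B_2(u) \equiv \begin{pmatrix}0\\ -\frac{2\zeta}{\sqrt{\mathrm{Gr}}}D\cdot D\end{pmatrix},\qquad B_3(u) \equiv \begin{pmatrix}0\\ \zeta\theta v_3\end{pmatrix}.$$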

Theorem 4.1 below will show that, if $0 \le \zeta \le \zeta_0$ and $R < R_c(\zeta)$, then the spectrum $\sigma(L)$ of the operator $L$ satisfies¹¹ $\operatorname{Re}\sigma(L) \ge 2\beta_0 > 0$ for some $\beta_0 > 0$. Besides, $\beta_0 \to 0$ as $R \to R_c(\zeta)$. From this it follows (see [15, p. 289]) that for $0 < \beta \le \beta_0$,

$$\|e^{-tL}u\|_{H\times L^2(\Omega)} \le Ce^{-\beta t}\|u\|_{H\times L^2(\Omega)}\qquad (t > 0). \tag{3.19}$$

Since $u$ is the unique solution of (3.17), we can write $u$ as

$$u(t) = e^{-tL}u_0 + G_1(u) + G_2(u) + G_3(u),$$

where

$$G_j(u) = -\int_0^t e^{-(t-s)L}B_j(u)\,ds\qquad (j = 1, 2, 3).$$

By Lemma 3.5 and the Gagliardo–Nirenberg inequality [5, Th. 10.1] $\|v\|_\infty \le C\|v\|_{H^2}^{3/4}\|v\|_2^{1/4} \le C\|Av\|_2$, we find

$$\|B_1(u)\| \le C\bigl(\|\nabla v\|_2^{3/2}\|Av\|_2^{1/2} + \|Av\|_2\|\nabla\theta\|_2\bigr),$$

¹¹ Here and in what follows $\operatorname{Re}$ is the real part of a complex number.

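which implies that

$$\begin{aligned}
\|G_1(u)\| &\le C\int_0^t e^{-\beta(t-s)}\bigl(\|\nabla v\|_2^{3/2}\|Av\|_2^{1/2} + \|Av\|_2\|\nabla\theta\|_2\bigr)ds\\
&\le Ce^{-\beta t}\Bigl[\Bigl(\int_0^t e^{\beta s}\|\nabla v\|_2^2\,ds\Bigr)^{3/4}\Bigl(\int_0^t e^{\beta s}\|Av\|_2^2\,ds\Bigr)^{1/4}\\
&\qquad + \Bigl(\int_0^t e^{\beta s}\|Av\|_2^2\,ds\Bigr)^{1/2}\Bigl(\int_0^t e^{\beta s}\|\nabla\theta\|_2^2\,ds\Bigr)^{1/2}\Bigr],\\
\|G_1(u)\|^2 &\le Ce^{-2\beta t}\Bigl\{\frac{1}{\beta^2}N_0(t)^2 + \frac{1}{\beta^2}N_1(t)^2\Bigr\}.
\end{aligned}$$

Due to the estimate $\|D(v)\cdot D(v)\|_2 \le C\|Av\|_2^{3/2}\|\nabla v\|_2^{1/2}$ we see that

$$\|G_2(u)\|^2 \le Ce^{-2\beta t}\Bigl[\Bigl(\int_0^t e^{\beta s}\|\nabla v\|_2^2\,ds\Bigr)^2 + \Bigl(\int_0^t e^{\beta s}\|Av\|_2^2\,ds\Bigr)^2\Bigr] \le Ce^{-2\beta t}\Bigl\{\frac{1}{\beta^2}N_0(t)^2 + \frac{1}{\beta^2}N_1(t)^2\Bigr\}.$$

Besides, $\|B_3(u)\| \le C\|\theta v_3\|_2 \le C\|\theta\|_6\|v_3\|_3 \le C\|\nabla\theta\|_2\|\nabla v\|_2$ and therefore

$$\|G_3(u)\|^2 \le C\frac{e^{-2\beta t}}{\beta^2}N_0(t)^2,$$

and the proof is complete.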

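Proof of Proposition 3.9. We deal with assertion (ii). The assertion (i) is proved similarly. Let $\eta = 1$. First, for $0 < \beta \le \min\bigl(\beta_0, \frac{1}{\sqrt{\mathrm{Gr}}}, \frac{d_1}{2\mathrm{Pr}\sqrt{\mathrm{Gr}}}\bigr)$ we show

$$N_0(t) \le E_0(0) + \frac{C}{\beta^3}\bigl\{E_0(0) + N_0(t)^{3/2} + N_0(t)^2 + N_1(t)^2\bigr\}. \tag{3.20}$$

Here $\beta_0$ stems from Proposition 3.8 and $C > 0$ is independent of $\beta$. Exactly as in the proof of Proposition 3.7 (see (3.14)) we obtain

$$\frac{d}{dt}\bigl(\|v\|_2^2 + \|\theta\|_2^2\bigr) + \frac{2}{\sqrt{\mathrm{Gr}}}\|\nabla v\|_2^2 + \frac{d_1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\|\nabla\theta\|_2^2 \le \tilde F(v, \theta). \tag{3.21}$$

The inequality (3.20) now follows from (3.21) with the help of Proposition 3.8 and the following standard arguments:

(a) Using the Poincaré inequality on the left-hand side of (3.21) one splits

$$2\hat\beta\bigl(\|\nabla v\|_2^2 + \|\nabla\theta\|_2^2\bigr) \ge \hat\beta\bigl(\|\nabla v\|_2^2 + \|\nabla\theta\|_2^2\bigr) + C\hat\beta\bigl(\|v\|_2^2 + \|\theta\|_2^2\bigr).$$

(b) Then the inequality is multiplied by $e^{\beta t}$ and one applies

$$\frac{d}{dt}\bigl(e^{\beta t}f(t)\bigr) = \beta e^{\beta t}f(t) + e^{\beta t}\frac{d}{dt}f(t)$$

for $\beta \le \hat\beta$ and integrates.

(c) For $r \in \mathbb{N}$,

$$\int_0^t e^{\beta s}b(s)^r\,ds \le \Bigl(\sup_{0\le s\le t}e^{\beta s}b(s)\Bigr)^r\int_0^t e^{-\beta(r-1)s}\,ds = \frac{C}{\beta^{r-1}}\Bigl(\sup_{0\le s\le t}e^{\beta s}b(s)\Bigr)^r.$$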

In a similar manner to above (see (3.16)), using Proposition 3.8 we deduce for $0 < \beta \le \min\bigl(\beta_0, \frac{1}{2\sqrt{\mathrm{Gr}}}\bigr)$ and $C > 0$ independent of $\beta$ that

$$N_1(t) \le E_1(0) + \frac{C}{\beta^4}\bigl\{E_0(0) + N_0(t)^2 + N_1(t)^2 + N_1(t)^3\bigr\}. \tag{3.22}$$

It then follows from (3.20) and (3.22) that in case $\eta = 1$,

$$\hat N(t) \le \hat E(0) + \frac{C}{\beta^4}\bigl\{\hat E(0) + \hat N(t)^{3/2} + \hat N(t)^2 + \hat N(t)^3\bigr\}$$

with $C > 0$ independent of $\beta$. This immediately yields the assertion of Proposition 3.9 (ii) for $\hat N(t)$ if $\eta = 1$.

If $\eta \ne \text{const.}$, terms with $\|\Delta\theta\|_2^2$ appear in $\tilde F(v, \theta)$ as in the proof of Proposition 3.7. Besides, they also appear on the right-hand side of the estimate corresponding to Proposition 3.8, since we must modify $B_3(u)$ to

$$B_3(u) = \begin{pmatrix}0\\ \zeta\bigl((\eta - 1)(\hat\Theta - x_3) + \eta(\theta)\theta\bigr)v_3 - \frac{\eta - 1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\Delta\theta\end{pmatrix}. \tag{3.23}$$

Here condition (3.9) enters the proof and yields $\eta - 1 = \theta\int_0^1\eta'(\tau\theta)\,d\tau$. Thus the term $-\frac{\eta - 1}{\mathrm{Pr}\sqrt{\mathrm{Gr}}}\Delta\theta$ in $B_3(u)$ can be treated as a perturbation. To obtain the estimate for $\tilde N(t)$ we apply the usual energy method for higher order derivatives appearing in $N_2(t)$, choosing $\beta$ appropriately (depending on $\beta_0$, $\mathrm{Gr}$ and $\mathrm{Pr}$). The estimate for $N(t)$ is also derived by the energy method for the derivatives appearing in $N_3(t)$. The estimates for $N_2(t)$ and $N_3(t)$ are technical but straightforward using Proposition 3.8 and Lemma 3.6.

4. Linearized Operator at the Motionless State

In this section we prove the estimate (3.19) for the semigroup $e^{-tL}$. We use the analytic perturbation theory for two parameters (see [14, p. 66ff, 116f, 119]). Usually in the case of several parameters one must be careful. However, after some reduction, we can transform our problem to an eigenvalue problem for which the critical eigenvalue is known to be simple. We consider the eigenvalue problem linearized at the motionless state $u = 0$:
\[ -\sigma u + Lu = 0, \tag{4.1} \]
where $L$ is defined in (3.18). Since $\Omega$ is bounded and $L$ is strongly elliptic (footnote 12), it has compact resolvent (see [5, Th. 14.6]). Thus, the spectrum $\sigma(L)$ of $L$ consists of discrete eigenvalues $\{\sigma_n\}_{n \ge 1}$ with $\mathrm{Re}\,\sigma_1 \le \mathrm{Re}\,\sigma_2 \le \cdots \le \mathrm{Re}\,\sigma_n \le \cdots \to +\infty$ ([14, Th. III.6.29]). The estimate (3.19) then holds if $\mathrm{Re}\,\sigma_1 > 0$. We denote the eigenvalues $\sigma_j$ of $L$ by $\sigma_j(R, \zeta)$.

Theorem 4.1. There exist $\zeta_0 > 0$ and $R_c(\zeta) \ge R_c$ such that if $0 \le \zeta \le \zeta_0$ and $R < R_c(\zeta)$, then $\sigma_1(R, \zeta) > 0$. Moreover, if $0 \le \zeta \le \zeta_0$ and $R > R_c(\zeta)$, then $\sigma_1(R, \zeta) < 0$. Here the number $R_c(\zeta)$ satisfies $R_c(0) = R_c$ and $R_c(\zeta) > R_c$ for $0 < \zeta \le \zeta_0$.

12 This follows from the estimates obtained in Lemmata 3.5 and 3.6.


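Proof. Performing the transformation introduced in Remark 3.2 we find the following problem, which is equivalent to (4.1),
\[ -\sigma u + L(\lambda, \zeta) u = 0, \qquad L(\lambda, \zeta) u \equiv \begin{pmatrix} Av - \lambda P(\theta b) \\ -\frac{1}{Pr} \Delta\theta + \frac{\lambda}{Pr} \bigl( \zeta(\hat\Theta - x_3) - 1 \bigr) v_3 \end{pmatrix}. \tag{4.2} \]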

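It is known that for $\zeta = 0$ all eigenvalues $\{\sigma_n(\lambda) \equiv \sigma_n(\lambda, 0)\}_{n \ge 1}$ are real, the smallest eigenvalue has even multiplicity, say $2m$ ($m \in \mathbb{N}$), and
\[ \sigma_0(\lambda) \equiv \sigma_1(\lambda) = \cdots = \sigma_{2m}(\lambda) < \sigma_{2m+1}(\lambda) \le \cdots \le \sigma_n(\lambda) \le \cdots \to +\infty, \]
\[ (L(\lambda, 0)u, u) \ge \sigma_0(\lambda) \|u\|^2 \quad (u \in D(L) \equiv D(A) \times D(B)). \tag{4.3} \]
Here and in the following we denote the scalar product of $H \times L^2(\Omega)$ by $(\cdot, \cdot)$, which for $u_j = \{v_j, \theta_j\} \in H \times L^2(\Omega)$ ($j = 1, 2$) is defined by
\[ (u_1, u_2) \equiv (v_1, v_2)_{L^2(\Omega)} + Pr\,(\theta_1, \theta_2)_{L^2(\Omega)} \quad \text{and} \quad \|u\| \equiv \sqrt{(u, u)}. \]
Furthermore, $\sigma_0(\lambda) > 0$ (resp. $\sigma_0(\lambda) < 0$) if and only if $\lambda < \lambda_0 \equiv \sqrt{R_c}$ (resp. $\lambda > \lambda_0$), while $\sigma_0(\lambda) = 0$ if and only if $\lambda = \lambda_0$. There exists $\gamma_0 = \gamma_0(l_1, l_2, Pr) > 0$ such that if $j \ge 2m + 1$ and $\lambda \le \lambda_0$ then $\sigma_j(\lambda) \ge \gamma_0$. If $1 \le j \le 2m$, each $\sigma_j(\lambda)$ is continuous in $\lambda$. In particular, for any $\varepsilon > 0$ there exists $\delta(\varepsilon) > 0$ such that if $\lambda < \lambda_0 - \varepsilon$, then $\sigma_0(\lambda) \ge \delta(\varepsilon)$.

We now consider the case $0 < \zeta \ll 1$. We write (4.2) as
\[ -\sigma u + L_0 u + (\lambda - \lambda_0) M_1 u + \zeta M_2 u + M_3(\lambda, \zeta) u = 0, \]
where $L_0 = L(\lambda_0, 0)$,
\[ M_1 u = \begin{pmatrix} -P(\theta b) \\ -\frac{1}{Pr} v_3 \end{pmatrix}, \qquad M_2 u = \begin{pmatrix} 0 \\ \frac{\lambda_0}{Pr} (\hat\Theta - x_3) v_3 \end{pmatrix} \qquad \text{and} \qquad M_3(\lambda, \zeta) u = \begin{pmatrix} 0 \\ \frac{\lambda - \lambda_0}{Pr} \zeta (\hat\Theta - x_3) v_3 \end{pmatrix}. \]
Let first $\lambda < \lambda_0 - \varepsilon$ for some $\varepsilon > 0$.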

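Proposition 4.2. For any $\varepsilon > 0$ there exist $\delta(\varepsilon) > 0$ and $\zeta_1(\varepsilon) > 0$ with $\sigma(L(\lambda, \zeta)) \subset \{\sigma ; \mathrm{Re}\,\sigma \ge \delta(\varepsilon)/2\}$ if $\lambda < \lambda_0 - \varepsilon$ and $0 \le \zeta \le \zeta_1(\varepsilon)$.

Proof. Since $\|(\hat\Theta - x_3) v_3\|_2 \le C \|u\|$, we see from (4.3) that
\[ \mathrm{Re}\,(L(\lambda, \zeta)u, u) = (L(\lambda, 0)u, u) + \mathrm{Re}\,(\zeta M_2 u, u) + \mathrm{Re}\,(M_3(\lambda, \zeta)u, u) \ge (\sigma_0 - C\lambda_0 \zeta) \|u\|^2. \]
Now recall that for any $\varepsilon > 0$ there exists $\delta = \delta(\varepsilon) > 0$ such that if $\lambda < \lambda_0 - \varepsilon$, then $\sigma_0 \ge \delta(\varepsilon)$. Thus, if $\zeta \le \frac{\delta(\varepsilon)}{2C\lambda_0}$ and $\lambda < \lambda_0 - \varepsilon$, then
\[ \mathrm{Re}\,(L(\lambda, \zeta)u, u) \ge \tfrac{1}{2} \delta(\varepsilon) \|u\|^2, \]
which implies that $\sigma(L(\lambda, \zeta)) \subset \{\sigma ; \mathrm{Re}\,\sigma \ge \tfrac{1}{2}\delta(\varepsilon)\}$ for $\lambda < \lambda_0 - \varepsilon$ and $0 \le \zeta \le \frac{\delta(\varepsilon)}{2C\lambda_0}$. This shows Proposition 4.2.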


We next investigate $L(\lambda, \zeta)$ for $|\lambda - \lambda_0| \le \varepsilon \ll 1$ and $0 < \zeta \ll 1$.

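Proposition 4.3. (i) There exist $\varepsilon_2 > 0$ and $\zeta_2 > 0$ such that
\[ \sigma(L(\lambda, \zeta)) \subset \{\sigma ; |\sigma| \le \tfrac{1}{4}\gamma_0\} \cup \{\sigma ; \mathrm{Re}\,\sigma \ge \tfrac{3}{4}\gamma_0\} \tag{4.4} \]
if $|\lambda - \lambda_0| \le \varepsilon_2$ and $0 \le \zeta \le \zeta_2$.

(ii) There exist $0 < \varepsilon_3 \le \varepsilon_2$ and $0 < \zeta_3 \le \zeta_2$ such that the eigenvalues of $L(\lambda, \zeta)$ in $\{\sigma ; |\sigma| \le \tfrac{1}{4}\gamma_0\}$ have the form
\[ \sigma = \sigma^{(1,0)} (\lambda - \lambda_0) + \sigma^{(0,1)} \zeta + O(|\lambda - \lambda_0|^2 + \zeta^2) \tag{4.5} \]
with constants $\sigma^{(1,0)} < 0$ and $\sigma^{(0,1)} > 0$, if $|\lambda - \lambda_0| \le \varepsilon_3$ and $0 \le \zeta \le \zeta_3$. Moreover, there exists $\lambda_c = \lambda_c(\zeta) > 0$ satisfying
\[ \lambda_c(0) = \lambda_0 \quad \text{and} \quad \lambda_c(\zeta) > \lambda_0 \ \text{for} \ 0 < \zeta \le \zeta_3, \]
and it holds that $\sigma_1(\lambda, \zeta) > 0$ if $\lambda < \lambda_c(\zeta)$ and $\sigma_1(\lambda, \zeta) < 0$ if $\lambda > \lambda_c(\zeta)$, provided that $|\lambda - \lambda_0| < \varepsilon_3$ and $0 \le \zeta \le \zeta_3$.

Proof. First we observe
\[ \|M_j u\|_2 \le C \|u\| \quad (j = 1, 2, 3). \tag{4.6} \]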

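Since $L_0$ is self-adjoint there exists an orthonormal basis of eigenfunctions which are used to express $u$ and $L_0 u$ for $u \in D(L_0)$. From this we compute
\[ \|(-\mu + L_0) u\|^2 \ge \bigl( \min_{k \ge 1} |-\mu + \sigma_k| \bigr)^2 \|u\|^2. \tag{4.7} \]
Combining (4.6) and (4.7) we obtain for some constant $a = a(\lambda_0, \hat\Theta, Pr) > 0$,
\[ \bigl\| \bigl( (\lambda - \lambda_0) M_1 + \zeta M_2 + M_3(\lambda, \zeta) \bigr) (-\mu + L_0)^{-1} u \bigr\| \le \frac{a (|\lambda - \lambda_0| + \zeta)}{\min_{k \ge 1} |-\mu + \sigma_k|} \|u\| \le \frac{1}{2} \|u\|, \tag{4.8} \]
provided that $\mu \in S \equiv \{\sigma ; |\sigma| > \tfrac{1}{4}\gamma_0\} \cap \{\sigma ; \mathrm{Re}\,\sigma < \tfrac{3}{4}\gamma_0\}$, $|\lambda - \lambda_0| \le \varepsilon_2$ and $0 \le \zeta \le \zeta_2$ for some small $\varepsilon_2 > 0$ and $\zeta_2 > 0$. Using the identity
\[ -\mu + L(\lambda, \zeta) = \bigl( I + ((\lambda - \lambda_0) M_1 + \zeta M_2 + M_3(\lambda, \zeta)) (-\mu + L_0)^{-1} \bigr) (-\mu + L_0) \]
we see that $S$ is included in the resolvent set of $L(\lambda, \zeta)$, and (4.4) follows.

To prove (4.5) we note that the problem (4.2) is equivalent to
\[ \begin{cases} -\sigma v - \Delta v - \lambda \theta b + \nabla p = 0, \\ -\sigma\theta - \frac{1}{Pr} \Delta\theta + \frac{\lambda}{Pr} \bigl( \zeta(\hat\Theta - x_3) - 1 \bigr) v_3 = 0, \\ \mathrm{div}\, v = 0 \end{cases} \tag{4.9} \]
with the boundary conditions under consideration. To solve (4.9) we expand $v$, $\theta$ and $\nabla p$ into Fourier series in $x_1$ and $x_2$, and so we assume $v$, $\theta$ and $\nabla p$ to have the form $e^{2\pi i (\frac{k_1}{l_1} x_1 + \frac{k_2}{l_2} x_2)} h(x_3)$, where $(k_1, k_2) \in \mathbb{Z}^2$. We first consider the case $(k_1, k_2) = (0, 0)$, namely, $v_j = v_j(x_3)$ ($j = 1, 2, 3$), $\theta = \theta(x_3)$.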


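Equation $\mathrm{div}\, v = 0$ leads to $\frac{d}{dx_3} v_3 = 0$. This, together with $v = 0$ on $\{x_3 = 0, 1\}$, yields $v_3 \equiv 0$. We then obtain
\[ -\sigma \|v_j\|_{L^2(0,1)}^2 + \Bigl\| \frac{d}{dx_3} v_j \Bigr\|_{L^2(0,1)}^2 = 0 \quad (j = 1, 2), \qquad -\sigma \|\theta\|_{L^2(0,1)}^2 + \frac{1}{Pr} \Bigl\| \frac{d}{dx_3} \theta \Bigr\|_{L^2(0,1)}^2 = 0. \]
This implies that
\[ \sigma \ge a\pi^2 = a \inf \Bigl\{ \frac{\| \frac{d}{dx_3} h \|_{L^2(0,1)}^2}{\|h\|_{L^2(0,1)}^2} ; \ h \in H_0^1(0, 1), \ h \ne 0 \Bigr\}, \]
where $a = \min(1, Pr^{-1})$. Therefore, we see that $\sigma \in \{\sigma ; \mathrm{Re}\,\sigma \ge \tfrac{3}{4}\gamma_0\}$.

We next consider $(k_1, k_2) \ne (0, 0)$. This is the case for which there can occur $\sigma \in \{\sigma ; |\sigma| \le \tfrac{1}{4}\gamma_0\}$. Taking curl curl of $(4.9)_1$, we obtain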

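\[ \begin{cases} \sigma \Delta v_3 + \Delta^2 v_3 + \lambda \Delta_2 \theta = 0, \\ -\sigma\theta - \frac{1}{Pr} \Delta\theta + \frac{\lambda}{Pr} \bigl( \zeta(\hat\Theta - x_3) - 1 \bigr) v_3 = 0 \end{cases} \tag{4.10} \]
with boundary conditions $v_3 = \partial_3 v_3 = \theta = 0$ at $x_3 = 0, 1$ and the periodic boundary conditions in $x_1$ and $x_2$. Here $\Delta_2 = \partial_{11} + \partial_{22}$.

We now substitute $v_3 = e^{2\pi i (\frac{k_1}{l_1} x_1 + \frac{k_2}{l_2} x_2)} f(x_3)$, $\theta = e^{2\pi i (\frac{k_1}{l_1} x_1 + \frac{k_2}{l_2} x_2)} g(x_3)$ for $(k_1, k_2) \ne (0, 0)$ into (4.10). Then we find the eigenvalue problem:
\[ \begin{cases} -\sigma D_\omega f + D_\omega^2 f - \lambda \omega^2 g = 0 & (0 < x_3 < 1), \\ -\sigma g + \frac{1}{Pr} D_\omega g + \frac{\lambda}{Pr} \bigl( \zeta(\hat\Theta - x_3) - 1 \bigr) f = 0 & (0 < x_3 < 1), \\ f = \frac{d}{dx_3} f = g = 0 & (x_3 = 0, 1), \end{cases} \tag{4.11} \]
where $\omega^2 \equiv \bigl( \frac{2\pi k_1}{l_1} \bigr)^2 + \bigl( \frac{2\pi k_2}{l_2} \bigr)^2 > 0$, $D_\omega \equiv \bigl( -\frac{d^2}{dx_3^2} + \omega^2 \bigr)$ and $D_\omega^2 \equiv \bigl( \frac{d^2}{dx_3^2} - \omega^2 \bigr)^2$.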

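It is easily verified that the eigenvalues and eigenfunctions of (4.9) for $(k_1, k_2) \ne (0, 0)$ can be obtained from those of (4.11) with suitable $\omega^2 > 0$ and vice versa, since $\omega^2 > 0$. We write (4.11) as
\[ -\sigma M \mathbf{f} + \mathcal{L}(\lambda, \zeta) \mathbf{f} = 0, \qquad \mathbf{f} = \{f, g\}. \tag{4.12} \]
Here
\[ M \equiv \begin{pmatrix} D_\omega & 0 \\ 0 & 1 \end{pmatrix}, \qquad \mathcal{L}(\lambda, \zeta) \equiv \begin{pmatrix} D_\omega^2 & -\lambda\omega^2 \\ \frac{\lambda}{Pr} \bigl( \zeta(\hat\Theta - x_3) - 1 \bigr) & \frac{1}{Pr} D_\omega \end{pmatrix} \]
and the operators $D_\omega$ and $D_\omega^2$ are defined as above for $g \in H^2(0, 1) \cap H_0^1(0, 1)$ and $f \in \{f \in H^4(0, 1) ; f = \frac{d}{dx_3} f = 0 \text{ at } x_3 = 0, 1\}$, respectively.

The eigenvalues $\sigma_j(\lambda_0)$ of $L_0$ are given by the eigenvalues of the eigenvalue problem (4.12) with $\lambda = \lambda_0$ and $\zeta = 0$, and moreover, the eigenvalues of $L(\lambda, \zeta)$ in $\{\sigma ; |\sigma| \le \tfrac{1}{4}\gamma_0\}$ are given by those of $\mathcal{L}(\lambda, \zeta)$ in $\{\sigma ; |\sigma| \le \tfrac{1}{4}\gamma_0\}$. In particular, $\sigma(\mathcal{L}(0, 0)) \cap \{\sigma ; |\sigma| \le \tfrac{1}{4}\gamma_0\} = \{\sigma_0(\lambda_0) = 0\}$. The following lemma summarizes facts contained in [22, p. 38], which are based on earlier results quoted therein.

Lemma 4.4. (i) The eigenvalue $\sigma_0(\lambda_0) = 0$ of $L^{(0,0)} \equiv \mathcal{L}(0, 0)$ is simple.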


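(ii) One can choose an eigenfunction $\mathbf{f}_0 = \{f_0, g_0\}$ of $L^{(0,0)}$ associated with $\sigma_0(\lambda_0) = 0$ in such a way that $f_0(x_3) > 0$ and $g_0(x_3) > 0$ for $0 < x_3 < 1$.

Since $\sigma_0(\lambda_0)$ is simple by Lemma 4.4 (i), there exists only one eigenvalue $\sigma = \sigma(\lambda, \zeta)$ of $\mathcal{L}(\lambda, \zeta)$ in $\{\sigma ; |\sigma| \le \frac{\gamma_0}{4}\}$ when $|\lambda - \lambda_0|$ and $\zeta$ are sufficiently small. Furthermore, due to the simplicity of $\sigma_0(\lambda_0)$, one can see that $\sigma(\lambda, \zeta)$ is analytic in $\lambda$ and $\zeta$ near $\lambda = \lambda_0$ and $\zeta = 0$ and it is expanded as
\[ \sigma(\lambda, \zeta) = \sum_{j,k \ge 0}^{\infty} \sigma^{(j,k)} (\lambda - \lambda_0)^j \zeta^k \quad \text{with} \ \sigma^{(0,0)} = \sigma(\lambda_0) = 0. \tag{4.13} \]
We denote by $\mathbf{f}(\lambda, \zeta)$ the eigenfunction associated with $\sigma(\lambda, \zeta)$ satisfying $\mathbf{f}(0, 0) = \mathbf{f}_0$. Then
\[ \mathbf{f}(\lambda, \zeta) = \sum_{j,k \ge 0}^{\infty} (\lambda - \lambda_0)^j \zeta^k \mathbf{f}^{(j,k)} \tag{4.14} \]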

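with $\mathbf{f}^{(0,0)} = \mathbf{f}_0$. Substituting (4.13) and (4.14) into (4.12) we obtain $L^{(0,0)} \mathbf{f}_0 = 0$,
\[ -\sigma^{(1,0)} M \mathbf{f}_0 + L^{(0,0)} \mathbf{f}^{(1,0)} + L^{(1,0)} \mathbf{f}_0 = 0, \tag{4.15} \]
\[ -\sigma^{(0,1)} M \mathbf{f}_0 + L^{(0,0)} \mathbf{f}^{(0,1)} + L^{(0,1)} \mathbf{f}_0 = 0 \tag{4.16} \]
and so on. Here $\mathcal{L}(\lambda, \zeta) = \sum_{0 \le j,k \le 1} (\lambda - \lambda_0)^j \zeta^k L^{(j,k)}$ with $L^{(0,0)} = \mathcal{L}(0, 0)$,
\[ L^{(1,0)} \mathbf{f} = \Bigl\{ -\omega^2 g, \ -\frac{1}{Pr} f \Bigr\}, \qquad L^{(0,1)} \mathbf{f} = \Bigl\{ 0, \ \frac{\lambda_0}{Pr} (\hat\Theta - x_3) f \Bigr\} \]
and $L^{(1,1)} = \lambda_0^{-1} L^{(0,1)}$. To compute $\sigma^{(j,k)}$ we define $\langle \cdot, \cdot \rangle$ by
\[ \langle \mathbf{f}_1, \mathbf{f}_2 \rangle = \frac{1}{\omega^2} \int_0^1 f_1(x_3) \overline{f_2(x_3)}\,dx_3 + Pr \int_0^1 g_1(x_3) \overline{g_2(x_3)}\,dx_3 \]
for $\mathbf{f}_j = \{f_j, g_j\} \in L^2(0, 1)^2$ ($j = 1, 2$). Here $\overline{f}$ denotes the complex conjugate of $f$. Note that $\langle L^{(0,0)} \mathbf{f}_1, \mathbf{f}_2 \rangle = \langle \mathbf{f}_1, L^{(0,0)} \mathbf{f}_2 \rangle$ and $\langle M\mathbf{f}, \mathbf{f} \rangle > 0$ for $\mathbf{f} \ne 0$. Taking $\langle \cdot, \cdot \rangle$ of (4.15) and (4.16) with $\mathbf{f}_0$, we obtain
\[ \sigma^{(1,0)} = \frac{\langle L^{(1,0)} \mathbf{f}_0, \mathbf{f}_0 \rangle}{\langle M \mathbf{f}_0, \mathbf{f}_0 \rangle} \quad \text{and} \quad \sigma^{(0,1)} = \frac{\langle L^{(0,1)} \mathbf{f}_0, \mathbf{f}_0 \rangle}{\langle M \mathbf{f}_0, \mathbf{f}_0 \rangle}, \]
respectively. The coefficient $\sigma^{(1,0)}$ must satisfy $\sigma^{(1,0)} < 0$, since $\sigma_0(\lambda) > 0$ if and only if $\lambda < \lambda_0$, and $\sigma_0(\lambda) < 0$ if and only if $\lambda > \lambda_0$. Since $f_0, g_0 > 0$ by Lemma 4.4 (ii) and since $\hat\Theta > 1 \ge x_3$ for $0 \le x_3 \le 1$, we see that
\[ \langle L^{(0,1)} \mathbf{f}_0, \mathbf{f}_0 \rangle = \lambda_0 \int_0^1 (\hat\Theta - x_3) f_0(x_3) g_0(x_3)\,dx_3 > 0. \]
Thus, $\sigma^{(0,1)} > 0$, and we have obtained (4.5). Now we define $\lambda_0(\zeta)$ by $\sigma(\lambda_0(\zeta), \zeta) = 0$ and arrive at
\[ \lambda_0(\zeta) = \lambda_0 - \frac{\sigma^{(0,1)}}{\sigma^{(1,0)}} \zeta + O(\zeta^2). \]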
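The first-order formulas for $\sigma^{(1,0)}$, $\sigma^{(0,1)}$ and the resulting shift of the critical parameter can be checked on a finite-dimensional toy model. The following is only a minimal sketch: the 2×2 symmetric matrices `L0`, `L10`, `L01` are hypothetical stand-ins (with $M$ taken as the identity and the usual Euclidean inner product), not the operators of this paper.

```python
import math

def smallest_eigenvalue(a, b, d):
    """Smallest eigenvalue of the real symmetric matrix [[a, b], [b, d]]."""
    return 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)

# Hypothetical toy data (assumption, not taken from the paper).
L0 = ((0.0, 0.0), (0.0, 1.0))     # simple eigenvalue 0 with eigenvector f0 = (1, 0)
L10 = ((-1.0, 0.3), (0.3, 0.2))   # coefficient of (lambda - lambda0)
L01 = ((2.0, 0.1), (0.1, -0.5))   # coefficient of zeta

def sigma(dlam, zeta):
    """Smallest eigenvalue of L0 + dlam * L10 + zeta * L01, with dlam = lambda - lambda0."""
    a = L0[0][0] + dlam * L10[0][0] + zeta * L01[0][0]
    b = L0[0][1] + dlam * L10[0][1] + zeta * L01[0][1]
    d = L0[1][1] + dlam * L10[1][1] + zeta * L01[1][1]
    return smallest_eigenvalue(a, b, d)

# With M = I and f0 = (1, 0): sigma^(j,k) = <L^(j,k) f0, f0> / <f0, f0>.
sigma10 = L10[0][0]   # negative, as in the text
sigma01 = L01[0][0]   # positive, as in the text

def predicted_shift(zeta):
    """First-order prediction for lambda_0(zeta) - lambda_0."""
    return -sigma01 / sigma10 * zeta

def critical_shift(zeta, lo=0.0, hi=1.0, steps=60):
    """Bisection for the root dlam of sigma(dlam, zeta) = 0 on [lo, hi]."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if sigma(mid, zeta) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In this toy model the numerically computed root and the first-order prediction differ only at order $\zeta^2$, which mirrors the $O(\zeta^2)$ remainder in the expansion of $\lambda_0(\zeta)$.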


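Since $\lambda_0(\zeta)$ also depends on $\omega^2$, we denote $\lambda_0(\zeta)$ by $\lambda_0(\zeta; \omega^2)$. Then the critical number $R_c(\zeta)$ is given by $R_c(\zeta) = \lambda_c(\zeta)^2$ with
\[ \lambda_c(\zeta) = \inf_{(k_1, k_2) \in \mathbb{Z}^2 \setminus (0,0)} \lambda_0 \Bigl( \zeta; \Bigl( \frac{2\pi k_1}{l_1} \Bigr)^2 + \Bigl( \frac{2\pi k_2}{l_2} \Bigr)^2 \Bigr). \]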
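The infimum over the wavenumber lattice can be illustrated numerically. The sketch below is not the paper's computation: `toy_lambda0` is a stand-in assumption, namely the square root of the classical stress-free Rayleigh-Bénard marginal curve $(\pi^2 + \omega^2)^3 / \omega^2$, whereas the paper works with rigid boundaries and with $\zeta > 0$. The truncation parameter `kmax` is also an assumption; it is harmless here because the stand-in grows as $\omega^2 \to \infty$.

```python
import math

def wavenumbers_squared(l1, l2, kmax):
    """All values omega^2 = (2 pi k1 / l1)^2 + (2 pi k2 / l2)^2 with
    (k1, k2) != (0, 0) and |k1|, |k2| <= kmax."""
    values = set()
    for k1 in range(-kmax, kmax + 1):
        for k2 in range(-kmax, kmax + 1):
            if (k1, k2) != (0, 0):
                values.add((2 * math.pi * k1 / l1) ** 2 + (2 * math.pi * k2 / l2) ** 2)
    return sorted(values)

def critical_lambda(lambda0, l1, l2, kmax=20):
    """Minimum of lambda0(omega^2) over the truncated wavenumber lattice."""
    return min(lambda0(w2) for w2 in wavenumbers_squared(l1, l2, kmax))

def toy_lambda0(w2):
    """Stand-in marginal curve (assumption): sqrt of (pi^2 + w2)^3 / w2."""
    return math.sqrt((math.pi ** 2 + w2) ** 3 / w2)
```

For the stand-in curve, the unconstrained minimum of $\lambda^2$ is $27\pi^4/4$ at $\omega^2 = \pi^2/2$; a periodicity cell with $l_1 = l_2 = 2\sqrt{2}$ contains this wavenumber, while other cells give a larger critical value.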

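The calculation above can be justified as follows: For $\mathbf{f} \in D \equiv H_0^1(0, 1) \times L^2(0, 1)$ the problem (4.12) is equivalent to
\[ -\sigma \mathbf{f} + \tilde{L}^{(0,0)} \mathbf{f} + M^{-1} \bigl[ (\lambda - \lambda_0) L^{(1,0)} + \zeta L^{(0,1)} + (\lambda - \lambda_0) \zeta L^{(1,1)} \bigr] \mathbf{f} = 0, \]
where
\[ \tilde{L}^{(0,0)} = \begin{pmatrix} \tilde{D}_\omega & -\lambda_0 \omega^2 D_\omega^{-1} \\ -\frac{\lambda_0}{Pr} & \frac{1}{Pr} D_\omega \end{pmatrix} \]
and $\tilde{D}_\omega$ is the self-adjoint extension of $D_\omega^{-1} D_\omega^2$ in $D(D_\omega^{1/2}) = H_0^1(0, 1)$ with domain $\{f \in H^3(0, 1) ; f = \frac{d}{dx_3} f = 0 \text{ at } x_3 = 0, 1\}$. As for $\tilde{D}_\omega$ we refer to [29, Th. V.2]. One can then obtain for certain $a, b > 0$,
\[ |||L^{(j,k)} \mathbf{f}||| \le a\, |||\tilde{L}^{(0,0)} \mathbf{f}||| + b\, |||\mathbf{f}|||. \]
Here $||| \cdot |||$ denotes the norm in $D$:
\[ |||\mathbf{f}|||^2 = \|M^{1/2} \mathbf{f}\|^2 = \frac{1}{\omega^2} \|D_\omega^{1/2} f\|_{L^2(0,1)}^2 + Pr \|g\|_{L^2(0,1)}^2. \]
Since $\sigma_0(\lambda_0) = 0$ is a simple eigenvalue of $\tilde{L}^{(0,0)}$, one can see that $\sigma(\lambda, \zeta)$ is analytic in $\lambda$ and $\zeta$ near $\lambda = \lambda_0$ and $\zeta = 0$ in the same way as in [14]. Then $\sigma^{(1,0)}$ and $\sigma^{(0,1)}$ are immediately obtained as above.

Theorem 4.1 now follows from Propositions 4.2 and 4.3 by taking $\varepsilon = \varepsilon_3$ in Proposition 4.2 and $\zeta_0 = \min\{\zeta_1(\varepsilon_3), \zeta_3\}$. This completes the proof of Theorem 4.1.

Acknowledgement. We would like to thank A. Srinivasa for pointing out a mistake in a previous version of this paper and for useful discussions.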

References

1. Bayly, B.J., Levermore, C.D., Passot, T.: Density variations in weakly compressible flows. Phys. Fluids A 4 (5), 945–954 (1992)
2. Boussinesq, J.: Théorie Analytique de la Chaleur. Paris: Gauthier-Villars, 1903
3. Busse, F.H., Carrigan, C.R.: Convection induced by centrifugal buoyancy. J. Fluid Mech. 62, 579–592 (1974)
4. Busse, F.H.: Fundamentals of Thermal Convection. In: Mantle Convection, Plate Tectonics and Global Dynamics, W. R. Peltier ed., New York-London-Paris: Gordon and Breach, 1989
5. Friedman, A.: Partial Differential Equations. New York: Holt, Rinehart and Winston Inc., 1969
6. Gebhart, B.: Heat Transfer. New York–Toronto–London: McGraw-Hill Book Company Inc., 1961
7. Gebhart, B.: Effects of viscous dissipation in natural convection. J. Fluid Mech. 14, 225–232 (1962)
8. Heywood, J.: The Navier–Stokes Equations: On the existence, regularity and decay of solutions. Indiana Univ. Math. J. 29, 639–681 (1980); Remarks on the Possible Global Regularity of Solutions of the Three-dimensional Navier–Stokes Equation. In: Progress in theoretical and computational fluid mechanics, Winter School, Paseky, 1993, G.P. Galdi, J. Málek, J. Nečas, eds., Essex: Longman Scientific & Technical, 1994
9. Hewitt, F.M., McKenzie, D.P., Weiss, N.O.: Dissipative heating in convective flows. J. Fluid Mech. 68 (4), 721–738 (1975)
10. Joseph, D.D.: On the stability of the Boussinesq Equations. Arch. Rational Mech. Anal. 20, 59–71 (1965)
11. Joseph, D.D.: Stability of Fluid Motions II: §54. Berlin–Heidelberg–New York: Springer Verlag, 1976
12. Kagei, Y.: Attractors for two-dimensional equations of thermal convection in the presence of the dissipation function. Hiroshima Math. J. 25, 251–311 (1995)
13. Kagei, Y., Skowron, M.: Nonstationary flows of nonsymmetric fluids with thermal convections. Hiroshima Math. J. 23, 343–363 (1993)
14. Kato, T.: Perturbation Theory for Linear Operators. Berlin–Heidelberg–New York: Springer, Corrected Printing of the Second Edition (1980)
15. Kirchgäßner, K., Kielhöfer, H.: Stability and bifurcation in fluid dynamics. Rocky Mountain J. Math. 3, 275–318 (1973)
16. Lions, P.-L.: Mathematical Topics in Fluid Mechanics. Oxford lecture series in mathematics and its application 10, Oxford: Oxford Science Publications, 1996
17. Litsek, P.A., Bejan, A.: Convection in the Cavity Formed Between Two Cylindrical Rollers. J. Heat Transf. 112, 625–631 (1990)
18. Málek, J., Růžička, M., Thäter, G.: Fractal dimension, attractors and Boussinesq approximation in three dimensions. Act. Appl. Math. 37, 83–98 (1994)
19. McKenzie, D.P., Roberts, J.M., Weiss, N.O.: Convection in the earth's mantle: towards a numerical simulation. J. Fluid Mech. 62, 465–538 (1974)
20. Oberbeck, A.: Über die Wärmeleitung der Flüssigkeiten bei der Berücksichtigung der Strömungen infolge von Temperaturdifferenzen. Annalen der Physik und Chemie 7, 271 (1879); Über die Bewegungserscheinungen der Atmosphäre. Sitz. Ber. K. Preuss. Akad. Wiss. 383 and 1120 (1888)
21. Ostrach, S.: Internal viscous flows with body forces. In: Görtler, H. (ed): Grenzschichtforschung, Berlin–Göttingen–Heidelberg: Springer Verlag, 1958
22. Rabinowitz, P.: Existence and nonuniqueness of rectangular solutions of the Bénard problem. Arch. Rational Mech. Anal. 29, 32–57 (1968)
23. Rajagopal, K.R., Růžička, M., Srinivasa, A.R.: On the Oberbeck–Boussinesq Approximation. Math. Models Methods Appl. Sci. 6, 1157–1167 (1996)
24. Schmitt, B.J., von Wahl, W.: Monotonicity and boundedness in the Boussinesq-equations. Eur. J. Mech. B/Fluids 12, 245–270 (1993)
25. Specovius, M.: Ein Struktursatz von Leray für Lösungen Navier–Stokesscher Anfangsrandwertaufgaben. Diploma thesis, University Paderborn (1981)
26. Truesdell, C.: A first course in rational Continuum Mechanics, Volume 1: General Concepts. 2nd ed., San Diego-London: Academic Press, 1991
27. Turcotte, D.L., Hsui, A.T., Torrance, K.E., Schubert, G.: Influence of viscous dissipation on Bénard convection. J. Fluid Mech. 64 (2), 369–374 (1974)
28. Velarde, M.G., Perez Cordon, R.: On the (non-linear) foundations of Boussinesq approximation applicable to a thin layer of fluid II: Viscous dissipation and large cell gap effects. J. de Physique 37 (3), 178–182 (1976)
29. von Wahl, W.: The Boussinesq-Equations in terms of poloidal and toroidal fields and the mean flow. Bayreuther Math. Schriften 40, 203–290 (1992)

Communicated by H. Araki

Commun. Math. Phys. 214, 315 – 337 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Symmetry Breaking and Other Phenomena in the Optimization of Eigenvalues for Composite Membranes

S. Chanillo1, D. Grieser2, M. Imai3, K. Kurata4, I. Ohnishi3

1 Department of Mathematics, Rutgers University, New Brunswick, NJ 98903, USA
2 Institut für Mathematik, Humboldt-Universität Berlin, Unter den Linden 6, 10099 Berlin, Germany
3 Department of Information Mathematics and Computer Sciences, University of Electro-Communications, Chofu-ga-oka 1-5-1, Chofu-shi, Tokyo, Japan
4 Department of Mathematics, Tokyo Metropolitan University, Minami-Ohsawa 1-1, Hachioji-shi, Tokyo, Japan

Received: 22 November 1999 / Accepted: 31 March 2000

Abstract: We consider the following eigenvalue optimization problem: Given a bounded domain $\Omega \subset \mathbb{R}^n$ and numbers $\alpha > 0$, $A \in [0, |\Omega|]$, find a subset $D \subset \Omega$ of area $A$ for which the first Dirichlet eigenvalue of the operator $-\Delta + \alpha\chi_D$ is as small as possible. We prove existence of solutions and investigate their qualitative properties. For example, we show that for some symmetric domains (thin annuli and dumbbells with narrow handle) optimal solutions must possess fewer symmetries than $\Omega$; on the other hand, for convex $\Omega$ reflection symmetries are preserved. Also, we present numerical results and formulate some conjectures suggested by them.

1. Problem and Main Results

We study qualitative properties of solutions of a certain eigenvalue optimization problem. In physical terms, the problem can be stated as follows:

Problem (P). Build a body of prescribed shape out of given materials (of varying densities) in such a way that the body has a prescribed mass and so that the basic frequency of the resulting membrane (with fixed boundary) is as small as possible.

In fact, we will consider a more general problem, which we now state in mathematical terms: Given a domain $\Omega \subset \mathbb{R}^n$ (bounded, connected, with Lipschitz boundary) and numbers $\alpha > 0$, $A \in [0, |\Omega|]$ (with $|\cdot|$ denoting volume). For any measurable subset $D \subset \Omega$ let $\chi_D$ be its characteristic function and $\lambda_\Omega(\alpha, D)$ the lowest eigenvalue $\lambda$ of the problem
\[ -\Delta u + \alpha\chi_D u = \lambda u \ \text{on } \Omega, \qquad u = 0 \ \text{on } \partial\Omega. \tag{1} \]


Define
\[ \Lambda_\Omega(\alpha, A) = \inf_{D \subset \Omega,\ |D| = A} \lambda_\Omega(\alpha, D). \tag{2} \]
Any minimizer $D$ in (2) will be called an optimal configuration for the data $(\Omega, \alpha, A)$. If $D$ is an optimal configuration and $u$ satisfies (1) then $(u, D)$ will be called an optimal pair (or solution). Our problem now reads:

Problem (M). Study existence, uniqueness and qualitative properties of optimal pairs.

As is well-known, $u$ is uniquely determined, up to a scalar multiple, by $D$, and may be chosen to be positive on $\Omega$. In addition, we will always assume
\[ \int_\Omega u^2 = 1. \]
(Integrals over $\Omega$ are always taken with respect to the standard measure.) Clearly, changing $D$ by a set of measure zero does not affect $\lambda_\Omega(\alpha, D)$ or $u$. Therefore, we will consider sets $D$ that differ by a null-set as equal.

At first sight, it is not obvious that problem (M) generalizes problem (P). In fact, we will see (Theorem 13) that there is a number $\alpha_\Omega(A) > 0$ such that solutions of problem (P) are in one to one correspondence with solutions of problem (M) with parameters in the range $\alpha \le \alpha_\Omega(A)$. The number $\alpha_\Omega(A)$ is characterized as the unique value of $\alpha$ satisfying
\[ \Lambda_\Omega(\alpha_\Omega(A), A) = \alpha_\Omega(A), \tag{3} \]
see Proposition 10.

Our investigations are theoretical and numerical: Numerical results (obtained by M.I. and I.O.) suggest properties of optimal configurations; this leads to the formulation of conjectures, and some of these are proved rigorously (by S.C., D.G. and K.K.).

A central tool in our investigations is the variational characterization of the eigenvalue:
\[ \lambda_\Omega(\alpha, D) = \inf_{u \in H_0^1(\Omega)} R_\Omega(u, \alpha, D), \qquad R_\Omega(u, \alpha, D) := \frac{\int_\Omega |\nabla u|^2 + \alpha \int_\Omega \chi_D u^2}{\int_\Omega u^2}, \]
and the eigenfunction $u$ is a minimizer. So $\Lambda_\Omega(\alpha, A)$ is characterized by
\[ \Lambda_\Omega(\alpha, A) = \inf_{u \in H_0^1(\Omega),\ |D| = A} R_\Omega(u, \alpha, D). \]
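To make the variational characterization concrete, here is a small one-dimensional sketch on $\Omega = (0, 1)$. It is only an illustration under stated assumptions: the uniform grid, the simple quadrature rules, the trial function $\sin(\pi x)$ and the two test sets $D$ are choices made for this example and do not come from the paper.

```python
import math

def rayleigh_quotient(u, alpha, in_D, n=1000):
    """Discrete version of R(u, alpha, D) on Omega = (0, 1).

    u     : trial function, expected to vanish at x = 0 and x = 1
    alpha : the parameter alpha >= 0
    in_D  : predicate x -> bool describing the set D
    n     : number of grid cells (assumption of this sketch)
    """
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    vals = [u(x) for x in xs]
    # integral of |u'|^2 via forward differences on each cell
    grad = sum((vals[i + 1] - vals[i]) ** 2 / h for i in range(n))
    # alpha * integral over D of u^2, rectangle rule on the grid points
    pot = sum(alpha * vals[i] ** 2 * h for i in range(n + 1) if in_D(xs[i]))
    # integral of u^2
    mass = sum(v * v * h for v in vals)
    return (grad + pot) / mass

def trial(x):
    """First Dirichlet eigenfunction of -d^2/dx^2 on (0, 1)."""
    return math.sin(math.pi * x)

r_free = rayleigh_quotient(trial, 0.0, lambda x: False)
r_edge = rayleigh_quotient(trial, 1.0, lambda x: x < 0.25 or x > 0.75)
r_mid = rayleigh_quotient(trial, 1.0, lambda x: 0.25 <= x <= 0.75)
```

With $\alpha = 0$ the quotient is close to $\pi^2$. For the same trial function and the same measure $|D| = 1/2$, placing $D$ where $u$ is small (near the boundary) gives a smaller quotient than placing it where $u$ is large, which is the mechanism behind the sublevel-set structure of optimal configurations.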

We first prove the following theorem on existence and basic properties of solutions. It is fundamental for all further considerations.

Theorem 1. For any $\alpha > 0$ and $A \in [0, |\Omega|]$ there exists an optimal pair. Moreover, any optimal pair $(u, D)$ has the following properties:
(a) $u \in C^{1,\delta}(\Omega) \cap H^2(\Omega) \cap C^\gamma(\overline{\Omega})$ for some $\gamma > 0$ and every $\delta < 1$.
(b) $D$ is a sublevel set of $u$, i.e. there is a number $t \ge 0$ such that $D = \{u \le t\}$.
(c) Every level set $\{u = s\}$, $s \ge 0$, has measure zero, except possibly in the case $\alpha = \alpha_\Omega(A)$, $s = t$.

Here we use the short notation $\{u = t\} = \{x : u(x) = t\}$. Since $\chi_D$ is discontinuous, solutions $u$ may not be twice differentiable, so Eq. (1) is understood in the weak sense. Note that Theorem 1(b) shows in particular that our problem is equivalent to finding the smallest eigenvalue and associated eigenfunctions of the nonlinear problem (with free variables $u$ and $t$)
\[ -\Delta u + \alpha\chi_{\{u \le t\}} u = \lambda u \ \text{on } \Omega, \qquad u = 0 \ \text{on } \partial\Omega, \qquad |\{u \le t\}| = A. \tag{4} \]

The question of uniqueness is much more subtle: For some domains there will be a unique optimal pair for all $\alpha$, $A$, while for others there will be many, for certain ranges of $\alpha$, $A$. This follows from our results on symmetry preservation and symmetry breaking below. We now list a few questions that naturally come to mind:

(SY) If $\Omega$ has symmetries, does $D$ have the same symmetries? (Note that if $\Omega$ and $D$ have a symmetry in common then $u$ will also have this symmetry since it is uniquely determined by $\Omega$ and $D$.)
(CX) Assume $\Omega$ is convex. Is $D^c := \Omega \setminus D$ convex? Is $D$ unique?
(CN) Is $D$ or $D^c$ connected?
(FB) What is the regularity of the free boundary $\partial D$?

We give partial answers to all of these questions. Some proofs, mainly relating to (FB), and additional results can be found in the companion paper [CGK]. Many open problems remain, see Sect. 6. At this point, the reader is invited to look at Figs. 1–3 for a first impression.

We now state our qualitative results. As a general convention, constants only depend on the quantities indicated as subscripts or in parentheses, unless otherwise specified. Often we suppress the subscript $\Omega$. First, as an easy consequence of Theorem 1 one has:

Theorem 2. Fix $\alpha > 0$, $A > 0$, and let $D$ be an optimal configuration.
(a) $D$ contains a tubular neighborhood of the boundary $\partial\Omega$.
(b) If $\alpha < \alpha_\Omega(A)$ then every connected component $D_0$ of the interior of $D$ hits the boundary, i.e. $\overline{D_0} \cap \partial\Omega \ne \emptyset$. In particular, if $\Omega$ is simply connected and $\alpha < \alpha_\Omega(A)$ then $D$ is connected.

The number $\alpha_\Omega(A)$ was defined above, see (3). The significance of the condition $\alpha < \alpha_\Omega(A)$ is that it is equivalent to $\Delta u < 0$ on $\Omega$. One always has $\alpha_\Omega(A) \ge \mu_\Omega$. Here and throughout the paper, $\mu_\Omega$ denotes the first eigenvalue of the Dirichlet Laplacian on $\Omega$, and $\psi_\Omega$ the positive, $L^2$-normalized eigenfunction:
\[ -\Delta\psi_\Omega = \mu_\Omega \psi_\Omega \ \text{on } \Omega, \qquad \psi_\Omega = 0 \ \text{on } \partial\Omega. \]


Next, we consider the dependence of $\Lambda$ and solutions $(u, D)$ on $\alpha$ and $A$. Here it is convenient to formulate our problem also for $\alpha = 0$, as follows: If $\alpha = 0$ then a solution (unique in this case) is a pair $(\psi_\Omega, D)$, where $D$ is the sublevel set of $\psi_\Omega$ of area $A$. (Since $\psi_\Omega$ is real analytic and non-constant, such $D$ exists for every $A$ and is unique.) We will prove strict monotonicity and Lipschitz continuity of $\Lambda$ in both parameters (Prop. 10). Continuous dependence of optimal pairs $(u, D)$ on the parameters may be expected only at parameter values where they are unique. This is the case, in particular, if $\alpha = 0$ or $A = 0$ or $A = |\Omega|$; in these cases $u = \psi_\Omega$, and the continuity is proved in [CGK]. Here we only state the results. They are used only in the proof of Theorem 9. For example, we have the following:

Theorem 3. For $s \ge 0$ let $[\Omega]_s = \{\psi_\Omega \le s\}$, where $\psi_\Omega$ is the positive $L^2$-normalized first eigenfunction of $-\Delta$ on $\Omega$. Fix $A \in [0, |\Omega|]$ and choose $t_\Omega$ such that $|[\Omega]_{t_\Omega}| = A$. Then for any $\delta > 0$ there is $\alpha_0 = \alpha_0(\delta, \Omega)$ such that whenever $\alpha < \alpha_0$ and $D$ is an optimal configuration for $(\alpha, A)$ then $|t - t_\Omega| < \delta$ and $[\Omega]_{t_\Omega - \delta} \subset D \subset [\Omega]_{t_\Omega + \delta}$.

We now address questions of symmetry. First, we prove symmetry preservation in the presence of convexity:

Theorem 4. Assume that the domain $\Omega$ is symmetric and convex with respect to the hyperplane $\{x_1 = 0\}$. In other words, for each $x' = (x_2, \dots, x_n)$ the set
\[ \{x_1 : (x_1, x') \in \Omega\} \tag{5} \]
is either empty or an interval of the form $(-c, c)$. Then for any solution $(u, D)$ both $u$ and $D$ are symmetric with respect to $\{x_1 = 0\}$, $D^c$ is convex with respect to $\{x_1 = 0\}$, and $u$ is decreasing in $x_1$ for $x_1 \ge 0$.

Fig. 1. Ellipse (with $\alpha = 1$, $A = 1$, $|\Omega| = 6.3$). Optimal configuration $D$ in black

For example, any solution in an elliptic region has a double reflection symmetry, see Fig. 1. The principal tool here is Steiner symmetrization. See [K2] for an overview on such methods. Theorem 4 easily implies the following uniqueness result (the only case where we can prove uniqueness!):

Corollary 5. Let $\Omega = \{|x| < 1\}$ be the ball. Then there is a unique optimal configuration $D$ for any $\alpha$, $A$, and $D$ is a shell region $D = \{x : r(A) < |x| < 1\}$.
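The inner radius $r(A)$ of the shell in Corollary 5 is fixed by the volume constraint $|D| = A$ alone. The following sketch works this out; the closed formula $r(A) = (1 - A/\omega_n)^{1/n}$, with $\omega_n$ the volume of the unit ball in $\mathbb{R}^n$, is an elementary consequence of the constraint and is not stated in the text, and the function names are chosen for this example.

```python
import math

def unit_ball_volume(n):
    """Volume of the unit ball in R^n: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

def shell_inner_radius(A, n=2):
    """Radius r with |{r < |x| < 1}| = A for the unit ball in R^n.

    Solves omega_n * (1 - r**n) = A for r.
    """
    omega_n = unit_ball_volume(n)
    if not 0 <= A <= omega_n:
        raise ValueError("A must lie in [0, |Omega|]")
    return max(0.0, 1.0 - A / omega_n) ** (1.0 / n)
```

For instance, in the unit disc ($n = 2$) a shell of area $A = 3\pi/4$ has inner radius $1/2$; the limiting cases $A = 0$ and $A = |\Omega|$ give $r = 1$ and $r = 0$.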


One of the most interesting phenomena studied in this paper is symmetry breaking for certain plane domains $\Omega$. That is, an optimal configuration $D$ may have less symmetry than $\Omega$. We will prove it for two types of domains: Thin annuli and dumbbells with narrow handle. An annulus has rotational symmetry, a dumbbell has a reflection symmetry.

Theorem 6. Fix $\alpha > 0$ and $\delta \in (0, 1)$. For $a > 0$ let $\Omega_a = \{x \in \mathbb{R}^2 : a < |x| < a + 1\}$. There exists $a_0 = a_0(\alpha, \delta)$ such that whenever $a > a_0$ and $D$ is an optimal configuration for $\Omega_a$ with parameters $\alpha$ and $A = \delta|\Omega_a|$ then $D$ is not rotationally symmetric.

Fig. 2. Symmetry breaking on annuli: The parameters are $\alpha$ (the "strength" of the non-linearity), $\delta = A/|\Omega|$ (the relative size of $D$), and $\tau = r_{out}/r_{in}$ (the ratio of outer and inner radius). Panels: (a) $\alpha = 1$, $\delta = 0.64$, $\tau = 3.5$; (b) thinner annulus ($\tau = 1.2$); (c) stronger non-linearity ($\alpha = 10$); (d) larger area of $D$ ($\delta = 0.83$). In each of b, c, d only one parameter is changed, compared with a. Optimal configuration $D$ in black

See Fig. 2a and b. For dumbbells we prove a little more than symmetry breaking:

Theorem 7. For $h \in (0, 1)$ define the dumbbell with handle width $2h$,
\[ \Omega_h = B_1(-2, 0) \cup ((-2, 2) \times (-h, h)) \cup B_1(2, 0), \]
where $B_r(p) = \{x \in \mathbb{R}^2 : |x - p| < r\}$. Fix $\alpha > 0$ and $A \in (0, 2\pi)$. Then there is $h_0 = h_0(\alpha, A) > 0$ such that we have for $h < h_0$:
(a) Any optimal pair $(u, D)$ is not symmetric with respect to the $x_2$-axis.
(b) If $A > \pi$ then for any optimal pair $(u, D)$ the complement $D^c$ is contained in one of the lobes (i.e. one of the balls $B_1(\pm 2, 0)$).

See Fig. 3 for part (a). In fact, similar results hold for more general dumbbells. As we remarked before, symmetry breaking implies non-uniqueness: For example for a dumbbell the pair $(u', D')$ obtained from a solution $(u, D)$ by reflection in the $x_2$-axis will be a solution, and different from $(u, D)$ by the theorem.

The following result on the regularity of the free boundary is proved in [CGK]:

Theorem 8. If $(u, D)$ is an optimal pair, $x \in \partial D$ and $\nabla u(x) \ne 0$ then $\partial D$ is a real analytic hypersurface near $x$.

Fig. 3. Symmetry breaking on dumbbells (panels: $h = 0.3$ and $h = 0.2$): The parameters are $\alpha = 0.1$, $A = 1$. Symmetry breaking occurs when the width of the handle ($2h$) is decreased. The "lobes" are unit circles with centers 4 units apart. Optimal configuration $D$ in black

The difficulty is that $\chi_D$ is discontinuous at $x \in \partial D$, so $u$ is not even $C^2$ there. That the level set $\{u = t\}$ has $C^\omega$ regularity nevertheless is proved by introduction of suitable local coordinates (with $u$ as one coordinate) and analysis of the resulting nonlinear elliptic equation. Similar arguments and continuity considerations for $\alpha$ near zero allow us to give partial answers to problems (CX) and (FB):

Theorem 9. Suppose $\Omega$ is convex and has a $C^2$ boundary. Then there is $\alpha_0(A, \Omega) > 0$ such that for any $\alpha < \alpha_0$ and any optimal configuration $D$, one has:
(a) $\partial D \cap \Omega$ is real analytic;
(b) $D^c$ is convex.

Problem (P) and generalizations of it (to higher eigenvalues and to a maximization problem), but with fewer qualitative results, were studied before in [Kr, CM], and [C] (where Theorem 4 is stated, but the proof is incomplete since the case of equality in the rearrangement inequalities is not addressed). Problems similar to problem (M) (e.g. with $L^p$ potentials) were considered in [AH, Eg, AHS, CL], and [HKK].

The paper is organized as follows: In Sect. 2, we prove Theorems 1 and 2 and discuss the parameter dependence of $\Lambda$. Also, in Subsect. 2.3 we discuss the relation of problems (P) and (M). In Sect. 3 we prove Theorems 4, 6, and 7 on symmetry questions, and Corollary 5. In Sect. 4 we prove Theorem 9. In Sect. 5 we describe the numerical algorithm used. In Sect. 6 we state some open problems and conjectures. Finally, we collect some standard facts about elliptic PDEs in the Appendix.


2. Basic Results

2.1. Existence and regularity.

Proof of Theorem 1. We first prove existence and regularity: The regularity statements in (a) hold for solutions of equations $-\Delta u + \rho u = 0$ with $\rho$ bounded by standard elliptic theory, see for example [GT, Theorem 8.29 and Corollary 8.36].

To prove existence, fix $\alpha$ and $A$, and write $\Lambda = \Lambda_\Omega(\alpha, A)$, $\lambda(D) = \lambda_\Omega(\alpha, D)$ for simplicity. Let $D_j$ be a minimizing sequence, i.e. $\lambda(D_j) \to \Lambda$ as $j \to \infty$. Let $u_j \in H_0^1$ (all function spaces are defined on $\Omega$) be the positive $L^2$-normalized first eigenfunction of $-\Delta + \alpha\chi_{D_j}$. Since $\lambda(D_j)$ is bounded, the sequence $\{u_j\}$ is bounded in $H_0^1$. Also, $\{\chi_{D_j}\}$ is bounded in $L^2$. Therefore, we may choose a subsequence (again denoted $u_j$, $D_j$) and $u \in H_0^1$, $\eta \in L^2$ such that $u_j \rightharpoonup u$ in $H_0^1$ (weak convergence) and $\chi_{D_j} \rightharpoonup \eta$ in $L^2$. This implies $u_j \to u$ (strongly) in $L^2$, $\chi_{D_j} u_j \rightharpoonup \eta u$ in $L^2$, and $\int_\Omega \eta = A$. Now taking limits in the weak form of the eigenvalue equation
\[ \int_\Omega \nabla u_j \cdot \nabla\psi + \alpha \int_\Omega \chi_{D_j} u_j \psi = \lambda(D_j) \int_\Omega u_j \psi \quad \forall \psi \in H_0^1 \]
we get
\[ -\Delta u + \alpha\eta u = \Lambda u \quad \text{(weakly)}. \tag{6} \]
We have
\[ 0 \le \eta \le 1 \quad \text{a.e.}, \]
since $0 \le \chi_{D_j} \le 1$ for all $j$ and weak convergence preserves pointwise inequalities a.e. (exercise!). Therefore, $u$ has the regularity stated in (a). It remains to prove that $\eta$ may be replaced by a characteristic function. Since $\int_\Omega u^2 = 1$, (6) shows that
\[ \int_\Omega |\nabla u|^2 + \alpha \int_\Omega \eta u^2 = \Lambda. \tag{7} \]
Now the minimization problem
\[ \inf_{\eta :\ \int_\Omega \eta = A,\ 0 \le \eta \le 1} \int_\Omega \eta u^2 \]
has a solution $\eta = \chi_D$, where $D$ is any set with $|D| = A$ and
\[ \{u < t\} \subset D \subset \{u \le t\}, \qquad t := \sup\{s : |\{u < s\}| < A\} \tag{8} \]
(compare the "bathtub principle", Theorem 1.18 in [LL]). Therefore, we get from (7)
\[ \int_\Omega |\nabla u|^2 + \alpha \int_\Omega \chi_D u^2 \le \Lambda. \]
By definition of $\Lambda$ as a minimum, this must actually be an equality, and $(u, D)$ is a solution.
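The "bathtub" step admits a simple discrete analogue, sketched here under the assumption that $\Omega$ is replaced by finitely many cells of equal measure, so that the constraint $|D| = A$ becomes a constraint on the number `k` of selected cells. The function names are chosen for this example only.

```python
import itertools

def bathtub_set(u_vals, k):
    """Indices of the k cells with the smallest values of u.

    Discrete analogue of choosing D with {u < t} subset D subset {u <= t}.
    """
    order = sorted(range(len(u_vals)), key=lambda i: u_vals[i])
    return set(order[:k])

def weighted_mass(u_vals, cells):
    """Discrete analogue of the integral of u^2 over D (unit cell measure)."""
    return sum(u_vals[i] ** 2 for i in cells)

def brute_force_minimum(u_vals, k):
    """Minimum of weighted_mass over all sets of exactly k cells."""
    return min(
        weighted_mass(u_vals, cells)
        for cells in itertools.combinations(range(len(u_vals)), k)
    )

u_vals = [0.9, 0.1, 0.5, 0.3, 0.7]   # sample nonnegative values of u on 5 cells
D = bathtub_set(u_vals, 2)
```

For nonnegative $u$, selecting the cells where $u$ is smallest attains the minimum of $\sum_{D} u^2$ among all selections of the same size, which is what the comparison with the exhaustive search confirms on this small sample.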


(b) Let $(u, D)$ be any solution. Then it is obvious that (8) must hold (always up to a set of measure zero; if (8) didn't hold then one could reduce $\int_D u^2$ by shifting a part of $D$ from $\{u > t\}$ to $\{u \le t\}$). Set $N_s = \{u = s\}$ for any $s > 0$. Using Lemma 7.7 from [GT] twice, we see that $\Delta u = 0$ a.e. on $N_s$ (since $u \equiv \mathrm{const}$ on $N_s$; recall that $u$ is in $H^2$). Therefore,
\[ (\Lambda - \alpha\chi_D) u = 0 \quad \text{a.e. on } N_s. \tag{9} \]
Since $u > 0$ and $\Lambda > 0$, this shows that $D^c \cap N_s$ has measure zero. Taking $s = t$ we get (b).

(c) If $s > t$ then $N_s \subset D^c$, so $|N_s| = 0$ by (9). The same argument works if $s = t$ and $\alpha \ne \Lambda$. Finally, $u$ satisfies $-\Delta u = (\Lambda - \alpha) u$ on the open set $\{u < t\}$, hence $u$ is real analytic there, and therefore the level sets $N_s$ have measure zero for $s < t$.

Proof of Theorem 2. Part (a) is clear from Theorem 1(b). To prove (b), assume this was false. Then there is an open subset $D_0 \subset \{u \le t\}$ with $\partial D_0 \subset D^c = \{u \ge t\}$ and therefore $u = t$ on $\partial D_0$. Then $u$ assumes a minimum at some $x_0 \in D_0$. But this is a contradiction since $\alpha < \alpha_\Omega(A)$ implies $\Lambda(\alpha, A) > \alpha$ (see Proposition 10 below) and therefore $\Delta u = (\alpha - \Lambda(\alpha, A)) u < 0$ on $D_0$.

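2.2. Parameter dependence of $\Lambda$.

Proposition 10. (a) The function $(\alpha, A) \mapsto \Lambda(\alpha, A)$ is Lipschitz continuous, uniformly on bounded sets. More precisely, we have, for any $\alpha, \alpha' \ge 0$, $A, A' \in [0, |\Omega|]$,

$$|\Lambda(\alpha, A) - \Lambda(\alpha', A')| \le |\alpha - \alpha'|\,\frac{\max\{A, A'\}}{|\Omega|} + |A - A'|\,\min\{\alpha, \alpha'\}\,C_{\Omega,\max\{\alpha,\alpha'\}} \tag{10}$$

with $C_{\Omega,\alpha}$ bounded for $\alpha$ bounded.

(b) $\Lambda(\alpha, A)$ is strictly increasing in $A$ for fixed $\alpha > 0$, strictly increasing in $\alpha$ for fixed $A > 0$, and $\Lambda(\alpha, A) - \alpha$ is strictly decreasing in $\alpha$ for fixed $A < |\Omega|$.

(c) If $A < |\Omega|$ then there is a unique value $\alpha = \bar\alpha_\Omega(A)$ with

$$\Lambda(\bar\alpha_\Omega(A), A) = \bar\alpha_\Omega(A). \tag{11}$$

The function $\bar\alpha_\Omega$ is continuous and strictly increasing, $\bar\alpha_\Omega(0) = \mu_\Omega$ and $\bar\alpha_\Omega(A) \to \infty$ as $A \to |\Omega|$.

Proof. (a) Write $\Lambda = \Lambda(\alpha, A)$ and $\Lambda' = \Lambda(\alpha', A')$, and let $(u, D)$, $(u', D')$ be minimizers for $\Lambda$, $\Lambda'$ respectively. We may assume $\int_\Omega u^2 = \int_\Omega (u')^2 = 1$, so that

$$\Lambda = \int_\Omega |\nabla u|^2 + \alpha \int_D u^2, \qquad |D| = A,$$

and similarly for $\Lambda'$ etc. By symmetry of (10) we may assume that $A' \ge A$. Choose $D_1 \subset D'$ with $|D_1| = A$ and $D_1' \supset D$ with $|D_1'| = A'$. Here we may assume that $D_1$ is of the form $\{u' \le s\}$ for a suitable number $s$. Using the optimality of $(u, D)$ for $\Lambda$ we get

$$\Lambda \le \int_\Omega |\nabla u'|^2 + \alpha \int_{D_1} (u')^2 = \Lambda' + (\alpha - \alpha') \int_{D_1} (u')^2 - \alpha' \int_{D' \setminus D_1} (u')^2. \tag{12}$$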

Symmetry Breaking for Composite Membranes


Similarly, using the optimality of $(u', D')$ for $\Lambda'$ we get

$$\Lambda' \le \int_\Omega |\nabla u|^2 + \alpha' \int_{D_1'} u^2 = \Lambda + (\alpha' - \alpha) \int_D u^2 + \alpha' \int_{D_1' \setminus D} u^2. \tag{13}$$

Alternatively, we may rewrite this as

$$\Lambda' \le \Lambda + (\alpha' - \alpha) \int_{D_1'} u^2 + \alpha \int_{D_1' \setminus D} u^2. \tag{13'}$$

In order to estimate the integrals in (12), (13) and (13') which are multiplied by $\pm(\alpha - \alpha')$, observe that for any $s > 0$ and any function $u$ we have

$$\frac{\int_{\{u \le s\}} u^2}{\int_\Omega u^2} \le \frac{|\{u \le s\}|}{|\Omega|}.$$

The other integrals are estimated using the uniform estimate (47): $u$ solves the equation $-\Delta u + \alpha\chi_D u = \Lambda u$. $\Lambda$ is bounded in terms of $\Omega$ and $\alpha$ since one may apply (12) with $\alpha' = 0$, $A' = A$, to obtain $\Lambda \le \mu_\Omega + \alpha$. Therefore, the uniform bound (47), applied to $G = \Omega$, yields

$$\int_{D_1' \setminus D} u^2 \le (A' - A) \sup u^2 \le (A' - A)\,C_{\Omega,\alpha}.$$

Finally, we obtain (10) by applying these estimates to (12) and (13) in the case $\alpha \le \alpha'$, and to (12) and (13') if $\alpha \ge \alpha'$.

(b) This follows immediately from (12) and the unique continuation theorem.

(c) This follows easily from (a) and (b) since $\Lambda(\alpha, A) - \alpha$ equals $\mu_\Omega > 0$ for $\alpha = 0$ and tends to $-\infty$ as $\alpha \to \infty$ by (a).

We now consider continuous dependence of optimal pairs $(u, D)$ on the data. First, near $\alpha = 0$:

Proposition 11. Fix $D \subset \Omega$. Let $u_{\alpha,D}$ be the (positive, $L^2$-normalized) first eigenfunction of $-\Delta + \alpha\chi_D$, and $\psi = u_{0,D}$ the first eigenfunction of $-\Delta$. Then there is a constant $C = C_\Omega$ such that, for $0 \le \alpha \le 1$,

$$\|u_{\alpha,D} - \psi\| \le C\alpha,$$

in the $H^2(\Omega)$ and $L^\infty(\Omega)$ norms, and in $C^{1,\delta}(\bar\Omega)$ if $\partial\Omega$ is in $C^{1,\delta}$.

Proof. See [CGK].

Proof of Theorem 3. This is almost immediate from Proposition 11, see [CGK].

Similarly, one has continuity in $A$ at $A = 0$ and at $A = |\Omega|$. Here we only consider the latter case:

Proposition 12. Let $\Omega$ be a smooth bounded domain and fix $\alpha > 0$. Let $M = \max_\Omega \psi$. Then, for any $\delta > 0$ there is $A_0 = A_0(\delta, \alpha, \Omega) < |\Omega|$ such that whenever $A > A_0$ and $D$ is an optimal configuration for $\Lambda(\alpha, A)$ then $D^c \subset \{\psi > M - \delta\}$.

Proof. See [CGK].


2.3. Relation of problems (P) and (M).

We want to show that problem (P) (see Sect. 1) is a special case of problem (M). The mathematical formulation of problem (P) is: Given $0 \le h < H$ (lower and upper bounds for the densities of the materials that are available) and the prescribed total mass $M \in [h|\Omega|, H|\Omega|]$, $M > 0$, consider measurable “density functions” $\rho$ satisfying

$$h \le \rho \le H, \qquad \int_\Omega \rho = M.$$

Then the objective is to find $\rho$ and $u$ which realize the minimum in

$$\Theta(h, H, M) := \inf_\rho \inf_{u \in H_0^1(\Omega)} \frac{\int_\Omega |\nabla u|^2}{\int_\Omega \rho u^2}. \tag{14}$$

The corresponding eigenvalue problem is

$$-\Delta u = \Theta\rho u, \qquad u|_{\partial\Omega} = 0. \tag{15}$$

(We assume the modulus of elasticity to be the same for all materials.) Problem (P) and problem (M) are related in the following way:

Theorem 13. (a) If $(u, \rho)$ is a minimizer for problem (P) then $\rho$ is of the form $\rho_D = h\chi_D + H\chi_{D^c}$ for a set $D$ of the form $D = \{u \le t\}$. That is, only two types of materials occur.

(b) The pair $(u, \rho_D)$ is a minimizer for problem (P), with parameter values $(h, H, M)$, if and only if $(u, D)$ is a minimizer (optimal pair) for problem (M), with parameter values $(\alpha, A)$ given by

$$\alpha = (H - h)\,\Theta(h, H, M), \tag{16}$$

$$A = \frac{H|\Omega| - M}{H - h}. \tag{17}$$

The minimal eigenvalues are related by

$$\Lambda(\alpha, A) = H\,\Theta(h, H, M). \tag{18}$$

(c) The values of $(\alpha, A)$ that occur when $h, H, M$ vary are precisely those satisfying

$$A \in [0, |\Omega|),\ 0 < \alpha \le \bar\alpha_\Omega(A) \qquad \text{or} \qquad A = |\Omega|,\ 0 < \alpha < \infty,$$

where $\bar\alpha_\Omega(A)$ is defined in (11). In particular, $\alpha = \bar\alpha_\Omega(A)$ corresponds to $h = 0$.

Note that problem (P) really depends on two parameters only since for $\kappa > 0$ one has $\Theta(\kappa h, \kappa H, \kappa M) = \kappa^{-1}\,\Theta(h, H, M)$, with the same minimizers (up to a factor $\kappa$ for $\rho$). This is obvious from (14).
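The change of parameters in Theorem 13(b) is easy to mechanize. The sketch below is illustrative only: the function names are hypothetical, `omega_area` stands for the area of the domain, and `theta` stands for the minimal eigenvalue of problem (P), which has to be supplied from elsewhere since the formulas (16)-(18) do not compute it.

```python
def membrane_parameters(h, H, M, omega_area, theta):
    """Map problem (P) data (h, H, M) to problem (M) data via (16), (17), (18).

    theta is the minimal eigenvalue of problem (P); it is an input here,
    not something this helper computes.
    """
    if not (0 <= h < H):
        raise ValueError("need 0 <= h < H")
    if not (h * omega_area <= M <= H * omega_area and M > 0):
        raise ValueError("need M in [h*|Omega|, H*|Omega|] and M > 0")
    alpha = (H - h) * theta                 # (16)
    A = (H * omega_area - M) / (H - h)      # (17)
    lam = H * theta                         # (18)
    return alpha, A, lam

def total_mass(h, H, A, omega_area):
    """Mass of the two-material density h*chi_D + H*chi_{D^c} when |D| = A."""
    return A * h + (omega_area - A) * H

# Illustrative numbers only.
alpha, A, lam = membrane_parameters(h=1.0, H=3.0, M=5.0, omega_area=2.0, theta=4.0)
```

Two consistency checks follow directly from the formulas: `total_mass(h, H, A, omega_area)` recovers the prescribed mass M, and `lam - alpha` equals `h * theta`.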


Proof. (a) This is almost obvious from (14), and proved just like part (b) of Theorem 1.

(b) First, if $\rho = \rho_D$ and $|D| = A$ then $M = \int_\Omega \rho = Ah + (|\Omega| - A)H$, which gives (17). Simple manipulation shows that

$$-\Delta u = \Theta\rho_D u = \Theta(h\chi_D + H\chi_{D^c})u \tag{19}$$

is equivalent to

$$-\Delta u + (H - h)\Theta\chi_D u = H\Theta u. \tag{20}$$

Now if $(u, \rho_D)$ is a minimizer for problem (P) then it satisfies (19) with $\Theta = \Theta(h, H, M)$, and then (20) shows that $\Lambda(\alpha, A) \le H\,\Theta(h, H, M)$ with $\alpha$ satisfying (16). Conversely, if $(u, D)$ is a minimizer for problem (M) with parameter values $(\alpha, A)$ given by (16), (17) then (20) holds with $H\Theta$ replaced by $\Lambda = \Lambda(\alpha, A)$, so instead of (19) we get $-\Delta u = \Theta\rho_D u + (\Lambda - H\Theta)u$, where $\Theta = \Theta(h, H, M)$. Multiplying by $u$ and integrating gives

$$\int_\Omega |\nabla u|^2 = \Theta \int_\Omega \rho_D u^2 + (\Lambda - H\Theta) \int_\Omega u^2.$$

Now the definition of $\Theta$ implies that

$$\int_\Omega |\nabla u|^2 \ge \Theta \int_\Omega \rho_D u^2,$$

so we get $\Lambda \ge H\Theta$. This proves $\Lambda(\alpha, A) = H\,\Theta(h, H, M)$ and part (b).

(c) If $A = |\Omega|$ then $D = \Omega$, $\rho \equiv h$ and therefore $h\,\Theta(h, H, M) = \mu_\Omega$ from (15), so $\alpha = \frac{H - h}{h}\mu_\Omega$ can take any positive value by suitable choice of $h$ and $H$. Now let $A < |\Omega|$. By Prop. 10(b) and (c), $\alpha$ varies in the indicated range precisely when $\Lambda(\alpha, A) - \alpha$ varies in $[0, \mu_\Omega)$. From (16) and (18) one has

$$\Lambda(\alpha, A) - \alpha = h\Theta := h\,\Theta(h, H, M),$$

so we only need to show that $h\Theta$ has range $[0, \mu_\Omega)$ (with $A$ fixed). First, $h\Theta \ge 0$ by definition, and $h\Theta = \Lambda - \alpha < \mu_\Omega$ by Prop. 10, since $\alpha = (H - h)\Theta > 0$, so the range of $h\Theta$ is contained in $[0, \mu_\Omega)$. Next, $h\Theta = 0$ for $h = 0$ (and then $M$ can be adjusted to $A$), and in the limit $H = h$ one has $\rho \equiv h$ and $h\Theta = \mu_\Omega$, so when $H \to h$ then $h\Theta \to \mu_\Omega$, and clearly $M$ can be adjusted to $A$. Using continuity of $h\Theta$ (which is proved as for $\Lambda$ in Prop. 10) we get the claim.

3. Symmetry Preservation and Symmetry Breaking

3.1. Symmetry preservation in the presence of convexity.

Here we prove Theorem 4.

Proof of Theorem 4. We use Steiner symmetrization (symmetrically decreasing rearrangement) $u \mapsto u^\#$ with respect to the hyperplane $\{x_1 = 0\}$. This is defined as follows. Assume $u \in H_0^1(\Omega) \cap C^0(\bar\Omega)$: For each $x'$, $u^\#(\cdot, x')$ is the unique function of $x_1$ which is symmetric in $x_1$ and decreasing for $x_1 \ge 0$ such that

$$|\{x_1 : u^\#(x_1, x') > t\}| = |\{x_1 : u(x_1, x') > t\}| \quad \text{for all } t \in \mathbb{R}.$$

It is well-known (see, e.g., [LL, AB]) that, for all $x'$ and $i = 1, \ldots, n$, with integrals taken over the set (5),

$$\int |\partial_{x_i} u^\#|^2\,dx_1 \le \int |\partial_{x_i} u|^2\,dx_1, \tag{21}$$

$$\int (u^\#)^2\,dx_1 = \int u^2\,dx_1, \tag{22}$$

$$\int (\alpha\chi_D)_\# (u^\#)^2\,dx_1 \le \int \alpha\chi_D u^2\,dx_1. \tag{23}$$


Here, $f_\#$ is the increasing symmetric rearrangement of a function $f$, which is defined by $f_\# = -(-f)^\#$. Note that (21) for $i = 1$ is just the standard rearrangement inequality in one dimension, while for $i > 1$ it is proved as follows: Replace the partial derivatives by difference quotients $(v_\varepsilon(x_1) - v_0(x_1))/\varepsilon$ with $v_\varepsilon(x_1) = u(x_1, \ldots, x_i + \varepsilon, \ldots)$. After multiplication by $\varepsilon^2$ the claimed inequality becomes simply $\int |v_\varepsilon^\# - v_0^\#|^2\,dx_1 \le \int |v_\varepsilon - v_0|^2\,dx_1$, which is well-known.

Fix $\alpha$ and $A$ and assume $(u, D)$ is an optimal pair. Define the set $D^\#$ by $\chi_{D^\#} = (\chi_D)_\#$. Integrating (21), (22) and (23) over $x'$ and summing (21) over $i$ we get

$$\lambda(\alpha, D^\#) \le \frac{\int_\Omega |\nabla u^\#|^2\,dx + \int_\Omega (\alpha\chi_D)_\# (u^\#)^2\,dx}{\int_\Omega (u^\#)^2\,dx} \le \frac{\int_\Omega |\nabla u|^2\,dx + \int_\Omega \alpha\chi_D u^2\,dx}{\int_\Omega u^2\,dx} = \lambda(\alpha, D). \tag{24}$$

Since we have $|D^\#| = |D| = A$ (by (22) applied to $\chi_D$), optimality of $(u, D)$ implies that $(u^\#, D^\#)$ is also a minimizer and that equality holds in (21) and (23), for all $i$ and almost all $x'$. We need to show that this implies $u = u^\#$. The statements about $D$ then follow from the characterization $D = \{u \le t\}$.

First note that since $(u^\#, D^\#)$ is a minimizer, the function $u^\#$ solves the equation $-\Delta u^\# + \alpha\chi_{D^\#} u^\# = \lambda(\alpha, D^\#) u^\#$. Therefore, $u$ and $u^\#$ are continuously differentiable by Theorem 1, so equality in (21) holds for all $x'$. By a result of Brothers and Ziemer (see [BZ]) this equality implies $u^\#(x_1, x') = u(x_1, x')$ for all $x_1$ provided the set $\{x_1 : \partial_{x_1} u^\#(x_1, x') = 0\}$ has measure zero. Therefore, we will be done once we have shown that

$$\text{the set } \{v = 0\} \text{ has measure zero, where } v = \partial_{x_1} u^\#. \tag{*}$$

We will give two proofs of this: The first proof works whenever $\alpha \ne \bar\alpha_\Omega(A)$ and the second proof works whenever $\alpha \le \bar\alpha_\Omega(A)$, so together they cover all cases.

First proof of (*), assuming $\alpha \ne \bar\alpha_\Omega(A)$: Assume this was not so. Define $t^\#$ by $D^\# = \{u^\# \le t^\#\}$. $v$ satisfies $-\Delta v + \alpha\chi_{D^\#} v = \lambda(\alpha, D^\#) v$ on $\{u^\# \ne t^\#\}$. Since $\{u^\# = t^\#\}$ has measure zero by Theorem 1 and the assumption $\alpha \ne \bar\alpha_\Omega(A)$, $v$ vanishes on a set of positive measure in the open set $\{u^\# \ne t^\#\}$, so the unique continuation theorem (for sets of positive measure, see [FG]) applied to $v$ implies that $v \equiv 0$ on some connected component $K$ of $\{u^\# \ne t^\#\}$. Therefore, $u^\#$ is constant in the $x_1$-direction on $K$. Since $u^\# = 0$ or $t^\#$ on $\partial K$ we conclude that then $u^\#$ must actually be constant on $K$. This is a contradiction to Theorem 1(c).

Second proof of (*), assuming $\alpha \le \bar\alpha_\Omega(A)$ (this proof is taken from Cox [C]): We show that actually $v < 0$ for $x_1 > 0$, so that $\{v = 0\}$ is contained in the hyperplane $\{x_1 = 0\}$. We have $-\Delta u^\# = \Lambda(\alpha, A) u^\# - \alpha\chi_{D^\#} u^\#$, and the right-hand side is decreasing in $x_1$ (for $x_1 > 0$) by definition of the rearrangement and since $\alpha \le \Lambda(\alpha, A)$ by Prop. 10. Taking the $x_1$-derivative (in the sense of distributions), we get $\Delta v \ge 0$ as distribution. Also, $v$ is continuous, so by the classical theory of subharmonic functions it satisfies the maximum principle (alternatively, it is in $H^1$ and then the maximum principle as in [GT], Ch. 8, applies). Since $v \le 0$, we conclude that $v < 0$ unless $v$ vanishes identically in $x_1 > 0$, which is clearly impossible. This proves (*).

This concludes the proof that $u = u^\#$ and hence the proof of the theorem. Note that in the case $\alpha \le \bar\alpha_\Omega(A)$ the second proof of (*) above actually shows that $u_{x_1} < 0$ for $x_1 > 0$.
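The one-dimensional rearrangement that Steiner symmetrization applies on each line parallel to the $x_1$-axis can be sketched on a uniform grid. This is an illustration under simplifying assumptions (finitely many sample values, a hypothetical function name), not the construction used in the proof: the samples are sorted and laid out so that they decrease away from the centre, which preserves the distribution of values and hence the discrete analogue of (22).

```python
import numpy as np

def symmetric_decreasing_rearrangement(values):
    """Rearrange samples so that they are largest at the centre and
    non-increasing towards both ends (discrete analogue of u -> u^#)."""
    v = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    n = len(v)
    out = np.empty(n)
    center = (n - 1) // 2
    for rank, val in enumerate(v):
        step = (rank + 1) // 2
        # ranks 0, 1, 2, 3, ... go to offsets 0, +1, -1, +2, -2, ...
        pos = center + step if rank % 2 == 1 else center - step
        out[pos] = val
    return out

def dirichlet_energy(values):
    """Sum of squared differences of neighbouring samples."""
    return float(np.sum(np.diff(values) ** 2))

u = np.array([0.2, 0.9, 0.1, 0.5, 0.7])
u_sharp = symmetric_decreasing_rearrangement(u)
```

For this sample the rearranged values have the same sum of squares as the original ones and a smaller discrete Dirichlet energy, mirroring (22) and (21); the sketch makes no claim about general discrete versions of (21).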


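Proof of Corollary 5. The only set $D \subset \{|x| < 1\}$ which has the symmetry and convexity properties stated in Theorem 4 in all directions is a shell region as stated. Clearly, $r(A)$ is uniquely determined by $A$. Therefore, $D$ is unique.

3.2. Symmetry breaking on annuli.

We now give the proof of Theorem 6 about symmetry breaking on an annulus,

$$\Omega = \Omega_a = \{x \in \mathbb{R}^2;\ a < |x| < a + 1\}, \qquad a > 0.$$

Let $D$ be any radial set in $\Omega$,

$$D = \{(r, \theta);\ r \in D_1,\ 0 \le \theta < 2\pi\}, \qquad D_1 \subset (a, a + 1),$$

and let $u$ be the first eigenfunction for $D$, with eigenvalue $\sigma$:

$$-\Delta u + \alpha\chi_D u = \sigma u \quad \text{on } \Omega, \qquad u|_{\partial\Omega} = 0. \tag{25}$$

For $a$ sufficiently large (depending on $\alpha$ and $\delta = |D|/|\Omega|$) we will construct a comparison domain $\tilde D$ and a function $\tilde u$ which satisfy

$$\frac{\int_{\Omega_a} |\nabla\tilde u|^2 + \alpha \int_{\Omega_a} \chi_{\tilde D}\,\tilde u^2}{\int_{\Omega_a} \tilde u^2} \overset{!}{<} \sigma. \tag{26}$$

This shows that $D$ is not an optimal configuration and hence implies the theorem.

In order to construct $\tilde D$ and $\tilde u$, first pick $N = N(\delta)$ with

$$\delta < 1 - \frac{1}{2N}$$

and consider the sector $E_+ = \Omega_a \cap \{(r, \theta);\ 0 \le \theta \le \pi/N\}$. Then let $\tilde u$ be the first Dirichlet eigenfunction of the Laplacian on $E_+$ and $\lambda_1(E_+)$ be the first eigenvalue,

$$-\Delta\tilde u = \lambda_1(E_+)\tilde u \quad \text{on } E_+, \qquad \tilde u|_{\partial E_+} = 0, \tag{27}$$

extended by zero on $\Omega \setminus E_+$; the set $\tilde D$ can be taken to be any subset of $\Omega \setminus E_+$ with $|\tilde D| = |D|$. This is possible since $|D|/|\Omega| = \delta < 1 - \frac{1}{2N} = |\Omega \setminus E_+|/|\Omega|$. Note that since $\operatorname{supp}\tilde u \cap \tilde D = \emptyset$, we have

$$\frac{\int_{\Omega_a} |\nabla\tilde u|^2 + \alpha \int_{\Omega_a} \chi_{\tilde D}\,\tilde u^2}{\int_{\Omega_a} \tilde u^2} = \frac{\int_{E_+} |\nabla\tilde u|^2}{\int_{E_+} \tilde u^2} = \lambda_1(E_+),$$

so (26) is equivalent to

$$\lambda_1(E_+) \overset{!}{<} \sigma. \tag{28}$$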
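The choice of the integer $N = N(\delta)$ in this construction is elementary. A minimal sketch (the function name is hypothetical, and taking the smallest admissible $N$ is a convenience, not something the text requires): the sector with opening angle $\pi/N$ occupies the fraction $1/(2N)$ of the annulus, so $N$ must satisfy $\delta < 1 - 1/(2N)$.

```python
def choose_sector_count(delta):
    """Smallest integer N >= 1 with delta < 1 - 1/(2N), for 0 <= delta < 1."""
    if not (0.0 <= delta < 1.0):
        raise ValueError("delta must lie in [0, 1)")
    N = 1
    while not (delta < 1.0 - 1.0 / (2 * N)):
        N += 1
    return N

# Area fraction left over for the comparison set, outside the sector E_+.
N = choose_sector_count(0.6)
free_fraction = 1.0 - 1.0 / (2 * N)
```

Any larger $N$ works as well; a larger $N$ only makes the sector thinner.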

In order to prove this, we need to introduce a third eigenvalue problem, which is intermediate between (25) and (27). Define $v$ to be the lowest eigenfunction for the problem (25) among functions of the form

$$v(r, \theta) = h(r) \sin N\theta,$$

and let $\tau$ be the associated eigenvalue. Note that problem (25) for such functions is equivalent to the problem

$$-h''(r) - \frac{1}{r} h'(r) + \frac{N^2}{r^2} h(r) + \alpha\chi_{D_1}(r) h(r) = \tau h(r) \quad \text{on } [a, a + 1], \tag{29}$$

$$h(a) = h(a + 1) = 0 \tag{30}$$

for $h$. Thus, $h$ is the first eigenfunction of this Sturm-Liouville problem, and the eigenvalue $\tau$ is characterized by

$$\tau = \inf_{g \in S} \frac{\int_a^{a+1} \left((g')^2 + \left(\alpha\chi_{D_1} + \frac{N^2}{r^2}\right) g^2\right) r\,dr}{\int_a^{a+1} g^2\,r\,dr}, \tag{31}$$

where $S = \{g \in C^1[a, a + 1];\ g(a) = g(a + 1) = 0\}$. From this the (well-known) fact that $h$ does not change sign on $[a, a + 1]$ is evident; so we may assume $h \ge 0$. We will compare $u$ with $v$ and $v$ with $\tilde u$. The following two lemmas provide the needed estimates.

Lemma 14. Let $\sigma$ be the lowest eigenvalue for the problem (25) (with $D$ radial) on $\Omega_{a,b} = \{x \in \mathbb{R}^2 : a < |x| < b\}$, and let $\tau$ be the lowest eigenvalue for eigenfunctions of the form $v(r, \theta) = h(r) \sin N\theta$ on $\Omega_{a,b}$. Then we have $\tau - \sigma \le N^2/a^2$.

Proof. Since $\chi_D$ is assumed radial, the first eigenfunction of (25) is a radial function $u = f(r)$. Now consider the trial function $w(r, \theta) = f(r) \sin N\theta$. We have

$$\tau \le \frac{\int_{\Omega_{a,b}} (|\nabla w|^2 + \alpha\chi_D w^2)\,dx}{\int_{\Omega_{a,b}} w^2\,dx}.$$

Thus,

$$\tau \le \frac{\int_a^b \left((f'(r))^2 + \frac{N^2}{r^2} f(r)^2 + \alpha\chi_{D_1} f(r)^2\right) r\,dr}{\int_a^b f(r)^2\,r\,dr}.$$

By definition of $f(r)$ we get

$$\tau \le \sigma + \frac{\int_a^b \frac{N^2}{r^2} f(r)^2\,r\,dr}{\int_a^b f(r)^2\,r\,dr} \le \sigma + N^2/a^2.$$

The claim follows.

Lemma 15. Define $v$ as above. Assume $D$ is radial and $|D|/|\Omega| = \delta$. There exists a positive constant $c_{\alpha,\delta}$, independent of $a$, such that for all $a \ge 1$ we have

$$\frac{\int_D v^2\,dx}{\int_\Omega v^2\,dx} \ge c_{\alpha,\delta}.$$


Proof. We see from $v(r, \theta) = h(r) \sin N\theta$ that

$$\frac{\int_D v^2\,dx}{\int_\Omega v^2\,dx} = \frac{\int_a^{a+1} \chi_{D_1}(r) h(r)^2\,r\,dr}{\int_a^{a+1} h(r)^2\,r\,dr}. \tag{32}$$

$h$ satisfies Eq. (29). For $\tau$ one has a uniform bound $\tau \le C_{\alpha,\delta}$ with $C_{\alpha,\delta}$ independent of $a \ge 1$, because from (31) one gets

$$\tau \le \inf_{g \in S} \frac{\int_a^{a+1} (g')^2\,r\,dr}{\int_a^{a+1} g^2\,r\,dr} + \alpha + N^2,$$

and by using for $g$ the translate of any fixed test function on $[0, 1]$ one sees that the first term on the right is bounded by some absolute constant. Therefore, the coefficients of Eq. (29) are uniformly bounded for $a \ge 1$. Also, we have $h \ge 0$. Lemma 16 in Sect. 6 then implies that one has

$$\inf_{[a+\delta/4,\ a+1-\delta/4]} h \ge c_{\alpha,\delta}\,\|h\|_{L^2(a,a+1)}. \tag{33}$$

Since $|D_1| = \delta$, we have $|[a + \delta/4, a + 1 - \delta/4] \cap D_1| \ge \delta/2$. Therefore,

$$\int_a^{a+1} \chi_{D_1}(r) h(r)^2\,r\,dr \ge a\,\frac{\delta}{2} \inf_{[a+\delta/4,\ a+1-\delta/4]} h^2 \tag{34}$$

and

$$\int_a^{a+1} h(r)^2\,r\,dr \le (a + 1) \int_a^{a+1} h^2 \le 2a \int_a^{a+1} h^2. \tag{35}$$

Combining (33), (34) and (35) with (32) we get the lemma.
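The quantities in Lemmas 14 and 15 can be explored numerically. The following is a rough finite-difference sketch, not the authors' method: the grid size, the function name `lowest_mode`, and the sample choice of $D_1$ are all assumptions made for illustration. It discretizes the Sturm-Liouville problem (29), (30) in the weighted form behind (31) and returns the lowest eigenvalue together with the eigenfunction; taking $N = 0$ gives the radial problem for $\sigma$.

```python
import numpy as np

def lowest_mode(a, alpha, chi, N, n=300):
    """Lowest eigenpair of -h'' - h'/r + (N^2/r^2 + alpha*chi) h = tau*h
    on (a, a+1) with h(a) = h(a+1) = 0, by symmetric finite differences."""
    dx = 1.0 / (n + 1)
    r = a + dx * np.arange(1, n + 1)                # interior nodes
    r_half = a + dx * (np.arange(0, n + 1) + 0.5)   # midpoints between nodes
    K = np.zeros((n, n))
    idx = np.arange(n)
    K[idx, idx] = (r_half[:-1] + r_half[1:]) / dx**2
    K[idx[:-1], idx[:-1] + 1] = -r_half[1:-1] / dx**2
    K[idx[:-1] + 1, idx[:-1]] = -r_half[1:-1] / dx**2
    V = r * (N**2 / r**2 + alpha * chi(r))
    # Generalized problem (K + V) g = tau * diag(r) g, symmetrized by r^(-1/2).
    s = 1.0 / np.sqrt(r)
    S = (K + np.diag(V)) * s[:, None] * s[None, :]
    vals, vecs = np.linalg.eigh(S)
    return vals[0], r, np.abs(s * vecs[:, 0])

# Illustrative data: D_1 is the inner part (a, a + delta) of the annulus.
a, alpha, delta, N = 10.0, 5.0, 0.3, 3
chi = lambda r: (r < a + delta).astype(float)
sigma, _, _ = lowest_mode(a, alpha, chi, 0)
tau, r, h = lowest_mode(a, alpha, chi, N)
mass_ratio = np.sum(chi(r) * h**2 * r) / np.sum(h**2 * r)  # discrete form of (32)
```

In this discretization the two matrices differ by a diagonal matrix with entries $N^2/r_i^2$, so the discrete gap `tau - sigma` lies between 0 and $N^2/a^2$, in line with Lemma 14; `mass_ratio` is the discrete counterpart of the ratio bounded from below in Lemma 15.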

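End of proof of Theorem 6. We have

$$\tau = \frac{\int_\Omega |\nabla v|^2\,dx}{\int_\Omega v^2\,dx} + \frac{\alpha \int_\Omega \chi_D v^2\,dx}{\int_\Omega v^2\,dx}. \tag{36}$$

Since $v(r, \theta) = h(r) \sin N\theta$, $v$ vanishes on the rays $\theta = 0$ and $\theta = \pi/N$. Since $|v|$ and $|\nabla v|$ are periodic in $\theta$ of period $\pi/N$, we can replace $\Omega$ by $E_+$ in the first quotient. Therefore, we can use $v$ as test function in the Rayleigh quotient for the Dirichlet Laplacian on $E_+$ and obtain

$$\frac{\int_\Omega |\nabla v|^2\,dx}{\int_\Omega v^2\,dx} = \frac{\int_{E_+} |\nabla v|^2\,dx}{\int_{E_+} v^2\,dx} \ge \lambda_1(E_+).$$

Combining this with (36) and Lemma 15 we therefore get

$$\tau \ge \lambda_1(E_+) + \alpha c_{\alpha,\delta}. \tag{37}$$

From Lemma 14 we then get $\sigma > \tau - N^2/a^2 \ge \lambda_1(E_+) + \alpha c_{\alpha,\delta} - N^2/a^2$ (the inequality in Lemma 14 is in fact strict, since $r > a$ in the interior of the annulus). If $a$ is chosen so large that $N^2/a^2 \le \alpha c_{\alpha,\delta}$ then this gives (28) and hence the theorem.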


3.3. Symmetry breaking on dumbbells.

Proof of Theorem 7. Since $\alpha$ is fixed throughout, we will write $\lambda_\Omega(D) = \lambda_\Omega(\alpha, D)$, $\Lambda_\Omega(A) = \Lambda_\Omega(\alpha, A) = \inf_{|D|=A} \lambda_\Omega(D)$. Here we keep the index $\Omega$ since we will also consider these quantities with $\Omega$ replaced by one of the ‘lobes’ $B_\pm = B_1(\pm 2, 0)$. All (implied) constants will only depend on $\alpha$ and $A$. Write $\Lambda_B = \Lambda_{B_\pm}$, and given $D$, let

$$D_\pm = D \cap B_\pm, \qquad A_\pm = |D_\pm|.$$

Further, we introduce $A_{\min} = \min\{\min(|D_-|, |D_+|) : D \subset \Omega,\ |D| = A\}$. Thus, if $D$ is distributed over $\Omega$ with the greatest possible imbalance between $D_+$ and $D_-$ then the smaller of $D_\pm$ will have area $A_{\min}$. It is easily checked that $A_{\min} = \max(0, A - |B_-^c|)$.

We first sketch the idea of the proof:

1. For $h = 0$, i.e. two disconnected balls, one clearly has

$$\Lambda_\Omega(A) = \min(\Lambda_B(A_-), \Lambda_B(A_+)). \tag{38}$$

Since $\Lambda_B$ is strictly increasing, it is optimal to put as much of $D$ as possible in one ball, say $B_+$, and the “small” remainder in the other. Thus $\Lambda_\Omega(A) = \Lambda_B(A_{\min})$, and the eigenfunction is zero in $B_+$.

2. For small positive $h$, this situation should be approximately the same: Eq. (38) will hold with an error that is a power of $h$ (compare Eq. (42) below), so the same argument as in 1. implies symmetry breaking. Also, the eigenfunction must be small on one lobe, and since $D = \{u \le t\}$, one gets (b) from an estimate of $t$.

We now carry out the details. Let $(u, D)$ be an optimal pair. Assume $\int_\Omega u^2 = 1$.

First we need an estimate ensuring that the perturbation introduced by the handle is small. This is provided by the following estimate near the boundary (see [GT, Theorem 8.27 with $R_0 = 1$ and $R = 3h$]), which is applicable since $\Omega$ satisfies a uniform exterior cone condition (uniformly in $h$): There is $\beta \in (0, 1]$ such that

$$\max_{x:\ \operatorname{dist}(x, \partial\Omega) \le 3h} u(x) \le C h^\beta \|u\|_{L^2(\Omega)}. \tag{39}$$

From this it follows that there is a cut-off function $\sigma = \sigma_h$ on $\Omega$ having the following properties:

1. $0 \le \sigma \le 1$ on $\Omega$.
2. $\operatorname{supp}\sigma \subset B_- \cup B_+$.
3. $|u| = O(h^\beta)$ on $\operatorname{supp}(1 - \sigma)$.
4. $\int_\Omega |\nabla\sigma|^2 < C$, uniformly as $h \to 0$.


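To construct $\sigma$, choose $\chi \in C_0^\infty([0, 2))$, $0 \le \chi \le 1$, that equals one on $[0, 3/2]$ and set $\sigma(x) = 1 - \chi(|x - (\pm 1, 0)|/h)$ on $B_\pm$ and $\sigma \equiv 0$ on the handle. Properties 1, 2 and 4 are easily checked directly, and property 3 follows from (39). For brevity, denote, for $\Omega' \subset \Omega$,

$$Q_{\Omega'}(u) = \|\nabla u\|_{L^2(\Omega')}^2 + \alpha \|u\|_{L^2(D \cap \Omega')}^2$$

so that $\Lambda_\Omega(A) = Q_\Omega(u)$. Without loss of generality we may assume $\Lambda_B(A_-) \le \Lambda_B(A_+)$. First, we show

$$\Lambda_B(A_{\min}) \ge \Lambda_\Omega(A). \tag{40}$$

This is easy: Take an optimal pair $(\tilde u, \tilde D)$ for $\Lambda_B(A_{\min})$, extend $\tilde u$ by zero outside $B_-$ and define a domain $\bar D = \tilde D \cup D' \subset \Omega$, $|\bar D| = A$, by choosing any $D' \subset B_-^c$ with $|D'| = A - A_{\min}$. Since $\tilde u \equiv 0$ on $D'$, one gets (40) by using $(\tilde u, \bar D)$ as a test pair for $\Lambda_\Omega(A)$.

Next, we show a reverse inequality. Using the properties of $\sigma$ and $\operatorname{supp}\nabla\sigma \subset \operatorname{supp}(1 - \sigma)$ we obtain, with $\|\cdot\|$ denoting the $L^2$-norm on $B_\pm$,

$$\|\nabla(\sigma u)\|^2 = \|\sigma\nabla u + (\nabla\sigma)u\|^2 \le \Big(\|\nabla u\| + \|\nabla\sigma\| \max_{\operatorname{supp}(1-\sigma)} u\Big)^2 \le \|\nabla u\|^2 + O(h^\beta)$$

and therefore $Q_{B_\pm}(\sigma u) \le Q_{B_\pm}(u) + O(h^\beta)$. Now we can use $\sigma u$ as test function for the lowest eigenvalue of $-\Delta + \alpha\chi_{D \cap B_\pm}$ on $B_\pm$, and this gives the third inequality in

$$\begin{aligned}
\Lambda_\Omega(A) = Q_\Omega(u) &\ge \sum_\pm Q_{B_\pm}(u) \\
&\ge \sum_\pm Q_{B_\pm}(\sigma u) - O(h^\beta) \\
&\ge \sum_\pm \lambda_{B_\pm}(D_\pm) \int_{B_\pm} (\sigma u)^2 - O(h^\beta) \\
&\ge \sum_\pm \lambda_{B_\pm}(D_\pm) \int_{B_\pm} u^2 - O(h^\beta) \\
&\ge \Lambda_B(A_-) + (\Lambda_B(A_+) - \Lambda_B(A_-)) \int_{B_+} u^2 - O(h^\beta).
\end{aligned} \tag{41}$$

In the last two inequalities we have used property 3 of $\sigma$, the optimality of $\Lambda_B(A_\pm)$, and $\int_\Omega u^2 = 1$. Since we assume $\Lambda_B(A_+) \ge \Lambda_B(A_-)$, this and inequality (40) imply

$$\Lambda_B(A_{\min}) \ge \Lambda_\Omega(A) \ge \Lambda_B(A_-) - O(h^\beta). \tag{42}$$

By strict monotonicity of $\Lambda_B$ one easily gets from this $A_- \le A_{\min} + o(1)$ ($h \to 0$). Next, from $D \subset D_+ \cup D_- \cup H$ and $|H| < 4h$ we have $A < A_+ + A_- + 4h$, so $A_+ - A_- > A - 2A_- - 4h \ge A - 2A_{\min} - o(1)$, and then $A_{\min} = \max(0, A - |B_-^c|) \le \max(0, A - \pi)$ gives

$$A_+ - A_- \ge \min(A, 2\pi - A) - o(1). \tag{43}$$

This shows $A_+ \ne A_-$ for $h < h_0(A, \alpha)$ and therefore proves part (a) of the theorem.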


Now we prove part (b). From (43) we have $A_+ - A_- > c_0$ for some constant $c_0 > 0$, whenever $h < h_0(A, \alpha)$, so strict monotonicity and continuity of $\Lambda_B$ imply

$$\Lambda_B(A_+) - \Lambda_B(A_-) > c \tag{44}$$

with $c > 0$ independent of $h$. Now from (40) and (41), and using $\Lambda_B(A_-) \ge \Lambda_B(A_{\min})$ (since $A_- \ge A_{\min}$) and monotonicity, we conclude $(\Lambda_B(A_+) - \Lambda_B(A_-)) \int_{B_+} u^2 = O(h^\beta)$. This and (44) give $\int_{B_+} u^2 = O(h^\beta)$. Since, by (39), $u|_{\partial B_+} = O(h^\beta)$, this $L^2$ bound implies a pointwise bound for $u$ on $B_+$ by (47). Combined with (39), applied on the handle, this gives

$$\sup_{x \notin B_-} u(x) = O(h^{\beta/2}). \tag{45}$$

Finally, we want to deduce from (45) that $D^c \subset B_-$ if $A > \pi$ and $h$ is sufficiently small: Since $(u, D)$ is an optimal pair, we have $D = \{u \le t\}$ for some $t > 0$. Equation (45) shows that we are done if we can show that $t > c$ for a constant $c > 0$ independent of $h$. For $r \in (0, 1)$ let $B_-(r)$ be the closed ball of radius $r$ concentric with $B_-$. Applying Lemma 16 to $G = B_-$ we see, since $\|u\|_{L^2(B_-)} \ge 1 - O(h^\beta)$ by (45), that

$$\inf_{B_-(r)} u \ge c_r \tag{46}$$

for any $r \in (0, 1)$, with $c_r > 0$ only depending on $r$, $A$ and $\alpha$, and this implies $|\{u \ge c_r\}| \ge |B_-(r)|$. Therefore, we can conclude $t > c_r$ as soon as $|B_-(r)| > |\Omega| - A$. Since $|\Omega| \le 2\pi + 4h$ and $A > \pi$, one can find such an $r$ if $h < h_0$, both $r$ and $h_0$ only depending on $A$ (and $\alpha$). This completes the proof of the theorem.

4. Free Boundary and Convex Domains

Proof of Theorem 9, Part (a). First recall, as a consequence of results by Brascamp-Lieb [BL] and Caffarelli-Spruck [CS], that the first eigenfunction $\psi$ on a convex domain possesses only one point where $\nabla\psi = 0$. This point is necessarily the point where $\psi$ attains its maximum. Now given $A$, we select $t$ as in Theorem 3, and we select $\delta_0 < t$ such that $t + \delta_0 < M$ where $M = \max\psi$. With this choice of $\delta_0$ we use Theorem 3 to determine a value $\alpha_1$ for which $[\Omega]_{t-\delta_0} \subset D \subset [\Omega]_{t+\delta_0}$ for all $\alpha < \alpha_1$. Then the free boundary $\{u = t\}$ is contained in the closed annulus $\mathcal{A} = \{t - \delta_0 \le \psi \le t + \delta_0\}$. We have $\nabla\psi \ne 0$ on $\mathcal{A}$, so $C := \min_{\mathcal{A}} |\nabla\psi|$ is positive. Thus decreasing $\alpha_1$ to a smaller value $\alpha_0 > 0$, we can use Prop. 11 to conclude that for all $\alpha < \alpha_0$ we have $|\nabla u| > C/2$ on $\mathcal{A}$ and hence on the free boundary $\{u = t\}$. Applying Theorem 8 we now get the first part of Theorem 9.

Proof of Theorem 9, part (b). We only sketch the proof. Fix $x_0$ with $\nabla\psi(x_0) \ne 0$. Choose coordinates in which $\nabla\psi(x_0) = (0, \ldots, 0, a)$, $a > 0$, and for $x'$ near $x_0'$ (where $x' = (x_1, \ldots, x_{n-1})$) and $t$ near $t_0 = \psi(x_0)$ denote the locally unique solution $x_n$ of the equation $\psi(x', x_n) = t$ by $F_0(x', t)$. For $\alpha$ near zero and $x$ near $x_0$ one has $\partial u_\alpha/\partial x_n \ne 0$ by Prop. 11, so we may define $F_\alpha$ similarly for $u_\alpha$ instead of $\psi$. By a result of Korevaar and Lewis [KL] the level set of $\psi$ through $x_0$ is strictly convex, in the sense that the matrix $\left(\frac{\partial^2 F_0}{\partial x_i \partial x_j}\right)_{i,j=1,\ldots,n-1}$ is positive definite at $(x_0', t_0)$. Therefore, the result follows if one can show continuity of $\frac{\partial^2 F_\alpha}{\partial x_i \partial x_j}$ in $\alpha$ and $(x', t)$. Now the equation for $u$ gives for $F_\alpha$ a uniformly elliptic, quasi-linear equation (writing $y = (x', t)$)

$$\sum_{i,j=1}^n b_{ij}(\nabla F_\alpha)\,\frac{\partial^2 F_\alpha}{\partial y_i \partial y_j}(y) = \alpha\chi_{G_\alpha}(y_n)\,y_n - \Lambda(\alpha, A)\,y_n$$

with $b_{ij}$ real analytic and $G_\alpha = (-\infty, t_\alpha]$, where $t_\alpha$ is such that $|\{u_\alpha \le t_\alpha\}| = A$. From this it is easy to derive the desired regularity, cf. the proof of Lemma 3 in [CGK].

5. Numerical Results

In this section we make a few remarks on our method for the numerical solution of our eigenvalue problem. We use the finite element method for the discretization of our eigenvalue problem, with conforming P-1 elements. To create the mesh we have utilized the automatic spatial meshing program written by Y. Tsukuda (see [TK]). In order to calculate the approximate first eigenvalue and the corresponding eigenfunction, we employ the power method. Our method to obtain an optimal configuration is based on an algorithm that was introduced in [Pi]. However, we do not insist on $D$ (the sought-for optimal configuration) to be a union of elements. This flexibility allows us to find a good approximation even without remeshing.

We now describe the main procedure. The given data are $A$ and $\alpha$. We first take any initial domain $D_0$ satisfying $|D_0| = A$. Next, if we have obtained $D_{n-1}$ ($n = 1, 2, 3, \ldots$) then we calculate the first eigenvalue $\lambda_{n-1}$ and the corresponding eigenfunction $u_{n-1}$ for the finite element approximation problem for the operator $-\Delta + \alpha\chi_{D_{n-1}}$. Then we obtain $D_n$ from $u_{n-1}$ by finding a number $t_0$ such that $|\{u_{n-1} \le t_0\}| = A$ and setting $D_n = \{u_{n-1} \le t_0\}$. The number $t_0$ is determined by a bisection method, i.e. by setting $\mathrm{down}_0 = 0$, $\mathrm{up}_0 = \max u_{n-1}$, $j = 0$ and then iterating Steps 1 and 2 (with $L(t) := |\{u_{n-1} \le t\}|$):

Step 1: Let $\mathrm{interm}_j := (\mathrm{up}_j + \mathrm{down}_j)/2$ and calculate $L(\mathrm{interm}_j)$.
Step 2: If $L(\mathrm{interm}_j) < A$, then $\mathrm{up}_{j+1} := \mathrm{up}_j$ and $\mathrm{down}_{j+1} := \mathrm{interm}_j$; else if $L(\mathrm{interm}_j) > A$, then $\mathrm{up}_{j+1} := \mathrm{interm}_j$ and $\mathrm{down}_{j+1} := \mathrm{down}_j$. Increase $j$ by one.

The iteration is stopped when $L(\mathrm{interm}_k)$ nearly equals $A$ and $\mathrm{up}_k$ and $\mathrm{down}_k$ nearly equal $\mathrm{interm}_k$ according to the adopted precision of approximation, and then we set $t_0 = \mathrm{interm}_k$. Having obtained $D_n$ we repeat the procedure above to find $u_n$, $D_{n+1}$ etc. It is easily seen that $\lambda_n \le \lambda_{n-1}$. We iterate until $|\lambda_n - \lambda_{n-1}| < \varepsilon$, where $\varepsilon$ is given.

In the numerical experiments that we have done, we have taken $\varepsilon$ between $10^{-7}$ and $10^{-10}$. By the monotonicity of $\{\lambda_n\}$, the limit $\lim_{n\to\infty} \lambda_n = \lambda_\infty$ exists. However, it is not clear a priori whether $\lambda_\infty = \Lambda(\alpha, A)$ or not. In order to avoid the latter case, we have repeated the same procedure with several different initial shapes $D_0$. The results of some of the computations that we have done are shown in Figs. 1–3. They illustrate well Theorems 2, 4, 6, 7, and 9.
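The procedure described in this section can be sketched in a one-dimensional toy setting. The code below is only an illustration under several assumptions that are not taken from the paper: the domain is the interval (0, 1) with a uniform finite-difference grid instead of finite elements, the first eigenpair is obtained with a dense symmetric eigensolver instead of the power method, the set D is a union of grid cells, and all names (`threshold_by_bisection`, `optimize_configuration`) are hypothetical.

```python
import numpy as np

def threshold_by_bisection(u, cell_area, A, tol=1e-12):
    """Bisection for t0 with |{u <= t0}| close to A, following Steps 1 and 2."""
    down, up = 0.0, float(u.max())
    while up - down > tol:
        interm = 0.5 * (up + down)
        L = cell_area * np.count_nonzero(u <= interm)  # L(t) = |{u <= t}|
        if L < A:
            down = interm
        else:
            up = interm
    return up

def optimize_configuration(n=200, alpha=5.0, A=0.3, eps=1e-10, max_iter=100):
    """Iterate D_n = {u_{n-1} <= t0} for -d^2/dx^2 + alpha*chi_D on (0, 1)."""
    dx = 1.0 / (n + 1)
    lap = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / dx**2
    in_D = np.zeros(n, dtype=bool)
    in_D[: int(round(A / dx))] = True      # arbitrary initial set D_0
    lam_prev = np.inf
    history = []
    for _ in range(max_iter):
        vals, vecs = np.linalg.eigh(lap + alpha * np.diag(in_D.astype(float)))
        lam, u = vals[0], np.abs(vecs[:, 0])   # first eigenpair, u >= 0
        history.append(lam)
        if abs(lam_prev - lam) < eps:
            break
        lam_prev = lam
        t0 = threshold_by_bisection(u, dx, A)
        in_D = u <= t0                     # next configuration D_n
    return history, in_D

history, in_D = optimize_configuration()
```

Because D is restricted to whole grid cells here, the area constraint is met only up to about one or two cells, so the strict monotonicity of the eigenvalues stated above need not carry over exactly to this toy version. In this example the final set concentrates near both ends of the interval, where the eigenfunction is small.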


6. Some Open Problems and Conjectures

In this section $D = D_{\alpha,A}$ will always denote an optimal configuration.

Conjecture 1 (Uniqueness and convexity). If $\Omega$ is convex then $D$ is unique, and $D^c$ is convex (at least for $\alpha \le \bar\alpha_\Omega(A)$).

Concerning the restriction on $\alpha$ compare the remark after Theorem 2. We have proved convexity for small $\alpha$ in Theorem 9.

Problem 1 (Regularity of the free boundary). When is the boundary of an optimal configuration smooth everywhere? In general, how can we control the size of singular sets of the free boundary?

In the convex case we have proved smoothness for small $\alpha$ in Theorem 9. A similar method should easily yield smoothness of the free boundary for small $A$ and smooth $\partial\Omega$.

Conjecture 2. In dimension two the free boundary $\partial D$ is smooth outside a finite set.

We prove some restrictions on the singular set of $\partial D$ in [CGK].

Problem 2 (Topology of $D$ and $D^c$). If $\Omega$ is simply connected, is $D$ also connected even in the case $\alpha > \bar\alpha_\Omega(A)$ (cf. Theorem 2)? If $A$ or $|\Omega| - A$ is small enough (with $\alpha$ fixed), is $D^c$ always connected?

Compare Prop. 12 for the case of $A$ close to $|\Omega|$. In a dumbbell $\psi$ has two maxima. But numerical evidence suggests the following conjecture:

Conjecture 3 (One component of $D^c$ for dumbbell). Let $\Omega_h$ be a dumbbell. Then for every $\alpha > 0$ there is $\rho_0(\alpha, h) > 0$ such that $D^c$ consists of one component (near one of the maxima of $\psi$) whenever $|\Omega| - A < \rho_0(\alpha, h)$.

Clearly, one would expect $\rho_0(\alpha, h) \to 0$ as $\alpha \to 0$.

We now turn to questions of symmetry. A very general problem is the following:

Problem 3 (Symmetry and symmetry breaking). Determine (at least qualitatively) the region in the space of parameters where symmetry breaking occurs. For annuli the parameters are $\alpha$, $\delta = A/|\Omega|$ and the ratio $\tau$ of outer and inner radius (“thickness”). For dumbbells the parameters are $\alpha$, $A$ and the thickness of the handle.

First results on this general problem are given by Theorems 6 and 7. See also Fig. 2. The next three conjectures address other aspects of this problem, i.e. they concern other regions in parameter space. They are motivated by numerical experiments.

Conjecture 4 (Symmetry on dumbbells). Let $\Omega_h$ be a dumbbell. Then for every $\alpha > 0$ there is $\rho_1(\alpha, h) > 0$ such that symmetry breaking occurs if and only if $|\Omega| - A < \rho_1(\alpha, h)$.

Conjecture 5 (Symmetry on annuli). For each $\alpha, \delta > 0$ there is $\tau_0(\alpha, \delta)$ such that symmetry breaking occurs for the annulus of thickness $\tau$ if and only if $\tau < \tau_0(\alpha, \delta)$.

Theorem 6 gives one half of this. The other half means that the optimal configuration is rotationally symmetric for “thick” annuli. Some aspects of this conjecture are discussed in [CGK]. More generally, it would be interesting to prove symmetry preservation in any situation not covered by Theorem 4 (i.e. in a non-convex situation). In particular, a natural conjecture is:


Conjecture 6 (Symmetry preservation for small $\alpha$). For any domain $\Omega$ and any $A$ there is $\alpha_0(A, \Omega)$ such that for $\alpha \le \alpha_0(A, \Omega)$ any optimal configuration $D$ has the same symmetries as $\Omega$.

Also, the analysis of the transition between the symmetric and asymmetric situations would be interesting, as well as the shape of asymmetric solutions for the annulus.

Problem 4 (Relation between $D$ and the curvature of $\partial\Omega$). Prove that $D$ is fat near points where $\partial\Omega$ has large positive curvature.

For example see Fig. 1. For $\alpha = 0$ and $A$ near zero this should be not too hard. See [K1] for the case $\alpha = 0$ under additional geometric assumptions. From this one should obtain the result at least for small $\alpha$ and $A$ by perturbation. In [CGK], Thm. 9, we prove in a model case that $D$ is thin near a portion of the boundary which has large negative curvature.

Problem 5 (Limit $\alpha \to \infty$). Consider the restricted minimization problem, allowing only such sets $D$ for which $D^c$ is a ball. How does this relate to the limit $\alpha \to \infty$ in our problem? Where does the center of an optimal ball lie?

This is motivated as follows: Formally, for $\alpha = \infty$ the eigenvalue $\lambda_\Omega(\alpha, D)$ equals the first Dirichlet eigenvalue of $D^c$. (The convergence to this value as $\alpha \to \infty$ is proved in [HH] and [DKM], for example.) Now by the Faber-Krahn inequality (see [Ch], for example), the first Dirichlet eigenvalue of a domain of prescribed area is minimal if the domain is a ball. So the optimal configuration for $\alpha$ large should be close to a ball, at least when $A$ is close enough to $|\Omega|$ (so that a ball of volume $|\Omega| - A$ fits into $\Omega$).

Problem 6 (Other Elliptic Operators). Consider the same optimization problem for a magnetic Schrödinger operator $(i\nabla - \alpha\chi_D A(x))^2$ with constant magnetic field or a uniformly elliptic operator of divergence type $-\nabla\{(1 + \alpha\chi_D(x))\nabla\}$.

We have no results for these operators, even if $\Omega$ is a ball.

Appendix: Basic PDE Facts

Here we collect some well-known facts about uniform estimates for solutions of elliptic equations. We will state these for an equation

$$Pu = 0, \qquad P = \Delta + \sum_{j=1}^n b_j(x)\,\frac{\partial}{\partial x_j} + c(x), \qquad x \in G,$$

where $P$ has measurable, uniformly bounded coefficients, $u \in C^1(G) \cap C^0(\bar G)$, and $G \subset \mathbb{R}^n$ is a bounded open set. In the following estimates, saying that the constants depend on $P$ will mean that they depend on $\sup_G (b_1, \ldots, b_n, c)$ and stay bounded when this quantity stays bounded.

First, we have the uniform bound (see [GT, Thm. 8.15 and (8.38)])

$$\sup_G |u| \le C_{G,P}\,\Big(\|u\|_{L^2(G)} + \sup_{\partial G} |u|\Big). \tag{47}$$


Second, we have Harnack’s inequality: If $u \ge 0$ on $G$ and $G'$ is a compact subset of $G$ then

$$\sup_{G'} u \le C_{G,G',P}\,\inf_{G'} u. \tag{48}$$

Combining these two we get the following slightly less standard estimate. For $\varepsilon > 0$ let $G_\varepsilon = \{x \in G : \operatorname{dist}(x, \partial G) \ge \varepsilon\}$.

Lemma 16. For any $\varepsilon > 0$ there is a positive constant $c_{G,P,\varepsilon}$ such that for any $u \in C^1(G) \cap C^0(\bar G)$ that solves $Pu = 0$ and satisfies $u \ge 0$ one has

$$\inf_{G_\varepsilon} u \ge c_{G,P,\varepsilon}\,\Big(\|u\|_{L^2(G)} - \sup_{\partial G} u\Big).$$

Here we set $\inf_\emptyset u := \infty$.

Proof. We have

$$\int_G u^2 = \int_{G_\varepsilon} u^2 + \int_{G \setminus G_\varepsilon} u^2 \le |G_\varepsilon| \sup_{G_\varepsilon} u^2 + |G \setminus G_\varepsilon| \sup_G u^2 \le C_{G,P,\varepsilon} \inf_{G_\varepsilon} u^2 + |G \setminus G_\varepsilon|\,C_{G,P}\,\Big(\int_G u^2 + \sup_{\partial G} u^2\Big),$$

where we used Harnack’s inequality and the uniform estimate (47). If $\varepsilon$ is so small that $|G \setminus G_\varepsilon|\,C_{G,P} < 1/2$ then we can subtract the last two terms, and the claim follows easily. The claim for larger $\varepsilon$ then follows from the fact that $\inf_{G_\varepsilon} u \ge \inf_{G_{\varepsilon'}} u$ if $\varepsilon \ge \varepsilon'$.

Acknowledgement. We started this work while K. Kurata was visiting the Erwin Schrödinger International Institute for Mathematical Physics (ESI) in Vienna. K. Kurata is partially supported by Tokyo Metropolitan University maintenance costs and by ESI and wishes to thank Professor T. Hoffmann-Ostenhof for his invitation and the members of ESI for their hospitality. D. Grieser was also at the ESI, and thanks L. Friedlander for inviting him. M. Imai deeply appreciates helpful comments and heartfelt encouragement by Professor Teruo Ushijima of the University of Electro-Communications (Tokyo). We thank Professor M. Loss for his interest and some valuable comments. We thank Professor E. Harrell for his interest in our work, for informing us of the related works [AH, AHS], and for valuable discussions. S. Chanillo was supported by NSF grant DMS9970359.

References

[AB] Ashbaugh, M.S., Benguria, R.D.: A sharp bound for the ratio of the first two eigenvalues of Dirichlet Laplacians and extensions. Ann. of Math. 135, 601–628 (1992)
[AH] Ashbaugh, M.S., Harrell, E.: Maximal and minimal eigenvalues and their associated nonlinear equations. J. Math. Phys. 28, 1770–1786 (1987)
[AHS] Ashbaugh, M.S., Harrell, E., Svirsky, R.: On minimal and maximal eigenvalue gaps and their causes. Pacific J. Math. 147, 1–24 (1991)
[BL] Brascamp, H.J., Lieb, E.: On extensions of the Brunn–Minkowski and Prekopa–Leindler theorems, including inequalities for log concave functions, and with an application to the diffusion equation. J. Funct. Anal. 22, 366–389 (1976)
[BZ] Brothers, J.E., Ziemer, W.P.: Minimal rearrangements of Sobolev functions. J. reine angew. Math. 384, 153–179 (1988)
[CS] Caffarelli, L.A., Spruck, J.: Convexity properties of solutions to some classical variational problems. Comm. in Partial Diff. Eq. 7, 1337–1379 (1982)
[CGK] Chanillo, S., Grieser, D., Kurata, K.: The free boundary problem in the optimization of composite membranes. To appear in Cont. Math., Proc. of the Conf. on Geometry and Control, R. Gulliver, W. Littman, R. Triggiani (eds)
[Ch] Chavel, I.: Eigenvalues in Riemannian Geometry. New York: Academic Press, 1984
[C] Cox, S.J.: The two phase drum with the deepest bass note. Japan J. Indust. Appl. Math. 8, 345–355 (1991)
[CL] Cox, S.J., Lipton, R.: Extremal eigenvalue problems for two-phase conductors. Arch. Rat. Mech. Anal. 136, 101–117 (1996)
[CM] Cox, S.J., McLaughlin, J.R.: Extremal eigenvalue problems for composite membranes, I and II. Appl. Math. Optim. 22, 153–167, 169–187 (1990)
[DKM] Demuth, M., Kirsch, W., McGillivray, I.: Schrödinger operators: Geometric estimates in terms of the occupation time. Comm. in Part. Differ. Eqs. 20, 37–57 (1995)
[Eg] Egnell, H.: Extremal properties of the first eigenvalue of a class of elliptic eigenvalue problems. Ann. Scuol. Norm. Sup. Pisa, Cl. Sci., IV. Ser. 14, No. 1, 1–48 (1987)
[FG] de Figueiredo, D.G., Gossez, J.-P.: Strict monotonicity of eigenvalues and unique continuation. Comm. P.D.E. 17 (1 & 2), 339–346 (1992)
[F] Friedman, A.: On the regularity of the solutions of non-linear elliptic and parabolic systems of partial differential equations. J. Math. Mech. (now Indiana Math. J.) 7, 43–60 (1958)
[GT] Gilbarg, D., Trudinger, N.T.: Elliptic Partial Differential Equations of Second Order. New York–Berlin: Springer-Verlag, 1983
[G] Giraud, G.: Ann. Scient. de l'Ec. Norm. 43, 1–128 (1926)
[HKK] Harrell, E., Kröger, P., Kurata, K.: On the placement of an obstacle or a well so as to optimize the fundamental eigenvalue. Preprint
[HH] Hempel, P., Herbst, I.: Strong magnetic fields, Dirichlet boundaries, and spectral gaps. Commun. Math. Phys. 169, 237–260 (1995)
[H] Hopf, E.: Über den funktionalen, insbesondere den analytischen Character der Lösungen elliptischer Differentialgleichungen zweiter Ordnung. Math. Z. 34, 194–233 (1931)
[K1] Kawohl, B.: On the location of maxima of the gradient for solutions to quasi-linear elliptic problems and a problem raised by Saint Venant. J. of Elasticity 17, 195–206 (1987)
[K2] Kawohl, B.: Symmetrization – or how to prove symmetry of solutions to a PDE. In: Partial Differential Equations, Theory and Numerical Solution (W. Jäger, J. Nečas et al., eds.), Notes in Math. 406, Chapman & Hall Res., 1999, pp. 214–229
[KL] Korevaar, N.J., Lewis, J.L.: Convex solutions of certain elliptic equations have constant rank Hessians. Arch. Rat. Mech. Anal. 97, 19–32 (1987)
[Kr] Krein, M.G.: On certain problems on the maximum and minimum of characteristic values and on the Lyapunov zones of stability. AMS Translations Ser. 2, 1, 163–187 (1955)
[LL] Lieb, E., Loss, M.: Analysis. Providence, RI: Am. Math. Soc., 1997
[M] Morrey, C.B.: On the analyticity of solutions of analytic non-linear elliptic systems of PDE I. Am. J. Math. 80, 198–218 (1958)
[Pi] Pironneau, O.: Optimal shape design for elliptic systems. New York: Springer-Verlag Inc., 1984
[TK] Tsukuda, Y., Kaizu, S.: Proceedings of the tenth symposium on computational fluid mechanics. Chuo University, Tokyo, Japan. Japan society of computational fluid dynamics, 1996, pp. 220–221 (in Japanese with English abstract)

Communicated by B. Simon

Commun. Math. Phys. 214, 339 – 371 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Multifractal Analysis of Hyperbolic Flows

L. Barreira, B. Saussol

Departamento de Matemática, Instituto Superior Técnico, 1049-001 Lisboa, Portugal. E-mail: [email protected]; [email protected]

Received: 18 November 1999 / Accepted: 3 April 2000

Abstract: We establish the multifractal analysis of hyperbolic flows and of suspension flows over subshifts of finite type. A non-trivial consequence of our results is that for every Hölder continuous function non-cohomologous to a constant, the set of points without Birkhoff average has full topological entropy.

1. Introduction

Much attention has been given by physicists and applied mathematicians to the study of chaotic behavior. Several techniques were put forward as a means to deal with the enormous amount of data provided by the associated time series. In particular, there has been a growing interest in the study of multifractal spectra, such as the dimension spectrum for pointwise dimensions. These spectra conveniently encode information about the "multifractal" structure of complicated invariant sets. The rigorous mathematical theory of multifractal analysis has been developed quite significantly during the last decade. We refer the reader to the book [7] for a description of results, and for a list of references.

We briefly describe here the main elements of multifractal analysis. Let $T : X \to X$ be a continuous map of a compact metric space, and $g : X \to \mathbb{R}$ a continuous function. For each $\alpha \in \mathbb{R}$, let

$$K_\alpha = \Big\{ x \in X : \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n} g(T^i x) = \alpha \Big\}.$$

We also consider the set

$$K = \Big\{ x \in X : \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n} g(T^i x) \text{ does not exist} \Big\}.$$
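To make the irregular set $K$ concrete, the following is a minimal sketch under assumptions that are ours and not part of the paper: $T$ is the shift on 0/1 sequences, $g(x) = x_0$, and the point $x$ is built from alternating blocks of zeros and ones whose lengths double (the names `block_sequence` and `birkhoff_averages` are illustrative only). The running averages, normalized here as $(1/n)\sum_{i<n}$, keep oscillating between values near $1/3$ and $2/3$, so such a point belongs to $K$.

```python
def block_sequence(num_blocks):
    """0/1 sequence made of alternating blocks; block k has symbol k % 2 and length 2**k."""
    seq = []
    for k in range(num_blocks):
        seq.extend([k % 2] * (2 ** k))
    return seq


def birkhoff_averages(seq):
    """Running averages (1/n) * sum_{i<n} g(T^i x) for g(x) = x_0 and T the shift."""
    averages, total = [], 0
    for n, symbol in enumerate(seq, start=1):
        total += symbol
        averages.append(total / n)
    return averages


seq = block_sequence(16)
avgs = birkhoff_averages(seq)

# Sample the running average at the end of each block (n = 2**(k+1) - 1).
samples = [avgs[2 ** (k + 1) - 2] for k in range(16)]
for k, value in enumerate(samples):
    print(k, round(value, 4))
```

The sampled values alternate between the two levels instead of settling, which is the finite-length shadow of a Birkhoff average that does not exist.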

L. B. was partially supported by FCT’s Units Funding Program and NATO grant CRG970161. B. S. was supported by the Center for Mathematical Analysis, Geometry, and Dynamical Systems.


Clearly

$$X = K \cup \bigcup_{\alpha \in \mathbb{R}} K_\alpha. \tag{1}$$

This union is formed by pairwise disjoint $T$-invariant sets, and is called a multifractal decomposition of $X$. For each $\alpha \in \mathbb{R}$ such that $K_\alpha \ne \emptyset$, set $D(\alpha) = \dim_H K_\alpha$, where $\dim_H Z$ denotes the Hausdorff dimension of $Z$. Given $Z \subset X$ and $\alpha > 0$, recall that $\dim_H Z = \inf\{\alpha : m(Z, \alpha) = 0\}$, where

$$m(Z, \alpha) = \lim_{\delta \to 0} \inf_{\mathcal{U}} \sum_{U \in \mathcal{U}} (\operatorname{diam} U)^\alpha,$$

and the infimum is taken over all finite or countable covers $\mathcal{U}$ of $Z$ by sets of diameter at most $\delta$. The function $D$ is called the dimension spectrum for the Birkhoff averages of $g$, and is one of the main elements of multifractal analysis.

By Birkhoff's ergodic theorem, if $\mu$ is a $T$-invariant finite ergodic measure on $X$, and $\alpha = \int_X g \, d\mu / \mu(X)$, then $\mu(K_\alpha) = \mu(X)$. That is, there exists a set $K_\alpha$ in the multifractal decomposition with full $\mu$-measure. Of course this does not mean that the other sets in the multifractal decomposition are empty. In fact, for several classes of hyperbolic dynamical systems it has been proved that:

1. if $K_\alpha \ne \emptyset$, then $K_\alpha$ is a proper dense set;
2. the set $\{\alpha \in \mathbb{R} : K_\alpha \ne \emptyset\}$ is an interval (in particular it contains an uncountable number of points);
3. the function $D$ is real analytic and strictly convex;
4. the irregular set $K$ is everywhere dense and has full Hausdorff dimension, that is, $\dim_H K = \dim_H X$.

This implies that the multifractal decomposition in (1) is composed of an uncountable number of $T$-invariant sets, all being everywhere dense, and all having positive Hausdorff dimension. Thus, multifractal analysis reveals a very rich "multifractal" structure for hyperbolic dynamical systems. In particular, this analysis has been effected when $g$ is a Hölder continuous function, and $T$ is either a subshift of finite type, an expanding map, or an axiom A diffeomorphism. We refer to [7] for details and a list of references.

One of the main objectives of our paper is to establish a version of the multifractal analysis for a class of hyperbolic flows and suspension flows over subshifts of finite type. In the multifractal analysis of a flow $\Phi = \{\varphi_t\}_t$ on $X$, the sets $K_\alpha$ and $K$ are replaced respectively by

$$K_\alpha = \Big\{ x \in X : \lim_{t\to\infty} \frac{1}{t} \int_0^t g(\varphi_\tau x) \, d\tau = \alpha \Big\}$$

and

$$K = \Big\{ x \in X : \lim_{t\to\infty} \frac{1}{t} \int_0^t g(\varphi_\tau x) \, d\tau \text{ does not exist} \Big\}.$$

Recall that a set $A \subset X$ is $\Phi$-invariant if $\varphi_t A = A$ for every $t \in \mathbb{R}$. Each of the sets $K_\alpha$ and $K$ is $\Phi$-invariant.


For several classes of hyperbolic flows we establish Properties 1, 2, 3, and 4 above. For example, we can give a complete description when $\Phi$ is the geodesic flow on a compact surface with negative curvature.

Recall that a Borel finite measure $\mu$ on $X$ is $\Phi$-invariant if $\mu(\varphi_t A) = \mu(A)$ for every measurable set $A \subset X$ and every $t \in \mathbb{R}$. Assume that $\mu$ is ergodic, i.e., that any $\Phi$-invariant measurable set has either zero or full $\mu$-measure. By Birkhoff's ergodic theorem, if $g : X \to \mathbb{R}$ is a $\mu$-integrable measurable function then

$$\lim_{t\to\infty} \frac{1}{t} \int_0^t g(\varphi_\tau x) \, d\tau = \frac{1}{\mu(X)} \int_X g \, d\mu \tag{2}$$

for $\mu$-almost every $x \in X$. Therefore, it is a rare event from the point of view of measure theory that the limit in (2) does not exist for a given point $x \in X$. Property 1 above shows that, surprisingly, for suspension flows over subshifts of finite type, and a generic Hölder continuous function $g$, the set of points where the limit in (2) does not exist is everywhere dense and has full topological entropy. In particular, from the point of view of topology it is a rather common event that the limit in (2) does not exist for a given point $x \in X$.

Our results are counterparts of the corresponding results for diffeomorphisms on hyperbolic sets developed by Barreira and Schmeling in [3]. The main theme of our proofs is to use Markov systems and the associated symbolic dynamics developed by Bowen [4] and Ratner [10] to reduce the setup for flows to the setup for maps, and then apply the results that are already available in the case of maps. This is done through a study of suspension flows over subshifts of finite type associated to Markov systems, and a careful analysis of the relation between cohomology for flows and cohomology for the maps associated to Markov systems.

After the completion of this draft, we learned that Pesin and Sadovskaya [8] recently obtained results related to ours. They use a different approach, involving the construction of Moran covers associated to Markov systems.

2. Hyperbolic Flows

2.1. Preliminaries. Let $\Phi = \{\varphi_t\}_t$ be a $C^1$ flow of the smooth compact manifold $M$. A $\Phi$-invariant set $\Lambda \subset M$ is called hyperbolic for $\Phi$ if there exists a continuous splitting $T_\Lambda M = E^s \oplus E^u \oplus E^0$, and constants $c > 0$ and $\lambda \in (0, 1)$ such that for each $x \in \Lambda$ the following properties hold:

1. $\frac{d}{dt}(\varphi_t x)|_{t=0}$ generates $E^0(x)$;
2. $d_x\varphi_t E^s(x) = E^s(\varphi_t x)$ and $d_x\varphi_t E^u(x) = E^u(\varphi_t x)$ for each $t \in \mathbb{R}$;
3. $\|d_x\varphi_t v\| \le c\lambda^t \|v\|$ for every $v \in E^s(x)$ and $t > 0$;
4. $\|d_x\varphi_{-t} v\| \le c\lambda^t \|v\|$ for every $v \in E^u(x)$ and $t > 0$.

For example, geodesic flows on compact Riemannian manifolds with negative sectional curvature have the whole manifold as a hyperbolic set. Furthermore, time changes and small $C^1$ perturbations of flows with a hyperbolic set also possess a hyperbolic set. A closed $\Phi$-invariant set $\Lambda \subset M$ is called a basic set of $\Phi$ if $\Lambda$ is hyperbolic, locally maximal, topologically transitive, and the periodic orbits of $\Phi$ are dense in $\Lambda$.


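2.2. Irregular sets. For each continuous function $g : \Lambda \to \mathbb{R}$ we define the irregular set for the Birkhoff averages of $g$ (with respect to $\Phi = \{\varphi_t\}_t$) by

$$B(g) = \Big\{ x \in \Lambda : \lim_{t\to\infty} \frac{1}{t} \int_0^t g(\varphi_\tau x) \, d\tau \text{ does not exist} \Big\}.$$

One can easily verify that $B(g)$ is $\Phi$-invariant. By Birkhoff's ergodic theorem, the set $B(g)$ has zero measure with respect to any $\Phi$-invariant finite measure.

We say that $g : \Lambda \to \mathbb{R}$ is $\Phi$-cohomologous to a function $h : \Lambda \to \mathbb{R}$ on $\Lambda$ if there exists a bounded measurable function $q : \Lambda \to \mathbb{R}$ such that

$$g(x) - h(x) = \lim_{t \to 0} \frac{q(\varphi_t x) - q(x)}{t} \tag{3}$$

for every $x \in \Lambda$. If $g : \Lambda \to \mathbb{R}$ is $\Phi$-cohomologous to a constant $c \in \mathbb{R}$ on $\Lambda$, then

$$\begin{aligned}
\Big| \frac{1}{t} \int_0^t g(\varphi_\tau x) \, d\tau - c \Big|
&= \Big| \frac{1}{t} \lim_{s\to 0} \frac{1}{s} \Big( \int_s^{s+t} q(\varphi_\tau x) \, d\tau - \int_0^t q(\varphi_\tau x) \, d\tau \Big) \Big| \\
&= \Big| \frac{1}{t} \lim_{s\to 0} \frac{1}{s} \Big( \int_t^{s+t} q(\varphi_\tau x) \, d\tau - \int_0^s q(\varphi_\tau x) \, d\tau \Big) \Big| \\
&\le \frac{2 \sup|q|}{t}
\end{aligned} \tag{4}$$

for every $x \in \Lambda$ and $t > 0$, and hence, $B(g) = \emptyset$.

We now present the main result of this section. It shows that for hyperbolic flows, if $g : \Lambda \to \mathbb{R}$ is not $\Phi$-cohomologous to a constant, then the set $B(g)$ is non-empty, is everywhere dense, and has full topological entropy. See Sect. 4.1 for the definition of topological entropy $h(\Phi|Z)$ on an arbitrary set $Z$ (not necessarily compact nor invariant).

Theorem 1. Let $\Lambda$ be a compact basic set of a topologically mixing $C^{1+\varepsilon}$ flow $\Phi$, for some $\varepsilon > 0$, and let $g : \Lambda \to \mathbb{R}$ be a Hölder continuous function. Then the following properties are equivalent:

1. $g$ is not $\Phi$-cohomologous to a constant on $\Lambda$;
2. $B(g)$ is a non-empty proper dense set with

$$h(\Phi|B(g)) = h(\Phi|\Lambda). \tag{5}$$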
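The estimate (4) holds for any flow, not only hyperbolic ones, so it can be illustrated on a very simple model. The sketch below is our own illustration and not part of the paper: it assumes the rotation flow $\varphi_t(x) = x + t \pmod 1$ on the circle, the transfer function $q(x) = \sin(2\pi x)$ with $\sup|q| = 1$, and $g = c + dq/dt$ along the flow. The helper names `time_average` and `check` are hypothetical, and the integral is approximated with the midpoint rule.

```python
import math

C = 0.7  # the constant c in (4); an arbitrary illustrative value


def q(x):
    """Bounded transfer function with sup|q| = 1."""
    return math.sin(2 * math.pi * x)


def g(x):
    """g = c + derivative of q along the rotation flow, so g is cohomologous to c."""
    return C + 2 * math.pi * math.cos(2 * math.pi * x)


def time_average(func, x, t, steps=20000):
    """Midpoint-rule approximation of (1/t) * integral_0^t func(phi_tau(x)) d tau
    for the rotation flow phi_tau(x) = x + tau (mod 1)."""
    h = t / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += func((x + tau) % 1.0)
    return total * h / t


def check(x, t):
    """Return (|time average - c|, 2 sup|q| / t), the two sides of estimate (4)."""
    return abs(time_average(g, x, t) - C), 2.0 / t


for t in (5.0, 20.0, 50.0):
    deviation, bound = check(0.1, t)
    print(t, deviation, bound)
```

In this model the deviation equals $|q(x+t) - q(x)|/t$ up to quadrature error, so the bound $2\sup|q|/t$ is respected for every starting point and the time averages converge to $c$.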

In [3], Barreira and Schmeling studied irregular sets with respect to diffeomorphisms on hyperbolic sets. Theorem 1 is a counterpart of their results in the case of flows, and follows from the more general statements formulated below.

We now show that "most" Hölder continuous functions are not $\Phi$-cohomologous to a constant. Let $C^\alpha(\Lambda)$ be the space of Hölder continuous functions on $\Lambda$ with Hölder exponent $\alpha$. For a function $\varphi \in C^\alpha(\Lambda)$ we define its norm by

$$\|\varphi\|_\alpha = \sup\{|\varphi(x)| : x \in \Lambda\} + \sup\Big\{ \frac{|\varphi(x) - \varphi(y)|}{d(x, y)^\alpha} : x, y \in \Lambda \text{ and } x \ne y \Big\},$$

where $d$ denotes the distance on $M$.

Theorem 2. Let $\Lambda$ be a compact hyperbolic set of a topologically transitive $C^1$ flow $\Phi$. Then, for each $\alpha \in (0, 1)$, the family of functions in $C^\alpha(\Lambda)$ which are not cohomologous to a constant is open and dense in $C^\alpha(\Lambda)$.


Theorems 1 and 2 immediately imply the following statement, whose formulation has the advantage of not using the notion of cohomology.

Theorem 3. Let $\Lambda$ be a compact basic set of a topologically mixing $C^{1+\varepsilon}$ flow $\Phi$, for some $\varepsilon > 0$. Given $\alpha > 0$, for an open and dense family of functions $g \in C^\alpha(\Lambda)$, the set $B(g)$ is a non-empty proper dense set with $h(\Phi|B(g)) = h(\Phi|\Lambda)$.

2.3. Multifractal analysis. Let $g : \Lambda \to \mathbb{R}$ be a continuous function. For each $\alpha \in \mathbb{R}$, consider the set

$$K_\alpha = \Big\{ x \in \Lambda : \lim_{t\to\infty} \frac{1}{t} \int_0^t g(\varphi_\tau x) \, d\tau = \alpha \Big\}.$$

One can easily verify that $K_\alpha$ is $\Phi$-invariant. By (4), if $g$ is $\Phi$-cohomologous to a constant $c \in \mathbb{R}$ on $\Lambda$, then $K_c = \Lambda$. Given $\alpha \in \mathbb{R}$, set $E(\alpha) = h(\Phi|K_\alpha)$. The function $E$ is called the entropy spectrum for the Birkhoff averages of $g$.

For every real number $q$, let $\nu_q$ be the equilibrium measure of $qg$, and write $T(q) = P(qg)$, where $P(qg)$ is the topological pressure of $qg$ with respect to $\Phi$. It is well known that $T$ is a real analytic function. We denote by $h_\nu(\Phi|\Lambda)$ the entropy of $\Phi|\Lambda$ with respect to the $\Phi$-invariant measure $\nu$. See Sect. 4.1 for the definition. We now present a multifractal analysis of the spectrum $E$ on basic sets.

Theorem 4. Let $\Lambda$ be a compact basic set of a topologically mixing $C^{1+\varepsilon}$ flow $\Phi$, for some $\varepsilon > 0$, and let $g : \Lambda \to \mathbb{R}$ be a Hölder continuous function with $P(g) = 0$. Then the following properties hold:

1. the domain of $E$ is a closed interval in $[0, \infty)$, which coincides with the range of the function $\alpha = -T'$, and if $q \in \mathbb{R}$ then $E(\alpha(q)) = T(q) + q\alpha(q) = h_{\nu_q}(\Phi|\Lambda)$;
2. if $g$ is not $\Phi$-cohomologous to a constant on $\Lambda$, then $E$ and $T$ are real analytic strictly convex functions.

See Sect. 4.2 for a more detailed description of the spectrum $E$.

2.4. Markov systems. Let $\Lambda$ be a compact basic set of the $C^1$ flow $\Phi = \{\varphi_t\}_t$, and let

$$V^s_\varepsilon(x) = \{y \in B(x, \varepsilon) : d(\varphi_t y, \varphi_t x) \to 0 \text{ as } t \to +\infty\}$$

and

$$V^u_\varepsilon(x) = \{y \in B(x, \varepsilon) : d(\varphi_t y, \varphi_t x) \to 0 \text{ as } t \to -\infty\}$$

be the local stable and unstable manifolds of size $\varepsilon$ at the point $x \in \Lambda$. For each sufficiently small $\varepsilon > 0$, there exists $\delta > 0$ such that if $x, y \in \Lambda$ are at a distance $d(x, y) \le \delta$ then there is a unique time $t = t(x, y) \in [-\varepsilon, \varepsilon]$ for which the set

$$[x, y] \stackrel{\text{def}}{=} V^s_\varepsilon(\varphi_t x) \cap V^u_\varepsilon(y)$$

consists of a single point, and $[x, y] \in \Lambda$.

Let $D \subset M$ be an open disk of dimension $\dim M - 1$ which is transversal to the flow $\Phi$, and let $x \in D$. There exists a diffeomorphism from $D \times (-\varepsilon, \varepsilon)$ onto an open neighborhood $U(x)$ of $x$. The projection map $\pi_D : U(x) \to D$ defined by $\pi_D(\varphi_t y) = y$ is differentiable. A closed set $R \subset \Lambda \cap D$ is called a rectangle if $R = \overline{\operatorname{int} R}$ (where the interior is computed with respect to the topology of $\Lambda \cap D$), and $\pi_D[x, y] \in R$ whenever $x, y \in R$.

Consider a collection of rectangles $R_1, \dots, R_k \subset \Lambda$ (each contained in some disk transversal to the flow) with $R_i \cap R_j = \partial R_i \cap \partial R_j$ for $i \ne j$ such that there exists $\varepsilon > 0$ with:

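1. $\Lambda = \bigcup_{t \in [0, \varepsilon]} \varphi_t \big( \bigcup_{i=1}^k R_i \big)$;
2. for each $i \ne j$ either $(\varphi_t R_i) \cap R_j = \emptyset$ for all $t \in [0, \varepsilon]$ or $(\varphi_t R_j) \cap R_i = \emptyset$ for all $t \in [0, \varepsilon]$.

We define the transfer function $\tau : \Lambda \to [0, \infty)$ by

$$\tau(x) = \min\Big\{ t > 0 : \varphi_t x \in \bigcup_{i=1}^k R_i \Big\}.$$

Let $T : \Lambda \to \bigcup_{i=1}^k R_i$ be the transfer map given by $Tx = \varphi_{\tau(x)} x$. We note that the restriction of $T$ to $\bigcup_{i=1}^k R_i$ is invertible. We say that the rectangles $R_1, \dots, R_k$ form a Markov system for $\Phi$ on $\Lambda$ if

$$T(\operatorname{int}(V^s_\varepsilon(x) \cap R_i)) \subset \operatorname{int}(V^s_\varepsilon(Tx) \cap R_j)$$

and

$$T^{-1}(\operatorname{int}(V^u_\varepsilon(Tx) \cap R_j)) \subset \operatorname{int}(V^u_\varepsilon(x) \cap R_i)$$

whenever $x \in \operatorname{int} T R_i \cap \operatorname{int} R_j$. Any basic set of a $C^1$ flow possesses Markov systems of arbitrary small diameter (see [4, 10]). Furthermore, the map $\tau$ is Hölder continuous on each domain of continuity, and

$$0 < \inf_{x \in \Lambda} \tau \le \sup_{x \in \Lambda} \tau < \infty. \tag{6}$$

Given a Markov system $R_1, \dots, R_k$ for $\Phi$ on the basic set $\Lambda$ we define a $k \times k$ matrix $A$ with entries $a_{ij} = 1$ if $\operatorname{int} T R_i \cap \operatorname{int} R_j \ne \emptyset$, and $a_{ij} = 0$ otherwise. Consider the set $X \subset \{1, \dots, k\}^{\mathbb{Z}}$ defined by

$$X = \{(\cdots i_{-1} i_0 i_1 \cdots) : a_{i_n i_{n+1}} = 1 \text{ for every } n \in \mathbb{Z}\},$$

and the shift map $\sigma : X \to X$ given by $\sigma(\cdots i_0 \cdots) = (\cdots j_0 \cdots)$, where $j_n = i_{n+1}$ for every $n \in \mathbb{Z}$. The map $\sigma|X$ is called a (two-sided) subshift of finite type with transfer matrix $A$. We fix $\beta > 1$ and equip $X$ with the distance $d_X$ defined by

$$d_X((\cdots i_{-1} i_0 i_1 \cdots), (\cdots j_{-1} j_0 j_1 \cdots)) = \sum_{n=-\infty}^{\infty} \beta^{-|n|} |i_n - j_n|. \tag{7}$$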
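The symbolic objects just introduced are easy to experiment with on finite windows. The following is a minimal sketch under assumptions of our own: the $2 \times 2$ matrix `A` below (the "golden mean" matrix forbidding the word 22) is only an example and does not come from any particular Markov system, two-sided sequences are stored as dictionaries $n \mapsto i_n$ over a finite window, and the distance (7) is truncated to that window. The function names are illustrative.

```python
def is_admissible(word, A):
    """Check that a_{i_n i_{n+1}} = 1 along a finite word with symbols 1..k."""
    return all(A[a - 1][b - 1] == 1 for a, b in zip(word, word[1:]))


def shift(seq):
    """Shift map on a window of a two-sided sequence: (sigma x)_n = x_{n+1}."""
    return {n - 1: symbol for n, symbol in seq.items()}


def distance(x, y, beta=2.0):
    """Truncation of (7) to the common window: sum of beta**(-|n|) * |i_n - j_n|."""
    common = x.keys() & y.keys()
    return sum(beta ** (-abs(n)) * abs(x[n] - y[n]) for n in common)


# Example transfer matrix (an assumption for illustration): symbol 2 cannot follow 2.
A = [[1, 1],
     [1, 0]]

x = {-2: 1, -1: 2, 0: 1, 1: 1, 2: 1}
y = {-2: 1, -1: 2, 0: 1, 1: 1, 2: 2}

print(is_admissible([x[n] for n in sorted(x)], A))  # the window of x is admissible
print(is_admissible([1, 2, 2, 1], A))               # contains the forbidden word 22
print(distance(x, y))                               # x and y differ only at n = 2
print(shift(x)[0] == x[1])
```

Sequences that agree on a long central window are close in $d_X$, since disagreement at position $n$ is weighted by $\beta^{-|n|}$.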

We define a coding map $\pi : X \to \bigcup_{i=1}^k R_i$ for the basic set $\Lambda$ by

$$\pi(\cdots i_0 \cdots) = \bigcap_{j \in \mathbb{Z}} \overline{T^{-j} \operatorname{int} R_{i_j}}.$$

One can easily check that $\pi \circ \sigma = T \circ \pi$. As observed in [4], it is always possible to choose the constant $\beta$ in such a way that the function $\tau \circ \pi : X \to [0, \infty)$ is Lipschitz. Markov systems will be used in the proof of Theorem 1.

2.5. Cohomology for flows and maps. We now discuss the cohomology assumption in Theorem 1. We show how to use a Markov system to reduce this assumption to a cohomology assumption using the associated transfer map instead of the original flow. This relation is crucial to our approach.

Given a continuous function $g : \Lambda \to \mathbb{R}$ and a Markov system for the flow $\Phi = \{\varphi_t\}_t$ on the basic set $\Lambda$ with transfer function $\tau : \Lambda \to [0, \infty)$, we define a new function $I_g : \Lambda \to \mathbb{R}$ by

$$I_g(x) = \int_0^{\tau(x)} g(\varphi_s x) \, ds.$$

In particular, if $c \in \mathbb{R}$, then $I_c = c\tau$. We say that a function $G : \Lambda \to \mathbb{R}$ is $T$-cohomologous to a function $H : \Lambda \to \mathbb{R}$ on $\Lambda$ if there exists a bounded measurable function $q : \Lambda \to \mathbb{R}$ such that $G - H = q \circ T - q$ on $\Lambda$.

Theorem 5. Let $\Lambda$ be a basic set of the $C^1$ flow $\Phi$, $g : \Lambda \to \mathbb{R}$ and $h : \Lambda \to \mathbb{R}$ continuous functions, and $\tau$ the transfer function of some Markov system for $\Phi$ on $\Lambda$. Then the following properties are equivalent:

1. $g$ is $\Phi$-cohomologous to $h$ on $\Lambda$ and (3) holds for every $x \in \Lambda$;
2. $I_g$ is $T$-cohomologous to $I_h$ on $\Lambda$ with $I_g(x) - I_h(x) = q(Tx) - q(x)$ for every $x \in \Lambda$.

Theorem 5 allows us to translate the results obtained in [3] in the setting of subshifts of finite type and hyperbolic sets to the setting of hyperbolic flows. Theorem 5 implies that a function $g$ is $\Phi$-cohomologous to a constant $c \in \mathbb{R}$ if and only if $I_g$ is $T$-cohomologous to $c\tau$. In particular, the cohomology assumption in Theorem 1 can be replaced by one in terms of the transfer map $T$ (associated to some Markov system). Therefore, it would be of interest to also describe the convergence and the non-convergence of the Birkhoff averages of the flow in terms of $T$. This is effected in the following statement.

Proposition 6. Let $\Lambda$ be a basic set of the $C^1$ flow $\Phi = \{\varphi_t\}_t$, $g : \Lambda \to \mathbb{R}$ a continuous function, and $\tau$ the transfer function of some Markov system for $\Phi$ on $\Lambda$. Then the following properties hold:

1. if $g : \Lambda \to \mathbb{R}$ is Hölder continuous, then $I_g$ is Hölder continuous on each domain of continuity of $\tau$;
2. if $x \in \Lambda$, then

$$\liminf_{t\to\infty} \frac{1}{t} \int_0^t g(\varphi_s x) \, ds = \liminf_{m\to\infty} \frac{\sum_{i=0}^m I_g(T^i x)}{\sum_{i=0}^m \tau(T^i x)}$$

and

$$\limsup_{t\to\infty} \frac{1}{t} \int_0^t g(\varphi_s x) \, ds = \limsup_{m\to\infty} \frac{\sum_{i=0}^m I_g(T^i x)}{\sum_{i=0}^m \tau(T^i x)};$$

3.

$$B(g) = \Big\{ x \in \Lambda : \lim_{m\to\infty} \frac{\sum_{i=0}^m I_g(T^i x)}{\sum_{i=0}^m \tau(T^i x)} \text{ does not exist} \Big\}. \tag{8}$$

The identity (8) tells us that any irregular set for a hyperbolic flow can be described in terms of the map $T$. However, contrary to the maps considered in [3], $T$ is neither invertible nor hyperbolic.

3. Suspension Flows

3.1. Preliminaries. Let $T : X \to X$ be a homeomorphism of the compact metric space $X$, and $\tau : X \to (0, \infty)$ a Lipschitz function. Consider the space

$$Y = \{(x, s) \in X \times \mathbb{R} : 0 \le s \le \tau(x)\}, \tag{9}$$

with the points $(x, \tau(x))$ and $(Tx, 0)$ identified for each $x \in X$. One can introduce in a natural way a topology on $Y$ which makes $Y$ a compact topological space. This topology is induced by a distance introduced by Bowen and Walters in [5] (see Appendix A for details). The metric structure shall first be used in Sect. 3.2.

The suspension flow over $T$ with height function $\tau$ is the flow $\Psi = \{\psi_t\}_t$ on $Y$, where $\psi_t : Y \to Y$ is defined by

$$\psi_t(x, s) = (x, s + t). \tag{10}$$

We extend $\tau$ to a function $\tau : Y \to \mathbb{R}$ by $\tau(y) = \min\{t > 0 : \psi_t y \in X \times \{0\}\}$, and extend $T$ to a map $T : Y \to X \times \{0\}$ by $T(y) = \psi_{\tau(y)} y$. Since there is no danger of confusion we continue to use the symbols $\tau$ and $T$ for the extensions.

Given a continuous function $g : Y \to \mathbb{R}$ we define a function $I_g : Y \to \mathbb{R}$ by

$$I_g(y) = \int_0^{\tau(y)} g(\psi_s y) \, ds. \tag{11}$$


Theorem 7. If $\Psi = \{\psi_t\}_t$ is a suspension flow on $Y$ over $T : X \to X$, and $g : Y \to \mathbb{R}$ and $h : Y \to \mathbb{R}$ are continuous functions, then the following properties are equivalent:

1. $g$ is $\Psi$-cohomologous to $h$ on $Y$ with

$$g(y) - h(y) = \lim_{t\to 0} \frac{q(\psi_t y) - q(y)}{t} \quad \text{for every } y \in Y;$$

2. $I_g$ is $T$-cohomologous to $I_h$ on $Y$ with $I_g(y) - I_h(y) = q(Ty) - q(y)$ for every $y \in Y$;
3. $I_g|X \times \{0\}$ is $T$-cohomologous to $I_h|X \times \{0\}$ on $X \times \{0\}$ with $I_g(y) - I_h(y) = q(Ty) - q(y)$ for every $y \in X \times \{0\}$.

By Theorem 7 (see Properties 2 and 3), each cohomology class in the base space $X$ induces a cohomology class in the whole space $Y$, and all cohomology classes in $Y$ appear in this way. We also obtain a version of Proposition 6 for suspension flows.

Proposition 8. Let $\Psi = \{\psi_t\}_t$ be a suspension flow on $Y$ over $T : X \to X$ with height function $\tau$, and $g : Y \to \mathbb{R}$ a continuous function. If $x \in X$ and $s \in [0, \tau(x)]$, then

$$\liminf_{t\to\infty} \frac{1}{t} \int_0^t g(\psi_\tau(x, s)) \, d\tau = \liminf_{m\to\infty} \frac{\sum_{i=0}^m I_g(T^i x)}{\sum_{i=0}^m \tau(T^i x)} \tag{12}$$

and

$$\limsup_{t\to\infty} \frac{1}{t} \int_0^t g(\psi_\tau(x, s)) \, d\tau = \limsup_{m\to\infty} \frac{\sum_{i=0}^m I_g(T^i x)}{\sum_{i=0}^m \tau(T^i x)}. \tag{13}$$
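The identities (12) and (13) can be checked numerically in a toy model. The sketch below rests on assumptions of our own, none of which come from the paper: the base point is a periodic symbol sequence, the height function depends only on the symbol $x_0$ (the dictionary `ROOF`), and $g$ interpolates linearly along each fiber between values attached to $x_0$ and $x_1$, which keeps $g$ compatible with the identification of $(x, \tau(x))$ with $(Tx, 0)$. With these choices $I_g(x) = \tau(x)\,(a_{x_0} + a_{x_1})/2$. All names are hypothetical, and the flow integral is a crude left Riemann sum.

```python
SYMBOLS = [0, 1, 1]          # one period of the base point x (assumed example)
ROOF = {0: 1.0, 1: 2.0}      # height tau(x) as a function of x_0 (assumed example)
A_VAL = {0: 1.0, 1: 3.0}     # values used to build g (assumed example)


def g_value(x0, x1, s, height):
    """g(x, s): linear interpolation between A_VAL[x0] at s = 0 and A_VAL[x1] at s = tau(x)."""
    return A_VAL[x0] + (A_VAL[x1] - A_VAL[x0]) * s / height


def flow_time_average(s0, t, dt=1e-3):
    """Left Riemann sum for (1/t) * integral_0^t g(psi_tau(x, s0)) d tau along the orbit."""
    i, s = 0, s0
    total, elapsed = 0.0, 0.0
    n = len(SYMBOLS)
    while elapsed < t:
        x0, x1 = SYMBOLS[i % n], SYMBOLS[(i + 1) % n]
        height = ROOF[x0]
        total += g_value(x0, x1, s, height) * dt
        s += dt
        elapsed += dt
        if s >= height:      # roof reached: (x, tau(x)) is identified with (Tx, 0)
            s -= height
            i += 1
    return total / elapsed


def return_ratio(m):
    """sum_{i=0}^m I_g(T^i x) / sum_{i=0}^m tau(T^i x) with I_g = tau * (a_{x0} + a_{x1}) / 2."""
    n = len(SYMBOLS)
    num = den = 0.0
    for i in range(m + 1):
        x0, x1 = SYMBOLS[i % n], SYMBOLS[(i + 1) % n]
        num += ROOF[x0] * (A_VAL[x0] + A_VAL[x1]) / 2
        den += ROOF[x0]
    return num / den


flow_avg = flow_time_average(s0=0.5, t=200.0)
ratio = return_ratio(119)    # 120 returns = 40 full periods
print(flow_avg, ratio)
```

Both quantities approach the same value, here $12/5$, and changing the starting height `s0` does not change the limit, in line with the remark that the limits in (12) and (13) do not depend on $s$.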

Note that for a fixed $x \in X$ the limits in (12) and (13) are independent of $s$.

One can also consider the case when $T : X \to X$ is continuous but not necessarily a homeomorphism. More precisely, let $T$ be a local homeomorphism in an open neighborhood of each point of the compact metric space $X$, $\tau : X \to (0, \infty)$ a Lipschitz function, and $Y$ as in (9). Note that even if $X$ is a topological manifold and $\tau$ is a constant function, then $Y$ may not be a topological manifold. The suspension semi-flow over $T$ with height function $\tau$ is the semi-flow $\Psi = \{\psi_t\}_t$ on $Y$, where $\psi_t : Y \to Y$ is defined by (10). The statements in Theorem 7 and Proposition 8 also hold for suspension semi-flows.

3.2. Suspension flows over subshifts of finite type. Let now $\Psi = \{\psi_t\}_t$ be a suspension flow on $Y$ over $T : X \to X$. The space $Y$ is equipped with the Bowen–Walters distance (see Appendix A for the definition). Let $g : Y \to \mathbb{R}$ be a continuous function. For each $\alpha \in \mathbb{R}$, set $E(\alpha) = h(\Psi|K_\alpha)$, where

$$K_\alpha = \Big\{ x \in Y : \lim_{t\to\infty} \frac{1}{t} \int_0^t g(\psi_\tau x) \, d\tau = \alpha \Big\}.$$


The topological entropy is computed with respect to (the topology induced by) the Bowen–Walters distance on $Y$. The function $E$ is called the entropy spectrum for the Birkhoff averages of $g$. For every real number $q$, let $\nu_q$ be the equilibrium measure of $qg$, and write $T(q) = P_\Psi(qg)$. The following is a version of Theorem 4 for suspension flows over subshifts of finite type.

Theorem 9. Let $\Psi$ be a suspension flow on $Y$ over a topologically mixing two-sided subshift of finite type, and $g : Y \to \mathbb{R}$ a Hölder continuous function with $P_\Psi(g) = 0$. Then the following properties hold:

1. the domain of $E$ is a closed interval in $[0, \infty)$, which coincides with the range of the function $\alpha = -T'$, and if $q \in \mathbb{R}$ then $E(\alpha(q)) = T(q) + q\alpha(q) = h_{\nu_q}(\Psi)$;
2. if $g$ is not $\Psi$-cohomologous to a constant on $Y$, then $E$ and $T$ are real analytic strictly convex functions.

Given a continuous function $g : Y \to \mathbb{R}$ we consider the irregular set

$$B(g) = \Big\{ y \in Y : \lim_{t\to\infty} \frac{1}{t} \int_0^t g(\psi_\tau y) \, d\tau \text{ does not exist} \Big\}.$$

Set

$$C = \Big\{ x \in X : \lim_{m\to\infty} \frac{\sum_{i=0}^m I_g(T^i x)}{\sum_{i=0}^m \tau(T^i x)} \text{ does not exist} \Big\}.$$

For a suspension flow $\Psi$ and a continuous function $g$ on $Y$, it follows from Proposition 8 that

$$B(g) = \{(x, s) \in Y : x \in C \text{ and } s \in [0, \tau(x)]\}.$$

We now formulate a version of Theorem 1 for suspension flows over subshifts of finite type.

Theorem 10. Let $\Psi$ be a suspension flow on $Y$ over a topologically mixing two-sided subshift of finite type, and $g : Y \to \mathbb{R}$ a Hölder continuous function. Then the following properties are equivalent:

1. $g$ is not $\Psi$-cohomologous to a constant on $Y$;
2. $B(g)$ is a non-empty proper dense set with

$$h(\Psi|B(g)) = h(\Psi). \tag{14}$$


Abramov's entropy formula shows that

$$h(\Psi) = \sup_\mu \frac{h_\mu(T)}{\int_X \tau \, d\mu} = \frac{h_\nu(T)}{\int_X \tau \, d\nu},$$

where the supremum is taken over all $T$-invariant probability measures. Here, $h_\mu(T)$ is the entropy of $T$ with respect to $\mu$, and $\nu$ is the equilibrium measure of $-h(\Psi)\tau$.

One can also consider one-sided subshifts of finite type $T : X \to X$. It is easy to verify that in this case $T$ is a local homeomorphism in an open neighborhood of each point. The statements in Theorem 10 hold for suspension semi-flows over one-sided subshifts of finite type. See also Sect. 5.1 below.

Given a basic set of a hyperbolic flow, each Markov system has a naturally associated suspension flow over a two-sided subshift of finite type. In fact these are the primary examples of suspension flows. We now describe this construction. If $\Lambda$ is a basic set of the $C^1$ flow $\Phi = \{\varphi_t\}_t$, then given a Markov system there is an associated transfer function $\tau : \Lambda \to \mathbb{R}$ (which is Hölder continuous on each domain of continuity), and an associated two-sided subshift of finite type $\sigma : X \to X$ with coding map $\pi : X \to \Lambda$ (see Sect. 2.4). Therefore, to each Markov system one can naturally associate the suspension flow $\Psi = \{\psi_t\}_t$ on $Y$ over $\sigma$ with Lipschitz height function $\tau \circ \pi$ (see Sect. 2.4). We extend $\pi$ to a finite-to-one surjection $\pi : Y \to \Lambda$ by $\pi(x, s) = (\varphi_s \circ \pi)(x)$ for every $(x, s) \in Y$. Then

$$\pi \circ \psi_t = \varphi_t \circ \pi. \tag{15}$$
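Abramov's entropy formula, stated earlier in this subsection, can be illustrated in the simplest symbolic model. The sketch below makes assumptions that are ours alone: the base is the full shift on two symbols, the height function takes the value $r_i$ on the cylinder $x_0 = i$, and we use the standard fact (assumed here, not proved in the text) that the pressure of a function of $x_0$ on the full shift is $\log \sum_i e^{f_i}$, so that $h(\Psi)$ is the root $h$ of $\sum_i e^{-h r_i} = 1$ and the equilibrium measure of $-h(\Psi)\tau$ is Bernoulli with weights $e^{-h r_i}$. The function names are illustrative.

```python
import math


def flow_entropy(roofs, tol=1e-12):
    """Solve sum_i exp(-h * r_i) = 1 for h by bisection (assumed pressure equation)."""
    def f(h):
        return sum(math.exp(-h * r) for r in roofs) - 1.0

    lo, hi = 0.0, 1.0
    while f(hi) > 0:          # enlarge the bracket until the sign changes
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def abramov_ratio(p, roofs):
    """h_mu(T) / integral of tau d mu for the Bernoulli measure with weights p."""
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return entropy / sum(pi * r for pi, r in zip(p, roofs))


roofs = (1.0, 2.0)                       # hypothetical roof values
h = flow_entropy(roofs)
p_star = tuple(math.exp(-h * r) for r in roofs)
print(h, abramov_ratio(p_star, roofs))
```

For the roof values $(1, 2)$ the root is the logarithm of the golden ratio, the Bernoulli measure `p_star` attains it, and every other Bernoulli measure gives a ratio that is no larger, as the supremum in Abramov's formula requires.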

Observe that the function $g \circ \pi : Y \to \mathbb{R}$ is Hölder continuous whenever $g : \Lambda \to \mathbb{R}$ is Hölder continuous. Using (15) one can show that $B(g) = \pi(B(g \circ \pi))$. This can be used to establish the identity in (5) from the identity in (14).

4. Multifractal Analysis and Irregular Sets

4.1. A new Carathéodory dimension for flows. We introduce here a new Carathéodory dimension characteristic for flows. It is a generalization of the topological entropy, and is a flow version of a Carathéodory dimension characteristic introduced in [3] in the case of maps.

Let $\Psi = \{\psi_t\}_t$ be a continuous flow of the compact metric space $(Y, d)$. Given $x \in Y$, $t > 0$, and $\varepsilon > 0$, we write

$$B(x, t, \varepsilon) = \{y \in Y : d(\psi_\tau y, \psi_\tau x) < \varepsilon \text{ whenever } 0 \le \tau \le t\}.$$

Let $u : Y \to \mathbb{R}$ be a strictly positive continuous function. We write

$$U(x, t, \varepsilon) = \sup\Big\{ \int_0^t u(\psi_\tau y) \, d\tau : y \in B(x, t, \varepsilon) \Big\}$$

if $B(x, t, \varepsilon) \ne \emptyset$, and $U(x, t, \varepsilon) = -\infty$ otherwise. For each set $Z \subset Y$ and each $\alpha \in \mathbb{R}$, we define

$$M(Z, \alpha, u, \varepsilon) = \lim_{T\to\infty} \inf_{\Gamma} \sum_{(x, t) \in \Gamma} \exp(-\alpha U(x, t, \varepsilon)),$$

where the infimum is taken over all finite or countable sets $\Gamma = \{(x_i, t_i)\}_i$ such that $(x_i, t_i) \in Y \times [T, \infty)$ for each $i$, and $\bigcup_i B(x_i, t_i, \varepsilon) \supset Z$. We define the number

$$\dim_{u,\varepsilon} Z = \inf\{\alpha : M(Z, \alpha, u, \varepsilon) = 0\}.$$

The limit

$$\dim_u Z \stackrel{\text{def}}{=} \lim_{\varepsilon\to 0} \dim_{u,\varepsilon} Z$$

exists, and is called the $u$-dimension of $Z$ (with respect to $\Psi$). If $u$ is the constant function equal to 1, then $\dim_u Z$ is called the topological entropy of $\Psi$ on $Z$, and is denoted by $h(\Psi|Z)$. If $Z$ is compact and $\Psi$-invariant, then we recover the well-known notion of topological entropy

$$h(\Psi|Z) = \lim_{\varepsilon\to 0} \liminf_{t\to\infty} \frac{\log N_Z(t, \varepsilon)}{t} = \lim_{\varepsilon\to 0} \limsup_{t\to\infty} \frac{\log N_Z(t, \varepsilon)}{t},$$

where $N_Z(t, \varepsilon)$ is the least number of sets $B(x, t, \varepsilon)$ needed to cover $Z$.

For every Borel probability measure $\nu$ on $Y$, let

$$\dim_{u,\varepsilon} \nu = \inf\{\dim_{u,\varepsilon} Z : \nu(Z) = 1\}.$$

The limit

$$\dim_u \nu \stackrel{\text{def}}{=} \lim_{\varepsilon\to 0} \dim_{u,\varepsilon} \nu$$

exists, and is called the $u$-dimension of $\nu$. If $u = 1$, then $\dim_u \nu$ is called the entropy of $\Psi$ with respect to $\nu$, and is denoted by $h_\nu(\Psi)$. We also define the lower and upper $u$-pointwise dimensions of $\nu$ at the point $x \in Y$ by

$$\underline{d}_{\nu,u}(x) = \lim_{\varepsilon\to 0} \liminf_{t\to\infty} -\frac{\log \nu(B(x, t, \varepsilon))}{U(x, t, \varepsilon)}$$

and

$$\overline{d}_{\nu,u}(x) = \lim_{\varepsilon\to 0} \limsup_{t\to\infty} -\frac{\log \nu(B(x, t, \varepsilon))}{U(x, t, \varepsilon)}.$$

4.2. Suspension flows over subshifts of finite type. Let $\Psi = \{\psi_t\}_t$ be a suspension flow on $Y$ over a homeomorphism $T : X \to X$ of the compact metric space $X$, and $\mu$ a $T$-invariant Borel probability measure in $X$. It is well known that $\mu$ induces a $\Psi$-invariant probability measure $\nu$ in $Y$ such that

$$\int_Y g \, d\nu = \frac{\int_X \int_0^{\tau(x)} g(x, s) \, ds \, d\mu(x)}{\int_X \tau \, d\mu} \tag{16}$$

for every continuous function $g : Y \to \mathbb{R}$, and that any $\Psi$-invariant measure $\nu$ in $Y$ is of this form for some $T$-invariant Borel probability measure $\mu$ in $X$. We remark that the identity in (16) is equivalent to

$$\int_Y g \, d\nu = \frac{\int_X I_g \, d\mu}{\int_X \tau \, d\mu}, \tag{17}$$

where the function $I_g$ is defined by (11).

We now consider the space $Y$ equipped with the Bowen–Walters distance (see Appendix A). For every real number $\alpha$, set

$$K_\alpha = \{y \in Y : \underline{d}_{\nu,u}(y) = \overline{d}_{\nu,u}(y) = \alpha\}.$$

Whenever $K_\alpha \ne \emptyset$ and $y \in K_\alpha$, the common value $\alpha$ of $\underline{d}_{\nu,u}(y)$ and $\overline{d}_{\nu,u}(y)$ is denoted by $d_{\nu,u}(y)$, and is called the $u$-pointwise dimension of $\nu$ at $y$. We set $D_u(\alpha) = \dim_u K_\alpha$. The function $D_u$ is called the $u$-dimension spectrum for $u$-pointwise dimensions (with respect to the measure $\nu$).

We now consider the special case when $T$ is a subshift of finite type.

Proposition 11. Let $\Psi = \{\psi_t\}_t$ be a suspension flow on $Y$ over a topologically mixing two-sided subshift of finite type, $\nu$ an equilibrium measure for $\Psi$ with Hölder continuous potential, and $u : Y \to \mathbb{R}$ a Hölder continuous positive function. If $y \in Y$ and $\varepsilon > 0$, then

$$\underline{d}_{\nu,u}(y) = \liminf_{t\to\infty} -\frac{\log \nu(B(y, t, \varepsilon))}{\int_0^t u(\psi_\tau y) \, d\tau}$$

and

$$\overline{d}_{\nu,u}(y) = \limsup_{t\to\infty} -\frac{\log \nu(B(y, t, \varepsilon))}{\int_0^t u(\psi_\tau y) \, d\tau}.$$

Notice that the limits in the proposition are independent of $\varepsilon$.

Let $g : Y \to \mathbb{R}$ be a continuous function. By (17) and Abramov's entropy formula, we obtain

$$h_\nu(\Psi) + \int_Y g \, d\nu = \frac{h_\mu(T) + \int_X I_g \, d\mu}{\int_X \tau \, d\mu}, \tag{18}$$

whenever $\mu$ is a $T$-invariant probability measure in $X$, and $\nu$ is the $\Psi$-invariant probability measure induced by $\mu$ in $Y$. Since $\tau > 0$, we conclude from (18) that $P_\Psi(g) = 0$ if and only if $P_T(I_g) = 0$, where $P_T(I_g)$ is the topological pressure of $I_g$ with respect to $T$. Therefore, when $P_\Psi(g) = 0$ the measure $\nu$ is an equilibrium measure of $g$ (with respect to $\Psi$) if and only if $\mu$ is an equilibrium measure of $I_g|X$ (with respect to $T$).

For every real number $q$, we define the function $g_q : Y \to \mathbb{R}$ by $g_q = -T_u(q)u + qg$, where the number $T_u(q)$ is chosen so that $P_\Psi(g_q) = 0$. The above discussion shows that $T_u(q)$ is equivalently specified by the equation $P_T(I_{g_q}) = 0$, where $P_T(I_{g_q})$ is the topological pressure of $I_{g_q}$ with respect to $T$. We denote by $\nu_q$ and $m_u$, respectively, the equilibrium measures of $g_q$ and $-\dim_u X \cdot u$ with respect to $\Psi$.

The following is a complete multifractal analysis of the spectrum $D_u$ for suspension flows over subshifts of finite type.


Theorem 12. Let $\Psi$ be a suspension flow on $Y$ over a topologically mixing two-sided subshift of finite type, $u : Y \to \mathbb{R}$ a Hölder continuous positive function, and $\nu$ an equilibrium measure for $\Psi$ with Hölder continuous potential $g : Y \to \mathbb{R}$ such that $P_\Psi(g) = 0$. Then the following properties hold:
1. for $\nu$-almost every $y \in Y$,
\[ d_{\nu,u}(y) = \frac{h_\nu(\Psi)}{\int_Y u\,d\nu}; \]
2. $T_u$ is real analytic, and satisfies $T_u'(q) \le 0$ and $T_u''(q) \ge 0$ for every $q \in \mathbb{R}$, with $T_u(0) = \dim_u Y$ and $T_u(1) = 0$;
3. the domain of $D_u$ is a closed interval in $[0, \infty)$, which coincides with the range of the function $\alpha_u = -T_u'$, and if $q \in \mathbb{R}$, then $D_u(\alpha_u(q)) = T_u(q) + q\alpha_u(q)$;
4. for every $q \in \mathbb{R}$, $\nu_q(K_{\alpha_u(q)}) = 1$, and $d_{\nu_q,u}(x) = T_u(q) + q\alpha_u(q)$ for $\nu_q$-almost all $x \in K_{\alpha_u(q)}$; moreover, $\overline{d}_{\nu_q,u}(x) \le T_u(q) + q\alpha_u(q)$ for every $x \in K_{\alpha_u(q)}$, and $D_u(\alpha_u(q)) = \dim_u \nu_q$ for every $q \in \mathbb{R}$;
5. if $\nu \neq m_u$, then $D_u$ and $T_u$ are real analytic strictly convex functions.

Theorem 12 is a flow version of Theorem 6.6 in [3], which in turn follows from work of Pesin and Weiss [9], and Schmeling [11]. Setting $u = 1$ in Theorem 12 we obtain a complete multifractal analysis of the spectrum
\[ E(\alpha) = h(\Psi \mid \{y \in Y : h_\nu(y) = \alpha\}), \]
where
\[ h_\nu(y) = \lim_{t\to\infty} -\frac{1}{t}\log \nu(B(y,t,\varepsilon)) = \lim_{t\to\infty} -\frac{1}{t}\int_0^t g(\psi_\tau y)\,d\tau. \tag{19} \]

The function E is called entropy spectrum for local entropies (with respect to the measure ν), and coincides with the entropy spectrum for the Birkhoff averages of g. In the case of axiom A diffeomorphisms this spectrum was studied in [1]. We note that the statements in Proposition 11 and Theorem 12 also hold for suspension semi-flows over one-sided subshifts of finite type.
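As a purely illustrative sketch (not taken from the paper), the equation $P_T(I_{g_q}) = 0$ that defines $T_u(q)$ can be solved numerically in the simplest symbolic model. The assumptions are: a full shift on finitely many symbols, a roof function $\tau$ and a potential that depend only on the first symbol, and $u = 1$, so that $I_u = \tau$. In this model the pressure of a function $f$ depending on the first symbol is $\log \sum_i e^{f_i}$, so $T_u(q)$ is the root of $\sum_i \exp(-T\tau_i + q a_i) = 1$, where $a_i$ is the value of $I_g$ on the $i$-th cylinder and $\sum_i e^{a_i} = 1$. The names `pressure_gap`, `solve_T`, `alpha`, `spectrum`, `tau`, `p`, and `a` are hypothetical.

```python
import math

def pressure_gap(T, q, tau, a):
    """sum_i exp(-T*tau_i + q*a_i) - 1; it vanishes exactly when
    P_T(I_{g_q}) = 0 in the locally constant full-shift model assumed here."""
    return sum(math.exp(-T * t + q * ai) for t, ai in zip(tau, a)) - 1.0

def solve_T(q, tau, a, iterations=200):
    """Bisection for T_u(q); pressure_gap is strictly decreasing in T since tau_i > 0."""
    lo, hi = -1.0, 1.0
    while pressure_gap(lo, q, tau, a) < 0:   # enlarge bracket to the left
        lo *= 2.0
    while pressure_gap(hi, q, tau, a) > 0:   # enlarge bracket to the right
        hi *= 2.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if pressure_gap(mid, q, tau, a) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def alpha(q, tau, a):
    """alpha_u(q) = -T_u'(q), obtained by implicit differentiation of the pressure equation."""
    T = solve_T(q, tau, a)
    w = [math.exp(-T * t + q * ai) for t, ai in zip(tau, a)]
    return -sum(wi * ai for wi, ai in zip(w, a)) / sum(wi * t for wi, t in zip(w, tau))

def spectrum(q, tau, a):
    """Return (alpha_u(q), D_u(alpha_u(q))) using D_u(alpha_u(q)) = T_u(q) + q*alpha_u(q)."""
    al = alpha(q, tau, a)
    return al, solve_T(q, tau, a) + q * al

tau = [1.0, 2.0]                    # roof values on the two cylinders (assumed)
p = [0.3, 0.7]                      # Bernoulli weights (assumed)
a = [math.log(pi) for pi in p]      # I_g on each cylinder, so that sum(exp(a)) == 1
```

In this toy model `solve_T(1.0, tau, a)` is zero up to rounding, in agreement with $T_u(1) = 0$, and at $q = 1$ the value returned by `spectrum` reduces to $h_\mu(T)/\int_X \tau\,d\mu$, the quantity appearing in the first property of Theorem 12 when $u = 1$.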


4.3. Irregular sets. In this section we establish a version of the results in Sect. 3 for u-dimension. Consider again a continuous flow $\Psi = \{\psi_t\}_t$ on $Y$. Given continuous functions $g_1, \ldots, g_k : Y \to \mathbb{R}$ and $u : Y \to \mathbb{R}$, with $u$ positive, we define the irregular set $F(g_1, \ldots, g_k; u)$ by
\[ \left\{ y \in Y : \lim_{t\to\infty} \frac{\int_0^t g_j(\psi_s y)\,ds}{\int_0^t u(\psi_s y)\,ds} \text{ does not exist for } j = 1, \ldots, k \right\}. \tag{20} \]
One can show that
\[ F(g_1, \ldots, g_k; u) = \{(x, s) : x \in C(g_1, \ldots, g_k; u) \text{ and } s \in [0, \tau(x)]\}, \tag{21} \]
where $C(g_1, \ldots, g_k; u)$ is the set
\[ \left\{ x \in X : \lim_{m\to\infty} \frac{\sum_{i=0}^m I_{g_j}(T^i x)}{\sum_{i=0}^m I_u(T^i x)} \text{ does not exist for } j = 1, \ldots, k \right\}. \tag{22} \]

The proof is a modification of the proof of Proposition 6.

Theorem 13. Let $\Psi$ be a suspension flow on $Y$ over a topologically mixing two-sided subshift of finite type, and $g_1, \ldots, g_k, u : Y \to \mathbb{R}$ Hölder continuous functions, with $u$ positive. Then the following properties are equivalent:
1. the function $g_j$ is not $\Psi$-cohomologous to a multiple of $u$ on $Y$ for each $j = 1, \ldots, k$;
2. $\dim_u F(g_1, \ldots, g_k; u) = \dim_u Y$.

Setting $u = 1$, we have
\[ F(g_1, \ldots, g_k; 1) = \bigcap_{j=1}^k B(g_j). \]
Hence, under the hypotheses of Theorem 13, if the function $g_j$ is not $\Psi$-cohomologous to a constant on $Y$ for each $j = 1, \ldots, k$, then
\[ h\Bigl(\Psi \Bigm| \bigcap_{j=1}^k B(g_j)\Bigr) = h(\Psi). \]
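To make the set in (22) concrete, the following sketch builds one explicit symbolic point whose ratios $\sum_i I_{g}(T^i x)/\sum_i I_u(T^i x)$ do not converge. It is an illustration under stated assumptions, not a construction from the paper: the point is a sequence of alternating blocks of 0s and 1s with doubling lengths, and $I_g$ and $I_u$ are assumed to depend only on the current symbol. The names `block_sequence` and `birkhoff_ratios` and the numerical values are hypothetical.

```python
def block_sequence(num_blocks):
    """Symbols 0, 1 in alternating blocks of lengths 1, 2, 4, ..., 2**(num_blocks - 1)."""
    seq = []
    for k in range(num_blocks):
        seq.extend([k % 2] * (2 ** k))
    return seq

def birkhoff_ratios(seq, I_g, I_u):
    """Running ratios sum_{i<=m} I_g(T^i x) / sum_{i<=m} I_u(T^i x), where
    I_g and I_u depend only on the current symbol (an assumption)."""
    num, den, out = 0.0, 0.0, []
    for s in seq:
        num += I_g[s]
        den += I_u[s]
        out.append(num / den)
    return out

seq = block_sequence(16)
ratios = birkhoff_ratios(seq, I_g=[0.0, 1.0], I_u=[1.0, 2.0])
tail = ratios[len(ratios) // 4:]   # discard the initial transient
print(min(tail), max(tail))
```

Along this sequence the ratios keep oscillating between values close to 0.25 (at the end of each block of 0s) and close to 0.4 (at the end of each block of 1s), so the limit in (22) does not exist at this point.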

One can also consider suspension semi-flows over one-sided subshifts of finite type, and obtain a corresponding version of Theorem 13. An application of this is given in the following section.


5. Suspensions over Hyperbolic Dynamical Systems

5.1. Suspension semi-flows over expanding maps. Let $T : M \to M$ be a $C^1$ map of a smooth compact manifold $M$, and $\Lambda \subset M$ a $T$-invariant set such that $T$ is expanding on $\Lambda$. This means that there exist constants $c > 0$ and $\beta > 1$ such that $\|d_x T^n v\| \ge c\beta^n \|v\|$ for all $x \in \Lambda$, $v \in T_x M$, and $n \in \mathbb{N}$. We say that $\Lambda$ is a repeller of $T$. It is well known that repellers admit Markov partitions of arbitrarily small diameter. Each Markov partition has associated a one-sided subshift of finite type $\sigma : X \to X$, and a coding map $\pi : X \to \Lambda$ for the repeller, which is Hölder continuous, onto, finite-to-one, and satisfies $T \circ \pi = \pi \circ \sigma$.

Consider a Markov partition for $\Lambda$, and the associated coding map $\pi : X \to \Lambda$. Let $\Psi$ be the associated suspension semi-flow on $Y$ over the one-sided subshift of finite type $\sigma : X \to X$, with $Y$ equipped with the Bowen–Walters distance. We define a function $u : X \to \mathbb{R}$ by
\[ u(x) = \log \|d_{\pi x} T\|. \tag{23} \]
One says that $T$ is conformal on $\Lambda$ if $d_x T$ is a multiple of an isometry for each $x \in \Lambda$. One can show that if $T$ is conformal on $\Lambda$, then $\dim_H Z = 1 + \dim_u \pi^{-1} Z$ for every $\Psi$-invariant set $Z \subset \Lambda$. This follows from work of Schmeling [12].

Let $\nu$ be a $\Psi$-invariant probability measure on $Y$. For every real number $\alpha$, set
\[ K_\alpha = \left\{ y \in Y : \lim_{r\to 0} \frac{\log \nu(B(y, r))}{\log r} = \alpha \right\}, \]
where $B(y, r) \subset Y$ is the Bowen–Walters ball with radius $r$ centered at $y \in Y$. The function $D(\alpha) = \dim_H K_\alpha$ is called the dimension spectrum for pointwise dimensions (with respect to the measure $\nu$). Let $\mu$ be the measure in $X$ associated to $\nu$ as in Sect. 4.2. By Proposition 17 in Appendix A, for each $y = (x, s) \in Y$ there exists $c \ge 1$ such that if $r$ is sufficiently small, then
\[ B_X(x, r/c) \times (s - r/c, s + r/c) \subset B(y, r) \subset B_X(x, cr) \times (s - cr, s + cr). \]
Therefore,
\[ K_\alpha = \left\{ (x, s) \in Y : \lim_{r\to 0} \frac{\log \mu(B_X(x, r))}{\log r} = \alpha - 1 \right\}. \tag{24} \]
Since each set $K_\alpha$ is $\Psi$-invariant, if $u$ is as in (23), then $D(\alpha) = 1 + D_u(\alpha - 1)$. Proceeding in a similar way to that in Sect. 4.2 one can now effect a multifractal analysis of the spectrum $D$. We use the same notation as in Sect. 4.2. The following is an immediate consequence of Theorem 12 and the above discussion, together with the appropriate versions of Propositions 17 and 19 in Appendix A for locally invertible maps.


Theorem 14. For a repeller $\Lambda$ of a topologically mixing $C^1$ map $T$ which is conformal on $\Lambda$, let $\Psi$ be the suspension semi-flow on $Y$ over the one-sided subshift of finite type associated to some Markov partition of $\Lambda$, and $\nu$ an equilibrium measure for $\Psi$ with Hölder continuous potential $g : Y \to \mathbb{R}$ such that $P_\Psi(g) = 0$. Then the following properties hold:
1. for $\nu$-almost every $y \in Y$,
\[ \lim_{r\to 0} \frac{\log \nu(B(y, r))}{\log r} = 1 + \frac{h_\mu(T)}{\int_X (\log \|dT\| \circ \pi)\,d\mu}; \]
2. if $T = T_u + 1$, $\alpha = -T'$, and $q \in \mathbb{R}$, then $D(\alpha(q)) = T(q) + q\alpha(q)$;
3. for every $q \in \mathbb{R}$, $\nu_q(K_{\alpha(q)}) = 1$, and
\[ \lim_{r\to 0} \frac{\log \nu_q(B(y, r))}{\log r} = T(q) + q\alpha(q) \]
for $\nu_q$-almost all $y \in K_{\alpha(q)}$; moreover,
\[ \limsup_{r\to 0} \frac{\log \nu_q(B(y, r))}{\log r} \le T(q) + q\alpha(q) \]
for every $y \in K_{\alpha(q)}$, and $D(\alpha(q)) = \dim_H \nu_q$ for every $q \in \mathbb{R}$;
4. if $\nu \neq m_u$, then $D$ and $T$ are real analytic strictly convex functions.

The following statement follows easily from a version of Theorem 13 for suspension semi-flows over one-sided subshifts of finite type.

Theorem 15. Under the hypothesis of Theorem 14, if $\nu \neq m_u$ then
\[ \dim_H \left\{ y \in Y : \lim_{r\to 0} \frac{\log \nu(B(y, r))}{\log r} \text{ does not exist} \right\} = \dim_H Y. \]

5.2. Suspension flows over axiom A diffeomorphisms. Let $\Lambda$ be a basic set of a $C^1$ flow $\Phi$. Given a Markov system, we consider the associated two-sided subshift of finite type $\sigma : X \to X$, and coding map $\pi : X \to \Lambda$ (see Sects. 2.4 and 3.2). Let $\beta_s : X \to \mathbb{R}$ and $\beta_u : X \to \mathbb{R}$ be Hölder continuous positive functions. For each cylinder set
\[ C_{i_{-n} \cdots i_m} = \{(\cdots j_0 \cdots) : j_k = i_k \text{ for } -n \le k \le m\}, \]
write
\[ \beta_s(C_{i_{-n} \cdots i_m}) = \sup\left\{ \sum_{k=0}^m \beta_s(\sigma^k x) : x \in C_{i_{-n} \cdots i_m} \right\} \]
and
\[ \beta_u(C_{i_{-n} \cdots i_m}) = \sup\left\{ \sum_{k=0}^n \beta_u(\sigma^{-k} x) : x \in C_{i_{-n} \cdots i_m} \right\}. \]


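Given $\alpha \in \mathbb{R}$, consider the function
\[ M(Z, \alpha) = \lim_{\ell\to\infty} \inf_{\Gamma} \sum_{C \in \Gamma} \exp(-\alpha\beta_s(C) - \alpha\beta_u(C)), \]
where the infimum is taken over all covers $\Gamma$ of $Z$ by cylinders $C_{i_{-n} \cdots i_m}$ with $m > \ell$ and $n > \ell$. We define the $(\beta_s, \beta_u)$-dimension of $Z$ by
\[ \dim_{\beta_s,\beta_u} Z = \inf\{\alpha : M(Z, \alpha) = 0\}. \]
Let again $\Lambda$ be a basic set of a $C^1$ flow $\Phi = \{\varphi_t\}_t$. We say that the flow $\Phi$ is conformal on $\Lambda$ if the maps
\[ d_x\varphi_t|E^s(x) : E^s(x) \to E^s(\varphi_t x) \quad\text{and}\quad d_x\varphi_t|E^u(x) : E^u(x) \to E^u(\varphi_t x) \]
are multiples of isometries for each $x \in \Lambda$ and $t \in \mathbb{R}$. We give two examples of $(\beta_s, \beta_u)$-dimension:

1. Let $\Lambda$ be a basic set of a $C^1$ flow $\Phi$ such that $\Phi$ is conformal on $\Lambda$. Let $T$ be the transfer map associated to some Markov system for $\Phi$ on $\Lambda$, and $\pi : Y \to \Lambda$ be the associated coding map. Consider the functions $\beta_s : X \to \mathbb{R}$ and $\beta_u : X \to \mathbb{R}$ defined by
\[ \beta_s(x) = \log \|d_{\pi x}\varphi_{\tau(\pi x)}|E^s(\pi x)\| \tag{25} \]
and
\[ \beta_u(x) = -\log \|d_{\pi x}\varphi_{\tau(\pi x)}|E^u(\pi x)\|. \tag{26} \]
Note that
\[ \sum_{k=0}^{n-1} \beta_s(\sigma^k x) = \log \|d_{\pi x}\varphi_{\tau_n(\pi x)}|E^s(\pi x)\|, \qquad \sum_{k=0}^{n-1} \beta_u(\sigma^{-k} x) = \log \|d_{\pi x}\varphi_{-\tau_n(\pi x)}|E^u(\pi x)\|, \]
where
\[ \tau_n(\pi x) = \sum_{k=0}^{n-1} \tau(\pi(\sigma^k x)). \]
Then
\[ \dim_H Z = 1 + \dim_{\beta_s,\beta_u} \pi^{-1} Z \tag{27} \]
for every $\Psi$-invariant set $Z \subset \Lambda$.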


2. Let $\Lambda$ be a basic set of a $C^1$ axiom A diffeomorphism $f$ such that $d_x f|E^s(x)$ and $d_x f|E^u(x)$ are multiples of isometries for each $x \in \Lambda$. Consider a Markov partition for $\Lambda$, and the associated coding map $\pi : X \to \Lambda$. Define functions $\beta_s : X \to \mathbb{R}$ and $\beta_u : X \to \mathbb{R}$ by
\[ \beta_s(x) = \log \|d_{\pi x} f|E^s(\pi x)\| \quad\text{and}\quad \beta_u(x) = -\log \|d_{\pi x} f|E^u(\pi x)\|. \]
Then
\[ \dim_H Z = \dim_{\beta_s,\beta_u} \pi^{-1} Z \tag{28} \]
for every set $Z \subset \Lambda$.

The identities in (27) and (28) follow from work of Schmeling [12]. In what follows we shall only consider the first situation. A straightforward modification applies to the second one.

We briefly present another description of the $(\beta_s, \beta_u)$-dimension. When $X$ is equipped with the distance in (7), the map $\pi$ is in general only Hölder continuous. We will introduce a new distance $d'_X$ in $X$ (inducing the same topology as $d_X$) such that for a certain class of flows (the flows which are conformal on $\Lambda$; see the definition above in this section) the map $\pi : (X, d'_X) \to \pi(X)$ is locally Lipschitz with Lipschitz inverse, and thus it preserves the Hausdorff dimension. We define a new distance $d'_X$ in $X$ by
\[ d'_X((\cdots i_0 \cdots), (\cdots j_0 \cdots)) = |i_0 - j_0| + \beta_s(C_{i_{-n_u} \cdots i_{n_s}}) + \beta_u(C_{i_{-n_u} \cdots i_{n_s}}), \]
where
\[ n_s = \max\{n \in \mathbb{N} : i_k = j_k \text{ for all } k \le n\} \quad\text{and}\quad n_u = \max\{n \in \mathbb{N} : i_k = j_k \text{ for all } k \ge -n\}. \]
Since $\operatorname{diam}_{d'_X} C = \beta_s(C) + \beta_u(C)$ for every cylinder $C$, the $(\beta_s, \beta_u)$-dimension coincides with the Hausdorff dimension with respect to $d'_X$. The distance $d'_X$ induces a new Bowen–Walters distance in $Y$. One can easily verify that this distance induces the same topology in $Y$ as the original Bowen–Walters distance obtained from $d_X$.

Let $\Lambda$ be a basic set of a $C^1$ flow $\Phi$, and $\nu$ be a $\Phi$-invariant probability measure on $\Lambda$. For every $\alpha \in \mathbb{R}$, let
\[ K_\alpha = \left\{ y \in \Lambda : \lim_{r\to 0} \frac{\log \nu(B(y, r))}{\log r} = \alpha \right\}. \]
With the help of a Markov system, one can show that the set $K_\alpha$ satisfies an identity similar to that in (24). It follows from work of Barreira, Pesin, and Schmeling [2] that $\nu(\bigcup_{\alpha \in \mathbb{R}} K_\alpha) = 1$. Consider now the dimension spectrum for pointwise dimensions (with respect to the measure $\nu$) defined by $D(\alpha) = \dim_H K_\alpha$.


In a similar way to that in Sect. 5.1, if $\Phi$ is conformal on $\Lambda$, then
\[ D(\alpha) = 1 + \dim_{\beta_s,\beta_u}(X \cap \pi^{-1} K_{\alpha-1}), \]
with $\beta_s$ and $\beta_u$ as in (25) and (26). Given a continuous function $g : \Lambda \to \mathbb{R}$, let $t_s(q)$ and $t_u(q)$ be the unique numbers such that
\[ P_T(-t_s(q)\beta_s + qg \circ \pi) = P_T(-t_u(q)\beta_u + qg \circ \pi) = 0. \]
We write $T(q) = 1 + t_s(q) + t_u(q)$. One can also formulate a version of Theorem 14 for basic sets.

Theorem 16. Let $\Lambda$ be a compact basic set of a topologically mixing $C^{1+\varepsilon}$ flow $\Phi$, for some $\varepsilon > 0$, such that $\Phi$ is conformal on $\Lambda$, and $\nu$ an equilibrium measure for $\Phi$ with Hölder continuous potential $g : \Lambda \to \mathbb{R}$ such that $P_\Phi(g) = 0$. Then the following properties hold:
1. for $\nu$-almost every $y \in \Lambda$,
\[ \lim_{r\to 0} \frac{\log \nu(B(y, r))}{\log r} = 1 - \frac{h_\mu(T)}{\int_X \beta_s\,d\mu} - \frac{h_\mu(T)}{\int_X \beta_u\,d\mu}; \]
2. if $\alpha = -T'$ then $D(\alpha(q)) = T(q) + q\alpha(q)$ for every $q \in \mathbb{R}$;
3. if $\mu$ is not a measure of maximal dimension on $X$, then $D$ and $T$ are real analytic strictly convex functions.

Theorem 16 was established by Pesin and Sadovskaya in [8].

6. Proofs

6.1. Proofs of the results in Section 3. Proof of Theorem 7. Assume that $g$ is $\Psi$-cohomologous to $h$ on $Y$. If $x \in Y$ then
\begin{align*}
I_g(x) - I_h(x) &= \int_0^{\tau(x)} \lim_{t\to 0} \frac{q(\psi_t\psi_s x) - q(\psi_s x)}{t}\,ds \\
&= \lim_{t\to 0} \frac{1}{t}\left( \int_t^{\tau(x)+t} q(\psi_s x)\,ds - \int_0^{\tau(x)} q(\psi_s x)\,ds \right) \\
&= \lim_{t\to 0} \frac{1}{t}\left( \int_0^{\tau(x)+t} q(\psi_s x)\,ds - \int_0^{\tau(x)} q(\psi_s x)\,ds \right) - \lim_{t\to 0} \frac{1}{t}\int_0^t q(\psi_s x)\,ds \\
&= q(\psi_{\tau(x)} x) - q(x) = q(Tx) - q(x).
\end{align*}
Therefore, $I_g$ is $T$-cohomologous to $I_h$ on $Y$.


Assume now that $I_g$ is $T$-cohomologous to $I_h$ on $Y$. If $x \in Y$ then $\tau(\psi_t x) = \tau(x) - t$ for every sufficiently small $t > 0$ (depending on $x$). Thus, $T(\psi_t x) = Tx$, and
\[ I_g(\psi_t x) - I_h(\psi_t x) = q(Tx) - q(\psi_t x). \]
Since
\[ \lim_{t\to 0^+} \frac{I_g(\psi_t x) - I_g(x)}{t} = \lim_{t\to 0^+} -\frac{1}{t}\int_0^t g(\psi_s x)\,ds = -g(x), \]
we obtain
\begin{align*}
g(x) - h(x) &= \lim_{t\to 0^+} \left( -\frac{I_g(\psi_t x) - I_h(\psi_t x)}{t} + \frac{I_g(x) - I_h(x)}{t} \right) \\
&= \lim_{t\to 0^+} \left( -\frac{q(Tx) - q(\psi_t x)}{t} + \frac{q(Tx) - q(x)}{t} \right) \\
&= \lim_{t\to 0^+} \frac{q(\psi_t x) - q(x)}{t}. \tag{29}
\end{align*}
We also have
\[ \tau(\psi_{-t} x) = \begin{cases} \tau(x) + t & \text{if } x \notin X \times \{0\}, \\ t & \text{if } x \in X \times \{0\} \end{cases} \]
for every sufficiently small $t > 0$ (depending on $x$). When $x \notin X \times \{0\}$ we have $T(\psi_{-t} x) = Tx$ and one can proceed in a similar fashion to the one above to show that
\[ g(x) - h(x) = \lim_{t\to 0^-} \frac{q(\psi_t x) - q(x)}{t}. \tag{30} \]
When $x \in X \times \{0\}$ we have $T(\psi_{-t} x) = x$, and
\[ I_g(\psi_{-t} x) - I_h(\psi_{-t} x) = q(x) - q(\psi_{-t} x). \]
Since
\[ \lim_{t\to 0^+} \frac{I_g(\psi_{-t} x)}{t} = \lim_{t\to 0^+} \frac{1}{t}\int_{-t}^0 g(\psi_s x)\,ds = g(x), \]
we obtain
\[ g(x) - h(x) = \lim_{t\to 0^-} \frac{I_g(\psi_t x) - I_h(\psi_t x)}{-t} = \lim_{t\to 0^-} \frac{q(x) - q(\psi_t x)}{-t}. \tag{31} \]
By (29), (30), and (31), if $x \in Y$ then
\[ g(x) - h(x) = \lim_{t\to 0} \frac{q(\psi_t x) - q(x)}{t}. \]
Therefore, $g$ is $\Psi$-cohomologous to $h$ on $Y$.

It remains to prove that Property 7 implies Property 7. Assume that Property 7 holds with the function $q : X \times \{0\} \to \mathbb{R}$. We can extend $q$ to a function $q : Y \to \mathbb{R}$ by
\[ q(\psi_t y) = q(y) + \int_0^t [g(\psi_s y) - h(\psi_s y)]\,ds \]


for every $y = (x, 0)$ and $t \in [0, \tau(x))$. For every $t \in [0, \tau(x))$ we have $T\psi_t y = Ty$ and by (11) we obtain
\[ q(T\psi_t y) - q(\psi_t y) = q(Ty) - q(\psi_t y) = \int_t^{\tau(y)} [g(\psi_s y) - h(\psi_s y)]\,ds = I_g(\psi_t y) - I_h(\psi_t y). \]
This completes the proof of the theorem.

Proof of Proposition 8. The proof is a straightforward modification of the corresponding arguments in the proof of Proposition 6 (see Sect. 6.2 below).

Proof of Theorem 9. By (19), the desired statements follow immediately from Theorem 12 by setting $u = 1$.

Proof of Theorem 10. This follows immediately from Theorem 13 by setting $k = 1$, $g_1 = g$, and $u = 1$.

6.2. Proofs of the results in Section 2. Proof of Theorem 1. If $g$ is $\Phi$-cohomologous to a constant, then $B(g) = \emptyset$. Assume now that $g$ is not $\Phi$-cohomologous to a constant. Consider a Markov system, and the associated suspension flow $\Psi = \{\psi_t\}_t$ and coding map $\pi : Y \to \Lambda$ satisfying (15). The map $\pi$ can be used to transfer the results from the symbolic dynamics to the dynamics on the manifold. By (15), we obtain $\pi(B(g \circ \pi)) \subset B(g)$. A priori one cannot discard that there exists a point $x \in X$ such that
\[ \liminf_{t\to\infty} \frac{1}{t}\int_0^t (g \circ \pi)(\psi_\tau x)\,d\tau < \limsup_{t\to\infty} \frac{1}{t}\int_0^t (g \circ \pi)(\psi_\tau x)\,d\tau \tag{32} \]
and
\[ \liminf_{t\to\infty} \frac{1}{t}\int_0^t g(\psi_\tau(\pi x))\,d\tau = \limsup_{t\to\infty} \frac{1}{t}\int_0^t g(\psi_\tau(\pi x))\,d\tau. \tag{33} \]

With slight changes to the proof of Theorem 7.4 in [3] (see also the proof of Theorem 21.1 in [7], and in particular that of Lemmas 2 and 3 inside Theorem 21.1) one can prove the following.

Lemma 1. We have
\[ \liminf_{t\to\infty} \frac{1}{t}\int_0^t (g \circ \pi)(\psi_\tau x)\,d\tau = \limsup_{t\to\infty} \frac{1}{t}\int_0^t (g \circ \pi)(\psi_\tau x)\,d\tau = \alpha \]
if and only if
\[ \liminf_{t\to\infty} \frac{1}{t}\int_0^t g(\psi_\tau(\pi x))\,d\tau = \limsup_{t\to\infty} \frac{1}{t}\int_0^t g(\psi_\tau(\pi x))\,d\tau = \alpha. \]


The lemma shows that (32) and (33) cannot hold simultaneously, and hence, $B(g) \subset \pi(B(g \circ \pi))$. Therefore,
\[ B(g) = \pi(B(g \circ \pi)). \tag{34} \]
We now proceed as in [3]. Let $R \subset \Lambda$ be the "boundary" of the Markov system, i.e., the set of points $x \in \Lambda$ such that $\varphi_t x$ is in the boundary of some element of the Markov system for some $t \in \mathbb{R}$. Note that $R$ is $\Phi$-invariant, and that $\pi : \pi^{-1}(\Lambda \setminus R) \to \Lambda \setminus R$ is a homeomorphism. Furthermore, since there exist cylinders $C \subset X$ such that $\pi(C)$ is disjoint from $R$, we have
\[ h(\Psi|\pi^{-1}R) < h(\Psi) \quad\text{and}\quad h(\Phi|R) < h(\Phi|\Lambda). \]
By (34), we conclude that $h(\Phi|B(g)) = h(\Psi|B(g \circ \pi))$. By Theorem 10, we obtain
\[ h(\Phi|\Lambda) = h(\Psi) = h(\Psi|B(g \circ \pi)) = h(\Phi|B(g)). \]
This completes the proof of the theorem.

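Proof of Theorem 2. Let
\[ G \overset{\text{def}}{=} \{g \in C^\alpha(\Lambda) : g \text{ is not } \Phi\text{-cohomologous to a constant}\}, \]
and $g \in G$. By Livschitz's theorem for flows (see, for example, Theorem 19.2.4 in [6]), there exist two points $x_i = \varphi_{T_i} x_i$ for $i = 0, 1$ such that
\[ \delta = \left| \frac{1}{T_0}\int_0^{T_0} g(\varphi_\tau x_0)\,d\tau - \frac{1}{T_1}\int_0^{T_1} g(\varphi_\tau x_1)\,d\tau \right| \neq 0. \]
For any $f \in C^\alpha(\Lambda)$ such that $\|f - g\|_\alpha < \delta/2$ we have
\[ \left| \frac{1}{T_i}\int_0^{T_i} (f - g)(\varphi_\tau x_i)\,d\tau \right| \le \sup\{|f(x) - g(x)| : x \in \Lambda\} \le \|f - g\|_\alpha < \frac{\delta}{2}, \]
for $i = 0, 1$, and hence,
\[ \frac{1}{T_0}\int_0^{T_0} f(\varphi_\tau x_0)\,d\tau \neq \frac{1}{T_1}\int_0^{T_1} f(\varphi_\tau x_1)\,d\tau. \]
This implies that $f$ is not $\Phi$-cohomologous to a constant. Hence, $G$ is open.

Let $\Gamma_0$ and $\Gamma_1$ be two distinct periodic orbits, and choose a function $h \in C^\alpha(\Lambda)$ such that $h|\Gamma_0 = 0$ and $h|\Gamma_1 = 1$. If $g \in C^\alpha(\Lambda)$ is $\Phi$-cohomologous to a constant, then for every $\varepsilon > 0$, the function $g_\varepsilon = g + \varepsilon h \in C^\alpha(\Lambda)$ is not $\Phi$-cohomologous to a constant, because averages on $\Gamma_0$ and $\Gamma_1$ differ by $\varepsilon$. Moreover, $\|g_\varepsilon - g\|_\alpha \le \varepsilon \|h\|_\alpha$, and hence the function $g$ can be arbitrarily well approximated by functions in $G$. Therefore, $G$ is dense in $C^\alpha(\Lambda)$.

Proof of Theorem 4. Consider a Markov system, and the associated suspension flow $\Psi = \{\psi_t\}_t$ and coding map $\pi : Y \to \Lambda$ satisfying (15). By Lemma 1 (see the proof of Theorem 1), we have $E(\alpha) = D_u(\alpha)$ for every $\alpha$, with $u = 1$, and $D_u$ as in Sect. 4.2.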


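Therefore, the desired statements follow immediately from Theorem 12 by setting $u = 1$.

Proof of Theorem 5. The proof is a straightforward modification of the corresponding arguments in the proof of Theorem 7 (see Sect. 6.1 above).

Proof of Proposition 6. Given $m \in \mathbb{N}$, define a function $\tau_m : \Lambda \to \mathbb{R}$ by
\[ \tau_m(x) = \sum_{i=0}^{m-1} \tau(T^i x). \tag{35} \]
If $x \in \Lambda$ and $m \in \mathbb{N}$ then
\begin{align*}
\int_0^{\tau_m(x)} g(\varphi_s x)\,ds &= \sum_{i=0}^{m-1} \int_{\tau_i(x)}^{\tau_{i+1}(x)} g(\varphi_s x)\,ds \\
&= \sum_{i=0}^{m-1} \int_0^{\tau(T^i x)} g(\varphi_s T^i x)\,ds \tag{36} \\
&= \sum_{i=0}^{m-1} I_g(T^i x).
\end{align*}
Given $t > 0$ there exists a unique $m \in \mathbb{N}$ such that $\tau_m(x) \le t < \tau_{m+1}(x)$. One can write $t = \tau_m(x) + \kappa$ for some $\kappa \in (\inf\tau, \sup\tau)$ and thus
\[ \frac{1}{t}\int_0^t g(\varphi_s x)\,ds = \frac{\int_0^{\tau_m(x)} g(\varphi_s x)\,ds + \int_{\tau_m(x)}^{\tau_m(x)+\kappa} g(\varphi_s x)\,ds}{\tau_m(x) + \kappa} \]
and
\begin{align*}
\left| \frac{1}{t}\int_0^t g(\varphi_s x)\,ds - \frac{1}{\tau_m(x)}\int_0^{\tau_m(x)} g(\varphi_s x)\,ds \right|
&\le \left| \frac{1}{\tau_m(x) + \kappa} - \frac{1}{\tau_m(x)} \right| \int_0^{\tau_m(x)} |g(\varphi_s x)|\,ds + \frac{\kappa \sup|g|}{\tau_m(x) + \kappa} \\
&\le \frac{\kappa}{(\tau_m(x) + \kappa)\tau_m(x)} \cdot \tau_m(x)\sup|g| + \frac{\kappa \sup|g|}{\tau_m(x) + \kappa} \\
&\le \frac{2\sup\tau \sup|g|}{\tau_m(x)}.
\end{align*}
By (6), if $t \to \infty$, then $m \to \infty$ and $\tau_m(x) \to \infty$. Hence, by (36),
\[ \left| \frac{1}{t}\int_0^t g(\varphi_s x)\,ds - \frac{1}{\tau_m(x)}\sum_{i=0}^{m-1} I_g(T^i x) \right| \to 0 \quad\text{as } t \to \infty. \]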

This immediately implies Statements 2 and 3.
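The comparison between time averages along the flow and ratios of Birkhoff sums for the base map can be checked numerically in a toy symbolic model. The sketch below is an illustration only, under the following assumptions: the orbit starts at a point $(x, 0)$, the roof function depends on the current symbol, and $g$ is constant on each column of the suspension, so that $I_g(T^i x) = g_{x_i}\tau_{x_i}$. The function name `time_average` and all numerical values are hypothetical.

```python
def time_average(symbols, tau, g, t):
    """Return ((1/t) * int_0^t g, S_m, tau_m) along the orbit of (x, 0), where
    S_m = sum_{i<m} I_g(T^i x) and m is determined by tau_m(x) <= t < tau_{m+1}(x).
    Assumes g is constant on each column and that t does not exceed the total
    return time of the given finite symbol sequence."""
    tau_m, S_m, m = 0.0, 0.0, 0
    while tau_m + tau[symbols[m]] <= t:
        S_m += g[symbols[m]] * tau[symbols[m]]   # I_g(T^m x) = g_i * tau_i
        tau_m += tau[symbols[m]]
        m += 1
    kappa = t - tau_m                            # time spent in the current column
    return (S_m + g[symbols[m]] * kappa) / t, S_m, tau_m

tau = [1.0, 2.5]                                 # roof values (assumed)
g = [0.3, -1.2]                                  # values of g on the two columns (assumed)
symbols = [(i * i) % 3 for i in range(2000)]     # a fixed sequence of 0s and 1s

for t in (10.0, 100.0, 1000.0):
    avg, S_m, tau_m = time_average(symbols, tau, g, t)
    bound = 2 * max(tau) * max(abs(v) for v in g) / tau_m
    print(t, abs(avg - S_m / tau_m), bound)
```

For each `t` the printed difference stays below the bound $2\sup\tau\sup|g|/\tau_m(x)$ from the proof, and the bound itself decreases as `t` grows.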


Assume now that $g$ is Hölder continuous. If $x$ and $y$ lie in the same domain of continuity of $\tau$, then
\begin{align*}
|I_g(x) - I_g(y)| &= \left| \int_{\tau(y)}^{\tau(x)} g(\varphi_s x)\,ds + \int_0^{\tau(y)} [g(\varphi_s x) - g(\varphi_s y)]\,ds \right| \\
&\le \sup|g| \cdot |\tau(x) - \tau(y)| + \sup\tau \cdot \sup_{s \in (0, \tau(y))} |g(\varphi_s x) - g(\varphi_s y)| \\
&\le c\,d(x, y)^\alpha + c \sup_{s \in (0, \sup\tau),\, z \in M} \|d_z\varphi_s\|^\alpha\, d(x, y)^\alpha,
\end{align*}
for some positive constants $c$ and $\alpha$. This shows that $I_g$ is Hölder continuous on each domain of continuity of $\tau$.

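6.3. Proofs of the results in Section 4. Proof of Proposition 11. For each $m \in \mathbb{N}$, let $\tau_m : X \to \mathbb{R}$ be the function defined by (35). Given $x \in X$, let $m = m(x, t) \in \mathbb{N}$ be the unique integer satisfying $\tau_{m-1}(x) \le t < \tau_m(x)$. By Proposition 19 in Appendix A there exists a constant $c \ge 1$ such that if $y = (x, s) \in Y$, $t > 0$, and $\varepsilon > 0$ is sufficiently small, then
\[ B_X(x, m, \varepsilon) \times \left( s - \frac{\varepsilon}{c}, s + \frac{\varepsilon}{c} \right) \subset B(y, t, \varepsilon) \subset B_X(x, m - 1, \varepsilon) \times (s - c\varepsilon, s + c\varepsilon), \tag{37} \]
where
\[ B_X(x, m, \varepsilon) = \{x' \in X : d_X(T^k x', T^k x) < \varepsilon \text{ for } k = 0, \ldots, m\}. \tag{38} \]
By Proposition 18 in Appendix A the function $I_g$ is Hölder continuous on $X$. Since $\mu$ is an equilibrium measure of $I_g$ it has the Gibbs property. Therefore, the limit
\[ \liminf_{t\to\infty} -\frac{\log \nu(B(y, t, \varepsilon))}{U(y, t, \varepsilon)} = \liminf_{t\to\infty} -\frac{\log \mu(B_X(x, m, \varepsilon))}{U(y, t, \varepsilon)} \tag{39} \]
is independent of $\varepsilon$. Let $\delta(\varepsilon) = \sup\{|u(y_1) - u(y_2)| : d_Y(y_1, y_2) < \varepsilon\}$ and observe that
\[ 1 \le \frac{U(y, t, \varepsilon)}{\int_0^t u(\psi_\tau y)\,d\tau} \le \frac{\int_0^t [u(\psi_\tau y) + \delta(\varepsilon)]\,d\tau}{\int_0^t u(\psi_\tau y)\,d\tau} \le 1 + \frac{\delta(\varepsilon)}{\inf u}. \tag{40} \]
By (39) and (40), we conclude that
\[ \underline{d}_{\nu,u}(y) = \liminf_{t\to\infty} -\frac{\log \nu(B(y, t, \varepsilon))}{U(y, t, \varepsilon)} = \liminf_{t\to\infty} -\frac{\log \nu(B(y, t, \varepsilon))}{\int_0^t u(\psi_\tau y)\,d\tau}. \]
A similar argument applies to $\overline{d}_{\nu,u}(y)$.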


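Proof of Theorem 12. We shall reduce our setup to the case of maps.

Lemma 2. If $y = (x, s) \in Y$, then
\[ \underline{d}_{\nu,u}(y) = \liminf_{m\to\infty} -\frac{\sum_{i=0}^m I_g(T^i x)}{\sum_{i=0}^m I_u(T^i x)} \quad\text{and}\quad \overline{d}_{\nu,u}(y) = \limsup_{m\to\infty} -\frac{\sum_{i=0}^m I_g(T^i x)}{\sum_{i=0}^m I_u(T^i x)}. \]

Proof of the lemma. Let $\tau_m : Y \to \mathbb{R}$ be the function defined by (35). Given $t > 0$, let $m \in \mathbb{N}$ be the unique integer such that $\tau_m(x) \le t < \tau_{m+1}(x)$, and write $t = \tau_m(x) + \kappa$ with $\kappa \in (\inf\tau, \sup\tau)$. Proceeding as in the proof of Proposition 6 we obtain
\[ \left| \frac{1}{t}\int_0^t u(\psi_\tau y)\,d\tau - \frac{1}{\tau_m(y)}\sum_{i=0}^{m-1} I_u(T^i y) \right| \to 0 \quad\text{as } t \to \infty. \tag{41} \]
Let $B_X(x, m, \varepsilon)$ be as in (38). By (37),
\[ -\frac{\log \nu(B(y, t, \varepsilon))}{t} + \frac{\log \mu(B_X(x, m, \varepsilon))}{\tau_m(x)} \to 0 \quad\text{as } t \to \infty. \tag{42} \]
Note that $T^i(x, s) = T^i(x, 0)$ for every $i \in \mathbb{N}$, and hence,
\[ \sum_{i=0}^{m-1} I_u(T^i y) = \sum_{i=0}^{m-1} I_u(T^i x). \]
Write
\[ A = -\frac{\log \nu(B(y, t, \varepsilon))}{\int_0^t u(\psi_\tau y)\,d\tau} + \frac{\log \mu(B_X(x, m, \varepsilon))}{\sum_{i=0}^{m-1} I_u(T^i x)}. \]
Since $0 < \inf u \le \sup u < \infty$, by (41) and (42) we obtain
\[ A = \left( -\frac{\log \mu(B_X(x, m, \varepsilon))}{\tau_m(x)} + o(t) \right) \frac{t}{\int_0^t u(\psi_\tau y)\,d\tau} + \frac{\log \mu(B_X(x, m, \varepsilon))}{\tau_m(x)} \left( \frac{t}{\int_0^t u(\psi_\tau y)\,d\tau} + o(t) \right), \]
and hence,
\[ |A| \le \left( \frac{h_\mu(T)}{\inf u} + \frac{1}{\inf\tau} \right) o(t). \]
This completes the proof of the lemma.

Given $Z \subset X$ and $\beta \in \mathbb{R}$, set
\[ N_\beta(Z) = \lim_{\ell\to\infty} \inf_{\Gamma} \sum_{C \in \Gamma} \exp\left( -\beta \sup\left\{ \sum_{i=0}^{m(C)-1} I_u(T^i x) : x \in C \right\} \right), \tag{43} \]
with the infimum taken over all covers $\Gamma$ of $Z$ by cylinders $C_{i_{-n} \cdots i_m}$ such that $m(C_{i_{-n} \cdots i_m}) = m \ge \ell$.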


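Lemma 3. If $Z \subset X$ is $T$-invariant, then
\[ \dim_u \{(x, s) \in Y : x \in Z \text{ and } s \in [0, \tau(x)]\} = \inf\{\beta : N_\beta(Z) = 0\}. \]

Proof of the lemma. We use the same notation as in the proof of Lemma 2. The inequality
\[ \left| \int_0^t u(\psi_\tau x)\,d\tau - \sum_{i=0}^{m-1} I_u(T^i x) \right| \le \kappa \sup u \]
implies the desired statement.

By Lemmas 2 and 3 we have
\[ K_\alpha = \{(x, s) \in Y : x \in Z_\alpha \text{ and } s \in [0, \tau(x)]\}, \]
where
\[ Z_\alpha = \left\{ x \in X : \lim_{m\to\infty} -\frac{\sum_{i=0}^{m-1} I_g(T^i x)}{\sum_{i=0}^{m-1} I_u(T^i x)} = \alpha \right\}, \]
and $D_u(\alpha) = \inf\{\beta : N_\beta(Z_\alpha) = 0\}$.

Lemma 4. We have
\[ \frac{h_\mu(T)}{\int_X I_u\,d\mu} = \frac{h_\nu(\Psi)}{\int_Y u\,d\nu}. \]

Proof of the lemma. By (11),
\[ \frac{\int_X I_u\,d\mu}{\int_X \tau\,d\mu} = \frac{\int_X \int_0^{\tau(x)} u(\psi_s x)\,ds\,d\mu(x)}{\int_X \tau\,d\mu} = \int_{\{(x,s) : x \in X,\ 0 \le s \le \tau(x)\}} u((x, s))\,d\nu(x, s) = \int_Y u\,d\nu. \]
Abramov's entropy formula shows that
\[ \frac{h_\mu(T)}{\int_X I_u\,d\mu} = h_\nu(\Psi)\frac{\int_X \tau\,d\mu}{\int_X I_u\,d\mu} = \frac{h_\nu(\Psi)}{\int_Y u\,d\nu}. \]
This establishes the desired identity.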
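The identity in Lemma 4 can be checked by direct arithmetic in a toy case. The sketch below is an illustration under stated assumptions, not part of the proof: $\mu$ is a Bernoulli measure on a full shift with weights $p_i$, the roof function takes the value $\tau_i$ on the $i$-th cylinder, and $u$ is constant equal to $u_i$ on the $i$-th column, so that $I_u = u_i\tau_i$ there and $\nu$ is the normalized product of $\mu$ with Lebesgue measure. The function name `lemma4_sides` and the numerical values are hypothetical.

```python
import math

def lemma4_sides(p, tau, u):
    """Return both sides of h_mu(T) / int I_u dmu = h_nu(Psi) / int u dnu
    in the Bernoulli, locally constant model described above."""
    h_mu = -sum(pi * math.log(pi) for pi in p)                    # entropy of mu
    mean_tau = sum(pi * ti for pi, ti in zip(p, tau))             # int tau dmu
    int_Iu = sum(pi * ui * ti for pi, ui, ti in zip(p, u, tau))   # int I_u dmu
    h_nu = h_mu / mean_tau                                        # Abramov's formula
    int_u_dnu = int_Iu / mean_tau                                 # nu = (mu x Leb) / int tau dmu
    return h_mu / int_Iu, h_nu / int_u_dnu

left, right = lemma4_sides(p=[0.2, 0.5, 0.3], tau=[1.0, 0.5, 2.0], u=[1.5, 0.7, 2.2])
print(left, right)
```

The two printed values agree up to rounding, since the normalizing factor $\int_X \tau\,d\mu$ cancels exactly as in the proof.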

By Lemmas 2 and 4, we obtain $d_{\nu,u}(y) = h_\nu(\Psi)/\int_Y u\,d\nu$ for $\nu$-almost every $y \in Y$. We can now apply Theorem 6.6 in [3] to obtain the remaining properties in the theorem.

Proof of Theorem 13. Proceeding as in the proof of Theorem 12, one can reduce our setup to the case of maps. More precisely, Lemma 2 establishes the identity (21), with $F(g_1, \ldots, g_k; u)$ and $C(g_1, \ldots, g_k; u)$ as in (20) and (22). Furthermore, by Lemma 3 we have
\[ \dim_u Y = \inf\{\beta : N_\beta(X) = 0\} \tag{44} \]


and
\[ \dim_u F(g_1, \ldots, g_k; u) = \inf\{\beta : N_\beta(C(g_1, \ldots, g_k; u)) = 0\}, \tag{45} \]
with $N_\beta(Z)$ as in (43). Note that the set $C(g_1, \ldots, g_k; u)$ is defined entirely in terms of the map $T$, and the functions $I_u$ and $I_{g_j}$ for each $j$. By Theorem 7, the function $g_j$ is $\Psi$-cohomologous to a multiple of $u$ on $Y$ if and only if $I_{g_j}$ is $T$-cohomologous to a multiple of $I_u$ on $X$, and hence, if and only if $I_{g_j}$ is $T$-cohomologous to $I_{\alpha_j u} = \alpha_j I_u$ on $X$, where $\alpha_j$ is the unique number such that $P_T(I_{g_j}) = P_T(\alpha_j I_u)$. Therefore we have the setup of Theorem 7.1 in [3], which implies that
\[ \inf\{\beta : N_\beta(C(g_1, \ldots, g_k; u)) = 0\} = \inf\{\beta : N_\beta(X) = 0\}. \]
The desired result follows from (44) and (45).

Appendix A. Bowen–Walters Distance for Suspension Flows

We recall here a distance introduced by Bowen and Walters in [5] for suspension flows with an arbitrary height function. We also establish several properties which are needed in the proofs of the statements in Sects. 2–5. We would like to thank Valentin Afraimovich and Jean-René Chazottes for bringing the paper [5] to our attention.

As in Sect. 3.1, let $T : X \to X$ be a homeomorphism of the compact metric space $(X, d_X)$, and $\tau : X \to (0, \infty)$ a Lipschitz function. Without loss of generality one can assume that the diameter $\operatorname{diam} X$ of $X$ is at most 1. If this is not the case then since $X$ is compact one can simply consider the new distance $d_X / \operatorname{diam} X$ on $X$. We also consider the space $Y$ in (9) with the points $(x, \tau(x))$ and $(Tx, 0)$ identified for each $x \in X$. The suspension flow over $T$ with height function $\tau$ is the flow $\Psi = \{\psi_t\}_t$ on $Y$ with $\psi_t : Y \to Y$ defined as in (10).

We first assume that $\tau = 1$ on $X$, and introduce the Bowen–Walters distance $d_1$ on the corresponding space $Y$. We shall first consider horizontal and vertical segments. Given $x, y \in X$ and $t \in [0, 1]$ we define the length of the horizontal segment $[(x, t), (y, t)]$ by
\[ \rho_h((x, t), (y, t)) = (1 - t)d_X(x, y) + t\,d_X(Tx, Ty). \tag{46} \]

Note that ρh ((x, 0), (y, 0)) = dX (x, y) and

ρh ((x, 1), (y, 1)) = dX (T x, T y).

Furthermore, given (x, t), (y, s) ∈ Y on the same orbit we define the length of the vertical segment [(x, t), (y, s)] by ρv ((x, t), (y, s)) = inf{|r| : ψr (x, t) = (y, s) and r ∈ R}.

(47)

Finally, given two points $(x, t), (y, s) \in Y$ the distance $d_1((x, t), (y, s))$ is defined as the infimum of the lengths of paths between $(x, t)$ and $(y, s)$ composed of a finite number of horizontal and vertical segments. More precisely, for each $n \in \mathbb{N}$ we consider all finite chains $z_0 = (x, t), z_1, \ldots, z_{n-1}, z_n = (y, s)$ of points in $Y$ such that for each $i$ either $z_i$ and $z_{i+1}$ are on the same segment $X \times \{t\}$ for some $t \in [0, 1]$ (in which case $[z_i, z_{i+1}]$ is called a horizontal segment), or $z_i$ and $z_{i+1}$ are on the same orbit of the flow (in which case $[z_i, z_{i+1}]$ is called a vertical segment). The lengths of horizontal and vertical segments are defined respectively in (46)


and (47). We remark that when $[z_i, z_{i+1}]$ is simultaneously a horizontal and a vertical segment, since by hypothesis the space $X$ has diameter at most 1, the length of $[z_i, z_{i+1}]$ is computed thinking of it as a horizontal segment. The length of the chain from $z_0$ to $z_n$ is finally defined as the sum of the lengths of the segments $[z_i, z_{i+1}]$ for $i = 0, \ldots, n - 1$.

We now consider the case of an arbitrary Lipschitz height function $\tau : X \to (0, \infty)$, and introduce the Bowen–Walters distance $d_Y$ on $Y$. Given two points $(x, t), (y, s) \in Y$, we set
\[ d_Y((x, t), (y, s)) = d_1((x, t/\tau(x)), (y, s/\tau(y))), \]
where $d_1$ is the Bowen–Walters distance when $\tau$ is the constant 1. Note that a horizontal segment is now of the form $w = [(x, t\tau(x)), (y, t\tau(y))]$, and that its length is
\[ \ell_h(w) = (1 - t)d_X(x, y) + t\,d_X(Tx, Ty). \]
The length of a vertical segment $w = [(x, t), (x, s)]$ now becomes $\ell_v(w) = |t - s|/\tau(x)$, provided that $t$ and $s$ are sufficiently close (or otherwise when $x$ is not a fixed point of $T$).

We shall from now on assume that $T$ is an invertible Lipschitz map with Lipschitz inverse. We consider a number $L \ge \max\{1/\min\tau, \sup\tau, 1\}$ which is simultaneously a Lipschitz constant for $T$, $T^{-1}$, and $\tau$. Given $(x, t), (y, s) \in Y$ we define
\[ d_\pi((x, t), (y, s)) = \min\left\{ \begin{array}{l} d_X(x, y) + |t - s|, \\ d_X(Tx, y) + \tau(x) - t + s, \\ d_X(x, Ty) + \tau(y) - s + t \end{array} \right\}. \tag{48} \]
Note that $d_\pi$ need not be a metric. Nevertheless, the following statement relates $d_\pi$ with the Bowen–Walters distance $d_Y$.

Proposition 17. There exists a constant $c > 1$ such that for each $p, q \in Y$ the following property holds:
\[ c^{-1} d_\pi(p, q) \le d_Y(p, q) \le c\,d_\pi(p, q). \tag{49} \]

Proof. Let $(x, t), (y, s) \in Y$. We easily obtain
\[ L^{-1}|t - s| - L^2 d_X(x, y) \le \left| \frac{t}{\tau(x)} - \frac{s}{\tau(y)} \right| \le L|t - s| + L^2 d_X(x, y). \tag{50} \]
We now consider the chain formed by the points $(x, t)$, $(y, t\tau(y)/\tau(x))$, and $(y, s)$, which is composed of a horizontal and a vertical segment. We obtain
\begin{align*}
d_Y((x, t), (y, s)) &\le \ell_h((x, t), (y, t\tau(y)/\tau(x))) + \ell_v((y, t\tau(y)/\tau(x)), (y, s)) \\
&\le \left( 1 - \frac{t}{\tau(x)} \right) d_X(x, y) + \frac{t}{\tau(x)} d_X(Tx, Ty) + \left| \frac{t}{\tau(x)} - \frac{s}{\tau(y)} \right| \tag{51} \\
&\le L\,d_X(x, y) + L|t - s| + L^2 d_X(x, y),
\end{align*}


using (50). Therefore
\[ d_Y((x, t), (y, s)) \le c[d_X(x, y) + |t - s|] \tag{52} \]
whenever $c \ge L + L^2$. Considering the chain formed by the points $(x, t)$, $(x, \tau(x)) = (Tx, 0)$, $(y, 0)$, and $(y, s)$ we obtain
\[ d_Y((x, t), (y, s)) \le \frac{\tau(x) - t}{\tau(x)} + d_X(Tx, y) + \frac{s}{\tau(y)} \le L[d_X(Tx, y) + \tau(x) - t + s]. \tag{53} \]
By (52), (53), and the symmetry of $d_Y$ we conclude that
\[ d_Y((x, t), (y, s)) \le c\,d_\pi((x, t), (y, s)) \]
whenever $c \ge L + L^2$.

Consider now a chain $z_0, \ldots, z_n$ between $(x, t)$ and $(y, s)$, and denote its length by $\ell(z_0, \ldots, z_n)$. Assume further that the chain does not intersect the roof of $Y$. Let $H$ and $V$ denote the set of indices in the chain corresponding respectively to horizontal and vertical segments, and write
\[ \ell_H = \sum_{i \in H} \ell_h(z_i, z_{i+1}) \quad\text{and}\quad \ell_V = \sum_{i \in V} \ell_v(z_i, z_{i+1}). \]
Let us denote $z_i = (x_i, r_i) \in Y$. Since the chain does not cross the roof, for the horizontal length we have
\begin{align*}
\ell_H &= \sum_{i \in H} [(1 - r_i)d_X(x_i, x_{i+1}) + r_i d_X(Tx_i, Tx_{i+1})] \\
&\ge L^{-1} \sum_{i \in H} [(1 - r_i)d_X(x_i, x_{i+1}) + r_i d_X(x_i, x_{i+1})] \ge L^{-1} d_X(x, y). \tag{54}
\end{align*}
For the vertical length, using (50) we obtain
\[ \ell_V \ge |t/\tau(x) - s/\tau(y)| \ge L^{-1}|t - s| - L^2 d_X(x, y). \tag{55} \]
It follows from (54) and (55) that
\[ 2L^4 \ell(z_0, \ldots, z_n) \ge (L^4 + L)\ell_H + L\ell_V \ge d_X(x, y) + |t - s|. \tag{56} \]
It is easy to see that for any chain of length $\ell$ there exists another chain with the same endpoints and of length at most $L\ell$, such that at most one segment intersects the roof of $Y$. Notice that if a chain crosses the roof of $Y$ at least two times in the same direction then its length is at least 2, which is always larger than the length of the chain used to establish (51). Hence $L\,d_Y((x, t), (y, s))$ is bounded from below by the infimum of the length of all chains between $(x, t)$ and $(y, s)$ which intersect the roof at most once.

Let then $z_0, \ldots, z_n$ be a chain intersecting the roof of $Y$ exactly once. Without loss of generality one can assume that there exists $1 \le j \le n$ such that $r_j = \tau(x_j)$, where $z_j = (x_j, r_j)$, and that $[z_{j-1}, z_j]$ is a vertical segment. If $z_j$ is after $z_{j-1}$ along the orbit then by (56) we obtain
\[ 2L^4[\ell(z_0, \ldots, z_j) + \ell(z_j, \ldots, z_n)] \ge d_X(x, x_j) + \tau(x) - t + d_X(Tx_j, y) + s. \]


Since
\[ L\,d(x, x_j) + d(Tx_j, y) \ge d(Tx, Tx_j) + d(Tx_j, y) \ge d(Tx, y), \]
we conclude that
\[ 2L^5 \ell(z_0, \ldots, z_n) \ge d_X(Tx, y) + \tau(x) - t + s. \tag{57} \]
If $z_j$ is before $z_{j-1}$ along the orbit then a similar computation gives
\[ 2L^5 \ell(z_0, \ldots, z_n) \ge d_X(x, Ty) + \tau(y) - s + t. \tag{58} \]
By (56), (57), and (58) we conclude that $d_\pi((x, t), (y, s)) \le c\,d_Y((x, t), (y, s))$, provided that $c \ge 2L^6$. Setting $c = 2L^6$ we obtain the desired inequalities in (49).
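Unlike the Bowen–Walters distance itself, which is an infimum over chains, the comparison function $d_\pi$ in (48) can be evaluated directly. The following sketch does so for one hypothetical example: the base space is the circle $\mathbb{R}/\mathbb{Z}$ with its usual distance (diameter $1/2$), $T$ is a rotation, and the roof function is an arbitrary Lipschitz function. The names `circle_dist`, `make_d_pi`, `rotation`, and `roof` are illustrative assumptions and do not come from the paper.

```python
def circle_dist(x, y):
    """Distance on the circle R/Z; the diameter is 1/2, hence at most 1."""
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

def make_d_pi(d_X, T, tau):
    """Build d_pi from (48) for a base distance d_X, base map T and roof function tau."""
    def d_pi(p, q):
        (x, t), (y, s) = p, q
        return min(
            d_X(x, y) + abs(t - s),            # stay inside the same fundamental domain
            d_X(T(x), y) + tau(x) - t + s,     # go up through the roof over x
            d_X(x, T(y)) + tau(y) - s + t,     # go up through the roof over y
        )
    return d_pi

def rotation(x):
    return (x + 0.3) % 1.0

def roof(x):
    return 1.0 + 0.25 * circle_dist(x, 0.0)    # a Lipschitz roof function (assumed)

d_pi = make_d_pi(circle_dist, rotation, roof)

p = (0.2, roof(0.2) - 0.01)                    # just below the roof over x = 0.2
q = (rotation(0.2), 0.01)                      # just above the floor over T(x)
print(d_pi(p, q), d_pi(q, p))
```

The two points `p` and `q` are far apart in the product coordinates but close in $Y$ because of the identification of $(x, \tau(x))$ with $(Tx, 0)$; the second term in (48) detects this, and swapping the arguments exchanges the second and third terms, so the value is symmetric.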

Given a continuous function $g : Y \to \mathbb{R}$ we define a new function $I_g : X \to \mathbb{R}$ by (11).

Proposition 18. If $g$ is a Hölder continuous function on $Y$, then $I_g$ is Hölder continuous on $X$.

Proof. We proceed in a similar way to that in the proof of Proposition 6. Let $x, y \in X$ and assume without loss of generality that $\tau(x) \ge \tau(y)$. We obtain
\begin{align*}
|I_g(x) - I_g(y)| &= \left| \int_{\tau(y)}^{\tau(x)} g(\varphi_s x)\,ds + \int_0^{\tau(y)} [g(\varphi_s x) - g(\varphi_s y)]\,ds \right| \\
&\le \sup|g| \cdot |\tau(x) - \tau(y)| + \sup\tau \cdot \sup_{s \in (0, \tau(y))} |g(\varphi_s x) - g(\varphi_s y)| \tag{59} \\
&\le \sup|g| \cdot L\,d_X(x, y) + b \sup_{s \in (0, \tau(y))} d_Y((x, s), (y, s))^\alpha,
\end{align*}
for some positive constants $\alpha$ and $b$. It follows from Proposition 17 and (59) (see also (48)) that
\[ |I_g(x) - I_g(y)| \le \sup|g| \cdot L\,d_X(x, y) + b\,(c\,d_\pi((x, s), (y, s)))^\alpha \le [\sup|g| \cdot L + bc^\alpha]\,d_X(x, y)^\alpha. \]
This shows that $I_g$ is Hölder continuous on $X$.

We now consider Bowen balls in $X$ and $Y$, defined respectively by
\[ B_X(x, m, \varepsilon) \overset{\text{def}}{=} \bigcap_{0 \le n \le m} T^{-n} B_X(T^n x, \varepsilon), \qquad B_Y(y, \rho, \varepsilon) \overset{\text{def}}{=} \bigcap_{0 \le t \le \rho} \psi_{-t} B_Y(\psi_t y, \varepsilon). \]


We say that T has bounded distortion if for each Hölder continuous function g : X → R there exists a constant D > 0 such that if x ∈ X, m ∈ N, ε > 0, and y ∈ BX (x, m, ε) then m−1 m−1 k k g(T x) − g(T y) ≤ Dε. k=0

k=0

Recall also the definition of the function τm in (35).

Proposition 19. Assume that T has bounded distortion. There exists κ > 0 such that for every x ∈ X, 0 < s < τ(x), and m ∈ N, if ε > 0 is sufficiently small then

\[
B_Y\bigl((x,s),\tau_m(x),\varepsilon/\kappa\bigr) \subset B_X(x,m,\varepsilon)\times(s-\varepsilon,s+\varepsilon) \subset B_Y\bigl((x,s),\tau_m(x),\kappa\varepsilon\bigr).
\tag{60}
\]

Proof. Let ε ∈ (0, 1/(2c)) with c as in Proposition 17. Let also (x, t) ∈ Y with t ∈ (cε, τ(x) − cε), and (y, s) ∈ BY((x, t), τm(x), ε). If m = 0 then by Proposition 17 we have dπ((x, t), (y, s)) ≤ cε. Since

\[
\tau(x) - t + s \ge \tau(x) - t \ge c\varepsilon
\quad\text{and}\quad
\tau(y) - s + t \ge t \ge c\varepsilon
\]

we must have dX(x, y) + |t − s| = dπ((x, t), (y, s)) ≤ cε, which implies that dX(x, y) ≤ cε and |t − s| ≤ cε. This establishes the first inclusion when m = 0. For any 1 ≤ n ≤ m set t_n = τ_n(x) − t and s_n = τ_n(y) − s. It is easy to see that ψ_{t_n}(x, t) = (T^n x, 0) and ψ_{s_n}(y, s) = (T^n y, 0). By Proposition 17 we obtain

\[
\begin{aligned}
d_X(T^n x, T^n y) &\le c\,d_Y\bigl(\psi_{t_n}(x,t),\psi_{s_n}(y,s)\bigr) \\
&\le c\,d_Y\bigl(\psi_{t_n}(x,t),\psi_{t_n}(y,s)\bigr) + c\,d_Y\bigl(\psi_{t_n}(y,s),\psi_{s_n}(y,s)\bigr) \\
&\le c\varepsilon + c|t_n - s_n|.
\end{aligned}
\tag{61}
\]

Furthermore, by (48) we have dπ(ψ_{t_n}(x, t), ψ_{t_n}(y, s)) ≤ cε. Thus there exist y_n ∈ X and r_n ∈ (t_n − cε, t_n + cε) such that ψ_{r_n}(y, s) = (y_n, 0). Moreover the sequence r_n is strictly increasing, since t_{n+1} − t_n > 2cε. Hence s_n ≤ r_n ≤ t_n + cε. By symmetry we obtain t_n ≤ s_n + cε, and hence |t_n − s_n| ≤ cε. By (61) we conclude that dX(T^n x, T^n y) ≤ c(1 + c)ε. This establishes the first inclusion in (60) provided that κ ≥ c(1 + c).

Let now y ∈ BX(x, m, ε) and s ∈ (t − ε, t + ε). Take r ∈ (0, τm(x)) and choose n such that τ_n(x) ≤ r + t < τ_{n+1}(x). Write r′ = r + t − τ_n(x) ≥ 0. By Proposition 17, the bounded distortion property, and (48) we obtain

\[
\begin{aligned}
d_Y\bigl(\psi_r(x,t),\psi_r(y,s)\bigr) &\le d_Y\bigl((T^n x,r'),(T^n y,r')\bigr) + d_Y\bigl((T^n y,r'),\psi_r(y,s)\bigr) \\
&\le c\,d_\pi\bigl((T^n x,r'),(T^n y,r')\bigr) + c\,d_\pi\bigl((T^n y,r'),\psi_r(y,s)\bigr) \\
&\le c\,d_X(T^n x,T^n y) + c\,|r' + \tau_n(y) - r - s| \\
&\le c\,d_X(T^n x,T^n y) + c\,|t-s| + c\,|\tau_n(x)-\tau_n(y)| \\
&\le c(2+D)\varepsilon.
\end{aligned}
\]


This establishes the second inclusion in (60) provided that κ ≥ c(2 + D). Setting κ = max{c(1 + c), c(2 + D)} we obtain the desired inclusions.


References

1. Barreira, L., Pesin, Ya. and Schmeling, J.: Multifractal spectra and multifractal rigidity for horseshoes. J. Dynam. Control Systems 3, 33–49 (1997)
2. Barreira, L., Pesin, Ya. and Schmeling, J.: Dimension and product structure of hyperbolic measures. Ann. of Math. (2) 149, 755–783 (1999)
3. Barreira, L. and Schmeling, J.: Sets of “non-typical” points have full topological entropy and full Hausdorff dimension. Israel J. Math. 116, 29–70 (2000)
4. Bowen, R.: Symbolic dynamics for hyperbolic flows. Am. J. Math. 95, 429–460 (1973)
5. Bowen, R. and Walters, P.: Expansive one-parameter flows. J. Diff. Eqs. 12, 180–193 (1972)
6. Katok, A. and Hasselblatt, B.: Introduction to the modern theory of dynamical systems. Encyclopedia of Mathematics and its Applications, vol. 54, Cambridge: Cambridge Univ. Press, 1995
7. Pesin, Ya.: Dimension theory in dynamical systems: Contemporary views and applications. Chicago Lectures in Mathematics, Chicago, IL: Chicago University Press, 1997
8. Pesin, Ya. and Sadovskaya, V.: Multifractal analysis of conformal axiom A flows. Preprint
9. Pesin, Ya. and Weiss, H.: A multifractal analysis of equilibrium measures for conformal expanding maps and Markov Moran geometric constructions. J. Statist. Phys. 86, 233–275 (1997)
10. Ratner, M.: Markov partitions for Anosov flows on n-dimensional manifolds. Israel J. Math. 15, 92–114 (1973)
11. Schmeling, J.: On the completeness of multifractal spectra. Ergodic Theory Dynam. Systems 19, 1595–1616 (1999)
12. Schmeling, J.: Entropy preservation under Markov coding. Preprint

Communicated by J. L. Lebowitz

Commun. Math. Phys. 214, 373 – 387 (2000)


Zero-Temperature Dynamics of ±J Spin Glasses and Related Models

A. Gandolfi1, C. M. Newman2, D. L. Stein3

1 Dipartimento di Matematica, Università di Roma Tor Vergata, Viale della Ricerca Scientifica, 00133 Roma, Italia
2 Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
3 Depts. of Physics and Mathematics, University of Arizona, Tucson, AZ 85721, USA

Received: 3 November 1999 / Accepted: 10 April 2000

Abstract: We study zero-temperature, stochastic Ising models σ^t on Z^d with (disordered) nearest-neighbor couplings independently chosen from a distribution µ on R and an initial spin configuration chosen uniformly at random. Given d, call µ type I (resp., type F) if, for every x in Z^d, σ_x^t flips infinitely (resp., only finitely) many times as t → ∞ (with probability one) – or else mixed type M. Models of type I and M exhibit a zero-temperature version of “local non-equilibration”. For d = 1, all types occur and the type of any µ is easy to determine. The main result of this paper is a proof that for d = 2, ±J models (where µ = αδ_J + (1 − α)δ_{−J}) are type M, unlike homogeneous models (type I) or continuous (finite mean) µ’s (type F). We also prove that all other noncontinuous disordered systems are type M for any d ≥ 2. The ±J proof is noteworthy in that it is much less “local” than the other (simpler) proof. Homogeneous and ±J models for d ≥ 3 remain an open problem.

Research partially supported by a Fulbright grant and by the Italian MURST, cofin ’99, under the Research Programme “Stochastic Processes with Spatial Structure”.
Research partially supported by the National Science Foundation under grant DMS-98-02310.
Research partially supported by the National Science Foundation under grant DMS-98-02153.

1. Introduction and Results

In this paper, we study a specific class of continuous time Markov processes σ^t = σ^t(ω) with random environments. These correspond to the zero-temperature stochastic dynamics of disordered nearest-neighbor Ising models [1–4]. (Zero-temperature dynamics with a different sort of disorder are studied in [5].) The state space is S = {−1, +1}^{Z^d} and the initial state σ^0 is a realization of i.i.d. symmetric Bernoulli variables. The only transitions are single spin flips, where σ_x^{t+0} = −σ_x^{t−0}, and the transition rates depend on a realization J of i.i.d. random couplings J_{x,y}, indexed by nearest-neighbor pairs (with Euclidean distance ||x − y|| = 1) of sites in Z^d, with common distribution µ on R. For a given J, the rate for a flip at x from state σ^{t−0} = σ is 1 or 1/2 or 0 according to whether

\[
H_x(\sigma) \equiv 2 \sum_{y:\,\|y-x\|=1} J_{x,y}\,\sigma_x\sigma_y
\tag{1}
\]

(corresponding to the change in energy) is negative or zero or positive. The joint distribution of J, σ^0, and ω will be denoted P.

Zero-temperature dynamics without disorder have been much studied in the physics literature as a model of “coarsening” [6] and more recently because of the interesting phenomenon of persistence [7–11]. A natural question in both the disordered and nondisordered models is whether σ^t has a limit (with P-probability one) as t → ∞ or equivalently whether for every x, σ_x^t flips only finitely many times. More generally, one may call such an x an F-site (F for finite) and otherwise an I-site (I for infinite). The nonexistence of a limit corresponds to the type of “recurrence” studied in a general context and applied to various interacting particle systems in [12].

The issue of whether σ^t has no limit is the zero-temperature version of whether there is “local non-equilibration” at positive temperature [13]. At positive temperature, local non-equilibration concerns not the recurrence of the spin configuration σ^t but rather of a dynamical probability measure ν_{t,τ(t)} corresponding to averaging over the dynamics for times between t − τ(t) and t (for fixed J, σ^0 and dynamics realization ω up to time t − τ(t)). For τ(t) growing slowly with t, ν_{t,τ(t)} should be (approximately) a pure Gibbs state at the given temperature (on a lengthscale growing with t). Local non-equilibration would mean that the system does not converge to a single limiting pure state as t → ∞ (depending on J, σ^0, ω). Although this type of non-equilibration has been proved to occur for the d = 2 homogeneous Ising model [13], it is an open problem whether it occurs at positive temperature for spin glasses (for any d ≥ 2). The focus of this paper is the study of the analogous problem at zero temperature for certain classes of spin glasses and related disordered systems.
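The flip-rate rule built on (1) can be written out as a short computation. This is a minimal sketch under stated assumptions, not the authors' code: sites are integer tuples, couplings are stored in a dictionary keyed by unordered nearest-neighbor pairs, and the names `neighbors`, `edge`, `energy_change` and `flip_rate` are hypothetical. The zero test is exact only for integer or rational couplings.

```python
def neighbors(x):
    """Yield the 2d nearest neighbors of a site x in Z^d (x is a tuple of ints)."""
    for i in range(len(x)):
        for step in (-1, 1):
            yield x[:i] + (x[i] + step,) + x[i + 1:]


def edge(x, y):
    """Unordered nearest-neighbor pair {x, y}, used as a dictionary key."""
    return frozenset((x, y))


def energy_change(x, sigma, J):
    """H_x(sigma) = 2 * sum over neighbors y of J_{x,y} sigma_x sigma_y, as in (1)."""
    return 2 * sum(J[edge(x, y)] * sigma[x] * sigma[y] for y in neighbors(x))


def flip_rate(x, sigma, J):
    """Rate 1, 1/2 or 0 according to whether H_x is negative, zero or positive."""
    h = energy_change(x, sigma, J)
    if h < 0:
        return 1.0
    if h == 0:
        return 0.5
    return 0.0


# One-dimensional homogeneous ferromagnet around the site 0 (illustrative data).
J = {edge((-1,), (0,)): 1, edge((0,), (1,)): 1}
disagree_both = {(-1,): 1, (0,): -1, (1,): 1}
disagree_one = {(-1,): 1, (0,): -1, (1,): -1}
agree_both = {(-1,): -1, (0,): -1, (1,): -1}
for sigma in (disagree_both, disagree_one, agree_both):
    print(energy_change((0,), sigma, J), flip_rate((0,), sigma, J))
```

The three test configurations correspond to a site that disagrees with both, exactly one, or neither of its neighbors, which is the majority rule described for the homogeneous ferromagnet.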
By translation-ergodicity, the collection of F-sites (resp., I-sites) has (with P-probability one) a well-defined non-random spatial density ρF (resp., ρI). The densities ρF and ρI depend only on d and µ and of course satisfy ρF + ρI = 1. For each d, one may then characterize µ (or more accurately, one should characterize the pair (d, µ)) as being type F or I or M (for mixed) according to whether ρF(d, µ) = 1 or ρI(d, µ) = 1 or 0 < ρF, ρI < 1. Before reviewing previous characterization results and presenting new ones, we briefly discuss some important special cases of µ. Ferromagnetic models are those where µ is supported on [0, ∞) (so that each J_{x,y} ≥ 0) and homogeneous ones are those without disorder (i.e., where µ = δ_J). In the homogeneous ferromagnet, sites flip at rate 1 or 1/2 or 0 according to whether they disagree with a strict majority or exactly one half or a strict minority of their nearest neighbors. Antiferromagnetic models are those with J_{x,y} ≤ 0; on the lattice Z^d, these are equivalent to ferromagnetic models under the relabelling (or “gauge”) transformation in which σ_x → −σ_x for each x on the odd sublattice while J_{x,y} → −J_{x,y} for every {x, y} (leaving (1) unchanged). Spin glasses (of the Edwards-Anderson type [14]) may be defined as those models where µ is symmetric (under J_{x,y} → −J_{x,y}) – the most popular examples being mean zero Gaussian distributions and the ±J spin glasses, where µ = (1/2)δ_J + (1/2)δ_{−J} with J > 0 (a standard review is [15]). As we shall see, the family of measures µ = αδ_J + (1 − α)δ_{−J} with α ∈ [0, 1], including homogeneous ferromagnets and antiferromagnets and ±J


spin glasses, is the most difficult to characterize. Henceforth, we use the term ±J model to refer to any µ of the form αδ_J + (1 − α)δ_{−J} with J > 0 and 0 < α < 1.

Our review of known results begins with a proposition classifying all µ’s for d = 1. The type I nature of one-dimensional homogeneous ferromagnets was stated in [1], but is equivalent to a result in [16] (see also [17]) because for d = 1, the dynamics is the same as that for annihilating random walks or the usual voter model. It is possible that other parts of the proposition may also not be new.

Proposition 1. Set d = 1. Then µ is type I if it is αδ_J + (1 − α)δ_{−J} with J ≥ 0 and α ∈ [0, 1]; µ is type F if it is either continuous or else of the form αδ_J + βδ_{−J} + ν with J > 0, 0 < α + β < 1 and a continuous ν supported on [−J, J]; all other µ’s are type M.

Proof. Since d = 1, any J = (J_{x,x+1} : x ∈ Z) is equivalent (by an appropriate gauge transformation) to a ferromagnetic model with J_{x,x+1} replaced by |J_{x,x+1}|. Hence, for the remainder of this proof, we can and will assume that µ is replaced by µ̄, the common distribution of the |J_{x,x+1}|’s, and thus that each J_{x,x+1} ≥ 0. If µ = δ_J, then it is trivially type I for J = 0, while for J > 0, we have a homogeneous ferromagnet, for which a proof that it is type I may be found in [1]. For any other µ, one looks for sites z such that

\[
J_{z,z+1} > J_{z-1,z},\ J_{z+1,z+2}.
\tag{2}
\]

Since µ ≠ δ_J, this has a strictly positive probability, and hence, by translation-ergodicity, there will be (with P-probability one) a doubly infinite sequence of such sites z_n (with positive density). The conditions (2) imply that H_z(σ^t) and H_{z+1}(σ^t) (see (1)) are both negative or both positive according to whether σ_z^t σ_{z+1}^t = −1 or +1. It follows that if σ_z^0 σ_{z+1}^0 = +1, then σ_z^t and σ_{z+1}^t will never flip, while σ_z^0 σ_{z+1}^0 = −1 implies that (with probability one) one of them will flip exactly once and there will be no other flips of either. This already shows that ρF > 0.

If µ is continuous, we may rely on the proof in [1] or argue as follows. Restricting attention to an interval {z′, z′ + 1, . . . , z″} with z′ = z_{n−1} + 1 and z″ = z_n (where z_{n−1} and z_n are successive sites from the special sequence defined above) and times after σ_{z′}^t and σ_{z″}^t have ceased flipping, we have a Markov process with a finite state space – the configurations of (σ_x : z′ + 1 ≤ x ≤ z″ − 1). Because of the continuity of µ, each flip in this interval will strictly lower the energy,

\[
-\sum_{x=z'}^{z''-1} J_{x,x+1}\,\sigma_x\sigma_{x+1}.
\tag{3}
\]

Since this energy is (for a fixed J ) bounded below, the process in the interval must eventually stop flipping and reach an absorbing state. Applying this argument to every such interval, we conclude that a continuous µ is type F. If µ = αδJ + ν with J > 0, 0 < α < 1 and ν a continuous measure on [0, J ], then we modify the above argument as follows. Instead of looking for sites z satisfying (2), we look for runs of the value J , i.e., for sites z < w, where Jz−1,z < J, Jz,z+1 = J, Jz+1,z+2 = J, . . . , Jw−1,w = J, Jw,w+1 < J.

(4)

Now let {zn , zn + 1, . . . , wn }, as n varies over Z, be the doubly infinite sequence of run intervals. Focusing on the configurations in one of these intervals, and noting that


the transition rates in that interval do not depend on the values of σ_{z_n−1} or σ_{w_n+1}, we observe that the two constant configurations are absorbing and accessible from any other configuration, so that the process eventually reaches one of these two absorbing configurations. To conclude that this µ is type F, we need to show for each n, that (after σ^t_{w_{n−1}} and σ^t_{z_n} have ceased flipping) the configuration in the interval {w_{n−1}, w_{n−1} + 1, . . . , z_n} will also reach an absorbing state. But this follows exactly as in the argument above for continuous µ, with the continuity of ν replacing that of µ.

To complete the proof of the proposition, it remains to show that if µ is neither δ_J nor continuous nor of the form αδ_J + ν as above, then ρI > 0 – i.e., some spins flip infinitely often. But for any µ now under consideration, there will exist some J′ > 0 and sites z′ and z″ = z′ + 3 such that z′ and z″ each satisfy (2),

\[
J_{z'+1,z'+2} = J_{z'+2,z'+3}\ (\equiv J_{z''-1,z''}) = J',
\tag{5}
\]

and

\[
\sigma^0_{z'} = \sigma^0_{z'+1} = +1, \qquad \sigma^0_{z''} = \sigma^0_{z''+1} = -1.
\tag{6}
\]

Under these circumstances, σ^t_{z′+1} and σ^t_{z′+3} (≡ σ^t_{z″}) will never flip, but σ^t_{z′+2} will flip infinitely many times because its flip rate will always be 1/2.

Among the main results of [1] are extensions of the conclusions of Proposition 1 to d = 2 for the homogeneous ferromagnet (or antiferromagnet) and to d ≥ 2 for continuous µ (satisfying some conditions). In particular, it is proved there that a continuous µ with finite mean (i.e., with E(|J_{x,y}|) < ∞) is type F for any d. (Certain continuous µ’s with infinite means are also shown in [1] to be type F by the very different percolation-theoretic methods of [18].) The continuous finite mean µ result is actually a corollary of the following more general theorem about flips that strictly decrease the energy, which we will apply to ±J models.

Theorem 2 ([1]). For any d and any µ with finite mean, (with P-probability one) at each site x in Z^d, there are only finitely many flips with H_x(σ) < 0.

The cases left open by the results of [1] were: (i) the homogeneous ferromagnet or antiferromagnet for d ≥ 3, (ii) ±J models for d ≥ 2, (iii) other noncontinuous µ’s for d ≥ 2 and finally (iv) general continuous µ’s with infinite means for d ≥ 2. The main results of this paper are the following two theorems that resolve (ii) for d = 2 and (iii) for d ≥ 2. We remark that part of the proof of Theorem 4 can be easily applied to show that ρF > 0 for any continuous µ; thus the µ’s of (iv) must either be type F or type M. Our guess is that (iv) is type F for any d ≥ 2. As for (i), there is some numerical evidence [7] that homogeneous models remain type I for d = 3 but perhaps not for d > 4. For more discussion of physical background and open problems, see [2,13,19].

Theorem 3. ±J models are type M for d = 2.

Theorem 4. For any d ≥ 2, if µ is neither continuous nor of the form αδ_J + (1 − α)δ_{−J} for some J ≥ 0 and 0 ≤ α ≤ 1, then µ is type M.

The proof of Theorem 4, presented in Sect. 2 of the paper, is quite easy. In Sects. 3 and 4, we give the proof of Theorem 3; the demonstration that ρI > 0 (Sect. 3) is fairly easy but the proof that ρF > 0 (Sect. 4) is not. The arguments used for the latter may


be of general interest. In the proofs of both parts of Theorem 3, an important role is played by the frustration/contour representation of the ±J model for d = 2 (see, e.g., [20–24]); there are natural extensions of this representation for d ≥ 3 that could be useful in determining the type in these higher dimensions. As we shall see, there is an interesting conceptual difference between the proofs of these two theorems. The proof of Theorem 4 is essentially local in that we demonstrate that certain sites are type I and certain are type F from knowledge of the couplings and spins (at time zero) in finite regions containing those sites. The proof of Theorem 3, on the other hand, is not local, in that using the local knowledge, we only manage to deduce that some site among a finite number must be type I (or F). This is because we are unable to find local configurations of couplings and spins that completely insulate a local region from the surroundings. The (unknown in advance) influence from the outside prevents a determination of the type of individual sites. Although we have not proved it, we suspect that ±J models are intrinsically nonlocal in the strong sense that the type of any site cannot be ascertained from strictly local knowledge. Some related rigorous results appear in [25] concerning zero-temperature dynamics on homogeneous trees. 2. Other than ±J Models: Proof of Theorem 4 Proof that ρF > 0. Let C denote some cube in Zd , such as the unit cube consisting of vertices x = (x1 , . . . , xd ) with each xi = 0 or 1, and let Ei (C) (resp., Eo (C)) denote the set of nearest-neighbor edges {x, y} such that both x and y (resp., exactly one of x and y) belong to C. It suffices to show that with positive probability, the Jx,y ’s for {x, y} ∈ Ei (C) ∪ Eo (C) and the σx0 ’s for x ∈ C are such that σxt never flips for x ∈ C. To do this, we note that for each µ of Theorem 4, |Jx,y | is nonconstant. 
Hence there exists some J > 0 such that P(A^+_J(C) ∪ A^−_J(C)) > 0, where A^±_J(C) is the event that |J_{x,y}| > J and sgn(J_{x,y}) is the constant value ±1 for every {x, y} in Ei(C), while for every {x, y} in Eo(C), |J_{x,y}| ≤ J. Let B^+(C) (resp., B^−(C)) denote the event that (σ_x^0 : x ∈ C) is constant (resp., is one of the two checkerboard patterns). Then either A^+_J(C) ∩ B^+(C) or A^−_J(C) ∩ B^−(C) (or both) have positive probability. But the occurrence of either one implies that σ_x^t never flips for x in C and so ρF > 0.

To show that ρI > 0, we use a slightly more complicated geometric construction involving a site w (e.g., the origin) and 2d disjoint cubes C_1, . . . , C_{2d} that are neighbors of w in the sense that for each j, w ∉ C_j but C_j contains exactly one nearest neighbor z_j of w (see Fig. 1). Since µ is not continuous, it has an atom at some value J̃. We again construct events involving the couplings and spins near w, but now the construction depends on which of two cases µ falls into.

Proof that ρI > 0; Case 1. Suppose µ = αδ_J + βδ_{−J} + ν with J > 0, 0 < α + β < 1 and a continuous ν supported on [−J, J]. Then either J or −J (or both if α, β > 0) will work for J̃. We now define D_{J̃,j} to be the event that J_{x,y} = J̃ for every {x, y} in Ei(C_j) and for {x, y} = {z_j, w}, but for every other {x, y} in Eo(C_j), |J_{x,y}| < |J̃|. The event D_{J̃} ≡ ∩_{j=1}^{2d} D_{J̃,j} has positive probability. Now, for either value of sgn(J̃), consider the events B^+ and B^−, defined as

\[
B^{\pm} = \bigcap_{j=1}^{2d} \bigl( B^{\pm}(C_j) \cap \{\sigma^0_{z_j} = (-1)^j\} \bigr)
\tag{7}
\]


and note that P(D_{J̃} ∩ B^{sgn(J̃)}) > 0. We claim that if D_{J̃} ∩ B^{sgn(J̃)} occurs, then σ_w^t flips infinitely many times and thus ρI > 0. To see this, note that, very much as in the proof above that ρF > 0, if D_{J̃,j} ∩ B^{sgn(J̃)}(C_j) occurs (and here we use the fact that d ≠ 1), then no site in C_j ever flips. If in addition σ^0_{z_j} = (−1)^j for each j, then w has at all times exactly d neighbors with σ_x = +1 and d with σ_x = −1, so its rate for flipping is always 1/2 and it will flip infinitely many times.

Proof that ρI > 0; Case 2. For any µ satisfying the hypotheses of Theorem 4 that is not in Case 1, J̃ may be chosen so that |J_{x,y}| > |J̃| with positive probability. We now define D^+_{J̃,j} and D^−_{J̃,j} as

\[
D^{\pm}_{\tilde J,j} = A^{\pm}_{|\tilde J|}(C_j) \cap \{J_{z_j,w} = \tilde J\}
\tag{8}
\]

and D^±_{J̃} = ∩_{j=1}^{2d} D^±_{J̃,j} and note that P(D^+_{J̃} ∪ D^−_{J̃}) > 0. With B^± defined in (7), we have that either D^+_{J̃} ∩ B^+ or D^−_{J̃} ∩ B^− (or both) have positive probability. But if either occurs, then, as in Case 1, σ^t_{z_j} = (−1)^j for all t and σ_w^t will flip infinitely many times, which completes the proof.

Fig. 1. Geometric construction demonstrating that a positive fraction of spins flip infinitely often for all d ≥ 2, in noncontinuous disordered systems other than ±J models. In this d = 2 figure, filled circles and solid lines denote respectively sites and edges of the original Z^2 lattice. Here, the spin at site w flips infinitely often, given the events discussed in the text in Sect. 2
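The insulation mechanism behind the ρF > 0 part of this section can be checked by brute force in d = 2. The sketch below is an illustration under assumptions, not part of the paper: it takes the unit square with constant spins, gives every inner edge the same ferromagnetic coupling `j_inner`, and lets each outer term J_{x,y} σ_x σ_y take its two extreme values ±`j_outer_bound`. The function name and parameters are hypothetical. It returns the smallest value of H_x over the sites of the square, which is positive exactly when no site of the square can ever flip.

```python
from itertools import product


def min_energy_change_on_square(j_inner, j_outer_bound):
    """Smallest H_x over sites x of the unit square with constant spins.

    Inner edges carry the ferromagnetic coupling j_inner, so each inner term
    J_{x,y} sigma_x sigma_y equals j_inner.  Each outer term is bounded in
    absolute value by j_outer_bound, and only the two extremes are enumerated.
    """
    square = [(0, 0), (1, 0), (0, 1), (1, 1)]
    worst = None
    for x in square:
        inner = [y for y in square
                 if abs(x[0] - y[0]) + abs(x[1] - y[1]) == 1]
        n_outer = 4 - len(inner)  # each site of Z^2 has four nearest neighbors
        for outer_terms in product((-j_outer_bound, j_outer_bound),
                                   repeat=n_outer):
            h = 2 * (j_inner * len(inner) + sum(outer_terms))
            worst = h if worst is None else min(worst, h)
    return worst


print(min_energy_change_on_square(3, 2))  # strictly dominant inner couplings
print(min_energy_change_on_square(2, 2))  # equal magnitudes: no strict margin
```

With strictly larger inner couplings the minimum stays positive whatever the surroundings do, so every site of the square has flip rate zero. With equal magnitudes the minimum drops to zero, which shows why the strict inequality |J_{x,y}| > J on the inner edges is needed.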

3. Two-Dimensional ±J Models: ρI > 0 We begin this section by introducing the frustration/contour representation for the ±J model that we will use throughout this section and the next for the proof of Theorem 3. We then give the proof that ρI > 0, which concludes with a general lemma about


recurrence that will also be used (many times) in the next section for the proof that ρF > 0. The frustration/contour representation (see, e.g., [20–24]) uses variables (%, &) associated with the dual lattice Z2∗ ≡ Z2 + (1/2, 1/2), that are determined by (J , σ ). A (dual) site in Z2∗ may be identified with the plaquette p in Z2 of which it is the center, and is called frustrated for a given J if an odd number of the four couplings Jx,y making up the edges of that p are antiferromagnetic; % is then the set of frustrated (dual) sites. Thus % is determined completely by J , and it is not hard to see that every subset % of Z2∗ arises from some J . The edge {x, y}∗ in Z2∗ , dual to (i.e., the perpendicular bisector of) the edge {x, y} of Z2 , is said to be unsatisfied for a given J and σ , if sgn(Jx,y σx σy ) = −1 (and satisfied otherwise); & is then the set of unsatisfied (dual) edges. We say that (J , σ ) gives rise to (%, &), and that & is compatible with J or with % if there exists some σ such that (J , σ ) gives rise to (%, &). We define ∂&, the boundary of &, as the set of (dual) sites that touch an odd number of (dual) edges of &; then & is compatible with % if and only if % = ∂&. A (site self-avoiding) path in Z2∗ consisting of edges from & will be called a domain wall. For a given %, domain walls can terminate (i.e., with no possibility of continuation) only on frustrated sites; this is because any termination site touches exactly one edge of & and thus belongs to ∂& (= %). For a given J or %, the Markov process σ t determines a process & t , that is easily seen to also be Markovian. The transition associated with a spin flip at x ∈ Z2 is a local “deformation” of the contour & t at the (dual) plaquette x ∗ in Z2∗ that contains x ; this deformation interchanges the satisfied and unsatisfied edges of x ∗ and leaves the boundary % of & t unchanged. 
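The compatibility statement above (the boundary of the set of unsatisfied dual edges equals the set of frustrated dual sites) can be tested on a finite patch of Z^2. This is a minimal sketch under assumptions: the patch is an n × n block of sites, a dual site is indexed by the lower-left corner of its plaquette rather than by its center, couplings are ±1, and all function names are hypothetical. An edge of Z^2 stands in for its dual edge.

```python
import random


def plaquette_edges(i, j):
    """The four edges of the plaquette whose lower-left corner is (i, j)."""
    a, b, c, d = (i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1)
    return [frozenset(e) for e in ((a, b), (b, c), (c, d), (d, a))]


def random_couplings(n, alpha, rng):
    """Random +1/-1 couplings on the edges of an n x n patch (+1 w.p. alpha)."""
    J = {}
    for i in range(n):
        for j in range(n):
            for y in ((i + 1, j), (i, j + 1)):
                if y[0] < n and y[1] < n:
                    J[frozenset(((i, j), y))] = 1 if rng.random() < alpha else -1
    return J


def frustrated_sites(J, n):
    """Plaquettes with an odd number of antiferromagnetic edges."""
    return {(i, j) for i in range(n - 1) for j in range(n - 1)
            if sum(J[e] < 0 for e in plaquette_edges(i, j)) % 2 == 1}


def unsatisfied_edges(J, sigma):
    """Edges {x, y} with sgn(J_{x,y} sigma_x sigma_y) = -1."""
    result = set()
    for e, coupling in J.items():
        x, y = tuple(e)
        if coupling * sigma[x] * sigma[y] < 0:
            result.add(e)
    return result


def contour_boundary(contour, n):
    """Plaquettes touched by an odd number of edges of the contour."""
    return {(i, j) for i in range(n - 1) for j in range(n - 1)
            if sum(e in contour for e in plaquette_edges(i, j)) % 2 == 1}


rng = random.Random(0)
n = 6
J = random_couplings(n, 0.5, rng)
sigma = {(i, j): rng.choice((-1, 1)) for i in range(n) for j in range(n)}
contour = unsatisfied_edges(J, sigma)
print(contour_boundary(contour, n) == frustrated_sites(J, n))
```

The equality holds for every spin configuration because the product of σ_x σ_y around a plaquette is always +1, so the parity of unsatisfied edges on a plaquette is fixed by the couplings alone. The same reasoning shows that a spin flip changes the contour but not its boundary.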
The only transitions with nonzero rates are those where the number of unsatisfied edges starts at k = 4 or 3 or 2 and ends at 0 or 1 or 2, respectively; transitions with k = 4 or 3 (resp., k = 2) correspond to energy-lowering (resp., zero-energy) flips and have rate 1 (resp., 1/2). We will continue to use the terms flip, energy-lowering, etc. for the transitions of & t . Proof that ρI > 0. This is by far the easier part of the proof of Theorem 3 and uses a strategy that is only a slight extension of the type of argument used in the previous section to prove ρI > 0 in Theorem 4. As in that proof, we will consider an event of positive probability, here denoted D, involving the frustration configuration in a finite region (and thus the values of only finitely many couplings). Unlike that proof, we will not then intersect D with some event involving σ 0 to insure that for all t, some site x has a positive flip rate. Instead, we will show that given D, and any spin (or contour) configuration in a certain fixed square C of Z2 , there must be at least one site in C with a positive flip rate. This will insure that, conditional on D, at least one site in C will flip infinitely many times. The region C is a 6 × 6 square of Z2 and D is defined in terms of % restricted to the 5 × 5 square *∗ of sites of Z2∗ contained within C. We choose D as the event that the frustrated sites of *∗ are exactly the nine sites (out of 25) indicated in Fig. 2. These nine sites consist of a center site wc and four adjacent pairs of sites to the Southeast, Northeast, Northwest and Southwest of wc . We have to show that for any & compatible with D, there is at least one site in C (or equivalently, one (dual) plaquette touching *∗ ) with a positive flip rate, i.e., with at least two unsatisfied edges. In fact, if D occurs, there must be a domain wall γc starting from wc ; this is because wc is frustrated and so either one or three of the edges touching it belong to &. 
Either γc has a “bend” within *∗ and thus the (dual) plaquette just inside the bend has a positive flip rate (since it has two or more unsatisfied edges) or else γc


Fig. 2. Geometric construction demonstrating that a positive fraction of spins flip infinitely often in ±J models for d = 2. In this figure the filled circles are sites in Z2 , the empty circles are unfrustrated sites in the (dual) Z2∗ lattice, and each empty circle covered by an × is a frustrated site in Z2∗ . Dashed lines correspond to edges in Z2∗ . The significance of the Z2∗ -sites wc , w1 , and w2 is discussed in Sect. 3

runs straight out of *∗. In the latter case, by the invariance of D with respect to rotations by π/2, we may assume (without loss of generality) that γc runs from wc to the East and passes just above the (dual) edge joining the two Southeastern sites (that we will denote w1, w2). But then there must be another domain wall γ1 starting from w1. Either γc and γ1 together determine a positive flip rate site or else γ1 runs from w1 straight out of *∗ to the South. But then there must be another domain wall γ2 starting from w2, that (together with γc and γ1) will determine a positive flip rate site, no matter what direction it runs off to.

Let A denote the set of & configurations such that there is a site in C with a positive flip rate and let B denote the event that there is a spin flip in C at some time t ∈ [0, 1]. It is easy to see that for some α > 0,

& ∈ A ⇒ P(B | &^0 = &) ≥ α. (9)

It follows from Lemma 5 below that conditional on D, there will (with conditional probability one) be infinitely many spin flips in C and hence some site in C will flip infinitely many times. Since a positive density of the translates of the event D must occur (with probability one), we conclude that ρI > 0 as desired.

Lemma 5. Let Z_t be a continuous-time Markov process with state space Z and time-homogeneous transition probabilities, and let Z_t^{(τ)} denote the time-shifted process Z_{τ+t}. For A a (measurable) subset of Z, say A recurs if {τ > 0 : Z_τ ∈ A} is unbounded. For B an event measurable with respect to {Z_t : 0 ≤ t ≤ 1}, say B recurs if {τ > 0 : Z^{(τ)} ∈ B} is unbounded. If

\[
\inf_{z\in A} P(B \mid Z_0 = z) \ge \alpha > 0
\tag{10}
\]

and A recurs with probability one (resp., with positive probability), then so does B.


Proof. If A recurs, then define T_j inductively by T_0 = 0 and T_{j+1} is the smallest τ ≥ T_j + 1 such that Z_τ ∈ A. For j ≥ 1, let η_j denote the indicator of the event that Z^{(T_j)} ∈ B. It follows from (10), by conditioning on the values of the Z_{T_j}’s, that (η_1, η_2, . . . ) stochastically dominates (η′_1, η′_2, . . . ), a sequence of i.i.d. zero-or-one valued random variables with P(η′_j = 1) = α. Thus, with probability one (conditional on A recurring), Σ_j η_j = ∞ and B recurs.

4. Two-Dimensional ±J Models: ρF > 0

The general strategy in this section is somewhat similar to that of the last section, but the analysis is considerably more involved. We will again consider an event, now denoted D′, involving the frustration configuration in a finite region *′ of Z^{2∗}, and the spin configuration in a fixed square C′ of Z^2. Our object will be to show that at least one of the sites in C′ will eventually have flip rate zero and hence will flip only finitely many times, thus proving ρF > 0; this will be done by proving that the domain wall geometry in *′ must eventually satisfy various constraints. The key technique of the proof will be to combine Theorem 2 and Lemma 5 to show that certain contour events A are eventually absent (e-absent), i.e., that A recurs with probability zero, since otherwise there would be infinitely many energy-lowering flips in *′ with positive probability.

The region *′ is an 8 × 8 square in Z^{2∗} and the event D′ is that out of the 64 sites in *′, the frustrated ones are exactly the 20 sites indicated in Fig. 3. These are all within the “border” of *′ and are those sites in the border that are at most distance two from one of the four corner sites. The region C′ is the 7 × 7 square C(*′) of sites of Z^2 contained within *′; these sites correspond to the 49 dual plaquettes formed by the edges of *′. As indicated in Figure 3, let uN, uE, uS and uW denote the sites in the exact middles of the North, East, South and West sides of the border of C′. What our proof will show is that eventually either uN (and uS) or uE (and uW) will have flip rate zero.

The bulk of the proof is a lengthy series of lemmas, most of which show that certain types of contour configurations (in *′) are e-absent. Here is a sketch of how the lemmas will lead to the desired conclusion. A contour configuration & will be called of horizontal (resp., vertical) type if it contains a horizontal (resp., vertical) domain wall, i.e., one connecting the West and East (resp., South and North) sides of the border of *′; a & that is of neither of these two types will be said to be of non-crossing type. It turns out (see Lemmas 12 and 13 below) that (conditional on D′) eventually &^t is exclusively one of these three types – i.e., it will not simultaneously contain both a horizontal and a vertical domain wall, and there will be no transition in which the type changes. It also turns out (as a consequence of other lemmas and again conditional on D′) that for uN (resp., uE) to flip, &^t just before or just after the flip must either be vertical (resp., horizontal) or else must be e-absent. It follows that eventually at most one of uN and uE has positive flip rate, completing the proof.

Now to the lemmas. In the lemmas, we will consider various rectangles, denoted *∗ (or sometimes 5∗) of sites in Z^{2∗}, the associated rectangles C(*∗) of Z^2-sites within *∗, contour configurations &(*∗) (and frustration configurations %(*∗)) restricted to *∗, and internal (or more specifically, *∗-internal) transitions or flips of these restricted contour configurations, i.e., those corresponding to (energy decreasing or zero-energy) flips of sites in C(*∗) (these do not include external flips, i.e., of sites not in C(*∗) that are nearest neighbors of sites in C(*∗)). We will call &(*∗) unstable if it is the starting configuration of an energy decreasing internal transition; i.e., if &(*∗) contains 3 or 4 edges of some (dual) plaquette completely within *∗.


Fig. 3. Geometric construction demonstrating that a positive fraction of spins flip only finitely many times in ±J models for d = 2, as explained in Sect. 4. The conventions used in this figure are the same as in Fig. 2

Lemma 6. Any unstable &(*∗) is e-absent.

Proof. This is an easy consequence of Theorem 2 and Lemma 5. Here B is the event that an energy decreasing internal flip takes place in a unit time interval and α may be bounded below by the probability that such a flip takes place before any other (internal or external) flip that could change &(*∗). We leave further details to the reader.

Lemma 7. Given compatible %(*∗) and &(*∗), if there exists a rectangle 5∗ ⊇ *∗ such that for every &(5∗) that coincides with &(*∗) in *∗ and is compatible with %(*∗), there is a finite sequence of 5∗-internal transitions, &_1(5∗) = &(5∗) → &_2(5∗) → · · · → &_n(5∗), (possibly with n = 1) such that &_n(5∗) is e-absent, then (conditional on %(*∗)) &(*∗) is also e-absent.

Proof. For each &(5∗) and each 5∗-internal transition from that configuration, let c_1(δ) > 0 denote (a lower bound for) the probability that that transition is the first (5∗-internal or 5∗-external) flip to be attempted during a time interval of length δ > 0 and that flip is successful. Inductively, we see that with probability at least c(&(5∗)) = c_1(1/(n−1)) · · · c_{n−1}(1/(n−1)) (or c(&(5∗)) = 1 when n = 1) &(5∗) will transform into &_n(5∗) sometime during a time interval of unit length. We can now apply Lemma 5 with B being the event that one of these (finitely many) &_n(5∗)’s occurs during the unit time interval and with α being the minimum of the c(&(5∗))’s.

A path or domain wall in Z^{2∗} with endpoints z and w is called monotonic if, for one of the two directed versions of the path, either every step moves to the East or to the North or else every step moves to the East or to the South. For such a monotonic path γ, we denote by R(γ) = R(z, w) the (smallest) rectangle in Z^{2∗} with z and w as two of its corners. For a non-monotonic γ, R(γ) denotes the smallest rectangle containing the sites of γ.

Lemma 8. &(*∗ ) is e-absent if it contains a non-monotonic domain wall. Proof. Any non-monotonic domain wall contains as a sub-path a non-monotonic domain wall γ , with R(γ ) a 2 × (m + 1) rectangle and γ going around one long and two short sides of the border of the rectangle. Let x1 , . . . , xm denote the Z2 -sites at the centers of the m (dual) plaquettes of R(γ ) (listed in either of the two natural orders). Consider the sequence of flips of the first m − 1 of these sites (in the same order). If &(*∗ ) contains no other edges of R(γ ) than those of γ , then that sequence of flips corresponds to a sequence of transitions as in Lemma 7 (with 5 ∗ = *∗ ) whose final configuration is unstable; if there are other edges, then an unstable configuration may be reached earlier. In either case, we conclude from Lemmas 6 and 7 that &(*∗ ) is e-absent. Lemma 9. Let & contain a monotonic domain wall γ (with endpoints z and w) but no other edge inside the rectangle R(γ ). If γ is any other monotonic path between z and w, then there is a finite sequence of R(γ )-internal flips (i.e., flips of Z2 -sites within R(γ )) that transforms & into a configuration & whose restriction to R(γ ) consists exactly of the edges of γ . Proof. We sketch a proof, but the reader is invited to provide her own for this elementary result. Suppose (without loss of generality) that z is the Southwest and w the Notheast corner of R(γ ). Let γ denote the path between those corners that runs along the South and East sides of the rectangle. It suffices to show that any γ (and hence also γ ) can be transformed into γ (and vice-versa, by inversion). But this can be done (inductively) by noting that for any γ = γ , there is some site within R(γ ) whose flip will strictly reduce the area of the region between γ and γ . In an m × n rectangle *∗ of Z2∗ , the border consists of those sites in *∗ that are nearest neighbors of sites outside *∗ . 
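The inductive step in the proof of Lemma 9 can be sketched in code. The encoding here is an assumption made for illustration: a monotonic path from the Southwest to the Northeast corner is written as a string of unit steps 'E' and 'N', and a flip of a site adjacent to the path is modelled as replacing one "NE" corner by "EN". Each such move lowers by one the number of unit squares between the path and the path that runs along the South and East sides.

```python
from typing import List


def area_to_south_east_path(steps: str) -> int:
    """Unit squares between the path and the path along the South and East sides.

    Each 'E' step taken at height h has h squares beneath it in the rectangle.
    """
    area = 0
    height = 0
    for step in steps:
        if step == "N":
            height += 1
        else:
            area += height
    return area


def flips_to_south_east_path(steps: str) -> List[str]:
    """Swap 'NE' corners to 'EN' one at a time until the path is E...EN...N.

    Returns the list of intermediate paths, starting with the input path.
    """
    if set(steps) - {"E", "N"}:
        raise ValueError("steps must consist of 'E' and 'N' only")
    path = list(steps)
    history = ["".join(path)]
    while True:
        for i in range(len(path) - 1):
            if path[i] == "N" and path[i + 1] == "E":
                path[i], path[i + 1] = "E", "N"   # one flip: the area drops by one
                history.append("".join(path))
                break
        else:
            return history   # no 'NE' corner is left


history = flips_to_south_east_path("NENE")
print(history)
print(len(history) - 1, area_to_south_east_path("NENE"))
```

The number of flips equals the initial area, so the procedure terminates. Running the moves in reverse order leads from the South-East path back to any other monotonic path.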
The border has four (distinct, unless m = n = 1, but not disjoint) sides: North, East, West and South. There are four corners (distinct, if m, n > 1), denoted NE, NW, SW and SE, each of which is the single site at the intersection of two adjacent sides of the border. We define the interior of *∗ (denoted int(*∗ )) as those sites in *∗ , that are not in its border and we define the interior of any side of the border as those sites in that side that are not corners. Lemma 10. Given an m × n rectangle *∗ with m, n > 1 and conditional on a frustration configuration %(*∗ ), a contour configuration &(*∗ ) is e-absent if it contains a monotonic domain wall γ between some z and w and any one of the following four situations holds for the rectangle R = R(γ ) = R(z, w): (i) &(*∗ ) contains an edge e∗ in R that is not in γ . (ii) There is a frustrated site in int(R). (iii) There are two frustrated sites in the interior of a single side of the border of R. (iv) A corner of R other than z, w is frustrated and so is at least one site in the interior of each of the two sides of the border of R touching that corner. Proof. (i) Let γ be any monotonic path between z and w that contains e∗ . Consider the sequence of flips provided by Lemma 9 that would (if there were no edges of &(*∗ ) in R other than those of γ ) transform γ into γ . Because e∗ is already in &(*∗ ) (and so may be other edges of R that are not in γ ), at some stage along this sequence of flips (before e∗ is absorbed into the evolving domain wall) &(*∗ ) will have been transformed into an unstable configuration. The desired conclusion then follows from Lemmas 6 and 7.

(ii) Since a frustrated site must have an odd number of unsatisfied edges touching it, such a site in the interior of R has an unsatisfied edge e∗ not in γ (but in R) touching it. The result now follows from part (i). (iii) Without loss of generality, we assume that z and w are the SW and NE corners of R and the two frustrated sites are on the south side of the border of R. Since these are not endpoints of γ , but they are frustrated, they must each have at least one unsatisfied edge not from γ touching them. By part (i), we may assume that those edges go out from R to the South. Also, by part (i) we may assume that &(*∗ ) has no edges other than those of γ in R. Then by the sequence of flips provided by Lemma 9, & (whose restriction to *∗ is &(*∗ )) can be transformed into & , where γ is replaced by γ , a domain wall between z and w lying along the South and East sides of R. But & also contains the two edges going South from the two frustrated sites. Thus it contains a non-monotonic domain wall in the slightly larger region 5 ∗ , that adds to *∗ its neighboring sites. The desired conclusion now follows from Lemmas 7 and 8. (iv) We may assume that z and w are the SW and NE corners of R, that there is a frustrated site in the interior of each of the South and East sides of the border and that the SE corner is also frustrated. By the same reasoning as in part (iii), there must be an unsatisfied edge going out from R starting from each of these three frustrated sites. The ones from the interior sites on the sides go to the South and the East, while the one from the corner can go in either of those two directions. Thus there are either two unsatisfied edges going South from the South side or else two going East from the East side. In either case, the proof is completed as in part (iii). We now focus on the 8 × 8 square * and the frustration configuration % (* ) (or the event D that the frustration configuration in * is exactly % (* )) indicated in Fig. 3. 
Since there are no frustrated sites in int(* ), domain walls of any &(* ) compatible with % (* ) must be extendable so that the endpoints z and w are both on the border of * . By Lemma 8, if z and w are on a single side of the border and &(* ) is not e-absent, then the domain wall can only be the straight line path between z and w. The following two lemmas cover the situations where the endpoints are on adjacent or opposite sides and give restrictions on the possible &(* )’s that are not e-absent. In the first of the two lemmas we write |z − z | to denote the Euclidean distance between sites in Z2∗ . Lemma 11. Condition on D . Every &(* ) that contains a domain wall between sites z and w that are on adjacent sides (but not on any single side) of the border of * , is e-absent if z, w and the common corner c(z, w) of the two sides do not satisfy the following condition: |z − c(z, w)| + |w − c(z, w)| ≤ 3.

(11)

(If z and w are opposite corners of * , then c(z, w) can be taken as either of the two remaining corners, the condition is not satisfied and &(* ) is e-absent.) Every &(* ) that contains a domain wall γ between z and w on a single side of the border of * with z a corner, is e-absent unless |z − w| ≤ 2.

(12)

Proof. We may assume by Lemma 8 that the domain wall is monotonic. If z and w are not on a single side and (11) does not hold, then one of the following two cases occurs. (I) One of |z − c(z, w)| or |w − c(z, w)| is at least 3. In this case, since we condition on D , one of the sides of R(z, w) contains at least two frustrated sites in its interior and part (iii) of Lemma 10 applies.

(II) |z − c(z, w)| = 2 = |w − c(z, w)|. In this case, since we condition on D , a corner (other than z or w) of R(z, w) is frustrated and so is one site in the interior of each of the adjacent sides of the border of R(z, w). Thus, part (iv) of Lemma 10 applies. If z and w are on a single side with z a corner, we may assume, without loss of generality, that z is the NW corner and w is on the North side. If (12) does not hold, then there are (at least) two frustrated sites on γ between z and w, and there must be unsatisfied edges not in γ touching these two sites. One of those edges must go to the South (into * ), or else there would be a non-monotonic domain wall (using some edges of γ and two edges going North just outside of * ). Since there is no frustration in int(* ), that South-going edge must be extendable to a domain wall reaching some site w on the border of * . Combining that extension with part of γ yields a domain wall γ between z and w . If w is on the West or North sides, γ would be non-monotonic. If w is on the South or East sides, then (11) with w replaced by w would not hold and &(* ) would be e-absent by the part of this lemma that has already been proved. Before stating the next lemma, we recall our definition of a horizontal (resp., vertical) domain wall in * as one whose endpoints are in the West and East (resp., South and North) sides of the border. Lemma 12. Conditional on D , every &(* ) that contains both a vertical and horizontal domain wall is e-absent. Proof. Let us denote the endpoints of the horizontal (resp., vertical) domain wall by zW and zE (resp., zS and zN ) so that the subscript indicates the side that the endpoint is located on. (Note though that the endpoints may be corners.) Since the vertical and horizontal domain walls must have at least one site of * in common, it follows that &(* ) has a domain wall with any pair of the points {zN , zE , zW , zS } as endpoints. 
It also follows that both the horizontal and vertical crossings must be straight lines or else there would be a non-monotonic domain wall. Hence zS , zN are at distance ≥ 4 from one of the East or West sides (which we take to be the East side, without loss of generality) and similarly (without loss of generality) zW , zE may be assumed to be at distance ≥ 4 from the South side. But then the domain wall with endpoints z = zS and w = zE violates (11) and &(* ) is e-absent by Lemma 11. We recall that &(* ) is said to be of horizontal or vertical or non-crossing type according to whether it contains a horizontal or a vertical domain wall or neither. We will also say that & t is eventually of type A if for some (random) finite T , & t is of type A for all t ≥ T , A standing for one of the above three types. Lemma 13. Suppose &(* ) and & (* ) are related by an internal or external flip (either &(* ) → & (* ) or & (* ) → &(* )), and suppose further that &(* ) has a vertical (or horizontal) domain wall but & (* ) does not. Conditional on D , any such & (* ) is e-absent and, with probability one, & t is eventually of one of the three types – vertical, horizontal or non-crossing. Proof. Let γ be a vertical domain wall in &(* ) between z on the South side and w on the North side. The flip changes the edges of a single (dual) plaquette in or next to * ; whether the flip is internal or external, zero-energy or energy lowering, there will remain in & (* ) a portion of γ from z to some z ∈ * on that plaquette and a portion from w to some w ∈ * on that plaquette. Without loss of generality, we may assume that the distance from z to the South side is at least 3, and we then denote by γ the domain wall portion from z to z .

If z is a border site, it cannot be on the North side since then γ would violate the assumption that & (* ) is not of vertical type. For either the East or West side as a location for z , it would follow that γ is either non-monotonic or else & (* ) is e-absent because of violating (11) or (12) with w replaced by z . If z is not a border site, then it is unfrustrated and γ must be extendable to a γ between z and some border site z . The e-absence of & (* ) now follows by the same arguments as above but with z and γ replaced by z and γ . Of course, analogous arguments work when &(* ) is of horizontal rather than vertical type. The final claim of the lemma now follows by choosing T to be the finite (with probability one) time beyond which no e-absent configurations in * are taken on by & t . By Lemma 12 and the part of this lemma already proved, no changes of type occur after that time. Proof that ρF > 0. Since D occurs with strictly positive probability (and hence a positive density of translates of D occur with probability one), it suffices to show that conditional on D , with probability one, for times beyond some finite T , some Z 2 -site in the 7 × 7 square C(* ) will not flip. We take the same T as in the proof of the previous lemma, namely the time beyond which no e-absent &(* )’s are seen. Past this time, & t remains of one particular type, and we will locate a non-flipping site depending on the type. Let uN denote the site in the middle of the North side of C(* ) (and uE , uW , uS the sites in the middle of the other sides), as indicated in Fig. 3. A flip of uN corresponds to a change in & t involving the edges of the (dual) plaquette inside * and just below the middle of its North side. Since e-absent configurations are no longer seen, it must be a zero-energy flip in which both before and after the flip there are exactly two unsatisfied edges from that plaquette, but with the unsatisfied and satisfied edges exchanged by the flip. 
Thus either before or after the flip, & t must contain an edge going South from one of the two central sites (that we will denote z) on the North side of * . That South-going edge must be extendable to a domain wall γ between z and some other border site w. We claim that γ must be vertical because otherwise & t would be e-absent. This is so because if γ were not vertical, then w would either be on the North side and so γ would be non-monotonic and e-absence would follow from Lemma 8; or else w would be on the West or East sides and e-absence would follow from Lemma 11. This shows that after time T , uN (and by symmetry uS ) cannot flip unless & t is of vertical type. Similarly uE and uW cannot flip after T unless & t is of horizontal type. By Lemma 13, conditional on D , after (the almost surely finite) time T some site ( e.g., either uN or uE ) does not flip. This completes the proof. Acknowledgements. A. G. thanks Joel Lebowitz and Rutgers University for their hospitality. A. G. and C. M. N. thank Anton Bovier and WIAS, Berlin for their hospitality.

References

1. Nanda, S., Newman, C.M., Stein, D.L.: Dynamics of Ising Spin Systems at Zero Temperature. In: On Dobrushin’s Way (from Probability Theory to Statistical Physics), R. Minlos, S. Shlosman and Y. Suhov, eds., Am. Math. Soc. Transl. (2) 198, 2000, pp. 183–194
2. Newman, C.M., Stein, D.L.: Blocking and Persistence in the Zero-Temperature Dynamics of Homogeneous and Disordered Ising Models. Phys. Rev. Lett. 82, 3944–3947 (1999)
3. Jain, S.: Zero-Temperature Dynamics of the Weakly Disordered Ising Model. Phys. Rev. E 59, R2493–R2496 (1999)
4. Jain, S.: Persistence in the Zero-Temperature Dynamics of the Diluted Ising Ferromagnet in Two Dimensions. Phys. Rev. E 60, R2445–R2447 (1999)
5. Fontes, L.R.G., Isopi, M., Newman, C.M.: Chaotic Time Dependence in a Disordered Spin System. Prob. Theory Rel. Fields 115, 417–443 (1999)
6. Bray, A.J.: Theory of Phase-Ordering Kinetics. Adv. Phys. 43, 357–459 (1994)
7. Stauffer, D.: Ising Spinodal Decomposition at T = 0 in One to Five Dimensions. J. Phys. A 27, 5029–5032 (1994)
8. Derrida, B.: Exponents Appearing in the Zero-Temperature Dynamics of the 1D Potts Model. J. Phys. A 28, 1481–1491 (1995)
9. Derrida, B., Hakim, V., Pasquier, V.: Exact First-Passage Exponents of 1D Domain Growth: Relation to a Reaction-Diffusion Model. Phys. Rev. Lett. 75, 751–754 (1995)
10. Majumdar, S.N., Huse, D.: Growth of Long-Range Correlations after a Quench in Phase-Ordering Systems. Phys. Rev. E 52, 270–284 (1995)
11. Majumdar, S.N., Sire, C.: Survival Probability of a Gaussian Non-Markovian Process: Application to the T = 0 Dynamics of the Ising Model. Phys. Rev. Lett. 77, 1420–1423 (1996)
12. Cox, J.T., Klenke, A.: Recurrence and Ergodicity of Interacting Particle Systems. Prob. Theory Rel. Fields 116, 239–255 (2000)
13. Newman, C.M., Stein, D.L.: Equilibrium Pure States and Nonequilibrium Chaos. J. Stat. Phys. 94, 709–722 (1999)
14. Edwards, S., Anderson, P.W.: Theory of Spin Glasses. J. Phys. F 5, 965–974 (1975)
15. Binder, K., Young, A.P.: Spin Glasses: Experimental Facts, Theoretical Concepts, and Open Questions. Rev. Mod. Phys. 58, 801–976 (1986)
16. Arratia, R.: Site Recurrence for Annihilating Random Walks on Z^d. Ann. Prob. 11, 706–713 (1983)
17. Cox, J.T., Griffeath, D.: Diffusive Clustering in the Two Dimensional Voter Model. Ann. Prob. 14, 347–370 (1986)
18. Nanda, S., Newman, C.M.: Random Nearest Neighbor and Influence Graphs on Z^d. Ran. Structures and Algorithms 15, 262–278 (1999)
19. Newman, C.M., Stein, D.L.: Zero-Temperature Dynamics of Ising Spin-Systems Following a Deep Quench: Results and Open Problems. Physica A 279, 159–168 (2000)
20. Toulouse, G.: Theory of the Frustration Effect in Spin Glasses. I. Commun. Phys. 2, 115–119 (1977)
21. Fradkin, E., Huberman, B., Shenker, S.H.: Gauge Symmetries in Random Magnetic Systems. Phys. Rev. B 18, 4789–4814 (1978)
22. Bieche, L., Uhry, J.P., Maynard, R., Rammal, R.: On the Ground States of the Frustration Model of a Spin Glass by a Matching Method of Graph Theory. J. Phys. A 13, 2553–2576 (1980)
23. Barahona, F.: On the Computational Complexity of Ising Spin Glass Models. J. Phys. A 15, 3241–3253 (1982)
24. Bovier, A., Fröhlich, J.: A Heuristic Theory of the Spin Glass Phase. J. Stat. Phys. 44, 347–391 (1986)
25. Howard, C.D.: Zero-temperature Ising Spin dynamics on the homogeneous tree of degree three. J. App. Prob. 37, 736–747 (2000)

Communicated by J. L. Lebowitz

Commun. Math. Phys. 214, 389 – 409 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Entropy Production and Information Gain in Axiom-A Systems

Da-quan Jiang1, Min Qian2, Min-ping Qian1

1 Department of Probability and Statistics, Peking University, Beijing 100871, P.R. China.

E-mail: [email protected]; [email protected]

2 Department of Mathematics, Peking University, Beijing 100871, P.R. China

Received: 2 August 1999 / Accepted: 14 April 2000

Abstract: Anosov systems are mathematical models for chaotic systems in statistical mechanics and fluid dynamics. Most of these systems enjoy the property of positive entropy production. We introduce the concept of specific information gain (or specific relative entropy) h(µ+, µ−) in Anosov systems and prove that it is identical to the entropy production rate ep(µ+) defined by Ruelle and Gallavotti in Anosov systems. From this point of view, the entropy production rate ep(µ+) characterizes the degree of macroscopic irreversibility of the system.

1. Introduction

Chaotic systems arise naturally in statistical mechanics and fluid dynamics. Gallavotti [10] developed Ruelle’s idea and proposed the chaotic hypothesis that a many-particle system in a stationary state can be regarded as a transitive Anosov system. Most of these chaotic systems enjoy the property of positive entropy production. The concept of entropy production was first put forward in nonequilibrium statistical physics to describe how far a specific state of a system is from its equilibrium state [31, 12]. In [24, 26], unifying the concepts of entropy production in various concrete cases, a measure-theoretic definition is given for stochastic processes. Assume that X = {Xn}n∈Z is a stationary Markov chain with countable state space S, transition matrix P = (pij)i,j∈S and stationary distribution (πi)i∈S. Let ν and ν− be the distributions of the Markov chain and its time reversal respectively; then the entropy production (or the information gain of the stationary chain with respect to its time reversal) is

\[
e_p \overset{\mathrm{def}}{=} \lim_{n\to\infty} \frac{1}{n} H\big(\nu|_{\mathcal{G}_n}, \nu^-|_{\mathcal{G}_n}\big)
= \sum_{i,j\in S} \pi_i p_{ij} \log\frac{\pi_i p_{ij}}{\pi_j p_{ji}}
= \frac{1}{2} \sum_{i,j\in S} (\pi_i p_{ij} - \pi_j p_{ji}) \log\frac{\pi_i p_{ij}}{\pi_j p_{ji}},
\tag{1}
\]

where Gn = σ(Xi, 0 ≤ i < n), and H(ν|Gn, ν−|Gn) is the relative entropy of ν with respect to ν− restricted to the σ-field Gn. (See Appendix 4.1 for the definition of relative entropy.) It describes how far the Markov chain is from reversibility (or the situation of detailed balance) (see [24–27, 11, 13]). The Markov chain X is reversible (i.e. ν = ν−) iff the entropy production ep = 0, or iff πi pij = πj pji, ∀i, j ∈ S (i.e. the chain X is in detailed balance). The equality (1) also shows that the entropy production is the sum, over pairs of states, of the “flux” multiplied by the “force”.

For a dynamical system generated by an ordinary differential equation ẋ = V(x) with an invariant measure µ, where x ∈ R^n and V(x) is a vector field, as Andrey [2] formulated, one can define the entropy production rate of the system in the stationary state µ by

\[
e_p(V, \mu) = -\int \operatorname{div} V \, d\mu,
\tag{2}
\]

where div V is the divergence of V. For an Anosov diffeomorphism (M, f) with any f-invariant probability measure µ, Ruelle [30] proposed a definition of the entropy production rate ep(µ):

\[
e_p(\mu) = -\int \log (x)\, \mu(dx),
\tag{3}
\]

where (x) is the absolute Jacobian of Df : TxM → Tf(x)M. In fact, this is the rate at which entropy needs to be pumped out of the system to keep the system in the stationary state µ. In case the invariant measure µ+ has the SBR property, the entropy production rate ep(µ+) of (M, f, µ+) is not only of physical interest, but can also be studied more fruitfully.

A natural question is whether Ruelle’s definition has any measure-theoretic basis and whether the entropy production defined for Markov chains is essentially the same thing as the entropy production rate defined by Ruelle for diffeomorphisms. These are the main problems considered in this article. First we give a sketch of the contents of the different sections.
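The two expressions for the entropy production of a stationary Markov chain in (1) can be compared numerically. The following is a minimal sketch, assuming a finite state space and an irreducible, aperiodic transition matrix; the function names and the two example matrices are illustrative choices, not taken from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def stationary_distribution(P: Matrix, iterations: int = 2000) -> List[float]:
    """Approximate the stationary row vector by power iteration (assumes convergence)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def entropy_production(P: Matrix, pi: List[float]) -> Tuple[float, float]:
    """Return both forms of e_p in (1): the relative-entropy sum and the flux-force sum."""
    n = len(P)
    relative_entropy_form = 0.0
    flux_force_form = 0.0
    for i in range(n):
        for j in range(n):
            forward = pi[i] * P[i][j]     # pi_i p_ij
            backward = pi[j] * P[j][i]    # pi_j p_ji
            if forward == 0.0 and backward == 0.0:
                continue                  # the pair contributes nothing
            if forward == 0.0 or backward == 0.0:
                return math.inf, math.inf  # a one-way transition gives infinite e_p
            log_ratio = math.log(forward / backward)
            relative_entropy_form += forward * log_ratio
            flux_force_form += 0.5 * (forward - backward) * log_ratio
    return relative_entropy_form, flux_force_form


# A chain with a preferred direction of rotation 0 -> 1 -> 2 -> 0 (not reversible).
cyclic = [[0.1, 0.6, 0.3],
          [0.3, 0.1, 0.6],
          [0.6, 0.3, 0.1]]
# A birth-death chain, which satisfies detailed balance (reversible).
birth_death = [[0.50, 0.50, 0.00],
               [0.25, 0.50, 0.25],
               [0.00, 0.50, 0.50]]

for name, P in [("cyclic", cyclic), ("birth-death", birth_death)]:
    pi = stationary_distribution(P)
    print(name, entropy_production(P, pi))
```

For the cyclic chain the stationary distribution is uniform and both sums reduce to 0.3 log 2. For the birth-death chain every term vanishes up to rounding, in line with the statement that ep = 0 iff the chain is in detailed balance.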
Stimulated by the analogy between the equilibrium states of Axiom A diffeomorphisms and the Gibbs measures in the theory of random fields [32, 3, 28, 23], we introduce in Sect. 2 the concept of specific information gain (or specific relative entropy) for subshifts of finite type and Axiom A systems, in the same way as Föllmer [7] and Preston [23] did for Gibbs measures in random fields. A local version of this notion is also presented. The main results of this section are stated in Proposition 4 and Theorem 1. In Sect. 3 we exploit Theorem 1 to prove the following results: Assume that (M, f ) is a C 2 transitive Anosov diffeomorphism, then the specific information gain h(µ+ , µ− ) of µ+ with respect to µ− is identical to the entropy production rate ep (µ+ ) defined by Ruelle, where µ+ and µ− are the SBR measures for f and f −1 respectively. Furthermore, ep (µ+ ) = 0 iff µ+ is absolutely continuous with respect to the volume measure v on M, or iff µ+ = µ− . We also compare the entropy production rate defined by Ruelle [30] for basic sets of general Axiom A diffeomorphisms to the specific information gain in such systems. For the transitive Anosov diffeomorphism (M, f ), the specific information gain h(µ+ , µ− ) measures the difference between the system (M, f, µ+ ) and its time reversal (M, f −1 , µ− ). So we can say that the entropy production rate ep (µ+ ) characterizes the degree of macroscopic irreversibility of the Anosov system (M, f ). Because the entropy production rate ep (µ+ ) = 0 if and only if µ+ = µ− , which is similar to the result of

stationary Markov chains, we call the Anosov system (M, f) reversible if ep(µ+) = 0. In this case, from any absolutely continuous initial distribution, the system (M, f) and its time reversal (M, f−1) will have the same asymptotic distribution µ+, which is still absolutely continuous with respect to the volume measure v on M.

For the convenience of the reader, in the appendix we list some notions and facts that we will use, including (1) relative entropy, (2) general thermodynamic formalism, (3) subshift of finite type and Gibbs measure, and (4) Axiom A diffeomorphism.

2. Information Gain

In this section, we will introduce the concept of specific information gain for subshifts of finite type and Axiom A systems in the same way as Föllmer [7] and Preston [23] did for Gibbs measures in the theory of random fields. For a measurable transformation T from measure space (X, A, µ) to measurable space (Y, B), we define the measure µT−1 on (Y, B) by µT−1(B) = µ(T−1B), ∀B ∈ B. If T is from (X, A) to itself, we denote by MT(X) the set of all T-invariant probability measures on (X, A). For two finite measurable partitions C and D of (X, A), their joining is the partition C ∨ D := {Ci ∩ Dj | Ci ∈ C, Dj ∈ D}.

2.1. Subshift of finite type. Let A be an m × m matrix with entries aij = 0 or 1. We can define the bi-infinite sequence space Σ, ΣA and the left-shift θ : ΣA → ΣA as in Appendix 4.3. ΣA is a metric space with the ordinary metric. Let U = {Ui, 1 ≤ i ≤ m}, where Ui = {s ∈ ΣA | s0 = i}, 1 ≤ i ≤ m, and let

\[
\mathcal{F}_n = \sigma\Big(\bigvee_{k=0}^{n-1} \theta^{-k}\mathcal{U}\Big)
\]

be the smallest σ-field containing \(\bigvee_{k=0}^{n-1} \theta^{-k}\mathcal{U}\). Denote by C(ΣA) the set of real continuous functions on ΣA. For φ ∈ C(ΣA), define Var_k φ = sup{|φ(x) − φ(y)| : xi = yi, −k ≤ i ≤ k}. Let FA = {φ ∈ C(ΣA) | Var_k φ ≤ bα^k (∀k ≥ 0) for some positive constants b and α ∈ (0, 1)}. Suppose that (ΣA, θ) is topologically mixing and fix a function φ ∈ FA; then φ has a Gibbs measure µφ (Appendix 4.3).
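The atoms of the partition generating Fn are the cylinder sets of the words of length n allowed by A. The following is a minimal sketch that enumerates them, assuming states are indexed from 0 (the text indexes them from 1) and that A has no zero row or column, so that every allowed finite word extends to a point of the subshift. The golden-mean matrix is an illustrative choice and the function names are not from the paper.

```python
from itertools import product
from typing import List, Tuple

Matrix = List[List[int]]


def allowed_words(A: Matrix, n: int) -> List[Tuple[int, ...]]:
    """All words s_0 ... s_{n-1} with A[s_i][s_{i+1}] == 1 for 0 <= i <= n - 2."""
    m = len(A)
    return [word for word in product(range(m), repeat=n)
            if all(A[word[i]][word[i + 1]] == 1 for i in range(n - 1))]


def count_by_matrix_power(A: Matrix, n: int) -> int:
    """Number of allowed words of length n, computed as the sum of entries of A^(n-1)."""
    m = len(A)
    power = [[int(i == j) for j in range(m)] for i in range(m)]   # identity matrix
    for _ in range(n - 1):
        power = [[sum(power[i][k] * A[k][j] for k in range(m)) for j in range(m)]
                 for i in range(m)]
    return sum(sum(row) for row in power)


golden_mean = [[1, 1],
               [1, 0]]   # the word "1 1" is forbidden

for n in range(1, 5):
    words = allowed_words(golden_mean, n)
    print(n, len(words), count_by_matrix_power(golden_mean, n))
```

The brute-force count and the matrix-power count agree, which gives a quick consistency check on the transition matrix before any measure is placed on the cylinders.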
See Appendix 4.1 for the definition of the relative entropy H(µ, ν) of µ w.r.t. ν.

Definition 1. For any µ ∈ Mθ(ΣA), we define the specific information gain (or specific relative entropy) of µ with respect to µφ by

\[
h(\mu, \mu_\phi) \overset{\mathrm{def}}{=} \lim_{n\to\infty} \frac{1}{n} H\big(\mu|_{\mathcal{F}_n}, \mu_\phi|_{\mathcal{F}_n}\big).
\tag{4}
\]

From the following Proposition 1, the limit in the definition exists.

Remark. 1. Orey, Pelikan [19, 20] and Xiwen Lin [17] proved independently the large deviation theorem at the Donsker–Varadhan level-3 type for a subshift of finite type (also for Axiom A diffeomorphisms) with rate function

\[
I_{\mu_\phi}(\nu) \overset{\mathrm{def}}{=}
\begin{cases}
E^{\nu} H\big(\nu_{s^-}|_{\mathcal{F}_1}, \mu_{\phi,s^-}|_{\mathcal{F}_1}\big), & \nu \in M_\theta(\Sigma_A),\\
+\infty, & \nu \in M(\Sigma) \setminus M_\theta(\Sigma_A),
\end{cases}
\tag{5}
\]

where νs− and µφ,s− are respectively the regular conditional probability distributions of ν and µφ given \(\mathcal{F}_{-\infty}^{-1} = \sigma\big(\bigvee_{k=-\infty}^{-1} \theta^{-k}\mathcal{U}\big)\). They also proved that when ν ∈ Mθ(ΣA),

\[
I_{\mu_\phi}(\nu) = -\int \phi \, d\nu - h_\nu(\theta) + P_\theta(\phi),
\tag{6}
\]

where hν(θ) is the measure-theoretic entropy of θ w.r.t. ν, and Pθ(φ) is the topological pressure of φ w.r.t. θ. For the specific relative entropy h(ν, µφ) defined in (4), we can prove h(ν, µφ) = E^ν H(νs−|F1, µφ,s−|F1), and then get h(ν, µφ) = −∫φ dν − hν(θ) + Pθ(φ), but we prefer to give a simple and direct proof of this result.

2. After the preparation of this article, we found that Chazottes et al. [4] proved the results of Proposition 1 in the case when µ is ergodic. Our proof of the general invariant measure case has many points in common with that in [4]. As our discussions are all based on the results of Proposition 1 and Proposition 2, we still keep some details here to make the presentation self-contained.

Proposition 1. Suppose that (ΣA, θ) is topologically mixing. For φ ∈ FA, let µφ be the Gibbs measure of φ; then for any µ ∈ Mθ(ΣA),

\[
h(\mu, \mu_\phi) = e(\mu, \phi) - h_\mu(\theta) + P_\theta(\phi) = -\int \phi \, d\mu - h_\mu(\theta) + P_\theta(\phi),
\tag{7}
\]

where hµ(θ) is the measure-theoretic entropy of θ w.r.t. µ, Pθ(φ) is the topological pressure of φ w.r.t. θ, and e(µ, φ) = −∫φ dµ is the specific energy of φ w.r.t. µ. Furthermore, h(µ, µφ) ≥ 0, where the equality holds if and only if µ = µφ.

Proof. If s0 s1 . . . sn−1 is allowed by A, i.e. \(a_{s_i s_{i+1}} = 1\), 0 ≤ i ≤ n − 2, we denote it by s^n, and we denote µ({x ∈ ΣA | xi = si, 0 ≤ i ≤ n − 1}) by µ(s^n). Since

\[
\begin{aligned}
\frac{1}{n} H\big(\mu|_{\mathcal{F}_n}, \mu_\phi|_{\mathcal{F}_n}\big) + \frac{1}{n} H_\mu\Big(\bigvee_{k=0}^{n-1} \theta^{-k}\mathcal{U}\Big)
&= \frac{1}{n} \sum_{s^n} \mu(s^n) \log\frac{\mu(s^n)}{\mu_\phi(s^n)} - \frac{1}{n} \sum_{s^n} \mu(s^n) \log \mu(s^n)\\
&= -\frac{1}{n} \sum_{s^n} \mu(s^n) \log \mu_\phi(s^n)
\end{aligned}
\]

and

\[
\lim_{n\to\infty} \frac{1}{n} H_\mu\Big(\bigvee_{k=0}^{n-1} \theta^{-k}\mathcal{U}\Big) = h_\mu(\theta, \mathcal{U}) = h_\mu(\theta),
\]

we only need to prove

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{s^n} \mu(s^n) \log \mu_\phi(s^n) = \int \phi \, d\mu - P_\theta(\phi).
\tag{8}
\]

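By the property of the Gibbs measure µφ [3, Th. 1.4], there exists a constant c > 1 such that

\[
\frac{1}{c} \le \frac{\mu_\phi(s^n)}{\exp\big(-nP_\theta(\phi) + S_n\phi(x)\big)} \le c,
\tag{9}
\]

for any x ∈ ΣA s.t. xi = si, 0 ≤ i ≤ n − 1, where \(S_n\phi(x) = \sum_{k=0}^{n-1} \phi(\theta^k x)\). For any s^n allowed by A, choose x(s^n) ∈ ΣA such that xi = si, 0 ≤ i ≤ n − 1; then we have

\[
\Big| \frac{1}{n} \sum_{s^n} \mu(s^n) \log \mu_\phi(s^n) - \frac{1}{n} \sum_{s^n} \mu(s^n) S_n\phi(x(s^n)) + P_\theta(\phi) \Big| \le \frac{1}{n} \log c.
\]

We only need to prove

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{s^n} \mu(s^n) S_n\phi(x(s^n)) = \int \phi \, d\mu.
\]

For µ ∈ Mθ(ΣA),

\[
\begin{aligned}
\Big| \frac{1}{n} \sum_{s^n} \mu(s^n) S_n\phi(x(s^n)) - \int \phi \, d\mu \Big|
&= \frac{1}{n} \Big| \sum_{s^n} \mu(s^n) S_n\phi(x(s^n)) - \int S_n\phi \, d\mu \Big|\\
&= \frac{1}{n} \Big| \sum_{s^n} \int_{\{x | x_i = s_i,\, 0 \le i \le n-1\}} \big( S_n\phi(x(s^n)) - S_n\phi(y) \big)\, \mu(dy) \Big|\\
&\le \frac{1}{n} \sum_{s^n} \int_{\{x | x_i = s_i,\, 0 \le i \le n-1\}} \big| S_n\phi(x(s^n)) - S_n\phi(y) \big|\, \mu(dy).
\end{aligned}
\tag{10}
\]

For any y ∈ {x ∈ ΣA | xi = si, 0 ≤ i ≤ n − 1},

\[
\begin{aligned}
\big| S_n\phi(x(s^n)) - S_n\phi(y) \big|
&\le \sum_{k=0}^{n-1} \big| \phi(\theta^k x(s^n)) - \phi(\theta^k y) \big|\\
&\le \operatorname{Var}_0\phi + \operatorname{Var}_1\phi + \cdots + \operatorname{Var}_{[n/2]}\phi + \operatorname{Var}_{n-[n/2]}\phi + \cdots + \operatorname{Var}_0\phi\\
&\le 2b \sum_{k=0}^{[n/2]+1} \alpha^k \le \frac{2b}{1-\alpha}.
\end{aligned}
\tag{11}
\]

By (10) and (11), we have

\[
\Big| \frac{1}{n} \sum_{s^n} \mu(s^n) S_n\phi(x(s^n)) - \int \phi \, d\mu \Big|
\le \frac{1}{n} \sum_{s^n} \mu(s^n) \frac{2b}{1-\alpha}
= \frac{2b}{n(1-\alpha)} \to 0 \quad \text{as } n \to \infty.
\]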
394

D. Q. Jiang, M. Qian, M. P. Qian

The Gibbs measure µφ is the unique equilibrium state for φ with respect to θ [3, Th. 1.22]. We have h(µ, µφ) ≥ 0 by the variational principle [3, Prop. 1.21]. That h(µ, µφ) = 0 if and only if µ = µφ follows from the uniqueness of the equilibrium state for φ w.r.t. θ.

As θ is an expansive homeomorphism of the compact metric space ΣA, the entropy map h·(θ) of θ is affine and upper semi-continuous on Mθ(ΣA) [33, Th. 8.1, Th. 8.2]. (That h·(θ) is affine means that if µ, ν ∈ Mθ(ΣA) and p ∈ [0, 1], then \(h_{p\mu+(1-p)\nu}(\theta) = p\,h_\mu(\theta) + (1-p)\,h_\nu(\theta)\).) Therefore h(·, µφ) is affine and lower semi-continuous on Mθ(ΣA).

In fact, we can prove a stronger result than Proposition 1. It is the local version of the specific relative entropy (reminiscent of the local entropy defined by Katok).

Proposition 2. Under the assumptions of Proposition 1, we have

\[
\lim_{n\to\infty} \frac{1}{n} \log\frac{\mu(s^n)}{\mu_\phi(s^n)} = P_\theta(\phi) - \hat{h}_\mu(s) - E^\mu(\phi|\mathcal{I}) \quad \text{a.e. } d\mu(s), \text{ or } L^1(d\mu),
\tag{12}
\]

where

\[
\hat{h}_\mu(s) = -E^\mu\Big( \sum_i \log \mu\big(U_i|\mathcal{F}_{-\infty}^{-1}\big)\, I_{U_i} \,\Big|\, \mathcal{I} \Big)(s),
\qquad
\mathcal{F}_{-\infty}^{-1} = \sigma\Big(\bigvee_{k=-\infty}^{-1} \theta^{-k}\mathcal{U}\Big),
\]

and \(\mathcal{I}\) is the σ-field of θ-invariant sets, i.e. \(\mathcal{I} = \{B \,|\, B \in \mathcal{F}_{-\infty}^{+\infty},\ B = \theta^{-1}B\}\). Moreover,

\[
h(\mu, \mu_\phi) = E^\mu\big( P_\theta(\phi) - \hat{h}_\mu(s) - E^\mu(\phi|\mathcal{I}) \big) = P_\theta(\phi) - h_\mu(\theta) - \int \phi \, d\mu.
\]

If µ is ergodic, then \(\hat{h}_\mu(s) = h_\mu(\theta, \mathcal{U}) = h_\mu(\theta)\), a.e. dµ(s).

Proof. By the property of the Gibbs measure µφ [3, Th. 1.4], there exists a constant c > 1 such that

\[
\frac{1}{c} \le \frac{\mu_\phi(s^n)}{\exp\big(-nP_\theta(\phi) + S_n\phi(s)\big)} \le c,
\tag{13}
\]

so we have

\[
\Big| \frac{1}{n} \log \mu_\phi(s^n) - \frac{1}{n}\big( -nP_\theta(\phi) + S_n\phi(s) \big) \Big| \le \frac{\log c}{n}.
\]

By the Birkhoff ergodic theorem,

\[
\lim_{n\to\infty} \frac{1}{n} \log \mu_\phi(s^n) = \lim_{n\to\infty} \frac{1}{n} S_n\phi(s) - P_\theta(\phi) = E^\mu(\phi|\mathcal{I}) - P_\theta(\phi), \quad \text{a.e. } d\mu(s), \text{ or } L^1(d\mu).
\tag{14}
\]

By the Shannon–McMillan–Breiman theorem [22, p. 261] (generalized to the invariant measure case),

\[
\lim_{n\to\infty} -\frac{1}{n} \log \mu(s^n) = -E^\mu\Big( \sum_i \log \mu\big(U_i|\mathcal{F}_{-\infty}^{-1}\big)\, I_{U_i} \,\Big|\, \mathcal{I} \Big)(s) \quad \text{a.e. } d\mu(s), \text{ or } L^1(d\mu).
\tag{15}
\]

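Then by taking the expectation on both sides of (15), we can get

\[
\begin{aligned}
h_\mu(\theta) = h_\mu(\theta, \mathcal{U})
&= \lim_{n\to\infty} \frac{1}{n} H_\mu\Big(\bigvee_{k=0}^{n-1} \theta^{-k}\mathcal{U}\Big)\\
&= -\lim_{n\to\infty} \frac{1}{n} \sum_{s^n} \mu(s^n) \log \mu(s^n)\\
&= -\lim_{n\to\infty} E^\mu\Big[ \frac{1}{n} \log \mu(s^n) \Big]\\
&= E^\mu\Big[ -E^\mu\Big( \sum_i \log \mu\big(U_i|\mathcal{F}_{-\infty}^{-1}\big)\, I_{U_i} \,\Big|\, \mathcal{I} \Big)(s) \Big]\\
&= -E^\mu\Big[ \sum_i \log \mu\big(U_i|\mathcal{F}_{-\infty}^{-1}\big)\, I_{U_i} \Big].
\end{aligned}
\tag{16}
\]

The desired result follows from (14), (15) and (16).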
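The almost-sure limit (12) can be observed along a single sample path. This is a minimal simulation sketch under assumptions chosen only for illustration: µ and µφ are Bernoulli (product) measures on the full two-shift with symbol probabilities p and q, which corresponds to φ(x) = log q[x0] with Pθ(φ) = 0. In this ergodic case the right-hand side of (12) reduces to the relative entropy Σ p_i log(p_i/q_i). The seed and sample size are arbitrary.

```python
import math
import random
from typing import List, Sequence


def log_ratio_rate(sample: Sequence[int], p: List[float], q: List[float]) -> float:
    """(1/n) log( mu(s^n) / mu_phi(s^n) ) for product measures with marginals p and q."""
    total = sum(math.log(p[s] / q[s]) for s in sample)
    return total / len(sample)


def relative_entropy(p: List[float], q: List[float]) -> float:
    """Sum over symbols of p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


rng = random.Random(0)                 # fixed seed so the run is reproducible
p = [0.7, 0.3]                         # marginal of mu
q = [0.4, 0.6]                         # marginal of mu_phi
sample = rng.choices([0, 1], weights=p, k=200_000)   # a mu-typical sequence

limit = relative_entropy(p, q)
for n in (100, 10_000, 200_000):
    print(n, log_ratio_rate(sample[:n], p, q), limit)
```

Along the sampled sequence the normalized log-ratio settles near the relative entropy. Equivalently, the µφ-measure of the cylinder divided by its µ-measure decays exponentially at that rate.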

Obviously, Proposition 2 implies Proposition 1. In spite of this fact, we keep the simple and direct proof of Proposition 1 to help those readers who are not familiar with the Shannon–McMillan–Breiman theorem.

Assume that µ is ergodic and µ ≠ µφ; then µ and µφ are mutually singular [33, Th. 6.10], and h(µ, µφ) > 0. Proposition 2 shows that for any sequence s ∈ ΣA that is typical w.r.t. µ, the µφ measure of the cylinder set s^n divided by its µ measure, µφ(s^n)/µ(s^n), converges exponentially to zero with exponential rate h(µ, µφ).

2.2. Axiom-A diffeomorphism. In this subsection, we suppose that (M, f) is a C^r (r ≥ 1) Axiom A diffeomorphism and Ωs is a basic set of (M, f). Let R be a Markov partition of Ωs with diameter small enough. We denote by A the transition matrix of f|Ωs with respect to R. We can define the subshift of finite type (ΣA, θ) and the map π : ΣA → Ωs as in Appendix 4.4. Since

\[
\lim_{n\to\infty} \operatorname{diam} \bigvee_{k=-n}^{n} f^{-k}\mathcal{R} = 0,
\tag{17}
\]

by the property of the entropy of a continuous map with respect to a partition (see [33, Th. 4.12, Th. 8.3] or [3, Prop. 2.4]), for any µ ∈ Mf(Ωs),

\[
\begin{aligned}
h_\mu(f|_{\Omega_s}) &= \lim_{n\to\infty} h_\mu\Big( f|_{\Omega_s}, \bigvee_{k=-n}^{n} f^{-k}\mathcal{R} \Big)\\
&= \lim_{n\to\infty} h_\mu\Big( f|_{\Omega_s}, \bigvee_{k=0}^{2n} f^{-k}\mathcal{R} \Big)\\
&= h_\mu(f|_{\Omega_s}, \mathcal{R}),
\end{aligned}
\tag{18}
\]

i.e. R is an f|Ωs-generator. Fix a Hölder continuous function φ : Ωs → ℝ, i.e. |φ(x) − φ(y)| ≤ b d(x, y)^γ (b > 0, γ > 0). Then by Theorem 4.1 in [3], φ has a unique equilibrium state µφ w.r.t. f|Ωs.


2.2.1. Mixing case. Assume that $f|_{\Omega_s}$ is topologically mixing.

Definition 2. For any $\mu \in \mathcal M_f(\Omega_s)$, we define the specific information gain (or specific relative entropy) of $\mu$ with respect to $\mu_\phi$ by

$$h(\mu,\mu_\phi) \stackrel{\mathrm{def}}{=} \lim_{n\to\infty}\frac1n H(\mu|_{\mathcal F_n}, \mu_\phi|_{\mathcal F_n}), \tag{19}$$

where $\mathcal F_n = \sigma\big(\bigvee_{k=0}^{n-1} f^{-k}\mathcal R\big)$.

From the proposition below we know that the limit in the definition exists and is independent of the choice of Markov partition $\mathcal R$.

Remark. The elements of $\mathcal R$ are closed proper rectangles. Strictly speaking, $\mathcal R$ is not a partition, since some of its elements intersect one another on the boundary. We can modify the elements of $\mathcal R$ appropriately on the boundary to make them pairwise disjoint. In this article, when we consider the measure-theoretic entropy of $f$ w.r.t. $\mu$, or the specific information gain of $\mu$ w.r.t. $\mu_\phi$, we use this modified Markov partition $\mathcal R$. If $\mu(\partial\mathcal R) = 0$, there is in fact no need to modify the Markov partition $\mathcal R$.

Proposition 3. Suppose that $f|_{\Omega_s}$ is topologically mixing, $\phi : \Omega_s \to \mathbb R$ is Hölder continuous and $\mu_\phi$ is the equilibrium state of $\phi$ with respect to $f|_{\Omega_s}$. Then for any $\mu \in \mathcal M_f(\Omega_s)$,

$$h(\mu,\mu_\phi) = e(\mu,\phi) - h_\mu(f|_{\Omega_s}) + P_f(\phi) = -\int\phi\,d\mu - h_\mu(f|_{\Omega_s}) + P_f(\phi), \tag{20}$$

where $h_\mu(f|_{\Omega_s})$ is the measure-theoretic entropy of $f|_{\Omega_s}$ w.r.t. $\mu$, $P_f(\phi)$ is the topological pressure of $\phi$ w.r.t. $f|_{\Omega_s}$, and $e(\mu,\phi) = -\int\phi\,d\mu$ is the specific energy of $\phi$ w.r.t. $\mu$. Furthermore, $h(\mu,\mu_\phi) \ge 0$, where the equality holds if and only if $\mu = \mu_\phi$.

Proof. To prove the proposition, we need the following fact [3, p. 90, Lemma 4.2]: There are $\epsilon > 0$ and $\alpha \in (0,1)$ such that if $x \in \Omega_s$, $y \in M$ and $d(f^kx, f^ky) \le \epsilon$ for all $k \in [-n, n]$, then $d(x,y) < \alpha^n$. As $f|_{\Omega_s}$ is topologically mixing, by Prop. 3.19 in [3], $(\Sigma_A, \theta)$ is topologically mixing. We can assume that the diameter of the Markov partition $\mathcal R$ is smaller than the $\epsilon$ given above. Then $\phi^* = \phi\circ\pi \in \mathcal F_A$ and has a Gibbs measure $\bar\mu_\phi \in \mathcal M_\theta(\Sigma_A)$. We have $\mu_\phi = \bar\mu_\phi\pi^{-1}$. For any $\mu \in \mathcal M_f(\Omega_s)$, there exists $\bar\mu \in \mathcal M_\theta(\Sigma_A)$ such that $\bar\mu\pi^{-1} = \mu$. We have $h_\mu(f|_{\Omega_s}) = h_\mu(f|_{\Omega_s}, \mathcal R) = h_{\bar\mu}(\theta)$, $P_f(\phi) = P_\theta(\phi\circ\pi)$, and

$$e(\mu,\phi) = -\int\phi\,d\mu = -\int\phi\circ\pi\,d\bar\mu = e(\bar\mu, \phi\circ\pi).$$

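So

$$\begin{aligned}
h(\mu,\mu_\phi) &= \lim_{n\to\infty}\frac1n H(\mu|_{\mathcal F_n}, \mu_\phi|_{\mathcal F_n}) \\
&= \lim_{n\to\infty}\frac1n\sum_{s^n}\mu\Big(\bigcap_{k=0}^{n-1} f^{-k}R_{s_k}\Big)\log\frac{\mu\big(\bigcap_{k=0}^{n-1} f^{-k}R_{s_k}\big)}{\mu_\phi\big(\bigcap_{k=0}^{n-1} f^{-k}R_{s_k}\big)} \\
&= \lim_{n\to\infty}\frac1n\sum_{s^n}\bar\mu\pi^{-1}\Big(\bigcap_{k=0}^{n-1} f^{-k}R_{s_k}\Big)\log\frac{\bar\mu\pi^{-1}\big(\bigcap_{k=0}^{n-1} f^{-k}R_{s_k}\big)}{\bar\mu_\phi\pi^{-1}\big(\bigcap_{k=0}^{n-1} f^{-k}R_{s_k}\big)} \\
&= \lim_{n\to\infty}\frac1n\sum_{s^n}\bar\mu(s^n)\log\frac{\bar\mu(s^n)}{\bar\mu_\phi(s^n)} \\
&= h(\bar\mu,\bar\mu_\phi) \\
&= e(\bar\mu,\phi\circ\pi) - h_{\bar\mu}(\theta) + P_\theta(\phi\circ\pi) \quad \text{(by Proposition 1)} \\
&= e(\mu,\phi) - h_\mu(f|_{\Omega_s}) + P_f(\phi).
\end{aligned}$$

By the variational principle and the uniqueness of the equilibrium state for $\phi$ w.r.t. $f|_{\Omega_s}$, we have $h(\mu,\mu_\phi) \ge 0$, where the equality holds if and only if $\mu = \mu_\phi$.

Remark about the proof. Obviously, $h_{\bar\mu}(\theta) \ge h_\mu(f|_{\Omega_s}, \mathcal R)$. With the modified Markov partition $\mathcal R$, to verify $h_{\bar\mu}(\theta) = h_\mu(f|_{\Omega_s}, \mathcal R)$, there are some subtleties to be considered (one needs to restrict $\pi$ to a closed subset of $\Sigma_A$). One can avoid this trouble by giving a proof similar to that of Proposition 1, exploiting

$$\frac1c \le \frac{\mu_\phi\big(\bigcap_{k=0}^{n-1} f^{-k}R_{s_k}\big)}{\exp\big(-nP_f(\phi) + S_n\phi(x)\big)} \le c, \tag{21}$$

which is valid for any $x \in \Omega_s$ such that $f^kx \in R_{s_k}$, $0 \le k \le n-1$, where $S_n\phi(x) = \sum_{k=0}^{n-1}\phi(f^kx)$.

We can also prove the following local version of Proposition 3, as we did in proving Proposition 2:

Proposition 4. With the assumptions of Proposition 3, let $B_n(x)$ be the member of the partition $\bigvee_{k=0}^{n-1} f^{-k}\mathcal R$ to which $x$ belongs. Then

$$\lim_{n\to\infty}\frac1n\log\frac{\mu(B_n(x))}{\mu_\phi(B_n(x))} = P_f(\phi) - \hat h_\mu(x) - E^\mu(\phi\,|\,\mathcal I) \quad \text{a.e. } d\mu(x), \text{ or } L^1(d\mu), \tag{22}$$

where $\hat h_\mu(x) = -E^\mu\big(\sum_i\log\mu(R_i\,|\,\mathcal F_{-\infty}^{-1})\,I_{R_i}\,\big|\,\mathcal I\big)(x)$, $\mathcal F_{-\infty}^{-1} = \sigma\big(\bigvee_{k=-\infty}^{-1} f^{-k}\mathcal R\big)$, and $\mathcal I$ is the σ-field of $f$-invariant sets, i.e. $\mathcal I = \{B \mid B \in \mathcal B(\Omega_s),\ B = f^{-1}B\}$. Moreover,

$$h(\mu,\mu_\phi) = E^\mu\big[P_f(\phi) - \hat h_\mu(x) - E^\mu(\phi\,|\,\mathcal I)\big] = P_f(\phi) - h_\mu(f|_{\Omega_s}) - \int\phi\,d\mu.$$

If $\mu$ is ergodic, then $\hat h_\mu(x) = h_\mu(f|_{\Omega_s}, \mathcal R) = h_\mu(f|_{\Omega_s})$, a.e. $d\mu(x)$.

Assume that $\mu$ is ergodic and $\mu \ne \mu_\phi$; then $\mu$ and $\mu_\phi$ are mutually singular [33, Th. 6.10], and $h(\mu,\mu_\phi) > 0$. Proposition 4 shows that for any typical point $x$ w.r.t. $\mu$, the $\mu_\phi$-measure of its neighbourhood $B_n(x)$ divided by its $\mu$-measure, $\mu_\phi(B_n(x))/\mu(B_n(x))$, converges exponentially to zero with exponential rate $h(\mu,\mu_\phi)$.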


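2.2.2. Transitive case. We now consider the case $\Omega_s = X_1\cup\cdots\cup X_m$ with $fX_k = X_{k+1}$ $(1 \le k \le m,\ X_{m+1} = X_1)$ and $f^m|_{X_k}$ mixing. For $\mu \in \mathcal M_f(\Omega_s)$, one has $\mu(X_1) = 1/m$ and $\mu' = m\mu|_{X_1} \in \mathcal M_{f^m}(X_1)$. Conversely, if $\mu' \in \mathcal M_{f^m}(X_1)$, then $\mu \in \mathcal M_f(\Omega_s)$, where $\mu(E) = \frac1m\sum_{k=0}^{m-1}\mu'(X_1\cap f^kE)$. One can check that $\mu \leftrightarrow \mu'$ defines a bijection $\mathcal M_f(\Omega_s) \leftrightarrow \mathcal M_{f^m}(X_1)$, that $h_{\mu'}(f^m) = m\,h_\mu(f)$ and that $\int S_m\phi\,d\mu' = m\int\phi\,d\mu$. Hence $\mu$ maximizes $h_\mu(f) + \int\phi\,d\mu$ if and only if $\mu'$ maximizes $h_{\mu'}(f^m) + \int S_m\phi\,d\mu'$, i.e. $\mu$ is the equilibrium state of $\phi$ w.r.t. $f|_{\Omega_s}$ iff $\mu'$ is the equilibrium state of $S_m\phi|_{X_1}$ w.r.t. $f^m|_{X_1}$. Furthermore, $P_{f^m}(S_m\phi|_{X_1}) = m\,P_f(\phi|_{\Omega_s})$. (See the proof of Theorem 4.1 in [3].)

Fix a Markov partition $\mathcal R$, with diameter small enough, of the basic set $\Omega_s$. Let $\mathcal R_k \stackrel{\mathrm{def}}{=} \mathcal R\cap X_k = \{R_i\cap X_k \mid R_i \in \mathcal R\}$; then $\mathcal R_k = \{R_i^{(k)}\} \subset \mathcal R$ is a Markov partition of $X_k$ as a mixing basic set of $f^m$. Let $\mu_k = m\mu|_{X_k}$, $\phi_m = S_m\phi$ and let $\mu^k_{\phi_m}$ be the equilibrium state of $\phi_m|_{X_k}$. Then $\mu_\phi = \frac1m\sum_{k=1}^m\mu^k_{\phi_m}$, defined by $\mu_\phi(E) = \frac1m\sum_{k=1}^m\mu^k_{\phi_m}(E\cap X_k)$, is the unique equilibrium state of $\phi$ w.r.t. $f|_{\Omega_s}$.

For any $\mu \in \mathcal M_f(\Omega_s)$, we define $\mu_k$ as before. Let $\tilde{\mathcal F}_n = \sigma\big(\bigvee_{l=0}^{n-1}(f^m)^{-l}\mathcal R\big)$ and $\mathcal F_{k,n} = \sigma\big(\bigvee_{l=0}^{n-1}(f^m)^{-l}\mathcal R_k\big)$. Since $\mu_k = m\mu$ and $\mu^k_{\phi_m} = m\mu_\phi$ on $X_k$,

$$\begin{aligned}
\sum_{k=1}^m h(\mu_k,\mu^k_{\phi_m}) &= \sum_{k=1}^m\lim_{n\to\infty}\frac1n H\big(\mu_k|_{\mathcal F_{k,n}}, \mu^k_{\phi_m}|_{\mathcal F_{k,n}}\big) \\
&= \sum_{k=1}^m\lim_{n\to\infty}\frac1n\sum_{s^n}\mu_k\Big(\bigcap_{l=0}^{n-1}(f^m)^{-l}R^{(k)}_{s_l}\Big)\log\frac{\mu_k\big(\bigcap_{l=0}^{n-1}(f^m)^{-l}R^{(k)}_{s_l}\big)}{\mu^k_{\phi_m}\big(\bigcap_{l=0}^{n-1}(f^m)^{-l}R^{(k)}_{s_l}\big)} \\
&= m\lim_{n\to\infty}\frac1n\sum_{k=1}^m\sum_{s^n}\mu\Big(\bigcap_{l=0}^{n-1}(f^m)^{-l}R^{(k)}_{s_l}\Big)\log\frac{\mu\big(\bigcap_{l=0}^{n-1}(f^m)^{-l}R^{(k)}_{s_l}\big)}{\mu_\phi\big(\bigcap_{l=0}^{n-1}(f^m)^{-l}R^{(k)}_{s_l}\big)} \\
&= m\lim_{n\to\infty}\frac1n H\big(\mu|_{\tilde{\mathcal F}_n}, \mu_\phi|_{\tilde{\mathcal F}_n}\big).
\end{aligned}$$

Since $\mu^k_{\phi_m}$ is the equilibrium state of $\phi_m$ on the mixing basic set $X_k$ of $f^m$, by Proposition 3,

$$\begin{aligned}
\sum_{k=1}^m h(\mu_k,\mu^k_{\phi_m}) &= \sum_{k=1}^m\Big[-\int S_m\phi\,d\mu_k - h_{\mu_k}(f^m|_{X_k}) + P_{f^m}(S_m\phi|_{X_k})\Big] \\
&= \sum_{k=1}^m\Big[-m\int\phi\,d\mu - m\,h_\mu(f|_{\Omega_s}) + m\,P_f(\phi)\Big] \\
&= m^2\Big[-\int\phi\,d\mu - h_\mu(f|_{\Omega_s}) + P_f(\phi)\Big].
\end{aligned}$$

So

$$\lim_{n\to\infty}\frac1n H\big(\mu|_{\tilde{\mathcal F}_n}, \mu_\phi|_{\tilde{\mathcal F}_n}\big) = m\Big[-\int\phi\,d\mu - h_\mu(f|_{\Omega_s}) + P_f(\phi)\Big]. \tag{23}$$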


Definition 3. For $\mu \in \mathcal M_f(\Omega_s)$, we define the specific information gain (or specific relative entropy) of $\mu$ with respect to $\mu_\phi$ by

$$h(\mu,\mu_\phi) \stackrel{\mathrm{def}}{=} \frac1m\lim_{n\to\infty}\frac1n H\big(\mu|_{\tilde{\mathcal F}_n}, \mu_\phi|_{\tilde{\mathcal F}_n}\big), \tag{24}$$

where $\tilde{\mathcal F}_n = \sigma\big(\bigvee_{l=0}^{n-1}(f^m)^{-l}\mathcal R\big)$.

From the analysis above, we know the limit in the definition exists and is independent of the choice of Markov partition $\mathcal R$ of the basic set $\Omega_s$. If $f|_{\Omega_s}$ is topologically mixing, then the definition is the same as before. We have proved the following theorem.

Theorem 1. Suppose that $\Omega_s$ is a basic set of the $C^r$ $(r \ge 1)$ Axiom A diffeomorphism $(M, f)$, $\phi : \Omega_s \to \mathbb R$ is Hölder continuous and $\mu_\phi$ is the equilibrium state of $\phi$ with respect to $f|_{\Omega_s}$. Then for any $\mu \in \mathcal M_f(\Omega_s)$,

$$h(\mu,\mu_\phi) = e(\mu,\phi) - h_\mu(f|_{\Omega_s}) + P_f(\phi) = -\int\phi\,d\mu - h_\mu(f|_{\Omega_s}) + P_f(\phi). \tag{25}$$

Furthermore, $h(\mu,\mu_\phi) \ge 0$, where the equality holds if and only if $\mu = \mu_\phi$.

As $f|_{\Omega_s}$ is an expansive homeomorphism of the compact metric space $\Omega_s$, the entropy map $h_\cdot(f|_{\Omega_s})$ of $f|_{\Omega_s}$ is affine and upper semi-continuous on $\mathcal M_f(\Omega_s)$ [33, Th. 8.1, Th. 8.2]. Therefore $h(\cdot,\mu_\phi)$ is affine and lower semi-continuous on $\mathcal M_f(\Omega_s)$.

3. Entropy Production in Axiom-A Systems

Suppose that $(M, f)$ is a $C^2$ transitive Anosov diffeomorphism. For $\mu \in \mathcal M_f(M)$, Ruelle proposed in [30] a definition of the entropy production rate $e_p(\mu) = -\int\log\Lambda\,d\mu$, where $\Lambda$ is the absolute Jacobian of $Df$ defined below. Ruelle showed that $e_p(\mu)$ is the rate at which entropy needs to be pumped out of the system to keep the system in the stationary state $\mu$. Because of the SBR property of the invariant measure $\mu_+$, the entropy production rate $e_p(\mu_+)$ is of physical interest. Ruelle [30] and Gallavotti, et al. [10] identified it with the phase space contraction rate under the action of $f$. Ruelle [30, Lemma 1.1] asserted that $e_p(\mu_+)$ is equal to minus the sum of the Lyapunov exponents of $(M, f, \mu_+)$, but did not give a detailed proof. (See Proposition 7 below.) In this section we prove that the entropy production rate $e_p(\mu_+)$ is identical to the specific information gain $h(\mu_+,\mu_-)$ of $\mu_+$ w.r.t. $\mu_-$, and give some necessary and sufficient conditions for the positivity of the entropy production rate $e_p(\mu_+)$.

By the definition of $h(\mu_+,\mu_-)$ and Proposition 4, it is clear that $e_p(\mu_+)$ measures the difference between the system $(M, f, \mu_+)$ and its time reversal $(M, f^{-1}, \mu_-)$, so we can say that the entropy production rate characterizes the degree of macroscopic irreversibility of the Anosov system $(M, f)$. We also discuss the relation between the entropy production rate defined by Ruelle [30] for basic sets of general Axiom A diffeomorphisms and the specific information gain in such systems.

Suppose that $(M, f)$ is a $C^r$ $(r \ge 1)$ Axiom A diffeomorphism. Let $\Omega(f)$ be the nonwandering set of $(M, f)$, and let $W_x^s$, $W_x^u$ be the stable manifold and unstable manifold of $f$ at the point $x \in \Omega(f)$ respectively. For $x \in \Omega(f)$, the tangent space $T_xM$ has the decomposition $T_xM = E_x^s\oplus E_x^u$, where $E_x^s$ and $E_x^u$ are the stable subspace and unstable subspace of $T_xM$ respectively. Let $\Lambda(x)$, $\Lambda_s(x)$ and $\Lambda_u(x)$ be respectively the absolute Jacobians of the linear maps $Df : T_xM \to T_{fx}M$, $Df : E_x^s \to E_{fx}^s$ and $Df : E_x^u \to E_{fx}^u$, using the inner products derived from the Riemannian metric on $M$. The absolute Jacobian of the linear map $Df^{-1} : E_x^s \to E_{f^{-1}x}^s$ is $(\Lambda_s(f^{-1}x))^{-1}$.

Lemma 1. Suppose that $(M, f)$ is a $C^r$ $(r \ge 1)$ Axiom A diffeomorphism. Then there exists a positive continuous function $F : \Omega(f) \to \mathbb R^+$ such that for any $x \in \Omega(f)$,

$$\Lambda(x) = \Lambda_u(x)\Lambda_s(x)\frac{F(fx)}{F(x)}. \tag{26}$$

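Proof. Let $d_s$ and $d_u$ be the dimensions of the stable manifolds and unstable manifolds respectively, and $d = d_s + d_u$. For $x \in \Omega(f)$, we can find orthonormal bases of $E_x^s$ and $E_x^u$ such that $E_x^s = \operatorname{span}\{\xi_1(x),\cdots,\xi_{d_s}(x)\}$, $E_x^u = \operatorname{span}\{\eta_1(x),\cdots,\eta_{d_u}(x)\}$, and $\xi_1(x),\cdots,\xi_{d_s}(x),\eta_1(x),\cdots,\eta_{d_u}(x)$ vary continuously with $x \in \Omega(f)$. We denote $(\zeta_1,\cdots,\zeta_d) = (\xi_1,\cdots,\xi_{d_s},\eta_1,\cdots,\eta_{d_u})$, and by

$$\|\zeta_1\wedge\cdots\wedge\zeta_d\| = \big[\det(\langle\zeta_i,\zeta_j\rangle)\big]^{1/2}$$

the length of the $d$-exterior product vector $\zeta_1\wedge\cdots\wedge\zeta_d$. Then by definition we have

$$\Lambda(x) = |\operatorname{Jac}(Df(x))| = \frac{\|Df(\xi_1(x))\wedge\cdots\wedge Df(\xi_{d_s}(x))\wedge Df(\eta_1(x))\wedge\cdots\wedge Df(\eta_{d_u}(x))\|}{\|\xi_1(x)\wedge\cdots\wedge\xi_{d_s}(x)\wedge\eta_1(x)\wedge\cdots\wedge\eta_{d_u}(x)\|}.$$

Let $(\tilde\zeta_1,\cdots,\tilde\zeta_d) = (\xi_1(fx),\cdots,\xi_{d_s}(fx),\eta_1(fx),\cdots,\eta_{d_u}(fx))$ and assume that

$$Df(x)\zeta_i = \sum_{j=1}^d c_{ij}(x)\tilde\zeta_j, \quad 1 \le i \le d.$$

Then

$$\begin{aligned}
\det(\langle Df(\zeta_i), Df(\zeta_j)\rangle) &= \det\Big(\Big\langle\sum_k c_{ik}\tilde\zeta_k, \sum_l c_{jl}\tilde\zeta_l\Big\rangle\Big) \\
&= \det\Big(\sum_{k,l} c_{ik}c_{jl}\langle\tilde\zeta_k,\tilde\zeta_l\rangle\Big) \\
&= \det\big(C(\langle\tilde\zeta_k,\tilde\zeta_l\rangle)C^T\big) \\
&= |\det C|^2\det(\langle\tilde\zeta_i,\tilde\zeta_j\rangle),
\end{aligned}$$

where, since $Df$ maps $E_x^s$ into $E_{fx}^s$ and $E_x^u$ into $E_{fx}^u$, the matrix $C$ is block diagonal:

$$|\det C| = \left|\det\begin{pmatrix} C^s & 0 \\ 0 & C^u\end{pmatrix}\right| = \left|\det\begin{pmatrix}
c^s_{11} & \cdots & c^s_{1d_s} & 0 & \cdots & 0 \\
\vdots & & \vdots & \vdots & & \vdots \\
c^s_{d_s1} & \cdots & c^s_{d_sd_s} & 0 & \cdots & 0 \\
0 & \cdots & 0 & c^u_{11} & \cdots & c^u_{1d_u} \\
\vdots & & \vdots & \vdots & & \vdots \\
0 & \cdots & 0 & c^u_{d_u1} & \cdots & c^u_{d_ud_u}
\end{pmatrix}\right| = \Lambda_s(x)\Lambda_u(x),$$

because

$$\begin{aligned}
\Lambda_s(x) = |\operatorname{Jac}(Df|_{E_x^s})| &= \frac{\|Df(\xi_1(x))\wedge\cdots\wedge Df(\xi_{d_s}(x))\|}{\|\xi_1(x)\wedge\cdots\wedge\xi_{d_s}(x)\|} \\
&= \frac{\big[\det\langle Df(\xi_i(x)), Df(\xi_j(x))\rangle\big]^{1/2}}{\big[\det\langle\xi_i(x),\xi_j(x)\rangle\big]^{1/2}} \\
&= \Big[\det\Big\langle\sum_k c^s_{ik}\xi_k(fx), \sum_l c^s_{jl}\xi_l(fx)\Big\rangle\Big]^{1/2} = |\det C^s|,
\end{aligned}$$

and similarly $\Lambda_u(x) = |\det C^u|$. So

$$\Lambda(x) = \Lambda_s(x)\Lambda_u(x)\frac{\big[\det\langle\tilde\zeta_i,\tilde\zeta_j\rangle\big]^{1/2}}{\big[\det\langle\zeta_i,\zeta_j\rangle\big]^{1/2}} = \Lambda_s(x)\Lambda_u(x)\frac{F(fx)}{F(x)},$$

where

$$F(x) = \|\xi_1(x)\wedge\cdots\wedge\xi_{d_s}(x)\wedge\eta_1(x)\wedge\cdots\wedge\eta_{d_u}(x)\| = \left[\det\begin{pmatrix} I & (\langle\xi_i(x),\eta_j(x)\rangle) \\ (\langle\eta_k(x),\xi_l(x)\rangle) & I\end{pmatrix}\right]^{1/2}.$$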

Assume that $(M, f)$ is a $C^2$ Axiom A diffeomorphism and $\Omega_s$ is a basic set of $(M, f)$. For $x \in \Omega_s$, let $\phi_u(x) = -\log\Lambda_u(x)$ and $\phi_s(x) = \log\Lambda_s(f^{-1}x)$. Then $\phi_u$ and $\phi_s$ are Hölder continuous functions on $\Omega_s$. By Theorem 4.1 in [3], each of $\phi_u$ and $\phi_s$ has a unique equilibrium state, $\mu_{\phi_u}$ (w.r.t. $f|_{\Omega_s}$) and $\mu_{\phi_s}$ (w.r.t. $f^{-1}|_{\Omega_s}$). We denote $\mu_+ = \mu_{\phi_u}$, $\mu_- = \mu_{\phi_s}$, and $p_+ = P_f(\phi_u)$, $p_- = P_{f^{-1}}(\phi_s)$. Bowen [3, Prop. 4.8] showed that $p_+$ and $p_-$ are respectively the escape rates of $f$ and $f^{-1}$ from neighborhoods of the basic set $\Omega_s$. $\mu_+$ and $\mu_-$ are respectively the generalized SBR measures for $f|_{\Omega_s}$ and $f^{-1}|_{\Omega_s}$. If $\Omega_s$ is a hyperbolic attractor of $(M, f)$, then by Theorem 4.11 in [3], $p_+ = 0$ and $\mu_+$ is the SBR measure for $f|_{\Omega_s}$.

Let $v$ be the volume measure on $M$ induced by the Riemannian metric. If $(M, f)$ is a $C^2$ transitive Anosov diffeomorphism, then the nonwandering set $\Omega(f) = M$ is the only basic set of $f$. In this case, $p_+ = p_- = 0$, hence $\mu_+$ and $\mu_-$ are respectively the SBR measures for $(M, f)$ and $(M, f^{-1})$. By Theorem 4.12 in [3], for $v$-almost all $x \in M$,

$$\lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}F(f^kx) = \int F\,d\mu_+, \qquad \lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}F(f^{-k}x) = \int F\,d\mu_- \tag{27}$$

for any $F \in C(M)$. $\mu_+$ and $\mu_-$ describe statistical properties of typical trajectories and they are generated exclusively by the dynamics. So $\mu_+$ and $\mu_-$ are natural distributions of the Anosov system $(M, f)$ and its time reversal $(M, f^{-1})$ respectively.

Theorem 2. Let $\Omega_s$ be a basic set of the $C^2$ Axiom A diffeomorphism $(M, f)$, and let $\mu_+$ and $\mu_-$ be the generalized SBR measures on $\Omega_s$ defined above. Then

$$h(\mu_+,\mu_-) = -\int_{\Omega_s}\log\Lambda\,d\mu_+ - p_+ + p_-. \tag{28}$$

In particular, if $(M, f)$ is a $C^2$ transitive Anosov diffeomorphism, then the entropy production rate

$$e_p(\mu_+) \stackrel{\mathrm{def}}{=} -\int_M\log\Lambda\,d\mu_+ = h(\mu_+,\mu_-). \tag{29}$$

Proof. By Theorem 1 (applied to $f^{-1}$ and $\phi_s$) and the $f$-invariance of $\mu_+$, we have

$$h(\mu_+,\mu_-) = e(\mu_+,\phi_s) - h_{\mu_+}(f^{-1}|_{\Omega_s}) + p_- = -\int_{\Omega_s}\log\Lambda_s\,d\mu_+ - h_{\mu_+}(f|_{\Omega_s}) + p_-.$$

By the definition of equilibrium state, $h_{\mu_+}(f|_{\Omega_s}) - \int_{\Omega_s}\log\Lambda_u\,d\mu_+ = p_+$, so by Lemma 1,

$$\begin{aligned}
h(\mu_+,\mu_-) &= -\int_{\Omega_s}\log\Lambda_s\,d\mu_+ - \int_{\Omega_s}\log\Lambda_u\,d\mu_+ - p_+ + p_- \\
&= -\int_{\Omega_s}\log\Lambda\,d\mu_+ + \int_{\Omega_s}\log\frac{F(fx)}{F(x)}\,d\mu_+(x) - p_+ + p_-.
\end{aligned}$$

By the compactness of $\Omega_s$, there exists a constant $B > 1$ such that $F(x)$ and $\frac{F(fx)}{F(x)}$ are bounded between $B^{-1}$ and $B$. Since $\mu_+$ is $f$-invariant,

$$\int_{\Omega_s}\log F(fx)\,d\mu_+(x) = \int_{\Omega_s}\log F(x)\,d\mu_+(x),$$

that is, $\int_{\Omega_s}\log\frac{F(fx)}{F(x)}\,d\mu_+ = 0$. We have proved (28).

Ruelle [30] defined the entropy production rate $e_p(\mu_+, f|_{\Omega_s})$ associated with the escape from the Axiom A basic set $\Omega_s$ under the action of $f$ by

$$e_p(\mu_+, f|_{\Omega_s}) \stackrel{\mathrm{def}}{=} -\int_{\Omega_s}\log\Lambda\,d\mu_+ - p_+.$$

If $\Omega_s$ is an attractor of $(M, f^{-1})$, then $p_- = 0$ and $h(\mu_+,\mu_-)$ is identical to the entropy production rate $e_p(\mu_+, f|_{\Omega_s})$ defined by Ruelle. If $\Omega_s$ is an attractor of $(M, f)$, then $p_+ = 0$ and $h(\mu_+,\mu_-) = -\int_{\Omega_s}\log\Lambda\,d\mu_+ + p_-$. For a general basic set $\Omega_s$ of the $C^2$ Axiom A diffeomorphism $(M, f)$, by (28), it seems reasonable to define the entropy production rate of $f|_{\Omega_s}$ by

$$e_p(\mu_+, f|_{\Omega_s}) \stackrel{\mathrm{def}}{=} -\int_{\Omega_s}\log\Lambda\,d\mu_+ - p_+ + p_- = h(\mu_+,\mu_-). \tag{30}$$

Then $e_p(\mu_+, f|_{\Omega_s}) \ge 0$, where the equality holds if and only if $\mu_+ = \mu_-$.

Remark. As is shown above, the entropy production rate of a Markov chain and that of an Anosov diffeomorphism can both be expressed as the specific relative entropy between the forward and the backward evolution. Lebowitz, Spohn [14] and Maes [18] pointed out that, formally, the entropy production in both cases can be regarded as being caused by the currents associated with the breaking of time-reversal symmetry of certain space-time Gibbs measures. In the diffeomorphism case, the Gibbs measures are obtained by lifting $\mu_+$ and $\mu_-$ to the sequence space $\Sigma_A$ via the Markov partition $\mathcal R$. That is, they are Borel probability measures $\bar\mu_+$ and $\bar\mu_-$ on $\Sigma_A$ such that $\bar\mu_+\pi^{-1} = \mu_+$ and $\bar\mu_-\pi^{-1} = \mu_-$.

From now on in this section, we assume that $(M, f)$ is a $C^2$ transitive Anosov diffeomorphism. Most chaotic systems in statistical mechanics and fluid dynamics enjoy the property of positive entropy production. In the mathematical models of these systems, Anosov diffeomorphisms, the positivity of the entropy production rate $e_p(\mu_+)$ does not always hold. Ruelle [30] proved that if $\mu_+$ is not absolutely continuous with respect to the volume measure $v$, then $e_p(\mu_+) > 0$. With the help of $h(\mu_+,\mu_-)$, we can easily prove that this condition is also necessary.

Proposition 5. The entropy production rate $e_p(\mu_+) > 0$ if and only if $f$ has no invariant probability measure absolutely continuous with respect to the volume measure $v$.

Proof. If $f$ leaves invariant a probability measure $\mu$ absolutely continuous w.r.t. the volume measure $v$, then $\mu = \mu_+ = \mu_-$ [3, p. 102, Cor. 4.13], so $h(\mu_+,\mu_-) = 0$. Conversely, if $e_p(\mu_+) = h(\mu_+,\mu_-) = 0$, then by Theorem 1, $\mu_+ = \mu_-$. As $\mathcal M_f(M) = \mathcal M_{f^{-1}}(M)$ and $h_\nu(f) = h_\nu(f^{-1})$ for all $\nu \in \mathcal M_f(M)$, $\mu_-$ is also the unique equilibrium state of $\phi_s$ w.r.t. $f$. Then $\phi_u$ and $\phi_s$ have the same equilibrium state $\mu_+$ w.r.t. $f$. By [3, Prop. 4.5], for each periodic point $x \in M$ with period $p$,

$$\sum_{k=0}^{p-1}\phi_u(f^kx) - \sum_{k=0}^{p-1}\phi_s(f^kx) = p\big(P_f(\phi_u) - P_f(\phi_s)\big) = p(p_+ - p_-) = 0. \tag{31}$$

Then by Lemma 1 and (31),

$$|\operatorname{Jac}(Df^p(x))| = |\operatorname{Jac}(Df^p|_{E_x^s})|\,|\operatorname{Jac}(Df^p|_{E_x^u})| = 1.$$

This is equivalent to $f$ admitting an invariant measure $\mu$ absolutely continuous w.r.t. $v$ [3, Th. 4.14].


Now we give another sufficient condition for $e_p(\mu_+) = 0$. We will see that it is also necessary.

Proposition 6. If the relative entropy of $\mu_+$ w.r.t. $\mu_-$ satisfies $H(\mu_+,\mu_-) < +\infty$, then

$$e_p(\mu_+) = h(\mu_+,\mu_-) = 0. \tag{32}$$

Proof. To prove the proposition, we need the following fact [5]: Let $X$ be a complete separable metric space (i.e. Polish space) and $\mathcal B(X)$ the Borel σ-algebra. If $\mu$ and $\nu$ are two probability measures on $(X, \mathcal B(X))$, and $\mathcal G_1 \subset \mathcal G_2$ are two sub-σ-fields of $\mathcal B(X)$, then

$$H(\mu|_{\mathcal G_2}, \nu|_{\mathcal G_2}) = H(\mu|_{\mathcal G_1}, \nu|_{\mathcal G_1}) + E^\mu\big(H(\mu_x|_{\mathcal G_2}, \nu_x|_{\mathcal G_2})\big),$$

where $\mu_x$ and $\nu_x$ are the regular conditional probability distributions of $\mu$ and $\nu$ given $\mathcal G_1$ respectively. So $H(\mu|_{\mathcal G_2}, \nu|_{\mathcal G_2}) \ge H(\mu|_{\mathcal G_1}, \nu|_{\mathcal G_1})$. Since $\tilde{\mathcal F}_n \subset \mathcal F_{mn} \subset \mathcal B(\Omega_s)$, we have $H(\mu_+|_{\tilde{\mathcal F}_n}, \mu_-|_{\tilde{\mathcal F}_n}) \le H(\mu_+|_{\mathcal F_{mn}}, \mu_-|_{\mathcal F_{mn}}) \le H(\mu_+,\mu_-)$, and therefore

$$\begin{aligned}
h(\mu_+,\mu_-) &= \lim_{n\to\infty}\frac{1}{mn}H\big(\mu_+|_{\tilde{\mathcal F}_n}, \mu_-|_{\tilde{\mathcal F}_n}\big) \\
&\le \lim_{n\to\infty}\frac{1}{mn}H\big(\mu_+|_{\mathcal F_{mn}}, \mu_-|_{\mathcal F_{mn}}\big) \\
&\le \lim_{n\to\infty}\frac{1}{mn}H(\mu_+,\mu_-) = 0.
\end{aligned}$$

Remark. Suppose that $\Omega_s$ is a basic set of a $C^r$ $(r \ge 1)$ Axiom A diffeomorphism $(M, f)$ and $\mathcal R$ is a Markov partition of $\Omega_s$ with diameter small enough. Let $\mathcal F^n = \sigma\big(\bigvee_{k=-n}^{n}f^{-k}\mathcal R\big)$. Exploiting the variational expression of relative entropy [5] (Appendix 4.1) and $\lim_{n\to\infty}\operatorname{diam}\big(\bigvee_{k=-n}^{n}f^{-k}\mathcal R\big) = 0$, one can prove the following result: For any $\mu, \nu \in \mathcal M(\Omega_s)$, not necessarily $f$-invariant, $\lim_{n\to\infty}H(\mu|_{\mathcal F^n}, \nu|_{\mathcal F^n}) = H(\mu,\nu)$. If $\mu, \nu \in \mathcal M_f(\Omega_s)$, then $\lim_{n\to\infty}H(\mu|_{\mathcal F_n}, \nu|_{\mathcal F_n}) = H(\mu,\nu)$.

We now pause to connect the entropy production rate $e_p(\mu_+)$ with the Lyapunov exponents of $(M, f, \mu_+)$, which will reveal some geometrical meaning of $e_p(\mu_+)$, as explained before Theorem 3. By the Oseledec multiplicative ergodic theorem [21, 29], there exists a Borel set $A \subset M$ with the following properties:

(1) $fA = A$ and $\mu(A) = 1$ for all $\mu \in \mathcal M_f(M)$.

(2) For each $x \in A$, the Lyapunov characteristic exponents of the diffeomorphism $f$ at $x$, $\lambda_x^{(1)} < \cdots < \lambda_x^{(s(x))}$, and their multiplicities $m_x^{(k)}$, $1 \le k \le s(x)$, are defined. That is, there are linear subspaces $\{0\} = V_x^{(0)} \subset V_x^{(1)} \subset \cdots \subset V_x^{(s(x))} = T_xM$ such that $m_x^{(k)} = \dim V_x^{(k)} - \dim V_x^{(k-1)}$, and

$$\lim_{n\to\infty}\frac1n\log\|Df^n(x)u\| = \lambda_x^{(k)}$$

when $u \in V_x^{(k)}\setminus V_x^{(k-1)}$, for $k = 1,\cdots,s(x)$.

Since $f$ is a diffeomorphism, for any $x \in A$, $\lambda_x^{(1)} > -\infty$ and $\lambda_x^{(k)} = \lambda_{fx}^{(k)}$, $Df(V_x^{(k)}) = V_{fx}^{(k)}$ for $k = 1,\cdots,s(x)$. As $\mu_+$ is ergodic, $s(x)$ is $\mu_+$-almost everywhere constant, and for each $k$, $\lambda_x^{(k)}$ and its multiplicity $m_x^{(k)}$ are $\mu_+$-almost everywhere constant. We denote these constants by $\lambda^{(k)}(\mu_+, f)$, $m^{(k)}(\mu_+, f)$, $1 \le k \le s$.

Proposition 7. For any $\mu \in \mathcal M_f(M)$,

$$\int\log\Lambda\,d\mu = \int\sum_{k=1}^{s(x)}m_x^{(k)}\lambda_x^{(k)}\,d\mu(x).$$

In particular, the entropy production rate

$$e_p(\mu_+) = -\int\sum_{k=1}^{s(x)}m_x^{(k)}\lambda_x^{(k)}\,d\mu_+(x) = -\sum_{k=1}^{s}m^{(k)}(\mu_+, f)\lambda^{(k)}(\mu_+, f). \tag{33}$$

Proof. Let $d$ be the dimension of the manifold $M$. For $x \in M$, the $d$th exterior power $(Df(x))^{\wedge d}$ of $Df(x)$ is the linear map on the $d$th exterior power $\wedge^dT_xM$ of $T_xM$ defined by

$$(Df(x))^{\wedge d}(u_1\wedge\cdots\wedge u_d) = Df(x)u_1\wedge\cdots\wedge Df(x)u_d, \quad \forall u_1,\cdots,u_d \in T_xM.$$

By the proof of the Oseledec multiplicative ergodic theorem [29], for each $x \in A$,

$$\lim_{n\to\infty}\frac1n\log|\operatorname{Jac}(Df^n(x))| = \lim_{n\to\infty}\frac1n\log\|(Df^n(x))^{\wedge d}\| = \sum_{k=1}^{s(x)}m_x^{(k)}\lambda_x^{(k)}. \tag{34}$$

Fix any $\mu \in \mathcal M_f(M)$; then by the Birkhoff ergodic theorem, for $\mu$-almost all $x \in M$,

$$\lim_{n\to\infty}\frac1n\log|\operatorname{Jac}(Df^n(x))| = \lim_{n\to\infty}\frac1n\sum_{k=0}^{n-1}\log\Lambda(f^kx) = E^\mu(\log\Lambda\,|\,\mathcal I)(x), \tag{35}$$

where $\mathcal I = \{B \mid B \in \mathcal B(M),\ B = f^{-1}B\}$ is the σ-field of $f$-invariant sets. From (34) and (35), we get that for $\mu$-almost all $x \in A$,

$$\sum_{k=1}^{s(x)}m_x^{(k)}\lambda_x^{(k)} = E^\mu(\log\Lambda\,|\,\mathcal I)(x),$$

therefore

$$\int\log\Lambda\,d\mu = \int\sum_{k=1}^{s(x)}m_x^{(k)}\lambda_x^{(k)}\,d\mu(x).$$

By (27), for $v$-almost all $x \in M$,

$$\lim_{n\to\infty}\frac1n\log|\operatorname{Jac}(Df^n(x))| = \int\log\Lambda\,d\mu_+ = -e_p(\mu_+).$$

That is to say, the exponential rate of volume contraction of $(M, f)$ is $v$-almost everywhere equal to $e_p(\mu_+) = -\sum_{k=1}^{s}m^{(k)}(\mu_+, f)\lambda^{(k)}(\mu_+, f)$.
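The identity between the sum of Lyapunov exponents (counted with multiplicity) and the average of the log-Jacobian can be illustrated numerically in the simplest possible case. The sketch below is not from the paper: it assumes a linear hyperbolic map of the 2-torus (the matrix with rows (2, 1) and (1, 1)), whose derivative is constant, so the Birkhoff average of the log-Jacobian reduces to the logarithm of the absolute determinant. The function name `lyapunov_exponents_2x2` and the choice of 200 iterations are hypothetical; the exponents are estimated by repeated Gram–Schmidt (QR) re-orthonormalization.

```python
import math


def lyapunov_exponents_2x2(A, steps=200):
    """Estimate both Lyapunov exponents of the constant 2x2 cocycle A by QR iteration."""
    q1, q2 = (1.0, 0.0), (0.0, 1.0)
    log_r11 = 0.0
    log_r22 = 0.0
    for _ in range(steps):
        # Push the orthonormal frame forward by A.
        v1 = (A[0][0] * q1[0] + A[0][1] * q1[1], A[1][0] * q1[0] + A[1][1] * q1[1])
        v2 = (A[0][0] * q2[0] + A[0][1] * q2[1], A[1][0] * q2[0] + A[1][1] * q2[1])
        # Gram-Schmidt: the stretching factors are the diagonal entries of R.
        r11 = math.hypot(v1[0], v1[1])
        q1 = (v1[0] / r11, v1[1] / r11)
        r12 = q1[0] * v2[0] + q1[1] * v2[1]
        w = (v2[0] - r12 * q1[0], v2[1] - r12 * q1[1])
        r22 = math.hypot(w[0], w[1])
        q2 = (w[0] / r22, w[1] / r22)
        log_r11 += math.log(r11)
        log_r22 += math.log(r22)
    return log_r11 / steps, log_r22 / steps


# Hypothetical example: a hyperbolic toral automorphism with determinant 1.
cat_map = [[2.0, 1.0], [1.0, 1.0]]
top, bottom = lyapunov_exponents_2x2(cat_map)
log_abs_det = math.log(abs(cat_map[0][0] * cat_map[1][1] - cat_map[0][1] * cat_map[1][0]))
entropy_production = -(top + bottom)

print(top, bottom, log_abs_det, entropy_production)
```

For this map the two exponents cancel, so the computed entropy production rate vanishes up to rounding. This is consistent with the fact that the map preserves Lebesgue measure, which is the situation in which the entropy production rate is zero.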


Theorem 3. If $(M, f)$ is a $C^2$ transitive Anosov diffeomorphism, then the following are equivalent:

(i) $e_p(\mu_+) = h(\mu_+,\mu_-) = 0$.
(ii) $f$ admits an invariant measure $\mu \ll v$.
(iii) $\mu_+ \ll v$.
(iv) $\mu_+ = \mu_-$.
(v) $H(\mu_+,\mu_-) < +\infty$.
(vi) $H(\mu_+,\mu_-) = 0$.
(vii) $\sum_{k=1}^{s}m^{(k)}(\mu_+, f)\lambda^{(k)}(\mu_+, f) = 0$.
(viii) For any periodic $x \in M$ of period $p$, $|\operatorname{Jac}(Df^p(x))| = 1$.

Proof. (a) By Proposition 5, (i) and (ii) are equivalent. Then by [3, Th. 4.14], (ii) is equivalent to (viii). From Theorem 1, we have that (i) is equivalent to (iv). (b) If (ii) holds, then by [3, Cor. 4.13], $\mu = \mu_+$, hence $\mu_+ \ll v$. (c) If $H(\mu_+,\mu_-) < +\infty$, then by Proposition 6, $e_p(\mu_+) = h(\mu_+,\mu_-) = 0$. By Theorem 1, $\mu_+ = \mu_-$. So $H(\mu_+,\mu_-) = 0$. (d) The equivalence of (i) and (vii) follows from Proposition 7.

As $\mu_+$ and $\mu_-$ are ergodic and they are extreme points of the convex set $\mathcal M_f(M)$, either $\mu_+ = \mu_-$ or they are mutually singular [33, Th. 6.10]. By this fact, we can also prove Proposition 6 and the equivalence of (iv), (v) and (vi) in Theorem 3. As is known, among the $C^2$ Anosov diffeomorphisms, the ones that admit no invariant measure $\mu \ll v$ are open and dense, so most Anosov systems have positive entropy production rate.

Remark. Suppose that $g$ is a $C^2$ diffeomorphism of a compact Riemannian manifold $M$ preserving a Borel probability measure $\mu$, and the Lyapunov exponents of $g$ are $\mu$-almost everywhere nonzero. Ledrappier [15] proved that the following two conditions are equivalent: (i) Pesin's entropy formula holds true for the system $(M, g, \mu)$, i.e.

$$h_\mu(g) = \int\sum_{i=1}^{s(x)}m_x^{(i)}\lambda_x^{(i)+}\,d\mu(x),$$

where $a^+ = \max(a, 0)$; (ii) $\mu$ has absolutely continuous conditional measures on unstable manifolds, i.e. $\mu$ is an SBR measure for $g$. (See also Ledrappier and Young [16].) Exploiting this result and the absolute continuity of local stable manifolds, Ledrappier [15] then proved that $\mu$ is absolutely continuous with respect to the volume measure $v$ on $M$ if and only if Pesin's entropy formula holds true for the systems $(M, g, \mu)$ and $(M, g^{-1}, \mu)$, i.e. $\mu$ is an SBR measure for $g$ and $g^{-1}$. (See Theorem 5.5 and Corollary 5.6 in [15].) For the Anosov diffeomorphism $(M, f)$, noticing that

$$\int\log\Lambda_u\,d\mu_+ = \sum_{i=1}^{s}m^{(i)}(\mu_+, f)\lambda^{(i)+}(\mu_+, f)$$

and

$$\int\log\Lambda_s\,d\mu_+ = -\sum_{i=1}^{s}m^{(i)}(\mu_+, f)\lambda^{(i)-}(\mu_+, f),$$

where $a^- = -\min(a, 0)$, one can employ Ledrappier's results to give another proof of the equivalence of $e_p(\mu_+) = 0$ and $\mu_+ \ll v$, besides our thermodynamic presentation.


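4. Appendix

4.1. Relative entropy [5]. Suppose that $\mu$ and $\nu$ are two probability measures on a measurable space $(\Omega, \mathcal F)$. The relative entropy of $\mu$ with respect to $\nu$ is defined as

$$H(\mu,\nu) \stackrel{\mathrm{def}}{=} \begin{cases}\displaystyle\int\log\frac{d\mu}{d\nu}(\omega)\,\mu(d\omega) & \text{if } \mu \ll \nu \text{ and } \log\dfrac{d\mu}{d\nu} \in L^1(d\mu), \\[2ex] +\infty & \text{otherwise.}\end{cases}$$

There is another equivalent definition:

$$H(\mu,\nu) = \sup_{\phi\in B(\mathcal F)}\Big[\int\phi\,d\mu - \log\int e^\phi\,d\nu\Big],$$

where $\phi$ ranges over all bounded $\mathcal F$-measurable functions. If $\Omega$ is a Polish space and $\mathcal F$ is the Borel σ-field, then replacing $B(\mathcal F)$ by $C(\Omega)$ gives the same supremum.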
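On a finite space the equivalence of the two definitions can be checked directly. The following minimal sketch is an illustration added under stated assumptions: the space has three points, the two probability vectors `mu` and `nu` are hypothetical and strictly positive, and the supremum is probed by random bounded test functions together with the explicit maximizer $\phi = \log(d\mu/d\nu)$.

```python
import math
import random


def relative_entropy(mu, nu):
    """H(mu, nu) = sum_i mu_i log(mu_i / nu_i) on a finite space (nu_i > 0)."""
    return sum(p * math.log(p / q) for p, q in zip(mu, nu) if p > 0)


def variational_functional(phi, mu, nu):
    """The quantity  int phi dmu - log int e^phi dnu  from the variational definition."""
    mean_phi = sum(p * f for p, f in zip(mu, phi))
    log_moment = math.log(sum(q * math.exp(f) for q, f in zip(nu, phi)))
    return mean_phi - log_moment


# Hypothetical example measures on a three-point space.
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.2, 0.6]

H = relative_entropy(mu, nu)
phi_star = [math.log(p / q) for p, q in zip(mu, nu)]  # candidate maximizer log(dmu/dnu)

rng = random.Random(0)
trials = [[rng.uniform(-5.0, 5.0) for _ in mu] for _ in range(1000)]
best_random = max(variational_functional(phi, mu, nu) for phi in trials)

print(H, variational_functional(phi_star, mu, nu), best_random)
```

The functional evaluated at `phi_star` reproduces `H`, while every randomly drawn test function stays below it, as the supremum formula predicts.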

4.2. General thermodynamic formalism [3,33]. If $\mathcal C = \{C_1,\cdots,C_m\}$ is a finite measurable partition of a probability space $(X, \mathcal B, \mu)$, the entropy of the partition $\mathcal C$ is defined by

$$H_\mu(\mathcal C) = -\sum_{k=1}^m\mu(C_k)\log\mu(C_k).$$

If $\mathcal D$ is another finite measurable partition, then $\mathcal C\vee\mathcal D \stackrel{\mathrm{def}}{=} \{C_i\cap D_j \mid C_i \in \mathcal C,\ D_j \in \mathcal D\}$. Let $T$ be a measure-preserving transformation of the probability space $(X, \mathcal B, \mu)$. If $\mathcal C$ is a finite measurable partition, the limit

$$h_\mu(T,\mathcal C) \stackrel{\mathrm{def}}{=} \lim_{n\to\infty}\frac1n H_\mu\Big(\bigvee_{k=0}^{n-1}T^{-k}\mathcal C\Big)$$

is called the entropy of $T$ w.r.t. $\mathcal C$. The quantity $h_\mu(T) = \sup_{\mathcal C}h_\mu(T,\mathcal C)$, where $\mathcal C$ ranges over all finite measurable partitions of $X$, is called the measure-theoretic entropy of $T$, or Kolmogorov–Sinai entropy of $T$.

Let $T : X \to X$ be a continuous map on the compact metric space $X$, and $\mathcal M_T(X)$ the set of all $T$-invariant probability measures. We refer the reader to [3, 33] for the definition of the topological pressure $P_T(\phi)$ of $\phi \in C(X)$ w.r.t. $T$. Walters, et al. [33] proved that for any $\phi \in C(X)$,

$$P_T(\phi) = \sup_{\mu\in\mathcal M_T(X)}\Big[h_\mu(T) + \int\phi\,d\mu\Big].$$

This is called the variational principle. If $\mu \in \mathcal M_T(X)$ satisfies $h_\mu(T) + \int\phi\,d\mu = P_T(\phi)$, then $\mu$ is called an equilibrium state for $\phi$ w.r.t. $T$.
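A case in which the variational principle can be verified by hand is sketched below, under simplifying assumptions that are not part of the text: $T$ is the full shift on three symbols, the potential depends only on the zeroth coordinate (values `phi`), and only Bernoulli measures are tested. In that setting the pressure is $\log\sum_i e^{\phi_i}$, the equilibrium state is the Bernoulli measure with weights proportional to $e^{\phi_i}$, and the gap $P_T(\phi) - h_\mu(T) - \int\phi\,d\mu$ coincides with the relative entropy of the one-symbol marginals. All names and numbers are hypothetical.

```python
import math


def pressure(phi):
    """Topological pressure of a one-coordinate potential on the full shift."""
    return math.log(sum(math.exp(v) for v in phi))


def equilibrium_weights(phi):
    """Weights of the Bernoulli equilibrium state: q_i proportional to exp(phi_i)."""
    z = sum(math.exp(v) for v in phi)
    return [math.exp(v) / z for v in phi]


def bernoulli_entropy(p):
    """Measure-theoretic entropy of the Bernoulli shift with weights p."""
    return -sum(x * math.log(x) for x in p if x > 0)


def pressure_gap(p, phi):
    """P(phi) - h_mu - int phi dmu for the Bernoulli measure mu with weights p."""
    mean_phi = sum(x * v for x, v in zip(p, phi))
    return pressure(phi) - bernoulli_entropy(p) - mean_phi


def relative_entropy(p, q):
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0)


# Hypothetical potential values and a competing Bernoulli measure.
phi = [0.3, -1.0, 0.8]
q = equilibrium_weights(phi)
p = [0.6, 0.3, 0.1]

print(pressure_gap(q, phi), pressure_gap(p, phi), relative_entropy(p, q))
```

The gap vanishes at the equilibrium weights `q` and is strictly positive for any other weight vector, mirroring the statement that the specific information gain is nonnegative and is zero only at the equilibrium state.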


4.3. Subshift of finite type and Gibbs measure [3]. If $A$ is an $m\times m$ matrix with entries $a_{ij} = 0$ or $1$, define

$$\Sigma = \prod_{\mathbb Z}\{1,\cdots,m\}, \qquad \Sigma_A = \{x \in \Sigma \mid a_{x_ix_{i+1}} = 1,\ \forall i \in \mathbb Z\},$$

and the left-shift $\theta : \Sigma_A \to \Sigma_A$, $(\theta x)_n = x_{n+1}$, $\forall n \in \mathbb Z$. We give $\{1,\cdots,m\}$ the discrete topology and $\Sigma$ the product topology. Then $\theta$ is a homeomorphism on $\Sigma_A$. For $\phi : \Sigma_A \to \mathbb R$ continuous, define $\operatorname{Var}_k\phi = \sup\{|\phi(x) - \phi(y)| : x_i = y_i,\ -k \le i \le k\}$. Let $\mathcal F_A = \{\phi \in C(\Sigma_A) \mid \operatorname{Var}_k\phi \le b\alpha^k\ (\forall k \ge 0) \text{ for some positive constants } b \text{ and } \alpha \in (0,1)\}$.

Suppose that $(\Sigma_A, \theta)$ is topologically mixing and fix a function $\phi \in \mathcal F_A$. There is a unique $\theta$-invariant Borel probability measure $\mu_\phi$ on $\Sigma_A$ for which one can find constants $c > 1$ and $p = P_\theta(\phi)$ such that

$$\frac1c \le \frac{\mu_\phi(\{y \in \Sigma_A \mid y_i = x_i,\ 0 \le i \le n-1\})}{\exp(-np + S_n\phi(x))} \le c$$

for any $x \in \Sigma_A$, where $S_n\phi(x) = \sum_{k=0}^{n-1}\phi(\theta^kx)$ [3, Th. 1.4]. The measure $\mu_\phi$ is mixing for $\theta$, therefore ergodic, and is the unique equilibrium state for $\phi$ w.r.t. $\theta$ [3, Th. 1.22]. We call $\mu_\phi$ the Gibbs measure of $\phi$. According to traditional terminology, this class $\{\mu_\phi\}$ ought to be called "Gibbs measures with translation invariant exponentially decreasing interactions".

4.4. Axiom-A diffeomorphism [3]. Suppose that $f : M \to M$ is a diffeomorphism of a compact $C^\infty$ Riemannian manifold $M$. A closed subset $\Lambda \subset M$ is called hyperbolic if $f(\Lambda) = \Lambda$ and each tangent space $T_xM$ with $x \in \Lambda$ can be written as a direct sum $T_xM = E_x^s\oplus E_x^u$ of subspaces such that

(a) $Df(E_x^s) = E_{fx}^s$, $Df(E_x^u) = E_{fx}^u$;
(b) there exist constants $c > 0$ and $\lambda \in (0,1)$ so that $\|Df^n(v)\| \le c\lambda^n\|v\|$ when $v \in E_x^s$, $n \ge 0$ and $\|Df^{-n}(v)\| \le c\lambda^n\|v\|$ when $v \in E_x^u$, $n \ge 0$; and
(c) $E_x^s$ and $E_x^u$ vary continuously with $x$.

We say that $f$ satisfies Axiom A if the nonwandering set $\Omega(f)$ is hyperbolic and $\Omega(f) = \overline{\{x : x \text{ is periodic}\}}$. $f$ is called an Anosov diffeomorphism if $M$ is hyperbolic. We refer the reader to [3] for the definitions of basic set and Markov partition. If $\mathcal R = \{R_1,\cdots,R_m\}$ is a Markov partition of a basic set $\Omega_s$, we define the transition matrix $A = A(\mathcal R) = (a_{ij})$ by

$$a_{ij} = \begin{cases}1 & \text{if } \operatorname{int}R_i\cap f^{-1}\operatorname{int}R_j \ne \emptyset, \\ 0 & \text{otherwise.}\end{cases}$$

We define the subshift of finite type $(\Sigma_A, \theta)$ as in Appendix 4.3. For each $x \in \Sigma_A$, the set $\bigcap_{j\in\mathbb Z}f^{-j}R_{x_j}$ consists of a single point, denoted by $\pi(x)$. The map $\pi : \Sigma_A \to \Omega_s$ is a continuous surjection, $\pi\circ\theta = f\circ\pi$, and $\pi$ is one-to-one over the residual set $Y = \Omega_s\setminus\bigcup_{j\in\mathbb Z}f^j\partial\mathcal R$, where $\partial\mathcal R = \partial^s\mathcal R\cup\partial^u\mathcal R$ is the boundary of $\mathcal R$ as defined in Bowen [3].


References

1. Adler, R.L.: Symbolic dynamics and Markov partitions. Bull. Am. Math. Soc. (N.S.) 35 (1), 1–56 (1998)
2. Andrey, L.: The rate of entropy change in non-Hamiltonian systems. Phys. Lett. A 111, 45–46 (1985)
3. Bowen, R.: Equilibrium states and the ergodic theory of Anosov diffeomorphisms. Lect. Note Math. 470, Berlin: Springer-Verlag, 1975
4. Chazottes, J.R., Floriani, E. and Lima, R.: Relative entropy and identification of Gibbs measures in dynamical systems. J. Stat. Phys. 90 (3–4), 697–725 (1998)
5. Donsker, M.D. and Varadhan, S.R.S.: Asymptotic evaluation of certain Markov process expectations for large time, I. Comm. Pure Appl. Math. 28, 1–47 (1975); IV, Comm. Pure Appl. Math. 36, 183–212 (1983)
6. Eckmann, J.P. and Ruelle, D.: Ergodic theory of chaos and strange attractors. Rev. Modern Phys. 57 (3), 617–656 (1985)
7. Föllmer, H.: On entropy and information gain in random fields. Z. Wahrsch. Verw. Gebiete 26, 207–2171 (1973)
8. Föllmer, H. and Orey, S.: Large deviations for the empirical field of Gibbs measure. Ann. Prob. 16 (3), 961–977 (1988)
9. Gallavotti, G.: Chaotic hypothesis: Onsager reciprocity and fluctuation-dissipation theorem. Jour. Stat. Physics 84, 899–926 (1996)
10. Gallavotti, G. and Cohen, E.G.D.: Dynamical ensembles in stationary states. Jour. Stat. Phys. 80, 931–970 (1995)
11. Gong, G.L. and Qian, M.P.: The invariant measures, probability flow and circulations of one-dimensional Markov processes. In: Fukushima, M. (ed.) Functional analysis in Markov processes (Lect. Not. Math. 923). Proceedings, Katata and Kyoto 1981, Berlin–Heidelberg: Springer-Verlag, 1982, pp. 188–198
12. Hasengawa, H.: Progress of Theoretic Phys. Vol. 55, 90; Vol. 56, 44; Vol. 57, 1523; Vol. 58, 128
13. Kalpazidou, S.L.: Cycle representations of Markov processes. New York: Springer-Verlag, 1995
14. Lebowitz, J.L. and Spohn, H.: A Gallavotti–Cohen-type symmetry in the large deviation functional for stochastic dynamics. J. Stat. Phys. 95 (1–2), 333–365 (1999)
15. Ledrappier, F.: Propriétés ergodiques des mesures de Sinai. Publ. Math. IHES 59, 163–188 (1984)
16. Ledrappier, F. and Young, L.S.: The metric entropy of diffeomorphisms Part I: Characterization of measures satisfying Pesin's entropy formula. Ann. Math. 122, 509–539 (1985)
17. Lin, X.W. and Zhou, Q.S.: Large deviation principle for equilibrium states of Axiom A diffeomorphisms. Shu Xue Jin Zhan (China) 18 (1), 119–121 (1989)
18. Maes, C.: The fluctuation theorem as a Gibbs property. J. Stat. Phys. 95 (1–2), 367–392 (1999)
19. Orey, S. and Pelikan, S.: Large deviation principle for stationary processes. Ann. Prob. 16 (4), 1481–1495 (1988)
20. Orey, S. and Pelikan, S.: Deviations of trajectory averages and the defect in Pesin's formula for Anosov diffeomorphisms. Trans. Am. Math. Soc. 315 (2), 741–753 (1989)
21. Oseledec, V.I.: A multiplicative ergodic theorem. Liapunov characteristic numbers for dynamical systems. Trans. Moscow Math. Soc. 19, 197–221 (1968)
22. Peterson, K.: Ergodic Theory. Cambridge: Cambridge University Press, 1983
23. Preston, C.: Random Fields. Lect. Note Math. 534, New York: Springer-Verlag, 1976
24. Qian, M.P. and Qian, M.: Circulation for recurrent Markov chains. Z. Wahrsch. Verw. Gebiete 59, 203–210 (1982)
25. Qian, M.P. and Qian, M.: The entropy production and reversibilty of Markov processes. Kexue Tongbao (China) 3, 165–167 (1985)
26. Qian, M.P., Qian, M. and Gong, G.L.: The reversibility and the entropy production of Markov processes. Contemp. Math. 118, 255–261 (1991)
27. Qian, M.P., Qian, C. and Qian, M.: Circulations of Markov chains with continuous time and the probability interpretation of some determinants. Sci. Sinica (Series A) 27 (5), 470–481 (1984)
28. Ruelle, D.: Thermodynamic formalism. Massachusetts: Addison-Wesley Publishing Company, 1978
29. Ruelle, D.: Ergodic theory of differentiable dynamical systems. Publ. Math. IHES 50, 275–306 (1979)
30. Ruelle, D.: Positivity of entropy production in nonequilibrium statistical mechanics. J. Stat. Phys. 85 (1–2), 1–23 (1996)
31. Schnakenberg, J.: Network theory of microscopic and macroscopic behaviour of master equation systems. Rev. Modern Phys. 48 (4), 571–585 (1976)
32. Sinai, Ya.G.: Gibbs measures in ergodic theory. Russian Math. Surveys 27 (4), 21–69 (1972)
33. Walters, P.: An introduction to ergodic theory. New York: Springer-Verlag, 1982

Communicated by Ya. G. Sinai

Commun. Math. Phys. 214, 411 – 428 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Random Homogenization and Singular Perturbations in Perforated Domains

Việt Hà Hoàng

Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge CB3 9EW, UK. E-mail: [email protected]

Received: 15 December 1999 / Accepted: 14 April 2000

Abstract: The paper considers the singularly perturbed Dirichlet problem $-\varepsilon\Delta u^\varepsilon + u^\varepsilon = f$ in a randomly perforated domain $\Omega^\varepsilon$, which is obtained from a bounded open set $\Omega$ in $\mathbb{R}^N$ after removing many holes of size $\varepsilon^q$. The perforated domain is described in terms of an ergodic dynamical system acting on a probability space. Imposing certain conditions on the domain, the behaviour of $u^\varepsilon$ when $\varepsilon \to 0$ in Lebesgue spaces $L^n(\Omega)$ is studied. Test functions together with the Birkhoff ergodic theorem are the main tools of analysis. The Poisson distribution of holes of size $\varepsilon^p$ with the intensity $\lambda\varepsilon^{-r}$ is then considered. The above results apply in some cases; other cases are treated by the Wiener sausage approach.

1. Introduction

We consider in this paper the singularly perturbed Dirichlet problem

$$-\varepsilon\Delta u^\varepsilon + u^\varepsilon = f, \tag{1}$$

in a perforated domain $\Omega^\varepsilon$ obtained by perforating holes, whose size is of a positive order of $\varepsilon$, from a bounded domain $\Omega$ in $\mathbb{R}^N$. Assuming that $f \in L^2(\Omega)$, we study the asymptotic behaviour when $\varepsilon \to 0$ of the solution $u^\varepsilon$, which is in $H_0^1(\Omega^\varepsilon)$ (regarded as 0 outside $\Omega^\varepsilon$) and satisfies

$$\varepsilon\int_{\Omega^\varepsilon} \nabla u^\varepsilon\cdot\nabla\psi\,dx + \int_{\Omega^\varepsilon} u^\varepsilon\psi\,dx = \int_{\Omega^\varepsilon} f\psi\,dx, \tag{2}$$

for all $\psi \in H_0^1(\Omega^\varepsilon)$. In [13], we considered the problem (1) in a periodically perforated domain. When the holes' size and the period are of the same order of $\varepsilon$, periodic test functions which satisfy certain equations in the unit periodic cell were employed. This is essentially the "energy method" of Tartar [25], presented in detail in [5], and was employed in the situation of


perforated domains in [26]. When the holes' size is of a smaller order than the period, the framework of Cioranescu and Murat [6] was used. However, as pointed out in [12], this framework works for a more general class of perforated domains in which holes may not be periodically distributed and may not be of the same shapes and sizes. In [14], we examined (1) in domains in which holes are distributed randomly according to a smooth probability density function. The Wiener sausage approach, which was introduced in the seminal paper of Kac [15], was used together with new results in the theory of the Wiener sausage by Le Gall [16] and Sznitman [23].

In this paper, we study the problem (1) in perforated domains which can be described using the concept of a dynamical system acting on a probability space. This class of perforated domains covers periodic domains in which the holes' size and the period are of the same order as a special case. Domains with a Poisson distribution of holes can also be described by this framework. The framework was initiated and developed by Kozlov, Zhikov and Oleinik (see [27]) and Papanicolaou and Varadhan [20] for the problem of homogenization of partial differential equations with random coefficients. Our description of perforated domains follows that of Beliaev and Kozlov [4].

In physical terms, (1) describes a steady state diffusion process in which there may be a competition between the smallness of the diffusivity and the small holes' size and spacing. An example may be found in Talbot and Willis [24]. The function $f$ represents the rate of introduction of the diffusing species, and $u^\varepsilon$ is a linear sink term. The Dirichlet boundary condition occurs when we assume an ideal sink condition inside the holes, i.e. the sink coefficient is infinity.

In the next section, we define the perforated domains and we touch upon necessary preliminaries. Homogenization of the problem (1) in those perforated domains is addressed in Sect. 3. The limit of $u^\varepsilon$ in Lebesgue spaces and corrector results when the convergence is weak are studied, but an order of convergence cannot be established due to the lack of a Poincaré-type inequality. In Sect. 4, we consider a Poisson distribution of holes, in which the intensity of the distribution is of a negative order of $\varepsilon$. When $N \ge 3$, the cases which are not amenable to this framework can be treated by using the Wiener sausage approach in [14]; the proofs are quite similar to those in [14] so only the results are presented. Finally, the appendix elaborates the results on the pointwise boundedness of $u^\varepsilon$ which are used earlier in the paper.

The (non-singularly perturbed) Laplace equation in perforated domains with a probabilistic description has been studied in several papers in the literature. Kac [15] considered the behaviour of the eigenvalues of the Laplacian with respect to the Dirichlet boundary condition in a domain in $\mathbb{R}^3$, which is obtained from a bounded set $\Omega$ by perforating $M$ spheres of radius $\delta$ such that $\lim_{\delta\to 0} M\delta = \alpha$. The spheres are uniformly randomly distributed and are allowed to intersect each other. Using the Wiener sausage, he showed that in the limit $\delta \to 0$, these eigenvalues converge to the corresponding eigenvalues of the Laplacian in $\Omega$ shifted by $2\pi\alpha/|\Omega|$. Also employing the Wiener sausage, Rauch and Taylor [22] generalized the result by Kac to the case where holes are distributed according to a continuous density function; the domain need not be bounded. The convergence is established for the solution of the time dependent heat equation using the asymptotics of the heat kernel and the Feynman–Kac formula. The result by Kac was later developed further by Ozawa [17, 18], who showed an estimate for the rate of the convergence; but Ozawa used the non-probabilistic method of perturbational calculus, which is quite complicated computationally. This approach was then extended by Figari et al. [9] to obtain fluctuation results for the eigenvalues.
Papanicolaou and Varadhan [21] deduced a pointwise estimate for the nonstationary diffusion equation


employing estimates for the time at which a Brownian trajectory first hits the boundary of the holes. Similar results for more general domains were then obtained by Baxter et al. [2] and Baxter and Jain [3]. The Laplace equation in a domain where holes are distributed according to a non-smooth density function is considered in Balzano [1] using the $\gamma$-convergence of measures. For other related work in periodically and general perforated domains, we refer to the vast bibliography at the end of the book by Dal Maso [7].

Throughout this paper, $c$ denotes various constants which do not depend on $\varepsilon$. Integrals are with respect to $dx$ in $\Omega$ except where the domains and variables are specified. By $\|\cdot\|$ we denote the norm in $L^2(\Omega)$ in the scalar case and in $(L^2(\Omega))^N$ in the vector case, except where the space is specified; and $\langle\cdot\rangle$ denotes the mean of a random variable in $L^1(W)$ ($W$ is the set of realizations defined below). As usual, repeated indices indicate summation.

2. Preliminaries

Let $(W, \mathcal{F}, P)$ be a probability space. Acting on this space, there is an $N$-dimensional dynamical system $T(x) : W \to W$ ($x \in \mathbb{R}^N$) which satisfies the group properties and preserves the $P$-measure. We assume further that it is ergodic. It is well known that if $g \in L^\alpha(W)$ then $g(T(x)\omega) \in L^\alpha_{\rm loc}(\mathbb{R}^N)$ a.s. and if $g_n \to g$ in $L^\alpha(W)$ then $g_n(T(x)\omega) \to g(T(x)\omega)$ in probability in $L^\alpha_{\rm loc}(\mathbb{R}^N)$ (see [27]).

For each random variable $g : W \to \mathbb{R}$, we denote by $\partial_i g$ the random variable (if it exists) such that almost surely $\partial_i g(T(x)\omega)$ is the generalized derivative $\partial g(T(x)\omega)/\partial x_i$ of the realization $g(T(x)\omega)$ of $g$. By $\partial g$ we denote the vector $(\partial_1 g, \ldots, \partial_N g)$.

Let $K(t) \in \mathcal{D}(\mathbb{R}^N)$ be a positive even function with support inside the unit ball and $\int_{\mathbb{R}^N} K(t)\,dt = 1$. Let $K^\delta(t) = \delta^{-N} K(\delta^{-1} t)$ for $\delta > 0$. For each $g \in L^2(W)$, we set

$$g^\delta(\omega) = \int_{\mathbb{R}^N} K^\delta(t)\, g(T(t)\omega)\,dt. \tag{3}$$

It is not difficult to show that all the realizations $g^\delta(T(x)\omega)$ are infinitely differentiable functions in $\mathbb{R}^N$. Then there exists the derivative $\partial_i g^\delta$ for all $i$, which is defined as

$$\partial_i g^\delta(\omega) = -\int_{\mathbb{R}^N} \frac{\partial K^\delta(t)}{\partial t_i}\, g(T(t)\omega)\,dt. \tag{4}$$

If $g$ has a derivative $\partial_i g$, then

$$\partial_i g^\delta(\omega) = \int_{\mathbb{R}^N} K^\delta(t)\,\partial_i g(T(t)\omega)\,dt.$$
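As a brief check of (4), and of the formula for differentiable $g$ that follows it, one may argue as below. This is only a sketch: it assumes that the group property $T(t)T(x) = T(t+x)$ may be used pointwise and that differentiation under the integral sign is justified, neither of which is verified here.

```latex
% Realization of g^\delta along the orbit of \omega, after the substitution s = t + x:
g^\delta(T(x)\omega)
  = \int_{\mathbb{R}^N} K^\delta(t)\, g(T(t+x)\omega)\, dt
  = \int_{\mathbb{R}^N} K^\delta(s-x)\, g(T(s)\omega)\, ds .
% Differentiating in x_i and evaluating at x = 0 gives (4):
\frac{\partial}{\partial x_i}\, g^\delta(T(x)\omega)\Big|_{x=0}
  = -\int_{\mathbb{R}^N} \frac{\partial K^\delta(s)}{\partial s_i}\, g(T(s)\omega)\, ds .
% If g has a derivative \partial_i g, integrating by parts in s_i
% (K^\delta has compact support) moves the derivative onto g:
\partial_i g^\delta(\omega)
  = \int_{\mathbb{R}^N} K^\delta(s)\, \partial_i g(T(s)\omega)\, ds .
```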

We denote by $H^1(W)$ the set of all random variables $g \in L^2(W)$ such that for all $i = 1, \ldots, N$, the derivative $\partial_i g$ of $g$ is in $L^2(W)$. Equipping $H^1(W)$ with the norm

$$\langle g\cdot g\rangle^{1/2} + \langle \partial_i g\cdot\partial_i g\rangle^{1/2},$$

$H^1(W)$ is a Hilbert space. Zhikov et al. [27] prove that if $g^\delta$ is defined as in (3) then $g^\delta \to g$ in $L^2(W)$ when $\delta \to 0$. Thus we also have $\partial_i g^\delta \to \partial_i g$ in $L^2(W)$. Hence the set of random variables having infinitely differentiable realizations is dense in $H^1(W)$. Let $g \in H^1(W)$ and $\{g_n\}$ be a sequence in $L^\infty(W)$ which converges to $g$ in $L^2(W)$. Fixing $\delta > 0$, we have $g_n^\delta \to g^\delta$ in $H^1(W)$ as $n \to \infty$. For each $\delta > 0$ we choose $n(\delta)$ such that $\|g^\delta_{n(\delta)} - g^\delta\|_{H^1(W)} < \delta$; then $g^\delta_{n(\delta)} \to g$ in $H^1(W)$ as $\delta \to 0$. Since


$g_{n(\delta)} \in L^\infty(W)$, $g^\delta_{n(\delta)} \in L^\infty(W)$ and so the set of bounded random variables in $H^1(W)$ is dense in $H^1(W)$.

We next define the randomly perforated domain; the definition follows that of [4]. Let $V_0$ be a set in $\mathcal{F}$. For each $\omega$ we define a random set

$$V(\omega) = \{y \in \mathbb{R}^N : T(y)\omega \in V_0\}.$$

Let $\Omega$ be a bounded domain in $\mathbb{R}^N$. We define the randomly perforated domain $\Omega^\varepsilon$ to be $\Omega^\varepsilon = \Omega \cap \varepsilon^q V(\omega)$ ($\varepsilon > 0$, $q > 0$). For each $\delta > 0$, we let $V^\delta(\omega) = \{y \in \mathbb{R}^N : |y - y'| > \delta\ \ \forall y' \notin V(\omega)\}$, and $V_0^\delta = \{\omega \in W : 0 \in V^\delta(\omega)\}$, i.e. if $T(y')\omega \notin V_0$ then $|y'| > \delta$. The domain $V^\delta(\omega)$ can be defined as $V^\delta(\omega) = \{y \in \mathbb{R}^N : T(y)\omega \in V_0^\delta\}$.

Let $S^\delta$ be the set of all random variables having infinitely differentiable realizations and being equal to zero outside $V^\delta(\omega)$. An explicit example of such a random variable can be constructed. In fact if $g \in L^2(W)$ is such that $g(\omega) = 0$ if $\omega \notin V_0^{2\delta}$ then $g^\delta$ as defined in (3) belongs to $S^\delta$ ([4]). Letting $S = \bigcup_{\delta>0} S^\delta$, $S$ is a linear space. We define $H_0^1(V_0)$ to be the completion of $S \cap H^1(W)$ in $H^1(W)$.

To deal with the homogenization problem in the next section, we need another space of random variables. Beliaev and Kozlov [4] call the random domain $V(\omega)$ strictly porous if there exist a positive random variable $h(\omega)$ and a positive constant $m$ such that $\langle h^{-(1+m)}\rangle < \infty$ and for all $\phi \in \mathcal{D}(V(\omega))$ we have

$$\int_{\mathbb{R}^N} h(T(y)\omega)\,\phi^2(y)\,dy \le \int_{\mathbb{R}^N} |\nabla\phi(y)|^2\,dy. \tag{5}$$

Let $S'$ be the set of random variables $g$ such that $\partial_i g \in L^2(W)$ for all $i = 1, \ldots, N$. A random variable in $S'$ may not belong to $L^2(W)$. Beliaev and Kozlov proved that if $V(\omega)$ is strictly porous, then

$$\langle g^2 h\rangle \le \langle \partial g\cdot\partial g\rangle,$$

and

$$\langle |g|^n\rangle^{2/n} \le \langle g^2 h\rangle\,\langle h^{-(1+m)}\rangle^{1/(1+m)},$$

for $n = 1 + m/(2+m)$, for all $g \in S'$. Letting $H_0$ be the completion of $S'$ in the norm

$$\langle \partial g\cdot\partial g\rangle^{1/2},$$

we can easily see that $H_0 \subset L^n(W)$.

In the sequel, by weak convergence of a sequence, say $u^\varepsilon$, to a limit, say $u$, in probability in a certain space, we mean that from a sequence $\varepsilon \to 0$, we can extract a subsequence $\varepsilon_n \to 0$ and choose a subset $W_0 \subset W$ of $P$-measure 1 such that $u^{\varepsilon_n}$ converges weakly to $u$ in that space for all $\omega \in W_0$. Finally we note the following result in [4] which will be used throughout.


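Lemma 1. Let $F^i$ and $G$ in $L^2(W)$ be such that for any $\psi \in S \cap H^1(W)$ we have

$$\langle F^i \partial_i\psi + G\psi\rangle = 0. \tag{6}$$

Then for any $\psi \in \mathcal{D}(V(\omega))$,

$$\int_{\mathbb{R}^N} \Big[F^i(T(y)\omega)\,\frac{\partial\psi(y)}{\partial y_i} + G(T(y)\omega)\,\psi(y)\Big]dy = 0 \quad \text{a.s.}$$

By using a change of variable we have that

$$\int_{\mathbb{R}^N} \Big[\varepsilon^q F^i\big(T(x/\varepsilon^q)\omega\big)\frac{\partial\psi(x)}{\partial x_i} + G\big(T(x/\varepsilon^q)\omega\big)\psi(x)\Big]dx = 0$$

for all $\psi \in \mathcal{D}(\varepsilon^q V(\omega))$, and so for all $\psi \in H_0^1(\Omega^\varepsilon)$.

3. Homogenization of Singularly Perturbed Dirichlet Problems

We now consider (1) in the perforated domain $\Omega^\varepsilon$ defined in the previous section. Letting $\psi = u^\varepsilon$ in (2), we have that $\|u^\varepsilon\| \le \|f\|$ so from a sequence $u^\varepsilon$, we can always extract a subsequence, still denoted by $u^\varepsilon$, which converges weakly to a function $u$ in $L^2(\Omega)$. We first consider the case $q = 1/2$. Let $\xi \in H_0^1(V_0)$ be such that for all $\psi \in H_0^1(V_0)$,

$$\langle \partial\xi\cdot\partial\psi + \xi\psi\rangle = \langle\psi\rangle,$$

which exists and is unique by the Lax–Milgram lemma. Setting $\xi^\varepsilon(x) = \xi(T(x/\varepsilon^{1/2})\omega)$ we have $\xi^\varepsilon \rightharpoonup \langle\xi\rangle$ in $L^2(\Omega)$ due to the ergodic theorem and that

$$\varepsilon\int \nabla\xi^\varepsilon\cdot\nabla\psi + \int \xi^\varepsilon\psi = \int\psi, \tag{7}$$

for all $\psi \in H_0^1(\Omega^\varepsilon)$. Using this test function, we deduce the following behaviour of $u^\varepsilon$.

Theorem 1. When $q = 1/2$, $u^\varepsilon \rightharpoonup \langle\xi\rangle f$ in $L^2(\Omega)$ and $\|u^\varepsilon - \xi^\varepsilon f\|_{L^1(\Omega)} \to 0$ a.s. If $f \in H_0^1(\Omega) \cap W^{1,\infty}(\Omega)$, then $u^\varepsilon - \xi^\varepsilon f \rightharpoonup 0$ in $H_0^1(\Omega)$ a.s.

Proof. Letting $\xi^\varepsilon\phi$ ($\phi \in \mathcal{D}(\Omega)$) be the test function in (2), we have

$$\varepsilon\int \nabla u^\varepsilon\cdot\nabla\phi\,\xi^\varepsilon + \varepsilon\int \nabla u^\varepsilon\cdot\nabla\xi^\varepsilon\,\phi + \int u^\varepsilon\xi^\varepsilon\phi = \int f\xi^\varepsilon\phi.$$

The right-hand side converges to $\langle\xi\rangle\int f\phi$. On the left-hand side,

$$\varepsilon\Big|\int \nabla u^\varepsilon\cdot\nabla\phi\,\xi^\varepsilon\Big| \le c\,\varepsilon\,\|\nabla u^\varepsilon\|\,\|\xi^\varepsilon\| \to 0,$$

since $\varepsilon^{1/2}\|\nabla u^\varepsilon\|$ and $\|\xi^\varepsilon\|$ are bounded. Next, letting $u^\varepsilon\phi$ be the test function in (7) we have

$$\varepsilon\int \nabla\xi^\varepsilon\cdot\nabla\phi\,u^\varepsilon + \varepsilon\int \nabla\xi^\varepsilon\cdot\nabla u^\varepsilon\,\phi + \int \xi^\varepsilon u^\varepsilon\phi = \int u^\varepsilon\phi.$$

Since $u^\varepsilon \rightharpoonup u$, the right-hand side converges to $\int u\phi$. As $\partial\xi(T(x/\varepsilon^{1/2})\omega)$ converges weakly to $\langle\partial\xi\rangle$ in $(L^2(\Omega))^N$, $\varepsilon^{1/2}\|\nabla\xi^\varepsilon\|$ is bounded so the first term on the left-hand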


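side tends to 0 as $\varepsilon \to 0$. Therefore the sum of the last two terms converges to $\int u\phi$. Comparing the above two equations, we have $\langle\xi\rangle\int f\phi = \int u\phi$ for all $\phi \in \mathcal{D}(\Omega)$. Therefore $u = \langle\xi\rangle f$.

We now show the corrector results. Let $\psi \in H_0^1(\Omega) \cap W^{1,\infty}(\Omega)$; then $\xi^\varepsilon\psi$ and $\xi^\varepsilon\psi^2$ are in $H_0^1(\Omega^\varepsilon)$. Let $v^\varepsilon = u^\varepsilon - \xi^\varepsilon\psi$. On using (2) and (7) we have

$$\varepsilon\|\nabla v^\varepsilon\|^2 + \|v^\varepsilon\|^2 = \int f u^\varepsilon + \int \xi^\varepsilon\psi^2 - 2\int f\psi\xi^\varepsilon + \varepsilon\int |\xi^\varepsilon|^2|\nabla\psi|^2.$$

The right-hand side, and hence the left-hand side, converges to $\langle\xi\rangle\int (f-\psi)^2$. Thus

$$\limsup_{\varepsilon\to0}\|u^\varepsilon - \xi^\varepsilon f\|_{L^1(\Omega)} \le \limsup_{\varepsilon\to0}\big(\|u^\varepsilon - \xi^\varepsilon\psi\|_{L^1(\Omega)} + \|\xi^\varepsilon(f-\psi)\|_{L^1(\Omega)}\big) \le c\,\langle\xi\rangle^{1/2}\|f-\psi\| + \limsup_{\varepsilon\to0}\|\xi^\varepsilon\|\,\|f-\psi\|.$$

Letting $\psi \in \mathcal{D}(\Omega)$ and $\psi \to f$ in $L^2(\Omega)$, we have $\|u^\varepsilon - \xi^\varepsilon f\|_{L^1(\Omega)} \to 0$. When $f \in H_0^1(\Omega) \cap W^{1,\infty}(\Omega)$, we can put $\psi = f$. After some simple calculation, we have that

$$\varepsilon\|\nabla v^\varepsilon\|^2 + \|v^\varepsilon\|^2 = -\varepsilon\int \nabla v^\varepsilon\cdot\nabla\psi\,\xi^\varepsilon + \varepsilon\int \nabla\psi\cdot\nabla\xi^\varepsilon\,v^\varepsilon.$$

The left-hand side is therefore less than $c\varepsilon\|\nabla v^\varepsilon\| + c\varepsilon^{1/2}\|v^\varepsilon\|$ due to the boundedness of $\xi^\varepsilon$ and $\varepsilon^{1/2}\|\nabla\xi^\varepsilon\|$. If $\|v^\varepsilon\| > c\varepsilon^{1/2}$, we have that $\varepsilon\|\nabla v^\varepsilon\|^2 \le c\varepsilon\|\nabla v^\varepsilon\|$, so $\|\nabla v^\varepsilon\| \le c$. If $\|v^\varepsilon\| \le c\varepsilon^{1/2}$ then $\varepsilon\|\nabla v^\varepsilon\|^2 \le c\varepsilon\|\nabla v^\varepsilon\| + c\varepsilon$, which implies $\|\nabla v^\varepsilon\| \le c$. Therefore we can always extract a subsequence from $v^\varepsilon$ which converges weakly in $H_0^1(\Omega)$. Since $v^\varepsilon \to 0$ in $L^1(\Omega)$ this weak limit is 0.

When $q < 1/2$, we consider the random variable $\theta^\varepsilon \in H_0^1(V_0)$ such that for all $\psi \in H_0^1(V_0)$,

$$\langle \varepsilon^{1-2q}\,\partial\theta^\varepsilon\cdot\partial\psi + \theta^\varepsilon\psi\rangle = \langle\psi\rangle. \tag{8}$$

There exists a unique such random variable $\theta^\varepsilon$. Putting $\psi = \theta^\varepsilon$, we have that $\|\theta^\varepsilon\|_{L^2(W)} \le 1$. As all random variables $\psi \in H_0^1(V_0)$ can be regarded as 0 outside $V_0$, $\|\theta^\varepsilon\|_{L^2(W)} = \|\theta^\varepsilon\|_{L^2(V_0)}$. The space $L^2(V_0)$ can be defined as the subspace of $L^2(W)$ of all random variables being equal to 0 outside $V_0$. We assume that $L^2(W)$ is separable. Then $L^2(V_0)$ is also separable, so we can extract a subsequence $\theta^\varepsilon$ which converges weakly to a random variable $\theta$ in $L^2(V_0)$. Since $\langle\varepsilon^{1-2q}\,\partial\theta^\varepsilon\cdot\partial\theta^\varepsilon\rangle \le \langle\theta^\varepsilon\rangle \le 1$, $\|\varepsilon^{1/2-q}\,\partial\theta^\varepsilon\|_{(L^2(W))^N} \le 1$. Fixing a function $\psi \in H_0^1(V_0)$ we have that the first term in (8) converges to 0, the second term to $\langle\theta\psi\rangle$. Therefore $\langle(\theta-1)\psi\rangle = 0$ for all $\psi \in H_0^1(V_0)$.

If $g$ is a random variable in $L^2(W)$ such that $g(\omega) = 0$ when $\omega \notin V_0^{2\delta}$ then $g^\delta \in H_0^1(V_0)$. Let $\delta_0 > 0$ and $2\delta < \delta_0$. Letting $g = (\theta-1)I_{V_0^{\delta_0}}$, then $g(\omega) = 0$ if $\omega \notin V_0^{\delta_0} \subset V_0^{2\delta}$. Therefore $g^\delta \in H_0^1(V_0)$ for all $\delta < \delta_0/2$. Since $\langle(\theta-1)g^\delta\rangle = 0$ and $g^\delta \to g$ in $L^2(W)$, letting $\delta \to 0$, we have that $\langle(\theta-1)^2 I_{V_0^{\delta_0}}\rangle = 0$, i.e. $\theta = 1$ almost surely in $V_0^{\delta_0}$. We say that the set $V_0$ satisfies the property (P) if $P\big(V_0 \setminus \bigcup_\delta V_0^\delta\big) = 0$.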


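For such a set we have that $\theta = 1$ almost surely in $V_0$. Therefore

$$\langle(\theta^\varepsilon - 1)^2\rangle_{V_0} \le \langle\theta^\varepsilon\rangle_{V_0} - 2\langle\theta^\varepsilon\rangle_{V_0} + \langle 1\rangle_{V_0} \to 0,$$

i.e. $\theta^\varepsilon$ converges strongly to 1 in $L^2(V_0)$. Letting $\eta^\varepsilon = \theta^\varepsilon(T(x/\varepsilon^q)\omega)$ we have

$$\int_W\int_\Omega \big|\theta^\varepsilon(T(x/\varepsilon^q)\omega) - I_{V_0}(T(x/\varepsilon^q)\omega)\big|^2\,dx\,P(d\omega) = |\Omega|\,\|\theta^\varepsilon - I_{V_0}\|^2_{L^2(W)} \to 0,$$

where we have used the fact that $T(x)$ preserves the probability measure. Therefore $|\theta^\varepsilon(T(x/\varepsilon^q)\omega) - I_{V_0}(T(x/\varepsilon^q)\omega)| \to 0$ in $L^2(\Omega)$ in probability. Since $I_{V_0}(T(x/\varepsilon^q)\omega) \rightharpoonup P(V_0)$ in $L^2(\Omega)$, we have that $\eta^\varepsilon(x) \rightharpoonup P(V_0)$ in probability. From Lemma 1, we have

$$\varepsilon\int \nabla\eta^\varepsilon\cdot\nabla\psi + \int \eta^\varepsilon\psi = \int\psi, \tag{9}$$

for all $\psi \in H_0^1(\Omega^\varepsilon)$.

Theorem 2. When $q < 1/2$, assuming that $V_0$ satisfies the property (P) and that $L^2(W)$ is separable, we have $u^\varepsilon \rightharpoonup P(V_0)f$ in $L^2(\Omega)$ in probability, and for all $f \in L^2(\Omega)$, $\|u^\varepsilon - \eta^\varepsilon f\|_{L^1(\Omega)} \to 0$ in probability. If $f \in H_0^1(\Omega) \cap W^{1,\infty}(\Omega)$, then $u^\varepsilon - \eta^\varepsilon f \rightharpoonup 0$ in $H_0^1(\Omega)$.

Proof. Letting $\eta^\varepsilon\phi$ be the test function in (2) we have

$$\varepsilon\int \nabla u^\varepsilon\cdot\nabla\phi\,\eta^\varepsilon + \varepsilon\int \nabla u^\varepsilon\cdot\nabla\eta^\varepsilon\,\phi + \int u^\varepsilon\eta^\varepsilon\phi = \int f\eta^\varepsilon\phi.$$

The right-hand side converges to $P(V_0)\int f\phi$ in probability. On the left-hand side, since $\varepsilon^{1/2}\|\nabla u^\varepsilon\|$ and $\eta^\varepsilon$ are bounded, the first term converges to 0. Thus the sum of the last two terms converges to $P(V_0)\int f\phi$. Now letting $u^\varepsilon\phi$ be the test function in (9) we have

$$\varepsilon\int \nabla\eta^\varepsilon\cdot\nabla\phi\,u^\varepsilon + \varepsilon\int \nabla\eta^\varepsilon\cdot\nabla u^\varepsilon\,\phi + \int \eta^\varepsilon u^\varepsilon\phi = \int u^\varepsilon\phi.$$

Since $u^\varepsilon \rightharpoonup u$ in $L^2(\Omega)$ the right-hand side converges to $\int u\phi$. On the left-hand side, the first term converges to 0 in probability. We justify this as follows. As $\nabla\eta^\varepsilon = \varepsilon^{-q}\,\partial\theta^\varepsilon(T(x/\varepsilon^q)\omega)$,

$$\varepsilon\int_W\int_\Omega |\nabla\eta^\varepsilon|^2\,dx\,P(d\omega) = \varepsilon^{1-2q}\,|\Omega|\,\langle|\partial\theta^\varepsilon|^2\rangle < c.$$

Thus if $\alpha > 1/2$, $\int_W \|\varepsilon^\alpha\nabla\eta^\varepsilon\|^2\,P(d\omega) \to 0$ and so $\varepsilon^\alpha\|\nabla\eta^\varepsilon\| \to 0$ in probability. In addition, $\|u^\varepsilon\|$ is bounded so the first term converges to 0 in probability. This implies that the sum of the last two terms converges to $\int u\phi$. Therefore $u = P(V_0)f$. The corrector results are shown in a similar manner as in the proof of Theorem 1.

The case $q > 1/2$ is more complicated. We assume that the domain is strictly porous. Let us consider the random variable $\zeta \in H_0(V_0)$ such that

$$\langle\partial\zeta\cdot\partial\psi\rangle = \langle\psi\rangle,$$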


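for all $\psi \in H_0(V_0)$. Since $|\langle\psi\rangle| \le c\,\langle|\psi|^n\rangle^{1/n} \le c\,\langle|\partial\psi|^2\rangle^{1/2}$, we can apply the Lax–Milgram lemma and deduce that there exists a unique solution to the above equation. Letting $\zeta^\varepsilon(x) = \zeta(T(x/\varepsilon^q)\omega)$, we have

$$\varepsilon^{2q}\int \nabla\zeta^\varepsilon\cdot\nabla\psi = \int\psi \tag{10}$$

for all $\psi \in H_0^1(\Omega^\varepsilon)$.

We next justify the fact that for all $\phi \in \mathcal{D}(\Omega)$, $\zeta^\varepsilon\phi \in H_0^1(\Omega^\varepsilon)$. This is not obvious since $\zeta$ may not belong to $L^2(W)$ and therefore $\zeta^\varepsilon$ may not belong to $L^2(\Omega)$. Let $\zeta_k$ be a sequence in $S'$ which converges to $\zeta$ in the norm of $H_0(V_0)$. Let $\zeta_k^\varepsilon = \zeta_k(T(x/\varepsilon^q)\omega)$. Then $\nabla\zeta_k^\varepsilon$ converges to $\nabla\zeta^\varepsilon$ in $(L^2_{\rm loc}(\mathbb{R}^N))^N$, and $\zeta_k^\varepsilon$ converges to $\zeta^\varepsilon$ in $L^n_{\rm loc}(\mathbb{R}^N)$; the convergences are in probability. Since $\zeta_k^\varepsilon$ are zero inside a neighbourhood of $\partial(\varepsilon^q V(\omega))$ and belong to $C^\infty(\mathbb{R}^N)$, for all $\phi \in \mathcal{D}(\Omega)$, $\zeta_k^\varepsilon\phi \in \mathcal{D}(\Omega^\varepsilon)$. As $\zeta_k^\varepsilon\phi \to \zeta^\varepsilon\phi$ in the norm of $W^{1,n}(\Omega^\varepsilon)$, $\zeta^\varepsilon\phi \in W_0^{1,n}(\Omega^\varepsilon)$. From the Sobolev inequality, we have

$$\|\zeta^\varepsilon\phi - \zeta_k^\varepsilon\phi\|_{L^{Nn/(N-n)}(\Omega^\varepsilon)} \le c\,\|\nabla(\zeta^\varepsilon\phi) - \nabla(\zeta_k^\varepsilon\phi)\|_{L^n(\Omega^\varepsilon)},$$

where the constant $c$ does not depend on $\Omega^\varepsilon$. Thus $\zeta_k^\varepsilon\phi \to \zeta^\varepsilon\phi$ in $L^{Nn/(N-n)}(\Omega^\varepsilon)$. Similarly, $\zeta_k^\varepsilon\nabla\phi \to \zeta^\varepsilon\nabla\phi$ in $(L^{Nn/(N-n)}(\Omega^\varepsilon))^N$. If $Nn/(N-n) \le 2$, then $\nabla(\zeta_k^\varepsilon\phi) \to \nabla(\zeta^\varepsilon\phi)$ in $L^{Nn/(N-n)}(\Omega^\varepsilon)$. Therefore $\zeta^\varepsilon\phi \in W_0^{1,Nn/(N-n)}(\Omega^\varepsilon)$. Since $\varepsilon^q\|\nabla\zeta^\varepsilon\|$ is bounded, we have that $\varepsilon^q\|\nabla(\zeta^\varepsilon\phi)\|_{L^n(\Omega^\varepsilon)}$ is bounded and so $\varepsilon^q\|\zeta^\varepsilon\phi\|_{L^{Nn/(N-n)}(\Omega^\varepsilon)}$ is also bounded by the Sobolev inequality. Repeating this argument, we find that

$$\|\zeta^\varepsilon\phi\|_{L^{Nn/(N-2n)}(\Omega^\varepsilon)} \le c\,\|\nabla(\zeta^\varepsilon\phi)\|_{L^{Nn/(N-n)}(\Omega^\varepsilon)}.$$

Since $\varepsilon^q\|\zeta^\varepsilon\nabla\phi\|_{(L^{Nn/(N-n)}(\Omega^\varepsilon))^N}$ is bounded (we can apply the above argument for $\zeta^\varepsilon\,\partial\phi/\partial x_i$), $\varepsilon^q\|\zeta^\varepsilon\phi\|_{L^{Nn/(N-2n)}(\Omega^\varepsilon)}$ is bounded. If $Nn/(N-2n) \le 2$, we again have that $\zeta^\varepsilon\phi \in W_0^{1,Nn/(N-2n)}(\Omega^\varepsilon)$. This process only stops when we find a positive integer $l$ such that $Nn/(N-ln) \ge 2$. Then $\zeta_k^\varepsilon\phi \to \zeta^\varepsilon\phi$ (and similarly $\zeta_k^\varepsilon\,\partial\phi/\partial x_i \to \zeta^\varepsilon\,\partial\phi/\partial x_i$) in $L^{Nn/(N-ln)}(\Omega^\varepsilon)$. Therefore $\zeta^\varepsilon\phi \in H_0^1(\Omega^\varepsilon)$. We also have that $\varepsilon^q\|\zeta^\varepsilon\phi\|_{L^{Nn/(N-ln)}(\Omega^\varepsilon)}$ is bounded.

Beliaev and Kozlov [4] proved that if $V(\omega)$ is strictly porous, then for any $\psi \in H_0^1(\Omega^\varepsilon)$,

$$\Big(\int_{\Omega^\varepsilon} |\psi|^n\,dx\Big)^{1/n} \le c\,\varepsilon^q\Big(\int_{\Omega^\varepsilon} |\nabla\psi|^2\,dx\Big)^{1/2}, \tag{11}$$

where $c = c(\omega)$ is independent of $\varepsilon$. Let $n'$ be such that $1/n + 1/n' = 1$. The behaviour of $u^\varepsilon$ is as follows.

Theorem 3. Let $f \in L^s(\Omega)$ for some $s \ge n'$ and $s > N/2$. Let $n$ be greater than a constant $C(N,s)$. Then $\varepsilon^{1-2q}u^\varepsilon \rightharpoonup f\langle\zeta\rangle$ a.s. in $L^n(\Omega)$. For all $f \in L^s(\Omega)$, $\|\varepsilon^{1-2q}u^\varepsilon - \zeta^\varepsilon f\|_{L^1(\Omega)} \to 0$ a.s.; and if $f \in \mathcal{D}(\Omega)$ then $\|u^\varepsilon\varepsilon^{1-2q} - \zeta^\varepsilon f\|_{L^n(\Omega)} \to 0$.

Proof. Since $\varepsilon\|\nabla u^\varepsilon\|^2 + \|u^\varepsilon\|^2 \le \|f\|_{L^{n'}(\Omega)}\|u^\varepsilon\|_{L^n(\Omega)} \le c\,\varepsilon^q\|\nabla u^\varepsilon\|$ by (11), we have that $\|\nabla u^\varepsilon\| \le c\,\varepsilon^{q-1}$ and so $\|u^\varepsilon\|_{L^n(\Omega)} \le c\,\varepsilon^{2q-1}$. Therefore from a sequence $\varepsilon \to 0$,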


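we can extract a subsequence such that $\varepsilon^{1-2q}u^\varepsilon$ converges weakly to a limit $u$ in $L^n(\Omega)$. Letting $\zeta^\varepsilon\phi$ ($\phi \in \mathcal{D}(\Omega)$) be the test function in (2) we have

$$\varepsilon\int \nabla u^\varepsilon\cdot\nabla\zeta^\varepsilon\,\phi + \varepsilon\int \nabla u^\varepsilon\cdot\nabla\phi\,\zeta^\varepsilon + \int u^\varepsilon\zeta^\varepsilon\phi = \int f\zeta^\varepsilon\phi.$$

The right-hand side converges to $\langle\zeta\rangle\int f\phi$. On the left-hand side

$$\varepsilon\Big|\int \nabla u^\varepsilon\cdot\nabla\phi\,\zeta^\varepsilon\Big| \le \varepsilon\,\|\nabla u^\varepsilon\|\,\|\zeta^\varepsilon\nabla\phi\| \le c\,\varepsilon^q\,\|\zeta^\varepsilon\nabla\phi\|.$$

As shown above, we have that $\zeta^\varepsilon\,\partial\phi/\partial x_i \in H_0^1(\Omega)$ and that $\varepsilon^q\|\nabla\zeta^\varepsilon\,\partial\phi/\partial x_i\|$ and $\varepsilon^q\|\zeta^\varepsilon\nabla(\partial\phi/\partial x_i)\|$ are bounded, so there exists a subsequence such that $\varepsilon^q\zeta^\varepsilon\,\partial\phi/\partial x_i$ converges weakly in $H_0^1(\Omega)$. Since $\varepsilon^q\|\zeta^\varepsilon\,\partial\phi/\partial x_i\|_{L^n(\Omega)}$ converges to 0, the weak limit is 0. Therefore $\varepsilon^q\|\zeta^\varepsilon\nabla\phi\| \to 0$ so the second term on the left-hand side converges to 0.

We next consider the third term. Since $f \in L^s(\Omega)$ and $s > N/2$, we have from Theorem 8.15 of Gilbarg and Trudinger [10] that there exist positive constants $C$ and $\beta$ which do not depend on $\varepsilon$ ($C$ depends on $f$, and $\beta$ depends on $s$ and $N$) such that $\sup_{\Omega^\varepsilon}|u^\varepsilon| < C\varepsilon^{-\beta}$ (see the Appendix). Using the interpolation inequality (Eq. (7.10) in [10]), we have that

$$\|u^\varepsilon\|_{L^{n'}(\Omega)} \le \delta\,\|u^\varepsilon\|_{L^r(\Omega)} + \delta^{-\mu}\,\|u^\varepsilon\|_{L^n(\Omega)},$$

where $\delta > 0$, $n < n' < r$ and $\mu = (1/n - 1/n')/(1/n' - 1/r)$. Letting $r \to \infty$, we have

$$\|u^\varepsilon\|_{L^{n'}(\Omega)} \le \delta\,\|u^\varepsilon\|_{L^\infty(\Omega)} + \delta^{1-n'/n}\,\|u^\varepsilon\|_{L^n(\Omega)}.$$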
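The passage $r \to \infty$ in the interpolation inequality amounts to the following limit of the exponent $\mu$. The computation is elementary and uses only the definition of $\mu$ given above, with $n'$ denoting the conjugate exponent of $n$.

```latex
\mu = \frac{1/n - 1/n'}{1/n' - 1/r}
\;\xrightarrow[\;r\to\infty\;]{}\;
\frac{1/n - 1/n'}{1/n'} = \frac{n'}{n} - 1,
\qquad\text{so that}\qquad
\delta^{-\mu} \to \delta^{\,1 - n'/n}.
```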

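Putting $\delta = \varepsilon^{(2q-1+\beta)n/n'}$, using $\|u^\varepsilon\|_{L^n(\Omega)} < c\,\varepsilon^{2q-1}$ and $\|u^\varepsilon\|_{L^\infty(\Omega)} < c\,\varepsilon^{-\beta}$ we have

$$\|u^\varepsilon\|_{L^{n'}(\Omega)} \le c\,\varepsilon^{n(2q-1+\beta)/n' - \beta},$$

which converges to 0 when $n$ (which is less than 2) is large enough such that $\beta < (2q-1)n/(n'-n)$. Therefore $|\int u^\varepsilon\zeta^\varepsilon\phi| \to 0$. Thus $\varepsilon\int \nabla u^\varepsilon\cdot\nabla\zeta^\varepsilon\,\phi \to \langle\zeta\rangle\int f\phi$.

Next we let $u^\varepsilon\phi$ be the test function in (10). Multiplying both sides by $\varepsilon^{1-2q}$ we have

$$\varepsilon\int \nabla\zeta^\varepsilon\cdot\nabla u^\varepsilon\,\phi + \varepsilon\int \nabla\zeta^\varepsilon\cdot\nabla\phi\,u^\varepsilon = \varepsilon^{1-2q}\int u^\varepsilon\phi.$$

The right-hand side converges to $\int u\phi$. Since $\partial\zeta(T(x/\varepsilon^q)\omega)$ converges weakly in $(L^2(\Omega))^N$, we have $\|\nabla\zeta^\varepsilon\| \le c\,\varepsilon^{-q}$. In addition, $\|u^\varepsilon\|^2 \le \|f\|_{L^{n'}(\Omega)}\|u^\varepsilon\|_{L^n(\Omega)} \le c\,\varepsilon^{2q-1}$. Therefore

$$\varepsilon\Big|\int \nabla\zeta^\varepsilon\cdot\nabla\phi\,u^\varepsilon\Big| \le c\,\varepsilon\,\varepsilon^{-q}\,\varepsilon^{q-1/2} = c\,\varepsilon^{1/2} \to 0.$$

Hence $\varepsilon\int \nabla\zeta^\varepsilon\cdot\nabla u^\varepsilon\,\phi \to \int u\phi$, which gives $u = \langle\zeta\rangle f$.

Next we prove the corrector results. Let $\phi \in \mathcal{D}(\Omega)$. On using $u^\varepsilon$ and $\zeta^\varepsilon\phi$ as test functions in (2) and $\zeta^\varepsilon\phi^2$ in (10), we can easily show that

$$\varepsilon^{2q}\int \big|\nabla(u^\varepsilon\varepsilon^{1-2q} - \zeta^\varepsilon\phi)\big|^2 = \varepsilon^{1-2q}\int f u^\varepsilon - \varepsilon^{1-2q}\int |u^\varepsilon|^2 - 2\int f\zeta^\varepsilon\phi + 2\int u^\varepsilon\zeta^\varepsilon\phi + \int \zeta^\varepsilon\phi^2 + \varepsilon^{2q}\int |\zeta^\varepsilon|^2|\nabla\phi|^2.$$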


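On the right-hand side, $\varepsilon^{1-2q}\int f u^\varepsilon \to \langle\zeta\rangle\int f^2$, $2\int f\zeta^\varepsilon\phi \to 2\langle\zeta\rangle\int f\phi$ and $\int \zeta^\varepsilon\phi^2 \to \langle\zeta\rangle\int\phi^2$. Since $\|u^\varepsilon\|_{L^{n'}(\Omega)} \to 0$ and $\varepsilon^{1-2q}\|u^\varepsilon\|_{L^n(\Omega)}$ is bounded, $\varepsilon^{1-2q}\int |u^\varepsilon|^2 \to 0$. Similarly $\int u^\varepsilon\zeta^\varepsilon\phi \to 0$. As shown before, $\varepsilon^q\|\zeta^\varepsilon\nabla\phi\| \to 0$, so $\varepsilon^{2q}\int |\zeta^\varepsilon|^2|\nabla\phi|^2 \to 0$. Thus the right-hand side of the above equation converges to $\langle\zeta\rangle\int (f-\phi)^2$. From (11), we have that

$$\limsup_{\varepsilon\to0}\|u^\varepsilon\varepsilon^{1-2q} - \zeta^\varepsilon\phi\|_{L^n(\Omega)} \le c\,\langle\zeta\rangle^{1/2}\|f-\phi\|.$$

Then

$$\limsup_{\varepsilon\to0}\|u^\varepsilon\varepsilon^{1-2q} - \zeta^\varepsilon f\|_{L^1(\Omega)} \le \limsup_{\varepsilon\to0}\big(c\,\|u^\varepsilon\varepsilon^{1-2q} - \zeta^\varepsilon\phi\|_{L^n(\Omega)} + c\,\|\zeta^\varepsilon(f-\phi)\|_{L^1(\Omega)}\big) \le c\,\|f-\phi\| + c\,\|f-\phi\|_{L^{n'}(\Omega)}\limsup_{\varepsilon\to0}\|\zeta^\varepsilon\|_{L^n(\Omega)}.$$

Letting $\phi \to f$ in $L^{n'}(\Omega)$, we have that $\lim_{\varepsilon\to0}\|u^\varepsilon\varepsilon^{1-2q} - \zeta^\varepsilon f\|_{L^1(\Omega)} = 0$. If $f \in \mathcal{D}(\Omega)$ we can put $\phi = f$ so that $\|u^\varepsilon\varepsilon^{1-2q} - \zeta^\varepsilon f\|_{L^n(\Omega)} \to 0$.

We now consider a general case where the reference set $V_0$ in $\mathcal{F}$ depends on $\varepsilon$ itself; we call it $V_0^\varepsilon$. The random domain is defined as $V^\varepsilon(\omega) = \{y \in \mathbb{R}^N : T(y)\omega \in V_0^\varepsilon\}$. The perforated domain $\Omega^\varepsilon$ is then $\Omega \cap \varepsilon^q V^\varepsilon(\omega)$. We consider the case where $q < 1/2$ first. Let $\kappa^\varepsilon$ be a random variable in $H_0^1(V_0^\varepsilon)$ such that

$$\langle\varepsilon^{1-2q}\,\partial\kappa^\varepsilon\cdot\partial\psi + \kappa^\varepsilon\psi\rangle = \langle\psi\rangle,$$

for all $\psi$ in $H_0^1(V_0^\varepsilon)$. Such a random variable exists and is unique. As mentioned before, random variables in $H_0^1(V_0^\varepsilon)$ may be considered as being 0 outside $V_0^\varepsilon$. Letting $\psi = \kappa^\varepsilon$ we have that $\|\kappa^\varepsilon\|_{L^2(W)}$ is bounded. On assuming that $L^2(W)$ is separable, we can extract a subsequence from $\{\kappa^\varepsilon\}$ which converges weakly to a random variable $\kappa$ in $L^2(W)$. We suppose that $\{V_0^\varepsilon\}$ is an increasing sequence of sets when $\varepsilon \to 0$, which are subsets of a set $V_0$ in $\mathcal{F}$, and that $P\big(V_0 \setminus \bigcup_\varepsilon V_0^\varepsilon\big) = 0$. Let $\psi \in H_0^1(V_0^{\varepsilon_0})$ for some $\varepsilon_0$. Then by the definition of $H_0^1(V_0^{\varepsilon_0})$, $\psi \in H_0^1(V_0^\varepsilon)$ for all $\varepsilon < \varepsilon_0$. Since $\|\varepsilon^{1/2-q}\,\partial\kappa^\varepsilon\|_{(L^2(V_0))^N}$ is bounded, $\langle\varepsilon^{1-2q}\,\partial\kappa^\varepsilon\cdot\partial\psi\rangle \to 0$ and so $\langle\kappa\psi\rangle_{V_0} = \langle\psi\rangle_{V_0}$.

We now show that under certain assumptions, the set of random variables $\psi \in H_0^1(V_0^{\varepsilon_0})$ for some $\varepsilon_0$ is dense in $L^2(V_0)$. Let $g \in L^2(V_0)$ and $g^\varepsilon$ be the random variable being equal to $g$ in $V_0^\varepsilon$ and to 0 in $V_0 \setminus V_0^\varepsilon$. Let

$$V^{\varepsilon\delta}(\omega) = \{y \in \mathbb{R}^N : |y - y'| > \delta\ \ \forall\, y' \notin V^\varepsilon(\omega)\},$$

and $V_0^{\varepsilon\delta} = \{\omega \in W : 0 \in V^{\varepsilon\delta}(\omega)\}$, i.e. $T(y')\omega \notin V_0^\varepsilon$ implies $|y'| > \delta$. We make the assumption that for all $\varepsilon_1 > \varepsilon_2 > 0$, if $\delta$ is sufficiently small then $V_0^{\varepsilon_1} \subset V_0^{\varepsilon_2\delta}$. We define the random variable $g^{\varepsilon\delta}$ as in (3) (with respect to $g^\varepsilon$). Let $\varepsilon_0 < \varepsilon$. Then $V_0^\varepsilon \subset V_0^{\varepsilon_0(2\delta)}$ for $\delta$ small. Then $g^\varepsilon(\omega) = 0$ if $\omega \notin V_0^{\varepsilon_0(2\delta)}$, so $g^{\varepsilon\delta} \in H_0^1(V_0^{\varepsilon_0})$. Since $\lim_{\delta\to0}\|g^{\varepsilon\delta} -$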


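$g^\varepsilon\|_{L^2(V_0)} = 0$, we can choose a number $\delta(\varepsilon)$ such that $\|g^{\varepsilon\delta(\varepsilon)} - g^\varepsilon\|_{L^2(V_0)} < \varepsilon$. As $P\big(V_0 \setminus \bigcup_\varepsilon V_0^\varepsilon\big) = 0$, $\lim_{\varepsilon\to0}\|g^\varepsilon - g\|_{L^2(V_0)} = 0$, so $\lim_{\varepsilon\to0}\|g^{\varepsilon\delta(\varepsilon)} - g\|_{L^2(V_0)} = 0$. The conclusion follows. Therefore $\kappa = 1$ a.s. in $V_0$. Now $\langle(\kappa^\varepsilon - 1)^2\rangle_{V_0} \le \langle\kappa^\varepsilon\rangle_{V_0} - 2\langle\kappa^\varepsilon\rangle_{V_0} + \langle 1\rangle_{V_0}$ so $\kappa^\varepsilon \to 1$ in $L^2(V_0)$. Letting $\lambda^\varepsilon(x) = \kappa^\varepsilon(T(x/\varepsilon^q)\omega)$ we can show that $\lambda^\varepsilon \rightharpoonup P(V_0)$ in $L^2(\Omega)$ in probability. Supposing all the above assumptions are met, we have the following results. All the convergences are in probability.

Theorem 4. When $q < 1/2$, $u^\varepsilon \rightharpoonup P(V_0)f$ in $L^2(\Omega)$ and $\|u^\varepsilon - \lambda^\varepsilon f\|_{L^1(\Omega)} \to 0$ for all $f \in L^2(\Omega)$. If $f \in H_0^1(\Omega) \cap W^{1,\infty}(\Omega)$ then $u^\varepsilon - \lambda^\varepsilon f \rightharpoonup 0$ in $H_0^1(\Omega)$. If $P(V_0) = 1$ then $u^\varepsilon \to f$ in $L^2(\Omega)$ for all $f \in L^2(\Omega)$.

The proof of this theorem is similar to that of Theorem 2. If $P(V_0) = 1$, since $u^\varepsilon \rightharpoonup f$, we have

$$\limsup_{\varepsilon\to0}\|u^\varepsilon - f\|^2 \le \limsup_{\varepsilon\to0}\Big(\int f u^\varepsilon - 2\int f u^\varepsilon + \int f^2\Big) = 0.$$

When $q = 1/2$, we will only consider the case $V_0 = W$. Let $\kappa^\varepsilon$ in $H_0^1(V_0^\varepsilon)$ satisfy

$$\langle\partial\kappa^\varepsilon\cdot\partial\psi + \kappa^\varepsilon\psi\rangle = \langle\psi\rangle, \tag{12}$$

for all $\psi \in H_0^1(V_0^\varepsilon)$. Then $\|\kappa^\varepsilon\|_{H^1(W)}$ is bounded. Assuming that $L^2(W)$ is separable, $H^1(W)$ is also separable. Indeed, if we consider a random variable $g$ in $H^1(W)$ as $(g, \partial_1 g, \ldots, \partial_N g) \in (L^2(W))^{N+1}$, then $H^1(W)$ is a closed subspace of the separable space $(L^2(W))^{N+1}$ and therefore is separable. Then from $\{\kappa^\varepsilon\}$ we can extract a subsequence which converges weakly to a random variable $\kappa$ in $H^1(W)$.

Next we show that under certain conditions, the set of random variables which belong to $H_0^1(V_0^\varepsilon)$ for some $\varepsilon$ is dense in $H^1(W)$. Since the subset of $H^1(W)$ containing bounded random variables is dense in $H^1(W)$, we only need to prove this for pointwise bounded random variables. Let $g \in H^1(W)$ be bounded pointwise and $g^\varepsilon$ be equal to $g$ in $V_0^\varepsilon$ and to 0 outside $V_0^\varepsilon$. Let $g^{\varepsilon\delta}$ be defined as in (3). From Zhikov et al. [27], we have

$$\|g^{\varepsilon\delta} - g^\varepsilon\|^2_{L^2(W)} \le \sup_{|t|<1}\int_W |g^\varepsilon(T(\delta t)\omega) - g^\varepsilon(\omega)|^2\,P(d\omega).$$

Let $V^\varepsilon_{\delta t}$ be the set of $\omega$ such that $T(\delta t)\omega \in V_0^\varepsilon$. We then have

$$\|g^{\varepsilon\delta} - g^\varepsilon\|^2_{L^2(W)} \le \sup_{|t|<1}\Big[\int_{V^\varepsilon_{\delta t}\cap V_0^\varepsilon} |g(T(\delta t)\omega) - g(\omega)|^2\,P(d\omega) + \int_{V^\varepsilon_{\delta t}\setminus V_0^\varepsilon} |g(T(\delta t)\omega)|^2\,P(d\omega) + \int_{V_0^\varepsilon\setminus V^\varepsilon_{\delta t}} |g(\omega)|^2\,P(d\omega)\Big]$$

$$\le \sup_{|t|<1}\Big[\int_W |g(T(\delta t)\omega) - g(\omega)|^2\,P(d\omega) + c\,P(V^\varepsilon_{\delta t}\setminus V_0^\varepsilon) + c\,P(V_0^\varepsilon\setminus V^\varepsilon_{\delta t})\Big].$$

Let $\delta(\varepsilon)$ be such that $\delta \to 0$ when $\varepsilon \to 0$. Then the first term converges to 0 ([27] p. 233). Since $P(V^\varepsilon_{\delta t}\setminus V_0^\varepsilon) \le 1 - P(V_0^\varepsilon) \to 0$, and $P(V_0^\varepsilon\setminus V^\varepsilon_{\delta t}) \le 1 - P(V^\varepsilon_{\delta t}) = 1 - P(V_0^\varepsilon) \to 0$, we have that $\lim_{\varepsilon\to0}\|g^{\varepsilon\delta} - g^\varepsilon\|^2_{L^2(W)} = 0$.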


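Next we show that if $g \in H^1(W)$ then $\partial_i g^{\varepsilon\delta} \to \partial_i g$ in $L^2(W)$ for an appropriate choice of $\delta$. From (4) we have

$$\int_W |\partial_i g^{\varepsilon\delta} - \partial_i g^\delta|^2 \le \int_W\Big(\int_{|t|<\delta} |g^\varepsilon(T(t)\omega) - g(T(t)\omega)|^2\,dt\Big)P(d\omega)\int_{|t|<\delta}\Big|\frac{\partial K^\delta(t)}{\partial t_i}\Big|^2dt$$

$$\le \delta^{-2-2N}\int_{|t|<\delta}\int_W |g^\varepsilon - g|^2\,P(d\omega)\,dt\int_{|t|<\delta}\Big|\frac{\partial K(t/\delta)}{\partial t_i}\Big|^2dt = c\,\delta^{-2}\int_{W\setminus V_0^\varepsilon} |g|^2\,P(d\omega)\int_{|t|<1}\Big|\frac{\partial K(t)}{\partial t_i}\Big|^2dt.$$

Hence $\|\partial_i g^{\varepsilon\delta} - \partial_i g\|_{L^2(W)} \le c\,\delta^{-1}(1 - P(V_0^\varepsilon))^{1/2} + \|\partial_i g^\delta - \partial_i g\|_{L^2(W)}$. Letting $\delta = (1 - P(V_0^\varepsilon))^{\alpha/2}$ for some $\alpha < 1$, the right-hand side converges to 0. If for each $\varepsilon$ there exists an $\varepsilon_0 < \varepsilon$ such that $V_0^\varepsilon \subset V_0^{\varepsilon_0(2\delta)}$ for $\delta = (1 - P(V_0^\varepsilon))^{\alpha/2}$, then $g^{\varepsilon\delta} \in H_0^1(V_0^{\varepsilon_0})$; the set of random variables belonging to $H_0^1(V_0^\varepsilon)$ for some $\varepsilon$ is then dense in $H^1(W)$. Let $\psi \in H_0^1(V_0^{\varepsilon_0})$ for some $\varepsilon_0 > 0$; then $\psi \in H_0^1(V_0^\varepsilon)$ for $\varepsilon < \varepsilon_0$. When $\varepsilon \to 0$ from (12) we have

$$\langle\partial\kappa\cdot\partial\psi + \kappa\psi\rangle = \langle\psi\rangle.$$

Since this is true for $\psi$ in a dense set of $H^1(W)$ it is true for all $\psi$ in $H^1(W)$. As a random variable $\kappa$ that satisfies the above equation exists and is unique, $\kappa = 1$. Thus $\limsup_{\varepsilon\to0}\langle(\kappa^\varepsilon - 1)^2\rangle \le \limsup_{\varepsilon\to0}\big(\langle\kappa^\varepsilon\rangle - 2\langle\kappa^\varepsilon\rangle + 1\big) = 0$, so $\kappa^\varepsilon \to 1$ in $L^2(W)$. Hence letting $\lambda^\varepsilon(x) = \kappa^\varepsilon(T(x/\varepsilon^{1/2})\omega)$ we have that $\lambda^\varepsilon \rightharpoonup 1$ in $L^2(\Omega)$ in probability. Assuming all the above conditions are met, we then have the following result.

Theorem 5. When $q = 1/2$, $u^\varepsilon \to f$ in $L^2(\Omega)$ in probability.

4. The Poisson Distributions of Holes

We consider a Poisson distribution of holes' centres in $\mathbb{R}^N$. Let $\lambda$ and $r$ be positive constants. The probability that the number of holes' centres inside a bounded Borel set $A$ equals $a$ ($a = 0, 1, 2, \ldots$) is

$$P(\nu(A) = a) = e^{-\lambda\varepsilon^{-r}|A|}\,(\lambda\varepsilon^{-r}|A|)^a/a!.$$

The quantity $\lambda\varepsilon^{-r}$ is the intensity of the distribution; it represents the average number of holes' centres in a unit volume. Let $T$ be a closed bounded set containing the origin in its interior; $T$ represents the sample hole. A hole of centre $x_k$ and size $\varepsilon^p$ is defined as $T_k^\varepsilon = \{x : x - x_k \in \varepsilon^p T\}$. The perforated domain is then $\Omega^\varepsilon = \Omega \setminus \bigcup_{k=1}^\infty T_k^\varepsilon$. Making the rescaling $y = x\varepsilon^{-r/N}$ we obtain a new Poisson process in the $y$ space such that for every bounded Borel set $A$,

$$P(\nu(A) = a) = e^{-\lambda|A|}\,(\lambda|A|)^a/a!. \tag{13}$$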
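The effect of the rescaling $y = x\varepsilon^{-r/N}$ on the intensity can be illustrated numerically. The sketch below is not part of the paper's argument: it assumes a cubic observation window $[0,\text{side}]^N$, and all function names (`sample_poisson`, `sample_hole_centres`, `rescale`, `empirical_intensity`) are hypothetical helpers introduced only for this illustration. It samples centres with intensity $\lambda\varepsilon^{-r}$, rescales them, and estimates the number of centres per unit volume in the rescaled window, which should be close to $\lambda$.

```python
import random


def sample_poisson(mean, rng):
    """Draw one Poisson(mean) variate by counting unit-rate exponential arrivals in [0, mean]."""
    count = 0
    t = rng.expovariate(1.0)
    while t <= mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_hole_centres(lam, eps, r, side, dim, rng):
    """Poisson process of intensity lam * eps**(-r) on the cube [0, side]^dim (assumed window)."""
    mean = lam * eps ** (-r) * side ** dim
    n = sample_poisson(mean, rng)
    return [[rng.uniform(0.0, side) for _ in range(dim)] for _ in range(n)]


def rescale(points, eps, r, dim):
    """Apply the rescaling y = x * eps**(-r/dim) to every centre."""
    s = eps ** (-r / dim)
    return [[s * c for c in p] for p in points]


def empirical_intensity(lam, eps, r, side, dim, trials, seed=0):
    """Average number of rescaled centres per unit volume of the rescaled window."""
    rng = random.Random(seed)
    rescaled_volume = (side * eps ** (-r / dim)) ** dim
    total = 0
    for _ in range(trials):
        centres = sample_hole_centres(lam, eps, r, side, dim, rng)
        total += len(rescale(centres, eps, r, dim))
    return total / (trials * rescaled_volume)
```

Under these assumptions, calling `empirical_intensity(2.0, 0.5, 2.0, 1.0, 2, 2000)` should return a value near `2.0`, in line with (13); the exact figure depends on the random seed.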

The holes' centres are now $\{y_1, y_2, \ldots\}$ and the holes' size is $\varepsilon^{p-r/N}$. Each realization of holes can be characterized as a Borel measure $\nu = \sum_{i=1}^\infty \delta_{y_i}$, where $\delta_{y_i}$ is the Dirac


measure. Let us define $\hat N$ to be the set of all Borel measures of non-negative integer values.

As shown in Daley and Vere-Jones [8], a measure $\nu$ in $\hat N$ is necessarily of the form $\nu = \sum_i k_i\delta_{y_i}$ for a sequence of points $\{y_1, y_2, \ldots\}$ such that a bounded Borel set contains at most a finite number of these points, and $k_i$ are positive integers. The set of such measures when $k_i = 1$ for all $i$ is denoted by $\hat N^*$. Daley and Vere-Jones [8] constructed a metric such that $\hat N$ is complete and separable. The set $\hat N^*$ is measurable in $\hat N$. This metric satisfies the property that if $\nu_k$ converges to $\nu$ in $\hat N$ then $\int_{\mathbb{R}^N}\phi(x)\,d\nu_k \to \int_{\mathbb{R}^N}\phi(x)\,d\nu$ for all $\phi \in C_0(\mathbb{R}^N)$. Since the measure $\lambda|A|$ is non-atomic, the Poisson point process is simple, i.e. $P(\hat N^*) = 1$.

We wish $\{\hat N, \mathcal{B}(\hat N), P\}$ to play the role of the probability space $\{W, \mathcal{F}, P\}$ in the previous section ($\mathcal{B}(\hat N)$ denotes the $\sigma$-algebra of the Borel sets in $\hat N$). It is quite easy to see that for $y \in \mathbb{R}^N$, $T(y)\omega = \{y_1 - y, y_2 - y, \ldots\}$ ($\omega = \{y_1, y_2, \ldots\}$) defines an $N$-dimensional dynamical system on $W$ (Papanicolaou [19]). Let $V_0^\varepsilon$ be the set of all realizations $\omega = \{y_1, y_2, \ldots\}$ such that there is no $y_k$ belonging to the set $-\varepsilon^{p-r/N}T$. The set of $y \in \mathbb{R}^N$ not belonging to any holes $T_k = \{y : y - y_k \in \varepsilon^{p-r/N}T\}$ is the random set $V^\varepsilon(\omega)$. Letting $\Omega$ be a bounded open set in $\mathbb{R}^N$, the perforated domain $\Omega^\varepsilon$ is $\Omega^\varepsilon = \Omega \cap \varepsilon^{r/N}V^\varepsilon(\omega)$.

First we consider $p = r/N$. Theorem 1 holds. To show that Theorem 2 holds, we first show that $L^2(\hat N)$ is separable. Since $\hat N$ with the metric mentioned above is separable ([8]), $\mathcal{B}(\hat N)$ is countably generated. Since $\{\hat N, \mathcal{B}(\hat N), P\}$ is $\sigma$-finite, the metric space of sets in $\mathcal{B}(\hat N)$ with finite measure is separable; this implies the separability of $L^2(\hat N)$ (Halmos [11] pp. 168 and 177). We need to show that $P\big(V_0 \setminus \bigcup_\delta V_0^\delta\big) = 0$ (we write $V_0$ for $V_0^\varepsilon$ since it does not depend on $\varepsilon$). Let $-T^\delta$ be the $\delta$ neighbourhood of $-T$. The set of all realizations which do not intersect $-T^\delta$ is $V_0^\delta$. The probability of the set of realizations which have at least one point in $-T^\delta \setminus -T$ is $1 - e^{-\lambda|-T^\delta\setminus -T|}$, which converges to 0 when $\delta \to 0$ (we assume that $\lim_{\delta\to0}|-T^\delta \setminus -T| = 0$, which holds for many sets $T$). Therefore $P\big(V_0 \setminus \bigcup_\delta V_0^\delta\big) = 0$. Theorem 2 applies.

Next we show that Theorem 3 holds. The random domain $V(\omega)$ is strictly porous. In fact there exists a random variable $h$ such that $\langle h^{-(1+m)}\rangle < \infty$ for all $m > 0$. We need to assume that $T$ contains a ball of positive radius $\rho > 0$. Let $z(\omega)$ be the infimum of $|y_k|$. We have:

Lemma 2. For $\delta > 0$, let $h(\omega) = \rho^{N+\delta-2}C_\delta/z^{N+\delta}$, where $C_\delta = (N-2)\delta$ for $N > 2$ and $C_\delta = \delta^2$ for $N = 2$. Then the estimate (5) holds.

The proof can be found in [4]. We now prove that the domain is indeed strictly porous.

Lemma 3. For all $m > 0$, $\langle h^{-(1+m)}\rangle < \infty$.

Proof. We have $\langle h^{-(1+m)}\rangle = (C_\delta\rho^{N+\delta-2})^{-(1+m)}\langle z^{(N+\delta)(1+m)}\rangle$. Let $dz$ be an infinitesimal quantity and $B(z)$ be the ball centred at the origin with radius $z$. The probability that $z < z(\omega) < z + dz$ is

$$e^{-\lambda|B(z)|}\big(1 - e^{-\lambda C_N((z+dz)^N - z^N)}\big) \sim e^{-\lambda C_N z^N}\,\lambda C_N N z^{N-1}\,dz + o(dz).$$

424

V. H. Hoàng

Therefore

z

(N+δ)(1+m)

= λCN N

∞

N

z(N+δ)(1+m)+N−1 e−λCN z dz < ∞.

0
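As a check on the last integral (a sketch only, not part of the original argument), write $k = (N+\delta)(1+m)$ and substitute $u = \lambda C_N z^N$. Here $C_N$ is read as the volume of the unit ball, so that $|B(z)| = C_N z^N$; this reading is an assumption, since the constant is not defined in this passage.

\[ \lambda C_N N \int_0^\infty z^{k+N-1} e^{-\lambda C_N z^N}\,dz = (\lambda C_N)^{-k/N} \int_0^\infty u^{k/N} e^{-u}\,du = (\lambda C_N)^{-k/N}\,\Gamma\!\left(1 + \frac{k}{N}\right) < \infty. \]

Under this assumption the moment is finite for every $k > 0$, and it decreases as the intensity $\lambda$ grows.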

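From this we have that Theorem 3 applies for the Poisson distribution of holes. In fact it is true for any $n < 2$.

Next we consider the case $p > r/N$ and $2r/N < 1$. The set $V_0^\varepsilon$ consists of all realizations $\omega$ such that no $y_k$ belongs to $-\varepsilon^{p-r/N} T$, which is an increasing sequence of sets with respect to $\varepsilon$ decreasing. As $P(\hat N \setminus \bigcup_\varepsilon V_0^\varepsilon) \le 1 - e^{-\lambda \varepsilon^{Np-r}|T|}$ for all $\varepsilon$, we have $P(\hat N \setminus \bigcup_\varepsilon V_0^\varepsilon) = 0$. The set $V_0^{\varepsilon\delta}$ then contains all the realizations which do not intersect a $\delta$-neighbourhood of $-\varepsilon^{p-r/N} T$. If $\varepsilon_1 > \varepsilon_2$ there is a positive minimum distance between $\partial(-\varepsilon_1^{p-r/N} T)$ and $\partial(-\varepsilon_2^{p-r/N} T)$. Therefore if $\delta$ is sufficiently small, $V_0^{\varepsilon_1} \subset V_0^{\varepsilon_2 \delta}$. All the assumptions of Theorem 4 hold; therefore the theorem holds for $V_0 = \hat N$ and $P(V_0) = P(\hat N) = 1$.

We have been able to show that Theorem 5 holds only for $N \ge 3$. We need to show that for $\delta = (1 - P(V_0^\varepsilon))^{\alpha/2}$ ($\alpha < 1$), there is an $\varepsilon_0 < \varepsilon$ such that $V_0^\varepsilon \subset V_0^{\varepsilon_0 (2\delta)}$. Let $d$ be the distance between the origin and $\partial T$. This condition holds if $\varepsilon^{p-r/N} d > \varepsilon_0^{p-r/N} d + 2\delta$. Since $1 - P(V_0^\varepsilon) = 1 - e^{-\lambda \varepsilon^{Np-r}|T|} \sim \lambda|T|\varepsilon^{Np-r}$, we have $\delta \sim (\lambda|T|)^{\alpha/2} \varepsilon^{(p-r/N)N\alpha/2} < c\,\varepsilon^{p-r/N}$ when $N\alpha/2 \ge 1$. We can choose $\alpha < 1$ such that this is satisfied when $N > 2$.

We now deal with the case where $p < r/N$. We have the following theorem.

Theorem 6. When $p < r/N$, if $f \in L^s(\Omega)$ for some $s > 2$ then a.s. for all $t > 0$ we can find a positive constant $c(t)$ such that $\|u^\varepsilon\| < c(t)\varepsilon^t$ when $\varepsilon$ is sufficiently small.

Proof. For any fixed $\varepsilon$, Lemma 2 holds for $h_\varepsilon(\omega) = (\rho\,\varepsilon^{p-r/N})^{N+\delta-2} C_\delta / z^{N+\delta}$. Then $(h_\varepsilon)^{-(m+1)} = \bigl(C_\delta^{-1} (\rho^{-1} \varepsilon^{r/N-p})^{N+\delta-2}\bigr)^{1+m} z^{(N+\delta)(1+m)}$. Now we deduce a Poincaré type inequality. The idea is similar to that by Beliaev and Kozlov for the inequality (11). Let $\psi \in D(\Omega^\varepsilon)$. From the Hölder inequality, we have

\[ \left( \int_{\Omega^\varepsilon} |\psi|^n\,dx \right)^{1/n} \le \left( \int_{\Omega^\varepsilon} |\psi|^2\, h_\varepsilon\!\left(T\!\left(\tfrac{x}{\varepsilon^{r/N}}\right)\omega\right) dx \right)^{1/2} \left( \int_{\Omega^\varepsilon} (h_\varepsilon)^{-(1+m)}\!\left(T\!\left(\tfrac{x}{\varepsilon^{r/N}}\right)\omega\right) dx \right)^{(2-n)/(2n)}, \tag{14} \]

where $n = 1 + m/(2+m)$. Since the domain is porous we have from (5) that

\[ \int_{\Omega^\varepsilon} |\psi|^2\, h_\varepsilon\!\left(T\!\left(\tfrac{x}{\varepsilon^{r/N}}\right)\omega\right) dx \le \varepsilon^{2r/N} \int_{\Omega^\varepsilon} |\nabla\psi|^2\,dy. \]

On the other hand

\[ (h_\varepsilon)^{-(1+m)}\!\left(T\!\left(\tfrac{x}{\varepsilon^{r/N}}\right)\omega\right) = c(\delta)\,\varepsilon^{(r/N-p)(N+\delta-2)(1+m)}\, z\!\left(T\!\left(\tfrac{x}{\varepsilon^{r/N}}\right)\omega\right)^{(N+\delta)(1+m)}. \]

Since $z(T(x/\varepsilon^{r/N})\omega)^{(N+\delta)(1+m)} \rightharpoonup \langle z^{(N+\delta)(1+m)} \rangle$ in $L^1(\Omega)$, the second term on the right-hand side in (14) is smaller than $c(\delta)\,\varepsilon^{(r/N-p)(N+\delta-2)(1+m)(2-n)/(2n)} \le c(\delta)\,\varepsilon^{(r/N-p)\delta/2}$. Therefore

\[ \left( \int_{\Omega^\varepsilon} |\psi|^n\,dx \right)^{1/n} \le c(\delta)\,\varepsilon^{r/N+(r/N-p)\delta/2} \left( \int_{\Omega^\varepsilon} |\nabla\psi|^2\,dx \right)^{1/2}; \]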

and this is true for all $\psi \in H_0^1(\Omega^\varepsilon)$. Since $f \in L^s(\Omega)$ and $s > 2$, there exists a constant $n < 2$ such that $1/n + 1/s = 1$. We therefore have

\[ \varepsilon \|\nabla u^\varepsilon\|^2 + \|u^\varepsilon\|^2 \le \|f\|_{L^s(\Omega)} \|u^\varepsilon\|_{L^n(\Omega)} \le c(\delta)\,\varepsilon^{r/N+(r/N-p)\delta/2}\, \|\nabla u^\varepsilon\|, \]

so $\|\nabla u^\varepsilon\| \le c(\delta)\,\varepsilon^{r/N+(r/N-p)\delta/2-1}$. Hence $\|u^\varepsilon\|_{L^n(\Omega)} \le c(\delta)\,\varepsilon^{2r/N+(r/N-p)\delta-1}$, which gives $\|u^\varepsilon\| \le c(\delta)\,\varepsilon^{r/N+(r/N-p)\delta/2-1/2}$. For any $t > 0$ we can always find a $\delta > 0$ such that $r/N + (r/N-p)\delta/2 - 1/2 \ge t$, so that $\|u^\varepsilon\| \le c(t)\varepsilon^t$.

So far we have solved the cases where the holes' size is $\varepsilon^p$ and the average number of holes per unit volume is $\lambda\varepsilon^{-r}$, and either $p = r/N$, or $p > r/N$ and $2r/N < 1$, or $p > r/N$ and $2r/N = 1$ for $N > 2$, or $p < r/N$. Let us consider the case $p > r/N$ and $2r/N > 1$. When $N \ge 3$, since $p > 1/2$ we can apply the method of Wiener sausage presented in detail in [14]. The proofs are similar to those in [14] so we do not present them here.

We say that the domain $\Omega$ satisfies the condition (R) if for all $\varepsilon > 0$ small enough, the measure of the set of points $x \in \Omega$ such that the distance $d(x, \partial\Omega)$ between $x$ and $\partial\Omega$ is less than $\varepsilon$ is smaller than $c\varepsilon$ for some constant $c$ not depending on $\varepsilon$. Many sufficient conditions for (R) to hold exist. For example any domain which can be mapped to the unit ball by a regular mapping function satisfies this condition. The following results hold when $p > 1/2$ for all $r$.

Theorem 7. When $N \ge 3$, $u^\varepsilon$ has the following behaviour in $L^2(\Omega)$. If $1 + (N-2)p - r > 0$, then $u^\varepsilon \to f$; if $1 + (N-2)p - r = 0$, then $u^\varepsilon \to f/(1 + 2\lambda\,\mathrm{cap}\,T)$; and if $1 + (N-2)p - r < 0$, then $u^\varepsilon \to 0$. The convergence holds in probability in the space of configurations of holes.

In the above $\mathrm{cap}\,T$ denotes the Newtonian capacity of the set $T$. For the case $1 + (N-2)p - r \ge 0$ we can deduce an order of convergence when $f \in H_0^1(\Omega) \cap L^\infty(\Omega)$.

Theorem 8. Assume that $f \in H_0^1(\Omega) \cap L^\infty(\Omega)$ and $\Omega$ satisfies the condition (R). When $N \ge 3$, if $1 + (N-2)p - r > 0$, then $\varepsilon^{-\mu/2} \|u^\varepsilon - f\|_{L^2(\Omega)} \to 0$, and if $1 + (N-2)p - r = 0$, then $\varepsilon^{-\mu/2} \|u^\varepsilon - f/(1 + 2\lambda\,\mathrm{cap}\,T)\|_{L^2(\Omega)} \to 0$ in probability for all $\mu < \min\{1/2 + (N-1)p - r,\ 2p - 1,\ 1/3\}$. If (R) is not satisfied then the above results hold in $L^2(\Omega_0)$ for any subdomain $\Omega_0$ with $\bar\Omega_0 \subset \Omega$.

When $1 + (N-2)p - r < 0$ an order of convergence is obtained for $f$ in $L^\infty(\Omega)$.

Theorem 9. Assume that $\Omega$ satisfies (R) and $f \in L^\infty(\Omega)$. When $N \ge 3$, if $1 + (N-2)p - r < 0$, then $\varepsilon^\mu u^\varepsilon \to 0$ in $L^2(\Omega)$ in probability for all $\mu > \max\{1 + (N-2)p - r,\ 1 - 2p,\ -1/5\}$. If $\Omega$ does not satisfy (R), $\varepsilon^\mu u^\varepsilon \to 0$ in $L^2(\Omega_0)$ for all $\Omega_0$ such that $\bar\Omega_0 \subset \Omega$.

Better results on the local behaviour of $u^\varepsilon$ can be obtained when we impose more conditions on $p$ and $r$.

Theorem 10. Let $\Omega_0$ be any subdomain such that $\bar\Omega_0 \subset \Omega$. Assume that $f$ is bounded pointwise. When $N \ge 3$ and $1 + (N-2)p < r < \min\{5(1-2p)/6 + Np,\ (N-2)p + 9/8\}$, then $\varepsilon^{1+(N-2)p-r} u^\varepsilon \to f/(2\lambda\,\mathrm{cap}\,T)$ in $L^2(\Omega_0)$ in probability. Furthermore, if $f \in H_0^1(\Omega)$, then $\varepsilon^{-\mu/2} \|\varepsilon^{1+(N-2)p-r} u^\varepsilon - f/(2\lambda\,\mathrm{cap}\,T)\|_{L^2(\Omega_0)} \to 0$ in probability for all $\mu < \min\{5/2 + (3N-5)p - 3r,\ 3 + 8(N-2)p/3 - 8r/3\}$.

Now we consider the case $N = 2$, $p > r/2$ and $r > 1$.

Theorem 11. When $N = 2$, $p > r/2$ and $r > 1$, assuming that $f \in L^s(\Omega)$ for some $s > 2$, a.s. we have $\|u^\varepsilon\| \le c(t)\varepsilon^t$ for all $t < r/2 - 1/2$.

Proof. The proof is similar to that of Theorem 6. We have that $\|u^\varepsilon\|_{L^n(\Omega)} \le c(\delta)\,\varepsilon^{r+(r/2-p)\delta-1}$ and so $\|u^\varepsilon\| \le c(\delta)\,\varepsilon^{r/2+(r/2-p)\delta/2-1/2}$, where $\delta$ can be chosen arbitrarily small.

If $N \ge 3$ we have $\|u^\varepsilon\| \le c(\delta)\,\varepsilon^{r/N+(r/N-p)(N+\delta-2)/2-1/2}$, which converges to 0 when $p(N-2) + 1 - r < 0$; this is consistent with the results deduced from the Wiener sausage method. The case $p > r/2$ and $r = 1$ when $N = 2$ remains open.

Acknowledgement. The author is grateful to John Willis for introducing this research topic. He also thanks James Norris for suggesting the Poisson distribution and the probabilistic proof in the appendix. Financial support from a Gulbenkian Research Studentship offered by Churchill College (Cambridge) is gratefully acknowledged.

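Appendix

We establish the upper bound for $\|u^\varepsilon\|_{L^\infty(\Omega^\varepsilon)}$, where $u^\varepsilon$ is the solution of the problem

\[ -\varepsilon \Delta u^\varepsilon + u^\varepsilon = f, \qquad u^\varepsilon \in H_0^1(\Omega^\varepsilon), \]

with $f \in L^s(\Omega)$ for some $s > N/2$. Our purpose is to clarify the proof of Theorem 8.15 of Gilbarg and Trudinger [10] for this particular case; we do not give a full proof here. Let $b = \varepsilon^{-1} + \|f\|^{-1}_{L^s(\Omega^\varepsilon)} |f|$. We set $w = u^+ + \varepsilon^{-1} \|f\|_{L^s(\Omega^\varepsilon)}$. Let $H(z) = z^\alpha - \varepsilon^{-\alpha} \|f\|^\alpha_{L^s(\Omega^\varepsilon)}$ for $z \in [\varepsilon^{-1} \|f\|_{L^s(\Omega^\varepsilon)}, M]$ and let $H$ be linear for $z > M$, where $\alpha$ and $M$ are positive numbers. We begin with the following inequality in [10]:

\[ \|H(w)\|_{L^{2\hat N/(\hat N-2)}(\Omega^\varepsilon)} \le C\, \|b\|^{1/2}_{L^s(\Omega^\varepsilon)}\, \|H'(w)\,w\|_{L^{2s/(s-1)}(\Omega^\varepsilon)}, \]

where $\hat N = N$ for $N > 2$ and $2 < \hat 2 < 2s$ for $N = 2$, $C = C(N)$ for $N > 2$ and $C = C(\hat 2, |\Omega^\varepsilon|)$ for $N = 2$. It is quite simple to see from the proof of Theorem 8.15 of Gilbarg and Trudinger [10] that when $N = 2$, $C < C(\hat 2, |\Omega|)$. We then have

\[ \|H(w)\|_{L^{2\hat N/(\hat N-2)}(\Omega^\varepsilon)} \le C \varepsilon^{-1/2}\, \|H'(w)\,w\|_{L^{2s/(s-1)}(\Omega^\varepsilon)}, \]

where $C = C(N, |\Omega|, s)$. Now letting $M \to \infty$, we get

\[ \bigl\| w^\alpha - \varepsilon^{-\alpha} \|f\|^\alpha_{L^s(\Omega^\varepsilon)} \bigr\|_{L^{2\hat N/(\hat N-2)}(\Omega^\varepsilon)} \le C \alpha\, \varepsilon^{-1/2}\, \|w^\alpha\|_{L^{2s/(s-1)}(\Omega^\varepsilon)}, \]

so

\[ \|w\|_{L^{\alpha\chi s^*}(\Omega^\varepsilon)} \le (C^* \alpha)^{1/\alpha}\, \varepsilon^{-1/(2\alpha)}\, \|w\|_{L^{\alpha s^*}(\Omega^\varepsilon)} \]

for all $\alpha > 0$ (note that $w > \varepsilon^{-1} \|f\|_{L^s(\Omega^\varepsilon)}$, so we can choose $C^*$ such that this is true), where $s^* = 2s/(s-1)$ and $\chi = \hat N(s-1)/((\hat N-2)s) > 1$. Since $w \in L^2(\Omega^\varepsilon)$, by induction we find that $w \in L^p(\Omega^\varepsilon)$ for all $p > 0$. When $\alpha \ge 1$, we can choose the constant $C^*$ to be independent of $\alpha$. Taking $\alpha = \chi^m$ for $m = 0, 1, \dots$, we have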

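\[ \|w\|_{L^{\chi^n s^*}(\Omega^\varepsilon)} \le \prod_{m=0}^{n-1} (C^* \chi^m)^{\chi^{-m}}\, \varepsilon^{-\chi^{-m}/2}\, \|w\|_{L^{s^*}(\Omega^\varepsilon)} \le (C^*)^\sigma \chi^\tau \varepsilon^\upsilon\, \|w\|_{L^{s^*}(\Omega^\varepsilon)}, \]

where $\sigma = \sum_{m=0}^\infty \chi^{-m}$, $\tau = \sum_{m=0}^\infty m\chi^{-m}$ and $\upsilon = -\sum_{m=0}^\infty \chi^{-m}/2$. Thus, there is $\beta > 0$ such that

\[ \|w\|_{L^{\chi^n s^*}(\Omega^\varepsilon)} \le C \varepsilon^{-\beta}\, \|w\|_{L^{s^*}(\Omega^\varepsilon)}, \]

where $C = C(\Omega, N, s)$. Letting $n \to \infty$ we get

\[ \|w\|_{L^\infty(\Omega^\varepsilon)} \le C \varepsilon^{-\beta}\, \|w\|_{L^{s^*}(\Omega^\varepsilon)}. \]

From the interpolation inequality

\[ \|w\|_{L^{s^*}(\Omega^\varepsilon)} \le \delta\, \|w\|_{L^r(\Omega^\varepsilon)} + \delta^{-\mu}\, \|w\|_{L^2(\Omega^\varepsilon)}, \]

where $\delta > 0$ and $\mu = (1/2 - 1/s^*)/(1/s^* - 1/r)$, letting $r \to \infty$ we get

\[ \|w\|_{L^{s^*}(\Omega^\varepsilon)} \le \delta\, \|w\|_{L^\infty(\Omega^\varepsilon)} + \delta^{1-s^*/2}\, \|w\|_{L^2(\Omega^\varepsilon)}, \]

so

\[ \|w\|_{L^\infty(\Omega^\varepsilon)} \le C \varepsilon^{-\beta} \bigl( \delta\, \|w\|_{L^\infty(\Omega^\varepsilon)} + \delta^{1-s^*/2}\, \|w\|_{L^2(\Omega^\varepsilon)} \bigr). \]

Choosing $\delta$ such that $C \varepsilon^{-\beta} \delta = C_0 < 1$, we get

\[ \|w\|_{L^\infty(\Omega^\varepsilon)} \le C \varepsilon^{-\beta s^*/2}\, \|w\|_{L^2(\Omega^\varepsilon)}. \]

Therefore

\[ \bigl\| u^+ + \varepsilon^{-1} \|f\|_{L^s(\Omega^\varepsilon)} \bigr\|_{L^\infty(\Omega^\varepsilon)} \le C \varepsilon^{-\beta s^*/2}\, \bigl\| u^+ + \varepsilon^{-1} \|f\|_{L^s(\Omega^\varepsilon)} \bigr\|_{L^2(\Omega^\varepsilon)}. \]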

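Thus there exists a constant $\beta > 0$ such that $\|u^+\|_{L^\infty(\Omega^\varepsilon)} \le C \varepsilon^{-\beta}$. By considering $-u$ we deduce the same inequality for $u^-$. Note that $\beta > 1$.

Probabilistically the result can also be proved in the following concise manner, which gives a better constant $\beta$. Let $\omega$ be a Wiener process and $f$ be 0 outside $\Omega$. From the Feynman-Kac formula, we have

\[ |u^\varepsilon| \le \int_0^\infty e^{-t}\, E_x\{|f(\omega(2\varepsilon t))|\}\,dt = \int_0^\infty e^{-t} (4\pi\varepsilon t)^{-N/2} \int_{\mathbb{R}^N} e^{-|y|^2/(4\varepsilon t)} f(y)\,dy\,dt \le c\,\varepsilon^{-N/2} \int_0^\infty e^{-t} t^{-N/2} \left( \int_{\mathbb{R}^N} e^{-s'|y|^2/(4\varepsilon t)}\,dy \right)^{1/s'} dt, \]

where $s'$ is such that $1/s + 1/s' = 1$. The Hölder inequality has been used. Putting $y = 2(\varepsilon t)^{1/2} z$, we get

\[ |u^\varepsilon| \le c\,\varepsilon^{-N/(2s)} \int_0^\infty e^{-t} t^{-N/(2s)}\,dt \left( \int_{\mathbb{R}^N} e^{-s' z^2}\,dz \right)^{1/s'} = c\,\varepsilon^{-N/(2s)}, \]

since $s > N/2$. Thus $\beta$ can be chosen to be $N/(2s)$, which is less than 1.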


References

1. Balzano, M.: Random relaxed Dirichlet problems. Ann. Mat. Pura App. 153, 133–174 (1988)
2. Baxter, J.R., Chacon, R.V., Jain, N.C.: Weak limits of stopped diffusions. Trans. Amer. Math. Soc. 293, 767–792 (1986)
3. Baxter, J.R., Jain, N.C.: Asymptotic capacities for finely divided bodies and stopped diffusions. Illinois J. Math. 31, 469–495 (1987)
4. Beliaev, A.Yu., Kozlov, S.M.: Darcy equation for random porous media. Comm. Pure Appl. Math. 49, 1–34 (1996)
5. Bensoussan, A., Lions, J.L., Papanicolaou, G.: Asymptotic Analysis for Periodic Structures. Amsterdam: North-Holland, 1978
6. Cioranescu, D., Murat, F.: Un terme étrange venu d'ailleurs I & II. In: Brézis, H., Lions, J.L. (eds.) Non Linear Partial Differential Equations and Their Applications, Collège de France Seminar, Vol II & III, Research Notes In Mathematics, 60 & 70, London: Pitman, 1982, pp. 98–138, 154–178
7. Dal Maso, G.: An Introduction to Γ-convergence. Boston–Basel–Berlin: Birkhäuser, 1994
8. Daley, D.J., Vere-Jones, D.: An Introduction to the Theory of Point Processes. Berlin–Heidelberg–New York: Springer-Verlag, 1988
9. Figari, R., Orlandi, E., Teta, S.: The Laplacian in regions with many small obstacles: Fluctuations around the limit operator. J. Stat. Phys. 41, 465–488 (1985)
10. Gilbarg, D., Trudinger, N.S.: Elliptic Partial Differential Equations of Second Order, 3rd ed. Berlin, New York: Springer-Verlag, 1998
11. Halmos, P.R.: Measure Theory. New York: Van Nostrand, 1950
12. Hoàng, V.H.: Diffusion in Perforated Domains, PhD thesis. University of Cambridge, 1999
13. Hoàng, V.H.: Homogenization of singularly perturbed Dirichlet problems in perforated domains. Proc. Roy. Soc. Edin. 130A, 35–51 (2000)
14. Hoàng, V.H.: Singularly perturbed Dirichlet problems in randomly perforated domains. Comm. Part. Diff. Eqns. 25, 355–375 (2000)
15. Kac, M.: Probabilistic methods in some problems of scattering theory. Rocky Mountain J. Math. 4, 511–538 (1974)
16. Le Gall, J.F.: Fluctuation results for the Wiener sausage. Ann. Prob. 16, 991–1018 (1988)
17. Ozawa, S.: On an elaboration of M. Kac's theorem concerning eigenvalues of the Laplacian in a region with randomly distributed small obstacles. Commun. Math. Phys. 91, 473–487 (1983)
18. Ozawa, S.: Random media and eigenvalues of the Laplacian. Commun. Math. Phys. 94, 421–437 (1984)
19. Papanicolaou, G.C.: Diffusion in random media. In: Keller, J.B., McLaughlin, D.W., Papanicolaou, G.C. (eds.) Surveys in Applied Mathematics, Vol I, New York: Plenum Press, 1995, pp. 205–254
20. Papanicolaou, G.C., Varadhan, S.R.S.: Boundary value problems with rapidly oscillating random coefficients. In: Fritz, J., Lebowitz, J.L., Szász, D. (eds.) Random Fields, Amsterdam, Oxford, New York: North-Holland, 1979, pp. 821–834
21. Papanicolaou, G.C., Varadhan, S.R.S.: Diffusion in regions with many small holes. In: Grigelionis, B. (ed.) Proceedings of Vilnius Conference in Probability, Lecture Notes in Control and Information Sciences 25, Berlin: Springer, 1980, pp. 190–206
22. Rauch, J., Taylor, M.: Potential and scattering theory on widely perturbed domains. J. Funct. Anal. 18, 27–59 (1975)
23. Sznitman, A.S.: Long time asymptotics for the shrinking Wiener sausage. Comm. Pure Appl. Math. 43, 809–820 (1990)
24. Talbot, D.R.S., Willis, J.R.: The effective sink strength of a random array of voids in irradiated material. Proc. Roy. Soc. Lond. A370, 351–374 (1980)
25. Tartar, L.: Cours Peccot au Collège de France, 1977. Unpublished (cited in Cioranescu and Murat [6])
26. Tartar, L.: Convergence of the homogenization process. Appendix to Sanchez-Palencia, E., Non-Homogeneous Media and Vibration Theory. Lecture Notes in Physics, 127, Berlin–Heidelberg–New York: Springer-Verlag, 1980
27. Zhikov (Jikov), V.V., Kozlov, S.M., Oleinik, O.A.: Homogenization of Differential Operators and Integral Functionals. Berlin–Heidelberg: Springer-Verlag, 1994

Communicated by Ya. G. Sinai

Commun. Math. Phys. 214, 429 – 447 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Recurrence and Transience of Random Walks in Random Environments on a Strip

Erwin Bolthausen1, Ilya Goldsheid2

1 Universität Zürich, Institut für Mathematik, Winterthurerstrasse 190, 8057 Zürich, Switzerland
2 School of Mathematical Sciences, Queen Mary and Westfield College, University of London, London E1 4NS, UK

Received: 15 March 2000 / Accepted: 14 April 2000

Abstract: We explain the necessary and sufficient conditions for recurrent and transient behavior of a random walk in a stationary ergodic random environment on a strip in terms of properties of a top Lyapunov exponent. This Lyapunov exponent is defined for a product of a stationary sequence of positive matrices. In the one-dimensional case this approach allows us to treat wider classes of random walks than before.

1. Introduction

1.1. Description of the model. Let $(P_n, Q_n, R_n)$, $-\infty < n < \infty$, be a strictly stationary ergodic sequence of triples of $m \times m$ matrices with non-negative elements such that for all $n$ the sum $P_n + Q_n + R_n$ is a stochastic matrix,

\[ (P_n + Q_n + R_n)\mathbf{1} = \mathbf{1}, \tag{1.1} \]

where $\mathbf{1}$ is a column vector whose components are all equal to 1. We write the components of $P_n$ as $P_n(i,j)$, $1 \le i, j \le m$, and similarly for $Q_n$ and $R_n$. Let $(\Omega, \mathcal{F}, P, T)$ be the corresponding dynamical system, with $\Omega$ denoting the space of all sequences $\omega = (\omega_n) = ((P_n, Q_n, R_n))$ of triples described above, $\mathcal{F}$ being the corresponding natural $\sigma$-algebra, $P$ denoting the probability measure on $(\Omega, \mathcal{F})$, and $T$ being a shift operator on $\Omega$ defined by $(T\omega)_n = \omega_{n+1}$. For fixed $\omega$ we define a random walk $\xi(t)$, $t \in \mathbb{N}$, on the strip $S = \mathbb{Z} \times \{1, \dots, m\}$ by its transition probabilities $Q_\omega(z, z_1)$ given by

\[ Q_\omega(z, z_1) \stackrel{\mathrm{def}}{=} \begin{cases} P_n(i,j) & \text{if } z = (n,i),\ z_1 = (n+1,j), \\ R_n(i,j) & \text{if } z = (n,i),\ z_1 = (n,j), \\ Q_n(i,j) & \text{if } z = (n,i),\ z_1 = (n-1,j), \\ 0 & \text{otherwise.} \end{cases} \tag{1.2} \]

Research supported by Swiss National Science Foundation contract No 20-55648.98

This defines, for any starting point $z = (n,i) \in S$ and any $\omega$, a law $\mathrm{Pr}_{\omega,z}$ for the Markov chain by

\[ \mathrm{Pr}_{\omega,z}(\xi(1) = z_1, \dots, \xi(t) = z_t) \stackrel{\mathrm{def}}{=} Q_\omega(z, z_1)\, Q_\omega(z_1, z_2) \cdots Q_\omega(z_{t-1}, z_t). \tag{1.3} \]
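The transition rule (1.2) and the path law (1.3) can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the dictionary `env` mapping a layer index to a triple of NumPy arrays, the function names, and the particular 2 × 2 matrices are hypothetical and do not come from the paper.

```python
import numpy as np

def step_probability(env, z, z1):
    """Q_omega(z, z1) as in (1.2); env maps a layer n to the triple (P_n, Q_n, R_n)."""
    (n, i), (n1, j) = z, z1
    P, Q, R = env[n]
    if n1 == n + 1:          # one layer to the right
        return P[i - 1, j - 1]
    if n1 == n:              # stay within the layer
        return R[i - 1, j - 1]
    if n1 == n - 1:          # one layer to the left
        return Q[i - 1, j - 1]
    return 0.0               # any other jump is forbidden

def path_probability(env, path):
    """Probability (1.3) of the path z, z_1, ..., z_t, where path[0] is the start z."""
    prob = 1.0
    for z, z1 in zip(path, path[1:]):
        prob *= step_probability(env, z, z1)
    return prob

# Hypothetical environment: the same triple in every layer, with m = 2.
P = np.array([[0.2, 0.1], [0.1, 0.3]])
Q = np.array([[0.1, 0.2], [0.2, 0.1]])
R = np.array([[0.3, 0.1], [0.1, 0.2]])
env = {n: (P, Q, R) for n in range(-2, 3)}

# (0,1) -> (1,2) -> (1,1) -> (0,2) uses P(1,2), then R(2,1), then Q(1,2).
prob = path_probability(env, [(0, 1), (1, 2), (1, 1), (0, 2)])
print(prob)
```

States are written as pairs `(n, i)` with `i` running from 1 to m, matching the text, so the array indices are shifted by one.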

We call $\omega$ the environment or the random environment. The "full measure" on the product of the space of environments and the space of trajectories $\xi(\cdot)$ is a semi-direct product of $P$ and $\mathrm{Pr}_{\omega,z}$, defined by $P(d\omega)\,\mathrm{Pr}_{\omega,z}(d\xi)$. None of our main results depends on the choice of the starting point $z$. We therefore write $\mathrm{Pr}_\omega$ instead of $\mathrm{Pr}_{\omega,z}$ when there is no danger of confusion. If $n \in \mathbb{Z}$, we call the set $\{(n,i) : 1 \le i \le m\}$ the $n$th layer of the set $S$, and say that $\xi$ is in layer $n$ if its $\mathbb{Z}$-component is $n$.

Remark 1. Our random walk is allowed to jump at most one step to the right or to the left from the layer where it stays, with no restrictions for jumps within these three layers. This however is no real restriction. The model incorporates the one that allows uniformly bounded jumps to the left and to the right. Let us consider a random walk on $\mathbb{Z}$ in a random environment $\tilde\omega \stackrel{\mathrm{def}}{=} (p(x,k))_{x \in \mathbb{Z},\, -m \le k \le m}$, where $p(x,k) \ge 0$ is a stationary sequence of vectors with $\sum_{k=-m}^{m} p(x,k) = 1$. The distribution of the random walk is defined for a fixed environment $\tilde\omega$ in the usual way by

\[ \mathrm{Pr}_{\tilde\omega}(\xi(t+1) = x + k \mid \xi(t) = x) \stackrel{\mathrm{def}}{=} p(x,k), \qquad x \in \mathbb{Z}. \tag{1.4} \]

We represent $x \in \mathbb{Z}$ uniquely as $x = nm + i$, where $1 \le i \le m$. This defines a bijection $x \leftrightarrow (n,i)$ from $\mathbb{Z}$ to $S = \mathbb{Z} \times \{1, \dots, m\}$. This bijection naturally transforms the $\xi$-process on $\mathbb{Z}$ into a walk on $\mathbb{Z} \times \{1, \dots, m\}$. The latter clearly is a random walk of the type (1.3) with an environment $\omega = ((P_n, Q_n, R_n))$, $-\infty < n < \infty$, given by

\[ P_n(i,j) = p(nm + i,\ m + j - i), \qquad R_n(i,j) = p(nm + i,\ j - i), \qquad Q_n(i,j) = p(nm + i,\ -m + j - i). \]

A similar construction reduces the study of any random walk with bounded jumps on a strip to the case of a random walk on a wider strip but with jumps to the nearest layers only. From now on we will therefore consider only the strip model.

Recurrence and transience for a random walk with nearest-neighbor jumps in dimension one were first discussed by Solomon [14], who, in addition to the necessary and sufficient condition for recurrence, also provides a complete description of conditions for transience at linear and sublinear speed of growth. For the same model very precise results concerned in particular with the sublinear growth were then obtained by Kesten, Kozlov, and Spitzer [9]. Finally, Sinai [13] discovered the unusual $(\log t)^2$ asymptotic behavior of this random walk in the recurrent case. The beautiful development that followed after that was (and still is) addressing questions which are beyond the scope of this paper and we therefore don't discuss them (see [4, 5, 8], and [3], where also further references can be found). We should only add that at present the understanding of the nearest-neighbor jumps model is quite complete.

The picture is much less complete in the case of non-nearest neighbor jumps. In dimension one the random walk with uniformly bounded (but generally non-nearest neighbor) jumps was studied by Key [10], who was the first to analyze the necessary and sufficient conditions for its recurrence and transience in terms of Lyapunov exponents. Sufficient conditions for linear growth (and hence transience) can be found in [12, 7]. The only result concerned with sufficient conditions for recurrence of a one-dimensional random walk with unbounded jumps that we are aware of is that of Andjel [1]: he proves that if the probabilities $p(x,k)$, $-\infty < x, k < \infty$, are stationary in $x$, their joint distributions are invariant with respect to the reflection $k \to -k$, and $p(x,k)$ are exponentially small in $k$ with the rate of this decay being uniform in $x$, then the corresponding random walk is recurrent.

The main purpose of this paper is to relate the recurrence and transience of the random walk in a random environment on a strip to the top Lyapunov exponent $\lambda^+$ of a product of a stationary sequence $(A_n)$ of positive matrices. If this Lyapunov exponent is negative, then the random walk converges to $\infty$ with probability one; if it is positive, the random walk converges to $-\infty$; and if it is 0, then the random walk is recurrent. The definition of these matrices needs some preparation and is given below in (1.13). Let us mention that in the classical one-dimensional case of nearest neighbor jumps [14] our Lyapunov exponent reduces to $\lambda^+ = E \log(q_n/p_n)$, where $(p_n)$ are the probabilities for jumps to the right and $q_n \stackrel{\mathrm{def}}{=} 1 - p_n$, and thus coincides with the one considered by Solomon.

There are certain similarities and differences between the approach developed by Key [10] and the one presented in this paper. Key relates recurrence and transience to a middle Lyapunov exponent of matrices whose elements are defined in terms of the function $p(x,k)$. His construction works only if $p(x,m) > 0$ with $P$-probability one. The model considered by Key can be reduced to the strip case as explained in Remark 1. But, unlike in [10], our matrices are constructed in a much less straightforward way. However, there are certain advantages in our approach:

– First of all, it works for the case of the strip under very natural conditions.
– In the one-dimensional case it covers a wider range of random walks than that considered in [10]. In particular, the condition "$p(x,m) > 0$ with $P$-probability one" can be replaced by a completely natural condition "$p(x,m) > 0$ with positive $P$-probability".
– The matrices we construct have strictly positive elements. Even though they are not independent, it is usually much easier to control the top Lyapunov exponent of these matrices than the middle Lyapunov exponent of independent matrices. This applies in particular to the questions of existence and actual calculation of the Lyapunov exponents. Due to this approach, our paper is self-contained modulo the classical result from [2] stating the existence of the top Lyapunov exponent for a product of positive stationary matrices.
– Finally, in our approach stationary environments are handled at no extra cost.

1.2. Notations and basic conditions. Let $x = (x(i))$ be a vector and $A = (a(i,j))$ a matrix. We put

\[ \|x\| \stackrel{\mathrm{def}}{=} \max_i |x_i|, \qquad \|A\| \stackrel{\mathrm{def}}{=} \max_i \sum_j |a(i,j)|. \]

We say that A is strictly positive (and write A > 0), if all its components satisfy a(i, j ) > 0. A is called non-negative (and we write A ≥ 0), if all a(i, j ) are nonnegative. A similar convention applies to vectors. Although ξ(t) is an element of Z × {1, . . . , m}, we often use expressions like limt→∞ ξ(t) = −∞ which simply means that the Z-coordinate of ξ(t) tends to −∞ as t → ∞.

If $U \subset S$, we define $\tau_U$ as the first entrance time of the random walk $\xi$ into $U$:

\[ \tau_U \stackrel{\mathrm{def}}{=} \inf\{t \ge 0 : \xi(t) \in U\}. \]

By a slight abuse of this notation we write $\tau_n$ for the first entrance time into layer $n$ instead of $\tau_{\{(n,i) : 1 \le i \le m\}}$. We write $[n, \infty)$ for the semi-strip $\{(k,i) \in S : k \ge n\} \subset S$, and similarly $(-\infty, n] \stackrel{\mathrm{def}}{=} \{(k,i) \in S : k \le n\}$.

The space $S$ is partitioned into equivalence classes (also called communication classes) in the usual way. By definition, $z_1 \in S$ is reachable from $z \in S$ if there exists $N \in \mathbb{N}$ with $Q_\omega^N(z, z_1) > 0$, where $Q_\omega^N$ is the $N$th power of the infinite matrix $Q_\omega$. Note that this notion is $\omega$-dependent. If $U \subset S$ and $z, z_1 \in U$, we say that $z_1$ is reachable from $z$ within $U$ if there is a path from $z$ to $z_1$ inside $U$ with each transition having positive $Q_\omega$ probability. The equivalence classes (communication classes) are now defined by the equivalence relation $z \sim z_1$, meaning that $z_1$ is reachable from $z$ and vice versa. We then say that $z$ and $z_1$ communicate. The notion that they communicate within a set $U$ is obvious. A point $z$ is called reachable from a set $B \subset S$ if there is a point $z_1 \in B$ such that $z$ is reachable from $z_1$. Also $B$ is reachable from $z$ if there is a point in $B$ which is reachable from $z$.

The following set of four assumptions will be often used below. In the cases where all of them are relevant to the matter, they will be referred to as "Condition C".

Condition C.
C1 The dynamical system $(\Omega, \mathcal{F}, P, T)$ is ergodic.
C2 $E \log(1 - \|R_n + P_n\|)^{-1} < \infty$, $E \log(1 - \|R_n + Q_n\|)^{-1} < \infty$.
C3 For all $j \in \{1, \dots, m\}$ and all $n$, $\sum_{i=1}^m Q_n(i,j) > 0$ and $\sum_{i=1}^m P_n(i,j) > 0$, $P$-almost surely.
C4 With positive $P$-probability, the layer 0 is in one communication class.

For future reference we need several immediate corollaries following from the above conditions. Our choice of the matrix norm together with (1.1) imply that

\[ \min_i \sum_{j=1}^m Q_n(i,j) = 1 - \|R_n + P_n\| \quad \text{and} \quad \min_i \sum_{j=1}^m P_n(i,j) = 1 - \|R_n + Q_n\|. \tag{1.5} \]

Hence, it follows from Condition C2 and (1.5) that with $P$-probability one for all $i$ and $n$

\[ \sum_{j=1}^m Q_n(i,j) > 0 \quad \text{and} \quad \sum_{j=1}^m P_n(i,j) > 0 \tag{1.6} \]

and moreover

\[ E \log \|P_n\| \ge E \log\Bigl(\min_i \sum_{j=1}^m P_n(i,j)\Bigr) > -\infty, \tag{1.7} \]

\[ E \log \|Q_n\| \ge E \log\Bigl(\min_i \sum_{j=1}^m Q_n(i,j)\Bigr) > -\infty. \tag{1.8} \]

As will become clear later, Condition C2 implies that our main Lyapunov exponents are finite. In fact it is practically equivalent to the condition of existence of finite Lyapunov exponents and hence it would be difficult to weaken it. On the other hand, in the light of Remark 1, this condition may seem to be somewhat restrictive, as for instance if with positive $P$-probability we have $p(nm + 1, m) = 0$, then the corresponding $\sum_{j=1}^m P_n(1,j) = 0$ and C2 is not satisfied. However, in many cases this turns out to be a minor difficulty. Indeed, consider a new Markov chain which by definition is our Markov chain but observed only at those moments of time when it changes the layer. This new chain on $S$, call it $\tilde\xi(t)$, has transition probabilities defined by

\[ \tilde P_n \stackrel{\mathrm{def}}{=} (I - R_n)^{-1} P_n, \qquad \tilde Q_n \stackrel{\mathrm{def}}{=} (I - R_n)^{-1} Q_n, \qquad \tilde R_n \equiv 0. \]

If $R_n$ is such that for any $i$ there exists $k \in \mathbb{N}$ for which $\sum_j R_n^k(i,j) < 1$, i.e. the walk starting from any $i$ in layer $n$ can leave this layer with positive probability, then these matrices are well defined. (We remark that $\tilde P_n(i,j)$ is the probability that the random walk $\xi$ starting from $(n,i)$ will reach $(n+1,j)$ at the moment of its first exit from layer $n$; the probabilistic meaning of $\tilde Q_n(i,j)$ is defined similarly.) The basic statements of our main Theorem 2 below can be re-formulated in terms of $\tilde\xi(t)$, and therefore our result is valid if Condition C is satisfied for $(\tilde P_n, \tilde Q_n, 0)$. Assumption (1.6) applied to $(\tilde P_n, \tilde Q_n, 0)$ simply means that each layer can be left from any point inside it both by a jump to the right, and to the left.
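The passage to the chain observed only at layer changes can be sketched numerically. The function name `embedded_chain` and the sample matrices below are hypothetical, chosen only so that $P + Q + R$ is stochastic; the sketch uses a linear solve in place of an explicit inverse of $I - R_n$.

```python
import numpy as np

def embedded_chain(P, Q, R):
    """Return (P_tilde, Q_tilde) = ((I - R)^{-1} P, (I - R)^{-1} Q).

    Assumes I - R is invertible, i.e. the walk can leave the layer from every state.
    """
    I = np.eye(P.shape[0])
    P_tilde = np.linalg.solve(I - R, P)
    Q_tilde = np.linalg.solve(I - R, Q)
    return P_tilde, Q_tilde

# Hypothetical layer with m = 2; each row of P + Q + R sums to 1.
P = np.array([[0.2, 0.1], [0.1, 0.3]])
Q = np.array([[0.1, 0.2], [0.2, 0.1]])
R = np.array([[0.3, 0.1], [0.1, 0.2]])

P_tilde, Q_tilde = embedded_chain(P, Q, R)
row_sums = (P_tilde + Q_tilde).sum(axis=1)
print(row_sums)
```

Since $(P + Q)\mathbf{1} = (I - R)\mathbf{1}$, the rows of $\tilde P_n + \tilde Q_n$ sum to 1, which is what the printed row sums are meant to confirm up to rounding.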

Remark 2. Condition C3 and (1.6) easily imply that with $P$-probability one, there is just one communication class, i.e. that the chain $\xi$ is with $P$-probability one irreducible. To see this, note first that from C2 (more precisely, from (1.6)) it follows that with $P$-probability 1, the random walk starting from any point $(n,i)$ can reach any other layer; from C3, it follows that any point can be reached from some point in any other layer. Also C4 and ergodicity imply that with $P$-probability one there are infinitely many layers each being in one communication class. We thus have that, with $P$-probability one, any two points in $S$ communicate. We need the following strengthening of this statement.

Lemma 1. If $n \in \mathbb{Z}$, and $(m,i)$, $(m_1,i_1)$ satisfy $m, m_1 < n$, then $(m,i)$ and $(m_1,i_1)$ communicate within $(-\infty, n]$ with $P$-probability one.

Proof. There exists $K \in \mathbb{N}$ such that with positive $P$-probability, the 0-layer communicates within $[-K, K]$. From ergodicity, we conclude that with $P$-probability one, there is a layer $k$ with $k + K < m, m_1$, such that this layer communicates within $[k - K, k + K]$. From C2 it follows that layer $k$ is reachable from $(m,i)$ by a path stepping only to the left, i.e., in particular, within $(-\infty, n]$. On the other hand, C3 implies that $(m_1,i_1)$ is reachable from layer $k$ by steps only to the right, i.e. also within $(-\infty, n]$. Combining these facts, one sees that $(m_1,i_1)$ is reachable from $(m,i)$ within $(-\infty, n]$ with $P$-probability one. Similarly, $(m,i)$ is reachable from $(m_1,i_1)$ within $(-\infty, n]$.

1.3. Main results. To formulate our main results, we first introduce a sequence of stochastic matrices $(\zeta_n)$. This needs some preparation. First, for any $a \in \mathbb{Z}$, and any stochastic matrix $\rho$, we define for $n \ge a$ matrices $\psi_n = \psi_{n,a,\rho}$ recursively by

\[ \psi_a \stackrel{\mathrm{def}}{=} \rho, \tag{1.9} \]

\[ \psi_n \stackrel{\mathrm{def}}{=} (I - R_n - Q_n \psi_{n-1})^{-1} P_n \tag{1.10} \]

for $n > a$. The existence of the inverse of $I - R_n - Q_n \psi_{n-1}$ is a trivial consequence of assumption C2. These matrices have the following probabilistic interpretation. We introduce at level $a$ reflecting boundary conditions given by $\rho$, i.e. $(P_a, Q_a, R_a)$ is simply replaced by $(\rho, 0, 0)$. Call the modified environment $\hat\omega$. Then $\psi_n(i,j)$ is the probability that the random walk which starts in $(n,i)$ and with the reflection at layer $a$ reaches the layer $n + 1$ in the point $j$, i.e.

\[ \psi_n(i,j) = \mathrm{Pr}_{\hat\omega,(n,i)}(\xi(\tau_{n+1}) = j). \]

Lemma 2. If $\rho$ is stochastic then so are all the $\psi_n$, $n \ge a$.

Proof. The statement is evident from the probabilistic interpretation: the reflection at layer $a$ together with assumptions C2 and (1.6) imply that the walk reaches the layer $n + 1$ with probability 1. A formal proof can be given by induction in $n$. By definition, $\rho = \psi_a$ is stochastic. Assume $n > a$. We have to check that $\psi_n \mathbf{1} = \mathbf{1}$. Using (1.10) we write the following sequence of equivalent equalities:

\[ \psi_n \mathbf{1} = \mathbf{1} \ \Leftrightarrow\ (I - Q_n \psi_{n-1} - R_n)^{-1} P_n \mathbf{1} = \mathbf{1} \ \Leftrightarrow\ P_n \mathbf{1} = (I - Q_n \psi_{n-1} - R_n)\mathbf{1} \ \Leftrightarrow\ (P_n + Q_n \psi_{n-1} + R_n)\mathbf{1} = \mathbf{1}. \]

The last equality is true because $P_n + Q_n + R_n$ is stochastic and, by the induction assumption, $\psi_{n-1} \mathbf{1} = \mathbf{1}$.

Theorem 1. Suppose that Condition C is satisfied.
a) For $P$-a.e. sequence $\omega$ there exists

\[ \zeta_n = \lim_{a \to -\infty} \psi_{n,a,\rho} \tag{1.11} \]

and $\zeta_n$ is independent of $\rho$. Furthermore, the convergence is uniform in $\rho$.
b) The sequence $\zeta_n = \zeta_n(\omega)$, $-\infty < n < \infty$, of $m \times m$ matrices is the unique sequence of stochastic matrices which satisfies the following system of equations:

\[ \zeta_n = (I - Q_n \zeta_{n-1} - R_n)^{-1} P_n, \qquad n \in \mathbb{Z}. \tag{1.12} \]

c) The enlarged sequence $(P_n, Q_n, R_n, \zeta_n)$, $-\infty < n < \infty$, is stationary and ergodic.

Once the above enlarged process is constructed, we can introduce the following sequence of matrices:

\[ A_n \stackrel{\mathrm{def}}{=} (I - Q_n \zeta_{n-1} - R_n)^{-1} Q_n. \tag{1.13} \]
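The recursion (1.10), the independence of the limit from the boundary matrix $\rho$, and the matrices of (1.13) can be explored with the following sketch. It rests on several assumptions that are not in the paper: the environment is drawn i.i.d. from a hypothetical distribution (with $P_n$ larger than $Q_n$, so the iteration settles quickly), $\zeta_{n-1}$ is approximated by $\psi_{n-1}$ started 400 layers back, and all function names are made up for the example.

```python
import numpy as np

def random_environment(n_layers, m, rng):
    """Draw hypothetical i.i.d. triples (P_n, Q_n, R_n) with P_n + Q_n + R_n stochastic."""
    P = rng.uniform(1.0, 2.0, size=(n_layers, m, m))
    Q = rng.uniform(0.1, 0.5, size=(n_layers, m, m))
    R = rng.uniform(0.5, 1.0, size=(n_layers, m, m))
    total = (P + Q + R).sum(axis=2, keepdims=True)   # row sums of P + Q + R
    return P / total, Q / total, R / total

def psi_and_A(P, Q, R, rho):
    """Iterate (1.10) from psi_a = rho at index 0; also build A_n with psi_{n-1} in place of zeta_{n-1}."""
    I = np.eye(rho.shape[0])
    psi, A_list = rho, []
    for Pn, Qn, Rn in zip(P[1:], Q[1:], R[1:]):
        M = I - Rn - Qn @ psi                 # I - R_n - Q_n psi_{n-1}
        A_list.append(np.linalg.solve(M, Qn))
        psi = np.linalg.solve(M, Pn)
    return psi, A_list

def lyapunov_estimate(A_list):
    """(1/n) log ||A_n ... A_1|| in the max-row-sum norm, renormalizing at every step."""
    prod = np.eye(A_list[0].shape[0])
    log_norm = 0.0
    for A in A_list:
        prod = A @ prod
        norm = np.linalg.norm(prod, np.inf)
        log_norm += np.log(norm)
        prod = prod / norm
    return log_norm / len(A_list)

rng = np.random.default_rng(0)
P, Q, R = random_environment(400, 3, rng)
psi_id, A_id = psi_and_A(P, Q, R, np.eye(3))
psi_unif, _ = psi_and_A(P, Q, R, np.full((3, 3), 1.0 / 3.0))
gap = np.abs(psi_id - psi_unif).max()        # effect of the boundary matrix rho
lam = lyapunov_estimate(A_id[100:])          # drop early layers as burn-in
print(gap, lam)
```

With this particular choice of environment the walk drifts to the right, so the estimate `lam` comes out negative. The burn-in length of 100 layers is arbitrary, and the estimate is a finite-sample proxy for the limit that defines the Lyapunov exponent, not the exponent itself.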

Clearly, $A_n$ is a stationary sequence. Furthermore, from $(I - Q_n \zeta_{n-1} - R_n)^{-1} \ge I$ it follows that $A_n \ge Q_n$. Obviously,

\[ \|A_n\| \le \|(I - Q_n \zeta_{n-1} - R_n)^{-1}\| \le (1 - \|Q_n \zeta_{n-1} + R_n\|)^{-1}, \]

and it follows now from C2 and (1.7) that

\[ E|\log \|A_n\|| < \infty. \tag{1.14} \]

The top Lyapunov exponent of the product of matrices $A_n$ is defined by

\[ \lambda^+ \stackrel{\mathrm{def}}{=} \lim_{n \to \infty} \frac{1}{n} \log \|A_n A_{n-1} \cdots A_1\|. \tag{1.15} \]

It is well known [2] that with $P$-probability 1 the limit in (1.15) exists and does not depend on $\omega$. It is clear from (1.14) that $\lambda^+ < +\infty$, but checking that $\lambda^+$ is finite is a more delicate matter (see Remark 3 below). It should be mentioned that the existence of $\lambda^+$ can nowadays be obtained in many ways. For instance, it is an immediate consequence of Kingman's sub-additive ergodic theorem. However, in the paper by Furstenberg and Kesten [2], one also finds very useful additional information concerned with the more specific case of products of positive matrices. We are going to use this information in Sect. 3.

Theorem 2. Suppose that Condition C is satisfied. Then
a) $\lambda^+ > 0$ if and only if for $P$-a.e. environment $\omega$ one has $\lim_{t \to \infty} \xi(t) = -\infty$, $\mathrm{Pr}_\omega$-almost surely.
b) $\lambda^+ < 0$ if and only if for $P$-a.e. environment $\omega$ one has $\lim_{t \to \infty} \xi(t) = \infty$, $\mathrm{Pr}_\omega$-almost surely.
c) $\lambda^+ = 0$ if and only if for $P$-a.e. $\omega$ one has $\limsup_{t \to \infty} \xi(t) = +\infty$ and $\liminf_{t \to \infty} \xi(t) = -\infty$, $\mathrm{Pr}_\omega$-almost surely.

The reader has probably noticed that in the above formulations $P_n$ and $Q_n$ play asymmetric roles whereas it is clear that they shouldn't. Indeed, let us introduce stochastic matrices $\zeta_n^-$ by the following system of equations which are symmetric to (1.12):

\[ \zeta_n^- = (I - P_n \zeta_{n+1}^- - R_n)^{-1} Q_n, \qquad -\infty < n < +\infty. \tag{1.16} \]

A full analogue of Theorem 1 holds, and therefore the matrices $A_n^-$ are defined as

\[ A_n^- \stackrel{\mathrm{def}}{=} (I - P_n \zeta_{n+1}^- - R_n)^{-1} P_n. \tag{1.17} \]

Finally, the top Lyapunov exponent of the product of matrices $A_n^-$ is defined by

\[ \lambda^- \stackrel{\mathrm{def}}{=} \lim_{n \to -\infty} \frac{1}{|n|} \log \|A_n^- A_{n+1}^- \cdots A_{-1}^-\|. \tag{1.18} \]

A natural but non-trivial symmetry property is expressed by the following result:

436

E. Bolthausen, I. Goldsheid

Lemma 3. λ+ + λ− = 0.

(1.19)

The proof of this lemma will be given in Sect. 3. The symmetric formulation of Theorem 2 describing the behavior of the path in terms of λ− is now obvious. Remark 3. It follows from Condition C2 that n 1 1 log Aj = E log A1 lim sup log An An−1 · · · A1 ≤ lim n→∞ n n→∞ n j =1 −1 = E log (I − Q1 ζ0 − R1 ) Q1 ≤ E log (1 − Q1 + R1 )−1 .

Thus, the fact that $\lambda^+ < \infty$ is implied by C2. Similarly, $\lambda^- < \infty$. This together with Lemma 3 also implies $\lambda^+ > -\infty$. Lemma 3 allows us to evaluate the Lyapunov exponent in the following case:

Corollary 1. Suppose that, in addition to Condition C, the sequence $(P_n, Q_n, R_n)$, $-\infty < n < \infty$, has the same distribution as $(Q_{-n}, P_{-n}, R_{-n})$, $-\infty < n < \infty$. Then

$$\lambda^+ = \lambda^- = 0. \tag{1.20}$$

Proof. The discussion above reveals that $\lambda^+$ and $\lambda^-$ have the same distribution. As both are non-random, one has $\lambda^+ = \lambda^-$. From (1.19), the equation (1.20) follows.

Corollary 1 in conjunction with Theorem 2 shows that the random walk is recurrent when the random environment is symmetric in the above sense. However, in this case, recurrence can also be obtained as a consequence of ergodic considerations developed in [11, 6] for a one-dimensional random walk with bounded jumps. The corresponding argument easily extends to the case of a strip.

Remark 4. If the environment is constant in n and not random (i.e. one has just three matrices P, Q and R which define the transition probabilities of the random walk), Theorem 1 states that there is a unique stochastic matrix ζ which satisfies the equation

$$\zeta = (I - R - Q\zeta)^{-1}P,$$

and the matrix A is defined by

$$A \overset{\mathrm{def}}{=} (I - R - Q\zeta)^{-1}Q.$$

Theorem 2 then states that this random walk is recurrent if and only if the largest eigenvalue of A is 1.

2. Proofs

For the rest of the paper we always assume that Condition C is satisfied. We don't state this condition explicitly anymore.
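Returning briefly to the constant environment of Remark 4, the criterion there lends itself to a small numerical experiment. The following is only a sketch, not part of the paper: the function names and the 2 × 2 matrices are hypothetical, and the fixed-point iteration is assumed (in the spirit of Theorem 1) to approach ζ from any stochastic starting matrix when all blocks are strictly positive.

```python
import numpy as np


def solve_zeta(P, Q, R, iterations=500):
    """Approximate the stochastic solution of zeta = (I - R - Q zeta)^{-1} P."""
    m = P.shape[0]
    identity = np.eye(m)
    zeta = np.full((m, m), 1.0 / m)  # any stochastic starting matrix
    for _ in range(iterations):
        zeta = np.linalg.solve(identity - R - Q @ zeta, P)
        # In exact arithmetic every row sum stays equal to 1; renormalising
        # only removes accumulated rounding error.
        zeta /= zeta.sum(axis=1, keepdims=True)
    return zeta


def largest_eigenvalue_of_A(P, Q, R):
    """Largest modulus among the eigenvalues of A = (I - R - Q zeta)^{-1} Q."""
    zeta = solve_zeta(P, Q, R)
    A = np.linalg.solve(np.eye(P.shape[0]) - R - Q @ zeta, Q)
    return float(max(abs(np.linalg.eigvals(A))))


# Hypothetical 2 x 2 blocks; each row of P + Q + R sums to 1.
R = np.array([[0.2, 0.1], [0.1, 0.2]])
P_sym = np.array([[0.20, 0.15], [0.15, 0.20]])  # used for both P and Q
P_big = np.array([[0.3, 0.2], [0.2, 0.3]])      # row sums 0.5
Q_small = np.array([[0.1, 0.1], [0.1, 0.1]])    # row sums 0.2

print(largest_eigenvalue_of_A(P_sym, P_sym, R))    # symmetric case, Q = P
print(largest_eigenvalue_of_A(P_big, Q_small, R))  # drift to the right
print(largest_eigenvalue_of_A(Q_small, P_big, R))  # drift to the left
```

In these three examples every row of P has the same sum p and every row of Q the same sum q, so A has constant row sums q/p and its largest eigenvalue is 1, 0.4 and 2.5 respectively, up to rounding. According to Remark 4 only the first walk is recurrent.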

Recurrence and Transience of Random Walks on a Strip

2.1. Recurrence and exit probabilities. In order to study the asymptotic properties of our random walk, it is useful to investigate the behavior of the following exit probabilities. Let a < b. For a ≤ n ≤ b let $h_{n,a,b}(j)$, 1 ≤ j ≤ m, denote the probabilities that the random walk starting at (n, j) ∈ S reaches layer b before layer a, i.e.

$$h_{n,a,b}(j) = \mathrm{Pr}_{\omega,(n,j)}(\tau_b < \tau_a). \tag{2.1}$$

It is convenient to consider the following column-vectors: $h_{n,a,b} = (h_{n,a,b}(j))_{1\le j\le m}$. We often drop the indices a, b in this notation. The sequence $(h_n)$ satisfies the system of equations

$$h_n = P_nh_{n+1} + Q_nh_{n-1} + R_nh_n, \qquad a < n < b, \tag{2.2}$$

and the boundary conditions $h_a = \mathbf{0}$, $h_b = \mathbf{1}$. The meaning of these equations is simple and follows from the Markov property: to reach layer b, the random walk starting in layer n, a < n < b, is after the first jump either in one of the neighboring layers or still inside the layer n. The probabilities governing this first jump are given by the matrices $P_n$, $Q_n$ or $R_n$, respectively. Applying then the Markov property to the walk after the first jump leads to (2.2).

To solve these equations, it is useful to introduce for a ≤ n matrices $\varphi_{n,a} \overset{\mathrm{def}}{=} (\varphi_{n,a}(i,j))_{1\le i,j\le m}$. Here $\varphi_{n,a}(i,j)$ is the $\mathrm{Pr}_{\omega,(n,i)}$ probability that the random walk reaches layer n + 1 before layer a and enters it at j, i.e.

$$\varphi_{n,a}(i,j) = \mathrm{Pr}_{\omega,(n,i)}\left(\tau_{n+1} < \tau_a,\ \xi(\tau_{n+1}) = (n+1,j)\right). \tag{2.3}$$

We often drop the index a in this and similar expressions. Using the Markov property, we have

$$\varphi_n = P_n + Q_n\varphi_{n-1}\varphi_n + R_n\varphi_n. \tag{2.4}$$

Hence

$$\varphi_n = (I - Q_n\varphi_{n-1} - R_n)^{-1}P_n. \tag{2.5}$$

The boundary condition $\varphi_a = 0$ and (2.5) define $\varphi_n$ recursively for all n ≥ a. Using again the Markov property we have $h_n = \varphi_nh_{n+1}$ and hence the following expression solves (2.2):

$$h_{n,a,b} = \varphi_{n,a}\varphi_{n+1,a}\cdots\varphi_{b-1,a}\mathbf{1}. \tag{2.6}$$
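The recursion (2.5) together with $h_n = \varphi_nh_{n+1}$ is easy to test numerically. The sketch below is an illustration under assumptions of our own: the helper names, the dictionary layout (layer index mapped to an m × m block) and the randomly generated environment are hypothetical and not taken from the paper.

```python
import numpy as np


def exit_probabilities(P, Q, R, a, b):
    """Return {n: h_n} for a <= n <= b, computed from (2.5) and (2.6).

    P, Q, R are dictionaries mapping a layer index n to an m x m block.
    """
    m = P[a + 1].shape[0]
    identity = np.eye(m)
    phi = {a: np.zeros((m, m))}  # boundary condition phi_a = 0
    for n in range(a + 1, b):
        phi[n] = np.linalg.solve(identity - Q[n] @ phi[n - 1] - R[n], P[n])
    h = {b: np.ones(m)}          # boundary condition h_b = 1
    for n in range(b - 1, a - 1, -1):
        h[n] = phi[n] @ h[n + 1]  # h_n = phi_n h_{n+1}
    return h


def random_environment(layers, m, rng):
    """Hypothetical environment: positive blocks with (P + Q + R) 1 = 1."""
    P, Q, R = {}, {}, {}
    for n in layers:
        block = rng.uniform(0.1, 1.0, size=(m, 3 * m))
        block /= block.sum(axis=1, keepdims=True)
        P[n], Q[n], R[n] = block[:, :m], block[:, m:2 * m], block[:, 2 * m:]
    return P, Q, R


rng = np.random.default_rng(0)
a, b, m = -3, 4, 3
P, Q, R = random_environment(range(a, b + 1), m, rng)
h = exit_probabilities(P, Q, R, a, b)
for n in range(a + 1, b):
    residual = h[n] - (P[n] @ h[n + 1] + Q[n] @ h[n - 1] + R[n] @ h[n])
    print(n, np.max(np.abs(residual)))  # residual of (2.2) at layer n
```

For m = 1 with P = Q = 1/2 and R = 0 the same routine reproduces the classical gambler's-ruin values $h_n = (n-a)/(b-a)$, which gives a quick sanity check.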

We consider now the a → −∞ limit of these objects. It is clear that $\varphi_n = \varphi_{n,a}$ is a monotonic function of a: $\varphi_{n,a-1} \ge \varphi_{n,a}$. Hence, the following limits exist:

$$\eta_n \overset{\mathrm{def}}{=} \lim_{a\to-\infty}\varphi_{n,a}. \tag{2.7}$$

The probabilistic meaning of $\eta_n$ is clear: $\eta_n(i,j)$ is the probability that a walk starting in (n, i) reaches the layer n + 1 at j. Similarly, we can define the limit for the h-vectors: $h_{n,a,b}$ is monotone in a. Hence, the limit

$$h^+_{n,b}(j) \overset{\mathrm{def}}{=} \lim_{a\to-\infty}h_{n,a,b}(j)$$

exists. From (2.6), we get

$$h^+_{n,b} = \eta_n\eta_{n+1}\cdots\eta_{b-1}\mathbf{1}. \tag{2.8}$$
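In a constant environment the matrix $\varphi_{n,a}$ depends only on n − a, so the limit (2.7) can be approximated by iterating (2.5) from the zero matrix. The following sketch uses hypothetical 2 × 2 blocks of our own choosing with a drift to the left; it is meant as an illustration of the monotone convergence, not as the authors' procedure.

```python
import numpy as np


def phi_iterates(P, Q, R, steps=200):
    """Iterate phi <- (I - Q phi - R)^{-1} P starting from phi = 0."""
    identity = np.eye(P.shape[0])
    phi = np.zeros_like(P)
    history = [phi]
    for _ in range(steps):
        phi = np.linalg.solve(identity - Q @ phi - R, P)
        history.append(phi)
    return history


# Hypothetical constant environment: rows of P sum to 0.2, rows of Q to 0.5.
P = np.array([[0.1, 0.1], [0.1, 0.1]])
Q = np.array([[0.3, 0.2], [0.2, 0.3]])
R = np.array([[0.2, 0.1], [0.1, 0.2]])

history = phi_iterates(P, Q, R)
eta = history[-1]
print(eta)
print(eta.sum(axis=1))  # row sums of the approximate eta
```

Because all rows of P, Q and R have equal sums here, the row sums of the iterates obey the scalar recursion x ← 0.2/(0.7 − 0.5x), whose stable fixed point is 0.4. The approximate η is therefore strictly substochastic, in line with the case $\eta_n\mathbf{1} < \mathbf{1}$ discussed in the text that follows.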

Lemma 4. With P-probability one, one has ηn > 0 for all n. Proof. This is an immediate consequence of Lemma 1.

Lemma 5. If for some n one has $\mathbb{P}(\eta_n\mathbf{1} \ne \mathbf{1}) > 0$ then $\mathbb{P}(\eta_n\mathbf{1} < \mathbf{1}) = 1$ for all n.

Proof. If $\mathbb{P}(\eta_n\mathbf{1} \ne \mathbf{1}) > 0$, then there exists i such that $\mathbb{P}\left(h^+_{n,n+1}(i) < 1\right) > 0$. Define the event $D \overset{\mathrm{def}}{=} \{h^+_{n,n+1}(i) < 1\}$. We claim that $\mathbb{P}(T(D)\setminus D) = 0$. In fact, if ω ∈ D, then under the environment T(ω) one has $h^+_{n-1,n}(i) < 1$. However, (n − 1, i) is reachable from (n, i) inside (−∞, n], P-a.s., and therefore, for this shifted environment, one also has $h^+_{n,n+1}(i) < 1$. This implies the claim. From ergodicity, it follows that $\mathbb{P}(D) = 1$. Remark that this implies that $\mathbb{P}(\eta_n\mathbf{1} < \mathbf{1}) = 1$ for this specific n, because (n, i) is reachable inside (−∞, n] from any other point (n, j) in the layer. By stationarity, this then evidently implies $\mathbb{P}(\eta_n\mathbf{1} < \mathbf{1}) = 1$ for all n.

This lemma has some easy consequences. Remark first that the Lyapunov exponent

$$\lambda_\eta \overset{\mathrm{def}}{=} \lim_{n\to\infty}\frac1n\log\|\eta_1\eta_2\cdots\eta_n\| \le 0 \tag{2.9}$$

exists and is non-random.

Corollary 2. a) $\lambda_\eta = 0$ if and only if $\eta_n\mathbf{1} = \mathbf{1}$ for all n, P-almost surely. In that case we have for P-a.a. ω: $\limsup_{t\to\infty}\xi(t) = \infty$, $\mathrm{Pr}_\omega$-a.s.

b) $\lambda_\eta < 0$ holds if and only if $\eta_n\mathbf{1} < \mathbf{1}$ for all n, P-almost surely. In that case we have for P-a.a. ω: $\lim_{t\to\infty}\xi(t) = -\infty$, $\mathrm{Pr}_\omega$-a.s.

Proof. The first parts in a) and b) are immediate consequences of Lemma 5. The consequence for the ξ-process in a) is easy: if the $\eta_n$ matrices are all stochastic then the walk reaches with probability one the layer immediately to the right of where it is, and therefore it reaches any layer to the right of the starting point. To prove $\lim_{t\to\infty}\xi(t) = -\infty$, $\mathrm{Pr}_\omega$-a.s., in the case where $\lambda_\eta < 0$, note that this condition implies that P-a.s.,

$$\sum_{b=n+1}^{\infty}\left\|h^+_{n,b}\right\| = \sum_{b=n+1}^{\infty}\left\|\eta_n\eta_{n+1}\cdots\eta_{b-1}\mathbf{1}\right\| < \infty,$$

which by the Borel–Cantelli Lemma implies $\limsup_{t\to\infty}\xi(t) < \infty$, $\mathrm{Pr}_\omega$-a.s., and hence $\lim_{t\to\infty}\xi(t) = -\infty$.

Remark 5. There are symmetric statements relating the behavior of the paths of the random walk to the Lyapunov exponent $\lambda^-_\eta$ which is defined in terms of

$$\eta^-_n(i,j) = \mathrm{Pr}_{\omega,(n,i)}\left(\tau_{n-1} < \infty,\ \xi(\tau_{n-1}) = (n-1,j)\right). \tag{2.10}$$

Note that the vectors

$$h^-_{n,a} \overset{\mathrm{def}}{=} \lim_{b\to\infty}\left(\mathbf{1} - h_{n,a,b}\right) \tag{2.11}$$

are also given by the following expressions which are symmetric to (2.8):

$$h^-_{n,a} = \eta^-_n\eta^-_{n-1}\cdots\eta^-_{a+1}\mathbf{1}.$$

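2.2. Proof of Theorem 1. Proof of a). We consider the difference

$$\Delta_n = \Delta_{n,a,\rho} \overset{\mathrm{def}}{=} \psi_{n,a,\rho} - \varphi_{n,a}$$

with initial condition $\Delta_a = \psi_a = \rho$. Using (1.10) and (2.5) we write

$$\begin{aligned}
\Delta_n &= \left[(I - Q_n\psi_{n-1} - R_n)^{-1} - (I - Q_n\varphi_{n-1} - R_n)^{-1}\right]P_n \\
&= (I - Q_n\psi_{n-1} - R_n)^{-1}Q_n(\psi_{n-1} - \varphi_{n-1})(I - Q_n\varphi_{n-1} - R_n)^{-1}P_n \\
&= \tilde A_n\Delta_{n-1}\varphi_n = \tilde A_n\tilde A_{n-1}\cdots\tilde A_{n-k+1}\,\Delta_{n-k}\,\varphi_{n-k+1}\varphi_{n-k+2}\cdots\varphi_n
\end{aligned} \tag{2.12}$$

with $\tilde A_n \overset{\mathrm{def}}{=} (I - Q_n\psi_{n-1} - R_n)^{-1}Q_n$.

We put $Y_{n,k} \overset{\mathrm{def}}{=} \tilde A_n\tilde A_{n-1}\cdots\tilde A_{n-k+1}\Delta_{n-k}$ and rewrite (2.12) as

$$\Delta_n = Y_{n,k}\,\varphi_{n-k+1}\varphi_{n-k+2}\cdots\varphi_n = Y_{n,k}\,\eta_{n-k+1}\cdots\eta_n + Y_{n,k}\left(\varphi_{n-k+1}\cdots\varphi_n - \eta_{n-k+1}\cdots\eta_n\right).$$

Remark that the matrices $Y_{n,k}$ and $\varphi_n$ depend on a, but not the η's. Applying Lemma 9 from the Appendix to the product of η's, we present $\Delta_n$ as

$$\Delta_{n,a,\rho} = Y_{n,k,a,\rho}D_k\left((c(1)\mathbf{1},\ldots,c(m)\mathbf{1}) + \varepsilon_k\right) + Y_{n,k,a,\rho}\left(\varphi_{n-k+1,a}\cdots\varphi_{n,a} - \eta_{n-k+1}\cdots\eta_n\right),$$

where we indicate the dependencies on a and ρ. $D_k$ is a diagonal matrix with positive diagonal elements, c(1), . . . , c(m) are positive numbers satisfying $\sum_i c(i) = 1$, and $\|\varepsilon_k\| \le \prod_{r=n-k+2}^{n}(1 - m\rho_r)$, where

$$\rho_r = \min_{i,j,k}\frac{\eta_r(i,j)\,\eta_{r-1}(j,k)}{\sum_l\eta_r(i,l)\,\eta_{r-1}(l,k)}.$$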

When k is growing while n is fixed we have that with P-probability 1 the sequence $\varepsilon_k$ tends to 0 because $(\rho_r)_{r\in\mathbb{Z}}$ is a stationary sequence of strictly positive numbers. We write $x_{n,k,a,\rho} \overset{\mathrm{def}}{=} Y_{n,k,a,\rho}D_k\mathbf{1}$. Then

$$\Delta^{\circ}_{n,k,a,\rho}(i,j) \overset{\mathrm{def}}{=} \left[Y_{n,k,a,\rho}D_k\left((c(1)\mathbf{1},\ldots,c(m)\mathbf{1}) + \varepsilon_k\right)\right](i,j) = x_{n,k,a,\rho}(i)\,c(j)\,(1 + o_k(1)),$$

where $o_k(1) \to 0$ as k → ∞, and does not depend on a and ρ. Therefore

$$\lim_{k\to\infty}\frac{\Delta^{\circ}_{n,k,a,\rho}(i,j)}{\Delta^{\circ}_{n,k,a,\rho}(i,l)} = \frac{c(j)}{c(l)},$$

uniformly in a and the boundary condition ρ. As on the other hand, for fixed n, k,

$$\lim_{a\to-\infty}Y_{n,k,a,\rho}\left(\varphi_{n-k+1,a}\cdots\varphi_{n,a} - \eta_{n-k+1}\cdots\eta_n\right) = 0,$$

uniformly in ρ, we get

$$\lim_{a\to-\infty}\frac{\Delta_{n,a}(i,j)}{\Delta_{n,a}(i,l)} = \frac{c(j)}{c(l)},$$

uniformly in ρ. As $\lim_{a\to-\infty}\sum_{l}\Delta_{n,a,\rho}(i,l)$ evidently exists, we get that, uniformly in ρ,

$$\lim_{a\to-\infty}\Delta_{n,a,\rho}(i,j) = c(j)\lim_{a\to-\infty}\sum_{l=1}^{m}\Delta_{n,a,\rho}(i,l).$$

Combining this with the existence of the a → −∞ limit of the sequence $\varphi_{n,a}$ of matrices, and the fact that the $\psi_{n,a,\rho}$ are stochastic, we thus prove part a) of Theorem 1.

Proof of b). Passing to the limit in (1.10) proves that $(\zeta_n)$ satisfies Eq. (1.12). The uniqueness of this solution follows from the observation that any solution $\tilde\zeta_n$ satisfies $\tilde\zeta_n = \psi_{n,a,\tilde\zeta_a}$; then passing again to the limit a → −∞ and using the fact that the convergence is uniform in $\rho = \tilde\zeta_a$ proves that $\tilde\zeta_n = \zeta_n$.

Proof of c). This is evident from the ergodicity condition on $(P_n, Q_n, R_n)$ and the construction of the sequence.

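2.3. Proof of Theorem 2. For a finite a, we can take in (2.12) the special boundary condition $\rho = \zeta_a$ for which we have $\psi_n = \psi_{n,a,\zeta_a} = \zeta_n$. Then $\tilde A_n = A_n$, and (2.12) gives for every n ∈ Z, k ≥ 1, such that n − k ≥ a:

$$\Delta^*_n = A_nA_{n-1}\cdots A_{n-k+1}\,\Delta^*_{n-k}\,\varphi_{n-k+1}\varphi_{n-k+2}\cdots\varphi_n, \tag{2.13}$$

where $\Delta^*_n = \Delta^*_{n,a} \overset{\mathrm{def}}{=} \zeta_n - \varphi_{n,a}$. In particular

$$\Delta^*_n = A_nA_{n-1}\cdots A_{a+1}\,\Delta^*_a\,\varphi_{a+1}\varphi_{a+2}\cdots\varphi_n = A_nA_{n-1}\cdots A_{a+1}\,\zeta_a\,\varphi_{a+1}\varphi_{a+2}\cdots\varphi_n. \tag{2.14}$$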

Suppose now that $\lambda^+ < 0$. Then from (2.14) it follows that for fixed a and n sufficiently large,

$$\|\Delta^*_n\| \le \|A_nA_{n-1}\cdots A_{a+1}\zeta_a\| \le \exp[\lambda^+(n-a)/2].$$

Rewriting (2.6) for b > n as

$$h_{n,a,b} = (\zeta_n - \Delta^*_n)(\zeta_{n+1} - \Delta^*_{n+1})\cdots(\zeta_{b-1} - \Delta^*_{b-1})\mathbf{1},$$

we see that

$$h_{n,a,b} \ge (1 - \|\Delta^*_n\|)(1 - \|\Delta^*_{n+1}\|)\cdots(1 - \|\Delta^*_{b-1}\|)\mathbf{1},$$

and hence $\lim_{b\to\infty}h_{n,a,b} > 0$, i.e. $h^-_{n,a}$ defined by (2.11) satisfies $h^-_{n,a} < \mathbf{1}$. This implies $\eta^-_n\mathbf{1} < \mathbf{1}$ (see (2.10) and (2.11)). A statement symmetric to that of Corollary 2 (see Remark 5) implies that $\mathrm{Pr}_\omega(\lim_{t\to\infty}\xi(t) = +\infty) = 1$. Similarly, if $\lambda^+ > 0$ then, according to (1.19), $\lambda^- < 0$ and $\mathrm{Pr}_\omega(\lim_{t\to\infty}\xi(t) = -\infty) = 1$.

Suppose now that for P-almost every environment ω, $\lim_{t\to\infty}\xi(t) = -\infty$, $\mathrm{Pr}_\omega$-a.s. Then obviously each layer b will be visited by our random walk only a finite number of times and Corollary 2 implies $\eta_n\mathbf{1} < \mathbf{1}$, P-a.s. for all n, and $\lambda_\eta < 0$. Define $\delta_n \overset{\mathrm{def}}{=} \lim_{a\to-\infty}\Delta^*_{n,a}$. Then (2.13) implies

$$\delta_n = A_nA_{n-1}\cdots A_1\,\delta_0\,\eta_1\eta_2\cdots\eta_n. \tag{2.15}$$

Remark now that the sequence $\delta_n$ is stationary and satisfies $0 < \delta_n(i,j) < 1$, for all i, j. But then there exists at least an infinite subsequence $n_k$ such that

$$\lim_{k\to\infty}\frac{1}{n_k}\log\|\delta_{n_k}\| = 0.$$

Using this together with (2.15) gives

$$\lambda^+ = \lim_{n\to\infty}\frac1n\log\|A_nA_{n-1}\cdots A_1\| = -\lambda_\eta.$$

This completes the proof of the statements a) and b) of Theorem 2. We turn now to c). Suppose that $\lambda^+ = 0$. Then $\Delta_n = \zeta_n - \eta_n = 0$ (indeed, $\Delta_n > 0$ would imply $\lambda^+ > 0$). This implies $\limsup_{t\to\infty}\xi(t) = \infty$ $\mathrm{Pr}_\omega$-almost surely, for P-almost all ω. Using Lemma 3, we also have $\lambda^- = 0$, and we obtain symmetrically $\liminf_{t\to\infty}\xi(t) = -\infty$. The other direction of part c) follows from a) and b) which are already proved.

3. Invariant Measures and Proof of Lemma 3

The proof of Lemma 3 follows from considerations concerning the stationary measure of the random walk. This stationary measure is not in itself of importance for the topics covered in this paper, but it will become crucial in later work.

3.1. Invariant measures of Markov chains confined to finite boxes. Consider a Markov chain whose phase space is a box [a, b], a < b, – the part of the strip which is contained between layers a and b. The behavior of the random walk at the boundaries a and b is defined as follows. We replace the triple (Pa , Qa , Ra ) by (ρa , 0, 0), and (Pb , Qb , Rb ) with (0, ρb , 0), where ρa and ρb are stochastic matrices. The meaning of this choice is clear: the random walk is not allowed to jump to the left from a or to the right from b and is sent to layer a + 1 immediately after it reaches layer a and, similarly, to b − 1 from b. If [a, b] is such that at least one layer inside [a, b] belongs to one communication class then the whole of [a, b] belongs to this class. The proof of this statement is similar to that of Lemma 1. The invariant measure of this Markov chain is then unique. We are going to consider growing sequences of intervals with a → −∞, b → ∞. This means that for a.a. environments the invariant measure becomes unique if b − a is large enough. We write this measure as a sequence of row-vectors (πn )a≤n≤b , where πn = (πn (i))1≤i≤m . From now on we suppose our boxes to be large enough. The measure is then unique and moreover πn > 0 for all n. We will now describe this measure in terms of products of matrices. Consider the usual system of equations satisfied by the vectors πn : πn = πn+1 Qn+1 + πn Rn + πn−1 Pn−1 ,

a < n < b,

(3.1)

where for n = a + 1 the matrix $P_a$ has to be replaced by $\rho_a$, and for n = b − 1 the matrix $Q_b$ has to be replaced by $\rho_b$. Furthermore, we have the following equations at the boundaries:

$$\pi_a = \pi_{a+1}Q_{a+1}, \qquad \pi_b = \pi_{b-1}P_{b-1}. \tag{3.2}$$

To solve these equations we use the transfer matrix approach which is well known in the theory of difference equations. The idea is to define recursively a sequence of m × m matrices $\alpha_n$, a ≤ n < b, such that

$$\pi_n = \pi_{n+1}\alpha_n. \tag{3.3}$$

Equations (3.2) provide a natural candidate for $\alpha_a$: $\alpha_a \overset{\mathrm{def}}{=} Q_{a+1}$. To find $\alpha_n$, n > a, suppose that $\alpha_{n-1}$ is known. We then substitute $\pi_{n-1} = \pi_n\alpha_{n-1}$ into (3.1) obtaining thus

$$\pi_n = \pi_n\alpha_{n-1}P_{n-1} + \pi_nR_n + \pi_{n+1}Q_{n+1},$$

which immediately implies the recursion

$$\alpha_n \overset{\mathrm{def}}{=} Q_{n+1}(I - R_n - \alpha_{n-1}P_{n-1})^{-1}, \qquad a < n < b, \tag{3.4}$$

again with $P_{n-1}$ for n = a + 1 replaced by $\rho_a$. Once the $\alpha_n$ are known, one uses (3.3) and the boundary condition (3.2) at b to write the following equation for $\pi_b$:

$$\pi_b = \pi_b\alpha_{b-1}P_{b-1}. \tag{3.5}$$

Finally, once $\pi_b$ is found, the solution can then be presented as

$$\pi_n = Z^{-1}\pi_b\alpha_{b-1}\cdots\alpha_n, \tag{3.6}$$

where $Z^{-1}$ is a normalizing factor, $Z = \sum_{a\le n\le b,\ 1\le i\le m}\pi_n(i)$. The above calculations are formal and have to be justified. Obviously, there are two questions that have to be addressed: the existence of $\alpha_n$ and $\pi_b$. This will be done right now. We emphasize that this can be done because of the probabilistic nature of the equations in question, and, more specifically, because of the algebraic structure provided by Eqs. (1.1). This very structure, by the way, also leads to a number of useful relations between our matrices. As usual, we emphasize the dependence of $\pi_n$ on a and b by writing $\pi_{n,a,b}$ and of $\alpha_n$ on a by writing $\alpha_{n,a}$. Of course, they also depend on the choice of the boundary conditions. Remark that the $\alpha_n$ do not depend on the boundary condition at b. Let us put

$$\gamma_n \overset{\mathrm{def}}{=} (I - \alpha_nP_n - R_{n+1})^{-1}. \tag{3.7}$$
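The transfer-matrix recipe (3.3) to (3.6) can be tried out on a small box before its justification is given. The sketch below is illustrative only: the function names, the dictionary layout and the randomly generated positive blocks are assumptions of ours, and the full transition matrix is built purely to check the result.

```python
import numpy as np


def left_fixed_vector(S):
    """Row vector v with v S = v, normalised to sum 1 (S positive, stochastic)."""
    eigenvalues, eigenvectors = np.linalg.eig(S.T)
    v = np.real(eigenvectors[:, np.argmin(abs(eigenvalues - 1.0))])
    return v / v.sum()


def box_invariant_measure(P, Q, R, rho_a, rho_b, a, b):
    """Invariant measure {n: pi_n} on the box [a, b] via (3.3)-(3.6)."""
    m = rho_a.shape[0]
    identity, zero = np.eye(m), np.zeros((m, m))
    P, Q, R = dict(P), dict(Q), dict(R)
    P[a], Q[a], R[a] = rho_a, zero, zero  # triple used at layer a
    P[b], Q[b], R[b] = zero, rho_b, zero  # triple used at layer b
    alpha = {a: Q[a + 1]}
    for n in range(a + 1, b):             # recursion (3.4)
        alpha[n] = Q[n + 1] @ np.linalg.inv(identity - R[n] - alpha[n - 1] @ P[n - 1])
    pi = {b: left_fixed_vector(alpha[b - 1] @ P[b - 1])}  # equation (3.5)
    for n in range(b - 1, a - 1, -1):     # pi_n = pi_{n+1} alpha_n, (3.3)
        pi[n] = pi[n + 1] @ alpha[n]
    Z = sum(v.sum() for v in pi.values())  # normalising factor in (3.6)
    return {n: v / Z for n, v in pi.items()}


def transition_matrix(P, Q, R, rho_a, rho_b, a, b):
    """Full transition matrix of the chain on the box (used only as a check)."""
    m = rho_a.shape[0]
    T = np.zeros(((b - a + 1) * m, (b - a + 1) * m))

    def block(n):
        return slice((n - a) * m, (n - a + 1) * m)

    T[block(a), block(a + 1)] = rho_a
    T[block(b), block(b - 1)] = rho_b
    for n in range(a + 1, b):
        T[block(n), block(n + 1)] = P[n]
        T[block(n), block(n - 1)] = Q[n]
        T[block(n), block(n)] = R[n]
    return T


rng = np.random.default_rng(1)
a, b, m = 0, 5, 2


def random_stochastic(columns):
    """Hypothetical positive m x columns array with unit row sums."""
    block = rng.uniform(0.1, 1.0, size=(m, columns))
    return block / block.sum(axis=1, keepdims=True)


P, Q, R = {}, {}, {}
for n in range(a, b + 1):
    triple = random_stochastic(3 * m)
    P[n], Q[n], R[n] = triple[:, :m], triple[:, m:2 * m], triple[:, 2 * m:]
rho_a, rho_b = random_stochastic(m), random_stochastic(m)

pi = box_invariant_measure(P, Q, R, rho_a, rho_b, a, b)
stacked = np.concatenate([pi[n] for n in range(a, b + 1)])
T = transition_matrix(P, Q, R, rho_a, rho_b, a, b)
print(np.max(np.abs(stacked @ T - stacked)))  # residual of pi T = pi
```

The design choice here is to compare against a brute-force stationarity check rather than a second solver: if the α-recursion were set up incorrectly at either boundary, the residual of πT = π would not vanish.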

Lemma 6. a) The matrices $\gamma_n$ exist and

$$\|\gamma_n\| \le \left(1 - \|Q_{n+1} + R_{n+1}\|\right)^{-1}. \tag{3.8}$$

b) The matrices

$$\vartheta_{n+1} \overset{\mathrm{def}}{=} \gamma_nP_{n+1}, \qquad a \le n \le b-1, \tag{3.9}$$

are stochastic.

Proof. We use induction in n. Let us first check (3.8) and (3.9) for n = a. From $\|Q_{a+1}\rho_a + R_{a+1}\| = \|Q_{a+1} + R_{a+1}\|$ it is obvious that $\gamma_a = (I - Q_{a+1}\rho_a - R_{a+1})^{-1}$ exists and satisfies (3.8). The following sequence of equivalent relations proves b):

$$(I - Q_{a+1}\rho_a - R_{a+1})^{-1}P_{a+1}\mathbf{1} = \mathbf{1} \iff P_{a+1}\mathbf{1} = (I - Q_{a+1}\rho_a - R_{a+1})\mathbf{1} \iff P_{a+1}\mathbf{1} = (I - Q_{a+1} - R_{a+1})\mathbf{1} \iff (1.1).$$

By the induction assumption, for j ≤ n − 1, $\gamma_j$ exist, and $\vartheta_{j+1}$ are stochastic. It remains to check that this is also true when j = n. Note first that $\alpha_j = Q_{j+1}\gamma_{j-1}$ exists if j ≤ n. Hence

$$I - \alpha_nP_n - R_{n+1} = I - Q_{n+1}\gamma_{n-1}P_n - R_{n+1} = I - Q_{n+1}\vartheta_n - R_{n+1}.$$

By the induction assumption, $\vartheta_n$ is stochastic. Hence the reasoning used for n = a applies again and proves a) and b) in the general case.

Corollary 3. Lemma 6 states that $\alpha_{b-1}P_{b-1}$ is stochastic. Hence, a non-trivial solution to Eq. (3.5) exists, and formula (3.6) provides a representation for the invariant measure $(\pi_n)_{a\le n\le b}$.

We now discuss the existence of the a → −∞ limit of the matrices $\alpha_n = \alpha_{n,a}$.

Lemma 7. a) If $\rho_a = \rho$, where ρ is from (1.9), then $\vartheta_n = \psi_n$ and

$$\alpha_{n,a} = Q_{n+1}(I - R_n - Q_n\psi_{n-1})^{-1} \tag{3.10}$$

for n > a, where the matrices $\psi_n$ are the ones defined by (1.10).

b) The following limit exists and is independent of the choice of the sequence of matrices $\rho_a$:

$$\alpha_n \overset{\mathrm{def}}{=} \lim_{a\to-\infty}\alpha_{n,a} = Q_{n+1}(I - R_n - Q_n\zeta_{n-1})^{-1}, \tag{3.11}$$

where the matrices $\zeta_n$ are the ones defined in Theorem 1. Furthermore, the convergence is uniform in $\rho_a$.

Proof. According to (3.9),

$$\vartheta_{n+1} = \left(I - R_{n+1} - \alpha_{n,a}P_n\right)^{-1}P_{n+1}, \tag{3.12}$$

and hence, from the definition of boundary conditions at a, $\vartheta_{a+1} = (I - R_{a+1} - Q_{a+1}\rho_a)^{-1}P_{a+1}$. Thus, $\vartheta_{a+1} = \psi_{a+1}$ if $\rho_a = \rho$. From $\alpha_{n,a} = Q_{n+1}\gamma_{n-1}$ we have $\alpha_{n,a}P_n = Q_{n+1}\vartheta_n$ for all n ≥ a. Substituting this relation into (3.12) we obtain

$$\vartheta_{n+1} = (I - R_{n+1} - Q_{n+1}\vartheta_n)^{-1}P_{n+1}. \tag{3.13}$$

From this, we see that $\vartheta_n = \psi_n$, n > a, as they are defined by the same recursion and initial condition. Letting now a → −∞ and using Theorem 1 proves part b) of the lemma.

The starting point of the above considerations is formula (3.3). It turns out to be useful to look at a symmetric way of solving Eqs. (3.1). Namely, we start with the hypothesis that there exists a sequence of m × m matrices $\beta_n$ defined recursively and such that

$$\pi_{n+1} = \pi_n\beta_{n+1}. \tag{3.14}$$

It is now easy to repeat all the relevant statements and proofs. These considerations lead to useful algebraic relations between matrices α and β.

Lemma 8. Let $(\pi_i)_{a\le i\le b}$ be the unique invariant measure of our Markov chain. Then for any n, 0 < n < b,

$$\pi_0 = \pi_0\beta_1\cdots\beta_n\alpha_{n-1}\cdots\alpha_0. \tag{3.15}$$

Proof. From the uniqueness of the invariant measure and the fact that matrices $\alpha_n$ are such that both (3.5) and (3.3) hold, it follows that

$$\pi_0 = \pi_n\alpha_{n-1}\cdots\alpha_0. \tag{3.16}$$

Similarly, from uniqueness, existence of β's, and (3.14) we have:

$$\pi_n = \pi_0\beta_1\cdots\beta_n. \tag{3.17}$$

Combining (3.16) and (3.17) concludes the proof.

3.2. Proof of Lemma 3. The stationary distribution is uniquely defined only if we normalize it to be a probability measure. When taking limits a → −∞ and b → ∞, this normalization is not appropriate. We therefore assume now that a < 0 < b and normalize the measure so that $\sum_{j=1}^{m}\pi_{0,a,b}(j) = 1$, which together with (3.1) and (3.2) defines the measure uniquely as well. We then define

$$\pi_n \overset{\mathrm{def}}{=} \lim_{a\to-\infty,\ b\to\infty}\pi_{n,a,b}.$$

To prove the existence of this limit, note that $\pi_0$ is the unique(!) eigenvector of (many) strictly positive matrices which do have the limit. Indeed, according to (3.15) we can write, say, $\pi_{0,a,b} = \pi_{0,a,b}\,\beta_{1,b}\,\alpha_{0,a}$, and the convergence of $\beta_{1,b}\,\alpha_{0,a}$ to $\beta_1\alpha_0$ implies the convergence of $\pi_{0,a,b}$. The convergence of $\pi_{n,a,b}$ follows then from, say, convergence in (3.17). We obviously have:

$$\pi_0 = \pi_0\beta_1\cdots\beta_n\alpha_{n-1}\cdots\alpha_0. \tag{3.18}$$

Hence,

$$1 = \|\pi_0\| = \|\pi_0\beta_1\cdots\beta_n\|\,\|\tilde\pi_n\alpha_{n-1}\cdots\alpha_0\|, \tag{3.19}$$

where $\tilde\pi_n \overset{\mathrm{def}}{=} \pi_n\|\pi_n\|^{-1}$. But then

$$0 = \lim_{n\to\infty}\frac1n\log\|\pi_0\beta_1\cdots\beta_n\| + \lim_{n\to\infty}\frac1n\log\|\tilde\pi_n\alpha_{n-1}\cdots\alpha_0\| = \lambda_\beta + \lambda_\alpha \tag{3.20}$$

with the obvious definitions of $\lambda_\beta$ and $\lambda_\alpha$. We stress that the convergence in (3.20) is uniform with respect to $\tilde\pi_n$ (see [2]; this also can be deduced directly from Lemma 9). Using now (3.11) and taking into account (1.13) implies for n > 0:

$$\begin{aligned}
\alpha_{n-1}\cdots\alpha_0 &= Q_n(I - R_{n-1} - Q_{n-1}\zeta_{n-2})^{-1}\cdots Q_1(I - R_0 - Q_0\zeta_{-1})^{-1} \\
&= Q_nA_{n-1}A_{n-2}\cdots A_1(I - R_0 - Q_0\zeta_{-1})^{-1}.
\end{aligned} \tag{3.21}$$

We can also consider the symmetric situation, with matrices $A_n^-$ defined by (1.17). Then we arrive in a symmetric way at

$$\beta_1\cdots\beta_n = P_0A_1^-A_2^-\cdots A_{n-1}^-\left(I - R_n - P_n\zeta_{n+1}^-\right)^{-1}. \tag{3.22}$$

Equations (3.21) and (3.22) immediately imply that $\lambda^+ = \lambda_\alpha$ and $\lambda^- = \lambda_\beta$ and thus Lemma 3 follows.

4. Appendix

The lemma presented in this appendix is well known (see, e.g. [2]). We prove it in order to make our article more self-contained. It is likely that the short and logical way of explaining the proof is new (but the idea is the old one).

Lemma 9. Let $g_n = (g_n(i,j))$, n = 1, 2, . . . be a sequence of positive m × m matrices, $g_n > 0$. Let $H_n = g_ng_{n-1}\cdots g_1$ and

$$\rho_r = \min_{i,j,k}\ g_r(i,j)\,g_{r-1}(j,k)\Big(\sum_{l}g_r(i,l)\,g_{r-1}(l,k)\Big)^{-1}.$$

Suppose that

$$\sum_{r=2}^{\infty}\rho_r = \infty.$$

Then the product $H_n$ can be presented as follows:

$$H_n = D_n\left[(c(1)\mathbf{1},\ldots,c(m)\mathbf{1}) + \varepsilon_n\right],$$

where c(j) are strictly positive numbers which are uniquely defined by the sequence $g_\cdot$, do not depend on n and are such that $\sum_j c(j) = 1$, $D_n$ is a diagonal matrix with positive diagonal elements, and $\|\varepsilon_n\| \le \prod_{r=2}^{n}(1 - m\rho_r)$.

Proof. Let us rewrite $H_n$ in the following form:

$$H_n = D_n\,D_n^{-1}g_nD_{n-1}\,D_{n-1}^{-1}g_{n-1}\cdots D_1^{-1}g_1 = D_n\,\tilde g_n\tilde g_{n-1}\cdots\tilde g_1,$$

where $\tilde g_r \equiv D_r^{-1}g_rD_{r-1}$, and $D_r$ are diagonal matrices, with diagonal elements chosen so that all matrices $\tilde g_r$ are stochastic (by definition, $D_0 = I$). It is very easy to see that

$$D_r(i,i) = \sum_{i_r,\ldots,i_1}g_r(i,i_r)\,g_{r-1}(i_r,i_{r-1})\cdots g_1(i_2,i_1)$$

and

$$\tilde g_r(i,j) = \frac{g_r(i,j)\,D_{r-1}(j,j)}{D_r(i,i)}.$$

Obviously, $\tilde g_r(i,j) \ge \rho_r$, and it is well known that the product of stochastic matrices converges to a matrix of the form $(c(1)\mathbf{1},\ldots,c(m)\mathbf{1})$. Moreover, $\tilde g_n\tilde g_{n-1}\cdots\tilde g_1 = (c(1)\mathbf{1},\ldots,c(m)\mathbf{1}) + \varepsilon_n$, where

$$\|\varepsilon_n\| \le \prod_{r=2}^{n}(1 - m\rho_r).$$
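The content of Lemma 9 is that, after dividing each row of $H_n$ by its sum, all rows become nearly equal to one and the same probability vector c, which no longer changes as further factors are multiplied on the left. A minimal numerical sketch follows; the random factors with entries in [0.5, 1.5] and the function name are hypothetical choices made only for illustration.

```python
import numpy as np


def row_normalised_product(factors):
    """Return D_n^{-1} H_n for H_n = g_n g_{n-1} ... g_1.

    `factors` is the list [g_1, g_2, ..., g_n].
    """
    H = np.eye(factors[0].shape[0])
    for g in factors:
        H = g @ H      # the newest factor multiplies on the left
        H /= H.sum()   # scalar rescaling only, to avoid overflow
    return H / H.sum(axis=1, keepdims=True)


rng = np.random.default_rng(2)
factors = [rng.uniform(0.5, 1.5, size=(3, 3)) for _ in range(80)]

S60 = row_normalised_product(factors[:60])
S80 = row_normalised_product(factors)
print(S60)                       # every row approximates (c(1), c(2), c(3))
print(np.max(np.abs(S80 - S60)))  # adding factors on the left barely changes c
```

The scalar rescaling inside the loop does not affect the row-normalised result; it only keeps the entries of the running product in a safe floating-point range.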

References

1. Andjel, E. D.: A zero or one law for one dimensional random walks in random environments. Ann. Probab. 16, 722–729 (1988)
2. Furstenberg, H., and Kesten, H.: Products of random matrices. Ann. Math. Statist. 31, 457–469 (1960)
3. Greven, A., and den Hollander, F.: Large deviations for a random walk in random environment. Ann. Probab. 22 no 3, 1267–1285 (1994)
4. Golosov, A.: Localization of random walks in one-dimensional random environments. Commun. Math. Phys. 92, 491–506 (1984)
5. Golosov, A.: On the limit distributions for a random walk in a critical one-dimensional random environment. Uspekhi Mat. Nauk 41 no 2, 189–190 (1986)
6. den Hollander, F., and Thorisson, H.: Shift-coupling and a zero-one law for random walk in random environment. Acta Appl. Math. 34, 37–50 (1994)
7. Kalikow, S.: Generalized random walk in a random environment. Ann. Probab. 9, 753–768 (1981)
8. Kesten, H.: The limit distribution of Sinai's random walk in a random environment. Physica A 138, 299–309 (1986)
9. Kesten, H., Kozlov, M.V., and Spitzer, F.: Limit law for random walk in a random environment. Compositio Mathematica 30, 145–168 (1975)
10. Key, E.: Recurrence and transience criteria for random walk in a random environment. Ann. Probab. 12, 529–560 (1984)
11. Ledrappier, F.: Quelques propriétés des exposants caractéristiques. École d'été de probabilités de Saint Flour, XII-1982, Lecture Notes in Mathematics 1097, 305–396 (1984)
12. Letchikov, A.V.: A criterion for linear drift, and the central limit theorem for one-dimensional random walks in a random environment. Russian Acad. Sci. Sb. Math. 79, 73–92 (1993)
13. Sinai, Ya. G.: The limiting behavior of a one-dimensional random walk in a random medium. Theory Probab. Appl. 27, 256–268 (1982)
14. Solomon, F.: Random walks in a random environment. Ann. Probab. 3, 1–31 (1975)

Communicated by Ya. G. Sinai

Commun. Math. Phys. 214, 449 – 467 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Super Hilbert Spaces

Oliver Rudolph

Physics Division, Starlab nv/sa, Boulevard Saint Michel 47, 1040 Brussels, Belgium. E-mail: [email protected]

Received: 8 November 1999 / Accepted: 25 April 2000

Abstract: The basic mathematical framework for super Hilbert spaces over a Graßmann algebra with a Graßmann number-valued inner product is formulated. Super Hilbert spaces over infinitely generated Graßmann algebras arise in the functional Schrödinger representation of spinor quantum field theory in a natural way.

1. Introduction

The purpose of this article is to define and study the notion of super Hilbert space in a mathematically proper way and to establish generalizations of some basic results of the theory of Hilbert spaces for super Hilbert spaces. According to our definition a super Hilbert space is a module over a Graßmann algebra endowed with a Graßmann number-valued inner product. The notion of super Hilbert space was first considered by DeWitt [1]. Our definition, though more general than DeWitt's definition (see Sects. 4 and 7 below), is motivated by DeWitt's work. We shall see that DeWitt's definition is not general enough for certain physical applications and in particular we shall see that his notion of physical observable on a super Hilbert space is in general not well-defined. Within the framework developed in this paper we shall arrive at a more transparent notion of physical observable for super Hilbert spaces.

In standard complex quantum theory the physical transition amplitudes are given by the complex-valued inner product on the underlying complex Hilbert space. For quantum field theories with fermionic degrees of freedom or supersymmetric quantum theories super Hilbert spaces with Graßmann number-valued inner products may be introduced as long as a prescription to compute physical probabilities and transition amplitudes is given. In the functional Schrödinger representation of spinor quantum field theory super Hilbert spaces with Graßmann number-valued inner products arise naturally, see [4] and below. Therefore super Hilbert spaces provide a means to bring quantum (field) theories with fermionic degrees of freedom and supersymmetric quantum (field) theories into a form resembling standard quantum mechanics, which is of potential interest in several branches of quantum field theory and may shed new light on some technical and conceptual issues in quantum field theory. It is the aim of the present investigation to develop and study this notion more thoroughly.

This paper is organized as follows: in Sect. 2 we review some facts about Graßmann algebras; in Sect. 3 we define and discuss the notion of a Hilbert module over a Graßmann algebra; in Sect. 4 we define super Hilbert spaces and in Sect. 5 we study physical observables on super Hilbert spaces. In Sect. 6 we review the Schrödinger representation of spinor quantum field theory as a simple example for a situation where a super Hilbert space arises naturally as the state space in a quantum field theory. In Sect. 7 we briefly review definitions of the notion of super Hilbert space previously given by other authors and discuss the relation of these definitions with our approach.

2. Graßmann Algebras

The Graßmann algebra (or exterior algebra) $\Lambda_n$ with n generators is the algebra (over $\mathbb{C}$) generated by a set of n anticommuting generators $\{\xi_i\}_{i=1}^{n}$ and by $1 \in \mathbb{C}$, $\xi_i\xi_j = -\xi_j\xi_i$, for all i, j. The Graßmann algebra generated by a countably infinite set of generators will be denoted by $\Lambda_\infty$. In the sequel we shall write $\Lambda_n$, where $n \in \mathbb{N}\cup\{\infty\}$ is possibly infinite unless indicated otherwise. Let $M_n := \{(m_1,\cdots,m_k) \mid 1 \le k \le n,\ m_i \in \mathbb{N},\ 1 \le m_1 < \cdots < m_k \le n\}$. A special basis of $\Lambda_n$ is given by the set of elements of the form $\xi_{m_1}\xi_{m_2}\cdots\xi_{m_k}$, with $(m_1,\cdots,m_k) \in M_n$, together with the unit $1 \in \mathbb{C}$. Graßmann algebras are more fully discussed in, e.g., [3, 14]. We define an involution ∗ on $\Lambda_n$, i.e., a map $* : \Lambda_n \to \Lambda_n$ satisfying $(q^*)^* = q$, $(qp)^* = p^*q^*$ and $(\alpha q)^* = \alpha^*q^*$ for $q, p \in \Lambda_n$, $\alpha \in \mathbb{C}$, by setting $1^* := 1$, $\xi_i^* := \xi_i$ for all i and by extending ∗ to all of $\Lambda_n$ by linearity.
The Graßmann algebra carries a natural $\mathbb{Z}_2$-grading: $\Lambda_n = \Lambda_{n,0}\oplus\Lambda_{n,1}$, where $\Lambda_{n,0}$ consists of the even (commuting) elements in $\Lambda_n$ and $\Lambda_{n,1}$ consists of the odd (anticommuting) elements of $\Lambda_n$, i.e., for $a_r \in \Lambda_{n,r}$ and $a_s \in \Lambda_{n,s}$ we have $a_ra_s = (-1)^{rs}a_sa_r \in \Lambda_{n,r+s\ (\mathrm{mod}\ 2)}$. We also write $\deg(a_r) = r$ if $a_r \in \Lambda_{n,r}$ and call $\deg(a_r)$ the degree of $a_r$. Every element $q \in \Lambda_n$ can be uniquely decomposed as

$$q = q_B1 + q_S = q_B1 + \sum_{(m_1,\cdots,m_k)\in M_n}q_{m_1,\cdots,m_k}\,\xi_{m_1}\cdots\xi_{m_k}, \tag{1}$$

where $q_B, q_{m_1,\cdots,m_k} \in \mathbb{C}$. The complex number $q_B$ is called the body of q and the Graßmann number $q_S$ is called the soul of q. Notice that the body operation is linear and respects the algebra structure, i.e., $(q+p)_B = q_B + p_B$ and $(pq)_B = p_Bq_B$ for all $q, p \in \Lambda_n$. The Graßmann algebra can also be written as a direct sum $\Lambda_n = \oplus_{r=0}^{n}V_r$, where $V_r$ is the complex vector space spanned by the elements of the form $\xi_{m_1}\cdots\xi_{m_r}$, r fixed. Therefore any $q \in \Lambda_n$ can be uniquely decomposed as $q = \sum_{r=0}^{n}q_r$ with $q_r \in V_r$. Any choice of a basis of $V_1$ may serve as a possible choice of (possibly complex) generators of $\Lambda_n$.
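The algebraic rules just described (anticommuting generators, the monomial basis indexed by increasing tuples, and the body map) can be made concrete with a small sketch. The representation below, a dictionary from increasing index tuples to complex coefficients with the empty tuple standing for the unit, and all function names are illustrative choices of ours, not notation from the paper.

```python
def multiply_monomials(left, right):
    """Product of two basis monomials given as strictly increasing index tuples.

    Returns (sign, indices); the sign is 0 when a generator would repeat.
    """
    if set(left) & set(right):
        return 0, ()
    # Sorting the concatenation needs one transposition of generators
    # for every pair (i in left, j in right) with i > j.
    inversions = sum(1 for i in left for j in right if i > j)
    return (-1) ** inversions, tuple(sorted(left + right))


def multiply(p, q):
    """Product of two Grassmann numbers stored as {index tuple: coefficient}."""
    result = {}
    for mono_p, coeff_p in p.items():
        for mono_q, coeff_q in q.items():
            sign, mono = multiply_monomials(mono_p, mono_q)
            if sign:
                result[mono] = result.get(mono, 0) + sign * coeff_p * coeff_q
    return {mono: c for mono, c in result.items() if c != 0}


def body(q):
    """Coefficient of the unit, i.e. the body q_B."""
    return q.get((), 0)


xi1, xi2, xi3 = {(1,): 1}, {(2,): 1}, {(3,): 1}
print(multiply(xi1, xi2))  # xi_1 xi_2
print(multiply(xi2, xi1))  # the same monomial with the opposite sign
print(multiply(xi1, xi1))  # empty dictionary: xi_1 squared vanishes

p = {(): 2 + 1j, (1,): 3, (1, 2): -1}
q = {(): 0.5, (2,): 1j}
print(body(multiply(p, q)), body(p) * body(q))  # body is multiplicative
```

Only the product of the two unit components can contribute to the unit component of pq, which is why the body map respects multiplication.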

For n finite there is an isomorphism $\star : \Lambda_n \to \Lambda_n$ known as the Hodge star operator. Consider the ordered sequence $\{\xi_1,\cdots,\xi_n\}$ of all generators of $\Lambda_n$, then $\star$ is defined on the element $\xi_{i_1}\xi_{i_2}\cdots\xi_{i_d}$ by $\star[\xi_{i_1}\xi_{i_2}\cdots\xi_{i_d}] := \xi_{j_1}\xi_{j_2}\cdots\xi_{j_{n-d}}$, where $(j_1,\cdots,j_{n-d})$ is chosen such that $(i_1,\cdots,i_d,j_1,\cdots,j_{n-d})$ is an even permutation of $(1,\cdots,n)$. We extend $\star$ to all of $\Lambda_n$ by conjugate linearity, i.e., we require that $\star[\alpha q] := \alpha^*\star[q]$ for $\alpha \in \mathbb{C}$ and $q \in \Lambda_n$ and that $\star$ is a real linear transformation. It is well-known that the Hodge star operator is independent of the basis used to define it. Now expand every $q \in \Lambda_n$ with respect to the basis of $\Lambda_n$ as in Eq. (1). Then we can define for each 1 ≤ κ < ∞,

$$|q|_\kappa := \Big(|q_B|^\kappa + \sum_{(m_1,\cdots,m_k)\in M_n}|q_{m_1,\cdots,m_k}|^\kappa\Big)^{1/\kappa}. \tag{2}$$

Moreover, we define $|q|_\infty := \sup_{(m_1,\cdots,m_k)\in M_n}|q_{m_1,\cdots,m_k}|$. If n is finite, it is straightforward to verify that each $|\cdot|_\kappa$ defines a norm on $\Lambda_n$ and that $\Lambda_n$ becomes a complex Banach space with each of the norms $|\cdot|_\kappa$, 1 ≤ κ ≤ ∞, which we denote by $\Lambda_n(\kappa)$ respectively. In the case of $\Lambda_\infty$, $|\cdot|_\kappa$ defines a seminorm on $\Lambda_\infty$ and we denote the set of all $q \in \Lambda_\infty$ for which the above expression for $|q|_\kappa$ satisfies $|q|_\kappa < \infty$ by $\Lambda_\infty(\kappa)$. Again it is easy to see that $\Lambda_\infty(\kappa)$ with the norm $|\cdot|_\kappa$ is a Banach space for all 1 ≤ κ ≤ ∞. The norm $|\cdot|_1$ is sometimes also called the Rogers norm and $\Lambda_n(1)$ the Rogers algebra, see [10].

The norms $|\cdot|_\kappa$ in (2) depend implicitly on the choice of the set of generators of the Graßmann algebra and are not invariant under a change of the set of generators of $\Lambda_n$. Graßmann number-valued variables appearing in physical theories are by their very nature not directly observable and therefore in general there may be no physically preferred choice for the set of generators of a Graßmann algebra. However, for n finite, it is well-known that not only are all the norms in (2) equivalent, so that they generate the same topology on $\Lambda_n$ for all 1 ≤ κ ≤ ∞, but the resulting topology is in fact independent of the choice of generators of the Graßmann algebra (this is an immediate consequence of Proposition 1.2.16 in [5]). It is evident that the Hodge star operator is continuous in this topology.

There is a norm, invariant under change of generators, on $\Lambda_n$ which can be constructed as follows. Firstly, it is known that there is a norm $\|\cdot\|_r$ on $V_r$ given by, [3, 14],

$$\|q_r\|_r = \inf\sum_{(m_1,\cdots,m_r)\in M_n}|q_{m_1,\cdots,m_r}|, \tag{3}$$

for $q_r \in V_r$, where the infimum is taken over all possible choices of the set of generators of the Graßmann algebra. The norm $\|\cdot\|_r$ satisfies $\|q_rp_s\|_{r+s} \le \|q_r\|_r\|p_s\|_s$, for all $q_r \in V_r$ and $p_s \in V_s$, see [3, 14]. Now define a seminorm on $\Lambda_n$ by

$$\|q\| := \sum_{r=0}^{n}\|q_r\|_r. \tag{4}$$

For n finite it is obvious that $\|\cdot\|$ is a norm on $\Lambda_n$. This norm $\|\cdot\|$ is called the mass (norm) on $\Lambda_n$ (n finite). By construction the mass norm is independent of the choice of the set of generators of $\Lambda_n$. If n = ∞, then every finite subset $\{\xi_{i_1},\cdots,\xi_{i_m}\}\cup\{1\}$ of the set of all generators $\{\xi_i\}_i$ of $\Lambda_\infty$ generates an m-dimensional Graßmann subalgebra of $\Lambda_\infty$ denoted by $\Lambda_{i_1,\cdots,i_m}$.

The collection of all such Graßmann subalgebras of $\Lambda_\infty$ forms a directed set and the canonical imbedding morphisms obviously preserve the mass norm. We consider the algebraic direct limit $\Lambda_\infty$ of this directed set. The mass norm on the finite dimensional Graßmann subalgebras induces a mass norm $\|\cdot\|$ on $\Lambda_\infty$. We denote the completion of $\Lambda_\infty$ with respect to the mass norm by $\Lambda_\infty^m$. Obviously, $\Lambda_\infty^m$ consists of all $q \in \Lambda_\infty$ with $\|q\| = \sum_{r=0}^{\infty} \|q_r\|_r < \infty$. The norm on $\Lambda_\infty^m$ is again called the mass norm. It is easy to see that the mass norm is submultiplicative:

\[ \|pq\| = \sum_r \|(pq)_r\|_r \le \sum_r \sum_{k\le r} \|p_{r-k}\, q_k\|_r \le \sum_r \sum_{k\le r} \|p_{r-k}\|_{r-k}\, \|q_k\|_k \le \sum_r \sum_k \|p_r\|_r\, \|q_k\|_k = \|p\|\,\|q\| . \]

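Notice that there is a seminorm on $\Lambda_n$ which is trivially independent of the choice of the set of generators, namely

\[ \|q\|_B := |q_B| . \tag{5} \]

We have $\|q\|_B \le |q|_\kappa$ for all $q \in \Lambda_n(\kappa)$ and $\|q\|_B \le \|q\|$ for all $q \in \Lambda_\infty^m$. In the sequel it is understood that the symbol $\Lambda$ stands for either (a) $\Lambda_n$ with $n$ finite, (b) for $\Lambda_\infty^m$, or (c) for $\Lambda_\infty(\kappa)$ with $1 \le \kappa \le \infty$.

3. Hilbert Modules

Definition 3.1. A pre-Hilbert $\Lambda$ module is a $\mathbb{Z}_2$-graded right $\Lambda$ module $E = E_0 \oplus E_1$ equipped with a $\Lambda$-valued inner product $\langle\cdot,\cdot\rangle : E \times E \to \Lambda$ that is sesquilinear, definite, and whose body is Hermitian and positive. In other words:

1. $\langle x, y_1 + y_2\rangle = \langle x, y_1\rangle + \langle x, y_2\rangle$, and $\langle y_1 + y_2, x\rangle = \langle y_1, x\rangle + \langle y_2, x\rangle$ for $x, y_1, y_2 \in E$;
2. $\langle x, \alpha y\rangle = \alpha\langle x, y\rangle = \langle \alpha^* x, y\rangle$, for $x, y \in E$, $\alpha \in \mathbb{C}$;
3. $\langle x, y\rangle_B = \langle y, x\rangle_B^*$, for $x, y \in E$;
4. $\langle x, x\rangle_B \ge 0$ for $x \in E$ and $\langle x, x\rangle = 0$ if and only if $x = 0$.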

An element $x$ of a pre-Hilbert module $E = E_0 \oplus E_1$ is called even if $x \in E_0$ and odd if $x \in E_1$, respectively. Immediate consequences of Definition 3.1 are that every pre-Hilbert module is a complex vector space and that every element $x$ of a pre-Hilbert module $E$ can be uniquely written as a sum of an even and an odd element of $E$, i.e., $x = x_0 + x_1$, where $x_0 \in E_0$ and $x_1 \in E_1$.

Remark 3.2. The rationale behind the positivity requirement 4 in Definition 3.1 is to interpret the body of the inner product $\langle\cdot,\cdot\rangle$ as physical transition amplitude.

Remark 3.3. DeWitt [1] requires the inner product on a super Hilbert space to be sesquilinear with respect to Graßmann numbers, i.e., $\langle x, yq\rangle = \langle x, y\rangle q$, for all $x, y$ in the super Hilbert space and $q \in \Lambda_n$. We shall see below, however, that the inner product on the super Hilbert space arising in the functional Schrödinger representation of spinor quantum field theory does not satisfy this condition. Accordingly we have allowed for a more general notion of pre-Hilbert module.


We may now use a norm $\|\cdot\|$ defined on $\Lambda$ to define a norm $\|\cdot\|_E$ on a pre-Hilbert module $E$ by

\[ \|x\|_E^2 = \|\langle x, x\rangle\| . \tag{6} \]

For instance, if $\Lambda$ equals $\Lambda_n$ or $\Lambda_\infty(\kappa)$ endowed with the norm $|\cdot|_\kappa$, then the norm on $E$ is given by

\[ \|x\|_\kappa^2 := |\langle x, x\rangle|_\kappa , \tag{7} \]

for $x \in E$ and $1 \le \kappa \le \infty$. The norm

\[ \|x\|^2 := \|\langle x, x\rangle\| \tag{8} \]

corresponding to the mass norm on $\Lambda = \Lambda_n$ or $\Lambda = \Lambda_\infty^m$ in Eq. (4) is called the mass norm on the Hilbert module $E$. In the sequel it is understood that we consider only norms on a Hilbert module arising from a norm on $\Lambda$ as in Eq. (6) unless explicitly stated otherwise.

Lemma 3.4 (Cauchy–Schwarz inequality). If $E$ is a pre-Hilbert module and $x, y \in E$, then $|\langle x, y\rangle_B|^2 \le \langle x, x\rangle_B \langle y, y\rangle_B$.

Proof. Let $p_x := \langle x, x\rangle_B$, $p_y := \langle y, y\rangle_B$, $q := \langle x, y\rangle_B$ and $\lambda \in \mathbb{R}$, then

\[ 0 \le \langle x - y\lambda q^*, x - y\lambda q^*\rangle_B = p_x - 2\lambda q q^* + \lambda^2 q p_y q^* . \]

Adding $2\lambda q q^*$ on both sides and taking norms yield

\[ 2\lambda|q|^2 \le 2|\lambda||q|^2 \le |p_x + \lambda^2 q p_y q^*| \le |p_x| + \lambda^2 |q|^2 |p_y| . \tag{9} \]

This is equivalent to $\left(\lambda|q||p_y| - |q|\right)^2 \ge |q|^2 - |p_x||p_y|$. If $|p_y| \neq 0$, then setting $\lambda := 1/|p_y|$ yields the required inequality. Moreover, if $|p_x| \neq 0$ and $|p_y| = 0$, then Eq. (9) reads $2\lambda|q|^2 \le |p_x|$ for every real $\lambda$, which forces $|q| = 0$ (let $\lambda$ become arbitrarily large). From symmetry considerations (or from Eq. (9)) we also get that $|p_y| \neq 0$ and $|p_x| = 0$ implies $|q| = 0$. In the case that $|p_x| = |p_y| = 0$ we infer from Eq. (9) by taking $\lambda$ to be positive that $|q| = 0$.

On any pre-Hilbert module $E$ there is a body operation, i.e., a linear map $B : E \to E_0$, $x \mapsto x_B$ such that $(x\lambda)_B = x_B\lambda_B$ for all $\lambda \in \Lambda$ [9]. First define the soul $s(E)$ and the body $b(E)$ of $E$ by

\[ s(E) := \{x \in E \mid x\lambda = 0 \text{ for some } \lambda \in \Lambda,\ \lambda \neq 0\}, \qquad b(E) := E/s(E). \]

The body operation $B : E \to E_0$ is the canonical surjection from $E$ to $b(E)$. If the inner product of $E$ satisfies $\langle x_B, y_B\rangle = \langle x, y\rangle_B$, then the body of $E$ endowed with the induced inner product is a pre-Hilbert space whose completion is a Hilbert space (by virtue of the Cauchy–Schwarz inequality). But even if the inner product does not respect the body operation, we can prove


Proposition 3.5. Let $E$ be a pre-Hilbert module. Then there exists a map $x \mapsto [x]$ from $E$ into a dense subspace of a Hilbert space $H$ such that $\langle [x], [y]\rangle_H = \langle x, y\rangle_B$, for all $x, y \in E$, where $\langle\cdot,\cdot\rangle_H$ denotes the inner product on $H$.

Proof. Let $N := \{x \in E \mid \langle x, x\rangle_B = 0\}$. Let $[x] := x + N$. Then $\langle\cdot,\cdot\rangle_B$ induces a well-defined inner product on $E/N$ by virtue of Lemma 3.4. Therefore $E/N$ with this inner product is a pre-Hilbert space.

Definition 3.6. Let $E$ be a pre-Hilbert module and $\|\cdot\|$ a norm on $E$, then $E$ is said to be a Hilbert module if $E$ is complete with respect to its norm. A Hilbert submodule of a Hilbert module $E$ is a closed submodule of $E$.

Definition 3.7. Let $E$ and $F$ be Hilbert modules. A $\mathbb{C}$-linear map $O : E \to E$ is called an operator on $E$. We denote the set of all bounded operators on $E$ by $L(E)$. An operator $T : E \to E$ is called unitary if $\langle T(x), T(y)\rangle = \langle x, y\rangle$ for all $x, y \in E$. An operator $S$ is called weakly unitary if $\langle S(x), S(y)\rangle_B = \langle x, y\rangle_B$ for all $x, y \in E$. A (Hilbert) module map is a linear map $T : E \to F$ which respects the module action: $T(xq) = T(x)q$, for $x \in E$, $q \in \Lambda$.

Definition 3.8. A Hilbert module $E$ is said to satisfy the strong definiteness condition if $\langle x, x\rangle_B = 0$ implies $x = 0$ for all $x \in E$.

Every Hilbert module $E$ satisfying the strong definiteness condition becomes a pre-Hilbert space with respect to the norm $\|\cdot\|_B^2 := \langle\cdot,\cdot\rangle_B$. Every Hilbert module $E$ is endowed with a $\mathbb{Z}_2$-grading $E = E_0 \oplus E_1$. This induces a $\mathbb{Z}_2$-grading on $L(E)$: every operator $T : E \to E$ can be written as the sum of an even map $T_0 : E_i \to E_i$ and an odd map $T_1 : E_i \to E_{i+1 \,(\mathrm{mod}\, 2)}$, i.e., $T = T_0 + T_1$, where $T_0$ and $T_1$ are defined by $T_0 u := (Tu_0)_0 + (Tu_1)_1$ and $T_1 u := (Tu_0)_1 + (Tu_1)_0$ respectively, where $u = u_0 + u_1$.

Definition 3.9. Let $E$ be a Hilbert module. An operator $T : E \to E$ is said to be adjointable if there exists an operator $T^* : E \to E$ satisfying $\langle x, Ty\rangle = \langle T^* x, y\rangle$ for all $x, y \in E$. Such an operator $T^*$ is called an adjoint of $T$. We denote the set of all adjointable operators on $E$ by $B(E)$. An adjointable operator $T \in B(E)$ is called self-adjoint if $T^* = T$. An operator $T : E \to E$ is said to be weakly adjointable if there exists an operator $T^\dagger : E \to E$ satisfying $\langle x, Ty\rangle_B = \langle T^\dagger x, y\rangle_B$ for all $x, y \in E$. Such an operator $T^\dagger$ is called a weak adjoint of $T$. We denote the set of all weakly adjointable operators on $E$ by $B_w(E)$. A weakly adjointable operator $T \in B_w(E)$ is called weakly self-adjoint if $T^\dagger = T$.

Remark 3.10. Obviously, any adjointable operator is also weakly adjointable. Thus, $B(E) \subset B_w(E)$. We have noticed above in Remark 3.2 that the body of the inner product on a Hilbert module is interpreted as the physical transition amplitude. Accordingly we also expect that the set $B_w(E)$ plays a distinguished role and that the operators representing physical observables or physical operations will be elements of $B_w(E)$.

The following lemma can be proven in analogy to the corresponding result for Hilbert $C^*$-modules, see [13].


Lemma 3.11. (a) Let $E$ be a Hilbert module and $T : E \to E$ be an adjointable operator. The adjoint $T^*$ of $T$ is unique. If both $T : E \to E$ and $S : E \to E$ are adjointable operators, then $ST$ is adjointable and $(ST)^* = T^* S^*$. (b) Let $E$ be a Hilbert module satisfying the strong definiteness condition and $T_w : E \to E$ be a weakly adjointable operator. Then the weak adjoint $T_w^\dagger$ of $T_w$ is unique. If both $T_w : E \to E$ and $S_w : E \to E$ are weakly adjointable operators, then $S_w T_w$ is weakly adjointable and $(S_w T_w)^\dagger = T_w^\dagger S_w^\dagger$.

Proof. (a) Assume that $T'$ and $T^*$ are both adjoints of $T$, then $0 = \langle T' x, y\rangle - \langle T^* x, y\rangle = \langle (T' - T^*)x, y\rangle$, for all $x, y \in E$. Let $y = (T' - T^*)x$. By the definiteness of the inner product this implies $T' = T^*$. A similar argument proves (b).

4. Super Hilbert Spaces

The Definitions 3.7 and 3.9 are analogous to parallel definitions in the theory of Hilbert $C^*$-modules [13, 8]. However, the positivity requirement in the definition of a Hilbert module is weaker than the positivity requirement for Hilbert $C^*$-modules and all results for Hilbert $C^*$-modules depending on the positivity of the inner product may in general not be valid for a Hilbert module. The Cauchy–Schwarz inequality in Lemma 3.4 is a first example. As a consequence of the failure of the general Cauchy–Schwarz inequality the inner product on a pre-Hilbert module may in general not be continuous in each argument and therefore in general an inner product on a pre-Hilbert module does not extend to an inner product on its completion. In the sequel we shall be mainly interested in inner products which are continuous.

Definition 4.1. We shall call a (pre-) Hilbert module $\mathcal{H}$ a super (pre-) Hilbert space if the inner product on $\mathcal{H}$ is continuous, i.e., if there exists a constant $C > 0$ such that $\|\langle x, y\rangle\| \le C \|x\| \|y\|$.

Remark 4.2. The completion of a super pre-Hilbert space is a super Hilbert space.

Remark 4.3. All concrete examples for super Hilbert spaces we shall discuss below will satisfy the strong definiteness condition.
An example of a situation where a super Hilbert space without the strong definiteness condition arises is the Becchi-Rouet-Stora-Tyutin (BRST) formulation of gauge theories. The natural choice of the state space arising in the BRST formulation of gauge theories is a Hilbert module (as there is always a representation of the Graßmann algebra acting on the state space) endowed with an indefinite inner product. The physical states are those annihilated by the BRST operator. The inner product induced on the set $V_{\mathrm{phys}}$ of all physical states can be shown to be positive but not definite. The states in $V_{\mathrm{phys}}$ with probability zero are called ghost states and are unobservable. Therefore in the BRST formulation of gauge theories we naturally arrive at a physical state space which carries the structure of a Hilbert module or a super Hilbert space not satisfying the strong definiteness condition. A good introduction to the BRST formalism can be found, e.g., in [7].

We have already noticed above that the physical transition amplitudes are given by the body of the inner product of a Hilbert module. This gives rise to the following definition.


Definition 4.4. Let $\mathcal{H}$ be a super Hilbert space. An element $x \in \mathcal{H}$ is called physical if $\langle x, x\rangle_B \neq 0$. An element $g \in \mathcal{H}$ with $g \neq 0$ and $\langle g, g\rangle_B = 0$ is called a ghost.

Example 4.5. Let $n$ be finite. The Graßmann algebra $\Lambda_n$ endowed with the mass norm $\|\cdot\|$ becomes a super Hilbert space with the inner product $\langle\cdot,\cdot\rangle$ given by

\[ \langle p, q\rangle := \star[p \star[q]] \tag{10} \]

for all $p, q \in \Lambda_n$, where $\star$ denotes the Hodge star operator. The submultiplicativity of the mass norm implies $\|\langle p, q\rangle\| \le \|p\| \|q\|$ for all $p, q \in \Lambda_n$. Recalling Eq. (1),

\[ q = q_B 1 + \sum_{(m_1,\dots,m_k)\in M_n} q_{m_1,\dots,m_k}\, \xi_{m_1} \cdots \xi_{m_k} , \]

we see that

\[ \langle q, q\rangle_B = |q_B|^2 + \sum_{(m_1,\dots,m_k)\in M_n} |q_{m_1,\dots,m_k}|^2 . \]

Therefore $\Lambda_n$ with the inner product (10) satisfies the strong definiteness condition. More general super Hilbert spaces can be constructed by building the tensor product $\Lambda_n \otimes H$ of $\Lambda_n$ with a complex Hilbert space $H$. The inner product of $\Lambda_n \otimes H$ is given on simple tensors by $\langle p \otimes \varphi, q \otimes \psi\rangle = \langle p, q\rangle \langle \varphi, \psi\rangle$, for $p, q \in \Lambda_n$ and $\varphi, \psi \in H$, and extended to arbitrary elements of $\Lambda_n \otimes H$ by linearity and continuity. We omit the details of the construction as a more general example will be given below in Example 4.9.

Example 4.6. Consider a measure space $(X, \Sigma)$, where $X$ is a set and $\Sigma$ a $\sigma$-algebra of subsets of $X$, endowed with a $\sigma$-finite measure $\mu$. Every function $f : X \to \Lambda_n$ can be expanded as

\[ f(x) = f_B(x) + \sum_{(m_1,\dots,m_k)\in M_n} f_{m_1,\dots,m_k}(x)\, \xi_{m_1} \cdots \xi_{m_k} , \]

with complex-valued functions $f_B : X \to \mathbb{C}$ and $f_{m_1,\dots,m_k} : X \to \mathbb{C}$. We restrict ourselves here to the case that $n$ is finite. Now consider the set $E$ of all functions $f : X \to \Lambda_n$ such that $f_B$ and all $f_{m_1,\dots,m_k}$ are square integrable with respect to $\mu$. This requirement is independent of the basis chosen. We define a $\Lambda_n$-valued inner product on $E$ by

\[ \langle f, g\rangle = \int f(x)^* g(x)\, d\mu(x), \tag{11} \]

for all $f, g \in E$. If $\Lambda_n$ is furnished with the Rogers norm $|\cdot|_1$, then define

\[ \|f\| := \sum_{(m_1,\dots,m_r)\in M_n^0} \left( \int |f_{m_1,\dots,m_r}(x)|^2\, d\mu(x) \right)^{1/2} , \tag{12} \]

where we introduced the notation $M_n^0 := \{(m_1,\dots,m_k) \mid 0 \le k \le n,\ m_i \in \mathbb{N},\ 1 \le m_1 < \cdots < m_k \le n\}$ and $f_\emptyset := f_B$. Further let $N := \{f \in E \mid \|f\| = 0\}$. It is easy to see that Eq. (12) defines a norm on $E/N$ and that $E/N$ equipped with the norm (12) becomes a super Hilbert space. Indeed, let $f, g \in E$, then

\[ \begin{aligned} |\langle f, g\rangle|_1 &= \sum_{(m_1,\dots,m_r)\in M_n^0} \left| \sum_{k=0}^{r} \sum_{\sigma} (-1)^{\mathrm{sgn}(\sigma)} \int f^*_{\sigma(m_1),\dots,\sigma(m_k)}\, g_{\sigma(m_{k+1}),\dots,\sigma(m_r)}\, d\mu \right| \\ &\le \sum_{(m_1,\dots,m_r)\in M_n^0} \sum_{k=0}^{r} \sum_{\sigma} \int |f_{\sigma(m_1),\dots,\sigma(m_k)}|\, |g_{\sigma(m_{k+1}),\dots,\sigma(m_r)}|\, d\mu \\ &\le \sum_{(m_1,\dots,m_r)\in M_n^0} \sum_{k=0}^{r} \sum_{\sigma} \left( \int |f_{\sigma(m_1),\dots,\sigma(m_k)}|^2\, d\mu \right)^{1/2} \left( \int |g_{\sigma(m_{k+1}),\dots,\sigma(m_r)}|^2\, d\mu \right)^{1/2} \\ &\le \|f\|\, \|g\| , \end{aligned} \]

where the sum over $\sigma$ in the first three lines runs over all permutations $\sigma$ of $(m_1,\dots,m_r)$ such that $(\sigma(m_1),\dots,\sigma(m_k)) \in M_n^0$ and $(\sigma(m_{k+1}),\dots,\sigma(m_r)) \in M_n^0$. If we replace (11) by $\langle f, g\rangle = \int \star[f(x)]\, g(x)\, d\mu(x)$, a similar argument holds.

Example 4.7. For $n$ infinite we also can make $\Lambda_\infty^m$ a super Hilbert space by defining an appropriate inner product. For simplicity we assume that the set of all generators is countable, $\{\xi_i\}_{i\in\mathbb{N}}$. The generalization of the following to the situation where the set of generators is uncountable is obvious. First of all we observe that the inner product (10) is not well-defined as the Hodge star operator is not defined on $\Lambda_\infty^m$. This difficulty can be overcome by suitably imbedding $\Lambda_\infty^m$ into the direct sum $\Lambda_\infty^m \oplus \Lambda_\infty^m$ of two copies of $\Lambda_\infty^m$. The basic idea is to introduce the formal infinite product of all generators $\xi_\infty \equiv \prod_i \xi_i$. We do not make any attempt to give a precise meaning to this infinite product of Graßmann numbers and just introduce $\xi_\infty$ as an auxiliary object which has certain properties we would expect from the product of all generators of the Graßmann algebra. Namely, we require that $q\xi_\infty = q_B \xi_\infty$ for all $q \in \Lambda_\infty^m$. Analogously we define cofinite products of the generators of the Graßmann algebra, i.e., infinite products obtained from $\xi_\infty$ by removing at most finitely many terms in the product. E.g., the infinite product $\prod_{i\neq 1} \xi_i$ of all generators except $\xi_1$ is denoted by $\hat\xi_1 \equiv \frac{\partial}{\partial\xi_1} \xi_\infty$. We require

\[ \frac{\partial}{\partial\xi_i} \frac{\partial}{\partial\xi_j} = - \frac{\partial}{\partial\xi_j} \frac{\partial}{\partial\xi_i} \]

and $\xi_i \hat\xi_i = \xi_\infty$ and $\xi_i \frac{\partial}{\partial\xi_j} = - \frac{\partial}{\partial\xi_j} \xi_i$, for all $i \neq j$. Moreover we require $\xi_\infty$ to be even. Therefore the algebra generated by the $\frac{\partial}{\partial\xi_i}$ and 1 is isomorphic to $\Lambda_\infty^m$. Now we are able to define the action of the Hodge star operator on $\Lambda_\infty^m$ by setting

\[ \star[q] \equiv q_B^*\, \xi_\infty + \sum_{(m_1,\dots,m_k)\in M_\infty} q^*_{m_1,\dots,m_k}\, \frac{\partial}{\partial\xi_{m_k}} \cdots \frac{\partial}{\partial\xi_{m_1}}\, \xi_\infty , \tag{13} \]

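for all $q \in \Lambda_\infty^m$. Moreover, we require $\star[\star[q]] = q$, for all $q$. The algebra generated by the $\frac{\partial}{\partial\xi_i}$ is isomorphic to $\Lambda_\infty^m$ with the isomorphism given by the Hodge star operator (13). The inner product $\langle p, q\rangle = \star[p \star[q]]$, for all $p, q \in \Lambda_\infty^m$, is now well-defined. Notice that although $\star[q] \notin \Lambda_\infty^m$ for all $q \in \Lambda_\infty^m$, the inner product satisfies $\langle p, q\rangle \in \Lambda_\infty^m$ if $p, q \in \Lambda_\infty^m$. Since, by virtue of the properties of the mass norm, we also have $\|\langle p, q\rangle\| \le \|p\| \|q\|$ for all $p, q \in \Lambda_\infty^m$ and since

\[ \langle q, q\rangle_B = |q_B|^2 + \sum_{(m_1,\dots,m_r)\in M_\infty} |q_{m_1,\dots,m_r}|^2 , \]

we see that $\Lambda_\infty^m$ with the inner product (10) is a super Hilbert space satisfying the strong definiteness condition.

Example 4.8. The space $\star[\Lambda_\infty^m]$ can be made a super Hilbert space (over $\Lambda_\infty^m$) by setting $\langle p, q\rangle = \star[p]\, q$, for all $p, q \in \star[\Lambda_\infty^m]$ (when we identify $\xi_\infty$ formally with $1 \in \mathbb{C}$). Obviously this space satisfies the strong definiteness condition.

Example 4.9. We are now going to construct the tensor product of two super Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$. We denote the inner products on $\mathcal{H}_1$ and $\mathcal{H}_2$ by $\langle\cdot,\cdot\rangle_1$ and $\langle\cdot,\cdot\rangle_2$ respectively, and the norms on $\mathcal{H}_1$ and $\mathcal{H}_2$ are denoted by $\|\cdot\|_1$ and $\|\cdot\|_2$ respectively. The algebraic tensor product $\mathcal{H}_1 \otimes_{\mathrm{alg}} \mathcal{H}_2$ of $\mathcal{H}_1$ and $\mathcal{H}_2$ is defined as usual as the set of all finite sums of the form $\sum_i p_i \otimes q_i$ with $p_i \in \mathcal{H}_1$ and $q_i \in \mathcal{H}_2$. We define a function $\mu$ on $\mathcal{H}_1 \otimes_{\mathrm{alg}} \mathcal{H}_2$ by

\[ \mu(t) := \inf \left\{ \sum_i \|p_i\|_1 \|q_i\|_2 \;\middle|\; t = \sum_i p_i \otimes q_i \right\} . \tag{14} \]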

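$\mu$ is a cross norm on $\mathcal{H}_1 \otimes_{\mathrm{alg}} \mathcal{H}_2$ and the completion of $\mathcal{H}_1 \otimes_{\mathrm{alg}} \mathcal{H}_2$ with respect to $\mu$ is a Banach algebra which we denote by $\mathcal{H}_1 \otimes_\mu \mathcal{H}_2$ (for a proof, see, e.g., Proposition T.3.6 in [13]). The inner products on $\mathcal{H}_1$ and $\mathcal{H}_2$ induce an inner product on $\mathcal{H}_1 \otimes_{\mathrm{alg}} \mathcal{H}_2$ given by

\[ \langle a, b\rangle = \sum_{i,j} \langle p_i, t_j\rangle_1 \otimes \langle q_i, s_j\rangle_2 \]

if $a = \sum_i p_i \otimes q_i$ and $b = \sum_j t_j \otimes s_j$. As

\[ \begin{aligned} \mu(\langle a, b\rangle) &= \inf \left\{ \sum_l \|c_l\|_1 \|d_l\|_2 \;\middle|\; \langle a, b\rangle = \sum_l c_l \otimes d_l \right\} \\ &\le \mathrm{restr.}\inf \sum_{i,j} \|\langle p_i, t_j\rangle_1\|_1\, \|\langle q_i, s_j\rangle_2\|_2 \\ &\le \mathrm{restr.}\inf \sum_{i,j} \|p_i\|_1 \|q_i\|_2 \|t_j\|_1 \|s_j\|_2 \\ &= \mathrm{restr.}\inf \left( \sum_i \|p_i\|_1 \|q_i\|_2 \right) \left( \sum_j \|t_j\|_1 \|s_j\|_2 \right) \\ &= \mu(a)\mu(b), \end{aligned} \]

where the infimum in the first line runs over all possible decompositions of $\langle a, b\rangle$ as sums over elementary tensors, whereas the “restricted infima” in the following three lines run over all decompositions of $a$ and $b$ into sums of elementary tensors. Consequently the inner product $\langle\cdot,\cdot\rangle_\mu$ on $\mathcal{H}_1 \otimes_{\mathrm{alg}} \mathcal{H}_2$ is continuous and can be extended to the completion $\mathcal{H}_1 \otimes_\mu \mathcal{H}_2$ of $\mathcal{H}_1 \otimes_{\mathrm{alg}} \mathcal{H}_2$. We denote this extension also by $\langle\cdot,\cdot\rangle_\mu$. Therefore $\mathcal{H}_1 \otimes_\mu \mathcal{H}_2$ is a super Hilbert space when endowed with the norm $\mu$.

When both $\mathcal{H}_1$ and $\mathcal{H}_2$ satisfy the strong definiteness condition, both $\mathcal{H}_1$ and $\mathcal{H}_2$ are pre-Hilbert spaces with respect to the body of their inner products. Therefore also the body $\langle\cdot,\cdot\rangle_{\mu B}$ of $\langle\cdot,\cdot\rangle_\mu$ is a complex-valued scalar product on $\mathcal{H}_1 \otimes_{\mathrm{alg}} \mathcal{H}_2$ and, by virtue of the Cauchy–Schwarz inequality, $\langle\cdot,\cdot\rangle_{\mu B}$ can be extended to a complex-valued scalar product $\langle\cdot,\cdot\rangle_{\mu B}$ on $\mathcal{H}_1 \otimes_\mu \mathcal{H}_2$. $\langle\cdot,\cdot\rangle_{\mu B}$ obviously coincides with the body of the extension of $\langle\cdot,\cdot\rangle_\mu$ to $\mathcal{H}_1 \otimes_\mu \mathcal{H}_2$. Therefore we conclude that $\mathcal{H}_1 \otimes_\mu \mathcal{H}_2$ is a super Hilbert space satisfying the strong definiteness condition.

In Sect. 6 below we shall be interested in the case $\mathcal{H}_1 = \Lambda_\infty^m$ and $\mathcal{H}_2 = \Lambda_\infty^m$. The norm $\mu_m$ arising from the mass norms on $\Lambda_\infty^m$ and $\Lambda_\infty^m$ via Eq. (14) is called the mass norm on $\Lambda_\infty^m \otimes_{\mu_m} \Lambda_\infty^m$. It follows from our discussion above that $\Lambda_\infty^m \otimes_{\mu_m} \Lambda_\infty^m$ is a super Hilbert space satisfying the strong definiteness condition. We shall see in Sect. 6 that in the functional Schrödinger representation of spinor quantum field theory the super Hilbert space $\Lambda_\infty^m \otimes_{\mu_m} \Lambda_\infty^m$ arises naturally as the quantum theoretical state space.

Proposition 4.10. Let $\mathcal{H}$ be a super Hilbert space and $T : \mathcal{H} \to \mathcal{H}$ be an adjointable operator. Then $T$ and $T^*$ are bounded with respect to the operator norm

\[ \|T\| := \sup\{ \|Tx\| \mid \|x\| \le 1 \} . \tag{15} \]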

If Eq. (6) holds, then $\|T\| = \|T^*\|$.

Proof. Let $x_\lambda, x, y \in \mathcal{H}$, such that $x_\lambda \to x$ and $Tx_\lambda \to y$. The inner product of a super Hilbert space is separately continuous in each variable. Thus

\[ 0 = \langle T^* e, x_\lambda\rangle - \langle T^* e, x_\lambda\rangle = \langle e, Tx_\lambda\rangle - \langle T^* e, x_\lambda\rangle \to \langle e, y\rangle - \langle T^* e, x\rangle = \langle e, y - Tx\rangle , \]

for all $e \in \mathcal{H}$. Putting $e = y - Tx$ implies $y = Tx$. The boundedness of $T$ and $T^*$ follows now from the closed graph theorem. As $\|Tx\|^2 = \|\langle T^* T x, x\rangle\| \le \|T^* T x\| \|x\| \le \|T^*\| \|T\| \|x\|^2$, we find $\|T\| \le \|T^*\|$. But then also $\|T^*\| \le \|T^{**}\| = \|T\|$.

A similar argument proves

Proposition 4.11. Let $\mathcal{H}$ be a super Hilbert space satisfying the strong definiteness condition and $T : \mathcal{H} \to \mathcal{H}$ be a weakly adjointable operator. Then $T$ and $T^\dagger$ are bounded with respect to the operator norm in Eq. (15) and with respect to the norm

\[ \|T\|_w := \sup\{ |\langle Tx, Tx\rangle_B|^{1/2} \mid \|x\| \le 1 \} \tag{16} \]

and $\|T\|_w = \|T^\dagger\|_w$.

Proof. The boundedness of $T$ and $T^\dagger$ with respect to the norm in Eq. (15) follows as in the proof of Proposition 4.10. The boundedness with respect to $\|\cdot\|_w$ follows from $\|q\|_B \le \|q\|$ for all $q \in \Lambda$.

Proposition 4.12. Let $\mathcal{H}$ be a super Hilbert space. When equipped with the operator norm (15), $B(\mathcal{H})$ is an involutive Banach algebra with continuous involution.


Proof. It is easy to see that (15) defines a norm on $B(\mathcal{H})$. The operator norm is clearly submultiplicative. It remains to show that $B(\mathcal{H})$ is norm complete. If $(T_n)_{n\in\mathbb{N}}$ is a Cauchy sequence of adjointable operators, then $(T_n x)_{n\in\mathbb{N}}$ and $(T_n^* x)_{n\in\mathbb{N}}$ are Cauchy sequences in $\mathcal{H}$ for every $x \in \mathcal{H}$. We call the limits $Tx$ and $T'x$ respectively. Since $\langle y, Tx\rangle = \lim \langle y, T_n x\rangle = \lim \langle T_n^* y, x\rangle = \langle T' y, x\rangle$, we see that $T$ is adjointable and $T^* = T'$. This shows that $B(\mathcal{H})$ is norm complete. From $\|T_n - T\| = \|T_n^* - T^*\|$ it is easy to see that the involution is continuous.

5. Physical Observables

Definition 5.1. Let $E$ be a Hilbert module and $T \in B_w(E)$. Then we say that a Graßmann number $\lambda$ is a spectral value for $T$ when $T - \lambda I$ does not have a two-sided inverse in $B_w(E)$. The set of spectral values for $T$ is called the spectrum of $T$ and is denoted by $\mathrm{sp}(T)$. The subset $\mathrm{sp}_{\mathbb{C}}(T) := \mathrm{sp}(T) \cap \mathbb{C}$ is called the complex spectrum of $T$.

It is well-known that a Graßmann number $q \in \Lambda_n$, $n$ finite, has an inverse if and only if its body $q_B$ is nonvanishing [1]. Therefore the following proposition, which states that the spectrum of a bounded module map $T$ on a Hilbert $\Lambda_n$ module, $n$ finite, is fully determined by the complex spectrum of $T$, is not surprising.

Proposition 5.2. Let $E$ be a Hilbert $\Lambda_n$ module, $n$ finite, and $T \in B_w(E)$ be a Hilbert module map. Then $\lambda \in \mathrm{sp}(T)$ if and only if $\lambda_B \in \mathrm{sp}_{\mathbb{C}}(T)$.

Proof. Let $\lambda \notin \mathrm{sp}(T)$. Then $T - \lambda I$ has a two-sided inverse in $B_w(E)$, denoted by $T_\lambda^{-1}$. Evidently $T_\lambda^{-1}$ is a module map. Now let $s$ be a Graßmann number with vanishing body. Then $T_{\lambda-s,L}^{-1} := \sum_{n=0}^{\infty} (-T_\lambda^{-1} s)^n\, T_\lambda^{-1}$ is a left inverse for $T - (\lambda - s)I$ and $T_{\lambda-s,R}^{-1} := T_\lambda^{-1} \sum_{n=0}^{\infty} (-s T_\lambda^{-1})^n$ is a right inverse for $T - (\lambda - s)I$. Both sums are actually finite. This follows from the bodylessness of $s$ and from the fact that $T_\lambda^{-1}$ is decomposable into an even and an odd part: $T_\lambda^{-1} = T_{\lambda,0}^{-1} + T_{\lambda,1}^{-1}$. Therefore the left and right inverse exist for all $s \in \Lambda_n$ with $s_B = 0$. As $T_{\lambda-s,L}^{-1} \left(T - (\lambda - s)I\right) T_{\lambda-s,R}^{-1} = T_{\lambda-s,R}^{-1} = T_{\lambda-s,L}^{-1}$, the left and right inverse coincide. This proves that $\lambda \notin \mathrm{sp}(T)$ implies $\lambda - s \notin \mathrm{sp}(T)$ for all $s \in \Lambda_n$ with $s_B = 0$.

Example 5.3. Consider $\Lambda_n$ endowed with the inner product (10). Let $\xi_1, \dots, \xi_n$ denote the set of generators of $\Lambda_n$. Consider the module map $\hat\xi_1 : \Lambda_n \to \Lambda_n$, $\hat\xi_1 q := \xi_1 q$. Obviously 0 is the only complex spectral value of $\hat\xi_1$ and, as $\hat\xi_1 - sI$ does not have an inverse for all bodyless $s \in \Lambda_n$, all Graßmann numbers with vanishing body are spectral values for $\hat\xi_1$. The element $\xi_1 \cdots \xi_n \in \Lambda_n$ is an “Eigenstate” for $\hat\xi_1$ for any bodyless spectral value: $\hat\xi_1\, \xi_1 \cdots \xi_n = s\, \xi_1 \cdots \xi_n = 0$, for all $s \in \Lambda_n$ with $s_B = 0$.

Definition 5.4. Let $\mathcal{H}$ be a super Hilbert space. A physical observable on $\mathcal{H}$ is a weakly self-adjoint operator $O : \mathcal{H} \to \mathcal{H}$.

Proposition 5.5. Let $\mathcal{H}$ be a super Hilbert space and let $H$ be the Hilbert space from Proposition 3.5. Then there exists a $*$-homomorphism $\varphi$ from $B_w(\mathcal{H}) \cap L(\mathcal{H})$ (equipped with the norm $\|\cdot\|_w$) into the $C^*$-algebra $B(H)$ of bounded operators on $H$.


Proof. Let $N := \{x \in \mathcal{H} \mid \langle x, x\rangle_B = 0\}$ and let $T \in B_w(\mathcal{H})$. For $n \in N$ we have by virtue of Lemma 3.4: $|\langle Tn, Tn\rangle_B|^2 \le \langle T^\dagger T n, T^\dagger T n\rangle_B \langle n, n\rangle_B = 0$. Thus $T(N) \subset N$. This shows that every $T \in B_w(\mathcal{H})$ induces a bounded linear operator on $\mathcal{H}/N$ which we denote by $\phi(T)$ via $\phi(T)(x + N) := T(x) + N$. The operator $\phi(T)$ can be uniquely extended to a bounded linear operator $\varphi(T)$ on $H$ (compare, e.g., Theorem 1.5.7 in [5]). Obviously, the correspondence $\varphi$ is linear, multiplicative and satisfies $\varphi(T^\dagger) = \varphi(T)^*$ and $\varphi(I) = I|_H$, i.e., $\varphi$ is a $*$-homomorphism.

Proposition 5.6. Let $\mathcal{H}$ be a super Hilbert space satisfying the strong definiteness condition. Then the $*$-homomorphism $\varphi$ from Proposition 5.5 is an isometric isomorphism from $B_w(\mathcal{H})$ to the $C^*$-algebra $B(H)$. Hence $B_w(\mathcal{H})$ is a $C^*$-algebra with norm $\|T\|_w := \sup\{ |\langle Tx, Tx\rangle_B|^{1/2} \mid \|x\| \le 1 \}$.

Proof. This follows, e.g., from Theorem 1.5.7 in [5].
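The invertibility criterion quoted before Proposition 5.2 (a Graßmann number in $\Lambda_n$, $n$ finite, is invertible if and only if its body is nonvanishing) and the finite geometric sums used in the proof of that proposition can be illustrated with a small Python sketch. It is not taken from the paper: Graßmann numbers are stored as dictionaries from sorted generator-index tuples to coefficients, and the helper names `gmul`, `gadd` and `ginv` are hypothetical. Writing $q = q_B(1 - t)$ with $t := -(q - q_B)/q_B$ bodyless, the inverse is $q_B^{-1} \sum_{k=0}^{n} t^k$, and the sum terminates because any product of more than $n$ bodyless monomials repeats a generator.

```python
def gmul(p, q):
    """Product of Grassmann numbers stored as {sorted_index_tuple: coeff}."""
    out = {}
    for a, ca in p.items():
        for b, cb in q.items():
            if set(a) & set(b):
                continue  # a repeated generator gives zero
            merged = list(a) + list(b)
            inversions = sum(
                1
                for i in range(len(merged))
                for j in range(i + 1, len(merged))
                if merged[i] > merged[j]
            )
            m = tuple(sorted(merged))
            out[m] = out.get(m, 0) + (-1) ** inversions * ca * cb
    return out


def gadd(p, q):
    """Sum of two Grassmann numbers."""
    out = dict(p)
    for m, c in q.items():
        out[m] = out.get(m, 0) + c
    return out


def ginv(q, n):
    """Inverse of q in the algebra with n generators via a finite geometric sum.

    Raises ZeroDivisionError when the body of q vanishes, in which case
    no inverse exists.
    """
    body = q.get((), 0)
    if body == 0:
        raise ZeroDivisionError("Grassmann number with vanishing body")
    # t = -(soul of q) / body, so that q = body * (1 - t).
    t = {m: -c / body for m, c in q.items() if m != ()}
    term = {(): 1.0}
    total = {(): 1.0}
    for _ in range(n):  # t**k vanishes for k > n
        term = gmul(term, t)
        total = gadd(total, term)
    return {m: c / body for m, c in total.items()}


# Sample element (hypothetical data): q = 2 + xi_1 xi_2 + xi_3 xi_4 in Lambda_4.
q = {(): 2.0, (1, 2): 1.0, (3, 4): 1.0}
print(gmul(q, ginv(q, 4)))
```

For a bodyless element such as `{(1,): 1.0}` the function raises, which mirrors the observation in Example 5.3 that $\hat\xi_1 - sI$ has no inverse for bodyless $s$.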

Remark 5.7. Proposition 5.2 and Example 5.3 show that in general it is meaningless to attribute physical relevance to the soul of a spectral value of a physical observable. Instead it becomes clear that in general only the elements of the complex spectrum of a physical observable may admit an interpretation as possible physical values of the observable. This is in accordance with and further substantiated by the fact that Graßmann numbers cannot be measured.

Above we have argued that the physical transition amplitudes on a super Hilbert space are given by the body of the inner product. Therefore it seems reasonable to define the physical spectrum of an operator $T$ on a super Hilbert space $\mathcal{H}$ as, loosely speaking, the subset of $\mathrm{sp}_{\mathbb{C}}(T)$ corresponding to physical elements of $\mathcal{H}$, i.e.,

Definition 5.8. The physical spectrum of a bounded physical observable $O$ on a super Hilbert space $\mathcal{H}$, denoted by $\mathrm{sp}_{\mathrm{ph}}(O)$, is the set of $\lambda \in \mathbb{C}$ such that $\varphi(O - \lambda I)$ has no two-sided inverse in $B(H)$.

Proposition 5.9. Let $O$ be a bounded physical observable on a super Hilbert space $\mathcal{H}$. Then $\mathrm{sp}_{\mathrm{ph}}(O) = \mathrm{sp}(\varphi(O)) \subset \mathrm{sp}_{\mathbb{C}}(O)$. If $\mathcal{H}$ satisfies the strong definiteness condition, then $\mathrm{sp}_{\mathrm{ph}}(O) = \mathrm{sp}_{\mathbb{C}}(O)$.

Proof. Let $\lambda \notin \mathrm{sp}_{\mathbb{C}}(O)$, $\lambda \in \mathbb{C}$. Then $O - \lambda I$ has a two-sided inverse in $B_w(\mathcal{H})$ and as $\varphi$ is an algebra-homomorphism, also $\varphi(O) - \lambda I$ has a two-sided inverse in $B(H)$. This implies $\lambda \notin \mathrm{sp}(\varphi(O))$. The other direction follows analogously if $\varphi$ is an isomorphism.

Corollary 5.10. The physical spectrum of a bounded physical observable on a super Hilbert space is a compact subset of $\mathbb{R}$.

If $\mathcal{H}$ is a super Hilbert space, $T \in L(\mathcal{H})$, then we say that an element $\lambda \in \mathrm{sp}(T)$ is a right Eigenvalue for $T$ if there is an $x \in \mathcal{H}$ such that $T(x) = x\lambda$. $x$ is called an Eigenvector corresponding to $\lambda$. Notice that an Eigenvector might correspond to more than one Eigenvalue, see Example 5.3.

Corollary 5.11. Let $O$ be a bounded physical observable on a super Hilbert space. Let $\lambda, \lambda' \in \mathrm{sp}_{\mathrm{ph}}(O)$ be physical spectral values of $O$ with $\lambda \neq \lambda'$ and let $x, x'$ be Eigenvectors corresponding to $\lambda$ and $\lambda'$ respectively. Then $\langle x, x'\rangle_B = 0$.


Remark 5.12. It is instructive to compare our definition of a physical observable with the definition given by DeWitt in [1]. According to DeWitt’s definition an element of a super Hilbert space is called physical if it has nonvanishing body. A physical observable is then defined as a self-adjoint module map on the super Hilbert space such that

– all Eigenvalues are even Graßmann numbers;
– for every Eigenvalue there is a physical Eigenstate;
– the set of physical Eigenvectors that correspond to soulless Eigenvalues contains a complete basis (for a definition of this notion see [1]).

Our argument above shows that DeWitt’s additional assumptions are unnecessary to assure real-valuedness of physical spectral values and weak orthogonality of Eigenvectors of physical observables. Moreover, Proposition 5.2 shows that for super Hilbert spaces over finitely generated Graßmann algebras there are no physical observables in DeWitt’s sense, as there will always be Eigenvalues which are not even. For super Hilbert spaces over an infinitely generated Graßmann algebra it is also easy to see that in general DeWitt’s conditions cannot be satisfied. For let $T$ be a physical observable in DeWitt’s sense with Eigenvalue $\lambda$, and let $x$ denote a physical Eigenstate for $\lambda$. Then $\lambda + \xi_1$ is an Eigenvalue of $T$ with Eigenstate $x\xi_1 \neq 0$. The Eigenvalue $\lambda + \xi_1$ is not an even Graßmann number, nor is it guaranteed that there is a physical Eigenstate for this Eigenvalue. This is a contradiction. Therefore we conclude that in general there are no physical observables in DeWitt’s sense on a super Hilbert space.

6. The Schrödinger Representation of Spinor Quantum Field Theory

In the Schrödinger representation of quantum field theory the commutation relations of the field operators are realized by representing the field operators by functionals and representing the conjugate momenta by functional derivatives [4]. This formulation of quantum field theory is equivalent to the standard operator formulation and to the functional-integral representation of quantum field theory. In the Schrödinger representation of spinor quantum field theory super Hilbert spaces naturally arise as the quantum mechanical state space. The material presented in this section is taken from Hatfield [4].

The Hamiltonian for free spinor field theory is given by

\[ H = \int d^3x\, \Psi^\dagger(x)\left(-i\alpha\cdot\nabla + \beta m\right)\Psi(x) = \int d^3x\, \bar\Psi(x)\left(-i\gamma^k\partial_k + m\right)\Psi(x), \]

where the matrices $\alpha_i$ and $\beta$ satisfy $\beta^2 = 1$, $\{\alpha_i, \alpha_j\} = 2\delta_{ij}$, $\{\alpha_i, \beta\} = 0$ and the $\gamma$ are related by $\gamma^0 = \beta$ and $\alpha_i = \gamma^0\gamma_i$. The canonical anticommutation relations of the fields are given by

\[ \{\Psi_\alpha(x, t), \Psi_\beta^\dagger(y, t)\} = \delta_{\alpha\beta}\,\delta^3(x - y), \qquad \{\Psi_\alpha(x, t), \Psi_\beta(y, t)\} = \{\Psi_\alpha^\dagger(x, t), \Psi_\beta^\dagger(y, t)\} = 0. \]

The time evolution is given by the functional Schrödinger equation

\[ i\frac{\partial}{\partial t}|\Psi\rangle = H|\Psi\rangle . \]


In the “coordinate” Schrödinger representation the state space at time $t$ is spanned by the Eigenfunctions $|\Psi\rangle$ of the field operator $\Psi(x)$. The corresponding Eigenvalues $\psi(x)$ are spinors of Graßmann number-valued functions. The conjugate momentum operator $\Psi^\dagger(x)$ is represented as a functional derivative

\[ \Psi_\beta^\dagger(x) = \frac{\delta}{\delta\psi_\beta(x)} . \]

It is well-known that the Eigenfunctions and the functional derivative can be rewritten as a plane wave expansion in terms of the creation and annihilation operators

\[ \psi_\alpha(x) = \sum_{i=1}^{2} \int \frac{d^3p}{(2\pi)^3}\, \frac{m}{E} \left[ b_i(p)\, u^i_\alpha(p)\, e^{-ipx} + d_i^\dagger(p)\, v^i_\alpha(p)\, e^{ipx} \right], \]

\[ \frac{\delta}{\delta\psi_\alpha(x)} = \sum_{i=1}^{2} \int \frac{d^3p}{(2\pi)^3}\, \frac{m}{E} \left[ (u^i_\alpha)^\dagger(p)\, e^{ipx}\, \frac{\delta}{\delta b_i(p)} + (v^i_\alpha)^\dagger(p)\, e^{-ipx}\, \frac{\delta}{\delta d_i^\dagger(p)} \right], \]

where

\[ u^i(p) = \frac{\slashed{p} + m}{\sqrt{2m(m + E)}} \begin{pmatrix} \delta_{i1} \\ \delta_{i2} \\ 0 \\ 0 \end{pmatrix}, \qquad v^i(p) = \frac{-\slashed{p} + m}{\sqrt{2m(m + E)}} \begin{pmatrix} 0 \\ 0 \\ \delta_{i1} \\ \delta_{i2} \end{pmatrix}. \tag{17} \]

The operators $b(p)$, $\frac{\delta}{\delta b(p)}$, $d^\dagger(p)$ and $\frac{\delta}{\delta d^\dagger(p)}$ act on the state space and obviously satisfy the equal time anticommutation relations

\[ \left\{ b_i(p), \frac{\delta}{\delta b_j(k)} \right\} = \delta^3(p - k)\,\delta_{ij}, \qquad \left\{ \frac{\delta}{\delta d_i^\dagger(p)}, d_j^\dagger(k) \right\} = \delta^3(p - k)\,\delta_{ij}, \]

with all other anticommutators vanishing. Accordingly, the $\frac{\delta}{\delta b_i}$ and the $\frac{\delta}{\delta d_i^\dagger}$ are interpreted as creation operators of field quanta with positive or negative energy respectively, whereas the $b_i$ and the $d_i^\dagger$ are interpreted as the corresponding annihilation operators.

At this stage the important observation for our purposes is that the state space can be naturally identified as the tensor product super Hilbert space $\Lambda_\infty^m \equiv \Lambda_{\infty,d}^m \otimes_\mu \Lambda_{\infty,b}^m$, where the uncountable set of generators of the first factor ($\Lambda_{\infty,d}^m$) is identified with $\{d_i^\dagger(p)\}$ and the set of generators of $\Lambda_{\infty,b}^m$ is identified with $\{b_i(p)\}$, see Example 4.9. As all generators of $\Lambda_{\infty,d}^m$ commute with all generators of $\Lambda_{\infty,b}^m$, we omit the tensor symbol in our notation. A factor $b_i(p)$ (or $d_i^\dagger(p)$) in an element $x \in \Lambda_\infty^m$ corresponds to the situation that the field quantum annihilated by $b_i(p)$ (or $d_i^\dagger(p)$) is absent. For example, the vacuum state, where all negative energy states are filled and all positive energy states are empty, is represented by

\[ |0\rangle = \prod_{i=1}^{2} \prod_{p} b_i(p) = \xi_{\infty,b} , \]

and the state with one positron of momentum $p_p$ with spin up is given by

\[ |p_p, \uparrow\rangle = d_2^\dagger(p_p) \prod_{i=1}^{2} \prod_{p} b_i(p) . \]

Physical transition amplitudes are computed by performing a functional integration over all Graßmann degrees of freedom, see [4]. Mathematically this is equivalent to taking the body of the inner product (10) as the physical transition amplitude, compare Proposition 3.5. It is now obvious that – when identifying the Hilbert space H in Proposition 3.5 with the state space in the operator (Heisenberg) representation of spinor quantum field theory – these physical transition amplitudes in the Schrödinger representation coincide with the physical transition amplitudes in the operator representation. 7. Previous Definitions Revisited 7.1. DeWitt super Hilbert spaces. Super Hilbert spaces were first considered by DeWitt in his book [1]. The basic features of DeWitt’s definition may be summarized as follows: DeWitt defines a super Hilbert space H basically as a Z2 -graded n module, where n is possibly infinite, with a n -valued inner product ·, · : H × H → n subject to the following conditions: 1. 2. 3. 4. 5. 6.

x, y1 + y2 = x, y1 + x, y2 , for x, y1 , y2 ∈ H; x, αy = αx, y = α ∗ x, y, for x, y ∈ H, α ∈ C; x, yq = x, yq for all x, y ∈ H, q ∈ n ; x, y = y, x∗ , for x, y ∈ H; x, xB ≥ 0 for x ∈ H; x ∈ H has nonvanishing body if and only if x, xB > 0; xs , yr qt = (−1)t (s+r) qt xs , yr for all pure xs ∈ Hs , yr ∈ Hr and q ∈ n , deg(qt ) = t.

DeWitt moreover requires that the body of H is an ordinary complex Hilbert space. The central difference between our definition and DeWitt’s is the requirement of sesqui-linearity of the inner product. It is easy to see that sesqui--linearity implies xB , yB = x, yB for all x, y ∈ H. DeWitt concentrated on algebraic properties of super Hilbert spaces and did not consider the topological or metric structure of n or H. Nagamachi and Kobayashi formalized and refined DeWitt’s definition by taking also into account the topological and norm structure on super Hilbert spaces [9]. It becomes clear from their work that the requirement of sesqui--linearity in some sense trivializes the theory, as it is possible to show along their lines that, when n is equipped with the Rogers norm, every super Hilbert space is of the form H = H ⊗ n , where H is an ordinary complex Hilbert space. 7.2. El Gradechi and Nieto’s super Hilbert space. El Gradechi and Nieto studied in [2] a super extension of the Kirillov-Kostant-Souriau geometric quantization method. They defined a super Hilbert space to be a direct sum H = H0 ⊕ H1 of two complex Hilbert spaces (H1 , ·, ·0 ) and (H2 , ·, ·1 ) equipped with the super Hermitian form ·, · = ·, ·0 + i·, ·1 . At first sight this definition looks rather different from the approach given in this paper. However, El Gradechi’s and Nieto’s definition is actually abstracted from the concrete example arising in their study of super unitary irreducible representations of OSp(2/2)


in super Hilbert spaces of $L^2$ superholomorphic sections of prequantum bundles of the Kostant type. It is beyond the scope of this paper to review the construction in [2] in detail. For our purposes it is enough to know that the super Hilbert space $L_p^{(1|2)}$ constructed in [2] is a $\mathbb{Z}_2$-graded $\Lambda_4$ module and that its elements are sections of the form $\psi(z, \bar z, \theta, \bar\theta, \chi, \bar\chi)$, where $z\in\mathbb{C}$ denotes a complex variable with $|z|\le 1$ and $\theta, \bar\theta, \chi, \bar\chi$ denote a complexified set of generators of $\Lambda_4$. Notice that the definition for complex conjugation in $\Lambda_4$ adopted in [2] differs from our definition: $\overline{pq} = \bar p\,\bar q$ for all $p, q\in\Lambda_4$. The even super Hermitian form on $L_p^{(1|2)}$ is defined for $\psi' = \psi'_0 + \psi'_1$ and $\psi = \psi_0 + \psi_1$ by (see Eqs. 5.8 and 5.10 in [2])

$$\langle\langle\psi', \psi\rangle\rangle \equiv \int (\psi', \psi)\, h\, dz\, d\bar z\, d\theta\, d\bar\theta\, d\chi\, d\bar\chi,$$

where $h$ is an integrating constant factor for the super Liouville measure constructed in [2] and where $(\psi', \psi) = \overline{\psi'}\psi = \overline{\psi'_0}\psi_0 + \overline{\psi'_1}\psi_1 + \overline{\psi'_0}\psi_1 + \overline{\psi'_1}\psi_0$. It is now crucial to realize that it is possible to construct a $\Lambda_4$-valued inner product $\langle\cdot,\cdot\rangle$ on $L_p^{(1|2)}$ in the sense of Definition 3.1 by replacing $(\psi', \psi)$ by $(\psi', \psi)^\sim = \psi'^{*}\psi$ (where $*$ denotes the complex conjugation operation introduced in Sect. 2) and by setting

$$\langle\psi', \psi\rangle \equiv \int (\psi', \psi)^\sim\, h\, dz\, d\bar z. \tag{18}$$

This is an inner product of the form Eq. (11) and therefore the physical super Hilbert space can be identified with the super Hilbert space discussed in Example 4.6. Notice that what is called a super unitary operator with respect to the super Hermitian product in [2] would be called a weakly unitary operator with respect to the inner product (18) in our terminology. El Gradechi and Nieto correctly note that the examples for super Hilbert spaces considered by them are not covered by the definitions of DeWitt, and of Nagamachi and Kobayashi.

7.3. Other definitions. There are some other notions of super Hilbert space in the literature which we briefly mention here. Khrennikov [6] defines a super Hilbert space to be a Banach (commutative) $\Lambda$-module which is isomorphic to the space $l_2(\Lambda)$ of square-summable sequences in $\Lambda$ with the inner product $\langle x, y\rangle := \sum_n x_n y_n^*$ and norm $\|x\|^2 := \|\langle x, x\rangle\|$. Schmitt proposed a different notion of super Hilbert space in [12]. According to his definition a super Hilbert space is just a complex $\mathbb{Z}_2$-graded ordinary Hilbert space (consequently the inner product is always complex-valued). It is beyond the scope of the present work to discuss this definition in detail and the reader is referred to [12] and references therein. We only remark that Schmitt's definition of super Hilbert space is general enough to cover even the functional Schrödinger representation of spinor quantum field theory, and contact with our approach can be made for instance by identifying his "super Hilbert space" with the Hilbert space of Proposition 3.5.


7.4. Discussion. The definitions of the notion of super Hilbert space put forward by DeWitt [1], Nagamachi and Kobayashi [9] and Khrennikov [6] are all special cases of our more general definition. From a mathematical point of view they are viable generalizations of ordinary Hilbert space theory, and indeed, unlike the definition of super Hilbert space introduced in this paper, the definitions by Nagamachi, Kobayashi and Khrennikov are actually designed such that analogies or extensions of most basic structural results of ordinary Hilbert space theory remain valid in the super case. However, these definitions suffer from the problem that they are too narrow to cover the physically important example of the state space arising in the functional Schrödinger representation of spinor quantum field theory discussed in Sect. 6. As repeatedly stated above, in physical applications of super Hilbert spaces the physical transition amplitudes have to be identified with the body of the inner product (this is in accordance with DeWitt's remark that "real physics is restricted to the ordinary Hilbert space that sits inside the super Hilbert space"). However, in the theories by DeWitt, Nagamachi, Kobayashi and Khrennikov the inner product respects the body operation: $\langle x_B, y_B\rangle = \langle x, y\rangle_B$. In the functional Schrödinger representation of spinor quantum field theory the physical state space is identified with the tensor product of two isomorphic copies of the underlying Graßmann algebra: the presence of, e.g., a Graßmann number $d_i^\dagger(p)$ indicates the presence of the corresponding positron field quantum. Therefore the body of a DeWitt-type inner product gives essentially only the contribution of the vacuum-to-vacuum transition amplitude to the full physical transition amplitude. To obtain the full physical transition amplitude we need to introduce the more general inner product on the state space, given by Eq. (10), which is not sesqui-$\Lambda_n$-linear and for which in general $\langle x_B, y_B\rangle \neq \langle x, y\rangle_B$.
The super Hilbert spaces arising in the work of El Gradechi and Nieto [2] and of Samsonov [11] provide further examples for super Hilbert spaces which are not super Hilbert spaces in the sense of DeWitt, Nagamachi and Kobayashi, or Khrennikov but which are super Hilbert spaces in the sense of Definition 4.1. Therefore we conclude that the introduction of the more general notion of super Hilbert space put forward in this paper is physically justified.

Acknowledgements. This research was performed while the author was a Marie Curie Research Fellow in the Theoretical Physics Group at Imperial College, London, UK, under the Training and Mobility of Researchers (TMR) programme of the European Commission.

References

1. DeWitt, B.: Supermanifolds, 2nd Edition. Cambridge: Cambridge University Press, 1992
2. El Gradechi, A.M., Nieto, L.M.: Supercoherent States, Super Kähler Geometry and Geometric Quantization. Commun. Math. Phys. 175, 521–564 (1996)
3. Federer, H.: Geometric Measure Theory. Berlin: Springer, 1969
4. Hatfield, B.: Quantum Field Theory of Point Particles and Strings. Redwood City: Addison-Wesley, 1992
5. Kadison, R.V., Ringrose, J.R.: Fundamentals of the Theory of Operator Algebras I. Orlando: Academic, 1983
6. Khrennikov, A.Yu.: The Hilbert super space. Sov. Phys. Dokl. 36, 759–760 (1991)
7. Kugo, T.: Eichtheorie. Berlin: Springer, 1997
8. Lance, E.C.: Hilbert C*-modules. London Mathematical Society Lecture Notes Series 210. Cambridge: Cambridge University Press, 1995
9. Nagamachi, S., Kobayashi, Y.: Hilbert superspace. J. Math. Phys. 33, 4274–4282 (1992)
10. Rogers, A.: A global theory of supermanifolds. J. Math. Phys. 21, 1352–1365 (1980)
11. Samsonov, B.F.: Supersymmetry and supercoherent states of a nonrelativistic free particle. J. Math. Phys. 38, 4492–4503 (1997)
12. Schmitt, T.: Supergeometry and hermitian conjugation. J. Geom. Phys. 7, 141–169 (1990)
13. Wegge-Olsen, N.E.: K-Theory and C*-algebras. Oxford: Oxford University Press, 1993
14. Whitney, H.: Geometric Integration Theory. Princeton, New Jersey: Princeton University Press, 1957

Communicated by H. Araki

Commun. Math. Phys. 214, 469 – 491 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Extended Diffeomorphism Algebras and Trajectories in Jet Space

T. A. Larsson

Vanadisvägen 29, 113 23 Stockholm, Sweden. E-mail: [email protected]

Received: 29 October 1998 / Accepted: 2 May 2000

Abstract: Let the DRO (Diffeomorphism, Reparametrization, Observer) algebra DRO(N ) be the extension of diff(N ) ⊕ diff(1) by its four inequivalent Virasoro-like cocycles. Here diff(N ) is the diffeomorphism algebra in N -dimensional spacetime and diff(1) describes reparametrizations of trajectories in the space of tensor-valued p-jets. DRO(N ) has a Fock module for each p and each representation of gl(N ). Analogous representations for gauge algebras (higher-dimensional Kac–Moody algebras) are also given. The reparametrization symmetry can be eliminated by a gauge fixing procedure, resulting in previously discovered modules. In this process, two DRO(N ) cocycles transmute into anisotropic cocycles for diff(N ). Thus the Fock modules of toroidal Lie algebras and their derivation algebras are geometrically explained.

1. Introduction

Consider the algebra of diffeomorphisms in N-dimensional spacetime, diff(N). The classical representations act on tensor densities over spacetime [7, 18], but this is not a good starting point for quantization. Naïvely, one would try to introduce canonical momenta and normal order, but this only works in one dimension, where this procedure gives Fock representations of the Virasoro algebra. In higher dimensions, infinities are encountered; formally, a central extension proportional to the number of time-independent functions arises. Moreover, diff(N) has no central extension when N > 1.

diff(N) acts naturally on the corresponding space of p-jets, p finite. The infinite jet space is essentially the space of functions, insofar as functions may be identified with their Taylor series. This realization of diff(N) is finite-dimensional but non-linear; diffeomorphisms act linearly on the Taylor coefficients with matrices depending nonlinearly on the base point. The corresponding Fock representation is well defined but not very interesting, because it gives us back the original tensor densities (and derivatives thereof), and no extensions arise.


To remedy this, consider the space of trajectories in jet space. diff(N ) acts naturally on this space as well, but in a highly reducible fashion; the realization is a continuous direct sum because every point on a trajectory transforms independently of its neighbors. This degeneracy can be lifted by adding an extra diff(1) factor describing reparametrizations, and thus the total algebra is diff(N ) ⊕ diff(1). The DRO algebra DRO(N ) is the extension of this algebra by its four independent Virasoro-like cocycles, which are non-central except in one dimension. The canonical normal ordering with respect to reparametrizations results in Fock modules for DRO(N ). On the group level, this corresponds to a representation up to a local phase; only if the phase is globally constant, the Lie algebra extension is central. Reparametrizations are then eliminated by Hamiltonian reduction. Since they generate first class constraints, a gauge fixing condition must be introduced; a natural choice is to identify one coordinate with the parameter along the trajectory. Poisson brackets are now replaced with Dirac brackets before normal ordering. This yields a projective realization of diff(N ), which was discovered by hand in [14] (that paper was limited to zero-jets). In particular, two of the diff(N ) ⊕ diff(1) cocycles transmute into the anisotropic diff(N ) extensions described in that paper. By further specialization to scalar-valued jets (and choosing a Fourier basis on the N -dimensional torus), we recover the results of [8] on the derivation algebra of toroidal Lie algebras. I thus give a complete geometrical explanation of the rather surprising results in [8, 14], and generalize in two ways: reparametrizations are separated from diffeomorphisms, and arbitrary tensor-valued p-jets are considered, not only zero-jets. Berman and Billig [1] independently studied tensor-valued objects, but only as modules over the “spatial” subalgebra diff(N − 1). 
For a supersymmetric generalization, see [15]. Proper representations were studied in [7]. It was noted by several authors [1,15] that the gauge-fixed algebra is “space-time asymmetric” in the sense that time is a distinguished direction. In the present work this anisotropy is isolated in the gauge fixing condition, whereas the underlying algebraic structure is completely isotropic. The gauge algebra map(N, g), i.e. the algebra of maps from N -dimensional spacetime to a finite-dimensional Lie algebra g, has similar projective representations. This representation theory is also developed in the present paper, and thus the results of [1, 2, 6, 9, 14, 17] on toroidal Lie algebras are geometrically explained and generalized. All considerations in this paper are local, but I expect that the results can be globalized without too much difficulty. It is clear that the first de Rham homology plays an important rôle, both because the basic objects are one-dimensional trajectories and because closed one-chains appear in (2.6) below.

2. The Algebra DRO(N)

Let $\xi = \xi^\mu(x)\partial_\mu$, $x\in\mathbb{R}^N$, $\partial_\mu = \partial/\partial x^\mu$, be a vector field, with commutator $[\xi, \eta] \equiv \xi^\mu\partial_\mu\eta^\nu\partial_\nu - \eta^\nu\partial_\nu\xi^\mu\partial_\mu$. Greek indices $\mu, \nu = 0, 1, .., N-1$ label the spacetime coordinates and the summation convention is used on all kinds of indices. The diffeomorphism algebra (algebra of vector fields, Witt algebra) diff(N) is generated by Lie derivatives $L_\xi$. In particular, we refer to diffeomorphisms on the circle as reparametrizations. They form an additional diff(1) algebra with generators $L(t)$, $t\in S^1$. diff(N) ⊕ diff(1) is the Lie algebra with brackets

$$[L_\xi, L_\eta] = L_{[\xi,\eta]}, \qquad [L(s), L_\xi] = 0, \qquad [L(s), L(t)] = (L(s) + L(t))\,\dot\delta(s-t). \tag{2.1}$$

Alternatively, we describe reparametrizations in terms of generators $L_f$, where $f = f(t)\,d/dt$ is a vector field on the circle: $L_f = \int dt\, f(t)L(t)$. The commutator is $[f, g] = (f\dot g - g\dot f)\,d/dt$, where a dot indicates the $t$ derivative. The assumption that $t\in S^1$ is for technical simplicity; it enables jets to be expanded in a Fourier series, but it is physically quite unjustified because it means that spacetime is periodic in the time direction. However, all we really need is that $\int dt\,\dot F(t) = 0$ for all functions $F(t)$. Most results are unchanged if we instead take $t\in\mathbb{R}$ and replace Fourier sums with Fourier integrals everywhere.

Introduce $N$ privileged functions on the circle $q^\mu(t)$, which can be interpreted as the trajectory of an observer (or base point). Let the observer algebra Obs(N) be the space of local functionals of $q^\mu(t)$, i.e. polynomial functions of $q^\mu(t), \dot q^\mu(t), \ldots, d^k q^\mu(t)/dt^k$, $k$ finite, regarded as a commutative Lie algebra. The DRO (Diffeomorphism, Reparametrization, Observer) algebra DRO(N) is an abelian but non-central Lie algebra extension of diff(N) ⊕ diff(1) by Obs(N):

$$0 \longrightarrow \mathrm{Obs}(N) \longrightarrow \mathrm{DRO}(N) \longrightarrow \mathrm{diff}(N)\oplus\mathrm{diff}(1) \longrightarrow 0. \tag{2.2}$$

The extension depends on the four parameters $c_j$, $j = 1, 2, 3, 4$, to be called abelian charges; the name is chosen in analogy with the central charge of the Virasoro algebra. The sequence (2.2) splits (DRO(N) is a semi-direct product) iff all four abelian charges vanish. The brackets are given by

$$\begin{aligned}
[L_\xi, L_\eta] &= L_{[\xi,\eta]} + \frac{1}{2\pi i}\int dt\, \dot q^\rho(t)\Bigl( c_1\,\partial_\rho\partial_\nu\xi^\mu(q(t))\,\partial_\mu\eta^\nu(q(t)) + c_2\,\partial_\rho\partial_\mu\xi^\mu(q(t))\,\partial_\nu\eta^\nu(q(t)) \Bigr), \\
[L_f, L_\xi] &= \frac{1}{4\pi i}\int dt\, \bigl(c_3\ddot f(t) - i a_3\dot f(t)\bigr)\,\partial_\mu\xi^\mu(q(t)), \\
[L_f, L_g] &= L_{[f,g]} + \frac{c_4}{24\pi i}\int dt\, \bigl(\ddot f(t)\dot g(t) - \dot f(t) g(t)\bigr), \\
[L_\xi, q^\mu(t)] &= \xi^\mu(q(t)), \\
[L_f, q^\mu(t)] &= -f(t)\,\dot q^\mu(t), \\
[q^\mu(s), q^\nu(t)] &= 0,
\end{aligned} \tag{2.3}$$

extended to all of Obs(N) by Leibniz' rule and linearity. The parameter $a_3$ is cohomologically trivial and can be removed by the redefinition

$$L_\xi \to L_\xi + \frac{i a_3}{4\pi i}\int dt\, \partial_\mu\xi^\mu(q(t)). \tag{2.4}$$


The remaining four cocycles are non-trivial. We identify $c_4$ as the central charge in the Virasoro algebra generated by reparametrizations. It is not difficult to reformulate the DRO algebra as a proper Lie algebra, by introducing a complete basis for Obs(N). In fact, it suffices to consider the two linear operators $S_0(F)$ and $S_1^\rho(F_\rho)$, defined for two arbitrary functions $F(t, x)$ and $F_\rho(t, x)$, $t\in S^1$, $x\in\mathbb{R}^N$,

$$S_0(F) = \frac{1}{2\pi i}\int dt\, F(t, q(t)), \qquad S_1^\rho(F_\rho) = \frac{1}{2\pi i}\int dt\, \dot q^\rho(t) F_\rho(t, q(t)). \tag{2.5}$$

DRO(N) now takes the form

$$\begin{aligned}
[L_\xi, L_\eta] &= L_{[\xi,\eta]} + S_1^\rho\bigl(c_1\,\partial_\rho\partial_\nu\xi^\mu\,\partial_\mu\eta^\nu + c_2\,\partial_\rho\partial_\mu\xi^\mu\,\partial_\nu\eta^\nu\bigr), \\
[L_f, L_\xi] &= \tfrac{1}{2} S_0\bigl((c_3\ddot f - i a_3\dot f)\,\partial_\mu\xi^\mu\bigr), \\
[L_f, L_g] &= L_{f\dot g - \dot f g} + \frac{c_4}{12} S_0(\ddot f\dot g - \dot f g), \\
[L_\xi, S_0(F)] &= S_0(\xi^\mu\partial_\mu F), \\
[L_f, S_0(F)] &= S_0\Bigl(f\frac{\partial F}{\partial t} + \dot f F\Bigr), \\
[L_\xi, S_1^\rho(F_\rho)] &= S_1^\rho(\xi^\mu\partial_\mu F_\rho + \partial_\rho\xi^\mu F_\mu), \\
[L_f, S_1^\rho(F_\rho)] &= S_1^\rho\Bigl(f\frac{\partial F_\rho}{\partial t}\Bigr), \\
S_0\Bigl(\frac{\partial F}{\partial t}\Bigr) + S_1^\mu(\partial_\mu F) &\equiv 0, \\
S_0(f) &= \frac{1}{2\pi i}\int dt\, f(t), \quad \text{if } f(t) \text{ is independent of } x,
\end{aligned} \tag{2.6}$$

where $\dot f = df/dt$. That (2.6) defines a Lie algebra follows from the explicit realization in Theorem 1 below, but it is also straightforward to verify the Jacobi identities.

If $F_\rho(t, x)$ is independent of $t$, $S_1^\rho(F_\rho)$ is dual to closed one-forms, and as such it can be viewed as a closed one-chain. Dzhumadil'daev [5] has given a list of diff(N) extensions by modules of tensor fields; see also [16]. The cocycles $c_1$ and $c_2$ are related to his cocycles $\psi_4^W$ and $\psi_3^W$, respectively. In fact, they are equal for $S_1^\rho$ an exact one-chain, but closed one-chains are not included in Dzhumadil'daev's list since they are not tensor modules. In one dimension, $S_1(f) = \frac{1}{2\pi i}\int dx^0 f(x^0)$, so the first line in (2.6) reduces to the Virasoro algebra with central charge $c = 12(c_1 + c_2)$.

The cocycles in DRO(N) have a natural origin from the diffeomorphism algebra in $(N+1)$-dimensional space. Let its coordinates be $z^A$, where capital indices $A = -1, 0, 1, 2, \ldots, N-1$ run over $N+1$ values and the extra direction is labelled by $-1$. diff(N + 1) has an abelian extension with two Virasoro-like cocycles:

$$\begin{aligned}
[L_X, L_Y] &= L_{[X,Y]} + S^C\bigl(c_1\,\partial_C\partial_B X^A\,\partial_A Y^B + c_2\,\partial_C\partial_A X^A\,\partial_B Y^B\bigr), \\
[L_X, S^C(F_C)] &= S^C(X^A\partial_A F_C + \partial_C X^A F_A), \\
S^C(\partial_C F) &\equiv 0,
\end{aligned} \tag{2.7}$$


where $X = X^A(z)\partial_A$ is an $(N+1)$-dimensional vector field. The cocycles multiplying $c_1$ and $c_2$ are simply those found by Eswara Rao and Moody [8] and myself [13], in one extra dimension. Now embed diff(N) ⊕ diff(1) ⊂ diff(N + 1) in the natural manner: $z^\mu = x^\mu$, $z^{-1} = t$, $X^A(z) = (\xi^\mu(x), f(t))$, $L_X = (L_\xi, L_f)$, $S^C(F_C) = S_0(F) + S_1^\rho(F_\rho)$, where $F = F_{-1}$. Under this decomposition, (2.7) restricts to (2.6) with $c_3 = 2c_2$, $c_4 = 12(c_1 + c_2)$, up to a trivial cocycle. However, it is easy to see that $c_3$ and $c_4$ are in fact independent parameters, so there are four different cocycles in total. In Sect. 7 below I will show that the complicated anisotropic cocycles in [14] can be obtained from (2.3) by a gauge-fixing procedure.

We are interested in representations of DRO(N) that are of lowest energy type with respect to the Hamiltonian

$$H = L_{-i\,d/dt} = -i\int dt\, L(t). \tag{2.8}$$

Such a representation contains a cyclic state $|0\rangle$ (the vacuum), satisfying

$$H|0\rangle = h|0\rangle, \tag{2.9}$$

and $A|0\rangle = 0$ for every operator $A$ such that $[H, A] = -w_A A$, $w_A > 0$. The lowest energy $h$ also characterizes the representation.

3. Preliminaries

Consider the space of $V$-valued functions over spacetime, where $V$ carries a gl(N) representation $\varrho$. This is our configuration space, which will be denoted by $\mathcal{Q}$. A basis is given by $\phi_\alpha(x)$, $x\in\mathbb{R}^N$, where the index $\alpha$ labels different components of tensor densities. The fields can be either bosonic or fermionic, but it is assumed that all components have the same parity. Let $m = (m_0, m_1, .., m_{N-1})$, all $m_\mu \ge 0$, be a multi-index of length $|m| = \sum_{\mu=0}^{N-1} m_\mu$, let $\underline{\mu}$ be a unit vector in the $\mu$th direction, and let $\underline{0}$ be the multi-index of length zero. Denote by

$$\partial_m\phi_\alpha(x) = \underbrace{\partial_0 .. \partial_0}_{m_0} .. \underbrace{\partial_{N-1} .. \partial_{N-1}}_{m_{N-1}}\,\phi_\alpha(x). \tag{3.1}$$

Diffeomorphisms act as follows on derivatives of tensor densities:

$$\begin{aligned}
[L_\xi, \partial_n\phi(x)] &= \partial_n\bigl(-\xi^\mu(x)\partial_\mu\phi(x) - \partial_\nu\xi^\mu\,\varrho(T^\nu_\mu)\phi(x)\bigr) \\
&= -\xi^\mu(x)\,\partial_{n+\underline{\mu}}\phi(x) - \sum_{|m|\le|n|} T^m_n(\xi(x))\,\partial_m\phi(x),
\end{aligned} \tag{3.2}$$

$$T^m_n(\xi) = \binom{n}{m}\partial_{n-m+\underline{\nu}}\xi^\mu\,\varrho(T^\nu_\mu) + \binom{n}{m-\underline{\mu}}\bigl(1 - \delta^{m-\underline{\mu}}_n\bigr)\,\partial_{n-m+\underline{\mu}}\xi^\mu\,\varrho(1), \tag{3.3}$$

where the $V$ index ($\alpha$) was suppressed, $m! = m_0!\,m_1!\,..\,m_{N-1}!$ and $\binom{n}{m} = n!/m!(n-m)!$. Our convention is that gl(N) has basis $T^\mu_\nu$ and brackets

$$[T^\mu_\nu, T^\sigma_\tau] = \delta^\sigma_\nu T^\mu_\tau - \delta^\mu_\tau T^\sigma_\nu. \tag{3.4}$$


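$\varrho(T^\nu_\mu)$ are the matrices in the representation $\varrho$, acting on a tensor density with $p$ upper and $q$ lower indices and weight $\kappa$ as follows:

$$\varrho(T^\mu_\nu)\,\phi^{\sigma_1..\sigma_p}_{\tau_1..\tau_q} = -\kappa\,\delta^\mu_\nu\,\phi^{\sigma_1..\sigma_p}_{\tau_1..\tau_q} + \sum_{i=1}^{p}\delta^{\sigma_i}_\nu\,\phi^{\sigma_1..\mu..\sigma_p}_{\tau_1..\tau_q} - \sum_{j=1}^{q}\delta^\mu_{\tau_j}\,\phi^{\sigma_1..\sigma_p}_{\tau_1..\nu..\tau_q}. \tag{3.5}$$

The matrices $T^m_n(\xi)$, with components $T^m_n(\xi)^\alpha_\beta$, satisfy

$$\begin{aligned}
T^m_{n+\underline{\nu}}(\xi) &= \partial_\nu\xi^\mu\,\delta^m_{n+\underline{\mu}} + T^m_n(\partial_\nu\xi) + T^{m-\underline{\nu}}_n(\xi), \\
T^m_{\underline{0}}(\xi) &= \delta^m_{\underline{0}}\,\partial_\nu\xi^\mu\,\varrho(T^\nu_\mu), \\
\partial_\nu T^m_n(\xi) &= T^m_n(\partial_\nu\xi), \\
T^m_n([\xi, \eta]) &= \xi^\mu T^m_n(\partial_\mu\eta) - \eta^\nu T^m_n(\partial_\nu\xi) + \sum_{|m|\le|r|\le|n|}\bigl(T^r_n(\xi)T^m_r(\eta) - T^r_n(\eta)T^m_r(\xi)\bigr).
\end{aligned} \tag{3.6}$$

In particular, $T^m_n(\xi) = 0$ if $|m| > |n|$. Let tr denote the trace in the gl(N) representation $\varrho$. Define numbers $\dim(\varrho)$, $k_0(\varrho)$, $k_1(\varrho)$, $k_2(\varrho)$ by

$$\operatorname{tr} 1 = \dim(\varrho), \qquad \operatorname{tr} T^\mu_\nu = k_0(\varrho)\,\delta^\mu_\nu, \qquad \operatorname{tr} T^\mu_\nu T^\sigma_\tau = k_1(\varrho)\,\delta^\mu_\tau\delta^\sigma_\nu + k_2(\varrho)\,\delta^\mu_\nu\delta^\sigma_\tau. \tag{3.7}$$

For an unconstrained tensor transforming as in (3.5),

$$\begin{aligned}
\dim(\varrho) &= N^{p+q}, & k_0(\varrho) &= -(p - q - \kappa N)\,N^{p+q-1}, \\
k_1(\varrho) &= (p + q)\,N^{p+q-1}, & k_2(\varrho) &= \bigl((p - q - \kappa N)^2 - p - q\bigr)\,N^{p+q-2}.
\end{aligned} \tag{3.8}$$

Note that if $\kappa = (p - q)/N$, $\varrho$ is an sl(N) representation. Let $S_\ell$ be the symmetric representation on $\ell$ lower indices, appropriate for multi-indices. We have $\dim(S_\ell) = \sum_{|m|=\ell}\delta^m_m$, etc., where

$$\begin{aligned}
\dim(S_\ell) &= \binom{N-1+\ell}{\ell}, & k_0(S_\ell) &= \binom{N-1+\ell}{\ell-1}, \\
k_1(S_\ell) &= \binom{N+\ell}{\ell-1}, & k_2(S_\ell) &= \binom{N-1+\ell}{\ell-2}.
\end{aligned} \tag{3.9}$$

Lemma 1.

i. $\displaystyle\sum_{|m|\le p}\delta^m_m\,\operatorname{tr} 1 = \binom{N+p}{p}\dim(\varrho)$,

ii. $\displaystyle\sum_{|m|\le p}\operatorname{tr} T^m_m(\xi) = \partial_\mu\xi^\mu\left(\binom{N+p}{p}k_0(\varrho) + \binom{N+p}{p-1}\dim(\varrho)\right)$,

iii. $\displaystyle\sum_{\substack{|m|\le|n|\le p \\ |n|\le|m|\le p}}\operatorname{tr} T^m_n(\xi)T^n_m(\eta) = \partial_\nu\xi^\mu\,\partial_\mu\eta^\nu\left(\binom{N+p}{p}k_1(\varrho) + \binom{N+p+1}{p-1}\dim(\varrho)\right) + \partial_\mu\xi^\mu\,\partial_\nu\eta^\nu\left(\binom{N+p}{p}k_2(\varrho) + \binom{N+p}{p-2}\dim(\varrho) + 2\binom{N+p}{p-1}k_0(\varrho)\right)$.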


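Proof. If $|m| = |n| = \ell$,

$$T^m_n(\xi) = \partial_\mu\xi^\nu\bigl(\varrho(T^\mu_\nu)\,\delta^m_n + \varrho(1)\,\zeta^m_n(T^\mu_\nu)\bigr), \tag{3.10}$$

where $\zeta^m_n(T^\mu_\nu)$ are the representation matrices in $S_\ell$, acting on multi-indices. Only the top values (3.10) contribute to the traces, which means that we can ignore that higher jets do not transform as $S_\ell$-valued zero-jets. By the definition (3.7) and (3.10),

$$\begin{aligned}
\dim(\varrho\otimes S_\ell) &= \dim(\varrho)\cdot\dim(S_\ell), \\
k_0(\varrho\otimes S_\ell) &= k_0(\varrho)\dim(S_\ell) + \dim(\varrho)\,k_0(S_\ell), \\
k_1(\varrho\otimes S_\ell) &= k_1(\varrho)\dim(S_\ell) + \dim(\varrho)\,k_1(S_\ell), \\
k_2(\varrho\otimes S_\ell) &= k_2(\varrho)\dim(S_\ell) + \dim(\varrho)\,k_2(S_\ell) + 2k_0(\varrho)k_0(S_\ell).
\end{aligned} \tag{3.11}$$

The lemma now follows from (3.9) and the following sums:

$$\begin{aligned}
\sum_{\ell=0}^{p}\dim(S_\ell) &= \binom{N+p}{p}, & \sum_{\ell=0}^{p}k_0(S_\ell) &= \binom{N+p}{p-1}, \\
\sum_{\ell=0}^{p}k_1(S_\ell) &= \binom{N+p+1}{p-1}, & \sum_{\ell=0}^{p}k_2(S_\ell) &= \binom{N+p}{p-2}.
\end{aligned} \tag{3.12}$$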
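The four sums in (3.12) are instances of the hockey-stick identity for binomial coefficients and can be checked numerically. The following Python sketch does so; the function names are illustrative assumptions, and binomial coefficients with a negative lower entry are taken to vanish, which is the convention implicitly needed for small $\ell$ in (3.9).

```python
from itertools import product
from math import comb


def binom(n, k):
    """Binomial coefficient, taken to be zero when k < 0."""
    return comb(n, k) if k >= 0 else 0


def s_ell_data(N, ell):
    """(dim, k0, k1, k2) of the symmetric representation S_ell, as in (3.9)."""
    return (
        binom(N - 1 + ell, ell),
        binom(N - 1 + ell, ell - 1),
        binom(N + ell, ell - 1),
        binom(N - 1 + ell, ell - 2),
    )


def summed(N, p):
    """Left-hand sides of (3.12): sum the four quantities over ell = 0..p."""
    totals = [0, 0, 0, 0]
    for ell in range(p + 1):
        for i, value in enumerate(s_ell_data(N, ell)):
            totals[i] += value
    return tuple(totals)


def closed_form(N, p):
    """Right-hand sides of (3.12)."""
    return (
        binom(N + p, p),
        binom(N + p, p - 1),
        binom(N + p + 1, p - 1),
        binom(N + p, p - 2),
    )


def count_multi_indices(N, p):
    """Brute-force count of multi-indices m with |m| <= p in N dimensions."""
    return sum(1 for m in product(range(p + 1), repeat=N) if sum(m) <= p)


# Compare both sides of (3.12) on a small grid of (N, p).
all_agree = all(
    summed(N, p) == closed_form(N, p)
    for N in range(1, 6)
    for p in range(0, 7)
)
```

The first sum also counts the jet coordinates per tensor component: `count_multi_indices(N, p)` equals `binom(N + p, p)`, the number of multi-indices with $|m| \le p$.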

4. Jet Space Trajectories

Let $J^p\mathcal{Q}$ be the space of trajectories in the space of $V$-valued $p$-jets, with coordinates $(q^\mu(t), \phi_{\alpha,m}(t))$, where $|m|\le p$ and $t\in S^1$. The parameter $t$ is referred to as time and $q^\mu(t)$ as the observer's trajectory in spacetime. DRO(N) acts on $J^p\mathcal{Q}$ as follows:

$$\begin{aligned}
[L_\xi, \phi_{,n}(t)] &= -\sum_{|m|\le|n|} T^m_n(\xi(q(t)))\,\phi_{,m}(t), \\
[L(s), \phi_{,n}(t)] &= -\dot\phi_{,n}(t)\,\delta(s-t) + \lambda\,\phi_{,n}(t)\,\dot\delta(s-t) + iw\,\phi_{,n}(t)\,\delta(s-t), \\
[L_\xi, q^\mu(t)] &= \xi^\mu(q(t)), \\
[L(s), q^\mu(t)] &= -\dot q^\mu(t)\,\delta(s-t).
\end{aligned} \tag{4.1}$$

Clearly, there is a chain of inclusions $J^{-1}\mathcal{Q}\subset J^0\mathcal{Q}\subset J^1\mathcal{Q}\subset\ldots$, where $J^{-1}\mathcal{Q}$ consists of $q^\mu(t)$ only. Hence $J^p\mathcal{Q}$ is reducible (but indecomposable) as a DRO(N) realization. This kind of reducibility is not present in the Fock modules below, because jets of all orders up to $p$ are created from the vacuum, cf. (5.32). We call $\lambda$ the causal weight of $\phi$, in contradistinction to its tensorial weight $\kappa$. The shift parameter $w$ can be eliminated by the redefinition

$$\phi_{,n}(t) \to e^{-iwt}\phi_{,n}(t), \tag{4.2}$$

so it is only defined up to an integer. The triple $(\kappa, \lambda, w)$ will collectively be referred to as the weights of $\phi$. The observer's trajectory $q^\mu(t)$ has causal weight 0 but it does not transform as a zero-jet under diffeomorphisms. However, its time derivative has causal weight 1 and does transform as a (vector-valued) zero-jet,

$$\begin{aligned}
[L_\xi, \dot q^\mu(t)] &= \partial_\nu\xi^\mu(q(t))\,\dot q^\nu(t), \\
[L(s), \dot q^\mu(t)] &= -\ddot q^\mu(t)\,\delta(s-t) + \dot q^\mu(t)\,\dot\delta(s-t).
\end{aligned} \tag{4.3}$$

A point in $J^\infty\mathcal{Q}$ can be identified with a trajectory in the space of $V$-valued functions via generating functions; for $x = (x^\mu)\in\mathbb{R}^N$, define

$$\phi_\alpha(x, t) = \sum_{|m|\ge 0}\frac{1}{m!}\,\phi_{\alpha,m}(t)\,(x - q(t))^m, \tag{4.4}$$

where

$$(x - q(t))^m = (x^0 - q^0(t))^{m_0}(x^1 - q^1(t))^{m_1} .. (x^{N-1} - q^{N-1}(t))^{m_{N-1}}. \tag{4.5}$$
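The generating function (4.4) is a multi-variable Taylor expansion around the observer's position. As an illustration only, the following Python sketch takes a polynomial in $N$ variables, computes its jet coordinates at a base point $q$ at a fixed time, and resums them with the weights $1/m!$ and the monomials (4.5). The dictionary representation of polynomials and all function names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def partial(poly, m):
    """Apply the multi-index derivative d_m to a polynomial {exponents: coeff}."""
    out = {}
    for exps, coeff in poly.items():
        if any(e < k for e, k in zip(exps, m)):
            continue  # this monomial is differentiated away
        factor = prod(factorial(e) // factorial(e - k) for e, k in zip(exps, m))
        new = tuple(e - k for e, k in zip(exps, m))
        out[new] = out.get(new, 0) + coeff * factor
    return out


def evaluate(poly, x):
    """Evaluate the polynomial at the point x."""
    return sum(c * prod(xi ** e for xi, e in zip(x, exps)) for exps, c in poly.items())


def jet(poly, q, p):
    """Jet coordinates phi_m = (d_m f)(q) for all multi-indices with |m| <= p."""
    N = len(q)
    return {
        m: evaluate(partial(poly, m), q)
        for m in product(range(p + 1), repeat=N)
        if sum(m) <= p
    }


def generating_function(jet_coords, q, x):
    """Evaluate sum_m phi_m (x - q)^m / m!, cf. (4.4) and (4.5)."""
    total = Fraction(0)
    for m, phi in jet_coords.items():
        m_factorial = prod(factorial(k) for k in m)
        monomial = prod((xi - qi) ** k for xi, qi, k in zip(x, q, m))
        total += Fraction(phi) * monomial / m_factorial
    return total


# f(x0, x1) = x0^2 x1 + 3 x1, base point q, evaluation point x.
f = {(2, 1): 1, (0, 1): 3}
q = (1, 2)
x = (3, -1)
three_jet = jet(f, q, 3)
```

Since `f` has total degree three, its 3-jet at `q` reproduces it exactly, so `generating_function(three_jet, q, x)` agrees with `evaluate(f, x)`; a lower-order jet gives only a truncated approximation.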

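$\phi_\alpha(x, t)$ transforms as in (3.2) under diffeomorphisms and as (4.1) under reparametrizations; note that

$$\frac{d}{dt}\phi_\alpha(x, t) = \sum_{|m|\ge 0}\frac{1}{m!}\Bigl(\dot\phi_{\alpha,m}(t)\,(x - q(t))^m - m_\mu\,\dot q^\mu(t)\,\phi_{\alpha,m}(t)\,(x - q(t))^{m-\underline{\mu}}\Bigr).$$

Moreover, $\partial_m\phi_\alpha(x, t) = \phi_{\alpha,m}(x, t)$. This formula suggests that we define a map

$$\check\partial_\mu : J^p\mathcal{Q}\longrightarrow J^{p+1}\mathcal{Q}, \qquad \check\partial_\mu q^\nu(t) = \delta^\nu_\mu, \qquad \check\partial_\mu\phi_{,n}(t) = \phi_{,n+\underline{\mu}}(t), \tag{4.6}$$

extended to the whole of $J^p\mathcal{Q}$ by Leibniz' rule and linearity. Further, define $\check\partial_m$ as in (3.1). This operator satisfies $\check\partial_m f(q(t)) = \partial_m f(q(t))$ and

$$\check\partial_\mu L_\xi = L_\xi\check\partial_\mu + \partial_\mu\xi^\nu\,\check\partial_\nu, \qquad \check\partial_\mu L(s) = L(s)\,\check\partial_\mu, \tag{4.7}$$

when acting on arbitrary functions on $J^p\mathcal{Q}$.

5. Realization in Fock Space

Consider the symplectic space $J^p\mathcal{P}$ obtained by adjoining to $J^p\mathcal{Q}$ dual coordinates (jet momenta) $(p_\mu(t), \pi^{\alpha,m}(t))$. The graded Poisson algebra $C^\infty(J^p\mathcal{P})$ is the associative, graded commutative algebra on symbols $(q^\mu(t), \phi_{\alpha,m}(t), p_\mu(t), \pi^{\alpha,m}(t))$, equipped with a compatible graded Lie structure: the Poisson bracket. The only non-zero brackets are

$$\begin{aligned}
[p_\mu(s), q^\nu(t)] &= \delta^\nu_\mu\,\delta(s-t), \\
[\pi^{\alpha,m}(s), \phi_{\beta,n}(t)] \equiv \mp[\phi_{\beta,n}(t), \pi^{\alpha,m}(s)] &= \delta^m_n\,\delta^\alpha_\beta\,\delta(s-t),
\end{aligned} \tag{5.1}$$

where we here and henceforth use the convention that the upper sign refers to bosons and the lower to fermions.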


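All functions over $S^1$ can be expanded in a Fourier series; e.g.

$$\phi_{\alpha,m}(t) = \sum_{n=-\infty}^{\infty}\hat\phi_{\alpha,m}(n)\,e^{-int} \equiv \phi^{<}_{\alpha,m}(t) + \hat\phi_{\alpha,m}(0) + \phi^{>}_{\alpha,m}(t), \tag{5.2}$$

where $\phi^{<}_{\alpha,m}(t)$ ($\phi^{>}_{\alpha,m}(t)$) is the sum over negative (positive) frequency modes only. $\hat\phi_{\alpha,m}(0)$ will be referred to as the zero mode. Quantization amounts to replacing the Poisson brackets (5.1) by graded commutators; the Fock space $J^p\mathcal{F}$ is the universal enveloping algebra modulo relations

$$q^\mu_{<}(t)|0\rangle = p^{\le}_\mu(t)|0\rangle = \pi^{\alpha,m}_{\le}(t)|0\rangle = \phi^{<}_{\alpha,m}(t)|0\rangle = 0, \tag{5.3}$$

where $p^{\le}_\mu(t) = p^{<}_\mu(t) + \hat p_\mu(0)$ and $\pi^{\alpha,m}_{\le}(t) = \pi^{\alpha,m}_{<}(t) + \hat\pi^{\alpha,m}(0)$. Normal ordering is necessary to remove infinities and to obtain a well defined action on Fock space. Let $f(q(t), \phi(t))$ be a function of $q^\mu(t)$, $\phi(t)$, as well as its derivatives $\phi_{,m}(t)$, but independent of the canonical momenta. Denote

$$\begin{aligned}
:f(q(t), \phi(t))\,p_\mu(t): &= f(q(t), \phi(t))\,p^{\le}_\mu(t) + p^{>}_\mu(t)\,f(q(t), \phi(t)), \\
:f^\alpha_\beta(q(t))\,\pi^{\beta,n}(t)\,\phi_{\alpha,m}(t): &= \pi^{\beta,n}_{>}(t)\,f^\alpha_\beta(q(t))\,\phi_{\alpha,m}(t) \pm \phi_{\alpha,m}(t)\,f^\alpha_\beta(q(t))\,\pi^{\beta,n}_{\le}(t).
\end{aligned} \tag{5.4}$$

In particular,

$$:\pi^{,n}(t)\,T^m_n(\xi)\,\phi_{,m}(t): = T^m_n(\xi)^\alpha_\beta\bigl(\pi^{\beta,n}_{>}(t)\,\phi_{\alpha,m}(t) \pm \phi_{\alpha,m}(t)\,\pi^{\beta,n}_{\le}(t)\bigr).$$

We are now ready to state the main result.

Theorem 1. The following operators provide a realization of DRO(N) on $J^p\mathcal{F}$:

$$\begin{aligned}
L_\xi &= \int dt\,\Bigl( :\xi^\mu(q(t))\,p_\mu(t): + T(\xi(q(t)), t) \Bigr), \\
T(\xi, t) &= \mp\sum_{|m|\le|n|\le p} :\pi^{,n}(t)\,T^m_n(\xi)\,\phi_{,m}(t):\,, \\
L(t) &= -:\dot q^\mu(t)\,p_\mu(t): + L'(t), \\
L'(t) &= \pm\sum_{|m|\le p}\Bigl( -:\pi^{,m}(t)\,\dot\phi_{,m}(t): + \lambda :\frac{d}{dt}\bigl(\pi^{,m}(t)\,\phi_{,m}(t)\bigr): + iw :\pi^{,m}(t)\,\phi_{,m}(t): \Bigr) \pm \binom{N+p}{p}\dim(\varrho)\,\frac{\lambda - \lambda^2 - w + w^2}{4\pi i},
\end{aligned} \tag{5.5}$$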


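where the upper sign holds for bosons and the lower sign for fermions. The abelian charges are $c_1 = 1 + c_1'$, $c_3 = 1 + c_3'$, $a_3 = 1 + a_3'$, $c_4 = 2N + c_4'$, where

$$\begin{aligned}
c_1' &= \pm\left(\binom{N+p}{p}k_1(\varrho) + \binom{N+p+1}{p-1}\dim(\varrho)\right), \\
c_2 &= \pm\left(\binom{N+p}{p}k_2(\varrho) + \binom{N+p}{p-2}\dim(\varrho) + 2\binom{N+p}{p-1}k_0(\varrho)\right), \\
c_3' &= \pm(2\lambda - 1)\left(\binom{N+p}{p}k_0(\varrho) + \binom{N+p}{p-1}\dim(\varrho)\right), \\
a_3' &= \pm(2w - 1)\left(\binom{N+p}{p}k_0(\varrho) + \binom{N+p}{p-1}\dim(\varrho)\right), \\
c_4' &= \pm 2(1 - 6\lambda + 6\lambda^2)\binom{N+p}{p}\dim(\varrho).
\end{aligned} \tag{5.6}$$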
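For orientation, the charges in (5.6) can be evaluated mechanically once $N$, $p$, the weights and the representation data of (3.7) are given. The following Python sketch is one possible transcription, under the assumption that the observer part contributes $1$ to $c_1$, $c_3$, $a_3$, and $2N$ to $c_4$, and nothing to $c_2$, as the relations stated with Theorem 1 suggest; the function name `abelian_charges` and its argument names are hypothetical.

```python
from fractions import Fraction
from math import comb


def binom(n, k):
    """Binomial coefficient, taken to be zero when k < 0."""
    return comb(n, k) if k >= 0 else 0


def abelian_charges(N, p, dim, k0, k1, k2, lam, w, bosonic=True):
    """Evaluate the abelian charges of Theorem 1 from (5.6).

    dim, k0, k1, k2 are the representation data of (3.7); lam and w are
    the causal weight and shift parameter of (4.1). The split into an
    observer part and a field part is an assumption of this sketch.
    """
    sign = 1 if bosonic else -1  # upper sign: bosons, lower sign: fermions
    b_p = binom(N + p, p)
    b_p1 = binom(N + p, p - 1)
    b_p2 = binom(N + p, p - 2)
    trace_part = b_p * k0 + b_p1 * dim
    c1_field = sign * (b_p * k1 + binom(N + p + 1, p - 1) * dim)
    c2_field = sign * (b_p * k2 + b_p2 * dim + 2 * b_p1 * k0)
    c3_field = sign * (2 * lam - 1) * trace_part
    a3_field = sign * (2 * w - 1) * trace_part
    c4_field = sign * 2 * (1 - 6 * lam + 6 * lam ** 2) * b_p * dim
    return {
        "c1": 1 + c1_field,
        "c2": c2_field,
        "c3": 1 + c3_field,
        "a3": 1 + a3_field,
        "c4": 2 * N + c4_field,
    }


# A bosonic scalar zero-jet in four dimensions with vanishing weights.
scalar_charges = abelian_charges(4, 0, dim=1, k0=0, k1=0, k2=0, lam=0, w=0)

# A fermionic scalar zero-jet in one dimension with causal weight 1/2.
fermion_charges = abelian_charges(
    1, 0, dim=1, k0=0, k1=0, k2=0, lam=Fraction(1, 2), w=0, bosonic=False
)
```

Passing `lam` as a `Fraction` keeps the arithmetic exact; with floating-point weights the same function returns floating-point charges.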

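$\dim(\varrho)$, $k_0(\varrho)$, $k_1(\varrho)$ and $k_2(\varrho)$ were defined in (3.7) and $\lambda$ and $w$ in (4.1). From (5.5) we read off the transformation laws for the jet momenta.

$$\begin{aligned}
[L_\xi, p_\nu(t)] &= -\partial_\nu\xi^\mu(q(t))\,p_\mu(t) - T(\partial_\nu\xi(q(t)), t), \\
[L(s), p_\nu(t)] &= p_\nu(s)\,\dot\delta(s-t), \\
[L_\xi, \pi^{,m}(t)] &= \sum_{|m|\le|n|\le p}\pi^{,n}(t)\,T^m_n(\xi(q(t))), \\
[L(s), \pi^{,m}(t)] &= -\dot\pi^{,m}(t)\,\delta(s-t) + (1-\lambda)\,\pi^{,m}(t)\,\dot\delta(s-t) - iw\,\pi^{,m}(t)\,\delta(s-t).
\end{aligned} \tag{5.7}$$

Note the range of the sum, which depends on the order of the jet. In particular, the top momentum $\pi^{,m}(t)$, $|m| = p$, transforms as a tensor-valued zero-jet. Without normal ordering, Theorem 1 defines a proper but highly reducible representation of diff(N); in fact, it is a continuous direct sum of $p$-jets, one for each value of the time parameter $t$. This degeneracy is lifted by the introduction of the reparametrization algebra. Using (4.6), the diff(N) generators can be written as

$$L_\xi = \int dt\,\Bigl[\xi^\mu(q(t))\Bigl(p_\mu(t) \pm \sum_{|m|\le p}\pi^{,m}(t)\,\phi_{,m+\underline{\mu}}(t)\Bigr) \mp \sum_{|m|\le p}\pi^{,m}(t)\,\check\partial_m\bigl(\xi^\mu(q(t))\,\phi_{,\underline{\mu}} + \partial_\nu\xi^\mu(q(t))\,\varrho(T^\nu_\mu)\,\phi(t)\bigr)\Bigr]. \tag{5.8}$$

All formulas simplify for zero-jets. $T(\xi, t) = \partial_\nu\xi^\mu\,T^\nu_\mu(t)$, where $T^\nu_\mu(t)$ generate the Kac–Moody algebra $\widehat{gl(N)}$,

$$\begin{aligned}
[T^\mu_\nu(s), T^\sigma_\tau(t)] &= \bigl(\delta^\sigma_\nu T^\mu_\tau(s) - \delta^\mu_\tau T^\sigma_\nu(s)\bigr)\,\delta(s-t) \mp \frac{1}{2\pi i}\bigl(k_1(\varrho)\,\delta^\mu_\tau\delta^\sigma_\nu + k_2(\varrho)\,\delta^\mu_\nu\delta^\sigma_\tau\bigr)\,\dot\delta(s-t), \\
[L'(s), T^\mu_\nu(t)] &= T^\mu_\nu(s)\,\dot\delta(s-t) \mp \frac{k_0(\varrho)}{4\pi i}\,\delta^\mu_\nu\bigl(\ddot\delta(s-t) + i\dot\delta(s-t)\bigr).
\end{aligned} \tag{5.9}$$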


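It should be stressed that the action in Theorem 1 on $J^p\mathcal{F}$ is manifestly well defined, at least for the subalgebra of polynomial vector fields. Namely, a monomial basis for $J^p\mathcal{F}$ is given by finite strings in the non-negative modes $\hat q^\mu(n)$, $\hat p_\mu(n)$, $\hat\phi_{\alpha,m}(n)$, $\hat\pi^{\alpha,m}(n)$, $n\ge 0$, $|m|\le p$, and a generic element is a finite linear combination of such monomials. For $\xi$ a polynomial vector field, finiteness is preserved by (5.5).

Split the delta function into positive and negative frequency parts:

$$\delta^{>}(t) = \frac{1}{2\pi}\sum_{m>0}e^{-imt}, \qquad \delta^{\le}(t) = \frac{1}{2\pi}\sum_{m\le 0}e^{-imt}. \tag{5.10}$$

Lemma 2 ([14]).

i. $\displaystyle\delta^{>}(t)\,\delta^{\le}(-t) - \delta^{>}(-t)\,\delta^{\le}(t) = -\frac{1}{2\pi i}\,\dot\delta(t)$,

ii. $\displaystyle\delta^{>}(t)\,\dot\delta^{\le}(-t) - \dot\delta^{>}(-t)\,\delta^{\le}(t) = \frac{1}{4\pi i}\bigl(\ddot\delta(t) + i\dot\delta(t)\bigr)$,

iii. $\displaystyle\dot\delta^{>}(t)\,\dot\delta^{\le}(-t) - \dot\delta^{>}(-t)\,\dot\delta^{\le}(t) = \frac{1}{12\pi i}\bigl(\dddot\delta(t) + \dot\delta(t)\bigr)$.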
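Lemma 2 can be checked mode by mode: for each frequency $k$, only finitely many pairs $(m, n)$ with $m > 0$, $n \le 0$ contribute to the coefficient of $e^{-ikt}$. The following Python sketch compares these coefficients, in units of $1/4\pi^2$, with those of the right-hand sides, namely $k$, $i(k^2 - k)/2$ and $(k^3 - k)/6$. It is an illustrative check written for this purpose, not a reproduction of the argument in [14], and the function names are assumptions.

```python
def mode_pairs(k):
    """Pairs (m, n) with m > 0, n <= 0 and m - n = k; empty unless k >= 1."""
    return [(m, m - k) for m in range(1, k + 1)]


def product_coeff(k, a, b):
    """Sum of (-im)^a (-in)^b over mode pairs with m - n = k.

    This is the coefficient of exp(-ikt) / (4 pi^2) in the product of the
    a-th derivative of delta^> at t and the b-th derivative of delta^<= at -t.
    """
    return sum((-1j * m) ** a * (-1j * n) ** b for m, n in mode_pairs(k))


def lhs(k, a1, b1, a2, b2):
    """Coefficient of exp(-ikt) / (4 pi^2) of a left-hand side in Lemma 2.

    The second product has t replaced by -t, so it feeds frequency -k.
    """
    return product_coeff(k, a1, b1) - product_coeff(-k, a2, b2)


def rhs(k, item):
    """Coefficient of exp(-ikt) / (4 pi^2) of the right-hand sides."""
    if item == "i":
        return k
    if item == "ii":
        return 1j * (k * k - k) / 2
    return (k ** 3 - k) / 6


# Derivative orders (a1, b1, a2, b2) for the three items of the lemma.
orders = {"i": (0, 0, 0, 0), "ii": (0, 1, 1, 0), "iii": (1, 1, 1, 1)}

max_error = max(
    abs(lhs(k, *orders[item]) - rhs(k, item))
    for item in orders
    for k in range(-8, 9)
)
```

The right-hand side coefficients follow from $\delta(t) = \frac{1}{2\pi}\sum_k e^{-ikt}$, so that each time derivative brings down a factor $-ik$.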

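Lemma 3. Let $\pi^A(s)$, $\phi_B(t)$, $s, t\in S^1$, generate a graded Heisenberg algebra, with non-zero brackets $[\pi^A(s), \phi_B(t)] = \delta^A_B\,\delta(s-t)$. Then

$$\begin{aligned}
[\pi^A_{>}(s), \phi_B(t)] &= \delta^A_B\,\delta^{>}(s-t), & [\phi_B(s), \pi^A_{>}(t)] &= \mp\delta^A_B\,\delta^{>}(t-s), \\
[\pi^A_{\le}(s), \phi_B(t)] &= \delta^A_B\,\delta^{\le}(s-t), & [\phi_B(s), \pi^A_{\le}(t)] &= \mp\delta^A_B\,\delta^{\le}(t-s).
\end{aligned}$$

Lemma 4. Define

$$F(t) = \mp :\pi^A(t)\,\dot\phi_A(t):\,, \qquad E^A_B(t) = \mp :\pi^A(t)\,\phi_B(t):\,, \tag{5.11}$$

where $\pi^A(s)$ and $\phi_B(t)$, defined as in the previous lemma, carry the same statistics (so $E^A_B(t)$ is bosonic). Then

$$[F(s), F(t)] = (F(s) + F(t))\,\dot\delta(s-t) \pm \delta^A_A\,\frac{1}{12\pi i}\bigl(\dddot\delta(s-t) + \dot\delta(s-t)\bigr), \tag{5.12}$$

$$[F(s), E^A_B(t)] = E^A_B(s)\,\dot\delta(s-t) \mp \delta^A_B\,\frac{1}{4\pi i}\bigl(\ddot\delta(s-t) + i\dot\delta(s-t)\bigr), \tag{5.13}$$

$$[E^A_B(s), E^C_D(t)] = \bigl(\delta^C_B E^A_D(s) - \delta^A_D E^C_B(s)\bigr)\,\delta(s-t) \mp \delta^A_D\delta^C_B\,\frac{1}{2\pi i}\,\dot\delta(s-t). \tag{5.14}$$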


Proof. This lemma follows by direct calculation. The technique is illustrated for (5.13) only, [F (s), EBA (t)] = [∓π>C (s)φ˙ C (s) − φ˙ C (s)πC (s), ∓π>A (t)φB (t) − φB (t)πA (t)] d = π>A (s) (∓δ > (t − s))φB (t) ± π>A (t)δ > (s − t)φ˙ B (s) ds d ± δ > (s − t)πA (t)φ˙ B (s) ± π>A (s)φB (t) (∓δ (t − s)) ds d ± ± φ˙ B (s)π>A (t)δ (s − t) + (∓δ > (t − s))φB (t)πA (s) ds d A + φ˙ B (s)δ (s − t)π (t) ± φB (t) (∓δ (t − s))πA (s) ds d A A = ∓π> (s)φB (t) δ(t − s) ± π> (t)φ˙ B (s)δ(s − t) (5.15) ds d + (∓δ > (t − s))δBA δ (s − t) + φ˙ B (s)πA (t)δ(s − t) ds d d ± δ > (s − t) (δ (t − s))δBA − φB (t)πA (s) δ(t − s) ds ds d A A ˙ = ∓ :π (s)φB (t): δ(t − s) ± :π (t)φB (s): δ(s − t) ds ∓ δBA (δ > (s − t)δ˙ (t − s) − δ˙> (t − s)δ (s − t)). The result now follows by collecting terms and applying Lemma 2 to obtain the central extension. Proof of Theorem 1. First we note that in proving the brackets with q µ (t), normal ordering is irrelevant because Lξ is linear in pµ (t). This part is straightforward and not given here. We now turn to diffeomorphisms, and set L0ξ = ds :ξ µ (s)pµ (s): , where we abbreviate ξ µ (q(s)) = ξ µ (s), etc., 0 0 [Lξ , Lη ] = dsdt [ξ µ (s)pµ (s) + pµ> (s)ξ µ (s), ην (t)pν (t) + pν> (t)ην (t)] = dsdt ξ µ (s)(∂µ ην (t)δ (s − t))pν (t) + ην (t)(−∂ν ξ µ (s)δ (t − s))pµ (s) + ξ µ (s)pν> (t)(∂µ ην (t)δ (s − t)) + (−∂ν ξ µ (s)δ > (t − s))ην (t)pµ (s) + (∂µ ην (t)δ > (s − t))pν (t)ξ µ (s) + pµ> (s)ην (t)(−∂ν ξ µ (s)δ (t − s))

+ pν> (t)(∂µ ην (t)δ > (s − t))ξ µ (s) + pµ> (s)(−∂ν ξ µ (s)δ > (t − s))ην (t) = dsdt ξ µ (s)∂µ ην (t)pν (t)δ(s − t) + ∂µ ην (t)δ > (s − t)∂ν ξ µ (s)δ (t − s) + pµ> (t)ξ µ (s)∂µ ην (t)δ(s − t) − ∂ν ξ µ (s)δ > (t − s)∂µ ην (t)δ (s − t) − ην (t)∂ν ξ µ (s)pµ (s)δ(s − t) − pµ> (s)ην (t)∂ν ξ µ (s)δ(s − t) (5.16) = dsdt :ξ µ (s)∂µ ην (t)pν (t): δ(s − t) − :ην (t)∂ν ξ µ (s)pµ (s): δ(t − s) + ∂µ ην (t)∂ν ξ µ (s)(δ > (s − t)δ (t − s) − δ > (t − s)δ (s − t)).


We now apply Lemma 2 and integrate by parts, which yields 1 [L0ξ , L0η ] = L0[ξ,η] + ds ∂ν ξ˙ µ (s)∂µ ην (s) 2π i ρ = L0[ξ,η] + S1 (∂ρ ∂ν ξ µ ∂µ ην ).

(5.17)

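Let capital indices run over both tensor indices and multi-indices, e.g. $A = (\alpha, \mathbf{m})$, $\pi^A(s) = \pi^{\alpha,\mathbf{m}}(s)$, $\phi_B(t) = \phi_{\beta,\mathbf{m}}(t)$. Now, $L_\xi = L^0_\xi + \int dt\, T(\xi(q(t)), t)$, where $T(\xi, t) = T^B_A(\xi) E^A_B(t)$ in an obvious notation. $E^A_B(t)$ is defined in (5.11), and the matrices $T^A_B(\xi)$ satisfy the relations (3.6) and Lemma 1, with

$$\delta^A_A = \sum_{|\mathbf{m}|\le p} \delta^{\mathbf{m}}_{\mathbf{m}}, \qquad T^A_A(\xi) = \sum_{|\mathbf{m}|\le p} \operatorname{tr} T^{\mathbf{m}}_{\mathbf{m}}(\xi), \qquad T^A_B(\xi)\,T^B_A(\eta) = \sum_{\substack{|\mathbf{m}|\le|\mathbf{n}|\le p \\ |\mathbf{n}|\le|\mathbf{m}|\le p}} \operatorname{tr} T^{\mathbf{m}}_{\mathbf{n}}(\xi)\,T^{\mathbf{n}}_{\mathbf{m}}(\eta). \tag{5.18}$$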

It follows from (5.14) that

$$\begin{aligned}
[T(\xi(s), s), T(\eta(t), t)] &= \big(T^C_A(\xi(s))\,T^B_C(\eta(t)) - T^B_C(\xi(s))\,T^C_A(\eta(t))\big)E^A_B(s)\,\delta(s-t) \mp \frac{1}{2\pi i}\,T^A_B(\xi(s))\,T^B_A(\eta(t))\,\dot\delta(s-t) \\
&= \big(T([\xi(s), \eta(t)], s) - \xi^\mu(s)\,T(\partial_\mu\eta(t), s) + \eta^\nu(t)\,T(\partial_\nu\xi(s), s)\big)\delta(s-t) \\
&\quad - \frac{1}{2\pi i}\big(c_1\,\partial_\nu\xi^\mu(s)\,\partial_\mu\eta^\nu(t) + c_2\,\partial_\mu\xi^\mu(s)\,\partial_\nu\eta^\nu(t)\big)\dot\delta(s-t),
\end{aligned} \tag{5.19}$$

where the parameters $c_1$ and $c_2$ can now be computed from Lemma 1, with the result (5.6). Further, $[L^0_\xi, T(\eta(t), t)] = \xi^\mu(t)\,T(\partial_\mu\eta(t), t)$, without extension. This concludes the proof for the diff(N) subalgebra.

Next we turn to reparametrizations. They generate a Virasoro algebra with central charge $c$, which may be written as

$$[L(s), L(t)] = (L(s) + L(t))\,\dot\delta(s-t) + \frac{c}{24\pi i}\big(\dddot\delta(s-t) + \dot\delta(s-t)\big). \tag{5.20}$$

Set $L^0(s) = -{:}\dot q^\mu(s)\,p_\mu(s){:}$. This is recognized as being of the same form as $F(s)$ in Lemma 4, with $N$ bosonic fields $q^\mu(s)$, and thus these operators generate a Virasoro algebra with central charge $2N$. Set $L(t) = L^0(t) + L'(t)$, where

$$L'(t) = F(t) - \lambda\,\dot E^A_A(t) - i w\,E^A_A(t) \pm \delta^A_A\,\frac{\lambda - \lambda^2 - w + w^2}{4\pi i}. \tag{5.21}$$

By Lemma 4, these operators generate a Virasoro algebra with central charge $\pm 2(1 - 6\lambda + 6\lambda^2)\,\delta^A_A$. Moreover, $[L^0(s), L'(t)] = 0$ and the parameter $c$ in (5.6) follows from Lemma 1.
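The central charge quoted above is easy to tabulate. The following is a minimal sketch, assuming that the upper sign refers to bosonic fields and that `components` stands for the trace $\delta^A_A$; the function name and its arguments are illustrative and do not come from the paper.

```python
from fractions import Fraction


def virasoro_central_charge(lam, components=1, bosonic=True):
    """Return +-2 (1 - 6 lam + 6 lam^2) * components.

    Assumption: the upper (+) sign is taken for bosonic fields, the lower
    (-) sign for fermionic ones; `components` plays the role of delta^A_A.
    """
    lam = Fraction(lam)
    sign = 1 if bosonic else -1
    return sign * 2 * (1 - 6 * lam + 6 * lam**2) * components


if __name__ == "__main__":
    # lam = 0 with N = 4 bosonic components gives 2N, as stated for L^0(s).
    print(virasoro_central_charge(0, components=4))
    # Anticommuting systems of weight 2 and weight 1/2.
    print(virasoro_central_charge(2, bosonic=False))
    print(virasoro_central_charge(Fraction(1, 2), bosonic=False))
```

With $\lambda = 0$ the formula reduces to $2$ per bosonic component, which matches the value $2N$ quoted for the observer part $L^0(s)$.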


Finally, we want to prove that [L(s), Lξ ] = [L

0

(s), L0ξ ]

=−

=−

1 (c3 ∂µ ξ¨ µ (s) + ia3 ∂µ ξ˙ µ (s)). 4π i

(5.22)

dt [q˙ µ (s)pµ (s) + pµ> (s)q˙ µ (s), ξ ν (t)pν (t) + pν> (t)ξ ν (t)]

dt q˙ µ (s)(∂µ ξ ν (t)δ (s − t))pν (t) − ξ ν (t)

d µ (δ δ (t − s))pµ (s) ds ν

d µ > (δ δ (t − s))ξ ν (t)pµ (s) ds ν d + (∂µ ξ ν (t)δ > (s − t))pν (t)q˙ µ (s) − pµ> (s)ξ ν (t) (δνµ δ (t − s)) ds d µ > > ν > µ > + pν (t)(∂µ ξ (t)δ (s − t))q˙ (s) − pµ (s) (δν δ (t − s))ξ ν (t) (5.23) ds d = − dt q˙ µ (s)∂µ ξ ν (t)pν (t)δ(s − t) + ∂µ ξ µ (t)δ > (s − t) δ (t − s) ds d − ξ µ (t)pµ (s) δ(t − s) + pν> (t)q˙ µ (s)∂µ ξ ν (t)δ(s − t) ds d > d − δ (t − s)∂µ ξ µ (t)δ (s − t) − pµ> (s)ξ µ (t) δ(t − s) ds ds + q˙ µ (s)pν> (t)(∂µ ξ ν (t)δ (s − t)) −

=

˙ − t) dt − : q˙ µ (s)∂µ ξ ν (t)pν (t): δ(s − t) + :ξ µ (t)pµ (s): δ(s

+ ∂µ ξ µ (t)(δ > (s − t)δ˙ (t − s) − δ˙> (t − s)δ (s − t)) 1 = (∂µ ξ¨ µ (s) + i∂µ ξ˙ µ (s)), 4πi [L0 (s), T (ξ(t), t)] = −q˙ µ (s)T (∂µ ξ(t), t)δ(s − t),

(5.24)

$$[L'(s), T(\xi(t), t)] = T(\xi(t), s)\,\dot\delta(s-t) \pm \frac{1}{4\pi i}\,T^A_A(\xi(t))\big((2\lambda - 1)\,\ddot\delta(s-t) + (2w - 1)\,i\,\dot\delta(s-t)\big). \tag{5.25}$$

To compute $[L(s), \int dt\, T(\xi(t), t)]$, we note that the regular pieces from (5.24) and (5.25) cancel, whereas the extension acquires the form (5.22). The parameters $c_3$ and $a_3$ now follow from Lemma 1.

The Fock module described in Theorem 1 is reducible, because it can be decomposed according to the number of $\phi$'s, with the canonical momenta counted negatively. If there are several independent field species, a finer decomposition is possible. An alternative way to see this is as follows. Let us refer to $\hat\phi_{,\mathbf{m}}(n)$ and $\hat\pi^{,\mathbf{m}}(n)$ as phase space modes of frequency $n$. The reparametrization generators can be split as $L(s) = L^{\ge}(s) + L^{<}(s)$, where the raising operators $L^{\ge}(s)$ consist of Fourier modes of non-negative frequency (as measured by the Hamiltonian (2.8)), and the lowering operators $L^{<}(s)$ consist of negative ones. Clearly, every lowering operator contains at least one negative frequency phase space mode. Because all expressions are normal ordered, lowering operators thus annihilate


the vacuum. A similar decomposition should be applied to $L_\xi = L^{\ge}_\xi + L^{<}_\xi$, but since $[L(s), L_\xi] = 0$ classically, there are no such lowering operators. Define a cyclic state $|\emptyset\rangle$ to be a state annihilated by all lowering operators:

$$L^{<}(s)\,|\emptyset\rangle = 0. \tag{5.26}$$

As is well known, an irreducible representation contains only one cyclic state. Since the vacuum $|0\rangle$ is cyclic, the existence of additional cyclic states signals reducibility. The following theorem describes some cyclic states and their energies, but no claim is made that the list is exhaustive.

Theorem 2. The lowest energy (2.9) of the Fock representation in Theorem 1 is

1 N +p 1 1 h=∓ dim(+)((w − )2 − (λ − )2 ). (5.27) 2 p 2 2 n 0 , n 0, is cyclic with energy ˆ For a scalar bosonic zero-jet, the state n = (φ(0)) h(n) = h + nw. For fermionic tensor-valued p-jets, set ˆ 4 (n) = < φˆ α,m (n), α |m|4

ˆ −4−1 (n) = <

πˆ α,m (n)

(5.28)

α 4|m|p

(4 0), where the products run over all components. The states k, 4 = < ˆ 4 (k − 1) . . . < ˆ 4 (0)0 , k, −4 − 1 = < ˆ −4−1 (k − 1) . . . < ˆ −4−1 (0)0 ,

(5.29)

are cyclic, with energy

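$$h(k, \ell) = h + \frac{1}{2}\binom{N+\ell}{\ell}\dim(\varrho)\,\big(k^2 + (2w - 1)k\big), \tag{5.30}$$

$$h(k, -\ell-1) = h + \frac{1}{2}\left(\binom{N+p}{p} - \binom{N+\ell-1}{\ell-1}\right)\dim(\varrho)\,\big(k^2 - (2w + 1)k\big).$$

Proof. Set $\hat L(m) = -i\int ds\, e^{ims} L(s)$. Then the Virasoro algebra takes the form

$$\begin{aligned}
[\hat L(m), \hat L(n)] &= (n - m)\,\hat L(m+n) - \frac{c}{12}(m^3 - m)\,\delta(m+n), \\
[\hat L(m), \hat\phi_{,\mathbf{m}}(n)] &= (n + (1 - \lambda)m + w)\,\hat\phi_{,\mathbf{m}}(m+n), \\
[\hat L(m), \hat\pi^{,\mathbf{m}}(n)] &= (n - \lambda m - w)\,\hat\pi^{,\mathbf{m}}(m+n),
\end{aligned} \tag{5.31}$$

and the Hamiltonian is $H = \hat L(0)$ (2.8). The action on the vacuum is (excluding the observer)

$$\hat L'(m)\,|0\rangle = \pm\sum_{n=0}^{m-1}\sum_{|\mathbf{m}|\le p} (n - \lambda m + w)\,\hat\pi^{,\mathbf{m}}(m-n)\,\hat\phi_{,\mathbf{m}}(n)\,|0\rangle. \tag{5.32}$$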
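The mode form of the Virasoro bracket in (5.31) is only consistent if its central term satisfies the Jacobi identity. The sketch below checks this numerically on a finite range of integer modes; the function names are illustrative, and the central term is multiplied by $12/c$ so that the arithmetic stays in the integers.

```python
def central_term(m, n):
    """Central term of [L(m), L(n)], rescaled by 12/c: -(m^3 - m) delta(m+n)."""
    return -(m**3 - m) if m + n == 0 else 0


def jacobi_defect(m, n, p):
    """Central part of [[L(m),L(n)],L(p)] + cyclic permutations.

    Uses [L(m), L(n)] = (n - m) L(m+n) + central_term(m, n), so the central
    part of [[L(m),L(n)],L(p)] is (n - m) * central_term(m + n, p).
    """
    return ((n - m) * central_term(m + n, p)
            + (p - n) * central_term(n + p, m)
            + (m - p) * central_term(p + m, n))


def cocycle_holds(bound=8):
    """Check the Jacobi identity for all modes in [-bound, bound]."""
    modes = range(-bound, bound + 1)
    return all(jacobi_defect(m, n, p) == 0
               for m in modes for n in modes for p in modes)


if __name__ == "__main__":
    print(cocycle_holds())
```

The defect can only be non-zero when $m + n + p = 0$, and there it vanishes because the cyclic sum of $a^3(b - c)$ is proportional to $a + b + c$.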


To compute parameters, note that

$$[\hat L'(m), \hat L'(-m)]\,|0\rangle = \Big(-\frac{c}{12}\,m^3 + \Big(\frac{c}{12} - 2h\Big)m\Big)|0\rangle.$$

A straightforward calculation shows that c is given by (5.6) and h by (5.27). The property that φ(t) is a scalar-valued zero-jet is preserved by Lξ and L(s). Moreover, any lowering operator gives negative-frequency phase space modes when acting ˆ ˆ on a zero mode, and hence the state is cyclic. The energy follows from [L(0), φ(0)] = ˆ wφ(0). ˆ 4 (n), jets of order |m| 4 Now consider fermions and 4 > 0. When Lξ acts on < are produced, but no higher-order jets. Also, L(s) preserves jet order. When acting on <4 (n), a lowering operator produces a sum of terms, each containing at least one phase space mode with frequency less than n, and jet order at most 4. However, the state k, 4 is the product of all such modes, so the fermionic property makes all these terms vanish. Hence k, 4 is cyclic. The energy h(k, 4) follows from the following calculation and Lemma 1: ˆ [L(0), φˆ ,m (n)] = (n + w)φ,m (n), ˆ ˆ 4 (n)] = ˆ 4 (n) [L(0), < (n + w)< (5.33) |m|4

N +4 ˆ 4 (n), = (n + w) dim(+) < 4

k−1 ˆ k, 4 = h + N + 4 dim(+) L(0) (n + w) k, 4 . 4 n=0

The case 4 < 0 is completely analogous, except that Lξ increases the jet order.
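The energies in (5.30) come from summing the shift $n + w$ of each mode over the frequencies $n = 0, \dots, k-1$ and multiplying by the number of components. The following sketch compares that direct sum with the closed form $\frac{1}{2}(k^2 + (2w-1)k)$; the function names, and the use of a plain integer `dim_rho` for the dimension of the tensor representation, are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def energy_shift_by_summation(k, ell, w, N, dim_rho=1):
    """Sum (n + w) over n = 0..k-1, times the number of jet components.

    The number of multi-indices with |m| <= ell in N variables is
    comb(N + ell, ell); dim_rho counts the tensor components.
    """
    w = Fraction(w)
    components = comb(N + ell, ell) * dim_rho
    return components * sum(n + w for n in range(k))


def energy_shift_closed_form(k, ell, w, N, dim_rho=1):
    """Closed form (1/2) comb(N+ell, ell) dim_rho (k^2 + (2w - 1) k)."""
    w = Fraction(w)
    return Fraction(1, 2) * comb(N + ell, ell) * dim_rho * (k**2 + (2 * w - 1) * k)


if __name__ == "__main__":
    args = dict(k=3, ell=2, w=Fraction(1, 2), N=4, dim_rho=4)
    print(energy_shift_by_summation(**args), energy_shift_closed_form(**args))
```

The same computation with $n - w$ in place of $n + w$ gives the factor $k^2 - (2w+1)k$ that appears in the momentum case.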

6. Gauge Algebra

Consider the gauge algebra map(N, g), i.e. maps from N-dimensional spacetime to a finite-dimensional Lie algebra g, where g has basis $J^a$ (hermitian if g is compact and semisimple), structure constants $f^{ab}{}_c$, and Killing metric $\delta^{ab}$. The brackets are

$$[J^a, J^b] = i f^{ab}{}_c\, J^c. \tag{6.1}$$

Let $\delta^a \propto \operatorname{tr} J^a$ be a privileged vector satisfying $f^{ab}{}_c\,\delta^c \equiv 0$. Clearly, $\delta^a = 0$ if $J^a \in [g, g]$, but it may be non-zero on abelian factors. The primary example is gl(d), where $\operatorname{tr} J^i_j \propto \delta^i_j$. Our notation is similar to [10]. Let $X = X_a(x) J^a$, $x \in \mathbb{R}^N$, be a g-valued function and define $[X, Y] = i f^{ab}{}_c\, X_a Y_b J^c$. The generators of map(N, g) are denoted by $J_X$. The DGRO (Diffeomorphism, Gauge, Reparametrization, Observer) algebra DGRO(N, g) has brackets

$$\begin{aligned}
[J_X, J_Y] &= J_{[X,Y]} - \frac{1}{2\pi i}(c_5\,\delta^{ab} + c_8\,\delta^a\delta^b)\int dt\, \dot q^\rho(t)\,\partial_\rho X_a(q(t))\,Y_b(q(t)), \\
[L_f, J_X] &= \frac{1}{4\pi i}\,\delta^a\int dt\, \big(c_6\,\ddot f(t) - i a_6\,\dot f(t)\big)X_a(q(t)), \\
[L_\xi, J_X] &= J_{\xi^\mu\partial_\mu X} - \frac{c_7}{2\pi i}\,\delta^a\int dt\, \dot q^\rho(t)\,X_a(q(t))\,\partial_\rho\partial_\mu\xi^\mu(q(t)), \\
[J_X, q^\mu(t)] &= 0,
\end{aligned} \tag{6.2}$$

in addition to (2.3).


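Alternatively, we can describe DGRO(N, g) by the relations

$$\begin{aligned}
[J_X, J_Y] &= J_{[X,Y]} - (c_5\,\delta^{ab} + c_8\,\delta^a\delta^b)\,S_1^\rho(\partial_\rho X_a\, Y_b), \\
[L_f, J_X] &= \frac{1}{2}\,\delta^a\, S_0\big((c_6\,\ddot f - i a_6\,\dot f)X_a\big), \\
[L_\xi, J_X] &= J_{\xi^\mu\partial_\mu X} - c_7\,\delta^a\, S_1^\rho(X_a\,\partial_\rho\partial_\mu\xi^\mu), \\
[J_X, S_0(F)] &= [J_X, S_1^\rho(F_\rho)] = 0,
\end{aligned} \tag{6.3}$$

in addition to (2.6). The cocycle proportional to $a_6$ can be removed by the redefinition

$$J_X \to J_X + \frac{i a_6}{2}\,\delta^a S_0(X_a),$$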

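($\delta^a [X, Y]_a = 0$), while the remaining terms define non-trivial extensions. In particular, we recognize the $c_5$ term as the higher-dimensional generalization of the affine Kac–Moody algebra $\hat g$. The present notation has the advantage that all abelian charges $c_j$, $j = 1, \dots, 8$, can be discussed collectively.

Let $M$ be a g representation. We write $T^\mu_\nu = T^\mu_\nu \oplus 1$, $J^a = 1 \oplus J^a$, $1 = 1 \oplus 1$ for elements in gl(N) ⊕ g, and abbreviate $M^a = M(1 \oplus J^a)$. map(N, g) acts on $J^pQ$ and $J^pP$ in the following fashion (V indices suppressed):

$$\begin{aligned}
[J_X, \phi_{,\mathbf{n}}(t)] &= -\sum_{|\mathbf{m}|\le|\mathbf{n}|} J^{\mathbf{m}}_{\mathbf{n}}(X(q(t)))\,\phi_{,\mathbf{m}}(t), \\
[J_X, \pi^{,\mathbf{m}}(t)] &= \sum_{|\mathbf{m}|\le|\mathbf{n}|\le p} \pi^{,\mathbf{n}}(t)\,J^{\mathbf{m}}_{\mathbf{n}}(X(q(t))), \\
J^{\mathbf{m}}_{\mathbf{n}}(X) &\equiv \binom{\mathbf{n}}{\mathbf{m}}\,\partial_{\mathbf{n}-\mathbf{m}} X_a\, M^a, \\
[J_X, q^\mu(t)] &= [J_X, p_\nu(t)] = 0.
\end{aligned} \tag{6.4}$$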

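The expression for the matrices $J^{\mathbf{m}}_{\mathbf{n}}(X)$, with components $J^{\mathbf{m}}_{\mathbf{n}}(X)^\alpha{}_\beta$, follows immediately from $[J_X, \phi_{,\mathbf{n}}(t)] = \partial_{\mathbf{n}}(-X_a(q(t))\,M^a\,\phi(q(t)))$. They satisfy the following relations:

$$\begin{aligned}
J^{\mathbf{m}}_{\mathbf{n}+\mu}(X) &= J^{\mathbf{m}}_{\mathbf{n}}(\partial_\mu X) + J^{\mathbf{m}-\mu}_{\mathbf{n}}(X), \\
J^{\mathbf{m}}_{0}(X) &= \delta^{\mathbf{m}}_{0}\, X_a M^a, \\
\partial_\mu J^{\mathbf{m}}_{\mathbf{n}}(X) &= J^{\mathbf{m}}_{\mathbf{n}}(\partial_\mu X), \\
J^{\mathbf{m}}_{\mathbf{n}}([X, Y]) &= \sum_{|\mathbf{m}|\le|\mathbf{r}|\le|\mathbf{n}|} \big(J^{\mathbf{r}}_{\mathbf{n}}(X)\,J^{\mathbf{m}}_{\mathbf{r}}(Y) - J^{\mathbf{r}}_{\mathbf{n}}(Y)\,J^{\mathbf{m}}_{\mathbf{r}}(X)\big), \\
J^{\mathbf{m}}_{\mathbf{n}}(\xi^\mu\partial_\mu X) &= \xi^\mu J^{\mathbf{m}}_{\mathbf{n}}(\partial_\mu X) + \sum_{|\mathbf{m}|\le|\mathbf{r}|\le|\mathbf{n}|} \big(T^{\mathbf{r}}_{\mathbf{n}}(\xi)\,J^{\mathbf{m}}_{\mathbf{r}}(X) - J^{\mathbf{r}}_{\mathbf{n}}(X)\,T^{\mathbf{m}}_{\mathbf{r}}(\xi)\big).
\end{aligned} \tag{6.5}$$

In particular, $J^{\mathbf{m}}_{\mathbf{n}}(X) = 0$ if $|\mathbf{m}| > |\mathbf{n}|$ and $J^{\mathbf{m}}_{\mathbf{n}}(X) = X_a M^a\,\delta^{\mathbf{m}}_{\mathbf{n}}$ if $|\mathbf{m}| = |\mathbf{n}|$. Set $\operatorname{tr} M^a = z_M\,\delta^a$ and $\operatorname{tr} M^a M^b = y_M\,\delta^{ab} + w_M\,\delta^a\delta^b$. For g semisimple, $w_M = z_M = 0$ and $y_M = \psi^2 x_M$, where $\psi$ is the highest root of g and $x_M$ is a positive integer (the Dynkin index of the g representation $M$) [10]. The analog of Lemma 1 is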


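Lemma 5.

i. $$\sum_{|\mathbf{m}|\le p} \operatorname{tr} J^{\mathbf{m}}_{\mathbf{m}}(X) = X_a\, z_M\,\delta^a \binom{N+p}{p}\dim(\varrho),$$

ii. $$\sum_{|\mathbf{m}|,|\mathbf{n}|\le p} \operatorname{tr} J^{\mathbf{m}}_{\mathbf{n}}(X)\,J^{\mathbf{n}}_{\mathbf{m}}(Y) = (y_M\,\delta^{ab} + w_M\,\delta^a\delta^b)\binom{N+p}{p}\dim(\varrho)\,X_a Y_b,$$

iii. $$\sum_{|\mathbf{m}|,|\mathbf{n}|\le p} \operatorname{tr} T^{\mathbf{m}}_{\mathbf{n}}(\xi)\,J^{\mathbf{n}}_{\mathbf{m}}(X) = \partial_\mu\xi^\mu\, X_a\, z_M\,\delta^a\left(\binom{N+p}{p}k_0(\varrho) - \binom{N+p}{p-1}\dim(\varrho)\right).$$

Proof. As in Lemma 1, only terms with $|\mathbf{m}| = |\mathbf{n}|$ contribute to the sums, and we can hence think of $J^{\mathbf{m}}_{\mathbf{n}}(X)$ and $T^{\mathbf{m}}_{\mathbf{n}}(\xi)$ as representation matrices in $\varrho \otimes M \otimes S_\ell$. Hence

$$\text{i.} = \sum_{\ell=0}^{p} X_a \operatorname{tr} M^a \dim(S_\ell)\dim(\varrho),$$

$$\text{ii.} = \sum_{\ell=0}^{p} \operatorname{tr} M^a M^b\, X_a Y_b \dim(S_\ell)\dim(\varrho),$$

$$\text{iii.} = \sum_{\ell=0}^{p} X_a \operatorname{tr} M^a\, \partial_\mu\xi^\mu\,\big(k_0(\varrho)\dim(S_\ell) + \dim(\varrho)\,k_0(S_\ell)\big).$$

We now apply (3.12) and use the definition of $y_M$, $z_M$ and $w_M$.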
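The binomial coefficients in Lemma 5 arise from counting multi-indices. Equation (3.12) is not reproduced in this section, so the sketch below only checks the two standard counting identities that such a step relies on, under the assumption that $S_\ell$ denotes the multi-indices of order $\ell$ in $N$ variables. All function names are illustrative.

```python
from itertools import product
from math import comb


def dim_sym(N, ell):
    """Number of multi-indices m = (m_1, ..., m_N) with |m| = ell."""
    return comb(N + ell - 1, ell)


def dim_sym_brute_force(N, ell):
    """Same count, obtained by explicit enumeration (small N and ell only)."""
    return sum(1 for m in product(range(ell + 1), repeat=N) if sum(m) == ell)


def count_jet_components(N, p):
    """Number of multi-indices with |m| <= p; equals comb(N + p, p)."""
    return sum(dim_sym(N, ell) for ell in range(p + 1))


def weighted_count(N, p):
    """Sum of ell * dim(S_ell) for ell <= p; equals N * comb(N + p, p - 1) for p >= 1."""
    return sum(ell * dim_sym(N, ell) for ell in range(p + 1))


if __name__ == "__main__":
    N, p = 4, 3
    print(count_jet_components(N, p), comb(N + p, p))
    print(weighted_count(N, p), N * comb(N + p, p - 1))
```

The first identity gives the factor $\binom{N+p}{p}$ in parts i and ii; the second is the kind of sum that produces $\binom{N+p}{p-1}$ in part iii, with a sign and normalization fixed by the conventions for $k_0(S_\ell)$ given earlier in the paper.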

Theorem 3. The following operators, together with the operators in Theorem 1, yield a realization of the algebra DGRO(N, g) on the Fock space $J^pF$,

$$J_X = \int dt\, J(X(q(t)), t), \qquad J(X, t) = \mp\sum_{|\mathbf{m}|\le|\mathbf{n}|\le p} {:}\pi^{,\mathbf{n}}(t)\,J^{\mathbf{m}}_{\mathbf{n}}(X)\,\phi_{,\mathbf{m}}(t){:}. \tag{6.6}$$

The parameters are

c5 = c6 = a6 = c7 = c8 =

N +p dim(+), ∓yM p

N +p a ±zM δ (2λ − 1) dim(+), p

N +p ±zM δ a (2w − 1) dim(+), p

N +p N +p a k0 (+) + ∓zM δ dim(+) , p p−1

N +p dim(+). ∓wM p

(6.7)


Proof. We use the same notation as in the proof of Theorem 1. In particular, capital indices $A = (\alpha, \mathbf{m})$ run over both internal and multi-indices, and we write $X(s) = X(q(s))$, etc. Equation (6.6) can be written as $J(X, s) = J^B_A(X)\,E^A_B(s)$, where $J^A_B$ satisfy relations (6.5) and $E^A_B(s)$ is as in Lemma 4. The following formulas follow immediately from (5.13) and (5.14),

$$\begin{aligned}
[J(X(s), s), J(Y(t), t)] &= J^B_A(X(s))\,J^D_C(Y(t)) \times \Big(\big(\delta^C_B E^A_D(s) - \delta^A_D E^C_B(s)\big)\delta(s-t) \mp \frac{1}{2\pi i}\,\delta^A_D\,\delta^C_B\,\dot\delta(s-t)\Big) \\
&= J([X, Y](s), s)\,\delta(s-t) \mp \frac{1}{2\pi i}\,J^B_A(X(s))\,J^A_B(Y(t))\,\dot\delta(s-t), \\
[L^0_\xi, J(X(t), t)] &= \xi^\mu(t)\,J(\partial_\mu X(t), t), \\
[T(\xi(s), s), J(X(t), t)] &= T^B_A(\xi(s))\,J^D_C(X(t)) \times \Big(\big(\delta^C_B E^A_D(s) - \delta^A_D E^C_B(s)\big)\delta(s-t) \mp \frac{1}{2\pi i}\,\delta^A_D\,\delta^C_B\,\dot\delta(s-t)\Big) \\
&= \big(J(\xi^\mu(s)\partial_\mu X(t), s) - \xi^\mu(t)\,J(\partial_\mu X(t), s)\big)\delta(s-t) \mp \frac{1}{2\pi i}\,T^B_A(\xi(s))\,J^A_B(X(t))\,\dot\delta(s-t), \\
[L(s), J_X] &\equiv \frac{1}{4\pi i}\big(g^a\,\ddot X_a(s) + i b^a\,\dot X_a(s)\big), \\
[L^0(s), J(X(t), t)] &= -\dot q^\mu(s)\,J(\partial_\mu X(t), t)\,\delta(s-t), \\
[L'(s), J(X(t), t)] &= J^B_A(X(t))\Big(E^A_B(s)\,\dot\delta(s-t) \pm \frac{1}{4\pi i}\,\delta^A_B\big((2\lambda - 1)\,\ddot\delta(s-t) + (2w - 1)\,i\,\dot\delta(s-t)\big)\Big) \\
&= J(X(t), s)\,\dot\delta(s-t) \pm \frac{1}{4\pi i}\,J^A_A(X(t))\big((2\lambda - 1)\,\ddot\delta(s-t) + (2w - 1)\,i\,\dot\delta(s-t)\big).
\end{aligned} \tag{6.8}$$

We now collect terms, integrate over $t$, and find that the regular terms give the proper algebra, while Lemma 5 gives the extension parameters.

Since $c_5$ must be positive in a unitary representation, the bosonic Fock space carries a non-unitary representation. In analogy with (5.8), we can write

$$J(X, t) = \mp\sum_{|\mathbf{m}|\le p} {:}\pi^{,\mathbf{m}}(t)\,\check\partial_{\mathbf{m}}\big(X_a(q(t))\,M^a\,\phi(t)\big){:}.$$

A slight generalization is possible. The gauge connection corresponds to the jet Aaµ,m (t) µ,m g) acts as with conjugate momentum Ea (t). map(N, [JX , Aaν,n (t)] = − Jnm (X(q(t)))ab Abν,m (t) + ∂n+ν X a (q(t)), (6.9) |m|p


where the matrices Jnm (X) are taken in the adjoint representation of g, i.e. (M a )bc = −if ab c . Thus the contribution to (6.6) is µ,m J (X, t) = :Eb (t)∂ˇm (if ab c Xa Acµ,m (t)): + ∂m+µ X a Eaµ,m (t) . (6.10) |m|p

Due to the non-homogeneous term in (6.9), the Fock space does not decompose into subspaces with a fixed number of A's as a map(N, g) module.

7. Constraints

Representations of DRO(N) can be restricted to diff(N) using techniques from constrained Hamiltonian systems [3, 11]. The same mechanism has appeared in mathematics under the name Drinfeld-Sokolov reduction [4].

The space $J^pP$ is equipped with a natural graded symplectic structure, and it can therefore be viewed as a classical phase space. Let $P, R, \dots$ label bosonic constraints $\chi_P(q, p, \phi, \pi)$. If DRO(N) acts in the phase space such that all constraints are preserved, we may consider the restriction to the constraint surface $\chi_P \approx 0$. Weak equality (i.e. equality modulo constraints) is denoted by $\approx$. Constraints are classified as second or first class depending on whether the Poisson bracket matrix $C_{PR} = [\chi_P, \chi_R]$ is invertible or not. First class constraints are connected to gauge symmetries and they always generate a Lie algebra. However, it is often possible to go from first class to second class (by fixing a gauge) and back (by dropping half the constraints).

Assume that all constraints are second class, if necessary by adding gauge-fixing conditions. Then the matrix $C_{PR}$ has an inverse, denoted by $F^{PR}$: $F^{PR} C_{RS} = \delta^P_S$. The Dirac bracket

$$[A, B]^* = [A, B] - [A, \chi_P]\,F^{PR}\,[\chi_R, B] \tag{7.1}$$

defines a new Lie bracket which is compatible with the constraints: $[A, \chi_R]^* = 0$ for every $A \in C^\infty(J^pP)$.

Reparametrizations generate a Lie algebra and can hence be viewed as first class constraints. A natural gauge choice is to identify one coordinate with the time parameter. Thus, our constraints are

$$L(t) \approx 0, \qquad q^0(t) - t \approx 0.$$
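The defining property of the Dirac bracket (7.1) can be seen in a finite-dimensional toy model. The sketch below is not the infinite-dimensional construction of the text: it assumes a four-dimensional phase space with coordinates $(q_1, q_2, p_1, p_2)$, restricts to linear functions (stored as gradient vectors), and imposes the hypothetical second class pair $\chi_1 = q_1$, $\chi_2 = p_1$.

```python
from fractions import Fraction


def poisson(f, g):
    """Canonical bracket of two linear functions given as gradients (dq..., dp...)."""
    n = len(f) // 2
    return sum(f[i] * g[n + i] - f[n + i] * g[i] for i in range(n))


def inverse_2x2(C):
    """Inverse of a 2x2 matrix; fails if the constraints are not second class."""
    (a, b), (c, d) = C
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("constraint matrix is singular (first class constraints)")
    return [[d / det, -b / det], [-c / det, a / det]]


def dirac(f, g, constraints):
    """Dirac bracket [f, g]* = [f, g] - [f, chi_P] F^{PR} [chi_R, g]."""
    C = [[poisson(x, y) for y in constraints] for x in constraints]
    F = inverse_2x2(C)  # F^{PR} C_{RS} = delta^P_S
    correction = sum(poisson(f, constraints[P]) * F[P][R] * poisson(constraints[R], g)
                     for P in range(2) for R in range(2))
    return poisson(f, g) - correction


# Coordinate functions as gradient vectors in the order (q1, q2, p1, p2).
q1, q2, p1, p2 = (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
constraints = [q1, p1]

if __name__ == "__main__":
    print(dirac(q1, p1, constraints))  # constrained pair decouples
    print(dirac(q2, p2, constraints))  # unconstrained pair keeps its bracket
```

In this toy model every function has vanishing Dirac bracket with both constraints, while the unconstrained pair $(q_2, p_2)$ keeps its canonical bracket, which mirrors the elimination of $q^0(t)$ and $p_0(t)$ in the text.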

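The Poisson bracket matrix $C(s, t)$ and its inverse $F(s, t)$ are, on the constraint surface,

$$\begin{aligned}
C(s, t) &\equiv [\chi(s), \chi^T(t)] = \left[\begin{pmatrix} q^0(s) - s \\ L(s) \end{pmatrix}, \begin{pmatrix} q^0(t) - t & L(t) \end{pmatrix}\right] \approx \begin{pmatrix} 0 & \delta(s-t) \\ -\delta(s-t) & \frac{c_4}{24\pi i}\big(\dddot\delta(s-t) + \dot\delta(s-t)\big) \end{pmatrix}, \\
F(s, t) &\approx \begin{pmatrix} \frac{c_4}{24\pi i}\big(\dddot\delta(s-t) + \dot\delta(s-t)\big) & -\delta(s-t) \\ \delta(s-t) & 0 \end{pmatrix}.
\end{aligned} \tag{7.2}$$

We now solve the constraints,

$$q^0(t) = t, \qquad p_0(t) = -\dot q^i(t)\,p_i(t) + L'(t), \tag{7.3}$$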


where the latin index $i = 1, 2, \dots, N-1$ ranges over the remaining ("spatial") directions. If $L_\xi$ satisfy the DRO algebra (2.3) under the original bracket, the Dirac brackets become

$$\begin{aligned}
[L_\xi, L_\eta]^* &= L_{[\xi,\eta]} + \frac{1}{2\pi i}\int dt\, \big(c_1\,\partial_\nu\dot\xi^\mu(q(t))\,\partial_\mu\eta^\nu(q(t)) + c_2\,\partial_\mu\dot\xi^\mu(q(t))\,\partial_\nu\eta^\nu(q(t))\big) \\
&\quad + \frac{1}{4\pi i}\int dt\, \Big(c_3\big(\partial_\nu\eta^\nu(q(t))\,\ddot\xi^0(q(t)) - \partial_\mu\xi^\mu(q(t))\,\ddot\eta^0(q(t))\big) \\
&\qquad\qquad - i a_3\big(\partial_\nu\eta^\nu(q(t))\,\dot\xi^0(q(t)) - \partial_\mu\xi^\mu(q(t))\,\dot\eta^0(q(t))\big)\Big) \\
&\quad + \frac{c_4}{24\pi i}\int dt\, \big(\ddot\xi^0(q(t))\,\dot\eta^0(q(t)) - \dot\xi^0(q(t))\,\eta^0(q(t))\big), \\
[L_\xi, q^\mu(t)]^* &= \xi^\mu(q(t)) - \dot q^\mu(t)\,\xi^0(q(t)), \\
[q^\mu(s), q^\nu(t)]^* &= 0, \\
[L(s), L_\xi]^* &= [L(s), L(t)]^* = [L(s), q^\mu(t)]^* = 0,
\end{aligned} \tag{7.4}$$

where

$$\dot f(q(t)) = \dot q^\rho(t)\,\partial_\rho f(q(t)), \qquad \ddot f(q(t)) = \ddot q^\rho(t)\,\partial_\rho f(q(t)) + \dot q^\rho(t)\,\dot q^\sigma(t)\,\partial_\rho\partial_\sigma f(q(t)). \tag{7.5}$$

Some other Dirac brackets are

$$\begin{aligned}
[p_\mu(s), p_\nu(t)]^* &= \big(\delta^0_\mu\,p_\nu(s) + \delta^0_\nu\,p_\mu(t)\big)\dot\delta(s-t), \\
[p_\mu(s), q^\nu(t)]^* &= \big(\delta^\nu_\mu - \dot q^\nu(t)\,\delta^0_\mu\big)\delta(s-t), \\
[p_\mu(s), T(\xi(q(t)), t)]^* &= T(\partial_\mu\xi(q(s)), s)\,\delta(s-t) + \delta^0_\mu\,T(\xi(q(s)), s)\,\dot\delta(s-t).
\end{aligned} \tag{7.6}$$

Note that $[L_\xi, q^0(t)]^* = 0$. Equation (7.4) is the four-parameter extension of diff(N) found in [14]; it was denoted by diff$(N; c_1, c_2, c_3, c_4)$ in that paper. The parameters $c_1$ and $c_2$ are the same as in that paper, but I have interchanged the names of the other two: $c_3^{\rm old} = 12\,c_4^{\rm new}$ and $c_4^{\rm old} = c_3^{\rm new}$. Note that two of the cocycles are anisotropic in the sense that they single out the $x^0$ direction. This anisotropy originates from the gauge choice $q^0(t) \approx t$. I expect other gauge choices to give rise to even more complicated cocycles. Therefore, it is natural to work with the full DRO algebra, where the cocycles are of the simple Virasoro form.

Substitution of (7.3) into (5.5) gives

$$L_\xi = \int dt\, \Big({:}\xi^i(q(t))\,p_i(t){:} - {:}\xi^0(q(t))\,\dot q^i(t)\,p_i(t){:} + \xi^0(q(t))\,L'(t) + T(\xi(q(t)), t)\Big), \tag{7.7}$$

which is the realization found in [14]. These generators thus provide an explicit realization of the gauge-fixed algebra (7.4). In particular, the Dirac brackets agree with the original brackets since $q^0(t)$ and $p_0(t)$ have been eliminated.

We can recast (7.4) as a proper Lie algebra analogous to (2.6). However, this algebra acquires a very complicated form, due to the second-order derivatives in (7.5). Not only do the operators $S_0(F_0)$ and $S_1^\rho(F_\rho)$ enter, but two infinite families of linear operators $S_n^{\nu_1..\nu_n}(F_{\nu_1..\nu_n})$, $R_n^{\rho|\nu_1..\nu_n}(G_{\rho|\nu_1..\nu_n})$, where $F_{\nu_1..\nu_n}(t, x)$, $G_{\rho|\nu_1..\nu_n}(t, x)$, $t \in S^1$, $x \in \mathbb{R}^N$,


are arbitrary functions, totally symmetric in the indices $\nu_1..\nu_n$. They have the explicit realization

$$\begin{aligned}
S_n^{\nu_1..\nu_n}(F_{\nu_1..\nu_n}) &= \frac{1}{2\pi i}\int dt\, \dot q^{\nu_1}(t)..\dot q^{\nu_n}(t)\,F_{\nu_1..\nu_n}(t, q(t)), \\
R_n^{\rho|\nu_1..\nu_n}(G_{\rho|\nu_1..\nu_n}) &= \frac{1}{2\pi i}\int dt\, \ddot q^{\rho}(t)\,\dot q^{\nu_1}(t)..\dot q^{\nu_n}(t)\,G_{\rho|\nu_1..\nu_n}(t, q(t)).
\end{aligned} \tag{7.8}$$

The resulting algebra was written down in [14].

The gauge algebra map(N, g) is reduced along similar lines. Since $J_X$ commutes with both $L(s)$ and $q^0(t)$ (before normal ordering), the gauge-fixed realization of map(N, g) is simply obtained by substituting $q^0(t) = t$ in (6.6). After normal ordering, the extension described in [14] arises, with parameters $k = c_5$, $g^a = c_6\,\delta^a$ and $g^a = c_7\,\delta^a$ given by (6.7); $c_8$ was not considered in that paper. Realizations of toroidal Lie algebras are obtained by further specialization to the N-dimensional torus.

8. Discussion

The representation theory of diffeomorphism and gauge algebras in more than one dimension has been developed. The rather obscure results in [8, 14] have been given a natural geometric explanation in terms of jet space trajectories where the reparametrization invariance has been eliminated by gauge fixing. These manifestly well defined modules are "quantum general covariant", in the sense that they combine a diff(N) representation (general covariance) with the following quantum properties: Poisson brackets are replaced with commutators, normal ordered expressions act on a lowest-energy Fock space, and the algebra acquires an extension. Moreover, these features are obtained without the introduction of any classical background field. Therefore, these Fock modules can be viewed as natural building blocks for theories of quantum gravity.

Classically, everything could be repeated by replacing trajectories by d-dimensional extended objects (e.g. world sheets) in spacetime; simply reinterpret the variable $t$ in (5.5) as having $d$ components $t^i$. Reparametrization is now expressed by diff(d):

$$[L_i(s), L_j(t)] = L_j(s)\,\partial_i\delta(s-t) + L_i(t)\,\partial_j\delta(s-t). \tag{8.1}$$

However, the quantum theory only exists if $d = 1$ (and trivially if $d = 0$), because otherwise normal ordering yields infinities and (8.1) has no central extension.

References

1. Berman, S. and Billig, Y.: Irreducible representations for toroidal Lie algebras. J. Algebra 221, 188–231 (1999)
2. Billig, Y.: Principal vertex operator representations for toroidal Lie algebras. J. Math. Phys. 7, 3844–3864 (1998)
3. Dirac, P.A.M.: Lectures on quantum mechanics. Belfer Graduate School of Science, Yeshiva Univ., New York (1964)
4. Drinfeld, V.G. and Sokolov, V.V.: Lie algebras and equations of Korteweg-de Vries type. J. Sov. Math. 30, 1975–2035 (1985)
5. Dzhumadildaev, A.: Virasoro type Lie algebras and deformations. Z. Phys. C 72, 509–517 (1996)
6. Eswara Rao, S., Moody, R.V., and Yokonuma, T.: Lie algebras and Weyl groups arising from vertex operator representations. Nova J. of Algebra and Geometry 1, 15–57 (1992)
7. Eswara Rao, S.: Irreducible representations of the Lie algebra of the diffeomorphisms of a d-dimensional torus. J. Algebra 182, 401–421 (1996)


8. Eswara Rao, S. and Moody, R.V.: Vertex representations for N-toroidal Lie algebras and a generalization of the Virasoro algebra. Commun. Math. Phys. 159, 239–264 (1994)
9. Fabbri, M. and Moody, R.V.: Irreducible representations of Virasoro-toroidal Lie algebras. Commun. Math. Phys. 159, 1–13 (1994)
10. Goddard, P. and Olive, D.: Kac–Moody and Virasoro algebras in relation to quantum physics. Int. J. Mod. Phys. 1, 303–414 (1986)
11. Henneaux, M. and Teitelboim, C.: Quantization of gauge systems. Princeton, NJ: Princeton Univ. Press, 1992
12. Larsson, T.A.: Multi-dimensional Virasoro algebra. Phys. Lett. A 231, 94–96 (1989)
13. Larsson, T.A.: Central and non-central extensions of multi-graded Lie algebras. J. Phys. A 25, 1177–1184 (1992)
14. Larsson, T.A.: Lowest-energy representations of non-centrally extended diffeomorphism algebras. Commun. Math. Phys. 201, 461–470 (1999)
15. Larsson, T.A.: Fock representations of non-centrally extended super-diffeomorphism algebras. physics/9710022 (1997)
16. Larsson, T.A.: Extensions of diffeomorphism and current algebras. math-ph/0002016 (2000)
17. Moody, R.V., Eswara Rao, S., and Yokonuma, T.: Toroidal Lie algebras and vertex representations. Geom. Ded. 35, 283–307 (1990)
18. Rudakov, A. N.: Irreducible representations of infinite-dimensional Lie algebras of Cartan type. Math. USSR Izv. 8, 836–866 (1974)

Communicated by T. Miwa

Commun. Math. Phys. 214, 493 – 494 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Erratum

The Boltzmann Equation for a One-Dimensional Quantum Lorentz Gas R. Esposito1 , M. Pulvirenti2 , A. Teta1 1 Dipartimento di Matematica Pura ed Applicata, L’Aquila, Italy 2 Dipartimento di Matematica, Università di Roma La Sapienza, Rome, Italy

Received: 6 April 2000 / Accepted: 10 April 2000 Commun. Math. Phys. 204, 619–649 (1999)

Step 5 of the paper contains a mistake. As a consequence the proof of Theorem 2.1 is not only incomplete, but, as we shall show below, the result is actually false. Consider the expansion (3.36) and the decomposition:

$$\rho_\varepsilon^\pm = R_\varepsilon^\pm + I_\varepsilon^\pm. \tag{3.37}$$

$I_\varepsilon^\pm$ contains the sum over all pairs of different trajectories. This term is estimated by the arguments leading to (3.43), and this is not correct. In fact there are pairs of different trajectories with the same initial and final points and velocities, whose contribution is not negligible.

An example of such a pair of trajectories is the following: given three scatterers with positions $c_1$, $c_2$, $c_3$ such that $c_1 < c_2 < c_3$, the trajectory $\xi$ starts in $x_0$ at time $t = 0$ with velocity $-p$, $p > 0$, crosses the scatterer in $c_2$, is reflected by the scatterer in $c_1$, crosses again the scatterer in $c_2$, is reflected by the scatterer in $c_3$, then by the scatterer in $c_2$, and arrives at some position $x$ at time $t$. The trajectory $\xi'$ starts at $t = 0$ at the same point $x_0$ with the velocity $-p$; it is reflected by the scatterer in $c_2$, then by the scatterer in $c_3$, crosses the scatterer in $c_2$, is reflected by the scatterer in $c_1$, and crosses the scatterer in $c_2$, where it meets the trajectory $\xi$, and the two trajectories continue together up to the point $x$ reached at time $t$. The above construction is valid independently of the position $c_2$ of the intermediate scatterer. Let us denote by $\langle\xi, \xi'\rangle$ the contribution to (3.36) due to this couple of trajectories:

$$\langle\xi, \xi'\rangle = \tilde D(\xi)\,\tilde D(\xi')\,|r^-(x_0)|^2.$$

It is immediate to check that $\langle\xi, \xi'\rangle = \langle\xi, \xi\rangle$, whatever the value of $c_2$, provided it is between $c_1$ and $c_3$. Estimate (3.43) is therefore false for the above couple of trajectories. Actually, Estimate (3.43) holds for all the couples of trajectories with fixed final position and velocity whose initial positions can be separated by the displacement of a scatterer. The couple $\xi$, $\xi'$ does not fulfill this condition. We refer to such trajectories as equivalent.


The contribution coming from the cross term between equivalent trajectories may be as large as the one due to the diagonal terms, and this is the case for $\xi$ and $\xi'$ described above. We thank L. Erdos and H. T. Yau, who pointed out this error to us by means of the above example.

Let us now see the consequences. Assume that the initial datum $r_0$ is non-vanishing for negative velocities only, say $r_0^+ = 0$. Then the limiting evolution does not satisfy Eq. (2.20). In order to see this it is sufficient to proceed as in Step 6, but taking into account also the terms due to equivalent trajectories. The occurrence of such trajectories requires at least three scatterers and, with the above initial conditions, the only possible couples are those described before, so that the contribution of order $\lambda(p)^3$ to the limiting evolution is exactly twice the one predicted by Eq. (2.20), which therefore does not describe the limiting behavior.

One can ask what the differential equation for the limit is, if any. The answer is that there is no such equation: the evolution of the system is actually non-markovian even in the limit, because the non-vanishing of the cross terms for one trajectory depends on the previous history of the companion trajectory even very far in the past.

In conclusion, Eq. (2.16) correctly accounts for the limiting evolution up to the second order in $\lambda(p)$ and describes the behavior of the markovian process associated with the diagonal terms. The purely quantum effects due to the cross terms destroy the markovian character of the limiting evolution and hence prevent the limiting evolution from being a solution of a differential equation. Such a phenomenon is strictly confined to the one-dimensional setup, since in higher dimensions the limiting behavior is markovian, as shown in references [EY1] and [EY2] on related models.

We finally note that such a one-dimensional pathology could presumably be removed by changing the microscopic model. If one assumes that the scatterers are moving according to independent Markov processes whose probability distributions are concentrated around the lattice positions, then Eq. (2.20) is expected to hold in the limit.

Communicated by J. L. Lebowitz

Commun. Math. Phys. 214, 495 – 545 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

$\widehat{s\ell}(2|1)$ and $\widehat{D}(2|1;\alpha)$ as Vertex Operator Extensions of Dual Affine $s\ell(2)$ Algebras

P. Bowcock1, B. L. Feigin2, A. M. Semikhatov3, A. Taormina1

1 Department of Mathematical Sciences, University of Durham, Durham DH1 3LE, England
2 Landau Institute for Theoretical Physics, Russian Academy of Sciences, Moscow, Russia
3 Tamm Theory Division, Lebedev Physics Institute, Russian Academy of Sciences, Moscow, Russia

Received: 29 July 1999 / Accepted: 6 February 2000

Abstract: We discover a realisation of the affine Lie superalgebra $\widehat{s\ell}(2|1)$ and of the exceptional affine superalgebra $\widehat{D}(2|1;\alpha)$ as vertex operator extensions of two $\widehat{s\ell}(2)$ algebras with "dual" levels (and an auxiliary level-1 $\widehat{s\ell}(2)$ algebra). The duality relation between the levels is $(k_1 + 1)(k_2 + 1) = 1$. We construct the representation of $\widehat{s\ell}(2|1)_{k_1}$ on a sum of tensor products of $\widehat{s\ell}(2)_{k_1}$, $\widehat{s\ell}(2)_{k_2}$, and $\widehat{s\ell}(2)_1$ modules and decompose it into a direct sum over the $\widehat{s\ell}(2|1)_{k_1}$ spectral flow orbit. This decomposition gives rise to character identities, which we also derive. The extension of the construction to $\widehat{D}(2|1;k_2)_{k_1}$ is traced to the properties of $s\ell(2) \oplus s\ell(2) \oplus s\ell(2)$ embeddings into $D(2|1;\alpha)$ and their relation with the dual $s\ell(2)$ pairs. Conversely, we show how the $\widehat{s\ell}(2)_{k_2}$ representations are constructed from $\widehat{s\ell}(2|1)_{k_1}$ representations.

Contents

1. Introduction
2. The Vertex Operator Construction of $\widehat{s\ell}(2|1)$
   2.1 Dual $\widehat{s\ell}(2)$ algebras and vertex operators
   2.2 The Wakimoto representations of $\widehat{s\ell}(2)$ and screened operator products
3. Extending $\widehat{s\ell}(2|1)_k$ to the $\widehat{D}(2|1;k')_k$ Algebra
   3.1 The "auxiliary" level-1 $\widehat{s\ell}(2)$ algebra
   3.2 Conformal embeddings in $\widehat{D}(2|1;\alpha)$
4. Constructing the Representations
   4.1 Reconstructing the dual $\widehat{s\ell}(2)$ pair
   4.2 From $\widehat{s\ell}(2|1)_k$ to $\widehat{s\ell}(2)_{k'}$ representations
   4.3 From $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_{k'}$ to $\widehat{s\ell}(2|1)_k$ representations
5. Decomposition of Representations and Character Identities
   5.1 Decomposing the $\widehat{s\ell}(2|1)$ representation
   5.2 Verma-module case
   5.3 From Verma modules to irreducible representations
   5.4 Sumrules for irreducible representation characters
6. Conclusions
Appendix A. Some $s\ell(2)$ Quantum Group Relations
Appendix B. $\widehat{s\ell}(2|1)$ Algebra, Spectral Flow, and Charged Singular Vectors
Appendix C. $\widehat{D}(2|1;\alpha)$
Appendix D. $\widehat{s\ell}(2|1)$ OPE's
References

1. Introduction

In this paper, we address a particular instance of the general problem of extending infinite-dimensional algebras by vertex operators. We show that vertex operator extensions can lead to interesting and nontrivial constructions of affine Lie (super)algebras and their representations. A well-known example of an extension via vertex operators applies to the sum of two Virasoro algebras with appropriate central charges, and leads to matter coupled to gravity in the conformal gauge [12, 13]. Another context where vertex operator extensions of conformal algebras are relevant is that of coset conformal field theories $c = g_1/g_2$, where the "inverse" problem is to reconstruct the $g_1$ representations by combining the representations of $c$ and $g_2$ related by the action of vertex operators.

A natural generalisation of the situation encountered in matter plus gravity is provided by studying extensions of the sum $\widehat{s\ell}(2)_{k_1} \oplus \widehat{s\ell}(2)_{k_2}$ of two affine $s\ell(2)$ algebras at levels $k_1$ and $k_2$. This is a complicated problem in general; we solve it in the case where

$$(k_1 + 1)(k_2 + 1) = 1, \qquad k_i \in \mathbb{C} \setminus \{-2, -1\}. \tag{1.1}$$

The result is that with the help of an additional scalar current, spin-$\frac{1}{2}$ vertex operators extend $\widehat{s\ell}(2)_{k_1} \oplus \widehat{s\ell}(2)_{k_2}$ to the affine Lie superalgebra $\widehat{D}(2|1;k_2)_{k_1}$. We study the representations in terms of the subalgebra of $\widehat{D}(2|1;k_2)_{k_1}$ given by $\widehat{s\ell}(2|1)_{k_1} \oplus \hat u(1)$. Let $W_{n,k}$ be the Weyl module over $\widehat{s\ell}(2)_k$ induced from the $n+1$-dimensional representation of $s\ell(2)$. For generic values of the level, the vacuum representation $L_{0,0,k_1}$ of $\widehat{s\ell}(2|1)_{k_1}$ is contained in a sum of tensor products of Weyl modules $W_{n,k_1}$ and $W_{n,k_2}$ of the respective algebras $\widehat{s\ell}(2)_{k_1}$ and $\widehat{s\ell}(2)_{k_2}$. More precisely,

$$\bigoplus_{n\ge 0} W_{n,k_1} \otimes W_{n,k_2} \otimes M_{\frac{1}{2}(n \bmod 2),1} = \bigoplus_{\theta\in\mathbb{Z}} L_{0,0,k_1;\theta} \otimes A_{-\theta}, \tag{1.2}$$

where $L_{0,0,k_1;\theta}$ denotes the spectral flow transform of the $\widehat{s\ell}(2|1)$ module, $M_{0,1}$ and $M_{\frac{1}{2},1}$ are the irreducible representations of the auxiliary level-1 $\widehat{s\ell}(2)$ algebra, and $A_a$ are Fock modules over a Heisenberg algebra.

The basic relation (1.1) can be rewritten as $\frac{1}{k_1+2} + \frac{1}{k_2+2} = 1$, and can be put into a broader perspective when viewed as a particular case of the following family of relations between levels,

$$\frac{1}{k_1+2} + \frac{1}{k_2+2} = n \in \mathbb{N}. \tag{1.3}$$
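The equivalence between (1.1) and the $n = 1$ case of (1.3) can be checked with exact rational arithmetic. The following is a minimal sketch; the helper name `dual_level` is an illustrative choice and rational levels are assumed, although the paper allows complex ones.

```python
from fractions import Fraction


def dual_level(k1, n=1):
    """Solve 1/(k1 + 2) + 1/(k2 + 2) = n for k2.

    Requires k1 != -2 and n * (k1 + 2) != 1; for n = 1 the latter excludes
    k1 = -1, matching the values removed in (1.1).
    """
    a = Fraction(k1) + 2
    b = 1 / (n - 1 / a)
    return b - 2


if __name__ == "__main__":
    for k1 in (Fraction(1, 2), 1, 3, Fraction(-1, 3), -5):
        k2 = dual_level(k1)
        # For n = 1 the product (k1 + 1)(k2 + 1) should equal 1.
        print(k1, k2, (Fraction(k1) + 1) * (k2 + 1))
```

Writing $a = k_1 + 2$ and $b = k_2 + 2$, relation (1.1) reads $(a - 1)(b - 1) = 1$, i.e. $ab = a + b$, which is the $n = 1$ case after dividing by $ab$. The duality map is an involution, as the test below also confirms.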

We refer to the s(2) algebras with the levels k1 and k2 satisfying (1.3) as dual; Eq. (1.3) may be viewed as a duality relation between s(2) levels, which potentially allows one

Vertex Operator Extensions of Dual Affine s(2) Algebras

497

to study certain classes of integrable and admissible representations of the extended algebraic structure, starting from the well-studied $\widehat{s\ell}(2)$ representation theory at fractional level.

For simplicity in illustrating the main idea, we can replace the $\widehat{s\ell}(2)_{k_i}$ algebras with the Virasoro algebras Vir($k_i$) obtained from them by Hamiltonian reduction, with central charges $d_i = 13 - \frac{6}{k_i+2} - 6(k_i+2)$. When extending the sum Vir($k_1$) ⊕ Vir($k_2$) by vertex operators, one must address the problem of their potential non-locality. For instance, (a component of) the operator $\phi_{12}^{(i)}$ has monodromy $(-1)^{\frac{1/2}{k_i+2}} = e^{\frac{i\pi/2}{k_i+2}}$ with itself, which means nonlocality in general; however, a bilinear combination of $\phi_{12}^{(1)}$ and $\phi_{12}^{(2)}$ operators for two dual Virasoro algebras has monodromy $e^{\frac{i\pi/2}{k_1+2}}\, e^{\frac{i\pi/2}{k_2+2}} = e^{n i\pi/2}$ in view of relation (1.3). It is therefore local (for even $n$) or “almost” local. The precise way to build local or almost local bilinear combinations of vertex operators of the dual algebras is governed by the corresponding quantum group $s\ell(2)_{q_i}$ with $q_i = e^{\frac{2i\pi}{k_i+2}}$. The vertex operators, which carry a representation of the quantum group, have to be contracted “over the quantum group index”, as $\mathrm{Tr}_q(\phi^{(1)}(z)\, \phi^{(2)}(z))$. For $n$ odd in (1.3), the monodromies $e^{n\pi i/2}$ of $\theta(z) = \mathrm{Tr}_q(\phi_{12}^{(1)}(z)\, \phi_{12}^{(2)}(z))$ show that these operators are not yet local, but they become so when multiplied with $e^{\pm\frac{1}{\sqrt2} f}$, where $f$ is a free scalar. On the other hand, for $n$ even these bilinear combinations are local without the need of a free scalar. The $n = 0$ case is the well-known example mentioned earlier of matter coupled to gravity in the conformal gauge; one may view the relation (1.3) with $n = 0$ as the anomaly cancellation condition between matter (Vir($k_1$)), Liouville (Vir($k_2$)) and reparametrisation ghosts sectors in the quantisation of two-dimensional gravity. For $n = 2$, such a contraction $\theta(z)$ is a fermion with central charge $\frac12$. Subtracting this from the total central charge of the two Virasoro algebras, we are left with

\[ \Bigl[13 - \frac{6}{k_1+2} - 6(k_1+2)\Bigr] + \Bigl[13 - \frac{6}{k_2+2} - 6(k_2+2)\Bigr] - \frac12 = \frac{15}{2} - \frac{3}{2k_1+3} - 3(2k_1+3), \]

which is the central charge of an $N = 1$ superVirasoro model. This suggests, therefore, that Vir($k_1$) ⊕ Vir($k_2$) extends to the $N = 1$ superVirasoro algebra. It also suggests that the sum $\widehat{s\ell}(2)_{k_1}$ ⊕ Vir($k_2$) can be extended, when $n = 2$, to the affine superalgebra $\widehat{osp}(1|2)_{k_1}$ by an “inverse” Hamiltonian reduction process (cf. [35]). Actually, it is known [14] that the coset $\widehat{osp}(1|2)_{k_1} / \widehat{s\ell}(2)_{k_1}$ is a Virasoro algebra with central charge

\[ c_V = \frac{2k_1}{2k_1+3} - \frac{3k_1}{k_1+2} = 13 - \frac{6}{k_2+2} - 6(k_2+2), \tag{1.4} \]

for $\frac{1}{k_1+2} + \frac{1}{k_2+2} = 2$, which can be obtained by Hamiltonian reduction from $\widehat{s\ell}(2)_{k_2}$, providing the second, dual $\widehat{s\ell}(2)$ algebra in the sum.

In this paper, we study in detail the case of a dual pair of $\widehat{s\ell}(2)_{k_i}$ algebras, where the levels are related by (1.3) with $n = 1$ and, thus, a free scalar is needed to make the $\widehat{s\ell}(2)_{k_1} \oplus \widehat{s\ell}(2)_{k_2}$ vertex operators local after taking the quantum group trace. If we were to guess what conformal theory the extended structure might be, a crude hint would be provided by evaluating its central charge, that is,

\[ c_{\text{extended}} = c_{\widehat{s\ell}(2)_{k_1}} + c_{\widehat{s\ell}(2)_{k_2}} + c_{\text{scalar}} = \frac{3k_1}{k_1+2} + \frac{3k_2}{k_2+2} + 1 \overset{(1.1)}{=} 0 + 1. \tag{1.5} \]
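The central-charge bookkeeping used above for $n = 0, 1, 2$ can be verified with exact fractions. The sketch below is only a numerical illustration at one arbitrarily chosen level; the function names are ours, and the value 26 quoted for $n = 0$ is the standard matter-plus-Liouville total that cancels against the reparametrisation ghosts.

```python
from fractions import Fraction

def c_vir(k):
    """Central charge of Vir(k) from Hamiltonian reduction: 13 - 6/(k+2) - 6(k+2)."""
    return 13 - 6 / (k + 2) - 6 * (k + 2)

def c_sl2(k):
    """Sugawara central charge of affine sl(2) at level k: 3k/(k+2)."""
    return 3 * k / (k + 2)

def dual_level(k1, n):
    """Solve 1/(k1+2) + 1/(k2+2) = n for k2, cf. relation (1.3)."""
    return 1 / (n - 1 / (k1 + 2)) - 2

k1 = Fraction(3, 7)  # arbitrary sample level

# n = 0: the two Virasoro algebras add up to 26.
assert c_vir(k1) + c_vir(dual_level(k1, 0)) == 26

# n = 2: removing the fermion leaves the N = 1 value, and (1.4) holds.
k2 = dual_level(k1, 2)
m = 2 * k1 + 3
assert c_vir(k1) + c_vir(k2) - Fraction(1, 2) == Fraction(15, 2) - 3 / m - 3 * m
assert 2 * k1 / m - c_sl2(k1) == c_vir(k2)

# n = 1: the two affine sl(2) central charges cancel, leaving the scalar, cf. (1.5).
k2 = dual_level(k1, 1)
assert c_sl2(k1) + c_sl2(k2) + 1 == 1
```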

It turns out that the conformal theory with vanishing central charge on the right-hand side of the last equation is the $\widehat{s\ell}(2|1)$ affine Lie superalgebra (which, remarkably, has the


P. Bowcock, B. L. Feigin, A. M. Semikhatov, A. Taormina

same number of bosonic and fermionic currents and hence vanishing central charge of the Sugawara energy-momentum tensor); the remaining 1 corresponds to $\widehat{u}(1)$ (a free scalar theory).

We derive this result in what follows, but now we give some indirect arguments in favour of the emergence of the $\widehat{s\ell}(2|1)_{k_1}$ algebra from $\widehat{s\ell}(2)_{k_1} \oplus \widehat{s\ell}(2)_{k_2}$. There exists a W algebra associated with $s\ell(2|1)$; it can be arrived at, for example, by taking two scalar fields and constructing the combinations that commute with two $s\ell(2|1)$ screenings (which can be either both fermionic or one bosonic and one fermionic). This W algebra has several different descriptions, one of which is that of the coset $w = \widehat{s\ell}(2|1) / \widehat{g\ell}(2)$. The $\widehat{g\ell}(2) = \widehat{s\ell}(2) \oplus \widehat{u}(1)$ algebra from the denominator provides the first of the $\widehat{s\ell}(2)$ algebras making the dual pair (this will eventually become the $\widehat{s\ell}(2)$ subalgebra of $\widehat{s\ell}(2|1)$). One can, in principle, “reconstruct” the $\widehat{s\ell}(2|1)$ algebra from $\widehat{g\ell}(2) \oplus w$ (possibly with the help of some additional constructions, for example choosing a particular bosonisation). On the other hand, the coset W algebra can also be described as $w = \widehat{s\ell}(2) / \widehat{u}(1)$. Up to the $\widehat{u}(1)$ mixing, therefore, the W algebra in the “reconstruction” $\widehat{g\ell}(2) \oplus w \to \widehat{s\ell}(2|1)$ can be replaced with $\widehat{s\ell}(2)$; this is the origin of the second (dual) $\widehat{s\ell}(2)$ in our construction $\widehat{s\ell}(2) \oplus \widehat{s\ell}(2) \to \widehat{s\ell}(2|1)$ (this second $\widehat{s\ell}(2)$ is obviously not a subalgebra of $\widehat{s\ell}(2|1)$).

The above argument is also strongly supported by the detailed knowledge we have of a class of irreducible representations of $\widehat{s\ell}(2|1)$ at admissible level $k_1 = \frac{1}{u} - 1$, $u \in \mathbb{N} + 1$, and their corresponding characters. Their branching functions into characters of the subalgebra $\widehat{s\ell}(2)_{k_1}$ were shown in [?,?] to involve characters of a rational torus $A_{u(u-1)}$ and of the parafermionic algebra $Z_{u-1}$. The latter can be obtained as the coset $\widehat{s\ell}(2)_{u-1} / \widehat{u}(1)$, providing us with the dual $\widehat{s\ell}(2)_{k_2}$ in the construction of $\widehat{s\ell}(2|1)_{k_1}$ (note that one indeed has (1.1) with $k_2 = u - 1$).

But there is an additional remarkable observation: the $\widehat{u}(1)$ mixings involved in the above coset argument conspire so as to extend $\widehat{s\ell}(2|1)_{k_1} \oplus \widehat{u}(1)$ to the exceptional affine Lie superalgebra $\widehat{D}(2|1; k_2)_{k_1}$. The crucial circumstance here is the relation existing between conformal embeddings $\widehat{s\ell}(2) \oplus \widehat{s\ell}(2) \oplus \widehat{s\ell}(2) \subset \widehat{D}(2|1; \alpha)$ and the dual $\widehat{s\ell}(2)$ pairs. Such an embedding is conformal if and only if two of the $\widehat{s\ell}(2)$ algebras are dual (and the third has level 1). Extending $\widehat{s\ell}(2)_{k_1} \oplus \widehat{s\ell}(2)_{k_2} \oplus \widehat{s\ell}(2)_1$ by (the quantum trace of the bilinears in) the spin-$\frac12$ vertex operators thus gives a realisation of the $\widehat{D}(2|1; k_2)_{k_1}$ algebra. The representations of $\widehat{D}(2|1; \alpha)$ that can be obtained in this way are very special, however, in that they appear to be exactly the direct sums of $\widehat{s\ell}(2|1)$ representations over the spectral flow orbits. We do not focus on the $\widehat{D}(2|1; \alpha)$ representations and concentrate instead on the $\widehat{s\ell}(2|1)$ algebra, whose representations have been studied in more detail (the extension to $\widehat{D}(2|1; \alpha)$ is however very interesting because of its close relation to the $N = 4$ superconformal algebra, and deserves further study). Our original interest in the study of the $\widehat{s\ell}(2|1)$ representation theory, especially at admissible values of the level, stems from the rôle this affine algebra might play in the quantisation of the $N = 2$ non-critical strings [21, 1, 15, 35].

From the $\widehat{s\ell}(2|1)$ algebra standpoint, the representations constructed by taking tensor products of $\widehat{s\ell}(2)_{k_1}$, $\widehat{s\ell}(2)_{k_2}$, and $\widehat{s\ell}(2)_1$ representations are reducible, because all the $\widehat{s\ell}(2|1)$ generators commute with the third $\widehat{D}(2|1; \alpha)$ Cartan current. The decomposition of such a representation $N$ into different $\widehat{s\ell}(2|1)$ representations is governed by the


spectral flow: along with each $\widehat{s\ell}(2|1)$ representation, $N$ contains all of its spectral-flow images (we saw this in (1.2) for the vacuum representation).

Associated with this construction of $\widehat{s\ell}(2|1)$ representations are the character identities in the form of sumrules between $\widehat{s\ell}(2|1) \oplus \widehat{u}(1)$ and $\widehat{s\ell}(2) \oplus \widehat{s\ell}(2) \oplus \widehat{s\ell}(2)$ characters, which we give for Verma modules and for a class of irreducible admissible representations. Each sumrule is invariant under a spectral flow whose generator can be added to the $\widehat{u}(1) \oplus \widehat{s\ell}(2|1)$ generators to extend the algebra to $\widehat{D}(2|1; \alpha)$. It is therefore expected that each sumrule in a given sector of the theory describes a $\widehat{D}(2|1; \alpha)$ character belonging to a special class corresponding to conformal embeddings.

As well as building $\widehat{s\ell}(2|1)$ representations from tensor products of $\widehat{s\ell}(2)$ representations as described above, we also construct representations of the dual $\widehat{s\ell}(2)_{k_2}$ algebra starting with an $\widehat{s\ell}(2|1)_{k_1}$ representation. A key step in doing so was to establish a correspondence between Verma modules of $\widehat{s\ell}(2|1)_{k_1}$ and relaxed Verma modules of $\widehat{s\ell}(2)_{k_2}$. The latter provide one of the starting points in constructing representations of the vertex operator extensions of $\widehat{s\ell}(2)_{k_1} \oplus \widehat{s\ell}(2)_{k_2}$, and therefore, the above correspondence is partly what allows one to invert, up to spectral flow, the vertex operator construction of $\widehat{s\ell}(2|1)$ representations. We find several indications that the established correspondences between representations have functorial properties; however, we leave for the future the very interesting separate problem of constructing the functors between the $\widehat{s\ell}(2|1)_{k_1}$ and $\widehat{s\ell}(2)_{k_1} \oplus \widehat{s\ell}(2)_{k_2}$ representation theories.

This paper is organised as follows. In Sect. 2, we describe the basic vertex operator construction allowing us to extend $\widehat{s\ell}(2)_{k_1} \oplus \widehat{s\ell}(2)_{k_2}$ to $\widehat{s\ell}(2|1)_{k_1}$. In Sect. 3, we study embeddings into the $\widehat{D}(2|1; \alpha)$ algebra, find the relation between the conformal embeddings and the dual $\widehat{s\ell}(2)$ pairs, and conclude that our vertex operator algebra extends to $\widehat{D}(2|1; k_2)_{k_1}$. We then concentrate on $\widehat{s\ell}(2|1)_{k_1}$ representations in Sect. 4. In Sects. 4.1 and 4.2, we show how the $\widehat{s\ell}(2)_{k_2}$ representations can be reconstructed starting from $\widehat{s\ell}(2|1)_{k_1}$ representations. With the experience gained here, we then address in Sect. 4.3 the problem of combining $\widehat{s\ell}(2)_{k_1}$ and $\widehat{s\ell}(2)_{k_2}$ representations into a representation of $\widehat{s\ell}(2|1)_{k_1}$. In Sect. 5, we consider the decomposition of the $\widehat{s\ell}(2|1)$ representations constructed; for Verma modules, we confirm the decomposition formula by calculating the characters in Sect. 5.2, and in Sect. 5.3 we give the character identities relating $\widehat{s\ell}(2|1)$ characters with those of the dual $\widehat{s\ell}(2)$ pair for an interesting class of irreducible $\widehat{s\ell}(2|1)$ representations. We conclude with remarks on several related research directions.

Our conventions on the different algebras are summarised in the appendices. Appendix A gives the basic relations describing the quantum $s\ell(2)$ algebra and its “spin-$\frac12$” representations; these are the representations realised on the vertex operators involved in our construction. Appendix B summarises our conventions and some useful facts about the affine $\widehat{s\ell}(2|1)$ algebra, and Appendix C describes several basic facts about the exceptional Lie superalgebras $D(2|1; \alpha)$. In Appendix D, we explicitly check that the operator product expansions of the currents constructed in terms of vertex operators yield the $\widehat{s\ell}(2|1)$ algebra.


2. The Vertex Operator Construction of $\widehat{s\ell}(2|1)$

2.1. Dual $\widehat{s\ell}(2)$ algebras and vertex operators. We take two algebras, $\widehat{s\ell}(2)_{p-2}$ and $\widehat{s\ell}(2)_{p'-2}$ with

\[ \frac{1}{p} + \frac{1}{p'} = 1, \tag{2.1} \]

and $p \in \mathbb{C} \setminus \{0, 1\}$. In terms of the levels $k = p - 2$ and $k' = p' - 2$, this condition can also be written as (1.1) (with $k_1 = k$ and $k_2 = k'$ from now on).

The $\widehat{s\ell}(2)_k$ algebra can be described in terms of the currents $J^\pm(z)$ and $J^0(z)$ with the operator products

\[ J^0(z) J^\pm(w) = \frac{\pm J^\pm(w)}{z-w}, \qquad J^0(z) J^0(w) = \frac{k/2}{(z-w)^2}, \qquad J^+(z) J^-(w) = \frac{k}{(z-w)^2} + \frac{2J^0(w)}{z-w}, \tag{2.2} \]

and similarly for the “primed” $\widehat{s\ell}(2)_{k'}$ algebra. In what follows, we let $V_{j,k}$ denote an $\widehat{s\ell}(2)_k$ module with the highest-weight vector $|j,k\rangle$ satisfying

\[ J^+_{\ge 0}|j,k\rangle = 0, \qquad J^-_{\ge 1}|j,k\rangle = 0, \qquad J^0_{\ge 1}|j,k\rangle = 0, \qquad J^0_0|j,k\rangle = j\,|j,k\rangle. \tag{2.3} \]

A fundamental rôle in our construction is played by the vertex operators corresponding to the top level states in the irreducible spin-$\frac12$ $\widehat{s\ell}(2)_k$ representation,

\[ J_0^+\,|\tfrac12, k\rangle = 0, \qquad J_0^-\,|\tfrac12, k\rangle = |{-\tfrac12}, k\rangle, \qquad J_0^-\,|{-\tfrac12}, k\rangle = 0, \tag{2.4} \]

and similarly in the primed sector. Let $V_{\frac12}$ and $V_{-\frac12}$ be the vertex operators corresponding to $|\frac12, k\rangle$ and $|{-\frac12}, k\rangle$, respectively. So, $V_{\frac12}(z)$ and $V_{-\frac12}(z)$ make up an $s\ell(2)$ doublet, which we express by writing $(V_{\frac12}(z), V_{-\frac12}(z)) \in \mathbb{C}^2(z)$. In addition, each of these operators has a second component and thus is an element of a two-dimensional space $\mathbb{C}^2_q(z)$ associated with the point $z$, where the subscript indicates that this space is a representation of the quantum group $s\ell(2)_q$, with quantum group parameter $q = e^{2\pi i/p}$ [5]. The $s\ell(2)_q$-module is the quotient of a Verma module over a singular vector; in the notations of Appendix A, where we collect some simple facts about quantum groups, this can be $V_{\epsilon,1}$ with $\epsilon = \pm 1$. Thus, the vertex operator associated with the spin-$\frac12$ representation is a “two-indexed object” with $s\ell(2)$ index $a = 1, 2$ and quantum group index $\alpha = 1, 2$, and therefore a tensor component in the space $\mathbb{C}^2(z) \otimes \mathbb{C}^2_q(z)$. Its action on $\widehat{s\ell}(2)$ modules is given by

\[ \mathbb{C}^2(z) \otimes \mathbb{C}^2_q(z) : V_{j,k} \to V_{j+\frac12,k} \otimes z^{\Delta_{j+\frac12} - \Delta_j - \Delta_{\frac12}}\, \mathbb{C}((z)) \;\oplus\; V_{j-\frac12,k} \otimes z^{\Delta_{j-\frac12} - \Delta_j - \Delta_{\frac12}}\, \mathbb{C}((z)), \tag{2.5} \]

where $\Delta_j = \frac{j(j+1)}{p}$ and $\mathbb{C}((z))$ is the completion of the Laurent polynomial ring $\mathbb{C}[z, z^{-1}]$ with positive powers of $z$. In what follows, we distinguish one of the quantum


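group components from the other with a tilde; in the free-field realisation that we use they act as

\[ V_{\pm\frac12}(z) : V_{j,k} \to V_{j+\frac12,k} \otimes z^{\Delta_{j+\frac12} - \Delta_j - \Delta_{\frac12}}\, \mathbb{C}((z)), \tag{2.6} \]

and

\[ \widetilde{V}_{\pm\frac12}(z) : V_{j,k} \to V_{j-\frac12,k} \otimes z^{\Delta_{j-\frac12} - \Delta_j - \Delta_{\frac12}}\, \mathbb{C}((z)). \tag{2.7} \]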

The monodromies of vertex operators are defined by analytically continuing from the real line, where they are fixed as

\[ V_{j_1,m_1}(w)\, V_{j_2,m_2}(z) = e^{\frac{2\pi i}{p} j_1 j_2}\, V_{j_2,m_2}(z)\, V_{j_1,m_1}(w), \qquad \mathrm{Re}\, z > \mathrm{Re}\, w. \tag{2.8} \]

To evaluate the operator product of vertex operators $\prod_i V_{j_i,m_i}(z_i)$, we use (2.8) to rearrange the operators $V_{j_1,m_1}(z_1) \dots V_{j_N,m_N}(z_N)$ such that $\mathrm{Re}\, z_i > \mathrm{Re}\, z_{i+1}$, and then use the operator products $V_{j_i,m_i}(z_i) \cdot V_{j_\ell,m_\ell}(z_\ell) \sim (z_i - z_\ell)^{j_i j_\ell/(2p)}$ with $\mathrm{Re}\, z_i > \mathrm{Re}\, z_\ell$. The point of the subsequent construction is that different factors of the form $(z_i - z_\ell)^{j_i j_\ell/(2p)}$ combine into integral powers, after which we no longer need to assume $\mathrm{Re}\, z_i > \mathrm{Re}\, z_{i+1}$.

Consider now the dual $\widehat{s\ell}(2)'$ algebra, with the vertex operators viewed as elements of $\mathbb{C}'^2(z) \otimes \mathbb{C}'^2_{q'}(z)$, where the quantum group parameter is $q' = e^{2\pi i/p'} = q^{-1}$ in view of (2.1). Similarly to the above, the monodromies in the primed sector are given by $e^{\frac{2\pi i}{p'} j_1 j_2} = e^{2\pi i j_1 j_2}\, q^{-j_1 j_2}$. The module $V_{\epsilon',1}$ over $s\ell(2)_{q^{-1}}$ is also a module over $s\ell(2)_q$, as we see in (A.15)–(A.16). Moreover, depending on whether $\epsilon\epsilon'$ is $-1$ or $+1$, this is (almost) the dual of $V_{\epsilon,1}$. Thus, there exists the trace (A.18) on the tensor product of the unprimed and primed vertex operators. Taking this trace “over the quantum group index” leaves us with a four-dimensional space:

\[ \mathbb{C}^2(z) \otimes \mathbb{C}^2_q(z) \otimes \mathbb{C}'^2(z) \otimes \mathbb{C}'^2_{q'}(z) \xrightarrow{\;\langle\cdot,\cdot\rangle\;} \mathbb{C}^2(z) \otimes \mathbb{C}'^2(z), \tag{2.9} \]

(more precisely, there is an extra factor on the right-hand side given by $V_{-1,0}$ for $\epsilon\epsilon' = -1$).

Now, the monodromies of the operators from $\mathbb{C}^2(z) \otimes \mathbb{C}'^2(z)$ with each other are given by $q^{j_1 j_2} \cdot e^{2\pi i j_1 j_2}\, q^{-j_1 j_2} = e^{2\pi i \cdot \frac12 \cdot \frac12} = e^{\frac{\pi i}{2}}$. Therefore, the operators from $\mathbb{C}^2(z) \otimes \mathbb{C}'^2(z)$ are “almost” local with respect to each other. They can be made local by introducing an auxiliary scalar field $f$ with the operator product

\[ f(z)\, f(w) = \log(z - w), \tag{2.10} \]

and multiplying the operators from $\mathbb{C}^2(z) \otimes \mathbb{C}'^2(z)$ with $e^{\pm\frac{1}{\sqrt2} f(z)}$. Note that adding up the dimensions of the primed and unprimed $V_{\pm\frac12}$ operators, we find

\[ \frac{\frac12(\frac12 + 1)}{p} + \frac{\frac12(\frac12 + 1)}{p'} = \frac34, \tag{2.11} \]

which after multiplying with $e^{\pm\frac{1}{\sqrt2} f(z)}$ becomes $\frac34 + \frac14 = 1$. Their monodromies are $e^{\frac{\pi i}{2}} \cdot e^{\frac{\pi i}{2}} = -1$. Denoting by $\mathbb{C}^2_1(z)$ the two-dimensional space generated by $e^{\pm\frac{1}{\sqrt2} f(z)}$,


we thus see that the basis elements of $\mathbb{C}^2(z) \otimes \mathbb{C}'^2(z) \otimes \mathbb{C}^2_1(z)$ represent eight dimension-1 fermionic currents. It turns out that these eight fermionic currents generate an affine Lie superalgebra $\widehat{D}(2|1; \alpha)$, as will be discussed in Sect. 3. Until then, we concentrate on the algebra generated by four of these eight fermionic currents, which is the affine Lie superalgebra $\widehat{s\ell}(2|1)$ (see Appendix B for notations and conventions). We, thus, define

1 V 1 + V 1 V 1 ' √1 , E1 = − V 2 2 2 2 2

(2.12) 2 i π 1 ' √1 , E = πp cos p V 1 V− 1 + V 1 V − − 2

and

2

2

2

2

1 V 1 +V 1 V 1 ' √1 , cos πp V −2 − −2 − − 2 2 2

2 F = V− 1 V 1 + V− 1 V 1 ' √1 , F1 =

i πp

2

2

2

2

(2.13)

2

where 'α (z) = eαf (z) .

(2.14)
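The counting that makes the bilinears dressed with $e^{\pm\frac{1}{\sqrt2} f}$ into dimension-1 fermionic currents (the dimension sum (2.11) and the combined monodromy $-1$) can be checked numerically. The sketch below is only an illustration at one arbitrarily chosen value of $p$; the function name `delta` is ours, and the dimension $\frac14$ of the exponential is the value $\alpha^2/2$ at $\alpha = \frac{1}{\sqrt2}$ implied by (2.10).

```python
import cmath
from fractions import Fraction

def delta(j, p):
    """Conformal dimension Delta_j = j(j+1)/p of a spin-j vertex operator."""
    return j * (j + 1) / p

p = Fraction(7, 3)                 # arbitrary sample value
p_dual = 1 / (1 - 1 / p)           # p' from 1/p + 1/p' = 1, relation (2.1)
half = Fraction(1, 2)

# Dimension sum (2.11), then the scalar dressing brings the total to 1.
dim_pair = delta(half, p) + delta(half, p_dual)
assert dim_pair == Fraction(3, 4)
assert dim_pair + Fraction(1, 4) == 1

# Monodromy of two spin-1/2 bilinears: exp(2 pi i j1 j2 / p) * exp(2 pi i j1 j2 / p').
mono_pair = (cmath.exp(2j * cmath.pi * 0.25 * float(1 / p))
             * cmath.exp(2j * cmath.pi * 0.25 * float(1 / p_dual)))
mono_scalar = cmath.exp(1j * cmath.pi / 2)   # from the exponentials of f
assert abs(mono_pair - 1j) < 1e-12
assert abs(mono_pair * mono_scalar + 1) < 1e-12
```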

Lemma 2.1. Let the $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ algebras be generated by the currents $(J^+, J^0, J^-)$ and $(J'^+, J'^0, J'^-)$, respectively. Then the fermionic currents (2.12)–(2.13) generate the affine $\widehat{s\ell}(2|1)$ algebra of level $k = p - 2$. Its $\widehat{s\ell}(2)$ subalgebra coincides with $\widehat{s\ell}(2)_k$, i.e.,

\[ E^{12} = J^+, \qquad H^- = J^0, \qquad F^{12} = J^-, \tag{2.15} \]

and its $\widehat{u}(1)$ subalgebra is generated by the current

\[ H^+ = (k+1)\, J'^0 - \frac{k}{\sqrt2}\, \partial f. \tag{2.16} \]

The current

\[ A = \sqrt{2(k+1)}\, \Bigl(J'^0 - \frac{1}{\sqrt2}\, \partial f\Bigr) \tag{2.17} \]

commutes with all the $\widehat{s\ell}(2|1)$ currents.

We prove the lemma by evaluating the operator product expansions of the currents (2.12)–(2.16). Some of them are straightforward. In particular, one has

\[ H^+(z) \cdot H^+(w) = \frac{-k/2}{(z-w)^2}, \tag{2.18} \]

and since the generators in (2.13) are obtained from those in (2.12) by the action of the $s\ell(2)$ subalgebra (the unprimed $s\ell(2)$), we also effortlessly obtain the operator product expansions

\[ F^{12}(z)\, E^1(w) = \frac{-F^2}{z-w}, \qquad F^{12}(z)\, E^2(w) = \frac{F^1}{z-w}, \tag{2.19} \]

\[ E^{12}(z)\, F^1(w) = \frac{E^2}{z-w}, \qquad E^{12}(z)\, F^2(w) = \frac{-E^1}{z-w}. \tag{2.20} \]
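The bosonic statements of Lemma 2.1 that involve only $J'^0$ and $\partial f$ reduce to arithmetic on double-pole coefficients and can be checked exactly. In the sketch below (our own bookkeeping, not part of the proof) a current $a\,J'^0 + b\,j^0$ with $j^0 = \frac{1}{\sqrt2}\partial f$ is stored as the pair $(a, b)$; we use $J'^0 J'^0 \sim \frac{k'/2}{(z-w)^2}$ from (2.2) and $j^0 j^0 \sim \frac{1/2}{(z-w)^2}$ from (2.10), and we strip the overall factor $\sqrt{2(k+1)}$ from $A$ to stay within rational numbers.

```python
from fractions import Fraction

def double_pole(x, y, k_dual):
    """Coefficient of 1/(z-w)^2 in X(z)Y(w) for X = x[0]*J'^0 + x[1]*j^0.

    Assumes J'^0 J'^0 ~ (k'/2)/(z-w)^2, j^0 j^0 ~ (1/2)/(z-w)^2 and no
    singular term between J'^0 and j^0.
    """
    return x[0] * y[0] * k_dual / 2 + x[1] * y[1] * Fraction(1, 2)

for k in (Fraction(1, 3), Fraction(-3, 4), Fraction(5)):
    k_dual = 1 / (k + 1) - 1                 # dual level k' from (1.1)
    h_plus = (k + 1, -k)                     # H^+ = (k+1) J'^0 - k j^0, cf. (2.16)
    a_bare = (Fraction(1), Fraction(-1))     # A / sqrt(2(k+1)) = J'^0 - j^0, cf. (2.17)

    assert double_pole(h_plus, h_plus, k_dual) == -k / 2             # (2.18)
    assert double_pole(h_plus, a_bare, k_dual) == 0                  # A decouples from H^+
    assert 2 * (k + 1) * double_pole(a_bare, a_bare, k_dual) == 1    # A is unit-normalised
```

The current $A$ has no singular operator product with $H^- = J^0$ either, since it is built from the primed sector and the scalar only.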


The remaining relevant operator product expansions are calculated in Appendix D with the help of the Wakimoto bosonisation discussed in the following section. The latter explicitly shows the quantum group action in terms of the screening operator and contour manipulations (more precisely, the screenings (one screening in the current case of $s\ell(2)_q$) generate the nilpotent subalgebra of $s\ell(2)_q$, while the entire quantum group is represented in terms of a construction [19, 33] using more involved contour operations). Let us remark at this point that the currents in formulas (2.12) and (2.13) are normalised in accordance with a particular choice of screening contour in the Wakimoto representation. In general, the normalisation depends on the precise definition of the screened vertex operators, cf. [19, 33].

2.2. The Wakimoto representations of $\widehat{s\ell}(2)$ and screened operator products. We define a first-order bosonic system and an independent current by the operator products

\[ \beta(z)\, \gamma(w) = \frac{-1}{z-w}, \qquad \partial\varphi(z)\, \partial\varphi(w) = \frac{1}{(z-w)^2}. \tag{2.21} \]

In terms of these free fields, the $\widehat{s\ell}(2)_{p-2}$ currents can be written as

\[ J^+ = -\beta, \qquad J^0 = \sqrt{\tfrac{p}{2}}\, \partial\varphi + \beta\gamma, \qquad J^- = \beta\gamma\gamma + \sqrt{2p}\, \gamma\, \partial\varphi + (p-2)\, \partial\gamma, \tag{2.22} \]

and the vertex operators are

\[ V_{\frac12} = e^{\frac{1}{\sqrt{2p}} \varphi}, \qquad V_{-\frac12} = \gamma\, e^{\frac{1}{\sqrt{2p}} \varphi}. \tag{2.23} \]
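A few consistency checks on the free-field data (2.21)–(2.23) reduce to rational arithmetic. The sketch below is our own bookkeeping and assumes the Wakimoto energy-momentum tensor $\frac12 \partial\varphi\partial\varphi - \frac{1}{\sqrt{2p}} \partial^2\varphi - \beta\partial\gamma$ that appears later in (2.31); the function name and dictionary keys are hypothetical labels.

```python
from fractions import Fraction

def wakimoto_checks(p):
    """Rational consistency checks for the free-field realisation (2.21)-(2.23)."""
    p = Fraction(p)
    k = p - 2
    # J^0 J^0 double pole: p/2 from sqrt(p/2) d(phi), and -1 from the beta-gamma pair.
    jj = p / 2 - 1
    # Dimension of exp(a*phi) with a = 1/sqrt(2p): a^2/2 + a/sqrt(2p) = 1/(4p) + 1/(2p).
    # The factor gamma has dimension 0, so V_{-1/2} has the same dimension.
    dim = 1 / (4 * p) + 1 / (2 * p)
    # Central charge: 1 - 6/p from phi with background charge, plus 2 from beta-gamma.
    c = (1 - 6 / p) + 2
    return {"k": k, "JJ": jj, "dim": dim, "c": c}

for p in (Fraction(5, 2), Fraction(7, 3), Fraction(-4)):
    data = wakimoto_checks(p)
    k = data["k"]
    assert data["JJ"] == k / 2                                    # level of J^0 J^0 in (2.2)
    assert data["dim"] == Fraction(1, 2) * Fraction(3, 2) / p     # Delta_{1/2} = j(j+1)/p
    assert data["c"] == 3 * k / (k + 2)                           # Sugawara central charge
```

The $J^0$ charges of the two operators in (2.23) follow in the same way: $\sqrt{p/2} \cdot \frac{1}{\sqrt{2p}} = \frac12$ from the exponential, shifted by $-1$ for the factor $\gamma$.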

Similar formulae for the primed $\widehat{s\ell}(2)$ algebra are obtained by consistently replacing the free fields with the primed ones and replacing $p$ with $p'$. The Wakimoto bosonisation screening acts on a vertex operator as

\[ S : V(z) \mapsto \frac{1}{2\pi i} \oint_{C_z} dx\, S(x)\, V(z). \tag{2.24} \]

The integration contour can be chosen in different ways; one of the possibilities is to take the contour

[Figure: integration contour around the point $z$, with the branch cut drawn as a straight line] (2.25)

where the straight line shows the branch cut due to the nonlocality in the operator product of $S(x)$ and $V(z)$. As is well known (see, e.g., [19, 33, 6]), up to a normalisation factor this contour can be replaced with the contour running along the cut (in this case the integral is defined by analytic continuation). We choose the latter possibility in what follows; it turns out that we will have fewer $(-1)^\alpha$ factors if we define

\[ S : V(z) \mapsto \widetilde{V}(z) \equiv \int_z^\infty dx\, S(x)\, V(z), \tag{2.26} \]


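where the integral has to be evaluated for those parameter values where it converges and then analytically continued.

In order to verify the $\widehat{s\ell}(2|1)$ operator product expansions between the currents constructed in the previous section, we need to evaluate products involving screened and unscreened vertex operators in the combinations determined by the quantum-group trace (2.9), typically,

\[ \bigl(SV_1(z)\, V_1'(z) + V_1(z)\, S'V_1'(z)\bigr) \cdot \bigl(SV_2(w)\, V_2'(w) + V_2(w)\, S'V_2'(w)\bigr), \]

where $V$ and $V'$ are some $\widehat{s\ell}(2)$ and $\widehat{s\ell}(2)'$ vertex operators. In all of the operator products that we encounter in what follows, a nonzero contribution comes from the terms

\begin{align*}
& SV_1(z)\, V_1'(z)\, V_2(w)\, S'V_2'(w) + V_1(z)\, S'V_1'(z)\, SV_2(w)\, V_2'(w) \\
&\quad = \int_z^\infty du\, S(u)\, V_1(z)\, V_1'(z) \cdot \int_w^\infty dx\, S'(x)\, V_2(w)\, V_2'(w) \\
&\qquad + \int_z^\infty du\, S'(u)\, V_1(z)\, V_1'(z) \cdot \int_w^\infty dx\, S(x)\, V_2(w)\, V_2'(w). \tag{2.27}
\end{align*}

We first assume $\mathrm{Re}\, z > \mathrm{Re}\, w$, evaluate these operator products, and analytically continue the result. The above expression (2.27) thus becomes

\begin{align*}
& \int_z^\infty du\, S(u) \cdot V_1(z) \cdot V_2(w) \int_w^z dx\, V_1'(z) \cdot S'(x) \cdot V_2'(w) \\
&\quad + \int_w^z du\, V_1(z) \cdot S(u) \cdot V_2(w) \int_z^\infty dx\, S'(x) \cdot V_1'(z) \cdot V_2'(w) \\
&\quad + (-1)^{\alpha_1} \int_z^\infty du\, S(u) \cdot V_1(z) \cdot V_2(w) \int_z^\infty dx\, S'(x) \cdot V_1'(z) \cdot V_2'(w) \\
&\quad + (-1)^{\alpha_1'} \int_z^\infty du\, S(u) \cdot V_1(z) \cdot V_2(w) \int_z^\infty dx\, S'(x) \cdot V_1'(z) \cdot V_2'(w), \tag{2.28}
\end{align*}

where $(-1)^{\alpha_1}$ and $(-1)^{\alpha_1'}$ are the monodromies coming from

\[ V_1'(z)\, S'(x) = (-1)^{\alpha_1'}\, S'(x)\, V_1'(z), \qquad V_1(z)\, S(x) = (-1)^{\alpha_1}\, S(x)\, V_1(z) \tag{2.29} \]

(where $\mathrm{Re}\, x > \mathrm{Re}\, z$). When $V_1$ is either $e^{\frac{1}{\sqrt{2p}} \varphi}$ or $\gamma\, e^{\frac{1}{\sqrt{2p}} \varphi}$, as prescribed in (2.23), the monodromies are $(-1)^{\alpha_1} = e^{-\pi i/p}$ and $(-1)^{\alpha_1'} = e^{-\pi i/p'} = -e^{\pi i/p}$, and (2.27) becomes

\begin{align*}
& \bigl(SV_1(z)\, V_1'(z) + V_1(z)\, S'V_1'(z)\bigr) \cdot \bigl(SV_2(w)\, V_2'(w) + V_2(w)\, S'V_2'(w)\bigr) \\
&\quad = -2i \sin\frac{\pi}{p} \int_z^\infty du\, S(u)\, V_1(z)\, V_2(w) \int_z^\infty dx\, S'(x)\, V_1'(z)\, V_2'(w) \\
&\qquad + \int_z^\infty du\, S(u)\, V_1(z)\, V_2(w) \int_w^z dx\, V_1'(z)\, S'(x)\, V_2'(w) \\
&\qquad + \int_w^z du\, V_1(z)\, S(u)\, V_2(w) \int_z^\infty dx\, S'(x)\, V_1'(z)\, V_2'(w). \tag{2.30}
\end{align*}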


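Appendix D is devoted to explicit checks of the $\widehat{s\ell}(2|1)$ operator product expansions in the Wakimoto representation. In addition to establishing the operator product expansions, the Wakimoto bosonisation is a useful tool to verify the relation between the energy-momentum tensors underlying the central charge balance in Eq. (1.5).

Lemma 2.2. The combined energy-momentum tensor of the $\widehat{s\ell}(2|1)$ algebra generated by (2.12)–(2.16) and the independent $\widehat{u}(1)$ current $A(z)$, Eq. (2.17), equals the sum of the energy-momentum tensors for the two $\widehat{s\ell}(2)$ algebras and the free $\widehat{u}(1)$ current $\partial f(z)$:

\[ T_{\mathrm{Sug}} + \tfrac12 AA = \underbrace{\tfrac12 \partial\varphi\, \partial\varphi - \tfrac{1}{\sqrt{2p}}\, \partial^2\varphi - \beta\, \partial\gamma}_{\widehat{s\ell}(2)_{p-2}} + \underbrace{\tfrac12 \partial\varphi'\, \partial\varphi' - \tfrac{1}{\sqrt{2p'}}\, \partial^2\varphi' - \beta'\, \partial\gamma'}_{\widehat{s\ell}(2)_{p'-2}} + \tfrac12 \partial f\, \partial f. \tag{2.31} \]

Proof. To show this, we directly calculate the Sugawara energy-momentum tensor given by Eq. (B.5). First, the $\widehat{s\ell}(2)$ subalgebra contribution is given by

\[ \frac{1}{p-1} \bigl(E^{12} F^{12} + H^- H^-\bigr) = -\beta\, \partial\gamma + \frac{1}{p-1}\, \partial\beta\, \gamma + \frac{p}{2(p-1)}\, \partial\varphi\, \partial\varphi \tag{2.32} \]

(which, obviously, is not the $\widehat{s\ell}(2)$ energy-momentum tensor). Second, the fermionic contribution can be read off Appendix D,

\begin{align*}
\frac{1}{p-1} \bigl(E^1 F^1 - E^2 F^2\bigr) ={}& \frac{1}{2(1-p)}\, \partial\varphi\, \partial\varphi - \frac{1}{\sqrt{2p}}\, \partial^2\varphi - \frac{1}{p-1}\, \partial\beta\, \gamma + \tfrac12 \partial\varphi'\, \partial\varphi' - \frac{1}{\sqrt{2p'}}\, \partial^2\varphi' - \beta'\, \partial\gamma' \\
& + \sqrt2\, \beta'\gamma'\, \partial f + \sqrt{p'}\, \partial\varphi'\, \partial f - \frac{p-2}{2(p-1)}\, \partial f\, \partial f, \tag{2.33}
\end{align*}

and, in addition, we have

\begin{align*}
-\frac{1}{p-1}\, H^+ H^+ ={}& \sqrt{p'}\, (p-2)\, \partial\varphi'\, \partial f - \frac{(p-2)^2}{2(p-1)}\, \partial f\, \partial f - \frac{p}{2}\, \partial\varphi'\, \partial\varphi' + \sqrt2\, (p-2)\, \beta'\gamma'\, \partial f \\
& + (p-1)\, \beta'\, \partial\gamma' - (p-1)\, \partial\beta'\, \gamma' - \sqrt{2p(p-1)}\, \beta'\gamma'\, \partial\varphi' - (p-1)\, \beta'\beta'\gamma'\gamma'. \tag{2.34}
\end{align*}

Finally, the free scalar current $A$ has the energy-momentum tensor

\begin{align*}
\tfrac12 AA ={}& \frac{p}{2}\, \partial\varphi'\, \partial\varphi' - \sqrt{p(p-1)}\, \partial\varphi'\, \partial f + \tfrac12 (p-1)\, \partial f\, \partial f + \sqrt{2p(p-1)}\, \beta'\gamma'\, \partial\varphi' \\
& - \sqrt2\, (p-1)\, \beta'\gamma'\, \partial f + (p-1)\, \beta'\beta'\gamma'\gamma' - (p-1)\, \beta'\, \partial\gamma' + (p-1)\, \partial\beta'\, \gamma'. \tag{2.35}
\end{align*}

Adding all this together, we arrive at (2.31).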
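The final step of the proof, adding (2.32)–(2.35) and comparing with (2.31), is a term-by-term cancellation that can be checked numerically. The sketch below is only an illustration: the coefficient tables are transcribed by hand from our reading of (2.32)–(2.35), the dictionary keys are ad hoc labels for the field monomials (a prime marks the primed sector), and real $p > 1$ is assumed so that all square roots are real.

```python
import math

def stress_tensor_pieces(p):
    """Coefficient tables for (2.32)-(2.35), as we read them; keys are ad hoc labels."""
    r = math.sqrt
    pd = p / (p - 1)  # p', from 1/p + 1/p' = 1
    eq_2_32 = {"dphi dphi": p / (2 * (p - 1)), "b dg": -1.0, "db g": 1 / (p - 1)}
    eq_2_33 = {
        "dphi dphi": 1 / (2 * (1 - p)), "d2phi": -1 / r(2 * p), "db g": -1 / (p - 1),
        "dphi' dphi'": 0.5, "d2phi'": -1 / r(2 * pd), "b' dg'": -1.0,
        "b' g' df": r(2), "dphi' df": r(pd), "df df": -(p - 2) / (2 * (p - 1)),
    }
    eq_2_34 = {
        "dphi' df": r(pd) * (p - 2), "df df": -(p - 2) ** 2 / (2 * (p - 1)),
        "dphi' dphi'": -p / 2, "b' g' df": r(2) * (p - 2), "b' dg'": p - 1,
        "db' g'": -(p - 1), "b' g' dphi'": -r(2 * p * (p - 1)), "b' b' g' g'": -(p - 1),
    }
    eq_2_35 = {
        "dphi' dphi'": p / 2, "dphi' df": -r(p * (p - 1)), "df df": (p - 1) / 2,
        "b' g' dphi'": r(2 * p * (p - 1)), "b' g' df": -r(2) * (p - 1),
        "b' b' g' g'": p - 1, "b' dg'": -(p - 1), "db' g'": p - 1,
    }
    return eq_2_32, eq_2_33, eq_2_34, eq_2_35

def rhs_2_31(p):
    """Right-hand side of (2.31) in the same labelling."""
    pd = p / (p - 1)
    return {"dphi dphi": 0.5, "d2phi": -1 / math.sqrt(2 * p), "b dg": -1.0,
            "dphi' dphi'": 0.5, "d2phi'": -1 / math.sqrt(2 * pd), "b' dg'": -1.0,
            "df df": 0.5}

def mismatch(p):
    """Largest absolute difference between the summed pieces and (2.31)."""
    total = {}
    for piece in stress_tensor_pieces(p):
        for key, value in piece.items():
            total[key] = total.get(key, 0.0) + value
    target = rhs_2_31(p)
    keys = set(total) | set(target)
    return max(abs(total.get(key, 0.0) - target.get(key, 0.0)) for key in keys)

for p in (1.5, 2.7, 4.0):
    assert mismatch(p) < 1e-12
```

All mixed terms between the primed sector and $\partial f$ cancel between (2.33), (2.34) and (2.35), which is what allows the right-hand side of (2.31) to split into three decoupled pieces.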

3. Extending $\widehat{s\ell}(2|1)_k$ to the $\widehat{D}(2|1; k')_k$ Algebra

So far, we have seen that the appropriate bilinear combinations of vertex operators extend the $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_{k'} \oplus \widehat{u}(1)$ algebra to $\widehat{s\ell}(2|1)_k$. We now show that this is in fact part of a larger construction, that of the exceptional affine Lie superalgebra $\widehat{D}(2|1; \alpha)$.


3.1. The “auxiliary” level-1 $\widehat{s\ell}(2)$ algebra. We first use the auxiliary scalar $f$ to construct an $\widehat{s\ell}(2)$ algebra of level 1,

\[ j^+(z) = e^{\sqrt2 f(z)}, \qquad j^0(z) = \frac{1}{\sqrt2}\, \partial f(z), \qquad j^-(z) = e^{-\sqrt2 f(z)}. \tag{3.1} \]

Then the two Cartan currents of $\widehat{s\ell}(2|1)_{p-2}$ and the extra $\widehat{u}(1)$ current become

\[ H^- = J^0, \tag{3.2} \]

\[ H^+ = \frac{p}{p'}\, J'^0 - (p-2)\, j^0, \tag{3.3} \]

\[ A = \sqrt{2(p-1)}\, \bigl(J'^0 - j^0\bigr), \tag{3.4} \]

where, as before, $J'^0$ is the Cartan current of $\widehat{s\ell}(2)_{p'-2}$ and $J^0$ is that of $\widehat{s\ell}(2)_{p-2}$. We thus clearly see the interplay of three $\widehat{s\ell}(2)$ algebras, two at dual levels $k = p - 2$, $k' = p' - 2$, and one at level 1, in our construction of $\widehat{s\ell}(2|1)$.

In the next subsection, we discuss how to embed $\widehat{s\ell}(2) \oplus \widehat{s\ell}(2) \oplus \widehat{s\ell}(2)$ and $\widehat{s\ell}(2|1) \oplus \widehat{u}(1)$ in the affine superalgebra $\widehat{D}(2|1; \alpha)$ in such a way that the $\widehat{u}(1)$ currents (3.2)–(3.4) are recovered. This will mean that the eight basis elements of $\mathbb{C}^2(z) \otimes \mathbb{C}'^2(z) \otimes \mathbb{C}^2_1(z)$ (four of which were explicitly given in Eqs. (2.12)–(2.13) in terms of vertex operators) represent in fact the $\widehat{D}(2|1; \alpha)$ fermions.

3.2. Conformal embeddings in $\widehat{D}(2|1; \alpha)$. Our conventions on $D(2|1; \alpha)$ are summarised in Appendix C. Using the notations for the roots introduced there, we write the bosonic subalgebra of $D(2|1; \alpha)$ as $s\ell(2)^{(\alpha_2)} \oplus s\ell(2)^{(\alpha_3)} \oplus s\ell(2)^{(\alpha_\theta)}$. When $s\ell(2|1)$ is regularly embedded in $D(2|1; \alpha)$ (i.e. when the roots of the subalgebra are chosen as a subset of the roots of the embedding algebra), any of these three $s\ell(2)$ algebras can be taken as the $s\ell(2)$ subalgebra of $s\ell(2|1)$, for which we introduce the notation (reminding us of the $H^-$ current) $s\ell(2)^{(\alpha^-)}$, so that $\alpha^- = \alpha_2$, $\alpha_3$ or $\alpha_\theta$. For each choice of $\alpha^-$, there exist two regular embeddings of $s\ell(2|1)$ in $D(2|1; \alpha)$. The corresponding $s\ell(2|1)$ root lattices contain $\alpha^-$ and four of the eight odd roots of $D(2|1; \alpha)$. From now on, we use the parameter $\gamma = \frac{\alpha}{1+\alpha}$, $\gamma \in \mathbb{C} \setminus \{0, 1, \infty\}$ instead of $\alpha$. This parameter governs the norm of $\alpha_2$ and $\alpha_3$, and $\alpha_\theta$ is always the longest root if $\gamma \in [\frac12, 1[$ (see Appendix C). For the affine algebras, the levels of the algebras involved in the embedding

\[ \widehat{s\ell}(2|1)_\lambda \oplus \widehat{u}(1) \subset \widehat{D}\bigl(2|1; \tfrac{\gamma}{1-\gamma}\bigr)_\kappa \tag{3.5} \]

are related as $\lambda = -\frac{\kappa}{\gamma}$, $-\frac{\kappa}{1-\gamma}$ or $\kappa$ according to whether $\alpha^- = \alpha_2$, $\alpha_3$ or $\alpha_\theta$. On the other hand, one also has the regular embedding,

\[ \widehat{s\ell}(2)^{(\alpha_2)}_{-\frac{\kappa}{\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{-\frac{\kappa}{1-\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\kappa} \subset \widehat{D}\bigl(2|1; \tfrac{\gamma}{1-\gamma}\bigr)_\kappa. \tag{3.6} \]

We now note that the Sugawara central charge of $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_\kappa$ is 1 for any value of the level $\kappa$, because the dual Coxeter number of $D(2|1; \frac{\gamma}{1-\gamma})$ is zero and its superdimension (number of bosonic generators minus number of fermionic generators) is one. But the central charge of $\widehat{s\ell}(2|1)_\lambda \oplus \widehat{u}(1)$ is also 1 for any level $\lambda$, this time because the superdimension of $s\ell(2|1)$ is zero, and it turns out that for certain relations between the level $\kappa$ and the parameter $\gamma$, which we analyse


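below, the central charge of the $\widehat{s\ell}(2)_{\lambda_1} \oplus \widehat{s\ell}(2)_{\lambda_2} \oplus \widehat{s\ell}(2)_{\lambda_3}$ subalgebra of $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_\kappa$ is also one. It is therefore not surprising that $\widehat{s\ell}(2) \oplus \widehat{s\ell}(2) \oplus \widehat{s\ell}(2)$ and $\widehat{s\ell}(2|1) \oplus \widehat{u}(1)$ are intimately linked, a fact already noticed earlier when the energy-momentum tensors of the two theories were shown to coincide in a free field representation (2.31), and also seen from the representation decompositions in Sect. 5 and the character identities (5.55). The relation between $\widehat{s\ell}(2) \oplus \widehat{s\ell}(2) \oplus \widehat{s\ell}(2)$ and $\widehat{s\ell}(2|1) \oplus \widehat{u}(1)$ is based on the fact that the embeddings into $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_\kappa$ which we consider give rise to dual $\widehat{s\ell}(2)$ pairs:

Lemma 3.1. The Sugawara central charge of $\widehat{s\ell}(2)^{(\alpha_2)}_{-\frac{\kappa}{\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{-\frac{\kappa}{1-\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\kappa}$ is one if and only if two of the $\widehat{s\ell}(2)$ subalgebras are dual to each other in the sense of Eq. (1.1). Moreover, when this is so, the third $\widehat{s\ell}(2)$ algebra has level 1.

Proof. Indeed, adding up the central charges on the left-hand side of (3.6), we find the sum is one for $\kappa = \gamma - 1$ (in which case $\widehat{s\ell}(2)^{(\alpha_2)}$ and $\widehat{s\ell}(2)^{(\alpha_\theta)}$ are dual and $\widehat{s\ell}(2)^{(\alpha_3)}$ has level 1), for $\kappa = -\gamma$ (with $\widehat{s\ell}(2)^{(\alpha_3)}$ and $\widehat{s\ell}(2)^{(\alpha_\theta)}$ dual), or for $\kappa = 1$ (with $\widehat{s\ell}(2)^{(\alpha_2)}$ and $\widehat{s\ell}(2)^{(\alpha_3)}$ dual). Conversely, we have to consider three different possibilities to pick out two dual $\widehat{s\ell}(2)$ algebras.

1. Taking the pair $\widehat{s\ell}(2)^{(\alpha_2)}$ and $\widehat{s\ell}(2)^{(\alpha_\theta)}$ leads to $\kappa = \gamma - 1$ and the relevant embedding is

\[ \widehat{s\ell}(2)^{(\alpha_2)}_{\frac{1-\gamma}{\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{1} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\gamma-1} \subset \widehat{D}\bigl(2|1; \tfrac{\gamma}{1-\gamma}\bigr)_{\gamma-1}. \tag{3.7} \]

2. For the pair $\widehat{s\ell}(2)^{(\alpha_3)}$ and $\widehat{s\ell}(2)^{(\alpha_\theta)}$ chosen as dual, $\kappa = -\gamma$ and the relevant embedding is

\[ \widehat{s\ell}(2)^{(\alpha_2)}_{1} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{\frac{\gamma}{1-\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{-\gamma} \subset \widehat{D}\bigl(2|1; \tfrac{\gamma}{1-\gamma}\bigr)_{-\gamma}. \tag{3.8} \]

3. Taking $\widehat{s\ell}(2)^{(\alpha_2)}$ and $\widehat{s\ell}(2)^{(\alpha_3)}$ as the dual pair leads to $\kappa = 1$ and, thus, to the embedding

\[ \widehat{s\ell}(2)^{(\alpha_2)}_{-\frac{1}{\gamma}} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{\frac{1}{\gamma-1}} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{1} \subset \widehat{D}\bigl(2|1; \tfrac{\gamma}{1-\gamma}\bigr)_{1}. \tag{3.9} \]

This completes the proof of the lemma.

So we have two subalgebras of $\hat{g} = \widehat{D}(2|1; \frac{\gamma}{1-\gamma})_\kappa$, namely $\hat{h}_1 = \widehat{s\ell}(2|1)_\lambda \oplus \widehat{u}(1)$ and $\hat{h}_2 = \widehat{s\ell}(2)^{(\alpha_2)}_{\lambda_1} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{\lambda_2} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\lambda_3}$, whose central charges coincide with that of $\hat{g}$ as long as two of the three levels $k_1 = -\frac{\kappa}{\gamma}$, $k_2 = -\frac{\kappa}{1-\gamma}$, and $k_3 = \kappa$ obey $(k_a + 1)(k_b + 1) = 1$. For unitary representations, we would then conclude that embeddings (3.5), (3.7), (3.8), and (3.9) are conformal, since the coset Virasoro algebra $L = L^{g} - L^{h}$ has vanishing central charge, and the only unitary representation of the coset algebra in this case is the trivial one [18], leading to the following formal relation between characters,

\[ \mathrm{Tr}_\Lambda\, q^{L^{g}} = \mathrm{Tr}_\Lambda\, q^{L^{h}}. \tag{3.10} \]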
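The level bookkeeping in Lemma 3.1 is easy to check in exact arithmetic. The sketch below is only an illustration at one arbitrarily chosen value of $\gamma$; the function names are ours. It tests the three special values of $\kappa$ and one generic value, and so illustrates the statement rather than proving the “only if” direction.

```python
from fractions import Fraction

def c_sl2(k):
    """Sugawara central charge of affine sl(2) at level k: 3k/(k+2)."""
    return 3 * k / (k + 2)

def levels(gamma, kappa):
    """Levels of the sl(2) subalgebras along (alpha_2, alpha_3, alpha_theta) in (3.6)."""
    return (-kappa / gamma, -kappa / (1 - gamma), kappa)

def is_dual(ka, kb):
    """Duality relation (1.1)."""
    return (ka + 1) * (kb + 1) == 1

gamma = Fraction(2, 7)  # arbitrary sample value

# (kappa, positions of the dual pair, position of the level-1 algebra)
cases = (
    (gamma - 1, (0, 2), 1),    # embedding (3.7)
    (-gamma, (1, 2), 0),       # embedding (3.8)
    (Fraction(1), (0, 1), 2),  # embedding (3.9)
)
for kappa, (a, b), third in cases:
    lv = levels(gamma, kappa)
    assert sum(c_sl2(k) for k in lv) == 1
    assert is_dual(lv[a], lv[b])
    assert lv[third] == 1

# At a generic level the three central charges do not add up to 1.
assert sum(c_sl2(k) for k in levels(gamma, Fraction(3))) != 1
```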

In the case at hand, though, we are dealing with non-unitary representations. A sum (of products) of characters corresponding to $\hat{h}_1$ or to $\hat{h}_2$ representations can only be


interpreted as an irreducible $\hat{g}$ character if the corresponding representation is acted upon trivially by the coset Virasoro algebra. We want to argue that, in order for this to be so in the case of a $\hat{g}$ representation with highest vector $|\Lambda\rangle$, it is sufficient to demand that the state $L_{-2}|\Lambda\rangle$ is null (i.e., singular or a descendant of a singular state), and that the same condition holds for the vacuum representation, i.e. $L_{-2}|0\rangle$ is null, where $|0\rangle$ is the usual $s\ell(2, \mathbb{C})$ invariant vacuum, which is also invariant under the action of the finite-dimensional algebra $g$. Indeed, the operator product expansion of $L$ with one of the currents $J^a(\zeta)$ of $\hat{g}$ is of the form

\[ L(z)\, J^a(\zeta) = \frac{X(\zeta)}{(z-\zeta)^2} + \frac{Y(\zeta)}{z-\zeta} + \text{regular}, \tag{3.11} \]

and the state associated with the residue of the double pole is

\[ X_{-1}|0\rangle = (J^a)_1\, L_{-2}|0\rangle = \lambda\, (J^a)_{-1}|0\rangle, \tag{3.12} \]

for some constant $\lambda$. Since $(J^a)_{-1}|\Lambda_0\rangle$ is not null and yet our assumption demands that the vector $(J^a)_1 L_{-2}|\Lambda_0\rangle$ vanishes or is null, it follows that $\lambda = 0$, so that $X(\zeta)$ must vanish identically. Thus,

\[ \bigl[J^a_m, L_n\bigr] = M^{ab}\, Y^b_{m+n}, \tag{3.13} \]

so that $L_n$ and $Y^a_n$ are just two Kac–Moody primary fields in some representation of $g$. Moreover, this representation is finite since the number of weight-two states in $\Lambda_0$ is finite, and all the states in the multiplet are null.

We now return to the representation $|\Lambda\rangle$ which is not the vacuum representation. It is easy to show that

\[ (L^g)_1\, L_{-2}|\Lambda\rangle = 3 L_{-1}|\Lambda\rangle, \tag{3.14} \]

so that this state must also be null. Since $L_{-1}$ and $L_{-2}$ generate the whole Virasoro algebra, it follows that $L_{-n}|\Lambda\rangle$ must be null for any $n$, and hence any of the $Y_{-n}|\Lambda\rangle$ will also be null for a $Y$ in the null multiplet of weight-two fields. As a consequence of the above remarks, a state of the form $|W\rangle = L_{-n} \{\dots\} |\Lambda\rangle$, where $\dots$ consists of some arbitrary collection of $\hat{g}$ current modes, is null, as can be shown inductively by pushing $L_{-n}$ to the right. The action of $L$ on the representation $\Lambda$ is therefore effectively trivial as claimed.

For instance, in the case of embedding (3.9), where $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})$ has level one, the state $(E^{\alpha_\theta}_{-1})^2|0\rangle$ is a singular vector, where $E^{\alpha_\theta}_{-1}$ is the usual affine simple root associated with the step operator corresponding to the highest root $\alpha_\theta$ of $g$. This state is a singular vector at grade two as is $L_{-2}|0\rangle$, so it isn’t unreasonable to argue that they are in the same multiplet under $g$. If this were the case, our condition that $L_{-2}|\Lambda\rangle$ is null is equivalent to the condition that $(E^{\alpha_\theta}_{-1})^2|\Lambda\rangle$ is null. The latter is implied by the vanishing of $\langle\Lambda|(E^{-\alpha_\theta}_{1})^2 (E^{\alpha_\theta}_{-1})^2|\Lambda\rangle$, yielding the familiar $\widehat{s\ell}(2)$ result

\[ \alpha_\theta \cdot \Lambda\, (\alpha_\theta \cdot \Lambda - 1) = 0. \tag{3.15} \]

This means we only should see the unitary singlet and doublet representations of the $\widehat{s\ell}(2)^{(\alpha_\theta)}_1$ appearing in the $\widehat{D}(2|1; \frac{\gamma}{1-\gamma})_1$ character decomposition if the embedding is to be conformal, and indeed inspection of our character sum rules reveals that only the characters of these $\widehat{s\ell}(2)^{(\alpha_\theta)}_1$ representations occur.


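We now proceed to identify which of the $\widehat{s\ell}(2|1)$ embeddings (3.5) (with $\kappa = \gamma - 1$, $-\gamma$ or 1) is the one implied in (3.2)–(3.4). To make contact with the latter expressions, we must fix the level of $\widehat{s\ell}(2|1)$ to be $k$, so that its subalgebra $\widehat{s\ell}(2)^{(\alpha^-)}$ is also at level $k$. According to the above discussion, we have the embeddings

\[ \widehat{s\ell}(2|1)_\lambda \oplus \widehat{u}(1) \;\subset\; \widehat{D} \;\supset\; \widehat{s\ell}(2)^{(\alpha_2)}_{\lambda_1} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{\lambda_2} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\lambda_3} \tag{3.16} \]

in $\widehat{D} \equiv \widehat{D}(2|1; \frac{\gamma}{1-\gamma})_{\gamma-1}$ for the levels

\[ (\lambda; \lambda_1, \lambda_2, \lambda_3) = \begin{cases} (k; k', 1, k),\ (k'; k', 1, k), & \text{for } \gamma = -\frac{k}{k'} = k + 1, \\ (k; k, 1, k'),\ (k'; k, 1, k'), & \text{for } \gamma = -\frac{k'}{k} = k' + 1, \end{cases} \tag{3.17} \]

in $\widehat{D} \equiv \widehat{D}(2|1; \frac{\gamma}{1-\gamma})_{-\gamma}$ for the levels

\[ (\lambda; \lambda_1, \lambda_2, \lambda_3) = \begin{cases} (k; 1, k', k),\ (k'; 1, k', k), & \text{for } \gamma = -k, \\ (k; 1, k, k'),\ (k'; 1, k, k'), & \text{for } \gamma = -k', \end{cases} \tag{3.18} \]

and in $\widehat{D} \equiv \widehat{D}(2|1; \frac{\gamma}{1-\gamma})_{1}$ for the levels

\[ (\lambda; \lambda_1, \lambda_2, \lambda_3) = \begin{cases} (k; k, k', 1),\ (k'; k, k', 1), & \text{for } \gamma = -\frac{1}{k}, \\ (k; k', k, 1),\ (k'; k', k, 1), & \text{for } \gamma = -\frac{1}{k'}. \end{cases} \tag{3.19} \]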

u(1) in terms of s(2)(α2 ) ⊕s(2)(α3 ) ⊕ We actually have twelve descriptions of s(2|1) k ⊕ (α ) θ s(2) , which we summarize in Table 1. There, we label the even subalgebra of s(2|1) ⊕ u(1) as (α [s(2)

−)

+

∗

⊕ u(1)(α ) ] ⊕ u(1)(α ) .

(3.20)

We further define $\alpha'^-$ and β such that $\widehat{s\ell}(2)^{(\alpha'^-)}$ is the algebra dual to $\widehat{s\ell}(2)^{(\alpha^-)}$, and $\widehat{s\ell}(2)^{(\beta)}$ is the third, level 1, algebra in the sense described in Lemma 3.1. The comparison of data in Table 1 with the currents (3.2)–(3.4) requires proper normalisation, dictated by the operator product expansions derived earlier. Since $H^-(z)\,H^-(w) = \frac{k/2}{(z-w)^2}$, the properly normalised corresponding root is $\bar\alpha^- = \mu\alpha^-$ with $\mu = \sqrt{k/2(\alpha^-)^2}$ (since $(\bar\alpha^-)^2 = k/2$). Similarly, $H^+(z)$ corresponds to $\bar\alpha^+ = \nu\alpha^+$ with $\nu = \sqrt{-k/2(\alpha^+)^2}$, $J'^0(z)$ corresponds to $\bar\alpha'^- = \rho\alpha'^-$ with $\rho = \sqrt{k'/2(\alpha'^-)^2}$, $j^0(z)$ corresponds to $\bar\beta = \sigma\beta$ with $\sigma = \sqrt{1/2(\beta)^2}$, and finally, $A(z)$ corresponds to $\bar\alpha^* = \tau\alpha^*$ with $\tau = \sqrt{1/(\alpha^*)^2}$. We conclude that the first embedding in Table 1 indeed corresponds to currents (3.2)–(3.4), with

α− = +

α = ∗

α =

√

k 2 αθ , √ − k , α − − kβ 2 (α2 + α3 ) = (k + 1) √ −1 (kα2 + (k + 1)α3 ) = 2k(k 2(k+1)

(3.21) (3.22) −

+ 1)( α

). −β

(3.23)


P. Bowcock, B. L. Feigin, A. M. Semikhatov, A. Taormina

This therefore indicates that the vertex operators corresponding to all eight elements from $\mathbb{C}^2(z) \otimes \mathbb{C}'^2(z) \otimes \mathbb{C}^2_1(z)$ extend our vertex operator construction of $\widehat{s\ell}(2|1)_k$ to $\widehat{D}(2|1;\frac{1}{k'})_k = \widehat{D}(2|1;k')_k$. In Sect. 5, we establish character sumrules which confirm that the coset of $\widehat{D}(2|1;k')_k$ by $\widehat{s\ell}(2|1)_k \oplus \widehat{u}(1)$ actually coincides with the coset corresponding to the conformal embedding (3.7).

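Table 1. Embeddings of $\widehat{s\ell}(2|1)_\lambda \oplus \widehat{u}(1)$ and $\widehat{s\ell}(2)^{(\alpha_2)}_{\lambda_1} \oplus \widehat{s\ell}(2)^{(\alpha_3)}_{\lambda_2} \oplus \widehat{s\ell}(2)^{(\alpha_\theta)}_{\lambda_3}$ in $\widehat{D}(2|1;\frac{\gamma}{1-\gamma})_{\gamma-1}$ (embeddings 1,3,5,7), $\widehat{D}(2|1;\frac{\gamma}{1-\gamma})_{-\gamma}$ (embeddings 2,4,9,11) and $\widehat{D}(2|1;\frac{\gamma}{1-\gamma})_{1}$ (embeddings 6,8,10,12). $\alpha^-$ and $\alpha^+$ are associated with the isospin and hypercharge in $s\ell(2|1)$, while $\alpha^*$ is in the u(1) direction orthogonal to the $s\ell(2|1)$ root plane in D(2|1; α). $\widehat{s\ell}(2)^{(\alpha^-)}$ and $\widehat{s\ell}(2)^{(\alpha'^-)}$ are dual algebras and $\widehat{s\ell}(2)^{(\beta)}$ is the third, level 1 algebra. The transformations $t_1 : \gamma \to 1 - \gamma$ and $t_2 : \gamma \to \gamma^{-1}$ are discussed in Appendix C.

embedding | α^- | roots generating sℓ(2|1) plane | α^+ | α^* | α'^- | β | value of parameter γ | (λ; λ1, λ2, λ3)
1 | αθ | (α1 − αθ, α1) | −α2 − α3 | (1 − γ)α2 − γα3 | α2 | α3 | −k/k' | (k; k', 1, k)
2 | αθ | (α1 − αθ, α1) | −α2 − α3 | (1 − γ)α2 − γα3 | α3 | α2 | −k = t1(−k/k') | (k; 1, k', k)
3 | αθ | (α1 + α3, −α1 − α2) | −α2 + α3 | (1 − γ)α2 + γα3 | α2 | α3 | −k/k' | (k; k', 1, k)
4 | αθ | (α1 + α3, −α1 − α2) | −α2 + α3 | (1 − γ)α2 + γα3 | α3 | α2 | −k = t1(−k/k') | (k; 1, k', k)
5 | α2 | (α1 + α3, αθ − α1) | α3 + αθ | α3 + (1 − γ)αθ | αθ | α3 | −k'/k | (k; k, 1, k')
6 | α2 | (α1 + α3, αθ − α1) | α3 + αθ | α3 + (1 − γ)αθ | α3 | αθ | −1/k = t2 t1 t2(−k'/k) | (k; k, k', 1)
7 | α2 | (α1, α1 + α2) | −α3 + αθ | −α3 + (1 − γ)αθ | αθ | α3 | −k'/k | (k; k, 1, k')
8 | α2 | (α1, α1 + α2) | −α3 + αθ | −α3 + (1 − γ)αθ | α3 | αθ | −1/k = t2 t1 t2(−k'/k) | (k; k, k', 1)
9 | α3 | (α1 + α2, αθ − α1) | α2 + αθ | α2 + γαθ | αθ | α2 | −k' | (k; 1, k, k')
10 | α3 | (α1 + α2, αθ − α1) | α2 + αθ | α2 + γαθ | α2 | αθ | −1/k' = t2(−k') | (k; k', k, 1)
11 | α3 | (α1, α1 + α3) | −α2 + αθ | −α2 + γαθ | αθ | α2 | −k' | (k; 1, k, k')
12 | α3 | (α1, α1 + α3) | −α2 + αθ | −α2 + γαθ | α2 | αθ | −1/k' = t2(−k') | (k; k', k, 1)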

4. Constructing the Representations

We now construct relations between representations of $\widehat{s\ell}(2|1)_k$ and of the $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ algebras with dual k and k'. The vertex operator realisation of $\widehat{s\ell}(2|1)_k$ in Sect. 2 involves taking “quantum traces” of products of $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ vertex operators, suggesting that representations of $\widehat{s\ell}(2|1)$ could be constructed as sums of products of representations of these dual $\widehat{s\ell}(2)$ algebras (and the auxiliary $\widehat{s\ell}(2)_1$ representations). However, identifying which $\widehat{s\ell}(2)$ representations to choose in order to obtain, e.g., the category O of $\widehat{s\ell}(2|1)$ representations does not follow from the operator construction alone, and the question is analysed in Sect. 4.3.

On the other hand, one can address the inverse problem of reconstructing $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_{k'}$ representations out of a given $\widehat{s\ell}(2|1)_k$ representation. By simply restricting the latter, one obviously arrives at representations of $\widehat{s\ell}(2)_k$. For the dual $\widehat{s\ell}(2)_{k'}$ algebra, which is not a subalgebra of $\widehat{s\ell}(2|1)$, the construction of its representations starting with an $\widehat{s\ell}(2|1)_k$ representation is provided in Secs. 4.1 and 4.2. The “direct” and the “inverse” problems each consist, first, in constructing a representation and second, in decomposing it. The second step, which requires a more detailed analysis of the representations constructed, will be addressed in Sect. 5. We continue using both k = p − 2 and k' = p' − 2 as the parameters.

4.1. Reconstructing the dual $\widehat{s\ell}(2)$ pair. We first address the problem of reconstructing the dual pair $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_{k'}$ starting with the $\widehat{s\ell}(2|1)_k$ algebra. One of these is obviously the subalgebra of $\widehat{s\ell}(2|1)$, and thus the problem is to “complete” the u(1) subalgebra to the second (“primed”) $\widehat{s\ell}(2)$. The construction goes by picking out the appropriate terms in the operator products $(\mathbb{C}^2(z) \oplus \mathbb{C}^2(z)) \cdot (\mathbb{C}^2(w) \oplus \mathbb{C}^2(w))$, where $\mathbb{C}^2(z) \oplus \mathbb{C}^2(z)$ denotes the $\widehat{s\ell}(2|1)$ operators representing $s\ell(2)$ doublets. This yields the dual $\widehat{s\ell}(2)$ operator product expansions after a “correction” with an extra scalar field (recall that we also needed an auxiliary scalar in constructing $\widehat{s\ell}(2|1)$ out of two $\widehat{s\ell}(2)$ algebras). We, thus, introduce the scalar field φ with the operator product

\[ \partial\varphi(z)\,\partial\varphi(w) = -\frac{1}{(z-w)^2} \tag{4.1} \]

(note that the sign is opposite to the one for the f scalar in (2.10)) and define

\[ C^+ = \frac{1}{k+1}\,E^1F^2\,e^{\sqrt{2}\varphi}, \qquad C^- = \frac{1}{k+1}\,F^1E^2\,e^{-\sqrt{2}\varphi}, \qquad C^0 = \frac{k'}{\sqrt{2}}\,\partial\varphi + (k'+1)H^+. \tag{4.2} \]

The operators $C^\pm$ representing $\mathbb{C}^2(z)$ (“corrected” by the free scalar, and therefore, local with respect to each other) commute with the $\widehat{s\ell}(2)$ subalgebra of $\widehat{s\ell}(2|1)$. It also follows from the $\widehat{s\ell}(2|1)$ algebra that

\[ (E^1F^2)(z)\,(E^2F^1)(w) = \frac{k(k+1)}{(z-w)^4} - \frac{2(k+1)H^+}{(z-w)^3} - (k+1)\,\frac{T_{Sug} + E^1F^1 - E^2F^2 + \partial H^- + \partial H^+}{(z-w)^2} + \dots, \tag{4.3} \]

where $T_{Sug}$ is the Sugawara energy-momentum tensor (B.5). Thus,

\[ C^+(z)\,C^-(w) = \frac{k'}{(z-w)^2} + \frac{2C^0}{z-w}. \tag{4.4} \]

It is easy to see that $C^+(z)\,C^+(w)$ and $C^-(z)\,C^-(w)$ are nonsingular. We also find the operator products

\[ C^0(z)\,C^\pm(w) = \frac{\pm C^\pm}{z-w}, \qquad C^0(z)\,C^0(w) = \frac{k'/2}{(z-w)^2}. \tag{4.5} \]


Equations (4.4) and (4.5) show that $C^+$, $C^0$, and $C^-$ satisfy the “dual” $\widehat{s\ell}(2)$ algebra of level k'. However, we keep the notation $C^{\pm,0}$ instead of $J'^{\pm,0}$ in order to stress that the former are constructed out of the $\widehat{s\ell}(2|1)$ currents. In addition, we have the current

\[ D = \frac{1}{\sqrt{2}}\,\partial\varphi + H^+, \qquad D(z)\,D(w) = -\frac{(k+1)/2}{(z-w)^2}, \tag{4.6} \]

that commutes with the $\widehat{s\ell}(2)_{k'}$ algebra.

Remark 4.1. Under the $\widehat{s\ell}(2|1)$ spectral flow

\[ E^1(z) \to z^{\theta}E^1(z), \quad E^2(z) \to z^{-\theta}E^2(z), \quad F^1(z) \to z^{-\theta}F^1(z), \quad F^2(z) \to z^{\theta}F^2(z), \quad H^+(z) \to H^+(z) - \theta\,\frac{k}{z}, \]

the $\widehat{s\ell}(2)_{k'}$ currents undergo the spectral flow transform with the parameter 2θ:

\[ C^\pm(z) \to z^{\pm 2\theta}\,C^\pm(z), \qquad C^0 \to C^0 + \theta\,\frac{k'}{z}. \]

Note also that if the $\widehat{s\ell}(2|1)$ spectral flow is accompanied by the transformation $\varphi(z) \to \varphi(z) - \sqrt{2}\,\theta\log z$ in the auxiliary scalar sector, the $\widehat{s\ell}(2)_{k'}$ currents remain invariant.

Remark 4.2. The operators emerging in the operator product expansion (4.3) already commute with the $\widehat{s\ell}(2)$ subalgebra; they can be further reorganised by separating the part that commutes with $H^+$. The resulting operators then generate the coset W algebra $w = \widehat{s\ell}(2|1)/\widehat{g\ell}(2)$ mentioned in the Introduction. Its central charge

\[ c = -2\,\frac{2k+1}{k+2} = 2\,\frac{k'-1}{k'+2} \]

satisfies, obviously, $c + 1 + \frac{3k}{k+2} = 0$. The lowest spin (spin-2 and spin-3) generators of w can be read off from (4.3) (with the first-order pole restored) and are expressed through the $\widehat{s\ell}(2|1)$ currents as

\[ W_2 = \frac{E^1F^1 - E^2F^2}{k+1} + \frac{E^{12}F^{12} + H^-H^-}{(k+1)(k+2)} + \frac{H^+H^+}{k(k+1)} + \frac{\partial H^-}{k+2} \]

and (choosing it to be a W2 -primary) 1 1 12 1 2 12 1 2k 1 1 12 W3 = k+1 (k+2)(k+4)(3k+2) 2 (k + 4)E ∂F + E F F − 2 E ∂F + 21 (k + 4)E 2 ∂F 2 − − F 12 E 1 E 2 + H − E 1 F 1 + H − E 2 F 2 − − 2k H + E 12 F 12 + −

− k+2 + k H ∂H

−

1 2 (k

−

k+4 + 2 2 2 + − − k H E F − − kH H H 1 1 12 12 1 1 2 (k + 4)∂E F + 2 ∂E F

−

2(k+4) H 3k 2

k+4 + 1 1 k H E F + + +

H H

+ 4)∂E F 2 − ∂H + H − − 21 ∂ 2 H − − 16 (k + 4)∂ 2 H + . 2

No higher-spin currents are generated for k = −3 (i.e., the W3 · W3 operator product is closed to W2 and W3 ), and similarly with more negative integer values: for k = −n, the highest spin is n.
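The central-charge statements of Remark 4.2 can be checked with exact rational arithmetic. The following is a minimal sketch, assuming that the dual levels are related by (k + 1)(k' + 1) = 1 and that the central charge of gl(2) at level k is the Sugawara value 3k/(k + 2) plus 1 for the free u(1) current; the function names are illustrative only.

```python
from fractions import Fraction


def dual_level(k):
    """Dual level k', assuming the duality relation (k + 1)(k' + 1) = 1."""
    return Fraction(1) / (k + 1) - 1


def coset_central_charge(k):
    """c = -2(2k + 1)/(k + 2), as quoted in Remark 4.2."""
    return -2 * (2 * k + 1) / (k + 2)


def coset_central_charge_dual(kp):
    """The same central charge written through k': 2(k' - 1)/(k' + 2)."""
    return 2 * (kp - 1) / (kp + 2)


def gl2_central_charge(k):
    """Assumed gl(2) value: sl(2) Sugawara 3k/(k + 2) plus 1 for u(1)."""
    return 3 * k / (k + 2) + 1


# Sample levels; k = -1 and k = -2 are excluded because of the denominators.
sample_levels = [Fraction(1), Fraction(3), Fraction(-1, 2),
                 Fraction(5, 3), Fraction(-7, 2)]

for k in sample_levels:
    kp = dual_level(k)
    assert (k + 1) * (kp + 1) == 1
    # The two expressions for c agree under the duality.
    assert coset_central_charge(k) == coset_central_charge_dual(kp)
    # c + 1 + 3k/(k + 2) = 0.
    assert coset_central_charge(k) + gl2_central_charge(k) == 0
```

The check only confirms that the two forms of c and the relation c + 1 + 3k/(k + 2) = 0 are mutually consistent at the sampled levels; it says nothing about the generators W2 and W3 themselves.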


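4.2. From $\widehat{s\ell}(2|1)_k$ to $\widehat{s\ell}(2)_{k'}$ representations. We now consider the $\widehat{s\ell}(2)_{k'}$ representation furnished by the modes $C_n^{\pm,0}$ of the currents constructed in (4.2). We start with an $\widehat{s\ell}(2|1)_k$ module, which we take to be a twisted Verma module $P_{h_-,h_+,k;\theta}$ (see Appendix 6) with the twisted highest-weight vector $|h_-,h_+,k;\theta\rangle$. Let also $E_\sigma$ be the Fock module of the free current ∂φ with the highest-weight vector $|e^{\frac{\sigma}{\sqrt{2}}\varphi}\rangle$ corresponding to the operator $e^{\frac{\sigma}{\sqrt{2}}\varphi}$. Then $P_{h_-,h_+,k;\theta} \otimes \bigoplus_{n\in\mathbb{Z}} E_{\sigma+2n}$ carries a representation of $\widehat{s\ell}(2)_{k'}$. We now construct the $\widehat{s\ell}(2)_{k'}$ highest-weight vector, which we temporarily denote by $|h_-,h_+;\theta;\sigma\rangle$, by tensoring the highest-weight vectors in these modules.

Lemma 4.3. The $\widehat{s\ell}(2)_{k'}$ state

\[ |h_-,h_+;\theta;\sigma\rangle = |h_-,h_+,k;\theta\rangle \otimes |e^{\frac{\sigma}{\sqrt{2}}\varphi}\rangle \tag{4.7} \]

is a twisted relaxed highest-weight state.

We recall [17] that a twisted relaxed highest-weight state $|j',\Lambda,k';\theta'\rangle_{s\ell(2)}$ satisfies the annihilation conditions

\[ C^+_{\geq 1+\theta'}\,|j',\Lambda,k';\theta'\rangle_{s\ell(2)} = 0, \qquad C^-_{\geq 1-\theta'}\,|j',\Lambda,k';\theta'\rangle_{s\ell(2)} = 0, \qquad C^0_{\geq 1}\,|j',\Lambda,k';\theta'\rangle_{s\ell(2)} = 0, \tag{4.8} \]

the eigenvalue condition

\[ C^0_0\,|j',\Lambda,k';\theta'\rangle_{s\ell(2)} = \bigl(j' - \tfrac{k'}{2}\theta'\bigr)\,|j',\Lambda,k';\theta'\rangle_{s\ell(2)}, \tag{4.9} \]

and the relation

\[ C^-_{-\theta'}\,C^+_{\theta'}\,|j',\Lambda,k';\theta'\rangle_{s\ell(2)} = \Lambda\,|j',\Lambda,k';\theta'\rangle_{s\ell(2)}. \tag{4.10} \]

We will write $|j',\Lambda,k'\rangle_{s\ell(2)}$ for the “untwisted” state $|j',\Lambda,k';0\rangle_{s\ell(2)}$. When there are no more independent relations (no singular vectors are factored over), the module spanned by the $\widehat{s\ell}(2)_{k'}$ generators acting on $|j',\Lambda,k';\theta'\rangle$ is called the relaxed Verma module $R_{j',\Lambda,k';\theta'}$ [17]. The Sugawara dimension of the twisted relaxed highest-weight vector $|j',\Lambda,k';\theta'\rangle_{s\ell(2)}$ is given by

\[ \Delta' = \frac{j'^2 + j' + \Lambda}{k'+2} - \theta' j' + \frac{k'}{4}\,\theta'^2. \tag{4.11} \]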
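As an illustration of how the twist enters (4.9) and (4.11), the sketch below reads the θ'-dependence as a map on the pair (charge, dimension) and checks that two successive twists compose additively. Reading the formulas as such a map is an interpretation made here for illustration, and the function names are hypothetical.

```python
from fractions import Fraction


def relaxed_charge(jp, kp, theta):
    """C^0_0 eigenvalue from (4.9): j' - (k'/2) theta'."""
    return jp - theta * kp / 2


def relaxed_dimension(jp, lam, kp, theta):
    """Sugawara dimension from (4.11)."""
    return (jp * jp + jp + lam) / (kp + 2) - theta * jp + kp * theta * theta / 4


def flow(charge, dim, kp, theta):
    """Assumed twist map on (charge, dimension), read off from (4.9), (4.11)."""
    return charge - theta * kp / 2, dim - theta * charge + kp * theta * theta / 4


# Arbitrary sample values of j', Lambda and k'.
jp, lam, kp = Fraction(1, 3), Fraction(-5, 7), Fraction(-2, 5)
c0 = relaxed_charge(jp, kp, 0)
d0 = relaxed_dimension(jp, lam, kp, 0)

for t1 in range(-3, 4):
    for t2 in range(-3, 4):
        c1, d1 = flow(c0, d0, kp, t1)
        c2, d2 = flow(c1, d1, kp, t2)
        # Twisting by t1 and then by t2 matches a single twist by t1 + t2.
        assert c2 == relaxed_charge(jp, kp, t1 + t2)
        assert d2 == relaxed_dimension(jp, lam, kp, t1 + t2)
```

The additivity is what one expects if the twisted states form a single orbit labelled by θ', which is how the twist is used in the decomposition later in this subsection.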

Proof of the Lemma. Let $U_{h_-,h_+;\theta}$ be the operator corresponding to the twisted highest-weight vector $|h_-,h_+,k;\theta\rangle$ (see (B.3)). It follows that the $\widehat{s\ell}(2)_{k'}$ currents $C^{\pm,0}$ develop the following poles in the operator product with the operator $X_{h_-,h_+;\theta;\sigma} = U_{h_-,h_+;\theta}\,e^{\frac{\sigma}{\sqrt{2}}\varphi}$ corresponding to (4.7):

\[ C^+(z)\,X_{h_-,h_+;\theta;\sigma}(0) \sim z^{2\theta-1-\sigma}, \qquad C^-(z)\,X_{h_-,h_+;\theta;\sigma}(0) \sim z^{-2\theta-1+\sigma}. \tag{4.12} \]

This gives precisely the twisted relaxed highest-weight conditions for the state in (4.7). We now choose σ = 2θ for simplicity (taking σ different from 2θ would result in the spectral flow transform of this state and, thus, in the twist of the entire module). We then find that the $|h_-,h_+;\theta;2\theta\rangle$ state constructed in (4.7) satisfies condition (4.10) with θ' = 0 and

\[ \Lambda = -\frac{1}{(k+1)^2}\,(k+1+h_+-h_-)(h_++h_-). \tag{4.13} \]

We also see that the eigenvalue of $C^0_0$ on $|h_-,h_+;\theta;2\theta\rangle$ is $j' = \frac{h_+}{k+1}$. Thus, $|h_-,h_+;\theta;2\theta\rangle$ is the relaxed highest-weight state

\[ |h_-,h_+;\theta;2\theta\rangle = \Bigl|\tfrac{h_+}{k+1},\, -\tfrac{1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\, k'\Bigr\rangle_{s\ell(2)} \tag{4.14} \]

and the freely generated module is the relaxed Verma module $R_{\frac{h_+}{k+1},\,\frac{-1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\,k'}$.
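The eigenvalue j' = h+/(k + 1) can be traced through the definition of C^0 in (4.2). The sketch below assumes that (∂φ)_0 acts on the state attached to e^{(σ/√2)φ} as −σ/√2 (the sign follows the operator product (4.1)), that H^+_0 acts on the twisted highest-weight vector as h+ − kθ (as suggested by the spectral flow in Remark 4.1), and that (k + 1)(k' + 1) = 1. The factors of √2 cancel, so exact rational arithmetic suffices.

```python
from fractions import Fraction


def dual_level(k):
    """Dual level k', assuming (k + 1)(k' + 1) = 1."""
    return Fraction(1) / (k + 1) - 1


def c0_eigenvalue(h_plus, k, theta, sigma):
    """Eigenvalue of C^0_0 = (k'/sqrt2)(d phi)_0 + (k' + 1) H^+_0."""
    kp = dual_level(k)
    # Assumed: (d phi)_0 = -sigma/sqrt2, so (k'/sqrt2)(d phi)_0 = -k' sigma/2.
    scalar_part = -kp * sigma / 2
    # Assumed: H^+_0 = h_+ - k*theta on the twisted highest-weight vector.
    current_part = (kp + 1) * (h_plus - k * theta)
    return scalar_part + current_part


def expected_eigenvalue(h_plus, k, theta, sigma):
    """(4.9) with j' = h_+/(k + 1) and twist theta' = sigma - 2*theta."""
    kp = dual_level(k)
    theta_prime = sigma - 2 * theta
    return h_plus / (k + 1) - theta_prime * kp / 2


for k in [Fraction(1), Fraction(5, 2), Fraction(-1, 3)]:
    for h_plus in [Fraction(0), Fraction(3, 4), Fraction(-2)]:
        for theta in range(-2, 3):
            for sigma in range(-4, 5):
                assert (c0_eigenvalue(h_plus, k, theta, sigma)
                        == expected_eigenvalue(h_plus, k, theta, sigma))
            # For sigma = 2*theta the twist drops out and j' = h_+/(k + 1).
            assert c0_eigenvalue(h_plus, k, theta, 2 * theta) == h_plus / (k + 1)
```

The first assertion also shows that a general σ reproduces the eigenvalue condition (4.9) with θ' = σ − 2θ, in line with the pole orders in (4.12).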

The dependence of the space $\bigoplus_n E_{2\theta+2n}$ on θ is only mod Z, and for the integer θ that we consider in what follows, we have the space $E_* = \bigoplus_n E_{2n}$. The $\widehat{s\ell}(2)_{k'}$ representations on $P_{h_-,h_+,k;\theta} \otimes E_*$ are distinguished by the eigenvalues ξ of the current in (4.6). Let $X_\xi$ be the Fock module over the Heisenberg algebra of Fourier modes of the D current with the highest-weight vector $|\xi\rangle$ such that $D_0|\xi\rangle = \xi|\xi\rangle$ and $D_{\geq 1}|\xi\rangle = 0$. It follows that

\[ D_0\,|h_-,h_+,k;\theta\rangle \otimes |e^{\sqrt{2}(\theta+n)\varphi}\rangle = (h_+ - (k+1)\theta - n)\,|h_-,h_+,k;\theta\rangle \otimes |e^{\sqrt{2}(\theta+n)\varphi}\rangle, \tag{4.15} \]

while as regards the $\widehat{s\ell}(2)_{k'}$ algebra, the state $|h_-,h_+,k;\theta\rangle \otimes |e^{\sqrt{2}(\theta+n)\varphi}\rangle$ is a twisted relaxed highest-weight state with the twist θ' = 2n. We thus expect the decomposition into a sum of spectral-flow transformed $\widehat{s\ell}(2)_{k'}$ representations,

\[ P_{h_-,h_+,k;\theta} \otimes E_* = \bigoplus_{n\in\mathbb{Z}} R_{\frac{h_+}{k+1},\,\frac{-1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\,k';2n} \otimes X_{h_+-(k+1)\theta-n} \otimes \dots \tag{4.16} \]

(where the dots denote $\widehat{s\ell}(2)_k$ representations). Since the parameters of the $\widehat{s\ell}(2)_{k'}$ modules appearing on the right-hand side are insensitive to the $\widehat{s\ell}(2|1)$ spectral flow parameter θ, this induces the correspondence

\[ P_{h_-,h_+,k;\bullet} \rightsquigarrow R_{\frac{h_+}{k+1},\,\frac{-1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\,k';\bullet} \tag{4.17} \]

between spectral-flow orbits of $\widehat{s\ell}(2|1)_k$ and $\widehat{s\ell}(2)_{k'}$ modules.

An interesting question is whether this correspondence can be extended to a functor; answering this involves, in particular, investigating the correspondence between singular vectors appearing in the modules related by the correspondence. We now show that the ❀ arrow descends to submodules generated from a class of the so-called “charged” singular vectors.

The relaxed Verma modules may contain singular vectors such that the corresponding quotients are the usual Verma or twisted Verma modules. These are the charged singular vectors [17], which occur in the module built on the vector $|j',\Lambda,k'\rangle_{s\ell(2)}$ whenever $\Lambda = \Lambda_{ch}(n,j')$ for n ∈ Z, where

\[ \Lambda_{ch}(n,j') = n(n+1) + 2nj', \qquad n \in \mathbb{Z}. \tag{4.18} \]

When (4.18) holds, the charged singular vector reads

\[ |C(n,j',k')\rangle_{s\ell(2)} = \begin{cases} (C^-_0)^{-n}\,|j',\Lambda_{ch}(n,j'),k'\rangle_{s\ell(2)}, & n \leq -1, \\ (C^+_0)^{n+1}\,|j',\Lambda_{ch}(n,j'),k'\rangle_{s\ell(2)}, & n \geq 0. \end{cases} \tag{4.19} \]

This state satisfies the usual Verma-module highest-weight conditions for n ≤ −1 and the twisted Verma highest-weight conditions with the twist parameter θ' = 1 for n ≥ 1 [17]. Remarkably, we see from (4.13) and (4.18) that a charged singular vector occurs in the relaxed Verma module $R_{\frac{h_+}{k+1},\,\frac{-1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\,k'}$ if and only if

\[ h_+ = -h_- - (k+1)n \qquad\text{or}\qquad h_+ = h_- - (k+1)(n+1), \tag{4.20} \]

which are the conditions (B.11) for the existence of charged singular vectors (B.12) and (B.14) in the $\widehat{s\ell}(2|1)_k$ Verma modules $P_{h_-,h_+,k;\bullet}$ (in the notations of [7], such conditions on the highest weight state lead to class IV and class V representations). Thus,

Lemma 4.4. The relaxed Verma module $R_{\frac{h_+}{k+1},\,\frac{-1}{(k+1)^2}(k+1+h_+-h_-)(h_++h_-),\,k'}$ generated from the state constructed in (4.7) contains a charged singular vector if and only if the $\widehat{s\ell}(2|1)$ Verma module $P_{h_-,h_+,k;\theta}$ contains a charged singular vector.

In addition to the “existence” result asserted in the lemma, the charged singular vectors (4.19) in the $\widehat{s\ell}(2)_{k'}$ representation on $P_{h_-,h_+,k;\theta} \otimes E_*$ actually evaluate as the respective $\widehat{s\ell}(2|1)_k$ charged singular vectors (B.12) or (B.14). Indeed, consider the representation on $P_{h_-,-h_--(k+1)n,k;\theta} \otimes E_*$ and let n ≤ −1. Then, up to the factor $(k+1)^n$,

\[ (C^-_0)^{-n}\,|h_-,-h_--(k+1)n,k;\theta\rangle \otimes |e^{\sqrt{2}\theta\varphi}\rangle = E^2_{\theta+n}\dots E^2_{\theta-1}\cdot F^1_{\theta+n+1}\dots F^1_{\theta}\,|h_-,-h_--(k+1)n,k;\theta\rangle \otimes |e^{\sqrt{2}(\theta+n)\varphi}\rangle, \tag{4.21} \]

which is precisely the corresponding charged singular vector in (B.14). Next, let n ≥ 0. Note that in this case, the charged singular vector $(C^+_0)^{n+1}\,|j',n(n+1)+2nj',k'\rangle_{s\ell(2)}$ satisfies the highest-weight conditions that are twisted by 1 with respect to the $\widehat{s\ell}(2)$ Verma-module highest-weight conditions. It is immediate to see that (again, up to a nonvanishing factor)

\[ (C^+_0)^{n+1}\,|h_-,-h_--(k+1)n,k;\theta\rangle \otimes |e^{\sqrt{2}\theta\varphi}\rangle = E^1_{-\theta-n-1}\dots E^1_{-\theta-1}\cdot F^2_{-\theta-n}\dots F^2_{-\theta}\,|h_-,-h_--(k+1)n,k;\theta\rangle \otimes |e^{\sqrt{2}(\theta+n+1)\varphi}\rangle = E^1_{-\theta-n-1}\,|C^{(+)}(n,h_-,k;\theta)\rangle \otimes |e^{\sqrt{2}(\theta+n+1)\varphi}\rangle. \tag{4.22} \]


Now, unless k + 1 − 2h− = 0, the state $E^1_{-\theta-n-1}\,|C^{(+)}(n,h_-,k;\theta)\rangle$ generates the same module as the charged singular vector of which it is a descendant, since

\[ F^1_{\theta+n+1}\,E^1_{-\theta-n-1}\,|C^{(+)}(n,h_-,k;\theta)\rangle = (k+1-2h_-)\,|C^{(+)}(n,h_-,k;\theta)\rangle. \]

A similar analysis applies to the charged $\widehat{s\ell}(2)_{k'}$ singular vectors in $P_{h_-,h_--(k+1)n,k;\theta} \otimes E_*$, which again evaluate as the $\widehat{s\ell}(2|1)$ charged singular vectors. Namely, whenever h+ = h− − (k + 1)n, the corresponding $\widehat{s\ell}(2)_{k'}$ module $R_{\frac{h_-}{k+1}-n,\,(n-1)(\frac{2h_-}{k+1}-n),\,k'}$ contains the charged singular vector $|C(n-1,\frac{h_-}{k+1}-n,k')\rangle_{s\ell(2)}$. When n ≤ 0, it is given by the action of $(C^-_0)^{-n+1}$, which evaluates as $E^2_{\theta+n-1}\,|C^{(-)}(n,h_-,k;\theta)\rangle$ times a primary state in the φ sector. Unless 2h− = k + 1, this $\widehat{s\ell}(2|1)$ vector generates the same module as the charged singular vector $|C^{(-)}(n,h_-,k;\theta)\rangle$. When n ≥ 1, the $\widehat{s\ell}(2)_{k'}$ charged singular vector $|C(n-1,\frac{h_-}{k+1}-n,k')\rangle_{s\ell(2)}$ is given by the action of $(C^+_0)^n$ on the highest-weight vector and evaluates precisely as $|C^{(-)}(n,h_-,k;\theta)\rangle$ (times a φ-sector state).

4.3. From $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_{k'}$ to $\widehat{s\ell}(2|1)_k$ representations. We now build $\widehat{s\ell}(2|1)_k$ representations by combining $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ (and the “auxiliary” $\widehat{s\ell}(2)_1$) representations. The construction is summarised in Theorem 4.5, while Sect. 4.3.1–4.3.2 explain how one arrives at this result. As regards the $\widehat{s\ell}(2)_{k'}$ representations, we build on the result of the previous subsection, where we saw that the $\widehat{s\ell}(2|1)_k \rightsquigarrow \widehat{s\ell}(2)_{k'}$ correspondence produces relaxed Verma modules from the $\widehat{s\ell}(2|1)$ Verma modules. We can therefore expect that the correspondence acting in the opposite direction should start with relaxed Verma modules. This proves to be the case, as we see in what follows.

4.3.1. Choosing representation spaces: the $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_1$ sector. We recall from (2.6)–(2.7) that the vertex operators $\mathbb{C}^2(z)$ map between $\widehat{s\ell}(2)_k$ modules as

\[ \mathbb{C}^2(z) \otimes \mathbb{C}^2_q(z) : V_{j,k} \to V_{j-\frac12,k} \otimes z^{\frac{-j-1}{p}}\,\mathbb{C}((z)) \oplus V_{j+\frac12,k} \otimes z^{\frac{j}{p}}\,\mathbb{C}((z)). \tag{4.23} \]

This gives rise to a chain of $\widehat{s\ell}(2)_k$ modules $V_{j+\frac n2,k}$, n ∈ Z, with the spins differing from each other by (half-)integers. For the future convenience, we rewrite (4.23) for these modules

\[ \mathbb{C}^2(z) \otimes \mathbb{C}^2_q(z) : V_{j+\frac n2,k} \to V_{j+\frac{n-1}{2},k} \otimes z^{\frac{-j-1-\frac n2}{p}}\,\mathbb{C}((z)) \oplus V_{j+\frac{n+1}{2},k} \otimes z^{\frac{j+\frac n2}{p}}\,\mathbb{C}((z)). \tag{4.24} \]

In the auxiliary sector, let $F_\mu$ be the Fock space of the auxiliary current ∂f(z) with the highest-weight vector $|\mu\rangle$ such that

\[ (\partial f)_0\,|\mu\rangle = \sqrt{2}\,\mu\,|\mu\rangle. \tag{4.25} \]

We then have the vertex operator action

\[ \mathbb{C}^2_1(z) : F_\mu \to F_{\mu+\frac12} \otimes z^{\mu}\,\mathbb{C}((z)) + F_{\mu-\frac12} \otimes z^{-\mu}\,\mathbb{C}((z)). \tag{4.26} \]

Fig. 1. [Figure: the lattice of points (n, m), with n on the horizontal axis and m on the vertical axis.]

The $\widehat{s\ell}(2|1)$ fermions constructed in terms of the vertex operators would change both n and m in $V_{j+\frac n2,k} \otimes F_{\mu+\frac m2}$. However, μ is changed by a half-integer simultaneously with the $\widehat{s\ell}(2)_k$ spin j changed by a half-integer. This involves only an index-2 sublattice of the Z × Z ∋ (n, m) lattice in Fig. 1 (alternatively, one could choose to work with the other half of the lattice sites).

Our aim is to combine Eqs. (4.24) and (4.26) with the action of $\widehat{s\ell}(2)_{k'}$ vertex operators such that the $\widehat{s\ell}(2|1)$ fermions $E^1$, $E^2$, $F^1$, and $F^2$ then act on a space of the form

\[ \bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k} \otimes R^{(n)} \otimes \bigoplus_{m\in\mathbb{Z}} F_{\mu-\frac n2+m} \tag{4.27} \]

(where $R^{(n)}$ are some $\widehat{s\ell}(2)_{k'}$ modules). The very existence of such a space endowed with the representations induced from the vertex operator action is not obvious a priori, because vertex operator action in each sector gives rise to the spaces $z^\nu\,\mathbb{C}((z))$ with non-integral ν that are different for different terms (recall that p = k + 2 ∈ C \ {0, 1} and j ∈ C in (4.24)); it does not therefore induce a representation on a space of the form (4.27) unless the modules $R^{(n)}$ are judiciously chosen. A crucial point making such a choice possible is the quantum group trace in the construction of the $\widehat{s\ell}(2|1)$ fermions.

4.3.2. Choosing representation spaces: the $\widehat{s\ell}(2)_{k'}$ sector. As we saw in Sect. 4.2 from the construction that is in a certain sense inverse to the present one, the $\widehat{s\ell}(2)_{k'}$ representations have to include the relaxed Verma modules. As a hint in properly choosing the relaxed Verma module $R^{(0)} = R_{j',\Lambda,k';\theta'}$ in (4.27), let us assume that the product of an $\widehat{s\ell}(2)_k$ highest-weight state, an $\widehat{s\ell}(2)_{k'}$ relaxed highest-weight state, and an $\widehat{s\ell}(2)_1$ highest-weight state, $|j,k\rangle \otimes |j',\Lambda,k';\theta'\rangle_{s\ell(2)} \otimes |\mu\rangle \in V_{j,k} \otimes R_{j',\Lambda,k';\theta'} \otimes F_\mu$, is a twisted highest-weight state with respect to $\widehat{s\ell}(2|1)_k$ (see (B.3)) tensored with a highest-weight vector in the Fock module $A_a$ over the free current A(z) in Eq. (2.17); we define the highest-weight vectors $|a\rangle_A \in A_a$ such that

\[ A_0\,|a\rangle_A = \sqrt{2(k+1)}\;a\,|a\rangle_A. \tag{4.28} \]


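Assuming, thus, that $|j,k\rangle \otimes |j',\Lambda,k';\theta'\rangle_{s\ell(2)} \otimes |\mu\rangle = |h_-,h_+,k;\theta\rangle \otimes |a\rangle_A$, we evaluate the eigenvalues of the $\widehat{s\ell}(2|1)$ Cartan generators (and also of $A_0$, see (2.17)) on this vector. Using (2.16) and (2.17), we obtain the eigenvalues

\[ H^+_0 \approx (k+1)j' - k\bigl(\mu - \tfrac{\theta'}{2}\bigr), \tag{4.29} \]
\[ A_0 \approx \sqrt{2(k+1)}\,\bigl(j' - \tfrac{k'}{2}\theta' - \mu\bigr), \tag{4.30} \]

whence $h_+ = (1+k)j' + k(\theta - \mu + \tfrac12\theta')$. We now compare the dimensions using Eq. (2.31) that we have established for our $\widehat{s\ell}(2|1)_k$ generators. The balance of dimensions of the highest-weight vectors reads

\[ \frac{j(j+1)}{k+2} + \underbrace{\frac{j'(j'+1)+\Lambda}{k'+2} - j'\theta' + \frac{k'}{4}\theta'^2}_{(4.11)} + \mu^2 = \underbrace{\frac{j^2 - (j'(k+1))^2}{k+1} + 2\bigl(\mu - \tfrac{\theta'}{2}\bigr)\Bigl(j'(k+1) - \tfrac{k}{2}\bigl(\mu - \tfrac{\theta'}{2}\bigr)\Bigr)}_{(B.6)} + (k+1)\bigl(j' - \tfrac{k'}{2}\theta' - \mu\bigr)^2. \tag{4.31} \]

This fixes Λ equal to Λ(j, j') given by¹

\[ \Lambda(j,j') = -\bigl(1 + j' - (k'+1)j\bigr)\bigl(j' + (k'+1)j\bigr). \tag{4.32} \]

Generalising this, we now take the modules $R^{(n)}$ in (4.27) to be $R^{(n)} = R_{j'-\frac n2,\Lambda_n(j,j'),k'}$ with

\[ \Lambda_n(j,j') = -\bigl(1 - (k'+1)j + j'\bigr)\bigl(j' + (k'+1)j - n\bigr). \tag{4.33} \]

In accordance with (4.11), the Sugawara dimension $\Delta'_n$ of $|j'-\frac n2,\Lambda_n(j,j'),k'\rangle$ is

\[ \Delta'_n = \frac{\bigl(1 - (k'+1)j + \frac n2\bigr)\bigl(\frac n2 - (k'+1)j\bigr)}{k'+2}, \tag{4.34} \]

and therefore, evaluating $\Delta'_{n+1} - \Delta'_n - \frac{3/4}{k'+2} = \frac{n/2}{k'+2} - \frac{j}{k+2}$ and $\Delta'_{n-1} - \Delta'_n - \frac{3/4}{k'+2} = \frac{-n/2-1}{k'+2} + \frac{j}{k+2}$ (where $\frac{3/4}{k'+2}$ is the Sugawara dimension of $|\frac12,k'\rangle$ corresponding to the $\mathbb{C}'^2(z)$ vertex operator), we find the exponents in the $\widehat{s\ell}(2)_{k'}$ vertex operator action

\[ \mathbb{C}'^2(z) \otimes \mathbb{C}'^2_q(z) : R_{j'-\frac n2,\Lambda_n(j,j'),k'} \to R_{j'-\frac{n-1}{2},\Lambda_{n-1}(j,j'),k'} \otimes z^{\frac{-n/2-1}{p'}+\frac{j}{p}}\,\mathbb{C}((z)) \oplus R_{j'-\frac{n+1}{2},\Lambda_{n+1}(j,j'),k'} \otimes z^{\frac{n/2}{p'}-\frac{j}{p}}\,\mathbb{C}((z)). \tag{4.35} \]

Remarkably, the exponents $\frac{-n/2-1}{p'} + \frac{j}{p}$ and $\frac{n/2}{p'} - \frac{j}{p}$ appearing here add up to (half-)integers with the exponents from (4.24).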
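The cancellation of the non-integral exponents can be verified numerically. The following sketch assumes the duality relation (k + 1)(k' + 1) = 1, which gives 1/p + 1/p' = 1 for p = k + 2 and p' = k' + 2; it computes the dimensions from (4.11) at zero twist with the values of Λ taken from (4.33), compares them with the closed form (4.34), and adds the resulting exponents to those of (4.24). Function names are illustrative.

```python
from fractions import Fraction


def dual_level(k):
    """Dual level k', assuming (k + 1)(k' + 1) = 1."""
    return Fraction(1) / (k + 1) - 1


def lambda_n(j, jp, kp, n):
    """Eq. (4.33)."""
    return -(1 - (kp + 1) * j + jp) * (jp + (kp + 1) * j - n)


def delta_n(j, jp, kp, n):
    """Eq. (4.11) at zero twist for the state |j' - n/2, Lambda_n, k'>."""
    spin = jp - Fraction(n, 2)
    return (spin * spin + spin + lambda_n(j, jp, kp, n)) / (kp + 2)


def combined_exponents(j, jp, k, n):
    """Return the two total exponents obtained by adding (4.24) and (4.35)."""
    kp = dual_level(k)
    p, pp = k + 2, kp + 2
    half = Fraction(n, 2)
    # Closed form (4.34).
    assert delta_n(j, jp, kp, n) == (1 - (kp + 1) * j + half) * (half - (kp + 1) * j) / pp
    doublet = Fraction(3, 4) / pp  # dimension of the spin-1/2 vertex operator
    up = delta_n(j, jp, kp, n + 1) - delta_n(j, jp, kp, n) - doublet
    down = delta_n(j, jp, kp, n - 1) - delta_n(j, jp, kp, n) - doublet
    # Exponents quoted before (4.35).
    assert up == half / pp - j / p
    assert down == (-half - 1) / pp + j / p
    # Add the exponents of (4.24) for the same change of n.
    return (j + half) / p + up, (-j - 1 - half) / p + down


for k in [Fraction(1), Fraction(7, 3), Fraction(-1, 4)]:
    for j in [Fraction(0), Fraction(2, 5), Fraction(-3, 2)]:
        for jp in [Fraction(1, 7), Fraction(7, 3)]:
            for n in range(-4, 5):
                total_up, total_down = combined_exponents(j, jp, k, n)
                assert total_up == Fraction(n, 2)
                assert total_down == -Fraction(n, 2) - 1
```

The totals n/2 and −n/2 − 1 do not depend on j, j' or k, which is the property that the construction of the representation space relies on.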

1 We note that (4.32) implies

\[ \Lambda\bigl(j,\tfrac{h_+}{k+1}\bigr) = -\frac{(k+1+h_+-j)(h_++j)}{(k+1)^2}, \]

which, for j = h−, reproduces Eq. (4.13) obtained in constructing $\widehat{s\ell}(2)_{k'}$ states from a given $\widehat{s\ell}(2|1)_k$ state and, thus, suggests that the constructions in Sects. 4.3 and 4.2 are parts of the direct and the inverse functors.


4.3.3. The vertex operator action. The last observation is a crucial point that, irrespective of the preceding motivations, allows us to construct the space carrying an $\widehat{s\ell}(2|1)$ action.

Theorem 4.5. For every pair (j, j') ∈ C × C, there is an $\widehat{s\ell}(2|1)_k$ representation on the space

\[ N_{j,j'} = \bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k} \otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'} \otimes \bigoplus_{m\in\mathbb{Z}} F_{m-\frac n2}, \tag{4.36} \]

where $V_{j,k}$ is an $\widehat{s\ell}(2)_k$ module with the highest-weight vector $|j,k\rangle$, $R_{j',\Lambda,k'}$ is an $\widehat{s\ell}(2)_{k'}$ representation with the relaxed highest-weight vector $|j',\Lambda,k'\rangle_{s\ell(2)}$, and $F_\mu$ is the Fock space of the auxiliary current with the highest-weight vector $|\mu\rangle$ defined in (4.25).

Remark 4.6. The range of the n summation in (4.36) can be restricted to a subset of Z depending on the $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ modules involved, as we will see in what follows.

To show (4.36), it only remains to check that combining (4.24), (4.35), and (4.26), we are left with the Laurent spaces $z^\nu\,\mathbb{C}((z))$ with integer ν. Indeed, putting together (4.24) and (4.35) and recalling the trace in (2.9), we arrive at

\[ \mathbb{C}^2(z) \otimes \mathbb{C}'^2(z) : V_{j+\frac n2,k} \otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'} \to V_{j+\frac{n-1}{2},k} \otimes R_{j'-\frac{n-1}{2},\Lambda_{n-1}(j,j'),k'} \otimes z^{-\frac n2-1}\,\mathbb{C}((z)) \oplus V_{j+\frac{n+1}{2},k} \otimes R_{j'-\frac{n+1}{2},\Lambda_{n+1}(j,j'),k'} \otimes z^{\frac n2}\,\mathbb{C}((z)). \tag{4.37} \]

We next combine this with the auxiliary sector from (4.26) and, thus, obtain the vertex operator action

\[ \mathbb{C}^2(z) \otimes \mathbb{C}'^2(z) \otimes \mathbb{C}^2_1(z) : V_{j+\frac n2,k} \otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'} \otimes F_{\mu-\frac n2+m} \to \Bigl( V_{j+\frac{n-1}{2},k} \otimes R_{j'-\frac{n-1}{2},\Lambda_{n-1}(j,j'),k'} \otimes z^{-\frac n2-1}\,\mathbb{C}((z)) \oplus V_{j+\frac{n+1}{2},k} \otimes R_{j'-\frac{n+1}{2},\Lambda_{n+1}(j,j'),k'} \otimes z^{\frac n2}\,\mathbb{C}((z)) \Bigr) \otimes \Bigl( z^{\mu-\frac n2+m}\,F_{\mu-\frac n2+m+\frac12} \oplus z^{-\mu+\frac n2-m}\,F_{\mu-\frac n2+m-\frac12} \Bigr). \tag{4.38} \]
This indeed involves the $z^\nu\,\mathbb{C}((z))$ spaces with all ν being integer mod μ, and therefore, induces an $\widehat{s\ell}(2|1)$ representation on the space $\bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k} \otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'} \otimes \bigoplus_{m\in\mathbb{Z}} F_{\mu+m-\frac n2}$. This space actually depends only on the fractional part of μ, and this dependence is nothing but the overall (non-integral) spectral flow; we can therefore set μ = 0, which gives the space $N_{j,j'}$ in Eq. (4.36) endowed with the structure of an $\widehat{s\ell}(2|1)_k$ representation. (Choosing between integral and half-integral μ corresponds to Ramond and Neveu–Schwarz sectors.)

Remark 4.7. In the auxiliary sector modules in (4.36), we can replace $m - \frac n2$ with $m - \frac12$ for odd n and with m for even n; therefore, for odd and even n we have the respective spaces $\bigoplus_{m\in\mathbb{Z}} F_{m+\frac12} = M_{\frac12,1}$ and $\bigoplus_{m\in\mathbb{Z}} F_m = M_{0,1}$ that are the spin-$\frac12$ and spin-0 irreducible representations of the $\widehat{s\ell}(2)_1$ algebra introduced in (3.1). Thus, the $\widehat{s\ell}(2|1)$ representation space can be described as a sum of tensor products of three $\widehat{s\ell}(2)$ representations:

\[ N_{j,j'} = \bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k} \otimes R_{j'-\frac n2,\Lambda_n(j,j'),k'} \otimes M_{\frac{\varepsilon(n)}{2},1}, \qquad \varepsilon(n) = n \bmod 2. \tag{4.39} \]

Replacing ε(n) with 1 − ε(n) gives the Neveu–Schwarz representation space $N^{1/2}_{j,j'}$.

Remark 4.8. It is obvious from the above that $N_{j,j'}$ is in fact a representation of $\widehat{D}(2|1;k')_k$.

5. Decomposition of Representations and Character Identities

In this section, we decompose the $\widehat{s\ell}(2|1)_k$ representations constructed as in (4.39) and check the decomposition formulas by using character identities. In Sect. 5.1, we explain the general strategy of decomposing the representations and obtaining the corresponding character identities. In Sect. 5.2, we calculate the characters of both sides of the decomposition formula for the corresponding (relaxed) Verma modules and show these characters to be identical. In Sect. 5.3, we outline the steps leading from the Verma-module case to the respective irreducible representations. We specialise to the admissible representations and, further, to the “principal admissible” ones. The relevant decomposition formula is given in (5.25). As we do not give a direct proof that the representations on the right-hand side are indeed the corresponding irreducible $\widehat{s\ell}(2|1)_{\frac1u-1}$ representations, we invoke the character sumrules derived independently. These are given in Sect. 5.4, where we first have to explain how the characters of the twisted representations are identified with the $\widehat{s\ell}(2|1)_{\frac1u-1}$ characters known from [7]. As a result, the sumrules in Eqs. (5.55)–(5.57) are found to be precisely the character identities for (5.25). We conclude the section with a remark on the modular properties of the characters involved.

5.1. Decomposing the $\widehat{s\ell}(2|1)$ representation. The $\widehat{s\ell}(2|1)_k$ representation on the space $N_{j,j'}$ constructed in the previous section is by no means irreducible, since all the $\widehat{s\ell}(2|1)$ generators commute with the current A(z) constructed in (2.17). Thus, $N_{j,j'}$ decomposes as $\bigoplus_\lambda P(\lambda) \otimes A_{a(\lambda)}$, where P(λ) are some $\widehat{s\ell}(2|1)_k$ modules and $A_a$ are the Fock modules over the free scalar; λ labels different eigenvalues that $A_0$ has on the highest-weight vector of each A module. Evaluating the $\widehat{s\ell}(2|1)$ highest-weight parameters, we find that the different $\widehat{s\ell}(2|1)$ representations P(λ) are the spectral flow transform of each other. For all these modules, the highest weight parameters, except the twist, are the same. Taking for definiteness $V_{j+\frac n2,k}$ to be the Verma modules (and $R_{j'-\frac n2,\Lambda_n(j,j'),k';\theta'}$ the twisted relaxed Verma modules), we thus arrive at the decomposition

\[ \bigoplus_{n\in\mathbb{Z}} V_{j+\frac n2,k} \otimes R_{j'-\frac n2,\Lambda_n(j,j'),k';\theta'} \otimes M_{\frac{\varepsilon(n)}{2},1} = \bigoplus_{\theta\in\mathbb{Z}} P_{j,(k+1)j',k;2\mu-\theta'+\theta} \otimes A_{j'-\theta} \tag{5.1} \]

where $A_{j'-\theta}$ are Fock modules (see (4.28)). A priori, $P_{j,(k+1)j',k;2\mu-\theta'+\theta}$ are some $\widehat{s\ell}(2|1)$ representations with the respective twisted highest-weight vectors $|j,(k+1)j',k;2\mu-\theta'+\theta\rangle$. That they are in fact the corresponding twisted Verma modules is confirmed in Sect. 5.2 by showing that the characters on both sides of (5.1) are identical.²

In calculating the characters, we take the trace $\mathrm{Tr}\; z^{H^-_0}\,\zeta^{H^+_0}\,q^{L^{Sug}_0}\,y^{\frac{1}{\sqrt{2(k+1)}}A_0}\,q^{\frac12(AA)_0}$ on the right-hand side of the decomposition formula and then use Eqs. (2.16), (2.17), and (2.31) to rewrite this in terms of the $\widehat{s\ell}(2)_k \oplus \widehat{s\ell}(2)_{k'} \oplus \widehat{s\ell}(2)_1$ Cartan generators and the energy-momentum tensors,

\[ \mathrm{Tr}\; z^{H^-_0}\,\zeta^{H^+_0}\,q^{L^{Sug}_0}\,y^{\frac{1}{\sqrt{2(k+1)}}A_0}\,q^{\frac12(AA)_0} = \mathrm{Tr}\; z^{J^0_0}\,q^{L^{Sug}_0}\,(\zeta^{k+1}y)^{J'^0_0}\,q^{L'^{Sug}_0}\,(\zeta^{-k}y^{-1})^{j^0_0}\,q^{l^{Sug}_0} \tag{5.2} \]

(with the respective Sugawara energy-momentum tensor for each algebra). The trace on each side of the last formula is taken over the space on the complementary side of (5.1). The resulting character identities are therefore of the general form

\[ \chi^{\widehat{s\ell}(2|1)_k}(q,z,\zeta)\;\chi^{A}(q,y) = \chi^{\widehat{s\ell}(2)_k}(q,z)\;\chi^{\widehat{s\ell}(2)_{k'}}(q,\zeta^{k+1}y)\;\chi^{\widehat{s\ell}(2)_1}(q,\zeta^{-k}y^{-1}) \tag{5.3} \]

(where $\chi^{\widehat{s\ell}(2)_1}$ and $\chi^A$ are the standard expressions essentially given by the corresponding theta function; the “nontrivial” ingredients are the $\widehat{s\ell}(2|1)$ and $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ characters). In what follows, we freely interchange between (q, z, ζ, y) and (τ, σ, ν, ρ) related by

\[ q = e^{2i\pi\tau}, \qquad z = e^{2i\pi\sigma}, \qquad \zeta = e^{2i\pi\nu}, \qquad y = e^{2i\pi\rho}, \tag{5.4} \]

where τ ∈ C, Im τ > 0 ⇒ |q| < 1, and σ ∈ C, ν ∈ C, ρ ∈ C.

5.2. Verma-module case. Comparing the characters of both sides of (5.1), we have to deal with formal objects, since already the relaxed Verma module character involves the nowhere convergent series $\delta(z,y) = \sum_{n\in\mathbb{Z}} \bigl(\frac{z}{y}\bigr)^n$: the characters of a twisted relaxed Verma module $R_{j',\Lambda,k';\theta'}$ and of the $\widehat{s\ell}(2)_k$ Verma module $V_{j,k}$ are

\[ \mathrm{char}\,R_{j',\Lambda,k';\theta'}(u,q) \equiv \mathrm{Tr}_{R_{j',\Lambda,k';\theta'}}\; u^{J'^0_0}\,q^{L'^{Sug}_0} = q^{\frac{k'}{4}\theta'^2}\,u^{-\frac{k'}{2}\theta'}\,\delta(u\,q^{-\theta'},1)\;\frac{q^{\frac{j'^2+j'+\Lambda}{k'+2}}\,(u\,q^{-\theta'})^{j'}}{\prod_{i\geq1}(1-q^i)^3} = q^{-\frac{k'}{4}\theta'^2}\,\delta(u\,q^{-\theta'},1)\;\frac{q^{\frac{j'^2+j'+\Lambda}{k'+2}}\,u^{j'}\,q^{-\theta' j'}}{\prod_{i\geq1}(1-q^i)^3}, \tag{5.5} \]

\[ \mathrm{char}\,V_{j,k}(z,q) \equiv \mathrm{Tr}_{V_{j,k}}\; z^{J^0_0}\,q^{L^{Sug}_0} = \frac{q^{\frac{j^2+j}{k+2}}\,z^{j}}{\vartheta_{1,1}(q,z)}, \tag{5.6} \]

2 Strictly speaking, this requires the assumption that the modules in question are generated from the respective highest-weight vectors.


with the Jacobi theta-function (B.9). In the auxiliary sector, it is more convenient to return to the description as in (4.36); the character of Fµ is simply

Tr Fµ v

√1 (∂f )0 2

q

1 2 (∂f ∂f )0

For the character of n∈Z Vj + n2 ,k ⊗ Rj − n ,7 n 2 (recall the dimension in Eq. (4.34)) 0 Sug

Tr V n zJ0 q L0 Tr R

Sug

0

uJ 0 q L0

=

q

n 2 (j + n 2 ) +j + 2 k+2

2

j − n 2 ,7n (j,j ),k

j + 2 ,k

m,n∈Z

vµ q µ = . (5.7) i i≥1 (1 − q ) ⊗ m∈Z Fµ− n2 +m , we thus have (j,j ),k

n2 + n 4 2

n

Tr F

µ− n 2 +m

v

− j (n+1) + (k

1 q 2 (∂f ∂f )0 =

√1 (∂f )0 2

+1)j 2

n

(5.8)

n 2

n

k+2 k+2 uj − 2 q (µ+m− 2 ) v µ+m− 2 zj + 2 δ(u, 1) q k +2 . ϑ1,1 (q, z) (1 − q i )4

n∈Z m∈Z

i≥1

Shifting the summation variable as n → n + m, we continue this as (with ϑ1,0 defined in (B.10)) ϑ1,0 (q, z 2 u− 2 v − 2 q −µ ) ϑ1,0 (q, z 2 u− 2 v 2 q µ ) . ϑ1,1 (q, z) (1 − q i )4 1

j2

= zj uj v µ q k+1 +µ δ(u, 1) 2

1

1

1

1

1

i≥1

(5.9) In accordance with (5.2), we now substitute u = ζ k+1 y and v = ζ −k y −1 . Then uv = ζ , and therefore, the argument of the first theta function in the numerator becomes 1 1 1 1 1 1 z 2 ζ − 2 q −µ . In the second theta-function, we have u− 2 v 2 = u−1 (uv) 2 = u−1 ζ 2 ; we now use the formal δ-function property zδ(z, y) = yδ(z, y) to replace u with 1. Thus, m,n∈Z

Tr V

j+ n 2 ,k

Sug

zJ0 q L0 0

Tr R

j − n 2 ,7n (j,j ),k

Sug

0

(ζ k+1 y)J 0 q L0

Tr F

µ− n 2 +m

(ζ k y)

−1 √ (∂f )0 2

2

1

ϑ1,0 (q, z 2 ζ − 2 q −µ ) ϑ1,0 (q, z 2 ζ 2 q µ ) . ϑ1,1 (q, z) (1 − q i )4 1

j2

= zj ζ µ q k+1 +µ δ(ζ k+1 y, 1)

q 2 (∂f ∂f )0

1

1

1

i≥1

We set $\mu = 0$ for simplicity (this parameter has the meaning of the overall spectral flow transform). On the right-hand side of (5.1), we have the character (B.8) of the twisted $\widehat{s\ell}(2|1)$ Verma module $P_{h_-,h_+,k;\theta}$, and the character of $A_a$ is given by
\[
\mathrm{Tr}_{A_a}\, y^{\frac{1}{\sqrt{2(k+1)}}A_0}\, q^{L_0} = \frac{y^{a}\, q^{(k+1)a^2}}{\prod_{i\ge 1}(1-q^i)},
\tag{5.10}
\]

Vertex Operator Extensions of Dual Affine s(2) Algebras


and thus the character of the right-hand side of (5.1) equals θ∈Z

Tr P

=

j,(k+1)j ,k;θ

−

+

Sug

zH0 ζ H0 q L0

Tr A

j2

j −θ

zj ζ (k+1)j −(k+1)θ q k+1 −(k+1)j

y

√ 1 A 2(k+1) 0

q 2 (AA)0 1

2 +2θ(k+1)j −(k+1)θ 2

y j −θ q (k+1)(j −θ)

2

θ∈Z

ϑ1,0 (q, z 2 ζ 2 ) ϑ1,0 (q, z 2 ζ − 2 ) ϑ1,1 (q, z) (1 − q i )4 1

1

1

1

i≥1

ϑ1,0 (q, z 2 ζ 2 ) ϑ1,0 (q, z 2 ζ − 2 ) , ϑ1,1 (q, z) (1 − q i )4 1

j2

= zj y j ζ (k+1)j q k+1 δ(ζ k+1 y, 1)

1

1

1

(5.11)

i≥1

where we again can use the δ-function to replace $y^{j'}\zeta^{(k+1)j'}$ with 1. This is identical with the character obtained for the left-hand side (with $\mu = 0$; a nonzero value of $\mu$, as well as of the twist $\theta$ of the R modules, is restored by the overall spectral flow transform). We thus conclude that there are twisted $\widehat{s\ell}(2|1)$ Verma modules on the right-hand side of (5.1).

5.3. From Verma modules to irreducible representations. To obtain the decomposition formula of form (5.1) for other (e.g., irreducible) representations, one can start with appropriate Verma modules (e.g., those with dominant highest-weights, so as to arrive at the admissible representations eventually) and follow the corresponding BGG resolution, with (5.1) applied again to each term in the resolution. If the decomposition formula is of a functorial nature (i.e., the appearance of singular vectors is “synchronised” on both sides), one then arrives at a similar formula for the irreducible representations. Here, we follow this program only as far as the first step consisting in taking the quotients with respect to the charged singular vectors. In the end, we confirm the decomposition identities for irreducible representations by verifying the corresponding character identities. We see from (4.18) and (B.11) that whenever a charged singular vector occurs in one (and hence in all) of the relaxed Verma modules Rj − n ,7 (j,j ),k , n ∈ Z, each of the n 2 twisted s(2|1) Verma modules Pj,(k+1)j ,k;θ , θ ∈ Z, contains a charged singular vector. (Proceeding in the “inverse” direction, we saw the same correspondence in Sect. 4.2, where we made it even more explicit.) For Weyl modules, we then obtain the result stated in the Introduction by taking the appropriate quotients of (5.1). In what follows, we discuss the form that the decomposition takes for a class of irreducible representations. We find which character identities (“sumrules” of type (5.3)) must follow from this formula, and derive them independently. This is a strong argument in favour of the functorial properties of decomposition (5.1). We now explain which identity is to be tested at the level of character sumrules. We take a rational level k in the positive zone expressed through two positive integers t and u via k+2=

t u

' = 1.

(5.12)


We require the relaxed Verma module Rj ,70 (j,j ),k “at the origin” in (5.1) to have the charged singular vector |C(, j , k )s(2) ; more precisely, we satisfy Eq. (4.18) by choosing (k + 1)j = 1 + j + ,

∈ Z.

k , we are interested in the case where the irreducible represen k and s(2) For both s(2) with respect tations are admissible [27]. We recall here that the quotient of Rj ,7,k /V to the submodule generated from the charged singular vector |C(, j , k )s(2) , ∈ Z, is a Verma module if ≥ 0 or a twisted Verma module (with the twist 1) if ≤ −1, and the spin of the highest-weight vector of the quotient module is 1, ≥ 0, 1−θ jV = j + + (k + 2) 2 , where θ = 0, ≤ −1. take the “admissible” values, We now require that the spins of Vj,k and Rj ,70 (j,j ),k /V j=

r−1 2

−

j + +

t s−1 u 2 ,

t 1−θ t−u 2

=

r −1 2

t s −1 t−u 2

(5.13)

1 ≤ s ≤ t − u

(5.14)

−

with 1 ≤ r ≤ t − 1,

1 ≤ r ≤ t − 1,

1 ≤ s ≤ u,

(which obviously requires t ≥ u + 1). Conditions (5.13) give (r + s − s + θ )t = (r + r )u,

(5.15)

implying that r + r is a multiple of t, which means r + r = t once r and r are in the “admissible range” given above. We then have r = u − s + s − θ , and therefore (choosing ≥ 0, so as to have the “untwisted’ Verma modules in the quotients of the relaxed Verma modules, after which we can set = 0), we obtain the representations V t−u+n+1−s − t−u s−1 , t−2u ⊗ V u−n−1−s u s −1 2u−t ⊗ M ε(n) ,1 . (5.16) n

u

2

2

u

2

− t−u

2

,

t−u

2

Here, V and V are Verma modules; their quotients are the respective admissible representations M and M for n such that −t + s − s + 1 ≤ n − u + 1 ≤ s − s − 1.

(5.17)

Shifting the summation index, we thus arrive at the following combination of admissible representations: Nadm s,s

=

s −s−1 n=s −s+1−t

M t+n−s − t−u s−1 , t−2u ⊗ M−n−s 2

u

2

u

2

u − t−u

s −1 2u−t 2 , t−u

1 ≤ s ≤ u, ⊗ M ε(n+u−1) ,1 , 1 ≤ s ≤ t − u. 2

(5.18)


Remark 5.1. That the representations outside the “admissible range” decouple can also be understood by starting with free-field realisations of $\widehat{s\ell}(2)$ modules. The corresponding formula of type (5.1) then gives some $\widehat{s\ell}(2|1)$ representations (whose structure depends on the free-field representations chosen for $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$). Now, admissible representations of either $\widehat{s\ell}(2)_k$ or $\widehat{s\ell}(2)_{k'}$ are singled out from the free-field spaces as the cohomology of an appropriate “BRST” complex (for example, a Felder-like complex [2], but in general other free-field resolutions are needed). The vertex operators mapping between the individual $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ representations then extend to the entire complex; this allows one to single out trivial vertex operators (see, e.g., [6]) that induce zero mapping on the cohomology (as a rule, nontrivial mappings via vertex operators exist if and only if the corresponding fusion rule coefficient is nonzero). In the admissible case, the vertex operators become trivial as soon as the spin $j + \frac{n}{2}$ or $j' - \frac{n}{2}$ goes outside the corresponding Kač table (which occurs simultaneously for the unprimed and primed modules in (5.16)–(5.18)). This allows us to consistently restrict to only a finite number (exactly $t - 1$) of terms on the left-hand side of (5.1).

For each pair $(s, s')$ in the corresponding range, the space $N^{\mathrm{adm}}_{s,s'}$ decomposes into a direct sum over the $\widehat{s\ell}(2|1)$ spectral flow orbit. The $\widehat{s\ell}(2|1)$ representations involved carry the labels
\[
L_{h_-,h_+,k;\theta} = L_{\frac{t-u+1-s'}{2}-\frac{t-u}{u}\frac{s-1}{2},\;\frac{t-u+1-s'}{2}-\frac{t-u}{u}\frac{s+1}{2},\;\frac{t}{u}-2;\,m},
\qquad m\in\mathbb{Z}.
\]
One expects these to be the corresponding irreducible $\widehat{s\ell}(2|1)$ representations. As we have noted, proving this by studying the BGG resolution is a separate problem, which we do not address here; however, this is confirmed by the character identities derived independently.

Specialising to the integrable (unitary) $\widehat{s\ell}(2)_{k'}$ representations, we take the level $k' = \frac{2u-t}{t-u}$ to be an integer, whence $t = u + 1$, which implies $s' = 1$ and $k' = u - 1$. It follows that the level $k$ is fractional, of the form
\[
k = \frac{1}{u} - 1,
\tag{5.19}
\]
and therefore belongs to a subclass of $\widehat{s\ell}(2)$ “admissible” levels called “principal admissible”, which are parametrised as
\[
u(k + h^\vee) = k^0 + h^\vee, \qquad k^0\in\mathbb{Z}_+,
\tag{5.20}
\]
where $h^\vee = 2$ is the dual Coxeter number of $s\ell(2)$ [27, 30]. In the present instance, we have $k^0 = u - 1$. Note that (5.19) is also principal admissible with respect to $\widehat{s\ell}(2|1)$, with $k^0 = 0$, since $h^\vee_{s\ell(2|1)} = 1$. For each $s\in[1,u]$, we now have the space
\[
N^{\mathrm{int}}_s
= \bigoplus_{n=1-s-u}^{-s} M_{\frac{u+n}{2}-\frac{1}{u}\frac{s-1}{2},\,\frac{1}{u}-1}\otimes M_{\frac{-n-s}{2},\,u-1}\otimes M_{\frac{\varepsilon(n+u-1)}{2},\,1}
= \bigoplus_{n=1-s}^{u-s} M_{\frac{n}{2}-\frac{1}{u}\frac{s-1}{2},\,\frac{1}{u}-1}\otimes M_{\frac{u-s-n}{2},\,u-1}\otimes M_{\frac{\varepsilon(n-1)}{2},\,1}.
\tag{5.21}
\]
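The level bookkeeping behind (5.19) and (5.20) can be confirmed with exact rational arithmetic. The following Python sketch (the helper names are illustrative and not part of the paper) checks, for $t = u + 1$, that $k = t/u - 2$ equals $1/u - 1$, that the dual level fixed by $(k+1)(k'+1) = 1$ is $k' = u - 1$, and that (5.20) gives $k^0 = u - 1$ for $h^\vee = 2$ and $k^0 = 0$ for $h^\vee = 1$.

```python
from fractions import Fraction

def dual_level(k):
    """Level k' determined by the duality relation (k + 1)(k' + 1) = 1."""
    return 1 / (k + 1) - 1

def principal_admissible_k0(k, u, h_dual):
    """Solve u (k + h_dual) = k0 + h_dual for k0, as in Eq. (5.20)."""
    return u * (k + h_dual) - h_dual

def check_levels(u):
    """Check the integrable specialisation t = u + 1 for a positive integer u."""
    t = u + 1
    k = Fraction(t, u) - 2                 # level from k + 2 = t/u, Eq. (5.12)
    k_prime = Fraction(2 * u - t, t - u)   # dual level quoted in the text
    assert k == Fraction(1, u) - 1         # Eq. (5.19)
    assert dual_level(k) == k_prime == u - 1
    assert principal_admissible_k0(k, u, 2) == u - 1   # h_dual = 2
    assert principal_admissible_k0(k, u, 1) == 0       # h_dual = 1
    return k, k_prime

for u in range(1, 8):
    check_levels(u)
```

Using `Fraction` rather than floats keeps every comparison exact, which matters because the levels are genuinely fractional.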

We remind the reader that the notation $X_{j,k}$ for each module indicates the spin $j$ and the level $k$. When we are dealing with the characters in what follows, it will be useful to label each admissible representation of $\widehat{s\ell}(2)_{\frac{t}{u}-2}$ with the spin
\[
j = \frac{r-1}{2} - \frac{t}{u}\,\frac{s-1}{2}, \qquad 1\le r\le t-1,\quad 1\le s\le u,
\tag{5.22}
\]
by $r$ and $s$ in addition to $t$ and $u$, as $M_{t,u,r,s}$. In this notation, the integrable representations are $M_{t,1,r,1} = I_{t,r}$. In fact, we encounter the $\widehat{s\ell}(2)_{k'}$ integrable representations $M_{u+1,1,r',1} = I_{u+1,r'}$. Then (after a convenient redefinition of the summation index) the space $N^{\mathrm{int}}_s$ is rewritten as
\[
N^{\mathrm{int}}_s = \bigoplus_{r=1}^{u} M_{u+1,u,r,s}\otimes I_{u+1,u+1-r}\otimes I_{3,\varepsilon(r-s-1)+1}.
\tag{5.23}
\]

This is the space to be decomposed into $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ representations. In addition to only a finite number of irreducible representations appearing on the left-hand side of the decomposition formula as discussed above, another effect occurring for the representations under consideration is the periodicity under the spectral flow on the $\widehat{s\ell}(2|1)$ side, where the spectral flow (B.7) with $\theta = u$ produces an isomorphic representation. In the analogue of (5.1) for such representations $L_{h_-,h_+,k}$, we split the summation over the twists on the right-hand side by writing $\theta = u\alpha + \beta$ with $\alpha\in\mathbb{Z}$ and $\beta = 0,\dots,u-1$. Since $L_{h_-,h_+,k;u\alpha+\beta}\cong L_{h_-,h_+,k;\beta}$, the decomposition formula for $N^{\mathrm{int}}_s$ takes a remarkable form involving precisely $u$ non-isomorphic $\widehat{s\ell}(2|1)$ representations:
\[
\bigoplus_{r=1}^{u} M_{u+1,u,r,s}\otimes I_{u+1,u+1-r}\otimes I_{3,\varepsilon(r-s-1)+1}
= \bigoplus_{\beta=0}^{u-1} L_{\frac{u-s+1}{2u},\,\frac{u-s-1}{2u},\,\frac{1}{u}-1;\,\beta}\otimes\bigoplus_{\alpha\in\mathbb{Z}} A_{\frac{u-1-s}{2}-u\alpha-\beta}.
\tag{5.24}
\]

In what follows, we test this formula by verifying the corresponding character identities.

5.4. Sumrules for irreducible representation characters. We now study the character sumrules corresponding to Eq. (5.25).

5.4.1. Free-scalar and $\widehat{s\ell}(2)$ characters. In order to write down the character identities corresponding to (5.25), we first of all note that the sum of the Fock module characters over $\alpha$ on the right-hand side gives rise to a level $u$ generalised theta function, namely,
\[
\sum_{\alpha\in\mathbb{Z}} q^{u\left[\alpha-\frac{2\beta-(u-1-s)}{2u}\right]^2}\, y^{u\left[\alpha-\frac{2\beta-(u-1-s)}{2u}\right]}
= \theta_{-2\beta+(u-1-s),u}(q,y) = \theta_{2\beta-(u-1-s),u}(q,y^{-1}),
\tag{5.25}
\]
where we used (5.10) and some standard properties of the generalised theta functions defined by
\[
\theta_{\mu,\kappa}(q,\zeta) = \sum_{n\in\mathbb{Z}} q^{\kappa\left(n+\frac{\mu}{2\kappa}\right)^2}\zeta^{\kappa\left(n+\frac{\mu}{2\kappa}\right)}
= \zeta^{\frac{\mu}{2}}\, q^{\frac{\mu^2}{4\kappa}}\,\vartheta_{1,0}(q^{2\kappa},\zeta^{\kappa}q^{\mu-\kappa})
\tag{5.26}
\]
(with the Jacobi theta function from (B.10)). This accounts for the level-$u$ theta function in Eqs. (5.55)–(5.57).
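The two equalities in the display labelled (5.25) can be checked numerically from the series form of (5.26). The Python sketch below is only an illustration under stated assumptions: it truncates the sums at a finite cutoff, uses real test values $0 < q < 1$ and $y > 0$, takes $k + 1 = 1/u$ in (5.10), and leaves out the common factor $1/\prod_{i\ge1}(1-q^i)$; the function names are hypothetical.

```python
import math

def theta(mu, kappa, q, zeta, cutoff=30):
    """Truncated series form of the generalised theta function in Eq. (5.26)."""
    total = 0.0
    for n in range(-cutoff, cutoff + 1):
        x = n + mu / (2 * kappa)
        total += q ** (kappa * x * x) * zeta ** (kappa * x)
    return total

def fock_sum(u, s, beta, q, y, cutoff=30):
    """Sum over alpha of y^a q^(a^2/u) with a = (u-1-s)/2 - u*alpha - beta.

    This is the numerator of Eq. (5.10) at k + 1 = 1/u; the factor
    1/prod(1 - q^i) common to all terms is omitted.
    """
    total = 0.0
    for alpha in range(-cutoff, cutoff + 1):
        a = (u - 1 - s) / 2 - u * alpha - beta
        total += y ** a * q ** (a * a / u)
    return total

def check_theta_identity(u, s, beta, q=0.3, y=1.7):
    """Compare the Fock sum with both theta-function forms of Eq. (5.25)."""
    mu = -2 * beta + (u - 1 - s)
    lhs = fock_sum(u, s, beta, q, y)
    mid = theta(mu, u, q, y)
    rhs = theta(-mu, u, q, 1 / y)
    return (math.isclose(lhs, mid, rel_tol=1e-12)
            and math.isclose(mid, rhs, rel_tol=1e-12))
```

The second equality is just the substitution $n \to -n$ in the series, so the check mainly guards against sign slips in the index $-2\beta + (u-1-s)$.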


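We also recall [27] the character formula corresponding to the admissible $\widehat{s\ell}(2)_k$ module $M_{t,u,r,s}$ with the level as in (5.12), namely,
\[
\chi^{\widehat{s\ell}(2)}_{t,u,r,s}(\tau,\sigma) = \frac{\theta_{b_+,ut}(\tau,\frac{\sigma}{u}) - \theta_{b_-,ut}(\tau,\frac{\sigma}{u})}{\theta_{1,2}(\tau,\sigma) - \theta_{-1,2}(\tau,\sigma)},
\tag{5.27}
\]
where $b_\pm = \pm ur - (s-1)t$ (and, as before, $1\le r\le t-1$, $1\le s\le u$). Under the spectral flow, the $\widehat{s\ell}(2)$ characters transform as follows (see also [16] for the $\widehat{s\ell}(2)$ spectral flow properties):
\[
\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s}(\tau,\sigma-\lambda\tau) = (-1)^{\lambda}\, q^{\frac{\lambda^2(u-1)}{4u}}\, z^{-\frac{\lambda(u-1)}{2u}}\,\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s+\lambda}(\tau,\sigma),
\tag{5.28}
\]
with $\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s+\lambda}$ understood as follows: writing $s+\lambda = u\alpha+\beta$ with $\alpha\in\mathbb{Z}$, $\beta = 0,\dots,u-1$, we have
\[
\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s+\lambda}(\tau,\sigma) =
\begin{cases}
\chi^{\widehat{s\ell}(2)}_{u+1,u,r,(s+\lambda)_u}(\tau,\sigma) & \text{for } \alpha\in 2\mathbb{Z},\\[2pt]
-\chi^{\widehat{s\ell}(2)}_{u+1,u,u+1-r,(s+\lambda)_u}(\tau,\sigma) & \text{for } \alpha\in 2\mathbb{Z}+1,
\end{cases}
\tag{5.29}
\]
where $(x)_u$ denotes the residue mod $u$. The integrable $\widehat{s\ell}(2)_{u-1}$ representations are periodic under the spectral flow with period 2, while the spectral flow transform by 1 maps the representations into one another according to $r\mapsto u+1-r$; we then have the character transformation formula
\[
\chi^{\widehat{s\ell}(2)}_{u+1,1,r,1}(\tau,\sigma+\tau) = q^{-\frac{u-1}{4}}\, z^{-\frac{u-1}{2}}\,\chi^{\widehat{s\ell}(2)}_{u+1,1,u+1-r,1}(\tau,\sigma).
\tag{5.30}
\]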

5.4.2. Identification of the $\widehat{s\ell}(2|1)$ characters and the spectral flow. Turning to the right-hand side of (5.25), we next identify which of the $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ characters known from our previous work on the subject [7] correspond to the module $L_{h_-,h_+,k;\beta}$ with twist $\beta$ and
\[
h_- = \frac{u-s+1}{2u}, \qquad h_+ = \frac{u-s-1}{2u}.
\tag{5.31}
\]
We recall an extensive study of $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ characters provided in [?,?,?]; these characters are organised according to the eigenvalues that $H_0^-$ and $H_0^+$ have on the twist-zero state (see (B.3))
\[
\bigl|H_-,\,H_+ = H_- - \tfrac{\nu}{u},\,k\bigr\rangle \equiv \bigl|H_-,\,H_- - \tfrac{\nu}{u},\,k;\,0\bigr\rangle.
\tag{5.32}
\]

To see how these are related to (5.31), we observe that the same irreducible module can be generated from the twisted highest-weight state 1 1 2 . . . E−1 · F1−ν . . . F02 H− , H− − uν , k; 0 , (5.33) X− (ν, H− , k) = E−ν+1 ν−1

ν

which is singled out by the fact that it satisfies annihilation conditions of type (B.13), 1 ≈ 0, E 2 ≈ 0, F 1 ≈ 0, and F 12 ≈ 0 (in the Verma module, the action namely E−ν ν ν 1 1 on X − produces the charged singular vector, and hence, E 1 |X − = 0 in the of E−ν −ν


irreducible representation; in this sense, therefore, $X^-$ is a pre-singular vector to (B.12)). We find the eigenvalues
\[
H_0^-(X^-) = H_- - \tfrac{1}{2}, \qquad H_0^+(X^-) = H_+ + \nu - \tfrac{1}{2}
\tag{5.34}
\]
that the Cartan generators have on this state. On the other hand, we see from (5.31) that
\[
h_+ - h_- = -(k+1) = -\tfrac{1}{u}.
\tag{5.35}
\]
The $X^-$ state is then expressed through the twisted highest-weight state as
\[
X^- = F^2_{-\beta}\,\bigl|h_-,\,h_- - \tfrac{1}{u},\,k;\,\beta\bigr\rangle,
\tag{5.36}
\]
and therefore,
\[
H_0^-(X^-) = h_- - \tfrac{1}{2} = \frac{-s+1}{2u}, \qquad
H_0^+(X^-) = h_+ + 1 - \bigl(\tfrac{1}{u}-1\bigr)\beta - \tfrac{1}{2} = \frac{-s-1}{2u} + 1 - \bigl(\tfrac{1}{u}-1\bigr)\beta.
\tag{5.37}
\]
By comparing (5.37) and (5.34), we obtain
\[
H_- = h_- = \frac{u-s+1}{2u}, \qquad H_+ = h_+ - \frac{\beta}{u} = \frac{u-s-1-2\beta}{2u}.
\tag{5.38}
\]
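The step from (5.37) and (5.34) to (5.38) amounts to solving two linear equations, and the implied value of the integer $\nu$ in (5.32) is not spelled out. As a hedged illustration, the following Python sketch (function name is ours) redoes the comparison with exact fractions for $u\ge 2$; under the formulas as quoted it returns $\nu = \beta + 1$ together with the labels of (5.38).

```python
from fractions import Fraction

def twist_zero_labels(u, s, beta):
    """Return (H_minus, H_plus, nu) for u >= 2, 1 <= s <= u, 0 <= beta <= u - 1."""
    half = Fraction(1, 2)
    h_minus = Fraction(u - s + 1, 2 * u)          # Eq. (5.31)
    h_plus = Fraction(u - s - 1, 2 * u)
    # Eigenvalues of H_0^- and H_0^+ on X^-, Eq. (5.37)
    e_minus = h_minus - half
    e_plus = h_plus + 1 - (Fraction(1, u) - 1) * beta - half
    # Invert Eq. (5.34), using H_+ = H_- - nu/u from Eq. (5.32)
    H_minus = e_minus + half
    nu = (e_plus + half - H_minus) / (1 - Fraction(1, u))
    H_plus = H_minus - nu / u
    return H_minus, H_plus, nu

# Compare with the closed forms of Eq. (5.38) on a small range of parameters.
for u in range(2, 7):
    for s in range(1, u + 1):
        for beta in range(u):
            H_minus, H_plus, nu = twist_zero_labels(u, s, beta)
            assert H_minus == Fraction(u - s + 1, 2 * u)
            assert H_plus == Fraction(u - s - 1 - 2 * beta, 2 * u)
            assert nu == beta + 1
```

The case $u = 1$ is excluded only because the linear system for $\nu$ degenerates there ($1 - 1/u = 0$).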

(5.38)

This gives the parameters of the twist-zero highest-weight vectors in the $\widehat{s\ell}(2|1)$ modules entering the decomposition formula. All of these representations are obtained by taking the quotient of the Verma modules with two charged singular vectors, one of type (B.12) and the other of type (B.14), with the integers labelling these singular vectors being of the opposite signs (obviously, there exist other singular vectors that have to be factored over to obtain the irreducible representation). According to the values of $H_-$ and $H_+$, the irreducible representation characters in any chosen “sector” of the theory (Ramond, Neveu–Schwarz, etc.) at level $k = \frac{1}{u}-1$ are those of class IV ($\frac{1}{2}u(u+1)$ characters) and of class V ($\frac{1}{2}u(u-1)$ characters) [10, 7, 8]. Choosing the Ramond sector for definiteness, we have the following.

IV. The class IV constraints on the highest weight state isospin and hypercharge are given by
\[
2H_- + (k+1)m = 0 \qquad\text{and}\qquad H_+ - H_- = m'(k+1),
\tag{5.39}
\]
where $0\le m'\le m\le u-1$ and $m, m'\in\mathbb{Z}_+$. For the corresponding Verma module, this implies the existence of the charged singular vectors $|C^{(-)}(-m',H_-,k)\rangle$ and $|C^{(+)}(m-m',H_-,k)\rangle$. The corresponding $\frac{1}{2}u(u+1)$ irreducible representation characters are given by
\[
\chi^{R,\mathrm{IV}}_{m,m'}(q,z,\zeta) = q^{\frac{H_-^2-H_+^2}{k+1}}\, z^{H_-}\zeta^{H_+}\, F^R(q,z,\zeta)\cdot
\frac{q^{-\frac{u}{8}}\,\eta(q^u)^3\,\vartheta_{1,1}(q^u, zq^{-m})}{\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{\frac12}q^{-m'})\,\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{-\frac12}q^{m'-m})},
\tag{5.40}
\]
with
\[
F^R(q,z,\zeta) = \frac{\vartheta_{1,0}(q, z^{\frac12}\zeta^{-\frac12})\,\vartheta_{1,0}(q, z^{\frac12}\zeta^{\frac12})}{\vartheta_{1,1}(q,z)\,\prod_{n\ge1}(1-q^n)^3}
\tag{5.41}
\]
being essentially the Verma module character (B.8). Note that only one of these characters (the vacuum representation $m = m' = 0$) is regular in the limit where $z\to1$.

V. The class V constraints on the highest weight state isospin and hypercharge are given by
\[
2H_- - (k+1)(M+M'+2) = 0 \qquad\text{and}\qquad H_+ - H_- = -(M'+1)(k+1),
\tag{5.42}
\]
where $0\le M+M'\le u-2$ and $M, M'\in\mathbb{Z}_+$. The charged singular vectors in the corresponding Verma module are $|C^{(-)}(M'+1,H_-,k)\rangle$ and $|C^{(+)}(-M-1,H_-,k)\rangle$; the corresponding $\frac{1}{2}u(u-1)$ irreducible representation characters are
\[
\chi^{R,\mathrm{V}}_{M,M'}(q,z,\zeta) = q^{\frac{H_-^2-H_+^2}{k+1}}\, z^{H_-}\zeta^{H_+}\, F^R(q,z,\zeta)\cdot
\frac{-q^{-\frac{u}{8}}\,\eta(q^u)^3\,\vartheta_{1,1}(q^u, zq^{M+M'+2})}{\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{\frac12}q^{M'+1})\,\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{-\frac12}q^{M+1})}.
\tag{5.43}
\]

Note that $u-1$ of them are regular in the limit as $z\to1$ (they correspond to $M+M' = u-2$). As we will see momentarily, a given orbit of the spectral flow (B.7) involves the $\chi^{R,\mathrm{V}}$ as well as the $\chi^{R,\mathrm{IV}}$ type characters.

Lemma 5.2. The character of $L_{\frac{u-s+1}{2u},\,\frac{u-s-1}{2u},\,\frac{1}{u}-1;\,\beta}$ is given by
\[
\begin{aligned}
\chi^{L}_{\frac{u-s+1}{2u},\,\frac{u-s-1}{2u},\,\frac{1}{u}-1;\,\beta}(q,z,\zeta)
&= \begin{cases}
\chi^{R,\mathrm{IV}}_{s-1,\,u-1-\beta}(q,z,\zeta) & \text{for } u-1 < s+\beta,\\[2pt]
\chi^{R,\mathrm{V}}_{u-s-1-\beta,\,\beta}(q,z,\zeta) & \text{for } u-1\ge s+\beta
\end{cases}\\
&= q^{-\frac{(\beta+1)(\beta+s)}{u}}\, z^{\frac{1-s}{2u}}\,\zeta^{-\frac{1+2\beta+s}{2u}}\, F^R(q,z,\zeta)\cdot
\frac{q^{-\frac{u}{8}}\,\eta(q^u)^3\,\vartheta_{1,1}(q^u, zq^{1-s})}{\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{\frac12}q^{\beta+1})\,\vartheta_{1,0}(q^u, z^{\frac12}\zeta^{-\frac12}q^{-s-\beta})}.
\end{aligned}
\tag{5.44}
\]
Indeed, according to (5.32), the relation between $H_+$ and $H_-$ involves a strictly positive integer $\nu$, and therefore, one is naturally in a class V context. The labels $M$ and $M'$ are given by (5.42) with $H_\pm$ from (5.38),
\[
M = u(H_- + H_+) - 1 = -s-1-\beta+u, \qquad M' = u(H_- - H_+) - 1 = \beta.
\tag{5.45}
\]

These can be translated into class-IV labels in the appropriate range in accordance with the spectral flow properties of the class IV and V characters, which we now consider. R,IV The spectral flow (B.7) acts on the χm,m characters by shifting the labels as m → m,

m → m − θ,

(5.46)


and for some values of θ , the parameter m may flow to a value outside the fundamental range 0 ≤ m ≤ m ≤ u − 1. This gives the class-V characters. Indeed, first note that it is sufficient to consider the spectral flow parameter θ in the range 0 ≤ θ ≤ u − 1, given the (quasi)-periodicity property of the class IV characters, R,IV n R,IV χm+nu,m +n u = (−1) χm,m .

(5.47)

Then we have, R,IV χm,m (τ, σ, ν + 2θ τ ) = q

θ 2 (1−u) u

θ 2 (1−u) u

R,IV χm,m −θ (τ, σ, ν)

ζ

θ (1−u) u

R,V χu−1−(m−m +θ),θ−m −1 (τ, σ, ν)

m + 1 ≤ θ ≤ u − 1 − (m − m ),

for R,IV χm,m (τ, σ, ν + 2θ τ ) = q

θ (1−u) u

0 ≤ m − θ ≤ m,

for R,IV χm,m (τ, σ, ν + 2θ τ ) = q

ζ

θ 2 (1−u) u

for

ζ

θ (1−u) u

R,IV χm,m −θ+u (τ, σ, ν)

u − (m − m ) ≤ θ ≤ u − 1.

On the other hand, the spectral flow transform (B.7) acts on class V characters by shifting the representation labels as M → M + θ,

M → M − θ,

(5.48)

and one has, R,V χM,M (τ, σ, ν + 2θ τ ) = q

θ 2 (1−u) u

for R,V χM,M (τ, σ, ν + 2(u − 1 − M )τ ) = q

θ (1−u) u

(M +1−u)2 (u−1) u

ζ

(M +1−u)(u−1) u

θ = u − 1 − M ,

θ 2 (1−u) u

for

R,V χM−θ,M +θ (τ, σ, ν)

M − (u − 2) ≤ θ ≤ u − 2 − M ,

for R,V χM,M (τ, σ, ν + 2θ τ ) = q

ζ

ζ

θ (1−u) u

R,IV χu−2−M−M ,0 (τ, σ, ν)

R,V χM−θ+u,M +θ−u (τ, σ, ν)

u − M ≤ θ ≤ u − 1.

5.4.3. Sumrules for $\widehat{s\ell}(2|1)$ characters. A key ingredient in the derivation of the character identities is provided by the decomposition formulas of the $u^2$ characters given above into $\widehat{s\ell}(2)$ characters at the same level $k = \frac{1}{u}-1$ [20], namely,
\[
\chi^{R,\mathrm{IV}}_{m,m'}(\tau,\sigma,\nu) = \sum_{i=0}^{u-1}\chi^{\widehat{s\ell}(2)}_{u+1,u,u-i,m+1}(\tau,\sigma)\, F^{\mathrm{IV}}(\tau,\nu),
\tag{5.49}
\]
\[
\chi^{R,\mathrm{V}}_{M,M'}(\tau,\sigma,\nu) = \sum_{i=0}^{u-1}\chi^{\widehat{s\ell}(2)}_{u+1,u,u-i,u-1-M-M'}(\tau,\sigma)\, F^{\mathrm{V}}(\tau,\nu),
\tag{5.50}
\]
where
\[
F^{\mathrm{IV}}(\tau,\nu) = \sum_{a=0}^{u-2} c^{(u-1)}_{i,X(i,a)}(\tau)\,\theta_{(u-1)(m-2m'+u)+uX(i,a),\,u(u-1)}\bigl(\tau,\tfrac{\nu}{u}\bigr),
\tag{5.51}
\]
\[
F^{\mathrm{V}}(\tau,\nu) = \sum_{a=0}^{u-2} c^{(u-1)}_{i,X(i,a)}(\tau)\,\theta_{(u-1)(M'-M)+uX(i,a),\,u(u-1)}\bigl(\tau,\tfrac{\nu}{u}\bigr).
\tag{5.52}
\]
In the above, $c^{(u-1)}_{i,j}(\tau)$ are the $\widehat{s\ell}(2)$ string functions at level $u-1$ [26] and $X(i,a) = (u-1)i + 2i\bigl(\frac{u}{2} - [\frac{u}{2}]\bigr) - 2a$, where $[\frac{u}{2}]$ is the integer part of $\frac{u}{2}$. The $\widehat{s\ell}(2)$ string functions were first introduced by Kač and Peterson to rewrite $\widehat{s\ell}(2)_{u-1}$ characters at the level $u-1$ in terms of generalised theta functions at level $u-1$, namely,
\[
\chi^{\widehat{s\ell}(2)}_{u+1,1,r,1}(\tau,\nu) = \sum_{m=-u+2}^{u-1} c^{(u-1)}_{r-1,m}(\tau)\,\theta_{m,u-1}(\tau,\nu),
\tag{5.53}
\]
with the properties
\[
c^{(k)}_{\ell,m}(\tau) = c^{(k)}_{\ell,-m}(\tau) = c^{(k)}_{k-\ell,k-m}(\tau) = c^{(k)}_{\ell,m+2nk}(\tau),
\tag{5.54}
\]
where $n\in\mathbb{Z}$ and $\ell\equiv m \bmod 2$. As is clear from (5.49) and (5.50), each $\widehat{s\ell}(2|1)$ character is decomposed in exactly $u$ $\widehat{s\ell}(2)$ characters $\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s}(\tau,\sigma)$ with $s$ fixed. More precisely, at a fixed value of $s$ in the range $1\le s\le u$, the $u$ $\widehat{s\ell}(2|1)$ characters which can be decomposed into the $u$ characters $\chi^{\widehat{s\ell}(2)}_{u+1,u,r,s}(\tau,\sigma)$ are

IV. $\chi^{R,\mathrm{IV}}_{s-1,m'}$ with $0\le m'\le s-1$ ($s$ characters),

V. $\chi^{R,\mathrm{V}}_{M,M'}$ with $M+M' = u-1-s$ ($u-s$ characters).
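The counting in the list above can be cross-checked against Lemma 5.2: at fixed $s$, letting the twist $\beta$ run over $0,\dots,u-1$ should produce exactly $s$ class IV labels of the form $(s-1, m')$ and $u-s$ class V labels with $M+M' = u-1-s$. The Python sketch below is an illustration only (the function names are hypothetical); it also recomputes the class V labels from (5.45) with $H_\pm$ taken from (5.38).

```python
from fractions import Fraction

def lemma_labels(u, s, beta):
    """Class and labels assigned by Lemma 5.2 to the module with twist beta."""
    if s + beta > u - 1:
        return ("IV", s - 1, u - 1 - beta)      # (m, m')
    return ("V", u - s - 1 - beta, beta)        # (M, M')

def class_v_labels_from_weights(u, s, beta):
    """(M, M') from Eq. (5.45), using H_- and H_+ of Eq. (5.38)."""
    H_minus = Fraction(u - s + 1, 2 * u)
    H_plus = Fraction(u - s - 1 - 2 * beta, 2 * u)
    return u * (H_minus + H_plus) - 1, u * (H_minus - H_plus) - 1

def check_counts(u):
    """Verify the class IV / class V counting for every s at a given u."""
    for s in range(1, u + 1):
        labels = [lemma_labels(u, s, beta) for beta in range(u)]
        iv = [(a, b) for cls, a, b in labels if cls == "IV"]
        v = [(a, b) for cls, a, b in labels if cls == "V"]
        assert len(iv) == s and len(v) == u - s
        assert all(m == s - 1 and 0 <= mp <= m for m, mp in iv)
        assert all(M >= 0 and Mp >= 0 and M + Mp == u - 1 - s for M, Mp in v)
        for beta in range(u):
            cls, a, b = lemma_labels(u, s, beta)
            if cls == "V":
                assert class_v_labels_from_weights(u, s, beta) == (a, b)
    return True
```

Only label arithmetic is tested here; the characters themselves, and hence the identities (5.49) and (5.50), are not evaluated.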

Decomposition formulas (5.49) and (5.50) for the $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ characters are not only explicitly given in terms of characters of the subalgebra $\widehat{s\ell}(2)_{\frac{1}{u}-1}$, but they also encode some information on a dual algebra $\widehat{s\ell}(2)_{u-1}$ through the $\widehat{s\ell}(2)$ string functions at the level $u-1$. We were able to derive character identities for the values $u = 2$ and $u = 3$ by using standard, albeit somewhat tedious, manipulations on generalised theta functions. On the basis of these two cases, we were led to conjecture identities for arbitrary $u\in\mathbb{N}$. One may write $2u$ sumrules for Ramond $\widehat{s\ell}(2|1)$ characters at the level $k = \frac{1}{u}-1$ corresponding to the decomposition (5.25) for irreducible representations. As before, we use $(\lambda)_u$ to denote the residue modulo $u$ of $\lambda$, and $[\frac{\lambda}{u}]$ to denote the integer part of $\frac{\lambda}{u}$. The $2u$ sumrules read
\[
S^R_\lambda(\tau,\rho,\sigma,\nu):\qquad A^R_\lambda(\tau,\rho,\sigma,\nu) = B^R_\lambda(\tau,\rho,\sigma,\nu), \qquad \lambda = 0,\dots,2u-1,
\tag{5.55}
\]
where
\[
A^R_\lambda(\tau,\rho,\sigma,\nu) = \sum_{i=0}^{(\lambda)_u}\theta_{u-2i+\lambda,u}(\tau,-\rho)\,\chi^{R,\mathrm{IV}}_{(\lambda)_u,\,i}(\tau,\sigma,\nu)
+ \sum_{i=(\lambda)_u+1}^{u-1}\theta_{u-2i+\lambda,u}(\tau,-\rho)\,\chi^{R,\mathrm{V}}_{i-1-(\lambda)_u,\,u-1-i}(\tau,\sigma,\nu)
\tag{5.56}
\]
and
\[
\begin{aligned}
B^R_\lambda(\tau,\rho,\sigma,\nu) ={}& \theta_{1+\lambda,1}\bigl(\tau,\tfrac{u-1}{u}\nu-\rho\bigr)\cdot\sum_{i=0}^{N}\chi^{\widehat{s\ell}(2)}_{u+1,u,\,2i+1+(u-1-4i)[\frac{\lambda}{u}],\,(\lambda)_u+1}(\tau,\sigma)\,\chi^{\widehat{s\ell}(2)}_{u+1,1,u-2i,1}\bigl(\tau,\tfrac{\nu}{u}+\rho\bigr)\\
&+ \theta_{\lambda,1}\bigl(\tau,\tfrac{u-1}{u}\nu-\rho\bigr)\cdot\sum_{i=0}^{N'}\chi^{\widehat{s\ell}(2)}_{u+1,u,\,2i+2+(u-3-4i)[\frac{\lambda}{u}],\,(\lambda)_u+1}(\tau,\sigma)\,\chi^{\widehat{s\ell}(2)}_{u+1,1,u-(2i+1),1}\bigl(\tau,\tfrac{\nu}{u}+\rho\bigr),
\end{aligned}
\tag{5.57}
\]
with $u-2\le 2N\le u-1$ and $u-3\le 2N'\le u-2$. Note that the second sum in (5.56) is zero for $(\lambda)_u = u-1$. There is complete agreement between the sumrules (5.55) and the decomposition formula (5.25). The appearance of the level-$u$ theta functions was explained in (5.26). The theta functions at the level $\kappa = 1$ represent the level-one $\widehat{s\ell}(2)$ integrable characters, since one has
\[
\frac{1}{\eta(\tau)}\,\theta_{r-1,1}(\tau,\nu) = \chi^{\widehat{s\ell}(2)}_{3,1,r,1}(\tau,\nu), \qquad r = 1, 2.
\tag{5.58}
\]

Strictly speaking, Eq. (5.25) provides the first u sumrules, but the other set is obtained by transforming each of them under the flow, σ → σ − uτ, ν → ν + uτ, ρ → ρ − τ.

(5.59)

To make the correspondence between (5.55) and (5.25) transparent, it remains to recall Lemma 5.2.

We end this section by observing that the above sumrules behave in a remarkable way under two particular flows of the variables $\sigma$, $\nu$, and $\rho$. We first note that the class IV and V characters transform as
\[
\chi^{R,\mathrm{IV}}_{m,m'}(\tau,\sigma+u\tau,\nu+(u-2)\tau) =
\begin{cases}
P\,\chi^{R,\mathrm{IV}}_{m,m'+1}(\tau,\sigma,\nu) & \text{for } m'\le m-1,\\[2pt]
P\,\chi^{R,\mathrm{IV}}_{u-1,0}(\tau,\sigma,\nu) & \text{for } m = m' = u-1,\\[2pt]
P\,\chi^{R,\mathrm{V}}_{0,u-2-m}(\tau,\sigma,\nu) & \text{for } m = m'\le u-2,
\end{cases}
\tag{5.60}
\]
\[
\chi^{R,\mathrm{V}}_{M,M'}(\tau,\sigma+u\tau,\nu+(u-2)\tau) =
\begin{cases}
P\,\chi^{R,\mathrm{V}}_{M+1,M'-1}(\tau,\sigma,\nu) & \text{for } 0\le M\le u-3 \text{ and } 1\le M'\le u-2,\\[2pt]
P\,\chi^{R,\mathrm{IV}}_{u-2-M,0}(\tau,\sigma,\nu) & \text{for } 0\le M\le u-2 \text{ and } M' = 0,
\end{cases}
\tag{5.61}
\]
where
\[
P = (-1)^{u+1}\, q^{\frac{(u-1)^2}{u}}\, z^{\frac{u-1}{2}}\,\zeta^{-\frac{(u-1)(u-2)}{2u}}.
\tag{5.62}
\]
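The label shifts in (5.60) and (5.61) define a map on the set of $u^2$ class IV and class V labels, and one can check mechanically that this map splits the labels into $u$ closed orbits of length $u$ each. The following Python sketch does so under the assumption that only the labels matter: the prefactor $P$ of (5.62) is ignored, and the function names are illustrative.

```python
def flow_step(u, label):
    """One application of the flow (5.60)-(5.61) to a label, ignoring P."""
    cls, a, b = label
    if cls == "IV":
        m, mp = a, b
        if mp <= m - 1:
            return ("IV", m, mp + 1)
        if m == u - 1:                 # m = m' = u - 1
            return ("IV", u - 1, 0)
        return ("V", 0, u - 2 - m)     # m = m' <= u - 2
    M, Mp = a, b
    if Mp >= 1:
        return ("V", M + 1, Mp - 1)
    return ("IV", u - 2 - M, 0)        # M' = 0

def all_labels(u):
    """All class IV labels (0 <= m' <= m <= u-1) and class V labels (M + M' <= u-2)."""
    iv = [("IV", m, mp) for m in range(u) for mp in range(m + 1)]
    v = [("V", M, Mp) for M in range(u - 1) for Mp in range(u - 1 - M)]
    return iv + v

def orbits(u):
    """Partition the labels into orbits of flow_step."""
    remaining = set(all_labels(u))
    result = []
    while remaining:
        start = min(remaining)
        orbit, label = [], start
        while True:
            orbit.append(label)
            remaining.discard(label)
            label = flow_step(u, label)
            if label == start:
                break
        result.append(orbit)
    return result
```

For example, at $u = 3$ the orbit through the vacuum label `("IV", 0, 0)` visits `("V", 0, 1)` and `("V", 1, 0)` before returning, so a single orbit mixes class IV and class V characters.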

Remark 5.3. The $u$ terms in $A^R_\lambda$ (resp. $B^R_\lambda$) form an orbit under the flow
\[
\sigma\mapsto\sigma+u\tau, \qquad \nu\mapsto\nu+(u-2)\tau, \qquad \rho\mapsto\rho+\tfrac{2\tau}{u},
\tag{5.63}
\]
as can be readily checked with the help of the spectral flow formulas (5.60), (5.61), and (5.28). The invariance under (5.63) indicates that each expression $A^R_\lambda$ (equivalently, $B^R_\lambda$) potentially describes a character corresponding to a representation of the bigger affine Lie superalgebra D(2|1; α). Indeed, the spectral flow generator has isospin $H_- = 1/2$, hypercharge $H_+ = \frac{2-u}{2u}$, a u(1) charge proportional to $-\frac{1}{u}$ in the direction orthogonal to $\widehat{s\ell}(2|1)$, and conformal weight 1. (This follows by comparing the quantum numbers of the vacuum representation of $\widehat{s\ell}(2|1)\oplus u(1)$ with those of the representation with character $\theta_{2u-2,u}(\tau,\frac{\rho}{u})\,\chi^{R,\mathrm{V}}_{0,u-2}(\tau,\sigma,\nu)$, which are in the same spectral flow orbit and appear in the $\lambda = u$ sumrule.) If one extends the algebra $\widehat{s\ell}(2|1)\oplus u(1)$ by this spectral flow generator, one generates D(2|1; α), as can be most quickly understood by looking at the D(2|1; α) root diagram in Appendix 6. Indeed, consider for example the first embedding of $s\ell(2|1)$ in D(2|1; α) in Table 1; the spectral flow generator can be identified with the current corresponding to the root $\alpha_1+\alpha_2$, which is needed in order to extend the $s\ell(2|1)$ root diagram to D(2|1; α).

We next note that for $\alpha\in\mathbb{Z}$ chosen in such a way that $0\le m+\lambda-u\alpha\le u-1$, we have
\[
\chi^{R,\mathrm{IV}}_{m,m'}(\tau,\sigma-\lambda\tau,\nu+\lambda\tau) =
\begin{cases}
(-1)^{\lambda+\alpha}\, z^{-\frac{\lambda(u-1)}{2u}}\zeta^{-\frac{\lambda(u-1)}{2u}}\,\chi^{R,\mathrm{IV}}_{m+\lambda-u\alpha,\,m'}(\tau,\sigma,\nu) & \text{for } m'\le m+\lambda-u\alpha,\\[2pt]
(-1)^{\lambda+\alpha}\, z^{-\frac{\lambda(u-1)}{2u}}\zeta^{-\frac{\lambda(u-1)}{2u}}\,\chi^{R,\mathrm{V}}_{m'-m-\lambda+u\alpha-1,\,u-1-m'}(\tau,\sigma,\nu) & \text{for } m' > m+\lambda-u\alpha.
\end{cases}
\tag{5.64}
\]
For class V characters, with $\alpha\in\mathbb{Z}$ chosen such that $0\le u-2-M-M'+\lambda-u\alpha\le u-1$, similarly,
\[
\chi^{R,\mathrm{V}}_{M,M'}(\tau,\sigma-\lambda\tau,\nu+\lambda\tau) =
\begin{cases}
(-1)^{\lambda+\alpha}\, z^{-\frac{\lambda(u-1)}{2u}}\zeta^{-\frac{\lambda(u-1)}{2u}}\,\chi^{R,\mathrm{V}}_{M-\lambda+u\alpha,\,M'}(\tau,\sigma,\nu) & \text{for } M-\lambda+u\alpha\ge0,\\[2pt]
(-1)^{\lambda+\alpha}\, z^{-\frac{\lambda(u-1)}{2u}}\zeta^{-\frac{\lambda(u-1)}{2u}}\,\chi^{R,\mathrm{IV}}_{u-2-(M-\lambda+u\alpha)-M',\,u-1-M'}(\tau,\sigma,\nu) & \text{for } M-\lambda+u\alpha<0.
\end{cases}
\tag{5.65}
\]

Remark 5.4. The $\lambda'$th sumrule may be obtained from the $\lambda$th sumrule by the transformation
\[
\sigma\mapsto\sigma-(\lambda'-\lambda)\tau, \qquad \nu\mapsto\nu+(\lambda'-\lambda)\tau, \qquad \rho\mapsto\rho-(\lambda'-\lambda)\tfrac{\tau}{u}.
\tag{5.66}
\]
In particular, because of the quasi-periodicity of the level-$u$ theta functions and because $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ characters are periodic under the simultaneous shifts $\sigma\mapsto\sigma-u\tau$ and $\nu\mapsto\nu+u\tau$, one has the quasi-periodicity properties
\[
\begin{aligned}
A^R_\lambda(\tau,\rho-2\tau,\sigma-2u\tau,\nu+2u\tau) &= q^{-u}\,y^{u}\,z^{1-u}\,\zeta^{1-u}\,A^R_\lambda(\tau,\rho,\sigma,\nu),\\
B^R_\lambda(\tau,\rho-2\tau,\sigma-2u\tau,\nu+2u\tau) &= q^{-u}\,y^{u}\,z^{1-u}\,\zeta^{1-u}\,B^R_\lambda(\tau,\rho,\sigma,\nu),
\end{aligned}
\tag{5.67}
\]
and the set of $2u$ sumrules (5.55) is closed under the above spectral flow. Moreover, this set carries a unitary representation of the modular group, as can be explicitly checked by using the modular transformations of $\widehat{s\ell}(2)$ characters at integrable and fractional levels. A complete treatment of the modular properties must include the Neveu–Schwarz, the “super” Ramond and “super” Neveu–Schwarz sectors, which may be obtained as follows [20]. The Neveu–Schwarz sumrules are derived by flowing the Ramond sector sumrules according to
\[
S^R_\lambda(\tau,\rho,-\sigma-\tau,\nu) = q^{-\frac{1-u}{4u}}\, z^{-\frac{1-u}{2u}}\, S^{NS}_\lambda(\tau,\rho,\sigma,\nu),
\tag{5.68}
\]

while the “super” Ramond and Neveu–Schwarz sectors are given by, SλR,NS (τ, ρ, σ, ν). SλR,NS (τ, ρ, σ + 1, ν) =

(5.69)

It should be noted that the group of modular transformations and the group of spectral flow transformations on characters are combined into an “extended modular group” via the semi-direct product, in which the spectral flow transformations are the invariant subgroup; thus, the “extended modular group” representation can be induced from the spectral flow representation, and these are the transformations that close on the chosen set of $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ representations. The relevance of these facts to the representation theory of D(2|1; α) is beyond the present scope and is left aside for future work.

6. Conclusions

We have seen that a vertex operator extension of two $\widehat{s\ell}(2)$ algebras at levels $k$ and $k'$ satisfying the duality relation $(k+1)(k'+1) = 1$ yields an interesting structure, the exceptional affine Lie superalgebra D(2|1; α). This novel construction should provide the setting needed to build some classes of D(2|1; α) representations whose characters are given by either side of the sumrules (5.55).

In this paper, however, we made use of the above vertex operator extension to construct representations of the $\widehat{s\ell}(2|1)$ subalgebra of D(2|1; α), and saw that they give sums of representations “twisted” by the $\widehat{s\ell}(2|1)$ spectral flow (5.1). We also derived the corresponding character identities relating $\widehat{s\ell}(2|1)$ characters to the constituent $\widehat{s\ell}(2)_k$ and $\widehat{s\ell}(2)_{k'}$ characters, both for Verma modules (Sect. 5.2) and irreducible representations (Eq. (5.55)). The latter identities involve $\widehat{s\ell}(2|1)$ representations at admissible (non integrable) level $k = \frac{1}{u}-1$, $u\in\mathbb{N}$, and relate them to representations of $\widehat{s\ell}(2)_{\frac{1}{u}-1}$ and $\widehat{s\ell}(2)_{u-1}$. Interestingly enough, the admissible $\widehat{s\ell}(2|1)_{\frac{1}{u}-1}$ representations are related to the admissible $\widehat{s\ell}(2)_{\frac{1}{u}-1}$ representations, whose characters are periodic under the spectral flow with period $2u$, and to the integrable $\widehat{s\ell}(2)_{u-1}$ representations, which are themselves periodic under the spectral flow with period 2 [16]. The interplay of admissible and integrable representations within a bigger algebraic structure is quite remarkable and should be further exploited in the way any duality is: in this context, for instance, it should relate the representation theory of D(2|1; α) at integer and fractional levels. We hope to return to these issues elsewhere.

Another important point raised by the relation between representations and by the corresponding character identities is that of the closure under modular transformations. The $\widehat{s\ell}(2|1)$ characters appearing in these identities do carry a unitary representation of the modular group [23], and using this information, one can explicitly check that the $2u$ functions $A(\lambda)$, $\lambda = 0,\dots,2u-1$, on the left-hand side of the sumrules (5.55) also carry a unitary representation of the modular group. It is therefore tempting to identify these functions with the characters corresponding to a particular class of D(2|1; α) representations satisfying the requirements for $\widehat{s\ell}(2|1)\oplus\hat{u}(1)$ to be conformally embedded in D(2|1; α), as discussed in Sect. 3.

A very interesting general problem is to verify the functorial properties of the correspondence established between the $\widehat{s\ell}(2)_k\oplus\widehat{s\ell}(2)_{k'}$ and $\widehat{s\ell}(2|1)_k$ representations; in the simpler case of the relation between $\widehat{s\ell}(2)$ and N = 2 superconformal representations [17], a similar relation is the equivalence of categories modulo the spectral flow (i.e., the equivalence of representation theories of two algebras obtained by extending the universal enveloping of $\widehat{s\ell}(2)$ and N = 2, respectively, by the spectral flow operator). We have seen that in the present case, the spectral flow plays a very similar rôle, and thus an interesting problem is whether we again can construct a similar functor; the two cases are actually related by the Hamiltonian reduction functor
\[
\begin{array}{ccc}
\widehat{s\ell}(2)_k\oplus\widehat{s\ell}(2)_{k'} & \rightleftarrows & \widehat{s\ell}(2|1)_k\\[4pt]
\Big\downarrow & \text{Hamiltonian Reduction [4, 3, 22]} & \Big\downarrow\\[4pt]
\widehat{s\ell}(2)_{k'-1} & \underset{\text{mod spectral flow}}{\rightleftarrows} & N = 2
\end{array}
\]

which may be extended to an argument demonstrating the functorial properties of the correspondence found in this paper. We note that the character identity corresponding to (1.2) turns out to be equivalent to the character identity derived from the correspondence between the $\widehat{s\ell}(2)$ and N = 2 Verma modules [16]; this identity has also been known in a different representation-theory context [28].

As another future research direction, we note the possibility to use various free-field (and other) realisations of $\widehat{s\ell}(2|1)$ [35, 9] in the construction of Sects. 4.1–4.2; it would be interesting, for example, to relate the corresponding screening operators and interpret this relation in terms of the respective quantum groups.

Finally, it is worth exploring the geometric interpretation of the construction for $\widehat{s\ell}(2|1)$ found in this paper. Despite the complication generated by the presence of the odd integer $n = 1$ on the right-hand side of (1.3), we expect to be able to proceed similarly to the $n = 0$ case, relevant to the coupling of matter to gravity, where no auxiliary scalar is needed and the basic geometric setting involves the loop group $\widehat{SL}(2,\mathbb{C})$ of maps $S^1\to SL(2,\mathbb{C})$. In that case, the analogue of $\widehat{s\ell}(2|1)$ is an extended algebra consisting of the semi-direct product of two commuting $\widehat{s\ell}(2)$ algebras (corresponding to the left and the right actions on the loop group) with levels $k_1$ and $k_2$ constrained by Eq. (1.3) with $n = 0$, and the contracted vertex operators $\mathbb{C}^2(z)\otimes\mathbb{C}^2(z)$, which are functions on the group and can be viewed as matrix elements of the evaluation representation (in contrast with the $\widehat{s\ell}(2|1)$ case, $\mathbb{C}^2(z)\otimes\mathbb{C}^2(z)$ do commute). The constraint on $k_1$ and $k_2$ actually


follows from imposing the Knizhnik–Zamolodchikov equation on the $\mathbb{C}^2(z)\otimes\mathbb{C}^2(z)$ vertex operators. The vacuum representation of the extended algebra can be described in terms of distributions on $G_+$ (the subgroup of the loop group consisting of mappings that extend to the origin from $S^1 = \{z\in\mathbb{C} : |z| = 1\}$) and is given by a sum of Weyl modules of the two $\widehat{s\ell}(2)$ algebras. A different piece of this picture is provided by distributions on the double quotient $G_-\backslash\widehat{SL}(2,\mathbb{C})/G_+$, where $G_-$ is the subgroup of $\widehat{SL}(2,\mathbb{C})$ consisting of loops which are the boundary values of holomorphic maps $\{z\in\mathbb{C}\cup\infty : |z| > 1\}\to SL(2,\mathbb{C})$ [32]. The equivalence classes $X(n)$ are labelled by a positive integer $n\in\mathbb{Z}_+$. The distributions living on a given $X(n)$ carry the left and the right $\widehat{s\ell}(2)$ actions; this time, however, the Weyl modules are combined differently, since $W_\ell\otimes W_m$ enters as many times as there are $n$-dimensional $s\ell(2)$ representations in the tensor product of the $\ell$- and $m$-dimensional ones. It is interesting to investigate how much of this description can be carried over to the “noncommutative” case of the $\widehat{s\ell}(2|1)$ algebra constructed in this paper.

Acknowledgements. We thank A. Belavin, H. Kausch, W. Oxbury, I. Shchepochkina, I. Tipunin, and G. Watts for discussions. This work was supported by the EPSRC grant GR/M12544 and partly by the RFBR Grant 9801-01155 and the Russian Federation President Grant 99-15-96037. A.M.S. gratefully acknowledges kind hospitality extended to him at the Department of Mathematical Sciences, University of Durham. A.T. acknowledges The Leverhulme Trust for a fellowship.

Appendix A. Some sℓ(2) Quantum Group Relations

In describing the $s\ell(2)_q$ quantum group, we follow the conventions of [29]. The quantum group relations are
$$K K^{-1} = K^{-1} K = 1, \quad K E K^{-1} = q^{2} E, \quad K F K^{-1} = q^{-2} F, \quad [E, F] = \frac{K - K^{-1}}{q - q^{-1}}. \tag{A.1}$$
The antipode acts on these generators as follows:
$$S(E) = -E K^{-1}, \quad S(F) = -K F, \quad S(K) = K^{-1}, \quad S(K^{-1}) = K, \tag{A.2}$$
and the comultiplication is given by
$$\Delta(E) = 1 \otimes E + E \otimes K, \quad \Delta(F) = K^{-1} \otimes F + F \otimes 1, \tag{A.3}$$
$$\Delta(K) = K \otimes K, \quad \Delta(K^{-1}) = K^{-1} \otimes K^{-1}. \tag{A.4}$$

Together with the counit $\varepsilon$ given by $\varepsilon(E) = \varepsilon(F) = 0$, $\varepsilon(K) = \varepsilon(K^{-1}) = 1$, these relations endow $s\ell(2)_q$ with a Hopf algebra structure. For a module $V$ over a Hopf algebra $A$, the $A$ action on the dual module $V^*$ is defined by
$$(a f)(v) = f(S(a)\, v), \quad a \in A, \quad f \in V^*, \quad v \in V. \tag{A.5}$$
Let $V_{\epsilon,n}$ be the $s\ell(2)_q$ module with the highest-weight vector $v_0$ such that
$$E v_0 = 0, \quad K v_0 = \epsilon q^{n} v_0, \tag{A.6}$$
where $\epsilon^2 = 1$ and $n$ is a positive integer. We then define
$$F v_{i-1} = [i]\, v_i, \tag{A.7}$$
whence
$$E v_i = \epsilon [n - i + 1]\, v_{i-1}, \quad K v_i = \epsilon q^{n-2i} v_i. \tag{A.8}$$
We use the standard notation
$$[i] = \frac{q^{i} - q^{-i}}{q - q^{-1}}, \quad \begin{bmatrix} n \\ i \end{bmatrix} = \frac{[n]!}{[i]!\,[n-i]!}, \quad [i]! = [1][2]\cdots[i]. \tag{A.9}$$
There is an invariant scalar product
$$(v_i, v_j) = \delta_{ij}\, q^{-i(n-i-1)} \begin{bmatrix} n \\ i \end{bmatrix}. \tag{A.10}$$
In the $V_{\epsilon,n}$ modules with $n = 1$, in particular, the $s\ell(2)_q$ action on the basis vectors $v_0$ and $v_1$ is given by
$$E v_0 = 0, \quad K v_0 = \epsilon q\, v_0, \quad F v_0 = v_1, \tag{A.11}$$
$$E v_1 = \epsilon v_0, \quad K v_1 = \epsilon q^{-1} v_1, \quad F v_1 = 0. \tag{A.12}$$
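As a sanity check on (A.6)–(A.8), the following sketch builds the matrices of E, F and K on the basis $v_0, \dots, v_n$ and tests the relations (A.1) in exact rational arithmetic. The helper names (`qint`, `module_matrices`, `relations_hold`) and the sample values of q are illustrative assumptions, not part of the paper.

```python
from fractions import Fraction


def qint(i, q):
    """Quantum integer [i] = (q^i - q^-i) / (q - q^-1), as in (A.9)."""
    return (q ** i - q ** (-i)) / (q - 1 / q)


def module_matrices(n, q, eps):
    """Matrices of E, F, K on v_0..v_n; column j holds the image of v_j."""
    d = n + 1
    zero = Fraction(0)
    E = [[zero] * d for _ in range(d)]
    F = [[zero] * d for _ in range(d)]
    K = [[zero] * d for _ in range(d)]
    for i in range(d):
        K[i][i] = eps * q ** (n - 2 * i)            # K v_i = eps q^(n-2i) v_i
        if i >= 1:
            E[i - 1][i] = eps * qint(n - i + 1, q)  # E v_i = eps [n-i+1] v_(i-1)
            F[i][i - 1] = qint(i, q)                # F v_(i-1) = [i] v_i
    return E, F, K


def matmul(A, B):
    d = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(d)) for j in range(d)]
            for i in range(d)]


def relations_hold(n, q, eps):
    """True if K E K^-1 = q^2 E, K F K^-1 = q^-2 F and [E,F] = (K - K^-1)/(q - q^-1)."""
    E, F, K = module_matrices(n, q, eps)
    d = n + 1
    Kinv = [[(1 / K[i][i] if i == j else Fraction(0)) for j in range(d)]
            for i in range(d)]
    KEK = matmul(matmul(K, E), Kinv)
    KFK = matmul(matmul(K, F), Kinv)
    EF, FE = matmul(E, F), matmul(F, E)
    ok = True
    for i in range(d):
        for j in range(d):
            ok = ok and KEK[i][j] == q ** 2 * E[i][j]
            ok = ok and KFK[i][j] == q ** (-2) * F[i][j]
            ok = ok and EF[i][j] - FE[i][j] == (K[i][j] - Kinv[i][j]) / (q - 1 / q)
    return ok
```

For example, `relations_hold(1, Fraction(2), 1)` tests the two-dimensional module of (A.11)–(A.12) at the arbitrarily chosen value q = 2.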

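In the dual module with the dual basis $v^i$, we then find from (A.5) and (A.2):
$$E v^0 = -q\, v^1, \quad K v^0 = \epsilon q^{-1} v^0, \quad F v^0 = 0, \tag{A.13}$$
$$E v^1 = 0, \quad K v^1 = \epsilon q\, v^1, \quad F v^1 = -\epsilon q^{-1} v^0. \tag{A.14}$$
Let $V'_{\epsilon',1}$ be a similar module over $s\ell(2)_{q^{-1}}$, with the basis $v'_0$ and $v'_1$. As is easy to check, it is also a module over $s\ell(2)_q$, the $s\ell(2)_q$ action being given by
$$E v'_0 = v'_1, \quad K v'_0 = \epsilon' q^{-1} v'_0, \quad F v'_0 = 0, \tag{A.15}$$
$$E v'_1 = 0, \quad K v'_1 = \epsilon' q\, v'_1, \quad F v'_1 = \epsilon' v'_0. \tag{A.16}$$
The tensor product of $s\ell(2)_q$ modules $V'_{\epsilon',1} \otimes V_{\epsilon,1}$ is decomposed as
$$V'_{\epsilon',1} \otimes V_{\epsilon,1} = V_{\epsilon\epsilon',0} \oplus V_{\epsilon\epsilon',2}, \tag{A.17}$$
where $V_{\epsilon\epsilon',2}$ is generated from $v'_1 \otimes v_0$, and $V_{\epsilon\epsilon',0}$ from $v'_0 \otimes v_0 - q\, v'_1 \otimes v_1$. The projection $V'_{\epsilon',1} \otimes V_{\epsilon,1} \to V_{\epsilon\epsilon',0}$ can be defined in terms of the trace $\langle \cdot, \cdot \rangle$ such that
$$\langle v'_0, v_0 \rangle = 1, \quad \langle v'_1, v_1 \rangle = -q, \quad \langle v'_0, v_1 \rangle = 0, \quad \langle v'_1, v_0 \rangle = 0. \tag{A.18}$$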

The vertex operator construction involves this trace operation in the case where " = 1 and " = −1: from the quantum group representation standpoint, each s(2|1) k (or D(2|1; α)k ) fermion is constructed as −qv1 ⊗ v1 + v0 ⊗ v0 = qv1 ⊗ F v0 − " F v1 ⊗ v0 .

(A.19)

With the basis of the quantum group module represented by the vertex operators as in Sect. 2, we identify qF in the first term with the screening operator S acting in the unprimed sector, and −" F in the second terms with S acting in the primed sector; we then denote the action of S or S on the respective vertex operator with a tilde. This gives (2.12)–(2.13), where we omit the tensor product sign and write the primed and the unprimed multipliers in the reversed order compared to the formula with the tensor-product notation; we hope that this minor notational discrepancy does not lead to confusion.

Fig. 2. The sℓ(2|1) root diagram (roots labelled $E^1$, $E^2$, $E^{12}$, $F^1$, $F^2$, $F^{12}$)

Appendix B. sℓ(2|1) Algebra, Spectral Flow, and Charged Singular Vectors

The affine Lie superalgebra sℓ(2|1) consists of four bosonic currents $E^{12}$, $H^-$, $F^{12}$, and $H^+$ and four fermionic ones, $E^1$, $E^2$, $F^1$, and $F^2$. The sℓ(2) subalgebra is generated by $E^{12}$, $H^-$, and $F^{12}$, and it commutes with the u(1) subalgebra generated by $H^+$. For reference, we give in Fig. 2 the two-dimensional root diagram of the finite dimensional Lie superalgebra sℓ(2|1), represented in Minkowski space with the fermionic roots along the light cone directions. We sometimes refer to the eigenvalue corresponding to a given eigenstate of the Cartan generator $H_0^-$ (resp. $H_0^+$) as the isospin (resp. the hypercharge) of that state. At level $k$, the nonvanishing commutation relations are given by
$$\begin{aligned}
{}[H^-_m, E^{12}_n] &= E^{12}_{m+n}, & [H^-_m, F^{12}_n] &= -F^{12}_{m+n},\\
[E^{12}_m, F^{12}_n] &= m\,\delta_{m+n,0}\,k + 2H^-_{m+n}, & [H^\pm_m, H^\pm_n] &= \mp\tfrac{1}{2}\, m\,\delta_{m+n,0}\,k,\\
[F^{12}_m, E^{2}_n] &= F^{1}_{m+n}, & [E^{12}_m, F^{2}_n] &= -E^{1}_{m+n},\\
[F^{12}_m, E^{1}_n] &= -F^{2}_{m+n}, & [E^{12}_m, F^{1}_n] &= E^{2}_{m+n},\\
[H^\pm_m, E^{1}_n] &= \tfrac{1}{2} E^{1}_{m+n}, & [H^\pm_m, F^{1}_n] &= -\tfrac{1}{2} F^{1}_{m+n},\\
[H^\pm_m, E^{2}_n] &= \mp\tfrac{1}{2} E^{2}_{m+n}, & [H^\pm_m, F^{2}_n] &= \pm\tfrac{1}{2} F^{2}_{m+n},\\
[E^{1}_m, F^{1}_n]_+ &= -m\,\delta_{m+n,0}\,k + H^+_{m+n} - H^-_{m+n}, & [E^{2}_m, F^{2}_n]_+ &= m\,\delta_{m+n,0}\,k + H^+_{m+n} + H^-_{m+n},\\
[E^{1}_m, E^{2}_n]_+ &= E^{12}_{m+n}, & [F^{1}_m, F^{2}_n]_+ &= F^{12}_{m+n}.
\end{aligned} \tag{B.1}$$

One of the sℓ(2|1)$_k$ spectral flows is given by
$$U_\theta: \quad E^1_n \mapsto E^1_{n-\theta}, \quad E^2_n \mapsto E^2_{n+\theta}, \quad F^1_n \mapsto F^1_{n+\theta}, \quad F^2_n \mapsto F^2_{n-\theta}, \quad H^+_n \mapsto H^+_n + k\theta\,\delta_{n,0} \tag{B.2}$$
(with the sℓ(2) subalgebra remaining invariant). For $\theta \in \mathbb{Z}$, this is an automorphism of sℓ(2|1). Applying the spectral flow to modules gives twisted modules. A twisted module with a vacuum vector is, thus, generated from the state $|h_-, h_+, k; \theta\rangle$ (which we call the twisted highest-weight vector) satisfying the twisted highest-weight conditions
$$E^1_{-\theta}\,|h_-, h_+, k; \theta\rangle = 0, \quad E^2_{\theta}\,|h_-, h_+, k; \theta\rangle = 0, \quad F^{12}_{1}\,|h_-, h_+, k; \theta\rangle = 0, \tag{B.3}$$
and whose quantum numbers of hypercharge and isospin are given by
$$H^+_0\,|h_-, h_+, k; \theta\rangle = (h_+ - k\theta)\,|h_-, h_+, k; \theta\rangle, \quad H^-_0\,|h_-, h_+, k; \theta\rangle = h_-\,|h_-, h_+, k; \theta\rangle, \tag{B.4}$$


where $k$ is the level and $\theta$ is the twist. The eigenvalue of $H^+_0$ is parametrised as $h_+ - k\theta$ so as to have the same value of $h_+$ for all the modules differing from each other by a spectral flow transform. We assume $\theta \in \mathbb{Z}$ in most of our formulas, with the necessary modifications for $\theta \in \mathbb{Z} + \frac{1}{2}$ to be done in accordance with the spectral flow transform. The dimension of $|h_-, h_+, k; \theta\rangle$ with respect to the Sugawara energy-momentum tensor
$$T_{\mathrm{Sug}} = \frac{1}{k+1}\left(H^- H^- - H^+ H^+ + E^{12} F^{12} + E^1 F^1 - E^2 F^2\right) \tag{B.5}$$
is given by
$$\Delta_{h_-,h_+,k;\theta} = \frac{h_-^2 - h_+^2}{k+1} + 2\theta h_+ - k\theta^2. \tag{B.6}$$
The character of a twisted module $N_{;\theta}$ is expressed through the “untwisted” character $\chi^N$ as
$$\chi^N_{;\theta}(q, z, \zeta) = \zeta^{-k\theta}\, q^{-k\theta^2}\, \chi^N(q, z, \zeta q^{2\theta}). \tag{B.7}$$
The twisted Verma module $P_{h_-,h_+,k;\theta}$ is freely generated from $|h_-, h_+, k; \theta\rangle$ by $E^1_{\le -\theta-1}$, $E^2_{\le \theta-1}$, $F^1_{\le \theta}$, $F^2_{\le -\theta}$, $E^{12}_{\le -1}$, $F^{12}_{\le 0}$, $H^+_{\le -1}$, and $H^-_{\le -1}$. For an integral $\theta$, the character of $P_{h_-,h_+,k;\theta}$ is
$$\mathrm{Tr}\; z^{H^-_0} \zeta^{H^+_0} q^{L^{\mathrm{Sug}}_0} = z^{h_-}\, \zeta^{h_+ - (k+1)\theta}\, q^{\frac{h_-^2 - h_+^2}{k+1} + 2\theta h_+ - (k+1)\theta^2} \cdot \frac{\vartheta_{1,0}(q, z^{\frac12}\zeta^{\frac12})\, \vartheta_{1,0}(q, z^{\frac12}\zeta^{-\frac12})}{\vartheta_{1,1}(q, z)\, \prod_{m\ge 1}(1 - q^m)^3}, \tag{B.8}$$
where the Jacobi theta functions are defined by
$$\vartheta_{1,1}(q, z) = \sum_{m\in\mathbb{Z}} (-1)^m q^{\frac12(m^2 - m)} z^{-m} = \prod_{m\ge 0}(1 - z^{-1} q^m) \prod_{m\ge 1}(1 - z q^m) \prod_{m\ge 1}(1 - q^m), \tag{B.9}$$
$$\vartheta_{1,0}(q, z) = \sum_{m\in\mathbb{Z}} q^{\frac12(m^2 - m)} z^{-m} = \prod_{m\ge 0}(1 + z^{-1} q^m) \prod_{m\ge 1}(1 + z q^m) \prod_{m\ge 1}(1 - q^m). \tag{B.10}$$
Among singular vectors that can exist in $P_{h_-,h_+,k;\theta}$, we note the so-called charged singular vectors. They occur whenever
$$h_+ = \pm h_- - (k+1)n, \quad n \in \mathbb{Z}, \tag{B.11}$$
and are given by an explicit construction as follows [34]. For $h_+ - h_- = -(k+1)n$, $n \in \mathbb{Z}$, the charged singular vector in the twisted Verma module $P_{h_-,h_+,k;\theta}$ reads
$$|C^{(-)}(n, h_-, k; \theta)\rangle = \begin{cases} \underbrace{E^2_{\theta+n} \cdots E^2_{\theta-1}}_{-n} \cdot \underbrace{F^1_{\theta+n} \cdots F^1_{\theta}}_{-n+1}\, |h_-, h_- - n(k+1), k; \theta\rangle, & n \le 0,\\[2mm] \underbrace{E^1_{-\theta-n} \cdots E^1_{-\theta-1}}_{n} \cdot \underbrace{F^2_{1-\theta-n} \cdots F^2_{-\theta}}_{n}\, |h_-, h_- - n(k+1), k; \theta\rangle, & n \ge 1. \end{cases} \tag{B.12}$$
As is easy to check, this vector satisfies the highest-weight conditions (we omit the singular vector itself)
$$E^1_{-\theta-n} \approx 0, \quad E^2_{\theta+n} \approx 0, \quad F^{12}_{1} \approx 0 \tag{B.13}$$
together with $F^1_{\theta+n} \approx 0$. Similarly, whenever $h_+ + h_- = -n(k+1)$, $n \in \mathbb{Z}$, the charged singular vector in $P_{h_-,h_+,k;\theta}$ is
$$|C^{(+)}(n, h_-, k; \theta)\rangle = \begin{cases} \underbrace{E^2_{\theta+n} \cdots E^2_{\theta-1}}_{-n} \cdot \underbrace{F^1_{\theta+n+1} \cdots F^1_{\theta}}_{-n}\, |h_-, -h_- - n(k+1), k; \theta\rangle, & n \le -1,\\[2mm] \underbrace{E^1_{-\theta-n} \cdots E^1_{-\theta-1}}_{n} \cdot \underbrace{F^2_{-\theta-n} \cdots F^2_{-\theta}}_{n+1}\, |h_-, -h_- - n(k+1), k; \theta\rangle, & n \ge 0. \end{cases} \tag{B.14}$$
This satisfies the highest-weight conditions (B.13) supplemented by $F^2_{-\theta-n} \approx 0$. It is straightforward to check that the highest-weight conditions satisfied by $|C^{(\pm)}(n, h_-, k; \theta)\rangle$ imply that this vector generates a submodule in the respective $P_{h_-,h_+,k;\theta}$ module.
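The equality of the sum and product forms of the theta functions in (B.9)–(B.10) is the Jacobi triple product identity. The sketch below compares truncated versions of both sides numerically; the function names, the truncation orders and the sample point (q, z) are illustrative assumptions, and `sign = -1` corresponds to $\vartheta_{1,1}$ while `sign = +1` corresponds to $\vartheta_{1,0}$.

```python
def theta_sum(q, z, sign, terms=30):
    """Truncated sum over m in [-terms, terms] of sign^m q^(m(m-1)/2) z^(-m)."""
    total = 0.0
    for m in range(-terms, terms + 1):
        parity = sign if m % 2 else 1          # sign**m for sign = +1 or -1
        total += parity * q ** (m * (m - 1) // 2) * z ** (-m)
    return total


def theta_product(q, z, sign, terms=200):
    """Truncated product form: factors (1 + sign z^-1 q^m), (1 + sign z q^m), (1 - q^m)."""
    prod = 1.0 + sign / z                      # the m = 0 factor
    for m in range(1, terms + 1):
        qm = q ** m
        prod *= (1 + sign * qm / z) * (1 + sign * z * qm) * (1 - qm)
    return prod
```

Both truncations converge quickly for |q| < 1, so comparing `theta_sum(0.3, 1.7, s)` with `theta_product(0.3, 1.7, s)` for s = ±1 gives a quick check that the signs and exponents in (B.9)–(B.10) have been read correctly.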

Appendix C. D(2|1; α)

The one-parameter family of exceptional Lie superalgebras $D(2|1; \alpha)$, with $\alpha \in \mathbb{C} \setminus \{0, -1, \infty\}$, consists of basic type II classical simple complex Lie superalgebras in the Kac classification [24, 11]. At each fixed value of $\alpha$, $D(2|1; \alpha)$ is a rank 3 superalgebra with six even roots and eight odd roots. Its dual Coxeter number $h^\vee$ is zero and its superdimension, which is the number of bosonic generators minus the number of fermionic generators, is sdim = 9 − 8 = 1. The central charge of the Virasoro algebra satisfied by the Sugawara energy-momentum tensor of the affine algebra is therefore 1 for any value of the level $\kappa$, since
$$c = \frac{\kappa\, \mathrm{sdim}}{\kappa + h^\vee} = 1. \tag{C.1}$$
The bosonic part of $D(2|1; \alpha)$ is sℓ(2) ⊕ sℓ(2) ⊕ sℓ(2), and the action of $D(2|1; \alpha)_0$ on $D(2|1; \alpha)_1$ is the product of 2-dimensional representations. The root diagram (see Fig. 3) can be visualised in a parallelepiped in 3d space with metric $g_{ij} = \mathrm{diag}(-1, -1, 1)$. All odd roots are at the vertices of the parallelepiped on the light cone. Six even roots lie on the three lines through the centre of the faces. The Weyl group does not act transitively on the set of simple roots, and there are six choices of simple root systems. We describe in more detail the system of simple roots with one odd root, $\alpha_1$, and two even roots, $\alpha_2$ and $\alpha_3$. The three regular sℓ(2) subalgebras are in the directions of $\alpha_2$, $\alpha_3$ and $\alpha_\theta = 2\alpha_1 + \alpha_2 + \alpha_3$. The relevant scalar products are summarised as
$$\alpha_1^2 = 0, \quad \alpha_2^2 = -2\gamma, \quad \alpha_3^2 = -2(1-\gamma), \quad \alpha_\theta^2 = 2, \tag{C.2}$$
$$\alpha_1 \cdot \alpha_2 = \gamma, \quad \alpha_1 \cdot \alpha_3 = 1 - \gamma, \quad \alpha_2 \cdot \alpha_3 = 0, \tag{C.3}$$
$$\alpha_3 \cdot \alpha_\theta = 0, \quad \alpha_2 \cdot \alpha_\theta = 0, \quad \alpha_1 \cdot \alpha_\theta = 1, \tag{C.4}$$

Fig. 3. The D(2|1; α) root diagram (roots labelled $\alpha_\theta$, $\alpha_1$, $\alpha_2$, $\alpha_3$)

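where
$$\gamma = \frac{\alpha}{1+\alpha}, \quad \gamma \in \mathbb{C} \setminus \{0, 1, \infty\}. \tag{C.5}$$
With the metric above, we have
$$\alpha_1 = \tfrac{1}{\sqrt{2}}\left(-\sqrt{\gamma},\, -\sqrt{1-\gamma},\, 1\right), \quad \alpha_2 = \left(\sqrt{2\gamma},\, 0,\, 0\right), \quad \alpha_3 = \left(0,\, \sqrt{2(1-\gamma)},\, 0\right). \tag{C.6}$$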
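The explicit root vectors (C.6) can be checked against the scalar products (C.2)–(C.4) numerically. The sketch below does this for a real value of γ in (0, 1), which is an assumption made only so that the square roots are real; the function names are illustrative.

```python
import math


def roots(gamma):
    """Root vectors of (C.6) for real 0 < gamma < 1, plus alpha_theta = 2 a1 + a2 + a3."""
    a1 = (-math.sqrt(gamma) / math.sqrt(2),
          -math.sqrt(1 - gamma) / math.sqrt(2),
          1 / math.sqrt(2))
    a2 = (math.sqrt(2 * gamma), 0.0, 0.0)
    a3 = (0.0, math.sqrt(2 * (1 - gamma)), 0.0)
    a_theta = tuple(2 * x + y + z for x, y, z in zip(a1, a2, a3))
    return a1, a2, a3, a_theta


def dot(u, v):
    """Scalar product with the metric diag(-1, -1, 1)."""
    metric = (-1.0, -1.0, 1.0)
    return sum(g * a * b for g, a, b in zip(metric, u, v))


def expected_products(gamma):
    """The values listed in (C.2)-(C.4), keyed by pairs of root labels."""
    return {
        ("1", "1"): 0.0, ("2", "2"): -2 * gamma, ("3", "3"): -2 * (1 - gamma),
        ("t", "t"): 2.0, ("1", "2"): gamma, ("1", "3"): 1 - gamma,
        ("2", "3"): 0.0, ("3", "t"): 0.0, ("2", "t"): 0.0, ("1", "t"): 1.0,
    }
```

Comparing `dot` on the vectors returned by `roots(gamma)` with `expected_products(gamma)` confirms, in particular, that the odd root $\alpha_1$ is null and that $\alpha_\theta$ is orthogonal to $\alpha_2$ and $\alpha_3$.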

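The mapping $t_1 : \gamma \to 1 - \gamma$ interchanges the rôles of $\alpha_2$ and $\alpha_3$, while the mapping $t_2 : \gamma \to \gamma^{-1}$ interchanges those of $\alpha_2$ and $\alpha_\theta$. These two transformations generate an order-6 group defined by the relations $t_1^2 = t_2^2 = 1$, $t_1 t_2 t_1 = t_2 t_1 t_2$, and one has the isomorphisms
$$D(2|1; \tfrac{\gamma}{1-\gamma}) \cong D(2|1; \tfrac{1-\gamma}{\gamma}) \cong D(2|1; \tfrac{1}{\gamma-1}) \cong D(2|1; -\tfrac{1}{\gamma}) \cong D(2|1; \gamma - 1) \cong D(2|1; -\gamma). \tag{C.7}$$
If one restricts the parameter $\gamma$ to real values, it is sufficient to consider the domain $\gamma \in [1/2, 1[$, for which $\alpha_\theta$ is always the longest root. As far as the affine superalgebra is concerned, the isomorphisms are
$$D(2|1; \tfrac{\gamma}{1-\gamma})_k \cong D(2|1; \tfrac{1-\gamma}{\gamma})_k \cong D(2|1; \tfrac{1}{\gamma-1})_{-\frac{k}{\gamma}} \cong D(2|1; \gamma - 1)_{-\frac{k}{\gamma}} \cong D(2|1; -\tfrac{1}{\gamma})_{-\frac{k}{1-\gamma}} \cong D(2|1; -\gamma)_{-\frac{k}{1-\gamma}}. \tag{C.8}$$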
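The six parameter values in (C.7) can be recovered by letting $t_1$ and $t_2$ act on γ and converting back through $\alpha = \gamma/(1-\gamma)$, the inverse of (C.5). The following sketch does this in exact rational arithmetic; the sample value of γ and the helper names are illustrative assumptions.

```python
from fractions import Fraction


def t1(g):
    return 1 - g


def t2(g):
    return 1 / g


def orbit(g):
    """All values reachable from g by repeatedly applying t1 and t2."""
    seen = {g}
    frontier = [g]
    while frontier:
        x = frontier.pop()
        for t in (t1, t2):
            y = t(x)
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen


def alpha(g):
    """Invert (C.5): alpha = gamma / (1 - gamma)."""
    return g / (1 - g)


def alphas_in_C7(g):
    """The six parameter values listed in (C.7)."""
    return {g / (1 - g), (1 - g) / g, 1 / (g - 1), -1 / g, g - 1, -g}
```

For a generic γ the orbit has six elements, in line with the order-6 group generated by $t_1$ and $t_2$, and the set `{alpha(x) for x in orbit(g)}` coincides with `alphas_in_C7(g)`.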


Appendix D. sℓ(2|1) OPE’s

We use a number of standard integrals, assuming whenever necessary that they are analytically continued from the domain where they are well-defined. For Re z > Re w, we have
$$\int_w^z dx\, (z - x)^\alpha (x - w)^\beta = (z - w)^{\alpha+\beta+1}\, \frac{\Gamma(\alpha+1)\,\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)} \tag{D.1}$$
and also
$$\int_z^\infty dx\, (x - z)^\alpha (x - w)^\beta = (z - w)^{\alpha+\beta+1}\, \frac{\Gamma(\alpha+1)\,\Gamma(-1-\alpha-\beta)}{\Gamma(-\beta)} = -(z - w)^{\alpha+\beta+1}\, \frac{\sin \pi\beta}{\sin \pi(\alpha+\beta)}\, \frac{\Gamma(\alpha+1)\,\Gamma(\beta+1)}{\Gamma(\alpha+\beta+2)}. \tag{D.2}$$
In evaluating the operator product $E^1(z) \cdot E^2(w)$, we have the following operators √1

ϕ

in (2.30): V1 = e 2p , V2 = e above integrals, we obtain

∞

z

du S(u) V1 (z) V2 (w)

= β(w)

sin sin

π p 2π p

(z − w)

√1 ϕ 2p

∞ z

1 2 2p +1− p

, V1 = e

√1 ϕ 2p

, and V2 = γ e

√1 ϕ 2p

. Using the

dx S (x) V1 (z) V2 (w) U(1 − p1 )U(1 − p1 ) U(2 − p2 )

· (z − w)

1 −2 2p p

U( p1 )U(2 − p2 ) U(2 − p1 )

= (z − w)− 2 β(w) 1

πp

sin

(D.3)

2π p

plus lower-order terms. This is to be multiplied with the operator product √1

f (z) − √1 f (w)

e 2 = (z − w)− 2 + . . . , which restores the first-order pole. We now e 2 recall the normalisation factors (−1) · πpi cos πp from (2.12) and the factor −2i sin πp from (2.30). This gives the operator product 1

E 12 (w) −β(w) = , z−w z−w which is in agreement with (B.1). The remaining terms in (2.30) cancel each other. Evaluating the operator product E 1 (z) · F 1 (w), we have V1 = e V1 = e

√1 ϕ 2p

, and V2 = γ e

√1 ϕ 2p

√1 ϕ 2p

, V2 = γ e

√1 ϕ 2p

. Then

∞

du S(u) V1 (z)V2 (w)

∞ 1 1 = (z − w) 2p du − u−w + p2 ∂ϕ(w) + βγ (w) − z

z

√1 2p

z−w ∂ϕ(w) u−w + ...

(u − z)

− p1

(u − w)

− p1

,


(where the dots denote higher-order terms). This equals '' & U(1 − p1 )U(−1 + p2 ) & p = (z − w) . p − 2 + (z − w) 2 ∂ϕ(w) + βγ (w) U( p1 ) (D.4) (∞ Next, the primed sector integral z du S (u) V1 (z)V2 (w) is obtained from the last expression by simply replacing p → p , ϕ → ϕ , β → β , and γ → γ . Multiplying the primed and the unprimed contributions, we thus obtain

∞

∞ du S(u) V1 (z) V2 (w) dx S (x) V1 (z) V2 (w) z z p ' 1 2 & U(1 − )U(−1 + ) 1 p−2 2 ∂ϕ(w) + βγ (w) p p 2 = (z − w) + (z − w)2 z−w U( p1 ) ' p U( p1 )U(1 − p2 ) & p − 2 2 ∂ϕ (w) + β γ (w) × + z−w (z − w)2 U(1 − p1 ) & ( p ∂ϕ + βγ ) − p( p ∂ϕ + β γ ) ' p 1 p (p − 2) π 2 2 = (z − w) 2 . + 2π 2 (z − w) z − w sin p 2 1 2p − p

√1 f (z) 2

This is further multiplied with e

e

− √1 f (w) 2

= (z−w)− 2 (1+ √1 (z−w)∂f +. . . ); in 1

2

addition, we recall the normalisations (−1)· πpi cos πp in (2.12) and the factor −2i sin from the first term in (2.30). Thus, the first term in (2.30) gives the contribution p p + β γ ) − ( p ∂ϕ + βγ ) − (p−2) √ ∂f ( ∂ϕ −(p − 2) 2 2 p 2 E 1 (z) F 1 (w) = + 2 (z − w) z−w −k H+ − H− = + . 2 (z − w) z−w

π p

The remaining terms in (2.30) cancel each other and, therefore, the operator product E 1 (z) F 1 (w) given by the last formula is in agreement with the respective commutator in (B.1). We will need the first regular term in the above expansion when we calculate the s(2|1) energy-momentum tensor in Lemma 2.2, namely, −p E 1 F 1 = √p βγ ∂f − √p β γ ∂f − p2 β γ ∂ϕ + p2 βγ ∂ϕ 2 2 √ √ p (p−2) p 2 √ √ − p(p−1) β γ ∂ϕ − βγ ∂ϕ + ∂ ϕ (D.5) 2 +

p+2 4(p−1) ∂ϕ∂ϕ

+

p−2 2

+

p 2 2∂ ϕ

2(1−p) 2 2(p−1) (p−2) p−2 2 √ p∂ f − √ ∂ϕ∂ϕ + 41 (2 − 3p)∂ϕ ∂ϕ 2 p−1 2 2

+

p √ 2 p ∂ϕ∂f

−

p 2

p ∂ϕ ∂f +

p 4

(p − 2)∂f ∂f + X,

where X is a contribution that cancels against similar terms coming from E 2 F 2 .


As regards E 2 (z) · F 2 (w), we again use (2.30), where now V1 = e √1 ϕ

√1 ϕ 2p

√1 ϕ 2p

√1 ϕ

, V2 =

γe ,(V1 = γ e 2p , and V2 = e 2p . We already know from (D.4) the unprimed ∞ integral z du S(u) V1 (z) V2 (w); the primed sector contributes is evaluated similarly. We only quote the first regular term −p E 2 F 2 = −pβ ∂γ −p ∂βγ + p2 βγ ∂ϕ − p2 β γ ∂ϕ + √p βγ ∂f 2 √ p p p(p−1) √ + √ β γ ∂f − √ βγ ∂ϕ − β γ ∂ϕ + (p−2)p ∂ 2f 2 2 2(1−p) 2 2 √ − p2 p2 ∂ 2 ϕ − p2 p2 ∂ 2 ϕ − p4 (p − 2)∂f ∂f + p2 p ∂ϕ∂f p−2 ∂ϕ∂ϕ + p2 p ∂ϕ ∂f − 4(p−1) +

2−p √ ∂ϕ∂ϕ 2 p−1

+ 41 (2 − p)∂ϕ ∂ϕ + X,

(D.6)

where X is a contribution that cancels between E 1 F 1 and E 2 F 2 . √1 ϕ The operator product F 1 (z) · F 2 (w) is evaluated similarly; we have V1 = γ e 2p , √1

√1 ϕ

ϕ

√1 ϕ

V2 = γ e 2p , V1 = γ e 2p , and V2 = e 2p . One then multiplies the contributions of the primed and unprimed sectors, given by

∞ z

and

z

∞

du S

(u) V1 (z) V2 (w)

= −(z − w)

1 −2 2p p

U(− p1 )U( p2 ) U( p1 )

,

∞ γ (w)

γ (z) − du S(u) V1 (z) V2 (w) = du βγ 2 − u−w u−z z

−1 −1 · 1 − p2 (u − w) ∂ϕ + √12p (z − w) ∂ϕ (u − z) p (u − w) p 1

= (z − w) 2p

− p2 +1

U(1 − p1 )U(−1 + p1 ) U( p1 )

(βγ 2 +

2p γ ∂ϕ + (p − 2)∂γ ),

where we see the J − current from (2.22). References 1. Aharony, G., Ganor, O., Sonnenschein, J., Yankielowicz, S., Sochen, N.: G/G models and WN strings. Phys. Lett. B289, 309–316 (1992) CurrentAlgebra. Commun. 2. Bernard, D., Felder, G.: Fock Representations and BRST Cohomology in s(2) Math. Phys. 127, 145–168 (1990) 3. Bershadsky, M., Lerche, W., Nemeschansky, D. and Warner, N.P.: Extended N = 2 Superconformal Structure of Gravity and W Gravity Coupled to Matter. Nucl. Phys. B401, 304–347 (1993) 4. Bershadsky, M. and Ooguri, H.: Hidden Osp(N, 2) Symmetries in Superconformal Field Theories. Phys. Lett. B 229, 374 (1989) 5. Bouwknegt, P., McCarthy, J., Pilch, K.: Quantum Group Structure in the Fock Space Resolutions of s(n) Representations. Commun. Math. Phys. 131, 125–155 (1990) 6. Bouwknegt, P., McCarthy, J., Pilch, K.: Free-Field Approach to 2-Dimensional Conformal Field Theories. Prog. Theor. Phys. Suppl. 102, 67–135 (1990)


7. Bowcock, P., Hayes, M.R., Taormina, A.: Characters of admissible representations of the affine superal gebra s(2|1). Nucl. Phys. B 510, 739–763 (1998) 8. Bowcock, P., Hayes, M.R., Taormina, A.: Parafermionic representation of the affine s(2|1) at fractional level. hep-th 9803024 9. Bowcock, P., Koktava, R-L., Taormina, A.: Wakimoto modules for the affine superalgebra s(2|1) and noncritical N = 2 strings. Phys. Lett. 388, 303–308 (1996) 10. Bowcock, P., Taormina, A.: Representation theory of the affine Lie superalgebra s(2|1) at fractional level. Commun. Math. Phys. 185, 467–493 (1997) 11. Cornwell, J.F.: Group Theory in Physics. Vol.3, London–New York: Academic Press, 1989 12. David, F.: Conformal field theories coupled to 2D-gravity in the conformal gauge. Mod. Phys. Lett. A 3, 1651–1656 (1988) 13. Distler, J., Kawai, H.: Conformal field theory and 2D-quantum gravity. Nucl. Phys. B 321, 509–527 (1989) 14. Fan, J-B., Yu, M.: Modules over affine Lie superalgebras. hep-th 9304122 15. Fan, J-B., Yu, M.: G/G gauged supergroup valued WZNW field theory. hep-th 9304123 16. Feigin, B.L., Semikhatov, A.M., Sirota, V.A., and Tipunin, I.Yu.: Resolutions and Characters of Irreducible Representations of the N = 2 Superconformal Algebras. Nucl. Phys. B 536 617–656 (1999) 17. Feigin, B.L., Semikhatov, A.M., and Tipunin, I.Yu.: Equivalence between Chain Categories of Representations of Affine s(2) and N = 2 Superconformal Algebras. J. Math. Phys. 39, 3865–3905 (1998) 18. Goddard, P., Olive, D.: Kac–Moody and Virasoro algebras in relation to quantum physics. Int. J. of Mod. Phys. A1, 303–414 (1986) 19. Gomez, C., Sierra, S.: The Quantum Symmetry of Rational Conformal Field Theories. Nucl. Phys. B 352, 791–828 (1991) 20. Hayes, M.R., Taormina, A.: Admissible s(2|1) characters and parafermions. Nucl. Phys. B 529, 588–610 (1998) 21. Hu, H.L.,Yu, M.: On the equivalence of non-critical strings and G/G topological field theories. Phys. Lett. B 289, 302–308 (1992) 22. 
Ito, K., and Kanno, H.: Hamiltonian Reduction and Topological Conformal Algebra in c ≤ 1 Non-Critical Strings. Mod. Phys. Lett. A 9, 1377 (1994) 23. Johnstone, G.B.: Modular Transformations and Invariants in the Context of Fractional Level s(2|1; C). In preparation 24. Kaˇc, V.G.: A sketch of Lie Superalgebra Theory. Commun. Math. Phys. 53, 31–64 (1977) 25. Kaˇc, V.G.: Infinite Dimensional Lie Algebras. Cambridge: Cambridge University Press, 1990 26. Kaˇc, V.G., Peterson, D.H.: Infinite-dimensional Lie algebras, theta functions and modular forms. Adv. in Math. 53, 125–264 (1984) 27. Kaˇc, V.G., Wakimoto M.: Modular invariant representations of infinite-dimensional Lie algebras and superalgebras. Proc. Nat. Acad. Sci. 85, 4956 (1988) 28. Kac, V.G., Wakimoto, M.: Integrable highest weight modules over affine superalgebras and number theory. hep-th 9407057 29. Kassel, C.: Quantum Groups. Berlin–Heidelberg–New York: Springer, 1995 30. Mathieu, P., Walton, M.A.: On principal admissible representations and conformal field theory. hep-th 9812192 31. Mukhi, S., Panda, S.: Fractional-Level Current Algebras and the Classification of Characters. Nucl. Phys. B 338, 263–282 (1990) 32. Pressley, A., Segal, G.: Loop Groups. Oxford: Oxford Mathematical Monographs, 1986 33. Ramirez, C., Ruegg, H., Ruiz Altaba, M.: Explicit Quantum Symmetries of WZNW Theories. Phys. Lett. B 247, 499–508 (1990) 34. Semikhatov, A.M.: Verma Modules, Extremal Vectors, and Singular Vectors on the Non-Critical N = 2 String Worldsheet. hep-th/9610084 35. Semikhatov, A.M.: The Non-Critical N = 2 String is an s(2|1) Theory. Nucl. Phys. B 478, 209 (1996) 36. Tsuchiya, A., Kanie, Y.: Vertex Operators in the Conformal Field Theory on P1 and Monodromy Representations of the Braid Group. Lett. Math. Phys. 13, 303–312 (1987) Communicated by R. H. Dijkgraaf

Commun. Math. Phys. 214, 547 – 563 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Monotonicity Properties of Optimal Transportation and the FKG and Related Inequalities Luis A. Caffarelli Department of Mathematics, The University of Texas at Austin, Austin, TX 78712-1082, USA. E-mail: [email protected] Received: 18 October 1999 / Accepted: 24 March 2000

Abstract: Optimal transportation between densities f(X), g(Y) can be interpreted as a joint probability distribution with marginals f(X) and g(Y). We prove monotonicity and concavity properties of the optimal transportation Y(X) under suitable assumptions on f and g. As an application we obtain the Fortuin, Kasteleyn, Ginibre correlation inequalities as well as some generalizations of the Brascamp–Lieb momentum inequalities.

0. Introduction

We start this introduction by giving some background on optimal transportation and the FKG inequalities.

0.1. The problem of optimal transportation. We are given two probability densities f(X), g(Y), and we want to transport the (variable X with) density f onto the (variable Y with) density g in a way that minimizes transportation costs, say for simplicity, C(Y − X). Let us first say what we mean by transporting f to g.

(Pre) Definition. A smooth map Y(X) transports f to g if
$$g(Y(X)) \det D_X Y = f(X).$$
That is, a small differential of volume g(Y) dY is pulled back to f(X) dX by the map Y(X).

Research was supported in part by the National Science Foundation, DMS-9714758.


A weak formulation is the following:

Definition 1. A (weak) transport is a measurable map Y(X) such that for any $C_0$ function h(Y) the following (“change of variable”) formula is valid:
$$\int h(Y)\, g(Y)\, dY = \int h(Y(X))\, f(X)\, dX.$$
Now, given the cost function C(X), we define

Optimal transportation. The (weak) transportation Y(X) is optimal if it minimizes
$$J(Y) = \int C(Y(X) - X)\, f(X)\, dX$$
among all weak transportations.

Existence and regularity of such an optimal transportation has been studied in detail. (See for instance [B, C2, C3] and [G-M].) We will discuss (and use) in this paper the particular case where $C(X - Y) = \frac{1}{2}|X - Y|^2$. The correlation inequalities part of the paper holds true for more general cost functions, still convex and with the appropriate symmetries, but the proofs are technically involved and we will present them elsewhere. The second derivative estimates for the Monge–Ampère-like equations corresponding to non-quadratic cost functions are a completely open matter. In the quadratic case, there is a rather complete existence and regularity theory ([B, C2, C3]). We will be interested in the following results.

Theorem 1 (Existence and stability, [B]). Let $\Omega_1$, $\Omega_2$ be two open domains in $\mathbb{R}^n$, f(X), g(Y) two strictly positive bounded, measurable functions in $\Omega_i$, with
$$\int_{\Omega_1} f(X)\, dX = \int_{\Omega_2} g(Y)\, dY = 1.$$
Then,
a) There exists a unique optimal transportation map Y(X).
b) The optimal transportation Y(X) (and its inverse X(Y)) are obtained from the following minimization process:
b1) Among all pairs of continuous functions ϕ(X), ψ(Y) satisfying the constraint $\varphi(X) + \psi(Y) \ge \langle X, Y \rangle$, minimize
$$J(\varphi, \psi) = \int_{\Omega_1} \varphi(X)\, f(X)\, dX + \int_{\Omega_2} \psi(Y)\, g(Y)\, dY.$$
b2) ϕ and ψ are unique and convex and Y(X) is defined as the (possibly multiple valued) map Y ∈ Y(X) if $\varphi(X) + \psi(Y) = \langle Y, X \rangle$.
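A discrete, one-dimensional analogue may help fix ideas; it is an illustration under simplifying assumptions, not the construction used in the paper. For equally weighted points on the line and the quadratic cost, the cheapest assignment is the monotone one (sorted points matched to sorted points), which is the discrete counterpart of Y being the derivative of a convex potential. The sketch compares the monotone matching with a brute-force search over all assignments; all names and sample data are hypothetical.

```python
from itertools import permutations


def quadratic_cost(xs, ys):
    """Total cost sum |y - x|^2 / 2 of sending xs[i] to ys[i]."""
    return sum((y - x) ** 2 for x, y in zip(xs, ys)) / 2


def brute_force_optimal(xs, ys):
    """Cheapest assignment found by trying every permutation of ys."""
    return min(permutations(ys), key=lambda p: quadratic_cost(xs, p))


def monotone_matching(xs, ys):
    """Send the i-th smallest x to the i-th smallest y."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ys_sorted = sorted(ys)
    image = [None] * len(xs)
    for rank, i in enumerate(order):
        image[i] = ys_sorted[rank]
    return tuple(image)
```

The brute-force search is factorial in the number of points and is only meant for small checks, whereas the monotone matching costs one sort.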


Theorem 2 (Regularity, [C2, C3]). Hypothesis as before, assume further that $\Omega_1$, $\Omega_2$ are convex. Then
a) If $0 < \lambda \le f, g \le \Lambda$, the map Y(X) and its inverse X(Y) are single valued, of class $C^\alpha$ in $\Omega_i$ for some α.
b) If f, g are Hölder continuous with exponent β for some β, then Y(X), X(Y) are of class $C^{1,\beta}$.
c) In both cases, a) and b), there exists a pair of convex potentials ϕ(X), ψ(Y) such that Y(X) = ∇ϕ(X), X(Y) = ∇ψ(Y).
d) ϕ satisfies the Monge–Ampère equation
$$\det D^2 \varphi(X) = \frac{f(X)}{g(\nabla\varphi(X))},$$
in case a) in the Alexandrov weak sense, in case b) in the classical sense. (Note that $\varphi \in C^{2,\beta}$.)

By approximation, we will develop all our discussion for f, g of class $C^\alpha$, so we will always talk of “classical” solutions. From the variational construction of Y, we also have a stability theorem.

Theorem 3 (Stability). Let $f_j$, $g_j$ be uniformly bounded, measurable and supported in a bounded domain $B_R$. Assume that $f_j \to f$ in $L^1$, $g_j \to g$ in $L^1$. Then $\varphi_j \to \varphi$, $\psi_j \to \psi$ uniformly in $B_R$. In particular, if $\varphi_j$, $\psi_j$ are uniformly $C^{1,\alpha}$, then $\nabla\varphi_j$, $\nabla\psi_j$ also converge uniformly to ∇ϕ, ∇ψ.

We complete the discussion with the following interpretation (see [B]). If we think of f(X), g(Y) as probability densities, we may think of the map Y(X) as a joint probability distribution $\nu_0(X, Y)$ in $\Omega_1 \times \Omega_2$, sitting on the graph (X, Y(X)), with the property that the marginals $\mu_1(X)$, $\mu_2(Y)$ of $\nu_0$ are exactly f(X) dX and g(Y) dY. In fact $\nu_0$ has the following minimizing property:

Theorem ([B]). Among all probability measures ν(X, Y) with marginals f(X) dX and g(Y) dY, Y(X) minimizes
$$E(\nu) = \int |X - Y|^2\, d\nu(X, Y).$$

0.2. The FKG inequalities. The FKG inequalities (see [FKG, H, P]) play a fundamental role in statistical mechanics. In this paper, we are interested in a theorem of Holley [H] from which the inequalities follow. Holley’s Theorem establishes a monotonicity condition for probability measures $\mu_1$, $\mu_2$ defined on a finite lattice Λ. Let us discuss briefly his two main theorems. We consider a finite lattice Λ, which we will think of as embedded in the set P of vertices of the unit cube of $\mathbb{R}^N$ for some N (i.e., the set of all N-tuples $X = (x_1, \dots, x_N)$ with $x_i = 0$ or 1). On Λ, we have two non-vanishing probability measures $\mu_1(X)$, $\mu_2(X)$ with the “monotonicity property”: Given X, Y in Λ,
$$\mu_2(X \vee Y)\, \mu_1(X \wedge Y) \ge \mu_2(X)\, \mu_1(Y).$$
(As usual ∨ denotes taking max in each entry, ∧ min.) Then


Theorem 4 ([H]). There exists a joint measure ν(X, Y) with marginals $\mu_1(X)$, $\mu_2(Y)$ such that
$$\nu(X, Y) \ne 0 \;\Rightarrow\; X \le Y.$$
As a corollary, he obtains

Corollary 1. If h is an increasing function of X, then
$$\int h(X)\, d\mu_1(X) \le \int h(X)\, d\mu_2(X)$$
(that is, $\mu_2$ is “concentrated more to the right” than $\mu_1$).

The purpose of this paper is to study the relation between optimal transportation and the FKG inequalities, in particular to show:
a) In the continuous case, the optimal transportation from the unit cube of $\mathbb{R}^n$ into itself ($\mu_1 = f(X)$, $\mu_2 = g(Y)$) has the proper monotonicity properties (Y(X) ≥ X) of Holley’s joint probability density, provided that f, g do.
b) If we “spread” the measures $\mu_i$ from the vertices of the unit cube to half cubes, the densities f, g so obtained satisfy these properties, recovering from this approach Holley’s theorem for the lattice formed by all vertices of the cube.
c) For a general sublattice, one can extend the “spread” measure to all of the half cubes, recovering in full the theorem of Holley.
d) In fact the discrete optimal transportation satisfies Y(X) ≥ X.

Our proof is based on the fact that first derivatives of solutions of the Monge–Ampère equation satisfy an equation themselves. But it is also known that second derivatives are subsolutions of an elliptic equation. In the last section we explore what the implications of that fact are in terms of correlation inequalities.

In closing this introduction we want to stress that in the continuous case the optimal transport map Y(X), interpreted as a joint probability measure
$$\nu(X, Y) = \delta_{X,Y(X)}(X, Y)\, f(X)\, dX = \delta_{X,Y(X)}(X, Y)\, g(Y)\, dY,$$
is not just a joint distribution but a “change of variables”, i.e., a one to one map that carries one density to the other, and it is further the gradient of a convex potential, giving the map (or the measure ν(X, Y)) a lot of stability.

1. Optimal Transportation from the Unit Cube to the Unit Cube and Periodic Monge–Ampère

We start this section with a reflection property of optimal transportation maps. Given $X \in \mathbb{R}^n$ we denote by $\bar{X}$ its reflection with respect to $x_1$, i.e., if $X = (x_1, x_2, \dots, x_n)$ then $\bar{X} = (-x_1, x_2, \dots, x_n)$.

Lemma 1. Assume that
a) $\Omega_1$, $\Omega_2$ are symmetric with respect to $x_1$, i.e., $X \in \Omega_i \Leftrightarrow \bar{X} \in \Omega_i$,
b) f, g are also symmetric, i.e., $f(X) = f(\bar{X})$, $g(X) = g(\bar{X})$.
Then the optimal transportation is also symmetric, i.e.,
a) $\varphi(X) = \varphi(\bar{X})$, $\psi(Y) = \psi(\bar{Y})$,
b) $Y(\bar{X}) = \bar{Y}(X)$.

Proof. By Brenier [B], ϕ, ψ are the unique minimizing pair of
$$\int \varphi(X)\, f(X)\, dX + \int \psi(Y)\, g(Y)\, dY$$
under the constraint $\varphi(X) + \psi(Y) \ge \langle X, Y \rangle$. By uniqueness, then,
$$\varphi(X) = \varphi(\bar{X}), \quad \psi(Y) = \psi(\bar{Y}),$$
since $\varphi(\bar{X})$, $\psi(\bar{Y})$ are a competing pair with the same energy.

Remark. The lemma is valid for a general cost function C(X) symmetric in $x_1$.

Corollary 2. Under the hypothesis and with the notation of the lemma, if $Y^+$ is the optimal transportation from $\Omega_1^+$ to $\Omega_2^+$, then $Y^+ = Y|_{\Omega_1^+}$, where $\Omega_i^+$ is the part of $\Omega_i$ in $(\mathbb{R}^n)^+ = \{X : x_1 > 0\}$; again, ϕ(X), ψ(Y) restricted to X, Y in $(\mathbb{R}^n)^+$ must be the minimizing pair.

We apply the previous lemma and corollary to densities f(X) and g(Y) in the unit cube of $\mathbb{R}^n$. Let f, g be densities in the unit cube of $\mathbb{R}^n$, $Q_1 = \{X : 0 \le x_i \le 1\}$, and Y be the optimal transportation. Let us write Y = X + V and respectively $\varphi(X) = \frac{1}{2}|X|^2 + u(X)$ (that is, V = ∇u). Then

Theorem 5. If we extend f, g to $f^*$, $g^*$ on a larger cube $Q^*$ by even reflections, then u(X) also extends periodically to $u^*$, to the same cube $Q^*$, by even reflection, and Y(X) to the optimal transportation map $Y^* = X + \nabla u^*(X)$ from $Q^*$ to $Q^*$.

Corollary 3. If f, g are strictly positive and $C^\alpha$ in the unit cube $Q_1$, then Y(X) maps each face of the cube to itself and both Y(X), X(Y) have a $C^{1,\alpha}$ extension across ∂Q.

Proof. It follows from the interior regularity theory (the above theorem), since each face of Q becomes interior after a reflection.

Remark. The problem of finding “periodic” solutions to the Monge–Ampère equation was solved by Yanyan Li [L] by a different method.


2. Monotonicity Properties of Y(X)

We start with a heuristic discussion. Recall that the Holley condition on $\mu_2$, $\mu_1$ was that
$$\mu_2(A \vee B)\, \mu_1(A \wedge B) \ge \mu_2(A)\, \mu_1(B).$$
Logarithmically,
$$\log \mu_2(A \vee B) - \log \mu_2(A) \ge \log \mu_1(B) - \log \mu_1(A \wedge B).$$
Let us now think of smooth densities f(X), g(Y) on the unit cube, and assume we are trying to prove, by a continuity argument, that Y(X) is monotone, that is, Y(X) ≥ X. So we are looking at a continuous family of densities $f^t$, $g^t$ for which Y(X) > X, and we find a first time $t_0$ and a point $X_0$ at which the strict inequality $Y(X_0) > X_0$ fails, that is, for some coordinate, say the first, $y_1(X_0) = x_1(X_0)$. That means that $y_1(X) - x_1(X)$ has a local minimum, zero, at $X_0$. But it is well known that $y_1 = D_1\varphi$ satisfies an elliptic equation, obtained by differentiating the equation for ϕ. From
$$\log \det D^2\varphi = \log f(X) - \log g(\nabla\varphi)$$
we get
$$M_{ij} D_{ij}(D_1\varphi) = (\log f)_1(X) - (\log g)_i(\nabla\varphi)\, D_{i1}\varphi,$$
where $M_{ij}$ denotes the inverse matrix of $D^2\varphi$ and repeated indices are summed. Since $\varphi_1 - x_1$ has a minimum, zero, at $X_0$, we have $D_{i1}\varphi = \delta_{i1}$ there, and we get at $X_0$, $Y(X_0)$,
$$M_{ij} D_{ij}[y_1 - x_1] = (\log f)_1(X) - (\log g)_1(Y).$$
Since $M_{ij}$ is a strictly positive matrix for ϕ strictly convex and $y_1 - x_1$ has a minimum, the left-hand side must be non-negative. If we impose that the right-hand side be strictly negative, we have a contradiction. About the right-hand side, we know that Y ≥ X and that $\langle Y - X, e_1 \rangle = 0$, so the natural hypothesis we want to impose on f, g is the following.

Monotonicity hypothesis. If Y ≥ X and $\langle Y - X, e_i \rangle = 0$, then
$$D_i(\log g)(Y) \ge D_i(\log f)(X).$$
Note. If we think of A = Y and $B = X + t e_i$, we can argue heuristically that $B \vee A = Y + t e_i$ and $B \wedge A = X$, so
$$\log g(Y + t e_i) - \log g(Y) \ge \log f(X + t e_i) - \log f(X)$$
becomes Holley’s condition. We will show in fact later how to associate to a discrete “Holley” pair a continuous one satisfying our hypothesis.
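The discrete Holley condition recalled above, and the comparison of expectations it implies (Corollary 1), can be checked by hand on a small lattice. The sketch below uses two Bernoulli product measures on $\{0,1\}^3$, an illustrative choice that is not taken from the paper, and tests the inequality of Corollary 1 on all increasing 0/1-valued functions (indicators of up-sets), which is only a subfamily of the increasing functions. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def bernoulli_product(p, n):
    """Product measure on {0,1}^n with P(x_i = 1) = p."""
    return {x: p ** sum(x) * (1 - p) ** (n - sum(x))
            for x in product((0, 1), repeat=n)}


def join(x, y):
    return tuple(max(a, b) for a, b in zip(x, y))


def meet(x, y):
    return tuple(min(a, b) for a, b in zip(x, y))


def holley_condition(mu1, mu2):
    """mu2(X v Y) mu1(X ^ Y) >= mu2(X) mu1(Y) for all X, Y."""
    return all(mu2[join(x, y)] * mu1[meet(x, y)] >= mu2[x] * mu1[y]
               for x in mu1 for y in mu1)


def leq(x, y):
    return all(a <= b for a, b in zip(x, y))


def increasing_indicators(points):
    """All increasing functions h: points -> {0, 1}."""
    for values in product((0, 1), repeat=len(points)):
        h = dict(zip(points, values))
        if all(h[x] <= h[y] for x in points for y in points if leq(x, y)):
            yield h


def expectation(h, mu):
    return sum(h[x] * mu[x] for x in mu)
```

With `mu1 = bernoulli_product(Fraction(1, 3), 3)` and `mu2 = bernoulli_product(Fraction(3, 5), 3)`, the pair satisfies the condition, while exchanging the two measures violates it.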

Monotonicity Properties of Optimal Transportation


But first we prove our main comparison theorem.

Theorem 6. Let $f, g$ be $C^{1,\alpha}$, strictly positive probability densities in the unit cube $Q_1$ of $\mathbb{R}^n$. Assume that given any $X, Y, e_j$ with $X \le Y$ and $\langle X - Y, e_j\rangle = 0$ (i.e., $y_j - x_j = 0$),
$$(D_j \log f)(X) \le (D_j \log g)(Y),$$
and let $Y(X)$ be the optimal transportation map. Then for any $X$ in $Q_1$, $Y(X) \ge X$.

Proof. As we pointed out before, we know that the potentials $\varphi(X)$, $\psi(Y)$ are of class $C^{2,\alpha}$ across $\partial Q_1$ and the $C^{1,\alpha}$ optimal transportations $Y(X)$, $X(Y)$ map each face of the cube into itself in a $C^{1,\alpha}$ fashion. In particular, classical regularity theory for fully nonlinear equations applies to $\varphi$ in the interior of the cube. More precisely, $\varphi$ satisfies
$$\det D_{ij}\varphi = \frac{f(X)}{g(\nabla\varphi)}$$
(see [G-T]) and, $f, g$ being $C^{1,\alpha}$ (this is not kept by reflection along the faces), we have that $\varphi$ is of class $C^{3,\alpha}(Q_1)$.

We now study directional derivatives along the boundary of $Q_1$. Consider $D_1\varphi$ outside the faces $x_1 = 0$, $x_1 = 1$. Then, across the remaining boundary of $Q_1$, $y_1(X) = D_1\varphi$ satisfies
$$M_{ij} D_{ij}(D_1\varphi) = D_1 \log f(X) - D_\ell(\log g)\, D_{\ell 1}\varphi.$$
Both $M_{ij}$ and the right-hand side are of class $C^\alpha$ (since $D_1 \log f$ is tangential to the face). Hence $y_1(X)$ is of class $C^{2,\alpha}$ across that part of the boundary and the equation is satisfied in the classical sense.

In order to make the $f, g$ relation strict we change $g$ to $g_\varepsilon$ by defining
$$\log g_\varepsilon(Y) = \log g + \varepsilon \sum_i y_i + C_\varepsilon,$$
where the constant $C_\varepsilon$ is chosen so that $\int g_\varepsilon(Y)\,dY = 1$. Then from the condition $D_j \log f(X) \le D_j \log g(Y)$ for $y_j - x_j = 0$, we now have for $0 < \gamma < \delta(\varepsilon)$ small enough:
$$D_{j_0} \log f(X) \le D_{j_0} \log g_\varepsilon(Y) - \delta$$
if $|y_{j_0} - x_{j_0}| < \gamma$ for some $j_0$ and $y_j - x_j > -\gamma$ for the remaining $j$.


We now look at the continuous family of densities $f_t, g_t$ defined by
$$\log f_t = t \log f + C(t), \qquad \log g_t = t \log g_\varepsilon + D(t),$$
where $C(t)$, $D(t)$ are chosen to keep $\int f_t = \int g_t = 1$, and we show

Lemma 2. For any $0 < t < 1$ the corresponding (continuous in $t$) family of optimal transports $Y_t(X)$ satisfies $y_j^t \ge x_j^t - \frac12\gamma$.

Proof. For $t = 0$, $Y(X)$ is the identity map, and thus the inequality is satisfied for $t$ small. As usual, suppose there exists a first value $t_0 > 0$ for which the inequality is not satisfied. Thus, there exist $X_0$ and a $j$ (say $j = 1$) such that $y_1(X_0) = x_1(X_0) - \frac12\gamma$ and still $y_1(X) \ge x_1(X) - \frac12\gamma$ everywhere else. We first note that $x_1(X_0) \ne 0, 1$ because otherwise $y_1(X_0) = x_1(X_0)$. But everywhere else we have $0 \le M_{ij} D_{ij} y_1(X_0)$ (since $y_1 - x_1$ has a minimum at $X_0$) and
$$D_1 \log f(X_0) \le D_1 \log g(Y(X_0)) - t\delta$$
(since $|y_1 - x_1| = \gamma/2$ and $y_j \ge x_j - \gamma/2$ for the remaining $j$). This is a contradiction that completes the proof of the lemma and the theorem.

Corollary 4. Let $0 < \lambda \le f, g \le \Lambda$ be measurable. Suppose that $\log f$, $\log g$ satisfy the hypothesis of the theorem in the sense of distributions. Then the theorem still holds, i.e., $Y(X) \ge X$.

Proof. Mollify $\log f$, $\log g$ to $\log f_\varepsilon$, $\log g_\varepsilon$ with a standard (radially symmetric, nonnegative, compactly supported) mollifier $\varphi_\varepsilon$. Then the hypothesis of Theorem 6 is satisfied as long as $X, Y$ stay at distance $\varepsilon$ from $\partial Q_1$. Take as center of coordinates the center of the cube, $X = (\frac12, \frac12, \ldots, \frac12)$, and make a $2\varepsilon$-dilation. The new $f_\varepsilon, g_\varepsilon$ satisfy the hypothesis of Theorem 6 when restricted to the unit cube. Thus the conclusion of Theorem 6 holds for them. By passing to the limit on the maps, the theorem holds for $f, g$.


3. Holley's Theorem when the Lattice is all of the Vertices of the Unit Cube

Given a vertex $X \in P$, we will denote by $Q_X$ the subcube of $Q_1$ of side $1/2$ that has $X$ as a vertex, $Q_X = \{Z : |Z - X|_{L^\infty} \le 1/2\}$. We prove the following theorem.

Theorem 7. Let $f, g$ be the step functions
$$f = \sum_{X \in P} \mu_1(X)\,\chi_{Q_X}, \qquad g = \sum_{X \in P} \mu_2(X)\,\chi_{Q_X}.$$
Assume that given vertices $X$, $Y$, $X + e_j$, $Y + e_j$ with $Y \ge X$ and $\langle Y, e_j\rangle = \langle X, e_j\rangle = 0$ we have
$$\log \mu_2(Y + e_j) - \log \mu_2(Y) \ge \log \mu_1(X + e_j) - \log \mu_1(X).$$
Then $Y(X) \ge X$.

Proof. As a distribution, $D_j \log f$ (resp. $D_j \log g$) is the jump function $\log \mu_1(X + e_j) - \log \mu_1(X)$ (resp. $\log \mu_2(X + e_j) - \log \mu_2(X)$) supported on the face of $Q_X$ lying in the plane $x_j = 1/2$, so the hypothesis of Corollary 4 holds in the sense of distributions.

Corollary 5. Let $Z_1, Z_2 \in P$. Define
$$\nu(Z_1, Z_2) = \frac{\mu_1(Z_1)}{|Q_{1/2}|}\,\bigl|\{X \in Q_{Z_1} : Y(X) \in Q_{Z_2}\}\bigr| = \frac{\mu_2(Z_2)}{|Q_{1/2}|}\,\bigl|\{Y \in Q_{Z_2} : X(Y) \in Q_{Z_1}\}\bigr|.$$
Then
a) $\nu$ is a probability measure with marginals $\mu_1(Z_1)$, $\mu_2(Z_2)$,
b) $\nu(Z_1, Z_2) \ne 0 \Rightarrow Z_2 \ge Z_1$.

4. Holley's Theorem for General Lattices

Given a lattice $\Lambda \subset P$ and two measures $\mu_1, \mu_2$ satisfying the Holley condition, we want to extend $\mu_1, \mu_2$ to small perturbations $\mu_1^*, \mu_2^*$ in all of $P$ keeping the inequalities. Usually, $\mu_1, \mu_2$ are extended by zero. We need to be a little more careful. We state the following presentation of $\Lambda$.

Lemma 3. There is a partition of $\mathbb{R}^N = \mathbb{R}^{k_1} \otimes \mathbb{R}^{k_2} \otimes \cdots \otimes \mathbb{R}^{k_\ell}$ and a family of elements $w_i^j$ ($1 \le j \le \ell$, $1 \le i \le k_j$) such that any nonzero element $X \in \Lambda$ is the max of $w_i^j$,
$$X = \max_{(i,j) \in I_X} w_i^j,$$
and
$$w_i^j = e_i^j + v$$
with the coordinates $v_i^s = 0$ for all $s \ge j$. (More precisely $w_i^1 = e_i^1$, $w_i^2 = e_i^2 + v$ with $v \in \mathbb{R}^{k_1}$, $w_i^3 = e_i^3 + v$ with $v \in \mathbb{R}^{k_1 + k_2}$, and so on.)


Proof. The decomposition is obtained by first choosing the minimal elements $\bar e^1, \bar e^2, \ldots, \bar e^{k_1}$ and contracting the ones in them to only one position. Next we choose minimal elements among those not in $\mathbb{R}^{k_1}$, and so on.

We now extend the lattice and the measure. Let $\bar\Lambda$ be the following extension of $\Lambda$:
$$\bar\Lambda = \Lambda \cup \Lambda_0, \qquad \text{where } w \in \Lambda_0 \Leftrightarrow \max(w, e_1) \in \Lambda$$
(that is, we add to all those elements with a 1 as first coordinate those with a zero). Given $w$ in $\bar\Lambda$ define
$$w^+ = w \vee e_1, \qquad w^- = w^+ - e_1$$
(i.e., $w$ with a zero in the position $e_1$). Define
$$\mu^*(w) = \begin{cases} \mu(w) & \text{if } w \in \Lambda, \\ \mu(w^+)/M & \text{otherwise ($M$ large).} \end{cases}$$

Theorem 8. $\bar\Lambda$ is a lattice and $\mu_1^*, \mu_2^*$ still satisfy
$$\log \mu_2^*(v_1 \vee v_2) - \log \mu_2^*(v_2) \ge \log \mu_1^*(v_1) - \log \mu_1^*(v_1 \wedge v_2).$$

Proof. Elements in $\bar\Lambda$ are $w^+$ and $w^-$ of elements in $\Lambda$ ($w^+$ is always in $\Lambda$ since $e_1 \in \Lambda$). Then $v_1 \wedge v_2 = w_1^\pm \wedge w_2^\pm$ for $w \in \Lambda$. If one of the signs is a $-$, then $v_1 \wedge v_2 = (w_1 \wedge w_2)^-$. If not, $v_1 \wedge v_2 = w_1 \wedge w_2$. Also $v_1 \vee v_2 = w_1^\pm \vee w_2^\pm$. If one of the signs is a $+$ (since $w^+ \in \Lambda$), then $v_1 \vee v_2 = w_1 \vee w_2$. If not, $v_1 \vee v_2 = (w_1 \vee w_2)^-$.

About the measures $\mu_1^*, \mu_2^*$, let us verify the proper inequalities. For that purpose we choose $M \gg \mu_i(X)$ for any $X$. There are several cases to consider.

a) $w_1, w_2 \in \Lambda$; then $w_1 \wedge w_2,\ w_1 \vee w_2 \in \Lambda$ and everything is as before.

b) $w_1 \in \Lambda$, $w_2 \notin \Lambda$ (thus $w_2 = w_2^-$).

b1) If $w_1 = w_1^-$, we have that $w_1 \wedge w_2 \in \Lambda$ and $w_1 \vee w_2 \notin \Lambda$, and the factor $\log M$ cancels in the $\mu_2^*$ expression.

b2) If $w_1 = w_1^+$, then $w_1 \vee w_2 \in \Lambda$. If $w_1 \wedge w_2 \in \Lambda$, the extra factor $\log M$ in the $\mu_2^*$ expression controls everything else (we choose $\log M \gg \sup |\log \mu_i|$). If $w_1 \wedge w_2 \notin \Lambda$, then $\mu_1^*(w_1 \wedge w_2) = \mu_1(w_1 \wedge w_2^+)/M$ and $\mu_2^*(w_2) = \mu_2(w_2^+)/M$, thus each term has an extra $\log M$ factor that cancels.


c) $w_2 \in \Lambda$, $w_1 \notin \Lambda$.

c1) If $w_2 = w_2^+$, then $w_1 \vee w_2 \in \Lambda$. If $w_1 \wedge w_2 \in \Lambda$, the extra term $-\log M$ in the $\mu_1$ expression controls everything. If $w_1 \wedge w_2 \notin \Lambda$, then
$$\mu_1^*(w_1 \wedge w_2) = \mu_1(w_1^+ \wedge w_2)/M, \qquad \mu_1^*(w_1) = \mu_1(w_1^+)/M,$$
and we have $\log M$ cancellation.

c2) If $w_2 = w_2^-$, then $w_1 \wedge w_2 \in \Lambda$, but $w_1 \vee w_2 \notin \Lambda$ and we have
$$\mu_2^*(w_1 \vee w_2) = \mu_2(w_1^+ \vee w_2)/M, \qquad \mu_1^*(w_1) = \mu_1(w_1^+)/M,$$
and there is a $\log M$ factor cancellation.

d) If $w_1 \notin \Lambda$, $w_2 \notin \Lambda$, then $w_1 \vee w_2 \notin \Lambda$. If $w_1 \wedge w_2 \notin \Lambda$, the factors $\log M$ cancel. If not, the extra factor $\log M$ in the $\mu_1^*$ expression controls everything else. The proof of the theorem is complete.

Theorem 9. We are given $\Lambda \subset P$ and $\mu_1, \mu_2$. As before, let $f, g$ be the step functions
$$f = \sum_{w_i \in \Lambda} \mu_1(w_i)\,\chi_{Q_{w_i}}, \qquad g = \sum_{w_i \in \Lambda} \mu_2(w_i)\,\chi_{Q_{w_i}}.$$
Then the optimal transportation map $Y(X)$ is monotone.

Proof. If we start with $M = M_0$ and we repeat the extension process ($M_1 \gg M_0$, $M_2 \ge M_1$ and so on) we exhaust $P$. Note that once we have extended through $e_1^1, \ldots, e_{k_1}^1$, the elements $e_1^2, \ldots, e_{k_2}^2$ belong now to the lattice and are minimal, so we can keep extending. As $M_0$ goes to infinity the measures $\mu_i^*$ converge to $\mu_i$.

We complete this work by showing that, actually, the discrete optimal transportation map is monotone. In this case the map is in general multi-valued. That is, the mass $\mu_1(w)$ may have to be spread through several points $v$. Still, for all those $v$'s, $v(w) \ge w$.

Theorem 10. Let $\Lambda$ be a sublattice of $P$, the set of vertices of the unit cube in $\mathbb{R}^n$, and let $\mu_1, \mu_2$ be positive measures on $\Lambda$ satisfying the usual monotonicity condition. Let $\nu(X, Y)$ be the (discrete) optimal transportation. Then $\nu(X, Y) \ne 0 \Rightarrow Y \ge X$.

Proof. From the previous theorem we may assume that $\mu_i$ is defined and positive in all of $P$. We will approximate it by bounded densities $f, g$ that satisfy the hypothesis of Theorem 6. We define them as follows. Let $\mathbf{1}$ be the vector $\mathbf{1} = (1, 1, \ldots, 1)$. In the strip $S_\omega^\varepsilon = \{\varepsilon\omega < X \le \omega + \varepsilon\mathbf{1}\}$, let $N(X, \omega)$ be the number of coordinates $j$ for which $\omega_j - x_j > \varepsilon$, and define there, for $\delta \ll \varepsilon$,
$$f(X) = \mu_1(\omega)\,\delta^N.$$
Note that the strips $S_\omega^\varepsilon$ cover $Q_1$ disjointly (given $X$ we determine $\omega$ by those coordinates $x_j > \varepsilon$). The same definition is used for $g$.


Of course, we have to multiply as usual by a normalization constant to make $\int f = \int g = 1$, but this does not affect the logarithmic inequality. Also, if $\delta$ goes to zero much faster than $\varepsilon$ (say like $\varepsilon^{2N}$), $f$ and $g$ converge to $\mu_1$ and $\mu_2$, since most of the mass concentrates in the cube $Q_\varepsilon(\omega) = \{|x_i - \omega_i| < \varepsilon\}$. About $D_i \log f$, $D_i \log g$: they are jump functions concentrated on the planes $x_j = \varepsilon$ or $1 - \varepsilon$, so we have to check that the jump inequalities are satisfied. We may also disregard plane intersections since they will not affect $D_i f$ in the distributional sense. So we check the following.

a) For $X \le Y$ and $x_i = y_i = \varepsilon$ we have $\mathrm{Jump}(\log g) \ge \mathrm{Jump}(\log f)$. Indeed, when $x_i, y_i$ go through $\varepsilon$ we change from evaluating the measures at $w_1$ (resp. $w_2$) to $w_1 + e_i$, $w_2 + e_i$, and both $N(X)$, $N(Y)$ increase by one, so the jump relation holds (they are the lattice relations plus a factor $\log\delta$).

b) When $x_i, y_i$ go through $1 - \varepsilon$, $w_1$ and $w_2$ remain unchanged and $N(X)$, $N(Y)$ both decrease by one. Also here the jump relation holds (both jumps are just $\log\delta$).

This completes the proof.

5. Second Derivative Estimates

In this section we explore the implications of the fact that second derivatives of solutions to Monge–Ampère equations are subsolutions of an elliptic equation.

First a heuristic discussion. Let us take a second pure derivative of the equation
$$\log\det D_{ij}\varphi = \log f(x) - \log g(\nabla\varphi).$$
We get
$$M_{ij} D_{ij}\varphi_{\alpha\alpha} + M_{ij,k\ell}\, D_{ij\alpha}\varphi\, D_{k\ell\alpha}\varphi = D_{\alpha\alpha}\log f - (\log g)_{ij}\,\varphi_{i\alpha}\varphi_{j\alpha} - (\log g)_i\,\varphi_{\alpha\alpha i}.$$
From the concavity of $\log\det$ the second term on the left is negative. If $\varphi_{\alpha\alpha}$ reaches at $X_0$ the maximum value among all pure second derivatives, then the right-hand side must be negative.

Let us look at the explicit case in which, up to a constant, $f = e^{-Q(X)}$ and $g = e^{-(Q(Y) + F(Y))}$, where $Q$ is a nonnegative quadratic polynomial, $Q = a_{ij} x_i x_j$ (for instance, near neighborhood or other "Dirichlet integral"-like interactions in field theory). We may assume that $\alpha = e_1$. Then we must compute $D_{11}\bigl(-Q(X) + Q(\nabla\varphi) + F(\nabla\varphi)\bigr)$. We have
$$D_{11}(-Q)(X) = -a_{11}, \qquad D_{11} Q(\nabla\varphi) = a_{ij}\varphi_{i1}\varphi_{j1} + a_{ij}\varphi_{i11}\varphi_j.$$
But since $\varphi_{11}(X_0)$ is the maximum among all pure second derivatives, $\varphi_{11i} = 0$ for all $i$, and $\varphi_{1i} = 0$ for $i \ne 1$. So
$$D_{11} Q(\nabla\varphi(X_0)) = a_{11}(\varphi_{11})^2.$$
Finally, if $F$ is convex,
$$D_{11} F(\nabla\varphi) = F_{ij}\varphi_{i1}\varphi_{j1} + F_i\varphi_{i11}$$
is non-negative.


Therefore $D_{11}(\text{R.H.S.}) \ge a_{11}((\varphi_{11})^2 - 1)$. We get a contradiction if $\varphi_{11} > 1$. That is,

Theorem 11. Let, up to a multiplicative constant, $f(X) = e^{-Q(X)}$, $g(Y) = e^{-(Q(Y) + F(Y))}$ with $F$ convex. Then the potential $\varphi$ of the optimal transportation satisfies $0 \le \varphi_{\alpha\alpha} \le 1$. In particular $Y = X + \nabla u(X)$, where $u = \varphi - \frac12|X|^2$ is concave and $-1 \le u_{\alpha\alpha} \le 0$ (independently of dimension).

Proof. To make the previous argument valid we have to take care of what happens when $X$ goes to infinity. Again by approximation we may assume that the convex function $F(X)$ is $+\infty$ outside the ball $B_R$ (that is, $g$ is supported in the ball of radius $R$, and is smooth and bounded away from zero and infinity inside it). We will replace the second derivative by an incremental quotient, and show that it still satisfies a maximum principle and goes to zero at infinity. Let
$$(\delta\varphi_e)(X) = \varphi(X + he) + \varphi(X - he) - 2\varphi(X).$$
We fix $h$, and study what happens if $\delta\varphi = \delta\varphi_{e_1}$ attains a local maximum at $X_0$, for all possible $e$. From the concavity of $\log\det$, we still have that, for the linearization coefficients $M_{ij}$ of $\log\det$ at $X_0$,
$$M_{ij} D_{ij}(\delta\varphi)(X_0) \le \delta(\log f - \log g) = \delta\bigl(-Q(X) + Q(\nabla\varphi) + F(\nabla\varphi)\bigr).$$
From the fact that $\delta\varphi_{e_1}$ realizes a maximum among $X$ and $e$, we obtain

a) $\nabla\delta\varphi = \nabla\varphi(X_0 + he_1) + \nabla\varphi(X_0 - he_1) - 2\nabla\varphi(X_0) = 0$, and

b) for any $\tau \perp e_1$, $D_\tau\delta\varphi = \tau\cdot\bigl(\nabla\varphi(X_0 + he_1) - \nabla\varphi(X_0 - he_1)\bigr) = 0$.

Therefore $\nabla\varphi(X_0 \pm he_1) = \nabla\varphi(X_0) \pm \lambda e_1$ and $\delta\varphi = 2\lambda$ ($\lambda$ positive). Then, from the convexity of $F$, $\delta F(\nabla\varphi(X_0)) \ge 0$.


If we write $Q(X)$ as a bilinear form $Q(X) = B(X, X)$, then
$$\delta Q(\nabla\varphi) = B(\nabla\varphi(X_0) + \lambda e_1, \nabla\varphi(X_0) + \lambda e_1) + B(\nabla\varphi(X_0) - \lambda e_1, \nabla\varphi(X_0) - \lambda e_1) - 2B(\nabla\varphi(X_0), \nabla\varphi(X_0)) = 2\lambda^2 B(e_1, e_1).$$
Similarly $\delta Q(X) = 2h^2 B(e_1, e_1)$, so we get: if $\delta\varphi$ has an interior maximum at $X_0$, then it must hold that
$$\nabla\varphi(X_0 \pm he_1) = \nabla\varphi(X_0) \pm \lambda e_1 \quad \text{with } \lambda < h.$$
But, since $\varphi$ is convex,
$$\varphi(X_0 \pm he_1) - \varphi(X_0) \le \langle \nabla\varphi(X_0 \pm he_1) - \nabla\varphi(X_0), \pm he_1\rangle = \lambda h \le h^2.$$
Thus $\delta\varphi \le 2h^2$, the desired inequality. To complete the proof of the theorem it would be enough to show that $\delta\varphi$ goes to zero (for fixed $h$) when $X$ goes to infinity. We show that:

Lemma 4. As $X$ goes to infinity, $Y$ converges uniformly to $R\,\dfrac{X}{|X|}$.

Proof. Let $X_0 = \lambda e_1$ for $\lambda$ large and $Y_0$ its image. Let $\nu$ be a unit vector with
$$\text{angle}(\nu, e_1) \le \frac{\pi}{2} - \varepsilon.$$
From the monotonicity of the map, any point in $B_R$ of the form $Y' = Y_0 + t\nu$ must come from a vector
$$X' = X_0 + s\mu, \quad \text{with } \langle \mu, \nu\rangle \ge 0.$$
In particular, we must have $\text{angle}(\mu, e_1) \le \pi - \varepsilon$. In other words, if in $Y$ space we consider the cone
$$\Gamma_Y = \{Y' = Y_0 + t\nu, \text{ with } t > 0,\ \text{angle}(\nu, e_1) \le \tfrac{\pi}{2} - \varepsilon\},$$
its intersection with $B_R$ must be covered by the image of the (concave) cone
$$\Gamma_X = \{X' = X_0 + s\mu, \text{ with } s > 0 \text{ and } \text{angle}(\mu, e_1) \le \pi - \varepsilon\}.$$
But $\Gamma_X$ has very small $f$ measure,
$$\mu_f(\Gamma_X) \le (\varepsilon\lambda)^n e^{-(\varepsilon\lambda)^2}, \qquad \varepsilon\lambda > \lambda^{1/2},$$
since the ball of radius $\varepsilon\lambda$ is not contained in $\Gamma_X$. On the other hand, $g$ is strictly positive in $B_R$, so
$$\mu_g(\Gamma_Y \cap B_R) \sim |\Gamma_Y \cap B_R| \le \mu_f(\Gamma_X).$$
This forces the exponential convergence of $Y$ to $Re_1$. This completes the proof of the lemma and the theorem, since the uniform convergence of $\nabla\varphi$ to $R\,\frac{X}{|X|}$ makes $\delta\varphi$ go to zero (for any fixed, positive $h$).

We state three corollaries of this last inequality. The first two are a generalization of the classic Brascamp–Lieb moment inequality and the third is an eigenvalue inequality.

Corollary 6. Let $f(X) = e^{-Q(X)}$, $g(Y) = e^{-[Q(Y) + F(Y)]}$ with $Q$ quadratic and $F$ convex, and let $\Gamma$ be a convex function of one variable ($|x_1|^\alpha$ in [B-L]). Then
$$E_g\bigl(\Gamma(y_1 - E_g(y_1))\bigr) \le E_f(\Gamma(x_1)).$$

Proof. It follows from [B-L] that it is enough to prove it in the one-dimensional case (see Theorem 5.1 of [B-L]). We can also assume by a translation that $E_g(y_1) = 0$. By the change of variable formula that means
$$\int y(x) f(x)\,dx = 0.$$
Also
$$E_g(\Gamma(y_1)) = \int \Gamma(y_1(x)) f(x)\,dx.$$
But $y(x) = x + u(x)$, where $y = \varphi'(x)$ with $\varphi$ convex and $u = \psi'(x)$ with $\psi$ concave. Thus $y$ is increasing, and $u$ is decreasing and changes sign, since
$$\int u(x) f(x)\,dx = \int y(x) f(x)\,dx = 0.$$
Say $u(x_0) = 0$. Then, since $\Gamma$ is convex, we write
$$\Gamma(y(x)) f(x) \le [\Gamma(x) + \Gamma'(y(x))(y - x)] f(x),$$
so that
$$E_g(\Gamma(y_1)) \le E_f(\Gamma(x)) + \int [\Gamma'(y(x)) - \Gamma'(x_0)](y - x) f(x)\,dx.$$
But at $x_0$, $\Gamma'(y(x_0)) = \Gamma'(x_0)$ and $y(x_0) = x_0$, and further $\Gamma'$ is increasing while $y - x = u$ is decreasing; thus the last integrand is negative, and this completes the proof.
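The one-dimensional case of the second-derivative bound and of the moment inequality can be checked numerically. The sketch below is an illustration only: it assumes the Gaussian $f \propto e^{-x^2/2}$, the perturbation $F(y) = y^4/4$ (a choice made for this example), and the convex test function $t \mapsto t^2$. In one dimension the monotone transport satisfies $y'(x) = f(x)/g(y(x))$ with $y(0) = 0$ by symmetry, which is integrated here with a classical Runge–Kutta step; all function names are hypothetical.

```python
import math


def F(y):
    """Convex perturbation (illustrative assumption, not from the paper)."""
    return 0.25 * y ** 4


def f_unnorm(x):
    """Unnormalized Gaussian density e^{-Q(x)} with Q(x) = x^2 / 2."""
    return math.exp(-0.5 * x * x)


def g_unnorm(y):
    """Unnormalized log-concave perturbation e^{-(Q(y) + F(y))}."""
    return math.exp(-(0.5 * y * y + F(y)))


def trapezoid(func, a, b, n):
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * h)
    return total * h


Zf = trapezoid(f_unnorm, -12.0, 12.0, 24000)
Zg = trapezoid(g_unnorm, -12.0, 12.0, 24000)


def slope(x, y):
    """y'(x) = f(x) / g(y): conservation of mass for the monotone map."""
    return (f_unnorm(x) / Zf) / (g_unnorm(y) / Zg)


def transport_slopes(x_max=2.0, steps=2000):
    """Integrate y' = f(x)/g(y) from y(0) = 0 (symmetry) with RK4."""
    h = x_max / steps
    x, y = 0.0, 0.0
    slopes = [slope(x, y)]
    for _ in range(steps):
        k1 = slope(x, y)
        k2 = slope(x + h / 2, y + h / 2 * k1)
        k3 = slope(x + h / 2, y + h / 2 * k2)
        k4 = slope(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
        slopes.append(slope(x, y))
    return slopes, y


def second_moment_g():
    """E_g(y^2); the mean of g is zero by symmetry."""
    return trapezoid(lambda y: y * y * g_unnorm(y), -12.0, 12.0, 24000) / Zg


slopes, y_end = transport_slopes()
print(max(slopes), y_end, second_moment_g())
```

Under these assumptions the largest computed slope stays below one, in line with $0 \le \varphi_{\alpha\alpha} \le 1$, and the second moment of $g$ stays below the Gaussian value $E_f(x^2) = 1$.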


If we want to repeat the argument above for functions $\Gamma$ that depend on more than one variable, and we want to prove that
$$E_g\bigl(\Gamma(Y - E_g(Y))\bigr) \le E_f(\Gamma(X)),$$
we may as before assume that $E_g(Y) = 0$. That means, with $Y = X + U$, that $U(X_0) = 0$ for some $X_0$ (i.e., the concave function $-\psi$ has a maximum). The same computation then gives us
$$E_g(\Gamma(Y)) \le E_f(\Gamma(X)) + \int \bigl(\nabla\Gamma(Y) - \nabla\Gamma(X_0)\bigr)\cdot\bigl(-\nabla\psi(Y)\bigr) f(X)\,dX,$$
where $\psi$ and $\Gamma - \langle\nabla\Gamma(X_0), X - X_0\rangle$ are both convex with a minimum at $X_0$, so there is some hope that the integrand be negative. For instance, if we are looking at statistics of $k$ variables we have the following corollary.

Corollary 7. Assume that $Q(X)$, $F(X)$ in the definition of $f(X)$, $g(Y)$ are symmetric with respect to $(x_1, \ldots, x_k)$ and that $\Gamma(x_1, \ldots, x_k)$ is convex and symmetric. Then
$$E_g(\Gamma(Y)) \le E_f(\Gamma(X)).$$

Proof. As before we may assume the problem is $k$-dimensional ([B-L], Theorem 4.3). Since $Q$ and $F$ are symmetric, the potentials $\varphi(X)$, $\psi(X)$ are symmetric. Therefore $\nabla\varphi, \nabla\psi, \nabla\Gamma = 0$ for $X = 0$ and further,
$$\operatorname{sign}\varphi_i(X) = \operatorname{sign}\psi_i(X) = \operatorname{sign}\Gamma_i(X) = \operatorname{sign} x_i = \operatorname{sign} y_i.$$
From the computation above it suffices to show that for all $Y$, $\nabla\Gamma\cdot\nabla\psi \ge 0$. That follows since $\Gamma_i\cdot\psi_i \ge 0$ for all $i$.

A final consequence of the estimate $\varphi_{\alpha\alpha} \le 1$ for log concave perturbations of the Gaussian is that any Rayleigh-like quotient (log Sobolev inequality, isoperimetric inequality, Poincaré inequality) that involves a quotient between first derivatives and the functions themselves is smaller for the perturbation than for the Gaussian. For instance, let $F(t)$, $G(t)$, $H(t)$, $K(t)$ be non-negative, non-decreasing functions of $t \in [0, \infty)$; then we have the following.

Corollary 8. Let $f, g$ be densities as in Theorem 11 (i.e., a Gaussian and its log concave perturbation), and consider the "Rayleigh" quotient
$$\lambda_f = \inf \frac{F\bigl(\int G(|\nabla u|) f(X)\,dX\bigr)}{H\bigl(\int K(|u|) f(X)\,dX\bigr)}.$$
Then $\lambda_g \ge \lambda_f$.

Proof. If we apply the change of variable formula to any function $u(Y)$, we get
$$\int K(|u(Y)|) g(Y)\,dY = \int K(|u(Y(X))|) f(X)\,dX,$$
while
$$\nabla_X u(Y(X)) = D_X Y\,\nabla_Y u(Y).$$
But $D_X Y$ is a symmetric matrix with all eigenvalues less than one, so
$$|\nabla_X u(Y(X))| \le |\nabla_Y u(Y)|,$$
which proves the corollary.

Remark. The monotonicity for the log Sobolev inequality under log concave perturbations of the Gaussian follows from the Bakry–Emery theorem ([B-E]).


References

[B-E] Bakry, D. and Emery, M.: Diffusions hypercontractives. In: Sém. Prob. XIX, LNM 1123. Berlin–Heidelberg–New York: Springer, 1985, pp. 177–206
[B-L] Brascamp, H. and Lieb, E.: On extensions of the Brunn–Minkowski and Prékopa–Leindler Theorems, Including Inequalities for Log Concave Functions, and with an Application to the Diffusion Equation. J. Funct. Anal. 22, 366–389 (1976)
[B] Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Comm. Pure Appl. Math. XLIV, 375–417 (1991)
[C1] Caffarelli, L.A.: Interior $W^{2,p}$ estimates for solutions of the Monge–Ampère equation. Ann. of Math. 131, 135–150 (1989)
[C2] Caffarelli, L.A.: The regularity of mappings with a convex potential. J.A.M.S. 5, 99–104 (1992)
[C3] Caffarelli, L.A.: Boundary regularity of maps with convex potential I. Comm. Pure Appl. Math. 45, 1141–1151 (1992)
[FKG] Fortuin, C.M., Kasteleyn, P.W., Ginibre, J.: Correlation inequalities on some partially ordered sets. Commun. Math. Phys. 22, 89–103 (1971)
[G-M] Gangbo, W., McCann, R.J.: The geometry of optimal transport. Acta Math. 177, 2, 113–161 (1996)
[G-T] Gilbarg, D., Trudinger, N.: Elliptic partial differential equations of second order. Second edition, Berlin–Heidelberg–New York: Springer, 1983
[H] Holley, R.: Remarks on the FKG inequalities. Commun. Math. Phys. 36, 227–231 (1974)
[L] Li, Yanyan: Some existence results of fully non-linear elliptic equations of Monge–Ampère type. Comm. Pure Appl. Math. 43, 233–271 (1990)
[P] Preston, C.J.: A generalization of the FKG inequalities. Commun. Math. Phys. 36, 232–241 (1974)

Communicated by J. L. Lebowitz

Commun. Math. Phys. 214, 565 – 572 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Lifshitz Tail for Schrödinger Operator with Random Magnetic Field Shu Nakamura Graduate School of Mathematical Sciences, University of Tokyo, 3-8-1, Komaba, Meguro-ku, Tokyo 153-8914, Japan. E-mail: [email protected] Received: 3 January 2000 / Accepted: 18 April 2000

Abstract: We study the behavior of the density of states at the lower edge of the spectrum for Schrödinger operators with random magnetic fields. We use a new estimate on magnetic Schrödinger operators, which is similar to the Avron–Herbst–Simon estimate but the bound is always nonnegative.

1. Introduction

In this paper, we consider the magnetic Schrödinger operator without scalar potential on $\mathbb{R}^d$ ($d \ge 2$):
$$H = (p - A(x))^2 \quad \text{on } L^2(\mathbb{R}^d),$$
where $p = -i\partial_x$ is the momentum operator and
$$A(x) = (A_1(x), \ldots, A_d(x)), \qquad x \in \mathbb{R}^d,$$
is a vector potential. For the moment we suppose $A$ is a $C^1$-class function. We identify $A(x)$ with the 1-form $A = \sum_{j=1}^d A_j(x)\,dx_j$. The magnetic field is the 2-form defined by
$$B(x) = dA = \sum_{i<j} B_{ij}(x)\,dx_i \wedge dx_j,$$
where $B_{ij}(x) = \partial_{x_i} A_j(x) - \partial_{x_j} A_i(x)$, and we identify $B$ with the skew-symmetric matrix-valued function $(B_{ij}(x))$ on $\mathbb{R}^d$. It is well-known that for any closed 2-form $B$, there is a 1-form $A$ such that $B = dA$, and the corresponding Schrödinger operator $H$ is uniquely determined by $B$ modulo the gauge transformation group.

Supported in part by ISPS grant Kiban B 09440055


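We suppose that $B = B^\omega$ is a metrically transitive random closed 2-form in the following sense: There exists a probability space $(\Omega, \mathcal{F}, \mathbf{P})$ and $\mathbb{R}^d$ acts on $\Omega$ as a measure preserving transformation group. We denote the action by $\{T_x \mid x \in \mathbb{R}^d\}$. The action is supposed to be ergodic, i.e., if $E \subset \Omega$ is invariant under $\{T_x\}$, then $\mathbf{P}(E) = 0$ or $1$. $B = B^\omega$ is assumed to satisfy
$$B^\omega(x - y) = B^{(T_y\omega)}(x), \qquad x, y \in \mathbb{R}^d,$$
and it is a continuous closed 2-form almost surely. We denote
$$\Lambda_L = \{x \in \mathbb{R}^d \mid |x_j| \le L/2 \text{ for } j = 1, \ldots, d\}, \qquad L > 0,$$
and let $H^\omega_\Lambda$ be the Schrödinger operator (with the same symbol as above) on $\Lambda \subset \mathbb{R}^d$ with, e.g., the Neumann boundary condition (cf. Sect. 3). We also denote the Lebesgue measure of $\Lambda$ by $|\Lambda|$. Then it is well-known that the density of states
$$k(E) = \lim_{L\to\infty} \frac{1}{|\Lambda_L|}\,\#\{\text{eigenvalues of } H^\omega_{\Lambda_L} \le E\}$$
exists for almost all $E \in \mathbb{R}$ and $\omega \in \Omega$, and it is independent of $\omega$ (almost surely). Our main assumption is the following.

Assumption A.
(i) $B^\omega$ is a metrically transitive random closed 2-form on $\mathbb{R}^d$ in the above sense.
(ii) There is a constant $M \in \mathbb{R}_+ = [0, \infty)$ such that $|B^\omega(x)| \le M$ for any $x \in \mathbb{R}^d$ and almost all $\omega \in \Omega$, where $|B(x)| = \bigl(\sum_{i<j} |B_{ij}(x)|^2\bigr)^{1/2}$.
(iii) There exists a real-valued continuous function $\varphi$ on $\mathbb{R}_+$ such that $\varphi(\lambda) \to 0$ as $\lambda \to \infty$, and it satisfies the following condition: Let $\Lambda_1, \Lambda_2 \subset \mathbb{R}^d$ and let $f \in L^1(\Omega)$, $g \in L^\infty(\Omega)$ be such that $f$ is $\mathcal{F}_{\Lambda_1}$-measurable and $g$ is $\mathcal{F}_{\Lambda_2}$-measurable, where $\mathcal{F}_\Lambda$ denotes the $\sigma$-algebra generated by $\{B^\omega(x) \mid x \in \Lambda\}$. Then
$$|\mathbf{E}(fg) - \mathbf{E}(f)\mathbf{E}(g)| \le \varphi(\operatorname{dist}(\Lambda_1, \Lambda_2))\,\|f\|_{L^1}\|g\|_{L^\infty},$$
where $\mathbf{E}(\cdot)$ denotes the expectation with respect to $\mathbf{P}$, and $\operatorname{dist}(\cdot, \cdot)$ is the Euclidean distance on $\mathbb{R}^d$.
(iv) Let $C_0$ be the unit cube in $\mathbb{R}^d$. Then
$$\mathbf{E}\left(\int_{C_0} |B^\omega(x)|\,dx\right) > 0.$$

Theorem 1. Suppose Assumption A. Then
$$\limsup_{E \downarrow 0} \frac{\log(-\log k(E))}{\log E} \le -\frac{d}{2}.$$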


The Lifshitz tail (or the Lifshitz singularity) of the density of states for the Schrödinger operator with random potential has been studied by many authors (see, e.g., the textbook of Carmona and Lacroix [2] or the lecture note of Kirsch [3]), but very few results have been obtained on the Lifshitz tail for the Schrödinger operator with random magnetic field so far. The author is only aware of a work by Ueki [7], where the Lifshitz tail is studied for Schrödinger operators with a class of Gaussian random magnetic fields, and a work by the author on the 2D discrete Schrödinger operator with Anderson type random magnetic field [5].

The strategy of the proof is similar to [5]. We first prove that the magnetic Schrödinger operator is bounded from below by a nonnegative function, which does not vanish where $B$ does not (Sect. 2). This estimate is similar to the well-known AHS (Avron–Herbst–Simon) estimate:
$$H \ge \pm B_{ij}(x), \qquad x \in \mathbb{R}^d,\ i \ne j.$$
The right-hand side of the AHS estimate is not necessarily positive. In order to obtain global lower bounds of the operator, we have to use a partition of unity, and thus introduce (not necessarily positive) error terms. Our estimate (Theorem 2) is in fact weaker than the AHS estimate in many situations, but the right-hand side of the estimate is always nonnegative, and hence easier to apply in our context. In Sect. 3, we discuss Neumann decoupling and prove a simple a priori estimate on the density of states for magnetic Schrödinger operators. We then combine them with a result on the Lifshitz tail of the Schrödinger operator with random potential to conclude the assertion (cf. Kirsch–Martinelli [4], Kirsch [3]).

Notation. We denote the definition domain of an operator $A$ by $D(A)$. The quadratic form domain of $A$ is denoted by $Q(A)$. The inner product of a Hilbert space $\mathcal{H}$ is denoted by $\langle\varphi|\psi\rangle$ ($\varphi, \psi \in \mathcal{H}$), which is linear in the second entry. We denote the space of the Schwartz functions on $\mathbb{R}^d$ by $\mathcal{S}(\mathbb{R}^d)$.

2. Energy Estimates for Magnetic Schrödinger Operators

We consider a deterministic Schrödinger operator $H = (p - A(x))^2$ with the magnetic field $B = dA$. We fix a constant $r > 0$. For $x \in \mathbb{R}^d$ and $1 \le i < j \le d$, we denote
$$D_{ij}(x) = \{y \in \mathbb{R}^d \mid y_k = x_k \text{ for } k \ne i, j,\ (x_i - y_i)^2 + (x_j - y_j)^2 \le r^2\}, \qquad \gamma_{ij}(x) = \partial D_{ij}(x).$$
We define $b_{ij}(x) \in \mathbb{T} = \mathbb{R}/(2\pi\mathbb{Z})$ by
$$b_{ij}(x) \equiv \int_{D_{ij}(x)} B_{ij}(y)\,dy_i\,dy_j \pmod{2\pi\mathbb{Z}}.$$
We identify $\mathbb{T}$ with $[-\pi, \pi)$ and set
$$W(x) = \frac{1}{(d-1)4\pi^3 r^3} \sum_{i<j} \int_{\gamma_{ij}(x)} |b_{ij}(y)|^2\,dy.$$


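Theorem 2. $H \ge W$ in the operator sense, i.e.,
$$\langle\psi|H\psi\rangle \ge \int_{\mathbb{R}^d} W(x)|\psi(x)|^2\,dx, \qquad \psi \in D(H).$$

Remark. Since $|b_{ij}(x)| \le \pi$, we have
$$0 \le W(x) \le \frac{1}{(d-1)4\pi^3 r^3}\cdot\frac{d(d-1)}{2}\cdot 2\pi r\cdot\pi^2 = \frac{d}{4r^2}, \qquad x \in \mathbb{R}^d.$$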
Proof. We denote the velocity operator by vj = pj − Aj (x),

j = 1, . . . , d.

We parameterize γij (x) as follows: For t ∈ [0, 2π r), we set   if k = i, j , xk yk (x; t) = xi + r cos(t/r) if k = i,  x + r sin(t/r) if k = j . j For ψ ∈ S(Rd ), we compute

Ex (ψ) = where y(x; ˙ t) =

d dt y(x; t).

γij (x)

|y˙ · (vψ)|2 dy,

(2.1)

If we denote

˜ ψ(t) = ψ(y(x; t)),

˜ = y(x; A(t) ˙ t) · A(y(x; t)),

then the right-hand side of (2.1) is 2πr (−i∂t − A(t)) ˜ ˜ 2 dt = ψ|h ˜ ψ, ˜ ψ(t) 0

where h is an operator on L2 ([0, 2π r)) defined by 2 ˜ hϕ(t) = (−i∂t − A(t)) ϕ(t),

ϕ ∈ D(h) = H 2 (R/(2π rZ)),

i.e., ϕ ∈ D(h) satisfies the periodic boundary condition. Then using the gauge transform t ˜ G(t) = exp(i 0 A(s)ds), we can easily observe that h is unitarily equivalent to h˜ defined by ˜ hϕ(t) = −∂t2 ϕ(t), ˜ = ϕ ∈ H 2 ((0, 2π r)) ϕ(2π r) = eib ϕ(0) , ϕ ∈ D(h) where b ≡

2πr

˜ A(t)dt (mod 2πZ). Now we note 2πr ˜ A(t)dt = y˙ · A(y(x; t))dt

0

0

=

γij (x)

Dij (x)

Bij (y)dyi dyj ≡ bij (x)

(mod 2π Z)


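by the Stokes formula. Thus $b = b_{ij}(x)$. The spectrum of $\tilde h$ is easily computed, and we learn
$$\sigma(\tilde h) = \left\{\left(\frac{b}{2\pi r} + \frac{n}{r}\right)^2 \,\middle|\, n \in \mathbb{Z}\right\}.$$
In particular,
$$\inf\sigma(h) = \inf\sigma(\tilde h) = \left(\frac{b_{ij}(x)}{2\pi r}\right)^2 = \frac{1}{4\pi^2 r^2}|b_{ij}(x)|^2.$$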
This implies ˜ ψ ˜ ≥ inf σ (h)ψ ˜ 2= Ex (ψ) = ψ|h

1 ˜ 2. |bij (x)|2 ψ 4π 2 r 2

Integrating this inequality over x ∈ Rd , we have 2πr 1 Ex (ψ)dx ≥ |bij (x)|2 |ψ(y(x; t))|2 dtdx 2r 2 d d 4π R R 0 2πr 1 |bij (y(x ; t ))|2 |ψ(x )|2 dt dx , = 4π 2 r 2 Rd 0

(2.2)

where x = y(x, t) and t ≡ t + π r (mod 2π Z). Now we estimate the left hand side of (2.2). We use the same change of variables as above to learn 2πr 2 y(x; ˙ t) · ψ(y(x; t)) dtdx Ex (ψ)dx = Rd

=

Rd

= πr

2πr 0

Rd

Rd

0

sin2 (t/r)|vi ψ(x)|2 + cos2 (t/r)|vj ψ(x)|2 dtdx

|vi ψ(x)|2 + |vj ψ(x)|2 dx.

Combining this with (2.2), we obtain |vi ψ(x)|2 + |vj ψ(x)|2 dx ≥ Rd

1 4π 3 r 3

Rd

2πr 0

|bij (y(x; t))|2 |ψ(x)|2 dtdx.

Then we sum up this estimate over 1 ≤ i < j ≤ d to conclude (d − 1)

d i=1

|vi ψ(x)|2 ≥

1 4π 3 r 3

Rd i<j

and this implies the assertion of Theorem 2.

|bij (y(x; t))|2 |ψ(x)|2 dtdx,

Example. Let H be the Schrödinger operator on R2 with constant magnetic field with B > 0. We choose r > 0 so that b = b12 = π , i.e., r = B −1/2 . Then we have W (x) = B/2, and hence H ≥ B/2. It is well-known, however, that inf σ (H ) = B, and the estimate is not optimal.
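The numbers in this example are easy to reproduce. The sketch below assumes $d = 2$ and a constant field, so that the flux through a disc of radius $r$ is $B\pi r^2$ and $W$ is constant in $x$; the function names are hypothetical. It reduces the flux to $[-\pi, \pi)$ and evaluates $W = 2\pi r\,|b|^2/(4\pi^3 r^3)$.

```python
import math


def flux_mod_2pi(B, r):
    """Flux B * pi * r^2 through a disc of radius r, reduced to [-pi, pi)."""
    b = B * math.pi * r * r
    return (b + math.pi) % (2 * math.pi) - math.pi


def W_constant_field(B, r):
    """W(x) for d = 2 and constant field B (independent of x)."""
    b = flux_mod_2pi(B, r)
    return (2 * math.pi * r) * b * b / (4 * math.pi ** 3 * r ** 3)


B = 4.0
print(W_constant_field(B, B ** -0.5))  # radius chosen so that |b| = pi
print(W_constant_field(B, 0.25))       # smaller radius, smaller flux
```

With $r = B^{-1/2}$ the value is $B/2$; for smaller radii the flux is $B\pi r^2$ without reduction and $W = B^2 r^2/2$, which is below $B/2$.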


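The following estimate (or, strictly speaking, its generalization) will be useful in the proof of Theorem 1.

Corollary 3. Let $W$ be as above, and let $H_0 = p^2$ be the free Schrödinger operator. Then
$$\inf\sigma(H) \ge \frac12\inf\sigma(H_0 + W).$$

Proof. By Kato's inequality ([6] Theorem X.33), we have
$$\langle\psi|H\psi\rangle \ge \langle\,|\psi|\,|H_0|\psi|\,\rangle, \qquad \psi \in \mathcal{S}(\mathbb{R}^d).$$
Combining this with Theorem 2, we learn
$$\langle\psi|H\psi\rangle \ge \frac12\langle\,|\psi|\,|(H_0 + W)|\psi|\,\rangle,$$
and hence
$$\inf\sigma(H) = \inf_{\psi \ne 0}\frac{\langle\psi|H\psi\rangle}{\|\psi\|^2} \ge \frac12\inf_{\psi \ne 0}\frac{\langle\,|\psi|\,|(H_0 + W)|\psi|\,\rangle}{\|\psi\|^2} \ge \frac12\inf\sigma(H_0 + W).$$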

(Note that the last inequality is in fact equality, since the ground state of the Schrödinger operator is nonnegative.)

3. Proof of Theorem 1

As in the last section, we fix $A(x)$ and consider a deterministic magnetic Schrödinger operator $H = (p - A)^2$. Let $\Lambda \subset \mathbb{R}^d$ be an open set. Then the Neumann Hamiltonian $H_\Lambda$ on $L^2(\Lambda)$ is defined by
$$H_\Lambda\psi(x) = (p - A(x))^2\psi(x), \qquad \psi \in D(H_\Lambda),$$
$$Q(H_\Lambda) = D((H_\Lambda)^{1/2}) = \{\psi \in H^1_{\mathrm{loc}}(\Lambda) \mid (p_j - A_j(x))\psi \in L^2(\Lambda),\ j = 1, \ldots, d\}.$$
Note that $Q(H_\Lambda)$ is not necessarily the same as $H^1(\Lambda)$ if $\Lambda$ is unbounded. If $\tilde\Lambda \subset \Lambda$ are open sets in $\mathbb{R}^d$ such that $|\Lambda \setminus \tilde\Lambda| = 0$, then we can regard $L^2(\tilde\Lambda) = L^2(\Lambda)$, and $H_\Lambda$ and $H_{\tilde\Lambda}$ may be considered as operators on the same Hilbert space. Since $Q(H_{\tilde\Lambda}) \supset Q(H_\Lambda)$ and the symbols are the same, we learn
$$H_{\tilde\Lambda} \le H_\Lambda \quad \text{on } L^2(\Lambda).$$
Hence, in particular, by the min-max principle ([6] Theorem XIII-1) we have
$$\frac{1}{|\Lambda|}\#\{\text{eigenvalues of } H_\Lambda \le E\} \le \frac{1}{|\Lambda|}\#\{\text{eigenvalues of } H_{\tilde\Lambda} \le E\}$$
for $E \in \mathbb{R}$ if $|\Lambda|$ is finite. We denote
$$C(L; x) = \{y \in \mathbb{R}^d \mid x_j < y_j < x_j + L,\ j = 1, \ldots, d\}$$
for $L > 0$, $x \in \mathbb{R}^d$. In particular, we write $\Lambda_L = C(L, 0)$. We will need the following a priori bound on the number of eigenvalues.


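Lemma 4. Suppose that $|B(x)| \le M$ with $M < \infty$. Then there exists a constant $C > 0$ depending only on $d$ and $M$ such that
$$\frac{1}{|\Lambda_L|}\#\{\text{eigenvalues of } H_{\Lambda_L} \le E\} \le C(1 + E^{d/2})$$
for any $E \ge 0$ and for any integer $L > 0$.

Proof. Let $\Lambda = \Lambda_L$, and let
$$\tilde\Lambda = \bigcup\{C(1, n) \mid n \in \mathbb{Z}^d,\ 0 \le n_j < L,\ j = 1, \ldots, d\}.$$
Then $\tilde\Lambda \subset \Lambda$ and $|\Lambda \setminus \tilde\Lambda| = 0$, and hence it suffices to estimate
$$\frac{1}{|\Lambda_L|}\#\{\text{e.v.'s of } H_{\tilde\Lambda} \le E\} = L^{-d}\sum_{0 \le n_j < L}\#\{\text{e.v.'s of } H_{C(1,n)} \le E\}. \tag{3.1}$$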
We note that the number of eigenvalues is independent of the choice of gauge. Hence we may choose $A(x)$ on each $C(1, n)$ so that $|A_j(x)| \le M$ ($j = 1, \ldots, d$). Then
$$H_{C(1,n)} = p^2 - A\cdot p - p\cdot A + A^2 = \frac12 p^2 + \frac12(p - 2A)^2 - A^2 \ge \frac12 p^2 - dM^2.$$
Hence we learn
$$\#\{\text{e.v.'s of } H_{C(1,n)} \le E\} \le \#\{\text{e.v.'s of } (\tfrac12 p^2 - dM^2) \le E\} \le c_1(E + dM)^2 \le c_2(1 + E^{d/2}) \tag{3.2}$$
with some $c_1, c_2 > 0$. The lemma now follows from (3.1) and (3.2).
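The counting step behind (3.2) can be illustrated by brute force. The sketch below assumes the standard fact that the Neumann Laplacian $p^2$ on the unit cube has eigenvalues $\pi^2|n|^2$ with $n \in \{0, 1, 2, \ldots\}^d$, so the comparison operator $\frac12 p^2 - dM^2$ has eigenvalues $\frac12\pi^2|n|^2 - dM^2$. The function name and the truncation parameter `n_max` are hypothetical choices for this example.

```python
import math
from itertools import product


def count_neumann_levels(E, d, M, n_max=50):
    """Count eigenvalues of (1/2) p^2 - d M^2 on the unit cube that are <= E.

    Neumann eigenvalues of p^2 on [0,1]^d are pi^2 |n|^2 with n in {0,1,...}^d.
    The truncation n_max is harmless as long as
    E + d M^2 < (1/2) pi^2 (n_max + 1)^2.
    """
    count = 0
    for n in product(range(n_max + 1), repeat=d):
        level = 0.5 * math.pi ** 2 * sum(k * k for k in n) - d * M * M
        if level <= E:
            count += 1
    return count


for E in (0.0, 10.0, 100.0, 1000.0):
    n_levels = count_neumann_levels(E, 2, 1.0)
    print(E, n_levels, n_levels / (1 + E))  # ratio to 1 + E^{d/2} for d = 2
```

For $d = 2$ the printed ratio stays bounded as $E$ grows, which is the behaviour the bound $c_2(1 + E^{d/2})$ expresses.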

We also need a generalization of the results of the last section for the Neumann Hamiltonian. Let $r > 0$ and $W(x)$ be as in the last section. For $\Lambda \subset \mathbb{R}^d$, we set
$$W_\Lambda(x) = \begin{cases} W(x) & \text{if } \operatorname{dist}(x, \Lambda^c) > r, \\ 0 & \text{if } \operatorname{dist}(x, \Lambda^c) \le r. \end{cases}$$
Then, by exactly the same argument as in the last section, we can prove the following estimates:

Lemma 5. $H_\Lambda \ge W_\Lambda$ as operators on $L^2(\Lambda)$. Hence, in particular,
$$\inf\sigma(H_\Lambda) \ge \frac12\inf\sigma(H_{0,\Lambda} + W_\Lambda),$$
where $H_{0,\Lambda}$ denotes the free Schrödinger operator with the Neumann boundary condition on $\Lambda$.


Proof of Theorem 1. Here we suppose that $B = B^\omega$ satisfies Assumption A. Let $L > 0$ and set
$$k_L(E) = \frac{1}{|\Lambda_L|}\mathbf{E}\bigl(\#\{\text{e.v.'s of } H^\omega_{\Lambda_L} \le E\}\bigr), \qquad E \in \mathbb{R},$$
where $H^\omega_{\Lambda_L}$ is the Neumann Hamiltonian on $\Lambda_L$. Then we can show
$$k(E) \le k_L(E) \quad \text{for any } E \text{ and } L, \tag{3.3}$$
using exactly the same argument as in the case of the Schrödinger operator with random potential (cf. [3] Chapter 7, Corollary 2). Now we apply the argument of Kirsch–Martinelli [4]. It follows from Assumption A-(iii) that $W(x)$ satisfies Assumption B-(ii) of [4]. By Assumption A-(iv), we have
$$\mathbf{E}\left(\int_{C_0} |W(x)|\,dx\right) > 0,$$
and hence $\mathbf{E}(|\{x \in C_0 \mid W(x) = 0\}|) < 1$. Then we learn from the argument of Sect. 3 of [4] that there are $c_1, c_2 > 0$ such that if $L \sim c_1 E^{-1/2}$ then
$$\mathbf{P}\bigl(\inf\sigma(H_{0,\Lambda_L} + W_{\Lambda_L}) \le E\bigr) \le \exp(-c_2 E^{-d/2}), \qquad E \sim 0.$$
We note that the support of $W - W_{\Lambda_L}$ is contained in a neighborhood of $\partial\Lambda_L$ of width $r$, and its volume is $O(L^{d-1})$. Hence the difference of $W$ and $W_{\Lambda_L}$ does not affect the large deviation argument. This and Lemma 5 imply
$$\mathbf{P}\bigl(\inf\sigma(H^\omega_{\Lambda_L}) \le E\bigr) \le \exp(-c_3 E^{-d/2})$$
with $c_3 > 0$. Combining this with Lemma 4, we have
$$k_L(E) \le C(1 + E^{d/2})\cdot\mathbf{P}\bigl(\inf\sigma(H_{\Lambda_L}) \le E\bigr) \le \exp(-c_4 E^{-d/2})$$
with $c_4 > 0$ if $E$ is sufficiently small. Theorem 1 follows from this and (3.3).

References
1. Avron, J., Herbst, I., Simon, B.: Schrödinger operators with magnetic fields I: General interactions. Duke Math. J. 45, 847–883 (1978)
2. Carmona, R., Lacroix, J.: Spectral Theory of Random Schrödinger Operators. Basel–Boston: Birkhäuser, 1990
3. Kirsch, W.: Random Schrödinger operators. In: Schrödinger Operators, H. Holden, A. Jensen, eds., Lecture Notes in Physics 345, Berlin–Heidelberg–New York: Springer, 1989
4. Kirsch, W., Martinelli, F.: Large deviations and Lifshitz singularity of the integrated density of states of random Hamiltonians. Commun. Math. Phys. 89, 27–40 (1983)
5. Nakamura, S.: Lifshitz tail for 2D discrete Schrödinger operator with random magnetic field. Preprint 1999 Sep. (mp_arc:99-420). To appear in Ann. Henri Poincaré
6. Reed, M., Simon, B.: The Methods of Modern Mathematical Physics, Vol. I–IV. New York: Academic Press, 1972–1979
7. Ueki, N.: Simple examples of Lifshitz tails in Gaussian random magnetic fields. To appear in Ann. Henri Poincaré

Communicated by B. Simon

Commun. Math. Phys. 214, 573 – 592 (2000)


Homotopy Classes for Stable Periodic and Chaotic Patterns in Fourth-Order Hamiltonian Systems

W. D. Kalies1, J. Kwapisz2, J. B. VandenBerg3, R. C. A. M. VanderVorst3,4

1 Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA. E-mail: [email protected]
2 Department of Mathematical Sciences, Montana State University-Bozeman, Bozeman, MT 59717-2400, USA. E-mail: [email protected]
3 Department of Mathematical Sciences, University of Leiden, Niels Bohrweg 1, 2333 CA Leiden, The Netherlands. E-mail: [email protected]; [email protected]
4 CDSNS, Georgia Institute of Technology, Atlanta, GA 30332, USA

Received: 6 April 1999 / Accepted: 2 May 2000

Abstract: We investigate periodic and chaotic solutions of Hamiltonian systems in $\mathbb{R}^4$ which arise in the study of stationary solutions of a class of bistable evolution equations. Under very mild hypotheses, variational techniques are used to show that, in the presence of two saddle-focus equilibria, minimizing solutions respect the topology of the configuration plane punctured at these points. By considering curves in appropriate covering spaces of this doubly punctured plane, we prove that minimizers of every homotopy type exist and characterize their topological properties.

1. Introduction

This work is a continuation of [?] where we developed a constrained minimization method to study heteroclinic and homoclinic local minimizers of the action functional
\[ J_I[u] = \int_I j(u, u', u'')\,dt = \int_I \Bigl[ \frac{\gamma}{2}|u''|^2 + \frac{\beta}{2}|u'|^2 + F(u) \Bigr] dt, \tag{1.1} \]
which are solutions of the equation
\[ \gamma u'''' - \beta u'' + F'(u) = 0 \tag{1.2} \]

with $\gamma, \beta > 0$. This equation with a double-well potential $F$ has been proposed in connection with certain models of phase transitions. For brevity we will omit a detailed background of this problem and refer only to those sources required in the proofs of the results. A more extensive history and reference list are provided in [?], to which we refer the interested reader. The above equation is Hamiltonian with
\[ H = -\gamma u' u''' + \frac{\gamma}{2}|u''|^2 + \frac{\beta}{2}|u'|^2 - F(u). \tag{1.3} \]

This work was supported by grants ARO DAAH-0493G0199 and NIST G-06-605.


The configuration space of the system is the $(u, u')$-plane, and solutions to (1.2) can be represented as curves in this plane. Initially these curves do not appear to be restricted in any way. However, the central idea presented here is that, when $(\pm 1, 0)$ are saddle-foci, the minimizers of $J$ respect the topology of this plane punctured at these two points, which allows for a rich set of minimizers to exist. Using the topology of the doubly punctured plane and its covering spaces, we describe the structure of all possible types of minimizers, including those which are periodic and chaotic. Since the action of the minimizers of these latter types is infinite, a different notion of minimizer is required that is reminiscent of the minimizing (Class A) geodesics of Morse [?]. Such minimizers have been intensively studied in the context of geodesic flows on compact manifolds or the Aubry–Mather theory (see e.g. [1] for an introduction). A crucial difference is that we are dealing with a non-mechanical system on a non-compact space. Nevertheless, we are able to emulate many of Morse's original arguments about how the minimizers can intersect with themselves and each other. For a precise statement of the main results we refer to Theorem 3.2 and Theorem 5.8. For related work on mechanical Hamiltonian systems we refer to [2,?] and the references therein.

Another important aspect of the techniques employed here and in [?] is the mildness of the hypotheses. In particular, our approach requires no transversality or non-degeneracy conditions, such as those found in other variational methods and dynamical systems theory, see [?]. Specifically, we will assume the following hypothesis on $F$:

(H): $F \in C^2(\mathbb{R})$, $F(\pm 1) = F'(\pm 1) = 0$, $F''(\pm 1) > 0$, and $F(u) > 0$ for $u \ne \pm 1$. Moreover there are constants $c_1$ and $c_2$ such that $F(u) \ge -c_1 + c_2 u^2$.

We will also assume for simplicity of the formulation that $F$ is even, but many analogous results will hold for nonsymmetric potentials, cf. [?].
Finally, we assume that the parameters $\gamma$ and $\beta$ are such that $u = \pm 1$ are saddle-foci, i.e. $4\gamma/\beta^2 > 1/F''(\pm 1)$. An example of a nonlinearity satisfying these conditions is $F(u) = (u^2 - 1)^2/4$, in which case (1.2) is the stationary version of the so-called extended Fisher–Kolmogorov (EFK) equation.

In [?] we classify heteroclinic and homoclinic minimizers of $J$ by a finite sequence of even integers which represent the number of times a minimizer crosses $u = \pm 1$. In order to classify more general minimizers we must consider infinite and bi-infinite sequences, as we now describe. A function $u : \mathbb{R} \to \mathbb{R}$ can be represented as a curve in the $(u, u')$-plane, and the associated curve will be denoted by $\Gamma(u)$. Removing the equilibrium points $(\pm 1, 0)$ from the $(u, u')$-plane (the configuration space) creates a space with nontrivial topology, denoted by $P = \mathbb{R}^2 \setminus \{(\pm 1, 0)\}$. In $P$ we can represent functions $u$ which have the property that $u' \ne 0$ when $u = \pm 1$, and various equivalence classes of curves can be distinguished. For example, in [?] we considered classes of curves that terminate at the equilibrium points $(\pm 1, 0)$. Another important class consists of closed curves in $P$, which represent periodic functions. We now give a systematic description of all classes to be considered.

Definition 1.1. A type is a sequence $g = (g_i)_{i \in I}$ with $g_i \in 2\mathbb{N} \cup \{\infty\}$, where $\infty$ acts as a terminator. To be precise, $g$ satisfies one of the following conditions:
i) $I = \mathbb{Z}$, and $g \in 2\mathbb{N}^{\mathbb{Z}}$ is referred to as a bi-infinite type.
ii) $I = \{0\} \cup \mathbb{N}$, and $g = (\infty, g_1, g_2, \dots)$ with $g_i \in 2\mathbb{N}$ for all $i \ge 1$, or $I = -\mathbb{N} \cup \{0\}$, and $g = (\dots, g_{-2}, g_{-1}, \infty)$ with $g_i \in 2\mathbb{N}$ for all $i \le -1$. In these cases $g$ is referred to as a semi-terminated type.


iii) $I = \{0, \dots, N+1\}$ with $N \ge 0$, and $g = (\infty, g_1, \dots, g_N, \infty)$ with $g_i \in 2\mathbb{N}$. In this case $g$ is referred to as a terminated type.

These types will define function classes using the vector $g$ to count the crossings of $u$ at the levels $u = \pm 1$. Since there are two equilibrium points, we introduce the notion of parity denoted by $p$, which will be equal to either 0 or 1.

Definition 1.2. A function $u \in H^2_{loc}(\mathbb{R})$ is in the class $M(g, p)$ if there are nonempty sets $\{A_i\}_{i \in I}$ such that
i) $u^{-1}(\pm 1) = \bigcup_{i \in I} A_i$,
ii) $\# A_i = g_i$ for $i \in I$,
iii) $\max A_i < \min A_{i+1}$,
iv) $u(A_i) = (-1)^{i+p+1}$, and
v) $\bigcup_{i \in I} A_i$ consists of transverse crossings of $\pm 1$, i.e., $u'(x) \ne 0$ for $x \in A_i$.
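To make the bookkeeping in Definition 1.2 concrete, the following minimal sketch groups the crossings of the levels $u = \pm 1$ in sampled data into consecutive blocks and counts them. It is an illustration only: the function name `crossing_counts`, the sampling grid, and the test profile $u(t) = 1.5 \sin t$ are assumptions made here, and the code presumes the sampling is fine enough that no step jumps across both levels and no sample lands exactly on a level.

```python
import math


def crossing_counts(us, levels=(-1.0, 1.0)):
    """Group consecutive crossings of the given levels in sampled values.

    Returns a list of (level, count) pairs: each pair records how many
    sign changes of u - level occur in a row before the other level is
    crossed. This mimics the vector of counts g for a finite window.
    Hypothetical helper, not taken from the paper.
    """
    groups = []  # each entry is [level, number_of_crossings]
    for a, b in zip(us, us[1:]):
        for level in levels:
            # A strict sign change of u - level between two samples is
            # treated as one transverse crossing of that level.
            if (a - level) * (b - level) < 0.0:
                if groups and groups[-1][0] == level:
                    groups[-1][1] += 1
                else:
                    groups.append([level, 1])
    return [tuple(g) for g in groups]


# Illustrative profile: two full oscillations of amplitude 1.5.
ts = [4.0 * math.pi * k / 4000 for k in range(4001)]
us = [1.5 * math.sin(t) for t in ts]
print(crossing_counts(us))
```

For this profile each excursion above $+1$ or below $-1$ contributes a block of two crossings, which is the pattern of a window taken from a function of bi-infinite type $(\dots, 2, 2, 2, \dots)$.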

Note that by Definition 1.1, a function $u$ in any class $M(g, p)$ has infinitely many crossings of $\pm 1$. Definition 1.2 is similar to the definition of the class $M(g)$ in [?] except that here it is assumed that all crossings of $\pm 1$ are transverse. Only finitely many crossings are assumed to be transverse in [?] so that the classes $M(g)$ would be open subsets of $\chi + H^2(\mathbb{R})$. Since we will not directly minimize over $M(g, p)$, we now require transversality of all crossings of $\pm 1$ to guarantee that $\Gamma(u) \in P$. However, note that the minimizers found in [?] are indeed contained in classes $M(g, p)$ as defined above, where the types $g$ are terminated.

The classes $M(g, p)$ are nonempty for all pairs $(g, p)$. Conversely, any function $u \in H^2_{loc}(\mathbb{R})$ is contained in the closure of some class $M(g, p)$ with respect to the complete metric on $H^2_{loc}(\mathbb{R})$ given by $\rho(u, v) = \sum_i 2^{-i} \min\{1, \|u - v\|_{H^2(-i,i)}\}$, cf. [?]. That is, if we define $\overline{M}(g, p) := \{u \in H^2_{loc}(\mathbb{R}) \mid \exists\, u_n \in M(g, p) \text{ with } u_n \to u \text{ in } H^2_{loc}(\mathbb{R})\}$, then $H^2_{loc}(\mathbb{R}) = \bigcup_{(g,p)} \overline{M}(g, p)$. Note that the functions in $\partial M(g, p) := \overline{M}(g, p) \setminus \mathrm{int}(M(g, p))$ have tangencies at $u = \pm 1$ and thus are limit points of more than one class.

In the case of an infinite type, shifts of $g$ can give rise to the same function class. Therefore certain infinite types need to be identified. Let $\sigma$ be the shift map defined by $\sigma(g)_i = g_{i+1}$ and let the map $\tau : \{0, 1\} \to \{0, 1\}$ be defined by $\tau(p) = (p + 1) \bmod 2 = |p - 1|$. Two infinite types $(g, p)$ and $(g', p')$ are equivalent if $g' = \sigma^n(g)$ and $p' = \tau^n(p)$ for some $n \in \mathbb{Z}$, and this implies $M(g, p) = M(g', p')$. A bi-infinite type $g$ is periodic if there exists an integer $n$ such that $\sigma^n(g) = g$.

When the domain of integration is $\mathbb{R}$, the action $J[u]$ given in (1.1) is well-defined only for terminated types $g$ and $u \in M(g, p) \cap \{\chi_p + H^2(\mathbb{R})\}$, where $\chi_p$ is a smooth function from $(-1)^{p+1}$ to $(-1)^p$. For semi-terminated types or infinite types the action $J$ is infinite for every $u \in M(g, p)$. In Sect.
2, we will define an alternative notion of minimizer in order to overcome this difficulty. The primary goal of this paper is to prove the following theorem, but we also prove additional results about the structure and relationships between various types of minimizers. Theorem 1.3. If F satisfies Hypothesis (H) and is even, then for any type g and parity p there exists a minimizer of J in M(g, p) in the sense of Definition 2.1. Moreover, if g is periodic, then there exists a periodic minimizer in M(g, p). In Sects. 5 and 6 we show that other properties of the symbol sequences, such as symmetry, are reflected in the corresponding minimizers. The classification of minimizers by


symbol sequences has other properties in common with symbolic dynamics; for example, if a type is asymptotically periodic in both directions, then there exists a minimizer of that type which is a heteroclinic connection between two periodic minimizers. The minimizers discussed here all lie in the 3-dimensional "energy-manifold" $M_0 = \{(u, u', u'', u''') \mid H(u, u', u'', u''') = 0\}$. Exploiting certain properties of minimizers that are established in this paper, we can deduce various linking and knotting characteristics when they are represented as smooth curves in $M_0$ [?,?]. The minimizers found in this paper are also used in [?] to construct stable patterns for the evolutionary EFK equation on a bounded interval, and the dynamics of the evolutionary EFK is discussed in [?].

Some notation used in this paper was previously introduced in [?]. While we have attempted to present a self-contained analysis, we have avoided reproducing details (particularly in Sect. 5.1) which are not central to the ideas presented here, and which are thoroughly explained in [?].

2. Definition of Minimizer

For every compact interval $I \subset \mathbb{R}$ the restricted action $J_I$ is well-defined for all types. When we restrict $u$ to an interval $I$, we can define its type and parity relative to $I$, which we denote by $(g(u|_I), p(u|_I))$. Namely, let $u \in M(g, p)$. It is clear that $(u, u')|_{\partial I} \ne (\pm 1, 0)$ for any bounded interval $I$. Then $g(u|_I)$ is defined to be the finite-dimensional vector which counts the consecutive instances of $u|_I = \pm 1$, and $p(u|_I)$ is defined such that the first time $u|_I = \pm 1$ in $I$ happens at $(-1)^{p+1}$. Note that the components of $g(u|_I)$ are not necessarily all even, since the first and the last entries may be odd. We are now ready to state the definition of a (global) minimizer in $M(g, p)$.

Definition 2.1.
A function $u \in M(g, p)$ is called a minimizer for $J$ over $M(g, p)$ if and only if for every compact interval $I$ the number $J_I[u|_I]$ minimizes $J_{I'}[v|_{I'}]$ over all functions $v \in M(g, p)$ and all compact intervals $I'$ such that $(v, v')|_{\partial I'} = (u, u')|_{\partial I}$ and $(g(v|_{I'}), p(v|_{I'})) = (g(u|_I), p(u|_I))$.

The pair $(g(u|_I), p(u|_I))$ defines a homotopy class of curves in $P$ with fixed end points $(u, u')|_{\partial I}$. The above definition says that a function $u$, represented as a curve $\Gamma(u)$ in $P$, is a minimizer if and only if for any two points $P_1$ and $P_2$ on $\Gamma(u)$, the segment $\Gamma(P_1, P_2) \subset \Gamma(u)$ connecting $P_1$ and $P_2$ is the most $J$-efficient among all connections $\Gamma'(P_1, P_2)$ between $P_1$ and $P_2$ that are induced by a function $v$ and are of the same homotopy type as $\Gamma(P_1, P_2)$, regardless of the length of the interval needed to parametrize the curve $\Gamma'(P_1, P_2)$. As we mentioned in the introduction, this is analogous to the length minimizing geodesics of Morse and Hedlund and the minimizers in the Aubry–Mather theory. The set of all (global) minimizers in $M(g, p)$ will be denoted by $CM(g, p)$.

Lemma 2.2. Let $u \in M(g, p)$ be a minimizer. Then $u \in C^4(\mathbb{R})$ and $u$ satisfies Eq. (1.2). Moreover, $u$ satisfies the relation $H(u, u', u'', u''') = 0$, i.e. the associated orbit lies on the energy level $H = 0$.

Proof. From the definition of $M(g, p)$, on any bounded interval $I \subset \mathbb{R}$ there exists $\varepsilon_0(I) > 0$ sufficiently small such that $u + \varphi \in M(g, p)$ for all $\varphi \in H^2_0(I)$ with $\|\varphi\|_{H^2} < \varepsilon \le \varepsilon_0$. Therefore $J_I[u + \varphi] \ge J_I[u]$ for all such functions $\varphi$, which implies that $dJ_I[u] = 0$ for any bounded interval $I \subset \mathbb{R}$, and thus $u$ satisfies (1.2).


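To prove the second statement we argue as follows. Since $u \in M(g, p)$, there exists a bounded interval $I$ such that $u'|_{\partial I} = 0$. Introducing the rescaled variable $s = t/T$ with $T = |I|$ and $v(s) = u(t)$, we have
\[ J_I[u] = J[T, v] \equiv \int_0^1 \Bigl[ \frac{1}{T^3} \frac{\gamma}{2} |v''|^2 + \frac{1}{T} \frac{\beta}{2} |v'|^2 + T F(v) \Bigr] ds, \tag{2.1} \]
which decouples $u$ and $T$. Since $u'|_{\partial I} = 0$ we see from Definition 2.1 that $J[T \pm \varepsilon, v] \ge J_T[u] = J[T, v]$. The smoothness of $J$ in the variable $T > 0$ implies that $\frac{\partial}{\partial \tau} J[\tau, v] \big|_{\tau = T} = 0$. Differentiating yields
\[ \begin{aligned} \frac{\partial}{\partial \tau} J[\tau, v] &= \int_0^1 \Bigl[ -\tau^{-4} \frac{3\gamma}{2} |v''|^2 - \tau^{-2} \frac{\beta}{2} |v'|^2 + F(v) \Bigr] ds \\ &= \tau^{-1} \int_0^\tau \Bigl[ -\frac{3\gamma}{2} |u''|^2 - \frac{\beta}{2} |u'|^2 + F(u) \Bigr] dt \\ &= -\tau^{-1} \int_0^\tau H(u, u', u'', u''')\,dt \equiv -E. \end{aligned} \]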
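The last equality in this computation rests on one integration by parts, which may be worth spelling out. The sketch below assumes the form of the Hamiltonian in (1.3), namely $H = -\gamma u' u''' + \frac{\gamma}{2}|u''|^2 + \frac{\beta}{2}|u'|^2 - F(u)$, and uses that $u' = 0$ at both endpoints of the interval, here written as $[0, \tau]$.

```latex
\begin{align*}
\int_0^\tau -\gamma u' u''' \,dt
  &= \bigl[ -\gamma u' u'' \bigr]_0^\tau + \gamma \int_0^\tau |u''|^2 \,dt
   = \gamma \int_0^\tau |u''|^2 \,dt, \\
\int_0^\tau H(u, u', u'', u''') \,dt
  &= \int_0^\tau \Bigl[ \frac{3\gamma}{2} |u''|^2 + \frac{\beta}{2} |u'|^2 - F(u) \Bigr] dt .
\end{align*}
```

Since $H$ is constant along solutions of (1.2), the integral equals $\tau$ times that constant, which is the quantity denoted $E$.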

Thus $E = 0$, and $H(u, u', u'', u''') = 0$ for $t \in I$. This immediately implies that $H = 0$ for all $t \in \mathbb{R}$.

The minimizers for $J$ found in [?] also satisfy Definition 2.1, and we restate one of the main results of [?].

Proposition 2.3. Suppose $F$ is even and satisfies (H), and $\beta, \gamma > 0$ are chosen such that $u = \pm 1$ are saddle-focus equilibria. Then for any terminated type $g$ with parity either 0 or 1 there exists a minimizer $u \in M(g, p)$ of $J$.

From Definition 1.2, the crossings of $u \in M(g, p)$ with $\pm 1$ are transverse and hence isolated. We adapt from [?] the notion of a normalized function with a few minor changes to reflect the fact that we now require every crossing of $\pm 1$ to be transverse.

Definition 2.4. A function $u \in M(g, p)$ is normalized if, between each pair $u(a)$ and $u(b)$ of consecutive crossings of $\pm 1$, the restriction $u|_{(a,b)}$ is either monotone or $u|_{(a,b)}$ has exactly one local extremum.

Clearly, the case of $u|_{(a,b)}$ being monotone can occur only between two crossings at different levels $\pm 1$, in which case we say that $u$ has a transition on $[a, b]$.

Lemma 2.5. If $u \in CM(g, p)$, then $u$ is normalized.

Proof. Since $u \in M(g, p)$, all crossings of $u = \pm 1$ are transverse, i.e. $u' \ne 0$. Thus for any critical point $t_0 \in \mathbb{R}$, $u(t_0) \ne \pm 1$, and the Hamiltonian relation from Lemma 2.2 and (1.3) implies that $\gamma u''(t_0)^2/2 = F(u(t_0)) > 0$. Therefore $u$ is a Morse function, and between any two consecutive crossings of $\pm 1$ there are only finitely many critical points. Now on any interval between consecutive crossings where $u$ is not normalized, the clipping lemmas of Sect. 3 in [?] can be applied to obtain a more $J$-efficient function, which contradicts the fact that $u$ is a minimizer.
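Proposition 2.3 and the results that follow assume that $u = \pm 1$ are saddle-foci, i.e. $4\gamma/\beta^2 > 1/F''(\pm 1)$. The sketch below checks this condition numerically through the characteristic equation $\gamma \lambda^4 - \beta \lambda^2 + F''(\pm 1) = 0$ of the linearization of (1.2) at an equilibrium. The function names and the sample parameter values are assumptions chosen for illustration; for the EFK nonlinearity $F(u) = (u^2 - 1)^2/4$ one has $F''(\pm 1) = 2$.

```python
import cmath


def linearization_eigenvalues(gamma, beta, fpp):
    """Roots of gamma*lam**4 - beta*lam**2 + fpp = 0.

    fpp stands for F''(+-1). The quartic is a quadratic in mu = lam**2,
    so the four roots are +-sqrt(mu) for the two values of mu.
    Hypothetical helper written for this illustration.
    """
    disc = cmath.sqrt(beta * beta - 4.0 * gamma * fpp)
    roots = []
    for mu in ((beta + disc) / (2.0 * gamma), (beta - disc) / (2.0 * gamma)):
        lam = cmath.sqrt(mu)
        roots.extend([lam, -lam])
    return roots


def is_saddle_focus(gamma, beta, fpp):
    """Condition 4*gamma/beta**2 > 1/F''(+-1) from the text."""
    return 4.0 * gamma / beta**2 > 1.0 / fpp


# EFK example: F(u) = (u**2 - 1)**2 / 4, so F''(+-1) = 2.
for gamma, beta in ((1.0, 1.0), (1.0, 3.0)):
    print(gamma, beta, is_saddle_focus(gamma, beta, 2.0),
          linearization_eigenvalues(gamma, beta, 2.0))
```

When the condition holds, all four eigenvalues are genuinely complex with nonzero real part, which is the oscillatory approach to the equilibrium that the local theory exploits; when it fails (with $\beta > 0$), the eigenvalues are real.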


3. Minimizers of Arbitrary Type In this section we will introduce a notion of convergence of types which will be used in Sect. 5.2 to establish the existence of minimizers in every class M(g, p) by building on the results proved in [?].

Definition 3.1. Consider a sequence of types $(g^n, p^n) = ((g^n_i)_{i \in I_n}, p^n)$ and a type $(g, p) = ((g_i)_{i \in I}, p)$. The sequence $(g^n, p^n)$ limits to the type $(g, p)$ if and only if there exist numbers $N_n \in 2\mathbb{Z}$ such that $g^n_{i + N_n + p^n - p} \to g_i$ for all $i \in I$ as $n \to \infty$. We will abuse notation and write $(g^n, p^n) \to (g, p)$.

We should point out that a sequence of types can limit to more than one type. For example the sequence $(g^n, 0) = ((\infty, 2, 2, n, 4, 4, 4, 4, n, 2, 2, 2, \dots), 0)$ limits to the types $((\infty, 2, 2, \infty), 0)$, $((\infty, 4, 4, 4, 4, \infty), 1)$ and $((\infty, 2, 2, 2, \dots), 0)$.

Theorem 3.2. Let $(g^n, p^n) \to (g, p)$ and $u_n \in CM(g^n, p^n)$ with $\|u_n\|_{1,\infty} \le C$ for all $n$. Then there exists a subsequence $u_{n_k}$ such that $u_{n_k} \to u \in M(g, p)$ in $C^4_{loc}(\mathbb{R})$, and $u$ is a minimizer in the sense of Definition 2.1, i.e. $u \in CM(g, p)$.

Proof. This proof requires arguments developed in [?] to which the reader is referred for certain details. The idea is to take the limit of $u_n$ restricted to bounded intervals. We define the numbers $N_n$ as in Definition 3.1, and we denote the convex hull of $A_i$ by $I_i = \mathrm{conv}(A_i)$. Due to translation invariance we can pin the functions $u_n$ so that $u_n(0) = (-1)^{p+1}$, which is the beginning of the transition between $I^n_{N_n + p^n - p}$ and $I^n_{1 + N_n + p^n - p}$. Due to the assumed a priori bound and interpolation estimates which can be found in the appendix to [?], there is enough regularity to yield a limit function $u$ as a $C^4_{loc}$-limit of $u_n$, after perhaps passing to a subsequence. Moreover $u$ satisfies the differential equation (1.2) on $\mathbb{R}$. The question that remains is whether $u \in M(g, p)$.

To simplify notation we will now assume that $N_n = 0$ and $p^n = p = 0$. Fixing a small $\delta > 0$, we define $I^n_i(\delta) \supset I^n_i$ as the smallest interval containing $I^n_i$ such that $u|_{\partial I^n_i(\delta)} = (-1)^{i+1} - (-1)^{i+1}\delta$ (if $g$ is a (semi-)terminated type then $I^n_i(\delta)$ may be a half-line). The interval of transition between $I^n_i(\delta)$ and $I^n_{i+1}(\delta)$ is denoted by $L^n_i(\delta)$. To see that $u \in M(g, p)$, one has to eliminate the two possibilities that a priori may lead to the loss or creation of crossings in the limit so that $u \notin M(g, p)$: the distance between two consecutive crossings in $u_n$ could grow without bound, or $u$ could possess tangencies at $u = \pm 1$. Due to the a priori estimates in $W^{1,\infty}$ we have the following bounds on $J$:
\[ J[u_n|_{I^n_i(\delta)}] \le C \quad \text{and} \quad J[u_n|_{L^n_i(\delta)}] \le C', \tag{3.1} \]

where $C$ and $C'$ are independent of $n$ and $i$. Indeed, note that for $n$ large enough the homotopy type of $u_n$ on the intervals $I^n_i(\delta)$ is constant by the definition of convergence of types. Since the functions $u_n$ are minimizers, $J[u_n|_{I^n_i(\delta)}]$ is less than the action of any test function of this homotopy type satisfying the a priori bounds on $u$ and $u'$ on $\partial I^n_i(\delta)$ (see [?, Sect. 6] for a similar test function argument). The estimate $|L^n_i(\delta)| \le C(\delta)$ is immediately clear from Lemma 5.1 of [?]. We now need to show that the distance between two crossings of $(-1)^{i+1}$ within the interval $I^n_i(\delta)$ cannot tend to infinity. First we will deal with the case when $g^n_i$ is finite for all $n$. Suppose that the distance between consecutive crossings of $(-1)^{i+1}$ in $I^n_i(\delta)$ tends to infinity as $n \to \infty$. Due to Inequality (3.1) and Lemma 2.5, minimizers have exactly one extremum between


crossings of $(-1)^{i+1}$, and hence for any $\varepsilon > 0$ there exist subintervals $K_n \subset I^n_i(\delta)$ with $|K_n| \to \infty$, such that $0 < |u_n - (-1)^{q_n}| < \varepsilon$ on $K_n$, where $q_n \in \{0, 1\}$, and $|u_n'|_{\partial K_n}| < \varepsilon$. Taking a subsequence we may assume that $q_n$ is constant. We begin by considering the case where $q_n = i + 1$. Now $\varepsilon$ can be chosen small enough so that the local theory in [?] is applicable in $K_n$. If $|K_n|$ becomes too large then $u_n$ can be replaced by a function with lower action and with many crossings of $(-1)^{i+1}$. Subsequently, redundant crossings can be clipped out, thereby lowering the action. This implies that $u_n$ is not a minimizer in the sense of Definition 2.1, a contradiction.

The case where $q_n = i$ must be dealt with in a different manner. First, there are unique points $t_n \in K_n$ such that $u_n'(t_n) = 0$, and for these points $u_n(t_n) \to (-1)^i$ as $|K_n| \to \infty$. Let $u_n(s_n)$ be the first crossing of $(-1)^{i+1}$ to the left of $K_n$. Taking the limit (along subsequences) of $u_n(t - s_n)$ we obtain a limit function $u$ which is a solution of (1.2). If $|t_n - s_n|$ is bounded then $u$ has a tangency to $u = (-1)^i$ at some $t_* \in \mathbb{R}$. All $u_n$ lie in $\{H = 0\}$ (see (1.3)) and so does $u$, hence $u''(t_*) = 0$. Moreover $u'''(t_*) = 0$, because $u(t_*)$ is an extremum. By uniqueness of the initial value problem this implies that $u \equiv (-1)^i$, contradicting the fact that $u(0) = (-1)^{i+1}$. If $|t_n - s_n| \to \infty$, then $u$ is a monotone function on $[0, \infty)$, tending to $(-1)^i$ as $x \to \infty$, and its derivatives tend to zero (see Lemma 3 in [14] or Lemma 1, Part (ii) in [?] for details). This contradicts the saddle-focus nature of the equilibrium point.

In the case that $g^n_i = \infty$ we remark that (3.1) also holds when $I^n_i$ is a half-line. It follows from the estimates in Lemma 5.1 in [?] that $u_n \to (-1)^{i+1}$ as $x \to \infty$ or $x \to -\infty$ (whichever is applicable). From the local theory in Sect. 4 of [?] and the fact that $u_n$ is a minimizer, it follows that the derivatives of $u_n$ tend to zero.
The analysis above concerning the intervals $K_n$ and the clipping of redundant oscillations now goes on unchanged.

We have shown that the distance between two crossings of $\pm 1$ is bounded from above. Next we have to show that the limit function has only transverse crossings of $\pm 1$. This ensures that no crossings are lost in the limit. If $u$ were tangent to $(-1)^{i+1}$ in $I_i$, then we could construct a function $v \in M(g, p)$ in the same way as demonstrated in [?] by replacing tangent pieces by more $J$-efficient local minimizers and by clipping. The function $v$ still has a lower action than $u$ on a slightly larger interval (the limit function $u$ also obeys (3.1), so that the above clipping arguments still apply). Since $u_n \to u$ in $C^4_{loc}$ it follows that $J_I[u_n] \to J_I[u]$ on bounded intervals $I$. This then implies that for $n$ large enough the function $u_n$ is not a minimizer in the sense of Definition 2.1, which is a contradiction.

The limit function $u$ could also be tangent to $(-1)^i$ for some $t_0 \in I_i$. As before, such tangencies satisfy $u(t_0) - (-1)^i = u'(t_0) = u''(t_0) = u'''(t_0) = 0$, which leads to a contradiction of the uniqueness of the initial value problem. Finally, crossings of $u = \pm 1$ cannot accumulate since this would imply that at the accumulation point all derivatives would be zero, leading to the same contradiction as above. In particular, if $g^n_i \to \infty$ for some $i$, then $|I^n_i| \to \infty$ and the crossings in $A^n_j$ for $j > i$ move off to infinity and do not show in $u$, which is compatible with the convergence of types.

We have now proved that $u \in M(g, p)$ and, since $u$ is the $C^4_{loc}$-limit of minimizers, $u$ is also a minimizer in the sense of Definition 2.1.

Remark 3.3. It follows from the estimates in Theorem 3 of [?] that in the theorem above we in fact only need an $L^\infty$-bound on the sequence $u_n$.


Remark 3.4. It follows from the proof of Theorem 3.2 that there exists a constant $\delta_0 > 0$ such that for all uniformly bounded minimizers $u(t)$ it holds that $|u(t) - (-1)^{i+p}| > \delta_0$ for all $t \in I_i$ and all $i \in I$. This means that the uniform separation property discussed in [?] is uniformly satisfied by all minimizers.

4. Periodic Minimizers

A bi-infinite type $g$ is periodic if there exists an integer $n$ such that $\sigma^n(g) = g$. The (natural) definition of the period of $g$ is $\min\{n \in 2\mathbb{N} \mid \sigma^n(g) = g\}$. We will write $g = \overline{r}$, where $r = (g_1, \dots, g_n)$ and $n$ is even. Cyclic permutations of $r$ with possibly a flip of $p$ give rise to the same function class $M(\overline{r}, p)$. In reference to the type $\overline{r}$ with parity $p$ we will use the notation $(r, p)$. Any such type pair $(r, p)$ can formally be associated with a homotopy class in $\pi_1(P, 0)$ in the following way. Let $e_0$ and $e_1$ be the clockwise oriented circles of radius one centered at $(1, 0)$ and $(-1, 0)$ respectively, so that $[e_0]$ and $[e_1]$ are generators for $\pi_1(P, 0)$. Defining $\theta(r, p) = e_{\tau^{n-1}(p)}^{r_n/2} \cdots e_p^{r_1/2}$, the map $\theta : \bigcup_{k \ge 1} (2\mathbb{N})^{2k} \times \{0, 1\} \to \pi_1(P, 0)$ is an injection, and we define $\pi_1^+(P, 0)$ to be the image of $\theta$ in $\pi_1(P, 0)$. Powers of a type pair $(r, p)^k$ for $k \ge 1$ are defined by concatenation of $r$ with itself $k$ times, which is equivalent to $(r, p)^k = \theta^{-1}((\theta(r, p))^k)$.

Definition 4.1. Two pairs $(r, p)$ and $(\tilde r, \tilde p)$ are equivalent if there are numbers $p, q \in \mathbb{N}$ such that $(r, p)^p = (\tilde r, \tilde p)^q$ up to cyclic permutations. This relation, $(r, p) \sim (\tilde r, \tilde p)$, is an equivalence relation.

Example. If $(r, p) = ((2, 4, 2, 4), 0)$ and $(\tilde r, \tilde p) = ((4, 2, 4, 2, 4, 2), 1)$, then $\theta(r, p)^3 = \theta(\tilde r, \tilde p)^2$.

The equivalence class of $(r, p)$ is denoted by $[r, p]$. A type $(r, p)$ is a minimal representative for $[r, p]$ if for each $(\tilde r, \tilde p) \in [r, p]$ there is $k \ge 1$ such that $(\tilde r, \tilde p) = (r, p)^k$ up to cyclic permutations. A minimal representative is unique up to cyclic permutations. It is clear that in the representation of a periodic type $g = \overline{r}$, the type $r$ is minimal if the length of $r$ is the minimal period. Due to the above equivalences we now have that $M(\overline{r}, p) = M(\overline{\tilde r}, \tilde p)$ for all $(\tilde r, \tilde p) \in [r, p]$.

It is not a priori clear that minimizers in $M(\overline{r}, p)$ are periodic. However, we will see that among these minimizers, periodic minimizers can always be found. For a given periodic type $\overline{r}$ we consider the subset of periodic functions in $M(\overline{r}, p)$,
\[ M_{per}(\overline{r}, p) = \{u \in M(\overline{r}, p) \mid u \text{ is periodic}\}. \]
For any $u \in M_{per}(\overline{r}, p)$ and a period $T$ of $u$, $\Gamma(u|_{[0,T]})$ is a closed loop in $P$ whose homotopy type corresponds to a nontrivial element of $\pi_1^+(P, 0)$. In this correspondence there is no natural choice of a basepoint. For specificity, we will describe how to make the correspondence with the origin as the basepoint and thereafter omit it from the notation. Translate $u$ so that $u(0) = 0$. Let $\gamma : [0, 1] \to P$ be the line from $0$ to $(0, u'(0))$, and let $\gamma^*(t) = \gamma(1 - t)$. Then $\tilde\Gamma(u|_{[0,T]}) = \gamma^* \circ \Gamma(u|_{[0,T]}) \circ \gamma$, and $[\tilde\Gamma(u|_{[0,T]})] \in \pi_1^+(P, 0)$. Now define $[\Gamma(u|_{[0,T]})] \equiv [\tilde\Gamma(u|_{[0,T]})]$. Thus there exists a pair $\theta^{-1}([\Gamma(u|_{[0,T]})]) = (\tilde r, \tilde p) \in [r, p]$, with $\tilde r = r^k$ for some $k \ge 1$. Therefore we define for any $(\tilde r, \tilde p) \in [r, p]$,
\[ M_{per}(\tilde r, \tilde p) = \{u \in M_{per}(\overline{r}, p) \mid [\Gamma(u|_{[0,T]})] \sim \theta(\tilde r, \tilde p) \in \pi_1(P) \text{ for a period } T \text{ of } u\}. \]
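The Example following Definition 4.1 can be checked mechanically. Because $\theta(r, p)$ involves only positive powers of the generators, it can be stored as a plain string over the letters `0` and `1` (standing for $e_0$ and $e_1$), and conjugacy of such words reduces to equality up to rotation. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the indexing in which $r_k$ is paired with the generator $e_{\tau^{k-1}(p)}$ and the factor for $r_1$ stands rightmost.

```python
def theta_word(r, p):
    """Word for theta(r, p) as a string over '0' and '1'.

    Assumed convention (not verified against the original typesetting):
    the k-th entry r_k contributes the generator e_{tau^(k-1)(p)} raised
    to the power r_k / 2, and the factor for r_1 stands rightmost.
    """
    if len(r) % 2 != 0 or any(g <= 0 or g % 2 != 0 for g in r):
        raise ValueError("r must have even length and positive even entries")
    blocks = []
    for k, g in enumerate(r):       # k = 0 corresponds to r_1
        gen = (p + k) % 2           # tau applied k times to p
        blocks.append(str(gen) * (g // 2))
    return "".join(reversed(blocks))


def cyclically_equal(w1, w2):
    """True if w1 is a rotation of w2."""
    return len(w1) == len(w2) and w1 in (w2 + w2)


a = theta_word((2, 4, 2, 4), 0)
b = theta_word((4, 2, 4, 2, 4, 2), 1)
print(a, b, cyclically_equal(a * 3, b * 2))
```

Under this convention the third power of the first word and the second power of the second word agree up to a cyclic permutation, so the two pairs are equivalent in the sense of Definition 4.1.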


The type $\tilde r = g(u|_{[0,T]})$, with $g = \overline{r}$, is the homotopy type of $u$ relative to a period $T$. This type has an even number of entries. It follows that $M_{per}(r, p) \subset M_{per}(\tilde r, \tilde p)$ for all $(\tilde r, \tilde p) = (r, p)^k$, $k \ge 1$. Furthermore $M_{per}(\overline{r}, p) = \bigcup_{(\tilde r, \tilde p) \in [r, p]} M_{per}(\tilde r, \tilde p)$. In order to get a better understanding of periodic minimizers in $M(\overline{r}, p)$ we consider the following minimization problem:
\[ J_{per}(r, p) = \inf_{u \in M_{per}(r, p)} J_T[u] = \inf_{\substack{u \in M^T_{per}(r, p) \\ T \in \mathbb{R}^+}} J_T[u], \tag{4.1} \]
where $J_T$ is the action given in (1.1) integrated over one period of length $T$, and $M^T_{per}(r, p)$ is the set of $T$-periodic functions $u \in M_{per}(r, p)$ for which $g(u|_{[0,T]}) = r$. Note that $T$ is not necessarily the minimal period, unless $r$ is a minimal representative for $[r]$. It is clear that for $\gamma, \beta > 0$ the infima $J_{per}(r, p)$ are well-defined and are nonnegative for any homotopy type $r$. At this point it is not clear, however, that the infima $J_{per}(r, p)$ are attained for all homotopy types $r$. We will prove in Sect. 5 that existence of minimizers for (4.1) can be obtained using the existence of homoclinic and heteroclinic minimizers already established in [?].

Lemma 4.2. If $J_{per}(r, p)$ is attained for some $u \in M_{per}(r, p)$, then $u \in C^4(\mathbb{R})$ and satisfies (1.2). Moreover, since $u$ is minimal with respect to $T$ we have $H(u, u', u'', u''') = 0$, i.e. the associated periodic orbit lies in the energy surface $H = 0$.

Proof. Since $J_{per}(r, p)$ is attained by some $u \in M_{per}(r, p)$ for some period $T$, we have that $J_T[u + \varphi] - J_T[u] \ge 0$ for all $\varphi \in H^2(S^1, T)$ with $\|\varphi\|_{H^2} \le \varepsilon$ sufficiently small. This implies that $dJ_T[u] = 0$, and thus $u$ satisfies (1.2). The second part of this proof is analogous to the proof of Lemma 2.2.

We now introduce the following notation:
$CM(\overline{r}, p) = \{u \in M(\overline{r}, p) \mid u \text{ is a minimizer according to Definition 2.1}\}$,
$CM_{per}(\overline{r}, p) = \{u \in CM(\overline{r}, p) \mid u \text{ is periodic}\}$,
$CM_{per}(r, p) = \{u \in M_{per}(r, p) \mid u \text{ is a minimizer for } J_{per}(r, p)\}$.

4.1. Existence of periodic minimizers of type $r = (2, 2)^k$. If we seek periodic minimizers of type $r = (2, 2)^k$, the uniform separation property for minimizing sequences (see Sect. 5 in [?]) is satisfied in the class $M_{per}(r)$. Note that the parity is omitted because it does not distinguish different homotopy types here. The uniform separation property as defined in [?] prevents minimizing sequences from crossing the boundary of the given homotopy class. For any other periodic type the uniform separation property is not a priori satisfied. For the sake of simplicity we begin with periodic minimizers of type $(2, 2)$ and minimize $J$ in the class $M_{per}((2, 2))$. Minimizing sequences can be chosen to be normalized due to the following lemma, which we state without proof. The proof is analogous to Lemma 3.5 in [?].

Lemma 4.3. Let $u \in M_{per}((2, 2))$ and $T$ be a period of $u$. Then for every $\varepsilon > 0$ there exists a normalized function $w \in M_{per}((2, 2))$ with period $T' \le T$ such that $J_{T'}[w] \le J_T[u] + \varepsilon$.


The goal of this subsection is to prove that when $F$ satisfies (H) and $\beta, \gamma > 0$ are such that $u = \pm 1$ are saddle-foci, then $J_{per}((2, 2))$ is attained, by Theorem 4.5 below. The proof relies on the local structure of the saddle-focus equilibria $u = \pm 1$ and is a modification of arguments in [?]; hence we will provide only a brief argument. The reader is referred to [?] for further details.

In preparation for the proof of Theorem 4.5, we fix $\tau_0 > 0$, $\varepsilon_0 > 0$, and $\delta > 0$ so that the conclusion of Theorem 4.2 of [?] holds, i.e. the characterization of the oscillatory behavior of solutions near the saddle-focus equilibria $u = \pm 1$ holds. Let $u \in M^T_{per}((2, 2))$ be normalized, and let $t_0$ be such that $u(t_0) = 0$. Then $t_0$ is part of a transition from $\mp 1$ to $\pm 1$. Assume without loss of generality that this transition is from $-1$ to $1$. Define $t_- = \sup\{t < t_0 : |u(t) + 1| < \delta\}$ and $t_+ = \inf\{t > t_0 : |u(t) - 1| < \delta\}$. Then let $S(u) = \{t : |u(t) \pm 1| < \delta\}$ and $B[u, T] = |S(u) \cap [t_+, t_- + T]|$, and note that $[t_0, t_0 + T] = (S(u) \cap [t_+, t_- + T]) \cup (S(u)^c \cap [t_0, t_0 + T])$. With these definitions we can establish the following estimate (cf. Lemma 5.4 in [?]). For all $u \in M_{per}((2, 2))$ with $J_T[u] \le J_{per}((2, 2)) + \varepsilon_0$,
\[ \|u\|^2_{H^2} \le C(1 + J_{per}((2, 2)) + B[u, T]). \tag{4.2} \]

First, $\|u'\|^2_{H^1} \le C(J_{per}((2, 2)) + \varepsilon_0)$, and second, if $|u \pm 1| > \delta$, then $F(u) \ge \eta^2 u^2$, which implies that $\|u\|^2_{L^2} \le \frac{1}{\eta^2} \int_{t_0}^{t_0 + T} F(u)\,dt + (1 + \delta)^2 B[u, T] \le C(J_T[u] + B[u, T])$. Combining these two estimates proves (4.2).

For functions $u \in M^T_{per}((2, 2))$ that satisfy $J_T[u] \le J_{per}((2, 2)) + 1$, it follows from Lemma 5.1 of [?] that there exist (uniform in $u$) constants $T_1$ and $T_2$ such that $T_2 \ge |S(u)^c \cap [t_0, t_0 + T]| \ge T_1 > 0$ and thus $T > T_1$. The next step is to give an a priori upper bound on $T$ by considering the minimization problem (cf. Sect. 5 in [?])
\[ B_\varepsilon = \inf\{ B[u, T] \mid u \in M^T_{per}((2, 2)) \text{ normalized}, \ T \in \mathbb{R}^+, \text{ and } J_T[u] \le J_{per}((2, 2)) + \varepsilon \}. \]

Lemma 4.4. There exists a constant $K = K(\tau_0) > 0$ such that $B_\varepsilon \le K$ for all $0 < \varepsilon < \varepsilon_0$. Moreover, if $T_0 \equiv K + T_2$, then for any $0 < \varepsilon < \varepsilon_0$, there is a normalized $u \in M^T_{per}((2, 2))$ with $J_T[u] \le J_{per}((2, 2)) + 2\varepsilon$ and $T_1 < T \le T_0$.

Proof. Let $(u_n, T_n) \in M^{T_n}_{per}((2, 2)) \times \mathbb{R}^+$ be a minimizing sequence for $B_\varepsilon$, with normalized functions $u_n$. As in the proof of Theorem 5.5 of [?], in the weak limit this yields a pair $(\hat u, \hat T)$ such that $B[\hat u, \hat T] \le B_\varepsilon$. We now define $K((2, 2), \tau_0) = 8((2\tau_0 + 2) + 2)$. This gives two possibilities for $B[\hat u, \hat T]$: either $B[\hat u, \hat T] > K$ or $B[\hat u, \hat T] \le K$. If the former is true then we can construct (see Theorem 5.5 of [?]) a pair $(\hat v, \hat T') \in M^{\hat T'}_{per}((2, 2)) \times \mathbb{R}^+$, with $\hat v$ normalized, such that
\[ J_{\hat T'}[\hat v] < J_{\hat T}[\hat u] \le J_{per}((2, 2)) + \varepsilon \quad \text{and} \quad B[\hat v, \hat T'] < B[\hat u, \hat T] \le B_\varepsilon, \]
which is a contradiction excluding the first possibility. In the second case, where $B[\hat u, \hat T] \le K$, we can construct a pair $(\hat v, \hat T')$ with $\hat v$ normalized such that
\[ J_{\hat T'}[\hat v] < J_{\hat T}[\hat u] + \varepsilon \le J_{per}((2, 2)) + 2\varepsilon \quad \text{and} \quad B[\hat v, \hat T'] < B[\hat u, \hat T] \le K, \]
which implies that $T_1 < \hat T' < \hat T \le K + T_2 = T_0$ and concludes the proof. For details concerning these constructions, see Theorem 5.5 in [?].


Theorem 4.5. Suppose that $F$ satisfies (H) and $\beta, \gamma > 0$ are such that $u = \pm 1$ are saddle-foci, then $J_{per}((2,2)^k)$ is attained for any $k \ge 1$. Moreover, the projection of any minimizer in $CM_{per}((2,2))$ onto the $(u,u')$-plane is a simple closed curve.

Proof. By Lemma 4.4, we can choose a minimizing sequence $(u_n, T_n) \in M^{T_n}_{per}((2,2)) \times \mathbb{R}^+$, with $u_n$ normalized and with the additional properties that $\|u_n\|_{H^2} \le C$ and $T_1 < T_n \le T_0$. Since the uniform separation property is satisfied for the type $(2,2)$ this leads to a minimizing pair $(\hat u, \hat T)$ for (4.1) by following the proof of Theorem 2.2 in [?]. As for the existence of periodic minimizers of type $r = (2,2)^k$ the uniform separation property is automatically satisfied and the above steps are identical. Lemma 2.5 yields that minimizers are normalized functions and the projection of a normalized function in $M_{per}((2,2))$ is a simple closed curve in the $(u,u')$-plane.

We would like to have the same theorem for arbitrary periodic types $r$. For homotopy types that satisfy the uniform separation property the analogue of Theorem 4.5 can be proved. However, in Sect. 5 we will prove a more general result using the information about the minimizers with terminated types (homoclinic and heteroclinic minimizers) which was obtained in [?].

Remark 4.6. The existence of a $(2,2)$-type minimizer is proved here in order to obtain a priori $W^{1,\infty}$-estimates for all minimizers (Sect. 5). However, if $F$ satisfies the additional hypothesis that $F(u) \sim |u|^s$, $s > 2$ as $|u| \to \infty$, then such estimates are automatic (cf. [?,?]). In that case the existence of a minimizer of type $(2,2)$ follows from Theorem 4.14 below. To prove existence of minimizers of arbitrary type $r$ we will use an analogue of Theorem 4.14 (see Lemma 5.7 and Theorem 5.8 below).

4.2. Characterization of minimizers of type g = (2, 2). Periodic minimizers associated with $[e_0]$ or $[e_1]$ are the constant solutions $u = -1$ and $u = 1$ respectively. The simplest nontrivial periodic minimizers are those of type $r = (2,2)^k$, i.e. $r \in [(2,2)]$. These minimizers are crucial to the further analysis of the general case. The type $r = (2,2)$ is a minimal type (associated with $[e_1 e_0]$), and we want to investigate the relation between minimizers in $M((2,2))$ and periodic minimizers of type $(2,2)^k$.

Considering curves in the configuration space $\mathcal{P}$ is a convenient method for studying minimizers of type $(2,2)$. For example, minimizers in $CM((2,2))$ and $CM_{per}((2,2))$ all satisfy the property that they do not intersect the line segment $L = (-1,1) \times \{0\}$ in $\mathcal{P}$. If other homotopy types $r$ are considered, i.e. $r \notin [(2,2)]$, then minimizers represented as curves in $\mathcal{P}$ necessarily have self-intersections and they must intersect the segment $L$, which makes their comparison more complicated. We will come back to this problem in Sect. 5.
Note that for a $C^1$-function $u$ the associated curve $\Gamma(u)$ is a closed loop if and only if $u$ is a periodic function.

Lemma 4.7. For any non-periodic minimizer $u \in CM((2,2))$ and any bounded interval $I$ the curve $\Gamma[u|_I]$ has only a finite number of self-intersections. For periodic minimizers $u \in CM_{per}((2,2))$ this property holds when the length of $I$ is smaller than the minimal period.

Proof. Fix a time interval $I = [0,T]$. If $u$ is periodic, $T$ should be chosen smaller than the minimal period of $u$. Let $P = (u_0, u_0')$ be an accumulation point of self-intersections of $u|_I$. Then $P$ is a self-intersection point, and there exists a monotone sequence of times $\tau_n \in I$ converging to $t_0$ such that $\Gamma(u(\tau_n))$ are self-intersection points and $\Gamma(u(t_0)) = P$.


Also there exists a corresponding sequence $\sigma_n \in I$ with $\sigma_n \ne \tau_n$ such that $\Gamma(u(\tau_n)) = \Gamma(u(\sigma_n))$. Choosing a subsequence if necessary, $\sigma_n \to s_0$ monotonically. Since $u$ is a minimizer in $CM((2,2))$, the intervals $[\sigma_n, \tau_n]$ must contain a transition, and hence $|\tau_n - \sigma_n| > T_0 > 0$. Therefore, $s_0 \ne t_0$, and we will assume that $s_0 < t_0$ (otherwise change labels). The homotopy type of $\Gamma(u|_{[s_0,t_0]})$ is $(2,2)^k$ for some $k \ge 1$ (since $I$ is bounded). Assume that $\sigma_n$ and $\tau_n$ are increasing; the other case is similar. Using the times $\sigma_n < s_0 < \tau_n < t_0$, the curve $\Gamma^* = \Gamma[u|_{[\sigma_n-\delta,\, t_0+\delta]}]$, for $\delta$ sufficiently small, can be decomposed as $\Gamma^* = a \circ \gamma_2 \circ \gamma \circ \gamma_1 \circ b$, where $b = \Gamma(u|_{[\sigma_n-\delta,\sigma_n]})$, $\gamma_1 = \Gamma(u|_{[\sigma_n,s_0]})$, $\gamma = \Gamma(u|_{[s_0,\tau_n]})$, $\gamma_2 = \Gamma(u|_{[\tau_n,t_0]})$, and $a = \Gamma(u|_{[t_0,t_0+\delta]})$. For $n$ sufficiently large, $\gamma_1$ and $\gamma_2$ have the same homotopy type, and $\gamma_1 \ne \gamma_2$, since otherwise $u$ would be periodic with period smaller than $t_0 - \sigma_n < T$. We can now construct two more paths

$$\Gamma_1 = a \circ \gamma_1 \circ \gamma \circ \gamma_1 \circ b \quad\text{and}\quad \Gamma_2 = a \circ \gamma_2 \circ \gamma \circ \gamma_2 \circ b$$

which have the same homotopy type for $n$ sufficiently large. Since $J[\Gamma^*]$ is minimal, $J[\Gamma_1] \ge J[\Gamma^*]$ and $J[\Gamma_2] \ge J[\Gamma^*]$, and thus $J[\gamma_1] \ge J[\gamma_2]$ and $J[\gamma_2] \ge J[\gamma_1]$ which implies that $J[\gamma_1] = J[\gamma_2]$. Therefore $J[\Gamma^*] = J[\Gamma_1] = J[\Gamma_2]$, and $\Gamma_1$, $\Gamma_2$ and $\Gamma^*$ are all distinct minimizers with the same homotopy type and same boundary conditions. Since these curves all coincide along $\gamma$, the uniqueness of the initial value problem is contradicted.

An argument very similar to the one above is also used in the proof of Lemma 4.12 and is demonstrated in Fig. 4.1.

Lemma 4.8. If $r = (2,2)^k$ with $k > 1$, then $CM_{per}(r) = CM_{per}((2,2))$ and $J_{per}(r) = k \cdot J_{per}((2,2))$.

Proof. Let $u \in CM_{per}(r)$ with $r = (2,2)^k$ for $k > 1$, and let $T$ be the period¹ such that the associated curve in $\mathcal{P}$, $\Gamma(u|_{[0,T]})$, has the homotopy class of $\theta((2,2)^k)$. First we will prove that $\Gamma(u|_{[0,T]})$ is a simple closed curve in $\mathcal{P}$, and hence $u \in M_{per}((2,2))$. Suppose not, then by Lemma 4.7 the curve $\Gamma(u|_{[0,T]})$ can be fully decomposed into $k$ distinct simple closed curves $\Gamma_i$ for $i = 1, \ldots, k$ (just call the inner loop $\Gamma_1$, cut it out, and call the new inner loop $\Gamma_2$, and so on). Denote by $J_i$ the action associated with loop $\Gamma_i$, then $\sum_i J_i = J_T[u]$. Let $v_i \in M_{per}((2,2)^k)$ be the function obtained by pasting together $k$ copies of $u$ restricted to the loop $\Gamma_i$. If $v_i$ were a minimizer in $M_{per}((2,2)^k)$, then by Lemma 4.2 the functions $u$ and $v_i$ would be distinct solutions to the differential equation (1.2) which coincide over an interval. This would contradict the uniqueness of solutions of the initial value problem, and hence $v_i$ is not a minimizer, i.e. $J_T[v_i] = k \cdot J_i > J_{per}((2,2)^k)$. Consequently $J_{per}((2,2)^k) = \sum_i J_i > J_{per}((2,2)^k)$, which is a contradiction. Thus $u \in M_{per}((2,2))$ and $\Gamma(u|_{[0,T]})$ is a simple loop traversed $k$ times. Now we will show that $u \in CM_{per}((2,2))$.
Since (u) is the projection of a function into the (u, u )–plane, u traverses the loop once over the interval [0, T /k], and Jper ((2, 2)k ) = k · JT /k [u]. Suppose JT /k > Jper ((2, 2)). Then we can construct a function in Mper ((2, 2)k ) with action less than J [u] = Jper ((2, 2)k ) by gluing together k copies of a minimizer in Mper ((2, 2)), which is a contradiction. Lemma 4.9. For any k ≥ 1, CMper ((2, 2)k ) = CMper ((2, 2)) = CMper ((2, 2)). 1 One may assume without loss of generality that is a minimal period.


Proof. We have already shown in Lemma 4.8 that CMper ((2, 2)k ) = CMper ((2, 2)). We first prove that CMper ((2, 2)) ⊂ CMper ((2, 2)). Let u ∈ CMper ((2, 2)) have period T . Suppose u ∈ CMper ((2, 2)). Then there exist two points (u(t1 )) = P1 and (u(t2 )) = P2 on (u) such that the curve γ between P1 and P2 obtained by following (u) is not minimal. Replacing γ by a curve with smaller action and the same homotopy type yields a function v ∈ Mper ((2, 2)) for which J[t1 ,t2 ] [v] ≤ J[t1 ,t2 ] [u]. Choose k ≥ 0 such that kT > t2 − t1 . Then u is a minimizer in CMper ((2, 2)k ) = CMper ((2, 2)) which is a contradiction. To finish the proof of the lemma we show that CMper ((2, 2)) ⊂ CMper ((2, 2)). Let u ∈ CMper ((2, 2)) have period T . Let (u|[0,T ] ) be the associated closed curve in P and ω its winding number with respect to the segment L. Suppose JT [u] > Jper ((2, 2)ω ) = ω·Jper (2, 2). This implies the existence of a function v ∈ Mper ((2, 2)ω ) and a period T such that JT [v] < JT [u]. Choose a time t0 ∈ [0, T ] such that u(t0 ) = 1 and u (t0 ) > 0. Let P0 = (1, u (t0 )) ∈ P. There exists a δ > 0 sufficiently small such that u(t0 ± δ) > 0, u (t0 ± δ) > 0, and u does not cross ±1 in [t0 − δ, t0 + δ] except at t0 . Let P1 and P2 denote the points (u(t0 ∓ δ), u (t0 ∓ δ)) respectively. Let γ denote the piece of the curve (u) from P1 to P2 and γ ∗ the curve tracing (u) backward in time from P2 to P1 . Now choose a point P3 on (v) for which v = 1 and v > 0. We can easily construct cubic polynomials p1 and p2 for which the curve (p1 ) connects P1 to P3 and the curve (p2 ) connects P3 to P2 in P. These curves (pi ) are monotone functions, and hence the loop (p1 ) ◦ (p2 ) ◦ γ ∗ has trivial homotopy type in P. Therefore (u|[0,T ] )k ◦ γ ∼ (p2 ) ◦ (v|[0,T ] )k ◦ (p1 ) in P for any k ≥ 1, and from Definition 2.1 J [(u|[0,T ] )k ◦ γ ] ≤ J [(p2 ) ◦ (v|[0,T ] )k ◦ (p1 )]. 
Thus, k · JT [u] + J [γ ] ≤ J [p1 ] + J [p2 ] + k · JT [v], which implies 0 ≤ k(JT [u] − JT [v]) ≤ J [p1 ] + J [p2 ] − J [γ ]. These estimates lead to a contradiction for k sufficiently large.

Lemma 4.10. For any two distinct minimizers $u_1$ and $u_2$ in $CM_{per}((2,2))$, the associated curves $\Gamma(u_i)$ do not intersect.

Proof. Suppose $\Gamma(u_1)$ and $\Gamma(u_2)$ intersect at a point $P \in \mathcal{P}$. Translate $u_1$ and $u_2$ so that $\Gamma(u_1(0)) = \Gamma(u_2(0)) = P$. Define the function $u \in M_{per}((2,2)^2)$ as the periodic extension of

$$u(t) = \begin{cases} u_1(t) & \text{for } t \in [0, T_1], \\ u_2(t - T_1) & \text{for } t \in [T_1, T_1 + T_2], \end{cases}$$

where $T_i$ is the minimal period of $u_i$. Then $J_{T_1+T_2}[u] = 2 J_{per}((2,2)) = J_{per}((2,2)^2)$. By Lemma 4.8 we have $u \in CM_{per}((2,2))$, which contradicts the fact that $u_1$ and $u_2$ are distinct minimizers with $\Gamma(u_1) \ne \Gamma(u_2)$.

As a direct consequence of this lemma, the periodic orbits in $M_{per}((2,2))$ are ordered in the sense that $\Gamma(u_1)$ lies either strictly inside or outside the region enclosed by $\Gamma(u_2)$. The ordering will be denoted by $>$.

Theorem 4.11. There exists a largest and a smallest periodic orbit in $CM_{per}((2,2))$ in the sense of the above ordering, which we will denote by $u_{max}$ and $u_{min}$ respectively. Moreover $1 < \|u_{min}\|_{1,\infty} \le \|u_{max}\|_{1,\infty} \le C_0$, and $u_{min} < u < u_{max}$ for every $u \in CM_{per}((2,2))$. In particular the set $CM_{per}((2,2))$ is compact.


Proof. Either the number of periodic minimizers is finite, in which case there is nothing to prove, or the set of minimizers is infinite. Let $U = \{\Gamma(u) \mid u \in CM_{per}((2,2))\} \subset \mathcal{P}$, and let $A = U \cap \{(u,u') \mid u' = 0,\ u > 0\}$. Every minimizer in $CM_{per}((2,2))$ intersects the positive $u$-axis transversely exactly once. Moreover distinct minimizers cross this axis at distinct points by Lemma 4.10. Thus we can use $A$ as an index set and label the minimizers as $u_\alpha$ for $\alpha \in A$. Due to the a priori upper bound on minimizers (Lemma 5.1 in [?]), $A$ is a bounded set. The set $A$ is contained in the $u$-axis and hence has an ordering induced by the real numbers. This order corresponds to the order on minimizers, i.e. $\alpha < \beta$ in $A$ if and only if $u_\alpha < u_\beta$ as minimizers. Suppose $\alpha^*$ is an accumulation point of $A$. Then there exists a sequence $\alpha_n$ converging to $\alpha^*$. From Theorem 3.2 (the a priori $L^\infty$-bound on $u_{\alpha_n}$ is sufficient by Remark 3.3) we see that there exists $u \in CM((2,2))$ which is a solution to Eq. (1.2) such that $u_{\alpha_n} \to u$ in $C^1_{loc}(\mathbb{R})$. Since $u_{\alpha_n}$ is periodic and the $C^1_{loc}$-limit of a sequence of periodic functions with uniformly bounded periods (compare with the proof of Theorem 3.2 to find a uniform bound on the periods) is periodic, $u \in CM_{per}((2,2))$. By Lemma 4.9, $u \in CM_{per}((2,2))$. Furthermore $u$ corresponds to $u_{\alpha^*}$, and hence $A$ is compact. Consequently $A$ contains maximal and minimal elements. Let $u_{max}$ and $u_{min}$ be the periodic minimizers through the maximal and minimal points of $A$ respectively. This proves the theorem.

The above lemmas characterize periodic minimizers in $CM((2,2))$. Now we turn our attention to non-periodic minimizers. We conclude this subsection with a theorem that gives a complete description of the set $CM((2,2))$.

Let $u \in CM((2,2))$ be non-periodic. Suppose that $P$ is a self-intersection point of $\Gamma(u)$. Then there exist times $t_1 < t_2$ such that $\Gamma(u(t_1)) = \Gamma(u(t_2)) = P$, and $\Gamma(u|_{[t_1,t_2]})$ is a closed loop. By Lemma 4.7 there are only finitely many self-intersections on $[t_1, t_2]$. Without loss of generality we may therefore assume that $\gamma$ is a simple closed loop, i.e. we need only consider the case where $P = \Gamma(u(t_1)) = \Gamma(u(t_2))$ and $\Gamma(u|_{[t_1,t_2]})$ is a simple closed loop. We now define $\Gamma_+ = \Gamma(u|_{(t_1,\infty)})$ and $\Gamma_- = \Gamma(u|_{(-\infty,t_2)})$. We will refer to $\Gamma_\pm$ as the forward and backward orbits of $u$ relative to $P$.

Lemma 4.12. Let $u \in CM((2,2))$ be a non-periodic minimizer with at least one self-intersection. Let $P$ and $\Gamma_\pm$ be defined as above. Then the forward and backward orbits $\Gamma_\pm$ relative to $P$ do not intersect themselves. Furthermore, $P$ and $\Gamma_\pm$ are unique, and the curve $\Gamma(u)$ passes through any point in $\mathcal{P}$ at most twice.

Proof. We will prove the result for $\Gamma_+$; the argument for $\Gamma_-$ is similar. Suppose that $\Gamma_+$ has self-intersections. Define $t_* = \min\{t > t_1 \mid \Gamma(u(t)) = \Gamma(u(\tau)) \text{ for some } \tau \in (t_1, t)\}$. The minimum $t_*$ is attained by Lemma 4.7, and $t_* > t_2$ since $\gamma \equiv \Gamma(u|_{[t_1,t_2]})$ is a simple closed loop. Let $t_0 \in (t_1, t_*)$ be the point such that $\Gamma(u(t_0)) = \Gamma(u(t_*))$. This point is unique by the definition of $t_*$, and $\tilde\gamma \equiv \Gamma(u|_{[t_0,t_*]})$ is a simple closed loop. For small positive $\delta$ we define $Q = \Gamma(u(t_*))$, $B = \Gamma(u(t_1 - \delta))$, $E = \Gamma(u(t_* + \delta))$ and $\Gamma^* = \Gamma(u|_{[t_1-\delta,\, t_*+\delta]})$, see Fig. 4.1. We can decompose this curve into five parts: $\Gamma^* = \sigma_3 \circ \tilde\gamma \circ \sigma_2 \circ \gamma \circ \sigma_1$, where $\sigma_1$ joins $B$ to $P$, $\sigma_2$ joins $P$ to $Q$, $\sigma_3$ joins $Q$ to $E$, and $\gamma$ and $\tilde\gamma$ are simple closed loops based at $P$ and $Q$ respectively, see Fig. 4.1. The simple closed curves $\gamma$ and $\tilde\gamma$ go around $L$ exactly once and thus have the same homotopy type. Moreover, $\gamma \ne \tilde\gamma$ since $u$ is non-periodic.


Besides $\Gamma^*$ we can construct two other distinct paths from $B$ to $E$:

$$\Gamma_1 = \sigma_3 \circ \sigma_2 \circ \gamma \circ \gamma \circ \sigma_1 \quad\text{and}\quad \Gamma_2 = \sigma_3 \circ \tilde\gamma \circ \tilde\gamma \circ \sigma_2 \circ \sigma_1.$$

It is not difficult to see that $\Gamma_1$, $\Gamma_2$ and $\Gamma^*$ all have the same homotopy type. Since $J[\Gamma^*]$ is minimal in the sense of Definition 2.1 we have, by the same reasoning as in Lemma 4.7, that $J[\Gamma_1] \ge J[\Gamma^*]$ and $J[\Gamma_2] \ge J[\Gamma^*]$, which implies that $J[\tilde\gamma] \ge J[\gamma]$ and $J[\gamma] \ge J[\tilde\gamma]$. Hence $J[\gamma] = J[\tilde\gamma]$. Therefore $J[\Gamma_1] = J[\Gamma_2] = J[\Gamma^*]$ which gives that $\Gamma_1$, $\Gamma_2$ and $\Gamma^*$ are all distinct minimizers of the same type as curves joining $B$ to $E$. Since these curves all contain the paths $\sigma_1$, $\sigma_2$ and $\sigma_3$, and are solutions to (1.2), the uniqueness of the initial value problem is contradicted.

Finally, the curve $\Gamma(u)$ can pass through a point at most twice because it is a union of $\Gamma_+$ and $\Gamma_-$, each visiting a point at most once. Moreover, points in $\Gamma(u|_{(t_1,t_2)})$, common to both $\Gamma_+$ and $\Gamma_-$, are passed exactly once. It now follows that if there is another self-intersection besides $P$, say at $R = \Gamma(u(s_1)) = \Gamma(u(s_2))$, then $s_1 < t_1$ and $t_2 < s_2$. We conclude that the curve $\Gamma(u|_{(s_1,s_2)})$ contains $\Gamma(u|_{[t_1,t_2]})$ and therefore it is not a simple closed curve. Thus $P$ is a unique self-intersection that cuts off a simple loop.

Fig. 4.1. The forward orbit $\Gamma_+$ starting at $P$ with a self-intersection at the point $Q$. Lemma 4.12 implies that this cannot happen for non-periodic $u \in CM((2,2))$.

Lemma 4.13. Let $u \in CM((2,2))$ be non-periodic. Suppose that $u \in L^\infty(\mathbb{R})$. Then $u$ is a connecting orbit between two periodic minimizers $u_-, u_+ \in CM_{per}((2,2))$, i.e. there are sequences $t_n^-, t_n^+ \to \infty$ such that $u(t - t_n^-) \to u_-(t)$ and $u(t + t_n^+) \to u_+(t)$ in $C^4_{loc}(\mathbb{R})$.

Proof. Lemma 4.12 implies that $\Gamma_+$ is a spiral which intersects the positive $u$-axis at a bounded, monotone sequence of points $(\alpha_n, 0)$ in $\mathcal{P}$ converging to a point $(\alpha^*, 0)$. Let $t_n$ be the sequence of consecutive times such that $u(t_n) = \alpha_n$ and $u'(t_n) = 0$. Consider the sequence of minimizers in $CM((2,2))$ defined by $u_n(t) = u(t + t_n)$. By Theorem 3.2 there exists a $C^1_{loc}$-limit $u_+ \in CM((2,2))$. If $u_+$ is periodic, there is nothing more to prove. Thus suppose $u_+$ is non-periodic. Then the curve $\Gamma(u_+)$ crosses the $u$-axis infinitely many times. On the other hand, from the $C^1_{loc}$ convergence $\Gamma(u_+)$ crosses this axis only at $\alpha^*$. By Lemma 4.12, $\Gamma(u_+)$ can intersect $\alpha^*$ at most twice, which is a contradiction. The $C^4_{loc}$-convergence follows from regularity (as in the proof of Theorem 4.2). The proof of the existence of $u_-$ is similar.

Theorem 4.14. Let $u \in CM((2,2))$. Either $u$ is unbounded, $u$ is periodic and $u \in CM_{per}((2,2))$, or $u$ is a connecting orbit between periodic minimizers in $CM_{per}((2,2))$.

Proof. Let $u \in CM((2,2))$ be bounded, then $u$ is either periodic or non-periodic. In the case that $u$ is periodic it follows from Lemma 4.9 that $u \in CM_{per}((2,2))$. Otherwise if $u$ is not periodic it follows from Lemma 4.13 that $u$ is a connecting orbit between two minimizers $u_-, u_+ \in CM_{per}((2,2))$.

In Sect. 5.2 we give analogues of the above theorems for arbitrary homotopy types $r$. Notice that the option of $u \in CM((2,2))$ being unbounded in the above theorem does not occur when $F(u) \sim |u|^s$, $s > 2$ as $|u| \to \infty$.

5. Properties of Minimizers

In Sect. 4, we proved the existence of minimizers in $M_{per}((2,2))$, which will provide a priori bounds on the minimizers of arbitrary type. These bounds and Theorem 3.2 will establish the existence of such minimizers. In this section we will also prove that certain properties of a type $g$ are often reflected in the associated minimizers. The most important examples are the periodic types g = r. Although there are minimizers in every class $M(r, p)$, it is not clear a priori that among these minimizers there are also periodic minimizers. In order to prove existence of periodic minimizers for every periodic type $r$ we use the theory of covering spaces.

5.1. Existence. The periodic minimizers of type $(2,2)$ are special for the following reason. For a normalized $u \in M_{per}((2,2))$, define $D(u)$ to be the closed disk in $\mathbb{R}^2$ such that $\partial D(u) = \Gamma(u)$.

Theorem 5.1. i) If $u \in CM(r, p)$, then $\Gamma(u) \subset D(u_{min})$ for any periodic type $r \ne (2,2)$. ii) If $u \in CM(g, p)$, then $\Gamma(u) \subset D(u_{min})$ for any terminated type $g$.

Proof.
i) If $r \ne (2,2)$ then every $u \in CM(r, p)$ has the property that $\Gamma(u)$ intersects the $u$-axis between $u = \pm 1$. Suppose that $\Gamma(u)$ does not lie inside $D(u_{min})$. Then $\Gamma(u)$ must intersect $\Gamma(u_{min})$ at least twice, and let $P_1$ and $P_2$ be distinct intersection points with the property that the curve $\Gamma_1$ obtained by following $\Gamma(u)$ from $P_1$ to $P_2$ lies entirely outside of $D(u_{min})$. Let $\Gamma_2 \subset \Gamma(u_{min})$ be the curve from $P_1$ to $P_2$ following $u_{min}$, such that $\Gamma_1$ and $\Gamma_2$ are homotopic (traversing the loop $\Gamma(u_{min})$ as many times as necessary) and thus $J[\Gamma_1] = J[\Gamma_2]$ is minimal. Replacing $\Gamma_1$ by $\Gamma_2$ leads to a minimizer in $CM(r, p)$ which partially agrees with $u$. This contradicts the uniqueness of the initial value problem for (1.2).

ii) As in the previous case the associated curve $\Gamma(u)$ either intersects $\Gamma(u_{min})$ at least twice or lies completely inside $D(u_{min})$, and the proof is identical.

Corollary 5.2. For all minimizers in the above theorem, $\|u\|_{1,\infty} \le \|u_{min}\|_{1,\infty} \le C_0$.

In order to prove existence of minimizers in every class we now use the above theorem in combination with an existence result from [?].


Theorem 5.3. For any given type $g$ and parity $p$ there exists a (bounded) minimizer $u \in CM(g, p)$. Moreover $\|u\|_{1,\infty} \le C_0$, independent of $(g, p)$.

Proof. Given a type $g$ we can construct a sequence $g_n$ of terminated types such that $g_n \to g$ as $n \to \infty$. For any terminated type $g_n$ there exists a minimizer $u_n \in CM(g_n, p)$ by Proposition 2.3 (Theorem 1.3 of [?]). Clearly such a sequence $u_n$ satisfies $\|u_n\|_{1,\infty} \le C$ by Corollary 5.2. Applying Theorem 3.2 completes the proof.

5.2. Covering spaces and the action of the fundamental group. The fundamental group of $\mathcal{P}$ is isomorphic to the free group on two generators $e_0$ and $e_1$ which represent loops (traversed clockwise) around $(1,0)$ and $(-1,0)$ respectively with basepoint $(0,0)$. Indeed, $\mathcal{P}$ is homotopic to a bouquet of two circles $X = S^1 \vee S^1$. The universal covering of $X$, denoted by $\tilde X$, can be represented by an infinite tree whose edges cover either $e_0$ or $e_1$ in $X$, see Fig. 5.1. The universal covering of $\mathcal{P}$, denoted by $\wp : \tilde{\mathcal{P}} \to \mathcal{P}$, can then be viewed by thickening the tree $\tilde X$ so that $\tilde{\mathcal{P}}$ is homeomorphic to an open disk in $\mathbb{R}^2$.
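The tree structure of the universal cover of the bouquet of two circles can be made concrete by identifying its vertices with freely reduced words in $e_0$, $e_1$ and their inverses. The following is a minimal sketch of that standard identification, not a construction taken from the paper; the names `free_reduce` and `lift_path` are hypothetical, and the convention that a word is traversed from left to right is an assumption.

```python
def free_reduce(word):
    """Freely reduce a word in the generators e0, e1.

    A word is a sequence of (generator, exponent) pairs with exponent +1 or -1.
    Adjacent pairs x, x^-1 cancel; the stack also handles nested cancellations.
    """
    reduced = []
    for gen, exp in word:
        if reduced and reduced[-1] == (gen, -exp):
            reduced.pop()
        else:
            reduced.append((gen, exp))
    return tuple(reduced)


def lift_path(word):
    """Vertices of the tree visited by the lift of `word` that starts at the origin O.

    Vertices are identified with reduced words; the origin O is the empty word.
    """
    vertices = [()]
    current = []
    for letter in word:
        current = list(free_reduce(current + [letter]))
        vertices.append(tuple(current))
    return vertices


E0, E1 = ("e0", 1), ("e1", 1)
theta = [E0, E1, E0]  # the word e0 e1 e0 used in Fig. 5.1

# The lift of a homotopically nontrivial loop ends at a vertex different from O.
print(lift_path(theta)[-1] != ())

# A homotopically trivial loop lifts to a closed path in the tree.
print(free_reduce([E0, E1, ("e1", -1), ("e0", -1)]) == ())
```

In this picture a loop in $X$ is null-homotopic exactly when its lift returns to the origin, which is why the lift of $\theta = e_0 e_1 e_0$ is an open path in the tree and only closes up after passing to a quotient.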

Fig. 5.1. The universal cover $\tilde X$ of $X$ is a tree. Its origin is denoted by $O$. For $\theta = e_0 e_1 e_0$, the quotient space $\tilde X_\theta = \tilde X / \langle\theta\rangle$ is also a covering space over $X$, and $\tilde X_\theta \sim S^1$.

An important property of the universal covering is that the fundamental group $\pi_1(\mathcal{P})$ induces a left group action on $\tilde{\mathcal{P}}$ in a natural way, via the lifting of paths in $\mathcal{P}$ to paths in $\tilde{\mathcal{P}}$. This action will be denoted by $\theta \cdot p$ for $\theta \in \pi_1(\mathcal{P})$ and $p \in \tilde{\mathcal{P}}$. We will not reproduce the construction of this action here, and the reader is referred to an introductory book on algebraic topology such as [?]. However, we will utilize the structure of the quotient spaces of $\tilde{\mathcal{P}}$ obtained from this action, which are again coverings of $\mathcal{P}$. These quotient spaces will be the natural spaces in which to consider the lifts of curves $\Gamma(u)$ which lie in more complicated homotopy classes than those in the case of $u \in M_{per}((2,2))$.

A periodic type g = r is generated by a finite type $r$, which together with the parity $p$ determines an element of $\pi_1(\mathcal{P})$ of the form $\theta(r) = e_{|p-1|}^{r_{2n}} \cdot \ldots \cdot e_p^{r_1}$. Since we only consider curves in $\mathcal{P}$ which are of the form $\Gamma(u) = (u(t), u'(t))$, the numbers $r_i$ are all positive. The infinite cyclic subgroup generated by any such element $\theta$ will be denoted by $\langle\theta\rangle \subset \pi_1(\mathcal{P})$. The quotient space $\tilde{\mathcal{P}}_\theta = \tilde{\mathcal{P}} / \langle\theta\rangle$ is obtained by identifying points $p$ and $q$ in $\tilde{\mathcal{P}}$ for which $q = \theta^k \cdot p$ for some $k \in \mathbb{Z}$. The resulting space $\tilde{\mathcal{P}}_\theta$ is homotopic to an annulus, and $\wp_\theta : \tilde{\mathcal{P}}_\theta \to \mathcal{P}$ is a covering space. Figure 5.1 illustrates the situation for $X$, since it is easier to draw, and for $\mathcal{P}$ the reader should imagine that the edges in the picture are thin strips. The lift of the path $\theta = e_0 e_1 e_0$ to $\tilde X$ based at $O$ is shown by the dashed line. This piece of the tree becomes a circle in the quotient space $\tilde X_\theta$. Note that infinitely many edges in $\tilde X$ are identified with this circle. The dashed lines in both $\tilde X$ and $\tilde X_\theta$ are strong deformation retracts of $\tilde X$ and $\tilde X_\theta$ respectively, and hence $\tilde X_\theta$ is homotopic to a circle. Thickening $\tilde X_\theta$ gives that $\tilde{\mathcal{P}}_\theta$ is homotopic to an annulus. Thus $\pi_1(\tilde{\mathcal{P}}_\theta)$ is generated by a simple closed loop in $\tilde{\mathcal{P}}_\theta$ which will be denoted by $\zeta(r)$. Note that for convenience we suppress the dependence of $\theta$ and $\zeta$ on the parity $p$.

Remark 5.4. If we define the shift operator $\sigma$ on finite types $r$ to be a cyclic permutation, then $M_{per}(r, p) = M_{per}(\sigma^k(r), \tau^k(p))$ for all $k \in \mathbb{Z}$. Functions in $M_{per}(r, p)$ have a unique lift to a simple closed curve in $\tilde{\mathcal{P}}_\theta$, $\theta = \theta(r)$. However, functions in the shifted class $M_{per}(\sigma^k(r), \tau^k(p))$ are not simple closed curves in $\tilde{\mathcal{P}}_\theta$. In order for such functions to be lifted to a unique simple closed curve we need to consider the covering space $\tilde{\mathcal{P}}_{\theta_k}$, where $\theta_k = \theta(\sigma^k(r), \tau^k(p))$.

5.3. Characterization of minimizers of type r. In Sect. 5.2 we characterized minimizers in $CM((2,2))$ by studying the properties of their projections into $\mathcal{P}$. What was special about the types $(2,2)^k$ was that the projected curves were a priori contained in $\mathcal{P} \setminus L$, which is topologically an annulus. The $J$-efficiency of minimizing curves restricts the possibilities for their self and mutual intersections. In particular, we showed that all periodic minimizers in $CM((2,2))$ project onto simple closed curves in $\mathcal{P} \setminus L$ and that no two such minimizing curves intersect. These two properties, coupled with the simple topology of the annulus, already force the minimizing periodic curves to have a structure of a family of nested simple loops.

Such a simple picture in the configuration plane $\mathcal{P}$ cannot be expected for minimizers in $CM((r, p))$ with $r \ne (2,2)$. The simple intersection properties (of Lemma 5.9 and 5.11) no longer hold; in fact, periodic minimizing curves must have self-intersections in $\mathcal{P}$ as do any curves in $\mathcal{P}$ representing the homotopy class of $(r, p)$. However, by lifting minimizing curves into the annulus $\tilde{\mathcal{P}}_\theta$, we can remove exactly these necessary self-intersections and put us in a position to emulate the discussion for the types $(2,2)^k$. More precisely, for a minimal type $(r, p)$, any $u \in M_{per}((r,p)^k)$ with period $T$ such that $\theta^{-1}[\Gamma(u|_{[0,T]})] = (r,p)^k$, there are infinitely many lifts of the closed loop $\Gamma(u|_{[0,T]})$ into $\tilde{\mathcal{P}}_{\theta(r)}$ (see the above remark) but there is exactly one lift, denoted $\Gamma_\theta(u|_{[0,T]})$, that is a closed loop homotopic to $\zeta^k(r)$ in $\tilde{\mathcal{P}}_{\theta(r)}$. We can repeat all of the arguments in Sect. 4 by identifying intersections between the curves $\Gamma_\theta(u|_{[0,T]})$ in $\tilde{\mathcal{P}}_{\theta(r)}$ instead of intersections between the curves $\Gamma(u|_{[0,T]})$ in $\mathcal{P} \setminus L$.
Of course, when gluing together pieces of curves, the values of u and u come from the projections into P. In particular,


the arguments of Lemma 4.9 show that $\Gamma_\theta(u|_{[0,T]})$ must be a simple loop traced $k$ times, which leads to the following:

Lemma 5.5. For any periodic type $r$ and any $k \ge 1$ it holds that $CM_{per}((r,p)^k) = CM_{per}(r, p) = CM_{per}(r, p)$.

The proof of the next theorem is a slight modification of Theorem 4.11.

Theorem 5.6. For any periodic type $r$ the set $CM_{per}(r, p)$ is compact and totally ordered (in $\tilde{\mathcal{P}}_\theta$).

The following lemma is analogous to Lemma 4.13. Note however that by Theorem 5.1 we do not need to assume that the minimizer is uniformly bounded.

Lemma 5.7. Let $u \in CM(r, p)$ for some periodic type $r \ne (2,2)$. Either $u$ is periodic and $u \in CM_{per}(r, p)$, or $u$ is a connecting orbit between two periodic minimizers $u_-, u_+ \in CM_{per}(r, p)$, i.e. there are sequences $t_n^-, t_n^+ \to \infty$ such that $u(t - t_n^-) \to u_-(t)$ and $u(t + t_n^+) \to u_+(t)$ in $C^4_{loc}(\mathbb{R})$.

Combining Theorem 5.3 and Lemma 5.7 we obtain the existence of periodic minimizers in every class with a periodic type (this result can also be obtained in a way analogous to Theorem 4.5).

Theorem 5.8. For any periodic type $r$ the set $CM_{per}(r, p)$ is nonempty.

The classification of functions by type has some properties in common with symbolic dynamics. For example, if a type $g$ is asymptotic to two different periodic types, i.e. $\sigma^n(g) \to r_+$ and $\sigma^{-n}(g) \to r_-$ as $n \to \infty$, with $r_+ \ne r_-$, then any minimizer $u \in CM(g, p)$ is a connecting orbit between two periodic minimizers $u_- \in CM_{per}(r_-, p)$ and $u_+ \in CM_{per}(r_+, p)$, i.e. there exist sequences $t_n^-, t_n^+ \to \infty$ such that $u(t - t_n^-) \to u_-(t)$ and $u(t + t_n^+) \to u_+(t)$ in $C^4_{loc}(\mathbb{R})$. This result follows from Cantor's diagonal argument using Theorems 3.2 and 5.7, and hence we have used the symbol sequences to conclude the existence of heteroclinic and homoclinic orbits connecting any two types of periodic orbits.

Symmetry properties of types $g$ are also often reflected in the corresponding minimizers.
For example, define the map $B_{i_0}$ on infinite types by $B_{i_0}(g) = (g_{2i_0 - i})_{i \in \mathbb{Z}}$, and consider types that satisfy $B_{i_0}(g) = g$ for some $i_0$. Moreover assume that $g$ is periodic. In this case we can prove that the corresponding periodic minimizers are symmetric and satisfy Neumann boundary conditions.

Theorem 5.9. Let g = r satisfy $B_{i_0}(r) = r$ for some $i_0$. Then for any $u \in CM_{per}(r, p)$ there exists a shift $\tau$ such that $u_\tau(x) = u(x - \tau)$ satisfies
i) $u_\tau(x) = u_\tau(T - x)$ for all $x \in [0, T]$ where $T$ is the period of $u$,
ii) $u_\tau'(0) = u_\tau'''(0) = 0$ and $u_\tau'(T) = u_\tau'''(T) = 0$, and
iii) $u_\tau$ is a local minimizer for the functional $J_T[u]$ on the Sobolev space $H^2_n(0,T) = \{u \in H^2(0,T) \mid u'(0) = u'(T) = 0\}$.

Proof. Without loss of generality we may assume that $i_0 = 1$ and $g = (g_1, \ldots, g_N)$ for some $N \in 2\mathbb{N}$. We can choose a point $t_0$ in the convex hull of $A_1$ such that $u'(t_0) = u'(t_0 + T) = 0$ and $g(u|_{[t_0,t_0+T]}) = (g_1/2, g_2, \ldots, g_N, g_1/2)$. We now define $v(t) = u(t_0 + T - t)$. Then by the symmetry assumptions on $g$ we have that $g(v|_{[t_0,t_0+T]}) = g(u|_{[t_0,t_0+T]})$. Since $J_{[t_0,t_0+T]}(v) = J_{[t_0,t_0+T]}(u)$ and $\Gamma(u(t_0)) = \Gamma(u(t_0+T)) = \Gamma(v(t_0)) = \Gamma(v(t_0+T))$, we conclude from the uniqueness of the initial value problem that $u(t) = v(t)$ for all $t \in [t_0, t_0+T]$, which proves the first statement. The second statement follows immediately from i). The third property follows from the definition of minimizer.
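The symmetry condition $B_{i_0}(g) = g$ on a periodic type is easy to test mechanically. The sketch below checks the condition $g_{2i_0 - i} = g_i$ for a periodic sequence given by one period. The function names and the 0-based indexing of the period are assumptions of this example; the paper's own index convention may differ by a shift.

```python
def is_reflection_symmetric(r, i0):
    """Check g_{2*i0 - i} == g_i for the periodic sequence g obtained by repeating r.

    Indices are 0-based and taken modulo len(r) (an assumption of this sketch),
    so it suffices to test one full period.
    """
    n = len(r)
    return all(r[(2 * i0 - i) % n] == r[i % n] for i in range(n))


def symmetry_centres(r):
    """All centres i0 within one period about which the periodic type is symmetric."""
    return [i0 for i0 in range(len(r)) if is_reflection_symmetric(r, i0)]


print(symmetry_centres((2, 4, 2, 6)))  # symmetric about the entries 4 and 6
print(symmetry_centres((2, 4, 6)))     # no reflection symmetry
```

For types passing this check, Theorem 5.9 applies and the corresponding periodic minimizers inherit the reflection symmetry.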


References

1. Bangert, V.: Mather Sets for Twist Maps and Geodesics on Tori. Volume 1 of Dynamics Reported. Oxford: Oxford University Press, 1988
2. Boyland, P. and Golé, C.: Lagrangian systems on hyperbolic manifolds. Ergodic Theory Dynam. Systems 19, 1157–1173 (1999)
3. Fulton, W.: Algebraic Topology: A First Course. Berlin–Heidelberg–New York: Springer-Verlag, 1995
4. Ghrist, R., VandenBerg, J.B. and VanderVorst, R.C.A.M.: Braided closed characteristics in fourth-order twist systems. Preprint 2000
5. Ghrist, R., VandenBerg, J.B. and VanderVorst, R.C.A.M.: Morse theory on the space of braids and Lagrangian dynamics. In preparation
6. Hulshof, J., VandenBerg, J.B. and VanderVorst, R.C.A.M.: Traveling waves for fourth order parabolic equations. To appear in SIAM J. Math. Anal. (1999)
7. Kalies, W.D., Kwapisz, J. and VanderVorst, R.C.A.M.: Homotopy classes for stable connections between Hamiltonian saddle-focus equilibria. Commun. Math. Phys. 193, 337–371 (1998)
8. Kalies, W.D. and VanderVorst, R.C.A.M.: Multitransition homoclinic and heteroclinic solutions of the extended Fisher–Kolmogorov equation. J. Diff. Eq. 131, 209–228 (1996)
9. Kalies, W.D., VanderVorst, R.C.A.M. and Wanner, T.: Slow motion in higher-order systems and Γ-convergence in one space dimension. To appear in Nonlin. Anal. TMA
10. Kwapisz, J.: Uniqueness of the stationary wave for the extended Fisher–Kolmogorov equation. J. Diff. Eq. 165, 235–253 (2000)
11. Morse, M.: A fundamental class of geodesics on any closed surface of genus greater than one. Trans. Am. Math. Soc. 26, 25–60 (1924)
12. Rabinowitz, P.H.: Heteroclinics for a Hamiltonian system of double pendulum type. Top. Meth. Nonlin. Anal. 9, 41–76 (1997)
13. Schecter, E.: Handbook of analysis and its foundations. San Diego–New York–Boston: Acad. Press, 1997
14. VandenBerg, J.B.: The phase-plane picture for a class of fourth-order conservative differential equations. J. Diff. Eq. 161, 110–153 (2000)
15. VandenBerg, J.B.: Uniqueness of solutions for the extended Fisher–Kolmogorov equation. Comptes Rendus Acad. Sci. Paris (Série I) 326, 447–452 (1998)
16. VandenBerg, J.B. and VanderVorst, R.C.A.M.: Stable patterns for fourth order parabolic equations. Preprint (2000)

Communicated by Ya. G. Sinai

Commun. Math. Phys. 214, 593 – 605 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

The Ordered K-Theory of C*-Algebras Associated with Substitution Tilings

Ian F. Putnam

Department of Mathematics and Statistics, University of Victoria, Victoria, B.C., Canada

Received: 12 August 1999 / Accepted: 2 May 2000

Abstract: We consider the $C^*$-algebra, $A_T$, constructed from a substitution tiling system which is primitive, aperiodic and satisfies the finite pattern condition. Such a $C^*$-algebra has a unique trace. We show that this trace completely determines the order structure on the group $K_0(A_T)$; a non-zero element in $K_0(A_T)$ is positive if and only if its image under the map induced from the trace is positive.

1. Introduction and Statement of the Main Result

We begin by introducing some of the terminology and notation. All of these things are developed more fully in the survey article [KP]. We have included other references to more original sources where appropriate.

A substitution tiling system in $\mathbb{R}^d$ consists of a finite collection of bounded, regular closed sets $p_1, \ldots, p_N$ in $\mathbb{R}^d$ called prototiles. We also have a constant $\lambda > 1$ and, for each $i = 1, \ldots, N$, $\omega(p_i)$, which is a finite collection of subsets of $\mathbb{R}^d$ with pairwise disjoint interiors; each is a translate of one of the prototiles and their union is $\lambda p_i = \{\lambda x \mid x \in p_i\}$. In general, we call a translate of one of the prototiles a tile. Several one- and two-dimensional examples, including the Penrose tiles, are given in [AP].

As a generalization of the above, one can also have a finite set called labels. A labelled prototile is then a bounded, closed regular subset together with a label. (The idea is that we now have a way of distinguishing two prototiles which may be exactly the same geometric object.) It is clear how to extend the remainder of the definitions to this situation. All of our results apply equally well to the situation of labelled prototiles.

Collections of tiles with pairwise disjoint interiors are called partial tilings. The union of such a set of tiles is called the support of the partial tiling, and a partial tiling whose support is $\mathbb{R}^d$ is called a tiling. If $T$ denotes a tiling (or even a partial tiling), then for any $x$ in $\mathbb{R}^d$, $T + x$ denotes the tiling (or partial tiling) obtained by translating all tiles in $T$ by $x$.

Supported in part by a grant from NSERC, Canada
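The definition of a substitution tiling system can be made concrete in one dimension. The following is a minimal sketch, not taken from the paper: it assumes the Fibonacci substitution on two interval prototiles, hypothetically named `a` (length $\lambda$) and `b` (length 1), with $\lambda$ the golden ratio, and it iterates the substitution by replacing a tile at offset $x$ with the substituted patch placed at $\lambda x$. For simplicity the prototiles are placed with their left endpoint at 0.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # inflation constant lambda for this example

# Prototile lengths: 'a' is an interval of length PHI, 'b' one of length 1.
LENGTH = {"a": PHI, "b": 1.0}

# Substitution rule on prototiles placed with left endpoint at 0:
# omega(a) = a followed by b, omega(b) = a.
RULE = {"a": "ab", "b": "a"}


def substitute(patch):
    """Apply omega once to a partial tiling given as [(label, left_endpoint)]."""
    out = []
    for label, left in patch:
        pos = PHI * left  # a tile at offset x is replaced by a patch at lambda * x
        for child in RULE[label]:
            out.append((child, pos))
            pos += LENGTH[child]
    return out


def iterate(label, n):
    """Return omega applied n times to the prototile `label` placed at 0."""
    patch = [(label, 0.0)]
    for _ in range(n):
        patch = substitute(patch)
    return patch


def support_length(patch):
    """Total length of the support of a partial tiling."""
    return sum(LENGTH[label] for label, _ in patch)


if __name__ == "__main__":
    patch = iterate("a", 5)
    print("".join(label for label, _ in patch))
    print(support_length(patch), PHI ** 5 * LENGTH["a"])
```

In this sketch the two printed lengths should agree up to rounding, reflecting that the support of the $n$-fold substitution of a prototile $p$ is $\lambda^n p$.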


Notice that we can extend our definition of $\omega$ to tiles by $\omega(p + x) = \omega(p) + \lambda x$, for any prototile $p$ and vector $x$. We can further extend this definition to partial tilings by $\omega(T) = \{\omega(t) \mid t \in T\}$. This also means that we can iterate $\omega$, and for any prototile $p$, we may construct $\omega^n(p)$, for any $n = 1, 2, \ldots$, which is a partial tiling with support $\lambda^n p$.

We will assume here that all of our prototiles contain the origin in their interior. (This loses no generality.) We define the puncture of any prototile $p$ to be the origin and, for any vector $x$, we define the puncture of $t = p + x$, denoted $x(t)$, to be $x$. So each tile has a distinguished point in its interior.

We say that the substitution is primitive if there is a positive integer $k$ such that, for every ordered pair of prototiles $p, p'$, a translate of $p'$ appears inside $\omega^k(p)$.

We construct a tiling as follows. There exists a prototile $p$, a vector $x$, and a positive integer $k$ so that the sequence of partial tilings $\omega^{nk}(p + x)$, $n = 1, 2, \ldots$, is coherent in the sense that the $n$th one contains all the earlier ones. Moreover, these grow to cover $\mathbb{R}^d$. We will not prove this here, although the proof is not difficult. We let $T$ denote the union of these partial tilings, which is a tiling. We look at all translations of $T$ and put a metric $d$ on this set as in [RW, Rud, So1]. The completion of this set of translations of $T$ is denoted $\Omega$. It is also worth noting that the elements of this completion can be viewed as tilings with the same tiles. This space is actually independent of the choice of $T$ as above. From now on, we revert to using $T$ to denote an arbitrary element of $\Omega$. Under a hypothesis called the finite pattern condition [RW], it is a compact metric space. The map $\omega : \Omega \to \Omega$ then becomes a continuous surjection. We will focus our attention on the case when this map is also injective and hence a homeomorphism. This is usually referred to as the unique composition property, or one says that the substitution is locally invertible.
However, Solomyak [So2] has shown that, with the hypothesis of the finite pattern condition, this is equivalent to the set $\Omega$ containing no periodic tilings. That is, if $T$ in $\Omega$ and $x$ in $\mathbb{R}^d$ satisfy $T + x = T$, then $x = 0$. (In the terminology of [So2], $\Omega$ is the local isomorphism class of any of its elements.) Therefore, we will say that the substitution system is aperiodic if the map $\omega$ is injective. We will also assume that our substitution forces its border [Kel1]. As discussed in [KP], this loses no generality, provided we allow labelled tiles.

We define $\Omega_{punc}$ to be the set of all tilings in $\Omega$ which have a puncture on the origin. This set is compact and totally disconnected. We want to describe a base for its topology consisting of clopen sets. Fix some finite partial tiling $P$ inside $\omega^k(p)$, where $p$ is a prototile and $k$ is a positive integer, and let $t$ be any tile in $P$. We let $U(P, t) = \{T \mid P - x(t) \subset T\}$. That is, we translate the patch $P$ back by the vector $x(t)$, so that $t$ now covers the origin, with its puncture exactly on the origin. Then we look at all tilings containing this patch. This set is closed and open in $\Omega_{punc}$ and such sets form a base for the topology.

We are interested in the equivalence relation on $\Omega_{punc}$ which is simply translational equivalence. That is, we define $R_{punc} = \{(T, T + x) \mid T, T + x \in \Omega_{punc}\}$. This set is also given a topology, which is easiest to describe as follows. Let $P$ be a patch as before and let $t, t'$ be two tiles in $P$. The map sending $T$ in $U(P, t)$ to $T + x(t) - x(t')$ is a homeomorphism onto $U(P, t')$. Its graph is contained in $R_{punc}$ and is denoted $U(P, t, t')$. These sets form a base for the topology of $R_{punc}$. Indeed, they are actually clopen sets and $R_{punc}$ is totally disconnected. This makes $R_{punc}$ into a locally compact, Hausdorff, $\sigma$-compact, r-discrete, principal groupoid with counting measure as a Haar system. (See [Ren] as a general reference on the subject of groupoids, or [Put2] for a leisurely treatment.) We use $r, s$ to denote the range and source maps from $R_{punc}$ to $\Omega_{punc}$. That is, $r(T, T') = T$ and $s(T, T') = T'$.

We let $A_T$ denote the $C^*$-algebra of $R_{punc}$. We refer the reader to [KP] or to [Ren] as the main source for the construction of $C^*$-algebras from groupoids. For general references to $C^*$-algebras, we suggest [Fi, Da, Pe]. (We should note that this really doesn't depend on $T$. The notational confusion comes because this is a special case of a more general construction [KP]. It would probably be preferable to use the notation $A_\omega$, but we will stay with this for historical reasons.) This $C^*$-algebra is the completion in a certain norm of the *-algebra of continuous compactly supported functions on $R_{punc}$, denoted $C_c(R_{punc})$.

Let us mention some properties of this $C^*$-algebra. The key point is that the space $\Omega$ with the map $\omega$ can be viewed as a Smale space [AP]. Then the space $\Omega_{punc}$ can be viewed as an abstract transversal to the relation of unstable equivalence. The reduction of this groupoid on $\Omega_{punc}$ is exactly $R_{punc}$. Hence $A_T$ is strongly Morita equivalent to $U(\Omega, \omega)$ and the results of [PS] apply. In particular, the equivalence relation $R_{punc}$ is minimal in the sense that every equivalence class is dense in $\Omega_{punc}$, and also amenable in the sense of Renault [Ren].

We are interested in the computation of the K-theory of $A_T$, and especially its K-zero group.
We refer the reader to [Bl, W-O] as general references for K-theory for operator algebras and [Be1, Kel1] for further information and motivation for this problem in physics. Methods have been given in [AP, Be2, BCL, Kel1, Kel2] for the computation of the K-theory of $A_T$. In some cases, these included the order structure on K-zero. Here, we will prove a more general result.

The space $\Omega_{punc}$ possesses a natural measure $\mu$. It is most easily described as follows. The mixing Smale space, $(\Omega, \omega)$, has a measure of maximal entropy which is a product measure with respect to the canonical stable and unstable coordinates. The entropy is $d \log(\lambda)$. The set $\Omega_{punc}$ is contained in a finite collection of local stable sets and the measure $\mu$ is simply the restriction of the stable component of the measure of maximal entropy. Its key properties are that it is finite and $R_{punc}$-invariant. This means that it is preserved under the local homeomorphisms whose graphs make up our topology base above. This measure has full support. This is because the equivalence relation $R_{punc}$ is minimal and, since $\mu$ is $R_{punc}$-invariant, its support is also invariant. This measure is also unique. We will have more to say about this later.

This measure $\mu$ defines a trace, $\tau$, on the $C^*$-algebra $A_T$. For an element $f$ which lies in the dense sub-algebra $C_c(R_{punc})$, its trace is given by

$$\tau(f) = \int_{\Omega_{punc}} f(x, x)\, d\mu(x).$$

This is a positive bounded linear functional of norm one. It is also faithful since the measure has full support. Such a trace induces a positive group homomorphism on the K-zero group of $A_T$ [Bl, W-O],

$$\hat{\tau} : K_0(A_T) \to \mathbb{R}.$$


It is our goal here to show that, under a very mild hypothesis regarding the topology of the prototiles, this homomorphism completely determines the order structure on $K_0(A_T)$ [Bl, W-O].

Theorem 1.1. Let $p_1, \ldots, p_N, \omega$ be a substitution tiling system (or labelled substitution tiling system) in $\mathbb{R}^d$ which is primitive, aperiodic and satisfies the finite pattern condition. Suppose that for each prototile, the capacity or box-counting dimension of its boundary is strictly less than $d$. Then the order on $K_0(A_T)$ is determined by the trace. That is, for any element $a$ in $K_0(A_T)$, $a$ is in $K_0(A_T)^+$ if and only if $a = 0$ or $\hat{\tau}(a) > 0$.

Notice that the hypothesis regarding dimension is satisfied by all polyhedra, where the boundaries are made up of lower dimensional hypersurfaces.

Our proof will be presented in the last section. It will make use of a canonical $C^*$-subalgebra of $A_T$, denoted $AF_T$ [Kel1, Kel2]. This sub-algebra is reasonably large inside $A_T$, but also has the advantage of being AF, or approximately finite dimensional. (Again, $AF_\omega$ might be more appropriate notation.) The structure of this $C^*$-algebra is fairly well understood. It is one of the AF-algebras constructed by Cuntz and Krieger from a mixing topological Markov chain, and the analogue of our main result above is known for such $C^*$-algebras. Our proof will make use of this. The rest of the argument is to show how we may interpolate between projections in $A_T$ with projections in $AF_T$. The details appear in Sect. 3.

Let us give a description of $AF_T$. For each prototile $p$ and each positive integer $n$, let $\mathrm{Punc}(p, n)$ denote the set of all punctures in the tiles of the partial tiling $\omega^n(p)$. Suppose that $x$ is in some $\mathrm{Punc}(p, n)$, and $T$ is any tiling in $\Omega_{punc}$ such that $p$ is in $T$. Recall that the puncture in $p$ is the origin. Then the tiling $\omega^n(T) - x$ is again in $\Omega_{punc}$. We let $W(p, n, x)$ denote the set of all tilings of this form.
It is not difficult to check that, for a fixed value of $n$, the collection of sets $\{W(p, n, x) \mid p \text{ a prototile},\ x \in \mathrm{Punc}(p, n)\}$ is a partition of $\Omega_{punc}$ into clopen sets. Since we assume the substitution forces its border, as $n$ varies these generate the topology on $\Omega_{punc}$. Suppose that $p$ is a prototile and $n$ is a positive integer. If we have $x$ and $y$ in $\mathrm{Punc}(p, n)$, then the map sending $T$ to $T + x - y$ is a homeomorphism from $W(p, n, x)$ onto $W(p, n, y)$. The graph of this map is denoted by $W(p, n, x, y)$. It is a clopen subset of $R_{punc}$. We define $R_{AF}$ to be the union of these sets, which is then an open subgroupoid of $R_{punc}$. Then the $C^*$-algebra of $R_{AF}$ is denoted by $AF_T$ and the obvious inclusion $C_c(R_{AF}) \subset C_c(R_{punc})$ extends to an inclusion $AF_T \subset A_T$.

To see that $AF_T$ is approximately finite dimensional, it suffices to notice that if we take $A_n$ to be the linear span of the characteristic functions of the sets $W(p, n, x, y)$, where $p$ is a prototile and $x, y$ are in $\mathrm{Punc}(p, n)$, then this is actually a finite dimensional $C^*$-subalgebra. The details are given in [KP]. It is also shown there that the matrix which describes the embedding $A_n \subset A_{n+1}$ is the same for every $n$ and is equal to the $N \times N$ matrix whose $i, j$ entry is the number of different translates of $p_i$ appearing in $\omega(p_j)$, for all $i, j$. Since the substitution is primitive, so is this matrix, in the sense that some power has no zero entries [LM].

Our trace, $\tau$, restricts to a trace on $AF_T$. By the results of [Ha], such a $C^*$-algebra has a unique trace. This implies the uniqueness of our $R_{punc}$-invariant measure $\mu$, since any other measure would give rise to another trace. These would be distinct on $C(\Omega_{punc})$, which is contained in $AF_T$.

While our main theorem gives a complete answer to the question of the order on $K_0(A_T)$, there is one important question which we leave unanswered. That is to compute the range of the map $\hat{\tau}$. There is a natural conjecture, namely

$$\hat{\tau}(K_0(A_T)) = \hat{\tau}(K_0(AF_T)) = \{\mu(E) \mid E \subset \Omega_{punc} \text{ clopen}\} + \mathbb{Z}.$$

One inclusion in the first equality is obvious. In some special situations in low dimensions ($d \le 2$), equality is known [vE]. As well, our result in the next section, Theorem 2.1, suggests that this will be true under the same hypothesis as Theorem 1.1. Furthermore, the set $\hat{\tau}(K_0(AF_T))$ is known to be the subgroup of $\mathbb{R}$ generated by numbers of the form $\lambda^{-nd} \xi_i$, where $n$ is a positive integer and $\xi_i$ is an entry of the left Perron eigenvector of the primitive matrix of the last paragraph.

2. A Technical Result

In this section we will prove a technical result which we will need in the proof of the main theorem. We include it as a separate section because it may be of some independent interest, as we will try to explain below.

As we discussed in the introduction, we have two equivalence relations (or principal groupoids), $R_{AF} \subset R_{punc}$, on the space $\Omega_{punc}$. The second has a topology in which it is locally compact, Hausdorff, metrizable, r-discrete, $\sigma$-compact and for which counting measure is a Haar system. The first is an open subgroupoid. Roughly speaking, the structure of the subgroupoid is fairly well understood and the difficulty in analyzing the $C^*$-algebra $A_T$ usually involves $R_{punc} - R_{AF}$. Our main technical result here is to show that, at the level of measure theory, the difference is negligible. Specifically, we will prove the following.

Theorem 2.1. Let $p_1, \ldots, p_N, \omega$ be a substitution tiling system in $\mathbb{R}^d$ which is primitive, aperiodic and satisfies the finite pattern condition.
Let $R_{punc}$ and $R_{AF}$ be the associated principal groupoids and let $\mu$ be the unique $R_{punc}$-invariant probability measure on $\Omega_{punc}$. If the boundary of each $p_i$ has capacity or box-counting dimension strictly less than $d$, then we have $\mu(r(R_{punc} - R_{AF})) = 0$.

The proof will be broken into a series of lemmas and we will introduce some new notation. Recall that $\mathrm{Punc}(p, n)$ denotes the set of all punctures in the tiles of $\omega^n(p)$. For any $x$ in $\mathrm{Punc}(p, n)$, we define $\partial(x)$ to be the Euclidean distance from $x$ to the boundary of $\lambda^n p$, which is the support of $\omega^n(p)$; that is,

$$\partial(x) = \inf\{|x - y| \mid y \in \mathbb{R}^d - \lambda^n p\},$$

where $|\cdot|$ denotes the usual Euclidean norm on $\mathbb{R}^d$. We fix $b > 0$ so that $B(0, b) \subset p$, for all prototiles $p$. Here, $B(x, r)$ denotes the open ball in $\mathbb{R}^d$ centred at $x$ and with radius $r$. Notice that this means for all tiles $t$, $B(x(t), b) \subset t$. We let $\#X$ denote the number of elements of any finite set $X$.


Lemma 2.2. For each prototile $p$, there is a positive constant $a_p$ such that $\#\mathrm{Punc}(p, n) \ge a_p \lambda^{dn}$ for all positive integers $n$.

Proof. We define the $N \times N$ substitution matrix $B$ as follows. The $i, j$ entry of $B$ is the number of occurrences of translates of $p_j$ in $\omega(p_i)$. The fact that the substitution is primitive is equivalent to the fact that this non-negative matrix is primitive [LM]. Let $v$ be the vector in $\mathbb{R}^N$ whose $i$th entry is the volume of $p_i$. It is easy to calculate that $v$ is a (right) eigenvector of $B$ with eigenvalue $\lambda^d$. Since this eigenvector is clearly positive, this is the Perron eigenvector for $B$ and $\lambda^d$ is the Perron eigenvalue. (See Sect. 4.2 of [LM].) Since $B$ is primitive, we may apply 4.5.12 of [LM] to conclude that, for every pair $i, j$, we may find a positive constant $a_{i,j}$ such that

$$\lim_{n \to \infty} \lambda^{-dn} \left| (B^n)_{i,j} - a_{i,j} \lambda^{dn} \right| = 0.$$

But $\#\mathrm{Punc}(p, n)$ is simply the number of different tiles in $\omega^n(p)$, which is the sum over all $j$ of $(B^n)_{i,j}$, where $p_i = p$. The result follows easily from this.

Lemma 2.3. Let $p_1, \ldots, p_N, \omega$ be a primitive substitution tiling system in $\mathbb{R}^d$ such that, for each prototile $p$, the boundary of $p$, $\partial p$, has box-counting dimension strictly less than $d$. Then, for any $R > 0$ and prototile $p$, we have

$$\lim_{n \to \infty} \frac{\#\{x \in \mathrm{Punc}(p, n) \mid \partial(x) \le R\}}{\#\mathrm{Punc}(p, n)} = 0.$$
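The counting in Lemma 2.2 can be checked numerically on a small case. The sketch below is an illustration only and assumes the one-dimensional Fibonacci substitution (hypothetical prototile names `a` and `b`, lengths $\lambda$ and 1, so $d = 1$). It builds the substitution matrix with the convention that entry $i, j$ counts translates of prototile $j$ inside the substitution of prototile $i$, checks that the vector of lengths is a right eigenvector with eigenvalue $\lambda$, and compares row sums of powers of the matrix with a direct count of tiles.

```python
import math

PHI = (1 + math.sqrt(5)) / 2          # inflation constant; here d = 1
LABELS = ["a", "b"]                   # hypothetical prototile names
RULE = {"a": "ab", "b": "a"}          # omega(a) = ab, omega(b) = a
VOLUME = [PHI, 1.0]                   # lengths of the prototiles a and b


def substitution_matrix():
    """B[i][j] = number of translates of prototile j inside omega(prototile i)."""
    return [[RULE[p].count(q) for q in LABELS] for p in LABELS]


def mat_mul(x, y):
    """Product of two square matrices given as lists of rows."""
    size = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(b, n):
    """n-th power of a square integer matrix, for n >= 1."""
    result = b
    for _ in range(n - 1):
        result = mat_mul(result, b)
    return result


def tile_count(label, n):
    """Number of tiles in the n-fold substitution of `label`: a row sum of B^n."""
    i = LABELS.index(label)
    return sum(mat_pow(substitution_matrix(), n)[i])


def word(label, n):
    """Cross-check: iterate the substitution symbolically."""
    w = label
    for _ in range(n):
        w = "".join(RULE[c] for c in w)
    return w


if __name__ == "__main__":
    for n in (5, 10, 20, 30):
        # The ratio count / lambda^(d n) should settle near a positive constant.
        print(n, tile_count("a", n), tile_count("a", n) / PHI ** n)
```

The ratio printed in the last column stabilising at a positive value is the numerical counterpart of the lower bound in Lemma 2.2 for this example.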

Proof. We let $\delta$ be the maximum box-counting dimension of the boundaries of the prototiles. So our hypothesis is that $\delta < d$. This means that there is a constant $K$ and a function $m(\varepsilon) \le K \varepsilon^{-\delta}$ such that, for any prototile $p$, we may cover its boundary with $m(\varepsilon)$ balls of radius $\varepsilon$, for any $\varepsilon > 0$.

Fix a prototile $p$ and a positive integer $n$, and let $\varepsilon = R\lambda^{-n}$. Choose an open cover of $\partial p$ with $\varepsilon$-balls as above and denote their centres by $x_i$, $i = 1, \ldots, m(\varepsilon)$. Now if $x$ is in $\mathrm{Punc}(p, n)$ and $\partial(x) \le R$, then for some $y$ in $\partial(\lambda^n p)$, we have $|x - y| \le R$. Then we have $|\lambda^{-n} x - \lambda^{-n} y| \le R\lambda^{-n} = \varepsilon$, $\lambda^{-n} y$ is in $\partial p$ and hence $|\lambda^{-n} x - x_i| < 2\varepsilon$, for some $i$. So each point $x$ of $\mathrm{Punc}(p, n)$ within $R$ of the boundary of $\lambda^n p$ is contained in some $\lambda^n B(x_i, 2\varepsilon)$. Notice that

$$\lambda^n B(x_i, 2\varepsilon) = B(\lambda^n x_i, \lambda^n 2\varepsilon) = B(\lambda^n x_i, 2R).$$

We next want an upper bound on the number of such $x$, for a fixed $i$. Let

$$k_i = \#\big(\mathrm{Punc}(p, n) \cap B(\lambda^n x_i, 2R)\big).$$

We will use the fact that $B(x(t), b) \subset t$, for any tile $t$. This means that the balls $B(x, b)$, for $x$ in $\mathrm{Punc}(p, n)$, are pairwise disjoint. And if $x$ is also in $B(\lambda^n x_i, 2R)$, then $B(x, b)$ is contained in $B(\lambda^n x_i, 2R + b)$. This means that

$$\mathrm{Vol}\big(B(\lambda^n x_i, 2R + b)\big) \ge \sum \mathrm{Vol}\big(B(x, b)\big),$$

where the sum is taken over all $x$ in $\mathrm{Punc}(p, n) \cap B(\lambda^n x_i, 2R)$. There is a positive constant, $V_d$, so that for all positive $r$, $\mathrm{Vol}(B(x, r)) = V_d r^d$. So we have $V_d (2R + b)^d \ge k_i V_d b^d$, which in turn gives us

$$k_i \le (1 + 2R/b)^d. \tag{1}$$

Now if we sum over $i$, we obtain

$$\begin{aligned}
\#\{x \in \mathrm{Punc}(p, n) \mid \partial(x) \le R\} &\le \sum_{i=1}^{m(\varepsilon)} k_i \\
&\le \sum_{i=1}^{m(\varepsilon)} (1 + 2R/b)^d \\
&\le m(\varepsilon)(1 + 2R/b)^d \\
&\le K \varepsilon^{-\delta} (1 + 2R/b)^d \\
&= K (R\lambda^{-n})^{-\delta} (1 + 2R/b)^d \\
&= K (1 + 2R/b)^d R^{-\delta} \lambda^{n\delta} \\
&= K' \lambda^{n\delta},
\end{aligned}$$

where $K' = K(1 + 2R/b)^d R^{-\delta}$ is independent of $n$. Now we combine this estimate with Lemma 2.2 to obtain

$$\begin{aligned}
\lim_{n \to \infty} \frac{\#\{x \in \mathrm{Punc}(p, n) \mid \partial(x) \le R\}}{\#\mathrm{Punc}(p, n)} &\le \lim_{n \to \infty} \frac{K' \lambda^{n\delta}}{a_p \lambda^{dn}} \\
&= \lim_{n \to \infty} (K'/a_p) \lambda^{n(\delta - d)} \\
&= 0,
\end{aligned}$$

since $\delta < d$.
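The conclusion of Lemma 2.3 can be observed directly in a simple case. The following sketch is illustrative and rests on assumptions not made in the paper: it uses the one-dimensional Fibonacci substitution (hypothetical prototiles `a` and `b` of lengths $\lambda$ and 1), takes the midpoint of each tile as its puncture, and lays the substituted patch out on an interval starting at 0. The boundary of the support then consists of the two endpoints, so the boundary dimension is 0 and $d = 1$.

```python
import math

PHI = (1 + math.sqrt(5)) / 2
LENGTH = {"a": PHI, "b": 1.0}
RULE = {"a": "ab", "b": "a"}


def punctures(label, n):
    """Midpoints of the tiles of the n-fold substitution, laid out on [0, total]."""
    w = label
    for _ in range(n):
        w = "".join(RULE[c] for c in w)
    points, left = [], 0.0
    for c in w:
        points.append(left + LENGTH[c] / 2)  # puncture assumed at the tile midpoint
        left += LENGTH[c]
    return points, left


def boundary_fraction(label, n, radius):
    """Fraction of punctures within `radius` of the boundary of the support."""
    points, total = punctures(label, n)
    near = sum(1 for x in points if min(x, total - x) <= radius)
    return near / len(points)


if __name__ == "__main__":
    for n in (4, 8, 12, 16, 20):
        print(n, boundary_fraction("a", n, 3.0))
```

In this example the number of punctures near the boundary stays bounded while the total number grows like $\lambda^n$, so the printed fractions decrease towards 0, as the lemma predicts.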

Definition 2.4. For any tiling $T$ in $r(R_{punc} - R_{AF})$, there is a vector $x$ in $\mathbb{R}^d$ such that $T - x$ is in $\Omega_{punc}$, but $(T, T - x)$ is not in $R_{AF}$. For such $T$ we define

$$\rho(T) = \inf\{|x| \mid (T, T - x) \in R_{punc} - R_{AF}\}.$$

Lemma 2.5. Let $p$ be a prototile and $n$ be a positive integer. Suppose that $x$ is in $\mathrm{Punc}(p, n)$ and that $T$ is in $r(R_{punc} - R_{AF}) \cap W(p, n, x)$. Then we have $\rho(T) \ge \partial(x)$.

Proof. The hypothesis means that we can write $T = \omega^n(T') - x$, where $T'$ contains the tile $p$. Suppose that $y$ is any vector with $|y| < \partial(x)$. We claim that if $T - y$ is in $\Omega_{punc}$, then $(T, T - y)$ is in $R_{AF}$. From this it follows that if $(T, T - y)$ is to be in $R_{punc} - R_{AF}$, we must have $|y| \ge \partial(x)$, and the conclusion follows from the definition of $\rho$.

As for the claim, we begin by noting that if $|y| < \partial(x)$, then $x + y$ is in the interior of $\lambda^n p$. If, in addition, $T - y$ is in $\Omega_{punc}$, then $T - y = \omega^n(T') - x - y = \omega^n(T') - (x + y)$ and so $x + y$ is in $\mathrm{Punc}(p, n)$. The graph of the map sending $\omega^n(T') - x$ to $\omega^n(T') - (x + y)$ is contained in $R_{AF}$. In particular, the pair $(T, T - y)$ is in $R_{AF}$. This completes the proof of the claim.


Recall that we are trying to prove that $\mu(r(R_{punc} - R_{AF})) = 0$. It is easy to check that for fixed $R > 0$, the set $\{(T, T - x) \mid |x| \le R\} \cap (R_{punc} - R_{AF})$ is compact in $R_{punc}$. It follows that for any $R > 0$, $r(R_{punc} - R_{AF}) \cap \rho^{-1}[0, R]$ is compact in $\Omega_{punc}$. To prove our result, it suffices to show that the $\mu$-measure of this set is zero, for any $R$.

We now fix $R_0 > 0$ and, for convenience, we denote $\{T \in r(R_{punc} - R_{AF}) \mid \rho(T) \le R_0\}$ by $Y_0$. We will construct a sequence of positive constants, $R_0 < R_1 < R_2 < \cdots$, and a sequence of locally defined maps $\gamma_1, \gamma_2, \ldots$ with the following properties. Each $\gamma_m$ is a local homeomorphism whose graph is a clopen set in $R_{AF}$ and whose domain contains $Y_0$. Moreover, for all $T$ in $Y_0$, we will have

$$R_{m-1} < \rho(\gamma_m(T)) \le R_m - 1.$$

We may conclude from this last inequality that the sets $\gamma_m(Y_0)$ are pairwise disjoint. Moreover, the maps $\gamma_m$ all preserve the measure $\mu$. So each of these sets has the same measure as $Y_0$ and, since the measure is finite, we conclude that $\mu(Y_0) = 0$, as desired.

We begin by setting $R_0 = \sup\{\rho(T) \mid T \in Y_0\}$. Assume that, for some $m \ge 1$, we have $R_{m-1}$ defined with the property that $\rho(T) \le R_{m-1}$, for all $T$ in $Y_0$. We define $\gamma_m$ as follows. We apply Lemma 2.3 using the value $R = R_{m-1} + 1$. We may find an $n$ sufficiently large so that the ratio in the limit is less than $1/2$, for all prototiles $p$. This means that, for this value of $n$,

$$\#\{x \in \mathrm{Punc}(p, n) \mid \partial(x) \le R_{m-1}\} \le \#\{x \in \mathrm{Punc}(p, n) \mid \partial(x) \ge R_{m-1} + 1\},$$

for each prototile $p$. Now for each prototile $p$, we may define an injection, $\eta$, from the first set above to the second. (Of course, there is a different $\eta$ for each $p$, but we will suppress this in our notation.) The domain of the map $\gamma_m$ will be the union of all sets $W(p, n, x)$, where $p$ is any prototile and $x$ is in $\mathrm{Punc}(p, n)$ with $\partial(x) \le R_{m-1}$. For $T$ in $W(p, n, x)$, we define $\gamma_m(T) = T + x - \eta(x)$. It is easy to check that $\gamma_m$ is a homeomorphism on its domain.

Also, for every $T$ in $Y_0$, we know that $T$ is in some set $W(p, n, x)$. It follows from Lemma 2.5 that $R_{m-1} \ge \rho(T) \ge \partial(x)$ and so $W(p, n, x)$, and hence $T$, is in the domain of $\gamma_m$. It is also clear from the definition and Lemma 2.5 that $\rho(\gamma_m(T)) \ge \partial(\eta(x)) \ge R_{m-1} + 1$. Therefore $\gamma_m$ has all the required properties. To complete the induction, we choose $R_m$ to be

$$R_m = \sup \rho(\gamma_m(Y_0)) + 1.$$

This completes the proof of Theorem 2.1.


3. Proof of the Main Result

We begin the proof of the main result, Theorem 1.1. The key ingredient is the following.

Lemma 3.1. Let $p$ be a non-zero projection in $A_T$ and suppose that $0 < \varepsilon < \tau(p)$. Then there is a projection $q$ in $AF_T$ satisfying $[q] \le [p]$ in $K_0(A_T)$ and $|\tau(p) - \tau(q)| < \varepsilon$.

The proof will take some time and involve several lemmas. Begin by choosing $0 < \delta \le \varepsilon/20$ so that also $\delta < 1/400$. We use the facts that $C_c(R_{punc}) \subset A_T$ is dense and that $R_{punc}$ is totally disconnected to find a function $f$ in $C_c(R_{punc})$ which is locally constant (i.e. $f$ has finite range) and so that

$$\|p - f\| < \delta. \tag{2}$$

By replacing $f$ by $f^* f$ if necessary, we may assume that $f$ is positive in $A_T$. We may also assume that $\|f\| \le 1$. It follows from Eq. (2) that

$$\|f^2 - f\| < 3\delta. \tag{3}$$

Note that when we write $f^2$, we mean the product in $A_T$, which is the convolution product on $R_{punc}$, not the pointwise product. Let $K = r(\mathrm{supp}(f) \cap (R_{punc} - R_{AF}))$, which is a compact subset of $\Omega_{punc}$ and has $\mu(K) = 0$, by Theorem 2.1. We may choose a clopen set $F \supset K$ such that

$$\int_F f^2(x, x)\, d\mu(x) < \delta. \tag{4}$$

We define a function $e$ on $R_{punc}$ by

$$e(T, T') = \begin{cases} 1 & \text{if } T = T' \notin F, \\ 0 & \text{otherwise.} \end{cases}$$

Notice that $e$ is a projection in $AF_T$.

Lemma 3.2. The element $ef$ (product in $A_T$) is a locally constant function on $R_{punc}$ and $ef$ is in $C_c(R_{AF})$. Finally, we have

$$|\tau(f) - \tau(f^2 e f)| < 7\delta. \tag{5}$$


Proof. The first statement is obvious since both $e$ and $f$ have the same property. As for the second, we only need to see that $ef$ is zero on $R_{punc} - R_{AF}$. We have $ef(T, T') = e(T, T) f(T, T')$, for any $(T, T')$ in $R_{punc}$. If $(T, T')$ is in $R_{punc} - R_{AF}$ and $f$ is not zero on this point, then $T = r(T, T')$ is in $K$ and so $e(T, T) = 0$. For the last inequality, we have

$$\begin{aligned}
|\tau(f) - \tau(f^2 e f)| &\le |\tau(f) - \tau(f^2)| + |\tau(f^2) - \tau(f e f)| + |\tau((f - f^2) e f)| \\
&\le \|f - f^2\| + |\tau(f^2) - \tau(e f^2)| + \|f - f^2\| \\
&\le 6\delta + \tau((1 - e) f^2) \\
&= 6\delta + \int_F f^2(x, x)\, d\mu(x) \\
&< 7\delta,
\end{aligned}$$

by Eq. (4).

We now know that the element $f e f = (ef)^*(ef)$ is self-adjoint and lies in $AF_T$. Since it is a locally constant function on $R_{AF}$, it will actually lie in one of the canonical approximating finite-dimensional $C^*$-algebras, denoted by $A_N$ in [KP]. This means that its spectrum is finite and we may write

$$f e f = \sum_{i=1}^{m} \lambda_i e_i, \tag{6}$$

where the $\lambda_i$ are positive constants less than or equal to 1 and the $e_i$ are projections in $AF_T$ satisfying

$$\sum_{i=1}^{m} e_i = 1, \qquad e_i e_j = 0 \ \text{ for } i \ne j.$$

By re-arranging the order of the terms, we may assume that $\lambda_i \le 1/2$, for $i = 1, \ldots, k$, and $\lambda_i \ge 1/2$, for $i = k + 1, \ldots, m$, for some fixed $k$. We now define

$$q = \sum_{i=k+1}^{m} e_i.$$

Notice immediately that $q$ is a self-adjoint projection and lies in $AF_T$.


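Lemma 3.3.
1. $\|pq - q\| < 4\delta$.
2. $\|pqp - q\| < 8\delta$.

Proof. We use the definition of $q$ and Eq. (6):

$$\begin{aligned}
\|pq - q\| &= \Big\| (p - 1) \sum_{i>k} e_i \Big\| \\
&= \Big\| (p - 1) \Big( \sum_{j=1}^{m} \lambda_j e_j \Big) \sum_{i>k} \lambda_i^{-1} e_i \Big\| \\
&= \Big\| (p - 1)\, f e f \sum_{i>k} \lambda_i^{-1} e_i \Big\| \\
&\le \|(p - 1) f e f\| \, \Big\| \sum_{i>k} \lambda_i^{-1} e_i \Big\| \\
&\le \|p(f - p) + p - f\| \, \sup_{i>k} \{\lambda_i^{-1}\} \\
&< (2\delta) \cdot 2 = 4\delta.
\end{aligned}$$

The second inequality follows at once from the first. We omit the details.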
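For the reader's convenience, one way to fill in the omitted step is sketched below. It is not taken from the paper and assumes only that $p$ and $q$ are self-adjoint projections (so $\|p\| \le 1$ and $(pq - q)^* = qp - q$) together with the first inequality of the lemma.

```latex
\begin{align*}
\|pqp - q\| &\le \|pqp - qp\| + \|qp - q\| \\
            &= \|(pq - q)p\| + \|(pq - q)^*\| \\
            &\le \|pq - q\|\,\|p\| + \|pq - q\| \\
            &< 4\delta + 4\delta = 8\delta .
\end{align*}
```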

Since $\delta < 1/400$, we obtain $\|(pqp)^2 - pqp\| < 24\delta < 1/16$, so the spectrum of $pqp$ is contained in $[-1/8, 1/8] \cup [7/8, 9/8]$ and we may apply functional calculus and obtain $q' = \chi_{(1/2, \infty)}(pqp)$. Then $q'$ is a self-adjoint projection in $A_T$ within $1/8$ of $pqp$ and hence within distance $1/2$ of $q$. Therefore, $[q'] = [q]$, by 4.3.2 of [Bl] or 5.2.6 of [W-O]. Also, the element $q'$ can be obtained as a limit of polynomial functions with zero constant term applied to $pqp$. From this we see that $pq' = q' = q'p$, or $q' \le p$. We have all the properties we desired from $q$ and $q'$, except the estimate on the trace of $q$.

Lemma 3.4. $|\tau(p) - \tau(q)| < \varepsilon$.

Proof. First, we want to estimate $|\tau(f) - \tau(fq)|$. Recall that the sum of the $e_i$'s was the identity, so that we have

$$|\tau(f) - \tau(fq)| = \Big| \sum_{i=1}^{m} \tau(f e_i) - \sum_{i>k} \tau(f e_i) \Big| = \Big| \sum_{i=1}^{k} \tau(f e_i) \Big|.$$

Now we use the fact that, for $i \le k$, we have $\lambda_i \le 1/2$. So we may continue

$$\begin{aligned}
|\tau(f) - \tau(fq)| &\le \Big| \sum_{i=1}^{k} 2(1 - \lambda_i) \tau(f e_i) \Big| \\
&\le \Big| \sum_{i=1}^{m} 2(1 - \lambda_i) \tau(f e_i) \Big| \\
&= 2 \Big| \sum_{i=1}^{m} \tau(f e_i) - \tau(f \lambda_i e_i) \Big| \\
&= 2 |\tau(f) - \tau(f^2 e f)| \\
&< 14\delta,
\end{aligned}$$

by Lemma 3.2. Now we are ready to compute

$$\begin{aligned}
|\tau(p) - \tau(q)| &\le |\tau(p) - \tau(f)| + |\tau(f) - \tau(fq)| + |\tau(fq) - \tau(pq)| + |\tau(pq) - \tau(q)| \\
&< \delta + 14\delta + \|f - p\| + \|pq - q\| \\
&< 20\delta \le \varepsilon,
\end{aligned}$$

using Lemma 3.3 for the last term.

We have now completed the proof of Lemma 3.1 and we are ready to give a proof of Theorem 1.1.

First, we consider the "only if" direction. If $a$ is any positive element in $K_0(A_T)$, then by definition, $a = [p]$ for some projection $p$ in some $M_n(A_T)$. By applying II.4.2 of [Ren], we may view $p$ as a matrix of functions on $R_{punc}$. Since $p = p^* p$, the diagonal elements of the matrix $p$ are non-negative on the diagonal in $R_{punc}$. This means that $\tau(p)$ is non-negative. Moreover, if it is zero, then since $\mu$ has full support, each diagonal entry of $p$ is zero. This in turn implies that $p = 0$.

Now we turn to the "if" direction of the proof. That is, suppose that $a = [p] - [q]$ is in $K_0(A_T)$ and has $\hat{\tau}(a) = \tau(p) - \tau(q) > 0$. We will show that $[p] \ge [q]$ in $K_0(A_T)$. We will first consider the case that the projections actually lie in the algebra $A_T$, rather than in matrices over $A_T$.

Begin with two projections $p_1$ and $p_2$ in $A_T$ and suppose that $\tau(p_1) > \tau(p_2)$. Let $\varepsilon = (\tau(p_1) - \tau(p_2))/3$. We apply Lemma 3.1 to the projection $p_1$ and $\varepsilon > 0$ to obtain $q_1$ in $AF_T$ with $[p_1] \ge [q_1]$ and $|\tau(p_1) - \tau(q_1)| < \varepsilon$. We apply the same result to the projection $1 - p_2$ and the same $\varepsilon$ to obtain a projection in $AF_T$. We let $q_2$ be its orthogonal complement. So we have $[q_2] \ge [p_2]$ and $|\tau(p_2) - \tau(q_2)| < \varepsilon$. Then by a simple application of the triangle inequality, we have

$$\tau(q_1) - \tau(q_2) \ge \varepsilon > 0.$$

The $C^*$-algebra $AF_T$ has a unique trace and it is a simple AF-algebra. For simple AF-algebras, the order on their K-zero groups is completely determined by the traces [EHS]. We know then that $[q_1] \ge [q_2]$ and hence $[p_1] \ge [p_2]$ in $K_0(A_T)$, as desired.

In the case that the projections lie in $M_n(A_T)$, we can use the same argument by replacing the groupoids $R_{punc}$ and $R_{AF}$ by their products with the trivial groupoid $\{1, \ldots, n\} \times \{1, \ldots, n\}$. All of the essential features of the groupoids remain and the effect at the level of $C^*$-algebras is to tensor on $C^*(\{1, \ldots, n\} \times \{1, \ldots, n\}) \cong M_n$, the $C^*$-algebra of $n \times n$ matrices. We omit the details.

Acknowledgement. I would like to thank Johannes Kellendonk, Chris Bose, Florin Diacu and Rua Murray for helpful conversations.

References

[AP] Anderson, J.E. and Putnam, I.F.: Topological invariants for substitution tilings and their associated $C^*$-algebras. Ergodic Theory and Dynamical Systems 18, 509–537 (1998)
[Be1] Bellissard, J.: K-theory of $C^*$-algebras in solid state physics. In: Statistical Mechanics and Field Theory: Mathematical Aspects, ed. T.C. Dorlas, N.M. Hugenholtz and M. Winnik, Lecture Notes in Physics 257, Berlin–Heidelberg–New York: Springer-Verlag, 1986, pp. 99–156
[Be2] Bellissard, J.: Gap labelling theorems for Schrödinger's Operators. In: From Number Theory to Physics, ed. M. Waldschmidt, P. Moussa, J.M. Luck and C. Itzykson, Berlin–Heidelberg–New York: Springer-Verlag, 1992, pp. 538–630
[BCL] Bellissard, J., Contensou, E., Legrand, A.: K-théorie des quasicristaux, image par la trace, le cas du réseau octogonal. C. R. Acad. Sci. Paris t. 326, Série I, 197–200 (1998)
[Bl] Blackadar, B.: K-theory for Operator Algebras. MSRI Publications 5, Berlin–Heidelberg–New York: Springer-Verlag, 1986
[Co2] Connes, A.: Non-commutative Geometry. San Diego: Academic Press, 1994
[Da] Davidson, K.R.: $C^*$-algebras by example. Providence, RI: Am. Math. Soc., 1996
[Ef] Effros, E.G.: Dimensions and $C^*$-algebras. CBMS Regional Conf. Ser. no. 46, Providence, RI: Am. Math. Soc., 1981
[EHS] Effros, E.G., Handelman, D.E. and Shen, C.-L.: Dimension groups and their affine representations. Am. J. Math. 102, 385–407 (1980)
[vE] van Elst, A.: Gap-labelling theorems for Schrödinger operators on the square and cubic lattice. Rev. Math. Phys. 6, 319–342 (1994)
[Fi] Fillmore, P.A.: A user's guide to operator algebras. New York: Wiley, 1996
[GS] Grünbaum, B. and Shephard, G.C.: Tilings and Patterns. New York: Freeman and Co., 1987
[Ha] Handelman, D.: Positive matrices and dimension groups associated with topological Markov chains. J. Operator Th. 6, 55–74 (1981)
[Kel1] Kellendonk, J.: Non-commutative geometry of tilings and gap labelling. Rev. Math. Phys. 7, 1133–1180 (1995)
[Kel2] Kellendonk, J.: The local structure of tilings and their integer group of coinvariants. Commun. Math. Phys. 187, 115–157 (1997)
[KP] Kellendonk, J. and Putnam, I.F.: Tilings, $C^*$-algebras and K-theory. Preprint
[LM] Lind, D. and Marcus, B.: An introduction to symbolic dynamics and coding. Cambridge: Cambridge University Press, 1995
[Pe] Pedersen, G.K.: $C^*$-algebras and their automorphism groups. London: Academic Press, 1979
[Put1] Putnam, I.F.: $C^*$-algebras from Smale spaces. Canad. J. Math. 48, 175–195 (1996)
[Put2] Putnam, I.F.: Hyperbolic dynamical systems and generalized Cuntz-Krieger algebras. Lecture Notes from the summer school in operator algebras, Odense, 1996
[PS] Putnam, I.F. and Spielberg, J.: The structure of $C^*$-algebras associated with hyperbolic dynamical systems. J. Func. Anal. 163, 279–299 (1999)
[RW] Radin, C. and Wolff, M.: Space tilings and local isomorphism. Geom. Ded. 42, 355–360 (1992)
[Ren] Renault, J.N.: A groupoid approach to $C^*$-algebras. Lecture Notes in Math. 793, Berlin–Heidelberg–New York: Springer-Verlag, 1980
[Rud] Rudolph, D.J.: Markov tilings of $\mathbb{R}^d$ and representations of $\mathbb{R}^d$ actions. Contemp. Math. 94, 271–289 (1989)
[So1] Solomyak, B.: Dynamics of self-similar tilings. Ergodic Theory and Dynamical Systems 17, 695–738 (1997)
[So2] Solomyak, B.: Non-periodicity implies unique composition for self-similar translationally-finite tilings. Disc. Comp. Geom. 20, 265–279 (1998)
[W-O] Wegge-Olsen, N.E.: K-theory and $C^*$-algebras. Oxford: Oxford University Press, 1993

Communicated by A. Connes

Commun. Math. Phys. 214, 607 – 649 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Stratification of the Generalized Gauge Orbit Space

Christian Fleischhack 1,2

1 Mathematisches Institut and Institut für Theoretische Physik, Universität Leipzig, Augustusplatz 10/11, 04109 Leipzig, Germany. E-mail: [email protected]
2 Max-Planck-Institut für Mathematik in den Naturwissenschaften, Inselstraße 22–26, Leipzig, Germany. E-mail: [email protected]

Received: 12 January 2000 / Accepted: 8 May 2000

Abstract: Different versions for defining Ashtekar’s generalized connections are investigated depending on the chosen smoothness category for the paths and graphs – the label set for the projective limit. Our definition covers the analytic case as well as the case of webs. Then the action of Ashtekar’s generalized gauge group G on the space A of generalized connections is investigated for compact structure groups G. Here, first, the orbit types of the generalized connections are determined. The stabilizer of a connection is homeomorphic to the holonomy centralizer, i.e. the centralizer of its holonomy group. It is proven that the gauge orbit type of a connection can be defined by the G-conjugacy class of its holonomy centralizer equivalently to the standard definition via G-stabilizers. The connections of one and the same gauge orbit type form a so-called stratum. As the main result of this article a slice theorem is proven on A. This yields the openness of the strata. Afterwards, a denseness theorem is proven for the strata. Hence, A is topologically regularly stratified by G. These results coincide with those of Kondracki and Rogulski for Sobolev connections. Furthermore, the set of all gauge orbit types equals the set of all (conjugacy classes of) Howe subgroups of G. Finally, it is shown that the set of all gauge orbits with maximal type has the full induced Haar measure 1.

1. Introduction For quite a long time the geometric structure of gauge theories has been investigated. A classical (pure) gauge theory consists of three basic objects: First the set A of smooth connections (“gauge fields”) in a principal fiber bundle, then the set G of all smooth gauge transforms, i.e. vertical automorphisms of this bundle, and finally the action of G on A. Physically, two gauge fields that are related by a gauge transform describe one and the same situation. Thus, the space of all gauge orbits, i.e. elements in A/G, is the configuration space for the gauge theory. Unfortunately, in contrast to A, which is


an affine space, the space A/G has a very complicated structure: It is non-affine, noncompact and infinite-dimensional and it is not a manifold. Additionally, another typical “disadvantage” concerning A/G is the so-called Gribov problem: usually there do not exist global gauge fixings in A/G, i.e. smooth sections in A −→ A/G. Moreover, even in Airr −→ Airr /G (Airr ⊆ A being the set of all connections whose holonomy group is an irreducible subgroup of the structure group) there are often no smooth sections as proven by Singer [22]. All that causes enormous problems, in particular, when one wants to quantize a gauge theory. One possible quantization method is the path integral quantization. Here one has to find an appropriate measure on the configuration space of the classical theory, hence a measure on A/G. As just indicated, this is very hard to find. Thus, one has hoped for a better understanding of the structure of A/G. However, up to now, results are quite rare. But, should one restrict oneself to the case of smooth connections? Since in a quantization process smoothness is usually lost anyway, it is quite clear that one has to admit also non-smooth connections. This way, about 20 years ago, the efforts were focussed on a related problem: The consideration of connections and gauge transforms that are contained in a certain Sobolev class. (For basic results we refer, e.g., to [19].) Now, G is a Hilbert–Lie group and acts smoothly on A. About 15 years ago, Kondracki and Rogulski [14] found a lot of fundamental properties of this action. Perhaps, the most remarkable theorem they obtained was a slice theorem on A. This means, for every orbit A ◦ G ⊆ A there is an equivariant retraction from a (so-called tubular) neighborhood of A onto A ◦ G. Using this theorem they could clarify the structure of the so-called strata. A stratum contains all connections that have the same, fixed type, i.e. the same (conjugacy class of the) stabilizer under the action of G. 
Using a denseness theorem for the strata, Kondracki and Rogulski proved that the space $\mathcal{A}$ is regularly stratified by the action of $\mathcal{G}$. In particular, all the strata are smooth submanifolds of $\mathcal{A}$. Despite these results, the mathematically rigorous construction of a measure on $\mathcal{A}/\mathcal{G}$ has not been achieved. This problem was solved, at least preliminarily, by Ashtekar et al. [1, 2], although not for $\mathcal{A}/\mathcal{G}$ itself. Their idea was simply to drop all smoothness conditions for the connections and gauge transforms. In detail, they first used the fact that a (smooth) connection can always be reconstructed uniquely from its parallel transports. On the other hand, these parallel transports can be identified with an assignment of elements of the structure group $\mathbf{G}$ to the paths in the base manifold $M$ such that the concatenation of paths corresponds to the product of these group elements. It is intuitively clear that for smooth connections the parallel transports additionally depend smoothly on the paths [16]. But now this restriction is removed for the generalized connections. They are purely algebraic homomorphisms from the groupoid $\mathcal{P}$ of paths to the structure group $\mathbf{G}$. Analogously, the set $\overline{\mathcal{G}}$ of generalized gauge transforms collects all functions from $M$ to $\mathbf{G}$. Now the action of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$ is defined purely algebraically as well. Giving $\overline{\mathcal{A}}$ and $\overline{\mathcal{G}}$ the topologies induced by the topology of $\mathbf{G}$, one sees that, for compact $\mathbf{G}$, these spaces are again compact. This provides us with the huge apparatus of measure theory on compact spaces. A particularly nice theorem [2] guarantees the existence of a natural induced Haar measure on $\overline{\mathcal{A}}$ and $\overline{\mathcal{A}}/\overline{\mathcal{G}}$, the new configuration space for the path integral quantization. Both from the mathematical and from the physical point of view it is very interesting how the "classical" regular gauge theories are related to the generalized formulation in the Ashtekar framework. First of all, it has been proven that $\mathcal{A}$ and $\mathcal{G}$ are dense subsets of $\overline{\mathcal{A}}$ and $\overline{\mathcal{G}}$, respectively [20].
Furthermore, $\mathcal{A}$ is contained in a set of induced Haar measure zero [18]. These properties coincide exactly with the experiences known from the Wiener


or Feynman path integral. Then the Wilson loop expectation values have been determined for the two-dimensional pure Yang–Mills theory [5, 11] – in coincidence with the known results in the standard framework. Now, we are going to investigate the action of the generalized gauge transforms on the space of generalized connections in comparison with its counterpart in the Sobolev case discussed in detail by Kondracki and Rogulski [14]. Our main goals are the determination of the gauge orbit types and the proof of a slice theorem and a denseness theorem in the Ashtekar framework. However, our methods are completely different to those in [14]: We use topology and algebra instead of differential geometry and analysis. The outline of the present paper is as follows: • In the first part we will give a quite detailed introduction into the algebraic and topological definitions and properties of A, G and A/G. Here we closely follow Ashtekar and Lewandowski [4, 3] as well as Marolf and Mourão [18]. The most important difference to their definitions is that we do not restrict the paths to be (piecewise) analytic or smooth. For our purpose it is sufficient to fix a category of smoothness from the beginning. This is C r , where r can be any positive natural number, ∞ (smooth case) or ω (analytical case). We can also consider the corresponding cases C r,+ of paths that are (piecewise) immersions additionally. We will show that in a certain sense the case (ω, +) corresponds to the loop structures introduced by Ashtekar and Lewandowski [2] and the case (∞, +) corresponds to the webs introduced by Baez and Sawin [6]. • For the following parts we still need a more detailed analysis of the properties of the space A itself. However, these considerations are a little bit separate from the main goal of this paper – the investigation of the action of G on A –, such that we exported them to a second article [10]. 
There we will give a construction method for new connections being crucial for most of the statements of that article as well as some of the present paper. Then, as a main result, we will show that an induced Haar measure dµ0 can be defined for arbitrary smoothness conditions. For this, we introduce the notion of a hyph that generalizes the notion of a web and a graph. We show that the paths of a hyph are holonomically independent and that the set of all hyphs is directed. These two properties yield the well-definedness of dµ0 . • The second part of the present paper is devoted to the type of the gauge orbit. Here and in the following, G is compact. In the general theory of transformation groups the type of an orbit (or, more precisely, an element of an orbit) is defined by the conjugacy class of its stabilizer (see, e.g., [8]). Here, we will derive the explicit form of the stabilizer for every generalized connection. As we will see, the stabilizer of a connection is homeomorphic to the centralizer of its holonomy group, hence a finite-dimensional Lie group. Since stabilizers are conjugate in G if and only if these centralizers are conjugate in G, the type of an orbit is uniquely determined by a certain equivalence class of a Howe subgroup of the structure group G. (A Howe subgroup of G is a subgroup that can be written as the centralizer of some subset V ⊆ G.) This is closely related to the observations of Kondracki and Sadowski [15]. • In the third part we will see how the results of Kondracki and Rogulski can be extended from the Sobolev framework to the generalized case. First of all we prove a very crucial lemma: Every centralizer in a compact Lie group is finitely generated. This implies that every orbit type (being the centralizer of the holonomy group) is determined by a finite set of holonomies of the corresponding connection. Using the projection onto these holonomies we can lift the slice theorem from an appropriate finite-dimensional Gn to the space A. 
A slice theorem means that for every connection A ∈ A there


is an open and G-invariant neighbourhood that can be retracted equivariantly to the orbit A ◦ G. It implies the (relative) openness of the strata – the sets of connections of one and the same type. Afterwards, we prove a denseness theorem for the strata. For this we need the construction method for new connections from [10] mentioned above. Altogether, we prove that the slice and the denseness theorem yield again a topologically regular stratification of A as well as of A/G. However, in contrast to the Sobolev case, the strata are not proved to be manifolds. But, two results for generalized connections go beyond those for Sobolev ones. First, as a corollary of the denseness theorem we obtain that the set of all gauge orbit types equals the set of all conjugacy classes of Howe subgroups of G. This way we can explicitly derive the set of all gauge orbit types. This was not known until now for the Sobolev case. However, recently, Rudolph, Schmidt and Volobuev [21] solved this problem for all SU (n)-bundles over two-, three- and four-dimensional manifolds. Second, we prove that the generic stratum, i.e. the set of all connections whose holonomy centralizers equal the center of G, is not only dense in A, but has also the total induced Haar measure 1. This shows finally that the Faddeev–Popov determinant for the projection A −→ A/G is equal to 1. In the following, M is always a connected and at least two-dimensional C r -manifold with r ∈ N+ ∪ {∞} ∪ {ω} being arbitrary, but fixed. Furthermore, m is an, as well, arbitrary, but fixed point in M and G is a Lie group being compact in sections 3ff. 2. Reformulation of Ashtekar’s Gauge Theory 2.1. Paths. In the classical approach a connection can be described by the corresponding parallel transports along paths in the base manifold. But, not every assignment of group elements to the paths yields a connection. 
On the one hand, this map has to be a homomorphism, i.e., products of paths have to lead to products of the parallel transports, and on the other hand, it has to depend in a certain sense continuously on the paths. Moreover, additional topological obstructions may occur. In the Ashtekar approach, however, the second (and the third) condition is dropped. A connection is now simply a homomorphism from the set of paths to the structure group G. Up to now, it is not clear whether there is an “optimal” definition for the structure of the groupoid P of paths. The first version was given by Ashtekar and Lewandowski [2]. They used piecewise analytical paths. The advantage of this approach was that any finite set of paths forms a finite graph. Hence for two finite graphs there is always a third graph containing both of them, i.e. the set of all graphs forms a directed set. Using this it is easy to prove independence theorems for loops and to define then a natural measure on A and A/G. But, the restriction to analyticity seems a little bit unsatisfactory. Since one has desired from the very beginning to use A for describing quantum gravity, one comes into troubles with the diffeomorphism invariance of this theory: After applying a diffeomorphism a path need no longer be analytical. That is why Baez and Sawin [6] introduced so-called webs and tassels built by only smooth paths fulfilling certain conditions. Any graph can be written as a web and for any finite number of webs there is a web containing all of them. So the directedness of the label set for the definition of A remains valid, and, consequently, one can generalize the construction of the natural induced Haar measure and lots more things. In this paper we will introduce another definition for paths. Our definition will have the advantage that it does not depend explicitly on the chosen smoothness category


labelled by $r \in \mathbb{N}_+ \cup \{\infty\} \cup \{\omega\}$. Moreover, it does not matter whether we demand the paths to be piecewise immersions (cases $C^{r,+}$, with $+$ denoting the immersion property) or not. Therefore, in what follows suppose that we have fixed the parameter $r$ from the very beginning. Furthermore, we decide now whether we additionally demand the paths to be piecewise immersions or not. Nevertheless, we always write simply $C^r$.

2.1.1. General case. In this paragraph we consider all smoothness categories at one stroke.

Definition 2.1. A path is a piecewise $C^r$-map $\gamma : [0,1] \longrightarrow M$. If we consider piecewise immersed paths, we additionally have to define all $\gamma : [0,1] \longrightarrow M$ that are piecewise constant, i.e. $\gamma|_{[\tau_1,\tau_2]} = \{x\}$ for some $x \in M$, or immersive to be paths. The initial point is $\gamma(0)$ and the terminal point $\gamma(1)$. Two paths $\gamma_1$ and $\gamma_2$ can be multiplied iff the terminal point of $\gamma_1$ and the initial point of $\gamma_2$ coincide. Then the product is given by

$$\gamma_1\gamma_2(t) := \begin{cases} \gamma_1(2t) & \text{for } 0 \le t \le \tfrac{1}{2}, \\ \gamma_2(2t-1) & \text{for } \tfrac{1}{2} \le t \le 1. \end{cases}$$

A path $\gamma$ is called trivial iff $\mathrm{im}\,\gamma \equiv \gamma([0,1])$ is a single point.

An important idea of the Ashtekar program is the assumption that the total information about the continuum theory is encoded in the set of all subtheories on finite lattices. Thus we need the definition of paths and graphs. The set of all paths is hard to manage. That is why we restrict ourselves to special paths.

Definition 2.2.
• A path $\gamma$ has no self-intersections iff $\gamma(\tau_1) = \gamma(\tau_2)$ implies that
  – $\tau_1 = \tau_2$ or
  – $\tau_1 = 0$ and $\tau_2 = 1$ or
  – $\tau_1 = 1$ and $\tau_2 = 0$.
• Two paths $\gamma_1$ and $\gamma_2$ are non-intersecting iff $\gamma_1(\tau_1) = \gamma_2(\tau_2)$ implies $\tau_1, \tau_2 \in \{0, 1\}$.
• A path $\gamma'$ is called a subpath of a path $\gamma$ iff there is an affine non-decreasing map $\phi : [0,1] \longrightarrow [0,1]$ with $\gamma' = \gamma \circ \phi$. If additionally $\phi(0) = 0$ (or $\phi(1) = 1$), $\gamma'$ is called an initial path (or a terminal path) of $\gamma$.
We define $\gamma^{t,+}(\tau) := \gamma(t + \tau(1-t))$ for all $t \in [0,1)$ and $\gamma^{t,-}(\tau) := \gamma(\tau t)$ for all $t \in (0,1]$ to be the outgoing and incoming subpath of $\gamma$ in $t$, respectively. If $\gamma$ is a path without self-intersections, then set $\gamma^{x,\pm} := \gamma^{t,\pm}$ for all $x \in \mathrm{im}\,\gamma$, where $t$ fulfills $\gamma(t) = x$. (We choose $t = 0$ in the $+$-case if $x = \gamma(0)$. Analogously for $t = 1$.)
• A (finite) graph $\Gamma$ is a (finite) union of paths $e_i$ that are mutually non-intersecting and have no self-intersections, and of isolated points $v_j$. The elements of $V(\Gamma) := \bigcup_i \{e_i(0), e_i(1)\} \cup \bigcup_j \{v_j\}$ are called vertices, those of $E(\Gamma) := \bigcup_i \{e_i\}$ edges. A graph is called connected iff $V(\Gamma) \cup \bigcup_{e \in E(\Gamma)} \mathrm{im}\, e$ is connected.
• A path in a graph $\Gamma$ is a path in $M$ that equals a product of edges in $\Gamma$ and trivial paths (with values in $V(\Gamma)$), respectively, whereas the product of two consecutive paths has to exist. A path $\gamma$ in $M$ is called simple iff there is a finite graph $\Gamma$ such that $\gamma$ is a path in $\Gamma$.
• A path $\gamma$ in $M$ is called finite iff $\gamma$ is, up to the parametrization, equal to a finite product of simple paths. Here, two paths $\gamma_1$ and $\gamma_2$ are equal up to the parametrization iff there is a bijective $\varphi : [0,1] \longrightarrow [0,1]$ with $\varphi(0) = 0$ and $\gamma_2 = \gamma_1 \circ \varphi$ such that $\varphi$ and $\varphi^{-1}$ are piecewise $C^r$.


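• Two finite paths $\gamma_1$ and $\gamma_2$ are called equivalent iff there is a finite sequence of finite paths $\delta_i$ with $\delta_0 = \gamma_1$ and $\delta_n = \gamma_2$ such that for all $i = 1, \ldots, n$
  – $\delta_i$ and $\delta_{i-1}$ coincide up to the parametrization or
  – $\delta_i$ arises from $\delta_{i-1}$ by inserting a retracing or
  – $\delta_{i-1}$ arises from $\delta_i$ by inserting a retracing.
Inserting a retracing means that there are a $\tau \in [0,1]$ and a finite path $\delta$ such that

$$\delta_i(t) = \begin{cases} \delta_{i-1}(2t) & \text{for } 0 \le t \le \tfrac{1}{2}\tau, \\ \delta\bigl(4(t - \tfrac{1}{2}\tau)\bigr) & \text{for } \tfrac{1}{2}\tau \le t \le \tfrac{1}{2}\tau + \tfrac{1}{4}, \\ \delta\bigl(4(\tfrac{1}{2}\tau + \tfrac{1}{2} - t)\bigr) & \text{for } \tfrac{1}{2}\tau + \tfrac{1}{4} \le t \le \tfrac{1}{2}\tau + \tfrac{1}{2}, \\ \delta_{i-1}(2t - 1) & \text{for } \tfrac{1}{2}\tau + \tfrac{1}{2} \le t \le 1. \end{cases}$$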

In the following, we denote by a retracing of a path $\gamma$ a subpath of the form $\delta\delta^{-1}$ with a certain finite $\delta$.
• The set of all classes of finite paths is denoted by $\mathcal{P}$, that of paths in $\Gamma$ by $\mathcal{P}_\Gamma$. Furthermore, we write $\mathcal{P}_{xy}$ for the set of all classes of finite paths from $x$ to $y$. The set of all classes of finite paths having base point $m$ forms the hoop group $\mathcal{HG} \equiv \mathcal{P}_{mm}$.

We have immediately from the definitions

Proposition 2.1. The multiplication on $\mathcal{P}$ induced by the multiplication of paths is well-defined and generates a groupoid structure on $\mathcal{P}$. (This means, roughly speaking, that $\mathcal{P}$ possesses all properties of a group: associativity, existence of unit elements and of the inverse. But the product need not be defined for all paths.) The hoop group $\mathcal{HG}$ is a subgroup of $\mathcal{P}$.

Remark.
1. One can define an analogous equivalence relation on the set of paths in a fixed graph $\Gamma$: Two paths would be "$\Gamma$-equivalent" iff they arise from each other by reparametrizations or by inserting or cancelling retracings contained in $\Gamma$. Obviously, two paths in $\Gamma$ are equivalent if they are $\Gamma$-equivalent. On the other hand, one can also prove that two paths contained in $\Gamma$ are already $\Gamma$-equivalent if they are equivalent. Consequently, we can identify $\mathcal{P}_\Gamma$ and the set of all $\Gamma$-equivalence classes of paths in $\Gamma$. In other words: $\mathcal{P}_\Gamma$ is the groupoid that is generated freely by the set of all edges of $\Gamma$.
2. In what follows we usually say simply "graph" instead of "finite connected graph" and only "path" instead of "finite path". Moreover, by a path we always mean, unless the converse is explicitly said, an equivalence class of paths.
3. Finally, we identify two graphs if the (corresponding) edges are equivalent. Since edges are by definition free of retracings, this simply means that the edges are equal up to the parametrization.
4. Note that the paths $\gamma_1(\tau) := \tau$ and $\gamma_2(\tau) := \tau^2$ in $\mathbb{R}$ ($\subseteq \mathbb{R}^n \subseteq M$) are not equivalent. This comes from the fact that $\varphi : \tau \longmapsto \tau^2$ is $C^r$, but $\varphi^{-1} : \tau \longmapsto \sqrt{\tau}$ is not. (Likewise, it is not possible to transform $\gamma_1$ into $\gamma_2$ by successively inserting or deleting retracings as in Definition 2.2.) Furthermore, one sees that $\gamma_1 \circ \gamma_2^{-1}$ is an example of a path with retracings that is not equivalent to a path without.
5. If we restricted ourselves to piecewise analytical paths, i.e. the smoothness category $(\omega, +)$, from the very beginning, every path would be finite [2].
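Item 1 of the Remark says that the paths in a fixed graph form the groupoid freely generated by the edges, so two edge words represent the same class precisely when they agree after all retracings have been cancelled. The following Python sketch illustrates this reduction. The string encoding of edges (`"e1"` and `"e1^-1"`) and all function names are our own assumptions for illustration, and the sketch does not check that consecutive edges can actually be multiplied.

```python
def inverse(letter):
    """Return the formal inverse of an edge letter, e.g. "e1" <-> "e1^-1"."""
    suffix = "^-1"
    return letter[:-len(suffix)] if letter.endswith(suffix) else letter + suffix


def reduce_word(word):
    """Cancel adjacent pairs e e^-1 and e^-1 e until none remain."""
    stack = []
    for letter in word:
        if stack and stack[-1] == inverse(letter):
            stack.pop()            # the last two letters form a retracing
        else:
            stack.append(letter)
    return stack


def equivalent(word1, word2):
    """Two edge words are identified iff their reduced forms coincide."""
    return reduce_word(word1) == reduce_word(word2)


word = ["e1", "e2", "e2^-1", "e1^-1", "e3"]
print(reduce_word(word))                          # ['e3']
print(equivalent(["e1", "e2", "e2^-1"], ["e1"]))  # True
```

A stack suffices here because cancelling one retracing can only expose a new one at the position just before it.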


The main assumption quoted above suggests the usage of finite graphs as an index set for the subtheories. But these theories are not "independent". Roughly speaking, a subtheory defined on a smaller lattice arises by projecting the theory defined on the bigger lattice.

Definition 2.3. Let $\Gamma_1$ and $\Gamma_2$ be two graphs. $\Gamma_1$ is smaller than or equal to $\Gamma_2$ ($\Gamma_1 \le \Gamma_2$) iff each edge of $\Gamma_1$ is (up to the parametrization) a product of edges of $\Gamma_2$ and the vertex sets fulfill $V(\Gamma_1) \subseteq V(\Gamma_2)$.

Obviously, $\le$ is a partial ordering.

2.1.2. Immersive case. In the case of piecewise immersed paths we can define another equivalence relation for finite paths. Here we use the fact that any piecewise immersed path can be parametrized proportionally to the arc length:

Definition 2.4. We shortly call a path a pal-path iff it is parametrized proportionally to the arc length. Two finite paths $\gamma_1$ and $\gamma_2$ are called equivalent iff there is a finite sequence of finite paths $\delta_i$ with $\delta_0 = \gamma_1$ and $\delta_n = \gamma_2$ such that for all $i = 1, \ldots, n$
• $\delta_i$ and $\delta_{i-1}$ coincide when parametrized proportionally to the arc length or
• $\delta_i$ arises from $\delta_{i-1}$ by inserting a retracing or
• $\delta_{i-1}$ arises from $\delta_i$ by inserting a retracing.

This definition seems to require a certain Riemannian structure on $M$. But, on the one hand, each manifold can be given a Riemannian structure. On the other hand, the definition of equivalence does not depend on the chosen Riemannian metric: if two paths coincide when parametrized proportionally to the arc length of the first metric, then they obviously coincide when parametrized proportionally to the arc length of the other metric. Thus, this definition is indeed completely determined by the manifold structure of $M$.

Lemma 2.2.
1. Two finite paths $\gamma_1$ and $\gamma_2$ are equivalent if they can be obtained from each other by a reparametrization.
2. Each nontrivial finite path is equivalent to a pal-path without retracings.

Proof. 1. Clear. 2. We prove this inductively on the number $n$ of simple paths $\gamma_i$ that the finite path $\gamma$ is decomposed into.
We will even prove that $\gamma$ is equivalent to a pal-path $\gamma'$ that can be decomposed (up to the parametrization) into $n' \le n$ simple paths and that has no retracings. For $n = 1$ we have nothing to prove. Thus, let $n \ge 2$. First free $\gamma_0 := \prod_{i=1}^{n-1} \gamma_i$ of the retracings using the induction hypothesis. We get a pal-path $\gamma_0' \sim \prod_{i=1}^{n'-1} \gamma_i'$ with the desired properties and $n' \le n$. Denote by $\gamma''$ the pal-path corresponding to $\gamma_0' \gamma_n$. Obviously, $\gamma'' \sim \gamma$. Suppose $\gamma''$ is not free of retracings. Let $\delta\delta^{-1}$ be a retracing. Then a part of the retracing $\delta\delta^{-1}$ has to be in $\gamma_n$. Since $\gamma_n$ is simple (and w.l.o.g. non-trivial), the terminal point of $\delta$ cannot be in $\mathrm{int}\,\gamma_n$. Since by assumption $\gamma_0'$ is free of retracings, the terminal point has to be the initial point of $\gamma_n$, and thus $\delta^{-1}$ is (if necessary, after an appropriate [affine] reparametrization) an initial path of $\gamma_n$. Assume now $\delta$ to be maximal, i.e., any terminal path $\delta'$ of $\gamma_0'$ "containing" $\delta$ that yields a retracing in $\gamma''$ is equal to $\delta$. Such a $\delta$ exists: Assume that every pal-subpath $\delta_\tau^{-1}$ of $\gamma_n$ corresponding to the parameter interval $[0, \tau]$ with $\tau < T$ yields a retracing. (Such


a $T$ exists, because there exists some retracing.) By the continuity of every path and the fact that the paths arising here, and so all their subpaths, are pal, also $\delta_T^{-1}$ has to yield a retracing. Consequently, there is a maximal $T$. Now, cancel out the retracing: If $\delta$ is not a (genuine) subpath of $\gamma'_{n'-1}$ (i.e., "exceeds" or equals it), define $\gamma_n'$ to be the "remaining" part of $\gamma_n$ "outside" $(\gamma'_{n'-1})^{-1}$; then $\gamma''' := \prod_{i=1}^{n'-2} \gamma_i' \circ \gamma_n' \sim \gamma$ consists of at most $n' - 2 + 1 < n$ finite paths. The induction hypothesis gives the assertion. Suppose now that $\delta$ is a (genuine) subpath of $\gamma'_{n'-1}$. Then define the pal-path $\gamma'$ by $\prod_{i=1}^{n'-2} \gamma_i' \, \gamma''_{n'-1} \circ \gamma_n'$, where $\gamma_n'$ denotes the "remaining" part of $\gamma_n$ outside of $\delta^{-1}$ and $\gamma''_{n'-1}$ that of $\gamma'_{n'-1}$ outside of $\delta$. By the maximality of $\delta$, $\gamma'$ contains no retracings. $\gamma' \sim \gamma'' \sim \gamma$ yields the assertion.

Most of the constructions in the following, as well as most of those in [10], do not actually depend on the choice of the equivalence relation for the paths. But the second one can only be used for piecewise immersed paths. Therefore, in what follows, we will use the general equivalence relation given in the previous paragraph.

2.2. Gauge theory on the lattice. In this subsection we will transfer the lattice gauge theory given by Ashtekar and Lewandowski [4, 3] to our case. The algebraic definitions for the connections, gauge transforms and the action of the latter ones follow these authors closely. In the last two subsections we will state some assertions mainly on the basic properties of the action of the gauge transforms and the projections onto smaller graphs.

2.2.1. Algebraic definition. We use the standard definition: Globally, connections are parallel transports, i.e. $\mathbf{G}$-valued homomorphisms of paths in $M$, and gauge transforms are $\mathbf{G}$-valued functions over $M$. The lattice versions now come from restricting the domain of definition to edges and vertices in a graph.

Definition 2.5. Let $\Gamma$ be a graph. We define $\mathcal{A}_\Gamma := \mathrm{Hom}(\mathcal{P}_\Gamma, \mathbf{G})$ . . .
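set of all connections on $\Gamma$ and $\mathcal{G}_\Gamma := \mathrm{Maps}(V(\Gamma), \mathbf{G})$ . . . set of all gauge transforms on $\Gamma$. Here, $\mathrm{Hom}(\mathcal{P}_\Gamma, \mathbf{G})$ denotes the set of all homomorphisms from the groupoid $\mathcal{P}_\Gamma$ freely generated by the edges of $\Gamma$ into the structure group, and $\mathrm{Maps}(V(\Gamma), \mathbf{G})$ the set of all maps from the set of all vertices of $\Gamma$ into the structure group.

In the classical case the action of a gauge transform on a connection can be described by the corresponding action on the parallel transports: $h_A(\gamma) \longmapsto g_{\gamma(0)}^{-1}\, h_A(\gamma)\, g_{\gamma(1)}$. By simply restricting onto the lattice we obtain the action of $\mathcal{G}_\Gamma$ on $\mathcal{A}_\Gamma$ by

$$\Theta_\Gamma : \mathcal{A}_\Gamma \times \mathcal{G}_\Gamma \longrightarrow \mathcal{A}_\Gamma, \qquad (h_\Gamma, g_\Gamma) \longmapsto h_\Gamma \circ g_\Gamma$$

with $h_\Gamma \circ g_\Gamma(\gamma) := g_\Gamma(\gamma(0))^{-1}\, h_\Gamma(\gamma)\, g_\Gamma(\gamma(1))$ for all paths $\gamma$ in $\Gamma$.

Definition 2.6. For each graph $\Gamma$ we define $\mathcal{A}/\mathcal{G}_\Gamma := \mathcal{A}_\Gamma / \mathcal{G}_\Gamma$ . . . set of all equivalence classes of connections in $\Gamma$.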
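The lattice action defined above can be tried out on a small example. The following Python sketch is only an illustration under our own assumptions: the structure group is replaced by the finite permutation group $S_3$ (permutations stored as tuples), the graph is a hypothetical triangle with edges `a`, `b`, `c`, and all function names are ours. It checks that the holonomy of the closed path `a b c` based at vertex 0 changes under a gauge transform only by conjugation with the value of the gauge transform at that vertex.

```python
def compose(p, q):
    """Product of two permutations stored as tuples: (p q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def invert(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


IDENTITY = (0, 1, 2)

# Hypothetical graph: a triangle; each edge maps to (initial vertex, terminal vertex).
EDGES = {"a": (0, 1), "b": (1, 2), "c": (2, 0)}


def gauge_transform(h, g):
    """Lattice action: (h o g)(e) = g(e(0))^-1 h(e) g(e(1)) for every edge e."""
    return {e: compose(compose(invert(g[src]), h[e]), g[dst])
            for e, (src, dst) in EDGES.items()}


def holonomy(h, path):
    """Value of the homomorphism h on a product of edges."""
    result = IDENTITY
    for e in path:
        result = compose(result, h[e])
    return result


h = {"a": (1, 0, 2), "b": (0, 2, 1), "c": (1, 2, 0)}   # a connection on the triangle
g = {0: (2, 0, 1), 1: (0, 2, 1), 2: (1, 0, 2)}          # a gauge transform

loop = ["a", "b", "c"]                                   # closed path based at vertex 0
lhs = holonomy(gauge_transform(h, g), loop)
rhs = compose(compose(invert(g[0]), holonomy(h, loop)), g[0])
print(lhs == rhs)                                        # True
```

The intermediate factors $g(1)$ and $g(2)$ cancel in the product, which is why only the value at the base point survives.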


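2.2.2. Topological definition. It is obvious that the groupoid $\mathcal{P}_\Gamma$ is always freely generated by the edges $e_i$ of $\Gamma$. Hence, the set $\mathcal{A}_\Gamma = \mathrm{Hom}(\mathcal{P}_\Gamma, \mathbf{G})$ can be identified via $h \longmapsto \bigl(h(e_1), \ldots, h(e_{\#E(\Gamma)})\bigr)$ with $\mathbf{G}^{\#E(\Gamma)}$ and can so be given a natural topology. Analogously, we use that $\mathcal{G}_\Gamma = \mathrm{Maps}(V(\Gamma), \mathbf{G})$ can naturally be identified via $g \longmapsto (g(x))_{x \in V(\Gamma)}$ with $\mathbf{G}^{\#V(\Gamma)}$. So $\mathcal{G}_\Gamma$ is, by means of the pointwise multiplication, a topological group. We have immediately

Proposition 2.3. For all graphs $\Gamma$ the action $\Theta_\Gamma : \mathcal{A}_\Gamma \times \mathcal{G}_\Gamma \longrightarrow \mathcal{A}_\Gamma$ is continuous.

Proof. $\Theta_\Gamma$ as a map from $\mathbf{G}^{\#E(\Gamma)} \times \mathbf{G}^{\#V(\Gamma)}$ to $\mathbf{G}^{\#E(\Gamma)}$ is a concatenation of multiplications, hence continuous.

Corollary 2.4. $\mathcal{A}/\mathcal{G}_\Gamma = \mathcal{A}_\Gamma / \mathcal{G}_\Gamma$ is a Hausdorff space. $\mathcal{A}/\mathcal{G}_\Gamma$ is compact for compact $\mathbf{G}$.

It is well-known that connections are dual to paths and equivalence classes of connections are dual to closed paths. This is again confirmed by

Proposition 2.5. $\mathcal{A}/\mathcal{G}_\Gamma$ is isomorphic to $\mathrm{Hom}(\mathcal{HG}_{x,\Gamma}, \mathbf{G})/\mathrm{Ad}$, hence isomorphic to $\mathbf{G}^{\dim \pi_1(\Gamma)}/\mathrm{Ad}$, for each graph $\Gamma$ and for each vertex $x$ in $\Gamma$. Here $\mathcal{HG}_{x,\Gamma}$ is the set of all (classes of) path(s) in $\Gamma$ starting and ending in $x$, and $\pi_1(\Gamma)$ is the fundamental group of $\Gamma$.

Proof. Define

$$J : \mathcal{A}/\mathcal{G}_\Gamma \longrightarrow \mathrm{Hom}(\mathcal{HG}_{x,\Gamma}, \mathbf{G})/\mathrm{Ad}, \qquad [h] \longmapsto \bigl[h|_{\mathcal{HG}_{x,\Gamma}}\bigr]_{\mathrm{Ad}}.$$

• $J$ is well-defined. If $h' = h \circ g$, then $h'(\alpha) = g(x)^{-1} h(\alpha) g(x)$ for all $\alpha \in \mathcal{HG}_{x,\Gamma}$, i.e. $h'|_{\mathcal{HG}_{x,\Gamma}} = h|_{\mathcal{HG}_{x,\Gamma}} \circ \mathrm{Ad}\, g(x)$.
• $J$ is injective. Let $J(h') = J(h)$, i.e., let there exist a $g \in \mathbf{G}$ such that $h'(\alpha) = g^{-1} h(\alpha) g$ for all $\alpha \in \mathcal{HG}_{x,\Gamma}$. Choose for all vertices $y \ne x$ a path $\gamma_y$ from $x$ to $y$, set $\gamma_x := 1$ and set $g(y) := h(\gamma_y)^{-1}\, g\, h'(\gamma_y)$ for all $y$. Now, $h' = h \circ (g(y))_{y \in V(\Gamma)}$ is clear.
• $J$ is surjective. Let $[h] \in \mathrm{Hom}(\mathcal{HG}_{x,\Gamma}, \mathbf{G})/\mathrm{Ad}$ be given. Choose an $h \in [h]$ and, as above, for all vertices $y$ a path $\gamma_y$ and some $g_y \in \mathbf{G}$. For each $\gamma \in \mathcal{P}_\Gamma$ set $h_0(\gamma) := g_{\gamma(0)}^{-1}\, h\bigl(\gamma_{\gamma(0)}\, \gamma\, \gamma_{\gamma(1)}^{-1}\bigr)\, g_{\gamma(1)}$. We have $J(h_0) = [h]$.

Since $\mathcal{HG}_{x,\Gamma}$ is isomorphic to $\pi_1(\Gamma)$, hence a free group with $\dim \pi_1(\Gamma)$ generators [11, 2], we have $\mathcal{A}/\mathcal{G}_\Gamma \cong \mathbf{G}^{\dim \pi_1(\Gamma)}/\mathrm{Ad}$.

2.2.3. Relations between the lattice theories. If one constructs a global theory from its subtheories, one has to guarantee that these subtheories are "consistent". This means, e.g., that the projection of a connection onto a smaller graph has to be already defined by its projection onto a bigger graph. So we need projections onto the subtheories induced by the partial ordering on the set of graphs.

Definition 2.7. Let $\Gamma_1 \le \Gamma_2$. We define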

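$$\pi_{\Gamma_1}^{\Gamma_2} : \mathcal{A}_{\Gamma_2} \longrightarrow \mathcal{A}_{\Gamma_1}, \qquad h \longmapsto h|_{\mathcal{P}_{\Gamma_1}},$$

$$\pi_{\Gamma_1}^{\Gamma_2} : \mathcal{G}_{\Gamma_2} \longrightarrow \mathcal{G}_{\Gamma_1}, \qquad g \longmapsto g|_{V(\Gamma_1)}$$

and

$$\pi_{\Gamma_1}^{\Gamma_2} : \mathcal{A}/\mathcal{G}_{\Gamma_2} \longrightarrow \mathcal{A}/\mathcal{G}_{\Gamma_1}, \qquad [h] \longmapsto \bigl[h|_{\mathcal{P}_{\Gamma_1}}\bigr].$$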
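Proposition 2.5 above expresses the lattice orbit space through the number $\dim \pi_1(\Gamma)$ of free generators of the fundamental group of the graph. For a finite connected graph this number equals $\#E(\Gamma) - \#V(\Gamma) + 1$, a standard fact that is not spelled out in the text. The following Python sketch computes it; the encoding of a graph as a vertex list plus a list of (initial vertex, terminal vertex) pairs and the function names are our own assumptions.

```python
def is_connected(vertices, edges):
    """Depth-first search over the underlying undirected graph."""
    vertices = list(vertices)
    if not vertices:
        return False
    adjacency = {v: set() for v in vertices}
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    seen = {vertices[0]}
    todo = [vertices[0]]
    while todo:
        v = todo.pop()
        for w in adjacency[v]:
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return len(seen) == len(vertices)


def rank_pi1(vertices, edges):
    """Number of free generators of pi_1 of a finite connected graph."""
    if not is_connected(vertices, edges):
        raise ValueError("the formula #E - #V + 1 assumes a connected graph")
    return len(edges) - len(list(vertices)) + 1


triangle = ([0, 1, 2], [(0, 1), (1, 2), (2, 0)])
theta = ([0, 1], [(0, 1), (0, 1), (0, 1)])    # two vertices joined by three edges
print(rank_pi1(*triangle))                    # 1
print(rank_pi1(*theta))                       # 2
```

For the triangle the orbit space is therefore a single copy of the structure group modulo conjugation, for the theta graph two copies modulo simultaneous conjugation.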

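We denote all three maps by one and the same symbol because it should be clear in the following which map is meant. Obviously, from $h' = h \circ g$ on $\Gamma_2$ follows $h'|_{\mathcal{P}_{\Gamma_1}} = h|_{\mathcal{P}_{\Gamma_1}} \circ g|_{V(\Gamma_1)}$ on $\Gamma_1$, i.e. $\pi_{\Gamma_1}^{\Gamma_2} : \mathcal{A}/\mathcal{G}_{\Gamma_2} \longrightarrow \mathcal{A}/\mathcal{G}_{\Gamma_1}$ is well-defined. Furthermore, we have

Proposition 2.6. Let $\Gamma_1 \le \Gamma_2 \le \Gamma_3$. Then $\pi_{\Gamma_1}^{\Gamma_2}\, \pi_{\Gamma_2}^{\Gamma_3} = \pi_{\Gamma_1}^{\Gamma_3}$.

Finally, we write down the projections by operations on the structure group in order to see topological properties. Let again $\Gamma_1 \le \Gamma_2$. First we decompose each edge $e_i$ of $\Gamma_1$ into edges $f_j$ of $\Gamma_2$: $e_i = \prod_{k_i=1}^{K_i} f_{j(i,k_i)}^{\epsilon(i,k_i)}$. With this we get for the map between the connections ($n := \#E(\Gamma_1)$)

$$\pi_{\Gamma_1}^{\Gamma_2} : \mathbf{G}^{\#E(\Gamma_2)} \longrightarrow \mathbf{G}^{\#E(\Gamma_1)}, \qquad \bigl(g_1, \ldots, g_{\#E(\Gamma_2)}\bigr) \longmapsto \Bigl( \prod_{k_1=1}^{K_1} g_{j(1,k_1)}^{\epsilon(1,k_1)}, \ldots, \prod_{k_n=1}^{K_n} g_{j(n,k_n)}^{\epsilon(n,k_n)} \Bigr).$$

On the level of gauge transforms the description is very easy: $\pi_{\Gamma_1}^{\Gamma_2}$ projects $(g_v)_{v \in V(\Gamma_2)}$ onto those elements belonging to vertices in $\Gamma_1$. For classes of connections a formula analogous to that for connections holds: First choose two free generating systems $\alpha$ and $\beta$ of $\mathcal{HG}_{x_1,\Gamma_1}$ and $\mathcal{HG}_{x_2,\Gamma_2}$, respectively, and then a path $\gamma$ from $x_2$ to $x_1$ in the bigger graph $\Gamma_2$. Thus we get decompositions $\alpha_i = \gamma^{-1} \bigl( \prod_{k_i=1}^{K_i} \beta_{j(i,k_i)}^{\epsilon(i,k_i)} \bigr) \gamma$. Hence ($n_i := \dim \pi_1(\Gamma_i)$),

$$\pi_{\Gamma_1}^{\Gamma_2} : \mathbf{G}^{n_2}/\mathrm{Ad} \longrightarrow \mathbf{G}^{n_1}/\mathrm{Ad}, \qquad \bigl[g_1, \ldots, g_{n_2}\bigr]_{\mathrm{Ad}} \longmapsto \Bigl[ \prod_{k_1=1}^{K_1} g_{j(1,k_1)}^{\epsilon(1,k_1)}, \ldots, \prod_{k_{n_1}=1}^{K_{n_1}} g_{j(n_1,k_{n_1})}^{\epsilon(n_1,k_{n_1})} \Bigr]_{\mathrm{Ad}}.$$

Proposition 2.7. $\pi_{\Gamma_1}^{\Gamma_2}$ is continuous, open and surjective.

Proof. The surjectivity is clear for all three cases. The continuity is trivial for the first two cases and follows in the third because the projections $\mathbf{G}^n \longrightarrow \mathbf{G}^n/\mathrm{Ad}$ are open, continuous and surjective (see [8]) and the map from $\mathbf{G}^{n_2}$ to $\mathbf{G}^{n_1}$ corresponding to $\pi_{\Gamma_1}^{\Gamma_2}$ is obviously continuous. The openness follows immediately in the case of gauge transforms because projections onto factors of a direct product are open anyway. In the case of connections one additionally needs the openness of the multiplication in $\mathbf{G}$: Each edge in $\Gamma_1$ is a product of edges in $\Gamma_2$, i.e., after possibly renumbering we have $e_i = f_{i,1} \cdots f_{i,K_i}$. Thus, $\pi_{\Gamma_1}^{\Gamma_2}(g_{1,1}, \ldots, g_{n,K_n}, \ldots) = (g_{1,1} \cdots g_{1,K_1}, \ldots, g_{n,1} \cdots g_{n,K_n})$. Let now $W$ be open in $\mathcal{A}_{\Gamma_2} = \mathbf{G}^{\#E(\Gamma_2)}$. Then $W$ is a union of sets of the form $W_{1,1} \times \cdots \times W_{n,K_n} \times \cdots$, i.e., $\pi_{\Gamma_1}^{\Gamma_2}(W)$ is a union of sets of the form $(W_{1,1} \cdots W_{1,K_1}) \times \cdots \times (W_{n,1} \cdots W_{n,K_n})$. But these are open, i.e., $\pi_{\Gamma_1}^{\Gamma_2}$ is open. The openness of $\pi_{\Gamma_1}^{\Gamma_2} : \mathcal{A}/\mathcal{G}_{\Gamma_2} \longrightarrow \mathcal{A}/\mathcal{G}_{\Gamma_1}$ follows now because the map $\pi_{\Gamma_1}^{\Gamma_2} : \mathcal{A}_{\Gamma_2} \longrightarrow \mathcal{A}_{\Gamma_1}$ is open and the projections $\mathcal{A}_\Gamma \longrightarrow \mathcal{A}/\mathcal{G}_\Gamma$ are continuous, open and surjective.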
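In terms of the structure group, the projection of a lattice connection onto a smaller graph simply multiplies the group elements of the bigger graph along the decomposition of each edge. The following Python sketch illustrates this and checks the composition rule of Proposition 2.6 on one example. It rests on our own assumptions: the structure group is replaced by the finite permutation group $S_3$, a decomposition is encoded as a list of (edge name, exponent $\pm 1$) pairs, and the three nested graphs with edges `f1`, ..., `f4`, then `e1`, `e2`, then `d` are hypothetical.

```python
def compose(p, q):
    """Product of two permutations stored as tuples: (p q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)


def invert(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


IDENTITY = (0, 1, 2)


def project(h_big, decomposition):
    """Map a connection on the bigger graph to the smaller one.

    decomposition[e] lists the factors (f, sign) of the edge e of the smaller
    graph; the value on e is the ordered product of h_big[f]^sign.
    """
    h_small = {}
    for edge, factors in decomposition.items():
        value = IDENTITY
        for name, sign in factors:
            factor = h_big[name] if sign == 1 else invert(h_big[name])
            value = compose(value, factor)
        h_small[edge] = value
    return h_small


# Connection on the biggest (hypothetical) graph with edges f1, ..., f4.
h3 = {"f1": (1, 0, 2), "f2": (0, 2, 1), "f3": (1, 2, 0), "f4": (2, 1, 0)}

dec_3_to_2 = {"e1": [("f1", 1), ("f2", 1)], "e2": [("f3", -1)]}
dec_2_to_1 = {"d": [("e1", 1), ("e2", 1)]}
dec_3_to_1 = {"d": [("f1", 1), ("f2", 1), ("f3", -1)]}

two_steps = project(project(h3, dec_3_to_2), dec_2_to_1)
one_step = project(h3, dec_3_to_1)
print(two_steps == one_step)   # True
```

The agreement of both routes is just associativity of the group multiplication; edges of the bigger graph that do not occur in any decomposition, such as `f4` here, are forgotten by the projection.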


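2.3. Continuum gauge theory. For completeness, in the first paragraph we briefly quote the definitions of $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$ from [4], and in the second we summarize the most important facts about these spaces. In the last two paragraphs we first investigate the topological properties of the action of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$ and of the projections onto the lattice gauge theories, and then prove that the connections etc. are algebraically described in exactly the same form both for our definition of paths and for that of Ashtekar and Lewandowski [2].

2.3.1. Definition of $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$. By means of the continuity of the projections $\pi_{\Gamma_1}^{\Gamma_2}$ the spaces $(\mathcal{A}_\Gamma)_\Gamma$, $(\mathcal{G}_\Gamma)_\Gamma$ and $(\mathcal{A}/\mathcal{G}_\Gamma)_\Gamma$ are projective systems of topological spaces. This leads to the crucial [4]

Definition 2.8 (Generalized Gauge Theories).
• $\overline{\mathcal{A}} := \varprojlim_\Gamma \mathcal{A}_\Gamma$ is the space of generalized connections. The elements of $\overline{\mathcal{A}}$ are usually denoted by $\overline{A}$ or $h_{\overline{A}}$.
• $\overline{\mathcal{G}} := \varprojlim_\Gamma \mathcal{G}_\Gamma$ is the space of generalized gauge transforms. The elements of $\overline{\mathcal{G}}$ are usually denoted by $\overline{g}$.
• $\overline{\mathcal{A}/\mathcal{G}} := \varprojlim_\Gamma \mathcal{A}/\mathcal{G}_\Gamma$ is the space of generalized equivalence classes of connections.

Explicitly this means

$$\overline{\mathcal{A}} = \bigl\{ (h_\Gamma)_\Gamma \in \times_\Gamma \mathcal{A}_\Gamma \bigm| \pi_{\Gamma_1}^{\Gamma_2} h_{\Gamma_2} = h_{\Gamma_1} \text{ for all } \Gamma_1 \le \Gamma_2 \bigr\},$$

$$\overline{\mathcal{G}} = \bigl\{ (g_\Gamma)_\Gamma \in \times_\Gamma \mathcal{G}_\Gamma \bigm| \pi_{\Gamma_1}^{\Gamma_2} g_{\Gamma_2} = g_{\Gamma_1} \text{ for all } \Gamma_1 \le \Gamma_2 \bigr\}$$

as well as

$$\overline{\mathcal{A}/\mathcal{G}} = \bigl\{ ([h_\Gamma])_\Gamma \in \times_\Gamma \mathcal{A}/\mathcal{G}_\Gamma \bigm| \pi_{\Gamma_1}^{\Gamma_2} [h_{\Gamma_2}] = [h_{\Gamma_1}] \text{ for all } \Gamma_1 \le \Gamma_2 \bigr\}.$$

We denote

$$\pi_\Gamma : \overline{\mathcal{A}} \longrightarrow \mathcal{A}_\Gamma, \quad (h_{\Gamma'})_{\Gamma'} \longmapsto h_\Gamma, \qquad \pi_\Gamma : \overline{\mathcal{G}} \longrightarrow \mathcal{G}_\Gamma, \quad (g_{\Gamma'})_{\Gamma'} \longmapsto g_\Gamma,$$

and

$$\pi_\Gamma : \overline{\mathcal{A}/\mathcal{G}} \longrightarrow \mathcal{A}/\mathcal{G}_\Gamma, \quad ([h_{\Gamma'}])_{\Gamma'} \longmapsto [h_\Gamma]. \tag{1}$$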


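2.3.2. Topological characterization of $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$. We have [4, 13]

Theorem 2.8.
1. $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$ are completely regular Hausdorff spaces and, for compact $\mathbf{G}$, compact.
2. For every principal fibre bundle over $M$ with structure group $\mathbf{G}$ the regular connections (gauge transforms, equivalence classes of connections) are also generalized connections (gauge transforms, equivalence classes of generalized connections): The maps $\mathcal{A} \longrightarrow \overline{\mathcal{A}}$, $\mathcal{G} \longrightarrow \overline{\mathcal{G}}$ and $\mathcal{A}/\mathcal{G} \longrightarrow \overline{\mathcal{A}/\mathcal{G}}$ are embeddings.
3. Let $X$ be a topological space. A map $f : X \longrightarrow \overline{\mathcal{A}}$ is continuous iff $\pi_\Gamma \circ f : X \longrightarrow \mathcal{A}_\Gamma \equiv \mathbf{G}^{\#E(\Gamma)}$ is continuous for all graphs $\Gamma$. The analogous assertion holds for maps from $X$ to $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}/\mathcal{G}}$, respectively, as well.
4. $\pi_\Gamma$ is continuous for all graphs $\Gamma$.
5. $\overline{\mathcal{G}}$ is a topological group.

We shall postpone the discussion whether the space $\mathcal{A}$ is dense in $\overline{\mathcal{A}}$ or not for several reasons. This, in fact, depends crucially on the chosen smoothness category and equivalence relation for the paths. It should be clear that, provided $\gamma_1(\tau) := \tau$ and $\gamma_2(\tau) := \tau^2$ (cf. the Remark in paragraph 2.1.1) are seen to be non-equivalent, the denseness is unlikely: No classical smooth connection $A$ can distinguish between these paths. So we will discuss this in a bit more detail in the accompanying paper [10]. We will also show there that $\pi_\Gamma$ is open and surjective. But all that requires some technical efforts that are not at all necessary for the actual goal of this paper, the determination of the gauge orbit types.

Proof.
1. The property of being compact, Hausdorff or completely regular is maintained by forming product spaces and by the transition to closed subsets. Thus the assertion follows from the corresponding properties of the structure group $\mathbf{G}$.
2. The embedding property follows from Giles' reconstruction theorem [12] and [1].
3. See, e.g., [13].
4. Since $\mathrm{id} : \overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}$ etc. is continuous, this follows from the facts just proven.
5. The multiplication on $\overline{\mathcal{G}}$ is defined by $(g_\Gamma)_\Gamma \circ (g'_\Gamma)_\Gamma = (g_\Gamma \circ g'_\Gamma)_\Gamma$. With this, $\overline{\mathcal{G}}$ is a group with unit $(e_\Gamma)_\Gamma$ and inverse $(g_\Gamma)_\Gamma^{-1} = (g_\Gamma^{-1})_\Gamma$. The multiplication $m : \overline{\mathcal{G}} \times \overline{\mathcal{G}} \longrightarrow \overline{\mathcal{G}}$ is continuous due to the continuity criterion above: $\pi_\Gamma \circ m = m_\Gamma \circ (\pi_\Gamma \times \pi_\Gamma)$ is continuous for all $\Gamma$, because the multiplication $m_\Gamma$ on $\mathcal{G}_\Gamma$ is continuous.

2.3.3. Action of gauge transforms on connections. Because of the consistency of the actions of $\mathcal{G}_\Gamma$ on $\mathcal{A}_\Gamma$ one can also define an action of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$. One simply sets [4]

$$\Theta : \overline{\mathcal{A}} \times \overline{\mathcal{G}} \longrightarrow \overline{\mathcal{A}}, \qquad \bigl( (h_\Gamma)_\Gamma, (g_\Gamma)_\Gamma \bigr) \longmapsto (h_\Gamma \circ g_\Gamma)_\Gamma.$$

Theorem 2.9.
1. The action $\Theta$ of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$ is continuous.
2. The maps $\Theta_{\overline{A}} : \overline{\mathcal{G}} \longrightarrow \overline{\mathcal{A}}$, $\overline{g} \longmapsto \overline{A} \circ \overline{g}$, and $\Theta_{\overline{g}} : \overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}$, $\overline{A} \longmapsto \overline{A} \circ \overline{g}$, are continuous.
3. The canonical projection $\pi_{\overline{\mathcal{A}}/\overline{\mathcal{G}}} : \overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}/\overline{\mathcal{G}}$ is continuous and open and, for compact $\mathbf{G}$, also closed and proper.
4. The map $\pi_\Gamma^{\overline{\mathcal{A}}/\overline{\mathcal{G}}} : \overline{\mathcal{A}}/\overline{\mathcal{G}} \longrightarrow \mathcal{A}_\Gamma/\mathcal{G}_\Gamma$, $[(h_{\Gamma'})_{\Gamma'}] \longmapsto [h_\Gamma]$, is well-defined and continuous.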

Stratification of the Generalized Gauge Orbit Space


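Proof.
1. $\pi_\Gamma \circ \Theta = \Theta_\Gamma \circ (\pi_\Gamma \times \pi_\Gamma) : \overline{\mathcal{A}} \times \overline{\mathcal{G}} \to \overline{\mathcal{A}}_\Gamma$, as a concatenation of continuous maps on the right-hand side, is continuous for any graph $\Gamma$. By the continuity criterion for maps to $\overline{\mathcal{A}}$ in Theorem 2.8, $\Theta$ is continuous.
2. Follows from the continuity of $\Theta$.
3. Follows because $\Theta$ is a continuous action of a (compact) topological group $\overline{\mathcal{G}}$ on the Hausdorff space $\overline{\mathcal{A}}$ [8].
4. $\pi_\Gamma^{\overline{\mathcal{A}}/\overline{\mathcal{G}}}$ is well-defined. Namely, let $\overline{A}' = \overline{A} \circ \overline{g}$, i.e. $(h'_\Gamma)_\Gamma = (h_\Gamma \circ g_\Gamma)_\Gamma$, thus $h'_\Gamma = h_\Gamma \circ g_\Gamma$ for all graphs $\Gamma$. Then $[h'_\Gamma] = [h_\Gamma]$. The continuity of $\pi_\Gamma^{\overline{\mathcal{A}}/\overline{\mathcal{G}}} : \overline{\mathcal{A}}/\overline{\mathcal{G}} \to \overline{\mathcal{A}}_\Gamma/\overline{\mathcal{G}}_\Gamma$ follows from the continuity of $\pi_\Gamma : \overline{\mathcal{A}} \to \overline{\mathcal{A}}_\Gamma$ and $\pi_{\overline{\mathcal{A}}_\Gamma/\overline{\mathcal{G}}_\Gamma}$ as well as from the continuity criterion for the quotient topology, because the diagram

$$\begin{array}{ccc}
\overline{\mathcal{A}} & \xrightarrow{\ \pi_\Gamma\ } & \overline{\mathcal{A}}_\Gamma \\
\pi_{\overline{\mathcal{A}}/\overline{\mathcal{G}}} \big\downarrow & & \big\downarrow \pi_{\overline{\mathcal{A}}_\Gamma/\overline{\mathcal{G}}_\Gamma} \\
\overline{\mathcal{A}}/\overline{\mathcal{G}} & \xrightarrow{\ \pi_\Gamma^{\overline{\mathcal{A}}/\overline{\mathcal{G}}}\ } & \overline{\mathcal{A}}_\Gamma/\overline{\mathcal{G}}_\Gamma
\end{array}$$

is commutative.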

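We note that for a compact structure group $\mathbf{G}$ and for analytic paths the two constructions of the generalized gauge orbit space are even homeomorphic (cf. [4, 3]).

2.3.4. Algebraic characterization of $\overline{\mathcal{A}}$, $\overline{\mathcal{G}}$ and $\overline{\mathcal{A}}/\overline{\mathcal{G}}$. In this paragraph we will show that our choice of the definition of paths leads to the same results as the definitions in [2] do.

Theorem 2.10.
1. We have $\overline{\mathcal{A}} \cong \mathrm{Hom}(\mathcal{P}, \mathbf{G})$. (This justifies the notation $h_{\overline{A}}$ for a connection $\overline{A}$.) Here, $\mathrm{Hom}(\mathcal{P}, \mathbf{G})$ is the set of all maps $h : \mathcal{P} \to \mathbf{G}$ that fulfill $h(\gamma_1\gamma_2) = h(\gamma_1)h(\gamma_2)$ for all multipliable paths $\gamma_1, \gamma_2 \in \mathcal{P}$.
2. We have $\overline{\mathcal{G}} \cong \times_{x\in M}\mathbf{G} \equiv \mathrm{Maps}(M, \mathbf{G})$. The isomorphism is even a homeomorphism of topological groups.
3. The action of gauge transforms on the connections is given by

$$h_{\overline{A}\circ\overline{g}}(\gamma) := g_{\gamma(0)}^{-1}\, h_{\overline{A}}(\gamma)\, g_{\gamma(1)} \quad\text{for all } \gamma \in \mathcal{P}. \tag{2}$$

$h_{\overline{A}} : \mathcal{P} \to \mathbf{G}$ is the homomorphism corresponding to $\overline{A} \in \overline{\mathcal{A}}$ and $g_x$ the component of the gauge transform $\overline{g} \in \overline{\mathcal{G}}$ in $x$.
4. We have $\overline{\mathcal{A}}/\overline{\mathcal{G}} \cong \mathrm{Hom}(\mathcal{HG}, \mathbf{G})/\mathrm{Ad}$. Here, $\mathrm{Hom}(\mathcal{HG}, \mathbf{G})$ is the set of all homomorphisms $h : \mathcal{HG} \to \mathbf{G}$.

Proof. 1. Define

$$I : \mathrm{Hom}(\mathcal{P}, \mathbf{G}) \to \overline{\mathcal{A}}, \qquad h \mapsto (h|_{\mathcal{P}_\Gamma})_\Gamma .$$

• $I$ is obviously well-defined.
• $I$ is injective. From $h_1 \neq h_2$ follows the existence of a $\gamma \in \mathcal{P}$ with $h_1(\gamma) \neq h_2(\gamma)$. Since $\gamma$ equals $\prod \gamma_i$ with appropriate simple $\gamma_i$, we have $\prod h_1(\gamma_i) \neq \prod h_2(\gamma_i)$, hence $h_1(\gamma_i) \neq h_2(\gamma_i)$ for some $\gamma_i$. Choose a finite graph $\Gamma$ such that $\gamma_i$ is a path in $\Gamma$. Here we have $h_1|_{\mathcal{P}_\Gamma}(\gamma_i) = h_1(\gamma_i) \neq h_2(\gamma_i) = h_2|_{\mathcal{P}_\Gamma}(\gamma_i)$, i.e. $I(h_1) \neq I(h_2)$.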


• $I$ is surjective. Let $(h_\Gamma)_\Gamma \in \overline{\mathcal{A}}$ be given. We first consider not classes of paths, but the paths themselves. Construct for any simple $\gamma \in \mathcal{P}$ a graph $\Gamma$ containing $\gamma$. Define $h(\gamma) := h_\Gamma(\gamma)$. For general $\gamma \in \mathcal{P}$ define $h(\gamma) := \prod h(\gamma_i)$ according to some decomposition of $\gamma$ into simple paths $\gamma_i$. This construction is well-defined: First one easily realizes that it is independent of the decomposition of $\gamma$ into finite paths (thus also of the parametrization), see the remark below. Hence obviously, $h$ is a homomorphism. Thus, also $h(\gamma\delta\delta^{-1}\gamma') = h(\gamma\gamma')$ etc., i.e., $h$ is constant on equivalence classes of paths. Consequently, $h : \mathcal{P} \to \mathbf{G}$ is a well-defined homomorphism with $I(h) = (h_\Gamma)_\Gamma$.

2. Set

$$I : \mathrm{Maps}(M, \mathbf{G}) \to \overline{\mathcal{G}}, \qquad (g_x)_{x\in M} \mapsto \bigl((g_x)_{x\in V(\Gamma)}\bigr)_\Gamma .$$

Obviously, $I$ is bijective and a group homomorphism. The topology on $\mathrm{Maps}(M, \mathbf{G}) = \times_{x\in M}\mathbf{G}$ is generated by the preimages $\pi_y^{-1}(U)$ of open $U \subseteq \mathbf{G}$, by which $\pi_y : (g_x)_{x\in M} \mapsto g_y$ is continuous. Hence, $\pi_\Gamma \circ I = \pi_{v_1} \times \cdots \times \pi_{v_{\#V(\Gamma)}}$ is continuous for all $\Gamma$, i.e., $I$ is continuous. Due to the continuity criterion for maps into product spaces, $I^{-1}$ is continuous as well, because for all $y$ the map $\pi_y \circ I^{-1} = \pi_\Gamma$ ($\Gamma$ consists only of the vertex $y$) is continuous.

3. This follows immediately from the preceding steps.

4. Use the map $J : \overline{\mathcal{A}}/\overline{\mathcal{G}} \to \mathrm{Hom}(\mathcal{HG}, \mathbf{G})/\mathrm{Ad}$, $[h] \mapsto [h|_{\mathcal{HG}}]_{\mathrm{Ad}}$, and repeat the steps of the proof of Proposition 2.5.

Remark. We still have to show that $h(\gamma)$ defined in the surjectivity part of the first item above is independent of the decomposition of $\gamma$ into finite paths. Namely, let $\prod\gamma_i$ and $\prod\gamma'_j$ be two decompositions of $\gamma$. The terminal points of $\gamma_i$ and $\gamma'_j$ correspond to certain values of the parameters $\tau_i$ and $\tau'_j$, respectively, of the path $\gamma$. Order these values to a sequence $(\tau''_k)$ and construct a decomposition of $\gamma$ into simple paths $\delta_k$ such that $\delta_k$ corresponds to the segment $\gamma|_{[\tau''_k, \tau''_{k+1}]}$. Now, on the one hand, $\gamma$ equals $\prod\delta_k$ up to the parametrization, but, on the other hand, each $\gamma_i$ and $\gamma'_j$ equals, up to the parametrization, a product $\delta_\kappa \circ \delta_{\kappa+1} \circ \cdots \circ \delta_\lambda$ with certain $\kappa, \lambda$. Now let $\Gamma_i$ be the graph w.r.t. which $\gamma_i$ is simple. Construct from it the graph $\Gamma'_i$ by inserting the terminal points of all the $\gamma'_j$ as vertices. Finally, let $\Gamma''_k$ be the graph spanned by $\delta_k$. Thus, since $\Gamma_i \leq \Gamma'_i$ and $\Gamma''_k \leq \Gamma'_i$, we have

$$\begin{aligned}
h(\gamma_i) &= h_{\Gamma_i}(\gamma_i) = h_{\Gamma'_i}(\gamma_i) = h_{\Gamma'_i}(\delta_\kappa \circ \delta_{\kappa+1} \circ \cdots \circ \delta_\lambda) \\
&= h_{\Gamma'_i}(\delta_\kappa)\, h_{\Gamma'_i}(\delta_{\kappa+1}) \cdots h_{\Gamma'_i}(\delta_\lambda) \\
&= h_{\Gamma''_\kappa}(\delta_\kappa)\, h_{\Gamma''_{\kappa+1}}(\delta_{\kappa+1}) \cdots h_{\Gamma''_\lambda}(\delta_\lambda).
\end{aligned}$$

Using the analogous relation for $\gamma'_j$ we have $\prod h(\gamma_i) = \prod h_{\Gamma''_k}(\delta_k) = \prod h(\gamma'_j)$. Thus, $h(\gamma)$ does not depend on the decomposition.

In the following we will usually write a gauge transform in the form $\overline{g} = (g_x)_{x\in M}$. Furthermore we have again by the continuity criterion for maps into product spaces:


Corollary 2.11. Let $X$ be a topological space. A map $f : X \to \overline{\mathcal{G}}$ is continuous iff $\pi_x \circ f : X \to \mathbf{G}$ is continuous for all $x \in M$. $\pi_x$ is continuous for all $x \in M$.

Remark. If we work in the $(\omega, +)$-category for the paths, i.e., we only consider piecewise analytical graphs, all the definitions and results coincide completely with those of Ashtekar and Lewandowski in [2, 4, 3].

2.4. Graphs vs. webs. In this subsection we will compare the consequences of our definition of paths to that of webs [6, 7, 17]. Within this subsection we only consider the smooth, piecewise immersed category $(\infty, +)$ for paths. Note moreover that, here, a path is simply a piecewise immersive $C^\infty$-map from $[0, 1]$ to $M$, i.e. it is not an equivalence class. But it is still finite as before.

Let us briefly quote the basic properties of webs. A web consists of a finite number of so-called tassels. A tassel $T$ with base point $p \in M$ is a finite, ordered set of curves $c_i$ (piecewise immersive smooth maps from $[0, 1]$ to $M$, i.e. the notion of a curve coincides with our notion of a general, usually non-finite path) that fulfills certain properties:
1. $c_i(0) = p$ for all $i$ (common initial point).
2. $c_i$ is an embedding (in particular, has no self-intersections).
3. There is a positive constant $k_i \in \mathbb{R}$ for each $i$ such that $c_i(t) = c_j(s)$ implies $k_i t = k_j s$ (consistent parametrization).
4. Define $\mathrm{Type}(x) := \{i \in I \mid x \in \mathrm{im}\, c_i\}$ for all $x \in M$. Then, for all $J \subset I$ the set $\mathrm{Type}^{-1}(\{J\})$ is empty or has $p$ as an accumulation point.

Thus, in our notation, each $c_i$ is a simple path. A web is now a finite collection of tassels such that no path of one tassel contains the base point of another tassel. The following theorem on curves proven by Baez and Sawin [6] will be crucial:

Theorem 2.12. Given a finite set $C$ of curves. Then there is a web $w$, such that every curve $c \in C$ is equivalent to a finite product of paths $\gamma \in w$ and their inverses.

This leads immediately to the following

Proposition 2.13. Every curve is equivalent to a finite path. Thus, our restriction to finite paths is actually no restriction.

Proof. Let there be given an arbitrary curve $\gamma : [a, b] \to M$. By the preceding theorem $\gamma$ depends on some web $w$, i.e., there is a family of curves $c_i$ being simple paths such that $\gamma$ equals (modulo equivalence, i.e. up to reparametrizations, cf. [6]) a finite product of the curves $c_i$ and their inverses. By Definition 2.2, $\gamma$ is finite.

This means, roughly speaking, that the sets of paths the connections are based on are the same for the webs and our case $(\infty, +)$. But this yields the equality of our definition of $\overline{\mathcal{A}}$ and that of Baez and Sawin.

Theorem 2.14. Suppose $\mathbf{G}$ to be compact and semi-simple. Then $\overline{\mathcal{A}}_{\mathrm{Web}}$ and $\overline{\mathcal{A}}_{(\infty,+)}$, i.e. the spaces of generalized connections defined by webs [6] and by Definition 2.8, respectively, are homeomorphic.


Proof. Using the proposition above we see analogously to the proof of Theorem 2.10 that

$$I_{\mathrm{Web}} : \mathrm{Hom}(\mathcal{P}, \mathbf{G}) \to \overline{\mathcal{A}}_{\mathrm{Web}}, \qquad h \mapsto (h|_w)_w$$

is a bijection. (Now, the well-definedness is a consequence of the surjectivity of $\pi_w : \overline{\mathcal{A}}_{\mathrm{Web}} \to \mathbf{G}^{\#w}$ [17].) Thus, $\mathcal{I} := I_{\mathrm{Web}} \circ I^{-1} : \overline{\mathcal{A}}_{(\infty,+)} \to \overline{\mathcal{A}}_{\mathrm{Web}}$ is a bijection, too.

We are left with the proof that $\mathcal{I}$ is a homeomorphism. For this it is sufficient to prove that each element of a subbase of the one topology has an open image in the other topology. Possible subbases for $\overline{\mathcal{A}}_{(\infty,+)}$ and $\overline{\mathcal{A}}_{\mathrm{Web}}$ are the families of all sets of the type $\pi_\Gamma^{-1}(W_\Gamma)$ and $\pi_w^{-1}(W_w)$, respectively. Here, $w$ is a web and $W_w \subseteq \mathbf{G}^k$ is open, $k$ being the number of paths in $w$. (Here again we need the semi-simplicity and compactness of $\mathbf{G}$, because only under these assumptions has it been proven [17] up to now that the projection $\pi_w|_{\mathcal{A}} : \overline{\mathcal{A}}_{\mathrm{Web}} \supseteq \mathcal{A} \to \mathbf{G}^k$ is surjective, i.e. $\overline{\mathcal{A}}_w = \mathbf{G}^k$. Otherwise, it would be possible that $\pi_w(\mathcal{A})$ is a non-open Lie subgroup of $\mathbf{G}^k$, so that the sets $\pi_w^{-1}(W_w)$ would no longer form a subbase.) Furthermore, $\Gamma$ is a graph and $W_\Gamma \subseteq \mathbf{G}^{\#E(\Gamma)}$ an element of a certain subbase, e.g., a set of type $W_\Gamma = W_1 \times \cdots \times W_{\#E(\Gamma)}$ with open $W_i \subseteq \mathbf{G}$. Thus, we can take as a subbase for $\overline{\mathcal{A}}_{(\infty,+)}$ simply all sets $\pi_c^{-1}(W)$, where $c$ is a simple path, i.e. a graph, and $W \subseteq \mathbf{G}$ is open. Since every web is a collection of a finite number of simple paths, we get completely analogously that the family of all $\pi_c^{-1}(W)$ is a subbase for $\overline{\mathcal{A}}_{\mathrm{Web}}$. The only difference here is that $c$ has to be simple with different initial and terminal point.

We are therefore left with the proof that $\mathcal{I}(\pi_c^{-1}(W))$ is open in $\overline{\mathcal{A}}_{\mathrm{Web}}$ for all simple, closed paths $c$ and all open $W$, which is, however, quite easy. Decompose $c$ into two paths $c_1$ and $c_2$ (with different initial and terminal points) which span the graph $\Gamma$. Then $\mathcal{I}(\pi_c^{-1}(W)) = \mathcal{I}\bigl(\pi_\Gamma^{-1}((\pi_c^\Gamma)^{-1}(W))\bigr)$. By the continuity of $\pi_c^\Gamma$ the set $(\pi_c^\Gamma)^{-1}(W)$ is open in $\mathbf{G}^2$, i.e. a union of sets of the type $W_1 \times W_2$, but $\mathcal{I}(\pi_\Gamma^{-1}(W_1 \times W_2))$ is open as discussed above.
We will continue the discussion on the relationship between graphs and webs in [10].

3. Determination of the Gauge Orbit Types

In contrast to the general theory above let now $\mathbf{G}$ be a compact Lie group throughout this section and the following ones. The goal of this section is the classification of the generalized connections by the type of their $\overline{\mathcal{G}}$-orbits. In contrast to the theory of classical connections in principal fiber bundles, topological subtleties do not play an important role: a generalized connection is only an (algebraic) homomorphism from the groupoid $\mathcal{P}$ of paths into the structure group $\mathbf{G}$, and the generalized gauge transforms are simply mappings from $M$ to $\mathbf{G}$. Thus, also the theory of generalized gauge orbits is governed completely by the algebraic structure of the action of $\overline{\mathcal{G}}$ on $\overline{\mathcal{A}}$:

$$h_{\overline{A}\circ\overline{g}}(\gamma) = g_x^{-1}\, h_{\overline{A}}(\gamma)\, g_y \quad\text{for all } \overline{A} \in \overline{\mathcal{A}},\ \overline{g} \in \overline{\mathcal{G}},\ \gamma \in \mathcal{P}_{xy}. \tag{3}$$

For each element $\overline{g}$ of the stabilizer $B(\overline{A})$ of a connection $\overline{A}$ the following must be fulfilled:

$$h_{\overline{A}}(\gamma) = h_{\overline{A}\circ\overline{g}}(\gamma) = g_x^{-1}\, h_{\overline{A}}(\gamma)\, g_y \quad\text{for all } \gamma \in \mathcal{P}_{xy}, \tag{4}$$

hence, in particular,


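• $h_{\overline{A}}(\alpha) = g_m^{-1}\, h_{\overline{A}}(\alpha)\, g_m$ for all $\alpha \in \mathcal{HG} \equiv \mathcal{P}_{mm}$ and
• $h_{\overline{A}}(\gamma_x) = g_m^{-1}\, h_{\overline{A}}(\gamma_x)\, g_x$ for all $x \in M$, where $\gamma_x$ is for any $x$ some fixed path from $m$ to $x$.

Any path $\gamma \in \mathcal{P}_{xy}$ can be written as $\gamma_x^{-1}(\gamma_x\gamma\gamma_y^{-1})\gamma_y$, i.e. as a product of paths in $\mathcal{HG}$ and $\{\gamma_x\}$; thus, both conditions together are even equivalent to (4). From the first condition it follows that $g_m$ has to commute with all holonomies $h_{\overline{A}}(\alpha)$, i.e. $g_m$ is contained in the centralizer $Z(H_{\overline{A}})$ of the holonomy group of $\overline{A}$. Writing the second condition as

$$g_x = h_{\overline{A}}(\gamma_x)^{-1}\, g_m\, h_{\overline{A}}(\gamma_x) \quad\text{for all } x \in M, \tag{5}$$

we see that an element $\overline{g}$ of the stabilizer of $\overline{A}$ is already completely determined by its value in the point $m$, i.e. by an element of the holonomy centralizer $Z(H_{\overline{A}})$. From this the isomorphy of $B(\overline{A})$ and $Z(H_{\overline{A}})$ follows immediately.

Due to general theorems of the theory of transformation groups the gauge orbit $\overline{A} \circ \overline{\mathcal{G}}$ is homeomorphic to the factor space $B(\overline{A})\backslash\overline{\mathcal{G}}$. Now, we define the subgroup $\overline{\mathcal{G}}_0 \subseteq \overline{\mathcal{G}}$ by $\pi_m^{-1}(e_{\mathbf{G}})$. This means it contains all gauge transforms that are trivial in $m$. Obviously, we have $\overline{\mathcal{G}} \cong \mathbf{G} \times \overline{\mathcal{G}}_0$. Since $B(\overline{A})$ and $Z(H_{\overline{A}}) \cong Z(H_{\overline{A}}) \times \{e_{\overline{\mathcal{G}}_0}\}$ are homeomorphic, we get for the moment heuristically

$$B(\overline{A})\backslash\overline{\mathcal{G}} \;\cong\; \bigl(Z(H_{\overline{A}}) \times \{e_{\overline{\mathcal{G}}_0}\}\bigr)\backslash\bigl(\mathbf{G} \times \overline{\mathcal{G}}_0\bigr) \;\cong\; \bigl(Z(H_{\overline{A}})\backslash\mathbf{G}\bigr) \times \overline{\mathcal{G}}_0 .$$

Using a rigorous argument we will prove that the left and the right space are indeed homeomorphic, i.e. the homeomorphism type of a gauge orbit is already determined by that of $Z(H_{\overline{A}})\backslash\mathbf{G}$. Consequently, two connections have homeomorphic gauge orbits, in particular, if the holonomy centralizers are conjugate.

Finally, we can prove that the stabilizers of two connections are conjugate w.r.t. $\overline{\mathcal{G}}$ iff the corresponding holonomy centralizers are conjugate w.r.t. $\mathbf{G}$. This allows us to define the type of a connection not only (as known from the general theory of transformation groups) by the $\overline{\mathcal{G}}$-conjugacy class of its stabilizer $B(\overline{A})$, but equivalently by the $\mathbf{G}$-conjugacy class of its holonomy centralizer $Z(H_{\overline{A}})$.

Finally, we mention again that in the following $\mathbf{G}$ is a compact Lie group. The purely algebraic results, of course, are valid also without this assumption.

3.1. Stabilizer of a connection.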

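Definition 3.1. Let $\overline{A} \in \overline{\mathcal{A}}$. Then $E_{\overline{A}} := \overline{A} \circ \overline{\mathcal{G}} \equiv \{\overline{A}' \in \overline{\mathcal{A}} \mid \exists\, \overline{g} \in \overline{\mathcal{G}} : \overline{A}' = \overline{A} \circ \overline{g}\}$ is called the gauge orbit of $\overline{A}$.

Obviously, two gauge orbits are equal or disjoint. We need some notations.

Definition 3.2. Let $\overline{A} \in \overline{\mathcal{A}}$ be given.
1. The holonomy group $H_{\overline{A}}$ of $\overline{A}$ is equal to $h_{\overline{A}}(\mathcal{HG}) \subseteq \mathbf{G}$.
2. The centralizer $Z(H_{\overline{A}})$ of the holonomy group, also called holonomy centralizer of $\overline{A}$, is the set of all elements in $\mathbf{G}$ that commute with all elements in $H_{\overline{A}}$.
3. The base centralizer $B(\overline{A})$ of $\overline{A}$ is the set of all elements $\overline{g} = (g_x)_{x\in M}$ in $\overline{\mathcal{G}}$ such that $h_{\overline{A}}(\gamma) = g_m^{-1}\, h_{\overline{A}}(\gamma)\, g_x$ for all $x \in M$ and all paths $\gamma$ from $m$ to $x$.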


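Note that for regular connections the holonomy group defined above is exactly the holonomy group known from classical theory. We get immediately from the definitions

Lemma 3.1. Let $\overline{A} \in \overline{\mathcal{A}}$ and $\overline{g} \in \overline{\mathcal{G}}$.
1. The holonomy group $H_{\overline{A}}$ is a subgroup of $\mathbf{G}$.
2. $Z(H_{\overline{A}})$ is a closed subgroup of $\mathbf{G}$.
3. We have $H_{\overline{A}\circ\overline{g}} = g_m^{-1} H_{\overline{A}}\, g_m$ and $Z(H_{\overline{A}\circ\overline{g}}) = g_m^{-1} Z(H_{\overline{A}})\, g_m$.
4. We have $\overline{g} \in B(\overline{A})$ iff
 a) $g_m \in Z(H_{\overline{A}})$ and
 b) for all $x \in M$ there is a path $\gamma$ from $m$ to $x$ with $h_{\overline{A}}(\gamma) = g_m^{-1}\, h_{\overline{A}}(\gamma)\, g_x$.

Proof.
1. This is an obvious consequence of the homomorphy property of $h_{\overline{A}} : \mathcal{HG} \to \mathbf{G}$.
2. Trivial.
3. This follows immediately from $h_{\overline{A}\circ\overline{g}}(\alpha) = g_m^{-1}\, h_{\overline{A}}(\alpha)\, g_m$ for all $\alpha \in \mathcal{HG}$.
4. ($\Longrightarrow$) We have to prove only $g_m \in Z(H_{\overline{A}})$, but this is clear because we have $h_{\overline{A}}(\alpha) = g_m^{-1}\, h_{\overline{A}}(\alpha)\, g_m$ for all $\alpha \in \mathcal{HG}$ by assumption.
 ($\Longleftarrow$) Let $x \in M$ be fixed and $\delta$ be an arbitrary path from $m$ to $x$. Choose a $\gamma$ such that $h_{\overline{A}}(\gamma) = g_m^{-1}\, h_{\overline{A}}(\gamma)\, g_x$. Then $\alpha := \delta\gamma^{-1} \in \mathcal{HG}$ and

$$\begin{aligned}
g_m^{-1}\, h_{\overline{A}}(\delta)\, g_x &= g_m^{-1}\, h_{\overline{A}}(\alpha\gamma)\, g_x \\
&= g_m^{-1}\, h_{\overline{A}}(\alpha)\, h_{\overline{A}}(\gamma)\, g_x \\
&= h_{\overline{A}}(\alpha)\, g_m^{-1}\, h_{\overline{A}}(\gamma)\, g_x && \text{(since } g_m \in Z(H_{\overline{A}})\text{)} \\
&= h_{\overline{A}}(\alpha)\, h_{\overline{A}}(\gamma) && \text{(by the choice of } \gamma\text{)} \\
&= h_{\overline{A}}(\delta).
\end{aligned}$$

Now we can determine the stabilizer of a connection.

Proposition 3.2. For all $\overline{A} \in \overline{\mathcal{A}}$ and all $\overline{g} \in \overline{\mathcal{G}}$ we have $\overline{A} \circ \overline{g} = \overline{A} \iff \overline{g} \in B(\overline{A})$.

Proof. By definition we have

$$\overline{A} \circ \overline{g} = \overline{A} \iff \forall x, y \in M,\ \gamma \in \mathcal{P}_{xy} : h_{\overline{A}}(\gamma) = h_{\overline{A}\circ\overline{g}}(\gamma) = g_x^{-1}\, h_{\overline{A}}(\gamma)\, g_y . \tag{6}$$

($\Longrightarrow$) Let $\overline{A} \circ \overline{g} = \overline{A}$. Due to (6), $g_m^{-1}\, h_{\overline{A}}(\alpha)\, g_m = h_{\overline{A}}(\alpha)$ holds for all $\alpha \in \mathcal{P}_{mm} \equiv \mathcal{HG}$, i.e. $g_m \in Z(H_{\overline{A}})$. Again by (6) we have $h_{\overline{A}}(\gamma) = g_m^{-1}\, h_{\overline{A}}(\gamma)\, g_x$ for all $x \in M$ and all $\gamma \in \mathcal{P}_{mx}$. Thus, $\overline{g} \in B(\overline{A})$.

($\Longleftarrow$) Let $\overline{g} \in B(\overline{A})$ and $x, y \in M$ be given. Choose some $\gamma_x \in \mathcal{P}_{mx}$, $\gamma_y \in \mathcal{P}_{my}$. Then for all $\gamma \in \mathcal{P}_{xy}$ the following holds:

$$\begin{aligned}
g_x^{-1}\, h_{\overline{A}}(\gamma)\, g_y &= g_x^{-1}\, h_{\overline{A}}(\gamma_x^{-1}\gamma_x\gamma\gamma_y^{-1}\gamma_y)\, g_y \\
&= g_x^{-1}\, h_{\overline{A}}(\gamma_x^{-1})\, g_m\; g_m^{-1}\, h_{\overline{A}}(\gamma_x\gamma\gamma_y^{-1})\, g_m\; g_m^{-1}\, h_{\overline{A}}(\gamma_y)\, g_y \\
&= \bigl(g_m^{-1}\, h_{\overline{A}}(\gamma_x)\, g_x\bigr)^{-1}\, h_{\overline{A}}(\gamma_x\gamma\gamma_y^{-1})\, \bigl(g_m^{-1}\, h_{\overline{A}}(\gamma_y)\, g_y\bigr) && \text{(since } \gamma_x\gamma\gamma_y^{-1} \in \mathcal{HG} \text{ and } g_m \in Z(H_{\overline{A}})\text{)} \\
&= h_{\overline{A}}(\gamma_x)^{-1}\, h_{\overline{A}}(\gamma_x\gamma\gamma_y^{-1})\, h_{\overline{A}}(\gamma_y) \\
&= h_{\overline{A}}(\gamma).
\end{aligned}$$

By (6) we have $\overline{A} \circ \overline{g} = \overline{A}$.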

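Since for compact transformation groups every stabilizer is closed (see, e.g., [8]), we have using the proposition above

Corollary 3.3. $B(\overline{A})$ is a closed, hence compact subgroup of $\overline{\mathcal{G}}$.

Furthermore, by the proposition above we get $\overline{A} \circ \overline{g}_1 = \overline{A} \circ \overline{g}_2 \iff \overline{A} \circ \overline{g}_1 \circ \overline{g}_2^{-1} = \overline{A} \iff \overline{g}_1 \circ \overline{g}_2^{-1} \in B(\overline{A})$, i.e. we can identify $E_{\overline{A}}$ and $B(\overline{A})\backslash\overline{\mathcal{G}}$ by

$$\tau : B(\overline{A})\backslash\overline{\mathcal{G}} \to E_{\overline{A}}, \qquad [\overline{g}] \mapsto \overline{A} \circ \overline{g} .$$

Again by the general theory of compact transformation groups we get [8]

Proposition 3.4. $\tau : B(\overline{A})\backslash\overline{\mathcal{G}} \to E_{\overline{A}}$ is an equivariant isomorphism between compact Hausdorff spaces.

3.2. Isomorphy of $B(\overline{A})$ and $Z(H_{\overline{A}})$. In the next subsection we shall determine the homeomorphism class of a gauge orbit $E_{\overline{A}}$. For that purpose, we should use the base centralizer. But this object seems, at least at first glance, to be quite inaccessible from the algebraic point of view. However, looking carefully at its definition (Def. 3.2) one sees that for given $\overline{A}$, due to $h_{\overline{A}}(\gamma_x) = g_m^{-1}\, h_{\overline{A}}(\gamma_x)\, g_x$, the value of $g_x$ is already determined by $g_m \in Z(H_{\overline{A}})$. Therefore, the base centralizer is completely determined by the holonomy centralizer.

Proposition 3.5. For any $\overline{A} \in \overline{\mathcal{A}}$ the map

$$\phi : B(\overline{A}) \to Z(H_{\overline{A}}), \qquad \overline{g} \mapsto g_m$$

is an isomorphism of Lie groups. (The topologies on $B(\overline{A})$ and $Z(H_{\overline{A}})$ are the relative ones induced by $\overline{\mathcal{G}}$ and $\mathbf{G}$, respectively.)

Proof.
• Obviously, $\phi$ is a homomorphism.
• Surjectivity. Let $g \in Z(H_{\overline{A}})$. Choose for each $x \in M$ a path $\gamma_x$ from $m$ to $x$ (w.l.o.g. $\gamma_m$ is the trivial path) and define

$$g_x := h_{\overline{A}}(\gamma_x)^{-1}\, g\, h_{\overline{A}}(\gamma_x). \tag{7}$$

Obviously, $\overline{g} = (g_x)_{x\in M} \in \overline{\mathcal{G}}$ and $\phi(\overline{g}) = g$. By Lemma 3.1, 4 we have $\overline{g} \in B(\overline{A})$ because
 1. $g_m = h_{\overline{A}}(\gamma_m)^{-1}\, g\, h_{\overline{A}}(\gamma_m) = g \in Z(H_{\overline{A}})$ by the triviality of $\gamma_m \in \mathcal{HG}$ and
 2. $h_{\overline{A}}(\gamma_x) = g_m^{-1}\, h_{\overline{A}}(\gamma_x)\, g_x$ for the $\gamma_x$ chosen above.
• Injectivity. Clear, because $g_x$ is uniquely determined by $\overline{A}$ and $g_m$ due to $h_{\overline{A}}(\gamma_x) = g_m^{-1}\, h_{\overline{A}}(\gamma_x)\, g_x$.
• Continuity of $\phi$. $\phi$ is the restriction of $\pi_m : \overline{\mathcal{G}} \to \overline{\mathcal{G}}_m \equiv \mathbf{G}$ to $B(\overline{A})$. The continuity of $\phi$ is now a consequence of the continuity of $\pi_m$.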


• Continuity of $\phi^{-1}$. $\phi : B(\overline{A}) \to Z(H_{\overline{A}})$ is a continuous and bijective map of a compact space onto a Hausdorff space. Therefore, $\phi^{-1}$ is continuous.

Finally, we note that obviously the isomorphism $\phi$ does not depend on the special choice of the paths $\gamma_x$.
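The statement of Proposition 3.5 can be checked by brute force in a small finite toy model. The following sketch is only an illustration under assumptions that are not part of the paper: the structure group is replaced by the finite group $S_3$, the manifold by three points (vertex 0 plays the role of $m$), and a connection by group elements on one loop at $m$ and on two edges $\gamma_1, \gamma_2$ leaving $m$. All names (`edges`, `act`, `extend`) are hypothetical.

```python
from itertools import permutations, product

G = list(permutations(range(3)))   # the six elements of S_3, as tuples
E = (0, 1, 2)                      # identity permutation

def mul(p, q):
    """Group product p*q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    r = [0, 0, 0]
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

# Toy connection (assumption): (name, source, target) -> holonomy along that edge.
edges = {
    ("loop", 0, 0): (1, 0, 2),     # a transposition; generates the holonomy group at m
    ("e1", 0, 1): (1, 2, 0),       # h(gamma_1)
    ("e2", 0, 2): (0, 2, 1),       # h(gamma_2)
}

def act(h, g):
    """Gauge action as in Eq. (3): h(e) -> g_source^{-1} h(e) g_target."""
    return {(n, s, t): mul(mul(inv(g[s]), v), g[t]) for (n, s, t), v in h.items()}

# Stabilizer (base centralizer) by exhaustive search over all 6^3 gauge transforms.
stabilizer = sorted(g for g in product(G, repeat=3) if act(edges, g) == edges)

# Holonomy centralizer: elements commuting with the generator of the holonomy group.
loop = edges[("loop", 0, 0)]
centralizer = [z for z in G if mul(z, loop) == mul(loop, z)]

def extend(z):
    """Eq. (7): g_x = h(gamma_x)^{-1} z h(gamma_x), with gamma_0 the trivial path."""
    paths = (E, edges[("e1", 0, 1)], edges[("e2", 0, 2)])
    return tuple(mul(mul(inv(p), z), p) for p in paths)

predicted = sorted(extend(z) for z in centralizer)
```

In this toy model the exhaustive stabilizer coincides with the image of the holonomy centralizer under `extend`, and evaluating at vertex 0 recovers the centralizer, which mirrors the map $\phi$ of the proposition.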

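3.3. Determination of the homeomorphism class. As we have seen in the last subsection, $B(\overline{A})$ and $Z(H_{\overline{A}}) \times \{e_{\overline{\mathcal{G}}_0}\}$ are homeomorphic subgroups of $\overline{\mathcal{G}}$. One could conjecture that consequently

$$B(\overline{A})\backslash\overline{\mathcal{G}} \quad\text{and}\quad \bigl(Z(H_{\overline{A}}) \times \{e_{\overline{\mathcal{G}}_0}\}\bigr)\backslash\bigl(\mathbf{G} \times \overline{\mathcal{G}}_0\bigr) \cong \bigl(Z(H_{\overline{A}})\backslash\mathbf{G}\bigr) \times \overline{\mathcal{G}}_0$$

are homeomorphic. But this is not clear at all. For instance, $2\mathbb{Z}$ and $3\mathbb{Z}$ are isomorphic, but $\mathbb{Z}/2\mathbb{Z} = \{0, 1\}$ and $\mathbb{Z}/3\mathbb{Z} = \{0, 1, 2\}$ are not. Nevertheless, in our case the claimed relation holds:

Proposition 3.6. For any $\overline{A} \in \overline{\mathcal{A}}$ there is a homeomorphism

$$\Psi_0 : \overline{\mathcal{G}}_0 \times Z(H_{\overline{A}})\backslash\mathbf{G} \to B(\overline{A})\backslash\overline{\mathcal{G}} .$$

Hence, the homeomorphism type of $E_{\overline{A}}$ is not only determined by $B(\overline{A})\backslash\overline{\mathcal{G}}$, but already by $Z(H_{\overline{A}})\backslash\mathbf{G}$.

Before we prove this proposition, we shall motivate our choice of the homeomorphism. First we again choose for each $x \in M$ a path $\gamma_x$ from $m$ to $x$, where w.l.o.g. $\gamma_m$ is the trivial path. By Eq. (7) we get a homomorphism

$$\phi' : \mathbf{G} \to \overline{\mathcal{G}}, \qquad g \mapsto \bigl(h_{\overline{A}}(\gamma_x)^{-1}\, g\, h_{\overline{A}}(\gamma_x)\bigr)_{x\in M}$$

with $\phi'(Z(H_{\overline{A}})) = B(\overline{A})$ and therefore a map from $Z(H_{\overline{A}})\backslash\mathbf{G}$ to $B(\overline{A})\backslash\overline{\mathcal{G}}$. Furthermore, we have $\phi'(\mathbf{G})\,\overline{\mathcal{G}}_0 = \overline{\mathcal{G}} \cong \phi'(\mathbf{G}) \times \overline{\mathcal{G}}_0$ with $\overline{g} \mapsto \bigl(\phi'(g_m), \phi'(g_m)^{-1}\,\overline{g}\bigr)$. Although there is no group structure on $B(\overline{A})\backslash\overline{\mathcal{G}}$ (in general, $B(\overline{A})$ is only a subgroup and not a normal subgroup of $\overline{\mathcal{G}}$), there is at least a canonical right action of $\overline{\mathcal{G}}$ and $\overline{\mathcal{G}}_0$, respectively, by $[\overline{g}] \circ \overline{g}' := [\overline{g}\,\overline{g}']$. Thus, $(\overline{g}, [g]) \mapsto [\phi'(g)] \circ \overline{g}$ is a good candidate to become our desired homeomorphism.

Proof. First we choose some path $\gamma_x$ from $m$ to $x$ for each $x \in M$, where w.l.o.g. $\gamma_m$ is trivial. Now we define

$$\Psi_0 : \overline{\mathcal{G}}_0 \times Z(H_{\overline{A}})\backslash\mathbf{G} \to B(\overline{A})\backslash\overline{\mathcal{G}}, \qquad \bigl((g_x)_{x\in M}, [g]\bigr) \mapsto \bigl[\phi'(g)\,(g_x)_{x\in M}\bigr], \quad\text{with } g_m = e_{\mathbf{G}}.$$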


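1. $\Psi_0$ is well-defined. Let $g_1 \sim g_2$, i.e. $g_1 = z g_2$ for some $z \in Z(H_{\overline{A}})$, and let $\overline{g} := (g_x)_{x\in M} \in \overline{\mathcal{G}}_0$. Then we have

$$\begin{aligned}
\Psi_0\bigl((g_x)_{x\in M}, [g_1]\bigr) &= [\phi'(g_1)\,\overline{g}] = [\phi'(z g_2)\,\overline{g}] \\
&= [\phi'(z)\,\phi'(g_2)\,\overline{g}] && \text{(homomorphy property of } \phi'\text{)} \\
&= [\phi'(g_2)\,\overline{g}] && (\phi'(Z(H_{\overline{A}})) = B(\overline{A}) \text{ by Proposition 3.5)} \\
&= \Psi_0\bigl((g_x)_{x\in M}, [g_2]\bigr).
\end{aligned}$$

2. $\Psi_0$ is injective. Let $\Psi_0\bigl((g_{1,x})_{x\in M}, [g_1]\bigr) = \Psi_0\bigl((g_{2,x})_{x\in M}, [g_2]\bigr)$. Then there exists a $\overline{z} \in B(\overline{A})$ with $\phi'(g_1)_x\, g_{1,x} = z_x\, \phi'(g_2)_x\, g_{2,x}$, i.e.

$$h_{\overline{A}}(\gamma_x)^{-1}\, g_1\, h_{\overline{A}}(\gamma_x)\, g_{1,x} = z_x\, h_{\overline{A}}(\gamma_x)^{-1}\, g_2\, h_{\overline{A}}(\gamma_x)\, g_{2,x} \quad\text{for all } x \in M.$$

Thus,
• for $x = m$: $g_1 = z_m g_2$, i.e. $[g_1] = [g_2]$, and
• for $x \neq m$:

$$\begin{aligned}
g_{1,x} &= h_{\overline{A}}(\gamma_x)^{-1}\, g_1^{-1}\, h_{\overline{A}}(\gamma_x)\, z_x\, h_{\overline{A}}(\gamma_x)^{-1}\, g_2\, h_{\overline{A}}(\gamma_x)\, g_{2,x} \\
&= h_{\overline{A}}(\gamma_x)^{-1}\, g_1^{-1}\, z_m\, g_2\, h_{\overline{A}}(\gamma_x)\, g_{2,x} \\
&= h_{\overline{A}}(\gamma_x)^{-1}\, h_{\overline{A}}(\gamma_x)\, g_{2,x} = g_{2,x},
\end{aligned}$$

i.e. $\Psi_0$ is injective.

3. $\Psi_0$ is surjective. Let $[\overline{g}'] \in B(\overline{A})\backslash\overline{\mathcal{G}}$ be given. Define $g_x := \bigl(\phi'(g'_m)^{-1}\,\overline{g}'\bigr)_x$ for all $x \in M$. Then we have $\Psi_0\bigl((g_x)_{x\in M}, [g'_m]\bigr) = [\overline{g}']$.

4. $\Psi_0^{-1}$ is continuous. It is sufficient to prove that the projections $\mathrm{pr}_i \circ \Psi_0^{-1}$ of $\Psi_0^{-1}$ to the factors $\overline{\mathcal{G}}_0$ ($i = 1$) and $Z(H_{\overline{A}})\backslash\mathbf{G}$ ($i = 2$) are continuous.

a) $\mathrm{pr}_1 \circ \Psi_0^{-1}$ is continuous. For all $x \in M \setminus \{m\}$ the map

$$\overline{\mathcal{G}} \xrightarrow{\ \pi_{mx}\ } \mathbf{G} \times \mathbf{G} \xrightarrow{\ \mathrm{mult.}\ } \mathbf{G}, \qquad \overline{g} \mapsto (g_m, g_x) \mapsto h_{\overline{A}}(\gamma_x)^{-1}\, g_m^{-1}\, h_{\overline{A}}(\gamma_x)\, g_x$$

is a composition of continuous maps and consequently continuous itself. Since $\pi_{B(\overline{A})} : \overline{\mathcal{G}} \to B(\overline{A})\backslash\overline{\mathcal{G}}$ is open and surjective, we get the continuity of $\pi_x \circ \mathrm{pr}_1 \circ \Psi_0^{-1}$ for all $x \in M \setminus \{m\}$ by $\pi_x \circ (\mathrm{pr}_1 \circ \Psi_0^{-1}) \circ \pi_{B(\overline{A})} = \mathrm{mult.} \circ \pi_{mx}$. For $x = m$ the corresponding statement is trivial. Thus, $\mathrm{pr}_1 \circ \Psi_0^{-1}$ is continuous.

b) $\mathrm{pr}_2 \circ \Psi_0^{-1}$ is continuous. We use $\pi_{Z(H_{\overline{A}})} \circ \pi_m = (\mathrm{pr}_2 \circ \Psi_0^{-1}) \circ \pi_{B(\overline{A})} : \overline{\mathcal{G}} \to Z(H_{\overline{A}})\backslash\mathbf{G}$. The statement now follows because $\pi_{B(\overline{A})}$ is an open and surjective map and $\pi_{Z(H_{\overline{A}})}$ and $\pi_m$ are continuous.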


5. $\Psi_0$ is a homeomorphism because $\Psi_0^{-1}$ is a continuous and bijective map of a compact space onto a Hausdorff space.

Thus we get the following important result: The homeomorphism class of a gauge orbit of a connection is completely determined by its holonomy centralizer. Finally, we should emphasize that, in general, the homeomorphism $\Psi_0$ is not an equivariant map w.r.t. the canonical action of $\overline{\mathcal{G}}$ on $\overline{\mathcal{G}}_0 \times Z(H_{\overline{A}})\backslash\mathbf{G}$.
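The bijectivity part of Proposition 3.6 can be illustrated by counting in a finite toy model. The sketch below is an illustration only, under assumptions not made in the paper: the structure group is $S_3$, the base has three points (index 0 plays the role of $m$), the holonomy group is generated by one transposition `T`, and `PATH` lists hypothetical values $h(\gamma_x)$. Topology plays no role here; only the well-definedness and bijectivity of the map called $\Psi_0$ above are tested.

```python
from itertools import permutations, product

G = list(permutations(range(3)))   # S_3 as permutation tuples
E = (0, 1, 2)                      # identity

def mul(p, q):
    """Group product p*q (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    r = [0, 0, 0]
    for i, pi in enumerate(p):
        r[pi] = i
    return tuple(r)

T = (1, 0, 2)                      # assumed generator of the holonomy group at m
PATH = (E, (1, 2, 0), (0, 2, 1))   # assumed h(gamma_x) for x = 0, 1, 2; gamma_0 trivial

Z = [z for z in G if mul(z, T) == mul(T, z)]   # holonomy centralizer

def phi(g):
    """The homomorphism of Eq. (7): x -> h(gamma_x)^{-1} g h(gamma_x)."""
    return tuple(mul(mul(inv(p), g), p) for p in PATH)

def gmul(a, b):
    """Pointwise product of two gauge transforms."""
    return tuple(mul(x, y) for x, y in zip(a, b))

B = [phi(z) for z in Z]            # base centralizer, using Proposition 3.5

def coset_Z(g):
    """Coset Z g in Z\\G."""
    return frozenset(mul(z, g) for z in Z)

def coset_B(gb):
    """Coset B gb in B\\(gauge group)."""
    return frozenset(gmul(b, gb) for b in B)

G0 = [(E, a, b) for a, b in product(G, repeat=2)]   # gauge transforms trivial at m
cosets_Z = {coset_Z(g) for g in G}

image = {}
for g0 in G0:
    for c in cosets_Z:
        # Every representative g of the coset c must give the same value.
        values = {coset_B(gmul(phi(g), g0)) for g in c}
        assert len(values) == 1
        image[(g0, c)] = values.pop()

all_cosets_B = {coset_B(gb) for gb in product(G, repeat=3)}
```

Here the domain has $36 \cdot 3 = 108$ elements and the target has $216/2 = 108$ cosets, so a well-defined injective map between them is automatically a bijection.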

3.4. Criteria for the homeomorphy of gauge orbits. It is well known that orbits of general transformation groups are classified by the conjugacy classes of their stabilizers. In our case this would mean that the gauge orbits are characterized by the conjugacy class of their corresponding base centralizer w.r.t. $\overline{\mathcal{G}}$. As we have already seen, the base centralizer of a connection $\overline{A}$ is isomorphic to the holonomy centralizer of $\overline{A}$, and the homeomorphism type of the gauge orbit is completely determined by that of $Z(H_{\overline{A}})\backslash\mathbf{G}$.

Now we are going to show that base centralizers are conjugate w.r.t. $\overline{\mathcal{G}}$ if and only if the corresponding holonomy centralizers are conjugate w.r.t. $\mathbf{G}$. This will allow us to define the type of a gauge orbit $E_{\overline{A}}$ to be the conjugacy class of $Z(H_{\overline{A}})$ w.r.t. $\mathbf{G}$. The investigation of the set of all these classes is much easier than in the case of classes in $\overline{\mathcal{G}}$. We want to prove the following

Proposition 3.7. Let $\overline{A}_1, \overline{A}_2 \in \overline{\mathcal{A}}$ be two generalized connections. Then the following statements are equivalent:
1. $Z(H_{\overline{A}_1})$ and $Z(H_{\overline{A}_2})$ are conjugate in $\mathbf{G}$.
2. $B(\overline{A}_1)$ and $B(\overline{A}_2)$ are conjugate in $\overline{\mathcal{G}}$.

It would be quite easy to prove this directly using Proposition 3.5. Nevertheless, we do not want to do this. Instead, we shall first derive some concrete criteria for the homeomorphy of two gauge orbits. Finally, the just claimed proposition will be a nice by-product.

Proposition 3.8. Let $\overline{A}_1, \overline{A}_2 \in \overline{\mathcal{A}}$ be two generalized connections. Furthermore, let there exist an isomorphism $\Psi : \overline{\mathcal{G}} \to \overline{\mathcal{G}}$ of topological groups with $\Psi(B(\overline{A}_1)) = B(\overline{A}_2)$. Then the map

$$\Omega : E_{\overline{A}_1} \to E_{\overline{A}_2}, \qquad \overline{A}_1 \circ \overline{g} \mapsto \overline{A}_2 \circ \Psi(\overline{g})$$

is a homeomorphism compatible with the action of $\overline{\mathcal{G}}$.

Proof.
• $\Omega$ is well-defined. Let $\overline{A}_1 \circ \overline{g} = \overline{A}_1 \circ \overline{g}'$. Then we have $\overline{A}_1 \circ (\overline{g} \circ \overline{g}'^{-1}) = \overline{A}_1$, i.e. $\overline{g} \circ \overline{g}'^{-1} \in B(\overline{A}_1)$ by Proposition 3.2. By assumption we have $\Psi(\overline{g} \circ \overline{g}'^{-1}) = \Psi(\overline{g}) \circ \Psi(\overline{g}')^{-1} \in B(\overline{A}_2)$, i.e. $\overline{A}_2 \circ \Psi(\overline{g}) = \overline{A}_2 \circ \Psi(\overline{g}')$.
• Since $\Psi$ is a group isomorphism, $\Omega$ is again an isomorphism that is compatible with the action of $\overline{\mathcal{G}}$.
• For the proof of the homeomorphy property of $\Omega$ we consider the following commutative diagram:

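$$\begin{array}{ccc}
E_{\overline{A}_1} & \xrightarrow{\ \Omega\ } & E_{\overline{A}_2} \\
\tau_1 \big\downarrow \cong & & \cong \big\downarrow \tau_2 \\
B(\overline{A}_1)\backslash\overline{\mathcal{G}} & \xrightarrow{\ \widetilde{\Psi}\ } & B(\overline{A}_2)\backslash\overline{\mathcal{G}}
\end{array}
\qquad\qquad
\begin{array}{ccc}
\overline{A}_1 \circ \overline{g} & \xmapsto{\ \Omega\ } & \overline{A}_2 \circ \Psi(\overline{g}) \\
\tau_1 \big\downarrow & & \big\downarrow \tau_2 \\
{[\overline{g}]}_{B(\overline{A}_1)} & \xmapsto{\ \widetilde{\Psi}\ } & {[\Psi(\overline{g})]}_{B(\overline{A}_2)}
\end{array}$$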

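Since $\tau_1$ and $\tau_2$ are homeomorphisms, it is sufficient to prove that $\widetilde{\Psi}$ and $\widetilde{\Psi}^{-1}$ are continuous. But this follows immediately from the fact that $\pi_{B(\overline{A})} : \overline{\mathcal{G}} \to B(\overline{A})\backslash\overline{\mathcal{G}}$ is an orbit space projection and that $\widetilde{\Psi} \circ \pi_{B(\overline{A}_1)} = \pi_{B(\overline{A}_2)} \circ \Psi$.

To simplify the terminology in the following we state

Definition 3.3. Let $G$ be a Lie group (topological group) and let $U_1$ and $U_2$ be closed subgroups of $G$. $U_1$ and $U_2$ are called extendibly isomorphic (w.r.t. $G$) iff there is an isomorphism $\psi : G \to G$ of Lie groups (topological groups) with $\psi(U_1) = U_2$.

If misunderstanding seems to be unlikely, we simply drop "w.r.t. $G$" and write "extendibly isomorphic".

In Proposition 3.8 we compared gauge orbits w.r.t. their base centralizers. Now we will compare them using their holonomy centralizers. In order to manage this we need an extendibility lemma. Let the holonomy centralizers of two connections be extendibly isomorphic, i.e. let there exist a $\psi : \mathbf{G} \to \mathbf{G}$ with $\psi(Z(H_{\overline{A}_1})) = Z(H_{\overline{A}_2})$. By $\Psi := \phi_2^{-1} \circ \psi \circ \phi_1$ the base centralizers are isomorphic. Extending $\Psi$ to $\overline{\mathcal{G}}$ we get

Lemma 3.9. Let $\overline{A}_1, \overline{A}_2 \in \overline{\mathcal{A}}$ be two generalized connections. Then the following statement holds: If $Z(H_{\overline{A}_1})$ and $Z(H_{\overline{A}_2})$ are extendibly isomorphic, then $B(\overline{A}_1)$ and $B(\overline{A}_2)$ are also extendibly isomorphic.

We have explicitly: Let $\psi : \mathbf{G} \to \mathbf{G}$ be an isomorphism of Lie groups with $\psi(Z(H_{\overline{A}_1})) = Z(H_{\overline{A}_2})$. Furthermore, let $\gamma_x$ be an arbitrary, but fixed path in $M$ from $m$ to $x$ for each $x \in M$. Then we have:
• The map $\Psi : \overline{\mathcal{G}} \to \overline{\mathcal{G}}$ defined by

$$\Psi(\overline{g})_x := h_{\overline{A}_2}(\gamma_x)^{-1}\, \psi\bigl(h_{\overline{A}_1}(\gamma_x)\, g_x\, h_{\overline{A}_1}(\gamma_x)^{-1}\bigr)\, h_{\overline{A}_2}(\gamma_x) \tag{8}$$

is an isomorphism of topological groups.
• $\Psi|_{B(\overline{A}_1)}$ is an isomorphism of Lie groups between $B(\overline{A}_1)$ and $B(\overline{A}_2)$. Furthermore, $\Psi|_{B(\overline{A}_1)}$ is independent of the choice of the paths $\gamma_x$.

Proof. Let $Z(H_{\overline{A}_1})$ and $Z(H_{\overline{A}_2})$ be extendibly isomorphic with the corresponding isomorphism $\psi$.
• Obviously, we have $\Psi(\overline{g}) \in \overline{\mathcal{G}}$ and $\Psi$ is a homomorphism of groups. Moreover, $\Psi$ is bijective with the inverse

$$\Psi^{-1}(\overline{g})_x = h_{\overline{A}_1}(\gamma_x)^{-1}\, \psi^{-1}\bigl(h_{\overline{A}_2}(\gamma_x)\, g_x\, h_{\overline{A}_2}(\gamma_x)^{-1}\bigr)\, h_{\overline{A}_1}(\gamma_x). \tag{9}$$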


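To prove the continuity of $\Psi$ it is sufficient to prove the continuity of $\pi_x \circ \Psi$ for all $x$. Hence, let $U \subseteq \mathbf{G}$ be open. Then we have

$$\begin{aligned}
(\pi_x \circ \Psi)^{-1}(U) &= \bigl\{\overline{g} \in \overline{\mathcal{G}} \mid (\pi_x \circ \Psi)(\overline{g}) = \Psi(\overline{g})_x = h_{\overline{A}_2}(\gamma_x)^{-1}\, \psi\bigl(h_{\overline{A}_1}(\gamma_x)\, g_x\, h_{\overline{A}_1}(\gamma_x)^{-1}\bigr)\, h_{\overline{A}_2}(\gamma_x) \in U\bigr\} \\
&= \pi_x^{-1}\Bigl(h_{\overline{A}_1}(\gamma_x)^{-1}\, \psi^{-1}\bigl(h_{\overline{A}_2}(\gamma_x)\bigr)\, \psi^{-1}(U)\, \psi^{-1}\bigl(h_{\overline{A}_2}(\gamma_x)^{-1}\bigr)\, h_{\overline{A}_1}(\gamma_x)\Bigr).
\end{aligned}$$

Since $\psi$ is a homeomorphism and $\pi_x$ is continuous, $(\pi_x \circ \Psi)^{-1}(U)$ is open. The continuity of $\Psi$ is now a consequence of Corollary 2.11; that of $\Psi^{-1}$ follows in the same way.
• Let $\phi_i$ be the isomorphism for $\overline{A}_i$ ($i = 1, 2$) corresponding to Proposition 3.5. Then we have $\Psi|_{B(\overline{A}_1)} = \phi_2^{-1} \circ \psi \circ \phi_1 : B(\overline{A}_1) \to B(\overline{A}_2)$. Since $\phi_1$, $\phi_2$ and $\psi$ are Lie isomorphisms and, moreover, independent of the choice of the $\gamma_x$, $\Psi|_{B(\overline{A}_1)}$ is again an isomorphism of Lie groups that is independent of the choice of the $\gamma_x$. Thus, $B(\overline{A}_1)$ and $B(\overline{A}_2)$ are extendibly isomorphic.

The next lemma is obvious.

Lemma 3.10. Let $\overline{A}_1, \overline{A}_2 \in \overline{\mathcal{A}}$ be two generalized connections. Then $Z(H_{\overline{A}_1})$ and $Z(H_{\overline{A}_2})$ are extendibly isomorphic provided they are conjugate w.r.t. $\mathbf{G}$.

Now we can prove Proposition 3.7.

Proof of Proposition 3.7.
• Let $Z(H_{\overline{A}_1})$ and $Z(H_{\overline{A}_2})$ be conjugate and thus also extendibly isomorphic. The map $\Psi : \overline{\mathcal{G}} \to \overline{\mathcal{G}}$ from Lemma 3.9 now fulfills

$$\Psi(\overline{g}') = \Bigl(\bigl(h_{\overline{A}_1}(\gamma_x)^{-1}\, g\, h_{\overline{A}_2}(\gamma_x)\bigr)^{-1}\, g'_x\, \bigl(h_{\overline{A}_1}(\gamma_x)^{-1}\, g\, h_{\overline{A}_2}(\gamma_x)\bigr)\Bigr)_{x\in M},$$

where $g \in \mathbf{G}$ was chosen such that $Z(H_{\overline{A}_2}) = (\mathrm{Ad}\, g)\, Z(H_{\overline{A}_1})$. We define $\overline{g} := \bigl(h_{\overline{A}_1}(\gamma_x)^{-1}\, g\, h_{\overline{A}_2}(\gamma_x)\bigr)_{x\in M}$. Hence, the map $\Psi : \overline{\mathcal{G}} \to \overline{\mathcal{G}}$ from Lemma 3.9 is simply $\mathrm{Ad}\, \overline{g}$. Moreover, $\mathrm{Ad}\, \overline{g}$ maps $B(\overline{A}_1)$ isomorphically onto $B(\overline{A}_2)$. Thus, $B(\overline{A}_2) = (\mathrm{Ad}\, \overline{g})\, B(\overline{A}_1)$.
• Let $B(\overline{A}_1)$ and $B(\overline{A}_2)$ be conjugate, i.e. let there exist a $\overline{g} \in \overline{\mathcal{G}}$ with $B(\overline{A}_2) = \overline{g}^{-1} B(\overline{A}_1)\, \overline{g}$. Then we obviously have $Z(H_{\overline{A}_2}) = g_m^{-1}\, Z(H_{\overline{A}_1})\, g_m$.

Let us summarize:

Theorem 3.11. Let $\overline{A}_1, \overline{A}_2 \in \overline{\mathcal{A}}$ be two generalized connections. Then the following implication chain holds:
 $B(\overline{A}_1)$ and $B(\overline{A}_2)$ are conjugate in $\overline{\mathcal{G}}$.
 $\iff$ $Z(H_{\overline{A}_1})$ and $Z(H_{\overline{A}_2})$ are conjugate in $\mathbf{G}$.
 $\Longrightarrow$ $Z(H_{\overline{A}_1})$ and $Z(H_{\overline{A}_2})$ are extendibly isomorphic.
 $\Longrightarrow$ $B(\overline{A}_1)$ and $B(\overline{A}_2)$ are extendibly isomorphic.
 $\Longrightarrow$ The gauge orbits $E_{\overline{A}_1}$ and $E_{\overline{A}_2}$ are homeomorphic.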


This theorem has an interesting and perhaps slightly surprising consequence: Even after projecting $\overline{\mathcal{A}}$ down to $\overline{\mathcal{A}}/\overline{\mathcal{G}} \equiv \mathrm{Hom}(\mathcal{HG}, \mathbf{G})/\mathrm{Ad}$ the complete knowledge about the homeomorphism class of the corresponding gauge orbit is conserved. Naively one would expect that after projecting the total gauge orbit onto one single point this information should be lost. But the homeomorphism class is already determined by the holonomy centralizer, which in turn can be reconstructed from $[\overline{A}]$ up to a global conjugation.

Proposition 3.12. For each $[\overline{A}] \in \overline{\mathcal{A}}/\overline{\mathcal{G}}$ the homeomorphism class of the gauge orbit corresponding to $[\overline{A}]$ can be reconstructed from $[\overline{A}]$.

3.5. Discussion on how to define the gauge orbit type. If we ignored the usual definition of the type of an orbit in a general $G$-space, then Theorem 3.11 would open up several possibilities to define the type of a gauge orbit. If the type should characterize the homeomorphism class of the gauge orbit as "uniquely" as possible, then it would be advisable to define the base centralizer modulo extendible isomorphy to be the type. But even this choice would not guarantee that two gauge orbits with different type are in fact non-homeomorphic. Moreover, the base centralizers as subgroups of $\overline{\mathcal{G}}$ are not as easily controllable as centralizers in $\mathbf{G}$ are. Thus, we will take the holonomy centralizer for the definition. Only the question remains whether we should take the centralizer modulo conjugation or modulo extendible isomorphy. We have to collect conjugate centralizers in one type anyway in order to make points of one orbit be of the same type. (Note that the holonomy centralizers of two gauge equivalent connections are generally not equal but only conjugate.) If we now include the general definition of an orbit type into our considerations again, it will be clear that we shall use the centralizer modulo conjugation. Since two connections have one and the same (usual) orbit type iff their base centralizers are conjugate, i.e. iff their holonomy centralizers are conjugate, we define the gauge orbit type by

Definition 3.4. The type of a gauge orbit $E_{\overline{A}}$ is the holonomy centralizer of $\overline{A}$ modulo conjugation. We write $\mathrm{Typ}([\overline{A}])$ or simply $\mathrm{Typ}(\overline{A})$.

We emphasize that this definition of the type of the gauge orbit $E_{\overline{A}}$ is, as mentioned above, independent of the choice of the connection $\overline{A} \in E_{\overline{A}}$. In fact, if $\overline{A}'$ is gauge equivalent to $\overline{A}$, by Lemma 3.1 there is a $g \in \mathbf{G}$ with $Z(H_{\overline{A}'}) = g^{-1} Z(H_{\overline{A}})\, g$. Hence, the holonomy centralizers of $\overline{A}$ and $\overline{A}'$ are conjugate. Thus, we can assign to each $[\overline{A}] \in \overline{\mathcal{A}}/\overline{\mathcal{G}}$ a unique gauge orbit type. Using Theorem 3.11 we get immediately

Corollary 3.13. Two gauge orbits with the same type are homeomorphic.

Finally, we want to give a further justification for our definition of the gauge orbit type. Let us consider regular connections. In the literature there are two different definitions for the type of a "classical" gauge orbit: On the one hand [14], one chooses the total stabilizer of $A \in \mathcal{A}$ in $\mathcal{G}$. On the other hand [15], one sees first that the pointed gauge group $\mathcal{G}_0$ (the set of all gauge transforms that are the identity on a fixed fibre) is a normal and closed subgroup in $\mathcal{G}$. Obviously, the quotient $\mathcal{G}/\mathcal{G}_0$ can be identified with the structure group $\mathbf{G}$. Moreover, the action of $\mathcal{G}_0$ on $\mathcal{A}$ is free, proper and smooth. This way one gets an action of $\mathcal{G}/\mathcal{G}_0$, the "essential part" of the gauge transforms, on the space $\mathcal{A}/\mathcal{G}_0$. Now, the gauge orbit types are the conjugacy classes of stabilizers being closed subgroups of $\mathcal{G}/\mathcal{G}_0 \cong \mathbf{G}$. This definition corresponds to our choice of the centralizer of the holonomy group. Due to the statements proven above these two descriptions are equivalent if we consider generalized connections, but in general not if we work in the classical framework. There it is under certain circumstances possible [15] that two connections have conjugate holonomy centralizers, but this conjugation cannot be lifted to a conjugation of the base centralizers. The deeper reason behind this is that the gauge transform $\overline{g} = \bigl(h_{A_1}(\gamma_x)^{-1}\, g\, h_{A_2}(\gamma_x)\bigr)_{x\in M}$ (cf. proof of Proposition 3.7) generally is not a classical gauge transform, i.e. it is not smooth. Nevertheless, in case of the definition using the holonomy group we have

Corollary 3.14. The gauge orbit type is conserved by the embedding $\mathcal{A} \hookrightarrow \overline{\mathcal{A}}$.

But note that this does not mean at all that the classical and the generalized gauge orbit of a classical connection itself are equal or at least homeomorphic.

4. Stratification of the Generalized Gauge Orbit Space

Throughout this section let $\mathbf{G}$ be a compact Lie group. The main goal of this section is to prove that $\overline{\mathcal{A}}$ admits a stratification. The basic concept here is the notion of a stratum. A stratum simply collects all connections having the same type. Natural questions are: Where in $\overline{\mathcal{A}}$ does which stratum lie? Which strata are "bigger" or "smaller"? Which stratum is perhaps the boundary of another? The aim of a stratification theorem is to reduce these geometrical questions to a certain ordering of the types of the strata under consideration. As we have seen above, types can be characterized by subgroups of $\mathbf{G}$ that, on the other hand, are naturally ordered by the inclusion relation. And indeed this will induce an appropriate ordering of the strata.

For the proof of the stratification theorem we will first need a lemma showing that the orbit type of every connection is determined by a finite set of holonomies. This allows us to lift the slice theorem from a certain $\mathbf{G}^k$ to $\overline{\mathcal{A}}$. Together with a denseness theorem we get the stratification.

4.1. Partial ordering of types. Since every gauge orbit type is an equivalence class of the centralizer of the corresponding holonomy group, the following notation is useful. Definition 4.1. A subgroup U of G is called Howe subgroup iff there is a set V ⊆ G with U = Z(V ). Analogously to the general theory we define a partial ordering for the gauge orbit types [9]. Definition 4.2. Let T denote the set of all conjugacy classes of Howe subgroups of G. Let t1 , t2 ∈ T . Then t1 ≤ t2 holds iff there are G1 ∈ t1 and G2 ∈ t2 with G1 ⊇ G2 . Obviously, we have Lemma 4.1. The maximal element in T is the class tmax of the center Z(G) of G, the minimal is the class tmin of G itself. Every connection whose type equals tmax will be called generic.
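Definitions 4.1 and 4.2 and Lemma 4.1 can be checked by brute force in a small example. The following is only an illustrative sketch under a strong assumption: the compact Lie group G is replaced by the finite symmetric group S3, with permutations stored as tuples. All function names are hypothetical and not taken from the paper.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Product p*q of permutations given as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def centralizer(group, subset):
    """Z(subset): all group elements commuting with every element of subset."""
    return frozenset(g for g in group
                     if all(compose(g, v) == compose(v, g) for v in subset))

def howe_subgroups(group):
    """All subgroups of the form Z(V), V running over all subsets (Definition 4.1)."""
    return {centralizer(group, V)
            for r in range(len(group) + 1)
            for V in combinations(group, r)}

def conjugacy_class(group, H):
    """The class [H] = { g^{-1} H g : g in group }."""
    return frozenset(
        frozenset(compose(compose(inverse(g), h), g) for h in H)
        for g in group)

def leq(t1, t2):
    """t1 <= t2 iff some G1 in t1 contains some G2 in t2 (Definition 4.2)."""
    return any(G1 >= G2 for G1 in t1 for G2 in t2)

# Stand-in for G (assumption): the symmetric group S3.
S3 = list(permutations(range(3)))
T = {conjugacy_class(S3, H) for H in howe_subgroups(S3)}
t_max = conjugacy_class(S3, centralizer(S3, S3))   # class of the center Z(G)
t_min = conjugacy_class(S3, frozenset(S3))         # class of G itself
```

In this toy case the ordering is reversed relative to inclusion, as in Definition 4.2: the smallest Howe subgroup (the center) gives the maximal type, and the whole group gives the minimal one.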

Stratification of the Generalized Gauge Orbit Space

Definition 4.3. Let $t \in \mathcal{T}$. We define the following expressions:

$\overline{\mathcal{A}}_{\geq t} := \{A \in \overline{\mathcal{A}} \mid \mathrm{Typ}(A) \geq t\}$,
$\overline{\mathcal{A}}_{= t} := \{A \in \overline{\mathcal{A}} \mid \mathrm{Typ}(A) = t\}$,
$\overline{\mathcal{A}}_{\leq t} := \{A \in \overline{\mathcal{A}} \mid \mathrm{Typ}(A) \leq t\}$.

All the $\overline{\mathcal{A}}_{=t}$ are called strata. The justification for the notation “strata” can be found in Subsect. 4.5.

4.2. Reducing the problem to finite-dimensional G-spaces.

4.2.1. Finiteness lemma for centralizers. We start with the crucial

Lemma 4.2. Let U be a nonempty subset of a compact Lie group G. Then there exist an $n \in \mathbb{N}$ and $u_1, \ldots, u_n \in U$, such that $Z(\{u_1, \ldots, u_n\}) = Z(U)$.

Proof.
• The case $Z(U) = G$ is trivial.
• Let $Z(U) \neq G$. Then there is a $u_1 \in U$ with $Z(\{u_1\}) \neq G$. Choose now for $i \geq 1$ successively $u_{i+1} \in U$ with $Z(\{u_1, \ldots, u_i\}) \supsetneq Z(\{u_1, \ldots, u_{i+1}\})$ as long as there is such a $u_{i+1}$. This procedure stops after a finite number of steps, since each nonincreasing sequence of compact subgroups in G stabilizes [9]. (Centralizers are always closed, thus compact.) Therefore there is an $n \in \mathbb{N}$, such that $Z(\{u_1, \ldots, u_n\}) = Z(\{u_1, \ldots, u_n\} \cup \{u\})$ for all $u \in U$. Thus, we have
$$Z(\{u_1, \ldots, u_n\}) = \bigcap_{u\in U} Z(\{u_1, \ldots, u_n\} \cup \{u\}) = Z(\{u_1, \ldots, u_n\} \cup U) = Z(U).$$

Corollary 4.3. Let $A \in \overline{\mathcal{A}}$. Then there is a finite set $\boldsymbol{\alpha} \subseteq \mathcal{HG}$, such that $Z(H_A) = Z(h_A(\boldsymbol{\alpha}))$.

We set $h_A(\boldsymbol{\alpha}) := \{h_A(\alpha_1), \ldots, h_A(\alpha_n)\} \subseteq G$, where $n := \#\boldsymbol{\alpha}$. To avoid cumbersome notations we denote also the tuple $(h_A(\alpha_1), \ldots, h_A(\alpha_n)) \in G^n$ by $h_A(\boldsymbol{\alpha})$. It should be clear from the context what is meant. Furthermore, $\boldsymbol{\alpha}$ is always finite.

Proof. Due to $H_A \subseteq G$ and the just proven lemma there are an $n \in \mathbb{N}$ and $g_1, \ldots, g_n \in H_A$ with $Z(\{g_1, \ldots, g_n\}) = Z(H_A)$. On the other hand, since $g_1, \ldots, g_n \in H_A$, there are $\alpha_1, \ldots, \alpha_n \in \mathcal{HG}$ with $g_i = h_A(\alpha_i)$ for all $i = 1, \ldots, n$.

4.2.2. Reduction mapping.

Definition 4.4. Let $\boldsymbol{\alpha} \subseteq \mathcal{HG}$. Then the map
$$\varphi_{\boldsymbol{\alpha}} : \overline{\mathcal{A}} \longrightarrow G^{\#\boldsymbol{\alpha}}, \qquad A \longmapsto h_A(\boldsymbol{\alpha})$$
is called reduction mapping.

Lemma 4.4. Let $\boldsymbol{\alpha} \subseteq \mathcal{HG}$ be arbitrary. Then $\varphi_{\boldsymbol{\alpha}}$ is continuous, and for all $A \in \overline{\mathcal{A}}$ and $g \in \overline{\mathcal{G}}$ we have $\varphi_{\boldsymbol{\alpha}}(A \circ g) = \varphi_{\boldsymbol{\alpha}}(A) \circ g_m$. Here G acts on $G^{\#\boldsymbol{\alpha}}$ by the adjoint map.

Proof. $\varphi_{\boldsymbol{\alpha}}$ is continuous by the continuity criterion for maps into product spaces and by the fact that each $\alpha_i \in \boldsymbol{\alpha}$ is a finite product of edges of graphs. The compatibility with the group action follows from $h_{A\circ g}(\boldsymbol{\alpha}) = g_m^{-1}\, h_A(\boldsymbol{\alpha})\, g_m$.
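The selection procedure in the proof of Lemma 4.2 can be sketched as a greedy loop. This is only an illustration under an explicit assumption: G is replaced by a finite permutation group, where termination is automatic, whereas the lemma itself needs the descending-chain property of compact subgroups. The names `finite_generators` and `centralizer` are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Product p*q of permutations given as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def centralizer(group, subset):
    """Z(subset): all group elements commuting with every element of subset."""
    return frozenset(g for g in group
                     if all(compose(g, u) == compose(u, g) for u in subset))

def finite_generators(group, U):
    """Greedy choice of u_1, ..., u_n in U with Z({u_1, ..., u_n}) = Z(U).

    An element u is kept only if it strictly shrinks the current centralizer.
    A skipped u already commutes with the current centralizer, and that
    centralizer can only shrink later, so one pass over U suffices.
    """
    U = list(U)
    chosen = []
    current = centralizer(group, chosen)      # Z(empty set) = whole group
    for u in U:
        smaller = centralizer(group, chosen + [u])
        if smaller < current:                 # proper inclusion of frozensets
            chosen.append(u)
            current = smaller
    if not chosen:                            # Z(U) = G: any single element works
        chosen = [U[0]]
    return chosen, current

# Stand-in for G (assumption): the symmetric group S4.
S4 = list(permutations(range(4)))
U = [(1, 0, 2, 3), (0, 1, 3, 2), (1, 0, 3, 2)]   # (01), (23), (01)(23)
chosen, Z = finite_generators(S4, U)
```

Here the first transposition already cuts the centralizer down to a group of order 4 that commutes with the other two elements, so a single element of U determines Z(U).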

4.2.3. Adjoint action of G on $G^n$. In this short paragraph we will summarize the most important facts about the adjoint action of G on $G^n$ that can be deduced from the general theory of transformation groups (see, e.g., [8]). First we determine the stabilizer $G_{\vec g}$ of an element $\vec g = (g_1, \ldots, g_n) \in G^n$. We have
$$G_{\vec g} = \{g \in G \mid \vec g \circ g = \vec g\} = \{g \in G \mid g^{-1} g_i\, g = g_i \ \ \forall i\} = Z(\{g_1, \ldots, g_n\}).$$
Consequently, we have for the type of the corresponding orbit $\mathrm{Typ}(\vec g) = [G_{\vec g}] = [Z(\{g_1, \ldots, g_n\})]$. The slice theorem reads now as follows:

Proposition 4.5. Let $\vec g \in G^n$. Then there is an $S \subseteq G^n$ with $\vec g \in S$, such that:
• $S \circ G$ is an open neighbourhood of $\vec g \circ G$ and
• there is an equivariant retraction $f : S \circ G \longrightarrow \vec g \circ G$ with $f^{-1}(\{\vec g\}) = S$.

Both on $\overline{\mathcal{A}}$ and on $G^n$ the type is (the class of) a Howe subgroup of G. The transformation behaviour of the types under a reduction mapping is stated in the next

Proposition 4.6. Any reduction mapping is type-minorifying, i.e. for all $\boldsymbol{\alpha} \subseteq \mathcal{HG}$ and all $A \in \overline{\mathcal{A}}$ we have $\mathrm{Typ}(\varphi_{\boldsymbol{\alpha}}(A)) \leq \mathrm{Typ}(A)$.

Proof. We have $\mathrm{Typ}(\varphi_{\boldsymbol{\alpha}}(A)) = [Z(\varphi_{\boldsymbol{\alpha}}(A))] \equiv [Z(h_A(\boldsymbol{\alpha}))] \leq [Z(H_A)] = \mathrm{Typ}(A)$.

4.3. Slice theorem for $\overline{\mathcal{A}}$. We state now the main theorem of the present paper.

Theorem 4.7. There is a tubular neighbourhood for any gauge orbit. Equivalently we have: For all $A \in \overline{\mathcal{A}}$ there is an $\overline{S} \subseteq \overline{\mathcal{A}}$ with $A \in \overline{S}$, such that:
• $\overline{S} \circ \overline{\mathcal{G}}$ is an open neighbourhood of $A \circ \overline{\mathcal{G}}$ and
• there is an equivariant retraction $F : \overline{S} \circ \overline{\mathcal{G}} \longrightarrow A \circ \overline{\mathcal{G}}$ with $F^{-1}(\{A\}) = \overline{S}$.

4.3.1. The idea. Our proof imitates in a certain sense the proof of the standard slice theorem (see, e.g., [8]) which is valid for the action of a finite-dimensional compact Lie group G on a Hausdorff space X. Let us review the main idea of this proof. Let $x \in X$ be given, and let $H \subseteq G$ be the stabilizer of x, i.e., [H] is an orbit type on the G-space X. Now, this situation is simulated on an $\mathbb{R}^n$, i.e., for an appropriate action of G on $\mathbb{R}^n$ one chooses a point with stabilizer H. So the orbits on X and on $\mathbb{R}^n$ can be identified. For the case of $\mathbb{R}^n$ the proof of a slice theorem is not very complicated. The crucial point of the general proof is the usage of the Tietze-Gleason extension theorem because this yields an equivariant extension $\psi : X \longrightarrow \mathbb{R}^n$, mapping one orbit onto the other. Finally, by means of $\psi$ the slice theorem can be lifted from $\mathbb{R}^n$ to X. What can we learn for our problem? Obviously, $\overline{\mathcal{G}}$ is not a finite-dimensional Lie group.
But we know that the stabilizer B(A) of a connection is homeomorphic to the centralizer $Z(H_A)$ of the holonomy group, which is a subgroup of G. Since every centralizer is already determined by finitely many elements (Lemma 4.2), $Z(H_A)$ equals $Z(h_A(\boldsymbol{\alpha}))$ with an appropriate finite $\boldsymbol{\alpha} \subseteq \mathcal{HG}$. This is nothing but the stabilizer of the adjoint action of G on $G^n$ in the point $h_A(\boldsymbol{\alpha})$. Thus, the reduction mapping $\varphi_{\boldsymbol{\alpha}}$ is the desired equivalent for $\psi$.
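The identification used here (the stabilizer of a tuple under the adjoint action is the centralizer of its entries, cf. Sect. 4.2.3) can be verified exhaustively in a toy case. The sketch below assumes a finite stand-in for G, namely S3, and n = 2. The helper names are hypothetical.

```python
from itertools import permutations, product

def compose(p, q):
    """Product p*q of permutations given as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def inverse(p):
    """Inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def adjoint_act(tup, g):
    """Right adjoint action on G^n: (g_1, ..., g_n) o g = (g^{-1} g_i g)_i."""
    g_inv = inverse(g)
    return tuple(compose(compose(g_inv, x), g) for x in tup)

def stabilizer(group, tup):
    """All g in group that fix the tuple under the adjoint action."""
    return frozenset(g for g in group if adjoint_act(tup, g) == tup)

def centralizer(group, subset):
    """Z(subset): all group elements commuting with every element of subset."""
    return frozenset(g for g in group
                     if all(compose(g, u) == compose(u, g) for u in subset))

# Stand-in for G (assumption): the symmetric group S3, acting on G^2.
S3 = list(permutations(range(3)))
pairs = list(product(S3, repeat=2))
stabilizer_is_centralizer = all(
    stabilizer(S3, t) == centralizer(S3, set(t)) for t in pairs)
```

The same helpers also confirm that this is a right action, i.e. acting with g and then with h agrees with acting with the product gh.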

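We are now looking for an appropriate $\overline{S} \subseteq \overline{\mathcal{A}}$, such that
$$F : \overline{S} \circ \overline{\mathcal{G}} \longrightarrow A \circ \overline{\mathcal{G}}, \qquad A' \circ g \longmapsto A \circ g$$
is well-defined and has the desired properties. In order to make F well-defined, we need $A' \circ g = A' \Longrightarrow A \circ g = A$ for all $A' \in \overline{S}$ and $g \in \overline{\mathcal{G}}$, i.e. $B(A') \subseteq B(A)$. Applying the projections $\pi_x$ on the stabilizers we get for $\gamma_x \in \mathcal{P}_{mx}$ (let $\gamma_m$ be the trivial path)
$$h_{A'}(\gamma_x)^{-1}\, Z(H_{A'})\, h_{A'}(\gamma_x) = \pi_x(B(A')) \subseteq \pi_x(B(A)) = h_A(\gamma_x)^{-1}\, Z(H_A)\, h_A(\gamma_x),$$
thus
$$Z(H_{A'}) \subseteq h_{A'}(\gamma_x)\, h_A(\gamma_x)^{-1}\, Z(H_A)\, h_A(\gamma_x)\, h_{A'}(\gamma_x)^{-1} \tag{10}$$
for all $x \in M$. In particular, we have $Z(H_{A'}) \subseteq Z(H_A)$ for $x = m$. Now we choose an $\boldsymbol{\alpha} \subseteq \mathcal{HG}$ with $Z(H_A) = Z(h_A(\boldsymbol{\alpha}))$ and an $S \subseteq G^{\#\boldsymbol{\alpha}}$ and an equivariant retraction $f : S \circ G \longrightarrow \varphi_{\boldsymbol{\alpha}}(A) \circ G$. Since equivariant mappings magnify stabilizers (or at least do not reduce them), we have $Z(\vec g') \subseteq Z(\varphi_{\boldsymbol{\alpha}}(A))$ for all $\vec g' \in S$. Therefore, the condition (10) would be, e.g., fulfilled if we had for all $A' \in \overline{S}$,
1. $\varphi_{\boldsymbol{\alpha}}(A') \in S$ and
2. $h_{A'}(\gamma_x) = h_A(\gamma_x)$ for all $x \in M$,

because the first condition implies $Z(H_{A'}) \subseteq Z(h_{A'}(\boldsymbol{\alpha})) \equiv Z(\varphi_{\boldsymbol{\alpha}}(A')) \subseteq Z(\varphi_{\boldsymbol{\alpha}}(A)) = Z(H_A)$. We could now choose $\overline{S}$ such that these two conditions are fulfilled. However, this would imply $F^{-1}(\{A\}) \supsetneq \overline{S}$ in general because for $g \in B(A)$ together with $A'$ the connection $A' \circ g$ is contained in $F^{-1}(\{A\})$ as well (we have $F(A') = A = A \circ g = F(A' \circ g)$), but $A' \circ g$ need no longer fulfill the two conditions above. Now it is quite obvious to define $\overline{S}$ as the set of all connections fulfilling these conditions multiplied with B(A). And indeed, the well-definedness remains valid.

4.3.2. The proof.

Proof.
1. Let $A \in \overline{\mathcal{A}}$. Choose for A an $\boldsymbol{\alpha} \subseteq \mathcal{HG}$ with $Z(H_A) = Z(h_A(\boldsymbol{\alpha}))$ according to Corollary 4.3 and denote the corresponding reduction mapping $\varphi_{\boldsymbol{\alpha}} : \overline{\mathcal{A}} \longrightarrow G^{\#\boldsymbol{\alpha}}$ shortly by $\varphi$.
2. Due to Proposition 4.5 there is an $S \subseteq G^{\#\boldsymbol{\alpha}}$ with $\varphi(A) \in S$, such that
• $S \circ G$ is an open neighbourhood of $\varphi(A) \circ G$ and
• there exists an equivariant mapping f with
– $f : S \circ G \longrightarrow \varphi(A) \circ G$ and
– $f^{-1}(\{\varphi(A)\}) = S$.
3. We define the mapping
$$\psi : \overline{\mathcal{A}} \longrightarrow \overline{\mathcal{G}}, \qquad A' \longmapsto \bigl(h_{A'}(\gamma_x)\bigr)_{x\in M},$$
where for all $x \in M \setminus \{m\}$ the (arbitrary, but fixed) path $\gamma_x$ runs from m to x and $\gamma_m$ is the trivial path.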

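4. As we motivated above we set
$$\overline{S}_0 := \varphi^{-1}(S) \cap \psi^{-1}(\psi(A)), \qquad \overline{S} := \bigl(\varphi^{-1}(S) \cap \psi^{-1}(\psi(A))\bigr) \circ B(A) \equiv \overline{S}_0 \circ B(A)$$
and
$$F : \overline{S} \circ \overline{\mathcal{G}} \longrightarrow A \circ \overline{\mathcal{G}}, \qquad A' \circ g \longmapsto A \circ g.$$

5. F is well-defined.
• Let $A' \circ g' = A'' \circ g''$ with $A', A'' \in \overline{S}$ and $g', g'' \in \overline{\mathcal{G}}$. Then there exist $z', z'' \in B(A)$ with $A' = A'_0 \circ z'$ and $A'' = A''_0 \circ z''$ as well as $A'_0, A''_0 \in \overline{S}_0$.
• Due to $\overline{S}_0 \subseteq \psi^{-1}(\psi(A))$ we have $\psi(A'_0) = \psi(A) = \psi(A''_0)$, i.e. $h_{A'_0}(\gamma_x) = h_A(\gamma_x) = h_{A''_0}(\gamma_x)$ for all x.
• Furthermore, we have
$$\begin{aligned}
f(\varphi(A' \circ g')) &= f(\varphi(A'_0 \circ z' \circ g')) \\
&= f(\varphi(A'_0) \circ z'_m \circ g'_m) && (\varphi \text{ "equivariant"}) \\
&= f(\varphi(A'_0)) \circ z'_m \circ g'_m && (f \text{ equivariant}) \\
&= \varphi(A) \circ z'_m \circ g'_m && (\varphi(A'_0) \in S) \\
&= \varphi(A \circ z') \circ g'_m && (\varphi \text{ "equivariant"}) \\
&= \varphi(A) \circ g'_m && (z' \in B(A))
\end{aligned}$$
and analogously $f(\varphi(A'' \circ g'')) = \varphi(A) \circ g''_m$. Therefore, we have $\varphi(A) \circ g'_m = \varphi(A) \circ g''_m$, i.e. $g'_m (g''_m)^{-1}$ is an element of the stabilizer of $\varphi(A)$, thus $g'_m (g''_m)^{-1} \in Z(\varphi(A)) = Z(H_A)$.
• Since $A'_0 \circ z' \circ g' = A''_0 \circ z'' \circ g''$, we have $A''_0 = A'_0 \circ \bigl(z' g' (g'')^{-1} (z'')^{-1}\bigr)$, and so for all $x \in M$,
$$h_{A''_0}(\gamma_x) = \bigl(z' g' (g'')^{-1} (z'')^{-1}\bigr)_m^{-1}\; h_{A'_0}(\gamma_x)\; \bigl(z' g' (g'')^{-1} (z'')^{-1}\bigr)_x .$$
Moreover, since $\bigl(g' (g'')^{-1}\bigr)_m \in Z(H_A)$, we have $\bigl(z' g' (g'')^{-1} (z'')^{-1}\bigr)_m \in Z(H_A)$. From $h_{A'_0}(\gamma_x) = h_A(\gamma_x) = h_{A''_0}(\gamma_x)$ for all x now $z' g' (g'')^{-1} (z'')^{-1} \in B(A)$ follows, and thus $g' (g'')^{-1} \in B(A)$.
• By this we have $A \circ g' = A \circ g''$, i.e. F is well-defined.

6. F is equivariant.
• Let $A'' = A' \circ g' \in \overline{S} \circ \overline{\mathcal{G}}$. Then
$$F(A'' \circ g) = F(A' \circ (g' \circ g)) = A \circ (g' \circ g) = (A \circ g') \circ g = F(A' \circ g') \circ g = F(A'') \circ g.$$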

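7. F is retracting.
• Let $A' = A \circ g \in A \circ \overline{\mathcal{G}}$. Then $F(A') = F(A \circ g) = A \circ g = A'$.

8. $\overline{S} \circ \overline{\mathcal{G}}$ is an open neighbourhood of $A \circ \overline{\mathcal{G}}$.
• Obviously, $A \circ \overline{\mathcal{G}} \subseteq \overline{S} \circ \overline{\mathcal{G}}$.
• We have $\overline{S} \circ \overline{\mathcal{G}} = \varphi^{-1}(S \circ G)$.
“⊆” Let $A' = A'' \circ g \in \overline{S}_0 \circ \overline{\mathcal{G}} = \overline{S} \circ \overline{\mathcal{G}}$. Then we have $\varphi(A') = \varphi(A'' \circ g) = \varphi(A'') \circ g_m \in S \circ G$ because $\varphi(\overline{S}_0) \subseteq S$. Thus, $A' \in \varphi^{-1}(S \circ G)$.
“⊇”
– Let $A' \in \varphi^{-1}(S \circ G)$, i.e. $\varphi(A') = \vec g' \circ g$ with appropriate $\vec g' \in S$ and $g \in G$.
– Choose some $g' \in \overline{\mathcal{G}}$ with $g'_m = g$. Then $\varphi(A' \circ (g')^{-1}) = \varphi(A') \circ (g'_m)^{-1} = \vec g' \in S$. Now set $A'' := A' \circ (g')^{-1}$.
– Using $g''_x := h_{A''}(\gamma_x)^{-1}\, h_A(\gamma_x)$ and $A''' := A'' \circ g''$ we get
a) $\varphi(A''') = \varphi(A'') \in S$ because of $g''_m = e_G$ and
b) $h_{A'''}(\gamma_x) = h_{A''}(\gamma_x)\, g''_x = h_A(\gamma_x)$ for all $x \in M$.
Thus, we have $A''' \in \overline{S}_0 \subseteq \overline{S}$ and $A' = A'' \circ g' = A''' \circ \bigl((g'')^{-1} \circ g'\bigr) \in \overline{S} \circ \overline{\mathcal{G}}$.
• Consequently, $\overline{S} \circ \overline{\mathcal{G}} = \varphi^{-1}(S \circ G)$ is, as a preimage of an open set, again open because of the continuity of $\varphi$.

9. F is continuous.
• We consider the following diagram:
$$\begin{array}{ccccc}
\overline{S} \circ \overline{\mathcal{G}} & \xrightarrow{\ F\ } & A \circ \overline{\mathcal{G}} & & \\
{\scriptstyle \varphi}\downarrow & & \downarrow{\scriptstyle \varphi} & & \\
S \circ G & \xrightarrow{\ f\ } & \varphi(A) \circ G & \xrightarrow[\ \cong\ ]{\ \tau_G\ } & Z(H_A)\backslash G
\end{array}
\qquad
\begin{array}{ccccc}
A' \circ g & \longmapsto & A \circ g & & \\
\downarrow & & \downarrow & & \\
\varphi(A') \circ g_m & \longmapsto & \varphi(A) \circ g_m & \longmapsto & [g_m]_{Z(H_A)}
\end{array}
\tag{11}$$
It is commutative due to $\varphi(\overline{S} \circ \overline{\mathcal{G}}) \subseteq S \circ G$, $\varphi(A \circ \overline{\mathcal{G}}) \subseteq \varphi(A) \circ G$ and the definition of F. $\tau_G$ is the canonical homeomorphism between the orbit of $\varphi(A)$ and the quotient of the acting group G by the stabilizer of $\varphi(A)$. Since $\varphi$, f and $\tau_G$ are continuous, the map
$$F' := \tau_G \circ \varphi \circ F = \tau_G \circ f \circ \varphi : \overline{S} \circ \overline{\mathcal{G}} \longrightarrow Z(H_A)\backslash G, \qquad A' \circ g \longmapsto [g_m]_{Z(H_A)}$$
is continuous.
• Now, we consider the map
$$F'' : (\overline{S} \circ \overline{\mathcal{G}}) \times G \longrightarrow \overline{\mathcal{G}}, \qquad (A' \circ g', g_m) \longmapsto \bigl(h_{\gamma_x}(A)^{-1}\, g_m\, h_{\gamma_x}(A' \circ g')\bigr)_{x\in M}.$$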

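$F''$ is continuous because
$$\pi_x \circ F'' : (\overline{S} \circ \overline{\mathcal{G}}) \times G \longrightarrow G \times G \xrightarrow{\ \text{mult.}\ } G, \qquad (A', g_m) \longmapsto (h_{\gamma_x}(A'), g_m) \longmapsto h_{\gamma_x}(A)^{-1}\, g_m\, h_{\gamma_x}(A')$$
is obviously continuous for all $x \in M$.
• $F''$ induces a map $F'''$ via the following commutative diagram
$$\begin{array}{ccc}
(\overline{S} \circ \overline{\mathcal{G}}) \times G & \xrightarrow{\ F''\ } & \overline{\mathcal{G}} \\
{\scriptstyle \mathrm{id} \times \pi_{Z(H_A)}}\downarrow & & \downarrow{\scriptstyle \pi_{B(A)}} \\
(\overline{S} \circ \overline{\mathcal{G}}) \times Z(H_A)\backslash G & \xrightarrow{\ F'''\ } & B(A)\backslash \overline{\mathcal{G}}
\end{array}$$
i.e.,
$$F'''(A', [g_m]_{Z(H_A)}) = \bigl[\bigl(h_{\gamma_x}(A)^{-1}\, g_m\, h_{\gamma_x}(A')\bigr)_{x\in M}\bigr]_{B(A)}.$$
– $F'''$ is well-defined. Let $g_{2,m} = z\, g_{1,m}$ with $z \in Z(H_A)$. Then
$$\begin{aligned}
F'''(A', [g_{2,m}]_{Z(H_A)}) &= \bigl[\bigl(h_{\gamma_x}(A)^{-1}\, g_{2,m}\, h_{\gamma_x}(A')\bigr)_{x\in M}\bigr]_{B(A)} \\
&= \bigl[\bigl(h_{\gamma_x}(A)^{-1}\, z\, g_{1,m}\, h_{\gamma_x}(A')\bigr)_{x\in M}\bigr]_{B(A)} \\
&= \bigl[\bigl(z_x\, h_{\gamma_x}(A)^{-1}\, g_{1,m}\, h_{\gamma_x}(A')\bigr)_{x\in M}\bigr]_{B(A)} \\
&= F'''(A', [g_{1,m}]_{Z(H_A)}),
\end{aligned}$$
because $(z_x)_{x\in M} := \bigl(h_{\gamma_x}(A)^{-1}\, z\, h_{\gamma_x}(A)\bigr)_{x\in M} \in B(A)$ for $z \in Z(H_A)$.
– $F'''$ is continuous, because $\mathrm{id} \times \pi_{Z(H_A)}$ is open and surjective and $\pi_{B(A)}$ and $F''$ are continuous.
• For $A' \in \overline{S}$ there is an $A'_0 \in \overline{S}_0$ and a $g' \in B(A)$ with $A' = A'_0 \circ g'$. Thus, we have $h_{\gamma_x}(A'_0) = h_{\gamma_x}(A)$ and
$$\begin{aligned}
F'''(A' \circ g, [g_m]) &= \bigl[\bigl(h_{\gamma_x}(A)^{-1}\, g_m\, h_{\gamma_x}(A'_0 \circ g' \circ g)\bigr)_{x\in M}\bigr]_{B(A)} \\
&= \bigl[\bigl(h_{\gamma_x}(A)^{-1}\, g_m\, g_m^{-1} (g'_m)^{-1}\, h_{\gamma_x}(A)\, g'_x\, g_x\bigr)_{x\in M}\bigr]_{B(A)} \\
&= \bigl[\bigl(h_{\gamma_x}(A)^{-1}\, h_{\gamma_x}(A \circ g')\, g_x\bigr)_{x\in M}\bigr]_{B(A)} \\
&= \bigl[(g_x)_{x\in M}\bigr]_{B(A)} = [g]_{B(A)},
\end{aligned}$$
where we used $g' \in B(A)$.
• Now, F is the concatenation of the following continuous maps:
$$F : \overline{S} \circ \overline{\mathcal{G}} \xrightarrow{\ \mathrm{id} \times F'\ } (\overline{S} \circ \overline{\mathcal{G}}) \times Z(H_A)\backslash G \xrightarrow{\ F'''\ } B(A)\backslash \overline{\mathcal{G}} \xrightarrow{\ (\tau_{\overline{\mathcal{G}}})^{-1}\ } A \circ \overline{\mathcal{G}},$$
$$A' \circ g \longmapsto (A' \circ g, [g_m]_{Z(H_A)}) \longmapsto [g]_{B(A)} \longmapsto A \circ g,$$
where $\tau_{\overline{\mathcal{G}}}$ is the canonical homeomorphism between the orbit $A \circ \overline{\mathcal{G}}$ and the acting group $\overline{\mathcal{G}}$ modulo the stabilizer B(A) of A. Hence, F is continuous.

10. We have $F^{-1}(\{A\}) = \overline{S}$.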

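• “⊆” Let $A' \in F^{-1}(\{A\})$, i.e. $F(A') = A$.
– By the commutativity of (11) we have $f(\varphi(A')) = \varphi(F(A')) = \varphi(A)$, hence $A' \in \varphi^{-1}(f^{-1}(\varphi(A))) = \varphi^{-1}(S)$.
– Define $g_x := h_{A'}(\gamma_x)^{-1}\, h_A(\gamma_x)$ and $A'' := A' \circ g$. Then we have $\varphi(A'') = \varphi(A') \in S$, i.e. $A'' \in \varphi^{-1}(S)$, and $h_{A''}(\gamma_x) = h_A(\gamma_x)$ for all x, i.e. $A'' \in \psi^{-1}(\psi(A))$. By this, $A'' \in \overline{S}_0$.
– Consequently, $F(A'') = A = F(A')$, and therefore also $A \circ g = F(A') \circ g = F(A' \circ g) = F(A'') = A$, i.e. $g \in B(A)$. Thus, $A' = A'' \circ g^{-1} \in \overline{S}_0 \circ B(A) = \overline{S}$.
“⊇” Let $A' \in \overline{S}$. Then $F(A') = F(A' \circ 1) = A \circ 1 = A$, i.e. $A' \in F^{-1}(\{A\})$.

4.3.3. Openness of the strata.

Proposition 4.8. $\overline{\mathcal{A}}_{\geq t}$ is open for all $t \in \mathcal{T}$.

Corollary 4.9. $\overline{\mathcal{A}}_{=t}$ is open in $\overline{\mathcal{A}}_{\leq t}$ for all $t \in \mathcal{T}$.

Proof. Since $\overline{\mathcal{A}}_{=t} = \overline{\mathcal{A}}_{\geq t} \cap \overline{\mathcal{A}}_{\leq t}$, $\overline{\mathcal{A}}_{=t}$ is open w.r.t. the relative topology on $\overline{\mathcal{A}}_{\leq t}$.

Corollary 4.10. $\overline{\mathcal{A}}_{\leq t}$ is compact for all $t \in \mathcal{T}$.

Proof. $\overline{\mathcal{A}} \setminus \overline{\mathcal{A}}_{\leq t} = \bigcup_{t' \in \mathcal{T},\, t' \not\leq t} \overline{\mathcal{A}}_{=t'} = \bigcup_{t' \in \mathcal{T},\, t' \not\leq t} \overline{\mathcal{A}}_{\geq t'}$ is open because $\overline{\mathcal{A}}_{\geq t'}$ is open for all $t' \in \mathcal{T}$. Thus, $\overline{\mathcal{A}}_{\leq t}$ is closed and therefore compact.

The proposition on the openness of the strata can be proven in two ways: first as a simple corollary of the slice theorem on $\overline{\mathcal{A}}$, and second directly using the reduction mapping. Altogether, the second variant needs less effort.

Proof of Proposition 4.8. We have to show that any $A \in \overline{\mathcal{A}}_{\geq t}$ has a neighbourhood that again is contained in $\overline{\mathcal{A}}_{\geq t}$. So, let $A \in \overline{\mathcal{A}}_{\geq t}$.
• Variant 1. Due to the slice theorem there is an open neighbourhood U of $A \circ \overline{\mathcal{G}}$, and so of A, too, and an equivariant retraction $F : U \longrightarrow A \circ \overline{\mathcal{G}}$. Since every equivariant mapping reduces types, we have $\mathrm{Typ}(A') \geq \mathrm{Typ}(A) \geq t$ for all $A' \in U$, thus $U \subseteq \overline{\mathcal{A}}_{\geq t}$.
• Variant 2. Choose again for A an $\boldsymbol{\alpha} \subseteq \mathcal{HG}$ with $\mathrm{Typ}(A) = [Z(H_A)] = [Z(h_A(\boldsymbol{\alpha}))] \equiv [Z(\varphi_{\boldsymbol{\alpha}}(A))] = \mathrm{Typ}(\varphi_{\boldsymbol{\alpha}}(A))$. Due to the slice theorem for general transformation groups there is an open, invariant neighbourhood $U'$ of $\varphi_{\boldsymbol{\alpha}}(A)$ in $G^{\#\boldsymbol{\alpha}}$ and an equivariant retraction $f : U' \longrightarrow \varphi_{\boldsymbol{\alpha}}(A) \circ G$. Since $\varphi_{\boldsymbol{\alpha}}$ and f are type-reducing, we have
$$\mathrm{Typ}(A') \geq \mathrm{Typ}(\varphi_{\boldsymbol{\alpha}}(A')) \geq \mathrm{Typ}\bigl(f(\varphi_{\boldsymbol{\alpha}}(A'))\bigr) = \mathrm{Typ}(\varphi_{\boldsymbol{\alpha}}(A)) = \mathrm{Typ}(A) \geq t$$
for all $A' \in U := \varphi_{\boldsymbol{\alpha}}^{-1}(U')$, i.e. $U \subseteq \overline{\mathcal{A}}_{\geq t}$. Obviously, U contains A and is open as a preimage of an open set.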

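4.4. Denseness of the strata. The next theorem we want to prove is that the set $\overline{\mathcal{A}}_{=t}$ is not only open, but also dense in $\overline{\mathcal{A}}_{\leq t}$. In contrast to the slice theorem and the openness of the strata, this assertion does not follow from the general theory of transformation groups. We have to show this directly on the level of $\overline{\mathcal{A}}$. As we will see in a moment, the next proposition will be very helpful.

Proposition 4.11. Let $A \in \overline{\mathcal{A}}$ and let $\Gamma_i$ be finitely many graphs. Then there is for any $t \geq \mathrm{Typ}(A)$ an $A' \in \overline{\mathcal{A}}$ with $\mathrm{Typ}(A') = t$ and $\pi_{\Gamma_i}(A) = \pi_{\Gamma_i}(A')$ for all i.

Namely, we have

Corollary 4.12. $\overline{\mathcal{A}}_{=t}$ is dense in $\overline{\mathcal{A}}_{\leq t}$ for all $t \in \mathcal{T}$.

Proof. Let $A \in \overline{\mathcal{A}}_{\leq t} \subseteq \overline{\mathcal{A}}$. We have to show that any neighbourhood U of A contains an $A'$ having type t. It is sufficient to prove this assertion for all graphs $\Gamma_i$ and all $U = \bigcap_{i\in I} \pi_{\Gamma_i}^{-1}(W_i)$ with open $W_i \subseteq G^{\#E(\Gamma_i)}$ and $\pi_{\Gamma_i}(A) \in W_i$ for all $i \in I$ with finite I, because any general open U contains such a set. Now let $\Gamma_i$ and U be chosen as just described. Due to Proposition 4.11 above there exists an $A' \in \overline{\mathcal{A}}$ with $\mathrm{Typ}(A') = t \geq \mathrm{Typ}(A)$ and $\pi_{\Gamma_i}(A) = \pi_{\Gamma_i}(A')$ for all i, i.e. with $A' \in \overline{\mathcal{A}}_{=t}$ and $A' \in \pi_{\Gamma_i}^{-1}(\pi_{\Gamma_i}(\{A\})) \subseteq \pi_{\Gamma_i}^{-1}(W_i)$ for all i, thus $A' \in \bigcap_i \pi_{\Gamma_i}^{-1}(W_i) = U$.

Along with the proposition about the openness of the strata we get

Corollary 4.13. For all $t \in \mathcal{T}$ the closure of $\overline{\mathcal{A}}_{=t}$ w.r.t. $\overline{\mathcal{A}}$ is equal to $\overline{\mathcal{A}}_{\leq t}$.

Proof. Denote the closure of F w.r.t. E by $\mathrm{Cl}_E(F)$. Due to the denseness of $\overline{\mathcal{A}}_{=t}$ in $\overline{\mathcal{A}}_{\leq t}$ we have $\mathrm{Cl}_{\overline{\mathcal{A}}_{\leq t}}(\overline{\mathcal{A}}_{=t}) = \overline{\mathcal{A}}_{\leq t}$. Since the closure is compatible with the relative topology, we have $\overline{\mathcal{A}}_{\leq t} = \mathrm{Cl}_{\overline{\mathcal{A}}_{\leq t}}(\overline{\mathcal{A}}_{=t}) = \overline{\mathcal{A}}_{\leq t} \cap \mathrm{Cl}_{\overline{\mathcal{A}}}(\overline{\mathcal{A}}_{=t})$, i.e. $\overline{\mathcal{A}}_{\leq t} \subseteq \mathrm{Cl}_{\overline{\mathcal{A}}}(\overline{\mathcal{A}}_{=t})$. But, due to Corollary 4.10, $\overline{\mathcal{A}}_{\leq t} \supseteq \overline{\mathcal{A}}_{=t}$ itself is closed in $\overline{\mathcal{A}}$. Hence, $\overline{\mathcal{A}}_{\leq t} \supseteq \mathrm{Cl}_{\overline{\mathcal{A}}}(\overline{\mathcal{A}}_{=t})$.

4.4.1. How to prove Proposition 4.11? Which ideas will the proof of Proposition 4.11 be based on? As in the last two subsections we get help from the finiteness lemma for centralizers. Namely, let $\boldsymbol{\alpha} \subseteq \mathcal{HG}$ be chosen such that $\mathrm{Typ}(A) = [Z(H_A)] = [Z(\varphi_{\boldsymbol{\alpha}}(A))]$. The type $t \geq \mathrm{Typ}(A)$ is determined by finitely many generators as well. Thus, we have to construct a connection whose type is determined by $\varphi_{\boldsymbol{\alpha}}(A)$ and the generators of t. For this we use induction on the number of generators of t. In conclusion, we have to construct inductively from A new connections $A_i$, such that $A_{i-1}$ coincides with $A_i$ at least along the paths that pass $\boldsymbol{\alpha}$ or that lie in the graphs $\Gamma_i$. But, at the same time, there has to exist a path e, such that $h_{A_i}(e)$ equals the i-th generator of t. Now, it should be obvious that we get help from the construction method for new connections introduced in [10]. Before we do this we recall an important notation used there.

Definition 4.5. Let $\gamma_1, \gamma_2 \in \mathcal{P}$. We say that $\gamma_1$ and $\gamma_2$ have the same initial segment (shortly: $\gamma_1 \uparrow\uparrow \gamma_2$) iff there exist $0 < \delta_1, \delta_2 \leq 1$ such that $\gamma_1|_{[0,\delta_1]}$ and $\gamma_2|_{[0,\delta_2]}$ coincide up to the parametrization.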

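We say analogously that the final segment of $\gamma_1$ coincides with the initial segment of $\gamma_2$ (shortly: $\gamma_1 \downarrow\uparrow \gamma_2$) iff there exist $0 < \delta_1, \delta_2 \leq 1$ such that $\gamma_1^{-1}|_{[0,\delta_1]}$ and $\gamma_2|_{[0,\delta_2]}$ coincide up to the parametrization. If the corresponding relations are not fulfilled, we write $\gamma_1 \not\uparrow\uparrow \gamma_2$ and $\gamma_1 \not\downarrow\uparrow \gamma_2$, respectively.

Finally, we recall the decomposition lemma.

Lemma 4.14. Let $x \in M$ be a point. Any $\gamma \in \mathcal{P}$ can be written (up to parametrization) as a product $\prod \gamma_i$ with $\gamma_i \in \mathcal{P}$, such that
• $\mathrm{int}\, \gamma_i \cap \{x\} = \emptyset$ or
• $\mathrm{int}\, \gamma_i = \{x\}$.

4.4.2. Successive magnifying of the types. In order to prove Proposition 4.11 we need the following lemma for magnifying the types. Hereby, we will use explicitly the construction of a new connection $A'$ from A as given in [10].

Lemma 4.15. Let $\Gamma_i$ be finitely many graphs, $A \in \overline{\mathcal{A}}$ and $\boldsymbol{\alpha} \subseteq \mathcal{HG}$ be a finite set of paths with $Z(H_A) = Z(h_A(\boldsymbol{\alpha}))$. Furthermore, let $g \in G$ be arbitrary. Then there is an $A' \in \overline{\mathcal{A}}$, such that:
• $h_{A'}(\boldsymbol{\alpha}) = h_A(\boldsymbol{\alpha})$,
• $\pi_{\Gamma_i}(A') = \pi_{\Gamma_i}(A)$ for all i,
• $h_{A'}(e) = g$ for an $e \in \mathcal{HG}$ and
• $Z(H_{A'}) = Z(\{g\} \cup h_A(\boldsymbol{\alpha}))$.

Proof.
1. Let $m' \in M$ be some point that is neither contained in the images of the $\Gamma_i$ nor in that of $\boldsymbol{\alpha}$, and join m with $m'$ by some path $\gamma$. Now let $e'$ be some closed path in M with base point $m'$ and without self-intersections, such that
$$\mathrm{im}\, e' \cap \Bigl(\mathrm{int}\, \gamma \cup \mathrm{im}(\boldsymbol{\alpha}) \cup \bigcup_i \mathrm{im}(\Gamma_i)\Bigr) = \emptyset. \tag{12}$$
Obviously, there exists such an $e'$ because M is supposed to be at least two-dimensional. Set $e := \gamma\, e'\, \gamma^{-1} \in \mathcal{HG}$ and $g' := h_A(\gamma)^{-1}\, g\, h_A(\gamma)$. Finally, define a connection $A'$ for A, $e'$ and $g'$ as follows:
2. Construction of $A'$.
• Let $\delta \in \mathcal{P}$ be for the moment a "genuine" path (i.e., not an equivalence class) that does not contain the initial point $e'(0) \equiv m'$ of $e'$ as an inner point. Explicitly we have $\mathrm{int}\, \delta \cap \{e'(0)\} = \emptyset$. Define
$$h_{A'}(\delta) := \begin{cases}
g'\, h_A(e')^{-1}\, h_A(\delta)\, h_A(e')\, g'^{-1}, & \text{for } \delta \uparrow\uparrow e' \text{ and } \delta \downarrow\uparrow e' \\
g'\, h_A(e')^{-1}\, h_A(\delta), & \text{for } \delta \uparrow\uparrow e' \text{ and } \delta \not\downarrow\uparrow e' \\
h_A(\delta)\, h_A(e')\, g'^{-1}, & \text{for } \delta \not\uparrow\uparrow e' \text{ and } \delta \downarrow\uparrow e' \\
h_A(\delta), & \text{else.}
\end{cases}$$
• For every trivial path $\delta$ set $h_{A'}(\delta) = e_G$.
• Now, let $\delta \in \mathcal{P}$ be an arbitrary path. Decompose $\delta$ into a finite product $\prod \delta_i$ due to Lemma 4.14 such that no $\delta_i$ contains the point $e'(0)$ in the interior when we suppose $\delta_i$ is not trivial. Here, set $h_{A'}(\delta) := \prod h_{A'}(\delta_i)$.

We know from [10] that $A'$ is indeed a connection.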

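3. The assertion $\pi_{\Gamma_i}(A') = \pi_{\Gamma_i}(A)$ for all i is an immediate consequence of the construction because $\mathrm{im}(\Gamma_i) \cap \mathrm{im}\, e' = \emptyset$. As well, we get $h_{A'}(\boldsymbol{\alpha}) = h_A(\boldsymbol{\alpha})$.
4. Moreover, from (12), the fact that $e'$ has no self-intersections and the definition of $A'$ we get $h_{A'}(\gamma) = h_A(\gamma)$ and so
$$h_{A'}(e) = h_{A'}(\gamma)\, h_{A'}(e')\, h_{A'}(\gamma^{-1}) = h_A(\gamma)\, g'\, h_A(\gamma)^{-1} = g.$$
5. We have $Z(H_{A'}) = Z(\{g\} \cup H_A)$.
“⊆” Let $f \in Z(H_{A'})$, i.e. $f\, h_{A'}(\alpha) = h_{A'}(\alpha)\, f$ for all $\alpha \in \mathcal{HG}$.
• From $h_{A'}(e) = g$ it follows that $fg = gf$, i.e. $f \in Z(\{g\})$.
• From $\mathrm{im}\, e' \cap \mathrm{im}(\boldsymbol{\alpha}) = \emptyset$ it follows that $h_{A'}(\alpha_i) = h_A(\alpha_i)$, i.e. $f \in Z(h_A(\alpha_i))$ for all i.
Thus, $f \in Z(\{g\}) \cap Z(h_A(\boldsymbol{\alpha})) = Z(\{g\} \cup H_A)$.
“⊇” Let $f \in Z(\{g\} \cup H_A)$.
• Let $\alpha'$ be a path from $m'$ to $m'$, such that $\mathrm{int}\, \alpha' \cap \{m'\} = \emptyset$ or $\mathrm{int}\, \alpha' = \{m'\}$. Set $\alpha := \gamma\, \alpha'\, \gamma^{-1}$. Then by construction we have
$$h_{A'}(\alpha) = h_{A'}(\gamma)\, h_{A'}(\alpha')\, h_{A'}(\gamma)^{-1} = h_A(\gamma)\, h_{A'}(\alpha')\, h_A(\gamma)^{-1}.$$
There are four cases: Suppose, e.g., $\alpha' \uparrow\uparrow e'$ and $\alpha' \downarrow\uparrow e'$. Then
$$\begin{aligned}
h_{A'}(\alpha) &= h_A(\gamma)\, g'\, h_A(e')^{-1}\, h_A(\alpha')\, h_A(e')\, (g')^{-1}\, h_A(\gamma)^{-1} \\
&= g\, h_A(\gamma)\, h_A(e')^{-1}\, h_A(\alpha')\, h_A(e')\, h_A(\gamma)^{-1}\, g^{-1} \\
&= g\, h_A(\gamma\, e'^{-1} \alpha'\, e'\, \gamma^{-1})\, g^{-1}.
\end{aligned}$$
Thus, $f \in Z(\{h_{A'}(\alpha)\})$. The remaining cases yield the same conclusion.
• Now, let $\alpha \in \mathcal{HG}$ be arbitrary and $\alpha' := \gamma^{-1} \alpha\, \gamma$. By the Decomposition Lemma 4.14 there is a decomposition $\alpha' = \prod \alpha'_i$ with $\mathrm{int}\, \alpha'_i \cap \{m'\} = \emptyset$ or $\mathrm{int}\, \alpha'_i = \{m'\}$ for all i. Thus, $\alpha = \gamma \bigl(\prod \alpha'_i\bigr) \gamma^{-1} = \prod \gamma\, \alpha'_i\, \gamma^{-1}$. Using the result just proven we get
$$f \in Z\Bigl(\Bigl\{h_{A'}\Bigl(\prod \gamma\, \alpha'_i\, \gamma^{-1}\Bigr)\Bigr\}\Bigr) = Z(\{h_{A'}(\alpha)\}).$$
Thus, $f \in Z(H_{A'})$. Due to the choice of $\boldsymbol{\alpha}$ we have $Z(H_{A'}) = Z(\{g\} \cup h_A(\boldsymbol{\alpha}))$.

4.4.3. Construction of arbitrary types. Finally, we can now prove the desired proposition.

Proof of Proposition 4.11.
• Let $t \in \mathcal{T}$ and $t \geq \mathrm{Typ}(A)$. Then there exist a Howe subgroup $V' \subseteq G$ with $t = [V']$ and a $g \in G$, such that $Z(H_A) \supseteq g^{-1} V' g =: V$. Since V is a Howe subgroup, we have $Z(Z(V)) = V$ and so by Lemma 4.2 there exist certain $u_0, \ldots, u_k \in Z(V) \subseteq G$, such that $V = Z(Z(V)) = Z(\{u_0, \ldots, u_k\})$.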

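• Now let $Z(H_A) = Z(h_A(\boldsymbol{\alpha}))$ with an appropriate $\boldsymbol{\alpha} \subseteq \mathcal{HG}$ as in Corollary 4.3. Because of $V \subseteq Z(H_A)$ we have $V = V \cap Z(H_A) = Z(\{u_0, \ldots, u_k\}) \cap Z(h_A(\boldsymbol{\alpha})) = Z(\{u_0, \ldots, u_k\} \cup h_A(\boldsymbol{\alpha}))$.
• We now use inductively Lemma 4.15. Let $A_0 := A$ and $\boldsymbol{\alpha}^0 := \boldsymbol{\alpha}$. Construct for all $j = 0, \ldots, k$ a connection $A_{j+1}$ and an $e_j \in \mathcal{HG}$ from $A_j$ and $\boldsymbol{\alpha}^j$ by that lemma, such that $\pi_{\Gamma_i}(A_{j+1}) = \pi_{\Gamma_i}(A_j)$ for all i, $h_{A_{j+1}}(\boldsymbol{\alpha}^j) = h_{A_j}(\boldsymbol{\alpha}^j)$, $h_{A_{j+1}}(e_j) = u_j$ and $Z(H_{A_{j+1}}) = Z(\{u_j\} \cup h_{A_j}(\boldsymbol{\alpha}^j))$. Setting $\boldsymbol{\alpha}^{j+1} := \boldsymbol{\alpha}^j \cup \{e_j\}$ we get $Z(H_{A_{j+1}}) = Z(\{u_j\} \cup h_{A_j}(\boldsymbol{\alpha}^j)) = Z(h_{A_{j+1}}(\boldsymbol{\alpha}^{j+1}))$.

Finally, we define $A' := A_{k+1}$. Now, we get $\pi_{\Gamma_i}(A') = \pi_{\Gamma_i}(A)$ for all i, $h_{A'}(\boldsymbol{\alpha}) = h_A(\boldsymbol{\alpha})$ and $h_{A'}(e_j) = u_j$. Thus,
$$Z(H_{A'}) = Z(h_{A'}(\boldsymbol{\alpha}^{k+1})) = Z(h_{A'}(\{e_0, \ldots, e_k\} \cup \boldsymbol{\alpha})) = Z(\{u_0, \ldots, u_k\} \cup h_A(\boldsymbol{\alpha})) = V,$$
i.e., $\mathrm{Typ}(A') = [V] = t$.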
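The proof above relies on the double-centralizer identity Z(Z(V)) = V for Howe subgroups. A quick sanity check in a finite model may help. The sketch assumes the symmetric group S4 as a stand-in for G and only samples Howe subgroups of the form Z(W) with W of at most two elements. The function names are hypothetical.

```python
from itertools import combinations, permutations

def compose(p, q):
    """Product p*q of permutations given as tuples: (p*q)(i) = p[q[i]]."""
    return tuple(p[i] for i in q)

def centralizer(group, subset):
    """Z(subset): all group elements commuting with every element of subset."""
    return frozenset(g for g in group
                     if all(compose(g, u) == compose(u, g) for u in subset))

def sample_howe_subgroups(group, max_size=2):
    """Howe subgroups Z(W) for all subsets W with at most max_size elements."""
    return {centralizer(group, W)
            for r in range(max_size + 1)
            for W in combinations(group, r)}

# Stand-in for G (assumption): the symmetric group S4.
S4 = list(permutations(range(4)))
howe = sample_howe_subgroups(S4)
double_centralizer_ok = all(
    centralizer(S4, centralizer(S4, V)) == V for V in howe)
```

The identity holds for any group, since Z(Z(Z(W))) = Z(W) for every subset W. The code merely makes this visible on concrete subgroups.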

The proposition just proven has a further immediate consequence.

Corollary 4.16. $\overline{\mathcal{A}}_{=t}$ is non-empty for all $t \in \mathcal{T}$.

Proof. Let A be the trivial connection, i.e. $h_A(\alpha) = e_G$ for all $\alpha \in \mathcal{P}$. The type of A is [G], thus minimal, i.e. we have $t \geq \mathrm{Typ}(A)$ for all $t \in \mathcal{T}$. By means of Proposition 4.11 there is an $A' \in \overline{\mathcal{A}}$ with $\mathrm{Typ}(A') = t$.

This corollary answers the question of which gauge orbit types exist for generalized connections.

Theorem 4.17. The set of all gauge orbit types on $\overline{\mathcal{A}}$ is the set of all conjugacy classes of Howe subgroups of G.

Furthermore we have

Corollary 4.18. Let $\Gamma$ be some graph. Then $\pi_\Gamma(\overline{\mathcal{A}}_{=t_{\max}}) = \pi_\Gamma(\overline{\mathcal{A}})$. In other words: $\pi_\Gamma$ is surjective even on the generic connections.

Proof. $\pi_\Gamma$ is surjective on $\overline{\mathcal{A}}$ as proven in [10]. By Proposition 4.11 there is now for every $A \in \overline{\mathcal{A}}$ an $A'$ with $\mathrm{Typ}(A') = t_{\max}$ and $\pi_\Gamma(A') = \pi_\Gamma(A)$.

4.5. Stratification of $\overline{\mathcal{A}}$. First we recall the general definition of a stratification [14].

Definition 4.6. A countable family $\mathcal{S}$ of non-empty subsets of a topological space X is called stratification of X iff $\mathcal{S}$ is a covering for X and for all $U, V \in \mathcal{S}$ we have
• $U \cap V \neq \emptyset \Longrightarrow U = V$,
• $\mathrm{Cl}(U) \cap V \neq \emptyset \Longrightarrow \mathrm{Cl}(U) \supseteq V$ and

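• $\mathrm{Cl}(U) \cap V \neq \emptyset \Longrightarrow \mathrm{Cl}(V) \cap (U \cup V) = V$.
The elements of such a stratification $\mathcal{S}$ are called strata. A stratification is called topologically regular iff for all $U, V \in \mathcal{S}$,
$U \neq V$ and $\mathrm{Cl}(U) \cap V \neq \emptyset \Longrightarrow \mathrm{Cl}(V) \cap U = \emptyset$.

Theorem 4.19. $\mathcal{S} := \{\overline{\mathcal{A}}_{=t} \mid t \in \mathcal{T}\}$ is a topologically regular stratification of $\overline{\mathcal{A}}$. Analogously, $\{(\overline{\mathcal{A}}/\overline{\mathcal{G}})_{=t} \mid t \in \mathcal{T}\}$ is a topologically regular stratification of $\overline{\mathcal{A}}/\overline{\mathcal{G}}$.

Proof.
• Obviously, $\mathcal{S}$ is a covering of $\overline{\mathcal{A}}$.
• For a compact Lie group the set of all types, i.e. all conjugacy classes of Howe subgroups of G, is at most countable (cf. [14]).
• Moreover, from $\overline{\mathcal{A}}_{=t_1} \cap \overline{\mathcal{A}}_{=t_2} \neq \emptyset$, $\overline{\mathcal{A}}_{=t_1} = \overline{\mathcal{A}}_{=t_2}$ immediately follows.
• Due to Corollary 4.13 we have $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_1}) = \overline{\mathcal{A}}_{\leq t_1}$, i.e. from $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_1}) \cap \overline{\mathcal{A}}_{=t_2} \neq \emptyset$, $t_2 \leq t_1$ follows and thus $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_1}) \supseteq \overline{\mathcal{A}}_{=t_2}$. (Note that Cl(U) denotes again the closure of U, here w.r.t. $\overline{\mathcal{A}}$.)
• Analogously we get $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_2}) \cap (\overline{\mathcal{A}}_{=t_1} \cup \overline{\mathcal{A}}_{=t_2}) = \overline{\mathcal{A}}_{\leq t_2} \cap (\overline{\mathcal{A}}_{=t_1} \cup \overline{\mathcal{A}}_{=t_2}) = \overline{\mathcal{A}}_{=t_2}$.
• As well, from $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_1}) \cap \overline{\mathcal{A}}_{=t_2} \neq \emptyset$ and $\overline{\mathcal{A}}_{=t_1} \neq \overline{\mathcal{A}}_{=t_2}$, $t_1 > t_2$ follows, i.e. $\mathrm{Cl}(\overline{\mathcal{A}}_{=t_2}) \cap \overline{\mathcal{A}}_{=t_1} = \emptyset$.
Consequently, $\mathcal{S}$ is a topologically regular stratification of $\overline{\mathcal{A}}$.

For a regular stratification it would be required that each stratum carries the structure of a manifold that is compatible with the topology of the total space. In contrast to the case of the classical gauge orbit space [14], this is not fulfilled for generalized connections.

5. Non-Complete Connections

We shall round off this paper with the proof that the set of the so-called non-complete connections is contained in a set of induced Haar measure zero. This section actually stands a little bit separated from the context because it is the only section that is not only algebraic and topological, but also measure theoretical. Again, G is compact.

Definition 5.1. Let $A \in \overline{\mathcal{A}}$ be a connection, and let $\overline{H_A}$ denote the closure of $H_A$ in G.
1. A is called complete ⇐⇒ $H_A = G$.
2. A is called almost complete ⇐⇒ $\overline{H_A} = G$.
3. A is called non-complete ⇐⇒ $\overline{H_A} \neq G$.

Obviously, we have

Lemma 5.1. If $A \in \overline{\mathcal{A}}$ is complete (almost complete, non-complete), so $A \circ g$ is complete (almost complete, non-complete) for all $g \in \overline{\mathcal{G}}$.

Thus, the total information about the completeness of a connection is already contained in its gauge orbit. Now, to the main assertion of this section.

Proposition 5.2. Let $N := \{A \in \overline{\mathcal{A}} \mid A \text{ non-complete}\}$. Then N is contained in a set of $\mu_0$-measure zero, where $\mu_0$ is the induced Haar measure on $\overline{\mathcal{A}}$ [2,6,10].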

Since N is gauge invariant, we have

Corollary 5.3. Let $[N] := \{[A] \in \overline{\mathcal{A}}/\overline{\mathcal{G}} \mid A \text{ non-complete}\}$. Then [N] is contained in a set of $\mu_0$-measure zero.

For the proof of the proposition we still need the following

Lemma 5.4. Let $U \subseteq G$ be measurable with $\mu_{\mathrm{Haar}}(U) > 0$ and $N_U := \{A \in \overline{\mathcal{A}} \mid H_A \subseteq G \setminus U\}$. Then $N_U$ is contained in a set of $\mu_0$-measure zero.

Proof.
• Let $k \in \mathbb{N}$ and $\Gamma_k$ be some connected graph with one vertex m and k edges $\alpha_1, \ldots, \alpha_k \in \mathcal{HG}$. Such a graph does indeed exist for $\dim M \geq 2$. For instance, take k circles $K_i$ with centers in $(\tfrac{1}{i}, 0, \ldots)$ and radii $\tfrac{1}{i}$. By means of an appropriate chart mapping around m these circles define a graph with the desired properties. Furthermore, let
$$\pi_k : \overline{\mathcal{A}} \longrightarrow G^k, \qquad A \longmapsto (h_A(\alpha_1), \ldots, h_A(\alpha_k)).$$
• Denote now by $N_{k,U} := \pi_k^{-1}((G \setminus U)^k)$ the set of all connections whose holonomies on $\Gamma_k$ are not contained in U. Per construction we have $N_U \subseteq N_{k,U}$.
• Since the characteristic function $\chi_{N_{k,U}}$ for $N_{k,U}$ is obviously a cylindrical function, we get
$$\mu_0(N_{k,U}) = \int_{\overline{\mathcal{A}}} \chi_{N_{k,U}}\, d\mu_0 = \int_{\overline{\mathcal{A}}} \pi_k^*(\chi_{(G\setminus U)^k})\, d\mu_0 = \int_{G^k} \chi_{(G\setminus U)^k}\, d\mu_{\mathrm{Haar}}^k = [\mu_{\mathrm{Haar}}(G \setminus U)]^k.$$
• From $N_U \subseteq N_{k,U}$ for all k follows $N_U \subseteq \bigcap_k N_{k,U}$. But, $\mu_0(\bigcap_k N_{k,U}) \leq \mu_0(N_{k,U}) = \mu_{\mathrm{Haar}}(G \setminus U)^k$ for all k, i.e. $\mu_0(\bigcap_k N_{k,U}) = 0$, because $\mu_{\mathrm{Haar}}(G \setminus U) = 1 - \mu_{\mathrm{Haar}}(U) < 1$.

Proof of Proposition 5.2.
• Let $(\varepsilon_k)_{k\in\mathbb{N}}$ be some null sequence. Furthermore, let $\{U_{k,i}\}_i$ be for each k a finite covering of G by open sets $U_{k,i}$ whose respective diameters are smaller than $\varepsilon_k$. Now define $N' := \bigcup_k \bigcup_i N_{U_{k,i}}$.
• Since $U_{k,i}$ is open and G is compact, $U_{k,i}$ is measurable with $\mu_{\mathrm{Haar}}(U_{k,i}) > 0$. Due to Lemma 5.4 we have $N_{U_{k,i}} \subseteq N^*_{U_{k,i}}$ with $\mu_0(N^*_{U_{k,i}}) = 0$ for all k, i; thus $N' \subseteq N^* := \bigcup_k \bigcup_i N^*_{U_{k,i}}$ with $\mu_0(N^*) = 0$.
• We are left to show $N \subseteq N'$. Let $A \in N$. Then there is an open $U \subseteq G$ with $H_A \subseteq G \setminus U$. Now let $m \in U$. Then $\varepsilon := \mathrm{dist}(m, \partial U) > 0$. Choose k such that $\varepsilon_k < \varepsilon$. Then choose a $U_{k,i}$ with $m \in U_{k,i}$. We get for all $x \in U_{k,i}$: $d(x, m) \leq \mathrm{diam}\, U_{k,i} < \varepsilon_k < \varepsilon$, i.e. $x \in U$. Consequently, $U_{k,i} \subseteq U$ and thus $H_A \subseteq G \setminus U_{k,i}$, i.e. $A \in N'$.

Corollary 5.5. The set of all generic connections (i.e. connections of maximal type) has $\mu_0$-measure 1.

Proof. Every almost complete connection A has type $[Z(H_A)] = [Z(G)] = t_{\max}$. (Observe that the centralizer of a set $U \subseteq G$ equals that of the closure $\overline{U}$.) Since $\overline{\mathcal{A}}_{=t_{\max}}$ is open due to Proposition 4.8, thus measurable, Proposition 5.2 yields the assertion.
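The key estimate in Lemma 5.4 is that the probability of k independent Haar-distributed holonomies all missing U equals $\mu_{\mathrm{Haar}}(G \setminus U)^k$, which tends to zero. A finite analogue can be computed exactly. The sketch below assumes a finite group with the uniform measure in place of the Haar measure, here the cyclic group of order 6 represented as `range(6)`. The function name `avoid_probability` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def avoid_probability(group, U, k):
    """Exact probability that k independent uniform draws from group all miss U.

    Finite stand-in for mu_0(N_{k,U}): counts the k-tuples in (G \\ U)^k.
    """
    elements = list(group)
    misses = sum(1 for tup in product(elements, repeat=k)
                 if all(g not in U for g in tup))
    return Fraction(misses, len(elements) ** k)

# Stand-in for G (assumption): Z_6 with uniform measure, U = {0, 1}.
G = range(6)
U = {0, 1}
probabilities = [avoid_probability(G, U, k) for k in range(1, 5)]
```

The values form the geometric sequence (2/3)^k, so the intersection over all k has measure zero, mirroring the last step of the proof of Lemma 5.4.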

The last assertion is very important: It justifies the definition of the natural induced Haar measure on $\overline{\mathcal{A}}/\overline{\mathcal{G}}$ (cf. [2, 10]). Actually, there were (at least) two different possibilities for this. Namely, let X be some general topological space equipped with a measure $\mu$ and let G be some topological group acting on X. The problem now is to find a natural measure $\mu_G$ on the orbit space X/G. On the one hand, one could simply define $\mu_G(U) := \mu(\pi^{-1}(U))$ for all measurable $U \subseteq X/G$. ($\pi : X \longrightarrow X/G$ is the canonical projection.) But, on the other hand, one also could stratify the orbit space. For instance, in the easiest case we could have $X = X/G \times G$. In general, one gets (roughly speaking) $X = \bigcup_V \bigl(V/G \times G_V\backslash G\bigr)$, where the sets V form an appropriate disjoint decomposition of X and $G_V$ characterizes the type of the orbits on V. Now one naively defines
$$\mu_G(U) := \sum_V \frac{\mu(\pi^{-1}(U) \cap V)}{\mu_{G,V}(G/G_V)} := \sum_V \mu\bigl(\pi^{-1}(U) \cap V\bigr)\, \mu_V(G_V),$$
where $\mu_V$ measures the “size” of the stabilizer $G_V$ in G. This second variant is nothing but the transformation of the measures using the Faddeev-Popov determinant (i.e. the Jacobi determinant) $\frac{d\mu}{d\mu_G}$. In contrast to the first method, here the orbit space and not the total space is regarded to be primary. For a uniform distribution of the measure over all points of the total space the image measure on the orbit space need no longer be uniformly distributed; the orbits are weighted by size. But, for the second method the uniformity is maintained. In other words, the gauge freedom does not play any rôle when the Faddeev-Popov method is used. Nevertheless, we see in our concrete case of $\pi_{\overline{\mathcal{A}}/\overline{\mathcal{G}}} : \overline{\mathcal{A}} \longrightarrow \overline{\mathcal{A}}/\overline{\mathcal{G}}$ that both methods are equivalent because the Faddeev-Popov determinant is equal to 1 (at least outside a set of $\mu_0$-measure zero). This follows immediately from the slice theorem and the corollary above that the generic connections have total measure 1.

6. Discussion

In the present paper we gained a lot of information about the structure of the generalized gauge orbit space within the Ashtekar framework. The most important tool was the theory of compact transformation groups on topological spaces. This enabled us to investigate the action of the group of generalized gauge transforms on the space of generalized connections. Our considerations were guided by the results of Kondracki and Rogulski [14] about the structure of the classical gauge orbit space for Sobolev connections. The methods used there are however fundamentally different from ours. Within the Ashtekar approach most of the proofs are purely algebraic or topological; in the classical case the methods are especially based on the theory of fiber bundles, i.e. analysis and differential geometry.

In Sect. 3 we proved that the $\overline{\mathcal{G}}$-stabilizer B(A) of a connection A is isomorphic to the G-centralizer $Z(H_A)$ of the holonomy group of A. Furthermore, two connections have conjugate $\overline{\mathcal{G}}$-stabilizers if and only if their holonomy centralizers are conjugate. Thus, the type of a generalized connection can be defined equivalently both by the $\overline{\mathcal{G}}$-conjugacy class of B(A) (as known from the general theory of transformation groups) and by the G-conjugacy class of $Z(H_A)$. This is a significant difference from the classical case.

The reduction of our problem from structures in $\overline{\mathcal{G}}$ to those in G was the crucial idea in Sect. 4. Since centralizers in compact groups are already determined by a finite number of elements, we could model the gauge orbit type $[Z(H_A)]$ on a finite-dimensional space. Using an appropriate mapping we lifted the corresponding slice theorem to a slice theorem on $\overline{\mathcal{A}}$. This is the main result of our paper. Collecting connections of one and the same type we got the so-called strata whose openness was an immediate


consequence of the slice theorem. In the next step we showed that the natural ordering on the set of the types encodes the topological properties of the strata. More precisely, we proved that the closure of a stratum contains (besides the stratum itself) exactly the union of all strata having a smaller type. This implied that this decomposition of A is a topologically regular stratification. All these results hold in the classical case as well. This is very remarkable because our proofs used, in part, completely different ideas. However, two results of this paper go beyond the classical theorems. First, we were able to determine the full set of all gauge orbit types occurring in A. This set is known for Sobolev connections – to the best of our knowledge – only for certain bundles. Recently, Rudolph, Schmidt and Volobuev solved this problem completely for SU(n)-bundles P over two-, three- and four-dimensional manifolds [21]. The main problem in the Sobolev case is the non-triviality of the bundle P. This can exclude orbit types that occur in the trivial bundle M × SU(n). But, this problem is irrelevant for the Ashtekar framework: every regular connection in every G-bundle over M is contained in A [2]. This means, in a certain sense, we only have to deal with trivial bundles. Second, in the Ashtekar framework there is a well-defined natural measure on A. Using this we could show that the generic stratum has total measure one; this is not true in the classical case. The proposition above implies now that the Faddeev-Popov determinant for the transformation from A to A/G is equal to 1. This, on the other hand, justifies the definition of the induced Haar measure on A/G by projecting the corresponding measure for A, which has been discussed in detail in Sect. 5. Hence, we were able to "transfer" the classical theory of strata in a certain sense (almost) completely to the Ashtekar program.
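The first construction recalled in Sect. 5, the image measure $\mu_G(U) := \mu(\pi^{-1}(U))$, can be illustrated in a finite toy model. The following sketch is only an illustration under stated assumptions: the set X (a 3×3 grid of points), the group (rotations by multiples of 90°) and all function names are hypothetical choices made here and are not taken from the paper. The finite setting ignores every topological and measure-theoretic subtlety of the generalized orbit space.

```python
from fractions import Fraction

def rotate(p):
    """Rotate a grid point by 90 degrees about the origin (hypothetical toy action)."""
    x, y = p
    return (-y, x)

def orbit(p):
    """Return the orbit of p under the cyclic group generated by `rotate`."""
    points = {p}
    q = rotate(p)
    while q != p:
        points.add(q)
        q = rotate(q)
    return frozenset(points)

def pushforward(mu):
    """Image measure on the orbit space: mu_G({orbit}) = mu(pi^{-1}({orbit}))."""
    mu_G = {}
    for p, weight in mu.items():
        o = orbit(p)
        mu_G[o] = mu_G.get(o, Fraction(0)) + weight
    return mu_G

# Total space: a 3x3 grid carrying the uniform probability measure.
X = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
mu = {p: Fraction(1, len(X)) for p in X}

mu_G = pushforward(mu)

# Orbits with trivial stabilizer (four points) play the role of the generic stratum.
generic_measure = sum(w for o, w in mu_G.items() if len(o) == 4)
```

In this toy model the image measure weights orbits by size: each four-point orbit carries 4/9 and the fixed point carries 1/9. The generic orbits carry 8/9 rather than total measure one, which can happen here only because the non-generic stratum is not a null set.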
We emphasize that all assertions are valid for each compact structure group – both in the analytical and in the $C^r$-smooth case.

What could be next steps in this area? An important – and in this paper completely ignored – item is the physical interpretation of the gained knowledge. So we will conclude our paper with a few ideas that could link mathematics and physics:

• Topology
What is the topological structure of the strata? Are they connected or is A connected itself (at least for connected G)? Is $A_{=t}$ globally trivial over $(A/G)_{=t}$, at least for the generic stratum with $t = t_{\max}$? What sections do exist in these bundles, i.e. what gauge fixings do exist in A? These problems are closely related to the so-called Gribov problem, the non-existence of global gauge fixings for classical connections in principal fiber bundles with compact, non-commutative structure group (see, e.g., [22]). From this lots of difficulties result for the quantization of such a Yang–Mills theory that are not circumvented up to now.

• Algebraic topology
Is there a meaningful, i.e. especially non-trivial cohomology theory on A? First abstract attempts can be found, e.g., in [4, 3]. Is it possible to construct in this way characteristic classes or even topological invariants?

• Measure theory
How are arbitrary measures distributed over single strata? In other words: What properties do measures have that are defined by the choice of a measure on each single stratum? This is extremely interesting, in particular, from the physical point of view because the choice of a $\mu_0$-absolutely continuous measure µ on A corresponds to the choice of an action functional S on A by $\int_A f\, d\mu = \int_A f\, e^{-S}\, d\mu_0$. According to Lebesgue's decomposition theorem all measures whose support is not fully contained in the generic stratum have singular parts.

Finally, we have to stress that the present paper only investigates the case of pure gauge theories. Of course, this is physically not satisfying. Therefore the next goal should be the inclusion of matter fields. A first step has already been done by Thiemann [23] whereas the aspects considered in the present paper did not play any rôle in Thiemann's paper.

Acknowledgements. I am very grateful to Gerd Rudolph and Eberhard Zeidler for their great support while I wrote my diploma thesis and the present paper. Additionally, I thank Gerd Rudolph for reading the drafts. Moreover, I am grateful to Domenico Giulini and Matthias Schmidt for convincing me to hope for the existence of a slice theorem on A. I thank Jerzy Lewandowski for asking me how the notion of webs is related to the notion of paths in the present paper. Finally, I thank the Max-Planck-Institut für Mathematik in den Naturwissenschaften for its generous promotion.

References

1. Ashtekar, A. and Isham, C. J.: Representations of the holonomy algebras of gravity and nonabelian gauge theories. Class. Quant. Grav. 9, 1433–1468 (1992)
2. Ashtekar, A. and Lewandowski, J.: Representation theory of analytic holonomy C*-algebras. In: Baez, J. C. (ed.) Knots and Quantum Gravity. Oxford Lecture Series in Mathematics and its Applications, Oxford: Oxford University Press, 1994, pp. 21–61
3. Ashtekar, A. and Lewandowski, J.: Differential geometry on the space of connections via graphs and projective limits. J. Geom. Phys. 17, 191–230 (1995)
4. Ashtekar, A. and Lewandowski, J.: Projective techniques and functional integration for gauge theories. J. Math. Phys. 36, 2170–2191 (1995)
5. Ashtekar, A., Lewandowski, J., Marolf, D., Mourão, J. and Thiemann, Th.: SU(N) quantum Yang–Mills theory in two dimensions: A complete solution. J. Math. Phys. 38, 5453–5482 (1997)
6. Baez, J. C. and Sawin, S.: Functional integration on spaces of connections. J. Funct. Anal. 150, 1–26 (1997)
7. Baez, J. C. and Sawin, S.: Diffeomorphism-invariant spin network states. J. Funct. Anal. 158, 253–266 (1998)
8. Bredon, G. E.: Introduction to Compact Transformation Groups. New York: Academic Press, Inc., 1972
9. Burbaki, N.: Gruppy i algebry Li, Gl. IX (Kompaktnye veshchestvennye gruppy Li) [Bourbaki, N.: Lie Groups and Lie Algebras, Ch. IX (Compact Real Lie Groups)]. Moskva: Izdatelstvo «Mir», 1986
10. Fleischhack, Ch.: Hyphs and the Ashtekar-Lewandowski Measure. MIS-Preprint 3/2000. To appear in J. Geom. Phys.
11. Fleischhack, Ch.: A new type of loop independence and SU(N) quantum Yang–Mills theory in two dimensions. J. Math. Phys. 41, 76–102 (2000)
12. Giles, R.: Reconstruction of gauge potentials from Wilson loops. Phys. Rev. D24, 2160–2168 (1981)
13. Kelley, J. L.: General Topology. Toronto, New York, London: D. van Nostrand Company, Inc., 1955
14. Kondracki, W. and Rogulski, J.: On the stratification of the orbit space for the action of automorphisms on connections (Dissertationes mathematicae 250). Warszawa: 1985
15. Kondracki, W. and Sadowski, P.: Geometric structure on the orbit space of gauge connections. J. Geom. Phys. 3, 421–434 (1986)
16. Lewandowski, J.: Group of loops, holonomy maps, path bundle and path connection. Class. Quant. Grav. 10, 879–904 (1993)
17. Lewandowski, J. and Thiemann, Th.: Diffeomorphism invariant quantum field theories of connections in terms of webs. Class. Quant. Grav. 16, 2299–2322 (1999)
18. Marolf, D. and Mourão, J.: On the support of the Ashtekar-Lewandowski measure. Commun. Math. Phys. 170, 583–606 (1995)
19. Mitter, P. K.: Geometry of the space of gauge orbits and the Yang-Mills dynamical system. Lectures given at Cargèse Summer Inst. on Recent Developments in Gauge Theories, Cargèse, 1979
20. Rendall, A.: Comment on a paper of Ashtekar and Isham. Class. Quant. Grav. 10, 605–608 (1993)
21. Rudolph, G., Schmidt, M. and Volobuev, I.: Classification of gauge orbit types for SU(n) gauge theories. math-ph/0003044
22. Singer, I. M.: Some remarks on the Gribov ambiguity. Commun. Math. Phys. 60, 7–12 (1978)
23. Thiemann, Th.: Kinematical Hilbert spaces for Fermionic and Higgs quantum field theories. Class. Quant. Grav. 15, 1487–1512 (1998)

Communicated by H. Nicolai

Commun. Math. Phys. 214, 651 – 677 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

On Action-Angle Variables for the Second Poisson Bracket of KdV

T. Kappeler, M. Makarov

Institut für Mathematik, Universität Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland. E-mail: [email protected]

Received: 12 November 1999 / Accepted: 9 May 2000

Abstract: We prove that on the Sobolev spaces $H^N(S^1)$ ($N \ge 0$), each leaf of the foliation induced by the second Poisson bracket of KdV admits global action-angle variables. The actions with respect to the first bracket raise to the actions with respect to the second bracket. The angles for the first bracket are, at the same time, angles for the second bracket.

0. Introduction

Consider the Korteweg–de Vries equation (KdV) with periodic boundary conditions,

$$\partial_t u = -\partial_x^3 u + 6u\,\partial_x u; \qquad u(x+1,t) = u(x,t) \quad (x,t \in \mathbb{R}).$$

It is well known that this equation can be viewed as a bihamiltonian system (cf. e.g. [GZ, Ma, Mc]): i.e. there exist two Poisson structures, $\partial_x$, referred to as the first Poisson structure, and

$$L_q := -\frac{1}{2}\partial_x^3 + q\,\partial_x + \partial_x q, \tag{0.1}$$

referred to as the second Poisson structure, with the following two properties:

(BH1) $\partial_x$ and $L_q$ are compatible, i.e. $L_q + \partial_x$ is a Poisson structure;
(BH2) KdV is a Hamiltonian system with respect to $\partial_x$ and $L_q$.

Indeed,

$$\partial_t u = \partial_x \frac{\partial H_1}{\partial q(x)}(u), \tag{0.2}$$

where the Hamiltonian $H_1$ is given by $H_1(q) := \int_0^1 \bigl( \frac{1}{2}(\partial_x q)^2 + q^3 \bigr)\,dx$, and

$$\partial_t u = L_q \frac{\partial H_2}{\partial q(x)}(u), \tag{0.3}$$

where $H_2(q) := \frac{1}{2}\int_0^1 q(x)^2\,dx$.

Both Poisson structures are degenerate and induce symplectic foliations. However, the second Poisson structure is not constant. Nevertheless the second Poisson structure is, in many respects, the more natural one of the two (cf. e.g. [GZ, KZ]). For the first Poisson structure, the average $[q] := \int_0^1 q(x)\,dx$ is a Casimir and the symplectic leaves of the induced foliation on the Sobolev spaces $(H^N(S^1))_{N \ge 0}$ are given by the affine spaces $H_c^N(S^1) := \{q \in H^N(S^1) \mid [q] = c\}$. ($S^1$ denotes the unit circle.) It has been proved in [KM2] (cf. also [BBGK] and [BKM1]) that each symplectic leaf admits global action-angle coordinates.

The aim of this paper is to prove that each leaf of the symplectic foliation induced by the second Poisson structure admits global action-angle variables as well. To the best of our knowledge, this is the first result of its kind, providing action-angle coordinates for a completely integrable system of infinite dimension with a non-constant Poisson structure.

To state our results more precisely, we have first to describe the symplectic foliation of the second Poisson structure. For $q \in H^N(S^1)$, consider the Schrödinger equation

$$-y'' + qy = \lambda y. \tag{0.4}$$

Denote by $y_1(x,\lambda,q)$ and $y_2(x,\lambda,q)$ the fundamental solutions of (0.4), which are elements in $H^{N+2}_{loc}(\mathbb{R})$, and by $\Delta(\lambda,q)$ the discriminant

$$\Delta(\lambda,q) := y_1(1,\lambda,q) + y_2'(1,\lambda,q). \tag{0.5}$$
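The discriminant (0.5) can be approximated numerically by integrating (0.4) over one period. The following is a minimal sketch, assuming the usual normalization $y_1(0) = 1$, $y_1'(0) = 0$, $y_2(0) = 0$, $y_2'(0) = 1$ of the fundamental solutions (this normalization is standard but is an assumption here), a potential supplied as a Python callable, and a classical fourth-order Runge–Kutta scheme. The function names are hypothetical and the scheme is chosen only for simplicity.

```python
import math

def fundamental_solutions(q, lam, steps=2000):
    """Integrate -y'' + q(x) y = lam * y on [0, 1] with classical RK4.

    Returns (y1(1), y1'(1), y2(1), y2'(1)), assuming the normalization
    y1(0) = 1, y1'(0) = 0 and y2(0) = 0, y2'(0) = 1.
    """
    h = 1.0 / steps

    def rhs(x, y, v):
        # First-order system: y' = v, v' = (q(x) - lam) * y
        return v, (q(x) - lam) * y

    def integrate(y, v):
        x = 0.0
        for _ in range(steps):
            k1y, k1v = rhs(x, y, v)
            k2y, k2v = rhs(x + h / 2, y + h / 2 * k1y, v + h / 2 * k1v)
            k3y, k3v = rhs(x + h / 2, y + h / 2 * k2y, v + h / 2 * k2v)
            k4y, k4v = rhs(x + h, y + h * k3y, v + h * k3v)
            y += h / 6 * (k1y + 2 * k2y + 2 * k3y + k4y)
            v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
            x += h
        return y, v

    y1, dy1 = integrate(1.0, 0.0)
    y2, dy2 = integrate(0.0, 1.0)
    return y1, dy1, y2, dy2

def discriminant(q, lam, steps=2000):
    """Approximate Delta(lam, q) = y1(1, lam, q) + y2'(1, lam, q)."""
    y1, _, _, dy2 = fundamental_solutions(q, lam, steps)
    return y1 + dy2
```

For the zero potential one has $\Delta(\lambda, 0) = 2\cos\sqrt{\lambda}$, so `discriminant(lambda x: 0.0, 0.0)` should be close to 2 and `discriminant(lambda x: 0.0, math.pi ** 2)` close to −2. For the constant potential $q \equiv 1$ the value at $\lambda = 0$ is $2\cosh 1 > 2$.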

It turns out that $\Delta(0,q)$ is a Casimir for the second Poisson structure whose level sets describe the regular leaves on the Sobolev spaces $(H^N(S^1))_{N \ge 0}$. However, the foliation induced by the second Poisson structure is not regular and admits singular leaves. To describe them, denote by spec(q) the spectrum $\lambda_0(q) < \lambda_1(q) \le \lambda_2(q) < \ldots$ of the operator $-\frac{d^2}{dx^2} + q$ when considered with periodic boundary conditions on the interval [0,2] and let $\mathrm{spec}_d(q) \subseteq \mathrm{spec}(q)$ denote the subset of double eigenvalues. Further introduce, for $N \ge 0$, the following level sets: For $n \ge 0$,

$$L^N_{c,n} := \{q \in H^N(S^1) \mid \lambda_{n-1}(q) < 0 < \lambda_n(q);\ \Delta(0,q) = c\} \tag{0.6}$$

(with the convention $\lambda_{-1} = -\infty$). For $n = 2k \ge 2$,

$$I^N_n := \{q \in H^N(S^1) \mid \lambda_{n-1}(q) = \lambda_n(q) = 0\}; \tag{0.7}$$

further $J^N_0 := \{q \in H^N(S^1) \mid \lambda_0(q) = 0\}$ and for $n = 2k \ge 2$,

$$J^N_{\pm,n} := \{q \in H^N(S^1) \mid \lambda_{n - \frac{1}{2} \pm \frac{1}{2}}(q) = 0;\ 0 \notin \mathrm{spec}_d(q)\}. \tag{0.8}$$

To make notation shorter, we set $L_{c,n} := L^0_{c,n}$, $I_n := I^0_n$, $J_0 := J^0_0$, and $J_{\pm,n} := J^0_{\pm,n}$.

Notice that, for $N \ge 0$, $L^N_{c,n} = \emptyset$ for $|c| < 2$ and n even, or $c > 2$ and $n \ne 4k$, or $c < -2$ and $n \ne 4k+2$, and that $L^N_{(-1)^k 2,\,2k} = I^N_{2k} \sqcup J^N_{+,2k} \sqcup J^N_{-,2k}$ (disjoint union). In the sequel, the pair (c, n) will always be assumed to satisfy one of the following three conditions ($k \ge 0$)

(i) $|c| < 2$, $n = 2k+1$;
(ii) $c > 2$, $n = 4k$;
(iii) $c < -2$, $n = 4k+2$,

as in all other cases $L^N_{c,n} = \emptyset$. The following result was established in [KM1].

Theorem 1. For $N \ge 0$, $L^N_{c,n}$ ($|c| \ne 2$, $n \ge 0$), $J^N_{\pm,n}$ ($n \ge 2$, even), and $J^N_0$ are connected, real analytic submanifolds of $H^N(S^1)$ of codimension 1 and are the regular symplectic leaves of the foliation of the second Poisson structure, whereas $I^N_n$ ($n \ge 2$, even) are connected, real analytic submanifolds of $H^N(S^1)$ of codimension 3 and are the singular symplectic leaves.

To describe our results in a convenient form, we introduce the following

Definition 1. We say that a real analytic submanifold M of a Hilbert space of sequences with elements $(v_j, w_j)_{j \in A}$ ($A \subset \mathbb{N}$), endowed with the canonical symplectic structure $\sum_j dv_j \wedge dw_j$, is an action-angle model space for a leaf L of the symplectic foliation of a Poisson structure if there exists a symplectomorphism $\Phi : L \to M$ which, together with its inverse, is real analytic so that, for any $j \in A$, the elements $(v_j, w_j)$ are either action-angle variables or Birkhoff variables (i.e. the associated symplectic polar coordinates are action-angle coordinates). We call $\Phi$ an analytic action-angle diffeomorphism.

Introduce, for $A \subseteq \mathbb{N}$,

$$h^m(A; \mathbb{R}^2) := \Bigl\{ z = (x_j, y_j)_{j \in A} \Bigm| \|z\|_m^2 := \sum_{j \in A} j^{2m}(x_j^2 + y_j^2) < \infty \Bigr\},$$

the cylinder $Z := \mathbb{R} \times S^1$, and the half cylinders $Z^+ := \mathbb{R}_{>0} \times S^1$ and $Z^- := \mathbb{R}_{<0} \times S^1$, all endowed with the canonical symplectic structure. In this paper we prove the following

Theorem 2. Let $N \ge 0$ be given.

(A) For $|c| < 2$ and n odd, or $c > 2$ and $n = 0$, $h^{N+\frac{3}{2}}(\mathbb{N}; \mathbb{R}^2)$ is an action-angle model space for $L^N_{c,n}$.
(B) For $|c| > 2$ and $n = 2k \ge 2$ even, $Z \times h^{N+\frac{3}{2}}(\mathbb{N} \setminus \{k\}; \mathbb{R}^2)$ is an action-angle model space for $L^N_{c,n}$.
(C) For $n = 2k \ge 2$, $Z^{\mp} \times h^{N+\frac{3}{2}}(\mathbb{N} \setminus \{k\}; \mathbb{R}^2)$ is an action-angle model space for $J^N_{\pm,n}$.
(D) For $n = 2k \ge 2$, $h^{N+\frac{3}{2}}(\mathbb{N} \setminus \{k\}; \mathbb{R}^2)$ is an action-angle model space for the singular leaf $I^N_n$.
(E) $h^{N+\frac{3}{2}}(\mathbb{N}; \mathbb{R}^2)$ is an action-angle model space for $J^N_0$.

To shorten notation we denote by $L^N_*$ an arbitrary leaf of the foliation and by $M^N_*$ its corresponding action-angle model space, as given by Theorem 2.

Remark 1. It follows from statement (B) that for $|c| > 2$ and n even, $L_{c,n}$ has nontrivial homology. Its fundamental group is given by $\mathbb{Z}$. Despite this fact, $L_{c,n}$ admits global action-angle variables (cf. [Du] for a discussion of the existence of global action-angle coordinates in the case of integrable Hamiltonian systems of finite dimension).

Remark 2. The proof of Theorem 2 is presented for the leaves $L_{c,n}$ with $|c| \ne 2$ and $n \ge 1$. The remaining leaves are treated in a similar fashion.

To show Theorem 2 we use the Birkhoff coordinates $\bigl(\sqrt{2I_n^{(1)}}\,(\cos\theta_n, \sin\theta_n)\bigr)_{n \ge 1}$ on $H_0^N(S^1)$ ($N \ge 0$) for the first Poisson structure $\partial_x$ (cf. [KM2]) as a starting point, where $(I_n^{(1)}, \theta_n)_{n \ge 1}$ denote action-angle variables with respect to $\partial_x$. Similarly as for $I_n^{(1)}$ ($n \ge 1$), the actions $I_n^{(2)}$ with respect to the second Poisson structure can be represented by periods of a holomorphic differential on the hyperelliptic surface $\mu = \sqrt{\Delta(\lambda,q)^2 - 4}$ associated to the spectrum spec(q) of the operator $-\frac{d^2}{dx^2} + q$ (cf. [FM, NV]). However, if $\lambda_{2n-1} \le 0 \le \lambda_{2n}$, the period integral for $I_n^{(2)}$ is not well defined and needs to be regularized. As a result, in the case where $\lambda_{2n-1} < 0 < \lambda_{2n}$, $I_n^{(2)}$ can take all real values. This phenomenon accounts for the cylinder Z appearing in the action-angle model space in part (B) of Theorem 2. The formulas for the actions $I_n^{(1)}$ and $I_n^{(2)}$ extend to the whole phase space $L^2(S^1)$. Moreover, the bihamiltonian structure relates the actions $I_n^{(1)}$ and $I_n^{(2)}$,

$$L_q \frac{\partial I_n^{(2)}}{\partial q(x)} = \frac{d}{dx} \frac{\partial I_n^{(1)}}{\partial q(x)},$$

i.e. $I_n^{(1)}$ can be raised to $I_n^{(2)}$.

In a first step of the proof of Theorem 2 (cf. Sect. 4) it is shown that, on any leaf $L^N_{c,n}$ with $|c| < 2$ and $N \ge 0$ (with corresponding model space denoted by $M_*$), one obtains a real analytic diffeomorphism $\Phi_*^{(2)} : L_* \to M_*$ when the actions $I_n^{(1)}$ in

$$\Phi^{(1)}(q) = \Bigl(\sqrt{2I_n^{(1)}}\,(\cos\theta_n, \sin\theta_n)\Bigr)_{n \ge 1}$$

are replaced by $I_n^{(2)}$ ($n \ge 1$). Similar results hold for all other leaves, with some extra care to be taken for the leaves with topology. In Sect. 3 we prove that the angles $(\theta_n)_{n \ge 1}$ in the definition of $\Phi^{(1)}$ are, at the same time, conjugate, with respect to the second Poisson structure, to the actions $(I_n^{(2)})_{n \ge 1}$. To establish this result, we have been inspired by computations due to McKean–Vaninsky [MV] who proved canonical relations for the defocusing nonlinear Schrödinger equation (NLS).

In future work we plan to extend the results of Theorem 2 to more general phase spaces, including in particular quasiperiodic potentials.

1. Action Variables of the Second Poisson Structure

1.1. Action variables for KdV. Based on Arnold's formula for the action variables of a finite dimensional integrable system, Flaschka and McLaughlin [FM] introduced variables $I_n^{(1)}$ which can be shown to be action variables for KdV with respect to the first Poisson structure ($n \ge 1$, $q \in L^2(S^1)$)

$$I_n^{(1)}(q) = \frac{1}{\pi} \int_{\Gamma_n} \operatorname{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu,q)}{2}\Bigr)\,d\mu, \tag{1.1}$$

where

$$\operatorname{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu,q)}{2}\Bigr) := \log\Bigl(\frac{(-1)^n}{2}\Delta(\mu) - \frac{(-1)^n}{2}\sqrt{\Delta(\mu)^2 - 4}\Bigr). \tag{1.2}$$

Here $\Gamma_n$ ($n \ge 1$) is a counterclockwise oriented circuit around the nth gap $(\lambda_{2n-1}, \lambda_{2n})$, the root $\sqrt{\Delta(\mu)^2 - 4}$ is the continuous branch (in q and λ) defined on $\mathbb{C} \setminus \bigcup_{n \ge 0} [\lambda_{2n-1}, \lambda_{2n}]$ and normalized by $i\sqrt{\Delta(\mu,q)^2 - 4} > 0$ for $q = 0$, $\mu = 1$. It is convenient to choose the principal branch of the logarithm in (1.2). For this purpose we assume that $\Gamma_n$ is chosen sufficiently close around the gap $(\lambda_{2n-1}, \lambda_{2n})$ so that it is contained in the domain of the principal branch of the logarithm. In addition we may assure that $\operatorname{Re}\bigl(\frac{(-1)^n \Delta(\mu)}{2}\bigr) \ge \frac{1}{2}$ for µ inside $\Gamma_n$.

By the same approach as in [FM] one can define variables $\tilde{I}_n^{(2)}$ with respect to the second Poisson structure (cf. [NV]),

$$\tilde{I}_n^{(2)} := \frac{1}{2\pi} \int_{\Gamma_n} \frac{1}{\mu} \operatorname{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu,q)}{2}\Bigr)\,d\mu,$$

where we assume for the moment that, for any $n \ge 1$, zero is not on $\Gamma_n$. However the above definition depends on the choice of $\Gamma_n$. More precisely, the integral depends on whether 0 is inside of $\Gamma_n$ or not. To remove this dependence we slightly change the above definition by adding the Casimir $F_n(q) := -\frac{1}{2\pi} \int_{\Gamma_n} \frac{1}{\mu} \operatorname{arccosh}\bigl(\frac{(-1)^n \Delta(-i0)}{2}\bigr)\,d\mu$ to $\tilde{I}_n^{(2)}$, which leads to

$$I_n^{(2)} := \frac{1}{2\pi} \int_{\Gamma_n} \frac{1}{\mu} \Bigl[\operatorname{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu)}{2}\Bigr) - \operatorname{arccosh}\Bigl(\frac{(-1)^n \Delta(-i0)}{2}\Bigr)\Bigr]\,d\mu, \tag{1.3}$$

where, as usual, $\operatorname{arccosh}\bigl(\frac{(-1)^n \Delta(-i0)}{2}\bigr) = \lim_{\varepsilon \to 0+} \operatorname{arccosh}\bigl(\frac{(-1)^n \Delta(-i\varepsilon)}{2}\bigr)$. Notice that, due to Cauchy's theorem, the Casimir $F_n(q)$ vanishes if 0 is not inside $\Gamma_n$. Further the right side of (1.3) is well defined even in the case where 0 is on the circuit $\Gamma_n$, as the function $\operatorname{arccosh}\bigl(\frac{(-1)^n \Delta(\mu)}{2}\bigr)$ is differentiable in µ on $\Gamma_n$.

The fact that $F_n(q) = 0$ if 0 is not inside $\Gamma_n$ can be used to verify that indeed $I_n^{(2)}(q)$ does not depend on the choice of the circuit $\Gamma_n$, as long as $\Gamma_n$ is close to $[\lambda_{2n-1}, \lambda_{2n}]$. Hence, in the case where $0 \notin [\lambda_{2n-1}, \lambda_{2n}]$, (1.3) leads to

$$I_n^{(2)} = \frac{1}{\pi} \int_{\lambda_{2n-1}}^{\lambda_{2n}} \frac{1}{\mu} \operatorname{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu - i0)}{2}\Bigr)\,d\mu, \tag{1.4}$$

whereas in the case $0 \in (\lambda_{2n-1}, \lambda_{2n})$ one gets

$$I_n^{(2)} = \frac{1}{\pi} \int_{\lambda_{2n-1}}^{\lambda_{2n}} \frac{1}{\mu} \Bigl[\operatorname{arccosh}\Bigl(\frac{(-1)^n \Delta(\mu - i0)}{2}\Bigr) - \operatorname{arccosh}\Bigl(\frac{(-1)^n \Delta(-i0)}{2}\Bigr)\Bigr]\,d\mu. \tag{1.5}$$

Notice that for $0 \in \{\lambda_{2n}, \lambda_{2n-1}\}$, $\operatorname{arccosh}\bigl(\frac{(-1)^n \Delta(0)}{2}\bigr) = 0$. Nevertheless, the integral (1.5) remains convergent: in the case $\lambda_{2n} \ne \lambda_{2n-1}$, this holds as arccosh is Hölder continuous of order $\frac{1}{2}$ for $x \ge 1$. In the case $\lambda_{2n} = \lambda_{2n-1} = 0$, the convergence follows from the case $\lambda_{2n-1} = 0 < \lambda_{2n}$ by a limiting argument which shows that $I_n^{(2)} = 0$ in this case.

1.2. Raising of actions.

Proposition 2. For $q \in L^2(S^1)$ and $n \ge 1$,

$$L_q \frac{\partial I_n^{(2)}}{\partial q(x)} = \frac{d}{dx} \frac{\partial I_n^{(1)}}{\partial q(x)}.$$

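Remark. For a bihamiltonian system with Poisson structures $P_1$ and $P_2$, we say that a functional $F_1$, defined on the phase space, raises to a functional $F_2$ if $P_1\,dF_1 = P_2\,dF_2$. Proposition 2 states that $I_n^{(1)}$ raises to $I_n^{(2)}$ ($n \ge 1$). In fact, for any integrable, non degenerate, finite dimensional bihamiltonian system, actions with respect to the first Poisson structure raise to actions for the second Poisson structure.

Proof. Choose $\Gamma_n$ so that $0 \notin \Gamma_n$. Then from (1.3) one obtains

$$\frac{\partial I_n^{(2)}}{\partial q(x)} = -\frac{1}{2\pi} \int_{\Gamma_n} \frac{1}{\mu} \Bigl(\frac{1}{\sqrt{\Delta(\mu)^2 - 4}} \frac{\partial \Delta(\mu)}{\partial q(x)} - \frac{1}{\sqrt{\Delta(-i0)^2 - 4}} \frac{\partial \Delta(0)}{\partial q(x)}\Bigr)\,d\mu. \tag{1.6}$$

Observe that any product $y_i \cdot y_j$ of the two fundamental solutions $y_1 = y_1(x,\mu,q)$, $y_2 = y_2(x,\mu,q)$ of $-y'' + qy = \mu y$ is in the domain of $L_q$ and satisfies $L_q(y_i y_j) = 2\mu \frac{d}{dx}(y_i y_j)$. As $\frac{\partial \Delta(\mu)}{\partial q(x)}$ is a linear combination of $y_1(x,\mu)^2$, $y_1(x,\mu) y_2(x,\mu)$ and $y_2(x,\mu)^2$ we conclude that

$$L_q \frac{\partial \Delta(\mu)}{\partial q(x)} = 2\mu \frac{d}{dx} \frac{\partial \Delta(\mu)}{\partial q(x)}.$$

In particular it follows that $L_q \frac{\partial \Delta(0)}{\partial q(x)} = 0$ and in view of (1.6), this leads to

$$L_q \frac{\partial I_n^{(2)}}{\partial q(x)} = -\frac{d}{dx} \frac{1}{\pi} \int_{\Gamma_n} \frac{1}{\sqrt{\Delta(\mu)^2 - 4}} \frac{\partial \Delta(\mu)}{\partial q(x)}\,d\mu = \frac{d}{dx} \frac{\partial I_n^{(1)}}{\partial q(x)}.$$

Introduce the Poisson brackets $\{\cdot,\cdot\}_j$ ($j = 1, 2$) defined by

$$\{F,G\}_1 := \int_0^1 \frac{\partial F}{\partial q(x)} \frac{d}{dx} \frac{\partial G}{\partial q(x)}\,dx; \qquad \{F,G\}_2 := \int_0^1 \frac{\partial F}{\partial q(x)} L_q \frac{\partial G}{\partial q(x)}\,dx \tag{1.7}$$

for functionals F and G with sufficiently regular gradients.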
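The identity $L_q(y_i y_j) = 2\mu \frac{d}{dx}(y_i y_j)$ used in the proof of Proposition 2 can be checked by a direct computation. The following sketch assumes that q is differentiable, so that all derivatives below exist classically (an assumption made here only to keep the computation elementary), and writes $f := y_i y_j$ for two solutions of $-y'' + qy = \mu y$.

```latex
\begin{align*}
y_i'' &= (q - \mu)\, y_i, \qquad y_j'' = (q - \mu)\, y_j, \\
f'    &= y_i' y_j + y_i y_j', \\
f''   &= 2(q - \mu) f + 2 y_i' y_j', \\
f'''  &= 2 q' f + 2(q - \mu) f' + 2\bigl(y_i'' y_j' + y_i' y_j''\bigr)
       = 2 q' f + 4(q - \mu) f', \\
L_q f &= -\tfrac{1}{2} f''' + q f' + (q f)'
       = -q' f - 2(q - \mu) f' + 2 q f' + q' f
       = 2 \mu f'.
\end{align*}
```

For $\mu = 0$ this gives $L_q(y_i y_j) = 0$, which is the mechanism behind $L_q \frac{\partial \Delta(0)}{\partial q(x)} = 0$.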

Corollary 3. For a differentiable functional F, $\{F, I_n^{(1)}\}_1 = \{F, I_n^{(2)}\}_2$ ($n \ge 1$).

Proposition 2 allows us to express the average $[q] = \int_{S^1} q(x)\,dx$ of $q \in L^2(S^1)$ as a function of the actions $(I_n^{(2)})_{n \ge 1}$ and Casimir(s). To prove this we use the fact that if a Casimir F of a first Poisson structure of a bihamiltonian system raises to a functional G, then G is a Casimir for the second Poisson structure.

Corollary 4. There exists a Casimir f(q) with respect to the second Poisson structure, so that

$$[q] = \sum_{j \ge 1} 2\pi j\, I_j^{(2)} + f(q). \tag{1.8}$$

Proof. Recall that $\frac{1}{2}\|q\|^2$ is the Hamiltonian of translation with respect to the first Poisson structure, hence the frequency $\omega_j$ corresponding to the jth action variable $I_j^{(1)}$ is given by $\omega_j = \frac{\partial}{\partial I_j^{(1)}}\bigl(\frac{1}{2}\|q\|^2\bigr) = 2\pi j$. Thus $\frac{1}{2}\|q\|^2 - \sum_{j \ge 1} 2\pi j\, I_j^{(1)}$ is constant on each leaf $L^2_c(S^1) := \{q \in L^2(S^1) \mid [q] = c\}$ of the foliation induced by the first Poisson structure. For $q \equiv c$, $I_j^{(1)} = 0$ for all $j \ge 1$ and therefore $\frac{1}{2}\|q\|^2 - \sum_{j \ge 1} 2\pi j\, I_j^{(1)} = \frac{1}{2}c^2$, which leads to

$$\frac{1}{2}\|q\|^2 = \sum_{j \ge 1} 2\pi j\, I_j^{(1)} + \frac{1}{2}[q]^2 \qquad (q \in L^2(S^1)). \tag{1.9}$$

Notice that, for $q \in H^1(S^1)$, $\frac{1}{2}\|q\|^2$ raises to [q], i.e.

$$\frac{d}{dx} \frac{\partial\, \frac{1}{2}\|q\|^2}{\partial q(x)} = \frac{d}{dx} q = L_q \frac{\partial [q]}{\partial q(x)}.$$

Thus, in view of Proposition 2, $\frac{1}{2}\|q\|^2 - \sum_{j \ge 1} 2\pi j\, I_j^{(1)}$ (which is a Casimir with respect to the first Poisson structure) raises to $[q] - \sum_{j \ge 1} 2\pi j\, I_j^{(2)}$, which by the remark before Corollary 4 is a Casimir with respect to the second Poisson structure on $H^1(S^1)$ and, by continuity, on $L^2(S^1)$.

The next result compares the span of the $L^2$-gradients $\frac{\partial I_n^{(2)}}{\partial q(x)}$ with the span of $\frac{\partial I_n^{(1)}}{\partial q(x)}$. For $q \in L^2(S^1)$, let

$$O := \{n \ge 1 \mid \lambda_{2n-1}(q) < \lambda_{2n}(q)\}.$$

For a subset $A \subseteq L^2(S^1)$, denote by $\operatorname{span}\langle A \rangle$ the $L^2$-closure of the linear span of A.

Lemma 5.

$$\operatorname{span}\Bigl\langle \frac{\partial \Delta(0)}{\partial q(x)},\ \frac{\partial I_n^{(2)}}{\partial q(x)} \Bigm| n \in O \Bigr\rangle = \operatorname{span}\Bigl\langle 1,\ \frac{\partial I_n^{(1)}}{\partial q(x)} \Bigm| n \in O \Bigr\rangle.$$

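Proof. First notice that either $\lambda_{2n} \ne 0$ for all $n \in O$ or $\lambda_{2n-1} \ne 0$ for all $n \in O$. As the two cases are treated in the same way we consider the case $\lambda_{2n} \ne 0$ for all $n \in O$. By Iso(q) we denote the isospectral set of q, $\mathrm{Iso}(q) := \{p \in L^2(S^1) \mid \mathrm{spec}(p) = \mathrm{spec}(q)\}$. The generalized tangent space to Iso(q) is given by (cf. [MT])

$$T_q \mathrm{Iso}(q) = \operatorname{span}\Bigl\langle \frac{d}{dx} f_{2n}^2 \Bigm| n \in O \Bigr\rangle,$$

where $f_{2n}$ is an $L^2$-normalized eigenfunction corresponding to $\lambda_{2n}$. Thus, with $M_q$ denoting the right inverse of $L_q$ (cf. [KM1]),

$$\operatorname{span}\Bigl\langle 1,\ \Bigl(\frac{d}{dx}\Bigr)^{-1} T_q \mathrm{Iso}(q) \Bigr\rangle = \operatorname{span}\bigl\langle 1,\ f_{2n}^2 \bigm| n \in O \bigr\rangle;$$

$$\operatorname{span}\Bigl\langle \frac{\partial \Delta(0)}{\partial q(x)},\ M_q T_q \mathrm{Iso}(q) \Bigr\rangle = \operatorname{span}\Bigl\langle \frac{\partial \Delta(0)}{\partial q(x)},\ f_{2n}^2 \Bigm| n \in O \Bigr\rangle,$$

where we used the identity $L_q f_{2n}^2 = 2\lambda_{2n} \frac{d}{dx} f_{2n}^2$. As $1 = \sum_{n \ge 0} a_n f_{2n}^2$ with $a_n \ne 0$ iff $n \in \{0\} \cup O$ and $\frac{\partial \Delta(0)}{\partial q(x)} = \sum_{n \ge 0} b_n f_{2n}^2$ with $b_n \ne 0$ iff $n \in \{0\} \cup O$ (cf. [MT], p. 175–176) we conclude that $f_0^2$ is in $\operatorname{span}\langle 1, f_{2n}^2 \mid n \in O \rangle$ as well as in $\operatorname{span}\bigl\langle \frac{\partial \Delta(0)}{\partial q(x)}, f_{2n}^2 \bigm| n \in O \bigr\rangle$. This leads to

$$\operatorname{span}\Bigl\langle 1,\ \Bigl(\frac{d}{dx}\Bigr)^{-1} T_q \mathrm{Iso}(q) \Bigr\rangle = \operatorname{span}\Bigl\langle \frac{\partial \Delta(0)}{\partial q(x)},\ M_q T_q \mathrm{Iso}(q) \Bigr\rangle = N_O,$$

where $N_O := \operatorname{span}\langle f_0^2, f_{2n}^2 \mid n \in O \rangle$. On the other hand, using the Birkhoff coordinates provided by $\Phi^{(1)}$, one sees that $T_q \mathrm{Iso}(q) = \operatorname{span}\bigl\langle \frac{d}{dx} \frac{\partial I_n^{(1)}}{\partial q(x)} \bigm| n \in O \bigr\rangle$. Thus (cf. Proposition 2)

$$N_O = \operatorname{span}\Bigl\langle 1,\ \Bigl(\frac{d}{dx}\Bigr)^{-1} T_q \mathrm{Iso}(q) \Bigr\rangle = \operatorname{span}\Bigl\langle 1,\ \frac{\partial I_n^{(1)}}{\partial q(x)} \Bigm| n \in O \Bigr\rangle;$$

$$N_O = \operatorname{span}\Bigl\langle \frac{\partial \Delta(0)}{\partial q(x)},\ M_q T_q \mathrm{Iso}(q) \Bigr\rangle = \operatorname{span}\Bigl\langle \frac{\partial \Delta(0)}{\partial q(x)},\ \frac{\partial I_n^{(2)}}{\partial q(x)} \Bigm| n \in O \Bigr\rangle.$$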

2. Angle Variables

2.1. Abel map and angle variables. In this section we review the definition of the angle variables $(\theta_n)_{n \ge 1}$ (cf. [MT, KM2]). First let us introduce some more notation. For $q \in L^2(S^1)$ and $j \ge 1$, denote by $\varphi_j(\lambda,q)$ the function

$$\varphi_j(\lambda,q) = \frac{1}{j^2\pi^2} \prod_{l \in \mathbb{N} \setminus \{j\}} \frac{\mu_l^{(j)}(q) - \lambda}{l^2\pi^2}, \tag{2.1}$$

where $\mu_l^{(j)}(q)$ satisfy $\lambda_{2l-1}(q) \le \mu_l^{(j)}(q) \le \lambda_{2l}(q)$ ($j, l \ge 1$, $l \ne j$) and are uniquely determined by

$$\frac{1}{\pi} \int_{\lambda_{2k-1}(q)}^{\lambda_{2k}(q)} \varphi_j(\lambda,q) \frac{d\lambda}{\sqrt{\Delta(\lambda,q)^2 - 4}} = 0 \qquad (k \ne j). \tag{2.2}$$

Further, for $j \ge 1$, denote by $c_j(q)$ the spectral invariant, defined by (cf. [KM2])

$$c_j(q)\, \frac{1}{2\pi} \int_{\Gamma_j} \varphi_j(\lambda,q) \frac{d\lambda}{\sqrt{\Delta(\lambda,q)^2 - 4}} = 1. \tag{2.3}$$

In the case the jth gap is open, $c_j(q)$ is given by

$$c_j(q)\, \frac{1}{\pi} \int_{\lambda_{2j-1}(q)}^{\lambda_{2j}(q)} \varphi_j(\lambda,q) \frac{d\lambda}{\sqrt{\Delta(\lambda - i0,q)^2 - 4}} = 1. \tag{2.4}$$

Define $\psi_j(\lambda) := c_j \varphi_j(\lambda)$ and introduce, for $n \ge 1$ and q with $\gamma_n(q) > 0$, the multivalued (mod 2π) map (cf. [KM2])

$$\theta_n(q) := \sum_{k=1}^{\infty} \int_{\lambda_{2k}(q)}^{\mu_k^*(q)} \psi_n(\lambda,q) \frac{d\lambda}{\sqrt[*]{\Delta(\lambda,q)^2 - 4}}, \tag{2.5}$$

where $\mu_k(q)$ ($k \ge 1$) denote the Dirichlet eigenvalues of $-\frac{d^2}{dx^2} + q$, considered on the interval [0, 1], and $\mu_k^*(q)$ the Dirichlet divisor, $\mu_k^*(q) = \bigl(\mu_k(q), \sqrt[*]{\Delta(\mu_k,q)^2 - 4}\bigr)$, on the hyperelliptic surface $y = \sqrt[*]{\Delta(\lambda,q)^2 - 4}$ with

$$\sqrt[*]{\Delta(\mu_k,q)^2 - 4} = y_1(1,\mu_k,q) - y_2'(1,\mu_k,q). \tag{2.6}$$

Recall from [KM2] that the infinite sum in (2.5) converges uniformly.

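2.2. On the gradient of the angle variables. For the convenience of the reader we recall two propositions proved in [KM2]. Denote by $U_n$ the open set $U_n := \{q \in L^2(S^1) \mid \gamma_n(q) \ne 0\}$. For $A \subseteq L^2(S^1)$ denote by Nor A the (possibly empty) subset of A given by $\mathrm{Nor}\,A := \{q \in A \mid \mu_k(q) = \lambda_{2k}(q),\ \forall k \ge 1\}$. Further, denote by $m_{ij} \equiv m_{ij}(\lambda,q)$ the entries of the Floquet matrix,

$$m_{11} := y_1(1,\lambda,q); \quad m_{21} := y_1'(1,\lambda,q); \quad m_{12} := y_2(1,\lambda,q); \quad m_{22} := y_2'(1,\lambda,q).$$

For $k, n \ge 1$ and $q \in U_n$, define (the dot denotes the derivative $\frac{d}{d\lambda}$)

$$c_{n,k}(q) := -\frac{\psi_n(\lambda_{2k},q)}{\dot{\Delta}(\lambda_{2k},q)}; \qquad d_k(q) := (-1)^{k+1} \frac{\dot{m}_{11}(\lambda_{2k},q)\, m_{21}(\lambda_{2k},q)}{\dot{\Delta}(\lambda_{2k},q)}.$$

Proposition 6 ([KM2]). For $n \ge 1$ and $q \in \mathrm{Nor}\,U_n$,

$$\frac{\partial \theta_n}{\partial q(x)} = \sum_{k=1}^{\infty} c_{n,k}(q) \bigl( y_1(x,\lambda_{2k},q)\, y_2(x,\lambda_{2k},q) + d_k(q)\, y_2^2(x,\lambda_{2k},q) \bigr),$$

where the series converges in $H^2(S^1)$.

Proposition 7 ([KM2]). For any $N \ge 0$ and $n \ge 1$ the map

$$\nabla\theta_n : U_n \cap H_0^N \to H_0^{N+1}, \qquad q \mapsto \frac{\partial \theta_n}{\partial q(x)}$$

is real analytic.

The following corollary follows from Proposition 7.

Corollary 8. For $q \in U_n \cap U_m \cap H^2(S^1)$, the bracket $\{\theta_n, \theta_m\}_2(q)$ is well defined.

3. Canonical Relations

In this section we prove the following

Theorem 3. Let $n, m \ge 1$.

(i) For $q \in L^2(S^1)$, $\{I_n^{(2)}, I_m^{(2)}\}_2 = 0$.
(ii) For $q \in L^2(S^1) \cap U_n$, $\{\theta_n, I_m^{(2)}\}_2 = -\delta_{n,m}$.
(iii) For $q \in U_n \cap U_m \cap H^2(S^1)$, $\{\theta_n, \theta_m\}_2 = 0$.

Proof. For $a \in \mathbb{R}$, introduce $L_{q;a} := L_q - 2a\frac{d}{dx}$. Then, for any $a \in \mathbb{R}$, $L_{q;a} \frac{\partial \Delta(a)}{\partial q(x)} = 0$ (cf. [KM2], Appendix B). Notice that $(a - b)L_q = aL_{q;b} - bL_{q;a}$. Hence, for $a \ne b$,

$$\Bigl\langle \frac{\partial \Delta(b)}{\partial q(x)},\ L_q \frac{\partial \Delta(a)}{\partial q(x)} \Bigr\rangle_{L^2} = \frac{1}{a - b} \Bigl\langle \frac{\partial \Delta(b)}{\partial q(x)},\ (aL_{q;b} - bL_{q;a}) \frac{\partial \Delta(a)}{\partial q(x)} \Bigr\rangle_{L^2}$$

$$= \frac{1}{b - a} \Bigl( b \Bigl\langle L_{q;a} \frac{\partial \Delta(a)}{\partial q(x)},\ \frac{\partial \Delta(b)}{\partial q(x)} \Bigr\rangle_{L^2} + a \Bigl\langle \frac{\partial \Delta(a)}{\partial q(x)},\ L_{q;b} \frac{\partial \Delta(b)}{\partial q(x)} \Bigr\rangle_{L^2} \Bigr) = 0,$$

and in view of (1.6), statement (i) follows.

(ii) By Corollary 3, $\{\theta_n, I_m^{(2)}\}_2 = \{\theta_n, I_m^{(1)}\}_1$ on $U_n$. It is proved in [KM2] that $\theta_n$ is an angle variable for KdV with respect to the first Poisson structure. Hence $\theta_n$ satisfies $\{\theta_n, I_m^{(1)}\}_1 = -\delta_{n,m}$ and (ii) follows.

To prove (iii) we need two auxiliary results.

Proposition 9. For $n, m \ge 1$ and $q \in \mathrm{Nor}\,U_n \cap \mathrm{Nor}\,U_m$, $\{\theta_n, \theta_m\}_2(q)$ is well defined and $\{\theta_n, \theta_m\}_2(q) = 0$.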

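Proof. For $k \ge 1$, introduce

$$a_k(x,q) := y_1(x,\mu_k(q),q)\, y_2(x,\mu_k(q),q), \qquad g_k(x,q) := \frac{y_2(x,\mu_k(q),q)}{\|y_2(\cdot,\mu_k(q),q)\|_{L^2}}.$$

Then (cf. [PT]), for $i, j \ge 1$,

$$\Bigl\langle \frac{d}{dx} g_i^2,\ g_j^2 \Bigr\rangle_{L^2} = \Bigl\langle a_i,\ \frac{d}{dx} a_j \Bigr\rangle_{L^2} = 0; \qquad \Bigl\langle \frac{d}{dx} g_i^2,\ a_j \Bigr\rangle_{L^2} = \frac{1}{2}\delta_{i,j}.$$

As $L_q a_k = 2\mu_k \frac{d}{dx} a_k$ and $L_q g_k^2 = 2\mu_k \frac{d}{dx} g_k^2$, we conclude that, for $i, j \ge 1$,

$$\langle L_q g_i^2,\ g_j^2 \rangle_{L^2} = \langle L_q a_i,\ a_j \rangle_{L^2} = 0; \qquad \langle L_q g_i^2,\ a_j \rangle_{L^2} = \mu_i \delta_{i,j}.$$

The claimed statement then follows from Proposition 6.

Recall that by Corollary 8, $\{\theta_n, \theta_m\}_2(q)$ is well defined on $U_n \cap U_m \cap H^2(S^1)$.

Proposition 10. For $n, m, k \ge 1$ and $q \in U_n \cap U_m \cap H^2(S^1)$, the derivative of $\{\theta_n, \theta_m\}_2$ at q in the direction $L_q \frac{\partial I_k^{(2)}}{\partial q(x)}$ is zero.

Remark. Reformulated (with a slight abuse of notation), Proposition 10 states that $\{\{\theta_n, \theta_m\}_2, I_k^{(2)}\}_2 = 0$.

Proof. We want to use the Jacobi identity

$$\{\{\theta_n, \theta_m\}_2, I_k^{(2)}\}_2 + \{\{\theta_m, I_k^{(2)}\}_2, \theta_n\}_2 + \{\{I_k^{(2)}, \theta_n\}_2, \theta_m\}_2 = 0. \tag{3.1}$$

Given $F, G, H \in C^2(U)$ with sufficiently regular gradients and U an open subset of $L^2(S^1)$, the Jacobi identity

$$\{\{F,G\}_2, H\}_2 + \{\{G,H\}_2, F\}_2 + \{\{H,F\}_2, G\}_2 = 0 \tag{3.2}$$

is established as follows: for $h \in L^2(S^1)$, one obtains, using that $\frac{\partial^2 F}{\partial q(x)\partial q(y)}$ and $\frac{\partial^2 G}{\partial q(x)\partial q(y)}$ are symmetric, that the derivative $D_h\{F,G\}_2$ of $\{F,G\}_2$ in the direction of h is given by

$$\begin{aligned}
D_h\{F,G\}_2 ={}& \Bigl\langle \Bigl\langle \frac{\partial^2 F}{\partial q(x)\partial q(y)},\ L_q \frac{\partial G}{\partial q(y)} \Bigr\rangle_{L^2},\ h(x) \Bigr\rangle_{L^2} + \Bigl\langle \frac{\partial F}{\partial q(x)},\ h(x) \frac{d}{dx} \frac{\partial G}{\partial q(x)} \Bigr\rangle_{L^2} \\
&- \Bigl\langle \Bigl\langle \frac{\partial^2 G}{\partial q(x)\partial q(y)},\ L_q \frac{\partial F}{\partial q(y)} \Bigr\rangle_{L^2},\ h(x) \Bigr\rangle_{L^2} - \Bigl\langle \frac{\partial G}{\partial q(x)},\ h(x) \frac{d}{dx} \frac{\partial F}{\partial q(x)} \Bigr\rangle_{L^2}.
\end{aligned} \tag{3.3}$$

Substituting (3.3) (and similar expressions for the other terms) in the left side of (3.2), one verifies the identity (3.2).

In our case $F = \theta_n$, $G = \theta_m$, $H = I_k^{(2)}$, and $U = U_n \cap U_m \cap H^2(S^1)$. Taking into account that on U, the functions $I_k^{(2)}$, $\theta_n$, $\theta_m$ are analytic and $\frac{\partial \theta_n}{\partial q(x)}, \frac{\partial \theta_m}{\partial q(x)} \in H^3(S^1)$ (Proposition 7) as well as $\frac{\partial I_n^{(2)}}{\partial q(x)} \in H^4(S^1)$, one verifies that all the expressions in the formal derivation of the Jacobi identity are well defined. It follows from Theorem 3(ii) that $\{\{\theta_n, \theta_m\}_2, I_k^{(2)}\}_2 = 0$.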

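Proof of Theorem 3(iii). For $q \in U_n \cap U_m \cap H^2(S^1)$, the set $\mathrm{Iso}(q) := \{p \in L^2 \mid \mathrm{spec}(p) = \mathrm{spec}(q)\}$ satisfies $\mathrm{Iso}(q) \subset U_n \cap U_m \cap H^2(S^1)$. By Proposition 10, for $k, n, m \ge 1$, $\{\{\theta_n, \theta_m\}_2, I_k^{(2)}\}_2 = 0$ on $U_n \cap U_m \cap H^2(S^1)$. Thus $\{\theta_n, \theta_m\}_2$ is constant on Iso(q). By Proposition 9, for the unique element p in Nor(Iso(q)), $\{\theta_n, \theta_m\}_2(p) = 0$, hence $\{\theta_n, \theta_m\}_2(q) = 0$.

4. Action-angle Map

4.1. Definition of $\Phi^{(2)}$. In [KM2] it is shown that the map $\Phi^{(1)} : H_0^N(S^1) \to h^{N+\frac{1}{2}}(\mathbb{N}; \mathbb{R}^2)$, defined by

$$\Phi^{(1)}(q) := \Bigl(\sqrt{2I_j^{(1)}}\,(\cos\theta_j, \sin\theta_j)\Bigr)_{j \ge 1},$$

is a real analytic action-angle map with respect to the first Poisson structure. $\Phi^{(1)}$ has a real analytic extension to all of $L^2(S^1)$ (again denoted by $\Phi^{(1)}$), $\Phi^{(1)}(q) := \Phi^{(1)}(q - [q])$. We define the map $\Phi^{(2)}$ using the same angles $(\theta_n)_{n \ge 1}$ as in $\Phi^{(1)}$. From (1.4) one sees that for $n \ge 1$ with $\lambda_{2n}(q) < 0$, $I_n^{(2)}(q)$ satisfies $I_n^{(2)}(q) \le 0$ whereas for n with $0 < \lambda_{2n-1}(q)$, $I_n^{(2)}(q) \ge 0$.

Definition 11. For $q \in L^2(S^1)$, set $\Phi^{(2)}(q) := (\Phi_n^{(2)}(q))_{n \ge 1} = (v_n, w_n)_{n \ge 1}$, where

$$(v_n, w_n) := \begin{cases}
-\sqrt{-2I_n^{(2)}}\,(\cos\theta_n, \sin\theta_n) & \text{if } \lambda_{2n} < 0; \\
\bigl(I_n^{(2)}, \theta_n\bigr) & \text{if } \lambda_{2n-1} \le 0 \le \lambda_{2n},\ \gamma_n \ne 0; \\
(0, 0) & \text{if } 0 = \lambda_{2n-1} = \lambda_{2n}; \\
\sqrt{2I_n^{(2)}}\,(\cos\theta_n, \sin\theta_n) & \text{if } 0 < \lambda_{2n-1}.
\end{cases}$$

To simplify notation, the restriction of $\Phi^{(2)}$ to a leaf $L^N_*$ is denoted again by $\Phi^{(2)}$. For example, for a leaf $L^N_* = L^N_{c,2k}$ with $|c| > 2$ and $k \ge 1$, $\Phi^{(2)}(q)$ is given by

$$\Bigl( \Bigl(-\sqrt{-2I_n^{(2)}}\,(\cos\theta_n, \sin\theta_n)\Bigr)_{1 \le n < k},\ \bigl(I_k^{(2)}, \theta_k\bigr),\ \Bigl(\sqrt{2I_n^{(2)}}\,(\cos\theta_n, \sin\theta_n)\Bigr)_{n > k} \Bigr),$$

whereas for a leaf $L^N_* = L^N_{c,2k+1}$ with $|c| < 2$, $\Phi^{(2)}(q)$ is given by

$$\Bigl( \Bigl(-\sqrt{-2I_n^{(2)}}\,(\cos\theta_n, \sin\theta_n)\Bigr)_{1 \le n \le k},\ \Bigl(\sqrt{2I_n^{(2)}}\,(\cos\theta_n, \sin\theta_n)\Bigr)_{n > k} \Bigr).$$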

Action-Angle Variables for KdV

663

N 4.2. Analyticity of (2) . Recall that MN ∗ denotes the model space corresponding to L∗ (cf. Theorem 2). In this subsection we prove (2)

N Proposition 12. For any N ≥ 0, ∗ : LN ∗ → M∗ is a real analytic map.

First let us introduce the following real analytic submanifolds ($c \in \mathbb{R}$ fixed):

$$L_*(\mathbb{C}) \equiv L_{c,k}(\mathbb{C}) := \{q \in L^2(S^1; \mathbb{C}) \mid \Delta(0, q) = c\},$$
$$J_*(\mathbb{C}) \equiv J_{\pm,k}(\mathbb{C}) := \{q \in L^2(S^1; \mathbb{C}) \mid \lambda_{k-\frac12\pm\frac12} = 0;\ 0 \notin \mathrm{spec}_d(q)\},$$
$$I_k(\mathbb{C}) := \{q \in L^2(S^1; \mathbb{C}) \mid \lambda_{k-1}(q) = \lambda_k(q) = 0\}.$$

Using that $\lambda_j(q)$ ($j \ge 0$) satisfy the asymptotics $\lambda_{2n}(q), \lambda_{2n-1}(q) = n^2\pi^2 + O(\|q\|)$, one obtains the following

Lemma 13. (i) Given $q_0 \in L_{c,k}$, there exist a complex neighborhood $V_{q_0} \subseteq L^2(S^1; \mathbb{C})$ of $q_0$ and $K > 0$ with the properties

$$\sup_{q\in V_{q_0},\ n\ge0} |\lambda_{2n+1}(q) - \lambda_{2n}(q)| \ge K; \qquad \sup_{q\in V_{q_0},\ j\ge0} |\lambda_j(q)| \ge K.$$

(ii) Given $q_0 \in J_{\pm,k}$, there exist a complex neighborhood $V_{q_0} \subseteq L^2(S^1; \mathbb{C})$ of $q_0$ and $K > 0$ with the properties

$$\sup_{q\in V_{q_0},\ n\ge0} |\lambda_{2n+1}(q) - \lambda_{2n}(q)| \ge K; \qquad \sup_{q\in V_{q_0},\ j\ne k-\frac12\pm\frac12} |\lambda_j(q)| \ge K.$$

(iii) Given $q_0 \in I_k$, there exist a complex neighborhood $V_{q_0} \subseteq L^2(S^1; \mathbb{C})$ of $q_0$ and $K > 0$ with the property

$$\sup_{q\in V_{q_0},\ n\ge0} |\lambda_{2n+1}(q) - \lambda_{2n}(q)| \ge K; \qquad \sup_{q\in V_{q_0},\ j\ne k-\frac12\pm\frac12} |\lambda_j(q)| \ge K.$$

Lemma 13 allows us to analytically extend the variables $I_n^{(1)}$ and $I_n^{(2)}$, defined in (1.1) and (1.2). With regard to the variables $I_n^{(2)}$, note that on $J_{\pm,k}(\mathbb{C})$, $I_k(\mathbb{C})$, and $L_{c,k}(\mathbb{C})$ with $|c| < 2$, the Casimir

$$F_n(q) = -\frac{1}{2\pi}\int_{\Gamma_n} \frac{1}{\mu}\,\mathrm{arccosh}\Big(\frac{(-1)^n\Delta(-i0)}{2}\Big)\,d\mu$$

vanishes for any $n \ge 1$, whereas on $L_{c,k}$ with $|c| > 2$, $F_n(q)$ vanishes for $n \ne k/2$.

From Lemma 13 and the analyticity of $\Delta(\mu, q)$ in $\mu$ and $q$ it follows that the definitions (1.1) and (1.2) can be used to define the variables $I_n^{(1)}$ respectively $I_n^{(2)}$ on the complex neighborhoods specified in Lemma 13. According to [KM2], the actions $I_n^{(1)}$ are analytic on a complex neighborhood of $L^2(S^1)$ which is independent of $n$. A similar result holds for the variables $I_n^{(2)}$:

Lemma 14. For any given $q_0 \in L^2(S^1)$, there exists a complex neighborhood $U_{q_0} \subseteq L^2(S^1; \mathbb{C})$ of $q_0$ so that, for any $n \ge 1$, $I_n^{(2)}(q)$, defined by (1.2), is analytic on $U_{q_0}$.


We consider the case $L_* = L_{c,k}$ with $|c| < 2$. (Similar arguments are used in the other cases.) To prove that $\Omega^{(2)}_*$ is real analytic we have to show that, for any $q_0 \in L_*$, there exists a complex neighborhood $U_{q_0} \subset L^2(S^1; \mathbb{C})$ of $q_0$ such that, for any $n \ge 1$, the coordinate functions $v_n, w_n$ are analytic and $(v_n, w_n)_{n\ge1}$ is uniformly bounded in $U_{q_0}$. Taking $\Omega^{(1)} : H^N(S^1) \to h^{N+\frac12}(\mathbb{N}; \mathbb{R}^2)$ as a starting point, we want to replace in $\Omega^{(1)}$ the action $I_j^{(1)}$ by $I_j^{(2)}$. Notice that for $q$ real valued and $n$ with $\lambda_{2n-1} < \lambda_{2n} < 0$,

$$\frac{I_n^{(1)}}{2|\lambda_{2n-1}|} \le -I_n^{(2)} \le \frac{I_n^{(1)}}{2|\lambda_{2n}|}$$

or

$$(2|\lambda_{2n-1}|)^{-1/2} \le -\zeta_n(q) := \big(-I_n^{(2)}/I_n^{(1)}\big)^{1/2} \le (2|\lambda_{2n}|)^{-1/2}, \tag{4.1}$$

whereas for $0 < \lambda_{2n-1} < \lambda_{2n}$, one obtains

$$(2|\lambda_{2n}|)^{-1/2} \le \zeta_n(q) := \big(I_n^{(2)}/I_n^{(1)}\big)^{1/2} \le (2|\lambda_{2n-1}|)^{-1/2}. \tag{4.2}$$

The inequalities (4.1)–(4.2) show that the quotient

$$\zeta_n(q) := \pm\big(\pm I_n^{(2)}/I_n^{(1)}\big)^{1/2} \tag{4.3}$$

can be defined in a continuous fashion for $q \in L^2(S^1)$ with $\lambda_{2n-1} \le \lambda_{2n} < 0$ resp. $0 < \lambda_{2n-1} \le \lambda_{2n}$: if $\lambda_{2n} = \lambda_{2n-1} \ne 0$, $\zeta_n(q)$ is given by $\zeta_n(q) = -(-2\lambda_{2n})^{-1/2}$ resp. $(2\lambda_{2n})^{-1/2}$.

As we are in the case $L_* = L_{c,k}$ with $|c| < 2$, $\Omega_n^{(2)}$ satisfies $\Omega_n^{(2)} = \zeta_n\Omega_n^{(1)}$ ($n \ge 1$). For $n \ge 1$, denote by $\gamma_n$ the length of the $n$th gap, $\gamma_n := \lambda_{2n} - \lambda_{2n-1}$. To prove that $\Omega^{(2)}$ is real analytic we need the following

Lemma 15. Let $q_0 \in L_*$. Then there exists a (small) neighborhood $U_{q_0} \subseteq L^2(S^1; \mathbb{C})$ with the following properties:

(A) For $q \in U_{q_0}$,

(i) $I_n^{(1)}(q) = \dfrac{1}{2n\pi}\Big(\dfrac{\gamma_n}{2}\Big)^2\big(1 + r_n^{(1)}(q)\big)$;

(ii) $\pm I_n^{(2)}(q) = \dfrac{1}{4n^3\pi^3}\Big(\dfrac{\gamma_n}{2}\Big)^2\big(1 + r_n^{(2)}(q)\big)$,

where the error terms $r_n^{(j)}(q)$ ($j = 1, 2$) satisfy $r_n^{(j)}(q) = O\big(\frac{\log n}{n}\big)$ uniformly for $q \in U_{q_0}$ as well as $|1 + r_n^{(j)}(q)| \le C$ and $\frac1C \le \mathrm{Re}\big(1 + r_n^{(j)}(q)\big)$ with $C > 1$ independent of $n$ and $q \in U_{q_0}$.

(B) The functions $\zeta_n(q)$ (cf. (4.3)) admit an analytic continuation on $U_{q_0}$ and satisfy

(iii) $\zeta_n(q)^2 = \dfrac{1}{2n^2\pi^2}\big(1 + r_n^{(3)}(q)\big)$,

where the error terms $r_n^{(3)}(q)$ satisfy $r_n^{(3)}(q) = O\big(\frac{\log n}{n}\big)$ uniformly for $q \in U_{q_0}$ as well as $|1 + r_n^{(3)}(q)| \le C$ and $\frac1C \le \mathrm{Re}\big(1 + r_n^{(3)}(q)\big)$ with $C > 1$ independent of $n$ and $q \in U_{q_0}$.

Remark. From Lemma 15(B) and Cauchy's formula it follows that

$$d_q\zeta_n(p) = O\Big(\frac{\log n}{n^2}\Big)\|p\| \qquad (q \in U_{q_0};\ p \in L^2(S^1; \mathbb{C})).$$
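Dividing (ii) by (i) in Lemma 15 gives $1 + r_n^{(3)} = (1 + r_n^{(2)})/(1 + r_n^{(1)})$, so the lower bound on the real part in (B) amounts to controlling a quotient of two complex numbers that both lie in a narrow sector around the positive real axis. The following sketch checks numerically that if both factors satisfy a sector condition of the form $|\mathrm{Im}\,z| \le \frac12 \mathrm{Re}\,z$, then the quotient $w$ satisfies $\mathrm{Re}\,w \ge \frac35 |w|$ (since $\cos(2\arctan\frac12) = \frac35$). The sampling ranges and the function name are arbitrary choices for this illustration.

```python
import random

def worst_quotient_ratio(samples=20000, seed=1):
    """Sample z1, z2 with |Im z| <= Re z / 2 and return min of Re(w)/|w| for w = z2/z1."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(samples):
        re1 = rng.uniform(0.1, 5.0)
        re2 = rng.uniform(0.1, 5.0)
        z1 = complex(re1, rng.uniform(-0.5, 0.5) * re1)
        z2 = complex(re2, rng.uniform(-0.5, 0.5) * re2)
        w = z2 / z1
        worst = min(worst, w.real / abs(w))
    return worst

ratio = worst_quotient_ratio()
print(ratio)  # stays at or above 3/5 up to rounding
```

The bound is sharp only when the two factors sit on opposite edges of the sector, which is why random samples stay strictly above 3/5.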

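Proof. Statement (B) follows from (A) and Lemma 14: By (A) and Lemma 14, $r_n^{(1)}$ and $r_n^{(2)}$ are real analytic. In view of the asymptotics $r_n^{(j)}(q) = O\big(\frac{\log n}{n}\big)$ ($j = 1, 2$) we can choose $U_{q_0}$ so small that for any $q \in U_{q_0}$, $n \ge 1$ and $j = 1, 2$, $|\mathrm{Im}(1 + r_n^{(j)}(q))| \le \frac12\mathrm{Re}\big(1 + r_n^{(j)}(q)\big)$. As $1 + r_n^{(3)}(q) = (1 + r_n^{(1)}(q))^{-1}(1 + r_n^{(2)}(q))$, the claimed statements then follow from (A) (with $C > 1$ appropriately chosen).

Concerning (A), statements (i) and (ii) are proven in a similar fashion, thus we consider (ii) only (for (i) cf. also [BBGK]). Choose a neighborhood $U \subseteq L^2(S^1; \mathbb{C})$ of $q_0$ and, as above, circuits $\Gamma_n$ ($n \ge 1$) around the $n$th gap $(\lambda_{2n-1}(q_0), \lambda_{2n}(q_0))$ so that, for all $q$ in $U$ and $n \ge 1$, the only eigenvalues of $-\frac{d^2}{dx^2} + q$ inside $\Gamma_n$ are $\lambda_{2n-1}, \lambda_{2n}$, the only zero of $\dot\Delta(\mu) \equiv \frac{d}{d\mu}\Delta(\mu)$ inside $\Gamma_n$ is $\dot\lambda_n$, and $\mathrm{spec}(q) \cap \Gamma_n = \emptyset$. ($\dot\lambda_n$ denotes the zero of $\dot\Delta(\mu)$ close to $\tau_n = (\lambda_{2n-1} + \lambda_{2n})/2$.) Then the variables $I_n^{(2)}(q)$ are analytic on $U$ (cf. Lemma 14). As we are in the case $L_* = L_{c,k}$ with $|c| < 2$, $I_n^{(2)}$ is given by ($n \ge 1$)

$$I_n^{(2)}(q) = \frac{1}{2\pi}\int_{\Gamma_n} \frac{1}{\mu}\,\mathrm{arccosh}\Big(\frac{(-1)^n\Delta(\mu)}{2}\Big)\,d\mu.$$

As $q_0 \in L_{c,k}$ with $|c| < 2$, $\lambda_{k-1}(q_0) < 0 < \lambda_k(q_0)$ and $k$ is odd. Introduce

$$\varepsilon_n := \begin{cases} -1, & \text{for } n \le \frac{k-1}{2},\\ +1, & \text{for } n > \frac{k-1}{2}. \end{cases}$$

With this choice, $\mathrm{Re}(\varepsilon_n\mu) > 0$ inside $\Gamma_n$ and hence the value $\log(\varepsilon_n\mu)$ of the principal branch of the logarithm is well defined. Integrate by parts to obtain (with $\dot\Delta(\mu) = \frac{d}{d\mu}\Delta(\mu)$)

$$\varepsilon_n I_n^{(2)}(q) = \frac{1}{2\pi}\int_{\Gamma_n} \varepsilon_n\log(\varepsilon_n\mu)\,\frac{\dot\Delta(\mu)}{\sqrt{\Delta(\mu)^2 - 4}}\,d\mu.$$

Use that $\int_{\Gamma_n} \frac{\dot\Delta(\mu)}{\sqrt{\Delta(\mu)^2 - 4}}\,d\mu = 0$, to obtain

$$\varepsilon_n I_n^{(2)} = \frac{\varepsilon_n}{2\pi}\int_{\Gamma_n} \big(\log(\varepsilon_n\mu) - \log(\varepsilon_n\dot\lambda_n)\big)\,\frac{\dot\Delta(\mu)}{\sqrt{\Delta(\mu)^2 - 4}}\,d\mu.$$

Introduce, for $\mu$ inside and near $\Gamma_n$,

$$w_1(\mu) \equiv w_1(\mu, q) := 2\,\frac{(\mu - \lambda_0)^{1/2}}{n\pi}\prod_{k\ne n}\frac{\big((\lambda_{2k} - \mu)(\lambda_{2k-1} - \mu)\big)^{1/2}}{k^2\pi^2},$$
$$w_2(\mu) \equiv w_2(\mu, q) := (-1)^{n-1}\prod_{k\ne n}\frac{\dot\lambda_k - \mu}{k^2\pi^2}.$$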


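(Here, as usual, $z^{1/2}$ denotes the branch on $\mathbb{C} \setminus \mathbb{R}_{\le0}$ with $1^{1/2} = 1$.) Then both $\sqrt{\Delta(\mu)^2 - 4}$ and $\dot\Delta(\mu)$ admit product representations,

$$\sqrt{\Delta(\mu)^2 - 4} = \frac{w_1(\mu)}{n\pi}\sqrt{(\lambda_{2n} - \mu)(\mu - \lambda_{2n-1})}, \qquad \dot\Delta(\mu) = (-1)^{n-1}\frac{\mu - \dot\lambda_n}{n^2\pi^2}\,w_2(\mu),$$

with the sign of the radical $\sqrt{(\lambda_{2n} - \mu)(\mu - \lambda_{2n-1})}$ determined by $\sqrt{\Delta(\mu)^2 - 4}$. Hence

$$\varepsilon_n I_n^{(2)} = \frac{\varepsilon_n}{2\pi}\int_{\Gamma_n} \log\Big(\frac{\mu}{\dot\lambda_n}\Big)\frac{(\mu - \dot\lambda_n)\,w(\mu)}{n\pi\sqrt{(\lambda_{2n} - \mu)(\mu - \lambda_{2n-1})}}\,d\mu,$$

where $w(\mu) := \frac{w_2(\mu)}{w_1(\mu)}$. If $\gamma_n = 0$, (ii) holds and thus we may assume $\gamma_n \ne 0$. By deforming the contour $\Gamma_n$, one obtains with $\mu(t) := \tau_n + t\frac{\gamma_n}{2}$ and $\sigma_n := \frac{\tau_n - \dot\lambda_n}{\gamma_n/2}$,

$$\varepsilon_n I_n^{(2)} = \frac{\gamma_n}{2}\,\frac{\varepsilon_n}{n\pi}\,\frac{1}{\pi}\int_{-1}^{1} \log\Big(\frac{\mu(t)}{\dot\lambda_n}\Big)\frac{(\sigma_n + t)\,w(\mu(t))}{(1 - t^2)^{1/2}}\,dt.$$

In order to investigate the integrand we make a Taylor expansion of $\log\frac{\mu(t)}{\dot\lambda_n}$,

$$\log\frac{\mu(t)}{\dot\lambda_n} = \lambda(\mu(t))\,b(\mu(t)),$$

where, with $\lambda(\mu) := \frac{\mu}{\dot\lambda_n} - 1$,

$$\lambda(\mu(t)) = \frac{\mu(t)}{\dot\lambda_n} - 1 = \frac{\gamma_n/2}{\dot\lambda_n}\Big(\frac{\tau_n - \dot\lambda_n}{\gamma_n/2} + t\Big) = \frac{\gamma_n/2}{\dot\lambda_n}(\sigma_n + t)$$

and

$$b(\mu) := \int_0^1 \frac{ds}{1 + s\lambda(\mu)} = \int_0^1 \frac{ds}{1 + s\big(\frac{\mu}{\dot\lambda_n} - 1\big)}.$$

Therefore,

$$\varepsilon_n I_n^{(2)} = \frac{(\gamma_n/2)^2}{4n\pi\,\varepsilon_n\dot\lambda_n}\,J_n \quad\text{with}\quad J_n := \frac{4}{\pi}\int_{-1}^{1} (\sigma_n + t)^2\,b(\mu(t))\,w(\mu(t))\,\frac{dt}{(1 - t^2)^{1/2}}.$$

It remains to obtain the stated estimates for $I_n^{(2)}$. According to [PT], for $\mu - \tau_n = O(1)$, the functions $w_1(\mu)$ and $w_2(\mu)$ satisfy asymptotic estimates

$$w_1(\mu) = 1 + O\Big(\frac{\log n}{n}\Big); \qquad w_2(\mu) = \frac12\Big(1 + O\Big(\frac{\log n}{n}\Big)\Big).$$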


Hence

$$w(\mu) = \frac{w_2(\mu)}{w_1(\mu)} = \frac12\big(1 + r_n^{(4)}\big); \qquad r_n^{(4)} = O\Big(\frac{\log n}{n}\Big).$$

By Lemma 16 below, $\sigma_n = \gamma_n O\big(\frac{\log n}{n}\big)$ and hence, with $\dot\lambda_n = n^2\pi^2 + O(1)$,

$$\lambda(\mu(t)) = \frac{\gamma_n}{2\dot\lambda_n}(\sigma_n + t) = \gamma_n O\Big(\frac{1}{n^2}\Big) \qquad (0 \le |t| \le 1),$$

which leads to

$$b(\mu(t)) = 1 + \gamma_n O\Big(\frac{1}{n^2}\Big).$$

Combining these estimates we conclude

$$J_n = \frac{4}{\pi}\int_{-1}^{1} \frac{t^2}{\sqrt{1 - t^2}}\,\frac12\,dt + O\Big(\frac{\log n}{n}\Big) = 1 + O\Big(\frac{\log n}{n}\Big)$$

locally uniformly in $q$. It remains to establish estimates for $J_n$ for finitely many $n$'s. They are obtained by proving them first for real valued potentials in $U$ (which is straightforward) and then taking, if necessary, a sufficiently small neighborhood $V_{q_0} \subseteq U$ of $q_0$ to conclude that the same estimates hold for the complex valued potentials in $V_{q_0}$ as well.

The following result has been obtained in [BKM1, Lemma 2.4].

Lemma 16. Locally uniformly on a (small) neighborhood $U \subset L^2(S^1; \mathbb{C})$ of $L^2$,

$$\dot\lambda_n - \tau_n = \gamma_n^2\,O\Big(\frac{\log n}{n}\Big),$$

where $\tau_n = \frac{\lambda_{2n} + \lambda_{2n-1}}{2}$, $\gamma_n = \lambda_{2n} - \lambda_{2n-1}$, and $\dot\lambda_n$ is the zero of $\dot\Delta(\lambda)$ close to $\tau_n$.

As a consequence of Lemma 15, we conclude that, for $q \in L^2(S^1)$,

$$\sum_{j\ge1} (2\pi j)^3\,|I_j^{(2)}(q)| < \infty.$$

Proof of Proposition 12. The statement follows by combining Lemma 13 and Lemma 15 together with the result from [KM2] saying that $\Omega^{(1)} : H_0^N(S^1) \to h^{N+\frac12}(\mathbb{N}; \mathbb{R}^2)$ is real analytic.
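Two elementary facts carry the proof of Lemma 15(ii): the identity $\log(1 + x) = x\int_0^1 \frac{ds}{1 + sx}$ behind the factorization of the logarithm into $\lambda(\mu)\,b(\mu)$, and the value $\frac{4}{\pi}\int_{-1}^{1}\frac{t^2}{2\sqrt{1 - t^2}}\,dt = 1$ of the leading term of $J_n$. Both can be checked numerically with a plain midpoint rule, as in the sketch below; the function names and step counts are choices made for this illustration only.

```python
import math

def b_integral(x, steps=20000):
    """Midpoint approximation of the integral of 1/(1 + s*x) over s in [0, 1]."""
    h = 1.0 / steps
    return sum(h / (1.0 + (i + 0.5) * h * x) for i in range(steps))

def leading_term_of_J(steps=2000):
    """Leading term (4/pi) * int_{-1}^{1} t^2 / (2*sqrt(1 - t^2)) dt.

    The substitution t = sin(phi) removes the endpoint singularity and
    turns the integral into (2/pi) * int_{-pi/2}^{pi/2} sin(phi)^2 dphi.
    """
    h = math.pi / steps
    total = sum(math.sin(-math.pi / 2 + (i + 0.5) * h) ** 2 for i in range(steps))
    return (2.0 / math.pi) * total * h

x = 0.3
print(x * b_integral(x), math.log1p(x))  # the two values agree closely
print(leading_term_of_J())               # close to 1
```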


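4.3. Properness of $\Omega^{(2)}$. To shorten notation let, for $\lambda_{2n-1} \le \mu \le \lambda_{2n}$,

$$p_n(\mu) \equiv p_n(\mu, q) := \mathrm{arccosh}\Big(\frac{(-1)^n\Delta(\mu, q)}{2}\Big) = \log\bigg(\frac{(-1)^n\Delta(\mu)}{2}\bigg(1 + \Big(1 - \frac{1}{(\Delta(\mu)/2)^2}\Big)^{1/2}\bigg)\bigg).$$

Proposition 17. For any $N \ge 0$, $\Omega^{(2)} : L^N_* \to M^N_*$ is proper.

Proof. Let us consider the case where $L^N_* = L^N_{c,2k}$ with $|c| > 2$ and $k \ge 1$. (The other cases are treated similarly.) Given a compact subset $K \subset M^N_{c,2k}$, there exists $M \ge 1$ and, for any $\varepsilon > 0$, $n_\varepsilon \ge 1$ so that, for $q \in Q := \big(\Omega^{(2)}\big)^{-1}(K) \subseteq L_{c,2k}$,

$$\sum_{n\ge1} n^{3+2N}|I_n^{(2)}(q)| \le M; \tag{4.4}$$
$$\sum_{n\ge n_\varepsilon} n^{3+2N}|I_n^{(2)}(q)| \le \varepsilon. \tag{4.5}$$

By Lemma 27 (cf. Appendix B), there exists $n_1 = n_1(M, k, c) \ge k + 1$ such that for $n \ge n_1$,

$$I_n^{(2)}(q) \ge \frac{1}{\lambda_{2n}}\,\frac{1}{\pi}\int_{\lambda_{2n-1}}^{\lambda_{2n}} p_n(\mu)\,d\mu = \frac{1}{2\lambda_{2n}}\,I_n^{(1)} \ge \frac{1}{2(8\pi)^2}\,\frac{1}{\lambda_{2n}}\,\frac{\gamma_n^2}{n}.$$

By Lemma 26 (cf. Appendix B), there exist constants $C_j = C_j(M, c, k)$ ($j \ge 1$) such that, for all $q \in L_{c,2k}$, $\gamma_j(q) \le C_j$. Let $\Gamma_l \equiv \Gamma_l(q) := \big(\sum_{j\ge l} j^{2N}\gamma_j^2\big)^{1/2}$ and $D_k := \sum_{j=1}^{k} C_j$. Recall that the sum of the bands $\sum_{k=1}^{n}(\lambda_{2k-1} - \lambda_{2k-2})$ can be estimated by $n^2\pi^2$ (cf. [GT]). Then for $n \ge n_1$, using that $\lambda_0 < \lambda_{2k-1} < 0 < \lambda_{2k}$ (as $k \ge 1$), we obtain

$$\lambda_{2n} \le \lambda_{2n} - \lambda_0 \le \gamma_1 + \ldots + \gamma_n + n^2\pi^2 \le D_{n_1} + \sqrt{n}\,\Gamma_{n_1} + n^2\pi^2.$$

This leads to

$$M \ge \sum_{n\ge n_1} n^{3+2N}|I_n^{(2)}(q)| \ge \frac{1}{2(8\pi)^2}\sum_{n\ge n_1} n^{3+2N}\,\frac{1}{D_{n_1} + \sqrt{n}\,\Gamma_{n_1} + n^2\pi^2}\,\frac{1}{n}\,\gamma_n^2$$
$$\ge \frac{1}{2(8\pi)^2}\,\frac{1}{D_{n_1} + \Gamma_{n_1} + \pi^2}\sum_{n\ge n_1} n^{2N}\gamma_n^2 = \frac{1}{2(8\pi)^2}\,\frac{\Gamma_{n_1}^2}{D_{n_1} + \Gamma_{n_1} + \pi^2} \ge \frac{1}{2(8\pi)^2}\big(\Gamma_{n_1} - D_{n_1} - \pi^2\big),$$

hence $\Gamma_{n_1} \le 2(8\pi)^2 M + D_{n_1} + \pi^2$. Substitute this inequality into $\Gamma_1^2 \le n_1^{2N}D_{n_1}^2 + \Gamma_{n_1}^2$ to obtain, for $q \in Q$,

$$\Gamma_1^2 \le n_1^{2N}D_{n_1}^2 + \big(2(8\pi)^2 M + D_{n_1} + \pi^2\big)^2. \tag{4.6}$$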


Thus the set $\{(\gamma_n(q))_{n\ge1} \mid q \in Q\}$ is bounded in $h^N(\mathbb{N}; \mathbb{R})$. Using the same argument as above we obtain from (4.5), with $\breve n := \max(n_\varepsilon, n_1)$,

$$\varepsilon \ge \sum_{n\ge\breve n} n^{3+2N}|I_n^{(2)}(q)| \ge \frac{1}{2(8\pi)^2}\,\frac{1}{\Gamma_1(q) + \pi^2}\sum_{n\ge\breve n} n^{2N}\gamma_n(q)^2$$

and thus the set $\{(\gamma_n(q))_{n\ge1} \mid q \in Q\}$ is compact in $h^N(\mathbb{N}; \mathbb{R})$. Corollary 4 together with (4.4) implies that, for $q \in Q \subset L_{c,2k}$,

$$|[q]| \le M + f(q), \tag{4.7}$$

where $f(q)$ is the Casimir introduced in Corollary 4. By [GT] it then follows that $Q$ is compact in $H^N(S^1)$.

4.4. Local properties of $\Omega^{(2)}$. In this subsection we prove

Proposition 18. For any $N \ge 0$, the map $\Omega^{(2)} : L^N_* \to M^N_*$ is a local diffeomorphism, i.e. for any $q \in L^N_*$, $d_q\Omega^{(2)} : T_qL^N_* \to T_{\Omega^{(2)}(q)}M^N_*$ is bijective.

We prove Proposition 18 for $L^N_* = L^N_{c,2k}$ with $|c| > 2$, $k \ge 1$. (The other cases are treated similarly.) Notice that $T_{\Omega^{(2)}(q)}M^N_{c,2k} \cong h^{N+\frac32}(\mathbb{N}; \mathbb{R}^2)$. To show that $d_q\Omega^{(2)} : T_qL^N_{c,2k} \to h^{N+\frac32}(\mathbb{N}; \mathbb{R}^2)$ is invertible we prove that $d_q\Omega^{(2)}$ is a Fredholm operator of index 0 and is 1-1.

Lemma 19. The map $d_q\Omega^{(2)} : T_qL^N_{c,2k} \to h^{N+\frac32}(\mathbb{N}; \mathbb{R}^2)$ is a Fredholm operator of index 0.

Proof. In view of the definition of $\Omega_n^{(2)}$, for $n \in \mathbb{N} \setminus \{k\}$ and $p \in T_qL^N_{c,2k}$,

$$d_q\Omega_n^{(2)}(p) = \zeta_n(q)\,d_q\Omega_n^{(1)}(p) + d_q\zeta_n(p)\,\Omega_n^{(1)}(q). \tag{4.8}$$

Formula (4.8) allows us to extend $d_q\Omega^{(2)}$ to a map $d_q\Omega^{(2)} : H^N(S^1) \to h^{N+\frac32}(\mathbb{N}; \mathbb{R}^2)$. We first prove that $d_q\Omega^{(2)} : H^N(S^1) \to h^{N+\frac32}(\mathbb{N}; \mathbb{R}^2)$ is a Fredholm operator of index 1. Rewrite (4.8) as (with $n \in \mathbb{N} \setminus \{k\}$, $p \in H^N(S^1)$)

$$d_q\Omega_n^{(2)}(p) = \frac{1}{\sqrt2\,n\pi}\,d_q\Omega_n^{(1)}(p) + \Big(\zeta_n(q) - \frac{1}{\sqrt2\,n\pi}\Big)d_q\Omega_n^{(1)}(p) + d_q\zeta_n(p)\,\Omega_n^{(1)}(q).$$

Recall from [KM2] that $d_q\Omega^{(1)} : H^N(S^1) \to h^{N+\frac12}(\mathbb{N}; \mathbb{R}^2)$ is a Fredholm operator of index 1. By Lemma 15(B),

$$\Big|\zeta_n(q) - \frac{1}{\sqrt2\,n\pi}\Big| = O\Big(\frac{\log n}{n^2}\Big).$$

As $(\zeta_n(q))_{n\ne k} \in l^\infty$ is analytic we then conclude that $|d_q\zeta_n(p)| \le \|p\|\,O\big(\frac{\log n}{n^2}\big)$. Hence $d_q\Omega^{(2)} : H^N(S^1) \to h^{N+\frac32}(\mathbb{N}; \mathbb{R}^2)$ is a compact perturbation of $\big(\frac{1}{\sqrt2\,n\pi}\,d_q\Omega_n^{(1)}\big)_{n\ge1}$ and thus a Fredholm operator of index 1. As a consequence, $d_q\Omega^{(2)} : T_qL^N_{c,2k} \to h^{N+\frac32}(\mathbb{N}; \mathbb{R}^2)$ is a Fredholm operator of index 0.


To finish the proof of Proposition 18, it remains to prove N Lemma 20. The map dq (2) : Tq LN c,2k → T(2) (q) Mc,2k is 1-1.

Proof. It suffices to consider the case N = 0. Assume that, for some h ∈ Tq Lc,2k , dq (2) (h) = 0. Notice that ∂vn ∂wn (2) dq (h) = ,h ,h en + e−n , ∂q(x) ∂q(x) L2 L2 n≥1

(2) where en := (δk,n , 0)k≥1 and e−n := (0, δk,n )k≥1 . Introduce the sequence Fn (2)

defined by F0

:=

∂(0) ∂q(x) ,

n≥0

and

(2) F2n−1

:=

∂θn ∂q(x) , ∂vn ∂q(x) ,

n ∈ O; n ∈ O;

(2) F2n

:=

(2) Then, for n ≥ 0, h, Fn

(2)

∂In ∂q(x) , ∂wn ∂q(x) ,

n ∈ O; n ∈ O.

(2) = 0. We prove that h = 0 by showing that F is n L2 n≥0 (1) a complete system in L2 (S 1 ), i.e. span (Fn )n≥0 = L2 (S 1 ). We use themap (q) = (1)

1

(xn , yn )n≥1 . As dq (1) : L20 (S 1 ) → h 2 (N; R2 ) is 1-1, the system Fn (1) F0

n≥0

with

= 1 and, for n ≥ 1, (1) F2n−1

:=

∂θn ∂q(x) , ∂xn ∂q(x) ,

n ∈ O; n ∈ O;

(1) F2n

:=

(1)

∂In ∂q(x) , ∂yn ∂q(x) , (2)

n∈O n ∈ O (1)

is complete in L2 (S 1 ) (cf. [KM2]). Notice that, for n ∈ O, F2n−1 = F2n−1 . From (4.8) (2)

(1)

(2)

(1)

we conclude that, for n ∈ O, F2n−1 = ζn F2n−1 as well as F2n = ζn F2n with ζn = 0 (2) (2) (1) (1) (Lemma 15 ). By Lemma 5, span F0 , F2n | n ∈ O = span F0 , F2n | n ∈ O . Thus span Fn(2) | n ≥ 0 = span Fn(1) | n ≥ 0 = L2 (S 1 ). 4.5. Global properties of (2) . In this section we prove N Theorem 4. For N ≥ 0, the map (2) : LN ∗ → M∗ as well as its inverse is a real analytic diffeomorphism. N Proof. We have established that (2) : LN ∗ → M∗ is a real analytic map and a local (2) N diffeomorphism. It remains to show that : L∗ → MN ∗ is 1-1 and onto. Consider (2) −1 N (z) = 1}. Then V is open and closed in MN the set V := {z ∈ M∗ | F ∗ (as (2) is a local diffeomorphism) and proper. In order to show that V = MN it suffices ∗ therefore to prove that V = ∅. N N In the case where LN ∗ = Lc,n with |c| < 2 we take w = 0 ∈ Mc,n . Then, for any (2) −1 q∈ (0) and n ≥ 1, γn (q) = 0 (cf. (1.4)) and therefore q is a constant potential.


Using that a leaf LN c,n with |c| < 2, contains exactly one constant potential (cf. [KM1]) one concludes that 0 ∈ V which proves that V = MN c,n . N with |c| > 2 and k ≥ 1, it follows that γ (q) > 0 for In the case where LN = L k ∗ c,2k N all q ∈ L∗ and thus we have to argue differently. Introduce the sets gap{k} := {(vj , wj )j ≥1 | (vj , wj ) = (0, 0) for j = k}; Gapc,{k} := {q ∈ LN c,2k | γj (q) = 0 iff j = k}. and k ≥ 1),is 1-1. By (1.4)–(1.5), −1 gap{k} ⊆ Gapc,{k} . (2) Gapc,{k} ⊆ gap{k} and (2) Therefore it suffices to show that Gapc,{k} is non-empty and (2) , restricted to Gapc,{k} , is 1-1. By Subsect. 2.4 [KM1], Gapc,{k} is a 2-dimensional cylinder. Further, each isospectral set in Gapc,{k} is a circle, parametrized by the angle θk , and contains a unique normalized potential which is completely determined by its average. By Corollary 4, (2) [q] = 2πkIk + const on Gapc,{k} and hence the restriction of (2) to Gapc,{k} is 1-1. Thus ∅ = (2) Gapc,{k} ⊆ V which proves that V = MN c,n in this case as well. 4.6. Symplectic properties of (2) . In this section we prove N Theorem 5. For N ≥ 0 arbitrary, (2) : LN ∗ → M∗ is a symplectomorphism. (2) Proof. Let q ∈ LN ∗ and z := (q). Denote by (e±n )n≥1 the standard basis in 3 (2) N+ 2 2 h (N; R ), i.e. en := (δk,n , 0)k≥1 and e−n := (0, δk,n )k≥1 . Let u±n (q) be n≥1

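the basis of $T_qL^N_*$ given by

$$u^{(2)}_{\pm n}(q) \equiv u^{(2)}_{\pm n} := d_z\big(\Omega^{(2)}\big)^{-1}[e_{\pm n}].$$

Denote by $\omega_q^{(2)}$ the symplectic structure evaluated at $q$, induced by the restriction of the second Poisson structure to $L^N_*$. We have to prove that

$$\omega_q^{(2)}\big(u^{(2)}_{\pm n}, u^{(2)}_{\pm m}\big) = \mp\delta_{\pm n,\mp m}, \qquad n, m \ge 1. \tag{4.9}$$

It suffices to consider the case $N = 0$. Let us first verify (4.9) for $q \in L^2_* \cap \mathrm{Gap}_\infty$, where $\mathrm{Gap}_\infty := \{q \in L^2(S^1) \mid \gamma_k(q) > 0,\ \forall k \ge 1\}$. In view of Theorem 3, for any $n, m \ge 1$ and $q \in L^2_* \cap \mathrm{Gap}_\infty$,

$$\{v_n, v_m\}_2(q) = \{w_n, w_m\}_2(q) = 0; \qquad \{v_n, w_m\}_2(q) = \delta_{n,m}, \tag{4.10}$$

where we recall that $\Omega^{(2)}(q) = (v_n, w_n)_{n\ge1}$. For a functional $f : L^2(S^1) \to \mathbb{R}$ and $q \in L_*$, denote by $\delta_qf$ the orthogonal projection of $\frac{\partial f}{\partial q(x)}$ on $T_qL_*$. Notice that $L_q\big(\frac{\partial f}{\partial q(x)} - \delta_qf\big) = 0$. Take $q \in L^2_* \cap \mathrm{Gap}_\infty$. As $d_q\Omega^{(2)}\big(u^{(2)}_{\pm k}\big) = e_{\pm k}$ and

$$d_q\Omega^{(2)}(h) = \sum_{n\ge1}\Big(\Big\langle\frac{\partial v_n}{\partial q(x)}, h\Big\rangle_{L^2} e_n + \Big\langle\frac{\partial w_n}{\partial q(x)}, h\Big\rangle_{L^2} e_{-n}\Big),$$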


the system $(\delta_qv_k, \delta_qw_k)_{k\ge1}$ is a basis of $T_qL_*$, biorthogonal to the basis $\big(u_k^{(2)}, u_{-k}^{(2)}\big)_{k\ge1}$ of $T_qL_*$. On the other hand, by (4.10), $(L_q\delta_qw_k, -L_q\delta_qv_k)_{k\ge1}$ is biorthogonal to $(\delta_qv_k, \delta_qw_k)_{k\ge1}$. Thus, for $k \ge 1$, $u_k^{(2)}(q) = L_q\delta_qw_k$ and $u_{-k}^{(2)}(q) = -L_q\delta_qv_k$. Using again (4.10) one concludes that, for $q \in L^2_* \cap \mathrm{Gap}_\infty$, (4.9) holds. As $L^2_* \cap \mathrm{Gap}_\infty$ is dense in $L_*$ and $\omega_q^{(2)}$ as well as $u^{(2)}_{\pm k}(q)$ ($k \ge 1$) depend continuously on $q$, we conclude that (4.9) holds for arbitrary $q \in L_*$.

Remark. It follows from the computations above that, for $k \ge 1$ and $q \in L_*$ arbitrary, $\delta_qw_k = M_qu_k^{(2)}$ and $\delta_qv_k = -M_qu_{-k}^{(2)}$, where $M_q$ is the canonical right inverse of $L_q$ (cf. [KM1]). Therefore we can improve on Theorem 3(iii) in the following way:

Corollary 21. For $n, m \ge 1$ and $q \in U_n \cap U_m$, $\{\theta_n, \theta_m\}_2 = 0$.

A. Appendix

In this appendix we state some well known results that are frequently used in this paper. The following lemma follows by direct computations.

Lemma 22. Let $f$ and $g$ be two solutions of $-y'' + q(x)y = \lambda y$ with $q \in H^1(S^1)$. Then $L_q(fg) = 2\lambda\frac{d}{dx}(fg)$.

Recall that, for $q \in L^2(S^1)$, $y_1(x, \lambda, q)$ and $y_2(x, \lambda, q)$ denote the fundamental solutions of $-y'' + q(x)y = \lambda y$ and $m_{ij}$ are given by $m_{11} := y_1(1, \lambda, q)$, $m_{21} := y_1'(1, \lambda, q)$, $m_{12} := y_2(1, \lambda, q)$, $m_{22} := y_2'(1, \lambda, q)$.

Lemma 23. (i) The functional $\Delta(\cdot, \cdot)$ is analytic on $\mathbb{C} \times L^2_{\mathbb{C}}(S^1)$.
(ii) For $q \in H^N(S^1)$ and $\lambda \in \mathbb{C}$, $\frac{\partial\Delta(\lambda, q)}{\partial q(x)}$ is in $H^{N+2}(S^1)$ and is given by
$$\frac{\partial\Delta(\lambda, q)}{\partial q(x)} = m_{12}\,y_1^2(x, \lambda, q) + (m_{22} - m_{11})\,y_1(x, \lambda, q)\,y_2(x, \lambda, q) - m_{21}\,y_2^2(x, \lambda, q).$$
(iii) For $q \in L^2(S^1)$, $L_q\frac{\partial\Delta(\lambda, q)}{\partial q(x)} = 2\lambda\frac{d}{dx}\frac{\partial\Delta(\lambda, q)}{\partial q(x)}$.

Lemma 24. (i) $\Delta^2(\lambda, q) - 4 = 4(\lambda_0 - \lambda)\prod_{n\ge1}\dfrac{(\lambda_{2n} - \lambda)(\lambda_{2n-1} - \lambda)}{n^4\pi^4}$.
(ii) $\dot\Delta(\lambda, q) = -\prod_{n\ge1}\dfrac{\dot\lambda_n - \lambda}{n^2\pi^2}$.
(iii) $y_2(1, \lambda, q) = \prod_{n\ge1}\dfrac{\mu_n - \lambda}{n^2\pi^2}$.

The following lemma is proved in [PT, p. 168].

Lemma 25. Suppose $z_m$, $m \ge 1$, is a sequence of complex numbers such that $z_m = m^2\pi^2 + O(1)$. Then, for each $n \ge 1$, the product $\prod_{m\ge1,\ m\ne n}\frac{z_m - \lambda}{m^2\pi^2}$ is an entire function of $\lambda$ such that

$$\prod_{m\ge1,\ m\ne n}\frac{z_m - \lambda}{m^2\pi^2} = \frac12(-1)^{n+1}\Big(1 + O\Big(\frac{\log n}{n}\Big)\Big)$$

uniformly for $\lambda = n^2\pi^2 + O(1)$.
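In the unperturbed special case $z_m = m^2\pi^2$ and $\lambda = n^2\pi^2$, the product in Lemma 25 reduces to $\prod_{m\ne n}(1 - n^2/m^2)$, which equals $\frac12(-1)^{n+1}$ exactly. The sketch below evaluates a truncated version of this product as a sanity check of the sign and the factor $\frac12$; the truncation level `terms` is an arbitrary choice for this illustration and leaves a relative error of roughly $n^2/\text{terms}$.

```python
def truncated_product(n, terms=100000):
    """Product over 1 <= m <= terms, m != n, of (1 - n^2 / m^2)."""
    prod = 1.0
    for m in range(1, terms + 1):
        if m != n:
            prod *= 1.0 - (n * n) / (m * m)
    return prod

for n in (1, 2, 3, 5):
    # Compare with the limiting value (-1)^(n+1) / 2.
    print(n, truncated_product(n), 0.5 * (-1) ** (n + 1))
```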


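B. Appendix

In this appendix we prove estimates needed in Subsect. 4.3. Recall that, for $\lambda_{2n-1} \le \mu \le \lambda_{2n}$, we have introduced

$$p_n(\mu) \equiv p_n(\mu, q) := \mathrm{arccosh}\Big(\frac{(-1)^n\Delta(\mu, q)}{2}\Big).$$

Define

$$h_n \equiv h_n(q) := p_n(\dot\lambda_n(q), q) \quad \Big(= \max_{\lambda_{2n-1}\le\mu\le\lambda_{2n}} p_n(\mu, q)\Big).$$

Lemma 26. Let $c \in \mathbb{R}$, $k \ge 0$, $M > 0$. Then, for any $n \ge 1$, there exists a constant $C_n = C_n(M, k, c) > 0$ so that for $q \in L_{c,k}$ with

$$\sum_{j=1}^{\infty} (2\pi j)^3\,|I_j^{(2)}(q)| \le M, \tag{B.1}$$

the following two estimates hold:

(i) $\gamma_n(q) \le C_n$, $n \ge 1$;
(ii) $I_n^{(1)}(q) \le C_n$, $n \ne k/2$.

Proof. (i) Case $[|c| < 2]$: for any $j \ge 1$, the actions $I_j^{(2)}$ are given by $I_j^{(2)} = \frac{1}{\pi}\int_{\lambda_{2j-1}}^{\lambda_{2j}} \frac{1}{\mu}\,p_j(\mu)\,d\mu$. Let $q \in L_{c,k}$ with (B.1) and $n \ge 1$. If $h_n(q) \le 1$,

$$\gamma_n \le 4\max\{2\pi n h_n, h_n^2\}$$

(cf. [GT, Theorem 3.3]) and thus $\gamma_n \le 8\pi n$. If $h_n(q) > 1$ (hence $\lambda_{2n-1} < \lambda_{2n}$) we consider the set $A_n := \{\lambda_{2n-1} \le \mu \le \lambda_{2n} \mid p_n(\mu) \le 1\}$. From [GT, Theorem 3.3] and the maximum principle, we obtain

$$l_n := \mathrm{meas}(A_n) \le 4\max\{2\pi n, 1\} = 8\pi n. \tag{B.2}$$

As $|c| < 2$, there exists $n_0 \ge 1$ with $\lambda_{2n_0-2} < 0 < \lambda_{2n_0-1}$. By assumption (B.1), we obtain, for $n \ge n_0$,

$$\frac{M}{n^3} \ge I_n^{(2)} \ge \frac{1}{\pi}\int_{\lambda_{2n-1}+l_n}^{\lambda_{2n}} \frac{1}{\mu}\,d\mu = \frac{1}{\pi}\log\frac{\lambda_{2n-1} + \gamma_n}{\lambda_{2n-1} + l_n}$$

or

$$\gamma_n \le (\lambda_{2n-1} + l_n)\exp\Big(\frac{\pi M}{n^3}\Big) - \lambda_{2n-1} \le \lambda_{2n-1}\Big(\exp\Big(\frac{\pi M}{n^3}\Big) - 1\Big) + 8\pi n\exp\Big(\frac{\pi M}{n^3}\Big). \tag{B.3}$$

Using the a priori estimates on the bands, $\lambda_{2j+1} - \lambda_{2j} \le (2j + 1)\pi^2$ (cf. [GT]) we get

$$\lambda_{2n-1} \le \sum_{j=n_0}^{n-1}\gamma_j + n^2\pi^2. \tag{B.4}$$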


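Combining (B.3) and (B.4) leads to ($n \ge n_0$)

$$\gamma_n \le \Big(n^2\pi^2 + \sum_{j=n_0}^{n-1}\gamma_j\Big)\Big(\exp\Big(\frac{\pi M}{n^3}\Big) - 1\Big) + 8\pi n\exp\Big(\frac{\pi M}{n^3}\Big).$$

Similarly, for $n < n_0$, one obtains

$$\gamma_n \le \Big(n_0^2\pi^2 + \sum_{j=n+1}^{n_0-1}\gamma_j\Big)\Big(\exp\Big(\frac{\pi M}{n^3}\Big) - 1\Big) + 8\pi n\exp\Big(\frac{\pi M}{n^3}\Big).$$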

By induction we then obtain the claimed estimate in the case [|c| < 2]. The case [c ≥ 2, k = 0] is treated similarly as the case [|c| < 2], once one has obtained an estimate for λ0 (q): If c ≥ 2 and k = 0, then 0 ≤ λ0 . By the variational characterization of the eigenvalue λ0 , λ0 (q) ≤ [q] and, by Corollary 4, [q] ≤ M + f (q). Thus, for any 2 2 n ≥ 1, 0 ≤ λ2n−1 ≤ λ2n−1 − λ0 + [q] ≤ n−1 j =1 γj + n π + M + f (q) and these estimates lead to bounds of γn as above. In the case [c > 2, k = 2m ≥ 2] as well as [c = ±2, k = 2m ≥ 2] one argues again similarly as in the case [|c| < 2] : Consider e.g. [c > 2, k = 2m ≥ 2]. Only the estimate λ2m 1 (2) (2) for γm is different as λ2m−1 < 0 < λ2m and Im is given by Im = π1 λ2m−1 µ (pm (µ) − (−1)m c pm (0))dµ. Notice that for q ∈ Lc,k , pm (0) = arccosh is independent of 2 q. If hm (q) ≤ pm (0) + 1, then γm ≤ 4 max{2π m(pm (0) + 1), (pm (0) + 1)2 }. If hm (q) > pm (0) + 1, introduce Am := {λ2m−1 ≤ µ ≤ λ2m | pm (µ) ≤ pm (0) + 1}. The function pm (µ) is increasing on (λ2m−1 , λ˙ m ) and decreasing on (λ˙ m , λ2m ), hence Am = [λ2m−1 , xm ] ∪ [ym , λ2m ], where pm (xm ) = pm (ym ) = pm (0) + 1. Notice that either 0 ≤ xm < ym or xm < ym ≤ 0. Let lm = meas(Am ). Again by [GT, Theorem 3.3] and the maximum principle, lm ≤ 4 max{2π m(pm (0) + 1), (pm (0) + 1)2 }. Thus, if γm ≤ 4lm , γm is bounded. If γm − 4lm > 0 we argue as follows: Assume that 0 ≤ xm < ym . (The case xm < ym ≤ 0 is treated similarly.) Then 1 xm 1 Im(2) = (B.5) (pm (µ) − pm (0))dµ π λ2m−1 µ 1 λ2m 1 1 ym 1 (pm (µ) − pm (0))dµ + (pm (µ) − pm (0))dµ. + π xm µ π ym µ m (0) The integrals on the right hand side of (B.5) are estimated separately: As pm (µ)−p ≥0 µ for λ2m−1 ≤ µ < 0 and for 0 < µ ≤ xm , we have xm 1 (pm (µ) − pm (0))dµ ≥ 0. λ2m−1 µ

Using that ym − xm = γm − lm and ym < γm one obtains 1 γm dµ 1 γm 1 ym 1 1 ym dµ (pm (µ) − pm (0))dµ ≥ ≥ = log . π xm µ π xm µ π lm µ π lm


As γm − lm = ym − xm , 0 ≤ xm ≤ ym , and 4lm ≤ γm in the case considered, 2lm ≤ γm − 2lm ≤ γm − lm < λ2m . This leads to

λ2m 1 dµ (pm (µ) − pm (0))dµ ≥ −pm (0) µ µ ym ym

γm −lm dµ lm 3 ≥ −pm (0) log = −pm (0) log 1 + . ≥ −pm (0) γm − 2lm 2 γm −2lm µ

0≥

λ2m

Substituting these estimates into (B.5) yields

1 1 3 γm − pm (0) log Im(2) ≥ log π lm π 2 or γm ≤ e

(2)

π|Im |

pm (0) 3 lm , 2

where lm can be estimated as above. (1) (ii) Notice that, for n = k/2, 0 ∈ [λ2n−1 , λ2n ] and therefore In (q) ≤ 2(|λ2n | + (2) |λ2n−1 |)|In (q)| and, according to Corollary 4, |[q]| ≤ M + f (q) for q ∈ Lc,k . Using that for q ∈ Lc,k , λk > 0 and hence |λj | ≤ max {λj − λ0 + |[q]|, λk − λ0 }, and using the same estimate for λ2n−1 − λ0 as above λ2n−1 − λ0 ≤ n2 π 2 +

n

γj

j =1

one obtains, with an appropriate choice of $C_n = C_n(M, k, c)$, the claimed estimate (ii).

Lemma 27. Under the same assumption as in Lemma 26, there exists $n_1 = n_1(M, k, c)$ so that for $n \ge n_1$ and $q \in L_{c,k}$,

(i) $\gamma_n(q) \le 18n\pi$;
(ii) $I_n^{(1)}(q) \ge \dfrac{1}{(8\pi)^2}\,\dfrac{1}{n}\,\gamma_n(q)^2$.

Proof. (i) Let $q \in L_{c,k}$ and choose $n_0$ sufficiently large so that for $n \ge n_0$, $\exp\big(\frac{\pi M}{n^3}\big) - 1 < \frac{1}{n^{5/2}}$. Then (B.3) leads to $\gamma_n \le \frac{\lambda_{2n-1}}{n^{5/2}} + 16\pi n$. Use $\lambda_0 \le [q]$ to see that $|\lambda_{2n-1}| \le \lambda_{2n-1} - \lambda_0 + |[q]| \le n^2\pi^2 + \sum_{j=1}^{n-1}\gamma_j + |[q]|$. By Corollary 4, $|[q]| \le M + f(q)$ and thus, for $n \ge n_0$ (with $n_0$ sufficiently large),

$$\gamma_n \le \frac{1}{n^{5/2}}\sum_{j=1}^{n-1}\gamma_j + 17\pi n. \tag{B.6}$$

Define $\breve\gamma_j := \gamma_j - 17\pi j$ to conclude from (B.6) that for $n \ge n_0$,

$$\breve\gamma_n \le \frac{1}{n^{5/2}}\Big(\breve\gamma_1 + \ldots + \breve\gamma_{n-1} + 17\pi\frac{n^2}{2}\Big) \le \max_{1\le l\le n-1}\breve\gamma_l + 1.$$

Arguing inductively, we then conclude that

$$\breve\gamma_{n_0+j} \le \max_{1\le l\le n_0}\breve\gamma_l + j$$

and statement (i) follows if we choose $n_1 \ge n_0$ sufficiently large.

(ii) By [BBGK, Lemma 2.2],

$$I_n^{(1)}(q) \ge \frac{n\gamma_n}{(8\pi)^2}\min\Big\{1, \frac{\gamma_n}{n^2}\Big\} = \frac{1}{(8\pi)^2}\min\Big\{n\gamma_n, \frac1n\gamma_n^2\Big\}.$$

By choosing $n_1$, if necessary, even larger so that $n_1 \ge 18\pi$, statement (ii) follows then from (i) for $n \ge n_1$.

Acknowledgement. It is a pleasure to thank B. Khesin for helpful discussions. Our work got started by a question of S.P. Novikov concerning the existence of (global) action-angle variables of KdV with respect to the second Poisson structure.

References

[BBGK] Bättig, D., Bloch, A., Guillot, J.-C. and Kappeler, T.: On the symplectic structure of the phase space for periodic KdV, Toda and defocusing NLS. Duke Math. J. 79, 549–604 (1995)
[BKM1] Bättig, D., Kappeler, T. and Mityagin, B.: On the Korteweg–deVries equation: convergent Birkhoff normal form. J. Funct. Anal. 140, 335–358 (1996)
[BKM2] Bättig, D., Kappeler, T. and Mityagin, B.: On the Korteweg–deVries equation: frequencies and initial value problem. Pacific J. Math. 181, 1–55 (1997)
[Du] Duistermaat, J.J.: On global action-angle coordinates. CPAM 33, 687–706 (1980)
[FM] Flaschka, H. and McLaughlin, D.: Canonically conjugate variables for the Korteweg–deVries equation and Toda lattice with periodic boundary conditions. Progress of Theor. Phys. 55, 438–456 (1976)
[GT] Garnett, J., Trubowitz, E.: Gaps and bands of one dimensional Schrödinger operators. Comment. Math. Helv. 59, 258–312 (1984)
[GZ] Gelfand, I.M., Zakharevich, I.S.: Spectral theory of a pencil of skew-symmetric differential operators of third order on $S^1$. Funct. Anal. and its Appl. 23, 1–11 (1989)
[Ka] Kappeler, T.: Fibration of the phase-space for the Korteweg–deVries equation. Ann. Inst. Fourier 41, 539–575 (1991)
[KM1] Kappeler, T., Makarov, M.: On the symplectic foliation induced by the second Poisson bracket for KdV. In: Symmetry and Perturbation Theory, eds. D. Bambusi, G. Gaeta, Quaderni del Con. Naz. delle Ric., Fisica Matematica, n. 54, 1998, pp. 132–152
[KM2] Kappeler, T., Makarov, M.: On Birkhoff coordinates for KdV. Preprint
[KZ] Khesin, B., Zakharevich, I.S.: Poisson-Lie group of pseudodifferential symbols. Commun. Math. Phys. 171, 475–530 (1995)
[Ku] Kuksin, S.: Nearly Integrable Infinite-Dimensional Hamiltonian Systems. Lecture Notes Math. 1556, Berlin: Springer-Verlag, 1993
[Ma] Magri, F.: A simple model of the integrable Hamiltonian equation. J. Math. Phys. 19, 1156–1162 (1978)
[Mc] McKean, H.P.: Compatible brackets in Hamiltonian Mechanics. In: Important Developments in Soliton Theory, eds. A.S. Fokas, V.E. Zakharov, Berlin–Heidelberg–New York: Springer, 1993
[MT] McKean, H.P., Trubowitz, E.: Hill's operator and hyperelliptic function theory in the presence of infinitely many branch points. CPAM 24, 143–226 (1976)
[MV] McKean, H.P., Vaninsky, K.L.: Action-angle variables for the cubic Schrödinger equation. CPAM 50, 489–562 (1997)
[NV] Novikov, S.P., Veselov, A.P.: Poisson brackets and complex tori. Proc. Steklov Inst. Math. 165, 53–65 (1985)
[PT] Pöschel, J., Trubowitz, E.: Inverse Spectral Theory. San Diego: Academic Press, 1987

Communicated by Ya. G. Sinai

Commun. Math. Phys. 214, 679 – 703 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

A Fleming–Viot Particle Representation of the Dirichlet Laplacian

Krzysztof Burdzy¹, Robert Hołyst², Peter March³

¹ Department of Mathematics, University of Washington, Box 354350, Seattle, WA 98195-4350, USA.
² Institute of Physical Chemistry, Polish Academy of Sciences, Dept. III, Kasprzaka 44/52, 01224 Warsaw, Poland.
³ Department of Mathematics, Ohio State University, Columbus, OH 43210, USA.

Received: 11 November 1999 / Accepted: 19 May 2000

Abstract: We consider a model with a large number N of particles which move according to independent Brownian motions. A particle which leaves a domain D is killed; at the same time, a different particle splits into two particles. For large N , the particle distribution density converges to the normalized heat equation solution in D with Dirichlet boundary conditions. The stationary distributions converge as N → ∞ to the first eigenfunction of the Laplacian in D with the same boundary conditions. 1. Introduction Our article is closely related to a model studied by Burdzy, Hołyst, Ingerman and March (1996) using heuristic and numerical methods. Although we are far from proving conjectures stated in that article, the present paper seems to lay solid theoretical foundations for further research in this direction. The model is related to many known ideas in probability and physics – we review them in the Appendix (Sect. 3). We present the model and state our main results in this section. Section 2 is entirely devoted to proofs. We will be concerned with a multiparticle process. The motion of an individual particle will be represented by Brownian motion in an open subset of Rd . Probably all our results can be generalized to other processes. However, the present paper is motivated by the article of Burdzy, Hołyst, Ingerman and March (1996) whose results are very specific to Brownian motion and so we will limit ourselves to this special case. We note that Theorem 1.1 below uses only the strong Markov property of the process representing a particle and the continuity of the density of the hitting time of a set. Theorem 1.2 is similarly easy to generalize. At the other extreme, the proof of Theorem 1.4 uses Brownian properties in an essential way and would be hard to generalize. It might be therefore of some interest to see if Theorem 1.4 holds for a large class of processes. 
Research partially supported by NSF grant DMS-9700721, KBN grant 2P03B12516 and Maria Curie-Sklodowska Joint Fund II.


Consider an open set D ⊂ Rd and an integer N ≥ 2. Let Xt = (Xt1 , Xt2 , . . . , XtN ) be a process with values in D N whose evolution can be described as follows. Suppose X0 = (x 1 , x 2 , . . . , x N ) ∈ D N . The processes Xt1 , Xt2 , . . . , XtN evolve as independent Brownian motions until the first time τ1 when one of them, say, X j hits the boundary of D. At this time one of the remaining particles is chosen in a uniform way, say, Xk , and the process Xj jumps at time τ1 to Xτk1 . The processes Xt1 , Xt2 , . . . , XtN continue as independent Brownian motions after time τ1 until the first time τ2 > τ1 when one of them hits ∂D. At the time τ2 , the particle which approaches the boundary jumps to the current location of a randomly (uniformly) chosen particle among the ones strictly inside D. The subsequent evolution of the process Xt proceeds along the same lines. Before we start to study properties of Xt , we have to check if the process is well defined. Since the distribution of the hitting time of ∂D has a continuous density, only one particle can hit ∂D at time τk , for every k, a.s. However, the process Xt can be defined for all t ≥ 0 using the informal recipe given above only if τk → ∞ as k → ∞. This is because there is no obvious way to continue the process Xt after the time τ∞ = limk→∞ τk if τ∞ < ∞. Hence, the question of the finiteness of τ∞ has a fundamental importance for our model. Theorem 1.1. We have limk→∞ τk = ∞ a.s. Consider an open set D which has more than one connected component. If at some time t all processes Xtk belong to a single connected component of D then they will obviously stay in the same component from then on. Will there be such a time t? The answer is yes, according to the theorem below, and so we could assume without loss of generality that D is a connected set, especially in Theorem 1.4. Theorem 1.2. With probability 1, there exists t < τ∞ such that all processes Xtk belong to a single connected component of D at time t. 
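The informal recipe above can be turned into a small simulation. The sketch below is only an approximation under stated assumptions: it takes $D = (0, 1) \subset \mathbb{R}$, replaces Brownian motion by an Euler scheme with Gaussian increments of variance `dt`, and detects exits only at grid times, so several particles may leave $D$ in the same step (which, for the continuous-time process, happens with probability zero). The function name, the initial distribution and all parameter values are hypothetical choices for illustration.

```python
import math
import random

def simulate_fleming_viot(n_particles=50, t_end=0.5, dt=1e-3, seed=0):
    """Euler-scheme sketch of the particle system on D = (0, 1).

    Returns the final particle positions and the number of jumps performed.
    """
    rng = random.Random(seed)
    positions = [rng.uniform(0.25, 0.75) for _ in range(n_particles)]
    step_sd = math.sqrt(dt)
    n_jumps = 0
    for _ in range(int(round(t_end / dt))):
        # Independent Brownian increments for all particles.
        positions = [x + step_sd * rng.gauss(0.0, 1.0) for x in positions]
        # Particles that are still strictly inside D after this step.
        inside = [i for i, x in enumerate(positions) if 0.0 < x < 1.0]
        if not inside:
            raise RuntimeError("all particles left D in one step; reduce dt")
        for j, x in enumerate(positions):
            if not (0.0 < x < 1.0):
                # The exiting particle jumps to a uniformly chosen particle inside D.
                positions[j] = positions[rng.choice(inside)]
                n_jumps += 1
    return positions, n_jumps

final_positions, jumps = simulate_fleming_viot()
print(len(final_positions), jumps)
```

Because an exiting particle is always relocated onto a particle inside $D$, the number of particles is conserved and every position returned lies in $(0, 1)$.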
Before we continue the presentation of our results, we will provide a slightly more formal description of the process $X_t$ than that at the beginning of the introduction. The fully rigorous definition would be a routine but tedious task and so it is left to the reader.

One can show that given $(x^1, x^2, \ldots, x^N) \in D^N$, there exists a strong Markov process $X_t$, unique in the sense of distribution, with the following properties. The process starts from $X_0 = (x^1, x^2, \ldots, x^N)$, a.s. Let

$$\tau_1 = \inf_{1\le m\le N}\ \inf\{t > 0 : \lim_{s\to t-} X_s^m \in D^c\},$$

and for $n \ge 1$,

$$\tau_{n+1} = \inf_{1\le m\le N}\ \inf\{t > \tau_n : \lim_{s\to t-} X_s^m \in D^c\}.$$

Then $\tau_{n+1} > \tau_n$ for every $n \ge 1$, a.s. For every $n \ge 1$, there exists a unique $k_n$ such that $\lim_{s\to\tau_n-} X_s^{k_n} \in D^c$, a.s. We have $X^m_{\tau_n} = X^m_{\tau_n-}$, for every $m \ne k_n$. For some random $j = j(n, k_n) \ne k_n$ we have $X^{k_n}_{\tau_n} = X^j_{\tau_n}$. The distribution of $j(n, k_n)$ is uniform on the set $\{1, 2, \ldots, N\} \setminus \{k_n\}$ and independent of $\{X_t, 0 \le t < \tau_n\}$. For every $n$, the process $\{X_{(t\wedge\tau_{n+1})-}, t \ge \tau_n\}$ is a Brownian motion on $D^N$ stopped at the hitting time of $\partial D^N$.

Let $P_t^D(x, dy)$ be the transition probability for the Brownian motion killed at the time of hitting of $D^c$. Given a probability measure $\mu_0(dx)$ on $D$, we define measures $\mu_t$ for $t > 0$ by

$$\mu_t(A) = \frac{\int_D P_t^D(x, A)\,\mu_0(dx)}{\int_D P_t^D(x, D)\,\mu_0(dx)}, \tag{1.1}$$

Fleming–Viot Particle Representation of the Dirichlet Laplacian


for open sets $A \subset D$. Note that $\mu_t$ is a probability measure, for every $t \ge 0$. Let $\delta_x(dy)$ be the probability measure with the unit atom at $x$. We will write $\mathcal{X}_t^N(dy) = (1/N)\sum_{k=1}^N \delta_{X_t^k}(dy)$ to denote the empirical (probability) distribution representing the particle process $X_t$. We will say that $D$ has a regular boundary if for every $x \in \partial D$, Brownian motion starting from $x$ hits $D^c$ for arbitrarily small $t > 0$, a.s.

Theorem 1.3. Suppose that $D$ is bounded and has a regular boundary. Fix a probability distribution $\mu_0$ on $D$ and recall the definition (1.1). Suppose that for every $N$, the initial distribution $\mathcal{X}_0^N$ is a non-random measure $\mu_0^N$. If the measures $\mu_0^N$ converge as $N \to \infty$ to $\mu_0$ then for every fixed $t > 0$ the empirical distributions $\mathcal{X}_t^N$ converge to $\mu_t$ in the sense that for every set $A \subset D$, the sequence $\mathcal{X}_t^N(A)$ converges to $\mu_t(A)$ in probability.

The regularity of $\partial D$ seems to be only a technical assumption, i.e., Theorem 1.3 is likely to hold without this assumption. We conjecture that for any $S > 0$, the measure-valued processes $\{\mathcal{X}_t^N(\,\cdot\,), 0 \le t \le S\}$ converge to $\{\mu_t(\,\cdot\,), 0 \le t \le S\}$ in the Skorohod topology, as $N \to \infty$. The arguments presented in this paper do not seem to be sufficient to justify this claim.

One may wonder whether $E\mathcal{X}_t^N(A) = \mu_t(A)$ for all sets $A$ and $t > 0$ if we assume that $\mathcal{X}_0^N = \mu_0$. One can find intuitive arguments both for and against this claim but none of them seemed to be quite clear to us. We will have to resort to brute calculation to show that the statement is false. The word “brute” refers only to the lack of a clear intuitive explanation and not to the difficulty of the example, which is in fact quite elementary (see Example 2.1 in Sect. 2). The example is concerned with a process on a finite state space. We presume that a similar example can be based on the Brownian motion process.
We will say that an open set $D \subset \mathbb{R}^d$ satisfies the interior ball condition if for some $r > 0$ and every $x \in D$ there exists an open ball $B(y, r) \subset D$ such that $x \in B(y, r)$.

Theorem 1.4. Suppose that $D \subset \mathbb{R}^d$ is a bounded domain, has a regular boundary and satisfies the interior ball condition.

(i) For every $N$, there exists a unique stationary probability measure $M^N$ for $X_t$. The process $X_t$ converges to its stationary distribution exponentially fast, i.e., there exists $\lambda > 0$ such that for every $A \subset D^N$, and every $x \in D^N$,

\[ \lim_{t\to\infty} e^{\lambda t}\, |P^x(X_t \in A) - M^N(A)| = 0. \]

(ii) Let $\mathcal{X}_M^N$ be the stationary empirical measure, i.e., let $\mathcal{X}_M^N$ have the same distribution as $(1/N)\sum_{k=1}^N \delta_{X_t^k}(dy)$, assuming that $X_t$ has the distribution $M^N$. Let $\varphi(x)$ be the first eigenfunction for the Laplacian in $D$ with the Dirichlet boundary conditions, normalized so that $\int_D \varphi = 1$. Then the random measures $\mathcal{X}_M^N$ converge as $N \to \infty$ to the (non-random) measure with density $\varphi(x)$, in the sense of weak convergence of random measures.
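The normalization in (1.1) and the limit in Theorem 1.4 (ii) can be illustrated on a discrete analogue. The sketch below is an assumption-laden toy, not part of the paper: it replaces killed Brownian motion on D by a lazy nearest-neighbour random walk on {1, ..., n} killed when it steps off either end, and compares the law conditioned on survival with the normalized first eigenvector, which for this walk is proportional to sin(πi/(n+1)). The function names are hypothetical.

```python
import math


def conditioned_law(mu0, steps):
    """Law of the killed lazy walk after `steps` steps, conditioned on survival."""
    mu = list(mu0)
    n = len(mu)
    for _ in range(steps):
        new = [0.0] * n
        for i, mass in enumerate(mu):
            new[i] += 0.5 * mass              # stay put with probability 1/2
            if i > 0:
                new[i - 1] += 0.25 * mass     # step left (mass leaving site 1 is killed)
            if i < n - 1:
                new[i + 1] += 0.25 * mass     # step right (mass leaving site n is killed)
        total = sum(new)                      # one-step survival probability
        mu = [v / total for v in new]         # renormalize, in the spirit of (1.1)
    return mu


def first_eigenvector(n):
    """First eigenvector of the killed walk, normalized to total mass 1."""
    phi = [math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n)]
    total = sum(phi)
    return [v / total for v in phi]


if __name__ == "__main__":
    n = 9
    mu = conditioned_law([1.0 / n] * n, steps=400)
    phi = first_eigenvector(n)
    print(max(abs(a - b) for a, b in zip(mu, phi)))
```

Renormalizing after every step gives the same result as dividing once at the end, because the killed transition operator is linear. The laziness is only there to avoid the parity issue of the simple random walk.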


K. Burdzy, R. Hołyst, P. March

2. Proofs

This section is devoted to proofs of the main results. It also contains an example related to Theorem 1.3.

Proof of Theorem 1.1. Fix an arbitrary $S < \infty$. Let $B_t$ be a Brownian motion, and

\[ h(x, t) = P(\inf\{s > t : B_s \notin D\} > S \mid B_t = x). \]

We will first prove the theorem for $N = 2$ as this special case presents the main idea of the proof in a clear way. Let $M_t = h(X^1_{t-}, t) + h(X^2_{t-}, t)$. Consider an arbitrary $y \in D$ and assume that $X_0^1 = X_0^2 = y$. Let $a = h(y, 0)$ and $\tau_* = \tau_1 \wedge S$. An application of the optional stopping theorem to the martingale $M_{t\wedge\tau_*}$ gives

\[
\begin{aligned}
2h(y, 0) = EM_0 = EM_{\tau_*} &= E(M_{\tau_*} \mid \tau_* = S)P(\tau_* = S) + E(M_{\tau_*} \mid \tau_* < S)P(\tau_* < S) \\
&= 2 \cdot P(\tau_* = S) + E(M_{\tau_*} \mid \tau_1 < S)P(\tau_1 < S) \\
&= 2 \cdot h(y, 0)^2 + E(M_{\tau_*} \mid \tau_1 < S)(1 - h(y, 0)^2).
\end{aligned}
\]

From this we obtain

\[ E(M_{\tau_*} \mid \tau_1 < S) = \frac{2h(y, 0) - 2h(y, 0)^2}{1 - h(y, 0)^2} = \frac{2h(y, 0)}{1 + h(y, 0)} \ge h(y, 0). \]

The process $X_t^k$ which hits $\partial D$ at time $\tau_1$ jumps to the location of $X^{3-k}_{\tau_1}$, so we have

\[ E(h(X^1_{\tau_1}, \tau_1) + h(X^2_{\tau_1}, \tau_1) \mid \tau_1 < S) \ge 2h(y, 0) = E(h(X_0^1, 0) + h(X_0^2, 0)). \]

By applying the strong Markov property at the stopping time $\tau_1$ we obtain

\[ E(h(X^1_{\tau_2}, \tau_2) + h(X^2_{\tau_2}, \tau_2) \mid \tau_2 < S) \ge E(h(X_0^1, 0) + h(X_0^2, 0)). \]

By induction, for all $k \ge 1$,

\[ E(h(X^1_{\tau_k}, \tau_k) + h(X^2_{\tau_k}, \tau_k) \mid \tau_k < S) \ge E(h(X_0^1, 0) + h(X_0^2, 0)) = 2a. \tag{2.1} \]

Let $J_k = h(X^1_{\tau_k}, \tau_k) + h(X^2_{\tau_k}, \tau_k)$. Since $h(x, t) \le 1$, we have $J_k \le 2$. Hence,

\[ E(J_k \mid \tau_k < S) \le 2P(J_k \ge a \mid \tau_k < S) + aP(J_k < a \mid \tau_k < S) = P(J_k \ge a \mid \tau_k < S)(2 - a) + a, \]

and so, using (2.1),

\[
\begin{aligned}
P(h(X^1_{\tau_k}, \tau_k) = h(X^2_{\tau_k}, \tau_k) \ge a/2 \mid \tau_k < S) &= P(J_k \ge a \mid \tau_k < S) \\
&\ge \frac{E(J_k \mid \tau_k < S) - a}{2 - a} \ge \frac{2a - a}{2 - a} = \frac{a}{2 - a}.
\end{aligned}
\]


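It follows that

\[
\begin{aligned}
P(\tau_{k+1} \ge S \mid \tau_k < S) &= P(\inf\{s > \tau_k : X^1_{s-} \notin D\} > S,\ \inf\{s > \tau_k : X^2_{s-} \notin D\} > S \mid \tau_k < S) \\
&= \int [P(\inf\{s > \tau_k : X^1_s \notin D\} > S \mid X^1_{\tau_k} = x)]^2\, P(X^1_{\tau_k} \in dx \mid \tau_k < S) \\
&= \int h(x, \tau_k)^2\, P(X^1_{\tau_k} \in dx \mid \tau_k < S) \\
&\ge (a/2)^2 \cdot \frac{a}{2 - a} = \frac{a^3}{8 - 4a}.
\end{aligned}
\]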
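As a sanity check on the arithmetic of the last bound, the following sketch evaluates the per-round escape probability a³/(8 − 4a) and the geometric decay that results when the same bound is applied over k successive rounds. The function names are hypothetical and the values of a are purely illustrative.

```python
def per_round_escape(a):
    """Lower bound (a/2)^2 * a/(2 - a) = a^3/(8 - 4a) from the N = 2 argument."""
    return a ** 3 / (8 - 4 * a)


def survival_bound(a, k):
    """Upper bound (1 - a^3/(8 - 4a))^k obtained from k successive rounds."""
    return (1 - per_round_escape(a)) ** k


if __name__ == "__main__":
    for a in (0.1, 0.5, 0.9):
        print(a, per_round_escape(a), survival_bound(a, 1000))
```

For any fixed a in (0, 1] the bound tends to 0 as k grows, although very slowly when a is small.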

This implies that

\[ P(\tau_{k+1} < S) = \prod_{j=1}^{k} P(\tau_{j+1} < S \mid \tau_j < S) \le \left(1 - \frac{a^3}{8 - 4a}\right)^k, \]

and so $P(\tau_\infty < S) = 0$. Recall that we have assumed that $X_0^1 = X_0^2$. If $X_0^1$ is not equal to $X_0^2$, we can apply the argument to the post-$\tau_1$ process to see that $P(\tau_\infty < S) = 0$ for every starting position of $X_t$. Since $S < \infty$ is arbitrarily large, the proof of the theorem is complete in the special case $N = 2$.

Now we generalize the argument to arbitrary $N \ge 2$. Recall $S$, $\tau_*$ and $h(x, t)$ from the first part of the proof. Let

\[ M_t = \sum_{k=1}^N h(X^k_{t-}, t), \]

and $a_k = h(X_0^k, 0)$. Then

\[
\begin{aligned}
\sum_{k=1}^N a_k = EM_0 = EM_{\tau_*} &= E(M_{\tau_*} \mid \tau_* = S)P(\tau_* = S) + E(M_{\tau_*} \mid \tau_* < S)P(\tau_* < S) \\
&= N \cdot P(\tau_* = S) + E(M_{\tau_*} \mid \tau_1 < S)P(\tau_1 < S) \\
&= N \prod_{k=1}^N a_k + E(M_{\tau_*} \mid \tau_1 < S)\left(1 - \prod_{k=1}^N a_k\right).
\end{aligned}
\]

From this we obtain

\[ E(M_{\tau_*} \mid \tau_1 < S) = \frac{\sum_{k=1}^N a_k - N\prod_{k=1}^N a_k}{1 - \prod_{k=1}^N a_k}. \tag{2.2} \]

Our immediate goal is to prove that the right hand side of (2.2) is bounded below by $((N - 1)/N)\sum_{k=1}^N a_k$.


The derivative of the function $x \to \left(\sum_{k=1}^N a_k - Nx\right)/(1 - x)$ is equal to

\[ \frac{\sum_{k=1}^N a_k - N}{(1 - x)^2}. \]

The derivative is non-positive since $\sum_{k=1}^N a_k \le N$. If we let $b = \sum_{k=1}^N a_k / N$ then for a fixed value of $b$, the value of the product $\prod_{k=1}^N a_k$ is maximized if we take $a_k = b$ for all $k$. These facts imply that

\[ \frac{\sum_{k=1}^N a_k - N\prod_{k=1}^N a_k}{1 - \prod_{k=1}^N a_k} \ge \frac{\sum_{k=1}^N a_k - N b^N}{1 - b^N} = \frac{Nb - N b^N}{1 - b^N} = Nb \cdot \frac{1 - b^{N-1}}{1 - b^N}. \tag{2.3} \]

We will show that

\[ Nb \cdot \frac{1 - b^{N-1}}{1 - b^N} \ge (N - 1)b, \tag{2.4} \]

for $b \in [0, 1)$. The last inequality is equivalent to $N(1 - b^{N-1}) \ge (N - 1)(1 - b^N)$. After multiplying out and regrouping the terms we obtain

\[ 1 + N b^N - N b^{N-1} - b^N \ge 0. \tag{2.5} \]

The function $f(b) = 1 + N b^N - N b^{N-1} - b^N$ has the derivative $f'(b) = N(N - 1)b^{N-2}(b - 1)$ which is negative for $b < 1$. Since $f(1) = 0$, we have $f(b) \ge 0$ for $b \in [0, 1)$, i.e., (2.5) holds. Consequently, (2.4) is true as well. Combining (2.2), (2.3) and (2.4) yields

\[ E(M_{\tau_*} \mid \tau_1 < S) = \frac{\sum_{k=1}^N a_k - N\prod_{k=1}^N a_k}{1 - \prod_{k=1}^N a_k} \ge (N - 1)b = \frac{N - 1}{N}\sum_{k=1}^N a_k. \]
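The chain (2.2)–(2.5) is elementary, and a quick numerical spot check may help in following it. The sketch below, with hypothetical function names and an arbitrary grid, evaluates f(b) from (2.5) and compares the right hand side of (2.2) with the claimed lower bound ((N − 1)/N) Σ a_k. It is a check on a few values, not a proof.

```python
import itertools


def f(b, n):
    """Left hand side of (2.5): 1 + N b^N - N b^(N-1) - b^N."""
    return 1 + n * b ** n - n * b ** (n - 1) - b ** n


def rhs_of_2_2(a):
    """Right hand side of (2.2) for a list a of values in [0, 1), not all equal to 1."""
    n = len(a)
    prod = 1.0
    for x in a:
        prod *= x
    return (sum(a) - n * prod) / (1 - prod)


def lower_bound(a):
    """Claimed lower bound ((N - 1)/N) * sum of a_k."""
    n = len(a)
    return (n - 1) / n * sum(a)


if __name__ == "__main__":
    for a in itertools.product((0.1, 0.5, 0.9), repeat=3):
        print(a, rhs_of_2_2(a), lower_bound(a))
```

On this grid the left column always dominates the right one, as the argument in the text predicts.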

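The process $X^k$ which hits the boundary at time $\tau_1$ jumps to the location of a process $X^j$, uniformly chosen from other processes. Hence,

\[
\begin{aligned}
E\left(\sum_{k=1}^N h(X^k_{\tau_*}, \tau_*) \,\Big|\, \tau_1 < S\right) &= \left(1 + \frac{1}{N - 1}\right) E\left(\sum_{k=1}^N h(X^k_{\tau_*-}, \tau_*) \,\Big|\, \tau_1 < S\right) \\
&= \frac{N}{N - 1}\, E(M_{\tau_*} \mid \tau_1 < S) \ge \frac{N}{N - 1} \cdot \frac{N - 1}{N}\sum_{k=1}^N a_k = \sum_{k=1}^N a_k.
\end{aligned}
\]

By induction and the strong Markov property applied at $\tau_k$’s, we have for every $k \ge 1$,

\[ E\left(\sum_{j=1}^N h(X^j_{\tau_k}, \tau_k) \,\Big|\, \tau_k < S\right) \ge \sum_{k=1}^N a_k = Nb. \tag{2.6} \]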


Let $J_k = \sum_{j=1}^N h(X^j_{\tau_k}, \tau_k)$. Since $h(x, t) \le 1$, we have $J_k \le N$. Recall that $b = (1/N)\sum_{k=1}^N a_k$. Hence,

\[ E(J_k \mid \tau_k < S) \le N P(J_k \ge b \mid \tau_k < S) + bP(J_k < b \mid \tau_k < S) = P(J_k \ge b \mid \tau_k < S)(N - b) + b. \]

This and (2.6) imply that

\[ P(J_k \ge b \mid \tau_k < S) \ge \frac{E(J_k \mid \tau_k < S) - b}{N - b} \ge \frac{Nb - b}{N - b}. \]

It follows that

\[ P(\exists j : h(X^j_{\tau_k}, \tau_k) \ge b/N \mid \tau_k < S) \ge \frac{Nb - b}{N - b}. \tag{2.7} \]

Fix some $t \in (0, S)$. Suppose that $h(X^j_t, t) \ge b/N$ for some $j$ and assume that $j$ is the smallest number with this property. Let $T = \inf\{s > t : h(X^j_s, s) \notin (b/(2N), 1)\}$. Note that $h(X^j_T, T) = 1$ if and only if $T = S$. The process $h(X^j_s, s)$ is a martingale on the interval $(t, T)$. By the martingale property and the optional stopping theorem, the probability of not hitting $b/(2N)$ before time $S$ is greater than or equal to

\[ \frac{b/N - b/(2N)}{1 - b/(2N)} = \frac{b}{2N - b}. \tag{2.8} \]

Consider the event $A$ that $h(X^j_t, t) \ge b/N$ at some time $t$ and that the process $h(X^j_s, s)$ does not hit $b/(2N)$ between $t$ and $S$. Given this event, for any $k \ne j$, the process $X^k$ may jump at most once before time $S$ with probability greater than $(1/(N - 1)) \cdot (b/(2N))$, independent of other processes $X^m$, $m \ne k, j$. To see this, observe that $X^k$ might not hit $\partial D$ before time $S$ at all; or it may hit $\partial D$, then jump to the location of $X^j$ with probability $1/(N - 1)$. If the jump takes place, the process $X^k$ lands at some time $u$ at a place where we have $h(X^k_u, u) = h(X^j_u, u) \ge b/(2N)$, because we are assuming that $A$ holds. The definition of the function $h$ now implies that after $u$, the process $X^k$ will not hit $\partial D$ before time $S$, with probability greater than $b/(2N)$.

Multiplying the probabilities for all $k \ne j$ and using (2.8), we conclude that if we have $h(X^j_t, t) \ge b/N$ then the probability that there will be at most $N - 1$ jumps (counting all particles) before time $S$ is greater than

\[ p_0 = \frac{b}{2N - b} \cdot \left(\frac{b}{2N(N - 1)}\right)^{N-1}. \]

Hence, in view of (2.7),

\[
\begin{aligned}
P(\tau_{k+N} \ge S \mid \tau_k < S) &\ge P(\exists j : h(X^j_{\tau_k}, \tau_k) \ge b/N \mid \tau_k < S)\, P(\tau_{k+N} > S \mid \exists j : h(X^j_{\tau_k}, \tau_k) \ge b/N) \\
&\ge \frac{Nb - b}{N - b} \cdot p_0.
\end{aligned}
\]

Thus

\[ P(\tau_{(m+1)N} < S) = \prod_{j=1}^{m} P(\tau_{(j+1)N} < S \mid \tau_{jN} < S) \le \left(1 - \frac{Nb - b}{N - b} \cdot p_0\right)^m, \]


and so

P (τ∞ < S) = 0.

Since S is arbitrarily large, the proof is complete.

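Proof of Theorem 1.2. Fix arbitrary points $x^j \in D$ and suppose that $X_0^j = x^j$ for all $j$. Let $\tau^j$ be the first jump time for the process $X^j$. Since there are $N!$ permutations of $\{1, 2, \dots, N\}$, there exists a permutation $(j_1, j_2, \dots, j_N)$ such that $P(\tau^{j_1} < \tau^{j_2} < \dots < \tau^{j_N}) \ge 1/N!$. In order to simplify the notation we will assume that $(j_1, j_2, \dots, j_N) = (1, 2, \dots, N)$. Thus we have $P(\tau^1 < \tau^2 < \dots < \tau^N) \ge 1/N!$ and

\[ P(\tau^1 < \tau^2 < \dots < \tau^N,\ X^1_{\tau^1} = X^N_{\tau^1}) \ge \frac{1}{N \cdot N!}. \]

Let $\tau_2^j$ denote the time of the second jump of process $X_t^j$. By independence,

\[
\begin{aligned}
&P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1} \mid \tau^1, \tau^2, \dots, \tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1}) \\
&\quad = P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \tau^N,\ X^1_{\tau^1} = X^N_{\tau^1} \mid \tau^1, \tau^2, \dots, \tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1}) \\
&\qquad \times P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1} \mid \tau^1, \tau^2, \dots, \tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1}) \\
&\quad = P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \tau^N,\ X^1_{\tau^1} = X^N_{\tau^1} \mid \tau^1, \tau^2, \dots, \tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1})^2.
\end{aligned}
\]

It follows that

\[
\begin{aligned}
&P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1}) \\
&\quad = E\,P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1} \mid \tau^1, \tau^2, \dots, \tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1}) \\
&\quad = E\left[P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \tau^N,\ X^1_{\tau^1} = X^N_{\tau^1} \mid \tau^1, \tau^2, \dots, \tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1})^2\right] \\
&\quad \ge \left[E\,P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \tau^N,\ X^1_{\tau^1} = X^N_{\tau^1} \mid \tau^1, \tau^2, \dots, \tau^{N-1},\ X^1_{\tau^1} = X^N_{\tau^1})\right]^2 \\
&\quad = P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \tau^N,\ X^1_{\tau^1} = X^N_{\tau^1})^2 \ge \frac{1}{(N \cdot N!)^2}.
\end{aligned}
\]

We proceed by induction. Let us display one induction step. We start with

\[ P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}) \ge \frac{1}{N} \cdot \frac{1}{(N \cdot N!)^2}. \]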


Then we observe that

\[
\begin{aligned}
&P(\tau^1 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1, \tau_2^2),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2} \mid \tau^1, \dots, \tau^{N-1}, \tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}) \\
&\quad = P(\tau^1 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2} \mid \tau^1, \dots, \tau^{N-1}, \tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}) \\
&\qquad \times P(\tau^1 < \dots < \tau^{N-1} < \min(\tau_2^2, \tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2} \mid \tau^1, \dots, \tau^{N-1}, \tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}) \\
&\quad = P(\tau^1 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2} \mid \tau^1, \dots, \tau^{N-1}, \tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})^2.
\end{aligned}
\]

From this we deduce that

\[
\begin{aligned}
&P(\tau^1 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1, \tau_2^2),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}) \\
&\quad = E\,P(\tau^1 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1, \tau_2^2),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2} \mid \tau^1, \dots, \tau^{N-1}, \tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2}) \\
&\quad = E\left[P(\tau^1 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2} \mid \tau^1, \dots, \tau^{N-1}, \tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})^2\right] \\
&\quad \ge \left[E\,P(\tau^1 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2} \mid \tau^1, \dots, \tau^{N-1}, \tau_2^1,\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})\right]^2 \\
&\quad = P(\tau^1 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2})^2 \ge \left(\frac{1}{N} \cdot \frac{1}{(N \cdot N!)^2}\right)^2.
\end{aligned}
\]

Proceeding in this way, we can prove that

\[ P(\tau^1 < \tau^2 < \dots < \tau^{N-1} < \min(\tau^N, \tau_2^1, \tau_2^2, \dots, \tau_2^{N-1}),\ X^1_{\tau^1} = X^N_{\tau^1},\ X^2_{\tau^2} = X^N_{\tau^2},\ \dots,\ X^{N-1}_{\tau^{N-1}} = X^N_{\tau^{N-1}}) \ge c_1, \tag{2.9} \]

where $c_1 > 0$ is a constant which depends on $N$ but not on the starting position of $X_t^k$’s. If the event in (2.9) occurs then at time $\tau^{N-1}$ all particles are present in the same connected component of $D$ as $X^N$. They will stay in this connected component of $D$ forever. If the event in (2.9) does not occur then we wait until the time $\max(\tau^N, \tau_2^1, \tau_2^2, \dots, \tau_2^{N-1})$ and restart our argument, using the strong Markov property. We can construct in this way a sequence of events whose conditional probabilities (given the outcomes of previous “trials”) are bounded below by $c_1$. With probability 1, at least one of these events will occur and so all particles will end up in a single connected component of $D$.

Proof of Theorem 1.3. Fix some $S \in (0, \infty)$. We will prove that $\mathcal{X}_S^N$ converges to $\mu_S$. Our proof will consist of three parts.


Part 1. In this part of the proof, we will define the tree of descendants of a particle and estimate its size. Fix an arbitrarily small $\varepsilon_1 > 0$. Let $B_t$ denote a Brownian motion and $T_A = \inf\{t > 0 : B_t \in A\}$. Find open subsets $A_1$, $A_2$ and $A_3$ of $D$ such that $A_1 \subset A_2 \subset A_3$, $\mu_0(A_1) \ge \varepsilon_1 > 0$, and for some $p_1, p_2 > 0$,

\[ \inf_{x \in A_1} P^x(T_{A_2^c} > S) > p_1 \quad \text{and} \quad \inf_{x \in A_2} P^x(T_{A_3^c} > S) > p_2. \]

We would like to set aside a small family of particles starting from A1 and containing about ε1 N particles. Since the measure µ0 may be purely atomic with all atoms greater than ε1 , we cannot designate the particles in that family just by their starting position. We have assumed that the measures X0N converge to µ0 , so we must have X0N (A1 ) > ε1 /2 for large N . Let [b] denote the integer part of a number b. For each sufficiently large N, we arbitrarily choose [N ε1 /2] particles with the property that their starting positions lie inside A1 . The choice is deterministic (non-random). The family of all N − [N ε1 /2] remaining particles will be called H. By the law of large numbers, for any p3 < 1 and sufficiently large N, more than N ε1 p1 /4 particles will stay inside A2 until time S, with probability greater than p3 . We will say that a particle has label k if its motion is represented by Xtk . We will identify the families Hc and H with the sets of labels so that we can write, for example, k ∈ Hc . Let F be the event that at least N1 = N ε1 p1 /4 particles from the family Hc stay inside A2 until time S. Consider the motion of a particle X k belonging to the H family, conditional on F . Given F , the probability that the particle lands on a particle from the family Hc at the time of a jump is not less than (N ε1 p1 /4)/(N − 1). If this event occurs, then the k th particle can stay within the set A3 until time S with probability p2 or higher. Hence, each jump of particle k has at least probability p2 (N ε1 p1 /4)/(N − 1) ≥ p2 ε1 p1 /4 ≡ p4 of being the last jump for this particle before time S. We see that the total number of jumps of X k before time S is stochastically bounded by the geometric distribution with mean 1/p4 . In the rest of Part 1, we will assume that F occurred, i.e., all the probabilities will be conditional probabilities given F , even if the conditioning is not reflected in the notation. 
We will now define a tree T m of particle trajectories representing descendants of particle m (see Fig. 1). Informally, the family of all descendants of a particle Xm can be described as the smallest family of points (t, n) with the following properties. The particle Xtm is its own descendant for all t, i.e., (t, m) ∈ T m for all t. If a particle X k jumps on a descendant of X m at time s then Xtk becomes a descendant of X m for all t ≥ s, i.e., (t, k) ∈ T m for all t ≥ s. We now present a more formal definition. We will say that (t, n) ∈ T m if there exists a set of pairs (sj , yj ), 0 ≤ j ≤ j0 , with (s0 , y0 ) = (0, x m ), (sj0 , yj0 ) = (t, Xtn ), such that there exist distinct k(j ) ∈ H, j ≥ 1, with the following properties: (i) (sj −1 , Xskj −1 ) = (sj −1 , yj −1 ) and (sj , Xskj ) = (sj , yj ) for all j , (ii) X k does not jump at time sj for 0 ≤ j ≤ j0 − 1, and (iii) k(0) = m and k(j0 ) = n. Note that the definition assumes that we use only pieces of trajectories of particles from the family H. Let Ktm be the set of all descendants of particle m until t, i.e., the set of all k such that (s, k) ∈ T m for some s ≤ t. The function t → Ktm is monotone. Fix some m ∈ H. Let α1k be the number of jumps made by particle k from the family H during the time interval [0, S] but before the first time Tmk when it becomes a descendant of m (if there is no such time, we count all jumps before S). Let α2k be the

number of jumps made after $T_m^k$ but before $S$ ($\alpha_2^k = 0$ if $k$ does not become a descendant of $m$ before $S$). It is easy to see that every random variable $\alpha_1^k$ and $\alpha_2^k$ is stochastically bounded by the geometric distribution with mean $1/p_4$; we can use the same argument as earlier in the proof. We will need a substantially stronger bound, though. It is not very hard to see that one can define our processes on a probability space which will also carry random variables $\tilde\alpha_1^k$ and $\tilde\alpha_2^k$ for all $k \in H$, such that $\alpha_1^k \le \tilde\alpha_1^k$ and $\alpha_2^k \le \tilde\alpha_2^k$ for all $k \in H$, and every random variable $\tilde\alpha_1^k$ and $\tilde\alpha_2^k$ is geometric with mean $1/p_4$. Moreover, random variables $\tilde\alpha_1^k$ and $\tilde\alpha_2^k$ can be constructed so that they are jointly independent and independent of the process $t \to K_t^m$. The construction of such a family of random variables is standard so we will only sketch it. We start with constructing $\tilde\alpha_1^k$’s and $\tilde\alpha_2^k$’s and then we use them to construct $\alpha_1^k$’s and $\alpha_2^k$’s. We consider a probability space which carries independent sequences of Bernoulli coin tosses with success probability $p_4$. We then identify $\tilde\alpha_1^k$’s and $\tilde\alpha_2^k$’s with the number of tosses until and including the first success in different sequences; we need one sequence of Bernoulli trials for each $\tilde\alpha_1^k$ and $\tilde\alpha_2^k$. The results of coin tosses corresponding to $\tilde\alpha_1^k$ are used to determine whether particle $k$ jumps onto a particle from the family $H^c$ and then stays inside $A_3$ until $S$ (this would be considered a “success”), for all jumps before the time $T_m^k$. The analogous events after time $T_m^k$ are determined by the sequence of coin tosses corresponding to $\tilde\alpha_2^k$. All other aspects of the motion of particle $k$ are determined by some other random mechanism. Such mechanisms need not be independent for different particles.

Fig. 1. Descendants of the $m$th particle are represented by thick lines. The domain $D$ is the interval $(0, a)$

Let $|K_t^m|$ denote the cardinality of $K_t^m$. Note that if a particle from the family $H \setminus K_t^m$ jumps then with probability $1/(N - 1)$ it lands on any other given particle.
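The value of $|K_t^m|$ increases by 1 at the time $t$ of a jump of a particle from $H \setminus K_t^m$ with probability equal to $|K_{t-}^m|/(N - 1)$. Hence,

\[ P(|K_t^m| = a + 1 \mid |K_{t-}^m| = a,\ \exists k \in H \setminus K_{t-}^m : X_t^k \ne X_{t-}^k) = \frac{a}{N - 1}, \]

so for integer $a, r \ge 1$, and $c_1 = c_1(r) < \infty$,

\[
\begin{aligned}
E(|K_t^m|^r \mid |K_{t-}^m| = a,\ \exists k \in H \setminus K_{t-}^m : X_t^k \ne X_{t-}^k) &= a^r\left(1 - \frac{a}{N - 1}\right) + (a + 1)^r \frac{a}{N - 1} \\
&= a^r\left(1 - \frac{a}{N - 1} + \left(\frac{a + 1}{a}\right)^r \frac{a}{N - 1}\right) \\
&\le a^r\left(1 - \frac{a}{N - 1} + \left(1 + \frac{c_1 r}{a}\right)\frac{a}{N - 1}\right) \\
&= a^r\left(1 + \frac{c_1 r}{N - 1}\right).
\end{aligned}
\]

Hence, the expectation of $|K_t^m|^r$ jumps by at most the factor of $1 + c_1 r/(N - 1)$ at the time of a jump of a particle from $H \setminus K_t^m$. Let $\alpha = \sum_{k \in H} \alpha_1^k$. By conditioning on the times of jumps, we obtain for $m \in H$,

\[ E|K_t^m|^r \le E\left(1 + \frac{c_1 r}{N - 1}\right)^{\alpha}. \]

We estimate this quantity as follows, using the fact that the family of random variables $\alpha_1^k$ may be simultaneously stochastically bounded by a sequence of independent geometric random variables $\tilde\alpha_1^k$ with mean $1/p_4$. The number of particles in $H$ is obviously bounded by $N$. In the following calculation we will pretend that the number of $\tilde\alpha_1^k$’s is $N$; this is harmless because if the number of particles in $H$ is smaller than $N$, we can always add a few independent $\tilde\alpha_1^k$’s to the family. If

\[ \log\left(1 + \frac{c_1 r}{N - 1}\right) = c_2, \]

then $c_2$ is small for large $N$, and the following holds,

\[
\begin{aligned}
E\left(1 + \frac{c_1 r}{N - 1}\right)^{\alpha} &= E\exp(c_2 \alpha) = E\exp\left(c_2 \sum_{k \in H} \alpha_1^k\right) \le E\exp\left(c_2 \sum_{k \in H} \tilde\alpha_1^k\right) \\
&= E\prod_{k \in H} \exp(c_2 \tilde\alpha_1^k) \le E\prod_{k=1}^{N} \exp(c_2 \tilde\alpha_1^k) = \left(E\exp(c_2 \tilde\alpha_1^k)\right)^N \\
&= \left(\sum_{j=0}^{\infty} e^{c_2(j+1)}(1 - p_4)^j p_4\right)^N = \left(\frac{e^{c_2} p_4}{1 - e^{c_2}(1 - p_4)}\right)^N \\
&= \left(1 + \frac{1}{p_4}(e^{-c_2} - 1)\right)^{-N} = \left(1 + \frac{1}{p_4}\left(\frac{1}{1 + \frac{c_1 r}{N - 1}} - 1\right)\right)^{-N} \\
&= \left(1 - \frac{1}{p_4} \cdot \frac{c_1 r}{N - 1 + c_1 r}\right)^{-N} \le c_3 = c_3(r, p_4) < \infty.
\end{aligned}
\]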


Thus, for some $c_3$ which depends only on $r$ and $p_4$,

\[ E|K_t^m|^r \le c_3. \tag{2.10} \]
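The reason the last expression in the moment generating function computation stays bounded in N is that it has the form (1 − x/(N + const))^(−N), which tends to exp(c_1 r / p_4). The sketch below checks the algebra numerically; the values chosen for c_1, r and p_4 are arbitrary illustrations (the paper does not specify c_1), and the function names are hypothetical.

```python
import math


def mgf_form(n, c1, r, p4):
    """(E exp(c2 * G))^N for G geometric on {1, 2, ...} with success probability p4."""
    c2 = math.log(1 + c1 * r / (n - 1))
    ratio = math.exp(c2) * (1 - p4)
    if ratio >= 1:
        raise ValueError("N too small: the geometric series diverges")
    single = math.exp(c2) * p4 / (1 - ratio)   # E exp(c2 * G) in closed form
    return single ** n


def closed_form(n, c1, r, p4):
    """Final expression (1 - (1/p4) * c1 r / (N - 1 + c1 r))^(-N)."""
    return (1 - (c1 * r / p4) / (n - 1 + c1 * r)) ** (-n)


if __name__ == "__main__":
    c1, r, p4 = 2.0, 1, 0.5
    for n in (10, 100, 1000, 10 ** 6):
        print(n, closed_form(n, c1, r, p4), math.exp(c1 * r / p4))
```

For these sample values the two forms agree and both approach exp(4) as N grows, which is consistent with the existence of a bound c_3 depending only on r and p_4.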

Next we will estimate the total number of jumps $\beta^m$ on the tree of descendants of particle $m$. For each descendant, this will include not only the first jump, at the time of which a particle becomes a descendant of $m$, but also all subsequent jumps by the descendant. Recall that given the whole genealogical tree $\{K_t^m, 0 \le t \le S\}$, the numbers of jumps of descendants of $m$ can be simultaneously bounded by $\tilde\alpha_2^k$’s, i.e., independent geometric random variables with mean $1/p_4$. We have, using (2.10), for some $c_4 = c_4(r) < \infty$,

\[
\begin{aligned}
E(\beta^m)^r &\le E\left(|K_t^m| + \sum_{k=1}^{|K_t^m|} \tilde\alpha_2^k\right)^r \\
&\le c_4 E|K_t^m|^r + c_4 E\left(\sum_{k=1}^{|K_t^m|} \tilde\alpha_2^k\right)^r \\
&\le c_4 E|K_t^m|^r + c_4 E\,E\left(\left(\sum_{k=1}^{a} \tilde\alpha_2^k\right)^r \,\Big|\, |K_t^m| = a\right) \\
&\le c_4 E|K_t^m|^r + c_4 E\,E\left(a^r \sum_{k=1}^{a} (\tilde\alpha_2^k)^r \,\Big|\, |K_t^m| = a\right) \\
&\le c_4 E|K_t^m|^r + c_4 E\,E\left(a^{r+1} (\tilde\alpha_2^k)^r \,\Big|\, |K_t^m| = a\right) \\
&\le c_4 E|K_t^m|^r + c_4 E\,E\left(a^{r+1} c_5 \,\Big|\, |K_t^m| = a\right) \\
&= c_4 E|K_t^m|^r + c_4 c_5 E|K_t^m|^{r+1} \le c_6 = c_6(r) < \infty.
\end{aligned}
\tag{2.11}
\]

Part 2. This part of the proof is devoted to some qualitative estimates of the transition probabilities of the killed Brownian motion. Suppose $A \subset D$ is an open set. We recall our assumption of regularity of $\partial D$. It implies that the function $(x, t) \to P_t^D(x, A)$ vanishes continuously as $x \to D^c$ and so it has a continuous extension to $\overline{D} \times [0, \infty)$. The notation of the following remarks partly anticipates the notation in Part 3 of the proof.

Fix some $t > 0$ and arbitrarily small $\delta_1 > 0$. The set $\overline{D} \times [0, t]$ is compact, so the continuous function $(x, t) \to P_t^D(x, A)$ is uniformly continuous on this set. It follows that we can find an integer $n < \infty$ and $\delta_2 > 0$ such that $|P^D_{t-s_j}(x, A) - P^D_{t-s}(y, A)| < \delta_1$ when $s \in [s_j, s_{j+1}]$, $|s_j - s_{j+1}| \le t/n$, and $|x - y| \le \delta_2$.

Fix arbitrarily small $t_1, \delta_1 > 0$. Let $D_{\delta_2}$ denote the set of points whose distance to $D^c$ is greater than $\delta_2$. The transition density $p_t(x, y)$ of the free Brownian motion is bounded by $r_1 < \infty$ for $x, y \in \mathbb{R}^d$ and $t \ge t_1$. The same bound holds for the transition densities $p_t^D(x, y)$ for the killed Brownian motion because $p_t^D(x, y) \le p_t(x, y)$. Choose $\delta_2 > 0$ so small that the volume of $D \setminus D_{\delta_2}$ is less than $\delta_1/r_1$. Then for every $s_j \ge t_1$ and $x \in D$,

\[ P^D_{s_j}(x, D^c_{\delta_2}) = \int_{D \setminus D_{\delta_2}} p^D_{s_j}(x, y)\,dy \le (\delta_1/r_1) \cdot r_1 = \delta_1. \]


Part 3. We start with the definition of marks which we will use to label particles. We will prove, in a sense, that the theorem holds separately for each family of particles bearing the same marks. Typically, a particle $X^j$ will bear different marks at different times. The family of marks $\Theta$ is defined as the smallest set which contains 0, and which has the property that if $\theta_1, \theta_2 \in \Theta$ then the ordered pair $(\theta_1, \theta_2)$ also belongs to $\Theta$. Note that we do not assume that $\theta_1 \ne \theta_2$. We will write $(\theta_1 \to \theta_2)$ rather than $(\theta_1, \theta_2)$. Here are some examples of marks:

\[ 0, \qquad (0 \to 0), \qquad (0 \to (0 \to 0)), \qquad ((0 \to 0) \to 0). \]

Our marks can be identified with vertices of a binary tree and are introduced only because our notation seems more intuitive in our context. Every mark will have an associated “height”. The height of 0 is defined to be 1. The height of $(\theta_1 \to \theta_2)$ is one plus the maximum of heights of $\theta_1$ and $\theta_2$.

We assign marks as follows. If a particle $X^j$ has not jumped before time $t$ then its mark is equal to 0 on the interval $(0, t)$. The mark of every particle is going to change every time it jumps, and only at such times. If a particle $X^j$ jumps at time $t$ onto a particle $X^k$, the mark of $X^j$ has been $\theta_1$ just before $t$, and the mark of $X^k$ has been $\theta_2$ just before $t$, then the mark of $X^j$ will be $(\theta_1 \to \theta_2)$ on the interval between (and including) $t$ and the first jump of $X^j$ after $t$. To see that the above definition uniquely assigns marks to all particles at all times, note that we assign mark 0 to all particles until the first jump by any particle. Recall that $\tau_1 < \tau_2 < \tau_3 < \dots$ denote the jump times of all particles. If we know the marks on the interval $[\tau_j, \tau_{j+1})$ then it is easy to assign them in a unique way on the interval $[\tau_{j+1}, \tau_{j+2})$. An easy inductive procedure allows us to assign marks to all particles at all times.

The mark of $X_t^k$ will be denoted $\theta(X_t^k)$. For any $\theta_1 \in \Theta$, let

\[ \mathcal{X}_t^{N,\theta_1}(dy) = \frac{1}{N}\sum_{k=1}^N \mathbf{1}_{\{\theta_1\}}(\theta(X_t^k))\,\delta_{X_t^k}(dy). \]
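The marks and their heights form a simple recursive data structure, which the following sketch makes concrete. The representation is an assumption made for illustration: the mark 0 is the integer `0`, and a mark (θ1 → θ2) is the Python tuple `(theta1, theta2)`. The helper names `arrow`, `height`, `show` and `apply_jump` are hypothetical.

```python
ZERO = 0  # the basic mark carried by every particle before its first jump


def arrow(theta1, theta2):
    """Mark (theta1 -> theta2), represented as an ordered pair."""
    return (theta1, theta2)


def height(theta):
    """Height of a mark: 1 for the mark 0, otherwise 1 + max of the two heights."""
    if theta == ZERO:
        return 1
    left, right = theta
    return 1 + max(height(left), height(right))


def show(theta):
    """Render a mark in the arrow notation, using '->' for the arrow."""
    if theta == ZERO:
        return "0"
    left, right = theta
    return "(" + show(left) + " -> " + show(right) + ")"


def apply_jump(marks, j, k):
    """Marks of all particles after particle j jumps onto particle k."""
    updated = list(marks)
    updated[j] = arrow(marks[j], marks[k])   # only the jumping particle changes
    return updated


if __name__ == "__main__":
    marks = [ZERO, ZERO, ZERO]
    marks = apply_jump(marks, 0, 1)          # particle 0 jumps onto particle 1
    marks = apply_jump(marks, 2, 0)          # particle 2 jumps onto particle 0
    print([show(m) for m in marks], [height(m) for m in marks])
```

With this representation, the measure displayed above is obtained by keeping only the particles whose current mark equals a given θ1 and dividing the resulting sum of point masses by N.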

Note that $\mathcal{X}_t^{N,\theta_1}(dy)$ is a (sub-probability) empirical measure supported by the particles marked with $\theta_1$ at time $t$.

The law of large numbers and the continuity of $x \to P_t^D(x, dy)$ (see Part 2 of the proof) imply that for every fixed $t \le S$, the measures $\mathcal{X}_t^{N,0}(dy)$ converge in probability as $N \to \infty$ to the measure $\int_D P_t^D(x, dy)\,\mu_0(dx)$, in the sense of weak convergence of measures. In particular, $\mathcal{X}_S^{N,0}(dy)$ converge weakly to a multiple of $\mu_S(dy)$. The main goal of this part of the proof is to show that $\mathcal{X}_S^{N,\theta}(dy)$ converge weakly to a multiple of $\mu_S(dy)$, for any fixed mark $\theta$. This will be achieved by an inductive argument. We will elaborate the details of one inductive step, showing how the convergence of $\mathcal{X}_t^{N,0}(dy)$ to $\int_D P_t^D(x, dy)\,\mu_0(dx)$ for every $t \le S$ implies the convergence of $\mathcal{X}_t^{N,(0\to0)}(dy)$ to $c_1 \int_D P_t^D(x, dy)\,\mu_0(dx)$ for every $t \le S$.

Consider some $t \in (0, S]$. We will show that for any $\delta > 0$, $p < 1$, open $A \subset D$ and $0 < t_1 < t_2 < t$, we have

\[ (1 - \delta)\mu_t(A) \le \inf_{t_1 \le s \le t_2} \frac{\int_D P^D_{t-s}(x, A)\,\mathcal{X}_s^{N,0}(dx)}{\int_D P_t^D(x, D)\,\mu_0(dx)} \le \sup_{t_1 \le s \le t_2} \frac{\int_D P^D_{t-s}(x, A)\,\mathcal{X}_s^{N,0}(dx)}{\int_D P_t^D(x, D)\,\mu_0(dx)} \le (1 + \delta)\mu_t(A), \tag{2.12} \]

with probability greater than $p$, when $N$ is sufficiently large.

If we fix any integer $n \ge 1$, then for every $s_j = j t_2/n$, $j = 0, 1, \dots, n$, and every open set $A_1$ we have $\mathcal{X}_{s_j}^{N,0}(A_1) \to \int_D P^D_{s_j}(x, A_1)\,\mu_0(dx)$ in probability as $N \to \infty$, by the law of large numbers and the continuity of $x \to P^D_{s_j}(x, A_1)$. This and the continuity of $x \to P^D_{t-s_j}(x, A)$ imply that

\[ \frac{\int_D P^D_{t-s_j}(x, A)\,\mathcal{X}_{s_j}^{N,0}(dx)}{\int_D P_t^D(x, D)\,\mu_0(dx)} \to \mu_t(A), \]

in probability. Since there is only a finite number of $s_j$’s, we immediately obtain a weak version of (2.12), namely,

\[ (1 - \delta)\mu_t(A) \le \inf_{0 \le j \le n} \frac{\int_D P^D_{t-s_j}(x, A)\,\mathcal{X}_{s_j}^{N,0}(dx)}{\int_D P_t^D(x, D)\,\mu_0(dx)} \le \sup_{0 \le j \le n} \frac{\int_D P^D_{t-s_j}(x, A)\,\mathcal{X}_{s_j}^{N,0}(dx)}{\int_D P_t^D(x, D)\,\mu_0(dx)} \le (1 + \delta)\mu_t(A), \tag{2.13} \]

with probability greater than $p$, when $N$ is sufficiently large.

Fix arbitrarily small $\delta_1, p_1 > 0$ and let $\delta_2 > 0$ be so small and $n$ so large that the following conditions are satisfied, according to Part 2 of the proof. First, $|P^D_{t-s_j}(x, A) - P^D_{t-s}(y, A)| < \delta_1$ when $s \in [s_j, s_{j+1}]$, $0 \le j \le n - 1$, and $|x - y| \le \delta_2$. Second, if $D_{\delta_2}$ denotes the set of points whose distance to $D^c$ is greater than $\delta_2$, we want to have $P^D_{s_j}(x, D^c_{\delta_2}) < \delta_1$, for every $x$ and $j$ with $s_j \ge t_1$. Finally, increase $n$ if necessary so that the probability that a Brownian path has an oscillation larger than $\delta_2$ within a subinterval of $[0, S]$ of length $t_2/n$ or less, is less than $p_1$.

With this choice of various constants, we see that for large $N$, with probability greater than $p$ the following will be true for all $j$ with $t_1 \le s_j \le S$. First, the proportion of $X^k$’s which will be within distance $\delta_2$ of the boundary at time $s_j$ will be less than $2\delta_1$ and the proportion of $X^k$’s which will jump during the interval $[s_j, s_{j+1}]$ will be less than $3\delta_1$. Because of this and other parameter choices, we will have $|P^D_{t-s_j}(X^k_{s_j}, A) - P^D_{t-s}(X^k_s, A)| < \delta_1$ for all $j$, $s \in [s_j, s_{j+1}]$ and all labels $k$ in a subset of $\{1, 2, \dots, N\}$ whose cardinality would be bounded below by $(1 - 2p_1 - 3\delta_1)N$. This implies that simultaneously for all $j$ and $s \in [s_j, s_{j+1}]$, for large $N$,

\[ \left|\int_D P^D_{t-s_j}(x, A)\,\mathcal{X}_{s_j}^{N,0}(dx) - \int_D P^D_{t-s}(x, A)\,\mathcal{X}_s^{N,0}(dx)\right| \le \delta_1 + 2p_1 + 3\delta_1, \]

with probability greater than $p$. This, the fact that $p$ can be arbitrarily close to 1 and $\delta_1$ and $p_1$ arbitrarily small, and (2.13) prove (2.12).

We will now prove that a suitable version of (2.12) holds when we replace $\mathcal{X}_s^{N,0}$ with $\mathcal{X}_s^{N,(0\to0)}$. Consider an arbitrary $t \le S$, and $0 < t_1 < t_2 < t$. Suppose that a particle $X^k$ with mark 0 hits the boundary of $D$ at a time $s \in (t_1, t_2)$. Then it will jump onto a randomly chosen particle. If $X^k$ jumps onto a particle marked 0, its mark will change to $(0 \to 0)$. Given this event, conditional on the value of $\mathcal{X}_s^{N,0}$, the distribution of $X^k$ at time $t$ will be $\int_D P^D_{t-s}(x, \,\cdot\,)\,\mathcal{X}_s^{N,0}(dx)$, by the strong Markov property. The same holds true for all other particles with mark 0 which hit $D^c$ between times $t_1$ and $t_2$. Since these particles evolve independently after the jump, we see from (2.12) that


the empirical distribution at time $t$ of all particles marked $(0 \to 0)$ which received this mark at a time between $t_1$ and $t_2$ converges in probability to a constant multiple of $\mu_t$, as $N \to \infty$. If we fix $t > 0$, it is easy to see that for sufficiently small $t_1 > 0$ and large $t_2 < t$, the probability that a particle with mark 0 will hit the boundary of $D$ in one of the intervals $[0, t_1]$ or $[t_2, t]$ will be arbitrarily small. Hence, $\mathcal{X}_t^{N,(0\to0)}$ converges to a constant multiple of $\mu_t$, in probability.

Given the last fact, the same argument which proves (2.13) yields for some $\eta(\theta) \in (0, 1]$,

\[
\begin{aligned}
\eta((0 \to 0))(1 - \delta)\mu_t(A) &\le \inf_{0 \le j \le n} \int_D P^D_{t-s_j}(x, A)\,\mathcal{X}_{s_j}^{N,(0\to0)}(dx) \\
&\le \sup_{0 \le j \le n} \int_D P^D_{t-s_j}(x, A)\,\mathcal{X}_{s_j}^{N,(0\to0)}(dx) \\
&\le \eta((0 \to 0))(1 + \delta)\mu_t(A),
\end{aligned}
\tag{2.14}
\]

with large probability, when $N$ is large. The argument following (2.13) is not specific to the case when the particles have the mark 0 and so it can be applied to the present case of particles marked $(0 \to 0)$. Hence, we obtain the following formula, which differs from (2.12) only in that the normalizing constant is $\eta((0 \to 0))$ and not $\int_D P_t^D(x, D)\,\mu_0(dx)$,

\[
\begin{aligned}
\eta((0 \to 0))(1 - \delta)\mu_t(A) &\le \inf_{t_1 \le s \le t_2} \int_D P^D_{t-s}(x, A)\,\mathcal{X}_s^{N,(0\to0)}(dx) \\
&\le \sup_{t_1 \le s \le t_2} \int_D P^D_{t-s}(x, A)\,\mathcal{X}_s^{N,(0\to0)}(dx) \\
&\le \eta((0 \to 0))(1 + \delta)\mu_t(A).
\end{aligned}
\tag{2.15}
\]

The last formula holds with probability greater than $p$ if $N$ is sufficiently large, for any fixed $t \le S$, any $0 < t_1 < t_2 < t$, and any $p < 1$.

Proceeding by induction, one can show that (2.15) applies not only to the mark $\theta = (0 \to 0)$ but also to $(0 \to (0 \to 0))$, $((0 \to 0) \to 0)$, $((0 \to 0) \to (0 \to 0))$, and to every other mark $\theta$. Imbedded in an induction step for a mark $\theta$ is the proof that the measures $\mathcal{X}_S^{N,\theta}(dy)$ converge to a constant (deterministic) multiple of $\mu_S(dy)$. It follows that for every finite deterministic subset $\Theta_1$ of $\Theta$, the measures $\sum_{\theta \in \Theta_1} \mathcal{X}_S^{N,\theta}(dy)$ converge to a multiple of $\mu_S(dy)$.

It will now suffice to show that for any $p_2, p_3 > 0$, there exists a finite set $\Theta_1$ such that $\sum_{\theta \notin \Theta_1} \mathcal{X}_S^{N,\theta}(D) < p_2$ with probability greater than $1 - p_3$. In other words, we want to show that for some finite $\Theta_1$, the number of particles with a mark from $\Theta_1^c$ at time $S$ is less than $p_2 N$ with probability greater than $1 - p_3$. In order to prove this, we will use the result proved in Part 1 of the proof.

Recall the notion of the “height” of a mark from the beginning of this part of the proof. Suppose that a particle $X^k$ has a mark with height $j$ at time $S$. Let $t_j$ be the infimum of $t$ with the property that $X_t^k$ has a mark with height $j$. Then $t_j$ must be the time of a jump of $X^k$. Let $X^{n_j}$ be the particle on which $X^k$ jumps at time $t_j$. By definition, the height of the mark of $X^k$ or the height of the mark of $X^{n_j}$ must be equal to $j - 1$ just before time $t_j$. We define $k_j$ to be $k$ or $n_j$, so that the height of the mark of $X^{k_j}$ is equal to $j - 1$ prior to $t_j$. We proceed by induction. Suppose we have identified a particle $X^{k_m}$ which has a mark with height $m - 1$ prior to a time $t_m$, where $m \le j$.


Then we let $t_{m-1}$ be the infimum of $t < t_m$ with the property that the height of the mark of $X_t^{k_m}$ is $m - 1$. We see that $X^{k_m}$ must jump at time $t_{m-1}$ on a particle $X^{n_{m-1}}$. We choose $k_{m-1}$ to be either $k_m$ or $n_{m-1}$, so that the height of the mark of $X^{k_{m-1}}$ is $m - 2$ just before the time $t_{m-1}$. Proceeding in this way, we will end up with a particle $X^{k_2}$ which has a mark with height 1. The mark of this particle is 0, as it is the only mark with height 1. This implies that $t_1 = 0$.

We claim that for all $m \le j$ and $t \in [t_{m-1}, t_m)$, we have $(t, k_m) \in T^{k_2}$. In other words, every particle $X^{k_m}$ is a descendant of $X^{k_2}$ at times $t \ge t_{m-1}$. To see this, note that the claim is obviously true for $m = 2$. If $k_3 = k_2$ then the claim is true for $m = 3$, because $X^{k_2}$ always remains its own descendant. If $k_3 \ne k_2$, it is clear that the particle $X^{k_3}$ has jumped at time $t_2$ on $X^{k_2}$, a descendant of $k_2$, and so became a descendant of $k_2$ at this time. Proceeding by induction, we can show that all particles in our chain are descendants of $k_2$ on the intervals specified above. Note that at every time $t_m$, either a descendant of $k_2$ jumps or a descendant of $k_2$ is born. Hence, if $X^k$ has a mark with height $j$ at time $S$, it must belong to the family of descendants of a particle for which the sum of descendants and their jumps is not less than $j$.

Recall the event $F$ from Part 1 of the proof and choose the parameters in Part 1 so that the probability of $F^c$ is less than $p_3/2$ and the cardinality of $H^c$ is less than $Np_2/2$. Conditional on $F$, we have the following estimate. If the sum of the number of descendants of a particle $k$ and the number of all their jumps is equal to $j$ then a crude estimate says that at most $j$ descendants of $k$ end up at time $S$ with marks of height $j$ or lower; by the argument in the previous paragraph, the marks cannot be higher than $j$. Hence, the expected number of particles with marks higher than $n$ at time $S$ among descendants of a particle $k \in H$ can be bounded using (2.11) by

$$\sum_{j \ge n} j\,P(\beta^k \ge j) \le \sum_{j \ge n} j\,\frac{E(\beta^k)^3}{j^3} \le c_1 \sum_{j \ge n} j^{-2} \le c_2/n.$$

The expected number of all particles in the family $H$ with marks higher than $n$ is bounded by $N c_2/n$. Choose $\Gamma_1$ to be the set of all marks of height less than $n$, where $n$ is so large that $N c_2/n < (p_3/2)(p_2 N/2)$. Then, conditional on $F$, the total number of particles in $H$ with marks higher than $n$ is bounded by $Np_2/2$ with probability $1 - p_3/2$ or higher. We add to that estimate all particles from $H^c$; their number is bounded by $Np_2/2$, so the total number of particles with marks higher than $n$ is bounded by $Np_2$, with probability greater than or equal to $1 - p_3/2$. This estimate was obtained under the assumption that $F$ holds. Since the probability of $F^c$ is less than $p_3/2$, we are done.

Example 2.1. We will show that for some process $X_t$, some $t$, $A$, $\mu_0$ and $N$ we have $E X_t^N(A) \ne \mu_t(A)$. Our process has a finite state space; we presume that a similar example can be constructed for Brownian motion. We will consider a continuous time Markov process on the state space $\{0, 1, 2\}$. The set $\{1, 2\}$ will play the role of $D$. The possible jumps of the process are $0 \to 1$, $1 \to 2$, $2 \to 1$ and $1 \to 0$. The jump rates are equal to 1 for each one of these possible transitions. The measure $\mu_0$ will be uniform on $D$, i.e., $\mu_0(1) = \mu_0(2) = 1/2$. First we will argue that $\mu_t = \mu_0$ for all $t > 0$. To see this, note that if we condition the process not to jump to 0, it will jump from the state 1 to the state 2 and from 2 to 1 at the rates 1, i.e., at the original rates. This is because all four types of jumps $0 \to 1$, $1 \to 2$, $2 \to 1$ and $1 \to 0$ may be thought of as coming from four independent Poisson


processes. Conditioning on the lack of jumps of one of these processes does not influence the other three jump processes. Since the conditioned process makes jumps from 1 to 2 and vice versa with equal rates, the symmetry of $\mu_0$ on the set $\{1, 2\}$ is preserved forever, i.e., $\mu_t = \mu_0$ for all $t > 0$.

Now let $N = 2$ and consider the distribution of the process $X_t$. Let $A_t$ denote the number of particles among $X^1$ and $X^2$ at the state 1 at time $t$. The process $A_t$ is a continuous time Markov process with possible values 0, 1 and 2. Its possible transitions are $0 \to 1$, $1 \to 2$, $2 \to 1$ and $1 \to 0$, just like for the original process $X_t$. If $A_t = 0$, i.e., if both particles are at the state 2, the waiting time for a jump of $A_t$ has expectation 1/2 because each of the particles jumps independently of the other one with the jump rate 1. When both particles are in the state 1, and one of them jumps to 0, it immediately returns to 1 (the location of the other particle) so the jumps of the $X^k$'s from 1 to 0 have no effect on $A_t$, if $A_t = 2$. It follows that the rate for the jumps of $A_t$ from 2 to 1 is 2, i.e., it is the same as for the jumps of $A_t$ from 0 to 1. Finally, let us analyze the case $A_t = 1$. When only one of the particles is at 2, it jumps to 1 at the rate 1, so the rate of the transitions $1 \to 2$ for the $A_t$ process is 1. However, its rate of transitions $1 \to 0$ is equal to 2 because any jump of a particle from the state 1 will result in its landing at 2, either directly or through the instantaneous visit to 0. Given these transition rates, it is elementary to check that the stationary distribution of $A_t$ assigns probabilities 1/3, 1/2 and 1/6 to the states 0, 1 and 2. This implies that $E X_t^2(\{1\}) \approx 5/12$ for large $t$ (no matter what $X_0^2$ is) and so $E X_t^2(\{1\}) \ne \mu_t(\{1\})$ for some $t$ when we choose $\mu_0(\{1\})$ to be equal to 1/2.

Proof of Theorem 1.4. For a point $x \in D$ let $\rho_x$ be the supremum of $\operatorname{dist}(x, \partial B(y, r))$ over all open balls $B(y, r)$ such that $x \in B(y, r) \subset D$.
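The rates listed in Example 2.1 determine a three-state birth-death chain for $A_t$, so its long-run behaviour can be checked by exact arithmetic. The following Python sketch is an illustration only; the function names are hypothetical and not part of the paper. It solves the balance equations for the rates stated in the example ($0 \to 1$ and $2 \to 1$ at rate 2, $1 \to 0$ at rate 2, $1 \to 2$ at rate 1) and, for comparison, also computes the stationary law of the embedded jump chain.

```python
from fractions import Fraction as F

# Transition rates of A_t (how many of the two particles sit at state 1),
# as read off from Example 2.1. These four numbers are the only input.
rates = {(0, 1): F(2), (1, 0): F(2), (1, 2): F(1), (2, 1): F(2)}
states = [0, 1, 2]


def stationary_birth_death(rates, states):
    """Time-stationary law of a birth-death chain via detailed balance."""
    weights = [F(1)]
    for i, j in zip(states, states[1:]):
        # pi_j * q(j, i) = pi_i * q(i, j) for neighbouring states i < j
        weights.append(weights[-1] * rates[(i, j)] / rates[(j, i)])
    total = sum(weights)
    return [w / total for w in weights]


def jump_chain_law(pi, rates, states):
    """Stationary law of the embedded jump chain, proportional to pi_i * q_i."""
    out_rate = [sum(r for (i, _), r in rates.items() if i == s) for s in states]
    weights = [p * q for p, q in zip(pi, out_rate)]
    total = sum(weights)
    return [w / total for w in weights]


pi = stationary_birth_death(rates, states)
nu = jump_chain_law(pi, rates, states)

# Expected fraction of the N = 2 particles at state 1 under each law.
mean_fraction_time = sum(a * p for a, p in zip(states, pi)) / 2
mean_fraction_jump = sum(a * p for a, p in zip(states, nu)) / 2

print("time-stationary law:", pi, "mean fraction at 1:", mean_fraction_time)
print("jump-chain law:     ", nu, "mean fraction at 1:", mean_fraction_jump)
```

With these rates the balance equations give the time-stationary law (2/5, 2/5, 1/5), while the embedded jump chain has stationary law (1/3, 1/2, 1/6); the corresponding mean fractions of particles at state 1 are 2/5 and 5/12. Both differ from 1/2.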
For each $x \in D$ we will choose a ball $B_x$ with radius $r$, such that $x \in B_x \subset D$ and $\operatorname{dist}(x, \partial B_x) > \rho_x/2$. The center of $B_x$ will be denoted $v_x$. We would like the mapping $x \to v_x$ to be measurable. One way to achieve this goal is to construct a countable family of balls with radius $r$ and make the mapping $x \to v_x$ constant on every element of a countable family of squares, closed on two sides, disjoint, and whose union is $D$. Such a construction is known as "Whitney squares"; it is quite elementary and so it is left to the reader.

We will construct the $X_t^k$'s in a special way. Two 1-dimensional processes $U_t^k$ and $R_t^k$ will be associated with each $X_t^k$. The processes $U_t^k$ and $R_t^k$ will take their values in $[0, r]$. The processes $R_t^k$ will be independent $d$-dimensional Bessel processes reflected at $r$. In other words, every process $R_t^k$ will have the same distribution as the radial part of the $d$-dimensional Brownian motion reflected inside the ball $B(0, r)$. We will define $U_t^k$ so that $U_t^k \le R_t^k$ for all $k$ and $t$. The processes $R_t^k$ will give us a bound on the distance of $X_t^k$ from $D^c$; more precisely, we will have, according to our construction, $\operatorname{dist}(X_t^k, D^c) \ge r - U_t^k \ge r - R_t^k$. No matter what distribution for $X_0$ is desirable, it is easy to see that we can choose the starting values for the $R_t^k$'s so that $R_0^k = \operatorname{dist}(X_0^k, v_{X_0^k})$, a.s.

In our construction, we assume that the $R_t^k$'s are given and we proceed to describe how to define the $U_t^k$'s and $X_t^k$'s given the $R_t^k$'s. We will first fix a $k$. Let $T_1^k$ be the first time when the process $R_t^k$ hits $r$. On the interval $[0, T_1^k)$, we can define $X_t^k$ as Brownian motion in $\mathbb{R}^d$ such that $R_t^k = \operatorname{dist}(X_t^k, v_{X_0^k})$. This requires only generating an angular part for $X_t^k$,

relative to the initial positions $X_0^k$. A classical "skew-product" decomposition (see Itô and McKean (1974)) achieves the goal by generating a Brownian motion on a sphere (independent of $R_t^k$) and then time-changing it according to a clock defined by $R_t^k$. Note that according to the definition of $v_x$, this constructed process $X_t^k$ will remain inside $D$ for $t \in [0, T_1^k)$. We let $U_t^k = R_t^k$ for $t \in [0, T_1^k)$. At time $T_1^k$, the process $U_t^k$ jumps to the value $\operatorname{dist}(X^k_{T_1^k-}, v_{X^k_{T_1^k-}})$, i.e., we let $U^k_{T_1^k} = \operatorname{dist}(X^k_{T_1^k-}, v_{X^k_{T_1^k-}})$. We let the process $U_t^k$ evolve after time $T_1^k$ as a $d$-dimensional Bessel process independent of $R_t^k$, until time $T_2^k = \inf\{t \ge T_1^k : U_t^k = R_t^k\}$. Let $T_3^k = \inf\{t \ge T_2^k : R_t^k = r\}$. We couple the processes $U_t^k$ and $R_t^k$ on the interval $[T_2^k, T_3^k)$, i.e., we let $U_t^k = R_t^k$ for $t \in [T_2^k, T_3^k)$. For $t \in [T_1^k, T_3^k)$, we construct $X_t^k$ so that $\operatorname{dist}(X_t^k, v_{X^k_{T_1^k}}) = U_t^k$. The spherical part is constructed in an "independent" way, in the sense of the skew-product decomposition.

We proceed by induction. Recall that $R_t^k$ is given, and suppose that processes $X_t^k$ and $U_t^k$ are defined on the interval $[0, T^k_{2j-1})$. Moreover, suppose that $R_t^k$ approaches $r$ as $t \uparrow T^k_{2j-1}$. We define $U^k_{T^k_{2j-1}}$ to be $\operatorname{dist}(X^k_{T^k_{2j-1}-}, v_{X^k_{T^k_{2j-1}-}})$ and we let the process $U_t^k$ evolve after time $T^k_{2j-1}$ as a $d$-dimensional Bessel process independent of $R_t^k$, until time $T^k_{2j} = \inf\{t \ge T^k_{2j-1} : U_t^k = R_t^k\}$. Note that $U_t^k < R_t^k \le r$ for $t \in [T^k_{2j-1}, T^k_{2j})$. Let $T^k_{2j+1} = \inf\{t \ge T^k_{2j} : R_t^k = r\}$. We couple the processes $U_t^k$ and $R_t^k$ (i.e., we make them equal) on the interval $[T^k_{2j}, T^k_{2j+1})$. The Brownian motion $X_t^k$ is defined on $[T^k_{2j-1}, T^k_{2j+1})$ so that $\operatorname{dist}(X_t^k, v_{X^k_{T^k_{2j-1}}}) = U_t^k$. Its spherical part is generated in an independent way from other elements of the construction and then time-changed according to the skew-product recipe. We see that $U_t^k < R_t^k \le r$ for $t \in [T^k_{2j-1}, T^k_{2j+1})$ and $\lim_{t \to T^k_{2j+1}} R_t^k = r$. This implies that $X_t^k$ stays inside $D$ on every interval $[T^k_{2j-1}, T^k_{2j+1})$.

Let $\tau_1^k = \lim_{j \to \infty} T_j^k$ and note that typically, $\tau_1^k < \infty$. The above procedure allows us to define the processes $X_t^k$ and $U_t^k$ on the interval $[0, \tau_1^k)$. We repeat the construction for all particles $X_t^k$ in such a way that the processes in the family $\{(X_t^k, U_t^k)\}_{1 \le k \le N}$ are jointly independent.

Let $\tau_1 = \min_{1 \le j \le N} \tau_1^j$ and suppose that the minimum is attained at $k$, i.e., $\tau_1^k = \tau_1 < \infty$. Since infinitely many independent Bessel processes $\{U_t^k, t \in [T^k_{2j-1}, T^k_{2j+1})\}$ traveled from $\operatorname{dist}(X^k_{T^k_{2j-1}}, v_{X^k_{T^k_{2j-1}}})$ to $r$, and their travel times sum up to a finite number, bounded by $\tau_1$, it follows that $\lim_{j \to \infty} \operatorname{dist}(X^k_{T^k_{2j-1}}, v_{X^k_{T^k_{2j-1}}}) = r$. We will show that $X_t^k$ must approach $\partial D$ at time $\tau_1^k = \tau_1$. Recall the function $\rho_x$ from the beginning of the proof. If $x$ belongs to an open ball $B(y, r) \subset D$ then the same holds for all points in a small neighborhood of $x$. The definition of $\rho_x$ now easily implies that the function $x \to \rho_x$ is Lipschitz inside $D$. Since, by assumption, $\rho_x$ does not vanish inside $D$, every sequence $x_n$ satisfying $\operatorname{dist}(x_n, v_{x_n}) \to r$ also satisfies $\rho_{x_n} \to 0$ and so must approach $\partial D$ as $n \to \infty$. This finishes the proof that $\lim_{t \to \tau_1^k-} \operatorname{dist}(X_t^k, D^c) = 0$. Since two independent Brownian particles cannot hit $\partial D$ at the same time, we see that there is only one process $X_t^k$ with $\tau_1^k = \tau_1 < \infty$.

Still assuming that $\tau_1^k = \tau_1 < \infty$, we uniformly and independently of everything else choose $j \ne k$ and let $X^k_{\tau_1} = X^j_{\tau_1}$ and $U^k_{\tau_1} = U^j_{\tau_1}$. We then proceed with the


construction of $X_t^k$ and $U_t^k$ on the interval $[\tau_1^k, \tau_2^k)$, such that $\lim_{t \to \tau_2^k-} \operatorname{dist}(X_t^k, D^c) = 0$. The construction is completely analogous to that outlined above. Note that we necessarily have $U^k_{\tau_1} < R^k_{\tau_1}$ so we have to start our construction as in the inductive step of the original algorithm. Recall that the construction generates a process $U_t^k$ satisfying $U_t^k \le R_t^k$ for $t \in [\tau_1^k, \tau_2^k)$. We let $\tau_2 = \tau_2^k \wedge \min_{1 \le j \le N,\, j \ne k} \tau_1^j$. A particle $X^j$ will have to approach $\partial D$ at time $\tau_2$. We will make this particle jump and then proceed by induction. Theorem 1.1 shows that there will be no accumulation of jumps of the $X_t^k$'s at any finite time.

Recall that the inner ball radius $r > 0$ is a constant depending only on the domain $D$. It is well known that the reflected process $R_t^k$ spends zero time on the boundary (i.e., at the point $r$) so if it starts from $R_0^k = r$ then its distribution at time $t = 1$ is supported on $(0, r)$. It follows that for any $p_1 < 1$ there exists $r_1 \in (0, r)$ such that we have, with $r_2 = r$, $P(R_1^k \in [0, r_1] \mid R_0^k = r_2) > p_1$. This estimate can be extended to all $r_2 \in [0, r]$, by an easy coupling argument. It follows from this and the independence of the processes $R_t^k$ that there exists $p_2 > 0$ such that with probability greater than $p_2$, more than $Np_1/2$ processes $R_t^k$ happen to be in $[0, r_1]$ at time 1, no matter what their starting positions are at time 0, for every $N > 0$.

Let $D_a$ be the set of all points in $D$ whose distance from $D^c$ is greater than or equal to $a$. The processes $X_t^k$ have been constructed in such a way that a.s., for every $k$ and $t$ we have $\operatorname{dist}(X_t^k, D^c) \ge r - R_t^k$. This and the claim in the previous paragraph show that for any starting position of the $X_t^k$'s, with probability greater than $p_2$, more than $Np_1/2$ processes $X_t^k$ happen to be in $D_{r_3}$ at time 1, where $r_3 = r - r_1$.

We will now proceed as at the beginning of Part 1 in the proof of Theorem 1.3. Fix an arbitrary $p_1 < 1$, a corresponding $r_1 = r_1(p_1) \in (0, r)$, $r_3 = r - r_1$, and arbitrary $0 < r_5 < r_4 < r_3$. Let $H$ be the family of all processes $X^k$ such that $X_1^k \in D_{r_3}$. Assume that $H$ has at least $Np_1/2$ elements. There is $p_3 > 0$ (depending on $N$, $p_1$ and the $r_j$'s) such that with probability greater than $p_3$, all processes in $H$ will stay in $D_{r_4}$ for all $t \in [1, 2]$. For some $p_4 > 0$, with probability greater than $p_4$, all processes in $H^c$ will have a jump in the interval $[1, 2]$, will land on a particle from the $H$ family, and subsequently stay in $D_{r_5}$ until time $t = 2$. Altogether, there is a strictly positive probability $p_2 p_3 p_4 \equiv p_5$ that all particles will be in $D_{r_5}$ at time $t = 2$, given any initial distribution at time $t = 0$.

Now let us rephrase the last statement in terms of the vector process $X_t$ whose state space is $D^N$. We have just shown that with probability higher than $p_5$ the process $X_t$ can reach a compact set $D_{r_5}^N$ within 2 units of time. This and the strong Markov property applied at times $2, 4, 6, \ldots$ show that the hitting time of $D_{r_5}^N$ is stochastically bounded by an exponential random variable with the expectation independent of the starting point of $X_t$. Since the transition densities $p_t^X(x, y)$ for $X_t$ are bounded below by the densities for the Brownian motion killed at the exit time from $D^N$, we see that $p_t^X(x, y) > c_1 > 0$ for $x, y \in D_{r_5}^N$. Fix arbitrarily small $s > 0$ and consider the "skeleton" $\{X_{ns}\}_{n \ge 0}$. It is standard to prove that the properties listed in this paragraph imply that the skeleton has a stationary probability distribution and that it converges to that distribution exponentially fast. This can be done, for example, using Theorem 2.1 in Down, Meyn and Tweedie (1995). Extending the convergence claim to the continuous process $t \to X_t$ from its skeleton can be done in a very general context, as was kindly shown to us by Richard Tweedie. In our case, a simple argument based on "continuity" can be supplied.
More precisely, one can use a lower estimate for $p_t^X(x, y)$ in terms of the transition densities for Brownian motion killed upon leaving $D^N$, which are continuous. We leave the details to the reader. This completes the proof of part (i) of the theorem.

Recall that we have proved that for any $p_1 < 1$ there exists $r_1 < r$ such that for any starting position of $X_0^k$, the particle $X^k$ is in $D_{r - r_1}$ at time $t = 1$, with probability greater than $p_1$. It follows that for any $N$, the mean measure $E X_M^N$ of the compact set $D_{r - r_1}$ is not less than $p_1$. Hence, the mean measures $E X_M^N$ are tight in $D$. Lemma 3.2.7 of Dawson (1992, p. 32) implies that the sequence of random measures $X_M^N$ is tight and so it contains a convergent subsequence.

Choose a subsequence $N_j$ such that the sequence $X_M^{N_j}$ is convergent to a probability measure $\Upsilon(\mu)$, carried by the family of probability measures on $D$. It will be enough to prove part (ii) of the theorem for this sequence. Consider the sequence of processes $X_t^N = X_t^{N_j}$, each with the stationary distribution $X_M^{N_j}$ as its starting distribution. Fix an open set $A \subset D$. By an argument totally analogous to the proof of Theorem 1.3, the following holds in the sense of convergence in probability,

$$\lim_{j \to \infty} X_t^{N_j}(A) = \int \frac{\int_D P_t^D(x, A)\,\mu(dx)}{\int_D P_t^D(x, D)\,\mu(dx)}\,\Upsilon(d\mu). \tag{2.16}$$

We will now apply a few results from Bass and Burdzy (1992). See Sect. 3 of that paper for the definition of a John domain. It is elementary to see that our domain $D$ is a John domain, because it satisfies the interior ball condition. By Proposition 3.2 in Bass and Burdzy (1992), every John domain is a twisted Hölder domain of order 1. Hence, the parabolic boundary Harnack principle (Theorem 1.2 of Bass and Burdzy (1992)) holds for $D$. That theorem says that if $p_t^D(x, y)$ denotes the transition densities for Brownian motion killed upon exiting $D$, then for each $u > 0$ there exists $c = c(D, u) \in (0, 1)$ such that

$$\frac{p_t^D(x, y)}{p_t^D(x, z)} \ge c\,\frac{p_s^D(v, y)}{p_s^D(v, z)} \tag{2.17}$$

for all $s, t \ge u$ and all $v, x, y, z \in D$. We will need a stronger version of this inequality.
The proof will be based on a lemma of Burdzy, Toby and Williams (1989). The following version of that lemma is taken from Burdzy and Khoshnevisan (1998). Suppose that functions $h(x, y)$, $g(x, y)$ and $h_1(x, y)$ are defined on product spaces $W_1 \times W_2$, $W_2 \times W_3$ and $W_1 \times W_3$, resp. Assume that for some constants $c_1, c_2 \in (0, 1)$ the functions satisfy for all $x, y, x_1, x_2, y_1, y_2, z_1, z_2$,

$$h_1(x, y) = \int_{W_2} h(x, z)\,g(z, y)\,dz,$$

$$\frac{h(x_1, z_1)}{h(x_1, z_2)} \ge \frac{h(x_2, z_1)}{h(x_2, z_2)}\,(1 - c_1),$$

and

$$\frac{g(z_1, y_1)}{g(z_1, y_2)} \ge \frac{g(z_2, y_1)}{g(z_2, y_2)}\,c_2.$$

Then

$$\frac{h_1(x_1, y_1)}{h_1(x_1, y_2)} \ge \frac{h_1(x_2, y_1)}{h_1(x_2, y_2)}\,(1 - c_1 + c_2^2 c_1). \tag{2.18}$$

We will apply the lemma with $p_2^D(x, y)$ in place of $h_1(x, y)$, and $p_1^D(x, y)$ in place of $h(x, z)$ and $g(z, y)$. We see from (2.18) that the constant $c(D, 2)$ in (2.17) may be taken to be $c(D, 1) + c(D, 1)^2(1 - c(D, 1))$. By induction, we see that the constants $c(D, 2^n)$ may be chosen in such a way that $c(D, 2^n) = c(D, 2^{n-1}) + c(D, 2^{n-1})^2(1 - c(D, 2^{n-1}))$. Then $c(D, 2^n) \to 1$ as $n \to \infty$. Obviously, we may assume that the function $u \to c(D, u)$ is non-decreasing. Hence, (2.17) holds for some $c(D, u)$ satisfying $c(D, u) \to 1$ as $u \to \infty$. The inequality (2.17) easily implies that

$$c(D, t)\,\frac{P_t^D(y, A)}{P_t^D(y, D)} \le \frac{P_t^D(x, A)}{P_t^D(x, D)} \le c(D, t)^{-1}\,\frac{P_t^D(y, A)}{P_t^D(y, D)},$$

for all $x, y \in D$. This in turn shows that

$$c(D, t)\,\frac{\int_D P_t^D(x, A)\,\mu_2(dx)}{\int_D P_t^D(x, D)\,\mu_2(dx)} \le \frac{\int_D P_t^D(x, A)\,\mu_1(dx)}{\int_D P_t^D(x, D)\,\mu_1(dx)} \le c(D, t)^{-1}\,\frac{\int_D P_t^D(x, A)\,\mu_2(dx)}{\int_D P_t^D(x, D)\,\mu_2(dx)},$$

for any probability measures $\mu_1$ and $\mu_2$ on $D$. Since $c(D, t) \to 1$ as $t \to \infty$, we see that for some fixed probability measure $\mu_1$ on $D$ and any $\Upsilon(\mu)$,

$$\lim_{t \to \infty} \left(\frac{\int_D P_t^D(x, A)\,\mu_1(dx)}{\int_D P_t^D(x, D)\,\mu_1(dx)}\right)^{-1} \int \frac{\int_D P_t^D(x, A)\,\mu(dx)}{\int_D P_t^D(x, D)\,\mu(dx)}\,\Upsilon(d\mu) = 1. \tag{2.19}$$

The normalized distribution of the killed Brownian motion in $D$ converges to the normalized first eigenfunction $\varphi_1$ of the Dirichlet Laplacian in $D$, i.e.,

$$\lim_{t \to \infty} \frac{\int_D P_t^D(x, A)\,\mu_1(dx)}{\int_D P_t^D(x, D)\,\mu_1(dx)} = \frac{\int_A \varphi_1(y)\,dy}{\int_D \varphi_1(y)\,dy},$$

by the eigenfunction expansion for $p_t^D(x, y)$. In view of (2.19),

$$\int \frac{\int_D P_t^D(x, A)\,\mu(dx)}{\int_D P_t^D(x, D)\,\mu(dx)}\,\Upsilon(d\mu) \to \frac{\int_A \varphi_1(y)\,dy}{\int_D \varphi_1(y)\,dy}, \tag{2.20}$$

as $t \to \infty$. By the stationarity of $X_M^{N_j}$, the right hand side of (2.16) does not depend on $t$ and so (2.20) is in fact an equality. This observation combined with (2.16) completes the proof.
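The convergence $c(D, 2^n) \to 1$ used in the proof above comes from iterating the map $c \mapsto c + c^2(1 - c)$, whose only fixed points in $[0, 1]$ are 0 and 1. The following Python sketch illustrates the iteration numerically. The starting value 0.1 is a hypothetical choice for $c(D, 1)$ made purely for illustration (the paper gives no numerical value), and the function names are not from the paper.

```python
def improve(c):
    """One doubling step: the constant for time 2u obtained from c for time u."""
    return c + c * c * (1.0 - c)


def harnack_constants(c0, steps):
    """Return [c(D,1), c(D,2), c(D,4), ...] from a hypothetical start c0."""
    values = [c0]
    for _ in range(steps):
        values.append(improve(values[-1]))
    return values


# c0 = 0.1 is an assumed starting constant, used only for illustration.
values = harnack_constants(0.1, 50)
for n in (0, 5, 10, 15, 20, 25):
    print(f"n = {n:2d}   c(D, 2^n) = {values[n]:.12f}")
```

Since $1 - (c + c^2(1 - c)) = (1 - c)^2(1 + c)$, the gap to 1 is roughly squared at each step once $c$ is close to 1, so the sequence increases slowly at first and then converges very quickly.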

3. Appendix. Related Probabilistic and Physical Models

We will discuss a few well known models and problems in probability and mathematical physics to which our paper is related. Before we do so, let us note that the original impulse for the article came from heuristic and numerical results presented in Burdzy, Hołyst, Ingerman and March (1996). This largely determined the direction of our research. The notes below may include some ideas for future research on our model, perhaps different in their flavor from the present article.

(i) Superprocesses with interactions. Superprocesses, also known as measure-valued diffusions or Dawson–Watanabe diffusions, are processes whose states are measures.


Super-Brownian motion and the Fleming–Viot process with Brownian spatial motion are two of the most studied models in this class. The model introduced in this paper resembles most the Fleming–Viot process, which can be described in a heuristic way as follows. Consider $N$ particles performing independent Brownian motions in $\mathbb{R}^d$. Every $\varepsilon$ units of time, two particles are chosen uniformly and the first particle jumps to the location of the second one. Between the jumps, the particles are independent Brownian motions. Assume for simplicity that all particles start from a fixed point. If $N \to \infty$ and $\varepsilon \to 0$, at a rate related to $N$, then the empirical distributions of the particles converge for every time $t \ge 0$ to a random measure. It is known that in dimensions $d \ge 2$, the measures are carried by sets of fractal nature.

The original Fleming–Viot model sketched above assumes independence of the branching mechanism from the spatial distribution of the particles. In recent years, a number of papers have been devoted to processes which are similar but whose branching mechanism does depend on the spatial distribution of the particles. Roughly speaking, two closely related models have been considered: in one of them a "catalyst" is present, facilitating the branching of particles; the other model assumes that branching can be influenced by the local density of particles (see, e.g., Adler and Ivanitskaya (1996), Dawson and Fleischmann (1997), Dawson and Greven (1996), Dawson and Perkins (1998) and Klenke (1999)). Our model goes in a slightly different direction because we consider an "obstacle" (the set $D^c$) where the particles are killed although the offspring are generated in a uniform way across the whole population as in the original model. Our process might possibly represent a biological population, with a region $D^c$ having fatal effect on individuals. The assumption of the constant number of individuals is an idealization of the constant carrying capacity of an environment. Fleming–Viot models are sometimes applied to "populations" whose individual members are genes. The main qualitative difference between our model and the classical Fleming–Viot process with Brownian spatial motion is that in the limit, we obtain measures with smooth densities.

(ii) Propagation of chaos. When we consider a large number of interacting particles then under some assumptions, two tagged particles will behave in an almost independent way (see, e.g., Sznitman (1991)). In our case, two particles $X_t^k$ and $X_t^j$ are almost independent when $N$ is large. An even stronger result is true: the propagation of chaos holds for the entire trees of descendants of particles labeled $k$ and $j$. The two claims are quite clear in view of the theorems and techniques presented in the paper but we will not give a rigorous proof here.

(iii) Genetic algorithms. A very active area of applied and theoretical research deals with "genetic algorithms". We mention a book of Man, Tang and Kwong (1999) as a possible entry point to this rapidly growing field. A genetic algorithm is a way to search for an answer to a problem by imitating biological genetic processes. Our model might be thought of as a genetic algorithm generating the first eigenvalue and the corresponding eigenfunction for the Dirichlet Laplacian. We do not make any claims of direct applicability of our model, especially in view of the fact that we do not present any theoretical estimates of the rate of convergence or computer simulations. We note, however, that a related problem of finding the second Neumann eigenvalue (the "spectral gap") is one of the most studied problems from both theoretical and practical points of view, for various Markov processes.

(iv) Minimization of entropy production. It is postulated in physics (Wio (1994) III.5) that an irreversible system achieves a stationary state characterized by the minimum


entropy production. See the Prigogine–de Groot Theorem in Yourgrau, van der Merwe and Raw (1982); consult also a recent article of Ruelle (1997) on this topic. Entropy production has been studied in the context of stochastic processes, for example by Gong and Qian (1997), but we could not find a direct relationship between that paper and our model. We will explain how our model relates the principle of minimum entropy production to a minimizing property of the first Laplacian eigenfunction.

In order to simplify the presentation, we will consider a slightly modified model in which the branching rate is constantly equal to $\lambda_1$, the first eigenvalue of the Laplacian in $D$ with Dirichlet boundary conditions. In general, the branching rate does not have to be a constant. In the limit, when $N \to \infty$, we obtain the following formula for the evolution of the density of the particle process,

$$\frac{\partial p(x, t)}{\partial t} = \Delta p(x, t) + \lambda_1 p(x, t), \tag{3.1}$$

where $\Delta$ represents the Laplacian (we ignored the probabilistic constant 1/2). The first term on the right-hand side represents the Brownian motion effect and the second one represents branching. We will use the notion of entropy proposed by Rényi (1961). He introduced a family of entropy measures parametrized by $\beta$,

$$S_t(\beta) = (1 - \beta)^{-1} \log \int_D p(x, t)^\beta\,dx.$$

We will consider one of these definitions corresponding to $\beta = 2$,

$$S_t = -\log \int_D p(x, t)^2\,dx.$$
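The identity for $dS/dt$ stated next rests on a short computation, sketched here under assumptions that the text does not spell out: $p(\cdot, t)$ is smooth enough to differentiate under the integral sign and to apply Green's identity, and $p(x, t) = 0$ for $x \in \partial D$ (the Dirichlet condition).

```latex
\[
\begin{aligned}
\frac{dS_t}{dt}
  &= -\frac{2\int_D p\,\partial_t p\,dx}{\int_D p^2\,dx}
   = -\frac{2\int_D p\,(\Delta p + \lambda_1 p)\,dx}{\int_D p^2\,dx} \\
  &= -2\lambda_1 - \frac{2\int_D p\,\Delta p\,dx}{\int_D p^2\,dx}
   = -2\lambda_1 + \frac{2\int_D |\nabla p|^2\,dx}{\int_D p^2\,dx}.
\end{aligned}
\]
```

The last step uses $\int_D p\,\Delta p\,dx = -\int_D |\nabla p|^2\,dx$, the boundary term vanishing because $p = 0$ on $\partial D$. By the variational (Rayleigh quotient) characterization of $\lambda_1$, the ratio $\int_D |\nabla p|^2\,dx / \int_D p^2\,dx$ is at least $\lambda_1$ for every such $p$, with equality exactly for multiples of the first eigenfunction, so $dS_t/dt \ge 0$.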

Using this definition of entropy, we obtain from (3.1),

$$\frac{1}{2}\frac{dS}{dt} = -\lambda_1 + \frac{\int_D |\nabla p(x, t)|^2\,dx}{\int_D p(x, t)^2\,dx}.$$

The first term represents the decrease of the entropy in the system due to the flux of particles through the boundary. The second term represents the entropy production. The last quantity is always positive and is minimal in the stationary state, i.e., when $dS/dt = 0$. We see that the entropy production is minimal when

$$\frac{\int_D |\nabla p(x, t)|^2\,dx}{\int_D p(x, t)^2\,dx} = \lambda_1.$$

However, the same minimization problem defines the first eigenfunction of the Laplacian in $D$ with Dirichlet boundary conditions, leading to Eq. (3.1) in the stationary regime ($dp/dt = 0$). In this sense, the first eigenfunction minimizes the entropy production. We note that since $\lambda_1$ is the mean escape rate from the system, the property of minimum entropy production is equivalent to the property of the minimum mean escape rate from the system.

The Rényi entropy belongs to the class of entropies introduced in nonextensive thermostatistics (Pennini, Plastino and Plastino (1998)). In ordinary physical systems it is usually assumed, in view of the second law of thermodynamics, that entropy is an additive quantity and therefore has a properly defined density. This is the case when the


boundary conditions do not strongly influence the bulk properties of the system. This does not hold for the stochastic process considered in this paper since the process of branching in the middle of the system is induced by the flux of particles through the boundary.

Acknowledgement. We are grateful to David Aldous, Wilfrid Kendall, Tom Kurtz, Jeff Rosenthal, Dan Stroock, Kathy Temple and Richard Tweedie for very useful advice. We would like to thank the anonymous referee for many suggestions for improvement.

References

1. Adler, R.J. and Ivanitskaya, L.: A superprocess with a disappearing self-interaction. J. Theoret. Probab. 9, 245–261 (1996)
2. Bass, R. and Burdzy, K.: Lifetimes of conditioned diffusions. Probab. Th. Rel. Fields 91, 405–443 (1992)
3. Burdzy, K., Hołyst, R., Ingerman, D. and March, P.: Configurational transition in a Fleming–Viot-type model and probabilistic interpretation of Laplacian eigenfunctions. J. Phys. A 29, 2633–2642 (1996)
4. Burdzy, K. and Khoshnevisan, D.: Brownian motion in a Brownian crack. Ann. Appl. Probab. 8, 708–748 (1998)
5. Burdzy, K., Toby, E. and Williams, R.J.: On Brownian excursions in Lipschitz domains. Part II. Local asymptotic distributions. In: Seminar on Stochastic Processes 1988, E. Cinlar, K.L. Chung, R. Getoor, J. Glover, editors, Boston: Birkhäuser, 1989, pp. 55–85
6. Dawson, D.A.: Infinitely divisible random measures and superprocesses. In: Stochastic Analysis and Related Topics, H. Körezlioglu and A.S. Üstünel, Eds, Boston: Birkhäuser, 1992
7. Dawson, D.A. and Fleischmann, K.: Longtime behavior of a branching process controlled by branching catalysts. Stoch. Process. Appl. 71, 241–257 (1997)
8. Dawson, D.A. and Greven, A.: Multiple Space-Time Scale Analysis For Interacting Branching Models. Electronic J. Probab. 1, paper no. 14, 1–84 (1996)
9. Dawson, D.A. and Perkins, E.A.: Long-time behavior and coexistence in a mutually catalytic branching model. Ann. Probab. 26, 1088–1138 (1998)
10. Down, D., Meyn, S.P. and Tweedie, R.L.: Exponential and uniform ergodicity of Markov processes. Ann. Probab. 23, 1671–1691 (1995)
11. Gong, G. and Qian, M.: Entropy production of stationary diffusions on non-compact Riemannian manifolds. Sci. China Ser. A 40, 926–931 (1997)
12. Itô, K. and McKean, H.P.: Diffusion Processes and Their Sample Paths. New York: Springer-Verlag, 2nd edition, 1974
13. Klenke, A.: A Review on Spatial Catalytic Branching. In: Festschrift in honour of D. Dawson, 1999, to appear
14. Man, K.F., Tang, K.S. and Kwong, S.: Genetic algorithms. Concepts and designs. London: Springer-Verlag, 1999
15. Pennini, F., Plastino, A.R. and Plastino, A.: Rényi entropies and Fisher informations as measures of nonextensivity in a Tsallis setting. Physica A 258, 446–457 (1998)
16. Rényi, A.: On measures of entropy and information. Proc. 4-th Berkeley Symp. Math. Stat. Probab. 1, 547–561 (1961)
17. Ruelle, D.: Entropy production in nonequilibrium statistical mechanics. Commun. Math. Phys. 189, 365–371 (1997)
18. Sznitman, A.-S.: Topics in propagation of chaos. École d'Été de Probabilités de Saint-Flour XIX–1989, Lecture Notes in Math., 1464. Berlin: Springer, 1991, pp. 165–251
19. Wio, H.S.: An Introduction to Stochastic Processes and Nonequilibrium Statistical Physics. Singapore: World Scientific, 1994
20. Yourgrau, W., van der Merwe, A. and Raw, G.: Treatise on Irreversible and Statistical Thermophysics. New York: Dover Publications Inc., 2nd edition, 1982, pp. 48–52

Communicated by D. Brydges

Commun. Math. Phys. 214, 705 – 731 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Passivity and Microlocal Spectrum Condition

Hanno Sahlmann, Rainer Verch

Institut für Theoretische Physik, Universität Göttingen, Bunsenstr. 9, 37073 Göttingen, Germany. E-mail: [email protected]; [email protected]

Received: 9 February 2000 / Accepted: 7 June 2000

Abstract: In the setting of vector-valued quantum fields obeying a linear wave-equation in a globally hyperbolic, stationary spacetime, it is shown that the two-point functions of passive quantum states (mixtures of ground- or KMS-states) fulfill the microlocal spectrum condition (which in the case of the canonically quantized scalar field is equivalent to saying that the two-point function is of Hadamard form). The fields can be of bosonic or fermionic character. We also give an abstract version of this result by showing that passive states of a topological ∗-dynamical system have an asymptotic pair correlation spectrum of a specific type.

1. Introduction

A recurrent theme in quantum field theory in curved spacetime is the selection of suitable states which may be viewed as generalizations of the vacuum state familiar from quantum field theory in flat spacetime. The selection criterion for such states should, in particular, reflect the idea of dynamical stability under temporal evolution of the system. If a spacetime possesses a time-symmetry group (generated by a timelike Killing vector field), then a ground state with respect to the corresponding time-evolution appears as a good candidate for a vacuum-like state. More generally, any thermal equilibrium state for that time-evolution should certainly also be viewed as a dynamically stable state. Ground- and thermal equilibrium states, and mixtures thereof, fall into the class of the so-called "passive" states, defined in [34]. An important result by Pusz and Woronowicz [34] asserts that a dynamical system is in a passive state exactly if it is impossible to extract energy from the system by means of cyclic processes. Since the latter form of passivity, i.e. the validity of the second law of thermodynamics, expresses a thermodynamical stability which is to be expected to hold generally for physical dynamical systems, one would expect that passive states are natural candidates for physical (dynamically stable) states in quantum field theory in curved spacetime, at least when the spacetime, or parts of it, possess time-symmetry groups. This point of view has been expressed in [7].


In this work we study the relationship between passivity of a quantum field state and the microlocal spectrum condition for free quantum fields on a stationary, globally hyperbolic spacetime. The microlocal spectrum condition (abbreviated µSC) is a condition restricting the form of the wavefront sets, $\mathrm{WF}(\omega_n)$, of the $n$-point distributions $\omega_n$ of a quantum field state [6, 35]. For quasifree states, it suffices to restrict the form of $\mathrm{WF}(\omega_2)$; see relation (1.1) near the end of this Introduction for a definition of µSC in this case.

There are several reasons why the µSC may rightfully be viewed as an appropriate generalization of the spectrum condition (i.e. positivity of the energy in any Lorentz frame), required for quantum fields in flat spacetime, to quantum field theory in curved spacetime. Among the most important is the proof by Radzikowski [35] (based on mathematical work by Duistermaat and Hörmander [12]) that, for the free scalar Klein–Gordon field on any globally hyperbolic spacetime, demanding that the two-point function $\omega_2$ obeys the µSC is equivalent to $\omega_2$ being of Hadamard form. This is significant since it appears nowadays well-established to take the condition that $\omega_2$ be of Hadamard form as criterion for physical (dynamically stable) quasifree states for linear quantum fields on curved spacetime in view of a multitude of results, cf. e.g. [15, 14, 31, 42, 43, 45, 47] and references given therein. Moreover, µSC has several interesting structural properties which are quite similar to those of the usual spectrum condition, and allow to some extent similar conclusions [6, 5, 44]. It is particularly worth mentioning that one may, in quasifree states of linear quantum fields fulfilling µSC, covariantly define Wick-products and develop the perturbation theory for $P(\phi)_4$-type interactions along an Epstein–Glaser approach generalized to curved spacetime [5, 6]. Also worth mentioning is the fact that µSC has proved useful in the analysis of other types of problems in quantum field theory in curved spacetime [36, 30, 13].

In view of what we said initially about the significance of the concept of passivity for quantum field states on stationary spacetimes one would be inclined to expect that, on a stationary, globally hyperbolic spacetime, a passive state fulfills the µSC, at least for quasifree states of linear fields. And this is what we are going to establish in the present work.

We should like to point out that more special variants of such a statement have been established earlier. For the scalar field obeying the Klein–Gordon equation on a globally hyperbolic, static spacetime, Fulling, Narcowich and Wald [16] proved that the quasifree ground state with respect to the static Killing vector field has a two-point function of Hadamard form, and thus fulfills µSC, as long as the norm of the Killing vector field is globally bounded away from zero. Junker [26] has extended this result by showing that, if the spacetime has additionally compact spatial sections, then the quasifree KMS-states (thermal equilibrium states) at any finite temperature fulfill µSC. But the requirement of having compact Cauchy-surfaces, or the constraint that the static Killing vector field have a norm bounded globally away from zero, exclude several interesting situations from applying the just mentioned results. A prominent example is Schwarzschild spacetime, which possesses a static timelike Killing flow, but the norm of the Killing vector field tends to zero as one approaches the horizon along any Cauchy-surface belonging to the static foliation. In [28] (cf. also [17]), quasifree ground- and KMS-states with respect to the Killing flow on Schwarzschild spacetime have been constructed for the scalar Klein–Gordon field, and it has long been conjectured that the two-point functions of these states are of Hadamard form.
However, when trying to prove this along the patterns of [16] or [26], who use the formulation of quasifree ground- and KMS-states in terms of the Klein–Gordon field’s Cauchy-data, one is faced with severe infra-red problems even for massive fields upon giving up the constraint that the norm of the static Killing vector field be globally bounded away from zero. This has called for trying to develop a new approach to proving µSC for passive states, the result of which is our Theorem 5.1; see further below in this Introduction for a brief description. Then our Thm. 5.1 shows, as a corollary, that the quasifree ground- and KMS-states of the scalar Klein–Gordon field on Schwarzschild spacetime satisfy µSC (thus their two-point functions are of Hadamard form). [We caution the reader that this does not show that these states, or rather their “doublings” defined in [28], are extendible to Hadamard states on the whole of the Schwarzschild–Kruskal spacetime. There can be at most one single quasifree, isometry-invariant Hadamard state on Schwarzschild–Kruskal spacetime and this state necessarily restricts to a KMS-state at Hawking temperature on the (“outer, right”-) Schwarzschild-part of Schwarzschild–Kruskal spacetime, cf. [31, 29].]

Before describing next the contents of this work, we wish to note that we have aimed at a quite self-contained presentation. Therefore, Sections 3 and 4 consist to major parts of summaries of well-established material from the literature (as will be described shortly), with some adaptations required for the present purposes. The inclusion of this material is mainly for the convenience of the reader. The novel results of the present work appear in Sections 2 and 5.

In more detail, the organization of the paper is as follows. In Section 2, we will introduce the notion of “asymptotic pair correlation spectrum” of a state $\omega$ of a topological ∗-dynamical system. This object is to be viewed as a generalization of the wavefront set of the two-point function $\omega_2$ in the stated general setting, see [44] for further discussion. We then show that for (strictly) passive states $\omega$ the asymptotic pair correlation spectrum must be of a certain, asymmetric form.
This asymmetry can be interpreted as the microlocal remnant of the asymmetric form of the spectrum that one would obtain for a ground state.

Section 3 will be concerned with some aspects of wavefront sets of distributions on test-sections of general vector bundles. Section 3.1 contains a reformulation of the wavefront set for vector-bundle distributions along the lines of Prop. 2.2 in [44]. We briefly recapitulate some notions of spacetime geometry, as far as needed, in Subsect. 3.2. In Subsect. 3.3 we quote the propagation of singularities theorem (PST) for wave-operators acting on vector bundles, in the form used later in Section 5, from [8, 12].

In Subsect. 4.1 we introduce, following [32], the Borchers algebra of smooth test-sections with compact support in a vector bundle over a Lorentzian spacetime, and briefly summarize the connection between states on the Borchers algebra, their GNS-representations, the induced quantum fields, and the Wightman $n$-point functions. We require that the quantum fields associated with the states are, in a weak sense, bosonic or fermionic, i.e., they fulfill a weak form of (twisted) locality. A quite general formulation of (bosonic or fermionic) quasifree states will be given in Subsect. 4.3.

Section 5 contains our main result, saying that for a state $\omega$ on the Borchers algebra associated with a given vector bundle, over a globally hyperbolic, stationary spacetime $(M, g)$ as base manifold, the properties (i) $\omega$ is (strictly) passive, (ii) $\omega$ fulfills a weak form of (twisted) locality, and (iii) $\omega_2$ is a bi-solution up to $C^\infty$ for a wave operator, imply

$$\mathrm{WF}(\omega_2) \subset \mathcal{R}, \tag{1.1}$$

where $\mathcal{R}$ is the set of pairs of non-zero covectors $(q, \xi; q', \xi') \in T^*M \times T^*M$ so that $g^{\mu\nu}\xi_\nu$ is past-directed and lightlike, the base points $q$ and $q'$ are connected by an affinely parametrized, lightlike geodesic $\gamma$, and both $\xi$ and $-\xi'$ are co-tangent to $\gamma$, or $\xi = -\xi'$ if $q = q'$. Following [6], we say that the quasifree state with two-point function $\omega_2$ fulfills the µSC if the inclusion (1.1) holds. If one had imposed the additional requirement that $\omega$ (resp., the associated quantum fields) fulfill appropriate vector-bundle versions of the CCR or CAR, one would conclude that $\mathrm{WF}(\omega_2) = \mathcal{R}$, as is e.g. the case for the free scalar Klein–Gordon field (cf. [35]). Moreover, for a quasifree state $\omega$ on the Borchers algebra of a vector bundle over any globally hyperbolic spacetime one can show that imposing CCR or CAR implies that $\omega_2$ is of Hadamard form (appropriately generalized) if and only if $\mathrm{WF}(\omega_2) = \mathcal{R}$. The discussion of these matters will be contained in a separate article [38].

2. Passivity and Asymptotic Pair Correlation Spectrum

Let $\mathcal{A}$ be a $C^*$-algebra with unit and $\{\alpha_t\}_{t\in\mathbb{R}}$ a one-parametric group of automorphisms of $\mathcal{A}$, supposed to be strongly continuous, that is, $\|\alpha_t(A) - A\| \to 0$ as $t \to 0$ for each $A \in \mathcal{A}$. Moreover, let $D(\delta)$ denote the set of all $A \in \mathcal{A}$ such that the limit

$$\delta(A) := \lim_{t\to 0} \frac{1}{t}\bigl(\alpha_t(A) - A\bigr)$$

exists. One can show that $D(\delta)$ is a dense ∗-subalgebra of $\mathcal{A}$, and $\delta$ is a derivation with domain $D(\delta)$. Following [34], one calls a state $\omega$ on $\mathcal{A}$ passive if for all unitary elements $U \in D(\delta)$ which are continuously connected to the unit element,¹ the estimate

$$\frac{1}{i}\,\omega\bigl(U^*\delta(U)\bigr) \ge 0 \tag{2.1}$$

is fulfilled. As a consequence, $\omega$ is invariant under $\{\alpha_t\}_{t\in\mathbb{R}}$: $\omega\circ\alpha_t = \omega$ for all $t \in \mathbb{R}$. Furthermore, it can be shown (cf. [34]) that ground states or KMS-states at inverse temperature $\beta \ge 0$ for $\alpha_t$ are passive, as are convex sums of such states. (In Appendix A we will summarize some basic properties of ground states and KMS-states. Standard references include [4, 39].) However, the significance of passive states is based on two remarkable results in [34]. First, a converse of the previous statement is proven there: If a state is completely passive, then it is a ground state or a KMS-state at some inverse temperature $\beta \ge 0$. Here a state is called completely passive if, for each $n \in \mathbb{N}$, the product state $\otimes^n\omega$ is a passive state on $\otimes^n\mathcal{A}$ with respect to the dynamics $\{\otimes^n\alpha_t\}_{t\in\mathbb{R}}$. Secondly, the following is established in [34]: the dynamical system modelled by $\mathcal{A}$ and $\{\alpha_t\}_{t\in\mathbb{R}}$ is in a passive state precisely if it is impossible to extract energy from the system by means of cyclic processes. In that sense, passive states may be viewed as good candidates for physically realistic states of any dynamical system since for these states the second law of thermodynamics is warranted.

¹ i.e. there exists a continuous curve $[0, 1] \ni t \mapsto U(t) \in D(\delta)$ with each $U(t)$ unitary and $U(0) = 1_{\mathcal{A}}$, $U(1) = U$.

In the present section we are interested in studying the asymptotic high frequency behaviour of passive states along similar lines as developed recently in [44]. We shall, however, generalize the setup since this will prove useful for developments later in this work. Thus, we assume now that $\mathcal{A}$ is a topological ∗-algebra with a locally convex topology and with a unit element (cf. e.g. [40]). We denote by $S$ the set of continuous semi-norms for $\mathcal{A}$. Moreover, we say that $\{\alpha_t\}_{t\in\mathbb{R}}$ is a continuous one-parametric group of ∗-automorphisms of $\mathcal{A}$ if for each $t$, $\alpha_t$ is a topological ∗-automorphism of $\mathcal{A}$, and if the group action is locally bounded and continuous in the sense that for each $\sigma \in S$ there is $\sigma' \in S$, $r > 0$ with $\sigma(\alpha_t(A)) \le \sigma'(A)$ for all $|t| < r$, $A \in \mathcal{A}$, and $\sigma(\alpha_t(A) - A) \to 0$ as $t \to 0$ for each $A \in \mathcal{A}$. Then we refer to the pair $(\mathcal{A}, \{\alpha_t\}_{t\in\mathbb{R}})$ as a topological ∗-dynamical system. Using the fact that for all $A, B \in \mathcal{A}$ and $\sigma \in S$, the maps $C \mapsto \sigma(AC)$ and $C \mapsto \sigma(CB)$ are again continuous semi-norms on $\mathcal{A}$, one deduces by a standard argument that also $\sigma(\alpha_s(A)\alpha_t(B) - AB) \to 0$ as $s, t \to 0$.

A continuous linear functional $\omega$ on $\mathcal{A}$ will be called a state if $\omega(A^*A) \ge 0$ for all $A \in \mathcal{A}$ and if $\omega(1_{\mathcal{A}}) = 1$. Furthermore, we say that $\omega$ is a ground state, or a KMS-state at inverse temperature $\beta > 0$, for $\{\alpha_t\}_{t\in\mathbb{R}}$, if the functions $t \mapsto \omega(A\alpha_t(B))$ are bounded for all $A, B \in \mathcal{A}$, and if $\omega$ satisfies the ground state condition (A.1) or the KMS-condition (A.2) given in Appendix A, respectively.

Now we call a family $(A_\lambda)_{\lambda>0}$ with $A_\lambda \in \mathcal{A}$ a global testing family in $\mathcal{A}$ provided there is for each $\sigma \in S$ an $s \ge 0$ (depending on $\sigma$ and on the family) such that

$$\sup_{\lambda}\, \lambda^s\, \sigma(A_\lambda^* A_\lambda) < \infty. \tag{2.2}$$

The set of all global testing families will be denoted by $\mathbf{A}$. Let $\omega$ be a state on $\mathcal{A}$, and $\xi = (\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\}$. Then we say that $\xi$ is a regular direction for $\omega$, with respect to the continuous one-parametric group $\{\alpha_t\}_{t\in\mathbb{R}}$, if there exists some $h \in C_0^\infty(\mathbb{R}^2)$ and an open neighbourhood $V$ of $\xi$ in $\mathbb{R}^2\setminus\{0\}$ such that²

$$\sup_{k\in V}\left| \int e^{-i\lambda^{-1}k\cdot t}\, h(t)\, \omega\bigl(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda)\bigr)\, dt \right| = O^\infty(\lambda) \quad \text{as } \lambda \to 0 \tag{2.3}$$

holds for all global testing families $(A_\lambda)_{\lambda>0}, (B_\lambda)_{\lambda>0} \in \mathbf{A}$. Then we define the set $ACS^2_{\mathbf{A}}(\omega)$ as the complement in $\mathbb{R}^2\setminus\{0\}$ of all $k$ which are regular directions for $\omega$. We call $ACS^2_{\mathbf{A}}(\omega)$ the global asymptotic pair correlation spectrum of $\omega$.

² We shall write $\varphi(\lambda) = O^\infty(\lambda)$ as $\lambda \to 0$ iff for each $s \in \mathbb{N}$ there are $C_s, \lambda_s > 0$ so that $|\varphi(\lambda)| \le C_s\lambda^s$ for all $0 < \lambda < \lambda_s$.

The asymptotic pair correlation spectrum, and more generally, asymptotic $n$-point correlation spectra of a state, may be regarded as generalizations of the notion of a wavefront set of a distribution in the setting of states on a dynamical system. We refer to [44] for considerable further discussion and motivation. The properties of $ACS^2_{\mathbf{A}}(\omega)$ are analogous to those of $ACS^2(\omega)$ described in [44, Prop. 3.2]. In particular, $ACS^2_{\mathbf{A}}(\omega)$ is a closed conic set in $\mathbb{R}^2\setminus\{0\}$. It is evident that, if $\omega$ is a finite convex sum of states $\omega_i$, then $ACS^2_{\mathbf{A}}(\omega)$ is contained in $\bigcup_i ACS^2_{\mathbf{A}}(\omega_i)$.

Now we are going to establish an upper bound for $ACS^2_{\mathbf{A}}(\omega)$, distinguished by a certain asymmetry, for all $\omega$ in a subset $\mathcal{P}$ of the set of all passive states, to be defined next: We define $\mathcal{P}$ as the set of all states on $\mathcal{A}$ which are of the form

$$\omega(A) = \sum_{i=1}^{m} \rho_i\, \omega_i(A), \qquad A \in \mathcal{A}, \tag{2.4}$$


where $m \in \mathbb{N}$, $\rho_i > 0$, $\sum_{i=1}^m \rho_i = 1$, and each $\omega_i$ is a ground state or a KMS-state at some inverse temperature $\beta_i > 0$ (note that $\beta_i = 0$ is not admitted!) on $\mathcal{A}$ with respect to $\{\alpha_t\}_{t\in\mathbb{R}}$. The states in $\mathcal{P}$ will be called strictly passive.

We should like to remark that in the present general setting where $\mathcal{A}$ is not necessarily a $C^*$-algebra, the criterion for passivity given at the beginning in (2.1) may be inappropriate since it could happen that $D(\delta)$, even if dense in $\mathcal{A}$, doesn’t contain sufficiently many unitary elements. In the $C^*$-algebraic situation, (2.1) entails the slightly weaker variant

$$\frac{1}{i}\,\omega\bigl(A\,\delta(A)\bigr) \ge 0 \tag{2.5}$$

for all $A = A^* \in D(\delta)$, and one may take this as a substitute for the condition of passivity of a state in the present more general framework (supposing that $D(\delta)$ is dense). In fact, each $\omega \in \mathcal{P}$ is $\{\alpha_t\}_{t\in\mathbb{R}}$-invariant and satisfies (2.5) (see Appendix A), and in the $C^*$-algebraic situation, every $\omega \in \mathcal{P}$ also satisfies (2.1).

Proposition 2.1. Let $(\mathcal{A}, \{\alpha_t\}_{t\in\mathbb{R}})$ be a topological ∗-dynamical system as described above.

(1) Let $\omega \in \mathcal{P}$. Then either $ACS^2_{\mathbf{A}}(\omega) = \emptyset$, or

$$ACS^2_{\mathbf{A}}(\omega) = \{(\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\} : \xi_1 + \xi_2 = 0,\ \xi_2 \ge 0\}.$$

(2) Let $\omega$ be an $\{\alpha_t\}_{t\in\mathbb{R}}$-invariant KMS-state at inverse temperature $\beta = 0$. Then either $ACS^2_{\mathbf{A}}(\omega) = \emptyset$, or

$$ACS^2_{\mathbf{A}}(\omega) = \{(\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\} : \xi_1 + \xi_2 = 0\}.$$

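Proof. 1) By assumption $\omega$ is continuous, hence we can find a seminorm $\sigma \in S$ so that $|\omega(A)| \le \sigma(A)$ for all $A \in \mathcal{A}$. Thus there are positive constants $c$ and $s$ so that

$$\omega\bigl(\alpha_t(A_\lambda^* A_\lambda)\bigr) = \omega(A_\lambda^* A_\lambda) \le c\cdot(1 + \lambda^{-1})^s \tag{2.6}$$

holds for all $t \in \mathbb{R}$. In the equality, the invariance of $\omega$ was used, and in the subsequent inequality, condition (2.2) was applied. Thus, for any Schwartz-function $\hat h \in \mathcal{S}(\mathbb{R}^2)$, and any $(A_\lambda)_{\lambda>0}, (B_\lambda)_{\lambda>0}$ in $\mathbf{A}$, one obtains that the following function of $\lambda > 0$ and $k \in \mathbb{R}^2$,

$$w_\lambda(k) := \int e^{-ik\cdot t}\, \hat h(t)\, \omega\bigl(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda)\bigr)\, dt,$$

depends smoothly on $k$ and satisfies the estimate

$$|w_\lambda(\lambda^{-1}k)| \le c'\,(|k| + \lambda^{-1} + 1)^r$$

with suitable constants $c' > 0$, $r \in \mathbb{R}$. Hence, this function satisfies the assumptions of Lemma 2.2 in [44]. Application of that lemma entails the following: Suppose that for some open neighbourhood $V$ of $\xi \in \mathbb{R}^2\setminus\{0\}$ we can find some $\hat h \in \mathcal{S}(\mathbb{R}^2)$ with $\hat h(0) = 1$ and

$$\sup_{k\in V}\left| \int e^{-i\lambda^{-1}k\cdot t}\, \hat h(t)\, \omega\bigl(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda)\bigr)\, dt \right| = O^\infty(\lambda) \quad \text{as } \lambda \to 0 \tag{2.7}$$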


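for all $(A_\lambda)_{\lambda>0}, (B_\lambda)_{\lambda>0} \in \mathbf{A}$. Then this implies that the analogous relation holds with $\hat h$ replaced by $\phi\cdot\hat h$ for any $\phi \in C_0^\infty(\mathbb{R}^2)$ when simultaneously $V$ is replaced by some slightly smaller neighbourhood $V'$ of $\xi$. Consequently, relation (2.7) (with $\hat h \in \mathcal{S}(\mathbb{R}^2)$, $\hat h(0) = 1$) entails that $\xi$ is absent from $ACS^2_{\mathbf{A}}(\omega)$.

2) Some notation needs to be introduced before we can proceed. For $f \in \mathcal{S}(\mathbb{R})$, we define

$$(\tau_s f)(s') := f(s' - s) \quad\text{and}\quad {}^r\!f(s') := f(-s'), \qquad s, s' \in \mathbb{R}.$$

Then we will next establish

$$\omega\circ\alpha_t = \omega \;\Rightarrow\; ACS^2_{\mathbf{A}}(\omega) \subset \{(\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\} : \xi_1 + \xi_2 = 0\}. \tag{2.8}$$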

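To this end, let $\xi = (\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\}$ be such that $\xi_1 + \xi_2 \neq 0$, and pick some $\delta > 0$ and an open neighbourhood $V_\xi$ of $\xi$ so that $|k_1 + k_2| > \delta$ for all $k \in V_\xi$. Now pick two functions $h_j \in C_0^\infty(\mathbb{R})$ ($j = 1, 2$) such that their Fourier-transforms

$$\hat h_j(t_j) = \frac{1}{\sqrt{2\pi}} \int e^{-it_j\cdot p}\, h_j(p)\, dp$$

have the property $\hat h_j(0) = 1$. Define $\hat h \in \mathcal{S}(\mathbb{R}^2)$ by $\hat h(t) := \hat h_1(t_1)\hat h_2(t_2)$. Then observe that one can find $\lambda_0 > 0$ such that the functions

$$g_{\lambda,k}(p) := \bigl((\tau_{-\lambda^{-1}(k_1+k_2)}\,{}^r\!h_1)\cdot h_2\bigr)(p), \qquad p \in \mathbb{R}, \tag{2.9}$$

vanish for all $k = (k_1, k_2) \in V_\xi$ and all $0 < \lambda < \lambda_0$. Consequently, also the functions

$$f_{\lambda,k}(p) := (\tau_{\lambda^{-1}k_2}\, g_{\lambda,k})(p), \qquad p \in \mathbb{R}, \tag{2.10}$$

vanish for all $k \in V_\xi$ and all $0 < \lambda < \lambda_0$. Denoting the Fourier-transform of $f_{\lambda,k}$ by $\hat f_{\lambda,k}$, one obtains for all $k \in V_\xi$, $0 < \lambda < \lambda_0$:

$$\begin{aligned}
0 &= \int \hat f_{\lambda,k}(s)\, \omega\bigl(A_\lambda\alpha_s(B_\lambda)\bigr)\, ds \\
  &= \iint e^{-i\lambda^{-1}(k_1+k_2)s'}\, e^{-i\lambda^{-1}k_2 s}\, \hat h_1(s')\, \hat h_2(s'+s)\, \omega\bigl(A_\lambda\alpha_s(B_\lambda)\bigr)\, ds'\, ds \\
  &= \int e^{-i\lambda^{-1}k\cdot t}\, \hat h(t)\, \omega\bigl(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda)\bigr)\, dt
\end{aligned}$$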

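for all testing-families $(A_\lambda)_{\lambda>0}, (B_\lambda)_{\lambda>0}$. Invariance of $\omega$ under $\{\alpha_t\}_{t\in\mathbb{R}}$ was used in passing from the second equality to the last. In view of step 1) above, this shows (2.8).

3) In a further step we will argue that

$$\omega \text{ ground state} \;\Rightarrow\; ACS^2_{\mathbf{A}}(\omega) \subset \{(\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\} : \xi_2 \ge 0\}. \tag{2.11}$$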

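So let again $h_j$ and $\hat h_j$ be as above, and $f_{\lambda,k}$ as in (2.10) with Fourier-transform $\hat f_{\lambda,k}$. Let $\xi = (\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\}$ have $\xi_2 < 0$. Then there is an open neighbourhood $V_\xi$ of $\xi$ and an $\epsilon > 0$ so that $k_2 < -\epsilon$ for all $k \in V_\xi$. The support of $f_{\lambda,k}$ is contained in the support of $\tau_{\lambda^{-1}k_2}h_2$, and there is clearly some $\lambda_0 > 0$ such that $\mathrm{supp}\,\tau_{\lambda^{-1}k_2}h_2 \subset (-\infty, 0)$ for all $k = (k_1, k_2) \in V_\xi$ as soon as $0 < \lambda < \lambda_0$. By the characterization of a ground state given in (A.1), and using also the $\{\alpha_t\}_{t\in\mathbb{R}}$-invariance of a ground state, one therefore obtains

$$\begin{aligned}
\sup_{k\in V_\xi}\left| \int e^{-i\lambda^{-1}k\cdot t}\, \hat h(t)\, \omega\bigl(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda)\bigr)\, dt \right|
 &= \sup_{k\in V_\xi}\left| \int \hat f_{\lambda,k}(s)\, \omega\bigl(A_\lambda\alpha_s(B_\lambda)\bigr)\, ds \right| \\
 &= 0 \qquad \text{if } 0 < \lambda < \lambda_0
\end{aligned}$$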

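for all $(A_\lambda)_{\lambda>0}, (B_\lambda)_{\lambda>0} \in \mathbf{A}$. Relation (2.11) is thereby proved.

4) Now we turn to the case

$$\omega \text{ KMS at } \beta > 0 \;\Rightarrow\; ACS^2_{\mathbf{A}}(\omega) \subset \{(\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\} : \xi_2 \ge 0\}. \tag{2.12}$$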

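Consider a $\xi \in \mathbb{R}^2\setminus\{0\}$ with $\xi_2 < 0$ and pick some $\epsilon > 0$ and an open neighbourhood $V_\xi$ of $\xi$ so that $k_2 < -\epsilon$ for all $k = (k_1, k_2) \in V_\xi$. Choose again $h_j$ and $\hat h_j$ as above and define correspondingly $g_{\lambda,k}$ and $f_{\lambda,k}$ as in (2.9) and (2.10), respectively. Denote again their Fourier-transforms by $\hat g_{\lambda,k}$ and $\hat f_{\lambda,k}$. Note that $g_{\lambda,k}$ and $f_{\lambda,k}$ are in $C_0^\infty(\mathbb{R})$ for all $\lambda > 0$ and all $k \in \mathbb{R}^2$, so their Fourier-transforms are entire analytic. Moreover, a standard estimate shows that

$$\sup_{\lambda>0,\,k\in\mathbb{R}^2} \int |\hat g_{\lambda,k}(s + i\beta)|\, ds \le c < \infty. \tag{2.13}$$

One calculates

$$\hat f_{\lambda,k}(s + i\beta) = e^{\lambda^{-1}k_2\beta}\, e^{-i\lambda^{-1}k_2 s}\, \hat g_{\lambda,k}(s + i\beta), \qquad s \in \mathbb{R},$$

and now the KMS-condition (A.2) yields for all $(A_\lambda)_{\lambda>0}, (B_\lambda)_{\lambda>0} \in \mathbf{A}$,

$$\begin{aligned}
\left| \int \hat f_{\lambda,k}(s)\, \omega\bigl(A_\lambda\alpha_s(B_\lambda)\bigr)\, ds \right|
 &= \left| \int e^{\lambda^{-1}k_2\beta}\, e^{-i\lambda^{-1}k_2 s}\, \hat g_{\lambda,k}(s + i\beta)\, \omega\bigl(\alpha_s(B_\lambda)A_\lambda\bigr)\, ds \right| \\
 &\le e^{\lambda^{-1}k_2\beta}\, c\cdot c'\,(1 + \lambda^{-1})^{s'}, \qquad \lambda > 0,\ k \in \mathbb{R}^2,
\end{aligned}$$

for suitable $c', s' > 0$, where (2.6) and (2.13) have been used. Making use also of the $\{\alpha_t\}_{t\in\mathbb{R}}$-invariance of $\omega$ one finds, with suitable $\gamma > 0$,

$$\begin{aligned}
\sup_{k\in V_\xi}\left| \int e^{-i\lambda^{-1}k\cdot t}\, \hat h(t)\, \omega\bigl(\alpha_{t_1}(A_\lambda)\alpha_{t_2}(B_\lambda)\bigr)\, dt \right|
 &= \sup_{k\in V_\xi}\left| \int \hat f_{\lambda,k}(s)\, \omega\bigl(A_\lambda\alpha_s(B_\lambda)\bigr)\, ds \right| \\
 &\le \gamma\, e^{-\lambda^{-1}\epsilon\beta}\,(1 + \lambda^{-1})^{s'} = O^\infty(\lambda) \quad \text{as } \lambda \to 0
\end{aligned}$$

for all $(A_\lambda)_{\lambda>0}, (B_\lambda)_{\lambda>0} \in \mathbf{A}$. This establishes statement (2.12).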
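The exponential factor in step 4) comes from how the Fourier–Laplace transform behaves under translation, together with the elementary fact that an exponentially small factor beats any power of $\lambda^{-1}$. The following is a short sketch of both points, assuming the Fourier convention of step 2) extended to complex arguments $z$, and the translation convention $(\tau_s f)(s') = f(s' - s)$; the auxiliary symbols $z$, $x$ and $N$ are introduced here only for illustration.

```latex
% Assumed convention: \hat f(z) = (2\pi)^{-1/2} \int e^{-izp} f(p)\,dp for complex z
% (well defined since f has compact support).
\begin{align*}
f_{\lambda,k}(p) &= g_{\lambda,k}(p - \lambda^{-1}k_2), \\
\hat f_{\lambda,k}(z)
  &= \frac{1}{\sqrt{2\pi}} \int e^{-izp}\, g_{\lambda,k}(p - \lambda^{-1}k_2)\, dp
   = e^{-iz\lambda^{-1}k_2}\, \hat g_{\lambda,k}(z), \\
\hat f_{\lambda,k}(s + i\beta)
  &= e^{\lambda^{-1}k_2\beta}\, e^{-i\lambda^{-1}k_2 s}\, \hat g_{\lambda,k}(s + i\beta).
\end{align*}
% Decay: for k_2 < -\epsilon, \beta > 0 and any N \in \mathbb{N}, put x = 1/\lambda.
\begin{align*}
\lambda^{-N}\, e^{\lambda^{-1}k_2\beta}\, (1 + \lambda^{-1})^{s'}
  \;\le\; x^{N} (1 + x)^{s'} e^{-\epsilon\beta x}
  \;\longrightarrow\; 0 \qquad (x \to \infty).
\end{align*}
```

The second display is what makes the final bound $O^\infty(\lambda)$ in the sense of footnote 2, and it is also where $\beta > 0$ enters: for $\beta = 0$ the exponential suppression is absent.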


5) Combining now the assertions (2.8), (2.11) and (2.12), one can see that for each $\omega \in \mathcal{P}$ there holds

$$ACS^2_{\mathbf{A}}(\omega) \subset \{(\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\} : \xi_1 + \xi_2 = 0,\ \xi_2 \ge 0\}.$$

Since the set on the right-hand side obviously has no non-empty proper conic subset in $\mathbb{R}^2\setminus\{0\}$, one concludes that statement (1) of the proposition holds true.

6) As $\omega$ is KMS at $\beta = 0$, this means that it is a trace: $\omega(AB) = \omega(BA)$. Since $\omega$ is also $\{\alpha_t\}_{t\in\mathbb{R}}$-invariant, we have

$$ACS^2_{\mathbf{A}}(\omega) \subset \{(\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\} : \xi_1 + \xi_2 = 0\}.$$

The set on the right-hand side has precisely two non-empty proper closed conic subsets

$$W_\pm := \{(\xi_1, \xi_2) \in \mathbb{R}^2\setminus\{0\} : \xi_1 + \xi_2 = 0,\ \pm\xi_2 \ge 0\}.$$

These two sets are disjoint, $W_+ \cap W_- = \emptyset$, and we have $W_+ = -W_-$. Hence, since $\omega$ is a trace, one can argue exactly as in [44, Prop. 4.2] to conclude that each of the inclusions $ACS^2_{\mathbf{A}}(\omega) \subset W_+$ and $ACS^2_{\mathbf{A}}(\omega) \subset W_-$ implies $ACS^2_{\mathbf{A}}(\omega) = \emptyset$. This establishes statement (2) of the proposition. □

Hence we see that strict passivity of $\omega$ results in its $ACS^2_{\mathbf{A}}(\omega)$ being asymmetric. This is due to the fact that, roughly speaking, the negative part of the spectrum of the unitary group implementing $\{\alpha_t\}_{t\in\mathbb{R}}$ in such a state is suppressed by an exponential weight factor. It is worth noting that this asymmetry is not present for KMS-states at $\beta = 0$. Such states at infinite temperature would hardly be regarded as candidates for physical states, and they can be ruled out by the requirement that $ACS^2_{\mathbf{A}}(\omega)$ be asymmetric.

Remark 2.2. One can modify or, effectively, enlarge the set of testing families by allowing a testing family to depend on additional parameters: Define $\mathbf{A}_2$ as the set of all families $(A_{y,\lambda})_{\lambda>0,\,y\in\mathbb{R}^m}$, where $m \in \mathbb{N}$ is arbitrary (and depends on the family), having the property that for each semi-norm $\sigma \in S$ there is an $s \ge 0$ (depending on $\sigma$ and on the family) such that

$$\sup_{\lambda,\,y}\, \lambda^s\, \sigma(A_{y,\lambda}^* A_{y,\lambda}) < \infty. \tag{2.14}$$

Then the definition of a regular direction $\xi \in \mathbb{R}^2\setminus\{0\}$ for a state $\omega$ of the dynamical system $(\mathcal{A}, \{\alpha_t\}_{t\in\mathbb{R}})$ may be altered through declaring $\xi$ a regular direction iff there are an open neighbourhood $V$ of $\xi$ and a function $h \in C_0^\infty(\mathbb{R}^2)$, $h(0) = 1$, so that

$$\sup_{k\in V}\,\sup_{y,z}\left| \int e^{-i\lambda^{-1}k\cdot t}\, h(t)\, \omega\bigl(\alpha_{t_1}(A_{y,\lambda})\alpha_{t_2}(B_{z,\lambda})\bigr)\, dt \right| = O^\infty(\lambda) \quad \text{as } \lambda \to 0$$

holds for any pair of elements $(A_{y,\lambda})_{\lambda>0,\,y\in\mathbb{R}^m}$, $(B_{z,\lambda})_{\lambda>0,\,z\in\mathbb{R}^n}$ in $\mathbf{A}_2$. This makes the set of regular directions a priori smaller, and if we define $ACS^2_{\mathbf{A}_2}(\omega)$ as the complement of all $\xi \in \mathbb{R}^2\setminus\{0\}$ that are regular directions for $\omega$ according to the just given, altered definition, then clearly we have, in general, $ACS^2_{\mathbf{A}_2}(\omega) \supset ACS^2_{\mathbf{A}}(\omega)$. However, essentially by repeating the proof of Prop. 2.1, with somewhat more laborious notation, one can see that the statements of Prop. 2.1 remain valid upon replacing $ACS^2_{\mathbf{A}}(\omega)$ by $ACS^2_{\mathbf{A}_2}(\omega)$. We shall make use of that observation later.
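To connect the passivity estimate (2.1) of this section with the statement about energy extraction, it may help to evaluate it in finite dimensions. The sketch below is only an illustration under assumptions that do not appear in the text: the algebra is taken to be the $n \times n$ matrices, the dynamics is $\alpha_t(A) = e^{itH}Ae^{-itH}$ for a Hermitian matrix $H$, and $\omega$ is the Gibbs state with density matrix $\rho$ at inverse temperature $\beta \ge 0$. The symbols $H$, $\rho$, $E_j$, $p_i$ and $D_{ji}$ are introduced here for that purpose.

```latex
% Assumed setup (illustration only): A = M_n(C), H = H^*,
% \alpha_t(A) = e^{itH} A e^{-itH}, \omega(A) = Tr(\rho A).
\begin{align*}
\delta(A) &= \lim_{t\to 0} \tfrac{1}{t}\bigl(e^{itH} A e^{-itH} - A\bigr) = i\,[H, A], \\
\tfrac{1}{i}\,\omega\bigl(U^*\delta(U)\bigr)
  &= \omega\bigl(U^* H U\bigr) - \omega(H)
  \qquad \text{for unitary } U .
\end{align*}
% Gibbs state: \rho = e^{-\beta H} / Tr(e^{-\beta H}); eigenvalues E_1 \le ... \le E_n of H
% with eigenvectors e_j; weights p_1 \ge ... \ge p_n of \rho;
% D_{ji} = |<e_j, U e_i>|^2 is a doubly stochastic matrix.
\begin{align*}
\omega(U^* H U) - \omega(H)
  &= \sum_{i,j} p_i\, E_j\, D_{ji} \;-\; \sum_{i} p_i\, E_i \;\ge\; 0 .
\end{align*}
```

The left-hand side of (2.1) is thus the mean energy change produced by the cyclic operation $U$. The final inequality follows because a doubly stochastic matrix is a convex combination of permutation matrices (Birkhoff's theorem) and, by the rearrangement inequality, pairing decreasing weights $p_i$ with increasing energies $E_i$ minimizes the sum; so in this model no energy can be extracted from a Gibbs state.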


3. Wavefront Sets and Propagation of Singularities

3.1. Wavefront sets of vectorbundle-distributions. Let $X$ be a $C^\infty$ vector bundle over a base manifold $N$ ($n = \dim N \in \mathbb{N}$) with typical fibre isomorphic to $\mathbb{C}^r$ or to $\mathbb{R}^r$; the bundle projection will be denoted by $\pi_N$. (We note that here and throughout the text, we take manifolds to be $C^\infty$, Hausdorff, 2nd countable, finite dimensional and without boundary.) We shall write $C^\infty(X)$ for the space of smooth sections of $X$ and $C_0^\infty(X)$ for the subspace of smooth sections with compact support. These spaces can be endowed with locally convex topologies in a like manner as for the corresponding test-function spaces $\mathcal{E}(\mathbb{R}^n)$ and $\mathcal{D}(\mathbb{R}^n)$, cf. [9, 10] for details. By $(C^\infty(X))'$ and $(C_0^\infty(X))'$ we denote the respective spaces of continuous linear functionals, and by $C_0^\infty(X_U)$ the space of all smooth sections in $X$ having compact support in the open subset $U$ of $N$.

For later use, we introduce the following terminology. We say that $\rho$ is a local diffeomorphism of some manifold $X$ if $\rho$ is defined on some open subset $U_1 = \mathrm{dom}\,\rho$ of $X$ and maps it diffeomorphically onto another open subset $U_2 = \mathrm{Ran}\,\rho$ of $X$. If $U_1 = U_2 = X$, then $\rho$ is a diffeomorphism as usual. Let $\rho$ be a (local) diffeomorphism of the base manifold $N$. Then we say that $R$ is a (local) bundle map of $X$ covering $\rho$ if $R$ is a smooth map from $\pi_N^{-1}(\mathrm{dom}\,\rho)$ to $\pi_N^{-1}(\mathrm{Ran}\,\rho)$ so that, for each $q$ in $\mathrm{dom}\,\rho$, $R$ maps the fibre over $q$ linearly into the fibre over $\rho(q)$. If this map is also one-to-one and if $R$ is also a local diffeomorphism, then $R$ will be called a (local) morphism of $X$ covering $\rho$. Moreover, let $(\rho_x)_{x\in B}$ be a family of (local) diffeomorphisms of $N$ depending smoothly on $x \in B$, where $B$ is an open neighbourhood of $0 \in \mathbb{R}^s$ for some $s \in \mathbb{N}$. Then we call $(R_x)_{x\in B}$ a family of (local) morphisms of $X$ covering $(\rho_x)_{x\in B}$ if each $R_x$ is a morphism of $X$ covering $\rho_x$, depending smoothly on $x \in B$.

Note that each bundle map $R$ of $X$ covering a (local) diffeomorphism $\rho$ of $N$ induces a (local) action on $C_0^\infty(X)$ in the form of a continuous linear map $R^\star : C_0^\infty(X_{\mathrm{dom}\,\rho}) \to C_0^\infty(X_{\mathrm{Ran}\,\rho})$ given by

$$R^\star f := R\circ f\circ\rho^{-1}, \qquad f \in C_0^\infty(X_{\mathrm{dom}\,\rho}). \tag{3.1}$$

Given a local trivialization of $X$ over some open $U \subset N$, this induces a one-to-one correspondence between $C_0^\infty(X_U)$ and $\oplus^r\mathcal{D}(U)$, inducing in turn a one-to-one correspondence between $(C_0^\infty(X_U))'$ and $\oplus^r\mathcal{D}'(U)$. Now let $u \in (C_0^\infty(X_U))'$ and let $(u_1, \ldots, u_r) \in \oplus^r\mathcal{D}'(U)$ be the corresponding $r$-tuple of scalar distributions on $U$ induced by the local trivialization of $X$ over $U$. The wavefront set $\mathrm{WF}(u)$ of $u \in (C_0^\infty(X_U))'$ may then be defined as the union of the wavefront sets of the components $u_a$, i.e.

$$\mathrm{WF}(u) := \bigcup_{a=1}^{r} \mathrm{WF}(u_a), \tag{3.2}$$

cf. [8].³ It is not difficult to check that this definition is, in fact, independent of the choice of local trivialization of $X$ over $U$, and thus yields a definition of $\mathrm{WF}(u)$ for all $u \in (C_0^\infty(X))'$ having the properties familiar of the wavefront set of scalar distributions on the base manifold $N$, so that $\mathrm{WF}(u)$ is a conical subset of $T^*N\setminus\{0\}$.

³ We assume that the reader is familiar with the concept of the wavefront set of a scalar distribution, which is presented e.g. in the textbooks [25, 10, 37].

Another characterization of $\mathrm{WF}(u)$ may be given in the following way. Let $q \in U$ and $\xi \in T_q^*N\setminus\{0\}$. Choose any chart for $U$ around $q$, thus identifying $q$ with $0 \in \mathbb{R}^n$ and $\xi$ with $\xi \in T_0^*\mathbb{R}^n \equiv \mathbb{R}^n$ via the dual tangent map of the chart. With respect to the chosen coordinates, we introduce translations: $\check\rho_x(y) := y + x$, and dilations: $\check\delta_\lambda(y) := \lambda y$ on a sufficiently small coordinate ball around $y = 0$ and taking $\lambda > 0$ and the norm of $x \in \mathbb{R}^n$ small enough so that the coordinate range isn’t left. Via pulling these actions back with help of the chart they induce families of local diffeomorphisms $(\rho_x)_{x\in B}$ and $(\delta_\lambda)_{0<\lambda<\lambda_0}$ of $U$ for sufficiently small index ranges. Now let $F_q(X)$ be the set of all families $(f_\lambda)_{\lambda>0}$ of sections in $X$ with

(i) $f_\lambda \in C_0^\infty(X_{\delta_\lambda K})$ for some open neighbourhood $K$ of $q$ when $\lambda$ is sufficiently small;

(ii) for each continuous seminorm $\sigma$ on $C_0^\infty(X)$ there is $s \ge 0$ so that $\sup_\lambda \lambda^s\sigma(f_\lambda) < \infty$.

With these conventions, we can formulate:

Lemma 3.1. $(q, \xi)$ is not contained in $\mathrm{WF}(u)$ if and only if the following holds: For any family $(R_x)_{x\in B}$ of local morphisms of $X$ covering $(\rho_x)_{x\in B}$ there is some $h \in \mathcal{D}(\mathbb{R}^n)$ with $h(0) = 1$, and an open neighbourhood $V$ of $\xi$ (in $\mathbb{R}^n \equiv T_q^*N$), such that for all $(f_\lambda)_{\lambda>0} \in F_q(X)$ one has

$$\sup_{k\in V}\left| \int e^{-i\lambda^{-1}k\cdot x}\, h(x)\, u(R_x^\star f_\lambda)\, dx \right| = O^\infty(\lambda) \quad \text{as } \lambda \to 0. \tag{3.3}$$

Proof. Select a local trivialization of $X$ over $U$. With respect to it, there are smooth $GL(r)$-valued functions $(R^a_b(x))_{a,b=1}^r$ of $x$ such that⁴

$$u(R_x^\star f_\lambda) = R^a_b(x)\, u_a(f_\lambda^b\circ\rho_x^{-1}).$$

Now suppose that $(q, \xi)$ is not in $\mathrm{WF}(u)$, so that $(q, \xi)$ isn’t contained in any of the $\mathrm{WF}(u_a)$. Then, making use of the fact that the wavefront set of a scalar distribution may be characterized in terms of the decay properties of its localized Fourier-transforms in any coordinate chart (cf. [25]) in combination with Prop. 2.1 and Lemma 2.2 in [44], one obtains immediately the relation (3.3).

Conversely, assume that (3.3) holds. Since $(R^a_b(x))_{a,b=1}^r$ is in $GL(r)$ for each $x$ and depends smoothly on $x$, we can find a smooth family $(S^b_c(x))_{b,c=1}^r$ of functions of $x$ so that $S^b_c(x)R^a_b(x) = \delta^a_c$, $x \in B$. Since (3.3) holds, one may apply Lemma 2.2 of [44] to the effect that for some open neighbourhood $V'$ of $\xi$ and for all $((0, \ldots, \varphi_\lambda, \ldots, 0))_{\lambda>0} \in F_q(X)$, where only the $c$th entry is non-vanishing, one has

$$\begin{aligned}
\sup_{k\in V'}\left| \int e^{-i\lambda^{-1}k\cdot x}\, h(x)\, u_c(\varphi_\lambda\circ\rho_x^{-1})\, dx \right|
 &= \sup_{k\in V'}\left| \int e^{-i\lambda^{-1}k\cdot x}\, h(x)\, S^b_c(x)R^a_b(x)\, u_a(\varphi_\lambda\circ\rho_x^{-1})\, dx \right| \\
 &= O^\infty(\lambda) \quad \text{as } \lambda \to 0.
\end{aligned}$$

Then one concludes from Prop. 2.1 in [44] that $(q, \xi)$ isn’t contained in $\mathrm{WF}(u_c)$ for each $c = 1, \ldots, r$. □

⁴ Summation over repeated indices will be assumed from now on. See also footnote 5.
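A quick consistency check of the criterion (3.3) is the Dirac distribution, whose wavefront set is known to contain every non-zero covector over its support point. The sketch below assumes the simplest possible setting, none of which is taken from the text: $N = \mathbb{R}^n$ with the trivial line bundle ($r = 1$), $u = \delta_0$, $q = 0$, translations $\rho_x(y) = y + x$ covered by the identity on the fibres, and the testing family $f_\lambda(y) = \varphi(y/\lambda)$ with $\varphi \in C_0^\infty(\mathbb{R}^n)$ supported in $K$.

```latex
% Assumptions (illustration only): N = R^n, r = 1, u = \delta_0, q = 0,
% \rho_x(y) = y + x, R_x = identity on fibres, f_\lambda(y) = \varphi(y/\lambda).
\begin{align*}
u(R_x^{\star} f_\lambda) &= f_\lambda\bigl(\rho_x^{-1}(0)\bigr) = \varphi(-x/\lambda), \\
\int e^{-i\lambda^{-1}k\cdot x}\, h(x)\, u(R_x^{\star} f_\lambda)\, dx
  &= \lambda^{n} \int e^{ik\cdot y}\, h(-\lambda y)\, \varphi(y)\, dy
  \qquad (x = -\lambda y), \\
\lim_{\lambda\to 0}\, \lambda^{-n} \int e^{-i\lambda^{-1}k\cdot x}\, h(x)\, u(R_x^{\star} f_\lambda)\, dx
  &= h(0) \int e^{ik\cdot y}\, \varphi(y)\, dy .
\end{align*}
```

Choosing $\varphi(y) = e^{-i\xi\cdot y}\chi(y)$ with $\chi \ge 0$ not identically zero, the limit at $k = \xi$ equals $\int\chi(y)\,dy > 0$ for any $h$ with $h(0) = 1$. The integral therefore decays only like $\lambda^n$ and is not $O^\infty(\lambda)$, so no $\xi \neq 0$ is a regular direction at $q = 0$, in agreement with the known wavefront set of $\delta_0$.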


A very useful property is the behaviour of the wavefront set under (local) morphisms of $X$. We put on record here the following lemma without proof, which may be obtained by extending the proof for the scalar case in [25] together with some of the arguments appearing in the proof of Lemma 3.1.

Lemma 3.2. Let $U_1$ and $U_2$ be open subsets of $N$, and let $R : X_{U_1} \to X_{U_2}$ be a vector bundle map covering a diffeomorphism $\rho : U_1 \to U_2$. Let $u \in (C_0^\infty(X_{U_1}))'$. Then it holds that

$$\mathrm{WF}(R^\star u) \subset {}^tD\rho\,\mathrm{WF}(u) = \{(\rho^{-1}(x),\, {}^tD\rho\cdot\xi) : (x, \xi) \in \mathrm{WF}(u)\}, \tag{3.4}$$

where ${}^tD\rho$ denotes the transpose (or dual) of the tangent map of $\rho$. If $R$ is even a bundle morphism, then the inclusion (3.4) specializes to an equality.

3.2. Briefing on spacetime geometry. Since several concepts of spacetime geometry are going to play some role later on, we take the opportunity to introduce them here and establish the corresponding notation. We refer to the standard references [46, 23] for a more thorough discussion and also for definition of some well-established terminology that is not always introduced explicitly in the following. Let us assume that (M, g) is a spacetime, so that M is a smooth manifold of dimension m ≥ 2, and g is a Lorentzian metric having signature (+, −, . . . , −). It will also be assumed that the spacetime is time-orientable, and that a time-orientation has been chosen. Then one introduces, for any subset G of M, the corresponding future/past sets J ± (G), consisting of all points lying on piecewise smooth, continuous future/pastdirected causal curves emanating from G. A subset G ⊂ M is, by definition, causally separated from G if it has void intersection with J + (G) ∪ J − (G). Thus a pair of points (q, p) ∈ M × M is called causally separated if q is causally separated from p or vice versa, since this relation is symmetric. A smooth hypersurface @ in M is called a Cauchy-surface if each inextendible causal curve in (M, g) intersects @ exactly once. Spacetimes (M, g) possessing Cauchysurfaces are called globally hyperbolic. It can be shown that a globally hyperbolic spacetime admits smooth one-parametric foliations into Cauchy-surfaces. Globally hyperbolic spacetimes have a very well-behaved causal structure. A certain property of globally hyperbolic spacetimes will be important for applying the propagation of singularities theorem in Sect. 5, so we mention it here: Let v be a nonzero lightlike vector in Tq M for some q ∈ M. It defines a maximal smooth, affinely d parametrized geodesic γ : I → M with the properties γ (0) = q and dt γ (t)t=0 = v where “maximal” here refers to choosing I as the largest real interval (I is taken as a neighbourhood of 0, and may coincide e.g. 
with ℝ) where γ is a smooth solution of the geodesic equation compatible with the specified data at q. Then γ is both future- and past-inextendible (see e.g. the argument in [35, Prop. 4.3]), and consequently, given an arbitrary Cauchy-surface Σ ⊂ M, there is exactly one parameter value t ∈ I so that γ(t) ∈ Σ.

3.3. Wave-operators and propagation of singularities. Suppose that we are given a time-oriented spacetime (M, g). Then let V be a vector bundle with base manifold M, typical fibre isomorphic to $\mathbb{C}^r$, and bundle projection $\pi_M$. Moreover, we assume that there exists a morphism C of V covering the identity map of M which is involutive ($C \circ C = \mathrm{id}_V$) and

Passivity and Microlocal Spectrum Condition


acts anti-isomorphically on the fibres; in other words, C acts like a complex conjugation in each fibre space. Therefore, the C-invariant part $V^\circ$ of V is a vector bundle over the base M with typical fibre $\mathbb{R}^r$. A linear partial differential operator $P : C_0^\infty(V) \to C_0^\infty(V)$ will be said to have metric principal part if, upon choosing a local trivialization of V over U ⊂ M in which sections $f \in C_0^\infty(V_U)$ take the component representation $(f^1, \dots, f^r)$, and a chart $(x^\mu)_{\mu=1}^m$, one has the following coordinate representation for P:⁵

$$(Pf)^a(x) = g^{\mu\nu}(x)\,\partial_\mu\partial_\nu f^a(x) + A^{\nu a}{}_{b}(x)\,\partial_\nu f^b(x) + B^a_b(x)\, f^b(x).$$

Here, $\partial_\mu$ denotes the coordinate derivative $\partial/\partial x^\mu$, and $A^{\nu a}{}_{b}$ and $B^a_b$ are suitable collections of smooth, complex-valued functions. Observe that thus the principal part of P diagonalizes in all local trivializations (it is “scalar”). If P has metric principal part and is in addition C-invariant, i.e.

$$C^\star \circ P \circ C^\star = P, \tag{3.5}$$

then we call P a wave operator. In this case, P leaves the space $C_0^\infty(V^\circ)$ of $C^\star$-invariant sections invariant. As an aside we note that then there is a covariant derivative (linear connection) $\nabla^{(P)}$ on $V^\circ$ together with a bundle map v of $V^\circ$ covering $\mathrm{id}_M$ such that $Pf = g^{\mu\nu}\nabla^{(P)}_\mu\nabla^{(P)}_\nu f + v^\star f$ for all $f \in C_0^\infty(V^\circ)$; this covariant derivative is given by

$$2\cdot\nabla^{(P)}_{\mathrm{grad}\,\varphi} f = P(\varphi f) - \varphi P(f) - (\Box_g\varphi) f$$

for all $\varphi \in C_0^\infty(M, \mathbb{R})$ and $f \in C_0^\infty(V^\circ)$, where $\Box_g$ denotes the d’Alembert-operator induced by the metric g on scalar functions [20].

Before we can state the version of the propagation of singularities theorem that will be relevant for our considerations later, we need to introduce further notation. By $V \boxtimes V$ we denote the outer product bundle of V. This is the $C^\infty$-vector bundle over M × M whose fibres over $(q_1, q_2) \in M \times M$ are $V_{q_1} \otimes V_{q_2}$, where $V_{q_j}$ denotes the fibre over $q_j$ (j = 1, 2), and with base projection defined by

$$v_{q_1} \otimes v'_{q_2} \mapsto (q_1, q_2) \quad\text{for}\quad v_{q_1} \otimes v'_{q_2} \in V_{q_1} \otimes V_{q_2}.$$

Note also that the conjugation C on V induces a conjugation $\boxtimes^2 C$ on $V \boxtimes V$ by anti-linear extension of the assignment

$$\boxtimes^2 C\,(v_{q_1} \otimes v'_{q_2}) := C v_{q_1} \otimes C v'_{q_2}, \qquad q_j \in M.$$

The definition of $\boxtimes^n V$, the n-fold outer tensor product of V, should then be obvious, and likewise the definition of $\boxtimes^n C$. Going to local trivializations and using partition of unity arguments, it is not difficult to see that the canonical embedding $C_0^\infty(V) \otimes C_0^\infty(V) \subset C_0^\infty(V \boxtimes V)$ is dense ([10]).

⁵ Greek indices are raised and lowered with $g^{\mu}{}_{\nu}(x)$, Latin indices with $\delta^a_b$.
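As an illustration of the notion of metric principal part, consider the scalar Klein–Gordon operator. This is a sketch added for orientation and not part of the original text; it assumes the trivial line bundle $V = M \times \mathbb{C}$ (so r = 1) with C given by complex conjugation, and a constant mass parameter m ≥ 0, which is a symbol introduced here only for the example.

```latex
% Illustrative example (assumption: trivial line bundle, constant mass m).
% \Gamma^{\lambda}_{\mu\nu} denotes the Christoffel symbols of the metric g.
\begin{align*}
  (\Box_g + m^2) f
    &= g^{\mu\nu}\nabla_\mu\nabla_\nu f + m^2 f \\
    &= g^{\mu\nu}\,\partial_\mu\partial_\nu f
       \;-\; g^{\mu\nu}\Gamma^{\lambda}_{\mu\nu}\,\partial_\lambda f
       \;+\; m^2 f .
\end{align*}
% Comparing with the coordinate representation of P (fibre indices a, b
% are trivial for r = 1):
\[
  A^{\lambda} = -\,g^{\mu\nu}\Gamma^{\lambda}_{\mu\nu},
  \qquad
  B = m^2 .
\]
```

Both coefficients are real-valued, so this operator commutes with complex conjugation and is therefore a wave operator in the sense of (3.5).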


H. Sahlmann, R. Verch

Moreover, if we take some $L \in (C_0^\infty(V \boxtimes V))'$, then it induces a bilinear form F over $C_0^\infty(V)$ by setting

$$F(f, f') = L(f \otimes f'), \qquad f, f' \in C_0^\infty(V). \tag{3.6}$$

Clearly F is then jointly continuous in both entries. On the other hand, if F is a bilinear form over $C_0^\infty(V)$ which is separately continuous in both entries (f ↦ F(f, f′) is continuous for each fixed f′, and f′ ↦ F(f, f′) is continuous for each fixed f), then the nuclear theorem implies that there is an $L \in (C_0^\infty(V \boxtimes V))'$ inducing F according to (3.6) [10]. These statements generalize to the case of n-fold tensor products in the obvious manner. Now define

$$N := \{(q, \xi) \in T^*M \setminus \{0\} : g^{\mu\nu}(q)\,\xi_\mu\xi_\nu = 0\}.$$

(The notation (q, ξ) ∈ T*M means that $\xi \in T_q^*M$, i.e. q denotes the base point of the cotangent vector ξ.) Moreover, define for each pair (q, ξ; q′, ξ′) ∈ N × N:

(q, ξ) ∼ (q′, ξ′) iff there exists an affinely parametrized lightlike geodesic γ in (M, g) connecting q and q′ and such that ξ and ξ′ are co-tangent to γ at q and q′, respectively.

Here, we say that ξ is co-tangent to γ at q = γ(s) if $\left(\frac{d}{dt}\gamma(t)\right)^{\mu}\big|_{t=s} = g^{\mu\nu}(q)\,\xi_\nu$, where t is the affine parameter. Therefore, (q, ξ) ∼ (q′, ξ′) means ξ and ξ′ are parallel transports of each other along the lightlike geodesic γ connecting q and q′. Note that the possibility q = q′ is included, in which case (q, ξ) ∼ (q′, ξ′) means ξ = ξ′. One can introduce the following two disjoint future/past-oriented parts (with respect to the time-orientation of (M, g)) of N,

$$N_\pm := \{(q, \xi) \in N \mid \pm\xi \rhd 0\}, \tag{3.7}$$

where ξ ⊳ 0 means that the vector $\xi^\mu = g^{\mu\nu}\xi_\nu$ is future-pointing. The relation “∼” is obviously an equivalence relation between elements in N. For (q, ξ) ∈ N, the corresponding equivalence class is denoted by B(q, ξ); it is a bicharacteristic strip of any wave operator P on V since such an operator has metric principal part and therefore its bi-characteristics are lightlike geodesics (see, e.g. [30]).

Now we are ready to state a specialized version of the propagation of singularities theorem (PST) which is tailored for two-point distributions that are solutions (up to $C^\infty$-terms) of wave operators, and which derives as a special case of the PST in [8]. We should like to point out that the formulation of the PST in [8] (extending arguments developed in [12] for the scalar case) is considerably more general in two respects: First, it applies, with suitable modifications, not only to linear second order differential operators with metric principal part, but to pseudo-differential operators on $C_0^\infty(V)$ that have a so-called “real principal part” (of which “metric principal part” is a special case; note also that a metric principal part is homogeneous). Secondly, the general formulation of the PST gives not only information about the wavefront set of a $u \in (C_0^\infty(V))'$ which is a solution up to $C^\infty$-terms of a pseudo-differential operator A having real principal part (i.e. WF(Au) = ∅), but even describes properties of the polarization set of such a u. The polarization set $WF_{pol}(u)$ of $u \in (C_0^\infty(V))'$ is a subset of the direct product bundle T*M ⊕ V over M and specifies which components of u (in a local trivialization of V) have the worst decay properties in Fourier-space near any given base point in M;


the projection of $WF_{pol}(u)$ onto its T*M-part coincides with the wavefront set WF(u). The reader is referred to [8] for details and further discussion, and also to [33, 24] for a discussion of the polarization set for Dirac fields on curved spacetimes. As a corollary to the PST formulated in [8] together with Lemma 6.5.5 in [12] (see also [30] for an elementary account), one obtains the following:

Proposition 3.3. Let P be a wave operator on $C_0^\infty(V)$ and define for $w \in (C_0^\infty(V \boxtimes V))'$ the distributions $w^{(P)}, w_{(P)} \in (C_0^\infty(V \boxtimes V))'$ by

$$w^{(P)}(f \otimes f') := w(Pf \otimes f'), \qquad w_{(P)}(f \otimes f') := w(f \otimes Pf'), \tag{3.8}$$

for all $f, f' \in C_0^\infty(V)$. Suppose that $WF(w^{(P)}) = \emptyset = WF(w_{(P)})$. Then it holds that WF(w) ⊂ N × N and

$$(q, \xi; q', \xi') \in WF(w) \ \text{with}\ \xi \neq 0 \ \text{and}\ \xi' \neq 0 \;\Rightarrow\; B(q, \xi) \times B(q', \xi') \subset WF(w).$$

4. Quantum Fields

4.1. The Borchers algebra. We begin our discussion of linear quantum fields obeying a wave equation by recalling the definition and basic properties of the Borchers-algebra [2]. Let V denote a vector bundle over the base-manifold M as in the previous section. Then consider the set

$$\mathcal{B} := \{\mathbf{f} \equiv (f_n)_{n=0}^\infty : f_0 \in \mathbb{C},\ f_n \in C_0^\infty(\boxtimes^n V),\ \text{only finitely many } f_n \neq 0\},$$

where $\boxtimes^n V$ denotes the n-fold outer product bundle of V, cf. Sect. 3. The set $\mathcal{B}$ is a priori a vector space, but one may also introduce a ∗-algebraic structure on it: A product $\mathbf{f} \cdot \mathbf{g}$ for elements $\mathbf{f}, \mathbf{g} \in \mathcal{B}$ is given by defining the nth component $(\mathbf{f} \cdot \mathbf{g})_n$ to be

$$(\mathbf{f} \cdot \mathbf{g})_n := \sum_{i+j=n} f_i \otimes g_j.$$

Here, $f_i \otimes g_j$ is understood as the element in $C_0^\infty(\boxtimes^n V)$ induced by the canonical embedding $C_0^\infty(\boxtimes^i V) \otimes C_0^\infty(\boxtimes^j V) \subset C_0^\infty(\boxtimes^n V)$. Observe that $\mathcal{B}$ possesses a unit element $1_{\mathcal{B}}$, given by the sequence $((1_{\mathcal{B}})_n)_{n=0}^\infty$ having the number 1 in the 0th component while all other components vanish. Moreover, for $\mathbf{f} \in \mathcal{B}$ one can define $\mathbf{f}^*$ by setting

$$f_n^*(q_1, \dots, q_n) := (\boxtimes^n C) f_n(q_n, \dots, q_1), \qquad q_j \in M, \tag{4.1}$$

for the nth component of $\mathbf{f}^*$, where C denotes the complex conjugation assumed to be given on V. This yields an anti-linear involution on $\mathcal{B}$. With these definitions of product and ∗-operation, $\mathcal{B}$ is a ∗-algebra. Furthermore, $\mathcal{B}$ has a natural “local net structure” in the sense that one obtains an inclusion-preserving map $M \supset O \mapsto \mathcal{B}(O) \subset \mathcal{B}$ taking subsets O of M to unital ∗-subalgebras $\mathcal{B}(O)$ of $\mathcal{B}$ upon defining $\mathcal{B}(O)$ to consist of all $(f_n)_{n=0}^\infty$ for which $\mathrm{supp}\, f_n \subset O^n$, n ∈ ℕ.
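To make the product concrete, the following sketch writes out its components for two elements with at most two nonvanishing entries, $\mathbf{f} = (f_0, f_1, 0, \dots)$ and $\mathbf{g} = (g_0, g_1, 0, \dots)$. It is an illustration added here, not a formula quoted from the original; it only instantiates the defining sum over i + j = n.

```latex
% Components of the Borchers product for f = (f_0, f_1, 0, ...) and
% g = (g_0, g_1, 0, ...); f_0, g_0 are complex numbers, so a tensor
% product with them is just scalar multiplication.
\begin{align*}
  (\mathbf{f}\cdot\mathbf{g})_0 &= f_0\, g_0, \\
  (\mathbf{f}\cdot\mathbf{g})_1 &= f_0\, g_1 + g_0\, f_1, \\
  (\mathbf{f}\cdot\mathbf{g})_2 &= f_1 \otimes g_1, \\
  (\mathbf{f}\cdot\mathbf{g})_n &= 0 \qquad (n \ge 3).
\end{align*}
% Taking f to be the unit (f_0 = 1, f_1 = 0) reproduces g component by
% component, which is the statement that the sequence (1, 0, 0, ...) is
% the unit element.
```

In particular the product is not commutative in degree 2, since $f_1 \otimes g_1$ and $g_1 \otimes f_1$ are in general different sections over M × M.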


Another simple fact is that (local) morphisms of V commuting with C can be lifted to (local) automorphisms of $\mathcal{B}$. To this end, let $(R_x)_{x \in B}$ be a family of (local) morphisms of V covering $(\rho_x)_{x \in B}$, and assume that $C R_x = R_x C$ for all x. Suppose that O ⊂ M is in the domain of $\rho_x$; then define a map $\alpha_x$ on $\mathcal{B}(O)$ by setting for $\mathbf{f} \in \mathcal{B}(O)$ the nth component, $(\alpha_x \mathbf{f})_n$, of $\alpha_x \mathbf{f}$ to be

$$(\alpha_x \mathbf{f})_n := (\boxtimes^n R_x^\star) f_n, \tag{4.2}$$

where

$$(\boxtimes^n R_x^\star)(g^{(1)} \otimes \cdots \otimes g^{(n)}) := R_x^\star g^{(1)} \otimes \cdots \otimes R_x^\star g^{(n)}, \qquad g^{(j)} \in C_0^\infty(V),$$

defines the outer product action of $R_x^\star$ via linear extension on $C_0^\infty(\boxtimes^n V)$. It is not difficult to check that this yields a ∗-isomorphism $\alpha_x : \mathcal{B}(O) \to \mathcal{B}(\rho_x(O))$.

We will now turn $\mathcal{B}$ into a locally convex space by giving it the topology of the strict inductive limit of the topological vector spaces

$$\mathcal{B}_n := \mathbb{C} \oplus \bigoplus_{k=1}^{n} C_0^\infty(\boxtimes^k V), \qquad n \in \mathbb{N}.$$

This topology is known as the locally convex direct sum topology (see e.g. [3, Chap. II, §4, no. 5]). Some important properties of $\mathcal{B}$, equipped with this topology, are given in the following lemma, the proof of which will be deferred to Appendix B.

Lemma 4.1. With the topology given above, $\mathcal{B}$ is complete and a topological ∗-algebra. Moreover, a linear functional $u : \mathcal{B} \to \mathbb{C}$ is continuous if and only if there is a sequence $(u_n)_{n=0}^\infty$ with $u_0 \in \mathbb{C}$ and $u_j \in (C_0^\infty(\boxtimes^j V))'$ for j ∈ ℕ so that

$$u(\mathbf{f}) = u_0 f_0 + \sum_{j \in \mathbb{N}} u_j(f_j), \qquad \mathbf{f} \in \mathcal{B}. \tag{4.3}$$

If α is a ∗-automorphism lifting a morphism R of V to $\mathcal{B}$ as in (4.2), then α is continuous. Moreover, let $(R_x)_{x \in B}$ be a family of morphisms of V depending smoothly on x with $C R_x = R_x C$ and $R_0 = \mathrm{id}_V$, and let $(\alpha_x)_{x \in B}$ be the family of ∗-automorphisms of $\mathcal{B}$ induced according to (4.2). Then for each $\mathbf{f} \in \mathcal{B}$ it holds that

$$\alpha_x(\mathbf{f}) \to \mathbf{f} \quad\text{for}\quad x \to 0, \tag{4.4}$$

and there is a constant r > 0 such that to each continuous semi-norm σ of $\mathcal{B}$ one can find another semi-norm σ′ with the property

$$\sigma(\alpha_x(\mathbf{f})) \le \sigma'(\mathbf{f}), \qquad |x| \le r,\ \mathbf{f} \in \mathcal{B}. \tag{4.5}$$

4.2. States and quantum fields. A state ω on $\mathcal{B}$ is a continuous linear form on $\mathcal{B}$ which fulfills the positivity requirement $\omega(\mathbf{f}^*\mathbf{f}) \ge 0$ for all $\mathbf{f} \in \mathcal{B}$. By Lemma 4.1 such a state ω is completely characterized by a set $\{\omega_n \mid n \in \mathbb{N}_0\}$ of linear functionals $\omega_n \in (C_0^\infty(\boxtimes^n V))'$, the so-called n-point functions. The positivity requirement allows one to associate with any state ω a Hilbert space ∗-representation by the well-known Gelfand–Naimark–Segal (GNS) construction (or the Wightman reconstruction theorem [41]). More precisely, given a state ω on $\mathcal{B}$, there exists a triple $(\varphi, \mathcal{D} \subset \mathcal{H}, \Omega)$, called GNS-representation of ω, possessing the following properties:


(a) $\mathcal{H}$ is a Hilbert space, and $\mathcal{D}$ is a dense linear subspace of $\mathcal{H}$.
(b) ϕ is a ∗-representation of $\mathcal{B}$ on $\mathcal{H}$ by closable operators with common domain $\mathcal{D}$.
(c) Ω is a unit vector contained in $\mathcal{D}$ which is cyclic, i.e. $\mathcal{D} = \varphi(\mathcal{B})\Omega$, and has the property that $\omega(\mathbf{f}) = \langle \Omega, \varphi(\mathbf{f})\Omega \rangle$, $\mathbf{f} \in \mathcal{B}$.

Furthermore, the GNS-representation is unique up to unitary equivalence. We refer to [40, Part II] for further details on ∗-representations of ∗-algebras as well as for a proof of these statements and references to the relevant original literature. Therefore, a state ω on $\mathcal{B}$ induces a quantum field, that is to say, an operator-valued distribution

$$C_0^\infty(V) \ni f \mapsto \Phi(f) := \varphi(\mathbf{f}), \qquad \mathbf{f} = (0, f, 0, 0, \dots), \tag{4.6}$$

where the Φ(f) are, for each $f \in C_0^\infty(V)$, closable operators on the dense and invariant domain $\mathcal{D}$ and one has $\Phi(C^\star f) \subset \Phi(f)^*$, where $\Phi(f)^*$ denotes the adjoint operator of Φ(f). Conversely, such a quantum field induces states on $\mathcal{B}$: Given some unit vector $\psi \in \mathcal{D}$, the assignment

$$\omega^{(\psi)}(c \cdot 1_{\mathcal{B}}) := c, \quad c \in \mathbb{C},$$
$$\omega^{(\psi)}(f^{(1)} \otimes \cdots \otimes f^{(n)}) := \langle \psi, \Phi(f^{(1)}) \cdots \Phi(f^{(n)})\psi \rangle, \qquad f^{(j)} \in C_0^\infty(V),$$

defines, by linear extension, a state $\omega^{(\psi)}$ on $\mathcal{B}$. (Obviously this generalizes from vector states to mixed states.)

If the quantum field Φ is an observable field, then one would require commutativity at causal separation, and this means Φ(f)Φ(f′) = Φ(f′)Φ(f) whenever the supports of f and f′ are causally separated. Such commutative behaviour (locality) of Φ at causal separation is characteristic of bosonic fields. On the other hand, a field Φ is fermionic if it anti-commutes at causal separation (twisted locality), i.e. Φ(f)Φ(f′) = −Φ(f′)Φ(f) for causally separated supports of f and f′. The general analysis of quantum field theory so far has shown that the alternative of having quantum fields of bosonic or fermionic character may largely be viewed as generic at least for spacetime dimensions greater than 2 [21, 41, 11, 19].

If ω is a state on $\mathcal{B}$ inducing via its GNS-representation a bosonic field, then it follows that the commutator $\omega_2^{(-)}$ of its two-point function, defined by

$$\omega_2^{(-)}(f \otimes f') := \tfrac{1}{2}\big(\omega_2(f \otimes f') - \omega_2(f' \otimes f)\big), \qquad f, f' \in C_0^\infty(V),$$

vanishes as soon as the supports of f and f′ are causally separated. If, on the other hand, ω induces a fermionic field, then the anti-commutator,

$$\omega_2^{(+)}(f \otimes f') := \tfrac{1}{2}\big(\omega_2(f \otimes f') + \omega_2(f' \otimes f)\big), \qquad f, f' \in C_0^\infty(V),$$

of its two-point function vanishes when the supports of f and f′ are causally separated. For our purposes in Sect. 5, we may assume a weaker version of bosonic or fermionic behaviour of quantum fields: We shall later suppose that $\omega_2^{(+)}$ or $\omega_2^{(-)}$ is smooth ($C^\infty$) at causal separation. The definition relevant for that terminology is as follows:


Definition 4.2. Let $w \in (C_0^\infty(V \boxtimes V))'$. We say that w is smooth at causal separation if $WF(w_Q) = \emptyset$, where Q is the set of all pairs of points (q, q′) ∈ M × M which are causally separated in (M, g)⁷ and $w_Q$ denotes the restriction of w to $C_0^\infty((V \boxtimes V)_Q)$.

4.3. Quasifree states. Of particular interest are quasifree states associated with quantum fields obeying canonical commutation relations (CCR) or canonical anti-commutation relations (CAR). A simple way of introducing them is via the characterization of such states given in [29] which we will basically follow here. Note, however, that in this reference the map K in (4.7) is defined on certain quotients of $C_0^\infty(V^\circ)$ while we define K on $C_0^\infty(V^\circ)$ itself (recall that $C_0^\infty(V^\circ)$ is the space of $C^\star$-invariant sections). This is due to the fact that we haven’t imposed CCR or CAR for states on the Borchers algebra, so the notion of quasifree states given here is, in this respect, more general.

Let h be a complex Hilbert space (the so-called “one-particle Hilbert space”) and $F_\pm(h)$ the bosonic/fermionic Fock-space over h. By $a_\pm(\cdot)$ and $a_\pm^\dagger(\cdot)$ we denote the corresponding annihilation and creation operators, respectively. The Fock-vacuum vector will be denoted by $\Omega_\pm$. Then we say that a state ω on $\mathcal{B}$ is a (bosonic/fermionic) quasifree state if there exists a real-linear map

$$K : C_0^\infty(V^\circ) \to h \tag{4.7}$$

whose complexified range is dense in h, such that the GNS-representation $(\varphi, \mathcal{D} \subset \mathcal{H}, \Omega)$ of ω takes the following form: $\mathcal{H} = F_\pm(h)$, $\Omega = \Omega_\pm$, and

$$\Phi(f) = \frac{1}{\sqrt{2}}\left(a_\pm(K(f)) + a_\pm^\dagger(K(f))\right), \qquad f \in C_0^\infty(V^\circ),$$

where Φ(·) relates to ϕ(·) as in (4.6).

Quasifree states are in a sense the simplest states. It is, however, justified to give them prominence since for quantum fields obeying a linear wave-equation, ground- and KMS-states turn out to be quasifree in examples. Any quasifree state ω is entirely determined by its two-point function, i.e. by the map

$$C_0^\infty(V) \times C_0^\infty(V) \ni (f^{(1)}, f^{(2)}) \mapsto \omega_2(f^{(1)} \otimes f^{(2)}) = \langle \Omega, \Phi(f^{(1)})\Phi(f^{(2)})\Omega \rangle,$$

in the sense that the n-point functions

$$\omega_n(f^{(1)} \otimes \cdots \otimes f^{(n)}) = \langle \Omega, \Phi(f^{(1)}) \cdots \Phi(f^{(n)})\Omega \rangle, \qquad f^{(j)} \in C_0^\infty(V),$$

vanish for all odd n, while the n-point functions for even n can be expressed as polynomials in the variables $\omega_2(f^{(i)} \otimes f^{(j)})$, i, j = 1, . . . , n. This attaches particular significance to the two-point functions for quantum fields obeying linear wave equations. We refer to [4, 29, 1] for further discussion of quasifree states and their basic properties.

⁷ Q is an open subset in M × M due to global hyperbolicity.
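For orientation, the case n = 4 of the polynomial expression mentioned in Sect. 4.3 can be written out explicitly. This is a sketch based on the standard Wick expansion for Fock-space fields of the form given there, not a formula quoted from the original; the shorthand $\omega_2(i,j)$ is introduced here only for this illustration, and the test sections are taken to be $C^\star$-invariant.

```latex
% Shorthand (assumption, for this example only):
%   \omega_2(i,j) := \omega_2(f^{(i)} \otimes f^{(j)}),
%   with f^{(1)},...,f^{(4)} in C_0^\infty(V^\circ).
\[
  \omega_4\bigl(f^{(1)}\otimes f^{(2)}\otimes f^{(3)}\otimes f^{(4)}\bigr)
  \;=\; \omega_2(1,2)\,\omega_2(3,4)
  \;\pm\; \omega_2(1,3)\,\omega_2(2,4)
  \;+\; \omega_2(1,4)\,\omega_2(2,3).
\]
% Upper sign: bosonic Fock space. Lower sign: fermionic Fock space, where
% the pairing (1,3)(2,4) is the only one with crossing lines and hence
% carries the sign of an odd permutation.
```

In each term the two arguments of $\omega_2$ keep their original order (i < j), which matters because $\omega_2$ is in general not symmetric.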


5. Passivity and Microlocal Spectrum Condition

In the present section we will state and prove our main result connecting passivity and microlocal spectrum condition for linear quantum fields obeying a hyperbolic wave equation on a globally hyperbolic, stationary spacetime. First, we need to collect the assumptions. It will be assumed that V is a vector bundle, equipped with a conjugation C, over a base manifold M carrying a time-orientable Lorentzian metric g, and that (M, g) is globally hyperbolic. Moreover, we assume that the spacetime (M, g) is stationary, so that there is a one-parametric $C^\infty$-group $\{\tau_t\}_{t \in \mathbb{R}}$ of isometries whose generating vector field, denoted by $\partial^\tau$, is everywhere timelike and future-pointing (with respect to a fixed time-orientation). We recall that the notation $N_\pm$ for the future/past-oriented parts of the set of null-covectors N has been introduced in (3.7), and note that $(q, \xi) \in N_\pm$ iff $\pm\xi(\partial^\tau) > 0$. It is furthermore supposed that there is a smooth one-parametric group $\{T_t\}_{t \in \mathbb{R}}$ of morphisms of V covering $\{\tau_t\}_{t \in \mathbb{R}}$, and a wave operator P on $C_0^\infty(V)$, having the following properties:

$$T_t^\star \circ P = P \circ T_t^\star, \qquad C \circ T_t = T_t \circ C, \qquad t \in \mathbb{R}.$$

Now let $\mathcal{B}$ again denote the Borchers algebra as in Sect. 4. The automorphism group induced by lifting $\{T_t\}_{t \in \mathbb{R}}$ on $\mathcal{B}$ according to (4.2) will be denoted by $\{\alpha_t\}_{t \in \mathbb{R}}$. Whence, by Lemma 4.1, $(\mathcal{B}, \{\alpha_t\}_{t \in \mathbb{R}})$ is a topological ∗-dynamical system. Recall that a state ω on $\mathcal{B}$ is, by definition, contained in $\mathcal{P}$ if it is a convex combination of ground- or KMS-states at strictly positive inverse temperature for $\{\alpha_t\}_{t \in \mathbb{R}}$.

Theorem 5.1. Let $\omega \in \mathcal{P}$ and let $\omega_2$ be the two-point distribution of ω (see Sect. 4.2). Suppose that $WF(\omega_2^{(P)}) = \emptyset = WF(\omega_{2(P)})$, where $\omega_2^{(P)}$ and $\omega_{2(P)}$ are defined as in (3.8), and suppose also that the symmetric part $\omega_2^{(+)}$ or the anti-symmetric part $\omega_2^{(-)}$ of the two-point distribution is smooth at causal separation (Definition 4.2). Then it holds that $WF(\omega_2) \subset R$, where R is the set

$$R := \{(q, \xi; q', \xi') \in N_- \times N_+ : (q, \xi) \sim (q', -\xi')\}. \tag{5.1}$$

Proof. 1) Let q be any point in M. Then there is a coordinate chart $\kappa = (y^0, \underline{y}) = (y^0, y^1, \dots, y^{m-1})$ around q so that, for small |t|, $\kappa \circ \tau_t = \check{\tau}_t \circ \kappa$ holds on a neighbourhood of q, where $\check{\tau}_t(y^0, \underline{y}) := (y^0 + t, \underline{y})$. In such a coordinate system, we can also define “spatial” translations $\check{\rho}_x(y^0, \underline{y}) := (y^0, \underline{y} + x)$ for $x = (x^1, \dots, x^{m-1})$ in a sufficiently small neighbourhood B of the origin in $\mathbb{R}^{m-1}$. Let $(R_x)_{x \in B}$ be any smooth family of local morphisms around q covering $(\rho_x)_{x \in B}$, where $\rho_x := \kappa^{-1} \circ \check{\rho}_x \circ \kappa$ (on a sufficiently small neighbourhood of q). Now let q′ be another point, and choose in an analogous manner as for q a coordinate system κ′, and $(\rho'_{x'})_{x' \in B'}$ and $(R'_{x'})_{x' \in B'}$.


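2) In a further step we shall now establish the relation

$$WF(\omega_2) \subset \{(q, \xi; q', \xi') \in (T^*M \times T^*M) \setminus \{0\} : \xi(\partial^\tau) + \xi'(\partial^\tau) = 0,\ \xi'(\partial^\tau) \ge 0\}. \tag{5.2}$$

Since we have $WF(\omega_2) \subset N \times N$ by Prop. 3.3, this then allows us to conclude that

$$WF(\omega_2) \subset \{(q, \xi; q', \xi') \in N_- \times N_+ : \xi(\partial^\tau) + \xi'(\partial^\tau) = 0\}, \tag{5.3}$$

and we observe that thereby the possibility $(q, \xi; q', \xi') \in WF(\omega_2)$ with ξ = 0 or ξ′ = 0 is excluded, because that would entail both ξ = 0 and ξ′ = 0.

For proving (5.2) it is, in view of Lemma 3.1 and according to our choice of the coordinate systems κ, κ′ and corresponding actions $(R_x)_{x \in B}$ and $(R'_{x'})_{x' \in B'}$, sufficient to demonstrate that the following holds: There is a function $h \in C_0^\infty(\mathbb{R}^m \times \mathbb{R}^m)$ with h(0) = 1, and for each $(\xi; \xi') = (\xi_0, \underline{\xi}; \xi'_0, \underline{\xi}') \in (\mathbb{R}^m \times \mathbb{R}^m) \setminus \{0\}$ with $\xi_0 + \xi'_0 \neq 0$ or $\xi'_0 < 0$ there is an open neighbourhood $V \subset (\mathbb{R}^m \times \mathbb{R}^m) \setminus \{0\}$ so that

$$\sup_{(k;k') \in V} \left| \int e^{-i\lambda^{-1}(t k_0 + x \cdot \underline{k})}\, e^{-i\lambda^{-1}(t' k'_0 + x' \cdot \underline{k}')}\, h(t, x; t', x')\, \omega_2\big((T_t^\star R_x^\star \otimes T_{t'}^\star R'^{\star}_{x'}) F_\lambda\big)\, dt\, dt'\, dx\, dx' \right| = O^\infty(\lambda) \tag{5.4}$$

as λ → 0 holds for all $(F_\lambda)_{\lambda > 0} \in \mathbf{F}_{(q;q')}(V \boxtimes V)$. (The notation $k = (k_0, \underline{k})$ should be obvious.) However, making use of part (c) of the statement of Prop. 2.1 in [44], for proving (5.4) it is actually enough to show that there are h and V as above so that

$$\sup_{(k;k') \in V} \left| \int e^{-i\lambda^{-1}(t k_0 + x \cdot \underline{k})}\, e^{-i\lambda^{-1}(t' k'_0 + x' \cdot \underline{k}')}\, h(t, x; t', x')\, \omega_2\big(T_t^\star R_x^\star f_\lambda \otimes T_{t'}^\star R'^{\star}_{x'} f'_\lambda\big)\, dt\, dt'\, dx\, dx' \right| = O^\infty(\lambda) \tag{5.5}$$

as λ → 0 holds for all $(f_\lambda)_{\lambda > 0} \in \mathbf{F}_q(V)$ and all $(f'_\lambda)_{\lambda > 0} \in \mathbf{F}_{q'}(V)$.

In order now to exploit the strict passivity of ω via Prop. 2.1, we define the set $\mathbf{B}_2$ of testing families with respect to the Borchers algebra $\mathcal{B}$ in the same manner as we have defined the set $\mathbf{A}_2$ of testing families for the algebra $\mathcal{A}$ in Remark 2.2. In other words, a $\mathcal{B}$-valued family $(\mathbf{f}_{z,\lambda})_{\lambda > 0,\, z \in \mathbb{R}^n}$ is a member of $\mathbf{B}_2$, for arbitrary n ∈ ℕ, whenever for each continuous seminorm σ on $\mathcal{B}$ there is some s ≥ 0 so that

$$\sup_{z,\lambda}\ \lambda^s\, \sigma(\mathbf{f}^*_{z,\lambda}\, \mathbf{f}_{z,\lambda}) < \infty.$$

Now if $(f_\lambda)_{\lambda > 0}$ is in $\mathbf{F}_q(V)$, then $(\mathbf{f}_{x,\lambda})_{\lambda > 0,\, x \in B}$ defined by

$$\mathbf{f}_{x,\lambda} := (0, R_x^\star f_\lambda, 0, 0, \dots) \tag{5.6}$$

is easily seen to be a testing family in $\mathbf{B}_2$. The same of course holds when taking any $(f'_\lambda)_{\lambda > 0} \in \mathbf{F}_{q'}(V)$ and defining $(\mathbf{f}'_{x',\lambda})_{\lambda > 0,\, x' \in B'}$ accordingly. Since $\omega \in \mathcal{P}$, it follows from Prop. 2.1 and Remark 2.2 that, with respect to the time-translation group $\{\alpha_t\}_{t \in \mathbb{R}}$,

$$ACS^2_{\mathbf{B}_2}(\omega) \subset \{(\xi_0, \xi'_0) \in \mathbb{R}^2 \setminus \{0\} : \xi_0 + \xi'_0 = 0,\ \xi'_0 \ge 0\}.$$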


And this means that there is some $h_0 \in C_0^\infty(\mathbb{R}^2)$ with $h_0(0) = 1$, and for each $(\xi_0, \xi'_0) \in \mathbb{R}^2 \setminus \{0\}$ with $\xi_0 + \xi'_0 \neq 0$ or $\xi'_0 < 0$ an open neighbourhood $V_0$ in $\mathbb{R}^2 \setminus \{0\}$ so that

$$\sup_{(k_0, k'_0) \in V_0,\ x,\, x'} \left| \int e^{-i\lambda^{-1}(t k_0 + t' k'_0)}\, h_0(t, t')\, \omega\big(\alpha_t(\mathbf{f}_{x,\lambda})\, \alpha_{t'}(\mathbf{f}'_{x',\lambda})\big)\, dt\, dt' \right| = O^\infty(\lambda) \tag{5.7}$$

as λ → 0 for all $(\mathbf{f}_{x,\lambda})_{\lambda > 0,\, x \in B}$ and $(\mathbf{f}'_{x',\lambda})_{\lambda > 0,\, x' \in B'}$ in $\mathbf{B}_2$. When $(\mathbf{f}_{x,\lambda})_{\lambda > 0,\, x \in B} \in \mathbf{B}_2$ relates to $(f_\lambda)_{\lambda > 0} \in \mathbf{F}_q(V)$ as in (5.6), and if their primed counterparts are likewise related, then for sufficiently small |t|, |t′| and x ∈ B, x′ ∈ B′ one has

$$\omega\big(\alpha_t(\mathbf{f}_{x,\lambda})\, \alpha_{t'}(\mathbf{f}'_{x',\lambda})\big) = \omega_2\big(T_t^\star R_x^\star f_\lambda \otimes T_{t'}^\star R'^{\star}_{x'} f'_\lambda\big)$$

for small enough λ. Whence, upon taking $V = \{(k_0, \underline{k}; k'_0, \underline{k}') : (k_0, k'_0) \in V_0,\ \underline{k}, \underline{k}' \in \mathbb{R}^{m-1}\}$ and $h(t, x; t', x') = h_0(t, t')\, \underline{h}(x, x')$, where $\underline{h}$ is in $C_0^\infty(\mathbb{R}^{m-1} \times \mathbb{R}^{m-1})$ with $\underline{h}(0) = 1$, and with $h_0$ and $\underline{h}$ having sufficiently small supports, it is now easy to see that (5.7) entails the required relation (5.5), proving (5.2), whence (5.3) is also established.

3) Now we shall show that the assumption that $\omega_2^{(+)}$ is smooth at causal separation implies that also $\omega_2^{(-)}$ and hence $\omega_2$ itself is smooth at causal separation. The same conclusion can be drawn assuming instead that $\omega_2^{(-)}$ is smooth at causal separation. We will present the proof only for the first mentioned case, the argument for the second being completely analogous.

We define Q as the set of pairs of causally separated points (q, q′) ∈ M × M. The restriction of $\omega_2$ to $C_0^\infty((V \boxtimes V)_Q)$ will be denoted by $\omega_{2Q}$. By assumption, $\omega_{2Q}^{(+)}$ has empty wavefront set and therefore $WF(\omega_{2Q}) = WF(\omega_{2Q}^{(-)})$. Since (q, q′) ∈ Q iff (q′, q) ∈ Q, the “flip” map ρ : (q, q′) ↦ (q′, q) is a diffeomorphism of Q. Then

$$R : V_q \otimes V_{q'} \ni v_q \otimes v'_{q'} \mapsto v'_{q'} \otimes v_q \in V_{q'} \otimes V_q \tag{5.8}$$

is a morphism of $(V \boxtimes V)_Q$ covering ρ. Thus one finds $[R^\star(f \otimes f')](q, q') = f'(q) \otimes f(q')$, implying

$$\omega_{2Q}^{(-)}(R^\star(f \otimes f')) = \omega_{2Q}^{(-)}(f' \otimes f) = -\omega_{2Q}^{(-)}(f \otimes f')$$

for all $f \otimes f' \in C_0^\infty((V \boxtimes V)_Q)$. Noting that multiplication by constants different from zero doesn’t change the wavefront set of a distribution, this entails, with Lemma 3.2,

$$WF(\omega_{2Q}^{(-)}) = WF(\omega_{2Q}^{(-)} \circ R^\star) = {}^tD\rho\, WF(\omega_{2Q}^{(-)}). \tag{5.9}$$

Now it is easy to check that

$${}^tD\rho\,(q, \xi; q', \xi') = (q', \xi'; q, \xi)$$

for all (q, ξ; q′, ξ′) ∈ T*M × T*M, and this implies

$${}^tD\rho\,(N_- \times N_+) = N_+ \times N_-. \tag{5.10}$$

However, since we already know from (5.3) that $WF(\omega_{2Q}) \subset N_- \times N_+$ and $WF(\omega_{2Q}^{(+)}) = \emptyset$, we see that $WF(\omega_{2Q}^{(-)}) \subset N_- \times N_+$. Combining this with (5.9) and (5.10) yields

$$WF(\omega_{2Q}^{(-)}) \subset (N_- \times N_+) \cap (N_+ \times N_-) = \emptyset.$$

And thus we conclude that $\omega_2$ is smooth at causal separation.

4) Now we will demonstrate that the wavefront set has the form (5.1) for points (q, q) on the diagonal in M × M, by demonstrating that otherwise singularities for causally separated points would occur according to the propagation of singularities (Prop. 3.3). To this end, let (q, ξ; q, ξ′) be in $WF(\omega_2)$ with ξ not parallel to ξ′. In view of the observation made below (5.3) that we must have ξ ≠ 0 and ξ′ ≠ 0, we obtain from Prop. 3.3

$$B(q, \xi) \times B(q, \xi') \subset WF(\omega_2).$$

For any Cauchy surface of M, one can find (p, η; p′, η′) in B(q, ξ) × B(q, ξ′) with p and p′ lying on that Cauchy surface because of the inextendibility of the bi-characteristics. Since ξ is not parallel to ξ′, one can even choose that Cauchy surface so that p ≠ p′ (if such a choice were not possible, the bi-characteristics through q with cotangent ξ and ξ′ would coincide). But this is in contradiction to the result of 3) since p and p′ are causally separated. Hence, only (q, ξ; q, ξ′) with ξ′ = λξ, λ ∈ ℝ can be in $WF(\omega_2)$. Applying the constraint $\xi(\partial^\tau) + \xi'(\partial^\tau) = 0$ found in (5.3) gives λ = −1. Together with the other constraint $WF(\omega_2) \subset N_- \times N_+$ of (5.3) we now see that if (q, ξ; q, ξ′) is in $WF(\omega_2)$ it must be in R.

5) It will be shown next that $\omega_2$ is smooth at points (q, q′) in M × M which are causally related but not connected by any lightlike geodesic: Suppose (q, ξ; q′, ξ′) were in $WF(\omega_2)$ with q, q′ as described. Using global hyperbolicity and the inextendibility of the bi-characteristics, we can then find (p, η) in B(q, ξ) with p lying on the same Cauchy surface as q′. As p cannot be equal to q′ by assumption, it must be causally separated from q′, and so we have by Prop. 3.3 a contradiction to 3). Thus, $\omega_2$ must indeed be smooth at (q, q′).

6) Finally, we consider the case of points (q, q′) connected by at least one lightlike geodesic: Let (q, ξ; q′, ξ′) be in $WF(\omega_2)$. 
To begin with, we assume additionally that ξ is not co-tangential to any of the lightlike geodesics connecting q and q′. As in 4) we then find (p, η; p′, η′) in B(q, ξ) × B(q′, ξ′) with p and p′ lying on the same Cauchy surface and p ≠ p′, thus establishing a contradiction to 3). To cover the remaining case, let ξ be co-tangential to one of the lightlike geodesics connecting q and q′. As a consequence, we find η with (q′, η) ∈ B(q, ξ). By 4), we have η = −ξ′, ξ′ ⊳ 0, showing (q, ξ; q′, ξ′) to be in R. □

We conclude this article with a few remarks. First we mention that for the canonically quantized scalar Klein–Gordon field, $WF(\omega_2) \subset R$ implies $WF(\omega_2) = R$ and thus the two-point function of every strictly passive state is of Hadamard form, see [35]. Results allowing similar conclusions for vector-valued fields subject to CCR or CAR will appear in [38]. In [27], quasifree ground states have been constructed for the scalar Klein–Gordon field on stationary, globally hyperbolic spacetimes where the norm of the Killing vector field is globally bounded away from zero. Our result shows that they all have two-point functions of Hadamard form. As mentioned in the introduction, quasifree ground- and KMS-states have also been constructed for the scalar Klein–Gordon field on Schwarzschild spacetime [28], and again we conclude that their two-point functions are of Hadamard form.


In [18], massive vector fields are quantized on globally hyperbolic, ultrastatic spacetimes using (apparently) a ground state representation, and our methods apply also in this case.
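As a supplement to step 4) of the proof of Theorem 5.1, the short computation behind “gives λ = −1” can be spelled out. The following is a sketch of that one step, added here for the reader; it uses only that a nonzero lightlike covector cannot annihilate a timelike vector.

```latex
% Step 4) of the proof: (q,\xi; q,\xi') in WF(\omega_2) with
% \xi' = \lambda \xi, \lambda real, and \xi a nonzero lightlike covector.
\begin{align*}
  0 &= \xi(\partial^\tau) + \xi'(\partial^\tau)
       && \text{(constraint from (5.3))} \\
    &= (1+\lambda)\,\xi(\partial^\tau) .
\end{align*}
% Since \partial^\tau is timelike and \xi \neq 0 is lightlike, one has
% \xi(\partial^\tau) \neq 0, and therefore
\[
  \lambda = -1, \qquad \text{i.e.} \qquad \xi' = -\xi .
\]
% With (q,\xi) in N_- and (q,\xi') in N_+ from (5.3), the relation
% (q,\xi) \sim (q,-\xi') = (q,\xi) holds trivially, so the element lies in R.
```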

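Appendix A. Ground- and KMS-States, Passivity

Let $(\mathcal{A}, \{\alpha_t\}_{t \in \mathbb{R}})$ be a topological ∗-dynamical system as described in Sect. 2. We recall that a continuous linear functional $\omega : \mathcal{A} \to \mathbb{C}$ is called a state if $\omega(A^*A) \ge 0$ for all $A \in \mathcal{A}$ and $\omega(1_{\mathcal{A}}) = 1$. Now let

$$\hat{f}(t) := \frac{1}{\sqrt{2\pi}} \int e^{-ipt} f(p)\, dp, \qquad f \in C_0^\infty(\mathbb{R}),$$

denote the Fourier-transform. Note that $\hat{f}$ extends to an entire analytic function of t ∈ ℂ. Then a convenient way of defining ground- and KMS-states is the following: The state ω is a ground state for $(\mathcal{A}, \{\alpha_t\}_{t \in \mathbb{R}})$ if $\mathbb{R} \ni t \mapsto \omega(A\alpha_t(B))$ is, for each $A, B \in \mathcal{A}$, a bounded function and if moreover,

$$\int_{-\infty}^{\infty} \hat{f}(t)\, \omega(A\alpha_t(B))\, dt = 0, \qquad A, B \in \mathcal{A}, \tag{A.1}$$

holds for all $f \in C_0^\infty((-\infty, 0))$. The state ω is a KMS state at inverse temperature β > 0 for $(\mathcal{A}, \{\alpha_t\}_{t \in \mathbb{R}})$ if $\mathbb{R} \ni t \mapsto \omega(A\alpha_t(B))$ is, for each $A, B \in \mathcal{A}$, a bounded function and if moreover,

$$\int_{-\infty}^{\infty} \hat{f}(t)\, \omega(A\alpha_t(B))\, dt = \int_{-\infty}^{\infty} \hat{f}(t + i\beta)\, \omega(\alpha_t(B)A)\, dt, \qquad A, B \in \mathcal{A}, \tag{A.2}$$

holds for all $f \in C_0^\infty(\mathbb{R})$. The state ω is a KMS state at inverse temperature β = 0 if ω is $\{\alpha_t\}_{t \in \mathbb{R}}$-invariant and a trace, i.e.

$$\omega(AB) = \omega(BA), \qquad A, B \in \mathcal{A}. \tag{A.3}$$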
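A finite-dimensional example may help to see what condition (A.2) expresses. The following sketch is an illustration added here under assumptions that are not part of the original setting: $\mathcal{A}$ is taken to be the algebra of N × N complex matrices, H is a Hermitian matrix, the dynamics is $\alpha_t(A) = e^{itH} A e^{-itH}$, and ω is the Gibbs state with normalization $Z = \mathrm{Tr}\, e^{-\beta H}$.

```latex
% Assumptions (for this example only): matrix algebra, Hermitian H,
% \alpha_t(A) = e^{itH} A e^{-itH}, Z = \mathrm{Tr}\, e^{-\beta H}.
\[
  \omega(A) := Z^{-1}\,\mathrm{Tr}\bigl(e^{-\beta H} A\bigr).
\]
% Continuing t to complex values gives
% \alpha_{t+i\beta}(B) = e^{-\beta H}\,\alpha_t(B)\,e^{\beta H}, hence
\begin{align*}
  \omega\bigl(A\,\alpha_{t+i\beta}(B)\bigr)
    &= Z^{-1}\,\mathrm{Tr}\bigl(e^{-\beta H} A\, e^{-\beta H}\alpha_t(B)\, e^{\beta H}\bigr) \\
    &= Z^{-1}\,\mathrm{Tr}\bigl(A\, e^{-\beta H}\alpha_t(B)\bigr)
       && \text{(cyclicity of the trace)} \\
    &= Z^{-1}\,\mathrm{Tr}\bigl(e^{-\beta H}\alpha_t(B)\,A\bigr)
     \;=\; \omega\bigl(\alpha_t(B)\,A\bigr).
\end{align*}
% With F(z) := \omega(A\,\alpha_z(B)), which is entire in this setting,
% the right-hand side of (A.2) equals \int \hat f(t+i\beta) F(t+i\beta)\,dt,
% and shifting the contour of integration back to the real axis yields
% the left-hand side of (A.2).
```

The contour shift is unproblematic in this example because $\hat{f}$ decays rapidly along horizontal lines and F is bounded on the strip 0 ≤ Im z ≤ β.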

(Note that we have here additionally imposed $\{\alpha_t\}_{t \in \mathbb{R}}$-invariance in the definition of KMS state at β = 0. Other references define a KMS state at β = 0 just by requiring it to be a trace. The invariance doesn’t follow from that, cf. [4].) We note that various other, equivalent definitions of ground- and KMS-states are known (mostly formulated for the case that $(\mathcal{A}, \{\alpha_t\}_{t \in \mathbb{R}})$ is a $C^*$-dynamical system), see e.g. [4] and [39] as well as references cited there. The term “KMS” stands for Kubo, Martin and Schwinger who introduced and used the first versions of condition (A.2). The significance of KMS-states as thermal equilibrium states, particularly for infinite systems in quantum statistical mechanics, has been established in [22]. The following properties of any ground- or KMS-state ω at inverse temperature β > 0 are standard in the setting of $C^*$-dynamical systems, and the proofs known for this case carry over to topological ∗-dynamical systems:


(i) ω is $\{\alpha_t\}_{t \in \mathbb{R}}$-invariant,
(ii) $-i\omega(A\delta(A)) \ge 0$ for all $A = A^* \in D(\delta)$ (where δ and D(δ) are as introduced at the beginning of Sect. 2).

Let us indicate how one proceeds in proving these statements. We first consider the case where ω is a ground state. Since $\mathcal{A}$ contains a unit element, the ground state condition (A.1) says that for any $A \in \mathcal{A}$ the Fourier-transform of the function $t \mapsto \omega(\alpha_t(A))$ vanishes on (−∞, 0). For $A = A^*$, that Fourier-transform is symmetric and hence is supported at the origin. As $t \mapsto \omega(\alpha_t(A))$ is bounded, its Fourier-transform can thus only be a multiple of the Dirac-distribution. This entails that $t \mapsto \omega(\alpha_t(A))$ is constant. By linearity, this carries over to arbitrary $A \in \mathcal{A}$, and thus ω is $\{\alpha_t\}_{t \in \mathbb{R}}$-invariant. Now we may pass to the GNS-representation $(\varphi, \mathcal{D} \subset \mathcal{H}, \Omega)$ of ω (cf. Sect. 4 where this object was introduced for the Borchers-algebra, but the construction can be carried out for topological ∗-algebras, see [40]) and we observe that, if ω is invariant, then $\{\alpha_t\}_{t \in \mathbb{R}}$ is in the GNS-representation implemented by a strongly continuous unitary group $\{U_t\}_{t \in \mathbb{R}}$ leaving Ω as well as the domain $\mathcal{D} = \varphi(\mathcal{A})\Omega$ invariant. This unitary group is defined by

$$U_t\, \varphi(A)\Omega := \varphi(\alpha_t A)\Omega, \qquad A \in \mathcal{A},\ t \in \mathbb{R}.$$

Since it is continuous, it possesses a selfadjoint generator H, i.e. $U_t = e^{itH}$, and the ground state condition implies that the spectrum of H is contained in [0, ∞). Therefore, one has for all A ∈ D(δ),

$$-i\omega(A^*\delta(A)) = \langle \varphi(A)\Omega, H\varphi(A)\Omega \rangle \ge 0,$$

and this entails property (ii).

Now let ω be a KMS-state at inverse temperature β > 0. For the proof of its $\{\alpha_t\}_{t \in \mathbb{R}}$-invariance, see Prop. 4.3.2 in [39]. Property (ii) is then a consequence of the so-called “auto-correlation lower bounds”, see [39, Thm. 4.3.16] or [4, Thm. 5.3.15]. (Note that the proofs of the cited theorems generalize to the case where ω is a state on a topological ∗-algebra.)

B. Proof of Lemma 4.1

We now want to give the proof of Lemma 4.1. First, we state some properties of the strict inductive limit of a sequence of locally convex spaces, the topology given to $\mathcal{B}$ being a specific example. See for example [3, II, §4] for proofs as well as for further details. Let $(E_n)_{n=1}^\infty$ be a sequence of locally convex linear spaces such that $E_n \subset E_{n+1}$ and the relative topology of $E_n$ in $E_{n+1}$ coincides with the genuine topology of $E_n$ for all n ∈ ℕ. Let E be the inductive limit of the $E_n$, denote by $\pi_n : E_n \hookrightarrow E$ the canonical imbeddings of the $E_n$ into E and let F be some locally convex space. In this situation, we have:

(a) E is a locally convex space.
(b) A map f : E → F is continuous iff $f \circ \pi_n$ is continuous for each n in ℕ.
(c) A family of maps $(f_\iota)_\iota$, $f_\iota : E \to F$, is equicontinuous iff the family $(f_\iota \circ \pi_n)_\iota$ is equicontinuous for each n in ℕ.
(d) The relative topology of $E_n$ in E coincides with the genuine topology of $E_n$.
(e) If the $E_n$ are complete, so is E.

Now we can prove the lemma. Because of (e), B is complete. The characterization (4.3) of the continuous linear forms on B is a special case of (b). We want to check now that B is a topological ∗-algebra, i.e. that its ∗-operation is continuous and its multiplication m : B × B → B is separately continuous in both entries. For f ∈ B let m_f : B → B be the right multiplication with f and denote by [f] the smallest integer such that f_k = 0 for all k > [f]. By (b), showing continuity of m_f amounts to showing the continuity of the maps m_f ◦ π_n : B_n → B_{n+[f]}, where by (d), we can take the topologies involved to be the genuine topologies of the respective spaces. As those topologies are direct sum topologies with finitely many summands, the question of continuity can be further reduced, finding that m_f is continuous iff the maps

C_0^∞(⊠^n V) ∋ g ↦ g ⊗ f ∈ C_0^∞(⊠^{n+k} V)

are continuous for all f ∈ C_0^∞(⊠^k V), k ∈ N. That this is indeed the case can be checked by recourse to the topologies of the C_0^∞(⊠^l V). Therefore, the maps m_f ◦ π_n are continuous for all n, which in turn shows the continuity of right multiplication on B. In the same way, the proof of continuity of the ∗-operation reduces to showing continuity of the ∗-operation (4.1) on C_0^∞(⊠^n V), which in turn is easy. The continuity of the left-multiplication can be proven completely analogously to that of right-multiplication or inferred from it, using the continuity of the ∗-operation. Therefore, B equipped with the locally convex direct sum topology is indeed a topological ∗-algebra.

For the proof of the last statements of the lemma, let α_x be a ∗-homomorphism of B which is the lift of a morphism R_x of V covering ρ_x as stated in Lemma 4.1. Because of (b) and (d) above, α_x is continuous iff its restrictions α_x ◦ π_n : B_n → B_n are continuous, which in turn is the case iff the maps ⊗^n R_x^⋆ : C_0^∞(⊠^n V) → C_0^∞(⊠^n V) are continuous. That the ⊗^n R_x^⋆ are indeed continuous follows from density of ⊗^n C_0^∞(V) in C_0^∞(⊠^n V) together with the continuity of R_x^⋆ on C_0^∞(V), the latter of which can again be checked by inspection of the topology on C_0^∞(V).

For the proof of the continuity property (4.4), note that [α_x(f)] = [f] for all x, thus it suffices to prove the convergence of α_x(f) for x → 0 in the topology induced on B_{[f]}. But this convergence is implied by the assumed smoothness of R_x (hence of R_x^⋆) in x together with (d).

The proof of (4.5) amounts to showing that (α_x)_{|x|≤r} is an equicontinuous set of maps. By (c) and (d), the proof can in the by now familiar way be reduced to proving equicontinuity of (R_x^⋆)_{|x|≤r} for some r > 0. For the proof of the latter, note that because of the assumed smoothness of ρ_x in x we find r > 0 such that for each compact set K ⊂ M the set ∪_{|x|≤r} ρ_x(K) is contained in some other compact set. Inspection of the topology of C_0^∞(V) shows that this enables one to find to each given seminorm η on C_0^∞(V) another seminorm η′ such that for all f in C_0^∞(V), η(R_x^⋆ f) ≤ η′(f) holds for |x| ≤ r, thus proving the desired equicontinuity.

References

1. Araki, H.: On quasifree states of CAR and Bogoliubov transformation. Publ. RIMS 6, 385 (1970/71); Araki, H., Yamagami, S.: On quasi-equivalence of quasifree states of the canonical commutation relations. Publ. RIMS 18, 283 (1982)
2. Borchers, H.J.: On the structure of the algebra of field operators. Nuovo Cimento 24, 214 (1962); Algebraic aspects of Wightman field theory. In: Sen, R.N., Weil, C. (eds.), Statistical Mechanics and Field Theory, Jerusalem: Israel Universities Press, 1972
3. Bourbaki, N.: Éléments de mathématique, Fascicule XV: Espaces Vectoriels Topologiques. 2nd edn., Paris: Hermann, 1966
4. Bratteli, O., Robinson, D.W.: Operator algebras and quantum statistical mechanics, vol. 2. 2nd edn., Berlin: Springer-Verlag, 1997
5. Brunetti, R., Fredenhagen, K.: Microlocal analysis and interacting quantum field theories: renormalization on physical backgrounds. Commun. Math. Phys. 208, 623 (2000)
6. Brunetti, R., Fredenhagen, K., Köhler, M.: The microlocal spectrum condition and Wick polynomials of free fields in curved spacetimes. Commun. Math. Phys. 180, 633 (1996)
7. Buchholz, D., Florig, M., Summers, S.J.: The second law of thermodynamics, TCP and Einstein causality in anti-de Sitter spacetime. Class. Quantum Grav. 17, L31 (2000)
8. Dencker, N.: On the propagation of polarization sets for systems of real principal type. J. Funct. Anal. 46, 351 (1982)
9. Dieudonné, J.: Foundations of Analysis, vol. 3. New York: Academic Press, 1972
10. Dieudonné, J.: Foundations of Analysis, vol. 7. New York: Academic Press, 1988
11. Doplicher, S., Roberts, J.E.: Why there is a field algebra with a compact gauge group describing the superselection structure in particle physics. Commun. Math. Phys. 131, 51 (1990)
12. Duistermaat, J.J., Hörmander, L.: Fourier integral operators. II. Acta Mathematica 128, 183 (1972)
13. Fewster, C.J.: A general worldline quantum inequality. Class. Quantum Grav. 17, 1897 (2000)
14. Fredenhagen, K., Haag, R.: On the derivation of Hawking radiation associated with the formation of a black hole. Commun. Math. Phys. 127, 273 (1990)
15. Fulling, S.A.: Aspects of quantum field theory in curved spacetime. Cambridge: Cambridge University Press, 1989
16. Fulling, S.A., Narcowich, F.J., Wald, R.M.: Singularity structure of the two-point function in quantum field theory in curved spacetime, II. Ann. Phys. (N.Y.) 136, 243 (1981)
17. Fulling, S.A., Ruijsenaars, S.N.M.: Temperature, periodicity and horizons. Phys. Rep. 152, 135 (1987)
18. Furlani, E.P.: Quantization of massive vector fields on ultrastatic spacetimes. Class. Quantum Grav. 14, 1665 (1997)
19. Guido, D., Longo, R., Roberts, J.E., Verch, R.: Charged sectors, spin and statistics in quantum field theory on curved spacetimes. math-ph/9906019, to appear in Rev. Math. Phys.
20. Günther, P.: Huygens principle and hyperbolic equations. Boston: Academic Press, 1988
21. Haag, R.: Local quantum physics. 2nd edn., Berlin: Springer-Verlag, 1996
22. Haag, R., Hugenholtz, N.M., Winnink, M.: On the equilibrium states in quantum statistical mechanics. Commun. Math. Phys. 5, 215 (1967)
23. Hawking, S.W., Ellis, G.F.R.: The large scale structure of space-time. Cambridge: Cambridge University Press, 1973
24. Hollands, S.: The Hadamard condition for Dirac fields and adiabatic states on Robertson–Walker spacetimes. gr-qc/9906076
25. Hörmander, L.: The Analysis of Linear Partial Differential Operators I. Berlin: Springer-Verlag, 1983
26. Junker, W.: Hadamard states, adiabatic vacua and the construction of physical states for scalar quantum fields on curved spacetime. Rev. Math. Phys. 8, 1091 (1996)
27. Kay, B.S.: Linear spin-zero quantum fields in external gravitational and scalar fields. Commun. Math. Phys. 62, 55 (1978)
28. Kay, B.S.: The double-wedge algebra for quantum fields on Schwarzschild and Minkowski spacetimes. Commun. Math. Phys. 100, 57 (1985)
29. Kay, B.S.: Sufficient conditions for quasifree states and an improved uniqueness theorem for quantum fields on space-times with horizons. J. Math. Phys. 34, 4519 (1993)
30. Kay, B.S., Radzikowski, M.J., Wald, R.M.: Quantum Field Theory on Spacetimes with a Compactly Generated Cauchy Horizon. Commun. Math. Phys. 183, 533 (1997)
31. Kay, B.S., Wald, R.M.: Theorems on the uniqueness and thermal properties of stationary, nonsingular, quasifree states on spacetimes with a bifurcate Killing horizon. Phys. Rep. 207, 49 (1991)
32. Keyl, M.: Quantum field theory and the geometric structure of Kaluza-Klein spacetime. Class. Quantum Grav. 14, 629 (1997)
33. Kratzert, K.: Singularitätsstruktur der Zweipunktfunktion des freien Diracfeldes in einer global hyperbolischen Raumzeit. Diploma thesis, University of Hamburg, 1999
34. Pusz, W., Woronowicz, S.L.: Passive states and KMS states for general quantum systems. Commun. Math. Phys. 58, 273 (1978)
35. Radzikowski, M.J.: Micro-local approach to the Hadamard condition in quantum field theory in curved spacetime. Commun. Math. Phys. 179, 529 (1996)
36. Radzikowski, M.J.: A local-to-global singularity theorem for quantum field theory on curved spacetime. Commun. Math. Phys. 180, 1 (1996)
37. Reed, M., Simon, B.: Methods of modern mathematical physics II. San Diego: Academic Press, 1975
38. Sahlmann, H., Verch, R.: Microlocal spectrum condition and Hadamard form for vector-valued quantum fields in curved spacetime. math-ph/0008029
39. Sakai, S.: Operator algebras in dynamical systems. Cambridge: Cambridge University Press, 1991
40. Schmüdgen, K.: Unbounded operator algebras and representation theory. Basel: Birkhäuser, 1990
41. Streater, R.F., Wightman, A.S.: PCT, spin and statistics, and all that. New York: Benjamin, 1964
42. Verch, R.: Local definiteness, primarity and quasiequivalence of quasifree Hadamard quantum states in curved spacetime. Commun. Math. Phys. 160, 507 (1994)
43. Verch, R.: Continuity of symplectically adjoint maps and the algebraic structure of Hadamard vacuum representations of quantum fields in curved spacetime. Rev. Math. Phys. 9, 635 (1997)
44. Verch, R.: Wavefront sets in algebraic quantum field theory. Commun. Math. Phys. 205, 337 (1999)
45. Wald, R.M.: The back-reaction effect in particle creation in curved spacetime. Commun. Math. Phys. 54, 1 (1977)
46. Wald, R.M.: General relativity. Chicago: University of Chicago Press, 1984
47. Wald, R.M.: Quantum field theory in curved spacetime and black hole thermodynamics. Chicago: University of Chicago Press, 1994

Communicated by H. Araki

Commun. Math. Phys. 214, 733 – 735 (2000)

Communications in

Mathematical Physics

© Springer-Verlag 2000

Erratum

Breit–Wigner Approximation and the Distribution of Resonances

Vesselin Petkov1, Maciej Zworski2

1 Département de Mathématiques Appliquées, Université de Bordeaux I, 351, Cours de la Libération, 33405 Talence, France. E-mail: [email protected]
2 Mathematics Department, University of California, Evans Hall, Berkeley, CA 94720, USA. E-mail: [email protected]

Received: 14 June 2000 / Accepted: 26 June 2000

Commun. Math. Phys. 204, 329–351 (1999)

In Lemmas 2 and 4 in [4], the Harnack inequality is applied incorrectly at the end of the proofs. The crucial inequality

$$ |\operatorname{Re} g_{\rho,\tau}(\lambda)| \le C|\lambda|^{n} \tag{1} $$

still holds, however, as will be explained here. The main result of the paper [4] (the high energy justification of the Breit–Wigner approximation) remains valid:

$$ s(\lambda+\delta) - s(\lambda-\delta) = 2\pi \sum_{|\lambda_j-\lambda|<1} \omega_{\mathbb{C}_+}(\lambda_j, [\lambda-\delta, \lambda+\delta]) + O(\delta)\lambda^{n-1}, \tag{2} $$

and so do other statements made in [4]. One of the consequences of (1) was the following bound on the number of resonances:

$$ \#\{\lambda_j : |\lambda-\lambda_j| \le R\} \le CR\lambda^{n-1}, \quad R > 1. \tag{3} $$

We remark here that this inequality was recently generalized by Bony [1], who, using a different method, considered the semi-classical case and non-compactly supported perturbations. To see that (1) holds, we first observe that if $\theta_0 < \pi/n$, then for any $\delta > 0$,

$$ \log|\det S(-re^{i\theta})| > -C_\delta r^{n}, \quad r > r_0,\ \theta \in (0,\theta_0) \setminus (r),\ |(r)| < \delta, \tag{4} $$

which follows from [4, Lemma 1] and the minimum modulus estimates for functions of finite order in an angle (see [2, Theorem 56] where it is a consequence of the Carleman inequality recalled in [4, (4.2)]; see also [5, Sect. 7] for a direct argument leading to a similar estimate). As in the proof of [4, Lemma 2], Cartan’s lemma gives that

$$ |\operatorname{Re} g_{\rho,\tau}(re^{i\theta})| \le C_\delta \tau^{n}, \quad \theta \in (0,\theta_0) \setminus (r),\ |(r)| < \delta,\ |re^{i\theta}-\tau| < \tau/B, $$
$$ \operatorname{Re} g_{\rho,\tau}(\lambda) \le C\tau^{n}, \quad |\tau-\lambda| < \tau/B,\ \operatorname{Im}\lambda \ge 0, $$
where we used (4) to obtain the first inequality. We can now apply the Harnack inequality correctly to conclude that

$$ |\operatorname{Re} g_{\rho,\tau}(\lambda)| \le C\gamma^{-1}\tau^{n}, \quad \operatorname{Im}\lambda > \gamma\tau,\ |\lambda-\tau| \le \tau/2B. $$

From the unitarity of the scattering matrix we also have that

$$ \operatorname{Re} g_{\rho,\tau}(\lambda) = -\operatorname{Re} g_{\rho,\tau}(\bar\lambda). $$

To obtain a lower bound everywhere we apply the following lemma, which was kindly pointed out to us by W. K. Hayman:

Lemma. Suppose that $u$ is harmonic in $D(0,1)$, and

$$ |u(z)| \le \frac{K}{|\operatorname{Im} z|}, \quad u(z) = -u(\bar z), \quad z \in D(0,1). $$

Then, for every $\epsilon > 0$, there exists $C = C(\epsilon)$ such that

$$ |u(z)| \le CK|\operatorname{Im} z|, \quad z \in D(0, 1-\epsilon). $$

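Proof. We use the Poisson formula and the symmetry (and we can assume that the hypotheses hold in a slightly bigger disc):

$$ \begin{aligned} u(re^{i\theta}) &= \frac{1}{2\pi}\int_0^{2\pi} \frac{(1-r^2)\,u(e^{i\varphi})}{1-2r\cos(\theta-\varphi)+r^2}\,d\varphi \\ &= \frac{1}{\pi}\int_0^{\pi} \frac{2(1-r^2)\,r\sin\theta\,\sin\varphi\;u(e^{i\varphi})}{(1-2r\cos(\theta-\varphi)+r^2)(1-2r\cos(\theta+\varphi)+r^2)}\,d\varphi. \end{aligned} $$

Since we know that

$$ |u(e^{i\varphi})| \le K/\sin\varphi, \quad 0 \le \varphi \le \pi, $$

we conclude that for $r < 1-\epsilon$, $|u(z)/\operatorname{Im} z| \le 8\epsilon^{-4}K$.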
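The symmetrization step in the proof can be spelled out as follows. This is only a sketch under the hypotheses of the lemma, for $z = re^{i\theta}$ with $0 \le \theta \le \pi$; the shorthand $P_r$ and $D_\pm$ is introduced here for illustration and does not appear in the original argument.

```latex
% Shorthand (illustrative, not in the original): Poisson kernel and its two denominators.
\[
P_r(t) = \frac{1-r^2}{1-2r\cos t+r^2}, \qquad
D_\pm = 1-2r\cos(\theta\pm\varphi)+r^2 .
\]
% Fold the integral over (\pi,2\pi) onto (0,\pi) using u(e^{-i\varphi}) = -u(e^{i\varphi}).
\[
u(re^{i\theta})
 = \frac{1}{2\pi}\int_0^{\pi}\bigl(P_r(\theta-\varphi)-P_r(\theta+\varphi)\bigr)\,
   u(e^{i\varphi})\,d\varphi .
\]
% Use cos(\theta-\varphi) - cos(\theta+\varphi) = 2 sin\theta sin\varphi.
\[
P_r(\theta-\varphi)-P_r(\theta+\varphi)
 = \frac{(1-r^2)(D_+-D_-)}{D_-D_+}
 = \frac{4r(1-r^2)\sin\theta\,\sin\varphi}{D_-D_+} .
\]
% For r < 1-\epsilon one has D_\pm \ge (1-r)^2 > \epsilon^2, and |u(e^{i\varphi})| \sin\varphi \le K.
\[
|u(re^{i\theta})|
 \le \frac{1}{2\pi}\int_0^{\pi}\frac{4r(1-r^2)\sin\theta\,K}{\epsilon^{4}}\,d\varphi
 = \frac{2(1-r^2)K}{\epsilon^{4}}\,r\sin\theta
 \le 2\epsilon^{-4}K\,\operatorname{Im} z .
\]
```

In this computation the constant comes out as $2\epsilon^{-4}K$, so the bound $8\epsilon^{-4}K$ holds with room to spare; for $\operatorname{Im} z < 0$ the same estimate follows from the symmetry $u(z) = -u(\bar z)$.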

To get (1) we apply the lemma with $u(z) = \tau^{-n}\operatorname{Re} g_{\rho,\tau}\bigl(\tau + \tfrac{\tau}{2B}z\bigr)$. We point out that in the factorization defining $g_{\rho,\tau}$ in [4, Lemma 2], it only matters that we take a product over resonances in $|\tau-\lambda| \le \tau/C$, for some $C$. That makes $g_{\rho,\tau}$ defined in $|\tau-\lambda| \le \tau/2C$, with (1) holding there. Cauchy estimates then imply that we gain decay in $\lambda$ when derivatives are applied. The estimate (3) follows from the argument of [4, Prop. 1]:

$$ \begin{aligned} \#\{\lambda_j : |\lambda-\lambda_j| \le R\} &\le C\int_{\lambda-3R}^{\lambda+3R} \sum_{\substack{|\lambda_j-\lambda|\le\lambda/2 \\ \lambda_j\in\Lambda_\rho}} \frac{\operatorname{Im}\lambda_j}{|t-\lambda_j|^{2}}\,dt \\ &= C\bigl(s(\lambda+3R) - s(\lambda-3R) - i\,(g_{\rho,\lambda}(\lambda+3R) - g_{\rho,\lambda}(\lambda-3R))\bigr) \\ &= O(R)\lambda^{n-1}, \quad 1 \le R \le \lambda/6, \end{aligned} \tag{5} $$

by (1) and Cauchy inequalities. For $R > \lambda/C$ the estimate follows from the global bound on the number of resonances. We stress here that the estimate on $g_{\rho,\tau}$ is only
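needed on the real axis, and hence, in odd but not even dimensions, it follows directly from Melrose’s argument in [3]. To get (2), we proceed as in [4, (4.3)], and hence need to find a bound for

$$ \begin{aligned} \frac{1}{\rho}\int_0^{\pi} \log|\det S(-\lambda-\rho e^{i\theta})|\,\sin\theta\,d\theta &= -\frac{1}{\rho^{2}}\operatorname{Re}\int_{\Gamma_{\lambda,\rho}} \log|\det S(z)|\,dz \\ &= \frac{1}{\rho^{2}}\operatorname{Re}\int_{\Omega_{\lambda,\rho}} 2i\,\partial_{\bar z}\log|\det S(z)|\,L(dz) \\ &= -\frac{1}{\rho^{2}}\operatorname{Re}\int_{\Omega_{\lambda,\rho}} i\,\partial_{z}\log\det S(z)\,L(dz), \end{aligned} $$

where $\Omega_{\lambda,\rho} = \{z : \operatorname{Im} z \ge 0,\ |z+\lambda| \le \rho\}$, $\Gamma_{\lambda,\rho}$ is its boundary, $L(dz)$ is the Lebesgue measure on $\mathbb{C}$, and where we used Green’s formula. We now write

$$ \partial_z \log\det S(z) = g'_{\rho,\tau}(z) + \sum_{\substack{|\lambda-\lambda_j|<2\rho \\ \lambda_j\in\Lambda_\rho}} \Bigl(\frac{1}{z-\lambda_j} - \frac{1}{z-\bar\lambda_j}\Bigr) + \sum_{\substack{2\rho\le|\lambda-\lambda_j|<\lambda/B \\ \lambda_j\in\Lambda_\rho}} \Bigl(\frac{1}{z-\lambda_j} - \frac{1}{z-\bar\lambda_j}\Bigr), $$

where we can take $\tau = \lambda$. The first two terms are estimated using (1) and (3). To estimate the last term we will use the equality in the second line of (5), and to apply it, we rewrite it as

$$ \sum_{\substack{2\rho\le|\lambda-\lambda_j|<\lambda/B \\ \lambda_j\in\Lambda_\rho}} \Bigl( \frac{2i\operatorname{Im}\lambda_j}{|\operatorname{Re} z-\lambda_j|^{2}} + i\int_0^{\operatorname{Im} z} \Bigl( \frac{1}{(\operatorname{Re} z+iy-\bar\lambda_j)^{2}} - \frac{1}{(\operatorname{Re} z+iy-\lambda_j)^{2}} \Bigr)\,dy \Bigr), \quad |z-\lambda| < \rho. $$

Using (5), the integral of the first term is bounded by $C\lambda^{n-1}$. It remains to estimate the second term, and for that we write

$$ \sum_{2\rho\le|\lambda-\lambda_j|<\lambda/B} \frac{1}{|\lambda-\lambda_j|^{2}} \le \frac{1}{\rho}\sum_{1\le k\le C\log\lambda} 2^{-2k}\,\#\{\lambda_j : |\lambda-\lambda_j| \le 2^{k+1}\rho\} \le C\sum_{1\le k\le C\log\lambda} \frac{\lambda^{n-1}}{2^{k}} = O(\lambda^{n-1}). $$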
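The dyadic summation used in the last estimate can be unpacked shell by shell. The following is a sketch only: it assumes $\rho \ge 1$, so that (3) applies with $R = 2^{k+1}\rho > 1$ (an assumption made here for the illustration, not stated explicitly in the text), and it writes $C_0$ for the constant in (3), a label introduced here.

```latex
% Shell k collects the resonances with 2^k\rho \le |\lambda-\lambda_j| < 2^{k+1}\rho, k \ge 1.
% On that shell 1/|\lambda-\lambda_j|^2 \le 2^{-2k}\rho^{-2}; the count is bounded by (3).
\[
\sum_{2^{k}\rho\le|\lambda-\lambda_j|<2^{k+1}\rho}\frac{1}{|\lambda-\lambda_j|^{2}}
 \le \frac{2^{-2k}}{\rho^{2}}\,\#\{\lambda_j : |\lambda-\lambda_j|\le 2^{k+1}\rho\}
 \le \frac{2^{-2k}}{\rho^{2}}\,C_0\,2^{k+1}\rho\,\lambda^{n-1}
 = \frac{2C_0}{\rho}\,2^{-k}\lambda^{n-1}.
\]
% Summing the geometric series over the O(\log\lambda) shells and using \rho \ge 1:
\[
\sum_{1\le k\le C\log\lambda}\frac{2C_0}{\rho}\,2^{-k}\lambda^{n-1}
 \le \frac{2C_0}{\rho}\,\lambda^{n-1}
 = O(\lambda^{n-1}).
\]
```

The number of shells only enters through the range of $k$; the geometric factor $2^{-k}$ makes the sum independent of the upper limit $C\log\lambda$.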

The method described here can also be used to show that [4, Lemma 4] remains valid. We chose a direct approach for the sake of brevity.

References

1. Bony, J.-F.: Majoration du nombre de résonances dans des domaines de taille h. Preprint, 2000
2. Cartwright, M.: Integral Functions. Cambridge: Cambridge University Press, 1956
3. Melrose, R.B.: Weyl asymptotics for the phase in obstacle scattering. Comm. P.D.E. 13, 1431–1439 (1988)
4. Petkov, V., Zworski, M.: Breit–Wigner approximation and the distribution of resonances. Commun. Math. Phys. 204, 329–351 (1999)
5. Sjöstrand, J.: Resonances for bottles and related trace formulæ. Math. Nachr., to appear

Communicated by B. Simon
