Commun. Math. Phys. 306, 1–33 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1271-4
Communications in Mathematical Physics
Renormalization and Asymptotic Expansion of Dirac’s Polarized Vacuum

Philippe Gravejat$^1$, Mathieu Lewin$^2$, Éric Séré$^3$

1 Centre de Mathématiques Laurent Schwartz (UMR 7640), École Polytechnique, 91128 Palaiseau Cedex, France. E-mail: [email protected]
2 CNRS & Laboratoire de Mathématiques (UMR 8088), Université de Cergy-Pontoise, 95000 Cergy-Pontoise, France. E-mail: [email protected]
3 Ceremade (UMR 7534), Université Paris-Dauphine, Place du Maréchal de Lattre de Tassigny, 75775 Paris Cedex 16, France. E-mail: [email protected]

Received: 18 May 2010 / Accepted: 6 January 2011
Published online: 7 June 2011 – © The Author(s) 2011
Abstract: We perform rigorously the charge renormalization of the so-called reduced Bogoliubov-Dirac-Fock (rBDF) model. This nonlinear theory, based on the Dirac operator, describes atoms and molecules while taking into account vacuum polarization effects. We consider the total physical density $\rho_{\rm ph}$ including both the external density of a nucleus and the self-consistent polarization of the Dirac sea, but no ‘real’ electron. We show that $\rho_{\rm ph}$ admits an asymptotic expansion to any order in powers of the physical coupling constant $\alpha_{\rm ph}$, provided that the ultraviolet cut-off behaves as $\Lambda \sim e^{3\pi(1-Z_3)/2\alpha_{\rm ph}} \gg 1$. The renormalization parameter $0 < Z_3 < 1$ is defined by $Z_3 = \alpha_{\rm ph}/\alpha$, where $\alpha$ is the bare coupling constant. The coefficients of the expansion of $\rho_{\rm ph}$ are independent of $Z_3$, as expected. The first order term gives rise to the well-known Uehling potential, whereas the higher order terms satisfy an explicit recursion relation.
Contents
1. Introduction and Main Result
2. The Two Sequences $\{\nu_{n,\Lambda}\}$ and $\{\nu_n\}$
3. Some Preliminary Results
4. Proof of Theorem 4
5. Proof of Theorem 3
Appendix A. Auxiliary Results on the Uehling Multiplier $U_\Lambda$
Appendix B. Proof of Proposition 3.1
Appendix C. Proof of Proposition 3.2
References
© 2011 by the authors. This paper may be reproduced, in its entirety, for non-commercial purposes.
1. Introduction and Main Result

Renormalization is an essential tool in Quantum Electrodynamics (QED) [2,9,21]. The purpose of this paper is to perform rigorously the charge renormalization of a nonlinear approximation of QED, the reduced Bogoliubov-Dirac-Fock (rBDF) theory that was studied before in [13,15–19]. This model, based on the Dirac operator, describes atoms and molecules while taking into account vacuum polarization effects. It does not need any mass renormalization, hence it is a theory simple enough for an investigation of charge renormalization in full detail.

Before turning to our specific Dirac model, let us quickly recall the spirit of renormalization. A physical theory usually aims at predicting physical observables in terms of the parameters in the model. Sometimes, interesting quantities are divergent and it is necessary to introduce cut-offs. For electrons the parameters are their mass $m$ and their charge $e$ (or rather the coupling constant $\alpha = e^2$). Predicted physical quantities are then functions $F(m, \alpha, \Lambda)$, where $\Lambda$ is the regularization parameter. Mass and charge are also physical observables and renormalization occurs when their values predicted by the theory are different from their ‘bare’ values:
\[
m_{\rm ph} = m_{\rm ph}(m, \alpha, \Lambda) \neq m \quad \text{and/or} \quad \alpha_{\rm ph} = \alpha_{\rm ph}(m, \alpha, \Lambda) \neq \alpha. \tag{1.1}
\]
In this case the parameters $m$ and $\alpha$ are not observable, in contrast with $m_{\rm ph} = m_{\rm ph}(m, \alpha, \Lambda)$ and $\alpha_{\rm ph} = \alpha_{\rm ph}(m, \alpha, \Lambda)$, which have to be set equal to their experimental values. The relation (1.1) has to be inverted, in order to express the bare parameters in terms of the physical ones:
\[
m = m(m_{\rm ph}, \alpha_{\rm ph}, \Lambda), \qquad \alpha = \alpha(m_{\rm ph}, \alpha_{\rm ph}, \Lambda). \tag{1.2}
\]
This allows one to express any observable quantity $F$ as a function $\tilde F$ of the physical parameters and the cut-off $\Lambda$:
\[
\tilde F(m_{\rm ph}, \alpha_{\rm ph}, \Lambda) = F\big(m(m_{\rm ph}, \alpha_{\rm ph}, \Lambda),\ \alpha(m_{\rm ph}, \alpha_{\rm ph}, \Lambda),\ \Lambda\big). \tag{1.3}
\]
A possible definition of renormalizability is that all such observable quantities have a limit when $\Lambda \to \infty$, for fixed $m_{\rm ph}$ and $\alpha_{\rm ph}$. Important difficulties can be encountered when trying to complete this program:
• The physical quantities $m_{\rm ph}$ and $\alpha_{\rm ph}$ might be nonexplicit functions of $\alpha$ and $m$. The corresponding formulas can then only be inverted perturbatively to any order (usually in $\alpha$). This is the case in QED [2,9,21]. In the model studied in this paper we have $m_{\rm ph} = m$ and $\alpha_{\rm ph} \neq \alpha$, hence only the charge has to be renormalized. Furthermore $\alpha_{\rm ph}$ is an explicit function of $m$, $\alpha$ and $\Lambda$ (see (1.9) later). Renormalizing our model is therefore a much easier task than in full QED.
• Even when the bare parameters are explicit functions of the physical ones, these relations can make it impossible to take the limit $\Lambda \to \infty$ while keeping $m_{\rm ph}$ and $\alpha_{\rm ph}$ fixed. As we will explain, in our model $(2/3\pi)\,\alpha_{\rm ph} \log\Lambda \le 1$. To deal with this problem, we let $\Lambda$ depend on $\alpha_{\rm ph}$ and we investigate the asymptotics in the limit $\alpha_{\rm ph} \to 0$.

We now turn to the description of our model. The Bogoliubov-Dirac-Fock theory is the Hartree-Fock approximation of QED when photons are neglected [18,19]. The associated reduced theory is obtained by further neglecting the so-called exchange term. In both models, the system is described by a Hartree-Fock (quasi-free) state in Fock
space, which is completely characterized by its one-body density matrix P (an orthogonal projector for pure states), acting on the one-body space. The state P contains both the ‘real’ electrons of the system (that of an atom for instance) and the ‘virtual’ electrons of the Dirac sea, which all interact with each other self-consistently. Therefore, there are always infinitely many particles and P is infinite-rank. When the exchange term is neglected, a ground state at zero temperature is (formally) a solution of the following self-consistent equation:
\[
\begin{cases}
P = \chi_{(-\infty,\mu)}(D) + \delta, \\
D = D^0 + \alpha\big(\rho_{P-1/2} - \nu\big) * |x|^{-1}.
\end{cases} \tag{1.4}
\]
Here $D^0 = \boldsymbol{\alpha} \cdot (-i\nabla) + \beta$ is the free Dirac operator [32] acting on the Hilbert space $\mathfrak{H} := L^2(\mathbb{R}^3, \mathbb{C}^4)$. For the sake of simplicity we have chosen units in which the speed of light is $c = 1$ and, as the model does not need any mass renormalization, we have taken $m = 1$ for the mass of the electrons. The second term in the formula of $D$ is the Coulomb potential induced by both a fixed external density of charge $\nu$ (modelling for instance a smeared nucleus) and the self-consistent density $\rho_{P-1/2}$ of the system (see below). In (1.4), $\alpha$ is the bare coupling constant that will be renormalized later and $\mu \in (-1, 1)$ is a chemical potential which is chosen to fix the desired total charge of the system. We have added in (1.4) the possibility of having a density matrix $0 \le \delta \le \chi_{\{\mu\}}(D)$ at the Fermi level, as is usually done in reduced Hartree-Fock theory [31]. So the operator $P$ is not necessarily a projector but we still use the letter $P$ for convenience. Later we will restrict ourselves to the case of $P$ being an orthogonal projector.

Equation (1.4) is well-known in the physical literature. A mean-field equation of the same form (including an exchange term and classical terms accounting for the interactions with photons) was derived by Reinhard, Greiner and Arenhövel [26] from the Schwinger-Dyson equations of QED. Chaix and Iracane [4] later gave a variational interpretation of this equation. Related models appear in relativistic Density Functional Theory, usually with additional empirical exchange-correlation terms, see, e.g., [12, Eq. (6.2)] and [11, Eq. (62)]. Dirac [8] already considered the first order term obtained from (1.4) in an expansion in powers of $\alpha$.

Let us now explain the exact meaning of $\rho_{P-1/2}$. The charge density of an operator $A : \mathfrak{H} \to \mathfrak{H}$ with integral kernel $A(x, y)_{\sigma,\sigma'}$ is formally defined as $\rho_A(x) = \sum_{\sigma=1}^{4} A(x, x)_{\sigma,\sigma} = \mathrm{Tr}_{\mathbb{C}^4}(A(x, x))$. In usual Hartree-Fock theory, the charge density is $\rho_P(x)$. However, as there are infinitely many particles, this does not make sense here. In (1.4), the subtraction of half the identity is a convenient way to give a meaning to the density, independently of any reference. One has formally
\[
\rho_{P-1/2}(x) = \rho_{\frac{P - P^\perp}{2}}(x) = \frac{1}{2} \sum_{i \ge 1} \Big( |\varphi_i^-(x)|^2 - |\varphi_i^+(x)|^2 \Big),
\]
where $\{\varphi_i^-\}_{i\ge1}$ is an orthonormal basis of $P\mathfrak{H}$ and $\{\varphi_i^+\}_{i\ge1}$ is an orthonormal basis of $(1 - P)\mathfrak{H}$. As was explained in [19], subtracting $1/2$ from the density matrix $P$ of the Hartree-Fock state makes the model invariant under charge conjugation.

When there is no external field, $\nu \equiv 0$, Eq. (1.4) has an obvious solution for any $\mu \in (-1, 1)$, the Hartree-Fock state made of all electrons with negative energy:
\[
P = P^0_- := \chi_{(-\infty,0)}(D^0),
\]
in accordance with Dirac’s ideas [5–7]. Indeed $\rho_{P^0_- - 1/2} \equiv 0$, as is seen by writing in the Fourier representation
\[
(P^0_- - 1/2)(p) = -\frac{\boldsymbol{\alpha} \cdot p + \beta}{2\sqrt{1 + |p|^2}},
\]
and since the Dirac matrices are trace-less. This shows the usefulness of the subtraction of half the identity to $P$, since the free vacuum $P^0_-$ now has a vanishing density. For a general state $P$, we can use this to write (formally):
\[
\rho_{P-1/2} = \rho_{P-1/2} - \rho_{P^0_- - 1/2} = \rho_{P - P^0_-}. \tag{1.5}
\]
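The vanishing of the free vacuum density used in (1.5) can be checked numerically at a single momentum. The following is a minimal sketch, assuming the standard Dirac representation of the matrices $\boldsymbol{\alpha}$ and $\beta$ (the text does not fix a representation) and an arbitrary test momentum; the helper names are illustrative only.

```python
import math

# Pauli matrices as nested lists (standard choice; an assumption of this sketch).
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def block_offdiag(s):
    """Return the 4x4 matrix [[0, s], [s, 0]] built from a 2x2 block s."""
    zero = [[0, 0], [0, 0]]
    top = [zero[i] + s[i] for i in range(2)]
    bottom = [s[i] + zero[i] for i in range(2)]
    return top + bottom

ALPHA = [block_offdiag(s) for s in SIGMA]  # Dirac matrices alpha_1, alpha_2, alpha_3
BETA = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def trace(a):
    return sum(a[i][i] for i in range(4))

def free_dirac_symbol(p):
    """Fourier symbol D^0(p) = alpha . p + beta (units with m = c = 1)."""
    return [[sum(p[k] * ALPHA[k][i][j] for k in range(3)) + BETA[i][j]
             for j in range(4)] for i in range(4)]

def negative_projector_symbol(p):
    """Symbol P^0_-(p) = 1/2 - D^0(p) / (2 sqrt(1 + |p|^2))."""
    energy = math.sqrt(1 + sum(x * x for x in p))
    d0 = free_dirac_symbol(p)
    return [[(0.5 if i == j else 0) - d0[i][j] / (2 * energy)
             for j in range(4)] for i in range(4)]

p = (0.3, -1.2, 2.0)                      # arbitrary test momentum
P = negative_projector_symbol(p)
shifted_trace = trace(P) - 2              # C^4-trace of (P^0_- - 1/2)(p)
P2 = matmul(P, P)
projector_defect = max(abs(P2[i][j] - P[i][j])
                       for i in range(4) for j in range(4))
print(abs(shifted_trace), projector_defect)
```

Both printed numbers should be at the level of rounding errors: the first reflects the trace-less Dirac matrices, the second that $P^0_-(p)$ is a projector because $D^0(p)^2 = (1 + |p|^2)$.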
When $P$ belongs to a suitable class of perturbations of $P^0_-$ (for instance when $P - P^0_-$ is locally trace-class), the density $\rho_{P - P^0_-}$ is a well-defined mathematical object. We will give below natural conditions which guarantee that $P - P^0_-$ has a well-defined density in our context.

In the presence of an external field, $\nu \neq 0$, Eq. (1.4) has no solution in any ‘reasonable’ Banach space [16] and it is necessary to introduce an ultraviolet regularization parameter $\Lambda$. The simplest method (although probably not optimal regarding regularity issues [13]) is to impose a cut-off at the level of the Hilbert space, that is to replace $\mathfrak{H}$ by
\[
\mathfrak{H}_\Lambda := \big\{ f \in L^2(\mathbb{R}^3; \mathbb{C}^4),\ \mathrm{supp}(\hat f) \subset B(0; \Lambda) \big\}
\]
and to solve, instead of (1.4), the regularized equation in $\mathfrak{H}_\Lambda$:
\[
\begin{cases}
P = \chi_{(-\infty,\mu)}(D) + \delta, \\
D = \Pi_\Lambda \Big( D^0 + \alpha\big(\rho_{P - P^0_-} - \nu\big) * |x|^{-1} \Big) \Pi_\Lambda,
\end{cases} \tag{1.6}
\]
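where $\Pi_\Lambda$ is the orthogonal projector onto $\mathfrak{H}_\Lambda$ in $\mathfrak{H} = L^2(\mathbb{R}^3; \mathbb{C}^4)$. Existence of solutions to (1.6) was proved in [16] for $\mu = 0$ and in [13] for $\mu \neq 0$. The precise statement is the following$^1$:

Theorem 1 (Existence of self-consistent solutions to (1.6), [13,16]). Assume that $\alpha \ge 0$, $\Lambda > 0$ and $\mu \in (-1, 1)$ are given. Let $\nu$ be in the so-called Coulomb space:
\[
\mathcal{C} := \Big\{ f : \int_{\mathbb{R}^3} |k|^{-2}\, |\hat f(k)|^2 \, dk < \infty \Big\}.
\]
Then, Eq. (1.6) has at least one solution $P$ such that
\[
P - P^0_- \in \mathfrak{S}_2(\mathfrak{H}_\Lambda), \qquad P^0_\pm (P - P^0_-) P^0_\pm \in \mathfrak{S}_1(\mathfrak{H}_\Lambda), \qquad \rho_{P - P^0_-} \in \mathcal{C} \cap L^2(\mathbb{R}^3). \tag{1.7}
\]
All such solutions share the same density $\rho_{P - P^0_-}$.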
1 To be more precise, in [13, Theorem 1], only the existence and uniqueness of minimizers of the reduced BDF functional are stated. Elementary arguments based on convexity allow one to deduce Theorem 1 from the results of [13].
In (1.7), $\mathfrak{S}_1(\mathfrak{H}_\Lambda)$ and $\mathfrak{S}_2(\mathfrak{H}_\Lambda)$ are respectively the spaces of trace-class and Hilbert-Schmidt operators [30] on $\mathfrak{H}_\Lambda$, and $P^0_+ = 1 - P^0_-$. The method used in [16,13] was to identify solutions of (1.6) with minimizers of the so-called reduced Bogoliubov-Dirac-Fock energy, which is nothing but the formal difference between the reduced Hartree-Fock energy of $P$ and that of the reference state $P^0_-$. Note that due to the uniqueness of $\rho_{P - P^0_-}$ the mean-field operator $D$ is also unique and only $\delta$ can differ between two solutions of (1.6).

Let us mention that it is natural to look for a solution of (1.6) such that $P - P^0_-$ is a Hilbert-Schmidt operator on $\mathfrak{H}_\Lambda$. If $P$ is a projector, the Shale-Stinespring theorem [29] then tells us that $P$ yields a Fock representation equivalent to that of $P^0_-$. Even when $P$ is not a projector, it will be associated with a unique Bogoliubov mixed state in the Fock space representation of $P^0_-$. This is a mathematical formulation of the statement that $P$ should not be too far from $P^0_-$. Indeed, if $P$ is an orthogonal projector, one has (see [15, Lemma 2] and [17, Lemma 1])
\[
\Big( P - P^0_- \in \mathfrak{S}_2(\mathfrak{H}_\Lambda) \ \text{and} \ P^2 = P \Big) \ \Longrightarrow \ \Big( P^0_\pm (P - P^0_-) P^0_\pm \in \mathfrak{S}_1(\mathfrak{H}_\Lambda) \ \text{and} \ \rho_{P - P^0_-} \in \mathcal{C} \cap L^2(\mathbb{R}^3) \Big),
\]
therefore, in this case, (1.7) is just equivalent to the Shale-Stinespring condition $P - P^0_- \in \mathfrak{S}_2(\mathfrak{H}_\Lambda)$. The property $P^0_\pm (P - P^0_-) P^0_\pm \in \mathfrak{S}_1(\mathfrak{H}_\Lambda)$ allows us to define the total ‘charge’ of the system by (see [15])
\[
\mathrm{Tr}_{P^0_-}(P - P^0_-) := \mathrm{Tr}\, P^0_- (P - P^0_-) P^0_- + \mathrm{Tr}\, P^0_+ (P - P^0_-) P^0_+ .
\]
When $P$ is a projector, the above quantity is always an integer which is indeed nothing but the relative index of the pair $(P, P^0_-)$, see [15,1]. Varying $\mu$ allows one to pick the desired total charge. In particular, if $\nu$ is small enough and $\mu = 0$, then one has $\|P - P^0_-\| < 1$ and the relative index vanishes: $\mathrm{Tr}_{P^0_-}(P - P^0_-) = 0$.

It is very important to realize that solutions of (1.6) are singular mathematical objects. This fact is precisely at the origin of charge renormalization. In [13, Theorem 1], the following was proved:

Theorem 2 (Nonperturbative charge renormalization formula [13]). Assume that $\alpha \ge 0$, $\Lambda > 0$ and $\mu \in (-1, 1)$ are given. If $\nu \in \mathcal{C} \cap L^1(\mathbb{R}^3)$, then $\rho_{P - P^0_-} \in L^1(\mathbb{R}^3)$, and
\[
\int_{\mathbb{R}^3} \nu - \int_{\mathbb{R}^3} \rho_{P - P^0_-} = \frac{\displaystyle \int_{\mathbb{R}^3} \nu - \mathrm{Tr}_{P^0_-}(P - P^0_-)}{1 + \alpha B_\Lambda}. \tag{1.8}
\]
Remark 1. A related model was studied by Cancès and Lewin [3] for defects in nonrelativistic crystals. In contrast to the Dirac case, the charge density of the polarized Fermi sea is not necessarily integrable. In particular it is not in $L^1(\mathbb{R}^3)$ for anisotropic materials and the question is open in the isotropic case.

In Formula (1.8), $B_\Lambda$ is an explicit function of the ultraviolet cut-off $\Lambda$ (see the comments after (2.4) and (B.10)), which behaves like
\[
B_\Lambda = \frac{2}{3\pi} \log \Lambda - \frac{5}{9\pi} + \frac{2 \log 2}{3\pi} + O(1/\Lambda^2).
\]
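The asymptotic behaviour of $B_\Lambda$ can be checked numerically. The sketch below rests on an assumption that is not stated in this form in the text: evaluating the first integral of formula (2.4) at $k = 0$, where the upper limit becomes $Z_\Lambda(0) = \Lambda/\sqrt{1+\Lambda^2}$ and the second integral drops out, and integrating in closed form. The function names are hypothetical.

```python
import math

def b_cutoff(lam):
    """B_Lambda = B_Lambda(0), from (1/pi) * int_0^Z (z^2 - z^4/3)/(1 - z^2) dz
    with Z = lam / sqrt(1 + lam^2) (assumed evaluation of (2.4) at k = 0)."""
    z = lam / math.sqrt(1.0 + lam * lam)
    # log((1 + Z)/(1 - Z)) equals 2*asinh(lam); this form avoids cancellation in 1 - Z.
    return (z ** 3 / 9.0 - 2.0 * z / 3.0 + 2.0 * math.asinh(lam) / 3.0) / math.pi

def b_asymptotic(lam):
    """Leading terms (2/3pi) log(lam) - 5/(9pi) + 2 log(2)/(3pi)."""
    return (2.0 * math.log(lam) / 3.0 - 5.0 / 9.0
            + 2.0 * math.log(2.0) / 3.0) / math.pi

for lam in (10.0, 100.0, 1000.0):
    gap = b_cutoff(lam) - b_asymptotic(lam)
    print(lam, b_cutoff(lam), gap * lam * lam)
```

Under these assumptions the rescaled gap `gap * lam * lam` stays bounded as the cut-off grows, in line with the $O(1/\Lambda^2)$ remainder, while $B_\Lambda$ itself only grows logarithmically.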
Let us emphasize that (1.8) is non perturbative and holds for all $\alpha \ge 0$ and all $\mu \in (-1, 1)$. Theorem 2 shows that the operator $P - P^0_-$ is in general not trace-class: if $P - P^0_- \in \mathfrak{S}_1(\mathfrak{H}_\Lambda)$, then it must hold $\mathrm{Tr}_{P^0_-}(P - P^0_-) = \mathrm{Tr}(P - P^0_-) = \int_{\mathbb{R}^3} \rho_{P - P^0_-}$.

In our model we have two possible definitions of the charge of the system: $\int_{\mathbb{R}^3} \nu - \mathrm{Tr}_{P^0_-}(P - P^0_-)$ and $\int_{\mathbb{R}^3} (\nu - \rho_{P - P^0_-})$. In practice it is the electrostatic field induced by the nucleus (together with the vacuum polarization density) which is measured, hence it is more natural to define the charge by means of the density. By (1.8), the total Coulomb potential is, at infinity,
\[
\alpha \big(\nu - \rho_{P - P^0_-}\big) * \frac{1}{|x|} \ \underset{|x| \to \infty}{\sim} \ \alpha\, \frac{\int_{\mathbb{R}^3} (\nu - \rho_{P - P^0_-})}{|x|} = \frac{\alpha}{1 + \alpha B_\Lambda}\, \frac{\int_{\mathbb{R}^3} \nu - \mathrm{Tr}_{P^0_-}(P - P^0_-)}{|x|}.
\]
Let us assume for simplicity that we put in the vacuum ($\mu = 0$) a nucleus containing $\int_{\mathbb{R}^3} \nu = Z$ protons and which is small enough in the sense that $\|\nu\|_{\mathcal{C}} \ll 1$. Then $\mathrm{Tr}_{P^0_-}(P - P^0_-) = 0$ by [16, Theorem 3] and we see that at infinity the potential induced by the nucleus is not $\alpha Z/|x|$ as expected, but rather $\alpha_{\rm ph} Z/|x|$ where
\[
\alpha_{\rm ph} = Z_3\, \alpha, \qquad \text{with} \qquad Z_3 = \frac{1}{1 + \alpha B_\Lambda} = 1 - \alpha_{\rm ph} B_\Lambda. \tag{1.9}
\]
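The algebra behind (1.9) is easy to explore numerically. The following is a minimal sketch, assuming the value $\alpha_{\rm ph} = 1/137$ and keeping only the leading term $(2/3\pi)\log\Lambda$ of $B_\Lambda$ when converting $Z_3$ into a cut-off; the function names and the sample value `b = 50.0` are hypothetical.

```python
import math

ALPHA_PH = 1.0 / 137.0  # approximate physical coupling constant

def bare_coupling(alpha_ph, b):
    """Invert alpha_ph = alpha / (1 + alpha * b); only possible if alpha_ph * b < 1."""
    if alpha_ph * b >= 1.0:
        raise ValueError("alpha_ph * B_Lambda must stay below 1")
    return alpha_ph / (1.0 - alpha_ph * b)

def z3_from_bare(alpha, b):
    """Renormalization constant Z_3 = 1 / (1 + alpha * B_Lambda)."""
    return 1.0 / (1.0 + alpha * b)

def log_cutoff_for_z3(alpha_ph, z3):
    """log(Lambda) solving Z_3 = 1 - alpha_ph * (2/(3 pi)) * log(Lambda)
    (leading-order approximation of B_Lambda, an assumption of this sketch)."""
    return 3.0 * math.pi * (1.0 - z3) / (2.0 * alpha_ph)

b = 50.0                                   # sample value of B_Lambda
alpha = bare_coupling(ALPHA_PH, b)
z3 = z3_from_bare(alpha, b)
# Largest admissible cut-off (Z_3 -> 0), expressed as a power of ten.
log10_max_cutoff = log_cutoff_for_z3(ALPHA_PH, 0.0) / math.log(10.0)
print(alpha, z3, log10_max_cutoff)
```

The two expressions for $Z_3$ in (1.9) agree, and the constraint $\alpha_{\rm ph} B_\Lambda < 1$ caps $\log\Lambda$ at roughly $3\pi/2\alpha_{\rm ph}$, an astronomically large cut-off for this value of $\alpha_{\rm ph}$.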
Our charge renormalization constant$^2$ $Z_3$ agrees with the one appearing in the one-loop renormalization of full QED [9,2,21]. In our model it allows one to renormalize the density of charge to all orders, as we will explain in Theorem 3.

2 The renormalization constant $Z_3$ should not be confused with the nuclear charge $Z = \int_{\mathbb{R}^3} \nu$.

The value of $\alpha$ is not observable; $\alpha_{\rm ph}$ is the real physical constant since we always observe the nucleus together with the vacuum polarization density. Its experimental value is $\alpha_{\rm ph} \simeq 1/137$. In our theory we must therefore fix $\alpha_{\rm ph}$ and not $\alpha$. Using (1.9) we can express any physical quantity in terms of $\alpha_{\rm ph}$ and $\Lambda$ only. Unfortunately $\alpha_{\rm ph} B_\Lambda < 1$ holds, hence it makes no sense to take $\Lambda \to \infty$ while keeping $\alpha_{\rm ph}$ fixed (this is the so-called Landau pole [23,22]) and one has to look for a weaker definition of renormalizability.

The cut-off $\Lambda$, which was first introduced as a mathematical trick to regularize the model, actually has a physical meaning. Because of the above constraint $\alpha_{\rm ph} B_\Lambda < 1$, a natural scale occurs beyond which the model does not make sense. Fortunately, this scale is of the order $e^{3\pi/2\alpha_{\rm ph}}$, a huge number for $\alpha_{\rm ph} \simeq 1/137$.

It is more convenient to change variables and take as new parameters $\alpha_{\rm ph}$ and $Z_3 = 1 - \alpha_{\rm ph} B_\Lambda$, with the additional constraint that $0 < Z_3 < 1$. The new parameter $Z_3$ is now independent of $\alpha_{\rm ph}$ and the natural question arises whether predicted physical quantities will depend very much on the chosen value of $0 < Z_3 < 1$. The purpose of this paper is to prove that the asymptotics of any physical quantity in the regime $\alpha_{\rm ph} \ll 1$ is actually independent of $Z_3$ to any order in $\alpha_{\rm ph}$, which is what we call asymptotic renormalizability. Note that fixing $Z_3 \in (0, 1)$ amounts to taking $\Lambda \simeq C e^{3\pi(1 - Z_3)/2\alpha_{\rm ph}} \gg 1$.

Instead of looking at all possible physical observables, it is convenient to define a renormalized density $\rho_{\rm ph}$. Following [16], we define it by the relation
\[
\alpha_{\rm ph}\, \rho_{\rm ph} = \alpha \big( \nu - \rho_{P - P^0_-} \big) \tag{1.10}
\]
in such a way that $D = \Pi_\Lambda \big( D^0 - \alpha_{\rm ph}\, \rho_{\rm ph} * |x|^{-1} \big) \Pi_\Lambda$. This procedure is similar to wavefunction renormalization. By uniqueness of $\rho_{P - P^0_-}$ we can see $\rho_{\rm ph}$ as a function of $\alpha_{\rm ph}$, $\nu$, $\mu$ and $\Lambda$ (or $Z_3$). For the sake of clarity we will not emphasize the dependence on $\nu$ and $\mu$, which will be fixed quantities. Also we will use the same notation $\rho_{\rm ph}(\alpha_{\rm ph}, \Lambda)$ or $\rho_{\rm ph}(\alpha_{\rm ph}, Z_3)$, depending on the context. The self-consistent equation for $\rho_{\rm ph}$ was derived in [16] and it is mentioned below in Sect. 2.

From now on, we will assume that $\mu = 0$. For small external densities $\nu$, this means that we will be looking at the vacuum polarization in the presence of the nucleus, without considering any real electron (that is, $\rho_{\rm ph}$ is the renormalized density of the nucleus containing both the bare density $\nu$ and the vacuum polarization density $\rho_{P - P^0_-}$). We will explain in Sect. 2 that one can expand $\rho_{\rm ph} = \rho_{\rm ph}(\alpha_{\rm ph}, \Lambda)$ as follows:
\[
\rho_{\rm ph}(\alpha_{\rm ph}, \Lambda) = \sum_{n=0}^{\infty} (\alpha_{\rm ph})^n\, \nu_{n,\Lambda}, \tag{1.11}
\]
where $\{\nu_{n,\Lambda}\}_n \subset L^2(\mathbb{R}^3) \cap \mathcal{C}$ is a sequence depending only on the external density $\nu$ and the cut-off $\Lambda$. This sequence is defined below in Sect. 2. The series (1.11) has a positive radius of convergence, which is however believed to shrink to zero when $\Lambda \to \infty$.

Assuming $\nu$ decays fast enough (see condition (1.12)), we will prove that for any fixed $n$, the limit $\nu_{n,\Lambda} \to \nu_n$ exists in $L^2(\mathbb{R}^3) \cap \mathcal{C}$. This is what is usually meant by renormalizability in QED: each term of the perturbation series in powers of the physical $\alpha_{\rm ph}$ has a limit when the cut-off is removed. The sequence $\{\nu_n\}_n$ is the one which is calculated in practice [2,14,12,11]. One has for instance $\nu_0 = \nu$ and
\[
\nu_1 * |x|^{-1} = \frac{1}{3\pi} \int_1^\infty dt\, (t^2 - 1)^{1/2} \Big[ \frac{2}{t^2} + \frac{1}{t^4} \Big] \int_{\mathbb{R}^3} e^{-2|x - y| t}\, \frac{\nu(y)}{|x - y|}\, dy,
\]
the Uehling potential [28,33]. All the other $\nu_n$’s can be calculated by induction in terms of $\nu_0, \ldots, \nu_{n-1}$, as is explained below in Sect. 2.

The next natural question is to understand the link between the well-defined, cut-off dependent, series (1.11) and the formal series $\sum_{n=0}^{\infty} (\alpha_{\rm ph})^n \nu_n$. Recall that $\alpha_{\rm ph} B_\Lambda < 1$ by construction, so it is in principle not allowed to take the limit $\Lambda \to \infty$ while keeping $\alpha_{\rm ph}$ fixed: we rather want to think of $Z_3 = 1 - \alpha_{\rm ph} B_\Lambda$ as being fixed. The main result in this paper is the following

Theorem 3 (Asymptotic renormalization of the nuclear charge density). Consider a function $\nu \in L^2(\mathbb{R}^3) \cap \mathcal{C}$ such that
\[
\int_{\mathbb{R}^3} \log(1 + |k|)^{2N+2}\, |\hat\nu(k)|^2 \, dk < \infty \tag{1.12}
\]
for some integer $N$. Let $\rho_{\rm ph}(\alpha_{\rm ph}, Z_3)$ be the unique physical density defined by (1.10) with $\mu = 0$, $Z_3 = 1 - \alpha_{\rm ph} B_\Lambda$ and $\alpha_{\rm ph} = Z_3 \alpha$. Then, for every $0 < \epsilon < 1/2$, there exist two constants $C(N, \epsilon, \nu)$ and $a(N, \epsilon, \nu)$, depending only on $N$, $\epsilon$ and $\nu$, such that one has
\[
\Big\| \rho_{\rm ph}(\alpha_{\rm ph}, Z_3) - \sum_{n=0}^{N} \nu_n\, (\alpha_{\rm ph})^n \Big\|_{L^2(\mathbb{R}^3) \cap \mathcal{C}} \le C(N, \epsilon, \nu)\, \alpha_{\rm ph}^{N+1} \tag{1.13}
\]
for all $0 \le \alpha_{\rm ph} \le a(N, \epsilon, \nu)$ and all $\epsilon \le Z_3 \le 1 - \epsilon$.
Remark 2. We will provide explicit induction formulas for the sequence $\{\nu_n\}$ later in Sect. 2. In particular, we will see in the proof that under Assumption (1.12), one has $\nu_n \in L^2(\mathbb{R}^3) \cap \mathcal{C}$ for all $0 \le n \le N$. Therefore the approximation series of order $N$ appearing in (1.13), $\sum_{n=0}^{N} (\alpha_{\rm ph})^n \nu_n$, is a well-defined function of $L^2(\mathbb{R}^3) \cap \mathcal{C}$.

Remark 3. The space $L^2(\mathbb{R}^3) \cap \mathcal{C}$ is the natural space which occurs in this theory. In particular the Coulomb norm is nothing but the classical electrostatic energy which appears in the reduced BDF energy functional. Our result can be extended to Sobolev spaces $H^s(\mathbb{R}^3)$ provided $\nu$ is smooth enough.

The interpretation of Theorem 3 is that the renormalized density $\rho_{\rm ph}(\alpha_{\rm ph}, Z_3)$ is asymptotically (meaning up to any fixed order $N$) given by the formal series $\sum_{n \ge 0} (\alpha_{\rm ph})^n \nu_n$, uniformly in the renormalization parameter $Z_3$ in the range $\epsilon \le Z_3 \le 1 - \epsilon$. Therefore, for a very large range of cut-offs, essentially
\[
C_1\, e^{3\epsilon\pi/2\alpha_{\rm ph}} \le \Lambda \le C_2\, e^{3(1 - \epsilon)\pi/2\alpha_{\rm ph}},
\]
the result is independent of $\Lambda$ and it is given by the formal series $\sum_{n \ge 0} (\alpha_{\rm ph})^n \nu_n$. Our formulation of renormalizability is more precise than the requirement that each $\nu_{n,\Lambda}$ converges. It also leads to the formal perturbation series in a very natural way.

A natural question is to ask for the convergence of the perturbation series $\sum_{n \ge 0} (\alpha_{\rm ph})^n \nu_n$. It was argued by Dyson in [10] that it is probably divergent, but we are unable to transform his argument into a rigorous mathematical proof. We will make more comments on the series $\sum_{n \ge 0} (\alpha_{\rm ph})^n \nu_n$ at the end of the next section.

It would be interesting to extend Theorem 3 to the case of atoms with ‘real’ electrons. This amounts to taking $\mu$ sufficiently close to 1 at the same time as $\alpha_{\rm ph}$ is small. However this case is more difficult than what is done here: an additional expansion of the electronic charge density in powers of $\alpha_{\rm ph}$ is needed.

Another interesting problem is to include the exchange term in the model. As for the existence of solutions, this was already done for the free vacuum in [19,24] and in an external field in [15,17,19]. In this case the mass of the electron also has to be renormalized [19, Sect. 2.5] and there is probably no explicit formula for the corresponding $m_{\rm ph}$. In fact, it is not clear at all whether the renormalization program can be applied: in [26, pp. 194–195], it is argued that mass and charge renormalization alone are not enough to completely remove the divergences by means of multiplicative parameters, in mean-field theory with an exchange term.

The proof of Theorem 3 (given in Sect. 5 below) is divided into two steps. We first estimate the difference (see Lemma 5.1)
\[
\Big\| \rho_{\rm ph}(\alpha_{\rm ph}, \Lambda) - \sum_{n=0}^{N} \nu_{n,\Lambda}\, (\alpha_{\rm ph})^n \Big\|_{L^2(\mathbb{R}^3) \cap \mathcal{C}} \le C_1(N, \epsilon, \nu)\, (\alpha_{\rm ph})^{N+1} \tag{1.14}
\]
for a constant $C_1(N, \epsilon, \nu)$ depending only on $N$, $\epsilon$ and $\nu$, and under the assumption that $\epsilon \le Z_3 = 1 - \alpha_{\rm ph} B_\Lambda \le 1 - \epsilon$. This amounts to expanding the solution of the self-consistent equation (1.6) up to the $N$th order in $\alpha_{\rm ph}$ while controlling the error term uniformly in $\Lambda$. Then we show in Lemma 5.2 that
\[
\forall\, 0 \le n \le N, \qquad \big\| \nu_{n,\Lambda} - \nu_n \big\|_{L^2(\mathbb{R}^3) \cap \mathcal{C}} \le \frac{C_2(N, \nu)}{(B_\Lambda)^{N+1-n}} \tag{1.15}
\]
for a constant $C_2(N, \nu)$ depending only on $N$ and $\nu$, leading to the bound
\[
\Big\| \sum_{n=0}^{N} \nu_{n,\Lambda}\, (\alpha_{\rm ph})^n - \sum_{n=0}^{N} \nu_n\, (\alpha_{\rm ph})^n \Big\|_{L^2(\mathbb{R}^3) \cap \mathcal{C}} \le (\alpha_{\rm ph})^{N+1}\, \frac{C_2(N, \nu)\, (1 - \epsilon^{N+1})}{\epsilon^{N+1} (1 - \epsilon)}, \tag{1.16}
\]
since by assumption $(B_\Lambda)^{-N-1+n} \le (\alpha_{\rm ph}/\epsilon)^{N+1-n}$. Indeed, $Z_3 \le 1 - \epsilon$ means $\alpha_{\rm ph} B_\Lambda \ge \epsilon$, and summing the resulting geometric series in $1/\epsilon$ over $0 \le n \le N$ gives the constant in (1.16). The main result then follows from (1.14) and (1.16). All these bounds strongly use the explicit recursion relations defining the sequences $\{\nu_{n,\Lambda}\}$ and $\{\nu_n\}$, as well as tedious estimates on the nonlinear terms appearing in these relations.

The rest of the paper is organized as follows. In Sect. 2 we define the sequences $\{\nu_{n,\Lambda}\}$ and $\{\nu_n\}$ by their respective recursion formulas and we discuss some properties of the latter. In particular, in Theorem 4, we give a simple estimate on $\|\nu_n\|_{L^2(\mathbb{R}^3) \cap \mathcal{C}}$. In Sect. 3 we present estimates on the different terms appearing in the recursion formulas. Of particular interest will be the density $\nu_{1,\Lambda}$ giving rise to the Uehling potential. In Fourier space, we have $\hat\nu_{1,\Lambda}(k) = U_\Lambda(k)\, \hat\nu(k)$ for an explicit function $U_\Lambda(k)$ which is studied in Sect. 3.1. The proofs of Theorems 4 and 3 are respectively provided in Sects. 4 and 5. Some other technical proofs are provided in Appendices A, B and C.

2. The Two Sequences $\{\nu_{n,\Lambda}\}$ and $\{\nu_n\}$

In this section we derive formulas for $\{\nu_{n,\Lambda}\}$ and $\{\nu_n\}$, and we make some comments on the latter.

2.1. Definition of $\{\nu_{n,\Lambda}\}$ and $\{\nu_n\}$. We start with the self-consistent equation (1.6) with cut-off, assuming $\mu = 0$. Note that in the regime of interest in Theorem 3, we have $\alpha = \alpha_{\rm ph}/Z_3 \le \alpha_{\rm ph}/\epsilon$. When $\alpha\, \pi^{1/6}\, 2^{11/6}\, \|\nu\|_{\mathcal{C}} < 1$, it is known that $0 \notin \sigma(D)$, hence $\delta = 0$ in (1.6), see [16, Theorem 3] and [13, Lemma 11]. Therefore assuming $a(\nu, N, \epsilon) \le \epsilon\, (\pi^{1/6}\, 2^{11/6}\, \|\nu\|_{\mathcal{C}})^{-1}$ in Theorem 3, we automatically have that $\delta = 0$ and $P = P^2$ is unique. The idea is then to expand the self-consistent equation
\[
P = \chi_{(-\infty,0)} \Big( \Pi_\Lambda \big( D^0 + \alpha (\rho_{P - P^0_-} - \nu) * |x|^{-1} \big) \Pi_\Lambda \Big) \tag{2.1}
\]
in powers of $\alpha$ by means of the resolvent formula. A similar computation was performed in [20] in the Furry picture (that is, neglecting the self-consistent terms). This method was also used in [15] to prove existence and uniqueness of solutions of the self-consistent equation. We define
\[
F_{n,\Lambda}(\mu_1, \ldots, \mu_n) := \rho \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1}{D^0 + i\eta} \prod_{j=1}^{n} \left( \Pi_\Lambda\, \mu_j * \frac{1}{|x|}\, \Pi_\Lambda\, \frac{1}{D^0 + i\eta} \right) d\eta \right], \tag{2.2}
\]
where we recall that $\Pi_\Lambda$ is the orthogonal projector onto $\mathfrak{H}_\Lambda$ in $L^2(\mathbb{R}^3, \mathbb{C}^4)$ and $\mu_1, \ldots, \mu_n \in \mathcal{C}$. We will always use the simplified notation
\[
F_{n,\Lambda}(\mu) := F_{n,\Lambda}(\mu, \ldots, \mu)
\]
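and $\nu_\Lambda := \mathcal{F}^{-1}\big( \hat\nu\, \mathbb{1}_{B(0, 2\Lambda)} \big)$. Note that by Furry’s Theorem $F_{2j,\Lambda} \equiv 0$ for all $j$, see [15, p. 547]. We also introduce
\[
F_\Lambda(\mu) := \sum_{n \ge 3} F_{n,\Lambda}(\mu).
\]
The self-consistent equation (2.1) may then be written in terms of the density in Fourier space [15,16], as
\[
\hat\rho_{P - P^0_-}(k) = -\alpha B_\Lambda(k) \big( \hat\rho_{P - P^0_-}(k) - \hat\nu_\Lambda(k) \big) + \widehat{F_\Lambda}\big( \alpha(\nu - \rho_{P - P^0_-}) \big)(k), \tag{2.3}
\]
where the function $B_\Lambda(k)$ is given by
\[
B_\Lambda(k) = \frac{1}{\pi} \int_0^{Z_\Lambda(|k|)} \frac{z^2 - z^4/3}{(1 - z^2)\big(1 + |k|^2 (1 - z^2)/4\big)}\, dz + \frac{|k|}{2\pi} \int_0^{Z_\Lambda(|k|)} \frac{z - z^3/3}{\sqrt{1 + \Lambda^2} - |k| z/2}\, dz \tag{2.4}
\]
with $Z_\Lambda(r) = \big( \sqrt{1 + \Lambda^2} - \sqrt{1 + (\Lambda - r)^2} \big)/r$, see [13]. The formula for $B_\Lambda(k)$ is well-known (but in most previous works the second term was ignored, see for instance [25]). Defining $U_\Lambda(|k|) = B_\Lambda - B_\Lambda(k)$, where $B_\Lambda = B_\Lambda(0)$ and $0 \le U_\Lambda(|k|) \le B_\Lambda$ with $U_\Lambda(2\Lambda) = B_\Lambda$, we get the renormalized equation
\[
\big( 1 - \alpha_{\rm ph} U_\Lambda \big)\, \hat\rho_{\rm ph} + \widehat{F_\Lambda}(\alpha_{\rm ph} \rho_{\rm ph}) = \hat\nu_\Lambda \tag{2.5}
\]
with the renormalized coupling constant $\alpha_{\rm ph} := \alpha/(1 + \alpha B_\Lambda)$ and the renormalized density $\alpha_{\rm ph} \rho_{\rm ph} = \alpha(\nu - \rho_{P - P^0_-})$ (see [16]). For convenience, we will denote by $\mathcal{U}_\Lambda$ the operator of multiplication by the function $U_\Lambda(|k|)$ in the Fourier domain. Hence we can write the self-consistent equation (2.5) in direct space as
\[
\big( 1 - \alpha_{\rm ph}\, \mathcal{U}_\Lambda \big)\, \rho_{\rm ph} + F_\Lambda(\alpha_{\rm ph} \rho_{\rm ph}) = \nu_\Lambda. \tag{2.6}
\]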
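We now expand the unique solution $\rho_{\rm ph} = \rho_{\rm ph}(\alpha_{\rm ph}, \Lambda)$ of (2.6) in powers of $\alpha_{\rm ph}$. Writing a formal series
\[
\rho_{\rm ph} = \sum_{n \ge 0} (\alpha_{\rm ph})^n\, \nu_{n,\Lambda} \tag{2.7}
\]
we find that the functions $\nu_{n,\Lambda}$ must satisfy the following recurrence relation:
\[
\begin{cases}
\nu_{0,\Lambda} = \nu_\Lambda, \\
\nu_{1,\Lambda} = \mathcal{U}_\Lambda\, \nu_\Lambda, \\
\displaystyle \nu_{n,\Lambda} = \mathcal{U}_\Lambda\, \nu_{n-1,\Lambda} + \sum_{j=3}^{n}\ \sum_{n_1 + \cdots + n_j = n - j} F_{j,\Lambda}\big( \nu_{n_1,\Lambda}, \ldots, \nu_{n_j,\Lambda} \big), \quad \forall n \ge 2.
\end{cases} \tag{2.8}
\]
Note that the operator $\mathcal{U}_\Lambda$ is bounded by $U_\Lambda(2\Lambda) = B_\Lambda$ on $\mathfrak{H}$ and that, as we will see later in Corollary 3.1, each $F_{j,\Lambda}$ is continuous on $\mathcal{C}^j$ with values in $L^2 \cap \mathcal{C}$. The sequence $\{\nu_{n,\Lambda}\}$ is thus well-defined in $L^2 \cap \mathcal{C}$. Using estimates from [15] it can be proven that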
the series (2.7) has a positive radius of convergence in $L^2 \cap \mathcal{C}$, but this is not needed for the moment and we can stay at a formal level in this section.

We can now formally pass to the limit as $\Lambda \to \infty$ and define by induction a sequence $\{\nu_n\}$ by
\[
\begin{cases}
\nu_0 = \nu, \\
\nu_1 = \mathcal{U}\nu, \\
\displaystyle \nu_n = \mathcal{U}\nu_{n-1} + \sum_{j=3}^{n}\ \sum_{n_1 + \cdots + n_j = n - j} F_j\big( \nu_{n_1}, \ldots, \nu_{n_j} \big), \quad \forall n \ge 2,
\end{cases} \tag{2.9}
\]
where the $F_j$ are defined similarly as the $F_{j,\Lambda}$ with $\Pi_\Lambda$ removed and $\mathcal{U}$ is the operator of multiplication by the function $U(|k|)$ in the Fourier domain, defined by
\[
U(r) := \lim_{\Lambda \to \infty} U_\Lambda(r) = \frac{r^2}{4\pi} \int_0^1 \frac{z^2 - z^4/3}{1 + \frac{r^2 (1 - z^2)}{4}}\, dz = \frac{12 - 5 r^2}{9\pi r^2} + \frac{\sqrt{4 + r^2}}{3\pi r^3}\, (r^2 - 2)\, \log\left( \frac{\sqrt{4 + r^2} + r}{\sqrt{4 + r^2} - r} \right). \tag{2.10}
\]
If the series $\sum_{n \ge 0} \nu_n (\alpha_{\rm ph})^n$ had a positive radius of convergence, it would be a solution of the renormalized equation without ultraviolet cut-off
\[
\big( 1 - \alpha_{\rm ph}\, \mathcal{U} \big)\, \rho_{\rm ph} + F(\alpha_{\rm ph} \rho_{\rm ph}) = \nu. \tag{2.11}
\]
However, it is widely believed that the series is divergent (as discussed in the next subsection) and it is not known whether the renormalized Eq. (2.11) admits any solution, even for $\alpha_{\rm ph}$ small enough.

2.2. On the series $\sum_{n \ge 0} \nu_n (\alpha_{\rm ph})^n$. The recursion formula (2.9) defining $\{\nu_n\}$ contains two terms. The first term $\mathcal{U}\nu_{n-1}$ is a simple multiplication operator in Fourier space, by the function $U(|k|)$ which diverges at infinity. The second term involves the nonlinear functions $F_j$. If only the first term with $\mathcal{U}$ were present, the series $\sum_{n \ge 0} \nu_n (\alpha_{\rm ph})^n$ would only converge when the Fourier transform $\hat\nu$ has a compact support, the radius of convergence depending on the size of this support. If only the nonlinear terms were present, the series would have a finite radius of convergence by the estimates of [15] and of Sect. 3.2. However when the two terms are combined, the situation is much more complicated. The nonlinear terms act like convolutions in Fourier space, hence even if $\nu$ has a compact support in the Fourier domain, the support of $\hat\nu_n$ will probably grow with $n$.
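The index structure of the nonlinear term in the recursion (2.9) can be made concrete by listing, for a given order $n$, all tuples $(n_1, \ldots, n_j)$ that contribute. The following is a small sketch; the function name is hypothetical, and even values of $j$ are skipped because the corresponding $F_j$ vanish by Furry’s Theorem.

```python
from itertools import product

def nonlinear_index_sets(n):
    """Yield (j, (n_1, ..., n_j)) with 3 <= j <= n, j odd,
    and n_1 + ... + n_j = n - j, as in the double sum of (2.9)."""
    for j in range(3, n + 1, 2):  # even j dropped: F_j = 0 by Furry's theorem
        for parts in product(range(n - j + 1), repeat=j):
            if sum(parts) == n - j:
                yield j, parts

for order in range(2, 6):
    terms = list(nonlinear_index_sets(order))
    print(order, len(terms), terms[:3])
```

In this enumeration the order $n = 2$ has no nonlinear contribution, and every index $n_i$ is at most $n - 3$, so the nonlinear part of $\nu_n$ only involves $\nu_0, \ldots, \nu_{n-3}$.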
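A careful study of the mixed effect of the multiplication by the divergent function $U$ and the nonlinearities seems rather difficult. We will prove the following estimate:

Theorem 4 (Estimate on $\{\nu_n\}_{n \ge 1}$). There exist universal constants $A$ and $K$ such that
\[
\big\| (1 + \mathcal{U})^{m-n} \nu_n \big\|_{L^2 \cap \mathcal{C}} \le A^{n+1} \max\Big( \big\| (1 + \mathcal{U})^m \nu \big\|_{L^2 \cap \mathcal{C}},\ (K \log(m))^{\frac{mn}{2}}\, \big\| (1 + \mathcal{U})^m \nu \big\|^{n+1}_{L^2 \cap \mathcal{C}} \Big), \tag{2.12}
\]
for all $\Lambda \ge 1$, $m \in \mathbb{N}$ and $0 \le n \le m$.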
Even if we assume that $\nu$ decays fast enough in Fourier space, for instance
\[
\forall n \ge 0, \qquad \big\| (1 + \mathcal{U})^n \nu \big\|_{L^2 \cap \mathcal{C}} \le C^n,
\]
the above estimate (2.12) does not imply that the series $\sum_{n \ge 0} \nu_n (\alpha_{\rm ph})^n$ is convergent for $\alpha_{\rm ph}$ small enough. Although our estimate (2.12) is certainly far from optimal, as we have already mentioned, it is expected that the series does not converge in any appropriate sense [10].

It is sometimes argued that the series could be Borel summable. The Borel transform is defined by
\[
B(t) = \sum_{n \ge 0} \frac{t^n}{n!}\, \nu_n.
\]
If $B(t)$ is a convergent series (for an appropriate norm) having a holomorphic extension to a domain containing the positive real line, such that
\[
\tilde B(\alpha_{\rm ph}) := \int_0^\infty B(t)\, e^{-t/\alpha_{\rm ph}}\, dt
\]
makes sense in an appropriate neighborhood of $\alpha_{\rm ph} = 0$, one may see $\tilde B(\alpha_{\rm ph})$ as the physical density, whose series $\sum_{n \ge 0} (\alpha_{\rm ph})^n \nu_n$ is only asymptotic. Proving such results mathematically is hard, even for the model studied in this paper. Our estimate (2.12) does not even allow one to define the Borel transform $B(t)$ in $L^2(\mathbb{R}^3) \cap \mathcal{C}$.

But Borel summability is not the only tool to construct a physical density providing the correct asymptotic series. For the model studied in the present paper, we have several natural families of functions of $\alpha_{\rm ph}$: the cut-off densities
\[
\rho_{\rm ph}\Big( \alpha_{\rm ph},\ C e^{3(1 - Z_3)\pi/2\alpha_{\rm ph}} \Big), \tag{2.13}
\]
obtained by minimizing the reduced BDF energy with a cut-off $\Lambda = C e^{3(1 - Z_3)\pi/2\alpha_{\rm ph}}$ and using the relation (1.10). Each such density (2.13) has (for fixed $C$ and $0 < Z_3 < 1$) the required asymptotic series in $\alpha_{\rm ph}$ by Theorem 3, and it solves the self-consistent equation (1.6) with the corresponding cut-off $\Lambda$. Furthermore this solution has the benefit of being well-defined even when $\alpha_{\rm ph}$ is not small, allowing for the description of nonperturbative physical events.

The rest of the paper is devoted to the proofs of Theorems 3 and 4.
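The difference between a convergent and a merely asymptotic series, which underlies the discussion of Borel summability in this section, can be illustrated on a scalar toy model. This is not the series of the paper: the sketch assumes the hypothetical coefficients $(-1)^n n!$, whose Borel transform is $1/(1+t)$, so that after the substitution $t = \alpha s$ the Laplace-type integral reduces, up to an overall factor $\alpha$, to $\int_0^\infty e^{-s}/(1 + \alpha s)\, ds$.

```python
import math

def borel_integral(alpha, upper=60.0, panels=60000):
    """Composite Simpson approximation of int_0^infty exp(-s) / (1 + alpha*s) ds.
    The tail beyond `upper` is negligible (of order exp(-60))."""
    def f(s):
        return math.exp(-s) / (1.0 + alpha * s)
    h = upper / panels
    total = f(0.0) + f(upper)
    for i in range(1, panels):
        total += (4.0 if i % 2 else 2.0) * f(i * h)
    return total * h / 3.0

def partial_sum(alpha, order):
    """Truncation of the divergent toy series sum_n (-1)^n n! alpha^n."""
    return sum((-1) ** n * math.factorial(n) * alpha ** n
               for n in range(order + 1))

alpha = 0.1
exact = borel_integral(alpha)
errors = [abs(partial_sum(alpha, n) - exact) for n in range(25)]
best_order = min(range(25), key=errors.__getitem__)
print(exact, best_order, errors[best_order], errors[24])
```

In this toy model the truncation error at order $N$ is bounded by the first omitted term $(N+1)!\, \alpha^{N+1}$, so the partial sums first approach the integral and then drift away again once $N$ exceeds roughly $1/\alpha$. This is the behaviour meant by a series that is "only asymptotic".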
3. Some Preliminary Results

In this section we state two preliminary results that will be useful in the proof of our main results, Theorems 3 and 4. The corresponding lengthy calculations will be provided later in Appendices A, B and C.

Notation. In the whole paper we use the notation $E(r) = (1 + |r|^2)^{1/2}$ for $r \in \mathbb{R}^3$ or $r \in \mathbb{R}$.
3.1. The Uehling multiplier $U_\Lambda$. The operator $\mathcal{U}$, defined previously as the multiplication by the function $U$ in the Fourier domain, plays a major role in the definition of the sequence $\{\nu_n\}$. In this section, we provide precise estimates quantifying the convergence of $U_\Lambda$ towards $U$ when $\Lambda \to \infty$, which will be very useful in the proof of Theorem 3.

Proposition 3.1. Let $\Lambda \ge 1$ and denote by $U_\Lambda$ the function defined on $\mathbb{R}_+$ by
\[
U_\Lambda(r) =
\begin{cases}
B_\Lambda - B_\Lambda(r) & \text{when } 0 \le r \le 2\Lambda, \\
0 & \text{otherwise.}
\end{cases} \tag{3.1}
\]
Then, for all r ∈ R+ , lim→∞ U (r ) = U (r ) holds. Moreover, for κ0 = 15π /2,
U − U 1 1 m+3 (3.2) ∀m ≥ 0, (1 + U )m+1 ∞ ≤ κ0 max (1 + B )m , E(2) . L Finally, one has for a universal constant κ1 (given in Lemma B.1 below) ∀ 0 ≤ r ≤ 2,
0 ≤ U (r ) ≤ κ1 (1 + U (r )).
(3.3)
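Proposition 3.1 is proved in Appendix B. Note that the uniform estimate (3.2) will later yield our estimate (1.15) on $\nu_{n,\Lambda} - \nu_n$ (see Lemma 5.2). More properties of $U$ and $U_\Lambda$ are provided later in Appendix A.

3.2. The nonlinear terms $F_{n,\Lambda}$ and $F_n$. In this section, we provide estimates on the functions $F_{n,\Lambda}$ and $F_n$, which will be one of the main ingredients in the proof of Theorem 3. We recall that $F_{2n,\Lambda} = F_{2n} = 0$ by Furry's Theorem (see [15, p. 547]). In order to state our main result, we introduce the functions
\[ F^{(\epsilon)}_{n,\Lambda}(\mu) = \rho\left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1}{D^0 + i\eta} \prod_{j=1}^{n} \left( \Pi_\Lambda^{(\epsilon_j)} \Big( \mu_j * \frac{1}{|x|} \Big) \Pi_\Lambda^{(\epsilon_{j+1})} \frac{1}{D^0 + i\eta} \right) d\eta \right], \tag{3.4} \]
for any $n \ge 3$, $\mu = (\mu_1, \dots, \mu_n) \in \mathcal{C}^n$ and $\epsilon = (\epsilon_1, \dots, \epsilon_{n+1}) \in \{-1, 0, 1\}^{n+1}$. Here, we have used the notation
\[ \Pi_\Lambda^{(1)} := \Pi_\Lambda, \quad \Pi_\Lambda^{(-1)} := 1 - \Pi_\Lambda \quad \text{and} \quad \Pi_\Lambda^{(0)} := 1 = \Pi_\Lambda^{(1)} + \Pi_\Lambda^{(-1)}. \tag{3.5} \]
The main result of this section is the following

Proposition 3.2 (Estimates on $F^{(\epsilon)}_{n,\Lambda}$). Let $m \in \mathbb{N}$, $\Lambda \ge 1$ and $\epsilon \in \{-1, 0, 1\}^{n+1}$. Assume that $n \ge 3$. Then, there exist universal constants $C$ and $K$ such that
\[ \big\| (1+U)^m F^{(\epsilon)}_{n,\Lambda}(\mu) \big\|_{L^2 \cap \mathcal{C}} \le \frac{C^n (K \log n)^m}{\Lambda^{n(\epsilon)/24}} \prod_{j=1}^{n} \big\| (1+U)^m \mu_j \big\|_{\mathcal{C}}, \tag{3.6} \]
for all $\mu = (\mu_1, \dots, \mu_n) \in \mathcal{C}^n$. Here, $n(\epsilon) = 1$ if at least one $\epsilon_j$ is equal to $-1$, and $n(\epsilon) = 0$ otherwise.

By (2.2), (3.4) and (3.5), we can write $F^{(1,\dots,1)}_{n,\Lambda} = F_{n,\Lambda}$ and $F^{(0,\dots,0)}_{n,\Lambda} = F_n$. Therefore the following is a byproduct of (3.6):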
Corollary 3.1. Let $m \ge 0$, $\Lambda \ge 1$ and $n \ge 3$ be an odd integer. Then,
\[ \max\Big\{ \big\| (1+U)^m F_{n,\Lambda}(\mu) \big\|_{L^2 \cap \mathcal{C}},\ \big\| (1+U)^m F_n(\mu) \big\|_{L^2 \cap \mathcal{C}} \Big\} \le C^n (K \log n)^m \prod_{j=1}^{n} \big\| (1+U)^m \mu_j \big\|_{\mathcal{C}}, \tag{3.7} \]
for any $\mu = (\mu_1, \dots, \mu_n) \in \mathcal{C}^n$. Here, $C$ and $K$ refer to the universal constants given by Proposition 3.2. In particular, the functions $F_{n,\Lambda}$ and $F_n$ are continuous on $\mathcal{C}^n$ with values in $L^2 \cap \mathcal{C}$.

Recall $F_{2k} = F_{2k,\Lambda} = 0$, hence only the case of $n$ being an odd integer is relevant. The estimates of Proposition 3.2 are an adaptation of ideas of [15], in which similar bounds were computed (see, e.g., Lemmas 15 and 16 in [15]). Notice however that the projector $\Pi_\Lambda$ was never mentioned in [15] since $\Lambda$ was a fixed number. We focus here on the limit $\Lambda \to \infty$ and we need to quantify the dependence on $\Lambda$ of the estimates on the functions $F^{(\epsilon)}_{n,\Lambda}$. The proof of Proposition 3.2 is provided below in Appendix C. The factor $(K \log n)^m$ comes from (A.7) of Lemma A.3 and the constant $K$ is also the one appearing in Theorem 4.

4. Proof of Theorem 4

This section is devoted to the proof of our estimate (2.12) on the $n$th order density $\nu_n$. The definition of $\nu_{n,\Lambda}$ being very similar to that of $\nu_n$, our proof also provides the following

Proposition 4.1 (Estimates on $\nu_{n,\Lambda}$). There exists $A > 0$ such that
\[ \big\| (1+U)^{m-n} \nu_{n,\Lambda} \big\|_{L^2 \cap \mathcal{C}} \le A^{n+1} \max\Big\{ \big\| (1+U)^m \nu \big\|_{L^2 \cap \mathcal{C}},\ (K \log m)^{\frac{mn}{2}} \big\| (1+U)^m \nu \big\|^{n+1}_{L^2 \cap \mathcal{C}} \Big\}, \tag{4.1} \]
for any $\Lambda \ge 1$, $m \in \mathbb{N}$ and $0 \le n \le m$.

We postpone the proof of Proposition 4.1 and first complete that of Theorem 4.

Proof of Theorem 4. We split the proof into three steps. First, we estimate by means of (3.7) the following norms:
\[ J_{m,n} := \big\| (1+U)^{m-n} \nu_n \big\|_{L^2 \cap \mathcal{C}}. \]

Step 1. Let $m \in \mathbb{N}$ and denote
\[ P_m(t) := \sum_{n=0}^{m} J_{m,n}\, t^n. \]
The polynomial $P_m(t)$ satisfies, for any $t \ge 0$,
\[ P_m(t) \le \big(1 + t + t^2\big) \big\| (1+U)^m \nu \big\|_{L^2 \cap \mathcal{C}} + Q_m\big(t P_m(t)\big), \tag{4.2} \]
where ($C$ and $K$ are the constants of Proposition 3.2)
\[ Q_m(u) := u + \sum_{j=3}^{m} C^j (K \log j)^{m-j} u^j. \tag{4.3} \]
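Let us assume first that $n = 0, 1, 2$. By (2.9), we then have $\nu_n = U^n \nu$, hence
\[ \forall n = 0, 1, 2, \quad J_{m,n} = \big\| (1+U)^{m-n} U^n \nu \big\|_{L^2 \cap \mathcal{C}} \le \big\| (1+U)^m \nu \big\|_{L^2 \cap \mathcal{C}}. \tag{4.4} \]
We now turn to the case $n \ge 3$. By (2.9), we have
\[ J_{m,n} \le \big\| (1+U)^{m-n} U \nu_{n-1} \big\|_{L^2 \cap \mathcal{C}} + \sum_{3 \le 2j+1 \le n}\ \sum_{\sum_{k=1}^{2j+1} n_k = n-2j-1} \big\| (1+U)^{m-n} F_{2j+1}\big(\nu_{n_1}, \dots, \nu_{n_{2j+1}}\big) \big\|_{L^2 \cap \mathcal{C}}, \]
hence, by Corollary 3.1,
\[ J_{m,n} \le \big\| (1+U)^{m-n+1} \nu_{n-1} \big\|_{L^2 \cap \mathcal{C}} + \sum_{3 \le 2j+1 \le n}\ \sum_{\sum_{k=1}^{2j+1} n_k = n-2j-1} C^{2j+1} \big(K \log(2j+1)\big)^{m-n} \prod_{k=1}^{2j+1} \big\| (1+U)^{m-n} \nu_{n_k} \big\|_{\mathcal{C}}. \]
Since $\| (1+U)^{m-n} \nu_{n_k} \|_{\mathcal{C}} \le \| (1+U)^{m-n_k} \nu_{n_k} \|_{L^2 \cap \mathcal{C}}$, we arrive at the inequality
\[ J_{m,n} \le J_{m,n-1} + \sum_{j=3}^{n} C^j (K \log j)^{m-n} \left( \sum_{\sum_{k=1}^{j} n_k = n-j}\ \prod_{k=1}^{j} J_{m,n_k} \right). \tag{4.5} \]
Combining (4.4) with (4.5), we obtain
\[ P_m(t) \le (1 + t + t^2) \big\| (1+U)^m \nu \big\|_{L^2 \cap \mathcal{C}} + t P_m(t) + \sum_{n=3}^{m} \sum_{j=3}^{n} C^j t^j (K \log j)^{m-n} \left( \sum_{\sum_{k=1}^{j} n_k = n-j}\ \prod_{k=1}^{j} J_{m,n_k} t^{n_k} \right). \]
By Fubini's theorem,
\[ P_m(t) \le (1 + t + t^2) \big\| (1+U)^m \nu \big\|_{L^2 \cap \mathcal{C}} + t P_m(t) + \sum_{j=3}^{m} C^j t^j (K \log j)^{m-j} \sum_{p=0}^{m-j}\ \sum_{\sum_{k=1}^{j} n_k = p} \left( \prod_{k=1}^{j} J_{m,n_k} t^{n_k} \right) \tag{4.6} \]
holds. Noticing that
\[ \sum_{p=0}^{m-j}\ \sum_{\sum_{k=1}^{j} n_k = p} \left( \prod_{k=1}^{j} J_{m,n_k} t^{n_k} \right) \le P_m(t)^j, \]
we deduce (4.2) from (4.6). This completes the proof of Step 1.

In the second step of the proof of Theorem 4 we compute suitable bounds on $Q_m$ near the origin.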
Step 2. Let $m \ge 3$. There exists a positive constant $A(C, K)$, depending on $C$ and $K$, but not on $m$, such that
\[ Q_m(u) \le 2u, \quad \text{for any } 0 \le u \le U_m := \frac{A(C, K)}{(K \log m)^{m/2}}. \]
By the definition (4.3) of $Q_m$, we have
\[ Q_m(u) \le u + \sum_{j=3}^{m} C^j (K \log j)^{m-j} u^j \le u + (K \log m)^m \sum_{j=3}^{m} \left( \frac{C u}{K \log m} \right)^j, \tag{4.7} \]
hence when $2Cu \le K \log m$ and $2C^3 (K \log m)^{m-3} u^2 \le 1$, $Q_m(u) \le 2u$ holds. This ends the proof of Step 2.

Step 3. Conclusion of the proof of Theorem 4. Since the coefficients $J_{m,n}$ are nonnegative, the function $t \mapsto t P_m(t)$ is either identically equal to $0$ (then (2.12) is straightforward), or increasing on $\mathbb{R}_+$. In the second case, it tends to $\infty$ as $t \to \infty$, and there exists a unique $T_m > 0$ such that
\[ T_m P_m(T_m) = U_m. \tag{4.8} \]
Two situations may then occur. If $T_m \ge 1/4$, by (4.2) and (4.7),
\[ P_m(t) \le 2 \big\| (1+U)^m \nu \big\|_{L^2 \cap \mathcal{C}} + 2 t P_m(t) \]
for all $0 \le t \le 1/4$. Hence $P_m(t) \le 4 \| (1+U)^m \nu \|_{L^2 \cap \mathcal{C}}$ and
\[ J_{m,n} \le 4^n P_m\Big(\frac{1}{4}\Big) \le 4^{n+1} \big\| (1+U)^m \nu \big\|_{L^2 \cap \mathcal{C}}. \]
Otherwise $T_m \le 1/4$ and in this case we can deduce from (4.2), (4.7) and (4.8) that
\[ U_m / T_m \le 2 \big\| (1+U)^m \nu \big\|_{L^2 \cap \mathcal{C}} + 2 U_m. \]
Since $2U_m = 2 T_m P_m(T_m) \le P_m(T_m)/2 = U_m/(2T_m)$, this gives
\[ T_m \ge \frac{U_m}{4 \| (1+U)^m \nu \|_{L^2 \cap \mathcal{C}}}. \]
Combining with (4.8) again, we are led to
\[ J_{m,n} \le \frac{U_m}{T_m^{n+1}} \le \frac{4^{n+1}}{U_m^n} \big\| (1+U)^m \nu \big\|^{n+1}_{L^2 \cap \mathcal{C}}. \]
Estimate (2.12) then follows from (4.7).

We now turn to the proof of Proposition 4.1.

Proof of Proposition 4.1. The proof is almost identical. Denoting $J^\Lambda_{m,n} := \| (1+U)^{m-n} \nu_{n,\Lambda} \|_{L^2 \cap \mathcal{C}}$ and introducing the polynomial function $P^\Lambda_m(t)$ given by
\[ P^\Lambda_m(t) := \sum_{n=0}^{m} J^\Lambda_{m,n}\, t^n, \]
we deduce from the definition (2.8), and from (3.3) and (3.7) that
\[ P^\Lambda_m(t) \le \big(1 + \kappa_1 t + \kappa_1^2 t^2\big) \big\| (1+U)^m \nu \big\|_{L^2 \cap \mathcal{C}} + (\kappa_1 - 1)\, t P^\Lambda_m(t) + Q_m\big(t P^\Lambda_m(t)\big), \tag{4.9} \]
for all $t \ge 0$. Estimate (4.1) then follows by applying to (4.9) the arguments of Steps 2 and 3 of the proof of Theorem 4.
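As a sanity check on Step 2 of the proof of Theorem 4, the following sketch evaluates $Q_m$ from (4.3) and tests the inequality $Q_m(u) \le 2u$ at the largest $u$ allowed by the two sufficient conditions $2Cu \le K \log m$ and $2C^3 (K\log m)^{m-3} u^2 \le 1$. The values $C = 2$ and $K = 3$ are purely illustrative assumptions (the paper does not give numerical values), and the helper names are hypothetical.

```python
import math

def Q(m, u, C, K):
    """Q_m(u) = u + sum_{j=3}^m C^j (K log j)^(m-j) u^j, as in (4.3)."""
    return u + sum(C ** j * (K * math.log(j)) ** (m - j) * u ** j
                   for j in range(3, m + 1))

def admissible_u(m, C, K):
    """Largest u satisfying both sufficient conditions used in Step 2:
    2*C*u <= K log m  and  2*C^3 (K log m)^(m-3) u^2 <= 1."""
    L = K * math.log(m)
    return min(L / (2.0 * C), 1.0 / math.sqrt(2.0 * C ** 3 * L ** (m - 3)))

C, K = 2.0, 3.0  # illustrative constants only, not values from the paper
for m in range(3, 15):
    u = admissible_u(m, C, K)
    # the last column should never exceed 1 if Q_m(u) <= 2u holds
    print(m, u, Q(m, u, C, K) / (2.0 * u))
```

The admissible $u$ shrinks roughly like $(K \log m)^{-(m-3)/2}$, which is consistent with the threshold $U_m$ proportional to $(K\log m)^{-m/2}$ used in the text.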
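5. Proof of Theorem 3

This last section is devoted to the proof of our main estimate (1.13). The proof relies on the identity
\[ \rho_{\rm ph} = \nu_\Lambda + \alpha_{\rm ph} U_\Lambda \rho_{\rm ph} - \sum_{3 \le 2n+1 \le N} \alpha_{\rm ph}^{2n+1} F_{2n+1,\Lambda}(\rho_{\rm ph}, \dots, \rho_{\rm ph}) - \alpha_{\rm ph}^{N+1} G_{N+1,\Lambda}, \tag{5.1} \]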
where we denote ⎛ G N +1,
1 := ρ ⎝ 2π ×
∞
−∞
1 D0
− αph ρph ∗ | · |−1 + iη
⎞ 1 1 ρph ∗ 0 dη⎠. |·| D + iη
N +1 j=1
(5.2)
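The formula (5.1) follows from Cauchy's formula applied to (2.6). As mentioned in the Introduction, the proof of (1.13) naturally splits into two steps: we first establish that, under the assumptions of Theorem 3, the error term
\[ R_N(\alpha_{\rm ph}, \Lambda) := \rho_{\rm ph}(\alpha_{\rm ph}, \Lambda) - \sum_{n=0}^{N} \nu_{n,\Lambda}\, \alpha_{\rm ph}^n \tag{5.3} \]
is controlled by a factor $\alpha_{\rm ph}^{N+1}$ (up to some multiplicative constant depending only on $N$, $\nu$ and $\epsilon$). In a second step we estimate the differences $\nu_{n,\Lambda} - \nu_n$ and deduce (1.13). More precisely, the remainder $R_N$ satisfies the following

Lemma 5.1. Let $N \in \mathbb{N}$ and $0 < \epsilon < 1$. Assume that $\epsilon \le Z_3 = 1 - \alpha_{\rm ph} B_\Lambda \le 1 - \epsilon$ and $\mathcal{N}_N := \| (1+U)^{N+1} \nu \|_{L^2 \cap \mathcal{C}} < \infty$. Then, there exist two constants $C(N, \epsilon, \mathcal{N}_N)$ and $a(N, \epsilon, \mathcal{N}_N)$, depending only on $N$, $\epsilon$ and $\mathcal{N}_N$, such that
\[ \big\| R_N(\alpha_{\rm ph}, \Lambda) \big\|_{L^2 \cap \mathcal{C}} \le C(N, \epsilon, \mathcal{N}_N)\, \alpha_{\rm ph}^{N+1}, \tag{5.4} \]
for all $0 \le \alpha_{\rm ph} \le a(N, \epsilon, \mathcal{N}_N)$.

As for the differences $\nu_{n,\Lambda} - \nu_n$, we have

Lemma 5.2. Let $\Lambda \ge 1$ and $N \in \mathbb{N}$. Assume that $\mathcal{N}_N := \| (1+U)^{N+1} \nu \|_{L^2 \cap \mathcal{C}} < \infty$. Then, there exists a constant $C(N, \mathcal{N}_N)$, depending only on $N$ and $\mathcal{N}_N$, such that
\[ \big\| \nu_{n,\Lambda} - \nu_n \big\|_{L^2 \cap \mathcal{C}} \le \frac{C(N, \mathcal{N}_N)}{(1 + B_\Lambda)^{N+1-n}}, \tag{5.5} \]
for all $0 \le n \le N$.

Combining Lemmas 5.1 and 5.2, we can complete the proof of Theorem 3.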
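Proof of Theorem 3. Our assumption (1.12) (together with (A.3)) means that $\mathcal{N}_N := \| (1+U)^{N+1} \nu \|_{L^2 \cap \mathcal{C}} < \infty$. It follows from (5.3) that
\[ \rho_{\rm ph}(\alpha_{\rm ph}, \Lambda) - \sum_{n=0}^{N} \nu_n (\alpha_{\rm ph})^n = R_N(\alpha_{\rm ph}, \Lambda) + \sum_{n=0}^{N} \big( \nu_{n,\Lambda} - \nu_n \big) (\alpha_{\rm ph})^n. \]
Hence by (5.4) and (5.5),
\[ \Big\| \rho_{\rm ph}(\alpha_{\rm ph}) - \sum_{n=0}^{N} \nu_n\, \alpha_{\rm ph}^n \Big\|_{L^2 \cap \mathcal{C}} \le C(N, \epsilon, \mathcal{N}_N)\, \alpha_{\rm ph}^{N+1} + C(N, \mathcal{N}_N) \sum_{n=0}^{N} \frac{\alpha_{\rm ph}^n}{(1 + B_\Lambda)^{N+1-n}}, \tag{5.6} \]
for any number $\alpha_{\rm ph}$ sufficiently small. In our setting we have $B_\Lambda \ge \epsilon / \alpha_{\rm ph}$, so that each term of the last sum is at most $\epsilon^{-(N+1-n)} \alpha_{\rm ph}^{N+1}$, and the result follows.

It therefore remains to show Lemmas 5.1 and 5.2.

Proof of Lemma 5.1. Let us introduce the notation
\[ r_N(\alpha_{\rm ph}) := (\alpha_{\rm ph})^{-N-1} R_N(\alpha_{\rm ph}). \tag{5.7} \]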
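We want to establish a bound on $r_N$ independently of $\alpha_{\rm ph}$. By (5.3), this requires estimating $\rho_{\rm ph}$ and $\nu_{n,\Lambda}$ (which was already done in Proposition 4.1). The first step of the proof will be to bound $\rho_{\rm ph}$ independently of $\alpha_{\rm ph}$. Let us recall that a ground state for the reduced Bogoliubov-Dirac-Fock model satisfies $\alpha_{\rm ph} \| \rho_{\rm ph} \|_{\mathcal{C}} = \alpha \| \rho_Q - \nu \|_{\mathcal{C}} \le \alpha \| \nu \|_{\mathcal{C}}$ (see [16, Eq. (33)]). Since $\alpha_{\rm ph} = Z_3 \alpha$, this provides
\[ \| \rho_{\rm ph} \|_{\mathcal{C}} \le Z_3^{-1} \| \nu \|_{\mathcal{C}} \le \epsilon^{-1} \| \nu \|_{\mathcal{C}}. \tag{5.8} \]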
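Note, however, that we do not have any a priori bound in $L^2(\mathbb{R}^3)$. Inserting (5.3) and (5.7) in (5.1) and using (2.8), we get
\[ r_N = \alpha_{\rm ph} U_\Lambda r_N + U_\Lambda \nu_{N,\Lambda} + G_{N+1,\Lambda} + \sum_{k=N+1}^{N(N+2)} \alpha_{\rm ph}^{k-N-1} \sum_{3 \le 2n+1 \le N}\ \sum_{p_1 + \dots + p_{2n+1} = k-2n-1} F_{2n+1,\Lambda}\big(\omega_{p_1}, \dots, \omega_{p_{2n+1}}\big), \tag{5.9} \]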
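where $\omega_p = \nu_{p,\Lambda}$ for $0 \le p \le N$, and $\omega_{N+1} = r_N$. It remains to estimate all the terms of the right-hand side of (5.9). For the first term, we recall that $\alpha_{\rm ph} |U_\Lambda| \le \alpha_{\rm ph} B_\Lambda = 1 - Z_3 \le 1 - \epsilon$, therefore
\[ \| \alpha_{\rm ph} U_\Lambda r_N \|_{L^2 \cap \mathcal{C}} \le (1 - \epsilon) \| r_N \|_{L^2 \cap \mathcal{C}}. \tag{5.10} \]
The second term can be controlled by using (3.3) and (4.1), which provide a positive constant $C(N, \mathcal{N}_N)$, depending only on $N$ and $\mathcal{N}_N$, such that
\[ \| U_\Lambda \nu_{N,\Lambda} \|_{L^2 \cap \mathcal{C}} \le \kappa_1 \| (1+U) \nu_{N,\Lambda} \|_{L^2 \cap \mathcal{C}} \le C(N, \mathcal{N}_N). \tag{5.11} \]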
As for the function G N +1, , we first recall that 1 11 1 11 0 62 6 π 1 π62 6 0 ≤ 1+ αph ||ν||C |D | ≤ D − αph ρph ∗ αph ||ν||C |D 0 | 1−
| · |
(5.12) 0 for all αph < π −1/6 2−11/6 ||ν||−1 C (see [16, p. 4495]). Hence, the operator D −αph ρph ∗ |·|−1 is invertible and, in particular, G N +1, is well-defined. Notice also that (5.12) yields for any αph < π −1/6 2−17/6 ||ν||−1 C , 1 3 0 1 0 ≤ |D |. |D | ≤ D 0 − αph ρph ∗ 2 | · | 2
When N ≥ 5, we argue exactly as in Steps 1 and 2 of the proof of Proposition 3.2, and deduce that there exists a constant C(N ), depending only on N , such that G N +1, L 2 ∩C ≤ C(N )ρph CN +1 .
(5.13)
When N ≤ 4, our argument is different. We expand G N +1, as before, writing G N +1, = −αph F2 j+1, (ρph , . . . , ρph ) + G 6, . N +1≤2 j+1≤5
In view of (3.7) and (5.13) (for N = 5), this leads to ⎛ ⎞ 2 j+1 G N +1, L 2 ∩C ≤ Cαph ⎝ ρph C + ρph 6C ⎠. N +1≤2 j+1≤5
In both cases, we obtain
! G N +1, L 2 ∩C ≤ C(N ) max ρph CN +1 , ρph 6C ,
for any αph ≤ 1, so that, by (5.8),
||ν||CN +1 ||ν||6C G N +1, L 2 ∩C ≤ C(N ) max , 6
N +1
) ≤ C(N , N N , ).
(5.14)
k−m−1 Finally, we consider the terms αph F2n+1, (ω p1 , . . . , ω p2n+1 ) of the sum in the righthand side of (5.9). By (3.7), we have
( ( ( k−N −1 ( F2n+1, (ω p1 , . . . , ω p2n+1 )( (αph
L 2 ∩C
≤ C 2n+1 |αph |k−N −1
2n+1
ω p j C .
j=1
When p j ≤ N , we deduce from (4.1) that there exists a constant C(N , N N ) such that ω p j C ≤ C(N , N N ). On the other hand, when p j = N + 1 for some j, we can bound one of the norms ω p j C by r N C , and the other ones by using (4.1), (5.3) and (5.8) to get ω p j C = |αph |− p j R N ≤ C(N , N N , )|αph |− p j ,
for αph ≤ 1. This leads to ( ( ( k−N −1 ( F2n+1, (ω p1 , . . . , ω p2n+1 )( (αph
L 2 ∩C
≤ C(N , N N , )|αph |2n+1 max{r N C , 1},
for αph ≤ 1. Combining with (5.9), (5.10), (5.11) and (5.14), we conclude that
r N L 2 ∩C ≤ C(N , N N , ) + 1 − + C(N , N N , )|αph |3 max{r N C , 1}, for αph sufficiently small. Therefore, the norm ||r N || L 2 ∩C is bounded independently of αph for αph small enough, which ends the proof of Lemma 5.1. We finally prove Lemma 5.2. Proof of Lemma 5.2. Given any n ∈ {0, 1, 2}, it follows from recursion relations (2.8) and (2.9) that n n n ν − U n ν = U − U n ν. νn, − νn = U (ν − ν) + U Therefore, given any N ≥ n and 0 ≤ p ≤ N + 1 − n, we deduce from (3.3) that (1 + U) p (νn, − νn ) L 2 ∩C ≤ κ1n (1 + U)n+ p (ν − ν) L 2 ∩C + nκ1n−1 (1 + U)n+ p−1 (U − U) ν L 2 ∩C .
(5.15)
Next, we recall that ν ν1 B(0,2) , hence that, since U (2) = B , = (1 + U)n+ p (ν − ν)
L 2 ∩C
≤
(1 + U) N +1 ν 2 L ∩C
(1 + U (2)) N +1−n− p
=
(1 + U) N +1 ν (1 + B
L 2 ∩C . N +1−n− p )
(5.16) For the second term in the right-hand side of (5.15), we use (3.2) and write U − U n+ p−1 (1 + U) N +1 ν (1 + U) (U − U) ν L 2 ∩C ≤ L 2 ∩C (1 + U ) N +2−n− p L ∞
1 1 N +4−n− p ≤ κ0 max , (1 + U) N +1 ν 2 . N +1−n− p L ∩C (1 + B ) E(2) Since (1 + B ) N +1−n− p ≤ (1 + B ) N +1 ≤ C(N )E(2), we obtain ( ( ((1 + U) p (νn, − νn )( 2 ≤ L ∩C
C(N ) N +1 + U) ν (1 2 . L ∩C (1 + B ) N +1−n− p
Combining with (5.15) and (5.16), we are led to ( ( ((1 + U) p (νn, − νn )( 2 ≤ L ∩C for N ≥ n and 0 ≤ p ≤ N + 1 − n.
C(N , N N ) , (1 + B ) N +1−n− p
(5.17)
We next turn to the case of n ≥ 3. Given any N ≥ n, we assume that (5.17) holds for all n ≤ k − 1 and 0 ≤ p ≤ N + 1 − n, and prove it by induction for n = k and 0 ≤ p ≤ N + 1 − k. Using (2.8) and (2.9), we first infer that ( ( ( ( ((1 + U) p (νk, − νk )( 2 ≤ ((1 + U) p U (νk−1, − νk−1 )( 2 L ∩C L ∩C ( ( ( p ( ( ((1 + U) p + (1 + U) (U − U)νk−1 L 2 ∩C + 3≤2 j+1≤k 2 j+1 k =k−2 j−1 =1
( × F2 j+1, νk1 , , . . . , νk2 j+1 , − F2 j+1 νk1 , . . . , νk2 j+1 ( L 2 ∩C .
(5.18)
We next estimate the first term in the right-hand side of (5.18) using (3.3) and our assumption. This provides (1 + U) p U (νk−1, − νk−1 ) L 2 ∩C ≤ κ1 (1 + U) p+1 (νk−1, − νk−1 ) L 2 ∩C C(N ) N +1 + U) ≤ ν (1 2 . L ∩C (1 + B ) N +1−k− p (5.19) For the second term, we argue as in the proof of (5.17), using (3.2) and (2.12): U − U (1 + U) N +2−k νk−1 (1 + U) p (U − U)νk−1 L 2 ∩C ≤ L 2 ∩C (1 + U ) N +2−k− p L ∞ C(N , N N ) ≤ . (5.20) (1 + B ) N +1−k− p Finally, we turn to the terms in the sums of the right-hand side of (5.18). On the one hand, we deduce from (3.4) that (−1,1,...,1)
F2 j+1, − F2 j+1 = −F2 j+1,
(0,−1,1,...,1)
− F2 j+1,
(0,...,0,−1)
− · · · − F2 j+1,
.
Hence, since p ≤ N + 1 − k ≤ N + 1 − k , we can apply (3.6) and (4.1) to obtain ( ( ((1 + U) p F2 j+1, νk , , . . . , νk , − F2 j+1 νk , , . . . , νk , ( 2 1
≤
C(N ) 1/24
2 j+1
1
2 j+1
(1 + U) N +1−k νk , C ≤
=1
L ∩C
2 j+1
C(N , N N ) . 1/24
(5.21)
On the other hand, the multilinearity of the function F2 j+1 provides F2 j+1 νk1 , , . . . , νk2 j+1 , − F2 j+1 νk1 , . . . , νk2 j+1 = F2 j+1 νk1 , − νk1 , νk2 , , . . . , νk2 j+1 , + F2 j+1 νk1 , νk2 , − νk2 , . . . , νk2 j+1 , + · · · + F2 j+1 νk1 , νk2 , . . . , νk2 j , νk2 j+1 , − νk2 j+1 . Therefore, we infer similarly from (3.6) and (4.1) that ( ((1 + U) p F2 j+1 νk , , . . . , νk , − F2 j+1 νk , , . . . , νk 1
≤ C(N )
2 j+1 q=1
≤
2 j+1 q=1
( ((1 + U) p (νk
1
2 j+1
q ,
2 j+1 ,
( (
L 2 ∩C
( ( ( ( ( ((1 + U) p νk , ( ((1 + U) p νk ( − νkq )(C C C
C(N , N N ) C(N , N N ) ≤ . N +1−k − p q (1 + B ) N +1−k− p (1 + B )
>q
(5.22)
As a conclusion, we derive from (5.18), (5.19), (5.20), (5.21) and (5.22) that
\[ \big\| (1+U)^p (\nu_{k,\Lambda} - \nu_k) \big\|_{L^2 \cap \mathcal{C}} \le C(N, \mathcal{N}_N) \left( \frac{1}{(1 + B_\Lambda)^{N+1-k-p}} + \frac{1}{\Lambda^{1/24}} \right). \]
Since $(1 + B_\Lambda)^{N+1-k-p} \le (1 + B_\Lambda)^{N+1} \le C(N) \Lambda^{1/24}$, this completes the proof of (5.17) for $n = k$. Notice the constant $C(N, \mathcal{N}_N)$ deteriorates when $n$ increases. However, this is not a problem since $n$ is limited to the set $\{0, \dots, N\}$. Estimate (5.5) then follows from (5.17), considering the case $p = 0$. This concludes the proof of Lemma 5.2.

Acknowledgements. The authors are grateful to Christian Brouder for interesting comments. M.L. would like to thank Jan Dereziński and Jan Philip Solovej for stimulating discussions. Grants from the French Ministry of Research (M.L. & É.S., ANR-10-BLAN 0101) and from the European Research Council (M.L., MNIQS258023) are gratefully acknowledged.
Appendix A. Auxiliary Results on the Uehling Multiplier $U$

A.1. Elementary properties of $U$. We gather in this section some important properties of $U$, which will be useful for the proof of Lemma A.3 in the next section.

Lemma A.1. The function $U$ defined in (2.10) is a non-negative, non-decreasing, smooth function on $\mathbb{R}_+$ such that
\[ U(r) \underset{r \to 0}{\sim} \frac{r^2}{15\pi} \quad \text{and} \quad U(r) \underset{r \to \infty}{\sim} \frac{2}{3\pi} \log r. \tag{A.1} \]
Its derivative $U'$ is positive on $(0, \infty)$, and it holds
\[ U'(r) \underset{r \to \infty}{\sim} \frac{2}{3\pi r} \quad \text{and} \quad U''(r) \underset{r \to \infty}{\sim} -\frac{2}{3\pi r^2}. \tag{A.2} \]
Moreover, we have
\[ \forall r \in \mathbb{R}_+, \quad \frac{2}{15\pi} \big(1 + \log E(r)\big) \le 1 + U(r) \le 1 + \frac{2}{3\pi} \log E(r). \tag{A.3} \]
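Proof of Lemma A.1. For the convenience of the reader, let us recall the integral and the explicit formulas (2.10) of $U$:
\[ U(r) = \frac{r^2}{4\pi} \int_0^1 \frac{z^2 - z^4/3}{1 + \frac{r^2 (1 - z^2)}{4}}\, dz = \frac{12 - 5r^2}{9\pi r^2} + \frac{\sqrt{4 + r^2}}{3\pi r^3} (r^2 - 2) \log\left( \frac{\sqrt{4 + r^2} + r}{\sqrt{4 + r^2} - r} \right). \tag{A.4} \]
Most of the statements of Lemma A.1 are direct consequences of (A.4). As for (A.3), we estimate, using (A.4),
\[ U(r) \le \frac{r^2}{12\pi} \int_0^1 \frac{2z}{1 + \frac{r^2}{4} - \frac{r^2 z^2}{4}}\, dz = \frac{1}{3\pi} \log\Big(1 + \frac{r^2}{4}\Big) \le \frac{2}{3\pi} \log E(r). \]
For the lower bound, we notice similarly that
\[ U(r) \ge \frac{r^2}{4\pi \big(1 + \frac{r^2}{4}\big)} \int_0^1 \Big( z^2 - \frac{z^4}{3} \Big)\, dz = \frac{r^2}{15\pi \big(1 + \frac{r^2}{4}\big)}, \]
for $r \in \mathbb{R}_+$, so that
\[ \forall\, 0 \le r \le 1, \quad \frac{2}{15\pi} \big(1 + \log E(r)\big) \le \frac{2 + r^2}{15\pi} \le 1 + U(r). \tag{A.5} \]
On the other hand, we can also write
\[ U(r) \ge \frac{r^2}{6\pi} \int_0^1 \frac{z^2}{1 + \frac{r^2}{2}(1 - z)}\, dz = \frac{4}{3\pi r^4} \left[ \Big(1 + \frac{r^2}{2}\Big)^2 \log\Big(1 + \frac{r^2}{2}\Big) - \frac{r^2}{2} - \frac{3 r^4}{8} \right], \]
thus when $r > 1$,
\[ 1 + U(r) \ge \frac{1}{3\pi} \log\Big(1 + \frac{r^2}{2}\Big) + 1 - \frac{7}{6\pi} \ge \frac{1}{3\pi} \big(1 + \log E(r)\big). \]
The lower bound in (A.3) then follows from (A.5).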
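The two expressions for $U$ recalled in (A.4) and the two-sided bound (A.3) lend themselves to a quick numerical cross-check. The sketch below is only an illustration: it evaluates the integral form with a composite Simpson rule, compares it with the explicit formula at a few moderate values of $r$, and tests (A.3) there. The function names are hypothetical, and the explicit formula is avoided for very small $r$, where it suffers from floating-point cancellation.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule on [a, b] with an even number n of sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def U_integral(r):
    """Integral representation of U in (A.4)."""
    g = lambda z: (z * z - z ** 4 / 3.0) / (1.0 + r * r * (1.0 - z * z) / 4.0)
    return r * r / (4.0 * math.pi) * simpson(g, 0.0, 1.0)

def U_closed(r):
    """Explicit formula of U in (A.4); numerically delicate for very small r."""
    s = math.sqrt(4.0 + r * r)
    return ((12.0 - 5.0 * r * r) / (9.0 * math.pi * r * r)
            + s * (r * r - 2.0) / (3.0 * math.pi * r ** 3) * math.log((s + r) / (s - r)))

def E(r):
    return math.sqrt(1.0 + r * r)

for r in (0.5, 1.0, 2.0, 5.0, 10.0):
    lower = 2.0 / (15.0 * math.pi) * (1.0 + math.log(E(r)))
    upper = 1.0 + 2.0 / (3.0 * math.pi) * math.log(E(r))
    # columns: r, integral form, explicit form, lower and upper bounds of (A.3) on 1 + U
    print(r, U_integral(r), U_closed(r), lower, upper)
```

For small $r$ the integral form is the safer one to evaluate; it reproduces the behaviour $U(r) \sim r^2/(15\pi)$ of (A.1).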
A useful consequence of Lemma A.1 is the following

Lemma A.2. Let $\Phi$ be the function defined on $\mathbb{R}_+$ by
\[ \Phi(r) = \frac{U'(r)}{1 + U(r)}. \]
There exist three positive numbers $T_-$, $T_+$ and $\Phi_0$ such that the function $\Phi$ is an increasing diffeomorphism from $(0, T_-)$ onto $(0, \Phi_0)$, respectively a decreasing diffeomorphism from $(T_+, \infty)$ onto $(0, \Phi_0)$, and $\Phi^{-1}((0, \Phi_0)) = (0, T_-) \cup (T_+, \infty)$. Moreover, we have
\[ \Phi(r) \underset{r \to 0}{\sim} \frac{2r}{15\pi}, \quad \text{and} \quad \Phi(r) \underset{r \to \infty}{\sim} \frac{1}{r \log r}. \tag{A.6} \]

Proof of Lemma A.2. From Lemma A.1, we see that the function $\Phi$ is well-defined, smooth on $\mathbb{R}_+$, and satisfies (A.6). Then we compute for $r \ge 0$:
\[ \Phi'(r) = \frac{U''(r) (1 + U(r)) - U'(r)^2}{(1 + U(r))^2}. \]
By (A.1) and (A.2), we thus have $\Phi'(0) = \frac{2}{15\pi}$ and $\Phi'(r) \sim_{r \to \infty} -1/(r^2 \log r)$. Since $\Phi(0) = 0$ and $\Phi(r) \to 0$ as $r \to \infty$ by (A.1) and (A.2), there exist $a, b, \delta > 0$ such that $\Phi$ is an increasing diffeomorphism from $(0, a)$ onto $(0, \delta)$, respectively a decreasing diffeomorphism from $(b, \infty)$ onto $(0, \delta)$. The function $\Phi$ is positive on $[a, b]$, so that $m = \min\{\Phi(t),\ a \le t \le b\} > 0$. Lemma A.2 follows by introducing $\Phi_0 = \min\{m/2, \delta\}$, and $T_- < T_+$, the two positive numbers such that $\Phi(T_-) = \Phi(T_+) = \Phi_0$.
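The picture behind Lemma A.2 (a ratio $U'/(1+U)$ that rises from $0$, peaks, and decays back to $0$, so that a small level is attained at exactly one small and one large radius) can be explored numerically. The sketch below is an illustration under stated assumptions: the ratio is called `Phi` here, $U$ and $U'$ are computed from the integral formula in (A.4) with the substitution $z = 1 - t^4$ (a numerical convenience chosen here, not taken from the paper), and the level `y = 0.005` is an arbitrary value below the observed maximum.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def _moment(r, power):
    """Integral over [0,1] of (z^2 - z^4/3) / (1 + r^2 (1 - z^2)/4)^power dz.

    The substitution z = 1 - t^4 clusters nodes near z = 1, where the
    integrand is sharply peaked when r is large.
    """
    def f(t):
        w = t ** 4                    # w = 1 - z
        z = 1.0 - w
        one_minus_z2 = w * (2.0 - w)  # 1 - z^2, computed without cancellation
        return ((z * z - z ** 4 / 3.0)
                / (1.0 + r * r * one_minus_z2 / 4.0) ** power * 4.0 * t ** 3)
    return simpson(f, 0.0, 1.0)

def U(r):
    return r * r / (4.0 * math.pi) * _moment(r, 1)

def dU(r):
    # d/dr of r^2 / (1 + r^2 s / 4) equals 2 r / (1 + r^2 s / 4)^2
    return r / (2.0 * math.pi) * _moment(r, 2)

def Phi(r):
    return dU(r) / (1.0 + U(r))

def bisect(f, lo, hi, rel_tol=1e-10):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0.0) == (f_lo > 0.0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

grid = [10.0 ** (k / 10.0) for k in range(-20, 21)]  # radii from 0.01 to 100
r_peak = max(grid, key=Phi)                          # grid point where Phi is largest
y = 0.005                                            # arbitrary small level
r_minus = bisect(lambda r: Phi(r) - y, grid[0], r_peak)
r_plus = bisect(lambda r: Phi(r) - y, r_peak, grid[-1])
print(r_peak, r_minus, r_plus)
```

The small root is close to $15\pi y/2$, in line with the behaviour $2r/(15\pi)$ near the origin stated in (A.6); the large root reflects the slow decay at infinity.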
A.2. A useful bound involving $U$. We use here results from the previous section to derive a bound useful for the proof of Proposition 3.2.

Lemma A.3. There exists a universal constant $K > 0$ such that
\[ 1 + U\Big( \Big| \sum_{j=1}^{n} v_j \Big| \Big) \le K \log n \prod_{j=1}^{n} \big( 1 + U(|v_j|) \big) \tag{A.7} \]
for all $n \ge 2$, and all $(v_1, \dots, v_n) \in (\mathbb{R}^3)^n$.

If we allow $K$ to depend on $n$, the optimal constant in the above inequality satisfies $K_n \to 1/3\pi$ when $n \to \infty$, as can be seen from the proof. The factor $\log n$ in (A.7) is therefore optimal with regard to the large-$n$ dependence.

Proof of Lemma A.3. By Lemma A.1,
\[ \frac{1 + U\big( \big| \sum_{j=1}^{n} v_j \big| \big)}{\prod_{j=1}^{n} \big( 1 + U(|v_j|) \big)} \le \frac{1 + U\big( \sum_{j=1}^{n} |v_j| \big)}{\prod_{j=1}^{n} \big( 1 + U(|v_j|) \big)} \le \max_{t_1, \dots, t_n \in \mathbb{R}_+} \frac{1 + U\big( \sum_{j=1}^{n} t_j \big)}{\prod_{j=1}^{n} \big( 1 + U(t_j) \big)} := J_n \tag{A.8} \]
holds. It is clear that taking $v_1 = \dots = v_n = v$ shows that the maximum of the left-hand side of (A.8) is actually $J_n$. Next, we take $t_1 = \dots = t_n = \tau_n$ in (A.8) with $\tau_n = \sqrt{15\pi/(n \log n)}$. Using (A.2), we see that $J_n \gtrsim (\log n)/3\pi$ for $n \gg 1$. We will show that actually $J_n \sim (\log n)/(3\pi)$ holds when $n \to \infty$. In the rest of the proof, we assume $n \ge n_0$ is such that $J_n > 1$.

Let us consider a maximizing sequence $\{(t_1^{(p)}, \dots, t_n^{(p)})\}_{p \in \mathbb{N}}$ for the variational problem defining $J_n$. If the sequence is unbounded, then by Lemma A.1,
\[ J_n \le \lim_{p \to \infty} \frac{1 + U\big( n \max_j \{t_j^{(p)}\} \big)}{1 + U\big( \max_j \{t_j^{(p)}\} \big)} = 1, \]
which contradicts $J_n > 1$ for $n \ge n_0$. Therefore $(t_1^{(p)}, \dots, t_n^{(p)})$ is bounded in $(\mathbb{R}_+)^n$. In this case the variational problem on the right-hand side of (A.8) has a maximizer, which satisfies the equation
\[ \forall\, 1 \le k \le n, \quad \Phi(t_k) = \frac{U'(t_k)}{1 + U(t_k)} = \frac{U'\big( \sum_{j=1}^{n} t_j \big)}{1 + U\big( \sum_{j=1}^{n} t_j \big)} = \Phi\Big( \sum_{j=1}^{n} t_j \Big) := \Phi_1. \tag{A.9} \]
Assume now that $\Phi_1 \ge \Phi_0$. By Lemma A.2, we have $t_k \ge T_-$ and $\sum_{j=1}^{n} t_j \le T_+$, for all $1 \le k \le n$, hence $n \le T_+/T_-$. In particular, for $n > T_+/T_-$, $0 \le \Phi_1 < \Phi_0$ must hold. If $\Phi_1 = 0$, we infer from Lemma A.1 that $t_1 = \dots = t_n = 0$, so that $J_n = 1$, a contradiction. Therefore, by Lemma A.2, there exist exactly two numbers $0 < \tau_n < T_-$ and $T_n > T_+$ such that $\Phi(\tau_n) = \Phi(T_n) = \Phi_1$. By (A.9), the unique possible maximizer is $(\tau_n, \dots, \tau_n)$, where $\tau_n = T_n/n \in (0, T_-)$ is such that
\[ \Phi(\tau_n) = \Phi(n \tau_n). \tag{A.10} \]
The corresponding value of $J_n$ is
\[ J_n = \frac{1 + U(n \tau_n)}{(1 + U(\tau_n))^n}. \tag{A.11} \]
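The size of the symmetric quotient in (A.11) can be probed numerically. The sketch below is a rough illustration, not a computation of $J_n$ itself: it evaluates $(1 + U(n\tau))/(1 + U(\tau))^n$ at the trial value $\tau = \sqrt{15\pi/(n \log n)}$, which is the scale obtained by balancing $2\tau/(15\pi)$ against $1/(n\tau \log(n\tau))$ in (A.10) via (A.6), and compares it with $(\log n)/(3\pi)$. The quadrature (substitution $z = 1 - t^4$) and all function names are choices made here.

```python
import math

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def U(r):
    """Integral form of U from (A.4), with z = 1 - t^4 to resolve the peak near z = 1."""
    def f(t):
        w = t ** 4                    # w = 1 - z
        z = 1.0 - w
        one_minus_z2 = w * (2.0 - w)  # 1 - z^2 without cancellation
        return (z * z - z ** 4 / 3.0) / (1.0 + r * r * one_minus_z2 / 4.0) * 4.0 * t ** 3
    return r * r / (4.0 * math.pi) * simpson(f, 0.0, 1.0)

def tau_trial(n):
    """Trial radius sqrt(15*pi / (n log n))."""
    return math.sqrt(15.0 * math.pi / (n * math.log(n)))

def symmetric_quotient(n, tau):
    """(1 + U(n*tau)) / (1 + U(tau))^n, with the n-th power taken via log1p."""
    return (1.0 + U(n * tau)) / math.exp(n * math.log1p(U(tau)))

for n in (10 ** 2, 10 ** 4, 10 ** 6):
    tau = tau_trial(n)
    # columns: n, symmetric quotient at the trial radius, (log n)/(3*pi)
    print(n, symmetric_quotient(n, tau), math.log(n) / (3.0 * math.pi))
```

Since any admissible configuration gives a lower bound on $J_n$, the first column of values bounds $J_n$ from below. The ratio of the two columns approaches $1$ only slowly, because the corrections are of relative size $1/\log n$.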
By (A.10), we must have $\tau_n \to 0$ as $n \to \infty$. Combining (A.10) with (A.6), it follows that $\Phi(n \tau_n) \sim 2\tau_n/(15\pi) \to 0$. By Lemma A.2 and since $n \tau_n = T_n \ge T_+$, $n \tau_n \to \infty$ holds. Using (A.6) again, we deduce that $n \tau_n \log(n \tau_n) \sim_{n \to \infty} 15\pi/(2\tau_n)$, hence finally $\tau_n \sim \sqrt{15\pi/(n \log n)}$. Inserting in (A.1) and (A.11), we finally arrive at $J_n \sim (\log n)/(3\pi)$. This ends the proof of Lemma A.3.

Appendix B. Proof of Proposition 3.1

We start by showing the following lemma which provides estimates on $U_\Lambda - U$ in $[0, 2\Lambda]$.

Lemma B.1. Let $\Lambda \ge 1$. For $\kappa_1 = 258/\pi$, one has
\[ \forall\, 0 \le r \le 2\Lambda, \quad |U_\Lambda(r) - U(r)| \le \kappa_1 \frac{r}{2 E(\Lambda)}. \tag{B.1} \]
Proof of Lemma B.1. Recall that (see (2.10) and (3.1)) U (r ) − U (r ) =
r2 4π +
1 π
z2 −
1
z4 3
dz −
r 2π
Z (r )
r2 2 0 4 (1 − z ) 4 z 2 E() z − 3 dz, 2 Z (r ) (1 − z 2 )(1 + r (1 − z 2 )) 4 E()
1+
3
z − z3 E() −
rz 2
dz
(B.2)
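for $0 \le r \le 2\Lambda$ and where
\[ Z_\Lambda(r) = \frac{E(\Lambda) - E(\Lambda - r)}{r} = \frac{2\Lambda - r}{E(\Lambda) + E(\Lambda - r)} \le \frac{\Lambda}{E(\Lambda)}. \tag{B.3} \]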
We will estimate all the terms of the right-hand side of (B.2). The first term is treated as follows, for all 0 ≤ r ≤ 2: 4 r2 1 z 2 − z3 r2 r2 ≤ 1 − ≤ dz . (B.4) 4π 1 + r 2 (1 − z 2 ) 6π E() 6π E()2 E() 6
Using (B.3) and |x| ≤ E(x), we bound the second term by 3 r Z (r )2 r Z (r ) z − z3 r
≤ dz , ≤ r z 2π 0 E() − 2 4π E() − r Z (r ) 2π E() 2
(B.5)
for 0 ≤ r ≤ 2. In order to estimate the last term of the right-hand side of (B.2), we distinguish the regions 0 ≤ r ≤ /2 and /2 ≤ r ≤ 2. We calculate 4 E() 2 E() z 2 − z3 1 − Z (r ) 2 dz = log dz ≤ . Z (r ) (1 − z 2 )(1 + r 2 (1 − z 2 )) 3 Z (r ) 1 − z 3 1 − E() 4
On the other hand, by (B.3), 1 − Z (r ) 1−
E()
=1+
6r r (2 − r )( + E()) ≤1+ , (E() + E( − r ))(( − r )E() + E( − r )) E()
as soon as 0 ≤ r ≤ /2. Hence using log(1 + x) ≤ x we infer the bound 4 z 2 − z3 4r 1 E() ∀0 ≤ r ≤ /2, . dz ≤ 2 π Z (r ) (1 − z 2 )(1 + r (1 − z 2 )) π E()
(B.6)
4
For /2 ≤ r ≤ 2, we write similarly as before 4 E() E() z 2 − z3 8 dz 8 ≤ dz ≤ 2 E()( + E()), 2 2 Z (r ) (1 − z 2 )(1 + r (1 − z 2 )) 3r Z (r ) (1 − z)2 3r 4 and deduce the estimate ∀/2 ≤ r ≤ 2,
1 π
4 E() z 2 − z3 128r dz ≤ . (B.7) 2 Z (r ) (1 − z 2 )(1 + r (1 − z 2 )) π E() 4
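Estimate (B.1) follows from (B.4), (B.5), (B.6) and (B.7), together with (B.2). This ends the proof of Lemma B.1.

We now use Lemma B.1 to finish the proof of Proposition 3.1. The pointwise convergence of $U_\Lambda$ when $\Lambda \to \infty$ is a direct consequence of (B.1). For (3.2), we first use (A.3) and (B.1) to obtain
\[ \forall\, 0 \le r \le 2\Lambda, \quad \frac{|U_\Lambda(r) - U(r)|}{(1 + U(r))^{m+1}} \le \kappa_1 \Big( \frac{15\pi}{2} \Big)^{m+1} \frac{E(r)}{2 E(\Lambda) (1 + \log E(r))^{m+1}}. \]
Optimizing $x \mapsto \frac{E(x)}{(1 + \log E(x))^{m+1}}$ on $[0, 2\Lambda]$ yields
\[ \frac{E(r)}{(1 + \log E(r))^{m+1}} \le \max\left\{ 1,\ \frac{E(2\Lambda)}{(1 + \log E(2\Lambda))^{m+1}} \right\}. \]
Since $E(2x) \le 2E(x)$ for any $x \ge 0$, we are led to
\[ \frac{|U_\Lambda(r) - U(r)|}{(1 + U(r))^{m+1}} \le \kappa_1 \Big( \frac{15\pi}{2} \Big)^{m+1} \max\left\{ \frac{1}{(1 + \log E(2\Lambda))^{m+1}},\ \frac{1}{E(2\Lambda)} \right\}. \tag{B.8} \]
On the other hand, $U$ is non-decreasing on $\mathbb{R}_+$, hence, using (A.3) we infer
\[ \forall r \ge 2\Lambda, \quad \frac{|U_\Lambda(r) - U(r)|}{(1 + U(r))^{m+1}} = \frac{U(r)}{(1 + U(r))^{m+1}} \le \Big( \frac{15\pi}{2} \Big)^{m} \frac{1}{(1 + \log E(2\Lambda))^{m}}. \]
Using (B.8), we finally obtain
\[ \left\| \frac{U_\Lambda - U}{(1 + U)^{m+1}} \right\|_{L^\infty} \le \kappa_1 \Big( \frac{15\pi}{2} \Big)^{m+1} \max\left\{ \frac{1}{(1 + \log E(2\Lambda))^{m}},\ \frac{1}{E(2\Lambda)} \right\}. \tag{B.9} \]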
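We now recall that
\[ B_\Lambda = \frac{1}{\pi} \int_0^{\frac{\Lambda}{E(\Lambda)}} \frac{z^2 - z^4/3}{1 - z^2}\, dz, \tag{B.10} \]
so that, for $\Lambda \ge 1$,
\[ B_\Lambda \le \frac{2}{3\pi} \int_0^{\frac{\Lambda}{E(\Lambda)}} \frac{dz}{1 - z} = \frac{2}{3\pi} \log\big[ E(\Lambda) (\Lambda + E(\Lambda)) \big] \le \frac{4}{3\pi} \log E(2\Lambda). \]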
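The integral (B.10) and the logarithmic bound just derived can be checked numerically. In the sketch below, the closed form in `B_closed` is not stated in the paper: it is obtained here by the partial-fraction identity $(z^2 - z^4/3)/(1 - z^2) = z^2/3 - 2/3 + (2/3)/(1 - z^2)$ together with $\operatorname{artanh}(\Lambda/E(\Lambda)) = \log(\Lambda + E(\Lambda))$, and should be read as an assumption of the illustration. The quadrature uses the substitution $z = \tanh x$, also a choice made here; all function names are hypothetical.

```python
import math

def E(x):
    return math.sqrt(1.0 + x * x)

def simpson(f, a, b, n=4000):
    """Composite Simpson rule with an even number n of sub-intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0

def B_quad(lam):
    """B_Lambda from (B.10) by quadrature.

    With z = tanh(x), dz / (1 - z^2) = dx and the upper limit
    Lambda / E(Lambda) becomes asinh(Lambda).
    """
    f = lambda x: math.tanh(x) ** 2 - math.tanh(x) ** 4 / 3.0
    return simpson(f, 0.0, math.asinh(lam)) / math.pi

def B_closed(lam):
    """Closed form derived here by partial fractions (not quoted from the paper)."""
    a = lam / E(lam)
    return (a ** 3 / 9.0 - 2.0 * a / 3.0
            + 2.0 / 3.0 * math.log(lam + E(lam))) / math.pi

for lam in (1.0, 10.0, 100.0, 1e4):
    middle = 2.0 / (3.0 * math.pi) * math.log(E(lam) * (lam + E(lam)))
    upper = 4.0 / (3.0 * math.pi) * math.log(E(2.0 * lam))
    # columns: Lambda, quadrature, closed form, intermediate bound, final bound
    print(lam, B_quad(lam), B_closed(lam), middle, upper)
```

The output shows $B_\Lambda$ growing like $(2/3\pi)\log \Lambda$, so that $1 + B_\Lambda$ and $1 + \log E(2\Lambda)$ are comparable, which is how the bound is used to pass from (B.9) to (3.2).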
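Combining with (B.9), we finally derive (3.2). We end the proof of Proposition 3.1 by noting that (3.3) follows directly from the definition of $U_\Lambda$ and (B.1).

Appendix C. Proof of Proposition 3.2

We may define $F^{(\epsilon)}_{n,\Lambda}(\mu)$ by duality as follows:
\[ \int_{\mathbb{R}^3} \zeta\, F^{(\epsilon)}_{n,\Lambda}(\mu) = \mathrm{Tr}\big( Q^{(\epsilon)}_{n,\Lambda} \zeta \big), \]
for any smooth function $\zeta$, and where
\[ Q^{(\epsilon)}_{n,\Lambda} = \frac{1}{2\pi} \int_{-\infty}^{\infty} \frac{1}{D^0 + i\eta} \prod_{j=1}^{n} \left( \Pi_\Lambda^{(\epsilon_j)} \Big( \mu_j * \frac{1}{|x|} \Big) \Pi_\Lambda^{(\epsilon_{j+1})} \frac{1}{D^0 + i\eta} \right) d\eta. \]
We will use, like in [15, p. 547], the inequality
\[ \big| \mathrm{Tr}\big( Q^{(\epsilon)}_{n,\Lambda} \zeta \big) \big| = \left| \int_{\mathbb{R}^3} \mathrm{Tr}_{\mathbb{C}^4}\, \widehat{Q^{(\epsilon)}_{n,\Lambda} \zeta}(p, p)\, dp \right| \le \int_{\mathbb{R}^3} \big| \widehat{Q^{(\epsilon)}_{n,\Lambda} \zeta}(p, p) \big|\, dp. \tag{C.1} \]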
The a bound of the last integral in (C.1) in terms of the norms main−midea is to derive (1 + U) ζ 2 and (1 + U)−m ζ 2 , which provides an estimate of the form (3.6), L L +C by duality. We split the proof into three steps. Step 1. There exists a universal constant C1 such that for all n ≥ 5, n n m (1 + U)m μ j , (1 + U)m F (μ) ≤ (C1 ) (K log n) n, n( ) C C 2 j=1
for all μ = (μ1 , . . . , μn ) ∈ C n .
(C.2)
ζ ( p, p) as follows: We estimate Q n, ∞ 1 ( 1 ) ( 2 )
ζ ( p, p)| ≤ dη · · · ( p)| ϕ ( p − p )| f ( p ) |Q f 1 1 1 1 1 n, 3n+5 4 4 R3 R3 (2π ) 2 −∞ n−1 ( j+1 ) ( j+2 ) f1 × ( p j )| ϕ j+1 ( p j − p j+1 )| f 1 ( p j+1 ) j=1
×
4
( n+1 )
4
( 1 )
f1
( pn )| ζ ( pn − p)| f 1
4
4
( )
where ϕ j = μ j ∗ | · |−1 , and for any β > 0, f β (−1)
= −1, and π
( p) dp1 · · · dpn , ( )
(C.3) ( )
= π /(η2 + E 2 )β , with π = 1, if
(·) = 1|·|> . Applying the following corollary of (A.7)
(1 + U ( p − pn )) ≤ (K log n) (1 + U ( p − p1 )) m
m
m
n−1
(1 + U ( p j − p j+1 ))m
j=1
to (C.3), we are led to estimating
ζ ( p, p)|dp ≤ 1 (K log n)m |Q n, 2π R3 ( j+1 )
× f1
∞ −∞
(−i∇)
4
⎛ Tr ⎝
n j=1
( j )
f1
(−i∇)ψ j (x) ×
4
⎞
f 1( n+1 ) (−i∇)ξ(x) f 1( 1 ) (−i∇) ⎠ dη, 4
4
j = (1 + U )m |ϕj | for 1 ≤ j ≤ n, and ξ = (1 + U )−m | ζ |. Since n + 1 ≥ 6, we where ψ deduce from Hölder’s inequality in Schatten spaces [30], and the fact that ·Sq ≤ ·Sr , as soon as 1 ≤ r ≤ q ≤ ∞, that ⎛ ∞ n ( ( ( j )
ζ ( p, p)|dp ≤ 1 (K log n)m ( f 1 (−i∇)ψ j (x) ⎝ |Q n, ( 4 2π R3 −∞ j=1 ( ( ( n+1 ) ( j+1 ) ( ) f 1 × f1 (−i∇)( (−i∇)ξ(x) f 1 1 (−i∇) dη. ( 4 4 4 S6
S6
(C.4) We now use the Kato-Seiler-Simon inequality (see [27] and [30, Thm. 4.1]), ∀ p ≥ 2,
f (−i∇)g(x)Sp ≤
1 3
(2π ) p
g L p (R3 ) f L p (R3 ) ,
(C.5)
to bound all the terms of the product in the right-hand side of (C.4). This provides ( ) ( ) |h(x)| 21 f 1( ) (−i∇) f 1 (−i∇)|h(x)| 21 f 1 (−i∇)|h(x)| f 1( ) (−i∇) ≤ 4 4 4 4 S6 S S12 12 ( ) ( ) 1 ||h|| L 6 f 1 f 1 , ≤ 1 4 4 L 12 L 12 (2π ) 2
for any and in {−1, 0, 1}, and any h ∈ L 6 (R3 ). In particular, by the critical Sobolev inequality, we obtain for any function h in H˙ 1 (R3 ), ( ) f 1 (−i∇)|h(x)| f 1( ) (−i∇) 4 4
( ) ≤ A ||∇h|| L 2 f 1
S6
4
L
( ) f 1 4 12
,
L 12
(C.6)
for some universal constant A. Given any q ≥ 2 and β > 6/q, we then check that ( ) f β
3
Lq
≤ E(η) q
−2β
du βq 3 E(u) R
1 q
3
≤ E(η) q =
−2β
4π 2βq − 3
1
du |u|2βq
|u|≥1
q
3
E(η) q
−2β
1
,
q
(C.7)
for = −1, while similarly, (−1) f β
Lq
≤
4π 2βq − 3
1 q
3
min E(η) q
−2β
3
, q
−2β
!
.
(C.8)
The definition of the functions ψ j gives (1 + U)m ∇ψ j
= 4π (1 + U)m μ j C ,
L2
(C.9)
which, combined with (C.4), (C.6), (C.7) and (C.8), leads to R3
n
ζ ( p, p)|dp ≤ An+1 (K log n)m (1 + U)−m ζ (1 + U)m μ j |Q n, C C
∞
×
1
min
E(η)
0
1 2
1
,
j=1
)n
1 2
dη E(η)
for some universal constant A. When n = 0, we have ∞ −3 dη, whereas, for n = 1,
0 E(η)
∞
min 0
1 E(η)
1 2
,
1
1 2
)
dη E(η)
n 2
≤
1
Inequality (C.2) then follows with C1 = A/2 + A
1 2
∞ 0
∞ 0
n+1−n 2
∞ 0
dη E(η)
5 2
,
E(η)−(n+1)/2 dη ≤
+
1 . 22
E(η)−5/2 dη.
Step 2. There exists a universal constant C2 such that, for n = 3 or n ≥ 5, n n m (1 + U)m F (μ) 2 3 ≤ (C2 ) (K log n) (1 + U)m μ j , n, n( ) C L (R ) 7 j=1
for all μ = (μ1 , . . . , μn ) ∈ C n .
(C.10)
The proof is similar to the proof of (C.2). Since n/6 + 1/2 ≥ 1, we can now estimate
ζ ( p, p) by Q n, (K log n)m ∞ ( 2 )
ψ1 (x) f 3 (−i∇) | Q n, ζ ( p, p)|dp ≤ 2π 8 R3 −∞ S6 n−2 ( 2 ) ( j+1 ) ( j+2 ) ( 3 ) × f 1 (−i∇)ψ2 (x) f 1 (−i∇) (−i∇)ψ j+1 (x) f 1 (−i∇) f 1 8
S6 j=2
4
( n ) ( n+1 ) × f 1 (−i∇)ψn (x) f 1 (−i∇) 4 8
S6
4
4
( n+1 ) ( 1 ) f 3 (−i∇)ξ(x) f 1 (−i∇) 8 2
S6
dη, S2
(C.11) ( ) fβ , ψ j
where the functions and ξ are defined as in Step 1. Using Hölder’s inequality and (C.5), we can bound each norm in the right-hand side of (C.11) similarly to (C.6). This provides, for instance, ( 2 ) f 1 (−i∇)|h(x)| f 1( 3 ) (−i∇) ≤ A ||∇h|| L 2 f 1( 2 ) f 1( 3 ) , 8
and
S6
4
( n+1 ) ( 1 ) f 3 (−i∇)|h(x)| f (−i∇) 1 8 2
S2
≤
L 18
8
1 3
(2π ) 2
||h|| L 2 f 3( n+1 ) 8
L9
4
L
14 3
( 1 ) f 1 2
7
.
L2
Combining with (C.7), (C.8) and (C.9), we obtain n
ζ ( p, p)|dp ≤ An+1 (K log n)m (1 + U)−m ζ (1 + U)m μ j |Q n, L2 C R3
∞
×
min 0
for a universal constant A. Since ∞ 0
dη
E(η)
3n−2 1 6 +7
1 1 , E(η) ≤ 0
∞
j=1
n
dη
7
E(η) dη 7
E(η) 6
3n−2 1−n 6 + 7
,
,
for n = 0, whereas, for n = 1, ) ∞ ∞ 1 dη dη 1 1 6 min , 1 ≤ 1 + 1, 1 3n−2 7 0 E(η) 7 7 E(η) 6 7 0 E(η) 6 6 ∞ we obtain (C.10) with C2 = 6A + A 0 E(η)−7/6 dη. Step 3. Let n = 3. There exists a universal constant C3 such that 3 3 m (1 + U)m F (μ) ≤ (C3 ) (K log 3) (1 + U)m μ j , 3, n( ) C C 24 j=1
for all μ = (μ1 , μ2 , μ3 ) ∈ C 3 .
(C.12)
The proof of (C.12) follows ideas from [15, Sect. 4.3.4]. Contrary to Steps 1 and 2, it relies on an explicit computation of $\widehat{Q^{(\epsilon)}_{3,\Lambda} \zeta}(p, p)$ by means of the residue formula for the integral with respect to the variable $\eta$. Indeed, it holds
\[ \widehat{Q^{(\epsilon)}_{3,\Lambda} \zeta}(p, p) = \sum_{\delta \in \{-1, 1\}^4} \widehat{Q^{\epsilon,\delta}_{3,\Lambda} \zeta}(p, p). \tag{C.13} \]
Here, the quantity Q ,δ 3, ζ ( p, p) vanishes if δ = ±(1, 1, 1, 1), whereas, when δ = (1, −1, −1, −1), it refers to the expression 1 ϕ1 ( p − p1 ) ( 2 ) ( ) 0(p ) π 1 ( p)P+0 ( p) ( p1 )P− π 1 E( p) + E( p1 ) (2π )6 R3 R3 R3 ϕ2 ( p1 − p2 ) ( 3 ) 0 ( p ) ϕ3 ( p2 − p3 ) π ( 4 ) ( p )P 0 ( p ) × π ( p2 )P− 2 3 − 3 ζ ( p3 − p)d p1 d p2 d p3 , E( p) + E( p2 ) E( p) + E( p3 )
Q ,δ 3, ζ ( p, p) =
where P±0 ( p) = (E( p) ± (α. p + β))/2E( p). The expression of Q ,δ 3, ζ ( p, p) is similar when δ contains exactly one δi = 1, respectively exactly one δi = −1. On the other hand, for δ = (1, 1, −1, −1), the function Q ,δ 3, ζ ( p, p) is given by Q ,δ 3, ζ ( p, p) =
1 ( ) ( ) 0(p ) π 1 ( p)P+0 ( p)|ϕ1 ( p − p1 )|π 2 ( p1 )P− 1 (2π )6 R3 R3 R3 ( ) 0 ( p )|ϕ ( p − p )|π ( 4 ) ( p )P 0 ( p ) ×|ϕ2 ( p1 − p2 )|π 3 ( p2 )P− 2 3 2 3 3 − 3 ζ ( p3 − p) 1 × (E( p) + E( p2 ))(E( p1 ) + E( p2 ))(E( p1 ) + E( p3 )) 1 d p1 d p2 d p3 . + (E( p) + E( p2 ))(E( p) + E( p3 ))(E( p1 ) + E( p3 ))
We next estimate Q ,δ 3, ζ ( p, p) as above. For instance, when δ = (1, −1, −1, −1), since E( p + q) ≤ E( p) + E(q) for any ( p, q) ∈ R3 , we can compute 0 ( p )| |P+0 ( p)ϕ1 ( p − p1 )P− 1 Q ,δ ζ ( p, p) ≤ 1 d p1 d p2 d p3 3, (2π )6 2 3 3 3 R R R E( p + p ) 3 1
|ϕ2 ( p1 − p2 )| ( 3 ) |ϕ3 ( p2 − p3 )| ( 4 ) | ζ ( p3 − p)| ( 1 ) ( ) ×π 2 ( p1 ) π ( p2 ) π ( p3 ) π ( p), 1 1 1 1 1 1 E( p1 ) 6 E( p2 ) 2 E( p2 ) 2 E( p3 ) 2 E( p3 ) 2 E( p) 6
so that, using (A.7) as above, Q ,δ ζ ( p, p) dp ≤ (K log 3)m Mϕ 1 S 3, R3
( ) π ( 3 ) π 4 × 1 (−i∇) ψ3 (x) 1 (−i∇) E 2 E2
S6
( ) π ( 2 ) π 3 ψ2 (x) 1 (−i∇) (−i∇) 1 2 E6 E2 S6 ( 1 ) π ( 4 ) π 1 (−i∇) ζ (x) 1 (−i∇) , E 2 E6 S6
(C.14) where |ϕ1 ( p − q)| 0 Mϕ1 ( p, q) = |P+ ( p)P−0 (q)|. 2 E( p + q) 3
The operator Mϕ1 was estimated in Lemma 14 of [15] by Mϕ1 S ≤ A∇ϕ1 L 2 , where 2 A is some universal constant. By the definition of ϕ1 , we obtain Mϕ ≤ 4π A(1 + U)m μ1 C . (C.15) 1 S 2
As for the other terms in the right-hand side of (C.14), we argue as before and, by (C.14) and (C.15), we obtain * 3)m Q ,δ ζ ( p, p) dp ≤ A(K log (1 + U)−m ζ C 3j=1 (1 + U)m μ j C , (C.16) n( ) 3, R3
24
where A denotes some universal constant. All the terms in the right-hand side of (C.13) are similar to the one corresponding to δ = (1, −1, −1, −1). In particular, (C.16) holds
when Q ,δ 3, ζ is replaced by Q 3, ζ . By duality, this completes the proofs of (C.12) and of Step 3. This ends the proof of Proposition 3.2. References 1. Avron, J., Seiler, R., Simon, B.: The index of a pair of projections. J. Funct. Anal. 120, 220–237 (1994) 2. Bjorken, J.D., Drell, S.D.: Relativistic quantum fields. New York: McGraw-Hill Book Co., 1965 3. Cancès, É., Lewin, M.: The dielectric permittivity of crystals in the reduced Hartree-Fock approximation. Arch. Rati. Mech. Anal. 197, 139–177 (2010) 4. Chaix, P., Iracane, D.: From quantum electrodynamics to mean field theory: I. The Bogoliubov-DiracFock formalism. J. Phys. B 22, 3791–3814 (1989) 5. Dirac, P.A.: The quantum theory of the electron. II. Proc. Royal Soc. London (A) 118, 351–361 (1928) 6. Dirac, P.A.: A theory of electrons and protons. Proc. Royal Soc. London (A) 126, 360–365 (1930) 7. Dirac, P.A.: Theory of electrons and positrons. Nobel Lecture delivered at Stockholm, 1933 8. Dirac, P.A.: Théorie du positron. Solvay Report XXV, 203–212 (1934) 9. Dyson, F.J.: The S matrix in quantum electrodynamics. Phys. Rev. 75(2), 1736–1755 (1949) 10. Dyson, F.J.: Divergence of Perturbation Theory in Quantum Electrodynamics. Phys. Rev. 85, 631– 632 (1952) 11. Engel, E.: Relativistic Density Functional Theory: Foundations and Basic Formalism. Vol. ‘Relativistic Electronic Structure Theory, Part 1. Fundamentals’, Schwerdtfeger ed., Amsterdam: Elsevier, 2002, ch. 10, pp. 524–624 12. Engel, E., Dreizler, R.M.: Field-theoretical approach to a relativistic Thomas-Fermi-Dirac-Weizsäcker model. Phys. Rev. A 35, 3607–3618 (1987) 13. Gravejat, P., Lewin, M., Séré, É.: Ground state and charge renormalization in a nonlinear model of relativistic atoms. Commun. Math. Phys. 286, 179–215 (2009) 14. Greiner, W., Müller, B., Rafelski, J.: Quantum Electrodynamics of Strong Fields. First ed., Texts and Monographs in Physics. Berlin-Heidelberg-NewYork: Springer-Verlag, 1985 15. 
Hainzl, C., Lewin, M., Séré, É.: Existence of a stable polarized vacuum in the Bogoliubov-Dirac-Fock approximation. Commun. Math. Phys. 257, 515–562 (2005) 16. Hainzl, C., Lewin, M., Séré, É.: Self-consistent solution for the polarized vacuum in a no-photon QED model. J. Phys. A 38, 4483–4499 (2005) 17. Hainzl, C., Lewin, M., Séré, É.: Existence of atoms and molecules in the mean-field approximation of no-photon quantum electrodynamics. Arch. Rati. Mech. Anal. 192, 453–499 (2009) 18. Hainzl, C., Lewin, M., Séré, É., Solovej, J.P.: A minimization method for relativistic electrons in a mean-field approximation of quantum electrodynamics. Phys. Rev. A 76, 052104 (2007) 19. Hainzl, C., Lewin, M., Solovej, J.P.: The mean-field approximation in quantum electrodynamics: the no-photon case. Comm. Pure Appl. Math. 60, 546–596 (2007) 20. Hainzl, C., Siedentop, H.: Non-perturbative mass and charge renormalization in relativistic no-photon quantum electrodynamics. Commun. Math. Phys. 243, 241–260 (2003) 21. Itzykson, C., Zuber, J.B.: Quantum field theory. New York: McGraw-Hill International Book Co., 1980 22. Landau, L., Pomeranˇcuk, I.: On point interaction in quantum electrodynamics. Dokl. Akad. Nauk SSSR (N.S.) 102, 489–492 (1955) 23. Landau, L.D.: On the quantum theory of fields. In: Niels Bohr and the development of physics, New York: McGraw-Hill Book Co., 1955, pp. 52–69
24. Lieb, E.H., Siedentop, H.: Renormalization of the regularized relativistic electron-positron field. Commun. Math. Phys. 213, 673–683 (2000) 25. Pauli, W., Rose, M.: Remarks on the polarization effects in the positron theory. Phys. Rev. II 49, 462– 465 (1936) 26. Reinhard, P.-G., Greiner, W., Arenhövel, H.: Electrons in strong external fields. Nucl. Phys. A 166, 173– 197 (1971) 27. Seiler, E., Simon, B.: Bounds in the Yukawa 2 quantum field theory: upper bound on the pressure. Hamiltonian bound and linear lower bound. Commun. Math. Phys. 45, 99–114 (1975) 28. Serber, R.: Linear modifications in the Maxwell field equations. Phys. Rev. 48(2), 49–54 (1935) 29. Shale, D., Stinespring, W.F.: Spinor representations of infinite orthogonal groups. J. Math. Mech. 14, 315– 322 (1965) 30. Simon, B.: Trace ideals and their applications. Vol. 35 of London Mathematical Society Lecture Note Series, Cambridge: Cambridge University Press, 1979 31. Solovej, J.P.: Proof of the ionization conjecture in a reduced Hartree-Fock model. Invent. Math. 104, 291– 311 (1991) 32. Thaller, B. The Dirac equation. Texts and Monographs in Physics. Berlin: Springer-Verlag, 1992 33. Uehling, E.: Polarization effects in the positron theory. Phys. Rev. 48(2), 55–63 (1935) Communicated by I.M. Sigal
Commun. Math. Phys. 306, 35–49 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1275-0
Communications in
Mathematical Physics
Uniqueness of SRB Measures for Transitive Diffeomorphisms on Surfaces
F. Rodriguez Hertz1, M. A. Rodriguez Hertz1, A. Tahzibi2, R. Ures1
1 IMERL-Facultad de Ingeniería, Universidad de la República, CC 30 Montevideo, Uruguay.
2 Departamento de Matemática, ICMC-USP São Carlos, Caixa Postal 668, 13560-970 São Carlos, SP, Brazil.
Received: 20 May 2010 / Accepted: 14 January 2011 Published online: 5 June 2011 – © Springer-Verlag 2011
Abstract: We give a description of ergodic components of SRB measures in terms of ergodic homoclinic classes associated to hyperbolic periodic points. For transitive surface diffeomorphisms, we prove that there exists at most one SRB measure.
1. Introduction
In this paper we attempt to give a more accurate description of the ergodic components of SRB measures. These measures were introduced by Sinai, Ruelle and Bowen in the 70’s (see [5,21–23]) and are the measures most compatible with the ambient volume when the system is not conservative. The works of Sinai, Ruelle and Bowen showed the existence and some desirable properties of such measures for uniformly hyperbolic systems. Subsequently, SRB measures were shown to exist for many non-hyperbolic systems, such as: diffeomorphisms preserving smooth measures ([15]), Hénon’s attractors ([3]), attractors with mostly contracting center direction ([16,4]), the mostly expanding case ([1]), partially hyperbolic attractors with one-dimensional center ([6], see also [24]), and the case where u-Gibbs measures are unique ([7,8]). Most of these results also include a proof of uniqueness, or at least finiteness, of SRB measures. In general, uniqueness results are based on knowledge of the geometry of the unstable “foliation”. In this paper, we give a description of the ergodic components of SRB measures in terms of ergodic homoclinic classes (see next subsection) associated to periodic points. Ergodic homoclinic classes were introduced by the authors in [19] (see also [18]) for the conservative setting. Although SRB measures have a different nature (in general, an SRB measure for $f$ is not SRB for $f^{-1}$), we obtain a similar description which, combined with a subtle use of Sard’s Theorem, allows us to prove that transitive surface diffeomorphisms have at most one SRB measure.
Fig. 1. x in the ergodic homoclinic class of p
1.1. Statement of results. Roughly speaking, an SRB measure is an invariant measure that has a positive Lyapunov exponent a.e. and whose decomposition along unstable manifolds is equivalent to the volume. See Definition 2.4 for a precise definition. By Ledrappier-Young [14], a measure satisfies the entropy formula $h_\mu = \int \sum_{\lambda_i > 0} \lambda_i \, d\mu$ if and only if it is SRB. If all the Lyapunov exponents are non-zero $\mu$-almost everywhere, then $\mu$ is called a hyperbolic measure. We call $\mu$ a physical measure if the basin of $\mu$, $B(\mu)$, has positive Lebesgue measure, where by definition, for every continuous observable $\phi : M \to \mathbb{R}$,
\[ \frac{1}{n} \sum_{i=0}^{n-1} \phi(f^i(x)) \to \int \phi \, d\mu \]
for every $x \in B(\mu)$. These measures describe the asymptotic average behavior of a large subset of points of the ambient space and are the basis of the understanding of dynamics in a statistical sense. In general, using absolute continuity of the unstable lamination for $C^{1+\alpha}$ diffeomorphisms, it turns out that any ergodic SRB measure is physical if all of its Lyapunov exponents are non-zero, see [17]. On the one hand, SRB measures are easier to deal with thanks to the information given by the presence of the positive exponent. On the other hand, physical measures carry little information (see [26] for a discussion on the subject). For this reason we focus on the study of SRB measures.
In this paper we give an accurate description of the ergodic components of SRB measures. We define ergodic homoclinic classes for hyperbolic periodic points, which are “ergodic” versions of homoclinic classes, and prove that ergodic components of hyperbolic measures are in fact ergodic homoclinic classes. Given a hyperbolic periodic point $p$, let us define the ergodic homoclinic class of $p$, $\Lambda(p)$, as the set of points $x \in M$ such that (see Fig. 1)
\[ W^s(O(p)) \pitchfork W^u(x) \neq \emptyset, \tag{1.1} \]
\[ W^u(O(p)) \pitchfork W^s(x) \neq \emptyset. \tag{1.2} \]
Here $O(p)$ denotes the orbit of $p$ and $W^s(x)$ is the Pesin stable manifold of $x$, that is,
\[ W^s(x) = \Big\{ y \in M : \limsup_{n \to +\infty} \frac{1}{n} \log d(f^n(x), f^n(y)) < 0 \Big\}, \]
and $W^u(x)$ is the Pesin unstable manifold of $x$. For almost every point, Pesin stable and unstable manifolds are, indeed, immersed manifolds. Note that we can write an ergodic homoclinic class as the intersection of two invariant sets: $\Lambda(p) = \Lambda^s(p) \cap \Lambda^u(p)$,
where $\Lambda^u(p)$ is the set of points $x$ satisfying relation (1.1) and $\Lambda^s(p)$ is the set of points $x$ satisfying (1.2). It is clear that $\Lambda^s(p)$ and $\Lambda^u(p)$ are respectively $s$-saturated and $u$-saturated.
Theorem 1.1. Let $f : M \to M$ be a $C^{1+\alpha}$ diffeomorphism over a compact manifold $M$ and $\mu$ a hyperbolic SRB measure. If $\mu(\Lambda^s(p)) > 0$ and $\mu(\Lambda^u(p)) > 0$, then
\[ \Lambda^u(p) \overset{\circ}{\subset} \Lambda^s(p). \]
Moreover, the restriction of $\mu$ to $\Lambda(p)$ is ergodic, non-uniformly hyperbolic and physical.
We can drop the hypothesis of hyperbolicity of the measure under the hypothesis $m(\Lambda^s(p)) > 0$, where $m$ is the Lebesgue measure. We also give an example where $\mu(\Lambda^s(p)) > 0$ and $\mu(\Lambda^u(p)) > 0$ but $\Lambda^u$ is not a.e. contained in $\Lambda^s$. Of course, in such an example $\mu$ is neither hyperbolic nor ergodic.
Theorem 1.2. Let $f : M \to M$ be a $C^{1+\alpha}$ diffeomorphism over a compact manifold $M$ and $\mu$ an SRB measure. If $m(\Lambda^s(p)) > 0$ and $\mu(\Lambda^u(p)) > 0$, then
\[ \Lambda^u(p) \overset{\circ}{\subset} \Lambda^s(p). \]
Moreover, the restriction of $\mu$ to $\Lambda(p)$ is a hyperbolic ergodic measure.
We mention that a similar result for Lebesgue measure has been proved in [19] without the hypothesis of hyperbolicity of the measure. In fact, for Lebesgue measure we did not assume hyperbolicity of the measure and concluded that $\Lambda^s \overset{\circ}{=} \Lambda^u$. But, as we have mentioned before, SRB measures have a different nature: an SRB measure for $f$ is not, in general, SRB for $f^{-1}$. Theorem 1.1 has the following corollary:
Corollary 1.3. Let $f : M \to M$ be a $C^{1+\alpha}$ diffeomorphism. If $\mu(\Lambda(p)) > 0$ for a hyperbolic point $p$, then $\mu|_{\Lambda(p)}$ is an ergodic component of $\mu$.
If $\mu$ is a hyperbolic invariant measure, by a result of A. Katok ([13]), $(f, \mu)$ is approximated by uniformly hyperbolic sets (homoclinic classes of hyperbolic periodic points) with $\mu$-measure zero. Using carefully the construction of such periodic points, we prove the following theorem, which will be used in the proof of Theorem 1.7.
Theorem 1.4. Let $f : M \to M$ be a $C^{1+\alpha}$ diffeomorphism over a compact manifold $M$ and $\mu$ a hyperbolic SRB measure. Then for any ergodic component $\nu$ of $\mu$ there exists a hyperbolic periodic point $P$ such that $\nu(\Lambda(P)) = 1$.
For further use, we state the following simple corollary of the (inclination) λ-lemma:
Proposition 1.5. If $p$, $q$ are homoclinically related then $\Lambda(p) = \Lambda(q)$.
1.2. Uniqueness of SRB measures. It is a challenging problem in the Ergodic Theory of Dynamical Systems to prove the existence and uniqueness (finiteness) of SRB or physical measures. In this paper we show that ergodic homoclinic classes are useful objects to distinguish between different hyperbolic ergodic SRB measures. More precisely:
Theorem 1.6. Let $\mu$, $\nu$ be ergodic SRB measures such that $\mu(\Lambda(p)) = \nu(\Lambda(p)) = 1$ for some hyperbolic periodic point $p$. Then $\mu = \nu$.
In the two-dimensional case, using the above theorem we prove that topological transitivity is enough to guarantee that there exists at most one SRB measure for surface $C^2$ diffeomorphisms. It is easy to see that for a surface diffeomorphism any SRB measure is hyperbolic. Indeed, take an ergodic SRB measure $\mu$ with two Lyapunov exponents $\lambda_+ > \lambda_-$. By the Pesin entropy formula (see Ledrappier-Young [14]), $h(\mu) = \lambda_+ > 0$. Since $h(\mu, f) = h(\mu, f^{-1})$, the Ruelle inequality applied to $f^{-1}$ gives $\lambda_+ \leq -\lambda_-$, so $\lambda_- < 0$ and both exponents are non-zero.
Theorem 1.7. Let $f : M \to M$ be a $C^{1+\alpha}$ diffeomorphism over a compact surface $M$. If $f$ is topologically transitive then there exists at most one SRB measure.
An example due to I. Kan (see [12]) shows that the above theorem cannot be true on higher dimensional manifolds. I. Kan constructed a transitive diffeomorphism of $\mathbb{T}^2 \times [0, 1]$ with two SRB measures with intermingled basins. Gluing two of these examples along the boundary torus and composing with a diffeomorphism that interchanges the two components, a transitive diffeomorphism of $\mathbb{T}^3$ with two SRB measures is obtained. We recall that, as a consequence of the absolute continuity of the unstable “foliation”, the Lebesgue measure is SRB if there is a positive exponent a.e. (see [15]). As a corollary of the above theorem we obtain the following result in the conservative setting. We thank F. Ledrappier for observing this point in the Workshop on Partial Hyperbolicity, in Beijing.
Theorem 1.8. Let $f : M \to M$ be a $C^{1+\alpha}$ volume preserving diffeomorphism of a compact surface $M$ with non-zero Lyapunov exponents. If $f$ is topologically transitive then it is ergodic.
Observe that by H. Furstenberg’s example ([9]), without the hypothesis of non-vanishing Lyapunov exponents, even minimality is not enough to guarantee ergodicity. Furstenberg constructs a minimal non-ergodic $C^\infty$ diffeomorphism of the two-torus.
2. Preliminaries
2.1. Non-uniform hyperbolicity. Let us review some results about Pesin theory that shall be used in this paper. A good summary of these facts may be found, for instance, in [17] and [14]. For further references, see Katok’s paper [13] and the book by Barreira and Pesin [2]. Let $f : M \to M$ be a $C^1$ diffeomorphism of a compact Riemannian manifold of dimension $n$. Given a vector $v \in T_x M$, let the Lyapunov exponent of $v$ be the exponential growth rate of $Df$ along $v$, that is
\[ \lambda(x, v) = \lim_{|n| \to \infty} \frac{1}{n} \log |Df^n(x) v| \tag{2.1} \]
in case this quantity is well defined. Let $E_\lambda(x)$ be the subspace of $T_x M$ consisting of all $v$ such that the Lyapunov exponent of $v$ is $\lambda$. Then we have the following:
Theorem 2.1 (Oseledec). For any $C^1$ diffeomorphism $f : M \to M$ there is an $f$-invariant Borel set $\mathcal{R}$ of total probability (in the sense that $\mu(\mathcal{R}) = 1$ for all invariant probability measures $\mu$), and for each $\varepsilon > 0$ a Borel function $C_\varepsilon : \mathcal{R} \to (1, \infty)$ such that for all $x \in \mathcal{R}$, $v \in T_x M$ and $n \in \mathbb{Z}$,
(1) $T_x M = \bigoplus_\lambda E_\lambda(x)$ (Oseledec’s splitting),
(2) for all $v \in E_\lambda(x)$, $C_\varepsilon(x)^{-1} \exp[(\lambda - \varepsilon) n] |v| \leq |Df^n(x) v| \leq C_\varepsilon(x) \exp[(\lambda + \varepsilon) n] |v|$,
(3) $\angle(E_\lambda(x), E_{\lambda'}(x)) \geq C_\varepsilon(x)^{-1}$ if $\lambda \neq \lambda'$,
(4) $C_\varepsilon(f(x)) \leq \exp(\varepsilon) C_\varepsilon(x)$.
The set $\mathcal{R}$ is called the set of regular points. We also have that $Df(x) E_\lambda(x) = E_\lambda(f(x))$. If an $f$-invariant measure $\mu$ is ergodic then the Lyapunov exponents and $\dim E_\lambda(x)$ are constant $\mu$-a.e. For fixed $\varepsilon > 0$ and given $l > 0$, we define the Pesin blocks:
\[ \mathcal{R}_{\varepsilon,l} = \{ x \in \mathcal{R} : C_\varepsilon(x) \leq l \}. \]
Note that Pesin blocks are not necessarily invariant. However, $f(\mathcal{R}_{\varepsilon,l}) \subset \mathcal{R}_{\varepsilon, \exp(\varepsilon) l}$. Also, for each $\varepsilon > 0$, we have
\[ \mathcal{R} = \bigcup_{l=1}^{\infty} \mathcal{R}_{\varepsilon,l}. \tag{2.2} \]
We lose no generality in assuming that the sets $\mathcal{R}_{\varepsilon,l}$ are compact. For all $x \in \mathcal{R}$ we have
\[ T_x M = \bigoplus_{\lambda < 0} E_\lambda(x) \oplus E_0(x) \oplus \bigoplus_{\lambda > 0} E_\lambda(x), \]
where $E_0(x)$ is the subspace generated by the vectors having zero Lyapunov exponents. Let $\mu$ be an invariant measure. When $E_0(x) = \{0\}$ for $\mu$-a.e. $x$ in a set $N$, we say that $f$ is non-uniformly hyperbolic on $N$ and that $\mu$ is a hyperbolic measure on $N$.
Now, let us assume that $f \in C^{1+\alpha}$ for some $\alpha > 0$. Given a regular point $x$, we define its stable Pesin manifold by
\[ W^s(x) = \Big\{ y : \limsup_{n \to +\infty} \frac{1}{n} \log d(f^n(x), f^n(y)) < 0 \Big\}. \tag{2.3} \]
The unstable Pesin manifold of $x$, $W^u(x)$, is the stable Pesin manifold of $x$ with respect to $f^{-1}$. Stable and unstable Pesin manifolds of points in $\mathcal{R}$ are immersed manifolds [15]. We stress that $C^{1+\alpha}$ regularity is crucial for this to happen. In this way we obtain a partition $x \mapsto W^s(x)$, which we call the stable partition. The unstable partition is defined analogously. Stable and unstable partitions are invariant.
On the Pesin blocks we have continuous variation. Let us call $W^s_{loc}(x)$ the connected component of $W^s(x) \cap B_r(x)$ containing $x$, where $B_r(x)$ denotes the Riemannian ball of center $x$ and radius $r = r(\varepsilon, l) > 0$, which is sufficiently small but fixed. Then:
Theorem 2.2 (Stable Pesin Manifold Theorem [15]). Let $f : M \to M$ be a $C^{1+\alpha}$ diffeomorphism preserving a smooth measure $m$. Then, for each $l > 1$ and small $\varepsilon > 0$, if $x \in \mathcal{R}_{\varepsilon,l}$:
(1) $W^s_{loc}(x)$ is a disk such that $T_x W^s_{loc}(x) = \bigoplus_{\lambda < 0} E_\lambda(x)$,
(2) $x \mapsto W^s_{loc}(x)$ is continuous over $\mathcal{R}_{\varepsilon,l}$ in the $C^1$ topology.
In particular, the dimension of the disk $W^s_{loc}(x)$ equals the number of negative Lyapunov exponents of $x$. An analogous statement holds for the unstable Pesin manifold.
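To make definition (2.1) and Theorem 2.1 concrete, the following worked example computes the Lyapunov exponents of a linear Anosov map of the two-torus. The matrix $A$ (the so-called cat map) is our own choice for illustration and does not appear in the text; this is only a sketch of how the definitions read in the simplest uniformly hyperbolic case.

```latex
% Illustrative example only: the matrix A is an assumption made here,
% not an object used in the paper.
Let $f : \mathbb{T}^2 \to \mathbb{T}^2$ be induced by
\[
  A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix},
  \qquad \det A = 1,
  \qquad \sigma_\pm = \frac{3 \pm \sqrt{5}}{2},
  \qquad \sigma_+ \sigma_- = 1 .
\]
Since $Df^n(x) = A^n$ for every $x$, a unit eigenvector $v_\pm$ with
$A v_\pm = \sigma_\pm v_\pm$ satisfies
\[
  \lambda(x, v_\pm)
  = \lim_{|n| \to \infty} \frac{1}{n} \log |A^n v_\pm|
  = \lim_{|n| \to \infty} \frac{1}{n} \log \sigma_\pm^{\,n}
  = \log \sigma_\pm
  = \pm \log \frac{3 + \sqrt{5}}{2} \approx \pm 0.9624 .
\]
Hence every point is regular, the Oseledec splitting is
$T_x \mathbb{T}^2 = \mathbb{R} v_- \oplus \mathbb{R} v_+$, and $E_0(x) = \{0\}$.
For $v \in \mathbb{R} v_\pm$ one has $|A^n v| = e^{n \log \sigma_\pm} |v|$ exactly,
so item (2) of Theorem 2.1 holds with any constant $C_\varepsilon > 1$,
and the Pesin block $\{ C_\varepsilon \leq l \}$ is the whole torus as soon
as $l \geq C_\varepsilon$.
The stable Pesin manifold (2.3) of $x$ is the projection to $\mathbb{T}^2$
of the line $x + \mathbb{R} v_-$.
```

In this example the constants are uniform in $x$; the point of Pesin theory is that for non-uniformly hyperbolic systems $C_\varepsilon(x)$ is only a measurable function, which is why the blocks are needed.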
2.2. Absolute continuity. An important notion behind the criterion we are going to prove is absolute continuity. Let us state the definitions we will be using. The point of view we follow is similar to that in [14]. Let $\xi$ be a partition of the manifold $M$. We shall call $\xi$ a measurable partition if the quotient space $M/\xi$ is separated by a countable number of measurable sets. For instance, the partition of the 2-torus by lines of irrational slope is not measurable, while the partition of $[0, 1]$ by singletons is measurable. The quotient space $M/\xi$ of a Lebesgue space $M$ by a measurable partition $\xi$ is again a Lebesgue space [20]. Associated to each measurable partition $\xi$ of a Lebesgue space $(M, \mathcal{B}, m)$ there is a canonical system of conditional measures $m^\xi_x$, which are measures on $\xi(x)$, the element of $\xi$ containing $x$, and with the property that for each $A \in \mathcal{B}$ the set $A \cap \xi(x)$ is measurable in $\xi(x)$ for almost all $\xi(x)$ in $M/\xi$, and the function $x \mapsto m^\xi_x(A \cap \xi(x))$ is measurable, with:
\[ m(A) = \int_{M/\xi} m^\xi_x(A \cap \xi(x)) \, dm_T, \tag{2.4} \]
where $m_T$ is the quotient measure on $M/\xi$. For each measurable partition this canonical system of conditional measures is unique (mod 0), i.e. any other system is the same for almost all $\xi(x) \in M/\xi$. Conversely, if there is a canonical system for a partition, then the partition is measurable. In our case, we will be interested in stable and unstable partitions; note that in general these partitions are not measurable. A measurable partition $\xi$ is subordinate to the unstable partition $W^u$ if for $m$-a.e. $x$ we have $\xi(x) \subset W^u(x)$, and $\xi(x)$ contains a neighborhood of $x$ which is open in the topology of $W^u(x)$.
Definition 2.3. $m$ has absolutely continuous conditional measures on unstable manifolds if for every measurable partition $\xi$ subordinate to $W^u$, $m^\xi_x \ll \lambda^u_x$ for $m$-a.e. $x$, where $\lambda^u_x$ is the Riemannian measure on $W^u(x)$ given by the Riemannian structure of $W^u(x)$ inherited from $M$.
We are now able to give a definition of SRB measure.
Definition 2.4. An $f$-invariant probability measure $\mu$ is called a Sinai-Ruelle-Bowen (SRB) measure if it has a positive Lyapunov exponent a.e. and absolutely continuous conditional measures on unstable manifolds.
After Ledrappier-Young ([14]) this is equivalent to having a positive Lyapunov exponent a.e. and satisfying the Pesin formula, $h_\mu(f) = \int \sum_{\lambda(x) > 0} \lambda(x) \, d\mu$.
Now, take a point $x_0 \in \mathcal{R}$, the set of regular points. Assume that $x_0$ has at least one negative Lyapunov exponent. Take two small discs $T$ and $T'$ near $x_0$ which are transverse to $W^s(x_0)$. Then we can define the holonomy map with respect to these transversals as a map $h$ defined on a subset of $T$ such that $h(x) = W^s_{loc}(x) \cap T'$. The domain of $h$ consists of the points $x \in T \cap \mathcal{R}$ whose stable manifolds have the same dimension as $W^s(x_0)$ and transversely intersect $T$ and $T'$. The map $h$ is a bijection.
Definition 2.5. We say that the stable partition is absolutely continuous if all holonomy maps are measurable and take Lebesgue zero sets of $T$ into Lebesgue zero sets of $T'$.
Absolute continuity of the unstable partition is defined analogously.
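As a sanity check on the disintegration formula (2.4) and on Definition 2.3, the following sketch works them out in the simplest possible case. The unit square, the partition into vertical segments and the triangle $A$ are hypothetical choices made here for illustration; they are not objects from the paper.

```latex
% Illustrative example only: M, \xi and A below are assumptions.
Let $M = [0,1]^2$ with Lebesgue measure $m$, and let $\xi$ be the partition
into vertical segments, $\xi(a,b) = \{a\} \times [0,1]$.
The countable family of sets $\{ (a,b) : a < q \}$, $q \in \mathbb{Q}$,
separates the elements of $\xi$, so $\xi$ is measurable.
Here $M/\xi \cong [0,1]$, the quotient measure $m_T$ is Lebesgue measure $da$,
and $m^\xi_{(a,b)}$ is one-dimensional Lebesgue measure on $\{a\} \times [0,1]$.
Formula (2.4) is then Fubini's theorem:
\[
  m(A) = \int_0^1 m^\xi_{(a,\cdot)}\bigl( A \cap (\{a\} \times [0,1]) \bigr) \, da .
\]
For the triangle $A = \{ (a,b) : b \leq a \}$ the inner measure equals $a$, and
\[
  \int_0^1 a \, da = \frac{1}{2} = m(A).
\]
In this toy case the conditional measures coincide with the Riemannian
measure of the leaves, which is the model situation behind the condition
$m^\xi_x \ll \lambda^u_x$ of Definition 2.3.
```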
Theorem 2.6 ([15]). Let $f$ be a $C^{1+\alpha}$ diffeomorphism. Then its stable and unstable partitions are absolutely continuous.
Note that the holonomy maps of the stable foliation are continuous and have continuous Jacobians when restricted to the Pesin blocks $\mathcal{R}_{\varepsilon,l}$.
3. Ergodic Components and Ergodic Homoclinic Classes
In this section we prove all theorems except Theorem 1.7. For the sake of simplicity, we shall first prove Corollary 1.3. Let us introduce some lemmas before entering into its proof. Let $\mu$ be an SRB measure. For any given function $\varphi \in L^1_\mu(M, \mathbb{R})$, let
\[ \tilde\varphi^\pm(x) = \lim_{n \to \pm\infty} \frac{1}{n} \sum_{i=0}^{n-1} \varphi(f^i(x)). \tag{3.1} \]
By the Birkhoff Ergodic Theorem, the limit (3.1) exists and $\tilde\varphi^+(x) = \tilde\varphi^-(x)$ for $\mu$-a.e. $x \in M$. Note that $\tilde\varphi^\pm$ is $f$-invariant. Moreover, we have the following:
Lemma 3.1. For all $\varphi \in C^0(M)$ there exists an invariant set $S$ with $\mu(S) = 1$ such that if $x \in S$ we have $\tilde\varphi^+(w) = \tilde\varphi^+(x)$ for all $w \in W^s(x)$ and for $m^u_x$-a.e. $w \in W^u(x)$.
Proof of the lemma. The proof is completely analogous to the one for smooth measures, since SRB measures, by definition, have absolutely continuous conditional measures along unstable manifolds.
Proof of Corollary 1.3. Let $\varphi : M \to \mathbb{R}$ be a continuous function, let $S$ be the set obtained in Lemma 3.1 and $\mathcal{R}$ the set of $\mu$-regular points. We shall see that $\tilde\varphi^+$ is constant on $\Lambda(p) \cap S \cap \mathcal{R}$. This will prove that $\mu$ is ergodic when restricted to $\Lambda(p)$. Let $x, y \in \Lambda(p) \cap S \cap \mathcal{R}_{\varepsilon,l}$ for some $\varepsilon > 0$ and $l > 1$. Without loss of generality we may assume that $x$ and $y$ are in the support of the restriction of $\mu$ to this set and that they return to it infinitely many times. Hence there exists $n > 0$ such that $f^n(y)$ belongs to this set and $d(f^n(y), W^u(p)) < \delta/2$, where $\delta > 0$ is as in the definition of transverse absolute continuity. Hence $W^s_{loc}(f^n(y)) \pitchfork W^u(p) \neq \emptyset$. We can suppose for simplicity that $n = 0$ and that $p$ is a hyperbolic fixed point. As a consequence of the Inclination Lemma, there exists $k > 0$ such that $f^k(x)$ belongs to the same set and $W^u(f^k(x)) \pitchfork W^s_{loc}(y) \neq \emptyset$. As in the case of smooth measures, due to Lemma 3.1 above, there is an $m^u_y$-positive measure set of $w \in W^u_{loc}(y)$ such that $\tilde\varphi^+(w) = \tilde\varphi^+(y)$ and $W^s_{loc}(w) \pitchfork W^u(f^k(x)) \neq \emptyset$. Since $\tilde\varphi^+$ is constant on stable leaves and there is transverse absolute continuity, we get an $m^u_{f^k(x)}$-positive measure set of $w' \in W^u_{loc}(f^k(x))$ such that $\tilde\varphi^+(w') = \tilde\varphi^+(y)$. See Fig. 2. But $f^k(x) \in S$, so due to Lemma 3.1 again $\tilde\varphi^+(y) = \tilde\varphi^+(f^k(x)) = \tilde\varphi^+(x)$, concluding the proof.
In order to prove Theorem 1.1, we shall need a refinement of Lemma 3.1:
Lemma 3.2. Given $\varphi \in L^1(\mu)$ there exists an invariant set $S_\varphi \subset M$, $\mu(S_\varphi) = 1$, such that if $x \in S_\varphi$, then $m^u_x$-a.e. $y \in W^u(x)$ satisfies $\tilde\varphi^+(y) = \tilde\varphi^+(x)$.
Proof. Given $\varphi \in L^1(\mu)$, take a sequence of continuous functions $\varphi_n$ converging to $\varphi$ in $L^1(\mu)$. Since $\tilde\varphi_n^+$ converges in $L^1(\mu)$ to $\tilde\varphi^+$, there exists a subsequence $\tilde\varphi_{n_k}^+$ converging a.e. to $\tilde\varphi^+$. The intersection of this set of almost everywhere convergence with the set $S$ obtained in Lemma 3.1 gives the desired set $S_\varphi$.
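The stable half of Lemma 3.1 rests on the classical Hopf argument, which the proof of that lemma leaves implicit. The following is a sketch of that step only, assuming nothing beyond continuity of $\varphi$ on the compact manifold $M$ and the definition (2.3) of $W^s(x)$; the auxiliary sequence $a_i$ is notation introduced here.

```latex
% Sketch of the standard Hopf argument; a_i is our own auxiliary notation.
Let $w \in W^s(x)$. By (2.3), $d(f^i(x), f^i(w)) \to 0$ as $i \to +\infty$.
Since $\varphi$ is uniformly continuous on $M$,
\[
  a_i := \bigl| \varphi(f^i(w)) - \varphi(f^i(x)) \bigr| \longrightarrow 0
  \qquad (i \to +\infty).
\]
Ces\`aro means of a sequence converging to $0$ converge to $0$, hence
\[
  \Bigl| \frac{1}{n} \sum_{i=0}^{n-1} \varphi(f^i(w))
       - \frac{1}{n} \sum_{i=0}^{n-1} \varphi(f^i(x)) \Bigr|
  \leq \frac{1}{n} \sum_{i=0}^{n-1} a_i \longrightarrow 0
  \qquad (n \to +\infty).
\]
Therefore $\tilde\varphi^+(w)$ exists if and only if $\tilde\varphi^+(x)$ does,
and then $\tilde\varphi^+(w) = \tilde\varphi^+(x)$.
```

The same computation for $f^{-1}$ shows that $\tilde\varphi^-$ is constant along unstable manifolds. In outline, passing from $\tilde\varphi^-$ to $\tilde\varphi^+$ on $W^u(x)$ is where the equality $\tilde\varphi^+ = \tilde\varphi^-$ a.e. and the absolute continuity of the conditional measures are used, which is why the unstable statement in Lemma 3.1 only holds $m^u_x$-almost everywhere.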
Fig. 2. Proof of Corollary 1.3
Let us now give the proofs of Theorems 1.1 and 1.2.
Proof of Theorem 1.1. To simplify ideas, let us suppose that $p$ is a hyperbolic fixed point. Let $S$ be the set obtained in Lemma 3.2 for the characteristic function $1_{\Lambda^s(p)}$. Take $x \in \Lambda^u(p) \cap S$; we will prove that $x \in \Lambda^s(p)$. This proves the first claim of the theorem. Let $y \in \Lambda^s(p)$, and let $\varepsilon > 0$, $l > 1$ be such that $x, y \in \mathcal{R}_{\varepsilon,l}$. We lose no generality in assuming that $y$ is in the support of $\mu$ restricted to $\Lambda^s(p) \cap \mathcal{R}_{\varepsilon,l}$. We can also assume that $x$ and $y$ return infinitely many times to $\mathcal{R}_{\varepsilon,l}$. Proceeding as in the proof of Corollary 1.3, we may also assume that $d(y, W^u(p)) < \delta/2$, where $\delta$ is much smaller than the size of local stable manifolds of points in $\mathcal{R}_{\varepsilon,l}$.
Note that $1_{\Lambda^s(p)}$ is an $f$-invariant function. This implies that if $x \notin \Lambda^s(p)$, then $m^u_x$-a.e. $y \in W^u(x)$ will satisfy $y \notin \Lambda^s(p)$, due to Lemma 3.2 above. The idea of the proof is to find an $m^u_x$-positive measure set of points $z \in W^u(x)$ such that $z \in \Lambda^s(p)$. This will prove that $x \in \Lambda^s(p)$.
As a consequence of the Inclination Lemma, and since $x$ returns infinitely many times to $\mathcal{R}_{\varepsilon,l}$, there exists $k > 0$ such that $W^u(f^k(x)) \pitchfork W^s_{loc}(y) \neq \emptyset$. Note that this intersection a priori can have positive dimension. Now, since $y$ is in the support of $\mu$ restricted to $\mathcal{R}_{\varepsilon,l} \cap \Lambda^s(p)$, we have $\mu(\mathcal{R}_{\varepsilon,l} \cap \Lambda^s(p) \cap B_\delta(y)) > 0$. Since by hypothesis $\mu$ is a hyperbolic measure with absolutely continuous conditional measures along unstable manifolds (SRB measure), there exists $z \in B_\delta(y)$ such that $\dim(W^u(y)) = \dim(W^u(z))$ and $m^u_z(W^u_{loc}(z) \cap \mathcal{R}_{\varepsilon,l} \cap \Lambda^s(p) \cap B_\delta(y)) > 0$. Take a smooth foliation $\mathcal{L}$ of a neighborhood of a point of $W^s_{loc}(y) \cap W^u(f^k(x))$ inside $W^u(f^k(x))$, of dimension equal to $\dim(W^u(y)) = n - \dim(W^s(y))$. This can be done in such a way that every $L \in \mathcal{L}$ is transversal to $W^s_{loc}(y)$. In fact
\[ \dim(W^s(y) \cap W^u(f^k(x))) = \dim(W^u(f^k(x))) + \dim(W^s(y)) - n = \dim(W^u(f^k(x))) - \dim(W^u(y)). \]
Take an open submanifold $T$ of $W^u(f^k(x)) \cap W^s_{loc}(y)$. Transverse absolute continuity of the stable foliation implies that $m^L_\omega(\Lambda^s(p) \cap L_\omega) > 0$ for all $\omega \in T$, where $L_\omega$ is the leaf of the smooth foliation $\mathcal{L}$ through $\omega$ and $m^L_\omega$ is the induced Lebesgue measure on it. Now the Fubini theorem for the smooth foliation $\mathcal{L}$ implies that
\[ m^u_{f^k(x)}(\Lambda^s(p) \cap W^u(f^k(x))) \geq \int_T m^L_\omega(L_\omega \cap \Lambda^s(p)) \, dm_T(\omega) > 0. \]
So we have proved that an $m^u_{f^k(x)}$-positive measure subset of $W^u(f^k(x))$ belongs to $\Lambda^s(p)$. As $x \in S$ and $S$ is invariant, this implies that $f^k(x) \in \Lambda^s(p)$ and so $x \in \Lambda^s(p)$. This finishes the proof of $\Lambda^u(p) \overset{\circ}{\subset} \Lambda^s(p)$.
By a similar argument as above we are able to prove the ergodicity of $\mu|_{\Lambda(p)}$. Indeed, let $\varphi \in C(M)$ and $x, y \in \Lambda(p)$ be as above, with $S$ the full measure set obtained in Lemma 3.1 for $\varphi$. Following the above arguments mutatis mutandis, we prove that $\tilde\varphi^+(x) = \tilde\varphi^+(y)$. By definition, the restriction of $f$ to $\Lambda(p)$ is non-uniformly hyperbolic with the same index as $p$.
Proof of Theorem 1.2. Let $x, y$ be as in the proof of Theorem 1.1, with the difference that now $y$ is a density point of $\mathcal{R}_{\varepsilon,l} \cap \Lambda^s(p)$ for Lebesgue measure. Now take a smooth foliation $\mathcal{F}$ of dimension equal to $\dim(W^u(y))$ inside $B_\delta(y)$. By the Fubini theorem there exists a leaf $\mathcal{F}(z)$ of this foliation, $z \in B_\delta(y)$, such that $m^{\mathcal{F}}_z(\mathcal{F}(z) \cap \mathcal{R}_{\varepsilon,l} \cap \Lambda^s(p) \cap B_\delta(y)) > 0$, where $m^{\mathcal{F}}_z$ denotes the Lebesgue measure of $\mathcal{F}(z)$. From this point on, replacing $W^u(z)$ by $\mathcal{F}(z)$ in the proof of Theorem 1.1, the arguments are the same.
Proof of Theorem 1.4. Let $\mu$ be a hyperbolic SRB measure. Firstly we prove the following well-known fact.
Lemma 3.3. Almost all ergodic components of $\mu$ are hyperbolic and SRB.
Proof. By ergodic decomposition there exists a probability measure $\hat\mu$ on $\mathcal{M}(\mathcal{M}(f))$ with support on ergodic measures such that
\[ h_\mu = \int_{\mathcal{M}(f)} h_\nu \, d\hat\mu(\nu), \qquad \int \sum_i \lambda_i^+ \, d\mu = \int_{\mathcal{M}(f)} \sum_i \lambda_i^+(\nu) \, d\hat\mu(\nu). \]
By the Ruelle inequality we have that for all $\nu$, $h(\nu) \leq \sum_i \lambda_i^+(\nu)$, and putting these together it is clear that $\hat\mu$-almost every $\nu$ will satisfy the entropy formula, and so it is an SRB measure.
From now on suppose that $\mu$ is itself an ergodic hyperbolic SRB measure. By a result of Katok, $(f, \mu)$ is approximated by uniformly hyperbolic sets with $\mu$-measure zero. More precisely, take a Pesin block $\mathcal{R}_{\varepsilon,l}$ of large measure such that the sizes of stable and unstable manifolds are uniformly bounded from below by a positive constant. Let $x \in \operatorname{supp}(\mu|_{\mathcal{R}_{\varepsilon,l}})$ and let $B$ be a small ball around $x$ such that $\mu(B \cap \mathcal{R}_{\varepsilon,l}) > 0$. By Katok’s closing lemma we can find a periodic point $p$ near enough to $x$ whose stable and unstable manifolds are respectively $C^1$-close to the stable and unstable manifolds of $x$, and consequently $x \in \Lambda(p)$. As the stable and unstable laminations vary continuously on $\mathcal{R}_{\varepsilon,l}$, we obtain that $y \in \Lambda(p)$ for any $y \in B \cap \mathcal{R}_{\varepsilon,l}$. This yields that $\mu(\Lambda(p)) > 0$. The ergodicity of $\mu$ implies that $\mu(\Lambda(p)) = 1$.
4. Example
Here we give some examples of systems with SRB measures which shed light on the difference between our results in the smooth and SRB measure cases. We construct a diffeomorphism $f : M \to M$ with an SRB measure $\mu$ such that there exists a hyperbolic periodic point $p$ with $\mu(\Lambda^s(p)) = \mu(\Lambda^u(p)) = 1/2$ but $\mu(\Lambda^s \cap \Lambda^u) = 0$. Observe that $\mu$ cannot be the Lebesgue measure by our previous work in [19]. In the smooth measure case, $\mu(\Lambda^s), \mu(\Lambda^u) > 0$ implies that $\Lambda^s \overset{\circ}{=} \Lambda^u \overset{\circ}{=} \Lambda$ and $f|_\Lambda$ is ergodic.
We will split the construction of $f$ into three steps.
4.1. First step. We begin with $f_0 : \mathbb{T}^2 \to \mathbb{T}^2$, a $C^{1+\alpha}$, $0 < \alpha < 1$, almost-Anosov diffeomorphism with an SRB measure $\mu_0$. Moreover, $f_0$ has a fixed point $R$ such that $Df_0(R)$ has two eigenvalues $\lambda_1 = 1$, $\lambda_2 < 1$. Such $f_0$ can be obtained satisfying the following properties:
• $\mu_0$-almost every $x \in \mathbb{T}^2$ has one positive (and consequently one negative) Lyapunov exponent,
• $W^s(R) \pitchfork W^u(P)$ for any periodic point $P \neq R$,
• $W^u(Q) \pitchfork W^s(P)$ for any two periodic points $P, Q \in \mathbb{T}^2$.
We emphasize that such an example cannot be $C^2$. See the work of Hatomoto [10] (see also [11]).
4.2. Second step. Now we consider a family of skew products over $f_0$ as follows. Recall that $\mu_0$ is an SRB measure of $f_0$ and $f_0$ has a fixed point $R$ with a neutral direction. We assume also that $f_0$ has two more fixed points $P, Q$ which are hyperbolic with one-dimensional unstable manifold. For $x \in \mathbb{T}^2$ and $t \geq 1$ let $g^t_x : S^1 \to S^1$ satisfy the following properties:
• for all $x \in \mathbb{T}^2$, $g^t_x : S^1 \to S^1$, $g^t_x(0) = 0$, $\frac{1}{3} \leq |Dg^t_x(0)| \leq t$,
• for some small $\varepsilon > 0$, $|Dg^t_x(0)| = t$ for all $x \in B_\varepsilon(Q)$,
• $|Dg^t_x(0)| = \frac{1}{2}$ for all $x \notin B_{2\varepsilon}(Q)$,
• $t \mapsto \int_{\mathbb{T}^2} \log |Dg^t_x(0)| \, d\mu_0$ is continuous.
Now we consider the following skew product over $f_0$: $F^t(x, \theta) = (f_0(x), g^t_x(\theta))$.
Lemma 4.1. There exists $t_0$ such that for $\mu_0 \times \delta_0$-almost every $(x, 0) \in \mathbb{T}^2 \times S^1$, the Lyapunov exponent of $F^{t_0}$ in the tangent direction to $S^1$ vanishes.
Proof. Let $0 < \alpha := \mu_0(B_\varepsilon(Q)) < \beta := \mu_0(B_{2\varepsilon}(Q)) < 1$. By Birkhoff’s Theorem, for a $\mu_0$-typical point $x$:
\[ \int_{\mathbb{T}^2} \log |Dg^t_x(0)| \, d\mu_0 = \lim_{n \to \infty} \frac{1}{n} \log \Big| \prod_{i=0}^{n-1} Dg^t_{f_0^i(x)}(0) \Big|, \]
and by our choices
\[ \alpha \log(t) + (1 - \alpha) \log(1/3) \leq \int_{\mathbb{T}^2} \log |Dg^t_x(0)| \, d\mu_0 \leq (1 - \beta) \log(1/2) + \beta \log(t). \]
The upper bound is negative at $t = 1$, while the lower bound is positive for $t$ large. Using these estimates and the continuity of $\int_{\mathbb{T}^2} \log |Dg^t_x(0)| \, d\mu_0$ with respect to $t$, we conclude that there exists $t_0$ such that $\int_{\mathbb{T}^2} \log |Dg^{t_0}_x(0)| \, d\mu_0 = 0$, which means that $\mu_0 \times \delta_0$-almost every point $(x, \theta) \in \mathbb{T}^2 \times S^1$ has zero Lyapunov exponent in the direction tangent to $S^1$.
4.3. Third step. Let $F := F^{t_0}$ with $t_0$ as above. Let $A : \mathbb{T}^2 \to \mathbb{T}^2$ be a linear Anosov diffeomorphism and define $f \in \mathrm{Diff}^{1+\alpha}(\mathbb{T}^2 \times S^1 \times \mathbb{T}^2)$, $f(x, \theta, y) = (F(x, \theta), A(y))$. The Lebesgue measure of $\mathbb{T}^2$ is an SRB measure for $A$ and we denote it by $m$. So, the probability measure $\mu := \frac{\mu_0 + \delta_R}{2} \times \delta_0 \times m$ is invariant by $f$. In what follows $\tilde P$, $\tilde Q$ and $\tilde R$
stand respectively for the points $(P, 0, 0)$, $(Q, 0, 0)$ and $(R, 0, 0)$ in $\mathbb{T}^2 \times S^1 \times \mathbb{T}^2$. We will show that $\mu$ is SRB and satisfies $\mu(\Lambda^s(\tilde P)) = \mu(\Lambda^u(\tilde P)) = \frac{1}{2}$ and $\Lambda^s(\tilde P) \cap \Lambda^u(\tilde P) = \emptyset$. The SRB property is straightforward from the definition and the fact that the Lyapunov exponent of $f$ along the tangent direction to $S^1$ vanishes and $\mu_0$ and $m$ are SRB.
By our construction $\tilde R$, $\tilde P$, $\tilde Q$ are fixed points of $f$ with unstable dimension respectively one, two and three. As $W^s(R, f_0) \pitchfork W^u(P, f_0)$ and $A$ is Anosov, we conclude that $\{R\} \times \{0\} \times \mathbb{T}^2 \subset \Lambda^s(\tilde P)$ and consequently $\mu(\Lambda^s(\tilde P)) \geq \mu(\{R\} \times \{0\} \times \mathbb{T}^2) = \frac{1}{2}$. In fact, as for $\mu_0 \times \delta_0 \times m$-almost every point the Pesin stable manifold is two-dimensional, no such point belongs to $\Lambda^s(\tilde P)$, and we have proved that $\mu(\Lambda^s(\tilde P)) = \frac{1}{2}$.
Let us now investigate $\Lambda^u(\tilde P)$. By construction $\mu_0 \times \delta_0 \times m$-almost every point has a two-dimensional unstable manifold which is transverse to $W^s(\tilde P)$. It is clear that no point of $\{R\} \times \{0\} \times \mathbb{T}^2$ belongs to $\Lambda^u(\tilde P)$. We have proved that $\mu(\Lambda^u(\tilde P)) = \frac{1}{2}$ and $\Lambda^s(\tilde P) \cap \Lambda^u(\tilde P) = \emptyset$.
5. SRB measure for surface diffeomorphisms
In this section we prove Theorem 1.7. To this end we will first prove Theorem 1.6. This will be done in the next subsection.
5.1. SRB measures supported on the same ergodic homoclinic class. In this subsection we will show that ergodic homoclinic classes support at most one SRB measure. This is the statement of Theorem 1.6.
Proof of Theorem 1.6. Let $B(\mu)$ and $B(\nu)$ be, respectively, the basins of $\mu$ and $\nu$. By ergodicity we have that $\mu(B(\mu)) = \nu(B(\nu)) = 1$. In fact, by Birkhoff’s Ergodic Theorem there exist $B_\mu \subset B(\mu)$ and $B_\nu \subset B(\nu)$ such that $\mu(B_\mu) = \nu(B_\nu) = 1$, where
\[ B_\mu = \Big\{ x : \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \varphi(f^{\pm i}(x)) = \int \varphi \, d\mu \quad \forall \varphi \in C(M) \Big\}, \]
\[ B_\nu = \Big\{ x : \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \varphi(f^{\pm i}(x)) = \int \varphi \, d\nu \quad \forall \varphi \in C(M) \Big\}. \]
Since $\mu(\Lambda(p)) = \nu(\Lambda(p)) = 1$, it follows that $\mu(B_\mu \cap \Lambda(p)) = \nu(B_\nu \cap \Lambda(p)) = 1$. From now on the technique will be similar to the one in the proof of Corollary 1.3. By absolute continuity of the unstable lamination we can take $x$ such that $m^u_x(B_\mu \cap \Lambda(p)) = 1$. Now let $y$ be a point of recurrence of $B_\nu \cap \mathcal{R}_{\varepsilon,l}$, where $\mathcal{R}_{\varepsilon,l}$ is a Pesin block for $\nu$. Again by absolute continuity of the unstable lamination we can choose $y$ in such a way that $m^u_y(\mathcal{R}_{\varepsilon,l} \cap B_\nu) > 0$. Since $y$ returns infinitely many times to the Pesin block, we can additionally assume that $y$ is close enough to $W^u(p)$. Using the λ-lemma we have that, for some $k > 0$, $W^u(f^k(x))$ is also $C^1$-close to $W^u(p)$. This implies that $W^s_{loc}(z) \pitchfork W^u(f^k(x))$ for all $z \in \mathcal{R}_{\varepsilon,l} \cap B_\nu$. Now, by the absolute continuity of the stable lamination on Pesin blocks, $m^u_{f^k(x)}(H^s(\mathcal{R}_{\varepsilon,l} \cap B_\nu)) > 0$, where $H^s$ denotes the stable holonomy onto $W^u(f^k(x))$. On the one hand, we know that the basin of $\nu$ is $s$-saturated, so $m^u_{f^k(x)}(B(\nu)) > 0$. On the other hand, $m^u_{f^k(x)}(B(\mu)) = 1$, and this implies that $B_\mu \cap B_\nu \neq \emptyset$, which implies $\mu = \nu$.
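The last implication in the proof of Theorem 1.6 can be spelled out as follows. This is a sketch using only the definition of the basin from Section 1.1; the appeal to the Riesz representation theorem is our addition and is not quoted from the text.

```latex
% Sketch only: the use of the Riesz representation theorem is our addition.
Let $z$ be a point lying in both basins (for instance any point of
$B_\mu \cap B_\nu$, since $B_\mu \subset B(\mu)$ and $B_\nu \subset B(\nu)$).
For every $\varphi \in C(M)$ the forward time averages at $z$ converge, and
\[
  \int \varphi \, d\mu
  = \lim_{n \to \infty} \frac{1}{n} \sum_{i=0}^{n-1} \varphi(f^i(z))
  = \int \varphi \, d\nu .
\]
Two Borel probability measures on the compact metric space $M$ that assign
the same integral to every continuous function coincide, by the Riesz
representation theorem. Hence $\mu = \nu$.
```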
F. Rodriguez Hertz, M. A. Rodriguez Hertz, A. Tahzibi, R. Ures
Fig. 3. Periodic points associated to SRB measures
5.2. Uniqueness for transitive diffeomorphisms of surfaces. Let $\mu$ and $\nu$ be two ergodic SRB measures. As mentioned before, $\mu$ and $\nu$ are hyperbolic measures. By Theorem 1.4 we conclude that there exist hyperbolic periodic points $P_\mu$, $P_\nu$ such that $\nu(\Lambda(P_\nu)) = \mu(\Lambda(P_\mu)) = 1$. The following proposition is the main ingredient of the proof of Theorem 1.7.

Proposition 5.1. $P_\mu$ and $P_\nu$ are homoclinically related and $\Lambda(P_\mu) = \Lambda(P_\nu)$.

The above proposition together with Theorem 1.6 immediately implies the conclusion of Theorem 1.7. To prove the existence of the homoclinic relation between $P_\mu$ and $P_\nu$ we need the manifold $M$ to be two dimensional. Indeed, first, using that $M$ is a surface, we prove that the invariant manifolds of $P_\mu$ and $P_\nu$ are topologically transverse. Then, using a finer analysis of laminations and Sard's Theorem, we prove the existence of a transversal homoclinic intersection.

Proof. Recall that $P_\mu$ and $P_\nu$ (which we suppose are fixed points) come from Katok's closing lemma and, as $\mu$, $\nu$ are hyperbolic ergodic, both $P_\mu$ and $P_\nu$ have non trivial homoclinic classes. Consequently there exist topological rectangles whose boundaries ($\partial^s$ and $\partial^u$) consist of stable and unstable segments of $P_\mu$, $P_\nu$. (See Fig. 3.) Choose two such rectangles $R_\mu$, $R_\nu$ such that $P_\mu$, $P_\nu$ respectively belong to the boundary of $R_\mu$ and $R_\nu$ and $R_\mu \cap R_\nu = \emptyset$. By topological transitivity of $f$ there exists $n \in \mathbb{N}$ such that $f^n(R_\mu) \cap R_\nu \neq \emptyset$. Observe that $f^n(R_\mu)$ and $R_\nu$ are topological rectangles and, as $P_\nu \notin R_\mu$, it follows that $R_\nu$ is not contained in $f^n(R_\mu)$. So $W^u(P_\mu) \cap \partial^s(R_\nu) \neq \emptyset$. Although $\partial^s(R_\nu)$ is a piece of $W^s(P_\nu)$, this intersection may be just topologically transversal (a tangency). However, we will prove that there should also exist transversal intersections between $W^u(P_\mu)$ and $W^s(P_\nu)$.

Lemma 5.2. Taking a rectangle $R_\nu$ small enough, there exists a new system of coordinates such that:
(1) $R_\nu = [0, 1]^2$,
(2) $W^u(P_\mu) \cap R_\nu$ is the graph of a $C^2$-function $\gamma : I \to [0, 1]$, $I \subset [0, 1]$,
(3) There exists $K \subset [0, 1]$ of positive Lebesgue measure such that, for every $x \in K$, the point $(0, x) \in [0, 1]^2$ has a "large" stable manifold crossing $[0, 1]^2$.

Proof. We will show that it is possible to take a smaller rectangle $R'_\nu \subset R_\nu$ in such a way that it satisfies Items 2 and 3 after a suitable change of coordinates. Observe that $W^u(P_\mu)$ is a $C^2$-curve and we have supposed that it is tangent to the local stable manifold of $P_\nu$. So to guarantee the second item of the lemma, it is enough to take $R'_\nu$ with small height.
Uniqueness of SRB Measures
Fig. 4. Tangency and transversality
To prove the last item, first notice that by construction the stable (unstable) manifold of $P_\nu$ has transversal intersection with the unstable (stable) manifold of the points inside a Pesin block of $\Lambda(P_\nu)$. Let us fix such a Pesin hyperbolic block of $\Lambda(P_\nu)$. By a disintegration argument, and using the fact that the conditional measures of $\mu$ along unstable manifolds are absolutely continuous with respect to Lebesgue measure, we get a local Pesin unstable manifold which intersects this block in a set of positive Lebesgue measure. Now, as the stable lamination is absolutely continuous, we slide such points along stable laminae to obtain a positive measure subset of $W^u(P_\nu)$, which we denote by $\hat K$. Now, iterating $\hat K$ and using the $\lambda$-lemma, we obtain a positive Lebesgue measure subset $K \subset [0, 1]$ such that $\{0\} \times K \subset \partial(R_\nu)$ and $W^s(0, x)$ crosses $R_\nu$ for any $x \in K$. Let $\varphi(x) := h^s(x, \gamma(x))$, where $h^s$ is the projection by the stable lamination (see Fig. 4). Observe that a priori $\varphi$ is defined just on a closed subset $\{x \in I : (x, \gamma(x)) \in W^s(K)\}$ with positive Lebesgue measure. However, we can verify the Whitney condition to extend it in a $C^1$ fashion to the whole interval $I$, as we explain below.

We recall a standard treatment of absolute continuity of stable holonomies in the Pesin block following Pugh-Shub [17]. In fact we show that, as the stable lamination has co-dimension one, its holonomy map is differentiable. More precisely, we claim that the stable holonomy can be extended to a $C^1$ function $h^s : [0, 1] \to [0, 1]$ and consequently $\varphi$ can be extended to a $C^1$-function on $I$. Let $\mathcal{F}$ be a $C^1$-foliation which is close to the stable lamination in the $C^1$-sense. Graph transformation arguments show that $f^{-n}(\mathcal{F})$ converges to the stable lamination. Let $(h_n, Jh_n)$ represent the holonomy $h_n$ of $f^{-n}(\mathcal{F})$ with its derivative $Jh_n$. As the domain of $h_n$ is one dimensional, $Jh_n$ represents both the derivative and the Jacobian of the holonomy $h_n$.
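By definition the following diagram commutes:

$$\begin{array}{ccc}
f^{-n}(D) & \xrightarrow{\;h_0\;} & f^{-n}(D') \\
\Big\downarrow{\scriptstyle f^{-n}} & & \Big\downarrow{\scriptstyle f^{-n}} \\
D & \xrightarrow{\;h_n\;} & D'
\end{array}$$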
All holonomies $h_n$ are differentiable and $Jh_n(x) = Jf^{-n}(f^n(h_n(x))) \circ Jh_0 \circ Jf^n(x)$. It is standard that $(h_n, Jh_n)$ converge uniformly to $(h, Jh)$, where $h$ is the stable lamination holonomy and $Jh$ is some function. As the $h_n$ are differentiable with derivative $Jh_n$, by uniform convergence of $Jh_n$ to $Jh$ we conclude that $Jh$ satisfies the hypothesis of Whitney's $C^1$-extension theorem (see [25]) and consequently we can extend $\varphi$ to a $C^1$-function on the whole interval $I$.

Now it is easy to see that if $W^s(x, \gamma(x))$ is tangent to the graph of $\gamma$ then $D\varphi(x) = 0$. By Sard's Theorem the Lebesgue measure of the set of critical values of $\varphi$ is zero. From this we conclude that the graph of $\gamma$ intersects $W^s(0, x)$ transversally for some $x \in K$. By invariance there exists $z \in \Lambda(P_\nu)$ such that $W^s(z) \pitchfork W^u(P_\mu) \neq \emptyset$. Now, using Katok's closing lemma again, we find a hyperbolic periodic point $\hat P_\nu$ with $W^s(\hat P_\nu)$ close enough to $W^s(z)$ in the $C^1$-topology so that $W^s(\hat P_\nu) \pitchfork W^u(P_\mu) \neq \emptyset$. It is clear that $P_\nu$ and $\hat P_\nu$ are homoclinically related. So, using the $\lambda$-lemma, it follows that $W^u(P_\mu)$ also has a transversal intersection with $W^s(P_\nu)$. A similar argument shows that $W^u(P_\nu)$ has a transversal intersection with $W^s(P_\mu)$. So we proved that $P_\mu$ and $P_\nu$ are homoclinically related. Now using Proposition 5.1 we obtain that $\Lambda(P_\mu) = \Lambda(P_\nu)$.

References
1. Alves, J.F., Bonatti, C., Viana, M.: SRB measures for partially hyperbolic systems whose central direction is mostly expanding. Invent. Math. 140, 351–398 (2000)
2. Barreira, L., Pesin, Y.: Lyapunov Exponents and Smooth Ergodic Theory. American Mathematical Society, University Lecture Series, Vol. 23. Providence, RI: Amer. Math. Soc., 2003
3. Benedicks, M., Young, L.-S.: Sinai-Bowen-Ruelle measure for certain Hénon maps. Invent. Math. 112, 541–576 (1993)
4. Bonatti, C., Viana, M.: SRB measures for partially hyperbolic systems whose central direction is mostly contracting. Israel J. Math. 115, 157–194 (2000)
5. Bowen, R.: Equilibrium states and the ergodic theory of Anosov diffeomorphisms. Lecture Notes in Math. 470, Berlin-Heidelberg-New York: Springer, 1975
6. Cowieson, W., Young, L.-S.: SRB measures as zero-noise limits. Erg. Th. Dynam. Syst. 25, 1115–1138 (2005)
7. Dolgopyat, D.: Limit Theorems for partially hyperbolic systems. Transactions of the American Math. Soc. 356(4), 1637–1689 (2004)
8. Dolgopyat, D.: On differentiability of SRB states for partially hyperbolic systems. Invent. Math. 155, 389–449 (2004)
9. Furstenberg, H.: Strict ergodicity and transformations of the torus. Amer. J. Math. 83, 573–601 (1961)
10. Hatomoto, J.: Diffeomorphisms admitting SRB measures and their regularity. Kodai Math. J. 29, 211–226 (2006)
11. Hu, H.Y., Young, L.-S.: Nonexistence of SBR measures for some diffeomorphisms that are almost Anosov. Erg. Th. Dynam. Syst. 15(1), 67–76 (1995)
12. Kan, I.: Open sets of diffeomorphisms having two attractors, each with an everywhere dense basin. Bull. Amer. Math. Soc. 31, 68–74 (1994)
13. Katok, A.: Lyapunov exponents, entropy and periodic orbits for diffeomorphisms. IHES Publ. Math. 51, 137–173 (1980)
14. Ledrappier, F., Young, L.-S.: The metric entropy of diffeomorphisms Part I: Characterization of measures satisfying Pesin's entropy formula. Ann. Math. 122, 509–539 (1985)
15. Pesin, Ya.: Characteristic Lyapunov exponents and smooth ergodic theory. Usp. mat. Nauk 32, 55–112 (1977); English transl., Russ. Math. Surv. 32, 55–114 (1977)
16. Pesin, Ya., Sinai, Ya.G.: Gibbs measures for partially hyperbolic attractors. Erg. Th. Dynam. Syst. 2, 417–438 (1983)
17. Pugh, C., Shub, M.: Ergodic attractors. Trans. Am. Math. Soc. 312, 1–54 (1989)
18. Rodriguez Hertz, F., Rodriguez Hertz, M., Tahzibi, A., Ures, R.: A criterion for ergodicity of nonuniformly hyperbolic diffeomorphisms. Electron. Res. Announc. Math. Sci. 14, 74–81 (2007)
19. Rodriguez Hertz, F., Rodriguez Hertz, M., Tahzibi, A., Ures, R.: New criteria for ergodicity and nonuniform hyperbolicity, preprint, available at http://arxiv.org/abs/0907.4539v1 [math.DS], 2009
20. Rohlin, V.A.: On the fundamental ideas of measure theory. AMS Translations 71 (1952)
21. Ruelle, D.: A measure associated with Axiom A attractors. Amer. J. Math. 98, 619–654 (1976)
22. Ruelle, D.: Thermodynamic Formalism. Reading, MA: Addison Wesley, 1978
23. Sinai, Ya.G.: Gibbs measure in ergodic theory. Russ. Math. Surv. 27, 21–69 (1972)
24. Tsujii, M.: Physical measures for partially hyperbolic surface endomorphisms. Acta Math. 194, 37–132 (2005)
25. Whitney, H.: Analytic extensions of functions defined in closed sets. Transactions of American Mathematical Society 36(1), 63–89 (1934)
26. Young, L.-S.: What are SRB measures, and which dynamical systems have them? J. Stat. Phys. 108, 733–754 (2002)

Communicated by G. Gallavotti
Commun. Math. Phys. 306, 51–82 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1273-2
Communications in
Mathematical Physics
A Generalized Representation Formula for Systems of Tensor Wave Equations Arick Shao Department of Mathematics, Princeton University, Princeton, NJ 08544, USA. E-mail:
[email protected] Received: 26 May 2010 / Accepted: 6 January 2011 Published online: 28 May 2011 – © Springer-Verlag 2011
Abstract: In this paper, we generalize the Kirchhoff-Sobolev parametrix of Klainerman and Rodnianski (Hyperbolic Equ. 4(3):401–433, 2007) to systems of tensor wave equations with additional first-order terms. We also present a different derivation, which better highlights that such representation formulas are supported entirely on past null cones. This generalization of (Hyperbolic Equ. 4(3):401–433, 2007) is a key component for extending Klainerman and Rodnianski's breakdown criterion result for Einstein-vacuum spacetimes in (J. Amer. Math. Soc. 23(2):345–382, 2009) to Einstein-Maxwell and Einstein-Yang-Mills spacetimes.

1. Introduction

Let $(M, g)$ denote a $(1+3)$-dimensional time-oriented Lorentzian manifold, with Levi-Civita connection $D$ and Riemann curvature $R$. Also, fix an integer $n > 0$, along with $n$ integers $r^{(1)}, \ldots, r^{(n)} \geq 0$. For each $m, c \in \{1, \ldots, n\}$, we let $\Phi^{(m)}$ and $\Psi^{(m)}$ denote tensor fields on $M$ of rank $r^{(m)}$, and we let $P^{(mc)}$ denote a tensor field on $M$ of rank $1 + r^{(m)} + r^{(c)}$.¹ Assume that these objects satisfy the system

$$L^{(m)} \Phi^{(m)}_I = \Box_g \Phi^{(m)}_I + \sum_{c=1}^{n} P^{(mc)}{}_{\mu I}{}^{J} D^\mu \Phi^{(c)}_J = \Psi^{(m)}_I, \qquad 1 \leq m \leq n, \tag{1}$$

of tensor wave equations on $M$, where $\Box_g = g^{\alpha\beta} D_{\alpha\beta}$ is the covariant wave operator, and where $I$ and $J$ are collections of $r^{(m)}$ and $r^{(c)}$ indices, respectively.² Note that if $n = 1$ and $P^{(11)}$ vanishes, then (1) reduces to the tensorial wave equation

$$\Box_g \Phi = \Psi. \tag{2}$$
1 We surround the “indices” m and c with “(·)” in order to distinguish them from tensorial indices and their associated Einstein summation conventions. 2 In accordance with Einstein summation conventions, the repeated indices J in (1) denote summations over all possible values for J .
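To make the index structure of (1) concrete, here is a small illustrative instance. It assumes $n = 2$ with $r^{(1)} = r^{(2)} = 0$ (two scalar unknowns), a special case chosen only for illustration. $\Phi^{(m)}$ and $\Psi^{(m)}$ denote the unknowns and right-hand sides of (1).

```latex
% Illustration only: system (1) for n = 2 scalar fields (r^{(1)} = r^{(2)} = 0).
% Each P^{(mc)} then has rank 1 + 0 + 0 = 1, i.e. it is a one-form P^{(mc)}_\mu,
% and the index collections I, J are empty.
\begin{align*}
  \Box_g \Phi^{(1)} + P^{(11)}_{\mu} D^{\mu}\Phi^{(1)} + P^{(12)}_{\mu} D^{\mu}\Phi^{(2)} &= \Psi^{(1)},\\
  \Box_g \Phi^{(2)} + P^{(21)}_{\mu} D^{\mu}\Phi^{(1)} + P^{(22)}_{\mu} D^{\mu}\Phi^{(2)} &= \Psi^{(2)}.
\end{align*}
% The off-diagonal coefficients P^{(12)}, P^{(21)} couple the two equations;
% if every P^{(mc)} vanishes, each line reduces to the scalar case of (2).
```

The coupling through the off-diagonal terms is the reason all $n$ equations have to be treated at once rather than one at a time.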
The aim of this paper is to generalize the representation formula of [12], which handled the setting (2), to also treat systems of the form (1). In other words, we determine a formula for the $\Phi^{(m)}$'s at a point $p \in M$ in terms of the $\Psi^{(m)}$'s and $\Phi^{(m)}$'s along a portion of the regular past null cone about $p$. In addition, we present a derivation different in nature from that of [12], which allows us to weaken the assumptions required for this formula to be valid. Lastly, as in [12], we will extend this representation formula to arbitrary vector bundles.

1.1. Prior results. The model problem for (1) is the scalar wave equation

$$\Box \phi = \psi, \qquad \phi, \psi \in C^\infty(\mathbb{R}^{1+3}) \tag{3}$$
on the Minkowski spacetime $\mathbb{R}^{1+3}$. Suppose $\phi$ solves (3), with initial data

$$\phi|_{t=0} = \alpha_0 \in C^\infty(\mathbb{R}^3), \qquad \partial_t \phi|_{t=0} = \alpha_1 \in C^\infty(\mathbb{R}^3).$$

From standard theory, cf. [6, Sect. 2.4], at a point $(t, x) \in (0, \infty) \times \mathbb{R}^3$, we can express $\phi(t, x)$ explicitly in terms of the initial data $\alpha_0$ and $\alpha_1$. Recall that we can decompose $\phi = \phi_1 + \phi_2$, where $\phi_1$ satisfies (3) but with zero initial data, while $\phi_2$ satisfies the homogeneous equation $\Box \phi_2 \equiv 0$ with initial data $\alpha_0$ and $\alpha_1$. The term $\phi_2$ can be explicitly expressed using Kirchhoff's formula,

$$\phi_2(t, x) = \frac{1}{4\pi t^2} \int_{\partial B(x,t)} \left[ t \alpha_1(y) + \alpha_0(y) + (y - x) \cdot \nabla \alpha_0(y) \right] d\sigma(y),$$

where $B(x, t)$ is the ball in $\mathbb{R}^3$ of radius $t$ centered at $x$, and where $d\sigma$ is the standard surface measure for $\partial B(x, t)$. In addition, from Duhamel's principle,

$$\phi_1(t, x) = \frac{1}{4\pi} \int_0^t \int_{\partial B(x,r)} \frac{\psi(y, t - r)}{r} \, d\sigma(y) \, dr = \frac{1}{4\pi} \int_{B(x,t)} \frac{\psi(y, t - |y - x|)}{|y - x|} \, dy.$$
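As a sanity check on Kirchhoff's formula above, the following worked computation evaluates it for two explicit choices of initial data. These choices are assumptions made purely for illustration, and the parametrization $y = x + t\omega$, $\omega \in S^2$, is introduced here for convenience.

```latex
% Worked check of Kirchhoff's formula (illustrative data, not from the paper).
% Parametrize \partial B(x,t) by y = x + t\omega, \omega \in S^2, so that
% d\sigma(y) = t^2\,d\omega and \int_{S^2} d\omega = 4\pi, \int_{S^2}\omega\,d\omega = 0.
%
% Case 1: \alpha_0 \equiv 0, \alpha_1 \equiv 1.
\[
  \phi_2(t,x) = \frac{1}{4\pi t^2}\int_{\partial B(x,t)} t \, d\sigma(y)
              = \frac{1}{4\pi t^2}\cdot t \cdot 4\pi t^2 = t ,
\]
% which has \phi_2|_{t=0} = 0, \partial_t\phi_2|_{t=0} = 1 and vanishing second derivatives.
%
% Case 2: \alpha_0(y) = |y|^2, \alpha_1 \equiv 0. With y = x + t\omega:
%   \alpha_0(y) = |x|^2 + 2t\,x\cdot\omega + t^2,
%   (y-x)\cdot\nabla\alpha_0(y) = 2t\,\omega\cdot(x + t\omega) = 2t\,x\cdot\omega + 2t^2 .
\[
  \phi_2(t,x) = \frac{1}{4\pi}\int_{S^2}\big(|x|^2 + 4t\,x\cdot\omega + 3t^2\big)\,d\omega
              = |x|^2 + 3t^2 ,
\]
% and indeed \partial_t^2\phi_2 = 6 = \Delta\phi_2, with \phi_2|_{t=0} = |x|^2, \partial_t\phi_2|_{t=0} = 0.
```

Both checks involve only the homogeneous part $\phi_2$, so they do not depend on the sign convention chosen for the wave operator.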
Note that this can be interpreted as an integral over the null cone segment

$$\mathcal{N}_0^-(t, x) = \left\{ (s, y) \in \mathbb{R} \times \mathbb{R}^3 \mid 0 \leq s \leq t, \ |y - x| = t - s \right\}.$$

There exist many geometric analogues for the above model result, which provide possibilities for studying wave equations in other settings. An important early example is the result of S. Sobolev in [17], in which a first-order parametrix was applied in order to prove well-posedness for second-order linear wave equations with variable coefficients. Y. Choquet-Bruhat made use of a similar representation formula in her celebrated local existence result for the Einstein-vacuum equations; see [1]. More recently, Chruściel and Shatah, in [3], used the Hadamard-type parametrix in [7] in order to extend the classical global existence result of Eardley and Moncrief for the Yang-Mills equations, cf. [4,5], to globally hyperbolic curved spacetimes.

In contrast to the first-order variants of [1,17], which we collectively refer to as "Kirchhoff-Sobolev-type" parametrices, the Hadamard-type parametrix of [7] achieved greater precision by making use of infinitely many derivatives of the metric $g$ and requiring the convexity of the domain under consideration. For instance, since the geometric wave equation no longer obeys the strong Huygens principle, the Hadamard parametrix depends on all points in the causal past $J^-(z)$ of the point $z$ under consideration.
Consequently, geodesic convexity is required to make sense of the formula. This differs strongly from Minkowski spacetime, in which the fundamental solution in $(1+3)$ dimensions is supported entirely on the past null cone. These smoothness and convexity restrictions, however, can often be undesirable in the context of nonlinear problems in partial differential equations.

We now focus our attention on [12, Thm. 3.11], which presented a Kirchhoff-Sobolev parametrix directly handling covariant tensorial wave equations on arbitrary $(1+3)$-dimensional Lorentzian manifolds. For convenience, we refer to this representation formula as KR. In particular, KR, applied at a point $p \in M$, enjoys the following features:
• KR directly handles the tensorial wave equation (2) in a completely covariant fashion, in particular without reducing to a scalar wave equation.
• The parametrix is supported entirely on the past null cone of $p$. Indeed, the formula is expressed as integrals of $\Phi$ and $\Psi$ along this cone.
• Since the past null cone can degenerate away from $p$, KR is valid only locally near $p$, on a "regular" portion $\mathcal{N}^-(p)$ of the cone. In other words, the validity of KR is constrained by the regularity of the null exponential map, a weaker condition than the geodesic convexity assumption of [7].
• The formula KR contains "error terms", expressed as integrals along $\mathcal{N}^-(p)$ of $\Phi$ along with various geometric quantities on $\mathcal{N}^-(p)$. These are a result of the nontrivial geometry of the spacetime.
• The formula KR depends only on quantities defined on $\mathcal{N}^-(p)$, i.e., it is independent of extensions of any quantities to neighborhoods of $\mathcal{N}^-(p)$.
• The formula KR can be systematically generalized to covariant wave equations on arbitrary vector bundles over $M$ with a bundle metric and a compatible connection; see [12, Thm. 4.1].

For instance, [12, Thm. 4.1] can be used to reprove global existence for the Yang-Mills equations in $\mathbb{R}^{3+1}$ in a similar manner as in [5]; see [12, Sect. 5].
In addition, since KR uses fully tensorial and covariant methods, the above can be achieved without reference to Cronström gauges. This "gauge-invariant" method can likely be extended to treat curved spacetimes as well.

Moreover, KR was applied in the proof of the breakdown criterion for the Einstein-vacuum equations in [14].³ The main result of [14] is the following:

Theorem 1. Suppose $(M, g)$ is an existing Einstein-vacuum spacetime, given as a constant mean curvature foliation

$$M = \bigcup_{t_0 < \tau < t_1} \Sigma_\tau, \qquad t_0 < t_1 < 0,$$

where each $\Sigma_\tau$ is a compact spacelike hypersurface of $M$ satisfying the constant mean curvature condition $\operatorname{tr} k \equiv \tau$, and where $k$ denotes the second fundamental form of $\Sigma_\tau$ in $M$. In addition, assume the following criterion:

$$\| k \|_{L^\infty(M)} + \| \nabla (\log n) \|_{L^\infty(M)} < \infty.$$

Then, $(M, g)$ can be extended as a CMC foliation to some time $t_1 + \epsilon$.

3 The entirety of the proof of the breakdown/continuation argument stated in [14], however, spans multiple papers, including [9–14,18,19].
In particular, KR was utilized to derive $L^\infty$-bounds for the curvature, which was shown to satisfy a covariant tensorial wave equation; see [14, Sect. 5]. These $L^\infty$-bounds were essential for obtaining uniform energy estimates along the timeslices of the spacetime. Using local existence theory and various elliptic estimates, one could then derive Theorem 1.

1.2. The generalized formula. The parametrix KR, however, is insufficient for handling the analogous breakdown problem for the Einstein-Maxwell (and similarly, the Einstein-Yang-Mills) setting. The main result, discussed in [16], is the following:

Theorem 2. Suppose $(M, g, F)$ is an existing Einstein-Maxwell spacetime, where $F$ is a 2-form in $M$ representing the Maxwell field. Assume $M$ is given as a CMC foliation as in Theorem 1, and assume the following criterion:

$$\| k \|_{L^\infty(M)} + \| \nabla (\log n) \|_{L^\infty(M)} + \| F \|_{L^\infty(M)} < \infty.$$

Then, $(M, g, F)$ can be extended as a CMC foliation.

The failure of KR in the proof of Theorem 2 is due to the presence of nontrivial first-order terms in the wave equations for the curvature and the Maxwell tensors. More specifically, if we apply KR, then we obtain null cone integrals containing "bad" components of the derivative of the curvature, which we cannot control a priori using standard local energy estimates. We also obtain equally untreatable "bad" components for the Maxwell tensor.

In order to obtain Theorem 2, we must deal with the aforementioned first-order terms. To achieve this, we generalize KR to handle a system of tensor wave equations of the form (1). In particular, we must handle the first-order portion of (1) differently. As we shall see later, the main "trick" will be to eliminate those terms containing derivatives of the $\Phi^{(m)}$'s which are transverse to the null cone. This is intuitively unsurprising, since such derivatives, in both flat and curved spacetimes, are generally the most difficult to control using local energy estimates.
Before delving fully into technical details, we first compare the approach of this generalized representation formula to that of its predecessor KR: • Similar to developments in [7], we handle the contributions of the first-order terms by suitably altering the transport equation in [12]. We add corresponding terms to these transport equations, which include the P (mc) ’s. • Due to coupling, all the equations in (1) must be handled concurrently. Consequently, the transport equation of [12] becomes a coupled system of n transport equations. • Like for KR, the parametrix of this paper will be supported on regular past null cones and will depend entirely on quantities defined on these cones. We opt for a different approach to the derivation of our formula than for that of KR. The main differences in principle between the two proofs are as follows: • A major component in deriving KR is the optical function, whose level sets are null cones. Here, we avoid any reference to such a function. As a result, we can eliminate some extraneous geometric assumptions needed in [12]. • While [12] makes heavy use of distributions on manifolds in an informal fashion, we take a more technically rigorous path and remain at the level of the calculus operations represented by such distributions. In particular, rather than dealing with derivatives of the δ-distribution like in [12], we deal with corresponding integrations by parts.
• In our derivation, we integrate by parts only the derivatives in directions tangent to the null cone, while in [12], integration by parts is applied liberally to derivatives in all directions. The latter approach results in terms depending on quantities off the null cone, which must then be cancelled through additional meticulous integrations by parts. By exercising further discretion, we can avoid such troubles altogether. • In this paper, we will make explicit the dependence of the parametrix on the initial data.4 This is in contrast to [12], which focused only on the case of vanishing initial data. As mentioned before, we obtain subtle improvements over the assumptions required for KR to be valid. Assumptions A1 and A2 of [12, Sect. 2.1], assumed when deriving KR at a point p ∈ M, postulated that certain local hyperbolicity and null regularity conditions hold for all points in some neighborhood of p. In contrast, in our derivation, since we avoid the optical function and work exclusively on the null cone, we require only the null regularity condition for p. We also previously noted that both the representation formula of this paper and KR depend only on quantities defined on the null cone. Since the derivation of KR introduces a multitude of terms which depend on quantities off the cone, that the final result does not depend on quantities off the cone appears as a “miracle” resulting from numerous cancellations. In contrast, in our derivation, we automatically have at each step of the proof that every term is dependent only on quantities defined on the null cone. Therefore, our derivation highlights this property of the parametrix as a natural rather than surprising consequence. 1.3. The main result. Now, we state an abridged schematic version of the main theorem of this paper. The precise statement will be deferred until Theorem 7, after we develop the background and notations needed to fully describe the parametrix. Theorem 3. 
Consider the system (1) on $(M, g)$, let $\mathcal{N}$ denote a "regular" segment of the past null cone about $p \in M$, and let $J^{(1)}, \ldots, J^{(n)}$ be tensors at $p$ of the same ranks as $\Phi^{(1)}, \ldots, \Phi^{(n)}$, respectively. Then, we have the schematic formula

$$\sum_{m=1}^{n} J^{(m)} \cdot \Phi^{(m)} \Big|_p = - \sum_{m=1}^{n} \int_{\mathcal{N}} A^{(m)} \cdot \Psi^{(m)} + (\text{error terms}) + (\text{initial data}),$$

where:
• The $A^{(m)}$'s are tensor fields on $\mathcal{N}$ which satisfy a system of transport equations depending on the $J^{(m)}$'s and on the geometry of $\mathcal{N}$.
• The error terms can be expressed as integrals of quantities on $\mathcal{N}$.
• The initial data contribution can be expressed as integrals over the lower boundary of $\mathcal{N}$ (i.e., a spherical cross-section of the past null cone).

In addition, in the final section, we will discuss extensions of the parametrix to covariant wave equations on sections of vector bundles, similar to [12, Thm. 4.1], but once again with nontrivial first-order terms. We also discuss the main theorem in the context of this extended framework, and we discuss how this extension can be applied to treat the Einstein-Yang-Mills breakdown problem.

4 By "initial data", we mean the values of the $\Phi^{(m)}$'s and $D\Phi^{(m)}$'s on a spherical cross-section of the past null cone. These correspond to $\alpha_0$ and $\alpha_1$ in the scalar Minkowski model problem.
2. Geometry of Regular Null Cones Fix p ∈ M, and consider the null exponential map exp p about p, defined as the restriction of the exponential map about p to the past null cone N of the tangent space T p M.5 Define the past null cone N − ( p) of p to be the image of exp p . Note that N − ( p) is ruled by the past inextendible null geodesics beginning at p. We call a neighborhood U in N of the origin star-shaped iff for each q ∈ U , the open segment between the origin and q is also contained in U . Fix such a star-shaped U ⊆ N, and assume exp p is also a global diffeomorphism on U . From now on, we shall denote the image exp p (U ) by N − ( p), which we refer to as the regular past null cone.6 In particular, N − ( p) is a smooth null hypersurface of M.
Remark. Recall that, since one can find convex subsets about any point, one can always construct such a set $\mathcal{N}^-(p)$.

Remark. For nonpathological spacetimes, we can systematically construct such $\mathcal{N}^-(p)$ via global considerations. For instance, in the globally hyperbolic setting, we can determine $\mathcal{N}^-(p)$ from the past null boundary $J^-(p) - I^-(p)$, where $I^-(p)$ and $J^-(p)$ are the chronological and causal pasts of $p$, respectively; see [8,15]. In practice, this is the situation adopted in most applications.

We remark here that a loss of such regularity occurs at what are called terminal points of $\mathcal{N}^-(p)$. At such a point $z$, one of the following scenarios holds:
• Two distinct past null geodesics from $p$ intersect at $z$. In other words, the map $\exp_p$ fails to be one-to-one at $z$.
• The pair $p$ and $z$ are null conjugate points. In other words, the map $\exp_p$ fails to be nonsingular at $z$.

From now on, we shall deal only with the regular portion $\mathcal{N}^-(p)$, on which we have a smooth structure. Furthermore, since tangent null vectors in $\mathcal{N}^-(p)$ have vanishing Lorentzian "length" and are orthogonal to $\mathcal{N}^-(p)$, they cannot be normalized without introducing vectors transversal to $\mathcal{N}^-(p)$. Consequently, in our treatment, we will require an additional choice of a future timelike unit vector $t \in T_p M$. Here, we fix such a $t$ for the remainder of this paper.

2.1. Null generators. We define the null generators of $\mathcal{N}^-(p)$ to be the inextendible past null geodesics $\gamma$ on $M$ which satisfy $\gamma(0) = p$ and $g(\gamma'(0), t) = 1$. We can smoothly parametrize these generators by $S^2$ using the following process. If we choose an orthonormal basis $e_0, \ldots, e_3$ of $T_p M$, with $e_0 = t$, then we can identify each $\omega \in S^2$ with the null generator $\gamma_\omega$ satisfying $\gamma_\omega'(0) = -e_0 + \omega^k e_k$. For convenience, we assume such a parametrization of the null generators of $\mathcal{N}^-(p)$, and we denote by $\gamma_\omega$, $\omega \in S^2$, the null generator corresponding to $\omega$.

Remark.
The objects on N − ( p) that we will discuss can of course be defined independently of any parametrization of the null generators. However, for ease of notation, we will work explicitly with S2 . 5 Here, we assume N does not include the origin of T M. p 6 In other words, we can think of N − ( p) as a “past null convex” set.
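The normalization of the null generators can be verified directly. The short computation below is a sketch assuming signature $(-,+,+,+)$, which is the convention consistent with $g(\gamma'(0), t) = 1$ for a past null $\gamma'(0)$ and a future timelike $t$. It checks that the initial velocity $-e_0 + \omega^k e_k$ is null and correctly normalized.

```latex
% Check in the orthonormal frame e_0 = t, e_1, e_2, e_3 at p, with
% g(e_0,e_0) = -1, g(e_i,e_j) = \delta_{ij}, g(e_0,e_i) = 0
% (signature (-,+,+,+) is an assumption made explicit here).
\begin{align*}
  g\big(\gamma_\omega'(0),\, \gamma_\omega'(0)\big)
    &= g(e_0, e_0) + \sum_{k=1}^{3} (\omega^k)^2 = -1 + |\omega|^2 = 0,\\
  g\big(\gamma_\omega'(0),\, t\big)
    &= g\big({-e_0} + \omega^k e_k,\ e_0\big) = -g(e_0, e_0) = 1 .
\end{align*}
% The positive sign of g(\gamma_\omega'(0), t) reflects that \gamma_\omega'(0)
% is past-directed while t is future-directed.
```

Distinct $\omega \in S^2$ give distinct initial directions, which is what makes the identification of null generators with points of $S^2$ well defined.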
In addition, we define the vector field $L$ on $\mathcal{N}^-(p)$ to be the tangent vector field of the null generators, i.e., we define $L|_{\gamma_\omega(v)} = \gamma_\omega'(v)$ for any $\omega \in S^2$. Note in particular that $L$ is a geodesic vector field.

2.2. Spherical foliating functions. We say that a function $f \in C^\infty(\mathcal{N}^-(p))$ is a foliating function of $\mathcal{N}^-(p)$ iff $f$ satisfies the following conditions:
• For every $\omega \in S^2$,
$$\lim_{v \searrow 0} f|_{\gamma_\omega(v)} = 0.$$
• The function
$$\vartheta = \vartheta_f = (L f)^{-1} \in C^\infty(\mathcal{N}^-(p)) \tag{4}$$
is everywhere positive on $\mathcal{N}^-(p)$. In other words, $f$ is strictly increasing along every null generator of $\mathcal{N}^-(p)$.
• There is a constant $\vartheta_0 > 0$ such that for all $\omega \in S^2$,
$$\lim_{v \searrow 0} \vartheta|_{\gamma_\omega(v)} = \vartheta_0.$$
• For every $\omega \in S^2$, the limit
$$\lim_{s \searrow 0} L\vartheta|_{\gamma_\omega(s)}$$
exists and is finite.

The function $\vartheta$ defined in (4) is called the null lapse of $f$.

Remark. We can in fact weaken the third condition in the above definition so that $\vartheta$ converges to different limits along various null generators. This generalization, however, is currently without practical applications and complicates various initial limit computations. Therefore, we restrict ourselves to the simpler definition.

In the remainder of this paper, we will let $f$ denote an arbitrary "default" foliating function for $\mathcal{N}^-(p)$, and we let $\vartheta = \vartheta_f$. The positivity of $\vartheta$ implies $df$ is everywhere nonvanishing, so the level sets of $f$, denoted

$$S_v = S_v^f = \left\{ q \in \mathcal{N}^-(p) \mid f(q) = v \right\}, \qquad v > 0,$$

form a family of hypersurfaces of $\mathcal{N}^-(p)$. Since $L$ represents the unique null direction tangent to $\mathcal{N}^-(p)$, and the positivity of $\vartheta$ implies that $L$ is transverse to each $S_v$, we can conclude that each $S_v$ is spacelike, i.e., Riemannian. We shall adopt the following notational conventions: we let $\lambda$ and $\slashed{\nabla}$ denote the induced metrics and the Levi-Civita connections on the $S_v$'s, respectively.

Define the quantity $i(p) = i_f(p)$ to be the supremum of all $v > 0$ such that for every $\omega \in S^2$, there exists a point $q$ on both $\gamma_\omega$ and $\mathcal{N}^-(p)$ such that $f(q) = v$. This can be interpreted as $\mathcal{N}^-(p)$ remaining regular up to $f$-value $i(p)$.⁷ Note that the definition of $i(p)$ and the positivity of $\vartheta$ imply that $S_v$ is diffeomorphic to $S^2$ whenever $0 < v < i(p)$.

7 If $\mathcal{N}^-(p)$ is defined in terms of the global geometry of $M$, as was discussed before, then $i(p)$ corresponds to the "null injectivity radius" of $\mathcal{N}^-(p)$ with respect to $f$.
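The conditions defining a foliating function can be checked by hand in the flat model. The example below is only an illustrative sketch. It assumes Minkowski space with $p$ at the origin and $t = \partial_t$, and the second function $v + v^2$ is a hypothetical choice made to show a non-constant null lapse.

```latex
% Illustration only: Minkowski space R^{1+3}, p = (0,0), t = \partial_t.
% Null generators: \gamma_\omega(v) = (-v, v\omega), v > 0, \omega \in S^2,
% so L = \gamma_\omega' = -\partial_t + \omega^k \partial_k along each generator.
\begin{align*}
  &\text{(i)}\ \ f(\gamma_\omega(v)) = v:
     && Lf = 1, \quad \vartheta = 1, \quad \vartheta_0 = 1, \quad L\vartheta = 0;\\
  &\text{(ii)}\ \ f(\gamma_\omega(v)) = v + v^2:
     && Lf = 1 + 2v, \quad \vartheta = \frac{1}{1+2v} > 0, \quad
        \lim_{v \searrow 0} \vartheta = 1, \quad
        \lim_{v \searrow 0} L\vartheta = \lim_{v \searrow 0} \frac{-2}{(1+2v)^2} = -2 .
\end{align*}
% In both cases f -> 0 as v -> 0, so all four conditions hold.
% Case (i): S_v = \{(-v, v\omega)\} is the round sphere of radius v in the slice t = -v.
% Case (ii): S_v is the round sphere of radius s(v) = \tfrac12(\sqrt{1+4v} - 1),
%            the positive root of s + s^2 = v.
% Since f maps (0,\infty) onto (0,\infty) along every generator, i(p) = \infty in both cases.
```

Case (ii) shows that the limit of $L\vartheta$ at the vertex can be nonzero while remaining finite, which is all that the fourth condition requires.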
In addition, for any $0 < v_1 \leq i(p)$, we define the null cone segment

$$\mathcal{N}^-(p; v_1) = \mathcal{N}^-_f(p; v_1) = \left\{ q \in \mathcal{N}^-(p) \mid f(q) < v_1 \right\},$$

while for any $0 < v_1 < v_2 \leq i(p)$, we define

$$\mathcal{N}^-(p; v_1, v_2) = \mathcal{N}^-_f(p; v_1, v_2) = \left\{ q \in \mathcal{N}^-(p) \mid v_1 < f(q) < v_2 \right\}.$$

The most natural example of a foliating function is the affine parameter $s$ on $\mathcal{N}^-(p)$, given by $s(\gamma_\omega(v)) = v$ for each $\omega \in S^2$. Note in particular that $s$ converges to $0$ at $p$ and $\vartheta_s \equiv 1$ everywhere. The foliation of $\mathcal{N}^-(p)$ by the $S_v^s$'s is called the geodesic foliation.

Remark. Note that $s$ depends on the normalization $t$.

If the spacetime $M$ is foliated by a global time function $t$, then we have another natural foliating function $t_p$ for $\mathcal{N}^-(p)$, given by $t_p(q) = t(p) - t(q)$. This is the foliating function used in the breakdown results of [14,16].

2.3. Tensors. Here, we list the notations we will use to describe various tensor fields on $\mathcal{N}^-(p)$. Again, we assume an arbitrary foliating function $f$ for $\mathcal{N}^-(p)$. We begin with the following objects:
• A tensor $w$ at some $q \in S_v$ is said to be horizontal iff $w$ is tangent to $S_v$.
• We denote by $T^k \mathcal{N}^-(p)$ the horizontal bundle over $\mathcal{N}^-(p)$ of all horizontal tensors of total rank $k$ at every $q \in \mathcal{N}^-(p)$.
• Similarly, we denote by $\overline{T}^l \mathcal{N}^-(p)$ the extrinsic bundle over $\mathcal{N}^-(p)$ of all tensors in $M$ of total rank $l$ at every $q \in \mathcal{N}^-(p)$.

For a vector bundle $\mathcal{V}$, we let $\Gamma \mathcal{V}$ denote the space of all smooth sections of $\mathcal{V}$. By this formalism, $\Gamma T^k \mathcal{N}^-(p)$ and $\Gamma \overline{T}^l \mathcal{N}^-(p)$ denote the spaces of all horizontal tensor fields of rank $k$ and extrinsic tensor fields of rank $l$ on $\mathcal{N}^-(p)$, respectively. In other words, a horizontal tensor field $A \in \Gamma T^k \mathcal{N}^-(p)$ smoothly maps each $q \in \mathcal{N}^-(p)$ to a horizontal tensor of rank $k$ at $q$; extrinsic tensor fields can be characterized analogously. For example, the restrictions to $\mathcal{N}^-(p)$ of the spacetime metric $g$ and the curvature $R$ can be treated as elements of $\Gamma \overline{T}^2 \mathcal{N}^-(p)$ and $\Gamma \overline{T}^4 \mathcal{N}^-(p)$, while the induced metrics $\lambda$ on the $S_v$'s can be treated as an element of $\Gamma T^2 \mathcal{N}^-(p)$. Lastly, note that $\Gamma T^0 \mathcal{N}^-(p) = \Gamma \overline{T}^0 \mathcal{N}^-(p) = C^\infty(\mathcal{N}^-(p))$.

Next, we define the mixed bundles over $\mathcal{N}^-(p)$ to be the tensor product bundles

$$T^k \overline{T}^l \mathcal{N}^-(p) = T^k \mathcal{N}^-(p) \otimes \overline{T}^l \mathcal{N}^-(p).$$

Similarly, we will call a section $A \in \Gamma T^k \overline{T}^l \mathcal{N}^-(p)$ a mixed tensor field on $\mathcal{N}^-(p)$. By the duality formulation, we can consider such a field $A$ as a bilinear map

$$A : \Gamma T^k \mathcal{N}^-(p) \times \Gamma \overline{T}^l \mathcal{N}^-(p) \to C^\infty(\mathcal{N}^-(p)),$$

or as the corresponding $C^\infty(\mathcal{N}^-(p))$-valued multilinear map on $k$ horizontal vector fields and $l$ extrinsic vector fields.

In index notation, we adopt the following conventions:
• Horizontal indices will be denoted using Latin letters and will take values between 1 and 2, inclusive.
• Extrinsic indices will be denoted using Greek letters and will take values between 1 and 4, inclusive.
• Collections of extrinsic indices will be denoted using capital Latin letters.
A Generalized Representation Formula
59
2.4. Covariant differentiation. Recall that the Levi-Civita connection $D$ on $M$ induces a connection $\overline{D}$ on the extrinsic bundles $\overline{T}^l \mathcal{N}^-(p)$. For any $A \in \Gamma\overline{T}^l \mathcal{N}^-(p)$ and a vector field $X$ on $\mathcal{N}^-(p)$, we can arbitrarily extend $A$ to a neighborhood of $\mathcal{N}^-(p)$ and define $\overline{D}_X A = D_X A$. It is easy to see that this definition is independent of the chosen extension of $A$. Moreover, it is clear from the definition that $\overline{D}_X g \equiv 0$ for any vector field $X$ on $\mathcal{N}^-(p)$.$^8$

The Levi-Civita connections $\slashed{\nabla}$ on the $\mathcal{S}_v$'s naturally aggregate to define a connection $\slashed{\nabla}$ on the horizontal bundles $\underline{T}^k \mathcal{N}^-(p)$. Let $X$ be a vector field on $\mathcal{N}^-(p)$:
• For a scalar $f \in C^\infty(\mathcal{N}^-(p))$, we define $\slashed{\nabla}_X f = X f$, as usual.
• For a horizontal vector field $Y$, we define $\slashed{\nabla}_X Y$ to be the projection onto the $\mathcal{S}_v$'s of $\overline{D}_X Y$. Note that if $X$ is horizontal, then $\slashed{\nabla}_X Y$ is precisely the corresponding covariant derivative with respect to the $\mathcal{S}_v$'s.
• For a fully covariant $A \in \Gamma\underline{T}^k \mathcal{N}^-(p)$, then $\slashed{\nabla}_X A \in \Gamma\underline{T}^k \mathcal{N}^-(p)$ is naturally given as follows: for horizontal vector fields $Y_1, \dots, Y_k$,
\[ \slashed{\nabla}_X A(Y_1, \dots, Y_k) = X[A(Y_1, \dots, Y_k)] - A(\slashed{\nabla}_X Y_1, Y_2, \dots, Y_k) - \dots - A(Y_1, \dots, \slashed{\nabla}_X Y_k). \]

This definition of $\slashed{\nabla}$ generalizes the usual $\slashed{\nabla}$-covariant derivative on the $\mathcal{S}_v$'s to also include the $L$-direction. Note also that $\slashed{\nabla}_X \lambda \equiv 0$ for any vector field $X$ on $\mathcal{N}^-(p)$, i.e., $\slashed{\nabla}$ remains compatible with the horizontal metrics.

We can canonically combine the connections $\slashed{\nabla}$ and $\overline{D}$ on the horizontal and extrinsic bundles to obtain generalized connections $\overline{\slashed{\nabla}}$ on the mixed bundles. The basic idea is to have $\overline{\slashed{\nabla}}$ behave "like $\slashed{\nabla}$ on the horizontal components" and "like $\overline{D}$ on the extrinsic components". More specifically, if $A \in \Gamma\underline{T}^k \overline{T}^l \mathcal{N}^-(p)$ is a fully covariant mixed tensor field, $X$ is a vector field on $\mathcal{N}^-(p)$, $Y_1, \dots, Y_k \in \Gamma\underline{T}^1 \mathcal{N}^-(p)$ are horizontal vector fields, and $Z_1, \dots, Z_l \in \Gamma\overline{T}^1 \mathcal{N}^-(p)$ are extrinsic vector fields, then we define $\overline{\slashed{\nabla}}_X A \in \Gamma\underline{T}^k \overline{T}^l \mathcal{N}^-(p)$, interpreted as a multilinear map, by assigning to the expression $\overline{\slashed{\nabla}}_X A(Y_1, \dots, Y_k; Z_1, \dots, Z_l)$ the following value:
\begin{align*}
& X[A(Y_1, \dots, Y_k; Z_1, \dots, Z_l)] - A(\slashed{\nabla}_X Y_1, Y_2, \dots, Y_k; Z_1, \dots, Z_l) - \dots \\
& \quad - A(Y_1, \dots, Y_{k-1}, \slashed{\nabla}_X Y_k; Z_1, \dots, Z_l) - A(Y_1, \dots, Y_k; \overline{D}_X Z_1, Z_2, \dots, Z_l) \\
& \quad - \dots - A(Y_1, \dots, Y_k; Z_1, \dots, Z_{l-1}, \overline{D}_X Z_l).
\end{align*}

The main observations concerning the above construction are the following:
• The mixed covariant derivatives satisfy Leibniz rules similar to $\slashed{\nabla}$- and $\overline{D}$-covariant derivatives.
• For any vector field $X$ on $\mathcal{N}^-(p)$, both $\overline{\slashed{\nabla}}_X \lambda \equiv 0$ and $\overline{\slashed{\nabla}}_X g \equiv 0$.

The primary consequence of these observations is that the same integration by parts operations can be justified for $\overline{\slashed{\nabla}}$-derivatives of mixed tensor fields as for $\slashed{\nabla}$-derivatives on horizontal tensor fields. This property was implicitly used in [12,14]. Finally, we will make use of the following notations:
• For any $A \in \Gamma\underline{T}^k \mathcal{N}^-(p)$, we let $\slashed{\nabla} A \in \Gamma\underline{T}^{k+1} \mathcal{N}^-(p)$ denote the horizontal tensor field mapping a horizontal vector field $X$ to $\slashed{\nabla}_X A$. This is precisely the covariant differential of $A$ on the $\mathcal{S}_v$'s.

8 Technically, by $\overline{D}_X g$, we mean $\overline{D}_X$ acting on the restriction of $g$ to $\mathcal{N}^-(p)$.
• Similarly, for any $A \in \Gamma\underline{T}^k \overline{T}^l \mathcal{N}^-(p)$, we let $\overline{\slashed{\nabla}} A \in \Gamma\underline{T}^{k+1} \overline{T}^l \mathcal{N}^-(p)$ denote the mixed tensor field mapping a horizontal vector field $X$ to $\overline{\slashed{\nabla}}_X A$.
• We also define the horizontal and mixed Laplacians in the usual way:
\[ \slashed{\Delta} = \lambda^{ab} \slashed{\nabla}_{ab}, \qquad \overline{\slashed{\Delta}} = \lambda^{ab} \overline{\slashed{\nabla}}_{ab}. \]
• Consider also the operators $\slashed{\nabla}_f = \vartheta \slashed{\nabla}_L$ and $\overline{\slashed{\nabla}}_f = \vartheta \overline{\slashed{\nabla}}_L$. These are horizontal and mixed covariant derivatives in the tangent null direction, subject to the normalizations $\slashed{\nabla}_f f = \overline{\slashed{\nabla}}_f f \equiv 1$.

For further details involving the above constructions, see [16, Sect. 1.2].

2.5. Parametrizations. We can parametrize $\mathcal{N}^-(p)$ by the foliating function $f$ and a value $\omega \in \mathbb{S}^2$. For $0 < v < i(p)$ and $\omega \in \mathbb{S}^2$, we can identify $(v, \omega)$ with the unique point $q$ on the corresponding null generator $\gamma_\omega$ with $f(q) = v$. As a result, we can naturally treat any $\phi \in C^\infty(\mathcal{N}^-(p))$ as a function of $f$ and $\omega$. For any such $\phi$, we denote by $\phi|_{(v,\omega)}$ the value of $\phi$ at the point $q$ corresponding to the parameters $(v, \omega)$. We will freely use this $(v, \omega)$-notation throughout future sections without further elaboration.

2.6. Null frames. In general, null frames are local frames $\hat{l}, \hat{m}, e_1, e_2$ which satisfy
\[ g(\hat{l}, \hat{l}) = g(\hat{m}, \hat{m}) \equiv 0, \qquad g(\hat{l}, \hat{m}) \equiv -2, \qquad g(\hat{l}, e_a) = g(\hat{m}, e_a) \equiv 0, \qquad g(e_a, e_b) = \delta_{ab}. \]
Here, we define null frames which are adapted to the $f$-foliation of $\mathcal{N}^-(p)$. Each point of $\mathcal{S}_v$ is normal to exactly two null directions, one of which is represented by $L$. We define $\underline{L}$, called the conjugate null vector field, to be the vector field in the other normal null direction, subject to the normalization $g(L, \underline{L}) \equiv -2$. Next, we append to $L$ and $\underline{L}$ a local orthonormal frame $e_1, e_2$ on the $\mathcal{S}_v$'s. Then, $\{L, \underline{L}, e_1, e_2\}$ defines a natural null frame for $\mathcal{N}^-(p)$. In this paper, we will index only with respect to adapted null frames.
• Horizontal indices 1, 2 correspond to the directions $e_1$ and $e_2$.
• $\underline{L}$ corresponds to the index 3, while $L$ corresponds to the index 4.

2.7. Ricci coefficients. We will make use of the following connection quantities:
• Define the null second fundamental forms $\chi, \underline{\chi} \in \Gamma\underline{T}^2 \mathcal{N}^-(p)$ by
\[ \chi(X, Y) = g(D_X L, Y), \qquad \underline{\chi}(X, Y) = g(D_X \underline{L}, Y), \]
for any horizontal vector fields $X$ and $Y$ on $\mathcal{N}^-(p)$. Note that $\chi$ and $\underline{\chi}$ are symmetric, since both $L$ and $\underline{L}$ are normal to the $\mathcal{S}_v$'s.
• We often decompose $\chi$ into its trace and traceless parts:
\[ \mathrm{tr}\chi = \lambda^{ab} \chi_{ab}, \qquad \hat{\chi} = \chi - \tfrac{1}{2} (\mathrm{tr}\chi) \lambda. \]
We will also use an analogous decomposition for $\underline{\chi}$.
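• Define $\zeta, \underline{\eta} \in \Gamma\underline{T}^1 \mathcal{N}^-(p)$ by
\[ \zeta(X) = \tfrac{1}{2} g(D_X L, \underline{L}), \qquad \underline{\eta}(X) = \tfrac{1}{2} g(X, D_L \underline{L}) \]
for any horizontal vector field $X$ on $\mathcal{N}^-(p)$.

The quantities $\mathrm{tr}\chi$, $\hat{\chi}$, and $\zeta$ are called the expansion, shear, and torsion of $\mathcal{N}^-(p)$ (with respect to the $f$-foliation), respectively. In addition, we have the following relation between $\zeta$ and $\underline{\eta}$:
\[ \underline{\eta} = -\zeta + \slashed{\nabla}(\log \vartheta). \tag{5} \]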
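For a proof, see [9, Prop. 2.7]. Given a null frame $L, \underline{L}, e_1, e_2$, a vector field $Z \in \Gamma\overline{T}^1 \mathcal{N}^-(p)$ is decomposed
\[ Z = -\tfrac{1}{2} g(Z, \underline{L}) L - \tfrac{1}{2} g(Z, L) \underline{L} + \sum_{a=1}^{2} g(Z, e_a) e_a. \]
Using this identity, we can decompose covariant derivatives along $\mathcal{N}^-(p)$:
\begin{align*}
D_a e_b &= \slashed{\nabla}_a e_b + \tfrac{1}{2} \underline{\chi}_{ab} L + \tfrac{1}{2} \chi_{ab} \underline{L}, & D_L e_a &= \slashed{\nabla}_L e_a + \underline{\eta}_a L, \\
D_a \underline{L} &= \underline{\chi}_a{}^b e_b + \zeta_a \underline{L}, & D_L L &\equiv 0, \\
D_a L &= \chi_a{}^b e_b - \zeta_a L, & D_L \underline{L} &= 2 \underline{\eta}^a e_a. \tag{6}
\end{align*}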
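As a quick check of the sign conventions in (6), the coefficient of $L$ in $D_a L$ can be recovered from the frame decomposition and the definition of $\zeta$. The computation below is only a sketch: it uses the normalization $g(L, \underline{L}) \equiv -2$, the nullity of $L$, and $\zeta_a = \tfrac{1}{2} g(D_a L, \underline{L})$. The symbols $c_a$ and $\underline{c}_a$ are placeholders introduced here for the unknown coefficients; they do not appear in the paper.

```latex
% Since g(L, L) = 0, we get g(D_a L, L) = 0, so D_a L has no \underline{L}-component:
%   D_a L = \chi_a{}^b e_b + c_a L .
\begin{align*}
2\zeta_a = g(D_a L, \underline{L}) = c_a \, g(L, \underline{L}) = -2 c_a
\quad &\Longrightarrow \quad c_a = -\zeta_a , \\
% Differentiating g(L, \underline{L}) = -2 along e_a gives
% g(D_a \underline{L}, L) = -g(\underline{L}, D_a L) = -2\zeta_a .
% Writing D_a \underline{L} = \underline{\chi}_a{}^b e_b + \underline{c}_a \underline{L}:
-2\zeta_a = g(D_a \underline{L}, L) = \underline{c}_a \, g(\underline{L}, L) = -2 \underline{c}_a
\quad &\Longrightarrow \quad \underline{c}_a = \zeta_a .
\end{align*}
```

The opposite signs of the $\zeta$-terms in the two identities are thus forced by the single normalization $g(L, \underline{L}) \equiv -2$.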
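Lastly, we define the mass aspect function $\mu \in C^\infty(\mathcal{N}^-(p))$ by
\[ \mu = \slashed{\nabla}^a \zeta_a - \tfrac{1}{2} \hat{\chi}^{ab} \hat{\underline{\chi}}_{ab} + |\zeta|^2 + \tfrac{1}{4} R_{4343} - \tfrac{1}{2} R_{43}, \tag{7} \]
and we note the following transport equation satisfied by $\mathrm{tr}\underline{\chi}$:
\[ \slashed{\nabla}_L \mathrm{tr}\underline{\chi} = 2 \slashed{\nabla}^a \underline{\eta}_a + 2 |\underline{\eta}|^2 - \tfrac{1}{2} (\mathrm{tr}\chi) \mathrm{tr}\underline{\chi} - \hat{\chi}^{ab} \hat{\underline{\chi}}_{ab} + \tfrac{1}{2} R_{4343} - R_{43}. \tag{8} \]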
For details on (8), see [2] or [16, Sect. 3.2].

2.8. Integration. Since $\mathcal{N}^-(p)$ is null, we have no volume form on $\mathcal{N}^-(p)$ with respect to which we can integrate scalar functions. However, we can still give a natural definition for integrals of functions over $\mathcal{N}^-(p)$, as long as we have a fixed normalization for $\mathcal{N}^-(p)$. Indeed, we define this integral by
\[ \int_{\mathcal{N}^-(p)} \phi = \int_0^\infty \left( \int_{\mathcal{S}^s_v} \phi \right) dv \tag{9} \]
for any $\phi \in C^\infty(\mathcal{N}^-(p))$ for which the right hand side is well-defined, where the $\mathcal{S}^s_v$'s are the level sets of $s$ on $\mathcal{N}^-(p)$. We can similarly define integrals over any open subset of $\mathcal{N}^-(p)$, in particular the segments $\mathcal{N}^-(p; v_2)$ and $\mathcal{N}^-(p; v_1, v_2)$.
By a change of variables, we can restate (9) in terms of any foliating function $f$:

Proposition 4. For any foliating function $f$ of $\mathcal{N}^-(p)$, we have
\[ \int_{\mathcal{N}^-(p)} \phi = \int_0^\infty \left( \int_{\mathcal{S}_v} \vartheta \cdot \phi \right) dv \]
for any integrable $\phi \in C^\infty(\mathcal{N}^-(p))$.

We will also need the following derivative formula. Here, the integrals over $\mathcal{S}_v$ are the usual integrals over Riemannian manifolds.

Proposition 5. Let $\phi \in C^\infty(\mathcal{N}^-(p))$. Then, for $0 < v_0 < i(p)$,
\[ \left. \frac{d}{dv} \int_{\mathcal{S}_v} \phi \, \right|_{v = v_0} = \int_{\mathcal{S}_{v_0}} \left[ \slashed{\nabla}_f \phi + \vartheta (\mathrm{tr}\chi) \phi \right]. \]

For proofs of the above propositions, see [16, Sect. 3.4] and [9].

Remark. We can justify the definition (9) using energy estimate considerations. Suppose $V$ is a region in $M$, with a portion of its boundary given by a neighborhood $\mathcal{V}$ in $\mathcal{N}^-(p)$. Then, if $X$ is a 1-form on $M$, and we integrate $D^\alpha X_\alpha$ over $V$ and apply the divergence theorem, then the boundary integral corresponding to $\mathcal{V}$ is precisely given by
\[ \int_{\mathcal{V}} g(X, L). \tag{10} \]
This formula is fundamental to the local energy estimates applied in [14,16]. Remark. Note that the definition (9) depends on the normalization t chosen for N − ( p), while the argument in the previous remark establishes that the expression (10) is in fact independent of t.
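To make the definition (9) and the derivative formula of Proposition 5 concrete, the following sketch evaluates both in the flat model case. The assumptions are ours: Minkowski space, the geodesic foliation $f = s$ (so $\vartheta \equiv 1$), level sets $\mathcal{S}_v$ that are round spheres of radius $v$, the value $\mathrm{tr}\chi = 2 v^{-1}$, and the test function $\phi \equiv 1$.

```latex
\begin{align*}
% Left-hand side of Proposition 5: derivative of the area of S_v.
\frac{d}{dv} \int_{\mathcal{S}_v} 1 \, \bigg|_{v = v_0}
  &= \frac{d}{dv} \big( 4\pi v^2 \big) \bigg|_{v = v_0} = 8\pi v_0 , \\
% Right-hand side: \slashed{\nabla}_f 1 = 0, \vartheta = 1, tr\chi = 2/v_0.
\int_{\mathcal{S}_{v_0}} \big[ \slashed{\nabla}_f 1 + \vartheta (\mathrm{tr}\chi) \cdot 1 \big]
  &= \frac{2}{v_0} \cdot 4\pi v_0^2 = 8\pi v_0 , \\
% Definition (9), restricted to the segment of the cone with 0 < s < v_0.
\int_{\mathcal{N}^-(p; v_0)} 1
  &= \int_0^{v_0} 4\pi v^2 \, dv = \tfrac{4}{3} \pi v_0^3 .
\end{align*}
```

In this model case the integral (9) over the cone segment coincides with the Euclidean volume of the ball of radius $v_0$, which is the sense in which the cone is parametrized by its projection to a spatial slice.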
2.9. Initial values. Another important set of properties we shall need in our analysis concerns the "initial values" of many of the objects we have defined above, that is, we wish to compute their limits at $p$ along the null generators of $\mathcal{N}^-(p)$. We list the general results for arbitrary foliations below. Their proofs involve applications of convex geometry among other technicalities, hence we omit such details in this paper and refer the reader to [16, Sect. 3.3]. An earlier account for the special case of the geodesic foliation is presented in [18].

Proposition 6. The following limits hold for each $\omega \in \mathbb{S}^2$:
• We have the following comparisons between $f$ and the affine parameter $s$:
\[ \lim_{v \searrow 0} \left. \frac{s}{f} \right|_{(v,\omega)} = \vartheta_0, \qquad \lim_{v \searrow 0} \left. \left( \frac{\vartheta}{s} - \frac{1}{f} \right) \right|_{(v,\omega)} = 0. \tag{11} \]
Here, $\vartheta_0$ is the initial value of the null lapse $\vartheta$.
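• For each integer $k > 0$,
\[ \lim_{v \searrow 0} v^{k-1} \left. \big| \slashed{\nabla}^k s \big| \right|_{(v,\omega)} = 0, \qquad \lim_{v \searrow 0} v^k \left. \big| \slashed{\nabla}^k \vartheta \big| \right|_{(v,\omega)} = 0, \tag{12} \]
where $\slashed{\nabla}^k$ denotes the $k$th order horizontal covariant differential, i.e., the operator $\slashed{\nabla}$ applied successively $k$ times.$^9$
• We have the following limits for $\chi$:
\[ \lim_{v \searrow 0} \left. \big[ \vartheta (\mathrm{tr}\chi) - 2 f^{-1} \big] \right|_{(v,\omega)} = 0, \qquad \lim_{v \searrow 0} \left. |\hat{\chi}| \right|_{(v,\omega)} = 0. \tag{13} \]
• We have the following limits for $\zeta$ and $\underline{\eta}$:
\[ \lim_{v \searrow 0} v \left. |\zeta| \right|_{(v,\omega)} = \lim_{v \searrow 0} v \left. |\underline{\eta}| \right|_{(v,\omega)} = 0. \tag{14} \]
• We also have the following limits for $\underline{\chi}$:
\[ \lim_{v \searrow 0} v \left. \mathrm{tr}\underline{\chi} \right|_{(v,\omega)} = -2 \vartheta_0^{-1}, \qquad \lim_{v \searrow 0} v \left. |\hat{\underline{\chi}}| \right|_{(v,\omega)} = 0. \tag{15} \]
• Let $\phi \in C^\infty(\mathcal{N}^-(p))$, let $\phi_0 \in C^\infty(\mathbb{S}^2)$, and suppose
\[ \lim_{v \searrow 0} \phi|_{(v,\omega)} = \phi_0|_\omega, \qquad \omega \in \mathbb{S}^2. \]
Then, the following integral limit holds:
\[ \lim_{v \searrow 0} v^{-2} \int_{\mathcal{S}_v} \phi = \vartheta_0^2 \int_{\mathbb{S}^2} \phi_0|_\omega \, d\omega, \tag{16} \]
where the integral on the right hand side is with respect to the standard Euclidean measure on $\mathbb{S}^2$.

In the previous proposition, the tensor norms $|\cdot|$ denote the natural Riemannian norms on the $\mathcal{S}_v$'s, that is, for any $A \in \Gamma\underline{T}^k \mathcal{N}^-(p)$, we define
\[ |A|^2 = \lambda(A, A) = \prod_{i=1}^{k} \lambda^{a_i b_i} A_{a_1 \dots a_k} A_{b_1 \dots b_k}. \]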
Remark. We can also see from Proposition 6 that the behaviors of the objects listed within the proposition all "tend to their corresponding values in Minkowski space" at $p$. For example, the expansion $\mathrm{tr}\chi$ in Minkowski space is precisely $2 s^{-1}$, which is the asymptotic behavior near $p$ demonstrated in (13).

Remark. We can also derive initial limits for derivatives of the Ricci coefficients. For details, see [16, Sect. 3.3].

Remark. The initial values for the Ricci coefficients behave even better than in Proposition 6 in the case of the geodesic foliation; see [16, Sect. 3.3] or [18].

9 Note that $\slashed{\nabla}$ here is defined with respect to the $f$-foliation.
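The scaling by $\vartheta_0$ in Proposition 6 can be illustrated in the flat case with a rescaled foliation. The following is a sketch under our own assumptions: Minkowski space, a constant $c > 0$ (introduced here for illustration), the foliating function $f = s/c$ (assumed admissible), so that $\vartheta \equiv c = \vartheta_0$, and the standard flat values $\mathrm{tr}\chi = 2 s^{-1}$ and $\mathrm{tr}\underline{\chi} = -2 s^{-1}$ for the normalization $g(L, \underline{L}) \equiv -2$.

```latex
% The level set S_v = { f = v } = { s = c v } is a round sphere of radius c v.
\begin{align*}
% (13): expansion compared with 2/f.
\vartheta (\mathrm{tr}\chi) - 2 f^{-1}
  &= c \cdot \frac{2}{c v} - \frac{2}{v} = 0 , \\
% (15): conjugate expansion, rescaled by v.
v \, \mathrm{tr}\underline{\chi}
  &= v \cdot \Big( -\frac{2}{c v} \Big) = -2 c^{-1} = -2 \vartheta_0^{-1} , \\
% (16) with \phi \equiv 1 and \phi_0 \equiv 1.
v^{-2} \int_{\mathcal{S}_v} 1
  &= v^{-2} \cdot 4\pi (c v)^2 = 4\pi c^2 = \vartheta_0^2 \int_{\mathbb{S}^2} d\omega .
\end{align*}
```

In this example the limits in (13), (15), and (16) are attained exactly for every $v$, and the factors of $\vartheta_0$ arise only from the change of parameter from $s$ to $f$.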
For completeness, we list a few key ideas regarding the proof of Proposition 6:
• The limits (11) are straightforward consequences of the identity $ds/df = \vartheta$ and the initial limit assumption for $L\vartheta$.
• Fix an orthonormal basis $e_0, e_1, e_2, e_3$ of $T_p M$, with $e_0 = t$, and consider the normal coordinates $x^0, x^1, x^2, x^3$ associated with this basis. Then, $x^0$ coincides with $s$ on the intersection of $\mathcal{N}^-(p)$ with the domain of $x^0$.
• Using the above normal coordinate considerations, we can derive that the rescaled induced metrics $f^{-2} \lambda$ on the $\mathcal{S}_v$'s approach $\vartheta_0^2$ times the Euclidean metric on $\mathbb{S}^2$ as $v \searrow 0$. This leads to (16).
• From the scale of $\lambda$ obtained above, one can then show that a horizontal covariant derivative $\slashed{\nabla}$ "has scale $O(f^{-1})$", which leads to (12).
• Fix a convex neighborhood $U$ of $p$, and define the function $\sigma : U \times U \to \mathbb{R}$ so that $\sigma(q_1, q_2)$ is the squared geodesic distance between $q_1$ and $q_2$ in $U$. Then, $\sigma$ defines a smooth function, and the vector field $sL$ extends smoothly to $U$ as a partial covariant differential of $\sigma$. Furthermore, second partial covariant derivatives of $\sigma$ converge to the metric $g|_p$ at $(p, p)$.
• From the above, we can derive the initial limits of $s\chi$ and $s\zeta$ when $f = s$. By comparing $s$ with $f$, we can obtain (13) and (14) in the general case.
• In the case $f = s$, we can state $\underline{L}$ and hence $\underline{\chi}$ in terms of $L$ and $Dx^0$. Since $x^0$ can be written as a partial covariant derivative of $\sigma$, then $\underline{\chi}$ can be written in terms of first and second derivatives of $\sigma$; this leads to the limit (15). The general case follows again from comparing $s$ with $f$.

3. The Main Theorem Statement

Before we can state the representation theorem, we must make a few additional definitions. Again, we suppose $\mathcal{N}^-(p)$ is normalized by $t \in T_p M$ and foliated by $f \in C^\infty(\mathcal{N}^-(p))$. Recall that from $t$ and $f$, we can define the null vector fields $L$ and $\underline{L}$, as well as local null frames $L, \underline{L}, e_1, e_2$ adapted to the $f$-foliation of $\mathcal{N}^-(p)$. For each $m \in \{1, \dots, n\}$, we define the extrinsic tensor fields
\[ B^{(m)} \in \Gamma\overline{T}^{r^{(m)}} \mathcal{N}^-(p) \]
such that they satisfy the coupled system of transport equations
\[ \overline{\slashed{\nabla}}_f B^{(m) I} = -\frac{1}{2} \left[ \vartheta (\mathrm{tr}\chi) - \frac{2}{f} \right] B^{(m) I} + \frac{\vartheta}{2} \sum_{c=1}^{n} P^{(cm)}{}_{4J}{}^{I} B^{(c) J}, \qquad 1 \le m \le n, \tag{17} \]
along the null generators of $\mathcal{N}^-(p)$, where $I$ and $J$ here denote collections of $r^{(m)}$ and $r^{(c)}$ extrinsic indices. We also stipulate the initial conditions
\[ B^{(m)}|_p = J^{(m)}, \qquad 1 \le m \le n, \tag{18} \]
where each $J^{(m)}$ is an arbitrary tensor of rank $r^{(m)}$ at $p$. Note that the validity of the system (17), (18) follows from the initial limit (13) for $\mathrm{tr}\chi$. We also define the extrinsic fields
\[ A^{(m)} = f^{-1} B^{(m)} \in \Gamma\overline{T}^{r^{(m)}} \mathcal{N}^-(p), \]
and we note that the $A^{(m)}$'s satisfy the transport equations
\[ \overline{\slashed{\nabla}}_L A^{(m) I} = -\frac{1}{2} (\mathrm{tr}\chi) A^{(m) I} + \frac{1}{2} \sum_{c=1}^{n} P^{(cm)}{}_{4J}{}^{I} A^{(c) J}, \qquad 1 \le m \le n. \tag{19} \]
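The passage from the system (17) for the $B^{(m)}$'s to the system (19) for the $A^{(m)} = f^{-1} B^{(m)}$'s can be checked directly. The sketch below uses only the Leibniz rule, the normalization $\slashed{\nabla}_f f \equiv 1$, and the relation $\overline{\slashed{\nabla}}_f = \vartheta \overline{\slashed{\nabla}}_L$; the index placement on $P^{(cm)}$ follows our reading of (17).

```latex
\begin{align*}
\overline{\slashed{\nabla}}_f A^{(m) I}
  &= -f^{-2} (\slashed{\nabla}_f f) B^{(m) I} + f^{-1} \overline{\slashed{\nabla}}_f B^{(m) I} \\
  % Insert (17) for the derivative of B and use f^{-1} B = A.
  &= -f^{-1} A^{(m) I}
     + f^{-1} \Big[ -\tfrac{1}{2} \big( \vartheta \, \mathrm{tr}\chi - 2 f^{-1} \big) B^{(m) I}
     + \tfrac{\vartheta}{2} \sum_{c=1}^{n} P^{(cm)}{}_{4J}{}^{I} B^{(c) J} \Big] \\
  % The two f^{-1} A terms cancel.
  &= -\tfrac{1}{2} \vartheta (\mathrm{tr}\chi) A^{(m) I}
     + \tfrac{\vartheta}{2} \sum_{c=1}^{n} P^{(cm)}{}_{4J}{}^{I} A^{(c) J} .
\end{align*}
% Dividing by \vartheta > 0 and using \overline{\slashed{\nabla}}_f = \vartheta \overline{\slashed{\nabla}}_L gives (19).
```

The term $2 f^{-1}$ in (17) is exactly what compensates the derivative of the weight $f^{-1}$; by (13) it also removes the singular part of $\vartheta\,\mathrm{tr}\chi$ at the vertex, which is why initial data can be prescribed for $B^{(m)}$ rather than for $A^{(m)}$.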
The $A^{(m)}$'s are the analogues of the tensor field $A$ in [12] and correspond to the factor $r^{-1} = |y - x|^{-1}$ in the explicit expression for $\phi_1$ in the model problem. Moreover, if $n = 1$ and the $P^{(mc)}$'s vanish, then (19) reduces to the transport equation given in [12]. Note in particular that even if a specific $J^{(m)}$ vanishes, $A^{(m)}$ can still be nontrivial due to the coupling in (19). Lastly, we define the coefficients
\[ \nu^{(cm)} \in \Gamma\overline{T}^{r^{(c)} + r^{(m)}} \mathcal{N}^-(p), \qquad 1 \le m, c \le n \]
by the formula a 1 1 (cm) I / P (cm) a J I + ∇ / 4 P (cm) 3J I + ζ a P (cm) a J I + tr χ P 4J ν (cm) J I = −∇ 2 4 n 1 1 (cd) K (dm) I + (tr χ ) P (cm) 3J I + P 4J P 3K . 4 2
(20)
d=1
These tensor fields will be present in the error terms of the representation formula.
3.1. The main theorem. We are now ready to state the main theorem.

Theorem 7. Assume the following:
• Let $n$ be a positive integer, let $r^{(1)}, \dots, r^{(n)}$ be nonnegative integers, and suppose for each $1 \le m, c \le n$, we have defined tensor fields $\Phi^{(m)}$, $\Psi^{(m)}$, and $P^{(mc)}$ on $M$ of ranks $r^{(m)}$, $r^{(m)}$, and $1 + r^{(m)} + r^{(c)}$, respectively.
• Suppose the $\Phi^{(m)}$'s, $\Psi^{(m)}$'s, and $P^{(mc)}$'s satisfy the system (1).
• Fix $p \in M$, and suppose a regular portion $\mathcal{N}^-(p)$ of the past null cone $N^-(p)$ is normalized and foliated by $t \in T_p M$ and $f \in C^\infty(\mathcal{N}^-(p))$.
• Let $v_0$ be a constant satisfying $0 < v_0 \le i(p)$.
• For each $1 \le m \le n$, we define the extrinsic tensor fields
\[ B^{(m)} \in \Gamma\overline{T}^{r^{(m)}} \mathcal{N}^-(p), \qquad A^{(m)} = f^{-1} B^{(m)} \in \Gamma\overline{T}^{r^{(m)}} \mathcal{N}^-(p), \]
along with a tensor $J^{(m)}$ of rank $r^{(m)}$ at $p$, such that the systems of transport equations (17), (18), (19) hold.

Then, we have the representation formula
\[ 4\pi \vartheta_0 \sum_{m=1}^{n} J^{(m) I} \Phi^{(m)}_I \big|_p = F(p; v_0) + E^1(p; v_0) + E^2(p; v_0) + I(p; v_0), \tag{21} \]
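where
• The "fundamental solution term" $F(p; v_0)$ is given by
\[ F(p; v_0) = -\sum_{m=1}^{n} \int_{\mathcal{N}^-(p; v_0)} A^{(m) I} \Psi^{(m)}_I. \tag{22} \]
• The "principal error terms" $E^1(p; v_0)$ are given by
\[ E^1(p; v_0) = -\sum_{m=1}^{n} \int_{\mathcal{N}^-(p; v_0)} \overline{\slashed{\nabla}}^a A^{(m) I} \, \overline{\slashed{\nabla}}_a \Phi^{(m)}_I + \sum_{m=1}^{n} \int_{\mathcal{N}^-(p; v_0)} (\zeta^a - \underline{\eta}^a) \overline{\slashed{\nabla}}_a A^{(m) I} \, \Phi^{(m)}_I. \tag{23} \]
• The remaining "error terms" $E^2(p; v_0)$ are given by
\begin{align*}
E^2(p; v_0) &= \sum_{m=1}^{n} \int_{\mathcal{N}^-(p; v_0)} \mu \cdot A^{(m) I} \Phi^{(m)}_I + \frac{1}{2} \sum_{m=1}^{n} \int_{\mathcal{N}^-(p; v_0)} A^{(m) I} R_{43}[\Phi^{(m)}]_I \\
&\quad - \sum_{m,c=1}^{n} \int_{\mathcal{N}^-(p; v_0)} P^{(cm)}{}_{aJ}{}^{I} \, \overline{\slashed{\nabla}}^a A^{(c) J} \, \Phi^{(m)}_I + \sum_{m,c=1}^{n} \int_{\mathcal{N}^-(p; v_0)} \nu^{(cm)}{}_{J}{}^{I} \cdot A^{(c) J} \Phi^{(m)}_I, \tag{24}
\end{align*}
where $R_{43}[\Phi^{(m)}]_I$ denotes the curvature quantity
\[ R_{43}[\Phi^{(m)}]_I = D_{43} \Phi^{(m)}_I - D_{34} \Phi^{(m)}_I, \]
and where "$D_{43}$" and "$D_{34}$" denote second covariant derivatives.
• The "initial value terms" $I(p; v_0)$ are given by
\[ I(p; v_0) = -\frac{1}{2} \sum_{m=1}^{n} \int_{\mathcal{S}_{v_0}} \mathrm{tr}\underline{\chi} \, A^{(m) I} \Phi^{(m)}_I - \sum_{m=1}^{n} \int_{\mathcal{S}_{v_0}} A^{(m) I} D_3 \Phi^{(m)}_I - \frac{1}{2} \sum_{m,c=1}^{n} \int_{\mathcal{S}_{v_0}} P^{(cm)}{}_{3J}{}^{I} A^{(c) J} \Phi^{(m)}_I. \tag{25} \]

Here, we have indexed with respect to arbitrary null frames $L, \underline{L}, e_1, e_2$ adapted to the $f$-foliation. The capital letters $I, J$ refer to collections of extrinsic indices. Furthermore, the error terms $E^1(p; v_0)$ can be alternately expressed as
\[ E^1(p; v_0) = \sum_{m=1}^{n} \int_{\mathcal{N}^-(p; v_0)} \overline{\slashed{\Delta}} A^{(m) I} \, \Phi^{(m)}_I + 2 \sum_{m=1}^{n} \int_{\mathcal{N}^-(p; v_0)} \zeta^a \cdot \overline{\slashed{\nabla}}_a A^{(m) I} \, \Phi^{(m)}_I. \tag{26} \]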
Note that $F(p; v_0)$ corresponds to the explicit form of $\phi_1$ in the model problem, while $E^1(p; v_0)$ and $E^2(p; v_0)$ are error terms which vanish in the model problem. Lastly, $I(p; v_0)$ describes the initial value contributions, which correspond to the Kirchhoff formula for $\phi_2$ in the model problem.

Remark. Although the representation formula was stated in (21)-(26) in index notation, this was done only as a matter of convenience. It is easy to see that these expressions can in fact be described invariantly.

Remark. In particular, we can use (21) to examine the value of any $\Phi^{(m)}|_p$ individually by setting $J^{(c)} = 0$ for all $c \neq m$.

3.2. Improvements over KR. Theorem 7 contains a number of improvements over the parametrix KR. The most apparent is that Theorem 7 handles the system (1) and not just the single equation (2). While [12] only presented the case with vanishing initial data, here we present the general case for arbitrary initial data; the dependence on the initial data is given explicitly by the $I(p; v_0)$-term (25).

Remark. In fact, the extended parametrix presented in [12, Thm. 4.1] can handle the "multiple wave equations" aspect of (1), but it does not cover the issues of general first-order terms. For details, see Sect. 5.

Additionally, Theorem 7 weakens the assumptions required for the representation formula to be valid. The main assumptions needed for KR are the local spacetime conditions A1 and A2 given in [12, Sect. 2.1]. First, since we make no reference to the optical function and remain entirely on $\mathcal{N}^-(p)$ in our derivation, we only need that the null cone regularity stated in A2 holds for $\mathcal{N}^-(p)$, rather than for $\mathcal{N}^-(q)$ for every $q$ in a neighborhood of $p$, as needed in [12]. This assumption is reflected in Theorem 7 by working only within the segment $\mathcal{N}^-(p; v_0)$, where $v_0 \le i(p)$. Moreover, since we work exclusively on $\mathcal{N}^-(p)$, the local hyperbolicity condition A1 is entirely superfluous here.

3.3. Exclusive dependence on the null cone.
We also previously noted that the representation formula depends only on quantities defined on $\mathcal{N}^-(p)$. Indeed, by inspecting the individual portions (22)–(24), we can immediately see that all the objects within the integrands can be expressed as horizontal, extrinsic, or mixed tensor fields on $\mathcal{N}^-(p)$. The representation formula remains unaffected by potential extensions of these quantities off $\mathcal{N}^-(p)$.$^{10}$ We remark that this property of the parametrix does not conflict with the fact that wave equations on curved spacetimes do not satisfy the strong Huygens principle. This is demonstrated in Theorem 7 by the recursive error terms (23), (24), (26), which contain the unknowns $\Phi^{(1)}, \dots, \Phi^{(n)}$ themselves.

3.4. Extensions beyond cut locus terminal points. As stated in Theorem 7, the parametrix is valid only when the past null exponential $\exp_p$ remains a global diffeomorphism. We can, however, trivially extend Theorem 7 beyond cut locus terminal points, that is, terminal points resulting from the intersection of two distinct null generators, by lifting the representation formula to the tangent space $T_p M$ via $\exp_p$.

By pulling back over $\exp_p$, we can consider all the tensorial objects referenced in Theorem 7 as fields on the past null cone $N$ of $T_p M$. At the level of the tangent space, cut locus terminal points no longer exist, so we can apply Theorem 7 as usual to obtain a representation formula on $N$. Using $\exp_p$, we can express the above representation formula on $N$ back in terms of the past null cone of $p$, beyond any cut locus terminal points. Due to intersecting null geodesics, some objects in the integrands, such as the Ricci coefficients, will be multi-valued at cut locus points. However, in this extended formulation of Theorem 7, we no longer have the standard local energy estimates which were crucial to the general relativity applications [14,16].

On the other hand, conjugate terminal points on past null cones pose a much more serious problem. Such points by definition are preserved by the lifting via $\exp_p$ to the tangent space $T_p M$. Moreover, conjugate points are accompanied by the degeneration of the expansion $\mathrm{tr}\chi$ and hence the $A^{(m)}$'s. Consequently, we have no natural extension of Theorem 7 which survives beyond conjugate points.

10 The only exceptions to these rules are the factors $D_3 \Phi^{(m)}$ in the "initial data" terms (25), which depend on transversal derivatives of the $\Phi^{(m)}$'s.
3.5. Applications to general relativity. Here, we briefly describe how Theorem 7 is applied to the breakdown problem for the Einstein-Maxwell equations, i.e., to Theorem 2. First, in the vacuum analogue, one crucial estimate is the uniform bound for the spacetime Riemann curvature $R$, which satisfies a wave equation
\[ \Box_g R \cong R \cdot R. \tag{27} \]
The parametrix KR was applied to (27) in order to control $R$ at a point $p$; the principal term to be estimated as a result is an integral over $\mathcal{N}^-(p)$ of the quadratic nonlinearity $R \cdot R$.$^{11}$ This was handled using a similar strategy as in the classical work [5] of Eardley and Moncrief, in that at least one of the $R$'s in the integrand could be controlled by the flux density of $R$ on $\mathcal{N}^-(p)$.

The Einstein-Maxwell case is similar, except that we now have a system of two wave equations for both $R$ and the Maxwell 2-form $F$ on $M$:
\begin{align*}
\Box_g R &\cong F \cdot D^2 F + (R + DF) \cdot (R + DF) + \text{l.o.}, \\
\Box_g DF &\cong F \cdot DR + (R + DF) \cdot (R + DF) + \text{l.o.} \tag{28}
\end{align*}
We can apply Theorem 7 to this system, and we can handle the quadratic nonlinearities in the same manner as in the vacuum analogue. The first-order terms $F \cdot D^2 F$ and $F \cdot DR$ present a new challenge, however, as the methods in [14] using KR fail here. These terms can instead be handled directly using Theorem 7, which absorbs the effect of these terms into the transport equation.

More specifically, in the setting of Theorem 7, we take $\Phi^{(1)} = R$ and $\Phi^{(2)} = DF$. From (28), we see that the first-order coefficients $P^{(11)}$ and $P^{(22)}$ vanish, while the coefficients $P^{(12)}$ and $P^{(21)}$ are schematically of the form $F$. Due to the extra control present for $F$ stipulated in Theorem 2, the $A^{(m)}$'s, which depend on the $P^{(cm)}$'s, can still be adequately controlled. For details, see [16].

11 The analogue $A$ of the transport equation in KR also appears here.
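4. Proof of the Main Theorem

In this section, we assume the hypotheses of Theorem 7, and we derive in detail the parametrix of Theorem 7. Let $0 < \varepsilon < v_0$, and define for convenience
\[ \mathcal{N}_\varepsilon = \mathcal{N}^-(p; \varepsilon, v_0) = \{ q \in \mathcal{N}^-(p) \mid \varepsilon < f(q) < v_0 \}. \]

4.1. General integration formulas. The most fundamental steps in the derivation of Theorem 7 involve integrations by parts. Here, we state the general lemmas representing these steps. These lemmas deal with two separate cases: integrations by parts involving $\overline{\slashed{\nabla}}$, and integrations by parts involving $\overline{\slashed{\nabla}}_L$.

We first examine the horizontal case. This is a simple consequence of the compatibility of the mixed covariant differential $\overline{\slashed{\nabla}}$ with both the spacetime metric $g$ and the horizontal metrics $\lambda$.

Lemma 8. For any integer $r \ge 0$, $S \in \Gamma\overline{T}^r \mathcal{N}^-(p)$, and $T \in \Gamma\underline{T}^1 \overline{T}^r \mathcal{N}^-(p)$,
\[ \int_{\mathcal{N}_\varepsilon} S^I \cdot \overline{\slashed{\nabla}}^a T_{aI} = -\int_{\mathcal{N}_\varepsilon} \overline{\slashed{\nabla}}^a S^I \cdot T_{aI} - \int_{\mathcal{N}_\varepsilon} \slashed{\nabla}^a (\log \vartheta) \cdot S^I T_{aI}, \]
where $I$ denotes a collection of $r$ arbitrary extrinsic indices.

Proof. Applying Proposition 4 yields
\begin{align*}
\int_{\mathcal{N}_\varepsilon} S^I \cdot \overline{\slashed{\nabla}}^a T_{aI} &= \int_\varepsilon^{v_0} \left( \int_{\mathcal{S}_v} \vartheta \cdot S^I \overline{\slashed{\nabla}}^a T_{aI} \right) dv \\
&= \int_\varepsilon^{v_0} \left( \int_{\mathcal{S}_v} \slashed{\nabla}^a \big( \vartheta \cdot S^I T_{aI} \big) \right) dv - \int_\varepsilon^{v_0} \left( \int_{\mathcal{S}_v} \vartheta \, \overline{\slashed{\nabla}}^a S^I \cdot T_{aI} \right) dv \\
&\quad - \int_\varepsilon^{v_0} \left( \int_{\mathcal{S}_v} \vartheta \, \slashed{\nabla}^a (\log \vartheta) \cdot S^I T_{aI} \right) dv. \tag{29}
\end{align*}
In the last step, we used that both $\overline{\slashed{\nabla}} g$ and $\overline{\slashed{\nabla}} \lambda$ vanish. By Proposition 4 again, the last two terms on the right-hand side of (29) are precisely the desired terms. Defining the horizontal 1-form $\omega$ by $\omega(X) = \vartheta \cdot S^I T_{aI} X^a$, then by the divergence theorem, the first term on the right-hand side of (29) becomes
\[ \int_\varepsilon^{v_0} \left( \int_{\mathcal{S}_v} \slashed{\nabla}^a \omega_a \right) dv = 0. \]

As a special case of Lemma 8, we have the following:

Corollary 9. For any integer $r \ge 0$ and $S, T \in \Gamma\overline{T}^r \mathcal{N}^-(p)$,
\[ \int_{\mathcal{N}_\varepsilon} S^I \cdot \overline{\slashed{\Delta}} T_I = -\int_{\mathcal{N}_\varepsilon} \overline{\slashed{\nabla}}^a S^I \cdot \overline{\slashed{\nabla}}_a T_I - \int_{\mathcal{N}_\varepsilon} \slashed{\nabla}^a (\log \vartheta) \cdot S^I \overline{\slashed{\nabla}}_a T_I, \]
where $I$ denotes a collection of $r$ arbitrary extrinsic indices.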
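Next, we give the result for the null direction case:

Lemma 10. For any integer $r \ge 0$ and $S, T \in \Gamma\overline{T}^r \mathcal{N}^-(p)$,
\[ \int_{\mathcal{N}_\varepsilon} S^I \cdot \overline{\slashed{\nabla}}_L T_I = -\int_{\mathcal{N}_\varepsilon} \overline{\slashed{\nabla}}_L S^I \cdot T_I - \int_{\mathcal{N}_\varepsilon} (\mathrm{tr}\chi) \cdot S^I T_I - \int_{\mathcal{S}_\varepsilon} S^I T_I + \int_{\mathcal{S}_{v_0}} S^I T_I, \]
where $I$ denotes a collection of $r$ arbitrary extrinsic indices.

Proof.$^{12}$ First, we have
\[ \int_{\mathcal{N}_\varepsilon} S^I \cdot \overline{\slashed{\nabla}}_L T_I = \int_{\mathcal{N}_\varepsilon} \slashed{\nabla}_L \big( S^I T_I \big) - \int_{\mathcal{N}_\varepsilon} \overline{\slashed{\nabla}}_L S^I \cdot T_I. \]
By Propositions 4 and 5, the first term on the right-hand side can be handled by
\begin{align*}
\int_{\mathcal{N}_\varepsilon} \slashed{\nabla}_L \big( S^I T_I \big) &= \int_\varepsilon^{v_0} \left( \int_{\mathcal{S}_v} \slashed{\nabla}_f \big( S^I T_I \big) \right) dv \\
&= \int_\varepsilon^{v_0} \left. \frac{d}{dv'} \left( \int_{\mathcal{S}_{v'}} S^I T_I \right) \right|_{v' = v} dv - \int_\varepsilon^{v_0} \left( \int_{\mathcal{S}_v} \vartheta (\mathrm{tr}\chi) S^I T_I \right) dv,
\end{align*}
where we recalled that $\slashed{\nabla}_f$ denotes the covariant derivative $\vartheta \slashed{\nabla}_L$. The first term on the right-hand side is precisely the desired "boundary terms". Finally, an application of Proposition 4 to the second term completes the proof.

4.2. Expansion of the differential operator. For any tensor field $T$ of rank $l$ on $M$, we have on $\mathcal{N}^-(p)$ the identity
\begin{align*}
\Box_g T_I &= -\tfrac{1}{2} D_{43} T_I - \tfrac{1}{2} D_{34} T_I + \lambda^{ab} D_{ab} T_I \\
&= -D_{43} T_I + \tfrac{1}{2} R_{43}[T]_I + \lambda^{ab} D_{ab} T_I \\
&= -\overline{\slashed{\nabla}}_4 (D_3 T)_I + 2 \underline{\eta}^a \overline{\slashed{\nabla}}_a T_I + \tfrac{1}{2} R_{43}[T]_I + \lambda^{ab} D_{ab} T_I,
\end{align*}
where $\overline{\slashed{\nabla}}_4$ denotes the operator $\overline{\slashed{\nabla}}_L$, where $R_{43}[T]_I$ denotes the curvature quantity $D_{43} T_I - D_{34} T_I$, and where in the last step, we applied the identity $D_L \underline{L} = 2 \underline{\eta}^a e_a$. By the definitions of the connections $D$, $\overline{D}$, and $\overline{\slashed{\nabla}}$, we have the identities
\begin{align*}
D_{ab} T_I &= D_a (D_b T)_I - D_{D_a e_b} T_I, \\
\overline{\slashed{\nabla}}_{ab} T_I &= \overline{\slashed{\nabla}}_a \big( \overline{\slashed{\nabla}}_b T \big)_I - \overline{\slashed{\nabla}}_{\slashed{\nabla}_a e_b} T_I = D_a (D_b T)_I - D_{\slashed{\nabla}_a e_b} T_I,
\end{align*}
so by (6), we have the relation
\[ D_{ab} T_I = \overline{\slashed{\nabla}}_{ab} T_I - D_{D_a e_b - \slashed{\nabla}_a e_b} T_I = \overline{\slashed{\nabla}}_{ab} T_I - \tfrac{1}{2} \underline{\chi}_{ab} \overline{\slashed{\nabla}}_4 T_I - \tfrac{1}{2} \chi_{ab} D_3 T_I. \]
As a result, we have the decomposition
\[ \Box_g T_I = \overline{\slashed{\Delta}} T_I - \overline{\slashed{\nabla}}_4 (D_3 T)_I + 2 \underline{\eta}^a \overline{\slashed{\nabla}}_a T_I - \tfrac{1}{2} \mathrm{tr}\underline{\chi} \, \overline{\slashed{\nabla}}_4 T_I - \tfrac{1}{2} (\mathrm{tr}\chi) D_3 T_I + \tfrac{1}{2} R_{43}[T]_I. \tag{30} \]

12 Thanks to Qian Wang and her comments for greatly simplifying the proof.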
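The decomposition (30) can be sanity-checked in the flat radial case. The following sketch makes several assumptions of our own: Minkowski space with signature $(-,+,+,+)$, $p$ at the spatial origin, a scalar function $\phi = \phi(t, r)$, the geodesic foliation of the past cone with $L = -\partial_t + \partial_r$ and $\underline{L} = -\partial_t - \partial_r$ (so that $g(L, \underline{L}) = -2$), and the flat values $\mathrm{tr}\chi = 2 r^{-1}$, $\mathrm{tr}\underline{\chi} = -2 r^{-1}$, $\underline{\eta} = 0$, with vanishing curvature.

```latex
% For radial \phi the horizontal Laplacian vanishes, and (30) reduces to
%   \Box \phi = -L(\underline{L}\phi) - (1/2) tr\underline{\chi} L\phi - (1/2) tr\chi \underline{L}\phi .
\begin{align*}
-L(\underline{L} \phi)
  &= -(-\partial_t + \partial_r)(-\partial_t \phi - \partial_r \phi)
   = -\partial_t^2 \phi + \partial_r^2 \phi , \\
-\tfrac{1}{2} \mathrm{tr}\underline{\chi} \, L \phi
  &= \tfrac{1}{r} (-\partial_t \phi + \partial_r \phi) , \\
-\tfrac{1}{2} (\mathrm{tr}\chi) \, \underline{L} \phi
  &= \tfrac{1}{r} (\partial_t \phi + \partial_r \phi) , \\
% Summing the three contributions:
\Box \phi &= -\partial_t^2 \phi + \partial_r^2 \phi + \tfrac{2}{r} \partial_r \phi .
\end{align*}
```

The result is the usual radial wave operator, so the signs and the factors of $\tfrac{1}{2}$ in (30) are consistent with the normalization $g(L, \underline{L}) \equiv -2$ in this model case.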
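As a result, the quantity
\[ F_L = \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} A^{(m) I} \Psi^{(m)}_I = \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} A^{(m) I} \Box_g \Phi^{(m)}_I + \sum_{m,c=1}^{n} \int_{\mathcal{N}_\varepsilon} A^{(m) I} P^{(mc)}{}_{\mu I}{}^{J} D^\mu \Phi^{(c)}_J, \]
where we have used (1), can be expanded by (30) as
\begin{align*}
F_L &= \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} A^{(m) I} \Big[ \overline{\slashed{\Delta}} \Phi^{(m)}_I - \overline{\slashed{\nabla}}_4 \big( D_3 \Phi^{(m)} \big)_I + 2 \underline{\eta}^a \overline{\slashed{\nabla}}_a \Phi^{(m)}_I + \tfrac{1}{2} R_{43}[\Phi^{(m)}]_I \Big] \\
&\quad + \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} A^{(m) I} \Big[ -\tfrac{1}{2} \mathrm{tr}\underline{\chi} \, \overline{\slashed{\nabla}}_4 \Phi^{(m)}_I - \tfrac{1}{2} (\mathrm{tr}\chi) D_3 \Phi^{(m)}_I \Big] \\
&\quad + \sum_{m,c=1}^{n} \int_{\mathcal{N}_\varepsilon} A^{(c) J} P^{(cm)}{}_{\mu J}{}^{I} D^\mu \Phi^{(m)}_I \\
&= A^1 + A^2 + A^3 + F^R + A^4 + A^5 + A^6.
\end{align*}
Moreover, we can expand $A^6$ as
\begin{align*}
A^6 &= \sum_{m,c=1}^{n} \int_{\mathcal{N}_\varepsilon} A^{(c) J} P^{(cm)}{}_{aJ}{}^{I} \, \overline{\slashed{\nabla}}^a \Phi^{(m)}_I \\
&\quad + \sum_{m,c=1}^{n} \int_{\mathcal{N}_\varepsilon} A^{(c) J} \Big[ -\tfrac{1}{2} P^{(cm)}{}_{3J}{}^{I} \, \overline{\slashed{\nabla}}_4 \Phi^{(m)}_I - \tfrac{1}{2} P^{(cm)}{}_{4J}{}^{I} D_3 \Phi^{(m)}_I \Big] \\
&= C^1 + C^2 + C^3.
\end{align*}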
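4.3. Integrations by parts. The main goal in principle is as follows. As of now, all the existing covariant derivatives thus far are acting on the $\Phi^{(m)}$'s. We wish to move all covariant derivatives in directions tangent to $\mathcal{N}^-(p)$ (that is, $L$, $e_1$, and $e_2$) away from the $\Phi^{(m)}$'s. This will be accomplished through multiple applications of Lemmas 8 and 10. We will then see that the transport equation (19) will eliminate the worst remaining terms: those involving $\underline{L}$-derivatives of the $\Phi^{(m)}$'s.

If we apply Corollary 9 to $A^1$, we obtain
\[ A^1 = \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} \Big[ -\overline{\slashed{\nabla}}^a A^{(m) I} \, \overline{\slashed{\nabla}}_a \Phi^{(m)}_I - \slashed{\nabla}^a (\log \vartheta) A^{(m) I} \, \overline{\slashed{\nabla}}_a \Phi^{(m)}_I \Big]. \]
Combining this with (5), then
\begin{align*}
A^1 + A^3 &= -\sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} \overline{\slashed{\nabla}}^a A^{(m) I} \, \overline{\slashed{\nabla}}_a \Phi^{(m)}_I + \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} (\underline{\eta}^a - \zeta^a) A^{(m) I} \, \overline{\slashed{\nabla}}_a \Phi^{(m)}_I \\
&= F^\nabla + B^1.
\end{align*}
Applying Lemma 8 to $B^1$ yields
\begin{align*}
B^1 &= \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} A^{(m) I} \Big[ \slashed{\nabla}_a (\log \vartheta) \cdot (\zeta^a - \underline{\eta}^a) \Phi^{(m)}_I + \big( \slashed{\nabla}^a \zeta_a - \slashed{\nabla}^a \underline{\eta}_a \big) \Phi^{(m)}_I \Big] \\
&\quad + \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} (\zeta^a - \underline{\eta}^a) \overline{\slashed{\nabla}}_a A^{(m) I} \, \Phi^{(m)}_I \\
&= B^2 + B^3 + F^\zeta.
\end{align*}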
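Likewise, we apply Lemma 10 to $A^2$ to obtain
\begin{align*}
A^2 &= \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} \overline{\slashed{\nabla}}_4 A^{(m) I} D_3 \Phi^{(m)}_I + \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} (\mathrm{tr}\chi) A^{(m) I} D_3 \Phi^{(m)}_I \\
&\quad + \sum_{m=1}^{n} \left[ \int_{\mathcal{S}_\varepsilon} A^{(m) I} D_3 \Phi^{(m)}_I - \int_{\mathcal{S}_{v_0}} A^{(m) I} D_3 \Phi^{(m)}_I \right].
\end{align*}
Taking note of the transport equation (19), then
\[ A^2 + A^5 + C^3 = \sum_{m=1}^{n} \left[ \int_{\mathcal{S}_\varepsilon} A^{(m) I} D_3 \Phi^{(m)}_I - \int_{\mathcal{S}_{v_0}} A^{(m) I} D_3 \Phi^{(m)}_I \right] = L^1 + I^1. \]

Remark. The above computation also provides the justification for the transport equation (19). Indeed, this is exactly what is needed to eliminate the terms involving integrals over $\mathcal{N}_\varepsilon$ of the $D_3 \Phi^{(m)}$'s. Recall from standard local energy estimates that $\underline{L}$-derivatives on the null cone are generally the worst behaved.

Remark. Recall also that [12] justified its transport equation as the condition needed to eliminate terms involving the distribution $\delta'(u)$, where $u$ is the "optical function" used throughout [12]. Note, however, that $\delta'(u)$ is precisely $-\tfrac{1}{2} \cdot D_3 \delta(u)$, which corresponds to an integral over the null cone of some terms, with the principal term being the $\underline{L}$-derivative of the test function.

We also apply Lemma 10 to $A^4$:
\begin{align*}
A^4 &= \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} \Big[ \tfrac{1}{2} \slashed{\nabla}_4 \big( \mathrm{tr}\underline{\chi} \big) A^{(m) I} \Phi^{(m)}_I + \tfrac{1}{2} \mathrm{tr}\underline{\chi} \, \overline{\slashed{\nabla}}_4 A^{(m) I} \, \Phi^{(m)}_I \Big] \\
&\quad + \sum_{m=1}^{n} \int_{\mathcal{N}_\varepsilon} \tfrac{1}{2} (\mathrm{tr}\chi) \, \mathrm{tr}\underline{\chi} \, A^{(m) I} \Phi^{(m)}_I \\
&\quad + \sum_{m=1}^{n} \left[ \frac{1}{2} \int_{\mathcal{S}_\varepsilon} \mathrm{tr}\underline{\chi} \, A^{(m) I} \Phi^{(m)}_I - \frac{1}{2} \int_{\mathcal{S}_{v_0}} \mathrm{tr}\underline{\chi} \, A^{(m) I} \Phi^{(m)}_I \right] \\
&= B^4 + B^5 + B^6 + L^0 + I^0.
\end{align*}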
+ B6
2 1 1 a ab / = ∇ ηa + η + (tr χ ) tr χ − χˆ χˆ ab A(m) I (m) I 4 2 N
m=1 n 1 1 R4343 − R43 A(m) I (m) I + 4 2 N n
=
m=1 B7 + B8
+ B9 + M1 + M2 + M3 .
By the transport equation (19), then B5
+ B9
n 1 tr χ P (cm) 4J I A(c) J (m) I = P1 . = 4 N m, c=1
If we apply Lemma 8 to C1 , we obtain C1 =
n m,c=1 N n
−
=
a / a (log ϑ) · P (cm) a J I (m) I − ∇ / P (cm) a J I (m) I A(c) J −∇ a
/ A(c) J (m) I P (cm) a J I ∇
m,c=1 N P C4 + P2 + F∇
.
Moreover, applying Lemma 10 to C2 , we have C2
n 1 / 4 (m) I =− L α A(c) J P (cm) α J I ∇ 2 N
m,c=1 n 1 (cm) a (cm) I (c) J (m) I (c) J (m) = (tr χ ) P 3J A I + η P a J A I 2 m,c=1 N n 1 1 (cm) I (cm) (c) J (m) I (c) J (m) / / + ∇ 4 P 3J A I + P 3J ∇ 4 A I 2 2 m,c=1 N n 1 1 + P (cm) 3J I A(c) J (m) I − P (cm) 3J I A(c) J (m) I 2 S 2 Sv0
=
m,c=1 5 C + C6
+ P3 + C7 + L2 + I2 .
74
A. Shao
Applying the transport equation (19), we have C5 + C7 =
=
n 1 (tr χ ) P (cm) 3J I A(c) J (m) I 4 N m,c=1 n 1 + P (cd) 4J K P (dm) 3K I A(c) J (m) I 4 N
m,c,d=1 P4 + P5 .
As of now, we have the rather unsightly expansion ∇ ∇P ζ R 2 3 7 8 4 6 1 2 FL
= F + F + F + F + B + B + B + B + C + C + M + M
+ M3 + P1 + P2 + P3 + P4 + P5 + L0 + L1 + L2 + I0 + I1 + I2 .
(31)
4.4. The error terms. The next step is to aggregate the terms in (31) into error terms corresponding to those comprising (21). First, recalling (5), we see that B3 + B7 = B2 + B8 =
n m=1 N n m=1 N
/ a ζa A(m) I (m) I = M4 , ∇
|ζ |2 A(m) I (m) I = M5 .
Recalling the mass aspect function μ defined in (7), we have M1
+ M2
+ M3
+ M4
+ M5
=
n m=1 N
μA(m) I (m) I = Fμ
.
Next, by (5), C4
+ C6
=−
n m,c=1 N
ζ a P (cm) a J I A(c) J (m) I ,
hence we obtain C4 + C6 + P1 + P2 + P3 + P4 + P5 =
n m,c=1 N
ν (cm) J I A(c) J (m) I = Fν .
Finally, we obtain the more manageable equation ∇ ζ ∇P μ ν R 0 1 2 0 1 2 FL
= F + F + F + F + F + F + L + L + L + I + I + I .
Note that: • The “F”-terms are integrals over the regular past null cone segment N . • The “L”-terms are the vertex limit terms, expressed as integrals over S . • The “I”-terms are the initial data terms, expressed as integrals over Sv0 .
(32)
A Generalized Representation Formula
75
The limits as 0 of the “F”-terms in (32) are clear - these involve simply replacing the integral over N by the same integral over N − ( p; v0 ). These account for the terms F( p; v0 ), E1 ( p; v0 ), and E2 ( p; v0 ) in Theorem 7. Moreover, the “I”-terms in (32) correspond exactly to I( p; v0 ) in Theorem 7. As a result, the limit as 0 of (32) is evaluated as − lim L0 + L1 + L2 = F ( p; v0 ) + E1 ( p; v0 ) + E2 ( p; v0 ) + I ( p; v0 ) . (33)
0
Now, it remains only to compute the limits on the left-hand side of (33). 4.5. Vertex limits. To compute the limits of the “L”-terms, we first obtain the initial values of the integrands using Proposition 6, and then we apply the integral limit property (16) to each term. We begin with L1 , which we write as n n 1 (m) I (m) −1 L = A D3 I =
B (m) I D3 (m) I . m=1 S
S
m=1
Along each null generator γω , ω ∈ S2 , we have the limits lim B (m) (v,ω) → J (m) , lim D3 (m) (v,ω) → D3 (m) p , v0
v0
1 ≤ m ≤ n.
As a result, by (16), lim L1 = 0.
0
Similarly, for L2 , we have lim
0
Next, for
L0 ,
L2
n 1 −1 = lim P (cm) 3J I B (c) J (m) I = 0.
0 2 S m,c=1
we write L0 =
n 1 −2
tr χ B (m) I (m) I .
2 S m=1
Equation (15) implies (tr χ ) converges to −2ϑ0−1 at the vertex, hence by (16), lim L0 =
0
n n 1 2 ϑ0 −2ϑ0−1 J (m) I (m) I p dω = −4π ϑ0 J (m) I (m) I p . 2 2 m=1 S m=1
Therefore, we obtain the desired limit n 0 1 2 − lim L + L + L = 4π ϑ0 J (m) I (m) I p .
0
m=1
As a result, Eq. (33) becomes (21), and the proof of (21)–(25) is complete. Lastly, for the alternate representation (26) of the main error terms, we simply apply Corollary 9 to (23). This concludes the proof of Theorem 7.
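The constant in the limit of L0 can be checked by hand. The following is only a sketch of the arithmetic, under the assumption (read off from the displayed limit) that the integrand at the vertex is the product of the prefactor 1/2, the weight $\vartheta_0^2$, the limiting value $-2\vartheta_0^{-1}$ of tr χ, and a common vertex factor that does not depend on ω, integrated over the unit sphere of area 4π:

```latex
\frac{1}{2}\,\vartheta_0^{2}\,\bigl(-2\vartheta_0^{-1}\bigr)\int_{\mathbb{S}^2} d\omega
  \;=\; -\vartheta_0 \cdot 4\pi \;=\; -4\pi\,\vartheta_0 ,
\qquad
-\bigl(-4\pi\,\vartheta_0 + 0 + 0\bigr) \;=\; 4\pi\,\vartheta_0 .
```

The two zeros are the limits of L1 and L2 computed above, so the sign flip in front of the limit is what produces the positive constant $4\pi\vartheta_0$ in (21).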
Remark. We can also apply the method of proof in this paper to derive the representation formula KR of [12]. The calculation is then significantly simplified, since the term A6 in the proof now vanishes, as do the terms resulting from A6 .

5. Generalization to Vector Bundles

The representation formula KR was further extended in [12, Thm. 4.1] to the case of covariant wave equations on sections of arbitrary vector bundles. For convenience, we denote this extended parametrix by KRV. We can produce a similar extension of Theorem 7 to sections of vector bundles, using a process analogous to that of [12]. We summarize this method in this section, and we connect this extended version to Theorem 7 as well as to KRV. The final result represents a culmination of the ideas which led to KR, KRV, and Theorem 7.

5.1. General ideas. The extension from KR to KRV is essential for the main application presented in [12]: a gauge-invariant proof of the $L^\infty$-estimates for the Yang-Mills curvature required in Eardley and Moncrief’s classical global existence result for the Yang-Mills equations in [5].¹³ Recall that [5] applied the standard representation formula for the scalar wave equation in Minkowski space and, as a result, required the Cronström gauge condition in order to control the gauge potential. The same was also true for its curved spacetime analogue [3]. Using KRV, however, one can avoid the gauge potential altogether and hence bypass the need for a favorable gauge condition.

Note that the parametrix KR itself is not directly applicable to the Yang-Mills setting. In order to achieve this gauge-invariance property, one requires a version of the representation formula suitable for Lie algebra-valued tensor fields and the associated gauge covariant derivatives. Not only does the extended parametrix KRV of [12, Thm. 4.1] address this issue, but its derivation is also completely straightforward in light of the proof of KR.
The key observations behind this extension concern the natures of the operations used in the derivation of KR. These can be summarized by the following statement:

• On the past null cone $N^-(p)$ for which we derive the parametrix, the (mixed) covariant derivatives on $N^-(p)$ satisfy natural Leibniz rules and are compatible with both the spacetime metric g and the induced horizontal metrics λ on $N^-(p)$. In particular, this legitimizes the integration by parts operations fundamental to the derivation of KR.

In other words, if covariant derivatives on a vector bundle behave in a manner analogous to the above statement, then the proof of KR can be directly adapted to prove KRV. In particular, the gauge covariant derivatives on Lie algebra-valued tensor fields in the Yang-Mills setting satisfy a version of this property, as long as the associated metrics are modified using an appropriate positive-definite scalar product on the Lie algebra under consideration.

It is unsurprising, then, that an analogous extension can be made to the representation formula of Theorem 7. The key observations behind this extension remain the same as before, so once the framework is set, the extension becomes completely straightforward. Next, we will briefly summarize the basic constructions for this extension of Theorem 7.

¹³ In fact, the papers [4,5] dealt with the more general Yang-Mills-Higgs equations.
5.2. Basic constructions. Let V denote a vector bundle over M, and let D and ·, · denote a connection and a bundle metric on V, respectively. We additionally stipulate that D and ·, · satisfy the following compatibility condition: for any sections S, T ∈ V, we have the Leibniz-type identity Dα S, T = Dα S, T + S, Dα T . Recall that V denotes the space of all smooth sections of V. Next, we let V denote the restriction of V to N − ( p), and we note that D induces a connection D on V. We will also require the following constructions: • We can define various “mixed bundles” over N − ( p), expressed as r
r
T k T V = T k N − ( p) ⊗ T N − ( p) ⊗ V,
k, r ≥ 0,
where the right-hand side denotes tensor products of vector bundles. / on the above • Moreover, we can naturally construct “mixed covariant derivatives” ∇ / on the “horizontal” components, D on the mixed bundles, which behave like ∇ “extrinsic” components, and D on V. In particular, for a decomposable element r A ⊗ B ⊗ C ∈ T k T V and a vector field X on N − ( p), we define the mixed covariant derivative by / X (A ⊗ B ⊗ C) = ∇ / X A ⊗ B ⊗ C + A ⊗ D X B ⊗ C + A ⊗ B ⊗ D X C. ∇ • The metrics ·, · , g, and γ induce pairings for the mixed bundles, which we also denote by ·, ·, and which, similar to the original bundle metric, satisfy compatibility conditions with respect to the above mixed connections. • We can define the V-curvature R to describe commutations of two D-derivatives. More specifically, we can define for any T ∈ V and vector fields X, Y on M the quantity R X Y [T ] ∈ V by the formula R X Y [T ] = D X (DY T ) − DY (D X T ) − D[X,Y ] T. We can recover the setting of KR by taking V to be the bundle T r M of all tensors on M of total rank r , with D = D the connection on T r M induced by the Levi-Civita connection and ·, · = g the full metric contraction operation on T r M. Note that D and ·, ·, as given here, are indeed compatible. For this special case, the above mixed bundles and their associated connections and metrics coincide with the corresponding “mixed” objects defined in previous sections. This construction of mixed bundles and derivatives along with the compatibility of the associated connections and metrics are the driving forces behind the extension of Theorem 7. We will denote the object satisfying the covariant wave equation on V to be ∈ V, and we denote the associated nonlinearity by ∈ V. It remains now to describe the first-order coefficients of the wave equation, which we denote by P. 
In this setting, we can define P to be a “(V ⊗ V)-valued 1-form”, that is, a section of the vector bundle T ∗ M ⊗ V ⊗ V, where T ∗ M is the cotangent bundle of M. In particular, the restriction of any component Pα to N − ( p) is a section of V ⊗ V. Using the bundle metric ·, ·, we can define the natural bilinear pairings
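$$|\cdot,\cdot\rangle : (V \otimes V) \times V \to V, \qquad \langle\cdot,\cdot| : V \times (V \otimes V) \to V,$$
given for decomposable elements by
$$|A \otimes B, C\rangle = A \cdot \langle B, C\rangle, \qquad \langle A, B \otimes C| = \langle A, B\rangle \cdot C, \qquad A, B, C \in V,$$
and extended accordingly to general elements. We will also need the maps
$$\langle\cdot\,|\,\cdot\,|\,\cdot\rangle : V \times (V \otimes V) \times V \to C^\infty N^-(p), \qquad \langle\cdot,\cdot\rangle : (V \otimes V) \times (V \otimes V) \to V \otimes V,$$
defined for decomposable elements by
$$\langle A \,|\, B \otimes C \,|\, D\rangle = \langle A, B\rangle \langle C, D\rangle, \qquad \langle A \otimes B, C \otimes D\rangle = \langle B, C\rangle \cdot A \otimes D,$$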
and extended as multilinear functions to general elements. For example, consider the standard case V = T r M of KR. Let S, T ∈ V, let A, B ∈ (V ⊗ V), and let I, J, K denote collections of r extrinsic indices. Then, the above operations are given explicitly via tensorial contractions, i.e., |A, T I = A I J T J ,
T, A| J = TI A I J ,
S | A | T = S I A I J T J , A, B I J = A I K B K J . We have a natural connection D on V ⊗ V induced from the connection D on V, given by the Leibniz-type formula D (A ⊗ B) = D A ⊗ B + A ⊗ D B for the decomposable elements and extended linearly to general sections. Then, we can / corresponding to once again define mixed bundles and mixed covariant derivatives ∇ that the expected compatibility the bundles V ⊗ V. It is a simple exercise to determine
conditions hold for |·, ·, ·, ·|, · | · | ·, and ·, · with respect to the various connections and covariant derivatives. We omit the rather tedious details.
5.3. The generalized equations. The covariant wave equation under consideration is now given by μν μ D (34) g + |P, D = g Dμν + Pμ , D = , where , ∈ V and P ∈ (V ⊗ V). The precise definition of D g can be a bit subtle. Here, the inner covariant derivative Dβ is that of the connection on V, but the outer covariant derivative Dα is the induced connection on (T ∗ M ⊗ V), which acts like D on T ∗ M and like D on V. In other words, for vector fields X, Y on M, we define the “second derivative” D X Y by D X Y = D X (DY ) − D D X Y . Let B ∈ V satisfy the system of transport equations 2 ϑ 1 / f B = − ϑ (tr χ ) − B + B, P4 | , ∇ 2 f 2
B| p = J,
(35)
where J is an element of the fiber at p of V. We also define A = f −1 B ∈ V, as well as the coefficients ν ∈ (V ⊗ V), given by
a 1 1 1 1 / Pa + ∇ / 4 P3 + ζ a Pa + tr χ P4 + (tr χ ) P3 + P4 , P3 . ν = −∇ 2 4 4 2 We have indexed with respect to arbitrary f -adapted null frames on N − ( p). We can now finally state our extension of Theorem 7: Theorem 11. Suppose V is a vector bundle over M, with compatible connection and bundle metric D and ·, ·. We assume all the objects derived from V, D, and ·, · as defined in this section, and we define , , P, B, A, ν as in the above development. If p ∈ M, and if the past regular null cone N − ( p) satisfies the same assumptions as in Theorem 7, then (36) 4π ϑ0 J, | p = F ( p; v0 ) + E1 ( p; v0 ) + E2 ( p; v0 ) + I ( p; v0 ) , where
F ( p; v0 ) = −
N − ( p;v0 )
A, ,
a / A, ∇ / a + ζ a − ηa ∇ / a A, , E1 ( p; v0 ) = − ∇ N − ( p;v0 ) 1 A, R43 [] E2 ( p; v0 ) = μ A, + 2 N − ( p;v0 ) N − ( p;v0 ) a / A | Pa | + A | ν | , − ∇ N − ( p;v0 ) 1 1 A, D3 − A | P3 | . tr χ A, − I ( p; v0 ) = − 2 Sv0 2 Sv0 Sv0 Again, the error terms E1 ( p; v0 ) can be alternately expressed as / a A, . / + 2ζ a ∇ A E1 ( p; v0 ) = N − ( p;v0 )
We can derive Theorem 11 in a manner completely analogous to Theorem 7. We simply replace the compatibility properties of $D$ and $\nabla\!\!\!\!/$ in the tensorial case by the abstract compatibility properties described above.

5.4. Reduction to previous cases. Note that we can obtain KRV trivially from Theorem 11 simply by taking P ≡ 0. Next, we will show how Theorem 7 can be recovered from Theorem 11. For this, we consider the direct sum vector bundle (1)
V = Tr M ⊕ ··· ⊕ Tr
(n)
M,
with the associated metric and connection on V given naturally by
n
S (1) , . . . , S (n) , T (1) , . . . , T (n) = g S (m) , T (m) , m=1
D S (1) , . . . , S (n) = DS (1) , . . . , DS (n) . We then define the fundamental objects
= (1) , . . . , (n) ,
(37)
= (1) , . . . , (n) ,
and we similarly define P from the component objects $P^{(mc)}$. With these specific definitions, the covariant wave equation (34) coincides with (1), and the formula of Theorem 11 reduces to that of Theorem 7. Note that the vector bundle formalism of (36) automatically encapsulates the fact that Theorem 7 deals with a system of tensor wave equations. As a result, the parametrix KRV can also deal with systems of covariant wave equations in this fashion. What is entirely new with respect to [12] is the presence of first-order terms, the resulting coupling, and the weakening of the assumptions.

Remark. Alternatively, one could possibly reinterpret the first-order coefficients P by modifying the given connection D to also take into account contributions from P. This would bring us closer to the case of KRV. However, such modified connections would generally fail to be compatible with the given metric, hence one would need to deal with error terms resulting from this.

5.5. Applications to general relativity: Einstein-Yang-Mills spacetimes. We conclude this paper by briefly summarizing a couple of applications for which Theorem 11 could be helpful. First, we can apply Theorem 11 to treat the analogue of Theorem 2 for the Einstein-Yang-Mills setting. This is quite analogous to the Einstein-Maxwell problem discussed in [16], except that the matter field F is now a $\mathfrak{g}$-valued 2-form, where $\mathfrak{g}$ is the Lie algebra associated with the Yang-Mills theory. In other words, we can think of F as a section of the bundle $T^2 M \otimes \mathfrak{g}$. Recall also that we can define “gauge covariant” derivatives on such $\mathfrak{g}$-valued tensor bundles, which we denote by D. Moreover, if we assume as is standard that $\mathfrak{g}$ admits an invariant positive-definite scalar product $\langle\cdot,\cdot\rangle$, then we can combine $\langle\cdot,\cdot\rangle$ and the metric g to define bundle metrics on such $\mathfrak{g}$-valued tensor bundles which are compatible with D. Hence, we retrieve a setting that can be described using the language of Theorem 11.
Similar to the Einstein-Maxwell setting, we again have a system of two covariant wave equations in the Einstein-Yang-Mills analogue, one for the spacetime curvature R, and one for the derivative of the Yang-Mills curvature, $DF \in (T^3 M \otimes \mathfrak{g})$. Algebraically, these equations are analogous in form to the system (28) treated in the Einstein-Maxwell setting. Unlike in [16], we cannot apply Theorem 7 here, since the wave equation satisfied by DF involves gauge covariant derivatives rather than the standard tensorial derivatives.¹⁴ We can, however, describe this setting in terms of Theorem 11. In fact, we simply take
$$V = T^4 M \oplus \bigl(T^3 M \otimes \mathfrak{g}\bigr),$$
and we construct a natural connection and bundle metric on V from the above developments using the same method as in (37). Consequently, the system of wave equations described above is now of the form (34), and we can apply Theorem 11 in order to derive the necessary uniform bounds for R and DF.

¹⁴ In terms of the usual tensorial covariant derivative, such a wave equation for F would of course fail to be gauge-independent.
5.6. Applications to general relativity: Time foliations. In [20,21], and to a lesser extent in [16], one requires the observation that the second fundamental form k of the given time foliation satisfies a covariant wave equation. This can be cleanly described in terms of the language of this section. Furthermore, with this characterization, one can more easily derive the energy estimates for k associated with this wave equation, which are crucial in the above works.

In this setting, the spacetime M is given as a smooth 1-parameter family of spacelike hypersurfaces $\{\Sigma_\tau\}$, and k denotes the second fundamental forms on the $\Sigma_\tau$’s. Then, the bundle V under consideration here is the “horizontal bundle” of all symmetric 2-tensors in M which are tangent to the $\Sigma_\tau$’s; note that k ∈ V. For the bundle metric, we can simply consider the pullback γ of g to the $\Sigma_\tau$’s, i.e., the induced metrics of the $\Sigma_\tau$’s. The compatible connection on V can be defined in the same manner as for the connections on the horizontal bundles of past null cones.

From the Einstein equations, one can see that k satisfies a covariant wave equation of the form (34). As a result, we can apply Theorem 11 to k in this setting. In fact, the application of a representation formula to k is an essential component in the arguments of [20,21].

Note that in contrast to the previous example and to Theorem 7, the bundle metric in this case is in fact positive-definite. Recall that in the case of scalar wave equations, one can derive energy estimates using energy-momentum tensors in a completely standard fashion. Moreover, this methodology extends directly to the setting of an arbitrary vector bundle along with a connection and a compatible positive-definite bundle metric. Therefore, by expressing k in the form (34), one can immediately derive associated global and local energy estimates for k.

Acknowledgements. The author would like to thank Professor Sergiu Klainerman for his insights regarding this topic, and for hours of helpful discussions.
References
1. Choquet-Bruhat, Y.: Théorème d’existence pour certains systèmes d’équations aux dérivées partielles nonlinéaires. Acta Math. 88, 141–225 (1952)
2. Christodoulou, D., Klainerman, S.: Global nonlinear stability of the Minkowski space. Princeton, NJ: Princeton University Press, 1993
3. Chruściel, P., Shatah, J.: Global existence of solutions of the Yang-Mills equations on globally hyperbolic four dimensional Lorentzian manifolds. Asian J. Math. 1(3), 530–548 (1997)
4. Eardley, D.M., Moncrief, V.: The global existence of Yang-Mills-Higgs fields in 4-dimensional Minkowski space. I. Local existence and smoothness properties. Commun. Math. Phys. 83, 171–191 (1982)
5. Eardley, D.M., Moncrief, V.: The global existence of Yang-Mills-Higgs fields in 4-dimensional Minkowski space. II. Completion of proof. Commun. Math. Phys. 83, 193–212 (1982)
6. Evans, L.C.: Partial differential equations. Providence, RI: Amer. Math. Soc., 2002
7. Friedlander, F.G.: The wave equation on a curved spacetime. Cambridge: Cambridge University Press, 1976
8. Hawking, S.W., Ellis, G.F.R.: The large scale structure of space-time. Cambridge: Cambridge University Press, 1975
9. Klainerman, S., Rodnianski, I.: Causal geometry of Einstein-vacuum spacetimes with finite curvature flux. Invent. Math. 159, 437–529 (2005)
10. Klainerman, S., Rodnianski, I.: A geometric approach to the Littlewood-Paley theory. Geom. Funct. Anal. 16(1), 126–163 (2006)
11. Klainerman, S., Rodnianski, I.: Sharp trace theorems for null hypersurfaces on Einstein metrics with finite curvature flux. Geom. Funct. Anal. 16(3), 164–229 (2006)
12. Klainerman, S., Rodnianski, I.: A Kirchhoff-Sobolev parametrix for the wave equation and applications. J. Hyperbolic Differ. Equ. 4(3), 401–433 (2007)
13. Klainerman, S., Rodnianski, I.: On the radius of injectivity of null hypersurfaces. J. Amer. Math. Soc. 21(3), 775–795 (2008)
14. Klainerman, S., Rodnianski, I.: On the breakdown criterion in general relativity. J. Amer. Math. Soc. 23(2), 345–382 (2009)
15. O’Neill, B.: Semi-Riemannian geometry with applications to relativity. New York: Academic Press, 1983
16. Shao, A.: Breakdown criteria for nonvacuum Einstein equations. Ph.D. thesis, Princeton University, 2010
17. Sobolev, S.: Méthodes nouvelle à resoudre le problème de Cauchy pour les équations linéaires hyperboliques normales. Matematicheskii Sbornik 1(43), 31–79 (1936)
18. Wang, Q.: Causal geometry of Einstein vacuum space-times. Ph.D. thesis, Princeton University, 2006
19. Wang, Q.: On the geometry of null cones in Einstein-vacuum spacetimes. Ann. Henri Poincaré 26(1), 285–328 (2009)
20. Wang, Q.: Improved breakdown criterion for solution of Einstein vacuum equation in CMC gauge. Preprint, 2010, available at http://arXiv.org/abs/1004.2938v1 [math.AP], 2010
21. Wang, Q.: On Ricci coefficients of null hypersurfaces with time foliation in vacuum space-time. Preprint, 2010, available at http://arXiv.org/abs/1006.5963v1 [math.AP], 2010

Communicated by P.T. Chruściel
Commun. Math. Phys. 306, 83–118 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1279-9
Communications in
Mathematical Physics
A Series of Algebras Generalizing the Octonions and Hurwitz-Radon Identity Sophie Morier-Genoud1 , Valentin Ovsienko2 1 Institut Mathématiques de Jussieu, UMR 7586, Université Pierre et Marie Curie Paris VI, 4 Place Jussieu,
Case 247, 75252 Paris Cedex 05, France. E-mail:
[email protected]
2 CNRS, Institut Camille Jordan, Université Claude Bernard Lyon 1, 43 Boulevard du 11 Novembre 1918,
69622 Villeurbanne Cedex, France. E-mail:
[email protected] Received: 26 May 2010 / Accepted: 11 February 2011 Published online: 8 June 2011 – © Springer-Verlag 2011
Abstract: We study non-associative twisted group algebras over $(\mathbb{Z}_2)^n$ with cubic twisting functions. We construct a series of algebras that extend the classical algebra of octonions in the same way as the Clifford algebras extend the algebra of quaternions. We study their properties, give several equivalent definitions and prove their uniqueness within some natural assumptions. We then prove a simplicity criterion. We present two applications of the constructed algebras and the developed technique. The first application is a simple explicit formula for the following famous square identity: $(a_1^2 + \cdots + a_N^2)\,(b_1^2 + \cdots + b_{\rho(N)}^2) = c_1^2 + \cdots + c_N^2$, where $c_k$ are bilinear functions of the $a_i$ and $b_j$ and where $\rho(N)$ is the Hurwitz-Radon function. The second application is the relation to Moufang loops and, in particular, to the code loops. To illustrate this relation, we provide an explicit coordinate formula for the factor set of the Parker loop.

Contents
1. Introduction
2. Twisted Group Algebras over $(\mathbb{Z}_2)^n$
   2.1 Basic definitions
   2.2 Quasialgebra structure
   2.3 The pentagonal and the hexagonal diagrams
   2.4 Cohomology $H^*((\mathbb{Z}_2)^n; \mathbb{Z}_2)$
   2.5 Polynomials and polynomial maps
3. The Generating Function
   3.1 Generating functions
   3.2 The signature
   3.3 Isomorphic twisted algebras
   3.4 Involutions
4. The Series $\mathbb{O}_n$ and $\mathbb{M}_n$: Characterization
   4.1 Symmetric quasialgebras
   4.2 The generating functions of the algebras $\mathbb{O}_n$ and $\mathbb{M}_n$
   4.3 Characterization of the algebras of the O- and M-series
   4.4 Generators and relations
5. The Series $\mathbb{O}_n$ and $\mathbb{M}_n$: Properties
   5.1 Criterion of simplicity
   5.2 The first algebras of the series
   5.3 The commutation graph
6. Generating Functions: Existence and Uniqueness
   6.1 Existence of a generating function
   6.2 Generating functions are cubic
   6.3 Uniqueness of the generating function
   6.4 From the generating function to the twisting function
7. Proof of the Simplicity Criterion
   7.1 The idea of the proof
   7.2 Central elements
   7.3 Proof of Theorem 3, part (i)
   7.4 Proof of Theorem 3, part (ii)
8. Hurwitz-Radon Square Identities
   8.1 The explicit solution
   8.2 The Euclidean norm
   8.3 Proof of Theorem 4
9. Relation to Code Loops
10. Appendix: Linear Algebra and Differential Calculus over $\mathbb{Z}_2$
References
1. Introduction

The starting idea of this work is the following naive question: is there a natural way to multiply n-tuples of 0 and 1? Of course, it is easy to find such algebraic structures. The abelian group $(\mathbb{Z}_2)^n$ provides such a multiplication, but the corresponding group algebra $\mathbb{K}[(\mathbb{Z}_2)^n]$, over any field of scalars $\mathbb{K}$, is not a simple algebra. A much more interesting algebraic structure on $\mathbb{K}[(\mathbb{Z}_2)^n]$ is given by the twisted product
$$u_x \cdot u_y = (-1)^{f(x,y)}\, u_{x+y}, \tag{1.1}$$
where $x, y \in (\mathbb{Z}_2)^n$ and f is a two-argument function on $(\mathbb{Z}_2)^n$ with values in $\mathbb{Z}_2 \cong \{0, 1\}$. We use the standard notation $u_x$ for the element of $\mathbb{K}[(\mathbb{Z}_2)^n]$ corresponding to $x \in (\mathbb{Z}_2)^n$. The only difference between the above product and that of the group algebra $\mathbb{K}[(\mathbb{Z}_2)^n]$ is the sign. Yet, the structure of the algebra changes completely. Throughout the paper the ground field $\mathbb{K}$ is assumed to be $\mathbb{R}$ or $\mathbb{C}$ (although many results hold for an arbitrary field of characteristic $\neq 2$).

Remarkably enough, the classical Clifford algebras can be obtained as twisted group algebras. The first example is the algebra of quaternions, $\mathbb{H}$. This example was found by many authors but probably first in [23]. The algebra $\mathbb{H}$ is a twisted $(\mathbb{Z}_2)^2$-algebra. More precisely, consider the 4-dimensional vector space over $\mathbb{R}$ spanned by (0, 0), (0, 1), (1, 0) and (1, 1) with the multiplication:
$$u_{(x_1,x_2)} \cdot u_{(y_1,y_2)} = (-1)^{x_1 y_1 + x_1 y_2 + x_2 y_2}\, u_{(x_1+y_1,\, x_2+y_2)}.$$
It is easy to check that the obtained twisted $(\mathbb{Z}_2)^2$-algebra is, indeed, isomorphic to $\mathbb{H}$; see also [25] for a different grading on the quaternions (over $(\mathbb{Z}_2)^3$).

Fig. 1. $(\mathbb{Z}_2)^3$-grading on the octonions (Fano plane with points labelled 100, 110, 101, 111, 010, 011, 001)

Along the same lines, a Clifford algebra with n generators is a $(\mathbb{Z}_2)^n$-graded algebra, see [5]. The (complex) Clifford algebra $C_n$ is isomorphic to the twisted group algebra over $(\mathbb{Z}_2)^n$ with the product
$$u_{(x_1,\ldots,x_n)} \cdot u_{(y_1,\ldots,y_n)} = (-1)^{\sum_{1\le i\le j\le n} x_i y_j}\, u_{(x_1+y_1,\ldots,x_n+y_n)}, \tag{1.2}$$
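The claims about the quaternions and the bilinear twisting function of (1.2) are easy to test numerically. The following Python sketch is not taken from the paper: the helper names `f_clifford`, `mul_basis` and `is_associative`, and the identification of i, j, k with particular basis elements, are assumptions made for illustration. It multiplies signed basis elements according to (1.1) and checks the quaternion relations for n = 2 and associativity for n = 3.

```python
from itertools import product


def f_clifford(x, y):
    """Twisting function of (1.2): sum over i <= j of x_i * y_j, taken mod 2."""
    n = len(x)
    return sum(x[i] * y[j] for i in range(n) for j in range(i, n)) % 2


def mul_basis(f, a, b):
    """Multiply signed basis elements a = (sign, x), b = (sign, y) using (1.1)."""
    (s, x), (t, y) = a, b
    sign = s * t * (-1 if f(x, y) else 1)
    return sign, tuple((p + q) % 2 for p, q in zip(x, y))


def q(a, b):
    """Product in the n = 2 case, i.e. the quaternion twisting function."""
    return mul_basis(f_clifford, a, b)


# Hypothetical identification of the quaternion units (one of several valid ones).
i, j, k = (1, (0, 1)), (1, (1, 0)), (1, (1, 1))
minus_one = (-1, (0, 0))

quaternion_relations_hold = (
    q(i, i) == q(j, j) == q(k, k) == minus_one
    and q(i, j) == k and q(j, k) == i and q(k, i) == j
)


def is_associative(f, n):
    """Check (u_x u_y) u_z == u_x (u_y u_z) on all triples of basis elements."""
    basis = [(1, x) for x in product((0, 1), repeat=n)]
    return all(
        mul_basis(f, mul_basis(f, a, b), c) == mul_basis(f, a, mul_basis(f, b, c))
        for a in basis for b in basis for c in basis
    )


print(quaternion_relations_hold, is_associative(f_clifford, 3))
```

Associativity is expected here because a bilinear function on $(\mathbb{Z}_2)^n$ is a 2-cocycle; the check on basis elements suffices since the product is bilinear.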
where $(x_1, \ldots, x_n)$ is an n-tuple of 0 and 1. The above twisting function is bilinear and therefore is a 2-cocycle on $(\mathbb{Z}_2)^n$. The real Clifford algebras $C_{p,q}$ are also twisted group algebras over $(\mathbb{Z}_2)^n$, where $n = p + q$. The twisting function f in the real case contains an extra term $\sum_{1\le i\le p} x_i y_i$ corresponding to the signature (see Sect. 3.2).

The algebra of octonions $\mathbb{O}$ can also be viewed as a twisted group algebra [4]. It is isomorphic to $\mathbb{R}[(\mathbb{Z}_2)^3]$ equipped with the following product:
$$u_{(x_1,x_2,x_3)} \cdot u_{(y_1,y_2,y_3)} = (-1)^{x_1 x_2 y_3 + x_1 y_2 x_3 + y_1 x_2 x_3 + \sum_{1\le i\le j\le 3} x_i y_j}\, u_{(x_1+y_1,\, x_2+y_2,\, x_3+y_3)}.$$
Note that the twisting function in this case is a polynomial of degree 3, and does not define a 2-cocycle. This is equivalent to the fact that the algebra $\mathbb{O}$ is not associative. The multiplication table on $\mathbb{O}$ is usually represented by the Fano plane. The corresponding $(\mathbb{Z}_2)^3$-grading is given in Fig. 1. We also mention that different group gradings on $\mathbb{O}$ were studied in [15]; we also refer to [6] for a survey on the octonions and Clifford algebras.

In this paper, we introduce two series of complex algebras, $\mathbb{O}_n$ and $\mathbb{M}_n$, and of real algebras, $\mathbb{O}_{p,q}$ and $\mathbb{M}_{p,q}$. The series $\mathbb{O}_n$ and $\mathbb{O}_{p,q}$ generalize the algebra of octonions in a similar way as the Clifford algebras generalize the algebra of quaternions. The situation can be represented by the following diagram:
$$\begin{array}{ccccccccccc}
 & & & & \vdots & & \vdots & & & & \\
 & & & & \uparrow & & \uparrow & & & & \\
 & & & & C_4 & & \mathbb{O}_5 & & & & \\
 & & & & \uparrow & & \uparrow & & & & \\
 & & & & C_3 & & \mathbb{O}_4 & & & & \\
 & & & & \uparrow & & \uparrow & & & & \\
\mathbb{R} & \longrightarrow & \mathbb{C} & \longrightarrow & \mathbb{H} & \longrightarrow & \mathbb{O} & \longrightarrow & \mathbb{S} & \longrightarrow & \cdots
\end{array}$$
where the horizontal line represents the Cayley-Dickson procedure (see, e.g., [6,11]); in particular, $\mathbb{S}$ is the 16-dimensional algebra of sedenions. The algebra $\mathbb{M}_n$ “measures” the difference between $\mathbb{O}_n$ and $C_n$.

The precise definition is as follows. The (complex) algebras $\mathbb{O}_n$ are twisted group algebras $\mathbb{K}[(\mathbb{Z}_2)^n]$ with the product (1.1), given by the function
$$f_{\mathbb{O}}(x, y) = \sum_{1\le i<j<k\le n} \bigl(x_i x_j y_k + x_i y_j x_k + y_i x_j x_k\bigr) + \sum_{1\le i\le j\le n} x_i y_j, \tag{1.3}$$
for arbitrary n. The algebras $\mathbb{M}_n$ are defined by the twisting function
$$f_{\mathbb{M}}(x, y) = \sum_{1\le i<j<k\le n} \bigl(x_i x_j y_k + x_i y_j x_k + y_i x_j x_k\bigr), \tag{1.4}$$
which is just the homogeneous part of degree 3 of the function $f_{\mathbb{O}}$ (i.e., with the quadratic part removed). In the real case, one can again add the signature term $\sum_{1\le i\le p} x_i y_i$, which only changes the square of some generators, and define the algebras $\mathbb{O}_{p,q}$ and $\mathbb{M}_{p,q}$.

The function $f_{\mathbb{O}}$ is a straightforward generalization of the twisting function corresponding to the octonions. In particular, the algebra $\mathbb{O}_3$ is just the complexified octonion algebra $\mathbb{O} \otimes \mathbb{C}$. In the real case, $\mathbb{O}_{0,3} \cong \mathbb{O}$, while the algebras $\mathbb{O}_{3,0} \cong \mathbb{O}_{2,1} \cong \mathbb{O}_{1,2}$ are isomorphic to another famous algebra called the algebra of split-octonions. The first really interesting new example is the algebra $\mathbb{O}_5$ and its real forms $\mathbb{O}_{p,q}$ with $p + q = 5$.

The algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ are not associative; moreover, they are not alternative. It turns out, however, that these algebras have nice properties similar to those of the octonion algebra and of the Clifford algebras at the same time. As an “abstract algebra”, $\mathbb{O}_n$ can be defined in a quite similar way as the Clifford algebras. The algebra $\mathbb{O}_n$ has n generators $u_1, \ldots, u_n$ such that
$$u_i^2 = -1 \quad\text{and}\quad u_i \cdot u_j = -u_j \cdot u_i, \tag{1.5}$$
respectively, together with the antiassociativity relations
$$u_i \cdot (u_j \cdot u_k) = -(u_i \cdot u_j) \cdot u_k, \tag{1.6}$$
for $i \neq j \neq k$. We will show that the algebras $\mathbb{O}_n$ are the only algebras with n generators $u_1, \ldots, u_n$ satisfying (1.5) and (1.6) and such that any three monomials u, v, w either associate or antiassociate independently of the order of u, v, w.
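The relations (1.5) and (1.6) can be checked directly from the twisting functions (1.3) and (1.4). The sketch below is illustrative only: the function names `f_M`, `f_O`, `mul`, `generators` and `check_relations` are our own, and the generators $u_i$ are assumed to be the basis elements attached to the standard unit vectors of $(\mathbb{Z}_2)^n$.

```python
from itertools import combinations, permutations


def f_M(x, y):
    """Cubic twisting function (1.4), taken mod 2."""
    n = len(x)
    return sum(
        x[i] * x[j] * y[k] + x[i] * y[j] * x[k] + y[i] * x[j] * x[k]
        for i, j, k in combinations(range(n), 3)  # yields i < j < k
    ) % 2


def f_O(x, y):
    """Twisting function (1.3): the cubic part (1.4) plus the bilinear part of (1.2)."""
    n = len(x)
    quadratic = sum(x[i] * y[j] for i in range(n) for j in range(i, n))
    return (f_M(x, y) + quadratic) % 2


def mul(f, a, b):
    """Multiply signed basis elements a = (sign, x), b = (sign, y) using (1.1)."""
    (s, x), (t, y) = a, b
    sign = s * t * (-1 if f(x, y) else 1)
    return sign, tuple((p + q) % 2 for p, q in zip(x, y))


def neg(a):
    """Return the element -a."""
    return -a[0], a[1]


def generators(n):
    """Assumed generators: u_i is the basis element of the i-th unit vector."""
    return [(1, tuple(int(r == c) for c in range(n))) for r in range(n)]


def check_relations(f, n):
    """Return (squares are -1, generators anticommute, generators antiassociate)."""
    u = generators(n)
    one = (1, (0,) * n)
    squares = all(mul(f, g, g) == neg(one) for g in u)
    anticommute = all(mul(f, a, b) == neg(mul(f, b, a)) for a, b in combinations(u, 2))
    antiassociate = all(
        mul(f, a, mul(f, b, c)) == neg(mul(f, mul(f, a, b), c))
        for a, b, c in permutations(u, 3)
    )
    return squares, anticommute, antiassociate


print(check_relations(f_O, 5))
```

Running the same check with `f_M` in place of `f_O` shows the antiassociativity surviving while the generators commute instead of anticommuting, which matches the description of $\mathbb{M}_n$ as the cubic part of $\mathbb{O}_n$.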
The relations of higher degree are then calculated inductively using the following simple “linearity law”. Given three monomials u, v, w, one has
$$u \cdot (v \cdot w) = (-1)^{\phi(\deg u,\, \deg v,\, \deg w)}\, (u \cdot v) \cdot w,$$
where $\phi$ is the trilinear function uniquely defined by the above relations of degree 3; see Sect. 4.4 for the details. For instance, one has $u_i \cdot ((u_j \cdot u_k) \cdot u_\ell) = (u_i \cdot (u_j \cdot u_k)) \cdot u_\ell$ for $i \neq j \neq k \neq \ell$, etc. The presentation of $\mathbb{M}_n$ is exactly the same as above, except that the generators of $\mathbb{M}_n$ commute.

We will prove two classification results characterizing the algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ in an axiomatic way. Our main tool is the notion of generating function. This is a function in one argument $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ that encodes the structure of the algebra. Existence of a generating function is a strong condition. This is a way to distinguish the series $\mathbb{O}_n$ and $\mathbb{M}_n$ from the classical Cayley-Dickson algebras.

The main results of the paper comprise four theorems and their corollaries.
(1) Theorem 1 states that the generating function determines a (complex) twisted group algebra completely.
(2) Theorem 2 is a general characterization of non-associative twisted group algebras over $(\mathbb{Z}_2)^n$ with a symmetric non-associativity factor, in terms of generating functions.
(3) Theorem 3 answers the question for which n (and p, q) the constructed algebras are simple. The result is quite similar to that for the Clifford algebras, except that the algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ degenerate for one value of n out of four, and not one out of two as $C_n$.
(4) Theorem 4 provides explicit formulæ of the Hurwitz-Radon square identities. The algebras $\mathbb{O}_n$ (as well as $\mathbb{M}_n$) are not composition algebras. However, they have a natural Euclidean norm $\mathcal{N}$. We obtain a necessary and sufficient condition for elements u and v to satisfy $\mathcal{N}(u \cdot v) = \mathcal{N}(u)\, \mathcal{N}(v)$. Whenever we find two subspaces $V, W \subset \mathbb{O}_n$ consisting of elements satisfying this condition, we obtain a square identity generalizing the famous “octonionic” 8-square identity.
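The classical 8-square identity mentioned in item (4) can be reproduced from the octonion twisting function of the Introduction. The sketch below is a numerical illustration only, not the construction of Theorem 4: the names `f_octonion`, `multiply` and `norm` are our own, the norm is assumed to be the sum of squared coefficients in the basis $u_x$, and only the case n = 3 over the integers is tested.

```python
import random
from itertools import product


def f_octonion(x, y):
    """Octonion twisting function on (Z_2)^3, i.e. (1.3) with n = 3, taken mod 2."""
    cubic = x[0] * x[1] * y[2] + x[0] * y[1] * x[2] + y[0] * x[1] * x[2]
    quadratic = sum(x[i] * y[j] for i in range(3) for j in range(i, 3))
    return (cubic + quadratic) % 2


def multiply(u, v):
    """Product of two elements given as {x: coefficient} dictionaries, via (1.1)."""
    w = {}
    for x, a in u.items():
        for y, b in v.items():
            z = tuple((p + q) % 2 for p, q in zip(x, y))
            sign = -1 if f_octonion(x, y) else 1
            w[z] = w.get(z, 0) + sign * a * b
    return w


def norm(u):
    """Assumed Euclidean norm: sum of squares of the coefficients."""
    return sum(c * c for c in u.values())


BASIS = list(product((0, 1), repeat=3))
rng = random.Random(0)  # fixed seed so that the run is reproducible


def random_element():
    """Random element with small integer coefficients (exact arithmetic)."""
    return {x: rng.randint(-9, 9) for x in BASIS}


samples = [(random_element(), random_element()) for _ in range(200)]
identity_holds = all(norm(multiply(u, v)) == norm(u) * norm(v) for u, v in samples)
print(identity_holds)
```

Each coefficient of `multiply(u, v)` is a bilinear expression $c_k$ in the coefficients $a_i$ of u and $b_j$ of v, so a passing check is an instance of $(a_1^2 + \cdots + a_8^2)(b_1^2 + \cdots + b_8^2) = c_1^2 + \cdots + c_8^2$.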
The algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ are closely related to the theory of Moufang loops and, in particular, to code loops, see [14,18,26] and references therein. Indeed, the homogeneous elements $\pm u_x$, where $x \in (\mathbb{Z}_2)^n$, form a Moufang loop of rank $2^{n+1}$. As an application, we show in Sect. 9 how the famous Parker loop fits into our framework. Our main tools include variations on the cohomology of $(\mathbb{Z}_2)^n$ and the linear algebra over $(\mathbb{Z}_2)^n$. A brief account of this subject is presented in Sect. 2.4 and in the Appendix.
2. Twisted Group Algebras over $(\mathbb{Z}_2)^n$

In this section, we give the standard definition of twisted group algebra over the abelian group $(\mathbb{Z}_2)^n$. The twisting function we consider is not necessarily a 2-cocycle. We recall the related notion of graded quasialgebra introduced in [4]. At the end of the section, we give a short account of the cohomology of $(\mathbb{Z}_2)^n$ with coefficients in $\mathbb{Z}_2$.
S. Morier-Genoud, V. Ovsienko
2.1. Basic definitions. The most general definition is the following. Let $(G, +)$ be an abelian group. A twisted group algebra $(\mathbb{K}[G], F)$ is the algebra spanned by the elements $u_x$, for $x \in G$, and equipped with the product
\[ u_x \cdot u_y = F(x, y)\, u_{x+y}, \]
where $F : G \times G \to \mathbb{K}^*$ is an arbitrary two-argument function such that $F(0, .) = F(., 0) = 1$. The algebra $(\mathbb{K}[G], F)$ is always unital, and it is associative if and only if $F$ is a 2-cocycle on $G$. Twisted group algebras are a classical subject (see, e.g., [7,9] and references therein).
We will be interested in the particular case of twisted algebras over $G = (\mathbb{Z}_2)^n$ and the twisting function $F$ of the form
\[ F(x, y) = (-1)^{f(x, y)}, \]
with $f$ taking values in $\mathbb{Z}_2 \cong \{0, 1\}$. We will denote by $(\mathbb{K}[(\mathbb{Z}_2)^n], f)$ the corresponding twisted group algebra. Let us stress that the function $f$ is not necessarily a 2-cocycle.
2.2. Quasialgebra structure. An arbitrary twisted group algebra $A = (\mathbb{K}[(\mathbb{Z}_2)^n], f)$ gives rise to two functions
\[ \beta : (\mathbb{Z}_2)^n \times (\mathbb{Z}_2)^n \to \mathbb{Z}_2, \qquad \phi : (\mathbb{Z}_2)^n \times (\mathbb{Z}_2)^n \times (\mathbb{Z}_2)^n \to \mathbb{Z}_2 \]
such that
\[ u_x \cdot u_y = (-1)^{\beta(x, y)}\, u_y \cdot u_x, \tag{2.1} \]
\[ u_x \cdot (u_y \cdot u_z) = (-1)^{\phi(x, y, z)}\, (u_x \cdot u_y) \cdot u_z, \tag{2.2} \]
for any homogeneous elements $u_x, u_y, u_z \in A$. The function $\beta$ obviously satisfies the following properties: $\beta(x, y) = \beta(y, x)$ and $\beta(x, x) = 0$. Following [4], we call the structure $(\beta, \phi)$ a graded quasialgebra. The functions $\beta$ and $\phi$ can be expressed in terms of the twisting function $f$:
\[ \beta(x, y) = f(x, y) + f(y, x), \tag{2.3} \]
\[ \phi(x, y, z) = f(y, z) + f(x + y, z) + f(x, y + z) + f(x, y). \tag{2.4} \]
Note that (2.4) reads $\phi = \delta f$. In particular, $\phi$ is a (trivial) 3-cocycle. Conversely, given the functions $\beta$ and $\phi$, to what extent is the corresponding function $f$ uniquely defined? We will give the answer to this question in Sect. 3.3.
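The relations (2.1)–(2.4) can be checked by brute force on a small example. The following is a minimal sketch, not taken from the paper: the twisting function is a randomly generated table (an assumption made only for illustration), basis elements $\pm u_x$ are encoded as pairs `(s, x)` standing for $(-1)^s u_x$, and all helper names are hypothetical.

```python
import itertools
import random

n = 3
G = list(itertools.product((0, 1), repeat=n))
ZERO = (0,) * n

def add(x, y):
    """Componentwise addition in (Z_2)^n."""
    return tuple((a + b) % 2 for a, b in zip(x, y))

# Hypothetical twisting function: random values in {0, 1}, normalised so that
# f(0, .) = f(., 0) = 0, i.e. F = (-1)^f equals 1 there.
rng = random.Random(0)
f_table = {(x, y): 0 if ZERO in (x, y) else rng.randint(0, 1)
           for x in G for y in G}

def f(x, y):
    return f_table[(x, y)]

def beta(x, y):
    """Formula (2.3)."""
    return (f(x, y) + f(y, x)) % 2

def phi(x, y, z):
    """Formula (2.4)."""
    return (f(y, z) + f(add(x, y), z) + f(x, add(y, z)) + f(x, y)) % 2

def mul(a, b):
    """Product of signed basis elements (s, x) ~ (-1)^s u_x."""
    (s, x), (t, y) = a, b
    return ((s + t + f(x, y)) % 2, add(x, y))

def check_quasialgebra():
    """Verify relations (2.1) and (2.2) on all basis elements."""
    for x, y in itertools.product(G, repeat=2):
        s, g = mul((0, y), (0, x))
        if mul((0, x), (0, y)) != ((s + beta(x, y)) % 2, g):
            return False
    for x, y, z in itertools.product(G, repeat=3):
        s, g = mul(mul((0, x), (0, y)), (0, z))
        if mul((0, x), mul((0, y), (0, z))) != ((s + phi(x, y, z)) % 2, g):
            return False
    return True

print(check_quasialgebra())
```

Since (2.3) and (2.4) are exactly what one obtains by expanding both sides of (2.1) and (2.2), the check should succeed for any choice of the table; changing the seed only changes the algebra being tested.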
Fig. 2. Two hexagonal and the pentagonal commutative diagrams (the hexagons relate the different products of three elements u, v, w; the pentagon relates the five bracketings ((tu)v)w, (tu)(vw), t(u(vw)), t((uv)w), (t(uv))w)
Example 2.1. (a) For the Clifford algebra $C_n$ (and for $C_{p,q}$ with $p + q = n$), the function $\beta$ is bilinear:
\[ \beta_{C_n}(x, y) = \sum_{i \neq j} x_i y_j. \]
The function $\phi \equiv 0$ since the twisting function (1.2) is a 2-cocycle; this is of course equivalent to the associativity property. Every simple graded quasialgebra with bilinear $\beta$ and $\phi \equiv 0$ is a Clifford algebra, see [25].
(b) For the algebra of octonions $\mathbb{O}$, the function $\beta$ is as follows: $\beta(x, y) = 0$ if either $x = 0$, or $y = 0$, or $x = y$; otherwise, $\beta(x, y) = 1$. The function $\phi$ is the determinant of $3 \times 3$ matrices:
\[ \phi(x, y, z) = \det |x, y, z|, \]
where $x, y, z \in (\mathbb{Z}_2)^3$. This function is symmetric and trilinear.
Remark 2.2. The notion of graded quasialgebra was defined in [5] in a more general situation where G is an arbitrary abelian group and the functions that measure the defect of commutativity and associativity take values in K∗ instead of Z2 . The “restricted version” we consider is very special and this is the reason we can say much more about it. On the other hand, many classical algebras can be treated within our framework.
2.3. The pentagonal and the hexagonal diagrams. Consider any three homogeneous elements, $u, v, w \in A$. The functions $\beta$ and $\phi$ relate the different products, $u(vw)$, $(uv)w$, $(vu)w$, etc. The hexagonal diagrams in Fig. 2 represent different loops in $A$ that lead to the following identities:
\[ \begin{aligned} \phi(x, y, z) + \beta(x, y + z) + \phi(y, z, x) + \beta(z, x) + \phi(y, x, z) + \beta(x, y) &= 0, \\ \phi(x, y, z) + \beta(z, y) + \phi(x, z, y) + \beta(z, x) + \phi(z, x, y) + \beta(x + y, z) &= 0. \end{aligned} \tag{2.5} \]
Note that these identities can be checked directly from (2.3) and (2.4). In a similar way, comparing the products of any four homogeneous elements $t, u, v, w$ (see the pentagonal diagram of Fig. 2) leads to the condition
\[ \phi(y, z, t) + \phi(x + y, z, t) + \phi(x, y + z, t) + \phi(x, y, z + t) + \phi(x, y, z) = 0, \tag{2.6} \]
which is nothing but the 3-cocycle condition $\delta\phi = 0$. We already knew this identity from $\phi = \delta f$. Let us stress the fact that these commutative diagrams are tautologically satisfied and give no restriction on $f$.
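The statement that the hexagon and pentagon identities hold tautologically can be illustrated numerically. The sketch below, which is an illustration and not part of the paper, draws an arbitrary $\mathbb{Z}_2$-valued function $f$ at random (a hypothetical test input), builds $\beta$ and $\phi$ from (2.3) and (2.4), and tests (2.5) and (2.6) on all arguments for $n = 3$.

```python
import itertools
import random

n = 3
G = list(itertools.product((0, 1), repeat=n))

def add(x, y):
    """Componentwise addition in (Z_2)^n."""
    return tuple((a + b) % 2 for a, b in zip(x, y))

# Hypothetical test data: an arbitrary Z_2-valued function of two arguments.
rng = random.Random(1)
f_table = {(x, y): rng.randint(0, 1) for x in G for y in G}

def f(x, y):
    return f_table[(x, y)]

def beta(x, y):
    """Formula (2.3)."""
    return (f(x, y) + f(y, x)) % 2

def phi(x, y, z):
    """Formula (2.4)."""
    return (f(y, z) + f(add(x, y), z) + f(x, add(y, z)) + f(x, y)) % 2

def hexagons_hold():
    """Both identities in (2.5), for all x, y, z."""
    for x, y, z in itertools.product(G, repeat=3):
        first = (phi(x, y, z) + beta(x, add(y, z)) + phi(y, z, x)
                 + beta(z, x) + phi(y, x, z) + beta(x, y))
        second = (phi(x, y, z) + beta(z, y) + phi(x, z, y)
                  + beta(z, x) + phi(z, x, y) + beta(add(x, y), z))
        if first % 2 or second % 2:
            return False
    return True

def pentagon_holds():
    """Identity (2.6), i.e. the 3-cocycle condition, for all x, y, z, t."""
    for x, y, z, t in itertools.product(G, repeat=4):
        total = (phi(y, z, t) + phi(add(x, y), z, t) + phi(x, add(y, z), t)
                 + phi(x, y, add(z, t)) + phi(x, y, z))
        if total % 2:
            return False
    return True

print(hexagons_hold(), pentagon_holds())
```

Every value of $f$ enters each sum an even number of times, which is why no condition on $f$ is needed for the checks to pass.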
2.4. Cohomology $H^*((\mathbb{Z}_2)^n; \mathbb{Z}_2)$. In this section, we recall classical notions and results on the cohomology of $G = (\mathbb{Z}_2)^n$ with coefficients in $\mathbb{Z}_2$. We consider the space of cochains, $C^q = C^q(G; \mathbb{Z}_2)$, consisting of (arbitrary) maps in $q$ arguments $c : G \times \cdots \times G \to \mathbb{Z}_2$. The usual coboundary operator $\delta : C^q \to C^{q+1}$ is defined by
\[ \delta c(g_1, \ldots, g_{q+1}) = c(g_1, \ldots, g_q) + \sum_{i=1}^{q} c(g_1, \ldots, g_{i-1}, g_i + g_{i+1}, g_{i+2}, \ldots, g_{q+1}) + c(g_2, \ldots, g_{q+1}), \]
for all $g_1, \ldots, g_{q+1} \in G$. This operator satisfies $\delta^2 = 0$. A cochain $c$ is called a $q$-cocycle if $\delta c = 0$, and a $q$-coboundary (or a trivial $q$-cocycle) if $c = \delta b$, for some cochain $b \in C^{q-1}$. The $q$-th cohomology space, $H^q(G; \mathbb{Z}_2)$, is the quotient space of $q$-cocycles modulo $q$-coboundaries. We are particularly interested in the case where $q = 1, 2$ or $3$.
A fundamental result (cf. [1], p. 66) states that the cohomology ring $H^*(G; \mathbb{Z}_2)$ is isomorphic to the algebra of polynomials in $n$ commuting variables $e_1, \ldots, e_n$:
\[ H^*(G; \mathbb{Z}_2) \cong \mathbb{Z}_2[e_1, \ldots, e_n]. \]
The basis of $H^q(G; \mathbb{Z}_2)$ is given by the cohomology classes of the following multilinear $q$-cochains:
\[ (x^{(1)}, \ldots, x^{(q)}) \mapsto x^{(1)}_{i_1} \cdots x^{(q)}_{i_q}, \qquad i_1 \leq \cdots \leq i_q, \tag{2.7} \]
where each $x^{(k)} \in (\mathbb{Z}_2)^n$ is an $n$-tuple of 0 and 1:
\[ x^{(k)} = (x^{(k)}_1, \ldots, x^{(k)}_n). \]
The $q$-cocycle (2.7) is identified with the monomial $e_{i_1} \cdots e_{i_q}$.
Example 2.3. The linear maps $c_i(x) = x_i$, for $i = 1, \ldots, n$, provide a basis of $H^1(G; \mathbb{Z}_2)$, while the bilinear maps
\[ c_{ij}(x, y) = x_i y_j, \qquad i \leq j, \]
provide a basis of the second cohomology space $H^2(G; \mathbb{Z}_2)$.
2.5. Polynomials and polynomial maps. The space of all functions on $(\mathbb{Z}_2)^n$ with values in $\mathbb{Z}_2$ is isomorphic to the quotient space
\[ \mathbb{Z}_2[x_1, \ldots, x_n] \,/\, (x_i^2 - x_i : i = 1, \ldots, n). \]
A function $P : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ can be expressed as a polynomial in $(x_1, \ldots, x_n)$, but not in a unique way. Throughout the paper we identify the function $P$ with the polynomial expression in which each monomial is minimally represented (i.e. has the lowest degree possible). Thus each function $P$ can be uniquely written in the following form:
\[ P = \sum_{k=0}^{n} \; \sum_{1 \leq i_1 < \cdots < i_k \leq n} \lambda_{i_1 \ldots i_k}\, x_{i_1} \cdots x_{i_k}, \]
where $\lambda_{i_1 \ldots i_k} \in \{0, 1\}$.
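The minimal representation of Sect. 2.5 can be computed explicitly. The sketch below is an illustration under stated assumptions, not the authors' procedure: it uses the standard Möbius inversion over $\mathbb{Z}_2$, in which the coefficient of the monomial indexed by a set $S$ is the sum of $P$ over the indicator vectors of all subsets of $S$. The names `anf` and `evaluate` are hypothetical, and indices are 0-based.

```python
import itertools

def anf(P, n):
    """Monomials (0-based index tuples) with coefficient 1 in the minimal form of P."""
    monomials = set()
    for k in range(n + 1):
        for S in itertools.combinations(range(n), k):
            coeff = 0
            # Moebius inversion over Z_2: sum P over indicator vectors of subsets of S.
            for r in range(k + 1):
                for T in itertools.combinations(S, r):
                    coeff ^= P(tuple(1 if i in T else 0 for i in range(n)))
            if coeff:
                monomials.add(S)
    return monomials

def evaluate(monomials, x):
    """Evaluate the polynomial with the given monomials at x, modulo 2."""
    return sum(all(x[i] for i in S) for S in monomials) % 2

def nonzero_indicator(x):
    """The function equal to 1 for x != 0 and to 0 for x = 0."""
    return 1 if any(x) else 0

terms = anf(nonzero_indicator, 3)
print(sorted(terms, key=lambda S: (len(S), S)))
```

For the indicator of the nonzero elements of $(\mathbb{Z}_2)^3$ this returns all seven non-constant monomials, that is, the sum of the elementary symmetric polynomials of degrees 1, 2 and 3.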
3. The Generating Function

In this section we go further into the general theory of twisted group algebras over $(\mathbb{Z}_2)^n$. We define the notion of generating function. To the best of our knowledge, this notion has never been considered. It will be fundamental for us since it allows us to distinguish the Clifford algebras, the octonions and the two new series we introduce in this paper from other twisted group algebras over $(\mathbb{Z}_2)^n$ (such as Cayley-Dickson algebras). The generating function contains the full information about the algebra, except for the signature.
3.1. Generating functions. The notion of generating function makes sense for any $G$-graded quasialgebra $A$, over an arbitrary abelian group $G$. We are only interested in the case where $A$ is a twisted group algebra over $(\mathbb{Z}_2)^n$.
Definition 3.1. Given a $G$-graded quasialgebra, a function $\alpha : G \to \mathbb{Z}_2$ will be called a generating function if the binary function $\beta$ and the ternary function $\phi$ defined by (2.1) and (2.2) are both determined by $\alpha$ via
\[ \beta(x, y) = \alpha(x + y) + \alpha(x) + \alpha(y), \tag{3.1} \]
\[ \phi(x, y, z) = \alpha(x + y + z) + \alpha(x + y) + \alpha(x + z) + \alpha(y + z) + \alpha(x) + \alpha(y) + \alpha(z). \tag{3.2} \]
Note that the identity (3.1) implies that $\alpha$ vanishes on the zero element $0 = (0, \ldots, 0)$ of $(\mathbb{Z}_2)^n$, because the corresponding element $1 := u_0$ is the unit of $A$ and therefore commutes with any other element of $A$.
The identity (3.1) means that $\beta$ is the differential of $\alpha$ in the usual sense of group cohomology. The second identity (3.2) suggests the operator of “second derivation”, $\delta_2$, defined by the right-hand side, so that the above identities then read:
\[ \beta = \delta\alpha, \qquad \phi = \delta_2\alpha. \]
The algebra $A$ is commutative if and only if $\delta\alpha = 0$; it is associative if and only if $\delta_2\alpha = 0$. The cohomological meaning of the operator $\delta_2$ will be discussed in the Appendix.
Note also that formulæ (3.1) and (3.2) are known in linear algebra and usually called polarization. This is the way one obtains a bilinear form from a quadratic one and a trilinear form from a cubic one, respectively.
Example 3.2. (a) The classical algebras of quaternions $\mathbb{H}$ and of octonions $\mathbb{O}$ have generating functions. They are of the form:
\[ \alpha(x) = \begin{cases} 1, & x \neq 0, \\ 0, & x = 0. \end{cases} \]
It is amazing that such simple functions contain the full information about the structure of $\mathbb{H}$ and $\mathbb{O}$.
(b) The generating function of $C_n$ is as follows:
\[ \alpha_C(x) = \sum_{1 \leq i \leq j \leq n} x_i x_j. \tag{3.3} \]
Indeed, one checks that the binary function $\beta$ defined by (3.1) is exactly the skew-symmetrization of the function $f = \sum_{1 \leq i \leq j \leq n} x_i y_j$. The function $\phi$ defined by (3.2) is identically zero, since $\alpha$ is a quadratic polynomial.
The most important feature of the notion of generating function is the following. In the complex case, the generating function contains the full information about the algebra.
Theorem 1. If $A$ and $A'$ are two complex twisted group algebras with the same generating function, then $A$ and $A'$ are isomorphic as graded algebras.
This theorem will be proved in Sect. 3.3. In the real case, the generating function determines the algebra up to the signature.
3.2. The signature. Consider a twisted group algebra $A = (\mathbb{K}[(\mathbb{Z}_2)^n], f)$. We will always use the following set of generators of $A$:
\[ u_i = u_{(0, \ldots, 0, 1, 0, \ldots, 0)}, \tag{3.4} \]
where 1 stands at the $i$-th position. One has $u_i^2 = \pm 1$, the sign being determined by $f$. The signature is the data of the signs of the squares of the generators $u_i$.
Definition 3.3. We say that the twisting functions $f$ and $f'$ differ by a signature if one has
\[ f(x, y) - f'(x, y) = x_{i_1} y_{i_1} + \cdots + x_{i_p} y_{i_p}, \tag{3.5} \]
where $p \leq n$ is an integer.
Note that $f - f'$ as above is a non-trivial 2-cocycle, for $p \geq 1$. The quasialgebra structures defined by (2.3) and (2.4) are identically the same: $\beta = \beta'$ and $\phi = \phi'$.
The signature represents the main difference between the twisted group algebras over $\mathbb{C}$ and $\mathbb{R}$.
Proposition 3.4. If $A = (\mathbb{C}[(\mathbb{Z}_2)^n], f)$ and $A' = (\mathbb{C}[(\mathbb{Z}_2)^n], f')$ are complex twisted group algebras such that $f$ and $f'$ differ by a signature, then $A$ and $A'$ are isomorphic as graded algebras.
Proof. Let us assume $p = 1$ in (3.5), i.e., $f(x, y) - f'(x, y) = x_{i_1} y_{i_1}$; the general case will then follow by induction. Let $u_x$, resp. $u'_x$, be the standard basis elements of $A$, resp. $A'$. Let us consider the map $\theta : A \to A'$ defined by
\[ \theta(u_x) = \begin{cases} \sqrt{-1}\; u'_x, & \text{if } x_{i_1} = 1, \\ u'_x, & \text{otherwise.} \end{cases} \]
Note that one can write $\theta(u_x) = \sqrt{-1}^{\,x_{i_1}} u'_x$, for all $x$. Let us show that $\theta$ is a (graded) isomorphism between $A$ and $A'$. On the one hand,
\[ \theta(u_x \cdot u_y) = (-1)^{f(x, y)}\, \theta(u_{x+y}) = (-1)^{f(x, y)} \sqrt{-1}^{\,(x_{i_1} + y_{i_1})}\, u'_{x+y}. \]
On the other hand,
\[ \theta(u_x) \cdot \theta(u_y) = \sqrt{-1}^{\,x_{i_1}} u'_x \cdot \sqrt{-1}^{\,y_{i_1}} u'_y = \sqrt{-1}^{\,x_{i_1}} \sqrt{-1}^{\,y_{i_1}} (-1)^{f'(x, y)}\, u'_{x+y}. \]
Using the following (surprising) formula
\[ \frac{\sqrt{-1}^{\,x_{i_1}} \sqrt{-1}^{\,y_{i_1}}}{\sqrt{-1}^{\,(x_{i_1} + y_{i_1})}} = (-1)^{x_{i_1} y_{i_1}}, \tag{3.6} \]
we obtain $\theta(u_x \cdot u_y) = \theta(u_x) \cdot \theta(u_y)$. To understand (3.6), beware that the power $x_{i_1} + y_{i_1}$ in the denominator is taken modulo 2.
In the real case, the algebras $A$ and $A'$ can be non-isomorphic but can also be isomorphic. We will encounter both situations throughout this paper.
3.3. Isomorphic twisted algebras. Let us stress that all the isomorphisms between the twisted group algebras we consider in this section preserve the grading. Such isomorphisms are called graded isomorphisms.
It is natural to ask under what condition two functions $f$ and $f'$ define isomorphic algebras. Unfortunately, we do not know the complete answer to this question and give here two conditions which are sufficient but certainly not necessary.
Lemma 3.5. If $f - f' = \delta b$ is a coboundary, i.e., $b : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ is a function such that
\[ f(x, y) - f'(x, y) = b(x + y) + b(x) + b(y), \]
then the corresponding twisted algebras are isomorphic.
Proof. The isomorphism is given by the map $u_x \mapsto (-1)^{b(x)} u_x$, for all $x \in (\mathbb{Z}_2)^n$.
Lemma 3.6. Given a group automorphism $T : (\mathbb{Z}_2)^n \to (\mathbb{Z}_2)^n$, the functions $f$ and
\[ f'(x, y) = f(T(x), T(y)) \]
define isomorphic twisted group algebras.
Proof. The isomorphism is given by the map $u_x \mapsto u_{T^{-1}(x)}$, for all $x \in (\mathbb{Z}_2)^n$.
Note that the automorphisms of $(\mathbb{Z}_2)^n$ are just arbitrary invertible linear transformations.
We are ready to answer the question formulated at the end of Sect. 2.2.
Proposition 3.7. Given two twisted algebras $A = (\mathbb{K}[(\mathbb{Z}_2)^n], f)$ and $A' = (\mathbb{K}[(\mathbb{Z}_2)^n], f')$, the corresponding quasialgebra structures coincide, i.e., $\beta' = \beta$ and $\phi' = \phi$, if and only if
\[ f(x, y) - f'(x, y) = \delta b(x, y) + \sum_{1 \leq i \leq n} \lambda_i\, x_i y_i, \]
where $b : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ is an arbitrary function, and $\lambda_i$ are coefficients in $\mathbb{Z}_2$. In particular, if $\mathbb{K} = \mathbb{C}$, then $A \cong A'$.
Proof. If the quasialgebra structures coincide, then $\phi' = \phi$ implies $\delta f' = \delta f$, so that $f - f'$ is a 2-cocycle. We use the information about the second cohomology space $H^2((\mathbb{Z}_2)^n; \mathbb{Z}_2)$ (see Sect. 2.4). Up to a 2-coboundary, every non-trivial 2-cocycle is a linear combination of the following bilinear maps: $(x, y) \mapsto x_i y_i$, for some $i$, and $(x, y) \mapsto x_k y_\ell$, for some $k < \ell$. One deduces that $f - f'$ is of the form
\[ f(x, y) - f'(x, y) = \delta b(x, y) + \sum_{1 \leq i \leq n} \lambda_i\, x_i y_i + \sum_{k < \ell} \mu_{k\ell}\, x_k y_\ell. \]
Since $\beta' = \beta$, one observes that $f - f'$ is symmetric, so that the last summand vanishes, while the second summand is nothing but the signature. The isomorphism in the complex case then follows from Proposition 3.4 and Lemma 3.5.
Conversely, if $f$ and $f'$ are related by the above expression, then the quasialgebra structures obviously coincide.
Now, we can deduce Theorem 1 as a corollary of Proposition 3.7. Indeed, if $A$ and $A'$ have the same generating function $\alpha$, then the quasialgebra structures of $A$ and $A'$ are the same.
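The two ingredients of this argument, Lemma 3.5 and Proposition 3.4, can be tested on a small example. The sketch below is illustrative only: the functions playing the roles of $f'$ and $b$ are random tables (assumptions, normalised at 0), and a scalar multiple $\sqrt{-1}^{\,e} u_x$ is encoded as the pair `(e, x)` with `e` taken modulo 4, so that no floating-point arithmetic is involved. It checks that $u_x \mapsto (-1)^{b(x)} u'_x$ and $u_x \mapsto \sqrt{-1}^{\,x_1} u'_x$ respect the products on basis elements.

```python
import itertools
import random

n = 3
G = list(itertools.product((0, 1), repeat=n))
ZERO = (0,) * n

def add(x, y):
    """Componentwise addition in (Z_2)^n."""
    return tuple((a + b) % 2 for a, b in zip(x, y))

# Hypothetical data: f' is a random normalised twisting function, b a random cochain.
rng = random.Random(2)
fp_table = {(x, y): 0 if ZERO in (x, y) else rng.randint(0, 1)
            for x in G for y in G}
b_table = {x: 0 if x == ZERO else rng.randint(0, 1) for x in G}

def f_prime(x, y):
    return fp_table[(x, y)]

def f_cob(x, y):
    """f = f' + delta b, as in Lemma 3.5."""
    return (f_prime(x, y) + b_table[add(x, y)] + b_table[x] + b_table[y]) % 2

def f_sig(x, y):
    """f = f' + x_1 y_1, a signature change with p = 1 as in Proposition 3.4."""
    return (f_prime(x, y) + x[0] * y[0]) % 2

def mul(f, a, b):
    """Product of i^e u_x and i^d u_y in the algebra twisted by f (exponents mod 4)."""
    (e, x), (d, y) = a, b
    return ((e + d + 2 * f(x, y)) % 4, add(x, y))

def is_homomorphism(f_src, f_dst, shift):
    """Check that u_x -> i^{shift(x)} u'_x respects products of basis elements."""
    for x, y in itertools.product(G, repeat=2):
        e, s = mul(f_src, (0, x), (0, y))
        image_of_product = ((e + shift(s)) % 4, s)
        product_of_images = mul(f_dst, (shift(x), x), (shift(y), y))
        if image_of_product != product_of_images:
            return False
    return True

lemma_3_5 = is_homomorphism(f_cob, f_prime, lambda x: 2 * b_table[x])
prop_3_4 = is_homomorphism(f_sig, f_prime, lambda x: x[0])
print(lemma_3_5, prop_3_4)
```

In the second check, the exponent of the image of $u_{x+y}$ uses $x_1 + y_1$ reduced modulo 2 (this is what `add` returns), which is precisely the point of formula (3.6).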
3.4. Involutions. Let us mention one more property of generating functions. Recall that an involution on an algebra $A$ is a linear map $a \mapsto \bar{a}$ from $A$ to $A$ such that $\overline{ab} = \bar{b}\, \bar{a}$ and $\bar{1} = 1$, i.e., an involution is an anti-automorphism. Every generating function defines a graded involution of the following particular form:
\[ \overline{u_x} = (-1)^{\alpha(x)}\, u_x. \tag{3.7} \]
Proposition 3.8. If $\alpha$ is a generating function, then the linear map defined by formula (3.7) is an involution.
Proof. Using (2.1) and (3.1), one has
\[ \overline{u_x u_y} = (-1)^{\alpha(x+y)}\, u_x u_y = (-1)^{\alpha(x+y) + \beta(x, y)}\, u_y u_x = (-1)^{\alpha(x) + \alpha(y)}\, u_y u_x = \overline{u_y}\; \overline{u_x}. \]
Hence the result.
In particular, the generating functions of $\mathbb{H}$ and $\mathbb{O}$, see Example 3.2, correspond to the canonical involutions, i.e., to the conjugation.

4. The Series $\mathbb{O}_n$ and $\mathbb{M}_n$: Characterization

In this section, we formulate our first main result. Theorem 2 concerns the general properties of twisted $(\mathbb{Z}_2)^n$-algebras with $\phi = \delta f$ symmetric. This result distinguishes a class of algebras of which our algebras of $\mathbb{O}$- and $\mathbb{M}$-series are the principal representatives. We will also present several different ways to define the algebras $\mathbb{O}_n$ and $\mathbb{M}_n$, as well as $\mathbb{O}_{p,q}$ and $\mathbb{M}_{p,q}$.
4.1. Symmetric quasialgebras. An arbitrary twisted group algebra leads to a quasialgebra structure. One needs to assume some additional conditions on the “twisting” function $f$ in order to obtain an interesting class of algebras. We will be interested in the case where the function $\phi = \delta f$, see formula (2.4), is symmetric:
\[ \phi(x, y, z) = \phi(y, x, z) = \phi(x, z, y). \tag{4.1} \]
This condition seems to be very natural: it means that if three elements $u_x$, $u_y$ and $u_z$ form an antiassociative triplet, i.e., one has $u_x \cdot (u_y \cdot u_z) = -(u_x \cdot u_y) \cdot u_z$, then this property is independent of the ordering of the elements in the triplet.
An immediate consequence of the identity (2.5) is that, if $\phi$ is symmetric, then it is completely determined by $\beta$:
\[ \phi(x, y, z) = \beta(x + y, z) + \beta(x, z) + \beta(y, z) = \beta(x, y + z) + \beta(x, y) + \beta(x, z), \tag{4.2} \]
as the “defect of linearity” in each argument.
The following statement is our main result about the general structure of a twisted group algebra $A = (\mathbb{K}[(\mathbb{Z}_2)^n], f)$. We formulate this result in the slightly more general context of $(\mathbb{Z}_2)^n$-graded quasialgebras.
Theorem 2. Given a $(\mathbb{Z}_2)^n$-graded quasialgebra $A$, the following conditions are equivalent.
(i) The function $\phi$ is symmetric.
(ii) The algebra $A$ has a generating function.
This theorem will be proved in Sect. 6.1.
It is now natural to ask under what condition a function $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ is a generating function for some twisted group algebra. The following statement provides a necessary and sufficient condition.
Proposition 4.1. Given a function $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$, there exists a twisted group algebra $A$ such that $\alpha$ is a generating function of $A$, if and only if $\alpha$ is a polynomial of degree $\leq 3$.
This proposition will be proved in Sect. 6.2. Furthermore, we will show in Sect. 6.3 that the generating function can be chosen in a canonical way.
Theorem 2 has a number of consequences. In particular, it implies two more important properties of $\phi$. The function $\phi$ is called trilinear if it satisfies
\[ \phi(x + y, z, t) = \phi(x, z, t) + \phi(y, z, t), \tag{4.3} \]
and similarly in each argument. The function $\phi$ is alternate if it satisfies
\[ \phi(x, x, y) = \phi(x, y, x) = \phi(y, x, x) = 0, \tag{4.4} \]
for all $x, y \in (\mathbb{Z}_2)^n$. Let us stress that an algebra satisfying (4.4) is graded-alternative, i.e.,
\[ u_x \cdot (u_x \cdot u_y) = u_x^2 \cdot u_y, \qquad (u_y \cdot u_x) \cdot u_x = u_y \cdot u_x^2, \]
for all homogeneous elements $u_x$ and $u_y$. This does not imply that the algebra is alternative. Let us mention that alternative graded quasialgebras were classified in [3].
The following result is a consequence of Theorem 2 and Proposition 4.1.
Corollary 4.2. If the function $\phi$ is symmetric, then $\phi$ is trilinear and alternate.
This corollary will be proved in Sect. 6.2.
Our next goal is to study two series of algebras with symmetric function $\phi = \delta f$. Let us notice that the Cayley-Dickson algebras are not of this type, cf. [4].
4.2. The generating functions of the algebras $\mathbb{O}_n$ and $\mathbb{M}_n$. We already defined the complex algebras $\mathbb{O}_n$ and $\mathbb{M}_n$, with $n \geq 3$, and the real algebras $\mathbb{O}_{p,q}$ and $\mathbb{M}_{p,q}$, see the Introduction, formulæ (1.3) and (1.4). Let us now calculate the associated function $\phi = \delta f$, which is exactly the same for $f = f_{\mathbb{O}}$ or $f_{\mathbb{M}}$. One obtains
\[ \phi(x, y, z) = \sum_{i \neq j \neq k} x_i y_j z_k. \]
This function is symmetric in $x, y, z$ and Theorem 2 implies that the algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ have generating functions. The explicit formulæ are as follows:
\[ \alpha_{\mathbb{O}}(x) = \sum_{1 \leq i < j < k \leq n} x_i x_j x_k + \sum_{1 \leq i < j \leq n} x_i x_j + \sum_{1 \leq i \leq n} x_i, \tag{4.5} \]
and
\[ \alpha_{\mathbb{M}}(x) = \sum_{1 \leq i < j < k \leq n} x_i x_j x_k + \sum_{1 \leq i \leq n} x_i. \tag{4.6} \]
Note that the generating functions $\alpha_{\mathbb{O}}$ and $\alpha_{\mathbb{M}}$ are $S_n$-invariant with respect to the natural action of the group of permutations $S_n$ on $(\mathbb{Z}_2)^n$.
Thanks to the $S_n$-invariance, we can give a very simple description of the above functions. Denote by $|x|$ the weight of $x \in (\mathbb{Z}_2)^n$ (i.e., the number of 1 entries in $x$ written as an $n$-tuple of 0 and 1). The above generating functions, together with that of the Clifford algebras, depend only on $|x|$ and are 4-periodic:
\[ \begin{array}{c|ccccccccc} |x| & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & \cdots \\ \hline \alpha_C & 1 & 1 & 0 & 0 & 1 & 1 & 0 & 0 & \cdots \\ \alpha_{\mathbb{O}} & 1 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & \cdots \\ \alpha_{\mathbb{M}} & 1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & \cdots \end{array} \tag{4.7} \]
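Table (4.7) can be recovered from the explicit formulæ (3.3), (4.5) and (4.6). The following sketch, an illustration rather than part of the paper, evaluates the three polynomials on every element of $(\mathbb{Z}_2)^8$ and compares the result with the table row indexed by the weight. The helper `elementary` (the $k$-th elementary symmetric polynomial reduced modulo 2) is a hypothetical name; note that the diagonal terms $x_i x_i = x_i$ in (3.3) contribute the linear part of $\alpha_C$.

```python
import itertools

def elementary(x, k):
    """k-th elementary symmetric polynomial of the entries of x, modulo 2."""
    return sum(all(x[i] for i in idx)
               for idx in itertools.combinations(range(len(x)), k)) % 2

def alpha_C(x):
    """Formula (3.3): sum over i <= j of x_i x_j, using x_i^2 = x_i."""
    return (elementary(x, 2) + elementary(x, 1)) % 2

def alpha_O(x):
    """Formula (4.5)."""
    return (elementary(x, 3) + elementary(x, 2) + elementary(x, 1)) % 2

def alpha_M(x):
    """Formula (4.6)."""
    return (elementary(x, 3) + elementary(x, 1)) % 2

# Rows of Table (4.7) for weights 1, ..., 8.
TABLE = {
    alpha_C: [1, 1, 0, 0, 1, 1, 0, 0],
    alpha_O: [1, 1, 1, 0, 1, 1, 1, 0],
    alpha_M: [1, 0, 0, 0, 1, 0, 0, 0],
}

def matches_table(n=8):
    """Compare each generating function with its row, for every x of weight >= 1."""
    for x in itertools.product((0, 1), repeat=n):
        weight = sum(x)
        if weight == 0:
            continue
        for alpha, row in TABLE.items():
            if alpha(x) != row[weight - 1]:
                return False
    return True

print(matches_table())
```

The 4-periodicity is visible in the rows themselves: the entries for weights 5 to 8 repeat those for weights 1 to 4.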
This table is the simplest way to use the generating function in any calculation. One can deduce the explicit formulæ (3.3), (4.5) and (4.6) directly from Table (4.7).
4.3. Characterization of the algebras of the $\mathbb{O}$- and $\mathbb{M}$-series. Let us formulate two uniqueness results that allow us to give axiomatic definitions of the introduced algebras.
Recall that the group of permutations $S_n$ acts on $(\mathbb{Z}_2)^n$ in a natural way. We will characterize the algebras of the $\mathbb{O}$- and $\mathbb{M}$-series in terms of $S_n$-invariance. We observe that, in spite of the fact that the functions $f_{\mathbb{O}}$ and $f_{\mathbb{M}}$ are not $S_n$-invariant, the corresponding algebras are. However, we believe that $S_n$-invariance is a technical assumption and can be relaxed, see the Appendix for a discussion.
The first uniqueness result is formulated directly in terms of the twisting function $f$. We study the unital twisted algebras $A = (\mathbb{K}[(\mathbb{Z}_2)^n], f)$ satisfying the following conditions:
(1) The function $f$ is a polynomial of degree 3.
(2) The algebra $A$ is graded-alternative, see (4.4).
(3) The set of relations between the generators (3.4) of $A$ is invariant with respect to the action of the group of permutations $S_n$.
Since we will use the relations of degree 2 or 3, the condition (3) means that we assume that the generators either all pairwise commute or all pairwise anticommute and that one has either
\[ u_i \cdot (u_j \cdot u_k) = (u_i \cdot u_j) \cdot u_k, \qquad \text{for all } i \neq j \neq k, \]
or
\[ u_i \cdot (u_j \cdot u_k) = -(u_i \cdot u_j) \cdot u_k, \qquad \text{for all } i \neq j \neq k. \]
Proposition 4.3. The algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ are the only twisted $(\mathbb{Z}_2)^n$-algebras satisfying the above three conditions.
Proof. Since the algebra $A$ is unital, we have $f(0, .) = f(., 0) = 0$. This implies that $f$ contains no constant term and no terms depending only on the $x$ (or only on the $y$) variables. The most general twisting function $f$ of degree 3 is of the form
\[ \begin{aligned} f(x, y) = {} & \sum_{i < j < k} \Big( \lambda^1_{ijk}\, x_i x_j y_k + \lambda^2_{ijk}\, x_i y_j x_k + \lambda^3_{ijk}\, y_i x_j x_k \\ & \qquad + \mu^1_{ijk}\, y_i y_j x_k + \mu^2_{ijk}\, y_i x_j y_k + \mu^3_{ijk}\, x_i y_j y_k \Big) + \sum_{i, j} \nu_{ij}\, x_i y_j, \end{aligned} \]
where $\lambda^e_{ijk}$, $\mu^e_{ijk}$ and $\nu_{ij}$ are arbitrary coefficients 0 or 1. Indeed, the expression of $f$ cannot contain the monomials $x_i x_j y_j$ and $x_i y_i y_j$ because of the condition (2).
By Lemma 3.5, adding a coboundary to $f$ gives an isomorphic algebra (as $(\mathbb{Z}_2)^n$-graded algebras). We may assume that for any $i < j < k$, the coefficient $\mu^1_{ijk} = 0$ (otherwise, we add the coboundary of $b(x) = x_i x_j x_k$).
We now compute $\phi = \delta f$ and obtain:
\[ \begin{aligned} \phi(x, y, z) = \sum_{i < j < k} \Big( & (\lambda^1_{ijk} + \mu^3_{ijk})\, x_i y_j z_k + (\lambda^2_{ijk} + \mu^3_{ijk})\, x_i z_j y_k + (\lambda^1_{ijk} + \mu^2_{ijk})\, y_i x_j z_k \\ & + \lambda^2_{ijk}\, y_i z_j x_k + (\lambda^3_{ijk} + \mu^2_{ijk})\, z_i x_j y_k + \lambda^3_{ijk}\, z_i y_j x_k \Big). \end{aligned} \]
We can assume that
\[ u_i \cdot (u_j \cdot u_k) = -(u_i \cdot u_j) \cdot u_k, \qquad i \neq j \neq k. \]
Indeed, if $u_i \cdot (u_j \cdot u_k) = (u_i \cdot u_j) \cdot u_k$ for some values of $i, j, k$ such that $i \neq j \neq k$, then (3) implies the same associativity relation for all $i, j, k$. Since $\phi$ is trilinear, this means that $A$ is associative, so that $\phi = 0$. This can only happen if $\lambda^e_{ijk} = \mu^e_{ijk} = 0$ for all $i, j, k$, so that $\deg f = 2$.
In other words, we obtain a system of equations $\phi(x_i, y_j, z_k) = 1$ for all $i, j, k$. This system has a unique solution $\lambda^1_{ijk} = \lambda^2_{ijk} = \lambda^3_{ijk} = 1$ and $\mu^2_{ijk} = \mu^3_{ijk} = 0$.
Finally, if all of the generators commute, we obtain $\nu_{ij} = \nu_{ji}$, so that $\nu_{ij} = 0$ up to a coboundary, and hence $f = f_{\mathbb{M}}$. If all of the generators anticommute, again up to a coboundary, we obtain $\nu_{ij} = 1$ if and only if $i < j$, so that $f = f_{\mathbb{O}}$.
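The outcome of this proof can be checked by brute force. The sketch below takes the solution $\lambda^1 = \lambda^2 = \lambda^3 = 1$, $\mu = 0$, with $\nu_{ij} = 1$ exactly for $i < j$ in the anticommuting case and $\nu = 0$ in the commuting case, and verifies that $\delta f$ equals the sum of $x_i y_j z_k$ over pairwise distinct $i, j, k$. These twisting functions are read off from the proof and are representatives only up to coboundary and signature (an assumption, since formulæ (1.3) and (1.4) are not reproduced in this section); the function names are hypothetical.

```python
import itertools

n = 4
G = list(itertools.product((0, 1), repeat=n))

def add(x, y):
    """Componentwise addition in (Z_2)^n."""
    return tuple((a + b) % 2 for a, b in zip(x, y))

def cubic(x, y):
    """Cubic part with lambda^1 = lambda^2 = lambda^3 = 1 and mu = 0."""
    return sum(x[i] * x[j] * y[k] + x[i] * y[j] * x[k] + y[i] * x[j] * x[k]
               for i, j, k in itertools.combinations(range(n), 3))

def f_anticommuting(x, y):
    """Assumed representative for the O-series: nu_ij = 1 iff i < j."""
    quadratic = sum(x[i] * y[j] for i, j in itertools.combinations(range(n), 2))
    return (cubic(x, y) + quadratic) % 2

def f_commuting(x, y):
    """Assumed representative for the M-series: nu = 0."""
    return cubic(x, y) % 2

def delta(f):
    """Coboundary of a 2-cochain, formula (2.4)."""
    def phi(x, y, z):
        return (f(y, z) + f(add(x, y), z) + f(x, add(y, z)) + f(x, y)) % 2
    return phi

def phi_expected(x, y, z):
    """Sum of x_i y_j z_k over pairwise distinct indices i, j, k."""
    return sum(x[i] * y[j] * z[k]
               for i, j, k in itertools.permutations(range(n), 3)) % 2

def same_phi(f):
    phi = delta(f)
    return all(phi(x, y, z) == phi_expected(x, y, z)
               for x, y, z in itertools.product(G, repeat=3))

print(same_phi(f_anticommuting), same_phi(f_commuting))
```

The quadratic part is bilinear, hence a 2-cocycle, so it does not contribute to $\delta f$; this is why both representatives yield the same function $\phi$.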
The second uniqueness result is formulated in terms of the generating function.
Proposition 4.4. The algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ and the algebras $\mathbb{O}_{p,q}$ and $\mathbb{M}_{p,q}$ with $p + q = n$ are the only non-associative twisted $(\mathbb{Z}_2)^n$-algebras over the field of scalars $\mathbb{C}$ or $\mathbb{R}$ that admit an $S_n$-invariant generating function.
Proof. By Proposition 4.1, we know that the generating function is a polynomial of degree $\leq 3$. Every $S_n$-invariant polynomial $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ of degree $\leq 3$ is a linear combination $\alpha = \lambda_3 \alpha_3 + \lambda_2 \alpha_2 + \lambda_1 \alpha_1 + \lambda_0 \alpha_0$ of the following four functions:
\[ \alpha_3(x) = \sum_{1 \leq i < j < k \leq n} x_i x_j x_k, \qquad \alpha_2(x) = \sum_{1 \leq i < j \leq n} x_i x_j, \qquad \alpha_1(x) = \sum_{1 \leq i \leq n} x_i, \qquad \alpha_0(x) = 1. \]
Since $\alpha(0) = 0$, cf. Sect. 3.1, one obtains $\lambda_0 = 0$. The function $\alpha_1$ does not contribute to the quasialgebra structure $\beta = \delta\alpha$ and $\phi = \delta_2\alpha$, so that $\lambda_1$ can be chosen arbitrarily. Finally, $\lambda_3 \neq 0$ since otherwise $\phi = 0$ and the corresponding algebra is associative. We obtain the functions $\alpha_{\mathbb{O}} = \alpha_3 + \alpha_2 + \alpha_1$ and $\alpha_{\mathbb{M}} = \alpha_3 + \alpha_1$ as the only possible $S_n$-invariant generating functions that define non-associative algebras.
Note that relaxing the non-associativity condition $\phi \not\equiv 0$ will also recover the Clifford algebras $C_n$ and $C_{p,q}$ and the group algebra itself.
4.4. Generators and relations. Let us now give another definition of the complex algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ and of the real algebras $\mathbb{O}_{p,q}$ and $\mathbb{M}_{p,q}$. We use a purely algebraic approach and present our algebras in terms of generators and relations.
Consider the generators (3.4). The generators of $\mathbb{O}_{p,q}$ and $\mathbb{M}_{p,q}$ square to $\pm 1$. More precisely,
\[ u_i^2 = \begin{cases} 1, & i \leq p, \\ -1, & \text{otherwise,} \end{cases} \tag{4.8} \]
where $1 = u_{(0, \ldots, 0)}$ is the unit. For the complex algebras $\mathbb{O}_n$ and $\mathbb{M}_n$, one can set $u_i^2 = 1$ for all $i$.
The rest of the relations are independent of the signature. The main difference between the series $\mathbb{O}$ and $\mathbb{M}$ is that the generators anticommute in the $\mathbb{O}$-case and commute in the $\mathbb{M}$-case:
\[ u_i \cdot u_j = -u_j \cdot u_i \ \text{ in } \mathbb{O}_n, \mathbb{O}_{p,q}, \qquad u_i \cdot u_j = u_j \cdot u_i \ \text{ in } \mathbb{M}_n, \mathbb{M}_{p,q}. \tag{4.9} \]
The third-order relations are determined by the function $\phi$ and therefore these relations are the same for both series:
\[ u_i \cdot (u_i \cdot u_j) = u_i^2 \cdot u_j, \tag{4.10} \]
\[ u_i \cdot (u_j \cdot u_k) = -(u_i \cdot u_j) \cdot u_k, \tag{4.11} \]
where $i \neq j \neq k$ in the second relation. Note that the antiassociativity relation in (4.11) is the reason why the algebras from the $\mathbb{M}$ series, generated by commuting elements, can nevertheless be simple.
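The relations (4.8)–(4.11) can be checked on the generators for a concrete twisting function. In the sketch below, which is an illustration under stated assumptions, the function `f` is built from the cubic part found in the proof of Proposition 4.3, a quadratic part $\sum_{i<j} x_i y_j$ in the anticommuting case, and a signature term $\sum_{i>p} x_i y_i$ chosen so that (4.8) holds. This is an assumed representative rather than a quotation of formulæ (1.3) and (1.4), and the values `n = 5`, `p = 2` are arbitrary.

```python
import itertools

n, p = 5, 2          # illustrative values only
ZERO = (0,) * n

def add(x, y):
    """Componentwise addition in (Z_2)^n."""
    return tuple((a + b) % 2 for a, b in zip(x, y))

def e(i):
    """Degree of the generator u_{i+1} (0-based index i)."""
    return tuple(1 if j == i else 0 for j in range(n))

def f(x, y, commuting):
    """Assumed twisting function: cubic part + optional quadratic part + signature."""
    cubic = sum(x[i] * x[j] * y[k] + x[i] * y[j] * x[k] + y[i] * x[j] * x[k]
                for i, j, k in itertools.combinations(range(n), 3))
    quad = 0 if commuting else sum(
        x[i] * y[j] for i, j in itertools.combinations(range(n), 2))
    sig = sum(x[i] * y[i] for i in range(p, n))   # generators beyond the p-th square to -1
    return (cubic + quad + sig) % 2

def mul(a, b, commuting):
    """Product of signed basis elements (s, x) ~ (-1)^s u_x."""
    (s, x), (t, y) = a, b
    return ((s + t + f(x, y, commuting)) % 2, add(x, y))

def neg(a):
    return ((a[0] + 1) % 2, a[1])

def relations_hold(commuting):
    m = lambda a, b: mul(a, b, commuting)
    u = [(0, e(i)) for i in range(n)]
    one = (0, ZERO)
    for i in range(n):
        if m(u[i], u[i]) != (one if i < p else neg(one)):             # (4.8)
            return False
    for i, j in itertools.permutations(range(n), 2):
        ba = m(u[j], u[i])
        if m(u[i], u[j]) != (ba if commuting else neg(ba)):           # (4.9)
            return False
        if m(u[i], m(u[i], u[j])) != m(m(u[i], u[i]), u[j]):          # (4.10)
            return False
    for i, j, k in itertools.permutations(range(n), 3):
        if m(u[i], m(u[j], u[k])) != neg(m(m(u[i], u[j]), u[k])):     # (4.11)
            return False
    return True

print(relations_hold(commuting=False), relations_hold(commuting=True))
```

Only the quadratic part distinguishes the two cases, in agreement with the remark that the third-order relations are the same for both series.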
Recall that a Clifford algebra is an algebra with $n$ anticommuting generators satisfying the relations (4.8) and the identity of associativity. We will now give a very similar definition of the algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ (as well as $\mathbb{O}_{p,q}$ and $\mathbb{M}_{p,q}$). The associativity is replaced by the identity of quasialgebra.
Define a family of algebras $A$ with $n$ generators $u_1, \ldots, u_n$. Consider the monoid $X_n$ of non-associative monomials in $u_i$ and define a function $\phi : X_n \times X_n \times X_n \to \mathbb{Z}_2$ satisfying the following two properties:
(1) $\phi(u_i, u_j, u_k) = 1$ if $i \neq j \neq k$, and $\phi(u_i, u_j, u_k) = 0$ otherwise.
(2) $\phi(u \cdot u', v, w) = \phi(u, v, w) + \phi(u', v, w)$, and similarly in each variable.
Such a function exists and is unique. Moreover, $\phi$ is symmetric.
Define an algebra $A^{\mathbb{C}}$ or $A^{\mathbb{R}}$ (complex or real), generated by $u_1, \ldots, u_n$, that satisfies the relations (4.8) together with one of the following two relations. All the generators either anticommute: $u_i \cdot u_j = -u_j \cdot u_i$, where $i \neq j$, or commute: $u_i \cdot u_j = u_j \cdot u_i$, where $i \neq j$. We will also assume the identity
\[ u \cdot (v \cdot w) = (-1)^{\phi(u, v, w)}\, (u \cdot v) \cdot w, \]
for all monomials $u, v, w$.
Proposition 4.5. If the generators anticommute, then $A^{\mathbb{C}} \cong \mathbb{O}_n$ and $A^{\mathbb{R}} \cong \mathbb{O}_{p,q}$. If the generators commute, then $A^{\mathbb{C}} \cong \mathbb{M}_n$ and $A^{\mathbb{R}} \cong \mathbb{M}_{p,q}$.
Proof. By definition of $A = A^{\mathbb{C}}$ (resp. $A^{\mathbb{R}}$), the elements
\[ u_{i_1 \ldots i_k} = u_{i_1} \cdot (u_{i_2} \cdot (\cdots (u_{i_{k-1}} \cdot u_{i_k}) \cdots)), \]
where $i_1 < i_2 < \cdots < i_k$, form a basis of $A$. Therefore, $\dim A = 2^n$. The linear map sending the generators of $A$ to the generators (3.4) of $\mathbb{O}_n$ or $\mathbb{M}_n$ ($\mathbb{O}_{p,q}$ or $\mathbb{M}_{p,q}$, respectively) is a homomorphism, since the function $\phi$ corresponding to these algebras is symmetric and trilinear. It sends the above basis of $A$ to that of $\mathbb{O}_n$ or $\mathbb{M}_n$ ($\mathbb{O}_{p,q}$ or $\mathbb{M}_{p,q}$, respectively).

5. The Series $\mathbb{O}_n$ and $\mathbb{M}_n$: Properties

In this section, we study properties of the algebras of the series $\mathbb{O}$ and $\mathbb{M}$. The main result is Theorem 3 providing a criterion of simplicity.
We describe the first algebras of the series and give the list of isomorphisms in lower dimensions. We also define a non-oriented graph encoding the structure of the algebra. Finally, we formulate open problems.
5.1. Criterion of simplicity. The most important property of the defined algebras that we study is simplicity. Let us stress that we understand simplicity in the usual sense: an algebra is called simple if it contains no proper ideal. Note that in the case of commutative associative algebras, simplicity and division are equivalent notions; in our situation, the notion of simplicity is much weaker.
Remark 5.1. This notion should not be confused with the notion of graded-simple algebra. The latter notion is much weaker and means that the algebra contains no graded ideal; however, this notion is rather a property of the grading and not of the algebra itself.
The following statement is the second main result of this paper. We will treat the complex and the real cases independently.
Theorem 3. (i) The algebra $\mathbb{O}_n$ (resp. $\mathbb{M}_n$) is simple if and only if $n \neq 4m$ (resp. $n \neq 4m + 2$). One also has
\[ \mathbb{O}_{4m} \cong \mathbb{O}_{4m-1} \oplus \mathbb{O}_{4m-1}, \qquad \mathbb{M}_{4m+2} \cong \mathbb{M}_{4m+1} \oplus \mathbb{M}_{4m+1}. \]
(ii) The algebra $\mathbb{O}_{p,q}$ is simple if and only if one of the following conditions is satisfied:
(1) $p + q \neq 4m$,
(2) $p + q = 4m$ and $p, q$ are odd.
(iii) The algebra $\mathbb{M}_{p,q}$ is simple if and only if one of the following conditions is satisfied:
(1) $p + q \neq 4m + 2$,
(2) $p + q = 4m + 2$ and $p, q$ are odd.
This theorem will be proved in Sect. 7.
The arguments developed in the proof of Theorem 3 allow us to link the complex and the real algebras in the particular cases below. Let us use the notation $\mathbb{O}^{\mathbb{R}}_n$ and $\mathbb{M}^{\mathbb{R}}_n$ when we consider the algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ as $2^{n+1}$-dimensional real algebras. We have the following statement.
Corollary 5.2. (i) If $p + q = 4m$ and $p, q$ are odd, then $\mathbb{O}_{p,q} \cong \mathbb{O}^{\mathbb{R}}_{p+q-1}$.
(ii) If $p + q = 4m + 2$ and $p, q$ are odd, then $\mathbb{M}_{p,q} \cong \mathbb{M}^{\mathbb{R}}_{p+q-1}$.
This statement is proved in Sect. 7.4.
Remark 5.3. To explain the meaning of the above statement, we notice that, in the case where the complex algebras split into a direct sum, the real algebras can still be simple. In this case, all the simple real algebras are isomorphic to the complex algebra with $n - 1$ generators. In particular, all the algebras $\mathbb{O}_{p,q}$ and $\mathbb{O}_{p',q'}$ with $p + q = p' + q' = 4m$ and $p$ and $p'$ odd are isomorphic to each other (and similarly for the $\mathbb{M}$-series). A very similar property holds for the Clifford algebras.
Theorem 3 immediately implies the following.
Corollary 5.4. The algebras $\mathbb{O}_n$ and $\mathbb{M}_n$ with even $n$ are not isomorphic.
This implies, in particular, that the real algebras $\mathbb{O}_{p,q}$ and $\mathbb{M}_{p',q'}$ with $p + q = p' + q' = 2m$ are not isomorphic.
5.2. The first algebras of the series. Let us consider the first examples of the introduced algebras. It is natural to ask if some of the introduced algebras can be isomorphic to the other ones.
Proposition 5.5. (i) For $n = 3$, one has:
\[ \mathbb{O}_{3,0} \cong \mathbb{O}_{2,1} \cong \mathbb{O}_{1,2} \not\cong \mathbb{O}_{0,3}. \]
The first three algebras are isomorphic to the algebra of split-octonions, while $\mathbb{O}_{0,3} \cong \mathbb{O}$.
(ii) For n = 4, one has: O4,0 ∼ = O2,2 ∼ = O3,0 ⊕ O3,0 ,
O0,4 ∼ = O0,3 ⊕ O0,3 .
In particular, $\mathbb{O}_{4,0}$ and $\mathbb{O}_{2,2}$ are not isomorphic to $\mathbb{O}_{0,4}$.

Proof. The above isomorphisms are a combination of the general isomorphisms of type (a) and (b), see Sect. 3.3. The involved automorphisms of $(\mathbb{Z}_2)^3$ and $(\mathbb{Z}_2)^4$ are, respectively,
\[ x_1' = x_1, \quad x_2' = x_1 + x_2, \quad x_3' = x_1 + x_2 + x_3, \]
and
\[ x_1' = x_1, \quad x_2' = x_1 + x_2, \quad x_3' = x_1 + x_2 + x_3, \quad x_4' = x_1 + x_2 + x_3 + x_4. \]
Then, the twisting functions of the above isomorphic algebras coincide modulo coboundary.

Let us notice that the very first algebras of the $\mathbb{O}$-series are all obtained as a combination of the algebras of octonions and split-octonions. In this sense, we do not obtain new algebras among them. In the $\mathbb{M}$-case, we have the following isomorphism.

Proposition 5.6. One has $\mathbb{M}_{1,2} \cong \mathbb{M}_{0,3}$.

Proof. This isomorphism can be obtained by the following automorphism of $(\mathbb{Z}_2)^3$:
\[ x_1' = x_1 + x_2 + x_3, \quad x_2' = x_2, \quad x_3' = x_3. \]
This algebra is not isomorphic to $\mathbb{O}_{0,3}$ or $\mathbb{O}_{3,0}$.

The next algebras, $\mathbb{O}_5$ and $\mathbb{M}_5$, as well as all of the real algebras $\mathbb{O}_{p,q}$ and $\mathbb{M}_{p,q}$ with $p + q = 5$, are not combinations of the classical algebras. Since these algebras are simple, they are not direct sums of lower-dimensional algebras. The next statement shows that these algebras are not tensor products of classical algebras. Note that the only "candidate" for an isomorphism of this kind is the tensor product of the octonion algebra and the algebra of complex $(2 \times 2)$-matrices.

Proposition 5.7. Neither of the algebras $\mathbb{O}_5$ and $\mathbb{M}_5$ is isomorphic to the tensor product of the octonion algebra $\mathbb{O}$ and the algebra $\mathbb{C}[2]$ of complex $(2 \times 2)$-matrices:
\[ \mathbb{O}_5 \not\cong \mathbb{O} \otimes \mathbb{C}[2], \qquad \mathbb{M}_5 \not\cong \mathbb{O} \otimes \mathbb{C}[2]. \]
Proof. Let us consider the element $u = u_{(1,1,1,1,0)}$ in $\mathbb{O}_5$ and the element $u = u_{(1,1,0,0,0)}$ in $\mathbb{M}_5$. Each of these elements has a very large centralizer $Z_u$, of dimension $\dim Z_u = 24$. Indeed, the above element of $\mathbb{O}_5$ commutes with itself and with any homogeneous element $u_x$ of weight $|x| = 0, 1, 3, 5$, as well as with 6 elements such that $|x| = 2$. The centralizer $Z_u$ is the vector space spanned by these 24 homogeneous elements, and similarly in the $\mathbb{M}_5$ case. We will show that the algebra $\mathbb{O} \otimes \mathbb{C}[2]$ does not contain such an element. Assume, ad absurdum, that an element $u \in \mathbb{O} \otimes \mathbb{C}[2]$ has a centralizer of dimension $\geq 24$. Consider the subspace $\mathbb{O} \otimes 1 \oplus 1 \otimes \mathbb{C}[2]$ of the algebra $\mathbb{O} \otimes \mathbb{C}[2]$. It is
102
S. Morier-Genoud, V. Ovsienko
12-dimensional, so that its intersection with $Z_u$ is of dimension at least 4. It follows that $Z_u$ contains at least two independent elements of the form
\[ z_1 = e_1 \otimes 1 + 1 \otimes m_1, \qquad z_2 = e_2 \otimes 1 + 1 \otimes m_2, \]
where $e_1$ and $e_2$ are pure imaginary octonions and $m_1$ and $m_2$ are traceless matrices. Without loss of generality, we can assume that one of the following holds:
(1) the generic case: $e_1, e_2$ and $m_1, m_2$ are linearly independent and pairwise anticommute,
(2) $e_2 = 0$ and $m_1, m_2$ are linearly independent and anticommute,
(3) $m_2 = 0$ and $e_1, e_2$ are linearly independent and anticommute.
We will give the details of the proof in the case (1). Let us write
\[ u = u_0 \otimes 1 + u_1 \otimes m_1 + u_2 \otimes m_2 + u_{12} \otimes m_1 m_2, \]
where $u_0, u_1, u_2, u_{12} \in \mathbb{O}$.

Lemma 5.8. The element $u$ is a linear combination of the following two elements:
\[ 1 \otimes 1, \qquad e_1 \otimes m_1 + e_2 \otimes m_2 - e_1 e_2 \otimes m_1 m_2. \]

Proof. Denote by $[\,\cdot\,,\,\cdot\,]$ the usual commutator. One has
\[ [u, z_1] = [u_0, e_1] \otimes 1 + [u_1, e_1] \otimes m_1 + ([u_2, e_1] - 2u_{12}) \otimes m_2 + ([u_{12}, e_1] - 2u_2) \otimes m_1 m_2, \]
\[ [u, z_2] = [u_0, e_2] \otimes 1 + [u_2, e_2] \otimes m_2 + ([u_1, e_2] + 2u_{12}) \otimes m_1 + ([u_{12}, e_2] + 2u_1) \otimes m_1 m_2. \]
One obtains $[u_0, e_1] = [u_0, e_2] = 0$, so that $u_0$ is proportional to 1. Furthermore, one also obtains $[u_1, e_1] = 0$ and $[u_2, e_2] = 0$, which implies
\[ u_1 = \lambda_1 e_1 + \mu_1 1, \qquad u_2 = \lambda_2 e_2 + \mu_2 1. \]
The equations $[u_2, e_1] - 2u_{12} = 0$ and $[u_1, e_2] + 2u_{12} = 0$ give
\[ u_{12} = \lambda_2 e_2 e_1 \quad \text{and} \quad u_{12} = -\lambda_1 e_1 e_2, \]
hence $\lambda_1 = \lambda_2$, since $e_1$ and $e_2$ anticommute by assumption. Finally, the equations $[u_{12}, e_1] - 2u_2 = 0$ and $[u_{12}, e_2] + 2u_1 = 0$ lead to $\mu_1 = \mu_2 = 0$. Hence the lemma.

In the case (1), one obtains a contradiction because of the following statement.

Lemma 5.9. One has $\dim Z_u \leq 22$.

Proof. Lemma 5.8 implies that the element $u$ belongs to a subalgebra $\mathbb{C}[4] = \mathbb{C}[2] \otimes \mathbb{C}[2] \subset \mathbb{O} \otimes \mathbb{C}[2]$. We use the well-known classical fact that, for an arbitrary element $u \in \mathbb{C}[4]$, the dimension of the centralizer inside $\mathbb{C}[4]$, $\{X \in \mathbb{C}[4] \mid [X, u] = 0\}$, is at most 10 (i.e., the codimension is $\geq 6$). Furthermore, the 4-dimensional space of the elements $e_3 \otimes 1$, where $e_3 \in \mathbb{O}$ anticommutes with $e_1, e_2$, is transversal to $Z_u$. It follows that the codimension of $Z_u$ is at least 10. Hence the lemma.
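The dimension count $\dim Z_u = 24$ used at the beginning of the proof of Proposition 5.7 can be checked by direct enumeration. The following is a minimal sketch, not part of the original argument. It assumes the description of the generating functions recalled in Sect. 7.2 (the value of $\alpha$ depends only on the weight, with $\alpha_{\mathbb{O}}(x) = 0$ exactly when $|x|$ is a multiple of 4 and $\alpha_{\mathbb{M}}(x) = 1$ exactly when $|x| = 1$ mod 4) and the relation $\beta(x, y) = \alpha(x + y) + \alpha(x) + \alpha(y)$; the function names are ours.

```python
from itertools import product

def alpha_O(x):
    # Assumption (Table (4.7) as quoted in Sect. 7.2): zero iff |x| is a multiple of 4.
    return 0 if sum(x) % 4 == 0 else 1

def alpha_M(x):
    # Assumption (same table): one iff |x| = 1 mod 4.
    return 1 if sum(x) % 4 == 1 else 0

def beta(alpha, x, y):
    """beta(x, y) = alpha(x + y) + alpha(x) + alpha(y); u_x, u_y anticommute iff beta = 1."""
    s = tuple((a + b) % 2 for a, b in zip(x, y))
    return (alpha(s) + alpha(x) + alpha(y)) % 2

def centralizer_dim(alpha, u):
    """Number of homogeneous basis elements u_x commuting with the homogeneous element u_u."""
    return sum(1 for x in product((0, 1), repeat=len(u)) if beta(alpha, x, u) == 0)

dim_O5 = centralizer_dim(alpha_O, (1, 1, 1, 1, 0))
dim_M5 = centralizer_dim(alpha_M, (1, 1, 0, 0, 0))
print(dim_O5, dim_M5)
```

Since the commutator of a homogeneous element with $u_x$ is again homogeneous, the centralizer of a homogeneous element is spanned by the basis elements commuting with it, so counting basis elements is enough here.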
Fig. 3. The algebras $C_3$ and $\mathbb{M}_3$ (commutation graphs; the vertices are labeled by the nonzero elements of $(\mathbb{Z}_2)^3$)
Cases (2) and (3) are less involved. In Case (2), $u$ is proportional to $e_1 \otimes 1$ and one checks that $Z_u = e_1 \otimes \mathbb{C}[2] \oplus 1 \otimes \mathbb{C}[2]$ is of dimension 8. In Case (3), $u$ is proportional to $1 \otimes m_1$ so that $Z_u = \mathbb{O} \otimes 1 \oplus \mathbb{O} \otimes m_1$ is of dimension 16. In each case, we obtain a contradiction.

5.3. The commutation graph. We associate a non-oriented graph, that we call the commutation graph, to every twisted group algebra in the following way. The vertices of the graph are the elements of $(\mathbb{Z}_2)^n$. The (non-oriented) edges $x - y$ join the elements $x$ and $y$ such that $u_x$ and $u_y$ anticommute.

Proposition 5.10. Given a complex algebra $A = (\mathbb{C}[(\mathbb{Z}_2)^n], f)$ with symmetric function $\phi = \delta f$, the commutation graph completely determines the structure of $A$.

Proof. In the case where $\phi$ is symmetric, formula (4.2) and Proposition 3.7 imply that the graph determines the structure of the algebra $A$, up to signature.

This means that two complex algebras, $A$ and $A'$, corresponding to the same commutation graph are isomorphic. Conversely, two algebras $A$ and $A'$ with different commutation graphs are not isomorphic as $(\mathbb{Z}_2)^n$-graded algebras. However, we do not know whether there might exist an isomorphism that does not preserve the grading.

Example 5.11. The algebra $\mathbb{M}_3$ is the first non-trivial algebra of the series $\mathbb{M}_n$. The corresponding commutation graph is presented in Fig. 3, together with the graph of the Clifford algebra $C_3$. The algebra $C_3$ is not simple: $C_3 = \mathbb{C}[2] \oplus \mathbb{C}[2]$. It contains a central element $u_{(1,1,1)}$ corresponding to a "singleton" in Fig. 3.

Remark 5.12. (a) The defined planar graph is dual trivalent, that is, every edge represented by a projective line or a circle, see Fig. 3, contains exactly 3 elements. Indeed, any three homogeneous elements $u_x$, $u_y$ and $u_{x+y}$ either commute or anticommute with each other. This follows from the trilinearity of $\phi$.
(b) We also notice that the superposition of the graphs of $C_n$ and $\mathbb{M}_n$ is precisely the graph of the algebra $\mathbb{O}_n$. We thus obtain the following "formula": $C + \mathbb{M} = \mathbb{O}$.

Example 5.13. The commutation graph of the algebra $\mathbb{M}_4$ is presented in Fig. 4.
Fig. 4. The commutation graph of $\mathbb{M}_4$ (the vertices are labeled by the nonzero elements of $(\mathbb{Z}_2)^4$)
Fig. 5. The commutation graph of $C_4$ (the vertices are labeled by the nonzero elements of $(\mathbb{Z}_2)^4$)
The commutation graph of the Clifford algebra $C_4$ is presented in Fig. 5. Note that both algebras, $\mathbb{M}_4$ and $C_4$, are simple. The superposition of the graphs of $\mathbb{M}_4$ and $C_4$ cancels all the edges from $(1, 1, 1, 1)$. Therefore, the element $(1, 1, 1, 1)$ is a singleton in the graph of the algebra $\mathbb{O}_4$. This corresponds to the fact that $u_{(1,1,1,1)}$ in $\mathbb{O}_4$ is central; in particular, $\mathbb{O}_4$ is not simple. The planar graph provides a nice way to visualize the algebra $(\mathbb{K}[(\mathbb{Z}_2)^n], f)$.
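The "formula" $C + \mathbb{M} = \mathbb{O}$ and the singletons mentioned above can be reproduced by a short computation. The sketch below is only illustrative. It assumes that the generating functions depend on the weight alone, as in Sect. 7.2 for $\mathbb{O}_n$ and $\mathbb{M}_n$, and it takes $\alpha_C(x) = \sum_{i \leq j} x_i x_j$ for the Clifford algebra, which is our assumption about the convention used for $C_n$. Superposition is read as the symmetric difference of the edge sets, so that edges present in both graphs cancel.

```python
from itertools import combinations, product

def alpha_C(x):
    # Assumption: Clifford generating function sum_{i<=j} x_i x_j, i.e. w(w+1)/2 mod 2.
    w = sum(x)
    return (w * (w + 1) // 2) % 2

def alpha_M(x):
    return 1 if sum(x) % 4 == 1 else 0

def alpha_O(x):
    return 0 if sum(x) % 4 == 0 else 1

def beta(alpha, x, y):
    s = tuple((a + b) % 2 for a, b in zip(x, y))
    return (alpha(s) + alpha(x) + alpha(y)) % 2

def edges(alpha, n):
    """Edges x - y of the commutation graph: pairs with u_x, u_y anticommuting."""
    cube = list(product((0, 1), repeat=n))
    return {frozenset((x, y)) for x, y in combinations(cube, 2) if beta(alpha, x, y)}

def singletons(alpha, n):
    """Vertices with no edge, i.e. homogeneous central elements."""
    touched = set().union(*edges(alpha, n))
    return {x for x in product((0, 1), repeat=n) if x not in touched}

for n in (3, 4):
    # Superposition with cancellation of common edges.
    print(n, edges(alpha_O, n) == edges(alpha_C, n) ^ edges(alpha_M, n))
print(sorted(singletons(alpha_C, 3)), sorted(singletons(alpha_O, 4)))
```

The vertex $0$ is always a singleton since $u_0 = 1$ is central; the figures omit it.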
6. Generating Functions: Existence and Uniqueness

In this section we prove Theorem 2 and its corollaries. Our main tool is the notion of generating function. We show that the structure of all the algebras we consider in this paper is determined (up to signature) by a single function of one argument $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$. This of course simplifies the understanding of these algebras.

6.1. Existence of a generating function. Given a $(\mathbb{Z}_2)^n$-graded quasialgebra, let us prove that there exists a generating function $\alpha$ if and only if the ternary map $\phi$ is symmetric. The condition that $\phi$ is symmetric is of course necessary for the existence of $\alpha$, cf. formula (3.2). Let us prove that this condition is, indeed, sufficient.

Lemma 6.1. If $\phi$ is symmetric, then $\beta$ is a 2-cocycle: $\delta\beta = 0$.

Proof. If $\phi$ is symmetric then the identity (4.2) is satisfied. In particular, the sum of the two expressions of $\phi$ gives:
\[ \beta(x + y, z) + \beta(x, y + z) + \beta(x, y) + \beta(y, z) = 0, \]
which is nothing but the 2-cocycle condition $\delta\beta = 0$.

Using the information about the second cohomology space $H^2((\mathbb{Z}_2)^n; \mathbb{Z}_2)$, as in the proof of Proposition 3.7, we deduce that $\beta$ is of the form
\[ \beta(x, y) = \delta\alpha(x, y) + \sum_{i \in I} x_i y_i + \sum_{(k,\ell) \in J} x_k y_\ell, \]
where $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ is an arbitrary function and where $I$ is a subset of $\{1, \ldots, n\}$ and $J$ is a subset of $\{(k, \ell) \mid k < \ell\}$. Indeed, the second and the third terms are the most general non-trivial 2-cocycles on $(\mathbb{Z}_2)^n$ with coefficients in $\mathbb{Z}_2$. Furthermore, the function $\beta$ satisfies two properties: it is symmetric and $\beta(x, x) = 0$. The second property implies that $\beta$ does not contain the terms $x_i y_i$. The symmetry of $\beta$ means that whenever there is a term $x_k y_\ell$, there is $x_\ell y_k$ as well. But $x_k y_\ell + x_\ell y_k$ is the coboundary of $x_k x_\ell$. We proved that $\beta = \delta\alpha$, which is equivalent to the identity (3.1). Finally, using the equality (4.2), we also obtain the identity (3.2). Theorem 2 is proved.

6.2. Generating functions are cubic. In this section, we prove Proposition 4.1. We show that a function $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ is a generating function of a $(\mathbb{Z}_2)^n$-graded quasialgebra if and only if $\alpha$ is a polynomial of degree $\leq 3$. The next statement is an application of the pentagonal diagram in Fig. 2.

Lemma 6.2. A generating function $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ satisfies the equation $\delta_3\alpha = 0$, where the map $\delta_3$ is defined by
\[
\begin{aligned}
\delta_3\alpha\,(x, y, z, t) := {} & \alpha(x + y + z + t) \\
& + \alpha(x + y + z) + \alpha(x + y + t) + \alpha(x + z + t) + \alpha(y + z + t) \\
& + \alpha(x + y) + \alpha(x + z) + \alpha(x + t) + \alpha(y + z) + \alpha(y + t) + \alpha(z + t) \\
& + \alpha(x) + \alpha(y) + \alpha(z) + \alpha(t).
\end{aligned}
\tag{6.1}
\]
Proof. This follows immediately from the fact that $\phi$ is a 3-cocycle: substitute (3.2) into the equation $\delta\phi = 0$ to obtain $\delta_3\alpha = 0$.

The following statement characterizes polynomials of degree $\leq 3$.

Lemma 6.3. A function $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ is a polynomial of degree $\leq 3$ if and only if $\delta_3\alpha = 0$.

Proof. This is elementary, see also [14,30].

Proposition 4.1 is proved.

Let us now prove Corollary 4.2. If the map $\phi$ is symmetric, then Theorem 2 implies the existence of the generating function $\alpha$. The map $\phi$ is then given by (3.2). One checks by an elementary calculation that
\[ \phi(x + y, z, t) + \phi(x, z, t) + \phi(y, z, t) = \delta_3\alpha(x, y, z, t). \]
By Lemma 6.2, one has $\delta_3\alpha = 0$. It follows that $\phi$ is trilinear. Furthermore, from (4.2), we deduce that $\phi$ is alternate. Corollary 4.2 is proved.

6.3. Uniqueness of the generating function. Let us show that there is a canonical way to choose a generating function.

Lemma 6.4. (i) Given a $(\mathbb{Z}_2)^n$-graded quasialgebra $A$ with a generating function, one can choose the generating function in such a way that it satisfies
\[ \alpha(0) = 0, \qquad \alpha(x) = 1 \ \text{ for } |x| = 1. \tag{6.2} \]
(ii) There exists a unique generating function of $A$ satisfying (6.2).

Proof. Part (i). Every generating function $\alpha$ vanishes on the zero element $0 = (0, \ldots, 0)$, cf. Sect. 3.1. Furthermore, a generating function corresponding to a given algebra $A$ is defined up to a 1-cocycle on $(\mathbb{Z}_2)^n$. Indeed, the functions $\beta = \delta\alpha$ and $\phi = \delta_2\alpha$ that define the quasialgebra structure do not change if one adds a 1-cocycle to $\alpha$. Since every 1-cocycle is a linear function, we obtain
\[ \alpha(x) \sim \alpha(x) + \sum_{1 \leq i \leq n} \lambda_i x_i. \]
One therefore can normalize $\alpha$ in such a way that $\alpha(x) = 1$ for all $x$ such that $|x| = 1$.

Part (ii). The generating function normalized in this way is unique. Indeed, any other function, say $\alpha'$, satisfying (6.2) differs from $\alpha$ by a polynomial of degree $\geq 2$, so that $\alpha - \alpha'$ cannot be a 1-cocycle. Therefore, $\beta \neq \beta'$, which means that the quasialgebra structure is different.

We will assume the normalization (6.2) in the sequel, whenever we speak of the generating function corresponding to a given algebra.

Let us now consider an algebra $A$ with $n$ generators $u_1, \ldots, u_n$. The group of permutations $S_n$ acts on $A$ by permuting the generators.
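The criterion of Lemma 6.3 and the normalization (6.2) are easy to test on examples. The following sketch evaluates $\delta_3\alpha$ from (6.1) as the sum of $\alpha$ over all nonempty partial sums of $x, y, z, t$. The two sample polynomials are hypothetical test cases chosen by us: a cubic containing every monomial of degree 1, 2 and 3, and the quartic monomial $x_1 x_2 x_3 x_4$.

```python
from functools import lru_cache
from itertools import combinations, product

def add(*vectors):
    """Componentwise sum in (Z_2)^n."""
    return tuple(sum(c) % 2 for c in zip(*vectors))

def delta3(alpha, x, y, z, t):
    """Formula (6.1): alpha summed over all nonempty partial sums of x, y, z, t, mod 2."""
    total = 0
    for r in (1, 2, 3, 4):
        for subset in combinations((x, y, z, t), r):
            total += alpha(add(*subset))
    return total % 2

@lru_cache(maxsize=None)
def alpha_cubic(x):
    """Sample cubic: the sum of all monomials of degree 1, 2 and 3 (test case)."""
    n = len(x)
    return sum(all(x[i] for i in idx)
               for r in (1, 2, 3)
               for idx in combinations(range(n), r)) % 2

def alpha_quartic(x):
    """Sample polynomial of degree 4 (test case)."""
    return x[0] * x[1] * x[2] * x[3]

n = 4
cube = list(product((0, 1), repeat=n))
basis = [tuple(int(i == k) for i in range(n)) for k in range(n)]

# delta_3 vanishes identically on the cubic, but not on the quartic.
cubic_ok = all(delta3(alpha_cubic, x, y, z, t) == 0
               for x, y, z, t in product(cube, repeat=4))
quartic_value = delta3(alpha_quartic, *basis)
# Normalization (6.2): alpha(0) = 0 and alpha = 1 on vectors of weight 1.
normalized = alpha_cubic((0,) * n) == 0 and all(alpha_cubic(e) == 1 for e in basis)
print(cubic_ok, quartic_value, normalized)
```

For the quartic, only the full sum $x + y + z + t$ of the four basis vectors contributes, which is why $\delta_3$ detects the degree.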
Corollary 6.5. If the group of permutations $S_n$ acts on $A$ by automorphisms, then the corresponding generating function $\alpha$ is $S_n$-invariant.

Proof. Let $\alpha$ be a generating function. Since the algebra $A$ is stable with respect to the $S_n$-action, the function $\alpha \circ \sigma$ is again a generating function. If, moreover, $\alpha$ satisfies (6.2), then $\alpha \circ \sigma$ also satisfies this condition. The uniqueness Lemma 6.4 implies that $\alpha \circ \sigma = \alpha$.

Note that the converse statement holds in the complex case, but fails in the real case.

6.4. From the generating function to the twisting function. Given an arbitrary polynomial map $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ of $\deg \alpha \leq 3$ such that $\alpha(0) = 0$, there is a simple way to associate a twisting function $f$ such that $(\mathbb{K}[(\mathbb{Z}_2)^n], f)$ admits $\alpha$ as a generating function.

Proposition 6.6. There exists a twisting function $f$ satisfying the property
\[ f(x, x) = \alpha(x). \tag{6.3} \]
Proof. Let us give an explicit formula for a twisting function $f$. The procedure is linear: we associate to every monomial in $\alpha$ a function in two variables via the following rule:
\[
\begin{aligned}
x_i x_j x_k &\longmapsto x_i x_j y_k + x_i y_j x_k + y_i x_j x_k, \\
x_i x_j &\longmapsto x_i y_j, \\
x_i &\longmapsto x_i y_i,
\end{aligned}
\tag{6.4}
\]
where $i < j < k$.

7. Proof of the Simplicity Criterion

In this section, we prove Theorem 3. We use the notation $A$ to refer to any of the algebras $\mathbb{O}_n$, $\mathbb{O}_{p,q}$ and $\mathbb{M}_n$, $\mathbb{M}_{p,q}$.

7.1. The idea of the proof. Our proof of simplicity of a twisted group algebra $A$ will be based on the following lemma.

Lemma 7.1. If for every homogeneous element $u_x$ in $A$ there exists an element $u_y$ in $A$ such that $u_x$ and $u_y$ anticommute, then $A$ is simple.

Proof. Let us suppose that there exists a nonzero proper two-sided ideal $I$ in $A$. Every element $u$ in $I$ is a linear combination of some homogeneous elements of $A$. We write
\[ u = \lambda_1 u_{x_1} + \cdots + \lambda_k u_{x_k}. \]
Among all the elements of $I$ we choose an element such that the number $k$ of homogeneous components is the smallest possible. We can assume that $k \geq 2$, otherwise $u$ is homogeneous and therefore $u^2$ is non-zero and proportional to 1, so that $I = A$. In addition, up to multiplication by $u_{x_1}$ and scalar normalization, we can assume that
\[ u = 1 + \lambda_2 u_{x_2} + \cdots + \lambda_k u_{x_k}. \]
If there exists an element $u_y \in A$ anticommuting with $u_{x_2}$, then one obtains that $u \cdot u_y - u_y \cdot u$ is a nonzero element in $I$ with a shorter decomposition into homogeneous components. This is a contradiction with the choice of $u$. Therefore, $A$ has no proper ideal.
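The hypothesis of Lemma 7.1 can be tested mechanically for small $n$: for each nonzero $x$ one searches for a $y$ with $\beta(x, y) = 1$. The sketch below is an illustration under the same assumptions on $\alpha_{\mathbb{O}}$ and $\alpha_{\mathbb{M}}$ as in Sect. 7.2 (values depending only on the weight), with $\beta(x, y) = \alpha(x + y) + \alpha(x) + \alpha(y)$. The range $3 \leq n \leq 8$ and the function names are our choices.

```python
from itertools import product

def alpha_O(x):
    # Assumption: alpha_O(x) = 0 iff |x| is a multiple of 4.
    return 0 if sum(x) % 4 == 0 else 1

def alpha_M(x):
    # Assumption: alpha_M(x) = 1 iff |x| = 1 mod 4.
    return 1 if sum(x) % 4 == 1 else 0

def beta(alpha, x, y):
    s = tuple((a + b) % 2 for a, b in zip(x, y))
    return (alpha(s) + alpha(x) + alpha(y)) % 2

def criterion_holds(alpha, n):
    """True if every u_x with x != 0 anticommutes with some u_y."""
    cube = list(product((0, 1), repeat=n))
    return all(any(beta(alpha, x, y) for y in cube) for x in cube if any(x))

result_O = {n: criterion_holds(alpha_O, n) for n in range(3, 9)}
result_M = {n: criterion_holds(alpha_M, n) for n in range(3, 9)}
print(result_O)
print(result_M)
```

In this range the criterion fails exactly for $n = 4, 8$ in the $\mathbb{O}$-series and for $n = 6$ in the $\mathbb{M}$-series, in agreement with part (i) of Theorem 3; in each failing case the obstruction is the element $z = (1, \ldots, 1)$ studied in Sect. 7.2.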
We now need to study central elements in $A$, i.e., the elements commuting with every element of $A$.

7.2. Central elements. In this section we study the commutative center $Z(A)$ of $A$, i.e.,
\[ Z(A) = \{ w \in A \mid w \cdot a = a \cdot w \ \text{for all } a \in A \}. \]
Note that, in the case where $A$ admits a generating function, formula (4.2) implies that the commutative center is contained in the associative nucleus of $A$, so that the commutative center coincides with the usual notion of center, see [31], p. 136 for more details.

The unit 1 of $A$ is obviously an element of the center. We say that $A$ has a trivial center if $Z(A) = \mathbb{K}\,1$. Consider the following particular element: $z = (1, \ldots, 1)$ in $(\mathbb{Z}_2)^n$, with all the components equal to 1, and the associated homogeneous element $u_z$ in $A$.

Lemma 7.2. The element $u_z$ in $A$ is central if and only if (1) $n = 4m$ in the cases $A = \mathbb{O}_n, \mathbb{O}_{p,q}$; (2) $n = 4m + 2$ in the cases $A = \mathbb{M}_n, \mathbb{M}_{p,q}$.

Proof. The element $u_z$ in $A$ is central if and only if for all $y \in (\mathbb{Z}_2)^n$ one has $\beta(y, z) = 0$. We use the generating function $\alpha$. Recall that $\beta(y, z) = \alpha(y + z) + \alpha(y) + \alpha(z)$. The value $\alpha(x)$ depends only on the weight $|x|$, see Table (4.7). For every $y$ in $(\mathbb{Z}_2)^n$, one has $|z + y| = |z| - |y|$.

Case (1). According to Table (4.7), one has $\alpha(x) = 0$ if and only if $|x|$ is a multiple of 4. Assume $n = 4m$. One gets $\alpha(z) = 0$ and for every $y$ one has $\alpha(y) = 0$ if and only if $\alpha(y + z) = 0$. So, in that case, one always has $\alpha(y) = \alpha(y + z)$ and therefore $\beta(y, z) = 0$. Assume $n = 4m + r$, $r = 1, 2$ or $3$. We can always choose an element $y$ such that $|y| = |r - 2| + 1$. We get $\alpha(z) = \alpha(y) = \alpha(y + z) = 1$. Hence, $\beta(y, z) = 1$. This implies that $u_z$ is not central.

Case (2). According to Table (4.7), one has $\alpha(x) = 0$ if and only if $|x|$ is not equal to 1 mod 4. Assume $n = 4m + 2$. One gets $\alpha(z) = 0$ and for every $y$ one has $|y| = 1$ mod 4 if and only if $|y + z| = 1$ mod 4. So, in that case, one always has $\alpha(y) = \alpha(y + z)$ and therefore $\beta(y, z) = 0$. Assume $n = 4m + r$, $r = 0, 1$ or $3$. We choose the element $y = (1, 0, \ldots, 0)$ if $r = 0, 3$, or $y = (1, 1, 0, \ldots, 0)$ if $r = 1$. We easily compute $\beta(y, z) = 1$. This implies that $u_z$ is not central.

Let us consider the case where $u_z$ is not central.

Lemma 7.3. If $u_z$ is not central, then $A$ has a trivial center.
Proof. It suffices to prove that for every homogeneous element $u_x$ in $A$ that is not proportional to 1, there exists an element $u_y$ in $A$ such that $u_x$ and $u_y$ anticommute. Indeed, if $u$ is central, then each homogeneous component of $u$ is central.

Let us fix $x \in (\mathbb{Z}_2)^n$ and the corresponding homogeneous element $u_x \in A$, such that $x$ is neither $0$ nor $z$. We want to find an element $y \in (\mathbb{Z}_2)^n$ such that $\beta(x, y) = 1$ or, equivalently, such that $u_x$ anticommutes with $u_y$. Using the invariance of the functions $\alpha$ and $\beta$ under permutations of the coordinates, we can assume that $x$ is of the form $x = (1, \ldots, 1, 0, \ldots, 0)$, where the first $|x|$ entries are equal to 1 and the last entries are equal to 0. We assume $0 < |x| < n$, so that $x$ starts with 1 and ends with 0.

Case $A = \mathbb{O}_n$ or $\mathbb{O}_{p,q}$. If $|x| \neq 4\ell$, then we use exactly the same arguments as in the proof of Lemma 7.2 in order to find a suitable $y$ (one can also take one of the elements $y = (1, 0, \ldots, 0)$ or $y = (0, \ldots, 0, 1)$). Assume $|x| = 4\ell$. Consider the element $y = (0, 1, \ldots, 1, 0, \ldots, 0)$, with $|y| = |x|$. One has $\alpha(x) = \alpha(y) = 0$ and $\alpha(x + y) = 1$. So we also have $\beta(x, y) = 1$ and deduce that $u_x$ anticommutes with $u_y$.

Case $A = \mathbb{M}_n$ or $\mathbb{M}_{p,q}$. Similarly to the proof of Lemma 7.2, if $|x| \neq 4\ell + 2$ then we can find a $y$ such that $u_y$ anticommutes with $u_x$. If $|x| = 4\ell + 2$ then $\alpha(x) = 0$. The element $y = (0, \ldots, 0, 1)$ satisfies $\alpha(y) = 1$ and $\alpha(x + y) = 0$, so that $\beta(x, y) = 1$.

Consider now the case where $u_z$ is a central element. There are two different possibilities: $u_z^2 = 1$, or $u_z^2 = -1$.

Lemma 7.4. If $u_z \in A$ is a central element and if $u_z^2 = 1$, then the algebra splits into a direct sum of two subalgebras: $A = A^+ \oplus A^-$, where $A^+ := A \cdot (1 + u_z)$ and $A^- := A \cdot (1 - u_z)$.

Proof. Using $u_z^2 = 1$, one immediately obtains
\[ (1 \pm u_z)^2 = 2\,(1 \pm u_z), \qquad (1 + u_z) \cdot (1 - u_z) = 0. \tag{7.1} \]
In addition, using the expression of $\phi$ in terms of $\beta$ given in (4.2) and the fact that $\beta(\cdot, z) = 0$, one deduces that $\phi(\cdot, \cdot, z) = 0$ and thus $a \cdot (b \cdot u_z) = (a \cdot b) \cdot u_z$ for all $a, b \in A$. It follows that
\[ (a \cdot (1 \pm u_z)) \cdot (b \cdot (1 \pm u_z)) = (a \cdot b) \cdot ((1 \pm u_z) \cdot (1 \pm u_z)) \tag{7.2} \]
for all $a, b \in A$. This expression, together with the above computations (7.1), shows that $A^+$ and $A^-$ are, indeed, two subalgebras of $A$ and that they satisfy $A^+ \cdot A^- = A^- \cdot A^+ = 0$. Moreover, for any $a \in A$, one can write
\[ a = \tfrac{1}{2}\, a \cdot (1 + u_z) + \tfrac{1}{2}\, a \cdot (1 - u_z). \]
This implies the direct sum decomposition $A = A^+ \oplus A^-$.

Notice that the elements $\tfrac{1}{2}(1 + u_z)$ and $\tfrac{1}{2}(1 - u_z)$ are the units of $A^+$ and $A^-$, respectively.
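The splitting of Lemma 7.4 can be observed concretely in $\mathbb{O}_4$, where $u_z$ is central and $u_z^2 = 1$. The following sketch models elements as dictionaries from $(\mathbb{Z}_2)^4$ to rational coefficients, with the product $u_x \cdot u_y = (-1)^{f(x,y)} u_{x+y}$. The twisting function $f_{\mathbb{O}}$ used here is the one obtained by applying the rule (6.4) to $\alpha_{\mathbb{O}}(x) = \sum_{i<j<k} x_i x_j x_k + \sum_{i<j} x_i x_j + \sum_i x_i$; this explicit form of $\alpha_{\mathbb{O}}$ is our assumption, consistent with the weight table quoted in Sect. 7.2, and all helper names are ours.

```python
import random
from fractions import Fraction
from itertools import combinations, product

N = 4
ZERO, Z = (0,) * N, (1,) * N

def f_O(x, y):
    """Twisting function from rule (6.4) applied to alpha_O (assumed form, see lead-in)."""
    n = len(x)
    s = sum(x[i] * y[j] for i in range(n) for j in range(i, n))  # terms with i <= j
    for i, j, k in combinations(range(n), 3):
        s += x[i] * x[j] * y[k] + x[i] * y[j] * x[k] + y[i] * x[j] * x[k]
    return s % 2

def add(x, y):
    return tuple((a + b) % 2 for a, b in zip(x, y))

def mul(a, b):
    """Product in the twisted group algebra; zero coefficients are dropped."""
    out = {}
    for x, ax in a.items():
        for y, by in b.items():
            s = add(x, y)
            out[s] = out.get(s, 0) + (-1) ** f_O(x, y) * ax * by
    return {k: v for k, v in out.items() if v != 0}

def plus(a, b):
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return {k: v for k, v in out.items() if v != 0}

half = Fraction(1, 2)
e_plus, e_minus = {ZERO: half, Z: half}, {ZERO: half, Z: -half}
u_z = {Z: Fraction(1)}

rng = random.Random(0)
cube = list(product((0, 1), repeat=N))
a = {x: Fraction(rng.randint(1, 9)) for x in cube}
b = {x: Fraction(rng.randint(1, 9)) for x in cube}

print(mul(u_z, u_z) == {ZERO: 1})                        # u_z^2 = 1
print(mul(a, u_z) == mul(u_z, a))                        # u_z is central
print(mul(e_plus, e_plus) == e_plus, mul(e_plus, e_minus) == {})
print(plus(mul(a, e_plus), mul(a, e_minus)) == a)        # a = a.e_plus + a.e_minus
print(mul(mul(a, e_plus), mul(b, e_minus)) == {})        # A+ . A- = 0
```

Exact rational arithmetic is used so that the vanishing of $A^+ \cdot A^-$ is checked without rounding.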
7.3. Proof of Theorem 3, part (i). If $n \neq 4m$, then by Lemma 7.1 and Lemma 7.3 we immediately deduce that $\mathbb{O}_n$ is simple. If $n = 4m$, then $u_z$ is central and, in the complex case, one has $u_z^2 = 1$. By Lemma 7.2 and Lemma 7.4, we immediately deduce that $\mathbb{O}_n$ is not simple and one has
\[ \mathbb{O}_{4m} = \mathbb{O}_{4m} \cdot (1 + u_z) \oplus \mathbb{O}_{4m} \cdot (1 - u_z), \]
where $z = (1, \ldots, 1) \in (\mathbb{Z}_2)^n$. It remains to show that the algebras $\mathbb{O}_{4m-1}$ and $\mathbb{O}_{4m} \cdot (1 \pm u_z)$ are isomorphic. Indeed, using the computations (7.1) and (7.2), one checks that the map
\[ u_x \longmapsto \tfrac{1}{2}\, u_{(x,0)} \cdot (1 \pm u_z), \]
where $x \in (\mathbb{Z}_2)^{n-1}$, is the required isomorphism. The proof in the case of $\mathbb{M}_n$ is completely similar.

7.4. Proof of Theorem 3, part (ii). The algebras $\mathbb{O}_{p,q}$ with $p + q \neq 4m$ and the algebras $\mathbb{M}_{p,q}$ with $p + q \neq 4m + 2$ are simple because their complexifications are. If now $u_z$ is central, then the property $u_z^2 = 1$ or $-1$ becomes crucial. Using the expressions for $f_{\mathbb{O}}$ or $f_{\mathbb{M}}$, one computes
\[
f_{\mathbb{O}_{p,q}}(z, z) = \sum_{i<j<k} z_i z_j z_k + \sum_{i \leq j} z_i z_j + \sum_{1 \leq i \leq p} z_i
= \frac{n(n-1)(n-2)}{6} + \frac{n(n+1)}{2} + p = p \mod 2.
\]
And similarly, one obtains $f_{\mathbb{M}_{p,q}}(z, z) = p$. It follows that $u_z^2 = (-1)^p$. If $p$ is even, then $u_z^2 = 1$ and Lemma 7.4 guarantees that $A$ is not simple. Finally, if $u_z$ is central and $p$ is odd, then $u_z^2 = -1$.

Lemma 7.5. If $u_z$ is central and $p$ is odd, then
\[ \mathbb{O}_{p,q} \cong \mathbb{O}_{p,q-1} \otimes \mathbb{C}, \qquad \mathbb{M}_{p,q} \cong \mathbb{M}_{p,q-1} \otimes \mathbb{C}. \]

Proof. We construct an explicit isomorphism from $\mathbb{O}_{p,q-1} \otimes \mathbb{C}$ to $\mathbb{O}_{p,q}$ as follows:
\[ u_x \otimes 1 \longmapsto u_{(x,0)}, \qquad u_x \otimes \sqrt{-1} \longmapsto u_{(x,0)} \cdot u_z, \]
for all $x \in (\mathbb{Z}_2)^{n-1}$. We check that the above map is indeed an isomorphism of algebras by noticing that $f_{\mathbb{O}_{p,q}}((x, 0), (y, 0)) = f_{\mathbb{O}_{p,q-1}}(x, y)$.

Let us show that Lemma 7.5 implies that the (real) algebras $\mathbb{O}_{p,q}$ with $p + q = 4m$ and $p$ odd and the algebras $\mathbb{M}_{p,q}$ with $p + q = 4m + 2$ and $p$ odd are simple. Indeed, $\mathbb{O}_{p,q-1} \otimes_{\mathbb{R}} \mathbb{C} \cong \mathbb{O}_{p+q-1}$, viewed as a real algebra. We then use the following well-known fact: a simple unital complex algebra viewed as a real algebra remains simple. The proof of Theorem 3 is complete. Lemma 7.5 also implies Corollary 5.2.
8. Hurwitz-Radon Square Identities

In this section, we use the algebras $\mathbb{O}_n$ (and, in the real case, $\mathbb{O}_{0,n}$) to give explicit formulæ for solutions of a classical problem of products of squares. Recall that the octonion algebra is related to the 8-square identity. In an arbitrary commutative ring, the product $(a_1^2 + \cdots + a_8^2)\,(b_1^2 + \cdots + b_8^2)$ is again a sum of 8 squares $c_1^2 + \cdots + c_8^2$, where the $c_k$ are explicitly given by bilinear forms in $a_i$ and $b_j$ with coefficients $\pm 1$, see, e.g., [11]. This identity is equivalent to the fact that $\mathbb{O}$ is a composition algebra, that is, for any $a, b \in \mathbb{O}$, the norm of the product is equal to the product of the norms:
\[ N(a \cdot b) = N(a)\, N(b). \tag{8.1} \]
Hurwitz proved that there is no similar $N$-square identity for $N > 8$, as there is no composition algebra in higher dimensions. The celebrated Hurwitz-Radon Theorem [19,27] establishes the maximal number $r$, as a function of $N$, such that there exists an identity
\[ (a_1^2 + \cdots + a_N^2)\,(b_1^2 + \cdots + b_r^2) = c_1^2 + \cdots + c_N^2, \tag{8.2} \]
where the $c_k$ are bilinear forms in $a_i$ and $b_j$. The theorem states that $r = \rho(N)$ is the maximal number, where $\rho(N)$ is the Hurwitz-Radon function defined as follows. Write $N$ in the form $N = 2^{4m+\ell} N'$, where $N'$ is odd and $\ell = 0, 1, 2$ or $3$; then $\rho(N) := 8m + 2^{\ell}$. It was proved by Gabel [16] that the bilinear forms $c_k$ can be chosen with coefficients $\pm 1$. Note that the only interesting case is $N = 2^n$ since the general case is an immediate corollary of this particular one. We refer to [28,29] for the history, results and references. In this section, we give explicit formulæ for the solution to the Hurwitz-Radon equation, see also [22] for further development within this framework.

8.1. The explicit solution. We give an explicit solution for the Hurwitz-Radon equation (8.2) for any $N = 2^n$ with $n$ not a multiple of 4. We label the $a$-variables and the $c$-variables by elements of $(\mathbb{Z}_2)^n$. In order to describe the labeling of the $b$-variables, we consider the following particular elements of $(\mathbb{Z}_2)^n$:
\[
\begin{aligned}
e_0 &:= (0, 0, \ldots, 0), \\
\bar{e}_0 &:= (1, 1, \ldots, 1), \\
e_i &:= (0, \ldots, 0, 1, 0, \ldots, 0), \ \text{where 1 occurs at the $i$-th position}, \\
\bar{e}_i &:= (1, \ldots, 1, 0, 1, \ldots, 1), \ \text{where 0 occurs at the $i$-th position},
\end{aligned}
\]
for all $1 \leq i \leq n$. We then introduce the following subset $H_n$ of $(\mathbb{Z}_2)^n$:
\[
\begin{aligned}
H_n &= \{ e_i, \bar{e}_i, \ 1 \leq i \leq n \}, && \text{for } n = 1 \mod 4, \\
H_n &= \{ e_i, \ e_1 + e_j, \ 0 \leq i \leq n, \ 1 < j \leq n \}, && \text{for } n = 2 \mod 4, \\
H_n &= \{ e_i, \bar{e}_i, \ 0 \leq i \leq n \}, && \text{for } n = 3 \mod 4.
\end{aligned}
\]
In each case, the subset $H_n$ contains exactly $\rho(2^n)$ elements. We write the Hurwitz-Radon identity in the form
\[ \Big( \sum_{x \in (\mathbb{Z}_2)^n} a_x^2 \Big) \Big( \sum_{x \in H_n} b_x^2 \Big) = \sum_{x \in (\mathbb{Z}_2)^n} c_x^2. \tag{8.3} \]
We will establish that this identity has the explicit solution stated in Theorem 4.
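The objects of this subsection lend themselves to a direct numerical check. The sketch below computes $\rho(N)$, builds the set $H_n$, verifies that $|H_n| = \rho(2^n)$ and that $\alpha_{\mathbb{O}}(y + t) = 1$ for distinct $y, t \in H_n$ (the property used in Sect. 8.3), and finally tests the identity (8.3) for the bilinear forms $c_x = \sum_{y \in H_n} (-1)^{f_{\mathbb{O}}(x+y,\,y)}\, a_{x+y}\, b_y$ of Theorem 4 on random integers. It is an illustration, not a proof. The twisting function $f_{\mathbb{O}}$ is obtained from the rule (6.4) applied to $\alpha_{\mathbb{O}}(x) = \sum_{i<j<k} x_i x_j x_k + \sum_{i<j} x_i x_j + \sum_i x_i$; this explicit form is our assumption (formula (1.3) is not reproduced here), and the helper names are ours.

```python
import random
from itertools import combinations, product

def rho(N):
    """Hurwitz-Radon function: N = 2^(4m + l) * odd, rho(N) = 8m + 2^l."""
    exponent = 0
    while N % 2 == 0:
        N //= 2
        exponent += 1
    m, l = divmod(exponent, 4)
    return 8 * m + 2 ** l

def add(x, y):
    return tuple((a + b) % 2 for a, b in zip(x, y))

def unit(n, i):
    """e_i for 1 <= i <= n; unit(n, 0) is e_0 = (0, ..., 0)."""
    return tuple(1 if k == i else 0 for k in range(1, n + 1))

def bar(v):
    """Complement: bar(e_i) has 0 exactly where e_i has 1."""
    return tuple(1 - c for c in v)

def H(n):
    r = n % 4
    if r == 1:
        return [unit(n, i) for i in range(1, n + 1)] + [bar(unit(n, i)) for i in range(1, n + 1)]
    if r == 2:
        return [unit(n, i) for i in range(0, n + 1)] + [add(unit(n, 1), unit(n, j)) for j in range(2, n + 1)]
    if r == 3:
        return [unit(n, i) for i in range(0, n + 1)] + [bar(unit(n, i)) for i in range(0, n + 1)]
    raise ValueError("H_n is only defined when n is not a multiple of 4")

def alpha_O(x):
    return 0 if sum(x) % 4 == 0 else 1

def f_O(x, y):
    """Rule (6.4) applied to alpha_O (assumed form, see lead-in)."""
    n = len(x)
    s = sum(x[i] * y[j] for i in range(n) for j in range(i, n))
    for i, j, k in combinations(range(n), 3):
        s += x[i] * x[j] * y[k] + x[i] * y[j] * x[k] + y[i] * x[j] * x[k]
    return s % 2

def identity_holds(n, seed=0):
    """Check (8.3) with the forms c_x of Theorem 4 on random integer inputs."""
    rng = random.Random(seed)
    Hn = H(n)
    cube = list(product((0, 1), repeat=n))
    a = {x: rng.randint(-9, 9) for x in cube}
    b = {y: rng.randint(-9, 9) for y in Hn}
    c = {x: sum((-1) ** f_O(add(x, y), y) * a[add(x, y)] * b[y] for y in Hn) for x in cube}
    lhs = sum(v * v for v in a.values()) * sum(v * v for v in b.values())
    return lhs == sum(v * v for v in c.values())

for n in (1, 2, 3, 5, 6, 7):
    Hn = H(n)
    sizes_ok = len(set(Hn)) == rho(2 ** n)
    pairs_ok = all(alpha_O(add(y, t)) == 1 for y, t in combinations(Hn, 2))
    print(n, sizes_ok, pairs_ok, identity_holds(n))
```

For $n = 1, 2, 3$ this reproduces the classical 2-, 4- and 8-square identities; for $n = 5$ it gives an identity with $N = 32$ and $r = \rho(32) = 10$.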
Theorem 4. The bilinear forms
\[ c_x = \sum_{y \in H_n} (-1)^{f_{\mathbb{O}}(x+y,\,y)}\, a_{x+y}\, b_y, \tag{8.4} \]
where $f_{\mathbb{O}}$ is the twisting function of the algebra $\mathbb{O}_n$ defined in (1.3), are a solution to the Hurwitz-Radon identity.

In order to prove Theorem 4 we will need to define the natural norm on $\mathbb{O}_n$.

8.2. The Euclidean norm. Assume that a twisted group algebra $A = (\mathbb{K}[(\mathbb{Z}_2)^n], f)$ is equipped with a generating function $\alpha$. Assume furthermore that the twisting function satisfies $f(x, x) = \alpha(x)$, as in (6.3). The involution on $A$ is defined for every $a = \sum_{x \in (\mathbb{Z}_2)^n} a_x u_x$, where $a_x \in \mathbb{C}$ (or in $\mathbb{R}$) are scalar coefficients and $u_x$ are the basis elements, by the formula
\[ \bar{a} = \sum_{x \in (\mathbb{Z}_2)^n} (-1)^{\alpha(x)}\, a_x u_x. \]
We then define the following norm of an element $a \in A$:
\[ N(a) := (a \cdot \bar{a})_0. \]

Proposition 8.1. The above norm is nothing but the Euclidean norm in the standard basis:
\[ N(a) = \sum_{x \in (\mathbb{Z}_2)^n} a_x^2. \tag{8.5} \]

Proof. One has
\[ N(a) = \sum (-1)^{\alpha(x)}\, a_x^2\, u_x \cdot u_x = \sum (-1)^{\alpha(x) + f(x,x)}\, a_x^2. \]
The result then follows from the assumption $f(x, x) = \alpha(x)$.

The following statement is a general criterion for $a, b \in A$ to satisfy the composition equation (8.1). This criterion will be crucial for us to establish the square identities.

Proposition 8.2. Elements $a, b \in A$ satisfy (8.1) if and only if for all $x, y, z, t \in (\mathbb{Z}_2)^n$ such that
\[ x + y + z + t = 0, \qquad (x, y) \neq (z, t), \qquad a_x b_y a_z b_t \neq 0, \]
one has $\alpha(x + z) = \alpha(y + t) = 1$.

Proof. Calculating the left-hand side of (8.1), we obtain
\[ N(a \cdot b) = \sum_{x+y+z+t=0} (-1)^{f(x,y) + f(z,t)}\, a_x b_y a_z b_t. \]
According to (8.5), the product of the norms in the right-hand side is:
\[ N(a)\, N(b) = \sum_{x,y} a_x^2\, b_y^2. \]
It follows that the condition (8.1) is satisfied if and only if
\[ f(x, y) + f(z, t) + f(x, t) + f(z, y) = 1, \]
whenever $(x, y) \neq (z, t)$ and $a_x b_y a_z b_t \neq 0$. Indeed, the terms indexed by $(x, y, z, t)$ and $(x, t, z, y)$ carry the same monomial and must cancel. Taking into account the linearity of the function (6.4) in the second argument and substituting $t = x + y + z$, one finally gets (after cancellation):
\[ f(z, x) + f(x, z) + f(x, x) + f(z, z) = 1. \]
In terms of the function $\alpha$ this is exactly the condition $\alpha(x + z) = 1$. Hence the result.

8.3. Proof of Theorem 4. Let us apply Proposition 8.2 to the case of the algebra $\mathbb{O}_n$. Given the variables $(a_x)_{x \in (\mathbb{Z}_2)^n}$ and $(b_x)_{x \in H_n}$, where $H_n$ is the subset defined in Sect. 8.1, form the following vectors in $\mathbb{O}_n$:
\[ a = \sum_{x \in (\mathbb{Z}_2)^n} a_x u_x, \qquad b = \sum_{y \in H_n} b_y u_y. \]
Taking two distinct elements $y, t \in H_n$ one always has $\alpha_{\mathbb{O}}(y + t) = 1$. Therefore, from Proposition 8.2 one deduces that $N(a)\,N(b) = N(a \cdot b)$. Writing this equality in terms of coordinates of the three elements $a$, $b$ and $c = a \cdot b$, one obtains the result. Theorem 4 is proved.

Let us give one more classical identity that can be realized in the algebra $\mathbb{O}_n$.

Example 8.3. The most obvious choice of two elements $a, b \in \mathbb{O}_n$ that satisfy the condition (8.1) is: $a = a_0 u_0 + \sum a_i u_i$ and $b = b_0 u_0 + \sum b_i u_i$. One immediately obtains in this case the following elementary but elegant identity:
\[ (a_0^2 + \cdots + a_n^2)\,(b_0^2 + \cdots + b_n^2) = (a_0 b_0 + \cdots + a_n b_n)^2 + \sum_{0 \leq i < j \leq n} (a_i b_j - a_j b_i)^2, \]
for an arbitrary $n$, known as the Lagrange identity.

9. Relation to Code Loops

The constructions of the algebras that we use in this work are closely related to some constructions in the theory of Moufang loops. In particular, they lead to examples of code loops [18]. In this section, we apply our approach in order to obtain an explicit construction of the famous Parker loop.

The loop of the basis elements. The structure of loop is a nonassociative version of a group (see, e.g., [17]).

Proposition 9.1. The basis elements together with their opposites, $\{\pm u_x,\ x \in (\mathbb{Z}_2)^n\}$, in a twisted algebra $(\mathbb{K}[(\mathbb{Z}_2)^n], f)$, form a loop with respect to the multiplication rule. Moreover, this loop is a Moufang loop whenever $\phi = \delta f$ is symmetric.
Proof. The fact that the elements $\pm u_x$ form a loop is evident. If the function $\phi = \delta f$ is symmetric, then this loop satisfies the Moufang identity:
\[ u \cdot (v \cdot (u \cdot w)) = ((u \cdot v) \cdot u) \cdot w \]
for all $u, v, w$. Indeed, the symmetry of $\phi$ implies that $\phi$ is also trilinear and alternate, see Corollary 4.2.

Let us mention that the Moufang loops associated with the octonions and split-octonions are important classical objects invented by Coxeter [12].

Code loops. The notion of code loops has been introduced by Griess [18]. We recall the construction and main results. A doubly even binary code is a subspace $V$ in $(\mathbb{Z}_2)^n$ such that every vector in $V$ has weight a multiple of 4. It was shown that there exists a function $f$ from $V \times V$ to $\mathbb{Z}_2$, called a factor set in [18], satisfying
(1) $f(x, x) = \frac{1}{4} |x|$,
(2) $f(x, y) + f(y, x) = \frac{1}{2} |x \cap y|$,
(3) $\delta f(x, y, z) = |x \cap y \cap z|$,
where $|x \cap y|$ (resp. $|x \cap y \cap z|$) is the number of nonzero coordinates in both $x$ and $y$ (resp. all of $x, y, z$). The associated code loop $L(V)$ is the set $\{\pm u_x,\ x \in V\}$ together with the multiplication law
\[ u_x \cdot u_y = (-1)^{f(x,y)}\, u_{x+y}. \]
The most important example of code loop is the Parker loop that plays an important rôle in the theory of sporadic finite simple groups. The Parker loop is the code loop obtained from the Golay code. This code can be described as the 12-dimensional subspace of $(\mathbb{Z}_2)^{24}$ given as the span of the rows of the following matrix, see [10] (the blank entries of the left block are zeros):
\[
G = \left(\begin{array}{cccccccccccc|cccccccccccc}
1&&&&&&&&&&& & 1&0&1&0&0&0&1&1&1&0&1&1 \\
&1&&&&&&&&&& & 1&1&0&1&0&0&0&1&1&1&0&1 \\
&&1&&&&&&&&& & 0&1&1&0&1&0&0&0&1&1&1&1 \\
&&&1&&&&&&&& & 1&0&1&1&0&1&0&0&0&1&1&1 \\
&&&&1&&&&&&& & 1&1&0&1&1&0&1&0&0&0&1&1 \\
&&&&&1&&&&&& & 1&1&1&0&1&1&0&1&0&0&0&1 \\
&&&&&&1&&&&& & 0&1&1&1&0&1&1&0&1&0&0&1 \\
&&&&&&&1&&&& & 0&0&1&1&1&0&1&1&0&1&0&1 \\
&&&&&&&&1&&& & 0&0&0&1&1&1&0&1&1&0&1&1 \\
&&&&&&&&&1&& & 1&0&0&0&1&1&1&0&1&1&0&1 \\
&&&&&&&&&&1& & 0&1&0&0&0&1&1&1&0&1&1&1 \\
&&&&&&&&&&&1 & 1&1&1&1&1&1&1&1&1&1&1&0
\end{array}\right)
\]
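The properties of the code that are used below (weights of the rows, sizes of their pairwise intersections, and the fact that the code is doubly even) can be confirmed by a short enumeration. The sketch below rests on an assumption about how the matrix is to be read: $G = (I_{12} \mid B)$, where $B$ consists of an $11 \times 11$ circulant block with first row $(1,0,1,0,0,0,1,1,1,0,1)$, bordered by a column and a row of ones with $0$ in the corner. The variable names are ours.

```python
from itertools import combinations, product

# Assumed reading of G: identity block followed by a bordered circulant block B.
first_row = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1]
B = [[first_row[(j - i) % 11] for j in range(11)] + [1] for i in range(11)]
B.append([1] * 11 + [0])
G = [[1 if j == i else 0 for j in range(12)] + B[i] for i in range(12)]

def meet(u, v):
    """|u ∩ v|: number of coordinates where both vectors are nonzero."""
    return sum(a & b for a, b in zip(u, v))

row_weights = [sum(row) for row in G]
# Intersections among the first eleven rows, and of each of them with the last row.
inner = {meet(G[i], G[j]) for i, j in combinations(range(11), 2)}
with_last = {meet(G[i], G[11]) for i in range(11)}

# Weights of all 2^12 codewords (sums of subsets of the rows, mod 2).
weights = set()
for coeffs in product((0, 1), repeat=12):
    word = [sum(c * row[k] for c, row in zip(coeffs, G)) % 2 for k in range(24)]
    weights.add(sum(word))

print(row_weights, inner, with_last, sorted(weights))
```

With the factor set properties (1) and (2), these numbers give the commutation rules of the generators: $\frac{1}{2}|x \cap y|$ is even for two of the first eleven rows and odd when one of them is paired with the last row.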
An explicit formula for the Parker loop. Let us now give the generating function of the Parker loop. We identify the Golay code with the space $(\mathbb{Z}_2)^{12}$, in such a way that the $i$-th row of the matrix $G$, denoted $\ell_i$, is identified with the $i$-th standard basic vector $e_i = (0, \ldots, 0, 1, 0, \ldots, 0)$ of $(\mathbb{Z}_2)^{12}$. As previously, we write $u_i = u_{e_i} = u_{\ell_i}$ for the corresponding vector in the Parker loop. The coordinates of an element $x \in (\mathbb{Z}_2)^{12}$ are denoted by $(x_1, \ldots, x_{11}, x_{12})$.
Proposition 9.2. The Parker loop is given by the following generating function $\alpha_G$ from $(\mathbb{Z}_2)^{12}$ to $\mathbb{Z}_2$:
\[
\alpha_G(x) = \sum_{1 \leq i \leq 11} \Big( x_i x_{i+1} (x_{i+5} + x_{i+8} + x_{i+9}) + x_i x_{i+2} (x_{i+6} + x_{i+8}) \Big)
+ x_{12} \Big( \sum_{1 \leq i \leq 11} x_i + \sum_{1 \leq i < j \leq 11} x_i x_j \Big),
\tag{9.1}
\]
where the indices of $x_{i+k}$ are understood modulo 11.

Proof. The ternary function $\phi(x, y, z) = \delta f(x, y, z) = |x \cap y \cap z|$ is obviously symmetric in $x, y, z$. Theorem 2 then implies the existence of a generating function $\alpha_G$. By Proposition 4.1 we know that $\alpha_G$ is a polynomial of degree $\leq 3$. Moreover, linear terms in $\alpha_G$ do not contribute to the quasialgebra structure (i.e., they do not contribute to the expressions of $\beta$ and $\phi$, see (3.1)). To determine the quadratic and cubic terms, we use the following equivalences:
\[
\begin{aligned}
\alpha_G \text{ contains the term } x_i x_j,\ i \neq j &\iff u_i, u_j \text{ anti-commute}, \\
\alpha_G \text{ contains the term } x_i x_j x_k,\ i \neq j \neq k &\iff u_i, u_j, u_k \text{ anti-associate}.
\end{aligned}
\]
For instance, the construction of the Parker loop gives that $u_i$ and $u_j$ commute for all $1 \leq i, j \leq 11$, since $|\ell_i \cap \ell_j| = 4$ for $1 \leq i \neq j \leq 11$, so that $\frac{1}{2}|\ell_i \cap \ell_j|$ is even. Thus, $\alpha_G$ does not contain any of the quadratic terms $x_i x_j$, $1 \leq i \neq j \leq 11$. But $u_{12}$ anti-commutes with $u_i$, $i \leq 11$, since $|\ell_{12} \cap \ell_i| = 6$, $i \leq 11$, so that the terms $x_{12} x_i$, $i \leq 11$, do appear in the expression of $\alpha_G$. Similarly, one has to determine which of the triples $u_i, u_j, u_k$ anti-associate in order to determine the cubic terms in $\alpha_G$. This yields the expression (9.1).

The explicit formula for the factor set $f$ in coordinates on $(\mathbb{Z}_2)^{12}$ is immediately obtained by (6.4). Note that the signature in this case is $(11, 1)$, so that we have to add $x_{12} y_{12}$ to (6.4).

Remark 9.3. The difference between the loops generated by the basis elements of $\mathbb{O}_n$ and $\mathbb{M}_n$ and the Parker loop is that the function (9.1) is not $S_n$-invariant. Our classification results cannot be applied in this case.

We hope that the notion of generating function can be a useful tool for the study of code loops.

Acknowledgements. This work was completed at the Mathematisches Forschungsinstitut Oberwolfach (MFO). The first author has benefited from the award of a Leibniz Fellowship, the second author is also grateful to MFO for hospitality.
We are pleased to thank Christian Duval, Alexey Lebedev, Dimitry Leites, John McKay and Sergei Tabachnikov for their interest and helpful comments. We are also grateful to anonymous referees for a number of useful comments.
10. Appendix: Linear Algebra and Differential Calculus over Z2 The purpose of this Appendix is to relate the algebraic problems we study to the general framework of linear algebra over Z2 which is a classical domain.
Automorphisms of $(\mathbb{Z}_2)^n$ and linear equivalence. All the algebraic structures on $(\mathbb{Z}_2)^n$ we consider are invariant with respect to the action of the group automorphisms $\mathrm{Aut}((\mathbb{Z}_2)^n) \cong \mathrm{GL}(n, \mathbb{Z}_2)$. For instance, the generating function $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$, as well as $\beta$ and $\phi$, are considered up to the $\mathrm{Aut}((\mathbb{Z}_2)^n)$-equivalence (called "congruence" in the classic literature [2]).

Quadratic forms. The interest to describe an algebra in terms of a generating function can be illustrated in the case of the Clifford algebras. There are exactly two non-equivalent non-degenerate quadratic forms on $(\mathbb{Z}_2)^{2m}$ with coefficients in $\mathbb{Z}_2$ (see [2,13] for the details):

$$\alpha(x) = x_1 x_{m+1} + \cdots + x_m x_{2m} + \lambda\,(x_m^2 + x_{2m}^2), \tag{10.1}$$
where $\lambda = 0$ or $1$. Note that sometimes the case $\lambda = 1$ is not considered (see [20], p. xix) since the extra term is actually linear, for $x_i^2 = x_i$. The corresponding polar bilinear form $\beta = \delta\alpha$ and the trilinear form $\phi = \delta_2\alpha$ do not depend on $\lambda$. The corresponding twisted group algebra is isomorphic to the Clifford algebra $C_n$. The normal form (10.1) is written in the standard Darboux basis; this formula has several algebraic corollaries. For instance, we immediately obtain the well-known factorization of the complex Clifford algebras:

$$C_{2m} \cong C_2^{\otimes m} \cong \mathbb{C}[2^m],$$

where $\mathbb{C}[2^m]$ are $(2^m \times 2^m)$-matrices. Indeed, the function (10.1), with $\lambda = 0$, is nothing but the sum of $m$ generating functions of $C_2$. The other classical symmetry and periodicity theorems for the Clifford algebras can also be deduced in this way. Let us mention that bilinear forms over $\mathbb{Z}_2$ are still an interesting subject [21].

Cubic polynomials. In this paper, we were led to consider polynomials $\alpha : (\mathbb{Z}_2)^n \to \mathbb{Z}_2$ of degree 3:

$$\alpha(x) = \sum_{i<j<k} \alpha^3_{ijk}\, x_i x_j x_k + \sum_{i<j} \alpha^2_{ij}\, x_i x_j,$$
where αi3jk and αi2j are arbitrary coefficients (equal to 1 or 0). It turns out that it is far of being obvious to understand what α is “non-degenerate” means. To every polynomial α, we associate a binary function β = δα and a trilinear form φ = δ2 α, see formula (3.2), which is of course just the polarization (or linearization) of α. The form φ is alternate: φ(x, x, .) = φ(x, ., x) = φ(., x, x) = 0 and depends only on the homogeneous part of degree 3 of α, i.e., only on αi3jk . There are three different ways to understand the notion of non-degeneracy. (1) The most naive way: α (and φ) is non-degenerate if for all linearly independent x, y ∈ (Z2 )n , the linear function φ(x, y, .) ≡ 0. One can show that, with this definition, there are no non-degenerate cubic forms on (Z2 )n for n ≥ 3. This is of course not the way we proceed.
A Series of Algebras Generalizing the Octonions...
(2) The second way to understand non-degeneracy is as follows. The trilinear map φ itself defines an n-dimensional algebra. Indeed, identifying (Z2 )n with its dual space, the trilinear function φ defines a product (x, y) → φ(x, y, .). One can say that φ (and α) is non-degenerate if this algebra is simple. This second way is much more interesting and is related to many different subjects. For instance, classification of simple Lie (super)algebras over Z2 is still an open problem, see [8] and references therein. This definition also depends only on the homogeneous part of degree 3 of α. (3) We understand non-degeneracy yet in a different way. We say that α is non-degenerate if for all linearly independent x, y there exists z such that β(x, z) = 0,
β(y, z) = 0,
where β = δα. This is equivalent to the fact that the algebra with the generated function α is simple, cf. Sect. 7. We believe that every non-degenerate (in the above sense) polynomial of degree 3 on (Z2 )n is equivalent to one of the two forms (4.5) and (4.6). Note that a positive answer would imply the uniqueness results of Sect. 4.3 without the Sn -invariance assumption. Higher differentials. Cohomology of abelian groups with coefficients in Z2 is a wellknown and quite elementary theory explained in many textbooks. Yet, it can offer some surprises. Throughout this work, we encountered and extensively used the linear operators δk , for k = 1, 2, 3, cf. (3.2) and (6.1), that associate a k-cochain on (Z2 )n to a function. These operators were defined in [30], and used in the Moufang loops theory, [18,14,26]. Operations of this type are called natural or invariant since they commute with the action of Aut((Z2 )n ). The operator δk fits the usual understanding of “higher derivation” since the condition δk α = 0 is equivalent to the fact that α is a polynomial of degree ≤ k. The cohomological meaning of δk is as follows. In the case of an abelian group G, the cochain complex with coefficients in Z2 has a richer structure. There exist k natural operators acting from C k (G; Z2 ) to C k+1 (G; Z2 ) :
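$$C^1(G; \mathbb{Z}_2) \xrightarrow{\ \delta\ } C^2(G; \mathbb{Z}_2) \xrightarrow{\ \delta_{1,0},\ \delta_{0,1}\ } C^3(G; \mathbb{Z}_2) \xrightarrow{\ \delta_{1,0,0},\ \delta_{0,1,0},\ \delta_{0,0,1}\ } \cdots$$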
where δ0,...,1,...,0 is a “partial differential”, i.e., the differential with respect to one variable. For instance, if β ∈ C 2 (G; Z2 ) is a function in two variables, then δ1,0 β(x, y, z) = β(x + y, z) + β(x, z) + β(y, z). In this formula z is understood as a parameter and one can write δ1,0 β(x, y, z) = δγ (x, y), where γ = β(., z). At each order one has δ = δ1,0,...,0 + δ0,1,...0 + · · · + δ0,...,0,1 . If α ∈ C 1 (G; Z2 ), then an arbitrary sequence of the partial derivatives gives the same result: δk α, for example one has δ2 α = δ1,0 ◦ δα = δ0,1 ◦ δα,
δ3 α = δ1,0,0 ◦ δ1,0 ◦ δα = · · · = δ0,0,1 ◦ δ0,1 ◦ δα,
etc. The first of the above equations corresponds to the formula (4.2) since β = δα.
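The identities above can be checked by brute force on a small example. The following is a minimal sketch, not taken from the paper: it assumes the standard coboundary $\delta\alpha(x, y) = \alpha(x + y) + \alpha(x) + \alpha(y)$ for a 1-cochain, uses the partial differential $\delta_{1,0}$ exactly as written above, and all function names (`delta`, `delta_10`, `delta_01`, `delta_100`) as well as the two test polynomials are hypothetical choices made for illustration.

```python
from itertools import product

def add(x, y):
    """Componentwise sum of two vectors in (Z_2)^n."""
    return tuple((a + b) % 2 for a, b in zip(x, y))

def delta(alpha):
    """Assumed coboundary of a 1-cochain: alpha(x+y) + alpha(x) + alpha(y)."""
    return lambda x, y: (alpha(add(x, y)) + alpha(x) + alpha(y)) % 2

def delta_10(beta):
    """Partial differential in the first variable; z is a parameter."""
    return lambda x, y, z: (beta(add(x, y), z) + beta(x, z) + beta(y, z)) % 2

def delta_01(beta):
    """Partial differential in the second variable; x is a parameter."""
    return lambda x, y, z: (beta(x, add(y, z)) + beta(x, y) + beta(x, z)) % 2

def delta_100(phi):
    """Partial differential of a 3-cochain in its first variable."""
    return lambda x, y, z, w: (phi(add(x, y), z, w) + phi(x, z, w) + phi(y, z, w)) % 2

def vectors(n):
    return list(product((0, 1), repeat=n))

def agree(f, g, n, arity):
    """True if f and g coincide on all arity-tuples of vectors in (Z_2)^n."""
    return all(f(*args) == g(*args) for args in product(vectors(n), repeat=arity))

def is_zero(f, n, arity):
    """True if f vanishes on all arity-tuples of vectors in (Z_2)^n."""
    return all(f(*args) == 0 for args in product(vectors(n), repeat=arity))

# Hypothetical test polynomials on (Z_2)^3.
quadratic = lambda x: (x[0] * x[1]) % 2
cubic = lambda x: (x[0] * x[1] * x[2] + x[0] * x[1]) % 2

phi_a = delta_10(delta(cubic))   # delta_2 computed through delta_{1,0}
phi_b = delta_01(delta(cubic))   # delta_2 computed through delta_{0,1}
```

In this sketch, `phi_a` and `phi_b` coincide, the second differential of the quadratic polynomial vanishes, and the third differential of the cubic polynomial vanishes while its second differential does not, in line with the statement that $\delta_k\alpha = 0$ characterizes polynomials of degree $\le k$.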
References 1. Adem, A., Milgram, R.: Cohomology of finite groups. Fundamental Principles of Math. Sci. 309, Berlin: Springer-Verlag, 2004 2. Albert, A.A.: Symmetric and alternate matrices in an arbitrary field. I. Trans. Amer. Math. Soc. 43, 386–436 (1938) 3. Albuquerque, H., Elduque, A., Pérez-Izquierdo, J.M.: Alternative quasialgebras. Bull. Austral. Math. Soc. 63, 257–268 (2001) 4. Albuquerque, H., Majid, S.: Quasialgebra structure of the octonions. J. Algebra 220, 188–224 (1999) 5. Albuquerque, H., Majid, S.: Clifford algebras obtained by twisting of group algebras. J. Pure Appl. Algebra 171, 133–148 (2002) 6. Baez, J.: The octonions. Bull. Amer. Math. Soc. (N.S.) 39, 145–205 (2002) 7. Berkovich, Ya.G., Zhmud’, E.M.: Characters of finite groups, Part 1. Translations of Mathematical Monographs 172. Providence, RI: Amer. Math. Soc., 1998 8. Bouarroudj, S., Grozman, P., Leites, D.: Classification of finite dimensional modular Lie superalgebras with indecomposable Cartan matrix. SIGMA 5, 060 (2009), 63pp 9. Conlon, S.B.: Twisted group algebras and their representations. J. Austr. Math. Soc. 4, 152–173 (1964) 10. Conway, J.H., Sloane, N.J.A.: Sphere packings, lattices and groups. Third edition. New York: SpringerVerlag, 1999 11. Conway, J.H., Smith, D.A.: On quaternions and octonions: their geometry, arithmetic, and symmetry. Natick, MA: A K Peters, Ltd., 2003 12. Coxeter, H.S.M.: Integral Cayley numbers. Duke Math. J. 13, 561–578 (1946) 13. Dieudonné, J.: La géométrie des groupes classiques. Seconde édition, Berlin-Gottingen-Heidelberg: Springer-Verlag, 1963 14. Drápal, A., Vojtechovský, P.: Symmetric multilinear forms and polarization of polynomials. Linear Algebra Appl. 431(5-7), 998–1012 (2009) 15. Elduque, A.: Gradings on octonions. J. Alg. 207, 342–354 (1998) 16. Gabel, M.R.: Generic orthogonal stably free projectives. J. Algebra 29, 477–488 (1974) 17. Goodaire, E., Jespers, E., Polcino, M.: Alternative loop rings. Amsterdam: North-Holland Publ., 1996 18. 
Griess, R.L. Jr.: Code Loops. J. Alg. 100, 224–234 (1986) 19. Hurwitz, A.: Uber die Komposition der quadratischen Formen. Math. Ann. 88, 1–25 (1923) 20. Knus, M.-A., Merkurjev, A., Rost, M., Tignol, J.-P.: The book of involutions. AMS Colloquium Publ. 44, Providence, RI: Amer. Math. Soc., 1998 21. Lebedev, A.: Non-degenerate bilinear forms in characteristic 2, related contact forms, simple Lie algebras and superalgebras. http://arXiv.org/abs/0601536v2 [math.Ac], 2006 22. Lenzhen, A., Morier-Genoud, S., Ovsienko, V.: New solutions to the Hurwitz problem on square identities. J. Pure Appl. Alg. (2011). doi:10.1016/j.jpaa.2011.04.011 23. Lychagin, V.: Colour calculus and colour quantizations. Acta Appl. Math. 41, 193–226 (1995) 24. Morier-Genoud, S., Ovsienko, V.: Well, Papa, can you multiply triplets? Math. Intell. 31, 1–2 (2009) 25. Morier-Genoud, S., Ovsienko, V.: Simple graded commutative algebras. J. Alg. 323, 1649–1664 (2010) 26. Nagy, G., Vojtechovský, P.: The Moufang loops of order 64 and 81. J. Symb. Comp. 42, 871–883 (2007) 27. Radon, J.: Lineare scharen orthogonale Matrizen Abh. Math. Sem. Univ. Hamburg 1, 1–14 (1922) 28. Rajwade, A.R.: Squares. London Mathematical Society Lecture Note Series, 171. Cambridge: Cambridge Univ. Press, 1993 29. Shapiro, D.: Compositions of quadratic forms. Berlin: Walter de Gruyter & Co., 2000 30. Ward, H.: Combinatorial polarization. Discrete Math. 26(2), 185–197 (1979) 31. Zhevlakov, K.A., Slin’ko, A.M., Shestakov, I.P., Shirshov, A.I.: Rings that are nearly associative. Pure and Appl. Math. 104, New York-London: Academic Press, Inc., 1982 Communicated by Y. Kawahigashi
Commun. Math. Phys. 306, 119–163 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1286-x
Communications in
Mathematical Physics
Quasi-Normal Modes and Exponential Energy Decay for the Kerr-de Sitter Black Hole Semyon Dyatlov Department of Mathematics, Evans Hall, University of California, Berkeley, CA 94720, USA. E-mail:
[email protected];
[email protected] Received: 7 June 2010 / Accepted: 15 March 2011 Published online: 22 June 2011 – © Springer-Verlag 2011
Abstract: We provide a rigorous definition of quasi-normal modes for a rotating black hole. They are given by the poles of a certain meromorphic family of operators and agree with the heuristic definition in the physics literature. If the black hole rotates slowly enough, we show that these poles form a discrete subset of C. As an application we prove that the local energy of linear waves in that background decays exponentially once orthogonality to the zero resonance is imposed. Quasi-normal modes are the complex frequencies appearing in expansions of waves; their real part corresponds to the rate of oscillation and the nonpositive imaginary part, to the rate of decay. According to the physics literature [24] they are expected to appear in gravitational waves caused by perturbations of black holes (for more recent references and findings, see for example [8,25,26]). In the mathematics literature they were studied by Bachelot and Motet-Bachelot [3–5] and Sá Barreto and Zworski [31], who applied the methods of scattering theory and semiclassical analysis to the case of a spherically symmetric black hole. Quasi-normal modes were described in [31] as resonances; that is, poles of the meromorphic continuation of a certain family of operators; it was also proved that these poles asymptotically lie on a lattice. This was further developed by Bony and Häfner in [11], who established an expansion of the solutions of the wave equation in terms of resonant states. As a byproduct of this result, they obtained exponential decay of local energy for Schwarzschild–de Sitter. Melrose, Sá Barreto, and Vasy [30] have extended this result to more general manifolds and more general initial data. In this paper, we employ different methods to define quasi-normal modes for the Kerr–de Sitter rotating black hole. 
As in [31] and [11], we use the de Sitter model; physically, this corresponds to a positive cosmological constant; mathematically, it replaces asymptotically Euclidean spatial infinity with an asymptotically hyperbolic one. Let Pg (ω), ω ∈ C, be the stationary d’Alembert–Beltrami operator of the Kerr–de Sitter metric (see Sect. 1 for details). It acts on functions on the space slice M = (r− , r+ ) × S2 . We define quasi-normal modes as poles of a certain (right) inverse Rg (ω) to Pg (ω).
Because of the cylindrical symmetry of the operator $P_g(\omega)$, it leaves invariant the space $\mathcal{D}'_k$ of distributions with fixed angular momentum $k \in \mathbb{Z}$ (with respect to the axis of rotation); the inverse $R_g(\omega)$ on $\mathcal{D}'_k$ is constructed by

Theorem 1. Let $P_g(\omega, k)$ be the restriction of $P_g(\omega)$ to $\mathcal{D}'_k$. Then there exists a family of operators

$$R_g(\omega, k) : L^2_{\mathrm{comp}}(M) \cap \mathcal{D}'_k \to H^2_{\mathrm{loc}}(M) \cap \mathcal{D}'_k$$
meromorphic in $\omega \in \mathbb{C}$ with poles of finite rank and such that $P_g(\omega, k) R_g(\omega, k) f = f$ for each $f \in L^2_{\mathrm{comp}}(M) \cap \mathcal{D}'_k$.

Since $R_g(\omega, k)$ is meromorphic, its poles, which we call k-resonances, form a discrete set. One can then say that $\omega \in \mathbb{C}$ is a resonance, or a quasi-normal mode, if $\omega$ is a k-resonance for some $k \in \mathbb{Z}$. However, it is desirable to know that resonances form a discrete subset of $\mathbb{C}$; that is, k-resonances for different k do not accumulate near some point. Also, one wants to construct the inverse $R_g(\omega)$ that works for all values of k. For $\delta_r > 0$,$^1$ put

$$K_r = (r_- + \delta_r,\ r_+ - \delta_r), \qquad M_K = K_r \times S^2,$$
and let $1_{M_K}$ be the operator of multiplication by the characteristic function of $M_K$ (which will, based on the context, act $L^2(M_K) \to L^2(M)$ or $L^2(M) \to L^2(M_K)$). Then we are able to construct $R_g(\omega)$ on $M_K$ for a slowly rotating black hole:

Theorem 2. Fix $\delta_r > 0$. Then there exists $a_0 > 0$ such that if the rotation speed of the black hole satisfies $|a| < a_0$, we have the following:

1. Every fixed compact set can only contain k-resonances for a finite number of values of k. Therefore, quasi-normal modes form a discrete subset of $\mathbb{C}$.

2. The operators $1_{M_K} R_g(\omega, k) 1_{M_K}$ define a family of operators $R_g(\omega) : L^2(M_K) \to H^2(M_K)$ such that $P_g(\omega) R_g(\omega) f = f$ on $M_K$ for each $f \in L^2(M_K)$ and $R_g(\omega)$ is meromorphic in $\omega \in \mathbb{C}$ with poles of finite rank.

As stated in Theorem 2, the operator $R_g(\omega)$ acts only on functions supported in a certain compact subset of the space slice M depending on how small a is. This is due to the fact that the operator $P_g(\omega)$ is not elliptic inside the two ergospheres located near the endpoints $r = r_\pm$. The result above can then be viewed as a construction of $R_g(\omega)$ away from the ergospheres. However, for fixed angular momentum we are able to obtain certain boundary conditions on the elements in the image of $R_g(\omega, k)$, as well as on resonant states:

Theorem 3. Let $\omega \in \mathbb{C}$.

1. Assume that $\omega$ is not a resonance. Take $f \in L^2_{\mathrm{comp}}(M) \cap \mathcal{D}'_k$ for some $k \in \mathbb{Z}$ and put $u = R_g(\omega, k) f \in H^2_{\mathrm{loc}}(M)$. Then u is outgoing in the following sense: the functions

$$v_\pm(r, \theta, \varphi) = |r - r_\pm|^{\,i A_\pm^{-1}(1+\alpha)(r_\pm^2 + a^2)\omega}\; u\big(r, \theta, \varphi - A_\pm^{-1}(1+\alpha)\, a \ln|r - r_\pm|\big)$$

are smooth near the event horizons $\{r_\pm\} \times S^2$.

Footnote 1: In this paper, the subscript in the constants such as $\delta_r$, $C_r$, $C_\theta$ does not mean that these constants depend on the corresponding variables, such as r or θ; instead, it indicates that they are related to these variables.
2. Assume that $\omega$ is a resonance. Then there exists a resonant state; i.e., a nonzero solution $u \in C^\infty(M)$ to the equation $P_g(\omega) u = 0$ that is outgoing in the sense of part 1.

The outgoing condition can be reformulated as follows. Consider the function $U = e^{-i\omega t} u$ on the spacetime $\mathbb{R} \times M$; then u is outgoing if and only if U is smooth up to the event horizons in the extension of the metric given by the Kerr-star coordinates $(t^*, r, \theta, \varphi^*)$ discussed in Sect. 1. This lets us establish a relation between the wave equation on Kerr–de Sitter and the family of operators $R_g(\omega)$ (Proposition 1.2). Note that here we do not follow earlier applications of scattering theory (including [11]), where spectral theory and in particular self-adjointness of $P_g$ are used to define $R_g(\omega)$ for $\operatorname{Im}\omega > 0$ and relate it to solutions of the wave equation via Stone's formula. In the situation of the present paper, due to the lack of ellipticity of $P_g(\omega)$ inside the ergospheres, it is doubtful that $P_g$ can be made into a self-adjoint operator; therefore, we construct $R_g(\omega)$ directly using separation of variables, cite the theory of hyperbolic equations (see Sect. 1) for well-posedness of the Cauchy problem for the wave equation, and prove Proposition 1.2 without any reference to spectral theory.

We now study the distribution of resonances in the slowly rotating Kerr–de Sitter case. First, we establish absence of nonzero resonances in the closed upper half-plane:

Theorem 4. Fix $\delta_r > 0$. Then there exist constants $a_0$ and C such that if $|a| < a_0$, then:

1. There are no resonances in the upper half-plane and

$$\|R_g(\omega)\|_{L^2(M_K) \to L^2(M_K)} \le \frac{C}{|\operatorname{Im}\omega|^2}, \qquad \operatorname{Im}\omega > 0.$$

2. There are no resonances $\omega \in \mathbb{R} \setminus 0$ and

$$R_g(\omega) = \frac{i(1 \otimes 1)}{4\pi(1+\alpha)(r_+^2 + r_-^2 + 2a^2)\,\omega} + \mathrm{Hol}(\omega),$$
where Hol stands for a family of operators holomorphic at zero.

Next, we use the methods of [38] and the fact that the only trapping in our situation is normally hyperbolic to get a resonance free strip:

Theorem 5. Fix $\delta_r > 0$ and $s > 0$. Then there exist $a_0 > 0$, $\nu_0 > 0$, and C such that for $|a| < a_0$,

$$\|R_g(\omega)\|_{L^2(M_K) \to L^2(M_K)} \le C|\omega|^s, \qquad |\operatorname{Re}\omega| \ge C,\ |\operatorname{Im}\omega| \le \nu_0.$$

Theorems 4 and 5, together with the fact that resonances form a discrete set, imply that for $\nu_0$ small enough, zero is the only resonance in $\{\operatorname{Im}\omega \ge -\nu_0\}$. This and the presence of the global meromorphic continuation provide exponential decay of local energy:$^2$

Footnote 2: Recently, the author has obtained stronger exponential decay results (http://arxiv.org/abs/1010.5201v1 [math.Ap], 2010), as well as a more precise description of resonances and a resonance decomposition (http://arxiv.org/abs/1101.1260v1 [math.Ap], 2011); all of these are based on the present paper.
Theorem 6. Let $(r, t^*, \theta, \varphi^*)$ be the coordinates on the Kerr–de Sitter background introduced in Sect. 1. Fix $\delta_r > 0$ and $s > 0$ and assume that a is small enough. Let u be a solution to the wave equation $\Box_g u = 0$ with initial data

$$u|_{t^*=0} = f_0 \in H^{3/2+s}(M) \cap \mathcal{E}'(M_K), \qquad \partial_{t^*} u|_{t^*=0} = f_1 \in H^{1/2+s}(M) \cap \mathcal{E}'(M_K). \tag{0.1}$$

Also, define the constant

$$u_0 = \frac{1+\alpha}{4\pi(r_+^2 + r_-^2 + 2a^2)} \int_{t^*=0} *(du).$$

Here $*$ denotes the Hodge star operator for the metric g (see Sect. 1). Then
$$\|u(t^*, \cdot) - u_0\|_{L^2(M_K)} \le C e^{-\nu t^*}\big(\|f_0\|_{H^{3/2+s}(M_K)} + \|f_1\|_{H^{1/2+s}(M_K)}\big), \qquad t^* > 0,$$

for certain constants C and ν independent of u.

For the Kerr metric, the local energy decay is polynomial as shown by Tataru and Tohaneanu [34,33]; see also the lecture notes by Dafermos and Rodnianski [13] and the references below.

Outline of the proof. The starting point of the construction of $R_g(\omega)$ is the separation of variables introduced by Teukolsky in [36]. The separation of variables techniques and the related symmetries have been used in many papers, including [2,9,11,15,16,19,20,31,37]; however, these mostly consider the case of zero cosmological constant, where other difficulties occur at zero energy and a global meromorphic continuation of the type presented here is unlikely. In our case, since the metric is invariant under axial rotation, it is enough to construct the operators $R_g(\omega, k)$ and study their behavior for large k. The operator $P_g(\omega, k)$ is next decomposed into the sum of two ordinary differential operators, $P_r$ and $P_\theta$ (see (1.3)). The separation of variables is discussed in Sect. 1; the same section contains the derivation of Theorem 6 from the other theorems by the complex contour deformation method.

In the Schwarzschild–de Sitter case, $P_\theta$ is just the Laplace–Beltrami operator on the round sphere and one can use spherical harmonics to reduce the problem to studying the operator $P_r + \lambda$ for large λ. In the case $a \neq 0$, however, the operator $P_\theta$ is ω-dependent; what is more, it is no longer self-adjoint unless $\omega \in \mathbb{R}$. This raises two problems with the standard implementation of separation of variables, namely decomposing $L^2$ into a direct sum of the eigenspaces of $P_\theta$. Firstly, since $P_\theta$ is not self-adjoint, we cannot automatically guarantee existence of a complete system of eigenfunctions and the corresponding eigenspaces need not be orthogonal.
Secondly, the eigenvalues of Pθ are functions of ω, and meromorphy of Rg (ω) is nontrivial to show when two of these eigenvalues coincide. Therefore, instead of using the eigenspace decomposition, we write Rg (ω) as a certain contour integral (2.1) in the complex plane; the proof of meromorphy of this integral is based on Weierstrass preparation theorem. This is described in Sect. 2. In Sect. 3, we use the separation of variables procedure to reduce Theorems 1–4 to certain facts about the radial resolvent Rr (Proposition 3.2). For fixed ω, λ, k, where λ ∈ C is the separation constant, Rr is constructed in Sect. 4 using the methods of onedimensional scattering theory. Indeed, the radial operator Pr , after the Regge–Wheeler change of variables (4.1), is equivalent to the Schrödinger operator Px = Dx2 + Vx (x)
for a certain potential $V_x$ (4.2). (Here $x = \pm\infty$ correspond to the event horizons.) This does not, however, provide estimates on $R_r$ that are uniform as ω, λ, k go to infinity. The main difficulty then is proving a uniform resolvent estimate (see (3.8)), valid for large λ and $\operatorname{Re}\lambda \gg |\operatorname{Im}\lambda| + |\omega|^2 + |ak|^2$, which in particular guarantees the convergence of the integral (2.1) and Theorem 2. A complication arises from the fact that $V_x(\pm\infty) = -\omega_\pm^2$, where $\omega_\pm$ are proportional to $(r_\pm^2 + a^2)\omega - ak$. No matter how large ω is, one can always choose k so that one of $\omega_\pm$ is small, making it impossible to use standard complex scaling$^3$ in the case $\omega = o(k)$, due to the lack of ellipticity of the rescaled operator at infinity. To avoid this issue, we use the analyticity of $V_x$ and semiclassical analysis to get certain control on outgoing solutions at two distant, but fixed, points (Proposition 6.1), and then an integration by parts argument to get an $L^2$ bound between these two points. This is discussed in Sect. 6.

Finally, Sect. 7 contains the proof of Theorem 5. We first use the results of Sects. 2–6 to reduce the problem to scattering for the Schrödinger operator $P_x$ in the regime $\lambda = O(\omega^2)$, $k = O(\omega)$ (Proposition 7.1). In this case, we apply complex scaling to deform $P_x$ near $x = \pm\infty$ to an elliptic operator (Proposition 7.2). We then analyse the corresponding classical flow; it is either nontrapping at zero energy, in which case the usual escape function construction (as in, for example, [27]) applies (Proposition 7.3), or has a unique maximum. In the latter case we use the methods of [38] designed to handle more general normally hyperbolic trapped sets and based on commutator estimates in a slightly exotic microlocal calculus. The argument of [38] has to be modified to use complex scaling instead of an absorbing potential near infinity (see also [38, Theorem 2]).
It should be noted that, unlike [11] or [31], the construction of $R_g(\omega)$ in the present paper does not use the theorem of Mazzeo–Melrose [28] on the meromorphic continuation of the resolvent on spaces with asymptotically constant negative curvature (see also [21]). In [11] and [31], this theorem had to be applied to prove the existence of the meromorphic continuation of the resolvent for ω in a fixed neighborhood of zero where complex scaling could not be implemented.

Remark. The results of this paper also apply if the wave equation is replaced by the Klein–Gordon equation [23]

$$(\Box_g + m^2) u = 0,$$

where $m > 0$ is a fixed constant. The corresponding stationary operator is $P_g(\omega) + m^2\rho^2$; when restricted to the space $\mathcal{D}'_k$, it is the sum of the two operators (see Sect. 1)

$$P_r(\omega, k; m) = P_r(\omega, k) + m^2 r^2, \qquad P_\theta(\omega; m) = P_\theta(\omega) + m^2 a^2 \cos^2\theta.$$

The proofs in this paper all go through in this case as well. In particular, the rescaled radial operator $P_x$ introduced in (4.2) is a Schrödinger operator with the potential

$$V_x(x; \omega, \lambda, k; m) = (\lambda + m^2 r^2)\Delta_r - (1+\alpha)^2\big((r^2 + a^2)\omega - ak\big)^2.$$

Since $V_x(\pm\infty)$ is still equal to $-\omega_\pm^2$ with $\omega_\pm$ defined in (4.3), the radial resolvent can be defined as a meromorphic family of operators on the entire complex plane.

Footnote 3: Complex scaling originated in mathematical physics with the work of Aguilar–Combes [1], Balslev–Combes, and Simon. It has become a standard tool in chemistry for computing resonances. A microlocal approach has been developed by Helffer–Sjöstrand, and a more geometric version by Sjöstrand–Zworski [32]; see that paper for pointers to the literature. Complex scaling was reborn in numerical analysis in 1994 as the method of "perfectly matched layers" (see [7]). A nice application of the method of complex scaling to the Schwarzschild–de Sitter case is provided in [14].

Also,
the term $m^2 r^2 \Delta_r$ in the operator $P_x$ becomes of order $O(h^2)$ under the semiclassical rescaling and thus does not affect the arguments in Sects. 6 and 7. The only difference in the Klein–Gordon case is the absence of the resonance at zero: 1 is no longer an outgoing solution to the equation $P_x u = 0$ for $\omega = k = \lambda = 0$. Therefore, there is no $u_0$ term in Theorem 6, and all solutions to (0.1) decay exponentially in the compact set $M_K$.

1. Kerr–de Sitter Metric

The Kerr–de Sitter metric is given by the formulas [12]

$$g = -\rho^2\Big(\frac{dr^2}{\Delta_r} + \frac{d\theta^2}{\Delta_\theta}\Big) - \frac{\Delta_\theta \sin^2\theta}{(1+\alpha)^2\rho^2}\big(a\,dt - (r^2 + a^2)\,d\varphi\big)^2 + \frac{\Delta_r}{(1+\alpha)^2\rho^2}\big(dt - a\sin^2\theta\,d\varphi\big)^2.$$

Here $M_0$ is the mass of the black hole, $\Lambda$ is the cosmological constant (both of which we assume to be fixed throughout the paper), and a is the angular momentum (which we assume to be bounded by some constant, and which is required to be small by most of our theorems);

$$\Delta_r = (r^2 + a^2)\Big(1 - \frac{\Lambda r^2}{3}\Big) - 2M_0 r, \qquad \Delta_\theta = 1 + \alpha\cos^2\theta, \qquad \rho^2 = r^2 + a^2\cos^2\theta, \qquad \alpha = \frac{\Lambda a^2}{3}.$$
We also put $A_\pm = \mp\partial_r\Delta_r(r_\pm) > 0$. The metric is defined for $\Delta_r > 0$; we assume that this happens on an open interval $0 < r_- < r < r_+ < \infty$. (For $a = 0$, this is true when $9\Lambda M_0^2 < 1$; it remains true if we take a small enough.) The variables $\theta \in [0, \pi]$ and $\varphi \in \mathbb{R}/2\pi\mathbb{Z}$ are the spherical coordinates on the sphere $S^2$. We define the space slice $M = (r_-, r_+) \times S^2$; then the Kerr–de Sitter metric is defined on the spacetime $\mathbb{R} \times M$.

The d'Alembert–Beltrami operator of g is given by

$$\Box_g = \frac{1}{\rho^2} D_r(\Delta_r D_r) + \frac{1}{\rho^2\sin\theta} D_\theta(\Delta_\theta\sin\theta\, D_\theta) + \frac{(1+\alpha)^2}{\rho^2\Delta_\theta\sin^2\theta}\big(a\sin^2\theta\, D_t + D_\varphi\big)^2 - \frac{(1+\alpha)^2}{\rho^2\Delta_r}\big((r^2 + a^2) D_t + a D_\varphi\big)^2.$$
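The horizon radii $r_\pm$ used throughout this section can be located numerically as roots of the quartic $\Delta_r = (r^2 + a^2)(1 - \Lambda r^2/3) - 2M_0 r$. The following is a minimal sketch, not part of the paper: the function names `delta_r` and `horizons`, the sample parameter values, and the convention of taking the two largest positive real roots as $(r_-, r_+)$ are assumptions made for illustration.

```python
import numpy as np

def delta_r(r, M0, Lam, a=0.0):
    """Radial function Delta_r = (r^2 + a^2)(1 - Lam r^2 / 3) - 2 M0 r."""
    return (r**2 + a**2) * (1.0 - Lam * r**2 / 3.0) - 2.0 * M0 * r

def horizons(M0, Lam, a=0.0, tol=1e-9):
    """Return (r_minus, r_plus), assumed here to be the two largest positive
    real roots of Delta_r, or None if fewer than two such roots exist."""
    # Delta_r expanded in powers of r:
    # -(Lam/3) r^4 + (1 - Lam a^2 / 3) r^2 - 2 M0 r + a^2
    coeffs = [-Lam / 3.0, 0.0, 1.0 - Lam * a**2 / 3.0, -2.0 * M0, a**2]
    roots = np.roots(coeffs)
    real_pos = sorted(z.real for z in roots if abs(z.imag) < tol and z.real > tol)
    if len(real_pos) < 2:
        return None
    return real_pos[-2], real_pos[-1]

# Hypothetical sample values with 9 * Lam * M0**2 = 0.45 < 1.
r_minus, r_plus = horizons(M0=1.0, Lam=0.05)
```

For $a = 0$ the sketch reproduces the dichotomy stated above: two positive roots exist when $9\Lambda M_0^2 < 1$, and `horizons` returns `None` for the sample value $\Lambda = 0.2$, $M_0 = 1$, where $9\Lambda M_0^2 > 1$.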
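(Henceforth we denote $D = \frac{1}{i}\partial$.) The volume form is

$$d\,\mathrm{Vol} = \frac{\rho^2\sin\theta}{(1+\alpha)^2}\, dt\, dr\, d\theta\, d\varphi.$$

If we replace $D_t$ by a number $-\omega \in \mathbb{C}$, then the operator $\Box_g$ becomes equal to $P_g(\omega)/\rho^2$, where $P_g(\omega)$ is the following differential operator on M:

$$P_g(\omega) = D_r(\Delta_r D_r) - \frac{(1+\alpha)^2}{\Delta_r}\big((r^2 + a^2)\omega - a D_\varphi\big)^2 + \frac{1}{\sin\theta} D_\theta(\Delta_\theta\sin\theta\, D_\theta) + \frac{(1+\alpha)^2}{\Delta_\theta\sin^2\theta}\big(a\omega\sin^2\theta - D_\varphi\big)^2. \tag{1.1}$$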
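As a consistency check on the passage from the d'Alembert–Beltrami operator to (1.1), the substitution $D_t \mapsto -\omega$ can be carried out term by term. The computation below is a sketch, writing $\Delta_r$ and $\Delta_\theta$ for the radial and angular factors of the metric; it uses only that the coefficients do not depend on t or φ, so that the squared first-order factors may be replaced by their negatives.

```latex
\begin{align*}
\big(a\sin^2\theta\, D_t + D_\varphi\big)^2\Big|_{D_t = -\omega}
  &= \big(-a\omega\sin^2\theta + D_\varphi\big)^2
   = \big(a\omega\sin^2\theta - D_\varphi\big)^2, \\
\big((r^2 + a^2) D_t + a D_\varphi\big)^2\Big|_{D_t = -\omega}
  &= \big(-(r^2 + a^2)\omega + a D_\varphi\big)^2
   = \big((r^2 + a^2)\omega - a D_\varphi\big)^2,
\end{align*}
so that
\[
\rho^2\,\Box_g\Big|_{D_t = -\omega}
  = D_r(\Delta_r D_r)
  - \frac{(1+\alpha)^2}{\Delta_r}\big((r^2 + a^2)\omega - a D_\varphi\big)^2
  + \frac{1}{\sin\theta} D_\theta(\Delta_\theta \sin\theta\, D_\theta)
  + \frac{(1+\alpha)^2}{\Delta_\theta \sin^2\theta}\big(a\omega\sin^2\theta - D_\varphi\big)^2
  = P_g(\omega).
\]
```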
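We now introduce the separation of variables for the operator $P_g(\omega)$. We start with taking Fourier series in the variable φ. For every $k \in \mathbb{Z}$, define the space

$$\mathcal{D}'_k = \{u \in \mathcal{D}' \mid (D_\varphi - k)u = 0\}. \tag{1.2}$$

This space can be considered as a subspace of $\mathcal{D}'(M)$ or of $\mathcal{D}'(S^2)$ alone, and

$$L^2(M) = \bigoplus_{k \in \mathbb{Z}} \big(L^2(M) \cap \mathcal{D}'_k\big);$$

the right-hand side is the Hilbert sum of a family of closed mutually orthogonal subspaces. Let $P_g(\omega, k)$ be the restriction of $P_g(\omega)$ to $\mathcal{D}'_k$. Then we can write $P_g(\omega, k) = P_r(\omega, k) + P_\theta(\omega)|_{\mathcal{D}'_k}$, where

$$P_r(\omega, k) = D_r(\Delta_r D_r) - \frac{(1+\alpha)^2}{\Delta_r}\big((r^2 + a^2)\omega - ak\big)^2, \qquad P_\theta(\omega) = \frac{1}{\sin\theta} D_\theta(\Delta_\theta\sin\theta\, D_\theta) + \frac{(1+\alpha)^2}{\Delta_\theta\sin^2\theta}\big(a\omega\sin^2\theta - D_\varphi\big)^2 \tag{1.3}$$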
are differential operators in r and (θ, φ), respectively.

Next, we introduce a modification of the Kerr-star coordinates (see [13, Sect. 5.1]). Following [33], we remove the singularities at $r = r_\pm$ by making the change of variables $(t, r, \theta, \varphi) \to (t^*, r, \theta, \varphi^*)$, where

$$t^* = t - F_t(r), \qquad \varphi^* = \varphi - F_\varphi(r).$$

Note that $\partial_{t^*} = \partial_t$ and $\partial_{\varphi^*} = \partial_\varphi$. In the new coordinates, the metric becomes

$$g = -\rho^2\Big(\frac{dr^2}{\Delta_r} + \frac{d\theta^2}{\Delta_\theta}\Big) - \frac{\Delta_\theta\sin^2\theta}{(1+\alpha)^2\rho^2}\big[a\,dt^* - (r^2 + a^2)\,d\varphi^* + (a F_t'(r) - (r^2 + a^2) F_\varphi'(r))\,dr\big]^2 + \frac{\Delta_r}{(1+\alpha)^2\rho^2}\big[dt^* - a\sin^2\theta\,d\varphi^* + (F_t'(r) - a\sin^2\theta\, F_\varphi'(r))\,dr\big]^2.$$
The functions Ft and Fϕ are required to be smooth on (r− , r+ ) and satisfy the following conditions:
• $F_t(r) = F_\varphi(r) = 0$ for $r \in K_r = [r_- + \delta_r, r_+ - \delta_r]$;
• $F_t'(r) = \pm(1+\alpha)(r^2 + a^2)/\Delta_r + F_{t\pm}(r)$ and $F_\varphi'(r) = \pm(1+\alpha)a/\Delta_r + F_{\varphi\pm}(r)$, where $F_{t\pm}$ and $F_{\varphi\pm}$ are smooth at $r = r_\pm$, respectively;
• for some (a-independent) constant C and all $r \in (r_-, r_+)$,

$$\frac{(1+\alpha)^2(r^2 + a^2)^2}{\Delta_r} - \Delta_r F_t'(r)^2 - (1+\alpha)^2 a^2 \ge \frac{1}{C} > 0.$$

Under these conditions, the metric g in the new coordinates is smooth up to the event horizons $r = r_\pm$ and the space slices

$$M_{t_0} = \{t^* = t_0 = \mathrm{const}\} \cap (\mathbb{R} \times M), \qquad t_0 \in \mathbb{R},$$

are space-like. Let $\nu_t$ be the time-like normal vector field to these surfaces, chosen so that $g(\nu_t, \nu_t) = 1$ and $\langle dt^*, \nu_t\rangle > 0$.

We now establish a basic energy estimate for the wave equation in our setting. Let u be a real-valued function smooth in the coordinates $(t^*, r, \theta, \varphi^*)$ up to the event horizons. Define the vector field $T(du)$ by

$$T(du) = \partial_t u\, \nabla_g u - \frac{1}{2}\, g(du, du)\, \nu_t.$$

Since $\nu_t$ is timelike, the expression $g(T(du), \nu_t)$ is a positive definite quadratic form in du. For $t_0 \in \mathbb{R}$, define $E(t_0)(du)$ as the integral of this quadratic form over the space slice $M_{t_0}$ with the volume form induced by the metric.

Proposition 1.1. Take $t_1 < t_2$ and let $\Omega = \{t_1 \le t^* \le t_2\} \times M$. Assume that u is smooth in Ω up to its boundary and solves the wave equation $\Box_g u = 0$ in this region. Then

$$E(t_2)(du) \le e^{C_e(t_2 - t_1)} E(t_1)(du)$$

for some constant $C_e$ independent of $t_1$ and $t_2$.

Proof. We use the method of [35, Prop. 2.8.1]. We apply the divergence theorem to the vector field $T(du)$ on the domain Ω. The integrals over $M_{t_1}$ and $M_{t_2}$ will be equal to $-E(t_1)$ and $E(t_2)$. The restriction of the metric to tangent spaces of the event horizons is nonpositive and the field $\nu_t$ is pointing outside of Ω at $r = r_\pm$; therefore, the integrals over the event horizons will be nonnegative. Finally, since $\Box_g u = 0$, one can prove that $\operatorname{div} T(du)$ is quadratic in du and thus $|\operatorname{div} T(du)| \le C g(T(du), \nu_t)$. Therefore, the divergence theorem gives

$$E(t_2) - E(t_1) \le C \int_{t_1}^{t_2} E(t_0)\, dt_0.$$

It remains to use Gronwall's inequality.
The geometric configuration of $\{t^* = t_1\}$, $\{t^* = t_2\}$, $\{r = r_\pm\}$, and $\nu_t$ with respect to the Lorentzian metric g used in Prop. 1.1, combined with the theory of hyperbolic equations (see [13, Prop. 3.1.1], [22, Thm. 23.2.4], or [36, Sects. 2.8 and 7.7]), makes it possible to prove that for each $f_0 \in H^1(M)$, $f_1 \in L^2(M)$, there exists a unique solution

$$u(t^*, \cdot) \in C([0, \infty); H^1(M)) \cap C^1([0, \infty); L^2(M))$$

to the initial value problem

$$\Box_g u = 0, \qquad u|_{t^*=0} = f_0, \qquad \partial_{t^*} u|_{t^*=0} = f_1. \tag{1.4}$$
We are now ready to prove Theorem 6. Fix $\delta_r > 0$ and assume that a is chosen small enough so that Theorems 2–5 hold. Assume that $s' > 0$ and u is the solution to (1.4) with $f_0 \in H^{3/2+s'} \cap \mathcal{E}'(M'_K)$ and $f_1 \in H^{1/2+s'} \cap \mathcal{E}'(M'_K)$, where $M'_K$ is fixed and compactly contained in $M_K$. By finite propagation speed (see [36, Thm. 2.6.1 and Sect. 2.8]), there exists a function $\chi(t) \in C^\infty(0, \infty)$ independent of u and such that $\chi(t^*) = 1$ for $t^* > 1$, and for $t^* \in \operatorname{supp}(1 - \chi)$, $\operatorname{supp} u(t^*, \cdot) \subset M_K$. By Proposition 1.1, we can define the Fourier-Laplace transform

$$\widehat{\chi u}(\omega) = \int e^{it^*\omega}\chi(t^*)\, u(t^*, \cdot)\, dt^* \in H^{3/2+s'}(M), \qquad \operatorname{Im}\omega > C_e.$$

Put $f = \rho^2\Box_g(\chi u) = \rho^2[\Box_g, \chi]u$; then

$$f \in H^{1/2+s'}_{\mathrm{comp}}\big(\mathbb{R}; L^2(M) \cap \mathcal{E}'(M_K)\big).$$

Therefore, one can define the Fourier-Laplace transform $\hat f(\omega) \in L^2 \cap \mathcal{E}'(M_K)$ for all $\omega \in \mathbb{C}$, and we have the estimate

$$\int \langle\omega\rangle^{2s'+1}\, \|\hat f(\omega)\|^2_{L^2(M)}\, d\omega \le C\big(\|f_0\|^2_{H^{3/2+s'}} + \|f_1\|^2_{H^{1/2+s'}}\big),$$

where integration is performed over the line $\{\operatorname{Im}\omega = \nu = \mathrm{const}\}$ with ν bounded.

Proposition 1.2. We have for $\operatorname{Im}\omega > C_e$,

$$\widehat{\chi u}(\omega)|_{M_K} = R_g(\omega)\hat f(\omega).$$

Proof. Without loss of generality, we may assume that $u \in C^\infty \cap \mathcal{D}'_k$ for some $k \in \mathbb{Z}$; then $R_g(\omega)\hat f(\omega)$ can be defined on the whole M by Theorem 1. Fix ω and put

$$\Phi(\omega) = e^{i\omega F_t(r)}\,\widehat{\chi u}(\omega) - R_g(\omega, k)\hat f(\omega) \in C^\infty(M).$$

Since $\rho^2\Box_g(\chi u) = f$, we have $P_g(\omega)\big(e^{i\omega F_t(r)}\,\widehat{\chi u}(\omega)\big) = \hat f(\omega)$; therefore, $P_g(\omega)\Phi(\omega) = 0$. Note also that Φ is smooth inside M because of ellipticity of the operator $P_g(\omega)$ on $\mathcal{D}'_k$ (see [36, Sect. 7.4] and the last step of the proof of Theorem 1). Now, if we put $U(t, \cdot) = e^{-it\omega}\Phi(\omega)(\cdot)$, then $\Box_g U = 0$ inside $M_S$. However, by Theorem 3, U is smooth in the $(r, t^*, \theta, \varphi^*)$ coordinates up to the event horizons and its energy grows in time faster than allowed by Proposition 1.1; therefore, $\Phi = 0$.
We now restrict our attention to the compact $M_K$, where in particular $t = t^*$ and $\varphi = \varphi^*$. By the Fourier Inversion Formula, for $t > 1$ and $\nu > C_e$,

$$u(t)|_{M_K} = (2\pi)^{-1}\int e^{-it(\omega + i\nu)} R_g(\omega + i\nu)\hat f(\omega + i\nu)\, d\omega.$$

Fix positive $s < s'$. By Theorems 4 and 5, there exists $\nu_0 > 0$ such that zero is the only resonance with $\operatorname{Im}\omega \ge -\nu_0$. Using the estimates in these theorems, we can deform the contour of integration above to the one with $\nu = -\nu_0$. Indeed, by a density argument we may assume that $u \in C^\infty$, and in this case, $\hat f(\omega)$ is rapidly decreasing as $\operatorname{Re}\omega \to \infty$ for $\operatorname{Im}\omega$ fixed. We then get

$$u(t)|_{K_r} = \frac{1+\alpha}{4\pi(r_+^2 + r_-^2 + 2a^2)}\,(\hat f(0), 1)_{L^2(K_r)} + (2\pi)^{-1} e^{-\nu_0 t}\int e^{-it\omega} R_g(\omega - i\nu_0)\hat f(\omega - i\nu_0)\, d\omega. \tag{1.5}$$

We find a representation of the first term above in terms of the initial data for u at time zero. We have

$$(\hat f(0), 1)_{L^2(K_r)} = \int_{M_K \times \mathbb{R}} \Box_g(\chi u)\, d\,\mathrm{Vol}.$$

Here $d\,\mathrm{Vol}$ is the volume form induced by g. Integrating by parts, we get

$$\int_{M_K \times \mathbb{R}} \Box_g(\chi u)\, d\,\mathrm{Vol} = -\int_{t \ge 0} \Box_g((1 - \chi)u)\, d\,\mathrm{Vol} = \int_{t=0} *(du). \tag{1.6}$$

Here $*$ is the Hodge star operator induced by the metric g, with the orientation on M and $\mathbb{R} \times M$ chosen so that $*(dt)$ is positively oriented on $\{t = 0\}$.

Finally, the $L^2$ norm of the integral term in (1.5) can be estimated by

$$C e^{-\nu_0 t}\int \langle\omega\rangle^{s - s' - 1/2}\,\langle\omega\rangle^{s' + 1/2}\,\|\hat f(\omega - i\nu_0)\|_{L^2(K_r)}\, d\omega \le C e^{-\nu_0 t}\,\big\|\langle\omega\rangle^{s' + 1/2}\hat f(\omega - i\nu_0)\big\|_{L^2_\omega(\mathbb{R}) L^2(K_r)} \le C e^{-\nu_0 t}\big(\|f_0\|_{H^{s' + 3/2}} + \|f_1\|_{H^{s' + 1/2}}\big),$$

since $\langle\omega\rangle^{s - s' - 1/2} \in L^2$. This proves Theorem 6.

Remark. In the original coordinates $(t, r, \theta, \varphi)$, the equation $\Box_g u = 0$ has two solutions depending only on the time variable, namely, $u = 1$ and $u = t$. Even though Theorem 6 does not apply to these solutions because we only construct the family of operators $R_g(\omega)$ acting on functions on the compact set $M_K$, it is still interesting to see where our argument fails if $R_g(\omega)$ were well-defined on the whole M. The key fact is that our Cauchy problem is formulated in the $t^*$ variable. Then, for $u = t$ the function $f_0 = u|_{t^*=0}$ behaves like $\log|r - r_\pm|$ near the event horizons and thus does not lie in the energy space $H^1$. As for $u = 1$, our theorem gives the correct form of the contribution of the zero resonance, namely, a constant; however, the value of this constant cannot be given by the integral of $*(du)$ over $t^* = 0$, as $du = 0$. This discrepancy is explained if we look closer at the last equation in (1.6); while integrating by parts, we will get a nonzero term coming from the integral of $*d(\chi(t^*))$ over the event horizons.
2. Separation of Variables in an Abstract Setting

In this section, we construct inverses for certain families of operators with separating variables. Since the method described below can potentially be applied to other situations, we develop it abstractly, without any reference to the operators of our problem. Similar constructions have been used in other settings by Ben-Artzi–Devinatz [6] and Mazzeo–Vasy [29, Sect. 2].

First, let us consider a differential operator $P(\omega) = P_1(\omega) + P_2(\omega)$ in the variables $(x_1, x_2)$, where $P_1(\omega)$ is a differential operator in the variable $x_1$ and $P_2(\omega)$ is a differential operator in the variable $x_2$; $\omega$ is a complex parameter. If we take $H_1$ and $H_2$ to be certain $L^2$ spaces in the variables $x_1$ and $x_2$, respectively, then the corresponding $L^2$ space in the variables $(x_1, x_2)$ is their Hilbert tensor product $H = H_1 \otimes H_2$. Recall that for any two bounded operators $A_1$ and $A_2$ on $H_1$ and $H_2$, respectively, their tensor product $A_1 \otimes A_2$ is a bounded operator on $H$ and $\|A_1 \otimes A_2\| = \|A_1\| \cdot \|A_2\|$. The operator $P$ is now written on $H$ as
$$P(\omega) = P_1(\omega) \otimes 1_{H_2} + 1_{H_1} \otimes P_2(\omega).$$
We now wish to construct an inverse to $P(\omega)$. The method used is an infinite-dimensional generalization of the following elementary

Proposition 2.1. Assume that $A$ and $B$ are two (finite-dimensional) matrices and that the matrix $A \otimes 1 + 1 \otimes B$ is invertible. (That is, no eigenvalue of $A$ is the negative of an eigenvalue of $B$.) For $\lambda \in \mathbb{C}$, let $R_A(\lambda) = (A + \lambda)^{-1}$ and $R_B(\lambda) = (B - \lambda)^{-1}$. Take $\gamma$ to be a bounded simple closed contour in the complex plane such that all poles of $R_A$ lie outside of $\gamma$, but all poles of $R_B$ lie inside $\gamma$; we assume that $\gamma$ is oriented in the clockwise direction. Then
$$(A \otimes 1 + 1 \otimes B)^{-1} = \frac{1}{2\pi i} \int_\gamma R_A(\lambda) \otimes R_B(\lambda)\, d\lambda.$$

The starting point of the method are the inverses$^4$
$$R_1(\omega, \lambda) = (P_1(\omega) + \lambda)^{-1}, \quad R_2(\omega, \lambda) = (P_2(\omega) - \lambda)^{-1}$$
defined for $\lambda \in \mathbb{C}$. These inverses depend on two complex variables, and we need to specify their behavior near the singular points:

Definition 2.1. Let $\mathcal{X}$ be any Banach space, and let $W$ be a domain in $\mathbb{C}^2$. We say that $T(\omega, \lambda)$ is an ($\omega$-nondegenerate) meromorphic map $W \to \mathcal{X}$ if:
(1) $T(\omega, \lambda)$ is a (norm) holomorphic function of two complex variables with values in $\mathcal{X}$ for $(\omega, \lambda) \notin Z$, where $Z$ is a closed subset of $W$, called the divisor of $T$,

$^4$ In this section, we do not use the fact that $R_j(\omega, \lambda) = (P_j(\omega) \pm \lambda)^{-1}$, neither do we prove that $R(\omega) = P(\omega)^{-1}$. This step will be done in our particular case in the proof of Theorem 1 in the next section; in fact, $R_1$ will only be a right inverse to $P_1 + \lambda$. Until then, we merely establish properties of $R(\omega)$ defined by (2.1) below.
(2) for each $(\omega_0, \lambda_0) \in Z$, we can write $T(\omega, \lambda) = S(\omega, \lambda)/X(\omega, \lambda)$ near $(\omega_0, \lambda_0)$, where $S$ is holomorphic with values in $\mathcal{X}$ and $X$ is a holomorphic function of two variables (with values in $\mathbb{C}$) such that:
• for each $\omega$ close to $\omega_0$, there exists $\lambda$ such that $X(\omega, \lambda) \ne 0$, and
• the divisor of $T$ is given by $\{X = 0\}$ near $(\omega_0, \lambda_0)$.

Note that the definition above is stronger than the standard definition of meromorphy and it is not symmetric in $\omega$ and $\lambda$. Henceforth we will use this definition when talking about meromorphic families of operators of two complex variables. It is clear that any derivative (in $\omega$ and/or $\lambda$) of a meromorphic family is again meromorphic. Moreover, if $T(\omega, \lambda)$ is meromorphic and we fix $\omega$, then $T$ is a meromorphic family in $\lambda$. If $\mathcal{X}$ is the space of all bounded operators on some Hilbert space (equipped with the operator norm), then it makes sense to talk about having poles of finite rank:

Definition 2.2. Let $H$ be a Hilbert space and let $T(\omega, \lambda)$ be a meromorphic family of operators on $H$ in the sense of Definition 2.1. For $(\omega_0, \lambda_0)$ in the divisor of $T$, consider the decomposition
$$T(\omega_0, \lambda) = T_H(\lambda) + \sum_{j=1}^{N} \frac{T_j}{(\lambda - \lambda_0)^j}.$$
Here $T_H$ is holomorphic near $\lambda_0$ and $T_j$ are some operators. We say that $T$ has poles of finite rank if every operator $T_j$ in the above decomposition of every $\omega$-derivative of $T$ near every point in the divisor is finite-dimensional.

One can construct meromorphic families of operators with poles of finite rank by using the following generalization of Analytic Fredholm Theory:

Proposition 2.2. Assume that $T(\omega, \lambda) : H_1 \to H_2$, $(\omega, \lambda) \in \mathbb{C}^2$, is a holomorphic family of Fredholm operators, where $H_1$ and $H_2$ are some Hilbert spaces. Moreover, assume that for each $\omega$, there exists $\lambda$ such that the operator $T(\omega, \lambda)$ is invertible. Then $T(\omega, \lambda)^{-1}$ is a meromorphic family of operators $H_2 \to H_1$ with poles of finite rank. (The divisor is the set of all points where $T$ is not invertible.)

Proof. We can use the proof of the standard Analytic Fredholm Theory via Grushin problems, see for example [17, Thm. C.3].

We now go back to constructing the inverse to $P(\omega)$. We assume that

(A) $R_j(\omega, \lambda)$, $j = 1, 2$, are two families of bounded operators on $H_j$ with poles of finite rank. Here $\omega$ lies in a domain $\Omega \subset \mathbb{C}$ and $\lambda \in \mathbb{C}$.

We want to integrate the tensor product $R_1 \otimes R_2$ in $\lambda$ over a contour $\gamma$ that separates the sets of poles of $R_1(\omega, \cdot)$ and $R_2(\omega, \cdot)$. Let $Z_j$ be the divisor of $R_j$. We call a point $\omega$ regular if the sets $Z_1(\omega)$ and $Z_2(\omega)$ given by (Fig. 1)
$$Z_j(\omega) = \{\lambda \in \mathbb{C} \mid (\omega, \lambda) \in Z_j\}$$
do not intersect. The behavior of the contour $\gamma$ at infinity is given by the following

Definition 2.3. Let $\psi \in (0, \pi)$ be a fixed angle, and let $\omega$ be a regular point. A smooth simple contour $\gamma$ on $\mathbb{C}$ is called admissible (at $\omega$) if:
• outside of some compact subset of $\mathbb{C}$, $\gamma$ is given by the rays $\arg \lambda = \pm\psi$, and
Fig. 1. An admissible contour. The poles of R1 are denoted by circles and the poles of R2 are denoted by asterisks.
• $\gamma$ separates $\mathbb{C}$ into two regions, $\Gamma_1$ and $\Gamma_2$, such that sufficiently large positive real numbers lie in $\Gamma_2$, and $Z_j(\omega) \subset \Gamma_j$ for $j = 1, 2$.

(Henceforth, we assume that $\arg \lambda \in [-\pi, \pi]$. The contour $\gamma$ and the regions $\Gamma_j$ are allowed to have several connected components.) Existence of admissible contours and convergence of the integral is guaranteed by the following condition:

(B) For any compact $K_\omega \subset \Omega$, there exist constants $C$ and $R$ such that for $\omega \in K_\omega$ and $|\lambda| \ge R$,
• for $|\arg \lambda| \le \psi$, we have $(\omega, \lambda) \notin Z_1$ and $\|R_1(\omega, \lambda)\| \le C/|\lambda|$, and
• for $|\arg \lambda| \ge \psi$, we have $(\omega, \lambda) \notin Z_2$ and $\|R_2(\omega, \lambda)\| \le C/|\lambda|$.

It follows from (B) that there exist admissible contours at every regular point. Take a regular point $\omega$, an admissible contour $\gamma$ at $\omega$, and define
$$R(\omega) = \frac{1}{2\pi i} \int_\gamma R_1(\omega, \lambda) \otimes R_2(\omega, \lambda)\, d\lambda. \tag{2.1}$$
Here the orientation of $\gamma$ is chosen so that $\Gamma_1$ always stays on the left. The integral above converges and is independent of the choice of an admissible contour $\gamma$. Moreover, the set of regular points is open and $R$ is holomorphic on this set. (We may represent $R(\omega)$ as a locally uniform limit of the integral over the intersection of $\gamma$ with a ball whose radius goes to infinity.) The main result of this section is

Proposition 2.3. Assume that $H_1$ and $H_2$ are two Hilbert spaces, and $H = H_1 \otimes H_2$ is their Hilbert tensor product. Let $R_1(\omega, \lambda)$ and $R_2(\omega, \lambda)$ be two families of bounded operators on $H_1$ and $H_2$, respectively, for $\omega \in \Omega \subset \mathbb{C}$ and $\lambda \in \mathbb{C}$. Assume that $R_1$ and $R_2$ satisfy assumptions (A)–(B) and the nondegeneracy assumption

(C) The set $\Omega_R$ of all regular points is nonempty.

Then the set of all non-regular points is discrete and the operator $R(\omega)$ defined by (2.1) is meromorphic in $\omega \in \Omega$ with poles of finite rank.

The rest of this section contains the proof of Proposition 2.3. First, let us establish a normal form for meromorphic decompositions of families in two variables:
Proposition 2.4. Let $T(\omega, \lambda)$ be meromorphic (with values in some Banach space) and assume that $(\omega_0, \lambda_0)$ lies in the divisor of $T$. Then we can write near $(\omega_0, \lambda_0)$,
$$T(\omega, \lambda) = \frac{S(\omega, \lambda)}{Q(\omega, \lambda)},$$
where $S$ is holomorphic and $Q$ is a monic polynomial in $\lambda$ of degree $N$ with coefficients holomorphic in $\omega$; moreover, $Q(\omega_0, \lambda) = (\lambda - \lambda_0)^N$. The divisor of $T$ coincides with the set of zeroes of $Q$ near $(\omega_0, \lambda_0)$.

Proof. Follows from Definition 2.1 and the Weierstrass Preparation Theorem.
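The finite-dimensional model behind this section, Proposition 2.1, can be checked numerically. The following is a minimal sketch in plain Python, not part of the original argument: the two $2 \times 2$ matrices, the circular contour, the step count, and all helper names are illustrative assumptions. It approximates the clockwise contour integral of $R_A(\lambda) \otimes R_B(\lambda)$ by the trapezoid rule and verifies that the result inverts $A \otimes 1 + 1 \otimes B$.

```python
import cmath

def inv2(m):
    """Inverse of a 2x2 complex matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def kron(p, q):
    """Kronecker (tensor) product of two square matrices."""
    n, m = len(p), len(q)
    return [[p[i // m][j // m] * q[i % m][j % m] for j in range(n * m)]
            for i in range(n * m)]

def matmul(p, q):
    """Product of two square matrices of the same size."""
    n = len(p)
    return [[sum(p[i][k] * q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def shift(m, lam):
    """Return m + lam * identity."""
    return [[m[i][j] + (lam if i == j else 0) for j in range(len(m))]
            for i in range(len(m))]

def contour_integral(A, B, center, radius, steps=200):
    """(1 / (2 pi i)) times the clockwise integral of R_A(lam) (x) R_B(lam)
    over the circle |lam - center| = radius, by the trapezoid rule."""
    size = len(A) * len(B)
    total = [[0j] * size for _ in range(size)]
    for s in range(steps):
        t = 2 * cmath.pi * s / steps
        lam = center + radius * cmath.exp(-1j * t)      # clockwise direction
        dlam = -1j * radius * cmath.exp(-1j * t) * (2 * cmath.pi / steps)
        term = kron(inv2(shift(A, lam)),                # R_A = (A + lam)^{-1}
                    inv2(shift(B, -lam)))               # R_B = (B - lam)^{-1}
        for i in range(size):
            for j in range(size):
                total[i][j] += term[i][j] * dlam
    return [[x / (2j * cmath.pi) for x in row] for row in total]

# Illustrative data: poles of R_A at lam = -2, -3; poles of R_B at lam = 1, 1.5.
A = [[2.0, 1.0], [0.0, 3.0]]
B = [[1.0, 0.5], [0.0, 1.5]]
I2 = [[1.0, 0.0], [0.0, 1.0]]

# The circle |lam - 1.25| = 1 encloses the poles of R_B and none of R_A.
R = contour_integral(A, B, center=1.25, radius=1.0)

left, right = kron(A, I2), kron(I2, B)
M = [[left[i][j] + right[i][j] for j in range(4)] for i in range(4)]
product = matmul(M, R)
error = max(abs(product[i][j] - (1 if i == j else 0))
            for i in range(4) for j in range(4))
print(error)
```

The check multiplies $A \otimes 1 + 1 \otimes B$ by the computed integral instead of inverting a $4 \times 4$ matrix, which keeps the sketch free of external libraries.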
Proposition 2.5. Assume that $Q_j(\omega, \lambda)$, $j = 1, 2$, are two monic polynomials in $\lambda$ of degrees $N_j$ with coefficients holomorphic in $\omega$ near $\omega_0$. Assume also that for some $\omega$, $Q_1$ and $Q_2$ are coprime as polynomials. Then there exist unique polynomials $p_1$ and $p_2$ of degree no more than $N_2 - 1$ and $N_1 - 1$, respectively, with coefficients meromorphic in $\omega$ and such that $1 = p_1 Q_1 + p_2 Q_2$ when $p_1$ and $p_2$ are well-defined.

Proof. The $N_1 + N_2$ coefficients of $p_1$ and $p_2$ solve a system of $N_1 + N_2$ linear equations with fixed right-hand side and the matrix $A(\omega)$ depending holomorphically on $\omega$. If $\omega$ is chosen so that $Q_1$ and $Q_2$ are coprime, then the system has a unique solution; therefore, the determinant of $A(\omega)$ is not identically zero. The proposition then follows from Cramer's Rule.

We are now ready to prove that $R(\omega)$ is meromorphic. It suffices to show that for each $\omega_0 \in \Omega$ lying in the closure $\overline{\Omega_R}$, $\omega_0$ is an isolated non-regular point and $R(\omega)$ has a meromorphic decomposition at $\omega_0$ with finite-dimensional principal part. Indeed, in this case $\overline{\Omega_R}$ is open; since it is closed and nonempty by (C), we have $\overline{\Omega_R} = \Omega$ and the statement above applies to each $\omega_0$.

Let $Z_1(\omega_0) \cap Z_2(\omega_0) = \{\lambda_1, \ldots, \lambda_m\}$. We choose a ball $\Omega_0$ centered at $\omega_0$ and disjoint balls $U_l$ centered at $\lambda_l$ such that:
• for $\omega \in \Omega_0$, the set $Z_1(\omega) \cap Z_2(\omega)$ is covered by balls $U_l$ and the set $Z_1(\omega) \cup Z_2(\omega)$ does not intersect the circles $\partial U_l$;
• for $\omega \in \Omega_0$ and $\lambda \in U_l$, we have $R_j = S_{jl}/Q_{jl}$, where $S_{jl}$ are holomorphic and $Q_{jl}$ are monic polynomials in $\lambda$ of degree $N_{jl}$ with coefficients holomorphic in $\omega$, and $Q_{jl}(\omega_0, \lambda) = (\lambda - \lambda_l)^{N_{jl}}$;
• for $\omega \in \Omega_0$, the set of all roots of $Q_{jl}(\omega, \cdot)$ coincides with $Z_j(\omega) \cap U_l$;
• there exists a contour $\gamma_0$ that does not intersect any $U_l$ and is admissible for any $\omega \in \Omega_0$ with respect to the sets $Z_j(\omega) \setminus \cup U_l$ in place of $Z_j(\omega)$; moreover, each $\partial U_l$ lies in the region $\Gamma_1$ with respect to $\gamma_0$ (see Definition 2.3).

Let us assume that $\omega \in \Omega_0$ is regular. (Such points exist since $\omega_0$ lies in the closure of $\Omega_R$.) For every $l$, the polynomials $Q_{1l}(\omega, \lambda)$ and $Q_{2l}(\omega, \lambda)$ are coprime; we find by Proposition 2.5 unique polynomials $p_{1l}(\omega, \lambda)$ and $p_{2l}(\omega, \lambda)$ such that
$$1 = p_{1l} Q_{1l} + p_{2l} Q_{2l}$$
and $\deg p_{1l} < N_{2l}$, $\deg p_{2l} < N_{1l}$. The converse is also true: if all coefficients of $p_{1l}$ and $p_{2l}$ are holomorphic at some point $\omega$ for all $l$, then $\omega$ is a regular point. It follows immediately that $\omega_0$ is an isolated non-regular point.

To obtain the meromorphic expansion of $R(\omega)$ near $\omega_0$, let us take a regular point $\omega \in \Omega_0$ and an admissible contour $\gamma = \gamma_0 + \cdots + \gamma_m$, where $\gamma_0$ is the $\omega$-independent contour defined above and each $\gamma_l$ is a contour lying in $U_l$. The integral over $\gamma_0$ is holomorphic near $\omega_0$, while
$$\int_{\gamma_l} R_1(\omega, \lambda) \otimes R_2(\omega, \lambda)\, d\lambda = \int_{\gamma_l} S_{1l}(\omega, \lambda) \otimes S_{2l}(\omega, \lambda) \left( \frac{p_{1l}(\omega, \lambda)}{Q_{2l}(\omega, \lambda)} + \frac{p_{2l}(\omega, \lambda)}{Q_{1l}(\omega, \lambda)} \right) d\lambda = \int_{\partial U_l} p_{1l}\, S_{1l} \otimes R_2\, d\lambda = \sum_{j=0}^{N_{2l} - 1} p_{1lj}(\omega) \int_{\partial U_l} (\lambda - \lambda_l)^j S_{1l} \otimes R_2\, d\lambda.$$
Here $p_{1lj}(\omega)$ are the coefficients of $p_{1l}$ as a polynomial of $\lambda - \lambda_l$; they are meromorphic in $\omega$ and the rest is holomorphic in $\omega \in \Omega_0$.

It remains to prove that $R$ has poles of finite rank. It suffices to show that every derivative in $\omega$ of the last integral above at $\omega = \omega_0$ has finite rank. Each of these, in turn, is a finite linear combination of
$$\int_{\partial U_l} (\lambda - \lambda_l)^j\, \partial_\omega^a S_{1l}(\omega_0, \lambda) \otimes \partial_\omega^b R_2(\omega_0, \lambda)\, d\lambda.$$
However, since $\partial_\omega^a S_{1l}(\omega_0, \lambda)$ is holomorphic in $\lambda \in U_l$, only the principal part of the Laurent decomposition of $\partial_\omega^b R_2(\omega_0, \lambda)$ at $\lambda = \lambda_l$ will contribute to this integral; therefore, the image of each operator in the principal part of the Laurent decomposition of $R(\omega)$ at $\omega_0$ lies in $H_1 \otimes V_2$, where $V_2$ is a certain finite-dimensional subspace of $H_2$. It remains to show that each of these images also lies in $V_1 \otimes H_2$, where $V_1$ is a certain finite-dimensional subspace of $H_1$. This is done by the same argument, using the fact that
$$\int_{\partial U_l - \gamma_l} R_1(\omega, \lambda) \otimes R_2(\omega, \lambda)\, d\lambda$$
can be written in terms of $p_{2l}$ and $R_1 \otimes S_{2l}$ and the integral over $\partial U_l$ is holomorphic at $\omega_0$. The proof of Proposition 2.3 is finished.

3. Construction of $R_g(\omega)$

As we saw in the previous section, one can deduce the existence of an inverse to $P_g = P_r + P_\theta$ and its properties from certain properties of the inverses to $P_r + \lambda$ and $P_\theta - \lambda$ for $\lambda \in \mathbb{C}$. We start with the latter. For $a = 0$, $P_\theta$ is the (negative) Laplace–Beltrami operator for the round metric on $S^2$; therefore, its eigenvalues are given by $\lambda = l(l + 1)$ for $l \in \mathbb{Z}$, $l \ge 0$. Moreover, if $D_k$ is the space defined in (1.2) and there is an eigenfunction of $P_\theta|_{D_k}$ with eigenvalue $l(l + 1)$, then $l \ge |k|$. These observations can be generalized to our case:
Proposition 3.1. There exists a two-sided inverse
$$R_\theta(\omega, \lambda) = (P_\theta(\omega) - \lambda)^{-1} : L^2(S^2) \to H^2(S^2), \quad (\omega, \lambda) \in \mathbb{C}^2,$$
with the following properties:

1. $R_\theta(\omega, \lambda)$ is meromorphic with poles of finite rank in the sense of Definition 2.2 and it has the following meromorphic decomposition at $\omega = \lambda = 0$:
$$R_\theta(\omega, \lambda) = \frac{S_{\theta 0}(\omega, \lambda)}{\lambda - \lambda_\theta(\omega)}, \tag{3.1}$$
where $S_{\theta 0}$ and $\lambda_\theta$ are holomorphic in $a$-independent neighborhoods of zero and
$$S_{\theta 0}(0, 0) = -\frac{1 \otimes 1}{4\pi}, \quad \lambda_\theta(\omega) = O(|\omega|^2).$$

2. There exists a constant $C_\theta$ such that
$$\|R_\theta(\omega, \lambda)\|_{L^2(S^2) \cap D_k \to L^2(S^2)} \le \frac{C_\theta}{|k|^2} \quad \text{for } |\lambda| \le k^2/2,\ |k| \ge C_\theta |a\omega|, \tag{3.2}$$
and
$$\|R_\theta(\omega, \lambda)\|_{L^2(S^2) \cap D_k \to L^2(S^2)} \le \frac{2}{|\operatorname{Im} \lambda|} \quad \text{for } |\operatorname{Im} \lambda| > C_\theta |a| (|a\omega| + |k|) |\operatorname{Im} \omega|. \tag{3.3}$$

3. For every $\psi > 0$, there exists a constant $C_\psi$ such that
$$\|R_\theta(\omega, \lambda)\|_{L^2(S^2) \to L^2(S^2)} \le \frac{C_\psi}{|\lambda|} \quad \text{for } |\arg \lambda| \ge \psi,\ |\lambda| \ge C_\psi |a\omega|^2. \tag{3.4}$$

Proof. 1. Recall (1.3) that $P_\theta(\omega)$ is a holomorphic family of elliptic second order differential operators on the sphere. Therefore, for each $\lambda$, the operator $P_\theta(\omega) - \lambda : H^2(S^2) \to L^2(S^2)$ is Fredholm (see for example [36, Sect. 7.10]). By Proposition 2.2, $R_\theta(\omega, \lambda)$ is a meromorphic family of operators $L^2 \to H^2$.

We now obtain a meromorphic decomposition for $R_\theta$ near zero using the framework of Grushin problems [17, App. C]. Let $i_1 : \mathbb{C} \to L^2(S^2)$ be the operator of multiplication by the constant function 1 and $\pi_1 : H^2(S^2) \to \mathbb{C}$ be the operator mapping every function to its integral over the standard measure on the round sphere. Consider the operator $A(\omega, \lambda) : H^2 \oplus \mathbb{C} \to L^2 \oplus \mathbb{C}$ given by
$$A(\omega, \lambda) = \begin{pmatrix} P_\theta(\omega) - \lambda & i_1 \\ \pi_1 & 0 \end{pmatrix}.$$
The kernel and cokernel of $P_\theta(0)$ are both one-dimensional and spanned by 1, since this is the Laplace–Beltrami operator for a certain Riemannian metric on the sphere. (Indeed, by ellipticity these spaces consist of smooth functions; by self-adjointness, the kernel and cokernel coincide; one can then apply Green's formula [36, (2.4.8)] to an element of the kernel and itself.) Therefore [17, Thm. C.1], the operator $B(\omega, \lambda) = A(\omega, \lambda)^{-1}$ is well-defined at $(0, 0)$; then it is well-defined for $(\omega, \lambda)$ in an $a$-independent neighborhood of zero. We write
$$B(\omega, \lambda) = \begin{pmatrix} B_{11}(\omega, \lambda) & B_{12}(\omega, \lambda) \\ B_{21}(\omega, \lambda) & B_{22}(\omega, \lambda) \end{pmatrix}.$$
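Now, by Schur's complement formula we have near $(0, 0)$,
$$R_\theta(\omega, \lambda) = B_{11}(\omega, \lambda) - B_{12}(\omega, \lambda) B_{22}(\omega, \lambda)^{-1} B_{21}(\omega, \lambda).$$
However, $B_{22}(\omega, \lambda)$ is a holomorphic function of two variables, and we can find
$$B_{22}(\omega, \lambda) = \frac{\lambda}{4\pi} + O(|\omega|^2 + |\lambda|^2).$$
(The $\omega$-derivative vanishes at zero since $\partial_\omega P_\theta(0)|_{D_0} = 0$. To compute the $\lambda$-derivative, we use that $B_{12}(0, 0) = i_1/4\pi$ and $B_{21}(0, 0) = \pi_1/4\pi$.) The decomposition (3.1) now follows by the Weierstrass Preparation Theorem.

2. We have $P_\theta(\omega) = P_\theta(0) + P'_\theta(\omega)$, where
$$P'_\theta(\omega) = \frac{(1 + \alpha)^2 a\omega}{\Delta_\theta} (-2 D_\varphi + a\omega \sin^2\theta)$$
is a first order differential operator and
$$P_\theta(0) = \frac{1}{\sin\theta} D_\theta(\Delta_\theta \sin\theta\, D_\theta) + \frac{(1 + \alpha)^2}{\Delta_\theta \sin^2\theta} D_\varphi^2 : H^2(S^2) \to L^2(S^2)$$
satisfies $P_\theta(0) \ge k^2$ on $D_k$; therefore, if $u \in H^2(S^2) \cap D_k$, then
$$\|u\|_{L^2} \le \frac{\|(P_\theta(0) - \lambda) u\|_{L^2}}{d(\lambda, k^2 + \mathbb{R}_+)}.$$
Since
$$\|P'_\theta(\omega)\|_{L^2(S^2) \cap D_k \to L^2(S^2)} \le 2(1 + \alpha)^2 |a\omega| (|a\omega| + |k|),$$
we get
$$\|u\|_{L^2} \le \frac{\|(P_\theta(\omega) - \lambda) u\|_{L^2}}{d(\lambda, k^2 + \mathbb{R}_+) - C_1 |a\omega| (|a\omega| + |k|)}, \tag{3.5}$$
provided that the denominator is positive. Here $C_1$ is a global constant. Now, if $|\lambda| \le k^2/2$, then $d(\lambda, k^2 + \mathbb{R}_+) \ge k^2/2$ and
$$d(\lambda, k^2 + \mathbb{R}_+) - C_1 |a\omega| (|a\omega| + |k|) \ge \frac{k^2}{4} \quad \text{for } |k| \ge 8(1 + C_1)|a\omega|;$$
together with (3.5), this proves (3.2). To prove (3.3), introduce
$$\operatorname{Im} P_\theta(\omega) = \frac{1}{2i} (P_\theta(\omega) - P_\theta(\omega)^*) = \frac{2(1 + \alpha)^2}{\Delta_\theta}\, a \operatorname{Im} \omega\, (a \operatorname{Re} \omega \sin^2\theta - D_\varphi);$$
we have
$$\|\operatorname{Im} P_\theta(\omega)\|_{L^2(S^2) \cap D_k \to L^2(S^2)} \le 2(1 + \alpha)^2 |a \operatorname{Im} \omega| (|a\omega| + |k|).$$
However, for $u \in H^2(S^2) \cap D_k$,
$$\|(P_\theta(\omega) - \lambda) u\| \cdot \|u\| \ge |\operatorname{Im}((P_\theta(\omega) - \lambda) u, u)| \ge |\operatorname{Im} \lambda| \cdot \|u\|^2 - |(\operatorname{Im} P_\theta(\omega) u, u)| \ge \big(|\operatorname{Im} \lambda| - 2(1 + \alpha)^2 |a| (|a\omega| + |k|) |\operatorname{Im} \omega|\big) \|u\|^2,$$
and we are done if $C_\theta \ge 4(1 + \alpha)^2$.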
3. If $|\arg \lambda| \ge \psi$, then $d(\lambda, k^2 + \mathbb{R}_+) \ge (k^2 + |\lambda|)/C_2$; here $C_2$ is a constant depending on $\psi$. We have then
$$d(\lambda, k^2 + \mathbb{R}_+) - C_1 |a\omega| (|a\omega| + |k|) \ge \frac{1}{C_2} |\lambda| - C_3 |a\omega|^2$$
for some constant $C_3$, and we are done by (3.5).

The analysis of the radial operator $P_r$ is more complicated. In Sects. 4–6, we prove

Proposition 3.2. There exists a family of operators
$$R_r(\omega, \lambda, k) : L^2_{\mathrm{comp}}(r_-, r_+) \to H^2_{\mathrm{loc}}(r_-, r_+), \quad (\omega, \lambda) \in \mathbb{C}^2,$$
with the following properties:

1. For each $k \in \mathbb{Z}$, $R_r(\omega, \lambda, k)$ is meromorphic with poles of finite rank in the sense of Definition 2.2, and $(P_r(\omega, k) + \lambda) R_r(\omega, \lambda, k) f = f$ for each $f \in L^2_{\mathrm{comp}}(r_-, r_+)$. Also, for $k = 0$, $R_r$ admits the following meromorphic decomposition near $\omega = \lambda = 0$:
$$R_r(\omega, \lambda, 0) = \frac{S_{r0}(\omega, \lambda)}{\lambda - \lambda_r(\omega)}, \tag{3.6}$$
where $S_{r0}$ and $\lambda_r$ are holomorphic in $a$-independent neighborhoods of zero and
$$S_{r0}(0, 0) = \frac{1 \otimes 1}{r_+ - r_-}, \quad \lambda_r(\omega) = \frac{i(1 + \alpha)(r_+^2 + r_-^2 + 2a^2)}{r_+ - r_-}\, \omega + O(|\omega|^2).$$

2. Take $\delta_r > 0$. Then there exist $\psi > 0$ and $C_r$ such that for
$$|\lambda| \ge C_r, \quad |\arg \lambda| \le \psi, \quad |ak|^2 \le |\lambda|/C_r, \quad |\omega|^2 \le |\lambda|/C_r, \tag{3.7}$$
$(\omega, \lambda, k)$ is not a pole of $R_r$ and we have
$$\|1_{K_r} R_r(\omega, \lambda, k) 1_{K_r}\|_{L^2 \to L^2} \le \frac{C_r}{|\lambda|}. \tag{3.8}$$
Also, there exists $\delta_{r0} > 0$ such that, if $K_+ = [r_+ - \delta_{r0}, r_+]$ and $K_- = [r_-, r_- + \delta_{r0}]$, then for each $N$ there exists a constant $C_N$ such that under the conditions (3.7), we have
$$\big\|1_{K_\pm} |r - r_\pm|^{i A_\pm^{-1} (1 + \alpha)((r_\pm^2 + a^2)\omega - ak)} R_r(\omega, \lambda, k) 1_{K_r}\big\|_{L^2 \to C^N(K_\pm)} \le \frac{C_N}{|\lambda|^N}. \tag{3.9}$$

3. There exists a constant $C_\omega$ such that $R_r(\omega, \lambda, k)$ does not have any poles for real $\lambda$ and real $\omega$ with $|\omega| > C_\omega |ak|$.

4. Assume that $R_r$ has a pole at $(\omega, \lambda, k)$. Then there exists a nonzero solution $u \in C^\infty(r_-, r_+)$ to the equation $(P_r(\omega, k) + \lambda) u = 0$ such that the functions
$$|r - r_\pm|^{i A_\pm^{-1} (1 + \alpha)((r_\pm^2 + a^2)\omega - ak)}\, u(r)$$
are real analytic at $r_\pm$, respectively.
5. Take $\delta_r > 0$. Then there exists $C_{1r} > 0$ such that for
$$\operatorname{Im} \omega > 0, \quad |ak| \le |\omega|/C_{1r}, \quad |\operatorname{Im} \lambda| \le |\omega| \cdot \operatorname{Im} \omega / C_{1r}, \quad \operatorname{Re} \lambda \ge -|\omega|^2/C_{1r}, \tag{3.10}$$
$(\omega, \lambda, k)$ is not a pole of $R_r$ and we have
$$\|1_{K_r} R_r(\omega, \lambda, k) 1_{K_r}\|_{L^2 \to L^2} \le \frac{C_{1r}}{|\omega| \operatorname{Im} \omega}. \tag{3.11}$$

Given these two propositions, we can now prove Theorems 1–4:

Proof of Theorem 1. Take $k \in \mathbb{Z}$ and an arbitrary $\delta_r > 0$; put $H_1 = L^2(K_r)$, $H_2 = L^2(S^2) \cap D_k$, $R_1(\omega, \lambda) = R_r(\omega, \lambda, k)$, and $R_2(\omega, \lambda) = R_\theta(\omega, \lambda)|_{D_k}$; finally, let the angle $\psi$ of admissible contours at infinity be chosen as in Proposition 3.2. We now apply Proposition 2.3. Condition (A) follows from the first parts of Propositions 3.1 and 3.2. Condition (B) follows from (3.4) and part 2 of Proposition 3.2. Finally, Condition (C) holds because every $\omega \in \mathbb{R}$ with $|\omega| > C_\omega |ak|$, where $C_\omega$ is the constant from part 3 of Proposition 3.2, is regular. Indeed, $P_\theta(\omega)$ is self-adjoint and thus has only real eigenvalues. Now, by Proposition 2.3 we can use (2.1) to define $R_g(\omega, k)$ as a meromorphic family of operators on $L^2(M_K) \cap D_k$ with poles of finite rank. This can be done for any $\delta_r > 0$; therefore, $R_g(\omega, k)$ is defined as an operator $L^2_{\mathrm{comp}}(M) \cap D_k \to L^2_{\mathrm{loc}}(M) \cap D_k$.

Let us now prove that $P_g(\omega, k) R_g(\omega, k) f = f$ in the sense of distributions for each $f \in L^2_{\mathrm{comp}}$. We will use the method of Proposition 2.1. Assume that $\omega$ is a regular point, so that $R_g(\omega, k)$ is well-defined. By analyticity, we can further assume that $\omega$ is real, so that $L^2(S^2) \cap D_k$ has an orthonormal basis of eigenfunctions of $P_\theta(\omega)$. Then it suffices to prove that
$$I = \big(R_g(\omega, k)(f_r(r) f_\theta(\theta, \varphi)),\ P_g(\omega)(h_r(r) h_\theta(\theta, \varphi))\big) = (f_r, h_r) \cdot (f_\theta, h_\theta),$$
where $f_r, h_r \in C_0^\infty(r_-, r_+)$, $h_\theta \in C^\infty(S^2) \cap D_k$, and $f_\theta \in D_k$ satisfies $P_\theta(\omega) f_\theta = \lambda_0 f_\theta$, $\lambda_0 \in \mathbb{R}$. Take an admissible contour $\gamma$; then
$$I = \frac{1}{2\pi i} \int_\gamma \Big[ (R_r(\omega, \lambda, k) f_r, P_r(\omega, k) h_r) \cdot (R_\theta(\omega, \lambda) f_\theta, h_\theta) + (R_r(\omega, \lambda, k) f_r, h_r) \cdot (R_\theta(\omega, \lambda) f_\theta, P_\theta(\omega) h_\theta) \Big]\, d\lambda.$$
However,
$$R_\theta(\omega, \lambda) f_\theta = \frac{f_\theta}{\lambda_0 - \lambda}.$$
It then follows from Condition (B) that we can replace $\gamma$ by a closed bounded contour $\gamma'$ which contains $\lambda_0$, but no poles of $R_r$. (To obtain $\gamma'$, we can cut off the infinite ends of
$\gamma$ sufficiently far and connect the resulting two endpoints by the arc $-\psi \le \arg \lambda \le \psi$; the integral over the arc can be made arbitrarily small.) Then
$$I = \frac{1}{2\pi i} \int_{\gamma'} \Big[ ((1 - \lambda R_r(\omega, \lambda, k)) f_r, h_r) \cdot (R_\theta(\omega, \lambda) f_\theta, h_\theta) + (R_r(\omega, \lambda, k) f_r, h_r) \cdot ((1 + \lambda R_\theta(\omega, \lambda)) f_\theta, h_\theta) \Big]\, d\lambda$$
$$= \frac{1}{2\pi i} \int_{\gamma'} \Big[ (f_r, h_r) \cdot (R_\theta(\omega, \lambda) f_\theta, h_\theta) + (R_r(\omega, \lambda, k) f_r, h_r) \cdot (f_\theta, h_\theta) \Big]\, d\lambda$$
$$= \frac{1}{2\pi i} \int_{\gamma'} \frac{(f_r, h_r)(f_\theta, h_\theta)}{\lambda_0 - \lambda}\, d\lambda = (f_r, h_r)(f_\theta, h_\theta),$$
which finishes the proof.

Finally, the operator $P_g(\omega, k)$ is the restriction to $D_k$ of the elliptic differential operator on $M$ obtained from $P_g(\omega)$ by replacing $D_\varphi$ by $k$ in the second term of (1.1). Therefore, by elliptic regularity (see for example [36, Sect. 7.4]) the operator $R_g(\omega, k)$ acts into $H^2_{\mathrm{loc}}$.

Next, Theorem 2 follows from Theorem 1, the fact that the operator $P_g(\omega)$ is elliptic on $M_K$ for small $a$ (to get $H^2$ regularity instead of $L^2$), and the following estimate on $R_g(\omega, k)$ for large values of $k$:

Proposition 3.3. Fix $\delta_r > 0$. Then there exist $a_0 > 0$ and a constant $C_k$ such that for $|a| < a_0$ and $|k| \ge C_k(1 + |\omega|)$, $\omega$ is not a pole of $R_g(\cdot, k)$ and we have
$$\|1_{M_K} R_g(\omega, k) 1_{M_K}\|_{L^2 \cap D_k \to L^2} \le \frac{C_k}{|k|^2}. \tag{3.12}$$

Proof. Let $\psi$, $C_r$ be the constants from part 2 of Proposition 3.2 and $C_\theta$, $C_\psi$ be the constants from Proposition 3.1. Put $\lambda_0 = k^2/3$; if $C_k$ is large enough, then
$$|k| > 1 + C_\theta |a\omega|, \quad \lambda_0 > C_\psi |a\omega|^2 + C_r(1 + |\omega|^2).$$
Take the contour $\gamma$ consisting of the rays $\{\arg \lambda = \pm\psi,\ |\lambda| \ge \lambda_0\}$ and the arc $\{|\lambda| = \lambda_0,\ |\arg \lambda| \le \psi\}$. By (3.2) and (3.4), all poles of $R_\theta$ lie inside $\gamma$ (namely, in the region $\{|\lambda| \ge \lambda_0,\ |\arg \lambda| \le \psi\}$), and
$$\|R_\theta(\omega, \lambda)\|_{L^2(S^2) \cap D_k \to L^2(S^2)} \le \frac{C}{|\lambda|} \tag{3.13}$$
for each $\lambda$ on $\gamma$. Now, suppose that $|a| < a_0 = (3 C_r)^{-1/2}$; then (3.7) is satisfied inside $\gamma$ and (3.12) follows from (2.1), (3.8), and (3.13).

Proof of Theorem 3. 1. Fix $\delta_r > 0$ such that $\operatorname{supp} f \subset M_K$. Take an admissible contour $\gamma$; then by (2.1) and the fact that the considered functions are in $D_k$,
$$v_\pm = \frac{1}{2\pi i} \int_\gamma (R_r^\pm(\omega, \lambda, k) \otimes R_\theta(\omega, \lambda)) f\, d\lambda, \tag{3.14}$$
where
$$R_r^\pm(\omega, \lambda, k) = |r - r_\pm|^{i A_\pm^{-1} (1 + \alpha)((r_\pm^2 + a^2)\omega - ak)} R_r(\omega, \lambda, k).$$
By part 2 of Proposition 3.2, we may choose compact sets $K_\pm$ containing $r_\pm$ such that for each $N$, there exists a constant $C_N$ (depending on $\omega$, $k$, and $\gamma$) such that
$$\|1_{K_\pm} R_r^\pm(\omega, \lambda, k) 1_{K_r}\|_{L^2 \to C^N(K_\pm)} \le \frac{C_N}{1 + |\lambda|}, \quad \lambda \in \gamma.$$
(The estimate is true over a compact portion of $\gamma$ since the image of $R_r^\pm$ consists of functions smooth at $r = r_\pm$, by the construction in Sect. 4.) Now, by (3.4) we get for some constant $C_N$,
$$\|R_r^\pm(\omega, \lambda, k) \otimes R_\theta(\omega, \lambda) f\|_{C^N(K_\pm; L^2(S^2))} \le \frac{C_N \|f\|_{L^2}}{1 + |\lambda|^2};$$
by (3.14), $v_\pm \in C^\infty(K_\pm; L^2(S^2))$. Now, since $(P_r + P_\theta) u = f$ and (assuming that $K_\pm \cap K_r = \emptyset$) $f|_{K_\pm \times S^2} = 0$, we have
$$(P_r^\pm(\omega, k) + P_\theta(\omega)) v_\pm = 0 \quad \text{on } K_\pm \times S^2,$$
where
$$P_r^\pm(\omega, k) = |r - r_\pm|^{i A_\pm^{-1} (1 + \alpha)((r_\pm^2 + a^2)\omega - ak)}\, P_r(\omega, k)\, |r - r_\pm|^{-i A_\pm^{-1} (1 + \alpha)((r_\pm^2 + a^2)\omega - ak)}$$
has smooth coefficients on $K_\pm$ (see Sect. 4). Then for each $N$, $P_\theta^N v_\pm = (-P_r^\pm)^N v_\pm \in C^\infty(K_\pm; L^2(S^2))$; since $P_\theta$ is elliptic, we get $v_\pm \in C^\infty(K_\pm; H^{2N}(S^2))$. Therefore, $v_\pm \in C^\infty(K_\pm \times S^2)$.

2. Let $\omega$ be a pole of $R_g(\omega, k)$. Then $\omega$ is not a regular point; therefore, there exists $\lambda \in \mathbb{C}$ such that $(\omega, \lambda)$ is a pole of both $R_r$ and $R_\theta$. This gives us functions $u_r(r)$ and $u_\theta(\theta, \varphi) \in D_k$ such that $(P_r(\omega, k) + \lambda) u_r = 0$ and $(P_\theta(\omega) - \lambda) u_\theta = 0$. It remains to take $u = u_r \otimes u_\theta$ and use part 4 of Proposition 3.2.

The following fact will be used in the proof of Theorem 4, as well as in Sect. 7:

Proposition 3.4. Fix $\delta_r > 0$. Let $\psi$, $C_r$ be the constants from part 2 of Proposition 3.2, $C_\theta$, $C_\psi$ be the constants from Proposition 3.1, and $C_k$ be the constant from Proposition 3.3. Take $\omega \in \mathbb{C}$ and put
$$L = (C_r(1 + C_k)^2 + C_\psi)(1 + |\omega|)^2.$$
Assume that $a$ is small enough so that Proposition 3.3 applies and suppose that $\omega$ and $l_1, l_2 > 0$ are chosen so that
$$l_1 \ge C_\psi |a\omega|^2, \quad l_2 \ge C_\theta |a| (|a\omega| + C_k(1 + |\omega|)) |\operatorname{Im} \omega|, \quad l_2 \le L \sin\psi. \tag{3.15}$$
Also, assume that for all $\lambda$ and $k$ satisfying
$$|k| \le C_k(1 + |\omega|), \quad -l_1 \le \operatorname{Re} \lambda \le L, \quad |\operatorname{Im} \lambda| \le l_2, \tag{3.16}$$
we have the estimate
$$\|1_{K_r} R_r(\omega, \lambda, k) 1_{K_r}\|_{L^2 \to L^2} \le C_1 \tag{3.17}$$
for some constant $C_1$ independent of $\lambda$ and $k$. Then $\omega$ is not a resonance and
$$\|R_g(\omega)\|_{L^2(M_K) \to L^2(M_K)} \le C_2 \left( \frac{1}{1 + |\omega|^2} + \frac{1 + C_1(l_1 + 1 + |\omega|^2)}{l_2} + \frac{C_1 l_2}{l_1} \right) \tag{3.18}$$
for a certain global constant $C_2$.
Fig. 2. The admissible contour γ used in Proposition 3.4
Proof. First of all, by Proposition 3.3, it suffices to establish the estimate (3.18) for the operator $R_g(\omega, k)$, where $|k| \le C_k(1 + |\omega|)$. Now, by (2.1), it suffices to construct an admissible contour in the sense of Definition 2.3 and estimate the norms of $R_r$ and $R_\theta$ on this contour. We take the contour $\gamma$ composed of (Fig. 2):
• the rays $\gamma_1^\pm = \{\arg \lambda = \pm\psi,\ |\lambda| \ge L\}$;
• the arcs $\gamma_2^\pm = \{|\arg \lambda| \le \psi,\ |\lambda| = L,\ \pm\operatorname{Im} \lambda \ge l_2\}$;
• the segments $\gamma_3^\pm$ of the lines $\{\operatorname{Im} \lambda = \pm l_2\}$ connecting $\gamma_2^\pm$ with $\gamma_4$;
• the segment $\gamma_4 = \{\operatorname{Re} \lambda = -l_1,\ |\operatorname{Im} \lambda| \le l_2\}$.

Then $\gamma$ divides the complex plane into two domains; we refer to the domain containing positive real numbers as $\Gamma_2$ and to the other domain as $\Gamma_1$. We claim that $R_\theta(\omega, \cdot)|_{D_k}$ has no poles in $\Gamma_1$, $R_r(\omega, \cdot, k)$ has no poles in $\Gamma_2$, and the $L^2 \to L^2$ operator norm estimates
$$\|R_\theta(\omega, \lambda)\| \le C/|\lambda|, \quad \|1_{K_r} R_r(\omega, \lambda, k) 1_{K_r}\| \le C/|\lambda|, \quad \lambda \in \gamma_1^\pm; \tag{3.19}$$
$$\|R_\theta(\omega, \lambda)|_{D_k}\| \le C/l_2, \quad \|1_{K_r} R_r(\omega, \lambda, k) 1_{K_r}\| \le C/(1 + |\omega|^2), \quad \lambda \in \gamma_2^\pm; \tag{3.20}$$
$$\|R_\theta(\omega, \lambda)|_{D_k}\| \le C/l_2, \quad \|1_{K_r} R_r(\omega, \lambda, k) 1_{K_r}\| \le C_1, \quad \lambda \in \gamma_3^\pm; \tag{3.21}$$
$$\|R_\theta(\omega, \lambda)\| \le C/l_1, \quad \|1_{K_r} R_r(\omega, \lambda, k) 1_{K_r}\| \le C_1, \quad \lambda \in \gamma_4 \tag{3.22}$$
hold for some global constant $C$; then (3.18) follows from these estimates and (2.1).

First, we prove that $R_\theta(\omega, \cdot)|_{D_k}$ has no poles $\lambda \in \Gamma_1$. First of all, assume that $|\lambda| \ge L$. Then $|\arg \lambda| \ge \psi$ and we can apply part 3 of Proposition 3.1; we also get the first half of (3.19). The same argument works for $\operatorname{Re} \lambda \le -l_1$, and we get the first half of (3.22). We may now assume that $|\lambda| \le L$ and $\operatorname{Re} \lambda \ge -l_1$; it follows that $|\operatorname{Im} \lambda| \ge l_2$. But in that case, we can apply (3.3), and we get the first halves of (3.20) and (3.21).

Next, we prove that $R_r(\omega, \cdot, k)$ has no poles $\lambda \in \Gamma_2$. First of all, assume that $|\lambda| \ge L$ and $\operatorname{Re} \lambda \ge 0$. Then $|\arg \lambda| \le \psi$ and we can apply part 2 of Proposition 3.2; we also get the second halves of (3.19) and (3.20). Now, in the opposite case, (3.16) is satisfied and we can use (3.17) to get the second halves of (3.21) and (3.22).
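Proof of Theorem 4. First, we take care of the resonances near zero. By Proposition 3.3, we can assume that $k$ is bounded by some constant. Next, if $\omega = 0$ and $a = 0$, then $R_g(\omega, k)$ only has a pole for $k = 0$, and in the latter case, $\lambda = 0$ is the only common pole of $R_\theta(0, \cdot)$ and $R_r(0, \cdot, 0)$. (In fact, the poles of $R_\theta(0, \cdot)|_{D_k}$ are given by $\lambda = l(l + 1)$ for $l \ge |k|$; an integration by parts argument shows that $R_r(0, \cdot, k)$ cannot have poles with $\operatorname{Re} \lambda > 0$.) The sets of poles of the resolvents $R_\theta(\omega, \lambda)|_{D_k}$ and $R_r(\omega, \lambda, k)$ depend continuously on $a$ in the sense that, if there are no poles of one of these resolvents for $(\omega, \lambda)$ in a fixed compact set for $a = 0$, then this is still true for $a$ small enough. It follows from here and the first parts of Propositions 3.1 and 3.2 that there exist $\varepsilon_\omega, \varepsilon_\lambda > 0$ such that for $a$ small enough,
• $R_g(\omega, k)$ does not have poles in $\{|\omega| \le \varepsilon_\omega\}$ unless $k = 0$;
• if $|\omega| \le \varepsilon_\omega$, then all common poles of $R_\theta(\omega, \cdot)|_{D_0}$ and $R_r(\omega, \cdot, 0)$ lie in $\{|\lambda| \le \varepsilon_\lambda\}$;
• the decompositions (3.1) and (3.6) hold for $|\omega| \le \varepsilon_\omega$, $|\lambda| \le \varepsilon_\lambda$;
• we have $\lambda_r(\omega) \ne \lambda_\theta(\omega)$ for $0 < |\omega| \le \varepsilon_\omega$.

It follows immediately that $\omega = 0$ is the only pole of $R_g$ in $\{|\omega| \le \varepsilon_\omega\}$. To get the meromorphic decomposition, we repeat the argument at the end of Sect. 2 in our particular case. Note that for small $\omega \ne 0$,
$$R_g(\omega, 0) = \frac{1}{2\pi i} \int_\gamma R_r(\omega, \lambda, 0) \otimes R_\theta(\omega, \lambda)|_{D_0}\, d\lambda + \mathrm{Hol}(\omega).$$
Here $\gamma$ is a small contour surrounding $\lambda_\theta(\omega)$, but not $\lambda_r(\omega)$; the integration is done in the clockwise direction; $\mathrm{Hol}$ denotes a family of operators holomorphic near zero. By (3.1) and (3.6), we have
$$R_g(\omega, 0) = \mathrm{Hol}(\omega) + \frac{1}{2\pi i} \int_\gamma \frac{S_{r0}(\omega, \lambda) \otimes S_{\theta 0}(\omega, \lambda)}{(\lambda - \lambda_r(\omega))(\lambda - \lambda_\theta(\omega))}\, d\lambda$$
$$= \mathrm{Hol}(\omega) + \frac{1}{\lambda_r(\omega) - \lambda_\theta(\omega)} \cdot \frac{1}{2\pi i} \int_\gamma (S_{r0}(\omega, \lambda) \otimes S_{\theta 0}(\omega, \lambda)) \left( \frac{1}{\lambda - \lambda_r(\omega)} - \frac{1}{\lambda - \lambda_\theta(\omega)} \right) d\lambda$$
$$= \mathrm{Hol}(\omega) + \frac{S_{r0}(\omega, \lambda_\theta(\omega)) \otimes S_{\theta 0}(\omega, \lambda_\theta(\omega))}{\lambda_r(\omega) - \lambda_\theta(\omega)} = \mathrm{Hol}(\omega) + \frac{i(1 \otimes 1)}{4\pi(1 + \alpha)(r_+^2 + r_-^2 + 2a^2)\omega}.$$

Now, let us consider the case $|\omega| > \varepsilon_\omega$, $\operatorname{Im} \omega > 0$. We will apply Proposition 3.4 with
$$l_1 = |\omega|^2/C_{1r}, \quad l_2 = |\omega| \operatorname{Im} \omega / C_{1r}.$$
Here $C_{1r}$ is the constant in Proposition 3.2. Then (3.15) is true for small $a$ and (3.17) follows from (3.16) for small $a$ by part 5 of Proposition 3.2, with $C_1 = C_{1r}/(|\omega| \operatorname{Im} \omega)$. It remains to use (3.18).

Finally, assume that $\omega$ is a real $k$-resonance and $|\omega| > \varepsilon_\omega$. Then by Proposition 3.3 and part 3 of Proposition 3.2, if $a$ is small enough, then the operator $R_r(\omega, \cdot, k)$ cannot have a pole for $\lambda \in \mathbb{R}$. However, the operator $P_\theta(\omega)$ is self-adjoint and thus only has real eigenvalues, a contradiction.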
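The final equality in the residue computation for $R_g(\omega, 0)$ in the proof above can be checked at leading order by substituting the values stated in Propositions 3.1 and 3.2. The following worked step is a sketch under those stated expansions only; the abbreviation $c$ is introduced here for convenience and does not appear in the original.

```latex
% Leading-order data from (3.1) and (3.6):
%   S_{r0}(0,0) = (1 \otimes 1)/(r_+ - r_-),   S_{\theta 0}(0,0) = -(1 \otimes 1)/(4\pi),
%   \lambda_r(\omega) = c\,\omega + O(|\omega|^2),   \lambda_\theta(\omega) = O(|\omega|^2),
% where c abbreviates the linear coefficient of \lambda_r:
\[
  c = \frac{i(1+\alpha)(r_+^2 + r_-^2 + 2a^2)}{r_+ - r_-}.
\]
% Since S_{r0}, S_{\theta 0}, \lambda_r, \lambda_\theta are holomorphic near zero,
\[
  \frac{S_{r0}(\omega,\lambda_\theta(\omega)) \otimes S_{\theta 0}(\omega,\lambda_\theta(\omega))}
       {\lambda_r(\omega) - \lambda_\theta(\omega)}
  = \frac{S_{r0}(0,0) \otimes S_{\theta 0}(0,0) + O(|\omega|)}{c\,\omega\,(1 + O(|\omega|))}
  = \frac{S_{r0}(0,0) \otimes S_{\theta 0}(0,0)}{c\,\omega} + \mathrm{Hol}(\omega),
\]
% and the singular term evaluates to
\[
  \frac{1}{c\,\omega} \cdot \left( -\frac{1 \otimes 1}{4\pi (r_+ - r_-)} \right)
  = -\frac{1 \otimes 1}{4\pi i\,(1+\alpha)(r_+^2 + r_-^2 + 2a^2)\,\omega}
  = \frac{i\,(1 \otimes 1)}{4\pi (1+\alpha)(r_+^2 + r_-^2 + 2a^2)\,\omega}.
\]
```

The factor $r_+ - r_-$ cancels between $S_{r0}(0,0)$ and $\lambda_r$, which is why it does not appear in the residue at $\omega = 0$.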
4. Construction of the Radial Resolvent In this section, we prove Proposition 3.2, except for part 2, which is proved in Sect. 6. We start with a change of variables that maps (r− , r+ ) to (−∞, ∞): Proposition 4.1. Define x = x(r ) by
x=
r
r0
ds . r (s)
(4.1)
(Here $r_0 \in (r_-, r_+)$ is a fixed number.) Then there exists a constant $X_0$ such that for $\pm x > X_0$, we have $r = r_\pm \mp F_\pm(e^{\mp A_\pm x})$, where $F_\pm(w)$ are real analytic on $[0, e^{-A_\pm X_0})$ and holomorphic in the discs $\{|w| < e^{-A_\pm X_0}\} \subset \mathbb{C}$.
Proof. We concentrate on the behavior of $x$ near $r_+$. It is easy to see that $-A_+ x(r) = \ln(r_+ - r) + G(r)$, where $G$ is holomorphic near $r = r_+$. Exponentiating, we get $w = e^{-A_+ x} = (r_+ - r)e^{G(r)}$. It remains to apply the inverse function theorem to solve for $r$ as a function of $w$ near zero.
After the change of variables $r \to x$, we get $P_r(\omega, k) + \lambda = \Delta_r^{-1} P_x(\omega, \lambda, k)$, where
\[ P_x(\omega,\lambda,k) = D_x^2 + V_x(x;\omega,\lambda,k),\qquad V_x = \lambda\Delta_r - (1+\alpha)^2\big((r^2+a^2)\omega - ak\big)^2. \tag{4.2} \]
(We treat $r$ and $\Delta_r$ as functions of $x$ now.) We put
\[ \omega_\pm = (1+\alpha)\big((r_\pm^2 + a^2)\omega - ak\big), \tag{4.3} \]
so that $V_x(\pm\infty) = -\omega_\pm^2$. Also, by Proposition 4.1, we get
\[ V_x(x) = V_\pm(e^{\mp A_\pm x}),\qquad \pm x > X_0, \tag{4.4} \]
where $V_\pm(w)$ are functions holomorphic in the discs $\{|w| < e^{-A_\pm X_0}\}$. We now define outgoing functions:
Definition 4.1. Fix $\omega, k, \lambda$. A function $u(x)$ (and the corresponding function of $r$) is called outgoing at $\pm\infty$ iff
\[ u(x) = e^{\pm i\omega_\pm x}\, v_\pm(e^{\mp A_\pm x}), \tag{4.5} \]
where $v_\pm(w)$ are holomorphic in a neighborhood of zero. We call $u(x)$ outgoing if it is outgoing at both infinities.
Let us construct certain solutions outgoing at one of the infinities:
Proposition 4.2. There exist solutions $u_\pm(x;\omega,\lambda,k)$ to the equation $P_x u_\pm = 0$ of the form $u_\pm(x;\omega,\lambda,k) = e^{\pm i\omega_\pm x}\, v_\pm(e^{\mp A_\pm x};\omega,\lambda,k)$, where $v_\pm(w;\omega,\lambda,k)$ is holomorphic in $\{|w| < W_\pm\}$ and
\[ v_\pm(0;\omega,\lambda,k) = \frac{1}{\Gamma(1 - 2i\omega_\pm A_\pm^{-1})}. \tag{4.6} \]
These solutions are holomorphic in $(\omega, \lambda)$ and are unique unless $\nu = 2i\omega_\pm A_\pm^{-1}$ is a positive integer.
Proof. We only construct the function $u_+$. Let us write the Taylor series for $v_+$ at zero:
\[ v_+(w) = \sum_{j\ge 0} v_j w^j. \]
Put $w = e^{-A_+ x}$; then the equation $P_x u_+ = 0$ is equivalent to $\big((A_+ w D_w - \omega_+)^2 + V_x\big)v_+ = 0$. By (4.2) and Proposition 4.1, $V_x$ is a holomorphic function of $w$ for $|w| < W_+$. If $V_x = \sum_{j\ge 0} V_j w^j$ is the corresponding Taylor series, then we get the following system of linear equations on the coefficients $v_j$:
\[ jA_+(2i\omega_+ - jA_+)\,v_j + \sum_{0 < l \le j} V_l\, v_{j-l} = 0,\qquad j > 0. \tag{4.7} \]
If $\nu$ is not a positive integer, then this system has a unique solution under the condition $v_0 = \Gamma(1-\nu)^{-1}$. This solution can be uniquely holomorphically continued to include the cases when $\nu$ is a positive integer. Indeed, one defines the coefficients $v_0, \dots, v_\nu$ by Cramer's Rule using the first $\nu$ equations in (4.7) (this can be done since the zeroes of the determinant of the corresponding matrix match the poles of the gamma function), and the rest are uniquely determined by the remaining equations in the system (4.7). We now prove that the series above converges in the disc $\{|w| < W_+\}$. We take $\varepsilon > 0$; then $|V_j| \le M(W_+ - \varepsilon)^{-j}$ for some constant $M$. Then one can use induction and (4.7) to see that $|v_j| \le C(W_+ - \varepsilon)^{-j}$ for some constant $C$. Therefore, the Taylor series for $v_+$ converges in the disc $\{|w| < W_+ - \varepsilon\}$; since $\varepsilon$ was arbitrary, we are done.
The condition (4.6) makes it possible for $u_\pm$ to be zero for certain values of $\omega_\pm$. However, we have the following
Proposition 4.3. Assume that one of the solutions $u_\pm$ is identically zero. Then every solution $u$ to the equation $P_x u = 0$ is outgoing at the corresponding infinity.
Proof. Assume that $u_+(x;\omega_0,\lambda_0,k_0) \equiv 0$. (The argument for $u_-$ is similar.) Put $\nu = 2i\omega_{0+}A_+^{-1}$; by (4.6), it has to be a positive integer. Similarly to Proposition 4.2, we can construct a nonzero solution $u_1$ to the equation $P_x u_1 = 0$ with $u_1(x) = e^{-i\omega_{0+}x}\,\tilde v_1(e^{-A_+ x})$ and $\tilde v_1$ holomorphic at zero. We can see that $u_1(x) = e^{i\omega_{0+}x}\, v_1(e^{-A_+ x})$, where $v_1(w) = w^\nu \tilde v_1(w)$ is holomorphic; therefore, $u_1$ is outgoing. Note that $u_1(x) = o(e^{i\omega_{0+}x})$ as $x \to +\infty$. Now, since $u_+(x;\omega_0,\lambda_0,k_0) \equiv 0$, we can define
\[ u_2(x) = \lim_{\omega\to\omega_0} \Gamma(1 - 2i\omega_+ A_+^{-1})\, u_+(x;\omega,\lambda_0,k_0); \]
it will be an outgoing solution to the equation $P_x u_2 = 0$ and have $u_2(x) = e^{i\omega_{0+}x}(1 + o(1))$ as $x \to +\infty$. We have constructed two linearly independent outgoing solutions to the equation $P_x u = 0$; since this equation only has a two-dimensional space of solutions, every solution must be outgoing.
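The recursion (4.7) from the proof of Proposition 4.2 is a triangular system, so the coefficients can be computed one at a time. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the normalisation `v0` is left to the caller (the paper uses $v_0 = \Gamma(1-\nu)^{-1}$, which needs a complex gamma function not available in the standard library), and the toy potential $V_x = -\omega_+^2 + c\,w$ is an illustrative choice, not the Kerr–de Sitter potential.

```python
import cmath


def outgoing_coefficients(V, omega_plus, A_plus, v0=1.0):
    """Solve (4.7) for v_1, ..., v_N given v_0.

    V[l] is the l-th Taylor coefficient of V_x in w = exp(-A_plus * x);
    V[0] is expected to equal -omega_plus**2 (it drops out of the recursion).
    """
    v = [complex(v0)]
    for m in range(1, len(V)):
        lead = m * A_plus * (2 * 1j * omega_plus - m * A_plus)
        if lead == 0:
            # nu = 2i*omega_plus/A_plus is a positive integer: exceptional case
            raise ZeroDivisionError("recursion degenerates at step %d" % m)
        tail = sum(V[l] * v[m - l] for l in range(1, m + 1))
        v.append(-tail / lead)
    return v


def evaluate_u_plus(x, v, omega_plus, A_plus):
    """u_+(x) = exp(i*omega_plus*x) * sum_m v_m * w**m with w = exp(-A_plus*x)."""
    w = cmath.exp(-A_plus * x)
    series = sum(coef * w**m for m, coef in enumerate(v))
    return cmath.exp(1j * omega_plus * x) * series


def residual(x, V, v, omega_plus, A_plus, step=1e-3):
    """Finite-difference value of (D_x^2 + V_x) u_+ at x, with D_x^2 = -d^2/dx^2."""
    def u(t):
        return evaluate_u_plus(t, v, omega_plus, A_plus)
    u_xx = (u(x + step) - 2 * u(x) + u(x - step)) / step**2
    w = cmath.exp(-A_plus * x)
    V_x = sum(Vl * w**l for l, Vl in enumerate(V))
    return -u_xx + V_x * u(x)


# Toy data (illustrative assumption): V_x = -omega_+^2 + c * w.
omega_plus, A_plus, c = 0.7 + 0.2j, 1.0, 0.5
V = [-omega_plus**2, c] + [0.0] * 28
v = outgoing_coefficients(V, omega_plus, A_plus)
print(abs(residual(1.0, V, v, omega_plus, A_plus)))
```

The printed residual is limited only by the finite-difference step, which is a numerical sanity check that the truncated series solves $P_x u_+ = 0$ for this toy potential.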
The next statement follows directly from the definition of an outgoing solution and will be used in later sections:
Proposition 4.4. Fix $\delta_r > 0$ and let $K_x$ be the image of the set $K_r = (r_- + \delta_r, r_+ - \delta_r)$ under the change of variables $r \to x$. Assume that $X_0$ is chosen large enough so that Proposition 4.1 holds and $K_x \subset (-X_0, X_0)$. Let $u(x) \in H^2_{\mathrm{loc}}(\mathbb{R})$ be any outgoing function in the sense of Definition 4.1 and assume that $f = P_x u$ is supported in $K_x$. Then:
1. $u$ can be extended holomorphically to the two half-planes $\{\pm\operatorname{Re} z > X_0\}$ and satisfies the equation $P_z u = 0$ in these half-planes, where $P_z = D_z^2 + V_x(z)$ and $V_x(z)$ is well-defined by (4.4).
2. If $\gamma$ is a contour in the complex plane given by $\operatorname{Im} z = F(\operatorname{Re} z)$, $x_- \le \operatorname{Re} z \le x_+$, and $F(x) = 0$ for $|x| \le X_0$, then we can define the restriction to $\gamma$ of the holomorphic extension of $u$ by $u_\gamma(x) = u(x + iF(x))$ and $u_\gamma$ satisfies the equation $P_\gamma u_\gamma = f$, where
\[ P_\gamma = \Big(\frac{1}{1 + iF'(x)}\, D_x\Big)^2 + V_x(x + iF(x)). \]
3. Assume that $\gamma$ is as above, with $x_\pm = \pm\infty$, and $F'(x) = c = \mathrm{const}$ for large $|x|$. Then $u_\gamma(x) = O(e^{\mp\operatorname{Im}((1+ic)\omega_\pm)x})$ as $x \to \pm\infty$. As a consequence, if $\operatorname{Im}((1+ic)\omega_\pm) > 0$, then $u_\gamma(x) \in H^2(\mathbb{R})$.
We are now ready to prove Proposition 3.2.
Proof of Part 1. Given the functions $u_\pm$, define the operator $S_x(\omega,\lambda,k)$ on $\mathbb{R}$ by its Schwartz kernel
\[ S_x(x, x';\omega,\lambda,k) = u_+(x)\,u_-(x')\,[x > x'] + u_-(x)\,u_+(x')\,[x < x']. \]
The operator $S_x(\omega,\lambda)$ acts $L^2_{\mathrm{comp}}(\mathbb{R}) \to H^2_{\mathrm{loc}}(\mathbb{R})$ and $P_x S_x = W(\omega,\lambda,k)$, where the Wronskian
\[ W(\omega,\lambda,k) = u_+(x;\omega,\lambda,k)\cdot\partial_x u_-(x;\omega,\lambda,k) - u_-(x;\omega,\lambda,k)\cdot\partial_x u_+(x;\omega,\lambda,k) \]
is constant in $x$. Moreover, $W(\omega,\lambda,k) = 0$ if and only if $u_+(x;\omega,\lambda,k)$ and $u_-(x;\omega,\lambda,k)$ are linearly dependent as functions of $x$. Also, the image of $S_x$ consists of outgoing functions. Now, we define the radial resolvent $R_r(\omega,\lambda,k) = R_x(\omega,\lambda,k)\,\Delta_r$, where
\[ R_x(\omega,\lambda,k) = \frac{S_x(\omega,\lambda,k)}{W(\omega,\lambda,k)}. \tag{4.8} \]
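As an illustration of the construction (4.8), consider the simplest model case, which is an assumption made here for the sake of the example and not the Kerr–de Sitter potential: a constant potential $V_x = -\omega^2$ with $\operatorname{Im}\omega > 0$. Then $u_\pm = e^{\pm i\omega x}$, the Schwartz kernel is $e^{i\omega|x-x'|}$ and the Wronskian is $-2i\omega$. The sketch below (hypothetical function names) applies the resulting kernel to a Gaussian and checks numerically that $(D_x^2 - \omega^2)R_x f \approx f$.

```python
import cmath
import math


def free_resolvent_kernel(x, y, omega):
    """R_x(x, y) = S_x(x, y) / W for V_x = -omega**2 and u_pm = exp(+-i*omega*x)."""
    wronskian = -2j * omega  # u_+ * d/dx(u_-) - u_- * d/dx(u_+)
    s_kernel = cmath.exp(1j * omega * abs(x - y))
    return s_kernel / wronskian


def apply_resolvent(f_vals, grid, omega, x):
    """Riemann-sum approximation of (R_x f)(x) on an equally spaced grid."""
    step = grid[1] - grid[0]
    return step * sum(free_resolvent_kernel(x, y, omega) * fy
                      for y, fy in zip(grid, f_vals))


omega = 1.0 + 0.5j
step = 0.01
grid = [-8.0 + step * i for i in range(1601)]
f_vals = [math.exp(-y * y) for y in grid]

i0 = 830  # grid[i0] is approximately 0.3
u_left, u_mid, u_right = (apply_resolvent(f_vals, grid, omega, grid[i])
                          for i in (i0 - 1, i0, i0 + 1))
# (D_x^2 - omega^2) u - f with D_x^2 = -d^2/dx^2, by a centred second difference
residual = -(u_right - 2 * u_mid + u_left) / step**2 - omega**2 * u_mid - f_vals[i0]
print(abs(residual))
```

The same pattern (two one-sided outgoing solutions glued along the diagonal and divided by their Wronskian) is what (4.8) uses with the genuine $u_\pm$ of Proposition 4.2.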
It is clear that $R_r$ is a meromorphic family of operators $L^2_{\mathrm{comp}} \to H^2_{\mathrm{loc}}$ and $(P_r + \lambda)R_r$ is the identity operator.
We now prove that $R_x$, and thus $R_r$, has poles of finite rank. Fix $k$ and take $(\omega_0, \lambda_0) \in \{W = 0\}$; we need to prove that for every $l$, the principal part of the Laurent decomposition of $\partial_\omega^l R_x(\omega_0, \lambda, k)$ at $\lambda = \lambda_0$ consists of finite-dimensional operators. We use induction on $l$. One has $P_x(\omega,\lambda,k)R_x(\omega,\lambda,k) = 1$; differentiating this identity $l$ times in $\omega$, we get
\[ P_x(\omega_0,\lambda,k)\,\partial_\omega^l R_x(\omega_0,\lambda,k) = \delta_{l0}\,1 + \sum_{m=1}^{l} c_{ml}\,\partial_\omega^m P_x(\omega_0,\lambda,k)\,\partial_\omega^{l-m} R_x(\omega_0,\lambda,k). \]
(Here $c_{ml}$ are some constants.) The right-hand side has poles of finite rank by the induction hypothesis. Now, consider the Laurent decomposition
\[ \partial_\omega^l R_x(\omega_0,\lambda,k) = Q(\lambda) + \sum_{j=1}^{N} \frac{R_j}{(\lambda-\lambda_0)^j}. \]
Here $Q$ is holomorphic at $\lambda_0$. Multiplying by $P_x$, we get
\[ \sum_{j=1}^{N} \frac{P_x(\omega_0,\lambda,k)R_j}{(\lambda-\lambda_0)^j} \sim \sum_{j=1}^{N} \frac{L_j}{(\lambda-\lambda_0)^j} \]
up to operators holomorphic at $\lambda_0$. Here $L_j$ are some finite-dimensional operators. We then have
\[ P_x(\omega_0,\lambda_0,k)R_N = L_N,\qquad P_x(\omega_0,\lambda_0,k)R_{N-1} = L_{N-1} - (\partial_\lambda P_x(\omega_0,\lambda_0,k))R_N,\ \dots \]
Each of the right-hand sides has finite rank and the kernel of $P_x(\omega_0,\lambda_0,k)$ is two-dimensional; therefore, each $R_j$ is finite-dimensional as required. (We also see immediately that the image of each $R_j$ consists of smooth functions.)
Finally, we establish the decomposition at zero. As in part 1 of Proposition 3.1, it suffices to compute $S_x(0,0,0)$ and the first order terms in the Taylor expansion of $W$ at $(0,0,0)$. We have $u_\pm(x;0,0,0) = 1$ for all $x$; therefore, $S_x(x,x';0,0,0) = 1$. Next, put $u_{\omega\pm}(x) = \partial_\omega u_\pm(x;0,0,0)$ and $u_{\lambda\pm}(x) = \partial_\lambda u_\pm(x;0,0,0)$. By differentiating the equation $P_x u_\pm = 0$ in $\omega$ and $\lambda$ and recalling the boundary conditions at $\pm\infty$, we get
\[
\begin{aligned}
\partial_x^2 u_{\lambda\pm}(x) &= \Delta_r, & u_{\lambda\pm}(x) &= v_{\lambda\pm}(e^{\mp A_\pm x}), & \pm x &\gg 0;\\
\partial_x^2 u_{\omega\pm}(x) &= 0, & u_{\omega\pm}(x) &= \pm i(1+\alpha)(r_\pm^2 + a^2)x + v_{\omega\pm}(e^{\mp A_\pm x}), & \pm x &\gg 0,
\end{aligned}
\]
for some functions $v_{\lambda\pm}$, $v_{\omega\pm}$ real analytic at zero. We then find
\[
\begin{aligned}
\partial_\lambda W(0,0,0) &= \partial_x(u_{\lambda-} - u_{\lambda+}) = \int_{-\infty}^{\infty} \Delta_r\,dx = r_+ - r_-,\\
\partial_\omega W(0,0,0) &= \partial_x(u_{\omega-} - u_{\omega+}) = -i(1+\alpha)(r_+^2 + r_-^2 + 2a^2).
\end{aligned}
\]
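Proof of Part 3. Assume that $\omega$ and $\lambda$ are both real and $R_r$ has a pole at $(\omega,\lambda,k)$. Let $u(x)$ be the corresponding resonant state; we know that it has the asymptotics
\[
\begin{aligned}
u(x) &= e^{\pm i\omega_\pm x}\,U_\pm\big(1 + O(e^{\mp A_\pm x})\big), & x &\to \pm\infty;\\
\partial_x u(x) &= e^{\pm i\omega_\pm x}\,U_\pm\big(\pm i\omega_\pm + O(e^{\mp A_\pm x})\big), & x &\to \pm\infty
\end{aligned}
\]
for some nonzero constants $U_\pm$. Since $V_x(x;\omega,\lambda,k)$ is real-valued, both $u$ and $\bar u$ solve the equation $(D_x^2 + V_x(x))u = 0$. Then the Wronskian $W_u(x) = u\cdot\partial_x\bar u - \bar u\cdot\partial_x u$ must be constant; however, $W_u(x) \to \mp 2i\omega_\pm|U_\pm|^2$ as $x \to \pm\infty$. Then we must have $\omega_+\omega_- \le 0$; it follows immediately that $|\omega| = O(|ak|)$.
Proof of Part 4. First, assume that neither of $u_\pm$ is identically zero. Then the resolvent $R_x$, and thus $R_r$, has a pole iff the functions $u_\pm$ are linearly dependent, or, in other words, if there exists a nonzero outgoing solution $u(x)$ to the equation $P_x u = 0$. Now, if one of $u_\pm$, say, $u_+$, is identically zero, then by Proposition 4.3, $u_-$ will be an outgoing solution at both infinities.
Proof of Part 5. Assume that $u(x)$ is outgoing and $P_x(\omega,\lambda,k)u = f \in L^2(K_x)$. Since $\operatorname{Im}\omega > 0$, we have $\operatorname{Im}\omega_\pm > 0$ and thus $u \in H^2(\mathbb{R})$. First, assume that $|\arg\omega - \pi/2| < \varepsilon$, where $\varepsilon > 0$ is a constant to be chosen later. Then
\[ \operatorname{Re} V_x(x) = (1+\alpha)^2(r^2+a^2)^2(\operatorname{Im}\omega)^2 + \operatorname{Re}\lambda\cdot\Delta_r - (1+\alpha)^2\big((r^2+a^2)\operatorname{Re}\omega - ak\big)^2; \]
using (3.10), we can choose $\varepsilon$ and $C_{1r}$ so that $\operatorname{Re} V_x(x) \ge |\omega|^2/C > 0$ for all $x \in \mathbb{R}$. Then
\[ \|u\|_{L^2(\mathbb{R})}\cdot\|f\|_{L^2(\mathbb{R})} \ge \operatorname{Re}\int \bar u(x)\,(D_x^2 + V_x(x))u(x)\,dx \ge \int \operatorname{Re} V_x(x)\,|u|^2\,dx \ge C^{-1}|\omega|^2\,\|u\|^2_{L^2(\mathbb{R})} \]
and (3.11) follows. Now, assume that $|\arg\omega - \pi/2| \ge \varepsilon$. Then
\[ \operatorname{Im} V_x(x) = -2(1+\alpha)^2\big((r^2+a^2)\operatorname{Re}\omega - ak\big)(r^2+a^2)\operatorname{Im}\omega + \operatorname{Im}\lambda\cdot\Delta_r; \]
it follows from (3.10) that we can choose $C_{1r}$ so that the sign of $\operatorname{Im} V_x(x)$ is constant in $x$ (positive if $\arg\omega > \pi/2$ and negative otherwise) and, in fact, $|\operatorname{Im} V_x(x)| \ge |\omega|\operatorname{Im}\omega/C > 0$ for all $x$. Then (assuming that $\operatorname{Im} V_x(x) > 0$)
\[ \|u\|_{L^2(\mathbb{R})}\cdot\|f\|_{L^2(\mathbb{R})} \ge \operatorname{Im}\int \bar u(x)\,(D_x^2 + V_x(x))u(x)\,dx = \int \operatorname{Im} V_x(x)\,|u|^2\,dx \ge C^{-1}|\omega|\operatorname{Im}\omega\,\|u\|^2_{L^2(\mathbb{R})} \]
and (3.11) follows.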
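The last step in the proof of Part 3 above, from the constancy of the Wronskian to $|\omega| = O(|ak|)$, can be spelled out as follows. This is a short worked derivation using only (4.3) and the limits of $W_u$ stated in that proof; the explicit constant in the final bound is a consequence of the computation and not a claim taken from the text.

```latex
% Equating the limits of W_u at +infinity and -infinity:
\[
  -2i\omega_+ |U_+|^2 = 2i\omega_- |U_-|^2
  \quad\Longrightarrow\quad
  \omega_+\omega_- = -\,\omega_+^2\,\frac{|U_+|^2}{|U_-|^2} \le 0 .
\]
% Inserting the definition (4.3) of omega_pm (omega and k are real here):
\[
  (1+\alpha)^2\big((r_+^2+a^2)\omega - ak\big)\big((r_-^2+a^2)\omega - ak\big) \le 0 ,
\]
% so omega lies between the two roots of this quadratic in omega:
\[
  \omega \ \text{lies between}\ \frac{ak}{r_+^2+a^2}\ \text{and}\ \frac{ak}{r_-^2+a^2},
  \qquad\text{hence}\qquad
  |\omega| \le \frac{|ak|}{r_-^2+a^2} = O(|ak|).
\]
```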
5. Preliminaries from Semiclassical Analysis
In this section, we list certain facts from semiclassical analysis needed in the further analysis of our radial operator. For a general introduction to semiclassical analysis, the reader is referred to [17]. Let $a(x,\xi)$ belong to the symbol class
\[ S^m = \Big\{a(x,\xi) \in C^\infty(\mathbb{R}^2)\ \Big|\ \sup_{x,\xi}\,\langle\xi\rangle^{|\beta|-m}\,|\partial_x^\alpha\partial_\xi^\beta a(x,\xi)| \le C_{\alpha\beta}\ \text{for all } \alpha, \beta\Big\}. \]
Here $m \in \mathbb{R}$ and $\langle\xi\rangle = \sqrt{1 + |\xi|^2}$. Following [17, Sect. 8.6], we define the corresponding semiclassical pseudodifferential operator $a^w(x, hD_x)$ by the formula
\[ a^w(x,hD_x)u(x) = \frac{1}{2\pi h}\int e^{\frac{i}{h}(x-y)\eta}\, a\Big(\frac{x+y}{2}, \eta\Big)\,u(y)\,dy\,d\eta. \]
Here $h > 0$ is the semiclassical parameter. We denote by $\Psi^m$ the class of all semiclassical pseudodifferential operators with symbols in $S^m$. Introduce the semiclassical Sobolev spaces $H_h^l \subset \mathcal{D}'(\mathbb{R})$ with the norm $\|u\|_{H_h^l} = \|\langle hD_x\rangle^l u\|_{L^2}$; then for $a \in S^m$, we have
\[ \|a^w(x,hD_x)\|_{H_h^l \to H_h^{l-m}} \le C, \]
where $C$ is a constant depending on $a$, but not on $h$. Also, if $a(x,\xi) \in C_0^\infty(\mathbb{R}^2)$, then
\[ \|a^w(x,hD_x)\|_{L^2(\mathbb{R}) \to L^\infty(\mathbb{R})} \le Ch^{-1/2}, \tag{5.1} \]
where $C$ is a constant depending on $a$, but not on $h$. (See [17, Thm. 7.10] for the proof.) General facts on multiplication of pseudodifferential operators can be found in [17, Sect. 8.6]. We will need the following: for $a \in S^m$ and $b \in S^n$,⁵
\[
\begin{aligned}
&\text{if } \operatorname{supp} a \cap \operatorname{supp} b = \emptyset, \text{ then } a^w(x,hD_x)\,b^w(x,hD_x) = O_{L^2 \to H_h^N}(h^\infty) \text{ for all } N; &&(5.2)\\
&a^w(x,hD_x)\,b^w(x,hD_x) = (ab)^w(x,hD_x) + O_{\Psi^{m+n-1}}(h), &&(5.3)\\
&[a^w(x,hD_x), b^w(x,hD_x)] = -ih\{a,b\}^w(x,hD_x) + O_{\Psi^{m+n-2}}(h^2). &&(5.4)
\end{aligned}
\]
Here $\{\cdot,\cdot\}$ is the Poisson bracket, defined by $\{a,b\} = \partial_\xi a\cdot\partial_x b - \partial_\xi b\cdot\partial_x a$. Also, if $A \in \Psi^m$, then the adjoint operator $A^*$ also lies in $\Psi^m$ and its symbol is the complex conjugate of the symbol of $A$.
One can study pseudodifferential operators on manifolds [17, App. E], and in particular on the circle $S^1 = \mathbb{R}/2\pi\mathbb{Z}$. If $a(x,\xi) = a(\xi)$ is a symbol on $T^*S^1$ that is independent of $x$, then $a^w(hD_x)$ is a Fourier series multiplier modulo $O(h^\infty)$: for each $N$,
\[ \text{if } u(x) = \sum_{j\in\mathbb{Z}} u_j e^{ijx},\ \text{then } a(hD_x)u(x) = \sum_{j\in\mathbb{Z}} a(hj)\,u_j e^{ijx} + O_{H_h^N}(h^\infty)\,\|u\|_{L^2}. \tag{5.5} \]
In the next three propositions, we assume that $P(h) \in \Psi^m$ and $P(h) = p^w(x,hD_x) + O_{\Psi^{m-1}}(h)$, where $p(x,\xi) \in S^m$.
⁵ We write $A(h) = O_X(h^k)$ for some Fréchet space $X$, if for each seminorm $\|\cdot\|_X$ of $X$, there exists a constant $C$ such that $\|A(h)\|_X \le Ch^k$. We write $A(h) = O_X(h^\infty)$ if $A(h) = O_X(h^k)$ for all $k$.
Proposition 5.1 (Elliptic estimate). Suppose that the function $\chi \in S^0$ is chosen so that $|p| \ge \langle\xi\rangle^m/C > 0$ on $\operatorname{supp}\chi$ for some $h$-independent constant $C$. Also, assume that either the set $\operatorname{supp}\chi$ or its complement is precompact. Then there exists a constant $C_1$ such that for each $u \in H_h^m$,
\[ \|\chi^w(x,hD_x)u\|_{H_h^m} \le C_1\|P(h)u\|_{L^2} + O(h^\infty)\|u\|_{L^2}. \tag{5.6} \]
Proof. The proof follows the standard parametrix construction. We find a sequence of symbols $q_j(x,\xi;h) \in S^{-m-j}$, $j \ge 0$, such that for
\[ Q_N(h) = \sum_{0\le j\le N} h^j q_j^w(x,hD_x), \]
we get
\[ (Q_N(h)P(h) - 1)\,\chi^w(x,hD_x) = O_{\Psi^{-N-1}}(h^{N+1}); \tag{5.7} \]
applying this operator equation to $u$, we prove the proposition. We can take any $q_0 \in S^{-m}$ such that $q_0 = p^{-1}$ near $\operatorname{supp}\chi$; such a symbol exists under our assumptions. The rest of $q_j$ can be constructed by induction using Eq. (5.7).
Proposition 5.2 (Gårding inequalities). Suppose that $\chi \in C_0^\infty(\mathbb{R}^2)$.
1. If $\operatorname{Re} p \ge 0$ near $\operatorname{supp}\chi$, then there exists a constant $C$ such that for every $u \in L^2$,
\[ \operatorname{Re}(P(h)\chi^w u, \chi^w u) \ge -Ch\|\chi^w u\|^2_{L^2} - O(h^\infty)\|u\|^2_{L^2}. \tag{5.8} \]
2. If $\operatorname{Re} p \ge 2\varepsilon > 0$ near $\operatorname{supp}\chi$ for some constant $\varepsilon > 0$, then for $h$ small enough and every $u \in L^2$,
\[ \operatorname{Re}(P(h)\chi^w u, \chi^w u) \ge \varepsilon\|\chi^w u\|^2_{L^2} - O(h^\infty)\|u\|^2_{L^2}. \tag{5.9} \]
Proof. 1. Take $\chi_1 \in C_0^\infty(\mathbb{R}^2;\mathbb{R})$ such that $\chi_1 = 1$ near $\operatorname{supp}\chi$, but $\operatorname{Re} p \ge 0$ near $\operatorname{supp}\chi_1$. Then, apply the standard sharp Gårding inequality [17, Thm. 4.24] to the operator $\chi_1^w P(h)\chi_1^w$ and the function $\chi^w u$, and use (5.2).
2. Apply part 1 of this proposition to the operator $P(h) - 2\varepsilon$.
Proposition 5.3 (Exponentiation of pseudodifferential operators). Assume that $G \in C_0^\infty(\mathbb{R}^2)$, $s \in \mathbb{R}$, and define the operator $e^{sG^w} : L^2 \to L^2$ as
\[ e^{sG^w} = \sum_{j\ge 0} \frac{(sG^w)^j}{j!}. \]
Assume that $|s|$ is bounded by an $h$-independent constant. Then:
1. $e^{sG^w} \in \Psi^0$ is a pseudodifferential operator.
2. $e^{sG^w}P(h)e^{-sG^w} = P(h) + ish(H_pG)^w + O_{L^2\to L^2}(h^2)$.
Proof. 1. See for example [17, Thm. 8.3] (with $m(x,\xi) = 1$). The full symbol of $e^{sG^w}$ can be recovered from the evolution equation satisfied by this family of operators; we see that it is equal to 1 outside of a compact set.
2. It suffices to differentiate both sides of the equation in $s$, divide them by $h$, and compare the principal symbols.
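The differentiation argument for part 2 of Proposition 5.3 can be unpacked as follows. This is a sketch under the assumption that $H_pG$ denotes the Poisson bracket $\{p, G\}$ in the sign convention stated after (5.4); the notation $A(s)$ is introduced here only for the derivation.

```latex
% Conjugated family and its derivative in s:
\[
  A(s) = e^{sG^w} P(h)\, e^{-sG^w}, \qquad
  \partial_s A(s) = [G^w, A(s)], \qquad A(0) = P(h).
\]
% By (5.4) with a = G, b = p, and since P(h) = p^w + O(h):
\[
  [G^w, P(h)] = -ih\{G, p\}^w + O(h^2)
              = ih\{p, G\}^w + O(h^2)
              = ih\,(H_p G)^w + O(h^2),
\]
\[
  H_p G = \{p, G\} = \partial_\xi p\,\partial_x G - \partial_x p\,\partial_\xi G .
\]
% Taylor expansion in s with integral remainder:
\[
  A(s) = P(h) + s\,[G^w, P(h)] + \int_0^s (s-t)\,[G^w, [G^w, A(t)]]\,dt
       = P(h) + ish\,(H_p G)^w + O_{L^2\to L^2}(h^2),
\]
% because each commutator with the compactly supported G^w gains one power of h.
```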
6. Analysis Near the Zero Energy
In this section, we prove part 2 of Proposition 3.2. Take $h > 0$ such that $\operatorname{Re}\lambda = h^{-2}$. Put $\tilde\mu = h^2\operatorname{Im}\lambda$, $\tilde k = hk$, $\tilde\omega = h\omega$, $\tilde\omega_\pm = h\omega_\pm$; then (3.7) implies that
\[ |\tilde\mu| \le \varepsilon_r,\quad |a\tilde k| \le \varepsilon_r,\quad |\tilde\omega| \le \varepsilon_r,\quad |\tilde\omega_\pm| \le \varepsilon_r, \tag{6.1} \]
where $\varepsilon_r > 0$ and $h$ can be made arbitrarily small by choice of $C_r$ and $\psi$. If $P_x$ is the operator in (4.2), then $P_x = h^{-2}\tilde P_x$, where
\[
\begin{aligned}
\tilde P_x(h;\tilde\omega,\tilde\mu,\tilde k) &= h^2D_x^2 + \tilde V_x(x;\tilde\omega,\tilde\mu,\tilde k),\\
\tilde V_x(x;\tilde\omega,\tilde\mu,\tilde k) &= (1 + i\tilde\mu)\Delta_r - (1+\alpha)^2\big((r^2+a^2)\tilde\omega - a\tilde k\big)^2.
\end{aligned}
\]
Now, we use Proposition 4.4. Let $u$ be an outgoing function in the sense of Definition 4.1 and assume that $f = \tilde P_x u$ is supported in $K_x$. Then $u$ satisfies (4.5) for $|x| > X_0$ and some functions $v_\pm$. Fix $x_+ > X_0$ and consider the function
\[ v_1(y) = v_+(e^{-A_+(x_+ + iy)};\omega,\lambda,k),\qquad y \in \mathbb{R}. \tag{6.2} \]
This is a $2\pi/A_+$-periodic function; we can think of it as a function on the circle. It follows from the differential equation satisfied by $v_+$ together with Cauchy-Riemann equations that $Q(h)v_1(y) = 0$, where
\[ Q(h;\tilde\omega,\tilde\mu,\tilde k) = (-ihD_y + \tilde\omega_+)^2 + \tilde V_x(x_+ + iy;\tilde\omega,\tilde\mu,\tilde k). \]
Let $q(y,\eta)$ be the semiclassical symbol of $Q$:
\[ q(y,\eta) = (-i\eta + \tilde\omega_+)^2 + \tilde V_x(x_+ + iy). \]
For small $h$, the function $v_1(y)$ has to be (semiclassically) microlocalized on the set $\{q = 0\}$. Since the symbol $q$ is complex-valued, in a generic situation this set will consist of isolated points. Also, since $v_1$ is the restriction to a certain circle of the function $v_+$, which is holomorphic inside this circle, it is microlocalized in $\{\eta \le 0\}$. Therefore, if the equation $q(y,\eta) = 0$ has only one root with $\eta \le 0$, then the function $v_1$ has to be microlocalized at this root. If furthermore $\bar q$ satisfies Hörmander's hypoellipticity condition, one can obtain an asymptotic decomposition of $v_1$ in powers of $h$. We will only need a weak corollary of such decomposition; here is a self-contained proof of the required estimates:
Proposition 6.1. Assume that $x_+ > X_0$ is chosen so that:
• the equation $q(y,\eta) = 0$, $y \in S^1$, has exactly one root $(y_0,\eta_0)$ such that $\eta_0 < 0$;
• the equation $q(y,\eta) = 0$ has no roots with $\eta = 0$;
• the condition $i\{q,\bar q\} < 0$ is satisfied at $(y_0,\eta_0)$;
• $\operatorname{Re}(\eta_0 + i\tilde\omega_+) < 0$.
(If all of the above hold, we say that we have vertical control at $x_+$ and $(y_0,\eta_0)$ is called the microlocalization point.) Let $\eta(y)$ be the family of solutions to $q(y,\eta(y)) = 0$ with $\eta(y_0) = \eta_0$. Then for each $N$, each $\chi(y,\eta) \in C_0^\infty$ that is equal to 1 near $(y_0,\eta_0)$, and $h$ small enough, we have
\[
\begin{aligned}
\|(1 - \chi^w(y,hD_y))v_1\|_{H_h^N} &= O(h^\infty)\|v_1\|_{L^2}, &&(6.3)\\
\|(hD_y - \eta(y))v_1\|_{H_h^N} &= O(h)\|v_1\|_{L^2}, &&(6.4)\\
\|v_1\|_{L^2} &\le Ch^{1/4}|v_1(y_0)|, &&(6.5)\\
|(hD_y - \eta_0)v_1(y_0)| &\le Ch^{1/2}\|v_1\|_{L^2}, &&(6.6)\\
\operatorname{Re}\frac{h\partial_x u_+(x_+ + iy_0)}{u_+(x_+ + iy_0)} &\le -\frac{1}{C} < 0. &&(6.7)
\end{aligned}
\]
Similar statements are true for $u_+$ replaced by $u_-$, with the opposite inequality sign in (6.7).
Proof. (6.3). We know that $\inf\{\eta \mid q(y,\eta) = 0,\ (y,\eta) \ne (y_0,\eta_0)\} > 0$. Therefore, we can decompose $1 = \chi + \chi_+ + \chi_0$, where $\chi_+$ depends only on the $\eta$ variable, is supported in $\{\eta > 0\}$, and is equal to 1 for large positive $\eta$ and near every root of the equation $q(y,\eta) = 0$ with $\eta > 0$. Since $v_+$ is holomorphic at zero, its Taylor series provides the Fourier series for $v_1$; it then follows from (5.5) that
\[ \|\chi_+^w(y,hD_y)v_1\|_{H_h^N} = O(h^\infty)\|v_1\|_{L^2}. \]
Next, the symbol $q$ is elliptic near $\operatorname{supp}\chi_0$; therefore, by Proposition 5.1 (whose proof applies without changes to our case), since $Q(h)v_1 = 0$, we have
\[ \|\chi_0^w(y,hD_y)v_1\|_{H_h^N} = O(h^\infty)\|v_1\|_{L^2}. \]
This finishes the proof.
(6.4): Take a small cutoff $\chi$ as above, and factor $q = (\eta - \eta(y))q_1$, where $q_1(y,\eta)$ is nonzero near $\operatorname{supp}\chi$. We then find a compactly supported symbol $r_1$ with $r_1q_1 = 1$ near $\operatorname{supp}\chi$. Now, we have
\[
\begin{aligned}
\|\chi^w(y,hD_y)\big(r_1^w(y,hD_y)q_1^w(y,hD_y) - 1\big)(hD_y - \eta(y))v_1\|_{H_h^N} &= O(h)\|v_1\|_{L^2},\\
\|(1 - \chi^w(y,hD_y))\big(r_1^w(y,hD_y)q_1^w(y,hD_y) - 1\big)(hD_y - \eta(y))v_1\|_{H_h^N} &= O(h^\infty)\|v_1\|_{L^2},\\
\|r_1^w(y,hD_y)\big(q_1^w(y,hD_y)(hD_y - \eta(y)) - Q(h)\big)v_1\|_{H_h^N} &= O(h)\|v_1\|_{L^2}.
\end{aligned}
\]
It remains to add these up.
(6.5). We cut off $v_1$ to make it supported in a small $\varepsilon$-neighborhood of $y_0$. Put $f = (h\partial_y - i\eta(y))v_1$; we know that $\|f\|_{L^2} \le Ch\|v_1\|_{L^2}$. Now, put
\[ \Phi(y) = \int_{y_0}^{y} \eta(y')\,dy'. \]
The condition $i\{q,\bar q\}|_{(y_0,\eta_0)} < 0$ is equivalent to $\operatorname{Im}\partial_y\eta(y_0) > 0$; it follows that
\[ \operatorname{Im}(\Phi(y) - \Phi(y')) \ge \beta\big((y - y_0)^2 - (y' - y_0)^2\big) \tag{6.8} \]
for some $\beta > 0$, $|y - y_0| < \varepsilon$, and $y'$ between $y$ and $y_0$. (To see that, represent the left-hand side as an integral.) Now,
\[ v_1(y) = e^{i\Phi(y)/h}\,v_1(y_0) + h^{-1}\int_{y_0}^{y} e^{i(\Phi(y) - \Phi(y'))/h}\,f(y')\,dy'. \]
Let $Tf(y)$ be the second term in the sum above; it suffices to prove that
\[ \|Tf\|_{L^2(y_0-\varepsilon,\,y_0+\varepsilon)} \le Ch^{-1/2}\|f\|_{L^2(y_0-\varepsilon,\,y_0+\varepsilon)}. \]
This can be reduced to the inequalities
\[
\sup_{0\le y-y_0<\varepsilon}\int_{y_0}^{y} |e^{i(\Phi(y)-\Phi(y'))/h}|\,dy' = O(h^{1/2}),\qquad
\sup_{0\le y'-y_0<\varepsilon}\int_{y'}^{y_0+\varepsilon} |e^{i(\Phi(y)-\Phi(y'))/h}|\,dy = O(h^{1/2}),
\]
and similar inequalities for the case $y, y' < y_0$. We now use (6.8); after a change of variables, it suffices to prove that
\[ \sup_{y>0}\int_0^{y} e^{(y')^2 - y^2}\,dy' < \infty,\qquad \sup_{y'>0}\int_{y'}^{\infty} e^{(y')^2 - y^2}\,dy < \infty. \]
To prove the first of these inequalities, make the change of variables $y' = ys$; then the integral becomes
\[ \int_0^1 y\,e^{y^2(s^2-1)}\,ds. \]
However, $y\,e^{y^2(s^2-1)} \le C(1 - s^2)^{-1/2}$, and the integral of the latter converges. After the change of variables $y = y' + s$, the integral of the second inequality above becomes
\[ \int_0^\infty e^{-2y's - s^2}\,ds. \]
This can be estimated by $\int_0^\infty e^{-s^2}\,ds$.
(6.6): Let $\chi \in C_0^\infty(\mathbb{R}^2)$ have $\chi = 1$ near $(y_0,\eta_0)$. Combining (5.1) and (6.4) with
\[ \|(1 - \chi^w(y,hD_y))(hD_y - \eta(y))v_1\|_{L^\infty} = O(h^\infty)\|v_1\|_{L^2}, \]
we get $\|(hD_y - \eta(y))v_1\|_{L^\infty} = O(h^{1/2})\|v_1\|_{L^2}$; it remains to take $y = y_0$.
(6.7): Follows immediately from (6.5), (6.6), (6.2), Cauchy-Riemann equations, and the fact that $\operatorname{Re}(\eta_0 + i\tilde\omega_+) < 0$.
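The two suprema used in the proof of (6.5), $\sup_{y>0}\int_0^y e^{(y')^2-y^2}\,dy'$ and $\sup_{y'>0}\int_{y'}^\infty e^{(y')^2-y^2}\,dy$, can also be checked numerically. The sketch below is only an illustration (function names are hypothetical): the first integral is Dawson's integral, approximated by Simpson's rule, and the second has the closed form $\tfrac{\sqrt\pi}{2}e^{(y')^2}\operatorname{erfc}(y')$, which is a standard identity and not a formula from the text.

```python
import math


def first_integral(y, n=400):
    """Simpson approximation of the integral of exp(t**2 - y**2) for t in [0, y]."""
    if y == 0.0:
        return 0.0
    step = y / n
    total = math.exp(-y * y) + 1.0  # endpoint values at t = 0 and t = y
    for i in range(1, n):
        t = i * step
        total += (4 if i % 2 else 2) * math.exp(t * t - y * y)
    return total * step / 3


def second_integral(y_prime):
    """Integral of exp(y_prime**2 - y**2) for y in [y_prime, infinity), in closed form."""
    return 0.5 * math.sqrt(math.pi) * math.exp(y_prime**2) * math.erfc(y_prime)


samples = [0.01 * i for i in range(601)]  # sample points in [0, 6]
sup_first = max(first_integral(y) for y in samples)
sup_second = max(second_integral(y) for y in samples)
print(sup_first, sup_second)
```

On this sample range both quantities stay bounded: the first peaks near $y \approx 0.92$ at about $0.541$, and the second is largest at $y' = 0$, where it equals $\sqrt\pi/2$, in line with the bound by $\int_0^\infty e^{-s^2}\,ds$ used in the proof.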
Fig. 3. A contour with horizontal control
If $\tilde P_x$ were a semiclassical Schrödinger operator with a strictly positive potential, then a standard integration by parts argument would give us $\|u\|_{L^2} \le C\|\tilde P_x u\|_{L^2}$ on any interval for each function $u$ satisfying the condition (6.7) at the right endpoint of this interval and the opposite condition at its left endpoint. We now generalize this argument to our case. Assume that we have vertical control at the points $x_\pm$, $\pm x_\pm > X_0$, and let $(y_\pm,\eta_\pm)$ be the corresponding microlocalization points. Let $\gamma$ be a contour in the $z$ plane; we say that we have horizontal control on $\gamma$ if (Fig. 3):
• $\gamma \cap \{|\operatorname{Re} z| \le X_0\} \subset \mathbb{R}$;
• the endpoints of $\gamma$ are $z_\pm = x_\pm + iy_\pm$;
• $\gamma$ is given by $\operatorname{Im} z = F(\operatorname{Re} z)$, where $F$ is a smooth function and $F'(x_\pm) = 0$;
• $\operatorname{Re}[(1 + iF'(x))\tilde V_x(x + iF(x))] \ge \frac{1}{C_1} > 0$ for all $x$.
Now, let $u$ be as in the beginning of this section and define $u_\gamma(x)$, $x_- \le x \le x_+$, by Proposition 4.4. Then $\tilde P_\gamma u_\gamma = f$, where
\[ \tilde P_\gamma = \Big(\frac{1}{1 + iF'(x)}\,hD_x\Big)^2 + \tilde V_x(x + iF(x)). \]
If we have vertical control at the endpoints of $\gamma$, then by (6.7),
\[ \pm\operatorname{Re}\big(\bar u_\gamma(x_\pm)\,h\partial_x u_\gamma(x_\pm)\big) \le -|u_\gamma(x_\pm)|^2/C < 0. \]
Now, assume that we have horizontal control on $\gamma$. Then we can integrate by parts to get
\[
\begin{aligned}
\int_{x_-}^{x_+} \operatorname{Re}\big(\bar u_\gamma\,(1 + iF'(x))f\big)\,dx
&= \int_{x_-}^{x_+} \operatorname{Re}\big(\bar u_\gamma\cdot(1 + iF'(x))\tilde P_\gamma u_\gamma\big)\,dx\\
&= \operatorname{Re}\int_{x_-}^{x_+} \frac{|hD_xu_\gamma|^2}{1 + iF'(x)}\,dx + \int_{x_-}^{x_+} \operatorname{Re}[(1 + iF'(x))\tilde V_x(x + iF(x))]\cdot|u_\gamma|^2\,dx\\
&\quad - h^2\operatorname{Re}(\bar u_\gamma\,\partial_x u_\gamma)\big|_{x=x_-}^{x_+}\\
&\ge \frac{1}{C_1}\big(\|u_\gamma\|^2_{L^2} + h(|u_\gamma(x_+)|^2 + |u_\gamma(x_-)|^2)\big).
\end{aligned}
\tag{6.9}
\]
Therefore, $\|u_\gamma\|_{L^2} \le C\|f\|_{L^2}$. It follows that the operator $R_x$ from (4.8) is correctly defined and $\|1_{K_x}R_x1_{K_x}\|_{L^2\to L^2} \le Ch^2$. This proves the estimate (3.8) under the assumptions made above.
We now prove (3.9). We concentrate on the estimate on $K_+$; the case of $K_-$ is considered in a similar fashion. First of all, it follows from (6.9) that
\[ |u_\gamma(x_+)| \le Ch^{-1/2}\|f\|_{L^2}. \tag{6.10} \]
Now, assume that we have vertical control at every point of the interval $I_+ = [x_+, x_+ + 1]$ and let $(y(x),\eta(x))$, $x \in I_+$, be the corresponding microlocalization points. Let $v_x(z) = e^{-i\omega_+z}u(z)$ and put $v_2(x) = v_x(x + iy(x))$; then
\[ |v_2(x_+)| \le Ce^{\operatorname{Im}(\omega_+z_+)}|u_\gamma(x_+)|. \tag{6.11} \]
Now, by Proposition 6.1, we have
\[ |h\partial_x\ln|v_2(x)| - \eta(x)| \le Ch^{3/4},\qquad x \in I_+. \tag{6.12} \]
Integrating (6.12) and combining it with (6.10) and (6.11), we see that if
\[ \operatorname{Im}(\tilde\omega_+z_+) + \int_{x_+}^{x_++1} \eta(x)\,dx < -2\delta_0 \tag{6.13} \]
for some $\delta_0 > 0$, then $|v_2(x_+ + 1)| \le Ce^{-\delta_0/h}\|f\|_{L^2}$. Next, $v_2(x_+ + 1)$ is the value of $v$ at the microlocalization point; therefore, by Proposition 6.1 and (5.1),
\[ \sup_{y\in\mathbb{R}} |v_x(x_+ + 1 + iy)| \le Ch^{-1/4}e^{-\delta_0/h}\|f\|_{L^2}. \]
Finally, recall that $v_x(z) = v_w(e^{-A_+z})$, where the function $v_w(w)$ is holomorphic inside the disc $B_w = \{|w| \le e^{-A_+(x_++1)}\}$. The change of variables $w \to r$ is holomorphic by Proposition 4.1; let $K_+^w$ be the image of $K_+$ under this change of variables. If $\delta_{r0}$ is small enough, then $K_+^w$ lies in the interior of $B_w$; then by the maximum principle and Cauchy estimates on derivatives, we can estimate $\|v_w\|_{C^N(K_+^w)}$ for each $N$ by $O(h^\infty)\|f\|_{L^2}$. This completes the proof of (3.9) if the conditions above are satisfied.
To prove part 2 of Proposition 3.2, it remains to establish both vertical and horizontal control in our situation:
Proposition 6.2. Assume that $\delta_r > 0$. Then there exist $\varepsilon_r$ and $x_\pm$, $\pm x_\pm > X_0$, such that under the conditions (6.1),
• we have vertical control at every point of the intervals $I_+ = [x_+, x_+ + 1]$ and $I_- = [x_- - 1, x_-]$;
• we have horizontal control on a certain contour $\gamma$;
• the inequality (6.13) (and its analogue on $I_-$) holds.
Proof. Let us first assume that $a\tilde k = \tilde\mu = \tilde\omega = 0$. Then $\tilde\omega_\pm = 0$ and $q(y,\eta) = -\eta^2 + \Delta_r(x_+ + iy)$. Therefore, if we choose $x_+$ large enough, there exists exactly one solution $(y_0,\eta_0)$ to the equation $q(y,\eta) = 0$ with $\eta \le 0$, and this solution has $y_0 = 0$. It is easy to verify that in that case we have vertical control on $I_+$. Similarly one can choose the point $x_-$; moreover, we can assume that $K_r \subset (x_-, x_+)$ after the change of variables $r \to x$. Next, since $\tilde V_x = \Delta_r$, we can take $\gamma$ to be the interval $[x_-, x_+]$ of the real line. The condition (6.13) holds because $\eta(x) < 0$ for every $x$ and $\tilde\omega_\pm = 0$.
Now, fix $x_\pm$ as above. The parameters of our problem are $a$, varying in a compact set, $\Lambda$ and $M$, both fixed, and $a\tilde k$, $\tilde\mu$, $\tilde\omega$. By the implicit function theorem, if the last three
parameters are small enough, the (open) conditions of vertical control and the condition (6.13) are still satisfied, yielding $y_\pm$ close to zero. Then one can take the contour $\gamma$ defined by $\operatorname{Im} z = F(\operatorname{Re} z)$, where $F = 0$ near $K_r$, $F(x_\pm) = y_\pm$, and $F$ is small in $C^\infty$. For small values of $a\tilde k$, $\tilde\mu$, $\tilde\omega$, we will still have horizontal control on this $\gamma$, proving the proposition.
7. Resonance Free Strip
In this section, we prove Theorem 5. First of all, by Proposition 3.4, it suffices to prove
Proposition 7.1. Fix $\delta_r > 0$, $\varepsilon_e > 0$, and a large constant $C'$. Then there exist constants $a_0 > 0$ and $C''$ such that for
\[ |\operatorname{Re}\lambda| + k^2 \le C'|\operatorname{Re}\omega|^2,\quad |a| < a_0,\quad |\operatorname{Re}\omega| \ge 1/C'',\quad |\operatorname{Im}\omega| \le 1/C'',\quad |\operatorname{Im}\lambda| \le |\operatorname{Re}\omega|/C'', \]
we have
\[ \|1_{K_r}R_r(\omega,\lambda,k)1_{K_r}\|_{L^2\to L^2} \le C''|\omega|^{\varepsilon_e - 1}. \]
Indeed, we take $C'$ large enough so that $C_k^2(1 + |\omega|)^2 + L \le C'|\omega|^2/2$; then, we put $l_1 = L$ and $l_2 = |\operatorname{Re}\omega|/C''$.
Next, we reformulate Proposition 7.1 in semiclassical terms. Without loss of generality, we may assume that $\operatorname{Re}\omega > 0$. Put $h = (\operatorname{Re}\omega)^{-1}$ and consider the rescaled operator
\[ \tilde P_x = h^2P_x = h^2D_x^2 + (\tilde\lambda + ih\tilde\mu)\Delta_r - (1+\alpha)^2\big((r^2+a^2)(1 + ih\nu) - a\tilde k\big)^2. \]
Here $P_x$ is the operator in (4.2) and $\tilde\lambda = h^2\operatorname{Re}\lambda$, $\tilde k = hk$, $\tilde\mu = h\operatorname{Im}\lambda$, $\nu = \operatorname{Im}\omega$. Then it suffices to prove that for $h$ small enough and under the conditions
\[ |\tilde\lambda| \le C',\quad |\tilde k| \le C',\quad |\tilde\mu| \le 1/C'',\quad |\nu| \le 1/C'', \tag{7.1} \]
for each $f(x) \in L^2 \cap \mathcal{E}'(K_x)$ and solution $u(x)$ to the equation $\tilde P_x u = f$ which is outgoing in the sense of Definition 4.1, we have
\[ \|u\|_{L^2(K_x)} \le Ch^{-1-\varepsilon_e}\|f\|_{L^2}. \tag{7.2} \]
(Here $K_x$ is the image of $K_r = (r_- + \delta_r, r_+ - \delta_r)$ under the change of variables $r \to x$; see Fig. 4.) We write $\tilde P_x = h^2D_x^2 + \tilde V_0 + ih\tilde V_1$, where
\[
\begin{aligned}
\tilde V_0 &= \tilde\lambda\Delta_r - (1+\alpha)^2(r^2 + a^2 - a\tilde k)^2,\\
\tilde V_1 &= \tilde\mu\Delta_r - \nu(1+\alpha)^2(r^2+a^2)\big((r^2+a^2)(2 + ih\nu) - 2a\tilde k\big).
\end{aligned}
\]
We note that $\tilde V_0$ is real-valued and $\|\tilde V_1\|_{L^\infty} \le C/C''$ for some global constant $C$.
Fig. 4. The contour used for complex scaling
We now apply the method of complex scaling. (This method was first developed by Aguilar and Combes in [1]; see [32] and the references there for more recent developments.) Consider the contour $\gamma$ in the complex plane given by $\operatorname{Im} x = F(\operatorname{Re} x)$, with $F$ defined by
\[
F(x) = \begin{cases} 0, & |x| \le R;\\ F_0(x - R), & x \ge R;\\ -F_0(-x - R), & x \le -R. \end{cases} \tag{7.3}
\]
Here $R > X_0$ is large and F0 ∈ C0∞(0, ∞) is a fixed function such that $F_0' \ge 0$ and $F_0'' \ge 0$ for all $x$ and $F_0'(x) = 1$ for $x \ge 1$. (We could use a contour which forms an arbitrary fixed angle $\tilde\theta \in (0, \pi/2)$ with the horizontal axis for large $x$; we choose the angle $\pi/4$ to simplify the formulas.) Now, let $u$ be an outgoing solution to the equation $\tilde P_x u = f \in L^2 \cap \mathcal{E}'(K_x)$, as above. By Proposition 4.4, we can define the restriction $u_\gamma$ of $u$ to $\gamma$ and $\tilde P_\gamma u_\gamma = f$, where
\[ \tilde P_\gamma = \Big(\frac{hD_x}{1 + iF'(x)}\Big)^2 + \tilde V_0(x + iF(x)) + ih\tilde V_1(x + iF(x)). \]
Also, for $a$ and $h$ small enough, $u_\gamma$ lies in $H^2(\mathbb{R})$. Therefore, in order to prove (7.2), it is enough to show that for each $u_\gamma \in H^2(\mathbb{R})$, we have
\[ \|u_\gamma\|_{L^2(\mathbb{R})} \le Ch^{-1-\varepsilon_e}\|\tilde P_\gamma u_\gamma\|_{L^2(\mathbb{R})}. \tag{7.4} \]
Let $p_0$ and $p_{\gamma 0}$ be the semiclassical principal symbols of $\tilde P_x$ and $\tilde P_\gamma$:
\[ p_0(x,\xi) = \xi^2 + \tilde V_0(x),\qquad p_{\gamma 0}(x,\xi) = \frac{\xi^2}{(1 + iF'(x))^2} + \tilde V_0(x + iF(x)). \]
The key property of the operator $\tilde P_\gamma$, as opposed to $\tilde P_x$, is ellipticity at infinity, which follows from the fact that $\tilde V_0(\pm\infty) = -\tilde\omega_{0\pm}^2$, where
\[ \tilde\omega_{0\pm} = (1+\alpha)(r_\pm^2 + a^2 - a\tilde k) \ge 1/C > 0 \]
if $a$ is small enough. Certain other properties of the symbol $p_{\gamma 0}$ can be derived using only the behavior of $\tilde V_0$ near infinity given by (4.4); we state them for a general class of potentials:
Proposition 7.2. Assume that $V(x)$, $x > 0$, is a real-valued potential such that for $x > X_0$, we have $V(x) = V_+(e^{-A_+x})$ for a certain constant $A_+ > 0$ and a function $V_+(w)$ holomorphic in $\{|w| < e^{-A_+X_0}\}$; assume also that $V_+(0) < 0$. Let $F(x)$ be as in (7.3), for $R > X_0$, and put
\[ p(x,\xi) = \xi^2 + V(x),\qquad p_\gamma(x,\xi) = \frac{\xi^2}{(1 + iF'(x))^2} + V(x + iF(x)). \]
Then there exists a constant $C_c$ such that for $R$ large enough and $\delta > 0$ small enough,
\[
\begin{aligned}
&\text{if } x \ge R + 1, \text{ then } |p_\gamma(x,\xi)| \ge 1/C_c > 0, &&(7.5)\\
&\text{if } |p_\gamma(x,\xi)| \le e^{-A_+R}, \text{ then } \operatorname{Im} p_\gamma(x,\xi) \le 0, &&(7.6)\\
&\text{if } |p_\gamma(x,\xi)| \le \delta, \text{ then } |p(x,\xi)| \le C_c\delta,\ |\nabla(\operatorname{Re} p_\gamma - p)(x,\xi)| \le C_c\delta. &&(7.7)
\end{aligned}
\]
Similar facts hold if $V$ is defined on $x < 0$ instead.
Proof. Without loss of generality, we assume that $A_+ = 1$ and $V_+(0) = -1$. First of all, if $x \ge R + 1$, then
\[ p_\gamma(x,\xi) = -i\xi^2/2 + V(x + iF(x)) = -i\xi^2/2 - 1 + O(e^{-R}). \]
For $R$ large enough, we then get $|p_\gamma(x,\xi)| \ge 1/2$, thus proving (7.5). For the rest of the proof, we may assume that $R \le x \le R + 1$. Then, since $F_0'$ is increasing, we get $0 \le F(x) \le F'(x)$. Suppose that $|p_\gamma(x,\xi)| \le \delta$; then
\[ \frac{\xi^2}{(1 + iF'(x))^2} = -V(x + iF(x)) + O(\delta) = -V(x)\big(1 + O(\delta + e^{-R}F(x))\big) = 1 + O(\delta + e^{-R}). \tag{7.8} \]
Taking the arguments of both sides, we get
\[ F'(x) \le C(\delta + e^{-R}F(x)) \le C\delta + Ce^{-R}F'(x). \]
Then for $R$ large enough, $|p_\gamma(x,\xi)| \le \delta$ implies $F'(x) \le C\delta$. This proves (7.7), if we note that F is bounded and $\operatorname{Re} p_\gamma(x,\xi) - p(x,\xi) = \xi^2G_1(F'(x)^2) + G_2(F(x), x)$ for certain smooth functions $G_1$ and $G_2$ that are equal to zero at $F' = 0$ and $F = 0$, respectively. Now, putting $\delta = e^{-R}$ and taking the arguments and then the absolute values of both sides of (7.8), we get for $|p_\gamma| \le \delta$,
\[ F'(x) = O(e^{-R}),\qquad \xi^2 = 1 + O(e^{-R}). \]
Therefore,
\[ \operatorname{Im} p_\gamma(x,\xi) = -2F'(x) + O\big(e^{-R}(F(x) + F'(x))\big) = F'(x)\big(-2 + O(e^{-R})\big), \]
which proves (7.6).
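The ellipticity at infinity gained by complex scaling, as in (7.5), can be seen in the model situation used in the proof: a constant potential $V = -1$ (the normalisation $V_+(0) = -1$) on the straight part of the contour where $F'(x) = 1$. The sketch below is only an illustration with hypothetical function names; it ignores the $O(e^{-R})$ corrections and compares the unscaled symbol $\xi^2 + V$, which vanishes at $\xi = \pm 1$, with the scaled symbol $\xi^2/(1+i)^2 + V$.

```python
def p_unscaled(xi, potential):
    """Symbol xi^2 + V on the real line."""
    return xi**2 + potential


def p_scaled(xi, potential, slope):
    """Symbol xi^2 / (1 + i*slope)^2 + V on a straight contour with F'(x) = slope."""
    return xi**2 / (1 + 1j * slope) ** 2 + potential


xis = [0.01 * i for i in range(-500, 501)]  # frequencies in [-5, 5]
V_inf = -1.0  # model value of the potential at infinity

min_unscaled = min(abs(p_unscaled(xi, V_inf)) for xi in xis)
min_scaled = min(abs(p_scaled(xi, V_inf, 1.0)) for xi in xis)
max_imag_scaled = max(p_scaled(xi, V_inf, 1.0).imag for xi in xis)
print(min_unscaled, min_scaled, max_imag_scaled)
```

For slope 1 the scaled symbol equals $-i\xi^2/2 - 1$, so its modulus is at least 1 and its imaginary part is nonpositive, whereas the unscaled symbol has real zeros; this is the mechanism behind (7.5) and the sign condition (7.6).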
Fig. 5. Three cases for the potential $\tilde V_0$ and its Hamiltonian flow near the zero energy. The horizontal lines correspond to $\tilde V_0 = \pm\delta_V$
Now, we study the trapping properties of the Hamiltonian flow of $p_0$ at the zero energy:
Proposition 7.3. There exist constants $C_V$ and $\delta_V$ such that for $a$ small enough and every $\tilde\lambda$, $\tilde k$ satisfying (7.1), at least one of the three dynamical cases below holds:
(1) $\tilde V_0 \le -\delta_V$ everywhere;
(2) $\{|\tilde V_0| \le \delta_V\} = [x_1,x_2] \cup [x_3,x_4]$, where $-C_V \le x_1 < x_2 < x_3 < x_4 \le C_V$ and $\tilde V_0' \ge 1/C_V$ on $[x_1,x_2]$, $\tilde V_0' \le -1/C_V$ on $[x_3,x_4]$;
(3) $\{\tilde V_0 \ge -\delta_V\} = [x_1,x_2]$ with $|x_j| \le C_V$, $\tilde V_0'' \le -1/C_V$ on $[x_1,x_2]$.
Proof. First of all, if $\tilde\lambda$ is small enough or $\tilde\lambda < 0$, then we have $\tilde V_0 < 0$ everywhere and therefore case (1) holds for $\delta_V$ small enough. Therefore, we may assume that $1/C \le \tilde\lambda \le C$ for some constant $C$. Now, we write
\[
\tilde V_0(x) = G_V(r)\big(F_V(r) - \tilde\lambda^{-1}\big),\qquad
G_V(r) = \tilde\lambda(1+\alpha)^2(r^2 + a^2 - a\tilde k)^2,\qquad
F_V(r) = \frac{\Delta_r}{(1+\alpha)^2(r^2 + a^2 - a\tilde k)^2}.
\]
Note that $1/C \le G_V(r) \le C$ for $a$ small enough, some constant $C$, and all $r$. As for $F_V$, there exists $\varepsilon > 0$ such that for $a$ small enough, $\partial_rF_V(r) \ge 1/C > 0$ for $r \le 3M - \varepsilon$, $\partial_rF_V(r) \le -1/C < 0$ for $r \ge 3M + \varepsilon$, and $\partial_r^2F_V(r) \le -1/C < 0$ for $|r - 3M| \le \varepsilon$. Indeed, this is true for $a = 0$ and follows for small $a$ by a perturbation argument. Let $r_0 \in [3M - \varepsilon, 3M + \varepsilon]$ be the point where $F_V$ achieves its maximal value. Take small $\delta_1 > 0$; then we have one of the following three cases, each of which in turn implies the corresponding case in the statement of this proposition (Fig. 5):
(1) $F_V(r_0) - \tilde\lambda^{-1} \le -\delta_1$. Then $\tilde V_0(x) < -\delta_V$ for all $x$ and $\delta_V > 0$ small enough.
(2) $F_V(r_0) - \tilde\lambda^{-1} \ge \delta_1$. Then for $\delta_2 < \delta_1/2$, $\{|F_V - \tilde\lambda^{-1}| \le \delta_2\} = [x_1,x_2] \cup [x_3,x_4]$, where $x_2 < x_3$, $x_j$ are bounded by a global constant (since $\tilde\lambda$ is bounded from above), and $\partial_rF_V(r) > 1/C_\delta > 0$ for $x \in [x_1,x_2]$, $\partial_rF_V(r) < -1/C_\delta < 0$ for $x \in [x_3,x_4]$. Here $C_\delta$ is a constant depending on $\delta_1$, but not on $\delta_2$. It follows that for $\delta_2$ small enough depending on $\delta_1$, we have $\tilde V_0'(x) > 0$ for $x \in [x_1,x_2]$ and $\tilde V_0'(x) < 0$ for $x \in [x_3,x_4]$; also, for $\delta_V$ small enough, we have $\{|\tilde V_0| \le \delta_V\} \subset [x_1,x_2] \cup [x_3,x_4]$.
(3) $|F_V(r_0) - \tilde\lambda^{-1}| < \delta_1$. Then $\{F_V - \tilde\lambda^{-1} > -\delta_1\} = [x_1,x_2]$ with $\partial_r^2F_V(r) < -1/C < 0$ for $x \in [x_1,x_2]$. For $\delta_1$ small enough, we then get $\tilde V_0'' < -1/C < 0$ for $x \in [x_1,x_2]$, and for $\delta_V$ small enough, we have $\{\tilde V_0 \ge -\delta_V\} \subset [x_1,x_2]$.
We are now ready to prove (7.4) and, therefore, Theorem 5. Fix $R$ large enough so that Proposition 7.2 holds. The first two cases in Proposition 7.3 are nontrapping; it follows that there exists an escape function $G \in C_0^\infty(\mathbb{R}^2)$ such that $H_{p_0}G < 0$ on $\{|p_0| \le \delta_V/2\} \cap \{|x| \le R + 2\}$. In the third case, we have hyperbolic trapping with the trapped set consisting of a single point $(x_0, 0)$, where $x_0$ is the point where $\tilde V_0$ achieves its maximal value; therefore, there still exists an escape function $G \in C_0^\infty(\mathbb{R}^2)$ such that $H_{p_0}G \le 0$ on $\{|p_0| \le \delta_V/2\} \cap \{|x| \le R + 2\}$ and $H_{p_0}G < 0$ on $\{|p_0| \le \delta_V/2\} \cap \{|x| \le R + 2\} \setminus U(x_0,0)$, where $U$ is a neighborhood of $(x_0,0)$ which can be made arbitrarily small by the choice of $G$ (see [18, Prop. A.6]). Now, given Proposition 7.2, we can choose $\delta_0 > 0$ such that
\[ \operatorname{Im} p_{\gamma 0} \le 0 \ \text{on}\ \{|p_{\gamma 0}| \le \delta_0\} \tag{7.9} \]
and for cases (1) and (2) of Proposition 7.3, we have
\[ H_{\operatorname{Re} p_{\gamma 0}}G \le -1/C < 0 \ \text{on}\ \{|p_{\gamma 0}| \le \delta_0\}, \tag{7.10} \]
and for case (3) of Proposition 7.3, we have
\[ H_{\operatorname{Re} p_{\gamma 0}}G \le 0 \ \text{on}\ \{|p_{\gamma 0}| \le \delta_0\},\qquad H_{\operatorname{Re} p_{\gamma 0}}G \le -1/C < 0 \ \text{on}\ \{|p_{\gamma 0}| \le \delta_0\} \setminus U(x_0,0). \tag{7.11} \]
Armed with these inequalities, we can handle the nontrapping cases even without requiring that $\tilde\mu$ and $\nu$ be small. The statement below follows the method initially developed in [27] and is a special case of the results in [14, Chap. 6]; however, we choose to present the proof in our simple case:

Proposition 7.4. Assume that either case (1) or case (2) of Proposition 7.3 holds. Then for $\tilde\lambda$ and $\tilde k$ bounded by $C$, $\tilde\mu$ and $\nu$ bounded by some constant, and $h$ small enough, we have
$$\|u_\gamma\|_{L^2} \le C h^{-1} \|\tilde P_\gamma u_\gamma\|_{L^2} \tag{7.12}$$
for each $u_\gamma \in H^2(\mathbb{R})$.

Proof. Take $\chi \in C_0^\infty(\mathbb{R}^2)$ such that $\operatorname{supp}\chi \subset \{|p_{\gamma 0}| < \delta_0\}$, but $\chi = 1$ near $\{p_{\gamma 0} = 0\}$. Next, take $s > 0$, to be chosen later, and put
$$\tilde P_{\gamma,s} = e^{sG^w} \tilde P_\gamma e^{-sG^w},\qquad u_{\gamma,s} = e^{sG^w} \chi^w u_\gamma.$$
Take $\chi_1 \in C_0^\infty(\mathbb{R}^2)$ supported in $\{|p_{\gamma 0}| < \delta_0\}$, but such that $\chi_1 = 1$ near $\operatorname{supp}\chi$. Then by part 1 of Proposition 5.3 and (5.2),
$$\|(1 - \chi_1^w) u_{\gamma,s}\| = O(h^\infty)\|u_\gamma\|. \tag{7.13}$$
(In the proof of the current proposition, as well as the next one, we only use $L^2$ norms.) Also, for some $s$-dependent constant $C$,
$$C^{-1}\|\chi^w u_\gamma\| \le \|u_{\gamma,s}\| \le C\|u_\gamma\|.$$
Now, by part 2 of Proposition 5.3, we have
$$\tilde P_{\gamma,s} = \tilde P_{\gamma 0} + ihV_1 + ish(H_{p_{\gamma 0}} G)^w + O(h^2).$$
Here $\tilde P_{\gamma 0}$ is the principal part of $\tilde P_\gamma$ (without $V_1$) and the constant in $O(h^2)$ depends on $s$. We then have
$$\operatorname{Im}(\tilde P_{\gamma,s}\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) = \operatorname{Im}(\tilde P_{\gamma 0}\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) + h\operatorname{Re}(V_1\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) + sh((H_{\operatorname{Re} p_{\gamma 0}} G)^w\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) + O(h^2)\|\chi_1^w u_{\gamma,s}\|^2.$$
By (7.9) and part 1 of Proposition 5.2,
$$\operatorname{Im}(\tilde P_{\gamma 0}\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) \le Ch\|\chi_1^w u_{\gamma,s}\|^2 + O(h^\infty)\|u_\gamma\|^2.$$
Next, by (7.10) and part 2 of Proposition 5.2,
$$((H_{\operatorname{Re} p_{\gamma 0}} G)^w\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) \le -C^{-1}\|\chi_1^w u_{\gamma,s}\|^2 + O(h^\infty)\|u_\gamma\|^2.$$
Adding these up, we get
$$\operatorname{Im}(\tilde P_{\gamma,s}\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) \le -h(C_1^{-1}s - C_1 - O(h))\|\chi_1^w u_{\gamma,s}\|^2 + O(h^\infty)\|u_\gamma\|^2.$$
Here the constants in $O(\cdot)$ depend on $s$, but the constant $C_1$ does not. Therefore, if we choose $s$ large enough and $h$-independent, then for small $h$ we have the estimate
$$\|\chi_1^w u_{\gamma,s}\|^2 \le Ch^{-1}\|\tilde P_{\gamma,s}\chi_1^w u_{\gamma,s}\| \cdot \|\chi_1^w u_{\gamma,s}\| + O(h^\infty)\|u_\gamma\|^2.$$
Together with (7.13), this gives
$$\|\chi^w u_\gamma\|^2 \le Ch^{-1}\|\tilde P_\gamma u_\gamma\| \cdot \|u_\gamma\| + Ch^{-1}\|[\tilde P_\gamma, \chi^w]u_\gamma\| \cdot \|u_\gamma\| + O(h^\infty)\|u_\gamma\|^2.$$
Applying Proposition 5.1 to estimate $(1 - \chi^w)u_\gamma$ and the commutator term above, we get the estimate (7.12).

Remark. The method described above can actually be used to obtain a logarithmic resonance free region; however, since we expect the resonances generated by trapping to lie asymptotically on a lattice as in [31], we only go a fixed amount deep into the complex plane.

The third case in Proposition 7.3 is where trapping occurs, and we analyse it as in [38] (see also [10] for a different method of solving the same problem):
Proposition 7.5. Assume that case (3) in Proposition 7.3 holds, and fix $\varepsilon_e > 0$. Then for $\tilde\lambda$ and $\tilde k$ bounded by $C$ and for $\tilde\mu$, $\nu$, $h$ small enough, we have
$$\|u_\gamma\|_{L^2} \le Ch^{-1-\varepsilon_e}\|\tilde P_\gamma u_\gamma\|_{L^2} \tag{7.14}$$
for each $u_\gamma \in H^2(\mathbb{R})$.

Proof. First, we establish [38, Lem. 4.1] in our case. Let $x_0$ be the point where $V_0$ achieves its maximum value. We may assume that $|p_0(x_0, 0)| = |V_0(x_0)| < \delta_0/2$; otherwise, we are in one of the two nontrapping cases. Put
$$\tilde\xi(x) = \operatorname{sgn}(x - x_0)\sqrt{V_0(x_0) - V_0(x)};$$
since $V_0''(x_0) < 0$, it is a smooth function. Then, define the functions $\varphi_\pm(x, \xi) = \xi \mp \tilde\xi(x)$. We have
$$H_{p_0}\varphi_\pm(x, \xi) = \mp c(x, \xi)\varphi_\pm(x, \xi),$$
where $c(x, \xi) = 2\partial_x\tilde\xi(x)$ is greater than zero near the trapped point $(x_0, 0)$. Also, $\{\varphi_+, \varphi_-\} = c(x, \xi)$. Next, take $\tilde h > h$ and large $C_0 > 0$, let $\chi_0 \ge 0$ be supported in a small neighborhood of $(x_0, 0)$ with $\chi_0 = 1$ near this point, and define the modified escape function [38, (4.6)]
$$G_1(x, \xi) = -\chi_0(x, \xi)\log\frac{\varphi_-^2(x, \xi) + h/\tilde h}{\varphi_+^2(x, \xi) + h/\tilde h} + C_0\log(1/h)\,G(x, \xi).$$
Here $G$ is an escape function satisfying (7.11). We can write
$$H_{\operatorname{Re} p_{\gamma 0}} G_1 = -2\chi_0 c\left(\frac{\varphi_-^2}{\varphi_-^2 + h/\tilde h} + \frac{\varphi_+^2}{\varphi_+^2 + h/\tilde h}\right) - (H_{p_0}\chi_0)\log\frac{\varphi_-^2 + h/\tilde h}{\varphi_+^2 + h/\tilde h} + C_0\log(1/h)\,H_{\operatorname{Re} p_{\gamma 0}} G(x, \xi). \tag{7.15}$$
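The two bracket identities for $\varphi_\pm$ used above can be checked by hand. The following is only a sketch, under the assumption (suggested by $|p_0(x_0,0)| = |V_0(x_0)|$, but not restated in this section) that $p_0(x,\xi) = \xi^2 + V_0(x)$, and with the sign convention $H_p f = \partial_\xi p\,\partial_x f - \partial_x p\,\partial_\xi f$:

```latex
\begin{align*}
\tilde\xi(x)^2 &= V_0(x_0) - V_0(x)
  \;\Longrightarrow\; V_0'(x) = -2\,\tilde\xi(x)\,\tilde\xi'(x),\\
H_{p_0}\varphi_\pm
  &= 2\xi\,\partial_x\varphi_\pm - V_0'(x)\,\partial_\xi\varphi_\pm
   = \mp 2\xi\,\tilde\xi'(x) + 2\,\tilde\xi(x)\,\tilde\xi'(x)
   = \mp 2\,\tilde\xi'(x)\,\bigl(\xi \mp \tilde\xi(x)\bigr)
   = \mp c\,\varphi_\pm,\\
\{\varphi_+,\varphi_-\}
  &= \partial_\xi\varphi_+\,\partial_x\varphi_- - \partial_x\varphi_+\,\partial_\xi\varphi_-
   = \tilde\xi'(x) + \tilde\xi'(x) = c.
\end{align*}
```

In particular $H_{p_0}\varphi_-^2 = 2c\,\varphi_-^2$ and $H_{p_0}\varphi_+^2 = -2c\,\varphi_+^2$, which, under the same assumption, accounts for the first term on the right-hand side of (7.15).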
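Take $\chi_1$ supported in $\{|p_{\gamma 0}| < \delta_0\}$, but equal to 1 near $\{p_{\gamma 0} = 0\}$. Then one can use the uncertainty principle [38, Sect. 4.2] to show that if $\chi_2$ is supported inside $\{\chi_0 = 1\}$, but $\chi_2 = 1$ near $(x_0, 0)$, then for each $v \in L^2$,
$$\begin{aligned}((H_{\operatorname{Re} p_{\gamma 0}} G_1)^w\chi_1^w v, \chi_1^w v) &\le (-C^{-1}\tilde h + O(\tilde h^2))\|\chi_2^w v\|^2 + O(\log(1/h))\|(1 - \chi_2^w)\chi_1^w v\|^2\\ &\quad - C_0C^{-1}\log(1/h)\|(1 - \chi_2^w)\chi_1^w v\|^2 + O(C_0h\log(1/h))\|\chi_1^w v\|^2 + O(h^\infty)\|v\|^2\\ &\le -(C^{-1}\tilde h - O(\tilde h^2 + C_0h\log(1/h)))\|\chi_2^w v\|^2\\ &\quad - (C_0C^{-1}\log(1/h) - O(C_0h\log(1/h) + \log(1/h)))\|(1 - \chi_2^w)\chi_1^w v\|^2 + O(h^\infty)\|v\|^2.\end{aligned}$$
If we fix $C_0$ large enough and $\tilde h$ small enough and assume that $h$ is small enough, then
$$((H_{\operatorname{Re} p_{\gamma 0}} G_1)^w\chi_1^w v, \chi_1^w v) \le -C^{-1}\log(1/h)\|(1 - \chi_2^w)\chi_1^w v\|^2 - C^{-1}\tilde h\|\chi_1^w v\|^2 + O(h^\infty)\|v\|^2.$$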
Next, we conjugate by exponential pseudodifferential weights. First of all, one can prove that $\|G_1^w\|_{L^2 \to L^2} \le C\log(1/h)$; therefore,
$$\|e^{sG_1^w}\|_{L^2 \to L^2} \le h^{-C|s|}.$$
Let $\chi$ be supported in $\{\chi_1 = 1\}$, but $\chi = 1$ near $\{p_{\gamma 0} = 0\}$, and
$$\tilde P_{\gamma,s} = e^{sG_1^w}\tilde P_\gamma e^{-sG_1^w},\qquad u_{\gamma,s} = e^{sG_1^w}\chi^w u_\gamma;$$
then [38, Sect. 4.3]
$$\tilde P_{\gamma,s} = \tilde P_\gamma + ish(H_{p_{\gamma 0}} G_1)^w + O(s^2h\tilde h + sh^{3/2}\tilde h^{3/2} + h^2).$$
Therefore, since $\operatorname{Im} p_{\gamma 0} = 0$ near $\operatorname{supp}\chi_2$,
$$\begin{aligned}\operatorname{Im}(\tilde P_{\gamma,s}\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) &= \operatorname{Im}(\tilde P_{\gamma 0}\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) + h\operatorname{Re}(V_1\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s})\\ &\quad + sh\operatorname{Re}((H_{\operatorname{Re} p_{\gamma 0}} G_1)^w\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) + O(s^2h\tilde h + sh^{3/2}\tilde h^{3/2} + h^2)\|\chi_1^w u_{\gamma,s}\|^2\\ &\le O(h)\|(1 - \chi_2^w)\chi_1^w u_{\gamma,s}\|^2 + h\|V_1\|_{L^\infty}\|\chi_1^w u_{\gamma,s}\|^2 - C^{-1}sh\log(1/h)\|(1 - \chi_2^w)\chi_1^w u_{\gamma,s}\|^2\\ &\quad - C^{-1}sh\tilde h\|\chi_1^w u_{\gamma,s}\|^2 + O(s^2h\tilde h + sh^{3/2}\tilde h^{3/2} + h^2)\|\chi_1^w u_{\gamma,s}\|^2 + O(h^\infty)\|u_\gamma\|^2.\end{aligned}$$
Here $\tilde P_{\gamma 0}$ is the principal part of $\tilde P_\gamma$, as before. If we choose $s$ small enough independently of $h$, then for small $h$,
$$\operatorname{Im}(\tilde P_{\gamma,s}\chi_1^w u_{\gamma,s}, \chi_1^w u_{\gamma,s}) \le -C_1sh\log(1/h)\|(1 - \chi_2^w)\chi_1^w u_{\gamma,s}\|^2 - h(C^{-1}s\tilde h - \|V_1\|_{L^\infty})\|\chi_1^w u_{\gamma,s}\|^2 + O(h^\infty)\|u_\gamma\|^2.$$
Now, $\|V_1\|_{L^\infty}$ can be made very small by choosing $\tilde\mu$ and $\nu$ small enough. Then, we get
$$\|\chi^w u_\gamma\|^2 \le Ch^{-1-Cs}\|u_\gamma\| \cdot \|\tilde P_\gamma\chi^w u_\gamma\| + O(h^\infty)\|u_\gamma\|^2.$$
By proceeding as in the end of Proposition 7.4, we get (7.14), provided that $s$ is small enough.

Acknowledgements. I would like to thank Maciej Zworski for suggesting the problem, lots of helpful advice, and encouragement, and Kiril Datchev, Mihai Tohaneanu, Daniel Tataru, and Tobias Schottdorf for some very helpful discussions. I am also grateful for partial support from NSF grant DMS-0654436. Finally, I am especially thankful to an anonymous referee for many suggestions to improve the manuscript.
References
1. Aguilar, J., Combes, J.M.: A class of analytic perturbations for one-body Schrödinger Hamiltonians. Commun. Math. Phys. 22, 269–279 (1971)
2. Andersson, L., Blue, P.: Hidden symmetries and decay for the wave equation on the Kerr spacetime. http://arxiv.org/abs/0908.2265v2 [math.AP], 2009
3. Bachelot, A.: Gravitational scattering of electromagnetic field by Schwarzschild black hole. Ann. Inst. H. Poincaré Phys. Théor. 54, 261–320 (1991)
4. Bachelot, A.: Scattering of electromagnetic field by de Sitter–Schwarzschild black hole. In: Non-linear hyperbolic equations and field theory. Pitman Res. Notes Math. Ser. 253, London: Pitman, 1992, pp. 23–35
5. Bachelot, A., Motet-Bachelot, A.: Les résonances d’un trou noir de Schwarzschild. Ann. Inst. H. Poincaré Phys. Théor. 59, 3–68 (1993)
6. Ben-Artzi, M., Devinatz, A.: Resolvent estimates for a sum of tensor products with applications to the spectral theory of differential operators. J. d’Analyse Math. 43, 215–250 (1983/4)
7. Berenger, J.-P.: A perfectly matched layer for the absorption of electromagnetic waves. J. Comput. Phys. 114, 185–200 (1994)
8. Berti, E., Cardoso, V., Starinets, A.: Quasinormal modes of black holes and black branes. Class. Quant. Grav. 26, 163001 (2009)
9. Blue, P., Sterbenz, J.: Uniform decay of local energy and the semi-linear wave equation on Schwarzschild space. Comm. Math. Phys. 268, 481–504 (2006)
10. Bony, J.-F., Fujie, S., Ramond, T., Zerzeri, M.: Spectral projection, residue of the scattering amplitude, and Schrödinger group expansion for barrier-top resonances. http://arxiv.org/abs/0908.3444v1 [math.AP], 2009
11. Bony, J.-F., Häfner, D.: Decay and non-decay of the local energy for the wave equation on the de Sitter-Schwarzschild metric. Comm. Math. Phys. 282, 697–719 (2008)
12. Carter, B.: Hamilton-Jacobi and Schrödinger separable solutions of Einstein’s equations. Comm. Math. Phys. 10, 280–310 (1968)
13. Dafermos, M., Rodnianski, I.: Lectures on black holes and linear waves. http://arxiv.org/abs/0811.0354v1 [gr-qc], 2008
14. Datchev, K.: Distribution of resonances for manifolds with hyperbolic ends. Doctoral dissertation, University of California, Berkeley, 2010, http://math.berkeley.edu/~datchev/main.pdf
15. Donninger, R., Schlag, W., Soffer, A.: A proof of Price’s Law on Schwarzschild black hole manifolds for all angular momenta. Adv. Math. 226, 484–540 (2011)
16. Donninger, R., Schlag, W., Soffer, A.: On pointwise decay of linear waves on a Schwarzschild black hole background. http://arxiv.org/abs/0911.3179v1 [math.AP], 2009
17. Evans, L.C., Zworski, M.: Semiclassical analysis. Lecture notes, version 0.8, http://math.berkeley.edu/~zworski/semiclassical.pdf
18. Gérard, C., Sjöstrand, J.: Semiclassical resonances generated by a closed trajectory of hyperbolic type. Comm. Math. Phys. 108, 391–421 (1987)
19. Finster, F., Kamran, N., Smoller, J., Yau, S.-T.: Decay of solutions of the wave equation in the Kerr geometry. Comm. Math. Phys. 264, 465–503 (2006)
20. Finster, F., Kamran, N., Smoller, J., Yau, S.-T.: Erratum: “Decay of solutions of the wave equation in the Kerr geometry”. Comm. Math. Phys. 280, 563–573 (2008)
21. Guillarmou, C.: Meromorphic properties of the resolvent on asymptotically hyperbolic manifolds. Duke Math. J. 129, 1–37 (2005)
22. Hörmander, L.: The Analysis of Linear Partial Differential Operators III. Berlin-Heidelberg-New York: Springer, 1994
23. Kofinti, N.K.: Scattering of a Klein–Gordon particle by a black hole. Internat. J. Theoret. Phys. 23, 991–999 (1984)
24. Kokkotas, K.D., Schmidt, B.G.: Quasi-normal modes of stars and black holes. Living Rev. Relativity 2 (1999), http://www.livingreviews.org/lrr-1999-2, 1999
25. Konoplya, R.A., Zhidenko, A.: High overtones of Schwarzschild-de Sitter quasinormal spectrum. JHEP 0406, 037 (2004)
26. Konoplya, R.A., Zhidenko, A.: Decay of a charged scalar and Dirac fields in the Kerr-Newman-de Sitter background. Phys. Rev. D 76, 084018 (2007)
27. Martinez, A.: Resonance free domains for non globally analytic potentials. Ann. Inst. H. Poincaré 3, 739–756 (2002)
28. Mazzeo, R.R., Melrose, R.B.: Meromorphic extension of the resolvent on complete spaces with asymptotically constant negative curvature. J. Funct. Anal. 75, 260–310 (1987)
29. Mazzeo, R.R., Vasy, A.: Resolvents and Martin boundaries of product spaces. Geom. Funct. Anal. 12, 1018–1079 (2002)
30. Melrose, R.B., Sá Barreto, A., Vasy, A.: Asymptotics of solutions of the wave equation on de Sitter–Schwarzschild space. http://arxiv.org/abs/0811.2229v1 [math.AP], 2008
31. Sá Barreto, A., Zworski, M.: Distribution of resonances for spherical black holes. Math. Res. Lett. 4, 103–121 (1997)
32. Sjöstrand, J., Zworski, M.: Complex scaling and the distribution of scattering poles. J. Amer. Math. Soc. 4, 729–769 (1991)
33. Tataru, D., Tohaneanu, M.: Local energy estimate on Kerr black hole backgrounds. http://arxiv.org/abs/0810.5766v2 [math.AP], 2008
34. Tataru, D.: Local decay of waves on asymptotically flat stationary space-times. http://arxiv.org/abs/0910.5290v2 [math.AP], 2010
35. Taylor, M.: Partial Differential Equations I. Basic Theory. Berlin-Heidelberg-New York: Springer, 1996
36. Teukolsky, S.A.: Rotating black holes: separable wave equations for gravitational and electromagnetic perturbations. Phys. Rev. Lett. 29, 1114–1118 (1972)
37. Tohaneanu, M.: Strichartz estimates on Kerr black hole backgrounds. http://arxiv.org/abs/0910.1545v1 [math.AP], 2009
38. Wunsch, J., Zworski, M.: Resolvent estimates for normally hyperbolic trapped sets. http://arxiv.org/abs/1003.4640v2 [math.AP], 2010

Communicated by P.T. Chruściel
Commun. Math. Phys. 306, 165–186 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1282-1
Communications in
Mathematical Physics
Min- and Max-Entropy in Infinite Dimensions Fabian Furrer1 , Johan Åberg2 , Renato Renner2 1 Institute for Theoretical Physics, Leibniz Universität Hannover, 30167 Hannover, Germany.
E-mail:
[email protected]
2 Institute for Theoretical Physics, ETH Zurich, 8093 Zurich, Switzerland.
E-mail:
[email protected];
[email protected] Received: 10 June 2010 / Accepted: 1 February 2011 Published online: 25 June 2011 – © The Author(s) 2011. This article is published with open access at Springerlink.com
Abstract: We consider an extension of the conditional min- and max-entropies to infinite-dimensional separable Hilbert spaces. We show that these satisfy characterizing properties known from the finite-dimensional case, and retain information-theoretic operational interpretations, e.g., the min-entropy as maximum achievable quantum correlation, and the max-entropy as decoupling accuracy. We furthermore generalize the smoothed versions of these entropies and prove an infinite-dimensional quantum asymptotic equipartition property. To facilitate these generalizations we show that the min- and max-entropy can be expressed in terms of convergent sequences of finite-dimensional min- and max-entropies, which provides a convenient technique to extend proofs from the finite to the infinite-dimensional setting. 1. Introduction Entropy measures are fundamental to information theory. For example, in classical information theory a central role is played by the Shannon entropy [1] and in quantum information theory by the von Neumann entropy. Their usefulness partially stems from the fact that they have several convenient mathematical properties (e.g. strong subadditivity) that facilitate a ‘calculus’ of information and uncertainty. Indeed, entropy measures can even be characterized axiomatically in terms of such properties [2]. However, equally important for their use in information theory is the fact that they are related to operational quantities. This means that they characterize the optimal efficiency by which various information-theoretic tasks can be solved. One example of such a task is source coding, where one considers a source that randomly outputs data according to some given probability distribution. The question of interest is how much memory is needed in order to store and faithfully regenerate the data. Another example is channel coding, where the aim is to reliably transmit information over a channel. 
Here we ask how many bits (or qubits in the quantum case) one can optimally transmit per use of the channel [1,3,4].
The operational relevance of Shannon and von Neumann entropy is normally limited to the case when one considers the asymptotic limit over infinitely many instances of a random experiment, which are independent and identically distributed (iid) or can be described by a Markov process. In the case of source coding this corresponds to assuming an iid repetition of the source. In the limit of infinitely many such repetitions, the average number of bits one needs to store per output is given by the Shannon entropy of the distribution of the source [1]. In the general case, where we have more complicated types of correlations, or where we only consider finite instances, the role of the Shannon or von Neumann entropies appears to be taken over by other measures of entropy, referred to as the smooth min- and max-entropies [5]. For example, in [6,7] it was found that the smooth max-entropy characterizes one-shot data compression, i.e., when we wish to compress a single output of an information source. Furthermore, in [8] it was proved that in one single use of a classical channel, the transmission can be characterized by the difference between a smooth min- and max-entropy. The von Neumann entropy of a state can be regained via the quantum asymptotic equipartition property (AEP) [5,9], by applying these measures to asymptotically many iid repetitions of the state. This allows us to derive properties of the von Neumann entropy from the smooth min- and max-entropies, a technique that has been used for an alternative proof of the quantum reverse Shannon theorem [10], and to derive an entropic uncertainty relation [11]. The min- and max-entropies furthermore generalize the spectral entropy rates [12] (that are defined in an asymptotic sense) which themselves have been introduced as generalizations of the Shannon entropy [13,14]. 
Closely related quantities are the relative min- and max-entropies [15], which have been applied to entanglement theory [16,17] as well as channel capacity [18].
So far, the investigations of the operational relevance and properties of the min- and max-entropy and their smoothed versions have been almost exclusively focused on quantum systems with finite-dimensional Hilbert spaces. Here we consider the min- and max-entropy in infinite-dimensional separable Hilbert spaces. Since the modeling in vast parts of quantum physics is firmly rooted in infinite-dimensional Hilbert spaces, it appears that such a generalization is crucial for the application of these tools. For example, it has recently been shown that the smooth min- and max-entropies are the relevant measures of entropy in certain statistical mechanics settings [19,20]. An extension of these ideas to, e.g., quantized classical systems, would require an infinite-dimensional version of the min- and max-entropy. Another example is quantum key distribution (QKD), where in the finite-dimensional case the smooth min-entropy bounds the length of the secure key that can be extracted from an initial raw key [5]. The generalization to infinite dimensions therefore has direct relevance for continuous variable QKD (for references see, e.g., Sect. II.D.3 of [21]). In such a scheme one uses the quadratures of the electromagnetic field to establish a secret key (as opposed to other schemes that use, e.g., the polarization degree of freedom of single photons). Since such QKD methods are based on the generation of coherent states and measurement of quadratures, it appears rather unavoidable to use infinite-dimensional Hilbert spaces to model the states of the field modes.
Beyond the obvious application to continuous variable quantum key distribution, one can argue that there are several quantum cryptographic tasks that today are analyzed in finite-dimensional settings, which strictly speaking would require an analysis in infinite-dimensions, since there is in general no reason to assume the Hilbert spaces of the adversary’s systems to be finite. As indicated by the above discussion, an extension of the min- and max-entropies to an infinite-dimensional setting does not only require that we can reproduce known mathematical properties of these measures, but also that we should retain their operational
interpretations. A complete study of this two-fold goal would bring us far beyond the scope of this work. However, here we pave the way for this development by introducing an infinite-dimensional generalization of the min- and max-entropy, and demonstrating a collection of ‘core’ properties and operational interpretations. In particular, we derive (under conditions detailed below) a quantum AEP for a specific choice of an infinite-dimensional conditional von Neumann entropy. On a more practical level we introduce a technique that facilitates the extension of results proved for the finite-dimensional case to the setting of separable Hilbert spaces. More precisely, we show that the conditional min- and max-entropies for infinite-dimensional states can be expressed as limits of entropies obtained by finite-dimensional truncations of the original state (Proposition 1). This turns out to be a convenient tool for generalizations, and we illustrate this on the various infinite-dimensional extensions that we consider.
The $\varepsilon$-smoothed min- and max-entropies are defined in terms of the ‘un-smoothed’ ($\varepsilon = 0$) min- and max-entropies (which we simply refer to as ‘min- and max-entropy’). In Sect. 2.1 we extend these ‘plain’ min- and max-entropies to separable Hilbert spaces. Section 2.2 contains the main technical tool, Proposition 1, by which the infinite-dimensional min- and max-entropies can be expressed as limits of sequences of finite-dimensional entropies. The proof of Proposition 1 is given in Appendix B. In Sect. 3 we consider properties of the min- and max-entropy, e.g., additivity and the data processing inequality. Section 4 focuses on the generalization of operational interpretations. In Sect. 5 we consider the extension of the $\varepsilon$-smooth min- and max-entropies, for $\varepsilon > 0$. In Sect. 6 we bound the smooth min- and max-entropy of an iid state on a system A conditioned on a system B in terms of the conditional von Neumann entropy (Proposition 8).
This result relies on the assumption that A has finite von Neumann entropy. If A furthermore has a finite-dimensional Hilbert space (but the Hilbert space of B is allowed to be separable) we prove that these smooth entropies converge to the conditional von Neumann entropy (Corollary 1), which corresponds to a quantum AEP. The paper ends with a short summary and outlook in Sect. 7.

2. Min- and Max-Entropy

2.1. Definition of the conditional min- and max-entropy. Associated to each quantum system is a Hilbert space $\mathcal{H}$, which we assume to be separable in all that follows. We denote the bounded operators by $L(\mathcal{H}) = \{A : \mathcal{H} \to \mathcal{H} \mid \|A\| < \infty\}$, where $\|A\| = \sup_{\|\psi\| = 1} \|A|\psi\rangle\|$ is the standard operator norm. Among these, the trace class operators satisfy the additional feature of having a finite trace norm $\|T\|_1 := \operatorname{tr}|T| = \operatorname{tr}\sqrt{T^\dagger T}$. The set of trace class operators is denoted by $\tau_1(\mathcal{H}) := \{T \in L(\mathcal{H}) \mid \|T\|_1 < \infty\}$. We consider states which can be represented as density operators, i.e., normal states [22], and denote the set of all these states as $S(\mathcal{H}) := \{\rho \in \tau_1(\mathcal{H}) \mid \rho \ge 0, \|\rho\|_1 = 1\}$. It is often convenient to allow non-normalized density operators, which form the positive cone $\tau_1^+(\mathcal{H}) \subset \tau_1(\mathcal{H})$ consisting of all non-negative trace class operators. We define the conditional min- and max-entropy of bipartite quantum systems analogously to the finite-dimensional case [23].$^1$

Definition 1. Let $\mathcal{H}_A$ and $\mathcal{H}_B$ be separable Hilbert spaces and $\rho_{AB} \in \tau_1^+(\mathcal{H}_A \otimes \mathcal{H}_B)$. The min-entropy of $\rho_{AB}$ conditioned on $\sigma_B \in \tau_1^+(\mathcal{H}_B)$ is defined by
$$H_{\min}(\rho_{AB}|\sigma_B) := -\log\inf\{\lambda \in \mathbb{R} \mid \lambda\,\mathrm{id}_A \otimes \sigma_B \ge \rho_{AB}\}, \tag{1}$$
1 Max-entropy as we define it in Eq. (3) is related to the Rényi 1/2-entropy (see Sect. 3.2 or [23,24]). In
the original definition [5] max-entropy was defined in terms of the Rényi 0-entropy.
where we let $H_{\min}(\rho_{AB}|\sigma_B) := -\infty$ if the condition $\lambda\,\mathrm{id}_A \otimes \sigma_B \ge \rho_{AB}$ cannot be satisfied for any $\lambda \in \mathbb{R}$. Moreover, we define the min-entropy of $\rho_{AB}$ conditioned on B by
$$H_{\min}(\rho_{AB}|B) := \sup_{\sigma_B \in S(\mathcal{H}_B)} H_{\min}(\rho_{AB}|\sigma_B). \tag{2}$$
The max-entropy of $\rho_{AB}$ conditioned on B is defined as the dual of the min-entropy
$$H_{\max}(\rho_{AB}|B) := -H_{\min}(\rho_{AC}|C), \tag{3}$$
where $\rho_{ABC}$ is a purification of $\rho_{AB}$.

In the definition above, and in all that follows, we let “log” denote the binary logarithm. The reduction of a state to a subsystem is indicated by the labels of the Hilbert space, e.g., $\rho_A = \operatorname{tr}_C \rho_{AC}$. Note that the max-entropy $H_{\max}(\rho_{AB}|B)$ as defined in (3) is independent of the choice of the purification $\rho_{ABC}$, and thus well-defined. This follows from the fact that two purifications can only differ by a partial isometry on the purifying system, and the min-entropy $H_{\min}(\rho_{AC}|C)$ is invariant under these partial isometries on subsystem C. The two optimizations in the definition of $H_{\min}(\rho_{AB}|B)$, in Eqs. (1) and (2), can be combined into
$$H_{\min}(\rho_{AB}|B) = -\log\inf\{\operatorname{tr}\tilde\sigma_B \mid \tilde\sigma_B \in \tau_1^+(\mathcal{H}_B),\ \mathrm{id}_A \otimes \tilde\sigma_B \ge \rho_{AB}\}. \tag{4}$$
For convenience we introduce the following two quantities:
$$\Lambda(\rho_{AB}|\sigma_B) := 2^{-H_{\min}(\rho_{AB}|\sigma_B)} = \inf\{\lambda \in \mathbb{R} \mid \lambda\,\mathrm{id}_A \otimes \sigma_B \ge \rho_{AB}\}, \tag{5}$$
$$\Lambda(\rho_{AB}|B) := 2^{-H_{\min}(\rho_{AB}|B)} = \inf\{\operatorname{tr}\tilde\sigma_B \mid \tilde\sigma_B \in \tau_1^+(\mathcal{H}_B),\ \mathrm{id}_A \otimes \tilde\sigma_B \ge \rho_{AB}\}. \tag{6}$$

2.2. Finite-dimensional approximations of min- and max-entropies. In this section we present the main result, Proposition 1, that provides a method to express the conditional min- and max-entropy as a limit of min- and max-entropies of finite-dimensional systems. The rough idea is to choose sequences $\{P_k^A\}_{k=1}^\infty$ and $\{P_k^B\}_{k=1}^\infty$ of projectors$^2$ onto finite-dimensional subspaces $U_k^A \subset \mathcal{H}_A$ and $U_k^B \subset \mathcal{H}_B$, respectively, both converging to the identity. Then we define a sequence of non-normalized states as $\rho_{AB}^k = (P_k^A \otimes P_k^B)\rho_{AB}(P_k^A \otimes P_k^B)$. The min- or max-entropy of $\rho_{AB}^k$ can now be treated as if the underlying Hilbert space were $U_k^A \otimes U_k^B$ (Lemma 8), and therefore finite-dimensional. Proposition 1 shows that, as $k \to \infty$, these finite-dimensional entropies approach the desired infinite-dimensional entropy. As we will see, this provides a convenient method to extend properties from the finite to the infinite setting.
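The truncation idea can be illustrated numerically. The following is a minimal sketch, not part of the original argument: it assumes a pure state with geometric Schmidt coefficients, uses a 30-level cut as a stand-in for the infinite-dimensional state, takes $\sigma_B \propto \sqrt{\rho_B}$, and aligns the projectors with the Schmidt basis. The helper names `hmin_given_sigma` and `truncated_hmin` are hypothetical. For full-rank $\sigma_B$ the infimum in Eq. (1) is the largest eigenvalue of $(\mathrm{id}_A \otimes \sigma_B^{-1/2})\rho_{AB}(\mathrm{id}_A \otimes \sigma_B^{-1/2})$.

```python
import numpy as np

def hmin_given_sigma(rho_ab, sigma_b, d_a):
    """H_min(rho_AB | sigma_B) in finite dimensions, for full-rank sigma_B.

    The smallest lam with lam * (id_A (x) sigma_B) >= rho_AB equals the largest
    eigenvalue of (id (x) sigma^{-1/2}) rho (id (x) sigma^{-1/2}).
    """
    w, v = np.linalg.eigh(sigma_b)
    inv_sqrt = (v / np.sqrt(w)) @ v.conj().T
    m = np.kron(np.eye(d_a), inv_sqrt)
    lam = np.linalg.eigvalsh(m @ rho_ab @ m.conj().T).max()
    return -np.log2(lam)

def truncated_hmin(r, k):
    """H_min(rho^k | sigma^k) for |psi> = sum_j r_j |j>|j>, cut to k levels."""
    psi = np.zeros(k * k)
    psi[np.arange(k) * (k + 1)] = r[:k]   # amplitudes on |j>|j>, j < k
    rho_k = np.outer(psi, psi)            # non-normalized projected state
    sigma_k = np.diag(r[:k] / r.sum())    # P_k sigma_B P_k with sigma_B ~ sqrt(rho_B)
    return hmin_given_sigma(rho_k, sigma_k, k)

# Assumed example: geometric Schmidt coefficients, D levels as the "full" space.
q, D = 0.5, 30
r = np.sqrt(1 - q**2) * q ** np.arange(D)
values = [truncated_hmin(r, k) for k in (1, 2, 4, 8, 16, D)]
for k, h in zip((1, 2, 4, 8, 16, D), values):
    print(k, h)
```

In this toy setting the truncated entropies decrease monotonically and settle at $-2\log\sum_j r_j$, which is the kind of convergence Proposition 1 asserts in general.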
When we say that an operator sequence $Q_k$ converges to $Q$ in the weak operator topology we mean that $\lim_{k\to\infty} \langle\chi|Q - Q_k|\psi\rangle = 0$ for all $|\chi\rangle, |\psi\rangle \in \mathcal{H}$. The sequence converges in the strong operator topology if $\lim_{k\to\infty} \|(Q - Q_k)|\psi\rangle\| = 0$ for all $|\psi\rangle \in \mathcal{H}$.

Definition 2. Let $\{P_k^A\}_{k\in\mathbb{N}} \subset L(\mathcal{H}_A)$, $\{P_k^B\}_{k\in\mathbb{N}} \subset L(\mathcal{H}_B)$ be sequences of projectors such that for each $k \in \mathbb{N}$ the projection spaces $U_k^A \subset \mathcal{H}_A$, $U_k^B \subset \mathcal{H}_B$ of $P_k^A$, $P_k^B$ are finite-dimensional, $P_k^A \le P_{k'}^A$ and $P_k^B \le P_{k'}^B$ for all $k \le k'$, and $P_k^A$, $P_k^B$ converge in the weak operator topology to the identity. We refer to such a sequence $(P_k^A, P_k^B)$ as a generator of projected states. For $\rho_{AB} \in S(\mathcal{H}_A \otimes \mathcal{H}_B)$ we define the (non-normalized) states
$$\rho_{AB}^k := (P_k^A \otimes P_k^B)\rho_{AB}(P_k^A \otimes P_k^B), \tag{7}$$
which we call the projected states of $\rho_{AB}$ relative to $(P_k^A, P_k^B)$. Moreover, we refer to
$$\hat\rho_{AB}^k := \frac{\rho_{AB}^k}{\operatorname{tr}\rho_{AB}^k} \tag{8}$$
as the normalized projected states of $\rho_{AB}$ relative to $(P_k^A, P_k^B)$.

2 With “projector” we mean a bounded operator $P$ such that $P^2 = P$ and $P^\dagger = P$, which in the mathematics literature is usually referred to as an “orthogonal projector”.

Note that a sequence of projectors that converges in the weak operator topology to the identity also converges in the strong operator topology to the identity. As a matter of convenience, we can thus in all that follows regard the generators of projected states to converge in the strong operator topology. One may also note that the sequence of projected states $\rho_{AB}^k$ (as well as the normalized projected states $\hat\rho_{AB}^k$) converges to $\rho_{AB}$ in the trace norm (see Corollary 2 in Appendix A). The normalized projected states in Eq. (8) are of course only defined if $\operatorname{tr}\rho_{AB}^k \ne 0$. However, this is true for all sufficiently large $k$ due to the trace norm convergence to $\rho_{AB}$.

Proposition 1. For $\rho_{AB} \in S(\mathcal{H}_A \otimes \mathcal{H}_B)$, let $\{\rho_{AB}^k\}_{k\in\mathbb{N}}$ be the projected states of $\rho_{AB}$ relative to a generator $(P_k^A, P_k^B)$, and $\hat\rho_{AB}^k$ the corresponding normalized projected states. Furthermore, let $\sigma_B \in S(\mathcal{H}_B)$ and define the operators $\sigma_B^k := P_k^B\sigma_B P_k^B$ and $\hat\sigma_B^k := \operatorname{tr}(\sigma_B^k)^{-1}\sigma_B^k$. Then, the following three statements hold:
$$H_{\min}(\rho_{AB}|\sigma_B) = \lim_{k\to\infty} H_{\min}(\rho_{AB}^k|\sigma_B^k) = \lim_{k\to\infty} H_{\min}(\hat\rho_{AB}^k|\hat\sigma_B^k), \tag{9}$$
and the infimum in Eq. (1) is attained if $H_{\min}(\rho_{AB}|\sigma_B)$ is finite.
$$H_{\min}(\rho_{AB}|B) = \lim_{k\to\infty} H_{\min}(\rho_{AB}^k|B_k) = \lim_{k\to\infty} H_{\min}(\hat\rho_{AB}^k|B_k), \tag{10}$$
and the supremum in Eq. (2) is attained if $H_{\min}(\rho_{AB}|B)$ is finite.
$$H_{\max}(\rho_{AB}|B) = \lim_{k\to\infty} H_{\max}(\rho_{AB}^k|B_k) = \lim_{k\to\infty} H_{\max}(\hat\rho_{AB}^k|B_k). \tag{11}$$
Here, $B_k$ denotes the restriction of system B to the projection space $U_k^B$ of $P_k^B$.

The proof of this proposition is found in Appendix B. When we say that the infimum in (1) is attained, it means that there exists a finite $\lambda$ such that $\lambda\,\mathrm{id}_A \otimes \sigma_B - \rho_{AB} \ge 0$ and $H_{\min}(\rho_{AB}|\sigma_B) = -\log\lambda$. Similarly, that the supremum in (2) is attained means that there exists a $\sigma_B \in \tau_1^+(\mathcal{H}_B)$ satisfying $\mathrm{id} \otimes \sigma_B \ge \rho_{AB}$ such that $H_{\min}(\rho_{AB}|B) = H_{\min}(\rho_{AB}|\sigma_B)$.

Given the above proposition, a natural question is if $H_{\min}(\rho_{AB}|B)$ and $H_{\max}(\rho_{AB}|B)$ are trace norm continuous in general. In the finite-dimensional case [24] it is known that these entropies are continuous with a Lipschitz constant depending on the dimension of $\mathcal{H}_A$. However, the following example shows that they are in general not continuous in
the infinite-dimensional case. Let $\{|k\rangle\}_{k=0,1,\ldots}$ be an arbitrary orthonormal basis of the Hilbert space $\mathcal{H}_A$. For each $n = 1, 2, \ldots$ let
$$\rho_n = \Bigl(1 - \frac{1}{n}\Bigr)|0\rangle\langle 0| + \frac{1}{n^2}\sum_{k=1}^{n}|k\rangle\langle k|. \tag{12}$$
One can see that $\rho_n$ converges in the trace norm to $|0\rangle\langle 0|$ as $n \to \infty$, while $\lim_{n\to\infty} H_{\max}(\rho_n) = 2$, and $H_{\max}(|0\rangle\langle 0|) = 0$. Hence, the max-entropy is not continuous. ($H_{\max}(\rho)$ without conditioning means that we condition on a trivial subsystem B. See Eq. (19).) The duality, Eq. (3), yields an example also for the min-entropy.
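The discontinuity can also be seen numerically. The sketch below is an illustration only: it works directly with the eigenvalues of $\rho_n$ from Eq. (12), uses the formula $H_{\max}(\rho) = 2\log\operatorname{tr}\sqrt{\rho}$ of Eq. (19), and the function names are hypothetical.

```python
import numpy as np

def hmax(eigs):
    """Unconditional max-entropy 2*log2(tr sqrt(rho)) from the eigenvalues of rho."""
    return 2 * np.log2(np.sqrt(eigs).sum())

def rho_n_eigs(n):
    """Eigenvalues of rho_n: weight 1 - 1/n on |0>, and 1/n^2 on each of |1>, ..., |n>."""
    return np.concatenate(([1 - 1 / n], np.full(n, 1 / n**2)))

def trace_dist_to_pure(n):
    """|| rho_n - |0><0| ||_1; both operators are diagonal in the same basis."""
    e = rho_n_eigs(n)
    return abs(e[0] - 1) + e[1:].sum()

for n in (1, 10, 100, 10**4):
    print(n, trace_dist_to_pure(n), hmax(rho_n_eigs(n)))
```

The trace distance to $|0\rangle\langle 0|$ equals $2/n$ and tends to zero, while the max-entropy $2\log(\sqrt{1 - 1/n} + 1)$ tends to 2 rather than to $H_{\max}(|0\rangle\langle 0|) = 0$.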
3. Properties of Min- and Max-Entropy

3.1. Additivity and the data processing inequality. Proposition 1 can be used as a tool to generalize known finite-dimensional results to the infinite-dimensional case. A simple example is the ordering property [9]
$$H_{\min}(\rho_{AB}|B) \le H_{\max}(\rho_{AB}|B), \tag{13}$$
which is obtained by a direct application of Proposition 1. Another example is the additivity, which in the finite-dimensional case was proved in [5]. A direct generalization of the proof techniques employed there appears rather challenging, while Proposition 1 makes the generalization straightforward.

Proposition 2. Let $\rho_{AB} \in S(\mathcal{H}_A \otimes \mathcal{H}_B)$ and $\rho_{A'B'} \in S(\mathcal{H}_{A'} \otimes \mathcal{H}_{B'})$ for $\mathcal{H}_A$, $\mathcal{H}_{A'}$, $\mathcal{H}_B$, and $\mathcal{H}_{B'}$ separable Hilbert spaces. Then, it follows that
$$H_{\min}(\rho_{AB} \otimes \rho_{A'B'}|BB') = H_{\min}(\rho_{AB}|B) + H_{\min}(\rho_{A'B'}|B'), \tag{14}$$
$$H_{\max}(\rho_{AB} \otimes \rho_{A'B'}|BB') = H_{\max}(\rho_{AB}|B) + H_{\max}(\rho_{A'B'}|B'). \tag{15}$$
The proof is a simple application of the approximation scheme in Proposition 1 combined with Lemma 6 and the finite-dimensional version of Proposition 2, and is therefore omitted.

For the sake of completeness we note that the data processing inequalities [5] also hold in the infinite-dimensional setting. In this case, however, there is no need to resort to Proposition 1, as the proof in [5] can be generalized directly.

Proposition 3. Let $\rho_{ABC} \in \tau_1^+(\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C)$ for separable Hilbert spaces $\mathcal{H}_A$, $\mathcal{H}_B$ and $\mathcal{H}_C$. Then, it follows that
$$H_{\min}(\rho_{ABC}|BC) \le H_{\min}(\rho_{AB}|B), \tag{16}$$
$$H_{\max}(\rho_{ABC}|BC) \le H_{\max}(\rho_{AB}|B). \tag{17}$$
The data processing inequalities can be regarded as the min- and max-entropy counterparts of the strong subadditivity of the von Neumann entropy (and are sometimes directly referred to as “strong subadditivity”). One reason for this is that the standard formulation of the strong subadditivity of von Neumann entropy [25–27], $H(\rho_{ABC}) + H(\rho_B) \le H(\rho_{AB}) + H(\rho_{BC})$, can be recast in the same form.
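The recasting can be spelled out as follows, assuming that all entropies involved are finite and writing $H(A|B) := H(\rho_{AB}) - H(\rho_B)$ for the conditional von Neumann entropy (a standard convention that is not introduced in the text above):

```latex
\begin{align*}
H(\rho_{ABC}) + H(\rho_B) \le H(\rho_{AB}) + H(\rho_{BC})
  &\iff H(\rho_{ABC}) - H(\rho_{BC}) \le H(\rho_{AB}) - H(\rho_B)\\
  &\iff H(A|BC) \le H(A|B).
\end{align*}
```

In this form the inequality has the same shape as (16) and (17): conditioning on the additional system C cannot increase the entropy of A.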
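3.2. Entropy of pure states, and a bound for general states. Here we briefly consider the fact that the min-entropy can take the value $-\infty$, and the max-entropy can take the value $+\infty$. For this purpose we discuss the special case of pure states, as well as the case of no conditioning (i.e., if there is no subsystem B). Based on this we obtain a general bound which says that the conditional min- and max-entropies of a state $\rho_{AB}$ are finite if the operator $\sqrt{\rho_A}$ is trace class. Moreover it turns out that the min-entropy cannot attain the value $+\infty$, while the max-entropy cannot attain $-\infty$.

Lemma 1. The min-entropy of $\rho_{AB} = |\psi\rangle\langle\psi|$, where $|\psi\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$, is given by
\[ H_{\min}(\rho_{AB}|B) = -2 \log \operatorname{tr} \sqrt{\rho_A}. \tag{18} \]

From this lemma we can conclude that $H_{\min}(\rho_{AB}|B)$ is finite if and only if $\sqrt{\rho_A}$ is trace class. Otherwise $H_{\min}(\rho_{AB}|B) = -\infty$. If the Schmidt decomposition [28] of $\psi$ is given by $\sum_{k=1}^{\infty} r_k |a_k\rangle|b_k\rangle$, we have $\operatorname{tr}\sqrt{\rho_A} = \sum_{k=1}^{\infty} r_k$, such that a finite Schmidt rank always implies that $H_{\min}(\rho_{AB}|B)$ is finite. Recall that the Schmidt coefficients characterize the entanglement of a pure state, and, roughly speaking, that the more uniformly the Schmidt coefficients are distributed the stronger is the entanglement (see for instance [28]). This suggests that pure states with $H_{\min}(\rho_{AB}|B) = -\infty$ are entangled in a rather strong sense.

Proof. Let $|\psi\rangle = \sum_{k=1}^{\infty} r_k |a_k\rangle|b_k\rangle$ be the Schmidt decomposition of $|\psi\rangle$, and $\tilde{\sigma}_B \in \tau_1^+(\mathcal{H}_B)$ any operator that satisfies $\mathrm{id}_A \otimes \tilde{\sigma}_B \ge \rho_{AB}$. For each $n \in \mathbb{N}$ define $|\chi_n\rangle = \sum_{k=1}^{n} |a_k\rangle|b_k\rangle$. It follows that
\[ \operatorname{tr} \tilde{\sigma}_B \ge \langle\chi_n| \mathrm{id}_A \otimes \tilde{\sigma}_B |\chi_n\rangle \ge \langle\chi_n|\rho_{AB}|\chi_n\rangle = \Big(\sum_{k=1}^{n} r_k\Big)^2, \]
and thus, by taking the infimum over all $\tilde{\sigma}_B$ with $\mathrm{id}_A \otimes \tilde{\sigma}_B \ge \rho_{AB}$, as well as the supremum over all $n$, we find $\Lambda(\rho_{AB}|B) \ge (\operatorname{tr}\sqrt{\rho_A})^2$. Especially, we see that if $\operatorname{tr}\sqrt{\rho_A} = +\infty$, then $\Lambda(\rho_{AB}|B) = +\infty$ (and thus $H_{\min}(\rho_{AB}|B) = -\infty$). In the following we assume that $\operatorname{tr}\sqrt{\rho_A} < +\infty$, i.e., $\sqrt{\rho_A} \in \tau_1^+(\mathcal{H}_A)$. We show that the lower bound $\Lambda(\rho_{AB}|B) \ge (\operatorname{tr}\sqrt{\rho_A})^2$ is attained, by proving that $\tilde{\sigma}_B := \operatorname{tr}(\sqrt{\rho_A}) \sqrt{\rho_B}$ satisfies $\mathrm{id}_A \otimes \tilde{\sigma}_B \ge \rho_{AB}$. By using the Schmidt decomposition of $\psi$ we compute for an arbitrary $\eta \in \mathcal{H}_A \otimes \mathcal{H}_B$,
\[ \langle\eta|(\mathrm{id} \otimes \tilde{\sigma}_B - \rho_{AB})|\eta\rangle = \operatorname{tr}(\sqrt{\rho_A}) \sum_{k,l=1}^{\infty} |c_{k,l}|^2 r_l - \Big|\sum_{k=1}^{\infty} c_{k,k} r_k\Big|^2 \ge \Big(\sum_{l=1}^{\infty} r_l\Big)\Big(\sum_{k=1}^{\infty} |c_{k,k}|^2 r_k\Big) - \Big|\sum_{k=1}^{\infty} c_{k,k} r_k\Big|^2 \ge 0, \]
where $c_{k,l} = (\langle a_k|\langle b_l|)|\eta\rangle$, and the last step follows from the Cauchy-Schwarz inequality. Hence, $\mathrm{id}_A \otimes \tilde{\sigma}_B - \rho_{AB}$ is positive and therefore $\operatorname{tr}(\tilde{\sigma}_B) \ge \Lambda(\rho_{AB}|B)$. Combined with $\Lambda(\rho_{AB}|B) \ge (\operatorname{tr}\sqrt{\rho_A})^2$, we find $H_{\min}(\rho_{AB}|B) = -\log \Lambda(\rho_{AB}|B) = -2 \log \operatorname{tr}\sqrt{\rho_A}$.

The duality (3) allows us to rewrite Lemma 1 by using the unconditional max-entropy. For every $\rho \in \mathcal{S}(\mathcal{H})$ this yields the quantum 1/2-Rényi entropy (cf. [23]),
\[ H_{\max}(\rho) = 2 \log \operatorname{tr}\sqrt{\rho} = H_{1/2}(\rho), \tag{19} \]
if $\sqrt{\rho}$ is trace-class. Otherwise $H_{\max}(\rho) = +\infty$.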
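As a numerical illustration of Lemma 1, the following minimal sketch evaluates Eq. (18) directly from Schmidt coefficients, assuming logarithms to base 2. The function names and the example family $r_k \propto 1/k$ are hypothetical choices made for illustration; they do not appear in the text. Since $\sum_k 1/k^2$ converges while $\sum_k 1/k$ diverges, the truncated values decrease without bound as the Schmidt rank grows, which is the mechanism behind $H_{\min}(\rho_{AB}|B) = -\infty$.

```python
import numpy as np

def pure_state_min_entropy(schmidt_coeffs):
    """H_min(AB|B) = -2 log2 tr sqrt(rho_A) for a pure state, Eq. (18).

    The coefficients r_k >= 0 are normalized here so that sum_k r_k**2 = 1,
    i.e. rho_A has eigenvalues r_k**2 and tr sqrt(rho_A) = sum_k r_k.
    """
    r = np.asarray(schmidt_coeffs, dtype=float)
    r = r / np.linalg.norm(r)
    return -2.0 * np.log2(r.sum())

def truncated_harmonic_coeffs(n):
    """Hypothetical example family: r_k proportional to 1/k for k = 1..n."""
    return 1.0 / np.arange(1, n + 1)

# Maximally entangled state of Schmidt rank 2: H_min = -log2(2).
h_bell = pure_state_min_entropy(np.ones(2))

# Truncations of the 1/k family at increasing Schmidt rank.
h_trunc = [pure_state_min_entropy(truncated_harmonic_coeffs(n))
           for n in (1, 10, 100, 1000, 10000)]
```

A product state (Schmidt rank 1) gives the value 0, and every finite truncation stays finite, in line with the remark that a finite Schmidt rank always implies a finite min-entropy.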
The unconditional min-entropy is obtained by conditioning on a trivial subsystem B. One can see that
\[ H_{\min}(\rho) = -\log \|\rho\|. \tag{20} \]
For a pure state $\rho_{AB} = |\psi\rangle\langle\psi| \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$, the max-entropy is given by
\[ H_{\max}(\rho_{AB}|B) = \log \|\rho_A\|. \tag{21} \]
To see this one can apply the duality (3) where we purify the pure state $\rho_{AB}$ with a trivial system C, and next use Eq. (20). By combining these facts with the data processing inequality, $H_{\min}(\rho_{ABC}|BC) \le H_{\min}(\rho_{AB}|B) \le H_{\min}(\rho_A)$ and $H_{\max}(\rho_{ABC}|BC) \le H_{\max}(\rho_{AB}|B) \le H_{\max}(\rho_A)$, for $\rho_{ABC}$ a purification of $\rho_{AB}$, we find the following bounds on the min- and max-entropy.

Proposition 4. For every state $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ it holds that
\[ -2 \log \operatorname{tr}\sqrt{\rho_A} \le H_{\min}(\rho_{AB}|B) \le -\log \|\rho_A\|, \tag{22} \]
\[ \log \|\rho_A\| \le H_{\max}(\rho_{AB}|B) \le 2 \log \operatorname{tr}\sqrt{\rho_A}. \tag{23} \]
Hence, $H_{\min}(\rho_{AB}|B)$ and $H_{\max}(\rho_{AB}|B)$ are finite if $\sqrt{\rho_A}$ is trace-class.
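The bounds of Proposition 4 depend only on the spectrum of the reduced state $\rho_A$, so in finite dimensions they are easy to tabulate. The sketch below is an illustration under stated assumptions: logarithms are taken to base 2, $\rho_A$ is supplied as a finite density matrix, and the function name `proposition4_bounds` is hypothetical.

```python
import numpy as np

def proposition4_bounds(rho_A):
    """Return the intervals of Eqs. (22) and (23) for a density matrix rho_A.

    Output: ((lower, upper) for H_min(AB|B), (lower, upper) for H_max(AB|B)),
    valid for every extension rho_AB of rho_A.
    """
    p = np.linalg.eigvalsh(rho_A)
    p = p[p > 1e-15]                          # drop numerical zeros
    log_norm = np.log2(p.max())               # log ||rho_A|| (operator norm)
    log_tr_sqrt = np.log2(np.sqrt(p).sum())   # log tr sqrt(rho_A)
    min_interval = (-2.0 * log_tr_sqrt, -log_norm)
    max_interval = (log_norm, 2.0 * log_tr_sqrt)
    return min_interval, max_interval

# A maximally mixed qubit on A: both entropies lie between -1 and 1 bit.
min_iv, max_iv = proposition4_bounds(np.eye(2) / 2)

# A pure state on A forces both intervals to collapse to the point 0.
pure_min_iv, pure_max_iv = proposition4_bounds(np.diag([1.0, 0.0]))
```

The lower end of the min-entropy interval is attained by a purification of $\rho_A$ (Lemma 1), and the upper end by the case of a trivial system B, Eq. (20).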
4. Operational Interpretations of Min- and Max-Entropy Min- and max-entropy can be regarded as answers to operational questions, i.e., they quantify the optimal solution to certain information-theoretic tasks. Max-entropy Hmax (ρ AB |B) answers the question of how distinguishable ρ AB is from states that are maximally mixed on A, while uncorrelated with B [23] (see also Definition 3 below). This is a useful concept, e.g., in quantum key distribution, where one ideally would have a maximally random key uncorrelated with the eavesdropper’s state. Thus, the above distinguishability quantifies how well this is achieved. Min-entropy Hmin (ρ AB |B) is related to the question of how close one can bring the state ρ AB to a maximally entangled state on the bipartite system AB, allowing only local quantum operations on the B system [23]. In the special case that A is classical (i.e., we have a classical-quantum state, see Eq. (31) below) one finds that Hmin (ρ AB |B) is related to the guessing probability, i.e., our best chance to correctly guess the value of the classical system A, given the quantum system B. In the following sections we show that these results can be generalized to the case that H B is infinite-dimensional. These generalizations are for instance crucial in cryptographic settings, where there is a priori no reason to expect an eavesdropper to be limited to a finite-dimensional Hilbert space, while it is reasonable to assume the key to be finite. The operational interpretations of the min- and max-entropy exhibit a direct dependence on the dimension of the A system, which is why a naive generalization to an infinite-dimensional A appears challenging, and will not be considered here.
4.1. Max-entropy as decoupling accuracy. To define decoupling accuracy we use the fidelity $F(\rho, \sigma) := \|\sqrt{\rho}\sqrt{\sigma}\|_1$ as a distance measure between states.
Definition 3. For a finite-dimensional Hilbert space $\mathcal{H}_A$ and an arbitrary separable Hilbert space $\mathcal{H}_B$, we define the decoupling accuracy of $\rho_{AB} \in \tau_1^+(\mathcal{H}_A \otimes \mathcal{H}_B)$ w.r.t. the system B as
\[ d(\rho_{AB}|B) := \sup_{\sigma_B \in \mathcal{S}(\mathcal{H}_B)} d_A\, F(\rho_{AB}, \tau_A \otimes \sigma_B)^2. \tag{24} \]
Here, $d_A$ is the dimension of $\mathcal{H}_A$, and $\tau_A := d_A^{-1}\, \mathrm{id}_A$ is the maximally mixed state on A. Note that in infinite-dimensional Hilbert spaces there is no trace class operator which can be regarded as a generalization of the maximally mixed state in finite dimensions. We must thus require system A to be finite-dimensional in order to keep the decoupling accuracy well-defined. In [23], Proposition 5 was proved in the case where $\mathcal{H}_B$ is assumed to be finite-dimensional. Below we use Proposition 1 to extend the assertion to the infinite-dimensional case.

Proposition 5. Let $\mathcal{H}_A$ be a finite-dimensional and $\mathcal{H}_B$ a separable Hilbert space. It follows that
\[ d(\rho_{AB}|B) = 2^{H_{\max}(\rho_{AB}|B)}, \tag{25} \]
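for each $\rho_{AB} \in \tau_1^+(\mathcal{H}_A \otimes \mathcal{H}_B)$.

In the following we will need to consider physical operations (channels) on states, i.e., trace preserving completely positive maps [29]. By TPCPM($\mathcal{H}_A, \mathcal{H}_B$) we denote the set of all trace preserving completely positive maps $\mathcal{E} : \tau_1(\mathcal{H}_A) \to \tau_1(\mathcal{H}_B)$. Let $\mathcal{I}$ denote the identity map.

Proof. Let us take projected states $\rho^k_{AB}$ relative to a generator of the form $(\mathrm{id}_A, P_k^B)$ (this is a proper generator since $\dim \mathcal{H}_A < \infty$). Denote the space onto which $P_k^B$ projects by $U_k^B$ and set $P_k := \mathrm{id}_A \otimes P_k^B$. The finite-dimensional version of Proposition 5 together with Proposition 1 yield $d(\rho^k_{AB}|B_k) = 2^{H_{\max}(\rho^k_{AB}|B_k)} \to 2^{H_{\max}(\rho_{AB}|B)}$, as $k \to \infty$. In order to prove $d(\rho_{AB}|B) \le 2^{H_{\max}(\rho_{AB}|B)}$ we construct a suitable TPCPM and use the fact that the fidelity can only increase under its action [30]. For each $k \in \mathbb{N}$ choose a normalized state $|\theta_k\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B$ such that $P_k|\theta_k\rangle = 0$. We define a channel $\mathcal{E}_k \in$ TPCPM($\mathcal{H}_A \otimes \mathcal{H}_B, \mathcal{H}_A \otimes \mathcal{H}_B$) as $\mathcal{E}_k(\eta) = P_k \eta P_k + q_k(\eta)|\theta_k\rangle\langle\theta_k|$, with $q_k(\eta) := \operatorname{tr}[\eta(\mathrm{id} - P_k)]$. Then, for all $\sigma_B \in \mathcal{S}(\mathcal{H}_B)$ we find
\begin{align*}
F(\rho_{AB}, \tau_A \otimes \sigma_B) &\le F(\mathcal{E}_k(\rho_{AB}), \mathcal{E}_k(\tau_A \otimes \sigma_B)) \\
&= \Big\| \sqrt{\rho^k_{AB}} \sqrt{\tau_A \otimes \sigma^k_B} + \sqrt{q_k(\rho_{AB})\, q_k(\tau_A \otimes \sigma_B)}\, |\theta_k\rangle\langle\theta_k| \Big\|_1 \\
&\le \Big\| \sqrt{\rho^k_{AB}} \sqrt{\tau_A \otimes \sigma^k_B} \Big\|_1 + \sqrt{q_k(\rho_{AB})} = F(\rho^k_{AB}, \tau_A \otimes \sigma^k_B) + \sqrt{q_k(\rho_{AB})},
\end{align*}
where $\sigma^k_B := P_k^B \sigma_B P_k^B$. The second line is due to the fact that $|\theta_k\rangle$ is orthogonal to the support of both $\rho^k_{AB}$ and $\tau_A \otimes \sigma^k_B$. The last line follows by the triangle inequality and $q_k(\tau_A \otimes \sigma_B) \le 1$. By taking the supremum over all $\sigma_B \in \mathcal{S}(\mathcal{H}_B)$ we obtain
\[ \sqrt{d(\rho_{AB}|B)} \le \sqrt{d(\rho^k_{AB}|B_k)} + \sqrt{d_A \operatorname{tr}[\rho_{AB}(\mathrm{id} - P_k)]} \to 2^{\frac{1}{2} H_{\max}(\rho_{AB}|B)}, \]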
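as $k \to \infty$. It remains to show $d(\rho_{AB}|B) \ge 2^{H_{\max}(\rho_{AB}|B)}$. For this purpose we use that the fidelity can be reformulated as
\[ F(\rho, \sigma) = \sup_{|\phi\rangle} F(|\psi\rangle, |\phi\rangle), \tag{26} \]
where $|\psi\rangle$ is a purification of $\rho$, and the supremum is taken over all purifications $|\phi\rangle$ of $\sigma$ [31]. Let us fix an arbitrary $k \in \mathbb{N}$ and a $\sigma_B \in \mathcal{S}(\mathcal{H}_B)$. Assume $|\psi_{ABC}\rangle$ to be a purification of $\rho_{AB}$, and note that $|\psi^k_{ABC}\rangle := \tilde{P}_k |\psi_{ABC}\rangle$, with $\tilde{P}_k = P_k \otimes \mathrm{id}_C$, is a purification of $\rho^k_{AB}$. Let $|\phi\rangle \in \mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C$ be an arbitrary purification of $\tau_A \otimes \sigma_B$. According to (26) it follows that
\begin{align*}
F(\rho_{AB}, \tau_A \otimes \sigma_B) &\ge F(|\psi_{ABC}\rangle, |\phi\rangle) = |\langle\psi_{ABC}|\phi\rangle| \\
&= |\langle\psi_{ABC}| \tilde{P}_k |\phi\rangle + \langle\psi_{ABC}| (\mathrm{id} - \tilde{P}_k) |\phi\rangle| \\
&\ge |\langle\psi^k_{ABC}|\phi\rangle| - \|(\mathrm{id} - \tilde{P}_k)|\psi_{ABC}\rangle\|,
\end{align*}
where the last line is obtained by the reverse triangle inequality and the Cauchy-Schwarz inequality. By taking the supremum over all the purifications $|\phi\rangle$ of $\tau_A \otimes \sigma_B$ in the above inequality, Eq. (26) yields $F(\rho_{AB}, \tau_A \otimes \sigma_B) \ge F(\rho^k_{AB}, \tau_A \otimes \sigma_B) - \|(\mathrm{id} - \tilde{P}_k)|\psi_{ABC}\rangle\|$. As this holds for all $\sigma_B \in \mathcal{S}(\mathcal{H}_B)$ and all $k$, we obtain with the definition of the decoupling accuracy:
\[ d(\rho_{AB}|B) \ge \lim_{k\to\infty} \Big( \sqrt{d(\rho^k_{AB}|B_k)} - \sqrt{d_A}\, \|(\mathrm{id} - \tilde{P}_k)|\psi_{ABC}\rangle\| \Big)^2 = 2^{H_{\max}(\rho_{AB}|B)}. \]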
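Proposition 5 can be sanity-checked numerically in the simplest setting of a trivial system B, where the supremum in Eq. (24) disappears and $d(\rho_A) = d_A F(\rho_A, \tau_A)^2$ should coincide with $2^{H_{\max}(\rho_A)} = (\operatorname{tr}\sqrt{\rho_A})^2$ from Eq. (19). The sketch below is only an illustration of that special case; the helper names and the example matrix are assumptions, not part of the text.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)            # remove tiny negative round-off
    return (v * np.sqrt(w)) @ v.conj().T

def fidelity(rho, sigma):
    """F(rho, sigma) = || sqrt(rho) sqrt(sigma) ||_1 (sum of singular values)."""
    product = psd_sqrt(rho) @ psd_sqrt(sigma)
    return np.linalg.svd(product, compute_uv=False).sum()

def decoupling_accuracy_trivial_B(rho_A):
    """Eq. (24) specialised to a trivial B: d_A * F(rho_A, tau_A)**2."""
    d_A = rho_A.shape[0]
    tau_A = np.eye(d_A) / d_A            # maximally mixed state on A
    return d_A * fidelity(rho_A, tau_A) ** 2

# Hypothetical example state on a qubit (positive definite, unit trace).
rho_A = np.array([[0.7, 0.2],
                  [0.2, 0.3]])
lhs = decoupling_accuracy_trivial_B(rho_A)

# 2**H_max(rho_A) = (tr sqrt(rho_A))**2 by Eq. (19).
rhs = np.sqrt(np.linalg.eigvalsh(rho_A)).sum() ** 2
```

For the maximally mixed state the value is $d_A$, i.e. $H_{\max} = \log d_A$, while a pure state gives the value 1.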
4.2. Min-entropy as maximum achievable quantum correlation. Assume a bipartite quantum system consisting of a finite-dimensional A system and an arbitrary B system. We can then define a maximally entangled state between the A and B system as
\[ |\Phi_{AB}\rangle := \frac{1}{\sqrt{d_A}} \sum_{k=1}^{d_A} |a_k\rangle|b_k\rangle. \tag{27} \]
Here, $d_A$ denotes the dimension of $\mathcal{H}_A$, $\{a_k\}_{k=1}^{d_A}$ an arbitrary orthonormal basis of $\mathcal{H}_A$ and $\{b_k\}_{k=1}^{d_A}$ an arbitrary orthonormal system in $\mathcal{H}_B$, where we assume that $\dim(\mathcal{H}_A) \le \dim(\mathcal{H}_B)$.

Definition 4. For $\mathcal{H}_A$ a finite-dimensional and $\mathcal{H}_B$ a separable Hilbert space (with $\dim \mathcal{H}_A \le \dim \mathcal{H}_B$), we define the quantum correlation of a state $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ relative to B as
\[ q(\rho_{AB}|B) := \sup_{\mathcal{E}} d_A\, F\big((\mathcal{I}_A \otimes \mathcal{E})\rho_{AB}, |\Phi_{AB}\rangle\langle\Phi_{AB}|\big)^2, \tag{28} \]
where the supremum is taken over all $\mathcal{E}$ in TPCPM($\mathcal{H}_B, \mathcal{H}_B$), and $|\Phi_{AB}\rangle$ is given by (27).

Due to the invariance of the fidelity under unitaries [30], the definition of $q(\rho_{AB}|B)$ is independent of the choice of the maximally entangled state $|\Phi_{AB}\rangle$. The quantum correlation can be rewritten as
\[ q(\rho_{AB}|B) = \sup_{\mathcal{E}} d_A\, \langle\Phi_{AB}|(\mathcal{I}_A \otimes \mathcal{E})\rho_{AB}|\Phi_{AB}\rangle. \tag{29} \]
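The min-entropy is directly linked to the quantum correlation as shown in [23] for the finite-dimensional case. We extend this result to a B system with a separable Hilbert space.

Proposition 6. Let $\mathcal{H}_A$ be a finite-dimensional and $\mathcal{H}_B$ be a separable Hilbert space. It follows that
\[ q(\rho_{AB}|B) = 2^{-H_{\min}(\rho_{AB}|B)}, \tag{30} \]
for each $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$.

Proof. Let $\{\rho^k_{AB}\}_{k\in\mathbb{N}}$ be the projected states of $\rho_{AB}$ relative to a generator of the form $(\mathrm{id}_A, P_k^B)$, and set $P_k := \mathrm{id}_A \otimes P_k^B$. Let us denote the projection space of $P_k^B$ by $U_k^B$ and assume that $|b_l\rangle \in U_k^B$, $l = 1, \ldots, d_A$, for all $k$, with $|b_l\rangle$ as in Eq. (27). By the already proved finite-dimensional version of Proposition 6 and Proposition 1, we obtain $q(\rho^k_{AB}|B_k) = \Lambda(\rho^k_{AB}|B_k) \to \Lambda(\rho_{AB}|B)$.

We begin to prove $\Lambda(\rho_{AB}|B) \le q(\rho_{AB}|B)$. Fix $k$ and choose $\mathcal{E}_k \in$ TPCPM($U_k^B, U_k^B$) such that $q(\rho^k_{AB}|B_k) = d_A \langle\Phi_{AB}|(\mathcal{I}_A \otimes \mathcal{E}_k)\rho^k_{AB}|\Phi_{AB}\rangle$. Define $\tilde{\mathcal{E}}_k(\rho) = \mathcal{E}_k(P_k \rho P_k) + (\mathrm{id}_B - P_k^B)\rho(\mathrm{id}_B - P_k^B)$, which is a valid quantum operation in TPCPM($\mathcal{H}_B, \mathcal{H}_B$). As $\tilde{\mathcal{E}}_k$ is just one possible TPCPM, it follows that $q(\rho_{AB}|B) \ge d_A \langle\Phi_{AB}|(\mathcal{I}_A \otimes \tilde{\mathcal{E}}_k)\rho_{AB}|\Phi_{AB}\rangle \ge q(\rho^k_{AB}|B_k)$. We thus find $q(\rho_{AB}|B) \ge \lim_{k\to\infty} q(\rho^k_{AB}|B_k) = \Lambda(\rho_{AB}|B)$.

We next prove $\Lambda(\rho_{AB}|B) \ge q(\rho_{AB}|B)$. Let $\mathcal{E}$ be an arbitrary TPCPM($\mathcal{H}_B, \mathcal{H}_B$). As a special instance of Stinespring dilations we know that there exists an ancilla $\mathcal{H}_R$ together with a unitary $U_{BR} \in \mathcal{L}(\mathcal{H}_B \otimes \mathcal{H}_R)$ and a state $|\theta_R\rangle \in \mathcal{H}_R$, such that $\mathcal{E}(\sigma_B) = \operatorname{tr}_R[U_{BR}(\sigma_B \otimes |\theta_R\rangle\langle\theta_R|)U_{BR}^\dagger]$ [29]. With $|\psi_{ABC}\rangle$ a purification of $\rho_{AB}$, it follows according to (26) that
\begin{align*}
F\big((\mathcal{I}_A \otimes \mathcal{E})\rho_{AB}, |\Phi_{AB}\rangle\langle\Phi_{AB}|\big) &= \sup_{\eta_{CR}} F\big((\mathrm{id} \otimes U_{BR})|\psi_{ABC}\rangle|\theta_R\rangle, |\Phi_{AB}\rangle|\eta_{CR}\rangle\big) \\
&\le \sup_{\eta_{CR}} F\big(\rho_{AC}, \tau_A \otimes \operatorname{tr}_R(|\eta_{CR}\rangle\langle\eta_{CR}|)\big),
\end{align*}
where the last inequality is due to the monotonicity of fidelity under the partial trace and $\tau_A = d_A^{-1}\, \mathrm{id}_A = \operatorname{tr}_B(|\Phi_{AB}\rangle\langle\Phi_{AB}|)$. The optimization over all pure states $\eta_{CR}$ can be replaced by the optimization over all density operators on $\mathcal{H}_C$. Then, with Proposition 5 it follows that
\[ d_A F\big((\mathcal{I}_A \otimes \mathcal{E})\rho_{AB}, |\Phi_{AB}\rangle\langle\Phi_{AB}|\big)^2 \le \sup_{\sigma_C} d_A F(\rho_{AC}, \tau_A \otimes \sigma_C)^2 = 2^{H_{\max}(\rho_{AC}|C)} = 2^{-H_{\min}(\rho_{AB}|B)} = \Lambda(\rho_{AB}|B). \]
Since this holds for all $\mathcal{E} \in$ TPCPM($\mathcal{H}_B, \mathcal{H}_B$), we obtain $q(\rho_{AB}|B) \le \Lambda(\rho_{AB}|B)$.

The quantum correlation and its relation to min-entropy applied to classical-quantum states connects the min-entropy with the optimal guessing probability. Imagine a source that produces the quantum states $\rho_B^x \in \mathcal{S}(\mathcal{H}_B)$ at random, according to the probability distribution $P_X(x)$. The average output is characterized by the classical-quantum state,
\[ \rho_{XB} = \sum_{x \in X} P_X(x)\, |x\rangle\langle x| \otimes \rho_B^x, \tag{31} \]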
where $X$ denotes the (finite) alphabet of the classical system describing the source and $\{|x\rangle\}_{x\in X}$ is an orthonormal basis spanning $\mathcal{H}_X$. We define the guessing probability $g(\rho_{XB}|B)$ as the probability to correctly guess $x$, permitting an optimal measurement strategy on subsystem B. Formally, this can be expressed as
\[ g(\rho_{XB}|B) := \sup_{\{M_x\}} \sum_{x \in X} P_X(x) \operatorname{tr}(\rho_B^x M_x), \tag{32} \]
where the supremum is taken over all positive operator valued measures (POVM) on $\mathcal{H}_B$. By a POVM on $\mathcal{H}_B$ we mean a set $\{M_x\}_{x\in X}$ of positive operators which sum up to the identity. For finite-dimensional $\mathcal{H}_B$ it is known [23] that the guessing probability is linked to the min-entropy by
\[ g(\rho_{XB}|B) = 2^{-H_{\min}(\rho_{XB}|B)}. \tag{33} \]
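For a two-letter alphabet the supremum in Eq. (32) has a closed form, the Helstrom bound $g = \tfrac{1}{2}\big(1 + \|P_X(0)\rho_B^0 - P_X(1)\rho_B^1\|_1\big)$. This is a standard result that is not stated in the text above and is used here as an assumption. Combined with Eq. (33) it gives the min-entropy of a binary classical-quantum state directly. The following sketch, with hypothetical function names and finite-dimensional example states, illustrates the relation.

```python
import numpy as np

def binary_guessing_probability(p0, rho0, rho1):
    """Optimal probability of guessing x in {0, 1} from the quantum system B.

    Uses the Helstrom formula (assumed, not derived in the text):
    g = (1 + || p0*rho0 - p1*rho1 ||_1) / 2.
    """
    p1 = 1.0 - p0
    gamma = p0 * rho0 - p1 * rho1
    trace_norm = np.abs(np.linalg.eigvalsh(gamma)).sum()   # gamma is Hermitian
    return 0.5 * (1.0 + trace_norm)

def min_entropy_from_guessing(g):
    """H_min(X|B) in bits via Eq. (33): g = 2**(-H_min)."""
    return -np.log2(g)

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ket_plus = (ket0 + ket1) / np.sqrt(2)
proj = lambda v: np.outer(v, v.conj())

g_same = binary_guessing_probability(0.5, proj(ket0), proj(ket0))      # no information in B
g_orth = binary_guessing_probability(0.5, proj(ket0), proj(ket1))      # perfectly distinguishable
g_tilt = binary_guessing_probability(0.5, proj(ket0), proj(ket_plus))  # non-orthogonal pair
```

Identical states leave one bit of min-entropy, orthogonal states leave none, and the non-orthogonal pair lies strictly in between.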
We will now use Proposition 6 to show that Eq. (33) also holds for separable $\mathcal{H}_B$. Let $\rho_{XB}$ be a state as defined in Eq. (31), and construct the state $|\Phi_{XB}\rangle := |X|^{-1/2} \sum_{x\in X} |x\rangle|x_B\rangle$, where $\{|x_B\rangle\}_{x\in X}$ is an arbitrary orthonormal set in $\mathcal{H}_B$. We now define $Q(\rho_{XB}, \mathcal{E}) := d_A \langle\Phi_{XB}|(\mathcal{I}_X \otimes \mathcal{E})\rho_{XB}|\Phi_{XB}\rangle$ (cf. Eq. (29)) and $G(\rho_{XB}, \{M_x\}) := \sum_{x\in X} P_X(x) \operatorname{tr}(\rho_B^x M_x)$ (cf. Eq. (32)). Then,
\[ Q(\rho_{XB}, \mathcal{E}) = \sum_{x\in X} P_X(x) \operatorname{tr}[\mathcal{E}^*(|x_B\rangle\langle x_B|)\rho_B^x], \tag{34} \]
where $\mathcal{E}^*$ denotes the adjoint operation of $\mathcal{E}$. Let $\{M_x\}$ be an arbitrary $|X|$-element POVM on $\mathcal{H}_B$. One can see that the TPCPM $\mathcal{E}(\rho) := \sum_{x\in X} \operatorname{tr}(M_x \rho)|x_B\rangle\langle x_B|$ satisfies $\mathcal{E}^*(|x_B\rangle\langle x_B|) = M_x$. Thus, by Eq. (34), we find $Q(\rho_{XB}, \mathcal{E}) = G(\rho_{XB}, \{M_x\})$. Since the POVM was arbitrary, it follows that $q(\rho_{XB}|B) \ge g(\rho_{XB}|B)$. Next, let $\mathcal{E}$ be an arbitrary TPCPM on $\mathcal{H}_B$. Define $P = \sum_{x\in X} |x_B\rangle\langle x_B|$ and
\[ M_x = \mathcal{E}^*(|x_B\rangle\langle x_B|) + \frac{1}{|X|}\, \mathcal{E}^*(\mathrm{id}_B - P), \quad x \in X. \]
One can verify that $\{M_x\}$ is a POVM on $\mathcal{H}_B$. By using Eq. (34) we can see that $G(\rho_{XB}, \{M_x\}) \ge Q(\rho_{XB}, \mathcal{E})$. This implies $g(\rho_{XB}|B) \ge q(\rho_{XB}|B)$, and thus $g(\rho_{XB}|B) = q(\rho_{XB}|B)$.

5. Smooth Min- and Max-Entropy

The entropic quantities that usually appear in operational settings are the smooth min- and max-entropies [8,6,23]. They result from the non-smoothed versions by an optimization procedure over states close to the original state. The closeness is defined by an appropriate metric on the state space, and a smoothing parameter specifies the maximal distance to the original state. The choice of metric has varied in the literature, but here we follow [24]. By $\mathcal{S}_{\le}(\mathcal{H})$ we denote the set of positive trace class operators with trace norm smaller than or equal to 1. We define the generalized fidelity on $\mathcal{S}_{\le}(\mathcal{H})$ by $\bar{F}(\rho, \sigma) := \|\sqrt{\rho}\sqrt{\sigma}\|_1 + \sqrt{(1 - \operatorname{tr}\rho)(1 - \operatorname{tr}\sigma)}$, which induces a metric on $\mathcal{S}_{\le}(\mathcal{H})$ via
\[ P(\rho, \sigma) := \sqrt{1 - \bar{F}(\rho, \sigma)^2}, \tag{35} \]
referred to as the purified distance.
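The generalized fidelity and the purified distance of Eq. (35) are straightforward to evaluate for finite matrices. The sketch below is a minimal illustration, assuming the operators are given as positive semidefinite NumPy arrays with trace at most 1; the function names are hypothetical and small negative round-off is clipped to zero.

```python
import numpy as np

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)
    return (v * np.sqrt(w)) @ v.conj().T

def generalized_fidelity(rho, sigma):
    """F_bar = ||sqrt(rho) sqrt(sigma)||_1 + sqrt((1 - tr rho)(1 - tr sigma))."""
    overlap = np.linalg.svd(psd_sqrt(rho) @ psd_sqrt(sigma), compute_uv=False).sum()
    deficit = (1.0 - np.trace(rho).real) * (1.0 - np.trace(sigma).real)
    return overlap + np.sqrt(max(0.0, deficit))

def purified_distance(rho, sigma):
    """P(rho, sigma) = sqrt(1 - F_bar(rho, sigma)**2), Eq. (35)."""
    f_bar = generalized_fidelity(rho, sigma)
    return np.sqrt(max(0.0, 1.0 - f_bar ** 2))

rho = np.diag([1.0, 0.0])                 # normalized pure state
sigma_sub = np.diag([0.64, 0.0])          # same direction, sub-normalized
sigma_orth = np.diag([0.0, 1.0])          # orthogonal pure state

p_self = purified_distance(rho, rho)
p_sub = purified_distance(rho, sigma_sub)
p_orth = purified_distance(rho, sigma_orth)
```

The second term of the generalized fidelity only contributes when both arguments are sub-normalized; for two copies of the operator diag(0.5, 0) it restores the value 1, so their purified distance is 0.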
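Definition 5. For $\epsilon > 0$, we define the $\epsilon$-smooth min- and max-entropy of $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_A \otimes \mathcal{H}_B)$ conditioned on B as
\[ H^{\epsilon}_{\min}(\rho_{AB}|B) := \sup_{\tilde{\rho}_{AB} \in \mathcal{B}^{\epsilon}(\rho_{AB})} H_{\min}(\tilde{\rho}_{AB}|B), \tag{36} \]
\[ H^{\epsilon}_{\max}(\rho_{AB}|B) := \inf_{\tilde{\rho}_{AB} \in \mathcal{B}^{\epsilon}(\rho_{AB})} H_{\max}(\tilde{\rho}_{AB}|B), \tag{37} \]
where the smoothing set $\mathcal{B}^{\epsilon}(\rho_{AB})$ is defined with respect to the purified distance
\[ \mathcal{B}^{\epsilon}(\rho_{AB}) := \{\tilde{\rho}_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_A \otimes \mathcal{H}_B) \,|\, P(\rho_{AB}, \tilde{\rho}_{AB}) \le \epsilon\}. \tag{38} \]

Closely related to this particular choice of smoothing set is the invariance of the smooth entropies under (partial) isometries acting locally on each of the subsystems. This can be used to show the duality relation of the smooth entropies, namely, for all states $\rho_{AB}$ on $\mathcal{H}_A \otimes \mathcal{H}_B$ it follows that
\[ H^{\epsilon}_{\min}(\rho_{AB}|B) = -H^{\epsilon}_{\max}(\rho_{AC}|C), \tag{39} \]
where $\rho_{ABC}$ is an arbitrary purification of $\rho_{AB}$ on an ancilla $\mathcal{H}_C$. A proof for the finite-dimensional case can be found in [24], which allows a straightforward modification to infinite dimensions. A useful property of the smooth entropies is the data processing inequality.

Proposition 7. Let $\rho_{ABC} \in \mathcal{S}_{\le}(\mathcal{H}_A \otimes \mathcal{H}_B \otimes \mathcal{H}_C)$. Then it follows that
\[ H^{\epsilon}_{\min}(\rho_{ABC}|BC) \le H^{\epsilon}_{\min}(\rho_{AB}|B), \qquad H^{\epsilon}_{\max}(\rho_{ABC}|BC) \le H^{\epsilon}_{\max}(\rho_{AB}|B). \]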
Proof. Using the data processing inequality for the min-entropy, Eq. (16), we obtain
\[ H^{\epsilon}_{\min}(\rho_{ABC}|BC) = \sup_{\tilde{\rho}_{ABC} \in \mathcal{B}^{\epsilon}(\rho_{ABC})} H_{\min}(\tilde{\rho}_{ABC}|BC) \le \sup_{\tilde{\rho}_{ABC} \in \mathcal{B}^{\epsilon}(\rho_{ABC})} H_{\min}(\operatorname{tr}_C \tilde{\rho}_{ABC}|B). \]
Thus, it is sufficient to show that $\operatorname{tr}_C(\mathcal{B}^{\epsilon}(\rho_{ABC})) \subseteq \mathcal{B}^{\epsilon}(\rho_{AB})$. But this follows directly from the fact that the purified distance does not increase under partial trace [24], i.e., $P(\rho_{ABC}, \tilde{\rho}_{ABC}) \ge P(\rho_{AB}, \tilde{\rho}_{AB})$. The data processing inequality of the smooth max-entropy follows from the duality (39),
\[ H^{\epsilon}_{\max}(\rho_{ABC}|BC) = -H^{\epsilon}_{\min}(\rho_{AD}|D) \le -H^{\epsilon}_{\min}(\rho_{ACD}|CD) = H^{\epsilon}_{\max}(\rho_{AB}|B), \]
where $\rho_{ABCD}$ is a purification of $\rho_{ABC}$.

6. An Infinite-Dimensional Quantum Asymptotic Equipartition Property

In the finite-dimensional case the quantum asymptotic equipartition property (AEP) says that the conditional von Neumann entropy can be regained as an asymptotic quantity from the conditional smooth min- and max-entropy [5,9]. (For a discussion on why the AEP can be formulated in terms of entropies, see [32].) More precisely, $\lim_{\epsilon\to 0} \lim_{n\to\infty} \frac{1}{n} H^{\epsilon}_{\min}(\rho_{AB}^{\otimes n}|B^n) = H(\rho_{AB}|B)$ and $\lim_{\epsilon\to 0} \lim_{n\to\infty} \frac{1}{n} H^{\epsilon}_{\max}(\rho_{AB}^{\otimes n}|B^n) = H(\rho_{AB}|B)$. For the infinite-dimensional case we derive an upper (lower) bound to the conditional von Neumann entropy in terms of the smooth
min-(max-)entropy. We then use these bounds to prove the above limits in the case where $\mathcal{H}_A$ is finite-dimensional. To this end we need a well defined notion of conditional von Neumann entropy in the infinite-dimensional case. Here we use the definition introduced in [33], which in turn is based on an infinite-dimensional extension of the relative entropy [34–37]. For $\rho, \sigma \in \tau_1^+(\mathcal{H})$ the relative entropy can be defined as
\[ H(\rho\|\sigma) := \sum_{jk} |\langle a_j|b_k\rangle|^2 \,(a_j \log a_j - a_j \log b_k + b_k - a_j), \tag{40} \]
where $\{|a_j\rangle\}_j$ is an arbitrary orthonormal eigenbasis of $\rho$ with corresponding eigenvalues $a_j$, and analogously for $\{|b_k\rangle\}_k$, $b_k$, and $\sigma$. The relative entropy is always positive, possibly $+\infty$, and equal to 0 if and only if $\rho = \sigma$ [35]. For states $\rho_{AB}$ with $H(\rho_A) < +\infty$, the conditional von Neumann entropy can be defined as [33]
\[ H(\rho_{AB}|B) := H(\rho_A) - H(\rho_{AB}\|\rho_A \otimes \rho_B). \tag{41} \]
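In finite dimensions the definitions (40) and (41) can be evaluated directly and compared with the familiar expression $H(\rho_{AB}) - H(\rho_B)$. The sketch below does this for a hypothetical two-qubit example, assuming logarithms to base 2 and a numerical cutoff `TOL` below which eigenvalues are treated as zero; all names are illustrative.

```python
import numpy as np

TOL = 1e-12  # eigenvalues below this are treated as zero (an assumption)

def relative_entropy(rho, sigma):
    """Eq. (40) with log base 2; returns np.inf if a term diverges."""
    a, va = np.linalg.eigh(rho)
    b, vb = np.linalg.eigh(sigma)
    overlap = np.abs(va.conj().T @ vb) ** 2        # |<a_j|b_k>|^2
    total = 0.0
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            w = overlap[j, k]
            if w < TOL:
                continue
            if aj > TOL and bk <= TOL:
                return np.inf                      # -a_j log b_k = +infinity
            term = bk - aj
            if aj > TOL:                           # convention 0 log 0 = 0
                term += aj * (np.log2(aj) - np.log2(bk))
            total += w * term
    return total

def entropy(rho):
    """Von Neumann entropy in bits."""
    p = np.linalg.eigvalsh(rho)
    p = p[p > TOL]
    return float(-(p * np.log2(p)).sum())

def conditional_entropy(rho_AB, d_A, d_B):
    """Eq. (41): H(A|B) = H(rho_A) - H(rho_AB || rho_A (x) rho_B)."""
    r = rho_AB.reshape(d_A, d_B, d_A, d_B)
    rho_A = np.trace(r, axis1=1, axis2=3)          # trace out B
    rho_B = np.trace(r, axis1=0, axis2=2)          # trace out A
    return entropy(rho_A) - relative_entropy(rho_AB, np.kron(rho_A, rho_B))

# Hypothetical example: equal mixture of a Bell state and the maximally mixed state.
phi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_AB = 0.5 * np.outer(phi, phi) + 0.5 * np.eye(4) / 4
h_cond = conditional_entropy(rho_AB, 2, 2)
```

For this example $\rho_B$ is maximally mixed, so $H(\rho_B) = 1$ and the value returned by `conditional_entropy` can be compared with `entropy(rho_AB) - 1`.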
For many applications it appears reasonable to assume $H(\rho_A)$ to be finite, e.g., in cryptographic settings it would correspond to restricting the states of the 'legitimate' users. Similarly as for the min- and max-entropy, the conditional von Neumann entropy can be approximated by projected states [33], i.e., for $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ satisfying $H(\rho_A) < \infty$ with corresponding normalized projected states $\hat{\rho}^k_{AB}$ it follows that
\[ \lim_{k\to\infty} H(\hat{\rho}^k_{AB}|B) = H(\rho_{AB}|B). \tag{42} \]
In the finite-dimensional case it has been shown [9] that the min-, max-, and von Neumann entropy can be ordered as
\[ H_{\min}(\rho_{AB}|B) \le H(\rho_{AB}|B) \le H_{\max}(\rho_{AB}|B). \tag{43} \]
A direct application of Proposition 1 and (42) shows that this remains true in the infinite-dimensional case, if $H(\rho_A) < \infty$. Note, however, that the ordering between min- and max-entropy (13) does not hold for their smoothed versions.

Proposition 8. Let $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ be such that $H(\rho_A) < \infty$. For any $\epsilon > 0$ it follows that
\[ \frac{1}{n} H^{\epsilon}_{\min}(\rho_{AB}^{\otimes n}|B^n) \ge H(\rho_{AB}|B) - \frac{1}{\sqrt{n}}\, 4 \log(\eta) \sqrt{\log\frac{2}{\epsilon^2}}, \tag{44} \]
\[ \frac{1}{n} H^{\epsilon}_{\max}(\rho_{AB}^{\otimes n}|B^n) \le H(\rho_{AB}|B) + \frac{1}{\sqrt{n}}\, 4 \log(\eta) \sqrt{\log\frac{2}{\epsilon^2}}, \tag{45} \]
for $n \ge (8/5) \log(2/\epsilon^2)$, and $\eta = 2^{-\frac{1}{2} H_{\min}(\rho_{AB}|B)} + 2^{\frac{1}{2} H_{\max}(\rho_{AB}|B)} + 1$.

Note that it is not clear under what conditions the limits $n \to \infty$, $\epsilon \to 0$ exist for the left hand side of Eqs. (44) and (45). If they do, Proposition 8 implies $\lim_{\epsilon\to 0} \lim_{n\to\infty} \frac{1}{n} H^{\epsilon}_{\min}(\rho_{AB}^{\otimes n}|B^n) \ge H(\rho_{AB}|B)$ and $\lim_{\epsilon\to 0} \lim_{n\to\infty} \frac{1}{n} H^{\epsilon}_{\max}(\rho_{AB}^{\otimes n}|B^n) \le H(\rho_{AB}|B)$. For the case of a finite-dimensional $\mathcal{H}_A$ we show that these inequalities can be replaced with equalities (Corollary 1). It should be noted that in the classical case a lower bound on the min-entropy and an upper bound on the max-entropy, analogous to Eqs. (44) and (45), correspond [32] to the AEP in classical probability theory [38]. Since in the finite-dimensional quantum
case, the step from Proposition 8 to Corollary 1 is directly obtained [9] via Fannes' inequality [39], the limits in Corollary 1 are usually referred to as 'the quantum AEP' [9]. In the infinite-dimensional case the relation between Proposition 8 and Corollary 1 appears less straightforward, and it is thus not entirely clear what should be regarded as constituting 'the quantum AEP'. We will not pursue this question here, but merely note that it is the inequalities in Proposition 8, rather than the limits in Corollary 1, that are the most relevant for applications [5]. However, for the sake of simplicity we continue to refer to Corollary 1 as a quantum AEP. We prove Proposition 8 after the following lemma.

Lemma 2. Let $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ and let $\{\hat{\rho}^k_{AB}\}_{k=1}^{\infty}$ be a sequence of normalized projected states. For any fixed $1 > t > 0$, there exists a $k_0 \in \mathbb{N}$ such that
\[ H^{\epsilon}_{\min}(\rho_{AB}|B) \ge H^{t\epsilon}_{\min}(\hat{\rho}^k_{AB}|B), \quad \forall k \ge k_0. \tag{46} \]
Proof. In the following let $t \in (0, 1)$ be fixed. According to the definition of the smooth min-entropy in Eq. (36), it is enough to show that $\mathcal{B}^{t\epsilon}(\hat{\rho}^k_{AB}) \subseteq \mathcal{B}^{\epsilon}(\rho_{AB})$ for all $k \ge k_0$. Note that the purified distance is compatible with trace norm convergence, i.e., $\|\rho_{AB} - \hat{\rho}^k_{AB}\|_1 \to 0$ implies that $P(\hat{\rho}^k_{AB}, \rho_{AB}) \to 0$. Hence, there exists a $k_0$ such that $P(\hat{\rho}^k_{AB}, \rho_{AB}) < (1 - t)\epsilon$ for all $k \ge k_0$. For $k \ge k_0$ and $\tilde{\rho}_{AB} \in \mathcal{B}^{t\epsilon}(\hat{\rho}^k_{AB})$ we thus find $P(\tilde{\rho}_{AB}, \rho_{AB}) \le P(\tilde{\rho}_{AB}, \hat{\rho}^k_{AB}) + P(\hat{\rho}^k_{AB}, \rho_{AB}) < \epsilon$, such that $\tilde{\rho}_{AB} \in \mathcal{B}^{\epsilon}(\rho_{AB})$.

Proof (Proposition 8). Let $(P_k^A, P_k^B)$ be a generator of projected states. The pair of n-fold tensor products of the projections, $((P_k^A)^{\otimes n}, (P_k^B)^{\otimes n})$, is also a generator of projected states. If we now fix $1 > t > 0$ and $n \in \mathbb{N}$, it follows by Lemma 2 that we can find a $k_0 \in \mathbb{N}$ such that $H^{\epsilon}_{\min}(\rho_{AB}^{\otimes n}|B^n) \ge H^{t\epsilon}_{\min}((\hat{\rho}^k_{AB})^{\otimes n}|B^n)$ for every $k \ge k_0$. Since Eq. (44) is valid for the finite-dimensional case [9], we can apply it to $H^{t\epsilon}_{\min}((\hat{\rho}^k_{AB})^{\otimes n}|B^n)$ to obtain
\[ \frac{1}{n} H^{t\epsilon}_{\min}((\hat{\rho}^k_{AB})^{\otimes n}|B^n) \ge H(\hat{\rho}^k_{AB}|B) - \frac{1}{\sqrt{n}}\, 4 \log(\eta_k) \sqrt{\log\frac{2}{(t\epsilon)^2}} \]
for any $n \ge (8/5) \log(2/(t\epsilon)^2)$, and $\eta_k = 2^{-\frac{1}{2} H_{\min}(\hat{\rho}^k_{AB}|B)} + 2^{\frac{1}{2} H_{\max}(\hat{\rho}^k_{AB}|B)} + 1$. Hence
\[ \frac{1}{n} H^{\epsilon}_{\min}(\rho_{AB}^{\otimes n}|B^n) \ge H(\hat{\rho}^k_{AB}|B) - \frac{1}{\sqrt{n}}\, 4 \log(\eta_k) \sqrt{\log\frac{2}{(t\epsilon)^2}}, \tag{47} \]
for all $k \ge k_0$. Since the left-hand side of Eq. (47) is independent of $k$ we can use (42) and Proposition 1, to find
\begin{align*}
\frac{1}{n} H^{\epsilon}_{\min}(\rho_{AB}^{\otimes n}|B^n) &\ge \lim_{k\to\infty} \Big[ H(\hat{\rho}^k_{AB}|B) - \frac{1}{\sqrt{n}}\, 4 \log(\eta_k) \sqrt{\log\frac{2}{(t\epsilon)^2}} \Big] \\
&= H(\rho_{AB}|B) - \frac{1}{\sqrt{n}}\, 4 \log(\eta) \sqrt{\log\frac{2}{(t\epsilon)^2}}.
\end{align*}
We finally take the limit $t \to 1$ in the above inequality, as well as in the condition $n \ge (8/5) \log(2/(t\epsilon)^2)$ to obtain the first part of the proposition. For the second part we use the duality of the conditional von Neumann entropy, i.e., $H(\rho_{AB}|B) = -H(\rho_{AC}|C)$ for a purification $\rho_{ABC}$ [33]. This, together with the duality relation for smooth min- and max-entropy (39) leads directly to (45).
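To get a feeling for the size of the correction term in Eqs. (44) and (45), the following sketch evaluates $\frac{1}{\sqrt{n}}\, 4\log(\eta)\sqrt{\log(2/\epsilon^2)}$ for given values of the single-copy min- and max-entropy. It assumes logarithms to base 2, and the function name and example numbers are hypothetical; it only tabulates the bound of Proposition 8 and does not compute smooth entropies.

```python
import math

def aep_correction(n, eps, h_min, h_max):
    """Correction term of Eqs. (44)-(45) for block length n.

    h_min and h_max are H_min(rho_AB|B) and H_max(rho_AB|B) in bits.
    Raises ValueError if n violates the condition n >= (8/5) log(2/eps^2).
    """
    log_term = math.log2(2.0 / eps ** 2)
    if n < 1.6 * log_term:
        raise ValueError("n is below (8/5) log(2/eps^2); the bound does not apply")
    eta = 2.0 ** (-0.5 * h_min) + 2.0 ** (0.5 * h_max) + 1.0
    return 4.0 * math.log2(eta) * math.sqrt(log_term) / math.sqrt(n)

# Example: H_min = H_max = 0 (so eta = 3) and smoothing parameter eps = 0.1.
delta_1e4 = aep_correction(10_000, 0.1, 0.0, 0.0)
delta_4e4 = aep_correction(40_000, 0.1, 0.0, 0.0)
```

The correction decays like $1/\sqrt{n}$, so quadrupling the block length halves it, while the dependence on $\epsilon$ is only through $\sqrt{\log(2/\epsilon^2)}$.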
Corollary 1. Let $\mathcal{H}_A$ be a finite-dimensional and $\mathcal{H}_B$ a separable Hilbert space. For all $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ it follows that
\[ \lim_{\epsilon\to 0} \lim_{n\to\infty} \frac{1}{n} H^{\epsilon}_{\min}(\rho_{AB}^{\otimes n}|B^n) = H(\rho_{AB}|B), \tag{48} \]
\[ \lim_{\epsilon\to 0} \lim_{n\to\infty} \frac{1}{n} H^{\epsilon}_{\max}(\rho_{AB}^{\otimes n}|B^n) = H(\rho_{AB}|B). \tag{49} \]

Proof. Let $\epsilon > 0$ be sufficiently small, and let $(\mathrm{id}_A, P_k^B)$ be a generator of projected states $\rho^k_{AB}$, with corresponding normalized projected states $\hat{\rho}^k_{AB}$. Let $\sigma_{AB} \in \mathcal{B}^{\epsilon}(\rho_{AB})$, with projected states $\sigma^k_{AB}$, and normalized projected states $\hat{\sigma}^k_{AB}$. By $H_{\min}(\sigma^k_{AB}|B) = H_{\min}(\hat{\sigma}^k_{AB}|B) + \log \operatorname{tr} \sigma^k_{AB}$ and (43) we find $H_{\min}(\sigma^k_{AB}|B_k) \le H(\hat{\sigma}^k_{AB}|B)$, where $\hat{\sigma}^k_{AB} = (\operatorname{tr} \sigma^k_{AB})^{-1} \sigma^k_{AB}$. Since $H(\hat{\sigma}^k_{AB}|B_k)$ is finite-dimensional we can use Fannes' inequality [39] to obtain (for $k$ sufficiently large) $H(\hat{\sigma}^k_{AB}|B_k) \le H(\hat{\rho}^k_{AB}|B_k) + 4\Delta_k \log d_A + 4 H_{\mathrm{bin}}(\Delta_k)$, with $d_A = \dim(\mathcal{H}_A)$, $\Delta_k = \|\hat{\rho}^k_{AB} - \hat{\sigma}^k_{AB}\|_1$, and $H_{\mathrm{bin}}(t) = -t \log t - (1 - t) \log(1 - t)$. Due to the general relation $\|\rho - \sigma\|_1 \le 2P(\rho, \sigma)$ (see Lemma 6 in [24]), we have $\|\rho_{AB} - \sigma_{AB}\|_1 \le 2\epsilon$ for all $\sigma_{AB} \in \mathcal{B}^{\epsilon}(\rho_{AB})$, which yields $\lim_{k\to\infty} \Delta_k = \|\rho_{AB} - \hat{\sigma}_{AB}\|_1 \le 4\epsilon$, where $\hat{\sigma}_{AB} = \sigma_{AB} / \operatorname{tr}(\sigma_{AB})$. Combined with (42) this leads to $H^{\epsilon}_{\min}(\rho_{AB}|B) = \sup_{\sigma_{AB} \in \mathcal{B}^{\epsilon}(\rho_{AB})} \lim_{k\to\infty} H_{\min}(\sigma^k_{AB}|B) \le H(\rho_{AB}|B) + 16\epsilon \log d_A + 4 H_{\mathrm{bin}}(4\epsilon)$. Applied to an n-fold tensor product this gives
\[ \frac{1}{n} H^{\epsilon}_{\min}(\rho_{AB}^{\otimes n}|B^n) \le H(\rho_{AB}|B) + 16\epsilon \log d_A + \frac{4}{n} H_{\mathrm{bin}}(4\epsilon). \tag{50} \]
Equation (48) follows by combining (50) with the lower bound in (44), taking the limits $n \to \infty$ and $\epsilon \to 0$. Equation (49) follows directly by the duality of the conditional von Neumann entropy [33] together with the duality of the smooth min- and max-entropy (39).

7. Conclusion and Outlook

We have extended the min- and max-entropies to separable Hilbert spaces, and shown that properties and operational interpretations, known from the finite-dimensional case, remain valid in the infinite-dimensional setting. These extensions are facilitated by the finding (Proposition 1) that the infinite-dimensional min- and max-entropies can be expressed in terms of convergent sequences of finite-dimensional entropies. We bound the smooth min- and max-entropies of iid states (Proposition 8) in terms of an infinite-dimensional generalization of the conditional von Neumann entropy $H(A|B)$, introduced in [33], which is defined when the von Neumann entropy of system A is finite, $H(A) < \infty$. Under the additional assumption that the Hilbert space of system A has finite dimension we furthermore prove that the smooth entropies of iid states converge to the conditional von Neumann entropy (Corollary 1), corresponding to a quantum asymptotic equipartition property (AEP). Whether these conditions can be relaxed is an open question. In the general case where $H(A)$ is not necessarily finite, this would however require a more general definition of the conditional von Neumann entropy than the one used here. For information-theoretic purposes it appears reasonable to require extensions of the conditional von Neumann entropy to be compatible with the AEP, i.e., that the conditional von Neumann entropy can be regained from the smooth min- and max-entropy in
the asymptotic iid limit. This enables generalizations of operational interpretations of the conditional von Neumann entropy. For example, in the finite-dimensional asymptotic case the conditional von Neumann entropy characterizes the amount of entanglement needed for state merging [40], i.e., the transfer of a quantum state shared by two parties to only one of the parties. An infinite-dimensional generalization of one-shot state merging [41], together with the AEP, could be used to extend this result to the infinite-dimensional case. Some other immediate applications of this work are in continuous variable quantum key distribution, and in statistical mechanics, where it has recently been shown [19,20] that the smooth min- and max-entropies play a role. Our techniques may also be employed to derive an infinite-dimensional generalization of the entropic uncertainty relation [11]. Such a generalization would be interesting partially because it could find applications in continuous variable quantum information processing, but also because it may bring this information-theoretic uncertainty relation into the same realm as the standard uncertainty relation. Acknowledgements. We thank Roger Colbeck and Marco Tomamichel for helpful comments and discussions, and an anonymous referee for very valuable suggestions. Fabian Furrer acknowledges support from the Graduiertenkolleg 1463 of the Leibniz University Hannover. We furthermore acknowledge support from the Swiss National Science Foundation (grant No. 200021-119868). Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
A. Technical Lemmas

In the following, each Hilbert space is assumed to be separable. Let us define the positive cone $\mathcal{L}^+(\mathcal{H}) := \{T \in \mathcal{L}(\mathcal{H}) \,|\, T \ge 0\}$ in $\mathcal{L}(\mathcal{H})$. The next two lemmas follow directly from the definition of positivity of an operator.

Lemma 3. If $T \in \mathcal{L}^+(\mathcal{H})$, then for each $S \in \mathcal{L}(\mathcal{H})$ it follows that $S T S^\dagger \in \mathcal{L}^+(\mathcal{H})$.

Lemma 4. The positive cone $\mathcal{L}^+(\mathcal{H})$ is sequentially closed in the weak operator topology, i.e., for $\{T_k\}_{k\in\mathbb{N}} \subset \mathcal{L}^+(\mathcal{H})$ such that $T_k$ converges to $T \in \mathcal{L}(\mathcal{H})$ in the weak operator topology, it follows that $T \ge 0$.

The following lemma is a special case of a theorem by Grümm [42] (see also [43], pp. 25–29, for similar results).

Lemma 5. Let $A_k, A \in \mathcal{L}(\mathcal{H})$, such that $\sup_k \|A_k\| < +\infty$, and $A_k \to A$ in the strong operator topology, and let $T \in \tau_1(\mathcal{H})$. Then $\lim_{k\to\infty} \|A_k T - A T\|_1 = 0$ and $\lim_{k\to\infty} \|T A_k - T A\|_1 = 0$.

Corollary 2. If $P_k$ is a sequence of projectors on $\mathcal{H}$ that converges in the strong operator topology to the identity, and if $\rho \in \tau_1^+(\mathcal{H})$, then $\lim_{k\to\infty} \|P_k \rho P_k - \rho\|_1 = 0$.

Lemma 6. If sequences of projectors $P_k^A$ and $P_k^B$ on $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively, converge in the strong operator topology to the identity, then $P_k^A \otimes P_k^B$ converges in the strong operator topology to $\mathrm{id}_{AB}$.

Lemma 7. Let $\{T_k\}_{k\in\mathbb{N}} \subset \tau_1(\mathcal{H}_B)$ be a sequence that converges in the weak* topology to $T \in \tau_1(\mathcal{H}_B)$. Then, the sequence $\mathrm{id}_A \otimes T_k$ in $\mathcal{L}(\mathcal{H}_A \otimes \mathcal{H}_B)$ converges to $\mathrm{id}_A \otimes T$ in the weak operator topology.
Proof. For each $\psi \in \mathcal{H}_A \otimes \mathcal{H}_B$ we find that $\langle\psi| \mathrm{id} \otimes T_k |\psi\rangle = \operatorname{tr}(T_k K_\psi^B)$, where $K_\psi^B = \operatorname{tr}_A |\psi\rangle\langle\psi|$ is the reduced operator. Since $K_\psi^B$ is trace class (and thus compact) the statement follows immediately.

B. Proof of Proposition 1

In order to derive Proposition 1 we proceed as follows: In Sect. B.1 we show that the min- and max-entropy of a projected state can be reduced to an entropy on a finite-dimensional space. In Sect. B.2 we show that the min- and max-entropies are monotonic over the sequences of projected states. Finally we prove the limits listed in Proposition 1. Note that in what follows we mostly make use of the quantities $\Lambda(\rho_{AB}|\sigma_B)$ and $\Lambda(\rho_{AB}|B)$, as defined in Eqs. (5) and (6), rather than the min- and max-entropies per se.

B.1. Reduction. Here we show that the min- and max-entropy of a projected state can be considered as effectively finite-dimensional, in the sense that restricting the Hilbert space to the support of the projected states does not change the value of the entropies.

Lemma 8. Let $P_A$, $P_B$ be projectors onto closed subspaces $U_A \subseteq \mathcal{H}_A$ and $U_B \subseteq \mathcal{H}_B$, respectively, $\tilde{\rho}_{AB} \in \tau_1^+(\mathcal{H}_A \otimes \mathcal{H}_B)$, and $\tilde{\sigma}_B \in \tau_1^+(\mathcal{H}_B)$.
i) If $(P_A \otimes \mathrm{id}_B)\tilde{\rho}_{AB}(P_A \otimes \mathrm{id}_B) = \tilde{\rho}_{AB}$ it follows that
\[ \Lambda(\tilde{\rho}_{AB}|\tilde{\sigma}_B) = \inf\{\lambda \in \mathbb{R} \,|\, \lambda P_A \otimes \tilde{\sigma}_B \ge \tilde{\rho}_{AB}\}. \tag{51} \]
ii) If $(\mathrm{id}_A \otimes P_B)\tilde{\rho}_{AB}(\mathrm{id}_A \otimes P_B) = \tilde{\rho}_{AB}$ it follows that
\[ \Lambda(\tilde{\rho}_{AB}|B) = \Lambda(\tilde{\rho}_{AB}|U_B), \tag{52} \]
where $\Lambda(\tilde{\rho}_{AB}|U_B)$ means that the infimum in Eq. (6) is taken only over the set $\tau_1^+(U_B)$.

The proof is straightforward and left to the reader. In the particular case of projected states $\rho^k_{AB}$ relative to a generator $(P_k^A, P_k^B)$, the evaluation of $\Lambda(\rho^k_{AB}|\sigma^k_B)$ and $\Lambda(\rho^k_{AB}|B)$, where $\sigma^k_B = P_k^B \sigma_B P_k^B$, can be restricted to the finite-dimensional Hilbert space $U_k^A \otimes U_k^B$ given by the projection spaces of $P_k^A$ and $P_k^B$. Especially, we can conclude that the infima of Eqs. (5) and (6), and consequently the infimum in (1) and the supremum in (2), are attained for projected states, since these are optimizations of continuous functions over compact sets.

B.2. Monotonicity. The next lemma considers the monotonic behaviour of the min- and max-entropies with respect to sequences of projected states.

Lemma 9. For $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$, $\sigma_B \in \mathcal{S}(\mathcal{H}_B)$, let $\{\rho^k_{AB}\}_{k=1}^{\infty}$ and $\{\sigma^k_B\}_{k=1}^{\infty}$ be projected states relative to a generator $(P_k^A, P_k^B)$.
i) It follows that $\Lambda(\rho^k_{AB}|\sigma^k_B)$ and $\Lambda(\rho^k_{AB}|B)$ are monotonically increasing in $k$, where the first sequence is bounded by $\Lambda(\rho_{AB}|\sigma_B)$ and the latter by $\Lambda(\rho_{AB}|B)$.
ii) For an arbitrary but fixed purification $\rho_{ABC}$ of $\rho_{AB}$ with purifying system $\mathcal{H}_C$, let $\rho^k_{AC} = \operatorname{tr}_B \rho^k_{ABC}$ and $\rho^k_{ABC} = (P_k^A \otimes P_k^B \otimes \mathrm{id}_C)\rho_{ABC}(P_k^A \otimes P_k^B \otimes \mathrm{id}_C)$. Then it follows that $\Lambda(\rho^k_{AC}|C)$ is monotonically increasing and bounded by $\Lambda(\rho_{AC}|C)$.
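Note that $\rho_{AC}^k$ as defined in the lemma is not a projected state in the sense of Definition 2. Translated to min- and max-entropies, which are $-\log$ and $\log$ of the corresponding quantities $\Lambda$, the lemma above says that $H_{\min}(\rho_{AB}^k|\sigma_B^k)$ and $H_{\min}(\rho_{AB}^k|B)$ are monotonically decreasing while $H_{\max}(\rho_{AB}^k|B)$ is monotonically increasing. But in general, the monotonicity does not hold for normalized projected states.

Proof. Set $P_k := P_k^A \otimes P_k^B$ and recall that $\Lambda(\rho_{AB}^k|\sigma_B^k) = \inf\{\lambda \in \mathbb{R} \mid \lambda P_k^A \otimes \sigma_B^k \ge \rho_{AB}^k\}$ according to Lemma 8. To show the first part of i) note that for $k' \le k$ the equations

$$P_{k'} P_k (\lambda\, \mathrm{id} \otimes \sigma_B - \rho_{AB}) P_k P_{k'} = P_{k'} (\lambda P_k^A \otimes \sigma_B^k - \rho_{AB}^k) P_{k'} = \lambda P_{k'}^A \otimes \sigma_B^{k'} - \rho_{AB}^{k'}$$

hold, which imply via Lemma 3 that $\Lambda(\rho_{AB}^{k'}|\sigma_B^{k'}) \le \Lambda(\rho_{AB}^k|\sigma_B^k) \le \Lambda(\rho_{AB}|\sigma_B)$. For the second part, let $\tilde\sigma_B \in \tau_1^+(\mathcal{H}_B)$ be the optimal state such that $\Lambda(\rho_{AB}^k|B) = \mathrm{tr}\,\tilde\sigma_B$ and $P_k^A \otimes \tilde\sigma_B \ge \rho_{AB}^k$. But then we obtain that $P_{k'}^A \otimes P_{k'}^B \tilde\sigma_B P_{k'}^B - \rho_{AB}^{k'} \ge 0$ and therefore also $\Lambda(\rho_{AB}^{k'}|B) \le \Lambda(\rho_{AB}^k|B)$. The upper bound follows in the same manner.

In order to show ii) we define the sets $M_k := \{\tilde\sigma_C \in \tau_1^+(\mathcal{H}_C) \mid \mathrm{id}_A \otimes \tilde\sigma_C \ge \rho_{AC}^k\}$ such that $\Lambda(\rho_{AC}^k|C) = \inf_{\tilde\sigma_C \in M_k} \mathrm{tr}\,\tilde\sigma_C$. To conclude the monotonicity we show that $M_{k'} \supset M_k$ for $k' \le k$. If $M_k = \emptyset$, the statement is trivial. Assume $\tilde\sigma_C \in M_k$. Using $P_{k'}^B \le P_k^B$ we find

$$\mathrm{id}_A \otimes \tilde\sigma_C \ge P_k^A\, \mathrm{tr}_B(P_k^B \rho_{ABC} P_k^B)\, P_k^A \ge P_k^A\, \mathrm{tr}_B(P_{k'}^B \rho_{ABC} P_{k'}^B)\, P_k^A.$$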
Together with Lemma 3, this yields $P_{k'}^A \otimes \tilde\sigma_C \ge \rho_{AC}^{k'}$ and thus $\tilde\sigma_C \in M_{k'}$. A similar argument provides the upper bound $\Lambda(\rho_{AC}^k|C) \le \Lambda(\rho_{AC}|C)$.

B.3. Limits. After the above discussion on general properties of the min- and max-entropies of projected states we are now prepared to prove Proposition 1. For the sake of convenience we divide the proof into three lemmas.

Lemma 10. For $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$ and $\sigma_B \in \mathcal{S}(\mathcal{H}_B)$, let $\{\rho_{AB}^k\}_{k=1}^\infty$ be the projected states of $\rho_{AB}$ relative to a generator $(P_k^A, P_k^B)$, and let $\sigma_B^k := P_k^B \sigma_B P_k^B$. It follows that

$$\Lambda(\rho_{AB}|\sigma_B) = \lim_{k\to\infty} \Lambda(\rho_{AB}^k|\sigma_B^k), \tag{53}$$

and the infimum in Eq. (5) is attained if $\Lambda(\rho_{AB}|\sigma_B)$ is finite.

Proof. That the infimum is attained follows directly from the definition. To show (53) we prove that $\Lambda(\rho_{AB}|\sigma_B)$ is lower semi-continuous in $(\rho_{AB}, \sigma_B)$ with respect to the product topology induced by the trace norm topology on each factor. Since this means that $\liminf_{k\to\infty} \Lambda(\rho_{AB}^k|\sigma_B^k) \ge \Lambda(\rho_{AB}|\sigma_B)$, the combination with Lemma 9 results directly in (53). To show lower semi-continuity recall that it is equivalent to say that all lower level sets $\Lambda^{-1}((-\infty, t]) = \{(\rho_{AB}, \sigma_B) \mid \Lambda(\rho_{AB}|\sigma_B) \le t\}$, for $t \in \mathbb{R}$, have to be closed. But this follows by rewriting $\Lambda^{-1}((-\infty, t])$ as $\{(\rho_{AB}, \sigma_B) \mid t\, \mathrm{id} \otimes \sigma_B \ge \rho_{AB}\}$.

Lemma 11. For $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$, let $\{\rho_{AB}^k\}_{k=1}^\infty$ be the projected states of $\rho_{AB}$ relative to a generator $(P_k^A, P_k^B)$. It follows that

$$\Lambda(\rho_{AB}|B) = \lim_{k\to\infty} \Lambda(\rho_{AB}^k|B), \tag{54}$$

and the infimum in Eq. (6) is attained if $\Lambda(\rho_{AB}|B)$ is finite.
Proof. Let $\mu_k := \Lambda(\rho_{AB}^k|B) = \Lambda(\rho_{AB}^k|B_k)$, where the last equality is due to Lemma 8. By Lemma 9 this sequence is monotonically increasing, and we can thus define $\mu := \lim_{k\to\infty} \mu_k \in \mathbb{R} \cup \{+\infty\}$. In addition, Lemma 9 also yields $\mu \le \Lambda(\rho_{AB}|B)$. Hence, the case $\mu = +\infty$ is trivial, and it remains to show $\mu \ge \Lambda(\rho_{AB}|B)$ for $\mu < \infty$. For each $k \in \mathbb{N}$ let $\tilde\sigma_B^k$ be an optimal state such that $\Lambda(\rho_{AB}^k|B) = \mathrm{tr}\,\tilde\sigma_B^k$ and $\mathrm{id} \otimes \tilde\sigma_B^k \ge \rho_{AB}^k$. Note that due to positivity $\mathrm{tr}\,\tilde\sigma_B^k = \|\tilde\sigma_B^k\|_1 \le \mu$, such that $\tilde\sigma_B^k$ is a bounded sequence in $\tau_1(\mathcal{H}_B)$. Since the space of trace class operators $\tau_1(\mathcal{H}_B)$ is the dual space of the compact operators $\mathcal{K}(\mathcal{H}_B)$ [44], we can apply the Banach-Alaoglu theorem [44,45] to find a subsequence $\{\tilde\sigma_B^k\}_{k\in\Gamma}$, $\Gamma \subset \mathbb{N}$, with a weak* limit $\tilde\sigma_B \in \tau_1(\mathcal{H}_B)$, i.e., $\mathrm{tr}(K \tilde\sigma_B^k) \to \mathrm{tr}(K \tilde\sigma_B)$ ($k \in \Gamma$) for all $K \in \mathcal{K}(\mathcal{H}_B)$, such that $\|\tilde\sigma_B\|_1 \le \mu$. Obviously, $\tilde\sigma_B$ is also positive. According to Lemma 7, $\mathrm{id} \otimes \tilde\sigma_B^k$ (for $k \in \Gamma$) converges in the weak operator topology to $\mathrm{id} \otimes \tilde\sigma_B$, and so does $\mathrm{id} \otimes \tilde\sigma_B^k - \rho_{AB}^k$ to $\mathrm{id} \otimes \tilde\sigma_B - \rho_{AB}$. But then we can conclude that $\mathrm{id} \otimes \tilde\sigma_B - \rho_{AB} \ge 0$ such that $\Lambda(\rho_{AB}|B) \le \mathrm{tr}\,\tilde\sigma_B \le \mu$.

Lemma 12. For $\rho_{AB} \in \mathcal{S}(\mathcal{H}_A \otimes \mathcal{H}_B)$, let $\rho_{ABC}$ be a purification with purifying system $\mathcal{H}_C$, and $(P_k^A, P_k^B)$ be a generator of projected states. It follows that

$$\Lambda(\rho_{AC}|C) = \lim_{k\to\infty} \Lambda(\rho_{AC}^k|C), \tag{55}$$

where $\rho_{AC}^k = \mathrm{tr}_B[(P_k^A \otimes P_k^B \otimes \mathrm{id}_C)\,\rho_{ABC}\,(P_k^A \otimes P_k^B \otimes \mathrm{id}_C)]$.

Proof. Let $\nu_k := \Lambda(\rho_{AC}^k|C)$. Due to Lemma 9 this sequence is monotonically increasing, so we can define $\nu := \lim_{k\to\infty} \nu_k \in \mathbb{R} \cup \{+\infty\}$, and conclude that $\nu \le \Lambda(\rho_{AC}|C)$. Thus, the case $\nu = +\infty$ is trivial, and it remains to show $\nu \ge \Lambda(\rho_{AC}|C)$ for $\nu < +\infty$. As proved in Lemma 11, the infimum in Eq. (6) is attained even if the underlying Hilbert spaces are infinite-dimensional. Hence there exists for each $k \in \mathbb{N}$ a state $\tilde\sigma_C^k$ such that $\mathrm{id} \otimes \tilde\sigma_C^k \ge \rho_{AC}^k$ and $\mathrm{tr}\,\tilde\sigma_C^k = \Lambda(\rho_{AC}^k|C)$. Now we can proceed in the same manner as in the proof of Lemma 11 to construct a weak* limit $\tilde\sigma_C \in \tau_1^+(\mathcal{H}_C)$ that satisfies $\mathrm{id}_A \otimes \tilde\sigma_C \ge \rho_{AC}$, and is such that $\Lambda(\rho_{AC}|C) \le \mathrm{tr}\,\tilde\sigma_C \le \nu \le \Lambda(\rho_{AC}|C)$. This completes the proof.

Of course, Lemmas 10 and 11 can directly be rewritten in terms of min-entropies and yield the first two statements of Proposition 1. The part for the normalized projected states follows via $H_{\min}(\hat\rho_{AB}^k|\hat\sigma_B^k) = H_{\min}(\rho_{AB}^k|\sigma_B^k) - \log \mathrm{tr}\,\sigma_B^k + \log \mathrm{tr}\,\rho_{AB}^k$, and $H_{\min}(\hat\rho_{AB}^k|B) = H_{\min}(\rho_{AB}^k|B) + \log \mathrm{tr}\,\rho_{AB}^k$. In order to obtain the convergence stated for the max-entropy in Proposition 1, note that $(P_k^A \otimes P_k^B \otimes \mathrm{id}_C)\,\rho_{ABC}\,(P_k^A \otimes P_k^B \otimes \mathrm{id}_C)$ is a purification of $\rho_{AB}^k$, whenever $\rho_{ABC}$ is a purification of $\rho_{AB}$. Hence, $H_{\max}(\rho_{AB}^k|B) = -H_{\min}(\rho_{AC}^k|C) = \log \Lambda(\rho_{AC}^k|C)$. For normalized states use $H_{\max}(\hat\rho_{AB}^k|B_k) = H_{\max}(\rho_{AB}^k|B_k) - \log \mathrm{tr}\,\rho_{AB}^k$.

References
1. Shannon, C.E.: A Mathematical Theory of Communication. Bell System Technical Journal 27, 379–423 and 623–656 (1948)
2. Rényi, A.: On measures of entropy and information. Proc. of the 4th Berkeley Symp. on Math. Statistics and Prob. 1, Berkeley, CA: Univ. of Calif. Press, 1961, pp. 547–561
3. Barnum, H., Nielsen, M.A., Schumacher, B.: Information transmission through a noisy quantum channel. Phys. Rev. A 57, 4153–4175 (1998)
4. Schumacher, B.: Quantum coding. Phys. Rev. A 51, 2738–2747 (1995)
5. Renner, R.: Security of Quantum Key Distribution. Ph.D. thesis, Swiss Fed. Inst. of Technology, Zurich, 2005, available at http://arXiv.org/abs/quant-ph/0512258v2, 2006
6. Renner, R., Wolf, S.: Smooth Renyi entropy and applications. Proc. 2004 IEEE International Symposium on Information Theory, Piscataway, NJ: IEEE, 2004, p. 233
7. Renes, J.M., Renner, R.: One-Shot Classical Data Compression with Quantum Side Information and the Distillation of Common Randomness or Secret Keys. http://arXiv.org/abs/1008.0452v2 [quant-ph], 2010
8. Renner, R., Wolf, S., Wullschleger, J.: The Single-Serving Channel Capacity. Proc. 2006 IEEE International Symposium on Information Theory, Piscataway, NJ: IEEE, 2006, pp. 1424–1427
9. Tomamichel, M., Colbeck, R., Renner, R.: A Fully Quantum Asymptotic Equipartition Property. IEEE Trans. Inf. Th. 55, 5840–5847 (2009)
10. Berta, M., Christandl, M., Renner, R.: A Conceptually Simple Proof of the Quantum Reverse Shannon Theorem. http://arXiv.org/abs/0912.3805v1 [quant-ph], 2009
11. Berta, M., Christandl, M., Colbeck, R., Renes, J.M., Renner, R.: The uncertainty principle in the presence of quantum memory. Nature Physics 6, 659–662 (2010)
12. Datta, N., Renner, R.: Smooth Entropies and the Quantum Information Spectrum. IEEE Trans. Inf. Theor. 55, 2807–2815 (2009)
13. Han, T.S.: Information-Spectrum Methods in Information Theory. New York: Springer-Verlag, 2002
14. Han, T.S., Verdu, S.: Approximation theory of output statistics. IEEE Trans. Inform. Theory 39, 752–772 (1993)
15. Datta, N.: Min- and Max-Relative Entropies and a New Entanglement Monotone. IEEE Trans. Inf. Theor. 55, 2816–2826 (2009)
16. Brandão, F.G.S.L., Datta, N.: One-shot rates for entanglement manipulation under non-entangling maps. IEEE Trans. Inf. Theor. 57, 1754 (2011)
17. Buscemi, F., Datta, N.: Entanglement Cost in Practical Scenarios. Phys. Rev. Lett. 106, 130503 (2011)
18. Mosonyi, M., Datta, N.: Generalized relative entropies and the capacity of classical-quantum channels. J. Math. Phys. 50, 072104 (2009)
19. Dahlsten, O.C.O., Renner, R., Rieper, E., Vedral, V.: Inadequacy of von Neumann entropy for characterizing extractable work. New J. Phys. 13, 053015 (2011)
20. del Rio, L., Åberg, J., Renner, R., Dahlsten, O., Vedral, V.: The thermodynamic meaning of negative entropy. Nature 474, 61–63 (2011)
21. Scarani, V., Bechmann-Pasquinucci, H., Cerf, N.J., Dušek, M., Lütkenhaus, N., Peev, M.: The security of practical quantum key distribution. Rev. Mod. Phys. 81, 1301–1350 (2009)
22. Bratteli, O., Robinson, D.W.: Operator Algebras and Quantum Statistical Mechanics I. New York: Springer-Verlag, 1979
23. König, R., Renner, R., Schaffner, C.: The Operational Meaning of Min- and Max-Entropy. IEEE Trans. Inf. Th. 55, 4337–4347 (2009)
24. Tomamichel, M., Colbeck, R., Renner, R.: Duality Between Smooth Min- and Max-Entropies. IEEE Trans. Inf. Th. 56, 4674–4681 (2010)
25. Lieb, E.H., Ruskai, M.B.: A Fundamental Property of Quantum-Mechanical Entropy. Phys. Rev. Lett. 30, 434–436 (1973)
26. Lieb, E.H.: Convex trace functions and the Wigner-Yanase-Dyson conjecture. Adv. Math. 11, 267–288 (1973)
27. Lieb, E.H., Ruskai, M.B.: Proof of the strong subadditivity of quantum-mechanical entropy. J. Math. Phys. 14, 1938–1941 (1973)
28. Owari, M., Braunstein, S.L., Nemoto, K., Murao, M.: Epsilon-convertibility of entangled states and extension of Schmidt rank in infinite-dimensional systems. Quant. Inf. and Comp. 8, 30–52 (2008)
29. Kraus, K.: Lecture Notes in Physics 190, States, Effects, and Operations. Berlin Heidelberg: Springer-Verlag, 1983
30. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge: Cambridge University Press, 2000
31. Uhlmann, A.: The transition probability in the state space of a *-algebra. Rep. Math. Phys. 9, 273–279 (1976)
32. Holenstein, H., Renner, R.: On the Randomness of Independent Experiments. IEEE Trans. Inf. Theor. 57(4), 1865–1871 (2011)
33. Kuznetsova, A.A.: Quantum conditional entropy for infinite-dimensional systems. Theory Probab. Appl. 55, 782–790 (2010)
34. Klein, O.: Zur quantenmechanischen Begründung des zweiten Hauptsatzes der Wärmelehre. Z. F. Phys. A 72, 767–775 (1931)
35. Lindblad, G.: Entropy, Information and Quantum Measurements. Commun. Math. Phys. 33, 305–322 (1973)
36. Lindblad, G.: Expectations and Entropy Inequalities for Finite Quantum Systems. Commun. Math. Phys. 39, 111–119 (1974)
37. Holevo, A.S., Shirokov, M.E.: Mutual and Coherent Information for Infinite-Dimensional Quantum Channels. Probl. Inf. Transm. 46, 201–217 (2010)
38. Cover, T.M., Thomas, J.A.: Elements of Information Theory. 2nd ed. New York: Wiley, 2006
39. Alicki, R., Fannes, M.: Continuity of quantum conditional information. J. Phys. A 37, L55–L57 (2004)
40. Horodecki, M., Oppenheim, J., Winter, A.: Partial quantum information. Nature 436, 673–676 (2005)
41. Berta, M.: Single-shot Quantum State Merging. Diploma thesis, ETH Zurich, February 2008, available at http://arXiv.org/abs/0912.4495v1 [quant-ph], 2009
42. Grümm, H.R.: Two theorems about $C_p$. Rep. Math. Phys. 4, 211–215 (1973)
43. Simon, B.: Trace Ideals and Their Applications. 2nd ed. Providence, RI: Amer. Math. Soc., 2005
44. Reed, M., Simon, B.: Methods of Modern Mathematical Physics, Vol. I: Functional Analysis. New York: Academic Press, 1978
45. Hille, E., Phillips, R.S.: Functional Analysis and Semi-Groups. Providence, RI: Amer. Math. Soc., 1957
Communicated by M.B. Ruskai
Commun. Math. Phys. 306, 187–191 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1272-3
Communications in
Mathematical Physics
Schrödinger Operators and the Zeros of Their Eigenfunctions Sol Schwartzman Math. Department, University of Rhode Island, Kingston, RI 02881, USA. E-mail:
[email protected] Received: 11 June 2010 / Accepted: 16 December 2010 Published online: 22 May 2011 – © Springer-Verlag 2011
Abstract: In n-dimensional Euclidean space let us be given an infinitely differentiable real valued function $V$ that is bounded below. We associate with the formal operator that sends a complex valued function $\psi$ into $-\operatorname{div}(\operatorname{grad}\psi) + V\psi$ a uniquely defined self adjoint operator which we will denote by $-\Delta + V$. If $\psi_0$ is any eigenfunction of the self adjoint operator $-\Delta + V$ we prove that a necessary and sufficient condition for $\psi_0$ to never equal zero is that the eigenspace to which $\psi_0$ belongs contain a positive function. In this case the eigenspace must be one dimensional. The same result holds on any complete connected Riemannian manifold whose first Betti number is zero.

I. Recall that a Riemannian manifold $M^n$ is said to be complete if it is complete as a metric space. If $M^n = M_n$ so that the boundary of $M^n$ is empty, this definition agrees with the one that is usually given. In what follows we will be given a complete connected $C^\infty$ oriented Riemannian manifold whose first Betti number equals zero. We will allow the possibility that $M_n \ne M^n$. We recall that there is a standard measure $\mu$ on $M^n$. We will denote by $D$ the space of complex valued functions $\psi$ on $M^n$ of class $C^2$ such that
1. The functions $\psi$, $\|\operatorname{grad}\psi\|$ belong to $L^2(M_n, \mu)$.
2. The normal derivative of $\psi$ at any point on the boundary of $M^n$ equals zero.
We will denote by $L_V$ the linear map of $D$ into the space of complex valued functions on $M^n$ that sends $\psi \in D$ into $-\operatorname{div}(\operatorname{grad}\psi) + V\psi$, where $V$ is an arbitrary real valued function. We are going to prove the following.

Theorem 1. If $\psi_1$ and $\psi_2$ are any two linearly independent real valued functions in $D$ such that $L_V(\psi_1) = L_V(\psi_2) = 0$ then there is a point $p \in M^n$ such that $\psi_1(p) = \psi_2(p) = 0$.
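We are going to need a lemma that does not seem to have been known even when $M^n$ is n-dimensional Euclidean space with its usual Riemannian metric. Suppose that $\psi$ is any nowhere vanishing complex valued function of class $C^2$ on the manifold $M^n$ on which we are working. Since we are assuming that the first Betti number of $M^n$ equals zero there must exist real valued $C^2$ functions $k$ and $u$ such that $\psi = \exp(k + iu)$. Here $k$ is uniquely determined and $u$ is unique up to an additive constant, so that $\operatorname{grad} u$ is uniquely determined. The standard measure $\mu$ on $M^n$ arises from a differential form $\Omega$. If we denote by $D_{\operatorname{grad} u}$ the operation of Lie differentiation that we get from the vector field $\operatorname{grad} u$, the result we will need is the following:

Lemma 1. The imaginary part of $(\Delta\psi/\psi)\, e^{2k}\Omega$ equals $D_{\operatorname{grad} u}(e^{2k}\Omega)$.

To prove this we need only show that for every $p \in M_n$ there is a coordinate patch containing $p$ in which this holds. It is well known that there is always a coordinate patch containing $p$ in which $\sqrt{g} = 1$. In such a patch $\Omega = dx_1 \wedge \cdots \wedge dx_n$ and for any $C^2$ function $\psi$,

$$\Delta\psi = \operatorname{div}(\operatorname{grad}\psi) = \frac{\partial}{\partial x_j}\Bigl(g^{ij}\frac{\partial\psi}{\partial x_i}\Bigr).$$

Since $\psi = \exp(k + iu)$ we must have

$$\frac{\partial\psi}{\partial x_i} = \Bigl(\frac{\partial k}{\partial x_i} + i\frac{\partial u}{\partial x_i}\Bigr)\exp(k + iu),$$

so that

$$\begin{aligned}
\frac{\partial}{\partial x_j}\Bigl(g^{ij}\frac{\partial\psi}{\partial x_i}\Bigr)
&= \frac{\partial}{\partial x_j}\Bigl[g^{ij}\Bigl(\frac{\partial k}{\partial x_i} + i\frac{\partial u}{\partial x_i}\Bigr)\exp(k + iu)\Bigr] \\
&= \frac{\partial g^{ij}}{\partial x_j}\Bigl(\frac{\partial k}{\partial x_i} + i\frac{\partial u}{\partial x_i}\Bigr)\exp(k + iu) \\
&\quad + g^{ij}\Bigl(\frac{\partial^2 k}{\partial x_i \partial x_j} + i\frac{\partial^2 u}{\partial x_i \partial x_j}\Bigr)\exp(k + iu) \\
&\quad + g^{ij}\Bigl(\frac{\partial k}{\partial x_i} + i\frac{\partial u}{\partial x_i}\Bigr)\Bigl(\frac{\partial k}{\partial x_j} + i\frac{\partial u}{\partial x_j}\Bigr)\exp(k + iu).
\end{aligned}$$

Thus the imaginary part of $\Delta\psi/\psi$ equals

$$\frac{\partial g^{ij}}{\partial x_i}\frac{\partial u}{\partial x_j} + g^{ij}\frac{\partial^2 u}{\partial x_i \partial x_j} + 2g^{ij}\frac{\partial k}{\partial x_i}\frac{\partial u}{\partial x_j},$$

therefore the imaginary part of $(\Delta\psi/\psi)\,e^{2k}\Omega$ equals

$$\frac{\partial g^{ij}}{\partial x_i}\frac{\partial u}{\partial x_j} + g^{ij}\frac{\partial^2 u}{\partial x_i \partial x_j} + 2g^{ij}\frac{\partial k}{\partial x_i}\frac{\partial u}{\partial x_j}$$

times $e^{2k}\Omega$. On the other hand

$$D_{\operatorname{grad} u}\bigl(e^{2k}\, dx_1 \wedge \cdots \wedge dx_n\bigr) = \bigl(D_{\operatorname{grad} u}\, e^{2k}\bigr)(dx_1 \wedge \cdots \wedge dx_n) + e^{2k} D_{\operatorname{grad} u}(dx_1 \wedge \cdots \wedge dx_n).$$

However

$$D_{\operatorname{grad} u}\, e^{2k} = 2e^{2k} g^{ij}\frac{\partial u}{\partial x_j}\frac{\partial k}{\partial x_i}$$

while

$$\begin{aligned}
D_{\operatorname{grad} u}(dx_1 \wedge \cdots \wedge dx_n)
&= d\Bigl(g^{1j}\frac{\partial u}{\partial x_j}\Bigr) \wedge dx_2 \wedge \cdots \wedge dx_n \\
&\quad + dx_1 \wedge d\Bigl(g^{2j}\frac{\partial u}{\partial x_j}\Bigr) \wedge dx_3 \wedge \cdots \wedge dx_n \\
&\quad + \cdots \\
&\quad + dx_1 \wedge dx_2 \wedge \cdots \wedge d\Bigl(g^{nj}\frac{\partial u}{\partial x_j}\Bigr).
\end{aligned}$$

Since

$$d\Bigl(g^{1j}\frac{\partial u}{\partial x_j}\Bigr) = \frac{\partial g^{1j}}{\partial x_1}\frac{\partial u}{\partial x_j}\, dx_1 + g^{1j}\frac{\partial^2 u}{\partial x_1 \partial x_j}\, dx_1 + \text{a linear combination of } dx_2, \ldots, dx_n,$$

and similar statements hold for every other expression $d(D_{\operatorname{grad} u}(x_\alpha))$ we see that $D_{\operatorname{grad} u}(dx_1 \wedge \cdots \wedge dx_n)$ equals

$$\Bigl(\frac{\partial g^{ij}}{\partial x_i}\frac{\partial u}{\partial x_j} + g^{ij}\frac{\partial^2 u}{\partial x_i \partial x_j}\Bigr)\, dx_1 \wedge \cdots \wedge dx_n.$$

Thus $D_{\operatorname{grad} u}(e^{2k}\, dx_1 \wedge \cdots \wedge dx_n)$ equals

$$\Bigl(2g^{ij}\frac{\partial k}{\partial x_i}\frac{\partial u}{\partial x_j} + \frac{\partial g^{ij}}{\partial x_i}\frac{\partial u}{\partial x_j} + g^{ij}\frac{\partial^2 u}{\partial x_i \partial x_j}\Bigr) e^{2k}\, dx_1 \wedge \cdots \wedge dx_n.$$

This proves our lemma.

II. Suppose now that $\psi = \exp(k + iu) \in D$ and that $L_V\psi = 0$. The vector field $\operatorname{grad} u/\|\operatorname{grad} u\|$ is well defined on the set $S$ of points in $M^n$ at which $\operatorname{grad} u \ne 0$. Since $\operatorname{grad} u$ is everywhere tangent to the boundary, this vector field defines a local action of the real line on $S$ assuming $S$ is not empty. Actually we want to prove that $S$ is empty. This will imply that $u$ is a constant, which in turn will imply that $\psi$ is a constant times a real valued function.

We will now assume that $S$ is nonempty and get a contradiction. Each of the vector fields $\operatorname{grad} u$ and $\operatorname{grad} u/\|\operatorname{grad} u\|$ defines a local action of the real line on $S$. Although we cannot be sure that in either case we get an actual flow it is still true that for either of these actions if $p \in S$ there is a maximal open subinterval of the real line such that the action of $t$ on $p$ is defined for all $t$ in the subinterval. For the action on $S$ associated with $\operatorname{grad} u/\|\operatorname{grad} u\|$ we will use $t \times p$ to denote the point into which $p$ is sent after time $t$. For the action associated with $\operatorname{grad} u$ we will use $t \circ p$. We will say that $p \in S$ is ×-limited if the ×-action of $t$ on $p$ is not defined for some positive values of $t$. We similarly define what we mean when we say that $p$ is ◦-limited.

Proposition 1. No point in $S$ is both ◦-limited and ×-limited.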
Proof. Suppose $p$ is ×-limited and let $(-a_\times, b_\times)$ be the maximal interval for which the ×-action on $p$ is defined. Since $\operatorname{grad} u/\|\operatorname{grad} u\|$ is always a vector of length one, the orbital segment consisting of the points $t \times p$ for $t \in (0, b_\times)$ is of finite length. Since our manifold is complete under its Riemannian metric, $\lim_{t \to b_\times} t \times p$ exists. If $(-a_\circ, b_\circ)$ is the maximal interval on which the ◦-action on $p$ is defined, $b_\circ$ cannot be finite. This is so because the image in $S$ of $(0, b_\times)$ would equal the image of $(0, b_\circ)$, so that $\lim_{t \to b_\circ} (t \circ p)$ would exist. This limit could not belong to $S$ and therefore would be a fixed point under the action associated with $\operatorname{grad} u$. However such a situation cannot arise at a fixed point of a local action associated with a vector field. This establishes our proposition.
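As a sanity check on Lemma 1, which is used in the next step, one can look at the flat case $M^n = \mathbb{R}^n$ with $g^{ij} = \delta^{ij}$ and $\Omega = dx_1 \wedge \cdots \wedge dx_n$. This special case is an illustration only and is not part of the original argument. With $\psi = \exp(k + iu)$ the computation in the proof of the lemma reduces to:

```latex
\begin{align*}
\frac{\Delta\psi}{\psi}
  &= \Delta k + i\,\Delta u + (\nabla k + i\nabla u)\cdot(\nabla k + i\nabla u),\\
\operatorname{Im}\frac{\Delta\psi}{\psi}
  &= \Delta u + 2\,\nabla k\cdot\nabla u,\\
e^{2k}\operatorname{Im}\frac{\Delta\psi}{\psi}
  &= e^{2k}\Delta u + \nabla\bigl(e^{2k}\bigr)\cdot\nabla u
   = \operatorname{div}\bigl(e^{2k}\nabla u\bigr).
\end{align*}
```

In flat space the Lie derivative of $f\,\Omega$ along a vector field $X$ is $\operatorname{div}(fX)\,\Omega$, so the last line is exactly the statement of the lemma. If moreover $-\Delta\psi + V\psi = 0$ with $V$ real valued, then $\Delta\psi/\psi = V$ is real and therefore $\operatorname{div}(e^{2k}\nabla u) = 0$.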
Recall that $\Omega$ was the differential form on $M_n$ giving rise to the standard measure and that because we are assuming that $L_V(\exp(k + iu)) = 0$ our lemma tells us that $D_{\operatorname{grad} u}(e^{2k}\Omega) = 0$. This is well known to imply that the measure associated with $e^{2k}\Omega$ is invariant under the local action arising from $\operatorname{grad} u$. Since $D_{\operatorname{grad} u}(e^{2k}\Omega) = 0$ it follows that $D_{\operatorname{grad} u/\|\operatorname{grad} u\|}(\|\operatorname{grad} u\|\, e^{2k}\Omega) = 0$. From this it follows that the measure associated with $\|\operatorname{grad} u\|\, e^{2k}\Omega$ is invariant under the local action gotten from $\operatorname{grad} u/\|\operatorname{grad} u\|$.

Because $\psi \in D$ we know that both $\psi$ and $\|\operatorname{grad}\psi\|$ are in $L^2(M^n, \mu)$. Thus the measure $m$ associated with $e^{2k}\Omega = \psi\bar\psi\,\Omega$ is finite. We know $\|\operatorname{grad}\psi\|^2 \in L^1(M^n, \mu)$ since $\|\operatorname{grad}\psi\| \in L^2(M^n, \mu)$. However $\operatorname{grad}\psi = \bigl(\frac{\partial k}{\partial x_i} + i\frac{\partial u}{\partial x_i}\bigr)\psi$, so

$$\int_{M_n} \bigl(\|\operatorname{grad} k\|^2 + \|\operatorname{grad} u\|^2\bigr)\,\psi\bar\psi < \infty.$$

Hence $\|\operatorname{grad} u\| \in L^2(M_n, m)$ and therefore, $m$ being finite, $\|\operatorname{grad} u\| \in L^1(M^n, m)$. Consequently the measure associated with $\|\operatorname{grad} u\|\, e^{2k}\Omega$ is finite. Thus both flows we are dealing with possess finite invariant measures on $S$ whose null sets are the same as those for the measure associated with $\Omega$.

Since no point can be both ◦-limited and ×-limited either the points in $S$ that are not ◦-limited have positive measure or the same is true for the points that are not ×-limited. Suppose for definiteness that the points in $S$ that are not ◦-limited form a set $A$ of positive measure. Because our invariant measure is finite, this would contradict the Poincaré recurrence theorem. Since the same reasoning applies if the set of points that are not ×-limited is of positive measure, we can conclude that if $\psi = \exp(k + iu) \in D$ and $L_V\psi = 0$ for some real valued function $V$ then $S$ is empty, so $\psi$ equals a constant times some real valued function.

Suppose now that $\psi_1$ and $\psi_2$ were two linearly independent real valued functions in $D$ for which $L_V(\psi_1) = L_V(\psi_2) = 0$, and let $\psi = \psi_1 + i\psi_2$. If $\psi_1$ and $\psi_2$ had no common zero, $\psi$ would vanish nowhere and therefore would equal a constant times a real valued function, contradicting the linear independence of $\psi_1$ and $\psi_2$. Thus Theorem 1 as stated in the Introduction is proved.

III. In what follows we will assume $M_n = M^n$. Next we need to know that if $V$ is a $C^\infty$ real valued function on $M_n$ that is bounded below and we let the formal operator
sending $\psi$ into $-\Delta\psi + V\psi$ act on $C_0^\infty(M_n)$, the closure of this operator is self-adjoint. This is proved in [7, p. 86] when $V \ge 1$ everywhere, and moreover it is shown there that if $\psi$ is in the domain of this extended operator then $\|\operatorname{grad}\psi\| \in L^2(M_n, \mu)$. Since for any densely defined symmetric operator $T$ and any real constant $\lambda$ it is true that $\overline{T - \lambda I} = \overline{T} - \lambda I$ and $(T - \lambda I)^* = T^* - \lambda I$, we can substitute the condition that $V$ is bounded below for the restriction $V \ge 1$ everywhere. It is well known that any eigenfunction of $-\Delta + V$ is of class $C^\infty$ when $V$ satisfies our conditions. We can thus conclude that any eigenfunction of our extended operator $-\Delta + V$ belongs to $D$. Thus from Theorem 1 above we can conclude that if $\psi_1$ and $\psi_2$ are two linearly independent real valued eigenfunctions of our self-adjoint operator $-\Delta + V$ belonging to the same eigenspace, then $\psi_1$ and $\psi_2$ have a common zero. Then, given our assumptions concerning $M^n$ and $V$ we can conclude

Theorem 2. If $\psi$ is any eigenfunction of $-\Delta + V$ that is nowhere equal to zero then the eigenspace to which $\psi$ belongs contains a positive function. In this case the eigenspace is one dimensional.

Proof. If $\psi$ is an eigenfunction that is nowhere zero and we write $\psi$ as $\psi_1 + i\psi_2$, where $\psi_1$ and $\psi_2$ are real valued, then $\psi_1$ and $\psi_2$ cannot have a common zero. By the above remarks $\psi_1$ and $\psi_2$ cannot be linearly independent, from which it follows that $\psi$ must be a constant multiple of a real valued function. Since this function cannot equal zero anywhere either it or its negative is everywhere positive. Since our eigenspace contains a positive function, it follows from Theorem 1 that this eigenspace must be of dimension one.
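A standard textbook example, not taken from the paper, makes Theorems 1 and 2 concrete. Take $M^n = \mathbb{R}^2$ and the hypothetical choice $V(x, y) = x^2 + y^2$, which is $C^\infty$ and bounded below. The two lowest eigenvalues of $-\Delta + V$ and their eigenfunctions are:

```latex
\begin{align*}
\psi_0 &= e^{-(x^2+y^2)/2}, & (-\Delta + V)\psi_0 &= 2\psi_0,\\
\psi_1 &= x\,e^{-(x^2+y^2)/2}, & (-\Delta + V)\psi_1 &= 4\psi_1,\\
\psi_2 &= y\,e^{-(x^2+y^2)/2}, & (-\Delta + V)\psi_2 &= 4\psi_2,\\
\psi_1 + i\psi_2 &= (x + iy)\,e^{-(x^2+y^2)/2}, & \psi_1(0,0) &= \psi_2(0,0) = 0.
\end{align*}
```

The eigenfunction $\psi_0$ is positive and its eigenspace is one dimensional, in line with Theorem 2. The eigenspace for the eigenvalue 4 is two dimensional, the linearly independent real eigenfunctions $\psi_1$ and $\psi_2$ share the zero $(0, 0)$ as Theorem 1 requires, and every element of this eigenspace vanishes at the origin, so none of them is nowhere zero.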
Finally let us consider the case where $M^n$ is compact but is allowed to have a boundary. We get a self-adjoint operator $-\Delta + V$, each of whose eigenfunctions is of class $C^\infty$. Hence each eigenspace is contained in $D$. Theorem 1 implies that in this case also we have

Theorem 3. If $\psi_1$ and $\psi_2$ are linearly independent real valued functions in an eigenspace of $-\Delta + V$ there exists a point $p$ such that $\psi_1(p) = \psi_2(p) = 0$.

It should be noted that the case in Theorem 3 in which $M^n = M_n$, so there is no boundary, and $V = 0$ was the principal theorem in [3]. Of course our previous results yield a different generalization of the result of Gichev.

References
1. Berezin, F.A., Shubin, M.A.: “The Schrödinger Equation”. Dordrecht: Kluwer Academic Publishers, 1991
2. Dunford, N., Schwartz, J.: “Linear Operators - Part 2”. New York-London: Interscience Publishers, 1963
3. Gichev, V.M.: A Note on the Common Zeros of Laplace Beltrami Eigenfunctions. Ann. Global Anal. Geom. 26, 201–208 (2004)
4. Jost, J.: “Riemannian Geometry and Geometric Analysis”. Berlin-Heidelberg-New York: Springer Verlag, 1995
5. Takhtajan, L.A.: “Quantum Mechanics for Mathematicians”. Graduate Studies in Mathematics, Vol. 95, Providence, RI: Amer. Math. Soc., 2008
6. Taylor, M.E.: “Partial Differential Equations - Basic Theory. Vol. 1”. Berlin-Heidelberg-New York: Springer, 1996
7. Taylor, M.E.: “Partial Differential Equations. Vol. 2”. Berlin-Heidelberg-New York: Springer, 1996

Communicated by B. Simon
Commun. Math. Phys. 306, 193–228 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1278-x
Communications in
Mathematical Physics
Second Order Perturbation Theory for Embedded Eigenvalues J. Faupin1, , J. S. Møller2 , E. Skibsted2 1 Institut de Mathématiques de Bordeaux, UMR-CNRS 5251, Université de Bordeaux 1,
351 Cours de la Libération, 33405 Talence Cedex, France. E-mail:
[email protected]
2 Institut for Matematiske Fag, Aarhus Universitet, Ny Munkegade, 8000 Aarhus C, Denmark.
E-mail:
[email protected];
[email protected] Received: 1 July 2010 / Accepted: 29 December 2010 Published online: 18 June 2011 – © Springer-Verlag 2011
Abstract: We study second order perturbation theory for embedded eigenvalues of an abstract class of self-adjoint operators. Using an extension of the Mourre theory, under assumptions on the regularity of bound states with respect to a conjugate operator, we prove upper semicontinuity of the point spectrum and establish the Fermi Golden Rule criterion. Our results apply to massless Pauli-Fierz Hamiltonians for arbitrary coupling.
Contents
1. Introduction
   1.1 Assumptions
   1.2 Main results
2. Application to the Spectral Theory of Pauli-Fierz Models
   2.1 Massless Pauli-Fierz Hamiltonians
   2.2 Checking the abstract assumptions
   2.3 Results
   2.4 Example: the massless Nelson model
3. Reduced Limiting Absorption Principle at an Eigenvalue
4. Upper Semicontinuity of Point Spectrum
5. Second Order Perturbation Theory
   5.1 Second order perturbation theory – simple case
   5.2 Fermi Golden Rule criterion – general case
Appendix A. Domain Question
References

Partially supported by the Center for Theory in Natural Sciences, Aarhus University.
1. Introduction In this second of a series of papers, we study second order perturbation theory for embedded eigenvalues of an abstract class of self-adjoint operators. Perturbation theory for isolated eigenvalues of finite multiplicity is well-understood, at least if the family of operators under consideration is analytic in the sense of Kato (see [Ka,RS]). The question is more subtle when dealing with unperturbed eigenvalues embedded in the continuous spectrum. A method to tackle this problem, which we shall not develop here, is based on analytic deformation techniques and gives rise to a notion of resonances. It appeared in [AC,BC] and was further extended by many authors in different contexts (let us mention [Si,RS,JP,BFS] among many other contributions). As shown in [AHS], another way of studying the behaviour of embedded eigenvalues under perturbation is based on Mourre’s commutator method ([Mo]). We shall develop this second approach from an abstract point of view in this paper. We mainly require two conditions: The first one corresponds to a set of assumptions needed in order to use the Mourre method (see Conditions 1.3 below). We shall work with an extension of the Mourre theory which we call singular Mourre theory, and which is closely related to the ones developed in [Sk,MS,GGM1]. Singular Mourre theory refers to the situation where the commutator of the Hamiltonian with the chosen “conjugate operator” is not controlled by the Hamiltonian itself. The regular Mourre theory, studied for instance in [Mo,ABG,HuSp,HS,Ca,CGH], is a particular case of the theory considered here. A feature of singular Mourre theory is to allow one to derive spectral properties of so-called Pauli-Fierz Hamiltonians. This shall be discussed in Sect. 2. Our second set of assumptions concerns the regularity of bound states with respect to a conjugate operator (see Conditions 1.7, 1.9 and 1.10 below). 
Related questions are discussed in detail, in an abstract framework, in the companion paper [FMS] (see also [Ca,CGH]). Our main concerns are to study upper semicontinuity of point spectrum (Theorem 1.14) and to show that the Fermi Golden Rule criterion (Theorem 1.15) holds. If the Fermi Golden Rule condition is not fulfilled we shall still obtain an expansion to second order of perturbed eigenvalues. Before precisely stating our results and comparing them to the literature, we introduce the abstract framework in which we shall work.

1.1. Assumptions. We introduce first our basic conditions, Conditions 1.3, which are related to a set of conditions used in [GGM1]. For a comparison we refer the reader to Remark 1.4 6). Let $\mathcal{H}$ be a complex Hilbert space. Suppose that $H$ and $M$ are self-adjoint operators on $\mathcal{H}$, with $M \ge 0$, and suppose that a symmetric operator $R$ is given such that $D(R) \supseteq D(H)$. Let

$$H' := M + R \quad \text{defined on } \mathcal{D} := D(M) \cap D(H). \tag{1.1}$$

Under Condition 1.3 (1), we shall see that $\mathcal{D}$ is dense in $\mathcal{H}$ (see Remark 1.4 2) below). Operators are according to our convention always densely defined. Observe also that we do not impose the condition that $H'$ is closed. To make contact to [GGM1], we note that the operator closure of $H'$ at some points in our exposition will coincide with the operator $H'$ used in Hypothesis (M1) in [GGM1] (see Remark 1.4 6) for a further comment). Let

$$\mathcal{G} := D\bigl(M^{1/2}\bigr) \cap D\bigl(|H|^{1/2}\bigr), \tag{1.2}$$

equipped with the norm of the intersection topology defined by

$$\|u\|_{\mathcal{G}}^2 := \bigl\|M^{1/2}u\bigr\|_{\mathcal{H}}^2 + \bigl\||H|^{1/2}u\bigr\|_{\mathcal{H}}^2 + \|u\|_{\mathcal{H}}^2. \tag{1.3}$$
Let $A$ be a closed, maximal symmetric operator on $\mathcal{H}$. In particular, introducing deficiency indices $n_\mp = \dim \operatorname{Ker}(A^* \pm i)$, either $n_+ = 0$ or $n_- = 0$. For simplicity we shall assume that $n_+ = 0$ so that $A$ generates a $C_0$-semigroup of isometries $\{W_t\}_{t \ge 0}$ (if $n_- = 0$ we may mimic the theory explained below with $A \to -A$). At this point we refer to e.g. [Da, Th. 10.4.4]. We recall that the $C_0$-semigroup property means (see e.g. [GGM1, Subsect. 2.5] for a short general discussion of $C_0$-semigroups, and [HP, Chap. 10] for an extensive study) that the map $[0, \infty[\, \ni t \mapsto W_t \in \mathcal{B}(\mathcal{H})$ obeys $W_0 = I$, $W_t W_s = W_{t+s}$ for $t, s \ge 0$, and $\operatorname{w-lim}_{t \to 0^+} W_t = I$. Here $\mathcal{B}(\mathcal{H})$ denotes the set of bounded operators on $\mathcal{H}$ and w-lim stands for weak limit. We also recall that any $C_0$-semigroup on a Hilbert space is automatically strongly continuous on $[0, \infty[$, cf. [HP, Th. 10.6.5]. The operator $A$ is the generator of the $C_0$-semigroup $\{W_t\}_{t \ge 0}$ meaning that

$$D(A) = \bigl\{u \in \mathcal{H},\ \lim_{t \to 0^+} (it)^{-1}(W_t u - u) \text{ exists}\bigr\} \quad \text{and} \quad Au = \lim_{t \to 0^+} (it)^{-1}(W_t u - u). \tag{1.4}$$

We write $W_t = e^{itA}$. For any Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, we denote by $\mathcal{B}(\mathcal{H}_1; \mathcal{H}_2)$ the set of bounded operators from $\mathcal{H}_1$ to $\mathcal{H}_2$. We use the notation $\langle B \rangle := (1 + B^*B)^{1/2}$ for any closed operator $B$. Throughout the paper, $C_j$, $j = 1, 2, \ldots$, will denote positive constants that may differ from one proof to another. Let us recall the following general definition from [GGM1]:
Definition 1.1. Let $\{W_{1,t}\}$, $\{W_{2,t}\}$ be two $C_0$-semigroups on Hilbert spaces $\mathcal{H}_1$, $\mathcal{H}_2$ with generators $A_1$, $A_2$ respectively. A bounded operator $B \in \mathcal{B}(\mathcal{H}_1; \mathcal{H}_2)$ is said to be in $C^1(A_1; A_2)$ if

$$\|W_{2,t}B - BW_{1,t}\|_{\mathcal{B}(\mathcal{H}_1; \mathcal{H}_2)} \le Ct, \quad 0 \le t \le 1, \tag{1.5}$$

for some positive constant $C$.

We have the following accompanying remarks and definitions.

Remarks 1.2. 1) By [GGM1, Prop. 2.29], $B \in \mathcal{B}(\mathcal{H}_1; \mathcal{H}_2)$ is of class $C^1(A_1; A_2)$ if and only if the sesquilinear form ${}_2[B, iA]_1$ defined on $D(A_2^*) \times D(A_1)$ by

$$\langle \phi, {}_2[B, iA]_1 \psi \rangle = i\langle B^*\phi, A_1\psi \rangle - i\langle A_2^*\phi, B\psi \rangle, \tag{1.6}$$

is bounded relatively to the topology of $\mathcal{H}_2 \times \mathcal{H}_1$. The associated bounded operator in $\mathcal{B}(\mathcal{H}_1; \mathcal{H}_2)$ is denoted by $[B, iA]^0$ and we have

$$[B, iA]^0 = \operatorname{s-lim}_{t \to 0^+} t^{-1}[BW_{1,t} - W_{2,t}B], \tag{1.7}$$

where s-lim stands for strong limit. We say that $B$ is of class $C^2(A_1; A_2)$ if and only if $B \in C^1(A_1; A_2)$ and $[B, iA]^0 \in C^1(A_1; A_2)$.
2) We recall (see [ABG]) that if $A$ and $B$ are self-adjoint operators on a Hilbert space $\mathcal{H}$, $B$ is said to be in $C^1(A)$ if there exists $z \in \mathbb{C} \setminus \mathbb{R}$ such that $(B - z)^{-1} \in C^1(A; A)$ (meaning here that $\mathcal{H}_j = \mathcal{H}$ and $A_j = A$, $j = 1, 2$). In that case in fact $(B - z)^{-1} \in C^1(A; A)$ for all $z \in \rho(B)$ ($\rho(B)$ is the resolvent set of $B$).
3) The standard Mourre class, cf. [Mo], is a subset of $C^1(A)$ given as follows: Notice that for any $B \in C^1(A)$ the commutator form $[B, iA]$ defined on $D(B) \cap D(A)$ extends uniquely (by continuity) to a bounded form $[B, iA]^0$ on $D(B)$. We shall say that $B$ is Mourre-$C^1(A)$ if $[B, iA]^0$ is a $B$-bounded operator on $\mathcal{H}$. The subclass of Mourre-$C^1(A)$ operators in $C^1(A)$ is in this paper denoted by $C^1_{\mathrm{Mo}}(A)$.
Let us now state our first set of conditions, which is based on the setting introduced in the beginning of this subsection, in particular the $C_0$-semigroup of isometries $W_t = e^{itA}$, $t \ge 0$:
Conditions 1.3.
(1) $H \in C^1_{\mathrm{Mo}}(M)$.
(2) There is an interval $I \subseteq \mathbb{R}$ such that for all $\eta \in I$, there exist $c_0 > 0$, $C_1 \in \mathbb{R}$, $f_\eta \in C_0^\infty(\mathbb{R})$ with $0 \le f_\eta \le 1$ and $f_\eta = 1$ in a neighbourhood of $\eta$, and a compact operator $K_0$ on $\mathcal{H}$ such that, in the sense of quadratic forms on $\mathcal{D}$,
$$H' \ge c_0 I - C_1 f_\eta^\perp(H)^2 \langle H \rangle - K_0, \tag{1.8}$$
where $f_\eta^\perp(H) = 1 - f_\eta(H)$.
(3) $\mathcal{G}$ is “boundedly-stable” under $\{W_t\}$ and $\{W_t^*\}$, i.e. $W_t \mathcal{G} \subseteq \mathcal{G}$, $W_t^* \mathcal{G} \subseteq \mathcal{G}$, $t > 0$, and for all $\phi \in \mathcal{G}$,
$$\sup_{0 < t < 1} \|W_t \phi\|_{\mathcal{G}} < \infty, \qquad \sup_{0 < t < 1} \|W_t^* \phi\|_{\mathcal{G}} < \infty. \tag{1.9}$$
Let $A_{\mathcal{G}}$ denote the generator of the $C_0$-semigroup $W_t|_{\mathcal{G}}$ and let $A_{\mathcal{G}^*}$ denote the generator of the $C_0$-semigroup given as the extension of $W_t$ to $\mathcal{G}^*$ (see Remark 1.4 1) for justification).
(4) $H \in C^2(A_{\mathcal{G}}; A_{\mathcal{G}^*})$ (see Remark 1.4 2) for justification of notation), and for all $\phi \in \mathcal{D}$,
$$H' \phi = [H, iA]^0 \phi. \tag{1.10}$$
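As an informal illustration of the strong-limit formula (1.7) behind the commutator $[H, iA]^0$ in (1.10), the following Python sketch checks the finite-dimensional analogue with matrices. It is only a sketch under simplifying assumptions: the matrices `A` and `B` are random Hermitian matrices of norm 1 (hypothetical stand-ins, not operators from the paper), $W_t = e^{itA}$ is unitary, and all domain questions disappear in finite dimension. With these normalizations the difference quotient $t^{-1}[BW_t - W_tB]$ differs from $i(BA - AB)$ by at most $t$ in operator norm.

```python
import numpy as np

def unitary_group(A, t):
    """Return W_t = exp(i t A) for a Hermitian matrix A via its spectral decomposition."""
    eigvals, U = np.linalg.eigh(A)
    return (U * np.exp(1j * t * eigvals)) @ U.conj().T

def difference_quotient(B, A, t):
    """Finite-dimensional analogue of t^{-1} [B W_t - W_t B] in (1.7)."""
    W = unitary_group(A, t)
    return (B @ W - W @ B) / t

def commutator(B, A):
    """[B, iA] = i (B A - A B)."""
    return 1j * (B @ A - A @ B)

def random_hermitian(n, rng):
    """Random Hermitian matrix normalized to operator norm 1 (illustrative data only)."""
    X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    Hm = (X + X.conj().T) / 2
    return Hm / np.linalg.norm(Hm, 2)

rng = np.random.default_rng(0)
A = random_hermitian(4, rng)
B = random_hermitian(4, rng)

steps = (1e-1, 1e-2, 1e-3)
errors = [
    np.linalg.norm(difference_quotient(B, A, t) - commutator(B, A), 2)
    for t in steps
]
for t, err in zip(steps, errors):
    print(f"t = {t:g}: operator-norm error = {err:.3e}")
```

The linear decay of the error in $t$ is the matrix counterpart of the estimate (1.5); in the infinite-dimensional setting of Conditions 1.3 the corresponding statements hold only on the spaces specified there.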
We have several accompanying remarks. In Remarks 1.4 1)–3) we introduce further notation, justify the notation already used, and state a version of the so-called virial theorem.
Remarks 1.4.
1) Due to the bounded stability (1.9), the closed graph theorem and a density argument, it follows that $W_t|_{\mathcal{G}}$ belongs to $\mathcal{B}(\mathcal{G})$ and that $W_t|_{\mathcal{G}}$ is a $C_0$-semigroup, cf. [GGM1, Lemma 2.33]. Arguing similarly we verify that each $W_t$ extends by continuity to a bounded operator on $\mathcal{G}^*$ and that the extensions form a $C_0$-semigroup in $\mathcal{G}^*$. This justifies the notations $A_{\mathcal{G}}$ and $A_{\mathcal{G}^*}$ in Condition 1.3 (3).
2) It follows from Condition 1.3 (1) that $\mathcal{D}$ is a core for $H$ as well as for $M$, see e.g. [GG] or [GGM1]. This condition is transcribed from [Sk] and is stronger than [GGM1, Hypothesis (M1)], cf. 6) given below. Another consequence of Condition 1.3 (1) is the following alternative description of the space $\mathcal{G}$: Let $G$ be the Friedrichs extension of the operator $M + \langle H \rangle$ on $\mathcal{D}$. Then $D(\sqrt{G}) = \mathcal{G}$; this follows from [GGM1, Prop. 3.8]. In Appendix A we give an elementary proof. In particular
$$\mathcal{D} \text{ is dense in } \mathcal{G}. \tag{1.11}$$
Notice that due to (1.11) we can uniquely consider the operators $H$ and $H'$ as being members of $\mathcal{B}(\mathcal{G}; \mathcal{G}^*)$. Hence, in particular, writing $H' \in \mathcal{B}(\mathcal{G}; \mathcal{G}^*)$ we have the identity (1.10) for all $\phi \in \mathcal{G}$ and we can legitimately introduce the notation
$$H'' := [H', iA]^0 \in \mathcal{B}(\mathcal{G}; \mathcal{G}^*). \tag{1.12}$$
Second Order Perturbation Theory
3) Suppose Conditions 1.3. Then the following identity holds for all $\phi_1 \in \mathcal{D} \cap D(A^*)$ and $\phi_2 \in \mathcal{D} \cap D(A)$:
$$\langle \phi_1, (M + R) \phi_2 \rangle = i \langle H \phi_1, A \phi_2 \rangle - i \langle A^* \phi_1, H \phi_2 \rangle. \tag{1.13}$$
This is a consequence of (1.7). Another (related) consequence of (1.7) is the following version of the virial theorem: For any eigenstate, $(H - \lambda)\psi = 0$, with $\psi \in D(M^{1/2})$,
$$\langle \psi, (M + R) \psi \rangle := \|M^{1/2} \psi\|^2 + \langle \psi, R \psi \rangle = 0, \tag{1.14}$$
see [GGM1, Prop. 4.2]. Observe that due to (1.11), the assumptions of [GGM1, Prop. 4.2] are indeed satisfied. Notice also that some regularity assumption of $H$ with respect to an operator $A$ is needed for the virial theorem for the pair $(H, A)$ to hold (see [GG]). As a standard corollary of the virial theorem we have under Conditions 1.3 that the number of eigenvalues of $H$ in any compact interval $J \subseteq I$ is finite and that each such eigenvalue has finite multiplicity (here we assume that the corresponding eigenstates are in $D(M^{1/2})$). Besides, in Sect. 3, we will recall a version of the Limiting Absorption Principle (LAP) established in [GGM1] (see Theorem 3.1 of the present paper), which implies that under Conditions 1.3, $H$ has no singular continuous spectrum in $I$. The fact that $H'$ is a “commutator”, cf. (1.10) and (1.13), is also important in the proof of the LAP in [GGM1] (this is indeed an integral part of any known proof of the LAP in the spirit of Mourre).
4) The conditions of the regular Mourre theory considered for instance in [Mo,ABG,HuSp,HS,Ca,CGH] constitute a particular case of Conditions 1.3 assuming that $M = 0$. In [Mo,ABG,HS,Ca,CGH], the conjugate operator $A$ is supposed to be self-adjoint, whereas in [HuSp] the weaker assumption that $A$ is the generator of a $C_0$-semigroup of isometries is required. Notice that in the case where $M = 0$ and $A$ is self-adjoint, Condition 1.3 (3) is replaced by the stronger condition: $\sup_{|t| < 1} \|e^{itA} \phi\|_{D(H)} < \infty$ for any $\phi \in D(H)$. If $\mathcal{H}$ is separable it follows from [HP, Lemma 10.2.1] that the latter condition is a consequence of the weaker condition that $e^{itA} D(H) \subseteq D(H)$ for all $t \in \mathbb{R}$. This equivalence is also true without the separability condition, see [ABG, Prop. 3.2.5]. To our knowledge, no similar equivalence is known for semigroups. It should also be noticed that for $M = 0$ the first part of Condition 1.3 (4) leads to the assumption that $\langle H \rangle^{-1/2} H'' \langle H \rangle^{-1/2}$ is bounded.
5) The idea of splitting the formal commutator $i[H, A]$ into an $H$-unbounded piece, $M$, and an $H$-bounded piece, $R$, appeared first in [Sk]. As was shown in [Sk], and later in [GGM2], this extension of the Mourre theory allows one to study spectral properties of $N$-body systems coupled to bosonic fields (see also [MS] for the use of related assumptions in a different context). This will be discussed more precisely in the next section.
6) We notice that Conditions 1.3 (with $K_0 = 0$ in (2)) are stronger than Hypotheses (M1)–(M5) used in [GGM1]. As mentioned at the beginning of this subsection, the operator $H'$ in [GGM1] is supposed to be closed; it corresponds to the closure of the operator $H'$ considered in this paper (compare Hypothesis (M1) in [GGM1] with Condition 1.3 (1), and see [GGM1, Lemma 2.26]). Therefore, in particular, the results proved in [GGM1] hold under Conditions 1.3, see Remark 3.2 2) for a further discussion.
Throughout the discussion below we impose (mostly tacitly) Conditions 1.3. We introduce the following classes of operators (to be considered as classes of “perturbations”):
Definition 1.5. We say that a symmetric operator $V$ with $D(V) \supseteq D(H)$, $\varepsilon$-bounded relatively to $H$, is in $\mathcal{V}_1$ if $V \in C^1(A_{\mathcal{G}}; A_{\mathcal{G}^*})$ and $V' := [V, iA]^0$ is given as an $H$-bounded operator. For any $V \in \mathcal{V}_1$, we set
$$\|V\|_1 := \|V (H - i)^{-1}\| + \|V' (H - i)^{-1}\|. \tag{1.15}$$
It follows from the Kato-Rellich Theorem that for any $V \in \mathcal{V}_1$ the operator $H + V$ is self-adjoint with $D(H + V) = D(H)$.
Definition 1.6. We say that $V \in \mathcal{V}_1$ is in $\mathcal{V}_2$ if $V' \in C^1(A_{\mathcal{G}}; A_{\mathcal{G}^*})$, and we set
$$\|V\|_2 := \|V\|_1 + \|V''\|_{\mathcal{B}(\mathcal{G}; \mathcal{G}^*)}, \tag{1.16}$$
where $V'' := [V', iA]^0$.
Our main assumptions on the unperturbed eigenstates are stated in Condition 1.7 and in its stronger version Condition 1.9.
Condition 1.7. If $\lambda \in I$ is an eigenvalue of $H$, any eigenstate $\psi$ associated to $\lambda$, $H\psi = \lambda\psi$, satisfies $\psi \in D(A) \cap D(M)$.
Remark 1.8. Under Condition 1.7 and with $\psi$ given as there, one verifies using (1.7) and the fact that $\mathcal{D}$ is dense in $D(H)$ that $\psi \in D(HA) := \{\phi \in D(A) \mid A\phi \in D(H)\}$, cf. Remark 1.4 3).
Condition 1.9. If $\lambda \in I$ is an eigenvalue of $H$, any eigenstate $\psi$ associated to $\lambda$, $H\psi = \lambda\psi$, satisfies $\psi \in D(A^2) \cap D(M)$.
The (possibly existing) perturbed eigenstates may fulfill the following condition:
Condition 1.10. For any compact interval $J \subseteq I$ there exist $\gamma > 0$ and a subset $\mathcal{B}_{1,\gamma}$ of the ball centered at 0 with radius $\gamma$ in $\mathcal{V}_1$,
$$\mathcal{B}_{1,\gamma} \subseteq \{V \in \mathcal{V}_1, \ \|V\|_1 \le \gamma\}, \tag{1.17}$$
such that $\{0\} \subset \mathcal{B}_{1,\gamma}$, $\mathcal{B}_{1,\gamma}$ is star-shaped and symmetric with respect to 0, and the following holds: There exists $C > 0$ such that, if $V \in \mathcal{B}_{1,\gamma}$ and $(H + V - \lambda)\psi = 0$ with $\lambda \in J$, then $\psi \in D(A) \cap D(M)$ and
$$\|A\psi\| \le C \|\psi\|. \tag{1.18}$$
Observe that Conditions 1.9 and 1.10 are both stronger than Condition 1.7. Condition 1.7 alone is insufficient for our main theorems to hold, and we need to assume either Condition 1.9 or Condition 1.10. In the application we give in Sect. 2, we shall verify Condition 1.10 rather than Condition 1.9; see more precisely Proposition 2.4 and Remark 2.5 2).
The following two conditions are needed for our version of the so-called Fermi Golden Rule criterion. The first condition is a technical addition to Conditions 1.3:
Condition 1.11. $D(M^{1/2}) \cap D(H) \cap D(A^*)$ is dense in $D(A^*)$.
Remarks 1.12.
1) Suppose the following modification of the part of Condition 1.3 (3) concerning the adjoint semigroup: $\mathcal{D}$ is boundedly-stable under $\{W_t^*\}$, i.e. $W_t^* \mathcal{D} \subseteq \mathcal{D}$, $t > 0$, and for all $\phi \in \mathcal{D}$,
$$\sup_{0 < t < 1} \|W_t^* \phi\|_{\mathcal{D}} < \infty. \tag{1.19}$$
Then $\mathcal{D} \cap D(A^*)$ is dense in $D(A^*)$, cf. [GGM1, Remark 2.35]. This statement is of course stronger than Condition 1.11.
2) In our applications Condition 1.11 can be avoided upon changing the definition of $\mathcal{V}_1$. Explicitly, this modification is given by imposing in Definition 1.5 the following additional condition (replacing $\varepsilon$-boundedness with respect to $H$): $V$ is $\langle H \rangle^{1/2}$-bounded. (See Remark 5.2 1).)
Our second condition is the so-called Fermi Golden Rule condition.
Condition 1.13. Suppose Conditions 1.7 and 1.11. Suppose $\lambda \in \sigma_{\mathrm{pp}}(H)$ and let $P$ denote the eigenprojection $P = E_H(\{\lambda\})$ and $\bar{P} = I - P$. For given $V \in \mathcal{V}_1$ there exists $c > 0$ such that
$$P V \, \mathrm{Im}\big( (H - \lambda - i0^+)^{-1} \bar{P} \big) V P \ge c P. \tag{1.20}$$
We shall see in Sect. 3 that the left-hand side of (1.20) defines a bounded operator for any $V \in \mathcal{V}_1$ (see Theorem 3.3 and Remark 5.2 1) for details). This point might be surprising for the reader due to the low degree of regularity imposed by Condition 1.7 (for example $P$ may not map into $D(A^2)$ under the stated conditions, see the end of the next subsection for a further discussion).
1.2. Main results.
We have the following result on upper semicontinuity of the point spectrum of $H$, showing, in other words, that the total multiplicity of the perturbed eigenvalues near an unperturbed one, $\lambda$, cannot exceed the multiplicity of $\lambda$.
Theorem 1.14. Assume that Conditions 1.3 and Condition 1.10 hold. Let $\lambda \in I$ and let $J \subseteq I$ be a compact interval including $\lambda$ such that $\sigma_{\mathrm{pp}}(H) \cap J = \{\lambda\}$. Fix $\gamma > 0$ and $\mathcal{B}_{1,\gamma}$ as in Condition 1.10. There exists $0 < \gamma' \le \gamma$ such that if $V \in \mathcal{B}_{1,\gamma}$ and $\|V\|_1 \le \gamma'$, the total multiplicity of the eigenvalues of $H + V$ in $J$ is at most $\dim \mathrm{Ker}(H - \lambda)$.
Notice that the quantity $\dim \mathrm{Ker}(H - \lambda)$ is finite. This is in fact a consequence of Conditions 1.3 and Condition 1.7, cf. Remark 1.4 3). We remark that Theorem 1.14 is an abstract version of [AHS, Th. 2.5], where upper semicontinuity of the point spectrum of $N$-body Schrödinger operators is established. The proof, given in Sect. 4, is essentially the same. In the case where $H$ does not have eigenvalues in $J$, we do not need Condition 1.10 to establish upper semicontinuity of the point spectrum. More precisely, we will prove that $\sigma_{\mathrm{pp}}(H + \sigma V) \cap J = \emptyset$ for $|\sigma|$ small enough under the condition that $V \in \mathcal{V}_2$ (see Corollary 4.1). If it is only required that $V \in \mathcal{V}_1$, the result still holds true provided we assume in addition that any eigenstate of $H + \sigma V$ belongs to $D(M^{1/2})$ (see Corollary 4.2). One might suspect that there is a semistability result similar to the one stated in Theorem 1.14, obtained by replacing Condition 1.10 by Condition 1.9 (assuming now smallness of $\|V\|_2$). Although there is a formal argument, Conditions 1.3 are insufficient for a rigorous proof. Nevertheless the analogous assertion is true in the special
case where $H$ does not have eigenvalues in the interval $J$, cf. Corollary 4.1. Notice also that another special case, although treated under additional conditions, is part of Theorem 1.15 stated below.
For any $V \in \mathcal{V}_1$ and $\sigma \in \mathbb{R}$ we set $H_\sigma := H + \sigma V$. A main result of this paper is the following assertion on absence of eigenvalues of $H_\sigma$ for small non-vanishing $|\sigma|$ and for a $V$ fulfilling (1.20). It will be proven in Sect. 5.
Theorem 1.15. Assume that Conditions 1.3, Condition 1.7 and Condition 1.11 hold. Assume that Condition 1.13 holds for some $V \in \mathcal{V}_1$. Let $J \subseteq I$ be any compact interval such that $\sigma_{\mathrm{pp}}(H) \cap J = \{\lambda\}$. Suppose one of the following two conditions:
i) Condition 1.9 and $V \in \mathcal{V}_2$.
ii) Condition 1.10 and $V \in \mathcal{B}_{1,\gamma}$.
There exists $\sigma_0 > 0$ such that for all $\sigma \in \, ]-\sigma_0, \sigma_0[ \, \setminus \{0\}$,
$$\sigma_{\mathrm{pp}}(H_\sigma) \cap J = \emptyset. \tag{1.21}$$
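To give a feeling for the quantity appearing in the Fermi Golden Rule condition (1.20), here is a small numerical sketch in Python. It is not the paper's setting: the model is a hypothetical discretized Friedrichs-type toy (one bound state at $\lambda = 0$ coupled with constant strength to a grid of energies on $[-1, 1]$), and $i0^+$ is replaced by $i\varepsilon$ with a fixed $\varepsilon > 0$, since a finite matrix has no continuous spectrum. The function name `golden_rule_matrix` and all parameters are assumptions made for illustration.

```python
import numpy as np

def golden_rule_matrix(H, V, lam, eps, tol=1e-9):
    """Finite-dimensional stand-in for P V Im((H - lam - i*eps)^{-1} Pbar) V P.

    H, V are Hermitian matrices; P is the spectral projection of H onto {lam}.
    """
    eigvals, U = np.linalg.eigh(H)
    at_lam = np.abs(eigvals - lam) < tol
    P = U[:, at_lam] @ U[:, at_lam].conj().T
    # Im (x - lam - i*eps)^{-1} = eps / ((x - lam)^2 + eps^2).
    # Setting the weight to 0 on Ran(P) implements the factor Pbar = I - P.
    weights = np.where(at_lam, 0.0, eps / ((eigvals - lam) ** 2 + eps ** 2))
    im_reduced_resolvent = (U * weights) @ U.conj().T
    return P @ V @ im_reduced_resolvent @ V @ P, P

# Hypothetical toy model: bound state at energy 0 (index 0) embedded in a
# discretized "continuum" of midpoints of [-1, 1]; none of them equals 0.
n_grid, eps = 400, 0.05
dx = 2.0 / n_grid
grid = -1.0 + (np.arange(n_grid) + 0.5) * dx
H = np.diag(np.concatenate(([0.0], grid)))

# Coupling of constant strength 1 between the bound state and the continuum;
# the factor sqrt(dx) turns sums over grid points into Riemann sums.
V = np.zeros((n_grid + 1, n_grid + 1))
V[0, 1:] = np.sqrt(dx)
V[1:, 0] = np.sqrt(dx)

Gamma, P = golden_rule_matrix(H, V, lam=0.0, eps=eps)
gamma = Gamma[0, 0].real
print("toy golden-rule constant:", gamma)
print("continuum value 2*arctan(1/eps):", 2 * np.arctan(1 / eps))
```

In this toy model the matrix `Gamma` equals `gamma * P`, and `gamma` approximates $\int_{-1}^{1} \varepsilon/(x^2 + \varepsilon^2)\,dx = 2\arctan(1/\varepsilon)$, which tends to $\pi$ as $\varepsilon \to 0$. Strict positivity of this limit is the toy analogue of the constant $c > 0$ in (1.20).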
This type of theorem is usually referred to as the Fermi Golden Rule criterion (or in short just Fermi Golden Rule). In the framework of regular Mourre theory (that is, in particular, if $M = 0$, see Remark 1.4 4) above), if $A$ is self-adjoint, Fermi Golden Rule is well known. It was first proved in [AHS] for $N$-body Schrödinger operators, under an assumption of the type $V \in \mathcal{V}_2$ and using exponential bounds for eigenstates (yielding in particular an analogue of Condition 1.9). In [HS], Theorem 1.15 is proved in an abstract setting assuming Condition 1.9 and the $H$-boundedness of $V''$. In [Ca,CGH], still in the framework of regular Mourre theory and with $A$ self-adjoint, it is shown that an assumption of the type $H \in C^4(A)$ implies Condition 1.9. A similar result also appears in [GJ] under slightly weaker (“local”) assumptions, still requiring, however, the boundedness of four commutators.
Theorem 1.15 improves the previous results for the following two reasons: First, as mentioned above, Conditions 1.3 require neither that $A$ be self-adjoint nor that the formal commutator $i[H, A]$ be $H$-bounded, which can be important in applications (see in particular Sect. 2 on Pauli-Fierz Hamiltonians). Second, we prove that the Fermi Golden Rule criterion also holds under Condition 1.10 and the hypothesis $V \in \mathcal{B}_{1,\gamma}$ (that is, under Condition ii) of Theorem 1.15), which to our knowledge constitutes a new result even in the framework of regular Mourre theory. Let us emphasize that Condition 1.10 does not contain the assumption that the eigenstates are in the domain of $A^2$, but only in the domain of $A$. The price we have to pay lies in the fact that Condition 1.10 involves information on the possibly existing perturbed eigenstates, which in concrete models might (at first glance) seem rather difficult to obtain. Nevertheless in a separate paper, [FMS], we provide abstract hypotheses under which Condition 1.10 is indeed satisfied. As a consequence, we obtain that Theorem 1.15 applies to a class of Quantum Field Theory models provided that the Hamiltonian only has two bounded commutators with $A$ (defined in a suitable sense), see Sect. 2. We emphasize that from an abstract point of view, working with $C^2(A)$ conditions, verifying Condition 1.10 is in fact doable while Condition 1.9 might be false, see [FMS, Ex. 1.1] for a counterexample.
Recently Rasmussen together with one of us ([MR]) studied the essential energy-momentum spectrum of the translation invariant massive Nelson Hamiltonian $H$. In particular the authors construct, for a given total momentum $P$ and non-threshold energy $E$, a conjugate operator $A$ with respect to which the fiber Hamiltonian $H(P)$ satisfies a Mourre estimate, locally uniformly in $E$ and $P$. From the point of view of the
present paper this model is of interest because $H(P)$ is of class $C^2(A)$ but (presumably) not of class $C^3(A)$. This means that, even though the context of [MR] is regular Mourre theory, the improvements of this paper and its companion [FMS] are both essential to conclude anything about the structure of embedded non-threshold eigenvalue bands.
We shall use different methods to prove Theorem 1.15 depending on whether we assume i) or ii). In the first case, we shall obtain an expansion to second order of any possibly existing perturbed eigenvalue near the unperturbed one $\lambda$. In the second case, ii), this will also be done under the further hypothesis $\dim \mathrm{Ran}(P) = 1$, but we shall proceed differently if the unperturbed eigenvalue is degenerate. In both cases, a key ingredient of the proof consists in obtaining a “reduced Limiting Absorption Principle” at an eigenvalue (see Theorems 3.3 and 3.4 below).
The paper is organized as follows: In the next section, we consider Pauli-Fierz Hamiltonians, which constitute our main example of a model satisfying the abstract conditions stated above. Section 3 concerns reduced Limiting Absorption Principles at an eigenvalue $\lambda$ of $H$. In Sect. 4, we study upper semicontinuity of the point spectrum and prove Theorem 1.14. Finally in Sect. 5, we study second order perturbation theory assuming either Condition 1.9 or Condition 1.10, and we prove Theorem 1.15. In Appendix A we present a simple proof of the technically important statement (1.11).
2. Application to the Spectral Theory of Pauli-Fierz Models
2.1. Massless Pauli-Fierz Hamiltonians.
The main example we have in mind fitting into the framework of Sect. 1 consists of an abstract class of Quantum Field Theory models, sometimes called massless Pauli-Fierz models (see for instance [DG,DJ,GGM2,FMS]). The latter describe a “small” quantum system linearly coupled to a massless quantized radiation field. The corresponding Hamiltonian $H_v^{\mathrm{PF}}$ acts on the Hilbert space $\mathcal{H}_{\mathrm{PF}} := \mathcal{K} \otimes \Gamma(\mathfrak{h})$, where $\mathcal{K}$ is the Hilbert space for the small quantum system, and $\Gamma(\mathfrak{h})$ is the symmetric Fock space over $\mathfrak{h} := L^2(\mathbb{R}^d, dk)$. The latter describes a field of massless scalar bosons and is defined by
$$\Gamma(\mathfrak{h}) := \mathbb{C} \oplus \bigoplus_{n=1}^{+\infty} \otimes_s^n \mathfrak{h}, \tag{2.1}$$
where $\otimes_s^n$ denotes the symmetric $n$-th tensor product of $\mathfrak{h}$. The operator $H_v^{\mathrm{PF}}$ depends on the form factor $v$ and is written as
$$H_v^{\mathrm{PF}} := K \otimes 1_{\Gamma(\mathfrak{h})} + 1_{\mathcal{K}} \otimes d\Gamma(|k|) + \phi(v), \tag{2.2}$$
where $K$ is a bounded below operator on $\mathcal{K}$ describing the dynamics of the small system, $d\Gamma(|k|)$ is the second quantization of the operator of multiplication by $|k|$ and $\phi(v) := (a^*(v) + a(v))/\sqrt{2}$. We recall that the second quantization of an operator $\omega$ on $\mathfrak{h}$ is given by its restriction to the $n$-boson Hilbert space as
$$d\Gamma(\omega)|_{\mathbb{C}} = 0, \qquad d\Gamma(\omega)|_{\otimes_s^n \mathfrak{h}} = \sum_{j=1}^{n} 1_{\mathfrak{h}} \otimes \cdots \otimes 1_{\mathfrak{h}} \otimes \omega \otimes 1_{\mathfrak{h}} \otimes \cdots \otimes 1_{\mathfrak{h}}, \tag{2.3}$$
where in the sum above, $\omega$ acts on the $j$-th component of the tensor product. The form factor $v$ is a linear operator from $\mathcal{K}$ to $\mathcal{K} \otimes \mathfrak{h}$, and $a^*(v)$, $a(v)$ are the usual creation
and annihilation operators associated with v (see [BD,GGM2]). For convenience, we assume that K ≥ 0.
(2.4)
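The second quantization formula (2.3) can be made concrete in a finite-dimensional toy setting. The sketch below is only an illustration under stated assumptions: the one-boson space is replaced by $\mathbb{C}^3$, `omega` is a hypothetical diagonal matrix, and the operator is built on the full $n$-fold tensor product (it commutes with permutations of the factors, so it restricts to the symmetric subspace). The helper name `d_gamma_n` is not from the paper.

```python
import numpy as np
from functools import reduce

def d_gamma_n(omega, n):
    """Restriction of dGamma(omega) to the n-fold tensor product, as in (2.3).

    Returns sum_j 1 x ... x omega (j-th slot) x ... x 1; for n = 0 the
    operator acts as 0 on the vacuum sector C.
    """
    m = omega.shape[0]
    if n == 0:
        return np.zeros((1, 1))
    identity = np.eye(m)
    total = np.zeros((m ** n, m ** n), dtype=omega.dtype)
    for j in range(n):
        factors = [identity] * n
        factors[j] = omega  # omega acts on the j-th tensor factor only
        total = total + reduce(np.kron, factors)
    return total

# Hypothetical one-boson "dispersion relation" on C^3.
w = np.array([0.5, 1.0, 2.0])
omega = np.diag(w)
m = len(w)

D2 = d_gamma_n(omega, 2)

# Swap operator S(e_i x e_k) = e_k x e_i on C^3 x C^3.
S = np.zeros((m * m, m * m))
for i in range(m):
    for k in range(m):
        S[k * m + i, i * m + k] = 1.0

# On e_i x e_k the operator acts as multiplication by w[i] + w[k].
two_boson_energies = np.diag(D2).reshape(m, m)
print(two_boson_energies)
```

The printed matrix has entries `w[i] + w[k]`: the energy of a two-boson product state is the sum of the one-boson energies, which is exactly what (2.3) encodes. The commutation `D2 @ S == S @ D2` reflects that $d\Gamma(\omega)$ preserves the symmetric subspace.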
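The hypotheses we make are slightly stronger than the ones considered in [GGM2]. The first one, Hypothesis (H0), is related to the fact that the small system is assumed to be confined:
(H0) $(K + 1)^{-1}$ is compact on $\mathcal{K}$.
For any $\gamma > 0$, let $\mathcal{O}_\gamma \subseteq \mathcal{B}(D(K^\gamma); \mathcal{K} \otimes \mathfrak{h})$ be the set of operators which extend by continuity from $D(K^\gamma)$ to an element of $\mathcal{B}(\mathcal{K}; D(K^\gamma)^* \otimes \mathfrak{h})$, that is
$$\mathcal{O}_\gamma := \big\{ v \in \mathcal{B}(D(K^\gamma); \mathcal{K} \otimes \mathfrak{h}), \ \exists C > 0, \ \forall \psi \in D(K^\gamma), \ \|[(K + 1)^{-\gamma} \otimes 1_{\mathfrak{h}}] v \psi\|_{\mathcal{K} \otimes \mathfrak{h}} \le C \|\psi\|_{\mathcal{K}} \big\}. \tag{2.5}$$
Let $0 \le \tau < 1/2$ be fixed. Our first assumption on the form factor is the following:
(I1) $v$ and $[1_{\mathcal{K}} \otimes |k|^{-1/2}] v$ belong to $\mathcal{O}_\tau$.
It follows from [GGM2, Prop. 4.6] that, if $[1_{\mathcal{K}} \otimes |k|^{-1/2}] v \in \mathcal{O}_\tau$, then $H_v^{\mathrm{PF}}$ is self-adjoint with domain
$$D(H_v^{\mathrm{PF}}) = D(H_0^{\mathrm{PF}}) = D(K) \otimes \Gamma(\mathfrak{h}) \cap \mathcal{K} \otimes D(d\Gamma(|k|)). \tag{2.6}$$
We consider the unitary operator
$$T : L^2(\mathbb{R}^d) \to L^2(\mathbb{R}^+) \otimes L^2(S^{d-1}) =: \tilde{\mathfrak{h}} \tag{2.7}$$
defined by $(Tu)(\omega, \theta) = \omega^{(d-1)/2} u(\omega\theta)$. Lifting it to the full Hilbert space $\mathcal{H}_{\mathrm{PF}}$ by setting $\mathcal{T} := 1_{\mathcal{K}} \otimes \Gamma(T)$ (recall that $\Gamma(T)$ is defined by its restriction to the $n$-boson Hilbert space as $\Gamma(T)|_{\otimes_s^n \mathfrak{h}} = T \otimes \cdots \otimes T$ for $n \ge 1$, and $\Gamma(T)|_{\mathbb{C}} = 1_{\mathbb{C}}$ for $n = 0$), we get a unitary map
$$\mathcal{T} : \mathcal{H}_{\mathrm{PF}} \to \tilde{\mathcal{H}}_{\mathrm{PF}} := \mathcal{K} \otimes \Gamma(\tilde{\mathfrak{h}}). \tag{2.8}$$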
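For the reader's convenience, here is a short check (a standard polar-coordinates computation, not taken from the paper) that the map $T$ with $(Tu)(\omega, \theta) = \omega^{(d-1)/2} u(\omega\theta)$ is isometric, and that it carries multiplication by $|k|$ into multiplication by $\omega$. The formula for the inverse is stated under the same conventions.

```latex
\begin{align*}
\|Tu\|^2_{L^2(\mathbb{R}^+) \otimes L^2(S^{d-1})}
  &= \int_0^\infty \int_{S^{d-1}} \omega^{d-1} \, |u(\omega\theta)|^2 \, d\theta \, d\omega
   = \int_{\mathbb{R}^d} |u(k)|^2 \, dk
   = \|u\|^2_{L^2(\mathbb{R}^d)}, \\
(T^{-1} f)(k) &= |k|^{-(d-1)/2} \, f\big(|k|, k/|k|\big), \\
\big(T \, |k| \, u\big)(\omega, \theta)
  &= \omega^{(d-1)/2} \, \omega \, u(\omega\theta)
   = \omega \, (Tu)(\omega, \theta).
\end{align*}
```

The first line uses $dk = \omega^{d-1} \, d\omega \, d\theta$; the weight $\omega^{(d-1)/2}$ in the definition of $T$ is chosen precisely to absorb this Jacobian.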
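This allows us to write the Hamiltonian in polar coordinates in the following way:
$$\tilde{H}_v^{\mathrm{PF}} := \mathcal{T} H_v^{\mathrm{PF}} \mathcal{T}^{-1} = K \otimes 1_{\Gamma(\tilde{\mathfrak{h}})} + 1_{\mathcal{K}} \otimes d\Gamma(\omega) + \phi(\tilde{v}), \tag{2.9}$$
on $\tilde{\mathcal{H}}_{\mathrm{PF}}$, where
$$\tilde{v} := [1_{\mathcal{K}} \otimes T] v \tag{2.10}$$
is a linear operator from $\mathcal{K}$ to $\mathcal{K} \otimes \tilde{\mathfrak{h}}$, and $d\Gamma(\omega)$ denotes the second quantization of the operator of multiplication by $\omega \in \mathbb{R}^+$.
Let us consider a function $d \in C^\infty((0, \infty))$ satisfying $d'(\omega) < 0$, $|d'(\omega)| \le C \omega^{-1} d(\omega)$ for some positive constant $C$, $d(\omega) = 1$ if $\omega \ge 1$, and $\lim_{\omega \to 0} d(\omega) = +\infty$ (see Fig. 1). Let
$$\tilde{\mathcal{O}}_\tau := [1_{\mathcal{K}} \otimes T] \mathcal{O}_\tau. \tag{2.11}$$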
Fig. 1. The map ω → d(ω)
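The following further assumptions on the interaction are made:
(I2) The following holds:
$$\big[ 1_{\mathcal{K}} \otimes (1 + \omega^{-1/2}) \omega^{-1} d(\omega) \otimes 1_{L^2(S^{d-1})} \big] \tilde{v} \in \tilde{\mathcal{O}}_\tau,$$
$$\big[ 1_{\mathcal{K}} \otimes (1 + \omega^{-1/2}) d(\omega) \partial_\omega \otimes 1_{L^2(S^{d-1})} \big] \tilde{v} \in \tilde{\mathcal{O}}_\tau,$$
(I3) $\big[ 1_{\mathcal{K}} \otimes \partial_\omega^2 \otimes 1_{L^2(S^{d-1})} \big] \tilde{v} \in \mathcal{B}(D(K^{1/2}); \mathcal{K} \otimes \tilde{\mathfrak{h}})$.
Let us recall the definition of the conjugate operator used in [GGM2]. Let $\chi \in C_0^\infty([0, \infty))$ be such that $\chi(\omega) = 0$ if $\omega \ge 1$ and $\chi(\omega) = 1$ if $\omega \le 1/2$. For $0 < \delta \le 1/2$, the function $m_\delta \in C^\infty([0, \infty))$ is defined by
$$m_\delta(\omega) = \chi\Big(\frac{\omega}{\delta}\Big) d(\delta) + (1 - \chi)\Big(\frac{\omega}{\delta}\Big) d(\omega), \tag{2.12}$$
(see Fig. 2). Consider the following operator $\tilde{a}_\delta$ acting on $\tilde{\mathfrak{h}}$:
$$\tilde{a}_\delta := i m_\delta(\omega) \frac{\partial}{\partial \omega} + \frac{i}{2} \frac{d m_\delta}{d\omega}(\omega), \qquad D(\tilde{a}_\delta) = H_0^1(\mathbb{R}^+) \otimes L^2(S^{d-1}), \tag{2.13}$$
where $H_0^1(\mathbb{R}^+)$ denotes the closure of $C_0^\infty(\mathbb{R}^+)$ in $H^1(\mathbb{R}^+)$ and $C_0^\infty(\mathbb{R}^+)$ is the set of smooth compactly supported functions on $\mathbb{R}^+$. Then the operator $\tilde{A}_\delta$ on $\tilde{\mathcal{H}}_{\mathrm{PF}}$ is defined by $\tilde{A}_\delta := 1_{\mathcal{K}} \otimes d\Gamma(\tilde{a}_\delta)$. It is proved in [GGM2, Sect. 6] that $\tilde{A}_\delta$ is closed, densely defined and maximal symmetric.
Let $M_\delta := 1_{\mathcal{K}} \otimes d\Gamma(m_\delta)$ and $R_\delta(\tilde{v}) := -\phi(i \tilde{a}_\delta \tilde{v})$. Then $M_\delta$ is self-adjoint, $M_\delta \ge 0$, and if $v$ satisfies Hypotheses (I1) and (I2), then, by [GGM2, Lemma 6.4 i)], $R_\delta(\tilde{v})$ is symmetric and $\tilde{H}_v^{\mathrm{PF}}$-bounded.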
Fig. 2. The map ω → m δ (ω)
2.2. Checking the abstract assumptions.
In this subsection, we verify that, on the Hilbert space $\mathcal{H} = \tilde{\mathcal{H}}_{\mathrm{PF}}$, the operators $H = \tilde{H}_v^{\mathrm{PF}}$, $M = M_\delta$, $R = R_\delta(\tilde{v})$, $A = \tilde{A}_\delta$ fulfill Conditions 1.3, 1.10 and 1.11 stated in Sect. 1 (provided that $v$ satisfies, in particular, the hypotheses stated above). The following lemma shows that Condition 1.3 (1) is satisfied.
Lemma 2.1. Assume that $v$ satisfies Hypothesis (I1). Then for all $\delta > 0$,
$$\tilde{H}_v^{\mathrm{PF}} \in C^1_{\mathrm{Mo}}(M_\delta). \tag{2.14}$$
Proof. The fact that $\tilde{H}_v^{\mathrm{PF}} \in C^1(M_\delta)$ follows from [GGM2, Lemma 6.4 i)]. Moreover, since $m_\delta$ is bounded and $[\omega, m_\delta] = 0$, we have that $[\tilde{H}_v^{\mathrm{PF}}, iM_\delta]^0 = -\phi(i m_\delta \tilde{v})$ by [GGM2, Cor. 4.13]. Using again that $m_\delta$ is bounded, we then conclude from Hypothesis (I1) and [GGM2, Prop. 4.6] that $[\tilde{H}_v^{\mathrm{PF}}, iM_\delta]^0$ is $\tilde{H}_0^{\mathrm{PF}}$-bounded, and hence $\tilde{H}_v^{\mathrm{PF}}$-bounded (with relative bound 0).
Lemma 2.1 together with [GGM2, Props. 6.6, 6.7 and Theorem 7.12] imply:
Proposition 2.2. Assume Hypothesis (H0) and that $v$ satisfies Hypotheses (I1), (I2) and (I3). Then for all $E_0 \in \mathbb{R}$, there exists $\delta_0 > 0$ such that for all $0 < \delta \le \delta_0$, the operators $H = \tilde{H}_v^{\mathrm{PF}}$, $M = M_\delta$, $R = R_\delta(\tilde{v})$, $A = \tilde{A}_\delta$ fulfill Conditions 1.3 with $I = (-\infty, E_0)$.
Remark 2.3. We remark that the formulation of the Mourre estimate stated in [GGM2, Th. 7.12] is not the same as the one considered in Condition 1.3 (2). However, one can verify that the latter is indeed a consequence of [GGM2, Th. 7.12].
In order to verify Condition 1.10, we need to impose a further condition on $v$:
(I4) The form $(K \otimes 1_{\tilde{\mathfrak{h}}}) \tilde{v} - \tilde{v} K$ extends by continuity from $D(K \otimes 1_{\tilde{\mathfrak{h}}}) \times D(K)$ to an element of $\tilde{\mathcal{O}}_{1/2}$.
Here $\tilde{\mathcal{O}}_{1/2}$ is defined as $\tilde{\mathcal{O}}_\tau$ (see (2.5) and (2.11)). Notice that, assuming (I1), the statement above is meaningful.
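We have to identify the set $\mathcal{B}_{1,\gamma}$ used in Condition 1.10. To this end, let us first introduce some definitions. Let $\mathcal{I}_{\mathrm{PF}}(d)$ be defined by:
$$\mathcal{I}_{\mathrm{PF}}(d) := \{ v \in \mathcal{L}(\mathcal{K}; \mathcal{K} \otimes \mathfrak{h}), \ v \text{ satisfies (I1), (I2), (I3), (I4)} \}. \tag{2.15}$$
Observe that $\mathcal{I}_{\mathrm{PF}}(d)$ can be equipped with a norm, $\|\cdot\|_{\mathrm{PF}}$, matching the four conditions (I1), (I2), (I3), (I4) (see [FMS, Subsect. 5.1]).
Let $v \in \mathcal{I}_{\mathrm{PF}}(d)$. Let $W_{\delta,t}$ denote the $C_0$-semigroup generated by $\tilde{A}_\delta$. We set
$$\mathcal{G}_\delta^{\mathrm{PF}} := D(|\tilde{H}_v^{\mathrm{PF}}|^{1/2}) \cap D(M_\delta^{1/2}). \tag{2.16}$$
By Proposition 2.2, we have that $H = \tilde{H}_v^{\mathrm{PF}}$, $M = M_\delta$, $A = \tilde{A}_\delta$ fulfill Condition 1.3 (3), and hence $W_{\delta,t}|_{\mathcal{G}_\delta^{\mathrm{PF}}}$ is a $C_0$-semigroup (see Remark 1.4 1)). Its generator is denoted by $\tilde{A}_{\mathcal{G}_\delta^{\mathrm{PF}}}$. Likewise, the extension of $W_{\delta,t}$ to $(\mathcal{G}_\delta^{\mathrm{PF}})^*$ is a $C_0$-semigroup whose generator is denoted by $\tilde{A}_{(\mathcal{G}_\delta^{\mathrm{PF}})^*}$.
Let $\mathcal{V}_1^{\mathrm{PF}}$ denote the set of symmetric operators $V$, $\varepsilon$-bounded relatively to $\tilde{H}_v^{\mathrm{PF}}$, such that $V \in C^1(\tilde{A}_{\mathcal{G}_\delta^{\mathrm{PF}}}; \tilde{A}_{(\mathcal{G}_\delta^{\mathrm{PF}})^*})$ and $[V, i\tilde{A}_\delta]^0$ is $\tilde{H}_v^{\mathrm{PF}}$-bounded. It is equipped with the norm
$$\|V\|_1^{\mathrm{PF}} = \|V (\tilde{H}_v^{\mathrm{PF}} - i)^{-1}\| + \|[V, i\tilde{A}_\delta]^0 (\tilde{H}_v^{\mathrm{PF}} - i)^{-1}\|. \tag{2.17}$$
By [GGM2, Prop. 4.6], if $w$ satisfies Hypothesis (I1), then $\phi(\tilde{w})$ is $\varepsilon$-bounded relatively to $\tilde{H}_v^{\mathrm{PF}}$, and, by [GGM2, Lemma 6.4 i)], if in addition $w$ satisfies Hypothesis (I2), then, for any $\delta > 0$, $[\phi(\tilde{w}), i\tilde{A}_\delta]^0 = -\phi(i \tilde{a}_\delta \tilde{w})$ is $\tilde{H}_v^{\mathrm{PF}}$-bounded. Moreover, one can verify that the map
$$\mathcal{I}_{\mathrm{PF}}(d) \ni w \mapsto \phi(\tilde{w}) \in \mathcal{V}_1^{\mathrm{PF}} \tag{2.18}$$
is continuous (see [FMS, Lemma 5.8]). In a separate paper, [FMS], we prove (see [FMS, Th. 5.2]):
Proposition 2.4. Assume Hypothesis (H0) and let $v \in \mathcal{I}_{\mathrm{PF}}(d)$. For all $E_0 \in \mathbb{R}$, there exists $\delta_0 > 0$ such that for all $0 < \delta \le \delta_0$, the operators $H = \tilde{H}_v^{\mathrm{PF}}$, $M = M_\delta$, $R = R_\delta(\tilde{v})$, $A = \tilde{A}_\delta$ fulfill Condition 1.10. Here $I = (-\infty, E_0)$ and $\mathcal{B}_{1,\gamma}$ is given by
$$\mathcal{B}_{1,\gamma} = \{ \phi(\tilde{w}), \ w \in \mathcal{I}_{\mathrm{PF}}(d), \ \|w\|_{\mathrm{PF}} \le \tilde{\gamma} \}, \tag{2.19}$$
where $\tilde{\gamma} > 0$ is fixed sufficiently small.
Remarks 2.5.
1) Since the map (2.18) is continuous, for any $\gamma > 0$, the set $\mathcal{B}_{1,\gamma}$ is included in $\{V \in \mathcal{V}_1^{\mathrm{PF}}, \ \|V\|_1^{\mathrm{PF}} \le \gamma\}$ provided that $\tilde{\gamma}$ is chosen small enough. Moreover, $\mathcal{B}_{1,\gamma}$ is clearly star-shaped and symmetric with respect to 0. Hence the requirements of Condition 1.10 are satisfied.
2) Under the conditions of Proposition 2.4, we do not expect Condition 1.9 to be satisfied in general. Indeed, the assumption that $v \in \mathcal{I}_{\mathrm{PF}}(d)$ in the statement of Proposition 2.4 allows us to control two commutators of $\tilde{H}_v^{\mathrm{PF}}$ with $\tilde{A}_\delta$. In order to be able to conclude that Condition 1.9 is satisfied using the method of [FMS], one would need to control three commutators of $\tilde{H}_v^{\mathrm{PF}}$ with $\tilde{A}_\delta$ (see [FMS]). This would require a stronger restriction on the infrared behavior of the form factor $v$ than the one imposed by Hypotheses (I1)–(I2)–(I3).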
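In order to apply Theorems 1.14 and 1.15, it remains to verify Condition 1.11. Let
$$\mathcal{S} = D(K) \otimes \Gamma_{\mathrm{fin}}\big( C_0^\infty(\mathbb{R}^+) \otimes L^2(S^{d-1}) \big), \tag{2.20}$$
where for $E \subseteq L^2(\mathbb{R}^+) \otimes L^2(S^{d-1})$,
$$\Gamma_{\mathrm{fin}}(E) := \big\{ \Phi = (\Phi^{(0)}, \Phi^{(1)}, \Phi^{(2)}, \dots) \in \Gamma(E), \ \exists n_0, \ \Phi^{(n)} = 0 \text{ for } n \ge n_0 \big\}. \tag{2.21}$$
For any $\delta > 0$, $\mathcal{S}$ is included in $D(\tilde{H}_v^{\mathrm{PF}}) \cap D(M_\delta) \cap D(\tilde{A}_\delta)$. Moreover, $\mathcal{S}$ is a core for $\tilde{A}_\delta^*$. Therefore we get:
Proposition 2.6. Assume that $v$ satisfies Hypothesis (I1). Then, for all $\delta > 0$, the operators $H = \tilde{H}_v^{\mathrm{PF}}$, $M = M_\delta$, $A = \tilde{A}_\delta$ fulfill Condition 1.11.
Let us finally mention the particular case for which the unperturbed Hamiltonian under consideration is the non-interacting one, $\tilde{H}_0^{\mathrm{PF}}$, given by
$$\tilde{H}_0^{\mathrm{PF}} := K \otimes 1_{\Gamma(\tilde{\mathfrak{h}})} + 1_{\mathcal{K}} \otimes d\Gamma(\omega). \tag{2.22}$$
In this case, one can choose $M = 1_{\mathcal{K}} \otimes N$, where $N := d\Gamma(1_{\tilde{\mathfrak{h}}})$ is the number operator, and $A = 1_{\mathcal{K}} \otimes d\Gamma(i\partial_\omega)$. Then one can easily check the following proposition:
Proposition 2.7. Assume Hypothesis (H0). Then the operators $H = \tilde{H}_0^{\mathrm{PF}}$, $M = 1_{\mathcal{K}} \otimes N$, $R = 0$, $A = 1_{\mathcal{K}} \otimes d\Gamma(i\partial_\omega)$ fulfill Conditions 1.3 (with $I = \mathbb{R}$) and Condition 1.9.
Remark 2.8. The fact that Condition 1.9 is fulfilled under the conditions of Proposition 2.7 is obvious, since the unperturbed eigenstates are of the form $\phi \otimes \Omega$, where $\phi$ is an eigenstate of $K$, and $\Omega$ denotes the vacuum in $\Gamma(\tilde{\mathfrak{h}})$.
2.3. Results.
As a consequence of Propositions 2.2, 2.4 and 2.6, applying Theorems 1.14 and 1.15, we obtain:
Theorem 2.9. Assume Hypothesis (H0). Let $v_0, v \in \mathcal{I}_{\mathrm{PF}}(d)$. Let $J$ be a compact interval such that $\sigma_{\mathrm{pp}}(H_{v_0}^{\mathrm{PF}}) \cap J = \{\lambda\}$. Let $P_{v_0}$ denote the eigenprojection $P_{v_0} = E_{H_{v_0}^{\mathrm{PF}}}(\{\lambda\})$ and $\bar{P}_{v_0} = I - P_{v_0}$. Then the following holds:
i) There exists $\sigma_0 > 0$ such that for all $0 \le |\sigma| \le \sigma_0$, the total multiplicity of the eigenvalues of $H_{v_0}^{\mathrm{PF}} + \sigma \phi(v)$ in $J$ is at most $\dim \mathrm{Ran}(P_{v_0})$.
ii) Suppose in addition that
$$P_{v_0} \phi(v) \, \mathrm{Im}\big( (H_{v_0}^{\mathrm{PF}} - \lambda - i0^+)^{-1} \bar{P}_{v_0} \big) \phi(v) P_{v_0} \ge c P_{v_0}, \tag{2.23}$$
for some $c > 0$. Then there exists $\sigma_0 > 0$ such that for all $0 < |\sigma| \le \sigma_0$,
$$\sigma_{\mathrm{pp}}\big( H_{v_0}^{\mathrm{PF}} + \sigma \phi(v) \big) \cap J = \emptyset. \tag{2.24}$$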
Remarks 2.10.
1) In view of Propositions 2.2, 2.4 and 2.6, Theorems 1.14 and 1.15 imply Theorem 2.9 with $\tilde{H}_{v_0}^{\mathrm{PF}}$ replacing $H_{v_0}^{\mathrm{PF}}$ and $\phi(\tilde{v})$ replacing $\phi(v)$. However, using the unitary transformation $\mathcal{T}$ mapping $\mathcal{H}_{\mathrm{PF}}$ to $\tilde{\mathcal{H}}_{\mathrm{PF}}$, the statement of Theorem 2.9 clearly follows.
2) In the case where the unperturbed Hamiltonian is the non-interacting one, that is $H_{v_0}^{\mathrm{PF}} = H_0^{\mathrm{PF}}$ with $H_0^{\mathrm{PF}} = K \otimes 1_{\Gamma(\mathfrak{h})} + 1_{\mathcal{K}} \otimes d\Gamma(|k|)$, one can use Proposition 2.7 instead of Proposition 2.4 in order to conclude Theorem 2.9 ii). Indeed, it follows from [GGM2, Prop. 4.11, Lemma 6.2 and proof of Prop. 6.6] that if $v$ satisfies (I1)–(I2)–(I3), then $\phi(\tilde{v}) \in \mathcal{V}_2$ (in the sense of Definition 1.6). Hence, since Condition 1.9 is satisfied by Proposition 2.7, we can apply Theorem 1.15 with Condition i) instead of Condition ii). For a general $v_0 \in \mathcal{I}_{\mathrm{PF}}(d)$, however, we have to apply Theorem 1.15 with Condition ii) (see Remark 2.5 2) above).
The latter result (the absence of eigenvalues of $H_0^{\mathrm{PF}} + \sigma \phi(v)$ for sufficiently small $\sigma \ne 0$ according to Fermi Golden Rule) already appears in [DJ] assuming in particular that $\langle \partial_\omega \rangle^\nu \tilde{v} \in \mathcal{B}(\mathcal{K}; \mathcal{K} \otimes \tilde{\mathfrak{h}})$ for some $\nu > 1$. More recently the same result was also proven in [Go], still for sufficiently small values of the coupling constant, under the assumptions that $\partial_\omega \tilde{v}$, $\omega^{-1/2} \partial_\omega \tilde{v}$ and $\omega^{-\nu} \tilde{v}$ (for some $\nu > 1$) belong to $\mathcal{B}(\mathcal{K}; \mathcal{K} \otimes \tilde{\mathfrak{h}})$. Besides, in [DJ], upper semicontinuity of the point spectrum of $H_0^{\mathrm{PF}} + \sigma \phi(v)$ (in the sense stated in Theorem 2.9 i)) is obtained for sufficiently small $\sigma$, assuming that $\langle \partial_\omega \rangle^\nu \tilde{v} \in \mathcal{B}(\mathcal{K}; \mathcal{K} \otimes \tilde{\mathfrak{h}})$ for some $\nu > 2$. The main achievement of our paper, as far as massless Pauli-Fierz models are concerned, is to provide a method which allows us to consider $H_{v_0}^{\mathrm{PF}}$ as the unperturbed Hamiltonian, for any $v_0$ belonging to $\mathcal{I}_{\mathrm{PF}}(d)$.
A model sharing several properties with the one considered in this subsection is the so-called “standard model of non-relativistic QED”. For results on spectral theory in this context involving the Mourre method, we refer to [Sk,BFS,BFSS,DJ,FGS].
2.4. Example: the massless Nelson model.
An example of a model satisfying the hypotheses of Subsect. 2.1 is the Nelson model of confined non-relativistic quantum particles interacting with massless scalar bosons. The Hilbert space is given by
$$\mathcal{H}_{\mathrm{N}} := L^2(\mathbb{R}^{3P}) \otimes \mathcal{F}, \tag{2.25}$$
where $\mathcal{F} := \Gamma(L^2(\mathbb{R}^3))$ is the symmetric Fock space over $L^2(\mathbb{R}^3)$ (see (2.1)). The Nelson Hamiltonian acts on $\mathcal{H}_{\mathrm{N}}$ and is defined by
$$H_\rho^{\mathrm{N}} := K \otimes 1_{\mathcal{F}} + 1_{L^2(\mathbb{R}^{3P})} \otimes d\Gamma(|k|) + I_\rho(x). \tag{2.26}$$
Here $x = (x_1, \dots, x_P)$, and $K$ is a Schrödinger operator on $L^2(\mathbb{R}^{3P})$ describing the dynamics of $P$ non-relativistic particles. We suppose that $K$ is given by
$$K := \sum_{i=1}^{P} -\frac{1}{2m_i} \Delta_i + \sum_{i<j} V_{ij}(x_i - x_j) + W(x_1, \dots, x_P), \tag{2.27}$$
where the masses $m_i$ are positive, the confining potential $W$ satisfies
(W0) $W \in L^2_{\mathrm{loc}}(\mathbb{R}^{3P})$ and there exist positive constants $c_0, c_1 > 0$ and $\alpha > 2$ such that $W(x) \ge c_0 |x|^{2\alpha} - c_1$,
and the pair potentials $V_{ij}$ satisfy
(V0) The $V_{ij}$'s are $\Delta$-bounded with relative bound 0.
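Without loss of generality, we can assume that $K \ge 0$. Note that (W0) implies that Hypothesis (H0) of Subsect. 2.1 is satisfied. The coupling $I_\rho(x)$ in (2.26) is of the form
$$I_\rho(x) := \sum_{i=1}^{P} \Phi_\rho(x_i), \tag{2.28}$$
where, for $y \in \mathbb{R}^3$, $\Phi_\rho(y)$ is the field operator defined by
$$\Phi_\rho(y) := \frac{1}{\sqrt{2}} \int_{\mathbb{R}^3} \big( \rho(k) e^{-ik \cdot y} a^*(k) + \bar{\rho}(k) e^{ik \cdot y} a(k) \big) \, dk. \tag{2.29}$$
In particular, $I_\rho(x)$ can be written in the form $I_\rho(x) = \phi(N(\rho))$, where
$N(\rho) \in \mathcal{B}\big( L^2(\mathbb{R}^{3P}); L^2(\mathbb{R}^{3P}) \otimes L^2(\mathbb{R}^3) \big) = \mathcal{B}\big( L^2(\mathbb{R}^{3P}); L^2(\mathbb{R}^3; L^2(\mathbb{R}^{3P})) \big)$ is defined by
$$(N(\rho)\psi)(k)(x_1, \dots, x_P) = \sum_{j=1}^{P} e^{-ik \cdot x_j} \rho(k) \psi(x_1, \dots, x_P). \tag{2.30}$$
Hence $H_\rho^{\mathrm{N}}$ is a Pauli-Fierz Hamiltonian in the sense of Subsect. 2.1, with $\mathcal{K} = L^2(\mathbb{R}^{3P})$ and $\mathfrak{h} = L^2(\mathbb{R}^3)$. For simplicity, we assume that $\rho$ only depends on $k$ through its norm, $|k|$, and, going to polar coordinates, we introduce
$$\tilde{\rho}(\omega) = \omega \rho(\omega, 0, 0), \quad \omega \in \mathbb{R}^+. \tag{2.31}$$
Our set of conditions on $\tilde{\rho}$ is the following:
($\rho$1) $\int_0^\infty (1 + \omega^{-1}) |\tilde{\rho}(\omega)|^2 \, d\omega < \infty$,
($\rho$2) $\int_0^\infty (1 + \omega^{-1}) d(\omega)^2 \Big[ \omega^{-2} |\tilde{\rho}(\omega)|^2 + \Big| \frac{d\tilde{\rho}}{d\omega}(\omega) \Big|^2 \Big] d\omega < \infty$,
($\rho$3) $\int_0^\infty \Big| \frac{d^2\tilde{\rho}}{d\omega^2}(\omega) \Big|^2 d\omega < \infty$,
($\rho$4) $\int_0^\infty \omega^4 |\tilde{\rho}(\omega)|^2 \, d\omega < \infty$,
where $d$ denotes the function considered in Subsect. 2.1. Note that ($\rho$1)–($\rho$2)–($\rho$3) are the assumptions made in [GGM2]. The further assumption ($\rho$4) is made in order that Hypothesis (I4) of Subsect. 2.2 is satisfied. We observe that ($\rho$2) and ($\rho$4) imply ($\rho$1). The set of functions $\rho$ satisfying ($\rho$1)–($\rho$2)–($\rho$3)–($\rho$4) is denoted by $\mathcal{I}_{\mathrm{N}}(d)$. The following proposition is proven in [FMS, Subsect. 5.2]:
Proposition 2.11. Let $\rho \in \mathcal{I}_{\mathrm{N}}(d)$. Then $N(\rho)$ defined as in (2.30) belongs to $\mathcal{I}_{\mathrm{PF}}(d)$.
An example of $\rho$, and hence $\tilde{\rho}$, satisfying ($\rho$1)–($\rho$2)–($\rho$3)–($\rho$4) is
$$\rho(k) = e^{-\frac{|k|^2}{2\Lambda^2}} |k|^{-\frac{1}{2} + \varepsilon}, \qquad \tilde{\rho}(\omega) = e^{-\frac{\omega^2}{2\Lambda^2}} \omega^{\frac{1}{2} + \varepsilon}, \tag{2.32}$$
with $0 < \Lambda < \infty$ and $\varepsilon > 1$.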
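As a sanity check (a routine computation, not reproduced from the paper), one can verify ($\rho$1), ($\rho$3) and ($\rho$4) directly for $\tilde{\rho}(\omega) = e^{-\omega^2/(2\Lambda^2)} \omega^{1/2 + \varepsilon}$, writing $a = \tfrac{1}{2} + \varepsilon$. Condition ($\rho$2) involves the function $d$ and is not checked here. The Gaussian factor takes care of integrability at infinity in all three cases, so only the behavior near $\omega = 0$ matters.

```latex
\begin{align*}
(1 + \omega^{-1}) \, |\tilde{\rho}(\omega)|^2
  &= \big( \omega^{1 + 2\varepsilon} + \omega^{2\varepsilon} \big) \, e^{-\omega^2/\Lambda^2}
  && \text{integrable at } 0 \text{ since } 2\varepsilon > -1, \\
\frac{d^2 \tilde{\rho}}{d\omega^2}(\omega)
  &= \Big[ a(a-1) \, \omega^{a-2} - \frac{2a+1}{\Lambda^2} \, \omega^{a} + \frac{1}{\Lambda^4} \, \omega^{a+2} \Big] e^{-\omega^2/(2\Lambda^2)}, \\
\Big| \frac{d^2 \tilde{\rho}}{d\omega^2}(\omega) \Big|^2
  &= O\big( \omega^{2\varepsilon - 3} \big) \quad (\omega \to 0)
  && \text{integrable at } 0 \text{ when } \varepsilon > 1, \\
\omega^4 \, |\tilde{\rho}(\omega)|^2
  &= \omega^{5 + 2\varepsilon} \, e^{-\omega^2/\Lambda^2}
  && \text{integrable on } (0, \infty).
\end{align*}
```

The second-derivative condition ($\rho$3) is the one that forces the restriction $\varepsilon > 1$ in this computation.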
From Proposition 2.11 and Theorem 2.9, we obtain:

Theorem 2.12. Assume that Hypotheses (W0) and (V0) hold. Let $\rho_0,\rho\in\mathcal{I}_N(d)$. Let $J$ be a compact interval such that $\sigma_{pp}(H^N_{\rho_0})\cap J = \{\lambda\}$. Let $P_{\rho_0}$ denote the eigenprojection $P_{\rho_0} = E_{H^N_{\rho_0}}(\{\lambda\})$ and $\bar P_{\rho_0} = I - P_{\rho_0}$. Then the following holds:

i) There exists $\sigma_0>0$ such that for all $0\le|\sigma|\le\sigma_0$, the total multiplicity of the eigenvalues of $H^N_{\rho_0}+\sigma I_\rho(x)$ in $J$ is at most $\dim\mathrm{Ran}(P_{\rho_0})$.
ii) Suppose in addition that
\[ P_{\rho_0} I_\rho(x)\,\mathrm{Im}\big((H^N_{\rho_0}-\lambda-i0_+)^{-1}\bar P_{\rho_0}\big) I_\rho(x) P_{\rho_0} \ge c P_{\rho_0}, \tag{2.33} \]
for some $c>0$. Then there exists $\sigma_0>0$ such that for all $0<|\sigma|\le\sigma_0$,
\[ \sigma_{pp}\big(H^N_{\rho_0}+\sigma I_\rho(x)\big)\cap J = \emptyset. \tag{2.34} \]
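In fact, the confinement assumption (W0) allows one to make use of a unitary dressing transformation (see e.g. [GGM2,FMS]) in order to “improve” the infrared behavior of the form factor in the Hamiltonian $H^N_{\rho_0}$. More precisely, let ($\rho$1’) denote the following condition:

($\rho$1’) $\int_0^\infty (1+\omega^{-2})|\tilde\rho(\omega)|^2\,d\omega < \infty$.

Assuming that $\rho_0$ satisfies this condition, the unitary operator
\[ U_{\rho_0} := e^{-iP\phi(i\rho_0/|\cdot|)} \tag{2.35} \]
is well-defined and we can consider the Hamiltonian
\[ {H'}^{N}_{\rho_0} := (1_{\mathcal K}\otimes U_{\rho_0})H^N_{\rho_0}(1_{\mathcal K}\otimes U_{\rho_0}^*) = K'_{\rho_0}\otimes 1_{\mathcal F} + 1_{\mathcal K}\otimes d\Gamma(|k|) + I_{\rho_0}(x) - I_{\rho_0}(0), \tag{2.36} \]
where
\[ K'_{\rho_0} := K + \frac{P^2}{2}\int_{\mathbb{R}^3}\frac{|\rho_0(k)|^2}{|k|}\,dk - P\sum_{j=1}^{P}\int_{\mathbb{R}^3}\frac{|\rho_0(k)|^2}{|k|}\cos(k\cdot x_j)\,dk. \tag{2.37} \]
In the same way as in (2.30), we observe that $I_{\rho_0}(x) - I_{\rho_0}(0) = \phi(N'(\rho_0))$, where $N'(\rho_0)$ is defined by
\[ (N'(\rho_0)\psi)(k)(x_1,\dots,x_P) = \sum_{j=1}^{P}\big(e^{-ik\cdot x_j}-1\big)\rho_0(k)\psi(x_1,\dots,x_P). \tag{2.38} \]
In particular, ${H'}^{N}_{\rho_0}$ is a Pauli-Fierz Hamiltonian in the sense of Subsect. 2.1. We consider the following further conditions:

($\rho$2’) $\int_0^\infty \big|\frac{d\tilde\rho}{d\omega}(\omega)\big|^2\,d\omega < \infty$,
($\rho$3’) $\int_0^\infty (1+\omega^2)^{-1}\omega^2\big|\frac{d^2\tilde\rho}{d\omega^2}(\omega)\big|^2\,d\omega < \infty$,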
210
J. Faupin, J. S. Møller, E. Skibsted
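and we denote by $\mathcal{I}'_N(d)$ the set of functions $\rho$ satisfying ($\rho$1’)–($\rho$2’)–($\rho$3’)–($\rho$4). In [FMS, Subsect. 5.2], we verify that if $\rho_0\in\mathcal{I}'_N(d)$, then $N'(\rho_0)$ defined as in (2.38) belongs to $\mathcal{I}_{PF}(d)$. Notice that for any $0<\Lambda<\infty$ and $\varepsilon>0$, the function given in (2.32) belongs to $\mathcal{I}'_N(d)$. As in the statement of Theorem 2.12, we consider a perturbation of the Hamiltonian $H^N_{\rho_0}$ of the form $\sigma I_\rho(x)$. After the dressing transformation, the perturbation becomes
\[ \sigma I'_{\rho_0,\rho}(x) := \sigma(1_{\mathcal K}\otimes U_{\rho_0})I_\rho(x)(1_{\mathcal K}\otimes U_{\rho_0}^*) = \sigma\Big(I_\rho(x) - P\sum_{j=1}^{P}\mathrm{Re}\int_{\mathbb{R}^3}\frac{\bar\rho_0(k)\rho(k)}{|k|}e^{-ik\cdot x_j}\,dk\Big). \tag{2.39} \]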
Notice that $\sigma I'_{\rho_0,\rho}(x)$ is not a field operator in the sense of Subsect. 2.1. Hence it does not belong to the class of perturbations considered in Theorem 2.9. Nevertheless, proceeding in the same way as what we did in Subsect. 2.2 to deduce Theorem 2.9 (see in particular [FMS, Theorem 1.3 2)] for the verification of Condition 1.10 in the present context), we obtain:

Theorem 2.13. Assume that Hypotheses (W0) and (V0) are satisfied and let $\rho_0\in\mathcal{I}'_N(d)$ and $\rho\in\mathcal{I}_N(d)$. Let $J$ be a compact interval such that $\sigma_{pp}(H^N_{\rho_0})\cap J = \{\lambda\}$. Let $P_{\rho_0}$ denote the eigenprojection $P_{\rho_0} = E_{H^N_{\rho_0}}(\{\lambda\})$ and $\bar P_{\rho_0} = I - P_{\rho_0}$. Then the conclusions i) and ii) of Theorem 2.12 hold.

Observe that, thanks to the unitary dressing transformation $U_{\rho_0}$, the Fermi Golden Rule condition (2.33) is equivalent to the following one:
\[ P'_{\rho_0} I'_{\rho_0,\rho}(x)\,\mathrm{Im}\big(({H'}^{N}_{\rho_0}-\lambda-i0_+)^{-1}\bar P'_{\rho_0}\big) I'_{\rho_0,\rho}(x) P'_{\rho_0} \ge c P'_{\rho_0}, \tag{2.40} \]
where
\[ P'_{\rho_0} := E_{{H'}^{N}_{\rho_0}}(\{\lambda\}). \tag{2.41} \]
Hence the conclusions of Theorem 2.13 for $H^N_{\rho_0}$ follow from the corresponding statements for ${H'}^{N}_{\rho_0}$.

In Theorem 2.13, $\rho_0$ and $\rho$ do not belong to the same class of form factors (as far as the infrared singularity is concerned, $\rho_0$ is allowed to have a more singular infrared behavior than $\rho$). This is due to the fact that the unitary transformation $U_{\rho_0}$ is $\rho_0$-dependent, so that the Hamiltonian obtained after the transformation, ${H'}^{N}_{\rho_0}$, does not depend linearly on $\rho_0$. Thus, a perturbation of the form ${H'}^{N}_{\rho_0+\sigma\rho} - {H'}^{N}_{\rho_0}$ does not belong to the class of linear perturbations considered in this paper (at least as far as the Fermi Golden Rule criterion is concerned). Nevertheless, since the non-linear terms in $\sigma$ in the expression of ${H'}^{N}_{\rho_0+\sigma\rho} - {H'}^{N}_{\rho_0}$ act only on the particle Hilbert space $L^2(\mathbb{R}^{3P})$ (and hence, in particular, commute with the conjugate operator $\tilde A_\delta$ of Subsect. 2.1), we expect that the method of this paper can be extended to cover the case where both $\rho_0$ and $\rho$ belong to $\mathcal{I}'_N(d)$.
3. Reduced Limiting Absorption Principle at an Eigenvalue

In this section we prove two different “reduced Limiting Absorption Principles”. Assuming Conditions 1.3 and 1.7, we shall prove a Limiting Absorption Principle for the reduced unperturbed Hamiltonian $H\bar P$ (where $P = E_H(\{\lambda\})$ and $\bar P = I - P$). If the stronger Condition 1.9 is satisfied, we shall obtain a Limiting Absorption Principle for the reduced perturbed Hamiltonian $H + \alpha P + \sigma V$ (for some $\alpha>0$), provided that $V\in\mathcal{V}_2$ and that $\sigma$ is sufficiently small.

Various versions of the Limiting Absorption Principle appear in [GGM1]. We only give here the following theorem which is a particular case of the results established in [GGM1], stated in a form useful for our context. Recall the notation $\langle A\rangle = (1+A^*A)^{1/2} = (1+|A|^2)^{1/2}$.

Theorem 3.1. Assume that Conditions 1.3 hold. Suppose $J\subseteq I$ is a compact interval such that $\sigma_{pp}(H)\cap J = \emptyset$. Let $S = \{z\in\mathbb{C},\ \mathrm{Re}\,z\in J,\ 0<|\mathrm{Im}\,z|\le 1\}$. For any $1/2<s\le 1$,
\[ \sup_{z\in S}\big\|\langle A\rangle^{-s}(H-z)^{-1}\langle A\rangle^{-s}\big\| < \infty. \tag{3.1} \]
Moreover the function $S\ni z\mapsto\langle A\rangle^{-s}(H-z)^{-1}\langle A\rangle^{-s}\in\mathcal{B}(\mathcal{H})$ is uniformly Hölder continuous of order $s-1/2$. In particular, the limit
\[ \langle A\rangle^{-s}(H-\lambda-i0_+)^{-1}\langle A\rangle^{-s} := \lim_{\varepsilon\downarrow 0}\langle A\rangle^{-s}(H-\lambda-i\varepsilon)^{-1}\langle A\rangle^{-s}, \tag{3.2} \]
exists in the norm topology of $\mathcal{B}(\mathcal{H})$ uniformly in $\lambda\in J$, and the map $J\ni\lambda\mapsto\langle A\rangle^{-s}(H-\lambda-i0_+)^{-1}\langle A\rangle^{-s}\in\mathcal{B}(\mathcal{H})$ is uniformly Hölder continuous of order $s-1/2$.

Remarks 3.2. 1) Strictly speaking, the Mourre estimate formulated in Condition 1.3 (2) together with [GGM1] yield that, for any $\eta\in J$, there is a neighbourhood $I_\eta$ such that, for any compact interval $J_\eta\subseteq I_\eta$, the Limiting Absorption Principle (3.1) holds with $J_\eta$ replacing $J$. The statement of Theorem 3.1 then follows from the compactness of $J$ and a covering argument (see Step II in the proof of Theorem 3.4 below for the use of the same argument).
2) For $s=1$, the result [GGM1, Theorem 3.3] is stronger in that the bound (3.1) holds in a stronger operator topology (given in terms of the Hilbert spaces $\mathcal{G}$ and $\mathcal{G}^*$). For our purposes (3.1) suffices. A similar remark is due for the bounds (3.3) and (3.25) given below. Besides, [GGM1, Th. 3.3] does not require that $A$ is maximal symmetric, only the generator of a $C_0$-semigroup.

We shall now obtain a result similar to Theorem 3.1 for a reduced resolvent.

Theorem 3.3. Assume that Conditions 1.3 and Condition 1.7 hold. Suppose $J\subseteq I$ is a compact interval such that $\sigma_{pp}(H)\cap J = \{\lambda\}$. Let $P$ denote the eigenprojection $P = E_H(\{\lambda\})$ and let $\bar P = I - P$. Let $S = \{z\in\mathbb{C},\ \mathrm{Re}\,z\in J,\ 0<|\mathrm{Im}\,z|\le 1\}$. For any $1/2<s\le 1$,
\[ \sup_{z\in S}\big\|\langle A\rangle^{-s}(H-z)^{-1}\bar P\langle A\rangle^{-s}\big\| < \infty. \tag{3.3} \]
Moreover there exists $C>0$ such that for all $z,z'\in S$,
\[ \big\|\langle A\rangle^{-s}\big((H-z)^{-1}-(H-z')^{-1}\big)\bar P\langle A\rangle^{-s}\big\| \le C|z-z'|^{s-\frac12}. \tag{3.4} \]
In particular, the limit
\[ \langle A\rangle^{-s}(H-\lambda-i0_+)^{-1}\bar P\langle A\rangle^{-s} := \lim_{\varepsilon\downarrow 0}\langle A\rangle^{-s}(H-\lambda-i\varepsilon)^{-1}\bar P\langle A\rangle^{-s}, \tag{3.5} \]
exists in the norm topology of $\mathcal{B}(\mathcal{H})$ uniformly in $\lambda\in J$, and the map $J\ni\lambda\mapsto\langle A\rangle^{-s}(H-\lambda-i0_+)^{-1}\bar P\langle A\rangle^{-s}\in\mathcal{B}(\mathcal{H})$ is uniformly Hölder continuous of order $s-1/2$.

Proof. It follows from Conditions 1.3 and Condition 1.7 that $\sigma_{pp}(H)$ is finite in a neighbourhood of $\lambda$. Hence, possibly by considering a bigger compact interval, we can assume without loss of generality that $\lambda$ is included in the interior of $J$. Consider Condition 1.3 (2) with $\eta=\lambda$. Let $J_\lambda\subseteq J$ be a compact neighbourhood of $\lambda$ such that $f_\lambda = 1$ on a neighbourhood of $J_\lambda$. Applying Theorem 3.1 on $[\inf J,\inf J_\lambda]$ and using that $P+\bar P = I$, we obtain that
\[ \sup_{z\in\mathbb{C},\ \mathrm{Re}\,z\in[\inf J,\inf J_\lambda],\ 0<|\mathrm{Im}\,z|\le 1}\big\|\langle A\rangle^{-s}(H-z)^{-1}\bar P\langle A\rangle^{-s}\big\| < \infty, \tag{3.6} \]
and that $z\mapsto\langle A\rangle^{-s}(H-z)^{-1}\bar P\langle A\rangle^{-s}$ is Hölder continuous of order $s-1/2$ on $\{z\in\mathbb{C},\ \mathrm{Re}\,z\in[\inf J,\inf J_\lambda],\ 0<|\mathrm{Im}\,z|\le 1\}$. The same holds with $[\sup J_\lambda,\sup J]$ replacing $[\inf J,\inf J_\lambda]$. Therefore, to conclude the proof, one can verify that it is sufficient to establish the statement of Theorem 3.3 with $J$ replaced by $J_\lambda$.

We can follow the proof of [GGM1, Th. 3.3]. We emphasize the differences with [GGM1] and refer the reader to that paper for more details. We obtain from (1.8) with $\eta=\lambda$ that
\[ M + R \ge 2^{-1}c_0 I - C_2 f_\lambda^\perp(H)^2\langle H\rangle - f_\lambda(H)Kf_\lambda(H). \tag{3.7} \]
Since $f_\lambda(H)$ goes strongly to $P$ as the support of $f_\lambda$ shrinks to $\{\lambda\}$, we obtain
\[ M + R \ge 3^{-1}c_0 I - C_2 f_\lambda^\perp(H)^2\langle H\rangle - C_3 P, \tag{3.8} \]
which is valid if the support of $f_\lambda$ is sufficiently close to $\lambda$. Applying $\bar P$ from the left and from the right in (3.8) yields
\[ \bar P(M+R)\bar P \ge 3^{-1}c_0\bar P - C_2\bar P f_\lambda^\perp(H)^2\langle H\rangle\bar P. \tag{3.9} \]
Next, we can mimic the proof of [GGM1, Th. 3.3] using (3.9) and the following slightly different constructions: In Subsect. 3.4 of [GGM1] the operator $H_\varepsilon$ (related to the one from the seminal paper [Mo]) is taken as $H_\varepsilon = H - i\varepsilon H'$. Notice that here and henceforth we can assume without loss that $H'$ is closed (possibly by taking the closure). We propose to take
\[ \bar H_\varepsilon := H - i\varepsilon\bar P H'\bar P, \tag{3.10} \]
with domain $D(\bar H_\varepsilon) := D(H)\cap D(M)\cap\mathrm{Ran}(\bar P)$ on the Hilbert space $\bar{\mathcal{H}} := \bar P\mathcal{H}$. It follows from the assumption $\mathrm{Ran}(P)\subseteq D(M)$ that $\bar H_\varepsilon$ is well-defined and commutes with $\bar P$. Similarly, denoting $H_{\bar P} := H|_{D(H)\cap\mathrm{Ran}(\bar P)}$ and $M_{\bar P} := (\bar P M\bar P)|_{D(M)\cap\mathrm{Ran}(\bar P)}$, the assumption that $\mathrm{Ran}(P)\subseteq D(M)$ implies that $H_{\bar P}$ and $M_{\bar P}$ are self-adjoint. Moreover, since $H\in C^1_{Mo}(M)$ by Condition 1.3 (1), one verifies that $H_{\bar P}\in C^1_{Mo}(M_{\bar P})$, and hence in particular that $D(H_{\bar P})\cap D(M_{\bar P})$ is a core for $M_{\bar P}$. One also verifies that $\bar P H'\bar P$ coincides with the closure of $M_{\bar P}+R_{\bar P}$ defined on $D(M_{\bar P})\cap D(H_{\bar P})$, where
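$R_{\bar P} := (\bar P R\bar P)|_{D(H)\cap\mathrm{Ran}(\bar P)}$. Therefore the assumptions of [GGM1, Th. 2.25] are satisfied (see [GGM1, Lemma 2.26]), which implies that $\bar H_\varepsilon$ is closed, densely defined, and $\bar H_\varepsilon^* = \bar H_{-\varepsilon}$.

Let $\bar{\mathcal{G}} := \mathcal{G}\cap\mathrm{Ran}(\bar P)$. By Conditions 1.3 and the fact that $\mathrm{Ran}(P)\subseteq D(M)$, $\bar H_\varepsilon$ extends to a bounded operator: $\bar H_\varepsilon\in\mathcal{B}(\bar{\mathcal{G}};\bar{\mathcal{G}}^*)$. Mimicking [GGM1, Subsect. 3.4] (replacing $u\in D(H_\varepsilon)$ in Lemmata 3.9 and 3.10 by $u\in D(\bar H_\varepsilon)$, and using (3.9)), one can show that there exists $\varepsilon_0$ such that for all $0<|\varepsilon|\le\varepsilon_0$, for all $z=\eta+i\mu$ with $\eta\in J_\lambda$ and $\varepsilon\mu>0$, $\bar H_\varepsilon - z$ is invertible with bounded inverse $\bar R_\varepsilon(z)\in\mathcal{B}(\bar{\mathcal{H}};D(\bar H_\varepsilon))$. Furthermore $\bar R_\varepsilon(z)$ extends to a bounded operator in $\mathcal{B}(\bar{\mathcal{G}}^*;\bar{\mathcal{G}})$ which coincides with the inverse of $(\bar H_\varepsilon - z)\in\mathcal{B}(\bar{\mathcal{G}};\bar{\mathcal{G}}^*)$, and which satisfies
\[ \|\bar R_\varepsilon(z)\|_{\mathcal{B}(\bar{\mathcal{G}}^*;\bar{\mathcal{G}})} \le \frac{C}{|\varepsilon|}, \tag{3.11a} \]
\[ \|\bar R_\varepsilon(z)v\|_{\bar{\mathcal{G}}} \le \frac{C}{|\varepsilon|^{\frac12}}\Big(|(v,\bar R_\varepsilon(z)v)|^{\frac12} + \|v\|_{\bar{\mathcal{H}}}\Big) \quad\text{for all } v\in\bar{\mathcal{H}}, \tag{3.11b} \]
\[ \operatorname*{s-lim}_{\varepsilon\to 0^\pm}\bar R_\varepsilon(z) = (H_{\bar P}-z)^{-1}\in\mathcal{B}(\bar{\mathcal{H}}), \tag{3.11c} \]
(see [GGM1, Prop. 3.11 and Lemma 3.12]). Let
\[ \rho_\varepsilon := \langle\varepsilon A\rangle^{s-1}\langle A\rangle^{-s} = (1+\varepsilon^2|A|^2)^{\frac{s-1}{2}}(1+|A|^2)^{-\frac{s}{2}}, \tag{3.12} \]
with $1/2<s\le 1$. Instead of looking at the expectation of the resolvent $R_\varepsilon(z) := (H_\varepsilon - z)^{-1}$, $\varepsilon\ne 0$, we propose to show a differential inequality for the quantity
\[ F_\varepsilon(z) := \langle\rho_\varepsilon u,\bar P\bar R_\varepsilon(z)\bar P\rho_\varepsilon u\rangle; \tag{3.13} \]
here $u\in\mathcal{H}$, so that $\rho_\varepsilon u\in D(A)\subseteq D(A^*)$. Note that the assumption $\mathrm{Ran}(P)\subseteq D(A)$ implies that $\bar P$ leaves $D(A)$ invariant. In the same way as in the proof of Theorem 3.3 in [GGM1], one can verify that
\[ \begin{aligned} \frac{d}{d\varepsilon}F_\varepsilon(z) ={}& \Big\langle\frac{d\rho_\varepsilon}{d\varepsilon}u,\bar R_\varepsilon(z)\bar P\rho_\varepsilon u\Big\rangle + \Big\langle\rho_\varepsilon u,\bar R_\varepsilon(z)\bar P\frac{d\rho_\varepsilon}{d\varepsilon}u\Big\rangle \\ &+ \big\langle\bar R_\varepsilon(z)^*\bar P\rho_\varepsilon u, A\rho_\varepsilon u\big\rangle - \big\langle A\rho_\varepsilon u,\bar R_\varepsilon(z)\bar P\rho_\varepsilon u\big\rangle \\ &+ \varepsilon\big\langle\bar R_\varepsilon(z)^*\bar P\rho_\varepsilon u,\big(H'PA - APH' - H''\big)\bar R_\varepsilon(z)\bar P\rho_\varepsilon u\big\rangle, \end{aligned} \tag{3.14} \]
where
\[ \frac{d}{d\varepsilon}\rho_\varepsilon := (s-1)\varepsilon|A|^2\langle\varepsilon A\rangle^{s-3}\langle A\rangle^{-s} = (s-1)\varepsilon|A|^2(1+\varepsilon^2|A|^2)^{\frac{s-3}{2}}(1+|A|^2)^{-\frac{s}{2}}. \tag{3.15} \]
In particular it follows from the Spectral Theorem that
\[ \|d\rho_\varepsilon/d\varepsilon\| \le C|\varepsilon|^{s-1} \quad\text{and}\quad \|A\rho_\varepsilon\| \le C|\varepsilon|^{s-1}. \tag{3.16} \]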
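The two bounds in (3.16) reduce, via the Spectral Theorem, to scalar inequalities in the spectral variable $t=|A|$. The following Python sketch is only a numerical illustration, not part of the proof: it samples the scalar symbols of $d\rho_\varepsilon/d\varepsilon$ and $|A|\rho_\varepsilon$ on a finite grid and compares their suprema with $\varepsilon^{s-1}$. The function names, the grid and the sampled values of $s$ and $\varepsilon$ are assumptions made for the example.

```python
def d_rho_symbol(t, eps, s):
    """Scalar symbol of d(rho_eps)/d(eps) at spectral value t = |A| (illustrative)."""
    return ((s - 1.0) * eps * t * t
            * (1.0 + (eps * t) ** 2) ** ((s - 3.0) / 2.0)
            * (1.0 + t * t) ** (-s / 2.0))


def a_rho_symbol(t, eps, s):
    """Scalar symbol of |A| rho_eps at spectral value t = |A| (illustrative)."""
    return (t * (1.0 + (eps * t) ** 2) ** ((s - 1.0) / 2.0)
            * (1.0 + t * t) ** (-s / 2.0))


def sup_ratio(symbol, eps, s):
    """Max of |symbol| over a logarithmic grid in t, divided by eps**(s-1)."""
    grid = [10.0 ** (k / 50.0) for k in range(-300, 501)]  # t from 1e-6 to 1e10
    return max(abs(symbol(t, eps, s)) for t in grid) / eps ** (s - 1.0)


if __name__ == "__main__":
    for eps in (1.0, 0.1, 0.01, 0.001):
        print(eps, sup_ratio(d_rho_symbol, eps, 0.75), sup_ratio(a_rho_symbol, eps, 0.75))
```

On the sampled grid both ratios stay bounded uniformly in $\varepsilon$, which is the behaviour (3.16) asserts; a grid check of this kind cannot of course replace the analytic estimate.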
Next it follows from Conditions 1.3 and Condition 1.7 that $PA\in\mathcal{B}(\mathcal{G}^*)$, $AP\in\mathcal{B}(\mathcal{G})$, and hence that $H'PA - APH' - H''\in\mathcal{B}(\mathcal{G};\mathcal{G}^*)$.
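This implies
\[ \Big|\frac{d}{d\varepsilon}F_\varepsilon(z)\Big| \le C_1|\varepsilon|^{s-1}\|u\|_{\bar{\mathcal{H}}}\Big(\|\bar R_\varepsilon(z)\bar P\rho_\varepsilon u\|_{\bar{\mathcal{H}}} + \|\bar R_\varepsilon^*(z)\bar P\rho_\varepsilon u\|_{\bar{\mathcal{H}}}\Big) + C_2|\varepsilon|\,\|\bar R_\varepsilon(z)\bar P\rho_\varepsilon u\|_{\bar{\mathcal{G}}}\,\|\bar R_\varepsilon^*(z)\bar P\rho_\varepsilon u\|_{\bar{\mathcal{G}}}. \tag{3.17} \]
By (3.11b) and since $\|\cdot\|_{\bar{\mathcal{H}}}\le\|\cdot\|_{\bar{\mathcal{G}}}$, we obtain
\[ \begin{aligned} \Big|\frac{d}{d\varepsilon}F_\varepsilon(z)\Big| &\le C_3|\varepsilon|^{s-1}\|u\|_{\bar{\mathcal{H}}}|\varepsilon|^{-\frac12}\Big(|F_\varepsilon(z)|^{\frac12} + \|\bar P\rho_\varepsilon u\|_{\bar{\mathcal{H}}}\Big) + C_4|\varepsilon|\Big(|\varepsilon|^{-\frac12}\big(|F_\varepsilon(z)|^{\frac12} + \|\bar P\rho_\varepsilon u\|_{\bar{\mathcal{H}}}\big)\Big)^2 \\ &\le C_5|\varepsilon|^{s-\frac32}\Big(|F_\varepsilon(z)| + \|u\|^2_{\bar{\mathcal{H}}}\Big), \end{aligned} \tag{3.18} \]
for $0<|\varepsilon|\le\varepsilon_0$. Applying Gronwall’s Lemma, this yields
\[ |F_\varepsilon(z)| \le C_6\|u\|^2_{\bar{\mathcal{H}}}, \tag{3.19} \]
which combined with (3.11c) gives
\[ \sup_{z\in\mathbb{C},\ \mathrm{Re}\,z\in J_\lambda,\ 0<|\mathrm{Im}\,z|\le 1}\big\|\langle A\rangle^{-s}(H-z)^{-1}\bar P\langle A\rangle^{-s}\big\| < \infty. \tag{3.20} \]
In order to prove the Hölder continuity in $z$, we use that, for $0<\varepsilon_1<\varepsilon_0$,
\[ F_0(z) - F_0(z') = -\int_0^{\varepsilon_1}\frac{d}{d\varepsilon}\big(F_\varepsilon(z)-F_\varepsilon(z')\big)\,d\varepsilon - \int_{\varepsilon_1}^{\varepsilon_0}\frac{d}{d\varepsilon}\big(F_\varepsilon(z)-F_\varepsilon(z')\big)\,d\varepsilon + \big(F_{\varepsilon_0}(z)-F_{\varepsilon_0}(z')\big). \tag{3.21} \]
It follows from (3.18) and (3.19) that
\[ \Big|\int_0^{\varepsilon_1}\frac{d}{d\varepsilon}\big(F_\varepsilon(z)-F_\varepsilon(z')\big)\,d\varepsilon\Big| \le C_7\varepsilon_1^{s-\frac12}\|u\|^2. \tag{3.22} \]
Moreover, using the first resolvent equation together with (3.14), (3.11a), (3.11b), (3.16) and (3.19), we obtain
\[ \Big|\frac{d}{d\varepsilon}\big(F_\varepsilon(z)-F_\varepsilon(z')\big)\Big| \le C_8|\varepsilon|^{s-\frac52}|z-z'|\cdot\|u\|^2, \]
which implies
\[ \Big|\int_{\varepsilon_1}^{\varepsilon_0}\frac{d}{d\varepsilon}\big(F_\varepsilon(z)-F_\varepsilon(z')\big)\,d\varepsilon\Big| \le C_9\varepsilon_1^{s-\frac32}|z-z'|\cdot\|u\|^2. \tag{3.23} \]
Finally, the first resolvent equation and (3.11a) give
\[ \big|F_{\varepsilon_0}(z)-F_{\varepsilon_0}(z')\big| \le C(\varepsilon_0)|z-z'|\cdot\|u\|^2, \tag{3.24} \]
for some positive constant $C(\varepsilon_0)$ depending on $\varepsilon_0$. Taking $\varepsilon_1 = |z-z'|$, Eq. (3.4) follows from (3.21)–(3.24).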
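The choice of the intermediate scale in the last step balances the two integral bounds: at $\varepsilon_1=|z-z'|$ both (3.22) and (3.23) are of order $|z-z'|^{s-1/2}$, and the boundary term (3.24) is of lower order on the bounded set $S$. The short Python sketch below only illustrates this arithmetic; the function name `holder_bound` and the constants set to 1 are assumptions made for the example.

```python
def holder_bound(delta, eps1, s, c7=1.0, c9=1.0, c_eps0=1.0):
    """Sum of the right-hand sides of (3.22), (3.23), (3.24) with ||u|| = 1.

    delta plays the role of |z - z'|; all constants are placeholders.
    """
    return (c7 * eps1 ** (s - 0.5)
            + c9 * eps1 ** (s - 1.5) * delta
            + c_eps0 * delta)


if __name__ == "__main__":
    s = 0.75
    for delta in (1e-1, 1e-3, 1e-5):
        balanced = holder_bound(delta, delta, s)  # eps1 = |z - z'|
        # Ratio to the Hölder modulus |z - z'|**(s - 1/2); it stays bounded.
        print(delta, balanced / delta ** (s - 0.5))
```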
We have the following stronger result if Condition 1.9 is further assumed.

Theorem 3.4. Assume that Conditions 1.3 and Condition 1.9 hold. Suppose $J\subseteq I$ is a compact interval such that $\sigma_{pp}(H)\cap J\subseteq\{\lambda\}$. Let $P = E_H(\{\lambda\})$ and $V\in\mathcal{V}_2$. For $\sigma\in\mathbb{R}$, define $H_\sigma := H+\sigma V$ and $\bar H_\sigma := H_\sigma + \alpha_J P$, where $\alpha_J\in\mathbb{R}$ is fixed such that $\alpha_J > \sup J - \inf J$. Let $S = \{z\in\mathbb{C},\ \mathrm{Re}\,z\in J,\ 0<|\mathrm{Im}\,z|\le 1\}$. For all $1/2<s\le 1$, there exists $\sigma_0>0$ such that for all $|\sigma|\le\sigma_0$,
\[ \sup_{z\in S}\big\|\langle A\rangle^{-s}(\bar H_\sigma - z)^{-1}\langle A\rangle^{-s}\big\| < \infty. \tag{3.25} \]
Moreover there exists $C>0$ such that for all $\sigma,\sigma'\in[-\sigma_0,\sigma_0]$, for all $z,z'\in S$,
\[ \big\|\langle A\rangle^{-s}\big((\bar H_\sigma - z)^{-1} - (\bar H_{\sigma'} - z')^{-1}\big)\langle A\rangle^{-s}\big\| \le C\big(|\sigma-\sigma'|^{s-\frac12} + |z-z'|^{s-\frac12}\big). \tag{3.26} \]

Remarks 3.5. 1) In the case $\sigma_{pp}(H)\cap J = \emptyset$, we have $P=0$ and hence $\bar H_\sigma = H_\sigma$. Of course, Condition 1.9 is not required in this case.
2) The assumption that $\alpha_J > \sup J - \inf J$ implies that $H+\alpha_J P$ does not have eigenvalues in $J$.
3) Equations (3.25)–(3.26) with $\sigma=\sigma'=0$ yield that
\[ \sup_{z\in S}\big\|\langle A\rangle^{-s}(H-z)^{-1}\bar P\langle A\rangle^{-s}\big\| < \infty, \tag{3.27} \]
and that $z\mapsto\langle A\rangle^{-s}(H-z)^{-1}\bar P\langle A\rangle^{-s}$ is Hölder continuous of order $s-1/2$ on $S$. Hence we recover the Limiting Absorption Principles of Theorems 3.1 and 3.3.

Proof of Theorem 3.4. Considering the Mourre estimate, Condition 1.3 (2), for any $\eta\in J$, we denote by $J_\eta\subseteq I$ a compact neighbourhood of $\eta$ such that $f_\eta=1$ on a neighbourhood of $J_\eta$.

Step 1. Let us prove that, for any $\eta\in J$, there exists $\sigma_\eta>0$ such that for all $|\sigma|\le\sigma_\eta$,
\[ \sup_{z\in\mathbb{C},\ \mathrm{Re}\,z\in J_\eta,\ 0<|\mathrm{Im}\,z|\le 1}\big\|\langle A\rangle^{-s}(\bar H_\sigma - z)^{-1}\langle A\rangle^{-s}\big\| < \infty, \tag{3.28} \]
and that the function $(\sigma,z)\mapsto\langle A\rangle^{-s}(\bar H_\sigma - z)^{-1}\langle A\rangle^{-s}$ is Hölder continuous of order $s-1/2$ in $\sigma$ and $z$ on $[-\sigma_\eta,\sigma_\eta]\times\{z\in\mathbb{C},\ \mathrm{Re}\,z\in J_\eta,\ 0<|\mathrm{Im}\,z|\le 1\}$.

Let $\bar H := H+\alpha_J P$. Condition 1.7 implies that $[P,iA]^0$ extends to a compact operator. Since $HP=\lambda P$ and $H\bar P = \bar H\bar P$, we have
\[ f_\eta^\perp(H)^2\langle H\rangle = f_\eta^\perp(\bar H)^2\langle\bar H\rangle + f_\eta^\perp(\lambda)^2\langle\lambda\rangle P - f_\eta^\perp(\lambda+\alpha_J)^2\langle\lambda+\alpha_J\rangle P. \tag{3.29} \]
Using that the second and third terms in the right-hand-side of (3.29) are compact, the Mourre estimate (1.8) yields
\[ M + R + \alpha_J[P,iA]^0 \ge c_0 I - C_0 f_\eta^\perp(\bar H)^2\langle\bar H\rangle - K_0, \tag{3.30} \]
where $K_0$ is compact. Since $\eta\notin\sigma_{pp}(\bar H)$ (see Remark 3.5 2)), we can put $K_0=0$ provided we choose the function $f_\eta$ supported in a sufficiently small interval containing $\eta$. We get
\[ M + R + \alpha_J[P,iA]^0 \ge 2^{-1}c_0 I - C_1 f_\eta^\perp(\bar H)^2\langle\bar H\rangle. \tag{3.31} \]
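The estimate (3.31) is stable under perturbation from the class $\mathcal{V}_1$. In particular (and more precisely) there exists $\sigma_\eta>0$ such that if $|\sigma|\le\sigma_\eta$, then
\[ M + R + \sigma V' + \alpha_J[P,iA]^0 \ge 3^{-1}c_0 I - C_2 f_\eta^\perp(\bar H_\sigma)^2\langle\bar H_\sigma\rangle. \tag{3.32} \]
Indeed, since $V\in\mathcal{V}_1$, we have that
\[ \pm V' \le C_3\langle H\rangle + C_4 \le C_5 + C_6 f_\eta^\perp(\bar H)\langle\bar H\rangle f_\eta^\perp(\bar H), \tag{3.33} \]
and
\[ f_\eta^\perp(\bar H)\langle\bar H\rangle f_\eta^\perp(\bar H) \le C_7 f_\eta^\perp(\bar H)\langle\bar H_\sigma\rangle f_\eta^\perp(\bar H) \le C_7 f_\eta^\perp(\bar H_\sigma)\langle\bar H_\sigma\rangle f_\eta^\perp(\bar H_\sigma) + C_8|\sigma|. \tag{3.34} \]
The first inequality in (3.34) follows from elementary interpolation while the second inequality follows, for instance, from the Helffer-Sjöstrand functional calculus.

We set for shortness $H'_\sigma := H'+\sigma V'$, $H''_\sigma := H''+\sigma V''$, $P' := [P,iA]^0$ and $P'' := [P',iA]^0$. Remark that Conditions 1.3, Condition 1.9 and the assumption $V\in\mathcal{V}_2$ imply that $H''_\sigma$, $P''$, $H''_\sigma+\alpha_J P''\in\mathcal{B}(\mathcal{G};\mathcal{G}^*)$. Note that Eq. (3.32) can be written
\[ H'_\sigma + \alpha_J P' \ge 3^{-1}c_0 I - C_2 f_\eta^\perp(\bar H_\sigma)^2\langle\bar H_\sigma\rangle. \tag{3.35} \]
We emphasize that the constant $C_2$ is independent of $z$ and $\sigma$. To prove (3.28), we can proceed as in the proof of Theorem 3.3, using (3.35) instead of (3.9), and replacing $\bar H_\varepsilon$ and $F_\varepsilon(z)$ in (3.10) and (3.13) respectively by
\[ \bar H_{\sigma,\varepsilon} := \bar H_\sigma - i\varepsilon(H'_\sigma + \alpha_J P'), \tag{3.36} \]
and
\[ F_{\sigma,\varepsilon}(z) := \langle\rho_\varepsilon u,\bar R_{\sigma,\varepsilon}(z)\rho_\varepsilon u\rangle. \tag{3.37} \]
Here we have set
\[ \bar R_{\sigma,\varepsilon}(z) := (\bar H_{\sigma,\varepsilon} - z)^{-1} \tag{3.38} \]
and, as before, $\rho_\varepsilon = \langle\varepsilon A\rangle^{s-1}\langle A\rangle^{-s}$. Notice that, by [GGM1, Th. 2.25 and Lemma 2.26], $\bar H_{\sigma,\varepsilon}$ is closed, densely defined and satisfies $\bar H_{\sigma,\varepsilon}^* = \bar H_{\sigma,-\varepsilon}$. Moreover, following [GGM1, Subsect. 3.4], one can indeed verify that there exists $\varepsilon_0$ such that for all $0<|\varepsilon|\le\varepsilon_0$ and $z=\eta+i\mu$ with $\eta\in J_\eta$ and $\varepsilon\mu>0$, $\bar H_{\sigma,\varepsilon}-z$ is invertible with bounded inverse $\bar R_{\sigma,\varepsilon}(z)$ satisfying properties similar to (3.11a)–(3.11c). We can compute:
\[ \begin{aligned} \frac{d}{d\varepsilon}F_{\sigma,\varepsilon}(z) ={}& \Big\langle\frac{d\rho_\varepsilon}{d\varepsilon}u,\bar R_{\sigma,\varepsilon}(z)\rho_\varepsilon u\Big\rangle + \Big\langle\rho_\varepsilon u,\bar R_{\sigma,\varepsilon}(z)\frac{d\rho_\varepsilon}{d\varepsilon}u\Big\rangle \\ &+ \big\langle\bar R_{\sigma,\varepsilon}(z)^*\rho_\varepsilon u, A\rho_\varepsilon u\big\rangle - \big\langle A\rho_\varepsilon u,\bar R_{\sigma,\varepsilon}(z)\rho_\varepsilon u\big\rangle \\ &- \varepsilon\big\langle\bar R_{\sigma,\varepsilon}(z)^*\rho_\varepsilon u,\big(H''_\sigma + \alpha_J P''\big)\bar R_{\sigma,\varepsilon}(z)\rho_\varepsilon u\big\rangle. \end{aligned} \tag{3.39} \]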
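We obtain as in (3.18) that
\[ \Big|\frac{d}{d\varepsilon}F_{\sigma,\varepsilon}(z)\Big| \le C_9|\varepsilon|^{s-\frac32}\|u\|^2. \tag{3.40} \]
Estimate (3.25) (with $J_\lambda$ in place of $J$) and the Hölder continuity in $z$ then follow as in the proof of Theorem 3.3. It remains to prove the Hölder continuity in $\sigma$. We follow again the proof of Theorem 3.3. For $0<\varepsilon_1<\varepsilon_0$, we have
\[ F_{\sigma,0}(z) - F_{\sigma',0}(z) = -\int_0^{\varepsilon_1}\frac{d}{d\varepsilon}\big(F_{\sigma,\varepsilon}(z)-F_{\sigma',\varepsilon}(z)\big)\,d\varepsilon - \int_{\varepsilon_1}^{\varepsilon_0}\frac{d}{d\varepsilon}\big(F_{\sigma,\varepsilon}(z)-F_{\sigma',\varepsilon}(z)\big)\,d\varepsilon + \big(F_{\sigma,\varepsilon_0}(z)-F_{\sigma',\varepsilon_0}(z)\big). \tag{3.41} \]
The first term in the right-hand-side of (3.41) is estimated thanks to (3.40), which gives
\[ \Big|\int_0^{\varepsilon_1}\frac{d}{d\varepsilon}\big(F_{\sigma,\varepsilon}(z)-F_{\sigma',\varepsilon}(z)\big)\,d\varepsilon\Big| \le C_{10}\varepsilon_1^{s-\frac12}\|u\|^2. \tag{3.42} \]
As for the second and third terms on the right-hand-side of (3.41), we use that, by the second resolvent equation,
\[ \bar R_{\sigma,\varepsilon}(z) - \bar R_{\sigma',\varepsilon}(z) = -(\sigma-\sigma')\bar R_{\sigma,\varepsilon}(z)(V - i\varepsilon V')\bar R_{\sigma',\varepsilon}(z). \]
Since $V$ and $V'$ are $H$-bounded by assumption, this implies in the same way as in the proof of (3.23) and (3.24) that
\[ \Big|\int_{\varepsilon_1}^{\varepsilon_0}\frac{d}{d\varepsilon}\big(F_{\sigma,\varepsilon}(z)-F_{\sigma',\varepsilon}(z)\big)\,d\varepsilon\Big| \le C_{11}\varepsilon_1^{s-\frac32}|\sigma-\sigma'|\cdot\|u\|^2, \tag{3.43} \]
and
\[ \big|F_{\sigma,\varepsilon_0}(z)-F_{\sigma',\varepsilon_0}(z)\big| \le C(\varepsilon_0)|\sigma-\sigma'|\cdot\|u\|^2. \tag{3.44} \]
The Hölder continuity in $\sigma$ follows from (3.41)–(3.44) by choosing $\varepsilon_1 = |\sigma-\sigma'|$.

Step 2. Since $J$ is compact, it follows from Step 1 and a covering argument that there exist $\eta_1,\dots,\eta_l$ (with $l<\infty$) such that $J\subseteq J_{\eta_1}\cup\dots\cup J_{\eta_l}$. Taking $\sigma_0 = \min(\sigma_{\eta_1},\dots,\sigma_{\eta_l})$, Eq. (3.25) and the Hölder continuity in $\sigma$ follow. The Hölder continuity in $z$ is a straightforward consequence of the fact that
\[ \sum_{n=1}^{l}(a_n)^{s-\frac12} \le l^{\frac32-s}\Big(\sum_{n=1}^{l}a_n\Big)^{s-\frac12}, \tag{3.45} \]
for any sequence of positive numbers $(a_n)_{n=1,\dots,l}$, and $1/2<s\le 1$.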
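Inequality (3.45) is the concavity (Hölder) bound $\sum_n a_n^\theta\le l^{1-\theta}(\sum_n a_n)^\theta$ with $\theta=s-1/2\in(0,1/2]$, with equality when all $a_n$ coincide. As a sanity check only, the following Python sketch evaluates both sides on a few sample sequences; the function names and the sample data are assumptions made for the illustration.

```python
def lhs_sum(a, s):
    """Left-hand side of (3.45): sum of a_n ** (s - 1/2)."""
    return sum(x ** (s - 0.5) for x in a)


def rhs_bound(a, s):
    """Right-hand side of (3.45): l ** (3/2 - s) * (sum of a_n) ** (s - 1/2)."""
    return len(a) ** (1.5 - s) * sum(a) ** (s - 0.5)


if __name__ == "__main__":
    samples = ([1.0], [0.5, 0.5, 0.5], [1e-3, 2.0, 7.5, 0.1], [3.0, 1e-6])
    for s in (0.6, 0.8, 1.0):
        for a in samples:
            print(s, a, lhs_sum(a, s), rhs_bound(a, s))
```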
4. Upper Semicontinuity of Point Spectrum

In this section we study upper semicontinuity of the point spectrum of $H$. The main result is Theorem 1.14 proven below. Let us begin with stating a consequence of Theorem 3.4, which shows that if the unperturbed Hamiltonian does not have eigenvalues in a compact interval, the same holds for the perturbed Hamiltonian (provided that the perturbation $V$ belongs to $\mathcal{V}_2$).

Corollary 4.1. Assume that Conditions 1.3 hold. Let $J\subseteq I$ be a compact interval such that $\sigma_{pp}(H)\cap J = \emptyset$. Let $V\in\mathcal{V}_2$. There exists $\sigma_0>0$ such that for any $|\sigma|\le\sigma_0$,
\[ \sigma_{pp}(H+\sigma V)\cap J = \emptyset. \tag{4.1} \]
The statement of Corollary 4.1 remains true under the weaker assumption that $V\in\mathcal{V}_1$, provided that a priori eigenstates of $H+\sigma V$ belong to $D(M^{1/2})$. This is a consequence of the Mourre estimate established in the proof of Theorem 3.4 (see (3.32)), together with the virial property that $\langle\psi,(H'+\sigma V')\psi\rangle = 0$ which holds for any eigenstate $\psi$ of $H+\sigma V$ satisfying $\psi\in D(M^{1/2})$ (see Remark 1.4 3)). Hence we have the following:

Corollary 4.2. Assume that Conditions 1.3 hold. Let $J\subseteq I$ be a compact interval such that $\sigma_{pp}(H)\cap J = \emptyset$. Let $V\in\mathcal{V}_1$. There exists $\sigma_0>0$ such that for any $|\sigma|\le\sigma_0$, the following holds: Suppose that any eigenstate $\psi$ of $H+\sigma V$ associated to an eigenvalue $\lambda\in J$ satisfies $\psi\in D(M^{1/2})$, then
\[ \sigma_{pp}(H+\sigma V)\cap J = \emptyset. \tag{4.2} \]
We now turn to the proof of Theorem 1.14. Here we need Condition 1.10 and that $V\in\mathcal{B}_{1,\gamma}$ in addition to Conditions 1.3.

Proof of Theorem 1.14. Let $\lambda\in I$ and $J\subseteq I$ as in the statement of the theorem.

Step 1. Let us prove that, for any $\eta\in J$, there exist $\beta_\eta>0$ and $\gamma_\eta>0$ such that, for $\|V\|_1\le\gamma_\eta$, the total multiplicity of the eigenvalues of $H+V$ in $(\eta-\beta_\eta,\eta+\beta_\eta)$ is at most $\dim\mathrm{Ker}(H-\eta)$.

If $\eta$ is an eigenvalue, we proceed as in [AHS, Sect. 2] introducing the (finite rank) eigenprojection, say $P$, corresponding to this eigenvalue and the auxiliary operator $\bar H = H+\alpha_J P$. Here $\alpha_J>\sup J-\inf J$ as in Theorem 3.4. Then in the same way as in (3.32), for $\|V\|_1\le\gamma_\eta$ with $\gamma_\eta>0$ small enough, we have that
\[ M + R + \alpha_J[P,iA]^0 + [V,iA]^0 \ge 3^{-1}c_0 I - C_1 f_\eta^\perp(\bar H+V)^2\langle\bar H+V\rangle, \tag{4.3} \]
where $f_\eta\in C_0^\infty(\mathbb{R})$ is such that $0\le f_\eta\le 1$ and $f_\eta=1$ in a neighbourhood of $\eta$. Let us in the following agree on the convention that $P=0$ and $\bar H=H$ if $\eta\notin\sigma_{pp}(H)$. Then (4.3) holds no matter whether $\eta$ is an eigenvalue or not (provided $\|V\|_1$ is sufficiently small and that the support of $f_\eta$ is chosen sufficiently close to $\eta$).

Now, it suffices to follow the proof of [AHS, Th. 2.5], combining Condition 1.10 and (4.3). More precisely, let $m$ be the multiplicity of $\eta$ and let us assume that $H+V$ has eigenvalues $(\eta_j)$, $j=1,\dots,m_1$, of total multiplicity $m_1>m$, located in $(\eta-\beta_\eta,\eta+\beta_\eta)\subseteq I$. Let $(\psi_j)$, $j=1,\dots,m_1$, be an orthonormal set of eigenvectors, $\psi_j$ being associated with $\eta_j$. Consider a linear combination $\psi=\sum_j a_j\psi_j$ such that $\|\psi\|=1$ and $P\psi=0$.
Since $V\in\mathcal{B}_{1,\gamma}$, it follows from Condition 1.10 that $\psi\in D\cap D(A)$, whence (4.3) together with Remark 1.4, 3) yields
\[ \begin{aligned} 3^{-1}c_0 &\le \big\langle\psi,\big(M+R+\alpha_J[P,iA]^0+[V,iA]^0\big)\psi\big\rangle + C_1\big\|f_\eta^\perp(\bar H+V)\langle\bar H+V\rangle^{1/2}\psi\big\|^2 \\ &= i\big\langle(\bar H+V-\eta)\psi,A\psi\big\rangle - i\big\langle A\psi,(\bar H+V-\eta)\psi\big\rangle + C_1\big\|f_\eta^\perp(\bar H+V)\langle\bar H+V\rangle^{1/2}\psi\big\|^2 \\ &\le \beta_\eta\big(2\|A\psi\| + C_2\beta_\eta\big). \end{aligned} \tag{4.4} \]
In the second inequality, we used that
\[ \big\|(\bar H+V-\eta)\psi\big\| = \Big\|\sum_j a_j(\eta_j-\eta)\psi_j\Big\| \le \beta_\eta, \tag{4.5} \]
and hence also that $\|f_\eta^\perp(\bar H+V)\langle\bar H+V\rangle^{1/2}\psi\|\le C_3\beta_\eta$ by the Spectral Theorem, where the constant $C_3$ depends on $\mathrm{supp}(f_\eta)$. By Condition 1.10, we obtain a contradiction provided that $\beta_\eta$ is chosen sufficiently small.

Step 2. Let us prove that the total multiplicity of the eigenvalues of $H+V$ in $J$ is at most $\dim\mathrm{Ker}(H-\lambda)$.

It follows from Step 1 that, for any $\eta\in[\inf J,\lambda-\beta_\lambda]\cup[\lambda+\beta_\lambda,\sup J]$, there exist $\beta_\eta>0$ and $\gamma_\eta>0$ such that, for $\|V\|_1\le\gamma_\eta$, $H+V$ does not have eigenvalues in $(\eta-\beta_\eta,\eta+\beta_\eta)$. Since $[\inf J,\lambda-\beta_\lambda]\cup[\lambda+\beta_\lambda,\sup J]$ is compact, it follows from a covering argument that there exist $\eta_1,\dots,\eta_l$ such that
\[ [\inf J,\lambda-\beta_\lambda]\cup[\lambda+\beta_\lambda,\sup J] \subset \bigcup_{j=1}^{l}(\eta_j-\beta_{\eta_j},\eta_j+\beta_{\eta_j}). \tag{4.6} \]
Hence, for $\|V\|_1\le\min(\gamma_{\eta_1},\dots,\gamma_{\eta_l})$, $H+V$ does not have eigenvalues in $[\inf J,\lambda-\beta_\lambda]\cup[\lambda+\beta_\lambda,\sup J]$. Applying Step 1 again with $\eta=\lambda$, this concludes the proof.

The next proposition is a consequence of Theorem 1.14. It will be used in Sect. 5.

Proposition 4.3. Assume that Conditions 1.3 and Condition 1.10 hold. Suppose $\lambda\in\sigma_{pp}(H)$ and that $J\subseteq I$ is a compact interval such that $\sigma_{pp}(H)\cap J=\{\lambda\}$. Let $P=E_H(\{\lambda\})$, $\bar P=I-P$ and $P_{V,J}=E_{(H+V)_{pp}}(J)$ for any $V\in\mathcal{V}_1$ (with sufficiently small norm). Then for any sequence $V^{(n)}\in\mathcal{B}_{1,\gamma}$ such that $\|V^{(n)}\|_1\to 0$,
\[ \big\|\bar P P_{V^{(n)},J}\big\| \to 0. \tag{4.7} \]
One of the following two alternatives i) or ii) holds:

i) There exists $0<\gamma'\le\gamma$ such that if $V\in\mathcal{B}_{1,\gamma}$ and $0\ne\|V\|_1\le\gamma'$, then the operator $H+V$ does not have eigenvalues in $J$.
ii) There exists a sequence of operators $V_n\in\mathcal{B}_{1,\gamma}$ with $0\ne\|V_n\|_1\to 0$ and a sequence of normalized eigenstates, $(H+V_n-\lambda_n)\psi_n=0$, with eigenvalues $\lambda_n\to\lambda$, such that for some $\psi_\infty\in\mathrm{Ran}(P)$ we have $\|\psi_n-\psi_\infty\|\to 0$.
Proof. If (4.7) fails there exist an $\varepsilon>0$, a sequence of elements $V^{(n)}\in\mathcal{B}_{1,\gamma}$ with $0\ne\|V^{(n)}\|_1\to 0$, a linear combination of eigenstates of $H+V^{(n)}$, viz. $\psi^{(n)}=\sum_{j\le m(n)}a_j^{(n)}\psi_j^{(n)}$, such that
\[ \|\psi^{(n)}\|\le 1 \quad\text{and}\quad \|\bar P\psi^{(n)}\|>\varepsilon. \tag{4.8} \]
Here $m(n)\le\dim\mathrm{Ran}(P)$ specifies the dimension of the range of $P_{V^{(n)},J}$. Due to Theorem 1.14 the corresponding eigenvalues, say $\lambda_j^{(n)}$, concentrate at $\lambda$. More precisely
\[ \max_{j\le m(n)}|\lambda_j^{(n)}-\lambda|\to 0 \quad\text{for } n\to\infty. \tag{4.9} \]
In particular we have
\[ \max_{j\le m(n)}\|(H-\lambda)\psi_j^{(n)}\|\to 0, \quad\text{and}\quad \max_{j\le m(n)}\|f_\lambda^\perp(H)\psi_j^{(n)}\|\to 0, \tag{4.10} \]
and therefore also
\[ \|(H-\lambda)\psi^{(n)}\|\to 0, \quad\text{and}\quad \|f_\lambda^\perp(H)\psi^{(n)}\|\to 0. \tag{4.11} \]
Next by the Banach-Alaoglu Theorem [Yo, Th. 1 on p. 126] we can assume that there exists the weak limit $\psi_\infty := \operatorname{w-lim}\psi^{(n)}$ (by passing to a subsequence and change of notation). From the first identity of (4.11) we learn that $\psi_\infty\in\mathrm{Ran}(P)$. Consequently
\[ \operatorname{w-lim}\bar P\psi^{(n)} = \bar P\psi_\infty = 0. \tag{4.12} \]
Now we apply an argument similar to the one used to prove Theorem 1.14, now based on (1.8) rather than (4.3): Looking at the expectation of both sides of (1.8) in the states $\phi_n := \bar P\psi^{(n)}$, using Remark 1.4 3), we obtain
\[ c_0\|\phi_n\|^2 \le 2\|(H-\lambda)\phi_n\|\,\|A\phi_n\| + C\big\|\langle H\rangle^{1/2}f_\lambda^\perp(H)\phi_n\big\|^2 + \langle\phi_n,K_0\phi_n\rangle. \tag{4.13} \]
Since $K_0$ is compact we obtain from (4.12) that $\langle\phi_n,K_0\phi_n\rangle\to 0$. By (1.18), $\|A\phi_n\|$ is uniformly bounded, and therefore we conclude in combination with (4.11) that $\|\phi_n\|\to 0$. This contradicts (4.8).

Let us now prove that either i) or ii) holds. If i) fails indeed there exists a sequence of normalized eigenstates, $(H+V_n-\lambda_n)\psi_n=0$, with eigenvalues $\lambda_n\to\lambda$ and with $V_n\in\mathcal{B}_{1,\gamma}$, $0\ne\|V_n\|_1\to 0$. Due to (4.7) $\|\bar P\psi_n\|\to 0$. By compactness there exists $\psi\in\mathrm{Ran}(P)$ such that along some subsequence $P\psi_{n_k}\to\psi$. Whence
\[ \|\psi_{n_k}-\psi\| \le \|\bar P\psi_{n_k}\| + \|P\psi_{n_k}-\psi\| \to 0 \quad\text{for } k\to\infty, \tag{4.14} \]
and we conclude ii).

There is a different version of the second part of Proposition 4.3 given by first fixing $V\in\mathcal{B}_{1,\gamma}$ (but otherwise given under the same conditions). Now we look at the eigenvalue problem in $I$ of the family of perturbed Hamiltonians $H_\sigma = H+\sigma V$ with $\sigma\in\mathbb{R}$ and $|\sigma|>0$ sufficiently small. In this framework there is a similar dichotomy (it can be shown by applying Proposition 4.3 under the same conditions, replacing $\mathcal{B}_{1,\gamma}$ by the subset $\{\sigma V,\ |\sigma|\le\sigma_0\}\subseteq\mathcal{B}_{1,\gamma}$).
Corollary 4.4. Assume that Conditions 1.3 and Condition 1.10 hold. Suppose $\lambda\in\sigma_{pp}(H)$ and that $J\subseteq I$ is a compact interval such that $\sigma_{pp}(H)\cap J=\{\lambda\}$. Let $P=E_H(\{\lambda\})$ and let $V\in\mathcal{B}_{1,\gamma}$. One of the following two alternatives i) or ii) holds:

i) For some sufficiently small $\sigma_0>0$ there are no eigenvalues of $H_\sigma := H+\sigma V$ in $J$ for all $\sigma\in\,]-\sigma_0,\sigma_0[\,\setminus\{0\}$.
ii) For some sequence of coupling constants, $0\ne\sigma_n\to 0$, and some sequence of normalized eigenstates $\psi_n$, $(H+\sigma_n V-\lambda_n)\psi_n=0$ with $\lambda_n\to\lambda$, there exists $\psi_\infty\in\mathrm{Ran}(P)$ such that $\|\psi_n-\psi_\infty\|\to 0$.

5. Second Order Perturbation Theory

In this section we shall study second order perturbation theory. Our main interest is the Fermi Golden Rule, which indeed we shall show is a consequence of having an expansion to second order of any possible existing perturbed eigenvalue near an unperturbed one. This is done in Subsect. 5.1 under Conditions 1.3 and Condition 1.10, in the case where the unperturbed eigenvalue is simple. In the degenerate case, this is done in Subsect. 5.2 assuming Condition 1.9 rather than Condition 1.10. We do not obtain an expansion to second order of the perturbed eigenvalues assuming Condition 1.10 only. Nevertheless we shall show a similar version of the Fermi Golden Rule in this case also (done in Subsect. 5.2).

5.1. Second order perturbation theory – simple case.

Theorem 5.1. Assume that Conditions 1.3, Condition 1.10 and Condition 1.11 hold. Suppose $\lambda\in\sigma_{pp}(H)$ and that $J\subseteq I$ is a compact interval such that $\sigma_{pp}(H)\cap J=\{\lambda\}$. Let $P=E_H(\{\lambda\})$, $\bar P=I-P$. Let $V\in\mathcal{B}_{1,\gamma}$. Suppose $\dim\mathrm{Ran}(P)=1$, viz.
\[ P = |\psi\rangle\langle\psi|. \tag{5.1} \]
For all $1/2<s\le 1$ and $\varepsilon>0$, there exists $\sigma_0>0$ such that if $|\sigma|\le\sigma_0$ and $\lambda_\sigma\in J$ is an eigenvalue of $H_\sigma$, then
\[ \Big|\lambda_\sigma-\lambda-\sigma\langle\psi,V\psi\rangle+\sigma^2\big\langle V\psi,(H-\lambda-i0_+)^{-1}\bar P V\psi\big\rangle\Big| \le \varepsilon\sigma^2, \tag{5.2} \]
and there exists a normalized eigenstate $\psi_\sigma$, $H_\sigma\psi_\sigma=\lambda_\sigma\psi_\sigma$, such that
\[ \big\|\psi_\sigma-\psi+\sigma(H-\lambda-i0_+)^{-1}\bar P V\psi\big\|_{D(\langle A\rangle^s)^*} \le \varepsilon|\sigma|. \tag{5.3} \]
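The algebraic structure of the expansion (5.2) can be checked on a finite-dimensional toy problem. The Python sketch below is only an analogue under stated assumptions: it takes $H=\mathrm{diag}(0,1)$, $\lambda=0$, $\psi=e_1$ and a real symmetric $2\times 2$ matrix $V$ with entries $a,b,c$ (all hypothetical choices), so that $\langle\psi,V\psi\rangle=a$ and $\langle V\psi,(H-\lambda)^{-1}\bar P V\psi\rangle=b^2$. In this toy setting the eigenvalue is isolated rather than embedded, no $i0_+$ regularization is needed and the perturbed eigenvalue always exists, so the example does not capture the spectral difficulties addressed by Theorem 5.1.

```python
import math


def perturbed_eigenvalue(sigma, a, b, c):
    """Eigenvalue near 0 of H + sigma*V, with H = diag(0, 1), V = [[a, b], [b, c]]."""
    trace = sigma * a + 1.0 + sigma * c
    gap = 1.0 + sigma * (c - a)
    return 0.5 * (trace - math.sqrt(gap * gap + 4.0 * sigma * sigma * b * b))


def second_order_prediction(sigma, a, b):
    """lambda + sigma*<psi, V psi> - sigma^2*<V psi, (H - lambda)^{-1} Pbar V psi>."""
    return sigma * a - sigma * sigma * b * b


if __name__ == "__main__":
    a, b, c = 0.3, 0.5, -0.2  # hypothetical matrix entries
    for sigma in (0.1, 0.01, 0.001):
        remainder = perturbed_eigenvalue(sigma, a, b, c) - second_order_prediction(sigma, a, b)
        # The remainder shrinks faster than sigma**2, in line with (5.2).
        print(sigma, remainder, remainder / sigma ** 2)
```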
Remarks 5.2. 1) It is a consequence of Conditions 1.3, Condition 1.7, Remark 1.8 and Condition 1.11 that
\[ \mathrm{Ran}(VP)\subseteq D(A) \quad\text{for all } V\in\mathcal{V}_1. \tag{5.4} \]
Notice that we can compute the commutator form $[V,iA]$ on $\big(D(M^{1/2})\cap D(H)\cap D(A^*)\big)\times\big(D(M^{1/2})\cap D(H)\cap D(A)\big)$ by a formula similar to (1.7). Whence this form is given by $V'$, cf. (1.13), which by assumption is an $H$-bounded operator. In combination with Theorem 3.3, (5.4) implies that indeed the operator
\[ PV(H-\lambda-i0_+)^{-1}\bar P VP\in\mathcal{B}(\mathcal{H}). \tag{5.5} \]
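2) Due to Theorem 1.14 there is at most one eigenvalue $\lambda_\sigma$ of $H_\sigma$ near $\lambda$, and if it exists it is simple.

Corollary 5.3. Under the conditions of Theorem 5.1 and the condition
\[ \mathrm{Im}\big\langle V\psi,(H-\lambda-i0_+)^{-1}\bar P V\psi\big\rangle > 0, \tag{5.6} \]
there exists $\sigma_0>0$ such that for all $\sigma\in\,]-\sigma_0,\sigma_0[\,\setminus\{0\}$,
\[ \sigma_{pp}(H_\sigma)\cap J = \emptyset. \tag{5.7} \]

Proof of Theorem 5.1. Assume by contradiction that (5.2) does not hold. Then there exist $\varepsilon>0$ and a sequence $\sigma_n\to 0$ such that $H_{\sigma_n}$ has an eigenvalue $\lambda_n$ in $J$ satisfying, for all $n$ and for some $\psi\in\mathrm{Ran}(P)$, $\|\psi\|=1$,
\[ \Big|\lambda_n-\lambda-\sigma_n\langle\psi,V\psi\rangle+\sigma_n^2\big\langle V\psi,(H-\lambda-i0_+)^{-1}\bar P V\psi\big\rangle\Big| \ge \varepsilon\sigma_n^2. \tag{5.8} \]
Since $\dim\mathrm{Ran}(P)=1$, (5.8) actually holds for any $\psi\in\mathrm{Ran}(P)$ such that $\|\psi\|=1$. Let $\psi_n$ be a normalized eigenstate of $H_n := H_{\sigma_n}$ associated to $\lambda_n$, $H_n\psi_n=\lambda_n\psi_n$. Arguing as in the proof of Proposition 4.3 we can assume that there exists $\tilde\psi\in\mathrm{Ran}(P)$ such that $\|\psi_n-\tilde\psi\|\to 0$. Henceforth we set $\psi=\tilde\psi$. Let $P_n := E_{H_n}(\{\lambda_n\})$. It follows from the fact that $\dim\mathrm{Ran}(P)=1$ together with Theorem 1.14 that $\dim\mathrm{Ran}(P_n)=1$. Hence $P_n=|\psi_n\rangle\langle\psi_n|$. The equation $(H_n-\lambda_n)P_n=0$ is equivalent to the following system of equations:
\[ \begin{cases} P(\sigma_n V+\lambda-\lambda_n)P_n = 0, \\ \sigma_n\bar P VP_n + (\lambda-\lambda_n)\bar P P_n + (H-\lambda)\bar P P_n = 0. \end{cases} \tag{5.9} \]
Since $\|\psi_n-\psi\|\to 0$, we have $\|\bar P P_n\|\to 0$ and $\|PP_n\|\to 1$. Hence the first equation of (5.9) yields
\[ \lambda-\lambda_n = O(|\sigma_n|). \tag{5.10} \]
Now, using the second equation of (5.9), we can write, for any $\phi\in\mathcal{H}$ such that $\|\phi\|=1$, and any $1/2<s\le 1$,
\[ \begin{aligned} \|\bar P P_n\phi\|^2 &= \langle\bar P P_n\phi,\bar P P_n\phi\rangle \\ &= -\big\langle\bar P P_n\phi,(H-\lambda-i0_+)^{-1}\big(\sigma_n\bar P VP_n + (\lambda-\lambda_n+i0_+)\bar P P_n\big)\phi\big\rangle \\ &\le C|\sigma_n|\,\big\|\langle A\rangle^{-s}(H-\lambda-i0_+)^{-1}\bar P\langle A\rangle^{-s}\big\| \times\big\|\langle A\rangle^s\bar P P_n\big\|\Big(\big\|\langle A\rangle\bar P P_n\big\| + \big\|\langle A\rangle\bar P VP_n\big\|\Big). \end{aligned} \tag{5.11} \]
Using Condition 1.10 and the assumption that $V\in\mathcal{B}_{1,\gamma}$, one can prove that $\|\langle A\rangle\bar P P_n\|$ and $\|\langle A\rangle\bar P VP_n\|$ are uniformly bounded in $n$. In addition we claim that for $s<1$, $\|\langle A\rangle^s\bar P P_n\|\to 0$ as $n\to\infty$. To prove this, it suffices to use that $\|\langle A\rangle^s(A+ik)^{-1}\|\to 0$ as $k\to\infty$, together with $\|\langle A\rangle\bar P P_n\|$ being uniformly bounded in $n$ and $\|\bar P P_n\|\to 0$ as $n\to\infty$. Therefore by Theorem 3.3,
\[ \|\bar P P_n\|^2 = o(|\sigma_n|). \tag{5.12} \]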
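Since $\dim\mathrm{Ran}(P)=\dim\mathrm{Ran}(P_n)=1$, Eq. (5.12) implies
\[ \|\bar P\psi_n\|^2 = \|\bar P_n\psi\|^2 = o(|\sigma_n|), \tag{5.13} \]
and in particular also
\[ \|\bar P_n P\|^2 = o(|\sigma_n|), \tag{5.14} \]
where we have set $\bar P_n = I-P_n$. Taking the expectation of the first equation of (5.9) in the state $\psi$ gives
\[ \begin{aligned} \lambda-\lambda_n &= -\sigma_n\langle\psi,V\psi\rangle + (\lambda-\lambda_n)(1-\|P_n\psi\|^2) - \sigma_n\langle\psi,V(P_n-P)\psi\rangle \\ &= -\sigma_n\langle\psi,V\psi\rangle + \sigma_n\langle\psi,V\bar P_n\psi\rangle + o(\sigma_n^2), \end{aligned} \tag{5.15} \]
where we used (5.10) and (5.13) in the second equality. Let us write
\[ \bar P_n\psi = P\bar P_n\psi - \bar P P_n\psi. \tag{5.16} \]
Estimate (5.14) yields $\|P\bar P_n\psi\| = o(|\sigma_n|)$. Inserting (5.16) and the second equation of (5.9) into (5.15), we obtain
\[ \lambda-\lambda_n = -\sigma_n\langle\psi,V\psi\rangle + \sigma_n^2\big\langle V\psi,(H-\lambda-i0_+)^{-1}\bar P VP_n\psi\big\rangle + \sigma_n(\lambda-\lambda_n)\big\langle V\psi,(H-\lambda-i0_+)^{-1}\bar P P_n\psi\big\rangle + o(\sigma_n^2). \tag{5.17} \]
As above we can use $\lambda-\lambda_n=O(|\sigma_n|)$ together with the fact that $\|\langle A\rangle^s\bar P P_n\|\to 0$ for $s<1$ and Theorem 3.3 to obtain
\[ \sigma_n(\lambda-\lambda_n)\big\langle V\psi,(H-\lambda-i0_+)^{-1}\bar P P_n\psi\big\rangle = o(\sigma_n^2). \tag{5.18} \]
Finally, it follows from Condition 1.10 and the assumption $V\in\mathcal{B}_{1,\gamma}$ that $\|\langle A\rangle^s V(P_n-P)\psi\|\to 0$ for $s<1$. This leads to
\[ \lambda-\lambda_n = -\sigma_n\langle\psi,V\psi\rangle + \sigma_n^2\big\langle V\psi,(H-\lambda-i0_+)^{-1}\bar P V\psi\big\rangle + o(\sigma_n^2), \tag{5.19} \]
which contradicts (5.8), and hence proves (5.2).

It remains to prove (5.3). Assume, again by contradiction, that (5.3) does not hold. Then there exist $\varepsilon>0$ and a sequence $\sigma_n\to 0$ such that $H_n=H_{\sigma_n}$ has an eigenvalue $\lambda_n\in J$ associated to a normalized eigenstate $\psi_n$ satisfying, for any $\psi\in\mathrm{Ran}(P)$, $\|\psi\|=1$,
\[ \big\|\psi_n-\psi+\sigma_n(H-\lambda-i0_+)^{-1}\bar P V\psi\big\|_{D(\langle A\rangle^s)^*} \ge \varepsilon|\sigma_n|. \tag{5.20} \]
As above we can assume that there exists $\psi\in\mathrm{Ran}(P)$ such that $\|\psi_n-\psi\|\to 0$. Let $\tilde\psi := e^{i\theta_n}\psi$, where $\theta_n\in\mathbb{R}$ is defined by the equation $\langle\psi,\psi_n\rangle = e^{i\theta_n}|\langle\psi,\psi_n\rangle|$. Using the second equation of (5.9), we can write
\[ \begin{aligned} \psi_n &= P\psi_n + \bar P\psi_n \\ &= \langle\psi,\psi_n\rangle\psi - (\lambda-\lambda_n)(H-\lambda-i0_+)^{-1}\bar P\psi_n - \sigma_n(H-\lambda-i0_+)^{-1}\bar P V\psi_n \\ &= \tilde\psi - \sigma_n(H-\lambda-i0_+)^{-1}\bar P V\tilde\psi + R_n, \end{aligned} \tag{5.21} \]
where
\[ R_n = (\|P\psi_n\|-1)\tilde\psi - (\lambda-\lambda_n)(H-\lambda-i0_+)^{-1}\bar P\psi_n - \sigma_n(H-\lambda-i0_+)^{-1}\bar P V(\psi_n-\tilde\psi). \tag{5.22} \]
By arguments similar to the ones used to prove (5.2), one can see that $\|R_n\|_{D(\langle A\rangle^s)^*} = o(|\sigma_n|)$ for any fixed $1/2<s<1$, which contradicts (5.20), and hence proves (5.3).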
5.2. Fermi Golden Rule criterion – general case.

We begin this section with a result similar to Theorem 5.1 that we shall obtain without requiring a hypothesis of simplicity. Here we need Condition 1.9 rather than Condition 1.10.

Theorem 5.4. Suppose Conditions 1.3, Condition 1.9 and Condition 1.11. Let $V\in\mathcal{V}_2$. Suppose $\lambda\in\sigma_{pp}(H)$ and that $J\subseteq I$ is a compact interval such that $\sigma_{pp}(H)\cap J=\{\lambda\}$. Let $P=E_H(\{\lambda\})$, $\bar P=I-P$. There exist $C\ge 0$ and $\sigma_0>0$ such that if $|\sigma|\le\sigma_0$ and $\lambda_\sigma\in J$ is an eigenvalue of $H_\sigma=H+\sigma V$, then there exists $\psi\in\mathrm{Ran}(P)$, $\|\psi\|=1$, such that
\[ \Big|\lambda_\sigma-\lambda-\sigma\langle\psi,V\psi\rangle+\sigma^2\big\langle V\psi,(H-\lambda-i0_+)^{-1}\bar P V\psi\big\rangle\Big| \le C|\sigma|^{5/2}. \tag{5.23} \]

Remarks 5.5. 1) In the simple case, $P=|\psi\rangle\langle\psi|$, (5.23) is stronger than (5.2).
2) We do not have an analogue of (5.3) under the conditions of Theorem 5.4, even if we assume in addition $\dim\mathrm{Ran}(P)=1$. Similarly, cf. Remark 5.2 2), we do not have upper semicontinuity of point spectrum at $\lambda$ even if $\dim\mathrm{Ran}(P)=1$.

Proof of Theorem 5.4. We can argue in a way similar to the proofs of Proposition 5.2 and Lemma 5.3 in [AHS]. For $\sigma=0$, there is nothing to prove. Let $\sigma\ne 0$. As in the proof of Theorem 3.4, we set $\bar H=H+\alpha_J P$ with $\alpha_J>\sup J-\inf J$, and $\bar H_\sigma=\bar H+\sigma V$. Assume that $\lambda_\sigma\in\sigma_{pp}(H_\sigma)$ and let $\phi_\sigma$ be such that $(H_\sigma-\lambda_\sigma)\phi_\sigma=0$, $\|\phi_\sigma\|=1$. Hence
\[ (\bar H_\sigma-\lambda_\sigma)\phi_\sigma = \alpha_J P\phi_\sigma. \tag{5.24} \]
By Theorem 3.4, $\lambda_\sigma\notin\sigma_{pp}(\bar H_\sigma)$, and hence in particular $P\phi_\sigma\ne 0$. Moreover, it follows from (5.24) that, for any $\varepsilon>0$,
\[ P\phi_\sigma = \alpha_J P\big(\bar H_\sigma-\lambda_\sigma-i\varepsilon\big)^{-1}P\phi_\sigma - i\varepsilon P\big(\bar H_\sigma-\lambda_\sigma-i\varepsilon\big)^{-1}\phi_\sigma. \tag{5.25} \]
Letting $\varepsilon\to 0$, since $\lambda_\sigma\notin\sigma_{pp}(\bar H_\sigma)$, we obtain
\[ P\phi_\sigma = \alpha_J P\big(\bar H_\sigma-\lambda_\sigma-i0_+\big)^{-1}P\phi_\sigma. \tag{5.26} \]
Note that the right-hand-side of (5.26) is well-defined by Theorem 3.4 since, by Condition 1.9, Ran(P) ⊆ D(A). Let β := α J +λ−λσ . Hence P H¯ − λσ P = β P. Using twice the second resolvent equation, one easily verifies that, for any > 0, −1 P H¯ σ − λσ − i P −1 −1 V P. = (β − i) P − (β − i)−2 σ P V P + (β − i)−2 σ 2 P V H¯ σ − λσ − i (5.27) Letting → 0 and using Theorem 3.4 with s = 1, this yields −1 P H¯ σ − λσ − i0+ P −1 V P + R1 , = β −1 P − β −2 σ P V P + β −2 σ 2 P V H¯ − λσ − i0+
(5.28)
where R1 is a bounded operator on Ran(P) satisfying R1 ≤ C1 |σ |5/2 . Note that the right-hand-side of (5.28) is well-defined by Theorem 3.4 and Remark 5.2 2).
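Now let $\psi := \|P\phi_\sigma\|^{-1}P\phi_\sigma$. Multiplying (5.28) by $\alpha_J\beta$ and taking the expectation in $\psi$, we obtain thanks to (5.26):
\[ \lambda - \lambda_\sigma = -\alpha_J\beta^{-1}\sigma\langle\psi, V\psi\rangle + \alpha_J\beta^{-1}\sigma^2\langle V\psi, (\bar H - \lambda_\sigma - i0^+)^{-1}V\psi\rangle + \langle\psi, R_1\psi\rangle. \tag{5.29} \]
Using again Theorem 3.4 with $s = 1$, this implies
\[ \lambda - \lambda_\sigma = -\alpha_J\beta^{-1}\sigma\langle\psi, V\psi\rangle + \alpha_J\beta^{-1}\sigma^2\langle V\psi, (\bar H - \lambda - i0^+)^{-1}V\psi\rangle + \langle\psi, R_2\psi\rangle, \tag{5.30} \]
where $R_2$ is a bounded operator on $\mathrm{Ran}(P)$ satisfying $\|R_2\| \le C_2|\sigma|^{5/2}$. In particular, $|\lambda - \lambda_\sigma| \le C_3|\sigma|$. We then obtain from (5.26) and (5.28) that
\[ \begin{aligned} \frac{\lambda - \lambda_\sigma}{\sigma}\psi &= -\alpha_J\beta^{-1}PVP\psi + \alpha_J\beta^{-1}\sigma PV(\bar H - \lambda - i0^+)^{-1}VP\psi + \sigma^{-1}R_2\psi \\ &= (-PVP + \sigma R_3)\psi, \end{aligned} \tag{5.31} \]
where $R_3$ is an operator on the finite dimensional space $\mathrm{Ran}(P)$ uniformly bounded in $\sigma$. It follows from the usual perturbation theory (see [Ka]) that $\psi$ can be written as $\psi = \psi_1 + \sigma\psi_2$, where $\psi_1$ is an eigenstate of $-PVP$ and $\psi_2 \in \mathrm{Ran}(P)$. Now, multiplying (5.30) by $\alpha_J^{-1}\beta$ gives
\[ \begin{aligned} (\lambda - \lambda_\sigma)\alpha_J^{-1}\beta &= -\sigma\langle\psi, V\psi\rangle + \sigma^2\langle V\psi, (\bar H - \lambda - i0^+)^{-1}V\psi\rangle + \alpha_J^{-1}\beta\langle\psi, R_2\psi\rangle \\ &= -\sigma\langle\psi, V\psi\rangle + \alpha_J^{-1}\sigma^2\langle V\psi, PV\psi\rangle + \sigma^2\langle V\psi, (H - \lambda - i0^+)^{-1}\bar P V\psi\rangle + \alpha_J^{-1}\beta\langle\psi, R_2\psi\rangle. \end{aligned} \tag{5.32} \]
By (5.30), we can write
\[ \lambda - \lambda_\sigma = -\sigma\langle\psi, V\psi\rangle + \langle\psi, R_4\psi\rangle, \tag{5.33} \]
with $\|R_4\| \le C_4\sigma^2$, and hence
\[ \begin{aligned} (\lambda - \lambda_\sigma)\alpha_J^{-1}\beta &= (\lambda - \lambda_\sigma) + \alpha_J^{-1}(\lambda - \lambda_\sigma)^2 \\ &= (\lambda - \lambda_\sigma) + \alpha_J^{-1}\sigma^2\langle\psi, V\psi\rangle^2 + O(|\sigma|^3). \end{aligned} \tag{5.34} \]
Since $\psi = \psi_1 + \sigma\psi_2$ where $\psi_1$ is an eigenstate of $-PVP$, we have
\[ \langle\psi, V\psi\rangle^2 - \|PV\psi\|^2 = O(|\sigma|). \tag{5.35} \]
Therefore,
\[ \alpha_J^{-1}\sigma^2\langle V\psi, PV\psi\rangle - \alpha_J^{-1}\sigma^2\langle\psi, V\psi\rangle^2 = O(|\sigma|^3). \tag{5.36} \]
Combining Eqs. (5.32), (5.34) and (5.36), the statement of the theorem follows.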
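The algebraic shape of the expansion in (5.23) can be sanity-checked in finite dimension. The following is only a sketch under strong simplifying assumptions that are not part of the paper: $H = \mathrm{diag}(0, 1)$ and $V = \begin{pmatrix} a & b \\ b & c \end{pmatrix}$ are a hypothetical $2\times 2$ toy model, $\lambda = 0$ is an isolated (not embedded) eigenvalue, so the $i0^+$ prescription plays no role, and $\langle V\psi, (H-\lambda)^{-1}\bar P V\psi\rangle = b^2$ for $\psi = e_1$. In this toy case the remainder is even $O(\sigma^3)$.

```python
import math


def lowest_eigenvalue(sigma, a, b, c):
    """Eigenvalue of H + sigma*V that emerges from lambda = 0, for the
    toy model H = diag(0, 1), V = [[a, b], [b, c]] (small |sigma|)."""
    p, r, q = sigma * a, 1.0 + sigma * c, sigma * b
    mean, half_gap = 0.5 * (p + r), 0.5 * (p - r)
    # Closed form for the smaller eigenvalue of a symmetric 2x2 matrix.
    return mean - math.hypot(half_gap, q)


def second_order_prediction(sigma, a, b):
    """lambda + sigma*<psi, V psi> - sigma^2 * <V psi, (H - lambda)^{-1} Pbar V psi>
    with lambda = 0, psi = e_1; the last inner product equals b**2 here."""
    return sigma * a - sigma ** 2 * b ** 2


def remainder(sigma, a=1.0, b=2.0, c=3.0):
    """Difference between the exact eigenvalue and the second-order formula."""
    return lowest_eigenvalue(sigma, a, b, c) - second_order_prediction(sigma, a, b)


if __name__ == "__main__":
    for sigma in (0.02, 0.01, 0.005):
        print(sigma, remainder(sigma))
```

Halving $\sigma$ should shrink the printed remainder by roughly a factor of eight, which is the cubic decay expected for an isolated eigenvalue; the weaker exponent $5/2$ in Theorem 5.4 reflects the embedded-eigenvalue setting, which this toy model does not capture.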
We come now to the proof of Theorem 1.15 on the absence of eigenvalues of the perturbed Hamiltonian Hσ = H + σ V , generalizing Corollary 5.3:
Proof of Theorem 1.15. Suppose first that Condition 1.9 holds and that $V \in \mathcal{V}_2$. By Theorem 5.4, there exists $\sigma_0 > 0$ such that if $\lambda_\sigma$ is an eigenvalue of $H_\sigma$ with $|\sigma| \le \sigma_0$, then (5.23) is satisfied. Taking the imaginary part of (5.23) contradicts (1.20).
Suppose now Condition 1.10 and that $V \in \mathcal{B}_{1,\gamma}$. Assume by contradiction that (1.21) is false. Then the second alternative ii) of Corollary 4.4 holds. Hence we consider a sequence of normalized eigenstates $\psi_n \to \psi_\infty \in \mathrm{Ran}(P)$ of a sequence of Hamiltonians $H_n := H_{\sigma_n}$ given in terms of a certain sequence of coupling constants $\sigma_n \to 0$, $\sigma_n \neq 0$. Let $P_n = |\psi_n\rangle\langle\psi_n|$. As in the proof of Theorem 5.1, the equation $(H_n - \lambda_n)P_n = 0$ is equivalent to (5.9). We notice that
\[ \mathrm{Im}\,(P_n V \bar P P_n) = -\mathrm{Im}\,(P_n V P P_n) = \frac{\lambda - \lambda_n}{\sigma_n}\,\mathrm{Im}\,(P_n P P_n) = 0, \tag{5.37} \]
due to the first equation of (5.9). Next we apply $P_n V(H - \lambda - i0^+)^{-1}\bar P$ from the left in the second equation of (5.9), take the imaginary part and use (5.37), yielding
\[ \sigma_n P_n V\,\mathrm{Im}\big((H - \lambda - i0^+)^{-1}\bar P\big)V P_n = (\lambda_n - \lambda)\,\mathrm{Im}\big(P_n V(H - \lambda - i0^+)^{-1}\bar P P_n\big). \tag{5.38} \]
Now we take the expectation of (5.38) in the state $\psi_\infty$, use the first equation of (5.9) and divide by $\sigma_n$, yielding
\[ \big\langle \mathrm{Im}\big((H - \lambda - i0^+)^{-1}\bar P\big)\big\rangle_{V P_n\psi_\infty} = \mathrm{Im}\,\big\langle P_n V(H - \lambda - i0^+)^{-1}\bar P P_n V\big\rangle_{\psi_\infty}. \tag{5.39} \]
Again, using Condition 1.10, we have that $\|\langle A\rangle^s \bar P P_n\| \to 0$ for $1/2 < s < 1$. We then conclude by letting $n \to \infty$ in the above identity, using Theorem 3.3, which yields
\[ \big\langle \mathrm{Im}\big((H - \lambda - i0^+)^{-1}\bar P\big)\big\rangle_{V\psi_\infty} = 0. \tag{5.40} \]
Clearly (5.40) contradicts (1.20).
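The expectation in (5.40) involves the imaginary part of a boundary value of the resolvent. As a hedged illustration only, in a scalar toy model that is not taken from the paper, let $H$ be multiplication by $x$ on $L^2(\mathbb{R})$ and let $f$ be a vector with density $|f|^2$. Then $\mathrm{Im}\,\langle f, (H - \lambda - i\epsilon)^{-1}f\rangle = \int |f(x)|^2\,\epsilon/((x-\lambda)^2 + \epsilon^2)\,dx$, which tends to $\pi|f(\lambda)|^2$ as $\epsilon \to 0$. The function names and the Gaussian density below are assumptions made for the sketch.

```python
import math


def im_resolvent_expectation(density, lam, eps, n=100_000):
    """Approximate int density(x) * eps / ((x - lam)^2 + eps^2) dx.

    The substitution x = lam + eps * tan(theta) turns the Poisson kernel
    into d(theta) on (-pi/2, pi/2); the integral is then evaluated with
    the midpoint rule on n cells.
    """
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = -0.5 * math.pi + (k + 0.5) * h
        total += density(lam + eps * math.tan(theta))
    return total * h


def gaussian_density(x):
    """Hypothetical choice |f(x)|^2 = exp(-x^2)."""
    return math.exp(-x * x)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2, 1e-3):
        value = im_resolvent_expectation(gaussian_density, 0.0, eps)
        print(eps, value, math.pi * gaussian_density(0.0))
```

In this model the limit is strictly positive wherever the density does not vanish, which is the kind of positivity that a Fermi Golden Rule condition expresses and that an identity such as (5.40) would violate.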
Appendix A. Domain Question
We give an independent proof of (1.11) under Condition 1.3 (1); in fact we shall give an alternative proof of the fact that
\[ D(\sqrt{G}) = \mathcal{G}, \tag{A.1} \]
cf. Remark 1.4 2). Obviously $D(\sqrt{G}) \subseteq \mathcal{G}$ and the graph norm of $\sqrt{G}$ is equivalent to the norm on $\mathcal{G}$ (defined by (1.3)). In particular $D(\sqrt{G})$ is closed in $\mathcal{G}$. Whence (A.1) is in turn a consequence of (1.11). The proof of (1.11) is in two steps.
Step I. We shall show that
\[ D(M) \cap D(|H|^{1/2}) \text{ is dense in } \mathcal{G}. \tag{A.2} \]
We will essentially use [FMS, (3.14)]. Whence, introducing the notation
\[ I_n(M) = -in(M - in)^{-1} \quad\text{for } n \in \mathbb{N}, \]
we have
\[ \operatorname*{s-lim}_{n\to\infty} H^{1/2} I_n(M) H^{-1/2} = I. \tag{A.3} \]
For completeness let us here give the proof of (A.3) following [FMS]: Due to [Mo, Prop. II.3],
\[ \operatorname*{s-lim}_{n\to\infty} H I_n(M) H^{-1} = I. \tag{A.4} \]
Introducing
\[ B_n^s = H^s(I_n(M) - I)H^{-s}; \quad \mathrm{Re}\, s \in [0, 1], \]
we observe that the families $\{B_n^1\}$ and $\{B_n^0\}$ are bounded. Whence also $\{B_n^{1/2}\}$ is bounded (by interpolation). Using this fact and the fact that $B_n^{1/2}\phi \to 0$ for $\phi \in D(H^{1/2})$ (due to (A.4)) we obtain (A.3).
Now, to show (A.2), we let $\phi \in \mathcal{G}$ be given and define $\phi_n = I_n(M)\phi$. By (A.3) we have $\phi_n \in D(M) \cap D(|H|^{1/2})$ and in fact that $\|H^{1/2}(\phi - \phi_n)\| \to 0$. Obviously $\|M^{1/2}(\phi - \phi_n)\| \to 0$. We conclude that $\|\phi - \phi_n\|_{\mathcal{G}} \to 0$.
Step II. We shall show that
\[ \mathcal{D} \text{ is dense in } D(M) \cap D(|H|^{1/2}) \subseteq \mathcal{G}. \tag{A.5} \]
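The regularizing operators $I_n(M) = -in(M - in)^{-1}$ used in Step I act, in a spectral representation of $M$, as multiplication by $-in/(m - in)$. The sketch below checks the two scalar facts behind the strong convergence $I_n(M) \to I$: the multiplier has modulus at most one, and it tends to one for each fixed $m$. The function name is an assumption introduced for illustration; the sketch says nothing about the conjugated operators $H^{1/2} I_n(M) H^{-1/2}$, for which the text relies on interpolation.

```python
def approximate_identity(m, n):
    """Scalar model of I_n(M) = -i n (M - i n)^{-1}, with M replaced by
    multiplication by the real number m."""
    return -1j * n / (m - 1j * n)


if __name__ == "__main__":
    m = 3.0
    for n in (1, 10, 100, 1000, 10 ** 6):
        z = approximate_identity(m, n)
        # |z| = n / sqrt(n^2 + m^2) <= 1 and |z - 1| = |m| / sqrt(n^2 + m^2).
        print(n, abs(z), abs(z - 1))
```

Uniform boundedness together with pointwise convergence of the multiplier is what gives strong convergence on the whole Hilbert space by dominated convergence.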
Whence let $\phi \in D(M) \cap D(|H|^{1/2})$ be given. Define similarly $\phi_n = I_n(H)\phi$. Since $H \in C^1_{\mathrm{Mo}}(M)$ we can compute
\[ [M, I_n(H)] = n^{-1} I_n(H)[H, iM]^0 I_n(H) \in \mathcal{B}(\mathcal{H}), \]
and therefore deduce that
\[ \operatorname*{s-lim}_{n\to\infty} [M, I_n(H)] = 0. \tag{A.6} \]
It follows from (A.6) that $\phi_n \in \mathcal{D}$ and that $\|M(\phi - \phi_n)\| \to 0$. Clearly $\|H^{1/2}(\phi - \phi_n)\| \to 0$. In particular $\|\phi - \phi_n\|_{\mathcal{G}} \to 0$. Clearly (1.11) follows by combining (A.2) and (A.5).
References
[AHS] Agmon, S., Herbst, I., Skibsted, E.: Perturbation of embedded eigenvalues in the generalized N-body problem. Commun. Math. Phys. 122, 411–438 (1989)
[AC] Aguilar, J., Combes, J.M.: A class of analytic perturbation for one-body Schrödinger Hamiltonians. Commun. Math. Phys. 22, 269–279 (1971)
[ABG] Amrein, W., Boutet de Monvel, A., Georgescu, V.: $C_0$-groups, commutator methods and spectral theory of N-body Hamiltonians. Basel–Boston–Berlin: Birkhäuser, 1996
[BFS] Bach, V., Fröhlich, J., Sigal, I.M.: Quantum electrodynamics of confined non-relativistic particles. Adv. Math. 137, 299–395 (1998)
[BFSS] Bach, V., Fröhlich, J., Sigal, I.M., Soffer, A.: Positive commutators and the spectrum of Pauli-Fierz Hamiltonian of atoms and molecules. Commun. Math. Phys. 207, 557–587 (1999)
[BC] Balslev, E., Combes, J.M.: Spectral properties of many-body Schrödinger operators with dilation analytic interactions. Commun. Math. Phys. 22, 280–294 (1971)
[BD] Bruneau, L., Dereziński, J.: Pauli-Fierz Hamiltonians defined as quadratic forms. Rep. Math. Phys. 54, 169–199 (2004)
[Ca] Cattaneo, L.: Mourre's inequality and embedded bound states. Bull. Sci. Math. 129, 591–614 (2005)
[CGH] Cattaneo, L., Graf, G.M., Hunziker, W.: A general resonance theory based on Mourre's inequality. Ann. Henri Poincaré 7, 583–601 (2006)
[Da] Davies, E.B.: Linear operators and their spectra. Cambridge: Cambridge University Press, 2007
[DG] Dereziński, J., Gérard, C.: Asymptotic completeness in quantum field theory. Massive Pauli-Fierz Hamiltonians. Rev. Math. Phys. 11, 383–450 (1999)
[DJ] Dereziński, J., Jakšić, V.: Spectral theory of Pauli-Fierz operators. J. Funct. Anal. 180, 243–327 (2001)
[FMS] Faupin, J., Møller, J.S., Skibsted, E.: Regularity of bound states. Rev. Math. Phys. (to appear)
[FGS] Fröhlich, J., Griesemer, M., Sigal, I.M.: Spectral theory for the standard model of non-relativistic QED. Commun. Math. Phys. 283, 613–646 (2008)
[GG] Georgescu, V., Gérard, C.: On the virial theorem in quantum mechanics. Commun. Math. Phys. 208, 275–281 (1999)
[GGM1] Georgescu, V., Gérard, C., Møller, J.S.: Commutators, $C_0$-semigroups and resolvent estimates. J. Funct. Anal. 216, 303–361 (2004)
[GGM2] Georgescu, V., Gérard, C., Møller, J.S.: Spectral theory of massless Pauli-Fierz models. Commun. Math. Phys. 249, 29–78 (2004)
[Go] Golénia, S.: Positive commutators, Fermi golden rule and the spectrum of 0 temperature Pauli-Fierz Hamiltonians. J. Funct. Anal. 256, 2587–2620 (2009)
[GJ] Golénia, S., Jecko, T.: A new look at Mourre's commutator theory. Compl. Anal. Oper. Th. 1, 399–422 (2007)
[HP] Hille, E., Phillips, R.S.: Functional Analysis and Semigroups. Providence, RI: Amer. Math. Soc., 1957
[HS] Hunziker, W., Sigal, I.M.: The quantum N-body problem. J. Math. Phys. 41, 3448–3510 (2000)
[HuSp] Hübner, M., Spohn, H.: Spectral properties of the spin-boson Hamiltonian. Ann. Inst. Henri Poincaré 62, 289–323 (1995)
[JP] Jakšić, V., Pillet, C.A.: On a model for quantum friction, II. Fermi's golden rule and dynamics at positive temperature. Commun. Math. Phys. 176, 619–644 (1996)
[Ka] Kato, T.: Perturbation Theory for Linear Operators. Second edition, Berlin: Springer-Verlag, 1976
[Mo] Mourre, É.: Absence of singular continuous spectrum for certain selfadjoint operators. Commun. Math. Phys. 78, 391–408 (1980/81)
[MR] Møller, J.S., Rasmussen, M.G.: The Translation Invariant Massive Nelson Model: II. The Continuous Spectrum Below the Two-boson Threshold. In preparation
[MS] Møller, J.S., Skibsted, E.: Spectral theory of time-periodic many-body systems. Adv. in Math. 188, 137–221 (2004)
[RS] Reed, M., Simon, B.: Methods of modern mathematical physics I-IV. New York: Academic Press, 1972-78
[Si] Simon, B.: Resonances in N-body quantum systems with dilation analytic potential and foundation of time-dependent perturbation theory. Ann. Math. 97, 247–274 (1973)
[Sk] Skibsted, E.: Spectral analysis of N-body systems coupled to a bosonic field. Rev. Math. Phys. 10, 989–1026 (1998)
[Yo] Yosida, K.: Functional analysis. Berlin: Springer, 1965
Communicated by H. Spohn
Commun. Math. Phys. 306, 229–260 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1292-z
Communications in
Mathematical Physics
Energy Properness and Sasakian-Einstein Metrics Xi Zhang Department of Mathematics, Zhejiang University, Hangzhou 310027, P.R. China. E-mail:
[email protected] Received: 15 July 2010 / Accepted: 24 February 2011 Published online: 24 June 2011 – © Springer-Verlag 2011
Abstract: In this paper, we show that the existence of Sasakian-Einstein metrics is closely related to the properness of corresponding energy functionals. Under the condition of admitting no nontrivial Hamiltonian holomorphic vector field, we prove that the existence of a Sasakian-Einstein metric implies a Moser-Trudinger type inequality. At the end of this paper, we also obtain a Miyaoka-Yau type inequality in Sasakian geometry.
1. Introduction
An odd dimensional Riemannian manifold $(M, g)$ is said to be a Sasakian manifold if the cone manifold $(C(M), \tilde g) = (M \times \mathbb{R}^+, r^2 g + dr^2)$ is Kähler. In this paper, we suppose that $\dim M = 2m + 1$. Furthermore, the Sasakian manifold $(M, g)$ is said to be Sasakian-Einstein if the Ricci tensor of $g$ satisfies the Einstein condition. It is well known that the Kähler cone $(C(M), \tilde g)$ is a Calabi-Yau cone if and only if $(M, g)$ is a Sasakian-Einstein manifold.
The anti-de Sitter/conformal field theory (AdS/CFT) correspondence is a conjectured equivalence between a quantum field theory in $d$ spacetime dimensions with conformal scaling symmetry and a quantum theory of gravity in $(d + 1)$-dimensional anti-de Sitter space. In its original version the Maldacena conjecture (also known as AdS/CFT duality) states that the 't Hooft large $n$ limit of $\mathcal{N} = 4$ supersymmetric Yang-Mills theory with gauge group $SU(n)$ is dual to type IIB superstring theory on $AdS_5 \times S^5$ [23]. An interesting generalization of this duality can be produced by placing D3 branes at the tip of the Ricci-flat 6-dimensional cone $Y$ (see Klebanov and Witten's paper [21]). The cone metric may be cast in the form $ds_Y^2 = dr^2 + r^2 ds_M^2$, where $M$ is the level hypersurface of $Y$. In particular, $M$ is a positively curved Einstein manifold. In order to preserve the $\mathcal{N} = 1$ supersymmetry, $Y$ must be a Calabi-Yau space, in which case $M$ is Sasakian-Einstein. This has recently led to considerable interest in Sasakian-Einstein geometry, see references [5–12,16–20,24–27,39].
The author was supported in part by NSF in China, No. 10831008 and No. 11071212.
In this paper, we discuss the existence problem of Sasakian-Einstein metrics. As in the Kähler case, we show that the existence of Sasakian-Einstein metrics is also closely related to the properness of corresponding energy functionals. Furthermore, under the condition of admitting no nontrivial Hamiltonian holomorphic vector field, we prove that the existence of the Sasakian-Einstein metric implies a Moser-Trudinger type inequality. By the AdS/CFT duality, it is reasonable to believe that the properness of energy functionals and the Moser-Trudinger type inequality should have a physical interpretation in terms of conformal field theory, and we will discuss this problem further in the future.
Sasakian manifolds can be studied from many viewpoints as they have many structures. A Sasakian manifold $(M, g)$ has a contact structure $(\xi, \eta, \Phi)$, and it also has a one dimensional foliation $\mathcal{F}_\xi$, called the Reeb foliation. Here, the Killing vector field $\xi$ is called the characteristic or Reeb vector field, $\eta$ is called the contact 1-form, and $\Phi$ is a $(1, 1)$ tensor field which defines a complex structure on the contact sub-bundle $\mathcal{D} = \ker\eta$. In the following, a Sasakian manifold will be denoted by $(M, \xi, \eta, \Phi, g)$, and the quadruple $(\xi, \eta, \Phi, g)$ will be called a Sasakian structure on the manifold $M$. The Sasakian manifold $(M, \xi, \eta, \Phi, g)$ is said to be quasi-regular if all the leaves of $\mathcal{F}_\xi$ are compact. Otherwise $(M, \xi, \eta, \Phi, g)$ is said to be irregular. In a natural way, the Sasakian structure $(\xi, \eta, \Phi, g)$ induces a transverse holomorphic structure and a transverse Kähler metric on the foliation $\mathcal{F}_\xi$. In this paper, we may change the Sasakian structure, but always fix the Reeb vector field $\xi$ and the transverse holomorphic structure on $\mathcal{F}_\xi$.
Fixing a transverse holomorphic structure on $\mathcal{F}_\xi$, we have a splitting of the complexification of the bundle $\wedge^1_B(M)$ of basic one forms on $M$, $\wedge^1_B(M) \otimes \mathbb{C} = \wedge^{1,0}_B(M) \oplus \wedge^{0,1}_B(M)$, and then we have the decomposition of $d$, i.e. $d = \partial_B + \bar\partial_B$. We have the basic cohomology groups $H^{i,j}_B(M, \mathcal{F}_\xi)$ which enjoy many of the same properties as the Dolbeault cohomology of a Kähler structure. We also have the transverse Chern-Weil theory and can define the basic Chern classes $c^B_k(M, \mathcal{F}_\xi)$. For details, see [8]. Given a transverse Kähler structure $g^T$, one can define the transverse Levi-Civita connection $\nabla^T$ on the normal bundle $\nu(\mathcal{F}_\xi) = TM/L\xi$, and then one can define the transverse Ricci curvature $Ric^T$. See Sect. 2 for details. If we denote the related Ricci form by $\rho^T$, it is easy to see that $\rho^T$ is a closed basic $(1, 1)$-form and the basic cohomology class $\frac{1}{2\pi}[\rho^T]_B = c^B_1(M, \mathcal{F}_\xi)$ is the basic first Chern class. A Sasakian metric $(\xi, \eta, \Phi, g)$ is said to be transversely Kähler-Einstein if its transverse Ricci form satisfies $\rho^T = \mu\, d\eta$. It is easy to see that if a Sasakian metric $(\xi, \eta, \Phi, g)$ is Sasakian-Einstein then it must be transversely Kähler-Einstein and $\rho^T = (m + 1)d\eta$. So, a necessary condition for the existence of a Sasakian-Einstein metric on $M$ is that there exists a Sasakian structure $(\xi, \eta, \Phi, g)$ such that $2\pi c^B_1(M, \mathcal{F}_\xi) = (m + 1)[d\eta]_B$.
Given a Sasakian structure $(\xi, \eta, \Phi, g)$ on $M$, denote the space of all smooth basic real functions $\varphi$ (i.e. $\xi\varphi \equiv 0$) on $(M, \xi, \eta, \Phi, g)$ by $C^\infty_B(M, \xi)$. Set
\[ \mathcal{H}(\xi, \eta, \Phi, g) = \{\varphi \in C^\infty_B(M, \xi) : \eta_\varphi \wedge (d\eta_\varphi)^m \neq 0\}, \tag{1.1} \]
where
\[ \eta_\varphi = \eta + \frac{\sqrt{-1}}{2}(\bar\partial_B - \partial_B)\varphi, \qquad d\eta_\varphi = d\eta + \sqrt{-1}\,\partial_B\bar\partial_B\varphi. \tag{1.2} \]
For any $\varphi \in \mathcal{H}$, $(\xi, \eta_\varphi, \Phi_\varphi, g_\varphi)$ is also a Sasakian structure on $M$, where
\[ \Phi_\varphi = \Phi - \xi \otimes (d^c_B\varphi) \circ \Phi, \qquad g_\varphi = \frac{1}{2}\, d\eta_\varphi \circ (Id \otimes \Phi_\varphi) + \eta_\varphi \otimes \eta_\varphi. \tag{1.3} \]
Furthermore, $(\xi, \eta_\varphi, \Phi_\varphi, g_\varphi)$ and $(\xi, \eta, \Phi, g)$ have the same transversely holomorphic structure on $\nu(\mathcal{F}_\xi)$ and the same holomorphic structure on the cone $C(M)$ (Prop. 4.2 in [16], also [8]). Obviously, these deformations of the Sasakian structure deform the transverse Kähler form within the same basic $(1, 1)$ class. We call this class the basic Kähler class of the Sasakian manifold $(M, \xi, \eta, \Phi, g)$.
As in the Kähler case, one can define Aubin's functionals $I_{d\eta}$, $J_{d\eta}$, Ding and Tian's energy functional $F_{d\eta}$, and Mabuchi's K-energy functional $V_{d\eta}$ on the space $\mathcal{H}(\xi, \eta, \Phi, g)$, see Sect. 3 for details. We say the energy functional $F_{d\eta}$ (or $V_{d\eta}$) is proper if $\limsup_{i\to+\infty} F_{d\eta}(\varphi_i) = +\infty$ whenever $\lim_{i\to+\infty} J_{d\eta}(\varphi_i) = +\infty$, where $\varphi_i \in \mathcal{H}(\xi, \eta, \Phi, g)$. We say two Sasakian structures are compatible with each other if they have the same Reeb vector field and the same transverse holomorphic structure.
In Kähler geometry, Tian [36] has shown that the existence of a Kähler-Einstein metric is equivalent to the properness of the $F$ energy functional on a compact Kähler manifold with positive first Chern class and without any nontrivial holomorphic vector fields. In this paper, under the assumption that there is no nontrivial Hamiltonian holomorphic vector field (see Sect. 2 for details), we generalize Tian's result in [36] to the Sasakian case. It should be pointed out that, in the special case where the Sasakian manifold is the total space of an $S^1$ Seifert bundle over a Kähler orbifold (i.e. is quasi-regular), Ding and Tian's result about Kähler-Einstein orbifolds ([14]) can be used for the existence problem for quasi-regular Sasakian-Einstein metrics. For the more general case, which includes irregular Sasakian-Einstein structures, one needs to study the transverse Kähler geometry. In fact, we obtain the following theorem.
Main Theorem. Let $(M, \xi, \eta, \Phi, g)$ be a compact Sasakian manifold with $[d\eta]_B = \frac{2\pi}{m+1} c^B_1(M, \mathcal{F}_\xi)$ and without any nontrivial Hamiltonian holomorphic vector field. Then $M$ has a Sasakian-Einstein structure compatible with $(\xi, \eta, \Phi, g)$ if and only if the functional $F_{d\eta}$ (or the K-energy functional $V_{d\eta}$) is proper in the space $\mathcal{H}(\xi, \eta, \Phi, g)$.
We will follow Tian's method, and the discussion in [29] by Phong, Song, Sturm and Weinkove. It is easy to see that if a Sasakian structure $(\xi, \eta', \Phi', g')$ is compatible with $(\xi, \eta, \Phi, g)$, then we have $[d\eta']_B = [d\eta]_B \in H^{1,1}_B(\mathcal{F}_\xi)$. By a transverse $\partial\bar\partial$ lemma ([15]), there exists a basic function $\varphi \in \mathcal{H}(\xi, \eta, \Phi, g)$ such that $d\eta' = d\eta + dd^c_B\varphi$, and
\[ \eta' = \eta + d^c_B\varphi + \zeta, \tag{1.4} \]
where $\zeta$ is a closed basic one form and $d^c_B = \frac{\sqrt{-1}}{2}(\bar\partial_B - \partial_B)$. So, the existence problem of a Sasakian-Einstein metric compatible with $(\xi, \eta, \Phi, g)$ can be reduced to solving the following transverse Monge-Ampère equation:
\[ \frac{(d\eta + \sqrt{-1}\,\partial_B\bar\partial_B\varphi)^m \wedge \eta}{(d\eta)^m \wedge \eta} = \exp(h_{d\eta} - (m + 1)\varphi), \tag{1.5} \]
where $\varphi \in \mathcal{H}(\xi, \eta, \Phi, g)$ and $h_{d\eta}$ is a smooth basic function which satisfies $\rho^T = (m + 1)d\eta + \sqrt{-1}\,\partial_B\bar\partial_B h_{d\eta}$ and $\int_M \exp(h_{d\eta})(d\eta)^m \wedge \eta = \int_M (d\eta)^m \wedge \eta = V$. In order to use the continuity method, we consider the following family of equations:
\[ \frac{(d\eta + \sqrt{-1}\,\partial_B\bar\partial_B\varphi)^m \wedge \eta}{(d\eta)^m \wedge \eta} = \exp(h_{d\eta} - t(m + 1)\varphi), \tag{1.6} \]
where $t \in [0, 1]$. By El-Kacimi's ([15]) generalization of Yau's estimates ([38]) for transverse Monge-Ampère equations, to solve the transverse Monge-Ampère equation (1.5),
it is sufficient to obtain an a priori uniform $C^0$ estimate for solutions $\varphi$ of the transverse Monge-Ampère equation (1.6). Unlike the Kähler case, the transverse Monge-Ampère equation (1.6) only gives a bound on the transverse Ricci curvature, which does not lead to a lower bound of the Ricci curvature by a positive constant. So, we cannot apply the Myers theorem directly to obtain an estimate on the diameter and the lower bound of the Green's function. As pointed out by Sekiya ([31]), through a D-homothetic deformation of the Sasakian structure, one can get the desired estimates in the Sasakian case. In Sect. 4 (Theorem 4.3), we show that the properness of the energy functional $F_{d\eta}$ (or the K-energy functional $V_{d\eta}$) implies the $C^0$ estimate, and so we get an existence result for the Sasakian-Einstein metric.
To prove that the existence of the Sasakian-Einstein metric implies the properness of the energy functional, we use the backward continuity method. In order to apply the implicit function theorem at time $t = 1$, we should consider the eigenspace corresponding to the eigenvalue $-4(m + 1)$ of the basic Laplacian of the Sasakian-Einstein metric. Thanks to A. Futaki, H. Ono and G. Wang's result [16], we know that the above eigenspace is nontrivial if and only if there exists a nontrivial Hamiltonian holomorphic vector field. In Sect. 6 (Theorem 6.1), under the assumption that there is no nontrivial Hamiltonian holomorphic vector field, we obtain the following Moser-Trudinger inequality on a Sasakian-Einstein manifold $(M, \xi, \eta_{SE}, \Phi_{SE}, g_{SE})$, i.e. there exist uniform positive constants $C_1$, $C_2$ such that
\[ F_{d\eta_{SE}}(\varphi) \ge C_1 J_{d\eta_{SE}}(\varphi) - C_2, \tag{1.7} \]
for all $\varphi \in \mathcal{H}(\xi, \eta_{SE}, \Phi_{SE}, g_{SE})$. In view of the cocycle identity of $F_{d\eta}$ and properties of $J_{d\eta}$ (see Sect. 3, Lemma 3.1 and Lemma 3.2), the inequality holds for every Sasakian structure $(\xi, \eta, \Phi, g)$ which is compatible with the Sasakian-Einstein structure $(\xi, \eta_{SE}, \Phi_{SE}, g_{SE})$. On the other hand, the relation (3.25) implies that the Moser-Trudinger type inequality (1.7) is also valid for the K-energy $V_{d\eta}$.
Given a Sasakian structure $(\xi, \eta, \Phi, g)$ on $M$ with $2\pi c^B_1(M) = (m + 1)[d\eta]_B$, we denote by $\mathcal{S}(\xi, \bar J)$ the set of all Sasakian structures compatible with $(\xi, \eta, \Phi, g)$. By Definition 2.6 and Proposition 2.7, it is easy to see that the integral
\[ \int_M \Big(2c^B_2(M, \mathcal{F}_\xi) - \frac{m}{m+1}\, c^B_1(M, \mathcal{F}_\xi)^2\Big) \wedge \frac{(\frac{1}{2}d\eta')^{m-2}}{(m-2)!} \wedge \eta' \tag{1.8} \]
is independent of the choice of a Sasakian structure $(\xi, \eta', \Phi', g') \in \mathcal{S}(\xi, \bar J)$. By direct calculation, see Lemma 7.2, if there exists a Sasakian-Einstein structure (or equivalently a Sasakian structure with constant scalar curvature) in $\mathcal{S}(\xi, \bar J)$, then the integral (1.8) must be nonnegative; this inequality will be called a Miyaoka-Yau type inequality. In Sect. 7, we discuss this Miyaoka-Yau type inequality under a weaker condition, and show that boundedness from below of the energy functional $F_{d\eta}$ (or of Mabuchi's K-energy $V_{d\eta}$) implies this weak condition. See Theorem 7.3 and Proposition 7.5 for details.
This paper is organized as follows. In Sect. 2, we recall some preliminary results about Sasakian geometry. In Sect. 3, we introduce energy functionals in Sasakian geometry. In Sect. 4, we consider the transverse Monge-Ampère equation, and give an existence result for the Sasakian-Einstein metric. In Sect. 5, we use the Sasakian-Ricci flow to get a smoothing lemma. In Sect. 6, we give a proof of the Moser-Trudinger type inequality (1.7), and finish the proof of the main theorem. In the last section, we obtain a Miyaoka-Yau type inequality in Sasakian geometry.
2. Preliminary Results in Sasakian Geometry
2.1. Transverse Kähler structure. Let $(M, \xi, \eta, \Phi, g)$ be a $(2m+1)$-dimensional Sasakian manifold, and let $\mathcal{F}_\xi$ be the characteristic foliation generated by $\xi$. Firstly, let us recall that a Sasakian structure induces a transverse Kähler structure on the foliation $\mathcal{F}_\xi$. A transverse holomorphic structure on $\mathcal{F}_\xi$ is given by an open covering $\{U_i\}_{i\in A}$ of $M$ and local submersions $f_i : U_i \to \mathbb{C}^m$ with fibers of dimension 1 (the leaves of the foliation $\mathcal{F}_\xi|_{U_i}$ on $U_i$ coincide with the fibers of the map $f_i$; a leaf is the image of the flow of $\xi$), such that for $i, j \in A$ there is a holomorphic isomorphism $\theta_{ij}$ of open sets of $\mathbb{C}^m$ such that $f_i = \theta_{ij} \circ f_j$ on $U_i \cap U_j$.
In order to consider the deformations of Sasakian structures, we consider the quotient bundle of the foliation $\mathcal{F}_\xi$, $\nu(\mathcal{F}_\xi) = TM/L\xi$. The metric $g$ gives a bundle isomorphism $\sigma$ between $\nu(\mathcal{F}_\xi)$ and the contact sub-bundle $\mathcal{D} = \ker\eta$, where $\sigma : \nu(\mathcal{F}_\xi) \to \mathcal{D}$ is defined by $\sigma([X]) = X - \eta(X)\xi$. By this isomorphism, $\Phi|_{\mathcal{D}}$ induces a complex structure $\bar J$ on $\nu(\mathcal{F}_\xi)$. Since the Nijenhuis torsion tensor of $\Phi$ satisfies $N_\Phi(X, Y) = -d\eta(X, Y) \otimes \xi$, the pair $(\nu(\mathcal{F}_\xi), \bar J) \cong (\mathcal{D}, \Phi|_{\mathcal{D}})$ gives $\mathcal{F}_\xi$ a transverse holomorphic structure. Then $(\mathcal{D}, \Phi|_{\mathcal{D}}, d\eta)$ gives $\mathcal{F}_\xi$ a transverse Kähler structure with transverse Kähler form $\frac{1}{2}d\eta$ and metric $g^T$ defined by $g^T(\cdot, \cdot) = \frac{1}{2}d\eta(\cdot, \Phi\cdot)$. In the following we say that a Sasakian structure $(\xi, \eta', \Phi', g')$ has the same transverse holomorphic structure as $(\xi, \eta, \Phi, g)$ if it satisfies
\[ \bar J \circ \pi_\nu = \pi_\nu \circ \Phi', \tag{2.1} \]
where $\pi_\nu$ is the projection $\pi_\nu : TM \to \nu(\mathcal{F}_\xi)$.
Definition 2.1. We define $\mathcal{S}(\xi, \bar J)$ to be the set of all Sasakian structures which have the same Reeb vector field and the same transverse holomorphic structure as $(\xi, \eta, \Phi, g)$, i.e. all Sasakian structures compatible with $(\xi, \eta, \Phi, g)$.
Definition 2.2. Fix a transverse holomorphic structure $(\nu(\mathcal{F}_\xi), \bar J)$ on the characteristic foliation $\mathcal{F}_\xi$. A complex vector field $X$ on $M$ is called a transverse holomorphic vector field if it satisfies:
1. $\pi[\xi, X] = 0$;
2. $\bar J(\pi(X)) = \sqrt{-1}\,\pi(X)$;
3. $\pi([Y, X]) - \sqrt{-1}\,\bar J\pi([Y, X]) = 0$ for all $Y$ satisfying $\bar J\pi(Y) = -\sqrt{-1}\,\pi(Y)$,
where $\pi$ is the projection $\pi : TM \to \nu(\mathcal{F}_\xi)$.
Given a transverse Kähler form $d\eta$, let $\psi$ be a complex valued basic function. Then there is a unique vector field $V_{d\eta}(\psi) \in \Gamma(T^cM)$ which satisfies:
(1) $\bar J(\pi(V_{d\eta}(\psi))) = \sqrt{-1}\,\pi(V_{d\eta}(\psi))$;
(2) $\psi = \sqrt{-1}\,\eta(V_{d\eta}(\psi))$;
(3) $\bar\partial_B\psi = -\frac{\sqrt{-1}}{2}\, d\eta(V_{d\eta}(\psi), \cdot)$.
The vector field $V_{d\eta}(\psi)$ is called the Hamiltonian vector field of $\psi$ corresponding to the transverse Kähler form $d\eta$. A complex vector field $X$ on $M$ is called a Hamiltonian holomorphic vector field if it is transverse holomorphic and is the Hamiltonian vector field of some complex valued basic function $\psi$ corresponding to some transverse Kähler form $d\eta$.
Remark 2.3. By the definition, $c\xi$ is a Hamiltonian holomorphic vector field for any constant $c$. In this paper, "without nontrivial Hamiltonian holomorphic vector field" means that any Hamiltonian holomorphic vector field must be $0$ or $c\xi$.
Given a Sasakian structure $(\xi, \eta, \Phi, g)$, we might identify $\mathcal{D}$ with $\nu(\mathcal{F}_\xi)$ by the isomorphism $\sigma$. However, it is better to distinguish them, since under deformations of the Sasakian structure the contact sub-bundle $\mathcal{D}$ changes, while $\nu(\mathcal{F}_\xi)$ stays fixed. But for simplicity of notation, we will use the same notation if there is no confusion, especially if we do not consider deformations.
From the transverse Kähler structure $(\mathcal{D}, \Phi|_{\mathcal{D}}, d\eta)$, one can define the transverse Levi-Civita connection $\nabla^T$ on $\mathcal{D}$ by
\[ \nabla^T_X Y = \begin{cases} (\nabla_X Y)^p, & X \in \mathcal{D}, \\ [\xi, Y]^p, & X = \xi, \end{cases} \tag{2.2} \]
where $\nabla$ is the Levi-Civita connection with respect to the Riemannian metric $g$, $Y$ is a section of $\mathcal{D}$ and $X^p$ denotes the projection of $X$ onto $\mathcal{D}$. It is easy to check that the transverse Levi-Civita connection is torsion-free and metric compatible. The transverse curvature tensor and transverse Ricci curvature are defined by
\[ R^T(V, W)Z = \nabla^T_V\nabla^T_W Z - \nabla^T_W\nabla^T_V Z - \nabla^T_{[V,W]}Z, \tag{2.3} \]
\[ Ric^T(X, Y) = \langle R^T(X, e_i)e_i, Y\rangle_g, \tag{2.4} \]
where $e_i$ is an orthonormal basis of $\mathcal{D}$, $X, Y, Z \in \mathcal{D}$ and $V, W \in TM$. We also have the following relations between the transverse curvature tensor and the Riemann curvature tensor (see [10]):
\[ R^T(X, Y)Z = R(X, Y)Z - \langle R(X, Y)Z, \xi\rangle\xi - \langle\nabla_Y Z, \xi\rangle\Phi(X) + \langle\nabla_X Z, \xi\rangle\Phi(Y) + \langle[X, Y], \xi\rangle\Phi(Z), \tag{2.5} \]
and
\[ Ric^T(X, Y) = Ric(X, Y) + 2g^T(X, Y), \tag{2.6} \]
for $X, Y, Z \in \mathcal{D}$. The transverse Ricci form is defined as follows:
\[ \rho^T(X, Y) = Ric^T(\Phi X, Y). \tag{2.7} \]
Definition 2.4. A Sasakian manifold $(M, \xi, \eta, \Phi, g)$ is said to be transversely Kähler-Einstein if $Ric^T = \mu g^T$, or $\rho^T = \mu(\frac{1}{2}d\eta)$, for some constant $\mu$. A Sasakian manifold $(M, \xi, \eta, \Phi, g)$ is called a Sasakian-Einstein manifold if $g$ is a Riemannian Einstein metric, i.e. $Ric_g = cg$ for some constant $c$.
It is easy to see that a Sasakian-Einstein manifold must be transversely Kähler-Einstein, the constant $c$ must be $2m$, and
\[ Ric^T = (2m + 2)g^T, \quad\text{or}\quad \rho^T = (m + 1)d\eta. \tag{2.8} \]
2.2. Basic cohomology. A $p$-form $\theta$ on the Sasakian manifold $(M, \xi, \eta, \Phi, g)$ is called basic if
\[ i_\xi\theta = 0, \qquad L_\xi\theta = 0, \tag{2.9} \]
where $i_\xi$ is the contraction with the Killing vector field $\xi$ and $L_\xi$ is the Lie derivative with respect to $\xi$. Basic cohomology was introduced by Reinhart in [30]. We begin with a brief review following [37]. It is easy to see that the exterior differential preserves basic forms. Namely, if $\theta$ is a basic form, so is $d\theta$. Let $\wedge^p_B(M)$ be the sheaf of germs of basic $p$-forms and $\Omega^p_B(M) = \Gamma(M, \wedge^p_B(M))$ be the set of all sections of $\wedge^p_B(M)$. The basic cohomology can be defined in the usual way (see [15]).
Let $\mathcal{D}^{\mathbb{C}}$ be the complexification of the sub-bundle $\mathcal{D}$, and decompose it into its eigenspaces with respect to $\Phi|_{\mathcal{D}}$, that is
\[ \mathcal{D}^{\mathbb{C}} = \mathcal{D}^{1,0} \oplus \mathcal{D}^{0,1}. \tag{2.10} \]
Similarly, we have a splitting of the complexification of the bundle $\wedge^1_B(M)$ of basic one forms on $M$,
\[ \wedge^1_B(M) \otimes \mathbb{C} = \wedge^{1,0}_B(M) \oplus \wedge^{0,1}_B(M). \tag{2.11} \]
Let $\wedge^{i,j}_B(M)$ denote the bundle of basic forms of type $(i, j)$. Accordingly, we have the following decomposition:
\[ \wedge^p_B(M) \otimes \mathbb{C} = \oplus_{i+j=p} \wedge^{i,j}_B(M). \tag{2.12} \]
Define $\partial_B$ and $\bar\partial_B$ by
\[ \partial_B : \wedge^{i,j}_B(M) \to \wedge^{i+1,j}_B(M); \qquad \bar\partial_B : \wedge^{i,j}_B(M) \to \wedge^{i,j+1}_B(M); \tag{2.13} \]
which is the decomposition of $d$. Let $d^c_B = \frac{1}{2}\sqrt{-1}(\bar\partial_B - \partial_B)$ and $d_B = d|_{\wedge^p_B}$. We have $d_B = \partial_B + \bar\partial_B$, $d_B d^c_B = \sqrt{-1}\,\partial_B\bar\partial_B$, $d_B^2 = (d^c_B)^2 = 0$. The basic cohomology groups $H^{i,j}_B(M, \mathcal{F}_\xi)$ are fundamental invariants of a Sasakian structure which enjoy many of the same properties as the Dolbeault cohomology of a Kähler structure [8].
On Sasakian manifolds, the $\partial\bar\partial$-lemma holds for basic forms.
Proposition 2.5 ([15]). Let $\theta$ and $\theta'$ be two real closed basic forms of type $(1, 1)$ on a compact Sasakian manifold $(M, \xi, \eta, \Phi, g)$. If $[\theta]_B = [\theta']_B \in H^{1,1}_B(M, \mathcal{F}_\xi)$, then there is a real basic function $\varphi$ such that $\theta' = \theta + \sqrt{-1}\,\partial_B\bar\partial_B\varphi$.
Consider the complex bundle $(\mathcal{D}, \Phi|_{\mathcal{D}})$ (or $(\nu(\mathcal{F}_\xi), \bar J)$) on a Sasakian manifold $(M, \xi, \eta, \Phi, g)$. Let $Rm^T$ be the transverse curvature with respect to the transverse Levi-Civita connection $\nabla^T$. If we choose a local foliate transverse frame $(X_1, \ldots, X_m)$ on the bundle $\mathcal{D}$, then $Rm^T$ can be seen as a matrix valued 2-form (i.e. $\mathrm{End}(\mathcal{D})$-valued). $Rm^T$ is a basic $(1, 1)$-form. Let us define the basic $(k, k)$-form $\gamma_k$ by the formula
\[ \det\Big(Id_m + \frac{\sqrt{-1}}{2\pi}Rm^T\Big) = 1 + \sum_{k=1}^m \gamma_k. \tag{2.14} \]
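The determinant expansion in (2.14) can be illustrated in a diagonal toy case. The sketch below is only an analogy under stated assumptions: the curvature matrix is replaced by a diagonal matrix with commuting numerical entries $x_1, \ldots, x_m$ (stand-ins for the eigenvalues of $\frac{\sqrt{-1}}{2\pi}Rm^T$, which in the text are 2-forms), so that $\det(Id_m + \mathrm{diag}(x)) = \prod_j (1 + x_j)$ and the degree-$k$ part is the $k$-th elementary symmetric polynomial. The function name is hypothetical.

```python
from fractions import Fraction


def chern_coefficients(roots):
    """Expand prod_j (1 + x_j) = 1 + e_1 + ... + e_m.

    Returns [1, e_1, ..., e_m], where e_k is the k-th elementary
    symmetric polynomial of the given numbers (the degree-k part of
    the determinant in the diagonal toy model).
    """
    coeffs = [Fraction(1)]
    for x in roots:
        # Multiplying by (1 + x): keep each term, or raise its degree by one.
        shifted = [Fraction(0)] + [c * x for c in coeffs]
        padded = coeffs + [Fraction(0)]
        coeffs = [p + s for p, s in zip(padded, shifted)]
    return coeffs


if __name__ == "__main__":
    print(chern_coefficients([1, 2, 3]))
```

In this picture the first entry after the leading 1 is the trace and the last entry is the determinant, which mirrors how $\gamma_1$ is the trace part of the curvature and $\gamma_m$ the top-degree part.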
Definition 2.6. $\gamma_k$ is a closed basic $(k,k)$-form; it represents an element in $H_B^{k,k}(M,\mathcal{F}_\xi)$ that is called the basic $k$-th Chern class and denoted by $c_k^B(M,\mathcal{F}_\xi)$.

We have the following proposition.

Proposition 2.7 ([8], Prop. 7.5.21). The basic Chern classes $c_k^B(M,\mathcal{F}_\xi)$ are independent of the choice of a Sasakian structure in $\mathcal{S}(\xi,\bar J)$.

Let $\rho^T = Ric^T(\Phi\cdot,\cdot)$ be the transverse Ricci form of the Sasakian structure $(\xi,\eta,\Phi,g)$. $\rho^T$ is a real closed basic $(1,1)$-form and the basic cohomology class $\frac{1}{2\pi}[\rho^T]_B = c_1^B(M,\mathcal{F}_\xi)$ is the basic first Chern class. We say that the basic first Chern class of $(M,\mathcal{F}_\xi)$ is positive (negative, null resp.) if it contains a positive (negative, null resp.) representative. By the definition and (2.8), a necessary condition for the existence of a Sasakian-Einstein metric $(M,\xi,\eta,\Phi,g)$ is $2\pi c_1^B(M,\mathcal{F}_\xi) = (m+1)[d\eta]_B$. On the Sasakian manifold $(M,\xi,\eta,\Phi,g)$, the basic Laplacian is defined by
\[ \Delta_B\psi = \frac{4m\sqrt{-1}\,\partial_B\bar\partial_B\psi\wedge(d\eta)^{m-1}\wedge\eta}{(d\eta)^m\wedge\eta} \tag{2.15} \]
for any basic function $\psi$. It is well known that the basic Laplacian is equal to the restriction of the Riemannian Laplacian $\Delta_g$ on basic functions, i.e. $\Delta_B\psi = \Delta_g\psi$ for any basic function $\psi$.

2.3. Transformations of Sasakian structures. Let $(\xi,\eta,\Phi,g)$ be a Sasakian structure on $M$; for every real basic function $\varphi\in\mathcal{H}(\xi,\eta,\Phi,g)$, we can obtain a new Sasakian structure $(\xi,\eta_\varphi,\Phi_\varphi,g_\varphi)$. Here $\eta_\varphi$, $\Phi_\varphi$, $g_\varphi$ are defined as in (1.2) and (1.3). We know that the above deformations fix the Reeb vector field $\xi$ and the transverse holomorphic structure. Let $(\xi,\eta',\Phi',g')$ be another Sasakian structure which is compatible with $(\xi,\eta,\Phi,g)$; then there exists a basic function $\varphi\in\mathcal{H}(\xi,\eta,\Phi,g)$ such that $d\eta' = d\eta + dd_B^c\varphi$ and $\eta' = \eta + d_B^c\varphi + \zeta$, where $\zeta$ is a closed basic one form. Since the difference between $\eta'$ and $\eta$ is a basic one form, it is easy to see that
\[ (d\eta')^m\wedge\eta' = (d\eta')^m\wedge\eta, \tag{2.16} \]
and
\[ \int_M (d\eta')^m\wedge\eta' = \int_M (d\eta)^m\wedge\eta = V. \tag{2.17} \]
In the following, $\rho^T_{d\eta}$ denotes the transverse Ricci form with respect to the transverse Kähler metric $d\eta$. In local foliation coordinates $(x, z^1,\dots,z^m)$, we have
\[ \rho^T_{d\eta} = -\sqrt{-1}\,\partial_B\bar\partial_B\log\det\bigl((d\eta)_{i\bar j}\bigr). \tag{2.18} \]
Globally, the difference between two transverse Ricci forms can be expressed by
\[ \rho^T_{d\eta} - \rho^T_{d\eta'} = \sqrt{-1}\,\partial_B\bar\partial_B\log\Bigl(\frac{(d\eta')^m\wedge\eta'}{(d\eta)^m\wedge\eta}\Bigr). \tag{2.19} \]
From the above formula, we know that finding a Sasakian-Einstein structure compatible with $(\xi,\eta,\Phi,g)$ is equivalent to solving the transverse Monge-Ampère equation (1.5).
Energy Properness and Sasakian-Einstein Metrics
237
Let us recall a special class of deformations of a Sasakian structure,
\[ \xi_s = s^{-1}\xi, \qquad \eta_s = s\eta, \qquad \Phi_s = \Phi, \qquad g_s = sg + s(s-1)\eta\otimes\eta, \tag{2.20} \]
where $s$ is a constant. These were called D-homothetic deformations by Tanno [33]. These deformations do not deform the characteristic foliation and the contact bundle $\mathcal{D}$, but only rescale the Reeb field $\xi$ and the contact 1-form $\eta$. Suppose that $(\xi,\eta,\Phi,g)$ is a transverse Kähler-Einstein Sasakian metric with positive transverse Ricci curvature, i.e. $Ric_g^T = \mu g^T$ (so that $Ric = (\mu-2)g + (2n+2-\mu)\eta\otimes\eta$) for some positive constant $\mu\in\mathbb{R}$. Then, by the D-homothetic transformation with $s = \frac{\mu}{2(n+1)}$, we get a Sasakian-Einstein metric $(\xi_s,\eta_s,\Phi_s,g_s)$. Indeed, by the relation formula (2.15) in [33], we have
\[
\begin{aligned}
Ric_{g_s} &= Ric_g - 2(s-1)g + s^{-1}(s-1)\bigl(2(2n+1)s + 2ns(s-1)\bigr)\eta\otimes\eta \\
&= \bigl\{2n+2-\mu + s^{-1}(s-1)\bigl(2(2n+1)s + 2ns(s-1)\bigr)\bigr\}\eta\otimes\eta + (\mu-2-2(s-1))g \\
&= \frac{\mu-2s}{s}\bigl\{sg + s(s-1)\eta\otimes\eta\bigr\} + \bigl((2n+2)s^2-\mu s\bigr)\eta\otimes\eta \\
&= 2m\,g_s.
\end{aligned}
\tag{2.21}
\]
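As a check on the last two equalities in (2.21), the coefficient of $\eta\otimes\eta$ can be expanded directly. The sketch below uses only the symbols of (2.21); it assumes that $n$ and $m$ denote the same integer, since the text writes $2n$ inside the computation and $2m$ in the conclusion.

```latex
\begin{align*}
2n+2-\mu + s^{-1}(s-1)\bigl(2(2n+1)s + 2ns(s-1)\bigr)
  &= 2n+2-\mu + (s-1)(2ns+2n+2) \\
  &= 2ns^{2} + 2s - \mu, \\
(\mu-2s)(s-1) + (2n+2)s^{2} - \mu s
  &= 2ns^{2} + 2s - \mu .
\end{align*}
% Both expressions agree, which is the third equality in (2.21).
% With s = \mu/(2n+2):
\begin{align*}
(2n+2)s^{2}-\mu s = s\bigl((2n+2)s-\mu\bigr) = 0,
\qquad
\frac{\mu-2s}{s} = (2n+2) - 2 = 2n ,
\end{align*}
% so Ric_{g_s} = 2n\,g_s, i.e. 2m\,g_s under the assumption n = m.
```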
Through Tanno's D-homothetic transformation, one can prove the following proposition. The proof can be found in [28] and [31]; see also Proposition 2.6 in [40].

Proposition 2.8. Let $(M,\xi,\eta,g)$ be a $(2m+1)$-dimensional compact Sasakian manifold with $Ric^T \ge \epsilon g^T$. Suppose that $\phi\in C_B^\infty(M)$ satisfies $\Delta_g\phi\le\delta$; then we have
\[ -\inf_M\phi \le \frac{1}{V}\int_M(-\phi)(d\eta)^m\wedge\eta + \frac{C(m)\delta}{\epsilon}, \tag{2.22} \]
where $V = \int_M(d\eta)^m\wedge\eta$ and the constant $C(m)$ depends only on $m$.

From the above proposition, it is easy to conclude the following corollary.

Corollary 2.9. Let $(M,\xi,\eta,g)$ be a $(2m+1)$-dimensional compact Sasakian manifold and $\varphi\in\mathcal{H}(\xi,\eta,\Phi,g)$ be a potential function with $Ric^T(d\eta_\varphi)\ge\epsilon g_\varphi^T$. Then we have
\[ Osc(\varphi) \le I_{d\eta}(\varphi) + \frac{\tilde C(m)}{\epsilon} + \tilde C(M,g), \tag{2.23} \]
where the constant $\tilde C(M,g)$ depends only on $(M,g)$ and the constant $\tilde C(m)$ depends only on $m$. For the definition of $I_{d\eta}$, see Sect. 3.

Proof. Using the fact $\Delta_{d\eta}\varphi\ge -4m$ and Green's formula, we have
\[ \sup_M\varphi \le \frac{1}{V}\int_M\varphi\,(d\eta)^m\wedge\eta + \tilde C(M,g). \tag{2.24} \]
Then, by $\Delta_{d\eta_\varphi}\varphi\le 4m$ and the above proposition, we have
\[ Osc(\varphi) = \sup_M\varphi - \inf_M\varphi \le I_{d\eta}(\varphi) + \frac{\tilde C(m)}{\epsilon} + \tilde C(M,g). \tag{2.25} \]
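The way the two one-sided bounds combine into the oscillation estimate can be written out as follows. This is only a sketch: $K$ is a placeholder name (not in the original) for the constant term that Proposition 2.8 contributes when it is applied to the deformed structure with potential $\varphi$, and the computation uses that $(d\eta)^m\wedge\eta$ and $(d\eta_\varphi)^m\wedge\eta$ have the same total mass $V$.

```latex
\begin{align*}
\sup_M\varphi &\le \frac{1}{V}\int_M \varphi\,(d\eta)^m\wedge\eta + \tilde C(M,g)
  && \text{by (2.24)},\\
-\inf_M\varphi &\le -\frac{1}{V}\int_M \varphi\,(d\eta_\varphi)^m\wedge\eta + K
  && \text{by Proposition 2.8 for } g_\varphi,\\
\operatorname{Osc}(\varphi) = \sup_M\varphi-\inf_M\varphi
  &\le \frac{1}{V}\int_M \varphi\,\bigl\{(d\eta)^m-(d\eta_\varphi)^m\bigr\}\wedge\eta + K + \tilde C(M,g) \\
  &= I_{d\eta}(\varphi) + K + \tilde C(M,g).
\end{align*}
```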
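3. Energy Functionals

Let $(\xi,\eta,\Phi,g)$ be a Sasakian structure on $M$. We consider the following functionals on $\mathcal{H}(\xi,\eta,\Phi,g)$, which are analogous to the ones in Kähler geometry:
\[
\begin{aligned}
I_{d\eta}(\varphi) &:= \frac{1}{V}\int_M\varphi\,\{(d\eta)^m\wedge\eta - (d\eta_\varphi)^m\wedge\eta\},\\
J_{d\eta}(\varphi) &:= \int_0^1\frac{1}{s}\,I_{d\eta}(s\varphi)\,ds,\\
F^0_{d\eta}(\varphi) &:= J_{d\eta}(\varphi) - \frac{1}{V}\int_M\varphi\,(d\eta)^m\wedge\eta,\\
F_{d\eta}(\varphi) &:= F^0_{d\eta}(\varphi) - \frac{1}{m+1}\log\Bigl\{\frac{1}{V}\int_M e^{h_{d\eta}-(m+1)\varphi}(d\eta)^m\wedge\eta\Bigr\},
\end{aligned}
\tag{3.1}
\]
where $V = \int_M(d\eta)^m\wedge\eta$, and $h_{d\eta}$ is a smooth basic function which satisfies $\rho^T = (m+1)d\eta + \sqrt{-1}\,\partial_B\bar\partial_B h_{d\eta}$ and $\int_M\exp(h_{d\eta})(d\eta)^m\wedge\eta = V$. Noting that $g(\Phi\cdot,\cdot) = \frac{1}{2}d\eta$, when $\varphi\in C_B^\infty(M)$ we have
\[ m\sqrt{-1}\,\partial_B\bar\partial_B\varphi\wedge(d\eta)^{m-1}\wedge\eta = \frac{1}{4}\Delta\varphi\,(d\eta)^m\wedge\eta, \tag{3.2} \]
where $\Delta$ is the Laplacian of the metric $g$. Let $\varphi_s$ be a smooth curve in $\mathcal{H}$. By direct calculation, we have
\[ \frac{d}{ds}I_{d\eta}(\varphi_s) = \frac{1}{V}\int_M\dot\varphi_s\{(d\eta)^m-(d\eta_{\varphi_s})^m\}\wedge\eta - \frac{1}{4V}\int_M\varphi_s\,\Delta_{\varphi_s}\dot\varphi_s\,(d\eta_{\varphi_s})^m\wedge\eta, \tag{3.3} \]
\[ \frac{d}{ds}J_{d\eta}(\varphi_s) = \frac{1}{V}\int_M\dot\varphi_s\{(d\eta)^m-(d\eta_{\varphi_s})^m\}\wedge\eta, \tag{3.4} \]
\[ \frac{d}{ds}F^0_{d\eta}(\varphi_s) = -\frac{1}{V}\int_M\dot\varphi_s\,(d\eta_{\varphi_s})^m\wedge\eta, \tag{3.5} \]
and
\[ \frac{d}{ds}F_{d\eta}(\varphi_s) = -\frac{1}{V}\int_M\dot\varphi_s\,(d\eta_{\varphi_s})^m\wedge\eta + \Bigl(\int_M e^{h_{d\eta}-(m+1)\varphi_s}(d\eta)^m\wedge\eta\Bigr)^{-1}\int_M\dot\varphi_s\,e^{h_{d\eta}-(m+1)\varphi_s}(d\eta)^m\wedge\eta, \tag{3.6} \]
where $\dot\varphi_s = \frac{d}{ds}\varphi_s$ and $\Delta_{\varphi_s}$ is the Laplacian corresponding to the metric $g_{\varphi_s}$. From (3.6), it is easy to see that the critical points of $F_{d\eta}$ are transverse Kähler-Einstein metrics. The following properties can be proved by a similar method as in the Kähler case (see [4,28,31,35]).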
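Proposition 3.1. Let $C$ be a constant; then
\[ I_{d\eta}(\varphi+C) = I_{d\eta}(\varphi), \qquad J_{d\eta}(\varphi+C) = J_{d\eta}(\varphi), \qquad F_{d\eta}(\varphi+C) = F_{d\eta}(\varphi). \tag{3.7} \]
$I_{d\eta}$, $I_{d\eta}-J_{d\eta}$, $J_{d\eta}$ are non-negative functionals on $\mathcal{H}(\xi,\eta,\Phi,g)$, and we have
\[ I_{d\eta}(\varphi) \le (m+1)\{I_{d\eta}(\varphi)-J_{d\eta}(\varphi)\} \le m\,I_{d\eta}(\varphi). \tag{3.8} \]
Let $\varphi_t$ be a family of basic functions in $\mathcal{H}$; then
\[ \frac{d}{dt}\{I_{d\eta}(\varphi_t)-J_{d\eta}(\varphi_t)\} = -\frac{1}{4V}\int_M\varphi_t\Bigl(\Delta_t\frac{d}{dt}\varphi_t\Bigr)(d\eta_{\varphi_t})^m\wedge\eta, \tag{3.9} \]
where $\Delta_t$ is the Laplacian corresponding to the metric $g_{\varphi_t}$. $F_{d\eta}$ satisfies the following cocycle property, i.e.
\[ F_{d\eta}(\psi) + F_{d\eta'}(\phi-\psi) = F_{d\eta}(\phi), \tag{3.10} \]
and
\[ F_{d\eta}(\psi) = -F_{d\eta'}(-\psi) \tag{3.11} \]
for all $\phi,\psi\in\mathcal{H}(\xi,\eta,\Phi,g)$ and $d\eta' = d\eta + \sqrt{-1}\,\partial_B\bar\partial_B\psi$. We also have the cocycle condition for $F^0_{d\eta}$.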
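Two of the statements in Proposition 3.1 can be checked by hand, which may help fix the conventions. The following is a sketch under the definitions (3.1): the first computation uses only that $d\eta_{\varphi+C} = d\eta_\varphi$ and that both volume forms have total mass $V$; the second works out the case $m=1$, where both inequalities in (3.8) turn out to be equalities.

```latex
% Invariance of I under constants:
\begin{align*}
I_{d\eta}(\varphi+C) - I_{d\eta}(\varphi)
  = \frac{C}{V}\int_M \bigl\{(d\eta)^m - (d\eta_\varphi)^m\bigr\}\wedge\eta
  = \frac{C}{V}\,(V - V) = 0 .
\end{align*}
% The case m = 1: I is quadratic in \varphi, so
\begin{align*}
I_{d\eta}(s\varphi)
  &= \frac{1}{V}\int_M s\varphi\,\bigl(-s\sqrt{-1}\,\partial_B\bar\partial_B\varphi\bigr)\wedge\eta
   = s^{2}\,I_{d\eta}(\varphi),\\
J_{d\eta}(\varphi)
  &= \int_0^1 \frac{1}{s}\,s^{2}\,I_{d\eta}(\varphi)\,ds = \tfrac12\,I_{d\eta}(\varphi),\\
(m+1)\{I_{d\eta}(\varphi)-J_{d\eta}(\varphi)\}
  &= 2\cdot\tfrac12\,I_{d\eta}(\varphi) = I_{d\eta}(\varphi) = m\,I_{d\eta}(\varphi).
\end{align*}
```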
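Lemma 3.2. Let $(\xi,\eta,\Phi,g)$ and $(\xi,\eta',\Phi',g')$ be two Sasakian structures with the same transverse holomorphic structure on $M$, and assume that $d\eta = d\eta' + \sqrt{-1}\,\partial_B\bar\partial_B\phi$ for some basic function $\phi$. Then we have
\[ |I_{d\eta'}(\varphi+\phi) - I_{d\eta}(\varphi)| \le (m+1)\,Osc(\phi) \tag{3.12} \]
for all $\varphi\in\mathcal{H}(\xi,\eta,\Phi,g)$.

Proof. By definition, we have
\[ V\bigl(I_{d\eta'}(\varphi+\phi) - I_{d\eta}(\varphi)\bigr) = \int_M\varphi\,\bigl((d\eta')^m-(d\eta)^m\bigr)\wedge\eta + \int_M\phi\,\bigl((d\eta')^m-(d\eta_\varphi)^m\bigr)\wedge\eta, \tag{3.13} \]
where $d\eta_\varphi = d\eta + \sqrt{-1}\,\partial_B\bar\partial_B\varphi$. By direct calculation (integrating by parts in the second step), we have
\[
\begin{aligned}
\Bigl|\int_M\varphi\,\bigl((d\eta')^m-(d\eta)^m\bigr)\wedge\eta\Bigr|
&= \Bigl|\int_M\varphi\,(-\sqrt{-1}\,\partial_B\bar\partial_B\phi)\wedge\Bigl(\sum_{j=0}^{m-1}(d\eta')^j\wedge(d\eta)^{m-j-1}\Bigr)\wedge\eta\Bigr| \\
&= \Bigl|\int_M\phi\,(-\sqrt{-1}\,\partial_B\bar\partial_B\varphi)\wedge\Bigl(\sum_{j=0}^{m-1}(d\eta')^j\wedge(d\eta)^{m-j-1}\Bigr)\wedge\eta\Bigr| \\
&= \Bigl|\int_M\phi\,(d\eta-d\eta_\varphi)\wedge\Bigl(\sum_{j=0}^{m-1}(d\eta')^j\wedge(d\eta)^{m-j-1}\Bigr)\wedge\eta\Bigr| \\
&\le mV\,Osc(\phi)
\end{aligned}
\tag{3.14}
\]
and
\[ \Bigl|\int_M\phi\,\bigl((d\eta')^m-(d\eta_\varphi)^m\bigr)\wedge\eta\Bigr| \le V\,Osc(\phi). \tag{3.15} \]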
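Then (3.13), (3.14) and (3.15) imply (3.12).

Let $\rho_\varphi^T$ denote the transverse Ricci form of the Sasakian structure $(\xi,\eta_\varphi,\Phi_\varphi,g_\varphi)$. The integral $\int_M\rho_\varphi^T\wedge(d\eta_\varphi)^{m-1}\wedge\eta_\varphi$ is independent of the choice of $\varphi\in\mathcal{H}(\xi,\eta,\Phi,g)$ (e.g., Proposition 4.4 in [16]). This means that
\[ \bar S = \frac{\int_M S_\varphi^T\,(d\eta_\varphi)^m\wedge\eta_\varphi}{\int_M(d\eta_\varphi)^m\wedge\eta_\varphi} = \frac{\int_M 2m\,\rho_\varphi^T\wedge(d\eta_\varphi)^{m-1}\wedge\eta}{\int_M(d\eta_\varphi)^m\wedge\eta} \tag{3.16} \]
depends only on the basic Kähler class. As in the Kähler case (see [22]), we can define Mabuchi's K-energy on the space $\mathcal{H}(\xi,\eta,\Phi,g)$.

Definition 3.3. Let $\varphi'$ and $\varphi''$ be two basic functions in $\mathcal{H}(\xi,\eta,\Phi,g)$. We define
\[ \mathcal{M}(\varphi',\varphi'') := -\frac{1}{V}\int_0^1\int_M\dot\varphi_t\,(S_t^T-\bar S)\,(d\eta_t)^m\wedge\eta\,dt, \tag{3.17} \]
where $\varphi_t$ ($t\in[0,1]$) is a path in $\mathcal{H}$ connecting $\varphi'$ and $\varphi''$, $\dot\varphi_t = \frac{\partial}{\partial t}\varphi_t$, $S_t^T$ is the transverse scalar curvature of the Sasakian structure $(\xi,\eta_{\varphi_t},\Phi_{\varphi_t},g_{\varphi_t})$ and $\bar S$ is the average defined as in (3.16). We also define
\[ \mathcal{V}_{d\eta_{\varphi'}}(\varphi) := \mathcal{M}(\varphi',\varphi'+\varphi) \tag{3.18} \]
for any $\varphi\in\mathcal{H}(\xi,\eta_{\varphi'},\Phi_{\varphi'},g_{\varphi'})$.

By Theorem 4.12 in [16] (or Lemma 11 in [20]), we know that $\mathcal{M}$ is independent of the path $\varphi_t$, and so $\mathcal{M}$ is well defined. Furthermore, $\mathcal{M}$ satisfies the following cocycle conditions, i.e.
\[ \mathcal{M}(\varphi_0,\varphi_1) + \mathcal{M}(\varphi_1,\varphi_0) = 0, \tag{3.19} \]
\[ \mathcal{M}(\varphi_0,\varphi_1) + \mathcal{M}(\varphi_1,\varphi_2) + \mathcal{M}(\varphi_2,\varphi_0) = 0, \tag{3.20} \]
and
\[ \mathcal{M}(\varphi_1+C',\varphi_2+C'') = \mathcal{M}(\varphi_1,\varphi_2) \tag{3.21} \]
for any $\varphi_i\in\mathcal{H}(\xi,\eta,\Phi,g)$ and $C',C''\in\mathbb{R}$. Following Ding [13], we also have the following relation between the functionals $\mathcal{V}_{d\eta}$ and $F_{d\eta}$.

Remark. By the definitions and the above properties, it is easy to see that the functionals $F_{d\eta}$ and $\mathcal{V}_{d\eta}$ can also be defined on the space $\mathcal{S}(\xi,\bar J)$.

Lemma 3.4. Let $(M,\xi,\eta,\Phi,g)$ be a compact Sasakian manifold with $2\pi c_1^B(M) = (m+1)[d\eta]_B$. Then
\[ \mathcal{V}_{d\eta}(\phi) - 2(m+1)F_{d\eta}(\phi) = \frac{2}{V}\int_M h_{d\eta}\,(d\eta)^m\wedge\eta - \frac{2}{V}\int_M h_{d\eta_\phi}\,(d\eta_\phi)^m\wedge\eta \tag{3.22} \]
for any $\phi\in\mathcal{H}(\xi,\eta,\Phi,g)$, where $h_{d\eta}$ and $h_{d\eta_\phi}$ are the normalized Ricci potential functions with respect to $d\eta$ and $d\eta_\phi$.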
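Proof. Let $\phi_t$ be a path connecting $0$ with $\phi$. By the definition, we have
\[
\begin{aligned}
\mathcal{V}_{d\eta}(\phi)
&= -\frac{1}{V}\int_0^1\int_M\dot\phi_t\,\bigl(S_t^T-2m(m+1)\bigr)(d\eta_t)^m\wedge\eta\,dt \\
&= -\frac{2m}{V}\int_0^1\int_M\dot\phi_t\,\bigl(\rho_t^T-(m+1)d\eta_t\bigr)\wedge(d\eta_t)^{m-1}\wedge\eta\,dt \\
&= -\frac{2m\sqrt{-1}}{V}\int_0^1\int_M\dot\phi_t\,\partial_B\bar\partial_B\Bigl(h_{d\eta}-(m+1)\phi_t-\log\frac{(d\eta_t)^m\wedge\eta}{(d\eta)^m\wedge\eta}\Bigr)\wedge(d\eta_t)^{m-1}\wedge\eta\,dt \\
&= -\frac{2}{V}\int_0^1\int_M\Bigl(h_{d\eta}-(m+1)\phi_t-\log\frac{(d\eta_t)^m\wedge\eta}{(d\eta)^m\wedge\eta}\Bigr)\frac{\partial}{\partial t}\bigl((d\eta_t)^m\wedge\eta\bigr)\,dt \\
&= -2(m+1)(I_{d\eta}-J_{d\eta})(\phi) + \frac{2}{V}\int_M h_{d\eta}\,\bigl((d\eta)^m-(d\eta_\phi)^m\bigr)\wedge\eta \\
&\quad + \frac{2}{V}\int_M\log\frac{(d\eta_\phi)^m\wedge\eta}{(d\eta)^m\wedge\eta}\,(d\eta_\phi)^m\wedge\eta.
\end{aligned}
\tag{3.23}
\]
On the other hand, it is easy to check that
\[ -\log\frac{(d\eta_\phi)^m\wedge\eta}{(d\eta)^m\wedge\eta} - (m+1)\phi + c = h_{d\eta_\phi} - h_{d\eta}, \tag{3.24} \]
where $c = -\log\bigl(\frac{1}{V}\int_M e^{h_{d\eta}-(m+1)\phi}(d\eta)^m\wedge\eta\bigr)$. Then
\[
\begin{aligned}
\mathcal{V}_{d\eta}(\phi)
&= 2(m+1)J_{d\eta}(\phi) - \frac{2(m+1)}{V}\int_M\phi\,(d\eta)^m\wedge\eta + 2c + \frac{2}{V}\int_M h_{d\eta}\,(d\eta)^m\wedge\eta - \frac{2}{V}\int_M h_{d\eta_\phi}\,(d\eta_\phi)^m\wedge\eta \\
&= 2(m+1)F_{d\eta}(\phi) + \frac{2}{V}\int_M h_{d\eta}\,(d\eta)^m\wedge\eta - \frac{2}{V}\int_M h_{d\eta_\phi}\,(d\eta_\phi)^m\wedge\eta.
\end{aligned}
\]

By the normalization $\int_M\exp(h_{d\eta_\phi})(d\eta_\phi)^m\wedge\eta = V$, we know that $\int_M h_{d\eta_\phi}(d\eta_\phi)^m\wedge\eta\le 0$; then we have the following corollary.

Corollary 3.5. Let $(M,\xi,\eta,\Phi,g)$ be a compact Sasakian manifold with $2\pi c_1^B(M) = (m+1)[d\eta]_B$; then
\[ \mathcal{V}_{d\eta}(\phi) \ge 2(m+1)F_{d\eta}(\phi) + \frac{2}{V}\int_M h_{d\eta}\,(d\eta)^m\wedge\eta \tag{3.25} \]
for any $\phi\in\mathcal{H}(\xi,\eta,\Phi,g)$.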
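The sign of the integral of the normalized Ricci potential, which is the only ingredient needed to pass from (3.22) to (3.25), follows from Jensen's inequality. A short sketch, writing $d\mu_\phi$ (a shorthand introduced here, not in the original) for the measure $(d\eta_\phi)^m\wedge\eta$ of total mass $V$:

```latex
\begin{align*}
\frac{1}{V}\int_M h_{d\eta_\phi}\,d\mu_\phi
  \;\le\; \log\Bigl(\frac{1}{V}\int_M e^{h_{d\eta_\phi}}\,d\mu_\phi\Bigr)
  \;=\; \log 1 \;=\; 0 ,
\end{align*}
% by concavity of the logarithm applied to the probability measure d\mu_\phi / V,
% together with the normalization \int_M e^{h_{d\eta_\phi}} d\mu_\phi = V.
% Dropping the non-negative term -(2/V)\int_M h_{d\eta_\phi} d\mu_\phi in (3.22) gives (3.25).
```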
4. Transverse Monge-Ampère Equation

Let $(M,\xi,\eta,\Phi,g)$ be a compact Sasakian manifold with $2\pi c_1^B(M) = (m+1)[d\eta]_B$. Given any $\varphi\in\mathcal{H}(\xi,\eta,\Phi,g)$, we have a new Sasakian structure $(\xi,\eta_\varphi,\Phi_\varphi,g_\varphi)$. It is easy to check that $d\eta_\varphi$ is Sasakian-Einstein if and only if $\varphi$ satisfies the following equation:
\[ \sqrt{-1}\,\partial_B\bar\partial_B\log\Bigl(\frac{(d\eta_\varphi)^m\wedge\eta}{(d\eta)^m\wedge\eta}\Bigr) = \sqrt{-1}\,\partial_B\bar\partial_B\bigl(h_{d\eta}-(m+1)\varphi\bigr), \tag{4.1} \]
which is equivalent to the transverse Monge-Ampère equation (1.5). As in the Kähler case, we consider a family of equations (1.6). We set
\[ S = \{t\in[0,1] \mid \text{(1.6) is solvable for } t\}. \tag{4.2} \]
By [15], we know that (1.6) is solvable for $t=0$, so $S$ is not empty. The openness of $S$ was proved in [31] (Prop. 5.3) and [28] (Prop. 4.4) in a similar way as in [1]. In order to use the continuity method to solve (1.5), we only need to prove the closedness of $S$. By El-Kacimi's ([15]) generalization of Yau's estimates ([38]) for transverse Monge-Ampère equations, the $C^0$-estimate for solutions of (1.6) implies the $C^{2,\alpha}$-estimate for them, and the transverse elliptic Schauder estimates give higher order estimates. Therefore it suffices to estimate the $C^0$-norms of the solutions of (1.6). We list the following proposition for further discussion; the proof can be found in [31] (Prop. 5.3) and [28] (Prop. 4.4), see also Prop. 4.2 in [40].

Proposition 4.1. Let $0<\tau\le 1$, and suppose that (1.6) has a solution $\varphi_\tau$ at $t=\tau$. If $0<\tau<1$, then there exists some $\epsilon>0$ such that $\varphi_\tau$ uniquely extends to a smooth family of solutions $\{\varphi_t\}$ of (1.6) for $t\in(0,1)\cap(\tau-\epsilon,\tau+\epsilon)$. $S$ is also open near $t=0$, i.e. there exists a small positive number $\epsilon$ such that there is a smooth family of solutions of (1.6) for $t\in(0,\epsilon)$. If $(M,\xi,\eta,\Phi,g)$ admits no nontrivial Hamiltonian holomorphic vector field, $\varphi_1$ can also be extended uniquely to a smooth family of solutions $\{\varphi_t\}$ of (1.6) for $t\in(1-\epsilon,1]$.

As in [4], we have the following lemma; the proof can be found in [40] (Lemma 4.3), see also Lemma 4.9 in [28] and Lemma 5.4 in [31].

Lemma 4.2. Let $\{\varphi_t\}$ be a smooth family of solutions of (1.6) for $t\in(0,1]$; then
\[ \frac{d}{dt}(I_{d\eta}-J_{d\eta})(\varphi_t)\ge 0. \tag{4.3} \]

Now, we consider the existence problem of Sasakian-Einstein metrics. We prove the following theorem.

Theorem 4.3. Let $(M,\xi,\eta,\Phi,g)$ be a compact Sasakian manifold with $[d\eta]_B = \frac{2\pi}{m+1}c_1^B(M,\mathcal{F}_\xi)$. If $F_{d\eta}$ (or $\mathcal{V}_{d\eta}$) is proper in the space $\mathcal{H}(\xi,\eta,\Phi,g)$, then there must exist a Sasakian-Einstein metric compatible with $(\xi,\eta,\Phi,g)$.

Proof. By Proposition 4.1, we can suppose that there exists a smooth family of solutions $\{\varphi_t\}$ of (1.6) for $t\in(0,\tau)$ with some $\tau\in(0,1)$. From Eq. (1.6), we know that $\rho^T_{d\eta_t}\ge t(m+1)d\eta_t$. We have $\Delta_t\varphi_t\le 4m$ and, by Proposition 2.8,
\[ \frac{1}{V}\int_M\varphi_t\,(d\eta_{\varphi_t})^m\wedge\eta \le \inf_M\varphi_t + \frac{C_1(m)}{t}, \tag{4.4} \]
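where the positive constant $C_1(m)$ depends only on $m$. Using the fact $\Delta_{d\eta}\varphi_t\ge -4m$ and the Green formula, we have
\[ \sup_M\varphi_t \le \frac{1}{V}\int_M\varphi_t\,(d\eta)^m\wedge\eta + C_2, \tag{4.5} \]
where $C_2$ is a positive constant that depends only on the geometry of $(M,g)$. By the normalization, it is easy to check that $\sup_M\varphi_t\ge 0$ and $\inf_M\varphi_t\le 0$. Then
\[ \|\varphi_t\|_{C^0} \le \sup_M\varphi_t - \inf_M\varphi_t \le I_{d\eta}(\varphi_t) + \frac{C_1(m)}{t} + C_2. \tag{4.6} \]
By (3.8) and (4.3), we have
\[ I_{d\eta}(\varphi_{t_1}) \le (m+1)(I_{d\eta}-J_{d\eta})(\varphi_{t_2}) \tag{4.7} \]
for any $0<t_1\le t_2<\tau$. Combining (4.6) and (4.7), we get
\[ t\,\|\varphi_t\|_{C^0} \le t_0(m+1)(I_{d\eta}-J_{d\eta})(\varphi_{t_0}) + C_3 \tag{4.8} \]
for any $0<t\le t_0<\tau$, where $C_3$ is a positive constant which depends only on the geometry of $(M,g)$. So, we obtain a uniform bound on $\bigl|\frac{(d\eta+\sqrt{-1}\,\partial_B\bar\partial_B\varphi_t)^m\wedge\eta}{(d\eta)^m\wedge\eta}\bigr|$ for $0<t\le t_0<\tau$. By El-Kacimi's ([15]) generalization of Yau's $C^0$ estimate ([38]) for transverse Monge-Ampère equations, there exists a uniform constant $C_4$ such that
\[ \|\varphi_t\|_{C^0}\le C_4 \tag{4.9} \]
for $0<t\le t_0<\tau$. Differentiating (1.6) with respect to $t$, we have
\[ \frac{1}{4}\Delta_t\dot\varphi_t = -t(m+1)\dot\varphi_t - (m+1)\varphi_t. \tag{4.10} \]
Using (3.5) and (4.10), we have
\[
\begin{aligned}
\frac{d}{dt}\bigl(tF^0_{d\eta}(\varphi_t)\bigr)
&= F^0_{d\eta}(\varphi_t) - \frac{t}{V}\int_M\dot\varphi_t\,(d\eta_{\varphi_t})^m\wedge\eta \\
&= J_{d\eta}(\varphi_t) - \frac{1}{V}\int_M\varphi_t\,(d\eta)^m\wedge\eta - \frac{t}{V}\int_M\dot\varphi_t\,(d\eta_{\varphi_t})^m\wedge\eta \\
&= -\bigl(I_{d\eta}(\varphi_t)-J_{d\eta}(\varphi_t)\bigr) \le 0.
\end{aligned}
\tag{4.11}
\]
By the uniform $C^0$ estimate (4.9), it is easy to check that
\[ tF^0_{d\eta}(\varphi_t)\to 0 \tag{4.12} \]
as $t\to 0$. So, from (4.11), we have
\[ F_{d\eta}(\varphi_t) \le -\frac{1}{m+1}\log\Bigl\{\frac{1}{V}\int_M e^{h_{d\eta}-(m+1)\varphi_t}(d\eta)^m\wedge\eta\Bigr\} \le \frac{1-t}{V}\int_M\varphi_t\,(d\eta_{\varphi_t})^m\wedge\eta, \tag{4.13} \]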
where we have used the concavity of the logarithmic function. From (4.13) and (4.4), we have
\[ F_{d\eta}(\varphi_t) \le \frac{1-t}{t}\,C_1(m). \tag{4.14} \]
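The Jensen step behind (4.13) and (4.14) can be spelled out as follows. This is a sketch that assumes equation (1.6), which lies outside this section, takes the form $(d\eta_{\varphi_t})^m\wedge\eta = e^{h_{d\eta}-t(m+1)\varphi_t}(d\eta)^m\wedge\eta$; that reading is consistent with the logarithmic term appearing in (6.23), but it is an assumption here.

```latex
\begin{align*}
\frac{1}{V}\int_M e^{h_{d\eta}-(m+1)\varphi_t}(d\eta)^m\wedge\eta
 &= \frac{1}{V}\int_M e^{-(1-t)(m+1)\varphi_t}\,(d\eta_{\varphi_t})^m\wedge\eta
   && \text{(assumed form of (1.6))}\\
 &\ge \exp\Bigl(-\frac{(1-t)(m+1)}{V}\int_M \varphi_t\,(d\eta_{\varphi_t})^m\wedge\eta\Bigr)
   && \text{(Jensen, total mass } V\text{)}.
\end{align*}
% Taking -\frac{1}{m+1}\log of both sides and then using (4.4) with \inf_M \varphi_t \le 0:
\begin{align*}
-\frac{1}{m+1}\log\Bigl\{\frac{1}{V}\int_M e^{h_{d\eta}-(m+1)\varphi_t}(d\eta)^m\wedge\eta\Bigr\}
 &\le \frac{1-t}{V}\int_M \varphi_t\,(d\eta_{\varphi_t})^m\wedge\eta \\
 &\le (1-t)\Bigl(\inf_M\varphi_t + \frac{C_1(m)}{t}\Bigr)
 \;\le\; \frac{1-t}{t}\,C_1(m).
\end{align*}
```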
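By (4.14) and (4.6), the properness of $F_{d\eta}$ implies that $J_{d\eta}(\varphi_t)$, and consequently $\|\varphi_t\|_{C^0}$, are uniformly bounded for $t\in[\epsilon,\tau)$. Therefore, Eq. (1.5) can be solved, i.e. there is a Sasakian-Einstein metric on $M$.

For the K-energy case, it is easy to see that, along the solutions of (1.6), we have
\[ S_t^T = 2(m+1)\Bigl(m - \frac{1-t}{4}\Delta_{\varphi_t}\varphi_t\Bigr), \tag{4.15} \]
and
\[ \mathcal{V}_{d\eta}(\varphi_t) = -2(m+1)(I_{d\eta}-J_{d\eta})(\varphi_t) + \frac{2}{V}\int_M h_{d\eta}\,(d\eta)^m\wedge\eta - \frac{2t(m+1)}{V}\int_M\varphi_t\,(d\eta_{\varphi_t})^m\wedge\eta. \tag{4.16} \]
Then, by (3.9) and (3.17), we have
\[
\begin{aligned}
\frac{d}{dt}\mathcal{V}_{d\eta}(\varphi_t)
&= -\frac{1}{V}\int_M\dot\varphi_t\,(S_t^T-\bar S)\,(d\eta_t)^m\wedge\eta \\
&= \frac{2m+2}{V}\int_M\dot\varphi_t\,\frac{1-t}{4}\,\Delta_{\varphi_t}\varphi_t\,(d\eta_t)^m\wedge\eta \\
&= 2(m+1)(t-1)\frac{d}{dt}\bigl((I_{d\eta}-J_{d\eta})(\varphi_t)\bigr).
\end{aligned}
\tag{4.17}
\]
From (4.16) and (4.17), we have
\[ \frac{d}{dt}\Bigl(\frac{t}{V}\int_M\varphi_t\,(d\eta_{\varphi_t})^m\wedge\eta + t\,(I_{d\eta}-J_{d\eta})(\varphi_t)\Bigr) = (I_{d\eta}-J_{d\eta})(\varphi_t). \tag{4.18} \]
Note that $\frac{t}{V}\int_M\varphi_t\,(d\eta_{\varphi_t})^m\wedge\eta + t\,(I_{d\eta}-J_{d\eta})(\varphi_t)\to 0$ as $t\to 0$. The identity (4.18) then implies that
\[ \frac{1}{V}\int_M\varphi_t\,(d\eta_{\varphi_t})^m\wedge\eta + (I_{d\eta}-J_{d\eta})(\varphi_t)\ge 0, \tag{4.19} \]
and
\[ \mathcal{V}_{d\eta}(\varphi_t) \le -2(m+1)(1-t)(I_{d\eta}-J_{d\eta})(\varphi_t) + \frac{2}{V}\int_M h_{d\eta}\,(d\eta)^m\wedge\eta \le \frac{2}{V}\int_M h_{d\eta}\,(d\eta)^m\wedge\eta. \tag{4.20} \]
Then the properness of $\mathcal{V}_{d\eta}$ implies that $J_{d\eta}(\varphi_t)$, and consequently $\|\varphi_t\|_{C^0}$, are uniformly bounded for $t\in[\epsilon,\tau)$. Therefore, Eq. (1.5) can also be solved.

Proposition 4.4. Let $(M,\xi,\eta,\Phi,g)$ be a compact Sasakian manifold with $[d\eta]_B = \frac{2\pi}{m+1}c_1^B(M,\mathcal{F}_\xi)$. If $\mathcal{V}_{d\eta}$ (or $F_{d\eta}$) is bounded from below in the space $\mathcal{H}(\xi,\eta,\Phi,g)$, then there exists a smooth family of solutions $\{\varphi_t\}$ of (1.6) for $t\in(0,1)$.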
Proof. By Corollary 3.5, it is sufficient to prove the K-energy case. We prove it by contradiction. Let $\tau<1$ be the maximal number such that there exists a smooth family of solutions $\{\varphi_t\}$ of (1.6) for $t\in(0,\tau)$. By (4.17), we have
\[
\begin{aligned}
(I_{d\eta}-J_{d\eta})(\varphi_t)
&= \int_{t_0}^{t}\frac{1}{2(m+1)(s-1)}\frac{d}{ds}\mathcal{V}_{d\eta}(\varphi_s)\,ds + (I_{d\eta}-J_{d\eta})(\varphi_{t_0}) \\
&\le \frac{1}{1-\tau}\Bigl(\mathcal{V}_{d\eta}(\varphi_{t_0}) - \inf\mathcal{V}_{d\eta}(\varphi)\Bigr) + (I_{d\eta}-J_{d\eta})(\varphi_{t_0}),
\end{aligned}
\tag{4.21}
\]
where $0<t_0<\tau$. Since $\mathcal{V}_{d\eta}$ is bounded from below, $(I_{d\eta}-J_{d\eta})(\varphi_t)$ is bounded uniformly from above for $t_0\le t<\tau$, and consequently $\|\varphi_t\|_{C^0}$ is uniformly bounded for $t\in[t_0,\tau)$. So Eq. (1.6) can also be solved at $\tau$; this gives a contradiction.

5. Smoothing by the Sasakian-Ricci Flow

In this section, we use the Sasakian-Ricci flow to get a smoothing lemma. As a natural analogue of the Kähler-Ricci flow, the Sasakian-Ricci flow was introduced in [32]. Now, we consider the following Sasakian-Ricci flow:
\[ \frac{\partial v}{\partial s} = \log\frac{(d\tilde\eta_0+\sqrt{-1}\,\partial_B\bar\partial_B v)^m\wedge\tilde\eta_0}{(d\tilde\eta_0)^m\wedge\tilde\eta_0} + (m+1)v - h_{d\tilde\eta_0}, \tag{5.1} \]
with $v|_{s=0}\equiv 0$. The long-time existence was proved in [32]. In the following, for simplicity, we will denote the transverse Kähler form $d\tilde\eta_0+\sqrt{-1}\,\partial_B\bar\partial_B v$ by $d\tilde\eta_s$, and we will use a subscript $s$ to indicate objects that are defined with respect to the transverse Kähler metric $d\tilde\eta_s$. As in [3], we have the following lemma.

Lemma 5.1. The following inequalities:
\[ \Bigl\|\frac{\partial v}{\partial s}\Bigr\|_{C^0} \le e^{(m+1)s}\,\|h_{d\tilde\eta_0}\|_{C^0}, \tag{5.2} \]
\[ \sup_M\Bigl(|h_{d\tilde\eta_s}|^2 + \frac{s}{2}|dh_{d\tilde\eta_s}|_s^2\Bigr) \le 4e^{2(m+1)s}\,\|h_{d\tilde\eta_0}\|_{C^0}^2, \tag{5.3} \]
\[ e^{-(m+1)s}\,\Delta_s h_{d\tilde\eta_s} \ge \Delta_0 h_{d\tilde\eta_0}, \tag{5.4} \]
hold for all $s\ge 0$.

Proof. Differentiating the Sasakian-Ricci flow equation (5.1) gives
\[ \frac{\partial}{\partial s}\dot v = \frac{1}{4}\Delta_s\dot v + (m+1)\dot v, \tag{5.5} \]
and the maximum principle implies (5.2). By direct calculation, we have
\[ \frac{\partial}{\partial s}|d\dot v|_s^2 = \frac{1}{4}\Delta_s|d\dot v|_s^2 - \frac{1}{2}|\nabla_s d\dot v|_s^2 + (m+1)|d\dot v|_s^2. \tag{5.6} \]
From (5.5) and (5.6), we have
\[ \Bigl(\frac{\partial}{\partial s}-\frac{1}{4}\Delta_s\Bigr)\Bigl(\dot v^2+\frac{s}{2}|d\dot v|_s^2\Bigr) \le 2(m+1)\Bigl(\dot v^2+\frac{s}{2}|d\dot v|_s^2\Bigr), \tag{5.7} \]
and the maximum principle implies that
\[ \sup_M\Bigl(\dot v^2+\frac{s}{2}|d\dot v|_s^2\Bigr) \le e^{2(m+1)s}\,\|h_{d\tilde\eta_0}\|_{C^0}^2. \tag{5.8} \]
By Eq. (5.1), it is easy to check that
\[ h_{d\tilde\eta_s} = -\dot v + c_s \tag{5.9} \]
for some constant $c_s$ with $c_0 = 0$. From the normalization condition of the Ricci potential function and (5.1), we have
\[ \int_M e^{-(m+1)v+h_{d\tilde\eta_0}+c_s}(d\tilde\eta_0)^m\wedge\tilde\eta_0 = \int_M e^{h_{d\tilde\eta_s}}(d\tilde\eta_s)^m\wedge\tilde\eta_s = V, \tag{5.10} \]
and then
\[ |c_s| \le (m+1)\|v\|_{C^0} \le e^{(m+1)s}\,\|h_{d\tilde\eta_0}\|_{C^0}. \tag{5.11} \]
So, (5.8) and (5.11) imply (5.3). By direct calculation, one can check that
\[ \Bigl(\frac{\partial}{\partial s}-\frac{1}{4}\Delta_s\Bigr)(\Delta_s\dot v) = (m+1)\Delta_s\dot v - |\partial_B\bar\partial_B\dot v|_s^2, \tag{5.12} \]
and then the maximum principle implies the inequality (5.4).
Lemma 5.2. Let vt,s be a solution of (5.1) with d η˜ 0 = dηϕt . Let h˜ = h d η˜ 1 − 1 m V M h d η˜ 1 (d η˜ 1 ) ∧ η˜ 1 and assume that 1 dη S E ≤ d η˜ 1 ≤ dη S E . 2
(5.13)
Then for any p > 2m +1, there exists a positive constant C¯ 1 depending only on (M, g S E ) and p such that p−2
1
˜ C 0 ≤ C¯ 1 (1 − t) p−1 h d η˜ p−1 h 0 C0 .
(5.14)
˜ C 0 ≤ 4em+1 h d η˜ C 0 . By the initial condition Proof. Equation (5.3) implies that h 0 d η˜ 0 = dηϕt , we have ρ T (d η˜ 0 ) ≥ t (m + 1)d η˜ 0 , and 0 h d η˜ 0 ≥ 4m(m + 1)(t − 1). By (5.4), we have − 1 h d η˜ 1 ≤ 4em+1 m(m + 1)(1 − t).
(5.15)
Integrating by parts, we have ˜ 21 (d η˜ 1 )m ∧ η˜ 1 = − ˜ 1 h(d ˜ η˜ 1 )m ∧ η˜ 1 |d h| h M M ˜ sup(−1 h)(d ˜ η˜ 1 )m ∧ η˜ 1 ≤ (h˜ − inf h) M
M
˜ C 0 sup(−1 h) ˜ ≤ 2V h M
˜ C0 , ≤ C¯ 2 (1 − t)h
(5.16)
where C¯ 2 depends only on the dimension of M. Since h˜ is a basic function, by condition (5.13), we have ˜ 2S E ≤ 2|d h| ˜ 21 . |d h|
(5.17)
Let p > 2m + 1. By the Sobolev imbedding theorem (Lemma 2.22 of [2]), the Poincaré inequality and (5.3), we have ˜ p 0 ≤ C¯ 3 ( |h| ˜ p + |d h| ˜ p (dη S E )m ∧ η S E ) h SE C M p−2 ˜ 2 + |d h| ˜ 2S E (dη S E )m ∧ η S E ) ≤ C¯ 4 h d η˜ 0 C 0 ( |h| M p−2 ¯ ˜ 2S E (dη S E )m ∧ η S E ) ≤ C5 h d η˜ 0 C 0 ( |d h| M p−2 ˜ 21 (d η˜ 1 )m ∧ η˜ 1 ), ≤ C¯ 6 h d η˜ 0 C 0 ( |d h| (5.18) M
where constants C¯ i depend only on (M, g S E ) and p. Then (5.16) and (5.18) imply (5.14), and we are finished. Lemma 5.3. Let vt,s be a solution of (5.1) with initial data d η˜ 0 = dηϕt , and u t = vt,1 . We have the inequality u t C 0 ≤
1 m+1 e h dηϕt C 0 m+1
(5.19)
for all t ∈ [0, 1]. Moreover, assume that 21 dη S E ≤ dηϕt +u t ≤ dη S E for all t ∈ [t1 , 1], where t1 ∈ [0, 1). Then for any p > 2m + 1 and 0 ≤ k < 1, there exists a constant C¯ 7 depending only on (M, g S E ) and p such that h dηϕt +u t C 0,k (dη S E ) ≤ C¯ 7 (1 − t)1−β (1 + h dηϕt C 0 )β for all t ∈ [t1 , 1], where β =
(5.20)
p+k−2 p−1 . ∂v
Proof. From (5.2), it follows that | ∂st,s | ≤ e(m+1)s h dηϕt C 0 , and integrating from 0 to 1, we obtain the inequality (5.19). In the following, let d(x, y) be the distance between x and y with respect to the metric g S E . Since h dηϕt +u t is a basic function, by the condition 21 dη S E ≤ dηϕt +u t ≤ dη S E , we have √ |dh dηϕt +u t |dη S E ≤ 2|dh dηϕt +u t |dηϕt +u t . 1
If d(x, y) ≤ (1 − t) p−1 (1 + h dηϕt C 0 )
1 − p−1
, by (5.3) in Lemma 5.1, we have
|h dηϕt +u t (x) − h dηϕt +u t (y)| ≤ d(x, y) sup |dh dηϕt +u t |dη S E ≤
√
M
2d(x, y) sup |dh dηϕt +u t |dηϕt +u t
√
M
≤ 4 2e d(x, y)(1 + h dηϕt C 0 ) √ p+k−2 1−k ≤ 4 2em+1 (1 − t) p−1 (1 + h dηϕt C 0 ) p−1 d(x, y)k . m+1
(5.21)
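If $d(x,y) \ge (1-t)^{\frac{1}{p-1}}\bigl(1+\|h_{d\eta_{\varphi_t}}\|_{C^0}\bigr)^{-\frac{1}{p-1}}$, then the estimate (5.14) in Lemma 5.2 implies
\[
\begin{aligned}
|h_{d\eta_{\varphi_t+u_t}}(x) - h_{d\eta_{\varphi_t+u_t}}(y)|
&\le 2\|\tilde h\|_{C^0} \\
&\le 2\bar C_1(1-t)^{\frac{1}{p-1}}\bigl(\|h_{d\eta_{\varphi_t}}\|_{C^0}\bigr)^{\frac{p-2}{p-1}} \\
&\le 2\bar C_1(1-t)^{\frac{1-k}{p-1}}\bigl(1+\|h_{d\eta_{\varphi_t}}\|_{C^0}\bigr)^{\frac{p+k-2}{p-1}}\,d(x,y)^k.
\end{aligned}
\tag{5.22}
\]
On the other hand, the integral normalization $\int_M e^{h_{d\eta_{\varphi_t+u_t}}}(d\eta_{\varphi_t+u_t})^m\wedge\eta = V$ implies that $h_{d\eta_{\varphi_t+u_t}}$ changes sign, so we have
\[
\begin{aligned}
\|h_{d\eta_{\varphi_t+u_t}}\|_{C^0}
&\le Osc(h_{d\eta_{\varphi_t+u_t}}) = Osc(\tilde h) \le 2\|\tilde h\|_{C^0} \\
&\le 2\bar C_1(1-t)^{\frac{1}{p-1}}\bigl(\|h_{d\eta_{\varphi_t}}\|_{C^0}\bigr)^{\frac{p-2}{p-1}}.
\end{aligned}
\tag{5.23}
\]
It is easy to see that (5.21), (5.22) and (5.23) imply the estimate (5.20).

Set $\alpha := 1-\frac{1}{4m+2} > \frac{1}{2}$ and define the function $f_{d\eta}$ by
\[ f_{d\eta}(t) := (1-t)^{1-\alpha}\bigl(1+2(1-t)\|\varphi_t\|_{C^0}\bigr)^{\alpha}. \tag{5.24} \]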
Discussing as in [35], we have the following proposition. Proposition 5.4. Suppose that (M, ξ, η, , g) admits no non-trivial Hamiltonian holomorphic vector fields. Let ϕt be a smooth family of solutions of Eq. (1.6) for t ∈ (0, 1]. There exists a constant D > 0 depending only on (M, g S E ) such that ϕ1 − ϕt C 0 ≤ A(1 − t)ϕt C 0 + 1
(5.25)
for all t ∈ [t0 , 1], where t0 ∈ [0, 1) satisfies f dη (t0 ) = max[t0 ,1] f dη = D and A depending only on the dimension of M. Proof. Let’s rewrite (1.6) as the following transverse Monge-Ampère equation with dη S E as reference metric √ (dη S E + −1∂ B ∂ B (ϕt − ϕ1 ))m ∧ η (dη S E )m ∧ η (5.26) = exp(−(m + 1)(ϕt − ϕ1 ) + (1 − t)(m + 1)ϕt ). It is easy to see that h dηϕt = (t − 1)(m + 1)ϕt + ct , for some constant ct . The integrated normalization of the Ricci potential function h dηϕt gives V = (dηϕt )m ∧ η = eh dηϕt (dηϕt )m ∧ η M M = e(t−1)(m+1)ϕt +ct (dηϕt )m ∧ η, (5.27) M
from which it follows that |ct | ≤ (m + 1)(1 − t)ϕt C 0 ,
(5.28)
and h dηϕt C 0 ≤ 2(m + 1)(1 − t)ϕt C 0 .
(5.29)
Then, Lemma 5.3 implies that
Consider dηϕt +u t and then
(5.30) u t C 0 ≤ 2e(m+1) (1 − t)ϕt C 0 . √ √ = dη + −1∂ B ∂¯ B (ϕt + u t ) = dη S E + −1∂ B ∂¯ B (ϕt + u t − ϕ1 ), √ −1∂ B ∂ B (ϕt + u t − ϕ1 ))m ∧ η (dη S E )m ∧ η = exp(−(m + 1)(ϕt + u t − ϕ1 ) − h dηϕt +u t − c˜t )
(dη S E +
(5.31)
c˜t for some constant c˜t . Setting ϕ˜t = ϕt + u t − ϕ1 + m+1 , from (5.31) and (5.30), we have eh dηϕt +u t (dηϕt +u t )m ∧ η = e−(m+1)ϕ˜t (dη S E )m ∧ η M M = e−(m+1)ϕ˜t +t (m+1)ϕt −(m+1)ϕ1 (dηϕt )m ∧ η M = e(t−1)(m+1)ϕt −(m+1)u t −c˜t (dηϕt )m ∧ η, (5.32) M
and then |c˜t | ≤ (1 − t)(m + 1)ϕt C 0 + (m + 1)u t C 0 ≤ (1 − t)(m + 1)(1 + 2e(m+1) )ϕt C 0 . Recalling that ϕt − ϕ1 = ϕ˜t − u t −
c˜t m+1 ,
(5.33)
from (5.30) and (5.33), we have
ϕt − ϕ1 C 0 = ϕ˜t C 0 + (1 − t)(4e(m+1) + 1)ϕt C 0 . From above, it will suffice to get the estimate ϕ˜t C 0 ≤ 1. Let’s consider the following transverse Monge-Ampère equation: √ (dη S E + −1∂ B ∂ B ψ)m ∧ η ˜ } + (m + 1)ψ = ψ. log{ (dη S E )m ∧ η
(5.34)
(5.35)
The linearization of the left side of (5.35) at ψ = 0 is δψ →
1 S E δψ + (m + 1)δψ, 4
(5.36)
which is a transverse elliptic operator from C i+2,k (M) → C i+2,k (M) for any 0 < k < 1 B B and i ≥ 0. If M doesn’t have non-trivial Hamiltonian holomorphic vector fields, by Theorem 5.1 of [16], we have ker ( 41 S E +(m +1)) = 0, then the operator ( 41 S E +(m +1)) : (M) → C i+2, (M) is invertible. Applying the implicit function theorem, there C i+2, B B exist positive constants (dη S E ) and C ∗ (dη S E ) which depend only on k and the geometry of (M, g S E ), so that ˜ C 0,k ≤ (dη S E ) then ψC 2,k ≤ C ∗ (dη S E )ψ ˜ C 0,k . if ψ
(5.37)
Set
\[ D = \frac{\epsilon\,(m+1)^{-\alpha}}{2(\bar C_7+1)(C^*+1)(\epsilon+1)}, \]
where $\epsilon = \epsilon(d\eta_{SE})$ and $C^* = C^*(d\eta_{SE})$ are chosen as in (5.37), $\alpha = 1-\frac{1}{4m+2}$, and $\bar C_7$ is defined as in Lemma 5.3 (by choosing $k=\frac{1}{2}$ and $p = 2m+2$). Let $t_0\in[0,1)$ satisfy $f_{d\eta}(t_0) = \max_{[t_0,1]}f_{d\eta} = D$. Now, we only need to prove the following claim.

Claim. For all $t\in[t_0,1]$, we have
\[ \|\tilde\varphi_t\|_{C^{2,\frac{1}{2}}} < \frac{1}{2}. \tag{5.38} \]
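The choice $k=\frac12$, $p=2m+2$ is what makes the exponent $\beta$ of Lemma 5.3 coincide with the exponent $\alpha$ used in the definition (5.24) of $f_{d\eta}$. A one-line check, using only the formula $\beta = \frac{p+k-2}{p-1}$ from Lemma 5.3:

```latex
\begin{align*}
\beta = \frac{p+k-2}{p-1}\Big|_{p=2m+2,\;k=\frac12}
      = \frac{2m+\tfrac12}{2m+1}
      = \frac{4m+1}{4m+2}
      = 1-\frac{1}{4m+2} = \alpha ,
\qquad
1-\beta = \frac{1}{4m+2} = 1-\alpha .
\end{align*}
% Hence (5.20) reads
%   \|h\|_{C^{0,1/2}} \le \bar C_7 (1-t)^{1-\alpha} (1 + \|h_{d\eta_{\varphi_t}}\|_{C^0})^{\alpha},
% which has the same shape as f_{d\eta}(t) in (5.24).
```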
We assume the contrary. Since ϕ˜1 = 0, there exists t1 ∈ [t0 , 1) such that ϕ˜t1
C
2, 21
(dη S E )
In particular − 41 dη S E ≤
=
1 1 , and ϕ˜t 2, 1 i f t1 < t < 1. < C 2 (dη S E ) 2 2
(5.39)
√ −1∂ B ∂ B ϕ˜t ≤ 41 dη S E , and then 3 5 dη S E ≤ dηϕt +u t ≤ dη S E 4 4
(5.40)
for all t ∈ [t1 , 1]. By applying (5.20) in Lemma 5.3 (by choosing p = 2m + 2) and (5.29), we have h dηϕt +u t
1
C 0, 2 (dη S E )
≤ C¯ 7 (1 − t)1−α (1 + h dηϕt C 0 )α ≤ C¯ 7 (1 − t)1−α (1 + 2(1 − t)(m + 1)ϕt C 0 )α ≤ C¯ 7 (m + 1)α (1 − t)1−α (1 + 2(1 − t)ϕt C 0 )α
≤ C¯ 7 (m + 1)α D C¯ 7 = 2(C¯ 7 + 1)(C ∗ + 1)( + 1) < ,
(5.41)
for all t ∈ [t1 , 1]. Using (5.37) again, we get ϕ˜t1
1
C 2, 2 (dη S E )
≤ h dηϕt +u t ≤
2(C¯ 7 1 < . 2
1
C 0, 2 (dη S E ) C ∗ C¯ 7 + 1)(C ∗ + 1)(
+ 1) (5.42)
This gives a contradiction, and completes the proof of the claim. So, the proof of the proposition is completed. 6. A Moser-Trudinger Type Inequality In this section, we assume the existence of a Sasakian-Einstein structure and establish a Moser-Trudinger type inequality for functional Fdη S E . Our discussion follows that in [29] by Phong, Song, Sturm and Weinkove. In fact, we obtain the following theorem.
Theorem 6.1. Let (M, ξ, η S E , S E , g S E ) be a compact Sasakian-Einstein metric without non-trivial Hamiltonian holomorphic vector field, then there exist uniform positive constants C1 , C2 depending only the geometry of (M, g S E ), such that Fdη S E (ϕ) ≥ C1 Jdη S E (ϕ) − C2 ,
(6.1)
for all ϕ ∈ H(ξ, η S E , S E , g S E ). √ Proof. Fix a basic function φ ∈ H(ξ, η E , S E , g S E ), and set dη = dη S E + −1∂ B ∂¯ B φ. Now, let us consider the complex Monge-Ampère equation (1.6). Since there are no nontrivial Hamiltonian holomorphic vector fields, by the uniqueness of Sasakian-Einstein structure ([31] or [28]) and Proposition 4.1, a unique solution ϕt exists for all t ∈ (0, 1], and dηϕ1 = dη S E . In particular ϕ1 and −φ differ by a constant. For further consideration, we give the following estimates for functionals F, I and J . From (3.1), (3.4) and (4.10), we have d d 1 1 (Idη − Jdη )(ϕs ) = − ( ϕs (dηϕs )m ∧ η) + ϕ˙s (dη)m ∧ η ds ds V M V M 1 ϕ˙s {(dη)m − (dηϕs )m } ∧ η − V M 1 d 1 m ϕs (dηϕs ) ∧ η) + ϕ˙s (dηϕs )m ∧ η =− ( ds V M V M 1 d 1 ϕs (dηϕs )m ∧ η) − ϕs (dηϕs )m ∧ η. (6.2) =− ( ds V M sV M The uniform C 0 estimate (4.9) of ϕt implies that 1 ϕs (dηϕs )m ∧ η → 0 s V M
(6.3)
as s → 0. By integrating on [0, t], we get t t (Idη − Jdη )(ϕt ) − (Idη − Jdη )(ϕs )ds 0 t d = s (Idη − Jdη )(ϕs )ds ds 0 t d 1 1 t =− s ( ϕs (dηϕs )m ∧ η)ds − ( ϕs (dηϕs )m ∧ η)ds ds V M V 0 M 0 t =− ϕt (dηϕt )m ∧ η, (6.4) V M and then 1 0 Fdη (ϕt ) = −(Idη − Jdη )(ϕt ) − ϕt (dηϕt )m ∧ η V M −1 t = (Idη − Jdη )(ϕs )ds. t 0
(6.5)
Taking t = 1 and considering Fdη (ϕ1 ) = −Fdη S E (φ), so we have Fdη S E (φ) =
1
(Idη − Jdη )(ϕs )ds.
(6.6)
0
0 , we have By the definitions (3.1) and the cocycle property of Fdη
1 0 0 (ϕ1 − ϕt )(dη)m ∧ η + Fdη (ϕ1 ) − Fdη (ϕt ) V M 1 0 = (ϕ1 − ϕt )(dη)m ∧ η − Fdη (ϕt − ϕ1 ) ϕ1 V M 1 1 (ϕ1 − ϕt )(dη)m ∧ η + (ϕt − ϕ1 )(dηϕ1 )m ∧ η) ≤ V M V M ≤ Osc(ϕ1 − ϕt ). (6.7)
Jdη (ϕ1 ) − Jdη (ϕt ) =
By adding Idη (ϕt ) − Idη (ϕ1 ) 1 1 = (ϕt − ϕ1 )(dη)m ∧ η + (ϕ1 − ϕt )(dηϕm1 ) ∧ η V M V M 1 + ϕt {(dηϕ1 )m − (dηϕt )m } ∧ η, V M we get (Idη − Jdη )(ϕt ) − (Idη − Jdη )(ϕ1 ) = Jdη (ϕ1 ) − Jdη (ϕt ) + (Idη (ϕt ) − Idη (ϕ1 )) 1 ≤ ϕt {(ωϕ1 )m − (ωϕt )m } V M m−1 1 j) ϕt (dηϕ1 − dηϕt ) ∧ ( dηϕj t ∧ dηϕ(m−1− )∧η = 1 V M j=0
=
1 V
m−1
M
(ϕ1 − ϕt )(dηϕt − dη) ∧ (
j) dηϕj t ∧ dηϕ(m−1− )∧η 1
j=0
≤ m Osc(ϕ1 − ϕt ).
(6.8)
Interchanging ϕt and ϕ1 in (6.7) and (6.8), we get |Jdη (ϕ1 ) − Jdη (ϕt )| ≤ Osc(ϕ1 − ϕt )
(6.9)
|(Idη − Jdη )(ϕt ) − (Idη − Jdη )(ϕ1 )| ≤ m · Osc(ϕ1 − ϕt ).
(6.10)
and
Using the relationship Fdη (ϕ1 ) = −Fdη S E (φ), we have 1 Jdη (ϕ1 ) = Fdη (ϕ1 ) + ϕ1 (dη)m ∧ η V M 1 = −Fdη S E (φ) + ϕ1 (dη)m ∧ η V M 1 = −Jdη S E (φ) + φ{(dη S E )m − (dη)m ∧ η} V M 1 = (Idη S E − Jdη S E )(φ) ≥ Jdη S E (φ), m
(6.11)
where we have used the inequality (3.8). Since (Idη − Jdη )(ϕt ) is nondecreasing in t, (6.6) implies that Fdη S E (φ) ≥ (1 − t)(Idη − Jdη )(ϕt )ds ≥
1−t Jdη (ϕt ). m
(6.12)
Using (6.11) and (6.7), we have Fdη S E (φ) ≥
1−t 1−t Osc(ϕt − ϕ1 ). Jdη S E (φ) − m2 m
(6.13)
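In the following, we choose $t_0$ as in Proposition 5.4. If $2(1-t_0)\|\varphi_{t_0}\|_{C^0}\le 1$, by the definition of $t_0$, we have $D\le(1-t_0)^{1-\alpha}2^{\alpha}$, i.e.
\[ (1-t_0) \ge 2^{-\frac{\alpha}{1-\alpha}}D^{\frac{1}{1-\alpha}}. \tag{6.14} \]
If $2(1-t_0)\|\varphi_{t_0}\|_{C^0}\ge 1$, we have $D\le 4^{\alpha}(1-t_0)\|\varphi_{t_0}\|_{C^0}^{\alpha}$, and then
\[ (1-t_0) \ge \frac{D}{4^{\alpha}\|\varphi_{t_0}\|_{C^0}^{\alpha}}. \tag{6.15} \]
In the second case, we may assume that $1-t_0<\frac{A^{-1}}{2}$. Inequality (5.25) then implies that
\[ \|\varphi_{t_0}\|_{C^0} \le 2\|\varphi_1\|_{C^0} + 2, \tag{6.16} \]
and therefore
\[ (1-t_0) \ge \frac{D}{4^{\alpha}\bigl(2\|\varphi_1\|_{C^0}+2\bigr)^{\alpha}}. \tag{6.17} \]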
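The elementary steps behind (6.14) through (6.17) are as follows. This is a sketch that only uses $f_{d\eta}(t_0)=D$ from (5.24) and the estimate (5.25) of Proposition 5.4 evaluated at $t=t_0$; the abbreviation $x$ is introduced here for convenience.

```latex
% Write x := (1-t_0)\,\|\varphi_{t_0}\|_{C^0}, so that D = (1-t_0)^{1-\alpha}(1+2x)^{\alpha}.
\begin{align*}
2x\le 1:&\quad 1+2x\le 2
  \;\Longrightarrow\; D\le (1-t_0)^{1-\alpha}2^{\alpha}
  \;\Longrightarrow\; 1-t_0 \ge 2^{-\frac{\alpha}{1-\alpha}}D^{\frac{1}{1-\alpha}},\\
2x\ge 1:&\quad 1+2x\le 4x
  \;\Longrightarrow\; D\le (1-t_0)^{1-\alpha}\,4^{\alpha}(1-t_0)^{\alpha}\|\varphi_{t_0}\|_{C^0}^{\alpha}
  = 4^{\alpha}(1-t_0)\|\varphi_{t_0}\|_{C^0}^{\alpha}.
\end{align*}
% For (6.16): by the triangle inequality and (5.25) with A(1-t_0) < 1/2,
\begin{align*}
\|\varphi_{t_0}\|_{C^0}
 \le \|\varphi_1\|_{C^0} + \|\varphi_1-\varphi_{t_0}\|_{C^0}
 \le \|\varphi_1\|_{C^0} + \tfrac12\|\varphi_{t_0}\|_{C^0} + 1
 \;\Longrightarrow\;
 \|\varphi_{t_0}\|_{C^0}\le 2\|\varphi_1\|_{C^0}+2 .
\end{align*}
% Substituting this bound into (6.15) gives (6.17).
```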
Since $\sup_M\varphi_1\cdot\inf_M\varphi_1\le 0$, we always have the following inequality:
\[ (1-t_0) \ge \frac{C}{(\|\varphi_1\|_{C^0}+1)^{\alpha}} \ge \frac{C}{(Osc(\varphi_1)+1)^{\alpha}} = \frac{C}{(Osc(\phi)+1)^{\alpha}}, \tag{6.18} \]
where C is a positive constant depending only on (M, g S E ). On the other hand, using Proposition 5.4 again, we have (1 − t0 )ϕ1 − ϕt0 C 0 ≤ (1 − t0 )2 Aϕt0 C 0 + 1 1
≤ AD α + 1.
(6.19)
By inequalities (6.13), (6.18) and (6.19), we obtain Fdη S E (φ) ≥ C˜ 1
Jdη S E (φ) − C˜ 2 , (Osc(φ) + 1)α
(6.20)
for all φ ∈ H(ξ, η S E , S E , g S E ), where C˜ 1 and C˜ 2 are positive constants depending only on the geometry of (M, g S E ). T ≥ t (m + 1)dη , we can use CorolSince ϕt − ϕ1 ∈ H(ξ, η S E , S E , g S E ) and ρdη t t lary 2.9 to obtain the following estimate: ¯ Osc(ϕt − ϕ1 ) ≤ Idη S E (ϕt − ϕ1 ) + C(M, g S E ),
(6.21)
¯ g S E ) is a constant depending only on (M, g S E ). By (6.20) for t ∈ [ 21 , 1], where C(M, and (6.21), we have Fdη S E (ϕt − ϕ1 ) ≥ C˜ 3
Jdη S E (ϕt − ϕ1 ) − C˜ 2 , (Jdη S E (ϕt − ϕ1 ) + 1)α
(6.22)
for t ∈ [ 21 , 1], where C˜ 3 is a constant depending only on (M, g S E ). By the cocycle property of the functional F, formulas (6.4), (6.5), (4.6), nondecreasing (Idη − Jdη )(ϕt ), and the concavity of the log function, we have Fdη S E (ϕt − ϕ1 ) = Fdη (ϕt ) − Fdη (ϕ1 ) 1 −1 t (Idη − Jdη )(ϕs )ds + (Idη − Jdη )(ϕs )ds = t 0 0 1 1 log{ e(t−1)(m+1)ϕt (dηϕt )m ∧ η} − m+1 V M 1 t −1 t ≤ (Idη − Jdη )(ϕs )ds + (Idη − Jdη )(ϕs )ds t 0 t (1 − t) ϕt (dηϕt )m ∧ η + V M 1 = (Idη − Jdη )(ϕs )ds − (1 − t)(Idη − Jdη )(ϕt ) t
≤ (1 − t){(Idη − Jdη )(ϕ1 ) − (Idη − Jdη )(ϕt )} ≤ m(1 − t)Osc(ϕ1 − ϕt ) C1 (m) + C2 } ≤ m(1 − t){Idη S E (ϕt − ϕ1 ) + t C1 (m) (6.23) ≤ m(1 − t){(m + 1)Jdη S E (ϕt − ϕ1 ) + + C2 }. t By the same discussion in [29] (p. 1083), we know that (6.13), (6.21), (6.22) and (6.23) imply the Moser-Trudinger inequality (6.1). We write out the proof in details just for the reader’s convenience.
Combining (6.22) with (6.23), we have
\[ m(m+1)(1-t)J(t) + \tilde C_4(1-t) \ge \tilde C_3\frac{J(t)}{(J(t)+1)^{\alpha}} - \tilde C_2, \tag{6.24} \]
for $t\in[\frac{1}{2},1]$, where $\tilde C_4$ is a constant depending only on $(M,g_{SE})$. Here we denote $J_{d\eta_{SE}}(\varphi_t-\varphi_1)$ by $J(t)$ just for simplicity. Equation (6.24) can also be written as
\[ \frac{J(t)}{(J(t)+1)^{\alpha}}\bigl(\tilde C_5 - (1-t)(J(t)+1)^{\alpha}\bigr) \le \tilde C_6(1-t) + \tilde C_7, \tag{6.25} \]
where $\tilde C_5$, $\tilde C_6$ and $\tilde C_7$ are constants depending only on $(M,g_{SE})$. We can suppose that there exists a $t'\in[\frac{1}{2},1]$ with
\[ (1-t')(J(t')+1)^{\alpha} = \frac{1}{2}\tilde C_5. \tag{6.26} \]
If not, then we must have $(1-t)(J(t)+1)^{\alpha}<\frac{1}{2}\tilde C_5$ for all $t\in[\frac{1}{2},1]$. It would follow that $J(\frac{1}{2})\le\tilde C_5^{\frac{1}{\alpha}}$, and then (6.13) and (6.21) imply (6.1). Otherwise, from (6.25) we have that $J(t')\le\tilde C_8$ and $1-t'\ge\tilde C_9$. These also imply (6.1).

Now, Theorem 4.3 and Theorem 6.1 imply the main theorem in the Introduction.

7. A Miyaoka-Yau Type Inequality

Definition 7.1. Let $(M,\xi,\eta,\Phi,g)$ be a compact Sasakian manifold with $[d\eta]_B = \frac{2\pi}{m+1}c_1^B(M,\mathcal{F}_\xi)$. As above, we define $\mathcal{S}(\xi,\bar J)$ to be the space of all Sasakian structures compatible with $(\xi,\eta,\Phi,g)$. We define two positive constants by
\[ \alpha(\xi,\bar J) := \inf\{\lambda \mid 0\le S^T_{d\eta'}\le 2m\lambda \text{ for some } (\xi,\eta',\Phi',g')\in\mathcal{S}(\xi,\bar J)\}, \]
and
\[ \beta(\xi,\bar J) := \sup\{\lambda \mid S^T_{d\eta'}\ge 2m\lambda \text{ for some } (\xi,\eta',\Phi',g')\in\mathcal{S}(\xi,\bar J)\}. \]
Remark. Since the mean value of the transverse scalar curvature is $\bar S = 2m(m+1)$ for any Sasakian structure in $\mathcal{S}(\xi,\bar J)$, it is easy to see that $\alpha(\xi,\bar J)\ge m+1$ and $0<\beta(\xi,\bar J)\le m+1$. Obviously, if there exists a Sasakian-Einstein structure in $\mathcal{S}(\xi,\bar J)$, then we have $\alpha(\xi,\bar J) = m+1 = \beta(\xi,\bar J)$.

Lemma 7.2. Let $(M,\xi,\eta,\Phi,g)$ be a compact Sasakian manifold with $[d\eta]_B = \frac{2\pi}{m+1}c_1^B(M,\mathcal{F}_\xi)$, and $(\xi,\eta',\Phi',g')\in\mathcal{S}(\xi,\bar J)$. Then we have
\[
\begin{aligned}
&\int_M(2\pi)^2\Bigl(2c_2^B(M,\mathcal{F}_\xi) - \frac{m}{m+1}c_1^B(M,\mathcal{F}_\xi)^2\Bigr)\wedge\frac{(\frac{1}{2}d\eta')^{m-2}}{(m-2)!}\wedge\eta' \\
&\quad= \int_M\Bigl\{|Rm^T|^2 - \frac{2(S^T)^2}{m(m+1)} - \frac{(m-1)(m+2)}{m(m+1)}\bigl((S^T)^2-(2m(m+1))^2\bigr)\Bigr\}\frac{(\frac{1}{2}d\eta')^m}{m!}\wedge\eta',
\end{aligned}
\tag{7.1}
\]
where $Rm^T$ and $S^T$ are the transverse curvature tensor and the transverse scalar curvature of $(\xi,\eta',\Phi',g')$.
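Proof. By direct calculation, we have
$$ \begin{aligned} \int_M (2\pi)^2 \Bigl(2c_2^B(M, F_\xi) - \frac{m}{m+1} c_1^B(M, F_\xi)^2\Bigr) \wedge \frac{(\frac12 d\eta')^{m-2}}{(m-2)!} \wedge \eta' &= \int_M \Bigl\{ \mathrm{tr}(Rm^T \wedge Rm^T) - \frac{1}{m+1}\, \mathrm{tr}\, Rm^T \wedge \mathrm{tr}\, Rm^T \Bigr\} \wedge \frac{(\frac12 d\eta')^{m-2}}{(m-2)!} \wedge \eta' \\ &= \int_M \Bigl( |Rm^T|^2 - |\rho^T|^2 + \frac{1}{m+1} \bigl((S^T)^2 - |\rho^T|^2\bigr) \Bigr) \frac{(\frac12 d\eta')^{m}}{m!} \wedge \eta' \\ &= \int_M \Bigl( |Rm^T|^2 - (S^T)^2 - \frac{m+2}{m+1} \bigl(|\rho^T|^2 - (S^T)^2\bigr) \Bigr) \frac{(\frac12 d\eta')^{m}}{m!} \wedge \eta'. \end{aligned} \tag{7.2} $$
On the other hand,
$$ \begin{aligned} \int_M \bigl((S^T)^2 - |\rho^T|^2\bigr) \frac{(\frac12 d\eta')^{m}}{m!} \wedge \eta' &= \int_M \rho^T \wedge \rho^T \wedge \frac{(\frac12 d\eta')^{m-2}}{(m-2)!} \wedge \eta' \\ &= 4m(m-1)(m+1)^2 \int_M \frac{(\frac12 d\eta')^{m}}{m!} \wedge \eta'. \end{aligned} \tag{7.3} $$
Combining the above two equalities, we get (7.1).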
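For the reader's convenience, the elementary algebra behind "combining the above two equalities" can be sketched as follows. This is only an illustrative check, not part of the original argument; the abbreviation $\mathrm{dvol}$ for $\frac{(\frac12 d\eta')^m}{m!} \wedge \eta'$ is introduced here for brevity and is not used elsewhere in the paper.

```latex
\begin{align*}
% substitute (7.3) into the last line of (7.2)
\int_M \Bigl( |Rm^T|^2 - (S^T)^2
      - \tfrac{m+2}{m+1}\bigl(|\rho^T|^2 - (S^T)^2\bigr) \Bigr)\,\mathrm{dvol}
  &= \int_M \bigl( |Rm^T|^2 - (S^T)^2 \bigr)\,\mathrm{dvol}
     + 4m(m-1)(m+1)(m+2) \int_M \mathrm{dvol}, \\
% coefficient of (S^T)^2 on the right-hand side of (7.1)
\tfrac{2}{m(m+1)} + \tfrac{(m-1)(m+2)}{m(m+1)}
  &= \tfrac{m^2+m}{m(m+1)} = 1, \\
% constant term on the right-hand side of (7.1)
\tfrac{(m-1)(m+2)}{m(m+1)}\,\bigl(2m(m+1)\bigr)^2
  &= 4m(m-1)(m+1)(m+2).
\end{align*}
```

The last two lines show that the integrand on the right-hand side of (7.1) equals $|Rm^T|^2 - (S^T)^2 + 4m(m-1)(m+1)(m+2)$, which matches the first line.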
In the local foliation chart $(x, z_1, \dots, z_m)$, set
$$ Q_{i\bar j k\bar l} = R^T_{i\bar j k\bar l} - \frac{S^T}{m(m+1)} \bigl(g^T_{i\bar j} g^T_{k\bar l} + g^T_{i\bar l} g^T_{k\bar j}\bigr). \tag{7.4} $$
It is easy to check that
$$ |Q|^2 = |Rm^T|^2 - \frac{2(S^T)^2}{m(m+1)}. \tag{7.5} $$
Combining (7.1) and (7.5), we have
$$ \int_M (2\pi)^2 \Bigl(2c_2^B(M, F_\xi) - \frac{m}{m+1} c_1^B(M, F_\xi)^2\Bigr) \wedge \frac{(\frac12 d\eta')^{m-2}}{(m-2)!} \wedge \eta' \ge - \int_M \frac{(m-1)(m+2)}{m(m+1)} \bigl((S^T)^2 - (2m(m+1))^2\bigr) \frac{(\frac12 d\eta')^{m}}{m!} \wedge \eta'. \tag{7.6} $$
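Let us recall the Calabi functional on the space $\mathcal S(\xi, \bar J)$, which was introduced by Boyer, Galicki and Simanca in [11],
$$ \begin{aligned} Cal(\xi, \eta', \Phi', g') &= \int_M \bigl(S^T_{d\eta'} - 2m(m+1)\bigr)^2 (d\eta')^m \wedge \eta' \\ &= \int_M \bigl((S^T_{d\eta'})^2 - (2m(m+1))^2\bigr) (d\eta')^m \wedge \eta'. \end{aligned} \tag{7.7} $$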
If $\inf_{\mathcal S(\xi, \bar J)} Cal = 0$, then for arbitrary $\epsilon > 0$ we have a Sasakian structure $(\xi, \eta', \Phi', g') \in \mathcal S(\xi, \bar J)$ such that $Cal(\xi, \eta', \Phi', g') \le \epsilon$. Then, by (7.6), we have
$$ \int_M (2\pi)^2 \Bigl(2c_2^B(M, F_\xi) - \frac{m}{m+1} c_1^B(M, F_\xi)^2\Bigr) \wedge \frac{(\frac12 d\eta')^{m-2}}{(m-2)!} \wedge \eta' \ge - \frac{(m-1)(m+2)}{m(m+1)}\, \epsilon. \tag{7.8} $$
Since $\epsilon$ is arbitrary, (7.8) implies the following theorem.

Theorem 7.3. Let $(M, \xi, \eta, \Phi, g)$ be a compact Sasakian manifold with $[d\eta]_B = \frac{2\pi}{m+1} c_1^B(M, F_\xi)$. If $\inf_{\mathcal S(\xi, \bar J)} Cal = 0$, then we have the following Miyaoka-Yau type inequality:
$$ \int_M \Bigl(2c_2^B(M, F_\xi) - \frac{m}{m+1} c_1^B(M, F_\xi)^2\Bigr) \wedge (d\eta)^{m-2} \wedge \eta \ge 0. \tag{7.9} $$

On the other hand, if $\alpha(\xi, \bar J) = m+1$, then for arbitrary $\epsilon > 0$ we have a Sasakian structure $(\xi, \eta', \Phi', g') \in \mathcal S(\xi, \bar J)$ such that $0 \le S^T \le 2m(m+1+\epsilon)$. By (7.7), we have
$$ Cal(\xi, \eta', \Phi', g') \le 2(2m)^2 (m+1)\epsilon + (2m)^2 \epsilon^2. \tag{7.10} $$
Then, we have the following corollary.

Corollary 7.4. Let $(M, \xi, \eta, \Phi, g)$ be a compact Sasakian manifold with $[d\eta]_B = \frac{2\pi}{m+1} c_1^B(M, F_\xi)$. If $\alpha(\xi, \bar J) = m+1$, then $\inf_{\mathcal S(\xi, \bar J)} Cal = 0$. In particular, we also have the Miyaoka-Yau type inequality (7.9).

As in [3], we have the following proposition.

Proposition 7.5. Let $(M, \xi, \eta, \Phi, g)$ be a compact Sasakian manifold with $[d\eta]_B = \frac{2\pi}{m+1} c_1^B(M, F_\xi)$. If the K-energy functional $V_{d\eta}$ is bounded below on the space $\mathcal H(\xi, \eta, \Phi, g)$, then, for arbitrary $\epsilon > 0$, $M$ admits a Sasakian structure $(\xi, \eta', \Phi', g')$ compatible with $(\xi, \eta, \Phi, g)$ such that $|S^T_{d\eta'} - 2m(m+1)| \le \epsilon$. In particular, $\alpha(\xi, \bar J) = m+1 = \beta(\xi, \bar J)$.

Proof. By Proposition 4.4, there exists a smooth family of solutions $\{\varphi_t\}$ of (1.6) for $t \in (0, 1)$. Let $f(t) = (1-t)(I_{d\eta} - J_{d\eta})(\varphi_t)$; by (4.17), we have
$$ \frac{d}{dt} f(t) + (1-t)^{-1} f(t) = \frac{-1}{2(m+1)} \frac{d}{dt} V_{d\eta}(\varphi_t). \tag{7.11} $$
Since $V_{d\eta}$ is bounded below, the above equality implies that there exists a sequence $t_i \to 1$ such that $f(t_i) \to 0$ as $i \to +\infty$. From (4.6) and (3.8), we have
$$ \|h_{d\eta_t}\|_{C^0} \le \mathrm{Osc}(h_{d\eta_t}) = (1-t)\,\mathrm{Osc}(\varphi_t) \le (1-t)\Bigl((m+1)(I_{d\eta} - J_{d\eta})(\varphi_t) + \frac{C_1(m)}{t} + C_2\Bigr). \tag{7.12} $$
So, there exists a sequence $t_i \to 1$ such that $\|h_{d\eta_{t_i}}\|_{C^0} \to 0$ as $i \to +\infty$. On the other hand, considering
$$ \rho^T_{d\eta_t} = t(m+1)\, d\eta_t + (m+1)(1-t)\, d\eta \ge t(m+1)\, d\eta_t, \tag{7.13} $$
˜ g) for arbitrary > 0, we get a Sasakian structure (ξ, η, ˜ , ˜ ∈ S(ξ, J¯) such that SdTη˜ − 2m(m + 1) ≥ − and h d η˜ C 0 < . Let’s consider the Sasakian-Ricci flow (5.1) with the initial data d η˜ 0 = d η. ˜ Since the initial h d η˜ 0 satisfies h d η˜ 0 C 0 < and 0 h d η˜ 0 ≥ −2, by Lemma 5.1, we have h d η˜ s C 0 < 4e2(m+1) , for s ∈ [0, 2], sup |dh d η˜ s |2s M
< 8e
(7.14)
, for s ∈ [1, 2],
(7.15)
f or s ∈ [0, 2].
(7.16)
4(m+1) 2
and s h d η˜ s ≥ −2e2(m+1) ,
1 , we have From (5.6) and (5.12), setting a = 4m
1 ∂ − s |dh d η˜ s |s + a(s − 1)s h d η˜ s ∂s 4 ≤ (m + 1)(|dh d η˜ s |s + a(s − 1)s h d η˜ s ) + as h d η˜ s −(1 + a(s − 1))|∂ B ∂¯ B h d η˜ |2s s
1 + a(s − 1) s h d η˜ s ), 4m (7.17)
1 where we have used the Cauchy-Schwarz inequality 2 s h ≤ m|∂ B ∂¯ B h|2s . Equivalently, we have
∂ 1 − s e1−s (|dh d η˜ s |s + a(s − 1)s h d η˜ s ) ∂s 4 1 + a(s − 1) (7.18) ≤ e1−s s h d η˜ s a − s h d η˜ s . 4m ≤ (m + 1)(|dh d η˜ s |s + a(s − 1)s h d η˜ s ) + s h d η˜ s (a −
Then, (7.18) implies that e1−s (|dh d η˜ s |s +a(s −1)s h d η˜ s ) ≤ 16e4(m+1) 2 for s ∈ [1, 2]. Otherwise at the point of [1, 2] × M where it fails to hold for the first time 1 < t0 ≤ 2, we have e1−s a(s − 1)s h d η˜ s ≥ 8e4(m+1) 2 , and then s h d η˜ s ≥ 32me4(m+1) . But, from (7.18), we have s h d η˜ s ≤ at the point, which is a contradiction. So, we have s h d η˜ s ≤ 64me4m+5 for s = 2,
(7.19)
|SdTη˜ s − 2m(m + 1)| ≤ 32me4m+5 for s = 2.
(7.20)
and then
Corollary 7.6. Let $(M, \xi, \eta, \Phi, g)$ be a compact Sasakian manifold with $[d\eta]_B = \frac{2\pi}{m+1} c_1^B(M, F_\xi)$. If the K-energy functional $V_{d\eta}$ is bounded below on the space $\mathcal H(\xi, \eta, \Phi, g)$, then we have the Miyaoka-Yau type inequality (7.9).

As an application of Theorem 4.3 and Lemma 7.2, we have the following proposition.
Proposition 7.7. Let $(M, \xi, \eta, \Phi, g)$ be a compact Sasakian manifold with $[d\eta]_B = \frac{2\pi}{m+1} c_1^B(M, F_\xi)$ and
$$ \int_M \Bigl(2c_2^B(M, F_\xi) - \frac{m}{m+1} c_1^B(M, F_\xi)^2\Bigr) \wedge \frac{(\frac12 d\eta)^{m-2}}{(m-2)!} \wedge \eta = 0. \tag{7.21} $$
If $F_{d\eta}$ (or $V_{d\eta}$) is proper in the space $\mathcal H(\xi, \eta, \Phi, g)$, then there must exist a Sasakian metric $(\xi, \eta', \Phi', g') \in \mathcal S(\xi, \bar J)$ with constant curvature 1. Furthermore, if $M$ is simply connected, then $(M, g')$ is isometric to a unit sphere.

Proof. By Theorem 4.3, there exists a Sasakian-Einstein structure $(\xi, \eta', \Phi', g') \in \mathcal S(\xi, \bar J)$. By Lemma 7.2, formula (7.5) and condition (7.21), we have
$$ Q_{i\bar j k\bar l} = R^T_{i\bar j k\bar l} - 2\bigl(g^T_{i\bar j} g^T_{k\bar l} + g^T_{i\bar l} g^T_{k\bar j}\bigr) = 0, \tag{7.22} $$
i.e. $(\xi, \eta', \Phi', g')$ is of constant transverse holomorphic bisectional curvature. On the other hand, using the relation (2.5) between the transverse curvature tensor and the Riemann curvature tensor (or see Proposition 7.2 in [34]), it is not hard to see that the Riemannian manifold $(M, g')$ is of constant curvature 1.

Acknowledgements. The paper was written while the author was visiting McGill University. He would like to thank ZheJiang University for the financial support and McGill University for the hospitality. The author would also like to thank Prof. PengFei Guan and Xiangwen Zhang for their useful discussions and help. Finally, the author wishes to thank the referees for their valuable comments.
References
1. Aubin, T.: Réduction du cas positif de l'équation de Monge-Ampère sur les variétés kählériennes compactes à la démonstration d'une inégalité. J. Funct. Anal. 57, 143–153 (1984)
2. Aubin, T.: Nonlinear analysis on manifolds, Monge-Ampère equation. Berlin-New York: Springer-Verlag, 1982
3. Bando, S.: The K-energy map, almost Einstein Kähler metrics and an inequality of the Miyaoka-Yau type. Tohoku Math. J. 39, 231–235 (1987)
4. Bando, S., Mabuchi, T.: Uniqueness of Einstein-Kähler metrics modulo connected group actions. Algebraic Geometry, Adv. Studies in Pure Math. 10, 11–40 (1987)
5. Boyer, C.P., Galicki, K.: On Sasakian-Einstein geometry. Int. J. Math. 11, 873–909 (2000)
6. Boyer, C.P., Galicki, K.: New Einstein metrics in dimension five. J. Diff. Geom. 57, 443–463 (2001)
7. Boyer, C.P., Galicki, K.: Sasakian geometry, holonomy and supersymmetry. http://arxiv.org/abs/math/0703231v2 [math.DG], 2007
8. Boyer, C.P., Galicki, K.: Sasakian geometry, Oxford Mathematical Monographs. Oxford: Oxford University Press, 2008
9. Boyer, C.P., Galicki, K., Kollár, J.: Einstein metrics on spheres. Ann. of Math. 162, 557–580 (2005)
10. Boyer, C.P., Galicki, K., Matzeu, P.: On Eta-Einstein Sasakian geometry. Commun. Math. Phys. 262, 177–208 (2006)
11. Boyer, C.P., Galicki, K., Simanca, S.R.: Canonical Sasakian metrics. Commun. Math. Phys. 279, 705–733 (2008)
12. Cvetic, M., Lu, H., Page, D.N., Pope, C.N.: New Einstein-Sasaki spaces in five and higher dimensions. Phys. Rev. Lett. 95(7), 071101 (2005)
13. Ding, W.Y.: Remarks on the existence problem of positive Kähler-Einstein metrics. Math. Ann. 282, 463–471 (1988)
14. Ding, W.Y., Tian, G.: Kähler-Einstein metrics and the generalized Futaki invariant. Invent. Math. 110(1), 315–335 (1992)
15. El Kacimi-Alaoui, A.: Opérateurs transversalement elliptiques sur un feuilletage riemannien et applications. Comp. Math. 79, 57–106 (1990)
16. Futaki, A., Ono, H., Wang, G.: Transverse Kähler geometry of Sasaki manifolds and toric Sasaki-Einstein manifolds. J. Diff. Geom. 83, 585–635 (2009)
17. Gauntlett, J.P., Martelli, D., Sparks, J., Waldram, D.: Sasaki-Einstein metrics on $S^2 \times S^3$. Adv. Theor. Math. Phys. 8, 711–734 (2004)
18. Gauntlett, J.P., Martelli, D., Sparks, J., Waldram, D.: A new infinite class of Sasaki-Einstein manifolds. Adv. Theor. Math. Phys. 8, 987–1000 (2004)
19. Gauntlett, J.P., Martelli, D., Sparks, J., Yau, S.T.: Obstructions to the existence of Sasaki-Einstein metrics. Commun. Math. Phys. 273, 803–827 (2007)
20. Guan, P.F., Zhang, X.: Regularity of the geodesic equation in the space of Sasakian metrics. http://arxiv.org/abs/0906.5591v2 [math.DG], 2009
21. Klebanov, I.R., Witten, E.: Superconformal field theory on threebranes at a Calabi-Yau singularity. Nucl. Phys. B 536, 199–218 (1999)
22. Mabuchi, T.: K-energy maps integrating Futaki invariants. Tohoku Math. J. 38(4), 575–593 (1986)
23. Maldacena, J.: The large N limit of superconformal field theories and supergravity. Adv. Theor. Math. Phys. 2, 231–252 (1998)
24. Martelli, D., Sparks, J.: Toric Sasaki-Einstein metrics on $S^2 \times S^3$. Phys. Lett. B 621, 208–212 (2005)
25. Martelli, D., Sparks, J.: Toric geometry, Sasaki-Einstein manifolds and a new infinite class of AdS/CFT duals. Commun. Math. Phys. 262, 51–89 (2006)
26. Martelli, D., Sparks, J., Yau, S.T.: The geometric dual of a-maximisation for toric Sasaki-Einstein manifolds. Commun. Math. Phys. 268, 39–65 (2006)
27. Martelli, D., Sparks, J., Yau, S.T.: Sasaki-Einstein manifolds and volume minimisation. Commun. Math. Phys. 280(3), 611–673 (2008)
28. Nitta, Y.: A diameter bound for Sasaki manifolds with application to uniqueness for Sasaki-Einstein structure. http://arxiv.org/abs/0906.0170v1 [math.DG], 2009
29. Phong, D.H., Song, J., Sturm, J., Weinkove, B.: The Moser-Trudinger inequality on Kähler-Einstein manifolds. Amer. J. Math. 130, 1067–1085 (2008)
30. Reinhart, B.L.: Harmonic integrals on foliated manifolds. Amer. J. Math. 81, 529–536 (1959)
31. Sekiya, K.: On the uniqueness of Sasaki-Einstein metrics. http://arxiv.org/abs/0906.2665v1 [math.DG], 2009
32. Smoczyk, K., Wang, G., Zhang, Y.: On a Sasakian-Ricci flow. Internat. J. Math. 21(7), 951–969 (2010)
33. Tanno, S.: The topology of contact Riemannian manifolds. Illinois J. Math. 12, 700–717 (1968)
34. Tanno, S., Baik, Y.B.: φ-holomorphic special bisectional curvature. Tohoku Math. J. 22(2), 184–190 (1970)
35. Tian, G.: On Kähler-Einstein metrics on certain Kähler manifolds with $C_1(M) > 0$. Invent. Math. 89, 225–246 (1987)
36. Tian, G.: Kähler-Einstein metrics with positive scalar curvature. Invent. Math. 137, 1–37 (1997)
37. Tondeur, P.: Geometry of foliations. Monographs in Mathematics, Vol. 90, Basel: Birkhäuser Verlag, 1997
38. Yau, S.T.: On the Ricci curvature of a compact Kähler manifold and the complex Monge-Ampère equation. Comm. Pure Appl. Math. 31, 339–441 (1978)
39. Zhang, X.: A note of Sasakian metrics with constant scalar curvature. J. Math. Phys. 50(10), 103505 (2009)
40. Zhang, X.: Some invariants in Sasakian Geometry. International Mathematics Research Notices, rnq219, 33 pages. doi:10.1093/imrn/rnq219, October 2010

Communicated by P.T. Chruściel
Commun. Math. Phys. 306, 261–290 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1288-8
Communications in
Mathematical Physics
Burning Cars in a Parking Lot

Jean Bertoin

Laboratoire de Probabilités et Modèles Aléatoires, UPMC, Case courrier 188, 4 Place Jussieu, 75252 Paris, France. E-mail:
[email protected] Received: 23 July 2010 / Accepted: 15 February 2011 Published online: 21 June 2011 – © Springer-Verlag 2011
Abstract: Knuth's parking scheme is a model in computer science for hashing with linear probing. One may imagine a circular parking lot with $n$ sites; cars arrive at each site with unit rate. When a car arrives at a vacant site, it parks there; otherwise it turns clockwise and parks at the first vacant site which is found. We incorporate fires into this model by throwing Molotov cocktails on each site at a smaller rate $n^{-\alpha}$, where $0 < \alpha < 1$ is a fixed parameter. When a car is hit by a Molotov cocktail, it burns and the fire propagates to the entire occupied interval which turns vacant. We show that with high probability when $n \to \infty$, the parking lot becomes saturated at a time close to 1 (i.e. as in the absence of fire) for $\alpha > 2/3$, whereas for $\alpha < 2/3$, the average occupation approaches 1 at time 1 but then quickly drops to 0 before the parking lot is ever saturated. Our study relies on asymptotics for the occupation of the parking lot without fires in certain regimes which may be of independent interest.

1. Introduction

The purpose of this work is to point out a phase transition for a random evolution which combines the dynamics of two different models in statistical physics and computer science, namely forest fires and parking schemes. Its motivation partly stems from an interesting paper by Ráth and Tóth [17] in which the authors introduce forest fires in the Erdős-Rényi random graph model. It is well-known that the random graph model is closely related to the multiplicative coalescent [1], and the incorporation of fires causes random shattering of large components. On the other hand, it is also known that the parking scheme bears close connexions to the additive coalescent [7,12] and it seems therefore natural to investigate the effects of random shattering in this framework.

Forest fires have been introduced by Drossel and Schwabl [13], see also the review [18]; they are prototypes of systems displaying self-organized criticality. Typically, imagine a lattice where each site is either vacant or occupied by a tree; the connected components of occupied sites are viewed as forests. Each vacant site becomes occupied
Fig. 1. A car arrives at an occupied site j and parks at the first vacant site d( j)
at some fixed rate, independently of the others. One may think of seeds being sown uniformly on the lattice; a tree grows each time a seed falls on a vacant site and seeds falling on a site that is already occupied are discarded. Furthermore, each tree can also be hit by lightning at some rate. Then the tree burns and the fire propagates to the entire forest, i.e. any site that can be connected by a path consisting only of occupied sites to the site hit by lightning becomes instantaneously vacant. So, roughly speaking, forests may grow by addition of trees on their boundaries or coalescence when a vacant site separating two forests turns occupied, and disappear when hit by a lightning. The evolution of the system thus results from two opposite trends, growth/destruction, which should be viewed as the source of self-organization. Although forest fires have raised considerable interest, notably in the physical literature, mathematical papers on these models are still rather scarce; see e.g. [2,3,8,10, 14,15]. The recent paper [9] by Bressaud and Fournier deals with the more general situation where seeds and lightning occur according to stationary renewal processes, and contains in particular a detailed and well documented introduction to the subject. The work by Ráth and Tóth [17] which we already mentioned is somewhat special because the underlying lattice is the complete graph, and basically this circumvents geometrical difficulties. The study of asymptotics when the rate of lightning tends to 0 is especially challenging since, informally, a low rate of lightning should enable forests to grow larger although the larger a forest grows, the more inflammable it becomes. On the other hand, Knuth’s parking lot is a simple model in computer science for hashing with linear probing; we refer French readers to the very nice survey by Chassaing and Flajolet [11]. Think of Z/nZ as a circular parking lot where initially all sites are vacant. 
Cars arrive one after the other, uniformly at random. If a car arrives at a site j ∈ Z/nZ which is vacant, then the car parks at j and this site becomes occupied. Otherwise the car tries to park successively at sites j + 1, j + 2, . . . until it finally finds a vacant site, see Fig. 1 above. In this work, we incorporate fires into Knuth’s parking model. We may imagine that Molotov cocktails are thrown on the parking lot, and when a Molotov cocktail falls on an occupied site, the car parked there burns, the fire propagates to the neighboring cars, and the entire connected component of occupied sites becomes instantaneously vacant. See Fig. 2 below.
Fig. 2. A Molotov cocktail is thrown on an occupied arc which then becomes entirely vacant
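The dynamics described above can be sketched in a few lines of Python. This is only an illustrative simulation under stated assumptions, not code from the paper: the function names `park`, `burn` and `simulate` are hypothetical, and the rates (one car per site per unit time, one Molotov cocktail per site at rate $n^{-\alpha}$) are those given in the abstract.

```python
import random

def park(occ, j):
    """Park a car arriving at site j at the first vacant site clockwise from j.

    Returns the site used, or None if the lot is saturated (the car is discarded).
    """
    n = len(occ)
    for k in range(n):
        s = (j + k) % n
        if not occ[s]:
            occ[s] = True
            return s
    return None

def burn(occ, j):
    """Throw a Molotov cocktail on site j and return the number of cars burnt.

    The whole occupied arc containing j becomes vacant; nothing happens if j is vacant.
    """
    n = len(occ)
    if not occ[j]:
        return 0
    burnt = 0
    s = j
    while burnt < n and occ[s]:      # fire spreads clockwise (site j included)
        occ[s] = False
        burnt += 1
        s = (s + 1) % n
    s = (j - 1) % n
    while burnt < n and occ[s]:      # fire spreads counter-clockwise
        occ[s] = False
        burnt += 1
        s = (s - 1) % n
    return burnt

def simulate(n, alpha, t_max, seed=0):
    """Return the list of (time, number of occupied sites) after each event."""
    rng = random.Random(seed)
    occ = [False] * n
    count = 0
    car_rate = float(n)              # unit rate per site
    fire_rate = n * n ** (-alpha)    # rate n^(-alpha) per site
    total = car_rate + fire_rate
    t, history = 0.0, []
    while True:
        t += rng.expovariate(total)  # superposition of the two Poisson clocks
        if t > t_max:
            return history
        j = rng.randrange(n)
        if rng.random() < car_rate / total:
            if park(occ, j) is not None:
                count += 1
        else:
            count -= burn(occ, j)
        history.append((t, count))
```

For instance, `simulate(2000, 0.5, 1.5)` and `simulate(2000, 0.9, 1.5)` produce trajectories that can be compared on either side of the exponent 2/3; a single run at moderate $n$ is of course only suggestive.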
A fundamental difference with the forest fire model on $\mathbb Z/n\mathbb Z$ is that a car may occupy a site different from that at which it arrived, whereas trees only grow on sites on which a seed falls. Empty blocks clearly have an important role in the dynamics as, roughly speaking, they form barriers that prevent the propagation of fires. In particular, this ensures the independence of the evolutions of regions separated by such barriers. In the case of forest fires, a barrier remains effective until a seed has fallen on every single site of that block, which typically takes a long time when the block is large. However, the occupation of vacant intervals in a parking lot may occur much quicker when, loosely speaking, the portion of the parking at the left of this interval is already densely occupied. Thus the parking model involves a stronger spatial dependency which makes the study even more tedious.

In this work, we assume that cars arrive independently on each site of $\mathbb Z/n\mathbb Z$ at a unit rate whereas Molotov cocktails hit each site at rate $n^{-\alpha}$ for some parameter $\alpha \in (0, 1)$, independently of car arrivals. We are interested in saturation, i.e. when the parking lot is entirely occupied; note that saturation times are renewal epochs for the dynamics. Observe that when no Molotov cocktails are thrown, the parking lot becomes saturated when $n$ cars have arrived, and thus at a time close to 1. We are interested in the question of whether Molotov cocktails have a significant impact on the saturation of the parking lot. Our main result shows that $\alpha = 2/3$ is the critical exponent. More precisely, if $\alpha > 2/3$, then with high probability¹ the parking lot becomes saturated at a time close to 1, while if $\alpha < 2/3$, even though the average occupation of the parking lot approaches 1 as time tends to $1-$, it then quickly drops to 0 before the parking lot is completely saturated.

¹ We say that an event depending on $n$ holds with high probability if its probability tends to 1 as $n \to \infty$. Similarly, we say that a random variable depending on $n$ is close to $c$ (respectively, is small) with high probability if it converges in probability to $c$ (respectively, to 0) as $n \to \infty$.

Let us explain intuitively this phase transition. We will establish in Lemma 8 that the mean of the total number of cars which have been burnt until the arrival of the $m$-th car with $m < n$ can be bounded from above by $cn^{2-\alpha}/(n-m)$. When $\alpha > 2/3$, we may choose $m \approx n - n^{\beta}$ for some $2/3 < \beta < \alpha$, so that $n^{2-\alpha}/(n-m) \ll n^{\beta}$. During approximately $2n^{\beta-1}$ units of time, $2n^{\beta}$ new cars arrive in the parking lot while the probability that a Molotov cocktail is thrown is only of order $n^{1-\alpha} \times 2n^{\beta-1} \ll 1$. This suffices to saturate the parking before the arrival of the $(n + n^{\beta})$-th car.

This simple argument breaks down for $\alpha < 2/3$. Indeed, the time interval between two consecutive Molotov cocktails is of order $n^{\alpha-1}$, during which about $n^{\alpha}$ new cars arrive. The upper bound for the mean of the number of cars which have burnt only ensures that the latter is less than $n^{\alpha}$ until the arrival of the $m$-th car for $n - m \ge n^{2(1-\alpha)}$. Now the time needed for $n^{2(1-\alpha)}$ new cars to arrive is of order $n^{1-2\alpha} \gg n^{\alpha-1}$ and thus further cars will be burnt during this time-interval. This is of course insufficient to establish the phase transition, and the case $\alpha < 2/3$ is more complex. Roughly speaking, the crucial step consists in establishing the existence of some $\beta > \alpha$ and of a small time window before 1 during which mesoscopic occupied intervals of size of order $n^{\beta}$ are formed. In particular, the average occupation of the parking lot is close to 1 at such times. Since $n^{\beta-\alpha} \gg 1$, with high probability such mesoscopic intervals are entirely destroyed by a Molotov cocktail shortly after they have been formed, which causes the average occupation to quickly drop to 0. More precisely, during this short time interval, destruction of cars by fires significantly exceeds the arrivals of new cars, and thus prevents the saturation of the parking.

An obvious difficulty in proving rigorously these statements, and in particular in finding the critical parameter, lies in the fact that most of the relevant phenomena occur during a very short time when mesoscopic occupied intervals are formed before being quickly destroyed by fires. Our approach to investigate their formation consists in establishing that for certain times close to 1, fires still have essentially a negligible impact on the occupation of the parking lot.
In particular, we are led to analyze asymptotics of the occupation process for parking schemes without fires in intermediate regimes which do not seem to have been considered previously in the literature.

The plan of the rest of this work is as follows. The next section is devoted to preliminary observations in the deterministic setting. In particular we shall see that the occupation of the parking lot can be described in terms of a quasi-periodic path constructed from the processes of cars and Molotov cocktails. This will enable us to relate different parkings based on comparable dynamics. Section 3 deals with key asymptotic estimates for parkings without fires. Specifically, we shall establish limit theorems for the empirical measure of the sizes of occupied intervals in a parking lot of size $n$ when $m < n$ cars have arrived, in certain regimes when $n, m \to \infty$. Our main result is stated and proved in Sect. 4. The argument relies on an intermediate claim which, roughly speaking, states that the impact of fires is essentially negligible until the arrival of the $m$-th car provided that $m$ is not too close to $n$ (of course it is crucial to be able to let $m$ be as large as possible). In other words, asymptotics for the occupation of the parking lot are then the same as for the dynamics without fires which have been studied in the preceding section. The proof of this technical result is presented in Sect. 5.

2. Preliminaries in the Deterministic Setting

2.1. Analysis of the model in terms of paths. For $n \in \mathbb N$ fixed, a configuration for a parking lot with $n$ sites is a map $\omega : \mathbb Z/n\mathbb Z \to \{0, 1\}$; the site $j$ is vacant if $\omega(j) = 0$, and is occupied otherwise. We write $\mathbf 0$ for the configuration when the parking lot is totally vacant (i.e. for the map $\omega(j) \equiv 0$) and $\mathbf 1$ for the configuration corresponding to saturation. The support of a configuration corresponds to the set of occupied sites; its connected components are called occupied intervals, or sometimes arcs.
We represent car arrivals by a point process $(C_t, t \ge 0)$ on $\mathbb Z/n\mathbb Z$, where $C_t = j$ if a car arrives at time $t$ on the site $j$ and $C_t = \emptyset$ if no car arrives at time $t$. Similarly, Molotov cocktails form another point process $(M_t, t \ge 0)$ on $\mathbb Z/n\mathbb Z$; we implicitly assume that times when Molotov cocktails are thrown never coincide with the arrival time of a car. The occupation of the parking lot is given by a process $(\Omega_t, t \ge 0)$ with values in the space of configurations which is constructed from these two point processes as follows. We assume that the parking lot is completely vacant at the initial time, viz. $\Omega_0 = \mathbf 0$. The occupation remains unchanged on every time-interval during which no car arrives and no Molotov cocktail is thrown.

Suppose first that a car arrives at time $t > 0$ on the site $j$, i.e. $C_t = j$. If the parking is already saturated, i.e. $\Omega_{t-} = \mathbf 1$, then we decide that $\Omega_t = \mathbf 1$. Otherwise we denote² by $D_{t-}(j)$ the closest vacant site of $\Omega_{t-}$ to the right of $j$ for the cyclic order (if $j$ is vacant at time $t-$, then $D_{t-}(j) = j$), and we set
$$ \Omega_t = \Omega_{t-} + \mathbf 1_{\{D_{t-}(j)\}}. $$
On the other hand, suppose that a Molotov cocktail is thrown at time $t$ on the site $j$, i.e. $M_t = j$. If the parking is saturated, i.e. $\Omega_{t-} = \mathbf 1$, then it instantaneously becomes vacant, i.e. $\Omega_t = \mathbf 0$. This should be viewed as a renewal time for the occupation process. If $j$ is already vacant, i.e. $\Omega_{t-}(j) = 0$, then $\Omega_t = \Omega_{t-}$. Finally, if $j$ is occupied but the parking is not saturated, then we write $G_{t-}(j)$ and $D_{t-}(j)$ for the first vacant sites at the left and at the right of $j$ and set
$$ \Omega_t = \Omega_{t-} - \mathbf 1_{]G_{t-}(j), D_{t-}(j)[}. $$

² The notation D and G refer to droite (right) and gauche (left) in French; they are commonly used when dealing with extremities of components of random subsets of the real line.

A key tool for the analysis is that the occupation of the parking lot at some time $t$ can be conveniently described in terms of a quasi-periodic path. This is a simple observation for parking without fires (see for instance Chassaing and Louchard [12]), which is easily extended in the present setting. Specifically, denote for every $j \in \mathbb Z$ by $\xi_t(j)$ the number of cars that have arrived before time $t$ at the site $j$ [mod $n$] and are parked at time $t$, either at $j$ or further away on the parking lot. We stress that we do not take into account cars which have been burnt before time $t$ nor those which have arrived at a time when the parking was already saturated. Define a path $S_t : \mathbb Z \to \mathbb Z$ by $S_t(0) = 0$ and
$$ S_t(j) - S_t(j-1) = \xi_t(j) - 1, \qquad \forall j \in \mathbb Z. \tag{1} $$
Clearly $S_t$ is quasi-periodic, in the sense that $S_t(j+n) = S_t(j) + S_t(n)$ for any $j \in \mathbb Z$. Note that the number of vacant sites on the parking lot is given by $S_t(n)^-$, the negative part of $S_t(n)$; in particular the parking is saturated if and only if $S_t(n) = 0$. The running minimum
$$ \underline S_t(k) = \min_{j \le k} S_t(j), \qquad k \in \mathbb Z, $$
is constant when $S_t(n) = 0$, and otherwise is a non-degenerate quasi-periodic path. We now make the key observation that a site $j$ [mod $n$] of the parking lot is vacant at time $t$ if and only if the path $S_t$ reaches a new minimum at $j$.
Lemma 1. In the notation above, we have for any $j \in \mathbb Z$,
$$ \Omega_t(j\ [\mathrm{mod}\ n]) = 0 \iff S_t(j) < \underline S_t(j-1). $$
As a consequence, provided that the parking is not saturated (i.e. if $S_t(n) < 0$), the interval of occupied sites at time $t$ that contains 0 is given by $]G_t, D_t[$ with
$$ G_t = \max\{j \le 0 : \underline S_t(j-1) > \underline S_t(0)\} \quad \text{and} \quad D_t = \min\{j \ge 0 : \underline S_t(j) < \underline S_t(-1)\}. $$
Note that $G_t = D_t = 0$ (and thus $]G_t, D_t[ = \emptyset$) if and only if the site 0 is vacant at time $t$.

Proof. We shall check the first statement by induction. Denote by $0 < t_1 < t_2 < \cdots$ the sequence of times at which either a car arrives or a Molotov cocktail is thrown. The claim is obvious for $t = t_1$; assume that it holds for $t = t_m$ for some $m \in \mathbb N$.

Consider first the case when a Molotov cocktail is thrown at time $t_{m+1}$. Plainly our claim still holds at time $t_{m+1}$ when either the parking is saturated at time $t_m$ or the Molotov cocktail falls on a vacant site. So assume that it falls on some occupied interval $I \ne \mathbb Z/n\mathbb Z$ and denote by $g$ and $d$ the first vacant sites at the left and the right of $I$, i.e. $I = ]g, d[$. By construction, we have $\xi_{t_{m+1}}(j) = 0$ if $j$ [mod $n$] $\in I$ and $\xi_{t_{m+1}}(j) = \xi_{t_m}(j)$ otherwise. Thus the path $S_{t_{m+1}}$ is derived from $S_{t_m}$ by replacing all the steps on $I$ [mod $n$] by $-1$ and leaving the others unchanged. As the sites $g$ and $d$ are vacant at time $t_m$, our induction assumption ensures that $S_{t_m}(g) < \underline S_{t_m}(g-1)$ and $S_{t_m}(d) < \underline S_{t_m}(d-1)$, from which we readily conclude that the claim still holds at time $t_{m+1}$.

Next, consider the case when a car arrives at time $t_{m+1}$, say on the site $j$. If the parking is already saturated at time $t_m$, then things are obvious. Else, we have $\xi_{t_{m+1}}(i) = \xi_{t_m}(i)$ for $i \ne j$ [mod $n$] and $\xi_{t_{m+1}}(i) = \xi_{t_m}(i) + 1$ otherwise. We realize after a moment of thought that this change in the path corresponds precisely to parking the car that has just arrived at $j$ on the closest vacant site at its right, which is our first claim.

It follows that whenever the parking is not saturated at time $t$, the first vacant sites at the left and at the right of 0 can be expressed respectively as
$$ G_t = \max\{j \le 0 : \underline S_t(j-1) > S_t(j)\} \quad \text{and} \quad D_t = \min\{j \ge 0 : S_t(j) < \underline S_t(j-1)\}. $$
It is easily checked that these quantities can be re-expressed as in the statement.
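The first statement of Lemma 1 lends itself to a numerical sanity check. The sketch below is an illustration rather than anything taken from the paper: the names `simulate` and `vacant_sites_from_path` are hypothetical, events are supplied as an explicit list instead of Poisson clocks, and the running minimum over all indices below $j$ is computed on a window of $n$ consecutive indices, which suffices because quasi-periodicity with $S_t(n) \le 0$ gives $S_t(i-n) \ge S_t(i)$.

```python
import random

def simulate(n, events):
    """Run the dynamics on Z/nZ; events are ('car', j) or ('fire', j).

    Returns (occ, xi): occ[j] is 1 if site j is occupied, and xi[j] counts the
    cars that arrived at j and are still parked (at j or further clockwise).
    """
    occ, xi = [0] * n, [0] * n
    for kind, j in events:
        saturated = all(occ)
        if kind == "car":
            if saturated:
                continue              # arrivals at a saturated lot are not counted
            k = j
            while occ[k]:             # first vacant site clockwise from j
                k = (k + 1) % n
            occ[k] = 1
            xi[j] += 1
        elif occ[j]:
            if saturated:             # renewal: the whole lot burns
                occ, xi = [0] * n, [0] * n
                continue
            k = j
            while occ[k]:             # burn clockwise from j (j included)
                occ[k], xi[k] = 0, 0
                k = (k + 1) % n
            k = (j - 1) % n
            while occ[k]:             # burn counter-clockwise
                occ[k], xi[k] = 0, 0
                k = (k - 1) % n
    return occ, xi

def vacant_sites_from_path(n, xi):
    """Sites j in {0, ..., n-1} at which the path S reaches a new minimum."""
    S = {0: 0}
    for j in range(1, n):                 # S(j) - S(j-1) = xi(j) - 1
        S[j] = S[j - 1] + xi[j % n] - 1
    for j in range(-1, -n - 1, -1):       # extend the path to negative indices
        S[j] = S[j + 1] - (xi[(j + 1) % n] - 1)
    return [j for j in range(n)
            if S[j] < min(S[i] for i in range(j - n, j))]
```

On any list of events, `vacant_sites_from_path(n, xi)` should coincide with the list of sites `j` for which `occ[j] == 0`, where `occ, xi = simulate(n, events)`.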
2.2. Comparison with simpler parkings. An important source of difficulties in the study of fire models is the lack of monotonicity of the dynamics. In the present case, it is not true in general that adding a few cars or suppressing some Molotov cocktails would increase the occupation of the parking lot. For instance, adding one car may induce the merging of two neighboring occupied intervals into a larger one which can then be entirely destroyed by a single Molotov cocktail, while the latter would have shattered only one of the two genuine intervals if no car had parked in between to connect them.
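The lack of monotonicity mentioned above can be made concrete on a lot with six sites. The following sketch is only an illustration with hypothetical helper names (`run`, `base`, `extra`); it replays the scenario of the paragraph, where one additional car connects two occupied intervals before a single Molotov cocktail is thrown.

```python
def run(n, events):
    """Replay ('car', j) / ('fire', j) events on Z/nZ; return the occupied sites."""
    occ = [False] * n
    for kind, j in events:
        if kind == "car":
            if all(occ):
                continue                  # saturated lot: the car is discarded
            while occ[j]:                 # first vacant site clockwise from j
                j = (j + 1) % n
            occ[j] = True
        elif occ[j]:
            if all(occ):
                occ = [False] * n
                continue
            k = j
            while occ[k]:                 # burn clockwise (j included)
                occ[k] = False
                k = (k + 1) % n
            k = (j - 1) % n
            while occ[k]:                 # burn counter-clockwise
                occ[k] = False
                k = (k - 1) % n
    return {j for j in range(n) if occ[j]}

base = [("car", 0), ("car", 2), ("fire", 0)]
extra = [("car", 0), ("car", 2), ("car", 1), ("fire", 0)]

print(run(6, base))    # {2}: the fire only destroys the interval {0}
print(run(6, extra))   # set(): the extra car merged {0} and {2}, and all three cars burn
```

Adding the car at site 1 therefore leaves the lot with fewer occupied sites after the fire, which is exactly the failure of monotonicity described in the text.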
A few elementary and intuitively obvious comparisons are nonetheless possible and will be useful in the present study. More precisely, we shall first compare the occupation process $\Theta$ with that of the parking lot driven by the same point process of car arrivals $C = (C_t, t \ge 0)$ but without fires, that is for $M_t \equiv \emptyset$. The latter occupation process will be denoted by $\Theta' = (\Theta'_t, t \ge 0)$, and more generally a prime in the notation such as $X' = X(C, \emptyset)$ will be used for the analogue of $X = X(C, M)$ when Molotov cocktails are suppressed. For instance, for every $j \in \mathbb{Z}$ and $t \ge 0$, $\xi'_t(j)$ denotes the number of cars that have arrived at the site $j$ [mod $n$] before time $t \wedge T'$, where $T' = \inf\{s \ge 0 : \Theta'_s \equiv 1\}$ denotes the saturation time (i.e. the arrival time of the $n$-th car). For $t \le T'$, the nonnegative quantity $\delta_t(j) = \xi'_t(j) - \xi_t(j)$ represents the number of cars that arrived at $j$ [mod $n$] before time $t$ and have been burnt in the dynamics with fires; this quantity will play an important role in the sequel. The following observation should be intuitively obvious.

Lemma 2. For every $t \ge 0$, each occupied interval of $\Theta_t$ is contained in some occupied interval of $\Theta'_t$.

Proof. With no loss of generality, we may assume that the parking $\Theta'_t$ is not yet saturated at time $t$, as otherwise the statement is plain. If we define $\Delta_t(0) = 0$ and
\[ \Delta_t(j) - \Delta_t(j-1) = \delta_t(j) \qquad \text{for every } j \in \mathbb{Z}, \]
then $\Delta_t : \mathbb{Z} \to \mathbb{Z}$ is a non-decreasing quasi-periodic path and, in the obvious notation, $S_t = S'_t - \Delta_t$. This implies that, if the path $S'_t$ reaches a new minimum at $j$, then so does $S_t$. By Lemma 1, this shows that $\Theta_t \le \Theta'_t$, which in turn entails our claim.

We shall also need a lower bound for occupied intervals. Consider a time $t < T'$ at which the parking without fires $\Theta'_t$ is not yet saturated. Recall that $S'_t$ denotes the quasi-periodic path defined by (1) for $M \equiv \emptyset$ and $\underline{S}'_t$ its running minimum.

Lemma 3. In the notation above, consider first the dynamics without fires. Let $I$ be an occupied interval at time $t$ and $[j_0, j_1] \subseteq I$ some arc included in $I$. Next consider the dynamics with fires and suppose that the total number of cars that have arrived on $I$ before time $t$ and have been burnt is strictly less than the minimum of $S'_t - \underline{S}'_t$ on $[j_0, j_1]$, i.e.
\[ \sum_{j \in I} \delta_t(j) < \min_{j_0 \le j \le j_1} \big(S'_t(j) - \underline{S}'_t(j)\big). \]
Then the entire arc $[j_0, j_1]$ is occupied at time $t$ for the dynamics with fires, i.e. $\Theta_t(j) = 1$ for all $j \in [j_0, j_1]$.

Proof. We first provide an intuitive argument. Let $g'$ and $d'$ be the first vacant sites at the left and right of $I$ in the dynamics without fires, i.e. $I = ]g', d'[$, and consider $j \in [j_0, j_1]$. In the dynamics without fires, at least $k := S'_t(j) - \underline{S}'_t(j) - 1$ cars that have arrived to $]g', j]$ could not park on $]g', j]$, so even with fires at least one car visits the site $j$ in search of a parking place and $j$ is going to be occupied.

We use the same notation as in the proof of Lemma 2. In particular, the assumption in the statement reads
\[ \Delta_t(d'-1) - \Delta_t(g') = \sum_{j \in I} \delta_t(j) < \min_{j_0 \le j \le j_1} \big(S'_t(j) - \underline{S}'_t(j)\big). \]
J. Bertoin
By Lemma 1, we have $S'_t(g') = \underline{S}'_t(g') = \underline{S}'_t(d'-1)$. We know from Lemma 2 that the site $g'$ is vacant for the dynamics with fires as well, and thus we also have $S_t(g') = \underline{S}_t(g')$. This yields for every $j \in [j_0, j_1]$,
\begin{align*}
S_t(j) - \underline{S}_t(j-1) &\ge S_t(j) - \underline{S}_t(g') \\
&= S_t(j) - S_t(g') \\
&= S'_t(j) - S'_t(g') - \big(\Delta_t(j) - \Delta_t(g')\big) \\
&> S'_t(j) - \underline{S}'_t(g') - \min_{j_0 \le i \le j_1} \big(S'_t(i) - \underline{S}'_t(i)\big).
\end{align*}
Because $\underline{S}'_t(i) = \underline{S}'_t(g')$ for every $i \in I$ and a fortiori for $j_0 \le i \le j_1$, we have
\[ \underline{S}'_t(g') + \min_{j_0 \le i \le j_1} \big(S'_t(i) - \underline{S}'_t(i)\big) = \min_{j_0 \le i \le j_1} S'_t(i). \]
We conclude that
\[ S_t(j) - \underline{S}_t(j-1) > S'_t(j) - \min_{j_0 \le i \le j_1} S'_t(i) \ge 0 \]
and according to Lemma 1, the site $j$ is thus occupied in the dynamics with fires.
We stress that Molotov cocktails which are thrown outside $I$ have no impact on the occupation inside $I$, which is intuitively obvious. Indeed the left and right extremities $g'$ and $d'$ of $I$ are vacant in the dynamics with fires until time $t$. They thus serve as barriers which stop the propagation of fires started outside $I$.

We will also need lower bounds for the number of cars which are burnt, which will be achieved by comparison with parkings where car arrivals are stopped after a certain time. More precisely, let $0 < s < t$ be two fixed times. Imagine that we stop the arrival of cars after time $s$, leaving the point process of Molotov cocktails $M$ unchanged. In other words we work with the car arrival process $\widehat{C} = (\widehat{C}_u, u \ge 0)$ defined by $\widehat{C}_u = C_u$ if $u \le s$ and $\widehat{C}_u = \emptyset$ if $u > s$. We denote by $B_{s,t}$ the total number of cars which are burnt during the time-interval $(s, t]$ in the original dynamics, i.e.
\[ B_{s,t} = \sum_{j \in \mathbb{Z}/n\mathbb{Z}} \big(\delta_t(j) - \delta_s(j)\big), \]
and by $\widehat{B}_{s,t}$ the analogous quantity for the dynamics in which no cars arrive after time $s$. The following inequality should be intuitively obvious.

Lemma 4. In the notation above, we have $\widehat{B}_{s,t} \le B_{s,t}$.
Proof. Consider a car parked at time $s$ at some site $j \in \mathbb{Z}/n\mathbb{Z}$ and let $I$ denote the occupied interval that contains $j$. If no Molotov cocktails are thrown on $I$ during the time-interval $(s, t]$, then this car will not burn before time $t$ for the dynamics where car arrivals are stopped after time $s$, i.e. this car does not contribute to $\widehat{B}_{s,t}$. Otherwise, set $\tilde{t} = \min\{u \in (s, t] : M_u \in I\}$, and follow the evolution of the occupied interval containing $I$ in the original dynamics during the time-interval $(s, \tilde{t}]$. If no car parks at the boundary of $I$ before $\tilde{t}$, then this interval remains unchanged until time $\tilde{t}$, at which it is entirely destroyed. Else, let $\tilde{s} < \tilde{t}$ be the first instant after $s$ at which a car parks on the
Burning Cars in a Parking Lot
boundary of $I$. At this instant, the occupied interval containing $I$ increases; denote it by $\tilde{I} \supset I$. Clearly $\min\{u \in (\tilde{s}, t] : M_u \in \tilde{I}\} \le \tilde{t}$ and by induction we see that all the cars that occupied the interval $\tilde{I}$ at time $s$ have been burnt at time $\tilde{t}$. In other words, every car which is burned during the time-interval $(s, t]$ in the dynamics where the car arrivals are stopped after time $s$ is also burned in the original dynamics.
3. Asymptotics in Absence of Fires

In this section, we consider the situation when cars arrive independently at unit rate on each site and no Molotov cocktails are thrown. That is, we assume that $(C_t, t \ge 0)$ is a Poisson point process on $\mathbb{Z}/n\mathbb{Z}$ with unit intensity per site and unit of time, and $M \equiv \emptyset$. Our goal is to get sharp information about the asymptotic behavior of occupied intervals when the size of the parking $n \to \infty$ and at a time close to saturation.

Such questions have been considered previously by several authors. In particular, Pittel [16] observed that in the regime when the number $k$ of vacant sites fulfills $k \sim cn$ for some $0 < c < 1$, the size of the largest occupied interval is of order $\ln n$ (we stress that Pittel established a much sharper result). Later on, Chassaing and Louchard [12] proved that a phase transition occurs in the regime when $k \sim c\sqrt{n}$; more precisely, macroscopic occupied intervals of size of order $n$ appear precisely in this regime. However these results are not sufficient for our purpose, as we will need information for intermediate regimes, typically when the number $k$ of vacant sites is such that $\sqrt{n} \ll k \ll n$.

Informally, we may expect the vacant sites to be roughly uniformly spread on the parking; therefore the length of a typical occupied interval should be of order $n/k$. However this informal analysis may be misleading. Indeed, the distribution of the length of the occupied interval that contains a typical site (say, 0) is related to that of an occupied interval chosen uniformly at random, say $L'$ (remember that primes refer to the dynamics with $M \equiv \emptyset$), by a size-biased transformation. It turns out that when $\sqrt{n} \ll k \ll n$, the variance of $L'$ is much larger than its mean. As a matter of fact, we will see that the length of the occupied interval that contains a typical site is rather of order $(n/k)^2$.

To keep track of the size of the parking lot, we introduce an additional index $n$ in the notation. For every integer $m$, denote the arrival time of the $m$-th car by
\[ \Gamma_{n,m} = \inf\{t \ge 0 : \#C_{n,t} = m\} \qquad \text{with} \qquad \#C_{n,t} = \sum_{0 \le s \le t} \mathbf{1}_{\{C_{n,s} \ne \emptyset\}}; \]
in particular the increments $\Gamma_{n,m+1} - \Gamma_{n,m}$ are i.i.d. exponential variables with mean $1/n$. We also write $(S'_{n,m}(j), j \in \mathbb{Z})$ for the quasi-periodic path defined in (1) for the time $t = \Gamma_{n,m}$.

In our analysis, we shall repeatedly use the following description of the law of $S'_{n,m}$. Let $\xi'_{n,m}(j)$ denote the total number of cars that have arrived at $j$ [mod $n$] until the arrival of the $m$-th car, so the $n$-tuple $(\xi'_{n,m}(j), 0 \le j \le n-1)$ has the multinomial distribution with parameters $m$ and $(1/n, \ldots, 1/n)$. It is well-known that the latter also arises as the law of an $n$-tuple of i.i.d. Poisson variables with parameter $\lambda > 0$ and conditioned to have total sum equal to $m$; note that this does not depend on the particular value of $\lambda$. Hence, consider a standard (i.e. with rate $\lambda = 1$) Poisson process $(N_t, t \ge 0)$ and write $N^c_t = N_t - t$ for its compensated version. Then $(S'_{n,m}(j), 0 \le j \le n)$ has the same distribution as the random walk $(N^c_j, 0 \le j \le n)$ conditioned on $N^c_n = -k$, where $k = n - m$ is the number of vacant sites.
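The setting without fires is easy to explore numerically. The following Python sketch is an illustration added for the reader and is not part of the argument. It assumes the usual rule of the parking scheme (a car arriving at an occupied site moves clockwise to the first vacant site), and the function names `park_without_fires` and `block_length_at` are hypothetical. Arrival sites are drawn uniformly on the circle, which corresponds to the multinomial description recalled above.

```python
import random


def park_without_fires(n, m, rng):
    """Park m cars on the circle Z/nZ (no fires); return the occupation list."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    occupied = [False] * n
    for _ in range(m):
        j = rng.randrange(n)        # arrival site, uniform on Z/nZ
        while occupied[j]:          # assumed rule: probe clockwise
            j = (j + 1) % n
        occupied[j] = True
    return occupied


def block_length_at(occupied, site=0):
    """Length of the occupied interval containing `site` (0 if it is vacant)."""
    n = len(occupied)
    if not occupied[site]:
        return 0
    if all(occupied):
        return n
    left = site
    while occupied[(left - 1) % n]:
        left = (left - 1) % n
    right = site
    while occupied[(right + 1) % n]:
        right = (right + 1) % n
    return (right - left) % n + 1


if __name__ == "__main__":
    rng = random.Random(2011)
    n, k = 10_000, 1_000            # k = n - m vacant sites
    samples = [block_length_at(park_without_fires(n, n - k, rng))
               for _ in range(200)]
    print("empirical mean length:", sum(samples) / len(samples),
          " (n/k)^2 =", (n / k) ** 2)
```

Averaging `block_length_at` over many runs gives an empirical counterpart of the mean length of the occupied interval containing the site 0, which can be compared with the order $(n/k)^2$ announced above; such an experiment is only indicative at finite $n$.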
3.1. Uniform bound for the mean length of an occupied interval. It will often be convenient to use the notation $k = n - m$ for the number of vacant sites just after the arrival of the $m$-th car. We denote by $G'_{n,m}$, respectively $D'_{n,m}$, the first vacant site at the left, respectively the right, of 0 at time $\Gamma_{n,m}$, so $L'_{n,m} = (D'_{n,m} - G'_{n,m} - 1)^+$ is the length of the occupied interval that contains the site 0 when there remain exactly $k = n - m$ vacant sites. The purpose of this section is to establish the following uniform upper bound for the mean length.

Proposition 1. There is a numerical constant $c > 0$ such that for every $0 \le m < n$,
\[ \mathbb{E}(L'_{n,m}) \le c \left(\frac{n}{n-m}\right)^2. \]

The mass distribution of $L'_{n,m}$ is known explicitly; see e.g. Eq. (2.6) in Chassaing and Louchard [12]. However the expression is rather involved and as we have not been able to establish Proposition 1 by direct calculations, we shall use a different route that relies on special properties of Borel-Tanner distributions.

To start with, recall that for $0 < s \le 1$, the Borel distribution with parameter $s$ is the probability measure on positive integers induced by the masses
\[ \frac{e^{-s\ell} (s\ell)^{\ell-1}}{\ell!}, \qquad \ell \ge 1. \]
We shall denote by $\beta_s$ a Borel($s$) variable; it is known that
\[ \mathbb{E}(\beta_s) = \frac{1}{1-s} \qquad \text{and} \qquad \mathbb{E}(\beta_s^2) = \frac{1}{(1-s)^3}. \]
The sum of $k$ independent Borel($s$) variables has the Borel-Tanner law with parameter $(s, k)$; its mass distribution is given by
\[ b_{s,k}(\ell) = \frac{k}{\ell} \, \frac{(s\ell)^{\ell-k} e^{-s\ell}}{(\ell-k)!}, \qquad \ell \ge k. \]
Rather than working directly with the occupied interval that contains some given site, we shall consider the spacing between consecutive vacant sites. Recall that there are k vacant sites in the parking lot; we pick one of them uniformly at random, denote it by v1 , and then by v2 , . . . , vk the remaining ones listed according to the cyclic order. The intervals [vi , vi+1 [ for i = 1, . . . , k (with the convention that vk+1 = v1 ) form a partition of Z/nZ; we write σ1 , . . . , σk for the sequence of their sizes. Lemma 5. Under the assumptions and notations above, (σ1 , . . . , σk ) has the same distribution as the k-tuple formed by k i.i.d. Borel(s) variables conditioned on having total sum equal to n, where the parameter s can be chosen arbitrarily in (0, 1]. Remark. One can also establish a slightly weaker version of Lemma 5 using the correspondence between parking schemes and the additive coalescence (cf. Chassaing and Louchard [12]), and an observation due to Pavlov, see e.g. Corollary 5.8 in [5].
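The Borel and Borel-Tanner masses recalled above can be checked numerically. The Python sketch below is an illustration added for the reader; the function names are hypothetical and the truncation at $\ell < 400$ is an arbitrary choice that is harmless for $s = 1/2$, where the masses decay geometrically. It evaluates $b_{s,k}(\ell)$ in logarithmic scale to avoid overflow, then compares the first two moments of Borel($s$) with $1/(1-s)$ and $1/(1-s)^3$, and checks one instance of the additivity property used in the proof of Proposition 1.

```python
import math


def log_borel_tanner(s, k, ell):
    """log b_{s,k}(ell) for integers ell >= k >= 1 and 0 < s <= 1."""
    return (math.log(k) - math.log(ell)
            + (ell - k) * math.log(s * ell)
            - s * ell
            - math.lgamma(ell - k + 1))


def borel_tanner_pmf(s, k, ell):
    """Mass b_{s,k}(ell); zero when ell < k. The case k = 1 is Borel(s)."""
    if ell < k:
        return 0.0
    return math.exp(log_borel_tanner(s, k, ell))


if __name__ == "__main__":
    s = 0.5
    support = range(1, 400)   # truncation of the support (assumption)
    total = sum(borel_tanner_pmf(s, 1, l) for l in support)
    mean = sum(l * borel_tanner_pmf(s, 1, l) for l in support)
    second = sum(l * l * borel_tanner_pmf(s, 1, l) for l in support)
    print(total, mean, second)   # compare with 1, 1/(1-s), 1/(1-s)^3

    # Additivity: Borel(s) convolved with Borel-Tanner(s, 2) at ell = 10
    conv = sum(borel_tanner_pmf(s, 1, j) * borel_tanner_pmf(s, 2, 10 - j)
               for j in range(1, 9))
    print(conv, borel_tanner_pmf(s, 3, 10))
```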
Proof. This result should belong to the folklore of parking schemes, but as we have been unable to spot a precise reference in the literature, we shall sketch the proof for the reader's convenience. It is a classical (and easy to check) property of Borel variables that the distribution of the $k$-tuple formed by $k$ i.i.d. Borel($s$) variables conditioned on having sum $n$ does not depend on the parameter $s \in (0, 1]$, so it suffices to establish the statement in the special case $s = 1$.

Recall that $(S'_j, 0 \le j \le n)$ has the same law as the compensated Poissonian walk $(N^c_j, 0 \le j \le n)$ conditioned on $N^c_n = -k$. An easy application of the ballot theorem then entails that the cyclic permutation of $(S'_j, 0 \le j \le n)$ at the first instant $v_1$ at which it attains an independent random variable which has the uniform distribution on $\{-k, \ldots, -1\}$, has the same law as a first-passage bridge for the random walk $N^c$. More precisely, $(S'_{v_1+j} - S'_{v_1}, 0 \le j \le n)$ has the same law as $(N^c_j, 0 \le j \le n)$ conditioned on $\inf\{j \ge 0 : N^c_j = -k\} = n$. See for instance Theorem 1 in [6] for details. As it is well-known that the durations of the excursions of the compensated Poissonian random walk $N^c$ above its running minimum form a sequence of i.i.d. variables with the Borel(1) law, Lemma 1 yields our claim.

Note that if $L'_i$ is the length of the occupied interval with left exterior boundary $v_i$ (i.e. $v_i$ is the first vacant site at the left of that interval), then $L'_i = \sigma_i - 1$, and an immediate argument of rotational invariance yields the identity
\[ \mathbb{E}(L'_{n,m}) = \frac{1}{n} \, \mathbb{E}\left(\sum_{i=1}^{k} \sigma_i(\sigma_i - 1)\right) = \frac{k}{n} \, \mathbb{E}(\sigma_1^2 - \sigma_1). \tag{2} \]
We are now able to tackle the proof of Proposition 1, relying on (2), Lemma 5 and properties of Borel-Tanner distributions.

Proof of Proposition 1. We may suppose without loss of generality that $k^2 \ge 10n$, since otherwise the statement is obvious (because $L'_{n,m} \le n$). Further, it suffices to establish the bound for $k \le n/2$. Indeed, writing for a moment $L'_{n,k}$ for the size of the occupied interval containing 0 when there are $k$ vacant sites, this size clearly increases when the number of vacant sites $k$ decreases, and hence
\[ \mathbb{E}(L'_{n,k}) \le \mathbb{E}(L'_{n,\lfloor n/2 \rfloor}) \le 9c \qquad \text{for every } k \ge n/2, \]
provided that the bound stated in Proposition 1 holds for $k = \lfloor n/2 \rfloor$.

By Lemma 5 and (2) we have
\[ \mathbb{E}(L'_{n,m}) \le \frac{k}{n} \, \mathbb{E}(\sigma_1^2) = \frac{k}{n} \, \mathbb{E}\left(\beta_s^2 \, \frac{b_{s,k-1}(n - \beta_s)}{b_{s,k}(n)}\right), \]
where $\beta_s$ stands for a Borel($s$) variable and we agree that $b_{s,k-1}(\ell) = 0$ for $\ell < k - 1$. Set $b^*_{s,k} = \max_{k \le \ell \le n} b_{s,k}(\ell)$, and observe from the additivity property of the Borel-Tanner distributions that
\[ e^{-1} b_{s,k-1}(\ell - 1) \le b_{s,1}(1) \, b_{s,k-1}(\ell - 1) \le b_{s,k}(\ell). \]
This yields $b^*_{s,k-1} \le e \, b^*_{s,k}$, and hence
\[ \mathbb{E}(L'_{n,m}) \le \frac{k}{n} \, \mathbb{E}(\beta_s^2) \times \frac{e \, b^*_{s,k}}{b_{s,k}(n)} = \frac{k e \, b^*_{s,k}}{n (1-s)^3 \, b_{s,k}(n)}, \]
where the equality follows from $\mathbb{E}(\beta_s^2) = (1-s)^{-3}$.
We next point out that the parameter $s$ can be chosen such that $b^*_{s,k} = b_{s,k}(n)$. In this direction, it is convenient to think for a while of the integer variable $\ell$ as a real one. The logarithmic derivative of the mass function of the Borel-Tanner distribution $\ell \mapsto b_{s,k}(\ell)$ is
\[ \ln s - s + \ln \ell + \frac{\ell - k - 1}{\ell} - \psi(\ell - k + 1), \]
where $\psi$ denotes the Digamma function, that is the logarithmic derivative of the Gamma function. Differentiating once more in the variable $\ell$, we get
\[ \frac{1}{\ell} + \frac{k+1}{\ell^2} - \psi_1(\ell - k + 1), \]
where $\psi_1(x) = \sum_{m=0}^{\infty} (m + x)^{-2}$ denotes the Trigamma function, that is the derivative of $\psi$. Since $\psi_1(x) \ge 1/x$, we easily see that the preceding quantity is negative for all $k \le \ell \le n$ (recall that $k^2 \ge 10n$). Hence $\ell \mapsto b_{s,k}(\ell)$ is log-concave on $[k, n]$ and thus reaches its maximum at $n$ whenever
\[ \ln s - s + \ln n + \frac{n - k - 1}{n} - \psi(n - k + 1) \ge 0. \tag{3} \]
In that case, we have then
\[ \mathbb{E}(L'_{n,m}) \le \frac{k e}{n (1-s)^3}. \tag{4} \]
We shall establish below that (3) holds with $1 - s = k/3n$ when $n$ is sufficiently large and conclude from (4) that
\[ \mathbb{E}(L'_{n,m}) \le 27 e \, \frac{n^2}{k^2}, \]
which is our statement.

We use the well-known estimate
\[ \psi(x + 1) = \ln x + \frac{1}{2x} + O(x^{-2}) \qquad \text{as } x \to \infty, \tag{5} \]
so the part of the left-hand side of (3) which does not depend on $s$, namely $\ln n + \frac{n-k-1}{n} - \psi(n-k+1)$, can be rewritten in the form
\[ 1 - \frac{k+1}{n} - \ln(1 - k/n) - \frac{1}{2(n-k)} + O((n-k)^{-2}). \]
Next, elementary calculations yield
\begin{align*}
1 - \frac{k}{3n} - \ln\left(1 - \frac{k}{3n}\right) &= 1 - \frac{k}{n} + \frac{2k}{3n} - \ln\left(1 - \frac{k}{n} + \frac{2k}{3n}\right) \\
&\le 1 - \frac{k}{n} + \frac{2k}{3n} - \ln\left(1 - \frac{k}{n}\right) - \frac{2k}{3n} \times \frac{1}{1 - k/3n} \\
&= 1 - \frac{k}{n} - \ln\left(1 - \frac{k}{n}\right) - \frac{2k^2}{3n(3n - k)} \\
&\le 1 - \frac{k}{n} - \ln\left(1 - \frac{k}{n}\right) - \frac{20}{9n},
\end{align*}
where in the last line we used the assumption that $k^2 \ge 10n$.

Recall that $k \le n/2$ and let $n$ be sufficiently large so that
\[ \frac{1}{n} + \frac{1}{2(n-k)} + O((n-k)^{-2}) \le \frac{20}{9n}, \]
where $O(\cdot)$ is the function which appears in (5). The calculation above shows that
\[ \ln n + \frac{n - k - 1}{n} - \psi(n - k + 1) \ge 1 - \frac{k}{3n} - \ln\left(1 - \frac{k}{3n}\right), \]
so (3) is fulfilled for $1 - s = k/3n$. This completes the proof.
We conclude this section with an easy consequence of Lemma 5 which will be useful to establish a property of propagation of chaos. Recall that $\sigma_1, \ldots, \sigma_k$ denote the sequence of spacings between consecutive vacant sites when there remain exactly $k$ vacant sites. We are interested in the sub-sequence obtained by removing the spacing that contains a site chosen uniformly at random, that is one of the spacings picked by a size-biased sampling. This means that we consider a random index $j^* \in \{1, \ldots, k\}$ with conditional distribution
\[ \mathbb{P}(j^* = i \mid \sigma_1, \ldots, \sigma_k) = n^{-1} \sigma_i \qquad \text{for } i = 1, \ldots, k, \]
and the sub-sequence $(\tilde{\sigma}_1, \ldots, \tilde{\sigma}_{k-1}) = (\sigma_1, \ldots, \sigma_{j^*-1}, \sigma_{j^*+1}, \ldots, \sigma_k)$.

Corollary 1. For every $\ell \in \{1, \ldots, n - k + 1\}$, the conditional distribution of $(\tilde{\sigma}_1, \ldots, \tilde{\sigma}_{k-1})$ given $\sigma_{j^*} = \ell$ is that of the sequence of the spacings in the parking $\mathbb{Z}/(n - \ell)\mathbb{Z}$ with Poissonian car arrival and no fires when there remain exactly $k - 1$ vacant sites.

Proof. Let $i_1, \ldots, i_{k-1}$ be a sequence of positive integers with $i_1 + \cdots + i_{k-1} = n - \ell$. By Lemma 5, we have for every $j = 1, \ldots, k$ and $\ell \in \{1, \ldots, n - k + 1\}$,
\begin{align*}
& \mathbb{P}\big((\tilde{\sigma}_1, \ldots, \tilde{\sigma}_{k-1}) = (i_1, \ldots, i_{k-1}), \, j^* = j, \, \sigma_{j^*} = \ell\big) \\
&\quad = \mathbb{P}\big((\sigma_1, \ldots, \sigma_k) = (i_1, \ldots, i_{j-1}, \ell, i_j, \ldots, i_{k-1}), \, j^* = j\big) \\
&\quad = \frac{\ell}{n} \, \frac{b_{s,1}(\ell)}{b_{s,k}(n)} \times \prod_{r=1}^{k-1} b_{s,1}(i_r).
\end{align*}
Summing over the possible values for $j$, we deduce that
\begin{align*}
\mathbb{P}\big((\tilde{\sigma}_1, \ldots, \tilde{\sigma}_{k-1}) = (i_1, \ldots, i_{k-1}), \, \sigma_{j^*} = \ell\big) &= \frac{k\ell}{n} \, \frac{b_{s,1}(\ell)}{b_{s,k}(n)} \prod_{r=1}^{k-1} b_{s,1}(i_r) \\
&= \frac{k\ell \, b_{s,1}(\ell) \, b_{s,k-1}(n - \ell)}{n \, b_{s,k}(n)} \times \frac{1}{b_{s,k-1}(n - \ell)} \prod_{r=1}^{k-1} b_{s,1}(i_r).
\end{align*}
Note that the second term in the product above corresponds to the mass distribution of the $(k-1)$-tuple formed by i.i.d. Borel($s$) variables conditioned on having sum $n - \ell$. Our claim follows from the comparison with Lemma 5.
3.2. Brownian limits. Proposition 1 provides a useful uniform upper bound for the expected length of occupied intervals. However such information is not sufficient for our purpose, and we will also need precise estimates of these lengths in certain regimes. We first analyze the asymptotic behavior of the path that encodes the occupation of the parking when the number of sites tends to infinity.

Proposition 2. Fix arbitrary $a < b$. In the regime when $n, k \to \infty$ with $n^{2/3} \ll k \ll n$, the rescaled path
\[ \frac{k}{n} \, S'_{n,n-k}(\lfloor n^2 k^{-2} u \rfloor), \qquad a \le u \le b, \]
converges in distribution on the space of càdlàg paths on $[a, b]$ endowed with the maximum norm towards
\[ W_u - u, \qquad a \le u \le b, \]
where $(W_u, u \in \mathbb{R})$ is a standard two-sided Brownian motion.

For the sake of simplicity, we shall establish Proposition 2 for $a = 0$ and $b = 1$, the case of arbitrary $a$ and $b$ only requiring a heavier notation. The proof relies on a technical asymptotic property of Poisson mass-distributions
\[ p_n(\ell) = \frac{e^{-n} n^{\ell}}{\ell!}, \qquad \ell \in \mathbb{Z}_+, \]
which we now state.

Lemma 6. Consider two sequences of nonnegative integers $(k_n)_{n \in \mathbb{N}}$ and $(x_n)_{n \in \mathbb{N}}$ such that $n^{2/3} \ll k_n \ll n$ and
\[ \lim_{n \to \infty} \left(\frac{k_n}{n} \, x_n - \frac{n}{k_n}\right) = w \]
for some $w \in \mathbb{R}$. Then we have
\[ \lim_{n \to \infty} \frac{p_{n - n^2/k_n^2}(n - k_n - x_n)}{p_n(n - k_n)} = \exp(-w - 1/2). \]

Proof. We express the ratio as
\begin{align*}
\frac{p_{n - n^2/k_n^2}(n - k_n - x_n)}{p_n(n - k_n)} &= e^{n^2/k_n^2} \times \frac{(n - n^2/k_n^2)^{n - k_n - x_n}}{n^{n - k_n}} \times \frac{(n - k_n)!}{(n - k_n - x_n)!} \\
&= e^{n^2/k_n^2} \times \left(1 - n^{-1} n^2/k_n^2\right)^{n - k_n - x_n} \times \prod_{i=0}^{x_n - 1} \left(1 - \frac{k_n + i}{n}\right).
\end{align*}
Recall that $n^2/k_n^3 \ll 1$ and $x_n \sim n^2/k_n^2$. We estimate the logarithm of the preceding quantity and get
\begin{align*}
& \frac{n^2}{k_n^2} - (n - k_n - x_n)\left(\frac{n}{k_n^2} + \frac{n^2}{2k_n^4}\right) - \sum_{i=0}^{x_n - 1} \frac{k_n + i}{n} - \frac{1}{2} \sum_{i=0}^{x_n - 1} \left(\frac{k_n + i}{n}\right)^2 + o(1) \\
&\quad = \frac{n^2}{k_n^2} - \frac{n^2}{k_n^2} - \frac{n^3}{2k_n^4} + \frac{n}{k_n} + x_n \frac{n}{k_n^2} - x_n \frac{k_n}{n} - \frac{x_n^2 - x_n}{2n} - \frac{1}{2} \, x_n \frac{k_n^2}{n^2} + o(1).
\end{align*}
After some simplifications using the identity $x_n k_n/n = n/k_n + w + o(1)$, we see that the above quantity can be expressed as $-w - 1/2 + o(1)$, which yields our claim.
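The limit in Lemma 6 can be illustrated numerically. The Python sketch below is an illustration added for the reader, with a hypothetical function name; it evaluates the logarithm of the ratio through the product form used in the proof (which is numerically more stable than subtracting two large log-factorials) and takes $x_n$ to be the integer closest to $n^2/k_n^2 + w \, n/k_n$, a choice that satisfies the hypothesis of the lemma. The values $n = 10^9$ and $k = 10^7$ are arbitrary within the regime $n^{2/3} \ll k \ll n$.

```python
import math


def log_poisson_ratio(n, k, w):
    """log of p_{n - n^2/k^2}(n - k - x) / p_n(n - k), x ~ n^2/k^2 + w n/k."""
    scale = n * n / (k * k)             # n^2 / k^2
    x = round(scale + w * n / k)        # assumed choice of x_n
    # e^{n^2/k^2} * (1 - n/k^2)^(n - k - x), in logarithmic scale
    head = scale + (n - k - x) * math.log1p(-n / (k * k))
    # product over 0 <= i < x of (1 - (k + i)/n), in logarithmic scale
    tail = math.fsum(math.log1p(-(k + i) / n) for i in range(x))
    return head + tail


if __name__ == "__main__":
    for w in (-1.0, 0.0, 1.0):
        print(w, log_poisson_ratio(10**9, 10**7, w), -w - 0.5)
```

At these finite values the computed logarithm is expected to differ from $-w - 1/2$ by a small correction, of the order of the $o(1)$ terms neglected in the proof.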
We can now proceed with the proof of Proposition 2.

Proof of Proposition 2. Recall that $(N^c_u = N_u - u, u \ge 0)$ is a compensated Poisson process, and that
\[ \left(\frac{k}{n} \, S'_{n,m}(\lfloor n^2 k^{-2} u \rfloor), \ 0 \le u \le 1\right) \]
has the same law as the process $\left(\frac{k}{n} N^c_{\lfloor n^2 k^{-2} u \rfloor}, \ 0 \le u \le 1\right)$ conditioned on $N^c_n = -k$, i.e. on $N_n = m = n - k$. Consider a continuous functional $\Phi$ on the space of càdlàg paths on the unit interval, with values in $[0, 1]$. Observe that $\mathbb{P}(N^c_\ell = j) = p_\ell(j + \ell)$, so an application of the Markov property for the compensated Poisson process at time $n^2/k^2$ yields
\begin{align*}
\mathbb{E}\left(\Phi\left(\frac{k}{n} S'_{n,m}(\lfloor n^2 k^{-2} u \rfloor), \ 0 \le u \le 1\right)\right) &= \mathbb{E}\left(\Phi\left(\frac{k}{n} N^c_{\lfloor n^2 k^{-2} u \rfloor}, \ 0 \le u \le 1\right) \,\Big|\, N_n = n - k\right) \\
&= \mathbb{E}\left(\Phi\left(\frac{k}{n} N^c_{\lfloor n^2 k^{-2} u \rfloor}, \ 0 \le u \le 1\right) \frac{p_{n - n^2/k^2}(n - k - N_{n^2/k^2})}{p_n(n - k)}\right).
\end{align*}
We next let $k = k_n$ depend on $n$ with $k_n \gg n^{2/3}$. By Donsker's invariance principle,
\[ \left(\frac{k_n}{n} N^c_{\lfloor n^2 k_n^{-2} u \rfloor}, \ 0 \le u \le 1\right) \]
converges weakly as $n \to \infty$ to a standard Brownian motion on the unit time interval, $(W_u, 0 \le u \le 1)$. It then follows from Lemma 6 and Fatou's Lemma that
\begin{align*}
\liminf_{n \to \infty} \mathbb{E}\left(\Phi\left(\frac{k_n}{n} S'_{n,n-k_n}(\lfloor n^2 k_n^{-2} u \rfloor), \ 0 \le u \le 1\right)\right) &\ge \mathbb{E}\left(\Phi(W_u, \ 0 \le u \le 1) \, e^{-W_1 - 1/2}\right) \\
&= \mathbb{E}\left(\Phi(W_u - u, \ 0 \le u \le 1)\right),
\end{align*}
where the last identity stems from the classical relation of absolute continuity between the law of the Brownian motion with drift and the Wiener measure. Replacing $\Phi$ by $1 - \Phi$, we get the converse upper-bound
\[ \limsup_{n \to \infty} \mathbb{E}\left(\Phi\left(\frac{k_n}{n} S'_{n,n-k_n}(\lfloor n^2 k_n^{-2} u \rfloor), \ 0 \le u \le 1\right)\right) \le \mathbb{E}\left(\Phi(W_u - u, \ 0 \le u \le 1)\right), \]
which completes the proof in the case $a = 0$ and $b = 1$. The general case can be proven by the same argument, but with a heavier notation.

Proposition 2 enables us to investigate precisely the asymptotics of the occupied interval containing 0 in the regime of interest. Indeed, it suggests that after proper rescaling, the occupied interval should converge weakly to the interval straddling 0 during which a two-sided Brownian motion with negative drift makes an excursion above its running
minimum. To give a precise statement, it is convenient to introduce some further notation. Recall that $(W_u, u \in \mathbb{R})$ denotes a two-sided Brownian motion; we write
\[ R_u = W_u - u - \min_{v \le u}(W_v - v) \]
for the Brownian motion with negative drift reflected at its running minimum. We set
\[ G = \sup\{u < 0 : R_u = 0\} \qquad \text{and} \qquad D = \inf\{u > 0 : R_u = 0\}, \]
so that $(G, D)$ is the excursion interval of the reflected process away from 0 that straddles the origin. We also denote by
\[ X_u = \begin{cases} 0 & \text{if } u \le G, \\ R_u & \text{if } G < u < D, \\ 0 & \text{if } u \ge D, \end{cases} \tag{6} \]
the corresponding excursion. Similarly, we introduce for every positive integer $m < n$ the periodic reflected path $R'_{n,m}(j) = S'_{n,m}(j) - \underline{S}'_{n,m}(j)$ and finally we set
\[ X'_{n,m}(j) = \begin{cases} 0 & \text{if } j \le G'_{n,m}, \\ R'_{n,m}(j) & \text{if } G'_{n,m} < j < D'_{n,m}, \\ 0 & \text{if } j \ge D'_{n,m}, \end{cases} \tag{7} \]
where $G'_{n,m}$ and $D'_{n,m}$ denote the first vacant site at the left, respectively the right, of 0 at time $\Gamma_{n,m}$, viz.
\[ G'_{n,m} = \max\{j \le 0 : \underline{S}'_{n,m}(j-1) > S'_{n,m}(j)\}, \qquad D'_{n,m} = \min\{j \ge 0 : \underline{S}'_{n,m}(j-1) > S'_{n,m}(j)\}. \]

Proposition 3. In the regime when $n, k \to \infty$ with $n^{2/3} \ll k \ll n$, the rescaled path
\[ \left(\frac{k}{n} \, X'_{n,n-k}(\lfloor n^2 k^{-2} u \rfloor), \ u \in \mathbb{R}\right) \]
converges in distribution on the space of càdlàg paths endowed with the maximum norm towards $(X_u, u \in \mathbb{R})$. In particular the pair of rescaled extremities of the occupied interval straddling 0,
\[ \left(\frac{k^2}{n^2} \, G'_{n,n-k}, \ \frac{k^2}{n^2} \, D'_{n,n-k}\right), \]
converges in distribution towards a pair of random variables with joint distribution density
\[ \frac{1}{\sqrt{2\pi (y - x)^3}} \exp(-(y - x)/2) \, dx \, dy, \qquad x < 0 < y. \]

The difficulty in deriving rigorously this weak limit from Proposition 2 is that the convergence stated there only concerns paths on a compact interval, whereas in order to investigate the left-extremity $G'$ of the occupied interval, we have to deal with the location of the overall minimum of the path on $(-\infty, 0]$ (the right-extremity $D'$ is much easier to handle once the location $G'$ and more precisely the value of the minimum $W_G - G$ are known). To resolve this difficulty, we shall need an a priori stochastic bound for the left-extremity $G'$, which in turn relies on the following technical lemma. Recall that $p_n(\cdot)$ denotes the mass-distribution of the Poisson law with parameter $n$.
Lemma 7. For every $b > 0$, in the regime $k, n \to \infty$ with $n^{2/3} \ll k \ll n$, we have
\[ \max_{b n^2/k^2 \le \ell \le n - k} \frac{p_{n-\ell}(n - \ell - k)}{p_n(n - k)} \longrightarrow e^{-b/2}. \]

Proof. Without loss of generality, we may suppose that $b$ is rational. We start by observing that
\[ \lim \frac{p_{n - b n^2/k^2}(n - k - b n^2/k^2)}{p_n(n - k)} = e^{-b/2}, \tag{8} \]
where the limit is taken as $n, k \to \infty$ in the regime $k \gg n^{2/3}$ and $b n^2/k^2 \in \mathbb{N}$. Indeed
\[ \frac{p_{n - b n^2/k^2}(n - k - b n^2/k^2)}{p_n(n - k)} = e^{b n^2/k^2} \left(1 - b n/k^2\right)^{n - k - b n^2/k^2} \prod_{i=0}^{b n^2/k^2 - 1} \left(1 - \frac{k + i}{n}\right), \]
and the logarithm of this quantity can be expressed as
\begin{align*}
& \frac{b n^2}{k^2} - \left(n - k - \frac{b n^2}{k^2}\right)\left(\frac{b n}{k^2} + \frac{b^2 n^2}{2k^4}\right) - \sum_{i=0}^{b n^2 k^{-2} - 1} \left(\frac{k + i}{n} + \frac{1}{2}\left(\frac{k + i}{n}\right)^2\right) + o(1) \\
&\quad = \frac{b n^2}{k^2} - \frac{b n^2}{k^2} - \frac{b^2 n^3}{2k^4} + \frac{b n}{k} + \frac{b^2 n^3}{k^4} - \frac{b n}{k} - \frac{b^2 n^3}{2k^4} - \frac{b}{2} + o(1) \\
&\quad = -\frac{b}{2} + o(1).
\end{align*}
Next, consider the function $x \mapsto p_x(x - k) = e^{-x} x^{x-k} / \Gamma(x - k + 1)$, and view now the variable $x$ as a positive real number. Take the logarithmic derivative; we get
\[ x \mapsto \ln x - k/x - \psi(x - k + 1), \]
where $\psi$ denotes the Digamma function. Using the estimate (5), we can re-express this quantity as
\[ -\frac{k}{x} - \ln\left(1 - \frac{k}{x}\right) - \frac{1}{2(x - k)} + O((x - k)^{-2}) \ge \frac{k^2}{2x^2} - \frac{1}{2(x - k)} + O((x - k)^{-2}), \]
and it is easily checked that this is positive on $[3k/2, k^2/2]$ when $k$ is sufficiently large. Therefore, since $k^2 \gg n$, we have
\[ \max_{b n^2/k^2 \le \ell \le n - 3k/2} p_{n-\ell}(n - \ell - k) = p_{n - b n^2/k^2}(n - b n^2/k^2 - k). \tag{9} \]
Finally, an elementary calculation based on the Stirling formula yields the estimate
\begin{align*}
\ln p_n(n - k) &= -k - (n - k) \ln(1 - k/n) - \frac{1}{2} \ln n + O(1) \\
&= -\sum_{\ell=2}^{\infty} \frac{1}{\ell(\ell - 1)} \, k^{\ell} n^{1-\ell} - \frac{1}{2} \ln n + O(1),
\end{align*}
and since $k \ll n$, we deduce that
\[ \max_{n - 3k/2 \le \ell \le n - k} p_{n-\ell}(n - \ell - k) = o(p_n(n - k)). \]
Combining this with (8) and (9) completes the proof.
We now proceed with the proof of Proposition 3.

Proof of Proposition 3. It is convenient to introduce the dual compensated Poisson process $\check{N}^c_s = s - N_s$, where $(N_s, s \ge 0)$ is a standard Poisson process. The reversed path $(S'_{n,n-k}(-\ell), 0 \le \ell \le n)$ has then the same distribution as $(\check{N}^c_\ell, 0 \le \ell \le n)$ conditionally on $\check{N}^c_n = k$. Fix $b > 0$ and observe that if $-G'_{n,n-k} > b n^2/k^2$, then necessarily the reversed path $S'_{n,n-k}(-\ell)$ visits 0 again for some $\ell > b n^2/k^2$. This yields the bound
\[ \mathbb{P}(-G'_{n,n-k} \ge b n^2/k^2) \le \mathbb{P}\big(\check{N}^c_\ell = 0 \text{ for some } b n^2/k^2 \le \ell < n \mid \check{N}^c_n = k\big). \]
Applying the strong Markov property of the random walk $\check{N}^c$ at its first return to 0 after $b n^2/k^2$, we get
\[ \mathbb{P}(-G'_{n,n-k} \ge b n^2/k^2) \le \max_{b n^2/k^2 \le \ell \le n - k} \frac{\mathbb{P}(\check{N}^c_{n-\ell} = k)}{\mathbb{P}(\check{N}^c_n = k)} = \max_{b n^2/k^2 \le \ell \le n - k} \frac{p_{n-\ell}(n - \ell - k)}{p_n(n - k)}. \]
It follows from Lemma 7 that in the regime $k, n \to \infty$ with $n^{2/3} \ll k \ll n$,
\[ \limsup \mathbb{P}(-G'_{n,n-k} \ge b n^2/k^2) \le e^{-b/2}. \]
This stochastic bound implies that when $b$ is large and $n, k \to \infty$ in the preceding regime, then with high probability the location and value of the overall minimum of $S'_{n,n-k}$ on $(-\infty, 0]$ are the same as on $[-b n^2/k^2, 0]$. On the other hand, the location and value of the overall minimum of $W_u - u$ on $(-\infty, 0]$ are also the same as on $[-b, 0]$ with high probability when $b$ is large. We can then deduce the first claim of Proposition 3 from Proposition 2 by routine arguments.

Finally, that $(G, D)$ has the distribution given in the statement belongs to the Brownian folklore. Specifically, recall that $(R_u : u \in \mathbb{R})$ is a two-sided Brownian motion with drift $-1$ reflected at its running minimum. In particular $R$ is a stationary Markov process and $\mathcal{R} = \{u \in \mathbb{R} : R_u = 0\}$ is a stationary regenerative set. It is well-known that $\mathcal{R}$ is light (i.e. its Lebesgue measure is 0 a.s.) and has Lévy measure
\[ \frac{1}{\sqrt{2\pi r^3}} \exp(-r/2) \, dr, \qquad r > 0. \]
It follows from renewal theory (see for instance Proposition 3.3 in [4]) that the joint law of $G = \sup\big((-\infty, 0) \cap \mathcal{R}\big)$ and $D = \inf\big((0, \infty) \cap \mathcal{R}\big)$ is the same as that of the pair $(-U\gamma, (1 - U)\gamma)$, where $U$ is uniform on $[0, 1]$ and $\gamma$ an independent gamma variable with parameter $(1/2, 1/2)$. This yields the formula in Proposition 3.
We can now deduce from Proposition 3 the asymptotic behavior of the empirical distribution of the length of occupied intervals by standard arguments involving rotational invariance and propagation of chaos, see [19].

Corollary 2. For every $j \in \mathbb{Z}/n\mathbb{Z}$, denote by $L'_{n,m}(j)$ the length of the occupied interval which contains the site $j$ just after the arrival of the $m$-th car (by convention $L'_{n,m}(j) = 0$ if the site $j$ is vacant at that time). Consider the empirical distribution
\[ \mu'_{n,m} = \frac{1}{n} \sum_{j \in \mathbb{Z}/n\mathbb{Z}} \delta_{(n-m)^2 n^{-2} L'_{n,m}(j)}. \]
Then $\mu'_{n,m}$ converges in probability on the space of probability measures on $[0, \infty)$ endowed with Prohorov's distance as $n, m \to \infty$ in the regime $n^{2/3} \ll n - m \ll n$, towards
\[ \mu(dx) = \frac{1}{\sqrt{2\pi x}} \exp(-x/2) \, dx, \qquad x > 0. \]

We stress that the result would fail in the regime $n - m \sim \sqrt{n}$, and refer to Theorem 2.1 of Chassaing and Louchard [12] for a different limiting law in the latter case.

Proof. Consider a continuous bounded function $f : \mathbb{R}_+ \to \mathbb{R}$ and set
\[ \langle \mu'_{n,m}, f \rangle = \frac{1}{n} \sum_{j \in \mathbb{Z}/n\mathbb{Z}} f\big((n-m)^2 n^{-2} L'_{n,m}(j)\big) \]
and $\langle \mu, f \rangle = \int f(x) \, \mu(dx)$. For every $n$, we pick two sites in $\mathbb{Z}/n\mathbb{Z}$ uniformly at random, say $j^*$ and $j^\dagger$, and write for simplicity $L'_{n,m}(j^*) = L^*_{n,m}$ and $L'_{n,m}(j^\dagger) = L^\dagger_{n,m}$.

By rotational invariance, $L^*_{n,m}$ and $L^\dagger_{n,m}$ have both the same law as $L'_{n,m} = (D'_{n,m} - G'_{n,m} - 1)^+$, the length of the occupied interval that contains the site 0. As a consequence, we get from the second claim of Proposition 3 that
\begin{align*}
\mathbb{E}(\langle \mu'_{n,m}, f \rangle) &= \mathbb{E}\big(f((n-m)^2 n^{-2} L^*_{n,m})\big) \\
&= \mathbb{E}\big(f((n-m)^2 n^{-2} (D'_{n,m} - G'_{n,m} - 1)^+)\big) \\
&\to \int_{-\infty}^{0} dx \int_{0}^{\infty} dy \, f(y - x) \, \frac{1}{\sqrt{2\pi (y - x)^3}} \exp(-(y - x)/2) \\
&= \langle \mu, f \rangle.
\end{align*}
Next we can express the second moment of $\langle \mu'_{n,m}, f \rangle$ as
\[ \mathbb{E}(\langle \mu'_{n,m}, f \rangle^2) = \mathbb{E}\big(f((n-m)^2 n^{-2} L^*_{n,m}) \, f((n-m)^2 n^{-2} L^\dagger_{n,m})\big). \]
Because $|j^* - j^\dagger| \gg n^2/(n-m)^2$ with high probability when $n, m \to \infty$ in the regime of concern, the occupied intervals $I^*_{n,m}$ and $I^\dagger_{n,m}$ containing respectively $j^*$ and $j^\dagger$ are disjoint with high probability. Further, it follows from Corollary 1 that conditionally on $j^\dagger \notin I^*_{n,m}$ and $L^*_{n,m} = \ell$, $L^\dagger_{n,m}$ has the same law as $L'_{n-\ell, m-\ell-1}$. It is then routine to deduce that the rescaled lengths $(n-m)^2 n^{-2} L^*_{n,m}$ and $(n-m)^2 n^{-2} L^\dagger_{n,m}$ are asymptotically independent. Hence
\[ \mathbb{E}(\langle \mu'_{n,m}, f \rangle^2) \to \langle \mu, f \rangle^2, \]
and we conclude that
\[ \langle \mu'_{n,m}, f \rangle \to \langle \mu, f \rangle \qquad \text{in } L^2(\mathbb{P}), \]
which yields our claim.
4. Main Results

Throughout the remainder of this paper, we assume that cars arrive independently at unit rate on each site while Molotov cocktails are thrown on each site at rate $n^{-\alpha}$, independently of the arrivals of cars, where $\alpha$ is some parameter in $(0, 1)$. In other words, $(C_t, t \ge 0)$ and $(M_t, t \ge 0)$ are two independent Poisson point processes on $\mathbb{Z}/n\mathbb{Z}$ with respective intensities 1 and $n^{-\alpha}$ per site and unit of time.

We are interested in the following quantities. First, for every $n \in \mathbb{N}$, we introduce the first instant when the parking of size $n$ is saturated,
\[ T_n := \inf\{t \ge 0 : \Theta_{n,t} \equiv 1\}. \]
Next, for every $t \ge 0$, we denote the average occupation of the parking with size $n$ at time $t$ by
\[ \theta_{n,t} = n^{-1} \sum_{j \in \mathbb{Z}/n\mathbb{Z}} \Theta_{n,t}(j). \]
Our main result claims that for any $\alpha$, the average occupation at time $t$ is close to $t$ as long as $t < 1$, with high probability. When $\alpha > 2/3$, the parking becomes saturated at a time close to 1. For $\alpha < 2/3$, the mean occupation drops suddenly to nearly 0 right after time 1 although the parking is never fully saturated. Here are the formal statements.

Theorem 1.
(i) For every $0 < t < 1$, we have $\lim_{n \to \infty} \theta_{n,t} = t$ in probability.
(ii) For $\alpha > 2/3$, we have $\lim_{n \to \infty} T_n = 1$ in probability.
(iii) For $\alpha < 2/3$, $1 < t < 2$ and $\varepsilon > 0$, we have $\lim_{n \to \infty} \mathbb{P}(\theta_{n,t} \le t - 1 + \varepsilon) = 1$.
(iv) For $\alpha < 2/3$, we have for every $t < 2$, $\lim_{n \to \infty} \mathbb{P}(T_n \le t) = 0$.

Remark. The first instant when a Molotov cocktail is thrown after the saturation time is a renewal time for the occupation of the parking lot. For $\alpha > 2/3$, we know from (ii) that the latter is close to 1 when $n$ is large, and thus (i) can be reinforced as
\[ \lim_{n \to \infty} \theta_{n,t} = \{t\} = t - \lfloor t \rfloor \qquad \text{in probability for all } t \ge 0. \]
We conjecture that this asymptotic for the average occupation holds also when $\alpha \le 2/3$, but have been unable so far to establish this due to the lack of renewal in that situation. In the same vein, we also conjecture that (iv) holds for all $t \ge 0$.
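The dynamics considered in this section can also be simulated directly. The Python sketch below is an illustration added for the reader and not part of the proofs. It relies on the following assumptions, which should be checked against the definition of the model: a car arriving at an occupied site moves clockwise to the first vacant site, a car arriving when the lot is saturated is discarded, a cocktail thrown on a vacant site has no effect, and a cocktail thrown on an occupied site empties the whole occupied interval containing it. The function name `simulate_occupation` is hypothetical. Events are generated through the jump chain of the superposition of the two Poisson point processes, whose total rate is $n(1 + n^{-\alpha})$.

```python
import random


def simulate_occupation(n, alpha, t_end, rng):
    """Average occupation at time t_end of a lot of size n with fires."""
    occupied = [False] * n
    count = 0
    fire_rate = n ** (-alpha)              # cocktails per site and unit time
    total_rate = n * (1.0 + fire_rate)     # cars + cocktails, all sites
    p_car = 1.0 / (1.0 + fire_rate)        # chance that an event is a car
    t = rng.expovariate(total_rate)
    while t <= t_end:
        j = rng.randrange(n)               # site of the event, uniform
        if rng.random() < p_car:
            if count < n:                  # assumed: discard car if saturated
                while occupied[j]:
                    j = (j + 1) % n        # assumed rule: probe clockwise
                occupied[j] = True
                count += 1
        elif occupied[j]:
            i = j                          # burn j and everything to its right
            while occupied[i]:
                occupied[i] = False
                count -= 1
                i = (i + 1) % n
            i = (j - 1) % n                # then everything to its left
            while occupied[i]:
                occupied[i] = False
                count -= 1
                i = (i - 1) % n
        t += rng.expovariate(total_rate)
    return count / n


if __name__ == "__main__":
    rng = random.Random(0)
    for alpha in (0.8, 0.4):
        for t in (0.5, 0.9, 1.2, 1.5):
            print(alpha, t, simulate_occupation(5_000, alpha, t, rng))
```

For $t < 1$ the returned value should be close to $t$ for both values of $\alpha$, in line with Theorem 1(i). The contrast between $\alpha > 2/3$ and $\alpha < 2/3$ after time 1 is an asymptotic statement and may only be partially visible at moderate $n$.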
The rest of this section is devoted to the proof of Theorem 1. Parts (i) and (ii) follow rather easily from the material developed in the preceding sections, while (iii) and (iv) are more delicate.

Recall that $\Gamma_{n,m}$ denotes the arrival time of the $m$-th car, and that the increments $\Gamma_{n,m+1} - \Gamma_{n,m}$ are i.i.d. exponential variables with mean $1/n$. In particular the mean number of Molotov cocktails that are thrown during the time-interval $[\Gamma_{n,m}, \Gamma_{n,m+1})$ equals $n^{-\alpha}$. For every $j \in \mathbb{Z}/n\mathbb{Z}$, we also denote by $\delta_{n,m}(j)$ the number of cars that have arrived at the site $j$ and have been burnt before the arrival time $\Gamma_{n,m}$ of the $m$-th car. We first point out the following upper-bound.

Lemma 8. There is a numerical constant $c$ such that for every $j \in \mathbb{Z}$ and $0 \le m < n$,
\[ \mathbb{E}(\delta_{n,m}(j)) \le c \, \frac{n^{1-\alpha}}{n - m}. \]

Proof. Recall that $L'_{n,m}$ denotes the length of the occupied interval that contains the site 0 at time $\Gamma_{n,m}$ when no Molotov cocktails are thrown. By rotational invariance, the distribution of this quantity remains the same if we replace 0 by any other site $j \in \mathbb{Z}/n\mathbb{Z}$. Lemma 2 now shows that the size of each occupied interval which is hit by a Molotov cocktail thrown during the time-interval $[\Gamma_{n,m}, \Gamma_{n,m+1})$ can be stochastically bounded from above by $L'_{n,m}$. Further, we deduce from Proposition 1 that the mean number of cars which are burnt before time $\Gamma_{n,m}$ can be bounded from above by
\[ \mathbb{E}\left(\sum_{j \in \mathbb{Z}/n\mathbb{Z}} \delta_{n,m}(j)\right) \le n^{-\alpha} \sum_{\ell=0}^{m} c \left(\frac{n}{n - \ell}\right)^2 \le c \, \frac{n^{2-\alpha}}{n - m}. \]
As $\mathbb{E}(\delta_{n,m}(j))$ does not depend on $j \in \mathbb{Z}/n\mathbb{Z}$, this establishes our claim.

We are now able to establish parts (i) and (ii) of Theorem 1.

Proof of Theorem 1(i). Fix arbitrarily $t \in (0, 1)$ and $0 < 2\varepsilon < 1 - t$. Next set $m^+_n = \lfloor (t + \varepsilon) n \rfloor$ and $m^-_n = \lfloor (t - \varepsilon) n \rfloor$. By the law of large numbers, the bounds $\Gamma_{n,m^-_n} \le t \le \Gamma_{n,m^+_n}$ hold with high probability when $n$ is large. Thanks to Lemma 8, the mean number of cars which have been burnt up to time $\Gamma_{n,m^+_n}$ is bounded from above by $c(1 - t - \varepsilon)^{-1} n^{1-\alpha}$. By Markov's inequality, the number of cars which have been burnt up to time $\Gamma_{n,m^+_n}$ can thus be bounded from above by $n^{1-\alpha/2}$ with high probability. Plainly, on the latter event, the average occupation at time $t$ fulfills
\[ t - \varepsilon - n^{-\alpha/2} \le \theta_{n,t} \le t + \varepsilon, \]
which proves our claim.

Proof of Theorem 1(ii). This follows from a variation of the preceding argument. Take $m = n - \lfloor n^\beta \rfloor$ for some $2/3 < \beta < \alpha$, so by Lemma 8 and Markov's inequality, the probability that more than $n^\beta$ cars have burned at time $\Gamma_{n,m}$ is less than $c n^{2 - \alpha - 2\beta}$. Thus with high probability, there are at most $2n^\beta$ vacant sites at time $\Gamma_{n,m}$. Because the mean number of Molotov cocktails that are thrown between times $\Gamma_{n,n-n^\beta}$ and $\Gamma_{n,n+n^\beta}$ is $2n^{-\alpha+\beta} \ll 1$, we conclude that with high probability, the parking is saturated when the $(n + n^\beta)$-th car arrives, and this entails our claim since $\Gamma_{n,n+n^\beta} \sim 1$.
Our approach to establishing Theorem 1(iii) when $\alpha < 2/3$ consists in showing first that there exist times close to 1 at which the length of the occupied interval containing a typical site is of order $n^{\beta}$ for some $\beta > \alpha$. The probability that such an interval is not hit by a Molotov cocktail during a time interval of fixed duration $\varepsilon > 0$ is of order $\exp(-\varepsilon n^{\beta-\alpha}) \ll 1$. This means that the lifetime of these intervals is small, and hence a typical site will be vacant shortly after that time. In this direction, denote for every $j \in \mathbb{Z}/n\mathbb{Z}$ by $L_{n,m}(j)$ the length of the occupied interval which contains the site $j$ just after the arrival of the $m$-th car (by convention $L_{n,m}(j) = 0$ if the site $j$ is vacant at that time). We shall investigate the asymptotic behavior of the empirical distribution
$$\mu_{n,m} = \frac{1}{n} \sum_{j \in \mathbb{Z}/n\mathbb{Z}} \delta_{(n-m)^{2} n^{-2} L_{n,m}(j)}.$$
Roughly speaking, we observe that there exist regimes in which the impact of fires is still low (in the sense that the behavior is the same as if there were no fires; cf. Corollary 2) even though the size of typical intervals becomes large, namely greater than $n^{\alpha}$.
Proposition 4. Suppose $\alpha < 2/3$. In the regime $n, m \to \infty$ with
$$n^{\frac{2}{3} \vee (1 - 2\alpha/3)} \ln^{4/3} n \ll n - m \ll n,$$
$\mu_{n,m}$ converges in probability on the space of probability measures on $[0, \infty)$ endowed with Prohorov's distance towards
$$\mu(dx) = \frac{1}{\sqrt{2\pi x}} \exp(-x/2)\,dx, \qquad x > 0.$$
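As a quick sanity check on the limit law in Proposition 4, the following minimal Python sketch integrates the density $\exp(-x/2)/\sqrt{2\pi x}$ numerically. The function names, the truncation point `u_max` and the number of steps are illustrative choices and are not taken from the paper.

```python
import math


def mu_density(x: float) -> float:
    """Density of the limit law in Proposition 4: exp(-x/2) / sqrt(2*pi*x), x > 0."""
    return math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi * x)


def integrate_against_mu(weight, u_max: float = 12.0, steps: int = 20_000) -> float:
    """Midpoint rule for the integral of weight(x) * mu_density(x) over x > 0.

    The substitution x = u**2 (so dx = 2*u*du) removes the integrable
    singularity of the density at x = 0. The range is truncated at
    x = u_max**2, where the density is negligible.
    """
    h = u_max / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h          # midpoint, always strictly positive
        x = u * u
        total += weight(x) * mu_density(x) * 2.0 * u
    return total * h


total_mass = integrate_against_mu(lambda x: 1.0)   # total mass of mu
mean_length = integrate_against_mu(lambda x: x)    # first moment of mu
```

Both quantities should come out close to 1: the density is that of the square of a standard Gaussian variable, which is consistent with the remark that the lengths $L_{n,m}(j)$ are of order $n^{2}/(n-m)^{2}$ in this regime.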
We take Proposition 4 for granted, postponing its proof to the next section and now establish Theorem 1(iii). We stress that the precise form of the limit μ(dx) will not be needed, the key point is that the lengths L n,m ( j) are of order n 2 /(n − m)2 in the regime of the statement. Proof of Theorem 1(iii). Since α < 2/3, we may pick β ∈ ( 23 ∨ (1 − 2α/3), 1 − α/2). Then we set kn = n − m n = n β , and we are in the regime of Proposition 4. We work conditionally on the occupation of the parking after the arrival of the m th n car. Fix j ∈ Z/nZ and consider the occupied interval that contains j, say I j . The probability that a Molotov cocktail will be thrown on I j during a time interval of duration t is 1 − exp(−L n,m n ( j)tn −α ), where L n,m n ( j) = |I j | denotes the number of sites in I j . So, if we consider the dynamics in which car arrivals are stopped after the arrival of the m th n car, n,m n , and denote by B the number of cars which are burnt between times n,m n and t + n,m n , then the conditional mean of B is given by (1 − exp(−L n,m n ( j)tn −α )) = nμn,m n , 1 − exp(−tn −α n 2 kn−2 ·). j∈Z/n Z
We pick $0 < \eta < 2 - \alpha - 2\beta$ and take $t = t_n = n^{-\eta}$, so
$$\lim_{n\to\infty} \left(1 - \exp(-t_n n^{-\alpha} n^{2} k_n^{-2} x)\right) = 1 \qquad \text{for every } x > 0.$$
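The choice of exponents above is pure bookkeeping, and it can be checked mechanically. The sketch below (helper names are hypothetical, and the sample values of $\alpha$, $\beta$, $\eta$ are arbitrary admissible choices) computes the interval allowed for $\beta$ and the exponent of $n$ in $t_n n^{-\alpha} n^{2} k_n^{-2}$ when $k_n = n^{\beta}$ and $t_n = n^{-\eta}$.

```python
import math


def beta_interval(alpha: float) -> tuple[float, float]:
    """Open interval (2/3 v (1 - 2*alpha/3), 1 - alpha/2) in which beta is picked."""
    lower = max(2.0 / 3.0, 1.0 - 2.0 * alpha / 3.0)
    upper = 1.0 - alpha / 2.0
    return lower, upper


def hit_rate_exponent(alpha: float, beta: float, eta: float) -> float:
    """Exponent of n in t_n * n**(-alpha) * n**2 * k_n**(-2), k_n = n**beta, t_n = n**(-eta)."""
    return 2.0 - alpha - 2.0 * beta - eta


def limit_term(n: float, alpha: float, beta: float, eta: float, x: float) -> float:
    """Value of 1 - exp(-t_n * n**(-alpha) * n**2 * k_n**(-2) * x) for a given n."""
    return 1.0 - math.exp(-(n ** hit_rate_exponent(alpha, beta, eta)) * x)


alpha, beta, eta = 0.5, 0.7, 0.05           # sample values with alpha < 2/3
lower, upper = beta_interval(alpha)
exponent = hit_rate_exponent(alpha, beta, eta)
```

With these sample values the exponent is positive but small, so the convergence to 1 is slow in $n$; this is only an illustration of the arithmetic, not a statement about rates in the paper.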
We deduce from Proposition 4 and the portmanteau theorem that the conditional mean of B is asymptotically close to n. Because B ≤ n, this implies that for every ε > 0,
Burning Cars in a Parking Lot
B ≥ (1 − ε)n holds with high probability. We may now invoke Lemma 4 and conclude that for the original dynamics, the number B of cars which are burnt between n,m n and tn + n,m n is at least (1−ε)n with high probability. On the other hand, the total number of cars which have arrived between n,m n and tn + n,m n is close to ntn = n 1−η n. Thus the average occupation of the parking at time tn + n,m n must be small with high probability. As tn + n,m n ∼ 1 and the increase of the average occupation on any time-interval with duration s is obviously bounded from above by s, this entails our claim. We next prepare the ground for the proof of Theorem 1(iv). Roughly speaking, we have to check that for α < 2/3, destruction of cars by fires at times close to 1 occurs more rapidly than new cars arrive, which prevents the saturation of the parking. In this direction, we consider the following setting. Let a > 0 and (xi , i ∈ N) be a collection of nonnegative real numbers which may be viewed as masses. We mark each xi independently, at rate axi . That is each xi receives a mark at time (axi )−1 ei , where (ei , i ∈ N) is a sequence of i.i.d. standard exponentially distributed random variables. For every s ≥ 0, let Xs =
$$\sum_{i=1}^{\infty} x_i \mathbf{1}_{\{e_i \le a x_i s\}}$$
be the sum of the masses that have a mark at time s.
Lemma 9. Let $0 < t_0 < t_1$ and $b > 0$. Then
$$P(X_s \le bs \text{ for some } t_0 \le s \le t_1) \le (1 + \ln(t_1/t_0)) \exp\Big(1 - \frac{a}{b e^{3}} \sum_{i \in I} x_i^{2}\Big),$$
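where $I = \{i : x_i \le (b t_0) \wedge (1/(a t_1))\}$.
Proof. Observe first that the Laplace transform of $X_s$ is given for every $q > 0$ by
$$\begin{aligned}
E(\exp(-q X_s)) &= \prod_{i=1}^{\infty} E\big(\mathbf{1}_{\{e_i > a x_i s\}} + e^{-q x_i} \mathbf{1}_{\{e_i \le a x_i s\}}\big) \\
&= \prod_{i=1}^{\infty} \big(e^{-a x_i s} + (1 - e^{-a x_i s}) e^{-q x_i}\big) \\
&= \exp\Big(\sum_{i=1}^{\infty} \ln\big(1 - (1 - e^{-a x_i s})(1 - e^{-q x_i})\big)\Big) \\
&\le \exp\Big(-\sum_{i=1}^{\infty} (1 - e^{-a x_i s})(1 - e^{-q x_i})\Big).
\end{aligned}$$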
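The product formula and the final upper bound for the Laplace transform can be checked on a small finite family of masses. The following Python sketch is only an illustration under assumed inputs: the masses and the parameters `q`, `s`, `a` are arbitrary, and the brute-force routine enumerates all mark patterns, so it is usable only for a handful of masses.

```python
import itertools
import math


def laplace_transform(q, s, a, masses):
    """E[exp(-q X_s)] via the product of exp(-a x s) + (1 - exp(-a x s)) exp(-q x)."""
    value = 1.0
    for x in masses:
        p_marked = 1.0 - math.exp(-a * x * s)   # probability that x is marked by time s
        value *= (1.0 - p_marked) + p_marked * math.exp(-q * x)
    return value


def laplace_transform_brute_force(q, s, a, masses):
    """Same expectation, summed over every subset of marked masses."""
    total = 0.0
    for marks in itertools.product((False, True), repeat=len(masses)):
        prob, x_s = 1.0, 0.0
        for marked, x in zip(marks, masses):
            p_marked = 1.0 - math.exp(-a * x * s)
            prob *= p_marked if marked else 1.0 - p_marked
            if marked:
                x_s += x
        total += prob * math.exp(-q * x_s)
    return total


def laplace_upper_bound(q, s, a, masses):
    """exp(-sum of (1 - exp(-a x s)) (1 - exp(-q x))), from ln(1 - u) <= -u."""
    return math.exp(-sum((1.0 - math.exp(-a * x * s)) * (1.0 - math.exp(-q * x))
                         for x in masses))


masses = [0.3, 1.0, 2.5, 0.7]                   # arbitrary illustrative masses
exact = laplace_transform(0.8, 1.2, 0.5, masses)
brute = laplace_transform_brute_force(0.8, 1.2, 0.5, masses)
bound = laplace_upper_bound(0.8, 1.2, 0.5, masses)
```

The two evaluations of the transform should agree up to rounding, and both should lie below the bound, which is the inequality used in the next step of the proof.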
Next, observe from the Markov inequality that
$$P(X_s \le ebs) \le e\, E\Big(\exp\Big(-\frac{1}{ebs} X_s\Big)\Big),$$
and hence
$$P(X_s \le ebs) \le \exp\Big(1 - \sum_{i=1}^{\infty} (1 - e^{-a x_i s})(1 - e^{-x_i/(ebs)})\Big).$$
Then recall the definition of the set of indices $I$ in the statement and note that for every $i \in I$ and $t_0 \le s \le t_1$, we have the bounds $1 - e^{-a x_i s} \ge e^{-1} a x_i s$ and $1 - e^{-x_i/(ebs)} \ge x_i/(e^{2} b s)$, and therefore
$$P(X_s \le ebs) \le \exp\Big(1 - \frac{a}{b e^{3}} \sum_{i \in I} x_i^{2}\Big).$$
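Applying this inequality successively for $s = e^{j} t_0$ and $j = 0, \ldots, \lfloor \ln(t_1/t_0) \rfloor$, we conclude that
$$P\big(X_{e^{j} t_0} \le b e^{j+1} t_0 \text{ for some } j = 0, \ldots, \lfloor \ln(t_1/t_0) \rfloor\big) \le (1 + \ln(t_1/t_0)) \exp\Big(1 - \frac{a}{b e^{3}} \sum_{i \in I} x_i^{2}\Big),$$
which yields our claim by an argument of monotonicity (since $s \mapsto X_s$ is nondecreasing, $X_s \le bs$ for some $s \in [e^{j} t_0, e^{j+1} t_0]$ forces $X_{e^{j} t_0} \le b e^{j+1} t_0$).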
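To get a feeling for the size of the bound in Lemma 9, one can evaluate it for a concrete family of masses and compare it with a crude Monte Carlo estimate of the probability on the left-hand side. The sketch below is a minimal illustration under assumed inputs: the function names, the choice of 800 unit masses and the parameters are hypothetical, and the simulation only relies on the marking rule stated before the lemma (mass $x_i$ is marked at time $e_i/(a x_i)$).

```python
import math
import random


def lemma9_bound(masses, a, b, t0, t1):
    """Right-hand side of Lemma 9 for a finite family of masses."""
    threshold = min(b * t0, 1.0 / (a * t1))      # defines the index set I
    sum_of_squares = sum(x * x for x in masses if x <= threshold)
    return (1.0 + math.log(t1 / t0)) * math.exp(1.0 - a * sum_of_squares / (b * math.e ** 3))


def event_occurs(masses, a, b, t0, t1, rng):
    """One sample of the event {X_s <= b*s for some s in [t0, t1]}."""
    mark_times = sorted((rng.expovariate(1.0) / (a * x), x) for x in masses if x > 0)
    marked_mass = 0.0                            # value of X just before the next mark
    for time, x in mark_times:
        if time > t1:
            break
        # X is constant on the interval ending at `time`, so X_s - b*s is
        # smallest just before `time`; only times inside [t0, t1] count.
        if time > t0 and marked_mass < b * time:
            return True
        marked_mass += x
    return marked_mass <= b * t1                 # last constant stretch, up to t1


def estimate_probability(masses, a, b, t0, t1, runs=200, seed=0):
    """Empirical frequency of the event over independent samples."""
    rng = random.Random(seed)
    hits = sum(event_occurs(masses, a, b, t0, t1, rng) for _ in range(runs))
    return hits / runs


masses = [1.0] * 800                             # illustrative family of unit masses
bound = lemma9_bound(masses, a=1.0, b=4.0, t0=0.5, t1=1.0)
estimate = estimate_probability(masses, a=1.0, b=4.0, t0=0.5, t1=1.0)
```

When no mass belongs to $I$, the sum is empty and the bound reduces to $(1 + \ln(t_1/t_0))\,e$, which exceeds 1 and is therefore uninformative; the lemma is useful only when $\sum_{i \in I} x_i^{2}$ is large compared with $b/a$.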
We next deduce from Lemma 9 and Proposition 4 an explicit lower-bound for the first saturation time Tn . Corollary 3. Suppose α < 2/3 and pick β ∈ 23 ∨ (1 − 2α/3), 1 − α/2 . Set n = n 2β+α−1 . Then lim P(Tn ≤ n,n+n ) = 0.
n→∞
Proof. Let kn = n − m n = n β and note that kn ≤ n . We shall implicitly work on the event j/2n ≤ n,m n + j − n,m n ≤ 2 j/n
for all j ≥ kn − 1
as, by the laws of large numbers, the probability of this event is high when n → ∞. Our approach can be described as follows. For every s ≥ 0, let Bs denote the number of cars which have been burnt during the time interval [ n,m n , n,m n + s]. We aim at showing that Bs ≥ 2ns
for all kn /2n ≤ s ≤ 2(kn + n )/n
with high probability. Note that on this event (together with that defined in the beginning of this proof), the total number of cars that are burnt between times n,m n and n,m n + j exceeds j for all kn ≤ j ≤ kn + n . Since Tn > n,n−1 , the saturation of the parking cannot occur before the arrival of the (n + n )th car on the preceding event. Observe that we may work with the dynamics where the car arrival process is stopped after n,m n . Indeed, thanks to Lemma 4, it suffices to establish that the event n = Bs ≥ 2ns for all kn /2n ≤ s ≤ 2(kn + n )/n
has a high probability, where Bs stands for the number of cars which have been burnt during the time interval [ n,m n , n,m n + s] in these dynamics. This enables us to apply Lemma 9. More precisely, we consider the occupation of the parking at time n,m n and write xi for the size of the i th largest occupied interval. We take a = n −α so that the rate axi at which xi is marked corresponds to the rate at which a Molotov cocktail is thrown on the interval with size xi , and then Bs = X s . We also take b = 2n, t0 = kn /2n and t1 = 2(kn + n )/n. Note that (bt0 ) ∧ 1/(at1 ) ≤ n β ∧ (n 2−2β /2) = n 2−2β /2 ; we get from an application of Lemma 9 that the conditional probability of the complementary event (n )c given the xi can be bounded from above by 1 2 xi , c ln n × exp − 3 1+α 2e n i∈I
where I = i : xi ≤ n 2−2β /2 . In the notation of Proposition 4, we have xi2 = n 3 kn−2 μn,m n , f i∈I
with f (x) = x1x<1/2 , and it follows from Proposition 4 and the Porte-manteau Theorem that μn,m n , f converges in probability to some strictly positive constant. As 3 − 2β > 1 + α, we conclude from Fatou’s Lemma that the probability of (n )c tends to 0 as n → ∞, which completes the proof. We are now able to proceed to the proof of Theorem 1(iv). Proof of Theorem 1(iv). We keep the notation of Corollary 3 and pick 2 β ∈ ∨ (1 − 2α/3), β and η ∈ (2 − α − 2β, 2 − α − 2β ). 3
We set m n = n − n β . We have shown in the proof of Theorem 1(iii) that the mean density at time n,m n + n −η is small with high probability. As less than 2n 1−η cars have arrived between n,m n and n,m n + n −η with high probability, we have n,m n + n −η ≤ n,m n +2n 1−η . On the other hand, 1 − η < 2β + α − 1, and thus m n + 2n 1−η ≤ n + n , where n = n 2β+α−1 . According to Corollary 3, saturation does not occur before n,n+n with high probability and since the mean intensity increases at most linearly with unit rate, saturation does not occur either before t for every t < 2 with high probability. 5. Proof of Proposition 4 We still need to establish Proposition 4. This will be achieved by showing first that certain events have a high probability. Let (m n , n ∈ N) be a sequence of integers with 2
n 3 ∨(1−2α/3) ln4/3 n n − m n n.
(10)
We introduce two other sequences ( jn , n ∈ N) and (n , n ∈ N) such that jn n − m n n .
(11)
We will also require these sequences to fulfill certain conditions that will be specified later on. Recall that G n,m and Dn,m denote the first vacant sites at the left and at the right of 0 just after the arrival of the m th car for the dynamics without fires. Consider the events 2 2 , < n /j n,1 = G n,m n > −n 2 /jn2 and Dn,m n n n,2 that no Molotov cocktails are thrown on the arc ] − n 2 /jn2 , n 2 /jn2 ] between the arrivals of the (n − n )th and the m th n cars, i.e. n,2 = Mt ∈] − n 2 /jn2 , n 2 /jn2 ] for all n,n−n ≤ t ≤ n,m n . Finally, set bn =
n , (n − m n ) ln n
and consider the event n,3 that the total number of cars that have arrived on the arc ] − n 2 /jn2 , n 2 /jn2 ] and have been burnt before the arrival of the (n − n )th car is smaller than bn , i.e. ⎧ ⎫ 2 /j 2 n ⎨ n ⎬ δn,n−n (i) ≤ bn . n,3 = ⎩ ⎭ 2 2 i=−n /jn +1
Lemma 10. (i) On the event n,1 ∩ n,2 ∩ n,3 , the total number of cars that have arrived on the arc ]G n,m n , Dn,m [ and have been burnt before the arrival of the n th m n car is smaller than bn , i.e. δn,m n (i) ≤ bn . (12) G n,m n
(ii) Provided that (10) holds, we can choose the sequences ( jn , n ∈ N) and (n , n ∈ N) in such a way that lim P(n,1 ∩ n,2 ∩ n,3 ) = 1.
n→∞
As a consequence, (12) then occurs with high probability Proof.
(i) On the event n,1 ∩n,2 ∩n,3 , no car is burnt on ]G n,m n , Dn,m [ between n car. Note that the sites G and Dn,m the arrivals of the (n − n )th and the m th n n,m n n are vacant for the dynamics without fires at least until time n,m n and thus prevent the propagation of fires started outside ]G n,m n , Dn,m [ to ]G n,m n , Dn,m [ until that n n time. Thus the total number of cars that have arrived on ]G n,m n , Dn,m n [ and have been burnt before n,m n is bounded from above by the number of cars that have arrived on the arc ] − n 2 /jn2 , n 2 /jn2 ] and have been burnt before the arrival of the (n − n )th car, which in turn is at most bn .
(ii) Recall the assumption (10); we shall write as usual kn = n −m n . First, as jn kn , we know from Proposition 3 that P(n,1 ) that can be made as close to 1 as we wish by choosing n sufficiently large. Next, n,2 is the event that n − kn consecutive arrivals occur before a Molotov cocktail is thrown on an arc of length 2n 2 /jn2 , and this has probability P(n,2 ) = 1 −
2n 2−α jn−2
n −kn
n + 2n 2−α jn−2
.
As n kn , if we further impose n 2−α jn−2 n, then we get ln P(n,2 ) ∼ −2n n 1−α jn−2 . Thus P(n,2 ) is as close to 1 as we wish provided that n is large enough and n (1−α)/2 jn
and
n jn2 n α−1 .
(13)
Last, by Lemma 8 and Markov inequality, the probability of n,3 is at least 1 − 2cn 2 jn−2
n 1−α n 2−α kn ln n = 1 − 2c . n bn jn2 n
This is close to 1 when n is large whenever n n 2−α jn−2 kn ln n.
(14)
Recapitulating, the proof will be completed if we check that the requirements (11), (13) and (14) can be fulfilled simultaneously. We can take for instance 1/4 jn = n (3−2α)/4 kn ln n and n = nkn ln n. Indeed, we then have n (1−α)/2 n (3−2α)/4 n (3−2α)/4 kn ln n = jn , 1/4
so the first statement in (13) holds. Next, from our assumption (10), n 1−2α/3 ln4/3 n kn ; raising this inequality to the cube yields jn4 = n 3−2α kn ln4 n kn4 , and hence jn kn . As clearly n kn , we have checked (11). Then we also have 1/2 n = nkn ln n n α−1 n 3/2−α kn ln2 n = n α−1 jn2 , so (13) holds. Finally we observe that n =
nkn ln n =
n 2−α kn ln n 1/2
n 3/2−α kn ln2 n
which shows that (14) is fulfilled.
ln2 n jn−2 n 2−α kn ln n,
Lemma 10 is the key for a useful asymptotic lower bound for the distribution of the occupied interval containing a typical site. Specifically, denote by G n,m and Dn,m the first vacant sites at the left and at the right of 0 just after the arrival of the m th car, and recall that G n,m and Dn,m denote the same quantities for the dynamics without fires. Introduce also the probability measure on (−∞, 0] × [0, ∞), ν(dx, dy) =
$$\frac{1}{\sqrt{2\pi (y - x)^{3}}} \exp(-(y - x)/2)\,dx\,dy, \qquad x < 0 < y,$$
and recall from Proposition 3 that ν is the limiting distribution of (n − m)2 (n − m)2 G , D n,m n,m n2 n2 in the regime n 2/3 n − m n. Lemma 11. For every x < 0 < y, we have (n − m)2 (n − m)2 lim inf P G n,m < x, Dn,m > y ≥ ν((−∞, x) × (y, ∞)) n2 n2 in the regime n, m → ∞ with 2
n 3 ∨(1−2α/3) ln4/3 n n − m n. Proof. We use the same notation as for Lemma 10. Denote by In =]G n,m n , Dn,m [ the n th occupied interval containing 0 after the arrival of the m n car in the dynamics without fires. According to Lemma 3, the event that G n,m n < xn 2 /kn2 and Dn,m n > yn 2 /kn2 occurs whenever [xn 2 /kn2 , yn 2 /kn2 ] ⊆ In and the total number of cars that have arrived on In and have been burnt before n,m n is strictly less than the minimum of Rn,m = Sn,m − S n,m n n n on In . We know from Lemma 10 that in the event with high probability n,1 ∩n,1 ∩n,1 , the number of such cars is bounded by bn = n/(kn ln n). Now recall Proposition 3 and the notations (6) and (7). By the Skohorod representation theorem, we may assume that the convergence stated there holds almost surely. Fix ε > 0 arbitrarily small and observe that conditionally on G < x − ε and D > y + ε, we have min x≤u≤y X u > 0 a.s. Recall that bn n/kn and n −1 kn X n,m (n 2 kn−2 u) n converges to X u uniformly on u ∈ R. We deduce that the conditional probability of min j∈In X n,m ( j) > bn given that [(x − ε)n 2 kn−2 , (y + ε)n 2 kn−2 ] ⊂ In is as close to 1 as n we wish whenever n is sufficiently large. Hence, thanks to a consequence of Lemma 3 that has been mentioned at the beginning of this proof, (n − m n )2 (n − m n )2 lim inf P G < x, D > y n,m n,m n n n2 n2
can be bounded from below by (n − m n )2 (n − m n )2 G n,m n < x − ε, Dn,m n > y + ε, n,1 ∩ n,1 ∩ n,1 . lim inf P n2 n2 By Lemma 10, the latter is identical to (n − m n )2 (n − m n )2 lim inf P G < x − ε, D > y + ε , n,m n n,m n n2 n2
and we know from Proposition 3 that this quantity is given by ν((−∞, x−ε)×(y+ε, ∞)). As ε can be chosen as small as we wish, this establishes our claim. We may now proceed with the proof of Proposition 4. Proof of Proposition 4. By the same argument of propagation of chaos as in the proof of Corollary 2, we just need to establish that after rescaling by the factor (n − m)2 n −2 , the length L n,m of the occupied interval which contains the site 0 converges in distribution to μ. We known from Lemma 2 that Dn,m ≤ Dn,m and G n,m ≤ G n,m , and also from Proposition 3 that the distribution of (n − m)2 (n − m)2 G n,m , Dn,m n2 n2 converges weakly in the regime n 2/3 n − m n towards 1 ν(dx, dy) = exp(−(y − x)/2)dxdy, 2π(y − x)3
x < 0 < y.
This ensures that for every x < 0 < y, (n − m)2 (n − m)2 G < x, D > y ≤ ν((−∞, x) × (y, ∞)). lim sup P n,m n,m n2 n2 The converse lower bound for the lim inf has been established in Lemma 11, which completes the proof of our claim. Acknowledgements. I would like to thank warmly two anonymous referees for their careful work on the first draft of this paper and their pertinent comments. This work has been supported by ANR-08-BLAN-0220-01.
References 1. Aldous, D.J.: Brownian excursions, critical random graphs and the multiplicative coalescent. Ann. Probab. 25, 812–854 (1997) 2. van den Berg, J., Brouwer, R.: Self-organized forest-fires near the critical time. Comm. Math. Phys. 267, 265–277 (2006) 3. van den Berg, J., Járai, A.: On the asymptotic density in a one-dimensional self-organized critical forestfire model. Commun. Math. Phys. 253, 633–644 (2005) 4. Bertoin, J.: Subordinators: Examples and Applications. In: Ecole d’été de Probabilités de St-Flour XXVII, Lect. Notes in Maths 1717, Berlin-Heidelberg-New York: Springer, 1999, pp. 1–91 5. Bertoin, J.: Random Fragmentation and Coagulation Processes. Cambridge Studies in Advanced Mathematics 102, Cambridge: Cambridge University Press, 2006 6. Bertoin, J., Chaumont, L., Pitman, J.: Path transformations of first passage bridges. Electron. Commun. Probab. 8, 155–166 (2003) 7. Bertoin, J., Miermont, G.: Asymptotics in Knuth’s parking problem for caravans. Random Structures Algorithms 29, 38–55 (2006) 8. Bressaud, X., Fournier, N.: Asymptotics of one-dimensional forest fire processes. Ann. Probab. 38, 1783–1816 (2010) 9. Bressaud, X., Fournier, N.: One-dimensional general forest fire processes. Preprint available at http:// arxiv.org/abs/1101.0480v1 [math.PR], 2011 10. Brouwer, R., Pennanen, J.: The cluster size distribution for a forest-fire process on Z. Electron. J. Probab. 11, 1133–1143 (2006) available: http://www.math.washington.edu/∼ejpecp/EjpVol11/paper43.abs.html, 2006 11. Chassaing, Ph., Flajolet, Ph.: Hachage, arbres, chemins & graphes. Gaz. Math. 95, 29–49 (2003)
12. Chassaing, Ph., Louchard, G.: Phase transition for parking blocks, Brownian excursion and coalescence. Random Structures Algorithms 21, 76–119 (2002) 13. Drossel, B., Schwabl, F.: Self-organized critical forest fire model. Phys. Rev. Lett. 69, 1629–1632 (1992) 14. Dürre, M.: Existence of multi-dimensional infinite volume self-organized critical forest-fire models. Electron. J. Probab. 11 (2006), 513–539, Available: http://www.math.washington.edu/∼ejpecp/EjpVol11/ paper21.abs.html, 2006 15. Dürre, M.: Uniqueness of multi-dimensional infinite-volume self-organized critical forest fire models. Electron. Comm. Probab. 11 (2006), 304–315, available: http://www.emis.de/journals/EJP-ECP/_ejpecp/ ECP/include/getdoc83d.pdf, 2006 16. Pittel, B.: Linear probing: the probable largest search time grows logarithmically with the number of records. J. Algorithms 8, 236–249 (1987) 17. Ráth, B., Tóth, B.: Erdös-Rényi random graphs + forest fires = self-organized criticality. Electron. J. Probab. 14 1290–1327 (2009), available: http://www.math.washington.edu/∼ejpecp/viewarticle.php? id=1962&layout, 2009 18. Schenk, K., Drossel, B., Schwabl, F.: Self-organized critical forest-fire model on large scales. Phys. Rev. E 65, 026135-1-8 (2002) 19. Sznitman, A.-S.: Topics in propagation of chaos. In: Ecole d’été de Probabilités de St-Flour XIX, Lect. Notes in Maths 1464, Berlin-Heidelberg-New York: Springer, 1991 Communicated by F. Toninelli
Commun. Math. Phys. 306, 291–331 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1246-5
Communications in
Mathematical Physics
Small BGK Waves and Nonlinear Landau Damping Zhiwu Lin, Chongchun Zeng School of Mathematics, Georgia Institute of Technology, Atlanta, GA 30332, USA. E-mail:
[email protected] Received: 15 March 2010 / Accepted: 18 November 2010 Published online: 5 July 2011 – © Springer-Verlag 2011
Abstract: Consider a 1D Vlasov-Poisson system with a fixed ion background and periodic condition on the space variable. First, we show that for general homogeneous equilibria, within any small neighborhood in the Sobolev space $W^{s,p}$ $(p > 1,\ s < 1 + \frac{1}{p})$ of the steady distribution function, there exist nontrivial travelling wave solutions (BGK waves) with arbitrary minimal period and traveling speed. This implies that nonlinear Landau damping is not true in $W^{s,p}$ $(s < 1 + \frac{1}{p})$ space for any homogeneous equilibria and any spatial period. Indeed, in a $W^{s,p}$ $(s < 1 + \frac{1}{p})$ neighborhood of any homogeneous state, the long time dynamics is very rich, including travelling BGK waves, unstable homogeneous states and their possible invariant manifolds. Second, it is shown that for homogeneous equilibria satisfying Penrose's linear stability condition, there exist no nontrivial travelling BGK waves and unstable homogeneous states in some $W^{s,p}$ $(p > 1,\ s > 1 + \frac{1}{p})$ neighborhood. Furthermore, when $p = 2$, we prove that there exist no nontrivial invariant structures in the $H^{s}$ $(s > \frac{3}{2})$ neighborhood of stable homogeneous states. These results suggest the long time dynamics in the $W^{s,p}$ $(s > 1 + \frac{1}{p})$ and particularly, in the $H^{s}$ $(s > \frac{3}{2})$ neighborhoods of a stable homogeneous state might be relatively simple. We also demonstrate that linear damping holds for initial perturbations in very rough spaces, for a linearly stable homogeneous state. This suggests that the contrasting dynamics in $W^{s,p}$ spaces with the critical power $s = 1 + \frac{1}{p}$ is a truly nonlinear phenomenon which cannot be traced back to the linear level.
1. Introduction
Consider a one-dimensional collisionless electron plasma with a fixed homogeneous neutralizing ion background. The fixed ion background is a good physical approximation since the motion of ions is much slower than electrons. But we consider the fixed
ion mainly to simplify notations and the main results in this paper are also true for electrostatic plasmas with two or more species. The time evolution of such electron plasmas can be modeled by the Vlasov-Poisson system
$$\frac{\partial f}{\partial t} + v \frac{\partial f}{\partial x} - E \frac{\partial f}{\partial v} = 0, \tag{1a}$$
$$\frac{\partial E}{\partial x} = -\int_{-\infty}^{+\infty} f\,dv + 1, \tag{1b}$$
where $f(x, v, t)$ is the electron distribution function, $E(x, t)$ the electric field, and 1 is the ion density. The one-dimensional assumption is proper for a high temperature and dilute plasma immersed in a constant magnetic field oriented in the x-direction. For example, recent discovery by satellites of electrostatic structures near geomagnetic fields can be justified by using such Vlasov-Poisson models ([27,39]). We assume:
1) $f(x, v, t) \ge 0$ and $E(x, t)$ are $T$-periodic in $x$.
2) Neutral condition: $\int_0^T \int_{\mathbb{R}} f(x, v, 0)\,dx\,dv = T$.
3) $\int_0^T E(x, t)\,dx = 0$, so $E(x, t) = -\partial_x \phi(x, t)$, where the electric potential $\phi(x, t)$ is $T$-periodic in $x$.
Since $\int_0^T \int_{\mathbb{R}} f(x, v, t)\,dx\,dv$ is an invariant, the neutral condition 2) is preserved for all time. The condition 3) ensures that $E(t)$ is determined uniquely by $f(t)$ from (1b) and the system (1) can be considered to be an evolution equation of $f$ only. It is shown in [21] that with condition 3), the system (1) is equivalent to the following one-dimensional Vlasov-Maxwell system:
$$\frac{\partial f}{\partial t} + v \frac{\partial f}{\partial x} - E \frac{\partial f}{\partial v} = 0,$$
$$\frac{\partial E}{\partial x} = -\int_{-\infty}^{+\infty} f\,dv + 1,$$
$$\frac{\partial E}{\partial t} = \int_{\mathbb{R}} v f(x, v, t)\,dv - U,$$
where $U$ is the bulk velocity of the ion background. The system (1) is non-dissipative and time-reversible. It has infinitely many equilibria, including the homogeneous states $(f_0(v), 0)$, where $f_0(v)$ is any nonnegative function satisfying $\int_{\mathbb{R}} f_0(v)\,dv = 1$. In 1946, Landau [29], looking for analytical solutions of the linearized Vlasov-Poisson system around the Maxwellian $(e^{-\frac{1}{2} v^{2}}, 0)$, pointed out that the electric field is subject to time decay even in the absence of collisions. The effect of this Landau damping, as it is subsequently called, plays a fundamental role in the study of plasma physics. However, Landau's treatment is in the linear regime; that is, only for infinitesimally small initial perturbations. In the past decade, there has been renewed interest [13,18,19,25,26,28,37,38,47,50] as well as controversy about the Landau damping.
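Condition 3) is what makes $E$ a function of $f$ alone: (1b) fixes $\partial_x E$, and the zero-mean requirement fixes the remaining constant. A minimal Python sketch of this reconstruction on a uniform grid is given below. The grid sizes, the helper names and the test profile $f = (1 + \epsilon \cos x) f_0(v)$ with a Maxwellian normalized to unit mass are assumptions made for the illustration, not constructions from the paper.

```python
import math


def electric_field(f, dx, dv):
    """Recover E on a periodic x-grid from f[j][k] ~ f(x_j, v_k) using (1b).

    Assumes the neutral condition holds, so that the right-hand side of (1b)
    integrates to zero over one period and E is periodic.
    """
    n_x = len(f)
    # Right-hand side of (1b): dE/dx = 1 - integral of f over v (midpoint rule in v).
    rhs = [1.0 - sum(row) * dv for row in f]
    # Cumulative trapezoidal integration in x, starting from an arbitrary constant.
    field = [0.0] * n_x
    for j in range(1, n_x):
        field[j] = field[j - 1] + 0.5 * (rhs[j - 1] + rhs[j]) * dx
    # Condition 3): subtract the mean so that E averages to zero over one period.
    mean = sum(field) / n_x
    return [value - mean for value in field]


def maxwellian(v):
    """Maxwellian normalized so that its integral over v equals 1."""
    return math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


period, n_x, n_v, v_max, eps = 2.0 * math.pi, 256, 400, 8.0, 0.01
dx, dv = period / n_x, 2.0 * v_max / n_v
xs = [j * dx for j in range(n_x)]
vs = [-v_max + (k + 0.5) * dv for k in range(n_v)]   # midpoints of the v-cells
f = [[(1.0 + eps * math.cos(x)) * maxwellian(v) for v in vs] for x in xs]
E = electric_field(f, dx, dv)
# For this profile (1b) gives dE/dx = -eps*cos(x), hence E = -eps*sin(x).
max_error = max(abs(e + eps * math.sin(x)) for e, x in zip(E, xs))
```

For a homogeneous state with unit mass the right-hand side vanishes identically and the routine returns $E \equiv 0$, in line with the equilibria $(f_0(v), 0)$ described above.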
In [13,25], it was shown that there exist certain analytical perturbations for which electric fields decay exponentially in the nonlinear level. More recently, in [38] nonlinear Landau damping was confirmed for general analytical perturbations of stable equilibria with linear exponential decay. For non-analytic perturbations, the linear decay rate of electric fields is known to be only algebraic (i.e. [48]) and the nonlinear damping is more difficult to justify if it is true. Moreover, in the nonlinear regime, it has been known ([42]) that the damping can be prevented by particles trapped in the potential well of the wave. Such particle trapping effect is ignored in Landau’s linearized analysis as well as other physically equivalent linear theories ([12,49]), which assume that the small amplitude of waves have a negligible effect on the evolution of distribution functions. As early as
in 1949, Bohm and Gross ([8]) already recognized the importance of particle trapping effects and the possibility of nonlinear travelling waves of small but constant amplitude. In 1957, Bernstein, Greene and Kruskal ([6]) formalized the ideas of Bohm and Gross and found a general class of exact nonlinear steady inhomogeneous solutions of the Vlasov-Poisson system. Since then, such steady solutions have been known as BGK modes, BGK waves or BGK equilibria. The nontrivial steady waves of this type are made possible by the existence of particles trapped forever within the electrostatic potential wells of the wave. The existence of such undamped waves in any small neighborhood of an equilibrium will certainly imply that nonlinear damping is not true. Furthermore, numerical simulations [9,14,16,36,37] indicate that for certain small initial data near a stable homogeneous state including Maxwellian, there is no decay of electric fields and the asymptotic state is a BGK wave or superposition of BGK waves which were formally constructed in [11]. Moreover, BGK waves also appear as the asymptotic states for the saturation of an unstable homogeneous state ([3]). These suggest that small BGK waves play an important role in understanding the long time behaviors of the Vlasov-Poisson system, near homogeneous equilibria. In this paper, we provide a sharp characterization of the Sobolev spaces in which small BGK waves exist in any small neighborhood of a homogeneous equilibrium. Denote the fractional order Sobolev spaces by $W^{s,p}(\mathbb{R})$ or $W^{s,p}_{x,v}((0, T) \times \mathbb{R})$ with $p \ge 1$, $s \ge 0$. These spaces are the complex interpolation spaces (see [1,46]) of the $L^{p}$ space and the Sobolev space $W^{m,p}$ ($m$ a positive integer).
Theorem 1.1. Assume the homogeneous distribution function $f_0(v) \in W^{s,p}(\mathbb{R})$ $(p > 1,\ s \in [0, 1 + \frac{1}{p}))$ satisfies
$$f_0(v) \ge 0, \qquad \int f_0(v)\,dv = 1, \qquad \int v^{2} f_0(v)\,dv < +\infty.$$
Fix $T > 0$ and $c \in \mathbb{R}$.
Then for any $\varepsilon > 0$, there exist travelling BGK wave solutions of the form $(f_\varepsilon(x - ct, v), E_\varepsilon(x - ct))$ to (1), such that $(f_\varepsilon(x, v), E_\varepsilon(x))$ has minimal period $T$ in $x$, $f_\varepsilon(x, v) \ge 0$, $E_\varepsilon(x)$ is not identically zero, and
$$\|f_\varepsilon - f_0\|_{L^{1}_{x,v}} + \int_0^T \int_{\mathbb{R}} v^{2} |f_\varepsilon(x, v) - f_0(v)|\,dx\,dv + \|f_\varepsilon - f_0\|_{W^{s,p}_{x,v}} < \varepsilon. \tag{2}$$
The first two terms in (2) imply that the BGK wave is close to the homogeneous state $(f_0, 0)$ in the norms of total mass and energy. When $p > 1$, $s = 1$, the fractional Sobolev space is equivalent to the usual Sobolev space $W^{1,p}$. The conclusions in Theorem 1.1 are also true for the Sobolev space $W^{1,1}_{x,v}$ by the same proof. The above theorem immediately implies that nonlinear Landau damping is not true for perturbations in any $W^{s,p}$ $(s < 1 + \frac{1}{p})$ space, for any homogeneous equilibrium in $W^{s,p}$ and any spatial period. As a corollary of the proof, we show that there exist unstable homogeneous states in the $W^{s,p}(\mathbb{R})$ $(s < 1 + \frac{1}{p})$ neighborhood of any homogeneous equilibrium.
Corollary 1.1. Under the assumption of Theorem 1.1, for any fixed $T > 0$, $\exists\, \varepsilon_0 > 0$, such that for any $0 < \varepsilon < \varepsilon_0$, there exists a homogeneous state $(f_\varepsilon(v), 0)$ which is linearly unstable under perturbations of $x$-period $T$,
$$f_\varepsilon(v) \ge 0, \qquad \int_{\mathbb{R}} f_\varepsilon(v)\,dv = 1,$$
and
$$\|f_\varepsilon(v) - f_0(v)\|_{L^{1}(\mathbb{R})} + \int_{\mathbb{R}} v^{2} |f_\varepsilon(v) - f_0(v)|\,dv + \|f_\varepsilon(v) - f_0(v)\|_{W^{s,p}(\mathbb{R})} < \varepsilon.$$
By the above Corollary and Remark 2.1 following the proof of Theorem 1.1, in the $W^{s,p}_{x,v}$ $(s < 1 + \frac{1}{p})$ neighborhood of any homogeneous state there exist lots of unstable homogeneous states and unstable nontrivial BGK waves. In a work in progress, we are constructing stable and unstable manifolds near an unstable equilibrium of the Vlasov-Poisson system by extending our work ([33]) on invariant manifolds of Euler equations. Such (possible) invariant manifolds might reveal more complicated global invariant structures such as heteroclinic or homoclinic orbits. Moreover, in some physical reference ([11]), small BGK waves are formally shown to follow a nonlinear superposition principle to form time-periodic or quasi-periodic orbits. We note that Maxwellian or any homogeneous equilibria $f_0(v) = \mu(\frac{1}{2} v^{2})$ with $\mu$ monotonically decreasing, were shown by Newcomb in the 1950s (see Appendix I, pp. 20–21 of [7]) to be nonlinearly stable in the norm $\|f\|_{L^{2}}$. So our result suggests, in particular, that in any invariant small $L^{2}$ neighborhood of the Maxwellian, the long time dynamical behaviors are very rich. The following theorem shows that there exist no nontrivial BGK waves near a stable homogeneous state in $W^{s,p}_{x,v}$ space when $p > 1$, $s > 1 + \frac{1}{p}$.
Theorem 1.2. Assume $f_0(v) \in W^{s,p}(\mathbb{R})$ $(p > 1,\ s > 1 + \frac{1}{p})$. Let $S = \{v_i\}_{i=1}^{l}$ be the set of all extrema points of $f_0$. Let $0 < T_0 \le +\infty$ be defined by
$$\Big(\frac{2\pi}{T_0}\Big)^{2} = \max\Big\{0,\ \max_{v_i \in S} \int \frac{f_0'(v)}{v - v_i}\,dv\Big\}. \tag{3}$$
Then for any $T < T_0$, $\exists\, \varepsilon_0(T) > 0$, such that there exist no nontrivial travelling wave solutions $(f(x - ct, v), E(x - ct))$ to (1) for any $c \in \mathbb{R}$, satisfying that $(f(x, v), E(x))$ has period $T$ in $x$, $E(x)$ not identically 0,
$$\int_0^T \int_{\mathbb{R}} v^{2} f(x, v)\,dv\,dx < \infty \quad \text{(assumption of finite energy)},$$
and $\|f - f_0\|_{W^{s,p}_{x,v}} < \varepsilon_0$.
By Penrose's stability criterion ([41] or Lemma 6.1 in the Appendix) the homogeneous equilibrium $(f_0(v), 0)$ is linearly stable to perturbations of $x$-period $T < T_0$. Moreover, in Propositions 4.1 and 4.2, the linear damping of the electrical field is shown for such stable states in a rough function space. Theorems 1.1 and 1.2 imply that for any $p > 1$, $s = 1 + \frac{1}{p}$ is the critical index for existence or non-existence of small BGK waves in the $W^{s,p}$ neighborhood of a stable homogeneous state. In Lemma 3.2, we show that the stability condition $0 < T < T_0$ is in some sense also necessary for the above non-existence result in $W^{s,p}$ $(s > 1 + \frac{1}{p})$. The following corollary shows that all homogeneous equilibria in a sufficiently small $W^{s,p}(\mathbb{R})$ $(s > 1 + \frac{1}{p})$ neighborhood of a stable homogeneous state remain linearly stable. With Corollary 1.1, it implies that $s = 1 + \frac{1}{p}$ is also the critical index for persistence of linear stability of homogeneous states under perturbations in $W^{s,p}(\mathbb{R})$ space.
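The quantity entering (3) is easy to evaluate numerically for a given profile. The Python sketch below approximates $\int f_0'(v)/(v - v_i)\,dv$ by a midpoint rule on a grid symmetric about $v_i$ and then reads off $T_0$. The function names, the truncation width and the step count are illustrative assumptions; the example profile is a Maxwellian normalized to unit mass, whose only extremum is $v = 0$. Since $f_0'(v_i) = 0$ at an extremum, the integrand has no genuine singularity there, and the symmetric midpoint grid never evaluates it at $v_i$ itself.

```python
import math


def penrose_integral(f0_prime, v_i, half_width=12.0, steps=20_000):
    """Midpoint approximation of the integral of f0'(v) / (v - v_i) over [v_i - L, v_i + L]."""
    if steps % 2:
        raise ValueError("steps must be even so that no midpoint coincides with v_i")
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        v = v_i - half_width + (k + 0.5) * h
        total += f0_prime(v) / (v - v_i)
    return total * h


def critical_period(f0_prime, extrema, **kwargs):
    """T_0 from (3): (2*pi/T_0)**2 = max(0, max over extrema of the integral)."""
    largest = max([0.0] + [penrose_integral(f0_prime, v_i, **kwargs) for v_i in extrema])
    return math.inf if largest <= 0.0 else 2.0 * math.pi / math.sqrt(largest)


def maxwellian_prime(v):
    """Derivative of the Maxwellian exp(-v**2/2) / sqrt(2*pi)."""
    return -v * math.exp(-0.5 * v * v) / math.sqrt(2.0 * math.pi)


integral = penrose_integral(maxwellian_prime, 0.0)   # equals minus the total mass
T0 = critical_period(maxwellian_prime, [0.0])
```

For the Maxwellian the integrand reduces to $-f_0(v)$, so the integral is $-1$ and $T_0 = +\infty$: every period $T$ satisfies $T < T_0$, and the right-hand side of (3) can never equal $(2\pi/T)^{2}$ for a finite $T$.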
Corollary 1.2. Assume $f_0(v) \in W^{s,p}(\mathbb{R})$ $(p > 1,\ s > 1 + \frac{1}{p})$. Let $S = \{v_i\}_{i=1}^{l}$ be the set of all extrema points of $f_0$ and $T_0$ be defined in (3). Then for any $T < T_0$, $\exists\, \varepsilon_0(T) > 0$ such that any homogeneous state $(f(v), 0)$ satisfying $\|f(v) - f_0(v)\|_{W^{s,p}(\mathbb{R})} < \varepsilon_0$ is linearly stable under perturbations of $x$-period $T$.
Theorem 1.2 and the above corollary suggest that the dynamical structures in the small $W^{s,p}$ $(s > 1 + \frac{1}{p})$ neighborhood of a stable homogeneous equilibrium might be relatively simple, since the only nearby steady structures, including travelling waves, are stable homogeneous states. The physical implication of Theorem 1.2 is that when the initial perturbation is small in $W^{s,p}$ $(s > 1 + \frac{1}{p})$, the potential well of the wave is unable to trap particles forever to form BGK waves. So the particles will get out of the potential well sooner or later and perform free flights, and the linear damping effect can then manifest itself at a nonlinear level. Furthermore, when $p = 2$, we get a much stronger result that any invariant structure near a stable homogeneous state in $H^{s}$ $(s > \frac{3}{2})$ space must be trivial, that is, the electric field is identically zero.
Theorem 1.3. Assume the homogeneous profile $f_0(v) \in H^{s}(\mathbb{R})$ $(s > \frac{3}{2})$. For any $T < T_0$ (defined by (3)), there exists $\varepsilon_0 > 0$, such that if $(f(t), E(t))$ is a solution to the nonlinear VP equation (1a)–(1b) and
$$\|f(t) - f_0\|_{L^{2}_{x} H^{s}_{v}} < \varepsilon_0 \quad \text{for all } t \in \mathbb{R}, \tag{4}$$
then $E(t) \equiv 0$ for all $t \in \mathbb{R}$.
The space $L^{2}_{x} H^{s}_{v}$ contains the Sobolev space $H^{s}_{x,v}$. The above theorem excludes any nontrivial invariant structure, such as quasi-periodic solutions and heteroclinic (homoclinic) orbits, in the $H^{s}$ $(s > \frac{3}{2})$ neighborhood of a stable homogeneous state. In Theorem 5.1, we also show that nonlinear decay of the electric field is true for any positive or negative invariant structure (see Sect. 5 for definition) in the $H^{s}$ $(s > \frac{3}{2})$ neighborhood of stable homogeneous states. These results reveal that contrary to the $H^{s}$ $(s < \frac{3}{2})$ case, there exist no obstacles of nontrivial invariant structures in the $H^{s}$ $(s > \frac{3}{2})$ neighborhood of stable homogeneous states to prevent nonlinear Landau damping. We note that Theorems 1.1, 1.2 and 1.3 about the contrasting nonlinear dynamics in $W^{s,p}$ spaces with $s < 1 + \frac{1}{p}$ or $s > 1 + \frac{1}{p}$ (particularly when $p = 2$), have no analogue at the linear level. Indeed, under Penrose's stability condition, it is shown in Sect. 4 that the linear decay of electrical fields holds true for very rough initial data; in particular, no derivatives of $f(t = 0)$ are required for linear damping. We refer to Propositions 4.1 and 4.2, as well as Remark 4.3 in Sect. 4 for more details. This shows once again the importance of particle trapping effects on nonlinear dynamics, which are completely ignored at the linear level. Finally, we briefly describe the main ideas in the proof of Theorems 1.1, 1.2 and 1.3. For simplicity, we look at steady BGK waves. The first attempt would be to construct BGK waves near $(f_0(v), 0)$ directly by the bifurcation theory. However, this requires a bifurcation condition: for bifurcation period $T > 0$,
$$\Big(\frac{2\pi}{T}\Big)^{2} = \int_{\mathbb{R}} \frac{f_0'(v)}{v}\,dv. \tag{5}$$
For general homogeneous equilibria and period $T$, the bifurcation condition (5) is not satisfied. For example, for a Maxwellian, this condition fails for any $T > 0$. Our strategy is to modify $f_0(v)$ to get a nearby homogeneous state satisfying (5) and then do bifurcation near this modified state. In the modification step, we introduce two parameters: one is to obtain (5) and the other is to ensure that the modification results in a small change of the $W^{s,p}$ $(s < 1 + \frac{1}{p})$ norm. For the proof of non-existence of travelling waves in $W^{s,p}$ $(s > 1 + \frac{1}{p})$, our idea is to get a second order equation for the electric field $E(x)$ from the steady Vlasov-Poisson equations and show that the integral form of this equation is not compatible when $T < T_0$ and the perturbation is small in $W^{s,p}$ $(s > 1 + \frac{1}{p})$. Interestingly, $T_0$ (defined by (3)) is exactly the critical period for linear stability by Penrose's criterion, which is also used in the proof of Corollaries 1.1 and 1.2. To prove Theorem 1.3, we use the integral form of the linear decay estimate (Proposition 4.1) and the $H^s$ $(s > \frac{3}{2})$ invariant assumption to obtain similar nonlinear decay estimates in the integral form. From such integral estimates, we can show the homogeneous nature of the invariant structures and the decay of the electric field for semi-invariant solutions.

Here we are in a position to offer a conceptual explanation of why $s = 1 + \frac{1}{p}$ appears as the critical Sobolev exponent for the existence of small BGK waves and possibly also in nonlinear Landau damping. By Penrose's stability criterion, the critical spatial period $T_0$ for linear stability of $(f_0(v), 0)$ is determined in (3) by the integrals $\int \frac{f_0'(v)}{v - c_i}\, dv$, where the $c_i$ are critical points of $f_0$. These integrals are controlled by $\|f_0\|_{W^{s,p}}$ if $s > 1 + \frac{1}{p}$, but not if $s < 1 + \frac{1}{p}$. In the latter case, a small homogeneous perturbation to $f_0$ in the $W^{s,p}$ space may dramatically change its stability for any fixed spatial period $T$. Due to this change of stability, bifurcations occur and produce small BGK waves and possibly other complicated structures. In the opposite case when $s > 1 + \frac{1}{p}$, small homogeneous perturbations do not change the stability of $(f_0(v), 0)$, and therefore the bifurcation of nontrivial waves cannot occur.

The result of this paper has also been extended to a related problem of inviscid decay of the Couette flow $\vec{v}_0 = (y, 0)$ of the 2D Euler equations. The linear decay of the vertical velocity near the Couette flow was already known to Orr ([40]) in 1907. This inviscid decay problem is important for understanding the formation of coherent structures in 2D turbulence. In [34], we are able to obtain similar results near the Couette flow.

This paper is organized as follows. In Sect. 2, we prove the existence result in $W^{s,p}$ $(s < 1 + \frac{1}{p})$. In Sect. 3, non-existence of BGK waves in $W^{s,p}$ $(s > 1 + \frac{1}{p})$ is shown. In Sect. 4, we study the linear damping problem in Sobolev spaces. In Sect. 5, we use the linear decay estimate of Sect. 4 to show that all invariant structures in $H^s$ $(s > \frac{3}{2})$ are trivial. The Appendix reformulates Penrose's linear stability criterion used in this paper. Throughout this paper, we use $C$ to denote a generic constant in the estimates, and the dependence of $C$ is indicated only when it matters in the proof.

2. Existence of BGK Waves in $W^{s,p}$ $(s < 1 + \frac{1}{p})$

In this section, we construct small BGK waves near any homogeneous state in the space $W^{s,p}$ $(s < 1 + \frac{1}{p})$. Our strategy is to first construct BGK waves near proper smooth homogeneous states. Then we show that any homogeneous state can be approximated by such smooth states in $W^{s,p}$.
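As a quick check of the remark in the Introduction that the bifurcation condition (5) fails for a Maxwellian, the following computation uses the normalized profile $f_0(v) = (2\pi)^{-1/2} e^{-v^2/2}$. The normalization is an assumption made only for this illustration.

```latex
\[
f_0(v)=\frac{1}{\sqrt{2\pi}}\,e^{-v^2/2}
\;\Longrightarrow\;
\frac{f_0'(v)}{v}=-f_0(v),
\qquad
\int_{\mathbb R}\frac{f_0'(v)}{v}\,dv=-\int_{\mathbb R}f_0(v)\,dv=-1<0<\Big(\frac{2\pi}{T}\Big)^{2}.
\]
```

The right-hand side of (5) is therefore negative for every $T > 0$, which is why a modification of $f_0$ is needed before any bifurcation argument can start.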
Lemma 2.1. Assume $u(x) \in C^\infty(\mathbb{R})$, $\operatorname{supp} u \subset [-b, b]$, and $u(x)$ is even. Then there exists $g \in C^\infty(\mathbb{R})$, $\operatorname{supp} g \subset [-\sqrt{b}, \sqrt{b}]$, such that $u(x) = g(x^2)$.

Proof. The proof is essentially given in [24, p. 394]. We repeat it here for completeness. When $k$ is odd, since $u^{(k)}(x)$ is odd we have $u^{(k)}(0) = 0$. By Theorem 1.2.6 in [24], we can choose $g_0 \in C_0^\infty(-\sqrt{b}, \sqrt{b})$ with the Taylor expansion $\sum_k u^{(2k)}(0)\, x^k/(2k)!$. Then all derivatives of $u_1(x) = u(x) - g_0(x^2)$ vanish at $0$. Define
\[
g(x) = \begin{cases} g_0(x) + u_1(\sqrt{x}) & \text{if } x > 0, \\ g_0(x) & \text{if } x \le 0. \end{cases}
\]
Then $g(x)$ satisfies all the required properties. In particular, $g(x)$ is $C^\infty$ at $x = 0$ because all derivatives of $u_1(x)$ vanish there.
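To see why the Taylor coefficients of $g_0$ are the ones chosen in the proof, one can compare formal expansions. The Gaussian below is a hypothetical example that ignores the compact-support requirement, which only affects the choice of $g$ away from $0$.

```latex
\[
u(x)\sim\sum_{k\ge 0}\frac{u^{(2k)}(0)}{(2k)!}\,x^{2k}
\quad\text{(odd derivatives vanish at } 0\text{)}
\;\Longrightarrow\;
g(y)\sim\sum_{k\ge 0}\frac{u^{(2k)}(0)}{(2k)!}\,y^{k},
\qquad u(x)=g(x^{2}).
\]
\[
\text{Example: } u(x)=e^{-x^{2}}\ \text{gives}\ g(y)=e^{-y},\qquad
\frac{u^{(2k)}(0)}{(2k)!}=\frac{(-1)^{k}}{k!}=\frac{g^{(k)}(0)}{k!}.
\]
```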
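Proposition 2.1. Assume $f_0(v) \in C^\infty(\mathbb{R}) \cap W^{2,p}(\mathbb{R})$ $(p > 1)$, $f_0$ is even near $v = 0$, and
\[
f_0 > 0, \qquad \int_{\mathbb{R}} f_0(v)\, dv = 1, \qquad \int_{\mathbb{R}} v^2 f_0(v)\, dv < \infty.
\]
Then for any fixed $s < 1 + \frac{1}{p}$, $T > 0$, and any $\varepsilon > 0$, there exist steady BGK solutions of the form $(f_\varepsilon(x, v), E_\varepsilon(x))$ to (1), such that $(f_\varepsilon(x, v), E_\varepsilon(x))$ has period $T$ in $x$, $f_\varepsilon(x, v) > 0$, $E_\varepsilon(x)$ is not identically zero, and
\[
\|f_\varepsilon - f_0\|_{L^1_{x,v}} + \int_0^T\!\!\int_{\mathbb{R}} v^2 |f_0 - f_\varepsilon|\, dx\, dv + \|f_\varepsilon - f_0\|_{W^{s,p}_{x,v}} < \varepsilon. \tag{6}
\]

Proof. Assume $f_0(v)$ is even in $[-2a, 2a]$ $(a > 0)$. Let $\sigma(x) = \sigma(|x|)$ be the cut-off function such that
\[
\sigma \in C_0^\infty(\mathbb{R}), \quad 0 \le \sigma(x) \le 1; \quad \sigma(x) = 1 \text{ when } |x| \le 1; \quad \sigma(x) = 0 \text{ when } |x| \ge 2. \tag{7}
\]
By Lemma 2.1, there exists $g_0(x) \in C^\infty(\mathbb{R})$, $\operatorname{supp} g_0 \subset [-\sqrt{2a}, \sqrt{2a}]$, such that
\[
f_0(v)\, \sigma\!\left(\frac{v}{a}\right) = g_0(v^2).
\]
Define $g_+(x), g_-(x) \in C^\infty(\mathbb{R})$ by
\[
g_\pm(x) = \begin{cases}
f_0(\pm\sqrt{x}) \left(1 - \sigma\!\left(\frac{\sqrt{x}}{a}\right)\right) + g_0(x) & \text{if } x > \sqrt{a}, \\
g_0(x) & \text{if } -\sqrt{2a} < x \le \sqrt{a}, \\
0 & \text{if } x \le -\sqrt{2a}.
\end{cases}
\]
Then
\[
f_0(v) = \begin{cases} g_+(v^2) & \text{if } v > 0, \\ g_-(v^2) & \text{if } v \le 0. \end{cases}
\]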
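Since $f_0'(0) = 0$ and $f_0 \in W^{2,p}(\mathbb{R}) \cap C^\infty(\mathbb{R})$, we have $\int_{\mathbb{R}} \left|\frac{f_0'(v)}{v}\right| dv < \infty$. Indeed, with $p' = \frac{p}{p-1}$ the conjugate exponent of $p$,
\[
\int_{\mathbb{R}} \left|\frac{f_0'(v)}{v}\right| dv
\le \int_{|v| \le 1} \left|\frac{f_0'(v)}{v}\right| dv + \int_{|v| \ge 1} \left|\frac{f_0'(v)}{v}\right| dv
\le 2 \max_{|v| \le 1} |f_0''(v)| + \left(\int_{|v| \ge 1} \frac{1}{|v|^{p'}}\, dv\right)^{\frac{1}{p'}} \|f_0'\|_{L^p}
= 2 \max_{|v| \le 1} |f_0''(v)| + \left(\frac{2}{p' - 1}\right)^{\frac{1}{p'}} \|f_0'\|_{L^p} < \infty.
\]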
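We consider three cases.

Case 1. $\int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv < \left(\frac{2\pi}{T}\right)^2$. Choose a function $F(v) \in C^\infty(\mathbb{R})$ such that $F \in W^{2,p}(\mathbb{R})$, $F(v)$ is even, and
\[
F(v) > 0, \quad \int_{\mathbb{R}} F(v)\, dv < \infty, \quad \int_{\mathbb{R}} v^2 F(v)\, dv < \infty, \quad \int_{\mathbb{R}} \frac{F'(v)}{v}\, dv > 0. \tag{8}
\]
An example of such a function is given by
\[
F(v) = \exp\!\left(-\frac{(v - v_0)^2}{2}\right) + \exp\!\left(-\frac{(v + v_0)^2}{2}\right),
\]
where $v_0$ is a large positive constant. Indeed,
\[
\int_{\mathbb{R}} \frac{F'(v)}{v}\, dv = \int_{\mathbb{R}} \frac{F(v) - F(0)}{v^2}\, dv > 0 \quad \text{when } v_0 \text{ is large enough},
\]
and the other properties in (8) are easy to check. Since $F(v)$ is even, by Lemma 2.1, there exists $G(x) \in C^\infty(\mathbb{R})$ such that $F(v) = G(v^2)$. Let $\gamma, \delta > 0$ be two small parameters to be fixed, and define
\[
f_{\gamma,\delta}(v) = \frac{1}{1 + C_0 \gamma^2} \left[ f_0(v) + \frac{\gamma}{\delta} F\!\left(\frac{v}{\gamma\delta}\right) \right], \tag{9}
\]
where $C_0 = \int_{\mathbb{R}} F(v)\, dv > 0$. Note that $f_{\gamma,\delta} \in C^\infty(\mathbb{R}) \cap W^{2,p}(\mathbb{R})$, $\int_{\mathbb{R}} f_{\gamma,\delta}(v)\, dv = 1$, and
\[
\int_{\mathbb{R}} \frac{f_{\gamma,\delta}'(v)}{v}\, dv = \frac{1}{1 + C_0 \gamma^2} \left[ \int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv + \frac{1}{\delta^2} \int_{\mathbb{R}} \frac{F'(v)}{v}\, dv \right].
\]
Since $\int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv < \left(\frac{2\pi}{T}\right)^2$, there exist $0 < \delta_1 < \delta_2$ such that
\[
0 < \int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv + \frac{1}{\delta_2^2} \int_{\mathbb{R}} \frac{F'(v)}{v}\, dv < \left(\frac{2\pi}{T}\right)^2 < \int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv + \frac{1}{\delta_1^2} \int_{\mathbb{R}} \frac{F'(v)}{v}\, dv.
\]
Thus there exists $\gamma_0 > 0$ small enough such that
\[
0 < \int_{\mathbb{R}} \frac{f_{\gamma,\delta_2}'(v)}{v}\, dv < \left(\frac{2\pi}{T}\right)^2 < \int_{\mathbb{R}} \frac{f_{\gamma,\delta_1}'(v)}{v}\, dv, \quad \text{when } 0 < \gamma < \gamma_0. \tag{10}
\]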
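The sign condition in (8) for the two-bump example can be checked numerically. The sketch below is only an illustration: the function names, the midpoint rule and the sample values of $v_0$ are choices made here, not part of the paper. It evaluates $\int F'(v)/v\, dv$ on a truncated interval, using an even number of cells so that no quadrature node lands on $v = 0$.

```python
import math


def dF_over_v(v, v0):
    """F'(v)/v for F(v) = exp(-(v - v0)^2/2) + exp(-(v + v0)^2/2)."""
    dF = (-(v - v0) * math.exp(-0.5 * (v - v0) ** 2)
          - (v + v0) * math.exp(-0.5 * (v + v0) ** 2))
    return dF / v


def integral_dF_over_v(v0, cells=40000):
    """Midpoint rule on [-L, L] with L = v0 + 12 (tails are negligible there).

    With an even cell count, v = 0 is a cell edge, never a midpoint.
    """
    L = v0 + 12.0
    h = 2.0 * L / cells
    return h * sum(dF_over_v(-L + (i + 0.5) * h, v0) for i in range(cells))


if __name__ == "__main__":
    for v0 in (0.0, 1.0, 3.0, 5.0):
        print(v0, integral_dF_over_v(v0))
```

For $v_0 = 0$ the function reduces to $F = 2e^{-v^2/2}$, for which the integral equals $-2\sqrt{2\pi}$; this is the sign used in Case 2. Once the two bumps are well separated the integral becomes positive, as Case 1 requires.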
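We look for steady BGK waves near the homogeneous states $(f_{\gamma,\delta}(v), 0)$. Consider a steady BGK solution $(f^0(x, v), E^0(x) = -\beta_x(x))$ to (1). Denote by $e = \frac{1}{2} v^2 - \beta(x)$ the particle energy. From the steady Vlasov equation, $f^0(x, v)$ is constant along each particle trajectory. So for trapped particles with $-\max \beta < e < -\min \beta$, $f^0$ depends only on $e$, and for free particles with $e > -\min \beta$, $f^0$ depends on $e$ and the sign of the initial velocity $v$. We look for BGK waves near $(f_{\gamma,\delta}, 0)$ of the form
\[
f^{\beta}_{\gamma,\delta}(x, v) = \begin{cases}
\dfrac{1}{1 + C_0\gamma^2} \left[ g_+(2e) + \dfrac{\gamma}{\delta}\, G\!\left(\dfrac{2e}{(\gamma\delta)^2}\right) \right] & \text{if } v > 0, \\[2ex]
\dfrac{1}{1 + C_0\gamma^2} \left[ g_-(2e) + \dfrac{\gamma}{\delta}\, G\!\left(\dfrac{2e}{(\gamma\delta)^2}\right) \right] & \text{if } v \le 0.
\end{cases} \tag{11}
\]
For $\|\beta\|_{L^\infty}$ sufficiently small, $f^{\beta}_{\gamma,\delta}(x, v) > 0$ and it satisfies the steady Vlasov equation, since in particular for trapped particles $f^{\beta}_{\gamma,\delta}(x, v) = f^{\beta}_{\gamma,\delta}(x, -v)$. To satisfy Poisson's equation, we solve the ODE
\[
\beta_{xx} = \int_{\mathbb{R}} f^{\beta}_{\gamma,\delta}(x, v)\, dv - 1
= \frac{1}{1 + C_0\gamma^2} \left[ \int_{v > 0} g_+(2e)\, dv + \int_{v \le 0} g_-(2e)\, dv + \frac{\gamma}{\delta} \int_{\mathbb{R}} G\!\left(\frac{2e}{(\gamma\delta)^2}\right) dv \right] - 1
:= h_{\gamma,\delta}(\beta).
\]
Then $h_{\gamma,\delta} \in C^\infty(\mathbb{R})$. Since $f^{\beta=0}_{\gamma,\delta}(x, v) = f_{\gamma,\delta}(v)$, we have
\[
h_{\gamma,\delta}(0) = \int_{\mathbb{R}} f_{\gamma,\delta}(v)\, dv - 1 = 0
\]
and
\[
h_{\gamma,\delta}'(0) = \frac{-2}{1 + C_0\gamma^2} \left[ \int_{v > 0} g_+'(v^2)\, dv + \int_{v \le 0} g_-'(v^2)\, dv + \frac{\gamma}{\delta} \frac{1}{(\gamma\delta)^2} \int_{\mathbb{R}} G'\!\left(\frac{v^2}{(\gamma\delta)^2}\right) dv \right]
= -\int_{\mathbb{R}} \frac{f_{\gamma,\delta}'(v)}{v}\, dv.
\]
Thus when $0 < \gamma < \gamma_0$ and $\delta_1 < \delta < \delta_2$, we have $h_{\gamma,\delta}'(0) < 0$, which implies that $\beta = 0$ is a center of the second order ODE
\[
\beta_{xx} = h_{\gamma,\delta}(\beta). \tag{12}
\]
So by the standard bifurcation theory of periodic solutions near a center, for any fixed $\gamma \in (0, \gamma_0)$, there exists $r_0 > 0$ (independent of $\delta \in (\delta_1, \delta_2)$) such that for each $0 < r < r_0$, there exists a $T(\gamma, \delta; r)$-periodic solution $\beta_{\gamma,\delta;r}$ to the ODE (12) with $\|\beta_{\gamma,\delta;r}\|_{H^2(0, T(\gamma,\delta;r))} = r$ and
\[
\left(\frac{2\pi}{T(\gamma, \delta; r)}\right)^2 \to \int_{\mathbb{R}} \frac{f_{\gamma,\delta}'(v)}{v}\, dv, \quad \text{when } r \to 0.
\]
By (10), when $r$ is small enough,
\[
T(\gamma, \delta_1; r) < T < T(\gamma, \delta_2; r).
\]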
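The fact that small periodic orbits near a center $\beta = 0$ of $\beta_{xx} = h(\beta)$ have period close to $2\pi/\sqrt{-h'(0)}$ can be illustrated numerically. The sketch below uses a hypothetical nonlinearity $h(\beta) = -k^2\beta + \beta^2$ (not the $h_{\gamma,\delta}$ of the proof) and a hand-written RK4 integrator; `period_near_center` is a name introduced here for illustration.

```python
import math


def period_near_center(h, r, dt=1e-3, t_max=100.0):
    """Estimate the period of beta'' = h(beta) with beta(0) = r, beta'(0) = 0.

    The orbit is symmetric about its turning points, so the time from the
    initial turning point to the next zero of beta' is half a period.
    """
    b, p, t = r, 0.0, 0.0
    while t < t_max:
        # One classical RK4 step for the first-order system (b, p)' = (p, h(b)).
        k1b, k1p = p, h(b)
        k2b, k2p = p + 0.5 * dt * k1p, h(b + 0.5 * dt * k1b)
        k3b, k3p = p + 0.5 * dt * k2p, h(b + 0.5 * dt * k2b)
        k4b, k4p = p + dt * k3p, h(b + dt * k3b)
        b_new = b + dt * (k1b + 2 * k2b + 2 * k3b + k4b) / 6.0
        p_new = p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6.0
        if p != 0.0 and p * p_new <= 0.0:
            # Linear interpolation for the time at which beta' changes sign.
            t_half = t + dt * (-p) / (p_new - p)
            return 2.0 * t_half
        b, p, t = b_new, p_new, t + dt
    raise RuntimeError("no turning point found before t_max")


if __name__ == "__main__":
    k = 2.0
    for r in (0.2, 0.05, 0.01):
        T_r = period_near_center(lambda b: -k * k * b + b * b, r)
        print(r, T_r, 2.0 * math.pi / k)
```

As the amplitude $r$ decreases, the measured period approaches $2\pi/k$, which mirrors the limit $(2\pi/T(\gamma,\delta;r))^2 \to -h_{\gamma,\delta}'(0)$ used in the proof.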
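Since $T(\gamma, \delta; r)$ is continuous in $\delta$, for each $\gamma, r > 0$ small enough, there exists $\delta_T(\gamma, r) \in (\delta_1, \delta_2)$ such that $T(\gamma, \delta_T; r) = T$. Define $f^T_{\gamma,r}(x, v) = f^{\beta}_{\gamma,\delta_T}(x, v)$ from (11) by setting $\beta = \beta_{\gamma,\delta_T;r}$, and let $E_{\gamma,r}(x) = -\partial_x \beta_{\gamma,\delta_T;r}(x)$. Then $(f^T_{\gamma,r}(x, v), E_{\gamma,r}(x))$ is a nontrivial BGK solution to (1) with $x$-period $T$. For any fixed $\gamma > 0$, let
\[
\delta(\gamma) = \lim_{r \to 0} \delta_T(\gamma, r) \in [\delta_1, \delta_2].
\]
By the dominated convergence theorem, it is easy to show that
\[
\|f^T_{\gamma,r} - f_{\gamma,\delta(\gamma)}\|_{L^1_{x,v}} + \int_0^T\!\!\int_{\mathbb{R}} v^2 \left|f^T_{\gamma,r}(x, v) - f_{\gamma,\delta(\gamma)}(v)\right| dx\, dv + \|f^T_{\gamma,r} - f_{\gamma,\delta(\gamma)}\|_{W^{2,p}_{x,v}} \to 0,
\]
when $r = \|\beta_{\gamma,\delta_T;r}\|_{H^2(0,T)} \to 0$. Since $s < 1 + \frac{1}{p} < 2$, for any $\gamma > 0$ small, there exists $r = r(\gamma, \varepsilon) > 0$ such that
\[
\|f^T_{\gamma,r} - f_{\gamma,\delta(\gamma)}\|_{L^1_{x,v}} + \int_0^T\!\!\int_{\mathbb{R}} v^2 \left|f^T_{\gamma,r}(x, v) - f_{\gamma,\delta(\gamma)}(v)\right| dx\, dv + \|f^T_{\gamma,r} - f_{\gamma,\delta(\gamma)}\|_{W^{s,p}_{x,v}} < \frac{\varepsilon}{2}.
\]
Next, we show that the modified homogeneous state $f_{\gamma,\delta(\gamma)}(v)$ is arbitrarily close to $f_0(v)$ in the sense that
\[
\|f_0 - f_{\gamma,\delta(\gamma)}\|_{L^1} + T \int_{\mathbb{R}} v^2 \left|f_0(v) - f_{\gamma,\delta(\gamma)}(v)\right| dv + \|f_0 - f_{\gamma,\delta(\gamma)}\|_{W^{s,p}_{x,v}} \to 0,
\]
when $\gamma \to 0$. Note that the deviation is
\[
f_0(v) - f_{\gamma,\delta(\gamma)}(v) = \frac{1}{1 + C_0\gamma^2} \left[ C_0 \gamma^2 f_0(v) - \frac{\gamma}{\delta} F\!\left(\frac{v}{\gamma\delta}\right) \right].
\]
Since $\delta(\gamma) \in [\delta_1, \delta_2]$, when $\gamma \to 0$,
\[
\begin{aligned}
\int_{\mathbb{R}} \frac{\gamma}{\delta(\gamma)} F\!\left(\frac{v}{\gamma\delta(\gamma)}\right) dv &= \gamma^2 \int_{\mathbb{R}} F(v)\, dv \to 0, \\
\int_{\mathbb{R}} v^2\, \frac{\gamma}{\delta(\gamma)} F\!\left(\frac{v}{\gamma\delta(\gamma)}\right) dv &= \gamma^4 \delta(\gamma)^2 \int_{\mathbb{R}} v^2 F(v)\, dv \to 0, \\
\left\| \frac{\gamma}{\delta(\gamma)} F\!\left(\frac{v}{\gamma\delta(\gamma)}\right) \right\|_{L^p} &= \gamma^{1 + \frac{1}{p}}\, \delta(\gamma)^{\frac{1}{p} - 1} \|F\|_{L^p} \to 0, \\
\left\| \frac{d}{dv}\, \frac{\gamma}{\delta(\gamma)} F\!\left(\frac{v}{\gamma\delta(\gamma)}\right) \right\|_{L^p} &= \gamma^{\frac{1}{p}}\, \delta(\gamma)^{\frac{1}{p} - 2} \|F'\|_{L^p} \to 0,
\end{aligned}
\]
and thus
\[
\|f_0 - f_{\gamma,\delta(\gamma)}\|_{L^1} + T \int_{\mathbb{R}} v^2 \left|f_0(v) - f_{\gamma,\delta(\gamma)}(v)\right| dv + \|f_0 - f_{\gamma,\delta(\gamma)}\|_{W^{1,p}_{x,v}} \to 0.
\]
It remains to check
\[
\left\| |D|^{s-1} \frac{d}{dv} \left( f_0(v) - f_{\gamma,\delta(\gamma)}(v) \right) \right\|_{L^p} \to 0, \quad \text{when } \gamma \to 0,
\]
where $|D|^{\delta}$ $(\delta > 0)$ is the fractional differentiation operator with the Fourier symbol $|\xi|^{\delta}$. By using the scaling equality
\[
|D|^{\delta} \chi_d(v) = \frac{1}{d^{\delta}} \left(|D|^{\delta} \chi\right)\!\left(\frac{v}{d}\right),
\]
where $\chi_d(v) = \chi(v/d)$, we have
\[
\left\| |D|^{s-1} \frac{d}{dv} \left[ \frac{\gamma}{\delta(\gamma)} F\!\left(\frac{v}{\gamma\delta(\gamma)}\right) \right] \right\|_{L^p} = \gamma^{1 - s + \frac{1}{p}}\, \delta(\gamma)^{-1 - s + \frac{1}{p}} \left\| |D|^{s} F \right\|_{L^p} \to 0,
\]
when $\gamma \to 0$, since $s < 1 + \frac{1}{p}$. So we can choose $\gamma = \gamma(\varepsilon) > 0$ small such that
\[
\|f_0 - f_{\gamma,\delta(\gamma)}\|_{L^1(\mathbb{R})} + T \int_{\mathbb{R}} v^2 \left|f_0(v) - f_{\gamma,\delta(\gamma)}(v)\right| dv + \|f_0 - f_{\gamma,\delta(\gamma)}\|_{W^{s,p}(\mathbb{R})} < \frac{\varepsilon}{2}.
\]
Then
\[
(f_\varepsilon, E_\varepsilon) = \left( f^T_{\gamma(\varepsilon), r(\gamma(\varepsilon),\varepsilon)}(x, v),\ E_{\gamma(\varepsilon), r(\gamma(\varepsilon),\varepsilon)}(x) \right)
\]
is a steady BGK wave solution satisfying (6).

Case 2. $\int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv > \left(\frac{2\pi}{T}\right)^2$. Choose $F(v) = \exp\!\left(-\frac{v^2}{2}\right)$; then $\int_{\mathbb{R}} \frac{F'(v)}{v}\, dv < 0$. Define $f_{\gamma,\delta}(v)$ as in Case 1 (see (9)). Then there exist $0 < \delta_1 < \delta_2$ such that
\[
0 < \int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv + \frac{1}{\delta_1^2} \int_{\mathbb{R}} \frac{F'(v)}{v}\, dv < \left(\frac{2\pi}{T}\right)^2 < \int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv + \frac{1}{\delta_2^2} \int_{\mathbb{R}} \frac{F'(v)}{v}\, dv.
\]
The rest of the proof is the same as in Case 1.

Case 3. $\int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv = \left(\frac{2\pi}{T}\right)^2$. For $\delta > 0$, define
\[
f_\delta(v) = \frac{1}{\delta} f_0\!\left(\frac{v}{\delta}\right).
\]
Then $f_\delta \in C^\infty(\mathbb{R}) \cap W^{2,p}(\mathbb{R})$, $f_\delta(v) > 0$, $\int_{\mathbb{R}} f_\delta(v)\, dv = 1$, and
\[
\int_{\mathbb{R}} \frac{f_\delta'(v)}{v}\, dv = \frac{1}{\delta^2} \int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv.
\]
For any $\varepsilon > 0$ small, there exist $0 < \delta_1(\varepsilon) < 1 < \delta_2(\varepsilon)$ such that
\[
0 < \frac{1}{\delta_2^2} \int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv < \left(\frac{2\pi}{T}\right)^2 < \frac{1}{\delta_1^2} \int_{\mathbb{R}} \frac{f_0'(v)}{v}\, dv
\]
and, when $\delta \in (\delta_1(\varepsilon), \delta_2(\varepsilon))$,
\[
\|f_0 - f_\delta\|_{L^1(\mathbb{R})} + T \int_{\mathbb{R}} v^2 |f_0(v) - f_\delta(v)|\, dv + \|f_0 - f_\delta\|_{W^{2,p}(\mathbb{R})} < \frac{\varepsilon}{2}.
\]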
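The dilation identity used for $f_\delta$ in Case 3 follows from a change of variables. The short computation below spells it out; the substitution variable $w = v/\delta$ is introduced only for this check.

```latex
\[
f_\delta(v)=\frac1\delta\,f_0\!\Big(\frac v\delta\Big)
\;\Longrightarrow\;
f_\delta'(v)=\frac1{\delta^{2}}\,f_0'\!\Big(\frac v\delta\Big),
\qquad
\int_{\mathbb R}\frac{f_\delta'(v)}{v}\,dv
=\frac1{\delta^{2}}\int_{\mathbb R}\frac{f_0'(w)}{\delta w}\,\delta\,dw
=\frac1{\delta^{2}}\int_{\mathbb R}\frac{f_0'(w)}{w}\,dw .
\]
```

Since the integral is positive in Case 3, taking $\delta$ slightly below $1$ raises it above $(2\pi/T)^2$ and taking $\delta$ slightly above $1$ lowers it, which is how $\delta_1(\varepsilon) < 1 < \delta_2(\varepsilon)$ bracket the target value.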
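For $\delta \in (\delta_1(\varepsilon), \delta_2(\varepsilon))$, we consider bifurcation of steady BGK waves near $(f_\delta(v), 0)$, which are of the form
\[
f^{\beta}_{\delta}(x, v) = \begin{cases}
\dfrac{1}{\delta}\, g_+\!\left(\dfrac{2e}{\delta^2}\right) & \text{if } v > 0, \\[2ex]
\dfrac{1}{\delta}\, g_-\!\left(\dfrac{2e}{\delta^2}\right) & \text{if } v \le 0,
\end{cases}
\qquad e = \frac{1}{2} v^2 - \beta(x), \quad E = -\beta_x. \tag{13}
\]
The existence of BGK waves is then reduced to solving the ODE
\[
\beta_{xx} = \int_{\mathbb{R}} f^{\beta}_{\delta}(x, v)\, dv - 1 := h_\delta(\beta). \tag{14}
\]
As in Case 1, for any $\delta \in (\delta_1(\varepsilon), \delta_2(\varepsilon))$, there exists $r_0(\varepsilon) > 0$ (independent of $\delta$) such that for each $0 < r < r_0$, there exists a $T(\delta; r)$-periodic solution $\beta_{\delta;r}$ to the ODE (14) with $\|\beta_{\delta;r}\|_{H^2(0, T(\delta;r))} = r$. Moreover,
\[
\left(\frac{2\pi}{T(\delta; r)}\right)^2 \to \int_{\mathbb{R}} \frac{f_\delta'(v)}{v}\, dv, \quad \text{when } r \to 0.
\]
So when $r$ is small enough, $T(\delta_1; r) < T < T(\delta_2; r)$ and there exists $\delta_T(r, \varepsilon) \in (\delta_1(\varepsilon), \delta_2(\varepsilon))$ such that $T(\delta_T; r) = T$. Define $f^T_{r,\varepsilon}(x, v) = f^{\beta}_{\delta_T(r,\varepsilon)}(x, v)$ with $\beta = \beta_{\delta_T(r,\varepsilon);r}$ in (13), and $E_{r,\varepsilon}(x) = -\partial_x \beta_{\delta_T(r,\varepsilon);r}(x)$. Then $(f^T_{r,\varepsilon}(x, v), E_{r,\varepsilon}(x))$ is a nontrivial BGK solution to (1) with $x$-period $T$. Let
\[
\delta(\varepsilon) = \lim_{r \to 0} \delta_T(r, \varepsilon) \in [\delta_1(\varepsilon), \delta_2(\varepsilon)].
\]
As in Case 1, by the dominated convergence theorem, we can choose $r = r(\varepsilon) > 0$ small enough such that
\[
\|f^T_{r(\varepsilon),\varepsilon} - f_{\delta(\varepsilon)}\|_{L^1_{x,v}} + \int_0^T\!\!\int_{\mathbb{R}} v^2 \left|f^T_{r(\varepsilon),\varepsilon}(x, v) - f_{\delta(\varepsilon)}(v)\right| dx\, dv + \|f^T_{r(\varepsilon),\varepsilon} - f_{\delta(\varepsilon)}\|_{W^{2,p}_{x,v}} < \frac{\varepsilon}{2}.
\]
Then
\[
(f_\varepsilon, E_\varepsilon) = \left( f^T_{r(\varepsilon),\varepsilon}(x, v),\ E_{r(\varepsilon),\varepsilon}(x) \right)
\]
is a steady BGK wave solution satisfying
\[
\|f_\varepsilon - f_0\|_{L^1_{x,v}} + \int_0^T\!\!\int_{\mathbb{R}} v^2 |f_0 - f_\varepsilon|\, dx\, dv + \|f_\varepsilon - f_0\|_{W^{2,p}_{x,v}} < \varepsilon,
\]
which certainly implies (6). This finishes the proof of the proposition.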
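To prove Theorem 1.1, we need the following approximation result.

Lemma 2.2. Fix $p > 1$, $0 \le s < 1 + \frac{1}{p}$ and $c \in \mathbb{R}$. Assume $f_0 \in W^{s,p}(\mathbb{R})$, $f_0 > 0$, $\int_{\mathbb{R}} f_0(v)\, dv = 1$, and $\int_{\mathbb{R}} v^2 f_0(v)\, dv < \infty$. Then for any $\varepsilon > 0$, there exists $f_\varepsilon(v) \in C^\infty(\mathbb{R}) \cap W^{2,p}(\mathbb{R})$ such that $f_\varepsilon$ is even near $v = c$, and
\[
\|f_\varepsilon - f_0\|_{L^1(\mathbb{R})} + \int_{\mathbb{R}} v^2 |f_\varepsilon - f_0|\, dv + \|f_\varepsilon - f_0\|_{W^{s,p}(\mathbb{R})} \le \varepsilon.
\]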
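Proof. Let $\eta(x)$ be the standard mollifier function, that is,
\[
\eta(x) = \begin{cases} C \exp\!\left(\dfrac{1}{x^2 - 1}\right) & \text{if } |x| < 1, \\ 0 & \text{if } |x| \ge 1, \end{cases}
\]
and $\eta_{\delta_1}(x) = \frac{1}{\delta_1} \eta\!\left(\frac{x}{\delta_1}\right)$. Define $f_{\delta_1}(v) := \eta_{\delta_1}(v) * f_0(v)$. Then by the properties of mollifiers, we have
\[
f_{\delta_1} \in C^\infty(\mathbb{R}), \qquad f_{\delta_1}(v) > 0, \qquad \int_{\mathbb{R}} f_{\delta_1}(v)\, dv = 1,
\]
and when $\delta_1$ is small enough,
\[
\|f_{\delta_1} - f_0\|_{L^1(\mathbb{R})} + \int_{\mathbb{R}} v^2 |f_{\delta_1} - f_0|\, dv + \|f_{\delta_1} - f_0\|_{W^{s,p}(\mathbb{R})} \le \frac{\varepsilon}{2}. \tag{15}
\]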
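The mollification step can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration: the helper names `bump`, `integrate`, `eta` and `mollify`, the midpoint quadrature, and the test profile $|v|$, which stands in for a rough $f_0$. The normalizing constant $C$ is computed numerically rather than in closed form.

```python
import math


def bump(x):
    """Unnormalized standard mollifier exp(1/(x^2 - 1)) on |x| < 1, else 0."""
    return math.exp(1.0 / (x * x - 1.0)) if abs(x) < 1.0 else 0.0


def integrate(fn, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n cells."""
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))


# Normalizing constant C so that eta integrates to 1 (computed numerically).
C = 1.0 / integrate(bump, -1.0, 1.0)


def eta(x, delta=1.0):
    """Rescaled mollifier eta_delta(x) = eta(x / delta) / delta."""
    return C * bump(x / delta) / delta


def mollify(f, v, delta, n=2000):
    """(eta_delta * f)(v) = integral of eta_delta(w) f(v - w) over |w| < delta."""
    return integrate(lambda w: eta(w, delta) * f(v - w), -delta, delta, n)


if __name__ == "__main__":
    print("mass of eta_0.5:", integrate(lambda x: eta(x, 0.5), -0.5, 0.5))
    print("smoothed |v| at 0:", mollify(abs, 0.0, 0.1))
    print("smoothed |v| at 2:", mollify(abs, 2.0, 0.1))
```

Away from the kink the mollified profile agrees with $|v|$, since $\eta$ is even with unit mass and $|v|$ is linear there, while at $v = 0$ the kink is smoothed to a small positive value.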
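We can assume $f_{\delta_1}(v) \in W^{2,p}$. Otherwise, we can modify $f_{\delta_1}(v)$ near infinity by a cut-off to get $\tilde{f}_{\delta_1}(v)$ such that $\tilde{f}_{\delta_1}(v) \in W^{2,p}$ and
\[
\|f_{\delta_1} - \tilde{f}_{\delta_1}\|_{L^1(\mathbb{R})} + \int_{\mathbb{R}} v^2 |f_{\delta_1} - \tilde{f}_{\delta_1}|\, dv + \|f_{\delta_1} - \tilde{f}_{\delta_1}\|_{W^{s,p}(\mathbb{R})} \le \frac{\varepsilon}{2}.
\]
Solely to simplify notation, we set $c = 0$ below. Let $\sigma(x) = \sigma(|x|)$ be the cut-off function defined by (7). Let $\delta_2 > 0$ be a small number, and define
\[
f_{\delta_1,\delta_2}(v) = f_{\delta_1}(v) \left(1 - \sigma\!\left(\frac{v}{\delta_2}\right)\right) + \frac{f_{\delta_1}(v) + f_{\delta_1}(-v)}{2}\, \sigma\!\left(\frac{v}{\delta_2}\right)
= f_{\delta_1}(v) - \frac{f_{\delta_1}(v) - f_{\delta_1}(-v)}{2}\, \sigma\!\left(\frac{v}{\delta_2}\right).
\]
Then obviously,
\[
f_{\delta_1,\delta_2} \in C^\infty(\mathbb{R}), \qquad f_{\delta_1,\delta_2}(v) > 0, \qquad \int_{\mathbb{R}} f_{\delta_1,\delta_2}(v)\, dv = \int_{\mathbb{R}} f_{\delta_1}(v)\, dv = 1,
\]
and $f_{\delta_1,\delta_2}(v)$ is even on the interval $[-\delta_2, \delta_2]$. Below, we prove that when $\delta_2$ is small enough,
\[
\|f_{\delta_1} - f_{\delta_1,\delta_2}\|_{L^1(\mathbb{R})} + \int_{\mathbb{R}} v^2 |f_{\delta_1} - f_{\delta_1,\delta_2}|\, dv + \|f_{\delta_1} - f_{\delta_1,\delta_2}\|_{W^{s,p}(\mathbb{R})} \le \frac{\varepsilon}{2}. \tag{16}
\]
Since
\[
\begin{aligned}
\|f_{\delta_1} - f_{\delta_1,\delta_2}\|_{L^1} &\le \int_{|v| \le 2\delta_2} f_{\delta_1}(v)\, dv, \\
\int_{\mathbb{R}} v^2 |f_{\delta_1} - f_{\delta_1,\delta_2}|\, dv &\le (2\delta_2)^2 \int_{|v| \le 2\delta_2} f_{\delta_1}(v)\, dv, \\
\|f_{\delta_1} - f_{\delta_1,\delta_2}\|_{L^p} &\le \|f_{\delta_1}\|_{L^p[-2\delta_2, 2\delta_2]},
\end{aligned}
\]
and
\[
\partial_v \left(f_{\delta_1} - f_{\delta_1,\delta_2}\right) = \frac{f_{\delta_1}'(v) + f_{\delta_1}'(-v)}{2}\, \sigma\!\left(\frac{v}{\delta_2}\right) + \frac{f_{\delta_1}(v) - f_{\delta_1}(-v)}{2\delta_2}\, \sigma'\!\left(\frac{v}{\delta_2}\right),
\]
\[
\begin{aligned}
\left\|\partial_v \left(f_{\delta_1} - f_{\delta_1,\delta_2}\right)\right\|_{L^p}
&\le \|f_{\delta_1}'\|_{L^p[-2\delta_2, 2\delta_2]} + \max|\sigma'| \left\| \frac{f_{\delta_1}(v) - f_{\delta_1}(-v)}{2\delta_2} \right\|_{L^p\{\delta_2 \le |v| \le 2\delta_2\}} \\
&\le \|f_{\delta_1}'\|_{L^p[-2\delta_2, 2\delta_2]} + \max|\sigma'|\, \frac{1}{2\delta_2} \left\| \int_{-2\delta_2}^{2\delta_2} |f_{\delta_1}'|\, dv \right\|_{L^p\{\delta_2 \le |v| \le 2\delta_2\}} \\
&\le \|f_{\delta_1}'\|_{L^p[-2\delta_2, 2\delta_2]} + \max|\sigma'|\, \frac{1}{2\delta_2}\, (4\delta_2)^{\frac{1}{p'}} \|f_{\delta_1}'\|_{L^p[-2\delta_2, 2\delta_2]}\, (2\delta_2)^{\frac{1}{p}} \\
&\le \left(1 + 2\max|\sigma'|\right) \|f_{\delta_1}'\|_{L^p[-2\delta_2, 2\delta_2]},
\end{aligned}
\]
so when $\delta_2 \to 0$,
\[
\|f_{\delta_1} - f_{\delta_1,\delta_2}\|_{L^1} + \int_{\mathbb{R}} v^2 |f_{\delta_1} - f_{\delta_1,\delta_2}|\, dv + \|f_{\delta_1} - f_{\delta_1,\delta_2}\|_{W^{1,p}} \to 0.
\]
Next, we show
\[
\left\|\partial_v \left(f_{\delta_1} - f_{\delta_1,\delta_2}\right)\right\|_{W^{s-1,p}(\mathbb{R})} \to 0, \quad \text{when } \delta_2 \to 0.
\]
This follows from Lemma 2.3 below, since $s - 1 < \frac{1}{p}$ and
\[
\begin{aligned}
\left\| \frac{f_{\delta_1}(v) - f_{\delta_1}(-v)}{2v} \right\|_{W^{s-1,p}(\mathbb{R})}
&= \left\| \int_0^1 f_{\delta_1}'((2\tau - 1)v)\, d\tau \right\|_{W^{s-1,p}(\mathbb{R})}
\le \int_0^1 \left\| f_{\delta_1}'((2\tau - 1)v) \right\|_{W^{s-1,p}(\mathbb{R})} d\tau \\
&\le C \int_0^1 \left( |2\tau - 1|^{-\frac{1}{p}} \|f_{\delta_1}'\|_{L^p} + |2\tau - 1|^{s - 1 - \frac{1}{p}} \left\| |D|^{s-1} f_{\delta_1}' \right\|_{L^p} \right) d\tau
\le C \|f_{\delta_1}\|_{W^{s,p}(\mathbb{R})}.
\end{aligned}
\]
So when $\delta_2 \to 0$,
\[
\|f_{\delta_1} - f_{\delta_1,\delta_2}\|_{L^1} + \int_{\mathbb{R}} v^2 |f_{\delta_1} - f_{\delta_1,\delta_2}|\, dv + \|f_{\delta_1} - f_{\delta_1,\delta_2}\|_{W^{s,p}} \to 0.
\]
Thus by choosing $\delta_2$ small enough, (16) is satisfied. By setting $f_\varepsilon = f_{\delta_1,\delta_2}$, the conclusion of the lemma follows from (15) and (16).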
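Lemma 2.3. Given $f \in W^{\frac{1}{p},p}(\mathbb{R}) \cap L^\infty$ and $g \in W^{s,p}(\mathbb{R})$ $(p > 1,\ 0 \le s < \frac{1}{p})$, then for any $\delta > 0$, $f\!\left(\frac{x}{\delta}\right) g(x) \in W^{s,p}(\mathbb{R})$ and
\[
\left\| f\!\left(\frac{x}{\delta}\right) g \right\|_{W^{s,p}} \to 0, \quad \text{when } \delta \to 0. \tag{17}
\]

Proof. First, we cite a result of Strichartz ([44]): given $h_1 \in W^{\frac{1}{p},p}(\mathbb{R}) \cap L^\infty$ and $h_2 \in W^{s,p}(\mathbb{R})$, then $h_1 h_2 \in W^{s,p}(\mathbb{R})$ and
\[
\|h_1 h_2\|_{W^{s,p}} \le C\!\left( \|h_1\|_{W^{\frac{1}{p},p}},\ \|h_1\|_{L^\infty} \right) \|h_2\|_{W^{s,p}}.
\]
The above result immediately implies that $f\!\left(\frac{x}{\delta}\right) g(x) \in W^{s,p}(\mathbb{R})$. To show (17), for any $\varepsilon > 0$, we pick $g_1 \in C_0^\infty(\mathbb{R})$ such that $\|g - g_1\|_{W^{s,p}} < \varepsilon$. Since
\[
\left\| f\!\left(\frac{x}{\delta}\right) \right\|_{L^p} + \left\| |D|^{\frac{1}{p}} f\!\left(\frac{x}{\delta}\right) \right\|_{L^p} = \delta^{\frac{1}{p}} \|f\|_{L^p} + \left\| |D|^{\frac{1}{p}} f \right\|_{L^p},
\]
when $\delta \le 1$ we have
\[
\left\| f\!\left(\frac{x}{\delta}\right) \right\|_{W^{\frac{1}{p},p}} \le C \|f\|_{W^{\frac{1}{p},p}}, \quad \text{for some } C \text{ independent of } \delta.
\]
Thus
\[
\begin{aligned}
\left\| f\!\left(\frac{x}{\delta}\right) g \right\|_{W^{s,p}}
&\le \left\| f\!\left(\frac{x}{\delta}\right) (g - g_1) \right\|_{W^{s,p}} + \left\| f\!\left(\frac{x}{\delta}\right) g_1 \right\|_{W^{s,p}} \\
&\le C\!\left( \|f\|_{W^{\frac{1}{p},p}}, \|f\|_{L^\infty} \right) \|g - g_1\|_{W^{s,p}} + \left\| f\!\left(\frac{x}{\delta}\right) \right\|_{W^{s,p}} C\!\left( \|g_1\|_{W^{\frac{1}{p},p}}, \|g_1\|_{L^\infty} \right) \\
&\le C\!\left( \|f\|_{W^{\frac{1}{p},p}}, \|f\|_{L^\infty} \right) \varepsilon + \left( \delta^{\frac{1}{p}} \|f\|_{L^p} + \delta^{\frac{1}{p} - s} \left\| |D|^{s} f \right\|_{L^p} \right) C\!\left( \|g_1\|_{W^{\frac{1}{p},p}}, \|g_1\|_{L^\infty} \right).
\end{aligned}
\]
Letting $\delta \to 0$, we get
\[
\lim_{\delta \to 0} \left\| f\!\left(\frac{x}{\delta}\right) g \right\|_{W^{s,p}} \le C\!\left( \|f\|_{W^{\frac{1}{p},p}}, \|f\|_{L^\infty} \right) \varepsilon.
\]
Since $\varepsilon$ is arbitrarily small, (17) is proved.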
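Several steps above rely on how $|D|^s$ interacts with dilations. A short derivation via the Fourier transform is sketched here; the convention $\hat f(\xi) = \int f(x) e^{-ix\xi}\, dx$ is an assumption made for the sketch, and the result does not depend on it.

```latex
\[
\widehat{f(\cdot/\delta)}(\xi)=\delta\,\hat f(\delta\xi)
\;\Longrightarrow\;
|\xi|^{s}\,\widehat{f(\cdot/\delta)}(\xi)=\delta^{-s}\,\delta\,\big(|\cdot|^{s}\hat f\,\big)(\delta\xi)
\;\Longrightarrow\;
|D|^{s}\big[f(\cdot/\delta)\big](x)=\delta^{-s}\,\big(|D|^{s}f\big)(x/\delta),
\]
\[
\big\|\,|D|^{s}[f(\cdot/\delta)]\,\big\|_{L^{p}}=\delta^{\frac1p-s}\,\big\|\,|D|^{s}f\,\big\|_{L^{p}} .
\]
```

For $s < \frac{1}{p}$ the exponent $\frac{1}{p} - s$ is positive, so this norm vanishes as $\delta \to 0$, while for $s = \frac{1}{p}$ it is scale invariant; this is the mechanism behind both the $\delta$-uniform bound and the limit (17).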
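Proof of Theorem 1.1. Fix the period $T > 0$ and the travel speed $c \in \mathbb{R}$. Then by Lemma 2.2, for any $\varepsilon$ small enough, there exists $f_1(v) \in C^\infty(\mathbb{R}) \cap W^{2,p}(\mathbb{R})$ such that $f_1(v)$ is even near $v = c$ and
\[
\|f_1 - f_0\|_{L^1(\mathbb{R})} + T \int_{\mathbb{R}} v^2 |f_1 - f_0|\, dv + \|f_1 - f_0\|_{W^{s,p}(\mathbb{R})} \le \frac{\varepsilon}{2}.
\]
Our goal is to construct travelling BGK wave solutions of the form $(f_\varepsilon(x - ct, v), E_\varepsilon(x - ct))$ near $(f_1(v), 0)$ such that
\[
\|f_\varepsilon(x, v) - f_1(v)\|_{L^1_{x,v}} + \int_0^T\!\!\int_{\mathbb{R}} v^2 |f_\varepsilon(x, v) - f_1(v)|\, dx\, dv + \|f_\varepsilon(x, v) - f_1(v)\|_{W^{s,p}_{x,v}} < \frac{\varepsilon}{2}.
\]
It is equivalent to find steady BGK solutions $(f_\varepsilon(x, v + c), E_\varepsilon(x))$ near $(f_1(v + c), 0)$. By Proposition 2.1, there exists a steady BGK solution $(f_2(x, v), E_2(x))$ near $(f_1(v + c), 0)$ such that $E_2(x)$ is not identically $0$ and
\[
\|f_2(x, v) - f_1(v + c)\|_{L^1_{x,v}} + \int_0^T\!\!\int_{\mathbb{R}} v^2 |f_2(x, v) - f_1(v + c)|\, dx\, dv + \|f_2(x, v) - f_1(v + c)\|_{W^{s,p}_{x,v}} < \frac{\varepsilon}{2(5 + 4c^2)}.
\]
Setting
\[
f_\varepsilon(x, v) = f_2(x, v - c), \qquad E_\varepsilon(x) = E_2(x),
\]
$(f_\varepsilon(x - ct, v), E_\varepsilon(x - ct))$ is a travelling BGK solution and
\[
\|f_\varepsilon - f_1(v)\|_{L^1_{x,v}} + \int_0^T\!\!\int_{\mathbb{R}} (v - c)^2 |f_\varepsilon - f_1(v)|\, dx\, dv + \|f_\varepsilon - f_1(v)\|_{W^{s,p}_{x,v}} < \frac{\varepsilon}{2(5 + 4c^2)}.
\]
Since $|v - c| \ge |v|/2$ when $|v| \ge 2|c|$, we have
\[
\begin{aligned}
\int_0^T\!\!\int_{\mathbb{R}} v^2 |f_\varepsilon(x, v) - f_1(v)|\, dx\, dv
&\le \int_0^T\!\!\int_{|v| \ge 2|c|} v^2 |f_\varepsilon - f_1|\, dx\, dv + \int_0^T\!\!\int_{|v| \le 2|c|} v^2 |f_\varepsilon - f_1|\, dx\, dv \\
&\le 4 \int_0^T\!\!\int_{\mathbb{R}} (v - c)^2 |f_\varepsilon(x, v) - f_1(v)|\, dx\, dv + 4c^2 \|f_\varepsilon - f_1\|_{L^1_{x,v}} \\
&< \frac{(4 + 4c^2)\, \varepsilon}{2(5 + 4c^2)},
\end{aligned}
\]
and thus
\[
\|f_\varepsilon - f_1(v)\|_{L^1_{x,v}} + \int_0^T\!\!\int_{\mathbb{R}} v^2 |f_\varepsilon(x, v) - f_1(v)|\, dx\, dv + \|f_\varepsilon(x, v) - f_1(v)\|_{W^{s,p}_{x,v}}
< \frac{(4 + 4c^2)\, \varepsilon}{2(5 + 4c^2)} + \frac{\varepsilon}{2(5 + 4c^2)} = \frac{\varepsilon}{2}.
\]
So
\[
\|f_\varepsilon - f_0(v)\|_{L^1_{x,v}} + \int_0^T\!\!\int_{\mathbb{R}} v^2 |f_\varepsilon(x, v) - f_0(v)|\, dx\, dv + \|f_\varepsilon(x, v) - f_0(v)\|_{W^{s,p}_{x,v}} < \varepsilon,
\]
and the proof of Theorem 1.1 is finished.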
Remark 2.1. For steady BGK waves $(f(x, v), E(x))$ of the form $E(x) = -\beta_x$ and
\[
f(x, v) = \begin{cases} \mu_+(e) & \text{if } v \ge 0, \\ \mu_-(e) & \text{if } v < 0, \end{cases} \qquad e = \frac{1}{2} v^2 - \beta(x), \tag{18}
\]
with $\mu_+, \mu_- \in C^1(\mathbb{R})$, such as those constructed in the proof of Theorem 1.1, $E(x)$ has only two zeros in one minimal period. This is because the electric potential $\beta$ satisfies the second order autonomous ODE
\[
\beta_{xx} = \int_{v \ge 0} \mu_+\!\left(\frac{1}{2} v^2 - \beta\right) dv + \int_{v < 0} \mu_-\!\left(\frac{1}{2} v^2 - \beta\right) dv - 1 = h(\beta) \tag{19}
\]
with $h \in C^1(\mathbb{R})$. Any periodic solution of minimal period to the ODE (19) has only one minimum and one maximum, and therefore $E = -\beta_x$ vanishes at only two points. By Theorem 1.1, for $T > 0$, near any homogeneous equilibrium we can construct small BGK waves such that a multiple of the minimal period equals $T$. By [31] and [32], any such multi-BGK waves are linearly and nonlinearly unstable under perturbations of period $T$. So far, the existence of a stable BGK wave of minimal period remains open, although some numerical evidence suggests the existence of such stable BGK waves. For example, in [5], starting near an unstable multi-BGK wave, numerical simulations show that the long time asymptotics tend to a seemingly stable BGK wave of minimal period.

Remark 2.2. In ([22,23]), Dorning and Holloway (see also [10,17]) studied the bifurcation of small travelling BGK waves with speed $v_p$ near homogeneous equilibria $(f_0(v), 0)$ under the bifurcation condition
\[
\nu(v_p) = P\!\int \frac{f_0'(v)}{v - v_p}\, dv > 0, \tag{20}
\]
where $P$ denotes the principal value integral. It is equivalent to find steady BGK waves near $(f_0(v + v_p), 0)$. The approach in ([22,23]) is as follows. Define
\[
f^{e,v_p}(v) = \frac{1}{2} \left[ f_0(v + v_p) + f_0(-v + v_p) \right], \qquad
f^{o,v_p}(v) = \frac{1}{2} \left[ f_0(v + v_p) - f_0(-v + v_p) \right].
\]
Then
\[
\int \frac{\frac{d}{dv} f^{e,v_p}(v)}{v}\, dv = P\!\int \frac{f_0'(v)}{v - v_p}\, dv = \nu(v_p) > 0.
\]
So by bifurcation theory, there exist small BGK waves $(f^e(x, v), -\beta_x)$ near $(f^{e,v_p}(v), 0)$ with periods close to $\frac{2\pi}{\sqrt{\nu(v_p)}}$, and $f^e(x, v)$ is even in $v$. Next, the odd part $f^{o,v_p}(x, v)$ is defined by
\[
f^{o,v_p}(x, v) = \left(1 - \sigma\!\left(\frac{e}{-2\min\beta}\right)\right) \begin{cases} G^o(e) & \text{if } v \ge 0, \\ -G^o(e) & \text{if } v < 0, \end{cases}
\]
where $\sigma(x)$ is the cut-off function defined in (7) and $G^o(e) = f^{o,v_p}(\sqrt{2e})$ when $e > 0$. Define $f(x, v) = f^{e,v_p}(x, v) + f^{o,v_p}(x, v)$; then $(f(x, v), -\beta_x)$ is a steady BGK wave, since for trapped particles with $e < -\min\beta$, $f^{o,v_p}(x, v) = 0$ and $f(x, v)$ only depends on $e$. It can be shown that $(f(x, v), -\beta_x)$ is close to $(f_0(v + v_p), 0)$ in the $L^p_{x,v}$ norm. The periods of the BGK waves constructed above are only near $\frac{2\pi}{\sqrt{\nu(v_p)}}$. In [17,22,23], it was suggested that BGK waves with exact period $\frac{2\pi}{\sqrt{\nu(v_p)}}$ and $\varepsilon$-close to $(f_0(v + v_p), 0)$ in the $L^p_{x,v}$ norm can be constructed by performing the above bifurcation from $((1 + \mu(\varepsilon)) f_0(v + v_p), 0)$ for a proper small parameter $\mu$. It should be pointed out that this strategy actually does not work to get the exact period $\frac{2\pi}{\sqrt{\nu(v_p)}}$. First, to ensure that $((1 + \mu(\varepsilon)) f^{e,v_p}(v), 0)$ is a bifurcation point, it is required that
\[
\int_{\mathbb{R}} (1 + \mu(\varepsilon)) f^{e,v_p}(v)\, dv = 1,
\]
and thus $\mu(\varepsilon) = 0$. So $\mu$ is not adjustable at all. Second, by Lemma 3.1 below,
\[
\nu(v_p) = \int \frac{\frac{d}{dv} f^{e,v_p}(v)}{v}\, dv \le \|f^{e,v_p}(v)\|_{W^{2,p}} = \|f_0(v)\|_{W^{2,p}}.
\]
So by the method in [22,23], one cannot get small BGK waves with spatial periods less than $2\pi / \|f_0(v)\|_{W^{2,p}}^{1/2}$. By comparison, we construct BGK waves with any minimal period near any homogeneous equilibrium $(f_0(v), 0)$ in any $W^{s,p}$ $(s < 1 + \frac{1}{p})$ neighborhood. It is also claimed in [22,23] that for $v_p$ such that $\nu(v_p) < 0$, there exist no travelling BGK waves with travel speed $v_p$ arbitrarily near $(f_0(v), 0)$. For the Maxwellian $f_0(v) = e^{-\frac{1}{2} v^2}$, the critical speed is about $v_p = 1.35$, since $\nu(v_p) < 0$ when $v_p < 1.35$. However, by our Theorem 1.1, BGK waves with arbitrary travel speed exist near (in the $W^{s,p}$ space, $s < 1 + \frac{1}{p}$) any homogeneous equilibrium, including the Maxwellian. So the claim of a critical travel speed based on (20) is not true.
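Proof of Corollary 1.1. From the proof of Theorem 1.1 and Proposition 2.1, it follows that: fix $T > 0$; for any $\varepsilon > 0$, there exists a homogeneous profile $f_\varepsilon(v) \in C^\infty(\mathbb{R}) \cap W^{2,p}(\mathbb{R})$ such that $f_\varepsilon(v) \ge 0$, $\int_{\mathbb{R}} f_\varepsilon(v)\, dv = 1$,
\[
\left(\frac{2\pi}{T}\right)^2 = k_0^2 = \int_{\mathbb{R}} \frac{f_\varepsilon'(v)}{v - v_\varepsilon}\, dv \quad \text{with } f_\varepsilon'(v_\varepsilon) = 0,
\]
and
\[
\|f_\varepsilon - f_0\|_{L^1(\mathbb{R})} + T \int_{\mathbb{R}} v^2 |f_\varepsilon(v) - f_0(v)|\, dv + \|f_\varepsilon - f_0\|_{W^{s,p}(\mathbb{R})} < \frac{\varepsilon}{2}. \tag{21}
\]
Define $f_\delta(v) \in C^\infty(\mathbb{R}) \cap W^{2,p}(\mathbb{R})$ by
\[
f_\delta(v) = \frac{1}{\delta} f_\varepsilon\!\left(v_\varepsilon + \frac{v - v_\varepsilon}{\delta}\right).
\]
Then $f_\delta(v) \ge 0$, $\int_{\mathbb{R}} f_\delta(v)\, dv = 1$ and
\[
k_0(\delta)^2 = \int \frac{f_\delta'(v)}{v - v_\varepsilon}\, dv = \frac{1}{\delta^2} \left(\frac{2\pi}{T}\right)^2.
\]
We consider two cases below.

Case 1. $f_\varepsilon''(v_\varepsilon) > 0$. By Lemma 6.1 and Remark 6.1 thereafter, there exist unstable modes of the linearized VP equation around $(f_\varepsilon(v), 0)$ for wave numbers $k$ in the interval $(k_1, k_0)$. Here $k_1$ is defined by
\[
k_1^2 = \int_{\mathbb{R}} \frac{f_\varepsilon'(v)}{v - c_1}\, dv,
\]
and $c_1$ is a maximum point of $f_\varepsilon(v)$. If there is no maximum point $c_1$ of $f_\varepsilon$ such that
\[
\int_{\mathbb{R}} \frac{f_\varepsilon'(v)}{v - c_1}\, dv < k_0^2,
\]
then $k_1 = 0$. Choose $\delta < 1$ such that
\[
\|f_\varepsilon - f_\delta\|_{L^1(\mathbb{R})} + T \int_{\mathbb{R}} v^2 |f_\varepsilon - f_\delta|\, dv + \|f_\varepsilon - f_\delta\|_{W^{s,p}(\mathbb{R})} < \frac{\varepsilon}{2}. \tag{22}
\]
Then again by Lemma 6.1 and Remark 6.1, there exist unstable modes of the linearized VP equation around $(f_\delta(v), 0)$ for wave numbers $k$ in the interval $(k_1(\delta), k_0(\delta))$. Since $k_0(\delta) > k_0$ and $k_0(\delta) - k_1(\delta) \to k_0 - k_1 > 0$ when $\delta \to 1^-$, we have $k_0 \in (k_1(\delta), k_0(\delta))$ when $\delta$ is close enough to $1$. This implies that $(f_\delta(v), 0)$ is linearly unstable under perturbations of period $T$. Moreover, the inequalities (21) and (22) imply that
\[
\|f_\delta - f_0\|_{L^1(\mathbb{R})} + T \int_{\mathbb{R}} v^2 |f_\delta(v) - f_0(v)|\, dv + \|f_\delta - f_0\|_{W^{s,p}(\mathbb{R})} < \varepsilon.
\]

Case 2. $f_\varepsilon''(v_\varepsilon) < 0$. Choose $\delta > 1$ sufficiently close to $1$; then by the same argument as in Case 1, $(f_\delta(v), 0)$ is linearly unstable under perturbations of period $T$ and
\[
\|f_\delta - f_0\|_{L^1(\mathbb{R})} + T \int_{\mathbb{R}} v^2 |f_\delta(v) - f_0(v)|\, dv + \|f_\delta - f_0\|_{W^{s,p}(\mathbb{R})} < \varepsilon.
\]
This finishes the proof of Corollary 1.1.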
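3. Nonexistence of BGK Waves in $W^{s,p}$ $(s > 1 + \frac{1}{p})$

In this section, we prove Theorem 1.2. The next lemma is a Hardy type inequality.

Lemma 3.1. If $u(v) \in W^{s,p}(\mathbb{R})$ $(p > 1,\ s > \frac{1}{p})$ and $u(0) = 0$, then
\[
\int_{\mathbb{R}} \left| \frac{u(v)}{v} \right| dv \le C \|u\|_{W^{s,p}(\mathbb{R})},
\]
for some constant $C$.

Proof. Since $s > \frac{1}{p}$, the space $W^{s,p}(\mathbb{R})$ is embedded in the Hölder space $C^{0,\alpha}$ with $\alpha \in (0, s - \frac{1}{p})$. So
\[
|u(v)| = |u(v) - u(0)| \le |v|^{\alpha} \|u\|_{C^{0,\alpha}} \le C \|u\|_{W^{s,p}} |v|^{\alpha},
\]
and thus, with $p'$ the conjugate exponent of $p$,
\[
\int_{\mathbb{R}} \left| \frac{u(v)}{v} \right| dv \le \int_{-1}^{1} \left| \frac{u(v)}{v} \right| dv + \int_{|v| \ge 1} \left| \frac{u(v)}{v} \right| dv
\le C \|u\|_{W^{s,p}} \int_{-1}^{1} |v|^{-1 + \alpha}\, dv + \left( \int_{|v| \ge 1} \frac{1}{|v|^{p'}}\, dv \right)^{\frac{1}{p'}} \|u\|_{L^p}
\le C \|u\|_{W^{s,p}(\mathbb{R})}.
\]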
Proof of Theorem 1.2. Suppose otherwise. Then there exist a sequence $\varepsilon_n \to 0$ and nontrivial travelling wave solutions $(f_n(x - c_n t, v), E_n(x - c_n t))$ to (1) such that $E_n(x)$ is not identically zero, $\int_0^T E_n(x)\, dx = 0$, $f_n(x, v)$ and $\beta_n(x)$ are $T$-periodic in $x$,
\[
\int_0^T\!\!\int_{\mathbb{R}} v^2 f_n(x, v)\, dv\, dx < \infty \quad \text{and} \quad \|f_n(x, v) - f_0(v)\|_{W^{s,p}_{x,v}} < \varepsilon_n.
\]
The travelling BGK waves satisfy
\[
(v - c_n)\, \partial_x f_n - E_n\, \partial_v f_n = 0, \tag{23}
\]
and
\[
\frac{\partial E_n}{\partial x} = -\int_{-\infty}^{+\infty} f_n\, dv + 1. \tag{24}
\]
Because $f_n \in W^{s,p}_{x,v}$ with $s > 1 + \frac{1}{p} > \frac{2}{p}$, by Sobolev embedding
\[
\|f_n\|_{L^\infty_{x,v}} \le C \|f_n\|_{W^{s,p}_{x,v}} < \infty.
\]
By a standard estimate in kinetic theory,
\[
\rho_n = \int f_n\, dv \le C \|f_n\|_{L^\infty_{x,v}}^{\frac{2}{3}} \left( \int v^2 f_n\, dv \right)^{\frac{1}{3}}, \tag{25}
\]
and thus $\rho_n \in L^3(0, T)$. So $E_n(x) \in W^{1,3}(0, T)$, which implies that $E_n(x) \in H^1(0, T)$ and $E_n(x)$ is absolutely continuous. Define two sets $P_n = \{E_n \ne 0\}$ and $Q_n = \{E_n = 0\}$. Then $P_n$ is of non-zero measure and $E_n' = 0$ a.e. on $Q_n$. Thus we have
\[
\int_0^T |E_n'(x)|^2\, dx = -\int_0^T \rho_n(x)\, E_n'(x)\, dx = -\int_{P_n} \rho_n(x)\, E_n'(x)\, dx. \tag{26}
\]
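For completeness, here is one standard way to obtain an estimate of the form (25) in one velocity dimension. The splitting radius $R$ and the moment notation $M$ are introduced only for this sketch, and the constant $3$ is simply what this particular argument gives for $f \ge 0$.

```latex
\[
\rho=\int_{|v|\le R}f\,dv+\int_{|v|>R}f\,dv
\;\le\; 2R\,\|f\|_{L^\infty}+\frac{1}{R^{2}}\int v^{2}f\,dv ,
\qquad M:=\int v^{2}f\,dv .
\]
\[
R=\Big(\frac{M}{\|f\|_{L^\infty}}\Big)^{1/3}
\;\Longrightarrow\;
\rho\;\le\;3\,\|f\|_{L^\infty}^{2/3}\,M^{1/3}.
\]
```

Cubing and integrating in $x$ gives $\int_0^T \rho^3\, dx \le 27\, \|f\|_{L^\infty}^2 \int_0^T\!\!\int v^2 f\, dv\, dx$, which is finite under the energy assumption, so $\rho \in L^3(0, T)$.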
Since $s - 1 > \frac{1}{p}$, by the trace theorem for fractional Sobolev spaces,
\[
\partial_x f_n|_{v = c_n},\ \partial_v f_n|_{v = c_n} \in L^p(0, T).
\]
So from Eq. (23), $\partial_v f_n|_{v = c_n} = 0$ for a.e. $x \in P_n$. By Lemma 3.1, for a.e. $x \in P_n$,
\[
\left| \int \frac{\partial_v f_n}{v - c_n}\, dv \right|(x) \le C \|f_n(x, v)\|_{W^{s,p}_v} \in L^p(P_n).
\]
From (23), when $x \in P_n$,
\[
\rho_n'(x) = \int \frac{\partial_v f_n}{v - c_n}\, dv\; E_n(x) \in L^p(P_n),
\]
and it follows from (26) that
\[
\int_0^T |E_n'(x)|^2\, dx - \int_{P_n} \int \frac{\partial_v f_n}{v - c_n}\, dv\; E_n(x)^2\, dx = 0. \tag{27}
\]
Denote by $|P_n|$ the measure of the set $P_n$. We consider two cases.

Case 1. $|P_n| \to 0$ when $n \to \infty$. Since
\[
\|E_n\|_{L^\infty(0,T)} \le \|E_n'\|_{L^1(0,T)} \le \sqrt{T}\, \|E_n'\|_{L^2(0,T)},
\]
from (27) we get
\[
\begin{aligned}
\|E_n'\|_{L^2(0,T)}^2
&\le T \|E_n'\|_{L^2(0,T)}^2 \int_{P_n} \int \left| \frac{\partial_v f_n}{v - c_n} \right| dv\, dx \\
&\le C T \|E_n'\|_{L^2(0,T)}^2 \int_{P_n} \|f_n(x, v)\|_{W^{s,p}_v}\, dx \\
&\le C T \|E_n'\|_{L^2(0,T)}^2 \left( \int_{P_n} \|f_n(x, v) - f_0\|_{W^{s,p}_v}\, dx + |P_n|\, \|f_0\|_{W^{s,p}} \right) \\
&\le C T \|E_n'\|_{L^2(0,T)}^2 \left( C \|f_n(x, v) - f_0\|_{W^{s,p}_{x,v}} + |P_n|\, \|f_0\|_{W^{s,p}} \right) \\
&< \|E_n'\|_{L^2(0,T)}^2,
\end{aligned}
\]
when $n$ is large enough. Thus for large $n$, $\|E_n'\|_{L^2(0,T)} = 0$ and thus $E_n(x) \equiv 0$, which is a contradiction.

Case 2. $|P_n| \to d > 0$ when $n \to \infty$. When $n$ is large enough, we have $|P_n| \ge \frac{d}{2}$. By the trace theorem,
\[
\|\partial_v f_n(x, c_n) - \partial_v f_0(c_n)\|_{L^p(P_n)} \le \|\partial_v f_n(x, c_n) - \partial_v f_0(c_n)\|_{L^p(0,T)} \le C \|f_n - f_0\|_{W^{s,p}} \le C \varepsilon_n.
\]
Since $\partial_v f_n(x, c_n) = 0$ for a.e. $x \in P_n$, we have $\|\partial_v f_0(c_n)\|_{L^p(P_n)} \le C \varepsilon_n$, which implies that
\[
|\partial_v f_0(c_n)| \le \frac{C \varepsilon_n}{\left(\frac{d}{2}\right)^{\frac{1}{p}}}.
\]
Thus $\partial_v f_0(c_n) \to 0$ when $n \to +\infty$. Therefore there exists a subsequence of $\{c_n\}$ such that either it converges to one of the critical points of $f_0$, say $v_i \in S$, or it diverges. We discuss these two cases separately below. To simplify notation, we still denote the subsequence by $\{c_n\}$.

Case 2.1. $c_n \to v_i \in S$. Rewrite (27) as
\[
\int_0^T |E_n'(x)|^2\, dx = \int_{\mathbb{R}} \frac{\partial_v f_0}{v - v_i}\, dv \int_{P_n} E_n(x)^2\, dx + \int_{P_n} V_n(x)\, E_n(x)^2\, dx, \tag{28}
\]
where
\[
V_n(x) = \int_{\mathbb{R}} \frac{\partial_v f_n}{v - c_n}\, dv - \int_{\mathbb{R}} \frac{\partial_v f_0}{v - v_i}\, dv = \int_{\mathbb{R}} \frac{\partial_v \left( f_n(x, v + c_n) - f_0(v + v_i) \right)}{v}\, dv.
\]
Note that $\partial_v \left( f_n(x, v + c_n) - f_0(v + v_i) \right)|_{v=0} = 0$ for $x \in P_n$, so by Lemma 3.1, we have
\[
\begin{aligned}
\int_{P_n} |V_n(x)|\, dx
&\le C \int_{P_n} \|f_n(x, v + c_n) - f_0(v + v_i)\|_{W^{s,p}_v}\, dx \\
&\le C \int_0^T \left( \|f_n - f_0\|_{W^{s,p}_v} + \|f_0(v + c_n) - f_0(v + v_i)\|_{W^{s,p}} \right) dx \\
&\le C \left( \|f_n - f_0\|_{W^{s,p}_{x,v}} + \|f_0(v + c_n) - f_0(v + v_i)\|_{W^{s,p}} \right).
\end{aligned}
\]
So $\int_{P_n} |V_n(x)|\, dx \to 0$ when $n \to \infty$. Since $\int_0^T E_n(x)\, dx = 0$ and $E_n \in H^1(0, T)$ is $T$-periodic, we have
\[
\|E_n'\|_{L^2(0,T)} \ge \frac{2\pi}{T} \|E_n\|_{L^2(0,T)}.
\]
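The inequality above is the Poincaré (Wirtinger) inequality for mean-zero periodic functions. A sketch via Fourier series and Parseval's identity follows; the coefficients $\hat E_k$ are notation introduced for this illustration.

```latex
\[
E(x)=\sum_{0\ne k\in\mathbb Z}\hat E_k\,e^{i\frac{2\pi}{T}kx}
\;\Longrightarrow\;
\|E'\|_{L^2(0,T)}^{2}
=T\sum_{k\ne 0}\Big(\frac{2\pi k}{T}\Big)^{2}|\hat E_k|^{2}
\;\ge\;\Big(\frac{2\pi}{T}\Big)^{2}\,T\sum_{k\ne 0}|\hat E_k|^{2}
=\Big(\frac{2\pi}{T}\Big)^{2}\|E\|_{L^2(0,T)}^{2}.
\]
```

The $k = 0$ mode is absent because $\int_0^T E\, dx = 0$, and equality holds exactly when only the modes $|k| = 1$ are present.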
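Also by the assumption of Theorem 1.2,
\[
a_i = \int_{\mathbb{R}} \frac{\partial_v f_0}{v - v_i}\, dv < \left(\frac{2\pi}{T}\right)^2.
\]
Combining the above, from (28) we get
\[
\begin{aligned}
\|E_n'\|_{L^2(0,T)}^2
&\le \frac{\max\{a_i, 0\}}{\left(\frac{2\pi}{T}\right)^2} \|E_n'\|_{L^2(0,T)}^2 + \int_{P_n} |V_n(x)|\, dx\, \|E_n\|_{L^\infty}^2 \\
&\le \|E_n'\|_{L^2(0,T)}^2 \left( \frac{\max\{a_i, 0\}}{\left(\frac{2\pi}{T}\right)^2} + T \int_{P_n} |V_n(x)|\, dx \right) \\
&< \|E_n'\|_{L^2(0,T)}^2,
\end{aligned}
\]
when $n$ is large enough. A contradiction again.

Case 2.2. $\{c_n\}$ diverges. We assume $c_n \to +\infty$; the case when $c_n \to -\infty$ is similar. Again, for a.e. $x \in P_n$, $\partial_v f_n(x, c_n) = 0$. Let $\chi_n(v)$ be a cut-off function such that $0 \le \chi_n \le 1$, $\chi_n(v) = 1$ when $v \in \left[\frac{c_n}{2}, \frac{3c_n}{2}\right]$, $\chi_n(v) = 0$ when $v \notin \left[\frac{c_n}{2} - 1, \frac{3c_n}{2} + 1\right]$, and $|\chi_n|_{C^1} \le M$ (independent of $n$). Since $W^{s_1,p} \hookrightarrow W^{s_2,p}$ when $s_1 > s_2$, we can assume $\frac{1}{p} < s - 1 \le 1$. Then, with $p'$ the conjugate exponent of $p$,
\[
\begin{aligned}
\int_{P_n} \left| \int_{\mathbb{R}} \frac{\partial_v f_n}{v - c_n}\, dv \right| dx
&\le \int_{P_n} \left( \left| \int_{\mathbb{R}} \frac{\chi_n\, \partial_v f_n}{v - c_n}\, dv \right| + \left| \int_{\mathbb{R}} \frac{(1 - \chi_n)\, \partial_v f_n}{v - c_n}\, dv \right| \right) dx \\
&\le C \int_{P_n} \left( \|\chi_n\, \partial_v f_n\|_{W^{s-1,p}_v} + \int_{|v - c_n| \ge \frac{c_n}{2}} \left| \frac{\partial_v f_n}{v - c_n} \right| dv \right) dx \\
&\le C \int_0^T \left( \|\chi_n\, \partial_v (f_n - f_0)\|_{W^{s-1,p}_v} + \|\chi_n\, \partial_v f_0\|_{W^{s-1,p}} + c_n^{-1 + \frac{1}{p'}} \|f_n\|_{W^{1,p}_v} \right) dx \\
&\le C(M) \|f_n - f_0\|_{W^{s,p}_{x,v}} + C T \|\chi_n\, \partial_v f_0\|_{W^{s-1,p}} + C T c_n^{-1 + \frac{1}{p'}} \|f_n\|_{W^{1,p}_{x,v}} \\
&\to 0, \quad \text{when } n \to \infty,
\end{aligned}
\]
and this again leads to a contradiction as in Case 1. In the above, we use two estimates:

i) $\|\chi_n\, \partial_v (f_n - f_0)\|_{W^{s-1,p}_v} \le C(M) \|\partial_v (f_n - f_0)\|_{W^{s-1,p}_v}$;

ii)
\[
\|\chi_n\, \partial_v f_0\|_{W^{s-1,p}} \to 0, \quad \text{when } n \to \infty. \tag{29}
\]

We prove them below. Estimate i) follows from the following general estimate: given $u(v) \in C_0^1(\mathbb{R})$, then for any $g \in W^{\alpha,p}(\mathbb{R})$ $(p > 1,\ 0 \le \alpha \le 1)$, we have
\[
\|u g\|_{W^{\alpha,p}(\mathbb{R})} \le C \|u\|_{C^1} \|g\|_{W^{\alpha,p}(\mathbb{R})}. \tag{30}
\]
This estimate is obvious for $\alpha = 0$ and $\alpha = 1$, and the case $\alpha \in (0, 1)$ then follows from the interpolation theorem. To show estimate ii), we first note that for any $h \in C_0^\infty(\mathbb{R})$, obviously
\[
\|\chi_n h\|_{W^{s-1,p}} \le C \|\chi_n h\|_{W^{1,p}} \to 0, \quad \text{when } n \to \infty.
\]
Then the estimate (29) follows by using the fact that $C_0^\infty(\mathbb{R})$ is dense in $W^{s-1,p}$ and the estimate (30). This finishes the proof of Theorem 1.2.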
In the above proof of Theorem 1.2, we do not assume that the possible BGK waves have the form (18) or the electric field vanishes only at finitely many points. So we can exclude any traveling structures which might have the form of a nontrivial wave profile plus a homogeneous part. The following lemma shows that the condition 0 < T < T0 in Theorem 1.2 is necessary. Lemma 3.2. Assume f 0 (v) ∈ C 4 (R) ∩ W 2, p (R) ( p > 1). Let S = {vi }li=1 be the set of all extrema points of f 0 and 0 < T0 < +∞ defined by 2 f 0 (v) f 0 (v) 2π = max dv = dv. (31) vi ∈S T0 v − vi v − vm
314
Z. Lin, C. Zeng
Then ∃ ε0 > 0, such that for any 0 < ε < ε0 there exist nontrivial travelling wave solutions ( f ε (x − vm t, v), E ε (x − vm t)) to (1), such that ( f ε (x, v), E ε (x)) has period T0 in x, E ε (x) not identically zero, and T f ε − f 0 L 1x,v + v 2 | f 0 − f ε | d xdv + f ε − f 0 W 2, p < ε. (32) x,v
R
0
Proof. To simplify notations, we assume vm = 0. As in the proof of Lemma 2.2, for δ1 > 0 we define v f 0 (v) + f 0 (−v) v f δ1 (v) = f 0 (v) 1 − σ + σ δ1 2 δ1 v f 0 (v) − f 0 (−v) , σ = f 0 (v) − 2 δ1 where σ (v) is the cut-off function defined by (7). Then we have: 2 f δ1 (v) f 0 (v) 2π i) f δ1 (0) = 0, dv = dv = , v v T0 and f δ1 (v) ∈ C 4 (R) ∩ W 2, p (R) ; f δ1 − f 0 W 2, p (R) → 0, when δ1 → 0.
ii)
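Property i) follows since $\sigma(v)$ is even. To prove property ii), we only need to show that $\|\partial_{vv}(f_{\delta_1} - f_0)\|_{L^p(\mathbb{R})} \to 0$ when $\delta_1 \to 0$, since in the proof of Lemma 2.2 it is already shown that $\|f_{\delta_1} - f_0\|_{W^{1,p}(\mathbb{R})} \to 0$ when $\delta_1 \to 0$. Note that
\[
\partial_{vv}(f_0 - f_{\delta_1}) = \frac{1}{2\delta_1^2}\,\sigma''\!\left(\frac{v}{\delta_1}\right)\big(f_0(v) - f_0(-v)\big) + \frac{1}{\delta_1}\,\sigma'\!\left(\frac{v}{\delta_1}\right)\big(f_0'(v) + f_0'(-v)\big) + \frac{1}{2}\,\sigma\!\left(\frac{v}{\delta_1}\right)\big(f_0''(v) - f_0''(-v)\big) = I + II + III.
\]
Since $f_0'(0) = 0$,
\[
f_0(v) - f_0(-v) = \int_{-v}^{v} f_0'(s)\,ds = \int_{-v}^{v}\!\int_0^{s} f_0''(\tau)\,d\tau\,ds
= \int_0^{v} (v - \tau)\,f_0''(\tau)\,d\tau + \int_{-v}^{0} (-v - \tau)\,f_0''(\tau)\,d\tau,
\]
and
\[
\begin{aligned}
|f_0(v) - f_0(-v)|^p
&\le C\left( \left|\int_0^{v} (v - \tau)\,f_0''(\tau)\,d\tau\right|^p + \left|\int_{-v}^{0} (-v - \tau)\,f_0''(\tau)\,d\tau\right|^p \right) \\
&\le C \int_0^{v} |f_0''(\tau)|^p\,d\tau \left(\int_0^{v} (v - \tau)^{\frac{p}{p-1}}\,d\tau\right)^{p-1}
 + C \int_{-v}^{0} |f_0''(\tau)|^p\,d\tau \left(\int_{-v}^{0} (v + \tau)^{\frac{p}{p-1}}\,d\tau\right)^{p-1} \\
&\le C\,v^{2p-1}\,\|f_0''\|^p_{L^p(-v,v)},
\end{aligned}
\]
so
\[
\int_{\mathbb{R}} |I|^p\,dv \le \frac{C}{\delta_1^{2p}} \int_{\delta_1}^{2\delta_1} |f_0(v) - f_0(-v)|^p\,dv
\le \frac{C}{\delta_1^{2p}} \int_{\delta_1}^{2\delta_1} v^{2p-1}\,\|f_0''\|^p_{L^p(-v,v)}\,dv
\le C\,\|f_0''\|^p_{L^p(-2\delta_1, 2\delta_1)}.
\]
Similarly,
\[
\int_{\mathbb{R}} |II|^p\,dv \le \frac{C}{\delta_1^{p}} \int_{\delta_1}^{2\delta_1} \left( \left|\int_0^{v} f_0''(\tau)\,d\tau\right|^p + \left|\int_{-v}^{0} f_0''(\tau)\,d\tau\right|^p \right) dv
\le \frac{C}{\delta_1^{p}} \int_{\delta_1}^{2\delta_1} v^{p-1}\,\|f_0''\|^p_{L^p(-v,v)}\,dv
\le C\,\|f_0''\|^p_{L^p(-2\delta_1, 2\delta_1)},
\]
and
\[
\int_{\mathbb{R}} |III|^p\,dv \le C\,\|f_0''\|^p_{L^p(-2\delta_1, 2\delta_1)},
\]
thus when $\delta_1 \to 0$, $\|\partial_{vv}(f_{\delta_1} - f_0)\|_{L^p(\mathbb{R})} \to 0$. Choose $\delta_1 > 0$ such that $\|f_{\delta_1} - f_0\|_{W^{2,p}(\mathbb{R})} < \varepsilon/2$.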
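Since $f_{\delta_1} \in C^4(\mathbb{R}) \cap W^{2,p}(\mathbb{R})$ and
\[
\int \frac{f_{\delta_1}'(v)}{v}\,dv = \left(\frac{2\pi}{T_0}\right)^2,
\]
this is exactly Case 3 treated in the proof of Proposition 2.1, so we can construct a nontrivial BGK solution $(f_\varepsilon, E_\varepsilon)$ near $(f_{\delta_1}(v), 0)$ satisfying
\[
\|f_\varepsilon - f_{\delta_1}\|_{L^1_{x,v}} + \int_0^{T}\!\!\int_{\mathbb{R}} v^2\,|f_\varepsilon - f_{\delta_1}|\,dx\,dv + \|f_\varepsilon - f_{\delta_1}\|_{W^{2,p}_{x,v}} < \frac{\varepsilon}{2}.
\]
Thus $(f_\varepsilon, E_\varepsilon)$ is a BGK solution satisfying (32).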
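The symmetrization used in this proof can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the cut-off `sigma` is a hypothetical smooth even function equal to 1 on $|v| \le 1$ and 0 on $|v| \ge 2$ (the paper's definition (7) is not reproduced here), and `f0` is an arbitrary non-even test profile with a critical point at $v = 0$. It checks that the modified profile is even near the origin and coincides with $f_0$ away from it.

```python
import numpy as np

def _bump(x):
    """exp(-1/x) for x > 0 and 0 otherwise (standard smooth building block)."""
    out = np.zeros_like(x, dtype=float)
    pos = x > 0
    out[pos] = np.exp(-1.0 / x[pos])
    return out

def sigma(v):
    """Hypothetical even cut-off: 1 for |v| <= 1, 0 for |v| >= 2."""
    a = np.abs(np.asarray(v, dtype=float))
    top = _bump(2.0 - a)
    return top / (top + _bump(a - 1.0))

def f0(v):
    """Assumed test profile: not even, with f0'(0) = 0."""
    v = np.asarray(v, dtype=float)
    return np.exp(-0.5 * v**2) * (1.0 + 0.5 * np.tanh(v) ** 3)

def f_delta(v, delta):
    """f0 with its odd part removed on the scale delta around v = 0."""
    v = np.asarray(v, dtype=float)
    return f0(v) - 0.5 * (f0(v) - f0(-v)) * sigma(v / delta)

delta = 0.1
inner = np.linspace(0.0, delta, 50)       # region where sigma(v/delta) = 1
outer = np.linspace(2 * delta, 5.0, 50)   # region where sigma(v/delta) = 0

even_near_zero = np.allclose(f_delta(inner, delta), f_delta(-inner, delta))
unchanged_far = np.allclose(f_delta(outer, delta), f0(outer))
print(even_near_zero, unchanged_far)
```

Because the correction is an odd function times an even cut-off, its derivative is even, which is why the integral of $f_{\delta_1}'(v)/v$ agrees with that of $f_0'(v)/v$ in property i).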
From the proof of Theorem 1.2, it is easy to get Corollary 1.2.

Proof of Corollary 1.2. Suppose otherwise. Then there exist a sequence $\varepsilon_n \to 0$ and homogeneous states $\{f_n(v)\}$ which are linearly unstable with $x$-period $T$ and satisfy $\|f_n - f_0\|_{W^{s,p}(\mathbb{R})} < \varepsilon_n$. By Lemma 6.1, for each $n$, there exists a critical point $v_n$ of $f_n(v)$ such that
\[
\int \frac{f_n'(v)}{v - v_n}\,dv > \left(\frac{2\pi}{T}\right)^2.
\]
Since
\[
|f_0'(v_n)| \le \|\partial_v(f_n - f_0)\|_{C(\mathbb{R})} \le C\,\|f_n - f_0\|_{W^{s,p}(\mathbb{R})} \le C\varepsilon_n,
\]
either $\{v_n\}$ converges to one of the critical points of $f_0(v)$, say $v_0$, or $\{v_n\}$ diverges. As in the proof of Theorem 1.2, in the first case, we have
\[
\int \frac{f_n'(v)}{v - v_n}\,dv \to \int \frac{f_0'(v)}{v - v_0}\,dv, \quad \text{when } n \to \infty.
\]
This implies that
\[
\int \frac{f_0'(v)}{v - v_0}\,dv \ge \left(\frac{2\pi}{T}\right)^2 > \left(\frac{2\pi}{T_0}\right)^2,
\]
a contradiction. For the second case, we have
\[
\int \frac{f_n'(v)}{v - v_n}\,dv \to 0, \quad \text{when } n \to \infty,
\]
a contradiction again.
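The Penrose-type integral $\int f_0'(v)/(v - v_i)\,dv$ that controls the critical period $T_0$ can be evaluated numerically for explicit profiles. The following is a minimal sketch, not part of the paper: the function names, the midpoint grid, and the two test profiles (an unnormalized Maxwellian $e^{-v^2/2}$ and a two-humped profile $v^2 e^{-v^2/2}$) are assumptions chosen for illustration. The grid is symmetric about the critical point and never hits it, so the quotient stays bounded.

```python
import math
import numpy as np

def penrose_integral(f0_prime, v_crit, half_width=30.0, step=1e-3):
    """Midpoint-rule value of the integral of f0'(v) / (v - v_crit)."""
    n = int(round(half_width / step))
    offsets = (np.arange(-n, n) + 0.5) * step   # v - v_crit, never zero
    v = v_crit + offsets
    return float(np.sum(f0_prime(v) / offsets) * step)

def critical_period(f0_prime, critical_points):
    """T0 with (2*pi/T0)^2 = max(0, max_i integral); inf if nothing is positive."""
    k2_max = max(0.0, max(penrose_integral(f0_prime, vc) for vc in critical_points))
    return math.inf if k2_max == 0.0 else 2.0 * math.pi / math.sqrt(k2_max)

def maxwellian_prime(v):
    # derivative of exp(-v^2/2)
    return -v * np.exp(-0.5 * v**2)

def two_stream_prime(v):
    # derivative of v^2 * exp(-v^2/2); critical points at 0 and +-sqrt(2)
    return (2.0 * v - v**3) * np.exp(-0.5 * v**2)

root2 = math.sqrt(2.0)
T0_maxwellian = critical_period(maxwellian_prime, [0.0])
T0_two_stream = critical_period(two_stream_prime, [-root2, 0.0, root2])
print(T0_maxwellian, T0_two_stream)
```

For the Maxwellian the only integral is negative, so the sketch returns an infinite $T_0$, in line with the stability of Maxwellians for all periods. For the two-humped profile only the minimum at $v = 0$ contributes a positive value.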
4. Linear Damping

In this section, we study in detail the linear damping problem in Sobolev spaces. First, the linear decay estimates derived here are used in Sect. 5 to show that all invariant structures in the $H^s$ $(s > \frac{3}{2})$ neighborhood of stable homogeneous states are trivial. Second, the linear decay holds true for initial data as rough as $f(t = 0) \in L^2$, and this suggests that Theorems 1.1, 1.2 and 1.3 about nonlinear dynamics have no analogues at the linear level. We refer to Remark 4.3 for more discussions.

The linearized Vlasov-Poisson system around a homogeneous state $(f_0(v), 0)$ is the following:
\[
\begin{cases}
\dfrac{\partial f}{\partial t} + v\,\dfrac{\partial f}{\partial x} - E\,\dfrac{\partial f_0}{\partial v} = 0, \\[2mm]
\dfrac{\partial E}{\partial x} = -\displaystyle\int_{-\infty}^{+\infty} f\,dv,
\end{cases} \tag{33}
\]
where $f$ and $E$ are $T$-periodic in $x$ and the neutralizing condition becomes $\int_0^T\!\int_{\mathbb{R}} f\,dv\,dx = 0$. Notice that any $(f, E) = (g(v), 0)$ with $\int g(v)\,dv = 0$ is a steady solution of the linear system (33). For a general solution $(f, E)$ of (33), the homogeneous component of $f$ remains steady and does not affect the evolution of $E$. So we can consider a function $h(x, v)$ which is $T$-periodic in $x$ and satisfies $\int_0^T h(x, v)\,dx = 0$. Denote its Fourier series representation by
\[
h(x, v) = \sum_{0 \ne k \in \mathbb{Z}} e^{i\frac{2\pi}{T}kx}\,h_k(v).
\]
We define the space $H_x^{s_x} H_v^{s_v}$ by
\[
h \in H_x^{s_x} H_v^{s_v} \quad \text{if} \quad \|h\|_{H_x^{s_x} H_v^{s_v}} = \left( \sum_{k \ne 0} |k|^{2s_x}\,\|h_k\|^2_{H_v^{s_v}} \right)^{\frac{1}{2}} < \infty.
\]
Proposition 4.1. Assume $f_0(v) \in H^{s_0}(\mathbb{R})$ $(s_0 > \frac{3}{2})$. Let $(f(x, v, t), E(x, t))$ be a solution of (33) with $x$-period $T < T_0$ and $g(x, v) = f(x, v, 0) - \frac{1}{T}\int_0^T f(x, v, 0)\,dx$. If $g \in H_x^{s_x} H_v^{s_v}$ with $|s_v| \le s_0 - 1$, then
\[
\left\| t^{s_v} E(x, t) \right\|_{L^2_t H_x^{\frac{3}{2} + s_x + s_v}} \le C_0\,\|g\|_{H_x^{s_x} H_v^{s_v}} \le C_0\,\|f(x, v, 0)\|_{H_x^{s_x} H_v^{s_v}}, \tag{34}
\]
for some constant $C_0$.

One may compare this proposition with other smoothing estimates in PDEs. Here, based on the most naive estimate, the initial value $g \in H_x^{s_x} H_v^{s_v} \subset H_{x,v}^{s_x + s_v}$ only implies $E(0) \in H_x^{s_x + 1}$, which is much weaker than $H_x^{\frac{3}{2} + s_v + s_x}$ in the above proposition. However, this improved regularity of $E$ may blow up as $t \to 0$.

Proof. To simplify notations, we assume $T = 2\pi$. Let
\[
g(x, v) = \sum_{0 \ne k \in \mathbb{Z}} e^{ikx}\,g_k(v),
\]
then by assumption
\[
\|g\|^2_{H_x^{s_x} H_v^{s_v}} = \sum_{0 \ne k \in \mathbb{Z}} |k|^{2s_x}\,\|g_k\|^2_{H_v^{s_v}} < \infty.
\]
Let
\[
f(x, v, t) = \sum_{k \ne 0} e^{ikx}\,h_k(v, t), \qquad E(x, t) = \sum_{0 \ne k \in \mathbb{Z}} e^{ikx}\,E_k(t),
\]
then
\[
E_k(t) = -\frac{1}{ik}\int_{\mathbb{R}} h_k(v, t)\,dv, \qquad E_k(0) = -\frac{1}{ik}\int_{\mathbb{R}} g_k(v)\,dv.
\]
Below we denote by $C$ a generic constant depending only on $f_0$. When $k > 0$, we use the well-known formula for $E_k(t)$,
\[
E_k(t) = \frac{1}{2\pi i}\int_{\sigma - i\infty}^{\sigma + i\infty} \frac{G_k(-p/ik)}{k^2 - F(-p/ik)}\,e^{pt}\,dp, \tag{35}
\]
where
\[
G_k(z) = \int_{-\infty}^{+\infty} \frac{g_k(v)}{v - z}\,dv, \qquad F(z) = \int_{-\infty}^{+\infty} \frac{f_0'(v)}{v - z}\,dv, \qquad \operatorname{Im} z > 0,
\]
and $\sigma$ is chosen so that the integrand in (35) has no poles for $\operatorname{Re} p > \sigma$. The formula (35) was derived in Landau's original 1946 paper ([29]) by using Laplace transforms. Here we follow the notations in ([48]). By using the new variable $z = -p/ik$, we get
\[
E_k(t) = \frac{k}{2\pi}\int_{\frac{i\sigma}{k} - \infty}^{\frac{i\sigma}{k} + \infty} \frac{G_k(z)}{k^2 - F(z)}\,e^{-ikzt}\,dz. \tag{36}
\]
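By assumption $k \ge 1 = \frac{2\pi}{T} > \frac{2\pi}{T_0}$, so by Penrose's criterion (Lemma 6.1), there exist no unstable modes to the linearized equation with $x$-period $2\pi/k$. Therefore, $k^2 - F(z) \ne 0$ when $\operatorname{Im} z > 0$. Moreover, by the proof of Lemma 6.1, under the condition $k > \frac{2\pi}{T_0}$, $k^2 - F(x + i0) \ne 0$ for any $x \in \mathbb{R}$. It is also easy to see that $F(x + i0) \to 0$ when $x \to \infty$. So there exists $c_0 > 0$ such that
\[
\left| k^2 - F(x + i0) \right| \ge c_0 k^2, \quad \text{for any } x \in \mathbb{R} \text{ and } k. \tag{37}
\]
Note that for $z = i\sigma + x$, when $\sigma \to 0+$, by (50),
\[
G_k(z) \to G_k(x + i0) = P\!\int_{\mathbb{R}} \frac{g_k(v)}{v - x}\,dv + i\pi g_k(x) = \mathcal{H} g_k + i\pi g_k,
\]
and
\[
F(z) \to F(x + i0) = P\!\int_{\mathbb{R}} \frac{f_0'(v)}{v - x}\,dv + i\pi f_0'(x) = \mathcal{H} f_0' + i\pi f_0',
\]
where $\mathcal{H}$ is the Hilbert transform. So letting $\sigma \to 0+$, from (36), we have
\[
E_k(t) = \frac{k}{2\pi}\int_{\mathbb{R}} \frac{G_k(x + i0)}{k^2 - F(x + i0)}\,e^{-ikxt}\,dx. \tag{38}
\]
Let
\[
A_k(t) = \frac{1}{2\pi}\int_{\mathbb{R}} \frac{G_k(x + i0)}{k^2 - F(x + i0)}\,e^{-ixt}\,dx
\]
be the Fourier transform of
\[
H_k(x) = \frac{G_k(x + i0)}{k^2 - F(x + i0)},
\]
then $E_k(t) = k A_k(kt)$. Since $\mathcal{H} : H^s \to H^s$ is bounded for any $s \in \mathbb{R}$,
\[
\|G_k(x + i0)\|_{H^{s_v}} \le C\,\|g_k\|_{H_v^{s_v}}, \qquad \|F(x + i0)\|_{H^{s_0 - 1}} \le C\,\|f_0\|_{H_v^{s_0}}.
\]
By (37) and the inequality
\[
\|f_1 f_2\|_{H^s} \le C_{s, s_1}\,\|f_1\|_{H^{s_1}}\,\|f_2\|_{H^s}, \quad \text{if } s_1 > \frac{1}{2},\ |s| \le s_1,
\]
we have
\[
\begin{aligned}
\|H_k\|_{H^{s_v}}
&\le \frac{1}{k^2}\,\|G_k(x + i0)\|_{H^{s_v}} + \frac{1}{k^4}\left\| G_k(x + i0)\,\frac{F(x + i0)}{1 - F(x + i0)/k^2} \right\|_{H^{s_v}} \\
&\le \frac{C}{k^2}\,\|g_k\|_{H_v^{s_v}} \left( 1 + \frac{1}{k^2}\left\| \frac{F(x + i0)}{1 - F(x + i0)/k^2} \right\|_{H^{s_0 - 1}} \right)
\le \frac{C}{k^2}\,\|g_k\|_{H_v^{s_v}},
\end{aligned}
\]
where $C$ depends on $f_0$ but not $k$. In the above, the second inequality holds since the estimates
\[
\left| 1 - F(x + i0)/k^2 \right| \ge c_0 \quad \text{and} \quad \|F(x + i0)\|_{H^{s_0 - 1}} \le C\,\|f_0\|_{H^{s_0}}
\]
imply
\[
\left\| \frac{F(x + i0)}{1 - F(x + i0)/k^2} \right\|_{H^{s_0 - 1}} < C\,\|f_0\|_{H^{s_0}} \tag{39}
\]
through direct verification where, for $0 < s_0 - 1 < 1$, one needs to use the equivalent characterization of $W^{s,p}(\mathbb{R}^n)$ when $0 < s < 1$, $p > 1$ (see [45, Lemma 35.2]):
\[
W^{s,p}(\mathbb{R}^n) = \left\{ u \in L^p(\mathbb{R}^n) \;\Big|\; \iint_{\mathbb{R}^n \times \mathbb{R}^n} \frac{|u(x) - u(y)|^p}{|x - y|^{n + sp}}\,dx\,dy < \infty \right\}.
\]
So
\[
\int_{\mathbb{R}} |t|^{2s_v}\,|A_k(t)|^2\,dt \le \|H_k\|^2_{H^{s_v}} \le \frac{C}{k^4}\,\|g_k\|^2_{H_v^{s_v}}
\]
and
\[
\left\| t^{s_v} E_k(t) \right\|^2_{L^2} = \int_{\mathbb{R}} |t|^{2s_v}\,|E_k(t)|^2\,dt = \int_{\mathbb{R}} |t|^{2s_v}\,k^2\,|A_k(kt)|^2\,dt
= k^{1 - 2s_v}\int_{\mathbb{R}} |t|^{2s_v}\,|A_k(t)|^2\,dt \le C k^{-3 - 2s_v}\,\|g_k\|^2_{H_v^{s_v}}.
\]
For $k < 0$, the same estimate
\[
\left\| t^{s_v} E_k(t) \right\|^2_{L^2} \le C\,|k|^{-3 - 2s_v}\,\|g_k\|^2_{H_v^{s_v}}
\]
follows by taking the complex conjugate of the $k > 0$ case. Thus
\[
\left\| t^{s_v} E(x, t) \right\|^2_{L^2_t H_x^{\frac{3}{2} + s_x + s_v}} = \sum_{k \ne 0} |k|^{3 + 2s_v + 2s_x}\,\left\| t^{s_v} E_k(t) \right\|^2_{L^2}
\le C \sum_{k \ne 0} |k|^{2s_x}\,\|g_k\|^2_{H_v^{s_v}} = C\,\|g\|^2_{H_x^{s_x} H_v^{s_v}}.
\]
This finishes the proof.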
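The change of variables $t \mapsto kt$ used near the end of the proof of Proposition 4.1, which turns $\int |t|^{2s_v} k^2 |A_k(kt)|^2\,dt$ into $k^{1 - 2s_v}\int |t|^{2s_v} |A_k(t)|^2\,dt$, can be checked numerically. The sketch below is only a sanity check under assumptions: the Gaussian profile stands in for $A_k$, and the values of `k` and `s` are arbitrary choices not taken from the paper.

```python
import numpy as np

def weighted_energy(profile, s, scale=1.0, half_width=12.0, step=1e-4):
    """Midpoint-rule value of the integral of |t|^(2s) * |scale * profile(scale*t)|^2."""
    n = int(round(half_width / step))
    t = (np.arange(-n, n) + 0.5) * step          # symmetric grid avoiding t = 0
    values = scale * profile(scale * t)
    return float(np.sum(np.abs(t) ** (2.0 * s) * np.abs(values) ** 2) * step)

def gaussian(t):
    # assumed stand-in for A_k(t)
    return np.exp(-t**2)

k, s = 3.0, 0.75
lhs = weighted_energy(gaussian, s, scale=k)                  # uses k * A(k t)
rhs = k ** (1.0 - 2.0 * s) * weighted_energy(gaussian, s)    # k^(1-2s) times unscaled integral
print(lhs, rhs)
```

The two printed numbers should agree up to quadrature error, which is the scaling that produces the factor $k^{-3 - 2s_v}$ once the bound $C k^{-4}$ for $\|H_k\|^2_{H^{s_v}}$ is inserted.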
The decay estimate in Proposition 4.1 is in the integral form. With some additional assumption on the initial data, we can obtain the pointwise decay estimate.

Proposition 4.2. Assume $f_0(v) \in H^{s_0}(\mathbb{R})$ $(s_0 > \frac{3}{2})$ and let $0 < T_0 \le +\infty$ be defined by (3). Let $(f(x, v, t), E(x, t))$ be a solution of (33) with $x$-period $T < T_0$ and
\[
g(x, v) = f(x, v, 0) - \frac{1}{T}\int_0^T f(x, v, 0)\,dx.
\]
If $g \in H_x^{s_x} H_v^{s_v}$, $vg \in H_x^{s_x'} H_v^{s_v'}$ with $s_v > -\frac{1}{2}$, $s_v + s_v' \ge 0$ and $\max\{|s_v|, s_v'\} \le s_0 - 1$, then
\[
\|E\|_{H^s}(t) = o\!\left( t^{-\frac{s_v + s_v'}{2}} \right), \quad \text{when } t \to \infty,
\]
where
\[
s = \min\left\{ \frac{3}{2} + s_x + s_v,\ \frac{1}{2} + s_x' + s_v' \right\}. \tag{40}
\]

Corollary 4.1. Assume $f_0(v) \in H^{s_0}(\mathbb{R})$ $(s_0 > \frac{3}{2})$ and $T < T_0$.

(i) If $g \in H_x^{-\frac{3}{2}} L^2_v$ and $vg \in H_x^{-\frac{1}{2}} L^2_v$, then $\|E\|_{L^2_x}(t) \to 0$ when $t \to \infty$.

(ii) If $g, vg \in H^k_{x,v}$ with $k \le s_0 - 1$, then $\|E\|_{H^{k + \frac{1}{2}}}(t) = o(t^{-k})$ when $t \to \infty$.
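Proposition 4.2 and its corollary show that linear damping is true for initial data of very low regularity, even in certain negative Sobolev spaces. They also show that the decay rate is mainly determined by the regularity in $v$, although the regularity in $x$ affects the norm of the electric field that decays.

Proof of Proposition 4.2. First we derive a formula for $E_t(t)$. We notice that $(f_t, E_t)$ satisfies the linear system (33) and
\[
f_t(x, v, 0) = -v\,\partial_x f(x, v, 0) + E(x, 0)\,f_0'(v)
= \sum_{k \ne 0} e^{ikx}\left( -ikv\,g_k(v) - \frac{1}{ik}\int_{\mathbb{R}} g_k(v)\,dv\; f_0'(v) \right) = \sum_{j=1}^{3} \tilde g^{\,j}(x, v),
\]
where
\[
\begin{aligned}
\tilde g^{\,1}(x, v) &= -\partial_x\big(v g(x, v)\big), \\
\tilde g^{\,2}(x, v) &= -\sum_{k \ne 0} \frac{1}{ik}\,e^{ikx}\int_{\mathbb{R}} g_k(v)\,\sigma(v)\,dv\; f_0'(v) = \sum_{k \ne 0} e^{ikx}\,\tilde g^{\,2}_k(v), \\
\tilde g^{\,3}(x, v) &= -\sum_{k \ne 0} \frac{1}{ik}\,e^{ikx}\int_{\mathbb{R}} g_k(v)\,\big(1 - \sigma(v)\big)\,dv\; f_0'(v) = \sum_{k \ne 0} e^{ikx}\,\tilde g^{\,3}_k(v),
\end{aligned}
\]
and $\sigma(v)$ is the cut-off function defined by (7). Then $\tilde g^{\,1} \in H_x^{s_x' - 1} H_v^{s_v'}$, $\tilde g^{\,2} \in H_x^{s_x + 1} H_v^{s_0 - 1}$, $\tilde g^{\,3} \in H_x^{s_x' + 1} H_v^{s_0 - 1}$, and
\[
\begin{aligned}
\|\tilde g^{\,1}\|_{H_x^{s_x' - 1} H_v^{s_v'}} &= \|vg\|_{H_x^{s_x'} H_v^{s_v'}}, \\
\|\tilde g^{\,2}\|^2_{H_x^{s_x + 1} H_v^{s_0 - 1}} &= \sum_k |k|^{2s_x}\left| \int_{\mathbb{R}} g_k(v)\,\sigma(v)\,dv \right|^2 \|f_0'\|^2_{H_v^{s_0 - 1}}
\le \sum_k |k|^{2s_x}\,\|g_k\|^2_{H_v^{s_v}}\,\|\sigma(v)\|^2_{H_v^{-s_v}}\,\|f_0\|^2_{H_v^{s_0}} \le C\,\|g\|^2_{H_x^{s_x} H_v^{s_v}}, \\
\|\tilde g^{\,3}\|^2_{H_x^{s_x' + 1} H_v^{s_0 - 1}} &\le \sum_k |k|^{2s_x'}\,\|v g_k\|^2_{H_v^{s_v'}}\,\left\| \frac{1 - \sigma(v)}{v} \right\|^2_{H_v^{-s_v'}}\,\|f_0\|^2_{H_v^{s_0}} \le C\,\|vg\|^2_{H_x^{s_x'} H_v^{s_v'}}.
\end{aligned}
\]
Correspondingly, we decompose
\[
(f_t, E_t) = \sum_{i=1}^{3} \big(f_t^i, E_t^i\big)
\]
with $(f_t^i, E_t^i)$ being the solution of (33) with initial data $f_t^i(t = 0) = \tilde g^{\,i}(x, v)$. Then by Proposition 4.1 we have
\[
\left\| t^{s_v'} E_t^1 \right\|_{L^2_t H_x^{\frac{1}{2} + s_x' + s_v'}} \le C\,\|vg\|_{H_x^{s_x'} H_v^{s_v'}}, \qquad
\left\| t^{s_0 - 1} E_t^2 \right\|_{L^2_t H_x^{\frac{5}{2} + s_x + s_0 - 1}} \le C\,\|g\|_{H_x^{s_x} H_v^{s_v}},
\]
and
\[
\left\| t^{s_0 - 1} E_t^3 \right\|_{L^2_t H_x^{\frac{5}{2} + s_x' + s_0 - 1}} \le C\,\|vg\|_{H_x^{s_x'} H_v^{s_v'}}.
\]
For any $t_2 > t_1$ sufficiently large and $s$ defined by (40), we have
\[
\begin{aligned}
\left| \|E\|^2_{H_x^s}(t_2) - \|E\|^2_{H_x^s}(t_1) \right|
&= \left| \int_{t_1}^{t_2} 2\operatorname{Re}\,\langle E(t), E_t(t) \rangle_{H_x^s}\,dt \right|
\le 2\int_{t_1}^{t_2} \|E(t)\|_{H_x^s}\,\Big\| \sum_{i=1}^{3} E_t^i(t) \Big\|_{H_x^s}\,dt \\
&\le 2\,t_1^{-s_v - s_v'}\int_{t_1}^{t_2} \left\| t^{s_v} E(t) \right\|_{H_x^s}\,\left\| t^{s_v'} E_t^1 \right\|_{H_x^{\frac{1}{2} + s_x' + s_v'}}\,dt \\
&\quad + 2\,t_1^{-s_v - (s_0 - 1)}\int_{t_1}^{t_2} \left\| t^{s_v} E(t) \right\|_{H_x^s}\,\left\| t^{s_0 - 1} E_t^2 \right\|_{H_x^{\frac{5}{2} + s_x + s_0 - 1}}\,dt \\
&\quad + 2\,t_1^{-s_v - (s_0 - 1)}\int_{t_1}^{t_2} \left\| t^{s_v} E(t) \right\|_{H_x^s}\,\left\| t^{s_0 - 1} E_t^3 \right\|_{H_x^{\frac{5}{2} + s_x' + s_0 - 1}}\,dt \\
&\le 2\,t_1^{-s_v - s_v'}\,\left\| t^{s_v} E(x, t) \right\|_{L^2_t(t_1, t_2) H_x^{\frac{3}{2} + s_x + s_v}} \cdot \Big( \left\| t^{s_v'} E_t^1 \right\|_{L^2_t(t_1, t_2) H_x^{\frac{1}{2} + s_x' + s_v'}} \\
&\qquad + \left\| t^{s_0 - 1} E_t^2 \right\|_{L^2_t(t_1, t_2) H_x^{\frac{5}{2} + s_x + s_0 - 1}} + \left\| t^{s_0 - 1} E_t^3 \right\|_{L^2_t(t_1, t_2) H_x^{\frac{5}{2} + s_x' + s_0 - 1}} \Big).
\end{aligned}
\]
So $\{\|E\|^2_{H_x^s}(t)\}_{t \ge 0}$ is a Cauchy sequence, thus $\lim_{t \to \infty} \|E\|^2_{H_x^s}(t)$ exists and must be zero since $\|t^{s_v} E\|^2_{L^2_t H_x^s} < \infty$ with $s_v > -\frac{1}{2}$. By fixing $t_1$ and letting $t_2 \to \infty$ in the above computation, it follows that
\[
\|E\|^2_{H_x^s}(t_1) = o\!\left( t_1^{-s_v - s_v'} \right).
\]
This finishes the proof.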
Remark 4.1. The integral decay estimate in Proposition 4.1 is optimal and the pointwise decay estimate in Proposition 4.2 is close to optimal. Intuitively, the integral estimate (34) suggests that
\[
\|E(x, t)\|_{H_x^{\frac{3}{2} + s_x + s_v}} = o\!\left( t^{-\left(s_v + \frac{1}{2}\right)} \right). \tag{41}
\]
In [48], the single-mode solution $e^{ikx}(f(v, t), E(t))$ with initial profile
\[
f(v, 0) = g(v) = \begin{cases} (v - \alpha)^2\,e^{-(v - \alpha)^2}, & v \ge \alpha, \\ 0, & v \le \alpha, \end{cases} \qquad \alpha \text{ an arbitrary constant},
\]
was calculated explicitly for the linearized problem at a Maxwellian, and the decay rate for $|E(t)|$ was found to be $O(t^{-3})$. Note that $g(v), vg \in H^2$ and $g'''$, $(vg)'''$ are delta functions which belong to $H^{-\frac{1}{2} - \varepsilon}$ for any $\varepsilon > 0$, and thus $g(v), vg \in H^{\frac{5}{2} - \varepsilon}$. So Proposition 4.1 suggests a decay rate $o(t^{-(3 - \varepsilon)})$ in the integral form and Corollary 4.1 (ii) yields a pointwise decay rate $o(t^{-\frac{5}{2} + \varepsilon})$. In [2, pp. 188–189], the authors made a more general claim about the decay rate of single mode solutions: for an initial profile $g(v)$ with $(n + 1)$th derivative being $\delta$-function like, the decay rate of $|E(t)|$ is $O(t^{-(n + 1)})$. In such cases, our results give the decay rates $o(t^{-(n - \varepsilon)})$ in the integral form and $o(t^{-(n + \frac{1}{2} - \varepsilon)})$ pointwise.

In Theorem 1.3, we use the integral estimate (34) to prove that $H^{\frac{3}{2}}$ is the critical regularity for existence or nonexistence of nontrivial invariant structures near stable homogeneous states. This again suggests that the decay estimate in Proposition 4.1 is optimal.

Remark 4.2. The linear decay result is also true for initial data in $L^p$ space. For simplicity, we consider a single mode solution
\[
(f(x, v, t), E(x, t)) = e^{ikx}\,(h(v, t), E(t)) \tag{42}
\]
to (33) with $h(v, 0) = g(v)$. Assume $f_0(v) \in L^1(\mathbb{R}) \cap W^{2,p_0}(\mathbb{R})$ $(p_0 > 1)$ and let $0 < T_0 \le +\infty$ be defined by (3). We have the following result: if $T = \frac{2\pi}{k} < T_0$ and $g(v) \in L^p$ $(p > 1)$, $v^2 g \in L^1$, then $|E(t)| \to 0$ when $t \to +\infty$. We prove it briefly below. Since $g \in L^p$ and $v^2 g \in L^1$,
\[
\|g\|_{L^1(\mathbb{R})} \le \int_{|v| \le 1} |g|\,dv + \int_{|v| \ge 1} |g|\,dv \le 2^{1/p'}\,\|g\|_{L^p} + \|v^2 g\|_{L^1} < \infty,
\]
and
\[
\|vg\|_{L^q} \le \left\| v\,|g|^{\frac{1}{2}} \right\|_{L^2}\,\left\| |g|^{\frac{1}{2}} \right\|_{L^{2p}} \le \|v^2 g\|^{\frac{1}{2}}_{L^1}\,\|g\|^{\frac{1}{2}}_{L^p},
\]
for $1 < q < 2$ satisfying $\frac{1}{q} = \frac{1}{2} + \frac{1}{2p}$. Since $q < p$, for any $1 < q_1 < q$, letting $\frac{1}{q_2} = \frac{1}{q_1} - \frac{1}{q}$, we have
\[
\|g\|_{L^{q_1}(\mathbb{R})} \le \|g\|_{L^{q_1}(|v| \le 1)} + \|g\|_{L^{q_1}(|v| \ge 1)} \le C\,\|g\|_{L^p} + \left\| \frac{1}{v} \right\|_{L^{q_2}(|v| \ge 1)}\,\|vg\|_{L^q} < \infty.
\]
Since $\mathcal{H}$ is bounded $L^p \to L^p$ for any $p > 1$ and the Fourier transform is bounded $L^p \to L^{p'}$ for any $1 < p \le 2$, from (38),
\[
\|E(t)\|_{L^{q_1'}} \le C\,\|g\|_{L^{q_1}(\mathbb{R})} < \infty. \tag{43}
\]
As in the proof of Proposition 4.2, $(f_t, E_t)$ satisfies (33) with
\[
f_t(t = 0) = e^{ikx}\left( -ikv\,g(v) - \frac{1}{ik}\int_{\mathbb{R}} g(v)\,dv\; f_0'(v) \right) = e^{ikx}\,\tilde g(v).
\]
Since
\[
\|\tilde g(v)\|_{L^q} \le C\left( \|vg\|_{L^q} + \|g\|_{L^1(\mathbb{R})}\,\|f_0'\|_{L^q} \right) < \infty,
\]
by using the estimate for $E(t)$, we get
\[
\|E'(t)\|_{L^{q'}} \le C\,\|\tilde g(v)\|_{L^q} < \infty. \tag{44}
\]
The decay of $|E(t)|$ follows from the estimates (43) and (44).
Remark 4.3. In Proposition 4.2, we prove that the linear decay of the electric field $E$ in $L^2$ norm holds true for initial data as rough as
\[
f(t = 0) \in H_x^{-\frac{3}{2}} L^2_v, \qquad v f(t = 0) \in H_x^{-\frac{1}{2}} L^2_v.
\]
In particular, it is not necessary to have any assumption on derivatives of $f(t = 0)$ to get linear decay of $E$. The linear decay result implies that there exist no nontrivial invariant structures even in $H_x^{-\frac{3}{2}} L^2_v$ space for the linearized problem. So our result on existence of BGK waves in the $W^{s,p}$ $(s < 1 + \frac{1}{p})$ neighborhood (Theorem 1.1) cannot be traced back to the linearized level. Also, the contrasting nonlinear dynamics in $W^{s,p}$ $(s > 1 + \frac{1}{p})$ and particularly in $H^s$ $(s > \frac{3}{2})$ spaces (Theorems 1.2 and 1.3) have no analogue on the linearized level. These again are due to the fact that particle trapping effects are completely ignored on the linear level, but instead they play an important role in nonlinear dynamics.
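5. Invariant Structures in $H^s$ $(s > \frac{3}{2})$

We define invariant structures near a homogeneous state $(f_0(v), 0)$ in $H^s_{x,v}$ $(s \ge 0)$ space to be the solutions $(f(t), E(t))$ of the nonlinear VP equation (1a)–(1b), satisfying that for all $t \in \mathbb{R}$,
\[
\|f(t) - f_0\|_{H^s((0,T) \times \mathbb{R})} < \varepsilon_0,
\]
for some constant $\varepsilon_0 > 0$. The above defined invariant structures include the well known structures such as travelling waves, time-periodic, quasi-periodic or almost periodic solutions. In Secs. 2 and 3, we prove that $W^{1 + \frac{1}{p}, p}$ is the critical regularity for existence of nontrivial travelling waves near a stable homogeneous state. For $p = 2$, this critical regularity is $H^{\frac{3}{2}}$. In this section, we prove a much stronger result that $H^{\frac{3}{2}}$ is also the critical regularity for existence of any nontrivial invariant structure near a stable homogeneous state. In the proof, we use the linear decay estimate in Proposition 4.1.

Lemma 5.1. Assume $f_0(v) \in H^{s_0}(\mathbb{R})$ $(s_0 > \frac{3}{2})$ and let $0 < T_0 \le +\infty$ be defined by (3). Let $(f(x, v, t), E(x, t))$ be a solution of (1a)–(1b) with $x$-period $T < T_0$, satisfying that: for some $\frac{3}{2} < s \le s_0$ and sufficiently small $\varepsilon_0$,
\[
\|f(t) - f_0\|_{L^2_x H_v^s((0,T) \times \mathbb{R})} < \varepsilon_0, \quad \text{for all } t \ge 0.
\]
Then
\[
\left\| (1 + t)^{s - 1} E(x, t) \right\|_{L^2_{\{t \ge 0\}} H_x^{\frac{3}{2}}} \le C\varepsilon_0, \tag{45}
\]
for some constant $C$.

Proof. Denote by $L_0$ the linearized operator corresponding to the linearized Vlasov-Poisson equation at $(f_0(v), 0)$, and by $\mathcal{E}$ the mapping from $f(x, v)$ to $E(x)$ defined by the Poisson equation
\[
E_x = -\int f\,dv,
\]
where $f$ satisfies the neutral condition $\int_0^T\!\int_{\mathbb{R}} f(x, v)\,dv\,dx = 0$. It follows from Proposition 4.1 that: for any $0 \le s_v \le s_0 - 1$, if $h(x, v) \in L^2_x H_v^{s_v}$, then
\[
\left\| (1 + t)^{s_v}\,\mathcal{E}\big(e^{tL_0} h\big) \right\|_{L^2_t H_x^{\frac{3}{2}}} \le C\,\|h(x, v)\|_{L^2_x H_v^{s_v}}. \tag{46}
\]
Denote $f_1(t) = f(t) - f_0$, then
\[
\partial_t f_1 = L_0 f_1 + E\,\partial_v f_1.
\]
Thus
\[
f_1(t) = e^{tL_0} f_1(0) + \int_0^t e^{(t - u)L_0}\,(E\,\partial_v f_1)(u)\,du = f_{\mathrm{lin}}(t) + f_{\mathrm{non}}(t),
\]
and correspondingly
\[
E(t) = \mathcal{E}(f_{\mathrm{lin}}(t)) + \mathcal{E}(f_{\mathrm{non}}(t)) = E_{\mathrm{lin}}(t) + E_{\mathrm{non}}(t).
\]
By the linear estimate (46),
\[
\left\| (1 + t)^{s - 1} E_{\mathrm{lin}}(x, t) \right\|_{L^2_{\{t \ge 0\}} H_x^{\frac{3}{2}}} \le C\,\|f_1(0)\|_{L^2_x H_v^{s - 1}},
\]
and
\[
\begin{aligned}
&\left\| (1 + t)^{s - 1} E_{\mathrm{non}}(x, t) \right\|^2_{L^2_{\{t \ge 0\}} H_x^{\frac{3}{2}}}
= \int_0^\infty (1 + t)^{2(s - 1)}\,\|E_{\mathrm{non}}(x, t)\|^2_{H_x^{\frac{3}{2}}}\,dt \\
&\le \int_0^\infty (1 + t)^{2(s - 1)} \left( \int_0^t \left\| \mathcal{E}\big(e^{(t - u)L_0}(E\,\partial_v f_1)(u)\big) \right\|_{H_x^{\frac{3}{2}}} du \right)^2 dt \\
&\le \int_0^\infty (1 + t)^{2(s - 1)} \int_0^t (1 + (t - u))^{-2(s - 1)}\,(1 + u)^{-2(s - 1)}\,du \\
&\qquad \cdot \int_0^t (1 + u)^{2(s - 1)}\,(1 + (t - u))^{2(s - 1)}\,\left\| \mathcal{E}\big(e^{(t - u)L_0}(E\,\partial_v f_1)(u)\big) \right\|^2_{H_x^{\frac{3}{2}}}\,du\,dt \\
&\le C\int_0^\infty\!\!\int_0^t (1 + u)^{2(s - 1)}\,(1 + (t - u))^{2(s - 1)}\,\left\| \mathcal{E}\big(e^{(t - u)L_0}(E\,\partial_v f_1)(u)\big) \right\|^2_{H_x^{\frac{3}{2}}}\,du\,dt \\
&= C\int_0^\infty\!\!\int_u^\infty (1 + u)^{2(s - 1)}\,(1 + (t - u))^{2(s - 1)}\,\left\| \mathcal{E}\big(e^{(t - u)L_0}(E\,\partial_v f_1)(u)\big) \right\|^2_{H_x^{\frac{3}{2}}}\,dt\,du \\
&\le C\int_0^\infty (1 + u)^{2(s - 1)}\,\|(E\,\partial_v f_1)(u)\|^2_{L^2_x H_v^{s - 1}}\,du \\
&\le C\int_0^\infty (1 + u)^{2(s - 1)}\,\|E(u)\|^2_{H_x^{\frac{3}{2}}}\,\|f_1(u)\|^2_{L^2_x H_v^{s}}\,du
\le C\varepsilon_0^2\,\left\| (1 + t)^{s - 1} E(x, t) \right\|^2_{L^2_{\{t \ge 0\}} H_x^{\frac{3}{2}}}.
\end{aligned}
\]
In the above estimate, we use the fact that
\[
\int_0^t (1 + (t - u))^{-2(s - 1)}\,(1 + u)^{-2(s - 1)}\,du \le C\,(1 + t)^{-2(s - 1)}
\]
because $2(s - 1) > 1$ by our assumption that $s > \frac{3}{2}$, and the inequality
\[
\|E\,\partial_v f_1\|_{L^2_x H_v^{s - 1}} \le C\,\|E\|_{H_x^{\frac{3}{2}}}\,\|f_1\|_{L^2_x H_v^{s}}.
\]
Thus
\[
\begin{aligned}
\left\| (1 + t)^{s - 1} E(x, t) \right\|_{L^2_{\{t \ge 0\}} H_x^{\frac{3}{2}}}
&\le \left\| (1 + t)^{s - 1} E_{\mathrm{lin}}(x, t) \right\|_{L^2_{\{t \ge 0\}} H_x^{\frac{3}{2}}} + \left\| (1 + t)^{s - 1} E_{\mathrm{non}}(x, t) \right\|_{L^2_{\{t \ge 0\}} H_x^{\frac{3}{2}}} \\
&\le C\,\|f_1(0)\|_{L^2_x H_v^{s}} + C\varepsilon_0\,\left\| (1 + t)^{s - 1} E(x, t) \right\|_{L^2_{\{t \ge 0\}} H_x^{\frac{3}{2}}}.
\end{aligned}
\]
By taking $\varepsilon_0 = \frac{1}{2C}$, we get the estimate (45).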
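The weight inequality $\int_0^t (1 + (t - u))^{-a}(1 + u)^{-a}\,du \le C(1 + t)^{-a}$ with $a = 2(s - 1) > 1$ is the step where the assumption $s > \frac{3}{2}$ enters. A small numerical sketch can make the role of $a > 1$ visible. Everything below is an illustration under assumptions: the quadrature, the sample times, and the explicit constant $2^{a+1}/(a - 1)$ (obtained by splitting the integral at $u = t/2$) are not taken from the paper.

```python
import numpy as np

def convolution_weight(t, a, n=20000):
    """Midpoint-rule value of the integral over [0, t] of (1+t-u)^(-a) * (1+u)^(-a)."""
    step = t / n
    u = (np.arange(n) + 0.5) * step
    return float(np.sum((1.0 + t - u) ** (-a) * (1.0 + u) ** (-a)) * step)

def ratio(t, a):
    """Integral divided by the claimed decay (1+t)^(-a); bounded in t iff the estimate holds."""
    return convolution_weight(t, a) * (1.0 + t) ** a

times = [1.0, 10.0, 100.0, 1000.0]

a_good = 1.5                                   # corresponds to s = 1.75 > 3/2
ratios_good = [ratio(t, a_good) for t in times]
bound = 2.0 ** (a_good + 1.0) / (a_good - 1.0)  # elementary bound from splitting at t/2

ratios_bad = [ratio(t, 0.5) for t in times]     # a < 1: the ratio keeps growing
print(ratios_good, bound, ratios_bad)
```

For $a = 1.5$ the ratios stay below the elementary bound, while for $a = 0.5$ they grow with $t$, which suggests why the argument is tied to $2(s - 1) > 1$.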
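Proof of Theorem 1.3. For any $t_0 > 0$, let $(\tilde f(t), \tilde E(t))$ be the solution of the nonlinear VP equation (1a)–(1b) with the initial data $(\tilde f(0), \tilde E(0)) = (f(-t_0), E(-t_0))$. Then
\[
(f(t), E(t)) = \big(\tilde f(t + t_0), \tilde E(t + t_0)\big).
\]
The assumption (4) implies that
\[
\|\tilde f(t) - f_0\|_{L^2_x H_v^s} < \varepsilon_0, \quad \text{for all } t \in \mathbb{R}.
\]
Thus by Lemma 5.1,
\[
\left\| (1 + t)^{s - 1}\,\tilde E(x, t) \right\|_{L^2_{\{t \ge 0\}} H_x^{\frac{3}{2}}} \le C\varepsilon_0.
\]
So
\[
\int_0^1 \|E(x, t)\|^2_{H_x^{\frac{3}{2}}}\,dt = \int_{t_0}^{t_0 + 1} \|\tilde E(x, t)\|^2_{H_x^{\frac{3}{2}}}\,dt
\le \frac{1}{(1 + t_0)^{2(s - 1)}}\int_{t_0}^{t_0 + 1} (1 + t)^{2(s - 1)}\,\|\tilde E(x, t)\|^2_{H_x^{\frac{3}{2}}}\,dt
\le \frac{(C\varepsilon_0)^2}{(1 + t_0)^{2(s - 1)}}.
\]
Since $t_0$ can be arbitrarily large, we have
\[
\int_0^1 \|E(x, t)\|^2_{H_x^{\frac{3}{2}}}\,dt = 0,
\]
and thus $E(x, t) \equiv 0$ when $t \in [0, 1]$. Repeating the above argument for any finite time interval $I \subset \mathbb{R}$, we get $E(x, t) \equiv 0$ when $t \in I$. Thus $E(x, t) \equiv 0$ for any $t \in \mathbb{R}$.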
The following nonlinear instability result follows immediately from Theorem 1.3.

Corollary 5.1. Assume the homogeneous profile $f_0(v) \in H^s(\mathbb{R})$ $(s > \frac{3}{2})$. For any $T < T_0$ (defined by (3)), there exists $\varepsilon_0 > 0$, such that for any solution $(f(t), E(t))$ to the nonlinear VP equation (1a)–(1b) with nonzero $E(0)$, there exists $T \in \mathbb{R}$ such that $\|f(T) - f_0\|_{L^2_x H_v^s} \ge \varepsilon_0$.

The invariant structures studied in Theorem 1.3 stay in the $L^2_x H_v^s$ $(s > \frac{3}{2})$ neighborhood of a stable homogeneous state $(f_0(v), 0)$ for all time $t \in \mathbb{R}$. We can also study the positive (or negative) invariant structures near $(f_0(v), 0)$, which are solutions $(f(t), E(t))$ to the nonlinear VP equation satisfying $\|f(t) - f_0\|_{L^2_x H_v^s} < \varepsilon_0$ for all $t \ge 0$ (or $t \le 0$). The next theorem shows that the electric field of these semi-invariant structures must decay when $t \to +\infty$ (or $t \to -\infty$).

Theorem 5.1. Assume the homogeneous profile $f_0(v) \in H^s(\mathbb{R})$ $(s > \frac{3}{2})$. For any $T < T_0$ (defined by (3)), there exists $\varepsilon_0 > 0$ sufficiently small, such that if
\[
\|f(t) - f_0\|_{L^2_x H_v^s} < \varepsilon_0, \quad \text{for all } t \ge 0 \ (\text{or } t \le 0),
\]
and
\[
\|f(0)\|_{L^\infty_{x,v}} < \infty, \qquad \int_0^T\!\!\int_{\mathbb{R}} v^2 f(0, x, v)\,dv\,dx < \infty,
\]
then $\|E(t, x)\|_{L^2_x} \to 0$ when $t \to +\infty$ (or $t \to -\infty$).

Proof. We only consider the positive invariant case, since the proof is the same for the negative invariant case. First, there exists a constant $C$ depending on $M_1 = \|f(0)\|_{L^\infty}$ and $M_2 = \int_0^T\!\int_{\mathbb{R}} \frac{1}{2} v^2 f(0)\,dv\,dx$, such that $\|E(x, t)\|_{H_x^1} \le C$, for all $t$. Indeed, by the same estimate as in (25),
\[
\|\rho(x, 0)\|_{L^3} = \left\| \int f(x, v, 0)\,dv \right\|_{L^3} \le C\,\|f(0)\|^{\frac{2}{3}}_{L^\infty} \left( \int_0^T\!\!\int_{\mathbb{R}} v^2 f(x, v, 0)\,dv\,dx \right)^{\frac{1}{3}} \le C\,M_1^{\frac{2}{3}} M_2^{\frac{1}{3}}
\]
and
\[
\|E(x, 0)\|_{H^1} \le C\,\|1 - \rho(x, 0)\|_{L^2} \le C\left( T^{1/2} + T^{1/6}\,\|\rho(x, 0)\|_{L^3} \right) \le C. \tag{47}
\]
By the energy conservation,
\[
\int_0^T\!\!\int_{\mathbb{R}} v^2 f(x, v, t)\,dv\,dx + \|E(x, t)\|^2_{L^2_x} = \int_0^T\!\!\int_{\mathbb{R}} v^2 f(x, v, 0)\,dv\,dx + \|E(x, 0)\|^2_{L^2_x} < C.
\]
Let $j = \int v f\,dv$, then
\[
\|j(x, t)\|_{L_x^{\frac{3}{2}}} = \left\| \int v f(t)\,dv \right\|_{L_x^{\frac{3}{2}}} \le C\,\|f(t)\|^{\frac{1}{3}}_{L^\infty} \left( \int_0^T\!\!\int_{\mathbb{R}} v^2 f(x, v, t)\,dv\,dx \right)^{\frac{2}{3}},
\]
and thus
\[
\|j(x, t)\|_{L_x^{\frac{3}{2}}} \le C\,M_1^{\frac{1}{3}} M_2^{\frac{2}{3}} \le C.
\]
Since
\[
\left| \frac{d}{dt}\|E(x, t)\|^2_{L^2_x} \right| = \left| \int_0^T j(x, t)\,E(x, t)\,dx \right| \le \|j(x, t)\|_{L_x^{\frac{3}{2}}}\,\|E(x, t)\|_{L^3_x} \le C\,\|E(x, t)\|_{H_x^1},
\]
and
\[
\int_0^\infty \|E(x, t)\|_{H_x^1}\,dt \le \left( \int_0^\infty (1 + t)^{-2(s - 1)}\,dt \right)^{\frac{1}{2}} \left( \int_0^\infty (1 + t)^{2(s - 1)}\,\|E(x, t)\|^2_{H_x^{\frac{3}{2}}}\,dt \right)^{\frac{1}{2}} \le C\varepsilon_0,
\]
thus $\lim_{t \to \infty} \|E(x, t)\|_{L^2_x}$ exists and this limit must be zero. This finishes the proof.
Acknowledgments. This work is supported partly by the NSF grants DMS-0908175 (Lin) and DMS-0801319 (Zeng). We thank Cédric Villani for useful comments.
6. Appendix

In this Appendix, we reformulate Penrose's linear stability criterion. The main purpose is to clarify the intervals of wave numbers (periods) for which linear instability can be found. In the original paper of Penrose [41], a necessary and sufficient condition was given for linear instability of a homogeneous state at certain wave number. However, the precise range of unstable wave numbers was not given in [41].

Lemma 6.1. Assume $f_0(v) \in W^{2,p}(\mathbb{R})$ $(p > 1)$. Let $S = \{v_i\}_{i=1}^{l}$ be the set of all extrema points of $f_0$. If for some $1 \le i \le l$,
\[
\int \frac{f_0'(v)}{v - v_i}\,dv = \left(\frac{2\pi}{T_i}\right)^2 > 0, \tag{48}
\]
then there exists a linearly growing mode with $x$-period $T$ near $T_i$. More precisely, when $v_i$ is a minimum (maximum) point of $f_0$, unstable modes exist for $T$ slightly greater (smaller) than $T_i$. Let $0 < T_0 \le +\infty$ be defined by
\[
\left(\frac{2\pi}{T_0}\right)^2 = \max\left\{ 0,\ \max_{v_i \in S} \int \frac{f_0'(v)}{v - v_i}\,dv \right\}.
\]
Then for $T < T_0$, there exist no unstable modes with $x$-period $T$.

Proof. Plugging the normal mode solution $(f(x, v, t), E(x, t)) = e^{ik(x - ct)}\,(f_k(v), E_k)$ into the linearized Vlasov-Poisson equation, we obtain the standard dispersion relation
\[
k^2 - \int \frac{f_0'(v)}{v - c}\,dv = 0. \tag{49}
\]
Linear instability with $x$-period $T$ corresponds to a solution of (49) with $k = \frac{2\pi}{T}$ and $\operatorname{Im} c > 0$. When the condition (48) is satisfied, we have a neutral mode of stability with $k_0 = \frac{2\pi}{T_i}$ and $c_0 = v_i$. Then local bifurcation of unstable modes near $(k_0, c_0)$ can be shown, for example, by the arguments used in [30] for the shear flow instability. The bifurcation direction can be seen from the following computation. Let $(k, c)$ be an unstable mode near $(k_0, c_0)$. Then
\[
k^2 - k_0^2 = \int \frac{f_0'(v)}{v - c}\,dv - \int \frac{f_0'(v)}{v - v_i}\,dv = (c - v_i)\int \frac{f_0'(v)}{(v - v_i)(v - c)}\,dv,
\]
and by the Plemelj formula, when $\operatorname{Im} c \to 0+$,
\[
\frac{k^2 - k_0^2}{c - v_i} = \int \frac{f_0'(v)}{(v - v_i)(v - c)}\,dv \to P\!\int \frac{f_0'(v)}{(v - v_i)^2}\,dv + i\pi f_0''(v_i),
\]
where $P$ is the Cauchy principal value. So when $f_0''(v_i) > 0$ $(< 0)$, we have to let $k^2 < k_0^2$ $(k^2 > k_0^2)$ to ensure $\operatorname{Im} c > 0$.

The linear stability when $T < T_0$ can be seen most easily from the following Nyquist graph (see [41]) in the complex plane:
\[
Z(\xi + i0) = \lim_{\eta \to 0+} \int \frac{f_0'(v)}{v - (\xi + i\eta)}\,dv = P\!\int \frac{f_0'(v)}{v - \xi}\,dv + i\pi f_0'(\xi), \quad \xi \in \mathbb{R}. \tag{50}
\]
The unstable wave numbers consist of the part on the positive real axis enclosed by the graph of $Z(\xi + i0)$. So the maximal unstable wave number corresponds to the right-most intersection point of the graph of $Z(\xi + i0)$ with the positive real axis. Therefore if one of the integrals $\int \frac{f_0'(v)}{v - v_i}\,dv$ is positive, the maximal unstable wave number $k_{\max}$ is
\[
k_{\max}^2 = \max_{v_i \in S} \int \frac{f_0'(v)}{v - v_i}\,dv = \left(\frac{2\pi}{T_0}\right)^2,
\]
and all perturbations with $k > k_{\max}$, or equivalently $T < T_0$, are linearly stable. For homogeneous states with all $\int \frac{f_0'(v)}{v - v_i}\,dv$ non-positive, such as the Maxwellian $e^{-\frac{1}{2}v^2}$, perturbations of any period (wave number) are linearly stable and thus $T_0 = +\infty$.

Remark 6.1. 1) The assumption $f_0(v) \in W^{2,p}(\mathbb{R})$ $(p > 1)$ is used to ensure that $f_0'(v)$ is locally Hölder continuous and thus the function $Z(\xi + i0)$ is well defined, continuous and bounded. Lemma 6.1 is still true for $f_0 \in W^{1,p}$ and $f_0'$ locally Hölder continuous, particularly for $f_0(v) \in W^{s,p}(\mathbb{R})$ $(p > 1,\ s > 1 + \frac{1}{p})$.
2) The local bifurcation of unstable modes near a neutral mode $(k_0, v_i)$ can be extended globally in the following way. Let $v_i$ be an extrema point of $f_0(v)$,
\[
k_0^2 = \left(\frac{2\pi}{T_i}\right)^2 = \int \frac{f_0'(v)}{v - v_i}\,dv > 0.
\]
Suppose $f_0''(v_i) > 0$, then the unstable modes with $\operatorname{Im} c > 0$ exist when $k$ is slightly less than $k_0$. This unstable mode can be continued by decreasing $k$ as long as the growth rate is not zero. This continuation process can only stop at another neutral mode $(k_1, c_1)$ with $k_1 < k_0$, $c_1 \in \mathbb{R}$. By (50), we must have
\[
f_0'(c_1) = 0, \qquad k_1^2 = \left(\frac{2\pi}{T_1}\right)^2 = \int \frac{f_0'(v)}{v - c_1}\,dv > 0.
\]
For any wave number $k \in (k_1, k_0)$, there exists an unstable mode. Moreover, since the local bifurcation of unstable modes near $k_1$ is only for slightly larger wave number, we must have $f_0''(c_1) < 0$. Similarly, when $f_0''(v_i) < 0$, the unstable modes exist for wave numbers $k \in (k_0, k_2)$, where
\[
k_2^2 = \left(\frac{2\pi}{T_2}\right)^2 = \int \frac{f_0'(v)}{v - c_2}\,dv, \quad \text{with } f_0'(c_2) = 0,\ f_0''(c_2) > 0.
\]
From the above continuation argument, it is also easy to see the linear stability for $k > k_{\max}$ without using the Nyquist graph. Suppose at some $k' > k_{\max}$ there exists an unstable mode. Then we can extend this unstable mode for $k > k'$ until it stops at a neutral mode $(k'', c'')$ with
\[
(k'')^2 = \int \frac{f_0'(v)}{v - c''}\,dv > 0, \qquad f_0'(c'') = 0.
\]
But $k'' > k' > k_{\max}$, which contradicts the definition of $k_{\max}$. We also note that $k_{\max}$ must occur at a minimal point of $f_0$, since the unstable modes only bifurcate for wave numbers less than $k_{\max}$.

3) Finally, we point out that there could exist "stability gaps" of wave numbers in $(0, k_{\max})$. By our discussions above in 2), such a stability gap must be of the form $(\bar k, \tilde k)$, where
\[
\bar k^2 = \int \frac{f_0'(v)}{v - \bar c}\,dv > 0, \qquad \tilde k^2 = \int \frac{f_0'(v)}{v - \tilde c}\,dv > 0,
\]
and $\bar c$, $\tilde c$ are minimum and maximum points of $f_0$ respectively. From the Nyquist graph of $Z(\xi + i0)$, it is easy to see that these stability gaps correspond to positive intervals in the real axis not enclosed by the Nyquist curve.

References

1. Adams, R.A., Fournier, J.J.F.: Sobolev spaces. Second edition. Pure and Applied Mathematics (Amsterdam), 140. Amsterdam: Elsevier/Academic Press, 2003
2. Akhiezer, A., Akhiezer, I., Polovin, R., Sitenko, A., Stepanov, K.: Plasma electrodynamics. Vol. I: Linear theory, London: Pergamon Press, 1975 (English Edition, translated by D. ter Haar)
3. Armstrong, T., Montgomery, D.: Asymptotic state of the two-stream instability. J. Plasma. Phys. 1(part 4), 425–433 (1967)
4. Backus, G.: Linearized plasma oscillations in arbitrary electron distributions. J. Math. Phys. 1, 178–191 (1960)
5. Gizzo, A., Izrar, B., Bertrand, P., Fijalkow, E., Feix, M.R., Shoucri, M.: Stability of Bernstein-Greene-Kruskal plasma equilibria. Numerical Experiments over a Long Time. Phys. Fluids 31(1), 72–82 (1988)
6. Bernstein, I., Greene, J., Kruskal, M.: Exact nonlinear plasma oscillations. Phys. Rev. 108(3), 546–550 (1957)
7. Bernstein, I.B.: Waves in a Plasma in a Magnetic Field. Phys. Rev. 109, 10–21 (1958)
8. Bohm, D., Gross, E.P.: Theory of Plasma Oscillations. A. Origin of Medium-Like Behavior. Phys. Rev. 75, 1851–1864 (1949)
9. Brunetti, M., Califano, F., Pegoraro, F.: Asymptotic evolution of nonlinear Landau damping. Phys. Rev. E 62, 4109–4114 (2000)
10. Buchanan, M.L., Dorning, J.J.: Nonlinear electrostatic waves in collisionless plasmas. Phys. Rev. E 52, 3015–3033 (1995)
11. Buchanan, M.L., Dorning, J.J.: Superposition of nonlinear plasma waves. Phys. Rev. Lett. 70, 3732–3735 (1993)
12. Case, K.: Plasma oscillations. Ann. Phys. 7, 349–364 (1959)
13. Caglioti, E., Maffei, C.: Time asymptotics for solutions of Vlasov–Poisson equation in a circle. J. Stat. Phys. 92, 301–323 (1998)
14. Danielson, J.R., Anderegg, F., Driscoll, C.F.: Measurement of Landau Damping and the Evolution to a BGK Equilibrium. Phys. Rev. Lett. 92, 245003-1–245003-4 (2004)
15. Degond, P.: Spectral theory of the linearized Vlasov–Poisson equation. Trans. Amer. Math. Soc. 294(2), 435–453 (1986)
16. Demeio, L., Zweifel, P.F.: Numerical simulations of perturbed Vlasov equilibria. Phys. Fluids B 2, 1252–1255 (1990)
17. Demeio, L., Holloway, J.P.: Numerical simulations of BGK modes. J. Plasma Phys. 46, 63–84 (1991)
18. Glassey, R., Schaeffer, J.: On time decay rates in Landau damping. Comm. Part. Diff. Eqs. 20, 647–676 (1995)
19. Glassey, R., Schaeffer, J.: Time decay for solutions to the linearized Vlasov equation. Transport Theo. Stat. Phys. 23, 411–453 (1994)
20. Guo, Y., Strauss, W.: Instability of periodic BGK equilibria. Comm. Pure Appl. Math. XLVIII, 861–894 (1995)
21. Klimas, A.J., Cooper, J.: Vlasov–Maxwell and Vlasov–Poisson equations as models of a one-dimensional electron plasma. Phys. Fluids 26, 478–480 (1983)
22. Holloway, J.P., Dorning, J.J.: Undamped plasma waves. Phys. Rev. A 44, 3856–3868 (1991)
23. Holloway, J.P., Dorning, J.J.: Nonlinear but small amplitude longitudinal plasma waves. In: Modern mathematical methods in transport theory (Blacksburg, VA, 1989). Oper. Theory Adv. Appl. 51, Basel: Birkhäuser, 1991, pp. 155–179
24. Hörmander, L.: The analysis of linear partial differential operators. I. Distribution theory and Fourier analysis. Second edition. Grundlehren der Mathematischen Wissenschaften, 256. Berlin: Springer-Verlag, 1990
25. Hwang, H.J., Vélazquez, J.O.: On the existence of exponentially decreasing solutions of the nonlinear Landau damping problem. Indiana Univ. Math. J. 58(6), 2623–2660 (2009)
26. Isichenko, M.B.: Nonlinear Landau Damping in Collisionless Plasma and Inviscid Fluid. Phys. Rev. Lett. 78, 2369–2372 (1997)
27. Krasovsky, V.L., Matsumoto, H., Omura, Y.: Electrostatic solitary waves as collective charges in a magnetospheric plasma: Physical structure and properties of Bernstein–Greene–Kruskal (BGK) solitons. J. Geophys. Res. 108(A3), 1117 (2004)
28. Lancellotti, C., Dorning, J.J.: Time-asymptotic wave propagation in collisionless plasmas. Phys. Rev. E 68, 026406 (2003)
29. Landau, L.: On the vibration of the electronic plasma. J. Phys. USSR 10, 25 (1946)
30. Lin, Z.: Instability of some ideal plane flows. SIAM J. Math. Anal. 35, 318–356 (2003)
31. Lin, Z.: Instability of periodic BGK waves. Math. Res. Letts. 8, 521–534 (2001)
32. Lin, Z.: Nonlinear instability of periodic waves for Vlasov-Poisson system. Comm. Pure. Appl. Math. 58, 505–528 (2005)
33. Lin, Z., Zeng, C.: Invariant manifolds of Euler equations. Preprint in preparation
34. Lin, Z., Zeng, C.: Inviscid dynamical structures near Couette flow. Arch. Rat. Mech. Anal., to appear, doi:10.1007/s00205-010-0389-9, 2010
35. Lin, Z., Zeng, C.: Invariant manifolds of Vlasov-Poisson equations. Work in progress
36. Medvedev, M.V., Diamond, P.H., Rosenbluth, M.N., Shevchenko, V.I.: Asymptotic Theory of Nonlinear Landau Damping and Particle Trapping in Waves of Finite Amplitude. Phys. Rev. Lett. 81, 5824 (1998)
37. Manfredi, G.: Long-Time Behavior of Nonlinear Landau Damping. Phys. Rev. Lett. 79, 2815 (1997)
38. Mouhot, C., Villani, C.: On Landau damping. Acta Math. (to appear)
39. Muschietti, L., Ergun, R.E., Roth, I., Carlson, C.W.: Phase-space electron holes along magnetic field lines. Geophys. Res. Lett. 26, 1093–1096 (1999)
40. Orr, W.McF.: Stability and instability of steady motions of a perfect liquid. Proc. Ir. Acad. Sect. A, Math Astron. Phys. Sci. 27, 9–66 (1907)
41. Penrose, O.: Electrostatic instability of a non-Maxwellian plasma. Phys. Fluids 3, 258–265 (1960)
42. O'Neil, T.: Collisionless damping of nonlinear plasma oscillations. Phys. Fluids 8, 2255–2262 (1965)
43. Stein, E.M.: Singular integrals and differentiability properties of functions. Princeton Mathematical Series, No. 30. Princeton, NJ: Princeton University Press, 1970
44. Strichartz, R.S.: Multipliers on fractional Sobolev spaces. J. Math. Mech. 16, 1031–1060 (1967)
45. Tartar, L.: An introduction to Sobolev spaces and interpolation spaces. Lecture Notes of the Unione Matematica Italiana, 3. Berlin-Bologna: Springer/UMI, 2007
46. Triebel, H.: Theory of function spaces. Monographs in Mathematics, 78. Basel: Birkhäuser Verlag, 1983
47. Valentini, F., Carbone, V., Veltri, P., Mangeney, A.: Wave-Particle Interaction and Nonlinear Landau Damping in Collisionless Electron Plasmas. Transport Th. Stat. Phys. 34, 89–101 (2005)
48. Weitzner, H.: Plasma oscillations and Landau damping. Phys. Fluids 6, 1123–1127 (1963)
49. van Kampen, N.: On the theory of stationary waves in plasma. Physica 21, 949–963 (1955)
50. Zhou, T., Guo, Y., Shu, C.-W.: Numerical study on Landau damping. Physica D 157, 322–333 (2001)

Communicated by H. Spohn
Commun. Math. Phys. 306, 333–364 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1285-y
Communications in
Mathematical Physics
Superconformal Structures on Generalized Calabi-Yau Metric Manifolds Reimundo Heluani1 , Maxim Zabzine2 1 Department of Mathematics, University of California, Berkeley, CA 94720, USA.
E-mail:
[email protected];
[email protected]
2 Department of Physics and Astronomy, Uppsala University, Uppsala, Box 516, 751 20 Uppsala, Sweden.
E-mail:
[email protected] Received: 2 July 2010 / Accepted: 8 February 2011 Published online: 28 June 2011 – © Springer-Verlag 2011
Abstract: We construct an embedding of two commuting copies of the $N = 2$ superconformal vertex algebra in the space of global sections of the twisted chiral-anti-chiral de Rham complex of a generalized Calabi-Yau metric manifold, including the case when there is a non-trivial $H$-flux and non-vanishing dilaton. The 4 corresponding BRST charges are well defined on any generalized Kähler manifold. This allows one to consider the half-twisted model, thus defining the chiral de Rham complex of a generalized Kähler manifold. The classical limit of this result allows one to recover the celebrated generalized Kähler identities as the degree zero part of an infinite dimensional Lie superalgebra attached to any generalized Kähler manifold. As a byproduct of our study we investigate the properties of generalized Calabi-Yau metric manifolds in the Lie algebroid setting.

1. Introduction

Since the pioneering works of Zumino [28] and of Alvarez-Gaumé and Freedman [1], the supersymmetry algebra of the sigma model has been known to be closely related to complex geometry. On a complex manifold one has a decomposition of the sheaf of differential forms into a bi-complex $(\wedge^{p,q} T^*, \partial, \bar\partial)$ which induces corresponding Hodge structures when the manifold is Kähler. In particular the Dolbeault differentials and their adjoint differentials form two commuting copies of the $N = 2$ supersymmetry algebra. An analog of this construction also exists for any generalized Kähler manifold [13]. The aim of this article is to provide an affine or chiral analog of this result in the case when the manifold is generalized Calabi-Yau metric, thus extending the results in [15] and producing the quantum counterpart of the results in [4,27], which present the Hamiltonian treatment of the non-linear sigma model (see also [21] for an exposition from the Lagrangian perspective in the context of sheaves of vertex algebras).

To any differentiable manifold $M$ one can associate a sheaf of vertex algebras CDR($M$) [22].
More generally, given any Courant algebroid E one constructs a sheaf of
SUSY vertex algebras $U^{ch}(E)$ [15]. When $E$ is endowed with a generalized Calabi-Yau structure, there is an embedding of the $N = 2$ superconformal vertex algebra into the global sections of $U^{ch}(E)$ [17]. In the usual Calabi-Yau case, it was shown in [15] that one can in fact construct two commuting copies of the $N = 2$ superconformal structure, each with central charge $\frac{3}{2} \dim M$. In this article we combine and generalize these results to the case when $E$ is endowed with a generalized Calabi-Yau metric structure as defined in [12]. An interesting new phenomenon in this article is that in the presence of a non-trivial $H$-flux, the dilaton field plays a crucial role in all of our formulas. This feature is well known in the physics literature.

On a given Kähler manifold $M$ with Hermitian metric $g$, the existence of a global holomorphic volume form $\Omega$ is intimately related with the vanishing of the Ricci curvature of $g$ and with the fact that the holonomy of $M$ reduces to $SU(n)$. The metric $g$ gives rise to a volume form $\mathrm{vol}_g$ and the global holomorphic volume form satisfies $\Omega \wedge \bar\Omega = \mathrm{vol}_g$. The volume form $\Omega$ is covariantly constant with respect to the Levi-Civita connection of $g$, and the vanishing of the Ricci tensor is expressed in holomorphic coordinates by $\partial_\alpha \partial_{\bar\beta} \log \sqrt{\det g} = 0$.

In the generalized Kähler case the situation is subtler. There exists a dictionary between generalized Kähler manifolds and bihermitian manifolds [12]. The latter are manifolds $(M, J_\pm, g)$ equipped with two connections $\nabla^\pm$, with torsion encoded by a closed three form, such that $\nabla^\pm J_\pm = 0$. To the data of a generalized Calabi-Yau metric manifold we can associate two volume forms $\Omega_\pm$ which are holomorphic with respect to $J_\pm$. These in turn give rise to a unique volume form $\nu = \Omega_\pm \wedge \bar\Omega_\pm$, and the ratio between this volume form and the Riemannian volume form defines the dilaton $\Phi$ by $\nu = e^{-4\Phi}\, \mathrm{vol}_g$.
It is not the holomorphic volume forms $\Omega_\pm$ that enter in the fields of the $N = 2$ structure, but rather the forms corrected by the dilaton, $e^{-2\Phi}\Omega_\pm$, which become covariantly constant with respect to $\nabla^\pm$. The analogous statement to the vanishing of the Ricci tensor becomes $\partial_\alpha \partial_{\bar\beta} \log\left(e^{-4\Phi} \det g\right) = 0$, where we use holomorphic coordinates for either complex structure. We show in Sect. 6 that these statements correspond to the unimodularity of the Lie algebroids corresponding to the generalized complex structures $\mathcal{J}_{1,2}$.

The existence of $N = 2$ superconformal supersymmetry allows us to perform a topological twist and in particular consider the BRST cohomology. When we have two commuting copies of the superconformal algebra we may perform the topological twist in one of the two sectors, say the plus sector, and consider its BRST cohomology. Carrying out this construction on a generalized Calabi-Yau metric manifold produces a sheaf of SUSY vertex algebras with a remaining $N = 2$ superconformal structure (that of the minus sector). In the case when $M$ is a usual Calabi-Yau manifold, this sheaf is isomorphic to the chiral de Rham complex of $M$ defined in [22], together with its topological structure. Moreover, since in order to consider the BRST cohomology we need only the zero modes of fields to be well defined (as opposed to the full superconformal algebra), we may perform the above mentioned half-twisting procedure to obtain a sheaf of SUSY vertex algebras on any generalized Kähler manifold $M$. We call this sheaf the chiral de Rham complex of $M$, a name that is justified since in the usual Kähler case we recover the construction of [22] in the holomorphic setting.

In the usual Kähler case, the holomorphic chiral de Rham complex of $M$ can be described purely in terms of holomorphic data by generators and relations. The interpretation of this sheaf as a half-twisted model was given in [26] and in the supersymmetric
setting in [19]. The situation in the generalized Kähler case is subtler since there is no obvious notion of what “holomorphic data” means. In Theorem 3 below, we give such a description, after developing rudimentary notions of differential calculus on generalized Kähler manifolds. This result extends that of [19,26] to the generalized Kähler case with or without $H$-flux, while at the same time we find an interesting new spin (see Remark 8). In the bihermitian setup one can attach (a twisted version of) the holomorphic chiral de Rham complex of [22] to each one of the two Hermitian complex structures. We show that these sheaves agree with the ones constructed by BRST reduction.

The existence of two commuting conformal structures allows us to consider $U^{ch}(E)$ as a formal Hamiltonian quantization of the sigma-model with target a generalized Calabi-Yau metric manifold. In general, consider a vertex algebra $V$ endowed with two commuting Virasoro fields $L^\pm(z)$. Suppose moreover that $L = L^+ + L^-$ is a conformal structure on $V$ [18], i.e. $L_0$ acts diagonally and $L_{-1} = T$, the translation operator on $V$. Consider the formal change of coordinates $z = e^{i\sigma}$ and the Hamiltonian
\[ H = i \int \left( L^+ - L^- \right) d\sigma. \tag{1.1} \]
For any state $a \in V$, we can impose the equations of motion
\[ \partial_\tau Y(a, \sigma) = [Y(a, \sigma), H], \tag{1.2} \]
to obtain a state field correspondence $a \mapsto Y(a, \sigma, \tau) = Y(a, z, \bar z)$, where $z = e^{i\sigma + \tau}$ and $\bar z = e^{i\sigma - \tau}$, so that the zero mode of $L^+$ acts as $\partial_z$ and the zero mode of $L^-$ acts as $\partial_{\bar z}$. With these considerations, we obtain the equations of motion for the quantum nonlinear sigma model with target a generalized Calabi-Yau metric manifold, very much in analogy to the standard Calabi-Yau story [8].

The organization of this article is as follows. In Sect. 2 we fix notations and briefly recall the definitions of SUSY vertex algebras. In Sect. 3 we recall the basic definitions of generalized Kähler and Calabi-Yau metric manifolds. In Sect. 4 we recall the construction of the sheaf of SUSY vertex algebras $U^{ch}(E)$. In Sect. 5 we recall the connection with bihermitian geometry and we introduce the basic local coordinate frames that will play an important role in the computations in later sections. In Sect. 6 we collect some useful lemmas about unimodularity in generalized Calabi-Yau metric manifolds: we gather some results scattered in the literature and produce some new ones. In particular, we clarify the connection between generalized Calabi-Yau metric manifolds as in [12] and their bihermitian counterpart. In Sect. 7 we state and prove the main results of this article. In Sect. 8 we study the topological twists and corresponding BRST cohomologies. We define there the chiral de Rham complex for a generalized Kähler manifold. We develop in that section the rudiments of differential calculus on generalized Kähler manifolds and show that the chiral de Rham complex can be described entirely in terms of holomorphic data. In Sect. 9 we present a brief summary and discussion of the results in the present article.

2. Preliminaries on SUSY Vertex Algebras

In this section we collect some results on SUSY vertex algebras from [16].

Definition 1 ([16]).
An $N_K = 1$ SUSY vertex algebra consists of the data of a vector space $V$, an even vector $|0\rangle \in V$ (the vacuum vector), an odd endomorphism $S$ (whose
square is an even endomorphism we denote $T$), and a parity preserving linear map $A \mapsto Y(A, z, \theta)$ from $V$ to $\mathrm{End}(V)$-valued fields (the state-field correspondence). This data should satisfy the following set of axioms:
• For any $A$, $Y(A, z, \theta)$ is a field, namely $Y(A, z, \theta)B \in V[[z]][\theta]$ for all $B \in V$.
• Vacuum axioms: $Y(|0\rangle, z, \theta) = \mathrm{Id}$, $Y(A, z, \theta)|0\rangle = A + O(z, \theta)$, $S|0\rangle = 0$.
• Translation invariance: $[S, Y(A, z, \theta)] = (\partial_\theta - \theta \partial_z) Y(A, z, \theta)$, $[T, Y(A, z, \theta)] = \partial_z Y(A, z, \theta)$.
• Locality: $(z - w)^n [Y(A, z, \theta), Y(B, w, \zeta)] = 0$ for $n \gg 0$.
Given an $N_K = 1$ SUSY vertex algebra $V$ and a vector $A \in V$, we expand the fields
\[ Y(A, z, \theta) = \sum_{\substack{j \in \mathbb{Z} \\ J = 0,1}} Z^{-1-j|1-J} A_{(j|J)}, \]
and we call the endomorphisms $A_{(j|J)}$ the Fourier modes of $Y(A, Z)$. Define now the operations:
\[ [A_\Lambda B] = \sum_{\substack{j \geq 0 \\ J = 0,1}} \frac{\Lambda^{j|J}}{j!} A_{(j|J)} B, \qquad AB = A_{(-1|1)} B. \tag{2.1} \]
The first operation is called the $\Lambda$-bracket and it encodes all the information in the OPE of the superfields $Y(A, z, \theta)$ and $Y(B, z, \theta)$. The second operation is called the normally ordered product. The axioms that these operations satisfy are summarized in Appendix A.

3. Preliminaries in Geometry

In this section we recall the basic definitions of generalized complex geometry following [12] and [14]. Let $M$ be a smooth manifold and denote by $T$ the tangent bundle of $M$.

Definition 2. A Courant algebroid is a vector bundle $E$ over $M$, equipped with a nondegenerate symmetric bilinear form $\langle\,,\rangle$ as well as a bilinear bracket $[\,,]$ on $C^\infty(E)$ and with a smooth bundle map $\pi : E \to T$ called the anchor.
These structures should satisfy the following five axioms:
(1) $\pi([A, B]) = [\pi(A), \pi(B)]$, for all $A, B \in C^\infty(E)$.
(2) The bracket $[\,,]$ should satisfy the Leibniz identity, $[A, [B, C]] = [[A, B], C] + [B, [A, C]]$, for all $A, B, C \in C^\infty(E)$.
(3) $[A, f B] = f [A, B] + (\pi(A) f) B$, for all $A, B \in C^\infty(E)$ and $f \in C^\infty(M)$.
(4) $\langle A, [B, C] + [C, B] \rangle = \pi(A) \langle B, C \rangle$, for all $A, B, C \in C^\infty(E)$.
(5) $\pi(A) \langle B, C \rangle = \langle [A, B], C \rangle + \langle B, [A, C] \rangle$, for all $A, B, C \in C^\infty(E)$.

We can introduce a natural differential operator $D : C^\infty(M) \to C^\infty(E)$ as $\langle D f, A \rangle = \frac{1}{2} \pi(A) f$ for all $f \in C^\infty(M)$ and $A \in C^\infty(E)$. Thus property (4) becomes $[B, C] + [C, B] = D \langle B, C \rangle$. Another useful identity implied by the definition is $\pi \circ D = 0$, i.e. $\langle D f, D g \rangle = 0$ for all $f, g \in C^\infty(M)$. The bracket $[\,,]$ is called the Dorfman bracket. In some situations it is convenient to use its antisymmetric version, the Courant bracket $[\,,]_c$, which is related to the Dorfman bracket as follows:
\[ [A, B] = [A, B]_c + D \langle A, B \rangle. \tag{3.1} \]
A Courant algebroid $E$ is called exact if the following sequence is exact:
\[ 0 \to T^* \xrightarrow{\ \pi^*\ } E \xrightarrow{\ \pi\ } T \to 0, \]
where we use the inner product in $E$ to identify it with its dual. In this case it is possible to choose an isotropic splitting $s : T \to E$ for $\pi$, giving rise to an isomorphism $E \cong T \oplus T^*$ taking the Dorfman bracket to that given in the example below.

Example 1. $E = (T \oplus T^*) \otimes \mathbb{C}$, where $\langle\,,\rangle$ and $[\,,]$ are respectively the natural symmetric pairing and the Dorfman bracket defined as:
\[ \langle X + \zeta, Y + \eta \rangle = \frac{1}{2} (i_X \eta + i_Y \zeta), \qquad [X + \zeta, Y + \eta] = [X, Y] + \mathrm{Lie}_X \eta - i_Y d\zeta + i_Y i_X H, \]
where $H$ is a closed three form.

In the rest of this article all Courant algebroids will be assumed to be exact unless noted.

Definition 3 ([12, 4.14]). A generalized almost complex structure on a real $2n$-dimensional manifold $M$ is given by the following equivalent data:
• an endomorphism $\mathcal{J}$ of $E$ which is orthogonal with respect to the inner product $\langle\,,\rangle$ and satisfies $\mathcal{J}^2 = -1$.
• a maximal isotropic sub-bundle $L \subset E \otimes \mathbb{C}$ of real index zero, i.e. $L \cap \bar L = 0$.
• a pure spinor line sub-bundle $U \subset \wedge^* T^* \otimes \mathbb{C}$, called the canonical line bundle, satisfying $(\varphi, \bar\varphi) \neq 0$ at each point $x \in M$ for any generator $\varphi \in U_x$.
Here $(\,,) : \wedge^* T^* \otimes \wedge^* T^* \to \det T^*$ is the Mukai pairing, which is an invariant bilinear form on the spinors of $E \cong T \oplus T^*$ defined as $(\varphi, \psi) \equiv [\varphi^\top \wedge \psi]_{top}$, where $\varphi^\top$ denotes the antiautomorphism of the Clifford algebra applied to $\varphi$; see [14] for an extensive explanation on the subject. The fact that $L$ is of real index zero implies $E \otimes \mathbb{C} \simeq (T \oplus T^*) \otimes \mathbb{C} = L \oplus \bar L = L \oplus L^*$, using $\langle\,,\rangle$ to identify $\bar L$ with $L^*$.

Definition 4 ([12, 4.18]). A generalized almost complex structure $\mathcal{J}$ is said to be integrable to a generalized complex structure when its $+i$-eigenbundle $L \subset E \otimes \mathbb{C}$ is Courant (Dorfman) involutive. We refer to a manifold admitting an integrable generalized complex structure as a generalized complex manifold.

Proposition 1 ([20,14]). Every generalized complex manifold is a Poisson manifold, i.e. it admits a bivector $P = P^{ij} \partial_i \wedge \partial_j$ such that
\[ P^{ik} \partial_k P^{jl} + P^{jk} \partial_k P^{li} + P^{lk} \partial_k P^{ij} = 0. \]
We refer to such $P$ as a Poisson structure.

In this case, $(L, L^*)$ is a Lie bi-algebroid (that is, both $L$ and its dual $L^*$ are naturally Lie algebroids in a suitably compatible manner), and $E \otimes \mathbb{C}$ can be viewed as its Drinfeld double. Note that $E$ acts on the sheaf of differential forms $\wedge^\bullet T^*$ via the spinor representation, and this sheaf acquires a different grading by the eigenvalues of $\mathcal{J}$ acting via the spinor representation:
\[ \wedge^\bullet T^* = U_{-n} \oplus \cdots \oplus U_n. \tag{3.2} \]
Clifford multiplication by sections of $\bar L$ (resp. $L$) increases (resp. decreases) the grading. $U_{-n} = U_{\mathcal{J}}$ is called the canonical bundle of $(M, \mathcal{J})$.

Definition 5. A generalized complex manifold $(M, \mathcal{J})$ is called generalized Calabi-Yau if the bundle $U_{\mathcal{J}}$ is holomorphically trivial. This is equivalent to the existence of a nowhere vanishing global section $\rho \in C^\infty(U_{\mathcal{J}})$ (a non-vanishing pure spinor) satisfying $d_H \rho = 0$, where $d_H = d + H \wedge$ is the twisted de Rham differential.

Definition 6 ([12, Def. 6.3]). A generalized Kähler structure is a commuting pair $(\mathcal{J}_1, \mathcal{J}_2)$ of generalized complex structures such that $G = -\mathcal{J}_1 \mathcal{J}_2$ is a positive definite metric on $E$.

Example 2. Let $(g, J, \omega)$ be a usual Kähler manifold. Then the following generalized complex structures
\[ \mathcal{J}_1 = \begin{pmatrix} -J & 0 \\ 0 & J^* \end{pmatrix}, \qquad \mathcal{J}_2 = \begin{pmatrix} 0 & \omega^{-1} \\ -\omega & 0 \end{pmatrix} \tag{3.3} \]
commute and
\[ G = -\mathcal{J}_1 \mathcal{J}_2 = \begin{pmatrix} 0 & g^{-1} \\ g & 0 \end{pmatrix} \tag{3.4} \]
is a positive definite metric on $T \oplus T^*$.
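The claims of Example 2 can be checked numerically at a single point. The following is only a sketch under stated assumptions: it models a flat Kähler structure on $\mathbb{R}^{2n}$ with $J$ the standard complex structure, $g$ built from an arbitrary positive definite matrix `S`, and $\omega = gJ$ regarded as a map $T \to T^*$ (the convention used later in Sect. 5). The helper name `kahler_pair` and the sample matrix are illustrative and not part of the original text.

```python
import numpy as np

def kahler_pair(S):
    """Build (g, J1, J2) as in (3.3) on R^{2n} from a positive definite n x n matrix S."""
    n = S.shape[0]
    j = np.array([[0.0, -1.0], [1.0, 0.0]])
    J = np.kron(np.eye(n), j)          # standard complex structure, J @ J = -1
    g = np.kron(S, np.eye(2))          # metric compatible with J: J.T @ g @ J = g
    omega = g @ J                      # Kaehler form viewed as a map T -> T*
    Z = np.zeros_like(J)
    J1 = np.block([[-J, Z], [Z, J.T]])                       # J* acts by the transpose
    J2 = np.block([[Z, np.linalg.inv(omega)], [-omega, Z]])
    return g, J1, J2

S = np.array([[2.0, 1.0], [1.0, 3.0]])   # illustrative choice
g, J1, J2 = kahler_pair(S)
dim = g.shape[0]
Id = np.eye(2 * dim)
Z = np.zeros_like(g)

G = -J1 @ J2
# Matrix of the symmetric pairing <X + xi, Y + eta> = (i_X eta + i_Y xi) / 2
pairing = 0.5 * np.block([[Z, np.eye(dim)], [np.eye(dim), Z]])

checks = {
    "J1 squares to -1": np.allclose(J1 @ J1, -Id),
    "J2 squares to -1": np.allclose(J2 @ J2, -Id),
    "J1 and J2 commute": np.allclose(J1 @ J2, J2 @ J1),
    "G has the block form (3.4)": np.allclose(
        G, np.block([[Z, np.linalg.inv(g)], [g, Z]])),
    "<G . , . > is positive definite": bool(
        np.all(np.linalg.eigvalsh(pairing @ G) > 0)),
}
print(checks)
```

Positive definiteness is tested on the bilinear form $\langle G\,\cdot, \cdot\rangle$, whose matrix in this model is `pairing @ G`; this is one reasonable reading of "positive definite metric on $T \oplus T^*$".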
The following notation is taken from [12]. Since $\mathcal{J}_1$ and $\mathcal{J}_2$ commute, we have the following decomposition:
\[ E \otimes \mathbb{C} \cong (T \oplus T^*) \otimes \mathbb{C} = L_1^+ \oplus L_1^- \oplus \overline{L_1^+} \oplus \overline{L_1^-}, \tag{3.5} \]
where $L_1 = L_1^+ \oplus L_1^-$ is the $+i$ eigenbundle for $\mathcal{J}_1$ and $L_2 = L_1^+ \oplus \overline{L_1^-}$ is the $+i$ eigenbundle for $\mathcal{J}_2$. The Courant integrability of both $\mathcal{J}_1$ and $\mathcal{J}_2$ implies that each of the terms in the RHS of (3.5) is Courant involutive. If we define $C_\pm$ to be the $\pm 1$ eigenbundle of $G$, we obtain that
\[ C_\pm \otimes \mathbb{C} = L_1^\pm \oplus \overline{L_1^\pm}. \tag{3.6} \]
Note that $C_+$ (resp. $C_-$) is positive definite (resp. negative definite) with respect to the inner product on $E$.

Definition 7. A generalized Calabi-Yau metric manifold is a generalized Kähler manifold $(M, \mathcal{J}_1, \mathcal{J}_2)$ such that both $(M, \mathcal{J}_1)$ and $(M, \mathcal{J}_2)$ are generalized Calabi-Yau, with the corresponding $d_H$-closed pure spinors $\rho_1$ and $\rho_2$ satisfying the following normalization condition:
\[ (\rho_1, \bar\rho_1) = c\, (\rho_2, \bar\rho_2) \tag{3.7} \]
for some constant $c$.

Example 3. Let $M$ be a usual Calabi-Yau manifold. We have the pure spinors
\[ \rho_1 = \Omega, \qquad \rho_2 = e^{i\omega}, \tag{3.8} \]
where $\omega$ is the symplectic form and $\Omega$ is the holomorphic volume form. We have
\[ (e^{i\omega}, e^{-i\omega}) = (-1)^{m(m-1)/2} (\Omega, \bar\Omega), \tag{3.9} \]
that is, $c = (-1)^{m(m-1)/2}$ where $m = \dim M$.

4. Sheaves of Vertex Algebras

In this section we recall some results from [11] and [5] in the language of SUSY vertex algebras, following [15]. In this section we do not require the Courant algebroid $E$ to be exact. The construction of the chiral-anti-chiral de Rham complex parallels that of the sheaf of (twisted) differential operators from a Lie algebroid (cf. Prop. 2 below).

Let $(E, \langle\,,\rangle, [\,,], \pi)$ be a Courant algebroid. Let $\Pi E$ be the corresponding purely odd super vector bundle. We will abuse notation and denote by $\langle\,,\rangle$ the corresponding super-skew-symmetric bilinear form, and by $[\,,]$ the corresponding degree 1 bracket on $\Pi E$. Similarly, we obtain an odd differential operator $D : C^\infty(M) \to C^\infty(\Pi E)$. If no confusion should arise, when $v$ is an element of a vector space $V$, we will denote by the same symbol $v$ the corresponding element of $\Pi V$, where $\Pi$ is the parity change operator. The following proposition from [15] describes the construction of the chiral de Rham complex in parallel to the construction of twisted differential operators given a Lie algebroid:
Proposition 2. For each complex Courant algebroid $E$ over a differentiable manifold $M$, there exists a sheaf $U^{ch}(E)$ of SUSY vertex algebras on $M$ generated by functions $i : C(M) \to U^{ch}(E)$, and sections of $E$, $j : C(E) \to U^{ch}(E)$, subject to the relations:
(1) $i$ is an “embedding of algebras”, i.e. $i(1) = |0\rangle$, and $i(f g) = i(f) \cdot i(g)$, where in the RHS we use the normally ordered product in $U^{ch}(E)$.
(2) $j$ imposes a compatibility condition between the Dorfman bracket in $E$ and the Lambda bracket in $U^{ch}(E)$: $[j(A)_\Lambda j(B)] = j([A, B]) + 2\chi\, i(\langle A, B \rangle)$.
(3) $i$ and $j$ preserve the $\mathcal{O}$-module structure of $E$, i.e. $j(f A) = i(f) \cdot j(A)$.
(4) $D$ and $S$ are compatible, i.e. $j(D f) = S\, i(f)$.
(5) We impose the usual commutation relation $[j(A)_\Lambda i(f)] = i(\pi(A) f)$.

In the particular case when $E = (T \oplus T^*) \otimes \mathbb{C}$ is the standard Courant algebroid with $H = 0$, then $U^{ch}(E)$ is the chiral-anti-chiral de Rham complex of $M$ as in [22], denoted by $\Omega^{ch}_M$ for historical reasons.¹ Using this proposition, we will abuse notation and use the same symbols for sections of $E \otimes \mathbb{C}$ when they are viewed as sections of $U^{ch}(E)$.

5. Bihermitian Setup

Following [12], we discuss the bihermitian description of generalized Kähler geometry, which we are going to use extensively later on. Let $(M, \mathcal{J}_1, \mathcal{J}_2)$ be a generalized Kähler manifold as in Definition 6.

The projection $\pi : E \cong T \oplus T^* \to T$ induces isomorphisms $\pi^\pm : C_\pm \xrightarrow{\sim} T$. We use these isomorphisms to transport structures from $C_\pm$ to $T$. Restricting the natural symmetric and skew-symmetric pairings on $T \oplus T^*$ to $C_\pm$ we obtain Riemannian metrics and two forms on both of $C_\pm$. We can transport these via $\pi^\pm$ to $T$, obtaining $b \pm g$, where $b$ is a two form and $g$ is a Riemannian metric. Since $C_\pm$ are stable under both $\mathcal{J}_1$ and $\mathcal{J}_2$, we obtain complex structures on both of them which are compatible with the inner product. Projecting $\mathcal{J}_1$ with $\pi^\pm$ we obtain two Hermitian almost complex structures $J_\pm$ on $T$.
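Since $\mathcal{J}_1 = \pm \mathcal{J}_2$ in $C_\pm$ we would obtain the same data projecting $\mathcal{J}_2$. Finally let $\omega_\pm = g J_\pm$. We have constructed the data $(g, b, J_+, J_-)$ from $(\mathcal{J}_1, \mathcal{J}_2)$. It is easy to show that the latter can be recovered from the former as
\[ \mathcal{J}_{1,2} = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ b & 1 \end{pmatrix} \begin{pmatrix} J_+ \pm J_- & -\left(\omega_+^{-1} \mp \omega_-^{-1}\right) \\ \omega_+ \mp \omega_- & -\left(J_+^* \pm J_-^*\right) \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -b & 1 \end{pmatrix}. \tag{5.1} \]
The projection $\pi$ identifies
\[ \pi : L_1^\pm \xrightarrow{\ \sim\ } T_\pm^{1,0}. \tag{5.2} \]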
1 We call this sheaf chiral-anti-chiral as in [10] not to confuse it with the holomorphic chiral de Rham complex.
Indeed, writing $\pi$ explicitly we get
\[ L_1^\pm = \left\{ X + (b \mp i\omega_\pm) X \ \middle|\ X \in C^\infty(T_\pm^{1,0}) \right\}. \tag{5.3} \]
We can now write the integrability conditions for a generalized Kähler structure in terms of the bihermitian data. Here we review the relevant results.

Proposition 3 ([12]). The complex structures $J_\pm$ coming from a generalized Kähler structure are integrable and the forms $\omega_\pm$, $b$ and $H$ satisfy
\[ d^c_\pm \omega_\pm = \pm (db + H), \tag{5.4} \]
where $d^c_\pm = i(\bar\partial_\pm - \partial_\pm)$ and $\partial_\pm$ is the $\partial$ operator for the complex structure $J_\pm$.

Proposition 4 ([12]). Let $(g, b, J_\pm)$ be the bihermitian data obtained from a generalized Kähler manifold. Define two connections
\[ \nabla^\pm = \nabla \pm \frac{1}{2} g^{-1} (db + H), \tag{5.5} \]
where $\nabla$ is the Levi-Civita connection for $g$. We obtain $\nabla^\pm J_\pm = 0$, and $(db + H)$ is of type $(2, 1) + (1, 2)$ with respect to both $J_\pm$.

Remark 1. As far as the bihermitian picture is concerned, the only data we use is the combination $db + H$, which gives rise to a closed 3-form. In view of this, we may replace $H$ by $H + db$ in the definition of the Courant algebroid; hence, without loss of generality, we can set $b = 0$ in all the above propositions and formulas and use only $H$. This is what we do in the rest of the paper.

Remark 2. The generalized complex structures $\mathcal{J}_{1,2}$ give rise to the following Poisson tensors:
\[ P_{1,2} = -\omega_+^{-1} \pm \omega_-^{-1}. \tag{5.6} \]
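As an illustration of how (5.1) with $b = 0$ and (5.6) fit together, the sketch below assembles $\mathcal{J}_{1,2}$ from pointwise bihermitian data and checks the algebraic properties numerically. Everything concrete here is an assumption made for the example: the function name `generalized_structures`, the choice of two quaternionic complex structures on $\mathbb{R}^4$, and the matrix `A` used to produce a non-trivial metric. Only linear algebra at a point is tested; integrability is not addressed.

```python
import numpy as np

def generalized_structures(g, J_plus, J_minus):
    """Assemble (J_1, J_2) from bihermitian data via (5.1), taking b = 0."""
    w_p, w_m = g @ J_plus, g @ J_minus                  # omega_pm = g J_pm
    wi_p, wi_m = np.linalg.inv(w_p), np.linalg.inv(w_m)

    def build(s):                                       # s = +1 gives J_1, s = -1 gives J_2
        return 0.5 * np.block([
            [J_plus + s * J_minus, -(wi_p - s * wi_m)],
            [w_p - s * w_m, -(J_plus.T + s * J_minus.T)],
        ])

    return build(+1), build(-1)

# Two orthogonal complex structures on flat R^4 (quaternionic I and J).
I0 = np.array([[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]], float)
J0 = np.array([[0, 0, -1, 0], [0, 0, 0, 1], [1, 0, 0, 0], [0, -1, 0, 0]], float)

# Conjugate by an invertible matrix A so that the metric g = A^T A is not the identity.
A = np.eye(4) + 0.3 * np.triu(np.ones((4, 4)), k=1)
A_inv = np.linalg.inv(A)
g = A.T @ A
J_plus, J_minus = A_inv @ I0 @ A, A_inv @ J0 @ A       # both satisfy J^T g J = g

J1, J2 = generalized_structures(g, J_plus, J_minus)
G = -J1 @ J2

# Poisson tensors of (5.6): upper sign for P_1, lower sign for P_2.
P1 = -np.linalg.inv(g @ J_plus) + np.linalg.inv(g @ J_minus)
P2 = -np.linalg.inv(g @ J_plus) - np.linalg.inv(g @ J_minus)

Z = np.zeros((4, 4))
checks = {
    "J1 squares to -1": np.allclose(J1 @ J1, -np.eye(8)),
    "J2 squares to -1": np.allclose(J2 @ J2, -np.eye(8)),
    "J1 and J2 commute": np.allclose(J1 @ J2, J2 @ J1),
    "G = [[0, g^-1], [g, 0]]": np.allclose(G, np.block([[Z, np.linalg.inv(g)], [g, Z]])),
    "upper-right block of J1 is P1 / 2": np.allclose(J1[:4, 4:], 0.5 * P1),
    "upper-right block of J2 is P2 / 2": np.allclose(J2[:4, 4:], 0.5 * P2),
}
print(checks)
```

In this pointwise model the Poisson tensors of (5.6) appear as twice the $T^* \to T$ block of $\mathcal{J}_{1,2}$, which is one way to read off Remark 2 from (5.1).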
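Below we will need some properties of the Bismut connection $\nabla^\pm$. For each complex structure $J_\pm$ we choose a holomorphic system of coordinates $\{z^\alpha_\pm\}$. We will use Greek subindices when using these coordinate systems, while we use Latin subindices for a general coordinate system. Since the Hermitian complex structures are covariantly constant,
\[ \nabla^\pm J_\pm = 0 \ \Rightarrow\ J_{\pm}{}^{i}{}_{j,k} = \Gamma^{\pm l}_{kj}\, J_{\pm}{}^{i}{}_{l} - \Gamma^{\pm i}_{kl}\, J_{\pm}{}^{l}{}_{j}, \tag{5.7} \]
where
\[ \Gamma^{\pm l}_{kj} = \Gamma^{l}_{kj} \pm g^{ls} H_{skj}, \tag{5.8} \]
and $\Gamma^{l}_{kj}$ are the Christoffel symbols of the Levi-Civita connection for $g$. In the coordinate system $\{z^\alpha_\pm\}$ these imply
\[ \Gamma^{\pm \bar\alpha}_{i\beta} = \Gamma^{\pm \alpha}_{i\bar\beta} = 0 \qquad \forall \alpha, \beta, i, \tag{5.9} \]
and from these we infer:
\[ \Gamma^{\pm \alpha}_{\bar\beta \alpha} = \pm H_{\bar\beta \alpha \bar\gamma}\, g^{\alpha \bar\gamma}, \qquad \Gamma^{\pm \bar\alpha}_{\beta \bar\alpha} = \pm H_{\beta \bar\alpha \gamma}\, g^{\gamma \bar\alpha}. \tag{5.10} \]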
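We also define the following one forms:
\[ v_i^\pm = J_{\pm i}{}^{j}\, \nabla_k J_{\pm}{}^{k}{}_{j} = J_{\pm i}{}^{j}\, \nabla^\pm_k J_{\pm}{}^{k}{}_{j} \mp \frac{1}{2} J_{\pm i}{}^{j} H_{klm}\, g^{km} J_{\pm}{}^{l}{}_{j} \pm \frac{1}{2} J_{\pm i}{}^{j} H_{kjm}\, g^{ml} J_{\pm}{}^{k}{}_{l} = \pm \frac{1}{2} J_{\pm i}{}^{j} J_{\pm}{}^{k}{}_{l}\, g^{ml} H_{kjm}, \tag{5.11} \]
which in the holomorphic coordinate system $\{z^\alpha_\pm\}$ looks like
\[ 2 v^\pm_\alpha = \mp H_{\beta \alpha \bar\gamma}\, g^{\beta \bar\gamma} \pm H_{\bar\beta \alpha \gamma}\, g^{\gamma \bar\beta} = \mp 2 H_{\beta \alpha \bar\gamma}\, g^{\beta \bar\gamma} = -2 \Gamma^{\pm \bar\beta}_{\alpha \bar\beta}. \tag{5.12} \]
Similarly we obtain
\[ 2 v^\pm_{\bar\alpha} = \mp H_{\bar\beta \bar\alpha \gamma}\, g^{\gamma \bar\beta} \pm H_{\beta \bar\alpha \bar\gamma}\, g^{\beta \bar\gamma} = -2 \Gamma^{\pm \beta}_{\bar\alpha \beta}. \tag{5.13} \]
It is convenient to introduce local frames on $E$ adapted to the decomposition (3.5). According to (5.3) we can choose local frames for $L_1^\pm$ given by
\[ e^\pm_\alpha = \frac{\partial}{\partial z^\alpha_\pm} \pm g_{\alpha \bar\beta}\, dz^{\bar\beta}_\pm, \tag{5.14} \]
with dual frames on $\overline{L_1^\pm}$,
\[ e_\pm^\alpha = \pm g^{\alpha \bar\beta} e^\pm_{\bar\beta} := \pm g^{\alpha \bar\beta} \left( \frac{\partial}{\partial z^{\bar\beta}_\pm} \pm g_{\gamma \bar\beta}\, dz^\gamma_\pm \right) = dz^\alpha_\pm \pm g^{\alpha \bar\beta} \frac{\partial}{\partial z^{\bar\beta}_\pm}. \tag{5.15} \]
With respect to complex conjugation we have the following properties:
\[ \overline{e^\pm_\alpha} = e^\pm_{\bar\alpha} = \pm g_{\bar\alpha \beta}\, e_\pm^\beta, \tag{5.16} \]
where these expressions are written in the holomorphic coordinate system for $J_\pm$ correspondingly. One can easily calculate some of the Dorfman brackets
\[ [e^\pm_\alpha, e^\pm_\beta] = 0, \qquad [e_\pm^\alpha, e_\pm^\beta] = 0, \tag{5.17} \]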
while other brackets are non-trivial.

6. Unimodularity as Twisted Ricci-Flatness

In this section we prove some useful identities about the divergences of the local frame elements $\{e^\pm_\alpha\}$ and the corresponding statements in terms of the bihermitian data. First we recall some basic notions from the theory of Lie algebroids and generalized complex geometry.

Recall that given any Lie algebroid $L$ over $M$, we can define a differential $d_L : C^\infty(M) \to C^\infty(L^*)$ as $(d_L f)(l) = \pi_L(l) f$, where $l$ is a section of $L$ and $\pi_L$ is the anchor map of $L$. This differential can be extended to $\wedge^\bullet L^*$ by imposing the Leibniz rule in the usual way (for $\zeta \in C^\infty(\wedge^{k-1} L^*)$):
\[ (d_L \zeta)(l_1, \dots, l_k) = \sum_i (-1)^{i+1} \pi(l_i)\, \zeta(l_1, \dots, \hat{l}_i, \dots, l_k) + \sum_{i<j} (-1)^{i+j} \zeta([l_i, l_j], \dots, \hat{l}_i, \dots, \hat{l}_j, \dots, l_k). \tag{6.1} \]
The cohomologies of the complex $(\wedge^\bullet L^*, d_L)$ are denoted by $H^\bullet(L)$ and are called the Lie algebroid cohomologies of $L$ (with trivial coefficients).

Now let $(M, \mathcal{J})$ be a generalized complex manifold with trivial $U_{\mathcal{J}}$. Given a nonvanishing global section of $U_{\mathcal{J}}$, we obtain an isomorphism of sheaves:
\[ \wedge^k \bar L \simeq U_{k-n}, \tag{6.2} \]
where $U_{k-n}$ were defined in (3.2). The twisted de Rham differential can be split as $d_H = \partial + \bar\partial$ such that $\partial : U_k \to U_{k-1}$ and $\bar\partial : U_k \to U_{k+1}$. Suppose moreover that $M$ is generalized Calabi-Yau; in this case the isomorphism (6.2) allows us to identify the complex $(U_\bullet, \bar\partial)$ with the complex computing the Lie algebroid cohomology of $L$ (using $\bar L = L^*$). Moreover, in this case, the Lie algebroids $L$ and $L^*$ are both unimodular, a notion due to Weinstein [24] that we now recall.

For a Lie algebroid $L$ we have the corresponding sheaf of twisted differential operators $U(L)$. The sheaf $\det T^*$ is always a right twisted D-module, and the corresponding left $U(L)$-module is then the line bundle $Q_L = \det L \otimes \det T^*$. Suppose for simplicity that the line bundle $Q_L$ is trivial. For each non-vanishing section $s$ of $Q_L$ we can define $\theta_s \in C^\infty(L^*)$ by $\theta_s(l)\, s = l \cdot s$, where we use the left D-module structure of $Q_L$ on the RHS. It turns out that $\theta$ gives rise to a well defined element of $H^1(L, Q_L)$, the first Lie algebroid cohomology of $L$ with coefficients in $Q_L$ (see [9] for details).

Definition 8 ([24]). A Lie algebroid $L$ is called unimodular if the class $\theta \in H^1(L, Q_L)$ constructed above vanishes.

We have the following proposition (see for example [2, Thm. 10]).

Proposition 5. A generalized complex manifold $M$ is generalized Calabi-Yau if and only if $U_{\mathcal{J}}$ is trivial and $L$ is unimodular.

In fact this can be refined as follows. Let $(M, \mathcal{J})$ be a generalized complex manifold with topologically trivial canonical bundle. Let $\rho$ be a non-vanishing section of $U_{\mathcal{J}}$. Integrability of $\mathcal{J}$ implies that there exists a unique $\chi \in C^\infty(L^*)$ such that
\[ d_H \rho = \chi \cdot \rho. \tag{6.3} \]
There also exists a unique section $\zeta \in C^\infty(\det L^*)$ such that
\[ \bar\rho = \zeta \cdot \rho. \tag{6.4} \]
Recall that the spinor $\rho$ gives rise to a volume form
\[ \mu := (\rho, \bar\rho) = [\rho^\top \wedge \zeta \cdot \rho]_{top} \in C^\infty(\det T^*), \tag{6.5} \]
where $(\,,)$ is the Mukai pairing. Therefore we can define the section $s$ of $\det L \otimes \det T^*$ by $s = \bar\zeta \otimes \mu$. We have the following

Proposition 6. The modular class $\theta_s$ is represented by $2\chi$.
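Proof. Recall we have an isomorphism $\bar L \simeq L^*$. This induces an isomorphism $\det \bar L \simeq \det L^*$. In the bases $\wedge \bar e_i$ and $\wedge e^i$ this isomorphism is given by multiplication by a function $\alpha$.² It follows from (6.4) and its complex conjugate that $\zeta \bar\zeta = 1$. Let $\{e_i\}$ be a local frame for $L$ with dual frame $\{e^i\}$; we can locally write
\[ \zeta = e^{-i\psi} \sqrt{\alpha}\; e^1 \wedge \cdots \wedge e^{\dim M} \in C^\infty(\det L^*) \tag{6.6} \]
for a real function $\psi$. And we have
\[ \bar\zeta = \frac{e^{i\psi}}{\sqrt{\alpha}}\; e_1 \wedge \cdots \wedge e_{\dim M} \in C^\infty(\det L). \tag{6.7} \]
Since $s = \bar\zeta \otimes \mu = \bar\zeta \otimes (\rho, \zeta \cdot \rho)$ we may assume $\alpha = 1$ and $\psi = 0$. We have
\[ \mathrm{div}_\mu(e_i) \cdot \mu := -\mathrm{Lie}_{\pi e_i}\, \mu = -d\, \iota_i (\rho, \zeta \cdot \rho), \tag{6.8} \]
where $\iota_i\, \cdot := \iota_{\pi(e_i)}\, \cdot$. Let us write $\rho = \sum_p \rho_p$, where $\rho_p \in C^\infty(\wedge^p T^*)$. Then we have:
\[ \iota_i \left( \rho^\top \wedge \zeta \cdot \rho \right) = \sum_p (-1)^{p-1} (\iota_i \rho_p)^\top \wedge \zeta \cdot \rho + \sum_p (-1)^p \rho_p^\top \wedge \iota_i\, \zeta \cdot \rho. \tag{6.9} \]
Now using that $e_i \cdot \rho = 0$ we have
\[ \begin{aligned} \iota_i \left( \rho^\top \wedge \zeta \cdot \rho \right) &= -\sum_p (-1)^{p-1} \left( \pi^*(e_i) \wedge \rho_p \right)^\top \wedge \zeta \cdot \rho + \sum_p (-1)^p \rho_p^\top \wedge \iota_i\, \zeta \cdot \rho \\ &= \sum_p (-1)^p \rho_p^\top \wedge \pi^*(e_i) \wedge \zeta \cdot \rho + \sum_p (-1)^p \rho_p^\top \wedge \iota_i\, \zeta \cdot \rho \\ &= \sum_p (-1)^p \rho_p^\top \wedge e_i \cdot \zeta \cdot \rho. \end{aligned} \tag{6.10} \]
Taking $d_H$ of this we obtain:
\[ \begin{aligned} \mathrm{div}_\mu(e_i)\, \mu &= -\left[ (d_H \rho)^\top \wedge e_i \cdot \zeta \cdot \rho \right]_{top} - \left[ \rho^\top \wedge d_H\, e_i \cdot \zeta \cdot \rho \right]_{top} \\ &= -\left[ (\chi \cdot \rho)^\top \wedge e_i \cdot \zeta \cdot \rho \right]_{top} - \left[ \rho^\top \wedge d_H\, e_i \cdot \zeta \cdot \rho \right]_{top} \\ &= -(e_i \cdot \chi \cdot \rho, \zeta \cdot \rho) - (\rho, d_H\, e_i \cdot \zeta \cdot \rho) \\ &= -\chi(e_i)\, \mu - (\rho, d_H\, e_i \cdot \zeta \cdot \rho) \\ &= -\chi(e_i)\, \mu - (\rho, [d_H, e_i] \cdot \zeta \cdot \rho) \\ &= -\chi(e_i)\, \mu - (\rho, [[d_H, e_i], \zeta] \cdot \rho) - (\rho, \zeta \cdot e_i \cdot \chi \rho) \\ &= -2\chi(e_i)\, \mu - (\rho, [[d_H, e_i], \zeta] \cdot \rho), \end{aligned} \tag{6.11} \]
where for any two elements $a, b$ of the Clifford algebra of $(E, \langle \cdot, \cdot \rangle)$ we write $[a, b] = a \cdot b - (-1)^{p(a) p(b)}\, b \cdot a$. Using the fact that the Dorfman bracket is defined as a derived bracket [14] we obtain:
\[ \mathrm{div}_\mu(e_i)\, \mu = -2\chi(e_i)\, \mu - (\rho, [e_i, \zeta] \cdot \rho). \tag{6.12} \]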
2 We can always make this function equal 1 but we are interested in the frames (5.14) and (5.15) adapted to the bihermitian structure, in which case α = log det g (cf. Remark 3).
Since we have the equation
\[ [e_i, e^j] = c_i^{jk}\, e_k - c_{ik}^{j}\, e^k, \tag{6.13} \]
for some functions $c_i^{jk}$ and $c_{ik}^{j}$, we obtain
\[ \mathrm{div}_\mu(e_i) = -2\chi(e_i) + c_{ij}^{j}. \tag{6.14} \]
On the other hand, by definition of the modular class $\theta_s$ of the Lie algebroid $L$ we have
\[ \theta_s(e_i) = c_{ij}^{j} - \mathrm{div}_\mu(e_i), \tag{6.15} \]
from which we obtain that the modular class is represented by $2\chi$.

We will also need the following

Proposition 7. On a generalized Calabi-Yau manifold with closed pure spinor $\rho$ and corresponding volume form $(\rho, \bar\rho) = \mu$, the divergence of the corresponding Poisson structure $P$ with respect to $\mu$ vanishes: $\mathrm{div}_\mu P = 0$.

Proof. This proposition is a simple corollary of [14, Prop. 3.27]. The Poisson structure, pure spinor and volume form are related by
\[ \rho \wedge \bar\rho = e^{-\frac{iP}{2}} \mu. \tag{6.16} \]
Since $\rho$ is a pure spinor we have $d(e^{-\frac{iP}{2}} \mu) = 0$, from which the proposition follows.

We remind the reader that the divergence of a multivector is defined as follows:
\[ \mathrm{div}_\mu P \cdot \mu = d(P \cdot \mu), \tag{6.17} \]
where by $P \cdot \mu$ we understand the contraction of the multivector $P$ with the form $\mu$. Thus in local coordinates the divergence of the Poisson structure $P$ can be written as
\[ (\mathrm{div}_\mu P)^j = \frac{1}{\tilde\mu} \partial_i (\tilde\mu P^{ij}) = \partial_i P^{ij} + \partial_i (\log \tilde\mu)\, P^{ij}, \tag{6.18} \]
where the volume form is $\mu = \tilde\mu\, dx^1 \wedge \dots \wedge dx^{2n}$.

In the rest of this section we fix a generalized Calabi-Yau metric manifold $(M, \mathcal{J}_1, \mathcal{J}_2)$ with its two pure spinors $\rho_1$ and $\rho_2$. Recall that we have the decomposition (3.5) and we choose frames $\{e^\pm_\alpha\}$ for $L_1^\pm$ with dual frames $\{e_\pm^\alpha\}$.

Remark 3. Using the frames (5.14) and (5.15) we can explicitly identify the functions $\alpha$ in (6.6). We obtain
\[ \rho_1 = \frac{1}{\sqrt{\det g}}\, e^{i\psi_1}\, e^+_1 \wedge \dots \wedge e^+_n \wedge e^-_1 \wedge \cdots \wedge e^-_n \cdot \bar\rho_1, \qquad \rho_2 = e^{i\psi_2}\, e^+_1 \wedge \cdots \wedge e^+_n \wedge e_-^1 \wedge \cdots \wedge e_-^n \cdot \bar\rho_2, \tag{6.19} \]
for two real functions $\psi_1$ and $\psi_2$.
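The pure spinors give rise to the volume forms:
\[ (\rho_1, \bar\rho_1) = c\, (\rho_2, \bar\rho_2) = e^{-2\Phi}\, \mathrm{vol}_g, \tag{6.20} \]
where $\mathrm{vol}_g$ is the volume form induced by the Riemannian metric $g$, which can be written in any coordinate system as
\[ \mathrm{vol}_g = \sqrt{\det g}\; dx^1 \wedge \dots \wedge dx^{2n}, \tag{6.21} \]
and $\Phi$ is a function measuring the mismatch between the Riemannian volume form and the volume form induced by the pure spinors. In the physics literature such a function is called a dilaton, and Eq. (6.20) should be regarded as the definition of the dilaton.

Proposition 7 implies that the divergences of the Poisson structures $P_1$ and $P_2$ in (5.6) calculated with respect to $e^{-2\Phi}\, \mathrm{vol}_g$ are zero. Therefore the divergences of $\omega_\pm^{-1}$ are zero:
\[ \frac{1}{e^{-2\Phi} \sqrt{\det g}}\, \partial_i \left( e^{-2\Phi} \sqrt{\det g}\; \omega_\pm^{ij} \right) = 0. \tag{6.22} \]
This in turn implies
\[ v^\pm = -2\, d\Phi, \tag{6.23} \]
where the 1-forms $v^\pm$ are defined in (5.11). The relations (5.12) and (5.13) imply
\[ 2 \frac{\partial \Phi}{\partial z^\alpha_\pm} = \Gamma^{\pm \bar\beta}_{\alpha \bar\beta}, \qquad 2 \frac{\partial \Phi}{\partial z^{\bar\alpha}_\pm} = \Gamma^{\pm \beta}_{\bar\alpha \beta}. \tag{6.24} \]
Let us analyze unimodularity in this context. Recall that the manifolds $(M, \mathcal{J}_1)$ and $(M, \mathcal{J}_2)$ are generalized Calabi-Yau, therefore $L_1 = L_1^+ \oplus L_1^-$ and $L_2 = L_1^+ \oplus \overline{L_1^-}$ are both unimodular. Not only do the modular classes vanish, but Proposition 6 says that they are represented by zero (as opposed to an exact form) when using the appropriate volume form. Define the structure functions
\[ \begin{aligned} c^{\pm\gamma}_{\alpha\beta} &= \langle [e^\pm_\alpha, e^\pm_\beta], e_\pm^\gamma \rangle, & c_{\pm\gamma}^{\alpha\beta} &= \langle [e_\pm^\alpha, e_\pm^\beta], e^\pm_\gamma \rangle, \\ d^{\gamma}_{\alpha\beta} &= \langle [e^+_\alpha, e^-_\beta], e_+^\gamma \rangle, & d_{\gamma}^{\alpha\beta} &= \langle [e_+^\alpha, e_-^\beta], e^+_\gamma \rangle, \\ e^{\gamma}_{\alpha\beta} &= \langle [e^+_\alpha, e^-_\beta], e_-^\gamma \rangle, & e_{\gamma}^{\alpha\beta} &= \langle [e_+^\alpha, e_-^\beta], e^-_\gamma \rangle, \end{aligned} \tag{6.25} \]
where $c^{\pm\gamma}_{\alpha\beta}$ and $c_{\pm\gamma}^{\alpha\beta}$ vanish if we use the coordinate frames (5.14) and (5.15). Using (6.19) we compute explicitly the representatives of the modular classes for $L_1$ and $L_2$ to obtain:
\[ \begin{aligned} \theta_1(e^+_\alpha) &= c^{+\beta}_{\alpha\beta} + e^{\beta}_{\alpha\beta} - \mathrm{div}\, e^+_\alpha + \pi(e^+_\alpha) \cdot \left( -\log \sqrt{\det g} + i\psi_1 \right) = 0, \\ \theta_2(e^+_\alpha) &= c^{+\beta}_{\alpha\beta} - e^{\beta}_{\alpha\beta} - \mathrm{div}\, e^+_\alpha + \pi(e^+_\alpha) \cdot (i\psi_2) = 0, \end{aligned} \tag{6.26} \]
where the divergences are calculated with respect to $\mu = e^{-2\Phi}\, \mathrm{vol}_g$. Taking the sum of the above expressions and using the coordinate frames (5.14) and (5.15) we obtain:
\[ \frac{\partial}{\partial z^\alpha_+} \left( \log\left( e^{-2\Phi} (\det g)^{1/4} \right) + \frac{i}{2} (\psi_1 + \psi_2) \right) = 0. \tag{6.27} \]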
Superconformal Structures on Generalized Calabi-Yau Metric Manifolds
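Taking the derivative of this equation with respect to $z_+^{\bar\beta}$ and retaining the real part we arrive at:
$$\frac{\partial}{\partial z_+^{\alpha}}\frac{\partial}{\partial z_+^{\bar\beta}}\log\big(e^{-4\Phi}\sqrt{\det g}\big) = 0. \tag{6.28}$$
We can derive the analogous statement for the complex structure $J_-$ by evaluating $\theta_1(e_\alpha^-)$ and $\theta_2(e_\alpha^-)$. Equation (6.28) implies that
$$\log\sqrt{\det g} = \varphi^\pm + \bar\varphi^\pm + 4\Phi, \tag{6.29}$$
where $\varphi^\pm$ is holomorphic with respect to the complex structure $J_\pm$ respectively. In other words, there exist non-vanishing holomorphic volume forms. In local complex coordinates we write them as
$$\Omega^\pm = e^{\varphi^\pm}\, dz_\pm^1 \wedge \dots \wedge dz_\pm^n, \tag{6.30}$$
such that $\Omega^\pm \wedge \bar\Omega^\pm = e^{-4\Phi}\,\mathrm{vol}_g$. We can rescale these nowhere vanishing forms,
$$\zeta^\pm = e^{2\Phi + \varphi^\pm}\, dz_\pm^1 \wedge \dots \wedge dz_\pm^n, \tag{6.31}$$
such that $\zeta^\pm \wedge \bar\zeta^\pm = \mathrm{vol}_g$. Since $g$ is parallel with respect to both $\nabla^\pm$ we have:
$$\partial_{z_\pm^{\alpha}}\big(\log\sqrt{\det g}\big) = \Gamma^{\pm\beta}_{\alpha\beta} + \Gamma^{\pm\bar\beta}_{\alpha\bar\beta}, \qquad \partial_{z_\pm^{\bar\alpha}}\big(\log\sqrt{\det g}\big) = \Gamma^{\pm\bar\beta}_{\bar\alpha\bar\beta} + \Gamma^{\pm\beta}_{\bar\alpha\beta}. \tag{6.32}$$
Using (6.24) we find the traces of the Christoffel symbols as:
$$\Gamma^{\pm\beta}_{\alpha\beta} = \partial_{z_\pm^{\alpha}}\big(\varphi^\pm + 2\Phi\big), \qquad \Gamma^{\pm\bar\beta}_{\bar\alpha\bar\beta} = \partial_{z_\pm^{\bar\alpha}}\big(\bar\varphi^\pm + 2\Phi\big). \tag{6.33}$$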
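As a quick consistency check, the first relation in (6.33) can be recovered by differentiating (6.29) and comparing with (6.32) and (6.24). This is only a sketch: it assumes that (6.29) and (6.32) are read with $\log\sqrt{\det g}$, and it uses that $\varphi^\pm$ is holomorphic, so that $\partial_{z_\pm^{\alpha}}\bar\varphi^\pm = 0$.

```latex
\begin{align*}
\partial_{z_\pm^{\alpha}}\log\sqrt{\det g}
  &= \partial_{z_\pm^{\alpha}}\varphi^\pm + 4\,\partial_{z_\pm^{\alpha}}\Phi
  && \text{by (6.29)},\\
\Gamma^{\pm\beta}_{\alpha\beta}
  &= \partial_{z_\pm^{\alpha}}\log\sqrt{\det g} - \Gamma^{\pm\bar\beta}_{\alpha\bar\beta}
  && \text{by (6.32)},\\
  &= \partial_{z_\pm^{\alpha}}\varphi^\pm + 4\,\partial_{z_\pm^{\alpha}}\Phi
     - 2\,\partial_{z_\pm^{\alpha}}\Phi
  && \text{by (6.24)},\\
  &= \partial_{z_\pm^{\alpha}}\big(\varphi^\pm + 2\Phi\big).
\end{align*}
```

The barred relation follows in the same way from the second equations in (6.24) and (6.32), with $\bar\varphi^\pm$ in place of $\varphi^\pm$.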
This implies that the forms $\zeta^\pm$ are covariantly constant: $\nabla^\pm \zeta^\pm = 0$. Thus the generalized Calabi-Yau metric manifold has $SU(n)$ holonomy for $\nabla^\pm$.

Finally let us calculate some of the traces of structure functions which we will need later. Using the explicit frames (5.14) and (5.15) a direct computation shows
$$[e_\alpha^\pm, e^\alpha_\pm] = \Gamma^{\pm\alpha}_{\beta\alpha}\Big(dz_\pm^{\beta} \mp g^{\beta\bar\gamma}\,\partial_{z_\pm^{\bar\gamma}}\Big) + 2\,\Gamma^{\pm\alpha}_{\bar\beta\alpha}\, dz_\pm^{\bar\beta}. \tag{6.34}$$
Using (6.24) and (6.33) we can rewrite this as
$$[e_\alpha^\pm, e^\alpha_\pm] = d\varphi^\pm \mp g^{-1}d\varphi^\pm - 2\big(\partial^\pm\Phi \pm g^{-1}\partial^\pm\Phi\big) + 4\,d\Phi, \tag{6.35}$$
where $d$ is the de Rham differential and $d = \partial^\pm + \bar\partial^\pm$ is the decomposition with respect to the complex structures $J_\pm$ respectively. We are interested in the inner product of this expression with $e_\beta^\mp$ and $e^\beta_\mp$. Using the orthogonality of frames we obtain a coordinate independent expression for (6.35):
$$\langle [e_\alpha^\pm, e^\alpha_\pm], e_\beta^\mp \rangle = \langle d\varphi^\pm \mp g^{-1}d\varphi^\pm + 4\,d\Phi,\; e_\beta^\mp \rangle. \tag{6.36}$$
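This can easily be evaluated to obtain:
$$d^{\alpha}_{\alpha\beta} = -\partial_{z_-^{\beta}}\big(\varphi^+ + 2\Phi\big) = -\pi(e_\beta^-)\big(\varphi^+ + 2\Phi\big), \qquad e^{\alpha}_{\beta\alpha} = \partial_{z_+^{\beta}}\big(\varphi^- + 2\Phi\big) = \pi(e_\beta^+)\big(\varphi^- + 2\Phi\big). \tag{6.37}$$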
β
Similarly, taking the inner product with e∓ we obtain β γ¯ β dααβ = −g− ∂z γ¯ ϕ + + 2 = π(e− ) ϕ + + 2 , − β γ¯ β eαβα = −g+ ∂z γ¯ ϕ − + 2 = −π(e+ ) ϕ − + 2 .
(6.38)
+
In the next section we adopt the following short-hand notation for the action of the anchor $\pi$:
$$f^{,\alpha}_{\pm} = \pi(e^\alpha_\pm)\, f, \qquad f_{,\alpha\pm} = \pi(e_\alpha^\pm)\, f. \tag{6.39}$$
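Finally, we can now identify the functions $\psi_{1,2}$ of (6.19) obtained in the generalized Calabi-Yau metric context with their counterparts $\varphi^\pm$ obtained in the bihermitian setup. Using the frames in (5.14)–(5.15) and taking the difference of the equations in (6.26) we find:
$$2\,e^{\beta}_{\alpha\beta} + \partial_{z_+^{\alpha}}\big(i\psi_1 - i\psi_2 - \log\sqrt{\det g}\big) = 0. \tag{6.40}$$
Comparing with the second equation of (6.37) we obtain
$$\partial_{z_+^{\alpha}}\big(i\psi_1 - i\psi_2 - \log\sqrt{\det g}\big) = -2\,\partial_{z_+^{\alpha}}\big(\varphi^- + 2\Phi\big). \tag{6.41}$$
Similarly, computing the modular classes of $L^*_{1,2}$ evaluated on $e^\alpha_+$ we obtain
$$g^{\alpha\bar\beta}\,\partial_{z_+^{\bar\beta}}\big(i\psi_1 - i\psi_2 - \log\sqrt{\det g}\big) = -2\,g^{\alpha\bar\beta}\,\partial_{z_+^{\bar\beta}}\big(\varphi^- + 2\Phi\big), \tag{6.42}$$
from which we may assume
$$i\psi_1 - i\psi_2 - \log\sqrt{\det g} = -2\big(\varphi^- + 2\Phi\big). \tag{6.43}$$
Similarly, by evaluating the modular classes on $e_\alpha^-$ and $e^\alpha_-$ we obtain:
$$\log\sqrt{\det g} - i\psi_1 - i\psi_2 = 2\big(\varphi^+ + 2\Phi\big), \tag{6.44}$$
and comparing with (6.43) we obtain
$$\psi_1 = i\big(\varphi^\pm - \bar\varphi^\mp\big), \qquad \psi_2 = i\big(\varphi^+ - \varphi^-\big). \tag{6.45}$$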
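The step from (6.43) and (6.44) to (6.45) is elementary and can be spelled out as follows. This is a sketch under the assumption that (6.29) reads $\log\sqrt{\det g} = \varphi^\pm + \bar\varphi^\pm + 4\Phi$; the abbreviation $\ell$ is introduced here only for convenience.

```latex
\begin{align*}
\ell &:= \log\sqrt{\det g},\\
(6.43) + (6.44):\quad
  -2i\psi_2 &= 2\big(\varphi^+ - \varphi^-\big)
  \;\Longrightarrow\; \psi_2 = i\big(\varphi^+ - \varphi^-\big),\\
(6.43) - (6.44):\quad
  2i\psi_1 - 2\ell &= -2\big(\varphi^+ + \varphi^- + 4\Phi\big)
  \;\Longrightarrow\; i\psi_1 = (\ell - 4\Phi) - \varphi^+ - \varphi^-,\\
\text{(6.29) with the lower sign}:\quad
  \ell - 4\Phi &= \varphi^- + \bar\varphi^-
  \;\Longrightarrow\; \psi_1 = i\big(\varphi^+ - \bar\varphi^-\big),\\
\text{(6.29) with the upper sign}:\quad
  \ell - 4\Phi &= \varphi^+ + \bar\varphi^+
  \;\Longrightarrow\; \psi_1 = i\big(\varphi^- - \bar\varphi^+\big).
\end{align*}
```

Both expressions for $\psi_1$ agree and are real because (6.29) forces $\varphi^+ + \bar\varphi^+ = \varphi^- + \bar\varphi^-$; the same identity shows that $\varphi^+ - \varphi^-$ is purely imaginary, so $\psi_2$ is real as well.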
We finish this section by summarizing the relations between all the global sections found so far. On the bihermitian setup we have the holomorphic volume forms $\Omega^\pm$ defined in (6.30) and the corresponding covariantly constant forms $\zeta^\pm$ defined in (6.31); the latter are global sections of $\det T_\pm^{*1,0}$. On the generalized Calabi-Yau metric setup we have the pure spinors $\rho_{1,2}$ and the sections $\zeta_{1,2}$ of $\det L^*_{1,2}$ defined by (6.4). We see that under the dual of the isomorphism (5.2) the sections $\zeta^\pm$ are mapped to sections of $\det L_1^{\pm *}$; we can write them locally as
$$\zeta^\pm = e^{\eta^\pm}\, e^1_\pm \wedge \dots \wedge e^n_\pm, \qquad 2n = \dim M. \tag{6.46}$$
It follows from (6.45), (6.19) and (6.29) that the sections $\zeta_{1,2}$ are given by $e^{\eta^+ + \eta^-}$ and $e^{\eta^+ - \eta^-}$ respectively.
7. N = 2, 2 Superconformal Structure

In this section we state and prove the main theorem of this article. We start by associating to each pure spinor $\rho_{1,2}$ of $(M, J_{1,2})$ a global section of $U^{ch}(E)$. We find the explicit description of these sections in terms of the covariantly constant forms (6.31) in the bihermitian setup.

Let $(M, J)$ be a generalized complex manifold and let $\rho$ be a pure spinor. Let $L$ be the corresponding Lie algebroid (the $i$-eigenbundle of $J$). Recall that $\rho$ gives rise to a global section $\zeta \in C^\infty(\det L^*)$ given by (6.4). Let $\{e_i\}$ be a local frame for $L$ and $\{e^i\}$ be the dual frame. We can write $\zeta$ locally as
$$\zeta = e^{\eta}\, e^1 \wedge \dots \wedge e^n, \qquad n = \dim M. \tag{7.1}$$
The following lemma is proved as in [17, Lem 5.1].

Lemma 1. The local section
$$J = \frac{i}{2}\sum_i e^i e_i + i\,T\eta \tag{7.2}$$
gives a well defined global section of $U^{ch}(E)$.

Remark 4. Integrability of $L$ is not used in the proof of this lemma, so we may replace $J$ by a generalized almost complex structure such that $\det L^*$ is trivial. The section $\zeta$ can be any non-vanishing section of $\det L^*$.

Let now $(M, J_{1,2})$ be a generalized Calabi-Yau metric manifold with pure spinors $\rho_{1,2}$. Let $\zeta^\pm$ be the corresponding global sections of $\det L_1^{\pm *}$ written as in (6.46). Recall that the global sections $\zeta_{1,2}$ of $\det L^*_{1,2}$ defined by the pure spinors $\rho_{1,2}$ are given in these frames by $e^{\eta^+ + \eta^-}$ and $e^{\eta^+ - \eta^-}$. The global sections $J_1$ and $J_2$ of $U^{ch}(E)$ constructed by Lemma 1 give rise to global sections
$$J^\pm = \frac{i}{2}\sum_i e^i_\pm e_i^\pm + i\,T\eta^\pm. \tag{7.3}$$
Remark 5. We remark here that, unlike the usual Calabi-Yau case with $H = 0$, where the quantum corrections to the local fields $e^\alpha_\pm e_\alpha^\pm$ are given by a global holomorphic volume form, in this more general case the relevant global sections of $\det T_\pm^{*1,0}$ are not given by the holomorphic expressions (6.30) but rather by the covariantly constant ones (6.31), where the dilaton correction appears explicitly.

Let us now recall the main theorem of [17] in a slightly more general form.

Theorem 1 ([17, Thm 5.5]). Let $(M, J)$ be a generalized Calabi-Yau manifold. Let $\{e_i\}$ be a local frame for the associated Lie algebroid $L$ and let $\{e^i\}$ be the dual frame. Let $\zeta \in C^\infty(\det L^*)$ be a global section written as (7.1) and let $J$ be the corresponding global section of $U^{ch}(E)$ given by Lemma 1. The following is true. (Footnote 3: Here $H$ is a section of $U^{ch}(E)$ and should not be confused with the three-form defining $E$. We keep this notation so that it agrees with previous literature.)
(1)
$$[J_\Lambda J] = -\Big(H + \frac{c}{3}\,\lambda\chi\Big), \qquad c = 3\dim M, \tag{7.4}$$
where
i 1 i j e e [ei , e j ] + ei e j [ei , e j ] − T J [ei , ei ] 4 2 1 − ei Sei + ei Sei − i T J Dη. (7.5) 2 (2) The fields J and H generate the N = 2 superconformal vertex algebra of central charge c. H = H0 − i T J Dη =
Proof. Define J0 = J − i T η. In [17] it was proved that in the coordinate system where the global section of det L ∗ is constant then the fields J0 and H0 generate a copy of the N = 2 superconformal algebra. Since we will have to deal below with generalized Calabi-Yau metric manifolds, where there are two generalized complex structures and therefore two global spinors, we need to keep track of the fields J in a more general coordinate system. We compute i 1 [i T η J0 ] = λ η,i ei − η,i ei = − λJ η,i ei + η,i ei = −iλJ Dη. (7.6) 2 2 By skewsymmetry we obtain [J i T η] = i(λ + T )J Dη.
(7.7)
Combining (7.7) and (7.6) we obtain (7.5). The theorem follows from [17, Thm. 5.5] since $J$ and $H$ are just expressions for $J_0$, $H_0$ in a more general coordinate system.

Let now $(M, J_1, J_2)$ be a generalized Calabi-Yau metric manifold. Choose frames $\{e_\alpha^\pm\}$ for the Lie algebroids $L_1^\pm$ with dual frames $\{e^\alpha_\pm\}$. Let $(M, g, J_\pm)$ be the associated bihermitian manifold, and let (6.46) be the local expressions for the global sections of $\det L_1^{\pm *}$. Recall that in terms of the bihermitian data we have
$$\eta^\pm = \varphi^\pm + 2\Phi, \tag{7.8}$$
where $\Omega^\pm = e^{\varphi^\pm}\, dz_\pm^1 \wedge \dots \wedge dz_\pm^n$ are the global holomorphic volume forms (6.30) and $\Phi$ is the dilaton. We have the global sections (7.3) of $U^{ch}(E)$.
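Theorem 2. The sections $J^\pm$ generate two commuting copies of the N = 2 superconformal vertex algebra of central charge $c = \frac{3}{2}\dim M$. More precisely, defining
$$\begin{aligned}
H^\pm ={}& \frac{1}{4}\Big(e^\alpha_\pm e_\beta^\pm [e_\alpha^\pm, e^\beta_\pm] - e^\alpha_\pm e^\beta_\pm [e_\alpha^\pm, e_\beta^\pm] + e_\alpha^\pm e^\beta_\pm [e^\alpha_\pm, e_\beta^\pm] - e_\alpha^\pm e_\beta^\pm [e^\alpha_\pm, e^\beta_\pm]\Big)\\
&- \frac{i}{2}\,T J_\pm [e^\alpha_\pm, e_\alpha^\pm] + \frac{1}{2}\big(e^\alpha_\pm S e_\alpha^\pm + e_\alpha^\pm S e^\alpha_\pm\big) - i\,T J_\pm D\eta^\pm,
\end{aligned} \tag{7.9}$$
where $J_\pm = \frac{1}{2}(J_1 \pm J_2)$, we obtain the commutation relations:
$$\begin{aligned}
[J^\pm{}_\Lambda J^\pm] &= -\Big(H^\pm + \frac{c}{3}\,\lambda\chi\Big), & [J^\pm{}_\Lambda J^\mp] &= 0,\\
[H^\pm{}_\Lambda J^\pm] &= (2T + 2\lambda + \chi S)\,J^\pm, & [H^\pm{}_\Lambda J^\mp] &= 0,\\
[H^\pm{}_\Lambda H^\pm] &= (2T + 3\lambda + \chi S)\,H^\pm + \frac{c}{3}\,\lambda^2\chi, & [H^\pm{}_\Lambda H^\mp] &= 0.
\end{aligned} \tag{7.10}$$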
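As a sanity check on the central charges, one can add the relations (7.10) for the diagonal currents $J^+ + J^-$ and $J^+ - J^-$ that appear in the proof below. The sketch uses only bilinearity of the $\Lambda$-bracket and the vanishing of the mixed brackets; it is a consistency check, not a substitute for the proof.

```latex
\begin{align*}
[(J^+ \pm J^-)_\Lambda (J^+ \pm J^-)]
  &= [J^+{}_\Lambda J^+] \pm [J^+{}_\Lambda J^-]
     \pm [J^-{}_\Lambda J^+] + [J^-{}_\Lambda J^-]\\
  &= -\Big(H^+ + \tfrac{c}{3}\lambda\chi\Big)
     - \Big(H^- + \tfrac{c}{3}\lambda\chi\Big)\\
  &= -\Big((H^+ + H^-) + \tfrac{2c}{3}\lambda\chi\Big).
\end{align*}
```

Both diagonal currents therefore have the same superconformal vector $H^+ + H^-$ and central charge $2c$, which equals the value $3\dim M$ of Theorem 1 exactly when $c = \frac{3}{2}\dim M$.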
Proof. The proof of the theorem consists of four parts. First we show that the two sectors commute, that is
$$[J^+{}_\Lambda J^-] = 0. \tag{7.11}$$
In the second part we will identify the superconformal vector in each sector, namely we will compute H ± given by the first equation in (7.10). In the third part we will use the main theorem of [17], namely that each J1 = J + + J − and J2 = J + − J − generate a copy of the N = 2 superconformal vertex algebra of central charge 3 dim M. The superconformal vector of these algebras coincide and we will identify it with H + + H − . After this, the theorem will follow by an application of the Jacobi identity of conformal algebras. (1) Commuting sectors. i βα γ + i βα γ + i β γ α + d e+ eβ − e e e + e+ dβ eγ 2 γ 2 γ − β 2 i β α γ − βα + e+ eβγ e− + i(λ + T )η+,α − iλdβ 2 i α β γ − γα β βα eβγ e+ e− − eβ e− eγ+ + i(λ + T ) η+,α − dβ , (7.12) = 2
α + [e− J ] = −
where we used quasi-associativity in the last line. According to (6.38) the last term vanishes, and using skew symmetry we obtain: α [J + e− ]=
i α β γ γα β eβγ e+ e− − eβ e− eγ+ . 2
(7.13)
Similarly we compute i βγ − + i β γ + i β γ + e e e + d e+ eβ − e+ dβα eγ 2 α γ β 2 γα 2 i β γ − β + − e+ eβα eγ + iλdβα + i(λ + T )η,α − 2 i βγ − + γ β + eα eγ eβ − eγβ α e+ eβ− + i(λ + T ) dβα + η,α = − , (7.14) 2
[eα− J + ] =
and the last term vanishes because of (6.37). Using skew-symmetry we obtain: [J + eα− ] =
i βγ − + γ eα eγ eβ − eγβ α e+ eβ− . 2
(7.15)
We also need [T η− J + ] = −λ
i −,α + + − α η eα − η,α + e+ , 2
(7.16)
and from skewsymmetry: 1 + − α [J ± i T η− ] = − (λ + T ) η−,α eα+ − η,α + e+ . 2
(7.17)
Combining (7.17), (7.15) and (7.13) we obtain 1 α β γ γα β e+ e− − eβ e− eγ+ eα− [J + J − ] = − eβγ 4 1 α βγ − + β γ − 1 + − α − (λ+T ) η−,α eα+ − η,α + e− eα eγ eβ −eγ α e+ eβ + e+ 4 2 1 α β γ γα β − eβγ e+ e− − eβ e− eγ+ eα− d. (7.18) 4 0 The integral term can be easily evaluated to be λ α β − eβα e+ + eαβα eβ+ , 2
(7.19)
which cancels the λ-term in (7.18) due to (6.37) and (6.38). Using quasi-associativity we can write the first two terms of (7.18) (the cubic terms) as: 1 1 + − β α β − T eαβα eβ+ + eβα (7.20) e+ = T η−,β eβ+ − η,β e + + . 2 2 Combining (7.20), (7.19) and (7.18) we obtain (7.11). (2) Superconformal vectors. We may work with the frames (5.14) and (5.15) where ±α cβγ = 0, [eα+ J + ] =
i βγ − β γ + dα eγ − dαγ e− eβ+ + iχ eα+ + i(λ + T )η,α + 2 i β γ dαβγ eγ− − dαγ + e− eβ+ d. 2 0
The integral term clearly vanishes. Using quasi-associativity we obtain i βγ − + γ β + [eα+ J + ] = dα (eγ eβ ) − dαγ (e− eβ+ ) + iχ eα+ + i(λ + T )η,α +, 2
(7.21)
(7.22)
and with skew-symmetry this reads: i βγ − + γ β + [J + eα+ ] = dα (eγ eβ ) − dαγ (e− eβ+ ) − i(χ + S)eα+ − iλη,α (7.23) +. 2 Similarly we compute i β α γ + αγ [e+α J + ] = e+ dβγ e− − dβ eγ− − iχ e+α + i(λ + T )η+,α 2 i α β γ + αγ β dβγ (e+ e− ) − dβ (e+ eγ− ) − iχ e+α + i(λ + T )η+,α , = 2 and using skew-symmetry: i α β γ + αγ β [J + e+α ] = dβγ (e+ e− ) − dβ (e+ eγ− ) + i(χ + S)e+α − iλη+,α . 2
(7.24)
(7.25)
We also need [i T η+ J + ] =
λ +,α + + + α η eα − η,α + e+ , 2
(7.26)
from which we get 1 + + α [J + i T η+ ] = − (λ + T ) η+,α eα+ − η,α + e+ . 2
(7.27)
We can now compute using the non-commutative Wick formula: 1 α β γ 1 βγ αγ β γ β dβγ (e+ e− ) − dβ (e+ eγ− ) eα+ + e+α dα (eγ− eβ ) − dαγ (e− eβ+ ) 4 4 + 1 α + 1 1 α + + eα − (χ + S)e+ eα − e+ (χ + S)eα + λ η+,α eα+ − η,α + + 2 2 2 1 + + 1 +,α + α α (eβ eγ )−d αγ (eβ e− ) e+ d dβγ eα −η,α + e+ − − (λ + T ) η + − + γ α β 2 4 0 1 (χ + S)e+α eα+ d. (7.28) − 2 0
[J + J + ] = −
Since the fields eα+ and e+α are odd, we see that the χ -terms vanish. The integral terms are easily evaluated: 1 1 1 α β 1 λ dαβ e− − dααβ eβ− − λ[e+α , eα+ ] − χ λ dim M = − λχ dim M. 2 2 2 2
(7.29)
Therefore the λ-terms in (7.28) vanish. Using quasi-associativity we can write the first two terms of (7.28) as T 1 β α β − e+ eα+ [eβ+ , e+α ] − dααβ eβ− − dαβ e− . 2 2
(7.30)
Collecting terms and using quasi-commutativity we obtain 1 α + + β + β α + 1 α + e e [e , e+ ] + eα e+ [e+ , eβ ] − e Se + eα+ Se+α 4 + β α 2 + α 1 1 + + α − T η+,α eα+ − η,α λχ dim M, (7.31) + e+ − 2 2
[J + J + ] = −
which in this frame coincides with the first equation of (7.10). A similar computation holds in the minus sector. The computation in the more general frame where ±γ cαβ = 0 is similar. (3) The diagonal embedding. Since the manifold (M, J1 ) is generalized Calabi-Yau, Theorem 1 says that J1 = J + + J − generates an N = 2 superconformal vertex algebra of central charge c = 3 dim M. The superconformal vector is given by (7.5), where {ei } is a frame for L 1 and {ei } is the dual frame. We can consider the frame given by {eα+ } ∪ {eα− } and their corresponding dual frames. Similarly the section J2 = J + − J − generates another N = 2 superconformal vertex algebra of central charge c = 3 dim M. The superconformal vector is given by an expression α }. We want to like (7.5) where we now use the frame for L 2 given by {eα+ } ∪ {e− + − show that this field H is actually H + H . Recall that we have chosen the section e4+ϕ
+ +ϕ −
1 n e+1 ∧ · · · ∧ e+n ∧ e− ∧ · · · ∧ e− = eη
+ +η−
1 n e+1 ∧ · · · ∧ e+n ∧ e− ∧ · · · ∧ e−
of det L ∗1 . In the frame {eα+ } ∪ {eα− } for L 1 , and using the expressions in (5.14) and ±γ (5.15) such that cαβ = 0 we have: H1 =
1 α β + − α β − + + − α β − + − β e e [e , e ] +e− e+ [eα , eβ ] +eα eβ [e+ , e− ] +eα eβ [eα , e− ] 4 + − α β T α β α β dαβ e− + dααβ eβ− − eβα e+ − eαβα eβ+ − 2 1 α + α α − i T J1 D(4 + ϕ + + ϕ − ). (7.32) Seα − + eα+ Se+α + eα− Se− + e+ Seα + e− 2
The first term can be written as 1 α + + β α − − β e e [e , e+ ] + e− eβ [eα , e− ] . 2 + β α
(7.33)
while the second term is T +,β − − β − β + −,β + + η = i T J1 D η+ + η− eβ − η,β eβ − η,β − e− + η + e+ 2 T +,β + + β β − + −,β − − η (7.34) eβ − η,β eβ − η,β + + e+ + η − e− , 2
−
from which we easily see $H_1 = H^+ + H^-$. We can perform a similar computation using the global section
$$e^{\varphi^+ - \varphi^-}\, e^1_+ \wedge \dots \wedge e^n_+ \wedge e_1^- \wedge \dots \wedge e_n^- = e^{\eta^+ - \eta^-}\, e^1_+ \wedge \dots \wedge e^n_+ \wedge e_1^- \wedge \dots \wedge e_n^-$$
of $\det L_2^*$ and the frame $\{e_\alpha^+\} \cup \{e^\alpha_-\}$ of $L_2$ to obtain that $H_2 = H_1 = H^+ + H^-$.

(4) The N = 2, 2 algebra. The theorem follows from the following lemma, which in turn is an application of the Jacobi identity for SUSY Lie conformal algebras [15, Proof of Thm. 6.2]:

Lemma 2. Let $J^+$ and $J^-$ be two commuting superfields satisfying
$$[J^\pm{}_\Lambda J^\pm] = -\Big(H^\pm + \frac{c}{3}\,\lambda\chi\Big), \tag{7.35}$$
for some $c \in \mathbb{C}$ and some odd fields $H^\pm$. Let $H = H^+ + H^-$ and suppose moreover that both pairs $(J^+ + J^-, H)$ and $(J^+ - J^-, H)$ generate the N = 2 superconformal vertex algebra of central charge $2c$. Then the quadruple $J^\pm$, $H^\pm$ generates two commuting copies of the N = 2 superconformal vertex algebra of central charge $c$.

Example 4. In the usual Calabi-Yau case, these two commuting structures agree with the ones constructed in [15]. Indeed, in this case both hermitian complex structures agree, $J_+ = J_- = J$, and we can therefore choose a holomorphic coordinate system for both, $\{z_\pm^\alpha\} = \{z^\alpha\}$. Note also that we may work in the coordinates where the holomorphic volume form is constant. In this coordinate system, the sections $H^\pm$ and $J^\pm$ can be written as:
J1 := J + + J − = i S B α α − i S B α¯ α¯ , ¯
¯
J2 := J + − J− = ωα β α β¯ + ωα β¯ S B α S B β , H := H + + H − = S B i Si + T B i i , ¯ α εβ¯ g S B γ β¯ α + g α β Sα β¯ H := H + − H − = εγ ¯ ¯ β + ε¯ γ¯ g α ε¯ S B γ¯ α β¯ + g α β α Sβ¯ ¯
(7.36)
¯
+ gα β¯ T B α S B β + gα β¯ S B α T B β .

8. Half-Twisted Model

In this section we study the topological twist of the N = 2 algebras described in the previous section. We first recall the double complex computing the generalized Hodge decomposition of a generalized Kähler manifold [13]. Let $(M, J_1, J_2)$ be a generalized Kähler manifold. $J_1$ induces a decomposition of forms into its eigenspaces (3.2). Since $J_2$ also acts on $\wedge^\bullet T^*$ via the spin representation and commutes with the action of $J_1$, $U_k$ in turn is decomposed as
$$U_k = U_{k,|k|-n} \oplus U_{k,|k|-n+2} \oplus \dots \oplus U_{k,n-|k|}, \tag{8.1}$$
where $U_{p,q}$ is the intersection of the $ip$-eigenspace of $J_1$ and the $iq$-eigenspace of $J_2$. Recall that, with respect to $J_1$, the ($H$-twisted) de Rham differential decomposes as $d_H = \partial_1 + \bar\partial_1$, and these differentials act as
$$C^\infty(U_k)\ \underset{\bar\partial_1}{\overset{\partial_1}{\rightleftarrows}}\ C^\infty(U_{k+1}). \tag{8.2}$$
This decomposition is further refined by the action of $J_2$ into
$$d_H = \delta_+ + \delta_- + \bar\delta_+ + \bar\delta_-, \tag{8.3}$$
with these operators defined by:
$$\delta_+ : U_{p,q} \to U_{p+1,q+1}, \quad \delta_- : U_{p,q} \to U_{p+1,q-1}, \quad \bar\delta_+ : U_{p,q} \to U_{p-1,q-1}, \quad \bar\delta_- : U_{p,q} \to U_{p-1,q+1}, \tag{8.4}$$
where $\partial_1 = \delta_+ + \delta_-$ and $\partial_2 = \delta_+ + \bar\delta_-$. This decomposition implies:

Proposition 8 (Generalized Kähler identities). For a generalized Kähler structure, we have the identities
$$\bar\delta_+^{\,*} = -\delta_+ \quad\text{and}\quad \bar\delta_-^{\,*} = \delta_-.$$

We thus obtain the following relation between all available Laplacians:
$$\Delta_{d_H} = 2\Delta_{\bar\partial_{1/2}} = 2\Delta_{\partial_{1/2}} = 4\Delta_{\bar\delta_\pm} = 4\Delta_{\delta_\pm}. \tag{8.5}$$
Note that by acting on $\rho_1$ we obtain an isomorphism of sheaves:
$$U_{p,q} \simeq \wedge^r L_1^+ \otimes \wedge^s L_1^-, \qquad p + q + n = 2r, \quad p - q + n = 2s, \tag{8.6}$$
and similarly applying to $\rho_2$ we obtain:
$$U_{p,q} \simeq \wedge^r L_1^+ \otimes \wedge^s L_1^-, \qquad p + q + n = 2r, \quad q - p + n = 2s. \tag{8.7}$$
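The index bookkeeping in (8.1) and (8.6) can be checked mechanically. The sketch below is plain Python with illustrative function names that are not taken from the paper. It enumerates the pairs (p, q) allowed by (8.1), under the assumption that the first index ranges over -n, ..., n, and solves the two linear relations in (8.6) for (r, s).

```python
def bigrading_indices(n):
    """All (p, q) with U_{p,q} allowed by (8.1), assuming -n <= p <= n."""
    return [(p, q)
            for p in range(-n, n + 1)
            for q in range(abs(p) - n, n - abs(p) + 1, 2)]


def exterior_degrees(p, q, n):
    """Solve p + q + n = 2r and p - q + n = 2s from (8.6) for (r, s)."""
    r, r_rem = divmod(p + q + n, 2)
    s, s_rem = divmod(p - q + n, 2)
    if r_rem or s_rem:
        raise ValueError("(p, q) does not satisfy the parity condition")
    return r, s


def check_bijection(n):
    """True if (p, q) -> (r, s) is a bijection onto {0..n} x {0..n}."""
    images = [exterior_degrees(p, q, n) for p, q in bigrading_indices(n)]
    square = {(r, s) for r in range(n + 1) for s in range(n + 1)}
    return len(images) == len(set(images)) and set(images) == square
```

For example, `check_bijection(2)` confirms that the nine pairs (p, q) of the n = 2 diamond correspond one-to-one to the bidegrees (r, s) with 0 ≤ r, s ≤ 2, which is the range one expects for exterior powers of two rank-n bundles.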
In the generalized Calabi-Yau metric case, these isomorphisms are in fact isomorphisms of bi-complexes. Indeed, since $L_1^\pm$ are closed under the Lie bracket, we have $d_{L_1} = d_{L_1^+} + d_{L_1^-}$. Since we know that under (8.6) the differential $\partial_1$ is mapped to $d_{L_1}$, by degree considerations we obtain that $\delta_\pm$ is mapped to $d_{L_1^\pm}$. We arrive at

Proposition 9. On a generalized Calabi-Yau metric manifold, the isomorphisms (8.6) and (8.7) are isomorphisms of bicomplexes. The corresponding spectral sequences degenerate at $E_2$ and, in the case of compact $M$, converge to the $H$-twisted de Rham cohomology of $M$.

The last statement in the above proposition is true for the complex $U_{\bullet,\bullet}$ on any compact generalized Kähler manifold [13].

We are now in position to study an affine generalization of all of these complexes. For the moment we let $(M, J_1, J_2)$ be a generalized Calabi-Yau metric manifold. In the vertex algebra of global sections of $U^{ch}(E)$ there are several bigradings corresponding to different choices of conformal vectors and $U(1)$ currents. In the previous sections, in order to accommodate supersymmetry, it was natural to consider the Virasoro field arising from the decomposition:
$$H(z, \theta) = H_1 = H^+ + H^- = G^+(z) + G^-(z) + 2\theta L(z). \tag{8.8}$$
With respect to this field, the basic fermions $e_\alpha^\pm$ and $e^\alpha_\pm$ are of conformal weight 1/2. We may consider different $U(1)$ currents giving rise to different charge decompositions. We will now perform what is called a topological twist. This consists of changing the conformal weights and $U(1)$-charge of the fields by considering different Virasoro and $U(1)$-currents. Since we have two commuting copies of the N = 2 superconformal algebra, we may perform this twisting in each sector (plus or minus) independently. For a given N = 2 structure $(J, H)$ we may consider the operator
$$L_0 = \frac{1}{2}\big(H_{(1|0)} \pm i J_{(0|1)}\big), \tag{8.9}$$
which acts diagonally; its eigenvalues will be called the conformal weights of the corresponding states. The eigenvalues of $J_0 := -i J_{(0|1)}$ will be called charge. We obtain two possible twistings by choosing different signs in (8.9). Since we have two commuting copies of the N = 2 superconformal algebra in $U^{ch}(E)$, we may use either the same or different signs in the plus and minus sectors.

It is customary to call the former a B-twist, and the latter an A-twist. Consider the operators
$$\begin{aligned}
L_0^\pm &:= \frac{1}{2}\big(H^\pm_{(1|0)} + i J^\pm_{(0|1)}\big), & J_0^\pm &:= -i J^\pm_{(0|1)},\\
Q_0^\pm &:= \frac{1}{2}\big(H^\pm_{(0|1)} + i J^\pm_{(0|0)}\big), & G_0^\pm &:= \frac{1}{2}\big(H^\pm_{(0|1)} - i J^\pm_{(0|0)}\big).
\end{aligned} \tag{8.10}$$
Proposition 10. Consider the embedding $i : U_{\bullet,\bullet} \to U^{ch}(E)$ obtained by composing (8.6) with the obvious embedding $\wedge^\bullet L_1 \to U^{ch}(E)$. We have
$$Q_0^\pm \circ i = i \circ \delta_\pm. \tag{8.11}$$
Similarly, consider the embedding $j : U_{\bullet,\bullet} \to \wedge^\bullet L_2 \subset U^{ch}(E)$; we obtain
$$Q_0^+ \circ j = j \circ \delta_+, \qquad G_0^- \circ j = j \circ \bar\delta_-. \tag{8.12}$$

Proof. Since $(M, J_1)$ is generalized Calabi-Yau, the map (6.2) is an isomorphism of complexes. The differential $\partial_1 = \delta_+ + \delta_-$ is therefore mapped to $d_{L_1}$. A simple computation [17] shows that $Q_0^B := Q_0^+ + Q_0^-$ equals $d_{L_1}$ when restricted to $\wedge^\bullet L_1 = \wedge^\bullet L_1^+ \otimes \wedge^\bullet L_1^-$. The latter bigrading coincides with the one by eigenvalues of $J_0^\pm$. Since $Q_0^\pm$ increases the degree of the $\wedge^\bullet L_1^\pm$ component, we obtain (8.11) by degree considerations. Equation (8.12) follows in the same way by using $J_2$ in place of $J_1$.

Remark 6. Note that by considering the complex conjugate embeddings $\bar i$ and $\bar j$ we would obtain
$$G_0^\pm \circ \bar i = \bar i \circ \bar\delta_\pm, \qquad Q_0^- \circ \bar j = \bar j \circ \delta_-, \qquad G_0^+ \circ \bar j = \bar j \circ \bar\delta_+. \tag{8.13}$$
Remark 7. The identification of the operators $Q^\pm$, $G^\pm$ with the differentials $\delta_\pm$ and $\bar\delta_\pm$ under certain embeddings of differential forms into the chiral-anti-chiral de Rham complex shows that the full infinite dimensional Lie algebra given by the Fourier modes of the superfields $H^\pm$, $J^\pm$ should be viewed as an affine, or superconformal, version of the generalized Kähler identities in the generalized Calabi-Yau case.

Note that in order to define the bigradings by eigenvalues of $L_0^\pm$ and $J_0^\pm$ and the differentials $Q_0^\pm$ and $G_0^\pm$, we only need the zero modes of the fields $H^\pm$, $J^\pm$ and not the full superconformal algebra. It is easy to see that these zero modes are well defined on any generalized Kähler manifold since the quantum corrections involve derivatives of fields (7.2). We can therefore define:

Definition 9. Let $(M, J_1, J_2)$ be a generalized Kähler manifold. We define the chiral de Rham complex of $M$ as the sheaf of super-vertex algebras given by the cohomology of the chiral-anti-chiral de Rham complex:
$$\Omega^{ch}_M := H^*\big(U^{ch}(E), Q_0^+\big). \tag{8.14}$$
The sheaf $\Omega^{ch}_M$ is to $U^{ch}(E)$ what the holomorphic chiral de Rham complex of [22] is to the smooth chiral de Rham complex. In particular, we can give a description in terms of generators and relations just as in the usual case of [22]. Recall that for a complex manifold $M$ we may define a holomorphic Courant algebroid $E$ just as in the smooth setting. The definition of $U^{ch}(E)$ carries over without change to this case. In particular, when $E = T_{\mathbb C} \oplus T_{\mathbb C}^*$ is the standard Courant algebroid we obtain the usual holomorphic chiral de Rham complex. For a generalized complex manifold $M$, there is no such notion as a holomorphic tangent bundle $T_{\mathbb C}$, so a priori there is no obvious way of constructing the analog of the holomorphic chiral de Rham complex. However, for a generalized Kähler manifold $M$
we have the following analog of TC . As mentioned above, since L ± 1 are closed under the Lie bracket, we may decompose d L 1 = d L +1 + d L − . We have the bicomplex 1
∧ p,q L 1 := ∧ p L +1 ⊗ ∧q L − 1.
(8.15)
q = kerd L +1 : ∧0,q L 1 → ∧1,q L 1 .
(8.16)
Define
A variation of the standard ∂-lemma gives Proposition 11. The following is a resolution of q : dL +
dL +
dL +
q → ∧0,q L 1 −−→ ∧1,q L 1 −−→ ∧2,q L 1 −−→ . . . . 1
1
1
(8.17)
Similarly we may define q := ker d L∗ + : ∧0,q L 1 → ∧1,q L 1 . We have by restriction a 1 non-degenerate pairing q ⊗ q → O := 0 0 ⊂ C ∞ (M).
(8.18)
The sheaf O is a sheaf of rings on M and it will play the role of the structure sheaf on a generalized Kähler manifold. The sheaves p and p are sheaves of O-modules and (8.18) is O-bilinear. The sheaf E := 1 ⊕ 1 will play the role of the holomorphic Courant algebroid. In particular, we have by restriction of the usual operations a differential D : O → E, and we can define as always actions of of sections of E into sections of O by4 π(X ) · f := 2 X, D f .
(8.19)
Finally note that we have a well defined Dorfman bracket [·, ·] : E ⊗ E → E, and that we can define the Courant bracket by (3.1). The following is straightforward Proposition 12. The data (E, D, ·, · ) satisfies (2)–(5) of Definition 2. We now arrive at the following description of ch M. Theorem 3. Let (M, J1 , J2 ) be a generalized Kähler manifold and let (E, D, O) be as above. The sheaf (E) is the sheaf of SUSY vertex algebras generated by functions ch i : O → ch M and sections of E declared to be odd, j : E → M subject to the relations (1)–(5) of Proposition 2. ˜ ch be the sheaf described by the theorem. The fact that ˜ ch is a well defined Proof. Let M M sheaf of SUSY vertex algebras is proved in the same way as in the usual smooth case. Note ˜ ch → U ch (E) given by the inclusions O ⊂ C ∞ (M) that we have an obvious inclusion M and E ⊂ E. Moreover, since Q +0 acts as d L +1 in the restriction ∧• L 1 → U ch (E) and ˜ ch ⊂ ker Q + . Note also that from also in the restriction ∧• L 2 → U ch (E) we see that M
[G +0 ,
Q +0 ]
=
0
L +0 ,
(8.20)
we see that the cohomologies in (8.14) are concentrated in conformal weight zero. Let U ch (E)0 be the kernel of L +0 . Note that the spectrum of J0+ is non-negative on U ch (E)0 . 4 This definition seems circular, but it agrees with the restriction of the usual smooth action.
Indeed, the basic fields of negative charge −1 are sections of L +1 , which in turn have positive conformal weight 1 with respect to L +0 . We clearly have + + ˜ ch M ⊂ ker J0 ∩ ker Q 0 ,
(8.21)
˜ ch → ch . We can check that it is an isomorphism locally. Let therefore we have a map M M p
:= ker d L − : ∧ p,0 L 1 → ∧ p,1 L 1 . 1
(8.22)
•
As in the usual complex geometry case, we have that ( , d L +1 ) is a resolution of C. We can locally write •
˜ ch U ch (E)0 = M ⊗ ,
(8.23)
where the grading on the left is by J0+ -charge and agrees with the grading on the right. The differential Q +0 acts as 1 ⊗ d L +1 and the result follows. Corollary 1. The cohomology in (8.14) is concentrated in degree zero, namely, U ch (E)0 is a resolution of ch M. Example 5. Consider a usual Kähler manifold as in Example 2. The sheaf ch M is isomorphic to the holomorphic chiral de Rham complex constructed in [22]. Indeed, in this case we have that both hermitian complex structures agree J = J + = J − . We have natural isomorphisms then L +1 T ∗0,1 ,
∗1,0 L− . 1 T
(8.24)
Recall from Proposition 10 that Q +0 gets identified with δ+ = d L +1 which in this case is the usual operator ∂. If we exchange J1 and J2 5 then we see that 1 = TC∗ is the holomorphic cotangent bundle, 1 = TC is the holomorphic Tangent bundle and O is the sheaf of holomorphic functions on M. Theorem 3 gives an isomorphism of ch M with the holomorphic chiral de Rham complex constructed in [22]. ch Remark 8. In the usual Kähler case, we have by Theorem 3 an embedding ch M ⊂U (E). On the other hand, there is an obvious embedding of the holomorphic chiral de Rham complex of [22] into the smooth chiral de Rham complex U ch (E) given by the inclusions TC ⊂ T ⊗ C, TC∗ ⊂ T ∗ ⊗ C and O ⊂ C ∞ (M). These two inclusions are not the same. In fact they are an affine analog of the fact that the decomposition of cohomology given by (8.4) is not the Dolbeaut decomposition but rather an orthogonal transformation of it [13].
Remark 9. Since the zero mode J0− is well defined on any generalized Kähler manifold, the sheaf ch M carries a grading given by fermionic charge. In the description of Theorem 3 the sections of 1 have charge +1 while the sections of 1 have charge −1. Remark 10. One should view (E, D, ·, · ) as a Courant-Dorfman algebra over O [23]. From this perspective, Theorem 3 is nothing more than the SUSY vertex algebra that one can trivially attach to any such algebra. 5 This is to have the same signs as in the rest of this section.
Remark 11. From Remark 9 and Corollary 1 it is natural to define a chiral analog of the Hodge decomposition as the sheaf cohomology
$$H^{p,q}_{ch}(M) = H^q\big(M, \Omega^{ch,p}_M\big), \tag{8.25}$$
where $\Omega^{ch,p}_M$ is the component of charge $p$ with respect to $J_0^-$. There are several subtleties in defining (8.25). First, we need to show that, just as in the usual case, $\Omega^{ch}_M$ carries a filtration such that the successive quotients are (finitely presented) sheaves of O-modules. This is done in parallel to the usual case. Having done so, one needs to develop the cohomology theory of sheaves of O-modules on generalized Kähler manifolds. This is easily done in the bihermitian setup as explained below. We will not pursue this further in this article. We point out however that Corollary 1 would imply a chiral version of the de Rham Theorem:
$$H^{p,q}_{ch}(M) = \frac{\ker\big(Q_0^+ : U^{ch}(E)^{p,q} \to U^{ch}(E)^{p,q+1}\big)}{\operatorname{im}\big(Q_0^+ : U^{ch}(E)^{p,q-1} \to U^{ch}(E)^{p,q}\big)}, \tag{8.26}$$
where $U^{ch}(E)^{p,q}$ consists of sections with charge $p$ (resp. $q$) with respect to $J_0^-$ (resp. $J_0^+$). In the usual Kähler and non-supersymmetric case, one should compare this result to those of [7].

The description of $\Omega^{ch}_M$ in terms of "holomorphic" data is much more easily done in the bihermitian setup. It will be useful to replace $J_i$ with $-J_i$ for simplicity (see the signs in Example 2). We have isomorphisms (5.2):
$$L_1 \simeq T_+^{*0,1} \oplus T_-^{*0,1}. \tag{8.27}$$
Under this identification the differential d L 1 = ∂+ + ∂− . It follows by definition then that we have natural isomorphisms, O O +,
1 TC∗,+ ,
1 TC,+ ,
(8.28)
where O + (resp. TC∗,+ , TC,+ ) is the sheaf of holomorphic functions (resp. holomorphic cotangent bundle, holomorphic tangent bundle) with respect to J + . Recall (Prop. 4) that H is of type (2, 1) + (1, 2) with respect to J+ . Let H+ be its (2, 1) component. We see that 0 → TC∗,+ → E → TC,+ → 0
(8.29)
is naturally a holomorphic exact Courant algebroid in the complex manifold (M, J+ ). It is the H+ -twisting of the standard Courant algebroid. We arrive at the following Proposition 13. Let (M, J1 , J2 ) be a generalized Kähler manifold, and let (g, J± ) be the associated bihermitian data. The sheaf ch M is naturally isomorphic to the sheaf of SUSY vertex algebras U ch (E) on the complex manifold6 (M, −J+ ) constructed as in Prop. 2, that is the holomorphic H+ -twisted chiral de Rham complex of (M, −J+ ). Similarly, if we use Q − in place of Q + in (8.14) we obtain a sheaf of SUSY vertex algebras on the complex manifold (M, −J− ), its holomorphic twisted chiral de Rham complex. 6 The sign change in J is due to the sign changes in the previous paragraph. +
Remark 12. Note that we may in fact construct four sheaves as half-twisted models if we take as BRST differentials $Q_0^\pm$ or $G_0^\pm$. The sheaves obtained using the latter differentials are the anti-holomorphic counterparts to the sheaves described in Prop. 13. In particular, in a usual generalized Kähler manifold as in Example 2, we have $J_+ = J_- = -J$ and therefore the sheaf computed with either $Q_0^+$ or $Q_0^-$ coincides with the usual holomorphic chiral de Rham complex of $M$ as in [22], while the sheaf computed with either $G_0^+$ or $G_0^-$ is the anti-chiral de Rham complex of $M$.

The following immediately follows from Theorem 2:

Proposition 14. Let $(M, J_1, J_2)$ be a generalized Calabi-Yau metric manifold. Then the chiral de Rham complex $\Omega^{ch}_M$ is a sheaf of topological vertex algebras, i.e. there exists an embedding of the N = 2 superconformal vertex algebra into the space of global sections of $\Omega^{ch}_M$. This algebra is generated by the cohomology classes of the fields $J^-$ and $H^-$ constructed in Theorem 2.

Remark 13. Given this last proposition it is natural to define, following [3], the 2-variable elliptic genus of a compact generalized Calabi-Yau metric manifold $M$ as the supertrace
$$\mathrm{Ell}_M(y, q) := y^{-\frac{\dim M}{2}}\; \mathrm{str}_{H^*(\Omega^{ch}(E))}\; y^{J_0^-} q^{L_0^-}. \tag{8.30}$$
However, the no-go theorems (for a review see [25] and references therein) state that if $M$ is compact then $H = 0$, in which case the dilaton vanishes in the fields $J^\pm$, $H^\pm$ and we reduce to the usual Calabi-Yau case.

We conclude this section by mentioning the different topological sectors of these sheaves [17]. Let $(M, J_1, J_2)$ be a generalized Calabi-Yau manifold.

(1) B-twist. In this twist we consider the BRST charge $Q_B = Q^+ + Q^-$. The complex $(U^{ch}(E), Q_B)$ is quasi-isomorphic to $(\wedge^\bullet L_1, d_{L_1})$. If $M$ is a usual Calabi-Yau manifold with the generalized complex structures as in Example 2, the cohomology of this complex is given by the Hochschild cohomology of $M$: ⊕H p, p M.
(2) A-twist. Consider now the BRST differential $Q_A = Q^+ + G^-$. In this case the complex $(U^{ch}(E), Q_A)$ is quasi-isomorphic to $(\wedge^\bullet L_2, d_{L_2})$ and in the usual Calabi-Yau case its cohomology is the de Rham cohomology of $M$.

9. Discussion

In the present article we described explicitly an $N = (2, 2)$ structure on the chiral-anti-chiral de Rham complex of a generalized Calabi-Yau metric manifold with or without $H$-flux. We also studied the half-twisted model, and showed that it can be described purely in terms of "holomorphic" data. In the course of doing so we clarified some results scattered in the literature on generalized Calabi-Yau metric manifolds, and produced some new ones. In particular, we showed in Sect. 6 how unimodularity of the Lie algebroids appearing on the generalized geometry side corresponds to the properties of traces of connections on the bihermitian side. In the usual Calabi-Yau case, this corresponds to Ricci-flatness.

This article raises some questions which would be interesting to address in the future.
• The analog of the Clifford or Dolbeault decomposition of cohomology makes sense on any generalized complex manifold satisfying the $dd^c$-lemma [6]. Presumably one should be able to define the sheaf $\Omega^{ch}_M$ in such cases.
• It would be interesting to match the discrepancy between the holomorphic CDR of [22] and $\Omega^{ch}_M$ as presented in Remark 8, with the results of [26] and [19]. We suspect that this discrepancy corresponds to different choices of Hamiltonians.
• The notion of holomorphic Courant algebroid, or Courant-Dorfman algebra over $\mathcal{O}$, should be easy to define on any generalized Kähler manifold. To such objects one should be able to attach a vertex algebra in the same way as in the smooth case.
• The notion of Elliptic genus in two variables seems not to give anything new, due to the no-go theorems of generalized complex geometry. Presumably, however, one could develop a theory of cohomologies with compact support or similar, to make sense of (8.30) in the non-compact case, where $H$-flux will play a crucial role. Note also that the results of [7] should be easily generalized in this setting to the generalized Kähler case.

Acknowledgements We thank Nigel Hitchin, Chris Hull, Ulf Lindström, Maciej Szczesny, Rikard von Unge and Frederik Witt for discussions on this and related subjects. In particular we are grateful to Jian Qiu for inspiring discussions and a few useful suggestions. The research of R.H. was supported by NSF grant DMS-0635607002. The research of M.Z. is supported by VR-grant 621-2008-4273.
Appendix A: SUSY Conformal Algebras

Let $\mathfrak{h}$ be the super Lie algebra spanned by an odd element $\chi$ and an even element $\lambda$ such that $[\chi, \chi] = -2\lambda$ and $\lambda$ is central. Let $\mathcal{H}$ be its universal enveloping algebra. We will consider another set of generators $S$, $-T$ for the same algebra.

Definition 10. An $N_K = 1$ SUSY Lie conformal algebra is an $\mathcal{H}$-module $R$ with an operation $[\,\cdot_\Lambda \cdot\,] : R \otimes R \to \mathcal{H} \otimes R$ of degree 1 satisfying:

(1) Sesquilinearity:
\[
[Sa_\Lambda b] = \chi [a_\Lambda b], \qquad [a_\Lambda Sb] = -(-1)^{a} (S + \chi) [a_\Lambda b].
\]
(2) Skew-Symmetry:
\[
[a_\Lambda b] = (-1)^{ab} [b_{-\Lambda - \nabla} a].
\]
Here the bracket on the right hand side is computed as follows: first compute $[b_\Gamma a]$, where $\Gamma = (\gamma, \eta)$ are generators of $\mathcal{H}$ super commuting with $\Lambda$, then replace $\Gamma$ by $(-\lambda - T, -\chi - S)$.
(3) Jacobi identity:
\[
[a_\Lambda [b_\Gamma c]] = -(-1)^{a} [[a_\Lambda b]_{\Gamma + \Lambda} c] + (-1)^{(a+1)(b+1)} [b_\Gamma [a_\Lambda c]],
\]
where the first bracket on the right hand side is computed as in Skew-Symmetry and the identity is an identity in $\mathcal{H}^{\otimes 2} \otimes R$.

Given an $N_K = 1$ SUSY VA, it is canonically an $N_K = 1$ SUSY Lie conformal algebra with the bracket defined in (2.1). Moreover, given an $N_K = 1$ SUSY Lie conformal algebra $R$, there exists a unique $N_K = 1$ SUSY VA $V$, called the universal enveloping SUSY vertex algebra of $R$, with the property that if $W$ is another $N_K = 1$ SUSY VA and $\varphi : R \to W$ is a morphism of Lie conformal algebras, then $\varphi$ extends uniquely to a morphism $\varphi : V \to W$ of SUSY VAs. The operations (2.1) satisfy:
• Quasi-Commutativity:
\[
ab - (-1)^{ab} ba = \int_{-\nabla}^{0} [a_\Lambda b] \, d\Lambda .
\]
• Quasi-Associativity:
\[
(ab)c - a(bc) = \sum_{j \ge 0} a_{(-j-2|1)} b_{(j|1)} c + (-1)^{ab} \sum_{j \ge 0} b_{(-j-2|1)} a_{(j|1)} c .
\]
• Quasi-Leibniz (non-commutative Wick formula):
\[
[a_\Lambda bc] = [a_\Lambda b] c + (-1)^{(a+1)b} b [a_\Lambda c] + \int_{0}^{\Lambda} [[a_\Lambda b]_\Gamma c] \, d\Gamma ,
\]
where the integral $\int d\Lambda$ is $\partial_\chi \int d\lambda$.

In addition, the vacuum vector is a unit for the normally ordered product and the endomorphisms $S$, $T$ are odd and even derivations respectively of both operations.

References
1. Alvarez-Gaume, L., Freedman, D.Z.: Geometrical Structure And Ultraviolet Finiteness In The Supersymmetric Sigma Model. Commun. Math. Phys. 80(3), 443–451 (1981)
2. Bonechi, F., Zabzine, M.: Poisson sigma model on the sphere. Commun. Math. Phys. 285, 1033–1063 (2009)
3. Borisov, L., Libgober, A.: Elliptic genera of toric varieties and applications to mirror symmetry. Invent. Math. 140(2), 453–485 (2000)
4. Bredthauer, A., Lindström, U., Persson, J., Zabzine, M.: Generalized Kähler geometry from supersymmetric sigma models. Lett. Math. Phys. 77, 291–308 (2006)
5. Bressler, P.: The first Pontryagin class. Compos. Math. 143(5), 1127–1163 (2007)
6. Cavalcanti, G.R.: New aspects of the dd^c lemma. Oxford Ph.D. thesis, 2004
7. Cheung, P.: The Witten genus and vertex algebras. http://arxiv.org/abs/0811.1418v4 [math.AT], 2010
8. Ekstrand, J., Heluani, R., Källén, J., Zabzine, M.: Non-linear sigma models via the chiral de Rham complex. Adv. in Theor. and Math. Phys. 13, 1221–1254 (2009)
9. Evens, S., Lu, J., Weinstein, A.: Transverse measures, the modular class and a cohomology pairing for Lie algebroids. Quart. J. Math. Oxford Ser. (2) 50(200), 417–436 (1999)
10. Frenkel, E., Losev, A., Nekrasov, N.: Instantons beyond topological theory II. http://arxiv.org/abs/0803.3302v1 [hep.th], 2008
11. Gorbounov, V., Malikov, F., Schechtman, V.: Gerbes of chiral differential operators. II. Vertex algebroids. Invent. Math. 155(3), 605–680 (2004)
12. Gualtieri, M.: Generalized complex geometry. Oxford Ph.D. thesis, 2004, available at http://arxiv.org/abs/math.DG/0401221, 2004
13. Gualtieri, M.: Generalized geometry and the Hodge decomposition. http://arxiv.org/abs/math/0409093vL, 2004
14. Gualtieri, M.: Generalized complex geometry. http://arxiv.org/abs/math/07032298v1, 2007
15. Heluani, R.: Supersymmetry of the chiral de Rham complex II: Commuting sectors. Int. Math. Res. Not., IMRN 6, 953–987 (2009)
16. Heluani, R., Kac, V.G.: Super symmetric vertex algebras. Commun. Math. Phys. 271, 103–178 (2007)
17. Heluani, R., Zabzine, M.: Generalized Calabi-Yau Manifolds and the chiral de Rham complex. Adv. in Math. 223(5), 1815–1844 (2009)
18. Kac, V.G.: Vertex algebras for beginners. Volume 10 of University Lecture series, Providence, RI: Amer. Math. Soc., 1996
19. Kapustin, A.: Chiral de Rham complex and the half-twisted sigma-model. http://arxiv.org/abs/hep-th/0504074v1, 2005
20. Lindström, U., Minasian, R., Tomasiello, A., Zabzine, M.: Generalized complex manifolds and supersymmetry. Commun. Math. Phys. 257, 235–256 (2005)
21. Malikov, F.: Lagrangian approach to sheaves of vertex algebras. Commun. Math. Phys. 278, 487–548 (2008)
22. Malikov, F., Shechtman, V., Vaintrob, A.: Chiral de Rham complex. Commun. Math. Phys. 204(2), 439–473 (1999)
23. Roytenberg, D.: Courant-Dorfman algebras and their cohomology. Lett. Math. Phys. 90(1–3), 311–351 (2009)
24. Weinstein, A.: The modular automorphism group of a Poisson manifold. J. Geom. Phys. 23(3-4), 379–394 (1997)
25. Witt, F.: Calabi-Yau manifolds with B-fields. Rend. Semin. Mat. Univ. Politec. Torino 66(1), 1–21 (2008)
26. Witten, E.: Two-dimensional models with (0,2) supersymmetry: perturbative aspects. Adv. Theor. Math. Phys. 11(1), 1–63 (2007)
27. Zabzine, M.: Lectures on generalized complex geometry and supersymmetry. Arch. Math. (Brno) 42(suppl.), 119–146 (2006)
28. Zumino, B.: Supersymmetry And Kahler Manifolds. Phys. Lett. B 87(3), 203–206 (1979)

Communicated by Y. Kawahigashi
Commun. Math. Phys. 306, 365–380 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1274-1
Communications in
Mathematical Physics
Rapid Convergence to Frequency for Substitution Tilings of the Plane

José Aliste-Prieto1, Daniel Coronel2, Jean-Marc Gambaudo3

1 Centro de Modelamiento Matemático, Universidad de Chile, Blanco Encalada 2120 7to. piso, Santiago, Chile. E-mail: [email protected]
2 Facultad de Matemáticas, Pontificia Universidad Católica de Chile, Campus San Joaquín, Avenida Vicuña Mackenna 4860, Santiago, Chile. E-mail: [email protected]
3 Laboratoire J.-A. Dieudonné, Université de Nice - Sophia Antipolis, CNRS, 06108 Nice Cedex 02, France. E-mail: [email protected]

Received: 18 July 2010 / Accepted: 21 December 2010
Published online: 12 June 2011 – © Springer-Verlag 2011
Abstract: This paper concerns self-similar tilings of the Euclidean plane. We consider the number of occurrences of a given tile in any domain bounded by a Jordan curve. For a large class of self-similar tilings, including many well-known examples, we give estimates of the oscillation of this number of occurrences around its average frequency times the total number of tiles in the domain, which depend only on the Jordan curve.

1. Introduction

Quasicrystals are alloys that show long range order but possess symmetries that prevent them from being crystals. From their spectacular experimental realization in the early 80's [SBGC84] to their very recent discovery as natural objects in the Kamchatka mountains [BSYL09], quasicrystals have been the subject of very active research, whose application and interest extend far beyond the scope of solid state physics. The first examples observed were rapidly quenched alloys of Aluminum and Manganese exhibiting icosahedral symmetry. In these quasicrystals, atoms are known to appear with a given frequency. This means that for a large ball $B$ of radius $R$, the ratio of the number $N_s$ of atoms inside $B$ corresponding to a specific atomic element $s$ (for example Aluminum in the first known cases) to the total number $N$ of atoms in the ball $B$ tends to a limit $\nu_s$ as $R$ tends to infinity:
\[
\lim_{R \to +\infty} \frac{N_s}{N} = \nu_s .
\]
For such examples, the following questions arise naturally:
(1) to estimate the oscillation of the ratio $N_s/N$ around its limit $\nu_s$, that is to say, to give upper bounds for the speed of convergence to 0 of $|N_s/N - \nu_s|$ as $R$ goes to infinity;
(2) to extend these estimates to domains whose closures are not Euclidean balls.
We now make these questions more concrete by using aperiodic tilings of the Euclidean space to model quasicrystals (see [Bel03]); for details and precise statements of the results, see Sect. 2. Consider a tiling $\mathcal{T}$ of $d$-dimensional Euclidean space $\mathbb{R}^d$ made with isometric copies of a finite set of tiles $\{p_1, \dots, p_n\}$, where the $p_i$'s are homeomorphic to a closed ball in $\mathbb{R}^d$. Let $V$ be a (large) set in $\mathbb{R}^d$ also homeomorphic to a closed ball and let $\partial V$ be its boundary. For each $i \in \{1, \dots, n\}$, we let $N(V, \mathcal{T}, p_i)$ denote the number of isometric copies of $p_i$ in $V$, $N(V, \mathcal{T})$ the number of tiles of $\mathcal{T}$ in $V$, and $L(\partial V, \mathcal{T})$ the number of tiles of $\mathcal{T}$ that intersect $\partial V$. If $V$ is a ball, then it is well-known that there are tilings such that for each $i$ in $\{1, \dots, n\}$, the ratio $N(V, \mathcal{T}, p_i)/N(V, \mathcal{T})$ tends to a well-defined limit $\nu_i$ as the radius of $V$ tends to infinity (see for instance [GS87,LP03,Sol97]). In fact, this is usually a consequence of the unique ergodicity of the associated dynamical system (see for instance [LMS02]). However, in general, very little is known concerning an upper bound for the quantity $|N(V, \mathcal{T}, p_i) - \nu_i N(V, \mathcal{T})|$, especially when $V$ is not a Euclidean ball (see [APC,LP03] for the case when $V$ equals a ball or box). In particular, for an aperiodic tiling, can we hope to obtain an estimate as strong as the one we can formulate for periodic tilings, which reads:
\[
(\ast) \qquad |N(V, \mathcal{T}, p_i) - \nu_i N(V, \mathcal{T})| \le K \, L(\partial V, \mathcal{T}) \quad \text{for some } K > 0 ?
\]
In this paper, we answer this question in the affirmative for a large class of self-similar tilings in two dimensions. Self-similar tilings, which are probably the most studied examples of aperiodic tilings (see for instance [AP98,CS06,Rad94,Rob04,Sol97] and references therein; examples are given in Sect. 5), are associated with a (substitution) primitive square matrix $M$ having non-negative integer elements. By Perron-Frobenius theory (see for instance [HJ90,Sen81], see also Subsect. 3.3 for more details), we know that there exists a largest positive real eigenvalue $\mu$, the remainder of the spectrum being in a ball centered at 0 with radius smaller than $\mu$. We denote by $r(M)$ the modulus of the second largest eigenvalue of $M$, that is,
\[
r(M) = \max\{ |\eta| \;:\; \eta \neq \mu \text{ is an eigenvalue of } M \}.
\]
The simple condition $r(M) < \sqrt{\mu}$ ensures that the above estimate $(\ast)$ holds true for the associated self-similar tiling. If $r(M) = \sqrt{\mu}$ and if all eigenvalues with modulus $r(M)$ are semi-simple (their algebraic and geometric multiplicities coincide, see for instance [IJ90] for more details), then the estimate $(\ast)$ has to be relaxed to
\[
(\ast\ast) \qquad |N(V, \mathcal{T}, p_i) - \nu_i N(V, \mathcal{T})| \le K \, L(\partial V, \mathcal{T}) \ln L(\partial V, \mathcal{T}) \quad \text{for some } K > 0 .
\]
Some explicit examples of self-similar 2-dimensional tilings satisfying all these conditions will be given at the end of the paper. These results are more explicitly stated in the next section. Ideas for the proofs have been inspired by J. Peyrière's 1986 paper [Pey86] and can be related to B. Adamczewski's work [Ada04].

2. Definitions and Main Result

2.1. Definitions. Let $\Lambda$ be a closed subset of the Euclidean plane $\mathbb{R}^2$. A tiling of $\Lambda$ is a countable collection $\mathcal{T} = (t_j)_{j \in J}$ of closed subsets of $\Lambda$ that cover $\Lambda$ and have pairwise disjoint interiors. The sets $t_j$ are called tiles and, in this paper, all tiles are supposed to be homeomorphic to the closed unit ball in $\mathbb{R}^2$, see Sect. 5 for examples.
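Let $\mathcal{E}$ be a group of isometries on the plane that contains the group of translations. A tile $q$ is $\mathcal{E}$-equivalent to a tile $p$ (or is an $\mathcal{E}$-copy of $p$) if $q$ is the image of $p$ by an isometry in $\mathcal{E}$. Let $\mathcal{P} = \{p_1, \dots, p_n\}$ be a finite collection of tiles. A tiling $\mathcal{T}$ of $\Lambda \subseteq \mathbb{R}^2$ is $\mathcal{E}$-generated by $\mathcal{P}$ if every tile in $\mathcal{T}$ is $\mathcal{E}$-equivalent to some tile in $\mathcal{P}$. The set of all tilings of $\Lambda$ that are $\mathcal{E}$-generated by $\mathcal{P}$ is denoted by $\Omega_{\mathcal{E},\mathcal{P}}(\Lambda)$. When $\Lambda = \mathbb{R}^2$, we write $\Omega_{\mathcal{E},\mathcal{P}}$ instead of $\Omega_{\mathcal{E},\mathcal{P}}(\mathbb{R}^2)$.

Let $\lambda > 1$. Given a subset $U \subseteq \mathbb{R}^2$, let $\lambda U := \{\lambda x \mid x \in U\}$. A substitution rule (with dilation factor $\lambda$) is a collection $\mathcal{S} = (S_i)_{i=1}^{n}$, where $S_i$ belongs to $\Omega_{\mathcal{E},\mathcal{P}}(\lambda p_i)$ for all $i \in \{1, \dots, n\}$.

Let $\mathcal{S}$ be a substitution rule. On one hand, the dilation by $\lambda$ induces a natural map from $\Omega_{\mathcal{E},\mathcal{P}}(\Lambda)$ to $\Omega_{\mathcal{E},\lambda\mathcal{P}}(\lambda\Lambda)$, where $\lambda\mathcal{P} = \{\lambda p \mid p \in \mathcal{P}\}$. On the other hand, $\mathcal{S}$ induces a map from $\Omega_{\mathcal{E},\lambda\mathcal{P}}(\lambda\Lambda)$ to $\Omega_{\mathcal{E},\mathcal{P}}(\lambda\Lambda)$, which is defined by subdividing the tiles of a tiling in $\Omega_{\mathcal{E},\lambda\mathcal{P}}(\lambda\Lambda)$ according to the substitution rule. For details see Subsect. 3.2. The composition of these two maps yields a map $I_{\mathcal{S},\Lambda} : \Omega_{\mathcal{E},\mathcal{P}}(\Lambda) \to \Omega_{\mathcal{E},\mathcal{P}}(\lambda\Lambda)$. If $\Lambda = \mathbb{R}^2$, then $I_{\mathcal{S}} := I_{\mathcal{S},\mathbb{R}^2}$ is a self-map of $\Omega_{\mathcal{E},\mathcal{P}}$ and is referred to as the substitution map. A tiling $\mathcal{T}$ is called self-similar if it is a periodic point for $I_{\mathcal{S}}$. A tiling $\mathcal{T}$ in $\Omega_{\mathcal{E},\mathcal{P}}$ is admissible for $\mathcal{S}$ if it belongs to
\[
\Omega_{\mathcal{S}} := \bigcap_{k \ge 0} I_{\mathcal{S}}^{k}(\Omega_{\mathcal{E},\mathcal{P}}).
\]
The substitution matrix associated with $\mathcal{S}$ is the $n$-by-$n$ matrix $M_{\mathcal{S}} = (m_{i,j})_{i,j}$, where for each $i, j \in \{1, \dots, n\}$, the coefficient $m_{i,j}$ is the number of $\mathcal{E}$-copies of $p_i$ in $S_j$. Recall that a matrix $M$ is primitive if there exists $n > 0$ such that all the elements of $M^n$ are positive. The substitution rule $\mathcal{S}$ is primitive if its substitution matrix is primitive. In this paper, all the substitution rules considered will be primitive.

2.2. Main result. Let $\mathcal{P} = \{p_1, \dots, p_n\}$ be a finite set of tiles, $\mathcal{S}$ a substitution rule with dilation factor $\lambda > 1$ and $\mathcal{T}$ an admissible tiling for $\mathcal{S}$. Given a Jordan curve $\Gamma$ in $\mathbb{R}^2$ bounding a topological closed disk $\Delta$, we let $L(\Gamma, \mathcal{T})$ denote the number of tiles of $\mathcal{T}$ that intersect $\Gamma$, $N(\Delta, \mathcal{T})$ the number of tiles of $\mathcal{T}$ included in $\Delta$, and, for each $i \in \{1, \dots, n\}$, $N(\Delta, \mathcal{T}, p_i)$ the number of $\mathcal{E}$-copies of $p_i$ contained in $\Delta$. The following theorem, which constitutes the main result of this paper, provides estimates on the oscillation of the number of occurrences $N(\Delta, \mathcal{T}, p_i)$ around an average frequency, which only depend on the Jordan curve $\Gamma$ bounding $\Delta$.

Theorem 2.1. Let $\mathcal{P} = \{p_1, \dots, p_n\}$ be a finite collection of tiles, $\mathcal{S}$ a substitution rule with dilation factor $\lambda > 1$ and primitive substitution matrix $M_{\mathcal{S}}$. There exist positive numbers $\nu_1, \dots, \nu_n$ depending only on the substitution matrix $M_{\mathcal{S}}$ that satisfy:
(i) If $r(M_{\mathcal{S}}) < \lambda$, then there exists $K > 0$ such that, for every Jordan curve $\Gamma$ bounding a topological closed disk $\Delta$ and every tiling $\mathcal{T}$ in $\Omega_{\mathcal{S}}$, we have:
\[
|N(\Delta, \mathcal{T}, p_i) - \nu_i N(\Delta, \mathcal{T})| \le K \, L(\Gamma, \mathcal{T}).
\]
(ii) If $r(M_{\mathcal{S}}) = \lambda$ and all the eigenvalues with modulus $\lambda$ are semi-simple (that is, their algebraic and geometric multiplicities coincide), then there exists $K > 0$ such that, for every Jordan curve $\Gamma$ bounding a topological closed disk $\Delta$ and every tiling $\mathcal{T}$ in $\Omega_{\mathcal{S}}$, we have:
\[
|N(\Delta, \mathcal{T}, p_i) - \nu_i N(\Delta, \mathcal{T})| \le K \, L(\Gamma, \mathcal{T}) \ln L(\Gamma, \mathcal{T}).
\]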
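The hypotheses of Theorem 2.1 can be tested mechanically once a substitution matrix is written down. The following is a minimal sketch, not taken from the paper: the helper names (`is_primitive`, `theorem_case`), the restriction to 2-by-2 matrices, and the sample matrix with golden-ratio dilation factor are all assumptions made for illustration. Primitivity is tested with Wielandt's bound on the exponent of a primitive matrix, and for a primitive 2-by-2 matrix the non-Perron eigenvalue is simple, hence automatically semi-simple.

```python
import cmath

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_primitive(m):
    """True if some power of the non-negative matrix m is entrywise positive.

    By Wielandt's bound it suffices to test powers up to (n - 1)**2 + 1.
    """
    n = len(m)
    power = [row[:] for row in m]
    for _ in range((n - 1) ** 2 + 1):
        if all(entry > 0 for row in power for entry in row):
            return True
        power = matmul(power, m)
    return False

def eigenvalues_2x2(m):
    """Both eigenvalues of a 2-by-2 matrix, from its trace and determinant."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def theorem_case(m, dilation, tol=1e-9):
    """Report which case of Theorem 2.1 a primitive 2-by-2 matrix falls in."""
    if not is_primitive(m):
        raise ValueError("the substitution matrix must be primitive")
    # Sort by modulus: the first eigenvalue is the Perron eigenvalue mu.
    _mu, other = sorted(eigenvalues_2x2(m), key=abs, reverse=True)
    r = abs(other)  # r(M): modulus of the second largest eigenvalue
    if r < dilation - tol:
        return "(i)"
    if abs(r - dilation) <= tol:
        return "(ii)"  # a simple eigenvalue is semi-simple
    return "neither"

# Hypothetical two-tile rule with dilation factor equal to the golden ratio.
golden = (1 + 5 ** 0.5) / 2
print(theorem_case([[2, 1], [1, 1]], golden))
```

For matrices larger than 2-by-2 one would replace `eigenvalues_2x2` by a numerical eigenvalue routine and check semi-simplicity separately; the sketch deliberately avoids that to stay dependency-free.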
Remark 2.2. Notice that Theorem 2.1 is given for a quite general setting:
(1) On one hand, we do not require the tiles in $\mathcal{P}$ to be polygons, they need not even be disks with piece-wise smooth boundaries. In fact, the boundary of the tiles could be Jordan curves with infinite length and positive Hausdorff dimension, as is the case for Rauzy tilings that we shall discuss later.
(2) On the other hand, we do not require the standard finite pattern condition used in tiling theory (see [KP00]). This allows us to deal with self-similar tilings which have "fault lines" along which tiles can slide past one another. This was the case for the self-affine tilings studied by N. Priebe Frank and L. Sadun [PFS09].

Remark 2.3. For a given substitution, the estimates given by Theorem 2.1 may depend on the group $\mathcal{E}$ being considered. This is the case of Penrose tilings, see Sect. 5 for details.

Remark 2.4. Theorem 2.1 can be easily extended to colored tilings (for an example, see the square tilings in Sect. 5).

3. The Tools

3.1. Jordan curves and locally finite tilings. In this section, we provide some simple combinatorial estimates for the number of tiles of a tiling that intersect a Jordan curve. First we must define some notation. Denote by $B_x(r)$ the closed ball of radius $r$ around $x$ in $\mathbb{R}^2$. Given a subset $U \subseteq \mathbb{R}^2$, define
\[
r_{\min}(U) := \sup\{ r > 0 \mid \text{there exists } x \in U \text{ s.t. } B_x(r) \subseteq U \}.
\]
To a tiling $\mathcal{T}$ of $\mathbb{R}^2$, we associate
\[
r_{\mathcal{T}} = \inf\{ r_{\min}(t) \mid t \in \mathcal{T} \} \quad \text{and} \quad R_{\mathcal{T}} = \sup\{ \operatorname{diam}(t) \mid t \in \mathcal{T} \}/2 .
\]
The following lemma shows that if $r_{\mathcal{T}}$ is positive and $R_{\mathcal{T}}$ is finite, then there is a uniform bound on the number of tiles intersected by balls of a prescribed radius.

Lemma 3.1. Suppose that $\mathcal{T}$ is a tiling with $R_{\mathcal{T}} < +\infty$ and $r_{\mathcal{T}} > 0$. Then every ball of radius $2R_{\mathcal{T}}$ intersects at most $K_{\mathcal{T}} := \lfloor 16 R_{\mathcal{T}}^2 r_{\mathcal{T}}^{-2} \rfloor$ tiles of $\mathcal{T}$, where $\lfloor \cdot \rfloor$ stands for the integer part.

Proof. Fix $x \in \mathbb{R}^2$. It is clear that each tile of $\mathcal{T}$ intersecting $B_x(2R_{\mathcal{T}})$ is included in $B_x(4R_{\mathcal{T}})$. Since each one of these tiles contains a ball of radius $r_{\mathcal{T}}$, comparing areas yields $r_{\mathcal{T}}^2 \pi N \le 16 \pi R_{\mathcal{T}}^2$, where $N$ is the number of tiles intersecting $B_x(2R_{\mathcal{T}})$, and the conclusion follows.

By virtue of the previous lemma, we say that a tiling $\mathcal{T}$ is locally finite if $R_{\mathcal{T}} < +\infty$ and $r_{\mathcal{T}} > 0$. The next result provides an estimate for the diameter of a simple curve in terms of the number of tiles of a locally finite tiling that the curve intersects. To simplify notation, it is convenient to identify curves with their images. Let $\gamma : [0, 1] \to \mathbb{R}^2$ be a simple curve. The identification induces an order on the image: let $y, y'$ in $\gamma$ with $y = \gamma(s)$ and $y' = \gamma(t)$. If $\gamma$ is open, then $y \le y'$ if and only if $0 \le s \le t \le 1$; if $\gamma$ is closed, then $y \le y'$ if and only if $0 \le s \le t < 1$.
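The area-comparison bound of Lemma 3.1 is easy to evaluate and to sanity-check on a concrete tiling. The sketch below is illustrative only: the function names are hypothetical, and the tiling by unit squares (inradius 1/2, half-diameter √2/2) is an assumed test case, not one used in the paper. It computes the constant of the lemma and counts, by brute force, how many unit squares meet a ball of radius 2R.

```python
import math

def tile_count_bound(r_T, R_T):
    """Constant of Lemma 3.1: integer part of 16 * R_T**2 / r_T**2."""
    if r_T <= 0 or R_T <= 0 or not math.isfinite(R_T):
        raise ValueError("need 0 < r_T and 0 < R_T < +infinity")
    return math.floor(16 * R_T ** 2 / r_T ** 2)

def unit_squares_meeting_ball(cx, cy, radius):
    """Count closed unit squares [i, i+1] x [j, j+1] meeting the closed ball."""
    count = 0
    for i in range(math.floor(cx - radius) - 1, math.floor(cx + radius) + 2):
        for j in range(math.floor(cy - radius) - 1, math.floor(cy + radius) + 2):
            # Nearest point of the square to the centre of the ball.
            nx = min(max(cx, i), i + 1)
            ny = min(max(cy, j), j + 1)
            if math.hypot(nx - cx, ny - cy) <= radius:
                count += 1
    return count

# Assumed example: the tiling of the plane by unit squares.
r_T, R_T = 0.5, math.sqrt(2) / 2
bound = tile_count_bound(r_T, R_T)
for centre in [(0.5, 0.5), (0.0, 0.0), (0.3, 0.9)]:
    hits = unit_squares_meeting_ball(centre[0], centre[1], 2 * R_T)
    print(centre, hits, "<=", bound)
```

As expected from a proof that only compares areas, the bound is far from sharp on this example; what matters for the sequel is that it is uniform in the centre of the ball.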
Fig. 1. Proof of Lemma 3.2: Construction of the sequence $x_1, \dots, x_\ell$ and $t_{j(1)}, \dots, t_{j(\ell)}$
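Lemma 3.2. Let $\mathcal{T}$ be a locally finite tiling and $\gamma$ be a simple curve. Then,
\[
\operatorname{diam}(\gamma) \le 2 R_{\mathcal{T}} \, L(\gamma, \mathcal{T}),
\]
where $L(\gamma, \mathcal{T})$ denotes the number of tiles that $\gamma$ intersects.

Proof. By compactness, there exist $x, x' \in \gamma$ with $x \le x'$ and $d(x, x') = \operatorname{diam}(\gamma)$. We construct a sequence of points $(x_i)_{i=0}^{\ell}$ as follows. Fix $\varepsilon > 0$ and set $x_0 = x$. To construct $x_1$, if $x'$ belongs to $B_{x_0}(2R_{\mathcal{T}} + \varepsilon)$, then set $\ell = 1$ and $x_1 = x'$. If $x'$ does not belong to $B_{x_0}(2R_{\mathcal{T}} + \varepsilon)$, then define $x_1$ by
\[
x_1 := \max\{ z \in \gamma \mid z \in B_{x_0}(2R_{\mathcal{T}} + \varepsilon) \}.
\]
It is clear that $d(x_1, x_0) = 2R_{\mathcal{T}} + \varepsilon$. This construction can be extended inductively to obtain a sequence $x = x_0 < x_1 < \cdots < x_{\ell-1} < x_\ell = x'$ in $\gamma$ such that for all $k \in \{1, \dots, \ell - 1\}$:
• $d(x_{k-1}, x_k) = 2R_{\mathcal{T}} + \varepsilon$,
• $\{ z \in \gamma \mid x_k \le z \le x_\ell \} \cap \bigcup_{m=0}^{k-1} B_{x_m}(2R_{\mathcal{T}} + \varepsilon) = \{x_k\}$,
and such that $d(x_{\ell-1}, x_\ell) \le 2R_{\mathcal{T}} + \varepsilon$ (see Fig. 1). For each $k \in \{0, \dots, \ell - 1\}$, choose a tile $t_{j(k)}$ in $\mathcal{T}$ that contains $x_k$. Since $d(x_k, x_{k'}) \ge 2R_{\mathcal{T}} + \varepsilon$ for all $k \neq k'$ in $\{0, \dots, \ell - 1\}$, it follows that $t_{j(k)}$ is not the same tile as $t_{j(k')}$ unless $k = k'$. Hence, the number $L(\gamma, \mathcal{T})$ of tiles in $\mathcal{T}$ that intersect $\gamma$ is at least $\ell$ (see Fig. 1). Moreover, it is easy to check that $\ell (2R_{\mathcal{T}} + \varepsilon) \ge d(x, x') = \operatorname{diam}(\gamma)$. Thus, using the estimate for $L(\gamma, \mathcal{T})$ we get
\[
\operatorname{diam}(\gamma) \le (2R_{\mathcal{T}} + \varepsilon) \, L(\gamma, \mathcal{T}),
\]
and the conclusion can be obtained by letting $\varepsilon$ go to 0.

Given a Jordan curve $\Gamma$ and two tilings $\mathcal{T}$ and $\mathcal{T}'$, the next lemma compares the number of tiles of $\mathcal{T}$ that $\Gamma$ intersects with the number of tiles of $\mathcal{T}'$ that $\Gamma$ intersects.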
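Lemma 3.3. Let $\Gamma$ be a Jordan curve and $\mathcal{T}$ be a locally finite tiling. Then, for every locally finite tiling $\mathcal{T}'$ such that $R_{\mathcal{T}'} > R_{\mathcal{T}}$ and $K_{\mathcal{T}'} < L(\Gamma, \mathcal{T}') \le L(\Gamma, \mathcal{T})$, we have:
\[
L(\Gamma, \mathcal{T}') \le (2K_{\mathcal{T}'} + 1) \, \frac{R_{\mathcal{T}}}{R_{\mathcal{T}'}} \, L(\Gamma, \mathcal{T}),
\]
where $K_{\mathcal{T}'}$ is the constant defined in Lemma 3.1.

Proof. For every point $y \in \mathbb{R}^2$, we define
\[
C_y := \{ t \in \mathcal{T}' \mid t \cap B_y(2R_{\mathcal{T}'}) \neq \emptyset \} \quad \text{and} \quad \hat{C}_y := \bigcup_{t \in C_y} t .
\]
We construct a collection $Y = \{y_i\}_{i=1}^{p}$ (with $p$ to be determined) of points in $\Gamma$ such that the collection $\{\hat{C}_{y_i}\}_{i=1}^{p}$ covers $\Gamma$. Fix any point of $\Gamma$ as $y_1$. Now suppose that $y_1, \dots, y_j$ have been chosen such that, for all $i \in \{1, \dots, j\}$ and $k \in \{1, \dots, i - 1\}$, the point $y_i$ does not belong to $\hat{C}_{y_k}$. There are two cases to consider. Either the sets $\{\hat{C}_{y_i}\}_{i=1}^{j}$ cover $\Gamma$, in which case we set $p = j$ and the construction is completed; or they do not cover $\Gamma$, in which case we choose any point of $\Gamma \setminus \bigcup_{i=1}^{j} \bigcup_{t \in C_{y_i}} t$ to be $y_{j+1}$ and continue iterating the construction. Observe that $L(\Gamma, \mathcal{T}')$ is finite because $\mathcal{T}'$ is locally finite and $\Gamma$ is compact, and in each iteration, we add at least one tile of $\mathcal{T}'$ that intersects $\Gamma$ to the area covered by $\{\hat{C}_{y_i}\}_{i=1}^{j}$. Hence, the construction stops in finitely many steps.

On one hand, by Lemma 3.1, each $C_{y_i}$ contains at most $K_{\mathcal{T}'}$ tiles of $\mathcal{T}'$. It follows that
\[
L(\Gamma, \mathcal{T}') \le p K_{\mathcal{T}'}, \tag{1}
\]
and since $K_{\mathcal{T}'} < L(\Gamma, \mathcal{T}')$, we deduce that $p \ge 2$. On the other hand, fix $i \in \{1, \dots, p\}$. Since $p \ge 2$, there exists $j \in \{1, \dots, p\}$ such that $y_j$, which belongs to $\Gamma$, does not belong to $B_{y_i}(R_{\mathcal{T}'} - R_{\mathcal{T}})$. Suppose that $y_i < y_j$ (the other case is analogous); then there is a point $y_i' > y_i$ in $\Gamma$ with $d(y_i, y_i') = R_{\mathcal{T}'} - R_{\mathcal{T}}$ such that the arc $[y_i, y_i']_\Gamma = \{ z \in \Gamma \mid y_i \le z \le y_i' \}$, which joins $y_i$ and $y_i'$, is contained in the ball $B_{y_i}(R_{\mathcal{T}'} - R_{\mathcal{T}})$. Since $i$ was arbitrary, using Lemma 3.2 yields
\[
R_{\mathcal{T}'} - R_{\mathcal{T}} \le \operatorname{diam}([y_i, y_i']_\Gamma) \le 2 R_{\mathcal{T}} \, L([y_i, y_i']_\Gamma, \mathcal{T})
\]
for all $i \in \{1, \dots, p\}$. Combining all these inequalities, we obtain
\[
p (R_{\mathcal{T}'} - R_{\mathcal{T}}) \le 2 R_{\mathcal{T}} \sum_{i=1}^{p} L([y_i, y_i']_\Gamma, \mathcal{T}). \tag{2}
\]
From the construction of $Y$, it is clear that $d(y_i, y_j) > 2R_{\mathcal{T}'}$ for all $i < j$, and, in particular, the collection $\mathcal{B} = \{ B_{y_i}(R_{\mathcal{T}'} - R_{\mathcal{T}}) \}_{i=1}^{p}$ is pairwise disjoint. Moreover, the distance between two different balls in $\mathcal{B}$ is greater than $2R_{\mathcal{T}}$. It follows that no tile of $\mathcal{T}$ may intersect more than one ball in $\mathcal{B}$. Hence, no tile of $\mathcal{T}$ may intersect more than one arc $[y_i, y_i']_\Gamma$. Thus, combining (2) and (1) we get
\[
L(\Gamma, \mathcal{T}') \le 2 K_{\mathcal{T}'} \, \frac{R_{\mathcal{T}}}{R_{\mathcal{T}'} - R_{\mathcal{T}}} \, L(\Gamma, \mathcal{T}). \tag{3}
\]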
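To finish the proof, fix $c > 0$ arbitrarily and consider the following two cases. First, suppose that $R_{\mathcal{T}'}/R_{\mathcal{T}} \le 1 + c$. Since $L(\Gamma, \mathcal{T}') \le L(\Gamma, \mathcal{T})$, it follows that
\[
L(\Gamma, \mathcal{T}') \le (1 + c) \, \frac{R_{\mathcal{T}}}{R_{\mathcal{T}'}} \, L(\Gamma, \mathcal{T}). \tag{4}
\]
Now suppose that $R_{\mathcal{T}'}/R_{\mathcal{T}} > 1 + c$. It is not difficult to check that $(1 + c)(R_{\mathcal{T}'} - R_{\mathcal{T}}) > c R_{\mathcal{T}'}$. Replacing this inequality in (3) we get
\[
L(\Gamma, \mathcal{T}') \le 2 K_{\mathcal{T}'} \, \frac{R_{\mathcal{T}}}{R_{\mathcal{T}'}} \left( 1 + \frac{1}{c} \right) L(\Gamma, \mathcal{T}). \tag{5}
\]
Combining (5) and (4) yields
\[
L(\Gamma, \mathcal{T}') \le \max\left\{ 1 + c, \; 2 K_{\mathcal{T}'} \left( 1 + \frac{1}{c} \right) \right\} \frac{R_{\mathcal{T}}}{R_{\mathcal{T}'}} \, L(\Gamma, \mathcal{T}).
\]
An easy computation shows that the last bound is optimal when $c = 2K_{\mathcal{T}'}$ and the conclusion follows.

3.2. Hierarchical sequences. In this subsection, we recall the concept of "hierarchical sequence" of a tiling. This notion has been used in the context of self-similar tilings satisfying the standard finite pattern condition by many authors, see for instance [KP00]. Here, we extend its use to admissible tilings.

Let $\mathcal{S}$ be a substitution rule with dilation factor $\lambda$. For each $l \in \mathbb{N}$, let $\lambda^l \mathcal{P} = \{ \lambda^l p \mid p \in \mathcal{P} \}$ and consider the map $D_l : \Omega_{\mathcal{E},\lambda^{l-1}\mathcal{P}} \to \Omega_{\mathcal{E},\lambda^{l}\mathcal{P}}$ defined by $D_l(\mathcal{T}) = \lambda \mathcal{T}$. Also consider the decomposition maps $S_l : \Omega_{\mathcal{E},\lambda^{l}\mathcal{P}} \to \Omega_{\mathcal{E},\lambda^{l-1}\mathcal{P}}$ defined by
\[
S_l(\mathcal{T}) = \{ g(\lambda^{l-1} S_i) \mid g \in \mathcal{E}, \; i \in \{1, \dots, n\}, \; g(\lambda^l p_i) \in \mathcal{T} \}
\]
for each $l \in \mathbb{N}^*$. By definition, $I_{\mathcal{S}} = S_1 \circ D_1$. It is not difficult to check that $I_{\mathcal{S}}$ is onto when restricted to $\Omega_{\mathcal{S}}$. This implies that for each admissible tiling $\mathcal{T}$, there is a sequence $(\mathcal{T}^l)_{l \in \mathbb{N}}$ of tilings, called a hierarchical sequence of $\mathcal{T}$, such that $\mathcal{T}^0 = \mathcal{T}$ and $\mathcal{T}^{l-1} = S_l(\mathcal{T}^l)$ for all $l \in \mathbb{N}^*$, that is, each tile of $\mathcal{T}^l$ can be decomposed into tiles of $\mathcal{T}^{l-1}$ according to the substitution rule $\lambda^{l-1}\mathcal{S}$. This sequence is constructed as follows: set $\bar{\mathcal{T}}^0 = \mathcal{T}$ and, for each $l > 0$, inductively choose $\bar{\mathcal{T}}^l \in I_{\mathcal{S}}^{-1}(\bar{\mathcal{T}}^{l-1})$ and then set $\mathcal{T}^l = \lambda^l \bar{\mathcal{T}}^l$. It is not difficult to check that $\lambda^{l-1} S_1(\lambda^{1-l} \mathcal{T}) = S_l(\mathcal{T})$ for every $\mathcal{T}$ in $\Omega_{\mathcal{E},\lambda^l\mathcal{P}}$. It follows that $\mathcal{T}^{l-1} = S_l(\mathcal{T}^l)$ for all $l \in \mathbb{N}^*$.

Remark 3.4. Notice that all tilings in $\Omega_{\mathcal{S}}$ are locally finite and that for each tiling $\mathcal{T}$ and each $l > 0$, $\mathcal{T}^l$ is also locally finite and satisfies:
\[
r_{\mathcal{T}^l} = \lambda^l r_{\mathcal{T}}, \qquad R_{\mathcal{T}^l} = \lambda^l R_{\mathcal{T}}, \quad \text{and thus} \quad K_{\mathcal{T}^l} = K_{\mathcal{T}} .
\]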
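Two small computations used in this part of the argument can be checked directly: the scale invariance of the constant of Lemma 3.1 stated in Remark 3.4, and the choice of the parameter c at the end of the proof of Lemma 3.3, where max{1 + c, 2K(1 + 1/c)} is minimized at c = 2K with value 2K + 1. The sketch below is an illustration under assumed sample values (the radii, the dilation factor 3/2 and the grid of values of c are arbitrary); exact rational arithmetic is used so that the integer parts are not affected by rounding.

```python
import math
from fractions import Fraction

def lemma_3_1_constant(r, R):
    """Integer part of 16 R^2 / r^2, computed exactly on rationals."""
    return math.floor(16 * R * R / (r * r))

def case_bound(c, K):
    """The factor max{1 + c, 2K(1 + 1/c)} from the proof of Lemma 3.3."""
    c = Fraction(c)
    return max(1 + c, 2 * K * (1 + 1 / c))

# Remark 3.4: rescaling r and R by lambda**l leaves the constant unchanged.
r, R, lam = Fraction(1, 3), Fraction(5, 4), Fraction(3, 2)
constants = {lemma_3_1_constant(r * lam ** l, R * lam ** l) for l in range(8)}
print(constants)

# Lemma 3.3: on a grid of values of c, the bound is never below 2K + 1,
# and the value 2K + 1 is attained at c = 2K.
K = 7
grid = [Fraction(k, 10) for k in range(1, 1000)]
print(min(case_bound(c, K) for c in grid) == case_bound(2 * K, K) == 2 * K + 1)
```

The first term of the maximum increases with c while the second decreases, which is why the optimum sits where the two coincide.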
Proposition 3.5. Let $\mathcal{T}$ be an admissible tiling for $\mathcal{S}$. For every topological closed disk $\Delta$ there is a finite collection $\Delta_0, \dots, \Delta_{m-1}$ of closed subsets of $\Delta$ such that:
(i) $\bigcup_{l=0}^{m-1} \Delta_l = \bigcup_{j \in J} t_j$ where $J = \{ j \mid t_j \subset \Delta \}$.
(ii) All $\Delta_l$'s have pairwise disjoint interiors.
(iii) For each $l \in \{0, \dots, m - 1\}$, $\Delta_l$ is a union of tiles in $\mathcal{T}^l$, which does not contain a tile in $\mathcal{T}^{l+1}$.
(iv) $\Delta$ does not contain a tile of $\mathcal{T}^m$.
The collection $\Delta_0, \dots, \Delta_{m-1}$ is called a hierarchical decomposition of the closed disk $\Delta$. Moreover, if $\Gamma$ is the Jordan curve bounding $\Delta$, then
\[
\lambda^{m-l-1} \le \frac{R_{\mathcal{T}}}{r_{\mathcal{T}}} \, L(\Gamma, \mathcal{T}^l) \quad \text{and} \quad N(\Delta_l, \mathcal{T}^l) \le \|M\|_1 \, L(\Gamma, \mathcal{T}^{l+1}) \tag{6}
\]
for all $l \in \{0, \dots, m - 1\}$, where $\|M\|_1$ is the maximum absolute column sum of the substitution matrix $M$.

Proof. Choose $m$ to be the smallest integer such that no tile in $\mathcal{T}^m$ is included in $\Delta$ and set
\[
P_{m-1} = \{ t \in \mathcal{T}^{m-1} \mid t \subseteq \Delta \} \quad \text{and} \quad \hat{\Delta}_{m-1} = \bigcup_{t \in P_{m-1}} t .
\]
Applying the recursion
\[
P_{l-1} = \{ t \in \mathcal{T}^{l-1} \mid t \subseteq \Delta \setminus \hat{\Delta}_l \} \quad \text{and} \quad \hat{\Delta}_{l-1} = \hat{\Delta}_l \cup \Big( \bigcup_{t \in P_{l-1}} t \Big),
\]
for $l \in \{1, \dots, m - 1\}$, we obtain a sequence of sets $P_0, \dots, P_{m-1}$ and $\hat{\Delta}_0, \dots, \hat{\Delta}_{m-1}$, and it is straightforward to check that the sequence $\Delta_l = \bigcup_{t \in P_l} t$ for all $l \in \{0, \dots, m - 1\}$ satisfies properties (i) to (iv) (Fig. 2).

To check the first inequality in (6), observe that, on one hand, Lemma 3.2 implies $\operatorname{diam}(\Delta) \le 2 R_{\mathcal{T}^l} L(\Gamma, \mathcal{T}^l)$. On the other hand, since $\Delta$ contains a tile of $\mathcal{T}^{m-1}$, it follows that $2 r_{\mathcal{T}^{m-1}} \le \operatorname{diam}(\Delta)$. The inequality is now obtained by replacing $R_{\mathcal{T}^l} = \lambda^l R_{\mathcal{T}}$ and $r_{\mathcal{T}^{m-1}} = \lambda^{m-1} r_{\mathcal{T}}$ in the previous inequalities.

It remains to check the second inequality in (6). From the construction of the $P_l$'s, it is clear that each tile in $P_l$ is contained in a tile of $\mathcal{T}^{l+1}$ that meets $\Gamma$ (which is the boundary of $\Delta$). Since each tile of $\mathcal{T}^{l+1}$ is subdivided in tiles of $\mathcal{T}^l$ according to the substitution rule, it follows that $N(\Delta_l, \mathcal{T}^l) \le \|M\|_1 L(\Gamma, \mathcal{T}^{l+1})$.

3.3. Perron-Frobenius theory. We now recall the basic Perron-Frobenius theory that we will need in the sequel. For proofs see [HJ90, Chap. 8]. Let $M = M_{\mathcal{S}}$ be the substitution matrix associated with $\mathcal{S}$. Since $\mathcal{S}$ is primitive, the matrix $M$ is a primitive matrix with non-negative integer coefficients. The Perron-Frobenius Theorem states that the largest real eigenvalue $\mu > 0$ of $M$, which is called the Perron eigenvalue, is simple and greater than one. Moreover, there exists a right eigenvector $v^T = (v_1, \dots, v_n)$ and a left eigenvector $w = (w_1, \dots, w_n)$ such that $v$ and $w$ have positive coefficients and $\langle v, w \rangle = 1$. We denote by $r(M)$ the modulus of the second largest eigenvalue of $M$, that is,
\[
r(M) = \max\{ |\eta| \;:\; \eta \neq \mu \text{ is an eigenvalue of } M \}.
\]
Recall that an eigenvalue is called semi-simple if its algebraic multiplicity is equal to its geometric multiplicity. The following proposition, or more precisely the corollary below, will be crucial for the proof of our main result. The first part is a well-known consequence of the Perron-Frobenius Theorem, see for instance [HJ90, Theorem 8.5.1].
The second part is less well-known, but can be easily deduced from the first part by using the Jordan Canonical Form of $M$ after recalling that the Jordan blocks associated with semi-simple eigenvalues are diagonal. For $l > 0$ and $i, j \in \{1, \dots, n\}$, $m^{l}_{i,j}$ denotes the $(i, j)$-element of the matrix $M^l$.

Proposition 3.6. For every $\rho > r(M)$ there exists $K = K(\rho) > 0$ such that
\[
|m^{l}_{i,j} - v_i \mu^l w_j| \le K \rho^l, \tag{7}
\]
for all $l > 0$. Moreover, if the eigenvalues of modulus $r(M)$ are semi-simple, then there exists $K > 0$ such that (7) holds with $\rho = r(M)$.

Corollary 3.7. Let $\nu_i = v_i / \sum_j v_j$ for all $i \in \{1, \dots, n\}$. Then, for all $\rho > r(M)$, there exists a constant $K > 0$ such that for all $i, j \in \{1, \dots, n\}$,
\[
\Big| m^{l}_{i,j} - \nu_i \sum_{k=1}^{n} m^{l}_{k,j} \Big| \le K \rho^l, \tag{8}
\]
for all $l > 0$. Moreover, if all the eigenvalues with modulus $r(M)$ are semi-simple, then there exists a constant $K > 0$ such that (8) holds with $\rho = r(M)$.

Proof. An easy computation shows
\[
\Big| m^{l}_{i,j} - \nu_i \sum_{k=1}^{n} m^{l}_{k,j} \Big| \le |m^{l}_{i,j} - v_i \mu^l w_j| + \frac{v_i}{\sum_{h} v_h} \sum_{k=1}^{n} |v_k \mu^l w_j - m^{l}_{k,j}| .
\]
The conclusion now follows from applying Proposition 3.6 twice.
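The decay stated in Corollary 3.7 can be observed numerically on a small example. The following sketch is illustrative only: the 2-by-2 matrix is an assumed example whose second eigenvalue is simple (so the semi-simple case with ρ = r(M) applies), the function names are hypothetical, and the frequencies are approximated by power iteration rather than by an exact eigenvector computation.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(m, l):
    """l-th power of m (exact when m has integer entries)."""
    n = len(m)
    result = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(l):
        result = matmul(result, m)
    return result

def frequencies(m, iterations=200):
    """Normalized right Perron eigenvector nu, approximated by power iteration."""
    n = len(m)
    v = [1.0 / n] * n
    for _ in range(iterations):
        w = [sum(m[i][k] * v[k] for k in range(n)) for i in range(n)]
        total = sum(w)
        v = [x / total for x in w]
    return v

def max_deviation(m, nu, l):
    """Largest value over i, j of |m^l_ij - nu_i * sum_k m^l_kj|, as in (8)."""
    p = matpow(m, l)
    n = len(m)
    return max(abs(p[i][j] - nu[i] * sum(p[k][j] for k in range(n)))
               for i in range(n) for j in range(n))

# Assumed example matrix; its eigenvalues are (3 + sqrt 5)/2 and (3 - sqrt 5)/2.
M = [[2, 1], [1, 1]]
nu = frequencies(M)
rho = (3 - 5 ** 0.5) / 2  # r(M) for this matrix
for l in range(1, 9):
    print(l, max_deviation(M, nu, l), rho ** l)
```

For this matrix the printed deviations shrink at the same geometric rate as ρ^l, in line with (8); for large l the floating-point error in ν times the growing column sums eventually dominates, so the comparison is only meaningful for moderate l.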
4. Proof of the Main Result

Let $\mathcal{P} = \{p_1, \dots, p_n\}$ be a finite set of tiles and $\mathcal{S}$ a primitive substitution rule with dilation factor $\lambda > 1$. Suppose that $\mathcal{T}$ is an admissible tiling for $\mathcal{S}$ and $\Gamma$ is a Jordan curve in $\mathbb{R}^2$ bounding a closed disk $\Delta$. The idea of the proof is as follows. Fix $i \in \{1, \dots, n\}$ and let $\nu_i$ be defined as in Corollary 3.7. First, it is not difficult to check that the number of $\mathcal{E}$-copies of $p_i$ obtained after applying the substitution $l$ times to $p_j$ is exactly $m^{l}_{i,j}$. It follows that the density of the tile $p_i$ is $\nu_i$. Second, we consider a hierarchical sequence $(\mathcal{T}^l = (t^l_j)_{j \ge 0})_{l \ge 0}$ of $\mathcal{T}$ and the hierarchical decomposition $\{\Delta_0, \dots, \Delta_{m-1}\}$ of $\Delta$ constructed in Proposition 3.5. We use the decomposition of $\Delta$ to estimate $N(\Delta, \mathcal{T}, p_i)$ and $N(\Delta, \mathcal{T})$. Since the sets $\Delta_l$ have disjoint interiors, we have
\[
N(\Delta, \mathcal{T}, p_i) = \sum_{l=0}^{m-1} N(\Delta_l, \mathcal{T}, p_i) = \sum_{l=0}^{m-1} \sum_{t^l_j \subseteq \Delta_l} N(t^l_j, \mathcal{T}, p_i) \tag{9}
\]
and
\[
N(\Delta, \mathcal{T}) = \sum_{l=0}^{m-1} \sum_{t^l_j \subseteq \Delta_l} N(t^l_j, \mathcal{T}). \tag{10}
\]
Fig. 2. Construction of the hierarchical sequence for a given $\Delta$ (panels (a)–(d)). The tiling in gray $\mathcal{T}^0$ is a Penrose tiling (see Sect. 5). The tiling in black is $\mathcal{T}^l$, the dark-gray region is $\Delta_l$ and the light-gray one is $\hat{\Delta}_{l+1}$. In this example, $m = 5$
Next, multiplying (10) by $\nu_i$ and subtracting the result from (9), we get
\[ N(\Delta, T, p_i) - \nu_i N(\Delta, T) = \sum_{l=0}^{m-1} \sum_{t^l_j \subseteq \Delta^l} \bigl( N(t^l_j, T, p_i) - \nu_i N(t^l_j, T) \bigr). \tag{11} \]
Suppose that $t^l_j$ is an $E$-copy of $\lambda^l p_k$ for some $k \in \{1, \dots, n\}$. Then $N(t^l_j, T, p_i) = m^l_{i,k}$ and $N(t^l_j, T) = \sum_{i'} m^l_{i',k}$. Therefore, applying Corollary 3.7 to (11), we get
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le \sum_{l=0}^{m-1} \sum_{t^l_j \subseteq \Delta^l} K \rho^l \le \sum_{l=0}^{m-1} N(\Delta^l, T^l) K \rho^l, \]
where $K > 0$, and either $\lambda > \rho > r(M)$ if $r(M) < \lambda$, or $\rho = \lambda$ if $r(M) = \lambda$ and the eigenvalues of modulus $r(M)$ are semi-simple. Applying Proposition 3.5 to the last inequality, we obtain:
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le K M_1 \sum_{l=0}^{m-1} L(\Gamma, T^{l+1}) \rho^l. \tag{12} \]
We want to apply Lemma 3.3 to give an upper bound of $L(\Gamma, T^l)$ in terms of $L(\Gamma, T)$ for all $l \in \{0, \dots, m\}$. Since for each $l \ge 0$, the tiles of $T^{l+1}$ are tiled by tiles of $T^l$, it follows that $L(\Gamma, T^{m-1}) \le \cdots \le L(\Gamma, T^1) \le L(\Gamma, T)$. It is easy to check that $K_{T^l} = K_T$ for all $l \ge 0$. Next, we define $l_0 \in \{0, \dots, m\}$ as follows. If $L(\Gamma, T^{m-1}) > K_T$, then $l_0 := m$. Otherwise, we define $l_0$ to be the minimal $l \in \{0, \dots, m-1\}$ such that $L(\Gamma, T^{l}) \le K_T$. Observe that in both cases $L(\Gamma, T^{l}) > K_T$ for all $l \in \{0, \dots, l_0 - 1\}$. We split the sum in (12) into two parts
\[ \sum_{l=0}^{l_0-1} L(\Gamma, T^l) \rho^l + \sum_{l=l_0}^{m-1} L(\Gamma, T^l) \rho^l, \tag{13} \]
and deal with each part separately. For the first sum, applying Lemma 3.3, we get
\[ L(\Gamma, T^l) \le (2K_{T^l} + 1) \frac{R_T}{R_{T^l}} L(\Gamma, T) \]
for all $l \in \{0, \dots, l_0 - 1\}$. Since $R_T / R_{T^l} = \lambda^{-l}$ for every $l \ge 0$, it follows that
\[ \sum_{l=0}^{l_0-1} L(\Gamma, T^l) \rho^l \le (2K_T + 1) L(\Gamma, T) \sum_{l=0}^{l_0-1} \Bigl(\frac{\rho}{\lambda}\Bigr)^l. \tag{14} \]
To estimate the second sum in (13), we suppose that $l_0 < m$ (otherwise, the sum is zero). Since $\rho \le \lambda$ and $L(\Gamma, T^l)$ is decreasing in $l$, we have
\[ \sum_{l=l_0}^{m-1} L(\Gamma, T^l) \rho^l \le L(\Gamma, T^{l_0}) \sum_{l=l_0}^{m-1} \lambda^l = L(\Gamma, T^{l_0}) \frac{\lambda^m - \lambda^{l_0}}{\lambda - 1}. \]
From Lemma 6 and the fact that $L(\Gamma, T^{l_0}) < K_T$, we get $\lambda^{m-1-l_0} \le K_T R_T / r_T$. Denote $N_T := (\lambda K_T R_T / r_T - 1)/(\lambda - 1)$. It follows that
\[ \sum_{l=l_0}^{m-1} L(\Gamma, T^l) \rho^l \le N_T \lambda^{l_0} L(\Gamma, T^{l_0}). \tag{15} \]
If $l_0 > 0$, then from Lemma 3.3 and (15), we get
\[ \sum_{l=l_0}^{m-1} L(\Gamma, T^l) \rho^l \le (2K_T + 1) N_T L(\Gamma, T). \tag{16} \]
It is clear that (16) also holds when $l_0 = 0$, since $(2K_T + 1) > 1$. Combining (14) and (16), we get
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le (2K_T + 1) L(\Gamma, T) \Bigl( \sum_{l=0}^{l_0-1} \Bigl(\frac{\rho}{\lambda}\Bigr)^l + N_T \Bigr). \tag{17} \]
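To see how the bracket in (17) behaves in the two regimes for $\rho$, the short Python sketch below evaluates the finite geometric sum $\sum_{l<l_0} (\rho/\lambda)^l$. It is an illustration only: the function name and the sample values of $\rho$, $\lambda$ and $l_0$ are assumptions. For $\rho < \lambda$ the sum stays below $\lambda/(\lambda - \rho)$ for every $l_0$, whereas for $\rho = \lambda$ it equals $l_0$ and therefore grows with the depth of the hierarchy.

```python
import math

def bracket_sum(rho, lam, l0):
    """Return sum_{l=0}^{l0-1} (rho/lam)**l, the geometric sum appearing in (17)."""
    return sum((rho / lam) ** l for l in range(l0))

lam = (1 + math.sqrt(5)) / 2   # sample dilation factor (the golden mean)
rho = 1.0                      # sample value with rho < lam

for l0 in (1, 5, 20, 80):
    # Case rho < lam: bounded by lam / (lam - rho), uniformly in l0.
    assert bracket_sum(rho, lam, l0) <= lam / (lam - rho) + 1e-12
    # Case rho = lam: every term equals 1, so the sum is exactly l0.
    assert bracket_sum(lam, lam, l0) == l0
```

The uniform bound corresponds to a linear estimate in $L(\Gamma, T)$, while the growth with $l_0$ in the second case is the source of the logarithmic factor.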
Finally, we deal with the different cases for $\rho$. In the first case, we have $r(M_S) < \rho < \lambda$ and it follows from (17) that
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le (2K_T + 1) \Bigl( \frac{\lambda}{\lambda - \rho} + N_T \Bigr) L(\Gamma, T), \]
which finishes part (i) of the main result, since $R_T$, $r_T$ and $K_T$ are constant on $\Omega_S$. In the second case, we have $\rho = \lambda$ and then from (17) and the estimation of $m \ge 0$ given in Proposition 3.5, we obtain
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le (2K_T + 1) L(\Gamma, T) \bigl( \log_\lambda L(\Gamma, T) + \tilde{N}_T \bigr), \]
where $\tilde{N}_T = N_T + 1 + \log_\lambda (R_T / r_T)$. Thus, there exists $K > 0$ such that
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le K L(\Gamma, T) \ln L(\Gamma, T), \]
as soon as $\Gamma$ meets at least 2 tiles. This completes the proof of part (ii) of the main result.

5. Examples

5.1. Penrose tilings. Penrose tilings are among the best-known examples of self-similar tilings, see for instance [AP98,GS87,Pen80,Sen95]. Here, we follow the construction of Anderson and Putnam [AP98]. Consider two isosceles triangles $t_1$ and $t_2$ in $\mathbb{R}^2$, where the vertices of $t_1$ have coordinates $(\sin(\pi/10), 0)$, $(-\sin(\pi/10), 0)$ and $(0, \cos(\pi/10))$ and those of $t_2$ have coordinates $(\sin(3\pi/10), 0)$, $(-\sin(3\pi/10), 0)$ and $(0, \cos(3\pi/10))$. Both triangles are equipped with decorations (arrows) on their edges as shown in Fig. 3. Let $E$ denote the group of translations in $\mathbb{R}^2$ and $P_{\mathrm{Penrose}}$ be the set of 40 triangles obtained by rotation of one of the two triangles $t_1$ or $t_2$ with its prescribed decoration by an angle $k\pi/10$, where $k$ is in $\{0, 1, \dots, 19\}$, and let $\Omega_{\mathrm{Penrose}}$ be the set of tilings of $\mathbb{R}^2$ made with translated copies of triangles in $P_{\mathrm{Penrose}}$ such that their tiles meet full-edge to full-edge and the decorations on overlapping edges coincide. Elements of $\Omega_{\mathrm{Penrose}}$ are called Penrose tilings. They can also be constructed by using the substitution $S_{\mathrm{Penrose}}$ described in Fig. 3. The dilation factor of $S_{\mathrm{Penrose}}$ is the golden mean $\lambda = (1 + \sqrt{5})/2$. The substitution matrix $M_{\mathrm{Penrose}}$ is a $40 \times 40$ nonnegative primitive matrix (see [AP98] for details). It is well-known that the Perron eigenvalue of $M_{\mathrm{Penrose}}$ is equal to $\lambda^2 = (3 + \sqrt{5})/2$, and that $\lambda$ and $-\lambda$ are the only eigenvalues of modulus $r(M_{\mathrm{Penrose}})$.
Fig. 3. Local and substitution rules to construct Penrose tilings
Fig. 4. Square table substitution
Straightforward computations show that both of them are semi-simple with multiplicity 3. Thus, we can apply Theorem 2.1 (part (ii)), and obtain: Each triangle $p_i$ in $P_{\mathrm{Penrose}}$ has a well-defined frequency $\nu_i > 0$, and there exists $K > 0$ such that, for every Jordan curve $\Gamma$ and every tiling $T$ in $\Omega_{\mathrm{Penrose}}$, we have:
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le K L(\Gamma, T) \ln(L(\Gamma, T)), \]
where $\Delta$ is the closed disk bounded by $\Gamma$.

Now, let $E$ be the group of all isometries (direct and indirect) of $\mathbb{R}^2$. Up to these isometries, there are now only 2 types of Penrose tiles (the fat and thin triangles). The associated substitution matrix reads
\[ \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \]
and $(3 + \sqrt{5})/2$ and $(3 - \sqrt{5})/2$ are its eigenvalues. Thus, we can apply Theorem 2.1 (part (i)) to obtain:
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le K L(\Gamma, T). \]

5.2. The square and table tilings. Square and table tilings were studied by Robinson [Rob99,Rob04]. First, we consider the square tilings. Let $P_{\mathrm{square}}$ be the set of four unit squares respectively decorated with symbols $p$, $q$, $r$ and $s$. The square substitution rule $S_{\mathrm{square}}$ is described in Fig. 4. The dilation factor is $\lambda = 2$. In this case, the substitution matrix $M_{\mathrm{square}}$ reads
\[ \begin{pmatrix} 2 & 1 & 0 & 1 \\ 1 & 2 & 1 & 0 \\ 0 & 1 & 2 & 1 \\ 1 & 0 & 1 & 2 \end{pmatrix}. \]
The eigenvalues of $M_{\mathrm{square}}$ are $\lambda^2$, $\lambda$ and 0. It is easy to check that $\lambda$ is semi-simple with multiplicity 2. Applying Theorem 2.1 (part (ii)) (extended to deal with the decorated case), it follows that each square $t \in P_{\mathrm{square}}$ has a well-defined frequency $\nu_t > 0$ and there exists $K > 0$ such that for any Jordan curve $\Gamma$ bounding a closed disk $\Delta$ and any tiling $T$ in $\Omega_{S_{\mathrm{square}}}$, we have:
\[ |N(\Delta, T, t) - \nu_t N(\Delta, T)| \le K L(\Gamma, T) \ln(L(\Gamma, T)). \]

Now we consider the table tilings. Let $P_{\mathrm{table}} = \{p_1, p_2\}$ where $p_1$ is a vertical domino and $p_2$ is a horizontal domino. The substitution rule $S_{\mathrm{table}}$ is described in Fig. 5. In this case, the dilation factor $\lambda$ is 2 and the associated substitution matrix reads
\[ \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix}. \]
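The eigenvalue statements for the three small substitution matrices written out above can be checked by hand; the following Python sketch does so by testing explicit eigenpairs. It is an illustration rather than part of the paper: the helper names are ours, and the eigenvectors listed were found by hand for this example.

```python
import math

def apply(m, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(row[k] * v[k] for k in range(len(v))) for row in m]

def is_eigenpair(m, v, mu, tol=1e-12):
    """True if m v = mu v up to the tolerance tol."""
    return all(abs(x - mu * y) <= tol for x, y in zip(apply(m, v), v))

s5 = math.sqrt(5)
penrose = [[2, 1], [1, 1]]   # Penrose tiles up to all isometries
square = [[2, 1, 0, 1], [1, 2, 1, 0], [0, 1, 2, 1], [1, 0, 1, 2]]
table = [[2, 2], [2, 2]]

checks = [
    (penrose, [1, (s5 - 1) / 2], (3 + s5) / 2),
    (penrose, [1, -(s5 + 1) / 2], (3 - s5) / 2),
    (square, [1, 1, 1, 1], 4),      # lambda^2 with lambda = 2
    (square, [1, 0, -1, 0], 2),     # lambda, first eigenvector
    (square, [0, 1, 0, -1], 2),     # lambda, second independent eigenvector
    (square, [1, -1, 1, -1], 0),
    (table, [1, 1], 4),
    (table, [1, -1], 0),
]
results = [is_eigenpair(m, v, mu) for m, v, mu in checks]
print(results)
```

For the square substitution, the two independent eigenvectors for the eigenvalue 2 account for its multiplicity 2 and show that it is semi-simple, which is the hypothesis needed for part (ii).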
Fig. 5. The table substitution
Fig. 6. Identification of table and square tilings
Fig. 7. The pinwheel substitution
The eigenvalues of the table substitution matrix are $\lambda^2$ and 0. Applying Theorem 2.1 (part (i)), it follows that each domino $p_i$ has a well-defined frequency $\nu_i > 0$, and there exists $K > 0$ such that, for any Jordan curve $\Gamma$ bounding a closed disk $\Delta$ and any tiling $T$ in $\Omega_{S_{\mathrm{table}}}$, we have:
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le K L(\Gamma, T). \]
The square and table tilings can be identified (see [Rob04] for details) by identifying tiles as described in Fig. 6. This identification can be used to count tiles in the different tilings. Consider for instance the unit square marked with a $p$. In every tiling in $\Omega_{S_{\mathrm{square}}}$, the squares marked with $p$ and $r$ always appear as in Fig. 6. Thus, for any Jordan curve $\Gamma$ bounding a closed disk $\Delta$, the difference $N(\Delta, T, p) - N(\Delta, T, p_1)$ is smaller than $L(\Gamma, T)$, where $p_1$ is the horizontal domino in $P_{\mathrm{table}}$. Hence, the estimates for the table substitution give better bounds than the estimates for the square substitution. It is natural to wonder why. The answer is that the square substitution system is less efficient than the table one, in the sense that the estimate of the error for the $n$-th substitution of the square $p$, which is a square of size $2^n$, yields an error term of size $\lambda^n = 2^n$. This error term is actually going to be canceled by terms corresponding to the other squares of size $2^n$. In our computation, the error term for a collection of squares of size $2^n$ is bounded from above by the sum of the error terms for each square of size $2^n$, and thus we cannot see these cancellations which actually occur.

5.3. The pinwheel tilings. Pinwheel tilings were introduced by Conway and Radin [Rad94]. Let $E$ be the group $I^+$ of direct isometries of $\mathbb{R}^2$, and $P_{\mathrm{pinwheel}}$ be the set of two right triangles which are isometric (but with inverse orientation). Each triangle has a hypotenuse of length 1 and sharp angle $\alpha$ satisfying $\tan \alpha = 1/2$. The substitution rule $S_{\mathrm{pinwheel}}$ is described in Fig. 7 and has a dilation factor of $\sqrt{5}$. The substitution matrix is the $2 \times 2$ matrix
\[ \begin{pmatrix} 2 & 3 \\ 3 & 2 \end{pmatrix}. \]
It has two eigenvalues, $\lambda^2 = 5$ and $-1$. Applying Theorem 2.1, it follows that each triangle $t_i$ in $P_{\mathrm{pinwheel}}$ has a well-defined frequency $\nu_i > 0$, and there exists $K > 0$ such that, for any Jordan curve $\Gamma$ bounding a closed disk $\Delta$ and any tiling $T$ in $\Omega_{S_{\mathrm{pinwheel}}}$:
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le K L(\Gamma, T). \]

5.4. Rauzy tilings. Rauzy tilings (see [Rau82] for details) are tilings made from a set $P_{\mathrm{rauzy}}$ of three (topological) disks $r_1$, $r_2$ and $r_3$, where $r_1$ is a disk whose boundary is a Jordan curve (with positive Hausdorff dimension), $r_2 = \lambda r_1$ and $r_2 = \lambda^2 r_3$, where $\lambda \approx 1.84$ is the unique real root of the polynomial $X^3 - X^2 - X - 1$ (see [Fog02]). The substitution matrix is the $3 \times 3$ matrix
\[ \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 1 \end{pmatrix}. \]
The largest eigenvalue of the substitution matrix is $\lambda^2$. All other eigenvalues have modulus smaller than 1. Applying Theorem 2.1, it follows that each Rauzy tile $p_i$ has a well-defined frequency $\nu_i$ and there exists $K > 0$ such that for every Jordan curve $\Gamma$ bounding a closed disk $\Delta$ and every tiling $T$ in $\Omega_S$, we have:
\[ |N(\Delta, T, p_i) - \nu_i N(\Delta, T)| \le K L(\Gamma, T). \]

Acknowledgements. This work is part of the project CrystalDyn supported by the "Agence Nationale de la Recherche" (ANR-06-BLAN-0070-01). Part of this work was done while J. Aliste-Prieto was a Junior Research Fellow at the Erwin Schrödinger Institute in Vienna. J. Aliste-Prieto also acknowledges funding from Fondecyt Postdoctoral Grant 3100097. D. Coronel is funded by Fondecyt Postdoctoral Grant 3100092 and PBCT-Conicyt Research Project ADI-17. Finally, the authors would like to thank Andrew Hart for his help in correcting the English in this article.
References
[Ada04] Adamczewski, B.: Symbolic discrepancy and self-similar dynamics (discrépance symbolique et dynamiques auto-similaires). Ann. de l'Inst. Fourier 54(7), 2201–2234 (2004)
[AP98] Anderson, J., Putnam, I.: Topological invariants for substitution tilings and their associated C*-algebras. Erg. Th. Dyn. Syst. 18(3), 509–537 (1998)
[APC] Aliste-Prieto, J., Coronel, D.: Tower systems for linearly repetitive Delone sets. Erg. Th. Dyn. Syst. doi:10.1017/S0143385710000507, Nov. 2010
[Bel03] Bellissard, J.: The noncommutative geometry of aperiodic solids. In: Geometric and topological methods for quantum field theory (Villa de Leyva, 2001), pages 86–156. River Edge, NJ: World Sci. Publ., 2003
[BSYL09] Bindi, L., Steinhardt, P.J., Yao, N., Lu, P.J.: Natural quasicrystals. Science 324(5932), 1306 (2009)
[CS06] Clark, A., Sadun, L.: When shape matters: deformations of tiling spaces. Erg. Th. Dyn. Syst. 26(1), 69–86 (2006)
[Fog02] Pytheas Fogg, N.: Substitutions in dynamics, arithmetics and combinatorics. Edited by V. Berthé, S. Ferenczi, C. Mauduit and A. Siegel. Volume 1794 of Lecture Notes in Mathematics. Berlin: Springer-Verlag, 2002
[GS87] Grünbaum, B., Shephard, G.C.: Tilings and patterns. New York: W. H. Freeman and Company, 1987
[HJ90] Horn, R.A., Johnson, Ch.R.: Matrix analysis. Cambridge: Cambridge University Press, 1990
[IJ90] Iooss, G., Joseph, D.: Elementary stability and bifurcation theory. 2nd ed. Undergraduate Texts in Mathematics. New York: Springer-Verlag, 1990
[KP00] Kellendonk, J., Putnam, I.F.: Tilings, C*-algebras, and K-theory. In: Directions in mathematical quasicrystals, Volume 13, CRM Monogr. Ser., Providence, RI: Amer. Math. Soc., 2000, pp. 177–206
[LP03] Lagarias, J.C., Pleasants, P.A.B.: Repetitive Delone sets and quasicrystals. Erg. Th. Dyn. Syst. 23(3), 831–867 (2003)
[LMS02] Lee, J.-Y., Moody, R.V., Solomyak, B.: Pure point dynamical and diffraction spectra. Ann. H. Poincaré 3(5), 1003–1018 (2002)
[Pen80] Penrose, R.: Pentaplexity: a class of nonperiodic tilings of the plane. Math. Intelligencer 2(1), 32–37 (1979/80)
[Pey86] Peyrière, J.: Frequency of patterns in certain graphs and in Penrose tilings. At: Intl workshop on aperiodic Les Houches, 1986. J. Physique 47(7, Suppl. Colloq. C3), 41–61 (1986)
[PFS09] Priebe Frank, N., Sadun, L.: Topology of some tiling spaces without finite local complexity. Disc. Con. Dyn. Syst. 23(3), 847–865 (2009)
[Rad94] Radin, C.: The pinwheel tilings of the plane. Ann. of Math. (2) 139(3), 661–702 (1994)
[Rau82] Rauzy, G.: Nombres algébriques et substitutions. Bull. Soc. Math. France 110(2), 147–178 (1982)
[Rob99] Robinson, E.A., Jr.: On the table and the chair. Indag. Math. 10(4), 581–599 (1999)
[Rob04] Robinson, E.A., Jr.: Symbolic dynamics and tilings of R^d. In: Symbolic dynamics and its applications, Volume 60, Proc. Sympos. Appl. Math., Providence, RI: Amer. Math. Soc., 2004, pp. 81–119
[SBGC84] Shechtman, D., Blech, I., Gratias, D., Cahn, J.W.: Metallic phase with long range orientational order and no translational symmetry. Phys. Rev. Lett. 53(20), 1951–1954 (1984)
[Sen95] Senechal, M.: Quasicrystals and geometry. Cambridge: Cambridge University Press, 1995
[Sen81] Seneta, E.: Nonnegative matrices and Markov chains. 2nd ed., Springer Series in Statistics, New York: Springer-Verlag, 1981
[Sol97] Solomyak, B.: Dynamics of self-similar tilings. Erg. Th. Dyn. Syst. 17(3), 695–738 (1997)

Communicated by G. Gallavotti
Commun. Math. Phys. 306, 381–417 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1284-z
Communications in
Mathematical Physics
Chiral Equivariant Cohomology of a Point: A First Look Andrew R. Linshaw Fachbereich Mathematik, Technische Universität Darmstadt, 64289 Darmstadt, Germany. E-mail:
[email protected] Received: 25 July 2010 / Accepted: 28 December 2010 Published online: 24 June 2011 – © Springer-Verlag 2011
Abstract: The chiral equivariant cohomology contains and generalizes the classical equivariant cohomology of a manifold M with an action of a compact Lie group G. For any simple G, there exist compact manifolds with the same classical equivariant cohomology, which can be distinguished by this invariant. When M is a point, this cohomology is an interesting conformal vertex algebra whose structure is still mysterious. In this paper, we scratch the surface of this object in the case G = SU(2).

Contents
1. Introduction
2. Vertex Algebras
3. Recollections on Chiral Equivariant Cohomology
4. A Peculiar Topological Structure on W(g)
5. Graded and Filtered Structures
6. The Basic Complex and the Vertex Algebra Commutant Problem
7. The Case G = SU(2)
8. The Structure of H*(W_bas, K(0))
9. The Structure of H*_{SU(2)}(C)
10. Appendix
1. Introduction

Let $G$ be a compact, connected Lie group with complexified Lie algebra $\mathfrak{g}$, and let $M$ be a topological space on which $G$ acts by homeomorphisms. The equivariant cohomology $H^*_G(M)$ is defined to be $H^*((M \times E)/G)$, where $E$ is any contractible space on which $G$ acts freely. This is known as the Borel construction. If $M$ is a smooth manifold, and the action of $G$ on $M$ is by diffeomorphisms, there is a de Rham model of $H^*_G(M)$ due to H. Cartan, and developed further by Duflo-Kumar-Vergne [DKV] and Guillemin-Sternberg [GS]. In fact, one can define the equivariant cohomology $H^*_G(A)$ of any $G^*$-algebra $A$. A $G^*$-algebra is a commutative superalgebra equipped with an action of $G$, together with a compatible action of a certain differential Lie superalgebra $(\mathfrak{sg}, d)$ associated to the Lie algebra $\mathfrak{g}$ of $G$. The de Rham model of $H^*_G(M)$ is obtained by taking $A$ to be the algebra $\Omega(M)$ of differential forms on $M$, and $H^*_G(\Omega(M)) \cong H^*_G(M)$ by an equivariant version of the de Rham theorem.

In [LLI], the chiral equivariant cohomology $\mathbf{H}^*_G(\mathcal{A})$ of an $\mathcal{O}(\mathfrak{sg})$-algebra $\mathcal{A}$ was introduced as a vertex algebra analogue of the equivariant cohomology of $G^*$-algebras. Examples of $\mathcal{O}(\mathfrak{sg})$-algebras include the semi-infinite Weil complex $\mathcal{W}(\mathfrak{g})$, which was introduced by Feigin-Frenkel in [FF], and the chiral de Rham complex $\mathcal{Q}(M)$ of a smooth $G$-manifold $M$, which was introduced by Malikov-Schechtman-Vaintrob in [MSV]. In [LLSI], the chiral equivariant cohomology functor was extended to the larger categories of $\mathfrak{sg}[t]$-algebras and $\mathfrak{sg}[t]$-modules. The main example of an $\mathfrak{sg}[t]$-algebra which is not an $\mathcal{O}(\mathfrak{sg})$-algebra is the subalgebra $\mathcal{Q}'(M) \subset \mathcal{Q}(M)$ generated by the weight zero subspace. $\mathbf{H}^*_G(\mathcal{Q}(M))$ and $\mathbf{H}^*_G(\mathcal{Q}'(M))$ are both "chiralizations" of $H^*_G(M)$, that is, vertex algebras equipped with weight gradings
\[ \mathbf{H}^*_G(\mathcal{Q}(M)) = \bigoplus_{m \ge 0} \mathbf{H}^*_G(\mathcal{Q}(M))[m], \qquad \mathbf{H}^*_G(\mathcal{Q}'(M)) = \bigoplus_{m \ge 0} \mathbf{H}^*_G(\mathcal{Q}'(M))[m], \]
such that $\mathbf{H}^*_G(\mathcal{Q}(M))[0] \cong H^*_G(M) \cong \mathbf{H}^*_G(\mathcal{Q}'(M))[0]$. If $G$ acts on $M$ effectively, $\mathbf{H}^*_G(\mathcal{Q}(M))$ is purely classical, and $\mathbf{H}^*_G(\mathcal{Q}(M))[m]$ vanishes for $m > 0$ (see Theorem 1.5 of [LLSII]). The functor $\mathbf{H}^*_G(\mathcal{Q}'(-))$ is more interesting, and is a stronger invariant than $H^*_G(-)$ on the category of compact $G$-manifolds. For any simple $G$, there exists a sphere $S$ with infinitely many smooth actions of $G$ with the same classical equivariant cohomology, which can all be distinguished by $\mathbf{H}^*_G(\mathcal{Q}'(S))$ (see Theorem 1.7 of [LLSII]).

In the case where $M$ is a point $pt$, $\mathcal{Q}(pt) = \mathcal{Q}'(pt) = \mathbb{C}$. We will refer to $\mathbf{H}^*_G(\mathbb{C})$ as the chiral point algebra; it plays the role of $H^*_G(pt) \cong \mathrm{Sym}(\mathfrak{g}^*)^G$ in the classical theory, and $\mathbf{H}^*_G(\mathbb{C})[0] \cong H^*_G(pt)$. For any $\mathfrak{sg}[t]$-module $\mathcal{A}$, there is a chiral Chern-Weil map
\[ \kappa_G : \mathbf{H}^*_G(\mathbb{C}) \to \mathbf{H}^*_G(\mathcal{A}). \tag{1} \]
When $\mathcal{A} = \mathcal{Q}'(M)$ for some $G$-manifold $M$, this extends the classical Chern-Weil map at weight zero. We regard the elements of $\mathbf{H}^*_G(\mathbb{C})$ as universal characteristic classes, and an important problem in this theory is to describe $\mathbf{H}^*_G(\mathbb{C})$ for any $G$. If $G$ is abelian, $\mathbf{H}^*_G(\mathbb{C})$ is the abelian vertex algebra generated by $H^*_G(pt)$, but for nonabelian $G$, $\mathbf{H}^*_G(\mathbb{C})$ possesses a rich algebraic structure. For semisimple $G$, $\mathbf{H}^*_G(\mathbb{C})$ has a conformal structure of central charge zero, and the Virasoro class $\mathbf{L}$ has no classical analogue in $H^*_G(pt)$.

In this paper, we focus on the simplest nontrivial case $G = SU(2)$. Our main result is the construction of an injective linear map $\Phi : U(\mathfrak{sl}_2) \to \mathbf{H}^*_{SU(2)}(\mathbb{C})$ whose image consists entirely of nonclassical elements. Let $x, y, h$ be the usual root basis for $\mathfrak{sl}_2$, satisfying $[x, y] = h$, $[h, x] = 2x$, and $[h, y] = -2y$. Given a monomial $\mu = x^r y^s h^t \in U(\mathfrak{sl}_2)$, $\Phi(\mu)$ is homogeneous of degree $4r - 4s$ and conformal weight $2s + t + 2$. In particular, $\Phi$ maps the eigenspace of $[h, -]$ of eigenvalue $d$ into $\mathbf{H}^{2d}_{SU(2)}(\mathbb{C})$. Moreover, the Virasoro element $\mathbf{L}$, which has degree zero and weight two,
is precisely $\Phi(1)$. We conjecture that the image of $\Phi$, together with the classical generator of $H^*_{SU(2)}(pt)$, forms a strong generating set for $\mathbf{H}^*_{SU(2)}(\mathbb{C})$. (In this terminology, a vertex algebra $\mathcal{A}$ is said to be strongly generated by a set $S$ if $\mathcal{A}$ is spanned by the set of iterated Wick products of elements of $S$ and their derivatives.) For the sake of illustration, we write down explicit representatives for a few of the classes given by $\Phi$, and we compute some relations among them.

The main technical difficulty in studying the chiral point algebra is that the complex which computes this cohomology is a commutant subalgebra of a $bc\beta\gamma$ system, and commutant vertex algebras of this kind are not well understood. Much of this paper is devoted to developing techniques for describing such commutant algebras. Our main tool is the notion of a good increasing filtration on a vertex algebra, which was introduced in [LiII] and used in [LL] to study the commutant problem. The associated graded object of such a vertex algebra is always a supercommutative ring with a derivation. By passing to the associated graded object, one can apply techniques from commutative algebra and algebraic geometry, in particular the theory of jet schemes.

An interesting problem which we do not address in this paper is to give an alternative, more geometric construction of the chiral equivariant cohomology, which is suitable for any topological $G$-space $M$, and gives the same cohomology as our theory when $M$ is a manifold. In other words, we would like to find a chiral analogue of the Borel model for equivariant cohomology. Such a construction would necessarily include a topological realization of the chiral point algebra. It is our hope that the structure we describe in this paper in the case $G = SU(2)$ may give some hint about where to look for such a construction.

2. Vertex Algebras

In this section, we define vertex algebras, which have been discussed from various different points of view in the literature (see for example [B,FLM,K,FBZ]). We will follow the formalism developed in [LZI] and partly in [LiI]. Let $V = V_0 \oplus V_1$ be a super vector space over $\mathbb{C}$, and let $z, w$ be formal variables. By $QO(V)$, we mean the space of all linear maps
\[ V \to V((z)) := \Bigl\{ \sum_{n \in \mathbb{Z}} v(n) z^{-n-1} \;\Big|\; v(n) \in V,\ v(n) = 0 \text{ for } n \gg 0 \Bigr\}. \]
Each element $a \in QO(V)$ can be uniquely represented as a power series
\[ a = a(z) := \sum_{n \in \mathbb{Z}} a(n) z^{-n-1} \in (\mathrm{End}\, V)[[z, z^{-1}]]. \]
We refer to $a(n)$ as the $n$-th Fourier mode of $a(z)$. Each $a \in QO(V)$ is assumed to be of the shape $a = a_0 + a_1$, where $a_i : V_j \to V_{i+j}((z))$ for $i, j \in \mathbb{Z}/2\mathbb{Z}$, and we write $|a_i| = i$. On $QO(V)$ there is a set of nonassociative bilinear operations $\circ_n$, indexed by $n \in \mathbb{Z}$, which we call the $n$-th circle products. For homogeneous $a, b \in QO(V)$, they are defined by
\[ a(w) \circ_n b(w) = \mathrm{Res}_z\, a(z) b(w)\, \iota_{|z|>|w|} (z - w)^n - (-1)^{|a||b|}\, \mathrm{Res}_z\, b(w) a(z)\, \iota_{|w|>|z|} (z - w)^n. \]
Here $\iota_{|z|>|w|} f(z, w) \in \mathbb{C}[[z, z^{-1}, w, w^{-1}]]$ denotes the power series expansion of a rational function $f$ in the region $|z| > |w|$. We usually omit the symbol $\iota_{|z|>|w|}$ and just write $(z - w)^{-1}$ to mean the expansion in the region $|z| > |w|$, and write $-(w - z)^{-1}$ to mean the expansion in $|w| > |z|$. It is easy to check that $a(w) \circ_n b(w)$ above is a well-defined element of $QO(V)$.

The nonnegative circle products are connected through the operator product expansion (OPE) formula. For $a, b \in QO(V)$, we have
\[ a(z) b(w) = \sum_{n \ge 0} a(w) \circ_n b(w)\, (z - w)^{-n-1} + {:}a(z) b(w){:}, \tag{2} \]
which is often written as $a(z) b(w) \sim \sum_{n \ge 0} a(w) \circ_n b(w)\, (z - w)^{-n-1}$, where $\sim$ means equal modulo the term
\[ {:}a(z) b(w){:} \;=\; a(z)_- b(w) + (-1)^{|a||b|} b(w) a(z)_+ . \]
Here $a(z)_- = \sum_{n < 0} a(n) z^{-n-1}$ and $a(z)_+ = \sum_{n \ge 0} a(n) z^{-n-1}$. Note that ${:}a(w) b(w){:}$ is a well-defined element of $QO(V)$. It is called the Wick product of $a$ and $b$, and it coincides with $a \circ_{-1} b$. The other negative circle products are related to this by
\[ n!\; a(z) \circ_{-n-1} b(z) = {:}(\partial^n a(z))\, b(z){:}, \]
where $\partial$ denotes the formal differentiation operator $\frac{d}{dz}$. For $a_1(z), \dots, a_k(z) \in QO(V)$, the $k$-fold iterated Wick product is defined to be
\[ {:}a_1(z) a_2(z) \cdots a_k(z){:} \;=\; {:}a_1(z) b(z){:}, \]
where $b(z) = {:}a_2(z) \cdots a_k(z){:}$. We often omit the formal variable $z$ when no confusion will arise.

The set $QO(V)$ is a nonassociative algebra with the operations $\circ_n$ and a unit 1. We have $1 \circ_n a = \delta_{n,-1} a$ for all $n$, and $a \circ_n 1 = \delta_{n,-1} a$ for $n \ge -1$. A linear subspace $\mathcal{A} \subset QO(V)$ containing 1 which is closed under the circle products will be called a quantum operator algebra (QOA). In particular $\mathcal{A}$ is closed under $\partial$ since $\partial a = a \circ_{-2} 1$. Many formal algebraic notions are immediately clear: a homomorphism is just a linear map that sends 1 to 1 and preserves all circle products; a module over $\mathcal{A}$ is a vector space $M$ equipped with a homomorphism $\mathcal{A} \to QO(M)$, etc. A subset $S = \{a_i \mid i \in I\}$ of $\mathcal{A}$ is said to generate $\mathcal{A}$ if any element $a \in \mathcal{A}$ can be written as a linear combination of nonassociative words in the letters $a_i$, $\circ_n$, for $i \in I$ and $n \in \mathbb{Z}$. We say that $S$ strongly generates $\mathcal{A}$ if any $a \in \mathcal{A}$ can be written as a linear combination of words in the letters $a_i$, $\circ_n$ for $n < 0$. Equivalently, $\mathcal{A}$ is spanned by the collection $\{ {:}\partial^{k_1} a_{i_1}(z) \cdots \partial^{k_m} a_{i_m}(z){:} \mid i_1, \dots, i_m \in I,\ k_1, \dots, k_m \ge 0 \}$.

We say that $a, b \in QO(V)$ quantum commute if $(z - w)^N [a(z), b(w)] = 0$ for some $N \ge 0$. Here $[\,,]$ denotes the super bracket. This condition implies that $a \circ_n b = 0$ for $n \ge N$, so (2) becomes a finite sum. If $N$ can be chosen to be 0, we say that $a, b$ commute. A commutative quantum operator algebra (CQOA) is a QOA whose elements pairwise quantum commute. Finally, the notion of a CQOA is equivalent to the notion of a vertex algebra. Every CQOA $\mathcal{A}$ is itself a faithful $\mathcal{A}$-module, called the left regular module. Define
\[ \rho : \mathcal{A} \to QO(\mathcal{A}), \qquad a \mapsto \hat{a}, \qquad \hat{a}(\zeta) b = \sum_{n \in \mathbb{Z}} (a \circ_n b)\, \zeta^{-n-1}. \]
Then $\rho$ is an injective QOA homomorphism, and the quadruple of structures $(\mathcal{A}, \rho, 1, \partial)$ is a vertex algebra in the sense of [FLM]. Conversely, if $(V, Y, \mathbf{1}, D)$ is a vertex algebra, the collection $Y(V) \subset QO(V)$ is a CQOA. We will refer to a CQOA simply as a vertex algebra throughout the rest of this paper.

3. Recollections on Chiral Equivariant Cohomology

We briefly recall the construction of chiral equivariant cohomology, following the notation in [LLI,LLSI,LLSII]. A differential vertex algebra (DVA) is a degree graded vertex algebra $\mathcal{A}^* = \bigoplus_{p \in \mathbb{Z}} \mathcal{A}^p$ equipped with a vertex algebra derivation $d$ of degree 1 such that $d^2 = 0$. A DVA will be called degree-weight graded if it has an additional $\mathbb{Z}_{\ge 0}$-grading by weight, which is compatible with the degree in the sense that $\mathcal{A}^p = \bigoplus_{n \ge 0} \mathcal{A}^p[n]$. There is an auxiliary structure on a DVA which is analogous to the structure of a $G^*$-algebra in [GS]. Associated to $\mathfrak{g}$ is a Lie superalgebra $\mathfrak{sg}$ defined to be the semidirect product $\mathfrak{g} \ltimes \mathfrak{g}_{-1}$, where $\mathfrak{g}_{-1}$ is a copy of the adjoint module in degree $-1$. The bracket in $\mathfrak{sg}$ is given by
\[ [(\xi, \eta), (x, y)] = ([\xi, x], [\xi, y] - [x, \eta]), \]
and $\mathfrak{sg}$ is equipped with a differential $d : (\xi, \eta) \mapsto (\eta, 0)$. This differential extends to the loop algebra $\mathfrak{sg}[t, t^{-1}]$, and gives rise to a vertex algebra derivation on the corresponding current algebra $\mathcal{O}(\mathfrak{sg}) := \mathcal{O}(\mathfrak{sg}, 0)$. Here 0 denotes the zero bilinear form on $\mathfrak{sg}$. An $\mathcal{O}(\mathfrak{sg})$-algebra is a degree-weight graded DVA $\mathcal{A}$ equipped with a DVA homomorphism $\rho : \mathcal{O}(\mathfrak{sg}) \to \mathcal{A}$, which we denote by $(\xi, \eta) \mapsto L_\xi + \iota_\eta$.

In [LLI], the chiral equivariant cohomology functor was defined on the category of $\mathcal{O}(\mathfrak{sg})$-algebras, and in [LLSI] this functor was extended to a larger class of spaces which carry only a representation of the Lie subalgebra $\mathfrak{sg}[t]$ of $\mathfrak{sg}[t, t^{-1}]$. An $\mathfrak{sg}[t]$-module is a degree-weight graded complex $(\mathcal{A}, d_{\mathcal{A}})$ equipped with a Lie algebra homomorphism $\rho : \mathfrak{sg}[t] \to \mathrm{End}\, \mathcal{A}$, which we denote by
\[ (\xi, \eta) t^n \mapsto L_\xi(n) + \iota_\eta(n), \qquad n \ge 0. \]
We also require that for all $x \in \mathfrak{sg}[t]$ we have $\rho(dx) = [d_{\mathcal{A}}, \rho(x)]$, that $\rho(x)$ has degree 0 whenever $x$ is even in $\mathfrak{sg}[t]$ and degree $-1$ whenever $x$ is odd, and that $\rho(x)$ has weight $-n$ if $x \in \mathfrak{sg}\, t^n$. Finally, we require $\mathcal{A}$ to admit a compatible action $\hat{\rho} : G \to \mathrm{Aut}(\mathcal{A})$ of $G$ satisfying:
\[ \frac{d}{dt} \hat{\rho}(\exp(t\xi)) \Big|_{t=0} = L_\xi(0), \tag{3} \]
\[ \hat{\rho}(g) L_\xi(n) \hat{\rho}(g^{-1}) = L_{\mathrm{Ad}(g)(\xi)}(n), \tag{4} \]
\[ \hat{\rho}(g) \iota_\xi(n) \hat{\rho}(g^{-1}) = \iota_{\mathrm{Ad}(g)(\xi)}(n), \tag{5} \]
\[ \hat{\rho}(g)\, d\, \hat{\rho}(g^{-1}) = d, \tag{6} \]
for all $\xi \in \mathfrak{g}$, $g \in G$, and $n \ge 0$. These conditions are analogous to Eqs. (2.23)–(2.26) of [GS]. In order for (3) to make sense, we must be able to differentiate along appropriate curves in $\mathcal{A}$, which is the case in our main examples. Given an $\mathfrak{sg}[t]$-module $(\mathcal{A}, d)$, we define the chiral horizontal, invariant and basic subspaces of $\mathcal{A}$ to be respectively
\[ \mathcal{A}_{hor} = \{ a \in \mathcal{A} \mid \rho(x) a = 0,\ \forall x \in \mathfrak{g}_{-1}[t] \}, \]
\[ \mathcal{A}_{inv} = \{ a \in \mathcal{A} \mid \rho(x) a = 0,\ \forall x \in \mathfrak{g}[t],\ \hat{\rho}(g)(a) = a,\ \forall g \in G \}, \]
\[ \mathcal{A}_{bas} = \mathcal{A}_{hor} \cap \mathcal{A}_{inv}. \]
Both $\mathcal{A}_{inv}$ and $\mathcal{A}_{bas}$ are subcomplexes of $\mathcal{A}$, but $\mathcal{A}_{hor}$ is not a subcomplex of $\mathcal{A}$ in general.
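An $\mathcal{O}(\mathfrak{sg})$-algebra which plays an important role in our theory is the semi-infinite Weil complex $\mathcal{W}(\mathfrak{g})$, which is just the $bc\beta\gamma$ system $\mathcal{E}(\mathfrak{g}) \otimes \mathcal{S}(\mathfrak{g})$. The $bc$ and $\beta\gamma$ systems were introduced by Friedan-Martinec-Shenker in [FMS]; in this notation, the $bc$ system $\mathcal{E}(\mathfrak{g})$ is the vertex algebra with odd generators $b^{\xi}$, $c^{\xi'}$, which are linear in $\xi \in \mathfrak{g}$ and $\xi' \in \mathfrak{g}^*$, and satisfy the OPE relations
\[ b^{\xi}(z) c^{\xi'}(w) \sim \langle \xi', \xi \rangle (z - w)^{-1}, \qquad c^{\xi'}(z) b^{\xi}(w) \sim \langle \xi', \xi \rangle (z - w)^{-1}, \]
\[ b^{\xi}(z) b^{\eta}(w) \sim 0, \qquad c^{\xi'}(z) c^{\eta'}(w) \sim 0. \]
Here $\langle\, , \rangle$ denotes the natural pairing between $\mathfrak{g}^*$ and $\mathfrak{g}$. Similarly, the $\beta\gamma$ system $\mathcal{S}(\mathfrak{g})$ is the vertex algebra with even generators $\beta^{\xi}$, $\gamma^{\xi'}$, which are linear in $\xi \in \mathfrak{g}$ and $\xi' \in \mathfrak{g}^*$, and satisfy
\[ \beta^{\xi}(z) \gamma^{\xi'}(w) \sim \langle \xi', \xi \rangle (z - w)^{-1}, \qquad \gamma^{\xi'}(z) \beta^{\xi}(w) \sim -\langle \xi', \xi \rangle (z - w)^{-1}, \]
\[ \beta^{\xi}(z) \beta^{\eta}(w) \sim 0, \qquad \gamma^{\xi'}(z) \gamma^{\eta'}(w) \sim 0. \]
$\mathcal{W}(\mathfrak{g})$ possesses a Virasoro element $\omega_{\mathcal{W}}$ of central charge zero, given by
\[ \omega_{\mathcal{W}} = \omega_{\mathcal{E}} + \omega_{\mathcal{S}}, \qquad \omega_{\mathcal{E}} = -\sum_{i=1}^{n} {:}b^{\xi_i} \partial c^{\xi'_i}{:}, \qquad \omega_{\mathcal{S}} = \sum_{i=1}^{n} {:}\beta^{\xi_i} \partial \gamma^{\xi'_i}{:}. \tag{7} \]
Here $\{\xi_1, \dots, \xi_n\}$ is a basis for $\mathfrak{g}$ and $\{\xi'_1, \dots, \xi'_n\}$ denotes the corresponding dual basis for $\mathfrak{g}^*$. The generators $\beta^{\xi}, \gamma^{\xi'}, b^{\xi}, c^{\xi'}$ are primary of weights 1, 0, 1, 0 with respect to $\omega_{\mathcal{W}}$. There is an additional grading by degree, in which $\beta^{\xi}, \gamma^{\xi'}, b^{\xi}, c^{\xi'}$ (and their respective derivatives) have degrees $-2, 2, -1, 1$. Note that the weight zero component is isomorphic to the classical Weil algebra $W(\mathfrak{g}) = \Lambda(\mathfrak{g}^*) \otimes \mathrm{Sym}(\mathfrak{g}^*)$, where the degree 1 and degree 2 generators $c^{\xi'}, \gamma^{\xi'}$ play the role of connection 1-forms and curvature 2-forms, respectively.

Next, we recall the $\mathcal{O}(\mathfrak{sg})$-algebra structure on $\mathcal{W}(\mathfrak{g})$. Define vertex operators
\[ \Theta^{\xi}_{\mathcal{W}} = \Theta^{\xi}_{\mathcal{E}} + \Theta^{\xi}_{\mathcal{S}}, \qquad \Theta^{\xi}_{\mathcal{E}} = \sum_{i=1}^{n} {:}b^{[\xi, \xi_i]} c^{\xi'_i}{:}, \qquad \Theta^{\xi}_{\mathcal{S}} = -\sum_{i=1}^{n} {:}\beta^{[\xi, \xi_i]} \gamma^{\xi'_i}{:}, \tag{8} \]
\[ D = J + K, \qquad J = \sum_{i=1}^{n} {:}\Bigl( \Theta^{\xi_i}_{\mathcal{S}} + \frac{1}{2} \Theta^{\xi_i}_{\mathcal{E}} \Bigr) c^{\xi'_i}{:}, \qquad K = \sum_{i=1}^{n} {:}\gamma^{\xi'_i} b^{\xi_i}{:}. \tag{9} \]
The Fourier modes $J(0)$, $K(0)$, and $D(0)$ are called the BRST, chiral Koszul, and chiral Weil differentials, respectively. They satisfy $J(0)^2 = K(0)^2 = D(0)^2 = [K(0), J(0)] = 0$. Moreover, we have
\[ \Theta^{\xi}_{\mathcal{W}}(z) \Theta^{\eta}_{\mathcal{W}}(w) \sim \Theta^{[\xi, \eta]}_{\mathcal{W}}(w) (z - w)^{-1}, \qquad \Theta^{\xi}_{\mathcal{W}}(z) b^{\eta}(w) \sim b^{[\xi, \eta]}(w) (z - w)^{-1}, \]
\[ [D(0), b^{\xi}(z)] = \Theta^{\xi}_{\mathcal{W}}(z), \qquad [D(0), \Theta^{\xi}_{\mathcal{W}}(z)] = 0. \]
In other words, the map $\mathcal{O}(\mathfrak{sg}) \to \mathcal{W}(\mathfrak{g})$ sending $(\xi, \eta)(z) \mapsto \Theta^{\xi}_{\mathcal{W}}(z) + b^{\eta}(z)$ and sending $d \mapsto [D(0), -]$, is a homomorphism of DVAs. The horizontal subalgebra $\mathcal{W}(\mathfrak{g})_{hor}$ is the vertex subalgebra generated by $\beta^{\xi}, \gamma^{\xi'}, b^{\xi}$. The basic subalgebra $\mathcal{W}(\mathfrak{g})_{bas} = \mathcal{W}(\mathfrak{g})^{\mathfrak{g}[t]}_{hor}$ is easily seen to be a subcomplex under both differentials $K(0)$ and $J(0)$.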
Definition 3.1. For any sg[t]-module (A, dA ), we define its chiral basic cohomology ∗ (A) to be H ∗ (A ∗ Hbas bas , dA ). We define its chiral equivariant cohomology HG (A) to ∗ be Hbas (W(g) ⊗ A). In this paper, we are interested in the case when A is the trivial sg[t]-module C. We refer to H∗G (C) as the chiral point algebra; it plays the role of HG∗ ( pt) = Sym(g∗ )G in the classical theory, and H∗G (C)[0] ∼ = HG∗ ( pt). Any connected, compact group G can be written as a quotient (G 1 × · · · × G r × T )/ , where the G i are simple, T is a torus, and is finite. Then H∗G (C) = H∗G 1 (C) ⊗ · · · ⊗ H∗G r (C) ⊗ H∗T (C). By Theorem 6.1 of [LLI], H∗T (C) is the free abelian vertex algebra
with generators γ ξ1 , . . . , γ ξn , where {ξ1 , . . . , ξn } is a basis for the Lie algebra of T . In other words, H∗T (C) is the polynomial algebra generated by ∂ k γ ξi for k ≥ 0 and i = 1, . . . , n. So we may assume without loss of generality that G is simple. There is a “classical sector” of H∗G (C), which is the abelian subalgebra generated by the weight zero component H∗G (C)[0] = Sym(g∗ )G . Unlike the case where G is abelian, H∗G (C) contains additional elements that have no classical analogues. The most notable feature of H∗G (C) is a conformal structure of central charge zero. Let {ξ1 , . . . , ξn } be an orthonormal basis for g relative to the Killing form. The Virasoro element L is represented by L = ωS − L S + C,
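(10)
where $L_S = -\sum_{i=1}^n :\Theta_S^{\xi_i}\Theta_S^{\xi_i}:$ and $C = \sum_{i,j=1}^n :b^{\xi_i} b^{\xi_j} \gamma^{\mathrm{ad}^*(\xi_i)(\xi_j)}: \; = \sum_{i=1}^n :(K(0)\Theta_S^{\xi_i})\, b^{\xi_i}:$. The term $\omega_S - L_S$ lies in $\mathcal{W}(\mathfrak{g})_{bas}$ and satisfies the Virasoro OPE with central charge zero, but it is not $D(0)$-closed. The purpose of the term $C$ is to correct this flaw, and $L$ still satisfies $L(z)L(w) \sim 2L(w)(z-w)^{-2} + \partial L(w)(z-w)^{-1}$. By Corollary 7.17 of [LLI], $L$ represents a nontrivial class $\mathbf{L}$ in $\mathbf{H}^*_G(\mathbb{C})$, and by Corollary 4.18 of [LLSI], $\mathbf{L}$ is a conformal structure on $\mathbf{H}^*_G(\mathbb{C})$. The proof is very simple. First, the Virasoro element $\omega_W = \omega_S + \omega_E$ is a conformal structure on $\mathcal{W}(\mathfrak{g})$; in particular, $\omega_W \circ_0$ acts by $\partial$ and $\omega_W \circ_1 a = ma$ for all $a \in \mathcal{W}(\mathfrak{g})[m]$. Even though $\omega_W$ does not lie in $\mathcal{W}(\mathfrak{g})_{bas}$, the operators $\omega_W \circ_k$ act on $\mathcal{W}(\mathfrak{g})_{bas}$ for all $k \ge 0$. For all $a \in \mathcal{W}(\mathfrak{g})_{bas}$ and $k \ge 0$, we have
\[ (L - \omega_W) \circ_k a = D(0)(H \circ_k a), \qquad H = \sum_{i=1}^n :\Theta_S^{\xi_i} b^{\xi_i}:. \tag{11} \]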
Even though $H$ does not lie in $\mathcal{W}(\mathfrak{g})_{bas}$, $H \circ_k$ preserves $\mathcal{W}(\mathfrak{g})_{bas}$ for all $k \ge 0$. Hence the operators $L \circ_k$ and $\omega_W \circ_k$ on $\mathcal{W}(\mathfrak{g})_{bas}$ agree up to coboundary. The Virasoro element $\mathbf{L}$ is nonclassical and has no analogue in $H^*_G(pt)$, since $H^0_G(pt) = \mathbb{C}$. The main purpose of this paper is to construct in a uniform way an infinite family of new, nonclassical elements of $\mathbf{H}^*_G(\mathbb{C})$ in the case $G = SU(2)$. In particular, we construct an injective linear map $\Phi: U(\mathfrak{sl}_2) \to \mathbf{H}^*_{SU(2)}(\mathbb{C})$, for which $\Phi(1) = \mathbf{L}$. The image of $\Phi$, together with the generator of $H^*_{SU(2)}(pt)$, is conjectured to be a strong generating set for $\mathbf{H}^*_{SU(2)}(\mathbb{C})$. We also compute some relations among these generators, which give a glimpse of the rich algebraic structure possessed by the chiral point algebra.
388
A. R. Linshaw
4. A Peculiar Topological Structure on W(g) The notion of a topological vertex algebra (TVA) was introduced by Lian-Zuckerman in [LZII]. It is an abstraction based on examples from physics. A TVA is a vertex algebra A equipped with four distinguished vertex operators L , F, J, G, where L is a Virasoro element with central charge zero, F is an even current which is conformal weight one quasi-primary (with respect to L), J is an odd conformal weight one primary, and G is an odd conformal weight two primary, such that J (z)J (w) ∼ 0, G(z)G(w) ∼ 0, J (0)G = L , F(0)J = J, F(0)G = −G.
(12)
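We do not require that $L$ is a conformal structure on $\mathcal{A}$, and in particular, $L_0 = L \circ_1$ need not act diagonalizably on $\mathcal{A}$. There is a well-known TVA structure on the $bc\beta\gamma$ system $\mathcal{E}(V) \otimes \mathcal{S}(V)$ attached to an $n$-dimensional vector space $V$ with basis $\{x_1, \dots, x_n\}$. Define elements
\[ L = \sum_{i=1}^n \left( :\beta^{x_i}\partial\gamma^{x_i}: - :b^{x_i}\partial c^{x_i}: \right), \qquad F = -\sum_{i=1}^n :b^{x_i}c^{x_i}:, \]
\[ J = \sum_{i=1}^n :c^{x_i}\beta^{x_i}:, \qquad G = -\sum_{i=1}^n :b^{x_i}\partial\gamma^{x_i}:. \]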
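It is easy to check that (12) is satisfied. In particular, for any Lie algebra $\mathfrak{g}$, $\mathcal{W}(\mathfrak{g})$ possesses this TVA structure. The same vertex algebra $\mathcal{W}(\mathfrak{g})$ supports another, more subtle TVA structure, which depends on the $\mathcal{O}(\mathfrak{sg})$ structure. As usual, fix an orthonormal basis $\{\xi_1, \dots, \xi_n\}$ for $\mathfrak{g}$ relative to the Killing form.
Theorem 4.1. Define
\[ L = -\sum_{i=1}^n :\Theta_S^{\xi_i}\Theta_S^{\xi_i}: - \sum_{i=1}^n :\Theta_W^{\xi_i}\Theta_W^{\xi_i}: - \sum_{i=1}^n :b^{\xi_i}\partial c^{\xi_i}:, \qquad J = \sum_{i=1}^n :\left(\Theta_S^{\xi_i} + \tfrac{1}{2}\Theta_E^{\xi_i}\right) c^{\xi_i}:, \tag{13} \]
\[ G = -\sum_{i=1}^n :\left(2\Theta_S^{\xi_i} + \Theta_E^{\xi_i}\right) b^{\xi_i}:, \qquad F = -\sum_{i=1}^n :b^{\xi_i}c^{\xi_i}:. \tag{14} \]
These elements commute with $\Theta_W^{\xi}$ for all $\xi \in \mathfrak{g}$ and satisfy (12), and hence define a TVA structure inside $\mathcal{W}(\mathfrak{g})^{\mathfrak{g}[t]}$.
Proof. This is a straightforward OPE calculation.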
Note that these vertex operators do not lie in W(g)bas , since they depend on cξi . However, the nonnegative modes F(k), J (k), G(k), L(k) for k ≥ 0 act on W(g)bas .
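This follows readily from the OPEs
\[ J(z)\,b^{\xi}(w) \sim \Theta_W^{\xi}(w)(z-w)^{-1}, \qquad G(z)\,b^{\xi}(w) \sim -\sum_{i=1}^n b^{[\xi_i,\xi]}\, b^{\xi_i}(w)(z-w)^{-1}, \qquad F(z)\,b^{\xi}(w) \sim -b^{\xi}(w)(z-w)^{-1}. \]
This structure on $\mathcal{W}(\mathfrak{g})_{bas}$ will be useful later in the study of $\mathbf{H}^*_G(\mathbb{C})$ for $G = SU(2)$.
5. Graded and Filtered Structures
Let $\mathcal{R}$ be the category of vertex algebras $\mathcal{A}$ equipped with a $\mathbb{Z}_{\ge 0}$-filtration
\[ \mathcal{A}_{(0)} \subset \mathcal{A}_{(1)} \subset \mathcal{A}_{(2)} \subset \cdots, \qquad \mathcal{A} = \bigcup_{k \ge 0} \mathcal{A}_{(k)}, \tag{15} \]
such that $\mathcal{A}_{(0)} = \mathbb{C}$, and for all $a \in \mathcal{A}_{(k)}$, $b \in \mathcal{A}_{(l)}$, we have
\[ a \circ_n b \in \mathcal{A}_{(k+l)}, \quad \text{for } n < 0, \tag{16} \]
\[ a \circ_n b \in \mathcal{A}_{(k+l-1)}, \quad \text{for } n \ge 0. \tag{17} \]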
Elements $a(z) \in \mathcal{A}_{(d)} \setminus \mathcal{A}_{(d-1)}$ are said to have degree $d$. Filtrations on vertex algebras satisfying (16)–(17) were introduced in [LiII], and are known as good increasing filtrations. If $\mathcal{A}$ possesses such a filtration, the associated graded object $gr(\mathcal{A}) = \bigoplus_{k > 0} \mathcal{A}_{(k)}/\mathcal{A}_{(k-1)}$ is a $\mathbb{Z}_{\ge 0}$-graded associative, supercommutative algebra with a unit 1 under a product induced by the Wick product on $\mathcal{A}$. In general, there is no natural linear map $\mathcal{A} \to gr(\mathcal{A})$, but for each $r \ge 1$ we have the projection
\[ \phi_r: \mathcal{A}_{(r)} \to \mathcal{A}_{(r)}/\mathcal{A}_{(r-1)} \subset gr(\mathcal{A}). \tag{18} \]
Moreover, $gr(\mathcal{A})$ has a derivation $\partial$ of degree zero (induced by the operator $\partial = \frac{d}{dz}$ on $\mathcal{A}$), and for each $a \in \mathcal{A}_{(d)}$ and $n \ge 0$, the operator $a \circ_n$ on $\mathcal{A}$ induces a derivation of degree $d - k$ on $gr(\mathcal{A})$, which we denote by $a(n)$. Here
\[ k = \sup\{ j \ge 1 \mid \mathcal{A}_{(r)} \circ_n \mathcal{A}_{(s)} \subset \mathcal{A}_{(r+s-j)} \ \ \forall\, r, s, n \ge 0 \}, \]
as in [LL]. Finally, these derivations give $gr(\mathcal{A})$ the structure of a vertex Poisson algebra. The assignment $\mathcal{A} \mapsto gr(\mathcal{A})$ is a functor from $\mathcal{R}$ to the category of $\mathbb{Z}_{\ge 0}$-graded supercommutative rings with a differential $\partial$ of degree 0, which we will call $\partial$-rings. A $\partial$-ring is the same thing as an abelian vertex algebra, that is, a vertex algebra $\mathcal{V}$ in which $[a(z), b(w)] = 0$ for all $a, b \in \mathcal{V}$. A $\partial$-ring $A$ is said to be generated by a subset $\{a_i \mid i \in I\}$ if $\{\partial^k a_i \mid i \in I, k \ge 0\}$ generates $A$ as a graded ring. The key feature of $\mathcal{R}$ is the following reconstruction property [LL]:
Lemma 5.1. Let $\mathcal{A}$ be a vertex algebra in $\mathcal{R}$ and let $\{a_i \mid i \in I\}$ be a set of generators for $gr(\mathcal{A})$ as a $\partial$-ring, where $a_i$ is homogeneous of degree $d_i$. If $a_i(z) \in \mathcal{A}_{(d_i)}$ are vertex operators such that $\phi_{d_i}(a_i(z)) = a_i$, then $\mathcal{A}$ is strongly generated as a vertex algebra by $\{a_i(z) \mid i \in I\}$.
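For any Lie algebra $\mathfrak{g}$, the semi-infinite Weil algebra $\mathcal{W}(\mathfrak{g})$ admits a good increasing filtration
\[ \mathcal{W}(\mathfrak{g})_{(0)} \subset \mathcal{W}(\mathfrak{g})_{(1)} \subset \cdots, \qquad \mathcal{W}(\mathfrak{g}) = \bigcup_{k \ge 0} \mathcal{W}(\mathfrak{g})_{(k)}, \tag{19} \]
where $\mathcal{W}(\mathfrak{g})_{(k)}$ is defined to be the vector space spanned by iterated Wick products of the generators $b^{\xi}, c^{\xi}, \beta^{\xi}, \gamma^{\xi}$ and their derivatives, of length at most $k$. We say that elements of $\mathcal{W}(\mathfrak{g})_{(k)} \setminus \mathcal{W}(\mathfrak{g})_{(k-1)}$ have polynomial degree $k$. This filtration is $\mathfrak{sg}[t]$-invariant, and we have an isomorphism of supercommutative rings
\[ gr(\mathcal{W}(\mathfrak{g})) \cong Sym\Big( \bigoplus_{k \ge 0} (V_k \oplus V_k^*) \Big) \otimes \bigwedge\Big( \bigoplus_{k \ge 0} (U_k \oplus U_k^*) \Big). \tag{20} \]
Here $V_k$, $U_k$ are copies of $\mathfrak{g}$, and $V_k^*$, $U_k^*$ are copies of $\mathfrak{g}^*$. The generators of $gr(\mathcal{W}(\mathfrak{g}))$ are $\beta_k^{\xi_i}$, $\gamma_k^{\xi_i}$, $b_k^{\xi_i}$, and $c_k^{\xi_i}$, which correspond to the vertex operators $\partial^k\beta^{\xi_i}$, $\partial^k\gamma^{\xi_i}$, $\partial^k b^{\xi_i}$, and $\partial^k c^{\xi_i}$, respectively, for $k \ge 0$. Since $\mathcal{W}(\mathfrak{g})$ has a basis consisting of iterated Wick products of the generators and their derivatives, $\mathcal{W}(\mathfrak{g}) \cong gr(\mathcal{W}(\mathfrak{g}))$ as vector spaces, although not canonically. Finally, the filtration (19) is inherited by any subalgebra of $\mathcal{W}(\mathfrak{g})$, such as $\mathcal{W}(\mathfrak{g})_{hor}$ and $\mathcal{W}(\mathfrak{g})_{bas}$. Note that
\[ K(0)(\mathcal{W}(\mathfrak{g})_{(k)}) \subset \mathcal{W}(\mathfrak{g})_{(k)}, \qquad J(0)(\mathcal{W}(\mathfrak{g})_{(k)}) \subset \mathcal{W}(\mathfrak{g})_{(k+1)}. \tag{21} \]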
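Clearly $K(0)$ preserves the parity of polynomial degree and $J(0)$ reverses it. Recall that the horizontal subalgebra $\mathcal{W}(\mathfrak{g})_{hor} \subset \mathcal{W}(\mathfrak{g})$ is generated by $\beta^{\xi}, \gamma^{\xi}, b^{\xi}$. Clearly $\mathcal{W}(\mathfrak{g})_{hor}$ (but not $\mathcal{W}(\mathfrak{g})$) has a $\mathbb{Z}_{\ge 0}$ grading by b-number, which is the eigenvalue of the Fourier mode $-F(0)$, where $F$ is given by (14). This grading is inherited by $\mathcal{W}(\mathfrak{g})_{bas}$, since the action of $\Theta_W^{\xi}$ on $\mathcal{W}(\mathfrak{g})_{hor}$ preserves b-number, for all $\xi \in \mathfrak{g}$. Let us introduce the notations $\mathcal{W}(\mathfrak{g})^{(k)}_{hor}$ and $\mathcal{W}(\mathfrak{g})^{(k)}_{bas}$ to denote the subspaces of b-number $k$; in this notation, $\mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]} = \mathcal{W}(\mathfrak{g})^{(0)}_{bas}$. The parity of the b-number and cohomological degree gradations coincide, but since $K(0)$ and $J(0)$ raise and lower b-number by 1, respectively, $\mathbf{H}^*_G(\mathbb{C})$ is not graded by b-number. However, the b-number gradation does induce a filtration on $\mathbf{H}^*_G(\mathbb{C})$. Any class in $\mathbf{H}^{2k}_G(\mathbb{C})$ has a representative of the form $\omega = \sum_{i \ge 0} \omega^{(2i)}$, where $\omega^{(2i)} \in \mathcal{W}(\mathfrak{g})^{(2i)}_{bas}$, and only finitely many terms are nonzero. Consequently, for each $k \in \mathbb{Z}$, $\mathbf{H}^{2k}_G(\mathbb{C})$ admits a decreasing filtration
\[ \mathbf{H}^{2k}_G(\mathbb{C})_{(0)} \supset \mathbf{H}^{2k}_G(\mathbb{C})_{(2)} \supset \mathbf{H}^{2k}_G(\mathbb{C})_{(4)} \supset \cdots, \tag{22} \]
where $\mathbf{H}^{2k}_G(\mathbb{C})_{(2r)}$ consists of classes admitting a representative of the form $\omega = \sum_{i \ge r} \omega^{(2i)}$. There is a similar filtration on $\mathbf{H}^{2k+1}_G(\mathbb{C})$ of the form
\[ \mathbf{H}^{2k+1}_G(\mathbb{C})_{(1)} \supset \mathbf{H}^{2k+1}_G(\mathbb{C})_{(3)} \supset \mathbf{H}^{2k+1}_G(\mathbb{C})_{(5)} \supset \cdots. \]
Note that if $\omega = \sum_{i \ge l} \omega^{(2i)}$ represents a class in $\mathbf{H}^{2k}_G(\mathbb{C})$, we must have $J(0)(\omega^{(2l)}) = 0$, since $K(0)$ and $J(0)$ raise and lower b-number by 1, respectively. Hence the lowest term $\omega^{(2l)}$ represents a class in $H^{2k}(\mathcal{W}(\mathfrak{g})_{bas}, J(0))$, and we obtain an injective linear map
\[ \mathbf{H}^{2k}_G(\mathbb{C})_{(2l)} / \mathbf{H}^{2k}_G(\mathbb{C})_{(2l+2)} \to H^{2k}(\mathcal{W}(\mathfrak{g})_{bas}, J(0)), \tag{23} \]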
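whose image is homogeneous of b-number $2l$. In odd degree $2k+1$, there is a similar injective map
\[ \mathbf{H}^{2k+1}_G(\mathbb{C})_{(2l+1)} / \mathbf{H}^{2k+1}_G(\mathbb{C})_{(2l+3)} \to H^{2k+1}(\mathcal{W}(\mathfrak{g})_{bas}, J(0)), \]
whose image is homogeneous of b-number $2l+1$.
6. The Basic Complex and the Vertex Algebra Commutant Problem
The main technical difficulty that arises in studying $\mathbf{H}^*_G(\mathbb{C})$ is that the basic complex is a commutant subalgebra of $\mathcal{W}(\mathfrak{g})$, and vertex algebra commutants are generally not well understood. Given a vertex algebra $\mathcal{V}$ and a subalgebra $\mathcal{A}$, recall that the commutant $Com(\mathcal{A}, \mathcal{V})$ is defined to be $\{v \in \mathcal{V} \mid [a(z), v(w)] = 0, \ \forall a \in \mathcal{A}\}$. If $\mathcal{A}$ is a homomorphic image of a current algebra $\mathcal{O}(\mathfrak{g}, B)$ of some Lie algebra $\mathfrak{g}$, we have $Com(\mathcal{A}, \mathcal{V}) = \mathcal{V}^{\mathfrak{g}[t]}$. In this notation, $\mathcal{W}(\mathfrak{g})_{bas} = Com(\mathcal{O}, \mathcal{W}(\mathfrak{g}))$, where $\mathcal{O}$ is the copy of $\mathcal{O}(\mathfrak{sg})$ generated by $\Theta_W^{\xi}, b^{\xi}$ for $\xi \in \mathfrak{g}$. It is difficult to study this algebra because $\mathcal{O}(\mathfrak{sg})$ does not act completely reducibly on $\mathcal{W}(\mathfrak{g})$. Moreover, it seems impossible to describe $\mathbf{H}^*_G(\mathbb{C})$ without first giving a reasonable description of $\mathcal{W}(\mathfrak{g})_{bas}$. Even in the simplest case $G = SU(2)$, we are unable to describe $\mathcal{W}(\mathfrak{g})_{bas}$ completely. However, in this case we will find a generating set for the subalgebra $\mathcal{W}(\mathfrak{g})^{(0)}_{bas} = \mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]}$, as well as for the subspace $\mathcal{W}(\mathfrak{g})^{(1)}_{bas}$, which is a module over $\mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]}$. This will be sufficient to construct the injective linear map $\Phi: U(\mathfrak{sl}_2) \to \mathbf{H}^*_{SU(2)}(\mathbb{C})$. As we shall see, $\mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]}$ plays an important role in the structure of the chiral point algebra, but unfortunately we lack the tools to describe $\mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]}$ for any simple $\mathfrak{g}$ other than $\mathfrak{sl}_2$. However, since $\mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]}$ coincides with $Com(\Theta_S, \mathcal{S}(\mathfrak{g}))$, where $\Theta_S$ is generated by $\{\Theta_S^{\xi} \mid \xi \in \mathfrak{g}\}$, we have the following simple characterization of this algebra:
Lemma 6.1. $\mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]} = \mathcal{S}(\mathfrak{g}) \cap Ker(J(0))$. In particular, we have a vertex algebra homomorphism $\mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]} \to H^*(\mathcal{W}(\mathfrak{g})_{bas}, J(0))$.
Proof. An OPE calculation shows that
\[ J \circ_0 \omega = \sum_{i=1}^n \sum_{m \ge 0} \frac{1}{m!} :\partial^m c^{\xi_i} \left(\Theta_S^{\xi_i} \circ_m \omega\right): \]
for any $\omega \in \mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]}$, so the claim is immediate.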
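$\mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]}$ possesses some additional features that were described in [LL]. In terms of an orthonormal basis $\{\xi_1, \dots, \xi_n\}$ for $\mathfrak{g}$ relative to the Killing form, there is a Virasoro element of central charge zero, given by
\[ \omega_S - L_S = \sum_{i=1}^n :\beta^{\xi_i}\partial\gamma^{\xi_i}: + \sum_{i=1}^n :\Theta_S^{\xi_i}\Theta_S^{\xi_i}:. \]
There is also an action of the current algebra $\mathcal{O}(\mathfrak{sl}_2, -\frac{n}{8}\kappa)$ with generators $X^x, X^h, X^y$, given by
\[ X^h \mapsto \sum_{i=1}^n :\beta^{\xi_i}\gamma^{\xi_i}:, \qquad X^x \mapsto \frac{1}{2}\sum_{i=1}^n :\gamma^{\xi_i}\gamma^{\xi_i}:, \qquad X^y \mapsto -\frac{1}{2}\sum_{i=1}^n :\beta^{\xi_i}\beta^{\xi_i}:. \tag{24} \]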
In fact, if $V$ is any $\mathfrak{g}$-module of dimension $m$ which admits a symmetric, invariant bilinear form, $\mathcal{S}(V)^{\mathfrak{g}[t]}$ carries an action of $\mathcal{O}(\mathfrak{sl}_2, -\frac{m}{8}\kappa)$ given by the same formula. Moreover, $\mathcal{O}(\mathfrak{sl}_2, -\frac{m}{8}\kappa)$ has level $-\frac{m}{2}$ in the standard normalization. By Theorem 0.2.1 of [GK], it is a simple vertex algebra, so the map $\mathcal{O}(\mathfrak{sl}_2, -\frac{m}{8}\kappa) \to \mathcal{S}(V)^{\mathfrak{g}[t]}$ is injective.
In [LL], an approach to studying vertex algebras of the form $\mathcal{S}(V)^{\mathfrak{g}[t]}$ using commutative algebra was introduced. There is a good increasing filtration
\[ \mathcal{S}(V)_{(0)} \subset \mathcal{S}(V)_{(1)} \subset \mathcal{S}(V)_{(2)} \subset \cdots, \qquad \mathcal{S}(V) = \bigcup_{k \ge 0} \mathcal{S}(V)_{(k)}, \tag{25} \]
where $\mathcal{S}(V)_{(k)}$ is spanned by iterated Wick products of $\beta^x$, $\gamma^x$ and their derivatives, of length at most $k$. This is analogous to the Bernstein filtration on the Weyl algebra $\mathcal{D}(V)$. This filtration is $\mathfrak{g}[t]$-invariant, so $\mathfrak{g}[t]$ acts on $gr(\mathcal{S}(V))$ by derivations of degree zero, and there is an injective map of invariant spaces
\[ gr(\mathcal{S}(V)^{\mathfrak{g}[t]}) \hookrightarrow gr(\mathcal{S}(V))^{\mathfrak{g}[t]}. \tag{26} \]
The latter space may be easier to describe, and if this map happens to be surjective we can reconstruct $\mathcal{S}(V)^{\mathfrak{g}[t]}$ using Lemma 5.1, since a set of generators for $gr(\mathcal{S}(V)^{\mathfrak{g}[t]})$ as a differential algebra corresponds to a strong generating set for $\mathcal{S}(V)^{\mathfrak{g}[t]}$ as a vertex algebra.
The problem of describing $gr(\mathcal{S}(V))^{\mathfrak{g}[t]}$ can be reinterpreted in the language of jet schemes. Let $X$ be an irreducible scheme of finite type over $\mathbb{C}$. For each integer $m \ge 0$, the jet scheme $J_m(X)$ is determined by its functor of points: for every $\mathbb{C}$-algebra $A$, we have a bijection
\[ Hom(Spec(A), J_m(X)) \cong Hom(Spec(A[t]/t^{m+1}), X). \]
Thus the $\mathbb{C}$-valued points of $J_m(X)$ correspond to the $\mathbb{C}[t]/t^{m+1}$-valued points of $X$. We define $J_\infty(X) = \lim_{\infty \leftarrow m} J_m(X)$, which is known as the infinite jet scheme, or space of arcs of $X$, and we denote by $\mathcal{O}(J_\infty(X))$ the ring $\lim_{m \to \infty} \mathcal{O}(J_m(X))$. It is a differential algebra with derivation $D$. More details about jet schemes and a list of references can be found in the Appendix.
For a finite-dimensional vector space $V$, there is an isomorphism of differential algebras $gr(\mathcal{S}(V)) \cong \mathcal{O}(J_\infty(V \oplus V^*))$ which intertwines the differentials $D$ and $\partial$. If $V$ is a representation of a (connected) Lie group $G$ with Lie algebra $\mathfrak{g}$, there is an action of $\mathfrak{g}[t]$ on $\mathcal{O}(J_\infty(V \oplus V^*))$, and this isomorphism intertwines the actions of $\mathfrak{g}[t]$ as well. Hence we have an isomorphism of invariant spaces
\[ gr(\mathcal{S}(V))^{\mathfrak{g}[t]} \cong \mathcal{O}(J_\infty(V \oplus V^*))^{\mathfrak{g}[t]}. \]
We are thus led to the problem of describing invariant rings of the form $\mathcal{O}(J_\infty(V))^{\mathfrak{g}[t]}$, where $V$ is an arbitrary finite-dimensional $G$-representation. Note that $\mathcal{O}(V)^G$ is a canonical subalgebra of $\mathcal{O}(J_\infty(V))^{\mathfrak{g}[t]}$. Let $\langle \mathcal{O}(V)^G \rangle$ denote the algebra generated by $\{D^i(f) \mid f \in \mathcal{O}(V)^G, \ i \ge 0\}$, which lies in $\mathcal{O}(J_\infty(V))^{\mathfrak{g}[t]}$, since the latter is closed under $D$. The following result is important in our study of $\mathbf{H}^*_{SU(2)}(\mathbb{C})$, since it will allow us to give a complete description of $\mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]}$ and $\mathcal{W}(\mathfrak{g})^{(1)}_{bas}$, for $G = SU(2)$.
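Theorem 6.2. Let $G = SU(2)$ and let $V$ be the adjoint representation of $G$. Then we have
\[ \mathcal{O}(J_\infty(V \oplus V))^{\mathfrak{g}[t]} = \langle \mathcal{O}(V \oplus V)^G \rangle, \tag{27} \]
\[ \mathcal{O}(J_\infty(V \oplus V \oplus V))^{\mathfrak{g}[t]} = \langle \mathcal{O}(V \oplus V \oplus V)^G \rangle. \tag{28} \]
We will develop some general techniques for studying invariant rings of this kind and prove this result in the Appendix.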
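The functor-of-points description of $J_m(X)$ recalled in this section can be made concrete with a small computation. The sketch below is only an illustration: the cuspidal curve $x^2 = y^3$ and the helper names (`trunc_mul`, `on_cusp`, `monomial_arc`) are assumptions chosen for the example and do not appear in the paper. An $m$-jet is stored as a list of $m+1$ coefficients, and membership in $J_m(X)$ is tested by evaluating the defining equation modulo $t^{m+1}$.

```python
from fractions import Fraction

def trunc_mul(p, q, m):
    """Multiply two arcs (coefficient lists of length m+1) modulo t^(m+1)."""
    out = [Fraction(0)] * (m + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= m:
                out[i + j] += a * b
    return out

def trunc_pow(p, n, m):
    """n-th power of an arc modulo t^(m+1), for n >= 0."""
    out = [Fraction(1)] + [Fraction(0)] * m
    for _ in range(n):
        out = trunc_mul(out, p, m)
    return out

def on_cusp(x, y, m):
    """True if the m-jet (x(t), y(t)) satisfies x^2 = y^3 modulo t^(m+1).
    The cusp is a hypothetical example variety, not one used in the paper."""
    return trunc_pow(x, 2, m) == trunc_pow(y, 3, m)

def monomial_arc(k, m):
    """The arc t^k truncated modulo t^(m+1)."""
    return [Fraction(int(i == k)) for i in range(m + 1)]

m = 6
print(on_cusp(monomial_arc(3, m), monomial_arc(2, m), m))  # the jet (t^3, t^2)
print(on_cusp(monomial_arc(1, m), monomial_arc(1, m), m))  # the jet (t, t)
```

Each coefficient of $t^k$ in the truncated equation is one polynomial equation in the jet coordinates, which is how the defining ideal of $J_m(X)$ arises from that of $X$.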
7. The Case G = SU(2)
In this section, we consider the simplest nontrivial case $G = SU(2)$. The complexified Lie algebra of $SU(2)$ is $\mathfrak{sl}_2$, and we work in the standard root basis $x, y, h$, with commutators $[x, y] = h$, $[h, x] = 2x$, $[h, y] = -2y$. For simplicity of notation, we denote $\mathcal{W}(\mathfrak{sl}_2)$ and $\mathcal{S}(\mathfrak{sl}_2)$ by $\mathcal{W}$ and $\mathcal{S}$, respectively. We need the following theorem of Weyl which describes polynomial invariants for the adjoint representation of $\mathfrak{sl}_2$.
Theorem 7.1. For $n \ge 0$, let $V_n$ be a copy of the adjoint representation of $\mathfrak{sl}_2$, with basis $\{a_n^h, a_n^x, a_n^y\}$. Then $Sym\big(\bigoplus_{n=0}^{\infty} V_n\big)^{\mathfrak{sl}_2}$ is generated by
\[ q_{ij} = a_i^h a_j^h + 2a_i^x a_j^y + 2a_j^x a_i^y, \quad i, j \ge 0, \qquad c_{klm} = \begin{vmatrix} a_k^h & a_k^x & a_k^y \\ a_l^h & a_l^x & a_l^y \\ a_m^h & a_m^x & a_m^y \end{vmatrix}, \quad 0 \le k < l < m. \tag{29} \]
The ideal of relations among the variables $q_{ij}$ and $c_{klm}$ is generated by polynomials of the following two types:
\[ q_{ij} c_{klm} - q_{kj} c_{ilm} + q_{lj} c_{ikm} - q_{mj} c_{kli}, \tag{30} \]
\[ c_{ijk} c_{lmn} + \frac{1}{4} \begin{vmatrix} q_{il} & q_{im} & q_{in} \\ q_{jl} & q_{jm} & q_{jn} \\ q_{kl} & q_{km} & q_{kn} \end{vmatrix}. \tag{31} \]
Finally, the ideal of relations among the quadratics $q_{ij}$ is generated by the determinantal relations
\[ \begin{vmatrix} q_{im} & q_{in} & q_{ir} & q_{is} \\ q_{jm} & q_{jn} & q_{jr} & q_{js} \\ q_{km} & q_{kn} & q_{kr} & q_{ks} \\ q_{lm} & q_{ln} & q_{lr} & q_{ls} \end{vmatrix}. \tag{32} \]
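The relations (31) and (32) of Theorem 7.1 can be spot-checked numerically. The sketch below assumes that each $a_n$ is specialized to an integer triple of $(h, x, y)$ coordinates; the function names are illustrative and not taken from the paper. Relation (31) is tested after clearing the factor $1/4$, and (32) reflects the fact that a $4 \times 4$ matrix of pairings of vectors in a 3-dimensional space has rank at most 3.

```python
import random

def q(a, b):
    """Quadratic invariant q_ij of (29); a and b are (h, x, y) coordinate triples."""
    return a[0] * b[0] + 2 * a[1] * b[2] + 2 * b[1] * a[2]

def det(mat):
    """Determinant by Laplace expansion along the first row (exact for integers)."""
    if len(mat) == 1:
        return mat[0][0]
    total = 0
    for col, entry in enumerate(mat[0]):
        minor = [row[:col] + row[col + 1:] for row in mat[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def c(a, b, d):
    """Cubic invariant c_klm of (29): the determinant with rows a, b, d."""
    return det([list(a), list(b), list(d)])

def random_vector(rng):
    return tuple(rng.randint(-5, 5) for _ in range(3))

def check_relations(trials=200, seed=0):
    """Test (31) and (32) on random integer specializations."""
    rng = random.Random(seed)
    for _ in range(trials):
        vi, vj, vk, vl, vm, vn, vr, vs = (random_vector(rng) for _ in range(8))
        gram3 = [[q(u, w) for w in (vl, vm, vn)] for u in (vi, vj, vk)]
        # relation (31), multiplied through by 4
        if 4 * c(vi, vj, vk) * c(vl, vm, vn) + det(gram3) != 0:
            return False
        gram4 = [[q(u, w) for w in (vm, vn, vr, vs)] for u in (vi, vj, vk, vl)]
        # determinantal relation (32)
        if det(gram4) != 0:
            return False
    return True

print(check_relations())
```

A random test of this kind cannot prove that these relations generate the ideal; it only confirms that they vanish on the sampled points.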
This theorem also holds if we have odd as well as even variables. If $U_n$ is another copy of the adjoint representation with basis $\{u_n^h, u_n^x, u_n^y\}$ for $n \ge 0$, the supercommutative ring
\[ \Big( Sym\Big( \bigoplus_{n=0}^{\infty} V_n \Big) \otimes \bigwedge\Big( \bigoplus_{n=0}^{\infty} U_n \Big) \Big)^{\mathfrak{sl}_2} \]
is also generated by cubics and quadratics, as above. The only subtlety is that in the “fermionic” determinants $c_{klm}$, the indices $k, l, m$ need not be distinct if they correspond to odd variables. For example, the following elements are nonzero:
\[ \begin{vmatrix} a_k^h & a_k^x & a_k^y \\ u_l^h & u_l^x & u_l^y \\ u_l^h & u_l^x & u_l^y \end{vmatrix} = 2a_k^h u_l^x u_l^y + 2a_k^x u_l^y u_l^h - 2a_k^y u_l^x u_l^h, \qquad \begin{vmatrix} u_k^h & u_k^x & u_k^y \\ u_k^h & u_k^x & u_k^y \\ u_k^h & u_k^x & u_k^y \end{vmatrix} = 6u_k^h u_k^x u_k^y. \]
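The two “fermionic” determinants above can be reproduced with a very small model of a supercommutative ring. The sketch below makes one assumption about conventions that the text does not spell out: a determinant with odd entries is expanded as a signed sum over permutations, with the factors multiplied in row order. The representation of elements as dictionaries and all function names are illustrative only.

```python
from itertools import permutations

def inversion_sign(seq):
    """Sign of the permutation that sorts a sequence of distinct items."""
    sign = 1
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] > seq[j]:
                sign = -sign
    return sign

def even(name):
    """An even (commuting) generator."""
    return {((name,), ()): 1}

def odd(name):
    """An odd (anticommuting) generator."""
    return {((), (name,)): 1}

def mul(f, g):
    """Product in Sym(even) tensor Lambda(odd); keys are (even names, odd names)."""
    out = {}
    for (e1, o1), c1 in f.items():
        for (e2, o2), c2 in g.items():
            odd_part = o1 + o2
            if len(set(odd_part)) < len(odd_part):
                continue  # a repeated odd generator gives zero
            key = (tuple(sorted(e1 + e2)), tuple(sorted(odd_part)))
            out[key] = out.get(key, 0) + inversion_sign(odd_part) * c1 * c2
    return {k: v for k, v in out.items() if v != 0}

def det3(rows):
    """Signed sum over permutations of row0[s0] * row1[s1] * row2[s2] (row order)."""
    total = {}
    for s in permutations(range(3)):
        term = mul(mul(rows[0][s[0]], rows[1][s[1]]), rows[2][s[2]])
        for key, value in term.items():
            total[key] = total.get(key, 0) + inversion_sign(s) * value
    return {k: v for k, v in total.items() if v != 0}

a_k = [even("a_h"), even("a_x"), even("a_y")]
u_l = [odd("u_h"), odd("u_x"), odd("u_y")]
mixed = det3([a_k, u_l, u_l])   # one even row, the same odd row twice
pure = det3([u_l, u_l, u_l])    # the same odd row three times
print(mixed)
print(pure)
```

Odd monomials are stored in sorted order, so $2a^x u^y u^h$ appears as $-2a^x u^h u^y$ and $-2a^y u^x u^h$ as $+2a^y u^h u^x$ in the printed result.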
For the reader’s convenience, we write down the vertex operators $\Theta_W^{\xi}$ and $J$ given by (8)-(9) in terms of the basis $x, y, h$ for $\mathfrak{sl}_2$. We have
\[ \Theta_W^x = 2:\beta^x\gamma^h: - :\beta^h\gamma^y: - 2:b^x c^h: + :b^h c^y:, \]
\[ \Theta_W^y = -2:\beta^y\gamma^h: + :\beta^h\gamma^x: + 2:b^y c^h: - :b^h c^x:, \]
\[ \Theta_W^h = -2:\beta^x\gamma^x: + 2:\beta^y\gamma^y: + 2:b^x c^x: - 2:b^y c^y:, \]
\[ J = :\beta^h\gamma^x c^y: - :\beta^h\gamma^y c^x: + 2:\beta^x\gamma^h c^x: - 2:\beta^x\gamma^x c^h: - 2:\beta^y\gamma^h c^y: + 2:\beta^y\gamma^y c^h: - :b^h c^x c^y: + 2:b^x c^x c^h: - 2:b^y c^y c^h:. \]
A natural place to look for elements of $\mathcal{W}_{bas}$ is to write down the invariant normally ordered polynomials in the generators $\beta^{\xi}, \gamma^{\xi}, b^{\xi} \in \mathcal{W}_{hor}$, using (29), and taking into account the $\mathfrak{sl}_2$-module isomorphism $\mathfrak{sl}_2 \cong \mathfrak{sl}_2^*$. There are five quadratic, and four cubic vertex operators, given as follows:
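\[ v^h = :\beta^h\gamma^h: + :\beta^x\gamma^x: + :\beta^y\gamma^y:, \tag{33} \]
\[ v^x = \frac{1}{2}\left( :\gamma^h\gamma^h: + :\gamma^x\gamma^y: \right), \tag{34} \]
\[ v^y = -\frac{1}{2}\left( :\beta^h\beta^h: + 4:\beta^x\beta^y: \right), \tag{35} \]
\[ K = :\gamma^h b^h: + :\gamma^x b^x: + :\gamma^y b^y:, \tag{36} \]
\[ Q^{\beta b} = :\beta^h b^h: + 2:\beta^x b^y: + 2:\beta^y b^x:, \tag{37} \]
\[ C^{\beta\gamma b} = -:\beta^h\gamma^x b^x: + :\beta^h\gamma^y b^y: - 2:\beta^x\gamma^h b^y: + :\beta^x\gamma^x b^h: + 2:\beta^y\gamma^h b^x: - :\beta^y\gamma^y b^h:, \tag{38} \]
\[ C^{\gamma bb} = -:\gamma^h b^x b^y: + \frac{1}{2}:\gamma^x b^x b^h: - \frac{1}{2}:\gamma^y b^y b^h:, \tag{39} \]
\[ C^{\beta bb} = :\beta^h b^x b^y: + :\beta^x b^y b^h: - :\beta^y b^x b^h:, \tag{40} \]
\[ C^{bbb} = :b^x b^y b^h:. \tag{41} \]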
Some of these vertex operators (and similar $\mathfrak{sl}_2$-invariant vertex operators involving $c^x$, $c^y$, and $c^h$) were written down by Akman in [A], in her study of the semi-infinite cohomology $H^{\infty+*}(\hat{\mathfrak{sl}}_2, \mathcal{S}) = H^*(\mathcal{W}, J(0))$. Note that $C^{\gamma bb}$ coincides with the element $C$ appearing in (10) and $C^{\beta\gamma b}$ coincides with the element $H$ given by (11), in the case $G = SU(2)$. Moreover, $v^x$, $v^y$, and $v^h$ coincide with the vertex operators given by (24) in the case $G = SU(2)$.
Theorem 7.2. The elements $v^x$, $v^y$, and $v^h$ are strong generators for the subalgebra $\mathcal{S}^{\mathfrak{sl}_2[t]} \subset \mathcal{W}_{bas}$.
Proof. We have a ring isomorphism $gr(\mathcal{S})^{\mathfrak{sl}_2[t]} = \mathcal{O}(J_\infty(\mathfrak{sl}_2 \oplus \mathfrak{sl}_2))^{\mathfrak{sl}_2[t]}$, since the adjoint and coadjoint representations of $\mathfrak{sl}_2$ are isomorphic. Theorem 6.2 implies that $gr(\mathcal{S})^{\mathfrak{sl}_2[t]}$ is generated as a differential algebra by $\mathcal{O}(\mathfrak{sl}_2 \oplus \mathfrak{sl}_2)^{\mathfrak{sl}_2}$, which is a polynomial algebra on three generators. In terms of the generators $\{\beta_k^{\xi}, \gamma_k^{\xi} \mid \xi = x, y, h, \ k \ge 0\}$ of $gr(\mathcal{S})$, the generators of $gr(\mathcal{S})^{\mathfrak{sl}_2[t]}$ as a differential algebra are
\[ \beta_0^h\gamma_0^h + \beta_0^x\gamma_0^x + \beta_0^y\gamma_0^y, \qquad \frac{1}{2}\left( \gamma_0^h\gamma_0^h + \gamma_0^x\gamma_0^y \right), \qquad -\frac{1}{2}\left( \beta_0^h\beta_0^h + 4\beta_0^x\beta_0^y \right). \]
But these correspond precisely to the vertex operators $v^h$, $v^x$, and $v^y$ under the projection $\phi_2: \mathcal{S}_{(2)} \to \mathcal{S}_{(2)}/\mathcal{S}_{(1)} \subset gr(\mathcal{S})$. This shows that the map (26) is surjective, and hence is an isomorphism.
Remark 7.3. Since $\mathcal{O}(\mathfrak{sl}_2, -\frac{3}{8}\kappa)$ is a simple vertex algebra, it follows that $\mathcal{S}^{\mathfrak{sl}_2[t]} \cong \mathcal{O}(\mathfrak{sl}_2, -\frac{3}{8}\kappa)$. In particular, there are no nontrivial normally ordered polynomial relations among $v^x$, $v^y$, $v^h$ and their derivatives.
Remark 7.4. In [LL], the copy of $\mathcal{O}(\mathfrak{sl}_2, -\frac{3}{8}\kappa)$ generated by $v^x, v^y, v^h$ was denoted by $\mathcal{A}$. The main result of [LL] was that $Com(\mathcal{A}, \mathcal{S}) = \Theta_S$, where $\Theta_S$ is the copy of $\mathcal{O}(\mathfrak{sl}_2, -\kappa)$ generated by $\Theta_S^x, \Theta_S^y, \Theta_S^h$, which are given by (8). Theorem 7.2 shows that $Com(\Theta_S, \mathcal{S}) = \mathcal{A}$, so $\Theta_S$ and $\mathcal{A}$ form a Howe pair (i.e., a pair of mutual commutants) inside $\mathcal{S}$.
Next, an OPE calculation shows that $K$, $Q^{\beta b}$, $C^{\gamma bb}$, $C^{\beta bb}$, and $C^{bbb}$ all lie in $\mathcal{W}_{bas}$. However, $C^{\beta\gamma b}$ does not lie in $\mathcal{W}_{bas}$ since
\[ \Theta_W^{\xi}(z)\, C^{\beta\gamma b}(w) \sim 4b^{\xi}(w)(z-w)^{-2}, \qquad \xi = x, y, h. \]
The image of $C^{\beta\gamma b}$ in $gr(\mathcal{W}_{hor})$ lies in $(gr(\mathcal{W}_{hor}))^{\mathfrak{sl}_2[t]}$, but there is no “quantum correction” $\omega$ of lower polynomial degree, such that $C^{\beta\gamma b} + \omega \in \mathcal{W}_{bas}$. This shows that the map $gr(\mathcal{W}_{bas}) \hookrightarrow (gr(\mathcal{W}_{hor}))^{\mathfrak{sl}_2[t]}$ fails to be surjective, so we cannot reconstruct $\mathcal{W}_{bas}$ from $(gr(\mathcal{W}_{hor}))^{\mathfrak{sl}_2[t]}$ in a naive way, using Lemma 5.1. Define $\mathcal{C}$ to be the vertex subalgebra of $\mathcal{W}_{bas}$ generated by $v^x$, $v^y$, $v^h$, $K$, $Q^{\beta b}$, $C^{\gamma bb}$, $C^{\beta bb}$, and $C^{bbb}$. Clearly $C^{bbb}$ commutes with the other generators of $\mathcal{C}$, and the
following OPE relations are easy to verify:
\[ K(z)\, Q^{\beta b}(w) \sim 0, \tag{42} \]
\[ v^h(z)K(w) \sim K(w)(z-w)^{-1}, \quad v^x(z)K(w) \sim 0, \quad v^y(z)K(w) \sim -Q^{\beta b}(w)(z-w)^{-1}, \tag{43} \]
\[ v^h(z)Q^{\beta b}(w) \sim -Q^{\beta b}(w)(z-w)^{-1}, \quad v^x(z)Q^{\beta b}(w) \sim -K(w)(z-w)^{-1}, \quad v^y(z)Q^{\beta b}(w) \sim 0, \tag{44} \]
\[ v^h(z)C^{\gamma bb}(w) \sim C^{\gamma bb}(w)(z-w)^{-1}, \quad v^x(z)C^{\gamma bb}(w) \sim 0, \quad v^y(z)C^{\gamma bb}(w) \sim C^{\beta bb}(w)(z-w)^{-1}, \tag{45} \]
\[ v^h(z)C^{\beta bb}(w) \sim -C^{\beta bb}(w)(z-w)^{-1}, \quad v^x(z)C^{\beta bb}(w) \sim C^{\gamma bb}(w)(z-w)^{-1}, \quad v^y(z)C^{\beta bb}(w) \sim 0, \tag{46} \]
\[ K(z)C^{\beta bb}(w) \sim -3C^{bbb}(w)(z-w)^{-1}, \quad Q^{\beta b}(z)C^{\gamma bb}(w) \sim -3C^{bbb}(w)(z-w)^{-1}, \tag{47} \]
\[ C^{\gamma bb}(z)C^{\beta bb}(w) \sim 0, \quad C^{\gamma bb}(z)C^{\gamma bb}(w) \sim 0, \quad C^{\beta bb}(z)C^{\beta bb}(w) \sim 0. \tag{48} \]
Note that each term in the OPE of any pair of these generators is linear in the generators, so $\mathcal{C}$ is a homomorphic image of a current algebra associated to an 8-dimensional Lie superalgebra $\mathfrak{s}$. We can obtain $\mathfrak{s}$ as an extension of $\mathfrak{sl}_2$ in two steps. First, let $M$ denote the irreducible, 2-dimensional $\mathfrak{sl}_2$-module with basis $m_{-1}, m_1$, regarded as an odd vector space. Define a Lie superalgebra $\tilde{\mathfrak{s}} = \tilde{\mathfrak{s}}_0 \oplus \tilde{\mathfrak{s}}_1$ to be the semidirect product $\mathfrak{sl}_2 \ltimes M$. In other words, $\tilde{\mathfrak{s}}_0 = \mathfrak{sl}_2$ and $\tilde{\mathfrak{s}}_1 = M$, and the following super-commutators are satisfied:
\[ [h, m_{-1}] = -m_{-1}, \quad [h, m_1] = m_1, \quad [x, m_{-1}] = m_1, \quad [x, m_1] = 0, \quad [y, m_{-1}] = 0, \quad [y, m_1] = m_{-1}, \]
\[ [m_1, m_1] = 0, \quad [m_1, m_{-1}] = 0, \quad [m_{-1}, m_{-1}] = 0. \]
Next, let $N = N_0 \oplus N_1$ be a super $\tilde{\mathfrak{s}}$-module, where $N_0$ is a copy of the 2-dimensional $\mathfrak{sl}_2$-module with basis $n_{-1}, n_1$, and $N_1$ is a copy of the trivial, 1-dimensional $\mathfrak{sl}_2$-module with basis $n_0$. The action of $\tilde{\mathfrak{s}}$ on $N$ is given as follows:
\[ [h, n_{-1}] = -n_{-1}, \quad [h, n_1] = n_1, \quad [x, n_{-1}] = n_1, \quad [x, n_1] = 0, \quad [y, n_{-1}] = 0, \quad [y, n_1] = n_{-1}, \]
\[ [h, n_0] = 0, \quad [x, n_0] = 0, \quad [y, n_0] = 0, \]
\[ [m_1, n_1] = 0, \quad [m_1, n_{-1}] = -3n_0, \quad [m_{-1}, n_1] = -3n_0, \quad [m_1, n_0] = 0, \quad [m_{-1}, n_0] = 0. \]
Define $\mathfrak{s}$ to be the semidirect product Lie superalgebra $\tilde{\mathfrak{s}} \ltimes N$, and define a symmetric, invariant bilinear form $B$ on $\mathfrak{s}$ by extending the form $-\frac{3}{8}\kappa$ on the subalgebra $\tilde{\mathfrak{s}}_0 \subset \mathfrak{s}$ trivially. Let $\mathcal{F}$ denote the corresponding super current algebra $\mathcal{O}(\mathfrak{s}, B)$, with generators $X^u$, $u \in \mathfrak{s}$. By (42)-(48) and the fact that $v^x, v^y, v^h$ generate a copy of $\mathcal{O}(\mathfrak{sl}_2, -\frac{3}{8}\kappa)$, the map $\mathcal{F} \to \mathcal{C}$ sending
\[ X^h \mapsto v^h, \quad X^x \mapsto v^x, \quad X^y \mapsto v^y, \quad X^{m_1} \mapsto K, \quad X^{m_{-1}} \mapsto Q^{\beta b}, \tag{49} \]
\[ X^{n_1} \mapsto C^{\gamma bb}, \quad X^{n_{-1}} \mapsto C^{\beta bb}, \quad X^{n_0} \mapsto C^{bbb}, \tag{50} \]
is a vertex algebra homomorphism.
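The first-order poles in (43)-(44) involve only single contractions between a field that is quadratic in $\beta, \gamma$ and a field that is linear in $\beta$ or $\gamma$, so they can be cross-checked with a classical bracket. The sketch below is a consistency check under stated assumptions rather than an OPE computation: it assumes the normalization $\beta^{\xi}(z)\gamma^{\xi}(w) \sim (z-w)^{-1}$, treats the $b^{\xi}$ as passive commuting symbols (harmless here because $K$ and $Q^{\beta b}$ are linear in them), and uses the hypothetical variable names `Bh`, `Gh`, `bh` for $\beta^h$, $\gamma^h$, $b^h$, and similarly for $x, y$.

```python
from fractions import Fraction

def poly(*terms):
    """Build a polynomial from (coefficient, 'v1 v2 ...') pairs."""
    out = {}
    for coeff, word in terms:
        key = tuple(sorted(word.split()))
        out[key] = out.get(key, Fraction(0)) + Fraction(coeff)
    return {k: v for k, v in out.items() if v != 0}

def add(f, g, s=1):
    """Return f + s * g."""
    out = dict(f)
    for k, v in g.items():
        out[k] = out.get(k, Fraction(0)) + s * v
    return {k: v for k, v in out.items() if v != 0}

def mul(f, g):
    out = {}
    for k1, v1 in f.items():
        for k2, v2 in g.items():
            key = tuple(sorted(k1 + k2))
            out[key] = out.get(key, Fraction(0)) + v1 * v2
    return {k: v for k, v in out.items() if v != 0}

def diff(f, var):
    """Partial derivative with respect to one variable."""
    out = {}
    for mono, coeff in f.items():
        n = mono.count(var)
        if n:
            rest = list(mono)
            rest.remove(var)
            key = tuple(rest)
            out[key] = out.get(key, Fraction(0)) + n * coeff
    return out

def bracket(f, g):
    """Classical bracket with {B_xi, G_xi} = 1 for xi in (h, x, y)."""
    total = {}
    for xi in "hxy":
        total = add(total, mul(diff(f, "B" + xi), diff(g, "G" + xi)))
        total = add(total, mul(diff(f, "G" + xi), diff(g, "B" + xi)), -1)
    return total

v_h = poly((1, "Bh Gh"), (1, "Bx Gx"), (1, "By Gy"))
v_x = poly((Fraction(1, 2), "Gh Gh"), (Fraction(1, 2), "Gx Gy"))
v_y = poly((Fraction(-1, 2), "Bh Bh"), (-2, "Bx By"))
K = poly((1, "Gh bh"), (1, "Gx bx"), (1, "Gy by"))
Q = poly((1, "Bh bh"), (2, "Bx by"), (2, "By bx"))

print(bracket(v_y, K) == add({}, Q, -1))  # compare with (43)
print(bracket(v_x, Q) == add({}, K, -1))  # compare with (44)
```

The same bracket applied to `v_x` and `v_y` returns `v_h`, matching the commutator $[x, y] = h$; the second-order pole of $v^x(z)v^y(w)$, which comes from double contractions, is not modelled.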
Using the classical relation (30), we can write down certain normally ordered polynomial relations among the generators of $\mathcal{C}$:
\[ :Q^{\beta b}C^{\gamma bb}: + :v^h C^{bbb}: + \partial C^{bbb}, \tag{51} \]
\[ :K C^{\gamma bb}: + 2:v^x C^{bbb}:, \tag{52} \]
\[ :Q^{\beta b}C^{\beta bb}: + 2:v^y C^{bbb}:, \tag{53} \]
\[ :K C^{\beta bb}: - :v^h C^{bbb}: + \partial C^{bbb}. \tag{54} \]
On the other hand, it is immediate from Theorem 7.1 that there are no nontrivial relations in $\mathcal{W}$ of the form
\[ :f C^{\gamma bb}: + :g C^{\beta bb}: + \cdots, \tag{55} \]
where $f, g$ are homogeneous, normally ordered polynomials in $v^x, v^y, v^h$ of the same degree, and $(\cdots)$ denotes terms in $\mathcal{W}$ of lower polynomial degree. Note, however, that there does exist a relation
\[ :v^y C^{\gamma bb}: - \frac{1}{2}:v^h C^{\beta bb}: + \frac{1}{4}:Q^{\beta b}C^{\beta\gamma b}: - \frac{1}{2}\partial C^{\beta bb} - \tilde{C}, \]
where $\tilde{C} = :(\partial\beta^h) b^x b^y: + :(\partial\beta^x) b^y b^h: - :(\partial\beta^y) b^x b^h:$. This does not contradict the above statement, however, because the terms $:Q^{\beta b}C^{\beta\gamma b}:$, $:v^y C^{\gamma bb}:$ and $:v^h C^{\beta bb}:$ all have polynomial degree 5. Similarly, it follows from (32) that there are no nontrivial relations in $\mathcal{W}$ of the form
\[ :f K: + :g Q^{\beta b}: + \cdots, \tag{56} \]
where $f, g$ are homogeneous, normally ordered polynomials in $v^x, v^y, v^h$ of the same degree, and $(\cdots)$ denotes terms of lower polynomial degree. These observations will be important later.
Next, we observe that $\mathcal{C}$ is actually a subcomplex of $\mathcal{W}_{bas}$ under the differential $D(0)$. Since $K$ lies in $\mathcal{C}$, it is enough to show that $J(0)$ preserves $\mathcal{C}$, which is clear from the following OPEs:
\[ J(z)K(w) \sim 0, \quad J(z)Q^{\beta b}(w) \sim 0, \quad J(z)v^u(w) \sim 0, \quad u = x, y, h, \tag{57} \]
\[ J(z)C^{\gamma bb}(w) \sim K(w)(z-w)^{-2} + \left( :v^h K: - 2:v^x Q^{\beta b}: - \partial K \right)(w)(z-w)^{-1}, \tag{58} \]
\[ J(z)C^{\beta bb}(w) \sim -Q^{\beta b}(w)(z-w)^{-2} + \left( 2:v^y K: + :v^h Q^{\beta b}: + \partial Q^{\beta b} \right)(w)(z-w)^{-1}, \tag{59} \]
\[ J(z)C^{bbb}(w) \sim -:K Q^{\beta b}:(w)(z-w)^{-1}. \tag{60} \]
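The relation involving $\tilde{C}$ noted earlier in this section has three terms of polynomial degree 5, and these must cancel on their own at the level of symbols. The sketch below checks that cancellation in a toy supercommutative ring in which the $\beta$ and $\gamma$ symbols commute and the $b$ symbols anticommute. It is an illustration under assumptions: the lower-degree corrections $-\frac{1}{2}\partial C^{\beta bb} - \tilde{C}$ are not modelled, the variable names (`Bh`, `Gh`, `bh`, ...) are hypothetical, and the formulas for the generators are those of (33)-(40).

```python
from fractions import Fraction

def sort_sign(seq):
    """Sign of the permutation that sorts a sequence of distinct items."""
    sign = 1
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            if seq[i] > seq[j]:
                sign = -sign
    return sign

def normalize(even_vars, odd_vars):
    """Return (sign, key); the sign is 0 when an odd variable is repeated."""
    if len(set(odd_vars)) < len(odd_vars):
        return 0, None
    return sort_sign(odd_vars), (tuple(sorted(even_vars)), tuple(sorted(odd_vars)))

def poly(*terms):
    """Build an element from (coefficient, word) pairs; names starting with 'b' are odd."""
    out = {}
    for coeff, word in terms:
        names = word.split()
        sign, key = normalize([n for n in names if n[0] != "b"],
                              [n for n in names if n[0] == "b"])
        if sign:
            out[key] = out.get(key, Fraction(0)) + sign * Fraction(coeff)
    return {k: v for k, v in out.items() if v != 0}

def mul(f, g):
    out = {}
    for (e1, o1), c1 in f.items():
        for (e2, o2), c2 in g.items():
            sign, key = normalize(e1 + e2, o1 + o2)
            if sign:
                out[key] = out.get(key, Fraction(0)) + sign * c1 * c2
    return {k: v for k, v in out.items() if v != 0}

def add(f, g, s=1):
    """Return f + s * g."""
    out = dict(f)
    for k, v in g.items():
        out[k] = out.get(k, Fraction(0)) + s * v
    return {k: v for k, v in out.items() if v != 0}

v_h = poly((1, "Bh Gh"), (1, "Bx Gx"), (1, "By Gy"))
v_y = poly((Fraction(-1, 2), "Bh Bh"), (-2, "Bx By"))
Q = poly((1, "Bh bh"), (2, "Bx by"), (2, "By bx"))
C_gbb = poly((-1, "Gh bx by"), (Fraction(1, 2), "Gx bx bh"), (Fraction(-1, 2), "Gy by bh"))
C_Bbb = poly((1, "Bh bx by"), (1, "Bx by bh"), (-1, "By bx bh"))
C_Bgb = poly((-1, "Bh Gx bx"), (1, "Bh Gy by"), (-2, "Bx Gh by"),
             (1, "Bx Gx bh"), (2, "By Gh bx"), (-1, "By Gy bh"))

leading = add(mul(v_y, C_gbb), mul(v_h, C_Bbb), Fraction(-1, 2))
leading = add(leading, mul(Q, C_Bgb), Fraction(1, 4))
print(leading == {})  # the degree-5 symbols cancel
```

Since the three products cancel only in combination, none of them can be dropped, which is consistent with the remark that this relation is not of the form (55).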
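It is amusing to note that $J(0)$ acts nonlinearly on the generators of $\mathcal{C}$, so its action on $\mathcal{C}$ does not come from a differential Lie superalgebra structure on $\mathfrak{s}$. We conjecture that $\mathcal{W}_{bas} = \mathcal{C}$, although we are unable to prove this at present. Let $\mathcal{C}^{(k)} = \mathcal{C} \cap \mathcal{W}^{(k)}_{bas}$, which is the homogeneous subspace of $\mathcal{C}$ of b-number $k$. Since $\mathcal{W}^{(0)}_{bas} = \mathcal{S}^{\mathfrak{sl}_2[t]} = \mathcal{C}^{(0)}$ by Theorem 7.2, our conjecture clearly holds for b-number zero. It will suffice for our purposes to show that $\mathcal{W}^{(1)}_{bas} = \mathcal{C}^{(1)}$ as well. First, we need a characterization of $\mathcal{W}^{(1)}_{bas}$ which is analogous to the characterization of $\mathcal{S}^{\mathfrak{sl}_2[t]}$ given by Lemma 6.1.
Lemma 7.5. We have $\mathcal{W}^{(1)}_{bas} = \mathcal{W}^{(1)}_{hor} \cap Ker(J(0))$.
Proof. For any simple Lie algebra $\mathfrak{g}$, the inclusion
\[ \mathcal{W}(\mathfrak{g})^{(1)}_{bas} \subset \mathcal{W}(\mathfrak{g})^{(1)}_{hor} \cap Ker(J(0)) \tag{61} \]
is equivalent to the injectivity of the map
\[ \mathcal{S}(\mathfrak{g})^{\mathfrak{g}[t]} \to H^*(\mathcal{W}(\mathfrak{g})_{bas}, J(0)) \tag{62} \]
given by Lemma 6.1. In the case $\mathfrak{g} = \mathfrak{sl}_2$, we have $\mathcal{S}^{\mathfrak{sl}_2[t]} \cong \mathcal{O}(\mathfrak{sl}_2, -\frac{3}{8}\kappa)$, which is a simple vertex algebra, so the injectivity of $\mathcal{S}^{\mathfrak{sl}_2[t]} \to H^*(\mathcal{W}_{bas}, J(0))$ is apparent. We expect (although we are unable to prove) that the map (62) is injective for any simple $\mathfrak{g}$.
Next, we claim that the opposite inclusion $Ker(J(0)) \cap \mathcal{W}(\mathfrak{g})^{(1)}_{hor} \subset \mathcal{W}(\mathfrak{g})^{(1)}_{bas}$ holds for any simple $\mathfrak{g}$. First, any $\omega \in \mathcal{W}(\mathfrak{g})^{(1)}_{hor}$ can be written in the form
\[ \omega = \sum_{k=0}^{l} \sum_{j=1}^{n} :\partial^k b^{\xi_j} P_k^{\xi_j}:, \]
for some fixed $l$. In this notation, $\{\xi_1, \dots, \xi_n\}$ is an orthonormal basis for $\mathfrak{g}$ relative to the Killing form, and each $P_k^{\xi_j}$ lies in $\mathcal{S}(\mathfrak{g})$. We need the following computations:
\[ \sum_{i=1}^{n} :c^{\xi_i}\Theta_E^{\xi_i}: \circ_0\, b^{\xi_j} = 2\Theta_E^{\xi_j}, \qquad \frac{1}{m!}\, \Theta_E^{\xi_i} \circ_m \partial^k b^{\xi_j} = \binom{k}{m} \partial^{k-m} b^{[\xi_i, \xi_j]}. \]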
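Using this calculation, we obtain
\[ J \circ_0 \omega = \sum_{j=1}^{n} \sum_{k=0}^{l} :\partial^k \Theta_W^{\xi_j} P_k^{\xi_j}: + \sum_{i,j=1}^{n} \sum_{k=0}^{l} \sum_{m \ge 0} \frac{1}{m!} :\partial^m c^{\xi_i}\, \partial^k b^{\xi_j} \left( \Theta_S^{\xi_i} \circ_m P_k^{\xi_j} \right):. \]
The term $\sum_{j=1}^{n} \sum_{k=0}^{l} :\partial^k \Theta_W^{\xi_j} P_k^{\xi_j}:$ can be rewritten in the form
\[ \begin{aligned} & \sum_{j=1}^{n} \sum_{k=0}^{l} :\partial^k \Theta_S^{\xi_j} P_k^{\xi_j}: + \sum_{i,j=1}^{n} \sum_{k=0}^{l} \sum_{m=0}^{k} \binom{k}{m} :\left( \partial^{k-m} b^{[\xi_j, \xi_i]} \right) \partial^m c^{\xi_i} P_k^{\xi_j}: \\ &= \sum_{j=1}^{n} \sum_{k=0}^{l} :\partial^k \Theta_S^{\xi_j} P_k^{\xi_j}: + \sum_{i,j=1}^{n} \sum_{k=0}^{l} \sum_{m=0}^{k} \binom{k}{m} :\partial^m c^{\xi_i}\, \partial^{k-m} b^{[\xi_i, \xi_j]} P_k^{\xi_j}: \\ &= \sum_{j=1}^{n} \sum_{k=0}^{l} :\partial^k \Theta_S^{\xi_j} P_k^{\xi_j}: + \sum_{i,j=1}^{n} \sum_{k=0}^{l} \sum_{m=0}^{k} \frac{1}{m!} :\partial^m c^{\xi_i} \left( \Theta_E^{\xi_i} \circ_m \partial^k b^{\xi_j} \right) P_k^{\xi_j}:. \end{aligned} \]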
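Collecting terms, we have
\[ J \circ_0 \omega = \sum_{j=1}^{n} \sum_{k=0}^{l} :\partial^k \Theta_S^{\xi_j} P_k^{\xi_j}: + \sum_{i=1}^{n} \sum_{m \ge 0} \frac{1}{m!} :\partial^m c^{\xi_i} \left( \Theta_W^{\xi_i} \circ_m \omega \right):. \tag{63} \]
Since $J \circ_0 \omega = 0$, and the term $\sum_{j=1}^{n} \sum_{k=0}^{l} :\partial^k \Theta_S^{\xi_j} P_k^{\xi_j}:$ does not depend on $c^{\xi_i}$ and its derivatives, we conclude that $\Theta_W^{\xi_i} \circ_m \omega = 0$ for all $m \ge 0$. Hence $\omega$ lies in $\mathcal{W}(\mathfrak{g})_{bas}$.
Remark 7.6. The characterization $\mathcal{W}^{(k)}_{bas} = \mathcal{W}^{(k)}_{hor} \cap Ker(J(0))$ fails for $k \ge 2$; see for example (58)–(60).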
Theorem 7.7. For $\mathfrak{g} = \mathfrak{sl}_2$, $\mathcal{W}^{(1)}_{bas} = \mathcal{C}^{(1)}$, and hence has a basis consisting of normally ordered polynomials in $v^x$, $v^h$, $v^y$, $Q^{\beta b}$, $K$ and their derivatives, which are linear in $Q^{\beta b}$ and $K$, and their derivatives. In particular, $\mathcal{W}^{(1)}_{bas}$ is homogeneous of even polynomial degree.
Proof. Recall that
\[ gr(\mathcal{W}_{hor}) \cong Sym\Big( \bigoplus_{k \ge 0} (V_k \oplus V_k^*) \Big) \otimes \bigwedge\Big( \bigoplus_{k \ge 0} U_k \Big), \]
where $V_k$ and $U_k$ are the copies of $\mathfrak{g} = \mathfrak{sl}_2$ with bases $\{\beta_k^{\xi} \mid \xi = x, y, h\}$ and $\{b_k^{\xi} \mid \xi = x, y, h\}$, respectively, and $V_k^*$ is the copy of $\mathfrak{g}^*$ with basis $\{\gamma_k^{\xi} \mid \xi = x, y, h\}$. We are interested in the subspace $gr(\mathcal{W}^{(1)}_{hor})$ of b-number 1, which is linear in the variables $b_k^{\xi}$. Since this subspace is linear in these variables, it does not matter whether they are regarded as even or odd variables. So $gr(\mathcal{W}^{(1)}_{hor})$ is isomorphic to the subspace of
\[ Sym\Big( \bigoplus_{k \ge 0} (V_k \oplus V_k^* \oplus U_k) \Big) = \mathcal{O}\big( J_\infty(V^* \oplus V \oplus U^*) \big), \]
which is linear in the generators of $\mathcal{O}(J_\infty(U^*))$. Here $V = \mathfrak{g}$ and $U^*$, $V^*$ are copies of $\mathfrak{g}^*$. By Theorem 6.2, $\mathcal{O}(J_\infty(V^* \oplus V \oplus U^*))^{\mathfrak{sl}_2[t]}$ is generated as a differential algebra by $\mathcal{O}(V^* \oplus V \oplus U^*)^{\mathfrak{sl}_2}$, which by Theorem 7.1 is generated by six quadratics, $q_{VV}$, $q_{VV^*}$, $q_{V^*V^*}$, $q_{VU^*}$, $q_{V^*U^*}$, $q_{U^*U^*}$, and one cubic $c_{VV^*U^*}$. The subspace which is linear in the generators of $\mathcal{O}(J_\infty(U^*))$ is the $\mathbb{C}[q_{VV}, q_{VV^*}, q_{V^*V^*}]$-module generated by $q_{VU^*}$, $q_{V^*U^*}$ and $c_{VV^*U^*}$, and their derivatives. (In this notation, $\mathbb{C}[q_{VV}, q_{VV^*}, q_{V^*V^*}]$ denotes the algebra generated by $q_{VV}$, $q_{VV^*}$, $q_{V^*V^*}$, and their derivatives.) Recall the projections $\phi_d: \mathcal{W}_{(d)} \to \mathcal{W}_{(d)}/\mathcal{W}_{(d-1)} \subset gr(\mathcal{W})$. Define
\[ v_m^x = \phi_2(\partial^m v^x), \qquad v_m^y = \phi_2(\partial^m v^y), \qquad v_m^h = \phi_2(\partial^m v^h), \tag{64} \]
\[ Q_m^{\beta b} = \phi_2(\partial^m Q^{\beta b}), \qquad K_m = \phi_2(\partial^m K), \qquad C_m^{\beta\gamma b} = \phi_3(\partial^m C^{\beta\gamma b}). \tag{65} \]
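Clearly $q_{VV}$, $q_{VV^*}$, $q_{V^*V^*}$, $q_{VU^*}$, $q_{V^*U^*}$, and $c_{VV^*U^*}$ correspond to $v_0^x$, $v_0^h$, $v_0^y$, $K_0$, $Q_0^{\beta b}$, and $C_0^{\beta\gamma b}$, respectively. It follows that $(gr(\mathcal{W}^{(1)}_{hor}))^{\mathfrak{sl}_2[t]}$ is a module over $gr(\mathcal{S}^{\mathfrak{sl}_2[t]}) = \mathbb{C}[v_m^x, v_m^y, v_m^h \mid m \ge 0]$, with generators $K_n$, $Q_n^{\beta b}$, and $C_n^{\beta\gamma b}$, for $n \ge 0$. Since $\mathcal{W}_{bas} = (\mathcal{W}_{hor})^{\mathfrak{sl}_2[t]}$, we have an injective linear map
\[ gr\big( \mathcal{W}^{(1)}_{bas} \big) \hookrightarrow \big( gr\big( \mathcal{W}^{(1)}_{hor} \big) \big)^{\mathfrak{sl}_2[t]}. \]
Recall that this map is not surjective, since $C_0^{\beta\gamma b}$ does not lie in the image, so we cannot reconstruct $\mathcal{W}^{(1)}_{bas}$ in a naive way. However, for any $\omega \in \mathcal{W}^{(1)}_{bas}$ of polynomial degree $d$, $\phi_d(\omega) \in gr(\mathcal{W})$ is a polynomial in $v_m^x$, $v_m^y$, $v_m^h$, $K_m$, $Q_m^{\beta b}$, and $C_m^{\beta\gamma b}$ for $m \ge 0$. It follows that $\omega$ can be written in the form
\[ \omega = \omega_d + \omega_<, \]
where the leading term $\omega_d$ is a normally ordered polynomial in $v^x$, $v^y$, $v^h$, $K$, $Q^{\beta b}$, $C^{\beta\gamma b}$, and their derivatives, and $\omega_< \in \mathcal{W}_{(d-2)}$. Since the operators $\Theta_W^{\xi}(k)$, $k \ge 0$, on $\mathcal{W}_{hor}$ either preserve polynomial degree, or lower it by 2, we have $\mathcal{W}_{bas} = \mathcal{W}_{bas}^{even} \oplus \mathcal{W}_{bas}^{odd}$, where $\mathcal{W}_{bas}^{even}$ and $\mathcal{W}_{bas}^{odd}$ are homogeneous of even and odd polynomial degrees, respectively. We can therefore deal with the odd and even cases separately. Suppose first that $\omega \in \mathcal{W}^{(1)}_{bas}$ and has even polynomial degree $d$. Then $\omega_d$ is a normally ordered polynomial in $v^x$, $v^y$, $v^h$, $K$, $Q^{\beta b}$ and their derivatives, so $\omega_d \in \mathcal{D}$ by definition, where $\mathcal{D}$ denotes the subalgebra of $\mathcal{W}_{bas}$ generated by $v^x$, $v^y$, $v^h$, $K$, and $Q^{\beta b}$. In particular, $\omega_d \in \mathcal{W}^{(1)}_{bas}$, so $\omega_<$ must also lie in $\mathcal{W}^{(1)}_{bas}$. Since $\omega_<$ lies in $\mathcal{W}_{(d-2)}$, it follows by induction on $d$ that $\omega \in \mathcal{D}$. Next, suppose that $\omega$ has odd polynomial degree $d$. Then the leading term $\omega_d$ is of the form
\[ \sum_{k=0}^{l} :P_k\, \partial^k C^{\beta\gamma b}: \]
for some fixed $l$, where each $P_k$ lies in $\mathcal{S}^{\mathfrak{sl}_2[t]}$, and $P_l \ne 0$. Unlike the case where $d$ is even, $\omega_d$ does not lie in $\mathcal{W}^{(1)}_{bas}$, since $\Theta_W^{\xi}(l+1)(\omega_d) = 4(l+1)!\, :P_l b^{\xi}:$, for $\xi = x, y, h$. Recall the vertex operator $F = -(:b^x c^x: + :b^y c^y: + :b^h c^h:)$, which is part of the TVA structure given by (14). We calculate
\[ F(z)\, C^{\beta\gamma b}(w) \sim -C^{\beta\gamma b}(w)(z-w)^{-1}, \]
which implies that $F(l)(\partial^l C^{\beta\gamma b}) = -l!\, C^{\beta\gamma b}$, and $F(l)(\partial^k C^{\beta\gamma b}) = 0$ for $0 \le k < l$. Since $F(l)$ acts trivially on each $P_k$, we have
\[ -\frac{1}{l!} F(l) \Big( \sum_{k=0}^{l} :P_k\, \partial^k C^{\beta\gamma b}: \Big) = :P_l\, C^{\beta\gamma b}:. \]
Since $F(k)(\mathcal{W}_{bas}) \subset \mathcal{W}_{bas}$ for all $k \ge 0$, we can assume without loss of generality that $l = 0$, so that $\omega_d$ is of the form $:P C^{\beta\gamma b}:$, with $P \in \mathcal{S}^{\mathfrak{sl}_2[t]}$. Next, we compute
\[ J(0)(C^{\beta\gamma b}) = -2:v^h v^h: - 8:v^x v^y: + 4\omega_W + 2\partial v^h, \]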
where ωW is the Virasoro element in W(2) given by (7). Since J (0) annihilates P, we have
$$J(0)(\omega_d) = \; :P\big(-2 :v^h v^h: - 8 :v^x v^y:\big): + \cdots,$$
where $(\cdots)$ lies in $W_{(d-1)}$. Hence $\phi_{d+1}(J(0)(\omega_d))$ is a nonzero element of $\mathrm{gr}(W)$ of polynomial degree $d+1$. Since $J(0)$ can raise the polynomial degree by at most 1, and $\omega_{<}$ has polynomial degree at most $d-2$, $J(0)(\omega_{<})$ can have polynomial degree at most $d-1$. Hence the term $:P(-2 :v^h v^h: - 8 :v^x v^y:):$ cannot be canceled. In particular, $J(0)(\omega) \neq 0$, which contradicts $\omega \in W_{bas}^{(1)}$, by Lemma 7.5.

8. The Structure of $H^*(W_{bas}, K(0))$

Recall that $W_{bas}$ is a double complex under the commuting differentials $K(0)$ and $J(0)$, whose sum is $D(0)$. Using the description of $W_{bas}^{(0)}$ and $W_{bas}^{(1)}$ given by Theorems 7.2 and 7.7, we will construct an interesting subalgebra of $H^*(W_{bas}, K(0))$. Later, we will use the results in this section together with the spectral sequence of the double complex to study $\mathbf{H}^*_{SU(2)}(\mathbb{C})$. First, since $K(0)(W_{bas}^{(k)}) \subset W_{bas}^{(k+1)}$, we may regrade the complex $(W_{bas}^{*}, K(0))$ by $b$-number. We denote this new complex by $(W_{bas}^{(*)}, K(0))$, and we denote its cohomology by $H^{(*)}(W_{bas}, K(0))$. Clearly we have a linear isomorphism
$$\bigoplus_{k \ge 0} H^{(k)}(W_{bas}, K(0)) \cong \bigoplus_{l \in \mathbb{Z}} H^{l}(W_{bas}, K(0)). \qquad (66)$$
Let $D$ denote the subalgebra of $W_{bas}$ generated by $v^x, v^y, v^h, K$, and $Q^{\beta b}$, which contains both $W_{bas}^{(0)}$ and $W_{bas}^{(1)}$. Clearly $(D^{*}, K(0))$ and $(D^{(*)}, K(0))$ are subcomplexes of $(W_{bas}^{*}, K(0))$ and $(W_{bas}^{(*)}, K(0))$, respectively, where $D^{(k)} = D \cap W_{bas}^{(k)}$. There is a similar linear isomorphism
$$\bigoplus_{k \ge 0} H^{(k)}(D, K(0)) \cong \bigoplus_{l \in \mathbb{Z}} H^{l}(D, K(0)).$$
Let $\langle\gamma\rangle$ denote the (abelian) subalgebra of $W$ generated by $\gamma^x, \gamma^y, \gamma^h$. It is easy to see from Theorem 7.2 that $\langle\gamma\rangle^{sl_2[t]}$ is just the polynomial algebra generated by $\partial^k v^x$ for $k \ge 0$.

Lemma 8.1. $H^{(k)}(D, K(0))$ vanishes for $k > 0$, and $H^{(0)}(D, K(0))$ is isomorphic to $\langle\gamma\rangle^{sl_2[t]}$.

Proof. Since $K(0)$ preserves the filtration (19), $K(0)$ acts on $\mathrm{gr}(D)$, and we may first consider the cohomology $H^{(k)}(\mathrm{gr}(D), K(0))$. Recall from (64)-(65) that $v^x_m, v^y_m, v^h_m, Q^{\beta b}_m, K_m$ denote the images of $\partial^m v^x, \partial^m v^y, \partial^m v^h, \partial^m Q^{\beta b}, \partial^m K$ in $\mathrm{gr}(D)$ under the map $\phi_2 : D_{(2)} \to D_{(2)}/D_{(1)} \subset \mathrm{gr}(D)$. Clearly $\{v^x_m, v^y_m, v^h_m, Q^{\beta b}_m, K_m \mid m \ge 0\}$ generates $\mathrm{gr}(D)$ as a ring. We will compute $H^{(k)}(\mathrm{gr}(D), K(0))$ by constructing a contracting homotopy on this complex. The action of $K(0)$ on $\mathrm{gr}(W_{hor})$ is defined on the generators $\{\beta^\xi_k, \gamma^\xi_k, b^\xi_k \mid \xi = x, y, h,\ k \ge 0\}$ by
$$\beta^\xi_k \mapsto -b^\xi_k, \qquad \gamma^\xi_k \mapsto 0, \qquad b^\xi_k \mapsto 0.$$
From this, we see that on the generators of $\mathrm{gr}(D)$, $K(0)$ acts as follows:
$$K(0)(v^x_m) = 0, \quad K(0)(v^y_m) = Q^{\beta b}_m, \quad K(0)(v^h_m) = -K_m, \quad K(0)(Q^{\beta b}_m) = 0, \quad K(0)(K_m) = 0.$$
Define a vertex operator $R = :\beta^h c^h: + :\beta^x c^x: + :\beta^y c^y:$, and let $\tilde{R}(0)$ denote the linear map
$$\sum_{k \ge 0} \big(\beta^h(-k-1)c^h(k) + \beta^x(-k-1)c^x(k) + \beta^y(-k-1)c^y(k)\big) \in \mathrm{End}(W_{hor}),$$
which is the sum of all terms of the Fourier mode $R(0)$ which only contain annihilation modes of $c^\xi$ for $\xi = x, y, h$. There is an induced derivation on $\mathrm{gr}(W_{hor})$ which we also denote by $\tilde{R}(0)$, defined on generators by
$$b^\xi_k \mapsto \beta^\xi_k, \qquad \gamma^\xi_k \mapsto 0, \qquad \beta^\xi_k \mapsto 0.$$
It follows that $\tilde{R}(0)$ acts on $\mathrm{gr}(D)$ as follows:
$$\tilde{R}(0)(Q^{\beta b}_m) = -2 v^y_m, \quad \tilde{R}(0)(K_m) = v^h_m, \quad \tilde{R}(0)(v^x_m) = 0, \quad \tilde{R}(0)(v^y_m) = 0, \quad \tilde{R}(0)(v^h_m) = 0.$$
Moreover, as derivations on $\mathrm{gr}(W_{hor})$ we compute $[K(0), \tilde{R}(0)] = S$, where $S$ is the diagonalizable operator defined on generators by
$$S(\beta^\xi_m) = -\beta^\xi_m, \qquad S(b^\xi_m) = -b^\xi_m, \qquad S(\gamma^\xi_m) = 0.$$
It follows that
$$S(v^x_m) = 0, \quad S(v^y_m) = -2 v^y_m, \quad S(v^h_m) = -v^h_m, \quad S(Q^{\beta b}_m) = -2 Q^{\beta b}_m, \quad S(K_m) = -K_m,$$
so $\tilde{R}(0)$ is a contracting homotopy for $K(0)$ outside the (trivial) subcomplex generated by $\{v^x_m \mid m \ge 0\}$. Hence $H^{(k)}(\mathrm{gr}(D), K(0)) \cong \langle\gamma\rangle^{sl_2[t]}$ for $k = 0$, and vanishes for $k > 0$. Finally, we need to compare $H^{(k)}(\mathrm{gr}(D), K(0))$ with $H^{(k)}(D, K(0))$. Since $K(0)$ preserves polynomial degree, $H^{(k)}(\mathrm{gr}(D), K(0))$ has a grading
$$H^{(k)}(\mathrm{gr}(D), K(0)) = \bigoplus_{d \ge 0} H^{(k)}(\mathrm{gr}(D), K(0))_d,$$
where H (k) (gr (D), K (0))d is homogeneous of polynomial degree d. Similarly, H (k) (D, K (0)) admits a filtration H (k) (D, K (0))0 ⊂ H (k) (D, K (0))1 ⊂ H (k) (D, K (0))2 ⊂ · · · , where H (k) (D, K (0))d denotes the subspace of H (k) (D, K (0)) admitting a representative of polynomial degree at most d. There is an injective linear map H (k) (D, K (0))d /H (k) (D, K (0))d−1 → H (k) (gr (D), K (0))d . It follows that H (k) (D, K (0)) vanishes for k > 0. Since gr (D(0) ) ∼ = D(0) ∼ = γ sl2 [t] , the claim follows.
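The contracting-homotopy identity used in the proof of Lemma 8.1 can be checked mechanically on generators. The following is only a sketch: it encodes the linear action of $K(0)$ and $\tilde{R}(0)$ on the five generators of $\mathrm{gr}(D)$ at a fixed index $m$, exactly as listed in the proof, and confirms that $K(0)\tilde{R}(0) + \tilde{R}(0)K(0)$ reproduces the eigenvalues of $S$. The names `K0`, `R0`, `apply_map` and `bracket_on_generator` are illustrative and do not come from the paper.

```python
# Generators of gr(D) at a fixed index m (labels are illustrative).
GENS = ["vx", "vy", "vh", "Q", "K"]

# Each operator sends a generator to a linear combination {generator: coefficient}.
# K(0): vy -> Q, vh -> -K, everything else -> 0.
K0 = {"vx": {}, "vy": {"Q": 1}, "vh": {"K": -1}, "Q": {}, "K": {}}
# R~(0): Q -> -2 vy, K -> vh, everything else -> 0.
R0 = {"vx": {}, "vy": {}, "vh": {}, "Q": {"vy": -2}, "K": {"vh": 1}}


def apply_map(op, vec):
    """Apply a linear map, given on generators, to a linear combination."""
    out = {}
    for gen, coeff in vec.items():
        for target, a in op[gen].items():
            out[target] = out.get(target, 0) + coeff * a
    return {g: c for g, c in out.items() if c != 0}


def bracket_on_generator(gen):
    """Compute (K0 R0 + R0 K0)(gen), the bracket of two odd derivations on a generator."""
    first = apply_map(K0, apply_map(R0, {gen: 1}))
    second = apply_map(R0, apply_map(K0, {gen: 1}))
    total = dict(first)
    for g, c in second.items():
        total[g] = total.get(g, 0) + c
    return {g: c for g, c in total.items() if c != 0}


# Eigenvalues of S on the generators, as stated in the proof.
S_EIGENVALUES = {"vx": 0, "vy": -2, "vh": -1, "Q": -2, "K": -1}

for gen in GENS:
    expected = {gen: S_EIGENVALUES[gen]} if S_EIGENVALUES[gen] != 0 else {}
    assert bracket_on_generator(gen) == expected
```

Since both operators are derivations, agreement on generators is all that needs to be checked; the only generators with eigenvalue zero are the $v^x_m$, which is why the cohomology is concentrated on the subalgebra they generate.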
By Theorems 7.2 and 7.7, both $W_{bas}^{(0)}$ and $W_{bas}^{(1)}$ lie in $D$. It follows from Lemma 8.1 that $H^{(0)}(W_{bas}, K(0)) \cong \langle\gamma\rangle^{sl_2[t]}$ and $H^{(1)}(W_{bas}, K(0)) = 0$. However, we cannot compute $H^{(k)}(W_{bas}, K(0))$ for $k \ge 2$ using this approach, because $\mathrm{gr}(W_{bas})$ is not preserved by $\tilde{R}(0)$. For example, let $C_0^{\gamma bb}$ and $C_0^{\beta\gamma b}$ denote the images of $C^{\gamma bb}$ and $C^{\beta\gamma b}$ in $\mathrm{gr}(W)$. Clearly $C_0^{\gamma bb} \in \mathrm{gr}(W_{bas})$, but $\tilde{R}(0)(C_0^{\gamma bb}) = C_0^{\beta\gamma b}$, which does not live in $\mathrm{gr}(W_{bas})$. As we shall see, $H^{(2)}(W_{bas}, K(0))$ has a very rich structure.

Lemma 8.2. The map $H^{(2)}(C, K(0)) \to H^{(2)}(W_{bas}, K(0))$ induced by the inclusion $C \hookrightarrow W_{bas}$ is injective. Moreover, any nonzero element $\omega \in W_{bas}^{(2)} \cap \mathrm{Ker}(K(0))$ of odd polynomial degree represents a nontrivial class in $H^{(2)}(W_{bas}, K(0))$. By (66), $\omega$ then represents a nontrivial class in $H^{*}(W_{bas}, K(0))$ as well.

Proof. The first statement is clear since $W_{bas}^{(1)} = C^{(1)}$. The second statement is clear because there are no elements of $W_{bas}^{(1)}$ of odd polynomial degree, and $K(0)$ preserves the parity of polynomial degree.
Hence we can construct nontrivial elements of $H^{(2)}(W_{bas}, K(0))$ by finding elements of $C^{(2)} \cap \mathrm{Ker}(K(0))$ of odd polynomial degree. Let $I$ denote the kernel of the map $F \to C$ given by (49)-(50). Since $F$ is a complex under the differential $X^{m_1}(0)$ corresponding to $K(0)$, we obtain a short exact sequence of complexes
$$0 \to I \to F \to C \to 0. \qquad (67)$$
Note that $C^{\gamma bb}$ represents a nontrivial class in $H^{(2)}(C, K(0))$, since it has polynomial degree 3. In fact, $C^{\gamma bb}$ corresponds to $X^{n_1} \in F$, which is easily seen to represent a nontrivial class in $H^{*}(F, X^{m_1}(0))$. So the class of $C^{\gamma bb}$ in $H^{(2)}(C, K(0))$ lies in the image of the map $H^{*}(F, X^{m_1}(0)) \to H^{*}(C, K(0))$. We can obtain new classes in $H^{*}(C, K(0))$ which correspond to elements of $H^{*}(I, X^{m_1}(0))$ via the connecting homomorphism in the long exact sequence of (67). In other words, we seek elements of $F$ which do not map to zero under $X^{m_1}(0)$, but rather, map to elements of $I$. For example, consider
$$:X^{y} X^{n_1}: - \frac{1}{4} :X^{h} X^{n_{-1}}: - \frac{5}{12}\, \partial X^{n_{-1}} \in F.$$
Under $X^{m_1}(0)$, it maps to $:X^{m_{-1}} X^{n_1}: + :X^{h} X^{n_0}: + \partial X^{n_0}$, which is the element of $I$ corresponding to the relation (51). It follows that the corresponding element
$$h_4 = \; :v^y C^{\gamma bb}: - \frac{1}{4} :v^h C^{\beta bb}: - \frac{5}{12}\, \partial C^{\beta bb} \in C^{(2)} \qquad (68)$$
lies in the kernel of $K(0)$. Since $h_4$ has the form (55), it is nonzero, and since it has polynomial degree 5, it represents a nontrivial class in $H^{(2)}(W_{bas}, K(0))$. Note that $h_4$ has weight 4 and cohomology degree $-4$, so it represents a nontrivial class in $H^{-4}(W_{bas}, K(0))$.
More generally, for $n \ge 2$ define
$$h_{2n+2} = \; :(v^y)^n C^{\gamma bb}: - \frac{n}{2n+2} :(v^y)^{n-1} v^h C^{\beta bb}: - \frac{n^2 - n}{2n^2 + 3n + 1} :\partial v^y (v^y)^{n-2} C^{\beta bb}: - \frac{2n^2 + 3n}{4n^2 + 6n + 2} :(v^y)^{n-1} \partial C^{\beta bb}:. \qquad (69)$$
Lemma 8.3. For all $n \ge 2$, $h_{2n+2}$ represents a nontrivial class in $H^{-4n}(W_{bas}, K(0))$ of conformal weight $2n + 2$.

Proof. Clearly $h_{2n+2}$ is homogeneous of weight $2n + 2$ and degree $-4n$. Using the derivation property of $K(0)$ and the relations (51)-(54), the following calculations show that $h_{2n+2}$ lies in the kernel of $K(0)$:
$$\begin{aligned}
K(0)\big(:(v^y)^n C^{\gamma bb}:\big) &= n :(v^y)^{n-1} Q^{\beta b} C^{\gamma bb}: \\
&= -n :(v^y)^{n-1} v^h C^{bbb}: - n :(v^y)^{n-1} \partial C^{bbb}:,
\end{aligned} \qquad (70)$$
$$\begin{aligned}
K(0)\big(:(v^y)^{n-1} v^h C^{\beta bb}:\big) &= (n-1) :(v^y)^{n-2} Q^{\beta b} v^h C^{\beta bb}: - :(v^y)^{n-1} K C^{\beta bb}: - 3 :(v^y)^{n-1} v^h C^{bbb}: \\
&= (n-1) :(v^y)^{n-2} v^h Q^{\beta b} C^{\beta bb}: + (n-1) :(v^y)^{n-2} (\partial Q^{\beta b}) C^{\beta bb}: \\
&\quad - :(v^y)^{n-1} K C^{\beta bb}: - 3 :(v^y)^{n-1} v^h C^{bbb}: \\
&= -2(n-1) :(v^y)^{n-1} v^h C^{bbb}: + 4(n-1) :(v^y)^{n-2} (\partial v^y) C^{bbb}: + (n-1) :(v^y)^{n-2} (\partial Q^{\beta b}) C^{\beta bb}: \\
&\quad - :(v^y)^{n-1} v^h C^{bbb}: + :(v^y)^{n-1} \partial C^{bbb}: - 3 :(v^y)^{n-1} v^h C^{bbb}: \\
&= (-2n-2) :(v^y)^{n-1} v^h C^{bbb}: + 4(n-1) :(v^y)^{n-2} (\partial v^y) C^{bbb}: \\
&\quad + (n-1) :(v^y)^{n-2} (\partial Q^{\beta b}) C^{\beta bb}: + :(v^y)^{n-1} \partial C^{bbb}:,
\end{aligned} \qquad (71)$$
$$\begin{aligned}
K(0)\big(:(v^y)^{n-1} \partial C^{\beta bb}:\big) &= (n-1) :(v^y)^{n-2} Q^{\beta b} \partial C^{\beta bb}: - 3 :(v^y)^{n-1} \partial C^{bbb}: \\
&= (n-1) :(v^y)^{n-2} \partial\big(Q^{\beta b} C^{\beta bb}\big): - (n-1) :(v^y)^{n-2} (\partial Q^{\beta b}) C^{\beta bb}: - 3 :(v^y)^{n-1} \partial C^{bbb}: \\
&= -2(n-1) :(v^y)^{n-2} \partial\big(v^y C^{bbb}\big): - (n-1) :(v^y)^{n-2} (\partial Q^{\beta b}) C^{\beta bb}: - 3 :(v^y)^{n-1} \partial C^{bbb}: \\
&= (-2n-1) :(v^y)^{n-1} \partial C^{bbb}: - 2(n-1) :(v^y)^{n-2} (\partial v^y) C^{bbb}: - (n-1) :(v^y)^{n-2} (\partial Q^{\beta b}) C^{\beta bb}:,
\end{aligned} \qquad (72)$$
$$\begin{aligned}
K(0)\big(:(v^y)^{n-2} \partial v^y C^{\beta bb}:\big) &= (n-2) :(v^y)^{n-3} Q^{\beta b} (\partial v^y) C^{\beta bb}: + :(v^y)^{n-2} (\partial Q^{\beta b}) C^{\beta bb}: - 3 :(v^y)^{n-2} (\partial v^y) C^{bbb}: \\
&= (-2n+1) :(v^y)^{n-2} (\partial v^y) C^{bbb}: + :(v^y)^{n-2} (\partial Q^{\beta b}) C^{\beta bb}:.
\end{aligned} \qquad (73)$$
Finally, since h 2n+2 has odd polynomial degree 2n + 3, it represents a nontrivial class in H (2) (Wbas , K (0)). It follows that h 2n+2 represents a nontrivial class in H −4n (Wbas , K (0)) as well.
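The cancellation asserted in the proof of Lemma 8.3 can be checked by bookkeeping alone. The sketch below takes the final lines of (70)-(73) as given, records their coefficients on the four monomials that occur, and combines them with the coefficients from (69) using exact rational arithmetic. The function name `kernel_residual` and the ordering of the four monomials are choices made here for illustration.

```python
from fractions import Fraction


def kernel_residual(n):
    """Coefficients of K(0)(h_{2n+2}) on the four monomials
    A = :(v^y)^{n-1} v^h C^{bbb}:,            B = :(v^y)^{n-1} dC^{bbb}:,
    C = :(v^y)^{n-2} (dv^y) C^{bbb}:,         D = :(v^y)^{n-2} (dQ^{beta b}) C^{beta bb}:,
    assembled from the last lines of (70)-(73)."""
    # Coefficients appearing in the definition (69) of h_{2n+2}.
    a = Fraction(n, 2 * n + 2)
    b = Fraction(n * n - n, 2 * n * n + 3 * n + 1)
    c = Fraction(2 * n * n + 3 * n, 4 * n * n + 6 * n + 2)

    # Rows: (A, B, C, D) coefficients of each K(0)-image.
    k70 = (-n, -n, 0, 0)                              # K(0) :(v^y)^n C^{gamma bb}:
    k71 = (-2 * n - 2, 1, 4 * (n - 1), n - 1)         # K(0) :(v^y)^{n-1} v^h C^{beta bb}:
    k72 = (0, -2 * n - 1, -2 * (n - 1), -(n - 1))     # K(0) :(v^y)^{n-1} dC^{beta bb}:
    k73 = (0, 0, -2 * n + 1, 1)                       # K(0) :(v^y)^{n-2} dv^y C^{beta bb}:

    return tuple(k70[i] - a * k71[i] - b * k73[i] - c * k72[i] for i in range(4))


if __name__ == "__main__":
    for n in range(2, 8):
        print(n, kernel_residual(n))
```

Every entry of the returned tuple is zero for each $n$ tested, which is the statement that $h_{2n+2}$ lies in the kernel of $K(0)$. The same arithmetic also goes through at $n = 1$, where the coefficient of the third term in (69) vanishes and the formula reduces to $h_4$ in (68).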
Next, we construct additional nontrivial classes in $H^{*}(W_{bas}, K(0))$ by making use of its algebraic structure. Since $v^x$ represents a class in $H^{4}(W_{bas}, K(0))$ of weight zero, we can obtain new elements by taking circle products of $v^x$ with $h_{2n+2}$. For example, define
$$f_3 = v^x \circ_0 h_4 = \; :v^h C^{\gamma bb}: + \frac{2}{3} :v^x C^{\beta bb}: - \frac{5}{3}\, \partial C^{\gamma bb} \in W_{bas}^{(2)} \cap \mathrm{Ker}(K(0)).$$
Since $f_3$ has polynomial degree 5, it represents a nontrivial class in $H^{0}(W_{bas}, K(0))$ of weight 3.

Lemma 8.4. For any integers $n \ge 1$ and $0 \le d \le n$, starting from the normally ordered monomial $\mu = \; :(v^y)^{n-d} (v^h)^d:$ of polynomial degree $2n$, there exists a vertex operator of the form
$$:\mu C^{\gamma bb}: + :f C^{\beta bb}: + \cdots \in W_{bas}^{(2)} \cap \mathrm{Ker}(K(0)) \qquad (74)$$
of polynomial degree $2n + 3$, which represents a nontrivial class in $H^{-4(n-d)}(W_{bas}, K(0))$ of weight $2n - d + 2$. Here $f$ is a normally ordered polynomial in $v^x, v^y, v^h$ of polynomial degree $2n$, and the term $(\cdots)$ has polynomial degree at most $2n + 1$.

Proof. In fact, we will prove slightly more. Not only does the vertex operator (74) exist for all $n$ and $d$, but in the case $d < n$, we will show that there appears in $f$ a monomial of the form $:(v^y)^{n-d-1} (v^h)^{d+1}:$ whose coefficient $\lambda$ lies in the open interval $(-1, 0)$. By Lemma 8.3, the vertex operator (74) exists when $d = 0$ for all $n \ge 1$. Moreover, the coefficient of $:(v^y)^{n-1} (v^h):$ in $f$ is $-\frac{n}{2n+2}$ by (69), which certainly lies in $(-1, 0)$. Assume inductively that the statement (together with our additional assumption on $\lambda$) holds for $n - 1$ and all $d = 0, \dots, n - 1$. Since it holds for $n$ at the value $d = 0$, we may proceed by induction on $d$, so we assume it holds for $d - 1$. Thus our inductive assumption is that there exists a normally ordered polynomial in $W_{bas}^{(2)} \cap \mathrm{Ker}(K(0))$ of the form
$$:(v^y)^{n-d+1} (v^h)^{d-1} C^{\gamma bb}: + \lambda :(v^y)^{n-d} (v^h)^{d} C^{\beta bb}: + \cdots, \qquad \lambda \in (-1, 0),$$
where $(\cdots)$ consists of terms which either have polynomial degree at most $2n + 1$, or do not depend on $C^{\gamma bb}$, and the term $:(v^y)^{n-d} (v^h)^{d} C^{\beta bb}:$ does not appear in $(\cdots)$. Apply the operator $v^x \circ_0$, which preserves $W_{bas}^{(2)} \cap \mathrm{Ker}(K(0))$. Using (44) and the derivation property of $v^x \circ_0$, we obtain the following expression:
$$\begin{aligned}
&(n-d+1) :(v^y)^{n-d} (v^h)^{d} C^{\gamma bb}: - 2(d-1) :(v^y)^{n-d+1} (v^h)^{d-2} v^x C^{\gamma bb}: \\
&\quad + \lambda \Big( (n-d) :(v^y)^{n-d-1} (v^h)^{d+1} C^{\beta bb}: - 2d :(v^y)^{n-d} (v^h)^{d-1} v^x C^{\beta bb}: + :(v^y)^{n-d} (v^h)^{d} C^{\gamma bb}: \Big),
\end{aligned} \qquad (75)$$
modulo terms which either have polynomial degree at most $2n + 1$, or do not depend on $C^{\gamma bb}$. (Also, it is clear that the term $:(v^y)^{n-d-1} (v^h)^{d+1} C^{\beta bb}:$ cannot appear elsewhere in this expression.) Suppose first that $d < n$. The coefficient of $:(v^y)^{n-d} (v^h)^{d} C^{\gamma bb}:$ in (75) is $n - d + 1 + \lambda$, which is clearly nonzero, since $n - d \ge 1$ and $\lambda > -1$. Moreover, the coefficient of $:(v^y)^{n-d-1} (v^h)^{d+1} C^{\beta bb}:$ is $\lambda(n - d)$, and the ratio of these coefficients $\tilde{\lambda} = \frac{\lambda(n-d)}{n-d+1+\lambda}$
clearly lies in the interval $(-1, 0)$, since $\lambda \in (-1, 0)$. So we can divide (75) by $n - d + 1 + \lambda$, obtaining
$$:(v^y)^{n-d} (v^h)^{d} C^{\gamma bb}: + \tilde{\lambda} :(v^y)^{n-d-1} (v^h)^{d+1} C^{\beta bb}: - \frac{2(d-1)}{n-d+1+\lambda} :(v^y)^{n-d+1} (v^h)^{d-2} v^x C^{\gamma bb}:, \qquad (76)$$
modulo terms which either have polynomial degree at most $2n + 1$, or do not depend on $C^{\gamma bb}$. The expression (76) is not quite of the desired form, because the term
$$-\frac{2(d-1)}{n-d+1+\lambda} :(v^y)^{n-d+1} (v^h)^{d-2} v^x C^{\gamma bb}: \qquad (77)$$
has polynomial degree $2n + 3$ and depends on $C^{\gamma bb}$. However, by inductive assumption, there exists a polynomial of the form
$$:(v^y)^{n-d+1} (v^h)^{d-2} C^{\gamma bb}: + :f C^{\beta bb}: + \cdots \qquad (78)$$
in the kernel of $K(0)$, where $f$ has polynomial degree at most $2n - 2$, and $(\cdots)$ has polynomial degree at most $2n - 1$. Taking the Wick product of (78) with $\frac{2(d-1)}{n-d+1+\lambda} v^x$, and adding it to (76) eliminates the term (77), but does not affect the coefficients of either $:(v^y)^{n-d} (v^h)^{d} C^{\gamma bb}:$ or $:(v^y)^{n-d-1} (v^h)^{d+1} C^{\beta bb}:$ in (76). This yields a vertex operator of the form (74) in the case $d < n$.

Finally, suppose that $d = n$, which is easier than the case $d < n$ because there is no coefficient $\tilde{\lambda}$ to worry about. Then (76) becomes
$$:(v^h)^{n} C^{\gamma bb}: - \frac{2(n-1)}{1+\lambda} :(v^y)(v^h)^{n-2} v^x C^{\gamma bb}:,$$
modulo terms which either have polynomial degree at most $2n + 1$, or do not depend on $C^{\gamma bb}$. As above, we can eliminate the term $-\frac{2(n-1)}{1+\lambda} :(v^y)(v^h)^{n-2} v^x C^{\gamma bb}:$ to obtain a vertex operator of the form (74). This completes the induction.
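The only quantitative input to the induction in Lemma 8.4 is that the coefficient $\lambda$ stays inside $(-1, 0)$ at every step. As a rough numerical illustration (not a replacement for the argument), the sketch below iterates the update $\lambda \mapsto \lambda(n-d)/(n-d+1+\lambda)$ starting from the value $-n/(2n+2)$ read off from (69). The function name `lambda_sequence` is an illustrative choice.

```python
from fractions import Fraction


def lambda_sequence(n):
    """Return [lambda_0, ..., lambda_{n-1}] for a fixed n, where lambda_0 = -n/(2n+2)
    comes from (69) and lambda_d = lambda_{d-1} (n-d) / (n-d+1+lambda_{d-1})
    is the ratio of coefficients computed after (75)."""
    lam = Fraction(-n, 2 * n + 2)
    values = [lam]
    for d in range(1, n):
        lam = lam * (n - d) / (n - d + 1 + lam)
        values.append(lam)
    return values


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, [str(v) for v in lambda_sequence(n)])
```

The reason the interval is preserved is elementary: writing $k = n - d \ge 1$, the denominator $k + 1 + \lambda$ exceeds $k$, while the numerator $\lambda k$ lies in $(-k, 0)$, so the quotient lies in $(-1, 0)$.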
Consider the standard monomial basis for the universal enveloping algebra $U(sl_2)$ given by $\{x^r y^s h^t \in U(sl_2) \mid r, s, t \ge 0\}$. For any such monomial $\mu = x^r y^s h^t \in U(sl_2)$, we can associate to it the normally ordered monomial $:(v^x)^r (v^y)^s (v^h)^t: \in S^{sl_2[t]}$, which we also denote by $\mu$. Define $h_\mu \in W_{bas} \cap \mathrm{Ker}(K(0))$ to be the Wick product of $:(v^x)^r:$ with the vertex operator $:(v^y)^s (v^h)^t C^{\gamma bb}: + :f C^{\beta bb}: + \cdots$ given by Lemma 8.4. Clearly $h_\mu$ has degree $4r - 4s$, weight $2s + t + 2$, and is of the form
$$:\mu C^{\gamma bb}: + :g C^{\beta bb}: + \cdots,$$
where $g \in S^{sl_2[t]}$ has polynomial degree at most $2(r + s + t)$, and $(\cdots)$ has polynomial degree at most $2(r + s + t) + 1$. In particular, for $\mu = 1$, we have $h_\mu = C^{\gamma bb}$. The assignment $\mu \mapsto h_\mu$ extends to a linear map $\phi : U(sl_2) \to W_{bas}^{(2)} \cap \mathrm{Ker}(K(0))$, which induces a linear map $\Phi : U(sl_2) \to H^{*}(W_{bas}, K(0))$. It is clear that $\Phi$ maps the eigenspace of $[h, -]$ of eigenvalue $d$ into $H^{2d}(W_{bas}, K(0))$.
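The degree and weight of $h_\mu$ quoted above can be cross-checked against the special cases that appear elsewhere in the text. The following sketch simply encodes the formulas "degree $4r - 4s$, weight $2s + t + 2$" for $\mu = x^r y^s h^t$ and compares them with the values stated for $h_{2n+2}$, for $f_3$, and for the monomials of Lemma 8.4; the function name `degree_and_weight` is illustrative.

```python
def degree_and_weight(r, s, t):
    """Degree and conformal weight of h_mu for mu = x^r y^s h^t, as stated in the text."""
    return 4 * r - 4 * s, 2 * s + t + 2


if __name__ == "__main__":
    # mu = 1 gives C^{gamma bb}: degree 0, weight 2.
    print(degree_and_weight(0, 0, 0))
    # mu = y gives h_4: degree -4, weight 4.
    print(degree_and_weight(0, 1, 0))
    # mu = h gives f_3: degree 0, weight 3.
    print(degree_and_weight(0, 0, 1))
```

In particular, a monomial with $[h, \mu] = d\mu$ has $d = 2r - 2s$, so the degree $4r - 4s$ equals $2d$, in line with the last sentence of the paragraph.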
Lemma 8.5. The map $\Phi : U(sl_2) \to H^{*}(W_{bas}, K(0))$ is injective.

Proof. Let $f$ be a nonzero, homogeneous element of degree $d$ in $U(sl_2)$. Then the leading term of $\phi(f)$ has polynomial degree $2d + 3$, and is of the form $:f C^{\gamma bb}: + :g C^{\beta bb}:$, for some $g \in S^{sl_2[t]}$ of polynomial degree at most $2d$. Since there are no nontrivial relations of the form (55), $\phi(f)$ is nonzero. Finally, since $\phi(f)$ has odd polynomial degree, it must represent a nontrivial class in $H^{*}(W_{bas}, K(0))$.

9. The Structure of $\mathbf{H}^*_{SU(2)}(\mathbb{C})$

In this section, we will use the spectral sequence of the double complex together with our results about $H^{*}(W_{bas}, K(0))$ to study $\mathbf{H}^*_{SU(2)}(\mathbb{C})$. In particular, we will show that each nontrivial element of $H^{*}(W_{bas}, K(0))$ given by Lemma 8.5 gives rise to a nontrivial element of $\mathbf{H}^*_{SU(2)}(\mathbb{C})$. Recall that for each $k$, we have a decreasing filtration
$$\mathbf{H}^{2k}_{SU(2)}(\mathbb{C})_{(0)} \supset \mathbf{H}^{2k}_{SU(2)}(\mathbb{C})_{(2)} \supset \mathbf{H}^{2k}_{SU(2)}(\mathbb{C})_{(4)} \supset \cdots,$$
where $\mathbf{H}^{2k}_{SU(2)}(\mathbb{C})_{(2r)}$ consists of classes admitting a representative of the form $\omega = \sum_{i \ge r} \omega^{(2i)}$, with $\omega^{(2i)} \in W_{bas}^{(2i)}$. We have an injective linear map
$$\mathbf{H}^{2k}_{SU(2)}(\mathbb{C})_{(2l)} / \mathbf{H}^{2k}_{SU(2)}(\mathbb{C})_{(2l+2)} \to H^{2k}(W_{bas}, J(0)), \qquad (79)$$
which is the specialization of (23) to the case $G = SU(2)$.

Lemma 9.1. Suppose that $\omega = \sum_{i \ge 0} \omega^{(2i)}$ is homogeneous with respect to degree and weight, with $\omega^{(2i)} \in W_{bas}^{(2i)}$, and $D(0)(\omega) = 0$. If $\omega^{(0)} \neq 0$, then $\omega$ represents a nontrivial class in $\mathbf{H}^*_{SU(2)}(\mathbb{C})$.
Proof. This is clear from the injectivity of the maps (62) and (79).
Using this result, there are two subalgebras of $\mathbf{H}^*_{SU(2)}(\mathbb{C})$ that we can now describe. The first one is essentially classical. Recall the abelian subalgebra $\langle\gamma\rangle^{sl_2[t]}$ of $S^{sl_2[t]}$, which is just the polynomial algebra with generators $\partial^k v^x$, $k \ge 0$. Since $D(0)$ acts trivially on $\langle\gamma\rangle^{sl_2[t]}$, there is a vertex algebra homomorphism $\langle\gamma\rangle^{sl_2[t]} \to \mathbf{H}^*_{SU(2)}(\mathbb{C})$, which is clearly injective by Lemma 9.1 and the fact that $\langle\gamma\rangle^{sl_2[t]} \subset W_{bas}^{(0)}$. The next theorem is our main result; it describes an interesting and essentially nonclassical subalgebra of $\mathbf{H}^*_{SU(2)}(\mathbb{C})$.
Theorem 9.2. There exists a linear map $\psi : U(sl_2) \to S^{sl_2[t]}$ such that for any $f \in U(sl_2)$, $\phi(f) + \psi(f)$ lies in $\mathrm{Ker}(D(0))$. Moreover, the linear map $\Psi : U(sl_2) \to \mathbf{H}^*_{SU(2)}(\mathbb{C})$ sending $f$ to the class of $\phi(f) + \psi(f)$ is injective.
Proof. Fix a nonzero monomial $\mu \in U(sl_2)$. Since $\phi(\mu) \in W_{bas}^{(2)} \cap \mathrm{Ker}(K(0))$, and $J(0)$ preserves $W_{bas}$ and lowers $b$-number by 1, we have $J(0)(\phi(\mu)) \in W_{bas}^{(1)}$. Moreover, $J(0)(\phi(\mu)) \in \mathrm{Ker}(K(0))$, since $K(0)$ commutes with $J(0)$. Since $W_{bas}^{(1)} \subset D$, and $H^{(1)}(D, K(0)) = 0$ by Lemma 8.1, there is an element $\omega_\mu \in D^{(0)} = S^{sl_2[t]}$ such that $K(0)(\omega_\mu) = -J(0)(\phi(\mu))$. Define the linear map $\psi : U(sl_2) \to S^{sl_2[t]}$ on our monomial basis by $\psi(\mu) = \omega_\mu$. Since $J(0)(\psi(\mu)) = 0$ by Lemma 6.1, we clearly have $D(0)(\phi(\mu) + \psi(\mu)) = 0$.

In order to prove that $\Psi$ is injective, it suffices to prove that $\psi$ is injective, by Lemma 9.1. Let $f$ be a nonzero element of $U(sl_2)$ which is homogeneous of degree $d$. Then $\phi(f)$ has polynomial degree $2d + 3$, and its leading term is of the form $:f C^{\gamma bb}: + :g C^{\beta bb}:$ for some normally ordered polynomial $g$ in $v^x, v^y, v^h$ of polynomial degree at most $2d$. By Lemma 6.1, $J(0)$ annihilates both $f$ and $g$, so by (58) and (59) we have
$$J(0)(\phi(f)) = 2 :f v^h K: - 4 :f v^x Q^{\beta b}: + 2 :g v^y K: + :g v^h Q^{\beta b}:,$$
modulo terms of polynomial degree at most $2d + 1$. Since there are no relations of the form (56), the condition $J(0)(\phi(f)) = 0$ implies that, up to lower polynomial degree corrections,
$$2 :f v^h: + 2 :g v^y: \; = 0, \qquad -4 :f v^x: + :g v^h: \; = 0.$$
This in turn implies that $:\big(:v^h v^h: - 4 :v^x v^y:\big) f: \; = 0$, up to lower polynomial degree corrections. This is impossible by Remark 7.3, so $J(0)(\phi(f)) \neq 0$. Since $K(0)(\psi(f)) = -J(0)(\phi(f))$, it follows that $\psi(f) \neq 0$ as well.

Remark 9.3. It is clear from the construction that for $\mu = x^r y^s h^t$, $\Psi(\mu)$ has degree $4r - 4s$ and conformal weight $2s + t + 2$. For the identity element $1 \in U(sl_2)$, we have
$$\phi(1) + \psi(1) = C^{\gamma bb} + 2 :v^x v^y: + \frac{1}{2} :v^h v^h: - \frac{1}{2}\, \partial v^h. \qquad (80)$$
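This coincides with the element $L$ given by (10), expressed in terms of the generators of $C$. Hence $\Psi(1)$ is precisely the Virasoro class $L \in \mathbf{H}^*_{SU(2)}(\mathbb{C})$. For the sake of illustration, we write down explicit representatives for a few more of the classes in $\mathbf{H}^*_{SU(2)}(\mathbb{C})$ given by Theorem 9.2. For $n \ge 1$, let $H_{2n+2}$ denote $\phi(f) + \psi(f)$ for $f = y^n \in U(sl_2)$, which represents a class of degree $-4n$ and weight $2n + 2$. Similarly, let $F_{n+2}$ denote $\phi(f) + \psi(f)$ for $f = h^n$, which represents a class of degree 0 and weight $n + 2$. The following formulae were verified using Kris Thielemann's Mathematica OPE package [T]:
$$\begin{aligned}
F_3 = {} & :v^h C^{\gamma bb}: + \frac{2}{3} :v^x C^{\beta bb}: - \frac{5}{3}\, \partial C^{\gamma bb} \\
& + \frac{4}{3} :v^y v^x v^h: + \frac{1}{3} :v^h v^h v^h: - \frac{1}{3} :v^h \partial v^h: \\
& - \frac{16}{3} :(\partial v^y) v^x: + \frac{2}{3} :v^y \partial v^x: - \frac{5}{3}\, \partial^2 v^h,
\end{aligned} \qquad (81)$$
$$\begin{aligned}
F_4 = {} & :v^h v^h C^{\gamma bb}: + :(\partial v^h) C^{\gamma bb}: \\
& + :v^h v^x C^{\beta bb}: + \frac{2}{3} :(\partial v^x) C^{\beta bb}: + \frac{1}{3} :v^x \partial C^{\beta bb}: \\
& + \frac{1}{4} :v^h v^h v^h v^h: + :v^h v^h v^x v^y: \\
& + \frac{1}{2} :v^h v^h \partial v^h: + \frac{4}{3} :(\partial v^x) v^y v^h: + \frac{2}{3} :v^x (\partial v^y) v^h: \\
& + \frac{5}{3} :v^x v^y \partial v^h: + 4 :(\partial^2 v^x) v^y: \\
& - 2 :v^x \partial^2 v^y: - \frac{1}{4} :(\partial v^h)(\partial v^h): - \frac{1}{12}\, \partial^3 v^h,
\end{aligned} \qquad (82)$$
$$\begin{aligned}
H_4 = {} & :v^y C^{\gamma bb}: - \frac{1}{4} :v^h C^{\beta bb}: - \frac{5}{12}\, \partial C^{\beta bb} \\
& + :v^x v^y v^y: + \frac{1}{4} :v^h v^h v^y: + \frac{7}{6} :v^h \partial v^y: - \frac{19}{12} :(\partial v^h) v^y: + \frac{1}{12}\, \partial^2 v^y,
\end{aligned} \qquad (83)$$
$$\begin{aligned}
H_6 = {} & :v^y v^y C^{\gamma bb}: - \frac{1}{3} :v^y v^h C^{\beta bb}: - \frac{2}{15} :(\partial v^y) C^{\beta bb}: - \frac{7}{15} :v^y \partial C^{\beta bb}: \\
& + \frac{1}{6} :v^y v^y v^h v^h: + \frac{2}{3} :v^x v^y v^y v^y: \\
& + \frac{2}{15} :(\partial v^y) v^y v^h: - \frac{53}{30} :v^y v^y \partial v^h: + 2 :v^y \partial^2 v^y:,
\end{aligned} \qquad (84)$$
$$\begin{aligned}
H_8 = {} & :v^y v^y v^y C^{\gamma bb}: - \frac{3}{8} :v^y v^y v^h C^{\beta bb}: \\
& - \frac{3}{14} :(\partial v^y) v^y C^{\beta bb}: - \frac{27}{56} :v^y v^y \partial C^{\beta bb}: \\
& + \frac{1}{2} :v^x v^y v^y v^y v^y: + \frac{1}{8} :v^y v^y v^y v^h v^h: \\
& - \frac{103}{56} :v^y v^y v^y \partial v^h: + \frac{3}{28} :(\partial v^y) v^y v^y v^h: + 3 :(\partial^2 v^y) v^y v^y:.
\end{aligned} \qquad (85)$$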
We conjecture that the classes $\{\Psi(f) \mid f \in U(sl_2)\}$, together with the classical element $[v^x] \in \mathbf{H}^*_{SU(2)}(\mathbb{C})[0] \cong H^{*}_{SU(2)}(pt)$ represented by $v^x$, form a strong generating set for $\mathbf{H}^*_{SU(2)}(\mathbb{C})$. If this is the case, $\mathbf{H}^*_{SU(2)}(\mathbb{C})$ has purely even degree. In fact, it is likely that $\mathbf{H}^*_{SU(2)}(\mathbb{C})$ is generated (although not strongly generated) by a much more economical set. The following circle product relations have been verified:
$$H_4 \circ_1 H_4 = \frac{5}{2} H_6, \qquad H_4 \circ_1 H_6 = \frac{112}{45} H_8, \qquad H_4 \circ_1 v^x = -\frac{5}{12} L. \qquad (86)$$
We expect that there exist nonzero constants $c_n$ such that $H_4 \circ_1 H_{2n+2} = c_n H_{2n+4}$. If this holds, all the $H_{2n+2}$ lie in the vertex algebra generated by $H_4$. It is clear from the proof of Lemma 8.4 that all elements of the form $\phi(f)$ for $f \in U(sl_2)$ lie in the vertex algebra generated by $v^x$ and $h_{2n+2}$ for $n \ge 2$. It follows that the classes $\Psi(f) \in \mathbf{H}^*_{SU(2)}(\mathbb{C})$ for $f \in U(sl_2)$ all lie in the vertex algebra generated by $[v^x]$ and $[H_{2n+2}]$ for $n \ge 2$. (Here $[H_{2n+2}]$ denotes the class represented by $H_{2n+2}$.) If all the $H_{2n+2}$ lie in the vertex algebra generated by $H_4$ as suggested by (86), each $\Psi(f)$ lies in the vertex algebra generated by $[v^x]$ and $[H_4]$. Finally, if $\mathbf{H}^*_{SU(2)}(\mathbb{C})$ is indeed strongly generated by $\{[v^x], \Psi(f) \mid f \in U(sl_2)\}$, this would imply that $\mathbf{H}^*_{SU(2)}(\mathbb{C})$ is generated as a vertex algebra by $[v^x]$ and $[H_4]$.

An interesting problem is to compute the graded character
$$\chi(z, q) = \sum_{d \in \mathbb{Z}} \sum_{n \ge 0} \dim \mathbf{H}^{d}_{SU(2)}(\mathbb{C})[n]\, z^d q^n,$$
where $\mathbf{H}^{d}_{SU(2)}(\mathbb{C})[n]$ denotes the subspace of degree $d$ and weight $n$. Even if $\mathbf{H}^*_{SU(2)}(\mathbb{C})$ is strongly generated by $\{[v^x], \Psi(f) \mid f \in U(sl_2)\}$, it is not clear how to compute this character, because there exist normally ordered polynomial relations among these generators and their derivatives. For example, in weight 4 and degree 0, a computer calculation shows that the following expression in $W_{bas}$ is identically zero:
$$:LL: - F_4 - 4 :v^x H_4: + \partial F_3 + \frac{7}{6}\, \partial^2 L.$$
Finally, we expect that there exists an alternative and more geometric construction of the chiral equivariant cohomology, and in particular of $\mathbf{H}^*_{G}(\mathbb{C})$. We hope that the structure described in this paper in the case $G = SU(2)$ may give some hint about where to look for such a construction. A natural place to begin would be to find a geometric interpretation of the element $[H_4] \in \mathbf{H}^{-4}_{SU(2)}(\mathbb{C})[4]$.

10. Appendix

In this Appendix we develop some techniques for studying invariant rings of the form $O(J_\infty(V))^{g[t]}$ and we prove Theorem 6.2.¹ First, we recall some basic facts about jet schemes, following the notation in [EM]. Let $X$ be an irreducible scheme of finite type over $\mathbb{C}$. For each integer $m \ge 0$, the jet scheme $J_m(X)$ is determined by its functor of points: for every $\mathbb{C}$-algebra $A$, we have a bijection
$$\mathrm{Hom}(\mathrm{Spec}(A), J_m(X)) \cong \mathrm{Hom}\big(\mathrm{Spec}(A[t]/\langle t^{m+1}\rangle), X\big).$$
Thus the $\mathbb{C}$-valued points of $J_m(X)$ correspond to the $\mathbb{C}[t]/\langle t^{m+1}\rangle$-valued points of $X$. If $m > p$, we have projections $\pi_{m,p} : J_m(X) \to J_p(X)$ which are compatible when defined: $\pi_{m,p} \circ \pi_{q,m} = \pi_{q,p}$. Clearly $J_0(X) = X$ and $J_1(X)$ is the total tangent space $\mathrm{Spec}(\mathrm{Sym}(\Omega_{X/\mathbb{C}}))$. The assignment $X \mapsto J_m(X)$ is functorial, and a morphism $f : X \to Y$ of schemes induces $f_m : J_m(X) \to J_m(Y)$ for all $m \ge 1$. If $X$ is nonsingular, $J_m(X)$ is irreducible and nonsingular for all $m$. Moreover, if $X, Y$ are nonsingular and $f : Y \to X$ is a smooth surjection, $f_m$ is surjective for all $m$.

¹ The localization technique used to study invariant rings of the form $O(J_m(V))^{g[t]}$ in this section is due to Bailin Song. I thank him for sharing this idea with me.
If $X = \mathrm{Spec}(R)$, where $R = \mathbb{C}[y_1, \dots, y_r]/\langle f_1, \dots, f_k\rangle$, we can find explicit equations for $J_m(X)$. Define new variables $y_j^{(i)}$ for $i = 0, \dots, m$, and define a derivation $D$ on the generators of $\mathbb{C}[y_1^{(i)}, \dots, y_r^{(i)}]$ by $D(y_j^{(i)}) = y_j^{(i+1)}$ for $i < m$, and $D(y_j^{(m)}) = 0$. Since $D$ is a derivation, this specifies the action of $D$ on all of $\mathbb{C}[y_1^{(i)}, \dots, y_r^{(i)}]$; in particular, $f_j^{(i)} = D^i(f_j)$ is a well-defined polynomial in $\mathbb{C}[y_1^{(i)}, \dots, y_r^{(i)}]$. Letting $R_m = \mathbb{C}[y_1^{(i)}, \dots, y_r^{(i)}]/\langle f_1^{(i)}, \dots, f_k^{(i)}\rangle$, we have $J_m(X) \cong \mathrm{Spec}(R_m)$.

Given a scheme $X$, define $J_\infty(X) = \lim_{\infty \leftarrow m} J_m(X)$, which is known as the infinite jet scheme, or space of arcs of $X$. If $X = \mathrm{Spec}(R)$ as above, $J_\infty(X) \cong \mathrm{Spec}(R_\infty)$, where $R_\infty = \mathbb{C}[y_1^{(i)}, \dots, y_r^{(i)}]/\langle f_1^{(i)}, \dots, f_k^{(i)}\rangle$. Here $i = 0, 1, 2, \dots$ and $D(y_j^{(i)}) = y_j^{(i+1)}$ for all $i$. We denote by $O(J_\infty(X))$ the ring $\lim_{m \to \infty} O(J_m(X))$.

Let $G$ be a connected, reductive algebraic group over $\mathbb{C}$ with Lie algebra $g$. For $m \ge 1$, $J_m(G)$ is an algebraic group which is the semidirect product of $G$ with a unipotent group $U_m$. The Lie algebra of $J_m(G)$ is $g[t]/t^{m+1}$. Given a linear representation $V$ of $G$, there is an action of $G$ on $O(V)$ by automorphisms, and a compatible action of $g$ on $O(V)$ by derivations, satisfying
$$\frac{d}{dt} \exp(t\xi)(f)\Big|_{t=0} = \xi(f), \qquad \xi \in g, \quad f \in O(V).$$
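The recipe for the equations of $J_m(X)$, namely $f^{(i)} = D^i(f)$ with $D(y_j^{(i)}) = y_j^{(i+1)}$ and $D(y_j^{(m)}) = 0$, can be carried out mechanically. The following is a minimal sketch under an ad hoc encoding that is not taken from the paper: a polynomial is a dictionary from monomials to integer coefficients, and a monomial is a sorted tuple of pairs `((j, i), exponent)` standing for powers of $y_j^{(i)}$. The cuspidal cubic $f = y_1^2 - y_2^3$ is used purely as a sample input.

```python
from collections import defaultdict


def derive(poly, m):
    """Apply the derivation D, where D(y_j^(i)) = y_j^(i+1) for i < m and D(y_j^(m)) = 0."""
    out = defaultdict(int)
    for mono, coeff in poly.items():
        powers = dict(mono)
        for (j, i), e in powers.items():
            if i >= m:
                continue  # D annihilates the top-order variables
            new = dict(powers)
            # Leibniz rule on one factor: D(y^e) = e * y^(e-1) * D(y).
            if e == 1:
                del new[(j, i)]
            else:
                new[(j, i)] = e - 1
            new[(j, i + 1)] = new.get((j, i + 1), 0) + 1
            out[tuple(sorted(new.items()))] += coeff * e
    return {mono: c for mono, c in out.items() if c != 0}


def jet_equations(f, m):
    """Return [f, D(f), ..., D^m(f)], the defining equations of J_m(X) for X = V(f)."""
    eqs = [f]
    for _ in range(m):
        eqs.append(derive(eqs[-1], m))
    return eqs


# Sample input: f = y_1^2 - y_2^3, written in the variables y_j^(0).
CUSP = {(((1, 0), 2),): 1, (((2, 0), 3),): -1}

if __name__ == "__main__":
    for k, eq in enumerate(jet_equations(CUSP, 2)):
        print(k, eq)
```

For this sample input, the first derived equation is $2 y_1^{(0)} y_1^{(1)} - 3 (y_2^{(0)})^2 y_2^{(1)}$, which is the familiar equation of the tangent scheme. For a variety cut out by several polynomials one would apply `jet_equations` to each generator of the ideal.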
Choose a basis $\{x_1, \dots, x_n\}$ for $V^{*}$, so that
$$O(V) \cong \mathbb{C}[x_1, \dots, x_n], \qquad O(J_m(V)) = \mathbb{C}\big[x_1^{(i)}, \dots, x_n^{(i)} \mid 0 \le i \le m\big].$$
We regard $O(V)$ as a subalgebra of $O(J_m(V))$ by identifying $x_i$ with $x_i^{(0)}$. Then $J_m(G)$ acts on $J_m(V)$, and the induced action of $g[t]/t^{m+1}$ by derivations on $O(J_m(V))$ is defined on generators by
$$\xi t^r (x_j^{(i)}) = c_i^r\, (\xi(x_j))^{(i-r)}, \qquad (87)$$
where $c_i^r = \frac{i!}{(i-r)!}$ for $0 \le r \le i$, and $c_i^r = 0$ for $r > i$. Via the projection $g[t] \to g[t]/t^{m+1}$, $g[t]$ acts on $O(J_m(V))$, and the invariant rings $O(J_m(V))^{g[t]}$ and $O(J_m(V))^{g[t]/t^{m+1}}$ coincide.

The problem of describing invariant rings of the form $O(J_m(V))^{g[t]}$ was first studied (to the best of our knowledge) by D. Eck in [E]. Eck was primarily interested in the case where $G$ and $V$ are real, although in [E] he worked with the complexifications of $G$ and $V$ in order to use tools from algebraic geometry. He proved that for all $m \ge 1$,
$$O(J_m(V))^{g[t]} = \langle O(V)^G \rangle_m \qquad (88)$$
under fairly restrictive hypotheses, namely, that the categorical quotient $V/\!/G$ is smooth and the map $V \to V/\!/G$ is an étale fibration. In this notation, $\langle O(V)^G \rangle_m$ denotes the ring generated by $\{D^i(f) \mid f \in O(V)^G,\ 0 \le i \le m\}$. If (88) holds, letting $m$ approach infinity shows that $O(J_\infty(V))^{g[t]} = \langle O(V)^G \rangle_\infty$, which is just the ring generated by $\{D^i(f) \mid f \in O(V)^G,\ i \ge 0\}$. In the case where $V$ is the adjoint representation of a simple group $G$, this result was also proven in the Appendix of [Mu]. More recently [P], the equality $O(J_\infty(V))^{g[t]} = \langle O(V)^G \rangle_\infty$ was proven in the case where $G$ is a compact, real Lie group and $V$ is a real, irreducible representation of $G$.
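The combinatorial coefficient in (87) is easy to tabulate, and doing so makes two features of the action visible: $\xi t^r$ lowers the jet order by $r$, and it kills every variable of order below $r$. The sketch below only encodes the formula $c_i^r = i!/(i-r)!$ for $0 \le r \le i$ and $c_i^r = 0$ for $r > i$; the function name `jet_coefficient` is an illustrative choice.

```python
from math import factorial


def jet_coefficient(i, r):
    """The coefficient c_i^r in (87): i!/(i-r)! for 0 <= r <= i, and 0 for r > i."""
    if r < 0 or i < 0:
        raise ValueError("i and r must be nonnegative")
    if r > i:
        return 0
    return factorial(i) // factorial(i - r)


if __name__ == "__main__":
    # Rows are indexed by the jet order i, columns by the power r of t.
    for i in range(5):
        print(i, [jet_coefficient(i, r) for r in range(5)])
```

Two special cases are used later in the Appendix: $c_m^m = m!$, so $\xi t^m$ sends $x_j^{(m)}$ to $m!\, \xi(x_j)$, and $c_1^1 = 1$, so $\xi t$ sends $x_j^{(1)}$ to $\xi(x_j)$.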
For the moment we make no additional assumptions about $G$ and $V$. Since $G$ is reductive, $O(V)^G$ is finitely generated. Choose a set of irreducible, homogeneous generators $\{y_1, \dots, y_p\}$ for $O(V)^G$. Let $d = \dim(V/\!/G)$, and let
$$V^0 = \Big\{ x \in V \;\Big|\; \mathrm{rank}\Big[\frac{\partial y_i}{\partial x_j}\Big]\Big|_{x} = d \Big\}, \qquad (89)$$
which is $G$-invariant, Zariski open, and independent of our choice of generators for $O(V)^G$. For each $x \in V^0$, we can choose $d$ algebraically independent polynomials from the set $\{y_1, \dots, y_p\}$ such that the corresponding $d \times n$ submatrix of $\big[\frac{\partial y_i}{\partial x_j}\big]$ has rank $d$. Without loss of generality, we may assume these are $y_1, \dots, y_d$. It is easy to see that the polynomials $\{D^i(y_1), \dots, D^i(y_d) \mid i \ge 0\}$ are algebraically independent, and distinct monomials in the variables $D^i(y_j)$ are linearly independent over the function field $K(V) = \mathbb{C}(x_1, \dots, x_n)$.

For notational simplicity, we will denote $O(V)$ by $O$, and we will denote $O(J_m(V))$ by $O_m$. The ring $O_m$ has a $\mathbb{Z}_{\ge 0}$-grading $O_m = \bigoplus_{r \ge 0} O_m[r]$ by weight, defined by $\mathrm{wt}(x_i^{(j)}) = j$. Note that $O_m[0] = O$ for all $m$, and each $O_m[r]$ is a module over $O$. By (87), for each $\xi \in g$, the action of $\xi t^k$ is homogeneous of weight $-k$. In particular, the Lie subalgebra $t g[t] \subset g[t]$ annihilates $O$, and hence acts by $O$-linear derivations on each $O_m$. Let $K$ be the quotient field of $O$, and let $K_m = K \otimes_O O_m$, $K_m[r] = K \otimes_O O_m[r]$. Clearly $t g[t]$ acts on $K_m$ by $K$-linear derivations. Our first goal is to describe the invariant space $K_1^{t g[t]}$.

Lemma 10.1. $K_1^{t g[t]}$ is generated by $y_1^{(1)}, \dots, y_d^{(1)}$ as a $K$-algebra.

Proof. Let $W \subset K_1[1]$ be the $K$-subspace spanned by $y_1^{(1)}, \dots, y_d^{(1)}$, which has dimension $d$ over $K$. Since $G$ is reductive and acts on $K_1[1]$, $W$ has a $G$-stable complement $A \subset K_1[1]$ of dimension $r = n - d$. The linear map $g \to \mathrm{Hom}(A, K)$, $\xi \mapsto \xi t|_A$ is surjective, so we can choose $\xi_1, \dots, \xi_r \in g$ such that $\{\xi_1 t|_A, \dots, \xi_r t|_A\}$ form a basis for $\mathrm{Hom}(A, K)$. Choose a dual basis $\{a_1, \dots, a_r\}$ for $A$ satisfying
$$\xi_i t(a_j) = \delta_{i,j}. \qquad (90)$$
Since $\{y_1^{(1)}, \dots, y_d^{(1)}, a_1, \dots, a_r\}$ is a basis for $K_1[1]$ over $K$, it follows that
$$K_1 = K \otimes_O \mathrm{Sym}(K_1[1]) = K \otimes_O \mathrm{Sym}(W) \otimes_O \mathrm{Sym}(A).$$
Since $y_i^{(1)} \in K_1[1]^{t g[t]}$ and each $\xi t$ is a $K$-linear derivation on $K_1$, it follows that $\mathrm{Sym}(W)^{t g[t]} = \mathrm{Sym}(W)$. Similarly, (90) shows that $\mathrm{Sym}(A)^{t g[t]} = \mathbb{C}$. Hence $K_1^{t g[t]} = K \otimes_O \mathrm{Sym}(W)$, as claimed.
In fact, it suffices to work over a certain localization of $O$ rather than the full quotient field $K$. We can choose $r = n - d$ elements of the set $\{x_1^{(1)}, \dots, x_n^{(1)}\}$, say $x_{d+1}^{(1)}, \dots, x_n^{(1)}$, such that
$$y_1^{(1)}, \dots, y_d^{(1)}, x_{d+1}^{(1)}, \dots, x_n^{(1)} \qquad (91)$$
forms a basis for $K_1[1]$ over $K$. It follows that the set (91) is algebraically independent. Let $\Delta$ be the determinant of the $K$-linear change of coordinates on $K_1[1]$ given by
$$\big(x_1^{(1)}, \dots, x_d^{(1)}, x_{d+1}^{(1)}, \dots, x_n^{(1)}\big) \mapsto \big(y_1^{(1)}, \dots, y_d^{(1)}, x_{d+1}^{(1)}, \dots, x_n^{(1)}\big), \qquad (92)$$
and let $O_\Delta$ be the localization of $O$ along the multiplicative set generated by $\Delta$. Let $O_{m,\Delta} = O_\Delta \otimes_O O_m$, which has a weight grading $O_{m,\Delta} = \bigoplus_{k \ge 0} O_{m,\Delta}[k]$.

Lemma 10.2. The invariant space $O_{1,\Delta}^{t g[t]}$ is generated as an $O_\Delta$-algebra by $y_1^{(1)}, \dots, y_d^{(1)}$.

Proof. Any $\omega \in O_{1,\Delta}^{t g[t]}$ can be expressed uniquely in the form
$$\omega = \sum_{i \in I} p_i \mu_i, \qquad (93)$$
where $\mu_i$ are distinct monomials in $y_1^{(1)}, \dots, y_d^{(1)}$ and $p_i$ are polynomials in $x_{d+1}^{(1)}, \dots, x_n^{(1)}$ with coefficients in $O_\Delta$. Since $t g[t]$ acts trivially on each $\mu_i$, and $y_1^{(1)}, \dots, y_d^{(1)}, x_{d+1}^{(1)}, \dots, x_n^{(1)}$ are algebraically independent, it follows that each $p_i$ lies in $(O_{1,\Delta})^{t g[t]}$ independently. Therefore we may assume without loss of generality that $\omega = p\big(x_{d+1}^{(1)}, \dots, x_n^{(1)}\big)$. But by Lemma 10.1, $\omega$ can also be expressed as a polynomial $p'\big(y_1^{(1)}, \dots, y_d^{(1)}\big)$ with $K$-coefficients. Thus
$$p\big(x_{d+1}^{(1)}, \dots, x_n^{(1)}\big) = p'\big(y_1^{(1)}, \dots, y_d^{(1)}\big),$$
which violates the algebraic independence of $y_1^{(1)}, \dots, y_d^{(1)}, x_{d+1}^{(1)}, \dots, x_n^{(1)}$ unless $p$ is constant, that is, unless $\omega \in O_\Delta$.

Lemma 10.3. For each $m > 0$, $(O_{m,\Delta})^{t g[t]}$ is generated as an $O_\Delta$-algebra by $\{y_1^{(j)}, \dots, y_d^{(j)} \mid 1 \le j \le m\}$.

Proof. This holds for $m = 1$ by Lemma 10.2, so we may assume inductively that $(O_{m-1,\Delta})^{t g[t]}$ is generated as an $O_\Delta$-algebra by $\{y_1^{(j)}, \dots, y_d^{(j)} \mid 1 \le j < m\}$. Let $I \subset O_{m,\Delta}$ denote the ideal generated by $x_1^{(m)}, \dots, x_n^{(m)}$, and consider the filtration
$$O_{m,\Delta} \supset I \supset I^2 \supset \cdots,$$
and the associated grading $O_{m,\Delta} \cong \bigoplus_{k \ge 0} I^k/I^{k+1}$. Let $S = \mathbb{C}[x_i, x_i^{(m)}]$ and let $S_\Delta = S \otimes_O O_\Delta$. We have decompositions
$$O_m = S \otimes_O O_{m-1}, \qquad O_{m,\Delta} = S_\Delta \otimes_{O_\Delta} O_{m-1,\Delta}. \qquad (94)$$
Given $\omega \in (O_{m,\Delta})^{t g[t]}$, we say that $\omega$ has $m$-degree $s$ if $s$ is the minimal integer for which $\omega \in \bigoplus_{k=0}^{s} I^k/I^{k+1}$. If $\omega$ has $m$-degree $s$, let $\hat{\omega}$ be the "leading term" of $\omega$, i.e., the projection of $\omega$ onto $I^s/I^{s+1}$. Since $t^m g[t]$ acts trivially on $O_{m-1,\Delta}$, it is immediate from the decomposition (94) that
$$\hat{\omega} \in (S_\Delta)^{t^m g[t]} \otimes_{O_\Delta} (O_{m-1,\Delta})^{t g[t]}.$$
There is a natural map of $O_\Delta$-algebras $S_\Delta \to O_{1,\Delta}$, defined on generators by
$$x_j^{(m)} \mapsto x_j^{(1)}, \qquad x_j \mapsto x_j. \qquad (95)$$
Moreover, given $\xi \in g$,
$$\xi t^m \big(x_j^{(m)}\big) = m!\, \xi t \big(x_j^{(1)}\big) = m!\, \xi(x_j).$$
Hence $(S_\Delta)^{t^m g[t]} \cong (O_{1,\Delta})^{t g[t]}$ as $O_\Delta$-algebras. For $i = 1, \dots, d$, the leading term $\hat{y}_i^{(m)}$ of $y_i^{(m)}$ lies in $I/I^2$, and
$$y_i^{(m)} - \hat{y}_i^{(m)} \in I^0 = O_{m-1,\Delta}.$$
Since $\hat{y}_i^{(m)} \mapsto y_i^{(1)}$ under (95), it follows from Lemma 10.2 that $(S_\Delta)^{t^m g[t]}$ is generated by $\{\hat{y}_i^{(m)} \mid i = 1, \dots, d\}$ as an $O_\Delta$-algebra.

By our inductive assumption, $(O_{m-1,\Delta})^{t g[t]}$ is generated by $\{y_i^{(j)} \mid 1 \le j < m\}$ as an $O_\Delta$-algebra. Since $\hat{\omega} \in (S_\Delta)^{t^m g[t]} \otimes (O_{m-1,\Delta})^{t g[t]}$, it follows that $\hat{\omega}$ can be expressed as a polynomial in the variables
$$\{\hat{y}_i^{(m)},\ y_i^{(j)} \mid 1 \le j < m\}$$
with coefficients in $O_\Delta$. Moreover, $\hat{\omega}$ is homogeneous of degree $s$ in the variables $\hat{y}_i^{(m)}$ since $\hat{\omega} \in I^s/I^{s+1}$. Let $\omega'$ be the polynomial in the variables $\{y_i^{(j)} \mid 0 \le j \le m\}$ obtained from $\hat{\omega}$ by replacing the variables $\hat{y}_i^{(m)}$ with $y_i^{(m)}$. Clearly
$$\omega'' = \omega - \omega' \in (O_{m,\Delta})^{t g[t]},$$
and $\omega''$ has $m$-degree at most $s - 1$. Since $\omega \sim \omega''$ modulo the $O_\Delta$-algebra generated by $\{y_i^{(j)} \mid 1 \le j \le m\}$, the claim follows by induction on $m$-degree.

At this point, we impose a mild technical condition. We say that $O$ contains no invariant lines if $O$ contains no nontrivial, one-dimensional $G$-invariant subspace. If $G$ is semisimple, this condition is automatically satisfied by any $V$.

Lemma 10.4. If $V$ is a representation of $G$ for which $O$ contains no invariant lines, $O^G$ is generated by primes.
Chiral Equivariant Cohomology of a Point
Proof. Let f ∈ O G , and suppose that f = p1 · · · pk is the prime factorization of f in O. Since O contains no invariant lines, each g ∈ G must permute the factors of f . Hence f determines a group homomorphism φ : G → Sk , where Sk is the permutation group on k letters. But G is connected and φ is continuous, so φ is trivial, and each pi ∈ O G . Suppose that O contains no invariant lines, and consider the full invariant algebra g[t] t g[t] g[t] Om , which is just the G-invariant subalgebra (Om )G . Given ω ∈ Om , we may write ( j) ( j) ω = k pk μk , where μk are distinct monomials in {y1 , . . . , yd | j = 1, . . . , m}, and g[t] pk ∈ O . Since each μk ∈ Om , and the μk ’s are linearly independent over O , it G . If we express p as a rational function f in lowest terms, follows that each pk ∈ O k g the denominator will either be 1, or will contain only G-invariant prime factors of , since both the denominator and numerator must be invariant. Letting be the product of the G-invariant prime factors of , it follows from Lemma 10.3 that G g[t] G G O = O ⊗O G Om ⊗O G O m ,
(96)
g[t]
where O G m ⊂ Om is generated by {D j ( f )| f ∈ O G , 0 ≤ j ≤ m}. If contains g[t] no G-invariant prime factors (so that = 1), it is immediate that Om = O G m for all m ≥ 1. g[t] Even if = 1, it still may be possible to prove the equality Om = O G m for m ≥ 1 using this method. Recall that corresponded to a choice of algebraically independent elements {y1 , . . . , yd } ⊂ O G . For i = 1, . . . , r , let {y1i , . . . , ydi } be a collection of maximal algebraically independent subsets of O G , and let 1 , . . . , r be the corresponding elements of O G , obtained as above. It is easy to see from Lemma 10.4 that g[t] in order to prove the equality Om = O G m for all m ≥ 1, it suffices to show that gcd(1 , . . . , r ) = 1. Proof of Theorem 6.2. First we consider the case of two copies of the adjoint represeny tation V . As in Theorem 7.1, we work in the basis {aix , ai , aih | i = 1, 2}. Recall that O(V ⊕ V )G has generators q11 , q12 , and q22 , which are algebraically independent. Consider the change of variables
\[
(a_1^h)^{(1)},\ (a_2^h)^{(1)},\ (a_1^x)^{(1)},\ (a_2^x)^{(1)},\ (a_1^y)^{(1)},\ (a_2^y)^{(1)}
\ \mapsto\
q_{11}^{(1)},\ q_{12}^{(1)},\ q_{22}^{(1)},\ (a_2^x)^{(1)},\ (a_1^y)^{(1)},\ (a_2^y)^{(1)}.
\]
The determinant of this change of variables is 8a2h (a1 a2h − a1h a2 ), so = 1. This proves (27). Next, we consider the case of three copies of V , and we work in the basis y {aix , ai , aih | i = 1, 2, 3}. The ring O(V ⊕ V ⊕ V )G has generators q11 , q12 , q13 , q22 , q23 , q33 , and c123 , and the ideal of relations is generated by y
\[
(c_{123})^2 + \frac{1}{4}
\begin{vmatrix}
q_{11} & q_{12} & q_{13}\\
q_{12} & q_{22} & q_{23}\\
q_{13} & q_{23} & q_{33}
\end{vmatrix}.
\]
Clearly (V ⊕ V ⊕ V )//G has dimension 6. First, take {y1 , y2 , y3 , y4 , y5 , y6 } = {q11 , q12 , q22 , q33 , q23 , q13 }, which is clearly algebraically independent, and consider the change of variables
y y y (a1h )(1) , (a2h )(1) , (a3h )(1) , (a1x )(1) , (a2x )(1) , (a1 )(1) , (a3x )(1) , (a2 )(1) , (a3 )(1)
y y (1) (1) (1) (1) (1) (1) → q11 , q12 , q22 , q33 , q23 , q13 , (a3x )(1) , (a2 )(1) , (a3 )(1) . y y We have = 64a3h −a3 a2h + a3h a2 c123 , so = c123 . Now consider {y1 , y2 , y3 , y4 , y5 , y6 } = {c123 , q12 , q22 , q33 , q23 , q13 }, and consider the change of variables
y y y (a1h )(1) , (a2h )(1) , (a3h )(1) , (a1x )(1) , (a2x )(1) , (a1 )(1) , (a3x )(1) , (a2 )(1) , (a3 )(1)
y y (1) (1) (1) (1) (1) (1) → c123 , q12 , q22 , q33 , q23 , q13 , (a3x )(1) , (a2 )(1) , (a3 )(1) . We have = −8a3h (−a3 a2h + a3h a2 )(q22 q33 − q23 q23 ), so = q22 q33 − q23 q23 . Since c123 and q22 q33 − q23 q23 have no common factor, (28) follows. y
y
References
[A] Akman, F.: The semi-infinite Weil complex of a graded Lie algebra. Ph.D. Thesis, Yale University, 1993
[B] Borcherds, R.: Vertex operator algebras, Kac-Moody algebras and the Monster. Proc. Nat. Acad. Sci. USA 83, 3068–3071 (1986)
[DKV] Duflo, M., Kumar, S., Vergne, M.: Sur la Cohomologie Équivariante des Variétés Différentiables. Astérisque 215, Paris: Soc. Math. France, 1993
[E] Eck, D.: Invariants of k-jet actions. Houston J. Math. 10(2), 159–168 (1984)
[EM] Ein, L., Mustata, M.: Jet schemes and singularities. Algebraic geometry, Seattle 2005. Part 2, Proc. Sympos. Pure Math., 80, Part 2, Providence, RI: Amer. Math. Soc., 2009, pp. 505–546
[FBZ] Frenkel, E., Ben-Zvi, D.: Vertex Algebras and Algebraic Curves. Math. Surveys and Monographs, Vol. 88, Providence, RI: Amer. Math. Soc., 2001
[FF] Feigin, B., Frenkel, E.: Semi-infinite Weil complex and the Virasoro algebra. Commun. Math. Phys. 137, 617–639 (1991)
[FLM] Frenkel, I.B., Lepowsky, J., Meurman, A.: Vertex Operator Algebras and the Monster. New York: Academic Press, 1988
[FMS] Friedan, D., Martinec, E., Shenker, S.: Conformal invariance, supersymmetry and string theory. Nucl. Phys. B271, 93–165 (1986)
[GK] Gorelik, M., Kac, V.: On simplicity of vacuum modules. Adv. Math. 211, 621–677 (2007)
[GS] Guillemin, V., Sternberg, S.: Supersymmetry and Equivariant de Rham Theory. Berlin-Heidelberg-New York: Springer, 1999
[K] Kac, V.: Vertex Algebras for Beginners. University Lecture Series, Vol. 10. Providence, RI: Amer. Math. Soc., 1998
[LiI] Li, H.: Local systems of vertex operators, vertex superalgebras and modules. J. Pure Appl. Algebra 109(2), 143–195 (1996)
[LiII] Li, H.: Vertex algebras and vertex Poisson algebras. Commun. Contemp. Math. 6, 61–110 (2004)
[LL] Lian, B., Linshaw, A.: Howe pairs in the theory of vertex algebras. J. Algebra 317, 111–152 (2007)
[LLI] Lian, B., Linshaw, A.: Chiral equivariant cohomology I. Adv. Math. 209, 99–161 (2007)
[LLSI] Lian, B., Linshaw, A., Song, B.: Chiral equivariant cohomology II. Trans. Am. Math. Soc. 360, 4739–4776 (2008)
[LLSII] Lian, B., Linshaw, A., Song, B.: Chiral equivariant cohomology III. Amer. J. Math. 132(6), 1549–1590 (2010)
[LZI] Lian, B., Zuckerman, G.J.: Commutative quantum operator algebras. J. Pure Appl. Algebra 100(1–3), 117–139 (1995)
[LZII] Lian, B., Zuckerman, G.J.: New perspectives on the BRST-algebraic structure of string theory. Comm. Math. Phys. 154, 613–646 (1993)
[P] Perez Alvarez, J.: Jet invariants of compact Lie groups. J. Geom. Phys. 57, 293–295 (2006)
[MSV] Malikov, F., Schechtman, V., Vaintrob, A.: Chiral de Rham complex. Commun. Math. Phys. 204, 439–473 (1999)
[Mu] Mustata, M.: Jet schemes of locally complete intersection canonical singularities. Invent. Math. 145(3), 397–424 (2001)
[T] Thielemanns, K.: A Mathematica package for computing operator product expansions. Int. Mod. Phys. C2, 787 (1991)
Communicated by Y. Kawahigashi
Commun. Math. Phys. 306, 419–447 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1258-1
Communications in
Mathematical Physics
Genus Two Partition and Correlation Functions for Fermionic Vertex Operator Superalgebras I
Michael P. Tuite, Alexander Zuevsky
School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, University Road, Galway, Ireland. E-mail: [email protected]; [email protected]
Received: 30 July 2010 / Accepted: 27 November 2010
Published online: 29 May 2011 – © Springer-Verlag 2011
Abstract: We define the partition and n-point correlation functions for a vertex operator superalgebra on a genus two Riemann surface formed by sewing two tori together. For the free fermion vertex operator superalgebra we obtain a closed formula for the genus two continuous orbifold partition function in terms of an infinite dimensional determinant with entries arising from torus Szegő kernels. We prove that the partition function is holomorphic in the sewing parameters on a given suitable domain and describe its modular properties. Using the bosonized formalism, a new genus two Jacobi product identity is described for the Riemann theta series. We compute and discuss the modular properties of the generating function for all n-point functions in terms of a genus two Szegő kernel determinant. We also show that the Virasoro vector one point function satisfies a genus two Ward identity.

1. Introduction

Genus two (and higher) partition functions and correlation functions have been studied for some time in string and conformal field theory e.g. [EO,FS,DP,Kn,DVPFHLS]. Meanwhile, in the theory of Vertex Operator Algebras (VOAs) [B,FHL,FLM,Ka,MN,MT5] higher genus approaches based on algebraic geometry have also been developed e.g. [TUY,KNTY,Z2,U]. A more constructive VOA approach has recently been described whereby genus two partition and n-point correlation functions are defined in terms of genus one VOA data [T,MT1,MT2,MT3,MT4]. This approach is based solely on the properties of a VOA with no assumed analytic or modular properties for partition or correlation functions. A compact genus two Riemann surface can be obtained from tori by either sewing two separate tori together, which we refer to as the $\epsilon$-formalism, or by self-sewing a torus, which we refer to as the $\rho$-formalism [MT2]. The theory of partition and n-point correlation functions in the $\epsilon$-formalism is described in ref. [MT1] where these functions are explicitly computed for the Heisenberg VOA and its modules including lattice VOAs. The corresponding functions are considered in the $\rho$-formalism in ref. [MT3].

Supported by a Science Foundation Ireland Research Frontiers Programme Grant, and by Max–Planck Institut für Mathematik, Bonn.

This paper extends these methods to the study of genus two partition and n-point functions in the $\epsilon$-formalism for Vertex Operator Superalgebras (VOSA). In particular, we explicitly compute and prove convergence and modular properties of the genus two continuous orbifold partition and n-point functions for the rank two fermion VOSA $V(H,\mathbb{Z}+\tfrac{1}{2})^{\otimes 2}$. (The alternative $\rho$-formalism is considered elsewhere [TZ3]). These functions are computed in terms of appropriate torus n-point functions described in [MTZ]. We also make extensive use of the expression of the genus two Szegő kernel $S^{(2)}$ of (7) in terms of genus one Szegő kernel data described in [TZ1]. The partition function is then expressed as a certain infinite determinant whose components arise from genus one Szegő kernel data. Furthermore, the generating function of all n-point correlation functions is computed in terms of a genus two Szegő kernel determinant.

Section 2 consists of a review of aspects of the $\epsilon$-formalism for constructing a genus two Riemann surface by sewing two separate tori with modular parameters $\tau_1, \tau_2$ respectively for $(\tau_1,\tau_2,\epsilon)\in\mathcal{D}^{\epsilon}$, a specific domain for which the sewing is defined [MT2]. We also review the construction of the genus two Szegő kernel $S^{(2)}$ in terms of genus one Szegő kernel data [TZ1]. In particular we introduce an infinite block matrix
\[
Q=\begin{bmatrix} 0 & \xi F_1(\tau_1)\\ -\xi F_2(\tau_2) & 0 \end{bmatrix},
\]
where $\xi=\pm\sqrt{-1}$ and $F_a(\tau_a)$ for $a=1,2$ are certain infinite matrices whose entries involve twisted modular forms in $\tau_a$ associated with genus one Szegő kernels [MTZ]. Section 3 is a review of Vertex Operator Superalgebras (VOSA) and the Li–Zamolodchikov (Li–Z) metric on a VOSA [L,Sche]. The free fermion rank one VOSA $V(H,\mathbb{Z}+\tfrac{1}{2})$ is also reviewed.
In Section 4 we consider the orbifold partition and n-point function on a genus two surface in the $\epsilon$-formalism for a VOSA with a Li–Z metric. These are defined in terms of genus one n-point orbifold functions associated with a pair of commuting VOSA automorphisms $f_a, g_a$ on a torus with modular parameter $\tau_a$ for $a=1,2$. Section 5 contains the main results of the paper wherein the partition function and the generating function for n-point functions are computed for the rank two fermion VOSA with continuous automorphisms generated by the Heisenberg vector. In particular we prove in Theorem 1 that the partition function is given by
\[
Z^{(2)}\left[{f\atop g}\right](\tau_1,\tau_2,\epsilon)
= Z^{(1)}\left[{f_1\atop g_1}\right](\tau_1)\,
  Z^{(1)}\left[{f_2\atop g_2}\right](\tau_2)\,
  \det(I-Q),
\]
where $f=(f_1,f_2)$ and $g=(g_1,g_2)$ and $Z^{(1)}\left[{f_a\atop g_a}\right](\tau_a)$ is the orbifold partition function on the torus with modular parameter $\tau_a$. The partition function is holomorphic for $(\tau_1,\tau_2,\epsilon)\in\mathcal{D}^{\epsilon}$, a specific domain on which the $\epsilon$-formalism can be carried out [MT2]. In Theorem 2 we find the generating function for all genus two n-point functions as a differential form which is expressed in terms of a finite dimensional determinant of genus two Szegő kernels $S^{(2)}$. We also discuss the bosonization of the fermion VOSA wherein the partition function can be expressed in terms of a genus two Riemann theta series and the Heisenberg genus two partition function. This leads to a new genus two version of the classical Jacobi product identity expressing the genus two Riemann theta series in terms of certain infinite products. We also discuss the genus two Ward identity satisfied by the Virasoro one point function in this bosonized setting.
In Sect. 6 we discuss modular invariance of the genus two partition and n-point generating form under a modular group preserving $\mathcal{D}^{\epsilon}$. The Appendix describes some general aspects of Riemann surfaces such as the period matrix, the projective connection and the prime form. We also recall some facts from the classical and twisted elliptic function theory [MTZ].

We collect here notation for some of the more frequently occurring functions and symbols employed. $\mathbb{Z}$ is the set of integers, $\mathbb{C}$ the complex numbers, $\mathbb{H}$ the complex upper-half plane. We will always take $\tau$ to lie in $\mathbb{H}$, and $z$ will lie in $\mathbb{C}$ unless otherwise noted. For a symbol $z$ we set $q_z=\exp(z)$ and in particular $q=q_{2\pi i\tau}=\exp(2\pi i\tau)$.

2. The Szegő Kernel on a Genus Two Riemann Surface Formed from Two Sewn Tori

The central role played by the Szegő kernel $S^{(g)}$ for the fermion VOSA has been long known e.g. [RS,R,DVFHLS,DVPFHLS]. In this section we review the form of the Szegő kernel on a Riemann surface $\Sigma^{(2)}$ of genus two obtained by sewing together two tori described in [TZ1]. Some further details appear in Appendix 7.1.

2.1. The Szegő kernel on a Riemann surface. Consider a compact connected Riemann surface $\Sigma^{(g)}$ of genus $g$ with canonical homology cycle basis $a_i, b_i$ for $i=1,\ldots,g$. Let $\nu_i^{(g)}$ be a basis of holomorphic 1-forms with normalization $\oint_{a_i}\nu_j^{(g)}=2\pi i\,\delta_{ij}$ and period matrix $\Omega^{(g)}_{ij}=\frac{1}{2\pi i}\oint_{b_i}\nu_j^{(g)}\in\mathbb{H}_g$, the Siegel upper half plane (e.g. [FK,Sp]). Define the theta function with real characteristics [M,F1,FK]
\[
\vartheta^{(g)}\left[{\alpha\atop\beta}\right]\left(z\,|\,\Omega^{(g)}\right)
=\sum_{n\in\mathbb{Z}^g}
e^{\,i\pi (n+\alpha).\Omega^{(g)}.(n+\alpha)+(n+\alpha).(z+2\pi i\beta)},
\tag{1}
\]
for $\alpha=(\alpha_j)$, $\beta=(\beta_j)\in\mathbb{R}^g$ and $z=(z_j)\in\mathbb{C}^g$ for $j=1,\ldots,g$. The Szegő Kernel [Schi,HS,F1,F2] is defined for $\vartheta^{(g)}\left[{\alpha\atop\beta}\right](0\,|\,\Omega^{(g)})\neq 0$ by
\[
S^{(g)}\left[{\theta\atop\phi}\right](x,y)
=\frac{\vartheta^{(g)}\left[{\alpha\atop\beta}\right]\left(\int_y^x\nu^{(g)}\,\Big|\,\Omega^{(g)}\right)}
{\vartheta^{(g)}\left[{\alpha\atop\beta}\right]\left(0\,|\,\Omega^{(g)}\right)E^{(g)}(x,y)},
\tag{2}
\]
where $\theta=(\theta_j)$, $\phi=(\phi_j)\in U(1)^n$ for
\[
\theta_j=-e^{-2\pi i\beta_j},\qquad \phi_j=-e^{2\pi i\alpha_j},\qquad j=1,\ldots,g,
\tag{3}
\]
and $E^{(g)}(x,y)$ is the prime form (see Appendix 7.1). The factors of $-1$ in (3) are included for later convenience. The Szegő kernel has multipliers along the $a_i$ and $b_j$ cycles in $x$ given by $-\phi_i$ and $-\theta_j$ respectively and is a meromorphic $(\tfrac{1}{2},\tfrac{1}{2})$-form satisfying
\[
S^{(g)}\left[{\theta\atop\phi}\right](x,y)\sim\frac{1}{x-y}\,dx^{\frac{1}{2}}dy^{\frac{1}{2}}\quad\text{for } x\sim y,
\]
\[
S^{(g)}\left[{\theta\atop\phi}\right](x,y)=-S^{(g)}\left[{\theta^{-1}\atop\phi^{-1}}\right](y,x),
\]
where $\theta^{-1}=(\theta_i^{-1})$ and $\phi^{-1}=(\phi_i^{-1})$.
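2.2. Genus two Riemann surfaces formed from two sewn tori. Consider the genus two Riemann surface formed by sewing together two tori in the sewing scheme referred to as the $\epsilon$-formalism in refs. [MT1,MT2,TZ1]. Let $\Sigma_a^{(1)}=\mathbb{C}/\Lambda_a$ for $a=1,2$ be oriented tori with lattice $\Lambda_a=2\pi i(\mathbb{Z}\tau_a\oplus\mathbb{Z})$ for $\tau_a\in\mathbb{H}$. Choose a local coordinate $z_a\in\mathbb{C}/\Lambda_a$ on $\Sigma_a^{(1)}$ in the neighborhood of a point $p_a\in\Sigma_a^{(1)}$ and consider the closed disk $|z_a|\le r_a$ for $r_a<\tfrac{1}{2}D(q_a)$, where [MT2]
\[
D(q_a)=\min_{\lambda\in\Lambda_a,\ \lambda\neq 0}|\lambda|,
\]
is the minimal lattice distance. Introduce a complex sewing parameter $\epsilon$ where $|\epsilon|\le r_1r_2$, and excise the disk $\{z_a,\ |z_a|\le|\epsilon|/r_{\bar a}\}\subset\Sigma_a^{(1)}$, to form a punctured torus
\[
\widehat{\Sigma}_a^{(1)}=\Sigma_a^{(1)}\setminus\{z_a,\ |z_a|\le|\epsilon|/r_{\bar a}\}.
\]
Here and below, we use the convention $\bar 1=2$, $\bar 2=1$. Define the annulus
\[
\mathcal{A}_a=\{z_a,\ |\epsilon|/r_{\bar a}\le|z_a|\le r_a\}\subset\widehat{\Sigma}_a^{(1)},
\]
and identify $\mathcal{A}_1$ and $\mathcal{A}_2$ as a single region $\mathcal{A}=\mathcal{A}_1\simeq\mathcal{A}_2$ via the sewing relation
\[
z_1z_2=\epsilon.
\tag{4}
\]
In this way we obtain a compact genus two Riemann surface $\Sigma^{(2)}=\{\widehat{\Sigma}_1^{(1)}\setminus\mathcal{A}_1\}\cup\{\widehat{\Sigma}_2^{(1)}\setminus\mathcal{A}_2\}\cup\mathcal{A}$, parameterized by the domain [MT2]
\[
\mathcal{D}^{\epsilon}=\left\{(\tau_1,\tau_2,\epsilon)\in\mathbb{H}_1\times\mathbb{H}_1\times\mathbb{C}\ \Big|\ |\epsilon|<\tfrac{1}{4}D(q_1)D(q_2)\right\}.
\tag{5}
\]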
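The sewing data of this subsection can be explored numerically. The following is a minimal sketch, not part of the construction in [MT2]: the function names are hypothetical, and the minimal lattice distance is found by a brute-force search that assumes $\tau$ lies close to the standard fundamental domain, so that the shortest lattice vector has small integer coefficients.

```python
import math

def min_lattice_distance(tau, search=20):
    """Brute-force D(q) = min |lam| over nonzero lam in 2*pi*i*(Z*tau + Z).

    Assumption: the shortest vector has coefficients within [-search, search].
    """
    best = math.inf
    for m in range(-search, search + 1):
        for n in range(-search, search + 1):
            if (m, n) != (0, 0):
                best = min(best, abs(2j * math.pi * (m * tau + n)))
    return best

def in_sewing_domain(tau1, tau2, eps):
    """Test the condition |eps| < D(q1) D(q2) / 4 of the domain (5)."""
    if complex(tau1).imag <= 0 or complex(tau2).imag <= 0:
        return False
    bound = 0.25 * min_lattice_distance(tau1) * min_lattice_distance(tau2)
    return abs(eps) < bound

def sewn_partner(z1, eps):
    """Image z2 of z1 under the sewing relation z1 * z2 = eps of (4)."""
    return eps / z1

def in_annulus(z, eps, r_self, r_other):
    """Membership of z in the annulus |eps|/r_other <= |z| <= r_self."""
    return abs(eps) / r_other <= abs(z) <= r_self

# Square tori tau1 = tau2 = i: the minimal lattice distance is 2*pi,
# so the domain (5) requires |eps| < pi**2.
print(min_lattice_distance(1j))
print(in_sewing_domain(1j, 1j, 9.0), in_sewing_domain(1j, 1j, 10.0))
# A point of A_1 is sent by the sewing relation to a point of A_2.
print(in_annulus(sewn_partner(2.0, 1.0), 1.0, 3.0, 3.0))
```

The last line illustrates why the two annuli can be identified: if $|\epsilon|/r_2\le|z_1|\le r_1$ then $|z_2|=|\epsilon|/|z_1|$ satisfies $|\epsilon|/r_1\le|z_2|\le r_2$.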
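2.3. The genus two Szegő kernel in the $\epsilon$-formalism. On a torus the prime form is $E^{(1)}(x,y)=K^{(1)}(x-y,\tau)\,dx^{-\frac{1}{2}}dy^{-\frac{1}{2}}$, where $K^{(1)}(z,\tau)=\dfrac{\vartheta_1(z,\tau)}{\partial_z\vartheta_1(0,\tau)}$ and $\vartheta_1(z,\tau)=\vartheta\left[{\frac{1}{2}\atop\frac{1}{2}}\right](z,\tau)$ for $z\in\mathbb{C}$ and $\tau\in\mathbb{H}$. For $(\theta,\phi)\neq(1,1)$ with $\theta=-\exp(-2\pi i\beta)$ and $\phi=-\exp(2\pi i\alpha)$ the genus one Szegő kernel is
\[
S^{(1)}\left[{\theta\atop\phi}\right](x,y\,|\,\tau)=P_1\left[{\theta\atop\phi}\right](x-y,\tau)\,dx^{\frac{1}{2}}dy^{\frac{1}{2}},
\tag{6}
\]
where
\[
P_1\left[{\theta\atop\phi}\right](z,\tau)
=\frac{\vartheta^{(1)}\left[{\alpha\atop\beta}\right](z,\tau)}{\vartheta^{(1)}\left[{\alpha\atop\beta}\right](0,\tau)}\,\frac{1}{K^{(1)}(z,\tau)}
=-\sum_{k\in\mathbb{Z}}\frac{q_z^{k+\lambda}}{1-\theta^{-1}q^{k+\lambda}},
\]
is a 'twisted' Weierstrass function [MTZ] and where $q_z=e^z$ and $\phi=e^{2\pi i\lambda}$ for $0\le\lambda<1$ (see Appendix 7.2 for details).

In [TZ1] we determine the genus two Szegő kernel
\[
S^{(2)}(x,y)=S^{(2)}\left[{\theta^{(2)}\atop\phi^{(2)}}\right](x,y),
\tag{7}
\]
with periodicities $(\theta^{(2)},\phi^{(2)})=(\theta_a,\phi_a)$ for $a=1,2$ on the inherited homology basis on the genus two Riemann surface $\Sigma^{(2)}$ formed by sewing two tori $\Sigma_a^{(1)}$ in terms of genus one Szegő kernel data $S_a^{(1)}(x,y)=S^{(1)}\left[{\theta_a\atop\phi_a}\right](x,y)$. Note that we exclude those Riemann theta characteristics for which $S^{(2)}$ exists but where one of the lower genus theta functions vanishes, i.e. $(\theta_a,\phi_a)\neq(1,1)$ so that $S_a^{(1)}$ exists on the torus $\Sigma_a^{(1)}$ for $a=1,2$.

In [TZ1] we show how to reconstruct $S^{(2)}(x,y)$ from the Laurent expansions (68) of $P_1\left[{\theta\atop\phi}\right]$ with coefficients $C\left[{\theta\atop\phi}\right](k,l,\tau)$ and $D\left[{\theta\atop\phi}\right](k,l,\tau,z)$ of (69) and (70) of Appendix 7.2. In particular, we define for $k,l\ge 1$,
\[
F_a\left[{\theta_a\atop\phi_a}\right](k,l,\tau_a,\epsilon)=\epsilon^{\frac{1}{2}(k+l-1)}\,C\left[{\theta_a\atop\phi_a}\right](k,l,\tau_a).
\tag{8}
\]
We let $F_a=\left(F_a\left[{\theta_a\atop\phi_a}\right](k,l,\tau_a,\epsilon)\right)$ denote the infinite matrix indexed by $k,l\ge 1$. We also define holomorphic $\tfrac{1}{2}$-forms on $\widehat{\Sigma}_a^{(1)}$,
\[
\begin{aligned}
h_a\left[{\theta_a\atop\phi_a}\right](k,x,\tau_a,\epsilon)&=\epsilon^{\frac{k}{2}-\frac{1}{4}}\,D\left[{\theta_a\atop\phi_a}\right](1,k,\tau_a,x)\,dx^{\frac{1}{2}},\\
\bar h_a\left[{\theta_a\atop\phi_a}\right](k,y,\tau_a,\epsilon)&=\epsilon^{\frac{k}{2}-\frac{1}{4}}\,D\left[{\theta_a\atop\phi_a}\right](k,1,\tau_a,-y)\,dy^{\frac{1}{2}}.
\end{aligned}
\tag{9}
\]
We let $h_a(x)=\left(h_a\left[{\theta_a\atop\phi_a}\right](k,x,\tau_a,\epsilon)\right)$ and $\bar h_a(y)=\left(\bar h_a\left[{\theta_a\atop\phi_a}\right](k,y,\tau_a,\epsilon)\right)$ denote infinite row vectors indexed by $k$. Recalling the sewing relation (4) we note that
\[
dz_a^{\frac{1}{2}}=(-1)^{\bar a}\,\xi\,\epsilon^{\frac{1}{2}}\,\frac{dz_{\bar a}^{\frac{1}{2}}}{z_{\bar a}},
\tag{10}
\]
where $\xi\in\{\pm\sqrt{-1}\}$ depending on the branch of the double cover of $\widehat{\Sigma}_a^{(1)}$ chosen.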
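It is useful to introduce the infinite block matrices
\[
\Xi=\begin{bmatrix}0&\xi I\\-\xi I&0\end{bmatrix},\qquad
Q=\begin{bmatrix}0&\xi F_1\\-\xi F_2&0\end{bmatrix},
\tag{11}
\]
where $I$ denotes the infinite identity matrix. Then Theorem 3.6 of [TZ1] states that
\[
S^{(2)}(x,y)=
\begin{cases}
S_a^{(1)}(x,y)+h_a(x)\,(I-F_{\bar a}F_a)^{-1}F_{\bar a}\,\bar h_a^{T}(y), & \text{for } x,y\in\widehat{\Sigma}_a^{(1)},\\[4pt]
\xi(-1)^{\bar a}\,h_a(x)\,(I-F_{\bar a}F_a)^{-1}\,\bar h_{\bar a}^{T}(y), & \text{for } x\in\widehat{\Sigma}_a^{(1)},\ y\in\widehat{\Sigma}_{\bar a}^{(1)},
\end{cases}
\tag{12}
\]
where $T$ denotes the transpose. Equivalently, for $x,y\in\widehat{\Sigma}^{(1,1)}=\widehat{\Sigma}_1^{(1)}\cup\widehat{\Sigma}_2^{(1)}$, the disconnected union of punctured tori, we define the forms
\[
\begin{aligned}
S^{(1,1)}(x,y)&=
\begin{cases}
S_a^{(1)}(x,y), & \text{for } x,y\in\widehat{\Sigma}_a^{(1)},\\
0, & \text{for } x\in\widehat{\Sigma}_a^{(1)},\ y\in\widehat{\Sigma}_{\bar a}^{(1)},
\end{cases}\\[4pt]
h(x)&=
\begin{cases}
(h_1(x),0), & \text{for } x\in\widehat{\Sigma}_1^{(1)},\\
(0,h_2(x)), & \text{for } x\in\widehat{\Sigma}_2^{(1)},
\end{cases}\\[4pt]
\bar h(x)&=
\begin{cases}
(\bar h_1(x),0), & \text{for } x\in\widehat{\Sigma}_1^{(1)},\\
(0,\bar h_2(x)), & \text{for } x\in\widehat{\Sigma}_2^{(1)}.
\end{cases}
\end{aligned}
\tag{13}
\]
Thus $h(x)$ describes an infinite row vector indexed by $k\ge 1$ and $a=1,2$ with $(h(x))(k,a)=\delta_{ab}\,h_b\left[{\theta_b\atop\phi_b}\right](k,x,\tau_b,\epsilon)$ for $x\in\widehat{\Sigma}_b^{(1)}$ and similarly for $\bar h(x)$. With these definitions (12) is equivalent to
\[
S^{(2)}(x,y)=S^{(1,1)}(x,y)+h(x)\,\Xi\,(I-Q)^{-1}\,\bar h^{T}(y),
\tag{14}
\]
for $x,y\in\widehat{\Sigma}^{(1,1)}$. Lastly, defining the determinant of $I-Q$ by the formal power series in $\epsilon$,
\[
\log\det(I-Q)=\mathrm{Tr}\log(I-Q)=-\sum_{n\ge 1}\frac{1}{n}\mathrm{Tr}(Q^n),
\]
it is shown in ref. [TZ1] that
\[
\det(I-Q)=\det(I-F_1F_2),
\tag{15}
\]
is non-vanishing and holomorphic on $\mathcal{D}^{\epsilon}$.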
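The determinant identity (15) can be sanity-checked on a finite truncation. The sketch below is only an illustration under stated assumptions: the matrices `F1` and `F2` are arbitrary small test matrices, not the Szegő kernel data of (8), and the helper names are hypothetical. For finite blocks the identity follows from the Schur complement formula together with $\xi^2=-1$.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = 1
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) == 0:
            return 0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result

def matmul(x, y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def identity_minus(m):
    """Return I - m for a square matrix m."""
    n = len(m)
    return [[(1 if i == j else 0) - m[i][j] for j in range(n)]
            for i in range(n)]

def block_q(f1, f2, xi):
    """Finite truncation of Q = [[0, xi*F1], [-xi*F2, 0]] from (11)."""
    n = len(f1)
    top = [[0] * n + [xi * f1[i][j] for j in range(n)] for i in range(n)]
    bottom = [[-xi * f2[i][j] for j in range(n)] + [0] * n for i in range(n)]
    return top + bottom

# Arbitrary test data (an assumption for illustration only).
F1 = [[0.10, 0.02 + 0.01j], [0.03, 0.05]]
F2 = [[0.04, 0.01], [0.02j, 0.07]]

def identity_gap(xi):
    """|det(I - Q) - det(I - F1 F2)| for the truncated blocks."""
    lhs = det(identity_minus(block_q(F1, F2, xi)))
    rhs = det(identity_minus(matmul(F1, F2)))
    return abs(lhs - rhs)

print(identity_gap(1j), identity_gap(-1j))
```

Both choices $\xi=\pm\sqrt{-1}$ give the same determinant, consistent with the right-hand side of (15) being independent of $\xi$.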
3. Vertex Operator Superalgebras

3.1. General definitions. We discuss some aspects of Vertex Operator Superalgebra theory to establish context and notation. For more details see [B,FHL,FLM,Ka,MN,MT5]. A Vertex Operator Superalgebra (VOSA) is a quadruple $(V,Y,\mathbf{1},\omega)$ as follows: $V$ is a superspace, i.e. a complex vector space $V=V_{\bar 0}\oplus V_{\bar 1}=\oplus_\alpha V_\alpha$ with index label $\alpha$ in $\mathbb{Z}/2\mathbb{Z}$ so that each $a\in V$ has a parity (fermion number) $p(a)\in\mathbb{Z}/2\mathbb{Z}$. $V$ has non-negative $\tfrac{1}{2}\mathbb{Z}$-grading with
\[
V=\bigoplus_{r\in\frac{1}{2}\mathbb{Z}}V_r,\qquad \dim V_r<\infty,
\]
related to the superspace grading by
\[
V_{\bar 0}=\bigoplus_{r\in\mathbb{Z}}V_r,\qquad V_{\bar 1}=\bigoplus_{r\in\mathbb{Z}+\frac{1}{2}}V_r.
\tag{16}
\]
$\mathbf{1}\in V_0$ is the vacuum vector and $\omega\in V_2$ is the conformal vector with properties described below. $Y$ is a linear map $Y:V\to(\mathrm{End}\,V)[[z,z^{-1}]]$, for formal variable $z$, so that for any vector $a\in V$,
\[
Y(a,z)=\sum_{n\in\mathbb{Z}}a(n)z^{-n-1}.
\]
The component operators (modes) $a(n)\in\mathrm{End}\,V$ are such that $a(n)\mathbf{1}=\delta_{n,-1}a$, for $n\ge -1$. Furthermore, for $a\in V_\alpha$,
\[
a(n):V_\beta\to V_{\beta+\alpha}.
\tag{17}
\]
The vertex operators satisfy locality: $(x-y)^N[Y(a,x),Y(b,y)]=0$, for all $a,b\in V$ and $N\gg 0$, where the commutator is defined in the graded sense:
\[
[Y(a,x),Y(b,y)]=Y(a,x)Y(b,y)-(-1)^{p(a)p(b)}Y(b,y)Y(a,x).
\tag{18}
\]
The vertex operator for the vacuum is $Y(\mathbf{1},z)=\mathrm{Id}_V$, whereas that for $\omega$ is
\[
Y(\omega,z)=\sum_{n\in\mathbb{Z}}L(n)z^{-n-2},
\]
where $L(n)=\omega(n+1)$ forms a Virasoro algebra for central charge $c$,
\[
[L(m),L(n)]=(m-n)L(m+n)+\frac{c}{12}(m^3-m)\delta_{m,-n}.
\]
$L(-1)$ generates translations with
\[
Y(L(-1)a,z)=\frac{d}{dz}Y(a,z).
\]
$L(0)$ determines the grading with $L(0)a=wt(a)a$ for $a\in V_r$ and $r=wt(a)$, the weight of $a$.
3.2. The Li–Zamolodchikov (Li–Z) metric. The subalgebra $\{L(-1),L(0),L(1)\}\cong SL(2,\mathbb{C})$ associated with Möbius transformations on $z$ naturally acts on a VOSA (e.g. [B,Ka]). In particular,
\[
\gamma_\lambda=\begin{pmatrix}0&\lambda\\-\lambda^{-1}&0\end{pmatrix}:\ z\mapsto w=-\frac{\lambda^2}{z},
\tag{19}
\]
is generated by $T_\lambda=\exp(\lambda L(-1))\exp(\tfrac{1}{\lambda}L(1))\exp(\lambda L(-1))$, where
\[
T_\lambda Y(u,z)T_\lambda^{-1}=Y\left(\exp\left(-\frac{z}{\lambda^2}L(1)\right)\left(-\frac{z}{\lambda}\right)^{-2L(0)}u,\ -\frac{\lambda^2}{z}\right).
\tag{20}
\]
Later we will be particularly interested in the Möbius map $z\mapsto w=\epsilon/z$ associated with the sewing condition (4) with
\[
\lambda=-\xi\epsilon^{\frac{1}{2}},
\tag{21}
\]
with $\xi\in\{\pm\sqrt{-1}\}$ as previously introduced in (10). For $u\in V$ of half-integral weight the action of $-\gamma_\lambda=\gamma_{-\lambda}$ is distinguished from that of $\gamma_\lambda$, whereas for integral weight they are equivalent. In particular we must distinguish the choices $\lambda=\pm\sqrt{-1}$ in (19) corresponding to the inversion map $z\mapsto z^{-1}$ normally used to define the adjoint vertex operator. Following ref. [Sche] we therefore define
\[
Y^\dagger(u,z)=\sum_{n}u^\dagger(n)z^{-n-1}=T_\lambda Y(u,z)T_\lambda^{-1}.
\tag{22}
\]
One can verify that $\left(Y^\dagger(u,z)\right)^\dagger=(-1)^{2wt(u)}Y(u,z)$ for $u$ of weight $wt(u)$. For a quasi-primary vector $u$ (i.e. $L(1)u=0$) of weight $wt(u)$,
\[
u^\dagger(n)=\lambda^{-2wt(u)}(-\lambda^2)^{n+1}u(2wt(u)-n-2),
\tag{23}
\]
e.g. $L^\dagger(n)=(-\lambda^2)^nL(-n)$. Furthermore
\[
Y^\dagger(u,w)\,dw^{wt(u)}=Y(u,z)\,dz^{wt(u)},
\tag{24}
\]
where for half-integral $wt(u)$ we choose the branch covering for which
\[
\left(\frac{dw}{dz}\right)^{wt(u)}=\left(\frac{\lambda}{z}\right)^{2wt(u)}.
\tag{25}
\]
We say a bilinear form $\langle\ ,\ \rangle_\lambda$ on $V$ is invariant if for all $a,b,u\in V$ [Sche]
\[
\langle Y(u,z)a,b\rangle_\lambda=(-1)^{p(u)p(a)}\langle a,Y^\dagger(u,z)b\rangle_\lambda,
\tag{26}
\]
i.e. $\langle u(n)a,b\rangle_\lambda=(-1)^{p(u)p(a)}\langle a,u^\dagger(n)b\rangle_\lambda$. Thus it follows that $\langle L(0)a,b\rangle_\lambda=\langle a,L(0)b\rangle_\lambda$ so that $\langle a,b\rangle_\lambda=0$ if $wt(a)\neq wt(b)$ for homogeneous $a,b$. One also finds $\langle a,b\rangle_\lambda=\langle b,a\rangle_\lambda$ [FHL,Sche]. $\langle\ ,\ \rangle_\lambda$ is unique up to normalization if $L(1)V_1=V_0$ (we choose the normalization $\langle\mathbf{1},\mathbf{1}\rangle_\lambda=1$ throughout) and is non-degenerate if and only if $V$ is simple [L]. We call such a unique non-degenerate symmetric bilinear form the Li–Zamolodchikov (Li–Z) metric. Given any $V$ basis $\{u^\alpha\}$ we define the Li–Z dual $V$ basis $\{\bar u^\beta\}$, where $\langle u^\alpha,\bar u^\beta\rangle_\lambda=\delta^{\alpha\beta}$.
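As a worked check of the example quoted after (23), the following short derivation applies (23) to the conformal vector, using only the standard facts that $\omega$ is quasi-primary of weight 2 and that $L(n)=\omega(n+1)$. It is a sketch of one way to see the formula, not a reproduction of the argument in [Sche].

```latex
\begin{align*}
L^\dagger(n) &= \omega^\dagger(n+1)
  = \lambda^{-4}\,(-\lambda^2)^{(n+1)+1}\,\omega\bigl(4-(n+1)-2\bigr) \\
 &= \lambda^{-4}\,(-\lambda^2)^{2}\,(-\lambda^2)^{n}\,\omega(1-n)
  = (-\lambda^2)^{n}\,L(-n).
\end{align*}
% With the sewing choice (21), \lambda = -\xi\epsilon^{1/2} and \xi^2 = -1,
% so -\lambda^2 = -\xi^2\epsilon = \epsilon and hence
% L^\dagger(n) = \epsilon^{n} L(-n), while the Moebius map (19) becomes
% z \mapsto \epsilon / z, in agreement with the sewing relation (4).
```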
3.3. Free fermion VOSA. Consider the rank one free fermion VOSA $V(H,\mathbb{Z}+\tfrac{1}{2})$ with $H=\mathbb{C}\psi$ for a (fermion) vector $\psi$ of parity 1 [FFR,Ka] with modes obeying
\[
[\psi(m),\psi(n)]=\psi(m)\psi(n)+\psi(n)\psi(m)=\delta_{m+n+1,0}.
\tag{27}
\]
The superspace is spanned by Fock vectors we denote by¹
\[
\Psi(\mathbf{k})\equiv\psi(-k_1)\psi(-k_2)\ldots\psi(-k_s)\mathbf{1},
\tag{28}
\]
for distinct ordered integers $1\le k_1<\cdots<k_s$ and where $\psi(k)\mathbf{1}=0$ for $k\ge 0$. The VOSA is generated by $Y(\psi,z)$ with conformal vector $\omega=\tfrac{1}{2}\psi(-2)\psi(-1)\mathbf{1}$ of central charge $c=\tfrac{1}{2}$ for which $\Psi(\mathbf{k})$ has $L(0)$ weight $wt(\Psi(\mathbf{k}))=\sum_{1\le i\le s}(k_i-\tfrac{1}{2})\in\tfrac{1}{2}\mathbb{Z}$. In particular $wt(\psi)=\tfrac{1}{2}$. Since $\psi^\dagger(n)=\lambda^{-1}(-\lambda^2)^{n+1}\psi(-n-1)$ it follows from (23) that the Fock vectors form an orthogonal basis with respect to the Li–Z metric $\langle\ ,\ \rangle_\lambda$ with
\[
\overline{\Psi}(\mathbf{k})=(-1)^{[wt(\Psi)]}\lambda^{2wt(\Psi)}\Psi(\mathbf{k}),
\tag{29}
\]
for $\Psi(\mathbf{k})$ of weight $wt(\Psi)$ and where $[x]$ denotes the integral part of $x$.

We next consider the rank two fermion VOSA $V(H,\mathbb{Z}+\tfrac{1}{2})^{\otimes 2}$, the tensor product of two copies of the rank one fermion VOSA. We employ the off-diagonal basis $\psi^\pm=\tfrac{1}{\sqrt{2}}(\psi_1\pm i\psi_2)$ for fermions $\psi_1=\psi\otimes\mathbf{1}$ and $\psi_2=\mathbf{1}\otimes\psi$. The VOSA is generated by $Y(\psi^\pm,z)=\sum_{n\in\mathbb{Z}}\psi^\pm(n)z^{-n-1}$, where the modes obey the commutation relations
\[
[\psi^+(m),\psi^-(n)]=\delta_{m,-n-1},\qquad[\psi^+(m),\psi^+(n)]=0,\qquad[\psi^-(m),\psi^-(n)]=0.
\]
The VOSA vector space $V$ is a Fock space spanned by²
\[
\Psi(\mathbf{k},\mathbf{l})\equiv\psi^+(-k_1)\ldots\psi^+(-k_s)\psi^-(-l_1)\ldots\psi^-(-l_t)\mathbf{1},
\tag{30}
\]
for distinct positive integers $k_1,\ldots,k_s$ and distinct $l_1,\ldots,l_t$ with $\psi^\pm(k)\mathbf{1}=0$ for all $k\ge 0$. We define the conformal vector to be
\[
\omega=\frac{1}{2}\left[\psi^+(-2)\psi^-(-1)+\psi^-(-2)\psi^+(-1)\right]\mathbf{1},
\tag{31}
\]
whose modes generate a Virasoro algebra of central charge 1. Then $\psi^\pm$ has $L(0)$-weight $\tfrac{1}{2}$ and $\Psi(\mathbf{k},\mathbf{l})$ has $L(0)$-weight $wt(\Psi)=\sum_{1\le i\le s}(k_i-\tfrac{1}{2})+\sum_{1\le j\le t}(l_j-\tfrac{1}{2})$. Similarly to (29), the Li–Z dual of $\Psi(\mathbf{k},\mathbf{l})$ is $\overline{\Psi}(\mathbf{k},\mathbf{l})=(-1)^{st}(-1)^{[wt(\Psi)]}\lambda^{2wt(\Psi)}\Psi(\mathbf{l},\mathbf{k})$, where the $(-1)^{st}$ factor arises from the ordering chosen in (30). For the parameter choice (21) we find for $\Psi(\mathbf{k},\mathbf{l})$ of parity $p$ that
\[
\overline{\Psi}(\mathbf{k},\mathbf{l})=(-1)^{st}(-\xi)^{p}\,\epsilon^{wt(\Psi)}\,\Psi(\mathbf{l},\mathbf{k}).
\tag{32}
\]
¹ Denoted by $\Psi(-\mathbf{k})$ in ref. [MTZ].
² Denoted by $\Psi(-\mathbf{k},-\mathbf{l})$ in ref. [MTZ].
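The passage from the general dual factor $(-1)^{st}(-1)^{[wt(\Psi)]}\lambda^{2wt(\Psi)}$ to the form (32) uses only $\lambda=-\xi\epsilon^{1/2}$ and $\xi^2=-1$. The sketch below checks this numerically for a few Fock labels. The function names are hypothetical, and a fixed square root of $\epsilon$ is passed in explicitly so that both sides use the same branch.

```python
import cmath
from fractions import Fraction

def fock_weight(ks, ls):
    """L(0)-weight sum(k_i - 1/2) + sum(l_j - 1/2) of the Fock vector (30)."""
    return (sum(Fraction(2 * k - 1, 2) for k in ks)
            + sum(Fraction(2 * l - 1, 2) for l in ls))

def dual_factor_general(ks, ls, lam):
    """(-1)^{st} (-1)^{[wt]} lam^{2 wt}: the factor stated before (32)."""
    wt = fock_weight(ks, ls)
    s, t = len(ks), len(ls)
    integral_part = wt.numerator // wt.denominator  # wt >= 0, so floor = [wt]
    return (-1) ** (s * t) * (-1) ** integral_part * lam ** int(2 * wt)

def dual_factor_sewing(ks, ls, xi, sqrt_eps):
    """(-1)^{st} (-xi)^p eps^{wt}: the factor in (32), with eps^{1/2} fixed."""
    wt = fock_weight(ks, ls)
    s, t = len(ks), len(ls)
    parity = (s + t) % 2  # each fermion mode is odd
    return (-1) ** (s * t) * (-xi) ** parity * sqrt_eps ** int(2 * wt)

def factor_gap(ks, ls, xi, eps):
    """Difference between the two expressions for lam = -xi * eps^{1/2}."""
    sqrt_eps = cmath.sqrt(eps)
    lam = -xi * sqrt_eps
    return abs(dual_factor_general(ks, ls, lam)
               - dual_factor_sewing(ks, ls, xi, sqrt_eps))

for labels in [([1], [1]), ([1, 3], [2]), ([2], []), ([1, 2, 4], [1, 3])]:
    print(labels, fock_weight(*labels), factor_gap(*labels, 1j, 0.3 + 0.2j))
```

For integral weight the product $(-1)^{wt}\xi^{2wt}$ equals 1, and for half-integral weight $m+\tfrac{1}{2}$ it leaves the single factor $-\xi$, which is the case distinction encoded by $(-\xi)^p$.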
The weight 1 space is $V_1=\mathbb{C}a$ for Heisenberg vector
\[
a=\psi^+(-1)\psi^-(-1)\mathbf{1},
\tag{33}
\]
with modes obeying $[a(m),a(n)]=m\delta_{m,-n}$. Then $\omega=\tfrac{1}{2}a(-1)^2\mathbf{1}$ is the standard conformal vector for the Heisenberg VOA $M$. Thus $V$ can be decomposed into irreducible $M$-modules $M\otimes e^m$ for $a(0)$ eigenvalue $m\in\mathbb{Z}$ e.g. [FFR,Ka]. Furthermore, $a(0)$ is a generator of continuous $V$ automorphisms $e^{2\pi i\gamma a(0)}$ for real $\gamma$.

4. Partition Functions and Correlation Functions on a Genus Two Riemann Surface

In this section we consider the partition and n-point correlation functions for a VOSA on a Riemann surface of genus two formed by sewing two tori. In the next section we will compute these quantities in the case of a rank two fermion VOSA with arbitrary automorphisms generated by $a(0)$.
4.1. Torus n-point correlation functions. We first review aspects of genus one orbifold n-point (correlation) functions for twisted VOSA modules. For more details see refs. [Z1,DLM,MT4,DZ,MTZ]. Let $\sigma\in\mathrm{Aut}(V)$ denote the parity (fermion number) automorphism
\[
\sigma a=(-1)^{p(a)}a,
\tag{34}
\]
for all $a\in V$. Let $f,g\in\mathrm{Aut}(V)$ denote two commuting automorphisms that also commute with $\sigma$. Consider a $\sigma g$-twisted $V$-module $M_{\sigma g}$ with vertex operators $Y_{\sigma g}$ [DLM,DZ,MTZ]. We assume that $M_{\sigma g}$ is stable under $\sigma$ and $f$, i.e. both $\sigma$ and $f$ act on $M_{\sigma g}$. Then for vectors $v_1,\ldots,v_n\in V$ we define the torus orbifold n-point function by [Z1,MTZ],
\[
Z^{(1)}\left[{f\atop g}\right](v_1,z_1;\ldots;v_n,z_n;\tau)
\equiv\mathrm{STr}_{M_{\sigma g}}\left(f\,Y_{\sigma g}(q_1^{L(0)}v_1,q_1)\ldots Y_{\sigma g}(q_n^{L(0)}v_n,q_n)\,q^{L(0)-c/24}\right),
\tag{35}
\]
where $q=\exp(2\pi i\tau)$, $q_i=\exp(z_i)$, $i=1,\ldots,n$, for variables $z_1,\ldots,z_n$, and where $\mathrm{STr}_M$ denotes the supertrace defined by $\mathrm{STr}_M(X)=\mathrm{Tr}_M(\sigma X)$. It follows from (17) that the n-point function (35) is non-vanishing provided
\[
p_1+\cdots+p_n=0\mod 2,
\tag{36}
\]
for parity $p_i=p(v_i)$.
Taking all $v_i=\mathbf{1}$ in (35) yields the genus one orbifold partition function which we denote by $Z^{(1)}\left[{f\atop g}\right](\tau)$. Taking $n=1$ in (35) gives the genus one 1-point function which we denote by $Z^{(1)}\left[{f\atop g}\right](v;\tau)$ and is independent of $z$. In order to consider modular-invariance of n-point functions at genus 1, Zhu [Z1] introduced a second isomorphic 'square-bracket' VOSA $(V,Y[\,,],\mathbf{1},\tilde\omega)$ associated to a given VOSA $(V,Y(\,,),\mathbf{1},\omega)$. The new vertex operators are defined by a change of coordinates
\[
Y[v,z]=\sum_{n\in\mathbb{Z}}v[n]z^{-n-1}=Y(q_z^{L(0)}v,q_z-1),
\]
while the new conformal vector is $\tilde\omega=\omega-\tfrac{c}{24}\mathbf{1}$. We set $Y[\tilde\omega,z]=\sum_{n\in\mathbb{Z}}L[n]z^{-n-2}$ and write $wt[v]=k$ if $L[0]v=kv$, $V_{[k]}=\{v\in V\,|\,wt[v]=k\}$. Only primary vectors are homogeneous with respect to both $L(0)$ and $L[0]$, in which case $wt(v)=wt[v]$. One can show that n-point functions can be expressed in terms of 1-point functions to find [MT4]
\[
Z^{(1)}\left[{f\atop g}\right](v_1,z_1;\ldots;v_n,z_n;\tau)
=Z^{(1)}\left[{f\atop g}\right]\left(Y[v_1,z_1-z_n]\ldots Y[v_{n-1},z_{n-1}-z_n]v_n;\tau\right).
\tag{37}
\]

4.2. Genus two n-point correlation functions. In the $\epsilon$-sewing scheme we sew two tori $\Sigma_a^{(1)}$, $a=1,2$ with modular parameters $\tau_a$ via the sewing relation (4). Similarly to ref. [MT1] for VOAs, we define the genus two orbifold n-point correlation function in the $\epsilon$-sewing scheme for a VOSA $V$ with a Li–Z metric as follows. Let $f_a,g_a$ be $V$ automorphisms and let $M_{\sigma g_a}$ be $\sigma g_a$-twisted $V$-modules stable under $\sigma$ and $f_a$ for commuting $f_a$, $g_a$ and $\sigma$. We combine $f_1,g_1$ orbifold correlation functions on $\Sigma_1^{(1)}$ with $f_2,g_2$ orbifold correlation functions on $\Sigma_2^{(1)}$. For $x_1,\ldots,x_k\in\Sigma_1^{(1)}$ with $|x_i|\ge|\epsilon|/r_2$ and $y_{k+1},\ldots,y_n\in\Sigma_2^{(1)}$ with $|y_i|\ge|\epsilon|/r_1$, define the genus two orbifold n-point function as the following formal series in $\epsilon$:
\[
\begin{aligned}
Z^{(2)}\left[{f\atop g}\right]&(v_1,x_1;\ldots;v_k,x_k\,|\,v_{k+1},y_{k+1};\ldots;v_n,y_n;\tau_1,\tau_2,\epsilon)\\
&=\sum_{u\in V}Z^{(1)}\left[{f_1\atop g_1}\right]\left(Y[v_1,x_1]\ldots Y[v_k,x_k]u;\tau_1\right)\\
&\qquad\cdot Z^{(1)}\left[{f_2\atop g_2}\right]\left(Y[v_{k+1},y_{k+1}]\ldots Y[v_n,y_n]\bar u;\tau_2\right),
\end{aligned}
\tag{38}
\]
where $f$ (respectively $g$) denotes the pair $f_1,f_2$ (respectively $g_1,g_2$). The sum is taken over any $V$-basis where $\bar u$ is the dual of $u$ with respect to the Li–Z metric $\langle\ ,\ \rangle^{sq}_\lambda$ of (26) as defined by the square bracket Virasoro operators $\{L[n]\}$ and with $\lambda$ of (21).
Remark 1. Equation (38) reduces to the definition given in ref. [MT1] as follows. For $u,v$ of equal square bracket weight we have
\[
\langle u,v\rangle^{sq}_\lambda=\epsilon^{-wt[u]}\langle u,v\rangle_{sq},
\tag{39}
\]
where $\langle u,v\rangle_{sq}$ denotes the standard Li–Z metric corresponding to the choice $\lambda=\pm\sqrt{-1}$. Then (38) can be rewritten as
\[
\begin{aligned}
Z^{(2)}\left[{f\atop g}\right]&(v_1,x_1;\ldots;v_k,x_k\,|\,v_{k+1},y_{k+1};\ldots;v_n,y_n;\tau_1,\tau_2,\epsilon)\\
&=\sum_{r\in\frac{1}{2}\mathbb{Z}}\epsilon^{r}\sum_{u\in V_{[r]}}Z^{(1)}\left[{f_1\atop g_1}\right]\left(Y[v_1,x_1]\ldots Y[v_k,x_k]u;\tau_1\right)\\
&\qquad\cdot Z^{(1)}\left[{f_2\atop g_2}\right]\left(Y[v_{k+1},y_{k+1}]\ldots Y[v_n,y_n]\bar u;\tau_2\right),
\end{aligned}
\]
where here $u$ ranges over any $V_{[r]}$-basis and $\bar u$ is the dual of $u$ with respect to the standard Li–Z metric $\langle u,v\rangle_{sq}$.

In the case where no states $v_i$ are inserted then (38) defines the genus two partition (or 0-point) function
\[
Z^{(2)}\left[{f\atop g}\right](\tau_1,\tau_2,\epsilon)=\sum_{u\in V}Z^{(1)}\left[{f_1\atop g_1}\right](u;\tau_1)\,Z^{(1)}\left[{f_2\atop g_2}\right](\bar u;\tau_2).
\tag{40}
\]
The definition (38) depends on the choice of insertion points $x_i\in\widehat{\Sigma}_1^{(1)}$ and $y_j\in\widehat{\Sigma}_2^{(1)}$. However, similarly to the situation for a VOA discussed in ref. [MT1], we may define an associated formal differential form for quasi-primary vectors as follows:

Proposition 1. Let $v_i\in V$ be quasi-primary vectors of square bracket weight $wt[v_i]$ for $i=1,\ldots,n$. Let $x_i\in\widehat{\Sigma}_1^{(1)}$ and $y_i\in\widehat{\Sigma}_2^{(1)}$ be related by the sewing relation $x_iy_i=\epsilon=-\lambda^2$. Then the formal differential form
\[
\begin{aligned}
\mathcal{F}^{(2)}\left[{f\atop g}\right]&(v_1,\ldots,v_n;\tau_1,\tau_2,\epsilon)\\
&\equiv(-1)^{N_k}Z^{(2)}\left[{f\atop g}\right](v_1,x_1;\ldots;v_k,x_k\,|\,v_{k+1},y_{k+1};\ldots;v_n,y_n;\tau_1,\tau_2,\epsilon)
\cdot\prod_{i=1}^{k}dx_i^{wt[v_i]}\prod_{j=k+1}^{n}dy_j^{wt[v_j]},
\end{aligned}
\tag{41}
\]
is independent of the choice of $k=0,\ldots,n$, where $N_k$ is the number of odd parity vectors in the set $\{v_1,\ldots,v_k\}$, and where the branch covering (25) is chosen with
\[
\left(\frac{dy_i}{dx_i}\right)^{wt[v_i]}=\left(\frac{\lambda}{x_i}\right)^{2wt[v_i]}.
\]
Proof. For $k\in\{1,\ldots,n\}$ consider
\[
\begin{aligned}
Z^{(2)}\left[{f\atop g}\right]&(v_1,x_1;\ldots;v_k,x_k\,|\,v_{k+1},y_{k+1};\ldots;v_n,y_n)\prod_{i=1}^{k}dx_i^{wt[v_i]}\prod_{j=k+1}^{n}dy_j^{wt[v_j]}\\
&=\sum_{u\in V}Z^{(1)}\left[{f_1\atop g_1}\right]\left(Y[v_1,x_1]\ldots Y[v_k,x_k]u;\tau_1\right)\prod_{i=1}^{k}dx_i^{wt[v_i]}\\
&\qquad\cdot Z^{(1)}\left[{f_2\atop g_2}\right]\left(Y[v_{k+1},y_{k+1}]\ldots Y[v_n,y_n]\bar u;\tau_2\right)\prod_{j=k+1}^{n}dy_j^{wt[v_j]}.
\end{aligned}
\tag{42}
\]
We have $Y[v_k,x_k]u=\sum_{v\in V}\langle\bar v,Y[v_k,x_k]u\rangle^{sq}_\lambda\,v$, where $v$ is summed over any $V$-basis. Since $v_k$ is quasi-primary, (24) implies
\[
\begin{aligned}
\langle\bar v,Y[v_k,x_k]u\rangle^{sq}_\lambda
&=\langle\bar v,Y^\dagger[v_k,y_k]u\rangle^{sq}_\lambda\left(\frac{dy_k}{dx_k}\right)^{wt[v_k]}\\
&=(-1)^{p_kp(v)}\langle Y[v_k,y_k]\bar v,u\rangle^{sq}_\lambda\left(\frac{dy_k}{dx_k}\right)^{wt[v_k]},
\end{aligned}
\]
using invariance (26) and where $p_k=p(v_k)$. Hence (42) becomes
\[
\begin{aligned}
\sum_{v\in V}&(-1)^{p_kp(v)}Z^{(1)}\left[{f_1\atop g_1}\right]\left(Y[v_1,x_1]\ldots Y[v_{k-1},x_{k-1}]v;\tau_1\right)\prod_{i=1}^{k-1}dx_i^{wt[v_i]}\\
&\cdot(-1)^{p_k(p_{k+1}+\cdots+p_n)}Z^{(1)}\left[{f_2\atop g_2}\right]\left(Y[v_k,y_k]Y[v_{k+1},y_{k+1}]\ldots Y[v_n,y_n]\bar v;\tau_2\right)\prod_{j=k}^{n}dy_j^{wt[v_j]},
\end{aligned}
\]
using $\sum_{u\in V}\langle Y[v_k,y_k]\bar v,u\rangle^{sq}_\lambda\,\bar u=Y[v_k,y_k]\bar v$ and locality. Finally, (36) implies non-vanishing contributions arise only if $p(v)=p_1+\cdots+p_{k-1}$ so that $(-1)^{p_kp(v)}(-1)^{p_k(p_{k+1}+\cdots+p_n)}=(-1)^{p_k}$. But $N_k=p_k+N_{k-1}$, where $N_{k-1}$ is the number of odd parity vectors in the set $\{v_1,\ldots,v_{k-1}\}$. Hence $(-1)^{p_k}=(-1)^{N_{k-1}-N_k}$ and the result follows.

5. The Free Fermion VOSA

5.1. Genus one. Consider the rank 2 free fermion VOSA $V(H,\mathbb{Z}+\tfrac{1}{2})^{\otimes 2}$ generated by $\psi^\pm$. In this case, the parity automorphism (34) is described by $\sigma=e^{i\pi a(0)}$ for Heisenberg vector $a$. We also define two commuting automorphisms $f,g$ by³
\[
\sigma f=e^{2\pi i\beta a(0)},\qquad\sigma g=e^{-2\pi i\alpha a(0)},
\]
for real $\alpha,\beta$. It is also convenient to define $\theta=-e^{-2\pi i\beta}$, $\phi=-e^{2\pi i\alpha}$, in accordance with (3). The twisted partition function is then e.g. [Ka,MTZ]
\[
Z^{(1)}\left[{f\atop g}\right](\tau)=q^{\alpha^2/2-1/24}\prod_{l\ge 1}\left(1-\theta^{-1}q^{l-\frac{1}{2}+\alpha}\right)\left(1-\theta q^{l-\frac{1}{2}-\alpha}\right).
\tag{43}
\]
³ Note some notational changes from ref. [MTZ].
Equation (43) vanishes for $(\alpha, \beta) = (\frac{1}{2}, \frac{1}{2})$, i.e. $(\theta, \phi) = (1, 1)$. We will assume that $(\theta, \phi) \neq (1, 1)$ for the remainder of this discussion. In ref. [MTZ] it is shown by using associativity how to compute all twisted genus one $n$-point functions from a generating function which is the $2n$-point function for $n$ $\psi^{+}$ and $n$ $\psi^{-}$ vectors:
\[ Z^{(1)}\left[{f \atop g}\right](\psi^{+}, x_1; \psi^{-}, y_1; \ldots; \psi^{+}, x_n; \psi^{-}, y_n; \tau) = \det P \cdot Z^{(1)}\left[{f \atop g}\right](\tau), \tag{44} \]
where $P$ is the $n \times n$ matrix
\[ P = \left( P_1\left[{\theta \atop \phi}\right](x_i - y_j, \tau) \right), \tag{45} \]
for $1 \le i, j \le n$ and where $P_1\left[{\theta \atop \phi}\right](z, \tau)$ is the twisted Weierstrass function defined in (67). Thus, in particular, for a homogeneous square bracket weight Fock vector
\[ \Psi[k, l] \equiv \psi^{+}[-k_1] \ldots \psi^{+}[-k_s]\, \psi^{-}[-l_1] \ldots \psi^{-}[-l_t] \mathbf{1}, \tag{46} \]
we find that the genus one 1-point function is given by [MTZ],
\[ Z^{(1)}\left[{f \atop g}\right](\Psi[k, l], \tau) = \delta_{st} (-1)^{s(s-1)/2} Z^{(1)}\left[{f \atop g}\right](\tau) \det C\left[{\theta \atop \phi}\right](k, l, \tau), \tag{47} \]
where $C\left[{\theta \atop \phi}\right](k, l, \tau)$ is the $s \times s$ matrix
\[ C\left[{\theta \atop \phi}\right](k, l, \tau) = \left( C\left[{\theta \atop \phi}\right](k_i, l_j, \tau) \right), \tag{48} \]
for $1 \le i, j \le s$ as defined by (69). Note that (47) is non-vanishing for $\Psi[k, l]$ of even parity (integer weight) in agreement with (36).
5.2. The genus two partition function. We now come to the main results of this paper where, for the rank two fermion VOSA, we compute the genus two partition function and the generating form on the genus two Riemann surface formed by sewing together two tori as defined by (38). Consider commuting automorphisms $f_a, g_a$ for $a = 1, 2$ parameterized by
\[ \sigma f_a = e^{2\pi i \beta_a a(0)}, \qquad \sigma g_a = e^{-2\pi i \alpha_a a(0)}, \]
and define $\theta_a = -e^{-2\pi i \beta_a}$, $\phi_a = -e^{2\pi i \alpha_a}$, where $(\theta_a, \phi_a) \neq (1, 1)$. (The case where $(\theta_a, \phi_a) = (1, 1)$ will be considered elsewhere [TZ4]). Recall the infinite matrices $F_a$, $Q$ of (8) and (11),
\[ F_a\left[{\theta_a \atop \phi_a}\right](k, l, \tau_a, \epsilon) = \epsilon^{\frac{1}{2}(k + l - 1)}\, C\left[{\theta_a \atop \phi_a}\right](k, l, \tau_a), \qquad Q = \begin{pmatrix} 0 & \xi F_1\left[{\theta_1 \atop \phi_1}\right] \\ -\xi F_2\left[{\theta_2 \atop \phi_2}\right] & 0 \end{pmatrix}. \]
We find the partition function (40) is as follows:
Theorem 1. The genus two partition function for the rank two fermion VOSA is a nonvanishing holomorphic function on $D$ given by
\[ Z^{(2)}\left[{f \atop g}\right](\tau_1, \tau_2, \epsilon) = Z^{(1)}\left[{f_1 \atop g_1}\right](\tau_1)\, Z^{(1)}\left[{f_2 \atop g_2}\right](\tau_2)\, \det(I - Q). \tag{49} \]

To prove this result we first note some determinant formulas for finite matrices. Let $R$ be an $N \times N$ matrix and let $\mathbf{k} = (k_1, \ldots, k_n)$ denote $n$ ordered subindices with $1 \le k_1 < \cdots < k_n \le N$. We refer to $\mathbf{k}$ as an $N$-subindex of length $n$. Let
\[ R(\mathbf{k}, \mathbf{l}) = \left( R_{k_r l_s} \right), \qquad r, s = 1, \ldots, n, \tag{50} \]
denote the $n \times n$ submatrix of $R$ indexed by a pair $\mathbf{k}, \mathbf{l}$ of $N$-subindices of length $n$. We define $\det R(\mathbf{k}, \mathbf{l}) = 1$ in the degenerate case $n = 0$.

Proposition 2. Let $R$ be an $N \times N$ matrix and $I$ the identity matrix. Then
\[ \det(I + R) = \sum_{n=0}^{N} \sum_{\mathbf{j}} \det R(\mathbf{j}, \mathbf{j}), \tag{51} \]
where the inner sum runs over all $N$-subindices of length $n$.

Proof. Consider $\det(I + xR) = \sum_{\sigma \in S_N} \epsilon_\sigma \prod_{i=1}^{N} (\delta_{i\sigma(i)} + x R_{i\sigma(i)})$ for parameter $x$, where $\epsilon_\sigma$ is the signature of $\sigma \in S_N$, the permutation group. Consider the subset of $S_N$ consisting of all permutations $\rho$ fixing at least $N - n$ indices. Each $\rho$ is a permutation on some $\mathbf{j} = (j_1, \ldots, j_n)$, an $N$-subindex of length $n$, where the remaining $N - n$ indices are fixed. Then $\det(I + xR) = \sum_{0 \le n \le N} a_n x^n$ for
\[ a_n = \sum_{\mathbf{j}} \sum_{\rho} \epsilon_\rho \prod_{i=1}^{n} R_{j_i \rho(j_i)} = \sum_{\mathbf{j}} \det R(\mathbf{j}, \mathbf{j}). \]
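The principal-minor expansion (51) can be checked by brute force on a small example. The following is a minimal sketch in exact rational arithmetic; the helper names (`det`, `submatrix`, `principal_minor_sum`) and the sample matrix are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Exact determinant by Laplace expansion along the first row."""
    if not m:
        return Fraction(1)  # empty (n = 0) minor counts as 1, as in the text
    total = Fraction(0)
    for c, entry in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * entry * det(minor)
    return total


def submatrix(m, rows, cols):
    """R(k, l): keep the rows listed in `rows` and the columns in `cols`."""
    return [[m[r][c] for c in cols] for r in rows]


def principal_minor_sum(r):
    """Right-hand side of (51): sum of det R(j, j) over all subindices j."""
    n = len(r)
    return sum(
        (det(submatrix(r, j, j))
         for size in range(n + 1)
         for j in combinations(range(n), size)),
        Fraction(0),
    )


# Arbitrary illustrative 4 x 4 integer matrix (hypothetical test data).
R = [[Fraction(v) for v in row]
     for row in [[2, -1, 3, 0], [1, 4, -2, 5], [0, 3, 1, -1], [2, 0, -3, 1]]]
identity_plus_R = [[R[i][j] + (1 if i == j else 0) for j in range(4)]
                   for i in range(4)]

lhs = det(identity_plus_R)      # det(I + R)
rhs = principal_minor_sum(R)    # sum over principal minors
print(lhs == rhs)
```

Laplace expansion is exponential in the matrix size, so this is only meant as a sanity check for small $N$, not as a practical way to evaluate determinants.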
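Corollary 1. Let $A, B$ be $M \times M$ matrices and let
\[ R = \begin{pmatrix} 0 & tA \\ t^{-1}B & 0 \end{pmatrix} \]
be a $2M \times 2M$ block matrix for parameter $t \neq 0$. Then $\det(I + R)$ is $t$ independent and is given by
\[ \det(I + R) = \sum_{m=0}^{M} (-1)^m \sum_{\mathbf{k}, \mathbf{l}} \det A(\mathbf{k}, \mathbf{l}) \det B(\mathbf{l}, \mathbf{k}), \tag{52} \]
where the inner sum runs over all pairs $\mathbf{k}, \mathbf{l}$ of $M$-subindices of length $m$.

Proof. Clearly
\[ I + R = \begin{pmatrix} t I_M & 0 \\ 0 & I_M \end{pmatrix} \begin{pmatrix} I_M & A \\ B & I_M \end{pmatrix} \begin{pmatrix} t^{-1} I_M & 0 \\ 0 & I_M \end{pmatrix} \]
for $M \times M$ identity matrix $I_M$, so that $\det(I + R)$ is independent of $t$. Next apply (51) to the block matrix $R$. The block structure of $R$ and the $t$ independence of $\det(I + R)$ imply that the inner sum of (51) runs over $2M$-subindices of length $2m$ of the form $\mathbf{j} = (k_1, \ldots, k_m, M + l_1, \ldots, M + l_m)$. The pair $\mathbf{k}, \mathbf{l}$ are $M$-subindices of length $m$ so that
\[ \det(I + R) = \sum_{m=0}^{M} \sum_{\mathbf{k}, \mathbf{l}} \det \begin{pmatrix} 0 & A(\mathbf{k}, \mathbf{l}) \\ B(\mathbf{l}, \mathbf{k}) & 0 \end{pmatrix}. \]
The result then follows.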
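A small numerical experiment illustrates both claims of Corollary 1: the $t$ independence of $\det(I + R)$ and the double minor expansion (52). This is a sketch under stated assumptions: the matrices `A`, `B`, the two values of `t`, and all helper names are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def det(m):
    """Exact determinant by Laplace expansion; the empty matrix has det 1."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for c, entry in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * entry * det(minor)
    return total


def submatrix(m, rows, cols):
    """Submatrix with the given row and column subindices."""
    return [[m[r][c] for c in cols] for r in rows]


def block_R(A, B, t):
    """Build R = [[0, t A], [B / t, 0]] as a 2M x 2M matrix."""
    M = len(A)
    zero = [Fraction(0)] * M
    top = [zero + [t * a for a in row] for row in A]
    bottom = [[b / t for b in row] + zero for row in B]
    return top + bottom


def det_identity_plus(R):
    """det(I + R) for a square matrix R."""
    n = len(R)
    return det([[R[i][j] + (1 if i == j else 0) for j in range(n)]
                for i in range(n)])


def minor_expansion(A, B):
    """Right-hand side of (52): sum over m and over pairs of subindices."""
    M = len(A)
    total = Fraction(0)
    for m in range(M + 1):
        for k in combinations(range(M), m):
            for l in combinations(range(M), m):
                total += ((-1) ** m
                          * det(submatrix(A, k, l))
                          * det(submatrix(B, l, k)))
    return total


# Hypothetical 3 x 3 test matrices.
A = [[Fraction(v) for v in row] for row in [[1, 2, 0], [-1, 3, 1], [2, 0, 1]]]
B = [[Fraction(v) for v in row] for row in [[0, 1, 1], [2, -1, 0], [1, 1, 3]]]

values = [det_identity_plus(block_R(A, B, t))
          for t in (Fraction(2), Fraction(-5, 3))]
expansion = minor_expansion(A, B)
print(values[0] == values[1] == expansion)
```

By the Cauchy-Binet formula both sides equal $\det(I - BA)$, which gives an independent way to see why the parameter $t$ drops out.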
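Proof of Theorem 1. We wish to compute the genus two partition function of (40) for the rank two fermion VOSA:
\[ Z^{(2)}\left[{f \atop g}\right](\tau_1, \tau_2, \epsilon) = \sum_{u \in V} Z^{(1)}\left[{f_1 \atop g_1}\right](u, \tau_1)\, Z^{(1)}\left[{f_2 \atop g_2}\right](\bar{u}, \tau_2), \]
where $u$ is summed over any $V$-basis and $\bar{u}$ is the square bracket Li-Z dual. We choose the Fock basis $\{\Psi[k, l]\}$ with $1 \le k_1 < \cdots < k_s$ and $1 \le l_1 < \cdots < l_m$ of (46) with square-bracket dual from (32),
\[ \overline{\Psi}[k, l] = (-1)^{sm} (-\xi)^{p}\, \epsilon^{wt[\Psi]}\, \Psi[l, k]. \tag{53} \]
Furthermore, (47) implies the corresponding torus one point functions are non-vanishing for $m = s$ with even parity $p = 0$, where
\[
\begin{aligned}
\frac{Z^{(1)}\left[{f_1 \atop g_1}\right](\Psi[k, l], \tau_1)}{Z^{(1)}\left[{f_1 \atop g_1}\right](\tau_1)} &= (-1)^{m(m-1)/2} \det C\left[{\theta_1 \atop \phi_1}\right](k, l, \tau_1), \\
\frac{Z^{(1)}\left[{f_2 \atop g_2}\right](\overline{\Psi}[k, l], \tau_2)}{Z^{(1)}\left[{f_2 \atop g_2}\right](\tau_2)} &= (-1)^{m(m-1)/2} (-1)^{m} \epsilon^{wt[\Psi]} \det C\left[{\theta_2 \atop \phi_2}\right](l, k, \tau_2).
\end{aligned}
\]
Hence (suppressing the $\tau_1, \tau_2, \epsilon$ dependence) it follows that
\[ \frac{Z^{(2)}\left[{f \atop g}\right]}{Z^{(1)}\left[{f_1 \atop g_1}\right] Z^{(1)}\left[{f_2 \atop g_2}\right]} = \sum_{m \ge 0} (-1)^m \sum_{\mathbf{k}, \mathbf{l}} \epsilon^{wt[\Psi]} \det C\left[{\theta_1 \atop \phi_1}\right](k, l) \det C\left[{\theta_2 \atop \phi_2}\right](l, k). \]
But $wt[\Psi] = \sum_{i=1}^{m} (k_i + l_i - 1)$ so that the $\epsilon^{\frac{1}{2}(k_i + l_j - 1)}$ factors may be absorbed into the above $m \times m$ determinants to find
\[ \frac{Z^{(2)}\left[{f \atop g}\right]}{Z^{(1)}\left[{f_1 \atop g_1}\right] Z^{(1)}\left[{f_2 \atop g_2}\right]} = \sum_{m \ge 0} (-1)^m \sum_{\mathbf{k}, \mathbf{l}} \det F_1\left[{\theta_1 \atop \phi_1}\right](k, l) \det F_2\left[{\theta_2 \atop \phi_2}\right](l, k), \]
with $F_a$ of (8). Let $A$ and $B$ denote the finite matrices found by truncating $F_1$ and $F_2$ to an arbitrary order in $\epsilon$. Thus applying (52) to $A$ and $B$ with $t = -\xi$ it follows that
\[ \frac{Z^{(2)}\left[{f \atop g}\right]}{Z^{(1)}\left[{f_1 \atop g_1}\right] Z^{(1)}\left[{f_2 \atop g_2}\right]} = \det(I - Q), \]
as an identity between two formal series in $\epsilon$. However, it is shown in ref. [TZ1] that $\det(I - Q)$ is non-vanishing and holomorphic on $D$ and hence the theorem holds.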
We may similarly compute the genus two partition function in the $\epsilon$-formalism for the original rank one fermion VOSA $V(H, \mathbb{Z} + \frac{1}{2})$ where, in this case, we may only construct a $\sigma$-twisted module. Then one finds:

Corollary 2. For the rank one free fermion VOSA $V(H, \mathbb{Z} + \frac{1}{2})$ the genus two partition function in the $\epsilon$-formalism for $f_a, g_a \in \{1, \sigma\}$ is given by
\[ Z^{(2)}\left[{f \atop g}\right](\tau_1, \tau_2, \epsilon) = Z^{(1)}\left[{f_1 \atop g_1}\right](\tau_1)\, Z^{(1)}\left[{f_2 \atop g_2}\right](\tau_2)\, \det(I - Q)^{1/2}, \tag{54} \]
where $Z^{(1)}\left[{f_a \atop g_a}\right](\tau_a)$ is the rank one torus partition function.
5.3. The genus two generating function. In this section we compute the genus two generating form for all $n$-point functions for the rank two free fermion VOSA. This is the genus two analogue of (44) and is defined by
\[ G_n^{(2)}\left[{f \atop g}\right](w_1, \ldots, w_n, z_1, \ldots, z_n) = F^{(2)}\left[{f \atop g}\right](\psi^{+}, \psi^{-}, \ldots, \psi^{+}, \psi^{-}; \tau_1, \tau_2, \epsilon), \tag{55} \]
the formal $2n$-form of (41) found by alternately inserting $\psi^{+}$ at $w_i \in \Sigma^{(1,1)}$ and $\psi^{-}$ at $z_i \in \Sigma^{(1,1)}$ for $i = 1, \ldots, n$, where $\Sigma^{(1,1)}$ denotes the disconnected union of the two punctured tori. In order to describe $G_n^{(2)}\left[{f \atop g}\right]$ we recall the Szegő kernels and half-forms of (7) and (13) and define matrices
\[ S^{(2)} = \left( S^{(2)}(w_i, z_j) \right), \quad S^{(1,1)} = \left( S^{(1,1)}(w_i, z_j) \right), \quad H^{+} = \left( (h(w_i))(k, a) \right), \quad H^{-} = \left( (h(z_i))(l, b) \right)^{T}. \]
$S^{(2)}$ and $S^{(1,1)}$ are finite matrices indexed by $w_i, z_j$ for $i, j = 1, \ldots, n$; $H^{+}$ is semi-infinite with $n$ rows indexed by $w_i$ and columns indexed by $k \ge 1$ and $a = 1, 2$; and $H^{-}$ is semi-infinite with rows indexed by $l \ge 1$ and $b = 1, 2$ and with $n$ columns indexed by $z_j$. We then find

Proposition 3.
\[ \det \begin{pmatrix} S^{(1,1)} & H^{+} \\ H^{-} & I - Q \end{pmatrix} = \det S^{(2)} \det(I - Q), \]
with $Q$ of (11).

Proof. Consider the matrix identity
\[ \begin{pmatrix} S^{(1,1)} & H^{+} \\ H^{-} & I - Q \end{pmatrix} = \begin{pmatrix} I_n & H^{+}(I - Q)^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} S^{(1,1)} - H^{+}(I - Q)^{-1} H^{-} & 0 \\ H^{-} & I \end{pmatrix} \begin{pmatrix} I_n & 0 \\ 0 & I - Q \end{pmatrix}, \]
where $I_n$ is the $n \times n$ identity matrix. But the genus two Szegő kernel of (14) implies
\[ \left( S^{(1,1)} - H^{+}(I - Q)^{-1} H^{-} \right)(w_i, z_j) = S^{(2)}(w_i, z_j). \]
The result follows on taking the determinant.
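The determinant step in this proof is the standard Schur complement identity $\det \begin{pmatrix} S & H^{+} \\ H^{-} & X \end{pmatrix} = \det(S - H^{+} X^{-1} H^{-}) \det X$ for invertible $X$. A finite-dimensional sketch follows, with small hypothetical matrices standing in for $S^{(1,1)}$, $H^{\pm}$ and $X = I - Q$; the paper itself works with infinite matrices and formal series in $\epsilon$, which this example does not attempt to model.

```python
from fractions import Fraction


def det(m):
    """Exact determinant by Laplace expansion; the empty matrix has det 1."""
    if not m:
        return Fraction(1)
    total = Fraction(0)
    for c, entry in enumerate(m[0]):
        minor = [row[:c] + row[c + 1:] for row in m[1:]]
        total += (-1) ** c * entry * det(minor)
    return total


def inverse(m):
    """Inverse via the adjugate: inv[i][j] = cofactor(j, i) / det(m)."""
    n = len(m)
    d = det(m)

    def cofactor(i, j):
        minor = [row[:j] + row[j + 1:] for r, row in enumerate(m) if r != i]
        return (-1) ** (i + j) * det(minor)

    return [[cofactor(j, i) / d for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two rectangular matrices given as lists of rows."""
    return [[sum((a[i][k] * b[k][j] for k in range(len(b))), Fraction(0))
             for j in range(len(b[0]))] for i in range(len(a))]


def to_fractions(rows):
    return [[Fraction(v) for v in row] for row in rows]


# Hypothetical stand-ins: S is 2 x 2, H_plus is 2 x 3, H_minus is 3 x 2,
# and X plays the role of a truncated I - Q (3 x 3, invertible).
S = to_fractions([[1, 2], [3, -1]])
H_plus = to_fractions([[1, 0, 2], [-1, 1, 0]])
H_minus = to_fractions([[2, 1], [0, 1], [1, -1]])
X = to_fractions([[2, 1, 0], [1, 3, 1], [0, 1, 4]])

block = [S[i] + H_plus[i] for i in range(2)] + \
        [H_minus[i] + X[i] for i in range(3)]

correction = matmul(matmul(H_plus, inverse(X)), H_minus)
schur = [[S[i][j] - correction[i][j] for j in range(2)] for i in range(2)]

lhs = det(block)
rhs = det(schur) * det(X)
print(lhs == rhs)
```

In the proposition the Schur complement is identified with the genus two Szegő kernel matrix $S^{(2)}$; here it is just a $2 \times 2$ matrix of rationals.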
We may next describe the generating form:

Theorem 2. The generating form for the rank two free fermion VOSA is given by
\[ G_n^{(2)}\left[{f \atop g}\right](w_1, \ldots, w_n, z_1, \ldots, z_n) = Z^{(2)}\left[{f \atop g}\right](\tau_1, \tau_2, \epsilon) \det S^{(2)}. \tag{56} \]
Remark 2. Relative to the genus two partition function, the normalized 2-point function for $\psi^{+}$ and $\psi^{-}$ is given by the Szegő kernel and, more generally, the $2n$-point function is given by a Szegő kernel determinant. This agrees with the assumed form of the higher genus fermion $2n$-point function in [R] or as found by string theory methods using a Schottky parameterisation in [DVPFHLS].

In order to prove Theorem 2 we require an extension of Proposition 2.
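Proposition 4. Let $R$ and $J_p = \begin{pmatrix} 0 & 0 \\ 0 & I_{N-p} \end{pmatrix}$ be $N \times N$ matrices where $I_{N-p}$ is the identity $(N - p) \times (N - p)$ matrix for $0 \le p \le N$. Then
\[ \det(J_p + R) = \sum_{n=0}^{N-p} \sum_{\mathbf{j}_p} \det R(\mathbf{j}_p, \mathbf{j}_p), \tag{57} \]
where the inner sum runs over all $N$-subindices of length $n + p$ of the form $\mathbf{j}_p = (1, \ldots, p, j_1, \ldots, j_n)$.

Proof. The proof follows along the same lines as Proposition 2 where here we consider
\[ \det(J_p + xR) = x^{p} \sum_{\sigma \in S_N} \epsilon_\sigma \prod_{i=1}^{p} R_{i\sigma(i)} \prod_{i=p+1}^{N} (\delta_{i\sigma(i)} + x R_{i\sigma(i)}). \]
Then $\det(J_p + xR) = x^{p} \sum_{0 \le n \le N-p} a_n x^n$ for
\[ a_n = \sum_{\mathbf{j}_p} \sum_{\rho} \epsilon_\rho \prod_{i=1}^{p} R_{i\rho(i)} \prod_{r=1}^{n} R_{j_r \rho(j_r)} = \sum_{\mathbf{j}_p} \det R(\mathbf{j}_p, \mathbf{j}_p), \]
where $\rho$ is a permutation of $\mathbf{j}_p = (1, \ldots, p, j_1, \ldots, j_n)$. The result then follows as before.

Corollary 3. Let $A, B$ be $M \times M$ matrices and let $U$ be a $p \times M$ matrix and $W$ be a $M \times p$ matrix with $p \le M$. Define the $(p + 2M) \times (p + 2M)$ block matrix
\[ R = \begin{pmatrix} 0 & 0 & U \\ 0 & 0 & tA \\ W & t^{-1}B & 0 \end{pmatrix}, \]
where $t$ is a non-zero scalar parameter. Then $\det(J_p + R)$ is independent of $t$ and is given by
\[ \det(J_p + R) = \sum_{m=p}^{M} (-1)^m \sum_{\mathbf{k}, \mathbf{l}} \det U_A(\mathbf{k}, \mathbf{l}) \det W_B(\mathbf{l}, \mathbf{k}), \tag{58} \]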
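where $\mathbf{k}$ and $\mathbf{l}$ are $M$-subindices of length $m - p$ and $m$ respectively. $U_A(\mathbf{k}, \mathbf{l})$ and $W_B(\mathbf{l}, \mathbf{k})$ are the $m \times m$ submatrices with components
\[
U_A(\mathbf{k}, \mathbf{l})_{ij} = \begin{cases} U_{i\, l_j} & i = 1, \ldots, p, \\ A_{k_{i-p}\, l_j} & i = p + 1, \ldots, m, \end{cases}
\qquad
W_B(\mathbf{l}, \mathbf{k})_{ij} = \begin{cases} W_{l_i\, j} & j = 1, \ldots, p, \\ B_{l_i\, k_{j-p}} & j = p + 1, \ldots, m. \end{cases}
\]

Proof. $\det(J_p + R)$ is $t$ invariant since
\[ (J_p + R)|_{t=1} = \mathrm{diag}(I_p, t^{-1} I_M, I_M)\, (J_p + R)\, \mathrm{diag}(I_p, t I_M, I_M), \]
for identity matrices $I_p$ and $I_M$. The $t$ invariance and the off-diagonal structure of $R$ imply that the inner sum in (57) is taken over $(p + 2M)$-subindices of length $2m$ described by
\[ \mathbf{j}_p = (1, \ldots, p,\; p + k_1, \ldots, p + k_{m-p},\; p + M + l_1, \ldots, p + M + l_m), \]
for $1 \le k_1 < \ldots < k_{m-p} \le M$ and $1 \le l_1 < \ldots < l_m \le M$, i.e. $\mathbf{k}$ and $\mathbf{l}$ are $M$-subindices of length $m - p$ and $m$ respectively. Hence
\[ \det(J_p + R) = \sum_{m=p}^{M} \sum_{\mathbf{k}, \mathbf{l}} \det \begin{pmatrix} 0 & U_A(\mathbf{k}, \mathbf{l}) \\ W_B(\mathbf{l}, \mathbf{k}) & 0 \end{pmatrix}. \]
The result then follows.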
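Proof of Theorem 2. Following Proposition 1 we may evaluate $G_n^{(2)}\left[{f \atop g}\right]$ by inserting the quasi-primary vectors $\psi^{\pm}$ in any way on the disconnected union of punctured tori $\Sigma^{(1,1)}$. In particular, we choose $\psi^{+}$ at $w_i \in \Sigma_1^{(1)}$ and $\psi^{-}$ at $z_i \in \Sigma_2^{(1)}$ for $i = 1, \ldots, n$. Thus, reordering operators and using (38) and (41) we find
\[
\begin{aligned}
G_n^{(2)}\left[{f \atop g}\right] &= G_n^{(2)}\left[{f \atop g}\right](w_1, \ldots, w_n, z_1, \ldots, z_n) \\
&= (-1)^{n(n-1)/2} (-1)^{n} \sum_{u \in V} Z^{(1)}\left[{f_1 \atop g_1}\right](Y[\psi^{+}, w_1] \ldots Y[\psi^{+}, w_n]u, \tau_1) \\
&\qquad \cdot Z^{(1)}\left[{f_2 \atop g_2}\right](Y[\psi^{-}, z_1] \ldots Y[\psi^{-}, z_n]\bar{u}, \tau_2) \prod_{i=1}^{n} dw_i^{\frac{1}{2}} dz_i^{\frac{1}{2}}.
\end{aligned}
\tag{59}
\]
Choose the Fock basis $\{\Psi[k, l]\}$ with $1 \le k_1 < \cdots < k_s$ and $1 \le l_1 < \cdots < l_m$ of (46) with square bracket dual (53). The corresponding torus one point functions are non-vanishing for $n + s = m$ with parity $p = n \bmod 2$ from (36). Expanding (45) using (68) one finds (see Proposition 15 of ref. [MTZ] for details)
\[
\begin{aligned}
\frac{Z^{(1)}\left[{f_1 \atop g_1}\right](Y[\psi^{+}, w_1] \ldots \Psi[k, l], \tau_1)}{Z^{(1)}\left[{f_1 \atop g_1}\right](\tau_1)} &= (-1)^{m(m-1)/2} \det E_1(k, l), \\
\frac{Z^{(1)}\left[{f_2 \atop g_2}\right](Y[\psi^{-}, z_1] \ldots \overline{\Psi}[k, l], \tau_2)}{Z^{(1)}\left[{f_2 \atop g_2}\right](\tau_2)} &= (-1)^{m(m+1)/2} (-\xi)^{p} \epsilon^{wt[\Psi]} \det E_2(l, k),
\end{aligned}
\]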
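for $m \times m$ matrices with components
\[
(E_1(k, l))_{ij} = \begin{cases} D\left[{\theta_1 \atop \phi_1}\right](1, l_j, \tau_1, w_i) & i = 1, \ldots, n, \\ C\left[{\theta_1 \atop \phi_1}\right](k_{i-n}, l_j, \tau_1) & i = n + 1, \ldots, m, \end{cases}
\qquad
(E_2(l, k))_{ij} = \begin{cases} D\left[{\theta_2 \atop \phi_2}\right](l_i, 1, \tau_2, -z_j) & j = 1, \ldots, n, \\ C\left[{\theta_2 \atop \phi_2}\right](l_i, k_{j-n}, \tau_2) & j = n + 1, \ldots, m, \end{cases}
\]
for $C\left[{\theta_a \atop \phi_a}\right]$, $D\left[{\theta_a \atop \phi_a}\right]$ of (69) and (70). Since $p = n \bmod 2$, one finds $(-\xi)^{p} = (-1)^{n(n+1)/2} \xi^{n}$ so that altogether
\[ \frac{G_n^{(2)}\left[{f \atop g}\right]}{Z^{(1)}\left[{f_1 \atop g_1}\right] Z^{(1)}\left[{f_2 \atop g_2}\right]} = \sum_{m \ge 0} (-1)^m \sum_{\mathbf{k}, \mathbf{l}} \epsilon^{wt[\Psi]} \xi^{n} \det E_1(k, l) \det E_2(l, k). \]
But $wt[\Psi] = \sum_{i=1}^{m-n} (k_i - \frac{1}{2}) + \sum_{j=1}^{m} (l_j - \frac{1}{2})$ so that factors of $\epsilon^{\frac{1}{2} l_j - \frac{1}{4}}$ and $\epsilon^{\frac{1}{2} k_i - \frac{1}{4}}$ may be absorbed into the rows and columns of the above determinants. Furthermore, factors of $dw_i^{\frac{1}{2}}$ and $dz_i^{\frac{1}{2}}$ can be absorbed into the first $n$ rows and columns of $\det E_1$ and $\det E_2$ respectively. Lastly, a factor of $\xi$ can be absorbed into the first $n$ rows of $\det E_1(k, l)$ to find
\[ \frac{G_n^{(2)}\left[{f \atop g}\right]}{Z^{(1)}\left[{f_1 \atop g_1}\right] Z^{(1)}\left[{f_2 \atop g_2}\right]} = \sum_{m \ge 0} (-1)^m \sum_{\mathbf{k}, \mathbf{l}} \det G_1(k, l) \det G_2(l, k), \]
for $m \times m$ matrices
\[
(G_1(k, l))_{ij} = \begin{cases} \xi h_1(l_j, \tau_1, w_i) & i = 1, \ldots, n, \\ F_1(k_{i-n}, l_j, \tau_1) & i = n + 1, \ldots, m, \end{cases}
\qquad
(G_2(l, k))_{ij} = \begin{cases} h_2(l_i, \tau_2, z_j) & j = 1, \ldots, n, \\ F_2(l_i, k_{j-n}, \tau_2) & j = n + 1, \ldots, m, \end{cases}
\]
with $F_a$, $h_a$ of (8) and (9). Finally, let $A$, $B$, $U$ and $W$ denote the finite matrices found by truncating $F_1$, $F_2$, $h_1(w_i)$ and $h_2(z_j)$ respectively to an arbitrary order in $\epsilon$. Thus applying Corollary 3 to $A$, $B$, $U$ and $W$ with $n = p$ and $t = -\xi$ it follows that as a formal series in $\epsilon$ we have
\[ \frac{G_n^{(2)}\left[{f \atop g}\right]}{Z^{(1)}\left[{f_1 \atop g_1}\right] Z^{(1)}\left[{f_2 \atop g_2}\right]} = \det \begin{pmatrix} 0 & H^{+} \\ H^{-} & I - Q \end{pmatrix}, \]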
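where $H^{+} = \left( 0, h_1(l_j, w_i) \right)$ and $H^{-} = \left( 0, h_2(l_i, z_j) \right)^{T}$. Finally, using Proposition 3 we find a convergent series in $\epsilon$ for $w_i \in \Sigma_1^{(1)}$ and $z_i \in \Sigma_2^{(1)}$,
\[ \frac{G_n^{(2)}\left[{f \atop g}\right]}{Z^{(1)}\left[{f_1 \atop g_1}\right] Z^{(1)}\left[{f_2 \atop g_2}\right]} = \det S^{(2)} \det(I - Q), \]
and hence the theorem follows on applying Theorem 1.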
Remark 3. The other choices of the insertion points for $\psi^{\pm}$ give rise to corresponding $H^{\pm}$ and $S^{(1,1)}$ terms in Proposition 3 leading to the same result (56).

As an illustration of the use of the generating form we compute the one-point function for the Virasoro vector $\tilde{\omega} = \frac{1}{2}(\psi^{+}[-2]\psi^{-} + \psi^{-}[-2]\psi^{+})$. Let $w, z \in \Sigma_1^{(1)}$ and consider the generating form $G_1^{(2)}\left[{f \atop g}\right](w, z) = S^{(2)}(w, z)\, Z^{(2)}\left[{f \atop g}\right]$ (where we suppress the $\tau_1, \tau_2, \epsilon$ dependence). Using (37) we find
\[
\begin{aligned}
\partial_w Z^{(2)}\left[{f \atop g}\right](\psi^{+}, w; \psi^{-}, z) &= \partial_w Z^{(2)}\left[{f \atop g}\right](Y[\psi^{+}, w - z]\psi^{-}, z) \\
&= -\frac{1}{(w - z)^2} Z^{(2)}\left[{f \atop g}\right] + Z^{(2)}\left[{f \atop g}\right](\psi^{+}[-2]\psi^{-}, z) + \ldots,
\end{aligned}
\]
and similarly for $\partial_z$. Letting $S^{(2)}(w, z) = K^{(2)}(w, z)\, dw^{\frac{1}{2}} dz^{\frac{1}{2}}$ it follows that the Virasoro 1-point form is given by
\[ F^{(2)}\left[{f \atop g}\right](\tilde{\omega}, z) = dz^2 \lim_{w \to z} \left( \frac{1}{2} (\partial_w - \partial_z) K^{(2)}(w, z) + \frac{1}{(w - z)^2} \right) Z^{(2)}\left[{f \atop g}\right]. \tag{60} \]
An alternative expression for this is shown below in Proposition 6.

5.4. Bosonization and a genus two Jacobi product identity. Consider the decomposition of the rank two fermion VOSA into irreducible modules $M \otimes e^{m}$ (for $m \in \mathbb{Z}$) of the Heisenberg subVOA $M$ generated by the Heisenberg state $a$. The genus one partition function (43) can thus also be expressed as (e.g. [Ka,MTZ])
\[ Z^{(1)}\left[{f \atop g}\right](\tau) = \frac{e^{-2\pi i \alpha \beta}}{\eta(\tau)}\, \vartheta^{(1)}\left[{\alpha \atop \beta}\right](\tau), \]
for theta function (1) and Dedekind eta-function $\eta(\tau) = q^{1/24} \prod_{n=1}^{\infty} (1 - q^n)$. All $n$-point functions can be similarly computed in terms of Heisenberg module traces [MTZ] so that the genus two partition function (49) can also be computed in this bosonized formalism to obtain [MT1]
\[ Z^{(2)}\left[{f \atop g}\right](\tau_1, \tau_2, \epsilon) = e^{-2\pi i \alpha \cdot \beta}\, Z_M^{(2)}(\tau_1, \tau_2, \epsilon)\, \vartheta^{(2)}\left[{\alpha \atop \beta}\right](\Omega^{(2)}), \tag{61} \]
for genus two Riemann theta function with characteristics $\alpha = (\alpha_1, \alpha_2)$, $\beta = (\beta_1, \beta_2)$, and where
\[ Z_M^{(2)}(\tau_1, \tau_2, \epsilon) = \frac{1}{\eta(\tau_1) \eta(\tau_2) \det(I - A_1 A_2)^{1/2}} \]
is the genus two partition function for the rank one free Heisenberg VOA $M$. $A_a$ for $a = 1, 2$ is an infinite matrix with components indexed by $k, l \ge 1$ [MT1,MT2]
\[ A_a(k, l, \tau_a, \epsilon) = \epsilon^{(k+l)/2} \frac{(-1)^{k+1} (k + l - 1)!}{\sqrt{kl}\, (k - 1)!\, (l - 1)!} E_{k+l}(\tau_a), \]
for a standard Eisenstein series $E_n(\tau) = E_n\left[{1 \atop 1}\right](\tau)$. Comparing with Theorem 1 we find a new identity relating the genus two theta function to determinants on $D$ as follows.

Theorem 3.
\[ \frac{\vartheta^{(2)}\left[{\alpha \atop \beta}\right](\Omega^{(2)})}{\vartheta^{(1)}\left[{\alpha_1 \atop \beta_1}\right](\tau_1)\, \vartheta^{(1)}\left[{\alpha_2 \atop \beta_2}\right](\tau_2)} = \det(I - A_1 A_2)^{1/2} \det(I - Q). \]

It is shown in ref. [MT1] that $\det(I - A_1 A_2)$ can be expressed as an infinite product as follows. Let $\sigma_{2n} = (k_1, \ldots, k_{2n})$ denote a cycle permutation on $2n$ positive integers. We may canonically associate each $\sigma$ with an oriented graph $N$ consisting of $2n$ valence 2 nodes labelled by $k_1, \ldots, k_{2n}$. $N$ is said to be rotationless when it admits no nontrivial rotations (a rotation being an orientation-preserving automorphism of $N$ which preserves node labels). Lastly, we define a weight function $\zeta_A$ on $N$ by
\[ \zeta_A(N) = \prod_{i=1}^{n} A_1(k_{2i-1}, k_{2i})\, A_2(k_{2i}, k_{2i+1}), \]
where $k_{2n+1} \equiv k_1$. We then find [MT1]
\[ \det(I - A_1 A_2) = \prod_{N \in \mathcal{R}} (1 - \zeta_A(N)), \]
where $\mathcal{R}$ denotes the set of rotationless oriented cycle graphs with an even number of nodes. This expansion can be similarly applied to $\det(I - Q) = \det(I - F_1 F_2)$ with corresponding weight function $\zeta_F$. Hence Theorem 3 implies a genus two Jacobi product-like formula.

Proposition 5.
\[ \frac{\vartheta^{(2)}\left[{\alpha \atop \beta}\right](\Omega^{(2)})}{\vartheta^{(1)}\left[{\alpha_1 \atop \beta_1}\right](\tau_1)\, \vartheta^{(1)}\left[{\alpha_2 \atop \beta_2}\right](\tau_2)} = \prod_{N \in \mathcal{R}} (1 - \zeta_A(N))^{1/2} (1 - \zeta_F(N)). \]
Remark 4. The bosonization procedure can be applied to obtain an alternative expression for the genus two generating form of Theorem 2, which yields Fay's trisecant identity relating $\det S^{(2)}$ to a product of prime forms [TZ4].
5.5. A genus two Ward Identity. We may also recompute the 1-point function (60) for the Virasoro vector $\tilde{\omega} = \frac{1}{2} a[-1]a$ in the bosonized version of the rank two free fermion VOSA. We introduce the differential operator [F1,U,MT1]
\[ \mathcal{D} = \frac{1}{2\pi i} \sum_{1 \le i \le j \le 2} \nu_i^{(2)}(x)\, \nu_j^{(2)}(x)\, \frac{\partial}{\partial \Omega_{ij}^{(2)}}, \tag{62} \]
for holomorphic 1-forms $\nu_i^{(2)}$. We also recall the genus two projective connection $s^{(2)}(x)$ of Appendix 7.1. Using (61) and results of [MT1] we find

Proposition 6. The Virasoro 1-point form for the rank two fermion VOSA satisfies a genus two Ward identity
\[ F^{(2)}\left[{f \atop g}\right](\tilde{\omega}, x; \tau_1, \tau_2, \epsilon) = e^{-2\pi i \alpha \cdot \beta}\, Z_M^{(2)}(\tau_1, \tau_2, \epsilon) \left( \mathcal{D} + \frac{1}{12} s^{(2)}(x) \right) \vartheta^{(2)}\left[{\alpha \atop \beta}\right](\Omega^{(2)}). \tag{63} \]
The Ward identity (63) is similar to previous results in physics and mathematics, e.g. [EO,KNTY].
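6. Modular Invariance Properties

We next consider the automorphic properties of the genus two partition function for the rank two fermion VOSA. In [MTZ] we define the action of $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{Z})$ on a genus one orbifold partition function $Z^{(1)}\left[{f \atop g}\right](\tau)$ as follows:
\[ Z^{(1)}\left[{f \atop g}\right] \Big|_{\gamma} (\tau) = Z^{(1)}\left( \gamma . \left[{f \atop g}\right] \right)(\gamma . \tau), \tag{64} \]
where $\gamma . \tau = \frac{a\tau + b}{c\tau + d}$ and $\gamma . \left[{f \atop g}\right] = \left[{f^a g^b \atop f^c g^d}\right]$. For the rank two fermion VOSA we find modular invariance with [MTZ],
\[ Z^{(1)}\left[{f \atop g}\right] \Big|_{\gamma} (\tau) = e_\gamma^{(1)}\left[{f \atop g}\right] Z^{(1)}\left[{f \atop g}\right](\tau), \tag{65} \]
where $e_\gamma^{(1)}\left[{f \atop g}\right] \in U(1)$ is a specific multiplier system.$^{4}$

$^{4}$ Note a notational change for the multiplier from that of ref. [MTZ].

In Theorem 1 we showed that the genus two partition function is holomorphic on the domain $D$ of (5). $D$ is preserved under the action of $G \simeq (SL(2, \mathbb{Z}) \times SL(2, \mathbb{Z})) \rtimes \mathbb{Z}_2$,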
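the direct product of the left and right torus modular groups, which are interchanged upon conjugation by an involution $\beta$ defined as follows [MT2]:
\[
\begin{aligned}
\gamma_1 (\tau_1, \tau_2, \epsilon) &= \left( \gamma_1 . \tau_1,\; \tau_2,\; \frac{\epsilon}{c_1 \tau_1 + d_1} \right), \\
\gamma_2 (\tau_1, \tau_2, \epsilon) &= \left( \tau_1,\; \gamma_2 . \tau_2,\; \frac{\epsilon}{c_2 \tau_2 + d_2} \right), \\
\beta (\tau_1, \tau_2, \epsilon) &= (\tau_2, \tau_1, \epsilon),
\end{aligned}
\]
for $(\gamma_1, \gamma_2) \in SL(2, \mathbb{Z}) \times SL(2, \mathbb{Z})$ with $\gamma_i = \begin{pmatrix} a_i & b_i \\ c_i & d_i \end{pmatrix}$. There is a natural injection $G \to Sp(4, \mathbb{Z})$ in which the two $SL(2, \mathbb{Z})$ subgroups are mapped to
\[
\Gamma_1 = \left\{ \begin{pmatrix} a_1 & 0 & b_1 & 0 \\ 0 & 1 & 0 & 0 \\ c_1 & 0 & d_1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} \right\}, \qquad
\Gamma_2 = \left\{ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & a_2 & 0 & b_2 \\ 0 & 0 & 1 & 0 \\ 0 & c_2 & 0 & d_2 \end{pmatrix} \right\},
\]
and the involution is mapped to
\[ \beta = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}. \]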
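As a quick consistency check on this embedding, one can verify that the displayed $4 \times 4$ matrices are symplectic and that conjugation by $\beta$ exchanges the two $SL(2, \mathbb{Z})$ images. The sketch below assumes the standard symplectic form $J = \begin{pmatrix} 0 & I_2 \\ -I_2 & 0 \end{pmatrix}$ (a convention not stated in this section) and uses hypothetical helper names.

```python
def matmul(a, b):
    """Product of two 4 x 4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def gamma1(a, b, c, d):
    """Image of (a b; c d) acting on the first torus."""
    return [[a, 0, b, 0], [0, 1, 0, 0], [c, 0, d, 0], [0, 0, 0, 1]]


def gamma2(a, b, c, d):
    """Image of (a b; c d) acting on the second torus."""
    return [[1, 0, 0, 0], [0, a, 0, b], [0, 0, 1, 0], [0, c, 0, d]]


BETA = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

# Assumed convention: standard symplectic form J = [[0, I], [-I, 0]].
J = [[0, 0, 1, 0], [0, 0, 0, 1], [-1, 0, 0, 0], [0, -1, 0, 0]]


def is_symplectic(m):
    """True when M^T J M = J."""
    return matmul(transpose(m), matmul(J, m)) == J


# Two sample SL(2, Z) elements (determinant one).
samples = [(2, 1, 1, 1), (1, -1, 0, 1)]

all_symplectic = (
    is_symplectic(BETA)
    and all(is_symplectic(gamma1(*g)) for g in samples)
    and all(is_symplectic(gamma2(*g)) for g in samples)
)

# beta is an involution, so conjugation by beta is beta * M * beta.
swapped = all(
    matmul(BETA, matmul(gamma1(*g), BETA)) == gamma2(*g) for g in samples
)

# A 2 x 2 matrix of determinant 2 should fail the symplectic test.
counterexample = is_symplectic(gamma1(2, 0, 0, 1))

print(all_symplectic, swapped, counterexample)
```

The failing case with determinant 2 shows that the condition $a_i d_i - b_i c_i = 1$ is exactly what makes the block embedding land in $Sp(4, \mathbb{Z})$ under this convention.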
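In a similar way to (64) we define an action of $\gamma \in G$ on the genus two orbifold twisted partition function (40) by
\[ Z^{(2)}\left[{f \atop g}\right] \Big|_{\gamma} (\tau_1, \tau_2, \epsilon) = Z^{(2)}\left( \gamma . \left[{f \atop g}\right] \right)\left( \gamma . (\tau_1, \tau_2, \epsilon) \right), \]
generated by $\gamma_i \in \Gamma_i$ and $\beta$ with
\[
\gamma_1 . \begin{bmatrix} f_1 \\ f_2 \\ g_1 \\ g_2 \end{bmatrix} = \begin{bmatrix} f_1^{a_1} g_1^{b_1} \\ f_2 \\ f_1^{c_1} g_1^{d_1} \\ g_2 \end{bmatrix}, \qquad
\gamma_2 . \begin{bmatrix} f_1 \\ f_2 \\ g_1 \\ g_2 \end{bmatrix} = \begin{bmatrix} f_1 \\ f_2^{a_2} g_2^{b_2} \\ g_1 \\ f_2^{c_2} g_2^{d_2} \end{bmatrix}, \qquad
\beta . \begin{bmatrix} f_1 \\ f_2 \\ g_1 \\ g_2 \end{bmatrix} = \begin{bmatrix} f_2 \\ f_1 \\ g_2 \\ g_1 \end{bmatrix}.
\]
We may now describe the modular invariance of the genus two partition function for the rank two VOSA of Theorem 1 under the action of $G$. Define a genus two multiplier system $e_\gamma^{(2)}\left[{f \atop g}\right] \in U(1)$ for $\gamma \in G$ in terms of the genus one multiplier system as follows:
\[ e_{\gamma_i}^{(2)}\left[{f \atop g}\right] = e_{\gamma_i}^{(1)}\left[{f_i \atop g_i}\right], \qquad e_{\beta}^{(2)}\left[{f \atop g}\right] = 1, \tag{66} \]
for $G$ generators $\gamma_i \in \Gamma_i$ and $\beta$. We then find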
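Theorem 4. The genus two orbifold partition function for the rank two VOSA is modular invariant with respect to $G = (SL(2, \mathbb{Z}) \times SL(2, \mathbb{Z})) \rtimes \mathbb{Z}_2$ with multiplier system (66), i.e.
\[ Z^{(2)}\left[{f \atop g}\right] \Big|_{\gamma} (\tau_1, \tau_2, \epsilon) = e_\gamma^{(2)}\left[{f \atop g}\right] Z^{(2)}\left[{f \atop g}\right](\tau_1, \tau_2, \epsilon). \]

Proof. We recall from Theorem 1 that the genus two partition function can be expressed as
\[ Z^{(2)}\left[{f \atop g}\right](\tau_1, \tau_2, \epsilon) = \sum_{\mathbf{k}, \mathbf{l}} (-1)^m \epsilon^{wt[\Psi]}\, Z^{(1)}\left[{f_1 \atop g_1}\right](\Psi[k, l], \tau_1)\, Z^{(1)}\left[{f_2 \atop g_2}\right](\Psi[l, k], \tau_2), \]
for $1 \le k_1 < \cdots < k_m$ and $1 \le l_1 < \cdots < l_m$ with Fock basis $\{\Psi[k, l]\}$ of square bracket weight $wt[\Psi] = \sum_{i=1}^{m} (k_i + l_i - 1)$. Let us consider the action of $\gamma_1 \in \Gamma_1$. It follows from (71) (see also Proposition 21 of [MTZ]) that
\[ Z^{(1)}\left( \gamma_1 . \left[{f_1 \atop g_1}\right] \right)(\Psi[k, l], \gamma_1 . \tau_1) = e_{\gamma_1}^{(1)}\left[{f_1 \atop g_1}\right] (c_1 \tau_1 + d_1)^{wt[\Psi]}\, Z^{(1)}\left[{f_1 \atop g_1}\right](\Psi[k, l], \tau_1). \]
Hence from (66) we find
\[
\begin{aligned}
Z^{(2)}\left[{f \atop g}\right] \Big|_{\gamma_1} &= e_{\gamma_1}^{(1)}\left[{f_1 \atop g_1}\right] \sum_{\mathbf{k}, \mathbf{l}} (-1)^m \left( \frac{\epsilon}{c_1 \tau_1 + d_1} \right)^{wt[\Psi]} (c_1 \tau_1 + d_1)^{wt[\Psi]} \\
&\qquad \cdot Z^{(1)}\left[{f_1 \atop g_1}\right](\Psi[k, l], \tau_1)\, Z^{(1)}\left[{f_2 \atop g_2}\right](\Psi[l, k], \tau_2) \\
&= e_{\gamma_1}^{(1)}\left[{f_1 \atop g_1}\right] Z^{(2)}\left[{f \atop g}\right].
\end{aligned}
\]
A similar result holds for $\gamma_2 \in \Gamma_2$, whereas invariance under $\beta$ is obvious. The result follows.

Remark 5. Modular invariance can also be inferred from Theorem 3 using modular properties of the Riemann theta function together with those for the Heisenberg genus two partition function described in [MT1].

Finally, we can also obtain modular invariance for the generating form $G_n^{(2)}\left[{f \atop g}\right]$ described in Theorem 2. In particular, as is described in [TZ1], the genus two Szegő kernel of (12) is invariant under the action of $G$. Hence it follows that

Theorem 5. $G_n^{(2)}\left[{f \atop g}\right]$ is modular invariant with respect to $G$ with multiplier system (66).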
7. Appendix

7.1. Some Riemann surface theory. Consider a compact Riemann surface $\Sigma^{(g)}$ of genus $g$ with canonical homology cycle basis $a_1, \ldots, a_g, b_1, \ldots, b_g$. In general there exist $g$ holomorphic 1-forms $\nu_i^{(g)}$, $i = 1, \ldots, g$, which we may normalize by, e.g. [FK],
\[ \oint_{a_i} \nu_j^{(g)} = 2\pi i \delta_{ij}. \]
The genus $g$ period matrix $\Omega^{(g)}$ is defined by
\[ \Omega_{ij}^{(g)} = \frac{1}{2\pi i} \oint_{b_i} \nu_j^{(g)}, \]
for $i, j = 1, \ldots, g$. $\Omega^{(g)}$ is symmetric with positive imaginary part, i.e. $\Omega^{(g)} \in \mathbb{H}_g$, the Siegel upper half plane. It is useful to introduce the normalized differential of the second kind defined by [Sp,M,F1]:
\[ \omega^{(g)}(x, y) \sim \frac{dx\, dy}{(x - y)^2} \quad \text{for } x \sim y, \]
for local coordinates $x, y$, with normalization $\oint_{a_i} \omega^{(g)}(x, \cdot) = 0$ for $i = 1, \ldots, g$. Using the Riemann bilinear relations, one finds that
\[ \nu_i^{(g)}(x) = \oint_{b_i} \omega^{(g)}(x, \cdot). \]
The projective connection $s^{(g)}$ is defined by [G]
\[ s^{(g)}(x) = 6 \lim_{x \to y} \left( \omega^{(g)}(x, y) - \frac{dx\, dy}{(x - y)^2} \right). \]
$s^{(g)}(x)$ is not a global 2-form but rather transforms under a general conformal transformation $x \to \phi(x)$ as
\[ s^{(g)}(\phi(x)) = s^{(g)}(x) - \{\phi; x\}\, dx^2, \]
where $\{\phi; x\} = \frac{\phi'''}{\phi'} - \frac{3}{2} \left( \frac{\phi''}{\phi'} \right)^2$ is the Schwarzian derivative.

There exists a (nonsingular and odd) character $\left[{\gamma \atop \delta}\right]$ such that [M,F1]
\[ \vartheta^{(g)}\left[{\gamma \atop \delta}\right](0) = 0, \qquad \partial_{z_i} \vartheta^{(g)}\left[{\gamma \atop \delta}\right](0) \neq 0, \]
for the theta function with real characteristics (1). Define
\[ \zeta(x) = \sum_{i=1}^{g} \partial_{z_i} \vartheta^{(g)}\left[{\gamma \atop \delta}\right](0)\, \nu_i^{(g)}(x), \]
a holomorphic 1-form, and let $\zeta(x)^{\frac{1}{2}}$ denote the form of weight $\frac{1}{2}$ on the double cover $\tilde{\Sigma}$ of $\Sigma$. We also refer to $\zeta(x)^{\frac{1}{2}}$ as a (double-valued) $\frac{1}{2}$-form on $\Sigma$. We define the prime form $E(x, y)$ by
\[ E(x, y) = \frac{\vartheta^{(g)}\left[{\gamma \atop \delta}\right]\left( \int_y^x \nu^{(g)} \right)}{\zeta(x)^{\frac{1}{2}} \zeta(y)^{\frac{1}{2}}} \sim (x - y)\, dx^{-\frac{1}{2}} dy^{-\frac{1}{2}} \quad \text{for } x \sim y, \]
where $\int_y^x \nu^{(g)} = \left( \int_y^x \nu_i^{(g)} \right) \in \mathbb{C}^g$. $E(x, y) = -E(y, x)$ is a holomorphic differential form of weight $(-\frac{1}{2}, -\frac{1}{2})$ on $\tilde{\Sigma} \times \tilde{\Sigma}$. $E(x, y)$ has multipliers along the $a_i$ and $b_j$ cycles in $x$ given by 1 and $e^{-i\pi \Omega_{jj}^{(g)} - \int_y^x \nu_j^{(g)}}$ respectively [F1].
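7.2. Twisted elliptic functions. Let $(\theta, \phi) \in U(1) \times U(1)$ denote a pair of modulus one complex parameters with $\phi = \exp(2\pi i \lambda)$ for $0 \le \lambda < 1$. For $z \in \mathbb{C}$ and $\tau \in \mathbb{H}$ we define 'twisted' Weierstrass functions for $k \ge 1$ as follows [MTZ]:
\[ P_k\left[{\theta \atop \phi}\right](z, \tau) = \frac{(-1)^k}{(k - 1)!} \sum_{n \in \mathbb{Z} + \lambda}{}^{\!\!\prime}\; \frac{n^{k-1} q_z^{n}}{1 - \theta^{-1} q^{n}}, \]
for $q = q_{2\pi i \tau}$, where $\sum'$ means we omit $n = 0$ if $(\theta, \phi) = (1, 1)$. We have a Laurent expansion
\[ P_1\left[{\theta \atop \phi}\right](z, \tau) = \frac{1}{z} - \sum_{n \ge 1} E_n\left[{\theta \atop \phi}\right](\tau)\, z^{n-1}, \tag{67} \]
in terms of twisted Eisenstein series for $n \ge 1$, defined by
\[ E_n\left[{\theta \atop \phi}\right](\tau) = -\frac{B_n(\lambda)}{n!} + \frac{1}{(n - 1)!} \sum_{r \ge 0}{}^{\!\prime}\; \frac{(r + \lambda)^{n-1} \theta^{-1} q^{r + \lambda}}{1 - \theta^{-1} q^{r + \lambda}} + \frac{(-1)^n}{(n - 1)!} \sum_{r \ge 1} \frac{(r - \lambda)^{n-1} \theta q^{r - \lambda}}{1 - \theta q^{r - \lambda}}, \]
where $\sum'$ means we omit $r = 0$ if $(\theta, \phi) = (1, 1)$, and where $B_n(\lambda)$ is the Bernoulli polynomial defined by
\[ \frac{q_z^{\lambda}}{q_z - 1} = \frac{1}{z} + \sum_{n \ge 1} \frac{B_n(\lambda)}{n!} z^{n-1}. \]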
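The Bernoulli polynomial generating function above can be checked numerically. The sketch below assumes $q_z = e^{z}$ (the convention suggested by $q = q_{2\pi i \tau}$) and the usual Bernoulli numbers with $B_1 = -\frac{1}{2}$; the function names and the sample values of $z$ and $\lambda$ are illustrative choices only.

```python
import math
from fractions import Fraction


def bernoulli_numbers(count):
    """First `count` Bernoulli numbers B_0, B_1, ... with B_1 = -1/2."""
    b = [Fraction(1)]
    for m in range(1, count):
        acc = sum((math.comb(m + 1, k) * b[k] for k in range(m)), Fraction(0))
        b.append(-acc / (m + 1))
    return b


def bernoulli_poly(n, lam, b):
    """B_n(lam) = sum_k C(n, k) B_k lam^(n - k), evaluated exactly."""
    return sum((math.comb(n, k) * b[k] * lam ** (n - k) for k in range(n + 1)),
               Fraction(0))


ORDER = 20                      # truncation order of the Laurent series
lam = Fraction(1, 4)            # sample twist parameter, 0 <= lam < 1
z = 0.3                         # sample point inside the disc |z| < 2*pi
numbers = bernoulli_numbers(ORDER + 1)

# Truncated right-hand side: 1/z + sum_{n >= 1} B_n(lam) z^(n-1) / n!
series = 1.0 / z + sum(
    float(bernoulli_poly(n, lam, numbers)) / math.factorial(n) * z ** (n - 1)
    for n in range(1, ORDER + 1)
)

# Left-hand side with q_z = exp(z): q_z^lam / (q_z - 1)
closed_form = math.exp(z * float(lam)) / (math.exp(z) - 1.0)

difference = abs(series - closed_form)
print(difference < 1e-12)
```

The same coefficients $-B_n(\lambda)/n!$ appear as the constant terms of the twisted Eisenstein series, which is why the generating function is normalized with the $1/z$ pole split off.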
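We also have Laurent expansions
\[
\begin{aligned}
P_1\left[{\theta \atop \phi}\right](x - y, \tau) &= \frac{1}{x - y} + \sum_{k, l \ge 1} C\left[{\theta \atop \phi}\right](k, l)\, x^{k-1} y^{l-1}, \\
P_1\left[{\theta \atop \phi}\right](z + x - y, \tau) &= \sum_{k, l \ge 1} D\left[{\theta \atop \phi}\right](k, l, z)\, x^{k-1} y^{l-1},
\end{aligned}
\tag{68}
\]
where for $k, l \ge 1$ we define
\[ C\left[{\theta \atop \phi}\right](k, l, \tau) = (-1)^{l} \binom{k + l - 2}{k - 1} E_{k+l-1}\left[{\theta \atop \phi}\right](\tau), \tag{69} \]
\[ D\left[{\theta \atop \phi}\right](k, l, \tau, z) = (-1)^{k+1} \binom{k + l - 2}{k - 1} P_{k+l-1}\left[{\theta \atop \phi}\right](z, \tau). \tag{70} \]
In [MTZ] we show that for $(\theta, \phi) \neq (1, 1)$, $E_k\left[{\theta \atop \phi}\right]$ is a twisted modular form of weight $k$, i.e.
\[ E_k\left( \gamma . \left[{\theta \atop \phi}\right] \right)(\gamma . \tau) = (c\tau + d)^{k} E_k\left[{\theta \atop \phi}\right](\tau), \tag{71} \]
where for $\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{Z})$ we have $\gamma . \tau = \frac{a\tau + b}{c\tau + d}$ and $\gamma . \left[{\theta \atop \phi}\right] = \left[{\theta^a \phi^b \atop \theta^c \phi^d}\right]$.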
References

[B] Borcherds, R.E.: Vertex algebras, Kac-Moody algebras and the monster. Proc. Nat. Acad. Sc. 83, 3068–3071 (1986)
[DLM] Dong, C., Li, H., Mason, G.: Modular-invariance of trace functions in orbifold theory and generalized moonshine. Commun. Math. Phys. 214, 1–56 (2000)
[DP] D’Hoker, E., Phong, D.H.: The geometry of string perturbation theory. Rev. Mod. Phys. 60, 917–1065 (1988)
[DVFHLS] di Vecchia, P., Hornfeck, K., Frau, M., Lerda, A., Sciuto, S.: N-string, g-loop vertex for the fermionic string. Phys. Lett. B211, 301–307 (1988)
[DVPFHLS] di Vecchia, P., Pezzella, F., Frau, M., Hornfeck, K., Lerda, A., Sciuto, S.: N-point g-loop vertex for a free fermionic theory with arbitrary spin. Nucl. Phys. B333, 635–700 (1990)
[DZ] Dong, C., Zhao, Z.: Modularity in orbifold theory for vertex operator superalgebras. Commun. Math. Phys. 260, 227–256 (2005)
[EO] Eguchi, T., Ooguri, H.: Conformal and current algebras on a general Riemann surface. Nucl. Phys. B282, 308–328 (1987)
[F1] Fay, J.D.: Theta Functions on Riemann Surfaces. Lecture Notes in Mathematics, Vol. 352. Berlin-New York: Springer-Verlag, 1973
[F2] Fay, J.D.: Kernel functions, analytic torsion, and moduli spaces. Mem. Amer. Math. Soc. 96(464) (1992)
[FFR] Feingold, A.J., Frenkel, I.B., Ries, J.F.X.: Spinor construction of vertex operator algebras and $E_8^{(1)}$. Contemp. Math. 121, Providence, RI: Amer. Math. Soc., 1991
[FHL] Frenkel, I., Huang, Y., Lepowsky, J.: On axiomatic approaches to vertex operator algebras and modules. Mem. Amer. Math. Soc. 104(494) (1993)
[FK] Farkas, H.M., Kra, I.: Riemann Surfaces. New York: Springer-Verlag, 1980
[FLM] Frenkel, I., Lepowsky, J., Meurman, A.: Vertex Operator Algebras and the Monster. New York: Academic Press, 1988
[FS] Friedan, D., Shenker, S.: The analytic geometry of two dimensional conformal field theory. Nucl. Phys. B281, 509–545 (1987)
[G] Gunning, R.C.: Lectures on Riemann Surfaces. Princeton, NJ: Princeton Univ. Press, 1966
[HS] Hawley, N.S., Schiffer, M.: Half-order differentials on Riemann surfaces. Acta. Math. 115, 199–236 (1966)
[Ka] Kac, V.: Vertex Operator Algebras for Beginners. University Lecture Series, Vol. 10, Providence, RI: Amer. Math. Soc., 1998
[Kn] Knizhnik, V.G.: Multiloop amplitudes in the theory of quantum strings and complex geometry. Sov. Phys. Usp. 32, 945–971 (1989)
[KNTY] Kawamoto, N., Namikawa, Y., Tsuchiya, A., Yamada, Y.: Geometric realization of conformal field theory on Riemann surfaces. Commun. Math. Phys. 116, 247–308 (1988)
[L] Li, H.: Symmetric invariant bilinear forms on vertex operator algebras. J. Pure Appl. Alg. 96, 279–297 (1994)
[MN] Matsuo, A., Nagatomo, K.: Axioms for a vertex algebra and the locality of quantum fields. Math. Soc. Jap. Mem. 4, 1999
[MT1] Mason, G., Tuite, M.P.: Free bosonic vertex operator algebras on genus two Riemann surfaces I. Commun. Math. Phys. 300, 673–713 (2010)
[MT2] Mason, G., Tuite, M.P.: On genus two Riemann surfaces formed from sewn tori. Commun. Math. Phys. 270, 587–634 (2007)
[MT3] Mason, G., Tuite, M.P.: Free bosonic vertex operator algebras on genus two Riemann surfaces II. To appear
[MT4] Mason, G., Tuite, M.P.: Torus chiral n-point functions for free boson and lattice vertex operator algebras. Commun. Math. Phys. 235, 47–68 (2003)
[MT5] Mason, G., Tuite, M.P.: Vertex operators and modular forms. A Window into Zeta and Modular Physics, eds. K. Kirsten, F. Williams, MSRI Publications 57, Cambridge: Cambridge University Press, 2010, pp. 183–278
[MTZ] Mason, G., Tuite, M.P., Zuevsky, A.: Torus n-point functions for R-graded vertex operator superalgebras and continuous fermion orbifolds. Commun. Math. Phys. 283, 305–342 (2008)
[M] Mumford, D.: Tata Lectures on Theta I and II. Boston: Birkhäuser, 1983
[R] Raina, A.K.: Fay’s trisecant identity and conformal field theory. Commun. Math. Phys. 122, 625–641 (1989)
[RS] Raina, A.K., Sen, S.: Grassmannians, multiplicative Ward identities and theta-function identities. Phys. Lett. B203, 256–262 (1988)
[Sche] Scheithauer, N.: Vertex algebras, Lie algebras and superstrings. J. Alg. 200, 363–403 (1998)
[Schi] Schiffer, M.: Half-order differentials on Riemann surfaces. SIAM J. Appl. Math. 4, 922–934 (1966)
[Sp] Springer, G.: Introduction to Riemann Surfaces. Reading, MA: Addison-Wesley, 1957
[T] Tuite, M.P.: Genus two meromorphic conformal field theory. CRM Proceedings and Lecture Notes 30, 231–251 (2001)
[TZ1] Tuite, M.P., Zuevsky, A.: The Szegő Kernel on a Sewn Riemann Surface. http://arxiv.org/abs/1002.4114v1 [math.QA], 2009, to appear in Commun. Math. Phys.
[TZ2] Tuite, M.P., Zuevsky, A.: Shifting, twisting, and intertwining Heisenberg modules. To appear
[TZ3] Tuite, M.P., Zuevsky, A.: Genus two partition and correlation functions for fermionic vertex operator superalgebras II. To appear
[TZ4] Tuite, M.P., Zuevsky, A.: To appear
[TUY] Tsuchiya, A., Ueno, K., Yamada, Y.: Conformal field theory on universal family of stable curves with gauge symmetries. Adv. Stud. Pure. Math. 19, 459–566 (1989)
[U] Ueno, K.: Introduction to conformal field theory with gauge symmetries. Geometry and Physics - Proceedings of the Conference at Aarhus University, Aarhus, Denmark, New York: Marcel Dekker, 1997
[Z1] Zhu, Y.: Modular invariance of characters of vertex operator algebras. J. Amer. Math. Soc. 9, 237–302 (1996)
[Z2] Zhu, Y.: Global vertex operators on Riemann surfaces. Commun. Math. Phys. 165, 485–531 (1994)
Communicated by Y. Kawahigashi
Commun. Math. Phys. 306, 449–483 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1293-y
Communications in
Mathematical Physics
Instantons on Gravitons Sergey A. Cherkis1,2,3 1 School of Mathematics and Hamilton Mathematics Institute, Trinity College Dublin, Dublin 2, Ireland.
E-mail:
[email protected]
2 Center for Theoretical Physics, Department of Physics, University of California, Berkeley, CA 94720, USA 3 Department of Mathematics, Stanford University, Stanford, CA 94305, USA
Received: 1 August 2010 / Accepted: 11 February 2011 Published online: 29 June 2011 – © Springer-Verlag 2011
Dedicated to the memory of Israel Moiseevich Gelfand

Abstract: Yang-Mills instantons on ALE gravitational instantons were constructed by Kronheimer and Nakajima in terms of matrices satisfying algebraic equations. These were conveniently organized into a quiver. We construct generic Yang-Mills instantons on ALF gravitational instantons. Our data are formulated in terms of matrix-valued functions of a single variable, that are in turn organized into a bow. We introduce the general notion of a bow, its representation, its associated data and moduli space of solutions. For a judiciously chosen bow the Nahm transform maps any bow solution to an instanton on an ALF space. We demonstrate that this map respects all complex structures on the moduli spaces, so it is likely to be an isometry, and use this fact to study the asymptotics of the moduli spaces of instantons on ALF spaces.

Contents
1. Introduction . . . 450
   1.1 Background . . . 451
   1.2 Yang-Mills instanton . . . 452
   1.3 Self-dual gravitational instantons . . . 452
2. Generalizing Quivers . . . 453
   2.1 Bows . . . 454
   2.2 Representation of a bow . . . 454
   2.3 Bow data . . . 455
   2.4 Bow solutions . . . 458
   2.5 Bow cohomology . . . 459
   2.6 Various forms of the moment map conditions . . . 460
3. The Moduli Space of a Bow Representation and Natural Bundles on It . . . 462
   3.1 Natural bundles on M . . . 463
   3.2 Holomorphic description . . . 463
4. ALF Spaces . . . 464
450
S. A. Cherkis
   4.1 Taub-NUT . . . 464
   4.2 Multi-Taub-NUT . . . 464
   4.3 $D_k$ ALF space . . . 464
5. Yang-Mills Instantons on ALF Spaces . . . 466
   5.1 Examples . . . 466
   5.2 Instanton charges . . . 467
6. Nahm Transform . . . 471
7. Cohomological Interpretation of the Nahm Transform . . . 472
8. Moduli Spaces of Instantons . . . 475
   8.1 Moduli space of instantons as a finite hyperkähler quotient . . . 475
   8.2 A different finite quotient realization . . . 476
9. Asymptotic Metric on the Moduli Space of Instantons on a Multi-Taub-NUT Space . . . 478
   9.1 Balanced bifundamentals . . . 478
   9.2 General case . . . 479
10. Conclusions . . . 480
1. Introduction

In the paper entitled “Polygons and Gravitons” [1] Hitchin rederives the Gibbons-Hawking form [2,3] of the hyperkähler metrics on deformations of $\mathbb{R}^4/\mathbb{Z}_{k+1}$ from the corresponding twistor spaces. These metrics are $A_k$ ALE spaces. Similar techniques are used in [4] and [5] to obtain all ALF metrics. In this paper we construct Yang-Mills instantons¹ on these ALF spaces. To this end we introduce bow diagrams, which are of interest in their own right.

The study of instantons involves a diverse range of techniques, from differential geometry and integrability to representation theory and string theory. Since the construction by Kronheimer and Nakajima [6] of instantons on ALE spaces, the moduli spaces of these instantons have emerged in a number of other areas of mathematics and theoretical physics. In [6] an instanton configuration is encoded in terms of quiver data, so that the instanton moduli spaces can be interpreted as quiver varieties. As pointed out in [7] this quiver construction in general produces moduli spaces of torsion free sheaves, which were interpreted in [8] as moduli spaces of instantons on noncommutative ALE spaces.

The connections between instantons and representation theory are particularly intriguing. The relation between representations of Kac-Moody algebras and cohomology groups of instanton moduli spaces was discovered by Nakajima [9,10]. In [11] quantum groups are constructed in terms of quivers. More recently a version of the geometric Langlands duality for complex surfaces was formulated in [12–14], relating moduli spaces of instantons on ALE spaces, representations of affine Kac-Moody algebras, and double affine Grassmannians. Building on the results of [15] such relations were explained via M-theory analysis in [16].² A beautiful derivation of this correspondence relying on the existence of six-dimensional superconformal theories appeared recently in [17].
¹ An instanton is a finite action hermitian connection with self-dual curvature two-form.
² The derivation of [16] also extends the mathematics literature results to the case of non-simply connected gauge groups.

In fact, from the string theory picture of [17] it is more natural to consider instantons on ALF, rather than on ALE, spaces. The construction of such instantons is exactly the problem that we pursue in this paper. We introduce the notion of a bow, thereby generalizing the notion of a quiver. Just as for a quiver we also
Instantons on Gravitons
451
introduce a representation of a bow and the moduli space of its representation. These moduli spaces are richer and have larger $L^2$ cohomology. We expect that these also have a representation-theoretic interpretation.

We would like to emphasize that almost any meaningful question about quiver varieties can be studied for the bow varieties we define here. Any bow has a zero interval length limit in which it becomes a quiver. Any representation of the limiting quiver appears as a limit of some representation of the original bow. Not all bow representations, however, become quiver representations in this limit. As formulated in [18], there is a certain reciprocity acting on the bows and their respective representations. In particular, the representations that have a good quiver limit are exactly those for which the dual bow representation is balanced, as defined in Sect. 9.1. For these bow representations the corresponding bow variety is isomorphic to the corresponding quiver variety.³ For all other representations, however, bow varieties appear to differ from quiver varieties.

1.1. Background.

1.1.1. General constructions of instantons. All instanton configurations on flat $\mathbb{R}^4$ were constructed in [19] by a technique now called the ADHM construction. The ADHM data is encoded in terms of a quiver with one vertex and one edge. This construction was generalized by Nahm in [20] to produce all instantons on $\mathbb{R}^3 \times S^1$. The Nahm data can no longer be interpreted as quiver data. It finds a natural interpretation in terms of a particular simple bow. Considering ADHM construction data invariant under the action of a finite subgroup $\Gamma$ of $SU(2) \subset SO(4)$, Kronheimer and Nakajima [6] constructed instantons on deformations of $\mathbb{R}^4/\Gamma$, which are called ALE gravitational instantons. All of these constructions were rediscovered within string theory.
See [21,22] for the ADHM construction, [23] for the Kronheimer-Nakajima construction on $A_k$ ALE space, [24] for the Kronheimer-Nakajima construction on a general ALE space, and [25] for the Nahm construction. A number of interesting explicit instanton solutions based on these general constructions were obtained. Some such examples are three instantons on $\mathbb{R}^4$ [26], instantons on Eguchi-Hanson space [27], and a single caloron [28,29], to name a few.

³ We ought to emphasize that as Riemannian manifolds these bow and quiver moduli spaces differ.

1.1.2. Instantons on the Taub-NUT and on the multi-Taub-NUT. Some instantons on the Taub-NUT space and even on multi-Taub-NUT space which have trivial holonomy at infinity are found in [30] and [31]. It is proved in [32] that this construction provides all instantons of instanton number one. This construction, unfortunately, is limited, since it is hard to introduce nontrivial monodromy at infinity and since it is not clear how to generalize it to generic configurations with larger instanton numbers. It is nevertheless very useful in generating examples of large instanton number configurations.

A general construction of instantons on the Taub-NUT and on the multi-Taub-NUT spaces can be formulated in terms of A-type bows as in [33] and [18]. In particular, as an illustration of this construction, the moduli space of one instanton on the Taub-NUT space is found in [33] and the explicit one instanton connection is found in [18].

In this work we aim to formulate a general construction for generic instantons on ALF spaces. While an instanton on an ALE space is determined by matrices satisfying algebraic equations, an instanton on an ALF space is determined by matrix-valued functions satisfying ordinary differential equations. If for an ALE space the corresponding ADHM data is conveniently organized into a quiver, for each ALF space we introduce a corresponding bow to serve the same purpose.

A quiver is a collection of points and oriented edges connecting some of them. Given a quiver one can construct a corresponding bow. In order to obtain this bow, consider a collection of oriented and parameterized intervals, each interval $I_\sigma$ corresponding to a point $\sigma$ of the original quiver. If any two points $\sigma$ and $\rho$ of the quiver are connected by an edge oriented from $\sigma$ to $\rho$, we connect the corresponding intervals by an edge, so that this edge connects the right end of the former interval $I_\sigma$ to the left end of the latter interval $I_\rho$. Sending the lengths of all of the intervals to zero reduces this bow to the original quiver.

For a quiver there is a notion of a representation assigning a pair of vector spaces to each vertex [9]. Here we define a notion of a representation of a bow, which generalizes a quiver representation. Bow representations are richer: if we consider the zero length limit in which a bow degenerates into a quiver, only some of the bow’s representations produce a limiting conventional representation of a quiver. This provides one of the motivations behind this work, as the new representations, as well as the moduli spaces associated with them, should carry additional information about self-dual Yang-Mills configurations and, one might expect, about representations of Kac-Moody algebras.

1.2. Yang-Mills instanton. A connection $(d + A\wedge)$ on a Hermitian vector bundle over a Riemannian four-manifold $M$ is said to be a Yang-Mills instanton if its curvature two-form $F = dA + A \wedge A$ is self-dual⁴ under the action of the Hodge star operation:
$$F = *F, \tag{1}$$
and if the Chern number of the corresponding bundle is finite:
$$\int_M \mathrm{tr}\, F \wedge F < \infty. \tag{2}$$
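To make the self-duality condition (1) concrete, the following Python sketch implements the Hodge star on constant two-forms on flat Euclidean $\mathbb{R}^4$. The flat background, the representation of a two-form as an antisymmetric 4×4 array, and the function names are assumptions made only for this illustration; they are not part of the construction in this paper, which works on curved ALF backgrounds.

```python
from itertools import permutations


def levi_civita(perm):
    """Sign of a permutation of (0, 1, 2, 3), computed by sorting with swaps."""
    p = list(perm)
    sign = 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]  # each transposition flips the sign
            sign = -sign
    return sign


def hodge_star(F):
    """(*F)[m][n] = 1/2 * sum_{r,s} eps[m][n][r][s] * F[r][s] on flat R^4."""
    star = [[0.0] * 4 for _ in range(4)]
    for perm in permutations(range(4)):
        m, n, r, s = perm
        star[m][n] += 0.5 * levi_civita(perm) * F[r][s]
    return star


def self_dual_part(F):
    """Projection (F + *F)/2 onto the self-dual two-forms."""
    S = hodge_star(F)
    return [[0.5 * (F[m][n] + S[m][n]) for n in range(4)] for m in range(4)]


# Illustrative two-form F = dx0 ^ dx1 + 3 dx0 ^ dx2 (hypothetical sample data).
F = [[0.0] * 4 for _ in range(4)]
F[0][1], F[1][0] = 1.0, -1.0
F[0][2], F[2][0] = 3.0, -3.0

F_sd = self_dual_part(F)
```

In this sketch `hodge_star(hodge_star(F))` returns `F` again, and `F_sd` satisfies the analogue of Eq. (1), namely `hodge_star(F_sd) == F_sd`.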
Here we limit our consideration to the base manifold $M$ (which is also referred to as the background) being a self-dual gravitational instanton. Moreover, we only let $M$ be an ALF space as defined below.

1.3. Self-dual gravitational instantons. A self-dual gravitational instanton is a Riemannian manifold $(M, g)$ with the self-dual Riemann curvature two-form valued in $\mathrm{End}(TM)$:
$$R = *R, \tag{3}$$
and finite Pontrjagin number
$$\int_M \mathrm{tr}\, R \wedge R < \infty. \tag{4}$$
⁴ Whether one studies instantons satisfying $*F = F$ or anti-instantons satisfying $*F = -F$ is a matter of taste, as these two conditions are interchanged by a change of orientation of the base manifold. To avoid any potential ambiguity, we specify that what we call an instanton here has the curvature two-form of type (1, 1) in every complex structure of the hyperkähler base.

There are only two kinds of compact self-dual gravitational instantons: the flat four-torus $T^4$ and the $K3$ surface. Noncompact gravitational instantons can be distinguished by their asymptotic volume growth. Choosing a point $x \in M$ we denote by $B_x(r)$ the ball of radius $r$ centered at $x$. If $M$ is a self-dual gravitational instanton and if there are some positive constants $A$ and $B$ such that the volume of a ball satisfies
$$A r^\nu \le \mathrm{Vol}\, B_x(r) < B r^\nu, \tag{5}$$
for $r > 1$, then we call the space $M$
• an ALE space if $\nu = 4$,
• an ALF space if $3 \le \nu < 4$,
• an ALG space if $2 \le \nu < 3$, and
• an ALH space if $0 < \nu < 2$.
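The classification by the volume growth exponent $\nu$ of Eq. (5) can be summarized in a few lines of Python. This is only a bookkeeping sketch: the function name is ours, and we assume that the lower bounds of the ALF and ALG ranges are inclusive.

```python
def classify_by_volume_growth(nu: float) -> str:
    """Return the asymptotic type for a volume growth exponent nu as in Eq. (5)."""
    if nu == 4:
        return "ALE"
    if 3 <= nu < 4:
        return "ALF"
    if 2 <= nu < 3:
        return "ALG"
    if 0 < nu < 2:
        return "ALH"
    raise ValueError(f"exponent {nu} is outside the ranges listed above")
```

For example, cubic volume growth, `classify_by_volume_growth(3)`, falls in the ALF range studied in this paper.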
As follows from Theorem 3.25 of [34], under the presumption $\int_M \mathrm{tr}\, R \wedge R < \infty$, each ALF space has $\nu = 3$ and asymptotically its metric has a local triholomorphic isometry. We conjecture that any ALF space (i.e. a manifold with self-dual curvature form, finite Pontrjagin number, and cubic volume growth) is either a multi-Taub-NUT space [35] (also called an $A_k$ ALF space) or a $D_k$ ALF space [5,36]. For the $A_k$ ALF space $k \ge -1$ and for the $D_k$ ALF space $k \ge 0$. Both the $A_k$ and $D_k$ ALF spaces have degenerate limits in which they can be viewed as the Kleinian singularities with
– $A_k$ ALF space given by $xy = z^{k+1}$ and
– $D_k$ ALF space given by $x^2 - zy^2 = z^{k-1}$
as complex surfaces in $\mathbb{C}^3$. Each of these singular spaces admits an ALF metric with one singular point, and a general ALF space is a smooth hyperkähler deformation of one of these. In particular, for the low values of $k$, the $A_0$ ALF is the Taub-NUT space, the $A_{-1}$ ALF is $\mathbb{R}^3 \times S^1$, the $D_2$ ALF is a deformation of $(\mathbb{R}^3 \times S^1)/\mathbb{Z}_2$, the $D_1$ ALF is the deformation of the double cover of the Atiyah-Hitchin space [37] studied in [38], while the $D_0$ ALF is the Atiyah-Hitchin space itself.

The study of Yang-Mills instantons on $A_k$ and $D_k$ ALF spaces is the main goal of this work. Our construction of such instantons is formulated in terms of bow diagrams. We introduce the notion of a bow and its representation in Sect. 2, and the moduli space of a bow representation in Sect. 3. In Sect. 4 we realize ALF spaces as moduli spaces of certain bow representations. This is essential for the formulation of the Nahm transform. Bow data defining an instanton are given in Sect. 5, where the representation ranks are related to the instanton charges. We claim that up to gauge equivalence this data is in one-to-one correspondence with the instanton. This correspondence is provided by the Nahm transform leading to the Yang-Mills instanton on ALF spaces given in Sect. 6. Section 7 contains the proof of the self-duality condition for the constructed connections. This proof makes it transparent that the Nahm transform, mapping between the moduli space of a bow representation and the moduli space of instantons, is an isomorphism of complex varieties in any of their complex structures. We describe the bow moduli spaces as finite hyperkähler quotients in Sect. 8 and compute their asymptotics in Sect. 9.

2. Generalizing Quivers

In this section we introduce the notion of a bow that is at the center of our construction. It is a generalization of the notion of a quiver, such that a quiver is a degenerate case of a bow. As we aim to demonstrate, it has richer structures associated with it.
Fig. 1. An oriented edge e connecting two intervals Iσ and Iρ
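2.1. Bows. A bow is defined by the following data:
• A collection of oriented closed distinct intervals $\mathcal{I} = \{I_\sigma\}$; for concreteness we let each interval be parameterized by $s$ with $p_{\sigma L} \le s \le p_{\sigma R}$ so that $I_\sigma = [p_{\sigma L}, p_{\sigma R}]$. We denote the length of each interval $I_\sigma$ by $l_\sigma$, i.e. $l_\sigma = p_{\sigma R} - p_{\sigma L}$.
• A collection $\mathcal{E}$ of oriented edges, such that each edge $e \in \mathcal{E}$ begins at the right end of some interval $I_\sigma$ and ends at the left end of some, possibly the same, interval $I_\rho$ as in Fig. 1. Let $h(e)$ denote the head of the edge $e$ and $t(e)$ denote the tail, then
$$A_{\sigma\rho} = \#\{ e \in \mathcal{E} \mid t(e) = p_{\sigma R},\ h(e) = p_{\rho L} \}, \tag{6}$$
is the number of edges originating at $I_\sigma$ and ending at $I_\rho$. For any oriented edge $e \in \mathcal{E}$ let $\bar{e}$ denote this edge with the opposite orientation and $\bar{\mathcal{E}} = \{\bar{e}\}$.

When drawing a bow as a diagram, we use wavy lines to signify the intervals $I_\sigma$ and arrows to denote the edges. A number of bow diagrams such as for example in Figs. 2, 3, and 4 appear in Sect. 4 below. It is clear from the definition that as all the lengths $l_\sigma \to 0$ a bow degenerates into a quiver.

2.2. Representation of a bow. A regular representation of a bow is
• A collection of distinct points $\Lambda = \{\lambda_\sigma^\alpha\}$ belonging to these intervals, such that $\lambda_\sigma^\alpha \in I_\sigma$ with $\alpha = 1, 2, \ldots, r_\sigma$. A number $r_\sigma$ of λ-points belonging to an interval $I_\sigma$ can be zero.
• a collection of bundles $E \to \mathcal{I}$ consisting of Hermitian bundles $E_\sigma^0 \to [p_{\sigma L}, \lambda_\sigma^1]$, $E_\sigma^{r_\sigma} \to [\lambda_\sigma^{r_\sigma}, p_{\sigma R}]$, and $E_\sigma^\alpha \to [\lambda_\sigma^\alpha, \lambda_\sigma^{\alpha+1}]$ for $\alpha = 1, 2, \ldots, r_\sigma - 1$, of ranks $R_\sigma^\beta = \mathrm{rk}\, E_\sigma^\beta$, satisfying the following matching conditions at λ-points:
$$E_\sigma^\alpha|_{\lambda_\sigma^{\alpha+1}} \subset E_\sigma^{\alpha+1}|_{\lambda_\sigma^{\alpha+1}} \ \text{ if } R_\sigma^\alpha \le R_\sigma^{\alpha+1} \quad\text{and}\quad E_\sigma^\alpha|_{\lambda_\sigma^{\alpha+1}} \supset E_\sigma^{\alpha+1}|_{\lambda_\sigma^{\alpha+1}} \ \text{ if } R_\sigma^\alpha \ge R_\sigma^{\alpha+1}.$$
Let $\Lambda_0 \subseteq \Lambda$ be the collection of λ-points at which $E$ does not change rank, i.e. $\Lambda_0 = \{\lambda_\sigma^\alpha \in \Lambda \mid R_\sigma^{\alpha-1} = R_\sigma^\alpha\}$.
• a collection of $\Lambda_0$-graded one-dimensional Hermitian spaces $W = \{W_\lambda \mid \lambda \in \Lambda_0\}$.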
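The combinatorial part of a bow, namely its intervals, its edges, and the edge count of Eq. (6), can be sketched as a small Python data structure. The class and function names below, as well as the sample bow, are hypothetical and serve only to illustrate the definitions; the bundle data of a representation are not modelled.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    name: str     # label sigma of the interval I_sigma
    left: float   # p_{sigma L}
    right: float  # p_{sigma R}

    @property
    def length(self) -> float:
        """l_sigma = p_{sigma R} - p_{sigma L}."""
        return self.right - self.left


@dataclass(frozen=True)
class Edge:
    tail: str  # interval whose right end the edge leaves, t(e) = p_{sigma R}
    head: str  # interval whose left end the edge enters, h(e) = p_{rho L}


def adjacency(edges):
    """A[(sigma, rho)] = number of edges from I_sigma to I_rho, as in Eq. (6)."""
    return Counter((e.tail, e.head) for e in edges)


def quiver_limit(intervals, edges):
    """Zero-length limit: every interval collapses to a vertex, edges become arrows."""
    vertices = [i.name for i in intervals]
    arrows = [(e.tail, e.head) for e in edges]
    return vertices, arrows


# Hypothetical sample bow: two intervals, a two-edge cycle and one loop edge.
intervals = [Interval("a", 0.0, 1.0), Interval("b", 0.0, 2.5)]
edges = [Edge("a", "b"), Edge("b", "a"), Edge("a", "a")]
A = adjacency(edges)
```

Here `A[("a", "b")]` counts the edges from the right end of the interval `a` to the left end of the interval `b`, and `quiver_limit` returns the vertices and arrows of the quiver obtained when all lengths are sent to zero.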
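We call the number of λ-points $\#\Lambda = \sum_\sigma r_\sigma$ the file of the representation, and we call the collection $(R_\sigma^\alpha)$ the rank of the representation.

For simplicity we focus only on regular bow representations. To define a general representation of a bow one can introduce multiplicities $W = \{w_\sigma^\alpha\}$ of the λ-points with $w_\sigma^\alpha$ being the multiplicity of the point $\lambda_\sigma^\alpha$. Denote by $w(\lambda)$ the multiplicity function, i.e. $w(\lambda_\sigma^\alpha) = w_\sigma^\alpha$. In this case the assigned spaces in $W$ are such that for any $\lambda \in \Lambda_0$ the corresponding space dimension is given by the multiplicity: $\dim W_\lambda = w(\lambda)$. The matching conditions at $\Lambda$, however, are more involved in this case and we postpone this discussion. Here we focus only on regular bow representations.

2.3. Bow data. In order to simplify our notation let $E_{\sigma R} \equiv E_\sigma^{r_\sigma}|_{p_{\sigma R}}$ denote the fiber over the right end of the interval $I_\sigma$ and $E_{\sigma L} \equiv E_\sigma^0|_{p_{\sigma L}}$ denote the fiber over the left end of the interval $I_\sigma$; also, for some $\lambda = \lambda_\sigma^\alpha \in \Lambda_0$ let $E_\lambda \equiv E_\sigma^{\alpha-1}|_{\lambda_\sigma^\alpha} = E_\sigma^\alpha|_{\lambda_\sigma^\alpha}$ denote the fiber at $\lambda$. We denote the collections of the corresponding spaces by $E_R = \{E_{\sigma R}\}$, $E_L = \{E_{\sigma L}\}$, and $E_{\Lambda_0} = \{E_\lambda \mid \lambda \in \Lambda_0\}$.

To every bow representation $\mathcal{R}$ we associate a (generally infinite dimensional) hyperkähler affine space $\mathrm{Dat}(\mathcal{R})$. It is a direct sum of three spaces $\mathcal{F}_{in} \oplus \mathcal{F}_{out}$, $\mathcal{B} \oplus \bar{\mathcal{B}}$ and $\mathcal{N}$ that we call respectively fundamental, bifundamental, and Nahm spaces. We define the two components of the fundamental space by
$$\mathcal{F}_{in} = \mathrm{Hom}(W, E_{\Lambda_0}) \equiv \bigoplus_{\lambda \in \Lambda_0} \mathrm{Hom}(W_\lambda, E_\lambda), \qquad \mathcal{F}_{out} = \mathrm{Hom}(E_{\Lambda_0}, W) \equiv \bigoplus_{\lambda \in \Lambda_0} \mathrm{Hom}(E_\lambda, W_\lambda);$$
the two components of the bifundamental space by
$$\mathcal{B} = \mathrm{Hom}^{\mathcal{E}}(E_R, E_L) \equiv \bigoplus_{e \in \mathcal{E}} \mathrm{Hom}(E|_{t(e)}, E|_{h(e)}), \qquad \bar{\mathcal{B}} = \mathrm{Hom}^{\bar{\mathcal{E}}}(E_L, E_R) \equiv \bigoplus_{e \in \mathcal{E}} \mathrm{Hom}(E|_{h(e)}, E|_{t(e)});$$
and the Nahm space by
$$\mathcal{N} = \mathrm{Con}(E) \oplus \mathrm{End}(E) \otimes \mathbb{R}^3 \equiv \bigoplus_{\sigma,\alpha} \mathrm{Con}(E_\sigma^\alpha) \oplus \mathrm{End}(E_\sigma^\alpha) \oplus \mathrm{End}(E_\sigma^\alpha) \oplus \mathrm{End}(E_\sigma^\alpha).$$
Here $\mathrm{Con}(E_\sigma^\alpha)$ is the space of connections $\nabla = \frac{d}{ds} - iT_0$ on the Hermitian bundle $E_\sigma^\alpha$. We use the following notation to denote the bow data:
$$\big(I, J, B^{LR}, B^{RL}, (\nabla, T_1, T_2, T_3)\big) \in \mathcal{F}_{in} \oplus \mathcal{F}_{out} \oplus \mathcal{B} \oplus \bar{\mathcal{B}} \oplus \mathcal{N} = \mathrm{Dat}(\mathcal{R}). \tag{7}$$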
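Now a comment is due specifying the behavior of the Nahm components at $\Lambda$. These are the same conditions that appear in the Nahm transform of monopoles as formulated in [39]. At any $\lambda \in \Lambda_0$ the Nahm components have regular left and right limits at $\lambda$. At $\lambda = \lambda_\sigma^\alpha \in \Lambda \setminus \Lambda_0$ on the other hand, with say $R_\sigma^{\alpha-1} < R_\sigma^\alpha$,⁵ all $T_j(s)$ for $j = 1, 2, 3$ have a regular limit from the left
$$\lim_{s \to \lambda-} T_j(s) = T_j^-(\lambda), \tag{8}$$
while to the right of $\lambda$ we have
$$T_j(s) = \begin{pmatrix} \dfrac{1}{2}\dfrac{\rho_j}{s-\lambda} + O\big((s-\lambda)^0\big) & O\big((s-\lambda)^{\frac{\Delta R - 1}{2}}\big) \\[2mm] O\big((s-\lambda)^{\frac{\Delta R - 1}{2}}\big) & T_j^-(\lambda) + O(s-\lambda) \end{pmatrix}, \tag{9}$$
where $\Delta R = R_\sigma^\alpha - R_\sigma^{\alpha-1}$ is the change of rank and $(\rho_1, \rho_2, \rho_3)$ satisfy $[\rho_i, \rho_j] = 2i\varepsilon_{ijk}\rho_k$, defining a $\Delta R$-dimensional irreducible representation of $su(2)$.⁶

⁵ For the case with $R_\sigma^{\alpha-1} > R_\sigma^\alpha$ the conditions are completely analogous.

2.3.1. Gauge group action. There is a natural gauge group action on the bow data. We consider the group $\mathcal{G}$ of gauge transformations of $E$: a gauge transformation $g \in \mathcal{G}$ is smooth outside $\Lambda \setminus \Lambda_0$ and at $\lambda \in \Lambda \setminus \Lambda_0$ with $R_\sigma^{\alpha-1} < R_\sigma^\alpha$ satisfies
$$g(\lambda+) = \begin{pmatrix} 1 & 0 \\ 0 & g(\lambda-) \end{pmatrix}. \tag{10}$$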
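(The continuity condition for the case $R_\sigma^{\alpha-1} > R_\sigma^\alpha$ is analogous.) The action of the gauge group on the bow data is
$$g: \big(I, J, B^{LR}, B^{RL}, (\nabla, T_1, T_2, T_3)\big) \mapsto \big(g^{-1}(\Lambda_0) I,\ J g(\Lambda_0),\ g_L^{-1} B^{LR} g_R,\ g_R^{-1} B^{RL} g_L,\ (g^{-1}\nabla g, g^{-1}T_1 g, g^{-1}T_2 g, g^{-1}T_3 g)\big). \tag{11}$$
Here for
$$I = \bigoplus_{\lambda \in \Lambda_0} I_\lambda, \qquad J = \bigoplus_{\lambda \in \Lambda_0} J_\lambda, \qquad B^{LR} = \bigoplus_{e \in \mathcal{E}} B_e^{LR}, \qquad B^{RL} = \bigoplus_{e \in \mathcal{E}} B_e^{RL}, \tag{12}$$
we use the natural conventions
$$g^{-1}(\Lambda_0) I = \bigoplus_{\lambda \in \Lambda_0} g^{-1}(\lambda) I_\lambda, \qquad g_L^{-1} B^{LR} g_R = \bigoplus_{e \in \mathcal{E}} g^{-1}(h(e))\, B_e^{LR}\, g(t(e)), \tag{13}$$
$$J g(\Lambda_0) = \bigoplus_{\lambda \in \Lambda_0} J_\lambda\, g(\lambda), \qquad g_R^{-1} B^{RL} g_L = \bigoplus_{e \in \mathcal{E}} g^{-1}(t(e))\, B_e^{RL}\, g(h(e)). \tag{14}$$
We shall also use the natural products of various collections such as for example
$$I J \equiv \bigoplus_{\lambda \in \Lambda_0} I_\lambda J_\lambda \in \mathrm{End}\, E_{\Lambda_0} \equiv \bigoplus_{\lambda \in \Lambda_0} \mathrm{End}\, E_\lambda, \tag{15}$$
$$B^{LR} B^{RL} \equiv \bigoplus_\sigma \bigoplus_{\substack{e \in \mathcal{E} \\ h(e) = p_{\sigma L}}} B_e^{LR} B_e^{RL} \in \mathrm{End}\, E_L \equiv \bigoplus_\sigma \mathrm{End}\, E_{\sigma L}, \tag{16}$$
and others.

⁶ For the case of a general bow representation with a higher multiplicity point $\lambda_\sigma^\alpha$, one of a number of possible conditions is that this representation splits into a direct sum of $w(\lambda_\sigma^\alpha) = w_\sigma^\alpha$ irreducible representations.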
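2.3.2. Hyperkähler structure. Each of the three spaces, the fundamental $\mathcal{F}_{in} \oplus \mathcal{F}_{out}$, the bifundamental $\mathcal{B} \oplus \bar{\mathcal{B}}$, and the Nahm $\mathcal{N}$, possesses a hyperkähler structure specified below. Let us now introduce a notation that makes this structure apparent. Let $e_1, e_2, e_3$ be a two-dimensional representation of the quaternionic units, with $S$ being its representation space. Thus, $e_j$ with $j = 1, 2, 3$ satisfy the defining quaternionic relations $e_1^2 = e_2^2 = e_3^2 = e_1 e_2 e_3 = -1$. For example one can choose a representation in terms of the Pauli sigma matrices $e_j = -i\sigma_j$, which, when written explicitly, is
$$e_1 = \begin{pmatrix} 0 & -i \\ -i & 0 \end{pmatrix}, \qquad e_2 = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad e_3 = \begin{pmatrix} -i & 0 \\ 0 & i \end{pmatrix}. \tag{17}$$
Let us assemble the fundamental data into
$$Q_\lambda = \begin{pmatrix} J_\lambda^\dagger \\ I_\lambda \end{pmatrix}: W_\lambda \to S \otimes E_\lambda \quad\text{and}\quad Q = \bigoplus_{\lambda \in \Lambda_0} Q_\lambda: W \to S \otimes E_{\Lambda_0}, \tag{18}$$
and the bifundamental data into
$$B_e^- = \begin{pmatrix} (B_e^{LR})^\dagger \\ -B_e^{RL} \end{pmatrix}: E_{h(e)} \to S \otimes E_{t(e)}, \quad\text{or}\quad B_e^+ = \begin{pmatrix} (B_e^{RL})^\dagger \\ B_e^{LR} \end{pmatrix}: E_{t(e)} \to S \otimes E_{h(e)}, \tag{19}$$
with
$$B^- = \bigoplus_{e \in \mathcal{E}} B_e^-: E_L \to S \otimes E_R \quad\text{and}\quad B^+ = \bigoplus_{e \in \mathcal{E}} B_e^+: E_R \to S \otimes E_L. \tag{20}$$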
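The quaternionic relations for the explicit matrices of Eq. (17) are easy to verify by direct multiplication. The short Python check below does this with plain nested lists of complex numbers; the helper names are ours and are introduced only for this illustration.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def equal(a, b):
    """Entrywise equality of two 2x2 matrices."""
    return all(a[i][j] == b[i][j] for i in range(2) for j in range(2))


# e_j = -i * sigma_j, written out as in Eq. (17).
e1 = [[0, -1j], [-1j, 0]]
e2 = [[0, -1], [1, 0]]
e3 = [[-1j, 0], [0, 1j]]
minus_one = [[-1, 0], [0, -1]]

squares_ok = all(equal(matmul(e, e), minus_one) for e in (e1, e2, e3))
triple_ok = equal(matmul(matmul(e1, e2), e3), minus_one)
```

Both `squares_ok` and `triple_ok` evaluate to `True`; in addition `matmul(e1, e2)` reproduces `e3`, as expected for quaternionic units.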
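In order to simplify our notation and to avoid numerous brackets in our formulas we do not distinguish upper and lower ± indices, i.e. $B^+ = B_+$ and $B^- = B_-$. For the Nahm data $\big(\nabla = \frac{d}{ds} - iT_0, T_1, T_2, T_3\big)$ we introduce
$$\mathbf{T} = 1 \otimes T_0 + e_1 \otimes T_1 + e_2 \otimes T_2 + e_3 \otimes T_3, \tag{21}$$
and its quaternionic conjugate
$$\mathbf{T}^* = 1 \otimes T_0 - e_1 \otimes T_1 - e_2 \otimes T_2 - e_3 \otimes T_3. \tag{22}$$
If we understand $S \times \mathcal{I} \to \mathcal{I}$ to be a trivial bundle over the collection of intervals $\mathcal{I}$, then $\frac{d}{ds} - i\mathbf{T}$ is a connection on the bundle $S \otimes E$. Now the quaternionic units act on the tangent space of $\mathrm{Dat}\,\mathcal{R}$ in this form by the left multiplication
$$e_j: (\delta Q, \delta B^-, \delta\mathbf{T}) \mapsto \big((e_j \otimes 1_W)\delta Q,\ (e_j \otimes 1_{E_R})\delta B^-,\ (e_j \otimes 1_E)\delta\mathbf{T}\big). \tag{23}$$
If one uses $B^+$ instead of $B^-$ to parameterize the bifundamental data, the quaternionic unit action has the same form
$$e_j: (\delta Q, \delta B^+, \delta\mathbf{T}) \mapsto \big((e_j \otimes 1_W)\delta Q,\ (e_j \otimes 1_{E_L})\delta B^+,\ (e_j \otimes 1_E)\delta\mathbf{T}\big). \tag{24}$$
The space of bow data is a hyperkähler space with the metric given by the direct product metric
$$ds^2 = \mathrm{tr}_W\, \delta Q^\dagger \delta Q + \mathrm{tr}_{E_R}\, \delta B_+^\dagger \delta B_+ + \frac{1}{2}\int \mathrm{tr}_S\, \mathrm{tr}_E\, \delta\mathbf{T}^* \delta\mathbf{T}\, ds = \mathrm{tr}_W\, \delta Q^\dagger \delta Q + \mathrm{tr}_{E_L}\, \delta B_-^\dagger \delta B_- + \frac{1}{2}\int \mathrm{tr}_S\, \mathrm{tr}_E\, \delta\mathbf{T}^* \delta\mathbf{T}\, ds. \tag{25}$$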
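As described above, the action of the three complex structures on $(\delta Q, \delta B_+, \delta\mathbf{T})$ is by the left multiplication by $e_1$, $e_2$, and $e_3$. It leads to three corresponding Kähler forms $\omega_j(\cdot, \cdot) = g(\cdot, e_j \cdot)$, $j = 1, 2, 3$, which can be organized into a purely imaginary quaternion $\omega = e_1 \otimes \omega_1 + e_2 \otimes \omega_2 + e_3 \otimes \omega_3$. By direct computation
$$\omega = \mathrm{Im}\Big( \mathrm{tr}_{E_{\Lambda_0}}\, \delta Q \wedge \delta Q^\dagger + \mathrm{tr}_{E_L}\, \delta B_+ \wedge \delta B_+^\dagger + \frac{1}{2}\int \mathrm{tr}_E\, \delta\mathbf{T} \wedge \delta\mathbf{T}^*\, ds \Big) \tag{26}$$
$$\phantom{\omega} = \mathrm{Im}\Big( \mathrm{tr}_{E_{\Lambda_0}}\, \delta Q \wedge \delta Q^\dagger + \mathrm{tr}_{E_R}\, \delta B_- \wedge \delta B_-^\dagger + \frac{1}{2}\int \mathrm{tr}_E\, \delta\mathbf{T} \wedge \delta\mathbf{T}^*\, ds \Big). \tag{27}$$
The metric (25) is clearly compatible with the quaternionic structures and is invariant under the gauge group action (11). The latter becomes apparent when we observe that (11) now takes the form
$$g: \begin{pmatrix} Q \\ B^+ \\ B^- \\ \mathbf{T}(s) \end{pmatrix} \mapsto \begin{pmatrix} g^{-1}(\Lambda_0)\, Q \\ g_L^{-1} B^+ g_R \\ g_R^{-1} B^- g_L \\ i g^{-1}(s) \frac{d}{ds} g(s) + g^{-1}(s)\mathbf{T}(s) g(s) \end{pmatrix}. \tag{28}$$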
2.4. Bow solutions. Since the gauge group action of $\mathcal{G}$ preserves the hyperkähler structure on the space of bow data, we can perform the hyperkähler reduction [40]. Namely, we can find the triplet of moment maps $(\mu_1, \mu_2, \mu_3)$, with each $\mu_j$ valued in the dual of the Lie algebra, which we assemble into a pure imaginary quaternionic expression $\mu = e_1 \otimes \mu_1 + e_2 \otimes \mu_2 + e_3 \otimes \mu_3$, such that for any vector field $X$ generating an infinitesimal gauge transformation in $\mathcal{G}$ we have $d\mu(X) = i_X \omega$, where $i_X \omega$ is the interior product of the vector field $X$ and the two-form $\omega$. Using the definition and Eqs. (26) and (27) one finds
$$\mu(Q, B, \mathbf{T}) = \mathrm{Im}\,(-i)\Big( i\frac{d}{ds}\mathbf{T}^* + \mathbf{T}\mathbf{T}^* + \sum_{\lambda \in \Lambda_0} \delta(s - \lambda)\, Q_\lambda Q_\lambda^\dagger + \sum_{e \in \mathcal{E}} \big( \delta(s - t(e))\, B_e^- B_e^{-\dagger} + \delta(s - h(e))\, B_e^+ B_e^{+\dagger} \big) \Big). \tag{29}$$
If we are to perform a hyperkähler reduction, the value of the moment map $\mu(Q, B, \mathbf{T})$ has to be invariant under the gauge group action $\mu(s) \mapsto g^{-1}(s)\mu(s)g(s)$, i.e. it has to be an abelian character of the group of the gauge transformations. It follows that we are to impose $\mu(Q, B, \mathbf{T}) = e_1 \otimes \nu_1(s) 1_E + e_2 \otimes \nu_2(s) 1_E + e_3 \otimes \nu_3(s) 1_E$ with $\nu_1(s)$, $\nu_2(s)$ and $\nu_3(s)$ some real-valued functions. Thus for any representation of a bow and a choice of a pure imaginary quaternion function $\nu = \nu_1 e_1 + \nu_2 e_2 + \nu_3 e_3$ we obtain the subset $\mu^{-1}(\nu)$ in the space of bow data which inherits the isometric action of the gauge group $\mathcal{G}$ on it. The subset $\mu^{-1}(\nu)$ is called the level set at level $\nu$. The space of gauge group orbits $\mathcal{M}(\nu) = \mu^{-1}(\nu)/\mathcal{G} = \mathrm{Dat}(\mathcal{R})/\!/\!/\mathcal{G}$ is the quotient hyperkähler space.
We call this space the moduli space of the bow representation or simply the moduli space of the bow, if it is clear from the context which representation is being considered. A point of $\mathcal{M}(\nu)$ is a gauge equivalence class of some bow data $(Q, B, \mathbf{T})$ satisfying $\mu(Q, B, \mathbf{T}) = \nu$. We call such data a bow solution in representation $\mathcal{R}$ at level $\nu$.

2.5. Bow cohomology. Let us use the following isometry on the space of bow data to simplify our moment map values:
$$\mathbf{T}(s) \mapsto \mathbf{T}(s) - \int^s \nu(s')\,ds'. \tag{30}$$
If $(Q, B, \mathbf{T})$ was satisfying the moment map conditions (29) with $\mu = \nu(s)$, after the above redefinition $\mathbf{T}_{\mathrm{New}}(s) = \mathbf{T}(s) + \int^s \nu(s')\,ds'$ the data $(Q, B, \mathbf{T}_{\mathrm{New}})$ satisfies (29) with
$$\mu = \sum_\sigma \Big( \delta(s - p_{\sigma R}) \int^{p_{\sigma R}} \nu(s')\,ds' - \delta(s - p_{\sigma L}) \int^{p_{\sigma L}} \nu(s')\,ds' \Big).$$
Thus it suffices to study the values of the moment maps of the form $\nu = \sum_\sigma \big( \delta(s - p_{\sigma R})\nu_{\sigma L} - \delta(s - p_{\sigma L})\nu_{\sigma R} \big)$. As a matter of fact, we shall choose the value of $\mu$ to be a bow cocycle. Imposing the cocycle condition allows us to choose the value of $\mu$ to be
$$\sum_{e \in \mathcal{E}} \big( \delta(s - t(e)) - \delta(s - h(e)) \big)\, \nu_e 1_E, \tag{31}$$
so that the level is given by a pure quaternionic imaginary function $\nu_e$ on the set of edges. In order to do this let us outline what we mean by the bow cohomology.

Given a bow with the collections of intervals $\mathcal{I}$ and edges $\mathcal{E}$, let $C(\mathcal{I})$ denote the space of smooth real functions on $\mathcal{I}$, let $C(\mathcal{E}) = \mathbb{R}^{\#\mathcal{E}}$ denote the space of functions on the set of edges, and let $\mathcal{L}$ denote the set of closed paths. A closed path is a cyclically ordered alternating sequence of intervals and edges such that each edge connects two intervals that are adjacent to it in this sequence. We let $C(\mathcal{L})$ denote the space of functions on the space of closed paths. Now let us define the space of 0-cochains to be $C^0 = C(\mathcal{I})$, the space of 1-cochains to be $C^1 = C(\mathcal{I}) \oplus C(\mathcal{E})$, and the space of 2-cochains to be $C^2 = C(\mathcal{L})$. For $f, g \in C(\mathcal{I})$, $a \in C(\mathcal{E})$, and $r \in C(\mathcal{L})$ we define the differentials in the complex
$$0 \to C^0 \xrightarrow{\ d_0\ } C^1 \xrightarrow{\ d_1\ } C^2 \to 0 \tag{32}$$
by
$$d_0: f(s) \mapsto \begin{pmatrix} g(s) \\ a(e) \end{pmatrix} = \begin{pmatrix} \frac{d}{ds} f(s) \\ f(h(e)) - f(t(e)) \end{pmatrix}, \tag{33}$$
$$d_1: \begin{pmatrix} g(s) \\ a(e) \end{pmatrix} \mapsto r(l) = \sum_{\sigma \,|\, I_\sigma \in l} \int_{I_\sigma} g(s)\,ds + \sum_{e \in l} a(e). \tag{34}$$
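The differentials (33) and (34) can be tried out on a small example. The Python sketch below is an illustration under several assumptions of ours: functions on intervals are polynomials stored as coefficient lists, the sample bow has two intervals joined by two edges into a single closed path, and each edge in the path is traversed along its orientation. With exact rational arithmetic the composition of $d_1$ after $d_0$ evaluates to zero on the closed path, since the integrals of the derivatives telescope against the edge terms.

```python
from fractions import Fraction


def poly_eval(coeffs, s):
    """Evaluate sum_k coeffs[k] * s**k."""
    return sum(c * s**k for k, c in enumerate(coeffs))


def poly_deriv(coeffs):
    """Coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def poly_integral(coeffs, lo, hi):
    """Exact integral of the polynomial over [lo, hi]."""
    anti = [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]
    return poly_eval(anti, hi) - poly_eval(anti, lo)


def d0(f, ends, edges):
    """Eq. (33): f -> (g, a) with g = f' and a(e) = f(h(e)) - f(t(e))."""
    g = {name: poly_deriv(c) for name, c in f.items()}
    a = [poly_eval(f[head], ends[head][0]) - poly_eval(f[tail], ends[tail][1])
         for tail, head in edges]
    return g, a


def d1(g, a, ends, path_intervals, path_edges):
    """Eq. (34): sum of integrals of g over the intervals plus sum of a(e)."""
    total = sum(poly_integral(g[n], *ends[n]) for n in path_intervals)
    return total + sum(a[i] for i in path_edges)


# Hypothetical bow: intervals "a" and "b", edges a -> b and b -> a.
ends = {"a": (Fraction(0), Fraction(1)), "b": (Fraction(2), Fraction(7, 2))}
edges = [("a", "b"), ("b", "a")]
f = {"a": [1, -3, 2], "b": [0, 5, 0, 1]}  # one polynomial per interval

g, a = d0(f, ends, edges)
r = d1(g, a, ends, ["a", "b"], [0, 1])
```

Here `r` is exactly zero, while a 1-cochain that is not a coboundary, for instance vanishing `g` together with `a = [1, 0]`, gives a nonzero value on the same closed path.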
Since for an edge connecting $I_\sigma$ to $I_\rho$ we have $t(e) = p_{\sigma R}$ and $h(e) = p_{\rho L}$, it follows from our definition of a closed path that $d_1 d_0 = 0$ and (32) is a cochain complex of a bow. 0-cocycles are functions on a bow constant on each connected component.

In the context of our moment map discussion above the Nahm data redefinition
$$T_j'(s) = T_j(s) + f_j(s), \tag{35}$$
produces the change in the moment map
$$\mu(Q, B, \mathbf{T}') = \mu(Q, B, \mathbf{T}) + d_0\Big( \sum_{j=1}^3 e_j f_j \Big), \tag{36}$$
that is a 1-coboundary. So, for any two levels that differ by a coboundary, the moduli spaces at these levels are isometric. We impose the 1-cocycle condition on our choice of the level to ensure the absence of noncommutativity of the resulting instanton base space. Thus the first cohomology $H^1$ of a bow parameterizes essentially different moment map components and we can consider the values of $\mu \in H^1 \oplus H^1 \oplus H^1$. Each such value can be chosen to be of the form (31).

2.6. Various forms of the moment map conditions. One can write the moment map condition
$$\mu(Q, B, \mathbf{T}) = \sum_{e \in \mathcal{E}} \big( \delta(s - t(e)) - \delta(s - h(e)) \big)\, \nu_e, \tag{37}$$
with the moment map given in Eq. (29) in a number of forms. Depending on the context we can use one or another of these forms. In the remainder of this section we write the moment map conditions (37) in four different forms: the quaternionic, the complex, the real, and the operator form.

2.6.1. Quaternionic form. In the interior of each subinterval
$$\mathrm{Im}\Big( \frac{d}{ds}\mathbf{T} + i\mathbf{T}\mathbf{T}^* \Big) = 0, \tag{38}$$
at a point $\lambda \in \Lambda_0$ we have
$$\mathrm{Im}\,\mathbf{T}(\lambda+) - \mathrm{Im}\,\mathbf{T}(\lambda-) = \mathrm{Im}\,(-i)\, Q_\lambda Q_\lambda^\dagger, \tag{39}$$
while at $\lambda \in \Lambda \setminus \Lambda_0$ we have the conditions (9). At the ends of each interval $I_\sigma$ we obtain the conditions
$$\mathrm{Im}\,\mathbf{T}(p_{\sigma R}) = \sum_{\substack{e \in \mathcal{E} \\ t(e) = p_{\sigma R}}} \big( \mathrm{Im}\, i B_e^- B_e^{-\dagger} - \nu_e \big), \tag{40}$$
$$\mathrm{Im}\,\mathbf{T}(p_{\sigma L}) = \sum_{\substack{e \in \mathcal{E} \\ h(e) = p_{\sigma L}}} \big( \mathrm{Im}\,(-i) B_e^+ B_e^{+\dagger} - \nu_e \big). \tag{41}$$
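2.6.2. Complex form. Letting $D = \frac{d}{ds} - iT_0 + T_3$, $T = T_1 + iT_2$ and $\nu^{\mathbb{C}} = \nu_1 + i\nu_2$,
$$[D, T] - \sum_{\lambda \in \Lambda_0} \delta(s - \lambda)\, I_\lambda J_\lambda \tag{42}$$
$$\quad + \sum_{e \in \mathcal{E}} \Big( \delta(s - t(e)) \big( B_e^{RL} B_e^{LR} + \nu_e^{\mathbb{C}} \big) - \delta(s - h(e)) \big( B_e^{LR} B_e^{RL} + \nu_e^{\mathbb{C}} \big) \Big) = 0, \tag{43}$$
$$[D^\dagger, D] + [T^\dagger, T] + \sum_{\lambda \in \Lambda_0} \delta(s - \lambda) \big( J_\lambda^\dagger J_\lambda - I_\lambda I_\lambda^\dagger \big) + \sum_{e \in \mathcal{E}} \Big( \delta(s - t(e)) \big( (B_e^{LR})^\dagger B_e^{LR} - B_e^{RL} (B_e^{RL})^\dagger - 2\nu_{e3} \big) \tag{44}$$
$$\quad + \delta(s - h(e)) \big( (B_e^{RL})^\dagger B_e^{RL} - B_e^{LR} (B_e^{LR})^\dagger + 2\nu_{e3} \big) \Big) = 0. \tag{45}$$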
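2.6.3. Real form. Inside each interval, away from $\Lambda$, the Nahm data satisfies the Nahm equations
$$\frac{d}{ds} T_1 = i[T_0, T_1] + i[T_2, T_3], \tag{46}$$
$$\frac{d}{ds} T_2 = i[T_0, T_2] + i[T_3, T_1], \tag{47}$$
$$\frac{d}{ds} T_3 = i[T_0, T_3] + i[T_1, T_2], \tag{48}$$
at $\lambda \in \Lambda_0$,
$$T_1(\lambda+) - T_1(\lambda-) = \frac{1}{2} \big( I_\lambda J_\lambda + J_\lambda^\dagger I_\lambda^\dagger \big), \tag{49}$$
$$T_2(\lambda+) - T_2(\lambda-) = \frac{i}{2} \big( J_\lambda^\dagger I_\lambda^\dagger - I_\lambda J_\lambda \big), \tag{50}$$
$$T_3(\lambda+) - T_3(\lambda-) = \frac{1}{2} \big( J_\lambda^\dagger J_\lambda - I_\lambda I_\lambda^\dagger \big), \tag{51}$$
while at the ends of an interval $I_\sigma$,
$$T_1(p_{\sigma L}) = \sum_{\substack{e \in \mathcal{E} \\ h(e) = p_{\sigma L}}} \Big( \frac{1}{2} \big( B_e^{LR} B_e^{RL} + (B_e^{RL})^\dagger (B_e^{LR})^\dagger \big) - \nu_{e1} \Big), \tag{52}$$
$$T_2(p_{\sigma L}) = \sum_{\substack{e \in \mathcal{E} \\ h(e) = p_{\sigma L}}} \Big( \frac{i}{2} \big( (B_e^{RL})^\dagger (B_e^{LR})^\dagger - B_e^{LR} B_e^{RL} \big) - \nu_{e2} \Big), \tag{53}$$
$$T_3(p_{\sigma L}) = \sum_{\substack{e \in \mathcal{E} \\ h(e) = p_{\sigma L}}} \Big( \frac{1}{2} \big( (B_e^{RL})^\dagger B_e^{RL} - B_e^{LR} (B_e^{LR})^\dagger \big) - \nu_{e3} \Big), \tag{54}$$
and
$$T_1(p_{\sigma R}) = \sum_{\substack{e \in \mathcal{E} \\ t(e) = p_{\sigma R}}} \Big( \frac{1}{2} \big( B_e^{RL} B_e^{LR} + (B_e^{LR})^\dagger (B_e^{RL})^\dagger \big) - \nu_{e1} \Big), \tag{55}$$
$$T_2(p_{\sigma R}) = \sum_{\substack{e \in \mathcal{E} \\ t(e) = p_{\sigma R}}} \Big( \frac{i}{2} \big( (B_e^{LR})^\dagger (B_e^{RL})^\dagger - B_e^{RL} B_e^{LR} \big) - \nu_{e2} \Big), \tag{56}$$
$$T_3(p_{\sigma R}) = \sum_{\substack{e \in \mathcal{E} \\ t(e) = p_{\sigma R}}} \Big( \frac{1}{2} \big( B_e^{RL} (B_e^{RL})^\dagger - (B_e^{LR})^\dagger B_e^{LR} \big) - \nu_{e3} \Big). \tag{57}$$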
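As a consistency check between the pole behavior (9) and the Nahm equations (46)-(48), one can verify that the leading pole term alone, $T_j(s) = \rho_j / (2(s - \lambda))$ with $T_0 = 0$, solves the Nahm equations whenever $[\rho_i, \rho_j] = 2i\varepsilon_{ijk}\rho_k$. The Python sketch below does this for the two-dimensional representation $\rho_j = \sigma_j$ (the Pauli matrices). The choice of representation, the gauge $T_0 = 0$, the omission of the regular terms in (9), and all function names are assumptions made for this illustration only.

```python
def mat_mul(a, b):
    """Product of two 2x2 complex matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = mat_mul(a, b), mat_mul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]


def scale(c, a):
    """Multiply every entry of a by the scalar c."""
    return [[c * a[i][j] for j in range(2)] for i in range(2)]


# Pauli matrices; they satisfy [sigma_i, sigma_j] = 2 i eps_ijk sigma_k.
sigma = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]


def T(j, s, lam=0.0):
    """Pole term of Eq. (9): T_j(s) = rho_j / (2 (s - lam))."""
    return scale(0.5 / (s - lam), sigma[j])


def dT(j, s, lam=0.0):
    """Exact derivative of the pole term with respect to s."""
    return scale(-0.5 / (s - lam) ** 2, sigma[j])


def nahm_residual(s, lam=0.0):
    """Largest entry of dT_j/ds - i [T_k, T_l] over cyclic (j, k, l), T_0 = 0."""
    worst = 0.0
    for j, k, l in ((0, 1, 2), (1, 2, 0), (2, 0, 1)):
        lhs = dT(j, s, lam)
        rhs = scale(1j, commutator(T(k, s, lam), T(l, s, lam)))
        for row_l, row_r in zip(lhs, rhs):
            for x, y in zip(row_l, row_r):
                worst = max(worst, abs(x - y))
    return worst
```

Evaluating `nahm_residual` at sample points to the right of the pole, for instance `nahm_residual(0.5)`, gives a value that vanishes up to rounding error.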
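2.6.4. Operator form. There is yet another way of presenting the moment map expressions. Let us introduce the Dirac operator
$$\mathcal{D} = \begin{pmatrix} -\frac{d}{ds} + i\mathbf{T}^* \\ Q^\dagger \\ B^{-\dagger} \\ B^{+\dagger} \end{pmatrix}. \tag{58}$$
Strictly speaking, this is called the Weyl operator in the physics literature, while the Dirac operator in physics is $\begin{pmatrix} 0 & \mathcal{D} \\ \mathcal{D}^\dagger & 0 \end{pmatrix}$, with $\mathcal{D}$ given by Eq. (58). From the expression (58) it is clear that $\mathcal{D}: \Gamma(S \otimes E) \to \Gamma(S \otimes E) \oplus W \oplus E_L \oplus E_R$. Its hermitian conjugate operator is
$$\mathcal{D}^\dagger = \frac{d}{ds} - i\mathbf{T} + \sum_{\lambda \in \Lambda_0} \delta(s - \lambda)\, Q_\lambda + \sum_{e \in \mathcal{E}} \big( \delta(s - t(e))\, B_e^- + \delta(s - h(e))\, B_e^+ \big). \tag{59}$$
Since we can use the hermitian structure to identify the spaces with their duals, we view $\mathcal{D}^\dagger: \Gamma(S \otimes E) \oplus W \oplus E_L \oplus E_R \to \Gamma(S \otimes E)$. The moment map then has the form
$$\mu(Q, B, \mathbf{T}) = \mathrm{Im}\,(-i)\,\mathcal{D}^\dagger \mathcal{D}. \tag{60}$$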
3. The Moduli Space of a Bow Representation and Natural Bundles on It

As we discussed in Sect. 2.4, the moduli space of a bow representation $\mathcal{R}$ is the hyperkähler reduction of $\mathrm{Dat}\,\mathcal{R}$ by the gauge group $\mathcal{G}$,
$$\mathcal{M} = \mu^{-1}(\nu)/\mathcal{G} = \mathrm{Dat}\,\mathcal{R}/\!/\!/\mathcal{G}. \tag{61}$$
The resulting space $\mathcal{M}$ is hyperkähler, thus there is a two-sphere parameterizing the complex structures and we can view it as a complex variety in any one of these complex structures. For a unit vector $\vec{n}$ the corresponding complex structure is $n_1 e_1 + n_2 e_2 + n_3 e_3$. For example for $\vec{n} = (0, 0, 1)$ the space $\mathrm{Dat}\,\mathcal{R}$ can be viewed as a complex variety parameterized by $(I, J, B^{LR}, B^{RL}, (D, T))$, where $D = \frac{d}{ds} - iT_0 + T_3$ is viewed as a (not necessarily hermitian) connection on $E \to \mathcal{I}$. There is a natural action of the complexification of the gauge group $\mathcal{G}^{\mathbb{C}}$ on $\mathrm{Dat}\,\mathcal{R}$. According to the result of [41], the complex symplectic reduction of $\mathrm{Dat}\,\mathcal{R}$ with respect to $\mathcal{G}^{\mathbb{C}}$ is isomorphic as a complex manifold to $\mathcal{M}$ viewed in the corresponding complex structure. In other words the complex symplectic quotient
$$\mathrm{Dat}/\!/\mathcal{G}^{\mathbb{C}} = (\mu_1 + i\mu_2)^{-1}(\nu_1 + i\nu_2)/\mathcal{G}^{\mathbb{C}} = \mu^{-1}(\nu)/\mathcal{G} = \mathcal{M}. \tag{62}$$
It is an infinite dimensional generalization of a theorem in [40]. The proof of this statement is essentially that of Donaldson in [42]. Thus we can view $\mathcal{M}$ as the space of solutions to the complex moment map condition (42) only, modulo the complexified gauge transformations. This is the case for any complex structure one chooses.
3.1. Natural bundles on $\mathcal{M}$. The level space $\mu^{-1}(\nu)$ (which is the space of bow solutions satisfying the given moment map conditions) is a subspace of the linear hyperkähler space of all bow data, therefore it has an induced metric. Moreover, it has a natural isometric action of the gauge group $\mathcal{G}$. We defined the moduli space of the bow representation to be $\mathcal{M} = \mu^{-1}(\nu)/\mathcal{G}$ with the quotient metric. The fact that $\mu^{-1}(\nu)$ carries a $\mathcal{G}$-invariant metric and that $\mathcal{M}$ is a quotient space implies that there is a natural family of vector bundles with connections on $\mathcal{M}$. In particular, if we choose a point $s' \in \mathcal{I}$ we can consider the subgroup $\mathcal{G}_{s'}$ of gauge transformations that act trivially at $s = s'$, i.e.
$$\mathcal{G}_{s'} = \{ g \in \mathcal{G} \mid g(s') = 1 \}. \tag{63}$$
Then the quotient group is the group $G_{s'} \equiv \mathcal{G}/\mathcal{G}_{s'}$. It can be identified with the group acting on the fiber $E_{s'}$. Now $R_{s'} = \mu^{-1}(\nu)/\mathcal{G}_{s'}$ is a finite dimensional space with the quotient metric; moreover, $R_{s'}/G_{s'} = \mathcal{M}$. If the group action is free we obtain the principal bundle
$$\begin{array}{ccc} G_{s'} & \to & R_{s'} \\ & & \downarrow \\ & & \mathcal{M} \end{array} \tag{64}$$
with a connection determined by the metric on $R_{s'}$. Since we identified the group $G_{s'}$ with the group acting on the fiber $E|_{s'}$, we have an associated hermitian vector bundle
$$\begin{array}{ccc} E_{s'} & \to & R_{s'} \\ & & \downarrow \\ & & \mathcal{M} \end{array} \tag{65}$$
Thus we can view $\mathcal{I}$ as parameterizing natural bundles $R_{s'} \to \mathcal{M}$ over the moduli space of the bow representation, each $R_{s'} \to \mathcal{M}$ carrying a natural connection $d_{s'}$. The curvature of $d_{s'}$ is self-dual, as in [43].

3.2. Holomorphic description. In any given complex structure corresponding to some unit vector $\vec{n}$ we view our moduli space as a complex variety $\mathcal{M} = (\mu^{\mathbb{C}}_{\vec{n}})^{-1}(\nu^{\mathbb{C}}_{\vec{n}})/\mathcal{G}^{\mathbb{C}}$. Here $\mu^{\mathbb{C}}_{\vec{n}}$ and $\nu^{\mathbb{C}}_{\vec{n}}$ are the complex linear combinations of the components of respectively the moment map and its values determined by the choice of the complex structure $\vec{n}$. Just as in the real description above, $G^{\mathbb{C}}_{s'} = \mathcal{G}^{\mathbb{C}}/\mathcal{G}^{\mathbb{C}}_{s'}$ and the variety $R_{s'}(\vec{n}) \equiv (\mu^{\mathbb{C}}_{\vec{n}})^{-1}(\nu^{\mathbb{C}}_{\vec{n}})/\mathcal{G}^{\mathbb{C}}_{s'}$ can be viewed as a principal bundle
$$\begin{array}{ccc} G^{\mathbb{C}}_{s'} & \to & R_{s'}(\vec{n}) \\ & & \downarrow \\ & & \mathcal{M} \end{array} \tag{66}$$
with the inherited holomorphic structure. Forgetting the hermitian structure of the bundle $E$, we obtain the associated holomorphic vector bundle
$$\begin{array}{ccc} E^{\mathbb{C}}_{s'} & \to & R_{s'}(\vec{n}) \\ & & \downarrow \\ & & \mathcal{M}. \end{array} \tag{67}$$
Needless to say, these resulting holomorphic bundles do depend on our initial choice of the complex structure determined by $\vec{n}$.
S. A. Cherkis
4. ALF Spaces
In this section we identify bows and representations that have ALF spaces as their moduli spaces. These are of either A- or D-type. A given ALF space can allow various realizations as a moduli space of different bows. It will suffice for our purposes to specify for each ALF space some corresponding bow and its representation that deliver this ALF space as its moduli space.

4.1. Taub-NUT. The simplest ALF space of A-type is the Taub-NUT space. It can also be referred to as the $A_0$ ALF space. For the rank one Nahm data, imposing the moment map conditions in the interior of the interval leads to $\frac{d}{ds} T_j = 0$, thus the $T_j$ are constant, and a gauge can be chosen so that $T_0$ is also a constant. A constant $U(1)$ transformation leaves the whole set of bow data inert. Thus the only remaining gauge transformation is the $U(1)$ group acting at $s = l/2$. Let us denote it by $U_R(1)$. This $U_R(1)$ can be viewed as the factor group of the group of all the gauge transformations modulo the group of gauge transformations that are equal to the identity at the point $p_R$. The remaining quotient $\mathbb{R}^3 \times S^1 \times \mathbb{R}^4 /\!/\!/ U_R(1)$ is the Taub-NUT space with the scale parameter given by the length of the interval $l$ and the position of the Taub-NUT center given by the negative of $\nu$ if the value of the moment map is $\mu = \nu$.

4.2. Multi-Taub-NUT. The bow of Fig. 3 with the representation that consists of line bundles on each interval has the $(k+1)$-centered Taub-NUT space as its moduli space. Just as in the case of the Taub-NUT above, performing the hyperkähler reduction with respect to the gauge group acting only in the interior of the intervals leads to the space $(\mathbb{R}^3 \times S^1)^{k+1} \times (\mathbb{R}^4)^{k+1}$ with the flat metric and sizes of the $S^1$ circles equal to $1/\sqrt{l_\sigma}$. Performing the quotient with respect to the remaining gauge groups acting at the ends of the intervals is essentially the same calculation as in [44].
It leads to the multi-Taub-NUT space, also called the $A_k$ ALF space, with the scale parameter equal to the total sum of the interval lengths. The positions of the $k+1$ Taub-NUT centers, one for each edge, are equal to the negatives of the values $\nu_e$ chosen in the moment map condition $\mu = \sum_{e\in E} \big(\delta(s - t(e)) - \delta(s - h(e))\big)\,\nu_e$.

4.3. Dk ALF space. The $D_k$ ALE space is the deformation of the quotient of $\mathbb{R}^4$ by the action of the dihedral group $D_{k-2}$ of order $4k - 8$. According to [45], the $D_k$ ALE space
Fig. 2. A bow consisting of a single interval and a single edge with a representation having a line bundle over the interval
Fig. 3. For an Ak ALF space, also called (k + 1)-centered Taub-NUT, the bow diagram above contains k + 1 intervals, each carrying a line bundle
Fig. 4. We model Dk ALF space as the moduli space of this bow representation. This bow has k − 3 intervals with rank 2 bundles over them and four intervals with line bundles over them
is a moduli space of the affine Dk quiver representation determined by the null vector of the corresponding Cartan matrix. Among the bow deformations of the corresponding quiver, the one with the representation in Fig. 4 has the Dk ALF space as its moduli space. The Dk ALF space in this form was already considered by Dancer in [46]. Effectively the description of [46] is equivalent to a bow which has all but one interval having zero lengths.
Fig. 5. A U (2) self-dual connection on Taub-NUT is determined by this bow. Its instanton number and monopole charges are determined by the ranks R0 , R1 , and R2 of the vector bundles
5. Yang-Mills Instantons on ALF Spaces
Now that we have modelled each ALF space as a moduli space of a concrete bow representation, say s (for ‘small’ representation), we can choose any other representation L (for ‘large’ representation) of the same bow. Any gauge equivalence class of a solution of L defines an instanton on the original ALF space up to the action of the gauge group. Since there are two representations of the same bow involved, let us denote all the components of the solution in the large representation L by the capital letters Q, B, and T, as in our notation above, and all the components of a solution in the small representation s by the corresponding lower case letters q and t. Note that all the representations we used in the previous section to model the ALF spaces were of file zero, i.e. they had no λ-points and no associated fundamental multiplets.

In choosing the solution one has to be careful to match the levels of the two representations. If the original ALF space was the moduli space of the small representation s at level ν, i.e. the solutions (q, t) satisfied
\[ \mu_s(q, t) = 1_e \otimes \nu = 1_e \otimes \sum_{e\in E} \big(\delta(s - t(e)) - \delta(s - h(e))\big)\,\nu_e, \tag{68} \]
then the solution of the large representation has to be chosen at level −ν, i.e. it satisfies
\[ \mu_L(Q, B, T) = -1_E \otimes \nu = -1_E \otimes \sum_{e\in E} \big(\delta(s - t(e)) - \delta(s - h(e))\big)\,\nu_e. \tag{69} \]
The bow representations below give some examples.
5.1. Examples.
5.1.1. U(2) instantons on the Taub-NUT. Figure 5 gives a representation of the bow determined by the three nonnegative integers R0, R1, and R2. A solution of this representation determines a U(2) instanton on the Taub-NUT space, with the ranks R0, R1, R2 determining its instanton number m0 and its monopole charges.
Fig. 6. U (4) instantons on a Taub-NUT space are in one-to-one correspondence with the solutions of this bow representation. eλ1 , eλ2 , eλ3 , eλ4 are the values of the monodromy at infinity, while the vector bundle ranks R0 , R1 , R2 , R3 , R4 determine the instanton number and its monopole charges
5.1.2. U(n) instantons on the Taub-NUT. A generic U(n) instanton on the Taub-NUT space is given by a general regular representation of the Taub-NUT bow of Fig. 2 of file n, such as the one in Fig. 6. The positions of the λ-points are given by the values of the logarithm of the eigenvalues of the holonomy of the instanton connection around the compact direction of the Taub-NUT triholomorphic isometry at infinity. The ranks $R_0^j = R_j$ with j = 0, 1, ..., n can be any nonnegative integers. These ranks determine the monopole charges and the instanton number.

5.1.3. U(n) instantons on the (k + 1)-centered Taub-NUT. A generic U(n) instanton on a (k + 1)-centered Taub-NUT space is given by a regular file n representation of the $A_k$ bow of Fig. 3, such as the file seven representation given in Fig. 7, which gives a U(7) instanton on the $A_4$ ALF space.

5.1.4. U(n) instantons on the Dk ALF space. A generic U(n) instanton on the $D_k$ ALF space is given by a regular file n representation of the bow in Fig. 4, such as the one in Fig. 8.
5.2. Instanton charges. As we demonstrate in Sects. 6 and 7, if a bow has a representation s with a four-dimensional moduli space $M_s(\nu)$ for some moment map value ν, then for any other bow representation L of file n its solution with the moment map value −ν produces a U(n) instanton on $M_s(\nu)$. We would like to understand the relation between the representation and the topological charges of the resulting instantons.

From now on let us focus on the case of instantons on a multi-Taub-NUT space. A Taub-NUT with k + 1 centers is a moduli space of the $A_k$-bow representation of Fig. 3. This $A_k$ ALF space is often denoted by $TN_{k+1}$. There are three kinds of topological charges one can associate to a rank n finite action anti-self-dual connection on an $A_k$ ALF space:
• n nonnegative integer monopole charges $(j^1, j^2, \ldots, j^n)$,
• k + 1 first Chern class values $c_1^\sigma$,
• the value of the second Chern class $c_2$, which we denote by $m_0$.
Fig. 7. A bow representation determining a U (7) instanton on the 5-centered Taub-NUT
Fig. 8. An example of a U (6) instanton on the D7 ALF space
The Chern classes were computed by Witten in [47]. In order to discuss these charges we have to recall a few facts about the (k + 1)-centered Taub-NUT space. It is convenient to associate to an $A_k$ bow representation a circle of length $l = l_0 + l_1 + \cdots + l_k$ parameterized by a coordinate s, with k + 1 points $0 \le p_\sigma < l$, σ = 0, ..., k, with $p_\sigma$ positioned at $s = \sum_{\varsigma=0}^{\sigma} l_\varsigma$. We understand the interval $I_\sigma$ of the bow to be associated to the arc $[p_\sigma, p_{\sigma+1}]$. For a representation of file n this assignment gives a natural position for the n points $\lambda_\sigma^\alpha$ within these arcs on the circle. Let the coordinates of these points, in the same order as they appear along the circle, be $\lambda_\tau$, τ = 1, ..., n. This way the λ-points are labelled in two ways: as $\lambda_\sigma^\alpha$ with σ = 0, ..., k and α = 1, ..., $r_\sigma$, and as $\lambda_\tau$ with τ = 1, ..., n. This gives maps σ(τ) and α(τ) such that $\lambda_\tau = \lambda_{\sigma(\tau)}^{\alpha(\tau)}$. Now the circle is divided into subarcs by the points $p_\sigma$ with σ = 0, ..., k and $\lambda_\tau$ with τ = 1, ..., n.
Each of these subarcs is associated to a subinterval of the bow representation which carries a Hermitian bundle over it. We assign the rank of this bundle $R_\sigma^\alpha$ to each subarc.

We model the $TN_k$ as a moduli space of the representation of Fig. 3 with the moment map $\mu = \sum_e \big(\delta(s - t(e)) - \delta(s - h(e))\big)\,\nu_e$ for some given set of distinct three-vectors $\vec\nu_0, \ldots, \vec\nu_k$ with $\nu_e = e_1 \otimes \nu_e^1 + e_2 \otimes \nu_e^2 + e_3 \otimes \nu_e^3$. The hyperkähler quotient in this setup was considered in [44] and leads to the following metric:
\[ ds^2 = \frac{1}{4}\left( V\, d\vec r^{\,2} + \frac{1}{V}\,(d\theta + \omega)^2 \right), \tag{70} \]
with $\theta \sim \theta + 4\pi$, $V = l + \sum_{\sigma=0}^{k} \frac{1}{r_\sigma}$, where $r_\sigma = |\vec r - \vec\nu_\sigma|$ and the one-form ω satisfies $d\omega = *_3\, dV$. Here $*_3$ signifies the Hodge star operation acting on differential forms on $\mathbb{R}^3$. The hyperkähler reduction produces the following one-forms on this space
\[ \tilde a = \frac{1}{2}\,\frac{d\theta + \omega}{V}, \qquad a^{(\sigma)} = \frac{1}{2}\left( \frac{1}{r_\sigma}\,\frac{d\theta + \omega}{V} - \eta_\sigma \right), \tag{71} \]
where $\eta_\sigma$ is a one-form on $\mathbb{R}^3$ satisfying $d\eta_\sigma = *_3\, d\frac{1}{r_\sigma}$. The self-dual connection on the natural bundles $\mathcal{R}_s$ defined in Sect. 3.1 can be written in terms of these forms as
\[ a_s = s\,\tilde a + \sum_{\sigma:\; l_1 + l_2 + \cdots + l_\sigma < s} a^{(\sigma)}. \tag{72} \]
This has the form $a = \frac{H}{V}(d\theta + \omega) - \eta$ with $d\eta = *_3\, dH$. For any such form
\[ f = da = -(d\theta + \omega) \wedge d\frac{H}{V} + V *_3 d\frac{H}{V}, \tag{73} \]
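and
\[ f \wedge f = (d\theta + \omega) \wedge d\frac{H}{V} \wedge \left( d\eta - \frac{H}{V}\, d\omega \right), \tag{74} \]
and $\int_M f \wedge f = \int_{\partial M} \frac{H}{V} \left( d\eta - \frac{H}{V}\, d\omega \right)$. For $H = \frac{1}{2}\left( s + \sum_\sigma \frac{n_\sigma}{r_\sigma} \right)$ it is clear that the only contribution arises from the integral over the sphere at infinity, giving
\[ \frac{1}{8\pi^2} \int_{TN_{k+1}} f \wedge f = \frac{1}{2}\, k \left( \frac{s}{l} \right)^2 - \frac{s}{l} \sum_e n_e. \tag{75} \]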
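This is so since $\lim_{\vec r \to \vec\nu_e} \frac{H}{V} = n_e$ and the contribution of η to the integral over a small sphere surrounding $\vec\nu_e$ is cancelled by that of $\frac{H}{V}\,\omega$. The simplest representation of a bow is a skyscraper or zero-rank representation. If it is of file n it consists of n λ-points and has all vector bundles of rank zero. For such a representation the connection is $A = \mathrm{diag}(a_{\lambda_1}, a_{\lambda_2}, \ldots, a_{\lambda_n})$ and
\[ \frac{1}{8\pi^2} \int_{TN_{k+1}} \mathrm{tr}\, F \wedge F = \frac{k+1}{2} \sum_{\tau=1}^{n} \left( \frac{\lambda_\tau}{l} \right)^2 - \sum_{\tau=1}^{n} \sum_{\substack{\sigma=0 \\ p_\sigma < \lambda_\tau}}^{k} \frac{\lambda_\tau}{l}. \tag{76} \]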
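As a purely numerical aside, the harmonic function $V = l + \sum_\sigma 1/r_\sigma$ entering the metric (70) can be probed with a few lines of Python. The sketch below is only an illustration under our own assumptions: the function names, the sample centers and the finite-difference step are hypothetical and do not come from the construction above. It checks that V is harmonic away from the centers and that $r\,(V - l)$ approaches the number of centers at large r.

```python
import math

def potential(x, centers, l):
    """V(x) = l + sum over centers of 1/|x - nu|, as in the text below Eq. (70)."""
    return l + sum(1.0 / math.dist(tuple(x), tuple(nu)) for nu in centers)

def laplacian(f, x, h=1e-3):
    """Second-order central finite-difference Laplacian of f at the point x in R^3."""
    x = list(x)
    total = 0.0
    for i in range(3):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        total += f(xp) + f(xm) - 2.0 * f(x)
    return total / h**2

if __name__ == "__main__":
    # Hypothetical sample data: three centers (k + 1 = 3) and scale parameter l.
    centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    l = 1.5
    V = lambda x: potential(x, centers, l)
    print("Laplacian of V away from the centers:", laplacian(V, (3.0, -2.0, 1.5)))
    print("r (V - l) at large r:", 1e4 * (V((1e4, 0.0, 0.0)) - l))
```

The second printed quantity tends to the number of centers, reflecting the $1/r$ fall-off of $V - l$ that governs the contribution of the sphere at infinity.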
Let us specify the monopole charges. Consider a holonomy along the isometric direction at infinity parameterized by θ. As this direction is finite for all ALF spaces (see [34]) the isometry has eigenvalues that are unrestricted. We presume the asymptotic monodromy to be generic and thus all of these eigenvalues to be distinct and numbered respecting their cyclic order. Asymptotically the logarithm of these eigenvalues does not depend on the direction in which one moves towards infinity and behaves as
\[ \tau\text{th eigenvalue} = \frac{2\pi i}{V} \left( \lambda_\tau + j^\tau / r + O(1/r^2) \right), \tag{77} \]
where all the integers $j^\tau$ can be chosen to satisfy $0 \le j^\tau < k$. A geometric description of the integers $j^\tau$, τ = 1, 2, ..., n, is that these are the Chern numbers of the holonomy eigen-bundles over the sphere of directions in the base space. These are defined modulo k. The reason these are associated with monopole charges is that any finite action connection A asymptotically approaches the connection
\[ A_{app} = g^{-1} \left( A_3 + \Phi\, \frac{d\theta + \omega}{V} \right) g + i g^{-1} dg, \tag{78} \]
with $A_3$ some connection over bundles over $\mathbb{R}^3$ and Φ an endomorphism of that bundle. The self-duality condition on A implies [48] that the asymptotic fields $A_3$ and Φ satisfy the monopole equation of Bogomolny
\[ F_3 = *_3\, D_{A_3} \Phi. \tag{79} \]
Here $F_3$ is the curvature of the connection $A_3$, $D_{A_3}$ is the covariant differential with the $A_3$ connection, and $*_3$ is the Hodge star operation acting on $T^*\mathbb{R}^3$. The conventional monopole charges are exactly the quantities $j^\tau$ that we defined above. Viewing the $TN_{k+1}$ as a circle fibration over $\mathbb{R}^3 \setminus \{\vec\nu_\sigma\}$, let us pick a set of k + 1 semi-infinite nonintersecting lines, such that the σth line originates at $\vec\nu_\sigma$. The preimage in the total $TN_{k+1}$ of this line is an infinite cycle which we denote as $C_\sigma$. The k + 1 values of the first Chern class are given by
\[ c_1^\sigma = \frac{1}{2\pi} \int_{C_\sigma} \mathrm{tr}\, F. \tag{80} \]
These are not generally integers; however, the difference of any two of these is an integer. From the argument in [47] it follows that the monopole charges and the first Chern class values are given in terms of the bow ranks by )+1 α(τ ) j τ = Rσα(τ (τ ) − Rσ (τ ) +
c1σ
=
τ
λτ +
Rσ0 (τ )
−
k
1,
(81)
σ =0 pσ <λτ
r (τ ) Rσσ(τ )
−
n
1.
(82)
τ =1 pσ <λτ
The instanton number is typically defined by the value of the second Chern class $\frac{1}{4\pi^2} \int \mathrm{tr}\, F \wedge F$. For compact manifolds this is an integer equal to the instanton number. In the case of the noncompact base there is no reason this is an integer. This number receives contributions from the instantons inside, which is integer for a smooth base
space, and the monopole contributions. To single out the instanton contribution consider a large ball $B_R$ of radius R. Inside the ball we have the smooth connection $A_{in}$ and outside the ball we have the connection $A_{out}$ approaching $A_3 + \Phi\, \frac{d\theta + \omega}{V}$, and the two are related by a gauge transformation g as in Eq. (78): $A_{in} = g^{-1} A_{out}\, g + i g^{-1} dg$. The second Chern class is a differential of the Chern-Simons form, $\mathrm{tr}\, F \wedge F = d\,CS$, with $CS = \mathrm{tr}\left( A \wedge dA + \frac{2}{3} A \wedge A \wedge A \right)$ and the difference $CS_{in} - CS_{out} = \frac{1}{3}\, \mathrm{tr}\, g^{-1} dg \wedge g^{-1} dg \wedge g^{-1} dg$. Combining these observations we have
\[ \int_{TN_k} \mathrm{tr}\, F \wedge F = \int_{B_R} \mathrm{tr}\, F \wedge F + \int_{TN_k \setminus B_R} \mathrm{tr}\, F \wedge F \tag{83} \]
\[ = \frac{1}{3} \int_{\partial B_R} \mathrm{tr}\, g^{-1} dg \wedge g^{-1} dg \wedge g^{-1} dg + \int_{S^3_\infty / \mathbb{Z}_k} CS_{out}. \tag{84} \]
The last term can be reexpressed in terms of the monopole charges. This suggests defining the instanton number to be
\[ m_0 = \frac{1}{12\pi^2} \int \mathrm{tr}\, g^{-1} dg \wedge g^{-1} dg \wedge g^{-1} dg. \tag{85} \]
We expect it to be given in terms of the minimal rank $\min\{R_\sigma^\alpha\}$ for the balanced representations we define in Sect. 9.1, while for the general representation we expect it to be equal to the minimal integer in the set $\{M_\tau\}$ defined in Sect. 9.2.

6. Nahm Transform
Consider a pair L and s of representations of the same bow. As in Sect. 5, we shall refer to the corresponding representations, Nahm data, and solutions as large and small respectively. To distinguish the ingredients of these two representations we use E and W to denote the collection of hermitian vector bundles and auxiliary spaces defining the large representation L, and e and w to denote the collection of bundles and auxiliary spaces (if any) defining the small representation s. We use $E_L$, $E_R$, $e_L$, and $e_R$ to denote the collections of fibers of the respective bundles at the left and right ends of the intervals of I. For any maps, such as $T_j$, $B_\pm$ or Q, acting on E or W we use capital letters. Analogous maps, such as respectively $t_j$ or $b_\pm$ acting on e or w, will be denoted by lower case letters.
If $\mu_L$ is the moment map for the data in the L representation and $\mu_s$ for those in the s representation, then we choose our moment map conditions to be
\[ \mu_L = -\nu \quad \text{and} \quad \mu_s = +\nu. \tag{86} \]
Relying on the form of the moment map of Sect. 2.6.4, one can introduce the Dirac operator $\mathfrak{D}_L$ using a bow solution in L as in Eq. (58). In terms of this operator $\mu_L = \mathrm{Im}\, i\, \mathfrak{D}_L^\dagger \mathfrak{D}_L$. Similarly, if $\mathfrak{D}_s$ is the Dirac operator for the data in s, the moment map $\mu_s = \mathrm{Im}\, i\, \mathfrak{D}_s^\dagger \mathfrak{D}_s$. Now let us consider a ‘twisted’ Dirac operator
\[ \mathfrak{D}_t = \mathfrak{D}_L \otimes 1_e + 1_E \otimes \mathfrak{D}_s. \tag{87} \]
It acts on $(S \otimes E \otimes e) \oplus W \oplus w \oplus (E_L \otimes_E e_R) \oplus (E_R \otimes_E e_L)$. A crucial observation for what follows is that, due to our choice of the moment map conditions in Eq. (86), the twisted Dirac operator is purely real:
\[ \mathrm{Im}\, \mathfrak{D}_t^\dagger \mathfrak{D}_t = 0. \tag{88} \]
Moreover, it is strictly positive away from the degenerate locus $D$ in the direct product $(\mu_L)^{-1}(-\nu) \times (\mu_s)^{-1}(\nu)$. In fact, in all of the examples we consider, for a generic point $pt \in \mu_L^{-1}(-\nu)$ the degenerate locus $D$ does not intersect $pt \times (\mu_s)^{-1}(\nu)$. This is so since the operator $\mathfrak{D}_t^\dagger \mathfrak{D}_t$ is a sum of a number of nonnegative parts of the form $(T_j \otimes 1_e + 1_E \otimes t_j)^2$. If all of these parts have a zero eigenvalue, then each $T_j$ has this eigenvalue within each interval $I_\sigma$, which is not true for a generic solution. This degeneration corresponds to the zero-size limit of an instanton. In particular, the positivity implies that $\mathrm{Ker}\, \mathfrak{D}_t$ is trivial. Presuming $\mathfrak{D}_t$ is Fredholm, which is also generically the case, the vector space $\mathrm{Ker}\, \mathfrak{D}_t^\dagger$ is finite dimensional. If we fix some bow solution of $\mu_L = -\nu$ we obtain a vector bundle
\[ \mathrm{Ker}\, \mathfrak{D}_t^\dagger \to \mu_s^{-1}(\nu). \tag{89} \]
In order to understand it better, let us first consider the space of sections of $e \to I$. We can view it as a fiber of a trivial infinite-dimensional hermitian vector bundle over $(\mu_s)^{-1}(\nu)$:
\[ \begin{array}{ccc} \Gamma(e \to I) & \to & B \\ & & \downarrow \\ & & \mu_s^{-1}(\nu) \end{array} \tag{90} \]
The gauge group action is such that the gauge group $\mathcal{G}$ acts simultaneously on the base $(\mu_s)^{-1}(\nu)$ and maps the corresponding fibers of $B$ into each other:
\[ \begin{array}{ccc} \mathcal{G} & \curvearrowright & B \\ & & \downarrow \\ & & \mu^{-1}(\nu) \end{array} \tag{91} \]
Thus using this action to identify the fibers locally along the gauge group orbits in the base, we obtain the pushdown bundle $\tilde B \to M(\nu) = \mu_s^{-1}(\nu)/\mathcal{G}$. This bundle has a nontrivial connection $\nabla_e$.

Turning to the bundle $\mathrm{Ker}\, \mathfrak{D}_t^\dagger$, we can view it as a subbundle of the trivial bundle over $\mu^{-1}(\nu)$ with the fiber given by the space of sections of $S \otimes E \otimes e$. The trivial connection induces a connection on the subbundle $\mathrm{Ker}\, \mathfrak{D}_t^\dagger \to M(\nu)$. Trivializing $\mathrm{Ker}\, \mathfrak{D}_t^\dagger$ along the orbits of the gauge group $\mathcal{G}$ acting on e, this connection descends to a connection on $M = (\mu_s)^{-1}(\nu)/\mathcal{G}$. In the next section we will prove that this induced connection is self-dual. To demonstrate this we show that in any given complex structure on the hyperkähler base $M(\nu)$ the curvature of this connection is of type (1, 1). This is equivalent to the corresponding holomorphic bundle being flat in every complex structure.

7. Cohomological Interpretation of the Nahm Transform
The space of all complex structures forms a sphere. If we parameterize this sphere by a unit three-vector $\vec n$, in the representation (17) of Sect. 2.3.2, the action of the corresponding complex structure on S is given by a unit quaternion
\[ \vec n \equiv n_1 e_1 + n_2 e_2 + n_3 e_3 = \frac{i}{\sqrt{1 + \zeta \bar\zeta}} \begin{pmatrix} -1 & -\bar\zeta \\ -\zeta & 1 \end{pmatrix}. \tag{92} \]
The complex coordinate $\zeta = (n_1 + i n_2)/n_3$ parameterizes the Riemann sphere.
Adopting the standard Hodge theory one can identify the kernel of a Dirac operator with a middle cohomology of a complex. To simplify our notation in dealing with the twisted Dirac operator, let us introduce
\[ Z \equiv T \otimes 1_e + 1_E \otimes t \tag{93} \]
\[ = (T_1 + i T_2) \otimes 1_e + 1_E \otimes (t_1 + i t_2), \tag{94} \]
and
\[ D_t \equiv D \otimes 1_e + 1_E \otimes d \equiv \frac{d}{ds} - i (T_0 \otimes 1_e + 1_E \otimes t_0) + (T_3 \otimes 1_e + 1_E \otimes t_3). \tag{95} \]
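Up to an unimportant scalar factor $\vec n\, \mathfrak{D}_t^\dagger$ equals
\[
\begin{aligned}
\begin{pmatrix} -1 & -\bar\zeta \\ -\zeta & 1 \end{pmatrix} \mathfrak{D}_t^\dagger
={}& \begin{pmatrix} (D_t + \zeta Z^\dagger)^\dagger & (Z - \zeta D_t^\dagger)^\dagger \\ -Z + \zeta D_t^\dagger & D_t + \zeta Z^\dagger \end{pmatrix}
+ \sum_{\lambda \in \Lambda_0} \delta(s - \lambda) \begin{pmatrix} -(J + \zeta I^\dagger)^\dagger \\ I - \zeta J^\dagger \end{pmatrix} \\
&+ \sum_{e \in E} \left[ \delta(s - t(e)) \begin{pmatrix} (-B^{LR} + \zeta (B^{RL})^\dagger)^\dagger \\ -B^{RL} - \zeta (B^{LR})^\dagger \end{pmatrix}
+ \delta(s - h(e)) \begin{pmatrix} -(b^{RL} + \zeta (b^{LR})^\dagger)^\dagger \\ b^{LR} - \zeta (b^{RL})^\dagger \end{pmatrix} \right] \\
&+ \sum_{e \in E} \left[ \delta(s - t(e)) \begin{pmatrix} (-b^{LR} + \zeta (b^{RL})^\dagger)^\dagger \\ -b^{RL} - \zeta (b^{LR})^\dagger \end{pmatrix}
+ \delta(s - h(e)) \begin{pmatrix} -(B^{RL} + \zeta (B^{LR})^\dagger)^\dagger \\ B^{LR} - \zeta (B^{RL})^\dagger \end{pmatrix} \right].
\end{aligned}
\tag{96}
\]
Here the operators on the right-hand side in the first line act on $(S \otimes E \otimes e) \oplus W$, the operators in the second line act on
\[ E_L \otimes_E e_R \equiv \{ E_{h(e)} \otimes e_{t(e)} \mid e \in E \}, \tag{97} \]
and the operators in the third line act on
\[ E_R \otimes_E e_L \equiv \{ E_{t(e)} \otimes e_{h(e)} \mid e \in E \}, \tag{98} \]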
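so that $\vec n\, \mathfrak{D}_t^\dagger$ acts on the direct sum of these spaces $(S \otimes E \otimes e) \oplus W \oplus (E_L \otimes_E e_R) \oplus (E_R \otimes_E e_L)$. Let us consider spaces $A^0 = (S \otimes E \otimes e)$, $A^1 = (S \otimes E \otimes e) \oplus W \oplus (E_L \otimes e_R) \oplus (E_R \otimes e_L)$, and $A^2 = (S \otimes E \otimes e)$. The latter space consists of distributions and denotes the fact that the sections we consider have the form $f(s) + \sum_{\lambda \in \Lambda_0} \delta(s - \lambda)\, a_\lambda$. Now we can consider a Dolbeault-type family of complexes which depend on ζ holomorphically
\[ C_\zeta :\; 0 \to A^0 \xrightarrow{\;\delta_0\;} A^1 \xrightarrow{\;\delta_1\;} A^2 \to 0, \tag{99} \]
with
\[
\delta_0 : f \mapsto \begin{pmatrix}
-(D_t + \zeta Z^\dagger) f \\
(-Z + \zeta D_t^\dagger) f \\
(J + \zeta I^\dagger) f|_{\Lambda_0} \\
(B^{LR} - \zeta (B^{RL})^\dagger) f_R + (b^{RL} + \zeta (b^{LR})^\dagger) f_L \\
(b^{LR} - \zeta (b^{RL})^\dagger) f_R + (B^{RL} + \zeta (B^{LR})^\dagger) f_L
\end{pmatrix},
\tag{100}
\]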
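and
\[
\begin{aligned}
\delta_1 ={}& (-Z + \zeta D_t^\dagger,\; D_t + \zeta Z^\dagger) + \sum_{\lambda \in \Lambda_0} \delta(s - \lambda) (I_\lambda - \zeta J_\lambda^\dagger) \\
&+ \sum_{e \in E} \Big[ \delta(s - t(e)) \big( -B_e^{RL} - \zeta (B_e^{LR})^\dagger \big) + \delta(s - h(e)) \big( b_e^{LR} - \zeta (b_e^{RL})^\dagger \big) \\
&\qquad\; - \delta(s - t(e)) \big( b_e^{RL} + \zeta (b_e^{LR})^\dagger \big) + \delta(s - h(e)) \big( B_e^{LR} + \zeta (B_e^{RL})^\dagger \big) \Big],
\end{aligned}
\tag{101}
\]
i.e. for $\psi \in (S \otimes E \otimes e)$, $\chi \in W$, $v_- \in E_L \otimes_E e_R$, and $v_+ \in E_R \otimes_E e_L$ we have
\[
\begin{aligned}
\delta_1 : \begin{pmatrix} \psi \\ \chi \\ v_- \\ v_+ \end{pmatrix} \mapsto{}& (-Z + \zeta D_t^\dagger) \psi_1 + (D_t + \zeta Z^\dagger) \psi_2 + \sum_{\lambda \in \Lambda_0} \delta(s - \lambda) (I_\lambda - \zeta J_\lambda^\dagger) \chi_\lambda \\
&+ \sum_{e \in E} \Big[ \delta(s - t(e)) \big( -B_e^{RL} - \zeta (B_e^{LR})^\dagger \big) v_-^e + \delta(s - h(e)) \big( b_e^{LR} - \zeta (b_e^{RL})^\dagger \big) v_-^e \\
&\qquad\; - \delta(s - t(e)) \big( b_e^{RL} + \zeta (b_e^{LR})^\dagger \big) v_+^e + \delta(s - h(e)) \big( B_e^{LR} + \zeta (B_e^{RL})^\dagger \big) v_+^e \Big].
\end{aligned}
\tag{102}
\]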
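The exactness condition $\delta_1 \delta_0 = 0$ at $A^1$ is satisfied for all ζ if and only if
\[ \mu_L(Q, B, T) \otimes 1_e + 1_E \otimes \mu_s(b, t) = 0, \tag{103} \]
which is exactly our choice of the matching of the moment map values of Eq. (86). Basic Hodge theory becomes very powerful in this setup if one observes that Eq. (96) implies that the Dirac operator is directly related to the operator $\delta_1 - \delta_0^\dagger : A^1 \to A^0 \oplus A^2$, namely
\[ \begin{pmatrix} -1 & -\bar\zeta \\ -\zeta & 1 \end{pmatrix} \mathfrak{D}_t^\dagger = \delta_1 - \delta_0^\dagger. \tag{104} \]
This implies that $\mathfrak{D}\, \vec n^{-1} \sim \delta_1^\dagger - \delta_0$. As we argued in Sect. 6, away from the special locus $D$ the equation $\mathfrak{D} \psi = 0$ has no nonzero solutions.$^{7}$ It follows that both $\mathrm{Ker}\, \delta_0$ and $\mathrm{Ker}\, \delta_1^\dagger$ are trivial and thus $H^0(C_\zeta) = 0$ and $H^2(C_\zeta) = 0$. Now, similarly to the case of instantons on a four-torus discussed in [49], we argue that the only interesting cohomology of $C_\zeta$ can be identified with the kernel of the $\mathfrak{D}_t^\dagger$ operator: $H^1(C_\zeta) = \mathrm{Ker}\, \mathfrak{D}_t^\dagger$. It is clear that any element $\psi \in \mathrm{Ker}\, \mathfrak{D}_t^\dagger$ is in $\mathrm{Ker}\, \delta_1$, since, due to Eq. (96), $\delta_1 \psi = 0$ is the second component of the equation $\vec n\, \mathfrak{D}_t^\dagger \psi = 0$. The opposite inclusion $\mathrm{Ker}\, \delta_1 \subset \mathrm{Ker}\, \mathfrak{D}^\dagger$ follows from the following argument. For any representative $\eta \in \mathrm{Ker}\, \delta_1$ we consider $\eta + \delta_0 \rho$, which is also in $\mathrm{Ker}\, \delta_1$. We seek ρ satisfying the condition $\delta_0^\dagger (\eta + \delta_0 \rho) = 0$. Since $\mathrm{Ker}\, \delta_0 = 0$ the operator $\delta_0^\dagger \delta_0$ is invertible and we can solve for $\rho = -(\delta_0^\dagger \delta_0)^{-1} \delta_0^\dagger \eta$. The resulting representative $\eta + \delta_0 \rho$ lies in both $\mathrm{Ker}\, \delta_1$ and $\mathrm{Ker}\, \delta_0^\dagger$, hence in $\mathrm{Ker}\, \mathfrak{D}_t^\dagger$ by Eq. (104), and it defines the same cohomology class as η.

7 $\mathrm{Ker}\, \mathfrak{D}$ is trivial, since, as we argued, the operator $\mathfrak{D}^\dagger \mathfrak{D}$ is strictly positive away from the divisor $D$ and $\mathrm{Ker}\, \mathfrak{D}^\dagger \mathfrak{D} = 0$.

If one views the Nahm transform as a form of the Fourier transform, then the fact that it is an isometry is a form of the Plancherel theorem. This fact has been proved for the Nahm transform of instantons on a four-torus in [50]. For the original ADHM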
construction of instantons on $\mathbb{R}^4$ it was proved in [51], for instantons on ALE spaces in [6], for the monopoles in [52], and for doubly-periodic instantons in [53]. As we demonstrated above, the Nahm transform of Sect. 6 is a triholomorphic isomorphism. To prove that it is an isometry it remains to be shown that the holomorphic two-form is also preserved. Based on the cases listed and on the fact that in our case the Nahm transform still respects all of the complex structures, we work now under the hypothesis that the moduli space of instantons is isometric to the moduli space of the corresponding bow representation. We now turn to study these spaces.

8. Moduli Spaces of Instantons
The moduli space of a bow representation was defined so far as an infinite hyperkähler quotient, i.e. the space appeared as a hyperkähler reduction of an infinite-dimensional affine space by an infinite-dimensional group. There are various ways of writing each such moduli space as a finite hyperkähler quotient, each useful and interesting in its own right.

8.1. Moduli space of instantons as a finite hyperkähler quotient. Limiting our attention to some subinterval $I_\sigma^\alpha$ of length $l_\sigma^\alpha = \lambda_\sigma^{\alpha+1} - \lambda_\sigma^\alpha$, in any given trivialization of $E_\sigma^\alpha \to I_\sigma^\alpha$ the space of the Nahm data associated with it consists of the Nahm matrices $T_0, T_1, T_2, T_3$ of size $R_\sigma^\alpha \times R_\sigma^\alpha$ and boundary conditions (9), such that $T_0$ is regular and the $T_j$ have poles at each end of the interval forming representations of dimension $R_\sigma^\alpha - R_\sigma^{\alpha+1}$ at the right end, if this quantity is positive, and of dimension $R_\sigma^\alpha - R_\sigma^{\alpha-1}$ at the left end, if this quantity is positive. If either of these rank changes is not positive, then all $T_j$ are regular at the corresponding end. Also, in the case α = 0 or $r_\sigma$ all $T_j$ are regular at, respectively, the left or right end.
Let us denote the space of the Nahm data satisfying these conditions and solving the Nahm equations within the interval, modulo the action of the group of the gauge transformations which are trivial at the ends, by $O_{R_\sigma^\alpha}(R_\sigma^{\alpha-1}, R_\sigma^{\alpha+1}; l_\sigma^\alpha)$. These hyperkähler manifolds were extensively studied; see for example [54], where they are described as submanifolds in $T^* Gl(n, \mathbb{C})$. To mention one such space, in any given complex structure $O_n(n, n; l)$ is $T^* Gl(n, \mathbb{C}) = Gl(n, \mathbb{C}) \times gl(n, \mathbb{C})$ with the Kirillov-Kostant form as its complex symplectic form [41]. Consider the group $\mathcal{G}_0$ of gauge transformations that have trivial action at the ends of all of the subintervals
\[ \mathcal{G}_0 = \{\, g \mid g(p_{\sigma R}) = g(p_{\sigma L}) = g(\lambda) = 1 \,\}. \tag{105} \]
Performing the hyperkähler reduction by $\mathcal{G}_0$ we reduce $\mathrm{Dat} = N \oplus F_{in} \oplus F_{out} \oplus B \oplus \bar B$ to a finite-dimensional space. Reduction on each interval produces one of the spaces
\[ O_\sigma^\alpha \equiv O_{R_\sigma^\alpha}(R_\sigma^{\alpha-1}, R_\sigma^{\alpha+1}; l_\sigma^\alpha), \quad \text{for } \alpha = 1, 2, \ldots, r_\sigma - 1, \tag{106} \]
\[ O_\sigma^0 \equiv O_{R_\sigma^0}(R_\sigma^0, R_\sigma^1; l_\sigma^0), \tag{107} \]
\[ O_\sigma^{r_\sigma} \equiv O_{R_\sigma^{r_\sigma}}(R_\sigma^{r_\sigma - 1}, R_\sigma^{r_\sigma}; l_\sigma^{r_\sigma}). \tag{108} \]
The Nahm data is the only component of Dat that transforms under $\mathcal{G}_0$ and
\[ N /\!/\!/ \mathcal{G}_0 = \bigoplus_{\sigma \in I} \bigoplus_{\alpha=0}^{r_\sigma} O_\sigma^\alpha. \tag{109} \]
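It is clear that the group $\mathcal{G}/\mathcal{G}_0$ is the finite-dimensional group of gauge transformations at the ends of the intervals and at the λ-points. Thus the total moduli space of a bow representation R is the finite hyperkähler quotient
\[ M_R = \Big( F_{in} \oplus F_{out} \oplus B \oplus \bar B \oplus \bigoplus_{\sigma,\alpha} O_\sigma^\alpha \Big) \Big/\!\!\Big/\!\!\Big/ \Big( \prod_{\sigma,\alpha} U_\sigma^\alpha \times U_L \times U_R \Big), \tag{110} \]
where $U_L = \prod_\sigma U(R_\sigma^0)$, $U_R = \prod_\sigma U(R_\sigma^{r_\sigma})$, and $U_\sigma^\alpha = U(\min\{R_\sigma^\alpha, R_\sigma^{\alpha-1}\})$ for $0 < \alpha < r_\sigma$. The structure of the spaces $O_\sigma^\alpha$ is quite interesting and they provide a useful realization of the moduli space of the bow representation. Nevertheless, this is not the realization we find most useful in what follows.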
8.2. A different finite quotient realization. Let us now focus on the moduli space of a representation of an A-type bow. Our results will generalize easily from this to the general case. Following Bielawski [55] and [56] we use the following spaces: $F_n(m, c) = O_n(m, n; c)$. A space $F_n(m, c)$ is the space of gauge equivalence classes of the file n Nahm data on an interval of length c that are regular on one end and have an $(n - m) \times (n - m)$ block with a pole at the other end with residues forming an irreducible representation of su(2). On the other hand the general $O_n(m_1, m_2; l)$ space can be represented in terms of F-spaces as $O_n(m_1, m_2; l) = F_n(m_1, c) \times F_n(m_2, l - c) /\!/\!/ U(n)$ for any positive $c < l$.

Let us choose a point $pt_\sigma^\alpha$ in each interval $I_\sigma^\alpha$, so for any σ and α = 0, ..., $r_\sigma$ we have $pt_\sigma^\alpha \in I_\sigma^\alpha$. Each interval $I_\sigma^\alpha$ is divided into two parts by $pt_\sigma^\alpha$. We denote the length of the left part by $c_\sigma^\alpha$, so $0 \le c_\sigma^\alpha \le l_\sigma^\alpha$, and the length of the right part by $l_\sigma^\alpha - c_\sigma^\alpha$. Now we can write the moduli space $M_R$ as a finite quotient in a different way. Let $F_{m,m'}(c, c') = F_n(m, c) \times F_n(m', c') /\!/\!/ U(\min\{m, m'\})$ for $m \ne m'$ and let $F_{m,m}(c, c') = F_m(m, c) \times F_m(m, c') \times \mathbb{C}^{2m} /\!/\!/ U(m)$. We define these spaces as in [56]. We need one more ingredient associated to each edge of the bow. This is the space $G_{m,m'}(c, c') = F_m(m, c) \times F_{m'}(m', c') \times \mathbb{C}^{2mm'} /\!/\!/ U(m) \times U(m')$. Let $\mathcal{G}^c$ be the subgroup of $\mathcal{G}$ consisting of the gauge transformations that are identity at the chosen points $pt_\sigma^\alpha$:
\[ \mathcal{G}^c = \{\, g \mid g(pt_\sigma^\alpha) = 1 \,\}. \tag{111} \]
Then $\mathcal{G}/\mathcal{G}^c = \prod_{\sigma,\alpha} U(R_\sigma^\alpha)$. For each interval consider the space
\[ \mathrm{Int}_\sigma = \prod_{\alpha=1}^{r_\sigma} F_{R_\sigma^{\alpha-1}, R_\sigma^\alpha}\big( l_\sigma^{\alpha-1} - c_\sigma^{\alpha-1},\, c_\sigma^\alpha \big) \Big/\!\!\Big/\!\!\Big/ \prod_{\alpha=1}^{r_\sigma - 1} U(R_\sigma^\alpha). \tag{112} \]
Now
\[ M_R = \prod_\sigma \mathrm{Int}_\sigma \times \prod_e G_{R_{t(e)}^{r_{t(e)}}, R_{h(e)}^0}\big( l_{t(e)}^{r_{t(e)}} - c_{t(e)}^{r_{t(e)}},\, c_{h(e)}^0 \big) \Big/\!\!\Big/\!\!\Big/ \prod_e U(R_{t(e)}^{r_{t(e)}}) \times U(R_{h(e)}^0). \tag{113} \]
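Of course, one can choose $pt_\sigma^0 = p_{\sigma L}$ and all $pt_\sigma^{r_\sigma} = p_{\sigma R}$ for every σ. In this case $c_\sigma^0 = 0$ and $c_\sigma^{r_\sigma} = l_\sigma^{r_\sigma}$ so that
\[ G_{R_{t(e)}^{r_{t(e)}}, R_{h(e)}^0}\big( l_{t(e)}^{r_{t(e)}} - c_{t(e)}^{r_{t(e)}},\, c_{h(e)}^0 \big) = \mathbb{C}^{2 R_{h(e)}^0 R_{t(e)}^{r_{t(e)}}}, \tag{114} \]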
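and we quotient the product of $F_{m,m'}$, $F_m$, and $\mathbb{C}^{2mm'}$ types of spaces only. For any regular representation of a general bow, not necessarily of A-type, we have the following formula:
\[ M_R = \prod_{\sigma \in I} \mathrm{Int}_\sigma \times \prod_{e \in E} \mathbb{C}^{2 R_{h(e)}^0 R_{t(e)}^{r_{t(e)}}} \Big/\!\!\Big/\!\!\Big/ \prod_{\sigma \in I} U(R_\sigma^0) \times U(R_\sigma^{r_\sigma}). \tag{115} \]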
To demonstrate the usefulness of this realization let us compute the dimension of the moduli spaces of a representation of an A-type bow, if this moduli space is not empty. For $m < n$ the space $F_n(m, c)$ is biholomorphic to $Gl(n, \mathbb{C}) \times gl(m, \mathbb{C}) \times \mathbb{C}^{n+m}$ [57,54] and
\[ \dim_{\mathbb{C}} F_n(m, c) = n(n + 1) + m(m + 1), \tag{116} \]
while for $m = n$ the space $F_n(n, c)$ is biholomorphic to $Gl(n, \mathbb{C}) \times gl(n, \mathbb{C})$ and
\[ \dim_{\mathbb{C}} F_n(n, c) = 2n^2. \tag{117} \]
For any $m$ and $m'$,
\[ \dim_{\mathbb{C}} F_{m,m'}(c, c') = m(m + 1) + m'(m' + 1). \tag{118} \]
Since for $m = m'$ we have $F_{m,m}(c, c') = F_m(m, c) \times F_m(m, c') \times \mathbb{C}^{2m} /\!/\!/ U(m)$, this gives $\dim_{\mathbb{C}} F_{m,m}(c, c') = 2m^2 + 2m^2 + 2m - 2m^2 = 2m(m + 1)$. For the case $m < m'$ we have instead $F_{m,m'}(c, c') = F_m(m, c) \times F_{m'}(m, c') /\!/\!/ U(m)$, which gives $\dim_{\mathbb{C}} F_{m,m'}(c, c') = 2m^2 + m(m + 1) + m'(m' + 1) - 2m^2 = m(m + 1) + m'(m' + 1)$. The case $m > m'$ is completely analogous, leading to the same answer. The dimensions of the spaces $G_{m,m'}(c, c') = F_m(m, c) \times F_{m'}(m', c') \times \mathbb{C}^{2mm'} /\!/\!/ U(m) \times U(m')$ associated to the edges are $\dim_{\mathbb{C}} G_{m,m'}(c, c') = 2m^2 + 2(m')^2 + 2mm' - 2m^2 - 2(m')^2$, giving
\[ \dim_{\mathbb{C}} G_{m,m'}(c, c') = 2mm'. \tag{119} \]
Consider some interval $I_\sigma$. In order to avoid cumbersome notation we choose to suppress the σ subscript on all relevant quantities. Since the internal space associated to $I_\sigma$ is $\mathrm{Int}_\sigma = F_{R^0,R^1}(l^0 - c^0, c^1) \times F_{R^1,R^2}(l^1 - c^1, c^2) \times \cdots \times F_{R^{r-1},R^r}(l^{r-1} - c^{r-1}, c^r) /\!/\!/ U(R^1) \times U(R^2) \times \cdots \times U(R^{r-1})$, its dimension is
\[ \dim_{\mathbb{C}} \mathrm{Int} = R^0(R^0 + 1) + R^r(R^r + 1) + \sum_{\alpha=1}^{r-1} \Big( 2R^\alpha(R^\alpha + 1) - 2(R^\alpha)^2 \Big) \tag{120} \]
\[ = R^0(R^0 + 1) + R^r(R^r + 1) + \sum_{\alpha=1}^{r-1} 2R^\alpha. \tag{121} \]
For an edge $e \in E$ let $\Delta R_e = |R_{t(e)} - R_{h(e)}|$. Assembling the internal spaces $\mathrm{Int}_\sigma$ with the spaces $G_{R_{t(e)}, R_{h(e)}}$ associated with the edges we find the dimension
\[ \dim_{\mathbb{C}} M_R = 2 \sum_\sigma \sum_{\alpha=1}^{r_\sigma - 1} R_\sigma^\alpha + 2 \sum_{e \in E} \min\{ R_{t(e)}^{r_{t(e)}}, R_{h(e)}^0 \} - \sum_{e \in E} \Delta R_e (\Delta R_e - 1). \tag{122} \]
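The bookkeeping leading to Eq. (122) can be checked mechanically. The Python sketch below is an illustration under stated assumptions rather than part of the argument: it assumes an A-type bow whose intervals are arranged cyclically, with each edge leaving the right end of interval σ and entering the left end of interval σ + 1, and it assumes every interval carries at least one λ-point ($r_\sigma \ge 1$). It assembles the dimension from Eqs. (115) and (121), subtracting $2n^2$ for each $U(n)$ factor in the hyperkähler quotient, and compares the result with the closed form (122). The function names and the list-of-lists encoding of the ranks are hypothetical.

```python
def dim_int(R):
    """dim_C Int_sigma from Eq. (121); R = [R^0, ..., R^r] with r >= 1."""
    if len(R) < 2:
        raise ValueError("this sketch assumes at least one lambda-point per interval")
    return R[0] * (R[0] + 1) + R[-1] * (R[-1] + 1) + 2 * sum(R[1:-1])

def dim_assembled(ranks):
    """Dimension obtained by assembling Eq. (115) piece by piece."""
    total = 0
    n = len(ranks)
    for s, R in enumerate(ranks):
        head = ranks[(s + 1) % n]              # interval at the head of the edge leaving s
        total += dim_int(R)                    # internal space Int_sigma
        total += 2 * R[-1] * head[0]           # bifundamental factor C^{2 R_h R_t}
        total -= 2 * (R[0] ** 2 + R[-1] ** 2)  # quotient by U(R^0) x U(R^r)
    return total

def dim_closed_form(ranks):
    """Closed form of Eq. (122)."""
    total = 0
    n = len(ranks)
    for s, R in enumerate(ranks):
        head = ranks[(s + 1) % n]
        delta = abs(R[-1] - head[0])           # Delta R_e for the edge leaving interval s
        total += 2 * sum(R[1:-1]) + 2 * min(R[-1], head[0]) - delta * (delta - 1)
    return total

if __name__ == "__main__":
    # Taub-NUT bow with two lambda-points and all ranks equal to one (cf. Fig. 5).
    print(dim_assembled([[1, 1, 1]]), dim_closed_form([[1, 1, 1]]))
    # A two-interval bow with an unbalanced edge.
    print(dim_assembled([[2, 3, 1], [1, 4]]), dim_closed_form([[2, 3, 1], [1, 4]]))
```

For the first sample both counts give complex dimension 4, i.e. real dimension 8.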
9. Asymptotic Metric on the Moduli Space of Instantons on a Multi-Taub-NUT Space
9.1. Balanced bifundamentals. Let us first consider an $A_{k-1}$ bow representation, such that for each edge the rank of the fiber at its head equals the rank of the fiber at its tail. We call such a representation a balanced representation. All other moduli spaces can be reduced to these. For the $A_{k-1}$ diagram this condition means that for any σ = 1, ..., k the ranks of $E_\sigma^{r_\sigma}$ and $E_{\sigma+1}^0$ are equal: $R_\sigma^{r_\sigma} = R_{\sigma+1}^0$ and $R_k^{r_k} = R_1^0$.

Until now we were labeling the subintervals by pairs (σ, α) with σ = 1, ..., k and α = 0, 1, ..., $r_\sigma$. For our purposes here it is convenient to label the subintervals between the λ-points by τ = 1, ..., n in the following manner. Consider a map (σ, α) → τ given by
\[ \tau = \sum_{\rho=1}^{\sigma-1} r_\rho + \alpha. \tag{123} \]
This map is invertible if $\alpha \ne 0$ and $\alpha \ne r_\sigma$, and it maps the pairs $(\sigma, r_\sigma)$ and $(\sigma + 1, 0)$ to the same value of τ. When invertible, the inverse map is given by
\[ \sigma(\tau) = \min\Big\{ \rho \;\Big|\; \tau \le \sum_{\rho'=0}^{\rho} r_{\rho'} \Big\}, \qquad \alpha(\tau) = \tau - \sum_{\rho=0}^{\sigma-1} r_\rho. \tag{124} \]
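The relabeling of Eqs. (123) and (124) is simple index arithmetic, and a short Python sketch may make it concrete. It is only an illustration: the function names are hypothetical, σ is taken to run from 1 as in Eq. (123), the list r stores $r_\sigma$ at position σ − 1, and $r_0$ is taken to be zero so that the sums in Eq. (124) agree with the sum in Eq. (123).

```python
def tau_of(sigma, alpha, r):
    """Eq. (123): tau = r_1 + ... + r_{sigma-1} + alpha, with r[sigma - 1] = r_sigma."""
    return sum(r[:sigma - 1]) + alpha

def sigma_alpha_of(tau, r):
    """Eq. (124): the smallest sigma whose cumulative count reaches tau, and the offset alpha."""
    running = 0
    for sigma, r_sigma in enumerate(r, start=1):
        if tau <= running + r_sigma:
            return sigma, tau - running
        running += r_sigma
    raise ValueError("tau exceeds the total number of lambda-points")

if __name__ == "__main__":
    r = [2, 3, 1]                        # hypothetical numbers of lambda-points, n = 6
    print(tau_of(2, 2, r))               # label of the pair (sigma, alpha) = (2, 2)
    print(sigma_alpha_of(4, r))          # recovers the pair (2, 2)
    print(tau_of(1, 2, r), tau_of(2, 0, r))  # (sigma, r_sigma) and (sigma + 1, 0) share a label
```

Because the pairs $(\sigma, r_\sigma)$ and $(\sigma + 1, 0)$ share a value of τ, the inverse in this sketch returns the first of the two, in line with the minimum in Eq. (124).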
Let $l_\tau$ be the length of the corresponding subinterval $I_\sigma^\alpha$ if $\alpha \ne 0, r_\sigma$, and let it be the sum of the lengths of $I_\sigma^{r_\sigma}$ and $I_{\sigma+1}^0$ otherwise.

Let us define integers $M_\sigma^\gamma = R_\sigma^\gamma$ with γ = 0, 1, ..., $r_\sigma - 1$. These integers are the ranks of the bundles on all subintervals except for the rightmost subintervals $I_\sigma^{r_\sigma}$. Let $M_\sigma = \sum_{\gamma=0}^{r_\sigma - 1} M_\sigma^\gamma$ and $M = \sum_\sigma M_\sigma$. We also let $l_\sigma^\beta$ denote the length of the subinterval $I_\sigma^\beta$ for β = 1, 2, ..., $r_\sigma - 1$ and $l_\sigma^0$ be the sum of the lengths of $I_{\sigma-1}^{r_{\sigma-1}}$ and $I_\sigma^0$. The moduli space $\mathcal{M}$ of this bow representation has quaternionic dimension M.

We now construct a certain space $M_{ap}$ associated to this bow representation and an incomplete hyperkähler metric $g_{ap}$ on it that serves as an exponentially good approximation to the moduli space metric in some asymptotic regions. We begin by assigning $M_\sigma^\gamma$ points in $\mathbb{R}^3$ to each subinterval $I_\sigma^\gamma$ with γ = 0, ..., $r_\sigma - 1$ (the rightmost subintervals excluded). This is exactly the number equal to the rank of the corresponding vector bundle. Each point has coordinate vector $\vec x_j$ and an associated phase $\tau_j \sim \tau_j + 4\pi$. We shall treat points belonging to the same subinterval as indistinguishable, thus we will have to divide by the corresponding direct product of symmetric groups $S = \prod_\sigma \prod_{\gamma=0}^{r_\sigma - 1} S_{M_\sigma^\gamma}$ to obtain $M_{ap}$.

We have a total of M points numbered by j = 1, ..., M and we introduce functions f(j) and t(j) such that the point i is associated with the subinterval $I_\sigma^\gamma$ if and only if σ = f(i) and γ = t(i). Let us refer to f(i) as the flavor of a point i and t(i) as the type of a point i. It is also convenient to introduce $g(i) = t(i) + \sum_{\sigma=0}^{f(i)-1} M_\sigma$. These functions can be defined by
\[ f(i) = \min\Big\{ \rho \;\Big|\; i \le \sum_{\sigma \le \rho} M_\sigma \Big\}, \tag{125} \]
\[ t(i) = \min\Big\{ b \,\Big|\, i \le \sum_{\sigma < f(i)} M_\sigma + \sum_{c \le b} M_{f(i)}^{c} \Big\}, \tag{126} \]
\[ g(i) = \min\Big\{ b \,\Big|\, i \le \sum_{c \le b} M_{f(i)}^{c} \Big\}. \tag{127} \]
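Let $l(i)$ be the length of the associated interval or, if $t(i) = r_{f(i)}$, the sum of the lengths of the associated interval $I_\sigma^0$ and of $I_{\sigma-1}^{r_{\sigma-1}}$, i.e. $l(i) = l_{f(i)}^{t(i)} + \delta_{0,t(i)}\, l_{f(i)-1}^{r_{f(i)-1}}$. We now introduce the hyperkähler metric
\[ g_{ap} = \Phi_{ij}\, d\vec{x}_i \cdot d\vec{x}_j + (\Phi^{-1})_{ij} (d\tau_i + \omega_i)(d\tau_j + \omega_j), \tag{128} \]
with
\[ \Phi_{ij} = \begin{cases} l(i) + \dfrac{\delta_{0,t(i)}}{|\vec{x}_i + \vec{\nu}_{f(i)}|} + \displaystyle\sum_{k \neq i} \dfrac{s_{ik}}{|\vec{x}_i - \vec{x}_k|} & \text{if } i = j, \\[2ex] -\dfrac{s_{ij}}{|\vec{x}_i - \vec{x}_j|} & \text{if } i \neq j, \end{cases} \tag{129} \]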
where $s_{ij}$ is given by the affine Cartan matrix $C_{\alpha\beta}$ of the instanton gauge group,
\[ s_{ij} = -C_{g(i),g(j)} = \begin{cases} -2 & \text{if } f(i) = f(j) \text{ and } t(i) = t(j), \\ 1 & \text{if } f(i) = f(j) \text{ and } |t(i) - t(j)| = 1, \\ 1 & \text{if } f(j) = f(i) + 1,\ t(j) = 0,\ t(i) = M_{f(i)-1}, \\ 1 & \text{if } f(i) = f(j) + 1,\ t(i) = 0,\ t(j) = M_{f(j)-1}, \\ 0 & \text{otherwise.} \end{cases} \tag{130} \]
We understand (128) as a hyperkähler metric on the space $\tilde{\mathcal{M}}$ of real dimension $4M$ fibered by $M$-dimensional tori $T^M$ over the configuration space of $M$ points in $\mathbb{R}^3$. This metric clearly has $M$ triholomorphic isometries acting on the torus. The product $S$ of symmetric groups acts isometrically on $\tilde{\mathcal{M}}$ by permuting the points of the same type. We define $\mathcal{M}_{ap}$ as the quotient of $\tilde{\mathcal{M}}$ by this group: $\mathcal{M}_{ap} \equiv \tilde{\mathcal{M}}/S$.

Let $D \subset \mathcal{M}_{ap}$ be the set of points with $\vec{x}_i = \vec{x}_j$ for any $i$ and $j$ with $g(i) = g(j)$. Then there is a set $D' \subset \mathcal{M}$ and a bijection $\phi : \mathcal{M} \setminus D' \to \mathcal{M}_{ap} \setminus D$. Thus $\vec{x}_j, \tau_j$, up to the action of $S$, can serve as local coordinates on $\mathcal{M} \setminus D'$. Moreover, the metric $g$ on $\mathcal{M}$ is exponentially close to $g_{ap}$ of Eq. (128) in the following sense. If $\mathrm{Rad} = \min\{ |\vec{x}_i - \vec{x}_j| \,:\, g(i) = g(j) \}$ is the minimal distance between the points of the same kind, then in the region of large $\mathrm{Rad}$ the metric $g$ on $\mathcal{M}$ approaches the metric $g_{ap}$ exponentially fast, i.e. $|g - g_{ap}| < \exp(-C\,\mathrm{Rad})$. In order to prove this statement one can employ theorems and techniques developed by Bielawski in [55] and [56]. This proof will appear in a forthcoming publication [58].

9.2. General case. Here we consider the case of general rank, i.e. we impose no relation on the ranks of $E_\sigma^{r_\sigma}$ and $E_{\sigma+1}^{0}$. Let the difference of these two ranks be denoted by $\Delta R_\sigma = R_{\sigma+1}^{0} - R_\sigma^{r_\sigma}$, and let us introduce the quantities $c_{\sigma\tau}$ by
\[ c_{\sigma\tau} = \sum_{\gamma = z(\sigma)}^{z(\sigma) + \Delta R_\sigma} |\Delta R_\sigma + z(\sigma) - \gamma|\, \delta\big((\gamma - \tau) \bmod n\big). \tag{131} \]
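Given the ranks
\[ R_\tau = \begin{cases} R_{\sigma(\tau)}^{\beta(\tau)} & \text{if } \beta(\tau) \neq 0, \\ \min\big\{ R_{\sigma(\tau)}^{0},\, R_{\sigma(\tau)-1}^{r_{\sigma(\tau)-1}} \big\} & \text{if } \beta(\tau) = 0, \end{cases} \tag{132} \]
we let
\[ M_\tau = R_\tau - \sum_\sigma c_{\sigma\tau}. \tag{133} \]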
If any $M_\tau$ is negative then the moduli space of this representation is empty. If all $M_\tau$ are nonnegative then the moduli space has real dimension $4\sum_{\tau=1}^{n} M_\tau$ and the metric on it is given by Eqs. (128) and (129).

10. Conclusions

For every bow representation we define bow solutions. The main result of this work is the formulation of the ADHM-Nahm transform that, for any pair of bow representations $L$ and $s$ and a given solution of $L$, constructs a bundle with a connection on the moduli space $\mathcal{M}_s$ of $s$. We prove that in any of the complex structures on $\mathcal{M}_s$ the corresponding holomorphic bundle is flat, thus the curvature of this connection is of type (1, 1) in any of the complex structures. Whenever $\mathcal{M}_s$ is four-dimensional this implies that the curvature is self-dual.$^8$

Our main goal is to construct instantons on ALF spaces. We realize each of these spaces as a moduli space $\mathcal{M}_s$ of a concrete bow representation. Thus, for any solution of any other representation $L$ of the same bow, using the ADHM-Nahm transform we have formulated, one obtains an instanton solution on the ALF space $\mathcal{M}_s$.

The relation of the quiver moduli spaces to representation theory is most intriguing. This relation remains to be explored for the bow moduli spaces. In particular, our realization of the moduli space of a bow as a finite hyperkähler quotient leads to a natural generalization of the notion of a quiver representation. Certain bow representations, namely those that are reciprocal to the balanced ones, produce all quiver representations in the limit of zero interval size. In this case a quiver has some vector spaces assigned to its vertices. By exploring the limit of a general bow representation one would obtain a new notion of a quiver representation. Now the objects assigned to a quiver vertex would be not only vector spaces. We expect these objects to be flag manifolds.

We did not treat a number of important analytic questions in this manuscript.
Among these are the proofs of finiteness of the action of the self-dual connection provided by the bow construction, the analysis of the smoothness of the resulting connection, and the isometry between the moduli space of self-dual connections and the moduli space of the large bow representation. We imagine that the analytic techniques developed for treating similar questions in the case of calorons and instantons on ALE spaces extend to the current setup; however, this requires careful further analysis.

If one is motivated to study the moduli spaces of instantons or the bow varieties by their relations to representation theory or by quantum gauge theories, then the question of the $L^2$ cohomology of these spaces becomes important. For a general study of instantons on ALF spaces the compactification of Hausel-Hunsicker-Mazzeo [59] of the base space proved to be very useful. We expect it to play a role in the study of the $L^2$ cohomology of the moduli spaces of instantons on these spaces as well.

We have defined the notions of a bow, its representation, and the moduli space of a representation. There is an obvious limit of a bow in which it degenerates into a quiver. In this limit a representation for which the bundle rank does not change at any of the $\lambda$-points produces a representation of the corresponding quiver. For such representations, in any given complex structure the moduli space of the bow and the moduli space of the corresponding quiver are isomorphic as complex varieties. We emphasize, however, that as differential manifolds they are different. In particular, the $L^2$ cohomology of the bow representation is typically larger than that of the corresponding quiver. Other bow representations have interesting moduli spaces which do not degenerate to the conventional quiver varieties in the quiver limit.

$^8$ This curvature is anti-self-dual in the conventional orientation chosen in the mathematics literature. See the footnote in Subsect. 1.2.

Acknowledgements. It is our pleasure to thank Edward Witten and Hiraku Nakajima for useful conversations and Juan Maldacena for his hospitality during our visits to the IAS, Princeton. We are grateful to the Institut des Hautes Études Scientifiques, Bures-sur-Yvette for hospitality during the completion of this work. This work was supported in part by Science Foundation Ireland Grant No. 06/RFP/MAT050.
References
1. Hitchin, N.J.: Polygons and Gravitons. Math. Proc. Cambridge Phil. Soc. 85, 465 (1979)
2. Hawking, S.W.: Gravitational Instantons. Phys. Lett. A 60, 81 (1977)
3. Gibbons, G.W., Hawking, S.W.: Gravitational Multi-Instantons. Phys. Lett. B 78, 430 (1978)
4. Cherkis, S.A., Kapustin, A.: Singular Monopoles and Gravitational Instantons. Commun. Math. Phys. 203, 713 (1999)
5. Cherkis, S.A., Hitchin, N.J.: Gravitational Instantons of Type $D_k$. Commun. Math. Phys. 260, 299 (2005)
6. Kronheimer, P.B., Nakajima, H.: Yang-Mills Instantons on ALE Gravitational Instantons. Math. Ann. 288(2), 263–307 (1990)
7. Losev, A., Moore, G.W., Nekrasov, N., et al.: Four-dimensional Avatars of Two-dimensional RCFT. Nucl. Phys. Proc. Suppl. 46, 130–145 (1996)
8. Nekrasov, N., Schwarz, A.S.: Instantons on Noncommutative $\mathbb{R}^4$ and (2,0) Superconformal Six-dimensional Theory. Commun. Math. Phys. 198, 689–703 (1998)
9. Nakajima, H.: Instantons on ALE spaces, Quiver Varieties, and Kac-Moody Algebras. Duke Math. J. 76, 2 (1994)
10. Nakajima, H.: Instantons and Affine Lie Algebras. Nucl. Phys. Proc. Suppl. 46, 154 (1996)
11. Lusztig, G.: On Quiver Varieties. Adv. in Math. 136, 141–182 (1998)
12. Braverman, A., Finkelberg, M.: Pursuing the Double Affine Grassmannian I: Transversal Slices via Instantons on $A_k$ Singularities. Duke Math. J. 152(2), 175–206 (2010)
13. Licata, A.: Framed Rank r Torsion-free Sheaves on $\mathbb{CP}^2$ and Representations of the Affine Lie Algebra $\widehat{gl}(r)$. http://arxiv.org/abs/math/0607690v1 [math.RT], 2006
14. Nakajima, H.: Quiver Varieties and Branching. SIGMA 5, 3 (2009)
15. Dijkgraaf, R., Hollands, L., Sulkowski, P., Vafa, C.: Supersymmetric Gauge Theories, Intersecting Branes and Free Fermions. JHEP 0802, 106 (2008)
16. Tan, M.-C.: Five-Branes in M-Theory and a Two-Dimensional Geometric Langlands Duality. Adv. Theor. Math. Phys. 14, 179–224 (2010)
17. Witten, E.: Geometric Langlands from Six Dimensions. http://arxiv.org/abs/0905.2720v1 [hep-th], 2009
18. Cherkis, S.A.: Instantons on the Taub-NUT Space. Adv. Theor. Math. Phys. 14(2), 609–642 (2010)
19. Atiyah, M.F., Hitchin, N.J., Drinfeld, V.G., Manin, Yu.I.: Construction of Instantons. Phys. Lett. A 65, 185 (1978)
20. Nahm, W.: A Simple Formalism for the BPS Monopole. Phys. Lett. B 90, 413 (1980); Nahm, W.: All Self-dual Multimonopoles for Arbitrary Gauge Group. CERN-TH.3172 (1981); KEK entry; Nahm, W.: Selfdual Monopoles and Calorons. BONN-HE-83-16 SPIRES Talk Presented at 12th Colloq. on Group Theoretical Methods in Physics, Trieste, Italy, Sep 5–10, 1983; Nahm, W.: Self-dual Monopoles and Calorons. In: Lecture Notes in Physics 201, New York: Springer, 1984, pp. 189–200
21. Witten, E.: Sigma Models and the ADHM Construction of Instantons. J. Geom. Phys. 15, 215 (1995)
22. Douglas, M.R.: Gauge Fields and D-branes. J. Geom. Phys. 28, 255 (1998)
23. Douglas, M.R. Moore, G.W.: D-branes, Quivers, and ALE Instantons. http://arxiv.org/abs/hep-th/ 9603167v1, 1996 24. Johnson, C.V., Myers, R.C.: Aspects of Type IIB Theory on ALE Spaces. Phys. Rev. D 55, 6382 (1997) 25. Diaconescu, D.E.: D-branes, Monopoles and Nahm Equations. Nucl. Phys. B 503, 220 (1997) 26. Korepin, V.E., Shatashvili, S.L.: Rational Parametrization of the Three Instanton Solutions of the Yang-Mills Equations. Sov. Phys. Dokl. 28, 1018 (1983) 27. Bianchi, M., Fucito, F., Rossi, G, Martellini, M.: On the ADHM Construction on ALE Gravitational Backgrounds. Phys. Lett. B 359, 49 (1995); Bianchi, M., Fucito, F., Rossi, G., Martellini, M.: Explicit Construction of Yang-Mills Instantons on ALE Spaces. Nucl. Phys. B 473, 367 (1996) 28. Kraan, T.C., van Baal, P.: New Instanton Solutions at Finite Temperature. Nucl. Phys. A 642, 299 (1998); Kraan, T.C., van Baal, P.: Periodic Instantons with Non-trivial Holonomy. Nucl. Phys. B 533, 627 (1998); Kraan, T.C., van Baal, P.: Exact T-duality between Calorons and Taub - NUT Spaces. Phys. Lett. B 428, 268 (1998) 29. Lee, K.M., Lu, C.h.: SU(2) Calorons and Magnetic Monopoles. Phys. Rev. D 58, 025011 (1998) 30. Etesi, G., Hausel, T.: Geometric Construction of New Taub-NUT Instantons. Phys. Lett. B 514, 189 (2001) 31. Etesi, G., Hausel, T.: New Yang-Mills instantons on multicentered gravitational instantons. Commun. Math. Phys. 235, 275–288 (2003) 32. Etesi, G., Szabo, S.: Harmonic Functions and Instanton Moduli Spaces on the Multi-Taub–NUT Space. Commun. Math. Phys. 301, 175–214 (2011) 33. Cherkis, S.A.: Moduli Spaces of Instantons on the Taub-NUT Space. Commun. Math. Phys. 290, 719 (2009) 34. Minerbe, V.: On some Asymptotically Flat Manifolds with Non-maximal Volume Growth. http://arxiv. org/abs/0709.1084v1 [math.dg], 2007 35. Taub, A.H.: Empty Space-Times Admitting a Three Parameter Group of Motions. Ann. Math. 
53(3), 472–490 (1951); Newman, E., Tamburino, L., Unti, T.: Empty-Space Generalization of the Schwarzschild Metric. J. Math. Phys. 4, 915 (1963); Hawking, S.W.: Gravitational Instantons. Phys. Lett. A60, 81 (1977) 36. Cherkis, S.A., Kapustin, A.: Dk Gravitational Instantons and Nahm Equations. Adv. Theor. Math. Phys. 2(6), 1287 (1999) 37. Atiyah, M.F., Hitchin, N.J.: The Geometry and Dynamics of Magnetic Monopoles. M.B. Porter Lectures. Princeton, NJ: Princeton Univ. Press, 1988 38. Dancer, A.S.: A Family of Hyperkahler Manifolds. Quart. J. Math. 45(4), 463–478 (1994) 39. Hurtubise, J., Murray, M.K.: Monopoles and their Spectral Data. Commun. Math. Phys. 133, 487 (1990) 40. Hitchin, N.J., Karlhede, A., Lindström, U., Roˇcek, M.: Hyperkähler Metrics and Supersymmetry. Commun. Math. Phys. 108, 535 (1987) 41. Kronheimer, P.B.: A Hyperkähler Structure on the Cotangent Bundle of a Complete Lie Group. MSRI preprint (1988) available at http://arxiv.org/abs/math/090925302 [math.DE], 2009 42. Donaldson, S.K.: Nahm’s Equations and the Classification of Monopoles. Commun. Math. Phys. 96, 387 (1984) 43. Hitchin, N.: The Dirac Operator. In: Bridson, M.R., Salamon, S. (eds.) Invitations to Geometry and Topology. Oxford: Oxford University Press, 2002 44. Gibbons, G.W., Rychenkova, P.: HyperKaehler Quotient Construction of BPS Monopole Moduli Spaces. Commun. Math. Phys. 186, 585 (1997) 45. Kronheimer, P.B.: The Construction of ALE Spaces as HyperKähler Quotients. J. Diff. Geom. 29, 665 (1989) 46. Dancer, A.S.: Dihedral singularities and gravitational instantons. J. Geom. Phys. 12, 77 (1993) 47. Witten, E.: Branes, Instantons, and Taub-NUT Spaces. JHEP 0906, 067 (2009) 48. Kronheimer, P.B.: Monopoles and Taub-NUT Metrics. M. Sc. Thesis, Oxford, 1985 49. Donaldson, S.K., Kronheimer, P.B.: The Geomerty of Four-Manifolds. Oxford: Oxford Geometry University Press, 1990 50. Braam, P.J., van Baal, P.: Nahm’s Transformation for Instantons. Commun. Math. Phys. 122, 267 (1989) 51. 
Maciocia, A.: Metrics on the Moduli Spaces of Instantons over Euclidean Four Space. Commun. Math. Phys. 135, 467 (1991) 52. Nakajima, H.: Monopoles and Nahm’s equations. SPIRES entry. In: Sanda 1990, Proceedings, Einstein metrics and Yang-Mills connections. T. Mabuchi, S. Mukai, eds., Lect. Notes pure and Appl. Math. 145, New York: Marcel Dekker, 1993, pp.193–211 53. Biquard, O., Jardim, M.: Asymptotic Behaviour and the Moduli Space of Doubly-periodic Instantons. J. Europ. Math. Soc. 3(4), 335–375 (2001) 54. Bielawski, R.: Hyperkähler Structures and Group Actions. J. London Math. Soc. 55, 400–414 (1997) 55. Bielawski, R.: Asymptotic Metrics for SU(N)-monopoles with Maximal Symmetry Breaking. Commun. Math. Phys. 199, 297–325 (1998) 56. Bielawski, R.: Monopoles and the Gibbons-Manton Metric. Commun. Math. Phys. 194, 297–321 (1998)
57. Hurtubise, J.: The Classification of Monopoles for the Classical Groups. Commun. Math. Phys. 120, 613 (1989) 58. Bielawski, R., Cherkis, S.A.: In preparation 59. Hausel, T., Hunsicker, E., Mazzeo, R.: Hodge Cohomology of Gravitational Instantons. Duke Math. J. 122(3), 485–548 (2004) Communicated by N.A. Nekrasov
Commun. Math. Phys. 306, 485–509 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1287-9
Communications in
Mathematical Physics
Orthogonal Polynomials with Recursion Coefficients of Generalized Bounded Variation Milivoje Lukic Mathematics 253-37, California Institute of Technology, Pasadena, CA 91125, USA. E-mail:
[email protected] Received: 23 August 2010 / Accepted: 3 January 2011 Published online: 29 June 2011 – © Springer-Verlag 2011
Abstract: We consider probability measures on the real line or unit circle with Jacobi or Verblunsky coefficients satisfying an $\ell^p$ condition and a generalized bounded variation condition. This latter condition requires that a sequence can be expressed as a sum of sequences $\beta^{(l)}$, each of which has rotated bounded variation, i.e.,
\[ \sum_{n=0}^{\infty} |e^{i\phi_l}\beta^{(l)}_{n+1} - \beta^{(l)}_n| < \infty \]
for some $\phi_l$. This includes a large class of discrete Schrödinger operators with almost periodic potentials modulated by $\ell^p$ decay, i.e. linear combinations of $\lambda_n \cos(2\pi\alpha n + \phi)$ with $\lambda \in \ell^p$ of bounded variation and any $\alpha$. In all cases, we prove absence of singular continuous spectrum, preservation of absolutely continuous spectrum from the corresponding free case, and that pure points embedded in the continuous spectrum can only occur in an explicit finite set.
1. Introduction

In this paper we will be interested in orthogonal polynomials on the unit circle (OPUC) and orthogonal polynomials on the real line (OPRL). We will state the necessary definitions, but for more information on OPUC and OPRL, we refer the reader to [5,7,8,24–26,29].

To each probability measure on the unit circle $d\mu(\theta) = w(\theta)\frac{d\theta}{2\pi} + d\mu_s$ of infinite support, there corresponds a sequence of orthonormal polynomials $\varphi_n(z)$ with $\deg \varphi_n = n$ and $\int \bar\varphi_m(z)\varphi_n(z)\,d\mu = \delta_{mn}$ obeying the Szegő recursion relation
\[ z\varphi_n(z) = \sqrt{1 - |\alpha_n|^2}\,\varphi_{n+1}(z) + \bar\alpha_n \varphi_n^*(z) \tag{1.1} \]
with $\varphi_n^*(z) = z^n \overline{\varphi_n(1/\bar z)}$ and with $\alpha_n \in \mathbb{D} = \{ z \in \mathbb{C} : |z| < 1 \}$ called Verblunsky coefficients. By a theorem of Verblunsky [30], this is a bijective correspondence between such measures and sequences $\{\alpha_n\}_{n=0}^\infty$ with $\alpha_n \in \mathbb{D}$.

To each probability measure on the real line $d\rho(x) = f(x)\,dx + d\rho_s(x)$ of infinite but bounded support, there corresponds a sequence of orthonormal polynomials $p_n(x)$ with $\deg p_n = n$ and $\int p_m(x)p_n(x)\,d\rho = \delta_{mn}$ obeying the Jacobi recursion relation
\[ x p_n(x) = a_{n+1} p_{n+1}(x) + b_{n+1} p_n(x) + a_n p_{n-1}(x) \tag{1.2} \]
with $a_n > 0$, $b_n \in \mathbb{R}$ called Jacobi coefficients. By a theorem of Stieltjes [28], more commonly known as Favard's theorem, this is a bijective correspondence between such measures and sequences $\{a_n, b_n\}_{n=1}^\infty$ with $a_n > 0$, $b_n \in \mathbb{R}$, and $\sup_n a_n + \sup_n |b_n| < \infty$.

Next we discuss the generalized bounded variation condition.

Definition 1.1. A sequence $\beta = \{\beta_n\}_{n=N}^\infty$ ($N$ can be finite or $-\infty$) has rotated bounded variation with phase $\phi$ if
\[ \sum_{n=N}^{\infty} |e^{i\phi}\beta_{n+1} - \beta_n| < \infty. \tag{1.3} \]
A sequence $\alpha = \{\alpha_n\}_{n=N}^\infty$ has generalized bounded variation with the set of phases $A = \{\phi_1, \ldots, \phi_L\}$ if it can be expressed as a sum
\[ \alpha_n = \sum_{l=1}^{L} \beta_n^{(l)} \tag{1.4} \]
of $L < \infty$ sequences $\beta^{(1)}, \ldots, \beta^{(L)}$, such that the $l$-th sequence $\beta^{(l)}$ has rotated bounded variation with phase $\phi_l$. The set of sequences having generalized bounded variation with set of phases $A$ will be denoted $GBV(A)$ or, with a slight abuse of notation, $GBV(\phi_1, \ldots, \phi_L)$. In particular, $GBV(\phi)$ is the set of sequences with rotated bounded variation with phase $\phi$.

For an example of rotated bounded variation with phase $\phi$, take $\beta_n = e^{-i(n\phi+\alpha)}\gamma_n$, with $\{\gamma_n\}_{n=N}^\infty$ any sequence of bounded variation. Generalized bounded variation may seem like an unnatural condition for real-valued sequences, but by combining rotated bounded variation with phases $\phi$ and $-\phi$, one gets
\[ e^{-i(n\phi+\alpha)}\gamma_n + e^{+i(n\phi+\alpha)}\gamma_n = 2\cos(n\phi+\alpha)\gamma_n. \]
It is then clear that a linear combination of Wigner–von Neumann type potentials plus an $\ell^1$ part,
\[ V_n = \sum_{k=1}^{K} \lambda_k \cos(n\phi_k + \alpha_k)/n^{\gamma_k} + W_n \tag{1.5} \]
with $\gamma_k > 0$ and $\{W_n\} \in \ell^1$, has generalized bounded variation.

We can now state the two central results of this paper.
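Before the theorems, a brief numerical aside on Definition 1.1: the example $\beta_n = e^{-i(n\phi+\alpha)}\gamma_n$ can be checked directly, since $e^{i\phi}\beta_{n+1} - \beta_n = e^{-i(n\phi+\alpha)}(\gamma_{n+1} - \gamma_n)$. The Python sketch below is only an illustration; the function names, the choice $\gamma_n = 1/(n+1)$, and the truncation to 200 terms are assumptions made for the example, not part of the paper.

```python
import cmath

def rotated_variation(beta, phi):
    """Finite partial sum of |e^{i phi} beta_{n+1} - beta_n|, as in (1.3)."""
    rot = cmath.exp(1j * phi)
    return sum(abs(rot * b1 - b0) for b0, b1 in zip(beta, beta[1:]))

def example_sequence(gamma, phi, alpha):
    """beta_n = e^{-i(n phi + alpha)} gamma_n for n = 0, 1, ..."""
    return [cmath.exp(-1j * (n * phi + alpha)) * g for n, g in enumerate(gamma)]

# Hypothetical test data: gamma_n = 1/(n+1) is monotone, hence of bounded variation.
gamma = [1.0 / (n + 1) for n in range(200)]
phi, alpha = 0.9, 0.3
beta = example_sequence(gamma, phi, alpha)

rotated = rotated_variation(beta, phi)    # variation with the matching phase
plain = rotated_variation(gamma, 0.0)     # ordinary variation of gamma
unrotated = rotated_variation(beta, 0.0)  # ordinary variation of beta
```

With the matching phase, the rotated variation of $\beta$ coincides with the ordinary variation of $\gamma$ (here a telescoping sum), whereas the ordinary variation of $\beta$ itself is larger and keeps growing as more terms are included.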
Theorem 1.1. (OPUC) Let $d\mu = w(\theta)\frac{d\theta}{2\pi} + d\mu_s$ be a probability measure on the unit circle with infinite support and $\{\alpha_n\}_{n=0}^\infty$ its Verblunsky coefficients. Assume that
\[ \{\alpha_n\}_{n=0}^\infty \in \ell^p \cap GBV(A) \]
for a positive odd integer $p = 2q + 1$ and a finite set $A \subset \mathbb{R}$. Let
\[ S = \Big\{ \exp(i\eta) \,\Big|\, \eta \in (\underbrace{A + \cdots + A}_{q \text{ times}}) - (\underbrace{A + \cdots + A}_{q-1 \text{ times}}) \Big\}. \tag{1.6} \]
Then
(i) $\operatorname{supp}\mu_s \subset S$ and, in particular, $d\mu$ has no singular continuous part;
(ii) $w(\theta)$ is continuous and strictly positive on $\partial\mathbb{D} \setminus S$.

Theorem 1.2. (OPRL) Let $d\rho = f(x)\,dx + d\rho_s$ be a probability measure on the real line with infinite support and finite moments and $\{a_n, b_n\}_{n=1}^\infty$ its Jacobi coefficients. Let $p$ be a positive integer, $A \subset \mathbb{R}$ a finite set of phases, and make one of these sets of assumptions:
1) $\{a_n^2 - 1\}_{n=1}^\infty, \{b_n\}_{n=1}^\infty \in \ell^p \cap GBV(A)$,
2) $\{a_n - 1\}_{n=1}^\infty, \{b_n\}_{n=1}^\infty \in \ell^p \cap GBV(A)$.
Denote $\tilde A = A \cup \{0\}$ in Case 1 and $\tilde A = (A + A) \cup A \cup \{0\}$ in Case 2, and let
\[ S = \Big\{ 2\cos(\eta/2) \,\Big|\, \eta \in \underbrace{\tilde A + \cdots + \tilde A}_{p-1 \text{ times}} \Big\}. \tag{1.7} \]
Then
(i) $\operatorname{supp}\rho_s \cap (-2, 2) \subset S$ and, in particular, $d\rho$ has no singular continuous part;
(ii) $f(x)$ is continuous and strictly positive on $(-2, 2) \setminus S$.

Remark 1.1. As we will see later, since the recursion coefficients are in $\ell^p$, all their constituent sequences of rotated bounded variation are in $\ell^p$. However, if some of these constituent sequences have faster decay, this can be used to reduce the set $S$. Namely, a phase $\phi_1 + \cdots + \phi_k - \phi_{k+1} - \cdots - \phi_{k+l}$ need only be included in (1.6) or (1.7) if the pointwise product of the corresponding sequences, $\{\beta_n^{(1)} \cdots \beta_n^{(k)} \bar\beta_n^{(k+1)} \cdots \bar\beta_n^{(k+l)}\}$, is not in $\ell^1$. The proofs of Theorems 1.1 and 1.2 in this paper can be easily modified to show this.

Remark 1.2. By Lemma 2.2(vi) shown later in this paper,
\[ \{a_n - 1\}_{n=1}^\infty \in GBV(A) \Rightarrow \{a_n^2 - 1\}_{n=1}^\infty \in GBV((A + A) \cup A). \]
Also, $\{a_n - 1\}_{n=1}^\infty \in \ell^p$ implies $\{a_n^2 - 1\}_{n=1}^\infty \in \ell^p$. Thus, with the replacement of the set $A$ by $(A + A) \cup A$, Case 1 of Theorem 1.2 implies Case 2. For that reason, in the remainder of the paper we will only discuss Case 1 of Theorem 1.2.

Remark 1.3. If a sequence $\{\beta_n\}$ has rotated $q$-bounded variation, i.e. $\sum_n |e^{i\phi}\beta_{n+q} - \beta_n| < \infty$, then it also has generalized bounded variation by Lemma 2.1(ii), so our results trivially extend to such sequences.
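The exceptional sets in (1.6) and (1.7) are built from finite sumsets of the phase set $A$, so they can be enumerated directly. The Python sketch below illustrates this under stated assumptions: the function names are hypothetical, the functions return the raw phases $\eta$ (the caller applies $\exp(i\eta)$ or $2\cos(\eta/2)$), and the degenerate case $p = 1$ of (1.6) is simply rejected. Exact arithmetic (integers or fractions, e.g. phases measured in units of $\pi$) avoids near-duplicate floating-point sums.

```python
from itertools import product

def sumset(phases, times):
    """All sums of `times` elements of `phases` (repetition allowed).

    For times == 0 this is {0}, the empty sum.
    """
    return {sum(combo) for combo in product(phases, repeat=times)}

def opuc_phases(A, p):
    """Phases eta entering (1.6) for odd p = 2q + 1 (this sketch assumes p >= 3)."""
    if p < 3 or p % 2 == 0:
        raise ValueError("this sketch assumes an odd integer p >= 3")
    q = (p - 1) // 2
    return {a - b for a in sumset(A, q) for b in sumset(A, q - 1)}

def oprl_phases(A, p, case=1):
    """Phases eta entering (1.7): (p - 1)-fold sums of elements of A-tilde."""
    A = set(A)
    if case == 1:
        A_tilde = A | {0}
    else:
        A_tilde = sumset(A, 2) | A | {0}
    return sumset(A_tilde, p - 1)
```

For instance, with $A = \{\phi, -\phi\}$ and $p = 5$, `opuc_phases` returns the phases $\pm\phi, \pm 3\phi$; the number of candidate points grows with $p$, in line with the weaker decay assumption.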
Theorem 1.2 can be viewed in the special case $a_n = 1$, where it becomes a result on discrete Schrödinger operators on a half-line. Using a standard pasting argument, this also implies a result for discrete Schrödinger operators on a line.

Corollary 1.3. (1D discrete Schrödinger operators) Let
\[ (Hx)_n = x_{n+1} + x_{n-1} + V_n x_n \tag{1.8} \]
be a discrete Schrödinger operator on a half-line or line, with $\{V_n\}$ in $\ell^p$ with generalized bounded variation with set of phases $A$. Then
(i) $\sigma_{ac}(H) = [-2, 2]$,
(ii) $\sigma_{sc}(H) = \emptyset$,
(iii) $\sigma_{pp}(H) \cap (-2, 2)$ is a finite set,
\[ \sigma_{pp}(H) \cap (-2, 2) \subset \Big\{ 2\cos(\eta/2) \,\Big|\, \eta \in \bigcup_{k=1}^{p-1} (\underbrace{A + \cdots + A}_{k \text{ times}}) \Big\}. \]
This corollary applies in particular to linear combinations of Wigner–von Neumann potentials (1.5).

Spectral consequences of bounded variation coupled with convergence of recursion coefficients are well known. These results are often cited as Weidmann's theorem, after Weidmann, who proved the first result of this kind for Schrödinger operators [31]. The analogous OPRL result, due to Máté–Nevai [17], states that bounded variation of $\{a_n\}_{n=1}^\infty$ and $\{b_n\}_{n=1}^\infty$ together with $a_n \to 1$, $b_n \to 0$ implies Theorem 1.2(i),(ii) with $S = \emptyset$. The corresponding result for OPUC, by Peherstorfer–Steinbauer [22], states that bounded variation of $\{\alpha_n\}_{n=0}^\infty$ together with $\alpha_n \to 0$ implies Theorem 1.1(i),(ii) with $S = \{1\}$. Rotating the measure on the unit circle gives an immediate corollary, that rotated bounded variation of $\{\alpha_n\}_{n=0}^\infty$ with phase $\phi$ together with $\alpha_n \to 0$ implies Theorem 1.1(i),(ii) with $S = \{e^{i\phi}\}$. Wong [33] has the first result to consider multiple phases, proving Theorem 1.1 in the case $\{\alpha_n\}_{n=0}^\infty \in \ell^2$.

During the writing of this paper, we learned about work by Janas–Simonov [12] analyzing potentials of the form $V_n = \cos(\phi n + \delta)/n^\gamma + q_n$, with $\gamma > 1/3$ and $\{q_n\}_{n=1}^\infty \in \ell^1$. They obtain the same spectral results as our Corollary 1.3 by a different method.

All the results discussed so far concern perturbation of the free operator by generalized bounded variation. For perturbations of other operators the situation is more complicated. For instance, in contrast to Weidmann's theorem, Last [16] has shown that for some classes of potentials $V_0$, perturbing the discrete Schrödinger operator $-\Delta + V_0$ by a perturbation $V$ of bounded variation can destroy a.c. spectrum.

In another direction, one can relax the bounded variation condition to an $\ell^2$ condition on $q$-variation, namely $\sum_n |x_{n+q} - x_n|^2 < \infty$. Kaluzhny–Shamis [14] have shown that this kind of perturbation with $x_n \to 0$ preserves the a.c. spectrum of periodic Jacobi operators.

As communicated to us by Yoram Last, this problem can also be motivated in a different way: let $V_n = \lambda_n W_n$, with $\lambda_n > 0$ monotone decaying to 0, and let $H$ be given by (1.8). For different classes of potentials $W$, what kind of decay do we need to ensure preservation of a.c. spectrum? If $W$ is periodic, the method of Golinskii–Nevai [10] shows that any such $\{\lambda_n\}$ suffices. For $W$ from a large class of random potentials, Kiselev–Last–Simon [15] have shown that $\{\lambda_n\} \in \ell^2$ is needed. For $W$ from a large class of almost periodic potentials, our results state that any $\{\lambda_n\} \in \ell^p$, $p < \infty$, suffices:
Corollary 1.4. (Almost periodic potentials with decay) Let
\[ (Hx)_n = x_{n+1} + x_{n-1} + \lambda_n W_n x_n \tag{1.9} \]
be a discrete Schrödinger operator on a half-line or line with $\{\lambda_n\} \in \ell^p$ of bounded variation (with $p < \infty$) and $W$ a trigonometric polynomial,
\[ W_n = \sum_{l=1}^{L} a_l \cos(2\pi\alpha_l n + \phi_l). \]
Then with $A = \{\pm 2\pi\alpha_1, \ldots, \pm 2\pi\alpha_L\}$, all conclusions of Corollary 1.3 hold.

The remainder of this paper is dedicated to proofs of Theorems 1.1 and 1.2. In Sect. 2, we discuss some properties of sequences of generalized bounded variation. In Sects. 3–5, we introduce Prüfer variables for OPUC and OPRL and present them in a unified way which will enable us to present a shared proof of the two theorems. In Sects. 6 and 7 we present proofs in the $\ell^2$ and $\ell^3$ cases, building up the tools for the general proof in Sects. 8 and 9.

2. Generalized Bounded Variation

In this section we describe some properties of sequences of rotated and generalized bounded variation. Most importantly, we prove that if a sequence is of generalized bounded variation and is in some $\ell^p$ space, then all the constituent sequences are also in $\ell^p$.

Lemma 2.1. (i) Let $\alpha \in GBV(\phi_1, \ldots, \phi_L)$, with decomposition (1.4) into sequences of rotated bounded variation. Then for any $1 \le p \le \infty$,
\[ \alpha \in \ell^p \Rightarrow \beta^{(1)}, \ldots, \beta^{(L)} \in \ell^p. \]
(ii) If $\sum_n |e^{i\phi}\alpha_{n+q} - \alpha_n| < \infty$, then $\alpha \in GBV\big(\frac{\phi}{q}, \frac{\phi}{q} + \frac{2\pi}{q}, \ldots, \frac{\phi}{q} + \frac{2(q-1)\pi}{q}\big)$.

Proof. (i) We will prove $\beta^{(1)} \in \ell^p$; the proof for any $\beta^{(l)}$ is analogous. Let $T$ be the shift operator on sequences, defined by $Tz = \{z_{n+1}\}_{n=N}^\infty$ for $z = \{z_n\}_{n=N}^\infty$. In terms of $T$, the condition (1.3) can be rewritten as
\[ (e^{i\phi_l}T - 1)\beta^{(l)} \in \ell^1. \tag{2.1} \]
Note that for any $1 \le q \le \infty$, $z \in \ell^q$ implies $Tz \in \ell^q$; thus, for an arbitrary polynomial $P(T)$,
\[ z \in \ell^q \Rightarrow P(T)z \in \ell^q. \tag{2.2} \]
Now let $Q(T) = \prod_{l=2}^{L} (e^{i\phi_l}T - 1)$. By (2.2) with $q = 1$, (2.1) implies $Q(T)\beta^{(l)} \in \ell^1 \subset \ell^p$ for $l \neq 1$. Meanwhile, $\alpha \in \ell^p$ and (2.2) imply $Q(T)\alpha \in \ell^p$. Thus, applying $Q(T)$ to (1.4) gives
\[ Q(T)\beta^{(1)} = Q(T)\alpha - \sum_{l=2}^{L} Q(T)\beta^{(l)} \in \ell^p. \tag{2.3} \]
Since the $\phi_l$ are mutually distinct, $Q(T)$ is coprime with $e^{i\phi_1}T - 1$, so there exist complex polynomials $U(T)$, $V(T)$ such that
\[ 1 = U(T)Q(T) + V(T)(e^{i\phi_1}T - 1). \]
Thus, applying $U(T)$ to (2.3) and $V(T)$ to $(e^{i\phi_1}T - 1)\beta^{(1)} \in \ell^1$ and adding the two, we obtain $\beta^{(1)} \in \ell^p$.

(ii) Let $R_k(T) = (e^{i\phi}T^q - 1)/(e^{i(\phi+2k\pi)/q}T - 1)$ for $0 \le k \le q - 1$. Since there exist complex polynomials $U_k(T)$ with $1 = \sum_{k=0}^{q-1} R_k(T)U_k(T)$, by defining $\beta^{(k)} = R_k(T)U_k(T)\alpha$ one gets the required representation $\alpha = \sum_{k=0}^{q-1} \beta^{(k)}$ with $(e^{i(\phi+2k\pi)/q}T - 1)\beta^{(k)} \in \ell^1$.

Remark 2.1. If a sequence $\alpha$ is of generalized bounded variation, uniqueness of the representation (1.4) is of some interest. Clearly, we can freely add $\ell^1$ sequences to the $\beta^{(l)}$'s, as long as the sum of those sequences cancels out in $\alpha$. By doing so, we can eliminate any extraneous $\beta^{(l)}$ which are in $\ell^1$.
Conversely, if we find a different representation $\alpha_n = \sum_k \tilde\beta_n^{(k)}$, then subtracting it from the representation (1.4) and applying Lemma 2.1 with $p = 1$, we see that to each $\beta^{(l)} \notin \ell^1$ there corresponds a unique $\tilde\beta^{(k)}$ with the same phase, such that their difference is an $\ell^1$ sequence.

The following lemma describes some properties of sequences of generalized bounded variation. In particular, it shows that real sequences of generalized bounded variation have, in essence, an even set of phases and a symmetric representation with respect to complex conjugation.

Lemma 2.2. Let $\phi, \psi \in \mathbb{R}$, $A, B, C \subset \mathbb{R}$, and $\beta = \{\beta_n\}_{n=N}^\infty$, $\gamma = \{\gamma_n\}_{n=N}^\infty$ (with $N$ finite) complex sequences. Then
(i) if $\beta \in GBV(\phi)$, then $\beta$ is bounded;
(ii) if $\beta \in GBV(\phi)$, $\gamma \in GBV(\psi)$, then $\{\beta_n\gamma_n\}_{n=N}^\infty \in GBV(\phi + \psi)$;
(iii) if $\beta \in GBV(B)$, $\gamma \in GBV(C)$, then $\{\beta_n\gamma_n\}_{n=N}^\infty \in GBV(B + C)$;
(iv) if $\beta \in GBV(B)$, $\gamma \in GBV(C)$, then $\{\beta_n + \gamma_n\}_{n=N}^\infty \in GBV(B \cup C)$;
(v) if $\beta \in GBV(B)$, then $\bar\beta \in GBV(-B)$;
(vi) if $\{a_n - 1\}_{n=1}^\infty \in GBV(A)$, then $\{a_n^2 - 1\}_{n=1}^\infty \in GBV((A + A) \cup A)$;
(vii) if $x \in GBV(A)$ with $x_n \in \mathbb{R}$, then $x$ admits a representation
\[ x = \sum_{l=1}^{L} (\beta^{(l)} + \bar\beta^{(l)}) \]
with $\beta^{(l)} \in GBV(\phi_l)$, such that $\phi_l \in A$ and for every $\beta^{(l)} \notin \ell^1$, the corresponding $\phi_l$ is in $-A + 2\pi\mathbb{Z}$.

Proof. (i) follows from the triangle inequality,
\[ |\beta_n| \le |e^{iN\phi}\beta_N| + \sum_{m=N}^{n-1} |e^{i(m+1)\phi}\beta_{m+1} - e^{im\phi}\beta_m| \le |\beta_N| + \sum_{m=N}^{\infty} |e^{i\phi}\beta_{m+1} - \beta_m|. \]
(ii) follows from the triangle inequality and part (i),
\[ |e^{i(\phi+\psi)}\beta_{n+1}\gamma_{n+1} - \beta_n\gamma_n| \le |e^{i\psi}\gamma_{n+1}(e^{i\phi}\beta_{n+1} - \beta_n)| + |\beta_n(e^{i\psi}\gamma_{n+1} - \gamma_n)| \le \|\gamma\|_\infty |e^{i\phi}\beta_{n+1} - \beta_n| + \|\beta\|_\infty |e^{i\psi}\gamma_{n+1} - \gamma_n|, \]
after summing over $n$.
(iii) is proved by decomposing $\beta$ and $\gamma$ into sequences of rotated bounded variation and applying (ii).
(iv) and (v) follow directly from Definition 1.1.
(vi) follows from (iii) and (iv), using $a_n^2 - 1 = (a_n - 1)^2 + 2(a_n - 1)$.
(vii) Taking an arbitrary representation of $x$ and averaging it with its complex conjugate produces the desired form. Since $x = \bar x$, the other claim follows from (v) and Remark 2.1.

3. Prüfer Variables – OPUC

In this section we will define Prüfer variables for OPUC and reduce the proof of Theorem 1.1 to a criterion in terms of one of them. Prüfer variables are named after Prüfer [23], who defined them for Sturm–Liouville operators. The OPUC version of Prüfer variables was first introduced by Nikishin [20], and later used by Nevai [19] and Simon [25].

For $z = e^{i\eta}$ with $\eta \in \mathbb{R}$, Prüfer variables $r_n(\eta)$, $\theta_n(\eta)$ are defined by $r_n(\eta) > 0$, $\theta_n(\eta) \in \mathbb{R}$, and
\[ \varphi_n(e^{i\eta}) = r_n(\eta) e^{i[n\eta + \theta_n(\eta)]} \tag{3.1} \]
(the ambiguity in $\theta_n$ modulo $2\pi$ is usually fixed by setting $\theta_0 = 0$ and $|\theta_{n+1} - \theta_n| < \pi$, but in this paper that will be irrelevant). Then $\varphi_n^*(e^{i\eta}) = r_n(\eta)e^{-i\theta_n(\eta)}$, so the Szegő recursion relation (1.1) implies
\[ r_n e^{i[(n+1)\eta + \theta_n]} = \sqrt{1 - |\alpha_n|^2}\, r_{n+1} e^{i[(n+1)\eta + \theta_{n+1}]} + \bar\alpha_n r_n e^{-i\theta_n}. \]
Regrouping and dividing by $\sqrt{1 - |\alpha_n|^2}\, r_n e^{i[(n+1)\eta + \theta_n]}$ gives
\[ \frac{r_{n+1}}{r_n} e^{i(\theta_{n+1} - \theta_n)} = \frac{1 - \bar\alpha_n e^{-i[(n+1)\eta + 2\theta_n]}}{\sqrt{1 - |\alpha_n|^2}}. \tag{3.2} \]
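The Prüfer recursion (3.2) can be checked numerically against a direct evaluation of the Szegő recursion (1.1). The Python sketch below is only an illustration: the function names are hypothetical, the normalization $\varphi_0 = 1$, $r_0 = 1$, $\theta_0 = 0$ is assumed (as for a probability measure), and the Verblunsky coefficients used in the check are arbitrary test values.

```python
import cmath
import math

def szego_values(alphas, eta):
    """Values phi_0(z), ..., phi_N(z) at z = e^{i eta} from the recursion (1.1)."""
    z = cmath.exp(1j * eta)
    phi = 1 + 0j  # phi_0 = 1 (assumed normalization)
    values = [phi]
    for n, a in enumerate(alphas):
        phi_star = z ** n * phi.conjugate()  # phi_n^*(z) for |z| = 1
        rho = math.sqrt(1 - abs(a) ** 2)
        phi = (z * phi - a.conjugate() * phi_star) / rho
        values.append(phi)
    return values

def prufer_variables(alphas, eta):
    """Pairs (r_n, theta_n) obtained by iterating (3.2) from r_0 = 1, theta_0 = 0."""
    r, theta = 1.0, 0.0
    out = [(r, theta)]
    for n, a in enumerate(alphas):
        rho = math.sqrt(1 - abs(a) ** 2)
        w = (1 - a.conjugate() * cmath.exp(-1j * ((n + 1) * eta + 2 * theta))) / rho
        r *= abs(w)               # modulus of (3.2) gives r_{n+1} / r_n
        theta += cmath.phase(w)   # argument of (3.2) gives theta_{n+1} - theta_n
        out.append((r, theta))
    return out

# Hypothetical test data: a few Verblunsky coefficients in the unit disk.
alphas = [0.3 + 0.1j, -0.2j, 0.25, 0.1 - 0.05j]
eta = 0.7
phis = szego_values(alphas, eta)
pv = prufer_variables(alphas, eta)
```

Both routes should agree in the sense of (3.1), that is, $\varphi_n(e^{i\eta}) = r_n e^{i[n\eta + \theta_n]}$ up to rounding. Taking the principal value of the argument in each step is one way to fix the ambiguity in $\theta_n$ modulo $2\pi$.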
Part (i) of the following lemma reduces the proof of Theorem 1.1 to the proof of uniform convergence of $\log r_n(\eta)$ on intervals. Part (ii) is also used in the proof of Theorem 1.1, to provide a contradiction in a crucial step.

Lemma 3.1. Let a measure $d\mu$ on the unit circle have Verblunsky coefficients $\{\alpha_n\}_{n=0}^\infty$ and Prüfer variables $r_n(\eta)$. Then
(i) If $B \subset \mathbb{R}$ is finite and $\log r_n(\eta)$ converges uniformly on intervals $I$ with $\operatorname{dist}(I, B + 2\pi\mathbb{Z}) > 0$, then Theorem 1.1(i),(ii) hold, with the set $S$ given by $S = \{\exp(i\eta) \mid \eta \in B\}$;
(ii) If $\alpha_n \to 0$, it is not possible for $\log r_n(\eta)$ to converge as $n \to \infty$ to $+\infty$ or $-\infty$ uniformly on an interval $I$.
M. Lukic
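Proof. (i) Note that $r_n(\eta) = |\varphi_n(e^{i\eta})|$, so using the Bernstein–Szegő approximations (see [27]),
\[
\frac{1}{2\pi}\,\frac{d\eta}{r_n^2(\eta)} \xrightarrow{\;w\;} d\mu(e^{i\eta}). \tag{3.3}
\]
Thus, if $\log r_n(\eta)$ converges uniformly on an interval I, then
\[
d\mu(e^{i\eta}) = \frac{1}{2\pi}\,\frac{1}{\lim_{n\to\infty} r_n^2(\eta)}\, d\eta \quad \text{on } I.
\]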
This holds for any interval I with dist(I, B + 2πZ) > 0, and ∂D \ S can be covered by the images $J = \{e^{i\eta} \mid \eta \in I\}$ of countably many such intervals, which implies the conclusions of Theorem 1.1.
(ii) If $r_n(\eta)$ converged uniformly to 0 or to +∞ on I, (3.3) would imply that μ(I) = ∞ or μ(I) = 0, contradicting either the assumption that dμ is a probability measure or a result of Geronimus [9, Thm. 19.1] that $\alpha_n \to 0$ implies supp dμ = ∂D.

4. Prüfer Variables – OPRL

In this section we will define Prüfer variables for OPRL and reduce the proof of Theorem 1.2 to a criterion in terms of one of them. The OPRL analog of Prüfer variables is known as the EFGP transform, after Eggarter, Figotin, Gredeskul, Pastur [6,11,21], who developed and used it in the discrete Schrödinger case $a_n = 1$. It was also extensively used by Kiselev–Last–Simon [15]. For general OPRL, it was used by Breuer, Kaluzhny, Last, Simon [2–4,13]. For x = 2 cos(η/2), 0 < η < 2π, define $r_n(\eta) > 0$, $\theta_n(\eta) \in \mathbb{R}$ by
\[
r_n(\eta)\, e^{i[n\eta/2+\theta_n(\eta)]} = a_n p_n(x) - p_{n-1}(x)\, e^{-i\eta/2}. \tag{4.1}
\]
Next we define
\[
\alpha_n(\eta) = \frac{a_n^2 - 1 + e^{i\eta/2}\, b_{n+1}}{e^{i\eta} - 1}. \tag{4.2}
\]
This variable will play the same role in our proof that Verblunsky coefficients $\alpha_n$ play for OPUC. In fact, after this section, we will not need to mention $a_n$ or $b_n$ individually, only their combination (4.2). By decomposing $a_n^2 - 1$ and $b_n$ into sequences of rotated bounded variation, $\alpha_n(\eta)$ can be written as
\[
\alpha_n(\eta) = \sum_{l=1}^{L} h_l(\eta)\,\beta_n^{(l)}, \tag{4.3}
\]
where $\beta^{(l)}$ has rotated bounded variation with phase $\phi_l$ and $h_l(\eta)$ are continuous nonvanishing functions on (0, 2π). In fact, $h_l(\eta)$ are either $1/(e^{i\eta} - 1)$ or $e^{i\eta/2}/(e^{i\eta} - 1)$, depending on whether the corresponding $\beta^{(l)}$ was a part of $\{a_n^2 - 1\}_{n=1}^\infty$ or $\{b_n\}_{n=1}^\infty$. Further, if $\{a_n^2 - 1\}_{n=1}^\infty, \{b_n\}_{n=1}^\infty \in \ell^p$, then $\beta^{(l)} \in \ell^p$ by Lemma 2.1. Note that unlike in OPUC, an arbitrary choice of sequences $\beta^{(l)} \in \ell^p \cap GBV(\phi_l)$ would not correspond via (4.3) to a valid set of Jacobi parameters; rather,
Generalized Bounded Variation
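by Lemma 2.2(vii), for each $\beta^{(l)}$, its complex conjugate is also one of the sequences in (4.3). Multiplying (4.1) by $e^{i\eta/2}$ gives
\[
r_n e^{i[(n+1)\eta/2+\theta_n]} = a_n p_n e^{i\eta/2} - p_{n-1}. \tag{4.4}
\]
Note that $2\operatorname{Re}\alpha_n = 1 - a_n^2$ and $2\operatorname{Re}(\alpha_n e^{i\eta/2}) = b_{n+1}$, so using (4.4),
\[
2\operatorname{Re}\bigl(r_n e^{i[(n+1)\eta/2+\theta_n]}\alpha_n\bigr) = 2\operatorname{Re}\bigl(a_n p_n e^{i\eta/2}\alpha_n - p_{n-1}\alpha_n\bigr) = a_n p_n b_{n+1} + (a_n^2 - 1)\,p_{n-1}.
\]
Subtracting this from (4.4), then using the Jacobi recursion relation (1.2), we have
\[
r_n e^{i[(n+1)\eta/2+\theta_n]} - 2\operatorname{Re}\bigl(r_n e^{i[(n+1)\eta/2+\theta_n]}\alpha_n\bigr) = a_n\bigl(a_{n+1}p_{n+1} - p_n e^{-i\eta/2}\bigr) = a_n r_{n+1} e^{i[(n+1)\eta/2+\theta_{n+1}]},
\]
where in the last step we used (4.1) with n replaced by n + 1. Dividing both sides by $a_n r_n e^{i[(n+1)\eta/2+\theta_n]}$ and again using $a_n^2 = 1 - 2\operatorname{Re}\alpha_n$, we get
\[
\frac{r_{n+1}}{r_n}\, e^{i(\theta_{n+1}-\theta_n)} = \frac{1 - \alpha_n - \bar\alpha_n e^{-i[(n+1)\eta+2\theta_n]}}{\sqrt{1 - \alpha_n - \bar\alpha_n}}. \tag{4.5}
\]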
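The derivation of (4.5) can be spot-checked numerically. The sketch below is illustrative only: it assumes that the Jacobi recursion (1.2) reads $x p_n = a_{n+1}p_{n+1} + b_{n+1}p_n + a_n p_{n-1}$ with $p_{-1} = 0$, $p_0 = 1$ (the form consistent with the computation above), and the function name is hypothetical.

```python
import cmath
import math

def oprl_prufer_residual(a, b, eta):
    """Largest violation of (4.5) for Jacobi parameters a[1..N], b[1..N].

    a[0] and b[0] are unused padding (a[0] is multiplied by p_{-1} = 0).
    Assumed Jacobi recursion (1.2):
        x*p_n = a_{n+1}*p_{n+1} + b_{n+1}*p_n + a_n*p_{n-1}.
    """
    N = len(a) - 1
    x = 2 * math.cos(eta / 2)
    p = [1.0]          # p[n] holds p_n(x)
    prev = 0.0         # p_{-1}
    for n in range(N):
        nxt = (x * p[n] - b[n + 1] * p[n] - a[n] * prev) / a[n + 1]
        prev = p[n]
        p.append(nxt)
    half = cmath.exp(1j * eta / 2)

    def zeta(n):
        # r_n * exp(i[n*eta/2 + theta_n]), i.e. the right side of (4.1)
        return a[n] * p[n] - p[n - 1] / half

    worst = 0.0
    for n in range(1, N):
        alpha = (a[n] ** 2 - 1 + half * b[n + 1]) / (half ** 2 - 1)   # (4.2)
        z_n, z_next = zeta(n), zeta(n + 1)
        # (r_{n+1}/r_n) e^{i(theta_{n+1}-theta_n)}
        lhs = z_next / (z_n * half)
        # e^{-i[(n+1)eta + 2 theta_n]}
        phase = z_n.conjugate() / (z_n * half ** 2)
        rhs = (1 - alpha - alpha.conjugate() * phase) / math.sqrt(1 - 2 * alpha.real)
        worst = max(worst, abs(lhs - rhs))
    return worst
```

The quantity under the square root equals $a_n^2$, so the check is meaningful for any positive $a_n$ and real $b_n$.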
Part (i) of the following lemma reduces the proof of Theorem 1.2 to proving uniform convergence of $\log r_n(\eta)$ on intervals. Part (ii) is also used in the proof of Theorem 1.2, to provide a contradiction in a crucial step.

Lemma 4.1. Let a measure dρ on the real line have Jacobi parameters $\{a_n, b_n\}_{n=1}^\infty$ with $a_n \to 1$, $b_n \to 0$ and Prüfer variables $r_n(\eta)$. Then
(i) If B ⊂ R is finite, 0 ∈ B and $\log r_n(\eta)$ converges uniformly on intervals I with dist(I, B + 2πZ) > 0, then Theorem 1.2(i),(ii) hold, with the set S given by S = {2 cos(η/2) | η ∈ B};
(ii) It is not possible for $\log r_n(\eta)$ to converge as n → ∞ to +∞ or −∞ uniformly on an interval I.

Proof. (i) We use a sequence of weak approximations to dρ (see [27]),
\[
\frac{dx}{\pi\bigl(a_n^2 p_n^2(x) + p_{n-1}^2(x)\bigr)} \xrightarrow{\;w\;} d\rho(x), \tag{4.6}
\]
but we only know that, with x = 2 cos(η/2),
\[
r_n^2(\eta) = a_n^2 p_n^2(x) - a_n x\, p_n(x)\, p_{n-1}(x) + p_{n-1}^2(x) \tag{4.7}
\]
uniformly converges on certain intervals. For $|x| < 2 - 2\varepsilon$ we have
\[
\varepsilon\bigl(a_n^2 p_n^2(x) + p_{n-1}^2(x)\bigr) \le r_n^2(\eta) \le (2 - \varepsilon)\bigl(a_n^2 p_n^2(x) + p_{n-1}^2(x)\bigr). \tag{4.8}
\]
Let I be an interval with dist(I, B + 2πZ) > 0. Since $\log r_n$ converges uniformly on I, it is uniformly bounded on I. Since 0 ∈ B, dist(I, 2πZ) > 0, so (4.8) implies that $\log\bigl(a_n^2 p_n^2(x) + p_{n-1}^2(x)\bigr)$ is uniformly bounded on J = {2 cos(η/2) | η ∈ I}. Thus, standard measure theory arguments applied to (4.6) imply that dρ(x) = f(x) dx on J with log f bounded on J. It remains to prove continuity of f on J. By [18, Thm. 4.2.13], since $a_n \to 1$ and $b_n \to 0$, for all bounded continuous real functions h(x),
\[
\lim_{n\to\infty} \int_{-\infty}^{+\infty} h(x)\, p_n(x)\, p_{n+k}(x)\, d\rho(x) = \frac{1}{\pi}\int_{-2}^{2} h(x)\, \frac{T_{|k|}(x/2)}{\sqrt{4 - x^2}}\, dx,
\]
where $T_k(x)$ are Chebyshev polynomials of the first kind, given by $T_k(\cos\theta) = \cos(k\theta)$. Using this and (4.7), with η(x) = 2 arccos(x/2),
\[
\lim_{n\to\infty} \int_{-\infty}^{+\infty} h(x)\, r_n(\eta(x))^2\, d\rho(x) = \frac{1}{\pi}\int_{-2}^{2} h(x)\, \frac{2T_0(x/2) - x\,T_1(x/2)}{\sqrt{4 - x^2}}\, dx = \frac{1}{2\pi}\int_{-2}^{2} h(x)\sqrt{4 - x^2}\, dx.
\]
Assuming in addition that supp h ⊂ J, uniform convergence of $\log r_n(\eta)$ on I implies
\[
\lim_{n\to\infty} \int_{-\infty}^{+\infty} h(x)\, r_n^2(\eta(x))\, d\rho(x) = \int_{-\infty}^{+\infty} h(x) \lim_{n\to\infty} r_n^2(\eta(x))\, d\rho(x).
\]
Comparing the two gives
\[
d\rho(x) = \frac{1}{2\pi}\, \frac{\sqrt{4 - x^2}}{\lim_{n\to\infty} r_n^2(2\arccos(x/2))}\, dx \quad \text{on } J. \tag{4.9}
\]
Since (−2, 2) \ S can be covered by countably many such intervals J, this concludes the proof.
(ii) If $r_n(\eta)$ converged uniformly to 0 or to ∞ on I, (4.8) and (4.6) would imply that ρ(I) = ∞ or ρ(I) = 0. This would contradict either the assumption that dρ is a probability measure or a result of Blumenthal–Weyl [1,32] (see also [26, Sect. 1.4]) that $a_n \to 1$, $b_n \to 0$ implies ess supp dρ = [−2, 2].

5. Equisummability

In this section, we define a useful relation and present the framework for both OPRL and OPUC in a unified way. Define a constant c,
\[
c = \begin{cases} 0 & \text{for OPUC} \\ 1 & \text{for OPRL.} \end{cases} \tag{5.1}
\]
Then (3.2) and (4.5) can be written in a unified way as
\[
\frac{r_{n+1}}{r_n}\, e^{i(\theta_{n+1}-\theta_n)} = \frac{1 - c\alpha_n - \bar\alpha_n e^{-i[(n+1)\eta+2\theta_n]}}{\sqrt{(1 - c\alpha_n)(1 - c\bar\alpha_n) - \alpha_n\bar\alpha_n}}. \tag{5.2}
\]
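Taking the absolute value of this equation, or dividing it by its complex conjugate, we get
\[
\frac{r_{n+1}}{r_n} = \frac{\bigl|1 - \alpha_n e^{i[(n+1)\eta+2\theta_n]} - c\bar\alpha_n\bigr|}{\sqrt{(1 - c\alpha_n)(1 - c\bar\alpha_n) - \alpha_n\bar\alpha_n}}, \tag{5.3}
\]
\[
e^{2i(\theta_{n+1}-\theta_n)} = \frac{1 - \bar\alpha_n e^{-i[(n+1)\eta+2\theta_n]} - c\alpha_n}{1 - \alpha_n e^{i[(n+1)\eta+2\theta_n]} - c\bar\alpha_n}. \tag{5.4}
\]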
For both OPUC and OPRL, the sequence α(η) can be written as
\[
\alpha_n(\eta) = \sum_{l=1}^{L} h_l(\eta)\,\beta_n^{(l)}, \tag{5.5}
\]
where $\beta^{(l)}$ has rotated bounded variation with phase $\phi_l$, $\beta^{(l)} \in \ell^p$ and $h_l(\eta)$ are continuous non-vanishing functions away from $A_1 + 2\pi\mathbb{Z}$, with
\[
A_1 = \begin{cases} \emptyset & \text{for OPUC} \\ \{0\} & \text{for OPRL.} \end{cases} \tag{5.6}
\]
For a given set A of phases, we will now define sets $A_p$ with p a positive integer. Let
\[
A_2 = A \cup A_1. \tag{5.7}
\]
Let $q = \lceil (p-1)/2 \rceil$ (the smallest integer not smaller than (p − 1)/2) and
\[
A_p = \begin{cases} \underbrace{(A + \cdots + A)}_{q \text{ times}} - \underbrace{(A + \cdots + A)}_{q-1 \text{ times}} & \text{for OPUC} \\[2ex] \underbrace{A_2 + \cdots + A_2}_{p-1 \text{ times}} & \text{for OPRL.} \end{cases} \tag{5.8}
\]
For OPRL, note that Lemma 2.2(vii) implies A = −A, and that $0 \in A_2$, so the set $A_p$ contains all elements of
\[
\underbrace{(A + \cdots + A)}_{i \text{ times}} - \underbrace{(A + \cdots + A)}_{j \text{ times}}
\]
for any i ≥ 1, j ≥ 0 and i + j < p. For OPUC, it only contains those with i = j + 1.

Definition 5.1. Let B ⊂ R be a finite set. We define equisummability away from B, a binary relation $\sim_B$ on the set of sequences parametrized by η ∈ R, by: $u_n(\eta) \sim_B v_n(\eta)$ if and only if
\[
\sum_{n=0}^{\infty} \bigl(u_n(\eta) - v_n(\eta)\bigr)
\]
converges uniformly (but not necessarily absolutely) in η ∈ I for intervals I with dist(I, B + 2πZ) > 0.

With this notation, if we are in the $\ell^p$ case, it suffices to show that
\[
\log\frac{r_{n+1}(\eta)}{r_n(\eta)} \sim_{A_p} 0 \tag{5.9}
\]
because then Lemmas 3.1(i) and 4.1(i) imply Theorems 1.1 and 1.2.
6. Proof in the $\ell^2$ Case

In this section, we present a proof of (5.9) in the $\ell^2$ case. We focus on this case in order to motivate elements of the proof of the general case, and in particular a key lemma. We remind the reader that for OPUC, the $\ell^2$ case has already been proved by Wong [33]. Taking the log of (5.3) and expanding to linear order in $\alpha_n$, we get
\[
\log\frac{r_{n+1}}{r_n} = -\operatorname{Re}\bigl(\alpha_n e^{i[(n+1)\eta+2\theta_n]}\bigr) + O(|\alpha_n|^2).
\]
In the $\ell^2$ case $O(|\alpha_n|^2) \sim_{A_1} 0$, so using (5.5),
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_1} -\operatorname{Re}\sum_{l=1}^{L} h_l(\eta)\,\beta_n^{(l)} e^{i[(n+1)\eta+2\theta_n]}. \tag{6.1}
\]
Now we need a way to control terms of the form $f(\eta)\,\Gamma_n e^{i[(n+1)\eta+2\theta_n]}$, with $\{\Gamma_n\}$ of rotated bounded variation with phase φ. But first, some definitions. We will need the function
\[
\chi(\eta) = \frac{1}{e^{-i\eta} - 1} = -\frac{1}{2} + \frac{i}{2}\cot\frac{\eta}{2}. \tag{6.2}
\]
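Taylor expansions of (5.4) will turn out to be important: taking the k-th power of (5.4) and expanding in powers of $\alpha_n$, we have
\[
e^{2ki(\theta_{n+1}-\theta_n)} - 1 = P_{k,l}\bigl(\alpha_n, e^{i[(n+1)\eta+2\theta_n]}\bigr) + O(|\alpha_n|^l), \tag{6.3}
\]
where
\[
P_{k,l}\bigl(\alpha_n, e^{i[(n+1)\eta+2\theta_n]}\bigr) = \sum_{\substack{u,v \ge 0 \\ 0 < u+v < l}} (-1)^v \binom{k+u-1}{u}\binom{k}{v} \bigl(\alpha_n e^{i[(n+1)\eta+2\theta_n]} + c\bar\alpha_n\bigr)^u \bigl(\bar\alpha_n e^{-i[(n+1)\eta+2\theta_n]} + c\alpha_n\bigr)^v. \tag{6.4}
\]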
The first part of the following lemma will give us a way of passing from a sequence of the form $f(\eta)\,\Gamma_n e^{i[(n+1)\eta+2\theta_n]}$ to a faster decaying sequence, but at a cost of a multiplicative factor with possibly finitely many singularities. These singularities exactly correspond to the points where we cannot rule out existence of a pure point. The main idea of the proof is that for η away from φ, the exponential factor $e^{in\eta}$ in this sequence helps average out parts of it when partial sums are taken. The second part of the lemma uses the $\ell^p$ condition and shows that it is allowed to replace an appearance of $e^{2ik(\theta_{n+1}-\theta_n)} - 1$ by its Taylor polynomial $P_{k,l}$ of a sufficient power.

Lemma 6.1. Let k ∈ Z and φ ∈ [0, 2π), with k and φ not both equal to 0. Let B ⊂ R be a finite set and f : R \ (B + 2πZ) → C be a continuous function such that g(η) = f(η)χ(kη − φ) is also continuous on R \ (B + 2πZ) (removable singularities in g are allowed). If $\{\Gamma_n\}$ has rotated bounded variation with phase φ and $\Gamma_n \to 0$, then
\[
f(\eta)\,\Gamma_n e^{ik[(n+1)\eta+2\theta_n]} \sim_B g(\eta)\,\Gamma_n e^{ik[(n+1)\eta+2\theta_n]}\bigl(e^{2ik(\theta_{n+1}-\theta_n)} - 1\bigr). \tag{6.5}
\]
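In particular, let $\Gamma_n = \beta_n^{(k_1)} \cdots \beta_n^{(k_s)}\, \bar\beta_n^{(l_1)} \cdots \bar\beta_n^{(l_t)}$ with $\phi = \phi_{k_1} + \cdots + \phi_{k_s} - \phi_{l_1} - \cdots - \phi_{l_t}$. If all $\beta^{(j)} \in \ell^p$ and $A_1 \subset B$, then
\[
f(\eta)\,\Gamma_n e^{ik[(n+1)\eta+2\theta_n]} \sim_B g(\eta)\,\Gamma_n e^{ik[(n+1)\eta+2\theta_n]}\, P_{k,p-s-t}\bigl(\alpha_n, e^{i[(n+1)\eta+2\theta_n]}\bigr). \tag{6.6}
\]
Proof. Start by substituting $f(\eta) = g(\eta)\bigl(e^{-i(k\eta-\phi)} - 1\bigr)$,
\[
f(\eta)\,\Gamma_n e^{ik[(n+1)\eta+2\theta_n]} = g(\eta)\bigl(e^{-i(k\eta-\phi)} - 1\bigr)\Gamma_n e^{ik[(n+1)\eta+2\theta_n]} = g(\eta)\bigl(e^{i\phi}\,\Gamma_n e^{ik[n\eta+2\theta_n]} - \Gamma_n e^{ik[(n+1)\eta+2\theta_n]}\bigr), \tag{6.7}
\]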
and note that g(η) is bounded on intervals I with dist(I, B + 2πZ) > 0. For a sequence $x_n(\eta)$ which converges to 0 uniformly in η away from B + 2πZ,
\[
\sum_{n=0}^{\infty} \bigl(x_n(\eta) - x_{n+1}(\eta)\bigr) = x_0(\eta),
\]
uniformly in η, so $x_n(\eta) \sim_B x_{n+1}(\eta)$. Taking $x_n(\eta) = e^{i\phi}\,\Gamma_n e^{ik[n\eta+2\theta_n]}$ gives
\[
e^{i\phi}\,\Gamma_n e^{ik[n\eta+2\theta_n]} \sim_B e^{i\phi}\,\Gamma_{n+1} e^{ik[(n+1)\eta+2\theta_{n+1}]}. \tag{6.8}
\]
Meanwhile, the rotated bounded variation condition for $\Gamma_n$ implies
\[
e^{i\phi}\,\Gamma_{n+1} e^{ik[(n+1)\eta+2\theta_{n+1}]} \sim_B \Gamma_n e^{ik[(n+1)\eta+2\theta_{n+1}]}. \tag{6.9}
\]
Applying (6.8) and then (6.9) to the first term of the right-hand side of (6.7) proves (6.5). To prove (6.6), use Lemma 2.2(ii),(v) to note that Γ has rotated bounded variation with phase φ. Using (5.5) and continuity of $h_l(\eta)$ away from $A_1$, on an interval I with dist(I, $A_1$ + 2πZ) > 0 we have
\[
|\alpha_n| \le C_1 \sum_{l=1}^{L} |\beta_n^{(l)}| \tag{6.10}
\]
for some constant $C_1$. Since $\beta^{(l)}$ are bounded sequences, $\alpha_n(\eta)$ is uniformly bounded for η ∈ I. Thus, (6.3) implies
\[
\bigl|e^{2ki(\theta_{n+1}-\theta_n)} - 1 - P_{k,p-s-t}\bigl(\alpha_n, e^{i[(n+1)\eta+2\theta_n]}\bigr)\bigr| \le C_2 |\alpha_n|^{p-s-t}.
\]
Combining this with (6.10) and $\Gamma_n = \beta_n^{(k_1)} \cdots \beta_n^{(k_s)}\, \bar\beta_n^{(l_1)} \cdots \bar\beta_n^{(l_t)}$, and using $\beta^{(j)} \in \ell^p$, we get
\[
g(\eta)\,\Gamma_n e^{ik[(n+1)\eta+2\theta_n]}\bigl(e^{2ki(\theta_{n+1}-\theta_n)} - 1 - P_{k,p-s-t}\bigl(\alpha_n, e^{i[(n+1)\eta+2\theta_n]}\bigr)\bigr) \sim_B 0.
\]
Subtracting this from (6.5) gives (6.6) and completes the proof.

Using this lemma, we can finish the proof for the $\ell^2$ case. Notice that the factor $\chi(\eta - \phi_l)$ is continuous away from $\phi_l \in A_2$, and that $h_l(\eta)$ are continuous away from $A_1 \subset A_2$. Also, from (6.4) or (5.4), (6.3) we have $e^{2i(\theta_{n+1}-\theta_n)} - 1 = O(|\alpha_n|)$, i.e. $P_{1,1} = 0$, so by Lemma 6.1,
\[
h_l(\eta)\,\beta_n^{(l)} e^{i[(n+1)\eta+2\theta_n]} \sim_{A_2} 0. \tag{6.11}
\]
Summing this over l and combining into (6.1) finally gives
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_2} 0,
\]
which completes the proof.
7. Proof in the $\ell^3$ Case

In this section, we present the proof in the $\ell^3$ case to provide further motivation for the general proof. Beyond $\ell^2$, Lemma 6.1 needs to be used iteratively, and the $\ell^3$ case illustrates the difficulties encountered in performing this iterative procedure. Taking the log of (5.3) and expanding in powers of $\alpha_n$, then using $O(|\alpha_n|^3) \sim_{A_1} 0$ implies
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_1} \operatorname{Re}\Bigl(-\alpha_n e^{i[(n+1)\eta+2\theta_n]} - \tfrac{1}{2}\alpha_n^2 e^{2i[(n+1)\eta+2\theta_n]} - c\,\alpha_n\bar\alpha_n e^{i[(n+1)\eta+2\theta_n]} + \tfrac{1}{2}\alpha_n\bar\alpha_n\Bigr). \tag{7.1}
\]
As in the $\ell^2$ case, we now want to apply Lemma 6.1 to parts of this expression. We begin with the first-order term in $\alpha_n$. In the $\ell^2$ case, using (5.5) to break up $\alpha_n$ and using Lemma 6.1 gave (6.11). However, applying the same lemma in the $\ell^3$ case, we need $P_{1,2}$ instead of $P_{1,1}$, since terms quadratic in the sequences $\beta^{(j)}$ cannot be automatically discarded. Thus, instead of (6.11) we get
\[
h_l(\eta)\,\beta_n^{(l)} e^{i[(n+1)\eta+2\theta_n]} \sim_{A_2} h_l(\eta)\,\chi(\eta - \phi_l)\,\beta_n^{(l)} e^{i[(n+1)\eta+2\theta_n]}\bigl(-c\alpha_n + c\bar\alpha_n - \bar\alpha_n e^{-i[(n+1)\eta+2\theta_n]} + \alpha_n e^{i[(n+1)\eta+2\theta_n]}\bigr). \tag{7.2}
\]
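Note that all terms on the right-hand side contain a $\beta_n^{(l)}$ and an $\alpha_n$ or $\bar\alpha_n$, so we have obtained a faster decaying expression in n, although at the cost of a singularity at η = $\phi_l$. Summing (7.2) over l and inserting into (7.1), and using (5.5) to replace $\alpha_n$ everywhere, we have
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_2} \operatorname{Re}\sum_{l,m=1}^{L} \bigl(X_{l,m} + Y_{l,m} + Z_{l,m} + T_{l,m}\bigr), \tag{7.3}
\]
where
\[
X_{l,m} = -\bigl(\tfrac{1}{2} + \chi(\eta - \phi_l)\bigr)\, h_l(\eta)h_m(\eta)\,\beta_n^{(l)}\beta_n^{(m)} e^{2i[(n+1)\eta+2\theta_n]}, \tag{7.4}
\]
\[
Y_{l,m} = \bigl(\tfrac{1}{2} + \chi(\eta - \phi_l)\bigr)\, h_l(\eta)\bar h_m(\eta)\,\beta_n^{(l)}\bar\beta_n^{(m)}, \tag{7.5}
\]
\[
Z_{l,m} = c\,\chi(\eta - \phi_l)\, h_l(\eta)h_m(\eta)\,\beta_n^{(l)}\beta_n^{(m)} e^{i[(n+1)\eta+2\theta_n]}, \tag{7.6}
\]
\[
T_{l,m} = -c\,\bigl(1 + \chi(\eta - \phi_l)\bigr)\, h_l(\eta)\bar h_m(\eta)\,\beta_n^{(l)}\bar\beta_n^{(m)} e^{i[(n+1)\eta+2\theta_n]}. \tag{7.7}
\]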
We proceed by applying Lemma 6.1 to these expressions. For OPRL, since singularities of $\chi(\eta - \phi_l - \phi_m)$ and $\chi(\eta - \phi_l + \phi_m)$ are inside $A_3$, applying Lemma 6.1 we get
\[
Z_{l,m} \sim_{A_3} 0, \tag{7.8}
\]
\[
T_{l,m} \sim_{A_3} 0. \tag{7.9}
\]
The same formulas hold for OPUC, but for a different reason: c = 0 implies that $Z_{l,m} = T_{l,m} = 0$, so (7.8) and (7.9) are trivial. This is why for OPUC, $\phi_l + \phi_m$ and $\phi_l - \phi_m$ do not need to be included in $A_3$.
For $X_{l,m}$, Lemma 6.1 gives a multiplicative factor $\chi(2\eta - \phi_l - \phi_m)$, which has singularities at $\eta = (\phi_l + \phi_m)/2 + \pi\mathbb{Z}$. These points are not in $A_3$, so it might seem that we will have to apply Lemma 6.1 with a set greater than $A_3$. We are saved by the observation
\[
\bigl(1 + \chi(\eta - \phi_l) + \chi(\eta - \phi_m)\bigr)\,\chi(2\eta - \phi_l - \phi_m) = \chi(\eta - \phi_l)\,\chi(\eta - \phi_m), \tag{7.10}
\]
which is straightforward to check from (6.2). Thus, applying Lemma 6.1 to $X_{l,m} + X_{m,l}$, the points $\eta = (\phi_l + \phi_m)/2 + \pi\mathbb{Z}$ are just removable singularities in (7.10) and we get
\[
X_{l,m} + X_{m,l} \sim_{A_2} 0. \tag{7.11}
\]
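The identity (7.10) and the removability of the singularity at $\eta = (\phi_l + \phi_m)/2$ can be observed numerically. The following sketch uses hypothetical function names and χ as defined in (6.2); it evaluates both sides of (7.10) at a generic point and very close to the midpoint, where the factor $\chi(2\eta - \phi_l - \phi_m)$ alone blows up.

```python
import cmath

def chi(eta):
    """chi(eta) = 1 / (exp(-i*eta) - 1), as in (6.2)."""
    return 1 / (cmath.exp(-1j * eta) - 1)

def lhs_7_10(eta, phi_l, phi_m):
    """Left-hand side of (7.10)."""
    return (1 + chi(eta - phi_l) + chi(eta - phi_m)) * chi(2 * eta - phi_l - phi_m)

def rhs_7_10(eta, phi_l, phi_m):
    """Right-hand side of (7.10), which is regular at the midpoint."""
    return chi(eta - phi_l) * chi(eta - phi_m)

phi_l, phi_m = 0.4, 1.9
midpoint = (phi_l + phi_m) / 2
gap_generic = abs(lhs_7_10(0.9, phi_l, phi_m) - rhs_7_10(0.9, phi_l, phi_m))
gap_near_pole = abs(lhs_7_10(midpoint + 1e-6, phi_l, phi_m)
                    - rhs_7_10(midpoint + 1e-6, phi_l, phi_m))
```

Both gaps should be at the level of rounding error, even though `chi(2 * eta - phi_l - phi_m)` is of order $10^5$ at the second sample point.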
Since (7.3) contains a sum over all l, m, this is sufficient for our purposes. Combining terms with different permutations of the same indices will also be used in the general case, to avoid unnecessarily expanding the set of critical points. Indeed, Sect. 8 generalizes the observation (7.10) to the general case. When $\phi_l \ne \phi_m$, $\chi(\phi_m - \phi_l)$ is just a finite constant so Lemma 6.1 can be applied to $Y_{l,m}$ to give
\[
Y_{l,m} \sim_{A_2} 0 \quad (\text{when } \phi_l \ne \phi_m). \tag{7.12}
\]
Combining (7.8), (7.9), (7.11) and (7.12) into (7.3), we have
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_3} \operatorname{Re}\sum_{\substack{1 \le l,m \le L \\ \phi_l = \phi_m}} Y_{l,m}. \tag{7.13}
\]
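Lemma 6.1 is not applicable to the remaining $Y_{l,m}$'s, but we are again saved by an observation that
\[
\operatorname{Re}\bigl(\tfrac{1}{2} + \chi(\eta - \phi_l)\bigr) = 0. \tag{7.14}
\]
Because of this, when $\phi_l = \phi_m$,
\[
\bar Y_{l,m} = -\bigl(\tfrac{1}{2} + \chi(\eta - \phi_l)\bigr)\, \bar h_l(\eta) h_m(\eta)\, \bar\beta_n^{(l)}\beta_n^{(m)} = -Y_{m,l},
\]
so $\operatorname{Re}(Y_{l,m} + Y_{m,l}) = 0$ and (7.13) becomes
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_3} 0, \tag{7.15}
\]
which completes the proof.

In the proof above the observation (7.14) was crucial. To try to arrive at a more illuminating proof, let us focus on OPUC (where $h_l(\eta) = 1$) and assume that instead of (7.13) we have, more generally,
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_3} \operatorname{Re}\sum_{\substack{1 \le l,m \le L \\ \phi_l = \phi_m}} f_l(\eta)\,\beta_n^{(l)}\bar\beta_n^{(m)}. \tag{7.16}
\]
We will now show that $\operatorname{Re} f_l(\eta) = 0$ for all l and η by proving that the contrary leads to a contradiction with Lemma 3.1(ii). Assume $\operatorname{Re} f_k(\eta_0) \ne 0$ for some k and $\eta_0$. Let
\[
\beta_n^{(l)} = \begin{cases} e^{-in\phi_k}/(n+2)^{1/2} & \text{for } l = k \\ 0 & \text{else.} \end{cases} \tag{7.17}
\]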
We have suppressed all $\beta^{(l)}$ with $l \ne k$. We have chosen n + 2 in order to make all $|\beta_n^{(k)}| < 1$; note that this makes $\alpha_n = \beta_n^{(k)}$ an allowed choice of Verblunsky coefficients, corresponding by Verblunsky's theorem to a unique probability measure on the unit circle. With the choice (7.17), (7.16) becomes
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_3} \operatorname{Re} f_k(\eta)/(n+2). \tag{7.18}
\]
Since the harmonic series is divergent and $\operatorname{Re} f_k(\eta)$ is continuous in η, depending on the sign of $\operatorname{Re} f_k(\eta_0)$, summing (7.18) in n gives $\log r_n(\eta) \to \pm\infty$ uniformly in a neighborhood of $\eta_0$. However, this is a contradiction with Lemma 3.1(ii). Thus, $\operatorname{Re} f_l(\eta) = 0$, so (7.16) becomes (7.15), which completes this alternative proof for OPUC. This method can be applied to OPRL as well, with one extra difficulty: the $\beta^{(l)}$'s are not independent there, so in constructing counterexamples we have to be more careful than in (7.17). Indeed, instead of relying on observations of the type (7.14), this will be the method we will apply to the general $\ell^p$ case in Sect. 9.

8. Narrowing the Set of Possible Pure Points

In the previous section, if we had not made the observation (7.10) telling us that $\eta = \frac{\phi_k + \phi_l}{2} + \pi\mathbb{Z}$ are removable singularities, we would have only proved equisummability away from a larger set of points, and we would have had a weaker result on the set of possible pure points. In this section, we generalize that observation to $\ell^p$. In the $\ell^p$ case, iterations of Lemma 6.1 give multiplicative factors of the form
\[
\chi\Bigl(k\eta - \sum_{a=1}^{i} \phi_{m_a} + \sum_{b=1}^{j} \phi_{n_b}\Bigr)
\]
with k ≤ i and i + j < p. Such a factor has singularities at
\[
\eta = \frac{1}{k}\Bigl(\sum_{a=1}^{i} \phi_{m_a} - \sum_{b=1}^{j} \phi_{n_b}\Bigr) + \frac{1}{k}\,2\pi\mathbb{Z}. \tag{8.1}
\]
Surprisingly, with a more careful analysis shown in this section, all the singularities corresponding to k ≥ 2 will turn into removable singularities where needed, so they do not have to be included in $A_p$. The analysis that follows is quite technical, but the reader not interested in this aspect of the results may skip to the next section and replace the set $A_p$ by a greater (but still finite) set, containing all elements of the form (8.1) with k ≤ i and i + j < p.

First let us set some conventions and definitions. We will use the Kronecker symbol $\delta_n$, which is 1 if n = 0 and 0 otherwise. Note that
\[
\sum_{i=0}^{I} \delta_{i-k}\,\delta_{I-i-(K-k)} = \delta_{I-K}. \tag{8.2}
\]
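We will use the combinatorial convention for binomial coefficients, i.e.
\[
\binom{n}{k} = \begin{cases} \dfrac{n!}{k!\,(n-k)!} & \text{if } 0 \le k \le n \\[1ex] 0 & \text{else.} \end{cases} \tag{8.3}
\]
Two identities will be useful: for l, m, n ≥ 0,
\[
\sum_{k=0}^{l} \binom{m}{k}\binom{n}{l-k} = \binom{m+n}{l}, \tag{8.4}
\]
\[
\sum_{k=0}^{l} \binom{m+k}{m}\binom{n+l-k}{n} = \binom{l+m+n+1}{m+n+1}. \tag{8.5}
\]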
Equation (8.4) is just Vandermonde's identity. The more obscure (8.5) has a combinatorial proof, by double-counting the number of subsets of {1, . . . , l + n + m + 1} with exactly m + n + 1 elements: observe that the number of such subsets whose (m + 1)-st smallest element is m + k + 1 is exactly $\binom{m+k}{m}\binom{n+l-k}{n}$.

We also need a kind of symmetrized product of functions:

Definition 8.1. For a function $p_{I,J}$ of 1 + I + J variables and a function $q_{K,L}$ of 1 + K + L variables, we define their symmetric product as a function $p_{I,J} \odot q_{K,L}$ of 1 + (I + K) + (J + L) variables by
\[
(p_{I,J} \odot q_{K,L})\bigl(\eta; \{x_i\}_{i=1}^{I+K}; \{y_j\}_{j=1}^{J+L}\bigr) = \frac{1}{(I+K)!\,(J+L)!} \sum_{\sigma \in S_{I+K}} \sum_{\tau \in S_{J+L}} r_{\sigma,\tau}
\]
with $S_n$ the symmetric group in n elements and
\[
r_{\sigma,\tau} = p_{I,J}\bigl(\eta; \{x_{\sigma(i)}\}_{i=1}^{I}; \{y_{\tau(j)}\}_{j=1}^{J}\bigr)\; q_{K,L}\bigl(\eta; \{x_{\sigma(i)}\}_{i=I+1}^{I+K}; \{y_{\tau(j)}\}_{j=J+1}^{J+L}\bigr).
\]
It is straightforward to see that ⊙ is commutative and associative.

Assuming we are in the $\ell^p$ case, expanding the log of (5.3) in powers of $\alpha_n$ and using $O(|\alpha_n|^p) \sim_{A_1} 0$ gives
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_1} -\operatorname{Re}\sum_{\substack{K,L \ge 0 \\ 0 < K+L < p}} \frac{1}{K+L}\binom{K+L}{K}\bigl(\alpha_n e^{i[(n+1)\eta+2\theta_n]}\bigr)^K (c\bar\alpha_n)^L + \frac{1}{2}\sum_{\substack{k,l \ge 0 \\ 0 < k+l < p}} \frac{1}{k+l}\binom{k+l}{k}(c\alpha_n + c\bar\alpha_n)^k \bigl((1 - c^2)\alpha_n\bar\alpha_n\bigr)^l. \tag{8.6}
\]
Note that this is of the form
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_1} \operatorname{Re}\sum_{\substack{I,J,K,L \ge 0 \\ I+J < p}} \xi_{I,J,K,L}\, \alpha_n^I\, \bar\alpha_n^J\, e^{iK[(n+1)\eta+2\theta_n]}\, c^L, \tag{8.7}
\]
where $\xi_{I,J,K,L}$ are constants. For K > 0 only the first sum in (8.6) contributes to $\xi_{I,J,K,L}$ and we read off their values,
\[
\xi_{I,J,K,L} = \delta_{I-K}\,\delta_{J-L}\,\frac{1}{K+L}\binom{K+L}{K} \quad (\text{for } K > 0) \tag{8.8}
\]
(the values for K = 0 will turn out to be of no importance to us). Our method is to substitute $\alpha_n$ using (5.5) and apply Lemma 6.1 to terms of the form
\[
f(\eta) \prod_{i=1}^{I} \bigl(h_{k_i}(\eta)\,\beta_n^{(k_i)}\bigr) \prod_{j=1}^{J} \bigl(\bar h_{l_j}(\eta)\,\bar\beta_n^{(l_j)}\bigr)\, e^{iK[(n+1)\eta+2\theta_n]}\, c^L \tag{8.9}
\]
in increasing order of I + J. Note that this term will occur in all possible permutations of $k_1, \ldots, k_I$ and of $l_1, \ldots, l_J$, so we can average in those terms before applying Lemma 6.1. After such averaging, the function f(η) in the term (8.9) is of the form $f_{I,J,K,L}(\eta; \phi_{k_1}, \ldots, \phi_{k_I}; \phi_{l_1}, \ldots, \phi_{l_J})$, and the corresponding g(η) constructed by Lemma 6.1 is
\[
g_{I,J,K,L} = \chi\Bigl(K\eta - \sum_{i=1}^{I} \phi_{k_i} + \sum_{j=1}^{J} \phi_{l_j}\Bigr) f_{I,J,K,L}. \tag{8.10}
\]
All terms we encounter have I, J, K, L ≥ 0, so we define
\[
f_{I,J,K,L} = g_{I,J,K,L} = 0 \quad \text{unless } I, J, K, L \ge 0. \tag{8.11}
\]
Note that $f_{I,J,K,L}$ and $g_{I,J,K,L}$ are well-defined functions of 1 + I + J parameters, and that they are symmetric in the I parameters $\phi_{k_i}$ and also in the J parameters $\phi_{l_j}$. Our goal is precisely to show that $g_{I,J,K,L}$ has its singularities only at points of the form (8.1) with k = 1. To do this, we will first establish a recurrence relation for these functions. Any contribution to $f_{I,J,K,L}$ is either $\xi_{I,J,K,L}$ from the starting expression (8.7) or comes from an earlier term as $g_{\iota,j,k,l}$ multiplied by a constant from the Taylor expansion $P_{k,p-\iota-j}$ of $e^{2ik(\theta_{n+1}-\theta_n)} - 1$. Starting from (6.4) and expanding, we have
\[
P_{k,l}\bigl(\alpha_n, e^{i[(n+1)\eta+2\theta_n]}\bigr) = \sum_{\substack{\alpha,\beta,\gamma,\delta \ge 0 \\ 0 < \alpha+\beta+\gamma+\delta < l}} (-1)^{\gamma+\delta} \binom{k+\alpha+\beta-1}{\alpha+\beta}\binom{\alpha+\beta}{\alpha}\binom{k}{\gamma+\delta}\binom{\gamma+\delta}{\gamma}\, (\alpha_n)^{\alpha+\delta} (\bar\alpha_n)^{\beta+\gamma} \bigl(e^{i[(n+1)\eta+2\theta_n]}\bigr)^{\alpha-\gamma} c^{\beta+\delta}. \tag{8.12}
\]
From (8.12) we read off the value of the constant multiplying $g_{\iota,j,k,l}$, and matching the powers of $\alpha_n$, $\bar\alpha_n$, $e^{i[(n+1)\eta+2\theta_n]}$, and c, we get I = ι + α + δ, J = j + β + γ, K = k + α − γ, L = l + β + δ. Since $f_{I,J,K,L}$ is then symmetrized in the appropriate variables, every product of $g_{\iota,j,k,l}$ by a constant becomes a symmetric product, so
\[
f_{I,J,K,L} = \xi_{I,J,K,L} + \sum_{\substack{\alpha,\beta,\gamma,\delta \ge 0 \\ \alpha+\beta+\gamma+\delta \ge 1}} \omega_{K,\alpha,\beta,\gamma,\delta} \odot g_{I-\alpha-\delta,\,J-\beta-\gamma,\,K+\gamma-\alpha,\,L-\beta-\delta} \tag{8.13}
\]
with $\omega_{K,\alpha,\beta,\gamma,\delta}$ a constant function of 1 + (α + δ) + (β + γ) variables,
\[
\omega_{K,\alpha,\beta,\gamma,\delta} = (-1)^{\gamma+\delta} \binom{K+\gamma+\beta-1}{\alpha+\beta}\binom{K+\gamma-\alpha}{\gamma+\delta}\binom{\alpha+\beta}{\alpha}\binom{\gamma+\delta}{\gamma} \tag{8.14}
\]
(this is the constant from (8.12), with the replacement k = K + γ − α). By the convention (8.3), the right-hand side of (8.14) is 0 unless K ≥ 1 and α, β, γ, δ ≥ 0.
We have found the desired recursion relation, in the form of (8.13). Note that (8.10), (8.11) and (8.13) determine the $f_{I,J,K,L}$ and $g_{I,J,K,L}$ uniquely. Since $\omega_{K,0,0,0,0} = 1$, it is convenient to define
\[
h_{I,J,K,L} = f_{I,J,K,L} + g_{I,J,K,L}, \tag{8.15}
\]
and rewrite (8.13) as
\[
h_{I,J,K,L} = \xi_{I,J,K,L} + \sum_{\alpha,\beta,\gamma,\delta \ge 0} \omega_{K,\alpha,\beta,\gamma,\delta} \odot g_{I-\alpha-\delta,\,J-\beta-\gamma,\,K+\gamma-\alpha,\,L-\beta-\delta}. \tag{8.16}
\]
Note that (8.15) and (8.10) imply
\[
h_{I,J,K,L} = g_{I,J,K,L}\, \exp\Bigl(-i\Bigl(K\eta - \sum_{i=1}^{I} \phi_{k_i} + \sum_{j=1}^{J} \phi_{l_j}\Bigr)\Bigr). \tag{8.17}
\]
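It will be useful to introduce a rescaled version of functions introduced so far. Define $\Omega_{K,\alpha,\beta,\gamma,\delta}$ as a function of 1 + (α + δ) + (β + γ) variables,
\[
\Omega_{K,\alpha,\beta,\gamma,\delta} = (-1)^{\gamma+\delta} \binom{K+\gamma+\beta-1}{K-1}\binom{K}{\alpha+\delta}\binom{\alpha+\delta}{\alpha}\binom{\beta+\gamma}{\beta}. \tag{8.18}
\]
By (8.3), this is equal to 0 unless K ≥ 1 and α, β, γ, δ ≥ 0. Define $\Xi_{I,J,K,L}$ as a function of 1 + I + J variables equal to
\[
\Xi_{I,J,K,L} = \delta_{I-K}\,\delta_{J-L}\binom{K+L-1}{K-1}. \tag{8.19}
\]
By (8.3), this is equal to 0 unless I = K ≥ 1 and J = L ≥ 0. It is straightforward to check
\[
(K + \gamma - \alpha)\,\Omega_{K,\alpha,\beta,\gamma,\delta} = K\,\omega_{K,\alpha,\beta,\gamma,\delta}, \tag{8.20}
\]
\[
\Xi_{I,J,K,L} = K\,\xi_{I,J,K,L}, \tag{8.21}
\]
so if we define
\[
G_{I,J,K,L} = K\,g_{I,J,K,L}, \tag{8.22}
\]
\[
H_{I,J,K,L} = K\,h_{I,J,K,L}, \tag{8.23}
\]
then multiplying (8.16) and (8.17) by K gives
\[
H_{I,J,K,L} = \Xi_{I,J,K,L} + \sum_{\alpha,\beta,\gamma,\delta \ge 0} \Omega_{K,\alpha,\beta,\gamma,\delta} \odot G_{I-\alpha-\delta,\,J-\beta-\gamma,\,K+\gamma-\alpha,\,L-\beta-\delta}, \tag{8.24}
\]
\[
H_{I,J,K,L} = G_{I,J,K,L}\, \exp\Bigl(-i\Bigl(K\eta - \sum_{i=1}^{I} \phi_{k_i} + \sum_{j=1}^{J} \phi_{l_j}\Bigr)\Bigr). \tag{8.25}
\]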
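The rescaling relations (8.20) and (8.21), as well as the convolution identities (8.27) and (8.28) of Lemma 8.1 below, involve only integer constants and can be spot-checked by brute force. The sketch below is illustrative only: the function names are hypothetical, the arguments `a, b, g, d` stand for α, β, γ, δ, and the constants are coded as read from (8.8), (8.14), (8.18) and (8.19) with the convention (8.3).

```python
from fractions import Fraction
from itertools import product
from math import comb

def binom(n, k):
    """Binomial coefficient with the convention (8.3)."""
    return comb(n, k) if 0 <= k <= n else 0

def omega(K, a, b, g, d):
    """Constant of (8.14)."""
    return ((-1) ** (g + d) * binom(K + g + b - 1, a + b) * binom(K + g - a, g + d)
            * binom(a + b, a) * binom(g + d, g))

def omega_rescaled(K, a, b, g, d):
    """Rescaled constant of (8.18)."""
    return ((-1) ** (g + d) * binom(K + g + b - 1, K - 1) * binom(K, a + d)
            * binom(a + d, a) * binom(b + g, b))

def xi(I, J, K, L):
    """Constant of (8.8), for K > 0."""
    if I != K or J != L:
        return Fraction(0)
    return Fraction(binom(K + L, K), K + L)

def xi_rescaled(I, J, K, L):
    """Rescaled constant of (8.19)."""
    return binom(K + L - 1, K - 1) if (I == K and J == L) else 0

def gap_8_27(I, J, K, L, k):
    """Left side minus right side of (8.27), for 0 < k < K."""
    lhs = sum(xi_rescaled(i, j, k, l) * xi_rescaled(I - i, J - j, K - k, L - l)
              for i in range(I + 1) for j in range(J + 1) for l in range(L + 1))
    return lhs - xi_rescaled(I, J, K, L)

def gap_8_28(K, k, A, B, C, D):
    """Left side minus right side of (8.28), for 0 < k < K."""
    lhs = sum(omega_rescaled(K - k, A - a, B - b, C - c, D - d)
              * omega_rescaled(k, a, b, c, d)
              for a in range(A + 1) for b in range(B + 1)
              for c in range(C + 1) for d in range(D + 1))
    return lhs - omega_rescaled(K, A, B, C, D)

# Spot-check (8.20) on a small grid of indices.
ok_8_20 = all((K + g - a) * omega_rescaled(K, a, b, g, d) == K * omega(K, a, b, g, d)
              for K in range(1, 5) for a, b, g, d in product(range(4), repeat=4))
```

Since these constants do not depend on the phase variables, the symmetric product reduces to ordinary multiplication in these checks.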
We are striving to prove the identity
\[
\sum_{i,j,l \ge 0} G_{i,j,k,l} \odot G_{I-i,\,J-j,\,K-k,\,L-l} = \begin{cases} G_{I,J,K,L} & \text{if } 0 < k < K \\ 0 & \text{else.} \end{cases} \tag{8.26}
\]
Comparing with the $\ell^3$ case, the observation (7.10) is a special case of this identity, namely, $G_{2,0,2,0} = G_{1,0,1,0} \odot G_{1,0,1,0}$ (since $G_{0,0,1,0} = 0$ is easily computed from the recurrence relations). The following lemma proves identity (8.26) and uses it to describe non-removable singularities of $f_{I,J,K,L}$ and $g_{I,J,K,L}$. It also analyzes the case L = 0 in particular, since this is the only case that matters for OPUC (c = 0 means that (8.9) vanishes for L > 0).

Lemma 8.1. For I, J, K, L, k, A, B, C, D ∈ Z, the following are true:
(i) For 0 < k < K,
\[
\sum_{i=0}^{I}\sum_{j=0}^{J}\sum_{l=0}^{L} \Xi_{i,j,k,l} \odot \Xi_{I-i,\,J-j,\,K-k,\,L-l} = \Xi_{I,J,K,L}. \tag{8.27}
\]
(ii) For 0 < k < K,
\[
\sum_{a=0}^{A}\sum_{b=0}^{B}\sum_{c=0}^{C}\sum_{d=0}^{D} \Omega_{K-k,\,A-a,\,B-b,\,C-c,\,D-d} \odot \Omega_{k,a,b,c,d} = \Omega_{K,A,B,C,D}. \tag{8.28}
\]
(iii) For k ≥ 1,
\[
\sum_{i,j,l \ge 0} \Xi_{i,j,k,l} \odot G_{I-i,\,J-j,\,K-k,\,L-l} = \sum_{\substack{\alpha,\beta,\gamma,\delta \ge 0 \\ \alpha \ge \gamma + k}} \Omega_{k,\alpha,\beta,\gamma,\delta} \odot G_{I-\alpha-\delta,\,J-\beta-\gamma,\,K+\gamma-\alpha,\,L-\beta-\delta}. \tag{8.29}
\]
(iv) (8.26) holds for all I, J, K, L ∈ Z.
(v) Non-removable singularities of $f_{I,J,K,L}$ are of the form (8.1) with k = 1 and i + j < I + J.
(vi) Non-removable singularities of $g_{I,J,K,L}$ are of the form (8.1) with k = 1 and i + j ≤ I + J.
(vii) Non-removable singularities of $f_{I,J,K,0}$ are of the form (8.1) with k = i − j = 1 and i + j < I + J.
(viii) Non-removable singularities of $g_{I,J,K,0}$ are of the form (8.1) with k = i − j = 1 and i + j ≤ I + J.

Proof. (i) First note that both sides of (8.27) are zero unless I, J, L ≥ 0. If I, J, L ≥ 0, using the definition (8.19), (8.27) follows from a double application of (8.2) to resolve the sums in i and j, and (8.5) to resolve the sum in l.

(ii) First note that both sides of (8.28) are zero unless A, B, C, D ≥ 0. If A, B, C, D ≥ 0, using the definition (8.18), the left-hand side of (8.28) becomes a product of a sum in indices a and d and a sum in b and c. For the sum in a and d, we introduce a change of indices to x = a + d instead of d. Since the summand is 0 outside the limits of summation, including some extra terms does not alter the sum, so
\[
\sum_{a=0}^{A}\sum_{d=0}^{D} \binom{K-k}{A+D-a-d}\binom{A+D-a-d}{A-a}\binom{k}{a+d}\binom{a+d}{a} = \sum_{x=0}^{A+D}\sum_{a=0}^{x} \binom{K-k}{A+D-x}\binom{A+D-x}{A-a}\binom{k}{x}\binom{x}{a} = \binom{K}{A+D}\binom{A+D}{A}
\]
after a double application of (8.4), first to compute the sum in a, and then to compute the sum in x. In the sum over b and c, we introduce a change of indices to y = b + c instead of c. Analogously to the previous sum, since the summand is 0 outside the limits of summation,
\[
\sum_{b=0}^{B}\sum_{c=0}^{C} \binom{K-k+B-b+C-c-1}{K-k-1}\binom{B+C-b-c}{B-b}\binom{k+c+b-1}{k-1}\binom{b+c}{b} = \sum_{y=0}^{B+C}\sum_{b=0}^{y} \binom{K-k+B+C-y-1}{K-k-1}\binom{B+C-y}{B-b}\binom{k+y-1}{k-1}\binom{y}{b} = \binom{K+B+C-1}{K-1}\binom{B+C}{B},
\]
where we have used (8.4) to compute the sum in b, then (8.5) to compute the sum in y. Multiplying the two sums completes the proof of (8.28).

(iii) By (8.19), $\Xi_{i,j,k,l}$ is only non-zero if i = k and j = l, so the left-hand side of (8.29) becomes just a sum over l,
\[
\sum_{l \ge 0} \Xi_{k,l,k,l} \odot G_{I-k,\,J-l,\,K-k,\,L-l}.
\]
By (8.18), $\Omega_{k,\alpha,\beta,\gamma,\delta}$ has $\binom{k}{\alpha+\delta}$ as one of the factors, so it can only be non-zero if α + δ ≤ k. Coupled with α ≥ γ + k and γ, δ ≥ 0, this gives α = k, γ = δ = 0, so the right-hand side of (8.29) becomes
\[
\sum_{\beta \ge 0} \Omega_{k,k,\beta,0,0} \odot G_{I-k,\,J-\beta,\,K-k,\,L-\beta}.
\]
The proof is completed by $\Xi_{k,\beta,k,\beta} = \binom{k+\beta-1}{k-1} = \Omega_{k,k,\beta,0,0}$.

(iv) If k ≤ 0, then $G_{i,j,k,l} = k\,g_{i,j,k,l} = 0$ by definition. For K − k ≤ 0, analogously $G_{I-i,\,J-j,\,K-k,\,L-l} = 0$. For 0 < k < K, we prove (8.26) by complete induction on I + J. Both sides are 0 if I + J < 0, which provides the basis of induction. Assume that (8.26) holds when I + J < M. For I + J = M, start from
\[
\sum_{i,j,l \ge 0} H_{i,j,k,l} \odot H_{I-i,\,J-j,\,K-k,\,L-l}, \tag{8.30}
\]
and use (8.24) to replace $H_{i,j,k,l}$ and $H_{I-i,\,J-j,\,K-k,\,L-l}$. That gives four sums, one of terms of the form Ξ ⊙ Ξ, two of the form Ξ ⊙ Ω ⊙ G and one of the form Ω ⊙ G ⊙ Ω ⊙ G. Use (8.27) to compute the sum of Ξ ⊙ Ξ, use (8.29) to replace the sums of Ξ ⊙ G by sums of Ω ⊙ G, and use the inductive assumption to replace the sum of G ⊙ G by a sum of G (this will be possible for all terms except $\Omega_{K-k,0,0,0,0} \odot \Omega_{k,0,0,0,0} \odot G_{I,J,K,L}$ because for that term I + J is not less than M). Finally using (8.28) to replace the sum of Ω ⊙ Ω ⊙ G by a sum of Ω ⊙ G and using (8.25) to combine terms, we conclude that (8.30) is equal to
\[
H_{I,J,K,L} - G_{I,J,K,L} + \sum_{i,j,l \ge 0} G_{i,j,k,l} \odot G_{I-i,\,J-j,\,K-k,\,L-l}. \tag{8.31}
\]
However, applying (8.25) to $H_{I,J,K,L}$, $H_{i,j,k,l}$, $H_{I-i,\,J-j,\,K-k,\,L-l}$, one gets
\[
\frac{\sum_{i,j,l \ge 0} H_{i,j,k,l} \odot H_{I-i,\,J-j,\,K-k,\,L-l}}{\sum_{i,j,l \ge 0} G_{i,j,k,l} \odot G_{I-i,\,J-j,\,K-k,\,L-l}} = \frac{H_{I,J,K,L}}{G_{I,J,K,L}}. \tag{8.32}
\]
From (8.30) = (8.31) and (8.32), we conclude that (8.26) holds for our choice of I, J, K, L, which completes the inductive step.

We prove (v) and (vi) simultaneously by induction on I + J. If (vi) holds for I + J < M: by (8.13), singularities of $f_{I,J,K,L}$ come from a $g_{i,j,k,l}$ with i + j < I + J, so (v) then holds for I + J ≤ M. If (v) holds for I + J < M: by applying (8.26) K − 1 times, $g_{I,J,K,L}$ can be written as a sum of K-fold products of $g_{i,j,1,l}$ with i + j ≤ I + J. Thus, all its non-removable singularities are singularities of a $g_{i,j,1,l}$ with i + j ≤ I + J. By (8.10), those can only be of the form (8.1) with k = 1, or coming from $f_{i,j,1,l}$. Thus, (vi) holds for I + J < M.

For (vii) and (viii), note that in the L = 0 case (8.13) becomes
\[
f_{I,J,K,0} = \xi_{I,J,K,0} + \sum_{\substack{\alpha,\gamma \ge 0 \\ \alpha+\gamma \ge 1}} \omega_{K,\alpha,0,\gamma,0} \odot g_{I-\alpha,\,J-\gamma,\,K+\gamma-\alpha,\,0}, \tag{8.33}
\]
where $\xi_{I,J,K,0} = \delta_{I-K}\,\delta_J$. Induction on (8.33) using (8.10) then shows that $f_{I,J,K,0} = g_{I,J,K,0} = 0$ unless I − J = K. With this observation in mind, the proof of (vii) and (viii) is analogous to the proof of (v) and (vi) above, using (8.33) instead of (8.13).

For OPRL, if we are in the $\ell^p$ case, we encounter functions $f_{I,J,K,L}$ and $g_{I,J,K,L}$ with I + J < p. Lemma 8.1(v),(vi) implies that all of their non-removable singularities are of the form (8.1) with k = 1 and i + j < p. All such points are in the set $A_p$ given by (5.8), so all iterations of Lemma 6.1 can be performed away from $A_p$.

For OPUC, since c = 0, terms with L > 0 vanish. For terms with L = 0, Lemma 8.1(vii),(viii) implies that all non-removable singularities of $f_{I,J,K,0}$ and $g_{I,J,K,0}$ are of the form (8.1) with k = i − j = 1 and i + j < p. All such points are in the set $A_p$ given by (5.8), so all iterations of Lemma 6.1 can be performed away from $A_p$.

9. Proof in the General Case

In this section, we complete the proofs of Theorems 1.1 and 1.2 in the general $\ell^p$ case. As hinted before, the key idea will be to use Lemma 3.1(ii) and Lemma 4.1(ii); we will be able to prove that if $\log r_n$ did not converge as desired, it would be possible to construct a set of recursion coefficients (corresponding to a measure) for which it diverged uniformly on an interval, contradicting Lemma 3.1(ii) or Lemma 4.1(ii).

As explained in the previous section, the first step in the proof is to start with (8.6) and iteratively apply Lemma 6.1 to terms of the form
\[
f_{I,J,K,L}\bigl(\eta; \{\phi_{k_i}\}_{i=1}^{I}; \{\phi_{l_j}\}_{j=1}^{J}\bigr) \prod_{i=1}^{I} \bigl(h_{k_i}(\eta)\,\beta_n^{(k_i)}\bigr) \prod_{j=1}^{J} \bigl(\bar h_{l_j}(\eta)\,\bar\beta_n^{(l_j)}\bigr)\, e^{iK[(n+1)\eta+2\theta_n]}\, c^L
\]
in increasing order of I + J. In the previous section, we have seen that the only singularities we will encounter in these iterations are in $A_p$. Lemma 6.1 can be applied to a term unless K = 0 and φ ∈ 2πZ, so after the iterative procedure, what remains is a sum of such terms,
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_p} \operatorname{Re}\sum \Bigl( f_{I,J,0,L}\bigl(\eta; \{\phi_{k_i}\}_{i=1}^{I}; \{\phi_{l_j}\}_{j=1}^{J}\bigr) \prod_{i=1}^{I} \bigl(h_{k_i}(\eta)\,\beta_n^{(k_i)}\bigr) \prod_{j=1}^{J} \bigl(\bar h_{l_j}(\eta)\,\bar\beta_n^{(l_j)}\bigr)\, c^L \Bigr), \tag{9.1}
\]
with the sum going over (I + J)-tuples $(k_1, \ldots, k_I, l_1, \ldots, l_J)$ with
\[
\phi_{k_1} + \cdots + \phi_{k_I} - \phi_{l_1} - \cdots - \phi_{l_J} = 0, \tag{9.2}
\]
and I + J < p. At this point, a change of notation will be useful. Our proof in this section will rely on constructing counterexamples, and for that it would be useful to be able to construct the $\beta^{(l)}$'s independently. For OPUC this is possible, but for OPRL, by Lemma 2.2(vii), the $\beta^{(l)}$'s come in complex-conjugate pairs: for every $\beta^{(l)}$ there is a $\beta^{(k)} = \bar\beta^{(l)}$. For each such pair, let us keep only one of the two sequences, say $\beta^{(l)}$, and replace $\beta^{(k)}$ everywhere by $\bar\beta^{(l)}$. This is equivalent to replacing (5.5) by
\[
\alpha_n(\eta) = \sum_{l=1}^{L} h_l(\eta)\bigl(\beta_n^{(l)} + c\,\bar\beta_n^{(l)}\bigr). \tag{9.3}
\]
Notice that the right-hand side of (9.1) is the real part of a polynomial in $\beta_n^{(l)}$ and $\bar\beta_n^{(l)}$, with coefficients continuous in η. Denoting this polynomial by Q, (9.1) becomes
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_p} \operatorname{Re} Q\bigl(\eta; \beta_n^{(1)}, \ldots, \beta_n^{(L)}; \bar\beta_n^{(1)}, \ldots, \bar\beta_n^{(L)}\bigr). \tag{9.4}
\]
We now make the claim that the right-hand side vanishes identically.

Lemma 9.1. For all $\eta \notin A_p + 2\pi\mathbb{Z}$ and all $z_1, \ldots, z_L \in \mathbb{C}$,
\[
\operatorname{Re} Q(\eta; z_1, \ldots, z_L; \bar z_1, \ldots, \bar z_L) = 0. \tag{9.5}
\]
Proof. The proof will proceed by contradiction. Split $Q$ into a sum of homogeneous polynomials $Q_1, \dots, Q_{p-1}$ with $\deg Q_k = k$. If the claim of the lemma is false, then there exists a smallest $k$ such that $\operatorname{Re} Q_k$ does not vanish identically, and a choice of $\eta_0, z_1, \dots, z_L$ such that $\operatorname{Re} Q_k(\eta_0; z_1, \dots, z_L; \bar z_1, \dots, \bar z_L) \neq 0$.

Since $Q$ depends only on the values of $p$, the phases $\phi_1, \dots, \phi_L$, and $h_1(\eta), \dots, h_L(\eta)$, but not on $\beta_n^{(l)}$, we are free to make a choice for $\beta_n^{(l)}$. Let
\[
\beta_n^{(l)} =
\begin{cases}
z_l\, e^{-in\phi_l}\, n^{-1/(p-1)} & \text{for } n \ge n_0, \\
0 & \text{for } n < n_0.
\end{cases}
\tag{9.6}
\]
Note that $\beta^{(l)} \in \ell^p \cap GBV(\phi_l)$. Through (9.3), this choice of $\beta^{(l)}$ corresponds to a sequence of recursion coefficients, if we choose $n_0$ large enough that the recursion coefficients are in the allowed range ($|\alpha_n| < 1$ for OPUC, $a_n^2 - 1 > -1$ for OPRL). Verblunsky's or Favard's theorem then implies that (9.6) corresponds to a probability measure on the unit circle or real line. Thus, (9.4) holds for the choice (9.6).

For every monomial $\beta_n^{(k_1)} \cdots \beta_n^{(k_I)} \bar\beta_n^{(l_1)} \cdots \bar\beta_n^{(l_J)}$ in $Q$, the condition (9.2) is satisfied, so the factors $e^{-in\phi_l}$ cancel out completely in $Q$, and substituting (9.6) into (9.4) gives
\[
\log\frac{r_{n+1}}{r_n} \sim_{A_p} \sum_{l=1}^{p-1} \operatorname{Re} Q_l(\eta; z_1, \dots, z_L; \bar z_1, \dots, \bar z_L)\, n^{-l/(p-1)}.
\tag{9.7}
\]
Summing (9.7) in $n$, the non-zero term with $l = k$ will dominate the sum, and since $\sum_{n=1}^{\infty} n^{-k/(p-1)} = \infty$, this will imply that $\log r_n$ diverges to $+\infty$ or $-\infty$ (depending on the sign of $\operatorname{Re} Q_k$) uniformly in $\eta$ in a neighborhood of $\eta_0$. By Lemma 3.1(ii) or Lemma 4.1(ii), this is a contradiction, so (9.5) holds.

Having proved Lemma 9.1, (9.4) becomes (5.9). By Lemma 3.1(i) and Lemma 4.1(i), this completes the proof of Theorems 1.1 and 1.2.

Acknowledgement. It is my pleasure to thank my advisor, Professor Barry Simon, for suggesting this problem and for his guidance and helpful discussions.
References
1. Blumenthal, O.: Ueber die Entwicklung einer willkürlichen Funktion nach den Nennern des Kettenbruches für $\int_{-\infty}^{0} \frac{\varphi(\xi)\,d\xi}{x-\xi}$. Ph.D. dissertation, Göttingen, 1898
2. Breuer, J.: Singular continuous and dense point spectrum for sparse trees with finite dimensions. In: Probability and Mathematical Physics, CRM Proc. and Lecture Notes 42, Providence, RI: Amer. Math. Soc., 2007, pp. 65–83
3. Breuer, J.: Spectral and dynamical properties of certain random Jacobi matrices with growing parameters. Trans. Amer. Math. Soc. 362, 3161–3182 (2010)
4. Breuer, J., Last, Y., Simon, B.: The Nevai condition. Constr. Approx. 32, 221–254 (2010)
5. Chihara, T. S.: An Introduction to Orthogonal Polynomials. Mathematics and Its Applications 13, New York-London-Paris: Gordon and Breach, 1978
6. Eggarter, T.: Some exact results on electron energy levels in certain one-dimensional random potentials. Phys. Rev. B5, 3863–3865 (1972)
7. Freud, G.: Orthogonal Polynomials. Oxford-New York: Pergamon Press, 1971
8. Geronimus, Ya. L.: Orthogonal Polynomials: Estimates, Asymptotic Formulas, and Series of Polynomials Orthogonal on the Unit Circle and on an Interval. New York: Consultants Bureau, 1961
9. Geronimus, Ya. L.: Polynomials orthogonal on a circle and their applications. Amer. Math. Soc. Translation 1954(104), 79 pp (1954)
10. Golinskii, L., Nevai, P.: Szegő difference equations, transfer matrices and orthogonal polynomials on the unit circle. Commun. Math. Phys. 223, 223–259 (2001)
11. Gredeskul, S. A., Pastur, L. A.: Behavior of the density of states in one-dimensional disordered systems near the edges of the spectrum. Theor. Math. Phys. 23, 132–139 (1975)
12. Janas, J., Simonov, S.: Weyl–Titchmarsh type formula for discrete Schrödinger operator with Wigner–von Neumann potential. To appear in Studia Math., available at http://arxiv.org/abs/1003.3319v1 [math.SP], 2010
13. Kaluzhny, U., Last, Y.: Purely absolutely continuous spectrum for some random Jacobi matrices. In: Probability and Mathematical Physics, CRM Proc. Lecture Notes 42, Providence, RI: Amer. Math. Soc., 2007, pp. 273–281
14. Kaluzhny, U., Shamis, S.: Preservation of Absolutely Continuous Spectrum of Periodic Jacobi Operators Under Perturbations of Square-Summable Variation. To appear in Constr. Approx., available at http://arxiv.org/abs/0912.1142v2 [math.SP], 2010
15. Kiselev, A., Last, Y., Simon, B.: Modified Prüfer and EFGP transforms and the spectral analysis of one-dimensional Schrödinger operators. Commun. Math. Phys. 194(1), 1–45 (1998)
16. Last, Y.: Destruction of absolutely continuous spectrum by perturbation potentials of bounded variation. Commun. Math. Phys. 274(1), 243–252 (2007)
17. Máté, A., Nevai, P.: Orthogonal polynomials and absolutely continuous measures. In: Approximation Theory, IV (College Station, TX, 1983), New York: Academic Press, 1983, pp. 611–617
18. Nevai, P.: Orthogonal polynomials. Mem. Amer. Math. Soc. 18(213), 185 pp (1979)
19. Nevai, P.: Orthogonal polynomials, measures and recurrences on the unit circle. Trans. Amer. Math. Soc. 300(1), 175–189 (1987)
20. Nikishin, E. M.: An estimate for orthogonal polynomials. Acta Sci. Math. (Szeged) 48(1–4), 395–399 (1985)
21. Pastur, L., Figotin, A.: Spectra of Random and Almost-Periodic Operators. Berlin: Springer, 1992
22. Peherstorfer, F., Steinbauer, R.: Orthogonal polynomials on the circumference and arcs of the circumference. J. Approx. Theory 102(1), 96–119 (2000)
23. Prüfer, H.: Neue Herleitung der Sturm–Liouvilleschen Reihenentwicklung stetiger Funktionen. Math. Ann. 95(1), 499–518 (1926)
24. Simon, B.: Orthogonal Polynomials on the Unit Circle, Part 1: Classical Theory. AMS Colloquium Publications 54.1, Providence, RI: Amer. Math. Soc., 2005
25. Simon, B.: Orthogonal Polynomials on the Unit Circle, Part 2: Spectral Theory. AMS Colloquium Publications 54.2, Providence, RI: Amer. Math. Soc., 2005
26. Simon, B.: Szegő's Theorem and Its Descendants: Spectral Theory for $L^2$ Perturbations of Orthogonal Polynomials. Princeton, NJ: Princeton University Press, 2010
27. Simon, B.: Orthogonal polynomials with exponentially decaying recursion coefficients. In: Probability and Mathematical Physics, CRM Proc. Lecture Notes 42, Providence, RI: Amer. Math. Soc., 2007, pp. 453–463
28. Stieltjes, T.: Recherches sur les fractions continues. Ann. Fac. Sci. Univ. Toulouse 8, J76–J122; ibid. 9, A5–A47 (1894-95)
29. Szegő, G.: Orthogonal Polynomials. Amer. Math. Soc. Colloq. Publ. 23, Providence, RI: Amer. Math. Soc., 1939, third edition, 1967
30. Verblunsky, S.: On positive harmonic functions: A contribution to the algebra of Fourier series. Proc. London Math. Soc. (2) 38, 125–157 (1935)
31. Weidmann, J.: Zur Spektraltheorie von Sturm-Liouville-Operatoren. Math. Z. 98, 268–302 (1967)
32. Weyl, H.: Über beschränkte quadratische Formen, deren Differenz vollstetig ist. Rend. Circ. Mat. Palermo 27, 373–392 (1909)
33. Wong, M.-W. L.: Generalized bounded variation and inserting point masses. Constr. Approx. 30(1), 1–15 (2009)

Communicated by B. Simon
Commun. Math. Phys. 306, 511–563 (2011)
Digital Object Identifier (DOI) 10.1007/s00220-011-1253-6
Communications in Mathematical Physics

From Weak to Strong Coupling in ABJM Theory

Nadav Drukker1, Marcos Mariño2,3, Pavel Putrov3

1 Institut für Physik, Humboldt-Universität zu Berlin, Newtonstraße 15, D-12489 Berlin, Germany. E-mail: [email protected]
2 Département de Physique Théorique, Université de Genève, Genève CH-1211, Switzerland. E-mail: [email protected]
3 Section de Mathématiques, Université de Genève, Genève CH-1211, Switzerland. E-mail: [email protected]

Received: 26 August 2010 / Accepted: 14 October 2010
Published online: 18 May 2011 – © Springer-Verlag 2011
Abstract: The partition function of $\mathcal{N} = 6$ supersymmetric Chern–Simons-matter theory (known as ABJM theory) on $S^3$, as well as certain Wilson loop observables, are captured by a zero dimensional super-matrix model. This super-matrix model is closely related to a matrix model describing topological Chern–Simons theory on a lens space. We explore further these recent observations and extract more exact results in ABJM theory from the matrix model. In particular we calculate the planar free energy, which matches at strong coupling the classical IIA supergravity action on $\mathrm{AdS}_4 \times \mathbb{CP}^3$ and gives the correct $N^{3/2}$ scaling for the number of degrees of freedom of the M2 brane theory. Furthermore we find contributions coming from world-sheet instanton corrections in $\mathbb{CP}^3$. We also calculate non-planar corrections, both to the free energy and to the Wilson loop expectation values. This matrix model appears also in the study of topological strings on a toric Calabi–Yau manifold, and an intriguing connection arises between the space of couplings of the planar ABJM theory and the moduli space of this Calabi–Yau. In particular it suggests that, in addition to the usual perturbative and strong coupling (AdS) expansions, a third natural expansion locus is the line where one of the two 't Hooft couplings vanishes and the other is finite. This is the conifold locus of the Calabi–Yau, and leads to an expansion around topological Chern–Simons theory. We present some explicit results for the partition function and Wilson loop observables around this locus.

Contents
1. Introduction and Summary
2. The ABJM Matrix Model and Wilson Loops
   2.1 The matrix model and its planar limit
   2.2 Wilson loops
3. Moduli Space, Picard–Fuchs Equations and Periods
   3.1 Orbifold point, or weak coupling
   3.2 Large radius, or strong coupling
   3.3 Conifold locus
   3.4 The moduli space of the ABJM theory
4. Weak Coupling
5. Strong Coupling Expansion and the AdS Dual
   5.1 Analytic continuation and shifted charges
   5.2 Wilson loops at strong coupling and semi-classical strings
   5.3 The planar free energy and a derivation of the $N^{3/2}$ behaviour
   5.4 Calculation of the free energy in the gravity dual
6. Conifold Expansion
   6.1 Expansion from the exact planar solution
   6.2 Conifold expansion from the matrix model
   6.3 On the near Chern–Simons expansion of ABJM theory
7. Modular Properties and the Genus Expansion
8. More Exact Results on Wilson Loops
   8.1 1/N corrections
   8.2 Giant Wilson loops
A. Normalization of the ABJM Matrix Model
B. Giant Wilson Loops in Chern–Simons Theory
1. Introduction and Summary

The discovery of Aharony, Bergman, Jafferis and Maldacena (ABJM) of the world-volume theory of coincident M2-branes [1] (following Bagger-Lambert and Gustavsson [2,3]) provides a new interacting field theory with well defined weak and strong coupling expansions. A great deal of effort has been devoted to studying these two limits of the theory: three dimensional $\mathcal{N} = 6$ supersymmetric Chern–Simons-matter and type IIA string theory on $\mathrm{AdS}_4 \times \mathbb{CP}^3$ (or M-theory on $\mathrm{AdS}_4 \times S^7/\mathbb{Z}_k$). For better or worse, both descriptions of the theory are much harder than the D3-brane analog: 4d $\mathcal{N} = 4$ SYM and type IIB string theory on $\mathrm{AdS}_5 \times S^5$. At weak coupling perturbative calculations in ABJM theory are rather subtle and for many quantities are in even powers of the coupling, while at strong coupling the geometry of $\mathbb{CP}^3$ is more complicated than $S^5$ and has, for example, non-trivial 2-cycles.

An important breakthrough, which is the underpinning of the present study, was the work of Kapustin, Willett and Yaakov [4], who used the localization techniques of [5] to reduce the calculation of certain quantities in the gauge theory on $S^3$ to finite dimensional matrix integrals.$^1$ These matrix integrals can be evaluated in a systematic expansion in $1/N$. Indeed, they have a natural supergroup structure, i.e., they are super-matrix models [7,8], and are related to some previously studied bosonic matrix models [9,10] by analytical continuation [8].

The solution of this matrix model allowed for the evaluation of the first exact interpolating function in this theory [8], giving a closed form expression for the expectation value of the 1/2 BPS Wilson loop operator of [7] at all values of the coupling. This expression, derived from the matrix-model reduction of the gauge theory, reproduces exactly the known leading strong coupling result, the classical action of a macroscopic string in $\mathrm{AdS}_4$.

$^1$ Similar results apply also to other 3d theories with $\mathcal{N} = 2$ supersymmetry [4,6].
The purpose of this paper is to explore further what can be learnt from the matrix model and its solution about the physical 3d gauge theory and its string/M-theory dual. This is a broad subject, connected through the matrix model to special geometry, Chern–Simons (CS) theory, topological strings and more.

One of the avenues we explore is the relation between the moduli space of the matrix model and the space of couplings of the gauge theory. It is very useful to consider the generalization of the gauge theory where the ranks of the two gauge groups are not equal [11].$^2$ The space of couplings is two dimensional and upon complexification, it matches the moduli space of the Riemann surface solving the planar matrix model. This surface is also the mirror to a well studied toric Calabi–Yau manifold known as local $\mathbb{F}_0$, where $\mathbb{F}_0 = \mathbb{P}^1 \times \mathbb{P}^1$ is a Hirzebruch surface. As we review in Sect. 3, this moduli space has three special loci: the orbifold point, the large radius limit and the conifold locus. These can be identified in the gauge theory respectively as the weakly coupled gauge theory, the strongly coupled theory described by string theory on AdS, and lastly the conifold locus, where the rank of one of the gauge groups vanishes, so that ABJM theory reduces to topological CS theory [12]. The first two are known duality frames with the AdS/CFT rules on how to evaluate observables on both sides. The simplicity of the conifold locus suggests that there should be another duality frame where ABJM theory is considered as a deformation of topological CS theory. We explore this in Sect. 6, where we calculate the partition function and Wilson loop observables around this point. It would be very interesting to learn how to calculate other quantities in this regime.

We present the matrix model for the ABJM theory and that for CS theory on the lens space $L(2,1) = S^3/\mathbb{Z}_2$ in the next section.
The matrix model of ABJM has an underlying $U(N_1|N_2)$ symmetry while that of the lens space has $U(N_1 + N_2)$ symmetry, which in both cases is broken to $U(N_1) \times U(N_2)$. It is easy to see that the expressions for them are related by the analytical continuation $N_2 \to -N_2$, or analogously a continuation of the 't Hooft coupling $N_2/k \to -N_2/k$ (which may be attributed to the negative level of the CS coupling of this group in the ABJM theory). We can then go on to study the lens space model and analytically continue to ABJM at the end. Conveniently, the lens space matrix model has been studied in the past [8,10,13,14]. The planar resolvent is known in closed form and the expressions for its periods are given as power series at special points in moduli space. We review the details of this matrix model and its solution in Sects. 2 and 3.

The matrix model of ABJM theory was derived by localization: it captures in a finite dimensional integral all observables of the full theory which preserve certain supercharges. At the time it was derived in [4], the only such observables (apart from the vacuum) were the 1/6 BPS Wilson loop constructed in [15–17] and the 1/2 BPS vortex loop operators [18]. Indeed, the expectation value of the 1/6 BPS Wilson loop can be expressed as an observable in the ABJM matrix model, and by analytical continuation in the lens space model.

Another class of Wilson loop operators, which preserve 1/2 of the supercharges, was constructed in [7] and studied further in [19]. It is the dual of the most symmetric classical string solution in $\mathrm{AdS}_4 \times \mathbb{CP}^3$. This Wilson loop is based on a super-connection in space-time and reduces upon localization to the trace of a supermatrix in the ABJM matrix model [7].
The different 1/2 BPS Wilson loops are classified by arbitrary representations of the supergroup $U(N_1|N_2)$, and the 1/6 BPS ones are classified by a pair of representations$^3$ of $U(N_1)$ and $U(N_2)$. We will mostly concern ourselves with the 1/2 BPS Wilson loop in the fundamental representation of $U(N_1|N_2)$ and the 1/6 BPS Wilson loop in the fundamental representation of $U(N_1)$. The exception is Sect. 8.2 and Appendix B, where we study the 1/2 BPS Wilson loop in large symmetric and antisymmetric representations. There we also make contact with the vortex loop operators of [18].

$^2$ Though commonly known as ABJ theory, for simplicity we still call the theory with this extra parameter ABJM theory. When specializing to the case of equal rank we refer to it as the "ABJM slice".

Of course, the natural observables in CS theory are the partition function and Wilson loops, so these quantities were also studied earlier in the matrix models of CS (see, for example, [9,10,13,20–22]). This information is encoded in different period integrals on the surface solving the matrix model, as we explain in Sect. 2.2. It turns out that the 1/6 BPS loop is captured by a period integral around one of the two cuts in the planar solution and the 1/2 BPS Wilson loop by a period integral around both cuts, or alternatively, around the point at infinity, and is much easier to calculate [8].

With all this machinery presented in Sects. 2 and 3 in hand, we are ready to calculate, and in Sects. 4, 5 and 6 we study the partition function and Wilson loop observables in the three natural limits of the matrix model.

First, in Sect. 4 we look at the orbifold point, which is the weak coupling point of the matrix model and likewise of the physical ABJM theory. The calculations there are straightforward and we present the answers to these quantities. A single term (1/6 BPS loop at 2-loops) was calculated independently directly in the field theory. All other terms are predictions for the higher order perturbative corrections.

Section 5 addresses the strong coupling limit of the theory, where the matrix model should reproduce the semiclassical expansion of these observables in type IIA string theory on $\mathrm{AdS}_4 \times \mathbb{CP}^3$. The expectation value of the Wilson loop was already derived in [8] and matched with a classical string in AdS.
We first generalize the strong coupling expansion for the case of $N_1 \neq N_2$, which corresponds to turning on a B-field in the AdS dual. This version of the theory was studied in [11] and a more precise analysis of the dictionary, capturing shifts in the charges, was presented in [23,24]. Interestingly, it turns out that the matrix model knows about these shifted charges, and the strong coupling parameter turns out to be exactly the one calculated in [24], rather than the naive coupling.

In the same section we present also the calculation of the free energy in the matrix model. The result is proportional to $N^2/\sqrt{\lambda}$ (or a slight generalization for $N_1 \neq N_2$). This scales at large $N$ like $N^{3/2}$, which is indeed the M-theory prediction for the number of degrees of freedom on $N$ coincident M2-branes [25]. Comparing with a supergravity calculation, we find precise agreement with the classical action of $\mathrm{AdS}_4 \times \mathbb{CP}^3$. This is the first derivation of this large $N$ scaling on the field theory side. The matrix model also provides an infinite series of instanton/anti-instanton corrections to both the partition function and to the Wilson loop expectation value, which we interpret as fundamental strings wrapping the $\mathbb{CP}^1$ inside $\mathbb{CP}^3$.

We then turn to a third limit of the theory, when one of the gauge couplings is perturbative and the other one not. In the strict limit the ABJM theory reduces to topological CS and in the matrix model one cut is removed. We show how to perform explicit calculations in this regime both from the planar solution of the matrix model and directly by performing matrix integrals. In both approaches one can see the full lens space matrix model arising as a (rather complicated) observable in topological CS theory on $S^3$. We speculate on possible tools for calculating directly in ABJM theory in this limit, where integrating out the bi-fundamental matter fields leads to correlation functions of Wilson loops in CS theory. We demonstrate the idea in the case of the 1/6 BPS Wilson loop, which has a relatively simple perturbative expansion. This limit of the spin-chain of ABJM theory was considered in [26], and a similar system in four dimensions was studied in [27].

$^3$ Special combinations of representations of $U(N_1) \times U(N_2)$ are also representations of $U(N_1|N_2)$, and in this case the 1/6 BPS and 1/2 BPS loops will have the same expression in the matrix model and the same VEVs. The proof of localization for the 1/2 BPS loop [7] relied on this equivalence.

The brave souls that will make it to Sects. 7 and 8 will find some new results on the non-planar corrections to the matrix model, and hence to ABJM theory. In Sect. 7 we show that the full $1/N$ expansion of the free energy on $S^3$ is completely determined by a recursive procedure based on direct integration [28,29] of the holomorphic anomaly equations [30]. The ability to determine the full expansion is closely related to the integrability of topological string theory on toric Calabi–Yau threefolds (as discussed in for example [31]). By the AdS/CFT correspondence, the $1/N$ expansion obtained in this way determines the partition function of type IIA theory on the $\mathrm{AdS}_4 \times \mathbb{CP}^3$ background at all genera. This result is reminiscent of the "old" matrix models for noncritical strings, where a double-scaled $1/N$ expansion, encoded in an integrable system, captures the all-genus partition function of a string theory.

The recursive procedure for the computation of the $1/N$ expansion is quite efficient in practice, and one can perform explicit computations at high genus. This allows us to study the large genus behavior of the $1/N$ corrections, and we check that they display the factorial growth $\sim (2g)!$ typical of string perturbation theory [32]. A careful examination of the coefficients suggests that this $1/N$ expansion is Borel summable.

In Sect. 8 we present the genus one correction to the Wilson loop and expand it at both weak and strong coupling.
Another topic covered there is that of "giant Wilson loops" [33–35], where in the supergravity dual (at least in $\mathrm{AdS}_5 \times S^5$) a fundamental string is replaced by a D-brane. This happens for Wilson loops in representations of dimension comparable to $N$. We calculate the corresponding object in the matrix model and compare it to the vortex loop operators of [18].

One point we have not touched upon is the connection to topological strings. Since CS and the matrix model are related to topological strings, we expect there to be a direct connection between ABJM theory and a topological string theory. All the quantities captured by the matrix model should exist also in a topologically twisted version of ABJM theory, possibly along the lines of [36].

2. The ABJM Matrix Model and Wilson Loops

2.1. The matrix model and its planar limit. The ABJM matrix model, obtained in [4], gives an explicit integral expression for the partition function of the ABJM theory on $S^3$, as well as for Wilson loop VEVs. This matrix model is defined by the partition function
\[
Z_{\mathrm{ABJM}}(N_1, N_2, g_s) = \frac{i^{-\frac{1}{2}(N_1^2 - N_2^2)}}{N_1!\,N_2!}
\int \prod_{i=1}^{N_1} \frac{d\mu_i}{2\pi} \prod_{j=1}^{N_2} \frac{d\nu_j}{2\pi}\,
\frac{\prod_{i<j}\left(2\sinh\frac{\mu_i - \mu_j}{2}\right)^2 \left(2\sinh\frac{\nu_i - \nu_j}{2}\right)^2}{\prod_{i,j}\left(2\cosh\frac{\mu_i - \nu_j}{2}\right)^2}\,
e^{-\frac{1}{2g_s}\left(\sum_i \mu_i^2 - \sum_j \nu_j^2\right)},
\tag{2.1}
\]
where the coupling $g_s$ is related to the Chern–Simons coupling $k$ of the ABJM theory as
\[
g_s = \frac{2\pi i}{k}.
\tag{2.2}
\]
In writing this matrix integral we have been very careful with its precise overall normalization, since one of our goals in the present paper is to compute the free energy on the sphere at strong coupling. The calculation of [4] captures the full $k$ dependence of the partition function, but we have to fix an overall $k$-independent normalization. This is done in two steps. First, we require that the above matrix integral reduces to the partition function for Chern–Simons theory on $S^3$ when $N_1 = 0$ or $N_2 = 0$ (in a specific framing of $S^3$). Once this is done, there is still a $k$-independent normalization factor which appears as a constant coefficient multiplying the cosh in the denominator. This term was not fixed in [4], but it can be easily obtained from the formulae they presented. This calculation can be found in Appendix A, and leads to the matrix integral (2.1).

The ABJM matrix model is closely related to the $L(2,1)$ lens space matrix model introduced in [9,10]. This matrix model is defined by the partition function
\[
Z_{L(2,1)}(N_1, N_2, g_s) = \frac{i^{-\frac{1}{2}(N_1^2 + N_2^2)}}{N_1!\,N_2!}
\int \prod_{i=1}^{N_1} \frac{d\mu_i}{2\pi} \prod_{j=1}^{N_2} \frac{d\nu_j}{2\pi}\,
\prod_{i<j}\left(2\sinh\frac{\mu_i - \mu_j}{2}\right)^2 \left(2\sinh\frac{\nu_i - \nu_j}{2}\right)^2
\prod_{i,j}\left(2\cosh\frac{\mu_i - \nu_j}{2}\right)^2\,
e^{-\frac{1}{2g_s}\left(\sum_i \mu_i^2 + \sum_j \nu_j^2\right)}.
\tag{2.3}
\]
The relation between the partition functions is simply [8]
\[
Z_{\mathrm{ABJM}}(N_1, N_2, g_s) = Z_{L(2,1)}(N_1, -N_2, g_s).
\tag{2.4}
\]
Since the large $N$ expansion of the free energy gives a sequence of analytic functions of $N_1$, $N_2$, once these functions are known in one model, they can be obtained in the other by the trivial change of sign $N_2 \to -N_2$.

Let us now discuss the large $N$ solution of the lens space matrix model, following [8,10,13]. At large $N$, the two sets of eigenvalues, $\mu_i$, $\nu_j$, condense around two cuts. The cut of the $\mu_i$ eigenvalues is centered around $z = 0$, while that of the $\nu_j$ eigenvalues is centered around $z = \pi i$. We will write the cuts as
\[
C_1 = (-A, A), \qquad C_2 = (\pi i - B, \pi i + B),
\tag{2.5}
\]
in terms of the endpoints $A$, $B$. It is also useful to use the exponentiated variable
\[
Z = e^z.
\tag{2.6}
\]
In the $Z$ plane the cuts (2.5) get mapped to
\[
(1/a, a), \qquad (-1/b, -b), \qquad a = e^A, \qquad b = e^B,
\tag{2.7}
\]
which are centered around $Z = 1$, $Z = -1$, respectively, see Fig. 1. We will use the same notation $C_{1,2}$ for the cuts in the $Z$ plane.

Fig. 1. Cuts in the z-plane and in the Z-plane

The large $N$ solution is encoded in the total resolvent of the matrix model, $\omega(z)$. It is defined as [13]
\[
\omega(z) = g_s \left\langle \operatorname{Tr}\left(\frac{Z + U}{Z - U}\right) \right\rangle
= g_s \left\langle \sum_{i=1}^{N_1} \coth\left(\frac{z - \mu_i}{2}\right) \right\rangle
+ g_s \left\langle \sum_{j=1}^{N_2} \tanh\left(\frac{z - \nu_j}{2}\right) \right\rangle,
\tag{2.8}
\]
where
\[
U = \begin{pmatrix} e^{\mu_i} & 0 \\ 0 & -e^{\nu_j} \end{pmatrix}.
\tag{2.9}
\]
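The two forms of the resolvent in (2.8) agree eigenvalue by eigenvalue, because $(Z + e^{\mu})/(Z - e^{\mu}) = \coth\frac{z-\mu}{2}$ and $(Z - e^{\nu})/(Z + e^{\nu}) = \tanh\frac{z-\nu}{2}$. A minimal numerical sketch of this identity follows; it drops the factor $g_s$ and the matrix-model average, and the sample eigenvalues and the test point `z` are arbitrary illustrative choices, not values taken from the paper.

```python
import cmath

def resolvent_trace(z, mus, nus):
    """Tr[(Z + U)/(Z - U)] for Z = e^z and U = diag(e^mu_i, -e^nu_j), as in (2.9)."""
    Z = cmath.exp(z)
    eigenvalues = [cmath.exp(m) for m in mus] + [-cmath.exp(n) for n in nus]
    return sum((Z + u) / (Z - u) for u in eigenvalues)

def coth_tanh_sum(z, mus, nus):
    """Sum of coth((z - mu_i)/2) plus sum of tanh((z - nu_j)/2), the second form in (2.8)."""
    coth = lambda x: cmath.cosh(x) / cmath.sinh(x)
    return (sum(coth((z - m) / 2) for m in mus)
            + sum(cmath.tanh((z - n) / 2) for n in nus))

# Illustrative (hypothetical) eigenvalue configuration and evaluation point.
z = 0.3 + 0.7j
mus = [-0.4, 0.1, 0.5]
nus = [-0.2, 0.6]

lhs = resolvent_trace(z, mus, nus)
rhs = coth_tanh_sum(z, mus, nus)
difference = abs(lhs - rhs)
print(difference)
```

The minus sign in the lower block of $U$ is what turns the coth of the first block into the tanh of the second.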
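We will denote by $\omega_0(z)$ the planar limit of the resolvent, which was found in explicit form in [13]. It reads,
\[
\omega_0(z) = 2 \log\left( \frac{e^{-t/2}}{2} \left[ \sqrt{(Z + b)(Z + 1/b)} - \sqrt{(Z - a)(Z - 1/a)} \right] \right),
\tag{2.10}
\]
where
\[
t = t_1 + t_2
\tag{2.11}
\]
is the total 't Hooft parameter. It is useful to introduce the variables$^4$
\[
\zeta = \frac{1}{2}\left(a + \frac{1}{a} - b - \frac{1}{b}\right), \qquad
\beta = \frac{1}{4}\left(a + \frac{1}{a} + b + \frac{1}{b}\right).
\tag{2.12}
\]
$\beta$ is related to the total 't Hooft parameter through
\[
\beta = e^t.
\tag{2.13}
\]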
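As a consistency check of (2.10)-(2.13), one can verify numerically that $\omega_0 \to t$ as $Z \to \infty$ and that the first subleading term is $\zeta/Z$. The sketch below assumes real endpoints $a, b > 1$ (so that $t$ is real, which is the lens space regime rather than the physical ABJM values) and evaluates at a large real $Z$; the specific numbers are illustrative only.

```python
import cmath
import math

def planar_resolvent(Z, a, b):
    """Evaluate omega_0 from (2.10), together with t and zeta from (2.12)-(2.13)."""
    zeta = 0.5 * (a + 1 / a - b - 1 / b)
    beta = 0.25 * (a + 1 / a + b + 1 / b)
    t = math.log(beta)  # beta = e^t, real for real a, b > 1
    root_b = cmath.sqrt((Z + b) * (Z + 1 / b))
    root_a = cmath.sqrt((Z - a) * (Z - 1 / a))
    omega0 = 2 * cmath.log(math.exp(-t / 2) / 2 * (root_b - root_a))
    return omega0, t, zeta

# Hypothetical endpoints and a large real evaluation point.
a, b = 2.0, 1.5
Z = 1.0e4
omega0, t, zeta = planar_resolvent(Z, a, b)

# Z * (omega_0 - t) should approach zeta up to O(1/Z) corrections.
leading_coefficient = ((omega0 - t) * Z).real
print(leading_coefficient, zeta)
```

The agreement is only up to $O(1/Z)$, so the comparison should be made with a tolerance rather than exactly.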
All the relevant planar quantities can be expressed in terms of period integrals of the one-form $\omega_0(z)\,dz$. The 't Hooft parameters are given by
\[
t_i = \frac{1}{4\pi i} \oint_{C_i} \omega_0(z)\,dz, \qquad i = 1, 2.
\tag{2.14}
\]
The planar free energy $F_0$ satisfies the equation
\[
\mathcal{I} \equiv \frac{\partial F_0}{\partial t_1} - \frac{\partial F_0}{\partial t_2} - \frac{\pi i t}{2} = -\frac{1}{2} \oint_D \omega_0(z)\,dz,
\tag{2.15}
\]
where the $D$ cycle encloses, in the $Z$ plane, the interval between $-1/b$ and $1/a$ (see Fig. 1).$^5$

$^4$ The variable $\beta$ is related to the variable $\xi$ in [8] by $\beta = \xi/2$.
$^5$ Likewise one can calculate the second "B-cycle" period, and it will arise when solving the Picard-Fuchs equations at strong coupling in Sect. 3.2.
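The derivatives of these periods can be calculated in closed form by adapting a trick from [37]. One finds,
\[
\frac{\partial t_{1,2}}{\partial \zeta} = -\frac{1}{4\pi i} \oint_{C_{1,2}} \frac{dZ}{\sqrt{(Z^2 - \zeta Z + 1)^2 - 4\beta^2 Z^2}} = \pm \frac{\sqrt{ab}}{\pi(1 + ab)}\, K(k),
\tag{2.16}
\]
and similarly
\[
\frac{\partial t_1}{\partial \beta} = -2\,\frac{\sqrt{ab}}{\pi(1 + ab)} \left( K(k) - \frac{2ab}{1 + ab}\,\Pi(n_1|k) - \frac{2}{1 + ab}\,\Pi(n_2|k) \right),
\tag{2.17}
\]
where
\[
k^2 = 1 - \left(\frac{a + b}{1 + ab}\right)^2, \qquad
n_1 = \frac{1 - a^2}{1 + ab}, \qquad
n_2 = \frac{b(a^2 - 1)}{a(1 + ab)}.
\tag{2.18}
\]
Likewise for the period integral in (2.15) we find
\[
\frac{\partial \mathcal{I}}{\partial \zeta} = -2\,\frac{\sqrt{ab}}{1 + ab}\, K(k'), \qquad
\frac{\partial \mathcal{I}}{\partial \beta} = 4\,\frac{\sqrt{ab}}{1 + ab} \left( K(k') + \frac{2a(1 - b^2)}{(1 + ab)(a + b)} \bigl( \Pi(n_1'|k') - \Pi(n_2'|k') \bigr) \right),
\tag{2.19}
\]
where
\[
k' = \frac{a + b}{1 + ab}, \qquad
n_1' = \frac{a + b}{b(1 + ab)}, \qquad
n_2' = \frac{b(a + b)}{1 + ab}.
\tag{2.20}
\]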
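Two algebraic facts sit behind (2.16)-(2.20): the quartic under the square root in (2.16) factorizes over the four branch points $a$, $1/a$, $-b$, $-1/b$ once $\zeta$ and $\beta$ are expressed through (2.12), and $k'$ in (2.20) is the complementary modulus of $k$ in (2.18). The following sketch checks both numerically; the endpoint values and sample points are arbitrary, and the closed form `alt_k_squared` is an equivalent rewriting of (2.18) introduced here for illustration.

```python
def quartic(Z, zeta, beta):
    """The polynomial under the square root in (2.16)."""
    return (Z * Z - zeta * Z + 1) ** 2 - 4 * beta ** 2 * Z * Z

def factored(Z, a, b):
    """Product over the branch points a, 1/a, -b, -1/b of the two cuts (2.7)."""
    return (Z - a) * (Z - 1 / a) * (Z + b) * (Z + 1 / b)

# Hypothetical endpoints; zeta and beta follow from (2.12).
a, b = 2.0, 1.5
zeta = 0.5 * (a + 1 / a - b - 1 / b)
beta = 0.25 * (a + 1 / a + b + 1 / b)

samples = [0.3 + 0.2j, -1.7 + 0.5j, 2.5 - 1.1j]
max_mismatch = max(abs(quartic(Z, zeta, beta) - factored(Z, a, b)) for Z in samples)

# Moduli from (2.18) and (2.20): k^2 + k'^2 should equal 1.
k_squared = 1 - ((a + b) / (1 + a * b)) ** 2
k_prime = (a + b) / (1 + a * b)
modulus_sum = k_squared + k_prime ** 2

# Equivalent closed form of k^2 (an algebraic rewriting, not quoted from the paper).
alt_k_squared = (a * a - 1) * (b * b - 1) / (1 + a * b) ** 2
print(max_mismatch, modulus_sum, k_squared - alt_k_squared)
```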
We can now use the dictionary between the lens space matrix model and the ABJM matrix model given by (2.2) and (2.4) to get the planar solution of the latter model. In particular, the natural 't Hooft parameters in the ABJM theory
\[
\lambda_j = \frac{N_j}{k}
\tag{2.21}
\]
are obtained from the planar solution of the lens space matrix model by the replacement
\[
t_1 = 2\pi i \lambda_1, \qquad t_2 = -2\pi i \lambda_2.
\tag{2.22}
\]
Since in the ABJM theory the couplings $\lambda_{1,2}$ are real, the matrix model couplings $t_{1,2}$ are pure imaginary. Thanks to (2.13) we know that $\beta$ is of the form
\[
\beta = e^{2\pi i(\lambda_1 - \lambda_2)},
\tag{2.23}
\]
i.e., it must be a phase. For later convenience we introduce yet another parameterization of the couplings in terms of $B$ and $\kappa$,
\[
B = \lambda_1 - \lambda_2 + \frac{1}{2}, \qquad \kappa = e^{-\pi i B} \zeta.
\tag{2.24}
\]
$B$ is identified as the B-field in the dual type IIA background [24]. Notice that it has a shift by $-1/2$ as compared to the original prescription in [11]. Clearly, all calculations in the matrix model are periodic under $B \to B + 1$, up to possible monodromies (see (5.14) below). As we shall see later, the parameter $\kappa$ is real for physical values of $\lambda_{1,2}$.
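The dictionary (2.21)-(2.24) is simple enough to spell out as a small helper. The sketch below is only an illustration: the function name `abjm_to_lens_space` and the sample ranks and level are hypothetical, and $\kappa$ is omitted because it also requires $\zeta$.

```python
import cmath
import math

def abjm_to_lens_space(N1, N2, k):
    """Map ABJM data (N1, N2, k) to lens space couplings via (2.21)-(2.24)."""
    lam1, lam2 = N1 / k, N2 / k          # 't Hooft couplings (2.21)
    t1 = 2j * math.pi * lam1             # (2.22)
    t2 = -2j * math.pi * lam2            # (2.22)
    beta = cmath.exp(t1 + t2)            # beta = e^t with t = t1 + t2, i.e. (2.23)
    B = lam1 - lam2 + 0.5                # B-field parameter (2.24)
    return {"lambda1": lam1, "lambda2": lam2, "t1": t1, "t2": t2, "beta": beta, "B": B}

# Hypothetical example: unequal ranks at level k = 4.
couplings = abjm_to_lens_space(5, 3, 4)
# On the equal-rank "ABJM slice" beta = 1 and B = 1/2.
slice_couplings = abjm_to_lens_space(4, 4, 7)
print(couplings["beta"], couplings["B"], slice_couplings["beta"], slice_couplings["B"])
```

For real $\lambda_{1,2}$ the returned `beta` always has unit modulus, which is the statement that it must be a phase.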
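2.2. Wilson loops. One of the main results of [4] is that the VEV of the 1/6 BPS Wilson loop in ABJM theory, labelled by a representation $R$ of $U(N_1)$, can be obtained by calculating the VEV of the matrix $e^{\mu_i}$ in the matrix model (2.1), i.e.,
\[
\bigl\langle W_R^{1/6} \bigr\rangle = g_s \bigl\langle \operatorname{Tr}_R\bigl(e^{\mu_i}\bigr) \bigr\rangle_{\text{ABJM MM}}.
\tag{2.25}
\]
A 1/2 BPS loop $W_{\mathcal{R}}^{1/2}$ was constructed in [7], where $\mathcal{R}$ is a representation of the supergroup $U(N_1|N_2)$. In [7] it was also shown that it localizes to the matrix model correlator in the ABJM matrix model
\[
\bigl\langle W_{\mathcal{R}}^{1/2} \bigr\rangle = g_s \bigl\langle \operatorname{Str}_{\mathcal{R}} U \bigr\rangle_{\text{ABJM MM}},
\tag{2.26}
\]
with the same $U$ as in (2.9). Though at first sight the minus sign on the lower block of $U$ may look surprising, it can be attributed to the fact that the $\nu_j$ eigenvalues are shifted by $\pi i$ from the real line.

Due to the relation between the ABJM matrix model and the lens space matrix model, these correlators can be computed in the lens space matrix model as follows:
\[
\bigl\langle W_R^{1/6} \bigr\rangle = g_s \bigl\langle \operatorname{Tr}_R\bigl(e^{\mu_i}\bigr) \bigr\rangle_{L(2,1)} \Big|_{N_2 \to -N_2}, \qquad
\bigl\langle W_{\mathcal{R}}^{1/2} \bigr\rangle = g_s \bigl\langle \operatorname{Tr}_{\mathcal{R}} U \bigr\rangle_{L(2,1)} \Big|_{N_2 \to -N_2},
\tag{2.27}
\]
where the super-representation $\mathcal{R}$ is regarded as a representation of $U(N_1 + N_2)$. To evaluate the Wilson loop one uses the resolvent, or equivalently, the eigenvalue densities
\[
\begin{aligned}
\rho^{(1)}(Z)\,dZ &= -\frac{1}{4\pi i t_1} \frac{dZ}{Z} \bigl( \omega(Z + i\epsilon) - \omega(Z - i\epsilon) \bigr), & Z \in C_1, \\
\rho^{(2)}(Z)\,dZ &= \frac{1}{4\pi i t_2} \frac{dZ}{Z} \bigl( \omega(Z + i\epsilon) - \omega(Z - i\epsilon) \bigr), & Z \in C_2,
\end{aligned}
\tag{2.28}
\]
which are each normalized in the planar approximation to unity
\[
\int_{C_i} \rho_0^{(i)}(Z)\,dZ = 1.
\tag{2.29}
\]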
For the 1/6 BPS Wilson loop in the fundamental representation one needs to integrate ez = Z over the first cut dZ 1/6 (1) ω(Z ). (2.30) = t1 ρ (Z )Z dZ = W 4π i C1 C1 The correlator relevant for the 1/2 BPS Wilson loop (again in the fundamental representation) is much easier, since dZ 1/2 (1) (2) W = t1 ρ (Z )Z dZ − t2 ρ (Z )Z dZ = ω(Z ), (2.31) 4π i C1 C2 ∞ and it can be obtained by expanding ω(Z ) around Z → ∞. The comparison to the case of the 1/2 BPS Wilson loop in N = 4 SYM in 4d is straight-forward. In that case the matrix model is Gaussian and the eigenvalue density
520
N. Drukker, M. Mariño, P. Putrov
in the planar approximation follows Wigner’s semi-circle law. Doing the integral with the insertion of ez gives a modified Bessel function [38], 2 ρ0 (z) = λ − z2 πλ
⇒
1/2 W4dN =4 planar
=
√
λ
√ ρ0 (z) e − λ
z
√ 2 dz = √ I1 ( λ). λ (2.32)
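The Gaussian result (2.32) is easy to check numerically. The following is a minimal sketch, not part of the original computation: it integrates the semi-circle density against $e^z$ with a midpoint rule and compares with the Bessel function closed form. The function names and the number of grid points are illustrative choices.

```python
import math

def bessel_i1(x, terms=60):
    """Modified Bessel function I_1(x) summed from its power series."""
    return sum((x / 2) ** (2 * m + 1) / (math.factorial(m) * math.factorial(m + 1))
               for m in range(terms))

def wilson_loop_integral(lam, n=20000):
    """Midpoint-rule value of the integral of rho_0(z) e^z over [-sqrt(lam), sqrt(lam)].

    With z = sqrt(lam) sin(theta) the measure becomes
    rho_0(z) dz = (2/pi) cos^2(theta) dtheta, which removes the endpoint square roots.
    """
    a = math.sqrt(lam)
    h = math.pi / n
    total = 0.0
    for j in range(n):
        theta = -math.pi / 2 + (j + 0.5) * h
        total += math.cos(theta) ** 2 * math.exp(a * math.sin(theta))
    return (2 / math.pi) * total * h

def wilson_loop_closed_form(lam):
    """Right-hand side of (2.32): (2 / sqrt(lam)) I_1(sqrt(lam))."""
    a = math.sqrt(lam)
    return 2 / a * bessel_i1(a)
```

Calling both functions at the same value of the coupling, for instance `wilson_loop_integral(4.0)` and `wilson_loop_closed_form(4.0)`, should give matching numbers up to rounding.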
For the ABJM matrix model all the expressions are more complicated. Still the derivative with respect to $\zeta$ and $\beta$ of the integral expression for the 1/6 BPS Wilson loop (2.30) can be written in closed form [8], like the integrals (2.16) and (2.17),
\[ \begin{aligned} \partial_\zeta \langle W^{1/6} \rangle &= -\frac{1}{\pi}\,\frac{1}{\sqrt{ab}\,(1 + ab)}\left(a K(k) - (a + b)\,\Pi(n|k)\right), \\ \partial_\beta \langle W^{1/6} \rangle &= -\frac{2}{\pi}\,\frac{\sqrt{ab}}{a + b}\, E(k). \end{aligned} \tag{2.33} \]
For the 1/2 BPS Wilson loop of [7] the situation is much simpler and in the planar approximation one needs only the large $Z$ behavior of $\omega_0$ (2.10),
\[ \omega_0 = t + \frac{\zeta}{Z} + \frac{\zeta^2 + 2\beta^2 - 2}{2Z^2} + \frac{\zeta(\zeta^2 + 6\beta^2 - 3)}{3Z^3} + O(Z^{-4}). \tag{2.34} \]
One finds [8]
\[ \langle W^{1/2} \rangle_{\text{planar}} = \frac{\zeta}{2}, \tag{2.35} \]
which can then be expanded in different regimes. We will elaborate on the expansion of this expression in the next sections and will also turn to the non-planar corrections to it and to that of the 1/6 BPS loop in Sect. 8.

As a simple generalization, by the replacement $Z \to Z^l$ on the right hand side of (2.31), the higher order terms in the expansion (2.34) give the expectation values of multiply wrapped 1/2 BPS Wilson loops where $U \to U^l$ in (2.26). For even winding the sign in the lower block of the matrix $U$ (2.9) is absent. This is consistent with the gauge theory calculation [7], where this sign arose from the requirement of supersymmetry invariance in the presence of the fermionic couplings which are antiperiodic, as should be the case for a singly-wound contractible cycle (see also the discussion in [19]).

The normalization of the Wilson loop as given by (2.30) and (2.31) is not the same as in the 4d $\mathcal{N} = 4$ case (2.32). For the 1/6 BPS loop, the leading term at weak coupling is $t_1 = 2\pi i N_1/k$. This means that the trace in the fundamental is normalized by a factor of $g_s$. For the 1/2 BPS loop the leading term is $t_1 \pm t_2 = g_s(N_1 \mp N_2)$, where the sign depends on the winding number. We will comment more about this normalization in Sect. 5.2.

3. Moduli Space, Picard–Fuchs Equations and Periods

In this section we present the tools for solving the lens space matrix model using special geometry. We present three special points in the moduli space of the theory and write explicit expressions for the four periods of $\omega_0$ at the vicinity of these points.

The lens space matrix model is equivalent to topological string theory on local $\mathbb{F}_0 = \mathbb{P}^1 \times \mathbb{P}^1$. The $1/N$ expansions of the free energy and of the 1/2 BPS Wilson loop VEV are the genus expansions of closed and open topological string amplitudes. The planar content of the theory is encoded in the periods of the mirror geometry described by the family of elliptic curves, which can be written as
\[ y = \frac{z_1 x^2 + x + 1 - \sqrt{(1 + x + z_1 x^2)^2 - 4 z_2 x^2}}{2}. \tag{3.1} \]
Here, $z_1, z_2$ parametrize the moduli space of complex structures, which is the mirror to the enlarged Kähler moduli space of local $\mathbb{F}_0$. This moduli space has a very rich structure first uncovered in [10] and further studied in, for example, [31,37] by using the standard techniques of mirror symmetry. Notice that the mirror geometry (3.1) is closely related to the resolvent $\omega_0(Z)$. Indeed, one finds that $\omega_0(Z) \sim \log y(x)$ provided we identify the variables as
\[ x = -Z z_1^{-1/2} \tag{3.2} \]
and
\[ \zeta = \frac{1}{\sqrt{z_1}}, \qquad \beta = \sqrt{\frac{z_2}{z_1}}. \tag{3.3} \]
This can also be expressed as (2.24)
\[ z_1 = \frac{e^{-2\pi i B}}{\kappa^2}, \qquad z_2 = \frac{e^{2\pi i B}}{\kappa^2}. \tag{3.4} \]
Let us now discuss in some detail the moduli space of (3.1), since it will play a fundamental role in the following. It has complex dimension two, corresponding to the two complexified Kähler parameters of local $\mathbb{F}_0$. The coordinates $z_1, z_2$ (or $\zeta, \beta$) are global coordinates in this moduli space. Another way of parametrizing it is to use the periods of the meromorphic one-form
\[ \omega = \log y(x)\, \frac{dx}{x}. \tag{3.5} \]
As is well known, these periods are annihilated by a pair of differential operators called Picard–Fuchs operators. In terms of $z_1, z_2$, the operators are
\[ \begin{aligned} \mathcal{L}_1 &= z_2(1 - 4z_2)\xi_2^2 - 4z_1^2\xi_1^2 - 8z_1 z_2\xi_1\xi_2 - 6z_1\xi_1 + (1 - 6z_2)\xi_2, \\ \mathcal{L}_2 &= z_1(1 - 4z_1)\xi_1^2 - 4z_2^2\xi_2^2 - 8z_1 z_2\xi_1\xi_2 - 6z_2\xi_2 + (1 - 6z_1)\xi_1, \end{aligned} \tag{3.6} \]
where
\[ \xi_i = \frac{\partial}{\partial z_i}. \tag{3.7} \]
These operators lead to a system of differential equations known as Picard–Fuchs (PF) equations. An important property of the moduli space is the existence of special points, generalizing the regular singular points of ODEs on C. The PF system can be solved around these points, and the solutions give a basis for the periods of the meromorphic one-form. We can use two of the solutions to parametrize the moduli space near a singular point, and the resulting local coordinates, given by periods, are usually called flat coordinates.
3.1. Orbifold point, or weak coupling. There are three types of special points in the moduli space. The first one is the orbifold point discovered in [10], which is the relevant one in order to make contact with the matrix model. To study this point one has to use the global variables
\[ x_1 = 1 - \frac{z_1}{z_2}, \qquad x_2 = \frac{1}{\sqrt{z_2}\left(1 - \frac{z_1}{z_2}\right)}. \tag{3.8} \]
The orbifold point is then defined as $x_1 = x_2 = 0$, and in terms of these variables the Picard–Fuchs system is given by the two operators
\[ \begin{aligned} \mathcal{L}_1 &= \frac{1}{4}(8 - 8x_1 + x_1^2)\,x_2\partial_{x_2} - \frac{1}{4}\left(4 - (2 - x_1)^2 x_2^2\right)\partial_{x_2}^2 - x_1(2 - 3x_1 + x_1^2)\,x_2\,\partial_{x_1}\partial_{x_2} \\ &\quad - (1 - x_1)x_1^2\,\partial_{x_1} + (1 - x_1)^2 x_1^2\,\partial_{x_1}^2, \\ \mathcal{L}_2 &= (2 - x_1)\,x_2\partial_{x_2} - \left(1 - (1 - x_1)x_2^2\right)\partial_{x_2}^2 - x_1^2\,\partial_{x_1} \\ &\quad - 2(1 - x_1)x_1 x_2\,\partial_{x_1}\partial_{x_2} + (1 - x_1)x_1^2\,\partial_{x_1}^2. \end{aligned} \tag{3.9} \]
A basis of periods near the orbifold point was found in [10]. It reads
\[ \sigma_1 = -\log(1 - x_1), \qquad \sigma_2 = \sum_{m,n} c_{m,n}\, x_1^m x_2^n, \qquad \mathcal{F}_{\sigma_2} = \sigma_2 \log x_1 + \sum_{m,n} d_{m,n}\, x_1^m x_2^n, \tag{3.10} \]
where the coefficients $c_{m,n}$ and $d_{m,n}$ vanish for non-positive $n$ or $m$ as well as for all even $n$. They satisfy the following recursion relations with the seed values $c_{1,1} = 1$, $d_{1,1} = 0$ and $d_{1,3} = -1/6$:
\[ \begin{aligned} c_{m,n} &= \frac{(n + 2 - 2m)^2}{4(m - n)(m - 1)}\, c_{m-1,n}, \\ c_{m,n} &= \frac{(n - 2)^2 (m - n + 2)(m - n + 1)}{n(n - 1)(2m - n)^2}\, c_{m,n-2}, \\ d_{m,n} &= \frac{(n + 2 - 2m)^3\, d_{m-1,n} + 4(n^2 - n - 2m + 2)\, c_{m,n}}{4(m - 1)(m - n)(n + 2 - 2m)}, \\ d_{m,n} &= \frac{(n - 2)^2 (m - n + 1)(m - n + 2)}{n(n - 1)(2m - n)^2}\, d_{m,n-2} + \left(\frac{1}{m - n + 2} + \frac{1}{m - n + 1} + \frac{4}{n - 2m}\right) c_{m,n}. \end{aligned} \tag{3.11} \]
The 't Hooft parameters of the matrix model are period integrals of the meromorphic one-form, therefore they must be linear combinations of the periods above, and one finds [10]
\[ t_1 = \frac{1}{4}(\sigma_1 + \sigma_2), \qquad t_2 = \frac{1}{4}(\sigma_1 - \sigma_2). \tag{3.12} \]
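The recursion relations for $c_{m,n}$ can be cross-checked against the Picard–Fuchs operators (3.9) in exact rational arithmetic. The sketch below is only an illustration under stated assumptions: it builds $\sigma_2$ from the first two relations in (3.11), using the first relation along $n = 1$ and the second to raise $n$, and then applies $\mathcal{L}_1$ and $\mathcal{L}_2$ with their polynomial coefficients expanded by hand. The truncation orders, the dictionary representation of the series and all function names are choices made here, not taken from [10].

```python
from fractions import Fraction

def orbifold_coefficients(M, N):
    """c_{m,n} of sigma_2 for 1 <= m <= M and odd n <= N, from (3.11)."""
    c = {(1, 1): Fraction(1)}
    for m in range(2, M + 1):            # first relation, evaluated at n = 1
        c[(m, 1)] = Fraction((3 - 2 * m) ** 2, 4 * (m - 1) ** 2) * c[(m - 1, 1)]
    for n in range(3, N + 1, 2):         # second relation, n - 2 -> n
        for m in range(1, M + 1):
            num = (n - 2) ** 2 * (m - n + 2) * (m - n + 1)
            den = n * (n - 1) * (2 * m - n) ** 2
            c[(m, n)] = Fraction(num, den) * c[(m, n - 2)]
    return {key: value for key, value in c.items() if value != 0}

def falling(m, i):
    """m (m-1) ... (m-i+1), the factor from differentiating x^m i times."""
    result = 1
    for t in range(i):
        result *= m - t
    return result

def apply_operator(op, series):
    """Apply sum_{(i,j)} P_ij(x1, x2) d1^i d2^j to a double power series.

    op maps the derivative orders (i, j) to the polynomial P_ij,
    stored as {(a, b): coefficient of x1^a x2^b}.
    """
    out = {}
    for (i, j), poly in op.items():
        for (m, n), coeff in series.items():
            factor = falling(m, i) * falling(n, j)
            if factor == 0:
                continue
            for (a, b), p in poly.items():
                key = (m - i + a, n - j + b)
                out[key] = out.get(key, Fraction(0)) + p * factor * coeff
    return out

q = Fraction(1, 4)
# Polynomial coefficients of the operators in (3.9), expanded by hand.
L1 = {
    (0, 1): {(0, 1): 2, (1, 1): -2, (2, 1): q},
    (0, 2): {(0, 0): -1, (0, 2): 1, (1, 2): -1, (2, 2): q},
    (1, 1): {(1, 1): -2, (2, 1): 3, (3, 1): -1},
    (1, 0): {(2, 0): -1, (3, 0): 1},
    (2, 0): {(2, 0): 1, (3, 0): -2, (4, 0): 1},
}
L2 = {
    (0, 1): {(0, 1): 2, (1, 1): -1},
    (0, 2): {(0, 0): -1, (0, 2): 1, (1, 2): -1},
    (1, 1): {(1, 1): -2, (2, 1): 2},
    (1, 0): {(2, 0): -1},
    (2, 0): {(2, 0): 1, (3, 0): -1},
}

def residual(op, M=9, N=7):
    """Largest coefficient of op(sigma_2) in the range not affected by truncation."""
    image = apply_operator(op, orbifold_coefficients(M, N))
    return max((abs(v) for (m, n), v in image.items() if m <= M and n <= N - 2),
               default=Fraction(0))
```

Only output monomials with $m \le M$ and $n \le N - 2$ are inspected, because the $\partial_{x_2}^2$ terms lower the power of $x_2$ by two and would otherwise mix in coefficients beyond the truncation.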
An expansion around the orbifold point leads to a regime in which $t_1, t_2$ are very small. In view of (2.22) this corresponds, in the ABJM model, to the weakly coupled theory
\[ \lambda_1, \lambda_2 \ll 1. \tag{3.13} \]
The remaining period in (3.10) might be used to compute the genus zero free energy of the matrix model. Using the normalization appropriate for the ABJM matrix model, we find
\[ I = 4\frac{\partial F_0}{\partial \sigma_2} - \frac{\pi i t}{2} = \frac{1}{2}\mathcal{F}_{\sigma_2} - \log(4)\,\sigma_2 - \frac{\pi i}{2}\,\sigma_1. \tag{3.14} \]
3.2. Large radius, or strong coupling. The second point that we will be interested in is the so-called large radius point corresponding to $z_1 = z_2 = 0$. This is the point where the Calabi–Yau manifold is in its geometric phase, and the expansion of the genus zero free energy near that point leads to the counting of holomorphic curves with Gromov–Witten invariants. The solutions to the Picard–Fuchs equations (3.6) near this point can be obtained in a systematic way by considering the so-called fundamental period
\[ \varpi_0(z_1, z_2; \rho_1, \rho_2) = \sum_{k,l \ge 0} \frac{\Gamma(2k + 2l + 2\rho_1 + 2\rho_2)\,\Gamma(1 + \rho_1)^2\,\Gamma(1 + \rho_2)^2}{\Gamma(2\rho_1 + 2\rho_2)\,\Gamma(1 + k + \rho_1)^2\,\Gamma(1 + l + \rho_2)^2}\, z_1^{k + \rho_1} z_2^{l + \rho_2}. \tag{3.15} \]
As reviewed in, for example, [39], a basis of solutions to the PF equations can be obtained by acting on the fundamental period with the following differential operators:
\[ D^{(1)}_i = \partial_{\rho_i}, \qquad D^{(2)}_i = \frac{1}{2}\kappa_{ijk}\,\partial_{\rho_j}\partial_{\rho_k}. \tag{3.16} \]
Here $\kappa_{ijk}$ are the classical triple intersection numbers of the Calabi–Yau. This leads to the periods
\[ \begin{aligned} T_i(z_1, z_2) &= -D^{(1)}_i \varpi_0(z_1, z_2; \rho_1, \rho_2)\Big|_{\rho_1 = \rho_2 = 0}, \\ F_i(z_1, z_2) &= D^{(2)}_i \varpi_0(z_1, z_2; \rho_1, \rho_2)\Big|_{\rho_1 = \rho_2 = 0}. \end{aligned} \tag{3.17} \]
These periods should be linearly related to those defined in the matrix model in Eqs. (2.14) and (2.15). We present now some explicit expressions for them that we will use in Sects. 5.1 and 5.3 to solve for these relations (see Eqs. (5.3) and (5.23)). In general, one normalizes these periods and divides them by the fundamental period evaluated at $\rho_1 = \rho_2 = 0$. But in local mirror symmetry we have [40]
\[ \varpi_0(z_1, z_2; \rho_1, \rho_2)\Big|_{\rho_1 = \rho_2 = 0} = 1. \tag{3.18} \]
The $T_i$ are single-logarithm solutions, and they are identified in standard mirror symmetry with the complexified Kähler parameters, while the $F_i$ are double-logarithm solutions and they are identified with the derivatives of the large radius genus zero free energy w.r.t. the $T_i$. In our case, we find the explicit expressions
\[ -T_1 = \log z_1 + \omega^{(1)}(z_1, z_2), \qquad -T_2 = \log z_2 + \omega^{(1)}(z_1, z_2), \tag{3.19} \]
where
\[ \omega^{(1)}(z_1, z_2) = 2 \sum_{\substack{k,l \ge 0 \\ (k,l) \neq (0,0)}} \frac{\Gamma(2k + 2l)}{\Gamma(1 + k)^2\,\Gamma(1 + l)^2}\, z_1^k z_2^l = 2z_1 + 2z_2 + 3z_1^2 + 12 z_1 z_2 + 3z_2^2 + \cdots. \tag{3.20} \]
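The statement that $\log z_i + \omega^{(1)}$ solves the Picard–Fuchs system (3.6) can be verified order by order. The following sketch assumes only (3.6) and (3.20); the helper names are illustrative. It uses the fact that $\mathcal{L}_1$ acts on a monomial as $\mathcal{L}_1 z_1^k z_2^l = l^2 z_1^k z_2^{l-1} - 2(k+l)(2k+2l+1)\, z_1^k z_2^l$, together with $\mathcal{L}_1 \log z_1 = -2$, and the analogous statements for $\mathcal{L}_2$ with the roles of $z_1$ and $z_2$ exchanged.

```python
from fractions import Fraction
from math import factorial

def omega1_coeff(k, l):
    """Coefficient of z1^k z2^l in omega^(1), eq. (3.20)."""
    if k == 0 and l == 0:
        return Fraction(0)
    return Fraction(2 * factorial(2 * k + 2 * l - 1),
                    factorial(k) ** 2 * factorial(l) ** 2)

def L1_on_period(k, l):
    """Coefficient of z1^k z2^l in L1 acting on log z1 + omega^(1)."""
    value = ((l + 1) ** 2 * omega1_coeff(k, l + 1)
             - 2 * (k + l) * (2 * k + 2 * l + 1) * omega1_coeff(k, l))
    if k == 0 and l == 0:
        value -= 2          # contribution of L1 log z1 = -2
    return value

def L2_on_period(k, l):
    """Coefficient of z1^k z2^l in L2 acting on log z1 + omega^(1)."""
    value = ((k + 1) ** 2 * omega1_coeff(k + 1, l)
             - 2 * (k + l) * (2 * k + 2 * l + 1) * omega1_coeff(k, l))
    if k == 0 and l == 0:
        value -= 2          # contribution of L2 log z1 = -2
    return value
```

The same check goes through for $\log z_2 + \omega^{(1)}$, since both operators also send $\log z_2$ to $-2$.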
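In order to obtain the $F_i$ we have to compute the double derivatives w.r.t. the parameters $\rho_1, \rho_2$. We find
\[ \partial_{\rho_1}^2 \varpi_0(z_1, z_2; \rho_1, \rho_2)\Big|_{\rho_1 = \rho_2 = 0} = \log^2 z_1 + 2 \log z_1\, \omega^{(1)}(z_1, z_2) + \omega^{(2)}_1(z_1, z_2), \tag{3.21} \]
where
\[ \omega^{(2)}_1(z_1, z_2) = 8 \sum_{\substack{k,l \ge 0 \\ (k,l) \neq (0,0)}} \frac{\Gamma(2k + 2l)}{\Gamma(1 + k)^2\,\Gamma(1 + l)^2}\left(\psi(2k + 2l) - \psi(1 + k)\right) z_1^k z_2^l. \tag{3.22} \]
Similarly,
\[ \partial_{\rho_2}^2 \varpi_0(z_1, z_2; \rho_1, \rho_2)\Big|_{\rho_1 = \rho_2 = 0} = \log^2 z_2 + 2 \log z_2\, \omega^{(1)}(z_1, z_2) + \omega^{(2)}_2(z_1, z_2), \tag{3.23} \]
where
\[ \omega^{(2)}_2(z_1, z_2) = 8 \sum_{\substack{k,l \ge 0 \\ (k,l) \neq (0,0)}} \frac{\Gamma(2k + 2l)}{\Gamma(1 + k)^2\,\Gamma(1 + l)^2}\left(\psi(2k + 2l) - \psi(1 + l)\right) z_1^k z_2^l = \omega^{(2)}_1(z_2, z_1). \tag{3.24} \]
Finally,
\[ \partial_{\rho_1}\partial_{\rho_2} \varpi_0(z_1, z_2; \rho_1, \rho_2)\Big|_{\rho_1 = \rho_2 = 0} = \log z_1 \log z_2 + (\log z_1 + \log z_2)\, \omega^{(1)}(z_1, z_2) + \frac{1}{2}\left(\omega^{(2)}_1(z_1, z_2) + \omega^{(2)}_2(z_1, z_2)\right). \tag{3.25} \]
The double log periods are obtained as linear combinations of the above, by using the explicit expressions for the classical intersection numbers that can be found in, for example, [31]:
\[ \kappa_{111} = \frac{1}{4}, \qquad \kappa_{112} = -\frac{1}{4}, \qquad \kappa_{122} = -\frac{1}{4}, \qquad \kappa_{222} = \frac{1}{4}. \tag{3.26} \]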
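We find:
\[ \begin{aligned} F_1(z_1, z_2) &= -\frac{1}{8}\left(D^2_{\rho_1}\varpi_0 - 2 D_{\rho_1\rho_2}\varpi_0 - D^2_{\rho_2}\varpi_0\right) \\ &= -\frac{1}{8}\left(\log^2 z_1 - 2 \log z_1 \log z_2 - \log^2 z_2\right) + \frac{1}{4}\log z_2\, \omega^{(1)}(z_1, z_2) + \frac{1}{8}\,\omega^{(2)}_2(z_1, z_2), \\ F_2(z_1, z_2) &= -\frac{1}{8}\left(-D^2_{\rho_1}\varpi_0 - 2 D_{\rho_1\rho_2}\varpi_0 + D^2_{\rho_2}\varpi_0\right) \\ &= -\frac{1}{8}\left(-\log^2 z_1 - 2 \log z_1 \log z_2 + \log^2 z_2\right) + \frac{1}{4}\log z_1\, \omega^{(1)}(z_1, z_2) + \frac{1}{8}\,\omega^{(2)}_1(z_1, z_2). \end{aligned} \tag{3.27} \]
They satisfy the symmetry property
\[ F_1(z_1, z_2) = F_2(z_2, z_1). \tag{3.28} \]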
We are interested in the large radius point because it describes the structure of the ABJM theory at strong coupling. In the region where $z_2$ is small, $x_2$ is large and the periods $t_{1,2}$ grow. In general, the expansions of the periods around the special points have a finite radius of convergence, but they can be analytically continued to the other "patches". Since their analytic continuation satisfies the PF equation, we know for example that the analytic continuation of the orbifold periods to the large radius patch must be linear combinations of the periods at large radius. This provides an easy way to perform the analytic continuation, which will be carried out in detail in Sect. 5, where we will verify that indeed the region near the large radius point corresponds to
\[ \lambda_1, \lambda_2 \gg 1. \tag{3.29} \]
3.3. Conifold locus. Finally, the third set of special points is the conifold locus. This is defined by $\Delta = 0$, where
\[ \Delta = 1 - 8(z_1 + z_2) + 16(z_1 - z_2)^2. \tag{3.30} \]
In terms of the variables $\zeta, \beta$, this locus corresponds to the four lines
\[ \zeta = -2\beta \pm 2, \qquad \zeta = 2\beta \pm 2. \tag{3.31} \]
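That the four lines (3.31) lie on the discriminant locus follows from inserting $z_1 = 1/\zeta^2$ and $z_2 = (\beta/\zeta)^2$, the inverse of (3.3), into (3.30). A short exact-arithmetic sketch of this check follows; the function names and the sample values of $\beta$ are arbitrary choices made for illustration.

```python
from fractions import Fraction

def discriminant(z1, z2):
    """Delta of eq. (3.30)."""
    return 1 - 8 * (z1 + z2) + 16 * (z1 - z2) ** 2

def moduli_from(zeta, beta):
    """Invert (3.3): z1 = 1/zeta^2 and z2 = (beta/zeta)^2."""
    return 1 / zeta ** 2, (beta / zeta) ** 2

def on_conifold_lines(beta):
    """Check Delta = 0 on the four lines zeta = +-2 beta +- 2 for a given beta."""
    results = []
    for s1 in (1, -1):
        for s2 in (1, -1):
            zeta = 2 * s1 * beta + 2 * s2
            results.append(discriminant(*moduli_from(zeta, beta)) == 0)
    return all(results)
```

The sample $\beta$ must avoid $\beta = \pm 1$, where one of the lines passes through $\zeta = 0$ and the map to $z_{1,2}$ is singular.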
The conifold locus is the place where cycles in the geometry collapse to zero size. The first two lines correspond to $a = \pm 1$, i.e., the collapse of the $C_1$ cycle, while the second set of lines corresponds to $b = \mp 1$, i.e., to the collapse of the $C_2$ cycle. In principle we can solve the PF system near any point in the conifold locus, but in practice it is useful to focus on the point
\[ z_1 = z_2 = \frac{1}{16}, \tag{3.32} \]
which has been studied in [31]. We will call it the symmetric conifold point. Appropriate global coordinates around this point are (these are slightly different from the ones used in [31])
\[ y_1 = 1 - \frac{z_1}{z_2}, \qquad y_2 = 1 - \frac{1}{16 z_1}. \tag{3.33} \]
In terms of these coordinates, the PF system reads
\[ \begin{aligned} \mathcal{L}_1 &= \partial_{y_2} - 2(1 - y_2)\,\partial_{y_2}^2 - 8(1 - y_1)^2\,\partial_{y_1} + 8(1 - y_1)^3\,\partial_{y_1}^2, \\ \mathcal{L}_2 &= -(7 - 8y_2)\,\partial_{y_2} + 2(3 - 7y_2 + 4y_2^2)\,\partial_{y_2}^2 - 8(1 - y_1)\,\partial_{y_1} \\ &\quad - 16(1 - y_1)(1 - y_2)\,\partial_{y_1}\partial_{y_2} + 8(1 - y_1)^2\,\partial_{y_1}^2. \end{aligned} \tag{3.34} \]
Notice that, strictly speaking, the orbifold point does not belong to the conifold locus, once the moduli space is compactified and resolved [10]. A generic point in the conifold locus has then $t_1 = 0$ or $t_2 = 0$, but not both, and expanding around the conifold locus means, in the ABJM theory, an expansion in the region
\[ \lambda_1 \ll 1, \qquad \lambda_2 \sim 1, \tag{3.35} \]
or in the region with $\lambda_2$ exchanged with $\lambda_1$. This regime of the ABJM theory has been considered in [26].

It was observed in [41] that the moduli space of the local $\mathbb{F}_0$ surface can be mapped to a well-known moduli space, namely the Seiberg–Witten (SW) $u$-plane [42]. This plane is parametrized by a single complex variable $u$. The relation between the moduli is
\[ u = \frac{1}{2}\left(\beta + \beta^{-1}\right) - \frac{\zeta^2}{8\beta}. \tag{3.36} \]
The three singular points that we have discussed (large radius, orbifold, and symmetric conifold) map to the points $u = \infty, +1, -1$. These are the semiclassical, monopole and dyon points of SW theory. As we will see, they can be identified with interesting points in ABJM theory.

An important set of quantities in the study of moduli spaces of CY threefolds are the three-point couplings or Yukawa couplings, $C_{z_i z_j z_k}$. These are the components of a completely symmetric degree three covariant tensor on the moduli space. When expressed in terms of flat coordinates they give the third derivatives of the genus zero free energy. In terms of the coordinates $z_1, z_2$, the Yukawa couplings are given by [10,31]
\[ \begin{aligned} C_{111} &= \frac{(1 - 4z_2)^2 - 16 z_1(1 + z_1)}{4 z_1^3 \Delta}, & C_{112} &= \frac{16 z_1^2 - (1 - 4z_2)^2}{4 z_1^2 z_2 \Delta}, \\ C_{122} &= \frac{16 z_2^2 - (1 - 4z_1)^2}{4 z_1 z_2^2 \Delta}, & C_{222} &= \frac{(1 - 4z_1)^2 - 16 z_2(1 + z_2)}{4 z_2^3 \Delta}. \end{aligned} \tag{3.37} \]
Fig. 2. The moduli space of the ABJM theory, describing the possible values of the ’t Hooft couplings λ1,2 , can be parametrized by a real submanifold of the moduli space of local F0 , here depicted as a sphere. The orbifold point maps to the origin, while the conifold locus (which is represented by a dashed line) maps to the two axes
3.4. The moduli space of the ABJM theory. The matrix model of ABJM is closely related to the lens space matrix model, and therefore so are the moduli spaces of the theories. Some of the explicit relations needed for this identification will be presented only in the following sections, but we would still like to present here the main points on the moduli space.

We can think about the moduli space of the planar ABJM theory as the space of admissible values of the 't Hooft parameters $\lambda_1, \lambda_2$. We will assume for simplicity that $k > 0$. The theory with negative values of $k$ can be obtained from this one by a parity transformation. In the gauge theory $\lambda_{1,2}$ must be rational and non-negative (for $k > 0$). Moreover, according to [11], any value of $\lambda_{1,2}$ is admissible as long as
\[ |\lambda_1 - \lambda_2| \le 1. \tag{3.38} \]
This moduli space can be parametrized by the $B$ field and $\kappa$, which from the explicit expressions derived below (4.1) and (5.11) has to be real and positive. It can be identified as a real submanifold of the moduli space of local $\mathbb{F}_0$. Moreover, we can identify the singular points of this moduli space with natural limits of ABJM theory (see Fig. 2):

1. The weak coupling regime $\lambda_{1,2} \to 0$ corresponds to the orbifold point of the local $\mathbb{F}_0$ geometry $\kappa = 0$, $B = 1/2$. In terms of type IIA theory, this is also an orbifold geometry with a small radius but a nonzero value for the $B$ field.
2. The strong coupling regime $\lambda_{1,2} \to \infty$ (where also $\kappa \to \infty$) corresponds to the large radius limit of the local $\mathbb{F}_0$ geometry.
3. Out of the four lines (3.31) in the conifold locus $\Delta = 0$, only two lead to $\kappa \in \mathbb{R}$. They are the curves in the $(\kappa, B)$ plane with $\kappa = \pm 4\cos \pi B$, which correspond respectively to $a = 1$ and $b = 1$, therefore to $\lambda_1 = 0$ or $\lambda_2 = 0$. Hence, the boundary of the ABJM moduli space given by $\min(\lambda_1, \lambda_2) = 0$ corresponds to
\[ \kappa(B) = \begin{cases} -4\cos \pi B, & B > 1/2, \\ 4\cos \pi B, & B < 1/2. \end{cases} \tag{3.39} \]
In particular, the symmetric conifold point $z_1 = z_2 = 1/16$ corresponds to $B = n \in \mathbb{Z}$, $\kappa = \pm 4$. Along the curve (3.39), one of the two gauge groups of the ABJM theory is absent, so the theory reduces to topological CS theory. We examine this regime in Sect. 6.

Given a fixed value of the $B$ field, we can describe the real one-dimensional moduli space of the ABJM theory as a real submanifold of the $u$-plane of Seiberg–Witten theory, by using (3.36) in the form
\[ u = -\cos(2\pi B) + \frac{\kappa^2}{8}. \tag{3.40} \]
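The passage from (3.36) to (3.40) can be made explicit with a small numerical sketch. It assumes the dictionary $\zeta = e^{\pi i B}\kappa$ and $\beta = -e^{2\pi i B}$. The first relation follows from comparing (2.35) with (4.2). The sign of $\beta$ is not fixed by (3.3) and (3.4), which only give $\beta^2 = e^{4\pi i B}$, so the choice made here is the one that reproduces (3.40) and should be read as an assumption about (2.24).

```python
import cmath
import math

def u_from_zeta_beta(zeta, beta):
    """Eq. (3.36): u = (beta + 1/beta)/2 - zeta^2 / (8 beta)."""
    return 0.5 * (beta + 1 / beta) - zeta ** 2 / (8 * beta)

def u_from_kappa_B(kappa, B):
    """Eq. (3.40): u = -cos(2 pi B) + kappa^2 / 8."""
    return -math.cos(2 * math.pi * B) + kappa ** 2 / 8

def zeta_beta(kappa, B):
    """Assumed dictionary: zeta = e^{i pi B} kappa, beta = -e^{2 pi i B}."""
    return cmath.exp(1j * math.pi * B) * kappa, -cmath.exp(2j * math.pi * B)
```

With this dictionary, $B = 1/2$ and $\kappa = 0$ give $u = 1$, while $B = 1/2$ and $\kappa^2 = -16$ give $u = -1$.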
Singular points in moduli space become then the well-known singularities of SW theory. For example, when $B = 1/2$, the moduli space, described by $\kappa \in [0, \infty)$, becomes the region $u \in [1, \infty)$. The orbifold point (weakly coupled ABJM theory) maps to the monopole point, while the large radius point (strongly coupled ABJM theory) corresponds to the semi-classical region (see Fig. 3). Notice that the conifold point would map to the dyon point of Seiberg–Witten theory, but this does not belong to the moduli space of ABJM theory with $B = 1/2$. We can however realize it by making an analytic continuation of the 't Hooft coupling to complex values. The dyon point corresponds then to the point $\kappa^2 = -16$, which leads by (5.5) to an imaginary value
\[ \lambda = -\frac{2 i K}{\pi^2}, \tag{3.41} \]
where $K$ is Catalan's constant. As usual, string dualities lead to a full complexification of the moduli space of 't Hooft parameters. In the case of ABJM theory, the complexified moduli space for the variables $\lambda_{1,2}$ is simply the moduli space of the parameters $\beta, \zeta$, which is a $\mathbb{Z}_2 \times \mathbb{Z}_2$ covering of the moduli space parametrized by $z_{1,2}$.
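The value (3.41) can be reproduced numerically from the mirror map (5.5), $\lambda(\kappa) = \frac{\kappa}{8\pi}\, {}_3F_2\left(\frac12, \frac12, \frac12; 1, \frac32; -\frac{\kappa^2}{16}\right)$, evaluated at $\kappa^2 = -16$, where the hypergeometric argument equals one. The sketch below sums the series directly. The choice of root $\kappa = -4i$ is an assumption made to match the sign in (3.41), the other root flips the sign, and the number of terms is an arbitrary truncation of a slowly converging sum.

```python
import math

CATALAN = 0.915965594177219  # Catalan's constant, quoted to double precision

def hyp3f2_at_one(terms=200000):
    """Partial sum of 3F2(1/2, 1/2, 1/2; 1, 3/2; 1).

    Consecutive terms satisfy a_{n+1}/a_n = (2n+1)^3 / (4 (n+1)^2 (2n+3)).
    The terms decay like 1/n^2, so the truncation error is of order 1/terms.
    """
    total, a = 0.0, 1.0
    for n in range(terms):
        total += a
        a *= (2 * n + 1) ** 3 / (4 * (n + 1) ** 2 * (2 * n + 3))
    return total

def lambda_at_dyon_point():
    """Mirror map (5.5) at kappa = -4i, i.e. kappa^2 = -16."""
    kappa = -4j
    return kappa / (8 * math.pi) * hyp3f2_at_one()
```

The result is purely imaginary and agrees with $-2iK/\pi^2$ to the accuracy allowed by the truncation.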
Fig. 3. The moduli space of the ABJM theory for B = 1/2 can be mapped to the line [1, ∞) in the u plane of Seiberg–Witten theory, which is here shown in red. The monopole point corresponds to the weakly coupled ABJM theory, while the semiclassical limit corresponds to the strongly coupled theory
4. Weak Coupling

In principle, to study the matrix model at weak coupling one does not need the sophisticated tools presented in the previous section. One can do perturbative calculations directly in the integral expressions (2.1) or (2.3) for the matrix model. A calculation of the 1/6 BPS Wilson loop to three loop order was indeed done in this way in the original paper [4]. Still, the explicit expressions for the periods $\sigma_{1,2}$ (3.10) and their relation to $t_{1,2}$ (3.12) give a much more efficient way to obtain perturbative, planar expansions. Inverting these relations we find the weak coupling expression for $\kappa$ (2.24)
\[ \begin{aligned} \kappa = {} & -2i(t_1 - t_2) - \frac{i}{12}\left(t_1^3 + 3t_1^2 t_2 - 3t_1 t_2^2 - t_2^3\right) \\ & - \frac{i}{960}\left(t_1^5 + 5t_1^4 t_2 - 10 t_1^3 t_2^2 + 10 t_1^2 t_2^3 - 5 t_1 t_2^4 - t_2^5\right) + O(t^7). \end{aligned} \tag{4.1} \]
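This agrees with the weak coupling expansion of the inverse of the exact mirror map (5.5), obtained in [8]. Using the dictionary relating the 't Hooft couplings (2.22) we immediately get the result for the 1/2 BPS Wilson loop in the planar approximation (2.35),
\[ \begin{aligned} \langle W^{1/2} \rangle = e^{\pi i B}\,\frac{\kappa}{2} = e^{\pi i(\lambda_1 - \lambda_2)}\, 2\pi i(\lambda_1 + \lambda_2)\Big[ & 1 - \frac{\pi^2}{6}\left(\lambda_1^2 - 4\lambda_1\lambda_2 + \lambda_2^2\right) \\ & + \frac{\pi^4}{120}\left(\lambda_1^4 - 6\lambda_1^3\lambda_2 - 4\lambda_1^2\lambda_2^2 - 6\lambda_1\lambda_2^3 + \lambda_2^4\right) + O(\lambda^6)\Big]. \end{aligned} \tag{4.2} \]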
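The step from (4.1) to (4.2) is a polynomial identity and can be checked numerically. The sketch below assumes $t_1 = 2\pi i \lambda_1$ and $t_2 = -2\pi i \lambda_2$, as implied by the normalization remarks at the end of Sect. 2.2, and $B = \lambda_1 - \lambda_2 + 1/2$, which is the relation implicit in (5.7). The function names are illustrative.

```python
import cmath
import math

def kappa_weak(lam1, lam2):
    """Truncated weak-coupling expansion (4.1), with t1 = 2 pi i lam1, t2 = -2 pi i lam2."""
    t1, t2 = 2j * math.pi * lam1, -2j * math.pi * lam2
    cubic = t1 ** 3 + 3 * t1 ** 2 * t2 - 3 * t1 * t2 ** 2 - t2 ** 3
    quintic = (t1 ** 5 + 5 * t1 ** 4 * t2 - 10 * t1 ** 3 * t2 ** 2
               + 10 * t1 ** 2 * t2 ** 3 - 5 * t1 * t2 ** 4 - t2 ** 5)
    return -2j * (t1 - t2) - 1j / 12 * cubic - 1j / 960 * quintic

def bracket_42(lam1, lam2):
    """The square bracket in (4.2), truncated at fourth order."""
    p2 = lam1 ** 2 - 4 * lam1 * lam2 + lam2 ** 2
    p4 = (lam1 ** 4 - 6 * lam1 ** 3 * lam2 - 4 * lam1 ** 2 * lam2 ** 2
          - 6 * lam1 * lam2 ** 3 + lam2 ** 4)
    return 1 - math.pi ** 2 / 6 * p2 + math.pi ** 4 / 120 * p4

def wilson_half_from_kappa(lam1, lam2):
    """e^{i pi B} kappa / 2 with B = lam1 - lam2 + 1/2 (assumed dictionary)."""
    B = lam1 - lam2 + 0.5
    return cmath.exp(1j * math.pi * B) * kappa_weak(lam1, lam2) / 2

def wilson_half_from_42(lam1, lam2):
    """Right-hand side of (4.2)."""
    prefactor = cmath.exp(1j * math.pi * (lam1 - lam2)) * 2j * math.pi * (lam1 + lam2)
    return prefactor * bracket_42(lam1, lam2)
```

Both truncations agree exactly as polynomials, so the two Wilson loop functions should coincide up to rounding for any values of the couplings.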
In this expression we factored out the term $2\pi i(\lambda_1 + \lambda_2)$, which depends on the overall normalization of the Wilson loop, as mentioned after (2.35). There is also the extra phase factor, which appears also at strong coupling and can be attributed to framing. Note that so far this expansion has not been reproduced directly in the gauge theory, as even the two-loop graphs are quite subtle.

For the 1/6 BPS Wilson loop, using the explicit expression (2.33) and expanding at low orders one finds [8]
\[ \langle W^{1/6} \rangle = e^{\pi i \lambda_1}\, 2\pi i \lambda_1 \left[ 1 - \frac{\pi^2}{6}\lambda_1(\lambda_1 - 6\lambda_2) - \frac{\pi^3 i}{2}\lambda_1\lambda_2^2 + \frac{\pi^4}{120}\lambda_1\left(\lambda_1^3 - 10\lambda_1^2\lambda_2 - 20\lambda_2^3\right) + O(\lambda^5) \right]. \tag{4.3} \]
Again the exponent is a framing factor and the factor of $2\pi i \lambda_1$ is due to the normalization chosen in (2.30). This expression agrees with the 2-loop calculations in [15–17]. Note that the 3-loop analysis in [17], done for $\lambda_1 = \lambda_2$, misses the next term, due to a projection which essentially removes all terms at odd orders in perturbation theory.

Next we turn to the free energy. Here we notice that the period in (2.15) gives only the derivative of the free energy. Indeed, within the formalism of special geometry developed above, the planar free energy of the matrix model is only determined up to quadratic terms in the 't Hooft couplings. These have to be fixed by direct calculation in the matrix model
\[ F = \frac{N_1^2}{2}\log\left(\frac{2\pi N_1}{k}\right) + \frac{N_2^2}{2}\log\left(\frac{2\pi N_2}{k}\right) - \frac{3}{4}\left(N_1^2 + N_2^2\right) - \log(4)\, N_1 N_2 + \cdots. \tag{4.4} \]
The last term comes from the normalization of the cosh term in (2.1), while the remaining terms are just the free energies for two Gaussian matrix models with couplings $\pm 2\pi i/k$. Notice that the above free energy has an imaginary piece given by
\[ \frac{\pi i}{6k}(N_1 - N_2)\left((N_1 - N_2)^2 - 1\right). \tag{4.5} \]
Using the identification of the periods at weak coupling (3.14) we write down the next term in the perturbative expansion
\[ -\frac{\pi^2}{72 k^2}\left(N_1^4 - 6N_1^3 N_2 + 18 N_1^2 N_2^2 - 6 N_1 N_2^3 + N_2^4\right). \tag{4.6} \]
It would be interesting to try to reproduce these expressions directly from studying perturbative ABJM theory on $S^3$.

5. Strong Coupling Expansion and the AdS Dual

We turn now to the strong coupling limit of the matrix model, where we have to find the analytic continuation of the 't Hooft parameters to the strong coupling region, as functions of the global parameters of moduli space. We will see how the shift of the charges discussed in [23,24] emerges naturally from our computation. We will also evaluate the free energy in this regime and compare with the classical action of the vacuum AdS dual, deriving in this way the $N^{3/2}$ behavior of the degrees of freedom.

5.1. Analytic continuation and shifted charges. In order to perform the analytic continuation of the 't Hooft parameters, we use the explicit representation of the periods in terms of integrals given in (2.14) as well as their derivatives (2.16)–(2.17). Let us start by discussing $t_1$. We study its behavior at large $\zeta$ but fixed $\beta$, which is the large radius region. We find
\[ \frac{\partial t_1}{\partial \zeta} = \frac{i}{\pi\zeta}\log\left(-\frac{\zeta^2}{\beta}\right) + o(\zeta^{-1}), \qquad \frac{\partial t_1}{\partial \beta} = -\frac{i}{2\pi\beta}\left(\log(-\zeta^2) + \pi i\right) + o(1), \tag{5.1} \]
and this gives the leading behavior
\[ t_1 = -\frac{i}{2\pi}\left(\log(-\zeta^2) + \pi i\right)\log\frac{\beta}{\zeta} + \cdots. \tag{5.2} \]
In the physical theory $t_1$ should be imaginary and $\beta$ a phase. By examining (5.2), this implies that $\kappa$ is real. From (3.4) we then see that $z_1 = \bar{z}_2$ and henceforth we label it $z_1 = z$.
We know also that $t_1$ must be a linear combination of the periods at large radius. Using that $z_1 = 1/\zeta^2$ and $z_2 = (\beta/\zeta)^2$, and comparing (5.2) to the behavior of the periods (3.19) and (3.27), we find
\[ t_1 = \frac{i}{2\pi}(F_1 + F_2) - \frac{1}{2}T_2 - \frac{\pi i}{6}, \qquad t_2 = -\frac{i}{2\pi}(F_1 + F_2) + \frac{1}{2}T_1 + \frac{\pi i}{6}. \tag{5.3} \]
The constants $\pm \pi i/6$ cannot be fixed by using the above information, but they can be fixed by specializing to the ABJM slice $z_1 = z_2$, as we will see in a moment. A simple calculation leads to the following explicit expression:
\[ \lambda_1(\kappa, B) = \frac{1}{2}\left(B^2 - \frac{1}{4}\right) + \frac{1}{24} + \frac{\log^2 \kappa}{2\pi^2} - \frac{\log \kappa}{2\pi^2}\,\omega^{(1)}(z, \bar{z}) + \frac{1}{16\pi^2}\left(\omega^{(2)}_1(z, \bar{z}) + \omega^{(2)}_2(z, \bar{z})\right). \tag{5.4} \]
This expansion is valid in the region $\kappa \to +\infty$. Notice that it is manifestly real when $\kappa$ is real and positive.

As a check of the above expression, we can particularize to the ABJM slice $\lambda_1 = \lambda_2 = \lambda$ ($B = 1/2$), which corresponds in the gauge theory to having identical gauge groups in the two nodes of the quiver, i.e., $N_1 = N_2$. The mirror map for this case was obtained in [8] as
\[ \lambda\left(\kappa, B = \tfrac{1}{2}\right) = \frac{\kappa}{8\pi}\, {}_3F_2\left(\frac{1}{2}, \frac{1}{2}, \frac{1}{2}; 1, \frac{3}{2}; -\frac{\kappa^2}{16}\right). \tag{5.5} \]
The strong coupling expansion of this expression at $\kappa \gg 1$ is
\[ \lambda\left(\kappa, B = \tfrac{1}{2}\right) = \frac{\log^2 \kappa}{2\pi^2} + \frac{1}{24} + O(\kappa^{-2}), \tag{5.6} \]
in agreement with (5.4). This also fixes the constants in (5.3).

As in [8], the observables of the model are naturally functions of $\zeta, \beta$ (alternatively $\kappa, B$), and we have to re-express them in terms of $\lambda_{1,2}$. Equation (5.4) shows that the natural variable at strong coupling is not $\lambda_1$, but rather
\[ \hat{\lambda} = \lambda_1 - \frac{1}{2}\left(B^2 - \frac{1}{4}\right) - \frac{1}{24} = \frac{1}{2}(\lambda_1 + \lambda_2) - \frac{1}{2}(\lambda_1 - \lambda_2)^2 - \frac{1}{24}. \tag{5.7} \]
In particular, it is only when expressed in terms of this variable that $\kappa$ is a periodic function of $\hat{\lambda}, B$. Remarkably, the above shift is precisely the one found in [24]. In the type IIA realization of the ABJ theory $U(M_2)_k \times U(M_2 + M_4)_{-k}$, where $M_2$ corresponds to the number of D2 branes and $M_4$ to the number of D4 branes, the Maxwell charge of the D2 branes is not $M_2$, but rather
\[ Q_2 = M_2 - \frac{k}{2}\left(B^2 - \frac{1}{4}\right) - \frac{1}{24}\left(k - \frac{1}{k}\right), \tag{5.8} \]
where
\[ B = -\frac{M_4}{k} + \frac{1}{2}. \tag{5.9} \]
After dividing by $k$ and taking the large $k$ limit, we recover (5.7) with
\[ \hat{\lambda} = \frac{Q_2}{k}. \tag{5.10} \]
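The relation between the Maxwell charge (5.8) and the shifted coupling (5.7) can be illustrated with exact rational arithmetic. The sketch assumes the identification $\lambda_1 = M_2/k$, $\lambda_2 = (M_2 + M_4)/k$ for the gauge group $U(M_2)_k \times U(M_2 + M_4)_{-k}$, which is consistent with (5.9) if $B = \lambda_1 - \lambda_2 + 1/2$. With these assumptions $Q_2/k$ and $\hat{\lambda}$ differ by exactly $1/(24k^2)$, which vanishes in the large $k$ limit.

```python
from fractions import Fraction

def b_field(M4, k):
    """Eq. (5.9): B = -M4/k + 1/2."""
    return Fraction(1, 2) - Fraction(M4, k)

def maxwell_charge(M2, M4, k):
    """Eq. (5.8): Q2 = M2 - (k/2)(B^2 - 1/4) - (1/24)(k - 1/k)."""
    B = b_field(M4, k)
    return M2 - Fraction(k, 2) * (B ** 2 - Fraction(1, 4)) - Fraction(1, 24) * (k - Fraction(1, k))

def lambda_hat(lam1, lam2):
    """Second form of eq. (5.7)."""
    return Fraction(1, 2) * (lam1 + lam2) - Fraction(1, 2) * (lam1 - lam2) ** 2 - Fraction(1, 24)

def lambda_hat_from_B(lam1, B):
    """First form of eq. (5.7)."""
    return lam1 - Fraction(1, 2) * (B ** 2 - Fraction(1, 4)) - Fraction(1, 24)

def charge_mismatch(M2, M4, k):
    """Q2/k - lambda_hat, assuming lam1 = M2/k and lam2 = (M2 + M4)/k."""
    lam1, lam2 = Fraction(M2, k), Fraction(M2 + M4, k)
    return maxwell_charge(M2, M4, k) / k - lambda_hat(lam1, lam2)
```

For example, `charge_mismatch(5, 2, 7)` returns the fraction $1/(24 \cdot 49)$.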
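The relation between $\hat{\lambda}$ and $\kappa$ can be inverted at strong coupling, generalizing [8] to $B \neq \frac{1}{2}$, and it is of the form
\[ \kappa(\hat{\lambda}, B) = e^{\pi\sqrt{2\hat{\lambda}}}\left(1 + \sum_{\ell \ge 1} c_\ell\left(\frac{1}{\pi\sqrt{2\hat{\lambda}}}, \beta\right) e^{-2\pi\ell\sqrt{2\hat{\lambda}}}\right), \tag{5.11} \]
where
\[ c_\ell(x, \beta) = \sum_{k=0}^{2\ell - 1} c^{(\ell)}_k(\beta)\, x^k. \tag{5.12} \]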
The coefficients $c^{(\ell)}_k(\beta)$ are Laurent polynomials in $\beta, \beta^{-1}$, of degree $\ell$, and symmetric under the exchange $\beta \leftrightarrow \beta^{-1}$. In other words, they can be written as polynomials in $\cos(2\pi m B)$, so they are periodic in $B$, with period 1. We find, for example,
\[ \begin{aligned} c_1(x, \beta) &= -\left(\beta + \beta^{-1}\right)\left(1 - \frac{x}{2}\right), \\ c_2(x, \beta) &= 3 + \frac{x}{8}\left(3\beta^2 - 8 + 3\beta^{-2}\right) - \frac{3x^2}{8}\left(\beta + \beta^{-1}\right)^2 - \frac{x^3}{8}\left(\beta + \beta^{-1}\right)^2. \end{aligned} \tag{5.13} \]
The fact that $c_\ell(x, \beta)$ are polynomials in $x$ of degree $2\ell - 1$, rather than power series, comes out from an explicit calculation of the first few cases, and we have not established it.

From the explicit expression (5.4) we can implement the symmetries of the model as a function of $\kappa$ and $B$ (or equivalently, $z_1$ and $z_2$). For example, the transformation
\[ N_1 \to 2N_1 + k - N_2, \qquad N_2 \to N_1 \tag{5.14} \]
simply corresponds to periodicity in the $B$ field
\[ B \to B + 1 \tag{5.15} \]
while $\kappa$ remains unchanged. From the point of view of the $z_{1,2}$ variables, this is simply a monodromy transformation $z_{1,2} \to e^{\mp 2\pi i} z_{1,2}$. Notice that not all the values of $\kappa$ lead to admissible values of $\lambda_{1,2}$, since $\min(\lambda_1, \lambda_2) \ge 0$. This means that the boundary of moduli space is the conifold locus (3.39).
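The inversion (5.11) with the coefficients (5.13) can be tested against the series (5.4). The sketch below is a hedged numerical illustration: it evaluates $\hat{\lambda}(\kappa, B)$ from (5.4) and (5.7) with the sums (3.20), (3.22) and (3.24) truncated at a finite total degree, and then reconstructs $\kappa$ from (5.11) keeping $c_1$ and $c_2$. It assumes $\beta = -e^{2\pi i B}$ and uses $\psi(n) - \psi(m) = H_{n-1} - H_{m-1}$ for integer arguments. The truncation order and the function names are choices made here.

```python
import cmath
import math
from math import factorial

def harmonic(n):
    """Harmonic number H_n, with H_0 = 0."""
    return sum(1.0 / j for j in range(1, n + 1))

def lambda_hat_of_kappa(kappa, B, order=8):
    """lambda_hat from (5.4) and (5.7), with the z-series truncated at total degree `order`."""
    z1 = cmath.exp(-2j * math.pi * B) / kappa ** 2
    z2 = cmath.exp(2j * math.pi * B) / kappa ** 2
    w1 = 0.0   # omega^(1)(z, zbar)
    w2 = 0.0   # omega_1^(2) + omega_2^(2)
    for k in range(order + 1):
        for l in range(order + 1 - k):
            if k == 0 and l == 0:
                continue
            g = factorial(2 * k + 2 * l - 1) / (factorial(k) ** 2 * factorial(l) ** 2)
            mono = z1 ** k * z2 ** l
            w1 += 2 * g * mono
            w2 += 8 * g * (2 * harmonic(2 * k + 2 * l - 1) - harmonic(k) - harmonic(l)) * mono
    L = math.log(kappa)
    value = L ** 2 / (2 * math.pi ** 2) - L * w1 / (2 * math.pi ** 2) + w2 / (16 * math.pi ** 2)
    return value.real

def kappa_of_lambda_hat(lam_hat, B, terms=2):
    """Eq. (5.11) truncated after c_1 (terms=1) or c_2 (terms=2), with (5.13)."""
    L0 = math.pi * math.sqrt(2 * lam_hat)
    x = 1 / L0
    beta = -cmath.exp(2j * math.pi * B)          # assumed sign convention for beta
    s = (beta + 1 / beta).real
    c1 = -s * (1 - x / 2)
    c2 = (3 + x / 8 * (3 * beta ** 2 - 8 + 3 / beta ** 2).real
          - 3 * x ** 2 / 8 * s ** 2 - x ** 3 / 8 * s ** 2)
    Q = math.exp(-2 * L0)
    series = 1 + c1 * Q + (c2 * Q ** 2 if terms >= 2 else 0.0)
    return math.exp(L0) * series
```

Starting from a value such as $\kappa = 20$, $B = 0.3$, computing $\hat{\lambda}$ and mapping back should return $\kappa$ up to corrections of order $e^{-6\pi\sqrt{2\hat{\lambda}}}$, and including $c_2$ should improve on the result obtained with $c_1$ alone.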
5.2. Wilson loops at strong coupling and semi-classical strings. As an application of the explicit expression for $\kappa$ (5.11), we can use (2.24) to immediately obtain the VEV of the 1/2 BPS Wilson loop (2.35) at strong coupling
\[ \langle W^{1/2} \rangle_{g=0} = \frac{1}{2}\, e^{\pi i B}\, \kappa(\hat{\lambda}, B). \tag{5.16} \]
Note that this is a real function of $\hat{\lambda}, B$, up to the overall phase involving the $B$ field. This is the same phase that appears also in the weak-coupling result (4.2) and arises also in field theory calculations as a framing-dependent term [12,43,44]. The matrix model always gives the answer for framing $= 1$.

The result for the 1/6 BPS Wilson loop is, as usual, more complicated, but can still be written in a power series expansion at strong coupling. We quote only the leading strong coupling result for $\lambda_1 = \lambda_2$ [8],
\[ \langle W^{1/6} \rangle_{g=0} \approx -\frac{\sqrt{2\lambda}}{2}\, e^{\pi\sqrt{2\lambda}}. \tag{5.17} \]
We would like to comment about the normalization of the operators. As mentioned after (2.35), the normalization chosen there is such that the trace of the identity in the fundamental of $U(N_1)$ gives $t_1 = 2\pi i N_1/k$ and for the fundamental of $U(N_1|N_2)$ (with a minus sign as in (2.9)), it gives $t_1 - t_2 = 2\pi i(N_1 + N_2)/k$. In CS theory these normalizations are quite common, but they may not be the most natural ones in the ABJM theory. An alternative normalization is to divide by this term, such that at weak coupling the expansion of the Wilson loop will be $W \sim 1 + \cdots$. This is the normalization chosen in [8], and hence the slight differences in the preceding equations from that reference. Note, though, that with such a normalization, one would have to divide the doubly-wound 1/2 BPS Wilson loop in the fundamental representation by the super-trace of the identity, which is $2\pi i(N_1 - N_2)/k$ and is singular for $N_1 = N_2$.

There should be a natural choice of normalization that would reproduce the correct normalization of the one-loop partition function of the classical string in $AdS_4 \times \mathbb{CP}^3$. To this day, though, a fully satisfactory calculation for the analog string in $AdS_5 \times S^5$ giving the factor of $\lambda^{-3/4}$ derived from the Gaussian matrix model does not exist. One argument, based on world-sheet considerations, was given in [45], but it is not clear why this argument would be modified for ABJM theory. Direct calculations of the determinant [46,47] were not conclusive. A possible trick to derive it was proposed in [48] by considering a 1/4 BPS generalization of the circular Wilson loop, where three zero modes of the Wilson loop of [49] are explicitly broken and the integral over them gives this factor. It would be interesting to construct such a generalization of the Wilson loop of [7] and see if a similar argument can be derived from that.

Regardless of the overall normalization, one can compare those of the 1/2 BPS loop and the 1/6 BPS loop. Ignoring numerical constants and the framing factor, the ratio is
\[ \frac{\langle W^{1/6} \rangle_{g=0}}{\langle W^{1/2} \rangle_{g=0}} \approx \sqrt{\lambda}, \tag{5.18} \]
which is proportional to the volume of a $\mathbb{CP}^1$ inside $\mathbb{CP}^3$. Indeed, it was argued in [15,17] that the string description of the 1/6 BPS Wilson loop should be in terms of a string smeared over such a cycle.
5.3. The planar free energy and a derivation of the $N^{3/2}$ behaviour. In this section we study the free energy at strong coupling. We derive the $N^{3/2}$ behavior characteristic of M2 branes [25], and we match the exact coefficient with a gravity calculation in type IIA superstring on $AdS_4 \times \mathbb{CP}^3$.

The free energy of the matrix model has a large $N$ expansion of the form
\[ F = \log Z = \sum_{g=0}^{\infty} g_s^{2g-2}\, F_g(\lambda_1, \lambda_2). \tag{5.19} \]
This is the way the genus expansion is typically expressed in topological string theory. To compare with the gauge theory and the AdS dual one may choose to rewrite this series as an expansion in powers of $1/N$ by absorbing factors of $\lambda$ into $F_g$. As mentioned in Sect. 4, the formalism of special geometry determines the planar free energy only up to quadratic terms in the 't Hooft couplings, and these have to be fixed from the explicit weak coupling calculation in the matrix model (4.4).

Let us now consider the derivative of the genus zero free energy (2.15), and study its analytic continuation to strong coupling as we have done for $t_i$ at the top of Sect. 5.1. Expanding (2.19) for large $\kappa$ we find
\[ \frac{\partial I}{\partial \zeta} = -\frac{\pi i}{\zeta} + O(\zeta^{-2}), \qquad \frac{\partial I}{\partial \beta} = O(\zeta^{-1}), \tag{5.20} \]
so
\[ I = -\pi i \log \zeta + O(\zeta^0) = -\pi i \log \kappa + \pi^2 B + O(\kappa^0, B^0), \qquad \kappa \to \infty. \tag{5.21} \]
From this leading large $\kappa$ behavior we have that in the ABJM slice
\[ \frac{\partial F_0}{\partial \lambda} \approx 2\pi^3\sqrt{2\lambda}, \tag{5.22} \]
which can be integrated to give the leading term in (5.34) and the match with the supergravity calculation presented below.

But to get the full series of corrections we should proceed more carefully. We know that the result of the continuation should be a linear combination of periods, and comparing to (3.19) we see that we can express the period as
\[ I + \frac{\pi i t}{2} = \frac{\partial F_0}{\partial t_1} - \frac{\partial F_0}{\partial t_2} = -\frac{\pi i}{4}\left(T_1 + T_2 + 2\pi i\right). \tag{5.23} \]
The constant term can be fixed by looking at the solution on the ABJM slice $N_1 = N_2$, which can be obtained as follows. Since on the slice we effectively have a one-parameter model, there is only one Yukawa coupling, which we can integrate to obtain $F_0$. From (3.37) we easily obtain
\[ \partial_\lambda^3 F_0(\lambda) = \frac{1}{4}\, C_{\lambda\lambda\lambda}\Big|_{\lambda_1 = -\lambda_2} = -\frac{128\pi^6}{\kappa(\kappa^2 + 16)}\, \frac{1}{K\left(\frac{i\kappa}{4}\right)^3}, \tag{5.24} \]
where the factor of 4 is introduced to match the normalization of the matrix model, and we used that
\[ \frac{d\lambda}{d\kappa} = \frac{1}{4\pi^2}\, K\left(\frac{i\kappa}{4}\right). \tag{5.25} \]
From Weak to Strong Coupling in ABJM Theory
535
Integrating once, we find ∂λ2 F0 (λ)
= 4π
3
K K
iκ iκ4 + a1 ,
(5.26)
4
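where $a_1$ is an integration constant and we have used the Legendre relation
\[ E K' + E' K - K K' = \frac{\pi}{2}. \tag{5.27} \]
A further integration leads to the following expression in terms of a Meijer function:
\[ \partial_\lambda F_0(\lambda) = \frac{\kappa}{4}\, G^{2,3}_{3,3}\!\left( \left. \begin{matrix} \frac12, \frac12, \frac12 \\ 0, 0, -\frac12 \end{matrix} \right| -\frac{\kappa^2}{16} \right) + a_1 \lambda + a_2. \tag{5.28} \]
Comparison with the matrix model free energy at weak coupling (4.4) fixes $a_1 = 4\pi^3 i$, $a_2 = 0$, so we can write
\[ \partial_\lambda F_0(\lambda) = \frac{\kappa}{4}\, G^{2,3}_{3,3}\!\left( \left. \begin{matrix} \frac12, \frac12, \frac12 \\ 0, 0, -\frac12 \end{matrix} \right| -\frac{\kappa^2}{16} \right) + \frac{\pi^2 i \kappa}{2}\, {}_3F_2\!\left( \frac12, \frac12, \frac12; 1, \frac32; -\frac{\kappa^2}{16} \right). \tag{5.29} \]
If we integrate this expression with the following choice of integration constant,
\[ F_0(\lambda) = \int_0^{\lambda} d\lambda'\, \partial_{\lambda'} F_0(\lambda'), \tag{5.30} \]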
we obtain the correct weak coupling expansion. We can now analytically continue the r.h.s. of (5.29) to $\kappa = \infty$, and we obtain
\[ \partial_\lambda F_0(\lambda) = 2\pi^2 \log \kappa + \frac{4\pi^2}{\kappa^2}\, {}_4F_3\!\left( 1, 1, \frac32, \frac32; 2, 2, 2; -\frac{16}{\kappa^2} \right). \tag{5.31} \]
This agrees with (5.23) on the ABJM slice. To see this, one notices that
\[ \omega^{(1)}(z, z) = 2 \sum_{n=1}^{\infty} \sum_{k+l=n} \frac{(2k+2l-1)!}{(k!)^2 (l!)^2}\, z^n = 2 \sum_{n=1}^{\infty} \frac{4^n (2n-1)!\, \Gamma\!\left(n+\frac12\right)}{\sqrt{\pi}\, \Gamma(n+1)^3}\, z^n = 4z\, {}_4F_3\!\left( 1, 1, \frac32, \frac32; 2, 2, 2; 16z \right) \tag{5.32} \]
is precisely the generalized hypergeometric function appearing in (5.31). We are now ready to discuss the calculation of the planar free energy at strong coupling. We have
\[ \partial_{\hat\lambda} F_0(\lambda_1, \lambda_2) = 2\pi^2 \log \kappa - \pi^2 \omega^{(1)}(z, \bar z). \tag{5.33} \]
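The equality of the three expressions in (5.32) can be checked coefficient by coefficient. The following is a minimal sketch in exact rational arithmetic; the function names are illustrative, and the only outside input is the standard identity $\Gamma(n+\frac12)/\sqrt{\pi} = (2n)!/(4^n n!)$.

```python
from fractions import Fraction
from math import factorial


def coeff_double_sum(n):
    """Coefficient of z^n in 2 * sum_{k+l=n} (2k+2l-1)! / ((k!)^2 (l!)^2)."""
    return 2 * sum(
        Fraction(factorial(2 * n - 1), factorial(k) ** 2 * factorial(n - k) ** 2)
        for k in range(n + 1)
    )


def coeff_gamma_form(n):
    """Coefficient of z^n in 2 * 4^n (2n-1)! Gamma(n+1/2) / (sqrt(pi) Gamma(n+1)^3).

    Uses the exact value Gamma(n+1/2)/sqrt(pi) = (2n)! / (4^n n!).
    """
    gamma_ratio = Fraction(factorial(2 * n), 4 ** n * factorial(n))
    return 2 * 4 ** n * factorial(2 * n - 1) * gamma_ratio / factorial(n) ** 3


def pochhammer(a, m):
    """Rising factorial (a)_m as an exact fraction."""
    result = Fraction(1)
    for j in range(m):
        result *= a + j
    return result


def coeff_4F3(n):
    """Coefficient of z^n in 4z * 4F3(1, 1, 3/2, 3/2; 2, 2, 2; 16 z), for n >= 1."""
    m = n - 1
    numerator = pochhammer(Fraction(1), m) ** 2 * pochhammer(Fraction(3, 2), m) ** 2
    denominator = pochhammer(Fraction(2), m) ** 3 * factorial(m)
    return 4 * numerator / denominator * 16 ** m


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, coeff_double_sum(n), coeff_gamma_form(n), coeff_4F3(n))
```

All three functions return the same rational number for each $n \geq 1$, which is the content of (5.32) order by order in $z$.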
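After plugging the value of $\kappa$ in terms of $\hat\lambda$ given by the series expansion (5.11), and integrating w.r.t. $\hat\lambda$, we obtain
\[ F_0(\hat\lambda, B) = \frac{4\pi^3\sqrt{2}}{3}\, \hat\lambda^{3/2} - \frac{2\pi^3 i}{3} \left( B - \frac12 \right)^{3} + \sum_{\ell \geq 1} e^{-2\pi \ell \sqrt{2\hat\lambda}}\, f_\ell\!\left( \frac{1}{\pi\sqrt{2\hat\lambda}}, \beta \right), \tag{5.34} \]
where $f_\ell(x)$ is a polynomial in $x$ of the form
\[ f_\ell(x, \beta) = \sum_{k=0}^{2\ell-3} f_k^{(\ell)}(\beta)\, x^k, \qquad \ell \geq 2. \tag{5.35} \]
The coefficients $f_k^{(\ell)}(\beta)$ are Laurent polynomials in $\beta$ of degree $\ell$, and symmetric under the exchange $\beta \leftrightarrow \beta^{-1}$. We have, for the very first cases,
\[ f_1(x, \beta) = -\frac12 \left( \beta + \beta^{-1} \right), \qquad f_2(x, \beta) = \frac{1}{16} \left( \beta^2 + 16 + \beta^{-2} \right) + \frac{x}{4} \left( \beta + \beta^{-1} \right)^2. \tag{5.36} \]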
The integration constant in going from (5.33) to (5.34) can be seen to be zero by comparing (5.34) with a numerical calculation of the integral (5.30) at intermediate coupling. The free energy in the planar approximation is given by rescaling (5.34) by the string coupling, $F = g_s^{-2} F_0 + O(g_s^0)$. This expression displays many interesting features. First, note that on the ABJM slice $N_1 = N_2$ the leading term
\[ -\frac{\pi\sqrt{2}}{3}\, k^2 \hat\lambda^{3/2} \tag{5.37} \]
displays the “anomalous” scaling $N^{3/2}$ in the number of degrees of freedom for a theory of M2 branes, as was first derived from a supergravity calculation in [25]. The above calculation is a first principles derivation of this behaviour at strong coupling in the gauge theory. Usually, this behaviour is associated to the thermal free energy on $\mathbb{R}^3$, while (5.37) gives rather the free energy of the ABJM theory on $S^3$ at strong coupling. However, a supergravity calculation of this free energy also leads to the $N^{3/2}$ behavior. We will show this now, and in particular we will match the numerical coefficient in (5.37).⁷
5.4. Calculation of the free energy in the gravity dual. Consider type IIA theory on AdS$_4 \times \mathbb{CP}^3$, and let us reduce it to the AdS$_4$ factor as in for example [50]. The (Euclidean) AdS metric appropriate for a boundary theory on $S^3$ is
\[ ds^2 = d\rho^2 + \sinh^2\!\rho\; d\Omega_3^2, \tag{5.38} \]
where $d\Omega_3^2$ is the metric on an $S^3$ of unit radius. In this coordinate system, the boundary is at $\rho \to \infty$. The free energy of the boundary CFT on $S^3$ should be given, in the supergravity approximation, by minus the Euclidean gravitational action of the AdS$_{n+1}$ space, $-I_{\mathrm{AdS}_{n+1}}$. This action is given by a bulk term, a surface term, and a counterterm at the boundary [51,52],
\[ I_{\mathrm{AdS}_{n+1}} = I_{\mathrm{bulk}} + I_{\mathrm{surf}} + I_{\mathrm{ct}}, \tag{5.39} \]
with
\[ \begin{aligned} I_{\mathrm{bulk}} &= -\frac{1}{16\pi G_N} \int_X d^{n+1}x\, \sqrt{g}\, (R - 2\Lambda), \\ I_{\mathrm{surf}} &= -\frac{1}{8\pi G_N} \int_{\partial X} d^{n}x\, \sqrt{h}\, K, \\ I_{\mathrm{ct}} &= \frac{1}{8\pi G_N} \int_{\partial X} d^{n}x\, \sqrt{h} \left[ n - 1 + \frac{1}{2(n-2)}\, \mathcal{R} + \cdots \right]. \end{aligned} \tag{5.40} \]
⁷ We would like to thank Diego Hofman for very useful remarks on this calculation.
In these equations, $G_N$ is Newton’s constant, and $R$, $K$ and $\mathcal{R}$ are the scalar curvature of the bulk, the extrinsic curvature of the boundary $\partial X$, and the scalar curvature of the induced metric $h$ on $\partial X$, respectively. The counterterm action includes higher order corrections which are not relevant for the case of AdS$_4$ and will not be considered here [52]. As our boundary $\partial X$, we will take the hypersurface $\rho = \rho_0$, and at the end of the calculation we must take $\rho_0 \to \infty$. The counterterms guarantee that the resulting action will be finite. The bulk action is easy to evaluate and gives
\[ I_{\mathrm{bulk}}(\rho_0) = \frac{3}{8\pi G_N}\, \mathrm{vol}(\mathrm{AdS}_4; \rho_0), \tag{5.41} \]
where
\[ \mathrm{vol}(\mathrm{AdS}_4; \rho_0) = \mathrm{vol}(S^3) \int_0^{\rho_0} d\rho\, (\sinh\rho)^3 = 2\pi^2 \left( \frac{1}{12} \cosh(3\rho_0) - \frac{3}{4} \cosh(\rho_0) + \frac{2}{3} \right). \tag{5.42} \]
It is easy to see that the surface term and the counterterms remove the divergences as $\rho_0 \to \infty$, leaving only the term $4\pi^2/3$ in (5.42), and we find [52]
\[ \lim_{\rho_0 \to \infty} I_{\mathrm{AdS}_4}(\rho_0) = \frac{\pi}{2 G_N}. \tag{5.43} \]
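The cancellation of divergences leading to (5.43) can be illustrated numerically. The sketch below evaluates the three terms of (5.40) for $n = 3$ on the cutoff surface $\rho = \rho_0$ of the metric (5.38). Three inputs are assumptions of this sketch rather than statements from the text: unit AdS radius so that $R - 2\Lambda = -6$ (consistent with the prefactor in (5.41)), the extrinsic curvature $K = 3\coth\rho_0$, and the boundary scalar curvature $\mathcal{R} = 6/\sinh^2\rho_0$ of a round $S^3$ of radius $\sinh\rho_0$. Function names are illustrative.

```python
import math

VOL_S3 = 2 * math.pi ** 2  # volume of the unit three-sphere


def bulk_volume_closed(rho0):
    """Closed form (5.42) for vol(AdS4; rho0)."""
    return VOL_S3 * (math.cosh(3 * rho0) / 12 - 0.75 * math.cosh(rho0) + 2.0 / 3.0)


def bulk_volume_numeric(rho0, steps=2000):
    """Same volume from Simpson's rule applied to sinh(rho)^3 (steps must be even)."""
    h = rho0 / steps
    total = math.sinh(rho0) ** 3  # the integrand vanishes at rho = 0
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.sinh(i * h) ** 3
    return VOL_S3 * total * h / 3


def renormalized_action(rho0, newton_constant=1.0):
    """I_bulk + I_surf + I_ct of (5.40) for n = 3 at cutoff rho0.

    Assumed here (not quoted from the text): K = 3 coth(rho0) and
    boundary scalar curvature 6 / sinh(rho0)^2.
    """
    s, c = math.sinh(rho0), math.cosh(rho0)
    sqrt_h = s ** 3                      # sqrt(h) per unit S^3 volume
    prefactor = VOL_S3 / (8 * math.pi * newton_constant)
    bulk = 3 * bulk_volume_closed(rho0) / (8 * math.pi * newton_constant)
    surface = -prefactor * sqrt_h * (3 * c / s)
    counterterm = prefactor * sqrt_h * (2 + 0.5 * 6 / s ** 2)
    return bulk + surface + counterterm


if __name__ == "__main__":
    for rho0 in (2.0, 4.0, 6.0, 8.0):
        print(rho0, renormalized_action(rho0), math.pi / 2)
```

With $G_N = 1$ the printed action approaches $\pi/2$ as $\rho_0$ grows, with a remainder that decays like $e^{-\rho_0}$. Very large cutoffs should be avoided in floating point, since the individual terms grow like $e^{3\rho_0}$ before cancelling.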
If we now use the dictionary relating Newton’s constant to the gauge theory data,
\[ \frac{1}{G_N} = \frac{2\sqrt{2}}{3}\, k^2 \hat\lambda^{3/2}, \tag{5.44} \]
we find exactly the leading term in (5.34)! Of course, in order to obtain this result we have used the regularization provided by the counterterm integral in (5.40), and one could suspect that the matching depends very much on this regularization. However, this counterterm has been tested (or fixed) in an independent way in the calculations of [51,52]. In particular, for $n = 4$ it leads to the matching of the Casimir energy of $\mathcal{N} = 4$ SYM on $\mathbb{R} \times S^3$, and for $n = 3$ it reproduces the standard mass of an AdS$_4$–Schwarzschild black hole [51]. Therefore, the above calculation provides a genuine test of the AdS$_4$/CFT$_3$ correspondence. In Fig. 4 we show the exact result for the planar limit of $\partial_\lambda F_0(\lambda)$ in the case $N_1 = N_2$, as a function of $\lambda = N/k$, and we compare it to the behavior of the supergravity prediction,
\[ \partial_\lambda F_0(\lambda) \approx 2\pi^3 \sqrt{2(\lambda - 1/24)}, \qquad \lambda \to \infty. \tag{5.45} \]
Fig. 4. Comparison of the exact result for ∂λ F0 (λ) given in (5.29), plotted as a solid blue line, and the weakly coupled and strongly coupled results. In the figure on the left, the red dashed line is the supergravity result (5.45), while in the figure on the right, the black dashed line is the Gaussian result (5.46)
We see that the strong coupling behavior gets triggered for values of the coupling $\lambda \approx 0.2$. For $\lambda \to 0$, the behavior of the prepotential is dominated by the Gaussian, weakly coupled result (4.4),
\[ \partial_\lambda F_0(\lambda) \approx -8\pi^2 \lambda \left( \log\frac{\pi\lambda}{2} - 1 \right), \qquad \lambda \to 0. \tag{5.46} \]
A second aspect to notice is that the supergravity result (5.34) has corrections which are exponentially suppressed. The exponential is of the form
\[ e^{-\ell A(\mathbb{CP}^1)}, \tag{5.47} \]
where
\[ A(\mathbb{CP}^1) = 2\pi \sqrt{2\hat\lambda} \tag{5.48} \]
is the area of the $\mathbb{CP}^1$ two-cycle in $\mathbb{CP}^3$. Also, notice that each of these exponential corrections multiplies (at each order in $\hat\lambda^{-1/2}$) the polynomial $f_k^{(\ell)}(\beta)$ in $\beta$, $\beta^{-1}$. Therefore, we have contributions schematically of the form
\[ \sum_{n_+ + n_- = \ell} c_{n_+, n_-}\, e^{-n_+ \left( A(\mathbb{CP}^1) + 2\pi i B \right) - n_- \left( A(\mathbb{CP}^1) - 2\pi i B \right)}. \tag{5.49} \]
This is precisely what one should expect for a gas of $n_+$ instantons and $n_-$ anti-instantons in a $\sigma$ model on $\mathbb{CP}^3$, where the (anti)instantons wrap the $\mathbb{CP}^1$ cycle. Notice that corrections of this kind are made possible by the non-trivial topology of two-cycles in $\mathbb{CP}^3$, i.e., by the fact that $b_2(\mathbb{CP}^3) = 1$, and as such they are absent in AdS$_5 \times S^5$. Some aspects of these string instantons have been studied in [53]. It would be interesting to test in detail the possible connection between these string instantons and the exponentially suppressed corrections to the planar free energy. These instanton corrections are also present in the Wilson loop result (5.16), again with an infinite series of corrections. This can be compared with the case of $\mathcal{N} = 4$ SYM in 4d, where the asymptotic large coupling expansion of the Gaussian matrix model (2.32) has a single instanton correction which can be explicitly identified with a second saddle point solution in AdS$_5 \times S^5$ [48,54]. Finally, we note that when $N_1 \neq N_2$, the planar free energy (5.34) includes an imaginary term proportional to $(B - 1/2)^3$, which is derived from the weak coupling calculation (4.5). In CS theory such a term is related to framing [12]. It would be very interesting to derive this phase in type IIA string theory.
6. Conifold Expansion

The expansion around the conifold locus corresponds to a region in the moduli space of the ABJM model where one of the gauge groups has finite coupling, while the other one is weakly coupled. In the lens space matrix model this corresponds to one ’t Hooft parameter being small, and the other of order 1. In this section we will study this regime from three different points of view: the exact planar solution in terms of periods and Picard–Fuchs equations, the matrix model, and the gauge theory.
6.1. Expansion from the exact planar solution. We can use the exact planar solution to calculate various physical quantities near the conifold locus. For concreteness, we will expand around $t_2 = 0$ but with $t_1$ arbitrary. The first ingredient we need is an expansion of the global coordinates of moduli space. It turns out that the most convenient method is based on the expressions for the periods (2.14). The locus where $t_2 = 0$ is the line
\[ \zeta = 2\beta - 2, \tag{6.1} \]
where the cut $(-b, -1/b)$ collapses to the point $Z = -1$. The derivative of $t_2$ w.r.t. $\zeta$ can then be computed in terms of residues at this point by expanding the expression in (2.16):
\[ \frac{\partial t_2}{\partial \zeta} = -\frac{1}{4\pi i} \sum_{k \geq 0} \oint_{-1} dZ\, \frac{H_k(Z, \beta)}{(Z+1)^{2k+1}}\, (\zeta - 2\beta + 2)^k, \tag{6.2} \]
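where $H_k(Z, \beta)$ are regular at $Z = -1$. This gives a series for $t_2$ in powers of $\zeta - 2\beta + 2$,
\[ -t_2 = \frac{1}{4\sqrt{\beta}}\, (\zeta - 2\beta + 2) - \frac{1-\beta}{128\, \beta^{3/2}}\, (\zeta - 2\beta + 2)^2 + \frac{9 - 2\beta + 9\beta^2}{12288\, \beta^{5/2}}\, (\zeta - 2\beta + 2)^3 + O\!\left( (\zeta - 2\beta + 2)^4 \right), \tag{6.3} \]
which can be easily inverted to
\[ \zeta = 2\beta - 2 - 4\sqrt{\beta}\, t_2 + \frac12 (1-\beta)\, t_2^2 + \frac{3 + 10\beta + 3\beta^2}{48\sqrt{\beta}}\, t_2^3 + O(t_2^4). \tag{6.4} \]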
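That (6.4) inverts (6.3) through cubic order can be checked numerically: composing the two truncated series should reproduce $t_2$ up to an error of order $t_2^4$. The following is a minimal sketch with illustrative function names, for real $\beta > 0$.

```python
import math


def t2_from_zeta(zeta, beta):
    """t2 from the truncated series (6.3), with u = zeta - 2*beta + 2."""
    u = zeta - 2 * beta + 2
    minus_t2 = (
        u / (4 * math.sqrt(beta))
        - (1 - beta) / (128 * beta ** 1.5) * u ** 2
        + (9 - 2 * beta + 9 * beta ** 2) / (12288 * beta ** 2.5) * u ** 3
    )
    return -minus_t2


def zeta_from_t2(t2, beta):
    """zeta from the truncated inverse series (6.4)."""
    return (
        2 * beta - 2
        - 4 * math.sqrt(beta) * t2
        + 0.5 * (1 - beta) * t2 ** 2
        + (3 + 10 * beta + 3 * beta ** 2) / (48 * math.sqrt(beta)) * t2 ** 3
    )


def roundtrip_error(t2, beta):
    """|t2(zeta(t2)) - t2|, expected to scale like t2^4 for small t2."""
    return abs(t2_from_zeta(zeta_from_t2(t2, beta), beta) - t2)


if __name__ == "__main__":
    for t2 in (2e-2, 1e-2, 5e-3):
        print(t2, roundtrip_error(t2, beta=4.0))
```

Halving $t_2$ should reduce the printed error by roughly a factor of 16, as expected if the two series agree through order $t_2^3$.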
As a nice application of this expansion, we can compute the VEV of the 1/2 BPS Wilson loop around the conifold point, which is given in (2.35). Using the dictionary (2.23), (2.24), we find
\[ e^{-\pi i B} \langle W^{1/2} \rangle_{g=0} = 2 \sin(\pi\lambda_1) + 2\pi\lambda_2 \left( 2 - \cos(\pi\lambda_1) \right) + \pi^2 \lambda_2^2 \sin(\pi\lambda_1) + \frac13 \pi^3 \lambda_2^3 \left( 1 - 5\cos(\pi\lambda_1) + 3\cos^2(\pi\lambda_1) \right) + O(\lambda_2^4). \tag{6.5} \]
As λ2 → 0, we recover the result for a Wilson loop VEV in U (N1 ) CS theory. In the conifold expansion we are then regarding the ABJM theory as a perturbation of U (N1 ) CS theory at strong coupling.
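The above result can also be obtained by solving the Picard–Fuchs equation around a point in the conifold locus. Let us choose for example the symmetric conifold point (3.32), with $B = 1$ and $\kappa = 4$. This corresponds to the point in the conifold locus with
\[ \lambda_1 = \frac12, \qquad \lambda_2 = 0. \tag{6.6} \]
The appropriate global coordinates near this point are (3.33). We find that $\lambda_2$ is a period solving the PF system (3.34) and with leading behavior
\[ \lambda_2 = -\frac{1}{4\pi} \left( y_2 + y_1/2 \right) + O(y^2). \tag{6.7} \]
One finds the expansion
\[ \begin{aligned} \lambda_2 ={}& \frac{\pi}{4} (B-1)^2 - \frac{5\pi^3}{96} (B-1)^4 + \left[ \frac{1}{8\pi} - \frac{\pi}{32} (B-1)^2 + \frac{43\pi^3}{1536} (B-1)^4 \right] (\kappa - 4) \\ &+ \left[ -\frac{1}{128\pi} + \frac{9\pi}{1024} (B-1)^2 - \frac{99\pi^3}{8192} (B-1)^4 \right] (\kappa - 4)^2 + O\!\left( (B-1)^6 \right) + O\!\left( (\kappa-4)^3 \right), \end{aligned} \tag{6.8} \]
which is inverted to
\[ \kappa = 4 - 2\pi^2 \left( \lambda_1 - \frac12 \right)^2 + \frac{\pi^4}{6} \left( \lambda_1 - \frac12 \right)^4 + \pi\lambda_2 \left[ 8 + 4\pi \left( \lambda_1 - \frac12 \right) - \frac{2\pi^3}{3} \left( \lambda_1 - \frac12 \right)^3 \right] + O(\lambda_2^2) + O\!\left( (\lambda_1 - 1/2)^5 \right). \tag{6.9} \]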
This is indeed the expansion around $\lambda_1 = 1/2$ of (twice) the series in the r.h.s. of (6.5). Once we know the expansion of the global coordinates, we can consider other quantities in the model, like the genus $g$ free energies. The conifold expansion of $F_g(t_1, t_2)$ has the form
\[ F_g(\lambda_1, \lambda_2) = F_g^{G}(\lambda_2) + \sum_{n \geq 0} F_g^{(n)}(\lambda_1)\, \lambda_2^n, \tag{6.10} \]
where $F_g^{G}(\lambda_2)$ is the free energy of the $U(N_2)$ Gaussian matrix model, and each coefficient $F_g^{(n)}(\lambda_1)$ can be obtained as an exact function of $\lambda_1$. Of course,
\[ F_g^{(0)}(\lambda_1) = F_g^{S^3}(\lambda_1) \tag{6.11} \]
is the genus $g$ free energy of the CS theory on $S^3$. When $g = 0$, the expansion (6.10) can be computed from the exact planar solution in various ways. One can for example use the Yukawa couplings (3.37) expanded around the conifold locus in order to compute the third derivatives of $F_0$, or use the modularity properties of the solution discussed in [31,41]. In any case, for the first few functions one finds the following results:
\[ \begin{aligned} F_0^{(1)}(\lambda_1) &= 2\pi i \left[ \pi^2 \lambda_1^2 + 2\, \mathrm{Li}_2\!\left( -e^{\pi i \lambda_1} \right) - 2\, \mathrm{Li}_2\!\left( -e^{-\pi i \lambda_1} \right) \right], \\ F_0^{(2)}(\lambda_1) &= -2\pi^3 i \lambda_1 + 8\pi^2 \log\cos\frac{\pi\lambda_1}{2}, \\ F_0^{(3)}(\lambda_1) &= \frac{2\pi^3 i}{3} + \frac{\pi^3}{3} \left( 3\cos(\pi\lambda_1) - 5 \right) \tan\frac{\pi\lambda_1}{2}. \end{aligned} \tag{6.12} \]
6.2. Conifold expansion from the matrix model. It is easy to implement the conifold expansion directly in the lens space matrix model. To do that, we notice that it can be written as two interacting Chern–Simons matrix models on $S^3$. We recall that the CS matrix model on $S^3$, first considered in [9], is defined by the partition function
\[ Z_{S^3}(N, g_s) = \frac{1}{N!} \int \prod_{i=1}^{N} \frac{d\mu_i}{2\pi}\, \prod_{i<j} \left( 2\sinh\frac{\mu_i - \mu_j}{2} \right)^{2} e^{-\frac{1}{2 g_s} \sum_i \mu_i^2}. \tag{6.13} \]
This is a one-cut matrix model [21]. It can be obtained from the lens space matrix model when one of the two cuts collapses to zero size. In the $Z$ plane the endpoints of the cut are given by $a$ and $a^{-1}$, where
\[ a = 2e^{t} - 1 - 2e^{t/2} \sqrt{e^{t} - 1}. \tag{6.14} \]
Let us consider the following operator in this model:
\[ \mathcal{W}(\nu_j) = 2 \sum_{i,j} \log\left( 2\cosh\frac{\mu_i - \nu_j}{2} \right). \tag{6.15} \]
The lens space partition function (2.3) can be calculated in two steps. In the first step, we compute
\[ Z_1(\nu_j) = \left\langle e^{\mathcal{W}(\nu_j)} \right\rangle_{N_1}, \tag{6.16} \]
where the subindex $N_1$ indicates that this is an unnormalized VEV in the $S^3$ CS matrix model with gauge group $U(N_1)$. In a second step, we calculate
\[ Z_{L(2,1)} = \left\langle Z_1(\nu_j) \right\rangle_{N_2} \tag{6.17} \]
in the CS matrix model with gauge group $U(N_2)$. To obtain the conifold expansion, we calculate $Z_1(\nu_j)$ and we expand it in $g_s$ and around $\nu_j = 0$. Each term in this expansion can be computed exactly as a function of the Kähler parameter $t_1$, since the CS matrix model can be solved exactly in the $1/N$ expansion. The resulting double series in $g_s$ and $\nu_j$ is then regarded as an operator in the CS matrix model with group $U(N_2)$, which we expand around the Gaussian point as in [9,10], i.e., we expand the sinh measure around $\nu_j = 0$. The partition function $Z_{L(2,1)}$ is then computed as a VEV in the Gaussian matrix model. This procedure gives a method to compute the expansion (6.10) directly in the matrix model.
To illustrate this procedure, let us calculate $F_0(t_1, t_2)$ at first order in $t_2$. In this computation we will denote
\[ U_1 = \mathrm{diag}(e^{\mu_i}), \qquad U_2 = \mathrm{diag}(e^{\nu_j}). \tag{6.18} \]
The expansion around $\nu_j = 0$ of the operator $\mathcal{W}(\nu_j)$ reads
\[ \mathcal{W}(\nu_j) = 2 N_2 \sum_{i=1}^{N_1} \log\left( 2\cosh\frac{\mu_i}{2} \right) - \sum_{i=1}^{N_1} \sum_{j=1}^{N_2} \nu_j \tanh\frac{\mu_i}{2} + O(\nu_j^2). \tag{6.19} \]
The average of the second term in the $U(N_2)$ matrix model vanishes (since it is odd in $\nu_j$), while higher order terms are at least of order $t_2^2$. The first term can be written as
\[ 2 \sum_{i=1}^{N_1} \log\left( 2\cosh\frac{\mu_i}{2} \right) = 2\, \mathrm{Tr} \log(1 + U_1) - \sum_{i=1}^{N_1} \mu_i. \tag{6.20} \]
Therefore, in the planar limit and neglecting terms which contribute at order $t_2^2$, we have
\[ \log Z_1(\nu_j) \approx \frac{2 t_2}{g_s} \left\langle \mathrm{Tr} \log(1 + U_1) \right\rangle_{N_1}, \tag{6.21} \]
since the second term in (6.20) is odd in $\mu_i$ and its VEV vanishes. We then find
\[ F_0(t_1, t_2) = F_0^{S^3}(t_1) + 2 t_2\, g_s \left\langle \mathrm{Tr} \log(1 + U_1) \right\rangle + O(t_2^2). \tag{6.22} \]
The VEV in (6.22), which is now normalized, can be computed in terms of the resolvent of the CS matrix model, and similar computations appear in [14,55] in the context of large $N$ instanton corrections. In fact, it follows from (8.28) and (8.30) that the VEV in (6.22) is given by $-g(-1)$, where $g(Y)$ is computed in (B.2). The final result for the linear correction in $t_2$ is
\[ \frac{\pi^2}{3} + \frac{t_1^2}{2} + \mathrm{Li}_2(e^{-t_1}) - 2\, \mathrm{Li}_2(e^{-t_1/2}) + 2\, \mathrm{Li}_2(-e^{-t_1/2}). \tag{6.23} \]
Using dilogarithm identities, this agrees with $\frac{\lambda_2}{t_2} F_0^{(1)}(\lambda_1)$ in (6.12). It is interesting to point out that, in the context of CS theory on the lens space $L(2,1)$, this function is essentially the action of the large $N$ instanton corresponding to the flat connection
\[ U(N) \to U(N_1) \times U(N_2), \qquad N_2 \ll N_1, \tag{6.24} \]
as shown in [14]. In the matrix model, this action is obtained by tunneling $N_2$ eigenvalues from the first cut to the second cut. We can also calculate the conifold expansion for the VEV of 1/6 and 1/2 BPS Wilson loops directly in the matrix model. We want to compute
\[ \langle W^{1/6} \rangle = g_s \left\langle \mathrm{Tr}\, U_1 \right\rangle_{L(2,1)}. \tag{6.25} \]
We will again perform this computation in the planar approximation and at linear order in $t_2$. At this order we can compute instead the normalized average of the operator
\[ \frac{\left\langle \mathrm{Tr}\, U_1\, e^{\mathcal{W}(\nu_j)} \right\rangle_{N_1}}{\left\langle e^{\mathcal{W}(\nu_j)} \right\rangle_{N_1}} = \left\langle \mathrm{Tr}\, U_1 \right\rangle + \left\langle \mathrm{Tr}\, U_1\, \mathcal{W}(\nu_j) \right\rangle^{(c)} + \cdots \tag{6.26} \]
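in a Gaussian matrix model for the $\nu_j$. In the last line, all VEVs are normalized VEVs in the $S^3$ CS matrix model. By completing the square of the Gaussian weight we derive
\[ \left\langle \mathrm{Tr}\, U_1 \sum_{i=1}^{N_1} \mu_i \right\rangle = \frac{\partial}{\partial j} \left\langle \mathrm{Tr}\, U_1\, e^{j \sum_{i=1}^{N_1} \mu_i} \right\rangle \bigg|_{j=0} = g_s \left\langle \mathrm{Tr}\, U_1 \right\rangle. \tag{6.27} \]
We then find, at this order,
\[ \langle W^{1/6} \rangle_{g=0} = g_s \left\langle \mathrm{Tr}\, U_1 \right\rangle + t_2 \left[ 2 \left\langle \mathrm{Tr}\, U_1\, \mathrm{Tr} \log(1 + U_1) \right\rangle^{(c)} - g_s \left\langle \mathrm{Tr}\, U_1 \right\rangle \right] + O(t_2^2). \tag{6.28} \]
The connected correlator
\[ \left\langle \mathrm{Tr}\, U_1\, \mathrm{Tr} \log(1 + U_1) \right\rangle^{(c)} = -\sum_{\ell=1}^{\infty} \frac{(-1)^{\ell}}{\ell} \left\langle \mathrm{Tr}\, U_1\, \mathrm{Tr}\, U_1^{\ell} \right\rangle^{(c)} \tag{6.29} \]
can be computed by considering the (partially) integrated two-point function (see for example [56])
\[ d_p W_0(p, q) = -\sum_{n,m} \frac{1}{n\, p^{n} q^{m+1}} \left\langle \mathrm{Tr}\, U_1^{n}\, \mathrm{Tr}\, U_1^{m} \right\rangle^{(c)} \tag{6.30} \]
and extracting the coefficient of $q^{-2}$. We have
\[ d_p W_0(p, q) = \frac{1}{2(p-q)} \left( 1 - \sqrt{\frac{(p-a)(p-a^{-1})}{(q-a)(q-a^{-1})}} \right) + \frac{1}{2\sqrt{(q-a)(q-a^{-1})}}, \tag{6.31} \]
which includes the appropriate integration constant. We find, after changing $p \to -p$,
\[ -\sum_{\ell=1}^{\infty} \frac{(-1)^{\ell}}{\ell} \left\langle \mathrm{Tr}\, U_1\, \mathrm{Tr}\, U_1^{\ell} \right\rangle^{(c)} p^{-\ell} = \frac14 \left( a + a^{-1} + 2p - 2\sqrt{(p+a)(p+a^{-1})} \right). \tag{6.32} \]
When $p = 1$ this gives
\[ -\sum_{\ell=1}^{\infty} \frac{(-1)^{\ell}}{\ell} \left\langle \mathrm{Tr}\, U_1\, \mathrm{Tr}\, U_1^{\ell} \right\rangle^{(c)} = e^{t_1} - e^{t_1/2}. \tag{6.33} \]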
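The step from (6.32) to (6.33) uses only the explicit endpoint (6.14). A minimal numerical sketch (illustrative function names, real $t > 0$) evaluates the right-hand side of (6.32) at $p = 1$ and compares it with $e^{t} - e^{t/2}$; it also checks that the two endpoints $a$ and $a^{-1}$ sum to $4e^{t} - 2$, which follows directly from (6.14).

```python
import math


def cut_endpoint(t):
    """Endpoint a of the cut in the Z plane, from (6.14)."""
    return 2 * math.exp(t) - 1 - 2 * math.exp(t / 2) * math.sqrt(math.exp(t) - 1)


def summed_correlator(p, t):
    """Right-hand side of (6.32) as a function of p and the Kahler parameter t."""
    a = cut_endpoint(t)
    return 0.25 * (a + 1 / a + 2 * p - 2 * math.sqrt((p + a) * (p + 1 / a)))


def closed_form_at_p1(t):
    """Right-hand side of (6.33)."""
    return math.exp(t) - math.exp(t / 2)


if __name__ == "__main__":
    for t in (0.3, 1.0, 2.0):
        a = cut_endpoint(t)
        print(t, a + 1 / a, 4 * math.exp(t) - 2)
        print(t, summed_correlator(1.0, t), closed_form_at_p1(t))
```

The agreement rests on $(1+a)(1+a^{-1}) = (\sqrt{a} + 1/\sqrt{a})^2 = 4e^{t}$, so the square root in (6.32) reduces to $2e^{t/2}$ at $p = 1$.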
Notice that this is an infinite sum of correlators in the CS matrix model. Since
\[ \left\langle \mathrm{Tr}\, U_1 \right\rangle = \frac{e^{t_1} - 1}{g_s}, \tag{6.34} \]
we finally obtain
\[ \langle W^{1/6} \rangle_{g=0} = e^{t_1} - 1 + t_2 \left( e^{t_1/2} - 1 \right)^2 + O(t_2^2) = e^{t_1/2} \left[ 2\sinh\frac{t_1}{2} + t_2 \left( -2 + 2\cosh\frac{t_1}{2} \right) + O(t_2^2) \right]. \tag{6.35} \]
Since this is a Wilson loop only in the first group, the framing prefactor depends only on the first ’t Hooft coupling. The 1/2 BPS Wilson loop is obtained by subtracting
\[ \left\langle \mathrm{Tr}\, U_2 \right\rangle_{L(2,1)} = N_2 + O(t_2^2) = \frac{t_2}{g_s} + O(t_2^2). \tag{6.36} \]
We find
\[ e^{-(t_1 + t_2)/2} \langle W^{1/2} \rangle_{g=0} = 2\sinh\frac{t_1}{2} + t_2 \left( -2 + \cosh\frac{t_1}{2} \right) + O(t_2^2). \tag{6.37} \]
This is the result (6.5) obtained from the conifold expansion after using the dictionary (2.22).
6.3. On the near Chern–Simons expansion of ABJM theory. In the matrix model the conifold locus corresponds to the vanishing of one of the two cuts, where the lens space matrix model can be written as a perturbation around the matrix model for Chern–Simons on $S^3$. Here we want to explore this limit in the original 3-dimensional theory. In the strict limit we have the theory with $N_2 = 0$ and $N_1 \gg 1$ and arbitrary $N_1/k$. In this limit all the fields charged under the second gauge group, i.e., its gluons and all the bi-fundamental fermions and scalars, are removed. Consequently, ABJM theory simplifies dramatically and reduces to topological $U(N)$ CS. The only observables in the theory in this strict limit are Wilson loops, and they are given by the standard CS answer, which is exact in $\lambda_1$ (and $1/N_1$). One can try to perform a systematic expansion around this point in a perturbative expansion in $\lambda_2$. One keeps $\lambda_2 \ll \lambda_1$, but if desired, can still assume the planar approximation, ignoring also the $1/N_2$ corrections. It is convenient to draw the Feynman graphs in double-line double-color notation, one color for each group. At the first non-trivial order in $\lambda_2$, only graphs with a single index loop of $U(N_2)$ are included. An arbitrary number of gluons of $U(N_1)$ are allowed. Let us propose the following calculation procedure: First ignore all $U(N_1)$ gluons and enumerate all remaining graphs. They are a very restricted subset, which can be identified very easily. In order to dress them up with the $U(N_1)$ gluons we write the propagators for the bi-fundamental fields as a path integral over all trajectories in space. Since the bi-fundamental fields are charged, these paths will effectively be Wilson loops in $U(N_1)$, which can be calculated exactly in CS theory. Since this theory is topological, the result of adding all the gluons is independent of the path of the bi-fundamental fields.
One can then do the usual path integral for these fields and find the regular scalar and fermion propagators. The statement in the previous paragraph fails in a subtle way. The correlation function of Wilson loop operators in CS theory does not depend on their geometry only as long as their topology (the knotting and linking numbers) is kept fixed. Therefore one has to modify the above statement, and sum over all possible topologies of the paths of the bi-fundamental fields accompanied by the relevant knotted/linked Wilson loops. Unfortunately, we do not have an a-priori method of determining the weight that should be assigned to the different topologies.⁸ As an illustration, let us consider the 1/6 BPS Wilson loop (whose Feynman rules are simpler than the 1/2 BPS one) and examine its perturbative expansion about the conifold locus. The Wilson loop is given in our normalization by [15–17]
\[ W^{1/6} = g_s\, \mathrm{Tr}\, \mathcal{P} \exp \int \left( i A_\mu \dot{x}^\mu + \frac{2\pi}{k} |\dot{x}|\, M^{I}_{J}\, C_I \bar{C}^{J} \right) ds. \tag{6.38} \]
Here $x^\mu$ parameterizes a circle in $\mathbb{R}^3$ (or $S^3$), $A_\mu$ are the $U(N_1)$ gluons, $C_I$ and $\bar{C}^I$ are the bi-fundamental scalars and $M^{I}_{J} = \mathrm{diag}(1, 1, -1, -1)$ is a matrix in flavor space, which is required to make this object BPS. At order $O(\lambda_2^0)$, this is simply a Wilson loop of CS, whose planar expectation value (ignoring framing) is
\[ \langle W^{1/6} \rangle_{g=0} = 2i \sin \pi\lambda_1 + O(\lambda_2). \tag{6.39} \]

Fig. 5. Several Feynman graphs (panels a to e) which may contribute at order $\lambda_2$ to the 1/6 BPS Wilson loop with gluons stripped. The big circle is the Wilson loop, dashed lines are bosons and the solid line a fermion, all presented in double-line, double-color notation
After stripping away the gluon lines, there are still an infinite number of graphs involving bi-fundamental fields. Examples are shown in Fig. 5. In the example drawn, there is a single scalar or fermion loop. The scalar loop can “touch” the Wilson loop at an arbitrary number of points, due to the scalar bilinear term in (6.38). There are extra graphs which are not drawn, with fermionic tadpoles on the scalar lines, or vice versa. By explicit calculations [15–17], all the connected graphs illustrated (Fig. 5c, 5d, 5e) vanish in dimensional regularization. The same can be argued for higher order graphs of this form. Likewise, one would not expect tadpoles to contribute. We are left therefore with the first two disconnected graphs, which become connected once gluon lines are added. Indeed, the only non-vanishing graph that was thus far calculated is the one-loop correction to the gluon propagator (Fig. 5a, 5b with two extra gluons), which accounts for the $O(\lambda_2 \lambda_1^2)$ term (which with our normalization is 2-loops) in the explicit answer (4.3).⁹ We can compare this to the explicit calculation in the matrix model above. The essential part of the expression for the Wilson loop at order $O(\lambda_2)$, (6.28), is the connected correlator of two Wilson loops. One of them is the original Wilson loop and the other came from expanding the cosh term in the matrix model (6.20), which arises from integrating out the bi-fundamental matter. So this agrees with the identification of the contribution as coming from the bubble graphs. Moreover, what we see in the matrix model is that one should sum over multi-winding of this second Wilson loop, with a weight $1/l$. This corresponds in the physical theory to summing over all possible topologies for the scalar and fermion bubble. As mentioned, we do not know how to derive this factor of $1/l$ from perturbation theory, but it is given to us by the explicit matrix model calculation. It was noted in [26] that in this limit of ABJM theory the spectrum of local operators also simplifies and the spin-chain hamiltonian becomes short-range. A compelling conjecture for the mysterious function $h(\lambda)$ in that limit was also presented there. It would be interesting to explore this limit further and learn how to do this sum over topologies for other observables.
⁸ Wilson loops arise out of dressing propagators of matter fields also in [57]. In that case the path is fixed to a collection of light-like segments, due to the singularity in the Minkowski-space propagator.
⁹ This graph has a divergence that can be removed by including the double scalar exchange graph (Fig. 5c). In dimensional regularization the finite part comes only from the gluon graph.
7. Modular Properties and the Genus Expansion

In this section we provide an efficient, recursive method to compute the $1/N$ corrections to the free energy in the case $N_1 = N_2 = N$. This is based on the modular properties of the solution and the technique of direct integration of the holomorphic anomaly equations. The method determines a priori the full $1/N$ expansion. In practice it is quite efficient and it makes it possible to calculate the $F_g$ corrections for high genera. This is then used to estimate non-perturbative effects in the large $N$ expansion. As noted in [41], we can use the relation between the local $\mathbb{F}_0$ theory and Seiberg–Witten theory to write all the quantities in the model in terms of modular forms. This representation becomes particularly useful when we restrict ourselves to a one-parameter model, as it was shown in a different context in [58]. When $N_1 = N_2$, $\beta = 1$ and the modulus $u$ becomes simply
\[ u = 1 + \frac{\kappa^2}{8}. \tag{7.1} \]
In Seiberg–Witten theory, $u$ is related to the modular parameter $\tau$ of the Seiberg–Witten curve by
\[ u = \frac{\vartheta_4^4 - \vartheta_2^4}{\vartheta_3^4}(\tau) = 1 - 32 q^{1/2} + 256 q + \cdots, \tag{7.2} \]
where $q = e^{2\pi i \tau}$. This formula can be inverted to
\[ \tau = i\, \frac{K'\!\left(\frac{i\kappa}{4}\right)}{K\!\left(\frac{i\kappa}{4}\right)}, \tag{7.3} \]
therefore we see that the modular parameter $\tau$ is related to the specific heat of the theory through (5.26). Let us now introduce the quantity
\[ \xi = \frac{2}{\vartheta_2^2(\tau)\, \vartheta_4^4(\tau)}. \tag{7.4} \]
This is proportional to the third derivative of the genus zero free energy, therefore to the Yukawa coupling $C_{\lambda\lambda\lambda}$. More precisely, we have
\[ \partial_\lambda^3 F_0(\lambda) = -8\pi^3 i\, \xi. \tag{7.5} \]
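The $q$-expansion in (7.2) and its compatibility with (7.1) can be checked with truncated theta series. The sketch below works with the variable $s = q^{1/2} = e^{i\pi\tau}$ and assumes the standard conventions $\vartheta_2 = 2 s^{1/4} \sum_{n \geq 0} s^{n(n+1)}$, $\vartheta_3 = 1 + 2\sum_{n \geq 1} s^{n^2}$, $\vartheta_4 = 1 + 2\sum_{n \geq 1} (-1)^n s^{n^2}$, together with the standard relation $k^2 = \vartheta_2^4/\vartheta_3^4$ for the elliptic modulus $k = i\kappa/4$ appearing in (7.3). These conventions and the function names are assumptions of the sketch, not taken from the text.

```python
def theta_fourth_powers(s, terms=20):
    """Return (theta_2^4, theta_3^4, theta_4^4) for real s = q^(1/2), |s| < 1."""
    p = sum(s ** (n * (n + 1)) for n in range(terms))   # theta_2 = 2 s^(1/4) p
    theta3 = 1 + 2 * sum(s ** (n * n) for n in range(1, terms))
    theta4 = 1 + 2 * sum((-1) ** n * s ** (n * n) for n in range(1, terms))
    return 16 * s * p ** 4, theta3 ** 4, theta4 ** 4


def modulus_u(s):
    """u = (theta_4^4 - theta_2^4) / theta_3^4, the left-hand side of (7.2)."""
    b, c, d = theta_fourth_powers(s)
    return (d - b) / c


def kappa_squared(s):
    """kappa^2 from k^2 = theta_2^4 / theta_3^4 with k = i kappa / 4 (assumed relation)."""
    b, c, _ = theta_fourth_powers(s)
    return -16 * b / c


if __name__ == "__main__":
    s = 1e-3
    b, c, d = theta_fourth_powers(s)
    print("Jacobi identity c - (b + d):", c - (b + d))
    print("u(s):", modulus_u(s), " series:", 1 - 32 * s + 256 * s ** 2)
    # Real positive kappa corresponds to negative s in these conventions.
    print("u(-s):", modulus_u(-s), " 1 + kappa^2/8:", 1 + kappa_squared(-s) / 8)
```

The agreement between $u$ and $1 + \kappa^2/8$ is a direct consequence of the Jacobi identity $\vartheta_3^4 = \vartheta_2^4 + \vartheta_4^4$, so the last line mainly guards against sign and normalization slips.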
Therefore, the planar content of the theory can be elegantly encoded in terms of modular forms on the Seiberg–Witten curve. One powerful application of the modular properties of the ABJM theory is the determination of the higher genus corrections to the free energy, $F_g(\lambda)$. These can be obtained in principle from the matrix model (2.1), or equivalently from the formalism of [59] (appropriately modified as in [60,61]). However, as emphasized in for example [28,58,62], this formalism is not very convenient to do calculations at higher genus. One should rather use the fact that the $F_g$ are quasi-modular forms that can be promoted to non-holomorphic modular forms. The resulting non-holomorphic objects satisfy the holomorphic anomaly equations of [30], as shown in [28,63], and these can be in turn solved with the technique of direct integration developed in [28,29,31,58] for local CY manifolds and matrix models. The basic strategy of direct integration is the following. First, we assume an ansatz for $F_g$ of the form
\[ F_g(\tau) = \xi^{2g-2} f_g(\tau), \tag{7.6} \]
where
\[ f_g(\tau) = \sum_{k=0}^{3g-3} E_2^{k}(\tau)\, c_k^{(g)}(\tau), \qquad g \geq 2, \tag{7.7} \]
is an almost modular form of weight $6g - 6$, with respect to a monodromy group $\Gamma \subset SL(2, \mathbb{Z})$. $F_g(\tau)$ can be promoted to a non-holomorphic modular form $F_g(\tau, \bar\tau)$ by changing
\[ E_2(\tau) \to \widehat{E}_2(\tau, \bar\tau) = E_2(\tau) - \frac{3}{\pi\, \mathrm{Im}(\tau)}. \tag{7.8} \]
The resulting $F_g(\tau, \bar\tau)$ satisfies the holomorphic anomaly equations of [30], which govern their anti-holomorphic dependence. Since this dependence is contained in $\widehat{E}_2(\tau, \bar\tau)$, these equations govern the $E_2$ content of $F_g$. This means that the coefficients $c_k^{(g)}(\tau)$, which are modular forms of weight $6g - 6 - 2k$, can be obtained recursively for $k > 0$ if one knows the lower $F_g$. In order to write down the recursive equation, it is useful to introduce a covariant derivative $d_\xi$ taking a form of weight $k$ into a form of weight $k + 2$:
\[ d_\xi = \partial_\tau + \frac{k}{3}\, \frac{\partial_\tau \xi}{\xi}. \tag{7.9} \]
Then, the holomorphic anomaly equations lead to
\[ \frac{d f_g}{d E_2} = -\frac13 \left\{ d_\xi^2 f_{g-1} + \frac13\, \frac{\partial_\tau \xi}{\xi}\, d_\xi f_{g-1} + \sum_{r=1}^{g-1} d_\xi f_r\, d_\xi f_{g-r} \right\}, \qquad g \geq 2. \tag{7.10} \]
If $F_{g'}$ are known, with $g' < g$, the above equation determines all the coefficients $c_k^{(g)}(\tau)$ in $f_g$, with the exception of $c_0^{(g)}(\tau)$, which plays the rôle of an integration constant. This coefficient is a holomorphic form of weight $6g - 6$ and it is called the holomorphic ambiguity. In order to fix the holomorphic ambiguity we need two pieces of information. The first one concerns its functional dependence. Since $c_0^{(g)}(\tau)$ is a modular form w.r.t. some monodromy subgroup, it belongs to a finitely generated ring. This means that it is determined by a finite number of coefficients, which typically grows with $g$. The second piece of information comes from boundary conditions at singular points in moduli space. A very powerful boundary condition for matrix models and local Calabi–Yau manifolds is the so-called gap condition, discovered in [28] and further used in [31,58] to fix the holomorphic ambiguity. According to the gap condition, near certain points $p_i$ in moduli space, parametrized by a flat coordinate $t_i$, the genus $g$ free energy behaves as
\[ F_g^{(i)} = \frac{a_g}{t_i^{2g-2}} + O(1). \tag{7.11} \]
The superscript $(i)$ means that the genus $g$ free energy has to be transformed to the duality frame which is appropriate for the $i$th singularity, as is well known in special geometry. The “gap” refers to the absence of singular terms $t_i^{-k}$ with $0 < k < 2g - 2$ in the local expansion near $t_i = 0$. The vanishing of these terms provides boundary conditions for $c_0^{(g)}(\tau)$, and in some cases it fixes them completely. In our case, the relevant ring is that of $\Gamma(2)$ modular forms which is generated by the theta functions
\[ b = \vartheta_2^4(\tau), \qquad c = \vartheta_3^4(\tau), \qquad d = \vartheta_4^4(\tau). \tag{7.12} \]
Since $c = b + d$, only two of them are independent, and we will choose $b$ and $d$. Using standard formulae in the theory of modular forms, one finds
\[ \frac{\partial_\tau \xi}{\xi} = \frac{b - E_2}{4}, \tag{7.13} \]
as well as
\[ d_\xi b = \frac{b^2 + bd}{3}, \qquad d_\xi (bd) = \frac{(bd)\, b}{6}, \qquad d_\xi E_2 = \frac{1}{12} \left( -E_2^2 + 2 b E_2 - E_4 \right). \tag{7.14} \]
The modular expression for the genus one free energy is known [41] and reads
\[ F_1 = -\log \eta(\tau), \tag{7.15} \]
therefore we have
\[ d_\xi f_1 = -\frac{E_2}{24}. \tag{7.16} \]
These are all the ingredients needed for the recursion. The holomorphic ambiguity can be written as
\[ c_0^{(g)}(\tau) = \sum_{j=0}^{3g-3} \alpha_j^{(g)}\, b^{j} d^{\,3g-3-j} \tag{7.17} \]
and it involves $3g - 2$ unknowns. Let us see how we can fix these by looking at the behavior near the three singular points of moduli space.
At the orbifold point, the $F_g$ are the genus $g$ amplitudes of the super-matrix model (2.1) with $N_1 = N_2$. Their leading behavior near $\lambda = 0$ is governed by two copies of the Gaussian matrix model, therefore they behave as
\[ F_g^{(o)}(\lambda) = \frac{B_{2g}}{g(2g-2)}\, (2\pi i \lambda)^{2-2g} + O(1). \tag{7.18} \]
This gives $g - 1$ conditions, since the ansatz (7.17) for the holomorphic ambiguity only involves even powers of $\lambda$. The symmetric conifold point $z_1 = z_2 = 1/16$ is related to the orbifold point through an S-duality transformation. The appropriate global coordinates near this point are given in (3.33). In the ABJM slice one has
\[ y_1 = 0, \qquad y_2 = y = 1 - \frac{\zeta^2}{16}. \tag{7.19} \]
The following period is a good local, flat coordinate near the symmetric conifold point:
\[ t = \sum_{n=0}^{\infty} \frac{a_n}{(n+1)\, 2^{4n}}\, y^{n+1}, \tag{7.20} \]
where
\[ a_n = \frac{1}{\binom{2n}{n}} \sum_{k=0}^{n} \binom{2k}{k} \binom{4k}{2k} \binom{2n-2k}{n-k} \binom{4n-4k}{2n-2k}. \tag{7.21} \]
It was noticed in [31] that the genus $g$ amplitude at the conifold point behaves like
\[ F_g^{(c)}(t) = \frac{B_{2g}}{2g(2g-2)} \left( \frac{t}{2i} \right)^{2-2g} + O(1). \tag{7.22} \]
This fixes $2g - 2$ conditions. Finally, the large radius point is related to the orbifold point by an $STS$ transformation. The genus $g$ free energy is the generating function of Gromov–Witten invariants of the local $\mathbb{F}_0$ geometry in the slice $T_1 = T_2 = T$. More precisely, one has
\[ F_g^{(\mathrm{GW})}(Q) = (-4)^{g-1} \sum_{d \geq 1} N_{d,g}\, Q^{d}, \qquad Q = e^{-T}, \tag{7.23} \]
where
\[ N_{d,g} = \sum_{d_1 + d_2 = d} N_{d_1, d_2, g} \tag{7.24} \]
is a sum of Gromov–Witten invariants at genus $g$, $N_{d_1,d_2,g}$, of local $\mathbb{F}_0$ (the degrees $d_1$, $d_2$ correspond to the two Kähler classes of this geometry). Since (7.23) is a power series in $Q$ with no constant term, we obtain one extra condition, which, together with the $g - 1$ conditions from the orbifold point and the $2g - 2$ conditions of the conifold point, completely fixes the $3g - 2$ unknowns in the holomorphic ambiguity.
N. Drukker, M. Mariño, P. Putrov
Let us see how this works in some detail when $g = 2$. The integration of the holomorphic anomaly equation gives
$$f_2 = \frac{1}{3}\cdot\frac{1}{24^2}\left(-\frac{5}{3}E_2^3 + 3bE_2^2 - 2E_4E_2\right) + c_0^{(2)}(\tau), \tag{7.25}$$
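where $c_0^{(2)}(\tau)$ is of the form (7.17). The expansions around the orbifold, conifold, and large radius points read, respectively,
$$\begin{aligned}
F_2^{(o)}(\lambda) &= \frac{1}{432(2\pi i\lambda)^2}\left(-\frac{11}{3} + 1728\,\alpha_0^{(2)}\right) + O(1),\\
F_2^{(c)}(t) &= -\frac{5 + 1296\,\alpha_3^{(2)}}{1296\,t^2} + \frac{-1 - 864\left(12\alpha_2^{(2)} + 15\alpha_3^{(2)}\right)}{10368\,t} + O(1),\\
F_2^{(GW)}(Q) &= -\frac{1}{432}\left(\frac{2}{3} + 1728\left(\alpha_0^{(2)} - \alpha_1^{(2)} + \alpha_2^{(2)} - \alpha_3^{(2)}\right)\right) + O(Q).
\end{aligned} \tag{7.26}$$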
Imposing the conditions (7.18), (7.22) and (7.23) we fix
$$\alpha_0^{(2)} = \frac{1}{25920}, \qquad \alpha_1^{(2)} = \frac{7}{17280}, \qquad \alpha_2^{(2)} = \frac{1}{3456}, \qquad \alpha_3^{(2)} = \frac{1}{3240}. \tag{7.27}$$
We finally obtain
$$F_2^{(o)} = \frac{1}{432\,bd^2}\left(-\frac{5}{3}E_2^3 + 3bE_2^2 - 2E_4E_2\right) + \frac{16b^3 + 15db^2 + 21d^2b + 2d^3}{12960\,bd^2}. \tag{7.28}$$
Since $\tau$ depends on $\lambda$ through (7.3) and (5.5), this gives the exact expression for the genus two free energy on $S^3$ in the ABJM model, for any value of the ’t Hooft coupling. Notice that the modular ring appearing here and parametrizing the holomorphic ambiguity is different from the one appearing in Seiberg–Witten theory [28,29] or in the cubic matrix model [58]. This is due to the fact that, although the curves are the same, the meromorphic forms defining the theory are different.

Using this method, we have computed the free energies up to high genus. The strong coupling behavior of $F_g^{(o)}$ is of the form
$$F_g^{(o)}(\lambda) \sim -\lambda^{\frac{3}{2}-g}, \qquad \lambda \to \infty, \quad g \ge 0. \tag{7.29}$$
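We have also used these results in order to investigate the large order behavior of the $1/N$ expansion. We have found that
$$\widetilde F_g(\lambda) = (-1)^{g-1}\left(F_g^{(o)} - \frac{B_{2g}}{g(2g-2)}\,(2\pi i\lambda)^{2-2g}\right) \tag{7.30}$$
behaves at large $g$ as
$$\widetilde F_g(\lambda) \sim (2g)!\,|A(\lambda)|^{-2g}\cos\left(2g\,\theta(\lambda) + \delta(\lambda)\right). \tag{7.31}$$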
In this equation, the angle θ(λ) satisfies θ(0) = π/2 and θ(λ) ≠ 0 for all λ > 0, while δ(λ) is a function of λ (see for example Sect. 2 of [64] for more details on the large
order behavior of the genus expansion). The sign $(-1)^{g-1}$ is included in (7.30) since in the physical ABJM theory the coupling $g_s$ is imaginary. The large order behavior (7.31) indicates that the singularities of the Borel transform of $F_g(\lambda)$ which are closest to the origin are located at $\pm A(\lambda)$, where
$$A(\lambda) = |A(\lambda)|\, e^{i\theta(\lambda)}. \tag{7.32}$$
Since $\theta(\lambda)$ does not vanish, none of them lies on the positive real axis. This strongly suggests that the $1/N$ expansion of the free energy is Borel summable for any $\lambda > 0$. The large order behavior of the genus expansion (7.31) is similar to the one found for Chern–Simons theory on $S^3$ in [65], and it should be governed by a large $N$ instanton with action $A(\lambda)$. It would be very interesting to identify this instanton and compute $A(\lambda)$ analytically, both in the gauge theory and in the string theory dual. The factorial growth, found here by explicit calculation in the matrix model, agrees with the expected behavior for the genus expansion in string theory [32].

8. More Exact Results on Wilson Loops

In this section we elaborate on the results of [7,8] and we obtain more exact results on Wilson loops.
8.1. 1/N corrections. The higher genus corrections to the VEV of 1/2 and 1/6 BPS Wilson loops can be computed in terms of the higher genus corrections to the resolvent of the matrix model. The resolvent has a genus expansion of the form
$$\omega(z) = \sum_{g=0}^{\infty} g_s^{2g}\,\omega_g(z). \tag{8.1}$$
In the same way, the density of eigenvalues has a large $N$ expansion of the form
$$\rho(z) = \sum_{g=0}^{\infty} g_s^{2g}\,\rho_g(z), \qquad \rho(z) = \rho^{(1)}(z) + \rho^{(2)}(z). \tag{8.2}$$
The $\rho_g^{(i)}(z)$ (with $i = 1, 2$) have their support on the intervals $C_i$, and they can be obtained by the discontinuity of $\omega_g$ at the cuts as in (2.28). The genus expansion of the expectation value of the 1/6 BPS and 1/2 BPS Wilson loops follows the expressions in (2.30) and (2.31) with the appropriate term in the expansion of $\rho^{(i)}(Z)$ and $\omega(Z)$.

The first step is therefore to compute $\omega_g(p)$. This calculation can be done with the recursive techniques developed in the matrix model literature starting with [56] and culminating with [59]. We will perform an explicit computation for $g = 1$. Calculations for $g \ge 2$ are in principle doable, but they become complicated. A convenient formula for $\omega_1(p)$ for an algebraic resolvent was found in [66]. To write this formula, we write the discontinuity of the resolvent (also called spectral curve in the matrix model literature) as
$$y(p) = M(p)\sqrt{\sigma(p)}, \qquad \sigma(p) = (p - x_1)(p - x_2)(p - x_3)(p - x_4). \tag{8.3}$$
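$M(p)$ is sometimes called the moment function. Then, one has
$$\omega_1(p) = \frac{4}{\sqrt{\sigma(p)}}\sum_{i=1}^{4}\left(\frac{A_i}{(p - x_i)^2} + \frac{B_i}{p - x_i} + C_i\right), \tag{8.4}$$
where
$$\begin{aligned}
A_i &= \frac{1}{16}\frac{1}{M(x_i)},\\
B_i &= -\frac{1}{16}\frac{M'(x_i)}{M^2(x_i)} + \frac{1}{8M(x_i)}\left(2\alpha_i - \sum_{j\neq i}\frac{1}{x_i - x_j}\right),\\
C_i &= -\frac{1}{48}\frac{1}{M(x_i)}\sum_{j\neq i}\frac{\alpha_j - \alpha_i}{x_j - x_i} - \frac{1}{16}\frac{M'(x_i)}{M^2(x_i)}\,\alpha_i + \frac{1}{8}\frac{\alpha_i}{M(x_i)}\left(2\alpha_i - \sum_{j\neq i}\frac{1}{x_i - x_j}\right),
\end{aligned} \tag{8.5}$$
and the $\alpha_i$ are given by
$$\begin{aligned}
\alpha_1 &= \frac{1}{x_1 - x_2}\left[1 - \frac{(x_4 - x_2)}{(x_4 - x_1)}\frac{E(k)}{K(k)}\right], &
\alpha_2 &= \frac{1}{x_2 - x_1}\left[1 - \frac{(x_3 - x_1)}{(x_3 - x_2)}\frac{E(k)}{K(k)}\right],\\
\alpha_3 &= \frac{1}{x_3 - x_4}\left[1 - \frac{(x_4 - x_2)}{(x_3 - x_2)}\frac{E(k)}{K(k)}\right], &
\alpha_4 &= \frac{1}{x_4 - x_3}\left[1 - \frac{(x_3 - x_1)}{(x_4 - x_1)}\frac{E(k)}{K(k)}\right],
\end{aligned} \tag{8.6}$$
where the modulus of the elliptic functions is
$$k^2 = \frac{(x_1 - x_2)(x_3 - x_4)}{(x_1 - x_3)(x_2 - x_4)}. \tag{8.7}$$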
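To make (8.6) and (8.7) concrete, here is a minimal numerical sketch in Python. It assumes the standard convention in which $K(k)$ and $E(k)$ are the complete elliptic integrals of the first and second kind with modulus $k$, and it evaluates them with the arithmetic-geometric mean iteration rather than any method used by the authors. The branch points are illustrative values of the form quoted in (8.10), with $a = b = 2$ chosen only for the example.

```python
import math

def elliptic_KE(k2):
    """Complete elliptic integrals K and E for squared modulus k2 in [0, 1),
    computed with the arithmetic-geometric mean (AGM) iteration."""
    a, b = 1.0, math.sqrt(1.0 - k2)
    c2_sum = 0.5 * k2      # running sum of 2^(n-1) c_n^2, starting with c_0^2 = k^2
    power = 1.0            # 2^(n-1) for the next term (n = 1)
    while abs(a - b) > 1e-15 * a:
        a, b, c = 0.5 * (a + b), math.sqrt(a * b), 0.5 * (a - b)
        c2_sum += power * c * c
        power *= 2.0
    K = math.pi / (2.0 * a)
    return K, K * (1.0 - c2_sum)

def modulus_squared(x):
    """k^2 of (8.7) for branch points x = (x1, x2, x3, x4)."""
    x1, x2, x3, x4 = x
    return (x1 - x2) * (x3 - x4) / ((x1 - x3) * (x2 - x4))

def alphas(x):
    """The four quantities alpha_i of (8.6)."""
    x1, x2, x3, x4 = x
    K, E = elliptic_KE(modulus_squared(x))
    r = E / K
    return (
        (1 - (x4 - x2) / (x4 - x1) * r) / (x1 - x2),
        (1 - (x3 - x1) / (x3 - x2) * r) / (x2 - x1),
        (1 - (x4 - x2) / (x3 - x2) * r) / (x3 - x4),
        (1 - (x3 - x1) / (x4 - x1) * r) / (x4 - x3),
    )

# Illustrative branch points (-b, -1/b, 1/a, a) with a = b = 2.
points = (-2.0, -0.5, 0.5, 2.0)
k2 = modulus_squared(points)
al = alphas(points)
print(k2, al)
```

For branch points of this form the modulus reduces to $k^2 = (a^2-1)(b^2-1)/(1+ab)^2$, and for $a = b$ the reflection symmetry of the cuts shows up as $\alpha_4 = -\alpha_1$ and $\alpha_3 = -\alpha_2$.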
These expressions differ from the ones in [66] by a permutation of the roots, as explained in [67]. The overall factor of 4 in (8.4) is due to the fact that our resolvent has a different normalization than the one in [66]. Although the resolvent of the lens space matrix model (2.10) is not algebraic, its discontinuity can be written in the form (8.3) with
$$\sigma(p) = f(p)^2 - 4\beta^2 p^2, \qquad f(p) = p^2 - \zeta p + 1 \tag{8.8}$$
and
$$M(p) = \frac{2}{p\sqrt{\sigma(p)}}\tanh^{-1}\left[\frac{\sqrt{\sigma(p)}}{f(p)}\right]. \tag{8.9}$$
This form of the spectral curve is typical of the mirrors of toric geometries [60,61]. The branch points are
$$x_1 = -b, \qquad x_2 = -\frac{1}{b}, \qquad x_3 = \frac{1}{a}, \qquad x_4 = a. \tag{8.10}$$
Using these expressions, it is possible to compute the integral
$$\langle W^{1/6}\rangle_{g=1} = \frac{1}{4\pi i}\oint_{C_1}\omega_1(Z)\,Z\,dZ \tag{8.11}$$
in closed form, in terms of elliptic functions $E$, $K$ and the elliptic integral of the third kind $\Pi(n, k)$, with
$$n = \frac{(a^2 - 1)\,b}{(1 + ab)\,a}. \tag{8.12}$$
One finds the rather complicated expression W
1/6
g=1 =
1 −3(b−2a +a 2 b) (1+ab)4 E 2 √ 3/2 12π a b (1 + ab)(a 2 − 1)2 (b2 − 1)K + a(1 + a 4 ) − b + a 2 (4 + 4a 2 − a 4 )b − 4a(1 − 3a 2 + a 4 )b2 −a 2 (1 + a 2 )b3 (2 + b2 ) + a(1 − 8a 2 + a 4 )b4 K 2 + b3 (1 + 6a 2 + a 4 ) + 4a(1 + a 2 )(b2 − 1) +b(3 − 14a 2 + 3a 4 ) (1 + ab)2 E K +
(ab − 1)(a 2 − b2 ) −6E 2 12π (ab)3/2 (1 + ab)k 4 K 2
+ 4(2 − k 2 )E K − (2 − 2k 2 + k 4 ) K 2 .
(8.13)
To check this formula, we expand it around the weakly coupled point λ1 = λ2 = 0. After using the inverse mirror map given by (4.1) we find πi π2 2 π2 π 3i 3 π 3i 2 π 3i λ1 + λ1 + λ1 λ2 + λ1 + λ1 λ2 − λ1 λ22 12 12 4 18 24 4 π4 π4 3 5π 4 2 2 π 4 λ1 λ2 + λ λ − λ1 λ32 + O(λ5 ). − λ41 + (8.14) 36 24 24 1 2 6 We can test this expansion with a perturbative calculation in the ABJM matrix model. At order O(gs4 ) we have found, e−gs N1 /2 1 2 1 1 1 2 1 1/6 gs2 + W = 1 − N1 − N1 N2 + N1 N2 − N2 gs3 2π iλ1 24 4 24 16 16 3 10 3 20 N4 − N N2 − N1 N23 + 5760 1 1920 1 1920 10 2 5 1 2 7 N1 + N1 N2 + N2 + g4 + · · · . − (8.15) 5760 192 32 5760 s W
1/6
g=1 = −
It is straightforward to see that this agrees with (8.14).
The $1/N$ correction to the 1/2 BPS Wilson loop is much easier to obtain, since it can be computed as a residue at infinity. We have that
$$\omega_1(Z) = \frac{4}{Z^2}\sum_{i=1}^{4} C_i + O(Z^{-3}), \tag{8.16}$$
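where the $C_i$ are given in (8.5). We find, at weak coupling,
$$\begin{aligned}
\langle W^{1/2}\rangle_{g=1} = {}& -\frac{\pi i}{12}(\lambda_1 + \lambda_2) + \frac{\pi^2}{12}(\lambda_1^2 - \lambda_2^2) + \frac{\pi^3 i}{18}(\lambda_1^3 + \lambda_2^3) - \frac{5\pi^3 i}{24}\lambda_1\lambda_2(\lambda_1 - \lambda_2)\\
& - \frac{\pi^4}{36}(\lambda_1^4 - \lambda_2^4) + \frac{5\pi^4}{24}\lambda_1\lambda_2(\lambda_1^2 - \lambda_2^2) + O(\lambda^5).
\end{aligned} \tag{8.17}$$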
At strong coupling we find (we consider for simplicity the ABJM slice)
$$\langle W^{1/2}\rangle_{g=1} = \frac{1}{24i}\,\frac{3 + 2\log^2\kappa - 4\log\kappa}{\log^2\kappa}\,\kappa + O(1). \tag{8.18}$$
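The $\kappa$-dependence of (8.18) can be rewritten as a short series in $1/\log\kappa$. The Python sketch below checks numerically that, under the assumption $\log\kappa \approx \pi\sqrt{2\lambda}$ at strong coupling (a relation taken from earlier sections and not restated here, valid only up to exponentially small corrections), the coefficient multiplying $\kappa/i$ in (8.18) becomes $\frac{1}{12} - \frac{1}{6\pi\sqrt{2\lambda}} + \frac{1}{16\pi^2\lambda}$, the combination appearing in (8.19). The function names are illustrative.

```python
import math

def genus_one_coefficient(log_kappa):
    """Coefficient of kappa / i in (8.18), as a function of L = log(kappa)."""
    L = log_kappa
    return (3 + 2 * L**2 - 4 * L) / (24 * L**2)

def expanded_coefficient(lam):
    """The same coefficient after substituting L = pi * sqrt(2 * lam)
    (assumed strong-coupling relation between kappa and lambda)."""
    root = math.pi * math.sqrt(2 * lam)
    return 1 / 12 - 1 / (6 * root) + 1 / (16 * math.pi**2 * lam)

for lam in (1.0, 10.0, 100.0):
    L = math.pi * math.sqrt(2 * lam)
    print(lam, genus_one_coefficient(L), expanded_coefficient(lam))
```

The two columns agree to machine precision because the rewriting is purely algebraic; the physical input is only the assumed relation between $\kappa$ and $\lambda$.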
The leading exponent is exactly as at genus zero (5.16), representing the same minimal surface with an extra degenerate handle attached. Its effect is to modify the one-loop determinant, which (with our normalization and ignoring instantons) can be written as
$$\langle W^{1/2}\rangle_{g=1} = -i\left(\frac{1}{12} - \frac{1}{6\pi\sqrt{2\lambda}} + \frac{1}{16\pi^2\lambda}\right)e^{\pi\sqrt{2\lambda}}, \qquad \lambda \to \infty. \tag{8.19}$$

8.2. Giant Wilson loops. It has been argued in [33–35,68] that a D-brane probe in $AdS_5 \times S^5$ represents an insertion of a Wilson loop in the dual 4d $\mathcal{N} = 4$ SYM with a large symmetric or antisymmetric representation (in the case of D3 branes and D5 branes, respectively). These “giant Wilson loops” are characterized by a representation with $n$ boxes, and one considers the limit
$$n, N \to \infty, \qquad \frac{n}{N}\ \text{fixed}. \tag{8.20}$$
In terms of the Gaussian matrix model of the Wilson loops in that theory, the giant Wilson loop in the symmetric representation is represented by an additional eigenvalue outside the cut and the antisymmetric representation by a “hole” in the original cut. Let us review now the known D-brane solutions which could be relevant for ABJM theory. The usual 1/2 BPS Wilson loop in the fundamental representation is described by a string with world-volume AdS2 ⊂ AdS4 . In M-theory it is an M2-brane wrapping also the orbifold cycle on S7 /Zk . When considering k/2 coincident M2-branes (or k, when it is odd) the M2-brane solution develops an extra branch, where the circle becomes a linear combination of the orbifold direction and a contractible circle in AdS4 [69]. In type IIA these configurations are D2-branes with world-volume AdS2 × S1 ⊂ AdS4 , where the radius of the S1 is a free modulus. From the M-theory point of view these are continuous deformations of the system of k/2 coincident M2-branes describing a Wilson loop in a k/2 dimensional representation. In the field theory they are the vortex loop operators of [18], which have a description as semi-classical field configurations and carry the same charge as k/2 Wilson loops.
These solutions have further moduli associated to rotations away from the orbifold cycle inside $S^7/\mathbb{Z}_k$. Such M2-brane configurations preserve 8 supercharges (1/3 BPS) [15,18]. There is also a known family of D6-brane solutions which were argued in [15] to represent the 1/6 BPS Wilson loops in anti-symmetric representations. The action for these D-branes is (for $N_1 = N_2$)
$$S_{D6} = -\pi\sqrt{2\lambda}\,\frac{n(N - n)}{N}, \tag{8.21}$$
which matches that of $n$ strings for small $n$ and has the $n \to N - n$ symmetry of the antisymmetric representation. In the matrix model these D6-branes should correspond to creating a “hole” in one of the two cuts, splitting it in two.

We turn now to the lens space matrix model and try to find the appropriate description for these objects, and in particular the 1/2 BPS vortex loop operators. As pointed out in [70], the calculation of Wilson loops in the matrix model in this limit can be done in a saddle-point approximation. We will now reformulate the arguments of [70] and adapt them to the lens space matrix model. We will focus on the case of 1/2 BPS Wilson loops, where we want to calculate
$$W_n^{\eta} = \left\langle \mathrm{Tr}_{R_n^{\eta}}\, U \right\rangle, \qquad \eta = \pm 1, \tag{8.22}$$
where $U$ is the same matrix as in (2.9) and $R_n^{\pm 1} = S_n, A_n$ are respectively the totally symmetric and the totally antisymmetric representations of $U(N_1 + N_2)$ with $n$ boxes. It will turn out that the relevant limit in this theory is slightly different from (8.20) and is given by fixing
$$\nu = \eta\,\frac{n}{k} = \frac{\eta\, g_s n}{2\pi i}. \tag{8.23}$$
Positive $\nu$ will correspond to symmetric representations and negative $\nu$ to antisymmetric ones. In the ’t Hooft limit, for fixed $N/k$, the two scalings are clearly equivalent. The calculation of (8.22) is very similar to the calculation of partition functions of $n$ bosons or fermions in the canonical ensemble, where $n$ is fixed and large. But at large $n$, in the thermodynamic limit, this calculation can be done as well in the grand canonical ensemble. We then introduce the fugacity $z$ and consider the grand-canonical partition function, using the expression for the determinant as the generating function of the characters
$$\Xi_\eta(z) = \sum_{n\ge 0} z^n W_n^{\eta} = \left\langle \det(1 - \eta z U)^{-\eta}\right\rangle = \left\langle \exp\left(\sum_{\ell\ge 1}\frac{\eta^{\ell-1} z^{\ell}}{\ell}\,\mathrm{Tr}\, U^{\ell}\right)\right\rangle. \tag{8.24}$$
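The identity behind (8.24), before any matrix-model average is taken, is the statement that $\det(1 - \eta z U)^{-\eta}$ generates the characters of the totally symmetric ($\eta = +1$) and totally antisymmetric ($\eta = -1$) representations. The following Python sketch checks this for a fixed diagonal matrix. The eigenvalues and the fugacity are arbitrary illustrative numbers, chosen small enough for the symmetric series to converge quickly; nothing here performs the average over $U$.

```python
from itertools import combinations, combinations_with_replacement
from math import prod

def character(eigs, n, eta):
    """Trace of diag(eigs) in the n-box symmetric (eta = +1) or
    antisymmetric (eta = -1) representation."""
    choose = combinations_with_replacement if eta == 1 else combinations
    return sum(prod(c) for c in choose(eigs, n))

def grand_canonical(eigs, z, eta):
    """det(1 - eta z U)^(-eta) for U = diag(eigs)."""
    return prod((1 - eta * z * u) ** (-eta) for u in eigs)

eigs = [0.5, -0.3, 0.2]   # illustrative eigenvalues of U
z = 0.4                   # illustrative fugacity
for eta in (+1, -1):
    series = sum(z**n * character(eigs, n, eta) for n in range(40))
    print(eta, series, grand_canonical(eigs, z, eta))
```

For $\eta = -1$ the sum terminates at $n$ equal to the size of the matrix, which is the finite-$N$ counterpart of the fermionic interpretation mentioned in the text.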
The average value of $n$ in this ensemble is given by (we remove the average notation here, as is standard in the grand canonical formalism)
$$n = z\,\frac{\partial \log \Xi_\eta}{\partial z}. \tag{8.25}$$
This is inverted to determine the fugacity as a function of the number of particles
$$z_* = z_*(n), \tag{8.26}$$
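and then the original VEV can be calculated, in a saddle point approximation, as
$$W_n^{\eta} \approx z_*^{-n}\,\Xi_\eta(z_*) = \exp\left(-n\log z_* + \sum_{\ell\ge 1}\frac{\eta^{\ell-1} z_*^{\ell}}{\ell}\left\langle \mathrm{Tr}\, U^{\ell}\right\rangle\right). \tag{8.27}$$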
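For convenience, let us henceforth absorb $\eta$ into the variable $Y = \eta z$. It can be seen that, at leading order in large $N$, the grand-canonical partition function (8.24) is given by disconnected planar graphs. Therefore
$$\Xi_\eta(Y) \approx \exp\left(\frac{\eta}{g_s}\, g(Y)\right), \qquad g(Y) = g_s\sum_{\ell\ge 1}\left\langle \mathrm{Tr}\, U^{\ell}\right\rangle_0 \frac{Y^{\ell}}{\ell}, \tag{8.28}$$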
where the subscript 0 refers to the planar part. We now observe that the function $g(Y)$ is related to the planar resolvent in the lens space matrix model (2.8) and (2.10) by
$$Y\frac{\partial}{\partial Y}\, g(Y) = \omega_0(Y^{-1}) - \frac{1}{2}\,t = -\log\left[\frac{1}{2}\left(\sqrt{(Y + b)(Y + 1/b)} + \sqrt{(Y - a)(Y - 1/a)}\right)\right]. \tag{8.29}$$
Note that compared to $\omega_0$ in (2.10), the sign between the two square roots is reversed. Integrating this equation we get
$$g(Y) = -\int_0^{Y}\frac{dY}{Y}\,\log\left[\frac{1}{2}\left(\sqrt{(Y + b)(Y + 1/b)} + \sqrt{(Y - a)(Y - 1/a)}\right)\right]. \tag{8.30}$$
The initial point of integration is chosen to be $Y = 0$, since around that point the integrand approaches a constant $\zeta/2 + O(Y)$. This guarantees that for small $Y$ the result of the integration will be proportional to the 1/2 BPS Wilson loop (2.35). The saddle point Eq. (8.25) determining the mean value of $n$ is then given by
$$\nu = \frac{1}{2\pi i}\, Y\frac{\partial}{\partial Y}\, g(Y), \tag{8.31}$$
i.e., using (8.29),
$$e^{-2\pi i\nu} = \frac{1}{2}\left(\sqrt{(Y_* + b)(Y_* + 1/b)} + \sqrt{(Y_* - a)(Y_* - 1/a)}\right), \qquad Y_* = \eta z_*. \tag{8.32}$$
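This can be solved explicitly in terms of $\beta$, $\zeta$ or alternatively in terms of $B$ and $\kappa$. The solution reads
$$Y_* = \frac{i\kappa\, e^{-\pi i(2\nu + B)}}{4\sin(2\pi(\nu + B))}\left[1 - \sqrt{1 - \frac{16\sin(2\pi\nu)\sin(2\pi(\nu + B))}{\kappa^2}}\,\right]. \tag{8.33}$$
The choice of sign is such that $Y_* = 0$ when $\nu = 0$. We will write
$$W_n^{\eta} \approx \exp\left(A_\eta/g_s\right), \tag{8.34}$$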
where $A_\eta$, which is identified with the action of a brane probe in the large $N$ string/M-theory dual, is given by
$$A_\eta = -2\pi i\eta\nu\,\log(\eta Y_*) + \eta\, g(Y_*). \tag{8.35}$$
In the original variables, in terms of $\omega_0$, the integral (8.30) is from infinity to a finite position $Y_*^{-1}$, and represents the effect of adding a single eigenvalue to the system. This fits with the standard dictionary [71] identifying a brane with a single eigenvalue. This integral gives an expression for the action of the giant Wilson loop, in the limit (8.20) which is exact as a function of the ’t Hooft couplings. The derivatives of this integral with respect to $\beta$ and $\zeta$ can be evaluated in closed form, as in (2.16), in terms of incomplete elliptic integrals. The resulting expression can then be studied at the different limits of the ABJM theory as done for other observables in earlier sections.

If we go to the conifold limit, setting $\lambda_2 = 0$, we get an expression for the giant Wilson loop in Chern–Simons theory on $S^3$. In that case there exists an exact expression for the Wilson loop for all $n$. As we show in Appendix B, the above derivation in this limit indeed reproduces the CS answer.

We will now discuss the expansion of the result for the giant Wilson loop for large $\kappa$, since this is the strong coupling limit in which one makes contact with the AdS geometry [33]. In terms of $B$ and $\kappa$, the integral (8.30) reads
$$g(Y_*) = -\int_0^{Y_*}\frac{dY}{Y}\,\log\left[\frac{1}{2}\left(\sqrt{(1 + Y)^2 - e^{\pi i B}\,Y\,(\kappa - 4i\sin(\pi B))} + \sqrt{(1 - Y)^2 - e^{\pi i B}\,Y\,(\kappa + 4i\sin(\pi B))}\right)\right], \tag{8.36}$$
where $Y_*$ is given in (8.33). Expanding $Y_*$ at leading order at large $\kappa$ we get
$$Y_* = 2i\, e^{-\pi i(2\nu + B)}\,\frac{\sin(2\pi\nu)}{\kappa} + O(\kappa^{-2}) = e^{-\pi i B}\,\frac{1 - e^{-4\pi i\nu}}{\kappa} + O(\kappa^{-2}). \tag{8.37}$$
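The passage from (8.33) to (8.37) is a one-line expansion of the square root, and it is easy to check numerically. The Python sketch below evaluates both expressions and prints their relative difference. The values of $\nu$, $B$ and $\kappa$ are illustrative real numbers chosen for the example and are not tied to a particular point of the ABJM slice; the principal branch of the complex square root is assumed.

```python
import cmath
import math

def saddle_exact(nu, B, kappa):
    """Y_* as given in (8.33)."""
    s1 = math.sin(2 * math.pi * nu)
    s2 = math.sin(2 * math.pi * (nu + B))
    prefactor = 1j * kappa * cmath.exp(-1j * math.pi * (2 * nu + B)) / (4 * s2)
    return prefactor * (1 - cmath.sqrt(1 - 16 * s1 * s2 / kappa**2))

def saddle_leading(nu, B, kappa):
    """Leading large-kappa term of Y_*, second form in (8.37)."""
    return cmath.exp(-1j * math.pi * B) * (1 - cmath.exp(-4j * math.pi * nu)) / kappa

nu, B = 0.1, 0.3          # illustrative values
for kappa in (1e2, 1e3):
    exact = saddle_exact(nu, B, kappa)
    lead = saddle_leading(nu, B, kappa)
    print(kappa, abs(exact - lead) / abs(lead))
```

The relative difference shrinks like $\kappa^{-2}$, and the exact expression vanishes at $\nu = 0$, in line with the choice of sign stated after (8.33).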
This suggests rescaling $Y$ in the integral (8.36) by $\kappa$, which allows for a systematic expansion in powers of $\kappa^{-1}$. At leading order the integral becomes
$$g(Y_*) = -\int_0^{Y_*}\frac{dY}{Y}\,\log\sqrt{1 - e^{\pi i B}\kappa Y + O(\kappa^{-1})}. \tag{8.38}$$
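This yields
$$g(Y_*) = \frac{1}{2}\,\mathrm{Li}_2\left(e^{\pi i B}\kappa Y_*\right) + O(\kappa^{-2}) = \frac{1}{2}\,\mathrm{Li}_2\left(1 - e^{-4\pi i\nu}\right) + O(\kappa^{-2}). \tag{8.39}$$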
Another way to get this estimate is to notice that the highest powers of $\zeta$ in the series expansion in $y$ of $g(y)$ are captured by
$$g(y) = \frac{1}{2}\,\mathrm{Li}_2(\zeta y) + \cdots. \tag{8.40}$$
Using the dilogarithm identity (B.3) we conclude that the action (8.35), written in terms of the original variable $n$, is
$$\frac{1}{g_s}A_\eta = n\pi\sqrt{2\hat\lambda} + \frac{n\pi i}{2}\,(2B - 1 + \eta) + \frac{\eta k}{4\pi i}\left(\frac{\pi^2}{6} - \mathrm{Li}_2\left(e^{-4\pi i n/k}\right)\right) + O\left(\hat\lambda^{-1/2},\, e^{-2\pi\sqrt{2\hat\lambda}}\right). \tag{8.41}$$
Notice that this formula does not display the exchange symmetry $n \leftrightarrow N - n$ for the antisymmetric case $\eta = -1$. This is because this symmetry is not present for the antisymmetric super-representation, as pointed out in [72].

The leading order in $\lambda$ in (8.41) is as expected, i.e., $n$ times the action of the fundamental string (and $n$ times an extra framing factor). The non-trivial dependence on $\nu$ only appears at subleading order in $\lambda$, and therefore will not be visible in the supergravity approximation. As mentioned above, there are no known 1/2 BPS brane solutions carrying less than $k/2$ units of electric charge other than fundamental strings. So we expect that the above action describes the interaction of these coincident strings.

For $n$ a multiple of $k/2$ (or of $k$, if it is odd), we see from (8.33) that $Y_* = 0$ and the integral (8.30) is over a full cycle. The argument of the dilogarithm in (8.41) is unity, canceling the $\pi^2/6$ term. Since $Y_*$ passed through one of the cuts $C_1$ or $C_2$, it is now on a different sheet, and exactly at the branch point of the logarithm in $\omega_0(Y^{-1})$. This happens exactly for the value of $n$ where the strings describing the Wilson loop can be replaced by D2-branes, which are the string theory incarnation of the vortex loop operators [18]. This suggests that the vortex loop operators are related to eigenvalues along the logarithmic branch-cut. It is possible to use our formalism to calculate the perturbative and instanton corrections to these configurations and it would be interesting to understand further their significance in the matrix model.

Acknowledgements.
We would like to thank Ofer Aharony, Massimo Bianchi, Andrea Brini, Aristos Donos, Valentina Forini, Sean Hartnoll, Aki Hashimoto, Diego Hofman, Anton Kapustin, Albrecht Klemm, Joe Minahan, Juan Maldacena, Vasily Pestun, Jan Plefka, Olof Sax and Christoph Sieg for stimulating discussions. N.D. would like to thank Nordita and the Erwin Schrödinger International Institute for their hospitality in the final stages of this project. M.M. would like to thank the Physics Department at Harvard University and the Erwin Schrödinger International Institute for hospitality. The work of M.M. and of P.P. is supported by the Fonds National Suisse.
A. Normalization of the ABJM Matrix Model

Here we shall fix the overall normalization of the matrix model. As explained in the beginning of Sect. 2, to do this we must fix the coefficient of the cosh in the denominator. This term appears as a consequence of integrating out the matter hypermultiplets at one-loop. For general supersymmetric Chern–Simons-matter theories, the contribution of a hypermultiplet in representation $R$ is given by [4]
$$\log Z[a] = \sum_{\rho}\log\prod_{n=1}^{\infty}\left(\frac{n + 1/2 + i\rho(a)}{n - 1/2 - i\rho(a)}\right)^{n}, \tag{A.1}$$
where $\rho$ are the weights of the representation, and $a$ is the element in the Cartan algebra given by
$$a = \frac{1}{2\pi}\,\mathrm{diag}\left(\mu_1, \cdots, \mu_{N_1}, \nu_1, \cdots, \nu_{N_2}\right). \tag{A.2}$$
In [4] the one-loop determinant is evaluated up to a multiplicative constant,
$$Z[a] = \prod_{\rho}\left(C\cosh(\pi\rho(a))\right)^{-1/2}. \tag{A.3}$$
The constant $C$ can be determined by setting $a = 0$ in (A.1):
$$-\frac{1}{2}\log C = \log\prod_{n=1}^{\infty}\left(\frac{n + 1/2}{n - 1/2}\right)^{n}. \tag{A.4}$$
This is a divergent constant, but as usual when considering determinants on compact manifolds, we can compute it by using $\zeta$-function regularization. Let us define
$$\zeta_Z(s) = \sum_{n=1}^{\infty}\left(\frac{n}{\left(n + \frac{1}{2}\right)^{s}} - \frac{n}{\left(n - \frac{1}{2}\right)^{s}}\right). \tag{A.5}$$
The regularization of the quantity appearing in (A.4) is then $-\zeta_Z'(0)$. An elementary calculation shows that
$$\zeta_Z(s) = -\left(2^{s} - 1\right)\zeta(s), \tag{A.6}$$
where $\zeta(s)$ is the standard Riemann zeta function. Therefore,
$$-\zeta_Z'(0) = -\frac{\log 2}{2} \tag{A.7}$$
and $C = 2$.

B. Giant Wilson Loops in Chern–Simons Theory

Chern–Simons theory on $S^3$ is a particular case of the lens space matrix model when $b = 1$ and the second cut collapses to zero size, i.e., $t_1 = t$, $t_2 = 0$. It gives the leading behavior of the Wilson loop in ABJM theory when $\lambda_2 \ll \lambda_1$, as discussed in Sect. 6. Here we consider the behavior of the giant Wilson loops, those in high dimensional symmetric or antisymmetric representations presented in Sect. 8.2, in this limit. In this case it is easy to calculate explicitly the action (8.35), since the integral
$$g(Y) = -\int_0^{Y}\frac{dY}{Y}\,\log(h(Y)), \qquad h(Y) = \frac{1}{2}\left[1 + Y + \sqrt{(1 + Y)^2 - 4e^{t}Y}\,\right] \tag{B.1}$$
can be obtained in closed form
$$g(Y) = \frac{\pi^2}{6} - \frac{1}{2}\log^2(h(Y)) + \log(h(Y))\left[\log\left(1 - e^{-t}h(Y)\right) - \log(1 - h(Y))\right] - \mathrm{Li}_2(h(Y)) + \mathrm{Li}_2\left(e^{-t}h(Y)\right) - \mathrm{Li}_2(e^{-t}). \tag{B.2}$$
Here we used the dilogarithm identity
$$\mathrm{Li}_2(1 - x) = \frac{\pi^2}{6} - \mathrm{Li}_2(x) - \log(x)\log(1 - x). \tag{B.3}$$
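The identity (B.3) is Euler's reflection formula for the dilogarithm, and it can be checked numerically in a few lines. The Python sketch below sums the defining power series directly, which is an illustrative choice adequate only for real arguments well inside the unit interval; it is not a general-purpose dilogarithm routine.

```python
import math

def li2(x, terms=400):
    """Dilogarithm Li2(x) from its power series sum_k x^k / k^2,
    adequate for real |x| <= 0.9 or so."""
    return sum(x**k / k**2 for k in range(1, terms + 1))

def identity_gap(x):
    """Difference between the two sides of (B.3) for real 0 < x < 1."""
    lhs = li2(1 - x)
    rhs = math.pi**2 / 6 - li2(x) - math.log(x) * math.log(1 - x)
    return lhs - rhs

for x in (0.2, 0.5, 0.7):
    print(x, identity_gap(x))
```

At $x = 1/2$ the identity reduces to the classical value $\mathrm{Li}_2(1/2) = \pi^2/12 - \frac{1}{2}\log^2 2$.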
The solution of the saddle point Eq. (8.25) is obtained by setting in (8.33)
$$\kappa = -4i\sinh\frac{t}{2}, \qquad B = \frac{t}{2\pi i} + \frac{1}{2}, \tag{B.4}$$
and we find
$$Y_* = -\frac{1 - e^{-2\pi i\nu}}{1 - e^{2\pi i\nu + t}}. \tag{B.5}$$
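Since the saddle point condition (8.31) together with (B.1) amounts to $h(Y_*) = e^{-2\pi i\nu}$, the solution (B.5) can be verified directly. The Python sketch below does so for a few illustrative values of $\nu$ at $t = 1$. It assumes the principal branch of the square root in $h(Y)$, which is the relevant one for small $\nu$ and real positive $t$; other regions of parameter space may require a different branch.

```python
import cmath
import math

def h(Y, t):
    """The function h(Y) of (B.1), with the principal branch of the square root."""
    return 0.5 * (1 + Y + cmath.sqrt((1 + Y) ** 2 - 4 * math.exp(t) * Y))

def saddle(nu, t):
    """Y_* as given in (B.5)."""
    w = cmath.exp(2j * math.pi * nu)
    return -(1 - 1 / w) / (1 - w * math.exp(t))

t = 1.0                      # illustrative value of the 't Hooft parameter
for nu in (0.01, 0.05, 0.1):
    Y = saddle(nu, t)
    print(nu, abs(h(Y, t) - cmath.exp(-2j * math.pi * nu)))
```

The printed residuals are at the level of floating-point rounding, which is the numerical counterpart of squaring the saddle point equation and solving the resulting linear equation for $Y_*$.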
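The action (8.35) is
$$\begin{aligned}
\eta A_\eta &= -2\pi i\nu\,\log(\eta Y_*) + g(Y_*)\\
&= -2\pi i\nu\,\log\eta - 2\pi^2\nu^2 + 2\pi i\nu t + \frac{\pi^2}{6} + \mathrm{Li}_2\left(e^{-2\pi i\nu - t}\right) - \mathrm{Li}_2\left(e^{-2\pi i\nu}\right) - \mathrm{Li}_2\left(e^{-t}\right).
\end{aligned} \tag{B.6}$$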
Notice that this expression is exact in $t$. We can test (B.6) in all details against a direct calculation of correlators. Indeed, the VEVs $\langle \mathrm{Tr}_R\, U\rangle$ for the Chern–Simons matrix model on $S^3$ are proportional to quantum dimensions (see for example [21]):
$$\langle \mathrm{Tr}_R\, U\rangle = q^{\kappa_R/2 + \ell(R)N/2}\,\dim_q(R). \tag{B.7}$$
In this equation,
$$q = e^{g_s}, \tag{B.8}$$
$\ell(R)$ is the number of boxes in $R$, and $\kappa_R$ is the framing factor, given by
$$\kappa_R = \sum_i l_i(l_i - 2i + 1), \tag{B.9}$$
where $l_i$ are the lengths of the rows in the diagrams. The quantum dimensions of the symmetric and antisymmetric representations are given by
$$\dim_q(R_n^{\eta}) = \frac{q^{\eta n(n-1)/4}\, e^{nt/2}}{[n]!}\prod_{i=1}^{n}\left(1 - e^{-t}q^{-\eta(i-1)}\right), \tag{B.10}$$
where
$$[n]! = \prod_{i=1}^{n}\left(q^{i/2} - q^{-i/2}\right) = q^{\frac{1}{4}n(n+1)}\prod_{i=1}^{n}\left(1 - q^{-i}\right). \tag{B.11}$$
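At large $n$ we rescale
$$\xi = \frac{i}{n}, \qquad q^{-i} = \exp(-g_s i) \to e^{-2\pi i\eta\nu\xi} \tag{B.12}$$
so that
$$\log([n]!) \approx \frac{1}{g_s}\left(-\pi^2\nu^2 + 2\pi i\eta\nu\int_0^1 d\xi\,\log\left(1 - e^{-2\pi i\eta\nu\xi}\right)\right). \tag{B.13}$$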
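This gives the following contribution to the action:
$$\pi^2\nu^2 + \frac{\pi^2}{6} - \mathrm{Li}_2\left(e^{-2\pi i\eta\nu}\right) = \eta\left(\pi^2\nu^2 - 2\pi i\nu\,\log\eta + \frac{\pi^2}{6} - \mathrm{Li}_2\left(e^{-2\pi i\nu}\right)\right). \tag{B.14}$$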
To derive the expression on the right hand side we used, for $\eta = -1$, the dilogarithm identity
$$\mathrm{Li}_2(e^{x}) = -\mathrm{Li}_2(e^{-x}) + \frac{\pi^2}{3} - \frac{x^2}{2} \pm \pi i x. \tag{B.15}$$
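The product in the numerator of both the symmetric and antisymmetric representations can be written in a unified form as
$$2\pi i\eta\nu\int_0^1 d\xi\,\log\left(1 - e^{-t}e^{-2\pi i\nu\xi}\right) = \eta\left(\mathrm{Li}_2\left(e^{-t - 2\pi i\nu}\right) - \mathrm{Li}_2\left(e^{-t}\right)\right). \tag{B.16}$$
The prefactors in (B.7) and (B.10) contribute
$$\eta\left(-3\pi^2\nu^2 + 2\pi i\nu t\right). \tag{B.17}$$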
Together with (B.14) and (B.16) this exactly reproduces (B.6). In the antisymmetric representation the result can also be written as
$$-2\pi i\nu(t + 2\pi i\nu) + \frac{\pi^2}{6} + \mathrm{Li}_2\left(e^{-t}\right) - \mathrm{Li}_2\left(e^{-t - 2\pi i\nu}\right) - \mathrm{Li}_2\left(e^{2\pi i\nu}\right). \tag{B.18}$$
This expression agrees at leading order with the D6-brane calculation (8.21) and should be the full answer in the limit of $\lambda_2 = 0$. In this expression we see the expected symmetry [68]
$$n \leftrightarrow N - n, \tag{B.19}$$
which is
$$2\pi i\nu \leftrightarrow -t - 2\pi i\nu. \tag{B.20}$$
References 1. Aharony, O., Bergman, O., Jafferis, D.L., Maldacena, J.: N = 6 superconformal Chern–Simons-matter theories, M2-branes and their gravity duals. JHEP 0810, 091 (2008) 2. Gustavsson, A.: Algebraic structures on parallel M2-branes. Nucl. Phys. B 811, 66 (2009) 3. Bagger, J., Lambert, N.: Gauge symmetry and supersymmetry of multiple M2-branes. Phys. Rev. D 77, 065008 (2008) 4. Kapustin, A., Willett, B., Yaakov, I.: Exact results for Wilson loops in superconformal Chern–Simons theories with matter. JHEP 1003, 089 (2010) 5. Pestun, V.: Localization of gauge theory on a four-sphere and supersymmetric Wilson loops. http://arXiv. org/abs/0712.2824v2 [hep-th], 2010 6. Kapustin, A., Willett, B., Yaakov, I.: Nonperturbative Tests of Three-Dimensional Dualities. http://arXiv. org/abs/1003.5694v2 [hep-th], 2010 7. Drukker, N., Trancanelli, D.: A supermatrix model for N = 6 super Chern–Simons-matter theory. JHEP 1002, 058 (2010) 8. Mariño, M., Putrov, P.: Exact results in ABJM theory from topological strings. JHEP 1006, 011 (2010)
9. Mariño, M.: Chern–Simons theory, matrix integrals, and perturbative three-manifold invariants. Commun. Math. Phys. 253, 25 (2004) 10. Aganagic, M., Klemm, A., Mariño, M., Vafa, C.: Matrix model as a mirror of Chern–Simons theory. JHEP 0402, 010 (2004) 11. Aharony, O., Bergman, O., Jafferis, D.L.: Fractional M2-branes. JHEP 0811, 043 (2008) 12. Witten, E.: Quantum field theory and the Jones polynomial. Commun. Math. Phys. 121, 351 (1989) 13. Halmagyi, N., Yasnov, V.: The spectral curve of the lens space matrix model. JHEP 0911, 104 (2009) 14. Mariño, M., Pasquetti, S., Putrov, P.: Large N duality beyond the genus expansion. http://arXiv.org/abs/ 0911.4692v2 [hep-th], 2010 15. Drukker, N., Plefka, J., Young, D.: Wilson loops in 3-dimensional N = 6 supersymmetric Chern–Simons Theory and their string theory duals. JHEP 0811, 019 (2008) 16. Chen, B., Wu, J.B.: Supersymmetric Wilson loops in N = 6 super Chern–Simons-matter theory. Nucl. Phys. B 825, 38 (2010) 17. Rey, S.J., Suyama, T., Yamaguchi, S.: Wilson loops in superconformal Chern–Simons theory and fundamental strings in Anti-de Sitter supergravity dual. JHEP 0903, 127 (2009) 18. Drukker, N., Gomis, J., Young, D.: Vortex Loop Operators, M2-branes and Holography. JHEP 0903, 004 (2009) 19. Lee, K.M., Lee, S.: 1/2-BPS Wilson Loops and Vortices in ABJM Model. JHEP 1009, 004 (2010) 20. Tierz, M.: Soft matrix models and Chern–Simons partition functions. Mod. Phys. Lett. A 19, 1365 (2004) 21. Mariño, M.: Les Houches lectures on matrix models and topological strings. http://arXiv.org/abs/hep-th/ 0410165v3, 2010 22. Dolivet, Y., Tierz, M.: Chern–Simons matrix models and Stieltjes-Wigert polynomials. J. Math. Phys. 48, 023507 (2007) 23. Bergman, O., Hirano, S.: Anomalous radius shift in AdS4 /CFT3 . JHEP 0907, 016 (2009) 24. Aharony, O., Hashimoto, A., Hirano, S., Ouyang, P.: D-brane charges in gravitational duals of 2+1 dimensional gauge theories and duality cascades. JHEP 1001, 072 (2010) 25. 
Klebanov, I.R., Tseytlin, A.A.: Entropy of near-extremal black p-branes. Nucl. Phys. B 475, 164 (1996) 26. Minahan, J.A., Sax, O.O., Sieg, C.: A limit on the ABJ model. http://arXiv.org/abs/1005.1786v1 [hep-th], 2010 27. Gadde, A., Pomoni, E., Rastelli, L.: Spin chains in N = 2 superconformal theories: from the Z2 quiver to superconformal QCD. http://arXiv.org/abs/1006.0015v1 [hep-th], 2010 28. Huang, M.x., Klemm, A.: Holomorphic anomaly in gauge theories and matrix models. JHEP 0709, 054 (2007) 29. Grimm, T.W., Klemm, A., Mariño, M., Weiss, M.: Direct integration of the topological string. JHEP 0708, 058 (2007) 30. Bershadsky, M., Cecotti, S., Ooguri, H., Vafa, C.: Kodaira-Spencer theory of gravity and exact results for quantum string amplitudes. Commun. Math. Phys. 165, 311 (1994) 31. Haghighat, B., Klemm, A., Rauch, M.: Integrability of the holomorphic anomaly equations. JHEP 0810, 097 (2008) 32. Shenker, S.H.: The Strength of nonperturbative effects in string theory. In: Álvarez, O., Marinari, E., Windey, P. (eds.) Random surfaces and quantum gravity. New York: Plenum 1991, pp. 191–200 33. Drukker, N., Fiol, B.: All-genus calculation of Wilson loops using D-branes. JHEP 0502, 010 (2005) 34. Gomis, J., Passerini, F.: Holographic Wilson loops. JHEP 0608, 074 (2006) 35. Gomis, J., Passerini, F.: Wilson loops as D3-branes. JHEP 0701, 097 (2007) 36. Kapustin, A., Saulina, N.: Chern–Simons-Rozansky-Witten topological field theory. Nucl. Phys. B 823, 403 (2009) 37. Brini, A., Tanzini, A.: Exact results for topological strings on resolved Y ( p, q) singularities. Commun. Math. Phys. 289, 205 (2009) 38. Erickson, J.K., Semenoff, G.W., Zarembo, K.: Wilson loops in N = 4 supersymmetric Yang-Mills theory. Nucl. Phys. B 582, 155 (2000) 39. Hosono, S., Klemm, A., Theisen, S.: Lectures on mirror symmetry. In: Lecture Notes in Phys., Vol. 436, Berlin: Heidelberg-NewYork: Springer, 1994, pp. 235–280 40. 
Chiang, T.M., Klemm, A., Yau, S.T., Zaslow, E.: Local mirror symmetry: Calculations and interpretations. Adv. Theor. Math. Phys. 3, 495 (1999) 41. Aganagic, M., Bouchard, V., Klemm, A.: Topological strings and (almost) modular forms. Commun. Math. Phys. 277, 771 (2008) 42. Seiberg, N., Witten, E.: Electric-magnetic duality, monopole condensation, and confinement in N = 2 supersymmetric Yang-Mills theory. Nucl.Phys. B 426, 19 (1994) [Erratum-ibid. B 430, 485 (1994)] 43. Guadagnini, E., Martellini, M., Mintchev, M.: Wilson lines in Chern–Simons theory and link invariants. Nucl. Phys. B 330, 575 (1990) 44. Alvarez, M., Labastida, J.M.F.: Analysis of observables in Chern–Simons perturbation theory. Nucl. Phys. B 395, 198 (1993)
45. Drukker, N., Gross, D.J.: An exact prediction of N = 4 SUSYM theory for string theory. J. Math. Phys. 42, 2896 (2001) 46. Drukker, N., Gross, D.J., Tseytlin, A.A.: Green-Schwarz string in AdS5 × S5 : Semiclassical partition function. JHEP 0004, 021 (2000) 47. Kruczenski, M., Tirziu, A.: Matching the circular Wilson loop with dual open string solution at 1-loop in strong coupling. JHEP 0805, 064 (2008) 48. Drukker, N.: 1/4 BPS circular loops, unstable world-sheet instantons and the matrix model. JHEP 0609, 004 (2006) 49. Zarembo, K.: Supersymmetric Wilson loops. Nucl. Phys. B 643, 157 (2002) 50. Bak, D., Yun, S.: Thermal aspects of ABJM theory: Currents and condensations. Class. Quant. Grav. 27, 215011 (2010) 51. Balasubramanian, V., Kraus, P.: A stress tensor for anti-de Sitter gravity. Commun. Math. Phys. 208, 413 (1999) 52. Emparan, R., Johnson, C.V., Myers, R.C.: Surface terms as counterterms in the AdS/CFT correspondence. Phys. Rev. D 60, 104001 (1999) 53. Cagnazzo, A., Sorokin, D., Wulff, L.: String instanton in AdS4 × CP3 . JHEP 1005, 009 (2010) 54. Giombi, S., Pestun, V., Ricci, R.: Notes on supersymmetric Wilson loops on a two-sphere. http://arXiv. org/abs/0905.0665v2 [hep-th], 2009 55. Arsiwalla, X., Boels, R., Mariño, M., Sinkovics, A.: Phase transitions in q-deformed 2d Yang-Mills theory and topological strings. Phys. Rev. D 73, 026005 (2006) 56. Ambjorn, J., Chekhov, L., Kristjansen, C.F., Makeenko, Yu.: Matrix model calculations beyond the spherical limit. Nucl. Phys. B 404, 127 (1993) [Erratum-ibid. B 449, 681 (1995)] 57. Alday, L.F., Eden, B., Korchemsky, G.P., Maldacena, J., Sokatchev, E.: From correlation functions to Wilson loops. http://arXiv.org/abs/1007.3243v2 [hep-th], 2010 58. Klemm, A., Mariño, M., Rauch, M.: Direct integration and non-perturbative effects in matrix models. JHEP 1010, 004 (2010) 59. Eynard, B., Orantin, N.: Invariants of algebraic curves and topological expansion. http://arXiv.org/abs/ math-ph/0702045v4, 2007 60. 
Mariño, M.: Open string amplitudes and large order behavior in topological string theory. JHEP 0803, 060 (2008) 61. Bouchard, V., Klemm, A., Mariño, M., Pasquetti, S.: Remodeling the B-model. Commun. Math. Phys. 287, 117 (2009) 62. Huang, M.x., Klemm, A.: Holomorphicity and modularity in Seiberg-Witten theories with matter. http:// arXiv.org/abs/0902.1325v1 [hep-th], 2009 63. Eynard, B., Mariño, M., Orantin, N.: Holomorphic anomaly and matrix models. JHEP 0706, 058 (2007) 64. Mariño, M., Schiappa, R., Weiss, M.: Nonperturbative effects and the large-order behavior of matrix models and topological strings. Commun. Number Theor. Phys. 2, 349 (2008) 65. Pasquetti, S., Schiappa, R.: Borel and Stokes nonperturbative phenomena in topological string theory and c = 1 matrix models. http://arXiv.org/abs/0907.4082v2 [hep-th], 2010 66. Akemann, G.: Higher genus correlators for the Hermitian matrix model with multiple cuts. Nucl. Phys. B 482, 403 (1996) 67. Klemm, A., Mariño, M., Theisen, S.: Gravitational corrections in supersymmetric gauge theory and matrix models. JHEP 0303, 051 (2003) 68. Yamaguchi, S.: Wilson loops of anti-symmetric representation and D5-branes. JHEP 0605, 037 (2006) 69. Lunin, O.: 1/2-BPS states in M theory and defects in the dual CFTs. JHEP 0710, 014 (2007) 70. Hartnoll, S.A., Kumar, S.P.: Higher rank Wilson loops from a matrix model. JHEP 0608, 026 (2006) 71. Lin, H., Lunin, O., Maldacena, J.M.: Bubbling AdS space and 1/2 BPS geometries. JHEP 0410, 025 (2004) 72. Bars, I.: Supergroups and their representations. In: Application of Group Theory in Phys. and Math. Phys., Lectures Appl. Math. 21, Providence, RI: Amer. Math. Soc., 1985, p. 17 Communicated by A. Kapustin
Commun. Math. Phys. 306, 565–578 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1261-6
Communications in
Mathematical Physics
The Excitation Spectrum for Weakly Interacting Bosons Robert Seiringer Department of Mathematics and Statistics, McGill University, Burnside Hall, 805 Sherbrooke Street West, Montreal, Quebec H3A 2K6, Canada. E-mail:
[email protected] Received: 17 September 2010 / Accepted: 23 November 2010 Published online: 15 May 2011 – © The Author(s) 2011
Abstract: We investigate the low energy excitation spectrum of a Bose gas with weak, long range repulsive interactions. In particular, we prove that the Bogoliubov spectrum of elementary excitations with linear dispersion relation for small momentum becomes exact in the mean-field limit. 1. Introduction and Main Results Bogoliubov’s seminal 1947 paper [1] contains several important results concerning the low energy behavior of bosonic systems. Among its striking predictions is the fact that the excitation spectrum is made up of elementary excitations whose energy is linear in the momentum for small momentum. Bogoliubov’s method is based on various approximations and crucially uses a formalism on Fock space that does not conserve particle number. Mathematically, the validity of his method has so far only been established for the ground state energy of certain systems, see [5,9,13–15,18]. In particular, there are no rigorous results on the low energy excitation spectrum of interacting Bose gases, with the notable exception of exactly solvable models in one dimension [2,3,8,10,11,16,17]. In this article, we shall prove the validity of Bogoliubov’s approximation scheme for a Bose gas in arbitrary dimension in the mean-field (Hartree) limit, where the interaction strength is proportional to the inverse particle number, and its range extends over the whole system. In particular, we verify that the low energy excitation spectrum for such a system equals the sum of elementary excitations, as predicted by Bogoliubov. As a corollary, we observe that the lowest energy in the sector of total momentum P depends linearly on |P|, a property that is crucial for the superfluid behavior of the system. The mean-field limit has served as a convenient and instructive toy model for several aspects of bosonic systems over the years. We refer to [6,7] for a review and further references. We consider a homogeneous system of N ≥ 2 bosons on the flat unit torus Td , d ≥ 1. 
© 2011 by the author. This work may be reproduced, in its entirety, for non-commercial purposes.
The bosons interact with a weak two-body interaction which we write for convenience as $(N-1)^{-1} v(x)$. We assume that $v$ is bounded and periodic (with period one), and $v(-x) = v(x)$. We also assume that $v$ is of positive type, i.e., it has only non-negative Fourier coefficients. With $\Delta$ denoting the usual Laplacian on $\mathbb{T}^d$, the Hamiltonian equals
$$H_N = -\sum_{i=1}^{N} \Delta_i + \frac{1}{N-1} \sum_{i<j} v(x_i - x_j)$$
in suitable units. It acts on $L^2_{\rm sym}(\mathbb{T}^{dN})$, the permutation-symmetric square integrable functions of $N$ variables $x_i \in \mathbb{T}^d$. Let $E_0(N)$ denote the ground state energy of $H_N$. The Bogoliubov approximation [1,12] predicts that $E_0(N)$ is close to $\frac{1}{2} N \hat v(0) + E^{\rm Bog}$, where
$$E^{\rm Bog} = -\frac{1}{2} \sum_{p \neq 0} \left( |p|^2 + \hat v(p) - \sqrt{|p|^4 + 2|p|^2 \hat v(p)} \right). \tag{1}$$
The sum runs over $p \in (2\pi\mathbb{Z})^d$, and
$$\hat v(p) = \int_{\mathbb{T}^d} v(x) e^{-ipx}\, dx$$
are the Fourier coefficients of $v$. Note that the sum above converges, since the summands behave like $\hat v(p)^2/|p|^2$ for large $p$. More importantly, the Bogoliubov approximation predicts that the excitation spectrum of $H_N$ is made up of elementary excitations of momentum $p$ with corresponding energy
$$e_p = \sqrt{|p|^4 + 2|p|^2 \hat v(p)}. \tag{2}$$
One noteworthy feature of (2) is that it is linear in $p$ for small $p$, in contrast to the case when interactions are absent. Our main results can be summarized as follows:
Theorem 1. The ground state energy $E_0(N)$ of $H_N$ equals
$$E_0(N) = \frac{N}{2} \hat v(0) + E^{\rm Bog} + O(N^{-1/2}), \tag{3}$$
with $E^{\rm Bog}$ defined in (1). Moreover, the spectrum of $H_N - E_0(N)$ below an energy $\xi$ is equal to finite sums of the form
$$\sum_{p \in (2\pi\mathbb{Z})^d \setminus \{0\}} e_p n_p + O\left(\xi^{3/2} N^{-1/2}\right), \tag{4}$$
where $e_p$ is given in (2) and $n_p \in \{0, 1, 2, \dots\}$ for all $p \neq 0$.
The error term $O(N^{-1/2})$ in (3) refers to an expression that is bounded, in absolute value, by a constant times $N^{-1/2}$ for large $N$, where the constant depends only on the interaction potential $v$; likewise for the error term $O(\xi^{3/2} N^{-1/2})$ in (4). The dependence on $v$ is rather complicated but can be deduced from our proof, which gives explicit bounds. Keeping track of this dependence allows to draw conclusions about the spectrum for large $N$ even if $v$ depends on $N$.
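The quantities in (1) and (2) are easy to evaluate numerically. The following is a minimal sketch in dimension d = 1, assuming a hypothetical Gaussian choice of Fourier coefficients (the function `v_hat` and its parameters `g` and `width` are illustrative and not taken from the paper). It uses the algebraically equivalent form $|p|^2 + \hat v(p) - e_p = \hat v(p)^2 / (|p|^2 + \hat v(p) + e_p)$ of the summand to avoid cancellation at large $|p|$.

```python
import math

def v_hat(p, g=10.0, width=20.0):
    """Hypothetical Fourier coefficients: non-negative, even, rapidly decaying."""
    return g * math.exp(-(p / width) ** 2)

def e_p(p):
    """Elementary excitation energy from Eq. (2)."""
    return math.sqrt(p ** 4 + 2.0 * p ** 2 * v_hat(p))

def bog_summand(p):
    """|p|^2 + v_hat(p) - e_p, written as v_hat^2 / (|p|^2 + v_hat + e_p)."""
    vh = v_hat(p)
    return vh * vh / (p * p + vh + e_p(p))

def bogoliubov_energy(kmax=2000):
    """Truncation of Eq. (1) to p = 2*pi*k with 0 < |k| <= kmax (d = 1)."""
    total = 0.0
    for k in range(1, kmax + 1):
        p = 2.0 * math.pi * k
        total += 2.0 * bog_summand(p)  # p and -p contribute equally
    return -0.5 * total

E_bog = bogoliubov_energy()
```

For this choice of coefficients the summand decays very quickly, so the truncation at `kmax` is harmless; for a slowly decaying $\hat v$ the cut-off would need more care.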
In the case of a fixed, $N$-independent $v$, Theorem 1 implies that the Bogoliubov approximation becomes exact in the mean-field (Hartree) limit. As long as $\xi \ll N^{1/3}$, each individual excitation energy $\xi$ is of the form $\sum_p e_p n_p$ with error $o(1)$. Moreover, as long as $\xi \ll N$, it is of the form $\sum_p e_p n_p (1 + o(1))$, i.e., the error is small relative to the magnitude of the excitation energy. In other words, in the mean field limit the whole excitation spectrum with energy $\xi \ll N$ is given in terms of sums of elementary excitations. The condition $\xi \ll N$ can be expected to be optimal, since only under this condition a large fraction of the particles are guaranteed to occupy the zero momentum mode, one of the key assumptions in the Bogoliubov approximation. For excitation energies of the order $N$ and larger, the spectrum will not be composed of elementary excitations anymore but has a more complicated structure.
Our proof shows that for each value of the $\{n_p\}$ there exists exactly one eigenvalue of the form (4). Moreover, the eigenfunction corresponding to an eigenvalue with given $\{n_p\}$ has total momentum $\sum_p p\, n_p$. Given this fact we readily deduce the following corollary from Theorem 1.
Corollary 1. Let $E_P(N)$ denote the ground state energy of $H_N$ in the sector of total momentum $P$. We have
$$E_P(N) - E_0(N) = \min_{\{n_p\},\ \sum_p p\, n_p = P}\ \sum_p e_p n_p + O\left(|P|^{3/2} N^{-1/2}\right). \tag{5}$$
In particular,
$$E_P(N) - E_0(N) \geq |P| \min_{p \neq 0} \sqrt{2 \hat v(p) + |p|^2} + O\left(|P|^{3/2} N^{-1/2}\right).$$
The linear dependence of E P (N ) on |P| is of crucial importance for the superfluid behavior of the Bose gas; see, e.g., the detailed discussion in [4]. It is a result of the interactions among the particles. The expression (5) differs markedly from the corresponding result for an ideal, non-interacting gas, especially if v (0) is large.
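The minimization in (5) can be explored numerically in d = 1. The sketch below is only an illustration under stated assumptions: the Fourier coefficients `v_hat` are a hypothetical Gaussian choice, momenta are restricted to $p = 2\pi k$ with $0 < |k| \le$ `kmax`, and the minimum is found with a shortest-path search over the total momentum, where each step adds one elementary excitation.

```python
import heapq
import math

def v_hat(p, g=10.0, width=20.0):
    """Hypothetical non-negative, even Fourier coefficients."""
    return g * math.exp(-(p / width) ** 2)

def e_p(p):
    """Elementary excitation energy e_p = |p| sqrt(|p|^2 + 2 v_hat(p))."""
    return abs(p) * math.sqrt(p * p + 2.0 * v_hat(p))

def min_excitation_energy(K, kmax=8):
    """Smallest sum_k e(2 pi k) n_k subject to sum_k k n_k = K (Dijkstra search)."""
    bound = abs(K) + kmax  # restrict intermediate momenta to a finite window
    steps = [k for k in range(-kmax, kmax + 1) if k != 0]
    cost = {k: e_p(2.0 * math.pi * k) for k in steps}
    best = {0: 0.0}
    heap = [(0.0, 0)]
    while heap:
        energy, m = heapq.heappop(heap)
        if m == K:
            return energy
        if energy > best.get(m, math.inf):
            continue  # stale heap entry
        for k in steps:
            n = m + k
            if abs(n) > bound:
                continue
            cand = energy + cost[k]
            if cand < best.get(n, math.inf):
                best[n] = cand
                heapq.heappush(heap, (cand, n))
    return math.inf
```

For every configuration the search can produce, the energy is bounded below by $|P|$ times the minimum of $\sqrt{2\hat v(p) + |p|^2}$ over the momenta used, which is the linear lower bound stated after (5) without its error term.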
Finally, we note that under the unitary transformation $U = \exp\left(-iq \cdot \sum_{j=1}^{N} x_j\right)$, $q \in (2\pi\mathbb{Z})^d$, the Hamiltonian $H_N$ transforms as
$$U^\dagger H_N U = H_N + N|q|^2 - 2 q \cdot P,$$
where $P = -i \sum_{j=1}^{N} \nabla_j$ denotes the total momentum operator. Hence our results apply equally also to the parts of the spectrum of $H_N$ with excitation energies close to $N|q|^2$, corresponding to collective excitations where the particles move uniformly with momentum $q$.
The remainder of this paper is devoted to the proof of Theorem 1. The main strategy is to compare $H_N$ with Bogoliubov's approximate Hamiltonian. The latter has to be suitably modified to take particle number conservation into account. By adding a suitable constant, we may assume that $v \geq 0$ without loss of generality.
2. Preliminaries
We denote by $P$ the projection onto the constant function in $L^2(\mathbb{T}^d)$, and $Q = 1 - P$. The operator that counts the number of particles outside the zero momentum mode will be denoted by $N^>$, i.e.,
$$N^> = \sum_{i=1}^{N} Q_i.$$
We shall also use the symbol $T$ for the kinetic energy $T = -\sum_{i=1}^{N} \Delta_i$.
Lemma 1 below gives simple upper and lower bounds on the ground state energy $E_0(N)$ of $H_N$, as well as an upper bound on the expectation value of $T$ in a low energy state.
Lemma 1. The ground state energy of $H_N$ satisfies the bounds
$$0 \geq E_0(N) - \frac{N}{2} \hat v(0) \geq -\frac{N}{2(N-1)} \left( v(0) - \hat v(0) \right). \tag{6}$$
Moreover, in any $N$-particle state $\Psi$ with $\langle \Psi | H_N | \Psi \rangle \leq \frac{N}{2} \hat v(0) + \mu$ we have
$$(2\pi)^2 \langle \Psi | N^> | \Psi \rangle \leq \langle \Psi | T | \Psi \rangle \leq \frac{N}{2(N-1)} \left( v(0) - \hat v(0) \right) + \mu. \tag{7}$$
Proof. The upper bound to the ground state energy follows from using a constant trial function. Since $\hat v \geq 0$, we have
$$\sum_{p \in (2\pi\mathbb{Z})^d \setminus \{0\}} \hat v(p) \left| \sum_{j=1}^{N} e^{i p x_j} \right|^2 \geq 0.$$
This inequality can be rewritten as
$$\sum_{1 \leq i < j \leq N} v(x_i - x_j) \geq \frac{N^2}{2} \hat v(0) - \frac{N}{2} v(0), \tag{8}$$
and thus
$$H_N \geq \frac{N}{2} \hat v(0) + T - \frac{N}{2(N-1)} \left( v(0) - \hat v(0) \right).$$
The rest follows easily.
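Inequality (8) holds pointwise for every configuration of the particles, which makes it easy to test. The following is a minimal sketch in d = 1, assuming a hypothetical potential with finitely many non-negative Fourier coefficients (`v_hat` with cut-off $|k| \le 3$ is an illustrative choice, not one made in the paper).

```python
import math
import random

KMAX = 3  # hypothetical cut-off on the Fourier modes p = 2*pi*k

def v_hat(k, g=3.0):
    """Hypothetical non-negative, even Fourier coefficients."""
    return g / (1 + k * k) if abs(k) <= KMAX else 0.0

def v(x):
    """v(x) = sum_p v_hat(p) e^{ipx}; real and even since v_hat(-p) = v_hat(p)."""
    return sum(v_hat(k) * math.cos(2.0 * math.pi * k * x) for k in range(-KMAX, KMAX + 1))

def pair_energy(xs):
    """Left side of (8): sum over pairs i < j of v(x_i - x_j)."""
    n = len(xs)
    return sum(v(xs[i] - xs[j]) for i in range(n) for j in range(i + 1, n))

def lower_bound(n):
    """Right side of (8): N^2 v_hat(0)/2 - N v(0)/2."""
    return 0.5 * n * n * v_hat(0) - 0.5 * n * v(0.0)
```

Calling `pair_energy` on random points of the unit interval and comparing with `lower_bound(len(xs))` illustrates that the bound is never violated, up to rounding.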
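In the following, we shall also need a bound on the expectation value of higher powers of $N^>$. More precisely, we shall use the following lemma.
Lemma 2. Let $\Psi$ be an $N$-particle wave function in the spectral subspace of $H_N$ corresponding to energy $E \leq E_0(N) + \mu$. Then
$$(2\pi)^2 \langle \Psi | N^> T | \Psi \rangle \leq \left( v(0) + \frac{\mu}{2} \right)^2 + \frac{N}{2(N-1)} \left( \mu + 3 v(0) + \hat v(0) \right) \left( 2\mu + v(0) - \hat v(0) \right).$$
In particular, $\langle \Psi | N^> T | \Psi \rangle$ is bounded above by an expression depending only on $\mu$ and $v$ but not on $N$.
Proof. Since $\Psi$ is permutation symmetric,
$$\langle \Psi | N^> T | \Psi \rangle = N \langle \Psi | Q_1 S | \Psi \rangle + \left\langle \Psi \left| N^> \left( H_N - E_0(N) - \tfrac{1}{2}\mu \right) \right| \Psi \right\rangle,$$
where $S = E_0(N) + \frac{1}{2}\mu - (N-1)^{-1} \sum_{i<j} v(x_i - x_j)$. Using Schwarz's inequality, the last term can be bounded as
$$\left\langle \Psi \left| N^> \left( H_N - E_0(N) - \tfrac{1}{2}\mu \right) \right| \Psi \right\rangle \leq \frac{\mu}{2} \left\langle \Psi \left| (N^>)^2 \right| \Psi \right\rangle^{1/2}.$$
We split $S$ into two parts, $S = S_a + S_b$, with
$$S_a = E_0(N) + \frac{\mu}{2} - \frac{1}{N-1} \sum_{2 \leq i < j \leq N} v(x_i - x_j)$$
and
$$S_b = -\frac{1}{N-1} \sum_{j=2}^{N} v(x_1 - x_j).$$
Note that $S_a$ does not depend on $x_1$. Using the positivity of $\hat v(p)$ as in (8), but with $N$ replaced by $N - 1$, as well as the upper bound on $E_0(N)$ in (6), we see that
$$S_a \leq \frac{1}{2} \left( \mu + \hat v(0) + v(0) \right).$$
In particular, this implies that
$$N \langle \Psi | Q_1 S_a | \Psi \rangle \leq \frac{1}{2} \left( \mu + \hat v(0) + v(0) \right) \langle \Psi | N^> | \Psi \rangle.$$
To bound the contribution of $S_b$, we use
$$-\langle \Psi | Q_1 S_b | \Psi \rangle = \langle \Psi | Q_1 v(x_1 - x_2) | \Psi \rangle = \langle \Psi | Q_1 Q_2 v(x_1 - x_2) | \Psi \rangle + \langle \Psi | Q_1 P_2 v(x_1 - x_2) P_2 | \Psi \rangle + \langle \Psi | Q_1 P_2 v(x_1 - x_2) Q_2 | \Psi \rangle.$$
The second term on the right side is positive. For the first and the third, we use Schwarz's inequality and $\|v\|_\infty = v(0)$ to conclude
$$\langle \Psi | Q_1 S_b | \Psi \rangle \leq v(0) \langle \Psi | Q_1 Q_2 | \Psi \rangle^{1/2} + v(0) \langle \Psi | Q_1 | \Psi \rangle.$$
Since $N^2 \langle \Psi | Q_1 Q_2 | \Psi \rangle \leq \langle \Psi | (N^>)^2 | \Psi \rangle$, we have thus shown that
$$\langle \Psi | N^> T | \Psi \rangle \leq \frac{1}{2} \left( \mu + \hat v(0) + 3 v(0) \right) \langle \Psi | N^> | \Psi \rangle + \left( v(0) + \frac{1}{2}\mu \right) \left\langle \Psi \left| (N^>)^2 \right| \Psi \right\rangle^{1/2}.$$
Using $N^> \leq (2\pi)^{-2} T$ this yields
$$\langle \Psi | N^> T | \Psi \rangle \leq \left( \frac{v(0) + \frac{1}{2}\mu}{2\pi} \right)^2 + \left( \mu + 3 v(0) + \hat v(0) \right) \langle \Psi | N^> | \Psi \rangle.$$
The result then follows from Lemma 1.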
3. The Bogoliubov Hamiltonian
The main strategy in the proof of Theorem 1 is to compare the Hamiltonian $H_N$ with the Bogoliubov Hamiltonian. We will use a slightly modified version of it, which is, in particular, particle number conserving. Let $a_p$ and $a_p^\dagger$ denote the usual creation and annihilation operators on Fock space for a particle with momentum $p \in (2\pi\mathbb{Z})^d$, satisfying the canonical commutation relations $[a_p, a_q^\dagger] = \delta_{pq}$. For $p \neq 0$, let
$$b_p = \frac{a_p a_0^\dagger}{\sqrt{N-1}} \tag{9}$$
and define the Bogoliubov Hamiltonian
$$H^{\rm Bog} = \sum_{p \neq 0} \left( |p|^2 b_p^\dagger b_p + \frac{1}{2} \hat v(p) \left( 2 b_p^\dagger b_p + b_p^\dagger b_{-p}^\dagger + b_p b_{-p} \right) \right). \tag{10}$$
Note that $H^{\rm Bog}$ conserves particle number. We are interested in $H^{\rm Bog}$ in the sector of exactly $N$ particles. Note also that $\hat v(-p) = \hat v(p)$ for all $p \in (2\pi\mathbb{Z})^d$. Let
$$A_p = |p|^2 + \hat v(p) \quad \text{and} \quad B_p = \hat v(p).$$
A simple computation (compare with [13, Thm. 6.3]) shows that
$$A_p \left( b_p^\dagger b_p + b_{-p}^\dagger b_{-p} \right) + B_p \left( b_p^\dagger b_{-p}^\dagger + b_p b_{-p} \right) = \sqrt{A_p^2 - B_p^2} \left( \frac{\left( b_p^\dagger + \alpha_p b_{-p} \right) \left( b_p + \alpha_p b_{-p}^\dagger \right)}{1 - \alpha_p^2} + \frac{\left( b_{-p}^\dagger + \alpha_p b_p \right) \left( b_{-p} + \alpha_p b_p^\dagger \right)}{1 - \alpha_p^2} \right) - \frac{1}{2} \left( A_p - \sqrt{A_p^2 - B_p^2} \right) \left( [b_p, b_p^\dagger] + [b_{-p}, b_{-p}^\dagger] \right),$$
where
$$\alpha_p = \frac{1}{B_p} \left( A_p - \sqrt{A_p^2 - B_p^2} \right) \ \text{if } B_p > 0, \qquad \alpha_p = 0 \ \text{if } B_p = 0.$$
Note that $0 \leq \alpha_p \leq \hat v(p) / (|p|^2 + \hat v(p))$. In particular, $\sup_{p \neq 0} \alpha_p < 1$, and $\alpha_p \sim \hat v(p)/|p|^2$ for large $p$. Define further
$$c_p = \frac{b_p + \alpha_p b_{-p}^\dagger}{\sqrt{1 - \alpha_p^2}}$$
for $p \neq 0$. What the above calculation shows is that
$$H^{\rm Bog} = -\frac{1}{2} \sum_{p \neq 0} \left( A_p - \sqrt{A_p^2 - B_p^2} \right) \frac{[b_p, b_p^\dagger] + [b_{-p}, b_{-p}^\dagger]}{2} + \sum_{p \neq 0} e_p c_p^\dagger c_p \tag{11}$$
with $e_p$ defined in (2). The commutators equal
$$[b_p, b_p^\dagger] = \frac{a_0^\dagger a_0 - a_p^\dagger a_p}{N-1} \leq \frac{N}{N-1}. \tag{12}$$
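The scalar identities behind this diagonalization can be checked numerically for a single momentum. The sketch below is an illustration only; the function name `bogoliubov_coefficients` and its arguments are hypothetical. It uses $\alpha_p = B_p / (A_p + \sqrt{A_p^2 - B_p^2})$, which equals the expression for $\alpha_p$ above whenever $B_p > 0$ and avoids cancellation, together with the angle defined by $\tanh(2\beta_p) = \alpha_p$ that enters the unitary transformation used in the next section.

```python
import math

def bogoliubov_coefficients(p2, vh):
    """Return (A, B, e, alpha, beta) for |p|^2 = p2 > 0 and v_hat(p) = vh >= 0."""
    A = p2 + vh
    B = vh
    e = math.sqrt(A * A - B * B)      # equals sqrt(p2^2 + 2 p2 vh), i.e. e_p in (2)
    alpha = B / (A + e)               # same as (A - e) / B when B > 0, and 0 when B = 0
    beta = 0.5 * math.atanh(alpha)    # tanh(2 beta) = alpha
    return A, B, e, alpha, beta

A, B, e, alpha, beta = bogoliubov_coefficients(p2=(2.0 * math.pi) ** 2, vh=5.0)
```

With these values one can confirm $A_p = e_p (1 + \alpha_p^2)/(1 - \alpha_p^2)$ and $B_p = 2 e_p \alpha_p / (1 - \alpha_p^2)$, which are the two relations that make the quadratic form in $b_p$, $b_{-p}$ diagonal in terms of $c_p$.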
4. Proof of Theorem 1: Lower Bound
Recall that $P$ denotes the projection onto the constant function in $L^2(\mathbb{T}^d)$, and $Q = 1 - P$. Denote the two-particle multiplication operator $v(x_1 - x_2)$ by $v$ for short. Using translation invariance and the Schwarz inequality
$$(P \otimes Q + Q \otimes P) v\, Q \otimes Q + Q \otimes Q\, v (P \otimes Q + Q \otimes P) \geq -\varepsilon (P \otimes Q + Q \otimes P) v (P \otimes Q + Q \otimes P) - \varepsilon^{-1} Q \otimes Q\, v\, Q \otimes Q$$
(which follows from positivity of $v$) we conclude that
$$v \geq P \otimes P\, v\, P \otimes P + P \otimes P\, v\, Q \otimes Q + Q \otimes Q\, v\, P \otimes P + (1 - \varepsilon)(P \otimes Q + Q \otimes P) v (P \otimes Q + Q \otimes P) - \varepsilon^{-1} Q \otimes Q\, v\, Q \otimes Q \tag{13}$$
for any $\varepsilon > 0$. The last term can be bounded by $v(0)\, Q \otimes Q$. In second quantized language, this means that $H_N$ is bounded from below by the restriction of
$$\frac{\hat v(0)}{2(N-1)} \left[ N(N-1) - 2\varepsilon (N - N^>) N^> - N^> (N^> - 1) \right] + \sum_{p \neq 0} \left( |p|^2 a_p^\dagger a_p + \frac{\hat v(p)}{2} \left[ 2(1 - \varepsilon) b_p^\dagger b_p + b_p^\dagger b_{-p}^\dagger + b_p b_{-p} \right] \right) - \frac{N^> (N^> - 1) v(0)}{2 \varepsilon (N-1)}$$
to the $N$-particle sector. Here, we use again the definition (9) of $b_p$. From now on, we shall work with operators on Fock space, but it is always understood that we are only concerned with the sector of $N$ particles. Next, we observe that
$$a_p^\dagger a_p \geq \frac{N-1}{N} b_p^\dagger b_p.$$
Moreover,
$$\sum_{p \neq 0} \hat v(p) b_p^\dagger b_p \leq \hat v(0) \frac{N}{N-1} N^>,$$
and hence
$$H_N \geq \frac{N}{2} \hat v(0) + H^{\rm Bog} - E_\varepsilon,$$
where the Bogoliubov Hamiltonian $H^{\rm Bog}$ was defined in (10) and
$$E_\varepsilon = \frac{1}{N-1} T + \frac{N^> (N^> - 1)}{2(N-1)} \left( \hat v(0) + \frac{v(0)}{\varepsilon} \right) + \varepsilon\, \hat v(0)\, \frac{2N-1}{N-1} N^>. \tag{14}$$
In particular, using (11) and (12),
$$H_N + E_\varepsilon \geq \frac{N}{2} \hat v(0) + \frac{N}{N-1} E^{\rm Bog} + \sum_{p \neq 0} e_p c_p^\dagger c_p, \tag{15}$$
where $E^{\rm Bog}$ is the Bogoliubov energy defined in (1). The last term on the right side of (15) is positive and can be dropped for a lower bound on the ground state energy of $H_N$. For the choice $\varepsilon = O(N^{-1/2})$, the expected value of $E_\varepsilon$ in the ground state of $H_N$ is bounded above by $O(N^{-1/2})$, as the bounds in Lemma 1 and 2 show. This proves the desired lower bound on $E_0(N)$.
To obtain lower bounds on excited eigenvalues, it remains to investigate the positive last term in (15). We do this via a unitary transformation. Let $U = e^X$, where
$$X = \sum_{p \neq 0} \beta_p \left( b_p^\dagger b_{-p}^\dagger - b_p b_{-p} \right)$$
with $\beta_p \geq 0$ determined by $\tanh(2\beta_p) = \alpha_p$. Note that $X$ is anti-hermitian and hence $U$ is unitary. If $a_0$ and $a_0^\dagger$ were replaced by $\sqrt{N-1}$, $U$ would be the usual Bogoliubov transformation. Our modified $U$ has the advantage of being particle number conserving, however. The price to pay for this modification is that $U^\dagger a_q U$ can not be calculated anymore so easily. A second order Taylor expansion yields
$$e^{-tX} a_q e^{tX} = a_q - t [X, a_q] + \int_0^t (t - s)\, e^{-sX} [X, [X, a_q]] e^{sX}\, ds$$
for any $t > 0$. We compute
$$[X, a_q] = -\frac{2}{N-1} \beta_q a_{-q}^\dagger a_0^2$$
and
$$[X, [X, a_q]] = 4 \beta_q^2 a_q \frac{a_0^2 (a_0^\dagger)^2}{(N-1)^2} - 4 \beta_q \left( \sum_{p \neq 0} \beta_p a_p a_{-p} \right) a_{-q}^\dagger \frac{2 a_0^\dagger a_0 + 1}{(N-1)^2} =: 4 \beta_q^2 a_q + J_q. \tag{16}$$
For any $t > 0$ we thus have
$$e^{-tX} a_q e^{tX} = a_q + \frac{2t}{N-1} \beta_q a_{-q}^\dagger a_0^2 + \int_0^t (t - s)\, e^{-sX} \left( 4 \beta_q^2 a_q + J_q \right) e^{sX}\, ds.$$
Iterating this identity leads to
$$U^\dagger a_q U = \cosh(2\beta_q) a_q + \sinh(2\beta_q) a_{-q}^\dagger \frac{a_0^2}{N-1} + K_q =: d_q + K_q$$
with
$$K_q = \int_0^1 e^{-sX} J_q e^{sX}\, \frac{\sinh(2\beta_q (1 - s))}{2\beta_q}\, ds.$$
In particular, we see that
$$U^\dagger a_q^\dagger a_q U = d_q^\dagger d_q + K_q^\dagger K_q + d_q^\dagger K_q + K_q^\dagger d_q \leq (1 + \lambda) d_q^\dagger d_q + (1 + \lambda^{-1}) K_q^\dagger K_q \tag{17}$$
for any $\lambda > 0$. We further have
$$d_p^\dagger d_p = c_p^\dagger c_p - \frac{a_p^\dagger a_p}{1 - \alpha_p^2} \left( \frac{a_0 a_0^\dagger}{N-1} - 1 \right) - \frac{\alpha_p^2\, a_{-p} a_{-p}^\dagger}{1 - \alpha_p^2} \left( \frac{a_0^\dagger a_0}{N-1} - \frac{(a_0^\dagger)^2 a_0^2}{(N-1)^2} \right) \leq c_p^\dagger c_p + \frac{a_p^\dagger a_p}{1 - \alpha_p^2} \frac{N^>}{N}. \tag{18}$$
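Using Schwarz's inequality, we can bound $K_q^\dagger K_q$ as
$$K_q^\dagger K_q \leq \frac{\cosh(2\beta_q) - 1}{(2\beta_q)^2} \int_0^1 e^{-sX} J_q^\dagger J_q e^{sX}\, \frac{\sinh(2\beta_q (1 - s))}{2\beta_q}\, ds. \tag{19}$$
To get an upper bound on $J_q^\dagger J_q$, we write $J_q$ as the sum of two terms, $J_q^{(1)} + J_q^{(2)}$, where $J_q^{(2)}$ is the second term on the right side in the first line of (16), and $J_q^{(1)} = 4 \beta_q^2 a_q \left( (N-1)^{-2} a_0^2 (a_0^\dagger)^2 - 1 \right)$. Using $J_q^\dagger J_q \leq 2 J_q^{(1)\dagger} J_q^{(1)} + 2 J_q^{(2)\dagger} J_q^{(2)}$ as well as
$$\left( \sum_{p \neq 0} \beta_p a_p^\dagger a_{-p}^\dagger \right) \left( \sum_{q \neq 0} \beta_q a_q a_{-q} \right) \leq \left( \sum_{p \neq 0} \beta_p^2 \right) N^> (N^> - 1),$$
we obtain the bound
$$J_q^\dagger J_q \leq C_1 \beta_q^2 \frac{(N^> + 1)^2}{N-1}, \qquad C_1 = 64 \frac{N}{N-1} \left( \sup_{q \neq 0} \beta_q^2 + \sum_{p \neq 0} \beta_p^2 \right). \tag{20}$$
To proceed, we need an upper bound on $e^{-sX} (N^> + 1)^2 e^{sX}$ for $0 \leq s \leq 1$. For this purpose, let us compute
$$[X, N^>] = -2 \sum_{q \neq 0} \beta_q \left( b_q^\dagger b_{-q}^\dagger + b_q b_{-q} \right).$$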
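We have
$$[X, N^>]^2 \leq 8 \left( \sum_{p \neq 0} \beta_p b_p^\dagger b_{-p}^\dagger \right) \left( \sum_{q \neq 0} \beta_q b_q b_{-q} \right) + 8 \left( \sum_{p \neq 0} \beta_p b_p b_{-p} \right) \left( \sum_{q \neq 0} \beta_q b_q^\dagger b_{-q}^\dagger \right)$$
$$\leq \frac{8}{(N-1)^2} \left( \sum_p \beta_p^2 \right) \left[ N^> (N^> - 1)(N + 1 - N^>)(N + 2 - N^>) + (N^> + 1)(N^> + 2)(N - N^>)(N - 1 - N^>) \right]$$
$$\leq 16 \left( \sum_p \beta_p^2 \right) \left( \frac{N}{N-1} \right)^2 \left( N^> + 1 \right)^2. \tag{21}$$
With the aid of Schwarz's inequality, we obtain
$$[X, (N^> + 1)^2] = (N^> + 1) [X, N^>] + [X, N^>] (N^> + 1) \geq -\eta (N^> + 1)^2 - \eta^{-1} [X, N^>]^2$$
for any $\eta > 0$. In particular,
$$[X, (N^> + 1)^2] \geq -C_2 (N^> + 1)^2 \quad \text{with} \quad C_2 = 8 \frac{N}{N-1} \sqrt{\sum_{q \neq 0} \beta_q^2}.$$
We conclude that
$$e^{-tX} (N^> + 1)^2 e^{tX} \leq (N^> + 1)^2 + C_2 \int_0^t e^{-sX} (N^> + 1)^2 e^{sX}\, ds$$
for any $t > 0$. Iterating this bound gives
$$e^{-tX} (N^> + 1)^2 e^{tX} \leq e^{t C_2} (N^> + 1)^2. \tag{22}$$
In combination, (19), (20) and (22) yield the bound
$$K_q^\dagger K_q \leq C_1 e^{C_2}\, \frac{(\cosh(2\beta_q) - 1)^2}{16 \beta_q^2}\, \frac{(N^> + 1)^2}{N-1}. \tag{23}$$
By combining (17) with (18) and (23), we obtain
$$\sum_p e_p c_p^\dagger c_p \geq \frac{1}{1 + \lambda} U^\dagger \left( \sum_p e_p a_p^\dagger a_p \right) U - \frac{N^> T}{N} \sup_{p \neq 0} \frac{e_p}{|p|^2 (1 - \alpha_p^2)} - (1 + \lambda^{-1})\, C_1 e^{C_2}\, \frac{(N^> + 1)^2}{N-1} \sum_q e_q \frac{(\cosh(2\beta_q) - 1)^2}{16 \beta_q^2}. \tag{24}$$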
Note that $|q|^2 \beta_q^2 \sim \hat v(q)^2/|q|^2$ for large $q$, hence the last sum is finite. Applying this bound to (15), we have thus shown that
$$H_N + E_{\varepsilon,\lambda} \geq \frac{N}{2} \hat v(0) + \frac{N}{N-1} E^{\rm Bog} + \frac{1}{1 + \lambda} U^\dagger \left( \sum_p e_p a_p^\dagger a_p \right) U,$$
where
$$E_{\varepsilon,\lambda} = E_\varepsilon + \frac{N^> T}{N} \sup_{p \neq 0} \frac{e_p}{|p|^2 (1 - \alpha_p^2)} + (1 + \lambda^{-1})\, C_1 e^{C_2}\, \frac{(N^> + 1)^2}{N-1} \sum_q e_q \frac{(\cosh(2\beta_q) - 1)^2}{16 \beta_q^2},$$
with $E_\varepsilon$ defined in (14). The desired lower bound now follows easily from the min-max principle, and the fact that the spectrum of $\sum_p e_p a_p^\dagger a_p$ equals $\sum_p e_p n_p$, with $n_p \in \{0, 1, 2, \dots\}$ for all $p \in (2\pi\mathbb{Z})^d$. In fact, for any function $\Psi$ in the spectral subspace of $H_N$ corresponding to energy $E \leq E_0(N) + \xi$, we have
$$\langle \Psi | E_{\varepsilon,\lambda} | \Psi \rangle \leq O\left( (\varepsilon + N^{-1})\, \xi + \xi^2 N^{-1} \left( \varepsilon^{-1} + \lambda^{-1} \right) \right)$$
according to Lemmas 1 and 2. The choice $\varepsilon = O(\sqrt{\xi/N}) = \lambda$ then leads to the conclusion that the spectrum of $H_N$ below an energy $E_0(N) + \xi$ is bounded from below by the corresponding spectrum of
$$\frac{N}{2} \hat v(0) + E^{\rm Bog} + \sum_{p \neq 0} e_p a_p^\dagger a_p - O\left( \xi^{3/2} N^{-1/2} \right).$$
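This completes the proof of the lower bound.
5. Proof of Theorem 1: Upper Bound
We proceed in essentially the same way as in the lower bound. In analogy to (13), we have
$$v \leq P \otimes P\, v\, P \otimes P + P \otimes P\, v\, Q \otimes Q + Q \otimes Q\, v\, P \otimes P + (1 + \varepsilon)(P \otimes Q + Q \otimes P) v (P \otimes Q + Q \otimes P) + (1 + \varepsilon^{-1}) Q \otimes Q\, v\, Q \otimes Q$$
for any $\varepsilon > 0$. Together with
$$b_p^\dagger b_p \geq a_p^\dagger a_p \left( 1 - \frac{N^>}{N} \right)$$
this implies the upper bound
$$H_N \leq \frac{N}{2} \hat v(0) + H^{\rm Bog} + F_\varepsilon,$$
where
$$F_\varepsilon = \frac{N^>}{N} T + \varepsilon\, \hat v(0)\, \frac{2N-1}{N-1} N^> + (1 + \varepsilon^{-1}) \frac{N^> (N^> - 1) v(0)}{2(N-1)}. \tag{25}$$
From a lower bound to the commutator (12), namely
$$[b_p, b_p^\dagger] + [b_{-p}, b_{-p}^\dagger] = \frac{2 a_0^\dagger a_0 - a_p^\dagger a_p - a_{-p}^\dagger a_{-p}}{N-1} \geq 2 - \frac{3 N^>}{N-1},$$
we get
$$H^{\rm Bog} \leq E^{\rm Bog} \left( 1 - \frac{3 N^>}{2(N-1)} \right) + \sum_{p \neq 0} e_p c_p^\dagger c_p. \tag{26}$$
Finally, to investigate the last term on the right side of this expression, we proceed as in (17)–(24), with the obvious modifications to get an upper bound instead of a lower bound. In replacement of (18) we use
$$d_p^\dagger d_p \geq c_p^\dagger c_p - \frac{a_p^\dagger a_p}{1 - \alpha_p^2} \frac{1}{N-1} - \frac{\alpha_p^2}{1 - \alpha_p^2} \frac{N^> (N^> + 1)}{N-1}.$$
The result is
$$\sum_p e_p c_p^\dagger c_p \leq \frac{1}{1 - \lambda} U^\dagger \left( \sum_p e_p a_p^\dagger a_p \right) U + \frac{N^> (N^> + 1)}{N-1} \left( \sum_{p \neq 0} \frac{e_p \alpha_p^2}{1 - \alpha_p^2} \right) + \frac{T}{N-1} \sup_{p \neq 0} \frac{e_p}{|p|^2 (1 - \alpha_p^2)} + \lambda^{-1} C_1 e^{C_2}\, \frac{(N^> + 1)^2}{N-1} \sum_q e_q \frac{(\cosh(2\beta_q) - 1)^2}{16 \beta_q^2} \tag{27}$$
for any $\lambda > 0$. Altogether, this shows that
$$H_N \leq \frac{N}{2} \hat v(0) + E^{\rm Bog} + \frac{1}{1 - \lambda} U^\dagger \left( \sum_p e_p a_p^\dagger a_p \right) U + F_{\varepsilon,\lambda},$$
with $F_{\varepsilon,\lambda}$ given by the sum of $F_\varepsilon$ in (25), $\frac{3}{2} N^> (N-1)^{-1} |E^{\rm Bog}|$ from (26) and the last three terms in (27). To complete the upper bound, we need a bound on $U F_{\varepsilon,\lambda} U^\dagger$. For this purpose, we find it convenient to first bound $F_{\varepsilon,\lambda}$ by
$$F_{\varepsilon,\lambda} \leq C_3 \left( \left( \varepsilon^{-1} + \lambda^{-1} \right) \frac{(T + 1)^2}{N} + \left( \varepsilon + N^{-1} \right) T \right)$$
for an appropriate constant $C_3 > 0$. What remains to be shown is that
$$U (T + 1)^2 U^\dagger \leq e^{C_4} (T + 1)^2 \tag{28}$$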
for some constant $C_4 > 0$. Given (28), we obtain
$$U H_N U^\dagger \leq \frac{N}{2} \hat v(0) + E^{\rm Bog} + \frac{1}{1 - \lambda} \sum_{p \neq 0} e_p a_p^\dagger a_p + C_3 e^{C_4} \left( \left( \varepsilon^{-1} + \lambda^{-1} \right) \frac{(T + 1)^2}{N} + \left( \varepsilon + N^{-1} \right) T \right). \tag{29}$$
The spectrum of the operator on the right side of this inequality has exactly the desired form. Given an eigenvalue of $\sum_{p \neq 0} e_p a_p^\dagger a_p$ with value $\xi$, we choose $\varepsilon = O(\sqrt{\xi/N}) = \lambda$ to obtain $\frac{N}{2} \hat v(0) + E^{\rm Bog} + \xi + O(\xi^{3/2} N^{-1/2})$ for the right side of (29). This gives the desired upper bound.
It remains to prove (28). This can be done in essentially the same way as in the proof of (22). In fact,
$$[X, T] = -2 \sum_q \beta_q |q|^2 \left( b_q^\dagger b_{-q}^\dagger + b_q b_{-q} \right)$$
and hence, similarly to (21),
$$[X, T]^2 \leq 16 \left( \sum_q |q|^4 \beta_q^2 \right) \left( \frac{N}{N-1} \right)^2 \left( N^> + 1 \right)^2.$$
Recall that $\beta_q^2 \sim \hat v(q)^2/|q|^4$ for large $q$, which implies the finiteness of the sum since $v$ is bounded by assumption (and hence, in particular, square integrable on $\mathbb{T}^d$). By Schwarz, $[X, (T + 1)^2] \leq \eta (T + 1)^2 + \eta^{-1} [X, T]^2$ for any $\eta > 0$. In particular, $[X, (T + 1)^2] \leq C_4 (T + 1)^2$ for some $C_4 > 0$. We conclude that
$$e^{tX} (T + 1)^2 e^{-tX} \leq (T + 1)^2 + C_4 \int_0^t e^{sX} (T + 1)^2 e^{-sX}\, ds$$
for any $t > 0$. Iterating this bound yields (28). This completes the proof of the upper bound, and hence the proof of Theorem 1.
Acknowledgements. It is a pleasure to thank J. Fröhlich for inspiring discussions and for drawing my attention towards studying the mean-field limit. Financial support by the U.S. National Science Foundation grant PHY-0845292 is gratefully acknowledged.
References
1. Bogoliubov, N.N.: On the theory of superfluidity. J. Phys. (U.S.S.R.) 11, 23–32 (1947)
2. Calogero, F.: Ground State of a One-Dimensional N-Body System. J. Math. Phys. 10, 2197–2200 (1969)
3. Calogero, F.: Solution of the One-Dimensional N-Body Problems with Quadratic and/or Inversely Quadratic Pair Potentials. J. Math. Phys. 12, 419–436 (1971)
4. Cornean, H.D., Dereziński, J., Ziń, P.: On the infimum of the energy-momentum spectrum of a homogeneous Bose gas. J. Math. Phys. 50, 062103 (2009)
5. Erdős, L., Schlein, B., Yau, H.-T.: Ground-state energy of a low-density Bose gas: A second-order upper bound. Phys. Rev. A 78, 053627 (2008)
6. Fröhlich, J., Lenzmann, E.: Mean-Field Limit of Quantum Bose Gases and Nonlinear Hartree equation. Séminaire É. D. P. XVIII, 26 p. Ecole Polytechnique, Palaiseau (2004)
7. Fröhlich, J., Knowles, A., Schwarz, S.: On the Mean-Field Limit of Bosons with Coulomb Two-Body Interaction. Commun. Math. Phys. 288, 1023–1059 (2009)
8. Girardeau, M.: Relationship between Systems of Impenetrable Bosons and Fermions in One Dimension. J. Math. Phys. 1, 516–523 (1960)
9. Giuliani, A., Seiringer, R.: The Ground State Energy of the Weakly Interacting Bose Gas at High Density. J. Stat. Phys. 135, 915–934 (2009)
10. Lieb, E.H., Liniger, W.: Exact Analysis of an Interacting Bose Gas. I. The General Solution and the Ground State. Phys. Rev. 130, 1605–1616 (1963)
11. Lieb, E.H.: Exact Analysis of an Interacting Bose Gas II, The Excitation Spectrum. Phys. Rev. 130, 1616–1624 (1963)
12. Lieb, E.H., Seiringer, R., Solovej, J.P., Yngvason, J.: The Mathematics of the Bose Gas and its Condensation. Oberwolfach Seminars, Vol. 34, Birkhäuser (2005), also available at http://arxiv.org/abs/cond-mat/0610117v1 [cond-mat.stat-mech], 2006
13. Lieb, E.H., Solovej, J.P.: Ground State Energy of the One-Component Charged Bose Gas. Commun. Math. Phys. 217, 127–163 (2001). Errata 225, 219–221 (2002)
14. Lieb, E.H., Solovej, J.P.: Ground State Energy of the Two-Component Charged Bose Gas. Commun. Math. Phys. 252, 485–534 (2004)
15. Solovej, J.P.: Upper Bounds to the Ground State Energies of the One- and Two-Component Charged Bose Gases. Commun. Math. Phys. 266, 797–818 (2006)
16. Sutherland, B.: Quantum Many-Body Problem in One Dimension: Ground State. J. Math. Phys. 12, 246–250 (1971)
17. Sutherland, B.: Quantum Many-Body Problem in One Dimension: Thermodynamics. J. Math. Phys. 12, 251–256 (1971)
18. Yau, H.-T., Yin, J.: The Second Order Upper Bound for the Ground Energy of a Bose Gas. J. Stat. Phys. 136, 453–503 (2009)
Communicated by H. Spohn
Commun. Math. Phys. 306, 579–615 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1309-7
Communications in
Mathematical Physics
The Quantum Reverse Shannon Theorem Based on One-Shot Information Theory Mario Berta, Matthias Christandl, Renato Renner Institute for Theoretical Physics, ETH Zurich, 8093 Zurich, Switzerland. E-mail:
[email protected];
[email protected];
[email protected] Received: 22 December 2009 / Accepted: 23 March 2011 Published online: 8 August 2011 – © The Author(s) 2011. This article is published with open access at Springerlink.com
Abstract: The Quantum Reverse Shannon Theorem states that any quantum channel can be simulated by an unlimited amount of shared entanglement and an amount of classical communication equal to the channel’s entanglement assisted classical capacity. In this paper, we provide a new proof of this theorem, which has previously been proved by Bennett, Devetak, Harrow, Shor, and Winter. Our proof has a clear structure being based on two recent information-theoretic results: one-shot Quantum State Merging and the Post-Selection Technique for quantum channels.
1. Introduction
The birth of classical information theory can be dated to 1948, when Shannon derived his famous Noisy Channel Coding Theorem [1]. It shows that the capacity $C$ of a classical channel $\mathcal{E}$ is given by the maximum, over the input distributions $X$, of the mutual information between the input $X$ and the output $\mathcal{E}(X)$. That is
$$C(\mathcal{E}) = \max_X \left\{ H(X) + H(\mathcal{E}(X)) - H(X, \mathcal{E}(X)) \right\},$$
where $H$ denotes the Shannon entropy. Shannon also showed that the capacity does not increase if one allows to use shared randomness between the sender and the receiver. In 2001 Bennett et al. [2] proved the Classical Reverse Shannon Theorem which states that, given free shared randomness between the sender and the receiver, every channel can be simulated using an amount of classical communication equal to the capacity of the channel. This is particularly interesting because it implies that in the presence of free shared randomness, the capacity of a channel $\mathcal{E}$ to simulate another channel $\mathcal{F}$ is given by the ratio of their plain capacities $C_R(\mathcal{E}, \mathcal{F}) = \frac{C(\mathcal{E})}{C(\mathcal{F})}$ and hence only a single parameter remains to characterize classical channels.
In contrast to the classical case, a quantum channel has various distinct capacities [2–7]. In [2] Bennett et al. argue that the entanglement assisted classical capacity $C_E$ of a quantum channel $\mathcal{E}$ is the natural quantum generalization of the classical capacity of a classical channel. They show that the entanglement assisted classical capacity is given by the quantum mutual information
$$C_E(\mathcal{E}) = \max_\rho \left\{ H(\rho) + H(\mathcal{E}(\rho)) - H((\mathcal{E} \otimes \mathcal{I}) \Phi_\rho) \right\},$$
where the maximum ranges over all input distributions $\rho$, $\Phi_\rho$ is a purification of $\rho$, $\mathcal{I}$ is the identity channel, and $H$ denotes the von Neumann entropy. Motivated by this, they conjectured the Quantum Reverse Shannon Theorem (QRST) in [2]. Subsequently Bennett, Devetak, Harrow, Shor and Winter proved the theorem in [8]. The theorem states that any quantum channel can be simulated by an unlimited amount of shared entanglement and an amount of classical communication equal to the channel’s entanglement assisted classical capacity. So if entanglement is for free we can conclude, in complete analogy with the classical case, that the capacity of a quantum channel $\mathcal{E}$ to simulate another quantum channel $\mathcal{F}$ is given by $C_E(\mathcal{E}, \mathcal{F}) = \frac{C_E(\mathcal{E})}{C_E(\mathcal{F})}$ and hence only a single parameter remains to characterize quantum channels. In addition, again analogous to the classical scenario [8,9], the Quantum Reverse Shannon Theorem gives rise to a strong converse for the entanglement assisted classical capacity of quantum channels. That is, if one sends classical information through a quantum channel $\mathcal{E}$ at a rate of $C_E(\mathcal{E}) + \varsigma$ for some $\varsigma > 0$ (using arbitrary entanglement as assistance), then the fidelity of the coding scheme decreases exponentially in $\varsigma$ [8].
Free entanglement in quantum information theory is usually given in the form of maximally entangled states. But for the Quantum Reverse Shannon Theorem it surprisingly turned out that maximally entangled states are not the appropriate resource for general input sources. More precisely, even if one has arbitrarily many maximally entangled states as an entanglement resource, the Quantum Reverse Shannon Theorem cannot be proven [8]. This is because of an issue known as entanglement spread, which arises from the fact that entanglement cannot be conditionally discarded without using communication [10].
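As a concrete instance of the classical capacity formula quoted at the start of the Introduction, the mutual information can be maximized by brute force for a binary symmetric channel. This is a minimal sketch for illustration only: the channel, the grid search and all function names are assumptions made here and are not part of the paper.

```python
import math

def h2(q):
    """Binary Shannon entropy in bits."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * math.log2(q) - (1.0 - q) * math.log2(1.0 - q)

def mutual_information(px1, flip):
    """H(X) + H(Y) - H(X,Y) for a binary symmetric channel with flip probability `flip`."""
    py1 = px1 * (1.0 - flip) + (1.0 - px1) * flip
    # H(X,Y) = H(X) + H(Y|X), and H(Y|X) = h2(flip) for every input letter
    return h2(py1) - h2(flip)

def capacity(flip, grid=1000):
    """Maximize the mutual information over input distributions on a uniform grid."""
    return max(mutual_information(i / grid, flip) for i in range(grid + 1))
```

For this channel the maximum is attained at the uniform input distribution, so `capacity(flip)` should agree with the textbook value `1 - h2(flip)` up to rounding.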
If we change the entanglement resource from maximally entangled states to embezzling states [11] however, the problem of entanglement spread can be overcome and the Quantum Reverse Shannon Theorem can be proven. A δ-ebit embezzling state is a bipartite state μ AB with the feature that the transformation μ AB → μ AB ⊗ φ A B , where φ A B denotes an ebit (maximally entangled state of Schmidt-rank 2), can be accomplished up to an error δ with local operations. Remarkably, δ-ebit embezzling states exist for all δ > 0 [11]. In this paper we present a proof of the Quantum Reverse Shannon Theorem based on one-shot information theory. In quantum information theory one usually makes the assumption that the resources are independent and identically distributed (iid) and is interested in asymptotic rates. In this case many operational quantities can be expressed in terms of a few information measures (which are usually based on the von Neumann entropy). In contrast to this, one-shot information theory applies to arbitrary (structureless) resources. For example, in the context of source coding, it is possible to analyze scenarios where only finitely many, possible correlated messages are encoded. For this the smooth entropy formalism was introduced by Renner et al. [12–14]. Smooth entropy measures have properties similar to the ones of the von Neumann entropy and like in the iid case many operational interpretations are known [12,15–29]. For our proof of the Quantum Reverse Shannon Theorem we work in this smooth entropy formalism and use a one-shot version for Quantum State Merging and its dual Quantum State Splitting as well as the Post-Selection Technique for quantum channels.
As in the original proof of the Quantum Reverse Shannon Theorem [8] we need embezzling states. Quantum State Merging was introduced by Horodecki et al. in [30,31]. It has since become an important tool in quantum information processing and was subsequently reformulated in [32], where it is called mother protocol. Quantum State Merging corresponds to the quantum generalization of classical Slepian and Wolf coding [33]. For its description, one considers a sender system, traditionally called Alice, a receiver system, Bob, as well as a reference system R. In Quantum State Merging, Alice, Bob, and the reference are initially in a joint pure state ρ AB R and one asks how much of a given resource, such as classical or quantum communication or entanglement, is needed in order to move the A-part of ρ AB R from Alice to Bob. The dual of this, called Quantum State Splitting, addresses the problem of how much of a given resource, such as classical or quantum communication or entanglement, is needed in order to transfer the A -part of a pure state ρ A A R , where part A A is initially with Alice, from Alice to Bob. The Post-Selection Technique was introduced in [34] and is a tool in order to estimate the closeness of two completely positive and trace preserving (CPTP) maps that act symmetrically on an n-partite system, in the metric induced by the diamond norm, the dual of the completely bounded norm [35]. The definition of this norm involves a maximization over all possible inputs to the joint mapping consisting of the CPTP map tensored with an identity map on an outside system. The Post-Selection Technique allows to drop this maximization. In fact, it suffices to consider a single de Finetti type input state, i.e. a state which consists of n identical and independent copies of an (unknown) state on a single subsystem. 
The technique was applied in quantum cryptography to show that security of discrete-variable quantum key distribution against a restricted type of attacks, called collective attacks, already implies security against the most general attacks [34].

Our proof of the Quantum Reverse Shannon Theorem is based on the following idea. Let $\mathcal{E}_{A\to B}$ be a quantum channel that takes inputs $\rho_A$ on Alice's side and outputs $\mathcal{E}_{A\to B}(\rho_A)$ on Bob's side. To find a way to simulate this quantum channel, it is useful to think of $\mathcal{E}_{A\to B}$ as

$$\mathcal{E}_{A\to B}(\rho_A) = \mathrm{tr}_C\bigl[(U_{A\to BC})\,\rho_A\,(U_{A\to BC})^{\dagger}\bigr],$$

where $C$ is an additional register and $U_{A\to BC}$ is some isometry from $A$ to $BC$. This is the Stinespring dilation [36]. Now the idea is to first simulate the isometry $U_{A\to BC}$ locally at Alice's side, resulting in $\rho_{BC} = (U_{A\to BC})\rho_A(U_{A\to BC})^{\dagger}$, and in a second step use Quantum State Splitting to do an optimal state transfer of the $B$-part to Bob's side, such that he holds $\rho_B = \mathcal{E}_{A\to B}(\rho_A)$ in the end. This simulates the channel $\mathcal{E}_{A\to B}$. To prove the Quantum Reverse Shannon Theorem, it is then sufficient to show that the classical communication rate of the Quantum State Splitting protocol is $C_E(\mathcal{E})$.

We realize this idea in two steps. Firstly, we propose a new version of Quantum State Splitting (since the known protocols are not good enough to achieve a classical communication rate of $C_E(\mathcal{E})$), which is based on one-shot Quantum State Merging [15]. For the analysis we require a decoupling theorem, which is optimal in the one-shot case [16,37]. This means that the decoupling can be achieved optimally even if only a single instance of a quantum state is available. Secondly, we use the Post-Selection Technique to show that our protocol for Quantum State Splitting is sufficient to asymptotically simulate the channel $\mathcal{E}_{A\to B}$ with a classical communication rate of $C_E(\mathcal{E})$. This then completes the proof of the Quantum Reverse Shannon Theorem.

Our paper is structured as follows. In Sect. 2 we introduce our notation and give some definitions. In particular, we review the relevant smooth entropy measures. Our results
M. Berta, M. Christandl, R. Renner
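about Quantum State Splitting are then discussed in Sect. 3. Finally, we give our proof of the Quantum Reverse Shannon Theorem in Sect. 4. The argument uses various technical statements (e.g. properties of smooth entropies), which are proved in the Appendix.

2. Smooth Entropy Measures – Notation and Definitions

We assume that all Hilbert spaces, in the following denoted $\mathcal{H}$, are finite-dimensional. The dimension of $\mathcal{H}_A$ is denoted by $|A|$. The set of linear operators on $\mathcal{H}$ is denoted by $\mathcal{L}(\mathcal{H})$ and the set of positive semi-definite operators on $\mathcal{H}$ is denoted by $\mathcal{P}(\mathcal{H})$. We define the sets of subnormalized states $\mathcal{S}_{\le}(\mathcal{H}) = \{\rho \in \mathcal{P}(\mathcal{H}) : \mathrm{tr}[\rho] \le 1\}$ and normalized states $\mathcal{S}_{=}(\mathcal{H}) = \{\rho \in \mathcal{P}(\mathcal{H}) : \mathrm{tr}[\rho] = 1\}$.

The tensor product of $\mathcal{H}_A$ and $\mathcal{H}_B$ is denoted by $\mathcal{H}_{AB} \equiv \mathcal{H}_A \otimes \mathcal{H}_B$. Given a multipartite operator $\rho_{AB} \in \mathcal{P}(\mathcal{H}_{AB})$, we write $\rho_A = \mathrm{tr}_B[\rho_{AB}]$ for the corresponding reduced operator. For $M_A \in \mathcal{L}(\mathcal{H}_A)$, we write $M_A \equiv M_A \otimes \mathbb{1}_B$ for the enlargement on any $\mathcal{H}_{AB}$, where $\mathbb{1}_B$ denotes the identity in $\mathcal{L}(\mathcal{H}_B)$. Isometries from $\mathcal{H}_A$ to $\mathcal{H}_B$ are denoted by $V_{A\to B}$.

For $\mathcal{H}_A$, $\mathcal{H}_B$ with bases $\{|i\rangle_A\}_{i=1}^{|A|}$, $\{|i\rangle_B\}_{i=1}^{|B|}$ and $|A| = |B|$, the canonical identity mapping from $\mathcal{L}(\mathcal{H}_A)$ to $\mathcal{L}(\mathcal{H}_B)$ with respect to these bases is denoted by $\mathcal{I}_{A\to B}$, i.e. $\mathcal{I}_{A\to B}(|i\rangle\langle j|_A) = |i\rangle\langle j|_B$. A linear map $\mathcal{E}_{A\to B} : \mathcal{L}(\mathcal{H}_A) \to \mathcal{L}(\mathcal{H}_B)$ is positive if $\mathcal{E}_{A\to B}(\rho_A) \in \mathcal{P}(\mathcal{H}_B)$ for all $\rho_A \in \mathcal{P}(\mathcal{H}_A)$. It is completely positive if the map $(\mathcal{E}_{A\to B} \otimes \mathcal{I}_{C\to C})$ is positive for all $\mathcal{H}_C$. Completely positive and trace preserving maps are called CPTP maps or quantum channels.

The support of $\rho \in \mathcal{P}(\mathcal{H})$ is denoted by $\mathrm{supp}(\rho)$, the projector onto $\mathrm{supp}(\rho)$ is denoted by $\rho^0$ and $\mathrm{tr}[\rho^0] = \mathrm{rank}(\rho)$, the rank of $\rho$. For $\rho \in \mathcal{P}(\mathcal{H})$ we write $\|\rho\|_\infty$ for the operator norm of $\rho$, which is equal to the maximum eigenvalue of $\rho$.

Recall the following standard definitions. The von Neumann entropy of $\rho \in \mathcal{S}_{=}(\mathcal{H})$ is defined as¹

$$H(\rho) = -\mathrm{tr}[\rho \log \rho]. \tag{1}$$

The quantum relative entropy of $\rho \in \mathcal{S}_{\le}(\mathcal{H})$ with respect to $\sigma \in \mathcal{P}(\mathcal{H})$ is given by

$$D(\rho\|\sigma) = \mathrm{tr}[\rho \log \rho] - \mathrm{tr}[\rho \log \sigma] \tag{2}$$

if $\mathrm{supp}(\rho) \subseteq \mathrm{supp}(\sigma)$ and $\infty$ otherwise. The conditional von Neumann entropy of $A$ given $B$ for $\rho_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ is defined as

$$H(A|B)_\rho = -D(\rho_{AB}\|\mathbb{1}_A \otimes \rho_B). \tag{3}$$

The mutual information between $A$ and $B$ for $\rho_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ is given by $I(A:B)_\rho = D(\rho_{AB}\|\rho_A \otimes \rho_B)$. Note that we can also write

$$H(A|B)_\rho = -\inf_{\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)} D(\rho_{AB}\|\mathbb{1}_A \otimes \sigma_B), \qquad I(A:B)_\rho = \inf_{\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)} D(\rho_{AB}\|\rho_A \otimes \sigma_B). \tag{4}$$

¹ All logarithms are taken to base 2.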
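For states that are diagonal in a common basis, the quantities in (1) to (4) reduce to their classical counterparts, which makes them easy to check numerically. The following is a minimal sketch under that assumption; the function names are illustrative and not part of the paper, and a joint state is represented by a matrix of probabilities.

```python
import math

def entropy(probs):
    """Entropy in bits; equals H(rho) when probs are the eigenvalues of rho."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def relative_entropy(p, q):
    """D(p||q) in bits for diagonal states; infinite if supp(p) is not inside supp(q)."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf
            total += pi * math.log2(pi / qi)
    return total

def mutual_information(joint):
    """I(A:B) = D(p_AB || p_A x p_B) for a joint distribution given as a matrix."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    flat_joint = [x for row in joint for x in row]
    flat_product = [a * b for a in p_a for b in p_b]
    return relative_entropy(flat_joint, flat_product)
```

For a pair of perfectly correlated uniform bits, `mutual_information([[0.5, 0.0], [0.0, 0.5]])` evaluates the relative entropy between the joint distribution and the product of its marginals, in line with the definition of $I(A:B)_\rho$ above.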
Quantum Reverse Shannon Theorem Based on One-Shot Information Theory
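We now give the definitions of the smooth entropy measures that we need in this work. In Appendix B some basic properties are summarized. For a more detailed discussion of the smooth entropy formalism we refer to [12,17,18,38,39].

Following Datta [18] we define the max-relative entropy of $\rho \in \mathcal{S}_{\le}(\mathcal{H})$ with respect to $\sigma \in \mathcal{P}(\mathcal{H})$ as

$$D_{\max}(\rho\|\sigma) = \inf\{\lambda \in \mathbb{R} : 2^{\lambda} \cdot \sigma \ge \rho\}. \tag{5}$$

The conditional min-entropy of $A$ given $B$ for $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ is defined as

$$H_{\min}(A|B)_\rho = -\inf_{\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)} D_{\max}(\rho_{AB}\|\mathbb{1}_A \otimes \sigma_B). \tag{6}$$

In the special case where $B$ is trivial, we get $H_{\min}(A)_\rho = -\log\|\rho_A\|_\infty$. The max-information that $B$ has about $A$ for $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ is defined as

$$I_{\max}(A:B)_\rho = \inf_{\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)} D_{\max}(\rho_{AB}\|\rho_A \otimes \sigma_B). \tag{7}$$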
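When $\rho$ and $\sigma$ are diagonal in the same basis, the operator inequality $2^{\lambda}\sigma \ge \rho$ in (5) holds entrywise, so the infimum is attained at the logarithm of the largest ratio of eigenvalues. The sketch below illustrates this special case only; the function names are assumptions made for this example.

```python
import math

def d_max(p, q):
    """Dmax(p||q) for diagonal states: log2 of the smallest c with c*q >= p entrywise."""
    worst_ratio = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return math.inf  # no finite multiple of q dominates p
            worst_ratio = max(worst_ratio, pi / qi)
    return math.log2(worst_ratio) if worst_ratio > 0 else -math.inf

def h_min(p):
    """Hmin(A) = -log2 of the largest eigenvalue (trivial B system)."""
    return -math.log2(max(p))
```

With a trivial $B$ system, `-d_max(p, [1.0] * len(p))` agrees with `h_min(p)`, which mirrors the special case $H_{\min}(A)_\rho = -\log\|\rho_A\|_\infty$ stated above.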
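Note that unlike the mutual information, this definition is not symmetric.

The smooth entropy measures are defined by extremizing the non-smooth measures over a set of nearby states, where our notion of nearby is expressed in terms of the purified distance. For $\rho, \sigma \in \mathcal{S}_{\le}(\mathcal{H})$ it is defined as [39, Def. 4]

$$P(\rho, \sigma) = \sqrt{1 - \bar{F}^2(\rho, \sigma)}, \tag{8}$$

where $\bar{F}(\cdot\,,\cdot)$ denotes the generalized fidelity (which equals the standard fidelity² if at least one of the states is normalized),

$$\bar{F}(\rho, \sigma) = \Bigl\|\sqrt{\rho \oplus (1 - \mathrm{tr}\rho)}\,\sqrt{\sigma \oplus (1 - \mathrm{tr}\sigma)}\Bigr\|_1 = F(\rho, \sigma) + \sqrt{(1 - \mathrm{tr}\rho)(1 - \mathrm{tr}\sigma)}. \tag{9}$$

The purified distance is a distance measure on $\mathcal{S}_{\le}(\mathcal{H})$ [39, Lemma 5], in particular, it satisfies the triangle inequality $P(\rho, \sigma) \le P(\rho, \omega) + P(\omega, \sigma)$ for $\rho, \sigma, \omega \in \mathcal{S}_{\le}(\mathcal{H})$. $P(\rho, \sigma)$ corresponds to one half times the minimum trace distance³ between purifications of $\rho$ and $\sigma$.

Henceforth we call $\rho, \sigma \in \mathcal{S}_{\le}(\mathcal{H})$ $\varepsilon$-close if $P(\rho, \sigma) \le \varepsilon$ and denote this by $\rho \approx_{\varepsilon} \sigma$. We use the purified distance to specify a ball of subnormalized density operators around $\rho \in \mathcal{S}_{\le}(\mathcal{H})$:

$$\mathcal{B}^{\varepsilon}(\rho) = \{\bar{\rho} \in \mathcal{S}_{\le}(\mathcal{H}) : P(\rho, \bar{\rho}) \le \varepsilon\}. \tag{10}$$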
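For commuting (diagonal) subnormalized states, the fidelity in (9) becomes a sum of square roots of products of eigenvalues, so (8) and (9) can be evaluated directly. The following sketch covers that special case; the function names are illustrative assumptions.

```python
import math

def generalized_fidelity(p, q):
    """Generalized fidelity of diagonal subnormalized states, cf. Eq. (9)."""
    overlap = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # The extra term accounts for the missing weight of subnormalized states.
    deficit = math.sqrt(max(0.0, 1 - sum(p)) * max(0.0, 1 - sum(q)))
    return overlap + deficit

def purified_distance(p, q):
    """Purified distance P = sqrt(1 - F^2), cf. Eq. (8)."""
    f = min(1.0, generalized_fidelity(p, q))  # guard against rounding above 1
    return math.sqrt(1 - f * f)
```

A state `q` then lies in the ball $\mathcal{B}^{\varepsilon}(\rho)$ of (10) around a diagonal state `p` exactly when `purified_distance(p, q) <= eps`.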
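Miscellaneous properties of the purified distance that we use for our proof are stated in Appendix A. For a further discussion we refer to [39].

For $\varepsilon \ge 0$, the smooth conditional min-entropy of $A$ given $B$ for $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ is defined as

$$H^{\varepsilon}_{\min}(A|B)_\rho = \sup_{\bar{\rho}_{AB} \in \mathcal{B}^{\varepsilon}(\rho_{AB})} H_{\min}(A|B)_{\bar{\rho}}. \tag{11}$$

² The fidelity between $\rho, \sigma \in \mathcal{S}_{\le}(\mathcal{H})$ is defined by $F(\rho, \sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1$, where $\|\Gamma\|_1 = \mathrm{tr}\bigl[\sqrt{\Gamma^{\dagger}\Gamma}\bigr]$.
³ The trace distance between $\rho, \sigma \in \mathcal{S}_{\le}(\mathcal{H})$ is defined by $\|\rho - \sigma\|_1$. The trace distance is often defined with an additional factor one half; we choose not to do this.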
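The smooth max-information that $B$ has about $A$ for $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ is defined as

$$I^{\varepsilon}_{\max}(A:B)_\rho = \inf_{\bar{\rho}_{AB} \in \mathcal{B}^{\varepsilon}(\rho_{AB})} I_{\max}(A:B)_{\bar{\rho}}. \tag{12}$$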
The smooth entropy measures can be seen as generalizations of their corresponding von Neumann quantities in the sense that the latter can be retrieved asymptotically by evaluating the smooth entropy measures on i.i.d. states (cf. Remark B.23 and Corollary B.25). In Sect. 3 we give an operational meaning to the smooth max-information (Theorem 3.10 and Theorem 3.11).⁴ Since all Hilbert spaces in this paper are assumed to be finite-dimensional, we are allowed to replace the infima by minima and the suprema by maxima in all the definitions of this section. We will do so in the following.
3. Quantum State Splitting

The main goal of this section is to prove that there exists a one-shot Quantum State Splitting protocol (Theorem 3.10) that is optimal in terms of its quantum communication cost (Theorem 3.11). The protocol is obtained by inverting a one-shot Quantum State Merging protocol. The main technical ingredient for the construction of these protocols is the following decoupling theorem (Theorem 3.1). The proof of the decoupling theorem can be found in Appendix C.

Theorem 3.1. Let $\varepsilon > 0$, $\rho_{AR} \in \mathcal{S}_{\le}(\mathcal{H}_{AR})$ and consider a decomposition of the system $A$ into two subsystems $A_1$ and $A_2$. Furthermore define $\sigma_{A_1R}(U) = \mathrm{tr}_{A_2}\bigl[(U \otimes \mathbb{1}_R)\rho_{AR}(U^{\dagger} \otimes \mathbb{1}_R)\bigr]$. If

$$\log|A_1| \le \frac{\log|A| + H_{\min}(A|R)_\rho}{2} - \log\frac{1}{\varepsilon}, \tag{13}$$

then

$$\int_{U(A)} \Bigl\|\sigma_{A_1R}(U) - \frac{\mathbb{1}_{A_1}}{|A_1|} \otimes \rho_R\Bigr\|_1 \, dU \le \varepsilon, \tag{14}$$

where $dU$ is the Haar measure over the unitaries on system $A$, normalized to $\int dU = 1$.
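As a quick numerical reading of condition (13), the sketch below evaluates its right-hand side, i.e. the largest $\log|A_1|$ for which Theorem 3.1 guarantees decoupling up to $\varepsilon$. The function name and the example values are illustrative assumptions. The two extreme cases use the standard values $H_{\min}(A|R)_\rho = \log|A|$ for a maximally mixed $A$ that is uncorrelated with $R$, and $H_{\min}(A|R)_\rho = -\log|A|$ for $A$ maximally entangled with $R$.

```python
import math

def max_decoupled_qubits(log_dim_a, h_min_a_given_r, eps):
    """Right-hand side of (13): upper bound on log|A1| in qubits."""
    return 0.5 * (log_dim_a + h_min_a_given_r) - math.log2(1 / eps)

# Ten qubits, uncorrelated with R and maximally mixed: almost all of A decouples.
uncorrelated = max_decoupled_qubits(10, 10, 0.25)
# Ten qubits maximally entangled with R: the bound is negative, nothing decouples.
entangled = max_decoupled_qubits(10, -10, 0.5)
```

The remaining $\log|A_2| = \log|A| - \log|A_1|$ qubits are what has to be handed over in the merging protocol built on this theorem.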
An excellent introduction to the subject of decoupling can be found in [40]. Note that our decoupling theorem (Theorem 3.1) can be seen as a special case of a more general decoupling theorem [16,37]. It is possible to formulate the decoupling criterion in Theorem 3.1 more generally in terms of smooth entropies, which is then optimal in the most general one-shot case [15,16].

Quantum State Merging, Quantum State Splitting, and other related quantum information processing primitives are discussed in detail in [15,30–32,41]. Note that we are not only interested in asymptotic rates, but in (tight) one-shot protocols. This is reflected by the following definitions.

⁴ For an operational meaning of the smooth conditional min-entropy see e.g. [12,15].
Definition 3.2 (Quantum State Merging). Consider a bipartite system with parties Alice and Bob. Let $\varepsilon > 0$ and $\rho_{ABR} = |\rho\rangle\langle\rho|_{ABR} \in \mathcal{S}_{\le}(\mathcal{H}_{ABR})$, where Alice controls $A$, Bob $B$ and $R$ is a reference system. A CPTP map $\mathcal{E}$ is called $\varepsilon$-error Quantum State Merging of $\rho_{ABR}$ if it consists of applying local operations at Alice's side, local operations at Bob's side, sending $q$ qubits from Alice to Bob and outputs a state

$$(\mathcal{E} \otimes \mathcal{I}_R)(\rho_{ABR}) \approx_{\varepsilon} \rho_{BB'R} \otimes |\phi_L\rangle\langle\phi_L|_{A_1B_1}, \tag{15}$$

where $|\phi_L\rangle\langle\phi_L|_{A_1B_1}$ is a maximally entangled state of Schmidt-rank $L$ and $\rho_{BB'R} = (\mathcal{I}_{A\to B'} \otimes \mathcal{I}_{BR})\rho_{ABR}$. $q$ is called quantum communication cost and $e = \log L$ entanglement gain.

Quantum State Merging is also called Fully Quantum Slepian Wolf (FQSW) or mother protocol [32].

Lemma 3.3. Let $\varepsilon > 0$ and $\rho_{ABR} = |\rho\rangle\langle\rho|_{ABR} \in \mathcal{S}_{\le}(\mathcal{H}_{ABR})$. Then there exists an $\varepsilon$-error Quantum State Merging protocol for $\rho_{ABR}$ with a quantum communication cost of

$$q = \frac{1}{2}\bigl(H_0(A)_\rho - H_{\min}(A|R)_\rho\bigr) + 2 \cdot \log\frac{1}{\varepsilon} \tag{16}$$

and an entanglement gain of

$$e = \frac{1}{2}\bigl(H_0(A)_\rho + H_{\min}(A|R)_\rho\bigr) - 2 \cdot \log\frac{1}{\varepsilon}, \tag{17}$$

where $H_0(A)_\rho = \log \mathrm{rank}(\rho_A)$.

Proof. The intuition is as follows (cf. Fig. 1). First Alice applies a unitary $U_{A\to A_1A_2}$. After this she sends $A_2$ to Bob who then performs a local isometry $V_{A_2B\to BB'B_1}$. We choose $U_{A\to A_1A_2}$ such that it decouples $A_1$ from the reference $R$. After sending the $A_2$-part to Bob, the state on $A_1R$ is given by $\frac{\mathbb{1}_{A_1}}{|A_1|} \otimes \rho_R$ and Bob holds a purification of this. But $\frac{\mathbb{1}_{A_1}}{|A_1|} \otimes \rho_R$ is the reduced state of $\rho_{BB'R} \otimes |\phi_L\rangle\langle\phi_L|_{A_1B_1}$ and since all purifications are equal up to local isometries, there exists an isometry $V_{A_2B\to BB'B_1}$ on Bob's side that transforms the state into $\rho_{BB'R} \otimes |\phi_L\rangle\langle\phi_L|_{A_1B_1}$.

More formally, let $A = A_1A_2$ with $\log|A_2| = \frac{1}{2}\bigl(\log|A| - H_{\min}(A|R)_\rho\bigr) + 2 \cdot \log\frac{1}{\varepsilon}$. According to the decoupling theorem (Theorem 3.1), there exists a unitary $U_{A\to A_1A_2}$ such that for $\sigma_{A_1A_2BR} = (U_{A\to A_1A_2} \otimes \mathbb{1}_{BR})\rho_{ABR}(U^{\dagger}_{A\to A_1A_2} \otimes \mathbb{1}_{BR})$,

$$\Bigl\|\sigma_{A_1R} - \frac{\mathbb{1}_{A_1}}{|A_1|} \otimes \rho_R\Bigr\|_1 \le \varepsilon^2.$$

By an upper bound of the purified distance in terms of the trace distance (Lemma A.1) this implies $\sigma_{A_1R} \approx_{\varepsilon} \frac{\mathbb{1}_{A_1}}{|A_1|} \otimes \rho_R$.

We apply this unitary $U_{A\to A_1A_2}$ and then send $A_2$ to Bob; therefore $q = \frac{1}{2}\bigl(\log|A| - H_{\min}(A|R)_\rho\bigr) + 2 \cdot \log\frac{1}{\varepsilon}$. Uhlmann's theorem [42,43] tells us that there exists an isometry $V_{A_2B\to BB'B_1}$ such that

$$P\Bigl(\sigma_{A_1R}, \frac{\mathbb{1}_{A_1}}{|A_1|} \otimes \rho_R\Bigr) = P\Bigl((\mathbb{1}_{A_1R} \otimes V_{A_2B\to BB'B_1})\,\sigma_{A_1A_2BR}\,(\mathbb{1}_{A_1R} \otimes V_{A_2B\to BB'B_1})^{\dagger},\; |\phi_L\rangle\langle\phi_L|_{A_1B_1} \otimes \rho_{BB'R}\Bigr).$$
Fig. 1. From the protocol for Quantum State Merging, which we describe in Lemma 3.3, we get a protocol for Quantum State Splitting with maximally entangled states. All we have to do is to run the Quantum State Merging protocol backwards
Hence the entanglement gain is given by $e = \frac{1}{2}\bigl(\log|A| + H_{\min}(A|R)_\rho\bigr) - 2 \cdot \log\frac{1}{\varepsilon}$. Now if $\rho_A$ has full rank this is already what we want. In general $\log \mathrm{tr}[\rho_A^0] = \log|\hat{A}| \le \log|A|$. But in this case we can restrict the Hilbert space $\mathcal{H}_A$ to the subspace $\mathcal{H}_{\hat{A}}$, on which $\rho_A$ has full rank.

Definition 3.4 (Quantum State Splitting with maximally entangled states). Consider a bipartite system with parties Alice and Bob. Let $\varepsilon > 0$ and $\rho_{AA'R} = |\rho\rangle\langle\rho|_{AA'R} \in \mathcal{S}_{\le}(\mathcal{H}_{AA'R})$, where Alice controls $AA'$ and $R$ is a reference system. Furthermore let $|\phi_K\rangle\langle\phi_K|_{A_1B_1}$ be a maximally entangled state of Schmidt-rank $K$ between Alice and Bob. A CPTP map $\mathcal{E}$ is called $\varepsilon$-error Quantum State Splitting of $\rho_{AA'R}$ with maximally entangled states if it consists of applying local operations at Alice's side, local operations at Bob's side, sending $q$ qubits from Alice to Bob and outputs a state

$$(\mathcal{E} \otimes \mathcal{I}_R)(\rho_{AA'R} \otimes |\phi_K\rangle\langle\phi_K|_{A_1B_1}) \approx_{\varepsilon} \rho_{ABR}, \tag{18}$$

where $\rho_{ABR} = (\mathcal{I}_{A'\to B} \otimes \mathcal{I}_{AR})\rho_{AA'R}$. $q$ is called quantum communication cost and $e = \log K$ entanglement cost.

This is also called the Fully Quantum Reverse Shannon (FQRS) protocol [32], which is a bit misleading, since there is a danger of confusion with the Quantum Reverse Shannon Theorem. Quantum State Splitting with maximally entangled states is dual to Quantum State Merging in the sense that every Quantum State Merging protocol already defines a protocol for Quantum State Splitting with maximally entangled states and vice versa.

Lemma 3.5. Let $\varepsilon > 0$ and $\rho_{AA'R} = |\rho\rangle\langle\rho|_{AA'R} \in \mathcal{S}_{\le}(\mathcal{H}_{AA'R})$. Then there exists an $\varepsilon$-error Quantum State Splitting protocol with maximally entangled states for $\rho_{AA'R}$ with a quantum communication cost of

$$q = \frac{1}{2}\bigl(H_0(A')_\rho - H_{\min}(A'|R)_\rho\bigr) + 2 \cdot \log\frac{1}{\varepsilon}, \tag{19}$$
and an entanglement cost of

$$e = \frac{1}{2}\bigl(H_0(A')_\rho + H_{\min}(A'|R)_\rho\bigr) - 2 \cdot \log\frac{1}{\varepsilon}, \tag{20}$$

where $H_0(A')_\rho = \log \mathrm{rank}(\rho_{A'})$.

Proof. The Quantum State Splitting protocol with maximally entangled states is defined by running the Quantum State Merging protocol of Lemma 3.3 backwards (see Fig. 1). The claim then follows from Lemma 3.3.

In order to obtain a Quantum State Splitting protocol that is optimal in terms of its quantum communication cost, we need to replace the maximally entangled states by embezzling states [11].

Definition 3.6. Let $\delta > 0$. A state $\mu_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ is called a $\delta$-ebit embezzling state if there exist isometries $X_{A\to AA'}$ and $X_{B\to BB'}$ such that

$$(X_{A\to AA'} \otimes X_{B\to BB'})\,\mu_{AB}\,(X_{A\to AA'} \otimes X_{B\to BB'})^{\dagger} \approx_{\delta} \mu_{AB} \otimes |\phi\rangle\langle\phi|_{A'B'}, \tag{21}$$
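where $|\phi\rangle\langle\phi|_{A'B'} \in \mathcal{S}_{=}(\mathcal{H}_{A'B'})$ denotes an ebit (maximally entangled state of Schmidt-rank 2).

Proposition 3.7 [11]. $\delta$-ebit embezzling states exist for all $\delta > 0$.

We would like to highlight two interesting examples. For the first example consider the state $|\mu_m\rangle\langle\mu_m|_{A^mB^m} \in \mathcal{S}_{=}(\mathcal{H}_{A^mB^m})$ defined by

$$|\mu_m\rangle_{A^mB^m} = C \cdot \sum_{j=0}^{m-1} |\varphi\rangle_{AB}^{\otimes j} \otimes |\phi\rangle_{AB}^{\otimes(m-j)},$$

where $|\varphi\rangle\langle\varphi|_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$, $|\phi\rangle\langle\phi|_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ denotes an ebit and $C$ is such that $|\mu_m\rangle\langle\mu_m|_{A^mB^m}$ is normalized. Note that applying the cyclic shift operator $U_{A_0A^m}$ that sends $A_i \to A_{i+1}$ at Alice's side (modulo $m+1$) and the corresponding cyclic shift operator $U_{B_0B^m}$ at Bob's side maps $|\varphi\rangle_{A_0B_0} \otimes |\mu_m\rangle_{A^mB^m}$ to $|\phi\rangle_{A_0B_0} \otimes |\mu_m\rangle_{A^mB^m}$ up to an accuracy of $\sqrt{2/m}$ [44]. For the choice $|\varphi\rangle_{AB} = |\varphi\rangle_A \otimes |\varphi\rangle_B$, $|\mu_m\rangle\langle\mu_m|_{A^mB^m}$ is a $\sqrt{2/m}$-ebit embezzling state with the isometries $X_{A^m\to A_0A^m} = U_{A_0A^m}|\varphi\rangle_{A_0}$ and $X_{B^m\to B_0B^m} = U_{B_0B^m}|\varphi\rangle_{B_0}$.

The second example is the state $|\tilde{\mu}_m\rangle\langle\tilde{\mu}_m|_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ defined by

$$|\tilde{\mu}_m\rangle_{AB} = \Bigl(\sum_{j=1}^{2^m} \frac{1}{j}\Bigr)^{-1/2} \cdot \sum_{j=1}^{2^m} \frac{1}{\sqrt{j}}\,|j\rangle_A \otimes |j\rangle_B.$$

It is a $\sqrt{2/m}$-ebit embezzling state [11].⁵

⁵ The state $|\tilde{\mu}_m\rangle\langle\tilde{\mu}_m|_{AB}$ is even a $(\sqrt{2/m}, r)$-universal embezzling state [11]. That is, for any $|\varsigma\rangle\langle\varsigma|_{A'B'} \in \mathcal{S}_{=}(\mathcal{H}_{A'B'})$ of Schmidt-rank at most $r$, there exist isometries $X_{A\to AA'}$ and $X_{B\to BB'}$ such that $(X_{A\to AA'} \otimes X_{B\to BB'})\,|\tilde{\mu}_m\rangle\langle\tilde{\mu}_m|_{AB}\,(X_{A\to AA'} \otimes X_{B\to BB'})^{\dagger} \approx_{\sqrt{2/m}} |\tilde{\mu}_m\rangle\langle\tilde{\mu}_m|_{AB} \otimes |\varsigma\rangle\langle\varsigma|_{A'B'}$.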
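The second example can be explored numerically through its Schmidt coefficients. The sketch below is an illustration under stated assumptions, not the construction of [11] verbatim: it assumes that the local isometries simply pair the Schmidt vectors of the initial and the target state in decreasing order of their coefficients, in which case the fidelity is the sum of the square roots of the products of the sorted squared coefficients. All function names are hypothetical.

```python
import math

def embezzling_schmidt_probs(m):
    """Squared Schmidt coefficients 1/(j * norm), j = 1..2^m, of the harmonic state."""
    n = 2 ** m
    norm = sum(1.0 / j for j in range(1, n + 1))
    return [1.0 / (j * norm) for j in range(1, n + 1)]

def ebit_embezzling_fidelity(m):
    """Overlap between (state x product pair) and (state x ebit), sorted pairing assumed."""
    lam = embezzling_schmidt_probs(m)
    # Initial state next to a fixed product state: pad the spectrum with zeros.
    source = sorted(lam + [0.0] * len(lam), reverse=True)
    # Target: the same state next to an ebit, so every coefficient splits in two.
    target = sorted([x / 2 for x in lam for _ in range(2)], reverse=True)
    return sum(math.sqrt(a * b) for a, b in zip(source, target))

def ebit_embezzling_error(m):
    """Purified distance corresponding to the fidelity above."""
    f = ebit_embezzling_fidelity(m)
    return math.sqrt(max(0.0, 1 - f * f))
```

Under these assumptions, `ebit_embezzling_error(m)` can be compared with the accuracy $\sqrt{2/m}$ quoted above; the error shrinks only logarithmically in the dimension $2^m$ of the embezzling state.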
Remark 3.8. By using $\delta$-ebit embezzling states multiple times, it is possible to create maximally entangled states of higher dimension. More precisely, for every $\delta$-ebit embezzling state $\mu_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ there exist isometries $X_{A\to AA'}$ and $X_{B\to BB'}$ such that

$$(X_{A\to AA'} \otimes X_{B\to BB'})\,\mu_{AB}\,(X_{A\to AA'} \otimes X_{B\to BB'})^{\dagger} \approx_{\delta \cdot \log L} \mu_{AB} \otimes |\phi_L\rangle\langle\phi_L|_{A'B'}, \tag{22}$$

where $|\phi_L\rangle\langle\phi_L|_{A'B'} \in \mathcal{S}_{=}(\mathcal{H}_{A'B'})$ denotes a maximally entangled state of Schmidt-rank $L$ (with $L$ being a power of 2).

We are now ready to define Quantum State Splitting with embezzling states.

Definition 3.9 (Quantum State Splitting with embezzling states). Consider a bipartite system with parties Alice and Bob. Let $\varepsilon > 0$, $\delta > 0$ and $\rho_{AA'R} = |\rho\rangle\langle\rho|_{AA'R} \in \mathcal{S}_{\le}(\mathcal{H}_{AA'R})$, where Alice controls $AA'$ and $R$ is a reference system. A CPTP map $\mathcal{E}$ is called $\varepsilon$-error Quantum State Splitting of $\rho_{AA'R}$ with a $\delta$-ebit embezzling state if it consists of applying local operations at Alice's side, local operations at Bob's side, sending $q$ qubits from Alice to Bob, using a $\delta$-ebit embezzling state $\mu_{A_{\mathrm{emb}}B_{\mathrm{emb}}}$, and outputs a state

$$(\mathcal{E} \otimes \mathcal{I}_R)(\rho_{AA'R} \otimes \mu_{A_{\mathrm{emb}}B_{\mathrm{emb}}}) \approx_{\varepsilon} \rho_{ABR}, \tag{23}$$

where $\rho_{ABR} = (\mathcal{I}_{A'\to B} \otimes \mathcal{I}_{AR})\rho_{AA'R}$. $q$ is called quantum communication cost.

The following theorem about the achievability of Quantum State Splitting with embezzling states (Theorem 3.10) is the main result of this section. In Sect. 4 we use this theorem to prove the Quantum Reverse Shannon Theorem.

Theorem 3.10. Let $\varepsilon > 0$, $\varepsilon' \ge 0$, $\delta > 0$ and $\rho_{AA'R} = |\rho\rangle\langle\rho|_{AA'R} \in \mathcal{S}_{\le}(\mathcal{H}_{AA'R})$. Then there exists an $(\varepsilon + \varepsilon' + \delta \cdot \log|A'| + |A'|^{-1/2})$-error⁶ Quantum State Splitting protocol for $\rho_{AA'R}$ with a $\delta$-ebit embezzling state for a quantum communication cost of

$$q \le \frac{1}{2} I^{\varepsilon'}_{\max}(A':R)_\rho + 2 \cdot \log\frac{1}{\varepsilon} + 4 + \log\log|A'|. \tag{24}$$
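To get a feeling for the size of the individual terms in Theorem 3.10, the following sketch evaluates the total error and the right-hand side of (24) for given parameters. It is plain arithmetic on the stated formulas; the function name and the example values are assumptions chosen for illustration, and the smooth max-information is supplied as an input rather than computed.

```python
import math

def state_splitting_guarantee(imax_smooth, eps, eps_prime, delta, dim_a_prime):
    """Return (total error, upper bound on q in qubits) as stated in Theorem 3.10."""
    log_dim = math.log2(dim_a_prime)
    error = eps + eps_prime + delta * log_dim + dim_a_prime ** -0.5
    q_bound = 0.5 * imax_smooth + 2 * math.log2(1 / eps) + 4 + math.log2(log_dim)
    return error, q_bound

# Example: a 16-qubit system A' with a smooth max-information of 10 bits.
error, q_bound = state_splitting_guarantee(
    imax_smooth=10, eps=0.25, eps_prime=0.0, delta=2 ** -10, dim_a_prime=2 ** 16
)
```

In this example the additive overhead beyond $\frac{1}{2} I^{\varepsilon'}_{\max}(A':R)_\rho$ is dominated by the $2 \cdot \log\frac{1}{\varepsilon}$ term and the constant, while the dimension enters only through $\log\log|A'|$.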
Proof. The idea for the protocol is as follows (cf. Fig. 2). First, we disregard the eigenvalues of $\rho_{A'}$ that are smaller than $|A'|^{-2}$. This introduces an error $\alpha = |A'|^{-1/2}$, but because of the monotonicity of the purified distance (Lemma A.2), the error at the end of the protocol is still upper bounded by the same $\alpha$. As a next step we let Alice perform a coherent measurement $W$ with roughly $2 \cdot \log|A'|$ measurement outcomes in the eigenbasis of $\rho_{A'}$. That is, the state after the measurement is of the form $\omega_{AA'RI_A} = |\omega\rangle\langle\omega|_{AA'RI_A}$ with

$$|\omega\rangle_{AA'RI_A} = \sum_{i \in I} \sqrt{p_i}\,|\rho^i\rangle_{AA'R} \otimes |i\rangle_{I_A}.$$

Here the index $i$ indicates which measurement outcome occurred, $p_i$ denotes its probability and $\rho^i_{AA'R} = |\rho^i\rangle\langle\rho^i|_{AA'R}$ the corresponding post-measurement state.

⁶ The error term $|A'|^{-1/2}$ can be made arbitrarily small by enlarging the Hilbert space $\mathcal{H}_{A'}$. Of course this increases the error term $\delta \cdot \log|A'|$, but this can again be compensated by decreasing $\delta$. Enlarging the Hilbert space $\mathcal{H}_{A'}$ also increases the quantum communication cost (24), but only slightly.
Fig. 2. A schematic description of our protocol for Quantum State Splitting with embezzling states in the language of the quantum circuit model [45,46]. See the text for definitions and a precise description
Then, conditioned on the index $i$, we use the Quantum State Splitting protocol with maximally entangled states from Lemma 3.5 for each state $\rho^i_{AA'R}$ and denote the corresponding quantum communication cost and entanglement cost by $q_i$ and $e_i$ respectively. The total amount of quantum communication we need for this is given by $\max_i q_i$ plus the amount needed to send the register $I_A$ (which is of order $\log\log|A'|$). In addition, since the different branches of the protocol use different amounts of entanglement, we need to provide a superposition of different (namely $e_i$ sized) maximally entangled states. We do this by using embezzling states.⁷

As the last step, we undo the initial coherent measurement $W$. This completes the Quantum State Splitting protocol with embezzling states for $\rho_{AA'R}$. All that remains is to bring the expression for the quantum communication cost into the right form.

In the following, we describe the proof in detail. Let $Q = \lceil 2 \cdot \log|A'| - 1 \rceil$, $I = \{0, 1, \ldots, Q, (Q+1)\}$ and let $\{P^i_{A'}\}_{i \in I}$ be a collection of projectors on $\mathcal{H}_{A'}$ defined as follows. $P^{Q+1}_{A'}$ projects on the eigenvalues of $\rho_{A'}$ in $[2^{-2\log|A'|}, 0]$, $P^{Q}_{A'}$ projects on the eigenvalues of $\rho_{A'}$ in $[2^{-Q}, 2^{-2\log|A'|}]$ and for $i = 0, 1, \ldots, (Q-1)$, $P^i_{A'}$ projects on the eigenvalues of $\rho_{A'}$ in $[2^{-i}, 2^{-(i+1)}]$.
7 Note that it is not possible to get such a superposition starting from any amount of maximally entangled states only using local operations. This problem is known as entanglement spread and is discussed in [10].
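The partition of the spectrum of $\rho_{A'}$ into dyadic intervals can be sketched as follows. The code is an illustration under assumptions that are not fixed by the text: the intervals are taken half-open, $(2^{-(i+1)}, 2^{-i}]$, so that every eigenvalue lands in exactly one bin, $Q$ is rounded up to an integer, and the function name is hypothetical.

```python
import math

def dyadic_bins(eigenvalues, dim):
    """Sort eigenvalues into the index set {0, ..., Q+1} used for the projectors."""
    two_log_dim = 2 * math.log2(dim)
    q = math.ceil(two_log_dim - 1)           # Q, assumed rounded up to an integer
    cutoff = 2.0 ** (-two_log_dim)           # equals dim**-2
    bins = [[] for _ in range(q + 2)]
    for lam in eigenvalues:
        if lam <= cutoff:
            bins[q + 1].append(lam)          # discarded tail, index Q+1
        else:
            i = math.floor(-math.log2(lam))  # lam lies in (2^-(i+1), 2^-i]
            bins[min(q, max(0, i))].append(lam)
    return bins

# Example with |A'| = 4: cutoff 1/16, Q = 3, five bins in total.
bins = dyadic_bins([0.6, 0.3, 0.06, 0.04], dim=4)
tail_weight = sum(bins[-1])                  # plays the role of p_{Q+1}
```

Since at most `dim` eigenvalues can fall below the cutoff `dim**-2`, the weight of the last bin is at most `1/dim`, which is the source of the error term $|A'|^{-1/2}$ in the purified distance.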
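Furthermore let $p_i = \mathrm{tr}[P^i_{A'}\rho_{A'}]$, $\rho^i_{AA'R} = |\rho^i\rangle\langle\rho^i|_{AA'R}$ with $|\rho^i\rangle_{AA'R} = p_i^{-1/2} \cdot P^i_{A'}|\rho\rangle_{AA'R}$ and define the state $\bar{\rho}_{AA'R} = |\bar{\rho}\rangle\langle\bar{\rho}|_{AA'R}$ with

$$|\bar{\rho}\rangle_{AA'R} = \Upsilon^{-1/2} \cdot \sum_{i=0}^{Q} \sqrt{p_i}\,|\rho^i\rangle_{AA'R},$$

where $\Upsilon = \sum_{i=0}^{Q} p_i$. We have

$$\bar{\rho}_{AA'R} \approx_{|A'|^{-1/2}} \rho_{AA'R} \tag{25}$$

as can be seen as follows. We have

$$P(\bar{\rho}_{AA'R}, \rho_{AA'R}) = \sqrt{1 - F^2(\bar{\rho}_{AA'R}, \rho_{AA'R})} = \sqrt{1 - |\langle\bar{\rho}|\rho\rangle_{AA'R}|^2} = \sqrt{1 - \sum_{i=0}^{Q} p_i} = \sqrt{p_{Q+1}}.$$

But because at most $|A'|$ eigenvalues of $\rho_{A'}$ can lie in $[2^{-2\log|A'|}, 0]$, each one smaller or equal to $2^{-2\log|A'|}$, we obtain $p_{Q+1} \le |A'| \cdot 2^{-2\log|A'|} = |A'|^{-1}$ and hence $P(\bar{\rho}_{AA'R}, \rho_{AA'R}) \le |A'|^{-1/2}$.

We proceed by defining the operations that we need for the Quantum State Splitting protocol with embezzling states for $\bar{\rho}_{AA'R}$ (cf. Fig. 2). Define the isometry

$$W_{A'\to A'I_A} = \sum_{i \in I} P^i_{A'} \otimes |i\rangle_{I_A}, \tag{26}$$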
where the vectors $|i\rangle_{I_A}$ are mutually orthogonal and $I_A$ is at Alice's side.

We want to use the $\varepsilon$-error Quantum State Splitting protocol with maximally entangled states from Lemma 3.5 for each $\rho^i_{AA'R}$. For each $i = 0, 1, \ldots, Q$ this protocol has a quantum communication cost of

$$q_i = \frac{1}{2}\bigl(H_0(A')_{\rho^i} - H_{\min}(A'|R)_{\rho^i}\bigr) + 2 \cdot \log\frac{1}{\varepsilon}$$

and an entanglement cost of

$$e_i = \frac{1}{2}\bigl(H_0(A')_{\rho^i} + H_{\min}(A'|R)_{\rho^i}\bigr) - 2 \cdot \log\frac{1}{\varepsilon}. \tag{27}$$

For $A_1$ on Alice's side, $B_1$ on Bob's side and $A_1^i$, $B_1^i$ $2^{e_i}$-dimensional subspaces of $A_1$, $B_1$ respectively, the Quantum State Splitting protocol from Lemma 3.5 has the following form: apply the isometry $V^i_{AA'A_1^i\to AB_2^i}$ on Alice's side, send $B_2^i$ from Alice to Bob (for a quantum communication cost of $q_i$) and then apply the isometry $U^i_{B_1^iB_2^i\to B}$ on Bob's side.

As a next ingredient to the protocol, we define the isometries that supply the maximally entangled states of size $e_i$. For $i = 0, 1, \ldots, Q$, let $X^i_{A_{\mathrm{emb}}\to A_{\mathrm{emb}}A_1^i}$ and $X^i_{B_{\mathrm{emb}}\to B_{\mathrm{emb}}B_1^i}$ be the isometries at Alice's and Bob's side respectively, that embezzle,
with accuracy $\delta \cdot e_i$, a maximally entangled state of dimension $e_i$ out of the embezzling state and put it in $A_1^iB_1^i$.

We are now ready to put the isometries together and give the protocol for Quantum State Splitting with embezzling states for $\bar{\rho}_{AA'R}$ (cf. Fig. 2). Alice applies the isometry $W_{A'\to A'I_A}$ followed by the isometry

$$X_{A_{\mathrm{emb}}I_A\to A_{\mathrm{emb}}A_1I_A} = \sum_{i=1}^{Q} X^i_{A_{\mathrm{emb}}\to A_{\mathrm{emb}}A_1^i} \otimes |i\rangle\langle i|_{I_A}$$

and the isometry

$$V_{AA'A_1I_A\to AB_2I_A} = \sum_{i=0}^{Q} V^i_{AA'A_1^i\to AB_2^i} \otimes |i\rangle\langle i|_{I_A}.$$

Afterwards she sends $I_A$ and $B_2$, that is

$$q = \max_i \Bigl[\frac{1}{2}\bigl(H_0(A')_{\rho^i} - H_{\min}(A'|R)_{\rho^i}\bigr) + 2 \cdot \log\frac{1}{\varepsilon}\Bigr] + \log\lceil 2 \cdot \log|A'| \rceil$$

qubits to Bob (where we rename $I_A$ to $I_B$). Then Bob applies the isometry

$$X_{B_{\mathrm{emb}}I_B\to B_{\mathrm{emb}}B_1I_B} = \sum_{i=1}^{Q} X^i_{B_{\mathrm{emb}}\to B_{\mathrm{emb}}B_1^i} \otimes |i\rangle\langle i|_{I_B}$$

followed by the isometry

$$U_{B_1B_2I_B\to BI_B} = \sum_{i=0}^{Q} U^i_{B_1^iB_2^i\to B} \otimes |i\rangle\langle i|_{I_B}. \tag{28}$$

Next we analyze what the resulting state looks like. By the definition of embezzling states (cf. Definition 3.6 and Remark 3.8), the monotonicity of the purified distance (Lemma A.2) and the triangle inequality for the purified distance, we obtain a state $\sigma_{ABRI_B} = |\sigma\rangle\langle\sigma|_{ABRI_B}$ with

$$|\sigma\rangle_{ABRI_B} = \Upsilon^{-1/2} \cdot \sum_{i=0}^{Q} \sqrt{p_i}\,|\tilde{\rho}^i\rangle_{ABR} \otimes |i\rangle_{I_B},$$

where $|\tilde{\rho}^i\rangle\langle\tilde{\rho}^i|_{ABR} = \tilde{\rho}^i_{ABR} \approx_{\varepsilon + \delta \cdot e_i} \rho^i_{ABR}$ and $\rho^i_{ABR} = (\mathcal{I}_{A'\to B} \otimes \mathcal{I}_{AR})\rho^i_{AA'R}$ for $i = 0, 1, \ldots, Q$. The state $\sigma_{ABRI_B}$ is close to the state $\omega_{ABRI_B} = |\omega\rangle\langle\omega|_{ABRI_B}$ with

$$|\omega\rangle_{ABRI_B} = \Upsilon^{-1/2} \cdot \sum_{i=0}^{Q} \sqrt{p_i}\,|\rho^i\rangle_{ABR} \otimes |i\rangle_{I_B},$$

as can be seen as follows. Because we can assume without loss of generality that all $\langle\tilde{\rho}^i|\rho^i\rangle$ are real and nonnegative,⁸ we obtain

⁸ This can be done by multiplying the isometries $U^i_{B_1^iB_2^i\to B}$ in (28) with appropriately chosen phase factors.
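$$
\begin{aligned}
P(\sigma_{ABRI_B}, \omega_{ABRI_B}) &= \sqrt{1 - F^2(\sigma_{ABRI_B}, \omega_{ABRI_B})} = \sqrt{1 - |\langle\sigma|\omega\rangle_{ABRI_B}|^2} \\
&= \sqrt{1 - \Bigl|\frac{1}{\Upsilon} \cdot \sum_{i=0}^{Q} p_i \langle\tilde{\rho}^i|\rho^i\rangle_{ABR}\Bigr|^2} \\
&= \sqrt{1 - \Bigl(\frac{1}{\Upsilon} \cdot \sum_{i=0}^{Q} p_i \langle\tilde{\rho}^i|\rho^i\rangle_{ABR}\Bigr)^2} = \sqrt{1 - \Bigl(\frac{1}{\Upsilon} \cdot \sum_{i=0}^{Q} p_i F\bigl(\tilde{\rho}^i_{ABR}, \rho^i_{ABR}\bigr)\Bigr)^2} \\
&= \sqrt{1 - \Bigl(\frac{1}{\Upsilon} \cdot \sum_{i=0}^{Q} p_i \sqrt{1 - P^2\bigl(\tilde{\rho}^i_{ABR}, \rho^i_{ABR}\bigr)}\Bigr)^2} \\
&\le \sqrt{1 - \Bigl(\frac{1}{\Upsilon} \cdot \sum_{i=0}^{Q} p_i \sqrt{1 - (\varepsilon + \delta \cdot e_i)^2}\Bigr)^2} \\
&\le \sqrt{1 - \Bigl(\frac{1}{\Upsilon} \cdot \sum_{i=0}^{Q} p_i \sqrt{1 - (\varepsilon + \delta \cdot \max_i e_i)^2}\Bigr)^2} \\
&= \varepsilon + \delta \cdot \max_i e_i \le \varepsilon + \delta \cdot \log|A'|,
\end{aligned}
$$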
where the last inequality follows from (27).

To decode the state $\sigma_{ABRI_B}$ to a state that is $(\varepsilon + \delta \cdot \log|A'|)$-close to $\bar{\rho}_{ABR}$, we define the isometry $W_{B\to BI_B}$ analogously to $W_{A'\to A'I_A}$ in (26). Because all isometries are injective, we can define an inverse of $W$ on the image of $W$ (which we denote by $\mathrm{Im}(W)$). The inverse is again an isometry and we denote it by $W^{-1}_{\mathrm{Im}(W)\to B}$.

The last step of the protocol is then to apply the CPTP map to the state $\sigma_{ABRI_B}$, that first does a measurement on $BI_B$ to decide whether $\sigma_{BI_B} \in \mathrm{Im}(W)$ or not and then, if $\sigma_{BI_B} \in \mathrm{Im}(W)$, applies the isometry $W^{-1}_{\mathrm{Im}(W)\to B}$ and otherwise maps the state to $|0\rangle\langle 0|_B$. By the monotonicity of the purified distance (Lemma A.2) we finally get a state that is $(\varepsilon + \delta \cdot \log|A'|)$-close to $\bar{\rho}_{ABR}$. Hence we showed the existence of an $(\varepsilon + \delta \cdot \log|A'|)$-error Quantum State Splitting protocol with embezzling states for $\bar{\rho}_{AA'R}$ with a quantum communication cost of

$$q = \max_i \Bigl[\frac{1}{2}\bigl(H_0(A')_{\rho^i} - H_{\min}(A'|R)_{\rho^i}\bigr) + 2 \cdot \log\frac{1}{\varepsilon}\Bigr] + \log\lceil 2 \cdot \log|A'| \rceil, \tag{29}$$

where $i \in \{0, 1, \ldots, Q\}$. But by the monotonicity of the purified distance (Lemma A.2), (25) and the triangle inequality for the purified distance, this implies the existence of an $(\varepsilon + \delta \cdot \log|A'| + |A'|^{-1/2})$-error Quantum State Splitting protocol with embezzling states for $\rho_{AA'R}$ with a quantum communication cost as in (29).

We now proceed with simplifying the expression for the quantum communication cost (29). We have $H_0(A')_{\rho^i} \le H_{\min}(A')_{\rho^i} + 1$ for $i = 0, 1, \ldots, Q$ as can be seen as follows. We have

$$2^{-(i+1)} \le \lambda_{\min}(\rho^i_{A'}) \le \frac{1}{\mathrm{rank}(\rho^i_{A'})} \le \|\rho^i_{A'}\|_\infty \le 2^{-i},$$
where $\lambda_{\min}(\rho^i_{A'})$ denotes the smallest non-zero eigenvalue of $\rho^i_{A'}$. Thus $\mathrm{rank}(\rho^i_{A'}) \le 2^{i+1} = 2^i \cdot 2 \le \frac{2}{\|\rho^i_{A'}\|_\infty}$, and this is equivalent to the claim.
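The inequality $H_0 \le H_{\min} + 1$ only uses the fact that, within one dyadic interval, the largest and the smallest non-zero eigenvalue differ by at most a factor of 2. The following sketch checks this numerically for a sample spectrum taken from a single interval and renormalized; the function names and the sample values are illustrative assumptions.

```python
import math

def h0(probs):
    """H0 = log2 of the rank (number of non-zero eigenvalues)."""
    return math.log2(sum(1 for p in probs if p > 0))

def h_min(probs):
    """Hmin = -log2 of the largest eigenvalue."""
    return -math.log2(max(probs))

def normalize(values):
    """Rescale a list of non-negative weights to unit trace."""
    total = sum(values)
    return [v / total for v in values]

# Eigenvalues from the interval [2^-4, 2^-3], renormalized to a state.
bucket = normalize([0.125, 0.1, 0.07, 0.0625])
gap = h_min(bucket) + 1 - h0(bucket)   # non-negative whenever max <= 2 * min
```

The check does not depend on the normalization: the rank times the smallest eigenvalue is at most the trace, so a bounded ratio between the extreme eigenvalues directly bounds the rank in terms of the operator norm.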
Hence we get an $(\varepsilon + \delta \cdot \log|A'| + |A'|^{-1/2})$-error Quantum State Splitting protocol with embezzling states for $\rho_{AA'R}$ with a quantum communication cost of

$$q = \max_i \Bigl[\frac{1}{2}\bigl(H_{\min}(A')_{\rho^i} - H_{\min}(A'|R)_{\rho^i} + 1\bigr) + 2 \cdot \log\frac{1}{\varepsilon}\Bigr] + \log\lceil 2 \cdot \log|A'| \rceil.$$

Using a lower bound for the max-information in terms of min-entropies (Lemma B.13) and the behavior of the max-information under projective measurements (Corollary B.19) we can simplify this to

$$
\begin{aligned}
q &\le \max_i \Bigl[\frac{1}{2} I_{\max}(A':R)_{\rho^i} + 2 \cdot \log\frac{1}{\varepsilon} + \frac{1}{2}\Bigr] + \log\lceil 2 \cdot \log|A'| \rceil \\
&\le \frac{1}{2} I_{\max}(A':R)_\rho + 2 \cdot \log\frac{1}{\varepsilon} + \frac{1}{2} + \log\lceil 2 \cdot \log|A'| \rceil.
\end{aligned}
$$

It is then easily seen that

$$q \le \frac{1}{2} I_{\max}(A':R)_\rho + 2 \cdot \log\frac{1}{\varepsilon} + 4 + \log\log|A'|.$$

As the last step, we transform the max-information term in the formula for the quantum communication cost into a smooth max-information. Namely, we can reduce the quantum communication cost if we do not apply the protocol as described above to the state $\rho_{AA'R}$, but pretend that we have another (possibly subnormalized) state $\hat{\rho}_{AA'R}$ that is $\varepsilon'$-close to $\rho_{AA'R}$ and then apply the protocol for $\hat{\rho}_{AA'R}$. By the monotonicity of the purified distance (Lemma A.2), the additional error term that we get from this is upper bounded by $\varepsilon'$ and by the triangle inequality for the purified distance this results in an accuracy of $\varepsilon + \varepsilon' + \delta \cdot \log|A'| + |A'|^{-1/2}$. But if we minimize $q$ over all $\hat{\rho}_{AA'R}$ that are $\varepsilon'$-close to $\rho_{AA'R}$, we can reduce the quantum communication cost to

$$q \le \frac{1}{2} I^{\varepsilon'}_{\max}(A':R)_\rho + 2 \cdot \log\frac{1}{\varepsilon} + 4 + \log\log|A'|. \tag{30}$$
This shows the existence of an $(\varepsilon + \varepsilon' + \delta \cdot \log|A'| + |A'|^{-1/2})$-error Quantum State Splitting protocol with embezzling states for $\rho_{AA'R}$ for a quantum communication cost as in (30).

The following theorem shows that the quantum communication cost in Theorem 3.10 is optimal up to small additive terms.

Theorem 3.11. Let $\varepsilon > 0$ and $\rho_{AA'R} = |\rho\rangle\langle\rho|_{AA'R} \in \mathcal{S}_{\le}(\mathcal{H}_{AA'R})$. Then the quantum communication cost for any $\varepsilon$-error Quantum State Splitting protocol⁹ for $\rho_{AA'R}$ is lower bounded by

$$q \ge \frac{1}{2} I^{\varepsilon}_{\max}(A':R)_\rho. \tag{31}$$
9 We suppress the mentioning of any entanglement resource, since the statement holds independently of it.
Proof. We look at the correlations between Bob and the reference by analyzing the max-information that the reference has about Bob. At the beginning of any protocol, there is no register at Bob's side and therefore the max-information that the reference has about Bob is zero. Since back communication is not allowed, we can assume that the protocol for Quantum State Splitting has the following form: applying local operations at Alice's side, sending qubits from Alice to Bob and then applying local operations at Bob's side. Local operations at Alice's side have no influence on the max-information that the reference has about Bob. By sending $q$ qubits from Alice to Bob, the max-information that the reference has about Bob can increase, but at most by $2q$ (Lemma B.10). By applying local operations at Bob's side the max-information that the reference has about Bob can only decrease (Lemma B.17). So the max-information that the reference has about Bob is upper bounded by $2q$. Therefore, any state $\omega_{BR}$ at the end of a Quantum State Splitting protocol must satisfy $I_{\max}(B:R)_\omega \le 2q$. But we also need $\omega_{BR} \approx_{\varepsilon} \rho_{BR} \equiv (\mathcal{I}_{A'\to B} \otimes \mathcal{I}_R)(\rho_{A'R})$ by the definition of $\varepsilon$-error Quantum State Splitting (Definition 3.9). Using the definition of the smooth max-information, we get

$$q \ge \frac{1}{2} I^{\varepsilon}_{\max}(A':R)_\rho.$$
4. The Quantum Reverse Shannon Theorem

This section contains the main result, a proof of the Quantum Reverse Shannon Theorem. The intuition is as follows. Let $\mathcal{E}_{A\to B}$ be a quantum channel with

$$\mathcal{E}_{A\to B} : \mathcal{S}_{=}(\mathcal{H}_A) \to \mathcal{S}_{=}(\mathcal{H}_B), \qquad \rho_A \mapsto \mathcal{E}_{A\to B}(\rho_A),$$

where we want to think of subsystem $A$ being at Alice's side and subsystem $B$ being at Bob's side. The Quantum Reverse Shannon Theorem states that if Alice and Bob share embezzling states, they can asymptotically simulate $\mathcal{E}_{A\to B}$ only using local operations at Alice's side, local operations at Bob's side, and a classical communication rate (from Alice to Bob) of

$$C_E = \max_{\Phi} I(B:R)_{(\mathcal{E} \otimes \mathcal{I})(\Phi)},$$

where $\Phi_{AR}$ is a purification of $\rho_A$ and we note that $I(B:R)_{(\mathcal{E} \otimes \mathcal{I})(\Phi)} = H(R)_\rho + H(B)_{\mathcal{E}(\rho)} - H(BR)_{(\mathcal{E} \otimes \mathcal{I})(\Phi)}$.

Using Stinespring's dilation [36], we can think of $\mathcal{E}_{A\to B}$ as

$$\mathcal{E}_{A\to B}(\rho_A) = \mathrm{tr}_C\bigl[(U_{A\to BC})\,\rho_A\,(U_{A\to BC})^{\dagger}\bigr], \tag{32}$$

where $C$ is an additional register with $|C| \le |A||B|$ and $U_{A\to BC}$ some isometry. The idea of our proof is to first simulate the quantum channel locally at Alice's side, resulting in $\rho_{BC} = (U_{A\to BC})\rho_A(U_{A\to BC})^{\dagger}$, and then use Quantum State Splitting with embezzling states (Theorem 3.10) to do an optimal state transfer of the $B$-part to Bob's side, such that he holds $\rho_B = \mathcal{E}_{A\to B}(\rho_A)$ in the end. Note that we can replace the quantum communication in the Quantum State Splitting protocol by twice as much classical communication, since we have free entanglement and can therefore use quantum teleportation [47]. Although the free entanglement is given in the form of embezzling states, maximally entangled states can be created without any (additional) communication (Definition 3.6).
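To make the formula for $C_E$ concrete, the following sketch evaluates $I(B:R) = H(R) + H(B) - H(BR)$ for a simple example that is not taken from the text: a qubit depolarizing channel $\rho \mapsto (1-p)\rho + p \cdot \mathbb{1}/2$ acting on one half of a maximally entangled pair. The output state then has eigenvalues $1 - 3p/4$ (once) and $p/4$ (three times), and both marginals are maximally mixed. For this highly symmetric channel the maximally entangled input is known to attain the maximum, so the value can be read as $C_E$; the function names are illustrative.

```python
import math

def entropy(probs):
    """Entropy in bits of a list of eigenvalues."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def depolarizing_mutual_information(p):
    """I(B:R) after sending half of an ebit through a qubit depolarizing channel."""
    joint_spectrum = [1 - 3 * p / 4, p / 4, p / 4, p / 4]
    h_r = 1.0  # reference qubit stays maximally mixed
    h_b = 1.0  # channel output is maximally mixed as well
    return h_r + h_b - entropy(joint_spectrum)
```

For `p = 0` the channel is the identity and the function returns 2 bits per channel use, the dense coding rate; for `p = 1` the output is independent of the input and the value drops to 0. Halving these numbers gives the corresponding qubit rate when teleportation is run in reverse, in line with the factor of two mentioned above.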
More formally, we make the following definitions:

Definition 4.1. Consider a bipartite system with parties Alice and Bob. Let $\varepsilon \ge 0$ and $\mathcal{E} : \mathcal{L}(\mathcal{H}_A) \to \mathcal{L}(\mathcal{H}_B)$ be a CPTP map, where Alice controls $\mathcal{H}_A$ and Bob $\mathcal{H}_B$. A CPTP map $\mathcal{P}$ is a one-shot reverse Shannon simulation for $\mathcal{E}$ with error $\varepsilon$ if it consists of applying local operations at Alice's side, local operations at Bob's side, sending $c$ classical bits from Alice to Bob, using a $\delta$-ebit embezzling state for some $\delta > 0$, and

$$\|\mathcal{P} - \mathcal{E}\|_{\diamond} \le \varepsilon, \tag{33}$$

where $\|\cdot\|_{\diamond}$ denotes the diamond norm (Definition D.1). $c$ is called classical communication cost of the one-shot reverse Shannon simulation.

Definition 4.2. Let $\mathcal{E} : \mathcal{L}(\mathcal{H}_A) \to \mathcal{L}(\mathcal{H}_B)$ be a CPTP map. An asymptotic reverse Shannon simulation for $\mathcal{E}$ is a sequence of one-shot reverse Shannon protocols $\mathcal{P}^n$ for $\mathcal{E}^{\otimes n}$ with error $\varepsilon_n$, such that $\lim_{n\to\infty} \varepsilon_n = 0$. The classical communication cost $c_n$ of this simulation is $\limsup_{n\to\infty} \frac{c_n}{n} = c$.

A precise statement of the Quantum Reverse Shannon Theorem is now as follows.

Theorem 4.3. Let $\mathcal{E}_{A\to B} : \mathcal{L}(\mathcal{H}_A) \to \mathcal{L}(\mathcal{H}_B)$ be a CPTP map. Then the minimal classical communication cost $C_{\mathrm{QRST}}$ of asymptotic reverse Shannon simulations for $\mathcal{E}_{A\to B}$ is equal to the entanglement assisted classical capacity $C_E$ of $\mathcal{E}_{A\to B}$. That is

$$C_{\mathrm{QRST}} = \max_{\Phi} I(B:R)_{(\mathcal{E} \otimes \mathcal{I})(\Phi)}, \tag{34}$$

where $\Phi_{AR} = |\Phi\rangle\langle\Phi|_{AR} \in \mathcal{S}_{=}(\mathcal{H}_{AR})$ is a purification of the input state $\rho_A \in \mathcal{S}_{=}(\mathcal{H}_A)$.¹⁰

Proof. First note that by the entanglement assisted classical capacity theorem $C_{\mathrm{QRST}} \ge C_E$ [2].¹¹ Hence it remains to show that $C_{\mathrm{QRST}} \le C_E$.

¹⁰ Since all purifications give the same amount of entropy, we do not need to specify which one we use.
¹¹ Assume that $C_{\mathrm{QRST}} \le C_E - \delta$ for some $\delta > 0$ and start with a perfect identity channel $\mathcal{I}_{A\to B}$. Then we could use $C_{\mathrm{QRST}} \le C_E - \delta$ together with the entanglement assisted classical capacity theorem to asymptotically simulate the perfect identity channel at a rate $\frac{C_E}{C_E - \delta} > 1$; a contradiction to Holevo's theorem [3,46].

We start by making some general statements about the structure of the proof, and then dive into the technical arguments.

Because the Quantum Reverse Shannon Theorem makes an asymptotic statement, we have to make our considerations for a general $n \in \mathbb{N}$. Thus the goal is to show the existence of a one-shot reverse Shannon simulation $\mathcal{P}^n_{A\to B}$ for $\mathcal{E}^{\otimes n}_{A\to B}$ that is arbitrarily close to $\mathcal{E}^{\otimes n}_{A\to B}$ for $n \to \infty$, has a classical communication rate of $C_E$ and works for any input. We do this by using Quantum State Splitting with embezzling states (Theorem 3.10), quantum teleportation [47] and the Post-Selection Technique (Proposition D.4).

Any hypothetical map $\mathcal{P}^n_{A\to B}$ (that we may want to use for the simulation of $\mathcal{E}^{\otimes n}_{A\to B}$) can be made to act symmetrically on the $n$-partite input system $\mathcal{H}_A^{\otimes n}$ by inserting a symmetrization step. This works as follows. First Alice and Bob generate some shared randomness by generating maximally entangled states from the embezzling states and measuring their part in the same computational basis (for $n$ large, $O(n \log n)$ maximally entangled states are needed). Then, before the original map $\mathcal{P}^n_{A\to B}$ starts, Alice applies a random permutation $\pi$ on the input system chosen according to the shared randomness. Afterwards they run the map $\mathcal{P}^n_{A\to B}$ and then, in the end, Bob undoes the permutation
Fig. 3. (a) $X$ is the map that embezzles $m$ maximally entangled states $|\phi_m\rangle\langle\phi_m|_{A_{ebit}B_{ebit}}$ out of $\mu_{A_{emb}B_{emb}}$. These maximally entangled states are then used in the protocol. (b) The whole map that should simulate $\mathcal{E}^{\otimes n}_{A\to B}$ takes $\rho^n_A \otimes \mu_{A_{emb}B_{emb}} \otimes |\phi_m\rangle\langle\phi_m|_{A_{ebit}B_{ebit}}$ with $\rho^n_A \in \mathcal{S}_{=}(\mathcal{H}_A^{\otimes n})$ as an input. But since this input is constant on all registers except for $A$, we can think of the map as in (c), namely as a CPTP map $\mathcal{P}^n_{A\to B}$ which takes only the input $\rho^n_A$.
by applying $\pi^{-1}$ on the output system. From this we obtain a permutation invariant version of $\mathcal{P}^n_{A\to B}$. Since the maximally entangled states can only be created with finite precision, the shared randomness, and therefore the permutation invariance, is not perfect. However, as we will argue at the end, this imperfection can be made arbitrarily small and can therefore be neglected.

Note that the simulation will need embezzling states $\mu_{A_{emb}B_{emb}}$ and maximally entangled states $|\phi_m\rangle\langle\phi_m|_{A_{ebit}B_{ebit}}$ (for the quantum teleportation step and to assure the permutation invariance). But since the input on these registers is fixed, we are allowed to think of the simulation as a map $\mathcal{P}^n_{A\to B}$, see Fig. 3.

Let $\beta > 0$. Our aim is to show the existence of a map $\mathcal{P}^n_{A\to B}$ that consists of applying local operations at Alice’s side, local operations at Bob’s side, sending classical bits from Alice to Bob at a rate of $C_E$, and such that
\[ \big\| \mathcal{E}^{\otimes n}_{A\to B} - \mathcal{P}^n_{A\to B} \big\|_\diamond \le \beta. \tag{35} \]
Because we assume that the map $\mathcal{P}^n_{A\to B}$ is permutation invariant, we are allowed to use the Post-Selection Technique (Proposition D.4). Thus (35) relaxes to
\[ \big\| \big( (\mathcal{E}^{\otimes n}_{A\to B} - \mathcal{P}^n_{A\to B}) \otimes \mathcal{I}_{RR'} \big)(\zeta^n_{ARR'}) \big\|_1 \le \beta (n+1)^{-(|A|^2-1)}, \tag{36} \]
where $\zeta^n_{ARR'}$ is a purification of $\zeta^n_{AR} = \int \omega_{AR}^{\otimes n}\, d(\omega_{AR})$, $\omega_{AR} = |\omega\rangle\langle\omega|_{AR} \in \mathcal{S}_{=}(\mathcal{H}_{AR})$ and $d(\cdot)$ is the measure on the normalized pure states on $\mathcal{H}_{AR}$ induced by the Haar measure on the unitary group acting on $\mathcal{H}_{AR}$, normalized to $\int d(\cdot) = 1$.

To show (36), we consider a local simulation of the channel $\mathcal{E}^{\otimes n}_{A\to B}$ at Alice’s side (using Stinespring’s dilation as in (32)) followed by Quantum State Splitting with embezzling states. Applied to the de Finetti type input state $\zeta^n_{ARR'}$, we obtain the state
\[ \zeta^n_{BCRR'} = (U^n_{A\to BC} \otimes 1_{RR'})\, \zeta^n_{ARR'}\, (U^n_{A\to BC} \otimes 1_{RR'})^\dagger. \]
As described above, this map can be made permutation invariant (cf. Fig. 4).
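The symmetrization step described above (permute the inputs, run the map, undo the permutation on the outputs) can be illustrated with a purely classical analogue. The sketch below is not the quantum protocol: `symmetrize`, `protocol` and `rng` are hypothetical names introduced for illustration, and the shared randomness is modelled by an ordinary pseudo-random generator.

```python
import random


def symmetrize(protocol, rng):
    """Wrap `protocol` (a function on lists) with a random permutation step.

    Classical analogue only: `rng` stands in for the shared randomness that
    Alice and Bob obtain from measuring maximally entangled states.
    """
    def symmetrized(inputs):
        n = len(inputs)
        perm = list(range(n))
        rng.shuffle(perm)                       # shared random permutation pi
        permuted = [inputs[perm[i]] for i in range(n)]  # Alice applies pi
        outputs = protocol(permuted)            # run the original map
        restored = [None] * n
        for i in range(n):
            restored[perm[i]] = outputs[i]      # Bob applies pi^{-1}
        return restored
    return symmetrized


def elementwise(xs):
    """A map that already treats all positions alike."""
    return [2 * x for x in xs]


def biased(xs):
    """A map that singles out position 0."""
    return [-xs[0]] + list(xs[1:])


rng = random.Random(0)
sym_elementwise = symmetrize(elementwise, rng)
sym_biased = symmetrize(biased, rng)
```

For a map that already treats all positions alike, the wrapper changes nothing; for a position-dependent map, the special position is spread uniformly over all inputs, which is the classical counterpart of permutation invariance.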
Fig. 4. A schematic description of the protocol that is used to prove the Quantum Reverse Shannon Theorem. The channel simulation is done for the de Finetti type input state $\zeta^n_{ARR'}$. Because our simulation is permutation invariant, the Post-Selection Technique (Proposition D.4) shows that this is also sufficient for all input states. The whole simulation is called $\mathcal{P}^n$ in the text. (i) and (ii) denote the subroutines of Quantum State Splitting with embezzling states and quantum teleportation; with local operations on Alice’s and Bob’s side and a classical communication rate of $c_n$.
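Now we use this map as $(\mathcal{P}^n_{A\to B} \otimes \mathcal{I}_{RR'})$ in (36).$^{12}$ We obtain from the achievability of Quantum State Splitting with embezzling states (Theorem 3.10) that
\[ P\big( (\mathcal{E}^{\otimes n}_{A\to B} \otimes \mathcal{I}_{RR'})(\zeta^n_{ARR'}),\ (\mathcal{P}^n_{A\to B} \otimes \mathcal{I}_{RR'})(\zeta^n_{ARR'}) \big) \le \varepsilon + \varepsilon' + \delta n \cdot \log|B| + |B|^{-n/2}, \]
for a quantum communication cost of
\[ q_n \le \frac{1}{2} I^{\varepsilon}_{\max}(B:RR')_{(\mathcal{E}^{\otimes n}\otimes\mathcal{I})(\zeta^n)} + 2\cdot\log\frac{1}{\varepsilon'} + 4 + \log n + \log\log|B|. \tag{37} \]
Because the trace distance is upper bounded by two times the purified distance (Lemma A.1), this implies
\[ \big\| \big( (\mathcal{E}^{\otimes n}_{A\to B} - \mathcal{P}^n_{A\to B}) \otimes \mathcal{I}_{RR'} \big)(\zeta^n_{ARR'}) \big\|_1 \le 2\big(\varepsilon + \varepsilon' + \delta n \cdot \log|B| + |B|^{-n/2}\big). \]
By choosing $\varepsilon' = \varepsilon$ and $\delta = \frac{\varepsilon}{n\cdot\log|B|}$ we obtain
\[ \big\| \big( (\mathcal{E}^{\otimes n}_{A\to B} - \mathcal{P}^n_{A\to B}) \otimes \mathcal{I}_{RR'} \big)(\zeta^n_{ARR'}) \big\|_1 \le 6\varepsilon + 2\cdot|B|^{-n/2}. \]

Footnote 12. So far this map needs quantum communication, but we are going to replace this by classical communication shortly.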
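The parameter choice above can be checked numerically. The following sketch assumes base-2 logarithms and uses illustrative values for $n$, $|B|$ and $\varepsilon$ that do not come from the paper; the function names are hypothetical. It confirms that setting $\varepsilon' = \varepsilon$ and $\delta = \varepsilon/(n \log|B|)$ collapses the bound $2(\varepsilon + \varepsilon' + \delta n \log|B| + |B|^{-n/2})$ to $6\varepsilon + 2|B|^{-n/2}$.

```python
import math


def trace_distance_bound(eps, eps_prime, delta, n, dim_b):
    """2 * (eps + eps' + delta * n * log|B| + |B|^(-n/2)), logs in base 2."""
    return 2 * (eps + eps_prime + delta * n * math.log2(dim_b)
                + dim_b ** (-n / 2))


def simplified_bound(eps, n, dim_b):
    """6 * eps + 2 * |B|^(-n/2), the form obtained after the substitution."""
    return 6 * eps + 2 * dim_b ** (-n / 2)


# Illustrative values (assumptions, not taken from the text).
n, dim_b, eps = 50, 2, 1e-3
delta = eps / (n * math.log2(dim_b))

lhs = trace_distance_bound(eps, eps, delta, n, dim_b)
rhs = simplified_bound(eps, n, dim_b)
```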
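Furthermore we choose $\varepsilon = \frac{1}{6}\beta(n+1)^{-(|A|^2-1)} - \frac{1}{3}|B|^{-n/2}$ (for large enough $n$) and hence
\[ \big\| \big( (\mathcal{E}^{\otimes n}_{A\to B} - \mathcal{P}^n_{A\to B}) \otimes \mathcal{I}_{RR'} \big)(\zeta^n_{ARR'}) \big\|_1 \le \beta(n+1)^{-(|A|^2-1)}. \]
This is (36) and by the Post-Selection Technique (Proposition D.4) this implies (35).

But the map $(\mathcal{P}^n_{A\to B} \otimes \mathcal{I}_{RR'})$ uses quantum communication and we are only allowed to use classical communication. It thus remains to replace the quantum communication by classical communication and to show that the classical communication rate of the resulting map is upper bounded by $C_E$.

Set $\chi = 2\cdot\log\frac{1}{\varepsilon'} + 4 + \log n + \log\log|B|$. It follows from (37) and below that the quantum communication cost of $(\mathcal{P}^n_{A\to B} \otimes \mathcal{I}_{RR'})$ is quantified by
\[ q_n \le \frac{1}{2} I^{\varepsilon}_{\max}(B:RR')_{(\mathcal{E}^{\otimes n}\otimes\mathcal{I})(\zeta^n)} + \chi. \]
We can use quantum teleportation [47] (using the maximally entangled states $|\phi_m\rangle\langle\phi_m|_{A_{ebit}B_{ebit}}$) to transform this into a classical communication cost of
\[ c_n \le I^{\varepsilon}_{\max}(B:RR')_{(\mathcal{E}^{\otimes n}\otimes\mathcal{I})(\zeta^n)} + 2\chi. \]
By the upper bound in Lemma B.12 and the fact that we can assume $|R'| \le (n+1)^{|A|^2-1}$ (Proposition D.4), we get
\[ c_n \le I^{\varepsilon}_{\max}(B:R)_{(\mathcal{E}^{\otimes n}\otimes\mathcal{I})(\zeta^n)} + 2\cdot\log|R'| + 2\chi \le I^{\varepsilon}_{\max}(B:R)_{(\mathcal{E}^{\otimes n}\otimes\mathcal{I})(\zeta^n)} + 2\cdot\log\big[(n+1)^{|A|^2-1}\big] + 2\chi. \]
By a corollary of Carathéodory’s theorem (Corollary D.6), we can write
\[ \zeta^n_{AR} = \sum_i p_i\, (\Phi^i_{AR})^{\otimes n}, \]
where $\Phi^i_{AR} = |\Phi^i\rangle\langle\Phi^i|_{AR} \in \mathcal{S}_{=}(\mathcal{H}_{AR})$, $i \in \{1, 2, \ldots, (n+1)^{2|A||R|-2}\}$ and $p_i$ a probability distribution. Using a quasi-convexity property of the smooth max-information (Lemma B.21) we then obtain
\[ c_n \le I^{\varepsilon}_{\max}(B:R)_{(\mathcal{E}^{\otimes n}\otimes\mathcal{I})(\sum_i p_i (\Phi^i)^{\otimes n})} + 2\cdot\log\big[(n+1)^{|A|^2-1}\big] + 2\chi \]
\[ \le \max_i I^{\varepsilon}_{\max}(B:R)_{[(\mathcal{E}\otimes\mathcal{I})(\Phi^i)]^{\otimes n}} + \log\big[(n+1)^{2|A||R|-2}\big] + 2\cdot\log\big[(n+1)^{|A|^2-1}\big] + 2\chi \]
\[ \le \max_{\Phi} I^{\varepsilon}_{\max}(B:R)_{[(\mathcal{E}\otimes\mathcal{I})(\Phi)]^{\otimes n}} + \log\big[(n+1)^{2|A||R|-2}\big] + 2\cdot\log\big[(n+1)^{|A|^2-1}\big] + 2\chi, \]
where the last maximum ranges over all $\Phi_{AR} = |\Phi\rangle\langle\Phi|_{AR} \in \mathcal{S}_{=}(\mathcal{H}_{AR})$. From the Asymptotic Equipartition Property for the smooth max-information (Lemma B.24) we obtain
\[ c_n \le n\cdot\max_{\Phi} I(B:R)_{(\mathcal{E}\otimes\mathcal{I})(\Phi)} + \sqrt{n}\cdot\xi(\varepsilon) - 2\cdot\log\frac{\varepsilon^2}{24} + \log\big[(n+1)^{2|A||R|-2}\big] + 2\cdot\log\big[(n+1)^{|A|^2-1}\big] + 2\chi, \]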
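where $\xi(\varepsilon) = 8\sqrt{13 - 4\cdot\log\varepsilon}\cdot(2 + \frac{1}{2}\cdot\log|A|)$. Since $\varepsilon = \frac{1}{6}\beta(n+1)^{-(|A|^2-1)} - \frac{1}{3}|B|^{-n/2}$, the classical communication rate is then upper bounded by
\[ c = \limsup_{\beta\to 0}\, \limsup_{n\to\infty} \frac{c_n}{n} \le \max_{\Phi} I(B:R)_{(\mathcal{E}\otimes\mathcal{I})(\Phi)}. \]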
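To see why the correction terms do not contribute to the rate, one can evaluate them per channel use. The sketch below assumes base-2 logarithms, takes $\varepsilon' = \varepsilon$ as chosen above, and uses illustrative qubit dimensions and a fixed $\beta$; all function names are hypothetical. The dominant correction is $\sqrt{n}\,\xi(\varepsilon)$, which grows only like $\sqrt{n \log n}$ because $\varepsilon$ shrinks polynomially in $n$.

```python
import math


def epsilon(beta, n, dim_a, dim_b):
    """eps = beta/6 * (n+1)^(-(|A|^2-1)) - |B|^(-n/2)/3 (positive for large n)."""
    return (beta / 6 * float(n + 1) ** (-(dim_a ** 2 - 1))
            - float(dim_b) ** (-n / 2) / 3)


def xi(eps, dim_a):
    """xi(eps) = 8 * sqrt(13 - 4 log eps) * (2 + 0.5 log|A|)."""
    return 8 * math.sqrt(13 - 4 * math.log2(eps)) * (2 + 0.5 * math.log2(dim_a))


def chi(eps, n, dim_b):
    """chi = 2 log(1/eps) + 4 + log n + log log|B| (with eps' = eps)."""
    return (2 * math.log2(1 / eps) + 4 + math.log2(n)
            + math.log2(math.log2(dim_b)))


def overhead_per_copy(n, beta=0.1, dim_a=2, dim_r=2, dim_b=2):
    """All terms of the bound on c_n beyond n * max I, divided by n."""
    eps = epsilon(beta, n, dim_a, dim_b)
    total = (math.sqrt(n) * xi(eps, dim_a)
             - 2 * math.log2(eps ** 2 / 24)
             + (2 * dim_a * dim_r - 2) * math.log2(n + 1)
             + 2 * (dim_a ** 2 - 1) * math.log2(n + 1)
             + 2 * chi(eps, n, dim_b))
    return total / n


overheads = [overhead_per_copy(n) for n in (10 ** 4, 10 ** 6, 10 ** 8)]
```

For these illustrative parameters the per-copy overhead decreases monotonically along the three sample sizes, consistent with the limit statement above.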
Thus it only remains to justify why it is sufficient that the maximally entangled states, which we used for the quantum teleportation step and to make the protocol permutation invariant, only have finite precision. For this, it is useful to think of the CPTP map $\mathcal{P}^n_{A\to B}$ that we constructed above, as in Fig. 3 (b). Let ε > 0 and assume that the entanglement is ε-close to the perfect input state $\mu_{A_{emb}B_{emb}} \otimes |\phi_m\rangle\langle\phi_m|_{A_{ebit}B_{ebit}}$. The purified distance is monotone (Lemma A.2) and hence the corresponding imperfect output state is ε-close to the state obtained under the assumption of perfect permutation invariance. Since ε can be made arbitrarily small (Definition 3.6), the CPTP map based on the imperfect entanglement does the job.

Acknowledgements. We thank Jürg Wullschleger and Andreas Winter for inspiring discussions and William Matthews and Debbie Leung for detailed feedback on the first version of this paper as well as for suggesting Figs. 2 and 4. MB and MC are supported by the Swiss National Science Foundation (grant PP00P2-128455) and the German Science Foundation (grants CH 843/1-1 and CH 843/2-1). RR acknowledges support from the Swiss National Science Foundation (grant No. 200021-119868). Part of this work was carried out while MB and MC were affiliated with the Faculty of Physics at the University of Munich in Germany.

Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Appendix A: Properties of the Purified Distance

The following gives lower and upper bounds to the purified distance in terms of the trace distance.

Lemma A.1 [39, Lemma 6]. Let $\rho, \sigma \in \mathcal{S}_{\le}(\mathcal{H})$. Then
\[ \frac{1}{2}\cdot\|\rho - \sigma\|_1 \le P(\rho,\sigma) \le \sqrt{\|\rho - \sigma\|_1 + |\mathrm{tr}[\rho] - \mathrm{tr}[\sigma]|}. \tag{A1} \]
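As a sanity check of Lemma A.1, the bounds can be evaluated for commuting (diagonal) states, where the fidelity reduces to a sum over eigenvalues. The sketch assumes the usual generalized fidelity for sub-normalized states, $\bar F(\rho,\sigma) = F(\rho,\sigma) + \sqrt{(1-\mathrm{tr}\rho)(1-\mathrm{tr}\sigma)}$ with $P = \sqrt{1-\bar F^2}$, and the upper bound in its square-root form; the function names and example spectra are illustrative.

```python
import math


def purified_distance(p, q):
    """Purified distance between diagonal states with eigenvalue lists p, q."""
    fid = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Extra term of the generalized fidelity for sub-normalized states.
    fid += math.sqrt(max(0.0, 1 - sum(p)) * max(0.0, 1 - sum(q)))
    return math.sqrt(max(0.0, 1 - fid * fid))


def trace_norm_difference(p, q):
    """||rho - sigma||_1 for diagonal states in the same eigenbasis."""
    return sum(abs(a - b) for a, b in zip(p, q))


def check_lemma_a1(p, q):
    """Return (lower, P, upper) so the chain lower <= P <= upper can be tested."""
    t = trace_norm_difference(p, q)
    lower = 0.5 * t
    upper = math.sqrt(t + abs(sum(p) - sum(q)))
    return lower, purified_distance(p, q), upper


normalized = check_lemma_a1([0.5, 0.3, 0.2], [0.4, 0.4, 0.2])
subnormalized = check_lemma_a1([0.5, 0.2], [0.3, 0.3])
```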
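The purified distance is monotone under CPTP maps.

Lemma A.2 [39, Lemma 7]. Let $\rho, \sigma \in \mathcal{S}_{\le}(\mathcal{H})$ and $\mathcal{E}$ be a CPTP map on $\mathcal{H}$. Then
\[ P(\mathcal{E}(\rho), \mathcal{E}(\sigma)) \le P(\rho,\sigma). \tag{A2} \]
The purified distance is convex in its arguments in the following sense.

Lemma A.3. Let $\rho_i, \sigma_i \in \mathcal{S}_{\le}(\mathcal{H})$ with $\rho_i \approx_\varepsilon \sigma_i$ for $i \in I$ and $p_i$ a probability distribution. Then
\[ \sum_{i\in I} p_i \rho_i \approx_\varepsilon \sum_{i\in I} p_i \sigma_i. \tag{A3} \]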
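Proof. Let $\rho = \sum_{i\in I} p_i \rho_i$ and $\sigma = \sum_{i\in I} p_i \sigma_i$ and define $\hat\rho = \rho \oplus (1 - \mathrm{tr}[\rho])$, $\hat\sigma = \sigma \oplus (1 - \mathrm{tr}[\sigma])$ as well as $\hat\rho_i = \rho_i \oplus (1 - \mathrm{tr}[\rho_i])$ and $\hat\sigma_i = \sigma_i \oplus (1 - \mathrm{tr}[\sigma_i])$ for all $i \in I$. By assumption we have $F(\hat\rho_i, \hat\sigma_i) \ge \sqrt{1 - \varepsilon^2}$ for all $i \in I$ and using the joint concavity of the fidelity [46] we obtain
\[ P(\rho,\sigma) = \sqrt{1 - F^2(\hat\rho, \hat\sigma)} = \sqrt{1 - F^2\Big(\sum_{i\in I} p_i \hat\rho_i, \sum_{i\in I} p_i \hat\sigma_i\Big)} \le \sqrt{1 - \Big(\sum_{i\in I} p_i F(\hat\rho_i, \hat\sigma_i)\Big)^2} \le \varepsilon. \]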
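Appendix B: Basic Properties of Smooth Entropy Measures

1. Additional definitions. Our technical claims use some auxiliary entropic quantities. For $\rho_A \in \mathcal{S}_{\le}(\mathcal{H}_A)$ we define
\[ H_{\max}(A)_\rho = 2\cdot\log \mathrm{tr}\big[\rho_A^{1/2}\big], \tag{B1} \]
\[ H_0(A)_\rho = \log \mathrm{rank}(\rho_A), \tag{B2} \]
\[ H_R(A)_\rho = -\sup\big\{\lambda \in \mathbb{R} : \rho_A \ge 2^{\lambda}\cdot\rho_A^0\big\}, \tag{B3} \]
where $H_{\max}(A)_\rho$ is called max-entropy. For $\varepsilon \ge 0$, the smooth max-entropy of $\rho_A \in \mathcal{S}_{\le}(\mathcal{H}_A)$ is defined as
\[ H^{\varepsilon}_{\max}(A)_\rho = \inf_{\bar\rho_A \in \mathcal{B}^{\varepsilon}(\rho_A)} H_{\max}(A)_{\bar\rho}. \tag{B4} \]
The conditional min-entropy of $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ relative to $\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ is given by
\[ H_{\min}(A|B)_{\rho|\sigma} = -D_{\max}(\rho_{AB} \| 1_A \otimes \sigma_B). \tag{B5} \]
The quantum conditional collision entropy of $A$ given $B$ for $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ is defined as
\[ H_C(A|B)_\rho = -\inf_{\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)} \log \mathrm{tr}\Big[\Big( (1_A \otimes \sigma_B^{-1/4})\, \rho_{AB}\, (1_A \otimes \sigma_B^{-1/4}) \Big)^2\Big], \tag{B6} \]
where the inverses are generalized inverses.$^{13}$ As in [18] we define the min-relative entropy of $\rho \in \mathcal{S}_{\le}(\mathcal{H})$ with respect to $\sigma \in \mathcal{P}(\mathcal{H})$ as
\[ D_{\min}(\rho\|\sigma) = -\log \mathrm{tr}\big[\rho^0 \sigma\big]. \tag{B7} \]

Footnote 13. For $\rho \in \mathcal{P}(\mathcal{H})$, $\rho^{-1}$ is a generalized inverse of $\rho$ if $\rho\rho^{-1} = \rho^{-1}\rho = \rho^0 = (\rho^{-1})^0$.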
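The three unconditional quantities (B1) to (B3) depend only on the spectrum of $\rho_A$, so they are easy to evaluate. The following sketch assumes base-2 logarithms and reads (B3) as $H_R(A)_\rho = -\log\lambda_{\min}(\rho_A)$, with $\lambda_{\min}$ the smallest non-zero eigenvalue (the reading used in the proof of Lemma B.14); function names and the example spectrum are illustrative.

```python
import math


def h_max(eigs):
    """Max-entropy (B1): 2 * log tr[rho^(1/2)]."""
    return 2 * math.log2(sum(math.sqrt(x) for x in eigs if x > 0))


def h_0(eigs):
    """(B2): log of the rank."""
    return math.log2(sum(1 for x in eigs if x > 0))


def h_r(eigs):
    """(B3): -sup{lambda : rho >= 2^lambda rho^0} = -log(smallest non-zero eigenvalue)."""
    return -math.log2(min(x for x in eigs if x > 0))


# Example spectrum of a normalized state with one zero eigenvalue.
eigs = [0.5, 0.25, 0.25, 0.0]
values = (h_max(eigs), h_0(eigs), h_r(eigs))
```

For this normalized example the three values are ordered as $H_{\max} \le H_0 \le H_R$.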
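2. Alternative formulas. The max-relative entropy can be written in the following alternative form.

Lemma B.4 [12, Lemma B.5.3]. Let $\rho \in \mathcal{S}_{\le}(\mathcal{H})$ and $\sigma \in \mathcal{P}(\mathcal{H})$ such that $\mathrm{supp}(\rho) \subseteq \mathrm{supp}(\sigma)$. Then
\[ D_{\max}(\rho\|\sigma) = \log \big\|\sigma^{-1/2} \rho\, \sigma^{-1/2}\big\|_\infty, \tag{B8} \]
where the inverses are generalized inverses.

Using this we can give an alternative expression for the max-information.

Lemma B.5. Let $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$. Then
\[ I_{\max}(A:B)_\rho = H_0(A)_\rho - H_{\min}(A|B)_{\rho_{B|A}}, \tag{B9} \]
where $\rho_{B|A} = (\rho_A \otimes 1_B)^{-1/2}\, \frac{\rho_{AB}}{\mathrm{rank}(\rho_A)}\, (\rho_A \otimes 1_B)^{-1/2}$ and the inverses are generalized inverses.

Proof. Without loss of generality we can restrict the minimum in the definition of the max-information to $\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ with $\mathrm{supp}(\rho_{AB}) \subseteq \mathrm{supp}(\rho_A) \otimes \mathrm{supp}(\sigma_B)$. To see this note that $D_{\max}(\rho_{AB}\|\rho_A \otimes \rho_B)$ is finite but $D_{\max}(\rho_{AB}\|\rho_A \otimes \sigma_B) = \infty$ for any $\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ with $\mathrm{supp}(\rho_{AB}) \not\subseteq \mathrm{supp}(\rho_A) \otimes \mathrm{supp}(\sigma_B)$. Therefore we can use Lemma B.4,
\[ I_{\max}(A:B)_\rho = \min_{\sigma_B} \log \big\|(\rho_A \otimes \sigma_B)^{-1/2} \rho_{AB} (\rho_A \otimes \sigma_B)^{-1/2}\big\|_\infty \]
\[ = \min_{\sigma_B} \log \Big\| \Big(\frac{1_A}{\mathrm{rank}(\rho_A)} \otimes \sigma_B\Big)^{-1/2} (\rho_A \otimes 1_B)^{-1/2}\, \frac{\rho_{AB}}{\mathrm{rank}(\rho_A)}\, (\rho_A \otimes 1_B)^{-1/2} \Big(\frac{1_A}{\mathrm{rank}(\rho_A)} \otimes \sigma_B\Big)^{-1/2} \Big\|_\infty. \]
We have $\rho_{B|A} = (\rho_A \otimes 1_B)^{-1/2}\, \frac{\rho_{AB}}{\mathrm{rank}(\rho_A)}\, (\rho_A \otimes 1_B)^{-1/2} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ because
\[ \mathrm{tr}\big[\rho_{B|A}\big] = \mathrm{tr}\Big[\frac{(\rho_A^{-1} \otimes 1_B)\rho_{AB}}{\mathrm{rank}(\rho_A)}\Big] = \mathrm{tr}\Big[\frac{\rho_A^{-1}\rho_A}{\mathrm{rank}(\rho_A)}\Big] = 1. \]
Hence we can write
\[ I_{\max}(A:B)_\rho = \min_{\sigma_B} D_{\max}\Big(\rho_{B|A} \Big\| \frac{1_A}{\mathrm{rank}(\rho_A)} \otimes \sigma_B\Big) = H_0(A)_\rho - H_{\min}(A|B)_{\rho_{B|A}}. \]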
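3. Upper and lower bounds. The conditional min-entropy is upper bounded by the quantum conditional collision entropy.

Lemma B.6. Let $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$. Then
\[ H_{\min}(A|B)_\rho \le H_C(A|B)_\rho. \tag{B10} \]
Proof. Let $\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ be such that $H_{\min}(A|B)_\rho = -D_{\max}(\rho_{AB}\|1_A \otimes \sigma_B)$. We know that $\mathrm{supp}(\rho_{AB}) \subseteq 1_A \otimes \mathrm{supp}(\sigma_B)$ (by an argument analogous to the one in the proof of Lemma B.5), and hence we can use the alternative expression for the max-relative entropy (Lemma B.4)
\[ H_{\min}(A|B)_\rho = -\log \max_{\omega_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})} \mathrm{tr}\big[\omega_{AB}\,(1_A \otimes \sigma_B^{-1/2})\,\rho_{AB}\,(1_A \otimes \sigma_B^{-1/2})\big]. \]
But for $\hat\rho_{AB} = \frac{\rho_{AB}}{\mathrm{tr}[\rho_{AB}]} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ we have
\[ H_C(A|B)_\rho = -\log \min_{\kappa_B \in \mathcal{S}_{=}(\mathcal{H}_B)} \mathrm{tr}\big[\rho_{AB}\,(1_A \otimes \kappa_B^{-1/2})\,\rho_{AB}\,(1_A \otimes \kappa_B^{-1/2})\big] \]
\[ \ge -\log \mathrm{tr}\big[\rho_{AB}\,(1_A \otimes \sigma_B^{-1/2})\,\rho_{AB}\,(1_A \otimes \sigma_B^{-1/2})\big] \]
\[ = -\log \mathrm{tr}[\rho_{AB}] - \log \mathrm{tr}\big[\hat\rho_{AB}\,(1_A \otimes \sigma_B^{-1/2})\,\rho_{AB}\,(1_A \otimes \sigma_B^{-1/2})\big] \]
\[ \ge -\log \max_{\omega_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})} \mathrm{tr}\big[\omega_{AB}\,(1_A \otimes \sigma_B^{-1/2})\,\rho_{AB}\,(1_A \otimes \sigma_B^{-1/2})\big] = H_{\min}(A|B)_\rho. \]
The max-relative entropy is lower bounded by the quantum relative entropy.

Lemma B.7 [18, Lemma 10]. Let $\rho, \sigma \in \mathcal{S}_{\le}(\mathcal{H})$. Then
\[ D_{\max}(\rho\|\sigma) \ge D(\rho\|\sigma). \tag{B11} \]
The min-relative entropy is nonnegative for normalized states.

Lemma B.8 [18, Lemma 6]. Let $\rho, \sigma \in \mathcal{S}_{=}(\mathcal{H})$. Then $D_{\min}(\rho\|\sigma) \ge 0$.

From this we find the following dimension lower bounds for the conditional min-entropy.

Lemma B.9. Let $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$. Then
\[ -\log|A| \le H_{\min}(A|B)_{\rho|\rho}, \tag{B12} \]
\[ -\log|A| \le H_{\min}(A|B)_\rho + \log \mathrm{tr}[\rho_{AB}]. \tag{B13} \]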
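Proof. Let $\rho_{ABR} \in \mathcal{S}_{\le}(\mathcal{H}_{ABR})$ be a purification of $\rho_{AB}$ and define $\hat\rho_{ABR} = \frac{\rho_{ABR}}{\mathrm{tr}[\rho_{ABR}]} \in \mathcal{S}_{=}(\mathcal{H}_{ABR})$. By a duality property of min- and max-entropy [15, Prop. 3.10] we have
\[ H_{\min}(A|B)_{\hat\rho|\hat\rho} = \min_{\sigma_R \in \mathcal{S}_{=}(\mathcal{H}_R)} D_{\min}(\hat\rho_{AR}\|1_A \otimes \sigma_R). \]
Using the nonnegativity of the min-relative entropy for normalized states (Lemma B.8) we obtain
\[ 0 \le \min_{\sigma_R \in \mathcal{S}_{=}(\mathcal{H}_R)} D_{\min}\Big(\hat\rho_{AR} \Big\| \frac{1_A}{|A|} \otimes \sigma_R\Big) = \log|A| + \min_{\sigma_R \in \mathcal{S}_{=}(\mathcal{H}_R)} D_{\min}(\hat\rho_{AR}\|1_A \otimes \sigma_R) = \log|A| + H_{\min}(A|B)_{\hat\rho|\hat\rho} = \log|A| + H_{\min}(A|B)_{\rho|\rho}. \]
Inequality (B13) is proved in [39, Lemma 20].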
The following dimension upper bound holds for the max-information.

Lemma B.10. Let $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$. Then
\[ I_{\max}(A:B)_\rho \le 2\cdot\log\min\{|A|, |B|\}. \tag{B14} \]
Proof. It follows from the dimension lower bound for the conditional min-entropy (Lemma B.9) that
\[ I_{\max}(A:B)_\rho \le D_{\max}\Big(\rho_{AB} \Big\| \rho_A \otimes \frac{1_B}{|B|}\Big) = \log|B| - H_{\min}(B|A)_{\rho|\rho} \le 2\cdot\log|B|. \]
Using the alternative expression for the max-information (Lemma B.5) and again Lemma B.9 we get
\[ I_{\max}(A:B)_\rho = H_0(A)_\rho - H_{\min}(A|B)_{\rho_{B|A}} \le 2\cdot\log|A|. \]
Remark B.11. In general there is no dimension upper bound for $D_{\max}(\rho_{AB}\|\rho_A \otimes \rho_B)$ as can be seen by the following example. Let $\rho_{AB} = |\rho\rangle\langle\rho|_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$ with Schmidt-decomposition $|\rho\rangle_{AB} = \sum_i \sqrt{\lambda_i}\,|i\rangle_A \otimes |i\rangle_B$. Then
\[ D_{\max}(\rho_{AB}\|\rho_A \otimes \rho_B) = \log\Big(\sum_i \lambda_i^{-1}\Big), \tag{B15} \]
where the sum ranges over all $i$ with $\lambda_i > 0$.

The following is a bound on the increase of the max-information when an additional subsystem is added.

Lemma B.12. Let $\varepsilon \ge 0$ and $\rho_{ABC} \in \mathcal{S}_{=}(\mathcal{H}_{ABC})$. Then
\[ I^{\varepsilon}_{\max}(A:BC)_\rho \le I^{\varepsilon}_{\max}(A:B)_\rho + 2\cdot\log|C|. \tag{B16} \]
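Proof. Let $\tilde\rho_{AB} \in \mathcal{B}^{\varepsilon}(\rho_{AB})$ and $\tilde\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ such that $I^{\varepsilon}_{\max}(A:B)_\rho = D_{\max}(\tilde\rho_{AB}\|\tilde\rho_A \otimes \tilde\sigma_B) = \log\mu$. That is, $\mu$ is minimal such that $\mu\cdot\tilde\rho_A \otimes \tilde\sigma_B \ge \tilde\rho_{AB}$ and this implies $\mu\cdot\tilde\rho_A \otimes \tilde\sigma_B \otimes \frac{1_C}{|C|} \ge \frac{1}{|C|}\cdot\tilde\rho_{AB} \otimes 1_C$. Furthermore define $\tilde\rho_{ABC} \in \mathcal{B}^{\varepsilon}(\rho_{ABC})$ such that $\mathrm{tr}_C[\tilde\rho_{ABC}] = \tilde\rho_{AB}$ (by Uhlmann’s theorem such a state exists [42,43]). By the dimension lower bound for the min-entropy (Lemma B.9), we have $H_{\min}(C|AB)_{\tilde\rho|\tilde\rho} \ge -\log|C|$. Therefore $|C|\cdot\tilde\rho_{AB} \otimes 1_C \ge \tilde\rho_{ABC}$ and hence $\mu\cdot\tilde\rho_A \otimes \tilde\sigma_B \otimes \frac{1_C}{|C|} \ge \frac{1}{|C|^2}\cdot\tilde\rho_{ABC}$.

Now let $D_{\max}\big(\tilde\rho_{ABC}\big\|\tilde\rho_A \otimes \tilde\sigma_B \otimes \frac{1_C}{|C|}\big) = \log\lambda$. That is, $\lambda$ is minimal such that $\lambda\cdot\tilde\rho_A \otimes \tilde\sigma_B \otimes \frac{1_C}{|C|} \ge \tilde\rho_{ABC}$. Thus it follows that $\lambda \le \mu\cdot|C|^2$ and from this we get
\[ I^{\varepsilon}_{\max}(A:BC)_\rho \le D_{\max}\Big(\tilde\rho_{ABC} \Big\| \tilde\rho_A \otimes \tilde\sigma_B \otimes \frac{1_C}{|C|}\Big) \le D_{\max}(\tilde\rho_{AB}\|\tilde\rho_A \otimes \tilde\sigma_B) + 2\log|C| = I^{\varepsilon}_{\max}(A:B)_\rho + 2\cdot\log|C|. \]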
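The max-information can be lower bounded in terms of the min-entropy.

Lemma B.13. Let $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$. Then
\[ I_{\max}(A:B)_\rho \ge H_{\min}(A)_\rho - H_{\min}(A|B)_\rho. \tag{B17} \]
Proof. Let $\tilde\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ such that $I_{\max}(A:B)_\rho = D_{\max}(\rho_{AB}\|\rho_A \otimes \tilde\sigma_B) = \log\lambda$. That is, $\lambda$ is minimal such that $\lambda\cdot\rho_A \otimes \tilde\sigma_B \ge \rho_{AB}$. Furthermore let $\mu$ be minimal such that $\mu\cdot\|\rho_A\|_\infty\cdot 1_A \otimes \tilde\sigma_B \ge \rho_{AB}$. Since $\|\rho_A\|_\infty\cdot 1_A \ge \rho_A$, we have that $\lambda \ge \mu$. Now set $H_{\min}(A|B)_{\rho|\tilde\sigma} = -\log\nu$, i.e. $\nu$ is minimal such that $\nu\cdot 1_A \otimes \tilde\sigma_B \ge \rho_{AB}$. Thus $\nu = \mu\cdot\|\rho_A\|_\infty$ and we conclude
\[ I_{\max}(A:B)_\rho = \log\lambda \ge \log\mu = -\log\|\rho_A\|_\infty + \log\nu = H_{\min}(A)_\rho - H_{\min}(A|B)_{\rho|\tilde\sigma} \ge H_{\min}(A)_\rho - H_{\min}(A|B)_\rho. \]
The max-information can be upper bounded in terms of a difference between two entropic quantities.

Lemma B.14. Let $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$. Then
\[ I_{\max}(A:B)_\rho \le H_R(A)_\rho - H_{\min}(A|B)_\rho. \tag{B18} \]
Proof. Let $\tilde\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ such that $H_{\min}(A|B)_\rho = H_{\min}(A|B)_{\rho|\tilde\sigma} = -\log\mu$. That is, $\mu$ is minimal such that $\mu\cdot 1_A \otimes \tilde\sigma_B \ge \rho_{AB}$. Since multiplication by $\rho_A^0 \otimes 1_B$ does not affect $\rho_{AB}$ (note that the support of $\rho_{AB}$ is contained in the support of $\rho_A \otimes \rho_B$), we also have $\mu\cdot\rho_A^0 \otimes \tilde\sigma_B \ge \rho_{AB}$. Furthermore $\rho_A \ge \lambda_{\min}(\rho_A)\cdot\rho_A^0$, where $\lambda_{\min}(\rho)$ denotes the smallest non-zero eigenvalue of $\rho$. Therefore $\frac{\mu}{\lambda_{\min}(\rho_A)}\cdot\rho_A \otimes \tilde\sigma_B \ge \rho_{AB}$. Now let $D_{\max}(\rho_{AB}\|\rho_A \otimes \tilde\sigma_B) = \log\lambda$, i.e. $\lambda$ is minimal such that $\lambda\cdot\rho_A \otimes \tilde\sigma_B \ge \rho_{AB}$. Hence $\lambda \le \mu\cdot\lambda_{\min}^{-1}(\rho_A)$ and thus
\[ I_{\max}(A:B)_\rho \le D_{\max}(\rho_{AB}\|\rho_A \otimes \tilde\sigma_B) \le H_R(A)_\rho - H_{\min}(A|B)_\rho. \]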
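This can be generalized to a version for the smooth max-information.

Lemma B.15. Let $\varepsilon > 0$ and $\rho_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$. Then
\[ I^{\varepsilon}_{\max}(A:B)_\rho \le H^{\varepsilon^2/48}_{\max}(A)_\rho - H^{\varepsilon^2/48}_{\min}(A|B)_\rho - 2\cdot\log\frac{\varepsilon^2}{24}. \tag{B19} \]
Proof. By the entropy measure upper bound for the max-information (Lemma B.14) we obtain
\[ I^{\varepsilon}_{\max}(A:B)_\rho \le \min_{\bar\rho_{AB} \in \mathcal{B}^{\varepsilon}(\rho_{AB})} \big[ H_R(A)_{\bar\rho} - H_{\min}(A|B)_{\bar\rho} \big] \le \min_{\omega_{AB} \in \mathcal{B}^{\varepsilon^2/48}(\rho_{AB})} \min_{\Pi_A} \big[ H_R(A)_{\Pi\omega\Pi} - H_{\min}(A|B)_{\Pi\omega\Pi} \big], \]
where the minimum ranges over all $0 \le \Pi_A \le 1_A$ such that $\Pi_A\omega_{AB}\Pi_A \approx_{\varepsilon/2} \omega_{AB}$. Now we choose $\tilde\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ such that $H_{\min}(A|B)_{\omega|\tilde\sigma} = H_{\min}(A|B)_\omega$ and use Lemma B.27 to obtain
\[ I^{\varepsilon}_{\max}(A:B)_\rho \le \min_{\omega_{AB} \in \mathcal{B}^{\varepsilon^2/48}(\rho_{AB})} \min_{\Pi_A} \big[ H_R(A)_{\Pi\omega\Pi} - H_{\min}(A|B)_{\Pi\omega\Pi|\tilde\sigma} \big] \]
\[ \le \min_{\omega_{AB} \in \mathcal{B}^{\varepsilon^2/48}(\rho_{AB})} \min_{\Pi_A} \big[ H_R(A)_{\Pi\omega\Pi} - H_{\min}(A|B)_{\omega|\tilde\sigma} \big] \]
\[ = \min_{\omega_{AB} \in \mathcal{B}^{\varepsilon^2/48}(\rho_{AB})} \min_{\Pi_A} \big[ H_R(A)_{\Pi\omega\Pi} - H_{\min}(A|B)_{\omega} \big], \]
where the minimum ranges over all $0 \le \Pi_A \le 1_A$ such that $\Pi_A\omega_{AB}\Pi_A \approx_{\varepsilon/2} \omega_{AB}$. As a next step we choose $\omega_{AB} = \tilde\omega_{AB} \in \mathcal{B}^{\varepsilon^2/48}(\rho_{AB})$ such that $H^{\varepsilon^2/48}_{\min}(A|B)_\rho = H_{\min}(A|B)_{\tilde\omega}$. Hence we get
\[ I^{\varepsilon}_{\max}(A:B)_\rho \le \min_{\Pi_A} H_R(A)_{\Pi\tilde\omega\Pi} - H^{\varepsilon^2/48}_{\min}(A|B)_\rho, \]
where the minimum ranges over all $0 \le \Pi_A \le 1_A$ such that $\Pi_A\tilde\omega_{AB}\Pi_A \approx_{\varepsilon/2} \tilde\omega_{AB}$. Using Lemma B.28, we can choose $0 \le \Pi_A \le 1_A$ with $\Pi_A\tilde\omega_{AB}\Pi_A \approx_{\varepsilon/2} \tilde\omega_{AB}$ such that $H_R(A)_{\Pi\tilde\omega\Pi} \le H^{\varepsilon^2/24}_{\max}(A)_{\tilde\omega} - 2\cdot\log\frac{\varepsilon^2}{24}$. From this we finally obtain
\[ I^{\varepsilon}_{\max}(A:B)_\rho \le H^{\varepsilon^2/24}_{\max}(A)_{\tilde\omega} - H^{\varepsilon^2/48}_{\min}(A|B)_\rho - 2\cdot\log\frac{\varepsilon^2}{24} \le H^{\varepsilon^2/48}_{\max}(A)_\rho - H^{\varepsilon^2/48}_{\min}(A|B)_\rho - 2\cdot\log\frac{\varepsilon^2}{24}. \]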
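4. Monotonicity. The max-relative entropy is monotone under CPTP maps.

Lemma B.16 [18, Lemma 7]. Let $\rho, \sigma \in \mathcal{P}(\mathcal{H})$ and $\mathcal{E}$ be a CPTP map on $\mathcal{H}$. Then
\[ D_{\max}(\rho\|\sigma) \ge D_{\max}(\mathcal{E}(\rho)\|\mathcal{E}(\sigma)). \tag{B20} \]
It follows that the max-information is monotone under local CPTP maps.

Lemma B.17. Let $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ and $\mathcal{E}$ be a CPTP map on $\mathcal{H}_{AB}$ of the form $\mathcal{E} = \mathcal{E}_A \otimes \mathcal{E}_B$. Then
\[ I_{\max}(A:B)_\rho \ge I_{\max}(A:B)_{\mathcal{E}(\rho)}. \tag{B21} \]
Proof. Let $\tilde\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ be such that $I_{\max}(A:B)_\rho = D_{\max}(\rho_{AB}\|\rho_A \otimes \tilde\sigma_B)$. Using the monotonicity of the max-relative entropy under CPTP maps (Lemma B.16) we obtain
\[ I_{\max}(A:B)_\rho = D_{\max}(\rho_{AB}\|\rho_A \otimes \tilde\sigma_B) \ge D_{\max}(\mathcal{E}(\rho_{AB})\|\mathcal{E}_A(\rho_A) \otimes \mathcal{E}_B(\tilde\sigma_B)) \ge \min_{\omega_B \in \mathcal{S}_{=}(\mathcal{H}_B)} D_{\max}(\mathcal{E}(\rho_{AB})\|\mathcal{E}_A(\rho_A) \otimes \omega_B) = I_{\max}(A:B)_{\mathcal{E}(\rho)}. \]

5. Miscellaneous properties. The max-information of classical-quantum states can be estimated as follows.

Lemma B.18. Let $\rho_{ABI} \in \mathcal{S}_{\le}(\mathcal{H}_{ABI})$ with $\rho_{ABI} = \sum_{i\in I} p_i\, \rho^i_{AB} \otimes |i\rangle\langle i|_I$, $\rho^i_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ and $p_i > 0$ for $i \in I$ as well as the $|i\rangle$ mutually orthogonal (i.e. the state is classical on $I$). Then
\[ I_{\max}(AI:B)_\rho \ge \max_{i\in I} I_{\max}(A:B)_{\rho^i}. \tag{B22} \]
Proof. Let $\tilde\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ such that $I_{\max}(AI:B)_\rho = D_{\max}(\rho_{ABI}\|\rho_{AI} \otimes \tilde\sigma_B) = \log\lambda$. That is, $\lambda$ is minimal such that
\[ \lambda\cdot\sum_i p_i\, \rho^i_A \otimes \tilde\sigma_B \otimes |i\rangle\langle i| \ge \sum_i p_i\, \rho^i_{AB} \otimes |i\rangle\langle i|. \]
Since the $|i\rangle$ are mutually orthogonal and $p_i > 0$ for $i \in I$, this is equivalent to $\forall i \in I: \lambda\cdot\rho^i_A \otimes \tilde\sigma_B \ge \rho^i_{AB}$. Set $D_{\max}(\rho^i_{AB}\|\rho^i_A \otimes \tilde\sigma_B) = \log\lambda_i$, i.e. $\lambda_i$ is minimal such that $\lambda_i\cdot\rho^i_A \otimes \tilde\sigma_B \ge \rho^i_{AB}$. Hence $\lambda \ge \max_{i\in I}\lambda_i$ and therefore
\[ I_{\max}(AI:B)_\rho = \log\lambda \ge \max_{i\in I}\log\lambda_i = \max_{i\in I} D_{\max}(\rho^i_{AB}\|\rho^i_A \otimes \tilde\sigma_B) \ge \max_{i\in I} I_{\max}(A:B)_{\rho^i}. \]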
From this we obtain the following corollary about the behavior of the max-information under projective measurements.

Corollary B.19. Let $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ and let $P = \{P^i_A\}_{i\in I}$ be a collection of projectors that describe a projective measurement on system $A$. For $\mathrm{tr}[P^i_A\rho_{AB}] \ne 0$, let $p_i = \mathrm{tr}[P^i_A\rho_{AB}]$ and $\rho^i_{AB} = \frac{1}{p_i} P^i_A\rho_{AB}P^i_A$. Then
\[ I_{\max}(A:B)_\rho \ge \max_i I_{\max}(A:B)_{\rho^i}, \tag{B23} \]
where the maximum ranges over all $i$ for which $\rho^i_{AB}$ is defined.

Proof. Define a CPTP map $\mathcal{E} : \mathcal{S}_{\le}(\mathcal{H}_{AB}) \to \mathcal{S}_{\le}(\mathcal{H}_{ABI})$ with $\mathcal{E}(\cdot) = \sum_i P^i_A(\cdot)P^i_A \otimes |i\rangle\langle i|_I$, where the $|i\rangle$ are mutually orthogonal. Then the monotonicity of the max-information under local CPTP maps (Lemma B.17) combined with the preceding lemma about the max-information of classical-quantum states (Lemma B.18) shows that
\[ I_{\max}(A:B)_\rho \ge I_{\max}(AI:B)_{\mathcal{E}(\rho)} \ge \max_i I_{\max}(A:B)_{\rho^i}. \]
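The max-relative entropy is quasi-convex in the following sense.

Lemma B.20 [18, Lemma 9]. Let $\rho = \sum_{i\in I} p_i\rho_i \in \mathcal{S}_{\le}(\mathcal{H})$ and $\sigma = \sum_{i\in I} p_i\sigma_i \in \mathcal{S}_{\le}(\mathcal{H})$ with $\rho_i, \sigma_i \in \mathcal{S}_{\le}(\mathcal{H})$ for $i \in I$. Then
\[ D_{\max}(\rho\|\sigma) \le \max_{i\in I} D_{\max}(\rho_i\|\sigma_i). \tag{B24} \]
From this we find the following quasi-convexity type lemma for the smooth max-information.

Lemma B.21. Let $\varepsilon \ge 0$ and $\rho_{AB} = \sum_{i\in I} p_i\rho^i_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ with $\rho^i_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$ for $i \in I$. Then
\[ I^{\varepsilon}_{\max}(A:B)_\rho \le \max_{i\in I} I^{\varepsilon}_{\max}(A:B)_{\rho^i} + \log|I|. \tag{B25} \]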
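Proof. Let $\tilde\rho^i_{AB} \in \mathcal{B}^{\varepsilon}(\rho^i_{AB})$ and $\tilde\sigma^i_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ for $i \in I$ be such that $I^{\varepsilon}_{\max}(A:B)_{\rho^i} = D_{\max}(\tilde\rho^i_{AB}\|\tilde\rho^i_A \otimes \tilde\sigma^i_B)$. Using the quasi-convexity of the max-relative entropy (Lemma B.20) we obtain
\[ \max_{i\in I} I^{\varepsilon}_{\max}(A:B)_{\rho^i} + \log|I| = \max_{i\in I} D_{\max}\big(\tilde\rho^i_{AB}\big\|\tilde\rho^i_A \otimes \tilde\sigma^i_B\big) + \log|I| \]
\[ \ge D_{\max}\Big(\sum_{i\in I} p_i\tilde\rho^i_{AB} \Big\| \sum_{i\in I} p_i\tilde\rho^i_A \otimes \tilde\sigma^i_B\Big) + \log|I| \]
\[ = \log\min\Big\{\lambda \in \mathbb{R} : \sum_{i\in I} p_i\tilde\rho^i_{AB} \le \lambda\cdot\sum_{i\in I} p_i\tilde\rho^i_A \otimes \tilde\sigma^i_B\Big\} + \log|I| \]
\[ \overset{(i)}{\ge} \log\min\Big\{\mu \in \mathbb{R} : \sum_{i\in I} p_i\tilde\rho^i_{AB} \le \mu\cdot\sum_{i\in I} p_i\tilde\rho^i_A \otimes \sum_{j\in I}\tilde\sigma^j_B\Big\} + \log|I| \]
\[ = \log\min\Big\{\mu \in \mathbb{R} : \sum_{i\in I} p_i\tilde\rho^i_{AB} \le \mu\cdot\sum_{i\in I} p_i\tilde\rho^i_A \otimes \sum_{j\in I}\frac{1}{|I|}\cdot\tilde\sigma^j_B\Big\}, \]
where step (i) holds because $\sum_{i\in I} p_i\tilde\rho^i_A \otimes \sum_{j\in I}\tilde\sigma^j_B \ge \sum_{i\in I} p_i\tilde\rho^i_A \otimes \tilde\sigma^i_B$. Now set $\tilde\sigma_B = \sum_{j\in I}\frac{1}{|I|}\cdot\tilde\sigma^j_B$ and $\tilde\rho_{AB} = \sum_{i\in I} p_i\tilde\rho^i_{AB}$. By the convexity of the purified distance in its arguments (Lemma A.3) we have $\tilde\rho_{AB} \in \mathcal{B}^{\varepsilon}(\rho_{AB})$ and obtain
\[ \max_{i\in I} I^{\varepsilon}_{\max}(A:B)_{\rho^i} + \log|I| \ge \log\min\{\mu \in \mathbb{R} : \tilde\rho_{AB} \le \mu\cdot\tilde\rho_A \otimes \tilde\sigma_B\} \]
\[ \ge \min_{\bar\rho_{AB} \in \mathcal{B}^{\varepsilon}(\rho_{AB})}\ \min_{\sigma_B \in \mathcal{S}_{=}(\mathcal{H}_B)} \log\min\{\nu \in \mathbb{R} : \bar\rho_{AB} \le \nu\cdot\bar\rho_A \otimes \sigma_B\} = I^{\varepsilon}_{\max}(A:B)_\rho. \]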
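6. Asymptotic behavior. The following is the Asymptotic Equipartition Property (AEP) for smooth min- and max-entropy.

Lemma B.22 [38, Theorem 9]. Let $\varepsilon > 0$, $n \ge 2\cdot(1 - \varepsilon^2)$ and $\rho_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$. Then
\[ \frac{1}{n} H^{\varepsilon}_{\min}(A|B)_{\rho^{\otimes n}} \ge H(A|B)_\rho - \frac{\eta(\varepsilon)}{\sqrt{n}}, \tag{B26} \]
\[ \frac{1}{n} H^{\varepsilon}_{\max}(A)_{\rho^{\otimes n}} \le H(A)_\rho + \frac{\eta(\varepsilon)}{\sqrt{n}}, \tag{B27} \]
where $\eta(\varepsilon) = 4\sqrt{1 - 2\cdot\log\varepsilon}\cdot(2 + \frac{1}{2}\cdot\log|A|)$.

Remark B.23 [38, Theorem 1]. Let $\rho_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$. Then
\[ \lim_{\varepsilon\to 0}\lim_{n\to\infty} \frac{1}{n} H^{\varepsilon}_{\min}(A|B)_{\rho^{\otimes n}} = H(A|B)_\rho, \tag{B28} \]
\[ \lim_{\varepsilon\to 0}\lim_{n\to\infty} \frac{1}{n} H^{\varepsilon}_{\max}(A)_{\rho^{\otimes n}} = H(A)_\rho. \tag{B29} \]
The Asymptotic Equipartition Property for the smooth max-information is as follows.

Lemma B.24. Let $\varepsilon > 0$, $n \ge 2\cdot(1 - \varepsilon^2)$ and $\rho_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$. Then
\[ \frac{1}{n} I^{\varepsilon}_{\max}(A:B)_{\rho^{\otimes n}} \le I(A:B)_\rho + \frac{\xi(\varepsilon)}{\sqrt{n}} - \frac{2}{n}\cdot\log\frac{\varepsilon^2}{24}, \tag{B30} \]
where $\xi(\varepsilon) = 8\sqrt{13 - 4\cdot\log\varepsilon}\cdot(2 + \frac{1}{2}\cdot\log|A|)$.

Proof. Using the entropy measure upper bound for the smooth max-information (Lemma B.15) together with the Asymptotic Equipartition Property for the smooth min- and max-entropies (Lemma B.22) we obtain
\[ \frac{1}{n} I^{\varepsilon}_{\max}(A:B)_{\rho^{\otimes n}} \le \frac{1}{n} H^{\varepsilon^2/48}_{\max}(A)_{\rho^{\otimes n}} - \frac{1}{n} H^{\varepsilon^2/48}_{\min}(A|B)_{\rho^{\otimes n}} - \frac{2}{n}\cdot\log\frac{\varepsilon^2}{24} \]
\[ \le H(A)_\rho - H(A|B)_\rho - \frac{2}{n}\cdot\log\frac{\varepsilon^2}{24} + 2\cdot\frac{4}{\sqrt{n}}\sqrt{1 - 2\cdot\log\frac{\varepsilon^2}{48}}\cdot\Big(2 + \frac{1}{2}\cdot\log|A|\Big) \]
\[ \le I(A:B)_\rho + \frac{\xi(\varepsilon)}{\sqrt{n}} - \frac{2}{n}\cdot\log\frac{\varepsilon^2}{24}. \]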
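The last step of this proof rests on the inequality $2\,\eta(\varepsilon^2/48) \le \xi(\varepsilon)$, which follows from $1 - 2\log(\varepsilon^2/48) = 1 + 2\log 48 - 4\log\varepsilon \le 13 - 4\log\varepsilon$. A small numerical check, assuming base-2 logarithms and reading $\eta$ and $\xi$ with the square roots restored as $\eta(\varepsilon) = 4\sqrt{1 - 2\log\varepsilon}\,(2 + \tfrac12\log|A|)$ and $\xi(\varepsilon) = 8\sqrt{13 - 4\log\varepsilon}\,(2 + \tfrac12\log|A|)$; the function names and sample values are illustrative.

```python
import math


def eta(eps, dim_a):
    """eta(eps) = 4 * sqrt(1 - 2 log eps) * (2 + 0.5 log|A|), logs in base 2."""
    return 4 * math.sqrt(1 - 2 * math.log2(eps)) * (2 + 0.5 * math.log2(dim_a))


def xi(eps, dim_a):
    """xi(eps) = 8 * sqrt(13 - 4 log eps) * (2 + 0.5 log|A|), logs in base 2."""
    return 8 * math.sqrt(13 - 4 * math.log2(eps)) * (2 + 0.5 * math.log2(dim_a))


def aep_gap(eps, dim_a):
    """xi(eps) - 2 * eta(eps^2 / 48); non-negative if the last proof step holds."""
    return xi(eps, dim_a) - 2 * eta(eps ** 2 / 48, dim_a)


gaps = [aep_gap(eps, dim_a)
        for eps in (0.5, 0.1, 1e-3, 1e-6)
        for dim_a in (2, 4, 16)]
```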
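Corollary B.25. Let $\rho_{AB} \in \mathcal{S}_{=}(\mathcal{H}_{AB})$. Then
\[ \lim_{\varepsilon\to 0}\lim_{n\to\infty} \frac{1}{n} I^{\varepsilon}_{\max}(A:B)_{\rho^{\otimes n}} = I(A:B)_\rho. \tag{B31} \]
Proof. By the Asymptotic Equipartition Property for the max-information (Lemma B.24) we have
\[ \lim_{\varepsilon\to 0}\lim_{n\to\infty} \frac{1}{n} I^{\varepsilon}_{\max}(A:B)_{\rho^{\otimes n}} \le I(A:B)_\rho + \lim_{\varepsilon\to 0}\lim_{n\to\infty}\Big[\frac{\xi(\varepsilon)}{\sqrt{n}} - \frac{2}{n}\cdot\log\frac{\varepsilon^2}{24}\Big], \]
where $\xi(\varepsilon) = 8\sqrt{13 - 4\cdot\log\varepsilon}\cdot(2 + \frac{1}{2}\cdot\log|A|)$. Thus we obtain
\[ \lim_{\varepsilon\to 0}\lim_{n\to\infty} \frac{1}{n} I^{\varepsilon}_{\max}(A:B)_{\rho^{\otimes n}} \le I(A:B)_\rho. \]
To show the converse we need a Fannes-type inequality for the von Neumann entropy. We use Theorem 1 of [48]: Let $\rho, \sigma \in \mathcal{S}_{=}(\mathcal{H})$ with $\rho \approx_\varepsilon \sigma$. Then $|H(\rho) - H(\sigma)| \le \varepsilon\log(d-1) + H((\varepsilon, 1-\varepsilon))$, where $d$ denotes the dimension of $\mathcal{H}$. Because the max-relative entropy is always lower bounded by the relative von Neumann entropy (Lemma B.7) we get
\[ I^{\varepsilon}_{\max}(A:B)_{\rho^{\otimes n}} = \min_{\bar\rho^n_{AB} \in \mathcal{B}^{\varepsilon}(\rho^{\otimes n}_{AB})}\ \min_{\sigma^n_B \in \mathcal{S}_{=}(\mathcal{H}^{\otimes n}_B)} D_{\max}\big(\bar\rho^n_{AB}\big\|\bar\rho^n_A \otimes \sigma^n_B\big) \ge \min_{\bar\rho^n_{AB} \in \mathcal{B}^{\varepsilon}(\rho^{\otimes n}_{AB})}\ \min_{\sigma^n_B \in \mathcal{S}_{=}(\mathcal{H}^{\otimes n}_B)} D\big(\bar\rho^n_{AB}\big\|\bar\rho^n_A \otimes \sigma^n_B\big). \]
Noting that $D(\rho_{AB}\|\rho_A \otimes \sigma_B) = D(\rho_{AB}\|\rho_A \otimes \rho_B) + D(\rho_B\|\sigma_B)$ and using the Fannes-type inequality we then obtain
\[ \lim_{\varepsilon\to 0}\lim_{n\to\infty} \frac{1}{n} I^{\varepsilon}_{\max}(A:B)_{\rho^{\otimes n}} \ge \lim_{\varepsilon\to 0}\lim_{n\to\infty} \frac{1}{n} \min_{\bar\rho^n_{AB} \in \mathcal{B}^{\varepsilon}(\rho^{\otimes n}_{AB})} D\big(\bar\rho^n_{AB}\big\|\bar\rho^n_A \otimes \bar\rho^n_B\big) \]
\[ \ge \lim_{\varepsilon\to 0}\lim_{n\to\infty} \frac{1}{n}\Big\{ D\big(\rho^{\otimes n}_{AB}\big\|\rho^{\otimes n}_A \otimes \rho^{\otimes n}_B\big) - 3\cdot H((\varepsilon, 1-\varepsilon)) - \varepsilon\cdot\log\big[|A|^n|B|^n - 1\big] - \varepsilon\cdot\log\big[|A|^n - 1\big] - \varepsilon\cdot\log\big[|B|^n - 1\big] \Big\} \]
\[ \ge D(\rho_{AB}\|\rho_A \otimes \rho_B) - \lim_{\varepsilon\to 0}\lim_{n\to\infty}\Big\{ \frac{3}{n}\cdot H((\varepsilon, 1-\varepsilon)) + \varepsilon\cdot\big\{\log[|A||B|] + \log|A| + \log|B|\big\} \Big\} = I(A:B)_\rho. \]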
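7. Technical lemmas.

Lemma B.26 [49, Lemma A.7]. Let $\rho \in \mathcal{S}_{\le}(\mathcal{H})$ and $\Pi \in \mathcal{P}(\mathcal{H})$ such that $\Pi \le 1$. Then
\[ P(\rho, \Pi\rho\Pi) \le \mathrm{tr}[\rho]^{-1/2}\cdot\sqrt{\mathrm{tr}[\rho]^2 - \mathrm{tr}[\Pi^2\rho]^2}. \tag{B32} \]
Lemma B.27. Let $\rho_{AB} \in \mathcal{S}_{\le}(\mathcal{H}_{AB})$, $\sigma_B, \omega_B \in \mathcal{S}_{=}(\mathcal{H}_B)$ and $0 \le \Pi_{AB} \le 1_{AB}$ with $1_A \otimes \omega_B - \Pi_{AB}(1_A \otimes \sigma_B)\Pi_{AB} \ge 0$. Then
\[ H_{\min}(A|B)_{\Pi\rho\Pi|\omega} \ge H_{\min}(A|B)_{\rho|\sigma}. \tag{B33} \]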
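Proof. Set $H_{\min}(A|B)_{\rho|\sigma} = -\log\lambda$, i.e. $\lambda$ is minimal such that $\lambda\cdot 1_A \otimes \sigma_B - \rho_{AB} \ge 0$. Hence $\lambda\cdot\Pi_{AB}(1_A \otimes \sigma_B)\Pi_{AB} - \Pi_{AB}\rho_{AB}\Pi_{AB} \ge 0$. Using $1_A \otimes \omega_B - \Pi_{AB}(1_A \otimes \sigma_B)\Pi_{AB} \ge 0$ we obtain
\[ \lambda\cdot 1_A \otimes \omega_B - \Pi_{AB}\rho_{AB}\Pi_{AB} = \lambda\cdot\big(1_A \otimes \omega_B - \Pi_{AB}(1_A \otimes \sigma_B)\Pi_{AB}\big) + \lambda\cdot\Pi_{AB}(1_A \otimes \sigma_B)\Pi_{AB} - \Pi_{AB}\rho_{AB}\Pi_{AB} \ge 0. \]
The claim then follows by the definition of the min-entropy.

Lemma B.28. Let $\varepsilon > 0$ and $\rho_A \in \mathcal{S}_{\le}(\mathcal{H}_A)$. Then there exists $0 \le \Pi_A \le 1_A$ such that $\rho_A \approx_\varepsilon \Pi_A\rho_A\Pi_A$ and
\[ H^{\varepsilon^2/6}_{\max}(A)_\rho \ge H_R(A)_{\Pi\rho\Pi} + 2\cdot\log\frac{\varepsilon^2}{6}. \tag{B34} \]
Proof. By [49, Lemma A.15] we have that for $\delta > 0$ and $\rho_A \in \mathcal{S}_{\le}(\mathcal{H}_A)$, there exists $0 \le \Pi_A \le 1_A$ such that $\mathrm{tr}\big[(1_A - \Pi_A^2)\rho_A\big] \le 3\delta$ and $H^{\delta}_{\max}(A)_\rho \ge H_R(A)_{\Pi\rho\Pi} - 2\cdot\log\frac{1}{\delta}$. Furthermore Lemma B.26 shows that $\mathrm{tr}\big[(1_A - \Pi_A^2)\rho_A\big] \le 3\delta$ implies $\rho_A \approx_{\sqrt{6\delta}} \Pi_A\rho_A\Pi_A$. For $\varepsilon = \sqrt{6\delta}$ this concludes the proof.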
Appendix C: Proof of the Decoupling Theorem

Let $A'$ be of the same dimension as $A$. Denote by $F_{AA'}$ the swap operator of $A \otimes A'$. Let $\Pi^+_A$ be the projector on the symmetric subspace of $A \otimes A'$, and $\Pi^-_A$ the projector on the anti-symmetric subspace of $A \otimes A'$. We need the following facts (see also [31]):
• $F_{AA'RR'} = F_{AA'} \otimes F_{RR'}$
• $\mathrm{rank}(\Pi^\pm_A) = |A|(|A| \pm 1)/2$
• $F^2 = 1$
• $\Pi^\pm_A = \frac{1}{2}(1_{AA'} \pm F_{AA'})$
• $\mathrm{tr}[F_{AA'}] = |A|$
• $\mathrm{tr}[(\psi_A \otimes \phi_{A'})F_{AA'}] = \mathrm{tr}[\psi_A\phi_A]$
• $\mathrm{tr}[(\psi_{AR} \otimes \psi_{A'R'})\cdot(1_{AA'} \otimes F_{RR'})] = \mathrm{tr}[\psi_R^2]$
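Several of the facts listed above can be verified directly on small matrices. The sketch below builds the swap operator on a $d \otimes d$ system as a permutation matrix in pure Python (all helper names are hypothetical) and checks $F^2 = 1$, $\mathrm{tr}[F] = d$, the ranks of the symmetric and anti-symmetric projectors (computed as traces of $\frac12(1 \pm F)$), and the swap trick $\mathrm{tr}[(\psi \otimes \phi)F] = \mathrm{tr}[\psi\phi]$ for two arbitrary integer matrices.

```python
def swap_matrix(d):
    """Swap operator on C^d (x) C^d: F |i,j> = |j,i>, as a d^2 x d^2 matrix."""
    n = d * d
    f = [[0] * n for _ in range(n)]
    for i in range(d):
        for j in range(d):
            f[j * d + i][i * d + j] = 1
    return f


def matmul(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(len(y)))
             for c in range(len(y[0]))] for r in range(len(x))]


def trace(x):
    return sum(x[i][i] for i in range(len(x)))


def kron(x, y):
    """Kronecker product of two d x d matrices."""
    d = len(x)
    return [[x[i][k] * y[j][l] for k in range(d) for l in range(d)]
            for i in range(d) for j in range(d)]


def swap_facts(d):
    """Return (F^2 == 1, tr F, rank of Pi^+, rank of Pi^-)."""
    n = d * d
    f = swap_matrix(d)
    eye = [[int(r == c) for c in range(n)] for r in range(n)]
    rank_plus = (n + trace(f)) // 2    # trace of the projector (1 + F) / 2
    rank_minus = (n - trace(f)) // 2   # trace of the projector (1 - F) / 2
    return matmul(f, f) == eye, trace(f), rank_plus, rank_minus


psi = [[1, 2], [3, 4]]
phi = [[0, 1], [5, 2]]
swap_trick_lhs = trace(matmul(kron(psi, phi), swap_matrix(2)))
swap_trick_rhs = trace(matmul(psi, phi))
```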
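Lemma C.1. Let $A = A_1A_2$. Then
\[ \int_{U(A)} (U \otimes U)^\dagger (1_{A_2A_2'} \otimes F_{A_1A_1'})(U \otimes U)\, dU \le \frac{1}{|A_1|} 1_{AA'} + \frac{1}{|A_2|} F_{AA'}, \tag{C1} \]
where $dU$ is the Haar measure over the unitaries on system $A$.

Proof. For any $X$ that is Hermitian, it follows from Schur’s lemma that
\[ \int_{U(A)} (U^\dagger \otimes U^\dagger) X (U \otimes U)\, dU = a_+(X)\Pi^+_A + a_-(X)\Pi^-_A, \]
where $a_\pm(X)\cdot\mathrm{rank}(\Pi^\pm_A) = \mathrm{tr}[X\Pi^\pm_A]$. Choosing $X = G = 1_{A_2A_2'} \otimes F_{A_1A_1'}$ we get
\[ \mathrm{tr}\big[\Pi^\pm_A(1_{A_2A_2'} \otimes F_{A_1A_1'})\big] = \frac{1}{2}\mathrm{tr}\big[(1_{AA'} \pm F_{AA'})(1_{A_2A_2'} \otimes F_{A_1A_1'})\big] \]
\[ = \frac{1}{2}\mathrm{tr}\big[1_{A_2A_2'} \otimes F_{A_1A_1'}\big] \pm \frac{1}{2}\mathrm{tr}\big[F_{AA'}(1_{A_2A_2'} \otimes F_{A_1A_1'})\big] = \frac{1}{2}|A_2|^2\cdot|A_1| \pm \frac{1}{2}|A_2|\cdot|A_1|^2. \]
Since $\mathrm{rank}(\Pi^\pm_A) = \frac{1}{2}|A|(|A| \pm 1)$ we get
\[ a_\pm(G) = \frac{|A_2|^2|A_1| \pm |A_2||A_1|^2}{|A|(|A| \pm 1)} = \frac{|A_2| \pm |A_1|}{|A| \pm 1}. \]
From
\[ \frac{a_+(G) + a_-(G)}{2} = \frac{1}{2}\Big(\frac{|A_2| + |A_1|}{|A| + 1} + \frac{|A_2| - |A_1|}{|A| - 1}\Big) = \frac{|A_2||A| - |A_1|}{|A|^2 - 1} = \frac{1}{|A_1|}\cdot\frac{|A|^2 - |A_1|^2}{|A|^2 - 1} \le \frac{1}{|A_1|} \]
and
\[ \frac{a_+(G) - a_-(G)}{2} = \frac{1}{2}\Big(\frac{|A_2| + |A_1|}{|A| + 1} - \frac{|A_2| - |A_1|}{|A| - 1}\Big) = \frac{|A_1||A| - |A_2|}{|A|^2 - 1} = \frac{1}{|A_2|}\cdot\frac{|A|^2 - |A_2|^2}{|A|^2 - 1} \le \frac{1}{|A_2|} \]
it follows that
\[ \int_{U(A)} (U^\dagger \otimes U^\dagger) G (U \otimes U)\, dU = a_+(G)\Pi^+_A + a_-(G)\Pi^-_A = \frac{a_+(G) + a_-(G)}{2} 1_{AA'} + \frac{a_+(G) - a_-(G)}{2} F_{AA'} \le \frac{1}{|A_1|} 1_{AA'} + \frac{1}{|A_2|} F_{AA'}. \]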
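The algebra for the coefficients $a_\pm(G)$ can be checked with exact rational arithmetic. The sketch below (function names are hypothetical) computes $a_\pm(G) = \mathrm{tr}[\Pi^\pm_A G]/\mathrm{rank}(\Pi^\pm_A)$ from the trace and rank formulas of the proof and compares the half-sum and half-difference with the closed forms and the bounds $1/|A_1|$ and $1/|A_2|$. The trivial case $|A| = 1$ is skipped because the anti-symmetric subspace is then empty.

```python
from fractions import Fraction


def schur_coefficients(d1, d2):
    """a_+(G), a_-(G) for G = 1 (x) F with |A1| = d1, |A2| = d2, |A| = d1 * d2 > 1."""
    d = d1 * d2
    # tr[Pi^± G] = (d2^2 d1 ± d2 d1^2) / 2 and rank(Pi^±) = d (d ± 1) / 2.
    a_plus = Fraction(d2 * d2 * d1 + d2 * d1 * d1, d * (d + 1))
    a_minus = Fraction(d2 * d2 * d1 - d2 * d1 * d1, d * (d - 1))
    return a_plus, a_minus


def check_bounds(max_dim):
    """Verify the closed forms and both bounds for all 1 <= d1, d2 <= max_dim."""
    for d1 in range(1, max_dim + 1):
        for d2 in range(1, max_dim + 1):
            d = d1 * d2
            if d == 1:
                continue
            a_plus, a_minus = schur_coefficients(d1, d2)
            half_sum = (a_plus + a_minus) / 2
            half_diff = (a_plus - a_minus) / 2
            if a_plus != Fraction(d2 + d1, d + 1):
                return False
            if half_sum != Fraction(d2 * d - d1, d * d - 1):
                return False
            if half_diff != Fraction(d1 * d - d2, d * d - 1):
                return False
            if half_sum > Fraction(1, d1) or half_diff > Fraction(1, d2):
                return False
    return True


all_bounds_hold = check_bounds(6)
```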
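Lemma C.2. Let $\rho_{AR} \in \mathcal{P}(\mathcal{H}_{AR})$, $A = A_1A_2$ and $\sigma_{A_1R}(U) = \mathrm{tr}_{A_2}\big[(U \otimes 1_R)\rho_{AR}(U \otimes 1_R)^\dagger\big]$. Then
\[ \int_{U(A)} \mathrm{tr}\big[\sigma_{A_1R}(U)^2\big]\, dU \le \frac{1}{|A_1|}\mathrm{tr}\big[\rho_R^2\big] + \frac{1}{|A_2|}\mathrm{tr}\big[\rho_{AR}^2\big], \tag{C2} \]
where $dU$ is the Haar measure over the unitaries on system $A$.

Proof. Using Lemma C.1 we have
\[ \int_{U(A)} \mathrm{tr}\big[\sigma_{A_1R}(U)^2\big]\, dU = \int_{U(A)} \mathrm{tr}\big[\big(\sigma_{A_1R}(U) \otimes \sigma_{A_1'R'}(U)\big) F_{A_1A_1'RR'}\big]\, dU \]
\[ = \int_{U(A)} \mathrm{tr}\big[(U \otimes U \otimes 1_{RR'})(\rho_{AR} \otimes \rho_{A'R'})(U \otimes U \otimes 1_{RR'})^\dagger (1_{A_2A_2'} \otimes F_{A_1A_1'RR'})\big]\, dU \]
\[ = \mathrm{tr}\Big[(\rho_{AR} \otimes \rho_{A'R'})\Big(\int_{U(A)} (U \otimes U)^\dagger (1_{A_2A_2'} \otimes F_{A_1A_1'})(U \otimes U)\, dU \otimes F_{RR'}\Big)\Big] \]
\[ \le \mathrm{tr}\Big[(\rho_{AR} \otimes \rho_{A'R'})\Big(\Big(\frac{1}{|A_1|} 1_{AA'} + \frac{1}{|A_2|} F_{AA'}\Big) \otimes F_{RR'}\Big)\Big] = \frac{1}{|A_1|}\mathrm{tr}\big[\rho_R^2\big] + \frac{1}{|A_2|}\mathrm{tr}\big[\rho_{AR}^2\big]. \]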
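Lemma C.3 [12, Lemma 5.1.3]. Let $S$ be a Hermitian operator on $\mathcal{H}$ and $\xi \in \mathcal{P}(\mathcal{H})$. Then$^{14}$
\[ \|S\|_1 \le \sqrt{\mathrm{tr}(\xi)}\, \big\|\xi^{-1/4} S\, \xi^{-1/4}\big\|_2. \]
Footnote 14. The Hilbert-Schmidt norm is defined as $\|\Gamma\|_2 = \sqrt{\mathrm{tr}[\Gamma^\dagger\Gamma]}$.

Proof of Theorem 3.1. We show the theorem for
\[ \log|A_1| \le \frac{\log|A| + H_C(A|R)_\rho}{2} - \log\frac{1}{\varepsilon}, \tag{C3} \]
which is sufficient because the conditional min-entropy is upper bounded by the quantum conditional collision entropy (Lemma B.6). Using Lemma C.3 with $\xi = 1_{A_1} \otimes \omega_R$ for $\omega_R \in \mathcal{S}_{=}(\mathcal{H}_R)$, it suffices to show that
\[ \int_{U(A)} \Big\| \big(1_{A_1} \otimes \omega_R^{-1/4}\big)\big(\sigma_{A_1R}(U) - \tau_{A_1} \otimes \rho_R\big)\big(1_{A_1} \otimes \omega_R^{-1/4}\big) \Big\|_2^2\, dU \le \frac{\varepsilon^2}{|A_1|}. \]
We have
\[ \big(1_{A_1} \otimes \omega_R^{-1/4}\big)\sigma_{A_1R}(U)\big(1_{A_1} \otimes \omega_R^{-1/4}\big) = \mathrm{tr}_{A_2}\Big[(U \otimes 1_R)\big(1_A \otimes \omega_R^{-1/4}\big)\rho_{AR}\big(1_A \otimes \omega_R^{-1/4}\big)(U \otimes 1_R)^\dagger\Big]. \]
Let $\tilde\rho_{AR} = \big(1_A \otimes \omega_R^{-1/4}\big)\rho_{AR}\big(1_A \otimes \omega_R^{-1/4}\big)$ and $\tilde\sigma_{A_1R}(U) = \mathrm{tr}_{A_2}\big[(U \otimes 1_R)\tilde\rho_{AR}(U \otimes 1_R)^\dagger\big]$. Our inequality can then be rewritten as
\[ \int_{U(A)} \big\|\tilde\sigma_{A_1R}(U) - \tau_{A_1} \otimes \tilde\rho_R\big\|_2^2\, dU \le \frac{\varepsilon^2}{|A_1|}. \]
Using $\tau_{A_1} \otimes \tilde\rho_R = \int_{U(A)} \tilde\sigma_{A_1R}(U)\, dU$ we get
\[ \int_{U(A)} \big\|\tilde\sigma_{A_1R}(U) - \tau_{A_1} \otimes \tilde\rho_R\big\|_2^2\, dU = \int_{U(A)} \mathrm{tr}\big[\big(\tilde\sigma_{A_1R}(U) - \tau_{A_1} \otimes \tilde\rho_R\big)^2\big]\, dU \]
\[ = \int_{U(A)} \Big\{ \mathrm{tr}\big[\tilde\sigma_{A_1R}(U)^2\big] - \mathrm{tr}\big[\tilde\sigma_{A_1R}(U)(\tau_{A_1} \otimes \tilde\rho_R)\big] - \mathrm{tr}\big[(\tau_{A_1} \otimes \tilde\rho_R)\tilde\sigma_{A_1R}(U)\big] + \mathrm{tr}\big[(\tau_{A_1} \otimes \tilde\rho_R)^2\big] \Big\}\, dU \]
\[ = \int_{U(A)} \mathrm{tr}\big[\tilde\sigma_{A_1R}(U)^2\big]\, dU - \mathrm{tr}\big[(\tau_{A_1} \otimes \tilde\rho_R)^2\big] = \int_{U(A)} \mathrm{tr}\big[\tilde\sigma_{A_1R}(U)^2\big]\, dU - \frac{1}{|A_1|}\cdot\mathrm{tr}\big[\tilde\rho_R^2\big] \]
\[ \overset{(i)}{\le} \frac{1}{|A_2|}\cdot\mathrm{tr}\big[\tilde\rho_{AR}^2\big] \overset{(ii)}{\le} \frac{\varepsilon^2}{|A_1|}, \]
where (i) follows from Lemma C.2 and (ii) follows from (C3) and the definition of $H_C(A|R)_\rho$.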
Appendix D: The Post-Selection Technique

We use a norm on the set of CPTP maps which essentially measures the probability by which two such mappings can be distinguished. The norm is known as the diamond norm in quantum information theory [35]. Here, we present it in a formulation which highlights that it is dual to the well-known completely bounded (cb) norm [50].

Definition D.1 (Diamond norm). Let $\mathcal{E}_A : \mathcal{L}(\mathcal{H}_A) \to \mathcal{L}(\mathcal{H}_B)$ be a linear map. The diamond norm of $\mathcal{E}_A$ is defined as
\[ \|\mathcal{E}_A\|_\diamond = \sup_{k\in\mathbb{N}} \|\mathcal{E}_A \otimes \mathcal{I}_k\|_1, \tag{D1} \]
where $\|\mathcal{F}\|_1 = \sup_{\sigma \in \mathcal{S}_{\le}(\mathcal{H})} \|\mathcal{F}(\sigma)\|_1$ and $\mathcal{I}_k$ denotes the identity map on states of a $k$-dimensional quantum system.

Proposition D.2 [35,50]. The supremum in Definition D.1 is reached for $k = |A|$. Furthermore the diamond norm defines a norm on the set of CPTP maps.

Two CPTP maps $\mathcal{E}$ and $\mathcal{F}$ are called ε-close if they are ε-close in the metric induced by the diamond norm.

Definition D.3 (De Finetti states). Let $\sigma \in \mathcal{S}_{=}(\mathcal{H})$ and $\mu(\cdot)$ be a probability measure on $\mathcal{S}_{=}(\mathcal{H})$. Then
\[ \zeta^n = \int \sigma^{\otimes n}\, \mu(\sigma) \in \mathcal{S}_{=}(\mathcal{H}^{\otimes n}) \tag{D2} \]
is called de Finetti state.

The following proposition lies at the heart of the Post-Selection Technique.

Proposition D.4 [34]. Let $\varepsilon > 0$ and $\mathcal{E}^n_A$ and $\mathcal{F}^n_A$ be CPTP maps from $\mathcal{L}(\mathcal{H}_A^{\otimes n})$ to $\mathcal{L}(\mathcal{H}_B)$. If there exists a CPTP map $\mathcal{K}_\pi$ for any permutation $\pi$ such that $(\mathcal{E}^n_A - \mathcal{F}^n_A)\circ\pi = \mathcal{K}_\pi\circ(\mathcal{E}^n_A - \mathcal{F}^n_A)$, then $\mathcal{E}^n_A$ and $\mathcal{F}^n_A$ are ε-close whenever
\[ \big\| \big((\mathcal{E}^n_A - \mathcal{F}^n_A) \otimes \mathcal{I}_{RR'}\big)(\zeta^n_{ARR'}) \big\|_1 \le \varepsilon(n+1)^{-(|A|^2-1)}, \tag{D3} \]
where $\zeta^n_{ARR'}$ is a purification of the de Finetti state $\zeta^n_{AR} = \int \sigma_{AR}^{\otimes n}\, d(\sigma_{AR})$ with $\sigma_{AR} = |\sigma\rangle\langle\sigma|_{AR} \in \mathcal{S}_{=}(\mathcal{H}_A \otimes \mathcal{H}_R)$, $\mathcal{H}_A \cong \mathcal{H}_R$ and $d(\cdot)$ the measure on the normalized pure states on $\mathcal{H}_A \otimes \mathcal{H}_R$ induced by the Haar measure on the unitary group acting on $\mathcal{H}_A \otimes \mathcal{H}_R$, normalized to $\int d(\cdot) = 1$. Furthermore we can assume without loss of generality that $|R'| \le (n+1)^{|A|^2-1}$.

Theorem D.5 [51, Carathéodory]. Let $d \in \mathbb{N}$ and $x$ be a point that lies in the convex hull of a set $P$ of points in $\mathbb{R}^d$. Then there exists a subset $P'$ of $P$ consisting of $d+1$ or fewer points such that $x$ lies in the convex hull of $P'$.

Corollary D.6. Let $\zeta^n_{AR} = \int \sigma_{AR}^{\otimes n}\, d(\sigma_{AR})$ as in Proposition D.4. Then $\zeta^n_{AR} = \sum_i p_i\, (\omega^i_{AR})^{\otimes n}$ with $\omega^i_{AR} = |\omega^i\rangle\langle\omega^i|_{AR} \in \mathcal{S}_{=}(\mathcal{H}_{AR})$, $i \in \{1, 2, \ldots, (n+1)^{2|A||R|-2}\}$ and $p_i$ a probability distribution.
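The number of terms in Corollary D.6 comes from a dimension count: the symmetric subspace has dimension at most $k = (n+1)^{|A||R|-1}$, Hermitian trace-one operators on it form a real space of dimension $m = k - 1 + 2\cdot k(k-1)/2 = k^2 - 1$, and Carathéodory's theorem then gives at most $m + 1$ points. A minimal sketch of this arithmetic, with a hypothetical function name and illustrative parameter values:

```python
def caratheodory_counts(n, dim_a, dim_r):
    """Return (k, m, p): subspace dimension bound, real dimension, number of points."""
    k = (n + 1) ** (dim_a * dim_r - 1)      # bound on dim Sym^n(H_AR)
    m = (k - 1) + 2 * (k * (k - 1) // 2)    # real parameters of a normalized state
    p = m + 1                               # points needed by Caratheodory's theorem
    return k, m, p


k, m, p = caratheodory_counts(n=3, dim_a=2, dim_r=2)
```

For $n = 3$ and $|A| = |R| = 2$ this gives $k = 64$, $m = 4095$ and $p = 4096 = (n+1)^{2|A||R|-2}$.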
Proof. We can think of $\zeta^n_{AR}$ as a normalized state on the symmetric subspace $\mathrm{Sym}^n(\mathcal{H}_{AR}) \subset \mathcal{H}_{AR}^{\otimes n}$. The dimension of $\mathrm{Sym}^n(\mathcal{H}_{AR})$ is bounded by $k = (n+1)^{|A||R|-1}$. Furthermore the normalized states on $\mathrm{Sym}^n(\mathcal{H}_{AR})$ can be seen as living in an $m$-dimensional real vector space where $m = k - 1 + 2\cdot\frac{k(k-1)}{2} = k^2 - 1$. Now define $S$ as the set of all $\xi^n_{AR} = \omega_{AR}^{\otimes n}$, where $\omega_{AR} = |\omega\rangle\langle\omega|_{AR} \in \mathcal{S}_{=}(\mathcal{H}_{AR})$. Then $\zeta^n_{AR}$ lies in the convex hull of the set $S \subset \mathbb{R}^{k^2-1}$. Using Carathéodory's theorem (Theorem D.5), we have that $\zeta^n_{AR}$ lies in the convex hull of a set $S' \subset S$, where $S'$ consists of at most $p = k^2 - 1 + 1 = k^2$ points. Hence we can write $\zeta^n_{AR}$ as a convex combination of $p = (n+1)^{2|A||R|-2}$ extremal points in $S'$, i.e. $\zeta^n_{AR} = \sum_i p_i\,(\omega^i_{AR})^{\otimes n}$, where $\omega^i_{AR} = |\omega^i\rangle\langle\omega^i|_{AR} \in \mathcal{S}_{=}(\mathcal{H}_{AR})$, $i\in\{1,2,\ldots,(n+1)^{2|A||R|-2}\}$ and $p_i$ a probability distribution.

References
1. Shannon, C.E.: A mathematical theory of communication. Bell System Tech. J. 27, 379–423, 623–656 (1948)
2. Bennett, C.H., Shor, P.W., Smolin, J.A., Thapliyal, A.V.: Entanglement-assisted capacity of a quantum channel and the reverse Shannon theorem. IEEE Trans. on Inf. Th. 48(10), 2637 (2002)
3. Holevo, A.S.: The capacity of the quantum communication channel with general signal states. IEEE Trans. on Inf. Th. 44, 269 (1998)
4. Schumacher, B., Westmoreland, M.D.: Sending classical information via noisy quantum channels. Phys. Rev. A 56, 131 (1997)
5. Devetak, I.: The private classical capacity and quantum capacity of a quantum channel. IEEE Trans. on Inf. Th. 51, 44 (2005)
6. Lloyd, S.: Capacity of the noisy quantum channel. Phys. Rev. A 55, 1613 (1997)
7. Shor, P.W.: The quantum channel capacity and coherent information. Lecture notes, MSRI Workshop on Quantum Computation, 2002
8. Bennett, C.H., Devetak, I., Harrow, A.W., Shor, P.W., Winter, A.: The quantum reverse Shannon theorem. http://arxiv.org/abs/0912.5537v1 [quant-ph], 2009
9. Winter, A.: Compression of sources of probability distributions and density operators. http://arxiv.org/abs/quant-ph/0208131v1, 2002
10. Harrow, A.W.: Entanglement spread and clean resource inequalities. In: Proceedings 16th International Congress Mathematical Physics, 2009, River Edge, NJ: World Scientific, 2010
11. van Dam, W., Hayden, P.: Universal entanglement transformations without communication. Phys. Rev. A, Rapid Communication 67, 060302(R) (2003)
12. Renner, R.: Security of quantum key distribution. Int. J. Quantum Inf. 6, 1 (2008)
13. Renner, R., Wolf, S.: Smooth Rényi entropy and applications. In: Proc. IEEE Intl. Sympo. Inf. Th. Piscataway, NJ: IEEE publishing, 2004, p. 233
14. Renner, R., König, R.: Universally composable privacy amplification against quantum adversaries. Springer Lecture Notes in Computer Science, 3378. Berlin-Heidelberg-New York: Springer, 2005, p. 407
15. Berta, M.: Single-shot quantum state merging. Master's thesis, ETH Zurich, 2008. http://arxiv.org/abs/0912.4495v1 [quant-ph], 2009
16. Dupuis, F., Berta, M., Wullschleger, J., Renner, R.: The decoupling theorem. http://arxiv.org/abs/1012.6044v1 [quant-ph], 2010
17. König, R., Renner, R., Schaffner, C.: The operational meaning of min- and max-entropy. IEEE Trans. on Inf. Th. 55(9), (2009)
18. Datta, N.: Min- and max- relative entropies and a new entanglement monotone. IEEE Trans. on Inf. Th. 55(6), 2816 (2009)
19. Datta, N.: Max- relative entropy of entanglement, alias log robustness. Int. J. Quant. Inf. 7, 475 (2009)
20. Mosonyi, M., Datta, N.: Generalized relative entropies and the capacity of classical-quantum channels. J. Math. Phys. 50, 072104 (2009)
21. Buscemi, F., Datta, N.: The quantum capacity of channels with arbitrarily correlated noise. IEEE Trans. on Inf. Th. 56, 1447 (2010)
22. Brandão, F., Datta, N.: One-shot rates for entanglement manipulation under non-entangling maps. IEEE Trans. on Inf. Th. 57, 1754 (2011)
23. Buscemi, F., Datta, N.: Entanglement cost in practical scenarios. Phys. Rev. Lett. 106, 130503 (2011). doi:10.1103/PhysRevLett.106.130503
24. Wang, L., Renner, R.: One-shot classical-quantum capacity and hypothesis testing. http://arxiv.org/abs/1007.5456v1 [quant-ph], 2010
25. Renes, J.M., Renner, R.: One-shot classical data compression with quantum side information and the distillation of common randomness or secret keys. http://arxiv.org/abs/1008.0452v2 [quant-ph], 2010
26. Renes, J.M., Renner, R.: Noisy channel coding via privacy amplification and information reconciliation. http://arxiv.org/abs/1012.4814v1 [quant-ph], 2010
27. Tomamichel, M., Schaffner, C., Smith, A., Renner, R.: Leftover hashing against quantum side information. IEEE Trans. on Inf. Th. 57(8) (2011)
28. Buscemi, F., Datta, N.: General theory of assisted entanglement distillation. http://arxiv.org/abs/1009.4464v1 [quant-ph], 2010
29. Buscemi, F., Datta, N.: Distilling entanglement from arbitrary resources. J. Math. Phys. 51, 102201 (2010)
30. Horodecki, M., Oppenheim, J., Winter, A.: Partial quantum information. Nature 436, 673–676 (2005)
31. Horodecki, M., Oppenheim, J., Winter, A.: Quantum state merging and negative information. Commun. Math. Phys. 269, 107 (2006)
32. Abeyesinghe, A., Devetak, I., Hayden, P., Winter, A.: The mother of all protocols: Restructuring quantum information's family tree. Proc. Royal Soc. A 465(2108), 2537 (2009)
33. Slepian, D., Wolf, J.: Noiseless coding of correlated information sources. IEEE Trans. on Inf. Th. 19, 461 (1971)
34. Christandl, M., König, R., Renner, R.: Post-selection technique for quantum channels with applications to quantum cryptography. Phys. Rev. Lett. 102, 020504 (2009)
35. Kitaev, A.Y.: Quantum computations: algorithms and error correction. Russ. Math. Surv. 52, 1191 (1997)
36. Stinespring, W.: Positive functions on C*-algebras. Proc. Amer. Math. Soc. 6, 211 (1955)
37. Dupuis, F.: The Decoupling Approach to Quantum Information Theory. PhD thesis, Université de Montréal, 2009, http://arxiv.org/abs/1004.1641v1 [quant-ph], 2010
38. Tomamichel, M., Colbeck, R., Renner, R.: A fully quantum asymptotic equipartition property. IEEE Trans. on Inf. Th. 55, 5840–5847 (2009)
39. Tomamichel, M., Colbeck, R., Renner, R.: Duality between smooth min- and max-entropies. IEEE Trans. on Inf. Th. 56, 4674 (2010)
40. Hayden, P.: Quantum information theory via decoupling. Tutorial QIP Singapore, 2011. http://qip2011.quantumlah.org/images/QIPtutorial1.pdf, 2011
41. Oppenheim, J.: State redistribution as merging: introducing the coherent relay. http://arxiv.org/abs/0805.1065v1 [quant-ph], 2008
42. Uhlmann, A.: The transition probability in the state space of a *-algebra. Rep. Math. Phys. 9, 273 (1976)
43. Jozsa, R.: Fidelity for mixed quantum states. J. Mod. Optics 41, 2315 (1994)
44. Leung, D., Toner, B., Watrous, J.: Coherent state exchange in multi-prover quantum interactive proof systems. http://arxiv.org/abs/0804.4118v1 [quant-ph], 2008
45. Deutsch, D.: Quantum computational networks. Proc. Royal Society London 73, 425 (1989)
46. Nielsen, M.A., Chuang, I.L.: Quantum computation and quantum information. Cambridge: Cambridge University Press, 2000
47. Bennett, C.H., Brassard, G., Crépeau, C., Jozsa, R., Peres, A., Wootters, W.K.: Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels. Phys. Rev. Lett. 70, 1895 (1993)
48. Audenaert, K.M.R.: A sharp Fannes-type inequality for the von Neumann entropy. J. Phys. A 40, 8127–8136 (2007)
49. Berta, M., Christandl, M., Colbeck, R., Renes, J.M., Renner, R.: The uncertainty principle in the presence of quantum memory. Nature Phys. 6, 659 (2010)
50. Paulsen, V.I.: Completely bounded maps and operator algebras. Cambridge: Cambridge University Press, 2002
51. Gruber, P.M., Wills, J.M.: Handbook of Convex Geometry, Vol. A. London: Elsevier Science Publishers, 1993

Communicated by M.B. Ruskai
Commun. Math. Phys. 306, 617–645 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1310-1
Communications in
Mathematical Physics
The Szegő Kernel on a Sewn Riemann Surface

Michael P. Tuite, Alexander Zuevsky

School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, University Road, Galway, Ireland. E-mail: [email protected]; [email protected]

Received: 22 February 2010 / Accepted: 1 March 2011
Published online: 30 July 2011 – © Springer-Verlag 2011
Abstract: We describe the Szegő kernel on a higher genus Riemann surface in terms of Szegő kernel data coming from lower genus surfaces via two explicit sewing procedures where either two Riemann surfaces are sewn together or a handle is sewn to a Riemann surface. We consider in detail the examples of the Szegő kernel on a genus two Riemann surface formed by either sewing together two punctured tori or by sewing a twice-punctured torus to itself. We also consider the modular properties of the Szegő kernel in these cases.

Supported by a Science Foundation Ireland Research Frontiers Programme Grant, and by Max–Planck Institut für Mathematik, Bonn.

1. Introduction

The purpose of this paper is to provide an explicit description of the Szegő kernel [Sz,HS,Sc,F1] on a higher genus Riemann surface in terms of Szegő kernel data coming from lower genus surfaces. We exploit two explicit sewing procedures where either two lower genus Riemann surfaces are sewn together or else a handle is sewn to a lower genus Riemann surface. We also consider in some detail the construction and modular properties of the Szegő kernel on a genus two Riemann surface formed either by sewing two tori together or by sewing a handle on to a torus.

This paper is a further development of the theory of partition and n-point correlation functions on Riemann surfaces for vertex operator algebras (e.g. [FLM,Ka]) as described in [T,MT1,MT2,MT3,MT4,MTZ]. Our main motivation is to lay the foundations for the explicit construction of the partition and n-point correlation functions for a fermionic vertex operator super algebra on higher genus Riemann surfaces [TZ1,TZ2]. (The central role played by the Szegő kernel for such systems has been long appreciated in theoretical physics [RS,R,DVFHLS,DVPFHLS]). Our ultimate aim is to develop a fully rigorous theory of higher genus partition and n-point functions along the lines of Zhu's theory at genus one [Z]. Thus, to name but
a few applications, one might eventually study Siegel (sub)group modular invariance, the Friedan-Shenker conjecture concerning the reconstructability of a conformal field theory from the genus $g$ partition function at all genera [FS], higher genus bosonization etc. However, this present paper may also be of interest to readers outside the vertex operator algebra community.

We begin in Sect. 2 with a review of some basic aspects of the theory of Riemann surfaces [FK,Sp,F1,F2,Mu]. We then define and discuss properties of the Szegő kernel, which is a meromorphic $(\frac{1}{2},\frac{1}{2})$ differential with a simple pole structure and prescribed multiplicities on the cycles of the Riemann surface [Sz,HS,Sc,F1].

In Sect. 3 we describe the Szegő kernel on a genus $g_1+g_2$ Riemann surface $\Sigma^{(g_1+g_2)}$ obtained by sewing two lower genus Riemann surfaces $\Sigma^{(g_1)}$ and $\Sigma^{(g_2)}$. This is similar to the approach of refs. [Y] and [MT1], for computing the period matrix and other related structures on $\Sigma^{(g_1+g_2)}$ in terms of lower genus data. Following [MT1], we refer to this sewing scheme as the $\epsilon$-formalism where $\epsilon$ is a complex sewing parameter which forms part of the data according to which the sewing is performed (see Fig. 1 below). In particular, we introduce an infinite block matrix
\[ Q = \begin{pmatrix} 0 & \xi F_1 \\ -\xi F_2 & 0 \end{pmatrix}, \tag{1} \]
where $F_1$ and $F_2$ are infinite matrices whose entries are certain weighted moments of the Szegő kernels on $\Sigma^{(g_1)}$ and $\Sigma^{(g_2)}$, respectively, and $\xi\in\{\pm\sqrt{-1}\}$. The matrix $I-Q$, where $I$ is the infinite identity matrix, plays a crucial role here (and in the sequel [TZ1]). In particular, we show that $I-Q$ is invertible for small enough $\epsilon$. $(I-Q)^{-1}$ then forms part of the expression of the genus $g_1+g_2$ Szegő kernel in terms of the lower genus Szegő kernel data as proved in Theorem 3. In Theorem 4 we further show that the determinant $\det(I-Q)$ is well-defined and is a non-vanishing holomorphic function for small enough $\epsilon$. Finally, we describe the example of the Szegő kernel on a genus two Riemann surface formed by sewing two tori and verify its modular transformation properties under the modular group which preserves the sewing scheme. This example is extensively exploited in [TZ1].

Section 4 is devoted to the development of the corresponding formalism in the case that a Riemann surface $\Sigma^{(g+1)}$ of genus $g+1$ is obtained by self-sewing a handle to a genus $g$ Riemann surface $\Sigma^{(g)}$ with complex sewing parameter $\rho$. We refer to this as the $\rho$-formalism. This case is more technical due to the extra multiplicities on the two new cycles associated with the sewing handle. This leads us to introduce an analogue of (1), namely, an infinite matrix $T$ whose entries are determined by weighted moments of certain genus $g$ objects related to the Szegő kernel on $\Sigma^{(g)}$ and the new multiplicities. We show that $I-T$ is invertible for suitably small $\rho$ and in Theorem 7 express the Szegő kernel on $\Sigma^{(g+1)}$ in terms of $(I-T)^{-1}$ and other genus $g$ Szegő kernel data. In Theorem 8 we show that the determinant $\det(I-T)$ is well-defined and holomorphic for suitably small $\rho$. We conclude with two examples: sewing a handle to a Riemann sphere to obtain a torus, and sewing a handle to a torus to obtain a genus two Riemann surface. The modular transformation properties of the genus two Szegő kernel are also verified under the modular group preserving this $\rho$-sewing scheme. This example will be extensively exploited in [TZ2].

2. The Szegő Kernel on a Riemann Surface

Consider a compact Riemann surface $\Sigma$ of genus $g$ with canonical homology cycle basis $a_1,\ldots,a_g,b_1,\ldots,b_g$. In general there exist $g$ holomorphic 1-forms $\nu_i$, $i=1,\ldots,g$ which we may normalize by (e.g. [FK,Sp])
\[ \oint_{a_i}\nu_j = 2\pi i\,\delta_{ij}. \tag{2} \]
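The genus $g$ period matrix $\Omega$ is defined by
\[ \Omega_{ij} = \frac{1}{2\pi i}\oint_{b_i}\nu_j, \tag{3} \]
for $i,j=1,\ldots,g$. $\Omega$ is symmetric with positive imaginary part i.e. $\Omega\in\mathbb{H}_g$, the Siegel upper half plane. The canonical intersection form on cycles is preserved under the action of the symplectic group $Sp(2g,\mathbb{Z})$, where
\[ \begin{pmatrix} b\\ a\end{pmatrix} \to \begin{pmatrix}\tilde{b}\\ \tilde{a}\end{pmatrix} = \begin{pmatrix} A & B\\ C & D\end{pmatrix}\begin{pmatrix} b\\ a\end{pmatrix}, \qquad \begin{pmatrix} A & B\\ C & D\end{pmatrix}\in Sp(2g,\mathbb{Z}). \tag{4} \]
This induces the modular action on $\mathbb{H}_g$
\[ \Omega \to \tilde{\Omega} = (A\Omega+B)(C\Omega+D)^{-1}. \tag{5} \]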
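As a numerical illustration of the modular action (5), the following minimal Python sketch applies one sample element of $Sp(4,\mathbb{Z})$ to a sample genus two period matrix and checks that the image is again symmetric with positive definite imaginary part. The matrix entries and the helper names (`matmul`, `matadd`, `inv2`, `modular_action`) are illustrative assumptions and do not come from the text.

```python
# Minimal sketch (assumed sample data): Omega -> (A Omega + B)(C Omega + D)^{-1}
# for genus 2, using plain 2x2 complex matrices stored as nested lists.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def inv2(X):
    # Inverse of a 2x2 matrix via the adjugate formula.
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det],
            [-X[1][0] / det, X[0][0] / det]]

def modular_action(A, B, C, D, Omega):
    return matmul(matadd(matmul(A, Omega), B),
                  inv2(matadd(matmul(C, Omega), D)))

# Hypothetical point of the Siegel upper half plane H_2.
Omega = [[1j, 0.25 + 0.1j], [0.25 + 0.1j, 0.3 + 2j]]

# Sample symplectic element A = D = 0, B = -I, C = I, i.e. Omega -> -Omega^{-1}.
Z = [[0, 0], [0, 0]]
I2 = [[1, 0], [0, 1]]
mI2 = [[-1, 0], [0, -1]]
Omega_t = modular_action(Z, mI2, I2, Z, Omega)

symmetric = abs(Omega_t[0][1] - Omega_t[1][0]) < 1e-12
im = [[w.imag for w in row] for row in Omega_t]
positive = im[0][0] > 0 and im[0][0] * im[1][1] - im[0][1] * im[1][0] > 0
```

Both flags are expected to be true here, consistent with (5) mapping $\mathbb{H}_g$ to itself.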
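It is useful to introduce the normalized differential of the second kind defined by [Sp,Mu,F1]:
\[ \omega(x,y) \sim \frac{dx\,dy}{(x-y)^2} \quad\text{for } x\sim y, \tag{6} \]
for local coordinates $x,y$, with normalization $\oint_{a_i}\omega(x,\cdot)=0$ for $i=1,\ldots,g$. Using the Riemann bilinear relations, one finds that $\nu_i(x)=\oint_{b_i}\omega(x,\cdot)$. We also introduce the normalized differential of the third kind
\[ \omega_{p_2-p_1}(x) = \int_{p_1}^{p_2}\omega(x,\cdot), \tag{7} \]
for which $\oint_{a_i}\omega_{p_2-p_1}=0$ and $\omega_{p_2-p_1}(x)\sim\frac{(-1)^a}{x-p_a}dx$ for $x\sim p_a$ and $a=1,2$.

We recall the definition of the theta function with real characteristics e.g. [Mu,F1,FK]
\[ \vartheta\begin{bmatrix}\alpha\\ \beta\end{bmatrix}(z|\Omega) = \sum_{m\in\mathbb{Z}^g}\exp\left(i\pi(m+\alpha).\Omega.(m+\alpha) + (m+\alpha).(z+2\pi i\beta)\right), \tag{8} \]
for $\alpha=(\alpha_i)$, $\beta=(\beta_i)\in\mathbb{R}^g$, $z=(z_i)\in\mathbb{C}^g$ and $i=1,\ldots,g$ with
\[ \begin{aligned} \vartheta\begin{bmatrix}\alpha\\ \beta\end{bmatrix}(z+2\pi i(\Omega.r+s)|\Omega) &= e^{2\pi i\alpha.s}\,e^{-2\pi i\beta.r}\,e^{-i\pi r.\Omega.r-r.z}\,\vartheta\begin{bmatrix}\alpha\\ \beta\end{bmatrix}(z|\Omega),\\ \vartheta\begin{bmatrix}\alpha+r\\ \beta+s\end{bmatrix}(z|\Omega) &= e^{2\pi i\alpha.s}\,\vartheta\begin{bmatrix}\alpha\\ \beta\end{bmatrix}(z|\Omega), \end{aligned} \tag{9} \]
for $r,s\in\mathbb{Z}^g$. There exists a (nonsingular and odd) character $\begin{bmatrix}\gamma\\ \delta\end{bmatrix}$ such that [Mu,F1]
\[ \vartheta\begin{bmatrix}\gamma\\ \delta\end{bmatrix}(0|\Omega) = 0, \qquad \partial_{z_i}\vartheta\begin{bmatrix}\gamma\\ \delta\end{bmatrix}(0|\Omega) \neq 0. \tag{10} \]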
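The quasi-periodicity relations (9) can be spot-checked numerically at genus one, where $\Omega=\tau$ is a single number in the upper half plane. The sketch below sums the series (8) with a finite cutoff; the function name `theta`, the cutoff and all sample values are assumptions chosen for illustration only.

```python
import cmath

def theta(alpha, beta, z, tau, cutoff=30):
    # Truncated genus-one version of the series (8) with Omega = tau.
    total = 0j
    for m in range(-cutoff, cutoff + 1):
        n = m + alpha
        total += cmath.exp(1j * cmath.pi * n * n * tau
                           + n * (z + 2j * cmath.pi * beta))
    return total

# Hypothetical sample data.
alpha, beta = 0.3, 0.15
tau = 0.2 + 1.1j
z = 0.4 - 0.3j
r, s = 1, 2

# First relation of (9): shift z by the lattice vector 2*pi*i*(tau*r + s).
lhs = theta(alpha, beta, z + 2j * cmath.pi * (tau * r + s), tau)
rhs = (cmath.exp(2j * cmath.pi * alpha * s)
       * cmath.exp(-2j * cmath.pi * beta * r)
       * cmath.exp(-1j * cmath.pi * r * r * tau - r * z)
       * theta(alpha, beta, z, tau))

# Second relation of (9): integer shifts of the characteristics.
lhs2 = theta(alpha + r, beta + s, z, tau)
rhs2 = cmath.exp(2j * cmath.pi * alpha * s) * theta(alpha, beta, z, tau)
```

With $\mathrm{Im}\,\tau$ of order one the series converges very quickly, so a modest cutoff should already reproduce both identities to near machine precision.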
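Let
\[ \zeta(x) = \sum_{i=1}^{g}\partial_{z_i}\vartheta\begin{bmatrix}\gamma\\ \delta\end{bmatrix}(0|\Omega)\,\nu_i(x), \tag{11} \]
a holomorphic 1-form, and let $\zeta(x)^{\frac{1}{2}}$ denote the form of weight $\frac{1}{2}$ on the double cover $\widetilde{\Sigma}$ of $\Sigma$. We also refer to $\zeta(x)^{\frac{1}{2}}$ as a (double-valued) $\frac{1}{2}$-form on $\Sigma$. We define the prime form $E(x,y)$ by$^{1}$
\[ E(x,y) = \frac{\vartheta\begin{bmatrix}\gamma\\ \delta\end{bmatrix}\left(\int_y^x\nu\,\middle|\,\Omega\right)}{\zeta(x)^{\frac{1}{2}}\zeta(y)^{\frac{1}{2}}} \sim (x-y)\,dx^{-\frac{1}{2}}dy^{-\frac{1}{2}} \quad\text{for } x\sim y, \tag{12} \]
where $\int_y^x\nu = (\int_y^x\nu_i)\in\mathbb{C}^g$. $E(x,y)=-E(y,x)$ is a holomorphic differential form of weight $(-\frac{1}{2},-\frac{1}{2})$ on $\widetilde{\Sigma}\times\widetilde{\Sigma}$. $E(x,y)$ has multipliers along the $a_i$ and $b_j$ cycles in $x$ given by $1$ and $e^{-i\pi\Omega_{jj}-\int_y^x\nu_j}$ respectively [F1]. The normalized differentials of the second and third kind can be expressed in terms of the prime form [Mu]
\[ \omega(x,y) = \partial_x\partial_y\log E(x,y)\,dx\,dy, \tag{13} \]
\[ \omega_{p-q}(x) = \partial_x\log\frac{E(x,p)}{E(x,q)}\,dx. \tag{14} \]
Conversely, we can also express the prime form in terms of $\omega$ by [F2]
\[ E(x,y) = \lim_{p\to x,\ q\to y}\left[\sqrt{(x-p)(q-y)}\,\exp\left(-\frac{1}{2}\int_y^x\omega_{p-q}\right)\right]dx^{-\frac{1}{2}}dy^{-\frac{1}{2}}. \tag{15} \]

$^{1}$ Note that our definition differs from that of refs. [Mu,F1] by a factor of $-1$.

We define the Szegő kernel [Sc,HS,F1] for $\vartheta\begin{bmatrix}\alpha\\ \beta\end{bmatrix}(0|\Omega)\neq 0$ as follows:
\[ S\begin{bmatrix}\theta\\ \phi\end{bmatrix}(x,y|\Omega) = \frac{\vartheta\begin{bmatrix}\alpha\\ \beta\end{bmatrix}\left(\int_y^x\nu\,\middle|\,\Omega\right)}{\vartheta\begin{bmatrix}\alpha\\ \beta\end{bmatrix}(0|\Omega)\,E(x,y)}, \tag{16} \]
where $\theta=(\theta_i)$, $\phi=(\phi_i)\in U(1)^n$ for
\[ \theta_j = -e^{-2\pi i\beta_j}, \quad \phi_j = -e^{2\pi i\alpha_j}, \quad j=1,\ldots,g. \tag{17} \]
It follows from (9) that (16) is a function of $e^{2\pi i\alpha_i}$ and $e^{2\pi i\beta_i}$. The further factors of $-1$ in (17) are included for later convenience. The Szegő kernel has multipliers along the $a_i$ and $b_j$ cycles in $x$ given by $-\phi_i$ and $-\theta_j$ respectively and is a meromorphic $(\frac{1}{2},\frac{1}{2})$-form on $\widetilde{\Sigma}\times\widetilde{\Sigma}$ satisfying:
\[ S\begin{bmatrix}\theta\\ \phi\end{bmatrix}(x,y) \sim \frac{1}{x-y}\,dx^{\frac{1}{2}}dy^{\frac{1}{2}} \quad\text{for } x\sim y, \tag{18} \]
\[ S\begin{bmatrix}\theta\\ \phi\end{bmatrix}(x,y) = -S\begin{bmatrix}\theta^{-1}\\ \phi^{-1}\end{bmatrix}(y,x), \tag{19} \]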
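where $\theta^{-1}=(\theta_i^{-1})$ and $\phi^{-1}=(\phi_i^{-1})$. Note that the skew-symmetry property (19) implies $S\begin{bmatrix}\theta\\ \phi\end{bmatrix}(x,y)$ has multipliers along the $a_i$ and $b_j$ cycles in $y$ given by $-\phi_i^{-1}$ and $-\theta_j^{-1}$ respectively.

Finally, we describe the modular invariance of the Szegő kernel under the symplectic group $Sp(2g,\mathbb{Z})$ where we find [F1]
\[ S\begin{bmatrix}\tilde{\theta}\\ \tilde{\phi}\end{bmatrix}(x,y|\tilde{\Omega}) = S\begin{bmatrix}\theta\\ \phi\end{bmatrix}(x,y|\Omega), \tag{20} \]
with $\tilde{\Omega}$ of (5) and where $\tilde{\theta}_j=-e^{-2\pi i\tilde{\beta}_j}$, $\tilde{\phi}_j=-e^{2\pi i\tilde{\alpha}_j}$ for
\[ \begin{pmatrix}-\tilde{\beta}\\ \tilde{\alpha}\end{pmatrix} = \begin{pmatrix}A & B\\ C & D\end{pmatrix}\begin{pmatrix}-\beta\\ \alpha\end{pmatrix} + \frac{1}{2}\begin{pmatrix}-\mathrm{diag}(AB^T)\\ \mathrm{diag}(CD^T)\end{pmatrix}, \tag{21} \]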
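where $\mathrm{diag}(M)$ denotes the diagonal elements of a matrix $M$.

For a Riemann surface of genus one described by an oriented torus $\mathbb{C}/\Lambda$ for lattice $\Lambda=2\pi i(\mathbb{Z}\tau\oplus\mathbb{Z})$ for $\tau\in\mathbb{H}_1$, the genus one prime form is $E^{(1)}(x,y)=K(x-y,\tau)\,dx^{-\frac{1}{2}}dy^{-\frac{1}{2}}$, where
\[ K(z,\tau) = \frac{\vartheta_1(z,\tau)}{\partial_z\vartheta_1(0,\tau)}, \tag{22} \]
for $z\in\mathbb{C}$ and $\tau\in\mathbb{H}_1$ and where $\vartheta_1(z,\tau)=\vartheta\begin{bmatrix}\frac{1}{2}\\ \frac{1}{2}\end{bmatrix}(z,\tau)$.

For $(\theta,\phi)\neq(1,1)$ with $\theta=-e^{-2\pi i\beta}$ and $\phi=-e^{2\pi i\alpha}$ the genus one Szegő kernel is
\[ S^{(1)}\begin{bmatrix}\theta\\ \phi\end{bmatrix}(x,y|\tau) = P_1\begin{bmatrix}\theta\\ \phi\end{bmatrix}(x-y,\tau)\,dx^{\frac{1}{2}}dy^{\frac{1}{2}}, \tag{23} \]
where
\[ P_1\begin{bmatrix}\theta\\ \phi\end{bmatrix}(z,\tau) = \frac{\vartheta\begin{bmatrix}\alpha\\ \beta\end{bmatrix}(z,\tau)}{\vartheta\begin{bmatrix}\alpha\\ \beta\end{bmatrix}(0,\tau)}\,\frac{1}{K(z,\tau)} = -\sum_{k\in\mathbb{Z}}\frac{q_z^{k+\lambda}}{1-\theta^{-1}q^{k+\lambda}}, \tag{24} \]
is a 'twisted' Weierstrass function [MTZ] for $q_z=e^z$ and with $\phi=\exp(2\pi i\lambda)$ for $0\le\lambda<1$. The genus one modular group $SL(2,\mathbb{Z})$ acts in this case with (20) and (21) following from
\[ P_1\left[\gamma\begin{pmatrix}\theta\\ \phi\end{pmatrix}\right](\gamma z|\gamma\tau) = (c\tau+d)\,P_1\begin{bmatrix}\theta\\ \phi\end{bmatrix}(z|\tau), \tag{25} \]
with
\[ \gamma\tau = \frac{a\tau+b}{c\tau+d}, \qquad \gamma z = \frac{z}{c\tau+d}, \tag{26} \]
for $\gamma=\begin{pmatrix}a & b\\ c & d\end{pmatrix}\in SL(2,\mathbb{Z})$ and
\[ \gamma\begin{pmatrix}\theta\\ \phi\end{pmatrix} = \begin{pmatrix}\theta^a\phi^b\\ \theta^c\phi^d\end{pmatrix}. \tag{27} \]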
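We also have a Laurent expansion [MTZ]
\[ P_1\begin{bmatrix}\theta\\ \phi\end{bmatrix}(z,\tau) = \frac{1}{z} - \sum_{n\ge 1}E_n\begin{bmatrix}\theta\\ \phi\end{bmatrix}(\tau)\,z^{n-1}, \tag{28} \]
for twisted Eisenstein series defined by
\[ E_n\begin{bmatrix}\theta\\ \phi\end{bmatrix}(\tau) = -\frac{B_n(\lambda)}{n!} + \frac{1}{(n-1)!}\sum_{r\ge 0}\frac{(r+\lambda)^{n-1}\theta^{-1}q^{r+\lambda}}{1-\theta^{-1}q^{r+\lambda}} + \frac{(-1)^n}{(n-1)!}\sum_{r\ge 1}\frac{(r-\lambda)^{n-1}\theta q^{r-\lambda}}{1-\theta q^{r-\lambda}}, \tag{29} \]
for $n\ge 1$ and where $B_n(\lambda)$ is the Bernoulli polynomial defined by
\[ \frac{q_z^{\lambda}}{q_z-1} = \frac{1}{z} + \sum_{n\ge 1}\frac{B_n(\lambda)}{n!}z^{n-1}. \]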
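The generating function for the Bernoulli polynomials $B_n(\lambda)$ quoted above can be checked directly. The following sketch builds $B_n(\lambda)$ from the Bernoulli numbers (with the convention $B_1=-\frac{1}{2}$) using exact rational arithmetic and compares a truncated series with $e^{\lambda z}/(e^{z}-1)$; the helper names and the sample values of $\lambda$ and $z$ are assumptions for illustration.

```python
from fractions import Fraction
from math import comb, exp, factorial

def bernoulli_numbers(N):
    # B_0, ..., B_N from the recursion sum_{j=0}^{m} C(m+1, j) B_j = 0, m >= 1.
    B = [Fraction(1)]
    for m in range(1, N + 1):
        B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def bernoulli_poly(n, lam, B):
    # B_n(lam) = sum_{k=0}^{n} C(n, k) B_k lam^(n-k).
    return sum(comb(n, k) * B[k] * lam ** (n - k) for k in range(n + 1))

N = 20
B = bernoulli_numbers(N)
lam = Fraction(1, 4)      # hypothetical twist parameter, 0 <= lam < 1
z = 0.3                   # hypothetical small argument, |z| < 2*pi

series = 1 / z + sum(float(bernoulli_poly(n, lam, B)) / factorial(n) * z ** (n - 1)
                     for n in range(1, N + 1))
closed_form = exp(float(lam) * z) / (exp(z) - 1)
```

For these sample values the first two polynomials are $B_1(\frac{1}{4})=-\frac{1}{4}$ and $B_2(\frac{1}{4})=-\frac{1}{48}$, and the truncated series should agree with the closed form to high accuracy.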
For $(\theta,\phi)=(1,1)$ and $n\ge 2$ the twisted Eisenstein series reduce to the standard elliptic Eisenstein series with $E_n(\tau)=0$ for $n$ odd.

3. The Szegő Kernel on Two Sewn Riemann Surfaces

3.1. The $\epsilon$-formalism sewing scheme. We review the Yamada [Y] formalism for 'sewing' together two Riemann surfaces $\Sigma^{(g_a)}$ of genus $g_a$ for $a=1,2$ to form a surface of genus $g_1+g_2$. Following [MT1], we refer to this sewing scheme as the $\epsilon$-formalism. Choose a local coordinate $z_a$ on $\Sigma^{(g_a)}$ in the neighborhood of a point $p_a$, and consider the closed disk $|z_a|\le r_a$ for $r_a>0$, sufficiently small. Let $\epsilon$ be a complex sewing parameter with $|\epsilon|\le r_1r_2$ and excise the disk $\{z_a : |z_a|\le|\epsilon|/r_{\bar{a}}\}\subset\Sigma^{(g_a)}$, to form a punctured surface
\[ \widehat{\Sigma}^{(g_a)} = \Sigma^{(g_a)}\backslash\{z_a : |z_a|\le|\epsilon|/r_{\bar{a}}\}. \]
Here and below, we use the convention $\bar{1}=2$, $\bar{2}=1$. Define the annulus $\mathcal{A}_a=\{z_a : |\epsilon|/r_{\bar{a}}\le|z_a|\le r_a\}\subset\widehat{\Sigma}^{(g_a)}$ and identify $\mathcal{A}_1$ and $\mathcal{A}_2$ as a single region $\mathcal{A}=\mathcal{A}_1\simeq\mathcal{A}_2$ via the sewing relation
\[ z_1z_2 = \epsilon. \tag{30} \]
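To make the identification of the two annuli concrete, the short sketch below checks numerically that the sewing relation (30) maps points of $\mathcal{A}_1$ into $\mathcal{A}_2$ and exchanges their inner and outer boundaries. The values of the sewing parameter and of the radii, as well as the function names, are hypothetical.

```python
import cmath

eps = 0.02 * cmath.exp(0.7j)   # hypothetical sewing parameter
r1, r2 = 0.5, 0.4              # hypothetical disk radii with |eps| <= r1 * r2

def in_annulus(z, r_self, r_other, tol=1e-12):
    # A_a = { z_a : |eps| / r_abar <= |z_a| <= r_a }
    return abs(eps) / r_other - tol <= abs(z) <= r_self + tol

def sew(z):
    # Sewing relation z_1 z_2 = eps, solved for the other coordinate.
    return eps / z

inner, outer = abs(eps) / r2, r1          # boundary radii of A_1
samples = [rho * cmath.exp(1j * t)
           for rho in (inner, 0.5 * (inner + outer), outer)
           for t in (0.0, 1.0, 2.5)]
images = [sew(z1) for z1 in samples]

all_in_A1 = all(in_annulus(z1, r1, r2) for z1 in samples)
all_in_A2 = all(in_annulus(z2, r2, r1) for z2 in images)
```

The inner circle $|z_1|=|\epsilon|/r_2$ is sent to the outer circle $|z_2|=r_2$ and vice versa, which is why the two annuli can be glued into a single region.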
Fig. 1. Sewing Two Riemann Surfaces
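In this way we obtain a compact Riemann surface $\Sigma^{(g_1+g_2)}=\{\widehat{\Sigma}^{(g_1)}\backslash\mathcal{A}_1\}\cup\{\widehat{\Sigma}^{(g_2)}\backslash\mathcal{A}_2\}\cup\mathcal{A}$ of genus $g_1+g_2$. By construction, $\Sigma^{(g_1+g_2)}$ degenerates into $\Sigma^{(g_1)}$ and $\Sigma^{(g_2)}$ in the limit $\epsilon\to 0$. The form $\omega^{(g_1+g_2)}$ on $\Sigma^{(g_1+g_2)}$ can be found in terms of data coming from $\omega^{(g_a)}$ on $\Sigma^{(g_a)}$ [Y]. $\Sigma^{(g_1+g_2)}$ inherits a homology cycle basis labeled $\{a_{s_1},b_{s_1} : s_1=1,\ldots,g_1\}$ and $\{a_{s_2},b_{s_2} : s_2=g_1+1,\ldots,g_1+g_2\}$ from $\Sigma^{(g_1)}$ and $\Sigma^{(g_2)}$ respectively. This allows us to compute the normalized 1-forms $\nu_i^{(g_1+g_2)}$ and the period matrix $\Omega_{ij}^{(g_1+g_2)}$. In particular, we find [Y,MT1]

Theorem 1. $\omega^{(g_1+g_2)}$, $\nu_i^{(g_1+g_2)}$ and $\Omega_{ij}^{(g_1+g_2)}$ are holomorphic in $\epsilon$ for $|\epsilon|<r_1r_2$ with
\[ \begin{aligned} \omega^{(g_1+g_2)}(x,y) &= \delta_{ab}\,\omega^{(g_a)}(x,y) + O(\epsilon),\\ \nu_{s_b}^{(g_1+g_2)}(x) &= \delta_{ab}\,\nu_{s_a}^{(g_a)}(x) + O(\epsilon),\\ \Omega_{s_at_b}^{(g_1+g_2)} &= \delta_{ab}\,\Omega_{s_at_a}^{(g_a)} + O(\epsilon), \end{aligned} \]
for $x\in\widehat{\Sigma}^{(g_a)}$, $y\in\widehat{\Sigma}^{(g_b)}$ and $a,b=1,2$ and where $s_a,t_b$ label the inherited homology basis.

The explicit form of $\omega^{(g_1+g_2)}$, $\nu_i^{(g_1+g_2)}$ and $\Omega_{ij}^{(g_1+g_2)}$ is described in [Y,MT1].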
3.2. The Szegő kernel in the $\epsilon$-formalism. We now determine the Szegő kernel on the Riemann surface $\Sigma^{(g_1+g_2)}$ in terms of data coming from the Szegő kernel
\[ S^{(g_a)}(x,y) = S^{(g_a)}\begin{bmatrix}\theta^{(g_a)}\\ \phi^{(g_a)}\end{bmatrix}(x,y), \tag{31} \]
on the surface $\Sigma^{(g_a)}$ for $a=1,2$. We adopt the abbreviated notation of the left hand side of (31) when there is no ambiguity. Similarly, the Szegő kernel on $\Sigma^{(g_1+g_2)}$ is denoted by
\[ S^{(g_1+g_2)}(x,y) = S^{(g_1+g_2)}\begin{bmatrix}\theta^{(g_1+g_2)}\\ \phi^{(g_1+g_2)}\end{bmatrix}(x,y), \tag{32} \]
with periodicities $(\theta_{s_a}^{(g_1+g_2)},\phi_{s_a}^{(g_1+g_2)})=(\theta_{s_a}^{(g_a)},\phi_{s_a}^{(g_a)})$ on the inherited homology basis.$^{2}$

$^{2}$ Note that we exclude those Riemann theta characteristics for which (32) exists but where either of the lower genus theta functions vanishes, i.e. we assume that (31) exists for $a=1,2$.
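We next describe $S^{(g_1+g_2)}(x,y)$ in terms of $S^{(g_a)}(x,y)$. We first show that

Theorem 2. $S^{(g_1+g_2)}$ is holomorphic in $\epsilon^{\frac{1}{2}}$ for $|\epsilon|<r_1r_2$ with
\[ S^{(g_1+g_2)}(x,y) = \begin{cases} S^{(g_a)}(x,y)+O(\epsilon), & \text{for } x,y\in\widehat{\Sigma}^{(g_a)},\\ O(\epsilon^{\frac{1}{2}}), & \text{for } x\in\widehat{\Sigma}^{(g_a)},\ y\in\widehat{\Sigma}^{(g_{\bar{a}})}. \end{cases} \]

Proof. Applying Theorem 1 to (8) we have
\[ \vartheta\begin{bmatrix}\alpha^{(g_1+g_2)}\\ \beta^{(g_1+g_2)}\end{bmatrix}\left(z^{(g_1+g_2)}\middle|\Omega^{(g_1+g_2)}\right) = \vartheta\begin{bmatrix}\alpha^{(g_1)}\\ \beta^{(g_1)}\end{bmatrix}\left(z^{(g_1)}\middle|\Omega^{(g_1)}\right)\vartheta\begin{bmatrix}\alpha^{(g_2)}\\ \beta^{(g_2)}\end{bmatrix}\left(z^{(g_2)}\middle|\Omega^{(g_2)}\right) + O(\epsilon), \tag{33} \]
with $(\alpha^{(g_1+g_2)})=(\alpha_1^{(g_1)},\ldots,\alpha_{g_1}^{(g_1)},\alpha_1^{(g_2)},\ldots,\alpha_{g_2}^{(g_2)})$, etc. We firstly show that the genus $g_1+g_2$ prime form obeys
\[ E^{(g_1+g_2)}(x,y) = \begin{cases} E^{(g_a)}(x,y)+O(\epsilon), & \text{for } x,y\in\widehat{\Sigma}^{(g_a)},\\ O(\epsilon^{-\frac{1}{2}}), & \text{for } x\in\widehat{\Sigma}^{(g_a)},\ y\in\widehat{\Sigma}^{(g_{\bar{a}})}. \end{cases} \tag{34} \]
For the genus $g_1+g_2$ odd characteristic of (10) we find from (33) that either $\vartheta\begin{bmatrix}\gamma^{(g_1)}\\ \delta^{(g_1)}\end{bmatrix}(0)=0$ or $\vartheta\begin{bmatrix}\gamma^{(g_2)}\\ \delta^{(g_2)}\end{bmatrix}(0)=0$ on the lower genus surfaces. Hence it follows that $\zeta^{(g_1+g_2)}(x)\,\zeta^{(g_1+g_2)}(y)=O(\epsilon)$ for $x\in\widehat{\Sigma}^{(g_a)}$, $y\in\widehat{\Sigma}^{(g_{\bar{a}})}$ for the 1-form (11). We also note that
\[ \int_y^x\nu_{s_b}^{(g_1+g_2)} = \begin{cases} \delta_{ab}\int_y^x\nu_{s_a}^{(g_a)}+O(\epsilon), & \text{for } x,y\in\widehat{\Sigma}^{(g_a)},\\ \delta_{ab}\int_{p_a}^x\nu_{s_a}^{(g_a)}+\delta_{\bar{a}b}\int_y^{p_{\bar{a}}}\nu_{s_{\bar{a}}}^{(g_{\bar{a}})}+O(\epsilon), & \text{for } x\in\widehat{\Sigma}^{(g_a)},\ y\in\widehat{\Sigma}^{(g_{\bar{a}})}, \end{cases} \tag{35} \]
from which it follows that $E^{(g_1+g_2)}(x,y)=O(\epsilon^{-\frac{1}{2}})$ for $x\in\widehat{\Sigma}^{(g_a)}$ and $y\in\widehat{\Sigma}^{(g_{\bar{a}})}$.

We next determine $E^{(g_1+g_2)}(x,y)$ for $x,y\in\widehat{\Sigma}^{(g_a)}$. The differential $\omega^{(g_1+g_2)}$ for $x,y\in\widehat{\Sigma}^{(g_a)}$ obeys
\[ \omega^{(g_1+g_2)}(x,y)-\omega^{(g_a)}(x,y) = a_a(x)X_{\bar{a}\bar{a}}a_a^T(y) = O(\epsilon), \tag{36} \]
where $a_a(x)X_{\bar{a}\bar{a}}a_a^T(y)=\sum_{k,l\ge 1}a_a(x,k)X_{\bar{a}\bar{a}}(k,l)a_a(y,l)$ with $a_a(x,k)$ a certain 1-form on $\widehat{\Sigma}^{(g_a)}$ and $X_{\bar{a}\bar{a}}(k,l)$ an infinite matrix determined from genus $g_1$ and $g_2$ data (see [MT1] for details). It follows from (15) that
\[ E^{(g_1+g_2)}(x,y) = E^{(g_a)}(x,y)\,e^{-\frac{1}{2}b_aX_{\bar{a}\bar{a}}b_a^T} = E^{(g_a)}(x,y)+O(\epsilon), \]
where $b_a(k)=\int_y^xa_a(\cdot,k)$. Thus (34) holds. We then apply Theorem 1, (33), (34) and (35) to (16) to prove the result. $\square$

We next remark that for $x,z_a\in\widehat{\Sigma}^{(g_a)}$ then $S^{(g_a)}(x,z_a)S^{(g_1+g_2)}(z_a,y)$ is a meromorphic 1-form (cf. [HS]) in $z_a$ periodic on the $\Sigma^{(g_a)}$ cycles (cf. (19)) with simple poles described by (18) where
\[ \begin{aligned} S^{(g_a)}(x,z_a)S^{(g_1+g_2)}(z_a,y) &\sim \frac{dz_a}{x-z_a}\,S^{(g_1+g_2)}(x,y) \quad\text{for } z_a\sim x,\\ S^{(g_a)}(x,z_a)S^{(g_1+g_2)}(z_a,y) &\sim \frac{dz_a}{z_a-y}\,S^{(g_a)}(x,y) \quad\text{for } z_a\sim y \text{ if } y\in\widehat{\Sigma}^{(g_a)}. \end{aligned} \tag{37} \]
A similar behavior holds for $S^{(g_1+g_2)}(x,z_b)S^{(g_b)}(z_b,y)$ as a meromorphic 1-form in $z_b$. This allows us to determine the following integral equations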
Fig. 2. Example with $x,y\in\widehat{\Sigma}^{(g_1)}$
Proposition 1. The Szegő kernel on $\Sigma^{(g_1+g_2)}$ is given by
\[ \begin{aligned} S^{(g_1+g_2)}(x,y) &= \delta_{ab}S^{(g_a)}(x,y) - \frac{1}{2\pi i}\oint_{\mathcal{C}_a(z_a)}S^{(g_a)}(x,z_a)S^{(g_1+g_2)}(z_a,y), &(38)\\ &= \delta_{ab}S^{(g_a)}(x,y) + \frac{1}{2\pi i}\oint_{\mathcal{C}_b(z_b)}S^{(g_1+g_2)}(x,z_b)S^{(g_b)}(z_b,y), &(39) \end{aligned} \]
for $x\in\widehat{\Sigma}^{(g_a)}$, $y\in\widehat{\Sigma}^{(g_b)}$ for $a,b=1,2$, and where $\mathcal{C}_a(z_a)\subset\mathcal{A}_a$ denotes a closed anti-clockwise oriented contour parameterized by $z_a$ surrounding the puncture at $z_a=0$ on $\widehat{\Sigma}^{(g_a)}$.

Proof. Let $\sigma_a$ be a contour on $\widehat{\Sigma}^{(g_a)}$ surrounding $\mathcal{A}_a$ and the given points $x$ (and $y$, if $a=b$) on $\widehat{\Sigma}^{(g_a)}$ $^{3}$ (see Fig. 2). From Cauchy's theorem $\oint_{\sigma_a}S^{(g_a)}(x,z_a)S^{(g_1+g_2)}(z_a,y)=0$, and hence (37) gives
\[ 0 = -S^{(g_1+g_2)}(x,y)+\delta_{ab}S^{(g_a)}(x,y)+\frac{1}{2\pi i}\oint_{\mathcal{C}_a(z_a)}S^{(g_a)}(x,z_a)S^{(g_1+g_2)}(z_a,y), \]
giving (38). Considering $S^{(g_1+g_2)}(x,z_b)S^{(g_b)}(z_b,y)$ leads to (39). $\square$

$^{3}$ $\sigma_a$ may be construed as being the boundary of the simple-connected covering space for $\widehat{\Sigma}^{(g_a)}$ as illustrated in Fig. 2 for a genus two surface.

Similarly to [MT1] we define weighted moments for $S^{(g_1+g_2)}$ by
\[ \begin{aligned} X_{ab}(k,l,\epsilon) &= X_{ab}\begin{bmatrix}\theta^{(g_1+g_2)}\\ \phi^{(g_1+g_2)}\end{bmatrix}(k,l,\epsilon)\\ &= \frac{\epsilon^{\frac{1}{2}(k+l-1)}}{(2\pi i)^2}\oint_{\mathcal{C}_a(x)}\oint_{\mathcal{C}_b(y)}x^{-k}y^{-l}S^{(g_1+g_2)}(x,y)\,dx^{\frac{1}{2}}dy^{\frac{1}{2}}, \end{aligned} \tag{40} \]
for $k,l\ge 1$. From (19) it follows that
\[ X_{ab}\begin{bmatrix}\theta^{(g_1+g_2)}\\ \phi^{(g_1+g_2)}\end{bmatrix}(k,l,\epsilon) = -X_{ba}\begin{bmatrix}(\theta^{(g_1+g_2)})^{-1}\\ (\phi^{(g_1+g_2)})^{-1}\end{bmatrix}(l,k,\epsilon). \tag{41} \]
We denote by $X_{ab}=(X_{ab}(k,l,\epsilon))$ the infinite matrix indexed by $k,l\ge 1$.
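We also define various moments for $S^{(g_a)}(x,y)$. These provide the data used to construct $S^{(g_1+g_2)}(x,y)$. Define holomorphic $\frac{1}{2}$-forms on $\widehat{\Sigma}^{(g_a)}$ by
\[ h_a(k,x,\epsilon) = h_a\begin{bmatrix}\theta^{(g_a)}\\ \phi^{(g_a)}\end{bmatrix}(k,x,\epsilon) = \frac{\epsilon^{\frac{k}{2}-\frac{1}{4}}}{2\pi i}\oint_{\mathcal{C}_a(z_a)}S^{(g_a)}(x,z_a)\,z_a^{-k}\,dz_a^{\frac{1}{2}}, \tag{42} \]
\[ \bar{h}_a(k,y,\epsilon) = \bar{h}_a\begin{bmatrix}\theta^{(g_a)}\\ \phi^{(g_a)}\end{bmatrix}(k,y,\epsilon) = \frac{\epsilon^{\frac{k}{2}-\frac{1}{4}}}{2\pi i}\oint_{\mathcal{C}_a(z_a)}S^{(g_a)}(z_a,y)\,z_a^{-k}\,dz_a^{\frac{1}{2}}, \tag{43} \]
and introduce infinite row vectors $h_a(x)=(h_a(k,x))$, $\bar{h}_a(x)=(\bar{h}_a(k,x))$ indexed by $k\ge 1$. From (19) it follows that
\[ \bar{h}_a\begin{bmatrix}\theta^{(g_a)}\\ \phi^{(g_a)}\end{bmatrix}(k,x,\epsilon) = -h_a\begin{bmatrix}(\theta^{(g_a)})^{-1}\\ (\phi^{(g_a)})^{-1}\end{bmatrix}(k,x,\epsilon). \tag{44} \]
Finally, we define the moment matrix
\[ \begin{aligned} F_a(k,l,\epsilon) &= F_a\begin{bmatrix}\theta^{(g_a)}\\ \phi^{(g_a)}\end{bmatrix}(k,l,\epsilon)\\ &= \frac{\epsilon^{\frac{1}{2}(k+l-1)}}{(2\pi i)^2}\oint_{\mathcal{C}_a(x)}\oint_{\mathcal{C}_a(y)}x^{-k}y^{-l}S^{(g_a)}(x,y)\,dx^{\frac{1}{2}}dy^{\frac{1}{2}}\\ &= \frac{\epsilon^{\frac{k}{2}-\frac{1}{4}}}{2\pi i}\oint_{\mathcal{C}_a(x)}x^{-k}h_a(l,x)\,dx^{\frac{1}{2}} = \frac{\epsilon^{\frac{l}{2}-\frac{1}{4}}}{2\pi i}\oint_{\mathcal{C}_a(y)}y^{-l}\bar{h}_a(k,y)\,dy^{\frac{1}{2}}. \end{aligned} \tag{45} \]
$F_a(k,l,\epsilon)$ obeys a skew-symmetry property from (19) similar to (41). We may invert (42)–(45) using (18) to find for $x,y\in\widehat{\Sigma}^{(g_a)}$ that
\[ \begin{aligned} S^{(g_a)}(x,y) &= \left[\frac{1}{x-y}+\sum_{k,l\ge 1}\epsilon^{-\frac{1}{2}(k+l-1)}F_a(k,l,\epsilon)\,x^{k-1}y^{l-1}\right]dx^{\frac{1}{2}}dy^{\frac{1}{2}} &(46)\\ &= \sum_{k\ge 1}\epsilon^{-\frac{k}{2}+\frac{1}{4}}h_a(k,x)\,y^{k-1}dy^{\frac{1}{2}} &(47)\\ &= \sum_{l\ge 1}\epsilon^{-\frac{l}{2}+\frac{1}{4}}x^{l-1}\bar{h}_a(l,y)\,dx^{\frac{1}{2}}. &(48) \end{aligned} \]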
We are now in a position to express $S^{(g_1+g_2)}(x,y)$ in terms of the lower genus data. From the sewing relation (30) we have $dz_a=-\epsilon\,\frac{dz_{\bar{a}}}{z_{\bar{a}}^2}$ so that
\[ dz_a^{\frac{1}{2}} = (-1)^{\bar{a}}\,\xi\,\epsilon^{\frac{1}{2}}\,\frac{dz_{\bar{a}}^{\frac{1}{2}}}{z_{\bar{a}}}, \tag{49} \]
where $\xi\in\{\pm\sqrt{-1}\}$ determines the square root branch chosen. We then find
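Proposition 2. $S^{(g_1+g_2)}(x,y)$ is given by
\[ S^{(g_1+g_2)}(x,y) = \begin{cases} S^{(g_a)}(x,y)+h_a(x)X_{\bar{a}\bar{a}}\bar{h}_a^T(y), & \text{for } x,y\in\widehat{\Sigma}^{(g_a)},\\ h_a(x)\left(\xi(-1)^{\bar{a}}I-X_{\bar{a}a}\right)\bar{h}_{\bar{a}}^T(y), & \text{for } x\in\widehat{\Sigma}^{(g_a)},\ y\in\widehat{\Sigma}^{(g_{\bar{a}})}, \end{cases} \tag{50} \]
where $I$ denotes the infinite identity matrix and $T$ the transpose.

Proof. Consider $x,y\in\widehat{\Sigma}^{(g_1)}$. Noting that $\mathcal{C}_1(z_1)$ may be deformed to $-\mathcal{C}_2(z_2)$ on $\mathcal{A}$ via (30) we find
\[ \begin{aligned} S^{(g_1+g_2)}(x,y)-S^{(g_1)}(x,y) &= -\frac{1}{2\pi i}\oint_{\mathcal{C}_1(z_1)}S^{(g_1)}(x,z_1)S^{(g_1+g_2)}(z_1,y)\\ &= -\frac{1}{2\pi i}\sum_{k\ge 1}\epsilon^{-\frac{k}{2}+\frac{1}{4}}h_1(k,x)\oint_{\mathcal{C}_1(z_1)}S^{(g_1+g_2)}(z_1,y)\,z_1^{k-1}dz_1^{\frac{1}{2}}\\ &= \xi\sum_{k\ge 1}h_1(k,x)\,\frac{\epsilon^{\frac{k}{2}-\frac{1}{4}}}{2\pi i}\oint_{\mathcal{C}_2(z_2)}S^{(g_1+g_2)}(z_2,y)\,z_2^{-k}dz_2^{\frac{1}{2}}\\ &= \xi\sum_{k\ge 1}h_1(k,x)\,\frac{\epsilon^{\frac{k}{2}-\frac{1}{4}}}{(2\pi i)^2}\oint_{\mathcal{C}_2(z_2)}\oint_{\mathcal{C}_1(u_1)}S^{(g_1+g_2)}(z_2,u_1)S^{(g_1)}(u_1,y)\,z_2^{-k}dz_2^{\frac{1}{2}}\\ &= -\xi^2\sum_{k,l\ge 1}h_1(k,x)\bar{h}_1(l,y)\,\frac{\epsilon^{\frac{1}{2}(k+l-1)}}{(2\pi i)^2}\oint_{\mathcal{C}_2(z_2)}\oint_{\mathcal{C}_2(u_2)}S^{(g_1+g_2)}(z_2,u_2)\,z_2^{-k}u_2^{-l}dz_2^{\frac{1}{2}}du_2^{\frac{1}{2}}\\ &= h_1(x)X_{22}\bar{h}_1^T(y), \end{aligned} \]
using (38), (47), (49), (48), (39) and (49) again, respectively. Thus we recover the first line of (50) for $a=b=1$. A similar analysis holds for $a=b=2$.

For $x\in\widehat{\Sigma}^{(g_1)}$, $y\in\widehat{\Sigma}^{(g_2)}$ we find that
\[ \begin{aligned} S^{(g_1+g_2)}(x,y) &= -\frac{1}{2\pi i}\oint_{\mathcal{C}_1(z_1)}S^{(g_1)}(x,z_1)S^{(g_1+g_2)}(z_1,y)\\ &= \xi\sum_{k\ge 1}h_1(k,x)\,\frac{\epsilon^{\frac{k}{2}-\frac{1}{4}}}{2\pi i}\oint_{\mathcal{C}_2(z_2)}z_2^{-k}dz_2^{\frac{1}{2}}\cdot\left[S^{(g_2)}(z_2,y)+\frac{1}{2\pi i}\oint_{\mathcal{C}_2(u_2)}S^{(g_1+g_2)}(z_2,u_2)S^{(g_2)}(u_2,y)\right]\\ &= \xi\sum_{k\ge 1}h_1(k,x)\bar{h}_2(k,y)\\ &\quad+\xi^2\sum_{k,l\ge 1}h_1(k,x)\bar{h}_2(l,y)\,\frac{\epsilon^{\frac{1}{2}(k+l-1)}}{(2\pi i)^2}\oint_{\mathcal{C}_2(z_2)}\oint_{\mathcal{C}_1(u_1)}S^{(g_1+g_2)}(z_2,u_1)\,z_2^{-k}dz_2^{\frac{1}{2}}\,u_1^{-l}du_1^{\frac{1}{2}}\\ &= h_1(x)\left(\xi I-X_{21}\right)\bar{h}_2^T(y). \end{aligned} \]
A similar result holds for $x\in\widehat{\Sigma}^{(g_2)}$, $y\in\widehat{\Sigma}^{(g_1)}$. $\square$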
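We next compute the explicit form of the moment matrix $X_{ab}$ in terms of the moments $F_a$ of $S^{(g_a)}(x,y)$. It is useful to introduce infinite block matrices
\[ X = \begin{pmatrix}X_{11} & X_{12}\\ X_{21} & X_{22}\end{pmatrix}, \quad F = \begin{pmatrix}F_1 & 0\\ 0 & F_2\end{pmatrix}, \quad \Xi = \begin{pmatrix}0 & \xi I\\ -\xi I & 0\end{pmatrix}, \quad Q = F\,\Xi = \begin{pmatrix}0 & \xi F_1\\ -\xi F_2 & 0\end{pmatrix}. \tag{51} \]
Then one finds:

Proposition 3. $X$ is given by
\[ X = (I-Q)^{-1}F, \tag{52} \]
where $(I-Q)^{-1}=\sum_{n\ge 0}Q^n$ is convergent for $|\epsilon|<|r_1r_2|$.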
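The structure of Proposition 3 can be illustrated on a finite truncation. In the sketch below the infinite matrices $F_1$, $F_2$ are replaced by small made-up $2\times 2$ matrices (an assumption purely for illustration, not genuine Szegő kernel moments), $Q$ is assembled as in (51), and the partial sums of $\sum_{n\ge 0}Q^nF$ are checked against the linear equation $(I-Q)X=F$.

```python
# Sketch of Proposition 3 on a finite truncation (all numbers are made up).

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

def block(A, B, C, D):
    # Assemble the block matrix [[A, B], [C, D]].
    return [ra + rb for ra, rb in zip(A, B)] + [rc + rd for rc, rd in zip(C, D)]

xi = 1j                                   # one of the two branches xi = +/- sqrt(-1)
F1 = [[0.10, 0.02], [-0.02, 0.05]]        # hypothetical truncated moments
F2 = [[0.08, -0.03], [0.03, 0.04]]
Z2 = [[0, 0], [0, 0]]

F = block(F1, Z2, Z2, F2)
Q = block(Z2, scale(xi, F1), scale(-xi, F2), Z2)

# X = sum_{n >= 0} Q^n F, truncated after N further terms.
N = 60
X, term = F, F
for _ in range(N):
    term = matmul(Q, term)
    X = add(X, term)

# Residual of the linear equation (I - Q) X = F.
residual = add(add(X, scale(-1, matmul(Q, X))), scale(-1, F))
max_residual = max(abs(v) for row in residual for v in row)
```

Because the sample entries are small, the Neumann series converges rapidly and the residual should be negligible; in the text, convergence for the true moments is what the proposition establishes for $|\epsilon|<r_1r_2$.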
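Proof. Using (38) we find $X_{11}(k,l)-F_1(k,l)$ is given by
\[ \begin{aligned} &-\frac{\epsilon^{\frac{1}{2}(k+l-1)}}{(2\pi i)^3}\oint_{\mathcal{C}_1(x)}\oint_{\mathcal{C}_1(z_1)}\oint_{\mathcal{C}_1(y)}S^{(g_1)}(x,z_1)\,x^{-k}dx^{\frac{1}{2}}\,S^{(g_1+g_2)}(z_1,y)\,y^{-l}dy^{\frac{1}{2}}\\ &\quad= -\frac{\epsilon^{\frac{1}{2}(k+l-1)}}{(2\pi i)^3}\sum_{m\ge 1}\epsilon^{-\frac{m}{2}+\frac{1}{4}}\oint_{\mathcal{C}_1(x)}h_1(m,x)\,x^{-k}dx^{\frac{1}{2}}\cdot\oint_{\mathcal{C}_1(z_1)}\oint_{\mathcal{C}_1(y)}S^{(g_1+g_2)}(z_1,y)\,z_1^{m-1}y^{-l}dy^{\frac{1}{2}}\\ &\quad= \xi\sum_{m\ge 1}\frac{\epsilon^{\frac{k}{2}-\frac{1}{4}}}{2\pi i}\oint_{\mathcal{C}_1(x)}h_1(m,x)\,x^{-k}dx^{\frac{1}{2}}\cdot\frac{\epsilon^{\frac{1}{2}(m+l-1)}}{(2\pi i)^2}\oint_{\mathcal{C}_2(z_2)}\oint_{\mathcal{C}_1(y)}S^{(g_1+g_2)}(z_2,y)\,z_2^{-m}y^{-l}dy^{\frac{1}{2}}\\ &\quad= \xi\,(F_1X_{21})(k,l), \end{aligned} \]
using (47) and (49). Similarly we find $X_{22}=F_2-\xi F_2X_{12}$ so that $X_{aa}=(F+QX)_{aa}$ using (51). A similar calculation of $X_{12}$ and $X_{21}$ leads to $X_{a\bar{a}}=(QX)_{a\bar{a}}$. These combine to give $(I-Q)X=F$ which implies (52) provided $(I-Q)^{-1}=\sum_{n\ge 0}Q^n$ converges. But (52) can be rewritten
\[ X = \sum_{n\ge 1}Q^n\,\Xi. \tag{53} \]
By Theorem 2, $X_{ab}(k,l)$ has a convergent series expansion in $\epsilon^{\frac{1}{2}}$ for $|\epsilon|<r_1r_2$. But $\sum_{n=1}^{N}Q^n=O(\epsilon^{\frac{1}{2}N})$ so that (53) holds to all orders in $\epsilon^{\frac{1}{2}}$. Hence $(I-Q)^{-1}$ converges for $|\epsilon|<r_1r_2$ and the proposition holds. $\square$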
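Propositions 2 and 3 imply

Theorem 3. $S^{(g_1+g_2)}(x,y)$ is given by
\[ S^{(g_1+g_2)}(x,y) = \delta_{ab}S^{(g_a)}(x,y)+h_a(x)\left(\Xi\,(I-Q)^{-1}\right)_{ab}\bar{h}_b^T(y), \]
for $x\in\widehat{\Sigma}^{(g_a)}$, $y\in\widehat{\Sigma}^{(g_b)}$. Equivalently,
\[ S^{(g_1+g_2)}(x,y) = \begin{cases} S^{(g_a)}(x,y)+h_a(x)\,(I-F_{\bar{a}}F_a)^{-1}F_{\bar{a}}\,\bar{h}_a^T(y), & \text{for } x,y\in\widehat{\Sigma}^{(g_a)},\\ \xi(-1)^{\bar{a}}\,h_a(x)\,(I-F_{\bar{a}}F_a)^{-1}\,\bar{h}_{\bar{a}}^T(y), & \text{for } x\in\widehat{\Sigma}^{(g_a)},\ y\in\widehat{\Sigma}^{(g_{\bar{a}})}. \end{cases} \]

Remark 1. Note that $S^{(g_1+g_2)}(x,y)$ is even (odd) in $\epsilon^{\frac{1}{2}}$ for $x,y\in\widehat{\Sigma}^{(g_a)}$ (respectively, for $x\in\widehat{\Sigma}^{(g_a)}$, $y\in\widehat{\Sigma}^{(g_{\bar{a}})}$). Thus $S^{(g_1+g_2)}(x,y)$ is invariant under a Dehn twist $\epsilon\to e^{2\pi i}\epsilon$ with $\xi\to-\xi$ from (49).

Similarly to ref. [MT1] we define the determinant of $I-Q$ as a formal power series in $\epsilon^{\frac{1}{2}}$ by
\[ \log\det(I-Q) = \mathrm{Tr}\log(I-Q) = -\sum_{n\ge 1}\frac{1}{n}\mathrm{Tr}(Q^n). \]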
Clearly Tr(Q 2k ) = 2Tr (F1 F2 )k for k ≥ 0 whereas Tr(Q n ) = 0 for n odd. Furthermore, from (45) the diagonal terms (F1 F2 )k have integral power series in . Thus it follows that Lemma 1. det (I − Q) = det (I − F1 F2 ) and is a formal power series in . The determinant has the following holomorphic properties: Theorem 4. det (I − Q) is non-vanishing and holomorphic in for || < r1r2 . Proof. The proof follows a similar argument to Theorem 2 of ref. [MT1]. Let 1
1
S (g1 +g2 ) (z 1 , z 2 ) = f (z 1 , z 2 , )dz 12 dz 22 for |z a | ≤ ra , where f (z 1 , z 2 , ) is holomor1
phic in 2 for || ≤ r for r < r1r2 from Theorem 2. Apply Cauchy’s inequality to the n coefficients of f (z 1 , z 2 , ) = n≥0 f n (z 1 , z 2 ) 2 to find n
| f n (z 1 , z 2 )| ≤ Mr − 2 ,
(54)
for M = sup|za |≤ra ,||≤r | f (z 1 , z 2 , )|. Consider I=
−1 21 21 1 (g1 +g2 ) S (z , z ) 1 − dz 1 dz 2 , 1 2 (2πi)2 Cr1 (z 1 ) Cr2 (z 2 ) z1 z2
for Cra (z a ) the contour with |z a | = ra . Then using (54) we find n ! !
|| 2 ! || !!−1 ! |I| ≤ M. . !1 − .r1r2 , r r1 r2 !
n≥0
(55)
630
M. P. Tuite, A. Zuevsky 1
i.e. $\mathcal{I}$ is absolutely convergent and thus holomorphic in $\epsilon^{1/2}$ for $|\epsilon|<r<r_1r_2$. Since $|z_1z_2|=r_1r_2$ we may alternatively expand in $\epsilon/z_1z_2$ to obtain
$$\mathcal{I}=\frac{1}{(2\pi i)^2}\oint_{C_{r_1}(z_1)}\oint_{C_{r_2}(z_2)}\sum_{k\ge 1}\epsilon^{k}\,S^{(g_1+g_2)}(z_1,z_2)\,z_1^{-k}z_2^{-k}\,dz_1^{1/2}dz_2^{1/2}=\epsilon^{1/2}\,\mathrm{Tr}\,X_{12},$$
where $\mathrm{Tr}\,X_{12}=\sum_{k\ge 1}X_{12}(k,k)$. But (52) implies
$$\mathrm{Tr}\,X_{12}=\xi\sum_{n\ge 1}\mathrm{Tr}\left((F_1F_2)^n\right),$$
which is absolutely convergent for $|\epsilon|<r_1r_2$. Hence we find that
$$\mathrm{Tr}\log(I-F_1F_2)=-\sum_{n\ge 1}\frac{1}{n}\mathrm{Tr}\left((F_1F_2)^n\right)$$
is also absolutely convergent for $|\epsilon|<r_1r_2$. Thus $\det(I-Q)=\det(I-F_1F_2)$ is non-vanishing and holomorphic for $|\epsilon|<r_1r_2$.
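The identities behind Lemma 1 can be illustrated in finite dimensions. The following is only a sketch under stated assumptions: the infinite moment matrices are replaced by hypothetical 2x2 rational truncations `F1`, `F2`, and `Q` is assumed to have the block off-diagonal shape `[[0, F1], [F2, 0]]` (any constant factors such as $\xi$ are omitted). It checks that the odd traces vanish, that $\mathrm{Tr}(Q^2)=2\,\mathrm{Tr}(F_1F_2)$, and that the two determinants agree.

```python
from fractions import Fraction


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def subtract(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def det(M):
    """Determinant by exact Gaussian elimination over the rationals."""
    M = [row[:] for row in M]
    n, sign, result = len(M), 1, Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            sign = -sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            M[r] = [x - factor * y for x, y in zip(M[r], M[c])]
    return sign * result


# Hypothetical 2x2 truncations of the moment matrices F1 and F2.
F1 = [[Fraction(1, 2), Fraction(1, 3)], [Fraction(0), Fraction(1, 5)]]
F2 = [[Fraction(1, 7), Fraction(0)], [Fraction(2, 3), Fraction(1, 4)]]
zero = [[Fraction(0)] * 2 for _ in range(2)]

# Assumed block off-diagonal shape Q = [[0, F1], [F2, 0]].
Q = [zero[i] + F1[i] for i in range(2)] + [F2[i] + zero[i] for i in range(2)]

det_full = det(subtract(identity(4), Q))
det_reduced = det(subtract(identity(2), matmul(F1, F2)))
Q2 = matmul(Q, Q)
Q3 = matmul(Q2, Q)
```

Exact rational arithmetic is used so that the comparison of the two determinants is an equality test rather than a floating-point tolerance check.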
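3.3. Sewing two tori. Consider the genus two surface formed by sewing two oriented tori $\Sigma_a^{(1)}=\mathbb{C}/\Lambda_a$ for $a=1,2$, and lattice $\Lambda_a=2\pi i(\mathbb{Z}\tau_a\oplus\mathbb{Z})$ for $\tau_a\in\mathbb{H}_1$. This is discussed at length in [MT1]. For local coordinate $z_a\in\mathbb{C}/\Lambda_a$ consider the closed disk $|z_a|\le r_a$ which is contained in $\Sigma_a^{(1)}$ provided $r_a<\frac{1}{2}D(q_a)$, where
$$D(q_a)=\min_{\lambda\in\Lambda_a,\ \lambda\neq 0}|\lambda|,$$
is the minimal lattice distance. From Subsect. 3.1 we obtain a genus two Riemann surface $\Sigma^{(2)}$ parameterized by the domain
$$\mathcal{D}^{\epsilon}=\left\{(\tau_1,\tau_2,\epsilon)\in\mathbb{H}_1\times\mathbb{H}_1\times\mathbb{C}\ :\ |\epsilon|<\tfrac{1}{4}D(q_1)D(q_2)\right\}.\tag{56}$$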
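As a rough illustration of the domain condition (56), the minimal lattice distance can be found by brute force. This is a minimal sketch, not part of the source: the function names and the finite search window are assumptions, and the window is only adequate when $\mathrm{Im}\,\tau$ is not too small.

```python
import math


def min_lattice_distance(tau, window=20):
    """Brute-force D(q) = min |lambda| over nonzero lambda in 2*pi*i*(Z*tau + Z).

    The finite search window is an assumption of this sketch.
    """
    best = math.inf
    for m in range(-window, window + 1):
        for n in range(-window, window + 1):
            if (m, n) != (0, 0):
                best = min(best, abs(2j * math.pi * (m * tau + n)))
    return best


def in_sewing_domain(tau1, tau2, eps):
    """Membership test for (56): tau_a in the upper half plane, |eps| < D(q1) D(q2) / 4."""
    if tau1.imag <= 0 or tau2.imag <= 0:
        return False
    bound = min_lattice_distance(tau1) * min_lattice_distance(tau2) / 4
    return abs(eps) < bound
```

For the square lattice $\tau=i$ the shortest nonzero lattice vectors have length $2\pi$, so for $\tau_1=\tau_2=i$ the bound on $|\epsilon|$ is $\pi^2$.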
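$\mathcal{D}^{\epsilon}$ is preserved under the action of $G\simeq(SL(2,\mathbb{Z})\times SL(2,\mathbb{Z}))\rtimes\mathbb{Z}_2$, the direct product of the left and right torus modular groups, which are interchanged upon conjugation by an involution $\beta$ as follows
$$\begin{aligned}\gamma_1(\tau_1,\tau_2,\epsilon)&=\left(\frac{a_1\tau_1+b_1}{c_1\tau_1+d_1},\ \tau_2,\ \frac{\epsilon}{c_1\tau_1+d_1}\right),\\ \gamma_2(\tau_1,\tau_2,\epsilon)&=\left(\tau_1,\ \frac{a_2\tau_2+b_2}{c_2\tau_2+d_2},\ \frac{\epsilon}{c_2\tau_2+d_2}\right),\\ \beta(\tau_1,\tau_2,\epsilon)&=(\tau_2,\tau_1,\epsilon),\end{aligned}\tag{57}$$
for $(\gamma_1,\gamma_2)\in SL(2,\mathbb{Z})\times SL(2,\mathbb{Z})$ with $\gamma_i=\begin{pmatrix}a_i&b_i\\c_i&d_i\end{pmatrix}$.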
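The action (57) can be checked numerically to be a genuine group action. The sketch below is illustrative only: the tuple encoding `(a, b, c, d)` of an $SL(2,\mathbb{Z})$ element and all function names are assumptions, not notation from the source.

```python
def mobius(g, tau):
    """Moebius action of g = (a, b, c, d) on tau."""
    a, b, c, d = g
    return (a * tau + b) / (c * tau + d)


def gamma1(g, point):
    """gamma_1 of (57): acts on tau1 and rescales eps by 1/(c*tau1 + d)."""
    a, b, c, d = g
    tau1, tau2, eps = point
    return (mobius(g, tau1), tau2, eps / (c * tau1 + d))


def gamma2(g, point):
    """gamma_2 of (57): acts on tau2 and rescales eps by 1/(c*tau2 + d)."""
    a, b, c, d = g
    tau1, tau2, eps = point
    return (tau1, mobius(g, tau2), eps / (c * tau2 + d))


def beta(point):
    """The involution of (57): swaps the two modular parameters."""
    tau1, tau2, eps = point
    return (tau2, tau1, eps)


def compose(g, h):
    """Matrix product g*h of two 2x2 matrices stored as (a, b, c, d)."""
    a, b, c, d = g
    e, f, p, q = h
    return (a * e + b * p, a * f + b * q, c * e + d * p, c * f + d * q)


def close(p, q, tol=1e-12):
    return all(abs(x - y) < tol for x, y in zip(p, q))


S = (0, -1, 1, 0)   # tau -> -1/tau
T = (1, 1, 0, 1)    # tau -> tau + 1
point = (0.3 + 1.2j, -0.1 + 0.8j, 0.05 + 0.02j)
```

The rescaling of $\epsilon$ composes correctly because $(c\,\gamma'\tau+d)(c'\tau+d')$ is the bottom row of the product $\gamma\gamma'$ applied to $(\tau,1)$; conjugating $\gamma_1$ by $\beta$ gives $\gamma_2$.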
There is a natural injection $G\to Sp(4,\mathbb{Z})$ in which the two $SL(2,\mathbb{Z})$ subgroups are mapped to
$$\Gamma_1=\left\{\begin{bmatrix}a_1&0&b_1&0\\0&1&0&0\\c_1&0&d_1&0\\0&0&0&1\end{bmatrix}\right\},\qquad \Gamma_2=\left\{\begin{bmatrix}1&0&0&0\\0&a_2&0&b_2\\0&0&1&0\\0&c_2&0&d_2\end{bmatrix}\right\},\tag{58}$$
and the involution is mapped to
$$\beta=\begin{bmatrix}0&1&0&0\\1&0&0&0\\0&0&0&1\\0&0&1&0\end{bmatrix}.\tag{59}$$
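That the matrices (58) and (59) are symplectic can be verified directly. This is a small sketch assuming the convention $M^{T}JM=J$ with $J=\begin{pmatrix}0&I\\-I&0\end{pmatrix}$; the helper names are hypothetical.

```python
def matmul(A, B):
    """Product of two integer matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


# Assumed symplectic form J = [[0, I], [-I, 0]].
J = [[0, 0, 1, 0], [0, 0, 0, 1], [-1, 0, 0, 0], [0, -1, 0, 0]]


def is_symplectic(M):
    return matmul(matmul(transpose(M), J), M) == J


def Gamma1(a, b, c, d):
    """Image of (a b; c d) in the first SL(2,Z) subgroup of (58)."""
    return [[a, 0, b, 0], [0, 1, 0, 0], [c, 0, d, 0], [0, 0, 0, 1]]


def Gamma2(a, b, c, d):
    """Image of (a b; c d) in the second SL(2,Z) subgroup of (58)."""
    return [[1, 0, 0, 0], [0, a, 0, b], [0, 0, 1, 0], [0, c, 0, d]]


BETA = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

g1 = Gamma1(2, 1, 1, 1)   # determinant 2*1 - 1*1 = 1
g2 = Gamma2(1, 3, 0, 1)   # determinant 1
```

Since $\beta$ is its own inverse, `matmul(matmul(BETA, g1), BETA)` is the conjugate of `g1` by the involution, and it lands in the second subgroup.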
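$G$ also has a natural action on $\mathbb{H}_2$ as given in (5) which is compatible with respect to $\Omega^{(2)}$ as a function of $(\tau_1,\tau_2,\epsilon)$ [MT1]. The Szegő kernel on the torus $\Sigma_a^{(1)}$ is given by
$$S^{(1)}\left[{\theta_a \atop \phi_a}\right](x,y|\tau_a)=P_1\left[{\theta_a \atop \phi_a}\right](x-y,\tau_a)\,dx^{1/2}dy^{1/2},$$
from (23). It is straightforward to compute the moment matrix $F_a$ of (45). Using the Laurent expansion (28) we find
$$P_1\left[{\theta \atop \phi}\right](x-y,\tau)=\frac{1}{x-y}+\sum_{k,l\ge 1}C\left[{\theta \atop \phi}\right](k,l)\,x^{k-1}y^{l-1},\tag{60}$$
where for $k,l\ge 1$ we define
$$C\left[{\theta \atop \phi}\right](k,l,\tau)=(-1)^{l}\binom{k+l-2}{k-1}E_{k+l-1}\left[{\theta \atop \phi}\right](\tau),\tag{61}$$
for twisted Eisenstein series (29). Then it follows that
$$F_a\left[{\theta_a \atop \phi_a}\right](k,l,\tau_a,\epsilon)=\epsilon^{\frac{1}{2}(k+l-1)}\,C\left[{\theta_a \atop \phi_a}\right](k,l,\tau_a).\tag{62}$$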
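We also have the analytic expansion
$$P_1\left[{\theta \atop \phi}\right](x-y,\tau)=\sum_{k\ge 1}P_k\left[{\theta \atop \phi}\right](x,\tau)\,y^{k-1},\tag{63}$$
for
$$P_k\left[{\theta \atop \phi}\right](z,\tau)=\frac{(-1)^{k-1}}{(k-1)!}\,\partial_z^{k-1}P_1\left[{\theta \atop \phi}\right](z,\tau).$$
Then we find
$$h_a\left[{\theta_a \atop \phi_a}\right](k,x,\tau_a,\epsilon)=\epsilon^{\frac{k}{2}-\frac{1}{4}}\,P_k\left[{\theta_a \atop \phi_a}\right](x,\tau_a)\,dx^{1/2}.\tag{64}$$
Using these results we may therefore determine the explicit form for $S^{(2)}\left[{\theta^{(2)} \atop \phi^{(2)}}\right]$ on $\mathcal{D}^{\epsilon}$ via Theorem 3.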
One may also confirm that $S^{(2)}$ satisfies the modular invariance property of (20) under the group $G$ generated by $\gamma_i$, $\beta$ of (58) and (59) with
$$S^{(2)}\left[\gamma\,{\theta^{(2)} \atop \phi^{(2)}}\right](\gamma x,\gamma y\,|\,\gamma(\tau_1,\tau_2,\epsilon))=S^{(2)}\left[{\theta^{(2)} \atop \phi^{(2)}}\right](x,y\,|\,\tau_1,\tau_2,\epsilon),\tag{65}$$
where
$$\gamma_1\begin{bmatrix}\theta_1\\\theta_2\\\phi_1\\\phi_2\end{bmatrix}=\begin{bmatrix}\theta_1^{a_1}\phi_1^{b_1}\\\theta_2\\\theta_1^{c_1}\phi_1^{d_1}\\\phi_2\end{bmatrix},\qquad \gamma_2\begin{bmatrix}\theta_1\\\theta_2\\\phi_1\\\phi_2\end{bmatrix}=\begin{bmatrix}\theta_1\\\theta_2^{a_2}\phi_2^{b_2}\\\phi_1\\\theta_2^{c_2}\phi_2^{d_2}\end{bmatrix},\qquad \beta\begin{bmatrix}\theta_1\\\theta_2\\\phi_1\\\phi_2\end{bmatrix}=\begin{bmatrix}\theta_2\\\theta_1\\\phi_2\\\phi_1\end{bmatrix},$$
and
$$\gamma_a x=\begin{cases}\dfrac{x}{c_a\tau_a+d_a}, & \text{for } x\in\hat{\Sigma}_a^{(1)},\\[6pt] x, & \text{for } x\in\hat{\Sigma}_{\bar a}^{(1)},\end{cases}$$
and where for $x=2\pi i(u+v\tau_a)\in\hat{\Sigma}_a^{(1)}$ with $0\le u,v<1$ we define $\beta x=2\pi i(u+v\tau_{\bar a})$. Finally, we note that $\det(I-Q)=\det\left(I-F_1\left[{\theta_1 \atop \phi_1}\right]F_2\left[{\theta_2 \atop \phi_2}\right]\right)$ is also $G$ invariant.

4. The Szegő Kernel on a Self-Sewn Riemann Surface

4.1. The ρ-formalism sewing scheme. We now consider the construction of the Szegő kernel on a Riemann surface $\Sigma^{(g+1)}$ formed by self-sewing a handle to a Riemann surface $\Sigma^{(g)}$ of genus $g$. We begin by reviewing the Yamada formalism [Y] in this scheme which, following [MT1], we refer to as the ρ-formalism. Consider a Riemann surface $\Sigma^{(g)}$ of genus $g$ and let $z_1,z_2$ be local coordinates in the neighborhood of two separated points $p_1$ and $p_2$. Consider two disks $|z_a|\le r_a$, for $r_a>0$ and $a=1,2$. Note that $r_1,r_2$ must be sufficiently small to ensure that the disks do not intersect. Introduce a complex parameter $\rho$ where $|\rho|\le r_1r_2$ and excise the disks
$$\{z_a : |z_a|<|\rho|r_{\bar a}^{-1}\}\subset\Sigma^{(g)},$$
to form a twice-punctured surface
$$\hat{\Sigma}^{(g)}=\Sigma^{(g)}\setminus\bigcup_{a=1,2}\{z_a : |z_a|<|\rho|r_{\bar a}^{-1}\}.$$
As before, we use the convention $\bar 1=2$, $\bar 2=1$. We define annular regions $A_a\subset\hat{\Sigma}^{(g)}$ with $A_a=\{z_a : |\rho|r_{\bar a}^{-1}\le|z_a|\le r_a\}$ and identify them as a single region $A=A_1\simeq A_2$ via the sewing relation
$$z_1z_2=\rho,\tag{66}$$
to form a compact Riemann surface $\Sigma^{(g+1)}=\hat{\Sigma}^{(g)}\setminus\{A_1\cup A_2\}\cup A$ of genus $g+1$. The sewing relation (66) can be considered to be a parameterization of a cylinder connecting the punctured Riemann surface to itself.
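The sewing relation (66) can be made concrete with a short numerical sketch: a point of the annulus $A_1$ is sent to $z_2=\rho/z_1$ in $A_2$, and an anti-clockwise contour in $A_1$ becomes a clockwise one in $A_2$. The radii, the value of $\rho$ and all helper names below are hypothetical choices made for illustration.

```python
import cmath
import math


def sew(z1, rho):
    """Image z2 = rho / z1 of a point under the sewing relation z1 * z2 = rho."""
    return rho / z1


def in_annulus(z, rho, r_own, r_other, tol=1e-12):
    """Test |rho| / r_other <= |z| <= r_own, the defining condition of A_a."""
    return abs(rho) / r_other - tol <= abs(z) <= r_own + tol


def winding_number(points):
    """Winding number about 0 of the closed polygon through the given points."""
    total = 0.0
    for z, w in zip(points, points[1:] + points[:1]):
        total += cmath.phase(w / z)
    return round(total / (2 * math.pi))


r1, r2 = 0.5, 0.4                    # hypothetical disk radii
rho = 0.05 * cmath.exp(0.3j)         # satisfies |rho| <= r1 * r2
contour1 = [0.3 * cmath.exp(2j * math.pi * k / 64) for k in range(64)]
contour2 = [sew(z, rho) for z in contour1]
```

Here `contour1` lies in $A_1$ (its radius 0.3 is between $|\rho|/r_2=0.125$ and $r_1$), and its image has radius $|\rho|/0.3$, which lies between $|\rho|/r_1$ and $r_2$.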
In the ρ-formalism we define a standard basis of cycles $\{a_1,b_1,\ldots,a_{g+1},b_{g+1}\}$ on $\Sigma^{(g+1)}$, where the set $\{a_1,b_1,\ldots,a_g,b_g\}$ is the original basis on $\Sigma^{(g)}$. Let $C_a(z_a)\subset A_a$ denote a closed anti-clockwise contour parameterized by $z_a$ surrounding the puncture at $z_a=0$. Clearly $C_2(z_2)\sim-C_1(z_1)$ on applying the sewing relation (66). We then define the cycle $a_{g+1}$ to be $C_2(z_2)$ and define the cycle $b_{g+1}$ to be a path chosen in $\hat{\Sigma}^{(g)}$ between identified points $z_1=z_0$ and $z_2=\rho/z_0$ on the sewn surface. As in the $\epsilon$-formalism, the normalized differential of the second kind $\omega^{(g+1)}$, the holomorphic 1-forms $\nu_i^{(g+1)}$ and the period matrix $\Omega^{(g+1)}$ can be computed in terms of data coming from $\Sigma^{(g)}$ [Y,MT1] to find

Theorem 5. $\omega^{(g+1)}$, $\nu_i^{(g+1)}$ and $\Omega_{ij}^{(g+1)}$ for $(i,j)\neq(g+1,g+1)$ are holomorphic in $\rho$ for $|\rho|<r_1r_2$ with
$$\omega^{(g+1)}(x,y)=\omega^{(g)}(x,y)+O(\rho),\tag{67}$$
$$\nu_i^{(g+1)}(x)=\nu_i^{(g)}(x)+O(\rho),\quad i=1,\ldots,g,\tag{68}$$
$$\nu_{g+1}^{(g+1)}(x)=\omega^{(g)}_{p_2-p_1}(x)+O(\rho),\tag{69}$$
$$\Omega_{ij}^{(g+1)}=\Omega_{ij}^{(g)}+O(\rho),\quad i,j=1,\ldots,g,\tag{70}$$
$$\Omega_{i,g+1}^{(g+1)}=\frac{1}{2\pi i}\int_{p_1}^{p_2}\nu_i^{(g)}+O(\rho),\quad i=1,\ldots,g,\tag{71}$$
for $x,y\in\hat{\Sigma}^{(g)}$. $e^{2\pi i\Omega^{(g+1)}_{g+1,g+1}}$ is holomorphic in $\rho$ for $|\rho|<r_1r_2$ with
$$e^{2\pi i\Omega^{(g+1)}_{g+1,g+1}}=-\frac{\rho}{K_0^2}\,(1+O(\rho)),\tag{72}$$
where $K_0=K^{(g)}(z_1=0,z_2=0)$ for $E^{(g)}(z_1,z_2)=K^{(g)}(z_1,z_2)\,dz_1^{-1/2}dz_2^{-1/2}$ expressed in terms of the local coordinates $z_1,z_2$.

4.2. Szegő kernel in the ρ-formalism. We now determine the Szegő kernel $S^{(g+1)}(x,y)=S^{(g+1)}\left[{\theta^{(g+1)} \atop \phi^{(g+1)}}\right](x,y)$ on the sewn Riemann surface $\Sigma^{(g+1)}$ in terms of genus $g$ data together with the multiplier parameters associated with the handle cycles. The $S^{(g+1)}$ multipliers (17) on the cycles $a_i,b_i$ for $i=1,\ldots,g$ are determined by the multipliers of $S^{(g)}$ with $\phi_i^{(g+1)}=\phi_i^{(g)}$ and $\theta_i^{(g+1)}=\theta_i^{(g)}$, i.e. $\alpha_i^{(g+1)}=\alpha_i^{(g)}$ and $\beta_i^{(g+1)}=\beta_i^{(g)}$. The remaining two multipliers associated with the cycles $a_{g+1}$ and $b_{g+1}$,
$$\phi_{g+1}=\phi_{g+1}^{(g+1)}=-e^{2\pi i\alpha_{g+1}^{(g+1)}},\tag{73}$$
$$\theta_{g+1}=\theta_{g+1}^{(g+1)}=-e^{-2\pi i\beta_{g+1}^{(g+1)}},\tag{74}$$
must be additionally specified so that
$$S^{(g+1)}(e^{2\pi i}x_a,y)=-\phi_{g+1}^{a-\bar a}\,S^{(g+1)}(x_a,y),\tag{75}$$
$$S^{(g+1)}(x_a,y)=-\theta_{g+1}^{a-\bar a}\,S^{(g+1)}(x_{\bar a},y),\tag{76}$$
for $x_a\in A_a$ and $x_{\bar a}\in A_{\bar a}$.
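We next consider the analogue of Theorem 2 concerning the holomorphicity of $S^{(g+1)}$ as a function of $\rho$. It is convenient to define $\kappa\in\left[-\frac{1}{2},\frac{1}{2}\right)$ by $\phi_{g+1}=-e^{2\pi i\kappa}$, i.e. $\kappa=\alpha_{g+1}^{(g+1)}\bmod 1$. We then find

Theorem 6. $S^{(g+1)}$ is holomorphic in $\rho$ for $|\rho|<r_1r_2$ with
$$S^{(g+1)}(x,y)=S_\kappa^{(g)}(x,y)+O(\rho),\tag{77}$$
for $x,y\in\hat{\Sigma}^{(g)}$, where $S_\kappa^{(g)}(x,y)$ is defined as follows: For $\kappa\neq-\frac{1}{2}$,
$$S_\kappa^{(g)}(x,y)=\frac{U(x,y)^{\kappa}}{E^{(g)}(x,y)}\;\frac{\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(\int_y^x\nu^{(g)}+\kappa z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)}{\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(\kappa z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)},\tag{78}$$
where
$$U(x,y)=\frac{E^{(g)}(x,p_2)\,E^{(g)}(y,p_1)}{E^{(g)}(x,p_1)\,E^{(g)}(y,p_2)},\tag{79}$$
for prime form $E^{(g)}$ and where
$$z_{p_1,p_2}=\int_{p_1}^{p_2}\nu^{(g)},\tag{80}$$
for holomorphic 1-forms $\nu^{(g)}$. For $\kappa=-\frac{1}{2}$ then $S^{(g)}_{-\frac{1}{2}}(x,y)$ is given by
$$\begin{aligned}&\left[\frac{U(x,y)^{\frac{1}{2}}}{E^{(g)}(x,y)}\,\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(\int_y^x\nu^{(g)}+\tfrac{1}{2}z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)-\theta_{g+1}\frac{U(x,y)^{-\frac{1}{2}}}{E^{(g)}(x,y)}\,\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(\int_y^x\nu^{(g)}-\tfrac{1}{2}z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)\right]\\&\quad\cdot\left[\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(\tfrac{1}{2}z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)-\theta_{g+1}\,\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(-\tfrac{1}{2}z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)\right]^{-1}.\end{aligned}\tag{81}$$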
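Proof. We firstly note that from (15) it follows that
$$E^{(g+1)}(x,y)=E^{(g)}(x,y)+O(\rho).\tag{82}$$
From Theorem 5 we may expand the genus $g+1$ theta series to leading order in $\rho$ for $|\rho|<r_1r_2$ as follows:
$$\begin{aligned}\vartheta\left[{\alpha^{(g+1)} \atop \beta^{(g+1)}}\right]\left(\int_y^x\nu^{(g+1)}\,\big|\,\Omega^{(g+1)}\right)=\sum_{m\in\mathbb{Z}^g}\sum_{n\in\mathbb{Z}}&\left(-\frac{\rho}{K_0^2}\right)^{\frac{1}{2}\left(n+\alpha_{g+1}^{(g+1)}\right)^2}\exp\Big(i\pi(m+\alpha^{(g)}).\Omega^{(g)}.(m+\alpha^{(g)})\\&+(m+\alpha^{(g)}).\Big(\int_y^x\nu^{(g)}+2\pi i\beta^{(g)}\Big)\\&+\big(n+\alpha_{g+1}^{(g+1)}\big)\Big((m+\alpha^{(g)}).\int_{p_1}^{p_2}\nu^{(g)}+\int_y^x\omega^{(g)}_{p_2-p_1}+2\pi i\beta_{g+1}^{(g+1)}\Big)\Big).(1+O(\rho)).\end{aligned}$$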
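Clearly $|n+\alpha_{g+1}^{(g+1)}|\ge|\kappa|$. For $\kappa\neq-\frac{1}{2}$ it follows that this lower bound is satisfied for one value of $n$ so that
$$\begin{aligned}\vartheta\left[{\alpha^{(g+1)} \atop \beta^{(g+1)}}\right]\left(\int_y^x\nu^{(g+1)}\,\big|\,\Omega^{(g+1)}\right)=&\left(-\frac{\rho}{K_0^2}\right)^{\frac{1}{2}\kappa^2}(-\theta_{g+1})^{-\kappa}\,U(x,y)^{\kappa}\\&\times\sum_{m\in\mathbb{Z}^g}\exp\Big(i\pi(m+\alpha^{(g)}).\Omega^{(g)}.(m+\alpha^{(g)})\\&\qquad+(m+\alpha^{(g)}).\Big(\int_y^x\nu^{(g)}+\kappa z_{p_1,p_2}+2\pi i\beta^{(g)}\Big)\Big).(1+O(\rho)),\end{aligned}$$
for $z_{p_1,p_2}$ of (80), and where from (13)
$$\int_y^x\omega^{(g)}_{p_2-p_1}=\int_y^x\int_{p_1}^{p_2}\omega^{(g)}(\cdot,\cdot)=\log U(x,y),$$
for $U(x,y)$ of (79). Therefore
$$\vartheta\left[{\alpha^{(g+1)} \atop \beta^{(g+1)}}\right]\left(\int_y^x\nu^{(g+1)}\,\big|\,\Omega^{(g+1)}\right)=\left(-\frac{\rho}{K_0^2}\right)^{\frac{1}{2}\kappa^2}(-\theta_{g+1})^{-\kappa}\,U(x,y)^{\kappa}\,\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(\int_y^x\nu^{(g)}+\kappa z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)(1+O(\rho)).$$
Since $U(x,x)=1$ we find that for $\kappa\neq-\frac{1}{2}$,
$$\frac{\vartheta\left[{\alpha^{(g+1)} \atop \beta^{(g+1)}}\right]\left(\int_y^x\nu^{(g+1)}\,\big|\,\Omega^{(g+1)}\right)}{\vartheta\left[{\alpha^{(g+1)} \atop \beta^{(g+1)}}\right]\left(0\,\big|\,\Omega^{(g+1)}\right)}=U(x,y)^{\kappa}\,\frac{\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(\int_y^x\nu^{(g)}+\kappa z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)}{\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(\kappa z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)}\,(1+O(\rho)),$$
is holomorphic in $\rho$ for $|\rho|<r_1r_2$. Combining this result with (82) we immediately find (78) using the definition of the Szegő kernel (16). For $\kappa=-\frac{1}{2}$ the lower bound $|n+\alpha_{g+1}^{(g+1)}|=|\kappa|$ is satisfied for two values of $n$ so that
$$\begin{aligned}\vartheta\left[{\alpha^{(g+1)} \atop \beta^{(g+1)}}\right]\left(\int_y^x\nu^{(g+1)}\,\big|\,\Omega^{(g+1)}\right)=\left(-\frac{\rho}{K_0^2}\right)^{\frac{1}{8}}\Big[&(-\theta_{g+1})^{-\frac{1}{2}}\,U(x,y)^{\frac{1}{2}}\,\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(\int_y^x\nu^{(g)}+\tfrac{1}{2}z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)\\&+(-\theta_{g+1})^{\frac{1}{2}}\,U(x,y)^{-\frac{1}{2}}\,\vartheta\left[{\alpha^{(g)} \atop \beta^{(g)}}\right]\left(\int_y^x\nu^{(g)}-\tfrac{1}{2}z_{p_1,p_2}\,\big|\,\Omega^{(g)}\right)\Big](1+O(\rho)),\end{aligned}$$
which eventually leads to (81).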
Fig. 3. Contour σ

We next note that, similarly to (37), $S_\kappa^{(g)}(x,z_a)S^{(g+1)}(z_a,y)$ is a meromorphic 1-form in $z_a$ periodic on the $\Sigma^{(g)}$ cycles $a_i,b_i$ for $i=1,\ldots,g$ with simple poles
$$\begin{aligned}S_\kappa^{(g)}(x,z_a)S^{(g+1)}(z_a,y)&\sim\frac{dz_a}{x-z_a}\,S^{(g+1)}(x,y)\quad\text{for } z_a\sim x,\\ S_\kappa^{(g)}(x,z_a)S^{(g+1)}(z_a,y)&\sim\frac{dz_a}{z_a-y}\,S_\kappa^{(g)}(x,y)\quad\text{for } z_a\sim y.\end{aligned}\tag{83}$$
Furthermore, $S_\kappa^{(g)}(x,z_a)S^{(g+1)}(z_a,y)$ is also periodic on the $a_{g+1}$ cycle defined by $C_2(z_2)\sim-C_1(z_1)$. This follows from applying (75) to (77) so that
$$S_\kappa^{(g)}(x,e^{2\pi i}z_a)=e^{2\pi i\kappa(a-\bar a)}\,S_\kappa^{(g)}(x,z_a),\tag{84}$$
(or alternatively we may apply $U(x,e^{2\pi i}z_a)^{\kappa}=e^{2\pi i\kappa(a-\bar a)}\,U(x,z_a)^{\kappa}$). Similar properties hold for $S^{(g+1)}(x,z_a)S_\kappa^{(g)}(z_a,y)$. This leads to the following analogue of Proposition 1
Proposition 4. The Szegő kernel on a genus $g+1$ Riemann surface in the ρ-formalism is given, for $x,y\in\hat{\Sigma}^{(g)}$, by
$$\begin{aligned}S^{(g+1)}(x,y)&=S_\kappa^{(g)}(x,y)+\frac{1}{2\pi i}\sum_{a=1,2}\oint_{C_a(z_a)}S_\kappa^{(g)}(x,z_a)\,S^{(g+1)}(z_a,y),&(85)\\&=S_\kappa^{(g)}(x,y)-\frac{1}{2\pi i}\sum_{a=1,2}\oint_{C_a(z_a)}S^{(g+1)}(x,z_a)\,S_\kappa^{(g)}(z_a,y).&(86)\end{aligned}$$

Proof. The proof follows along the same lines as Proposition 1. Let $\sigma$ be a contour on $\hat{\Sigma}^{(g)}$ surrounding $A_a$ and the given points $x,y\in\hat{\Sigma}^{(g)}$ as shown in Fig. 3. Cauchy's Theorem and (83) imply
$$\begin{aligned}0&=\frac{1}{2\pi i}\oint_{\sigma}S_\kappa^{(g)}(x,z)\,S^{(g+1)}(z,y)\\&=-S^{(g+1)}(x,y)+S_\kappa^{(g)}(x,y)+\frac{1}{2\pi i}\sum_{a=1,2}\oint_{C_a(z_a)}S_\kappa^{(g)}(x,z_a)\,S^{(g+1)}(z_a,y),\end{aligned}$$
recalling that $S_\kappa^{(g)}(x,z_a)S^{(g+1)}(z_a,y)$ is periodic on $C_a$. Thus (85) follows. A similar argument holds for (86).
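We next define weighted moments of $S^{(g+1)}(x,y)$. Let $k_a=k+(-1)^{\bar a}\kappa$, for $a=1,2$ and integer $k\ge 1$ and define
$$Y_{ab}(k,l)=Y_{ab}\left[{\theta^{(g+1)} \atop \phi^{(g+1)}}\right](k,l)=\frac{\rho^{\frac{1}{2}(k_a+l_b-1)}}{(2\pi i)^2}\oint_{C_{\bar a}(x_{\bar a})}\oint_{C_b(y_b)}(x_{\bar a})^{-k_a}(y_b)^{-l_b}\,S^{(g+1)}(x_{\bar a},y_b)\,dx_{\bar a}^{1/2}dy_b^{1/2}.\tag{87}$$
We define $Y=(Y_{ab}(k,l))$ to be the infinite matrix indexed by $a,k$ and $b,l$. From (19) we note the skew-symmetry property
$$Y_{ab}\left[{\theta^{(g+1)} \atop \phi^{(g+1)}}\right](k,l)=-Y_{\bar b\bar a}\left[{(\theta^{(g+1)})^{-1} \atop (\phi^{(g+1)})^{-1}}\right](k,l).\tag{88}$$
We also introduce moments for $S_\kappa^{(g)}(x,y)$,
$$G_{ab}(k,l)=G_{ab}\left[{\theta^{(g)} \atop \phi^{(g)}}\right](\kappa;k,l)=\frac{\rho^{\frac{1}{2}(k_a+l_b-1)}}{(2\pi i)^2}\oint_{C_{\bar a}(x_{\bar a})}\oint_{C_b(y_b)}(x_{\bar a})^{-k_a}(y_b)^{-l_b}\,S_\kappa^{(g)}(x_{\bar a},y_b)\,dx_{\bar a}^{1/2}dy_b^{1/2},\tag{89}$$
with associated infinite matrix $G=(G_{ab}(k,l))$. This also satisfies a skew-symmetry property
$$G_{ab}\left[{\theta^{(g)} \atop \phi^{(g)}}\right](\kappa;k,l)=-G_{\bar b\bar a}\left[{(\theta^{(g)})^{-1} \atop (\phi^{(g)})^{-1}}\right](-\kappa;k,l).\tag{90}$$
Finally we define half-order differentials
$$h_a(k,x)=h_a\left[{\theta^{(g)} \atop \phi^{(g)}}\right](\kappa;k,x)=\frac{\rho^{\frac{1}{2}(k_a-\frac{1}{2})}}{2\pi i}\oint_{C_a(y_a)}y_a^{-k_a}\,S_\kappa^{(g)}(x,y_a)\,dy_a^{1/2},\tag{91}$$
$$\bar{h}_a(k,y)=\bar{h}_a\left[{\theta^{(g)} \atop \phi^{(g)}}\right](\kappa;k,y)=\frac{\rho^{\frac{1}{2}(k_a-\frac{1}{2})}}{2\pi i}\oint_{C_{\bar a}(x_{\bar a})}x_{\bar a}^{-k_a}\,S_\kappa^{(g)}(x_{\bar a},y)\,dx_{\bar a}^{1/2},\tag{92}$$
and let $h(x)=(h_a(k,x))$ and $\bar{h}(y)=(\bar{h}_a(k,y))$ denote the infinite row vectors indexed by $a,k$. These are related by skew-symmetry with
$$h_a\left[{\theta^{(g)} \atop \phi^{(g)}}\right](\kappa;k,x)=-\bar{h}_{\bar a}\left[{(\theta^{(g)})^{-1} \atop (\phi^{(g)})^{-1}}\right](-\kappa;k,x).\tag{93}$$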
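These moments can be inverted to obtain
$$S_\kappa^{(g)}(x,y_a)=\sum_{k\ge 1}\rho^{-\frac{k_a}{2}+\frac{1}{4}}\,h_a(k,x)\,y_a^{k_a-1}\,dy_a^{1/2},\tag{94}$$
$$S_\kappa^{(g)}(x_{\bar a},y)=\sum_{k\ge 1}\rho^{-\frac{k_a}{2}+\frac{1}{4}}\,x_{\bar a}^{k_a-1}\,\bar{h}_a(k,y)\,dx_{\bar a}^{1/2}.\tag{95}$$
From the sewing relation (66) we have
$$dz_a^{1/2}=(-1)^{\bar a}\,\xi\,\rho^{1/2}\,\frac{dz_{\bar a}^{1/2}}{z_{\bar a}},\tag{96}$$
for $\xi\in\{\pm\sqrt{-1}\}$. We then find in a similar way to Proposition 2 that

Proposition 5. For $x,y\in\hat{\Sigma}^{(g)}$ then $S^{(g+1)}(x,y)$ is given by
$$S^{(g+1)}(x,y)=S_\kappa^{(g)}(x,y)+\xi\,h(x)\,D^{\theta}\left(I+\xi\,YD^{\theta}\right)\bar{h}(y)^{T},\tag{97}$$
for infinite diagonal matrix
$$D^{\theta}(k,l)=\begin{pmatrix}\theta_{g+1}^{-1}&0\\0&-\theta_{g+1}\end{pmatrix}\delta(k,l).$$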
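Proof. From (85) of Proposition 4 we find
$$\begin{aligned}S^{(g+1)}(x,y)-S_\kappa^{(g)}(x,y)&=\frac{1}{2\pi i}\sum_{a=1,2}\oint_{C_a(z_a)}S_\kappa^{(g)}(x,z_a)\,S^{(g+1)}(z_a,y)\\&=\sum_{a=1,2}\sum_{k\ge 1}h_a(k,x)\,\frac{\rho^{-\frac{k_a}{2}+\frac{1}{4}}}{2\pi i}\oint_{C_a(z_a)}z_a^{k_a-1}\,S^{(g+1)}(z_a,y)\,dz_a^{1/2}\\&=\xi\sum_{a,k}h_a(k,x)\,D^{\theta}_{aa}(k,k)\,\frac{\rho^{\frac{k_a}{2}-\frac{1}{4}}}{2\pi i}\oint_{C_{\bar a}(z_{\bar a})}z_{\bar a}^{-k_a}\,S^{(g+1)}(z_{\bar a},y)\,dz_{\bar a}^{1/2},\end{aligned}$$
using respectively (94), (76) and (96). Applying (86) it follows that $S^{(g+1)}(x,y)-S_\kappa^{(g)}(x,y)$ is given by
$$\begin{aligned}&\xi\sum_{a,k}h_a(k,x)\,D^{\theta}_{aa}(k,k)\,\frac{\rho^{\frac{k_a}{2}-\frac{1}{4}}}{2\pi i}\oint_{C_{\bar a}(z_{\bar a})}z_{\bar a}^{-k_a}\,S_\kappa^{(g)}(z_{\bar a},y)\,dz_{\bar a}^{1/2}\\&\quad-\xi\sum_{a,k}h_a(k,x)\,D^{\theta}_{aa}(k,k)\,\frac{\rho^{\frac{k_a}{2}-\frac{1}{4}}}{(2\pi i)^2}\sum_{b=1,2}\oint_{C_{\bar a}(z_{\bar a})}\oint_{C_{\bar b}(w_{\bar b})}z_{\bar a}^{-k_a}\,S^{(g+1)}(z_{\bar a},w_{\bar b})\,S_\kappa^{(g)}(w_{\bar b},y)\,dz_{\bar a}^{1/2}\\&=\xi\,h(x)D^{\theta}\bar{h}(y)^{T}-\xi\sum_{a,b,k,l}h_a(k,x)\,D^{\theta}_{aa}(k,k)\,.\,\frac{\rho^{\frac{1}{2}(k_a-l_b)}}{(2\pi i)^2}\oint_{C_{\bar a}(z_{\bar a})}\oint_{C_{\bar b}(w_{\bar b})}z_{\bar a}^{-k_a}w_{\bar b}^{l_b-1}\,S^{(g+1)}(z_{\bar a},w_{\bar b})\,dz_{\bar a}^{1/2}dw_{\bar b}^{1/2}\;\bar{h}_b(l,y)\\&=\xi\,h(x)D^{\theta}\bar{h}(y)^{T}-h(x)D^{\theta}YD^{\theta}\bar{h}(y)^{T},\end{aligned}$$
on applying (95), (76) and (96). Hence the result follows.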
We next compute the explicit form of $Y$ in terms of the weighted moment matrix $G$ for $S_\kappa^{(g)}$. In particular it is convenient to define $T=\xi GD^{\theta}$. From Proposition 5 it follows on taking moments that
$$Y=G+\xi\,GD^{\theta}\left(I+\xi\,YD^{\theta}\right)G.$$
This can be solved recursively to obtain $Y=\sum_{n\ge 0}T^nG$. Following a similar argument to that given for Proposition 3 we then find

Proposition 6. $Y=(I-T)^{-1}G$, where $(I-T)^{-1}=\sum_{n\ge 0}T^n$ is convergent for $|\rho|<r_1r_2$.

This result together with Proposition 5 implies

Theorem 7. $S^{(g+1)}(x,y)$ is given by
$$S^{(g+1)}(x,y)=S_\kappa^{(g)}(x,y)+\xi\,h(x)\,D^{\theta}(I-T)^{-1}\,\bar{h}^{T}(y).$$

Finally, similarly to Theorem 4 we may define $\det(I-T)$ and find

Theorem 8. $\det(I-T)$ is non-vanishing and holomorphic in $\rho$ for $|\rho|<r_1r_2$.
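The algebra leading from the moment recursion to Proposition 6 can be sanity-checked on a finite-dimensional toy model. The sketch below is not the infinite-matrix statement itself: `G` is a hypothetical small 2x2 complex matrix, `D` is the 2x2 block $\mathrm{diag}(\theta^{-1},-\theta)$, and $\xi=i$ is one of the two allowed values. It verifies that $Y=(I-T)^{-1}G$ solves $Y=G+\xi GD(I+\xi YD)G$ and agrees with the Neumann series $\sum_n T^nG$.

```python
import cmath


def mul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add(A, B):
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]


def scale(c, A):
    return [[c * A[i][j] for j in range(2)] for i in range(2)]


def inv2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]


def max_abs_diff(A, B):
    return max(abs(A[i][j] - B[i][j]) for i in range(2) for j in range(2))


I2 = [[1, 0], [0, 1]]
xi = 1j                                   # one choice of xi in {+i, -i}
theta = cmath.exp(0.4j)                   # hypothetical multiplier on the unit circle
G = [[0.10 + 0.02j, -0.05j], [0.03, 0.08 - 0.01j]]   # hypothetical toy moments
D = [[1 / theta, 0], [0, -theta]]

T = scale(xi, mul(G, D))
Y = mul(inv2(add(I2, scale(-1, T))), G)

# Right-hand side of the recursion Y = G + xi G D (I + xi Y D) G.
inner = add(I2, scale(xi, mul(Y, D)))
rhs = add(G, scale(xi, mul(mul(mul(G, D), inner), G)))
residual = max_abs_diff(Y, rhs)

# Neumann series sum_{n < 60} T^n G, convergent here because T is small.
partial, power = [[0, 0], [0, 0]], I2
for _ in range(60):
    partial = add(partial, mul(power, G))
    power = mul(power, T)
neumann_error = max_abs_diff(Y, partial)
```

The key step is that $\xi YD^{\theta}=(I-T)^{-1}T$ when $Y=(I-T)^{-1}G$, so that $I+\xi YD^{\theta}=(I-T)^{-1}$.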
4.3. Self-sewing a sphere. We consider the example of sewing the Riemann sphere $\Sigma^{(0)}=\mathbb{C}\cup\{\infty\}$ to itself to form a torus. Choose local coordinates $z_2=z\in\mathbb{C}$ in the neighborhood of the origin $p_2=0$, and $z_1=1/z'$ for $z'$ in the neighborhood of the point at infinity $p_1=\infty$. Identify the annular regions $|q|r_{\bar a}^{-1}\le|z_a|\le r_a$ for a complex sewing parameter $\rho=q$ obeying $|q|\le r_1r_2$, via the sewing relation $z=qz'$. These annular regions do not intersect on the sphere provided $r_1r_2<1$ so that $|q|<1$. Furthermore, the sewing relation implies $\log z=\log z'+2\pi i\tau+2\pi ik$ for integer $k$, where $q=e^{2\pi i\tau}$. This is the standard parameterization of the torus with periods $2\pi i\tau$ and $2\pi i$ and modular parameter $\tau\in\mathbb{H}_1$. We now show that the results of the previous subsection allow us to recover the genus one Szegő kernel (23) from the genus zero one. For $x,y\in\mathbb{C}$ the genus zero prime form and Szegő kernel are given by
$$E^{(0)}(x,y)=(x-y)\,dx^{-1/2}dy^{-1/2},\tag{98}$$
$$S^{(0)}(x,y)=\frac{1}{x-y}\,dx^{1/2}dy^{1/2}.\tag{99}$$
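Let $\theta=\theta^{(1)}$ and $\phi=\phi^{(1)}=-e^{2\pi i\kappa}$ denote the multipliers on the torus cycles. Then since $p_1=\infty$ and $p_2=0$ we find $U(x,y)=x/y$ so that (78) and (81) imply
$$S_\kappa^{(0)}(x,y)=\frac{x^{\kappa}y^{-\kappa}}{x-y}\,dx^{1/2}dy^{1/2}+\frac{\theta}{1-\theta}\,\frac{1}{x^{1/2}y^{1/2}}\,\delta_{\kappa,-\frac{1}{2}}\,dx^{1/2}dy^{1/2}.\tag{100}$$
Computing moments one finds that for $\kappa\neq-\frac{1}{2}$ the half-differentials are
$$\begin{aligned}h_1(k,x)&=-\xi\,q^{\frac{1}{2}(k+\kappa-\frac{1}{2})}\,x^{k+\kappa-1}\,dx^{1/2},&h_2(k,x)&=q^{\frac{1}{2}(k-\kappa-\frac{1}{2})}\,x^{-k+\kappa}\,dx^{1/2},\\ \bar{h}_1(k,y)&=-q^{\frac{1}{2}(k+\kappa-\frac{1}{2})}\,y^{-k-\kappa}\,dy^{1/2},&\bar{h}_2(k,y)&=\xi\,q^{\frac{1}{2}(k-\kappa-\frac{1}{2})}\,y^{k-\kappa-1}\,dy^{1/2},\end{aligned}$$
for $x,y\in\hat{\Sigma}^{(0)}$ and the moment matrix $T=\xi GD^{\theta}$ is diagonal with
$$T_{ab}(k,l)=\theta^{a-\bar a}\,q^{k_a-\frac{1}{2}}\,\delta_{ab}\,\delta(k,l).$$
Altogether we find from Theorem 7 that for $\kappa\neq-\frac{1}{2}$ and $x,y\in\hat{\Sigma}^{(0)}$,
$$\begin{aligned}S^{(1)}(x,y)&=S_\kappa^{(0)}(x,y)+\xi\,h(x)\,D^{\theta}(I-T)^{-1}\,\bar{h}^{T}(y)\\&=\left[-\frac{\left(\frac{x}{y}\right)^{\kappa+\frac{1}{2}}}{1-\frac{x}{y}}-\sum_{k\ge 1}\frac{\theta^{-1}q^{k+\kappa-\frac{1}{2}}}{1-\theta^{-1}q^{k+\kappa-\frac{1}{2}}}\left(\frac{x}{y}\right)^{k+\kappa-\frac{1}{2}}+\sum_{k\ge 1}\frac{\theta q^{k-\kappa-\frac{1}{2}}}{1-\theta q^{k-\kappa-\frac{1}{2}}}\left(\frac{y}{x}\right)^{k-\kappa-\frac{1}{2}}\right]\frac{dx^{1/2}dy^{1/2}}{x^{1/2}y^{1/2}}.\end{aligned}$$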
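The bracketed three-term expression can be resummed into a single sum over all integers, $-\sum_{k\in\mathbb{Z}}(x/y)^{k+\lambda}/(1-\theta^{-1}q^{k+\lambda})$ with $\lambda=\kappa+\frac{1}{2}$, which is the form used to identify the torus kernel. The following sketch checks this numerically; the function names, the truncation order and the sample values are assumptions, and real positive `q` and `ratio` (standing for $x/y$) with $q<x/y<1$ are chosen to avoid branch ambiguities in the fractional powers.

```python
import cmath


def three_term_form(q, ratio, kappa, theta, terms=60):
    """Bracketed expression in terms of ratio = x/y, truncated at `terms`."""
    lam = kappa + 0.5
    total = -ratio ** lam / (1 - ratio)
    for k in range(1, terms + 1):
        a = q ** (k + kappa - 0.5) / theta
        b = theta * q ** (k - kappa - 0.5)
        total -= a / (1 - a) * ratio ** (k + kappa - 0.5)
        total += b / (1 - b) * ratio ** (-(k - kappa - 0.5))
    return total


def two_sided_sum(q, ratio, kappa, theta, terms=60):
    """Truncation of -sum over all integers k of ratio^(k+lam) / (1 - q^(k+lam)/theta)."""
    lam = kappa + 0.5
    total = 0
    for k in range(-terms, terms + 1):
        total -= ratio ** (k + lam) / (1 - q ** (k + lam) / theta)
    return total


theta = cmath.exp(0.7j)       # hypothetical multiplier on the unit circle
lhs = three_term_form(0.1, 0.5, 0.2, theta)
rhs = two_sided_sum(0.1, 0.5, 0.2, theta)
```

The agreement follows from $1/(1-a)=1+a/(1-a)$ for the terms with $k\ge 0$, whose constant parts sum to the geometric series $-(x/y)^{\lambda}/(1-x/y)$, and from $1/(1-b^{-1})=-b/(1-b)$ for the terms with $k\le -1$.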
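Denoting $q_u=e^{u}$ for any $u$ we define $X,Y$ by $x=q_X$, $y=q_Y$ and let $Z=X-Y$. We also define $\lambda=\kappa+\frac{1}{2}$ with $0<\lambda<1$ and obtain
$$\begin{aligned}S^{(1)}(X,Y)&=\left[-\frac{q_Z^{\lambda}}{1-q_Z}-\sum_{k\ge 0}\frac{\theta^{-1}q^{k+\lambda}}{1-\theta^{-1}q^{k+\lambda}}\,q_Z^{k+\lambda}+\sum_{k\le -1}\frac{\theta q^{-k-\lambda}}{1-\theta q^{-k-\lambda}}\,q_Z^{k+\lambda}\right]dX^{1/2}dY^{1/2}\\&=-\sum_{k\in\mathbb{Z}}\frac{q_Z^{k+\lambda}}{1-\theta^{-1}q^{k+\lambda}}\,dX^{1/2}dY^{1/2}=P_1\left[{\theta \atop \phi}\right](Z,q)\,dX^{1/2}dY^{1/2},\end{aligned}$$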
from (24). A similar result also holds for $\kappa=-\frac{1}{2}$ for $\theta\neq 0$, i.e. $(\theta,\phi)\neq(0,0)$. Lastly, we note that $(I-T)^{-1}$ is convergent for $|q|<1$ and that furthermore
$$\det(I-T)=\prod_{k\ge 1}\left(1-\theta^{-1}q^{k+\kappa-\frac{1}{2}}\right)\left(1-\theta q^{k-\kappa-\frac{1}{2}}\right),\tag{101}$$
is holomorphic for $|q|<1$ from Theorem 8. In vertex operator algebra theory, $\det(I-T)$ is related to the genus one partition function for a continuous orbifolding of a rank two free fermion system e.g. [MTZ]. Furthermore, the infinite product (101) is part of that arising in the Jacobi triple identity on applying the bosonic decomposition of this theory.

4.4. Self-sewing a torus. We next consider the example of self-sewing an oriented torus $\Sigma^{(1)}=\mathbb{C}/\Lambda$ for lattice $\Lambda=2\pi i(\mathbb{Z}\tau\oplus\mathbb{Z})$ and $\tau\in\mathbb{H}_1$. This is discussed in detail in ref. [MT1]. Define annuli $A_a$, $a=1,2$ centered at $p_1=0$ and $p_2=w$ of $\Sigma^{(1)}$ with local coordinates $z_1=z$ and $z_2=z-w$ respectively. Take the outer radius of $A_a$ to be $r_a<\frac{1}{2}D(q)$ for $D(q)=\min_{\lambda\in\Lambda,\lambda\neq 0}|\lambda|$ and the inner radius to be $|\rho|/r_{\bar a}$, with $|\rho|\le r_1r_2$. Identifying the annuli via (66) we obtain a compact genus two Riemann surface $\Sigma^{(2)}$ parameterized by
$$\mathcal{D}^{\rho}=\left\{(\tau,w,\rho)\in\mathbb{H}_1\times\mathbb{C}\times\mathbb{C}\ :\ |w-\lambda|>2|\rho|^{\frac{1}{2}}>0,\ \lambda\in\Lambda\right\}.\tag{102}$$
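For $x,y\in\Sigma^{(1)}$ the genus one prime form and Szegő kernel with multipliers $\theta_1=-e^{-2\pi i\beta_1}$ and $\phi_1=-e^{2\pi i\alpha_1}$ are given by (22) and (23). Let $\theta_2=-e^{-2\pi i\beta_2}$ and $\phi_2=-e^{2\pi i\alpha_2}=-e^{2\pi i\kappa}$ denote the multipliers on the $a_2,b_2$ cycles. Then, in this case
$$U(x,y)=\frac{\vartheta_1(x-w,\tau)\,\vartheta_1(y,\tau)}{\vartheta_1(x,\tau)\,\vartheta_1(y-w,\tau)},$$
and $\kappa z_{0,w}=\kappa w$ so that for $\kappa\neq-\frac{1}{2}$,
$$S_\kappa^{(1)}\left[{\theta_1 \atop \phi_1}\right](x,y|\tau,w)=\left(\frac{\vartheta_1(x-w,\tau)\,\vartheta_1(y,\tau)}{\vartheta_1(x,\tau)\,\vartheta_1(y-w,\tau)}\right)^{\kappa}\frac{\vartheta\left[{\alpha_1 \atop \beta_1}\right](x-y+\kappa w,\tau)}{\vartheta\left[{\alpha_1 \atop \beta_1}\right](\kappa w,\tau)\,K(x-y,\tau)}\,dx^{1/2}dy^{1/2},$$
with a similar result for $\kappa=-\frac{1}{2}$. We take $\kappa\neq-\frac{1}{2}$ from now on. It is straightforward to see that
$$S_\kappa^{(1)}\left[{\theta_1 \atop \phi_1}\right](x,y|\tau,w)=S_{-\kappa}^{(1)}\left[{\theta_1 \atop \phi_1}\right](x-w,y-w|\tau,-w).\tag{103}$$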
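Computing moments and using (93) and (103) the half-differentials (91), (92) for $x\in\hat{\Sigma}^{(1)}$ and $\kappa\neq-\frac{1}{2}$ are given by
$$\begin{aligned}h_1\left[{\theta_1 \atop \phi_1}\right](\kappa;k,x|\tau,w,\rho)&=\frac{\rho^{\frac{1}{2}(k+\kappa-\frac{1}{2})}}{2\pi i}\left(\frac{\vartheta_1(x-w,\tau)}{\vartheta_1(x,\tau)}\right)^{\kappa}\frac{dx^{1/2}}{\vartheta\left[{\alpha_1 \atop \beta_1}\right](\kappa w,\tau)}\\&\quad\times\oint_{C_1(y)}y^{-k-\kappa}\left(\frac{\vartheta_1(y,\tau)}{\vartheta_1(y-w,\tau)}\right)^{\kappa}\frac{\vartheta\left[{\alpha_1 \atop \beta_1}\right](x-y+\kappa w,\tau)}{K(x-y,\tau)}\,dy,\\ h_2\left[{\theta_1 \atop \phi_1}\right](\kappa;k,x|\tau,w,\rho)&=h_1\left[{\theta_1 \atop \phi_1}\right](-\kappa;k,x-w|\tau,-w,\rho),\\ \bar{h}_1\left[{\theta_1 \atop \phi_1}\right](\kappa;k,x|\tau,w,\rho)&=-h_1\left[{\theta_1^{-1} \atop \phi_1^{-1}}\right](\kappa;k,x-w|\tau,-w,\rho),\\ \bar{h}_2\left[{\theta_1 \atop \phi_1}\right](\kappa;k,x|\tau,w,\rho)&=-h_1\left[{\theta_1^{-1} \atop \phi_1^{-1}}\right](-\kappa;k,x|\tau,w,\rho).\end{aligned}\tag{104}$$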
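Similarly, using (103), the moment matrix (89) is given by
$$\begin{aligned}G_{11}\left[{\theta_1 \atop \phi_1}\right](\kappa;k,l|\tau,w,\rho)&=\frac{\rho^{\kappa+\frac{1}{2}(k+l-1)}}{(2\pi i)^2}\oint_{C_2(x_2)}\oint_{C_1(y_1)}x_2^{-k-\kappa}\,y_1^{-l-\kappa}\,S_\kappa^{(1)}\left[{\theta_1 \atop \phi_1}\right](x_2,y_1|\tau,w)\,dx_2^{1/2}dy_1^{1/2}\\&=G_{22}\left[{\theta_1 \atop \phi_1}\right](-\kappa;k,l|\tau,-w,\rho),\\ G_{21}\left[{\theta_1 \atop \phi_1}\right](\kappa;k,l|\tau,w,\rho)&=\frac{\rho^{\frac{1}{2}(k+l-1)}}{(2\pi i)^2}\oint_{C_1(x_1)}\oint_{C_1(y_1)}x_1^{-k+\kappa}\,y_1^{-l-\kappa}\,S_\kappa^{(1)}\left[{\theta_1 \atop \phi_1}\right](x_1,y_1|\tau,w)\,dx_1^{1/2}dy_1^{1/2}\\&=G_{12}\left[{\theta_1 \atop \phi_1}\right](-\kappa;k,l|\tau,-w,\rho).\end{aligned}\tag{105}$$
The genus two Szegő kernel is determined for $T=\xi\,G\left[{\theta_1 \atop \phi_1}\right]D^{\theta_2}$ by (97)
$$S^{(2)}\begin{bmatrix}\theta_1\\\theta_2\\\phi_1\\\phi_2\end{bmatrix}(x,y|\tau,w,\rho)=S_\kappa^{(1)}\left[{\theta_1 \atop \phi_1}\right](x,y|\tau,w)+\xi\,h\left[{\theta_1 \atop \phi_1}\right](x)\,D^{\theta_2}(I-T)^{-1}\,\bar{h}^{T}\left[{\theta_1 \atop \phi_1}\right](y).\tag{106}$$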
4.4.1. Modular invariance. We now consider the modular invariance of (106) under the action of a particular subgroup $L\subset Sp(4,\mathbb{Z})$ and verify that (20) holds. We define $L$ as follows [MT1]. Consider $\hat{H}\subset Sp(4,\mathbb{Z})$ with elements
$$\mu(a,b,c)=\begin{pmatrix}1&0&0&b\\a&1&b&c\\0&0&1&-a\\0&0&0&1\end{pmatrix}.\tag{107}$$
$\hat{H}$ is generated by $A=\mu(1,0,0)$, $B=\mu(0,1,0)$ and $C=\mu(0,0,1)$ with relations $[A,B]C^{-2}=[A,C]=[B,C]=1$. We also define $\Gamma_1\subset Sp(4,\mathbb{Z})$, where $\Gamma_1\cong SL(2,\mathbb{Z})$ with elements
$$\gamma_1=\begin{pmatrix}a_1&0&b_1&0\\0&1&0&0\\c_1&0&d_1&0\\0&0&0&1\end{pmatrix},\qquad a_1d_1-b_1c_1=1.\tag{108}$$
Together these groups generate $L=\hat{H}\rtimes\Gamma_1\subset Sp(4,\mathbb{Z})$ with center $Z(L)=\langle C\rangle$, where $J=L/Z(L)\cong\mathbb{Z}^2\rtimes SL(2,\mathbb{Z})$ is the Jacobi group. From Lemma 15 of [MT1] we find that $L$ acts on the domain $\mathcal{D}^{\rho}$ of (102) as follows:
$$\mu(a,b,c).(\tau,w,\rho)=(\tau,\ w+2\pi ia\tau+2\pi ib,\ \rho),\tag{109}$$
$$\gamma_1.(\tau,w,\rho)=\left(\frac{a_1\tau+b_1}{c_1\tau+d_1},\ \frac{w}{c_1\tau+d_1},\ \frac{\rho}{(c_1\tau+d_1)^2}\right).\tag{110}$$
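The relations among the generators of the Heisenberg-type group can be verified by direct matrix multiplication. This is a sketch under one assumption: the commutator is read as $[X,Y]=XYX^{-1}Y^{-1}$ (the relations also hold for the convention $X^{-1}Y^{-1}XY$). Helper names are hypothetical.

```python
def mu(a, b, c):
    """The matrix mu(a, b, c) of (107)."""
    return [[1, 0, 0, b], [a, 1, b, c], [0, 0, 1, -a], [0, 0, 0, 1]]


def matmul(X, Y):
    """Product of two 4x4 integer matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def commutator(X, X_inv, Y, Y_inv):
    """Group commutator X Y X^{-1} Y^{-1} (assumed convention)."""
    return matmul(matmul(X, Y), matmul(X_inv, Y_inv))


IDENTITY = mu(0, 0, 0)
A, B, C = mu(1, 0, 0), mu(0, 1, 0), mu(0, 0, 1)
A_inv, B_inv, C_inv = mu(-1, 0, 0), mu(0, -1, 0), mu(0, 0, -1)

AB_commutator = commutator(A, A_inv, B, B_inv)
C_squared = matmul(C, C)
```

Multiplying out (107) gives the composition law $\mu(a,b,c)\,\mu(a',b',c')=\mu(a+a',\,b+b',\,c+c'+ab'-a'b)$, from which the inverse $\mu(a,b,c)^{-1}=\mu(-a,-b,-c)$ and the relation $[A,B]=C^{2}$ follow.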
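The kernel of the action is $Z(L)$, so that the effective action is that of $J$. However, this action is lifted to $L$ when considering the covering space $\hat{\mathcal{D}}^{\rho}$ of $\mathcal{D}^{\rho}$ for which $\Omega^{(g+1)}_{g+1,g+1}$ of (72) is single-valued (Theorems 10, 11 of [MT1]). In particular, one finds that $C$ acts as
$$C.(\tau,w,\rho)=(\tau,w,e^{2\pi i}\rho),\tag{111}$$
which has a non-trivial action on $\hat{\mathcal{D}}^{\rho}$.

Let us now consider the action of $L$ on $S^{(2)}\left[{\theta^{(2)} \atop \phi^{(2)}}\right](x,y|\tau,w,\rho)$. This is partly determined by the action of $J$ on $S_\kappa^{(1)}\left[{\theta_1 \atop \phi_1}\right](x,y|w,\tau)$. For $\gamma_1\in\Gamma_1$ it is clear from (25) that
$$S_\kappa^{(1)}\left[{\theta_1^{a}\phi_1^{b} \atop \theta_1^{c}\phi_1^{d}}\right](\gamma_1x,\gamma_1y\,|\,\gamma_1(\tau,w))=S_\kappa^{(1)}\left[{\theta_1 \atop \phi_1}\right](x,y|\tau,w).$$
$h_a\left[{\theta_1 \atop \phi_1}\right](\kappa;k,x|\tau,w,\rho)$ and $G_{ab}\left[{\theta_1 \atop \phi_1}\right](\kappa;k,l|\tau,w,\rho)$ are similarly $\Gamma_1$ invariant so that $S^{(2)}\left[{\theta^{(2)} \atop \phi^{(2)}}\right](x,y|\tau,w,\rho)$ is $\Gamma_1$ invariant in a similar fashion to (65).

Next we consider the action of the generators $A$, $B$ and $C$. We firstly note that (21) implies
$$A\begin{bmatrix}\theta_1\\\theta_2\\\phi_1\\\phi_2\end{bmatrix}=\begin{bmatrix}\theta_1\\-\theta_2\theta_1\\-\phi_1\phi_2^{-1}\\\phi_2\end{bmatrix},\qquad B\begin{bmatrix}\theta_1\\\theta_2\\\phi_1\\\phi_2\end{bmatrix}=\begin{bmatrix}-\theta_1\phi_2\\-\theta_2\phi_1\\\phi_1\\\phi_2\end{bmatrix},\qquad C\begin{bmatrix}\theta_1\\\theta_2\\\phi_1\\\phi_2\end{bmatrix}=\begin{bmatrix}\theta_1\\-\theta_2\phi_2\\\phi_1\\\phi_2\end{bmatrix}.\tag{112}$$
Using (8) and recalling that $\phi_2=-e^{2\pi i\kappa}$ we find
$$\begin{aligned}S_\kappa^{(1)}\left[{\theta_1 \atop \phi_1}\right](x,y|\tau,w)&=S_\kappa^{(1)}\left[{\theta_1 \atop -\phi_1\phi_2^{-1}}\right](x,y|\tau,w+2\pi i\tau)&(113)\\&=S_\kappa^{(1)}\left[{-\theta_1\phi_2 \atop \phi_1}\right](x,y|\tau,w+2\pi i),&(114)\end{aligned}$$
where the multipliers comply with those of (112) for $A$ and $B$ respectively. Define infinite diagonal matrices
$$E^{\alpha}(k,l)=\begin{pmatrix}1&0\\0&-\alpha\end{pmatrix}\delta(k,l),\qquad F^{\alpha}(k,l)=\begin{pmatrix}-\alpha^{-1}&0\\0&1\end{pmatrix}\delta(k,l),\tag{115}$$
for $\alpha\in U(1)$. Then (104), (113) and (114) imply
$$\begin{aligned}h\left[{\theta_1 \atop \phi_1}\right](\kappa;x|\tau,w,\rho)&=h\left[{\theta_1 \atop -\phi_1\phi_2^{-1}}\right](\kappa;x|\tau,w+2\pi i\tau,\rho)\,E^{\theta_1}\\&=h\left[{-\theta_1\phi_2 \atop \phi_1}\right](\kappa;x|\tau,w+2\pi i,\rho)\,E^{\phi_1}\\&=e^{-i\pi\kappa}\,h\left[{\theta_1 \atop \phi_1}\right](\kappa;x|\tau,w,e^{2\pi i}\rho)\,E^{\phi_2},\end{aligned}$$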
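$$\begin{aligned}\bar{h}\left[{\theta_1 \atop \phi_1}\right](\kappa;x|\tau,w,\rho)&=\bar{h}\left[{\theta_1 \atop -\phi_1\phi_2^{-1}}\right](\kappa;x|\tau,w+2\pi i\tau,\rho)\,F^{\theta_1}\\&=\bar{h}\left[{-\theta_1\phi_2 \atop \phi_1}\right](\kappa;x|\tau,w+2\pi i,\rho)\,F^{\phi_1}\\&=e^{i\pi\kappa}\,\bar{h}\left[{\theta_1 \atop \phi_1}\right](\kappa;x|\tau,w,e^{2\pi i}\rho)\,F^{\phi_2}.\end{aligned}$$
Similarly, from (105) we find
$$\begin{aligned}G\left[{\theta_1 \atop \phi_1}\right](\kappa|\tau,w,\rho)&=F^{\theta_1}\,G\left[{\theta_1 \atop -\phi_1\phi_2^{-1}}\right](\kappa|\tau,w+2\pi i\tau,\rho)\,E^{\theta_1}\\&=F^{\phi_1}\,G\left[{-\theta_1\phi_2 \atop \phi_1}\right](\kappa|\tau,w+2\pi i,\rho)\,E^{\phi_1}\\&=F^{\phi_2}\,G\left[{\theta_1 \atop \phi_1}\right](\kappa|\tau,w,e^{2\pi i}\rho)\,E^{\phi_2}.\end{aligned}$$
Noting that $E^{\alpha}D^{\theta_2}F^{\alpha}=D^{-\alpha\theta_2}$ for $\alpha=\theta_1$, $\phi_1$ and $\phi_2$ we may then easily confirm that $S^{(2)}\left[{\theta^{(2)} \atop \phi^{(2)}}\right](x,y|\tau,w,\rho)$ is invariant under the generators $A$, $B$ and $C$ respectively. Therefore $S^{(2)}\left[{\theta^{(2)} \atop \phi^{(2)}}\right](x,y|\tau,w,\rho)$ is modular invariant under $L$. Furthermore, since $\det(E^{\alpha}F^{\alpha})=1$ it follows that $\det(I-T)$ is also $L$ invariant.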
References

[DVFHLS] di Vecchia, P., Hornfeck, K., Frau, M., Lerda, A., Sciuto, S.: N-string, g-loop vertex for the fermionic string. Phys. Lett. B 211, 301–307 (1988)
[DVPFHLS] di Vecchia, P., Pezzella, F., Frau, M., Hornfeck, K., Lerda, A., Sciuto, S.: N-point g-loop vertex for a free fermionic theory with arbitrary spin. Nucl. Phys. B 333, 635–700 (1990)
[F1] Fay, J.D.: Theta Functions on Riemann Surfaces. Lecture Notes in Mathematics, Vol. 352. Berlin-New York: Springer-Verlag, 1973
[F2] Fay, J.D.: Kernel functions, analytic torsion, and moduli spaces. Mem. Amer. Math. Soc. 96, no. 464 (1992)
[FK] Farkas, H.M., Kra, I.: Riemann Surfaces. New York: Springer-Verlag, 1980
[FLM] Frenkel, I., Lepowsky, J., Meurman, A.: Vertex Operator Algebras and the Monster. New York: Academic Press, 1988
[FS] Friedan, D., Shenker, S.: The analytic geometry of two dimensional conformal field theory. Nucl. Phys. B 281, 509–545 (1987)
[HS] Hawley, N.S., Schiffer, M.: Half-order differentials on Riemann surfaces. Acta Math. 115, 199–236 (1966)
[Ka] Kac, V.: Vertex Operator Algebras for Beginners. University Lecture Series, Vol. 10. Providence, RI: Amer. Math. Soc., 1998
[MT1] Mason, G., Tuite, M.P.: On genus two Riemann surfaces formed from sewn tori. Commun. Math. Phys. 270, 587–634 (2007)
[MT2] Mason, G., Tuite, M.P.: Partition functions and chiral algebras. In: Lie Algebras, Vertex Operator Algebras and their Applications (in honor of Jim Lepowsky and Robert L. Wilson), Contemp. Math. 442, 401–410 (2007)
[MT3] Mason, G., Tuite, M.P.: Free bosonic vertex operator algebras on genus two Riemann surfaces I. Commun. Math. Phys. 300, 673–713 (2010)
[MT4] Mason, G., Tuite, M.P.: Free bosonic vertex operator algebras on genus two Riemann surfaces II. To appear
[MTZ] Mason, G., Tuite, M.P., Zuevsky, A.: Torus n-point functions for R-graded vertex operator superalgebras and continuous fermion orbifolds. Commun. Math. Phys. 283, 305–342 (2008)
[Mu] Mumford, D.: Tata Lectures on Theta I and II. Boston, MA: Birkhäuser, 1983
[R] Raina, A.K.: Fay's trisecant identity and conformal field theory. Commun. Math. Phys. 122, 625–641 (1989)
[RS] Raina, A.K., Sen, S.: Grassmannians, multiplicative Ward identities and theta-function identities. Phys. Lett. B 203, 256–262 (1988)
[Sc] Schiffer, M.: Half-order differentials on Riemann surfaces. SIAM J. Appl. Math. 14, 922–934 (1966)
[Sp] Springer, G.: Introduction to Riemann Surfaces. Reading, MA: Addison-Wesley, 1957
[Sz] Szegő, G.: Über orthogonale Polynome, die zu einer gegebenen Kurve der komplexen Ebene gehören. Math. Z. 9, 218–270 (1921)
[T] Tuite, M.P.: Genus two meromorphic conformal field theory. CRM Proceedings and Lecture Notes 30, 231–251 (2001)
[TZ1] Tuite, M.P., Zuevsky, A.: Genus two partition and correlation functions for fermionic vertex operator superalgebras I. http://arXiv.org/abs/1007.5203v2 [math.QA], 2011, to appear in Commun. Math. Phys., doi:10.1007/s00200-011-1258-1, 2011
[TZ2] Tuite, M.P., Zuevsky, A.: Genus two partition function for free fermionic vertex operator algebras II. To appear
[Y] Yamada, A.: Precise variational formulas for abelian differentials. Kodai Math. J. 3, 114–143 (1980)
[Z] Zhu, Y.: Modular invariance of characters of vertex operator algebras. J. Amer. Math. Soc. 9, 237–302 (1996)
Communicated by Y. Kawahigashi
Commun. Math. Phys. 306, 647–662 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1260-7
Communications in
Mathematical Physics
A Rigidity Result for Extensions of Braided Tensor C*–Categories Derived from Compact Matrix Quantum Groups

Claudia Pinzari (1), John E. Roberts (2)

1 Dipartimento di Matematica, Università di Roma “La Sapienza”, 00185 Roma, Italy. E-mail: [email protected]
2 Dipartimento di Matematica, Università di Roma “Tor Vergata”, 00133 Roma, Italy. E-mail: [email protected]

Received: 31 July 2010 / Accepted: 1 December 2010
Published online: 25 May 2011 – © Springer-Verlag 2011

Dedicated to Sergio Doplicher on the occasion of his seventieth birthday

Abstract: Let $G$ be a classical compact Lie group and $G_\mu$ the associated compact matrix quantum group deformed by a positive parameter $\mu$ (or $\mu\in\mathbb{R}\setminus\{0\}$ in the type A case). It is well known that the category $\mathrm{Rep}(G_\mu)$ of unitary representations of $G_\mu$ is a braided tensor C*–category. We show that any braided tensor *–functor $\rho:\mathrm{Rep}(G_\mu)\to\mathcal{M}$ to another braided tensor C*–category with irreducible tensor unit is full if $|\mu|\neq 1$. In particular, the functor of restriction $\mathrm{Rep}(G_\mu)\to\mathrm{Rep}(K)$ to a proper compact quantum subgroup $K$ cannot be made into a braided functor. Our result also shows that the Temperley–Lieb category $\mathcal{T}_{\pm d}$ for $d>2$ cannot be embedded properly into a larger category with the same objects as a braided tensor C*–subcategory.
1. Introduction There are various strategies for studying the structure of or classifying semisimple rigid tensor categories. As an oversimplification, sometimes one focuses on the structure of simple objects. A basic result is Ocneanu’s rigidity, see [11] for a proof, asserting that there are finitely many C–linear fusion categories with a prescribed fusion ring. Important results have been obtained for categories that may or may not have finitely many irreducibles, or a braiding, see [23,26,29,30,38,39,11], however this is an incomplete list. Another approach, when there is a relevant subcategory with few arrows, is to try and construct the whole category M from a smaller subcategory A. This subcategory is regarded as a symmetry of M, and the classification problem becomes that of classifying extensions M with the given symmetry. The first example, motivated by AQFT [9] (see also [15]), is that of permutation symmetry. For tensor C ∗ –categories with conjugates (and subobjects, direct sums and irreducible tensor unit) realizations of this symmetry are few, they are classified by a single integer parameter and a sign, the statistics phase. When all the statistics phases
are one, $\mathcal{M}\simeq\mathrm{Rep}(G)$ with its natural permutation symmetry, for a unique compact group $G$ [10]. A related independent result is also well known [8]. The theory of subfactors [19] or low dimensional AQFT [12], although differing, provide further remarkable instances of this general scheme. In the first case the Jones projections play a fundamental role, leading to the Temperley–Lieb symmetry; whereas the Virasoro algebra symmetry with central charge < 1 plays a major role in CFT on the circle. Deep classification results have been obtained in both these areas [2,13,17,22,34]. For the categories arising from low dimensional QFT or from certain quantum groups one has braid group symmetries. The main difference from the permutation symmetric case is the great variety of realizations in tensor categories. In this paper we start with braided categories $\mathcal{A}$ arising from quantum groups where the braiding comes from an R–matrix. The matrix carries more information on the quantum group than in the group case, where most of the information gets lost, so that, for example, for any proper closed subgroup $K$ of a group $G$ the restriction functor $\mathrm{Rep}(G)\to\mathrm{Rep}(K)$ is not a full permutation symmetric tensor *–functor $\mathrm{Rep}(G)\to\mathrm{Rep}(K)$. The R–matrix depends on the structural constants of the quantum group, making it an object intimately related to the quantum group and raising the question of whether the class of tensor categories with that specific braided symmetry has a rigid structure. A recent result in [6] for the categories arising from CFT on the circle seems to confirm a certain rigidity when constructing inclusions of unitarily braided, but not permutation symmetric, tensor categories. More precisely, the authors, working with inclusions of conformal nets on the circle, where unitary braided symmetries arise, have shown that far fewer low values of the Jones index of the inclusion can occur than in the subfactor situation. They exclude all non-integer index values $<3+\sqrt{3}$ (hence in particular all non-integer index values in the Jones discrete series) except one, $4\cos^2(\pi/10)$, which is realized [35]. The integer values are known to be realized from inclusions derived by taking fixed points under group actions. α–induction relates the tensor category associated with a net $\mathcal{A}$ to that associated with an extension $\mathcal{B}$, see [22] and references there. We also mention the work of [27] for a categorical relation corresponding to inclusions of nets. The aim of this paper is to derive a rigidity result for tensor categories with a braided symmetry derived from quantum groups of all Lie types but not at a root of unity. Note that in this case the R–matrix carries maximal information on the quantum group. We shall work with tensor C*–categories; $\mathcal{A}$ will be the category of unitary representations of any deformation $G_\mu$ of the classical compact Lie groups $G$ by a real parameter $\mu$. It is known that the R–matrix for such quantum groups makes the associated representation category into a braided tensor C*–category with conjugates (or duals). Except for the extreme values of $\mu$, the coinverse of $G_\mu$ is not involutive and the braiding is not unitary and we show that every braided tensor *–functor $\rho:\mathrm{Rep}(G_\mu)\to\mathcal{M}$ to a braided tensor C*–category is full (see Theorem 3.1 for a more general statement). We derive this result from Theorem 5.4, showing an obstruction to constructing extensions of braided tensor C*–categories. The obstruction vanishes for unitary braided symmetries. Note that, in our class of examples, $\mathcal{A}$ has infinitely many irreducibles whose indices (or dimensions) can be arbitrarily large. The category $\mathcal{M}$ is not assumed to be embeddable into the category of Hilbert spaces. However, when it is, our result shows that the braiding of $\mathrm{Rep}(G_\mu)$ does not extend to any $\mathrm{Rep}(K)$, where $K$ is a proper quantum subgroup.
Rigidity Result for Extensions of Braided Tensor C ∗ –Categories
This corollary does not, to our knowledge, seem to have been noticed in the literature. It shows how the results are discontinuous in the classical limit μ = 1.

Another application concerns Temperley–Lieb categories. It states that when this category is generated by an object of intrinsic (quantum) dimension d > 2 (i.e. not in the discrete series of Jones), it cannot be embedded properly as a braided tensor C∗–subcategory into a larger such category with the same objects. We recall here the situation for subfactors N ⊂ M with finite index, where the associated N−N–bimodule tensor category contains a copy of the TL category as a (non-braided) tensor C∗–category. The relation to our result can be seen by regarding the TL category T±d as a full braided tensor subcategory of the representation category of SμU(2) for a deformation parameter determined by |μ + μ⁻¹| = d, where μ is positive or negative according to the minus or plus sign.

Our result relies on the theory of induction of [33]. We showed there that if a tensor C∗–category with conjugates A admits a tensorial embedding τ into the category of Hilbert spaces, and hence, by Woronowicz duality, is a full tensor C∗–subcategory of the representation category of a compact quantum group Gτ, then given an embedding ρ : A → M of tensor C∗–categories, the full subcategory of M whose objects are in the image of ρ can be identified with a category of Hilbert bimodule representations of Gτ for a noncommutative Gτ–ergodic C∗–algebra (C, α) intrinsically associated with ρ and τ. (C, α) is to be regarded as a virtual subgroup. For a fixed τ, isomorphic pairs (M, ρ), (M′, ρ′) yield conjugate ergodic actions. We show that, taking braided symmetries into account, there is an obstruction to constructing such non-trivial ergodic actions (C, α).
More precisely, our main result states that the invariant κ of A must be a phase on irreducible spectral representations for the action α corresponding to ρ : A → M (Theorem 5.4). We are thus left to check that κ(v) is never a phase for our quantum group G μ , unless v is equivalent to the trivial representation (Theorem 3.1). The invariant κ appeared in its simplest form for categories derived from AQFT. It is just the statistics phase recalled above for permutation symmetric tensor C ∗ –categories [9]. In two spacetime dimensions the braid group replaces the permutation group. Unitarity of the braiding means that κ is a phase on irreducible objects but it need not be a sign [12]. For general braided tensor C ∗ –categories, κ has been discussed in [25]. A closely related invariant was independently introduced for certain braided tensor categories, the ribbon categories [36], and referred to as the twist, θ . Its importance derives from the fact that ribbon categories associated with certain quantum groups at roots of unity lead to the Reshetikhin–Turaev invariants of 3–manifolds and links in 3–manifolds. Note that for quantum groups with unitary braided symmetries, κ is indeed always a phase, and our obstruction vanishes. In this case, braided extensions can be expected. We describe the braided symmetries of Rep(Sμ U (d)) got by twisting the universal R-matrix by d th roots of unity. The invariant κ can be used to show that these braidings provide inequivalent braided tensor categories. We finally mention that reconsidering the construction of these braidings led us to compare explicitly the intrinsic characterizations of the quantum SU (d), when not at roots of unity, given by [23] and [31]. Whilst [23] starts from the fusion rules, [31] relies on the braiding. 
The twist τC may be derived following the approach of [31], leading to a characterization of the tensor categories with fusion rules of type sl(d) in the spirit of [23], in terms of generators and relations analogous to [31], but now involving a d th root of unity, see Prop. 7.2 and Cor. 7.3.
C. Pinzari, J. E. Roberts
The paper is organized as follows. In Sect. 2 we recall the notion of a braided tensor C∗–category. We shall propose a variant weaker than the usual one, but sufficient for our main result, announced in Sect. 3. To this end, we recall the natural structure of Rep Gμ as a braided tensor C∗–category. We shall in particular emphasize the case of SμU(d) as it allows one to grasp the general idea of the proof quickly. In Sect. 4 we derive the main properties of the invariant κ of a braided tensor C∗–category with conjugates and compare it with the twist of a ribbon tensor category. In Sect. 5 we show that when the invariant κ of a braided tensor C∗–category A is not unitary it is an obstruction to constructing braided extensions. In Sect. 6 we complete the proof by showing that κ(v) is never a phase for an irreducible representation v of the quantum groups Gμ, unless v is the trivial representation. In the last section we reconsider the results of [23] and [31].

2. Braided Tensor C∗–Categories

In this section we introduce various notions of braiding, from the weakest to the strongest. The categories we shall consider will always be assumed to be strict, to be over the complex numbers, with irreducible tensor unit ι. We shall also assume existence of subobjects and direct sums, unless otherwise stated, but see the remarks at the beginning of Sect. 4. In this paper we shall mostly deal with tensor C∗–categories (or unitary categories). The weakest notion requires that for any pair of objects u, v of a tensor C∗–category A there is an invertible intertwiner σ(u, v) ∈ (u ⊗ v, v ⊗ u) such that for arrows T belonging to spaces of the form (u, ι),

$$1_v \otimes T \circ \sigma(u, v) = T \otimes 1_v. \tag{2.1}$$
Note that the dual intertwiners $\sigma_d(u, v) := \sigma(u, v)^{*-1} \in (u \otimes v, v \otimes u)$ need not satisfy (2.1). If both σ and $\sigma_d$ satisfy (2.1), σ will be referred to as a weak left braided symmetry. Notice that a weak left braided symmetry by itself may not even be related to the braid group, as we are not assuming properties (2.3) and (2.4) below. However, the relation to the braid group will follow automatically for the weak braided symmetries of interest in this paper (cf. the end of the section). A stronger notion is that of a left braided symmetry. One requires the following relations for objects $u$, $u'$, $v$, and arrows $T \in (u, u')$:

$$\sigma(\iota, u) = \sigma(u, \iota) = 1_u, \tag{2.2}$$
$$\sigma(u \otimes u', v) = \sigma(u, v) \otimes 1_{u'} \circ 1_u \otimes \sigma(u', v), \tag{2.3}$$
$$\sigma(u', v) \circ T \otimes 1_v = 1_v \otimes T \circ \sigma(u, v). \tag{2.4}$$
Indeed, the notion of a weak left braided symmetry is just a special case of the naturality property (2.4) and (2.2) for $u$ or $u'$ the trivial object ι. A weak right braided symmetry is defined by the following equation for $T \in (u, \iota)$:

$$1_v \otimes T = T \otimes 1_v \circ \sigma(v, u) = T \otimes 1_v \circ \sigma_d(v, u),$$

and a right braided symmetry is defined by replacing (2.3) and (2.4) by

$$\sigma(u, v \otimes v') = 1_v \otimes \sigma(u, v') \circ \sigma(u, v) \otimes 1_{v'}, \tag{2.3$'$}$$
$$\sigma(v, u') \circ 1_v \otimes T = T \otimes 1_v \circ \sigma(v, u). \tag{2.4$'$}$$
Obviously, σ is a (weak) right braided symmetry if and only if $\sigma_{-1}(u, v) := \sigma(v, u)^{-1}$ or $\sigma_*(u, v) := \sigma(v, u)^*$ are (weak) left braided symmetries.
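The passage from right to left braided symmetries via $\sigma_{-1}$ can be spelled out in two lines. The following is a sketch of the check for axioms (2.3) and (2.4), using only (2.3′), (2.4′) and the invertibility of σ; the intermediate steps are supplied here for illustration and are not part of the original text.

```latex
% Assume \sigma is a right braided symmetry and T \in (u, u').
% Naturality: start from (2.4'),
\sigma(v, u') \circ 1_v \otimes T = T \otimes 1_v \circ \sigma(v, u),
% and compose with \sigma(v, u')^{-1} on the left and \sigma(v, u)^{-1} on the right:
1_v \otimes T \circ \sigma(v, u)^{-1} = \sigma(v, u')^{-1} \circ T \otimes 1_v .
% With \sigma_{-1}(u, v) := \sigma(v, u)^{-1} this is (2.4) for \sigma_{-1}:
\sigma_{-1}(u', v) \circ T \otimes 1_v = 1_v \otimes T \circ \sigma_{-1}(u, v).

% Tensor products: (2.3') with the roles (u, v, v') -> (v, u, u') reads
\sigma(v, u \otimes u') = 1_u \otimes \sigma(v, u') \circ \sigma(v, u) \otimes 1_{u'} .
% Taking inverses reverses the order of composition and gives (2.3) for \sigma_{-1}:
\sigma_{-1}(u \otimes u', v) = \sigma_{-1}(u, v) \otimes 1_{u'} \circ 1_u \otimes \sigma_{-1}(u', v).
```

The argument for $\sigma_*$ is analogous, with adjoints in place of inverses and $T^*$ in place of $T$.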
We thus recover the usual notion of a braided symmetry by requiring σ to be both a left and a right braided symmetry. A (weak) half braided symmetry will just mean a (weak) left or right braided symmetry. Given a (weak) half braided symmetry σ in A, the dual symmetry $\sigma_d(u, v)$ is another braided symmetry for A of the same type.

The representation categories of SμU(d), or, more generally, of Gμ, where G is a classical compact Lie group, or of Ao(F) are well known examples of braided tensor C∗–categories. These will in fact be our main examples. They will be discussed in later sections.

If σ satisfies (2.1) or is a weak half or half braided symmetry for a tensor C∗–category A and τ : A → Hilb is a tensor ∗–functor into the category of Hilbert spaces, then the representation category of the compact quantum group Gτ associated to τ via Woronowicz duality has a braiding $\sigma_\tau$ of the same type defined by $\sigma_\tau(\hat u, \hat v) := \tau(\sigma(u, v))$, where $\hat u$ is the representation of Gτ corresponding to the object u of A. It is easy to show that the category of Hilbert spaces has a unique braiding, even in the sense of (2.1), given by its unique permutation symmetry.

If (A, σ) and (M, ε) are tensor C∗–categories with some type of braiding, a tensor ∗–functor F : A → M will be called braided if for any pair of objects u, v in A, F(σ(u, v)) = ε(F(u), F(v)). We shall consider cases where A has a stronger braided symmetry than M. Note that, if for example properties (2.2), (2.3) hold for the braiding of A and F is a braided tensor ∗–functor to a weak half braided M, then those properties also hold for the braided symmetry of M for objects in the image of F. However, the naturality axioms (2.4) or (2.4′) are not inherited by the full subcategory of M whose objects are in the image of F.

3. The Rigidity Result

Tensor categories derived from quantum groups arising from a deformation of classical Lie groups admit natural braided symmetries associated with the universal R–matrices of Drinfeld, see, e.g. [7,24]. We work with compact matrix quantum groups. We start by describing briefly the corresponding braided symmetry for the type $A_{d-1}$ in this framework. After this, we recall how to construct braided tensor C∗–categories for the other Lie types, before illustrating our main result and its corollaries. Woronowicz has introduced the compact quantum group SμU(d), for a nonzero μ ∈ [−1, 1], using his duality theorem [42]. It is the (maximal) compact quantum group whose representation category Rep(SμU(d)) is generated, as an embedded tensor C∗–category, by the deformed determinant element

$$S = \sum_{p \in P_d} (-\mu)^{i(p)}\, \psi_{p(1)} \otimes \cdots \otimes \psi_{p(d)},$$

where $(\psi_i)$ is an orthonormal basis of a d–dimensional Hilbert space. For a nonzero complex parameter q, let $H_n(q)$ denote the Hecke algebra of type $A_{n-1}$, i.e. the quotient of the complex group algebra of the braid group $B_n$ with generators $g_1, \ldots, g_{n-1}$ by the relations

$$g_i^2 = (1 - q) g_i + q.$$
For q real or |q| = 1, $H_n(q)$ becomes a ∗–algebra with involution making the spectral idempotents of the $g_i$ into selfadjoint projections. However, there are nontrivial Hilbert space representations for n arbitrarily large only if q > 0 or $q = e^{\pm 2\pi i/\ell}$ [40]. In this paper we only consider the real case. There is a well known remarkable representation of the Hecke algebra $H_\infty(q)$, with $q = \mu^2$, in Rep(SμU(d)), due to Jimbo and Woronowicz [18,42], and defined by

$$\eta(g_1)\,\psi_i \otimes \psi_j = \begin{cases} \mu\, \psi_j \otimes \psi_i, & i < j, \\ \psi_i \otimes \psi_i, & i = j, \\ \mu\, \psi_j \otimes \psi_i + (1 - q)\, \psi_i \otimes \psi_j, & i > j. \end{cases}$$

Rep(SμU(d)) admits an intrinsic characterization [23,31]. We follow the approach of [31] (based on the type of the braiding) and compare with that of [23] (based on the type of the fusion rules) in the last section. Rep(SμU(d)) is, up to equivalence, the unique tensor C∗–category A (with subobjects and direct sums) with a tensor ∗–functor from the braid category η : B → A factoring through representations of the complex Hecke algebras of type A, $H_n(q)$ for $q = \mu^2$, and generated by an object u and an arrow $S \in (\iota, u^d)$ satisfying

$$S^* \circ S = d!_q, \qquad S \circ S^* = \eta(A_d), \tag{3.1}$$
$$S^* \otimes 1_u \circ 1_u \otimes S = (d - 1)!_q\, (-\mu)^{d-1}, \tag{3.2}$$
$$\eta(g_1 \cdots g_d) \circ S \otimes 1_u = \mu^{d-1}\, 1_u \otimes S, \tag{3.3}$$
where $n!_q$ is the usual quantum factorial and $A_d$ is the antisymmetrized sum of the elements of the canonical basis of $H_d(q)$, a scalar multiple of the analogue of the totally antisymmetric projection. The above relations are realized by the deformed determinant element S and the JW representation η. They are easily verified for d = 2. (See [31] for details. Notice that our $g_i$ corresponds to $-g_i$ there.)

Remark. Note that for d odd, Rep(SμU(d)) and Rep(S−μU(d)) are canonically isomorphic.

It is well known from Drinfeld's theory of universal R–matrices that the map

$$\sigma_\omega : g_i \mapsto \frac{\omega}{\mu}\, \eta(g_i),$$

where ω is a complex d-th root of μ, makes Rep(SμU(d)) into a braided tensor category. It is easy to check that Rep(SμU(d)) actually becomes a braided tensor C∗–category in this way, cf. [31]. Working with a fixed deformation parameter leads to d inequivalent braided symmetries obtained by varying ω. This fact was first noted in [23], see also [39], Sect. 4. Note that, for μ > 0, the braided symmetry of Rep(SμU(d)) corresponding to the positive d-th root $\mu^{1/d}$ of μ is a natural choice, as it reduces to the unique permutation symmetry of the category of Hilbert spaces for μ = 1.

Deformation has been generalized to all classical compact Lie groups in the framework of compact matrix quantum groups [1,24,37]. The starting point was the dual picture of Drinfeld and Jimbo. We start with a complex simple Lie algebra $\mathfrak{g}$. Denote by $(\alpha_i)_1^r$ a set of simple roots. Let $A = \left( \frac{2(\alpha_i, \alpha_j)}{(\alpha_j, \alpha_j)} \right)$ be the Cartan matrix, where $(\cdot\,, \cdot)$ is a symmetric invariant bilinear form on $\mathfrak{g}$ such that $(\alpha, \alpha) = 2$ for a short root α. It follows that $d_j := \frac{(\alpha_j, \alpha_j)}{2} \in \mathbb{N}$ for all j. For a complex deformation parameter μ other than a root of unity, consider the quantized universal enveloping algebra $U_\mu(\mathfrak{g})$ as defined in Ch. 2, 7.1.1, [24].
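Returning to the relations (3.1)–(3.3) for the deformed determinant element S and the Jimbo–Woronowicz representation η: they can be checked by exact rational arithmetic for small d. The following is a minimal sketch, not taken from the paper. It assumes that i(p) denotes the number of inversions of the permutation p, stores tensors as sparse dictionaries, and only tests the first half of (3.1), since $A_d$ is not written out explicitly above. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def inversions(p):
    """Number of pairs a < b with p[a] > p[b] (assumed meaning of i(p))."""
    n = len(p)
    return sum(1 for a in range(n) for b in range(a + 1, n) if p[a] > p[b])


def det_element(d, mu):
    """Deformed determinant S as a sparse vector {index tuple: coefficient}."""
    return {p: (-mu) ** inversions(p) for p in permutations(range(d))}


def apply_g(vec, k, mu):
    """Apply eta(g_k), acting on tensor factors k and k+1 (k is 1-based)."""
    q = mu * mu
    out = {}
    for idx, c in vec.items():
        i, j = idx[k - 1], idx[k]
        swapped = idx[:k - 1] + (j, i) + idx[k + 1:]
        if i == j:
            terms = [(idx, c)]
        elif i < j:
            terms = [(swapped, mu * c)]
        else:
            terms = [(swapped, mu * c), (idx, (1 - q) * c)]
        for key, val in terms:
            out[key] = out.get(key, 0) + val
    return {key: val for key, val in out.items() if val != 0}


def q_factorial(n, q):
    """Quantum factorial n!_q = prod_{k=1..n} (1 + q + ... + q^(k-1))."""
    result = Fraction(1)
    for k in range(1, n + 1):
        result *= sum(q ** j for j in range(k))
    return result


def check_relations(d, mu):
    """Return a dict of booleans for the Hecke relation and (3.1)-(3.3)."""
    q = mu * mu
    S = det_element(d, mu)
    results = {}
    # First half of (3.1): S* S = d!_q (all coefficients are real).
    results["(3.1)"] = sum(c * c for c in S.values()) == q_factorial(d, q)
    # Hecke relation g^2 = (1 - q) g + q on every basis vector of u (x) u.
    hecke = True
    for i in range(d):
        for j in range(d):
            g_e = apply_g({(i, j): Fraction(1)}, 1, mu)
            lhs = apply_g(g_e, 1, mu)
            rhs = {key: (1 - q) * val for key, val in g_e.items()}
            rhs[(i, j)] = rhs.get((i, j), 0) + q
            rhs = {key: val for key, val in rhs.items() if val != 0}
            hecke = hecke and lhs == rhs
    results["Hecke"] = hecke
    rel32 = rel33 = True
    for k in range(d):
        # (3.2): (S* (x) 1_u)(1_u (x) S) psi_k = (d-1)!_q (-mu)^(d-1) psi_k.
        contracted = {}
        for p, c in S.items():
            idx = (k,) + p
            head, tail = idx[:d], idx[d]
            if head in S:
                contracted[tail] = contracted.get(tail, 0) + S[head] * c
        expected = {k: q_factorial(d - 1, q) * (-mu) ** (d - 1)}
        rel32 = rel32 and contracted == expected
        # (3.3): eta(g_1 ... g_d)(S (x) psi_k) = mu^(d-1) psi_k (x) S.
        vec = {p + (k,): c for p, c in S.items()}
        for m in range(d, 0, -1):  # g_d acts first, g_1 last
            vec = apply_g(vec, m, mu)
        rel33 = rel33 and vec == {(k,) + p: mu ** (d - 1) * c for p, c in S.items()}
    results["(3.2)"] = rel32
    results["(3.3)"] = rel33
    return results


if __name__ == "__main__":
    print(check_relations(2, Fraction(1, 2)))
```

Exact fractions are used instead of floating point so that each relation is tested as an identity rather than up to rounding; the relations are polynomial in μ, so negative values of μ can be tested in the same way.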
The category $C(\mathfrak{g}, \mu)$ of finite dimensional representations of $U_\mu(\mathfrak{g})$ admitting a weight decomposition has the structure of a braided ribbon tensor category [24], see also [3,7]. If μ ∈ (0, 1), $U_\mu(\mathfrak{g})$ becomes a Hopf ∗–algebra with real form $E_i^* = F_i$, $K_i^* = K_i$. The representation space of every object of $C(\mathfrak{g}, \mu)$ with positive weights on the $K_i$'s admits a natural Hilbert space structure making it into a ∗–representation. One gets in this way a tensor C∗–category with conjugates and braided symmetry embedded into the category of Hilbert spaces. A compact quantum group Gμ may then be defined via Woronowicz duality, see [1,24,37]. The category Rep(Gμ) of unitary finite dimensional representations of Gμ will be regarded as a braided tensor C∗–category as described above. In the particular case of type A, we allow negative parameters for SμU(d) and the braided symmetry may be any of the $\sigma_\omega$'s. The aim of this paper is to prove the following rigidity result (a consequence of Theorem 5.4).

Theorem 3.1. For μ ≠ 1 (and also μ ≠ −1 in the type A case), every braided tensor ∗–functor ρ : Rep(Gμ) → M into a weak half braided tensor C∗–category is full.

We derive the following consequences.

Corollary 3.2. For μ ≠ 1 (and also μ ≠ −1 in the type A case) the braided symmetry of Rep(Gμ) does not make the representation category of any proper compact quantum subgroup into a weak half braided tensor C∗–category.

Proof. If K is a proper compact quantum subgroup then the tensor ∗–functor Rep(Gμ) → Rep(K) of restriction is not full, hence the image of the braided symmetry of Rep(Gμ) cannot be a braided symmetry for Rep(K).

The next corollary is an application to the Temperley–Lieb category $T_{\pm d}$. Recall that for d > 0, $T_{\pm d}$ may be defined as the universal tensor ∗–category with objects $\mathbb{N}_0$ whose arrows are generated by a single arrow R ∈ (0, 2) satisfying $R^* \otimes 1_1 \circ 1_1 \otimes R = \pm 1_1$ and $R^* \circ R = d$. The following assertions are well known [14]. The units and generating objects of these categories are irreducible and the spaces of arrows are finite dimensional. The categories are simple except at roots of unity, d = 2 cos π/n, when they have a single non-zero proper ideal. They are tensor C∗–categories when d ≥ 2, and at roots of unity their quotients by the unique non-zero proper ideal are tensor C∗–categories. The projection $e = \frac{R \circ R^*}{d}$ defines a Hecke algebra representation $\eta(g_1) := (1 + q)e - q$, where $q = \mu^2$ and −1 < μ < 1 is determined by $|\mu + \frac{1}{\mu}| = d$, with μ > 0 for $T_{-d}$ and μ < 0 for $T_d$. It is also known that $T_{\pm d}$ is a full braided tensor C∗–subcategory of Rep(SμU(2)), see [32] with previous results in [4] for embedded categories. It is equivalent to Rep(SμU(2)) after completion under subobjects and direct sums. We may thus apply Theorem 3.1.

Corollary 3.3. The category $T_{\pm d}$ generated by a single selfadjoint object of dimension d > 2, regarded as a braided tensor C∗–category, cannot be embedded properly as a weak half braided tensor C∗–subcategory with the same objects.

4. Invariant κ and the Twist of Ribbon Categories

In this section we recall the main properties of the invariant κ of a braided tensor C∗–category with conjugates [25] and we compare it with the twist θ of ribbon categories [36].
If v is an irreducible object of A and $R \in (\iota, \bar v \otimes v)$, $\bar R \in (\iota, v \otimes \bar v)$ is a solution of the conjugate equations for v, then $\sigma(v, \bar v) \circ \bar R$ must be a nonzero scalar multiple of R whenever $\sigma(v, \bar v) \in (v \otimes \bar v, \bar v \otimes v)$ is an invertible intertwiner, as the space $(\iota, \bar v \otimes v)$ is one dimensional. This scalar in general depends on the choice of the solution of the conjugate equations. However, the following proposition shows that it is independent of this choice if we choose a standard solution of the conjugate equations, in the sense of [25]. For irreducible objects v, solutions are standard if and only if they are normalized, $\|R\| = \|\bar R\|$.

Proposition 4.1. If σ is a right braided symmetry for A, then for any irreducible object v the scalar $\kappa_r(v)$ defined by $\sigma(v, \bar v) \circ \bar R = \kappa_r(v) R$ does not depend on the choice of standard solutions of the conjugate equations for v. If σ is a braided symmetry, $\kappa_r$ is a class function, i.e. $\kappa_r(v) = \kappa_r(u)$ if v and u are unitarily equivalent.

Remark. Note that, unlike in the case of tensor categories, we do not need to make a choice $v \to R_v$ to define $v \to \kappa_r(v)$.

We may extend $\kappa_r$ to reducible objects v of A by the equation $\sigma(v, \bar v) \circ \bar R = 1_{\bar v} \otimes \kappa_r(v) \circ R$, where we again use standard solutions. Note that $\kappa_r(v) \in (v, v)$. Similarly, for left braided symmetries we may define $\kappa_l(v)$. For a braided symmetry, $\kappa_r(v) = \kappa_l(v) =: \kappa(v)$. In this case $\kappa(v) = \sum_n \kappa(v_n) E_n$, where $v_n$ are the irreducible components of v and $E_n$ the minimal central projections in $(v, v)$ with $\sum_n E_n = 1_v$. It follows that $v \to \kappa(v)$ is central, $\kappa(v) \circ T = T \circ \kappa(u)$, $T \in (u, v)$, see [25].

Proposition 4.2. If σ is a right braided symmetry, then for any object v of A,

$$\kappa_l^{\sigma_{-1}}(v) = \kappa_r^{\sigma}(v)^{-1}, \qquad \kappa_l^{\sigma_*}(v) = \kappa_r^{\sigma}(v)^{*}, \qquad \kappa_r^{\sigma_d}(v) = \kappa_r^{\sigma}(v)^{*-1}.$$

Proof. We may assume v irreducible. The first relation is obvious and the third follows from the first two. Now $\sigma(v, \bar v)^* \circ R_v = \kappa_l^{\sigma_*}(v) \bar R_v$, so

$$\kappa_l^{\sigma_*}(v) \|\bar R_v\|^2 = \bar R_v^* \circ \sigma(v, \bar v)^* \circ R_v = (\sigma(v, \bar v) \circ \bar R_v)^* \circ R_v = \kappa_r(v)^* \|R_v\|^2,$$

completing the proof.
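The third relation of Proposition 4.2 can also be obtained directly. The sketch below is an illustration rather than part of the original argument; it assumes the conventions $R_v \in (\iota, \bar v \otimes v)$, $\bar R_v \in (\iota, v \otimes \bar v)$, a standard (normalized) solution, and $\kappa_r$ defined by $\sigma(v, \bar v) \circ \bar R_v = \kappa_r^{\sigma}(v) R_v$ for irreducible v.

```latex
% Step 1. Since (\iota, v \otimes \bar v) is one dimensional, there is a scalar c with
\sigma(v, \bar v)^{*} \circ R_v = c\, \bar R_v .
% Pairing with \bar R_v and using the definition of \kappa_r^{\sigma}(v):
c\, \|\bar R_v\|^2
  = \bar R_v^{*} \circ \sigma(v, \bar v)^{*} \circ R_v
  = (\sigma(v, \bar v) \circ \bar R_v)^{*} \circ R_v
  = \overline{\kappa_r^{\sigma}(v)}\, \|R_v\|^2 ,
% hence c = \overline{\kappa_r^{\sigma}(v)} because \|R_v\| = \|\bar R_v\|.

% Step 2. Apply \sigma_d(v, \bar v) = \sigma(v, \bar v)^{*-1} to both sides of Step 1:
\sigma_d(v, \bar v) \circ \bar R_v = \overline{\kappa_r^{\sigma}(v)}^{\,-1} R_v ,
% i.e.
\kappa_r^{\sigma_d}(v) = \kappa_r^{\sigma}(v)^{*-1},
\qquad |\kappa_r^{\sigma_d}(v)| = |\kappa_r^{\sigma}(v)|^{-1} .
```

In particular $\kappa_r^{\sigma_d}(v) = \kappa_r^{\sigma}(v)$ exactly when $\kappa_r^{\sigma}(v)$ is a phase, which is the form in which the invariant enters the obstruction of Sect. 5.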
The central element κ is in fact an invariant for braided tensor C ∗ –categories. More precisely, the invariant κr is preserved under a full tensor ∗ –functor of right braided tensor C ∗ –categories with conjugates. We next give an alternative definition of κl (v), in the spirit of the relation between statistics parameter, dimension and statistics phase in AQFT [15].
Proposition 4.3. If σ is a left braided symmetry of A, then for any object v and any standard solution of the conjugate equations,

$$\kappa_l(v)^{-1} = R_v^* \otimes 1_v \circ 1_{\bar v} \otimes \sigma(v, v) \circ R_v \otimes 1_v,$$
$$\kappa_l(v) = R_v^* \otimes 1_v \circ 1_{\bar v} \otimes \sigma(v, v)^{-1} \circ R_v \otimes 1_v.$$

The same relations hold for a right braided symmetry with $\kappa_r(v)$ in place of $\kappa_l(v)$.

Proof.

$$R_v^* \otimes 1_v \circ 1_{\bar v} \otimes \sigma(v, v) \circ R_v \otimes 1_v = R_v^* \otimes 1_v \circ \sigma(\bar v, v)^{-1} \otimes 1_v \circ \sigma(\bar v \otimes v, v) \circ R_v \otimes 1_v$$
$$= (\sigma_d(\bar v, v) \circ R_v)^* \otimes 1_v \circ 1_v \otimes R_v = \bar R_v^* \otimes 1_v \circ \kappa_l(v)^{-1} \otimes R_v = \kappa_l(v)^{-1}.$$

The second relation follows by applying the first to $\sigma_d$. For the remaining statements it suffices to replace σ with $\sigma_*$ and $\sigma_{-1}$.

The following known result exhibits the behaviour of κ under tensor products, see [25].

Proposition 4.4. If σ is a braided symmetry of A, then for any pair of objects u, z,

$$\kappa(u \otimes z) = (\sigma(z, u) \circ \sigma(u, z))^{-1} \circ \kappa(u) \otimes \kappa(z).$$

The following convention, suggested by categories of endomorphisms of an algebra, helps to simplify notation. The symbol T denoting an arrow in (w, z) will also be used for $T \otimes 1_u \in (w \otimes u, z \otimes u)$; the meaning of T will be clear from the context. The arrow $1_u \otimes T$, on the other hand, will be denoted by ρ(T). The previous proposition easily yields the following formula.

Corollary 4.5. For any object u ∈ A and any integer n,

$$\kappa(u^{\otimes n}) = \Sigma_{n-1}^{-1} \circ \cdots \circ \Sigma_1^{-1} \circ \kappa(u)^{\otimes n},$$

where $\Sigma_k := \rho^{k-1}(\sigma) \circ \rho^{k-2}(\sigma) \cdots \circ \sigma^2 \circ \cdots \circ \rho^{k-2}(\sigma) \circ \rho^{k-1}(\sigma)$ and $\sigma = \sigma(u, u)$.

Remark. As κ(u) is a scalar when u is irreducible, κ(v) is an eigenvalue of $\Sigma_{n-1}^{-1} \circ \cdots \circ \Sigma_1^{-1}\, \kappa(u)^n$ if v is an irreducible summand of $u^{\otimes n}$.
We next compare the invariant κ of a braided tensor C∗–category with the twist θ of a ribbon category. Recall that in a ℂ–linear tensor category (always assumed abelian and semisimple, although not necessarily strict) a right dual $v^*$ for any object v of the category is defined by two arrows $e \in (v^* \otimes v, \iota)$ and $d \in (\iota, v \otimes v^*)$ such that, up to canonical associativity isomorphisms, omitted here for simplicity, $1_v \otimes e \circ d \otimes 1_v = 1_v$, $e \otimes 1_{v^*} \circ 1_{v^*} \otimes d = 1_{v^*}$. As for conjugates, a right dual is unique up to a unique invertible $T \in (v_1^*, v_2^*)$ such that $e_1 = e_2 \circ T \otimes 1_v$, $d_2 = 1_v \otimes T \circ d_1$. A left dual ${}^*v$ of v is similarly defined by arrows
$e' \in (v \otimes {}^*v, \iota)$, $d' \in (\iota, {}^*v \otimes v)$. In a tensor ∗–category with conjugates, a conjugate $\bar v$ of v is always a right and left dual, $\bar v = v^* = {}^*v$. A right rigid tensor category is a tensor category with a specified choice of right dual $(v^*, e_v, d_v)$ for every object v. A ribbon category is a right rigid braided tensor category with a choice of isomorphisms $\theta_v \in (v, v)$, called twists, natural in v and satisfying $\theta_{v \otimes w} = \sigma(w, v) \circ \sigma(v, w) \circ \theta_v \otimes \theta_w$, $\theta_{v^*} = (\theta_v)^c$, $\theta_\iota = 1_\iota$. In a ribbon category we also have an associated left duality $({}^*v = v^*, e'_v, d'_v)$ defined as in Eq. (3.5) of Ch. XIV in [20]. The associated contravariant functor coincides with that induced by right duality. As is well known, we may then define a scalar valued trace $\mathrm{tr}_v$ as in Def. XIV.4.1 in [20], analogous to a left inverse in a tensor C∗–category [25]. The twist can be computed from the braided symmetry and trace, $(\theta_v)^{-1} = \mathrm{tr}_v \otimes 1_v(\sigma_{-1}(v, v))$. From this, it is easy to show that if N is a ribbon category and M a tensor C∗–category with conjugates embedded in N as a full tensor subcategory, then for any irreducible object v of M, the trace arising from the ribbon structure may be chosen positive and $\theta_v = \kappa_r(v)^{-1}$, cf. Ch. XIV in [20].

5. An Obstruction to Extending Braided Tensor C∗–Categories

Throughout this section, A and M are tensor C∗–categories with conjugates and irreducible tensor units and ρ : A → M is a tensor ∗–functor. We start by assuming that A and M have braidings, σ and ε, in the sense of (2.1) and that ρ is a braided tensor functor. We shall refer to M as a braided extension of A. Our main assumption is that A admits an embedding into the category of Hilbert spaces, and we fix a tensor ∗–functor τ : A → Hilb. (We shall not assume that τ is braided as this would imply that σ is a permutation symmetry.) Hence, by Woronowicz duality, τ determines a compact quantum group Gτ with a braided representation category.
Consider the ergodic C∗–action (C, α) of Gτ associated with the pair (ρ, τ), see [32]. For each object u of A, $H_u$ is the Hilbert bimodule constructed in [33], in fact just depending on $\rho_u$ of M. We may identify canonically $H_u$ with $\tau_u \otimes C$ as right Hilbert modules via the unitaries $S_u$ of [33], making the left module structures explicit, see Prop. 8.6 in [33]:

$$\langle \psi \otimes I, c_v(\phi) \cdot \psi' \otimes I \rangle = (\rho(R_u^*) \circ 1_{\rho_{\bar u}} \otimes T \otimes 1_{\rho_u}) \otimes (j_u \psi \otimes \phi \otimes \psi'),$$

for every irreducible spectral representation v of the ergodic action of Gτ on C and every linear intertwiner $c_v : \tau_v \to C$ between v and α, i.e. of the form $c_v(\phi) = T \otimes \phi$, $T \in (\rho_v, \iota)$. The next lemma shows that if A and M have a braiding in the weak sense of relation (2.1) and if ρ is a braided functor, the left bimodule structure is completely determined by the representation theory of Gτ and the ergodic action.

Lemma 5.1. If σ and ε satisfy (2.1) for tensor C∗–categories A and M and if ρ : A → M is a braided tensor ∗–functor, then the left module structure on $H_u$ under the canonical identification with $\tau_u \otimes C$ is given by

$$\langle \psi \otimes I, c_v(\phi) \cdot \psi' \otimes I \rangle = c_v\bigl(\tau(R_u^* \otimes 1_v \circ 1_{\bar u} \otimes \sigma(v, u))\, j_u \psi \otimes \phi \otimes \psi'\bigr).$$
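Proof. Writing, as above, $c_v(\phi) = T \otimes \phi$, with $\phi \in \tau_v$ and $T \in (\rho_v, \iota)$,

$$\begin{aligned}
\langle \psi \otimes I, c_v(\phi) \cdot \psi' \otimes I \rangle
&= (\rho(R_u^*) \circ 1_{\rho_{\bar u}} \otimes T \otimes 1_{\rho_u}) \otimes (j_u \psi \otimes \phi \otimes \psi') \\
&= (\rho(R_u^*) \circ 1_{\rho_{\bar u}} \otimes (1_{\rho_u} \otimes T \circ \varepsilon(\rho_v, \rho_u))) \otimes (j_u \psi \otimes \phi \otimes \psi') \\
&= (\rho(R_u^*) \circ 1_{\rho_{\bar u}} \otimes 1_{\rho_u} \otimes T \circ 1_{\rho_{\bar u}} \otimes \rho(\sigma(v, u))) \otimes (j_u \psi \otimes \phi \otimes \psi') \\
&= (T \circ \rho(R_u^*) \otimes 1_{\rho_v} \circ 1_{\rho_{\bar u}} \otimes \rho(\sigma(v, u))) \otimes (j_u \psi \otimes \phi \otimes \psi') \\
&= T \circ \rho(R_u^* \otimes 1_v \circ 1_{\bar u} \otimes \sigma(v, u)) \otimes (j_u \psi \otimes \phi \otimes \psi') \\
&= T \otimes \bigl(\tau(R_u^* \otimes 1_v \circ 1_{\bar u} \otimes \sigma(v, u))\, j_u \psi \otimes \phi \otimes \psi'\bigr) \\
&= c_v\bigl(\tau(R_u^* \otimes 1_v \circ 1_{\bar u} \otimes \sigma(v, u))\, j_u \psi \otimes \phi \otimes \psi'\bigr).
\end{aligned}$$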
From Lemma 5.1, we derive a first property that spectral representations for ergodic actions arising from braided functors ρ : A → M have to satisfy.

Corollary 5.2. Let σ and ε be weak left braided symmetries of A and M respectively and ρ : A → M a braided tensor ∗–functor. For every irreducible object v of A such that $(\iota, \rho_v) \neq \{0\}$, for every object u ∈ A, and for every solution $R_u$ of the conjugate equations for u,

$$R_u^* \otimes 1_v \circ 1_{\bar u} \otimes \sigma(v, u) = R_u^* \otimes 1_v \circ 1_{\bar u} \otimes \sigma_d(v, u). \tag{5.1}$$

Proof. It suffices to apply Lemma 5.1 to σ and $\sigma_d$, recalling that entries of spectral multiplets of an ergodic action corresponding to irreducible representations are linearly independent and that τ is faithful on arrows, being defined on a tensor C∗–category with conjugates.

Remark. When σ and ε are weak right braided symmetries, (5.1) is replaced by

$$R_u^* \otimes 1_v \circ 1_{\bar u} \otimes \sigma_{-1}(v, u) = R_u^* \otimes 1_v \circ 1_{\bar u} \otimes \sigma_*(v, u).$$

Note that (5.1) is automatically satisfied if σ is unitary. However, if it is non-unitary, the corollary provides an obstruction to constructing non-trivial braided extensions (M, ρ) of A. We start with a given braided extension (M, ρ) and draw conclusions about the eigenvalues $\kappa_l(v)$ corresponding to irreducible objects v of A with $(\iota, \rho_v)$ non-trivial.

Corollary 5.3. Let σ be a left (right) braided symmetry of A, ε a weak left (right) braided symmetry of M and ρ a braided tensor ∗–functor. For any irreducible object v of A such that $(\iota, \rho_v) \neq \{0\}$, $\kappa_l(v)$ ($\kappa_r(v)$) is a phase.

Proof. Replacing σ and ε by $\sigma_{-1}$ and $\varepsilon_{-1}$ if necessary, we may assume, by Prop. 4.2, that σ and ε are left braided symmetries. We claim that for any irreducible v with $(\iota, \rho_v) \neq 0$ and any solution $\bar R_v$ of the conjugate equations for v, $\sigma_*(v, \bar v) \circ \bar R_v = \sigma_{-1}(v, \bar v) \circ \bar R_v$. This would show that, choosing standard solutions, $\kappa_r^{\sigma_*}(v) = \kappa_r^{\sigma_{-1}}(v)$, and the first two relations in Prop. 4.2 imply $|\kappa_l(v)| = 1$. Now applying Corollary 5.2 to ε and σ, which are left and hence weak left braided symmetries, it follows from (5.1) with v and $\bar v$ in place of u and v respectively, that

$$R_v^* \otimes 1_{\bar v} = R_v^* \otimes 1_{\bar v} \circ 1_{\bar v} \otimes \bigl(\sigma_d(\bar v, v) \circ \sigma(\bar v, v)^{-1}\bigr).$$

Composing on the right by $1_{\bar v} \otimes \bar R_v$ gives

$$1_{\bar v} = R_v^* \otimes 1_{\bar v} \circ 1_{\bar v} \otimes \bigl(\sigma_d(\bar v, v) \circ \sigma(\bar v, v)^{-1} \circ \bar R_v\bigr).$$
On the other hand $\sigma_d(\bar v, v) \circ \sigma(\bar v, v)^{-1} \circ \bar R_v$ must be a scalar multiple of $\bar R_v$ as v is irreducible. Our equation then shows that the scalar equals 1, so $\sigma(\bar v, v)^* \circ \bar R_v = \sigma(\bar v, v)^{-1} \circ \bar R_v$.

Theorem 5.4. Let (A, σ) be a tensor C∗–category with conjugates and a left (right) braided symmetry. Assume that A admits a tensor ∗–embedding τ into the category of Hilbert spaces. If κ(v) is not a phase whenever v is an irreducible of A not equivalent to ι, then every braided tensor ∗–functor ρ : (A, σ) → (M, ε) into a weak left (right) braided tensor C∗–category (M, ε) is full.

Proof. The arrow space $(\iota, \rho_v)$ of M can be identified with the spectral space of $\hat v$ for the action of Gτ on C. By the main result of [33], Theorem 7.7, there is then a full and faithful ∗–functor λ from M to the category of bimodule Gτ–representations, taking u to the Gτ–bimodule $\tau_u \otimes C$ and ρ(T) to τ(T) ⊗ I. By the previous corollary, $(\iota, \rho_v) = \{0\}$ for every irreducible v ≠ ι, hence C = ℂ. Now regarding λ as taking values in the category of Gτ–representations, λρ = τ and is full, thus ρ is full.
6. Proof of Theorem 3.1

By Theorem 5.4 it suffices to show that κ(v) is not a phase whenever v is an irreducible unitary representation of Gμ not equivalent to ι. We want to compute the invariant κ for the braided tensor C∗–category (Rep SμU(d), $\sigma_\omega$) and begin with its value on the fundamental representation u, writing σ for $\sigma_\omega$ for brevity. Recall that $\bar u$ may be realized as the subobject of $u^{\otimes d-1}$ defined by $\eta(E_{d-1})$. We have the relations $g_i S = -qS$, with $q = \mu^2$, for i = 1, . . . , d − 1 (see e.g. Lemma 5.6 in [31]), hence $\sigma_i S = -\omega\mu S$. By Theorem 5.5 in [31], a standard solution of the conjugate equations for the fundamental representation u is given by $R = \lambda S$, $\bar R = (-1)^{d-1} R$, with λ a suitable positive scalar. Hence

$$\begin{aligned}
\sigma(u, \bar u) \circ \bar R
&= (-1)^{d-1} \lambda\, \sigma(u, \bar u) \circ 1_u \otimes E_{d-1} \circ S \\
&= (-1)^{d-1} \lambda\, E_{d-1} \otimes 1_u \circ \sigma(u, u^{\otimes d-1}) \circ S \\
&= (-1)^{d-1} (-\omega\mu)^{d-1} \lambda\, E_{d-1} \otimes 1_u \circ S = (\omega\mu)^{d-1} R,
\end{aligned}$$

so that $\kappa(u) = (\omega\mu)^{d-1}$.

Remark. A similar computation shows that $\kappa(\bar u) = (\omega\mu)^{d-1}$ as well. Note that these are not phases, unless μ = ±1. As we shall see more precisely later, these eigenvalues can be well understood in terms of the representation theory of the quantum group.

From the above computation of κ(u), and the remark following Corollary 4.5, we arrive at the following result expressed in terms of the original representation η of the Hecke algebra $H_n(\mu^2)$.
Proposition 6.1.

$$\kappa(u^{\otimes n}) = \Bigl(\frac{\omega}{\mu}\Bigr)^{-n(n-1)} (\omega\mu)^{n(d-1)}\, \eta_\mu(G_{n-1}^{-1} \cdots G_1^{-1}),$$

where $G_k = g_k g_{k-1} \cdots g_1^2 \cdots g_{k-1} g_k$.

We shall use this formula to reduce the problem to the case μ > 0 and a specified root.

Theorem 6.2. Suppose that, for some μ > 0 and a specified d-th root ω of μ, κ(v) is never a phase when v is an irreducible object of Rep(SμU(d)) not equivalent to ι. Then the same property holds if μ and ω are replaced independently by −μ and any other root ω′.

Proof. If v is an irreducible summand of some tensor power $u^{\otimes n}$, κ(v) appears as the eigenvalue of $\kappa(u^{\otimes n})$ corresponding to the central support of v in $(u^{\otimes n}, u^{\otimes n}) = \eta_\mu(H_n(q))$. On the other hand the representations $\eta_\mu$ and $\eta_{-\mu}$ have the same kernel $I_n$ in each $H_n(q)$ (cf. [23] and also [31]), hence this explicit dependence on the Hecke algebra $H_n(\mu^2)$ shows that the relevant central support does not change if one replaces μ by −μ. This central support agrees with that of an irreducible summand v′ in $u'^{\otimes n}$, where u′ is the fundamental representation of S−μU(d). The spectrum of $\eta_\mu(G_{n-1}^{-1} \cdots G_1^{-1})$ is just the spectrum of the image of $G_{n-1}^{-1} \cdots G_1^{-1}$ in $H_n(q)/I_n$ under the quotient map, and hence does not change if μ is replaced by −μ. It follows that κ(v) for μ and $\sigma_\omega$ can differ from κ(v′) for μ or −μ and $\sigma_{\omega'}$ only by a phase.
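Returning to Proposition 6.1, the scalar prefactor can be traced back to Corollary 4.5 in a few lines. The derivation below is a sketch under the assumptions that $\sigma = \sigma_\omega(u, u) = (\omega/\mu)\,\eta_\mu(g_1)$ and $\rho^{j}(\sigma) = (\omega/\mu)\,\eta_\mu(g_{j+1})$; the name $\Sigma_k$ for the product of 2k braidings appearing in Corollary 4.5 is a label chosen here for illustration.

```latex
% Each \Sigma_k is a product of 2k braidings, so it carries the factor (\omega/\mu)^{2k}:
\Sigma_k := \rho^{k-1}(\sigma) \cdots \rho(\sigma)\, \sigma^2\, \rho(\sigma) \cdots \rho^{k-1}(\sigma)
          = \Bigl(\frac{\omega}{\mu}\Bigr)^{2k} \eta_\mu(G_k),
\qquad G_k = g_k \cdots g_2\, g_1^2\, g_2 \cdots g_k .

% Multiply the inverses for k = 1, ..., n-1, using \sum_{k=1}^{n-1} 2k = n(n-1):
\Sigma_{n-1}^{-1} \cdots \Sigma_1^{-1}
  = \Bigl(\frac{\omega}{\mu}\Bigr)^{-n(n-1)} \eta_\mu\bigl(G_{n-1}^{-1} \cdots G_1^{-1}\bigr).

% Insert \kappa(u) = (\omega\mu)^{d-1}, so that \kappa(u)^{\otimes n} = (\omega\mu)^{n(d-1)}:
\kappa(u^{\otimes n})
  = \Bigl(\frac{\omega}{\mu}\Bigr)^{-n(n-1)} (\omega\mu)^{n(d-1)}\,
    \eta_\mu\bigl(G_{n-1}^{-1} \cdots G_1^{-1}\bigr).
```

Since any two d-th roots ω, ω′ of μ have the same modulus $|\mu|^{1/d}$, replacing ω by ω′ changes the prefactor only by a phase, which is the observation used in the proof of Theorem 6.2.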
Remark. The kernel of $\eta_\mu$ is known and may be described in terms of Young diagrams. Hence the spectral analysis of the element $G_{n-1}^{-1} \cdots G_1^{-1}$ of the Hecke algebra $H_n(q)$ determines the invariant $\kappa$ on all irreducibles contained in $u^{\otimes n}$, cf. [41]. However, we shall not pursue this approach, but rather look for a more general argument giving results for deformations of classical compact Lie groups of Lie types other than A. To this end, we use the dual picture of Drinfeld and Jimbo. See Sect. 3 for notation. Let $\rho \in \mathfrak{h}^*$ be the element defined by $(\alpha_i, 2\rho) = (\alpha_i, \alpha_i)$. Let $\lambda$ be a dominant integral weight and $v_\lambda$ the irreducible representation of $U_\mu(\mathfrak{g})$ with highest weight $\lambda$. The twist $\theta_{v_\lambda}$ is known to act on the space of $v_\lambda$ as scalar multiplication by $\mu^{-(\lambda, \lambda + 2\rho)}$, see Lemma 7.3.2 in [24]. Taking the comparison between the algebraic and analytic approach to the twist into account, see Prop. 5.1, we are reduced to showing that $(\lambda, \lambda + 2\rho)$ is strictly positive for $\lambda \neq 0$. Now by [16] Sect. 13.3, if $(\lambda_j)$ are the fundamental dominant weights (the basis dual to $\bigl(\frac{2\alpha_j}{(\alpha_j, \alpha_j)}\bigr)$), $\rho = \sum_{j=1}^{r} \lambda_j$. Writing $\lambda$ in the form $\lambda = \sum_i m_i \lambda_i$, with $m_i$ non-negative integers,
\[
(\lambda, \lambda + 2\rho) = \sum_{i,j} m_i (m_j + 2)(\lambda_i, \lambda_j).
\]
The explicit relationship between simple roots and fundamental weights, $\lambda_i = \sum_r d_{ir} \alpha_r$, with $(d_{ir}) = (A^t)^{-1}$, shows that
\[
(\lambda_i, \lambda_j) = \sum_{r,s} d_{ir} d_{js} (\alpha_r, \alpha_s) = \sum_{r,s} d_{ir} d_{js} d_r a_{rs} = d_j d_{ij} > 0,
\]
(cf. Table 1, p. 69, [16]), completing the proof of Theorem 3.1.
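The positivity statement above can be checked numerically in type $A_{d-1}$. The following is only a sketch under stated assumptions: the roots are normalized so that $(\alpha_r, \alpha_r) = 2$ (hence $d_r = 1$ and the Gram matrix of the fundamental weights is the inverse Cartan matrix), and the function names `cartan_type_a`, `inverse` and `casimir_values` are hypothetical helpers introduced for the illustration.

```python
from fractions import Fraction

def cartan_type_a(rank):
    """Cartan matrix of type A_rank: 2 on the diagonal, -1 next to it."""
    return [[2 if i == j else -1 if abs(i - j) == 1 else 0
             for j in range(rank)] for i in range(rank)]

def inverse(matrix):
    """Exact inverse by Gauss-Jordan elimination over the rationals."""
    n = len(matrix)
    rows = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
            for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        scale = rows[col][col]
        rows[col] = [x / scale for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [row[n:] for row in rows]

def casimir_values(d):
    """(lambda_i, lambda_i + 2 rho) for the fundamental weights of A_{d-1}."""
    # Assumption: (alpha, alpha) = 2, so (lambda_i, lambda_j) = (A^{-1})_{ij}.
    gram = inverse(cartan_type_a(d - 1))
    assert all(x > 0 for row in gram for x in row)   # (lambda_i, lambda_j) > 0
    # rho is the sum of the fundamental weights.
    return [gram[i][i] + 2 * sum(gram[i]) for i in range(d - 1)]
```

Under this normalization, `casimir_values(d)` can be compared with the closed form $\frac{1}{d}\, i(d-i)(d+1)$ for small values of $d$.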
C. Pinzari, J. E. Roberts
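In particular, in the type $A_{d-1}$ case that table in Humphreys' book gives
\[
\begin{aligned}
(\lambda_i, \lambda_i + 2\rho)
 &= \frac{2}{d}(d-i)\bigl(1 + \cdots + (i-1)\bigr) + \frac{3}{d}\, i(d-i) + \frac{2}{d}\, i\bigl(1 + \cdots + (d-i-1)\bigr) \\
 &= \frac{1}{d}(d-i)(i-1)i + \frac{3}{d}\, i(d-i) + \frac{1}{d}\, i(d-i-1)(d-i) \\
 &= \frac{1}{d}\, i(d-i)(d+1),
\end{aligned}
\]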
for the fundamental weights. For $i = 1$ and $i = d-1$, this reduces to our previous computation of $\kappa(u)$ and $\kappa(\bar{u})$.

Remark. The computation of $\kappa(u)$ and $\kappa(\bar{u})$ at the beginning of the section depended on relations (3.1)–(3.3). The above proof gives an independent derivation.

7. On the Characterization of Rep($S_\mu U(d)$)

An intrinsic characterization of the quantum deformation of the special linear group was first given in [23], based on the analysis of the fusion rules. An independent approach was proposed in [31] for $S_\mu U(d)$, where the starting point was the braided symmetry. The aim of this section is to make the relation between the two approaches explicit, when not at roots of unity. To any semisimple rigid tensor category $C$ with Grothendieck semiring isomorphic to that of $sl(d)$ (briefly, an $sl(d)$–category), Kazhdan and Wenzl associate an invariant $\tau_C$, the twist of the category. They start with a suitable idempotent $a \in (X^2, X^2)$, $X$ the fundamental object of $C$, and find a nonzero complex number $q$ such that $qa - (I - a)$ satisfies the Hecke algebra relations for the parameter $q$. This defines representations $\eta_{KW} : H_n(q) \to (X^n, X^n)$. If $\nu \in (\iota, X^d)$ and $p \in (X^d, \iota)$ are chosen so that $p \circ \nu = 1$, then the twist is defined by
\[
\tau_C := p \otimes 1 \circ \eta_{KW}(g_d \cdots g_1) \circ 1 \otimes \nu.
\]
It is asserted in Prop. 5.1, [23] that $\tau_C$ is a $d$th root of unity. Clearly, Rep($S_\mu U(d)$) is an $sl(d)$–category. However, comparing with (3.3) leads to $\tau = \mu^{d-1}$. The origin of the inconsistency is the claim on p. 135 of [23] that $\nu \otimes \nu$ is an eigenvector of $\eta_{KW}(X^d, X^d)$ with eigenvalue 1, whilst, for $S_\mu U(d)$, iterating relations (3.3) $d$ times gives $\eta(u^d, u^d)\, \nu \otimes \nu = \mu^{d(d-1)}\, \nu \otimes \nu$. This value is related to Drinfeld's theory of universal R–matrices. In fact, it is well known that universal R–matrices give rise to ribbon tensor categories.
As a consequence, in the type A case, the generator of the Hecke algebra needs to be multiplied by a suitable scalar to ensure the naturality of the braiding. This scalar is well known, see Lemma 3.2.1 in [41], see also [5,43]. (We would warn the reader that this scalar is often computed incorrectly in the literature.) The well-known characterization of the quasiequivalence class of a non-faithful Hecke algebra representation in a rigid tensor category $C$ [23,31,42] leads to $\tau_C = w\mu^{d-1}$, where $\mu$ is a complex square root of $q$ and $w$ is a $d$th root of unity. This expression easily leads to a presentation of $sl(d)$–categories analogous to [31]. To make the comparison more immediate, we shall assume that $C$ has a $*$–involution making it into a tensor $*$–category. Analogous relations may be derived in the general case. We omit the proof.
Rigidity Result for Extensions of Braided Tensor C ∗ –Categories
Proposition 7.2. If $C$ is a tensor $*$–category of $sl(d)$ type, $q$ is derived from $C$ as in [23], and $\nu \in (\iota, X^d)$ satisfies $\nu^* \circ \nu = 1$, then $\nu \circ \nu^* = \eta(E_d)$ and
\[
\nu^* \otimes 1_X \circ 1_X \otimes \nu = w\, \frac{(-\mu)^{d-1}}{[d]_q}, \tag{7.2}
\]
\[
\eta(g_1 \cdots g_d)\, \nu \otimes 1_X = w \mu^{d-1}\, 1_X \otimes \nu, \tag{7.3}
\]
where $\mu^2 = q$, $\tau_C = w\mu^{d-1}$, $[d]_q = 1 + q + \cdots + q^{d-1}$, and $\eta = \eta_{KW}$.

Remark. If $C$ is a tensor $C^*$–category, there are Hilbert space representations for $H_\infty(q)$. Hence $q$ is either a root of unity or $q > 0$ by Wenzl's result [40].

We characterize Rep($S_\mu U(d)$) among $sl(d)$–categories in the spirit of [23].

Corollary 7.3. Let $C$ be a tensor $C^*$–category of $sl(d)$–type with associated parameter $q$. Then,
a) if $\tau_C > 0$, then $C$ is tensor $*$–equivalent to Rep($S_{\sqrt{q}} U(d)$),
b) if $\tau_C < 0$ and $d$ is even, then $C$ is tensor $*$–equivalent to Rep($S_{-\sqrt{q}} U(d)$).

Remark. Note that for $\mu > 0$ and $d$ even, Rep($S_{-\mu} U(d)$) is a twist of Rep($S_\mu U(d)$) by $w = -1$.

Acknowledgements. C.P. would like to thank C. De Concini, Y. Kawahigashi, R. Longo and M. Müger for discussions. We would like to thank the referee for pointing out Lemma 3.2.1 in [41] and [5] and for giving suggestions on how to shorten the presentation of the paper.
References
1. Andruskiewitsch, N.: Some exceptional compact matrix pseudogroups. Bull. Soc. Math. France 120, 297–325 (1992)
2. Asaeda, M., Haagerup, U.: Exotic subfactors of finite depth with Jones indices (5 + √13)/2 and (5 + √17)/2. Commun. Math. Phys. 202, 1–63 (1999)
3. Bakalov, B., Kirillov, A. Jr.: Lectures on tensor categories and modular functors. University Lecture Series, Vol. 21, Providence, RI: Amer. Math. Soc., 2001
4. Banica, T.: Le groupe quantique compact libre U(n). Commun. Math. Phys. 190, 143–172 (1997)
5. Blanchet, C.: Hecke algebras, modular categories and 3-manifolds quantum invariants. Topology 39, 193–223 (2000)
6. Carpi, S., Kawahigashi, Y., Longo, R.: On the Jones index values for conformal subnets. Lett. Math. Phys. 92(2), 99–108 (2010)
7. Chari, V., Pressley, A.: A guide to quantum groups. Cambridge: Cambridge University Press, 1994
8. Deligne, P.: Catégories tannakiennes. In: The Grothendieck Festschrift. Vol. II, Progr. Math. 87, Boston, MA: Birkhäuser Boston, 1990, pp. 111–195
9. Doplicher, S., Haag, R., Roberts, J.E.: Local observables and particle statistics I. Commun. Math. Phys. 23, 199–230 (1971); II, Commun. Math. Phys. 35, 49–85 (1974)
10. Doplicher, S., Roberts, J.E.: A new duality theory for compact groups. Invent. Math. 98, 157–218 (1989)
11. Etingof, P., Nikshych, D., Ostrik, V.: On fusion categories. Ann. of Math. 162, 581–642 (2005)
12. Fredenhagen, K., Rehren, K.H., Schroer, B.: Superselection sectors with braid group statistics and exchange algebras 1. General theory. Commun. Math. Phys. 125, 201–226 (1989)
13. Goodman, F.M., de la Harpe, P., Jones, V.F.R.: Coxeter graphs and towers of algebras. MSRI Publications 14, Berlin-New York: Springer-Verlag, 1989
14. Goodman, F.M., Wenzl, H.: Ideals in the Temperley-Lieb category. Appendix to M. Freedman: a magnetic model with a possible Chern-Simons phase. Commun. Math. Phys. 234, 129–183 (2003)
15. Haag, R.: Local quantum physics. Berlin-Heidelberg-New York: Springer-Verlag, 1992
16. Humphreys, J.E.: Introduction to Lie algebras and representation theory. Berlin-Heidelberg-New York: Springer-Verlag, 1972
17. Haagerup, U.: Principal graphs of subfactors in the index range 4 < [M : N] < 3 + √2. In: Subfactors (Kyuzeso, 1993), Singapore: World Sci., 1994, pp. 1–38
18. Jimbo, M.: A q–analogue of U(gl(N + 1)). Hecke algebras and the Yang-Baxter equation. Lett. Math. Phys. 10, 63–69 (1985)
19. Jones, V.F.R.: Index for subfactors. Invent. Math. 72, 1–25 (1983)
20. Kassel, C.: Quantum groups. Berlin-Heidelberg-New York: Springer-Verlag, 1995
21. Kassel, C., Rosso, M., Turaev, V.: Quantum groups and knot invariants. Paris: Soc. Math. de France, 1997
22. Kawahigashi, Y., Longo, R.: Classification of local conformal nets. Case c < 1. Ann. Math. 160, 493–522 (2004)
23. Kazhdan, D., Wenzl, H.: Reconstructing monoidal categories. I. M. Gelfand seminar. Adv. Soviet. Math., 16, Providence, RI: Amer. Math. Soc., 1993, pp. 111–136
24. Korogodski, L.I., Soibelman, Y.S.: Algebras of functions on quantum groups: part I, Mathematical Surveys and Monographs, 56, Providence, RI: Amer. Math. Soc., 1998
25. Longo, R., Roberts, J.E.: A theory of dimension. K–Theory 11, 103–159 (1997)
26. Müger, M.: On the structure of modular categories. Proc. London Math. Soc. 87, 291–308 (2003)
27. Müger, M.: On superselection theory of quantum fields in low dimensions. http://arXiv.org/abs/0909.2537v1 [math-ph], 2009
28. Müger, M.: Tensor categories: A selective guided tour. http://arXiv.org/abs/0804.3587v3 [math.CT], 2010
29. Ostrik, V.: Fusion categories of rank 2. Math. Res. Lett. 10, 177–183 (2003)
30. Ostrik, V.: Pre-modular categories of rank 3. Mosc. Math. J. 8, 111–118 (2008)
31. Pinzari, C.: The representation category of the Woronowicz quantum group SμU(d) as a braided tensor C∗–category. Int. J. Math. 18, 113–136 (2007)
32. Pinzari, C., Roberts, J.E.: A duality theorem for ergodic actions of compact quantum groups on C∗–algebras. Commun. Math. Phys. 277, 385–421 (2008)
33. Pinzari, C., Roberts, J.E.: A theory of induction and classification of tensor C∗–categories. arXiv:0907.2459, to appear in J. Noncomm. Geom.
34. Popa, S.: Classification of amenable subfactors of type II. Acta Math. 172, 163–255 (1994)
35. Rehren, H.: Subfactors and coset models. In: Generalized symmetries in Physics (Clausthal, 1994), Singapore: World Scientific, 1999, pp. 338–356
36. Reshetikhin, N.Yu., Turaev, V.G.: Ribbon graphs and their invariants derived from quantum groups. Commun. Math. Phys. 127, 1–26 (1990)
37. Rosso, M.: Algèbres enveloppantes quantifiées, groupes quantiques compacts de matrices et calcul différentiel non commutatif. Duke Math. J. 61, 11–40 (1990)
38. Rowell, E., Stong, R., Wang, Z.: On classification of modular tensor categories. Commun. Math. Phys. 292, 343–389 (2009)
39. Tuba, I., Wenzl, H.: On braided tensor categories of type BCD. J. Reine Angew. Math. 581, 31–69 (2005)
40. Wenzl, H.: Hecke algebras of type An and subfactors. Invent. Math. 92, 349–383 (1988)
41. Wenzl, H.: Braids and invariants of 3–manifolds. Invent. Math. 114, 235–275 (1993)
42. Woronowicz, S.L.: Tannaka–Krein duality for compact matrix pseudogroups. Twisted SU(N) groups. Invent. Math. 93, 35–76 (1988)
43. Yamagami, S.: Free products of semisimple tensor categories. http://arXiv.org/abs/math/0106214v1 [math.CT], 2001

Communicated by Y. Kawahigashi
Commun. Math. Phys. 306, 663–694 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1294-x
Communications in
Mathematical Physics
The Quantum Double Model with Boundary: Condensations and Symmetries Salman Beigi1,2 , Peter W. Shor3 , Daniel Whalen3 1 Institute for Quantum Information, California Institute of Technology, MC 305-16, 1200 E. California Blvd.,
Pasadena, CA 91125-1600, USA. E-mail:
[email protected]
2 School of Mathematics, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran 3 Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
Received: 3 August 2010 / Accepted: 11 February 2011 Published online: 28 June 2011 – © Springer-Verlag 2011
Abstract: Associated to every finite group, Kitaev has defined the quantum double model for every orientable surface without boundary. In this paper, we define boundaries for this model and characterize condensations; that is, we find all quasi-particle excitations (anyons) which disappear when they move to the boundary. We then consider two phases of the quantum double model corresponding to two groups with a domain wall between them, and study the tunneling of anyons from one phase to the other. Using this framework we discuss the necessary and sufficient conditions when two different groups give the same anyon types. As an application we show that in the quantum double model for S3 (the permutation group over three letters) there is a chargeon and a fluxion which are not distinguishable. This group is indeed a special case of groups of the form of the semidirect product of the additive and multiplicative groups of a finite field, for all of which we prove a similar symmetry. 1. Introduction The quantum double model of Kitaev [1] has been studied extensively in recent years both from the point of view of quantum error correcting codes as well as non-abelian statistics. Bombin and Martin-Delgado [2] have defined a generalization of the Kitaev Hamiltonian and characterized condensations and confinements in their model. They have also studied their model with domain walls [3]. Moreover, the Levin-Wen model [4] can be considered as a generalization of the quantum double model for unitary tensor categories [5], and provides us with a general approach to translate mathematical objects to physical concepts and vice versa. Kitaev and Kong (personal communication, 2009) have used this framework and defined the Levin-Wen model with boundaries and domain walls between two phases corresponding to two categories. In their model, the boundaries are parametrized by an algebra in the corresponding category. 
So by fixing an algebra, one can study edge excitations, condensations, and tunneling of quasi-particle excitations from one phase to another. However, the connection between this model and
the work of Bombin and Martin-Delgado [2,3] is not clear. In particular, we do not know which algebras give the condensations of [2]. In this paper, we define the quantum double model with boundary, and based on the recent work of Davydov [6] on the classification of algebras in group-theoretical modular categories, try to characterize possible condensations in this model. Before explaining our work and its consequences, let us start with the example of the toric code. Consider a planar square lattice and to each edge associate a Hilbert space with basis elements indexed by Z2. The Hamiltonian is a sum of vertex and face operators, which are defined in terms of the σx and σz Pauli matrices (see [1] for details). The elementary excitations of the system then correspond to certain chains of σx and σz operators (ribbon operators), which create two quasi-particles at the end points of the ribbon. A string of σx operators gives magnetic charges (fluxions), denoted by m, and a chain of σz operators gives electric charges (chargeons), denoted by e. Moreover, movement of quasi-particles is equivalent to extending the corresponding ribbons, and then their braidings can be defined. Simultaneous application of σx and σz chains corresponds to the fusion of m and e: ε = m ⊗ e. These three particles together with the vacuum give the system of anyons corresponding to the group Z2; they are indeed the four irreducible representations of the quantum double of Z2, denoted by D(Z2). Assume that there is a defect line in the lattice which divides the plane into two parts, so that the lattice on the right-hand side is the dual of the lattice on the left. That is, on the left half-plane the vertex operators are defined in terms of σx and on the right half-plane in terms of σz, and similarly for the face operators. (The vertex and face operators should be carefully defined on the defect line; details are given in the work of Kitaev and Kong and in [7].)
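The anyon content of D(Z2) described above can be tabulated in a few lines. This is a minimal sketch, not part of the original construction: anyons are labelled by a pair (flux, charge) in Z2 × Z2, the braiding signs and spins are the standard toric-code values, and the names `ANYONS`, `fuse`, `mutual_phase`, `spin` and `SWAP` (as well as the label `eps` for ε) are hypothetical.

```python
from itertools import product

# Anyons of D(Z2), labelled by (flux, charge) in Z2 x Z2.
ANYONS = {"1": (0, 0), "e": (0, 1), "m": (1, 0), "eps": (1, 1)}
NAMES = {pair: name for name, pair in ANYONS.items()}

def fuse(x, y):
    """Fusion adds fluxes and charges modulo 2."""
    (a, b), (c, d) = ANYONS[x], ANYONS[y]
    return NAMES[((a + c) % 2, (b + d) % 2)]

def mutual_phase(x, y):
    """Sign picked up when x is carried once around y."""
    (a, b), (c, d) = ANYONS[x], ANYONS[y]
    return (-1) ** (a * d + b * c)

def spin(x):
    """Topological spin: +1 for bosons, -1 for the fermion eps."""
    a, b = ANYONS[x]
    return (-1) ** (a * b)

# The transposition (e, m), which fixes the vacuum and eps.
SWAP = {"1": "1", "e": "m", "m": "e", "eps": "eps"}

def swap_is_symmetry():
    """Check that exchanging e and m preserves fusion, braiding and spins."""
    for x, y in product(ANYONS, repeat=2):
        if SWAP[fuse(x, y)] != fuse(SWAP[x], SWAP[y]):
            return False
        if mutual_phase(x, y) != mutual_phase(SWAP[x], SWAP[y]):
            return False
    return all(spin(x) == spin(SWAP[x]) for x in ANYONS)
```

Here `fuse("m", "e")` returns `"eps"`, and `swap_is_symmetry()` checks the electric-magnetic exchange on all pairs of anyons.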
Now consider an m excitation on the left and, by applying a chain of σx operators, move it to the right-hand side. Due to the structure of the lattice, the string of σx operators will change to σz terms on the right, which means that m becomes an e on the right. Thus the operation of moving particles from the left half-plane to the right side exchanges e and m (while keeping the vacuum and ε unchanged). As a result, all braidings and fusions are symmetric with respect to the transposition (e, m). In this paper we generalize the above construction to every group. We consider a planar lattice with a defect line and define Kitaev's Hamiltonian on the left and right half-planes corresponding to two groups G and G′. The excitations on the two bulks again correspond to the representations of the quantum doubles of G and G′, and ribbon operators create and move these quasi-particles. Therefore, if the movement of quasi-particles from the left half-plane to the right makes sense, i.e., a consistent definition of the Hamiltonian near the domain wall is available, we can study the tunneling of anyons from one phase to the other. Suppose G′ is the trivial single-element group. In this case there is no excitation on the right-hand side, and indeed the domain wall turns into a boundary. The toric code with boundary (G = Z2) has been studied by Bravyi and Kitaev [8]. They have considered two types of boundaries, the z-boundary and the x-boundary, and have shown that an m (e) excitation disappears when it moves towards the x-boundary (z-boundary). In other words, anyons m and e get condensed near the corresponding boundaries. Kitaev and Kong (personal communication, 2009) have generalized this idea and defined the Levin-Wen model with boundaries. In their model, the boundary is parametrized in terms of an algebra in the corresponding category. Here we define a boundary for the quantum double model in terms of a subgroup K ⊆ G and a 2-cocycle of K.
Then we characterize the anyons that become condensed at the boundary. Finally, by applying the folding idea, we turn a domain wall between two
phases G and G′ into a boundary given by some U ⊆ G × G′ and a 2-cocycle, so the tunneling of anyons from one phase to another can be studied in terms of condensations. Using this machinery, we study groups G for which a symmetry similar to that of Z2 exists. That is, a transposition of a chargeon-fluxion pair, together with replacing each anyon with its charge conjugation, gives a symmetry of anyons corresponding to G. We show that all groups of the form $F_q^+ \rtimes F_q^\times$, where $F_q$ is the finite field with q elements, have this property. Note that q = 2 gives Z2 and q = 3 corresponds to S3, the permutation group over three letters.

The rest of this paper is organized as follows. In the following two sections, we review the basic ingredients of the quantum double of finite groups and the quantum double model. In Sect. 4 we define a boundary for the Kitaev model that depends only on a subgroup (the corresponding 2-cocycle is trivial) and compute the condensations. This construction is generalized in Sect. 5. In Sect. 6 we consider a domain wall between two phases, and by applying the folding idea, turn it into a boundary. We then use all the previous results to find a non-trivial symmetry in the system of anyons of $F_q^+ \rtimes F_q^\times$. In Appendix B we try to classify all groups for which a symmetry similar to that of $F_q^+ \rtimes F_q^\times$ exists. Final remarks and some open problems are discussed in Sect. 7.

2. Drinfeld Double of a Finite Group

Let us first fix some notation. $\mathbb{C}$ denotes the set of complex numbers and $\mathbb{C}^\times$ is its multiplicative group. $x^*$ is the complex conjugate of $x \in \mathbb{C}$ and $|x|^2 = x x^*$. The identity element of a general group G is denoted by e. For a subgroup K of G and $g \in K$, $Z_K(g)$ denotes the centralizer of g in K: $Z_K(g) = \{h \in K : hg = gh\}$. We write $g \overset{K}{\sim} g'$ if there exists $h \in K$ such that $h g h^{-1} = g'$. When K = G and there is no confusion, we drop K in these notations ($Z_G(g) = Z(g)$, and $g \sim g'$ means $g \overset{G}{\sim} g'$). The conjugacy class of $g \in G$ is denoted by $\bar{g}$ ($\bar{g} = \{h g h^{-1} : h \in G\}$). In this paper all representations are over the complex numbers, and for a representation ρ of a group, $\rho^*$ denotes its complex conjugate representation. $\mathrm{tr}_\rho(\cdot)$ is the character of ρ, and 1 denotes the trivial representation ($\mathrm{tr}_1(\cdot) = 1$). δ denotes the Kronecker delta function, and for any relation p, $\delta_p = 1$ if p holds and otherwise $\delta_p = 0$. The size of a set X is denoted by |X|. Finally, equivalence of categories, and isomorphism of groups and representations, are denoted by $\simeq$. Although some of the results of this paper are stated in terms of category theory notions, a basic knowledge of the theory of anyons is enough to follow the proofs. For technical details we refer to [9] and Appendix E of [10].
2.1. D(G). Let G be a finite group. The quantum double or Drinfeld double of G, denoted by D(G), is a Hopf algebra containing $\mathbb{C}G$. D(G) can be described by the $\mathbb{C}$-basis $\{g h^* : g, h \in G\}$ with the multiplication
\[
(g_1 h_1^*)(g_2 h_2^*) = \delta_{h_2,\, g_2^{-1} h_1 g_2}\, (g_1 g_2) h_2^*,
\]
and the comultiplication
\[
\Delta(g h^*) = \sum_{h_1 h_2 = h} g h_1^* \otimes g h_2^*. \tag{1}
\]
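The multiplication rule above is easy to experiment with on a small group. The sketch below is only illustrative: it takes G = S3 realized as permutations of {0, 1, 2}, encodes a basis element $g h^*$ as the pair `(g, h)`, and uses `None` for the zero element; all function names are hypothetical.

```python
from itertools import permutations, product

G = list(permutations(range(3)))          # the six elements of S3
E = (0, 1, 2)                             # identity permutation

def mul(p, q):
    """Composition of permutations: (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    """Inverse permutation."""
    r = [0] * 3
    for i, j in enumerate(p):
        r[j] = i
    return tuple(r)

def dmul(x, y):
    """Product of basis elements x = g1 h1^*, y = g2 h2^*; None encodes 0."""
    (g1, h1), (g2, h2) = x, y
    if h2 != mul(inv(g2), mul(h1, g2)):   # delta_{h2, g2^{-1} h1 g2}
        return None
    return (mul(g1, g2), h2)

BASIS = list(product(G, G))

def associative():
    """Check (xy)z = x(yz) on all triples of basis elements."""
    for x, y, z in product(BASIS, repeat=3):
        xy, yz = dmul(x, y), dmul(y, z)
        left = dmul(xy, z) if xy is not None else None
        right = dmul(x, yz) if yz is not None else None
        if left != right:
            return False
    return True

def unit_acts_trivially():
    """The sum over h of e h^* is a two-sided unit: one term survives."""
    for x in BASIS:
        left = [t for t in (dmul((E, h), x) for h in G) if t is not None]
        right = [t for t in (dmul(x, (E, h)) for h in G) if t is not None]
        if left != [x] or right != [x]:
            return False
    return True
```

Both checks run over all basis elements of D(S3), which has dimension $|G|^2 = 36$.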
By $g \in D(G)$ we mean $g = \sum_h g h^*$, and $h^* = e h^*$ (e is the identity of the group). The unit of D(G) is equal to $e = \sum_h e h^*$, the counit is given by $\varepsilon(g h^*) = \delta_{h,e}$, and the antipode is $\gamma(g h^*) = g^{-1} (g h^{-1} g^{-1})^*$.

2.2. Representations of D(G). Consider an element $a \in G$ and let π be a representation of Z(a) over the vector space W with the basis $\{w_1, \ldots, w_d\}$. Define the vector space $V_{(a,\pi)}$ with the basis $\{|b, w_i\rangle : b \in \bar{a},\ 1 \le i \le d\}$. $V_{(a,\pi)}$ is a representation of D(G) as follows. For any $b \in \bar{a}$ fix $k_b \in G$ such that $b = k_b a k_b^{-1}$. (Let $k_a = e$.) Observe that $k_{g b g^{-1}}^{-1}\, g\, k_b$ is always in Z(a), and then for any $w \in W$, $b \in \bar{a}$, and $g h^* \in D(G)$ define
\[
g h^* |b, w\rangle = \delta_{h,b}\, \bigl| g b g^{-1},\ \pi(k_{g b g^{-1}}^{-1}\, g\, k_b)\, w \bigr\rangle.
\]
It is easy to show that this action gives a representation of D(G). $\chi_{(a,\pi)}$, the character of this representation, is given by
\[
\chi_{(a,\pi)}(g h^*) = \delta_{h \in \bar{a}}\, \delta_{gh,hg}\, \mathrm{tr}_\pi(k_h^{-1} g k_h). \tag{2}
\]
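The action just defined can be checked by brute force in a small case. The following sketch makes several illustrative choices that are not in the text: G = S3 as permutations of {0, 1, 2}, a = τ a 3-cycle, and π the one-dimensional representation of Z(τ) sending $\tau^n$ to $\omega^n$, stored as the exponent n modulo 3. All names are hypothetical.

```python
from itertools import permutations, product

G = list(permutations(range(3)))          # the six elements of S3
E, TAU = (0, 1, 2), (1, 2, 0)             # identity and a 3-cycle

def mul(p, q):
    """Composition of permutations: (p q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(3))

def inv(p):
    return tuple(sorted(range(3), key=lambda i: p[i]))

def conj(k, x):
    """k x k^{-1}."""
    return mul(k, mul(x, inv(k)))

CLASS = sorted({conj(h, TAU) for h in G})                  # conjugacy class of tau
EXPONENT = {E: 0, TAU: 1, mul(TAU, TAU): 2}                # pi(tau^n) = omega^n
# Representatives k_b with b = k_b tau k_b^{-1}; G starts with e, so k_tau = e.
K = {b: next(k for k in G if conj(k, TAU) == b) for b in CLASS}
BASIS = list(product(G, G))                                # (g, h) stands for g h^*

def dmul(x, y):
    """Product in D(G) of basis elements; None encodes 0."""
    (g1, h1), (g2, h2) = x, y
    return (mul(g1, g2), h2) if h2 == conj(inv(g2), h1) else None

def act(x, b):
    """Apply g h^* to |b>: returns (b', n) for omega^n |b'>, or None for 0."""
    g, h = x
    if h != b:
        return None
    b2 = conj(g, b)
    z = mul(inv(K[b2]), mul(g, K[b]))      # lies in Z(tau)
    return (b2, EXPONENT[z])

def is_representation():
    """Check x(y|b>) = (xy)|b> for all basis elements x, y and all b."""
    for x, y, b in product(BASIS, BASIS, CLASS):
        lhs, first = None, act(y, b)
        if first is not None:
            second = act(x, first[0])
            if second is not None:
                lhs = (second[0], (first[1] + second[1]) % 3)
        xy = dmul(x, y)
        rhs = act(xy, b) if xy is not None else None
        if lhs != rhs:
            return False
    return True

def character_from_action(x):
    """Exponent n if the trace of x is omega^n, or None if the trace is 0."""
    diag = [r[1] for b in CLASS for r in [act(x, b)] if r is not None and r[0] == b]
    return diag[0] if diag else None       # at most one diagonal entry

def character_formula(x):
    """The same quantity computed from formula (2)."""
    g, h = x
    if h in CLASS and mul(g, h) == mul(h, g):
        return EXPONENT[mul(inv(K[h]), mul(g, K[h]))]
    return None
```

Since π is one-dimensional here, the trace of $g h^*$ is either zero or a single power of ω, which is why both character functions return an exponent or `None`.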
If π is an irreducible representation (irrep) of Z(a), then the representation $V_{(a,\pi)}$ of D(G) is irreducible as well. Conversely, all irreps of D(G) are of the above form and are indexed by conjugacy classes of G and irreps of the centralizer of a fixed element in the corresponding conjugacy class (see for example [9]). The trivial representation of D(G) is indexed by $0 = (\bar{e}, 1)$. Moreover, the (charge) conjugation of $(\bar{a}, \pi)$, which we denote by $(\bar{a}, \pi)^\vee$, is isomorphic to $(\overline{a^{-1}}, \pi^*)$. The conjugacy class $\bar{a}$ of an irrep $(\bar{a}, \pi)$ is called its magnetic charge and π is its electric charge. $(\bar{a}, \pi)$ is called a chargeon if $a = e$ and a fluxion if $\pi = 1$. Irreducible representations of D(G) are orthogonal to each other with respect to the following inner product:
\[
\langle \chi_1, \chi_2 \rangle = \frac{1}{|G|} \sum_{g,h} \chi_1(g h^*)^*\, \chi_2(g h^*). \tag{3}
\]
Then the multiplicity of the irrep $(\bar{a}, \pi)$ in the character χ is equal to $\langle \chi_{(a,\pi)}, \chi \rangle$.

2.3. Fusion rules. Let $(\bar{a}, \pi)$ and $(\bar{a}', \pi')$ be two irreps of D(G). Then using the comultiplication (1), $(\bar{a}, \pi) \otimes (\bar{a}', \pi')$ is also a representation¹ of D(G) and is isomorphic to a direct sum of irreducible ones:
\[
(\bar{a}, \pi) \otimes (\bar{a}', \pi') \simeq \bigoplus_{(\bar{h}, \rho)} N^{(h,\rho)}_{(a,\pi)(a',\pi')}\, (\bar{h}, \rho),
\]
where $N^{(h,\rho)}_{(a,\pi)(a',\pi')}$ is a non-negative integer. To compute these numbers we may use the Verlinde formula.

¹ The action of $g h^*$ on $v \otimes w$ is given by $\Delta(g h^*)\, v \otimes w$.
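Define the matrix S whose rows and columns are indexed by irreps of D(G) and
\[
S_{(a,\pi)(a',\pi')} = \frac{1}{|Z(a)| \cdot |Z(a')|} \sum_{h:\ h a' h^{-1} \in Z(a)} \mathrm{tr}_\pi(h a'^{-1} h^{-1})\, \mathrm{tr}_{\pi'}(h^{-1} a^{-1} h). \tag{4}
\]
Then $N_{XY}^{Z}$ can be computed in terms of S:
\[
N_{XY}^{Z} = \sum_{U} \frac{S_{XU}\, S_{YU}\, S^*_{ZU}}{S_{0U}}, \tag{5}
\]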
where the summation runs over all irreps U, and $0 = (\bar{e}, 1)$ is the trivial representation.

2.4. Z(G). Z(G) denotes the category of finite dimensional representations of D(G) over the complex numbers. Every object of Z(G) is isomorphic to a direct sum of simple objects, i.e., irreducible representations. Z(G) is a fusion category, where the fusion rules are given by the Verlinde formula. Moreover, the R-matrix
\[
R = \sum_{g \in G} g^* \otimes g
\]
defines the braiding $C_{X,Y} = P R : X \otimes Y \to Y \otimes X$ of two representations X and Y, where P is the transposition of X and Y. ($C_{X,Y}\, v \otimes w = \sum_g g w \otimes g^* v$.) Z(G) is a modular tensor category (see [9] for details).

2.5. Example: Z(S3). Let G = S3 be the permutation group over three letters: $S_3 = \langle \sigma, \tau : \sigma^2 = \tau^3 = e,\ \sigma\tau = \tau^{-1}\sigma \rangle$. D(S3) has eight irreducible representations, described in the following table.
\[
\begin{array}{l|cccccccc}
 & A & B & C & D & E & F & G & H \\ \hline
\text{conjugacy class} & \bar{e} & \bar{e} & \bar{e} & \bar{\sigma} & \bar{\sigma} & \bar{\tau} & \bar{\tau} & \bar{\tau} \\
\text{irrep of the centralizer} & 1 & \mathrm{sign} & \pi & 1 & [-1] & 1 & [\omega] & [\omega^*]
\end{array}
\]
Here sign denotes the sign representation, π is the two-dimensional representation of S3, and $[-1]$, and $[\omega]$, $[\omega^*]$ denote the non-trivial representations of $Z(\sigma) = \{e, \sigma\}$ and $Z(\tau) = \{e, \tau, \tau^{-1}\}$. The corresponding S-matrix is
\[
S = \frac{1}{6}
\begin{pmatrix}
1 & 1 & 2 & 3 & 3 & 2 & 2 & 2 \\
1 & 1 & 2 & -3 & -3 & 2 & 2 & 2 \\
2 & 2 & 4 & 0 & 0 & -2 & -2 & -2 \\
3 & -3 & 0 & 3 & -3 & 0 & 0 & 0 \\
3 & -3 & 0 & -3 & 3 & 0 & 0 & 0 \\
2 & 2 & -2 & 0 & 0 & 4 & -2 & -2 \\
2 & 2 & -2 & 0 & 0 & -2 & 4 & -2 \\
2 & 2 & -2 & 0 & 0 & -2 & -2 & 4
\end{pmatrix}, \tag{6}
\]
and then using the Verlinde formula (5) the fusion rules can be computed.
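\[
\begin{array}{c|cccccccc}
\otimes & A & B & C & D & E & F & G & H \\ \hline
A & A & B & C & D & E & F & G & H \\
B & B & A & C & E & D & F & G & H \\
C & C & C & A \oplus B \oplus C & D \oplus E & D \oplus E & G \oplus H & F \oplus H & F \oplus G \\
D & D & E & D \oplus E & A \oplus C \oplus F \oplus G \oplus H & B \oplus C \oplus F \oplus G \oplus H & D \oplus E & D \oplus E & D \oplus E \\
E & E & D & D \oplus E & B \oplus C \oplus F \oplus G \oplus H & A \oplus C \oplus F \oplus G \oplus H & D \oplus E & D \oplus E & D \oplus E \\
F & F & F & G \oplus H & D \oplus E & D \oplus E & A \oplus B \oplus F & H \oplus C & G \oplus C \\
G & G & G & F \oplus H & D \oplus E & D \oplus E & H \oplus C & A \oplus B \oplus G & F \oplus C \\
H & H & H & F \oplus G & D \oplus E & D \oplus E & G \oplus C & F \oplus C & A \oplus B \oplus H
\end{array}
\]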
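The table above can be reproduced from the matrix (6) with the Verlinde formula (5). The sketch below copies the integer matrix 6S from (6) and evaluates (5) with exact rational arithmetic; since S is real, $S^* = S$. The names `M`, `fusion_multiplicity`, `fuse` and `swap_cf_is_symmetry` are hypothetical.

```python
from fractions import Fraction
from itertools import product

LABELS = "ABCDEFGH"
M = [  # the integer matrix 6 * S, copied from (6)
    [1,  1,  2,  3,  3,  2,  2,  2],
    [1,  1,  2, -3, -3,  2,  2,  2],
    [2,  2,  4,  0,  0, -2, -2, -2],
    [3, -3,  0,  3, -3,  0,  0,  0],
    [3, -3,  0, -3,  3,  0,  0,  0],
    [2,  2, -2,  0,  0,  4, -2, -2],
    [2,  2, -2,  0,  0, -2,  4, -2],
    [2,  2, -2,  0,  0, -2, -2,  4],
]

def fusion_multiplicity(x, y, z):
    """N_{XY}^Z from (5): sum over U of S_XU S_YU S_ZU / S_0U, with S = M / 6."""
    return sum(Fraction(M[x][u] * M[y][u] * M[z][u], 36 * M[0][u])
               for u in range(8))

def fuse(x, y):
    """Decomposition of X (x) Y as a dict {irrep label: multiplicity}."""
    ix, iy = LABELS.index(x), LABELS.index(y)
    result = {}
    for iz, z in enumerate(LABELS):
        n = fusion_multiplicity(ix, iy, iz)
        if n != 0:
            result[z] = n
    return result

SWAP_CF = str.maketrans("CF", "FC")        # the transposition (C, F)

def swap_cf_is_symmetry():
    """Check that relabelling C <-> F leaves all fusion rules unchanged."""
    for x, y in product(LABELS, repeat=2):
        swapped = {z.translate(SWAP_CF): n for z, n in fuse(x, y).items()}
        if fuse(x.translate(SWAP_CF), y.translate(SWAP_CF)) != swapped:
            return False
    return True
```

For example, `fuse("D", "D")` lists A, C, F, G and H, each with multiplicity one, in agreement with the corresponding entry of the table.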
There are several symmetries in this fusion table. In particular, by exchanging C and F we obtain the same table. This fact can also be seen from the S-matrix (6). If we let P be the permutation matrix corresponding to the transposition (C, F), then $P S P^{-1} = S$. On the other hand, by the Verlinde formula the fusion rules are computed in terms of S, so since S is invariant under P, the fusion rules are also symmetric with respect to the transposition (C, F). In Sect. 6.1 we prove that this symmetry can be extended to an auto-equivalence of the whole category Z(G), i.e., the braidings are also symmetric.

3. The Quantum Double Model

In this section we briefly discuss the main ingredients of the quantum double model [1]. For a detailed description we refer the reader to the original paper and [2].

3.1. Kitaev's Hamiltonian. Consider a planar lattice with directed edges. We associate to each edge the Hilbert space $\mathbb{C}G$ with the orthonormal basis $\{|g\rangle : g \in G\}$, where G is a finite group. For simplicity of presentation we assume that the direction of an edge can be reversed, and in this case we change the vector corresponding to that edge by sending $|g\rangle$ to $|g^{-1}\rangle$ (and extending linearly). A pair $s = (v, f)$ of an adjacent vertex v and face f is called a site and is depicted by a dotted line as in Fig. 1. For any site $s = (v, f)$ we define the operators $A_s^g$ and $B_s^h$, $g, h \in G$, according to Fig. 2. Observe that $A_s^g$ depends only on the vertex v and not on f, and furthermore, if h is in the center of G, $B_s^h$ is independent of v. So if there is no ambiguity, $A_s^g$ and $B_s^h$ are denoted by $A_v^g$ and $B_f^h$, and are called the vertex and face operators, respectively. The following relations are easy to verify:
\[
A_s^g A_s^{g'} = A_s^{g g'}, \qquad (A_s^g)^\dagger = A_s^{g^{-1}},
\]
\[
B_s^h B_s^{h'} = \delta_{h,h'} B_s^h, \qquad (B_s^h)^\dagger = B_s^h,
\]
\[
A_s^g B_s^h = B_s^{g h g^{-1}} A_s^g.
\]
These equations show that $g h^* \mapsto A_s^g B_s^h$ gives an isomorphism between the quantum double D(G) and the algebra of operators acting on site s. Observe that, for different sites $s \neq s'$, we have $[A_s^g, A_{s'}^{g'}] = [A_s^g, B_{s'}^{h}] = [B_s^h, B_{s'}^{h'}] = 0$. Define
\[
A_v = A_s = \frac{1}{|G|} \sum_{g \in G} A_s^g
\]
Fig. 1. A planar square lattice with directed edges. The direction of an edge can be reversed by changing the corresponding state according to the right figure. A pair of an adjacent vertex $v_0$ and face $f_0$ constitutes a site. This site $s_0 = (v_0, f_0)$ is depicted as a dotted line. A ribbon connecting two sites $s_0$ and $s_1$ is also shown. The corresponding ribbon operator acts on bold edges and is defined in Fig. 3

Fig. 2. Definition of operators $A_s^g$ and $B_s^h$. Note that if h is in the center of G, $B_s^h$ depends only on the face f and not on the vertex v
and $B_f = B_s = B_s^e$. Operators $A_v$ and $B_f$ are projections and pairwise commute. Then consider the Hamiltonian
\[
H_G = -\sum_v A_v - \sum_f B_f,
\]
where the summations run over all vertices v and faces f. Since all terms of the Hamiltonian commute, the ground state of $H_G$ is a state $|\psi\rangle$ such that $A_v |\psi\rangle = B_f |\psi\rangle = |\psi\rangle$. For a planar lattice the ground state is unique and can be explicitly computed [1]. Nevertheless, here we are interested in elementary excitations.
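As a consistency check (a short worked derivation that uses only the site relations of this subsection and the multiplication of D(G) from Sect. 2.1), one can verify that the map $g h^* \mapsto A_s^g B_s^h$ respects products and that $A_v$ is a projection. The first line moves $B_s^{h_1}$ past $A_s^{g_2}$ using $B_s^{h} A_s^{g} = A_s^{g} B_s^{g^{-1} h g}$, which is the commutation relation with h replaced by $g^{-1} h g$; the second line substitutes $k = g g'$.

```latex
\begin{aligned}
(A_s^{g_1} B_s^{h_1})(A_s^{g_2} B_s^{h_2})
  &= A_s^{g_1} A_s^{g_2}\, B_s^{g_2^{-1} h_1 g_2} B_s^{h_2}
   = \delta_{h_2,\, g_2^{-1} h_1 g_2}\; A_s^{g_1 g_2} B_s^{h_2}, \\
A_v^2
  &= \frac{1}{|G|^2} \sum_{g, g' \in G} A_s^{g g'}
   = \frac{1}{|G|} \sum_{k \in G} A_s^{k}
   = A_v .
\end{aligned}
```

The right-hand side of the first line is the image of $\delta_{h_2,\, g_2^{-1} h_1 g_2}\,(g_1 g_2) h_2^*$, which matches the product $(g_1 h_1^*)(g_2 h_2^*)$ in D(G).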
Fig. 3. Definition of the ribbon operator $F_\xi^{h,g}$. The ribbon ξ connects the starting site $s_0$ to the ending site $s_1$
3.2. Ribbon operators. A ribbon ξ in the lattice is a sequence of "adjacent" sites connecting two sites $s_0$ and $s_1$ as in Fig. 1. From now on we always assume that $s_0$ is the starting site of ξ and $s_1$ is the ending site. (We also assume that ribbons avoid self-crossing.) For any ribbon ξ and $g, h \in G$ the ribbon operator $F_\xi^{h,g}$ is defined as in Fig. 3. It is easy to see that
\[
F_\xi^{h,g} F_\xi^{h',g'} = \delta_{g,g'}\, F_\xi^{h h', g}, \tag{7}
\]
\[
(F_\xi^{h,g})^\dagger = F_\xi^{h^{-1}, g}. \tag{8}
\]
Moreover, for every site t different from $s_0$ and $s_1$, $[F_\xi^{h,g}, A_t^k] = [F_\xi^{h,g}, B_t^k] = 0$, and we have
\[
A_{s_0}^k F_\xi^{h,g} = F_\xi^{k h k^{-1},\, k g} A_{s_0}^k, \tag{9}
\]
\[
B_{s_0}^k F_\xi^{h,g} = F_\xi^{h,g} B_{s_0}^{k h}, \tag{10}
\]
and
\[
A_{s_1}^k F_\xi^{h,g} = F_\xi^{h,\, g k^{-1}} A_{s_1}^k, \tag{11}
\]
\[
B_{s_1}^k F_\xi^{h,g} = F_\xi^{h,g} B_{s_1}^{g^{-1} h^{-1} g k}. \tag{12}
\]
3.3. Elementary excitations. Let $|\psi\rangle$ be the ground state of the Hamiltonian $H_G$. Fix a ribbon ξ which connects two sites $s_0$ and $s_1$. Since $F_\xi^{h^{-1},g}$ commutes with all terms of $H_G$ except the terms at sites $s_0$, $s_1$, the state $|\psi^{h,g}\rangle = F_\xi^{h^{-1},g} |\psi\rangle$ satisfies all constraints
of the Hamiltonian except the ones at $s_0$ and $s_1$. Moreover, $|\psi^{h,g}\rangle$ does not depend on ξ, but only on the end points $s_0$, $s_1$; that is, if ξ′ is another ribbon with the same end points, $|\psi^{h,g}\rangle = F_\xi^{h^{-1},g} |\psi\rangle = F_{\xi'}^{h^{-1},g} |\psi\rangle$ [1]. Therefore, applying $F_\xi^{h,g}$ on the ground state can be thought of as creating a pair of quasi-particles at sites $s_0$ and $s_1$. Thus movement of such quasi-particles is equivalent to extending the corresponding ribbons, and then their braidings can be defined. Furthermore, to fuse two quasi-particles of this form we can simply move them to the same site. As a result, the set of these quasi-particles describes a system of anyons. Now the question is whether there exist other elementary excited states or not. The space of quasi-particle excitations living at $s_0$ and $s_1$ is equal to
\[
L(s_0, s_1) = \{ |v\rangle : A_t |v\rangle = B_t |v\rangle = |v\rangle \ \text{for all } t \neq s_0, s_1 \}.
\]
By the above argument $|\psi^{h,g}\rangle$ belongs to $L(s_0, s_1)$, and it is proved in [1] that $L(s_0, s_1)$ is spanned by the states $|\psi^{h,g}\rangle$. Therefore, all excitations in this system can be obtained by applying ribbon operators on the ground state. The inner product in this space is computed in the following lemma.

Lemma 3.1.
\[
\langle \psi^{h,g} | \psi^{h',g'} \rangle = \frac{1}{|G|}\, \delta_{h,h'}\, \delta_{g,g'}. \tag{13}
\]
Proof. $\langle \psi^{h,g} | \psi^{h',g'} \rangle = \langle \psi | (F_\xi^{h^{-1},g})^\dagger F_\xi^{h'^{-1},g'} | \psi \rangle = \delta_{g,g'} \langle \psi | F_\xi^{h h'^{-1},g} | \psi \rangle$. So it suffices to show that
\[
\langle \psi | F_\xi^{h,g} | \psi \rangle = \frac{1}{|G|}\, \delta_{h,e}. \tag{14}
\]
Using (12) we have
\[
\langle \psi | F_\xi^{h,g} | \psi \rangle = \langle \psi | B_{s_1}^{e} F_\xi^{h,g} | \psi \rangle = \langle \psi | F_\xi^{h,g} B_{s_1}^{g^{-1} h^{-1} g} | \psi \rangle.
\]
Thus $\langle \psi | F_\xi^{h,g} | \psi \rangle = 0$ if $h \neq e$. Now by (11) and $A_{s_1}^k A_{s_1} = A_{s_1}$ we obtain
\[
\langle \psi | F_\xi^{e,g} | \psi \rangle = \langle \psi | A_{s_1}^k F_\xi^{e,g} | \psi \rangle = \langle \psi | F_\xi^{e,\, g k^{-1}} A_{s_1}^k | \psi \rangle = \langle \psi | F_\xi^{e,\, g k^{-1}} | \psi \rangle.
\]
Then for every g, g′ we have $\langle \psi | F_\xi^{e,g} | \psi \rangle = \langle \psi | F_\xi^{e,g'} | \psi \rangle$. On the other hand, $\sum_{g \in G} F_\xi^{e,g}$ is the identity operator, so we are done.
3.4. Anyon-types. The only remaining question is to find different types of anyons. Let t be a site different from $s_0$, $s_1$, and let ζ be a closed ribbon which encircles $s_0$ but not $s_1$, and both of whose end points are t. To characterize an unknown excitation sitting at $s_0$, we can create a particle-antiparticle pair at t, move one of them along ζ and rotate it around $s_0$, and finally, by measuring the vertex and face operators at t, check whether the pair (after braiding) fuses to vacuum or not. We can identify the anyon at $s_0$ by repeating this process for different particle-antiparticle pairs that we create at t. Of course, if there is no excitation at $s_0$, we expect that after the rotation, the particle-antiparticle pair always fuses to vacuum. Mathematically, creating this pair and moving one of them correspond to applying some ribbon operator along ζ. Moreover, if the pair fuses to vacuum, this ribbon operator should not create any excitation at t. This means that, besides the vertex and face operators along ζ, this ribbon operator must commute with $A_t$ and $B_t$ as well. Letting $\mathcal{F}_\zeta$ be the algebra of ribbon operators $F_\zeta^{h,g}$, $g, h \in G$, we conclude that the anyon-types are characterized by the subalgebra $\mathcal{K}_\zeta \subseteq \mathcal{F}_\zeta$ of ribbon operators which commute with $A_t$ and $B_t$:
\[
\mathcal{K}_\zeta = \{ T \in \mathcal{F}_\zeta : [T, A_t] = [T, B_t] = 0 \}.
\]
In [2] for any irreducible representation X of D(G) a ribbon operator $T^X \in \mathcal{K}_\zeta$ is defined, and it is proved that $\mathcal{K}_\zeta$ is generated by these operators and the following equations hold:
\[
(T^X)^\dagger = T^X, \qquad T^X T^Y = \delta_{X,Y}\, T^X, \qquad \sum_X T^X = I.
\]
As a result, anyon-types are in one-to-one correspondence with irreps of D(G). Indeed, the set of projections $T^X$ decomposes the space of excitations
\[
L(s_0, s_1) = \bigoplus_X T^X L(s_0, s_1), \tag{15}
\]
and $|v\rangle \in T^X L(s_0, s_1)$ is a state of an anyon of type X. The decomposition (15) can also be derived from another point of view. Using the commutation relations (9) and (10), it is easy to see that
\[
A_{s_0}^k |\psi^{h,g}\rangle = |\psi^{k h k^{-1},\, k g}\rangle,
\]
\[
B_{s_0}^l |\psi^{h,g}\rangle = \delta_{l,h}\, |\psi^{h,g}\rangle.
\]
As we mentioned in Sect. 3.1 the algebra generated by operators $A_{s_0}^k$ and $B_{s_0}^l$ is isomorphic to D(G). Then the above equations define a representation of D(G) on the space $L(s_0, s_1)$. This representation, however, is equivalent to the regular representation of D(G); that is, by sending $|\psi^{h,g}\rangle$ to $h^* g$, the action of D(G) is given by multiplication from the left. On the other hand, decomposing the regular representation of D(G) into irreducible ones, we obtain all irreps of D(G) as a summand, i.e., as representations of D(G) we have
\[
L(s_0, s_1) \simeq \bigoplus_X (\dim X)\, V_X,
\]
where the direct sum runs over irreps of $D(G)$ and $V_X$ is the vector space corresponding to irrep $X$. Again we refer to [2] for an explicit description of this isomorphism in terms of basis vectors, which indeed is the same as the decomposition (15): $T^X \mathcal{L}(s_0, s_1) \simeq (\dim X)\, V_X$.

As a summary, anyon-types in this model are described by simple objects of $\mathcal{Z}(G)$. Moreover, given a subspace of elementary excitations $W \subseteq \mathcal{L}(s_0, s_1)$, to find the types of anyons in $W$ we can compute the character of the representation $W$ of $D(G)$, and then decompose it into irreducible ones. We will use this method in Sects. 4 and 5 to find condensed anyons.

Remark 3.1. Here the representation of $D(G)$ on the space $\mathcal{L}(s_0, s_1)$ is defined based on the action of the vertex and face operators at site $s_0$, and another representation can be found by considering those operators at $s_1$. However, it is not hard to see that these two representations of $D(G)$ are charge conjugations of each other and it does not matter which representation is picked to find anyon-types. Indeed, the anyon sitting at $s_1$ is the charge conjugation of the one at $s_0$.

4. The Quantum Double Model with Boundary I

In this section we define a boundary for the quantum double model and compute the corresponding condensation. This model will be generalized in the following section.

Instead of a planar lattice, consider a lattice defined on a half-plane as in Fig. 4. Again $\mathbb{C}G$ is the Hilbert space associated with internal edges; however, the Hilbert space of the boundary edges is $\mathbb{C}K$, where $K \subseteq G$ is a fixed subgroup. The vertex and face operators corresponding to internal sites are defined as before. But for a boundary site $s = (v, f)$ the vertex operator is
$$A_s^K = \frac{1}{|K|} \sum_{k \in K} A_s^k.$$
Also the fact that the Hilbert space corresponding to a boundary edge is $\mathbb{C}K \subseteq \mathbb{C}G$ can be captured by considering the projection onto this subspace,
$$B_s^K = \sum_{k \in K} B_s^k.$$
Now define the Hamiltonian
$$H_{G,K} = -\sum_{v} A_v - \sum_{f} B_f - \sum_{s} \left(A_s^K + B_s^K\right), \tag{16}$$
where the summations run over internal vertices and faces v, f and boundary sites s. Observe that similar to HG , all terms of HG,K commute. By the same reasoning as before the bulk excitations can be created and moved by applying ribbon operators, and they are in one-to-one correspondence with irreps of D(G). Some of these quasi-particles, however, may disappear when they move to the boundary; that is, because the local terms of the Hamiltonian on the boundary are different from internal terms, an excitation which violates some of the internal constraints, may become a ground state when it moves to a boundary site.
Fig. 4. A planar lattice with boundary. A ribbon connects the boundary site $s_0 = (v_0, f_0)$ to the internal site $s_1 = (v_1, f_1)$. Here, operators $A^k_{s_0}$ and $B^k_{s_0}$ are defined the same as before: $A^k_{s_0} |x, y, z\rangle = |kx, ky, kz\rangle$ and $B^k_{s_0} |x\rangle = \delta_{k,x} |x\rangle$
4.1. Condensations. Fix a ribbon $\xi$ which connects a boundary starting site $s_0$ to an internal site $s_1$ as in Fig. 4. Let $\mathcal{C}_\xi \subseteq \mathcal{F}_\xi$ be the subalgebra of operators that commute with both $A^K_{s_0}$ and $B^K_{s_0}$:
$$\mathcal{C}_\xi = \{T \in \mathcal{F}_\xi : [T, A^K_{s_0}] = [T, B^K_{s_0}] = 0\}.$$
Then for every $T \in \mathcal{C}_\xi$, by applying $T$ on the ground state of $H_{G,K}$, we generate a quasi-particle at $s_1$, but no excitation at $s_0$. It means that the quasi-particle at $s_1$ disappears when it moves to the boundary site $s_0$. In this case we say that this excitation gets condensed at the boundary. So to classify condensations we should find the algebra $\mathcal{C}_\xi$.

Let $T = \sum_{h,g} c_{h,g} F_\xi^{h,g}$ be in $\mathcal{C}_\xi$. Using the commutation relations$^2$ (9) and (10), $[T, A^K_{s_0}] = 0$ is equivalent to $c_{khk^{-1},\,kg} = c_{h,g}$ for every $k \in K$, and $[T, B^K_{s_0}] = 0$ if and only if $c_{h,g} = 0$ for $h \notin K$. These two relations completely characterize $\mathcal{C}_\xi$ as follows. For every $k \in K$ and $g \in G$ define
$$T^{k,g} = \sum_{l \in K} F_\xi^{lkl^{-1},\,lg^{-1}}. \tag{17}$$

Proposition 4.1. The algebra $\mathcal{C}_\xi$ is spanned by operators $T^{k,g}$, $k \in K$, $g \in G$, and the following equations hold:
1. $T^{k,gm} = T^{mkm^{-1},\,g}$, for every $m \in K$.
2. $T^{k,g}\, T^{k',g'} = 0$ if $gK \neq g'K$.
3. $T^{k,g}\, T^{k',g} = T^{kk',\,g}$.
4. $(T^{k,g})^\dagger = T^{k^{-1},\,g}$.
2 Although these equations are given for an internal site, it is not hard to see that they also hold for boundary sites.
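Proof. The proof is straightforward and is left to the reader. (We will prove a generalization of this proposition in the next section.)

Now in order to find the type of condensed anyons we use the idea of Sect. 3.3 and compute the representation of $D(G)$ induced by $\mathcal{C}_\xi$. Let $|\psi_K\rangle$ be the ground state of $H_{G,K}$. For $k \in K$ and $g \in G$ define $|\psi_K^{k,g}\rangle = T^{k,g} |\psi_K\rangle$ and let $\mathcal{A}(K)$ be the span of these vectors. Since the operators $T^{k,g}$ commute with $A^K_{s_0}$ and $B^K_{s_0}$, we have
$$A^K_{s_0} |\psi_K^{k,g}\rangle = B^K_{s_0} |\psi_K^{k,g}\rangle = |\psi_K^{k,g}\rangle.$$
But using (11) and (12)
$$A^h_{s_1} |\psi_K^{k,g}\rangle = |\psi_K^{k,hg}\rangle, \qquad B^h_{s_1} |\psi_K^{k,g}\rangle = \delta_{h,\,gkg^{-1}}\, |\psi_K^{k,g}\rangle,$$
and we obtain a representation of $D(G)$ on $\mathcal{A}(K)$. To compute the character of this representation we need to fix a basis. Assume that $|G|/|K| = r$ and $G = g_1 K \cup \cdots \cup g_r K$. Then by Proposition 4.1 the $|G|$ states $|\psi_K^{k,g_i}\rangle$, $k \in K$, $i = 1, \dots, r$, span $\mathcal{A}(K)$. We have
$$\begin{aligned}
\langle \psi_K^{k,g_i} | \psi_K^{k',g_j} \rangle
&= \langle \psi_K | (T^{k,g_i})^\dagger\, T^{k',g_j} | \psi_K \rangle \\
&= \delta_{i,j}\, \langle \psi_K | T^{k^{-1}k',\,g_i} | \psi_K \rangle \\
&= \delta_{i,j} \sum_{m \in K} \langle \psi_K | F_\xi^{mk^{-1}k'm^{-1},\,mg_i^{-1}} | \psi_K \rangle \\
&= \frac{|K|}{|G|}\, \delta_{i,j}\, \delta_{k,k'},
\end{aligned} \tag{18}$$
where in the last line we use (14) which still holds even considering a boundary. So $\{\sqrt{r}\, |\psi_K^{k,g_i}\rangle : k \in K,\ i = 1, \dots, r\}$ is an orthonormal basis for $\mathcal{A}(K)$.

Now we are ready to compute $\chi_{\mathcal{A}(K)}$, the character of the representation $\mathcal{A}(K)$. For $g, h \in G$ we have
$$\begin{aligned}
\chi_{\mathcal{A}(K)}(hg^*)
&= r \sum_{k \in K} \sum_{i=1}^{r} \langle \psi_K^{k,g_i} | hg^* | \psi_K^{k,g_i} \rangle \\
&= r \sum_{k \in K} \sum_{i=1}^{r} \delta_{g,\,g_i k g_i^{-1}}\, \langle \psi_K^{k,g_i} | h | \psi_K^{k,g_i} \rangle \\
&= r \sum_{k \in K} \sum_{i=1}^{r} \delta_{g,\,g_i k g_i^{-1}}\, \langle \psi_K^{k,g_i} | \psi_K^{k,hg_i} \rangle.
\end{aligned}$$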
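Let $hg_i = g_{\sigma(i)} k_i$ where $k_i \in K$ and $1 \le \sigma(i) \le r$. Thus by Proposition 4.1 we have
$$\begin{aligned}
\chi_{\mathcal{A}(K)}(hg^*)
&= r \sum_{k \in K} \sum_{i=1}^{r} \delta_{g,\,g_i k g_i^{-1}}\, \langle \psi_K^{k,g_i} | \psi_K^{k_i k k_i^{-1},\,g_{\sigma(i)}} \rangle \\
&= r \sum_{k \in K} \sum_{i=1}^{r} \delta_{g,\,g_i k g_i^{-1}}\, \delta_{k,\,k_i k k_i^{-1}}\, \delta_{i,\sigma(i)}\, \frac{|K|}{|G|} \\
&= \sum_{k \in K} \sum_{i=1}^{r} \delta_{g_i^{-1} g g_i,\,k}\; \delta_{g_i^{-1} h g_i,\,k_i}\; \delta_{k k_i,\,k_i k} \\
&= \sum_{k \in K} \sum_{i=1}^{r} \delta_{g_i^{-1} g g_i,\,k}\; \delta_{g_i^{-1} h g_i,\,k_i}\; \delta_{gh,hg} \\
&= \delta_{gh,hg} \sum_{i=1}^{r} \delta_{g_i^{-1} g g_i \in K}\; \delta_{g_i^{-1} h g_i \in K},
\end{aligned}$$
and then
$$\chi_{\mathcal{A}(K)}(hg^*) = \frac{1}{|K|} \sum_{x \in G} \delta_{gh,hg}\; \delta_{xgx^{-1} \in K}\; \delta_{xhx^{-1} \in K}. \tag{19}$$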
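As a sanity check, formula (19) can be evaluated by brute force for a small group. The following Python sketch is only an illustration and not part of the original derivation: it assumes $G = S_3$ realized as permutation tuples, and the helper names (`compose`, `inverse`, `conjugate`, `chi_A`) are hypothetical names introduced here.

```python
from fractions import Fraction
from itertools import permutations

G = list(permutations(range(3)))  # S_3 as permutation tuples (illustrative choice)
E = (0, 1, 2)                     # identity element


def compose(p, q):
    """Group product p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def conjugate(x, g):
    """Return x g x^{-1}."""
    return compose(compose(x, g), inverse(x))


def chi_A(K, h, g):
    """Right-hand side of (19), evaluated at h g^* for a subgroup K of G."""
    if compose(g, h) != compose(h, g):
        return Fraction(0)
    members = set(K)
    count = sum(
        1
        for x in G
        if conjugate(x, g) in members and conjugate(x, h) in members
    )
    return Fraction(count, len(K))
```

With these definitions, `chi_A(G, h, g)` reduces to $\delta_{gh,hg}$ and `chi_A([E], h, g)` to $|G|\,\delta_{g,e}\delta_{h,e}$; summing `chi_A(K, E, g)` over all $g$ gives $|G|$, the number of spanning states of the representation.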
To compute the set of condensed anyons we just need to decompose this character into irreducible ones. Let us give some examples. Let $K = G$. We have
$$\chi_{\mathcal{A}(G)}(hg^*) = \frac{1}{|G|} \sum_{x \in G} \delta_{gh,hg} \cdot 1 = \delta_{gh,hg}. \tag{20}$$
Then using (2) it is easy to see that $\chi_{\mathcal{A}(G)} = \sum_a \chi_{(a,1)}$. As a result, in this case the condensation corresponds to all fluxions. Another example is $K = \{e\}$. We have
$$\chi_{\mathcal{A}(\{e\})}(hg^*) = |G|\, \delta_{g,e}\, \delta_{h,e} = \delta_{g,e}\, \mathrm{tr}_\rho(h),$$
where $\rho$ denotes the regular representation of $G$. Therefore, $\chi_{\mathcal{A}(\{e\})} = \sum_\pi (\dim \pi)\, \chi_{(e,\pi)}$, where the summation is over irreps of $G$, and condensed anyons are all chargeons.

In general, an anyon indexed by $(a, \pi)$ is condensed if $\chi_{\mathcal{A}(K)}$ and $\chi_{(a,\pi)}$ are not orthogonal: $\langle \chi_{\mathcal{A}(K)}, \chi_{(a,\pi)} \rangle > 0$ (see (3) for the definition of the inner product of characters). Expanding $\langle \chi_{\mathcal{A}(K)}, \chi_{(a,\pi)} \rangle$, it is easy to see that the condensed anyons in our model coincide with those in the model proposed in [2]. Bombin and Martin-Delgado have defined a variation of the quantum double model in which two subgroups $N \subseteq M$ are involved, and described the necessary and sufficient condition for an anyon $(a, \pi)$ to be condensed. This condition in the case where $M = N = K$ is the same as our constraint $\langle \chi_{\mathcal{A}(K)}, \chi_{(a,\pi)} \rangle > 0$, which means that these two models characterize similar condensations.$^3$

Remark 4.1. In the example of $K = G$ the condensation consists of all fluxions. Since some of these condensed anyons may have non-trivial braidings, we should have confinements in this case. Confinements have been discussed in [2] and classified in some cases.

5. The Quantum Double Model with Boundary II

We now generalize the boundary defined in the previous section. Here besides a subgroup $K$, the boundary depends on a 2-cocycle of $K$ as well.

3 Bombin and Martin-Delgado [2] have assumed that $N$ is a normal subgroup of $G$; however, it seems that the normality of $N$ in $M$ is enough.
5.1. 2-cocycles. Let $\varphi : K \times K \to \mathbb{C}^\times$ be a function such that for every $k, l, m \in K$,
$$\varphi(kl, m)\, \varphi(k, l) = \varphi(k, lm)\, \varphi(l, m). \tag{21}$$
Then $\varphi$ is called a 2-cocycle of $K$. Every 2-cocycle comes from a projective representation and vice versa. A representation of $K$ on the vector space $V$ is a homomorphism $K \to GL(V)$, where $GL(V)$ is the group of invertible linear transformations of $V$. A projective representation is a homomorphism $\rho : K \to PGL(V)$, where $PGL(V) = GL(V)/\mathbb{C}^\times$ is the quotient of $GL(V)$ modulo scalars. Thus every representation, by composing with the map $\Pi : GL(V) \to PGL(V)$, gives a projective representation, but the converse does not hold. However, a projective representation provides us with a 2-cocycle: for every $k \in K$ fix $L(k) \in GL(V)$ such that $\Pi(L(k)) = \rho(k)$ ($L$ is a lifting of $\rho$). So $\Pi(L(kl)) = \Pi(L(k)L(l))$ and then there exists $\varphi(k, l) \in \mathbb{C}^\times$ such that
$$\varphi(k, l)\, L(kl) = L(k)\, L(l). \tag{22}$$
It is easy to see that this function $\varphi$ satisfies (21) and is a 2-cocycle. Conversely, every 2-cocycle $\varphi$ corresponds to a projective representation. Let $V = \mathbb{C}K$ and define $L(k)|l\rangle = \varphi(k, l)|kl\rangle$. Then $\rho(k) = \Pi(L(k))$ gives a projective representation and (22) holds.

Let $\alpha : K \to \mathbb{C}^\times$ be an arbitrary function. Then $L'(k) = \alpha(k) L(k)$ also satisfies $\Pi(L'(k)) = \rho(k)$, and defines another 2-cocycle $\varphi'$ corresponding to the same projective representation. We call two such 2-cocycles $\varphi$ and $\varphi'$ equivalent. More precisely, $\varphi$ and $\varphi'$ are equivalent if there exists $\alpha$ such that $\varphi'(k, l) = \alpha(kl)^{-1} \alpha(k) \alpha(l)\, \varphi(k, l)$. The set of 2-cocycles of $K$ up to the above equivalence is denoted by $H^2(K, \mathbb{C}^\times)$.

Lemma 5.1. Every 2-cocycle is equivalent to one with the following properties.
1. $\varphi(e, k) = \varphi(k, e) = 1$,
2. $\varphi(k, k^{-1}) = 1$,
3. $|\varphi(k, l)| = 1$,
4. $\varphi(k^{-1}, l^{-1}) = \varphi(l, k)^{-1}$.
Proof. Every 2-cocycle corresponds to a projective representation. Consider a lifting of this projective representation such that L(e) = I , L(k −1 ) = L(k)−1 , and det L(k) = 1 for every k ∈ K . Then the above equations follow from (22). For simplicity from now on we assume that ϕ satisfies the properties of this lemma.
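The cocycle condition (21) and the equivalence relation above are easy to test numerically on a small example. The sketch below is an illustration under stated assumptions, not taken from the paper: it uses $K = \mathbb{Z}_2 \times \mathbb{Z}_2$ (written additively), the sample cocycle $\varphi((a_1,a_2),(b_1,b_2)) = (-1)^{a_2 b_1}$, and an arbitrarily chosen function `alpha`; all function names are hypothetical. It also evaluates the combination $\varphi(k|l) = \varphi(k,l)\,\varphi(klk^{-1},k)^{-1}$ that enters the character formula of Theorem 5.1.

```python
from itertools import product

# K = Z_2 x Z_2, written additively (illustrative choice of group).
K = list(product(range(2), repeat=2))


def mul(k, l):
    """Group operation of Z_2 x Z_2."""
    return ((k[0] + l[0]) % 2, (k[1] + l[1]) % 2)


def inv(k):
    """Every element of Z_2 x Z_2 is its own inverse."""
    return k


def phi(k, l):
    """Sample 2-cocycle (assumption): phi(k, l) = (-1)^(k_2 * l_1)."""
    return (-1) ** (k[1] * l[0])


def is_cocycle(f):
    """Check condition (21) for all triples in K."""
    return all(
        abs(f(mul(k, l), m) * f(k, l) - f(k, mul(l, m)) * f(l, m)) < 1e-12
        for k, l, m in product(K, repeat=3)
    )


# Arbitrary non-zero function alpha: K -> C^x used to build an equivalent cocycle.
alpha = {(0, 0): 1, (1, 0): 1j, (0, 1): -1, (1, 1): 1j}


def phi_twisted(k, l):
    """Equivalent cocycle phi'(k,l) = alpha(kl)^{-1} alpha(k) alpha(l) phi(k,l)."""
    return alpha[k] * alpha[l] / alpha[mul(k, l)] * phi(k, l)


def slant(f, k, l):
    """f(k|l) = f(k, l) * f(k l k^{-1}, k)^{-1}."""
    return f(k, l) / f(mul(mul(k, l), inv(k)), k)
```

Both `phi` and `phi_twisted` pass `is_cocycle`, and `slant` returns the same values for both on commuting pairs, which is the invariance used when the character is expressed through $\varphi(\cdot|\cdot)$.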
Fig. 5. A lattice with boundary in which every other boundary edge is marked by a dotted line. Here we fix a boundary site s0 = (v0 , f 0 ), where f 0 is adjacent to a solid boundary edge
5.2. A new boundary. Consider a lattice on a half-plane as before. The vertex and face operators for internal sites remain unchanged. Also, the state of the boundary edges lives in the space $\mathbb{C}K$, which as before can be captured by adding the projections $B_s^K$ (for boundary sites $s$) to the Hamiltonian. However, we need to change the vertex operators at the boundary.

We assume that at the boundary, every other edge is marked by a dotted line as in Fig. 5. Thus, for every boundary site there are three adjacent edges: (1) an internal edge, (2) a boundary solid edge, and (3) a boundary dotted edge. Then for every $k \in K$ and boundary site $s$ define the vertex operator $\tilde{A}_s^k$ as in Fig. 6. The equality $\tilde{A}_s^k \tilde{A}_s^l = \tilde{A}_s^{kl}$ can be proved using (21).$^4$ Furthermore, using Lemma 5.1 we have $(\tilde{A}_s^k)^\dagger = \tilde{A}_s^{k^{-1}}$. Then
$$\tilde{A}_s^K = \frac{1}{|K|} \sum_{k \in K} \tilde{A}_s^k$$
is a projection. Now define the Hamiltonian
$$\tilde{H}_{G,K} = -\sum_{v} A_v - \sum_{f} B_f - \sum_{s} \left(\tilde{A}_s^K + B_s^K\right), \tag{23}$$
where the summations run over internal vertices and faces $v$, $f$ and boundary sites $s$. The ground state of this Hamiltonian is denoted by $|\tilde{\psi}_K\rangle$. The bulk excitations, as before, are created by applying ribbon operators and are labeled by irreps of $D(G)$. Condensations, however, are different because the boundary terms have been changed.

4 This is the equation that motivates us to have two types of boundary edges; if all the boundary edges were solid, we would have $\tilde{A}_s^k \tilde{A}_s^l = \varphi(k, l)^2 \tilde{A}_s^{kl}$.

5.3. Condensations II. In this section we fix a ribbon $\xi$ that connects a boundary site $s_0$ to an internal site $s_1$. Then the same as before we consider the subalgebra of ribbon operators that commute with $\tilde{A}^K_{s_0}$ and $B^K_{s_0}$, and compute the corresponding representation of $D(G)$ in order to find the excitations that get condensed at $s_0$. Nevertheless, because we have changed the definition of the boundary terms, the ribbon operators, when acting on boundary edges, must also be modified.

Fig. 6. Definition of $\tilde{A}^k_{v_0}$ for $k \in K$ and boundary vertex $v_0$. Here we assume that $x, y$ are in $K$

Fig. 7. Definition of $\tilde{F}_\xi^{k,g}$ for a ribbon $\xi$ which connects a boundary site $s_0$ with a corresponding solid edge, to an internal site $s_1$. Here we assume that $k, x_1$ are in $K$

From now on assume that the boundary edge corresponding to site $s_0$ is solid (see Fig. 5). Then for $k \in K$ and $g \in G$ define the ribbon operator $\tilde{F}_\xi^{k,g}$ as in Fig. 7. Observe that
$$\tilde{F}_\xi^{k,g}\, \tilde{F}_\xi^{k',g'} = \delta_{g,g'}\, \varphi(k, k')\, \tilde{F}_\xi^{kk',\,g}, \tag{24}$$
$$(\tilde{F}_\xi^{k,g})^\dagger = \tilde{F}_\xi^{k^{-1},\,g}. \tag{25}$$
$\tilde{F}_\xi^{k,g}$ commutes with $A_t$ and $B_t$ for every internal site $t \neq s_1$, and the commutation relations with $A^h_{s_1}$ and $B^h_{s_1}$ are as before,
$$A^h_{s_1}\, \tilde{F}_\xi^{k,g} = \tilde{F}_\xi^{k,\,gh^{-1}}\, A^h_{s_1}, \tag{26}$$
$$B^h_{s_1}\, \tilde{F}_\xi^{k,g} = \tilde{F}_\xi^{k,g}\, B^{g^{-1}k^{-1}gh}_{s_1}, \tag{27}$$
and for $l \in K$,
$$B^l_{s_0}\, \tilde{F}_\xi^{k,g} = \tilde{F}_\xi^{k,g}\, B^{lk}_{s_0}. \tag{28}$$
However, we have
$$\tilde{A}^l_{s_0}\, \tilde{F}_\xi^{k,g} = \varphi(lk, l^{-1})\, \varphi(l, k)\, \tilde{F}_\xi^{lkl^{-1},\,lg}\, \tilde{A}^l_{s_0}. \tag{29}$$

Remark 5.1. It is shown in [2] that the ribbon operators are indeed certain extensions of the local operators. Considering the extra phases in the definition of $\tilde{A}^k_{s_0}$, the same extension gives the definition of $\tilde{F}_\xi^{k,g}$, which compared to $F_\xi^{k,g}$ has an extra phase.

Remark 5.2. We defined the operators $\tilde{F}_\xi^{k,g}$ only when $k$ belongs to $K$. This is because we are interested in ribbon operators which commute with $B^K_{s_0}$. According to (28), this condition automatically enforces $k$ to be in $K$.

Let $\tilde{\mathcal{F}}_\xi$ be the algebra generated by $\tilde{F}_\xi^{k,g}$, $k \in K$, $g \in G$, and
$$\tilde{\mathcal{C}}_\xi = \{T \in \tilde{\mathcal{F}}_\xi : [T, \tilde{A}^K_{s_0}] = [T, B^K_{s_0}] = 0\}.$$
For every $k \in K$ and $g \in G$ define
$$\tilde{T}^{k,g} = \sum_{l \in K} \varphi(l, k)\, \varphi(lk, l^{-1})\, \tilde{F}_\xi^{lkl^{-1},\,lg^{-1}}. \tag{30}$$
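Proposition 5.1. The algebra $\tilde{\mathcal{C}}_\xi$ is spanned by the operators $\tilde{T}^{k,g}$, $k \in K$, $g \in G$.

Proof. $[B^K_{s_0}, \tilde{T}^{k,g}] = 0$ is easy. For $m \in K$ we have
$$\begin{aligned}
\tilde{A}^m_{s_0}\, \tilde{T}^{k,g}
&= \sum_{l \in K} \varphi(l, k)\, \varphi(lk, l^{-1})\, \tilde{A}^m_{s_0}\, \tilde{F}_\xi^{lkl^{-1},\,lg^{-1}} \\
&= \sum_{l \in K} \varphi(l, k)\, \varphi(lk, l^{-1})\, \varphi(m, lkl^{-1})\, \varphi(mlkl^{-1}, m^{-1})\, \tilde{F}_\xi^{mlkl^{-1}m^{-1},\,mlg^{-1}}\, \tilde{A}^m_{s_0} \\
&= \sum_{l \in K} \varphi(m^{-1}l, k)\, \varphi(m^{-1}lk, l^{-1}m)\, \varphi(m, m^{-1}lkl^{-1}m)\, \varphi(lkl^{-1}m, m^{-1})\, \tilde{F}_\xi^{lkl^{-1},\,lg^{-1}}\, \tilde{A}^m_{s_0}.
\end{aligned}$$
So to prove $[\tilde{A}^K_{s_0}, \tilde{T}^{k,g}] = 0$ it is sufficient to show that
$$\varphi(m^{-1}l, k)\, \varphi(m^{-1}lk, l^{-1}m)\, \varphi(m, m^{-1}lkl^{-1}m)\, \varphi(lkl^{-1}m, m^{-1}) = \varphi(l, k)\, \varphi(lk, l^{-1}).$$
By three applications of (21) and Eq. 4 of Lemma 5.1 we have
$$\begin{aligned}
&\varphi(m^{-1}l, k)\, \varphi(m^{-1}lk, l^{-1}m)\, \varphi(m, m^{-1}lkl^{-1}m)\, \varphi(lkl^{-1}m, m^{-1}) \\
&\quad= \varphi(m^{-1}l, k)\, \varphi(m, m^{-1}lk)\, \varphi(lk, l^{-1}m)\, \varphi(lkl^{-1}m, m^{-1}) \\
&\quad= \varphi(m^{-1}l, k)\, \varphi(m, m^{-1}lk)\, \varphi(l^{-1}m, m^{-1})\, \varphi(lk, l^{-1}) \\
&\quad= \varphi(l, k)\, \varphi(m, m^{-1}l)\, \varphi(l^{-1}m, m^{-1})\, \varphi(lk, l^{-1}) \\
&\quad= \varphi(l, k)\, \varphi(lk, l^{-1}).
\end{aligned}$$
As a result, $\tilde{T}^{k,g} \in \tilde{\mathcal{C}}_\xi$. To see that these operators span $\tilde{\mathcal{C}}_\xi$, observe that if $T = \sum c_{k,g} \tilde{F}_\xi^{k,g}$ is in $\tilde{\mathcal{C}}_\xi$, then by $[T, B^K_{s_0}] = 0$, $c_{k,g} = 0$ for every $k \notin K$. Moreover, by $[T, \tilde{A}^K_{s_0}] = 0$ and (29), $c_{mkm^{-1},\,mg}$, for every $m \in K$, is uniquely determined in terms of $c_{k,g}$.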
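Proposition 5.2. The following equations hold:
1. $\tilde{T}^{k,gm} = \varphi(m, k)\, \varphi(mk, m^{-1})\, \tilde{T}^{mkm^{-1},\,g}$, for every $m \in K$.
2. $\tilde{T}^{k,g}\, \tilde{T}^{k',g'} = 0$, if $gK \neq g'K$.
3. $\tilde{T}^{k,g}\, \tilde{T}^{k',g} = \varphi(k, k')\, \tilde{T}^{kk',\,g}$.
4. $(\tilde{T}^{k,g})^\dagger = \tilde{T}^{k^{-1},\,g}$.

Proof.
$$\begin{aligned}
\tilde{T}^{k,gm}
&= \sum_{l \in K} \varphi(l, k)\, \varphi(lk, l^{-1})\, \tilde{F}_\xi^{lkl^{-1},\,lm^{-1}g^{-1}} \\
&= \sum_{l \in K} \varphi(lm, k)\, \varphi(lmk, m^{-1}l^{-1})\, \tilde{F}_\xi^{lmkm^{-1}l^{-1},\,lg^{-1}}.
\end{aligned}$$
So for the first equation it is sufficient to show
$$\varphi(lm, k)\, \varphi(lmk, m^{-1}l^{-1}) = \varphi(m, k)\, \varphi(mk, m^{-1})\, \varphi(l, mkm^{-1})\, \varphi(lmkm^{-1}, l^{-1}).$$
Using Lemma 5.1 we have
$$\begin{aligned}
\varphi(lm, k)\, \varphi(lmk, m^{-1}l^{-1})
&= \varphi(lm, k)\, \varphi(l, m)\, \varphi(m^{-1}, l^{-1})\, \varphi(lmk, m^{-1}l^{-1}) \\
&= \varphi(l, mk)\, \varphi(m, k)\, \varphi(lmk, m^{-1})\, \varphi(lmkm^{-1}, l^{-1}) \\
&= \varphi(m, k)\, \varphi(lmkm^{-1}, l^{-1})\, \varphi(l, mkm^{-1})\, \varphi(mk, m^{-1}).
\end{aligned}$$
The second equation is an easy consequence of (24). For the third equation we have
$$\begin{aligned}
\tilde{T}^{k,g}\, \tilde{T}^{k',g}
&= \sum_{l \in K} \varphi(l, k)\, \varphi(lk, l^{-1})\, \tilde{F}_\xi^{lkl^{-1},\,lg^{-1}} \sum_{l' \in K} \varphi(l', k')\, \varphi(l'k', l'^{-1})\, \tilde{F}_\xi^{l'k'l'^{-1},\,l'g^{-1}} \\
&= \sum_{l \in K} \varphi(l, k)\, \varphi(lk, l^{-1})\, \varphi(l, k')\, \varphi(lk', l^{-1})\, \varphi(lkl^{-1}, lk'l^{-1})\, \tilde{F}_\xi^{lkk'l^{-1},\,lg^{-1}}.
\end{aligned}$$
So we need to show that
$$\varphi(l, k)\, \varphi(lk, l^{-1})\, \varphi(l, k')\, \varphi(lk', l^{-1})\, \varphi(lkl^{-1}, lk'l^{-1}) = \varphi(k, k')\, \varphi(l, kk')\, \varphi(lkk', l^{-1}),$$
which can be proved by
$$\begin{aligned}
&\varphi(l, k)\, \varphi(lk, l^{-1})\, \varphi(l, k')\, \varphi(lk', l^{-1})\, \varphi(lkl^{-1}, lk'l^{-1}) \\
&\quad= \varphi(l, k)\, \varphi(l, k')\, \varphi(lk', l^{-1})\, \varphi(lk, k'l^{-1})\, \varphi(l^{-1}, lk'l^{-1}) \\
&\quad= \varphi(l, k')\, \varphi(lk', l^{-1})\, \varphi(l^{-1}, lk'l^{-1})\, \varphi(l, kk'l^{-1})\, \varphi(k, k'l^{-1}) \\
&\quad= \varphi(l, k'l^{-1})\, \varphi(k', l^{-1})\, \varphi(l^{-1}, lk'l^{-1})\, \varphi(l, kk'l^{-1})\, \varphi(k, k'l^{-1}) \\
&\quad= \varphi(l, k'l^{-1})\, \varphi(l^{-1}, lk'l^{-1})\, \varphi(l, kk'l^{-1})\, \varphi(kk', l^{-1})\, \varphi(k, k') \\
&\quad= \varphi(l^{-1}, l)\, \varphi(e, k'l^{-1})\, \varphi(l, kk'l^{-1})\, \varphi(kk', l^{-1})\, \varphi(k, k') \\
&\quad= \varphi(k, k')\, \varphi(l, kk')\, \varphi(lkk', l^{-1}).
\end{aligned}$$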
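For the last equation we have
$$\begin{aligned}
(\tilde{T}^{k,g})^\dagger
&= \sum_{l \in K} \varphi(l, k)^*\, \varphi(lk, l^{-1})^*\, \bigl(\tilde{F}_\xi^{lkl^{-1},\,lg^{-1}}\bigr)^\dagger \\
&= \sum_{l \in K} \varphi(k^{-1}, l^{-1})\, \varphi(l, k^{-1}l^{-1})\, \tilde{F}_\xi^{lk^{-1}l^{-1},\,lg^{-1}} \\
&= \sum_{l \in K} \varphi(l, k^{-1})\, \varphi(lk^{-1}, l^{-1})\, \tilde{F}_\xi^{lk^{-1}l^{-1},\,lg^{-1}} \\
&= \tilde{T}^{k^{-1},\,g}.
\end{aligned}$$

This proposition gives a full characterization of the algebra $\tilde{\mathcal{C}}_\xi$. The next step is to compute the induced representation of $D(G)$. Let $|\tilde{\psi}_K\rangle$ be the ground state of $\tilde{H}_{G,K}$. Define $|\tilde{\psi}_K^{k,g}\rangle = \tilde{T}^{k,g} |\tilde{\psi}_K\rangle$, and let $\mathcal{A}(K, \varphi)$ be the span of these vectors. Using (26) and (27) the representation $\mathcal{A}(K, \varphi)$ of $D(G)$ is given by
$$A^h_{s_1} |\tilde{\psi}_K^{k,g}\rangle = |\tilde{\psi}_K^{k,hg}\rangle, \qquad B^h_{s_1} |\tilde{\psi}_K^{k,g}\rangle = \delta_{h,\,gkg^{-1}}\, |\tilde{\psi}_K^{k,g}\rangle.$$
To compute the character of this representation we need the following lemma.

Lemma 5.2. The following equations hold for every $k, k' \in K$ and $g, g' \in G$:
1. $\langle \tilde{\psi}_K | \tilde{F}_\xi^{k,g} | \tilde{\psi}_K \rangle = \frac{1}{|G|}\, \delta_{k,e}$.
2. $\langle \tilde{\psi}_K^{k,g} | \tilde{\psi}_K^{k',g'} \rangle = 0$ if $gK \neq g'K$.
3. $\langle \tilde{\psi}_K^{k,g} | \tilde{\psi}_K^{k',g} \rangle = \frac{|K|}{|G|}\, \delta_{k,k'}$.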
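Proof. The proof follows from similar steps as in the proof of Lemma 3.1 and (18).

The following theorem gives a generalization of (19).

Theorem 5.1 [6]. The character of the representation $\mathcal{A}(K, \varphi)$ of $D(G)$ is given by
$$\chi_{\mathcal{A}(K,\varphi)}(gh^*) = \frac{1}{|K|} \sum_{x \in G} \delta_{gh,hg}\; \delta_{xgx^{-1} \in K}\; \delta_{xhx^{-1} \in K}\; \varphi(xgx^{-1} | xhx^{-1}), \tag{31}$$
where $\varphi(k|l) = \varphi(k, l)\, \varphi(klk^{-1}, k)^{-1}$, which equals $\varphi(k, l)\, \varphi(kl, k^{-1})$ if $\varphi$ satisfies Lemma 5.1.

Proof. Let $r = |G|/|K|$ and assume that $G = g_1 K \cup \cdots \cup g_r K$. Then by Proposition 5.2 and Lemma 5.2, $\{\sqrt{r}\, |\tilde{\psi}_K^{k,g_i}\rangle : k \in K,\ i = 1, \dots, r\}$ is an orthonormal basis for $\mathcal{A}(K, \varphi)$. Fix $g, h \in G$ and let $1 \le \sigma(i) \le r$ and $k_i \in K$ be such that $gg_i = g_{\sigma(i)} k_i$. We have
$$\begin{aligned}
\chi_{\mathcal{A}(K,\varphi)}(gh^*)
&= r \sum_{k \in K} \sum_{i=1}^{r} \langle \tilde{\psi}_K^{k,g_i} | gh^* | \tilde{\psi}_K^{k,g_i} \rangle \\
&= r \sum_{k \in K} \sum_{i=1}^{r} \delta_{h,\,g_i k g_i^{-1}}\, \langle \tilde{\psi}_K^{k,g_i} | \tilde{\psi}_K^{k,gg_i} \rangle \\
&= r \sum_{k \in K} \sum_{i=1}^{r} \delta_{h,\,g_i k g_i^{-1}}\, \langle \tilde{\psi}_K^{k,g_i} | \tilde{\psi}_K^{k,\,g_{\sigma(i)}k_i} \rangle \\
&= r \sum_{k \in K} \sum_{i=1}^{r} \delta_{h,\,g_i k g_i^{-1}}\, \varphi(k_i | k)\, \langle \tilde{\psi}_K^{k,g_i} | \tilde{\psi}_K^{k_i k k_i^{-1},\,g_{\sigma(i)}} \rangle \\
&= r \sum_{k \in K} \sum_{i=1}^{r} \delta_{h,\,g_i k g_i^{-1}}\, \varphi(k_i | k)\, \delta_{i,\sigma(i)}\, \delta_{k,\,k_i k k_i^{-1}}\, \frac{|K|}{|G|} \\
&= \sum_{i=1}^{r} \delta_{g_i^{-1} g g_i \in K}\; \delta_{g_i^{-1} h g_i \in K}\; \delta_{g_i^{-1} hg g_i,\; g_i^{-1} gh g_i}\; \varphi(g_i^{-1} g g_i | g_i^{-1} h g_i) \\
&= \frac{1}{|K|} \sum_{x \in G} \delta_{gh,hg}\; \delta_{xgx^{-1} \in K}\; \delta_{xhx^{-1} \in K}\; \varphi(xgx^{-1} | xhx^{-1}).
\end{aligned}$$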
Now to compute the condensed anyons we just need to decompose the character of the representation $\mathcal{A}(K, \varphi)$ into irreducible ones. An excitation $(a, \pi)$ gets condensed at the boundary if $\chi_{(a,\pi)}$ appears in this decomposition.

Remark 5.3. Note that if $\varphi'$ is another 2-cocycle equivalent to $\varphi$ then $\varphi'(k|l) = \varphi(k|l)$ for every $k, l$ where $kl = lk$. Therefore, Theorem 5.1 still holds even if $\varphi$ does not satisfy Lemma 5.1.

Remark 5.4. $\mathcal{A}(K, \varphi)$ gives a representation of $D(G)$, so it is an object of $\mathcal{Z}(G)$. On the other hand, by Proposition 5.2, an algebra structure is defined on $\mathcal{A}(K, \varphi)$. Davydov has shown that these algebras are indeed all maximal indecomposable separable commutative algebras of $\mathcal{Z}(G)$ [6].

6. A Domain Wall Between Two Phases

We now consider two phases corresponding to two groups $G$ and $G'$ and a domain wall between them, and want to study tunneling of anyons from the $G$-phase to the $G'$-phase. More precisely, we consider a planar lattice divided into two parts by a defect line; we associate the right half-plane to the quantum double model corresponding to group $G$, and the left half-plane to group $G'$; the terms of the Hamiltonian on the domain wall then depend on a subgroup $U \subseteq G \times G'$ and a 2-cocycle $\varphi \in H^2(U, \mathbb{C}^\times)$.

To understand these terms we can simply use the folding idea, i.e., we fold the plane through the defect line. Then we have one half-plane with a boundary. In this case, there are two vectors living on each edge of the lattice: one corresponding to $G$ and the other to $G'$. In other words, the Hilbert space associated with each edge of the lattice is $\mathbb{C}G \otimes \mathbb{C}G' \simeq \mathbb{C}(G \times G')$. Moreover, the boundary is parametrized by $U$ and the whole Hamiltonian would be equal to $\tilde{H}_{G \times G', U}$ given by (23).
Again, the bulk excitations correspond to irreps of $D(G \times G')$, which are basically pairs of irreps of $D(G)$ and $D(G')$ and can be interpreted as two anyons, one belonging to the $G$-phase and the other to the $G'$-phase. More formally, the bulk excitations correspond to $\mathcal{Z}(G \times G')$, which is equivalent to $\mathcal{Z}(G) \boxtimes \mathcal{Z}(G')$ [6].

Here there is a technical point. After folding, we change the orientation of the right half-plane, and then the braiding operator $C_{Y,Y'}$ (see Sect. 2.4) will change to $C^{-1}_{Y',Y}$. That is why the excitation $X \boxtimes Y$ of $\tilde{H}_{G \times G', U}$ corresponds to the pair of anyons $(X, Y^{op})$ in the unfolded plane, where if $Y = (y, \pi)$ then $Y^{op} = (y, \pi^*)$ (see [6] for the definition of the opposite category). Therefore, in the unfolded plane, anyons on the right-hand side indeed live in the category $\mathcal{Z}(G)^{op}$. However, as we will show later in this section, $\mathcal{Z}(G)^{op}$ and $\mathcal{Z}(G)$ are equivalent categories.

Now assume that an anyon $X$ in the $G$-phase, without creating any excitation on the domain wall, tunnels to the $G'$-phase and becomes $Y$. This process in the folded half-plane is equivalent to creating $X \boxtimes Y^{op}$ and moving it to the boundary so that it disappears, i.e., condensation of $X \boxtimes Y^{op}$. Since we know a classification of condensations of $\tilde{H}_{G \times G', U}$ we can characterize tunnelings as well.

Let us give an example to clarify our framework. Assume that $G = G'$ and let $U = \Delta(G) = \{(g, g) : g \in G\}$ and $\varphi$ be the trivial cocycle ($\varphi \equiv 1$). To find anyons in this condensation we use Theorem 5.1. For $g = (g_1, g_2)$ and $h = (h_1, h_2)$ in $G \times G$ we have
$$\begin{aligned}
\chi_{\mathcal{A}(\Delta(G),1)}(gh^*)
&= \frac{1}{|\Delta(G)|} \sum_{x,y \in G} \delta_{gh,hg}\; \delta_{xg_1x^{-1},\,yg_2y^{-1}}\; \delta_{xh_1x^{-1},\,yh_2y^{-1}} && (32) \\
&= \delta_{gh,hg}\; \delta_{g_1h_1^* \overset{G}{\sim} g_2h_2^*}\; |Z(g_1) \cap Z(h_1)|, && (33)
\end{aligned}$$
where by $g_1h_1^* \overset{G}{\sim} g_2h_2^*$ we mean that there exists $l \in G$ such that $l(g_1h_1^*)l^{-1} = g_2h_2^*$. Now define
$$\Psi(gh^*) = \sum_{(x,\rho)} \chi_{(x,\rho) \boxtimes (x,\rho^*)}(gh^*), \tag{34}$$
where the summation runs over all irreducible representations $(x, \rho)$ of $D(G)$. We have
$$\begin{aligned}
\Psi(gh^*)
&= \sum_{(x,\rho)} \chi_{(x,\rho)}(g_1h_1^*)\; \chi_{(x,\rho^*)}(g_2h_2^*) && (35) \\
&= \delta_{g_1h_1,h_1g_1}\; \delta_{g_2h_2,h_2g_2}\; \delta_{h_1 \overset{G}{\sim} h_2} \sum_{(h_1,\rho)} \mathrm{tr}_\rho(g_1)\; \mathrm{tr}_{\rho^*}(k_{h_2}^{-1} g_2 k_{h_2}) && (36) \\
&= \delta_{gh,hg}\; \delta_{h_1 \overset{G}{\sim} h_2}\; \delta_{g_1 \overset{Z(h_1)}{\sim} (k_{h_2}^{-1} g_2 k_{h_2})}\; |Z_{Z(h_1)}(g_1)| && (37) \\
&= \delta_{gh,hg}\; \delta_{g_1h_1^* \overset{G}{\sim} g_2h_2^*}\; |Z(g_1) \cap Z(h_1)|, && (38)
\end{aligned}$$
where in the third line we use the orthogonality relations in the character table of $Z(h_1)$. As a result, $\chi_{\mathcal{A}(\Delta(G),1)} = \Psi$, or equivalently anyons of the form $X \boxtimes X^{op}$ get condensed.$^5$

5 This example indeed shows that the map $X \mapsto X^{op}$ gives the equivalence between the two categories $\mathcal{Z}(G)$ and $\mathcal{Z}(G)^{op}$.
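The equality between (32) and (33) can be confirmed by exhaustive enumeration for a small group. The sketch below is illustrative only: it assumes $G = S_3$ as permutation tuples, reads the relation $g_1h_1^* \sim g_2h_2^*$ as simultaneous conjugation ($lg_1l^{-1} = g_2$ and $lh_1l^{-1} = h_2$ for a common $l$), and uses hypothetical function names.

```python
from fractions import Fraction
from itertools import permutations, product

G = list(permutations(range(3)))  # S_3 as permutation tuples (illustrative choice)


def mul(p, q):
    """Group product p*q: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def inv(p):
    """Inverse permutation."""
    out = [0] * len(p)
    for i, image in enumerate(p):
        out[image] = i
    return tuple(out)


def conj(x, g):
    """Return x g x^{-1}."""
    return mul(mul(x, g), inv(x))


def commute(a, b):
    return mul(a, b) == mul(b, a)


def chi_diagonal(g1, g2, h1, h2):
    """Double sum (32) for U = Delta(G) and trivial cocycle."""
    if not (commute(g1, h1) and commute(g2, h2)):
        return Fraction(0)
    count = sum(
        1
        for x, y in product(G, repeat=2)
        if conj(x, g1) == conj(y, g2) and conj(x, h1) == conj(y, h2)
    )
    return Fraction(count, len(G))  # |Delta(G)| = |G|


def closed_form(g1, g2, h1, h2):
    """Closed form (33), with '~' read as simultaneous conjugation."""
    if not (commute(g1, h1) and commute(g2, h2)):
        return 0
    if not any(conj(l, g1) == g2 and conj(l, h1) == h2 for l in G):
        return 0
    return sum(1 for z in G if commute(z, g1) and commute(z, h1))
```

The two functions agree on every quadruple $(g_1, g_2, h_1, h_2)$ of elements of $S_3$, which matches the counting argument: each admissible $l = y^{-1}x$ contributes $|G|$ pairs $(x, y)$.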
6.1. A non-trivial auto-equivalence of $\mathcal{Z}(\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times)$. The tunneling process may give an equivalence between two phases $G$ and $G'$. Suppose that the condensation corresponding to $\mathcal{A}(U, \varphi)$ ($U \subseteq G \times G'$) is described by the character $\chi_{\mathcal{A}(U,\varphi)} = \sum_i \chi_{(X_i \boxtimes Y_i)}$. Then $X_i$ after tunneling, without creating any excitation at the domain wall, is changed to $Y_i$. On the other hand, fusions and braidings are invariant under tunneling. Therefore, if the $X_i$'s and $Y_i$'s are all simple objects of $\mathcal{Z}(G)$ and $\mathcal{Z}(G')$ respectively, then $X_i \mapsto Y_i^{op}$ gives an equivalence between anyons of the $G$-phase and the $G'$-phase.$^6$ Using this idea we show a non-trivial symmetry in $\mathcal{Z}(\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times)$.

Let $\mathbb{F}_q$ be the finite field with $q$ elements and denote its additive and multiplicative groups by $\mathbb{F}_q^+$ and $\mathbb{F}_q^\times$ respectively. Then the semidirect product of these groups is defined as follows. We represent elements of $\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times$ by $(a, \alpha)$, where $a \in \mathbb{F}_q^+$ and $\alpha \in \mathbb{F}_q^\times$, and define $(a, \alpha)(a', \alpha') = (a + \alpha \times a', \alpha \times \alpha')$, which by abuse of notation is denoted by $(a + \alpha a', \alpha\alpha')$. The identity element of this group is $e = (0, 1)$ and the inverse of $(a, \alpha)$ is equal to $(a, \alpha)^{-1} = (-\alpha^{-1}a, \alpha^{-1})$.

We will use the following properties of $\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times$. The conjugacy class of $(a, \alpha)$ is $\overline{(a, \alpha)} = \{(b, \alpha) : b \in \mathbb{F}_q^+\}$ if $\alpha \neq 1$, and $\overline{(1, 1)} = \{(b, 1) : b \in \mathbb{F}_q^+,\ b \neq 0\}$. $K = \{(a, 1) : a \in \mathbb{F}_q^+\}$ is a normal subgroup of $\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times$ isomorphic to $\mathbb{F}_q^+$, and $K = Z(1, 1)$. Also note that for every $(a, \alpha)$, where $\alpha \neq 1$, $|Z(a, \alpha)| = q - 1$ and $K \cap Z(a, \alpha) = \{e\}$. Consider a non-trivial irrep of $K$, and let $\pi$ be the corresponding induced representation on $\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times$. Then $\mathrm{tr}_\pi(e) = q - 1$, $\mathrm{tr}_\pi(1, 1) = -1$, and $\mathrm{tr}_\pi(a, \alpha) = 0$ if $\alpha \neq 1$. Since $\sum_{(a,\alpha)} |\mathrm{tr}_\pi(a, \alpha)|^2 = |\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times|$, $\pi$ is an irreducible representation of $\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times$.

Theorem 6.1. There exists an auto-equivalence of $\mathcal{Z}(\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times)$ whose corresponding permutation on simple objects is of the form $P \circ J$, where $J$ sends every object to its charge conjugation ($J : X \mapsto X^\vee$) and $P$ is the transposition of the chargeon $C = (e, \pi)$ and fluxion $F = (\overline{(1, 1)}, 1)$.

Proof. Define $U = \{((a_1, \alpha), (a_2, \alpha^{-1})) : a_1, a_2 \in \mathbb{F}_q^+,\ \alpha \in \mathbb{F}_q^\times\}$. Let $p$ be the characteristic of $\mathbb{F}_q$ (so $q$ is a power of $p$), and let $\omega$ be a $p$-th root of unity ($\omega^p = 1$). Additionally, assume that $\mathrm{tr}_p : \mathbb{F}_q \to \mathbb{F}_p$ is the trace function, i.e., $\mathrm{tr}_p(a)$ is equal to the trace of the $\mathbb{F}_p$-linear map $x \mapsto ax$. Now define $\varphi : U \times U \to \mathbb{C}^\times$ by
$$\varphi(g, h) = \omega^{\mathrm{tr}_p(\alpha a_2 b_1)}, \tag{39}$$
where $g = ((a_1, \alpha), (a_2, \alpha^{-1}))$ and $h = ((b_1, \beta), (b_2, \beta^{-1}))$. $\varphi$ satisfies (21), and then $\varphi \in H^2(U, \mathbb{C}^\times)$. In Appendix A it is shown that the character of the representation $\mathcal{A}(U, \varphi)$ is given by $\chi_{\mathcal{A}(U,\varphi)}(gh^*) = 0$ if $g$ or $h$ is not in $U$, and
$$\chi_{\mathcal{A}(U,\varphi)}(gh^*) = \begin{cases} \delta_{gh,hg}\, \delta_{g,h \in U}\, (q - 1) & \text{if } \alpha \neq 1 \text{ or } \beta \neq 1, \\ \delta_{gh,hg}\, \delta_{g,h \in U} \bigl( \delta_{a_1b_2,\,a_2b_1} (q - 1) - \delta_{a_1b_2 \neq a_2b_1} \bigr) & \text{if } \alpha = \beta = 1, \end{cases} \tag{40}$$
if $g = ((a_1, \alpha), (a_2, \alpha^{-1}))$ and $h = ((b_1, \beta), (b_2, \beta^{-1}))$ belong to $U$. Furthermore, it is shown that $\chi_{\mathcal{A}(U,\varphi)} = \Omega - \Theta$, where
$$\Omega(gh^*) = \sum_{(x,\rho)} \chi_{(x,\rho) \boxtimes (x^{-1},\rho)}(gh^*), \tag{41}$$
and $\Theta = \chi_{C \boxtimes C} + \chi_{F \boxtimes F} - \chi_{C \boxtimes F} - \chi_{F \boxtimes C}$. As a result, $\mathcal{A}(U, \varphi)$ gives an auto-equivalence of $\mathcal{Z}(\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times)$ which transposes $C$ and $F$ and sends $(x, \rho) \neq C, F$ to $(x^{-1}, \rho)^{op} = (x, \rho)^\vee$.

6 $X_i \mapsto Y_i$ gives the equivalence $\mathcal{Z}(G) \simeq \mathcal{Z}(G')^{op}$; by combining it with the equivalence $\mathcal{Z}(G')^{op} \simeq \mathcal{Z}(G')$ we find that $\mathcal{Z}(G) \simeq \mathcal{Z}(G')$ is given by $X_i \mapsto Y_i^{op}$.

The simplest example of this theorem, $q = 2$, gives the group $\mathbb{Z}_2$, and the corresponding auto-equivalence is described in Sect. 1. For $q = 3$ the group $\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times$ is isomorphic to $S_3$, and the chargeon and fluxion constructed in the proof correspond to representations $C$ and $F$ described in Sect. 2.5. Moreover, in $\mathcal{Z}(S_3)$ the charge conjugation of each particle is itself. Thus this auto-equivalence of $\mathcal{Z}(S_3)$ only transposes $C$ and $F$, which means that these two particles in $\mathcal{Z}(S_3)$ are indistinguishable.

In Appendix B we show that a group $G$ has a symmetry similar to that of $\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times$ only if $G \simeq H^+ \rtimes H^\times$, where $H$ is a finite near-field.

6.2. When are $\mathcal{Z}(G)$ and $\mathcal{Z}(G')$ equivalent? The proof of Theorem 6.1 is based on the fact that there exists $\mathcal{A}(U, \varphi)$ such that $\chi_{\mathcal{A}(U,\varphi)} = \sum_i \chi_{X_i \boxtimes Y_i}$ gives a permutation between anyons of the two phases. So if we find the necessary and sufficient condition for the existence of such $U \subseteq G \times G'$ and $\varphi \in H^2(U, \mathbb{C}^\times)$ we can answer the question of whether $\mathcal{Z}(G)$ and $\mathcal{Z}(G')$ are equivalent. This question was first answered by Naidu and Nikshych [11,12] based on the classification of Lagrangian subcategories of $\mathcal{Z}(G)$. Here we state the necessary and sufficient conditions of Davydov, which are more appropriate for us.

Theorem 6.2 [6]. An equivalence between $\mathcal{Z}(G)$ and $\mathcal{Z}(G')$ corresponds to a subgroup $U \subseteq G \times G'$, and $\varphi \in H^2(U, \mathbb{C}^\times)$ such that
1. the projections of $U$ onto the first and second components are equal to $G$ and $G'$ respectively, and
2. the restriction of $\varphi(\cdot|\cdot)$ (defined in Theorem 5.1) on $(U \cap (G \times \{e\})) \times (U \cap (\{e\} \times G'))$ is non-degenerate.
Moreover, if such a $U$ and $\varphi$ exist, the map of the corresponding equivalence on simple objects can be computed by decomposing $\chi_{\mathcal{A}(U,\varphi)}$ into irreducible characters of $D(G \times G')$; if $X \boxtimes Y$ appears in this decomposition, then the equivalence sends $X$ to $Y^{op}$.
Observe that the subgroup $U$ and 2-cocycle $\varphi$ defined in the proof of Theorem 6.1 satisfy the conditions of this theorem. The framework of tunneling can be considered for any $U$ and $\varphi$ and not necessarily those given by the above theorem. However, in general we obtain an equivalence between certain subcategories of $\mathcal{Z}(G)$ and $\mathcal{Z}(G')$ and not necessarily the whole categories (see Theorem 2.5.1 of [6]).
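The group-theoretic facts used in Sect. 6.1 can be checked by direct enumeration when $q = p$ is prime, in which case $\mathbb{F}_q$ is the integers modulo $p$ and $\mathrm{tr}_p$ is the identity map. The following sketch makes exactly that simplifying assumption (it does not cover prime powers), and all function names are hypothetical. The cocycle (39) is tested through the exponent of $\omega$, so condition (21) becomes an exact congruence modulo $p$.

```python
from itertools import product


def make_group(p):
    """Elements (a, alpha) of the semidirect product F_p^+ x| F_p^x, p prime."""
    return [(a, al) for a in range(p) for al in range(1, p)]


def mul(g, h, p):
    """(a, alpha)(a', alpha') = (a + alpha*a', alpha*alpha')."""
    return ((g[0] + g[1] * h[0]) % p, (g[1] * h[1]) % p)


def inv(g, p):
    """(a, alpha)^{-1} = (-alpha^{-1} a, alpha^{-1})."""
    al_inv = pow(g[1], -1, p)  # modular inverse, requires Python >= 3.8
    return ((-al_inv * g[0]) % p, al_inv)


def conj_class(g, p):
    """Conjugacy class of g."""
    return {mul(mul(x, g, p), inv(x, p), p) for x in make_group(p)}


def centralizer(g, p):
    """Centralizer Z(g)."""
    return {x for x in make_group(p) if mul(x, g, p) == mul(g, x, p)}


def make_U(p):
    """U = {((a1, alpha), (a2, alpha^{-1}))} inside G x G."""
    return [
        ((a1, al), (a2, pow(al, -1, p)))
        for a1 in range(p) for a2 in range(p) for al in range(1, p)
    ]


def mul_U(g, h, p):
    """Componentwise product in G x G."""
    return (mul(g[0], h[0], p), mul(g[1], h[1], p))


def exponent(g, h, p):
    """Exponent of omega in (39): alpha * a2 * b1 (tr_p is the identity here)."""
    (_, al), (a2, _) = g
    (b1, _), _ = h
    return (al * a2 * b1) % p


def satisfies_cocycle_condition(p):
    """Check (21) for phi = omega^exponent on all triples of U."""
    U = make_U(p)
    return all(
        (exponent(mul_U(k, l, p), m, p) + exponent(k, l, p)) % p
        == (exponent(k, mul_U(l, m, p), p) + exponent(l, m, p)) % p
        for k, l, m in product(U, repeat=3)
    )
```

For example, with `p = 5` the class of `(2, 3)` is all `(b, 3)`, the class of `(1, 1)` is all `(b, 1)` with `b != 0`, and every element with `alpha != 1` has a centralizer of size `p - 1`. The cocycle check enumerates $|U|^3$ triples, so it is practical only for small $p$.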
7. Conclusion

In this paper we defined the quantum double model with boundaries and found the corresponding condensations. Our work is based on the characterization of algebras in $\mathcal{Z}(G)$. However, the algebras that we constructed are the maximal ones classified in [6]. Indeed, indecomposable separable commutative algebras of $\mathcal{Z}(G)$ are indexed by $\mathcal{A}(M, K, \varphi, \varepsilon)$, where $K \subseteq M$ are subgroups of $G$ and $K$ is normal in $M$, $\varphi \in H^2(K, \mathbb{C}^\times)$, and $\varepsilon$ is some extension of $\varphi$ to $M \times K$. $\mathcal{A}(M, K, \varphi, \varepsilon)$ is maximal if $M = K$ and in this case $\varepsilon$ is uniquely determined in terms of $\varphi$. It is an interesting question whether we can define a boundary for the quantum double model so that the corresponding condensation is given by $\mathcal{A}(M, K, \varphi, \varepsilon)$. Since the model of [2] is defined based on two subgroups of $G$, a combination of the ideas of the current paper and [2] may answer this question.

The condensations that we characterized are indeed single-quasiparticle excitations, and we know that such excitations do not exist in the usual quantum double model. In the extended quantum double model of [2], however, single-quasiparticles are possible on surfaces with non-trivial topology. So it is interesting to see what happens to boundaries on surfaces beyond plane and sphere.

Characterization of confinements as well as edge excitations in these models is another important problem. Classification of edge excitations will clarify domain wall excitations as well.

Our general framework for studying boundaries allows us to examine the known facts about the toric code with boundary for the non-abelian quantum double models. See [13,14] for some results in this direction.

In the second part of the paper, by applying the folding idea we considered the problem of tunneling of an excitation from one phase to another one, and then explained the necessary and sufficient conditions on two groups $G$ and $G'$ such that $\mathcal{Z}(G) \simeq \mathcal{Z}(G')$. Based on this approach, we found a non-trivial auto-equivalence of $\mathcal{Z}(\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times)$. Finding other such symmetries and their applications is also of interest. For example, Bombin in [7], using the symmetry in the case of $G = \mathbb{Z}_2$, has realized Ising anyons from an abelian model.

Acknowledgements. This paper would have never had this shape without several helpful discussions with Alexei Kitaev, so we gratefully acknowledge him. We are also thankful to Miguel A. Martin-Delgado for introducing his work on condensations in the Kitaev model, and Liang Kong, Chris Heunen, Alexei Davydov, and John Preskill for many clarifications.
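Appendix A. Proof of Theorem 6.1

To compute $\chi_{\mathcal{A}(U,\varphi)}$ we use Theorem 5.1. Since $U$ is a normal subgroup we have
$$\chi_{\mathcal{A}(U,\varphi)}(gh^*) = \frac{1}{|U|}\, \delta_{gh,hg}\, \delta_{g,h \in U} \sum_{k} \varphi(kgk^{-1}, khk^{-1})\, \varphi(khk^{-1}, kgk^{-1})^{-1}.$$
Letting $g = ((a_1, \alpha), (a_2, \alpha^{-1}))$, $h = ((b_1, \beta), (b_2, \beta^{-1}))$ and $k = ((x_1, \theta), (x_2, \lambda))$, we have
$$\begin{aligned}
kgk^{-1} &= ((x_1 + \theta a_1 - \alpha x_1,\ \alpha),\ (x_2 + \lambda a_2 - \alpha^{-1} x_2,\ \alpha^{-1})), \\
khk^{-1} &= ((x_1 + \theta b_1 - \beta x_1,\ \beta),\ (x_2 + \lambda b_2 - \beta^{-1} x_2,\ \beta^{-1})),
\end{aligned}$$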
688
S. Beigi, P. W. Shor, D. Whalen
and thus ϕ(kgk −1 , khk −1 ) = ωtr p ((αx2 +αλa2 −x2 )(x1 +θb1 −βx1 )) , ϕ(khk −1 , kgk −1 ) = ωtr p ((βx2 +βλb2 −x2 )(x1 +θa1 −αx1 )) . Now observe that gh = hg is equivalent to b1 (α − 1) = a1 (β − 1) and α(1 − β)a2 = β(1−α)b2 . So if g and h commute, ϕ(kgk −1 , khk −1 )ϕ(khk −1 , kgk −1 )−1 is independent of x1 , x2 , and we have χA(U,ϕ) (gh ∗ ) =
1 δgh,hg δg,h∈U |U |
ωtr p (θλ(αa2 b1 −βb2 a1 )) .
x1 ,x2 ,θ,λ
Therefore, χA(U,ϕ) (gh ∗ ) =
δgh,hg δg,h∈U (q − 1) if αb1 a2 = βb2 a1 , −δgh,hg δg,h∈U
if αb1 a2 = βb2 a1 .
(42)
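The two values in (42) come from a character sum over $\theta$ and $\lambda$. The following is a minimal numerical sketch of that step, not part of the original proof. It assumes that $q$ is prime (so that $\mathrm{tr}_p$ is the identity map), that $\theta, \lambda$ range over the nonzero residues, and that $|U| = q^2(q-1)$; the function names are hypothetical.

```python
import cmath

def character_sum(q, c):
    """Sum of omega^(theta*lam*c) over nonzero theta, lam mod q (q prime)."""
    omega = cmath.exp(2j * cmath.pi / q)
    return sum(omega ** ((theta * lam * c) % q)
               for theta in range(1, q)
               for lam in range(1, q))

def chi_value(q, c):
    """Value of the normalized sum for commuting g, h in U.

    c stands for alpha*a2*b1 - beta*b2*a1.  The summand does not depend on
    x1, x2, so each of them contributes a factor q.  The size of U is an
    assumption of this sketch: |U| = q^2 (q - 1).
    """
    size_U = q * q * (q - 1)
    return q * q * character_sum(q, c) / size_U

# c = 0 corresponds to the first case of (42), c != 0 to the second.
print(round(chi_value(5, 0).real, 9), round(chi_value(5, 3).real, 9))
```

For $c = 0$ every term equals 1, giving $q - 1$ after normalization; for $c \neq 0$ the inner sum over $\lambda$ equals $-1$ for each $\theta$, giving $-1$.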
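Note that if either $\alpha$ or $\beta$ is not equal to 1, then $g, h \in U$ and $gh = hg$ imply $\alpha b_1 a_2 = \beta b_2 a_1$. Thus (42) can be simplified to
\[ \chi_{A(U,\varphi)}(gh^*) = \begin{cases} \delta_{gh,hg}\,\delta_{g,h\in U}\,(q - 1) & \text{if } \alpha \neq 1 \text{ or } \beta \neq 1, \\ \delta_{gh,hg}\,\delta_{g,h\in U}\,\bigl(\delta_{a_1 b_2, a_2 b_1}(q - 1) - \delta_{a_1 b_2 \neq a_2 b_1}\bigr) & \text{if } \alpha = \beta = 1. \end{cases} \]
We now need to decompose $\chi_{A(U,\varphi)}$ into irreducible characters. Let
\[ \Theta(gh^*) = \sum_{(x,\rho)} \chi_{(x,\rho)(x^{-1},\rho)}(gh^*). \]
By the same steps as in the computation in (35)-(38) we find that
\[ \Theta(gh^*) = \delta_{gh,hg}\,\delta_{g_1 h_1^* \sim g_2^{-1}(h_2^{-1})^*}\,|Z(g_1) \cap Z(h_1)|, \]
where $g = (g_1, g_2) = ((a_1, \alpha), (a_2, \alpha'))$ and $h = (h_1, h_2) = ((b_1, \beta), (b_2, \beta'))$.

Observe that if $gh = hg$ and either $\alpha \neq 1$ or $\beta \neq 1$, then $g_1 h_1^* \sim g_2^{-1}(h_2^{-1})^*$ is equivalent to $g, h \in U$. This fact can be verified simply by writing these conditions in terms of $a_1, a_2, \alpha$, etc. Also in this case $g, h \in U$ and $gh = hg$ imply $|Z(g_1) \cap Z(h_1)| = q - 1$. Moreover, if $\alpha = \beta = 1$, then $g_1 h_1^* \sim g_2^{-1}(h_2^{-1})^*$ is equivalent to $a_1 b_2 = a_2 b_1$, $g_1 \sim g_2^{-1}$, and $h_1 \sim h_2^{-1}$. Therefore,
\[ \Theta(gh^*) = \begin{cases} \delta_{gh,hg}\,\delta_{g,h\in U}\,(q - 1) & \text{if } \alpha \neq 1 \text{ or } \beta \neq 1, \\ \delta_{gh,hg}\,\delta_{g_1 \sim g_2^{-1}}\,\delta_{h_1 \sim h_2^{-1}}\,\delta_{a_1 b_2, a_2 b_1}\,|Z(g_1) \cap Z(h_1)| & \text{if } \alpha = \beta = 1. \end{cases} \]
Define $\Psi = \chi_{CC} + \chi_{FF} - \chi_{CF} - \chi_{FC}$. Then, writing $\overline{(1,1)}$ for the conjugacy class of $(1,1)$,
\[ \Psi(gh^*) = \bigl(\chi_C(g_1 h_1^*) - \chi_F(g_1 h_1^*)\bigr)\bigl(\chi_C(g_2 h_2^*) - \chi_F(g_2 h_2^*)\bigr) = \delta_{gh,hg}\,\bigl(\delta_{h_1,e}\,\mathrm{tr}_\pi(g_1) - \delta_{h_1 \in \overline{(1,1)}}\bigr)\bigl(\delta_{h_2,e}\,\mathrm{tr}_\pi(g_2) - \delta_{h_2 \in \overline{(1,1)}}\bigr). \]
If either $\alpha \neq 1$ or $\beta \neq 1$, then $\Psi(gh^*) = 0$ and we have $\chi_{A(U,\varphi)}(gh^*) = \Theta(gh^*) = \Theta(gh^*) - \Psi(gh^*)$. Moreover, when $\alpha = \beta = 1$, by considering a few cases one can verify that $\chi_{A(U,\varphi)}(gh^*) = \Theta(gh^*) - \Psi(gh^*)$. For instance, if ($\alpha = \beta = 1$ and) $a_1 = 0$ and $a_2 \neq 0$ we have
\[ \chi_{A(U,\varphi)}(gh^*) = \delta_{g_2 h_2, h_2 g_2}\,\delta_{\alpha' = \beta' = 1}\,\bigl(\delta_{b_1 = 0}(q - 1) - \delta_{b_1 \neq 0}\bigr) = \delta_{\alpha' = \beta' = 1}\,\bigl(\delta_{b_1 = 0}(q - 1) - \delta_{b_1 \neq 0}\bigr), \]
and
\[ \Psi(gh^*) = \delta_{g_2 h_2, h_2 g_2}\,\delta_{\alpha' = \beta' = 1}\,\bigl(\delta_{b_1 = 0}(q - 1) - \delta_{b_1 \neq 0}\bigr)\bigl(-\delta_{b_2 = 0} - \delta_{b_2 \neq 0}\bigr) = \delta_{\alpha' = \beta' = 1}\,\bigl(\delta_{b_1 = 0}(q - 1) - \delta_{b_1 \neq 0}\bigr)(-1), \]
and since $g_1 = e$ is not conjugate with $g_2^{-1} \neq e$, $\Theta(gh^*) = 0$. Thus $\chi_{A(U,\varphi)}(gh^*) = -\Psi(gh^*) = \Theta(gh^*) - \Psi(gh^*)$.

As a result, $A(U, \varphi)$ corresponds to an auto-equivalence of $\mathcal{Z}(\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times)$ which transposes $C$ and $F$ and sends $(x, \rho) \neq C, F$ to $(x^{-1}, \rho)^{\mathrm{op}} = (x, \rho)^\vee$.

Appendix B. Chargeon-Fluxion Symmetry as a Modular Invariant

The $S$-matrix corresponding to $\mathcal{Z}(G)$ is defined in (4). The $T$-matrix is a diagonal matrix that contains the twist numbers of simple objects on the diagonal. For $\mathcal{Z}(G)$, $T$ is given by
\[ T_{(g,\pi)(g,\pi)} = T_{(g,\pi)} = \frac{\mathrm{tr}_\pi(g)}{\mathrm{tr}_\pi(e)}. \qquad (43) \]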
The pair of matrices $(S, T)$ is called a modular data, and a modular invariant corresponding to $(S, T)$ is a matrix $M$ that commutes with both $S$ and $T$, and such that all entries of $M$ are non-negative integers and $M_{00} = 1$ (0 is the trivial object). Clearly, the permutation corresponding to an auto-equivalence of a modular tensor category commutes with both $S$ and $T$ and is a modular invariant. However, a modular invariant need not be a permutation, and then it may not come from an auto-equivalence.

In this section we study permutation matrices which form a modular invariant of $\mathcal{Z}(G)$. In particular, we classify all groups $G$ for which there exists a modular invariant of the form $P$ or $PJ$, where $P$ is a transposition of a chargeon-fluxion pair. Note that $J$ always commutes with both $S$ and $T$ (this can easily be seen from the formulas (4) and (43) in the case of $\mathcal{Z}(G)$; for a proof in the general case see [9]). Thus, $PJ$ is a modular invariant if and only if $P$ is a modular invariant.

B.1. Near-fields. By the result of Sect. 6.1, all groups $\mathbb{F}_q^+ \rtimes \mathbb{F}_q^\times$, defined in terms of a finite field, admit a transposition of a chargeon-fluxion pair as a modular invariant. Here we show that every group with a modular invariant of this form is isomorphic to $\mathbb{H}^+ \rtimes \mathbb{H}^\times$, where $\mathbb{H}$ is a near-field.

Definition B.1. A set $\mathbb{H}$ with two binary operations $+$ and $\times$ is called a near-field if
1. $(\mathbb{H}, +)$ is an abelian group with the identity element 0.
2. $0 \times x = x \times 0 = 0$ for every $x \in \mathbb{H}$.
3. $(\mathbb{H} \setminus \{0\}, \times)$ is a group with the identity element 1.
4. The multiplication is distributive from the left with respect to the addition: $x \times (y + z) = x \times y + x \times z$. (Distributivity from the right is not assumed.)

The class of all finite near-fields is completely known: there is a method for constructing finite near-fields due to Dickson [15], and it has been shown by Zassenhaus [16] that all finite near-fields, except precisely seven of them, are given by Dickson's construction.
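The four axioms of Definition B.1 can be checked mechanically on a small example. The sketch below is only an illustration: it tests the axioms by brute force on the integers modulo $m$, which form a field (hence a near-field) exactly when $m$ is prime. The helper names `is_group` and `is_near_field` are hypothetical.

```python
from itertools import product

def is_group(elems, op, identity):
    """Brute-force check of the group axioms on a finite set."""
    elems = list(elems)
    members = set(elems)
    closed = all(op(x, y) in members for x, y in product(elems, repeat=2))
    assoc = all(op(op(x, y), z) == op(x, op(y, z))
                for x, y, z in product(elems, repeat=3))
    ident = all(op(identity, x) == x == op(x, identity) for x in elems)
    inverses = all(any(op(x, y) == identity == op(y, x) for y in elems)
                   for x in elems)
    return closed and assoc and ident and inverses

def is_near_field(elems, add, mul, zero, one):
    """Check axioms 1-4 of Definition B.1 by exhaustive search."""
    elems = list(elems)
    nonzero = [x for x in elems if x != zero]
    additive = is_group(elems, add, zero) and all(
        add(x, y) == add(y, x) for x, y in product(elems, repeat=2))
    absorbing = all(mul(zero, x) == zero == mul(x, zero) for x in elems)
    multiplicative = is_group(nonzero, mul, one)
    left_distributive = all(
        mul(x, add(y, z)) == add(mul(x, y), mul(x, z))
        for x, y, z in product(elems, repeat=3))
    return additive and absorbing and multiplicative and left_distributive

def integers_mod(m):
    """Integers modulo m with the usual addition and multiplication."""
    return (range(m), lambda x, y: (x + y) % m, lambda x, y: (x * y) % m, 0, 1)

print(is_near_field(*integers_mod(5)), is_near_field(*integers_mod(6)))
```

Modulo 6 the check fails at axiom 3, because $2 \times 3 = 0$ leaves the set of nonzero elements.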
For a near-field $\mathbb{H}$ one can consider an action of $\mathbb{H}^\times$ on $\mathbb{H}^+$ and define a group structure on $\mathbb{H}^+ \rtimes \mathbb{H}^\times$ as follows. Elements of $\mathbb{H}^+ \rtimes \mathbb{H}^\times$ are denoted by $(a, \alpha)$, where $a \in \mathbb{H}^+$ and $\alpha \in \mathbb{H}^\times$, and
\[ (a, \alpha)(b, \beta) = (a + \alpha \times b, \alpha \times \beta). \]
This multiplication turns $\mathbb{H}^+ \rtimes \mathbb{H}^\times$ into a group with the identity element $e = (0, 1)$. (Note that to obtain a group we must define the action of $\mathbb{H}^\times$ on $\mathbb{H}^+$ by multiplication from the left; multiplication from the right does not work.)

$K = \{(a, 1) : a \in \mathbb{H}^+\}$ is a subgroup of $\mathbb{H}^+ \rtimes \mathbb{H}^\times$ isomorphic to $\mathbb{H}^+$. On the other hand, it is easy to see that all elements of $K \setminus \{e\}$ are conjugate. Thus, $K$ is an abelian group all of whose elements, except the identity, have the same order. As a result, the size of this group $|K| = |\mathbb{H}| = q$ is a power of a prime number, and $K \cong \mathbb{H}^+ \cong \mathbb{F}_q^+$.

We will also use the fact that the centralizer of the multiplicative group of every near-field (with more than 2 elements) is non-trivial. This property can be verified by checking Dickson's near-fields as well as the other seven near-fields classified by Zassenhaus (see [17]).

B.2. A group with a chargeon-fluxion symmetry is isomorphic to $\mathbb{H}^+ \rtimes \mathbb{H}^\times$. We now state the main result of this section.

Theorem B.1. Suppose that the permutation matrix $P$ corresponding to a transposition of a chargeon-fluxion pair forms a modular invariant for $\mathcal{Z}(G)$. Then $G \cong \mathbb{H}^+ \rtimes \mathbb{H}^\times$, where $\mathbb{H}$ is a near-field. Conversely, for every group $\mathbb{H}^+ \rtimes \mathbb{H}^\times$ there exists such a modular invariant.

Proof. We first show that there exists a chargeon-fluxion pair in $\mathcal{Z}(\mathbb{H}^+ \rtimes \mathbb{H}^\times)$ that forms a modular invariant. Consider a non-trivial representation of the abelian subgroup $K \subseteq \mathbb{H}^+ \rtimes \mathbb{H}^\times$ (defined above), and denote its induced representation on $\mathbb{H}^+ \rtimes \mathbb{H}^\times$ by $\pi$. Let $C = (e, \pi)$ and $F = (a, 1)$, where $a = (1, 1) \in \mathbb{H}^+ \rtimes \mathbb{H}^\times$. We claim that the permutation $P$ which exchanges $C$ and $F$ is a modular invariant.

By the definition of $\pi$, $\dim\pi = q - 1$, $\mathrm{tr}_\pi(a) = -1$ and $\mathrm{tr}_\pi(h) = 0$ for every $h \notin K$. Then since $\sum_g |\mathrm{tr}_\pi(g)|^2 = q(q - 1)$, $\pi$ is an irreducible representation. Also, a dimension-counting argument shows that all other irreducible representations of $\mathbb{H}^+ \rtimes \mathbb{H}^\times$ come from an irreducible representation of $\mathbb{H}^\times \cong (\mathbb{H}^+ \rtimes \mathbb{H}^\times)/K$, and then for every such representation $\mu$, $\mathrm{tr}_\mu(a) = \mathrm{tr}_\mu(e) = \dim\mu$.

$P$ commutes with $T$ because $T_C = T_F = 1$. To prove $PS = SP$ we should show that $S_{CX} = S_{FX}$ for every irreducible representation $X \neq C, F$ of $D(\mathbb{H}^+ \rtimes \mathbb{H}^\times)$, and $S_{CC} = S_{FF}$. This is a straightforward computation given the structure of $Z(a) = K$ and the irreducible representations of $\mathbb{H}^+ \rtimes \mathbb{H}^\times$ described above.

Now consider an arbitrary group $G$, let $C = (e, \pi)$ be a chargeon and $F = (a, 1)$ a fluxion in $\mathcal{Z}(G)$, and assume that the permutation $P$ which interchanges $C$ and $F$ commutes with the corresponding $S$-matrix. Note that since by the Verlinde formula the fusion rules are computed in terms of $S$, and $PSP^{-1} = S$, the fusion rules are also symmetric with respect to $C$ and $F$. Throughout, $\overline{a}$ denotes the conjugacy class of $a$. We prove $G \cong \mathbb{H}^+ \rtimes \mathbb{H}^\times$ in the following steps:

(a) $\pi \neq 1$ and $a \neq e$. $0 = (e, 1)$ is the unique representation such that $0 \otimes X \cong X$, so its fusion rules cannot be the same as those of any other representation. Therefore, $C$ and $F$ are different from 0.

(b) $\dim\pi = |\overline{a}|$. Since $0 \neq C, F$ we have $S_{C0} = S_{F0}$. Then
\[ \frac{\dim\pi}{|G|} = \frac{1}{|Z(a)|}, \]
or equivalently
\[ \dim\pi = |\overline{a}|. \qquad (44) \]
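(c) $\{e\} \cup \overline{a}$ is a subgroup of $G$, where $\overline{a}$ denotes the conjugacy class of $a$. Let $X = (h, \mu)$ be a representation such that $\overline{h}$ is different from $\{e\}$ and $\overline{a}$. Since $C$ has a trivial magnetic flux, $C \otimes X$ is equivalent to the sum of representations whose magnetic flux is equal to $\overline{h}$. Thus, the magnetic flux of any representation in $F \otimes X$ should also be $\overline{h}$. This means that $\overline{a}\,\overline{h} = \overline{h}$ for any $h \notin \{e\} \cup \overline{a}$. As a result, $\overline{a}\,\overline{a} \subseteq \{e\} \cup \overline{a}$, which implies that $\{e\} \cup \overline{a}$ is closed under multiplication and forms a subgroup.

For every $X = (h, \mu)$ we have
\[ S_{CX} = \frac{1}{|G| \cdot |Z(h)|} \sum_{k \in G} \mathrm{tr}_\pi(kh^{-1}k^{-1})\,\mathrm{tr}_\mu(e) = \frac{\mathrm{tr}_\pi(h^{-1})\,\dim\mu}{|Z(h)|}, \]
and
\[ S_{FX} = \frac{1}{|Z(h)| \cdot |Z(a)|} \sum_{k:\, khk^{-1} \in Z(a)} \mathrm{tr}_\mu(k^{-1}a^{-1}k). \]
Therefore, if $X \neq C, F$,
\[ \frac{\mathrm{tr}_\pi(h^{-1})\,\dim\mu}{|Z(h)|} = \frac{1}{|Z(h)| \cdot |Z(a)|} \sum_{k:\, khk^{-1} \in Z(a)} \mathrm{tr}_\mu(k^{-1}a^{-1}k). \qquad (45) \]

(d) For any irreducible representation $\mu$ of $G$ different from $\pi$ we have $\mathrm{tr}_\mu(a^{-1}) = \mathrm{tr}_\mu(a) = \dim\mu = \mathrm{tr}_\mu(e)$. Let $h = e$ in (45) and note that $k^{-1}a^{-1}k$ is a conjugate of $a^{-1}$ in $Z(h) = G$. Thus $\mathrm{tr}_\mu(k^{-1}a^{-1}k) = \mathrm{tr}_\mu(a^{-1})$ and
\[ \frac{\dim\pi\,\dim\mu}{|G|} = \frac{\mathrm{tr}_\mu(a^{-1})}{|Z(a)|}. \]
Then by (44) we obtain $\mathrm{tr}_\mu(a^{-1}) = \dim\mu$.

(e) $\mathrm{tr}_\pi(h) = 0$ for any $h \notin \{e\} \cup \overline{a}$, and $\mathrm{tr}_\pi(a) = \mathrm{tr}_\pi(a^{-1}) = -1$. The column $h$ of the character table of $G$ is orthogonal to columns $e$ and $a$. On the other hand, by (d) columns $e$ and $a$ coincide except at the representation $\pi$. Therefore, $\mathrm{tr}_\pi(h) = 0$. $\mathrm{tr}_\pi(a) = -1$ can be shown using (b), and the orthogonality of $\pi$ and the trivial representation of $G$.

(f) $|Z(a)| = |\overline{a}| + 1$. Because of the orthogonality of columns $e$ and $a$ of the character table of $G$ we have
\[ \sum_\mu \mathrm{tr}_\mu(e)\,\mathrm{tr}_\mu(a)^* = 0, \]
where the sum is over all irreducible representations of $G$. Thus $\sum_{\mu \neq \pi} (\dim\mu)^2 - \dim\pi = 0$. On the other hand, we know that $\sum_\mu (\dim\mu)^2 = |G|$. Therefore, $|G| - (\dim\pi)^2 - \dim\pi = 0$, which by using $\dim\pi = |\overline{a}|$ and $|G| = |Z(a)| \cdot |\overline{a}|$ gives $|Z(a)| = |\overline{a}| + 1$.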
(g) $Z(a) = \{e\} \cup \overline{a}$, where $\overline{a}$ denotes the conjugacy class of $a$. According to (f) it is sufficient to show that $h \notin Z(a)$ for every $h \notin \{e\} \cup \overline{a}$. This fact can easily be seen from (45) by letting $\mu = 1$.

(h) $Z(a) \cong \mathbb{F}_q^+$ where $q$ is a power of a prime number, and $|G| = |Z(a)| \cdot |\overline{a}| = q(q - 1)$. Since $Z(a) = \{e\} \cup \overline{a}$ is a normal subgroup, $Z(b) = Z(a)$ for every $b \in \overline{a}$. Thus $Z(a)$ is an abelian subgroup. On the other hand, the order of all elements of $\overline{a} = Z(a) \setminus \{e\}$ is the same. Therefore, $Z(a)$ is isomorphic to $\mathbb{F}_q^+$, where $q$ is a power of a prime number.

For simplicity let $Z(a) = \mathbb{H}$. Then $\mathbb{H}$ is an abelian subgroup of $G$. We show that a multiplication $\times$ can be defined on $\mathbb{H}$ which, together with the operation of $\mathbb{H}$ induced from $G$, turns it into a near-field. We then prove that $G \cong \mathbb{H} \rtimes \mathbb{H}^\times$.

Since $|G/\mathbb{H}| = |\overline{a}|$, and $\mathbb{H} = Z(a)$, the cosets of $G/\mathbb{H}$ are in one-to-one correspondence with elements of $\overline{a}$; for every $b \in \overline{a}$ there exists a unique $\tilde{x}_b = x_b\mathbb{H} \in G/\mathbb{H}$ such that $x_b a x_b^{-1} = b$. Now define a binary operation $\times$ on $\mathbb{H}$ in the following form: $e \times b = b \times e = e$ for every $b \in \mathbb{H}$, and for $b, c \in \overline{a}$,
\[ b \times c = x_b x_c a x_c^{-1} x_b^{-1}. \]
$\times$ is well-defined because elements of $\mathbb{H}$ commute with every element of $\overline{a}$.

(i) $\mathbb{H}^\times = (\mathbb{H} \setminus \{e\}, \times)$ is a group whose identity element is $a$. The inverse of $b$ is the element $\bar{b}$ defined by $\bar{b} = x_b^{-1} a x_b$. The associativity is proved using $\tilde{x}_{b \times c} = \tilde{x}_b \tilde{x}_c$.

(j) $\mathbb{H}$ with the induced operation from $G$ as the addition and $\times$ as the multiplication forms a near-field. We need to show that multiplication is distributive from the left with respect to addition: $b \times (cd) = (b \times c)(b \times d)$. If one of $b, c, d$ is equal to $e$, it obviously holds; otherwise both sides are equal to $x_b c d x_b^{-1}$.

In the following we assume that $q = |\mathbb{H}| > 2$ since otherwise $G \cong \mathbb{H} \rtimes \mathbb{H}^\times$ is obvious.

(k) There exists $g \in G \setminus \mathbb{H}$ such that $G = \mathbb{H}Z(g)$. Since $\mathbb{H}$ is a near-field, the centralizer of $\mathbb{H}^\times$ is non-trivial (see Sect. B.1). This means that there exists $g \in G$ such that $gag^{-1} \neq a$ and $(gag^{-1}) \times b = b \times (gag^{-1})$ for every $b \in \mathbb{H}$. In other words, for every $x \in G$, $gxax^{-1}g^{-1} = xgag^{-1}x^{-1}$, or equivalently, $\overline{g} \subseteq g\mathbb{H}$. Therefore, $|\overline{g}| \leq |\mathbb{H}| = q$, and then $|Z(g)| \geq q - 1$. On the other hand, $\mathbb{H}$ is a normal subgroup of $G$, so $\mathbb{H}Z(g)$ is a subgroup, and since $Z(g) \cap \mathbb{H} = Z(g) \cap Z(a) = \{e\}$, the size of this subgroup is equal to $q|Z(g)|$. Thus $|Z(g)| \leq q - 1$ and therefore $Z(g)$ is a subgroup of order $q - 1$ and $G = \mathbb{H}Z(g)$.

(l) $G \cong \mathbb{H} \rtimes \mathbb{H}^\times$. Since $G = \mathbb{H}Z(g)$ and $\mathbb{H} \cap Z(g) = \{e\}$, every element of $G$ can uniquely be written in the form $bk$, where $b \in \mathbb{H}$ and $k \in Z(g)$. It is easy to see that the map which sends $bk \in G$ to $(b, kak^{-1}) \in \mathbb{H} \rtimes \mathbb{H}^\times$ is an isomorphism. We are done.

B.3. Example: two modular invariants in $\mathcal{Z}(A_6)$. Assume that the transposition $(X, Y)$ forms a modular invariant in $\mathcal{Z}(G)$. Theorem B.1 classifies all groups for which there exists such a modular invariant when $X$ is a chargeon and $Y$ is a fluxion. If we relax this assumption by keeping $X$ a chargeon but allowing $Y = (a, \rho)$ to be arbitrary, most steps in the proof of Theorem B.1 (with some variations) still hold. In particular, $\dim\rho = 1$ is enough to show that $G \cong \mathbb{H}^+ \rtimes \mathbb{H}^\times$. (In this case proving (g) needs more work.)
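The structural facts established in steps (b), (f), (g) and (h) of the proof of Theorem B.1 can be observed directly in the smallest interesting case. The sketch below is only an illustration and assumes $\mathbb{H} = \mathbb{Z}_5$ (a field, hence a near-field), with the group law $(a, \alpha)(b, \beta) = (a + \alpha b, \alpha\beta)$; all names are hypothetical.

```python
q = 5  # assumed prime, so that the integers mod q form a field

# Elements (a, alpha) with a in Z_q and alpha a nonzero residue.
G = [(a, alpha) for a in range(q) for alpha in range(1, q)]
e = (0, 1)

def mul(g, h):
    """(a, alpha)(b, beta) = (a + alpha*b, alpha*beta)."""
    return ((g[0] + g[1] * h[0]) % q, (g[1] * h[1]) % q)

def inv(g):
    """Inverse of (a, alpha): (-alpha^{-1} a, alpha^{-1}), via Fermat's little theorem."""
    alpha_inv = pow(g[1], q - 2, q)
    return ((-alpha_inv * g[0]) % q, alpha_inv)

def conjugacy_class(g):
    return {mul(mul(k, g), inv(k)) for k in G}

def centralizer(g):
    return {k for k in G if mul(k, g) == mul(g, k)}

a = (1, 1)
a_class = conjugacy_class(a)
Z_a = centralizer(a)

print(len(G), len(a_class), len(Z_a))
```

With these choices the class of $a$ has $q - 1$ elements, the centralizer $Z(a)$ is the subgroup $K = \{(x, 1)\}$ of order $q$, and $|G| = q(q - 1)$.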
There are two remaining cases. First, both $X$ and $Y$ are chargeons, and second, neither of them is a chargeon. The first case cannot happen: if $X = (e, \pi)$ and $Y = (e, \pi')$, $S_{0X} = S_{0Y}$ implies $\dim\pi = \dim\pi'$. Moreover, for every $g \neq e$, $S_{X(g,1)} = S_{Y(g,1)}$ is equivalent to $\mathrm{tr}_\pi(g) = \mathrm{tr}_{\pi'}(g)$. Thus $\pi = \pi'$.

Now assume that $X = (a, \rho)$ and $Y = (b, \rho')$, and $a, b \neq e$. Then for every irreducible representation $\pi$ of $G$, $S_{X(e,\pi)} = S_{Y(e,\pi)}$, and we obtain
\[ \frac{\mathrm{tr}_\pi(a^{-1})\,\dim\rho}{|Z(a)|} = \frac{\mathrm{tr}_\pi(b^{-1})\,\dim\rho'}{|Z(b)|}. \]
For $\pi = 1$ we find that $\dim\rho/|Z(a)| = \dim\rho'/|Z(b)|$, and thus for every $\pi$, $\mathrm{tr}_\pi(a^{-1}) = \mathrm{tr}_\pi(b^{-1})$. Equivalently, $a$ and $b$ belong to the same conjugacy class, and $X, Y$ have the same magnetic flux.

Here we present an example of a modular invariant in the latter case ($X = (a, \rho)$ and $Y = (a, \rho')$). Let $A_6$ be the alternating group of degree six (the group of even permutations of $\{1, \ldots, 6\}$). Let $a = (1, 2)(3, 4)$. The conjugacy class $\overline{a}$ is equal to the set of all permutations of the form $(t_1, t_2)(t_3, t_4)$. Then $|\overline{a}| = 45$, and $|Z(a)| = |A_6|/|\overline{a}| = 8$;
\[ Z(a) = \{e,\ a,\ b_1 = (1, 2)(5, 6),\ b_2 = (3, 4)(5, 6),\ b_3 = (1, 3)(2, 4),\ b_4 = (1, 4)(2, 3),\ c_1 = (1, 3, 2, 4)(5, 6),\ c_2 = (1, 4, 2, 3)(5, 6)\}. \]
The conjugacy classes of $Z(a)$ are $\{e\}$, $\{a\}$, $\{b_1, b_2\}$, $\{b_3, b_4\}$, and $\{c_1, c_2\}$, and the character table of $Z(a)$ is as follows:

        e     a     b1,b2   b3,b4   c1,c2
ρ1      1     1       1       1       1
ρ2      1     1      −1      −1       1
ρ3      1     1       1      −1      −1
ρ4      1     1      −1       1      −1
μ       2    −2       0       0       0
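As a sanity check on the table, the rows should be orthonormal with respect to the class-weighted inner product, with class sizes 1, 1, 2, 2, 2 and group order 8. A minimal sketch (the dictionary layout and function name are choices made here for illustration):

```python
# Class sizes for {e}, {a}, {b1,b2}, {b3,b4}, {c1,c2}.
class_sizes = [1, 1, 2, 2, 2]
group_order = sum(class_sizes)

# Rows of the character table of Z(a), in the column order above.
characters = {
    "rho1": [1, 1, 1, 1, 1],
    "rho2": [1, 1, -1, -1, 1],
    "rho3": [1, 1, 1, -1, -1],
    "rho4": [1, 1, -1, 1, -1],
    "mu":   [2, -2, 0, 0, 0],
}

def weighted_product(chi, psi):
    """Sum over classes of |class| * chi * psi (all characters here are real)."""
    return sum(size * x * y for size, x, y in zip(class_sizes, chi, psi))

for name_i, chi in characters.items():
    for name_j, psi in characters.items():
        expected = group_order if name_i == name_j else 0
        assert weighted_product(chi, psi) == expected

# The squared dimensions must add up to the group order.
print(sum(chi[0] ** 2 for chi in characters.values()))
```

The last line prints 8, in agreement with $|Z(a)| = 8$.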
We claim that both transpositions $(X_1, X_2)$ and $(X_3, X_4)$, where $X_i = (a, \rho_i)$, are modular invariants. $T$ is invariant under these transpositions because $T_{X_i} = 1$ for every $1 \leq i \leq 4$. For every $Y = (e, \pi)$, $S_{X_i Y} = \mathrm{tr}_\pi(a^{-1})\dim\rho_i/|Z(a)|$, and since $\dim\rho_i = 1$ for every $i$, $S_{X_i Y} = S_{X_j Y}$ for $i, j \in \{1, 2, 3, 4\}$. For every $Y = (h, \pi)$, where $h \notin \{e\} \cup \overline{a}$ and $\overline{a}$ is the conjugacy class of $a$, we have
\[ S_{X_i Y} = \frac{1}{|Z(a)| \cdot |Z(h)|} \sum_{k:\, khk^{-1} \in Z(a)} \mathrm{tr}_{\rho_i}(kh^{-1}k^{-1})\,\mathrm{tr}_\pi(k^{-1}a^{-1}k). \]
Observe that $c_1, c_2$ are the only elements of $Z(a)$ which can be conjugates of $h^{-1}$, and $\mathrm{tr}_{\rho_i}(c_1) = \mathrm{tr}_{\rho_j}(c_1)$ for $i, j \in \{1, 2\}$ and for $i, j \in \{3, 4\}$. Therefore, $S_{X_i Y} = S_{X_j Y}$. It remains to show that $S_{X_1 X_1} = S_{X_2 X_2}$, $S_{X_3 X_3} = S_{X_4 X_4}$, $S_{X_1 X_3} = S_{X_2 X_3} = S_{X_2 X_4} = S_{X_1 X_4}$, and $S_{X_1 Y} = S_{X_2 Y} = S_{X_3 Y} = S_{X_4 Y}$, where $Y = (a, \mu)$. These equalities can be verified directly from the character table of $Z(a)$.
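The group-theoretic numbers used in this example can be recomputed by brute force. The sketch below is an illustration only: it represents permutations of $\{0, \ldots, 5\}$ as tuples (so $a = (1,2)(3,4)$ becomes the tuple `(1, 0, 3, 2, 4, 5)`), and the helper names are hypothetical.

```python
from itertools import permutations

def compose(p, r):
    """Composition p after r: i -> p[r[i]]."""
    return tuple(p[r[i]] for i in range(len(p)))

def inverse(p):
    out = [0] * len(p)
    for i, image in enumerate(p):
        out[image] = i
    return tuple(out)

def is_even(p):
    inversions = sum(1 for i in range(len(p))
                     for j in range(i + 1, len(p)) if p[i] > p[j])
    return inversions % 2 == 0

def class_of(g, group):
    """Conjugacy class of g inside the given group."""
    return {compose(compose(k, g), inverse(k)) for k in group}

A6 = [p for p in permutations(range(6)) if is_even(p)]
identity = tuple(range(6))
a = (1, 0, 3, 2, 4, 5)  # the double transposition (1,2)(3,4), 0-based

a_class = class_of(a, A6)
Z_a = [k for k in A6 if compose(k, a) == compose(a, k)]

# Conjugacy classes of Z(a), computed inside Z(a) itself.
remaining, class_sizes = set(Z_a), []
while remaining:
    cls = class_of(next(iter(remaining)), Z_a)
    class_sizes.append(len(cls))
    remaining -= cls

# Elements of Z(a) lying outside {e} and the class of a (these are c1, c2).
outside = [k for k in Z_a if k != identity and k not in a_class]

print(len(A6), len(a_class), len(Z_a), sorted(class_sizes), len(outside))
```

The two elements found in `outside` are the only members of $Z(a)$ that are neither the identity nor double transpositions, consistent with the observation about $c_1, c_2$ above.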
References

1. Kitaev, A.Yu.: Fault-tolerant quantum computation by anyons. Ann. Phys. 303, 2 (2003)
2. Bombin, H., Martin-Delgado, M.A.: A Family of Non-Abelian Kitaev Models on a Lattice: Topological Condensation and Confinement. Phys. Rev. B 78, 115421 (2008)
3. Bombin, H., Martin-Delgado, M.A.: Nested Topological Order. http://arXiv.org/abs/0803.4299v1 [cond-mat.str-el], 2008
4. Levin, M.A., Wen, X.-G.: String-net condensation: A physical mechanism for topological phases. Phys. Rev. B 71, 045110 (2005)
5. Buerschaper, O., Aguado, M.: Mapping Kitaev's quantum double lattice models to Levin and Wen's string-net models. Phys. Rev. B 80, 155136 (2009)
6. Davydov, A.: Modular invariants for group-theoretical modular data I. J. Alg. 323(5), 1321–1348 (2010)
7. Bombin, H.: Topological Order with a Twist: Ising Anyons from an Abelian Model. Phys. Rev. Lett. 105, 030403 (2010)
8. Bravyi, S., Kitaev, A.Yu.: Quantum codes on a lattice with boundary. http://arXiv.org/abs/quant-ph/9811052v1, 1998
9. Bakalov, B., Kirillov, A. Jr.: Lectures on tensor categories and modular functors. University Lecture Series, 21. Providence, RI: Amer. Math. Soc., 2001
10. Kitaev, A.: Anyons in an exactly solved model and beyond. Ann. Phys. 321(1), 2–111 (2006)
11. Naidu, D.: Categorical Morita equivalence for group-theoretical categories. Commun. Alg. 35(11), 3544–3565 (2007)
12. Naidu, D., Nikshych, D.: Lagrangian Subcategories and Braided Tensor Equivalences of Twisted Quantum Doubles of Finite Groups. Commun. Math. Phys. 279, 845–872 (2008)
13. Bermudez, A., Patane, D., Amico, L., Martin-Delgado, M.A.: Topology Induced Anomalous Defect Production by Crossing a Quantum Critical Point. Phys. Rev. Lett. 102, 135702 (2009)
14. Nussinov, Z., Ortiz, G.: A symmetry principle for Topological Quantum Order. Ann. Phys. 324(5), 977–1057 (2009)
15. Dickson, L.E.: On finite algebras. Nachr. Akad. Wiss. Göttingen. Math. Phys. Kl. II, 358–393 (1905)
16. Zassenhaus, H.: Über endliche Fastkörper. Abh. Math. Sem. Univ. Hamburg 11, 187–220 (1936)
17. Hall, M.: The theory of groups. New York: MacMillan Co., 1959

Communicated by Y. Kawahigashi
Commun. Math. Phys. 306, 695–746 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1296-8
Communications in
Mathematical Physics
Large Violation of Bell Inequalities with Low Entanglement

M. Junge, C. Palazuelos

Department of Mathematics, University of Illinois at Urbana-Champaign, Illinois 61801-2975, USA. E-mail: [email protected]; [email protected]

Received: 28 August 2010 / Accepted: 17 January 2011
Published online: 5 July 2011 – © Springer-Verlag 2011
Abstract: In this paper we obtain violations of general bipartite Bell inequalities of order $\frac{\sqrt{n}}{\log n}$ with $n$ inputs, $n$ outputs and $n$-dimensional Hilbert spaces. Moreover, we construct explicitly, up to a random choice of signs, all the elements involved in such violations: the coefficients of the Bell inequalities, the POVM measurements and the quantum states. Analyzing this construction we find that, even though entanglement is necessary to obtain violation of Bell inequalities, the entropy of entanglement of the underlying state is essentially irrelevant in obtaining large violation. We also indicate why the maximally entangled state is a rather poor candidate for producing large violations with arbitrary coefficients. However, we also show that for Bell inequalities with positive coefficients (in particular, games) the maximally entangled state achieves the largest violation up to a logarithmic factor.

0. Introduction and Main Results

The study of quantum nonlocality dates back to the famous work of Einstein, Podolsky and Rosen (EPR) in 1935. They presented an argument which questioned the validity of quantum mechanics as a complete theory of Nature ([28]). However, it took almost 30 years to understand that the apparent dilemma presented in [28] could be formulated in terms of assumptions which naturally lead to a refutable prediction ([75]). Bell showed that the assumption of a local hidden variable model implies some inequalities on the set of probabilities, since then called Bell inequalities, which are violated by certain quantum probabilities produced with an entangled state ([6]). For a long time after this, entanglement and violation of Bell inequalities were thought to be parts of the same concept. This changed in the late 1980s with a number of surprising results (see [29,64,74]) which showed that, although entanglement is necessary for the violation of Bell inequalities, the converse is not true. On the other hand, to our knowledge, violation of Bell inequalities is the only way to detect entanglement experimentally without additional hypotheses on the experiment.
Nowadays, Bell inequalities are a fundamental subject in Quantum Information Theory (QIT). Apart from their theoretical interest, Bell inequalities have found applications in many areas of QIT: quantum cryptography, where they open the possibility of unconditionally secure quantum key distribution ([1,4,50,51]); complexity theory, where they enrich the theory of multipartite interactive proof systems ([7,18,19,24,32,39,41]); communication complexity (see the recent review [15]); estimates for the dimension of the underlying Hilbert space ([11,13,57,71,73]); entangled games ([40,41]); etc.

Bell inequalities and their connection to quantum entanglement have remained quite mysterious despite the recent research on this topic. In the last few years, the application of techniques from different areas of mathematics has started to clarify the situation. This includes the previous works of the authors, which are based on operator space techniques. Indeed, in the consecutive works [57] and [35], the authors have shown that operator space theory is a natural framework for the study of Bell inequalities (see also [36]). Using this connection the authors proved in [57] the existence of unbounded violations of tripartite correlation Bell inequalities, answering an old question stated by Tsirelson ([70]). Moreover, in [35] the authors used operator space techniques to get unbounded violations of general bipartite Bell inequalities.

In the present paper we improve the main results of [35] (see also [16]). In fact, we obtain violations of general bipartite Bell inequalities of order $\frac{\sqrt{n}}{\log n}$ with $N = n$ inputs, $K = n$ outputs and $d = n$-dimensional Hilbert spaces. We also provide upper bounds for general Bell inequalities of order $O(N)$, $O(K)$ and $O(d)$. In addition to being almost optimal in all the parameters of the problem ($\sqrt{n}$ instead of $n$), our estimates are also very concrete. Indeed, we construct explicitly, up to a random choice of signs, all the elements involved in the violation: the coefficients of the Bell inequalities, the quantum state and the POVMs. We hope these constructions can be used for further applications. Moreover, we connect our estimates with the entropy of entanglement of the underlying pure states. To our own surprise, violation and entropy of entanglement appear to be almost independent. Also, the maximally entangled state is only of very limited use in producing violation. Moreover, we show that this limitation no longer holds when considering Bell inequalities with positive coefficients (in particular, games), where the maximally entangled state always gives the largest violation up to a logarithmic factor.

Let us now state the results more explicitly. A standard scenario to study quantum nonlocality consists of two spatially separated and non-communicating parties, usually called Alice and Bob. Each of them can choose among different observables, labelled by $x = 1, \ldots, N$ in the case of Alice and $y = 1, \ldots, N$ in the case of Bob. The possible outcomes of these measurements are labelled by $a = 1, \ldots, K$ in the case of Alice and $b = 1, \ldots, K$ in the case of Bob. Following the standard notation, we will refer to the observables $x$ and $y$ as inputs and call $a$ and $b$ outputs. We consider the same number of inputs (resp. outputs) for Alice and Bob just for simplicity. For fixed $x, y$, we will consider the probability distribution $(P(a, b|x, y))_{a,b=1}^{K}$ of positive real numbers satisfying
\[ \sum_{a,b=1}^{K} P(a, b|x, y) = 1. \]
The collection $P = (P(a, b|x, y))_{x,y;a,b=1}^{N,K}$ is called a probability distribution (Fig. 1).
Fig. 1. $\{P(a, b|x, y)\}_{a,b,x,y}$ is the probability distribution of the measurement outcomes $a, b$, when Alice and Bob choose the observables labeled by $x$ and $y$ respectively

Given a probability distribution $P = (P(a, b|x, y))_{x,y;a,b=1}^{N,K}$, we will say that $P$ is

a) Non-signaling if
\[ \sum_{a=1}^{K} P(a, b|x, y) = P(b|y) \ \text{is independent of } x, \]
\[ \sum_{b=1}^{K} P(a, b|x, y) = P(a|x) \ \text{is independent of } y. \]
This condition means that Alice's choice of inputs does not affect Bob's marginal probability distribution and vice versa. This is physically motivated by the principle of Einstein locality, which implies non-signaling if we assume that Alice and Bob are space-like separated. We denote the set of non-signaling probability distributions by $\mathcal{C}$. We must point out that the elements in $\mathcal{C}$ were initially called behaviors (see [70]). However, following the more recent literature (see [22] and [35]), we will not use that terminology.

b) LHV (Local Hidden Variable) if
\[ P(a, b|x, y) = \int_\Omega P_\omega(a|x)\,Q_\omega(b|y)\,d\mathbb{P}(\omega) \]
for every $x, y, a, b$, where $(\Omega, \Sigma, \mathbb{P})$ is a probability space, $P_\omega(a|x) \geq 0$ for all $a, x, \omega$, $\sum_a P_\omega(a|x) = 1$ for all $x, \omega$, and the analogous conditions hold for $Q_\omega(b|y)$. We denote the set of LHV probability distributions by $\mathcal{L}$.

c) Quantum if there exist two Hilbert spaces $H_1$, $H_2$ such that
\[ P(a, b|x, y) = \mathrm{tr}(E_x^a \otimes F_y^b\,\rho) \]
for every $x, y, a, b$, where $\rho \in B(H_1 \otimes H_2)$ is a density operator and $(E_x^a)_{x,a} \subset B(H_1)$, $(F_y^b)_{y,b} \subset B(H_2)$ are two sets of operators representing POVM measurements on Alice's and Bob's systems. That is, $E_x^a \geq 0$ for every $x, a$, $\sum_a E_x^a = \mathrm{id}$ for every $x$, $F_y^b \geq 0$ for every $y, b$ and $\sum_b F_y^b = \mathrm{id}$ for every $y$. We denote the set of quantum probability distributions by $\mathcal{Q}$.
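The normalization and non-signaling conditions in a) are finite linear constraints and are easy to test on small examples. The sketch below is an illustration only: it stores $P(a, b|x, y)$ in a dictionary keyed by `(a, b, x, y)` with 0-based labels, and the three example distributions (a deterministic local one, the Popescu-Rohrlich box, and a signaling one) are standard textbook examples rather than objects from this paper.

```python
from itertools import product

def is_normalized(P, N, K, tol=1e-12):
    """For every input pair (x, y), the outputs (a, b) must sum to one."""
    return all(abs(sum(P[(a, b, x, y)] for a in range(K) for b in range(K)) - 1) <= tol
               for x, y in product(range(N), repeat=2))

def is_nonsignaling(P, N, K, tol=1e-12):
    """Bob's marginal must not depend on x, and Alice's must not depend on y."""
    for y, b in product(range(N), range(K)):
        marginals = [sum(P[(a, b, x, y)] for a in range(K)) for x in range(N)]
        if max(marginals) - min(marginals) > tol:
            return False
    for x, a in product(range(N), range(K)):
        marginals = [sum(P[(a, b, x, y)] for b in range(K)) for y in range(N)]
        if max(marginals) - min(marginals) > tol:
            return False
    return True

def make(N, K, rule):
    """Build a distribution from a function rule(a, b, x, y)."""
    return {(a, b, x, y): rule(a, b, x, y)
            for a, b in product(range(K), repeat=2)
            for x, y in product(range(N), repeat=2)}

# Deterministic local strategy: Alice outputs x, Bob outputs 0.
deterministic = make(2, 2, lambda a, b, x, y: 1.0 if (a == x and b == 0) else 0.0)
# Popescu-Rohrlich box: a XOR b = x AND y, uniformly at random otherwise.
pr_box = make(2, 2, lambda a, b, x, y: 0.5 if (a ^ b) == (x & y) else 0.0)
# Signaling example: Alice's output reveals Bob's input.
signaling = make(2, 2, lambda a, b, x, y: 1.0 if (a == y and b == 0) else 0.0)

for name, P in [("deterministic", deterministic), ("PR box", pr_box), ("signaling", signaling)]:
    print(name, is_normalized(P, 2, 2), is_nonsignaling(P, 2, 2))
```

All three examples are normalized, but only the first two satisfy the non-signaling condition.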
It is well known (see [70]) that $\mathcal{L} \subsetneq \mathcal{Q} \subsetneq \mathcal{C} \subset \mathbb{R}^{N^2 K^2}$. We want to understand the "distance between $\mathcal{L}$ and $\mathcal{Q}$" quantitatively. Following [35], we define the largest Bell violation that a given $P \in \mathcal{C}$ may attain as
\[ \nu(P) = \sup_M \frac{|\langle M, P\rangle|}{\sup_{P' \in \mathcal{L}} |\langle M, P'\rangle|}, \]
where $M = \{M_{x,y}^{a,b}\}_{x,y=1,a,b=1}^{N,K}$ is the Bell inequality acting on $P$ by duality in the natural way: $\langle M, P\rangle = \sum_{x,y;a,b=1}^{N,K} M_{x,y}^{a,b}\,P(a, b|x, y)$. Thus, in order to measure how far the elements in $\mathcal{Q}$ can be from $\mathcal{L}$, we are interested in computing the maximal possible Bell violation
\[ \sup_{P \in \mathcal{Q}} \nu(P). \qquad (0.1) \]
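For a fixed Bell inequality $M$, the denominator in the definition of $\nu(P)$ can be computed by enumeration, because $\mathcal{L}$ is the convex hull of the deterministic strategies $a = f(x)$, $b = g(y)$ and $|\langle M, \cdot\rangle|$ is convex. The sketch below is an illustration under that observation. It uses the CHSH coefficients and two standard distributions (the Popescu-Rohrlich box and the distribution with correlators $\pm 1/\sqrt{2}$ that attains Tsirelson's bound), none of which are taken from this paper; the resulting ratios are only lower bounds on $\nu(P)$, since $\nu$ takes a supremum over all $M$.

```python
from itertools import product
from math import sqrt

def pairing(M, P):
    """<M, P> = sum over x, y, a, b of M[a,b,x,y] * P(a,b|x,y)."""
    return sum(M[key] * P[key] for key in M)

def classical_bound(M, N, K):
    """sup over LHV distributions of |<M, P'>|, attained at a deterministic strategy."""
    best = 0.0
    for f in product(range(K), repeat=N):       # Alice's answers f[x]
        for g in product(range(K), repeat=N):   # Bob's answers g[y]
            value = sum(M[(f[x], g[y], x, y)] for x in range(N) for y in range(N))
            best = max(best, abs(value))
    return best

def violation_ratio(M, P, N, K):
    return abs(pairing(M, P)) / classical_bound(M, N, K)

keys = [(a, b, x, y) for a, b, x, y in product(range(2), repeat=4)]
sign = {k: (1 if (k[0] ^ k[1]) == (k[2] & k[3]) else -1) for k in keys}

chsh = {k: float(sign[k]) for k in keys}                      # CHSH coefficients
pr_box = {k: (0.5 if sign[k] == 1 else 0.0) for k in keys}    # non-signaling, not quantum
tsirelson = {k: (1 + sign[k] / sqrt(2)) / 4 for k in keys}    # quantum, correlators +-1/sqrt(2)

print(classical_bound(chsh, 2, 2),
      violation_ratio(chsh, pr_box, 2, 2),
      violation_ratio(chsh, tsirelson, 2, 2))
```

The enumeration has $K^{2N}$ terms, so this brute-force approach is only practical for very small $N$ and $K$.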
Beyond the theoretical interest of $\sup_{P \in \mathcal{Q}} \nu(P)$ as a measure of nonlocality, this quantity turns out to be a useful measure in applications in different contexts. Indeed, in [36] (see also [22,35]) the authors showed its immediate application to dimension witnesses, communication complexity and entangled games. Moreover, it can be used to measure nonlocality in the presence of noise and/or detector inefficiencies. This is the key point in the search for a loophole-free Bell test (see [35,36] for details). The main result of this paper can be stated as follows:

Theorem 1. For every $n \in \mathbb{N}$ there exists a quantum probability distribution $P$ with $n$ inputs, $n$ outputs and Hilbert spaces of dimension $n$ such that
\[ \nu(P) \succeq \frac{\sqrt{n}}{\log n}. \]

Here, we use $\succeq$ to denote inequality up to a universal constant independent of $n \in \mathbb{N}$.

The first unbounded violation of Bell inequalities dates back to the Raz parallel repetition theorem ([66]). Indeed, applying this result to the repetition of the magic square game (or any pseudo-telepathy game ([9])), one can deduce the existence of an $x > 0$ such that for every $n$ we have quantum probability distributions $P$ with $n$ inputs, $n$ outputs and dimension $n$ such that $\nu(P) \succeq n^x$. However, in view of the sharpest estimates on the parallel repetition theorem ([30,65,67]), the best known value for the previous $x$ does not seem to be much better than $10^{-5}$. In [41] a great improvement of the previous results was made. Via a highly non-trivial construction of Khot and Vishnoi in the context of Complexity Theory ([43]), the main result in [41] shows the existence of a quantum probability distribution $P$ with $n$ outputs and $\frac{2^n}{n}$ inputs, which verifies $\nu(P) \succeq n^{1/54}$.¹ The price of that estimate is a large number of inputs and no control in terms of the dimension of the underlying Hilbert space. In the recent paper [35] we showed the existence of a quantum probability distribution $P$ constructed with $[2^{\frac{\log^2 n}{2}}]^n$ inputs, $n$ outputs and Hilbert spaces of dimension $n$ which verifies $\nu(P) \succeq \frac{\sqrt{n}}{\log^2 n}$. This result greatly improved the previous ones, almost closing the gap to the known upper bounds in the number of outputs and in the dimension of the Hilbert spaces. Finally, in the very recent work [16] the previous estimates were improved by obtaining a quantum probability distribution $P$ verifying $\nu(P) \succeq \frac{\sqrt{n}}{\log n}$ with $2^n$ inputs, $n$ outputs and $n$-dimensional Hilbert spaces. Moreover, we should mention that the techniques used in [16] are of combinatorial nature (without using randomness). As before, these results required a large number of inputs.

¹ Actually, one can obtain $n^{1/24}$ up to terms of lower order via a claim in ([17], p. 3).

Therefore, Theorem 1 significantly improves the previously known results about unbounded violations of Bell inequalities. It almost closes the gap to the known upper bounds in all the involved parameters of the problem (see Sect. 5). Furthermore, all the elements in our construction are extremely simple. Actually, the Bell inequality coefficients and the measurements are explicitly constructed by means of inner products and rank one projections of random vectors in $\mathbb{R}^{n+1}$. Indeed, we consider a fixed family of signs $\epsilon_{x,a}^k = \pm 1$ with $x, a, k = 1, \ldots, n$. For a constant $K$ we define

a) Bell inequality coefficients:
\[ \tilde{M}_{x,y}^{a,b} = \begin{cases} \frac{1}{n^2} \sum_{k=1}^{n} \epsilon_{x,a}^k\,\epsilon_{y,b}^k & x, y, a, b = 1, \ldots, n, \\ 0 & a = n + 1, \text{ and } x, y = 1, \ldots, n;\ b = 1, \ldots, n + 1, \\ 0 & b = n + 1, \text{ and } x, y = 1, \ldots, n;\ a = 1, \ldots, n + 1. \end{cases} \]

b) POVM measurements $\{E_x^a\}_{x,a=1}^{n,n+1}$ in $M_{n+1}$, given for $x = 1, \ldots, n$ by
\[ E_x^a = \begin{cases} \dfrac{1}{nK} \begin{pmatrix} 1 & \epsilon_{x,a}^1 & \cdots & \epsilon_{x,a}^n \\ \epsilon_{x,a}^1 & 1 & \cdots & \epsilon_{x,a}^1 \epsilon_{x,a}^n \\ \vdots & \vdots & \ddots & \vdots \\ \epsilon_{x,a}^n & \epsilon_{x,a}^n \epsilon_{x,a}^1 & \cdots & 1 \end{pmatrix} & \text{for } a = 1, \ldots, n, \\[2ex] \mathbb{1} - \sum_{a=1}^{n} E_x^a & \text{for } a = n + 1. \end{cases} \]

c) States: Let $(\alpha_i)_{i=1}^{n+1}$ be a decreasing and positive sequence verifying $\sum_{i=1}^{n+1} \alpha_i^2 = 1$ and
\[ |\varphi_\alpha\rangle = \sum_{i=1}^{n+1} \alpha_i |ii\rangle. \]

Theorem 2. There exist universal constants $C$ and $K$ such that for every natural number $n$ there exists a choice of signs $\{\epsilon_{x,a}^k\}_{x,a,k=1}^{n}$ verifying that $\{E_x^a\}_{x,a=1}^{n,n+1}$ define POVM measurements,
\[ \sup\Bigl\{ \sum_{x,y;a,b=1}^{n,n+1} \tilde{M}_{x,y}^{a,b}\,P(a, b|x, y) : P \in \mathcal{L} \Bigr\} \leq C \log n \qquad (0.2) \]
and
\[ \sum_{x,y;a,b=1}^{n,n+1} \tilde{M}_{x,y}^{a,b}\,\langle\varphi_\alpha| E_x^a \otimes E_y^b |\varphi_\alpha\rangle \geq \frac{2}{K^2}\,\alpha_1 \sum_{i=2}^{n+1} \alpha_i. \qquad (0.3) \]
Moreover, the probability of the elements (choices of signs) verifying this tends to 1 exponentially fast as $n \to \infty$.
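The left-hand side of (0.3) can be evaluated numerically for small $n$. The sketch below is only an illustration of the construction in a)-c), under stated assumptions: it uses the fact that for $a \leq n$ the matrix in b) is the outer product of the vector $\frac{1}{\sqrt{nK}}(1, \epsilon_{x,a}^1, \ldots, \epsilon_{x,a}^n)$ with itself, so that $\langle\varphi_\alpha| E_x^a \otimes E_y^b |\varphi_\alpha\rangle$ is the square of a weighted inner product. The value `K = 4.0`, the seed and the choice of $\alpha$ are arbitrary, and the code does not check that the operators form a valid POVM, nor the classical bound (0.2).

```python
import random
from itertools import product
from math import sqrt

def build_signs(n, seed=0):
    """Random signs eps[(x, a)][k] = +-1 for x, a, k in 0..n-1 (0-based labels)."""
    rng = random.Random(seed)
    return {(x, a): [rng.choice((-1, 1)) for _ in range(n)]
            for x in range(n) for a in range(n)}

def bell_coefficient(eps, n, x, a, y, b):
    """M~ for outputs a, b <= n: (1/n^2) * sum_k eps^k_{x,a} eps^k_{y,b}."""
    return sum(s * t for s, t in zip(eps[(x, a)], eps[(y, b)])) / n ** 2

def quantum_probability(eps, alpha, n, K, x, a, y, b):
    """<phi_alpha| E_x^a (x) E_y^b |phi_alpha> for rank-one E = |u><u|."""
    u = [1] + eps[(x, a)]
    v = [1] + eps[(y, b)]
    overlap = sum(al * ui * vi for al, ui, vi in zip(alpha, u, v)) / (n * K)
    return overlap ** 2

def quantum_value(eps, alpha, n, K):
    """Left-hand side of (0.3); terms with a = n+1 or b = n+1 vanish since M~ = 0 there."""
    return sum(bell_coefficient(eps, n, x, a, y, b)
               * quantum_probability(eps, alpha, n, K, x, a, y, b)
               for x, a, y, b in product(range(n), repeat=4))

def lower_bound(alpha, K):
    """Right-hand side of (0.3): (2 / K^2) * alpha_1 * sum_{i >= 2} alpha_i."""
    return 2 / K ** 2 * alpha[0] * sum(alpha[1:])

n, K = 4, 4.0                                  # K is a hypothetical constant
weights = [n + 1 - i for i in range(n + 1)]    # decreasing, positive
norm = sqrt(sum(w * w for w in weights))
alpha = [w / norm for w in weights]
eps = build_signs(n)

print(quantum_value(eps, alpha, n, K), lower_bound(alpha, K))
```

Expanding the square shows that the cross term alone already contributes the right-hand side of (0.3) and the remaining terms are sums of squares, so in this sketch the inequality holds for every choice of signs; the random choice matters for the classical bound (0.2) and for the POVM property.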
It is clear that the Bell inequality coefficients are defined as inner products of the vectors $\frac{1}{n}(\epsilon_{x,a}^1, \ldots, \epsilon_{x,a}^n)$. On the other hand, one can see that our measurement operators can be written as $E_x^a = |u_x^a\rangle\langle u_x^a|$, where $|u_x^a\rangle = \frac{1}{\sqrt{nK}}(1, \epsilon_{x,a}^1, \ldots, \epsilon_{x,a}^n) \in \ell_2^{n+1}$ for every $x, a = 1, \ldots, n$. One could of course increase the dimension of the Hilbert space to realize these measurements as von Neumann projective measurements, with orthonormal rank one projectors.

This explicit construction allows us to study the connection between two concepts, violation of Bell inequalities and quantum entanglement, which are at the heart of Quantum Information Theory. Indeed, for bipartite pure states there exists a universal measure of entanglement, the so-called entropy of entanglement: $E(|\psi\rangle) = S((|\psi\rangle\langle\psi|)_A)$, where $S$ denotes the usual von Neumann entropy (see [25]). It is easy to see that $E(|\psi\rangle) \geq 0$ for every state $|\psi\rangle$ and that the maximally entangled state in dimension $n$ is
\[ |\psi_n\rangle = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} |ii\rangle, \]
verifying $E(|\psi_n\rangle) = \log_2(n)$. For a given bipartite pure state $|\varphi\rangle$ in dimension $n$ and $\delta > 0$, we will say that $|\varphi\rangle$ is $\delta$-maximally entangled (resp. $\delta$-non entangled) if $\log_2(n) - E(|\varphi\rangle) < \delta$ (resp. $E(|\varphi\rangle) < \delta$). As a consequence of Theorem 2 we have:

Corollary 1. For any $\delta > 0$ we can find a $\delta$-maximally entangled state (resp. $\delta$-non entangled state) $|\psi_\delta\rangle$ in a high enough dimension $n$, a Bell inequality $(M_{x,y}^{a,b})_{x,y,a,b=1}^{n}$ and POVM measurements $\{E_x^a\}_{x,a=1}^{n}$ such that
\[ \frac{|\langle M, Q_{|\psi_\delta\rangle}\rangle|}{\sup_{P \in \mathcal{L}} |\langle M, P\rangle|} \succeq \frac{\sqrt{n}}{(\log n)^2}, \]
where $Q_{|\psi_\delta\rangle}(a, b|x, y) = \langle\psi_\delta| E_x^a \otimes E_y^b |\psi_\delta\rangle$ for every $x, y, a, b = 1, \ldots, n$.

The previous corollary shows that even though quantum entanglement is needed to obtain violation of Bell inequalities, the amount of entanglement is essentially irrelevant for large violation. Indeed, we can find states with entropy of entanglement close to either 0 or $\log_2(n + 1)$, and this only decreases violation by a logarithmic factor. It is interesting to note that the previous construction does not say anything about the extremal cases: entanglement 0 (which is trivial) and maximal entanglement. This leads us to the following result:

Theorem 3. There exist Bell inequalities $\tilde{M}$ with $2^{n^2}$ inputs and $n + 1$ outputs and POVMs $\{\tilde{E}_x^a\}_{x,a=1}^{2^{n^2},n+1}$ acting on $\ell_2^{n+1}$ with the following properties:

a) $\tilde{M}$ and $\{\tilde{E}_x^a\}_{x,a=1}^{2^{n^2},n+1}$ verify Eqs. (0.2) and (0.3) in Theorem 2 for every state $|\varphi_\alpha\rangle = \sum_{i=1}^{n+1} \alpha_i |ii\rangle$.
b) $\sup\{|\langle \tilde{M}, Q_{\max}\rangle|\} \preceq 1$, where the supremum runs over all quantum probability distributions $Q_{\max}$ constructed with the maximally entangled state in any dimension.
Large Violation of Bell Inequalities with Low Entanglement
In particular, Theorem 3 shows the existence of quantum probability distributions P which can not be written as a quantum probability distribution by using the maximally entangled state, even when the dimension of the Hilbert spaces is not restricted (note the difference with the case of quantum correlations matrices, [70]).√ However, we will show that every n dimensional diagonal state can be written, up to a log n factor, as a superposition of maximally entangled states in the same dimension. A very interesting consequence of this superposition result, is that Theorem 3 is no longer true if we restrict to Bell inequalities with positive coefficients (in particular, games), because in that case the maximal entangled state always gives the largest violation up to a log factor in the dimension of the Hilbert space (see Theorem 10). The paper is organized as follows. Section 1 is devoted to introduce the basic tools. In the first part we will give a brief introduction to operator space theory. In the second part of this section we will summarize the connections between operator spaces and Bell inequalities from [35]. Furthermore, we will explain some new connections which will be used in this work. In Sect. 2 we will prove Theorem 1. However, we will first give a direct less explicit proof. This proof serves as a guideline for the strategy used throughout the paper. In the last part of the section, we will discuss the optimality of our result. In Sect. 3 we will present the proof of Theorem 2 and we will investigate the connection between the amount of violation and the entropy of entanglement of our states leading to Corollary 1. In Sect. 4 we study the maximally entangled state. In the first part of the section we will prove Theorem 3. Motivated by this result, we will clarify the role of the maximally entangled state in the context of violation of Bell inequalities. In Sect. 5 we will discuss the geometric meaning of violation of Bell inequalities. 
We will show that the “distance” introduced in Eq. (0.1) and its dual version $LV$ (see Eq. (1.5)) are not only the right ones regarding the applications of violation of Bell inequalities to different contexts of QIT, but they are also the natural ones from a geometric point of view. As a consequence of the results developed in this section we will obtain upper bounds for the largest violation of Bell inequalities. Finally, in Sect. 6 we will study the role of the $\gamma_2^*$ tensor norm in the context of violation of Bell inequalities. The main motivation is the study of this norm as a relaxation of the problem of computing the classical and the quantum value of Bell inequalities. Actually, we will show that this relaxation is related to some well known SDP relaxations already used in the study of some problems of Complexity Theory. Throughout the section, we will give some optimal results for this norm.

1. Basic Tools

1.1. Operator spaces. We will recall some basic facts from operator space theory. We recommend [27] and [58] for further information and more detailed definitions. We will denote by $M_n$ (resp. $M_{m,n}$) the space of complex $n\times n$ (resp. $m\times n$) matrices. The theory of operator spaces came to life through the work of Effros and Ruan in the 80's (see [27,58]). They provided an axiomatic characterization of closed subspaces of $B(H)$, the space of bounded linear operators on a Hilbert space, where the objects are Banach spaces $E$ combined with a tail of matrix norms on $M_n(E)$ attached to it. More formally, an operator space is a complex vector space $E$ and a sequence of norms $\|\cdot\|_n$ in the space of $E$-valued matrices $M_n(E) = M_n\otimes E$, verifying Ruan's axioms.
1. For every $n, m\in\mathbb{N}$, $x\in M_m(E)$, $a\in M_{n,m}$ and $b\in M_{m,n}$ we have that $\|axb\|_n \le \|a\|\,\|x\|_m\,\|b\|$.
2. For every $n, m\in\mathbb{N}$, $x\in M_n(E)$, $y\in M_m(E)$, we have that
$$\left\|\begin{pmatrix} x & 0\\ 0 & y\end{pmatrix}\right\|_{n+m} = \max\{\|x\|_n, \|y\|_m\}.$$
In particular, every $C^*$-algebra $A$ has a natural operator space structure induced by a faithful embedding $j : A\to B(H)$. Indeed, it is enough to consider the sequence of norms on $M_n\otimes A$ defined by the embedding $\mathrm{id}\otimes j : M_n\otimes A\to M_n\otimes B(H) = B(\ell_2^n\otimes H)$. In particular, $\ell_\infty^k$ has a natural operator space structure. Let us describe this explicitly. We embed $\ell_\infty^k$ as diagonal maps in $M_k$. Let $x = \sum_i A_i\otimes e_i\in M_n(\ell_\infty^k) = M_n\otimes\ell_\infty^k$. Then we have
$$\|x\|_n = \Big\|\sum_i A_i\otimes|i\rangle\langle i|\Big\|_{M_{nk}} = \max_i\|A_i\|_{M_n}. \tag{1.1}$$
The category of Banach spaces and the category of operator spaces essentially deal with the same objects, closed subspaces of $B(H)$, but they differ through their morphisms. The morphisms in the category of operator spaces are those which allow a uniform control of all matrix norms, so called completely bounded maps. A linear map $u : E\to F$ between operator spaces is called completely bounded if all the amplifications $u_n = \mathrm{id}_n\otimes u : M_n\otimes E = M_n(E)\to M_n\otimes F = M_n(F)$ remain uniformly bounded. The cb-norm of $u$ is then defined as $\|u\|_{cb} = \sup_n\|u_n\|$. We will call $CB(E,F)$ the resulting normed space. It has a natural operator space structure given by $M_n(CB(E,F)) = CB(E, M_n(F))$. We can analogously define the notion of a complete isomorphism/isometry (see [27,58]).

The minimal tensor product of two operator spaces $E\subset B(H)$ and $F\subset B(K)$ is defined as the operator space $E\otimes_{\min}F$ with the structure inherited from the induced embedding $E\otimes F\subset B(H\otimes K)$. In particular, $M_n(E) = M_n\otimes_{\min}E$ holds for every operator space $E$. The tensor norm min in the category of operator spaces will play the role of the so called $\epsilon$ norm in the classical theory of tensor norms in Banach spaces [23]. In particular min is injective, in the sense that if $E\subset X$ and $F\subset Y$ completely isometrically (isomorphically), then $E\otimes_{\min}F\subset X\otimes_{\min}Y$ holds completely isometrically (isomorphically). The analogue of the largest tensor norm $\pi$ for Banach spaces in operator space theory is given by the operator space projective norm $\wedge$. The norm is defined as
$$\|u\|_{M_n(E\otimes_\wedge F)} = \inf\big\{\|\alpha\|_{M_{n,lm}}\,\|x\|_{M_l(E)}\,\|y\|_{M_m(F)}\,\|\beta\|_{M_{lm,n}} : u = \alpha(x\otimes y)\beta\big\},$$
where $u = \alpha(x\otimes y)\beta$ means the matrix product
$$u = \sum_{r,s,i,j,p,q}\alpha_{r,ip}\,\beta_{jq,s}\,|r\rangle\langle s|\otimes x_{ij}\otimes y_{pq}\in M_n\otimes E\otimes F.$$
Both tensor norms, $\wedge$ and min, are associative and commutative and they share the duality relations given by $\pi$ and $\epsilon$. This means that for finite dimensional operator spaces we have the natural completely isometric identifications
$$(E\otimes_\wedge F)^* = CB(E,F^*) = E^*\otimes_{\min}F^*. \tag{1.2}$$
Here the matrix norms of the dual operator space $E^*$ of an operator space $E$ are given by $M_n(E^*) = CB(E, M_n)$.
A Banach space $X$ carries many different operator space structures. This means that there are different isometric inclusions in $B(H)$ with different tails of matrix norms. Fundamental examples are the row and column structures, defined on a Hilbert space $\ell_2^n$. For the row operator space $R_n$, we embed $\ell_2^n$ into $M_n$ as a row
$$R_n = \Big\{\sum_k\alpha_k|0\rangle\langle k| : \alpha_k\in\mathbb{C}\Big\}$$
and similarly we define the column operator space $C_n$ via
$$C_n = \Big\{\sum_k\alpha_k|k\rangle\langle 0| : \alpha_k\in\mathbb{C}\Big\}.$$
It can be seen that
$$\Big\|\sum_i A_i\otimes e_i\Big\|_{M_m\otimes_{\min}R_n} = \Big\|\sum_i A_iA_i^\dagger\Big\|^{1/2} \quad\text{and}\quad \Big\|\sum_i A_i\otimes e_i\Big\|_{M_m\otimes_{\min}C_n} = \Big\|\sum_i A_i^\dagger A_i\Big\|^{1/2}.$$
Here and in the rest of the work $e_i$ denotes the $i$th canonical vector of $\mathbb{C}^n$ (as $|i\rangle$ does).

We may also need the intersection of two operator spaces. Assume that $X$ and $Y$ are injectively embedded in the larger topological vector space $V$. Then we may define the norm on $M_n(X\cap Y) = M_n(X)\cap M_n(Y)$ by
$$\|(x_{ij})\|_{M_n(X\cap Y)} = \max\big\{\|(x_{ij})\|_{M_n(X)}, \|(x_{ij})\|_{M_n(Y)}\big\}.$$
It is easy to see that this definition satisfies Ruan's axioms, and moreover for $X\subset B(H)$, $Y\subset B(K)$,
$$X\cap Y\subset X\oplus Y\subset B(H)\oplus B(K)\subset B(H\oplus K),$$
where the last inclusion is given by diagonal operators, and the first inclusion by identifying elements which are considered equal in the ambient space $V$. Specifically, if we consider $R_n\cap C_n$, we obtain a new operator space structure on $\ell_2^n$, described by
$$\Big\|\sum_i A_i\otimes e_i\Big\|_{M_m\otimes_{\min}R_n\cap C_n} = \max\Big\{\Big\|\sum_i A_iA_i^\dagger\Big\|^{1/2}, \Big\|\sum_i A_i^\dagger A_i\Big\|^{1/2}\Big\}.$$
In this work we will also use Pisier's operator space $OH$ as a technical tool. We refer to its definition as complex interpolation space $OH = (R,C)_{1/2}$ and further properties to [58, Chap. 7] and [62]. The space $\ell_1^n$ carries a natural operator space structure as the dual of $\ell_\infty^n$, i.e. $\ell_1^n = (\ell_\infty^n)^*$. Note that for any operator space $X$ the natural operator space structure on $\ell_1(X)\subset(c_0\otimes_{\min}X^*)^*$ is given by the norm closure of $\ell_1\otimes X$. We will write $\ell_1^n(X)$ for the space given by $n$-tuples of elements in $X$ and observe that by definition
$$(\ell_\infty^n\otimes_{\min}X)^* = (\ell_\infty^n(X))^* = \ell_1^n(X^*) = \ell_1^n\otimes_\wedge X^*$$
holds completely isometrically.
Let $E_0$ and $E_1$ be two operator spaces so that $(E_0,E_1)$ is a compatible couple of Banach spaces. This means that $E_0$ and $E_1$ are injectively embedded in a topological vector space $V$. On the complex interpolation space $E_\theta = (E_0,E_1)_\theta$ we have a natural operator space structure given by the formula $M_n(E_\theta) = (M_n(E_0), M_n(E_1))_\theta$. This turns $E_\theta$ into an operator space (see [58], Chap. 2). As an application we observe that
$$\ell_2^n(\ell_\infty) = \big(\ell_1^n(\ell_\infty), \ell_\infty^n(\ell_\infty)\big)_{1/2}$$
carries a natural operator space structure. Using standard interpolation theory (a clever application of the three line lemma), this implies
$$\|\mathrm{id} : \ell_\infty^n(\ell_\infty)\to\ell_2^n(\ell_\infty)\|_{cb} = \sqrt{n} \quad\text{and}\quad \|\mathrm{id} : \ell_2^n(\ell_\infty)\to\ell_1^n(\ell_\infty)\|_{cb} = \sqrt{n}. \tag{1.3}$$

1.2. Connections to the physical problem. As it was shown in [35], we understand a Bell inequality (or more precisely the coefficients of a potential Bell inequality) as an element
$$M = \sum_{x,y;a,b=1}^{N,K} M^{a,b}_{x,y}\,(e_x\otimes e_a)\otimes(e_y\otimes e_b)\in\ell_1^N(\ell_\infty^K)\otimes\ell_1^N(\ell_\infty^K).$$
Looking at violations, we study the rate
$$\mathrm{viol}(M) = \frac{\|M\|_{\min}}{\|M\|_\epsilon}. \tag{1.4}$$
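Let us recall the following result.

Theorem 4 [35, Corollary 4 + Lemma 1]. Given an element $M = \sum_{x,y;a,b=1}^{N,K} M^{a,b}_{x,y}\,(e_x\otimes e_a)\otimes(e_y\otimes e_b)\in\ell_1^N(\ell_\infty^K)\otimes\ell_1^N(\ell_\infty^K)$ such that $\frac{\|M\|_{\min}}{\|M\|_\epsilon}\ge C$, we can define a Bell inequality $\hat M$ with $N$ inputs and $K+1$ outputs (just completing with zeros) and verifying
$$LV(\hat M) = \sup_{Q\in\mathcal{Q}}\frac{|\langle\hat M, Q\rangle|}{\sup_{P\in\mathcal{L}}|\langle\hat M, P\rangle|} \ge \frac{C}{16}. \tag{1.5}$$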
Furthermore, if the Hilbert space dimension required in Eq. (1.4) is n, Eq. (1.5) can be obtained with a Hilbert space dimension lower than or equal to 2n. Remark 1. The reader should note that the meaning of L V (M) here is different from the one in [35]. In the previous work, we used L V (M) to denote the large violation of M over incomplete probability distributions (see [35, Def. 1]). Here L V (M) represents the violation of M over the (complete) probability distributions. We will not deal with the incomplete probabilities here. In this paper we will also consider the problem of studying violation for a fixed state. This motivates the following definition
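Definition 1. Let $k$ be a natural number and let $\rho$ be a state acting on $\ell_2^k\otimes_2\ell_2^k$. Given a Bell inequality $M = (M^{a,b}_{x,y})_{x,y;a,b=1}^{N,K}$, we define the largest violation of $M$ over $\rho$ as
$$LV_\rho(M) = \sup_{Q\in\mathcal{Q}_\rho}\frac{|\langle M, Q\rangle|}{\sup_{P\in\mathcal{L}}|\langle M, P\rangle|}, \tag{1.6}$$
where
$$\mathcal{Q}_\rho = \Big\{\big(\mathrm{tr}(E_x^a\otimes F_y^b\,\rho)\big)_{x,y;a,b=1}^{N,K} : \{E_x^a\}_{x,a=1}^{N,K},\ \{F_y^b\}_{y,b=1}^{N,K}\ \text{POVM's}\Big\}.$$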
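The denominator in (1.6) can be evaluated by brute force for very small $N$ and $K$. The sketch below relies on the standard fact that the set of local distributions is the convex hull of deterministic strategies, so the supremum of the linear functional is attained at one of them. The function name `classical_bound`, the nested-list layout `M[x][y][a][b]`, and the CHSH coefficients used as test data are illustrative assumptions, not taken from the paper.

```python
from itertools import product


def classical_bound(M, N, K):
    """Return max |<M, P>| over deterministic local strategies.

    M[x][y][a][b] stores the coefficient M^{a,b}_{x,y} (0-based indices);
    inputs x, y take N values and outputs a, b take K values.
    """
    best = 0.0
    for f in product(range(K), repeat=N):      # Alice answers f[x] on input x
        for g in product(range(K), repeat=N):  # Bob answers g[y] on input y
            value = sum(M[x][y][f[x]][g[y]]
                        for x in range(N) for y in range(N))
            best = max(best, abs(value))
    return best


# Illustrative data: CHSH coefficients M^{a,b}_{x,y} = (-1)^(a + b + x*y).
chsh = [[[[(-1) ** (a + b + x * y) for b in range(2)]
          for a in range(2)]
         for y in range(2)]
        for x in range(2)]

print(classical_bound(chsh, 2, 2))
```

The enumeration visits $K^{2N}$ strategies, so it is only meant to make the definition concrete for toy sizes.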
We will abuse the notation by writing $LV_{|\eta\rangle}(M)$ instead of $LV_{|\eta\rangle\langle\eta|}(M)$ for any pure state. Actually, we will be mainly interested in the particular case where our state is the maximally entangled state $\rho_k = |\psi_k\rangle\langle\psi_k|$, where $|\psi_k\rangle = \frac{1}{\sqrt{k}}\sum_{i=1}^{k}|ii\rangle$. Furthermore, we will be interested in identifying Bell inequalities where the dimension free version
$$LV_{|\psi_{\max}\rangle}(M) = \sup_k LV_{|\psi_k\rangle}(M) \tag{1.7}$$
remains bounded. In fact similar objects have been studied in operator space theory (see [38]). Given two operator spaces $X$ and $Y$ and $a\in X\otimes Y$, we may define a modified min-norm
$$\|a\|_{\psi\text{-}\min} = \sup\big\{|\langle\psi_k|(u\otimes v)(a)|\psi_k\rangle|\big\},$$
where the sup runs over all $k\in\mathbb{N}$ and all complete contractions $u : X\to M_k$ and $v : Y\to M_k$. The following connection between this modified min norm and violation follows easily from the fact that for every POVM $\{E_x^a\}_{x,a=1}^{N,K}$ in $M_k$, the map $u : \ell_1^N(\ell_\infty^K)\to M_k$ given by $u(e_x\otimes e_a) = E_x^a$ for every $x, a$, is a complete contraction (see [35, Sect. 8]).

Lemma 1. Given an element $M = \sum_{x,y;a,b=1}^{N,K} M^{a,b}_{x,y}\,(e_x\otimes e_a)\otimes(e_y\otimes e_b)\in\ell_1^N(\ell_\infty^K)\otimes\ell_1^N(\ell_\infty^K)$ such that $\|M\|_{\psi\text{-}\min}\le C$, then
$$\sup_k\ \sup_{Q\in\mathcal{Q}_{|\psi_k\rangle}}|\langle M, Q\rangle|\le C.$$
In particular, if $\sup_{P\in\mathcal{L}}|\langle M, P\rangle| = 1$ we have that $\|M\|_{\psi\text{-}\min}\le C$ implies $LV_{|\psi_{\max}\rangle}(M)\le C$.

The following lemma will be very useful in Sect. 4.

Lemma 2. For every $a\in(R_n\cap C_n)\otimes(R_n\cap C_n)$ we have $\|a\|_{\psi\text{-}\min}\le 4\|a\|_\epsilon$.

Proof. First of all note that it is enough to consider $a = \mathrm{id} = \sum_{i=1}^{n}e_i\otimes e_i$. Indeed, for every $a$ we can consider the Hilbert Schmidt decomposition and write $a = \sum_{i=1}^{n}\alpha_i\,e_i\otimes U(e_i)$ for some unitary operator $U : \ell_2^n\to\ell_2^n$ and coefficients $(\alpha_i)_{i=1}^{n}$ verifying $|\alpha_i|\le\|a\|_\epsilon$ for every $i = 1,\dots,n$. But it is easy to see that we can consider the coefficients and the unitary operator as a part of the complete contractions in the definition of $\|a\|_{\psi\text{-}\min}$. Let $u : R_n\cap C_n\to M_k$ be a complete contraction. According to the Wittstock extension theorem (due independently to Haagerup, Paulsen and Wittstock, see [56]) and the definition of $R_n\cap C_n\subset R_n\oplus C_n$, we can extend $u : R_n\cap C_n\to M_k$ to $R_n\oplus C_n$ and find a decomposition $u = u_c + u_r$, where $u_c : C_n\to M_k$ and $u_r : R_n\to M_k$ are
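complete contractions. Therefore it is enough to consider the norm $\|\mathrm{id}\|_{\psi\text{-}\min}$ on the four spaces $R_n\otimes_{\psi\text{-}\min}C_n$, $R_n\otimes_{\psi\text{-}\min}R_n$, $C_n\otimes_{\psi\text{-}\min}R_n$, $C_n\otimes_{\psi\text{-}\min}C_n$. For this, let $(X_i)_{i=1}^{n}, (Y_i)_{i=1}^{n}\subset M_k$ be operators. Then, we deduce from the Cauchy-Schwarz inequality that
$$\Big|\langle\psi_k|\sum_{i=1}^{n}X_i\otimes Y_i|\psi_k\rangle\Big| = \Big|\frac{1}{k}\sum_{r,s=1}^{k}\sum_{i=1}^{n}X_i(r,s)\,Y_i(r,s)\Big| \le \Big(\frac{1}{k}\sum_{r,s=1}^{k}\sum_{i=1}^{n}X_i(r,s)\overline{X_i(r,s)}\Big)^{1/2}\Big(\frac{1}{k}\sum_{r,s=1}^{k}\sum_{i=1}^{n}Y_i(r,s)\overline{Y_i(r,s)}\Big)^{1/2}.$$
On the other hand, for every $(X_i)_{i=1}^{n}\subset M_k$,
$$\frac{\mathrm{tr}}{k}\Big(\sum_{i=1}^{n}X_iX_i^*\Big) = \frac{\mathrm{tr}}{k}\Big(\sum_{i=1}^{n}X_i^*X_i\Big) = \frac{1}{k}\sum_{r,s=1}^{k}\sum_{i=1}^{n}X_i(r,s)\overline{X_i(r,s)}.$$
The statement follows from the trivial inequality
$$\frac{\mathrm{tr}}{k}\Big(\sum_{i=1}^{n}X_i^*X_i\Big) \le \min\Big\{\Big\|\sum_{i=1}^{n}X_iX_i^*\Big\|, \Big\|\sum_{i=1}^{n}X_i^*X_i\Big\|\Big\}.$$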
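The first identity used in this computation, $\langle\psi_k|X\otimes Y|\psi_k\rangle = \frac{1}{k}\sum_{r,s}X(r,s)Y(r,s)$, can be checked numerically for small real matrices. The following is a minimal sketch using only the standard library; the helper names `kron`, `expectation_on_max_entangled` and `entrywise_formula` are hypothetical and introduced here purely for illustration.

```python
import random


def kron(X, Y):
    """Kronecker product of two k x k matrices given as nested lists."""
    k = len(X)
    return [[X[r][s] * Y[p][q] for s in range(k) for q in range(k)]
            for r in range(k) for p in range(k)]


def expectation_on_max_entangled(X, Y):
    """Compute <psi_k| X (x) Y |psi_k> with |psi_k> = k^{-1/2} sum_i |ii>."""
    k = len(X)
    psi = [0.0] * (k * k)
    for i in range(k):
        psi[i * k + i] = k ** -0.5  # basis vector |i j> sits at index i*k + j
    XY = kron(X, Y)
    return sum(psi[r] * XY[r][c] * psi[c]
               for r in range(k * k) for c in range(k * k))


def entrywise_formula(X, Y):
    """Right-hand side: (1/k) * sum_{r,s} X(r,s) * Y(r,s)."""
    k = len(X)
    return sum(X[r][s] * Y[r][s] for r in range(k) for s in range(k)) / k


rng = random.Random(0)
X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
Y = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
print(abs(expectation_on_max_entangled(X, Y) - entrywise_formula(X, Y)) < 1e-12)
```

Real entries are used so that no complex conjugation is needed; the identity itself holds for complex matrices as well.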
Hence for all four possibilities we obtain $\|\mathrm{id}\|_{\psi\text{-}\min}\le 1$.

2. Main Result

2.1. Proof of the main result. In this section we will present a first proof of Theorem 1. The key point of the proof is the construction of a complemented copy of $\ell_2^n$ in $\ell_1^n(\ell_\infty^n)$ (see Theorem 6). This result will be also very useful in Sects. 3 and 4. At the end of this section, we will discuss the optimality of our result and some interesting open questions. We refer to Sect. 3 for more information regarding the constants in Theorem 1. Note that according to Theorem 4, Theorem 1 follows from the next result.

Theorem 5. For every $n\in\mathbb{N}$, there exists an element $M\in\ell_1^n(\ell_\infty^n)\otimes\ell_1^n(\ell_\infty^n)$ such that
$$\frac{\|M\|_{\min}}{\|M\|_\epsilon}\succeq\frac{\sqrt{n}}{\log n}.$$
Furthermore, this can be achieved with a Hilbert space of dimension $n$.

Remark 2. Actually, the proof of the previous theorem guarantees that we can get Theorem 1 with a Hilbert space of dimension $2n$. However, we will see in Sect. 3 that $n+1$ dimensions suffice.

The key point to prove Theorem 5 is the following result.

Theorem 6. There exist $\delta\in(0,\frac{1}{2})$ and a universal constant $C$, such that, for every $n$, we have a Hilbert space $H_n$ of dimension $\delta n$ and maps $V : H_n\to\ell_1^n(\ell_\infty^n)$ and $V^* : \ell_1^n(\ell_\infty^n)\to H_n$ such that $\|V\|\le C\sqrt{\log n}$, $\|V^*\|\le 1$ and $V^*V = \mathrm{id}_{H_n}$. In particular, there exists a $C\sqrt{\log n}$-complemented copy of $\ell_2^{\delta n}$ in $\ell_1^n(\ell_\infty^n)$.
Remark 3. Theorem 6 has a very interesting physical interpretation. Recall that the natural space hosting joint probability distributions $(P(a,b|x,y))_{x,y;a,b=1}^{N,K}$ of Alice and Bob is the tensor product $\ell_\infty^N(\ell_1^K)\otimes\ell_\infty^N(\ell_1^K)$. Following this idea different descriptions of Nature are expressed through different tensor norms on this space. Roughly speaking $\epsilon$ corresponds to a local model of Nature and min to a quantum description (see Subsect. 5.2 for the rigorous formalization of this idea). However, it is conceivable that other models of nature are associated to other tensor norms (see Sect. 6 for such an interpretation). Our complementation result shows that on the subspace constructed in Theorem 6 the norms of the corresponding probabilities can be identified, up to a logarithmic term, by the $\alpha$-norm on the Hilbert space tensor product $\ell_2^n\otimes_\alpha\ell_2^n$. In general, tensor norms on Hilbert spaces are much easier to calculate.

We will recall Chevet's Inequality, which will be frequently used in this paper (see e.g. [45]). For its formulation, we should recall the notation of the weak-$\ell_2$ norm which is defined as
$$w_2((x_s)_s; X) = \sup\Big\{\Big(\sum_s|x^*(x_s)|^2\Big)^{1/2} : x^*\in X^*,\ \|x^*\|\le 1\Big\}$$
for every sequence $(x_s)_s$ in a Banach space $X$.

Theorem 7 (Chevet's Inequality). There exists a universal constant $b$ such that for every Banach spaces $E$, $F$ and every sequence $(g_{s,t})_{s,t}$ of independent normalized gaussian random variables we have
$$\mathbb{E}\Big\|\sum_{s,t}g_{s,t}\,x_s\otimes y_t\Big\|_{E\otimes_\epsilon F} \le b\,w_2((x_s)_s; E)\,\mathbb{E}\Big\|\sum_t g_t\,y_t\Big\|_F + b\,w_2((y_t)_t; F)\,\mathbb{E}\Big\|\sum_s g_s\,x_s\Big\|_E$$
for all $x_s\in E$ and $y_t\in F$, where $(g_t)_t$ and $(g_s)_s$ are also independent normalized gaussian random variables. Here $b = 1$ for real Banach spaces and $b = 4$ for complex spaces.

To prove Theorem 6, we will use the following three lemmas.

Lemma 3 [35, Lemma 5]. There exists $\delta\in(0,1/2)$ with the following property: Given natural numbers $n\le m$ and a family of independent normalized real gaussian random variables $(g_{ij})_{i,j=1}^{n,m}$, let $G = \sum_{i,j=1}^{n,m}g_{ij}\,e_i\otimes e_j$ be an operator from $\ell_2^n$ to $\ell_2^m$. Then, “with high probability”², there exists an operator $v_n : H_n\to\ell_2^n$ such that $v_n^*\,\frac{1}{m}G^*G\,v_n = \mathrm{id}_{H_n}$ and $\|v_n\|\le 2$, where we denote $H_n = \ell_2^{[\delta n]+1}$. Here $[x]$ denotes the entire part of the real number $x$.

² High probability means here that the probability tends to 1 exponentially fast as $m\to\infty$ (see Theorem 4.7 in [60]).

Lemma 4. Let $(g^k_{i,j})_{i,j,k=1}^{n}$ be a family of independent and normalized real gaussian variables and let the map $G$ be defined by
$$G(e_k) = \sum_{i,j=1}^{n}g^k_{i,j}\,e_i\otimes e_j$$
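for every $k = 1,\dots,n$. Then, there exists a universal constant $C_1 > 0$ such that
$$\mathbb{E}\|G : \ell_2^n\to\ell_2^n(\ell_\infty^n)\|\le C_1\sqrt{n\log n}.$$
In particular, $\mathbb{E}\|G : \ell_2^n\to\ell_1^n(\ell_\infty^n)\|\le C_1\,n\sqrt{\log n}$.

Proof. Chevet's Inequality implies that
$$\mathbb{E}\|G\|\le w_2((e_i)_i; \ell_2^n)\,\mathbb{E}\Big\|\sum_{i,j=1}^{n}g_{i,j}\,e_i\otimes e_j\Big\|_{\ell_2^n(\ell_\infty^n)} + w_2\big((e_i\otimes e_j)_{i,j}; \ell_2^n(\ell_\infty^n)\big)\,\mathbb{E}\Big\|\sum_{i=1}^{n}g_i\,e_i\Big\|_{\ell_2^n}.$$
It is well known that $w_2((e_i)_i; \ell_2^n) = 1$ and $\mathbb{E}\|\sum_{i=1}^{n}g_i e_i\|_{\ell_2^n}\le\sqrt{n}$. On the other hand, it is immediate to check that $w_2((e_i\otimes e_j)_{i,j}; \ell_2^n(\ell_\infty^n)) = 1$. Then it suffices to show
$$\mathbb{E}\Big\|\sum_{i,j=1}^{n}g_{i,j}\,e_i\otimes e_j\Big\|_{\ell_2^n(\ell_\infty^n)}\preceq\sqrt{n\log n}.$$
Indeed, using the well-known estimate $\mathbb{E}\|\sum_{i=1}^{n}g_i e_i\|_{\ell_\infty^n}\preceq\sqrt{\log n}$ (see e.g. [69, p. 15]), we have
$$\mathbb{E}\Big\|\sum_{i,j=1}^{n}g_{i,j}\,e_i\otimes e_j\Big\|_{\ell_2^n(\ell_\infty^n)}\le\sqrt{n}\,\mathbb{E}\Big\|\sum_{i,j=1}^{n}g_{i,j}\,e_i\otimes e_j\Big\|_{\ell_\infty^n(\ell_\infty^n)}\preceq\sqrt{n}\sqrt{\log n^2}.$$
The second assertion follows from $\|\mathrm{id} : \ell_2^n(\ell_\infty^n)\to\ell_1^n(\ell_\infty^n)\|\le\sqrt{n}$.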
Lemma 5. Let $(g^k_{i,j})_{i,j,k=1}^{n}$ be a family of independent and normalized real gaussian variables and let the map $G^*$ be defined by $G^*(e_i\otimes e_j) = \sum_{k=1}^{n}g^k_{i,j}\,e_k$ for every $i,j = 1,\dots,n$. Then,
$$\mathbb{E}\|G^* : \ell_1^n(\ell_2^n)\to\ell_2^n\|\le C_2\sqrt{n}$$
holds with a universal constant $C_2$. In particular $\mathbb{E}\|G^* : \ell_1^n(\ell_\infty^n)\to\ell_2^n\|\le C_2\,n$.

Proof. According to Chevet's Inequality, we have
$$\mathbb{E}\|G^*\|\le w_2\big((e_i\otimes e_j)_{i,j}; \ell_\infty^n(\ell_2^n)\big)\,\mathbb{E}\Big\|\sum_{i=1}^{n}g_i\,e_i\Big\|_{\ell_2^n} + w_2((e_i)_i; \ell_2^n)\,\mathbb{E}\Big\|\sum_{i,j=1}^{n}g_{i,j}\,e_i\otimes e_j\Big\|_{\ell_\infty^n(\ell_2^n)}.$$
Using the simple estimates mentioned in the proof of the previous lemma, it is enough to see that $w_2((e_i\otimes e_j)_{i,j}; \ell_\infty^n(\ell_2^n))\le 1$ and $\mathbb{E}\|\sum_{i,j=1}^{n}g_{i,j}\,e_i\otimes e_j\|_{\ell_\infty^n(\ell_2^n)}\preceq\sqrt{n}$. Indeed, for the first one just note that $\|\mathrm{id} : \ell_2^n(\ell_2^n)\to\ell_\infty^n(\ell_2^n)\|\le 1$. The other inequality
$$\mathbb{E}\Big\|\sum_{i,j=1}^{n}g_{i,j}\,e_i\otimes e_j\Big\|_{\ell_\infty^n(\ell_2^n)} = \mathbb{E}\Big\|\sum_{i,j=1}^{n}g_{i,j}\,e_i\otimes e_j\Big\|_{\ell_\infty^n\otimes_\epsilon\ell_2^n}\preceq\sqrt{n}$$
follows easily from a further application of Chevet's Inequality. For the last assertion just note that $\|\mathrm{id} : \ell_1^n(\ell_\infty^n)\to\ell_1^n(\ell_2^n)\|\le\sqrt{n}$.

Now we have all the required ingredients for the proof of Theorem 6.

Proof (Theorem 6). According to Chebyshev's Inequality, we may find a random matrix $(g^k_{i,j}(\omega))_{i,j,k=1}^{n}$ verifying Lemmas 3, 4 and 5 simultaneously with a slight modification in the constants for the expectation. Then, we define $W : \ell_2^n\to\ell_1^n(\ell_\infty^n)$ and $W^* : \ell_1^n(\ell_\infty^n)\to\ell_2^n$ respectively, as
$$W(e_k) = \frac{1}{n}\sum_{i,j=1}^{n}g^k_{i,j}(\omega)\,e_i\otimes e_j \quad\text{and}\quad W^*(e_i\otimes e_j) = \frac{1}{n}\sum_{l=1}^{n}g^l_{i,j}(\omega)\,e_l$$
for $k = 1,\dots,n$ and $1\le i,j\le n$. According to Lemma 4 and Lemma 5 we know that $\|W\|\preceq\sqrt{\log n}$ and $\|W^*\|\preceq 1$. On the other hand, by Lemma 3 applied to $m = n^2$, we obtain a subspace $H_n$ of $\ell_2^n$ and an operator $v_n : H_n\to\ell_2^n$ with $\|v_n\|\le 2$ such that $v_n^*(W^*W)v_n = \mathrm{id}_{H_n}$. Thus, defining $V = Wv_n$ and $V^* = v_n^*W^*$, we obtain the assertion.

In order to work with the operator space minimal tensor norm on $\ell_1^n(\ell_\infty^n)\otimes\ell_1^n(\ell_\infty^n)$, we will have to estimate the cb-norm of $V$ and $V^*$. The following lemma allows us to compute the cb-norm of an operator $T : \ell_1^n(\ell_\infty^n)\to R\cap C$.

Lemma 6. There exists a universal constant $C_3 > 0$, such that for every measure space $(\Omega,\mu)$, for every natural number $k\in\mathbb{N}$ and every operator $T : L_1(\Omega,\ell_\infty^k)\to\ell_2$,
$$\|T : L_1(\Omega,\ell_\infty^k)\to R\cap C\|_{cb}\le C_3\|T\|.$$
Furthermore, $C_3 = K_L$ is the constant in the little Grothendieck theorem.

Proof. By approximation it suffices to prove the assertion for $\ell_1^n(\ell_\infty^k)$ and arbitrary natural number $n$. Now, given an operator $T : \ell_1^n(\ell_\infty^k)\to\ell_2$, it is immediate that $\|T\| = \sup_i\|T_i\|$ and $\|T\|_{cb} = \sup_i\|T_i\|_{cb}$, where $T_i$ is the associated operator $T_i : \ell_\infty^k\to\ell_2$ defined by $T_i(e_j) = T(e_i\otimes e_j)$ for every $j = 1,\dots,k$ and $i = 1,\dots,n$. Now, according to the little Grothendieck theorem, the 2-summing norm of $T_i$ is bounded by $K\|T_i\|$, and hence $T_i = u_iD_{\sigma_i}$ factors through a diagonal map $D_{\sigma_i}(e_j) = \sigma_i(j)e_j$ (see [61]) with $\|u_i\|\le 1$ and $\|\sigma_i\|_2\le K\|T_i\|$. Let $a_j\in M_k$. Then we have
$$\Big\|\sum_j|\sigma_i(j)|^2\,a_j^*a_j\Big\|_{M_k}\le\sum_j|\sigma_i(j)|^2\,\sup_j\|a_j\|^2.$$
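Hence we have $\|D_{\sigma_i} : \ell_\infty^n\to C_n\|_{cb}\le\|\sigma_i\|_2$. On the other hand, it is well known (see [58]) that $\|u : \ell_2^n\to\ell_2^n\| = \|u : C_n\to C_n\|_{cb}$ for every operator $u$. Thus, $\|T_i : \ell_\infty^n\to C_n\|_{cb}\le K\|T_i\|$ for every $i = 1,\dots,n$. The estimate for $R_n$ is similar. The result follows by the definition of $R_n\cap C_n$.

Remark 4. The same proof works if we replace $R_n\cap C_n$ by $OH_n$.

We prove now Theorem 5:

Proof (Theorem 5). Let us consider the element $M = (V\otimes V)(a)\in\ell_1^n(\ell_\infty^n)\otimes\ell_1^n(\ell_\infty^n)$, where $V : \ell_2^{\delta n}\to\ell_1^n(\ell_\infty^n)$ is the operator in Theorem 6 and $a = \mathrm{id}\in\ell_2^{\delta n}\otimes\ell_2^{\delta n}$. It is enough to show:
a) $\|M\|_{\ell_1^n(\ell_\infty^n)\otimes_\epsilon\ell_1^n(\ell_\infty^n)}\preceq\log n$ and
b) $\|M\|_{\ell_1^n(\ell_\infty^n)\otimes_{\min}\ell_1^n(\ell_\infty^n)}\succeq\sqrt{n}$.
Observe that a) follows from Lemma 5. Indeed,
$$\|M\|_{\ell_1^n(\ell_\infty^n)\otimes_\epsilon\ell_1^n(\ell_\infty^n)} = \|(V\otimes V)(a)\|_{\ell_1^n(\ell_\infty^n)\otimes_\epsilon\ell_1^n(\ell_\infty^n)}\le\|V\|^2\,\|a\|_{\ell_2^{\delta n}\otimes_\epsilon\ell_2^{\delta n}}\preceq\log n.$$
For b) we recall that by definition
$$\|M\|_{\ell_1^n(\ell_\infty^n)\otimes_{\min}\ell_1^n(\ell_\infty^n)} = \sup\|(u\otimes v)(M)\|_{B(H)\otimes_{\min}B(H)},$$
where the sup runs over Hilbert spaces $H$ and complete contractions $u, v : \ell_1^n(\ell_\infty^n)\to B(H)$. Lemma 4 and Lemma 6 tell us that $V^* : \ell_1^n(\ell_\infty^n)\to R_n\subseteq M_n$ verifies that $\|V^*\|_{cb}\preceq 1$. This implies
$$\|M\|_{\ell_1^n(\ell_\infty^n)\otimes_{\min}\ell_1^n(\ell_\infty^n)}\succeq\|(V^*\otimes V^*)(M)\|_{R_n\otimes_{\min}R_n}.$$
But then
$$\|(V^*\otimes V^*)(V\otimes V)(a)\|_{R_{\delta n}\otimes_{\min}R_{\delta n}} = \|a\|_{R_{\delta n}\otimes_{\min}R_{\delta n}} = \|a\|_{\ell_2^{\delta n}\otimes_2\ell_2^{\delta n}} = \sqrt{\delta n}\succeq\sqrt{n}.$$
Thus, we conclude that $\|M\|_{\ell_1^n(\ell_\infty^n)\otimes_{\min}\ell_1^n(\ell_\infty^n)}\succeq\sqrt{n}$.

Problem 1. Theorem 5 exactly says that:
$$\big\|\mathrm{id}\otimes\mathrm{id} : \ell_1^n(\ell_\infty^n)\otimes_\epsilon\ell_1^n(\ell_\infty^n)\to\ell_1^n(\ell_\infty^n)\otimes_{\min}\ell_1^n(\ell_\infty^n)\big\|\succeq\frac{\sqrt{n}}{\log n}.$$
We don't know whether it's possible to improve this order to $\frac{n}{\log^\alpha n}$. According to Theorem 18 we cannot remove a log term for elements $M$ of rank $n$. However, for our specific $M$ we cannot improve the violation estimate.

Proposition 1. The element $M = (V\otimes V)(a)\in\ell_1^n(\ell_\infty^n)\otimes\ell_1^n(\ell_\infty^n)$ in Theorem 5 verifies
$$\|M\|_{\ell_1^n(\ell_\infty^n)\otimes_{\min}\ell_1^n(\ell_\infty^n)}\preceq\sqrt{n}\log n.$$
To prove Proposition 1 we need the following technical lemma: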
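Lemma 7. a) Let $X$ be an operator space and $T : X\to\ell_2^n(\ell_\infty^n)$ a linear map. Then $\|T\|_{cb}\le\sqrt{n}\,\|T\|$.
b) Let $T : OH\to\ell_2^n(\ell_\infty^n)$ be a linear map. Then $\|T\|_{cb}\le n^{1/4}\|T\|$.

Proof. For the proof of a), we consider the factorization
$$\mathrm{id}\circ\mathrm{id}\circ T : X\to\ell_2^n(\ell_\infty^n)\xrightarrow{\ \mathrm{id}\ }\ell_\infty^n(\ell_\infty^n)\xrightarrow{\ \mathrm{id}\ }\ell_2^n(\ell_\infty^n).$$
Obviously,
$$\|T\|_{cb}\le\|\mathrm{id}\circ T : X\to\ell_\infty^n(\ell_\infty^n)\|_{cb}\,\|\mathrm{id} : \ell_\infty^n(\ell_\infty^n)\to\ell_2^n(\ell_\infty^n)\|_{cb}\le\|\mathrm{id}\circ T : X\to\ell_\infty^n(\ell_\infty^n)\|\sqrt{n}\le\|T : X\to\ell_2^n(\ell_\infty^n)\|\sqrt{n}.$$
Here we used that $\ell_\infty^n(\ell_\infty^n) = \ell_\infty^{n^2}$ is a commutative $C^*$-algebra and estimate (1.3) from Sect. 1. For the proof of b), we consider a map $T : OH\to\ell_2^n(\ell_\infty^n)$. Then, the map
$$TT^* : \ell_2^n(\ell_1^n)\to OH^*\simeq OH\to\ell_2^n(\ell_\infty^n)$$
verifies $\|TT^*\|_{cb} = \|T\|_{cb}^2$ according to [58, Prop. 7.2]. Applying a), we deduce
$$\|T\|_{cb}^2 = \|TT^*\|_{cb}\le\sqrt{n}\,\|TT^*\| = \sqrt{n}\,\|T\|^2.$$
Hence $\|T\|_{cb}\le n^{1/4}\|T\|$.

Now, we can show the optimality of our result when we consider the element $M$.

Proof (Proposition 1). According to Lemma 4 and Lemma 7,
$$\|V : OH_n\to\ell_1^n(\ell_\infty^n)\|_{cb}\le\|V : OH_n\to\ell_2^n(\ell_\infty^n)\|_{cb}\,\|\mathrm{id} : \ell_2^n(\ell_\infty^n)\to\ell_1^n(\ell_\infty^n)\|_{cb}\preceq n^{1/4}\,\frac{\sqrt{\log n}}{\sqrt{n}}\,n^{1/2} = n^{1/4}\sqrt{\log n}.$$
Therefore, we obtain
$$\|M\|_{\ell_1^n(\ell_\infty^n)\otimes_{\min}\ell_1^n(\ell_\infty^n)} = \|(V\otimes V)(a)\|_{\ell_1^n(\ell_\infty^n)\otimes_{\min}\ell_1^n(\ell_\infty^n)}\preceq\sqrt{n}\log n\,\|a\|_{OH_{\delta n}\otimes_{\min}OH_{\delta n}} = \sqrt{n}\log n\,\|a\|_{\ell_2^{\delta n}\otimes_\epsilon\ell_2^{\delta n}}\preceq\sqrt{n}\log n.$$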
3. Explicit Form of the Violation In this section we will prove Theorem 2. It turns out that very little knowledge on a given state is required to have violation and that, in a certain sense, entanglement and violation are rather independent.
3.1. Constructing violation. In our approach violation of Bell inequalities is obtained by constructing Bell inequalities and the corresponding positive operator valued measurements simultaneously. The explicit form is derived from an explicit form of the Wittstock/Paulsen extension theorem. Another key ingredient is the factorization structure of the coefficients. Before proving the result, consider the following two remarks.

Remark 5 (Concerning the universal constants in Theorem 2). Although we are not going to give an explicit value of the constants $C$ and $K$, we would like to point out that for all the constants we are going to use there are explicit bounds in the literature.

Remark 6 (Random variables). Although we have proved Theorem 1 (via Theorem 5) using gaussian variables, it is well known that the same estimates work for Bernoulli variables ([69]) and random unitaries ([49,33]) (in this last case one has to normalize by a factor $\sqrt{n}$). Thus Theorem 2 can be stated using Bernoulli variables $(\epsilon^k_{x,a})_{x,a,k}$ (as it is stated), gaussian variables $(g^k_{x,a})_{x,a,k}$ and random unitaries $(U_k(x,a))_{x,a,k}$, where in this last case the probability is defined by the Haar measure on $\prod_{i=1}^{k}U_n$. It turns out that Bernoulli variables lead to a slight simplification in the proof of (0.3).

We will divide the proof of Theorem 2 into three steps. First, we refer to [63] for the simple fact that
$$\mathbb{E}\Big\|\sum_{x,a,k}\epsilon^k_{x,a}\,e_k\otimes(e_x\otimes e_a)\Big\|_{\ell_2^n\otimes_\epsilon\ell_2^n(\ell_\infty^n)}\le\sqrt{\frac{\pi}{2}}\ \mathbb{E}\Big\|\sum_{x,a,k}g^k_{x,a}\,e_k\otimes(e_x\otimes e_a)\Big\|_{\ell_2^n\otimes_\epsilon\ell_2^n(\ell_\infty^n)}$$
and the analogous inequality for the case $\ell_\infty^n(\ell_2^n)\otimes_\epsilon\ell_2^n$. This allows us to state Lemma 4 and Lemma 5 for Bernoulli variables replacing the constants $C_1$ and $C_2$ with $\sqrt{\frac{\pi}{2}}\,C_1$ and $\sqrt{\frac{\pi}{2}}\,C_2$ respectively. Furthermore, according to Chebyshev's inequality, we may find a random matrix $(\epsilon^k_{x,a})_{x,a,k=1}^{n}$ verifying Lemmas 4 and 5 simultaneously (replacing the constants $c$ for the expectation by $2c$). Let's call $K_1$ and $K_2$ the corresponding final constants.

Step 1.
$$\sup\Big\{\Big|\sum_{x,y;a,b=1}^{n,n+1}\tilde M^{a,b}_{x,y}\,P(a,b|x,y)\Big| : P\in\mathcal{L}\Big\}\le K_1^2\log n.$$
Proof. Indeed, in [35, Prop. 4] we gave an elementary proof of
$$\sup\Big\{\Big|\sum_{x,y;a,b=1}^{n,n+1}\tilde M^{a,b}_{x,y}\,P(a,b|x,y)\Big| : P\in\mathcal{L}\Big\}\le\|\tilde M\|_{\ell_1^n(\ell_\infty^{n+1})\otimes_\epsilon\ell_1^n(\ell_\infty^{n+1})}.$$
Moreover,
$$\|\tilde M\|_{\ell_1^n(\ell_\infty^{n+1})\otimes_\epsilon\ell_1^n(\ell_\infty^{n+1})} = \|M\|_{\ell_1^n(\ell_\infty^n)\otimes_\epsilon\ell_1^n(\ell_\infty^n)}\le K_1^2\log n$$
follows immediately from the injectivity of the $\epsilon$-norm for the first equality and from Lemma 4. Indeed, if $E : \ell_2^n\to\ell_1^n(\ell_\infty^n)$ denotes the map $E(e_k) = \sum_{x,a=1}^{n}\epsilon^k_{x,a}\,e_x\otimes e_a$ for every $k = 1,\dots,n$, then we have
$$\|M\|_{\ell_1^n(\ell_\infty^n)\otimes_\epsilon\ell_1^n(\ell_\infty^n)} = \frac{1}{n^2}\big\|(E\otimes E)(\mathrm{id}_{\ell_2^n\otimes\ell_2^n})\big\|_{\ell_1^n(\ell_\infty^n)\otimes_\epsilon\ell_1^n(\ell_\infty^n)}\le K_1^2\log n.$$
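Step 2. Moreover, the operators $\{E_x^a\}_{x,a=1}^{n,n+1}$ in the statement of Theorem 2 define POVM measurements in $M_{n+1}$.

Proof. Thanks to our definition of $E_x^{n+1}$, it certainly suffices to show $\sum_{a=1}^{n}E_x^a\le 1$. Our proof is motivated by Paulsen's description of cb-maps from $\ell_\infty^n\to M_k$. Let us fix $x$ and consider the operators
$$\tilde E_x^a = \frac{1}{n}\sum_{k=1}^{n}\epsilon^k_{x,a}\,e_{1,k} = \frac{1}{n}\begin{pmatrix}\epsilon^1_{x,a} & \epsilon^2_{x,a} & \cdots & \epsilon^n_{x,a}\\ 0 & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 0\end{pmatrix}$$
for every $a = 1,\dots,n$. Here $e_{i,j}$ denotes the matrix with 1 in the $ij$th entry and 0's elsewhere. We define the operators $\alpha_a^x = \frac{1}{\sqrt{n}}e_{1,1}$ and $\beta_a^x = \frac{1}{\sqrt{n}}\sum_{k=1}^{n}\epsilon^k_{x,a}\,e_{1,k}$ and note that $\tilde E_x^a = \alpha_a^x\beta_a^x$ holds for every $a = 1,\dots,n$. Then we observe that
$$\sum_{a=1}^{n}\alpha_a^x(\alpha_a^x)^\dagger = e_{11}\le 1.$$
Let $R = \big(\frac{\epsilon^k_{x,a}}{\sqrt{n}}\big)_{a,k=1}^{n}$. According to Lemma 5, we have $\|R\|\le K_2$ and hence
$$\Big\|\sum_{a=1}^{n}(\beta_a^x)^\dagger\beta_a^x\Big\| = \|R^*R\|\le K_2^2.$$
Then, we define the positive operators
$$\hat E_x^a = \begin{pmatrix}(\alpha_a^x)^\dagger & 0\\ (\beta_a^x)^\dagger & 0\end{pmatrix}\begin{pmatrix}\alpha_a^x & \beta_a^x\\ 0 & 0\end{pmatrix} = \begin{pmatrix}(\alpha_a^x)^\dagger\alpha_a^x & (\alpha_a^x)^\dagger\beta_a^x\\ (\beta_a^x)^\dagger\alpha_a^x & (\beta_a^x)^\dagger\beta_a^x\end{pmatrix} = \frac{1}{n}\begin{pmatrix}1 & 0 & \cdots & 0 & \epsilon^1_{x,a} & \cdots & \epsilon^n_{x,a}\\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0\\ \vdots & \vdots & & \vdots & \vdots & & \vdots\\ 0 & 0 & \cdots & 0 & 0 & \cdots & 0\\ \epsilon^1_{x,a} & 0 & \cdots & 0 & \epsilon^1_{x,a}\epsilon^1_{x,a} & \cdots & \epsilon^1_{x,a}\epsilon^n_{x,a}\\ \vdots & \vdots & & \vdots & \vdots & & \vdots\\ \epsilon^n_{x,a} & 0 & \cdots & 0 & \epsilon^n_{x,a}\epsilon^1_{x,a} & \cdots & \epsilon^n_{x,a}\epsilon^n_{x,a}\end{pmatrix}.$$
Note that these operators are in $M_{2n}$ for every $a = 1,\dots,n$. However, we may erase columns and rows and obtain the positive operators $E_x^a$ in $M_{n+1}$ stated in our assertion using the constant $K = 2K_2^2$. Thus it remains to show that $\frac{1}{2K_2^2}\sum_{a=1}^{n}\hat E_x^a\le 1$. We recall that for a positive matrix $A = \begin{pmatrix}a & b\\ b^* & c\end{pmatrix}$ we have $A\le 2\begin{pmatrix}a & 0\\ 0 & c\end{pmatrix}$. Hence,
$$\frac{1}{2K_2^2}\sum_{a=1}^{n}\hat E_x^a\le\frac{1}{K_2^2}\begin{pmatrix}1 & 0\\ 0 & \sum_a(\beta_a^x)^\dagger\beta_a^x\end{pmatrix}\le 1.$$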
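Step 3.
$$\sum_{x,y;a,b=1}^{n,n+1}\tilde M^{a,b}_{x,y}\,\langle\varphi_\alpha|E_x^a\otimes E_y^b|\varphi_\alpha\rangle\ge\frac{2}{K^2}\,\alpha_1\sum_{i=2}^{n+1}\alpha_i.$$
Proof. Let us denote by $E_x^a = \big(E_x^a(k,l)\big)_{k,l=1}^{n+1}$ the matrix coefficients of our operators from Step 2. Using the fact that the $E_x^a$ are selfadjoint, we deduce
$$\sum_{x,y;a,b=1}^{n,n+1}\tilde M^{a,b}_{x,y}\,\langle\varphi|E_x^a\otimes E_y^b|\varphi\rangle = \alpha_1^2\sum_{x,y,a,b=1}^{n}M^{a,b}_{x,y}E_x^a(1,1)E_y^b(1,1) + 2\alpha_1\sum_{i=2}^{n+1}\alpha_i\sum_{x,y,a,b=1}^{n}M^{a,b}_{x,y}E_x^a(1,i)E_y^b(1,i) + \sum_{i,j=2}^{n+1}\alpha_i\alpha_j\sum_{x,y,a,b=1}^{n}M^{a,b}_{x,y}E_x^a(i,j)E_y^b(i,j) = I + II + III.$$
Let us start with the main term
$$II = 2\alpha_1\sum_{i=2}^{n+1}\alpha_i\sum_{x,y,a,b=1}^{n}M^{a,b}_{x,y}E_x^a(1,i)E_y^b(1,i) = \frac{2\alpha_1}{K^2n^2}\sum_{i=2}^{n+1}\alpha_i\sum_{x,y,a,b=1}^{n}M^{a,b}_{x,y}\,\epsilon^{i-1}_{x,a}\epsilon^{i-1}_{y,b}$$
$$= \frac{2\alpha_1}{K^2n^4}\sum_{i=2}^{n+1}\alpha_i\sum_{x,y,a,b=1}^{n}\sum_{k=1}^{n}\epsilon^k_{x,a}\epsilon^k_{y,b}\epsilon^{i-1}_{x,a}\epsilon^{i-1}_{y,b} = \frac{2\alpha_1}{K^2n^4}\sum_{k=1}^{n}\sum_{i=2}^{n+1}\alpha_i\sum_{x,y,a,b=1}^{n}\epsilon^k_{x,a}\epsilon^k_{y,b}\epsilon^{i-1}_{x,a}\epsilon^{i-1}_{y,b}$$
$$= \frac{2}{K^2n^4}\,\alpha_1\sum_{k=1}^{n}\sum_{i=2}^{n+1}\alpha_i\Big(\sum_{x,a=1}^{n}\epsilon^k_{x,a}\epsilon^{i-1}_{x,a}\Big)^2\ge\frac{2}{K^2n^4}\,\alpha_1\sum_{i=2}^{n+1}\alpha_i\Big(\sum_{x,a=1}^{n}\epsilon^{i-1}_{x,a}\epsilon^{i-1}_{x,a}\Big)^2 = \frac{2}{K^2}\,\alpha_1\sum_{i=2}^{n+1}\alpha_i.$$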
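The lower bound for the main term in Step 3 rests on a purely combinatorial fact: with $M^{a,b}_{x,y} = \frac{1}{n^2}\sum_k\epsilon^k_{x,a}\epsilon^k_{y,b}$, the sum $\sum_{x,y,a,b}\sum_k\epsilon^k_{x,a}\epsilon^k_{y,b}\epsilon^{i-1}_{x,a}\epsilon^{i-1}_{y,b}$ equals $\sum_k\big(\sum_{x,a}\epsilon^k_{x,a}\epsilon^{i-1}_{x,a}\big)^2$ and is at least $n^4$ for every choice of signs. The sketch below checks this with exact integer arithmetic for a small $n$; the names `random_signs` and `main_term_sums` are hypothetical helpers introduced for illustration, and indices are 0-based, so `i` here plays the role of $i-1$.

```python
import random
from itertools import product


def random_signs(n, seed=0):
    """Return eps[k][x][a] in {-1, +1} for k, x, a in range(n)."""
    rng = random.Random(seed)
    return [[[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)]
            for _ in range(n)]


def main_term_sums(eps):
    """For each sign vector i, return the four-fold sum and its squared form.

    direct[i]  = sum_{x,y,a,b,k} eps^k_{x,a} eps^k_{y,b} eps^i_{x,a} eps^i_{y,b}
    squared[i] = sum_k ( sum_{x,a} eps^k_{x,a} eps^i_{x,a} )^2
    """
    n = len(eps)
    pairs = list(product(range(n), repeat=2))  # all index pairs (x, a)
    direct, squared = [], []
    for i in range(n):
        d = sum(eps[k][x][a] * eps[k][y][b] * eps[i][x][a] * eps[i][y][b]
                for k in range(n) for (x, a) in pairs for (y, b) in pairs)
        s = sum(sum(eps[k][x][a] * eps[i][x][a] for (x, a) in pairs) ** 2
                for k in range(n))
        direct.append(d)
        squared.append(s)
    return direct, squared


n = 3
direct, squared = main_term_sums(random_signs(n))
print(direct == squared, all(v >= n ** 4 for v in direct))
```

The bound $n^4$ comes from the single term $k = i$ in the squared form, which is exactly the step used in the displayed chain of inequalities.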
To conclude the proof it remains to show that the other two terms are positive. Indeed for the first term we have
$$I = \alpha_1^2\sum_{x,y,a,b=1}^{n}M^{a,b}_{x,y}E_x^a(1,1)E_y^b(1,1) = \frac{\alpha_1^2}{K^2n^4}\sum_{x,y,a,b,k=1}^{n}\epsilon^k_{x,a}\epsilon^k_{y,b} = \frac{\alpha_1^2}{K^2n^4}\sum_{k=1}^{n}\Big(\sum_{x,a=1}^{n}\epsilon^k_{x,a}\Big)^2\ge 0.$$
For the third and last term we argue similarly,
$$III = \sum_{i,j=2}^{n+1}\alpha_i\alpha_j\sum_{x,y,a,b=1}^{n}M^{a,b}_{x,y}E_x^a(i,j)E_y^b(i,j) = \frac{1}{K^2n^2}\sum_{i,j=2}^{n+1}\alpha_i\alpha_j\sum_{x,y,a,b=1}^{n}M^{a,b}_{x,y}\,\epsilon^{i-1}_{x,a}\epsilon^{j-1}_{x,a}\epsilon^{i-1}_{y,b}\epsilon^{j-1}_{y,b}$$
$$= \frac{1}{K^2n^4}\sum_{k=1}^{n}\sum_{i,j=2}^{n+1}\alpha_i\alpha_j\sum_{x,y,a,b=1}^{n}\epsilon^k_{x,a}\epsilon^k_{y,b}\,\epsilon^{i-1}_{x,a}\epsilon^{j-1}_{x,a}\epsilon^{i-1}_{y,b}\epsilon^{j-1}_{y,b} = \frac{1}{K^2n^4}\sum_{k=1}^{n}\sum_{i,j=2}^{n+1}\alpha_i\alpha_j\Big(\sum_{x,a=1}^{n}\epsilon^k_{x,a}\epsilon^{i-1}_{x,a}\epsilon^{j-1}_{x,a}\Big)^2\ge 0.$$

Remark 7. Note that the lower estimate from Step 3 holds for every choice of signs. The exponential estimate for the $\epsilon$-norm of the Bell inequality in Step 1 and for the POVM measurements in Step 2 for the Bernoulli variables follows from the contraction principle and the corresponding exponential estimate for gaussian variables, an immediate consequence of the deviation inequalities (see [63] and [44]).

3.2. Entanglement and quantum nonlocality. We will now exhibit a family of states $(|\varphi_\alpha\rangle)_{0<\alpha<1}$ on $\ell_2^{n+1}\otimes_2\ell_2^{n+1}$ and show that in a certain range $\mu_n\le\alpha\le\nu_n$ we find coefficients of Bell inequalities $M\in\ell_1^n(\ell_\infty^{n+1})\otimes\ell_1^n(\ell_\infty^{n+1})$ such that
$$\frac{\sqrt{n}}{\log^2 n}\preceq LV_{|\varphi_\alpha\rangle}(M).$$
We will study the entropy of entanglement of these states. We refer to Definition 1 for the precise definition of $LV_\rho(M)$ as the largest violation of $M$ over a given state $\rho$ and propose a measure of nonlocality, namely the largest Bell violation that $\rho$ may attain:
$$LV_\rho = \sup_M LV_\rho(M).$$
We were very surprised when comparing this measure of violation with the entropy of entanglement of these states. Let us recall that for a pure state $|\psi\rangle \in H_A \otimes H_B$ the corresponding density is defined as $\rho_A = tr_B(|\psi\rangle\langle\psi|)$.
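Here $tr_B = id \otimes tr_{H_B}$ is the partial trace onto the first Hilbert space $H_A$. For the convenience of the reader let us recall that for $|\psi\rangle = \sum_{i=1}^{n+1} \alpha_i |ii\rangle$ we have
\[
\rho_{|\psi\rangle} = \sum_{i=1}^{n+1} \alpha_i^2\, |i\rangle\langle i|.
\]
The entropy of entanglement is given by the von Neumann entropy of $\rho_A$, i.e.
\[
E(\psi) = H\big((\rho_\psi)_A\big) = -tr\big(\rho_A \log_2(\rho_A)\big).
\]
The following lemma is completely elementary.

Lemma 8. Let
\[
|\varphi_\alpha\rangle = \alpha |11\rangle + \frac{\sqrt{1-\alpha^2}}{\sqrt{n}} \sum_{i=2}^{n+1} |ii\rangle \in \ell_2^{n+1} \otimes_2 \ell_2^{n+1}.
\]
Then the function $f(\alpha) = f_n(\alpha) = E(\varphi_\alpha)$ satisfies
i) $f_n(\alpha) = -\alpha^2 \log_2(\alpha^2) - \sum_{i=2}^{n+1} \frac{1-\alpha^2}{n} \log_2 \frac{1-\alpha^2}{n} = \alpha^2 \log_2 \frac{1}{\alpha^2} + (1-\alpha^2) \log_2 \frac{n}{1-\alpha^2}$;
ii) $f_n(0) = \log_2(n)$, $f_n(1) = 0$;
iii) the function $f_n$ has a unique maximum $f_n(\alpha_n) = \log_2(n+1)$ for the maximally entangled state $\varphi_{\alpha_n}$, $\alpha_n = \frac{1}{\sqrt{n+1}}$;
iv) Let $\epsilon, \delta > 0$, $n \ge 2$, $p \ge 1$ and $\epsilon \le \log_2(n)$, $2\delta \le \log_2^p(n)$. Set $\mu_n^2 = \frac{\epsilon}{\log_2(n)}$ and $1 - \nu_n^2 = \frac{\delta}{\log_2^p(n)}$. Then
\[
\log_2(n+1) - \epsilon - \frac{1}{n \ln 2} \le \log_2(n) - \epsilon \le f(\mu_n) \quad \text{and} \quad f(\nu_n) \le 4\delta \log_2^{1-p} n + \sqrt{\delta}\, \log_2^{-p/2} n.
\]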
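The entropy function of Lemma 8 is easy to check numerically. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, and the code only evaluates $f_n$ from the Schmidt coefficients of $|\varphi_\alpha\rangle$ and compares it with the closed form in i).

```python
import math

def schmidt_coefficients(alpha, n):
    """Schmidt coefficients of alpha|11> + sqrt((1-alpha^2)/n) * sum_{i=2}^{n+1} |ii>."""
    tail = math.sqrt((1.0 - alpha ** 2) / n)
    return [alpha] + [tail] * n

def entanglement_entropy(coeffs):
    """Base-2 von Neumann entropy of the reduced state with eigenvalues coeffs[i]**2."""
    total = 0.0
    for c in coeffs:
        p = c * c
        if p > 0.0:  # convention: 0 * log 0 = 0
            total -= p * math.log2(p)
    return total

def f(alpha, n):
    """f_n(alpha) = E(phi_alpha), computed directly from the definition."""
    return entanglement_entropy(schmidt_coefficients(alpha, n))

def f_closed_form(alpha, n):
    """Second expression in Lemma 8 i); only valid for 0 < alpha < 1."""
    a2 = alpha ** 2
    return a2 * math.log2(1.0 / a2) + (1.0 - a2) * math.log2(n / (1.0 - a2))
```

For instance, with $n = 8$ one can compare `f(0.0, 8)` with $\log_2 8$ and `f(1/3, 8)` with $\log_2 9$, in line with ii) and iii).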
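Corollary 1 follows from the next theorem.

Theorem 8. Let $n \ge 2$ and $0 < \epsilon < \frac{1}{2}$. Then there exists a family of pure states $(|\varphi_\alpha\rangle)_{0<\alpha<1}$ on $\ell_2^{n+1} \otimes_2 \ell_2^{n+1}$ such that
\[
LV_{|\varphi_\alpha\rangle} \ge c\, \epsilon\, \frac{\sqrt{n}}{\log^2 n}
\]
for all $\alpha$ and
\[
\{E(|\varphi_\alpha\rangle) : 0 < \alpha < 1\} \supset \Big[ \frac{8\epsilon}{\log_2(n)},\ \log_2(n) - \epsilon \Big].
\]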
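Proof. Let $|\varphi_\alpha\rangle$ be defined as above, $0 < \epsilon < \frac{1}{2}$, $p = 2$ and $\delta = \epsilon^2$. Since $n \ge 2$, we trivially have $2\delta \le \log_2^2(n)$. We consider $\alpha$ in the range $[\mu_n, \nu_n]$ and deduce from Theorem 2 the existence of $Q \in \mathcal{Q}_{|\varphi_\alpha\rangle}$ and $\tilde{M}$ such that
\[
\langle Q, \tilde{M} \rangle \ge \frac{2}{K^2}\, \alpha_1 \sum_{i=2}^{n+1} \alpha_i = \frac{2\sqrt{n}}{K^2}\, \alpha \sqrt{1-\alpha^2}
\ge \frac{2\sqrt{n}}{K^2} \min\Big\{ \mu_n \sqrt{1-\mu_n^2},\ \nu_n \sqrt{1-\nu_n^2} \Big\}
\ge \frac{\sqrt{n}}{K^2} \min\Big\{ \sqrt{\frac{\epsilon}{\log_2 n}},\ \sqrt{\frac{\delta}{\log_2^2 n}} \Big\}.
\]
However, we should also normalize with the $\epsilon$-norm of $\tilde{M}$ and have to consider $\tilde{M}' = \frac{\tilde{M}}{C \log n}$ instead. This yields
\[
LV_{|\varphi_\alpha\rangle} \ge c\, \epsilon\, \frac{\sqrt{n}}{\log_2^2 n}.
\]
The intermediate value theorem concludes the proof thanks to Lemma 8, iv).

Theorem 2 allows us to go a little further and introduce the following indicator of violation of a pure state $|\psi\rangle$ given by
\[
iviol(|\psi\rangle) = \| |\psi\rangle \|_\infty\, \| |\psi\rangle \|_1,
\]
where $\| |\psi\rangle \|_p = (tr(||\psi\rangle|^p))^{1/p}$ is the $p$-norm of the Hilbert-Schmidt operator associated with $|\psi\rangle$. Let us note that for a state $\| |\psi\rangle \|_2 = 1$ and hence the well-known application of Hölder's inequality
\[
1 = \| |\psi\rangle \|_2 \le \| |\psi\rangle \|_1^{1/2}\, \| |\psi\rangle \|_\infty^{1/2}
\]
shows that $iviol(\psi) \ge 1$. We may reformulate Theorem 2 as follows:

Corollary 2. Let $n \ge 2$ and $|\psi\rangle \in \ell_2^n \otimes \ell_2^n$ with $iviol(|\psi\rangle) \ge 2$. Then
\[
LV_{|\psi\rangle} \ge c\, \frac{iviol(|\psi\rangle)}{\log n}
\]
holds for an absolute constant $c$.

Proof. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the singular values of $|\psi\rangle$. Then we deduce from our assumption that
\[
iviol(|\psi\rangle) = \lambda_1 \Big( \sum_{i=1}^{n} \lambda_i \Big) = \lambda_1 \Big( \sum_{i=2}^{n} \lambda_i \Big) + \lambda_1^2 \le \lambda_1 \Big( \sum_{i=2}^{n} \lambda_i \Big) + 1 \le \lambda_1 \Big( \sum_{i=2}^{n} \lambda_i \Big) + \frac{1}{2}\, iviol(|\psi\rangle).
\]
Thus we find $iviol(|\psi\rangle) \le 2 \lambda_1 \big( \sum_{i=2}^{n} \lambda_i \big)$. We may diagonalize $|\psi\rangle$ with local operations (i.e. unitary operations $u \otimes 1$ and $1 \otimes v$ in $\ell_2^n \otimes \ell_2^n$) and assume that $|\psi\rangle = \lambda_1 |11\rangle + \sum_{i=2}^{n} \lambda_i |ii\rangle$. Then Theorem 2 applied for $n-1$ yields the assertion.

Conclusion 1. Large violation of order $\sqrt{n}$ can occur independently of the entropy of entanglement of a pure state, as long as the state is not too local ($rank(|\psi\rangle) \le k$) or too maximally entangled ($\max_i \{\lambda_i\} \le \frac{k}{\sqrt{n}}$).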
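The indicator $iviol$ introduced before Corollary 2 depends only on the singular values (Schmidt coefficients) of the state. As a minimal illustration, assuming the state is given directly by a list of nonnegative singular values (the function name `iviol` below is only a hypothetical helper), it can be computed as follows.

```python
import math

def iviol(singular_values):
    """iviol = ||psi||_inf * ||psi||_1 for a state with the given singular values.

    The input is rescaled so that ||psi||_2 = 1 before the two norms are taken.
    """
    norm2 = math.sqrt(sum(s * s for s in singular_values))
    normalized = [s / norm2 for s in singular_values]
    return max(normalized) * sum(normalized)

# A product state and a maximally entangled state both give the minimal value 1,
# whereas phi_alpha with alpha = 1/sqrt(2) and n = 16 gives 1/2 + sqrt(16)/2.
product_state = [1.0, 0.0, 0.0]
maximally_entangled = [1.0] * 5
phi = [1.0 / math.sqrt(2.0)] + [math.sqrt(1.0 / 32.0)] * 16
```

This matches the dichotomy in Conclusion 1: both extreme cases have $iviol = 1$, while the intermediate states $|\varphi_\alpha\rangle$ have $iviol$ of order $\sqrt{n}$.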
4. Nonlocality and the Maximally Entangled State

In this section we will investigate conditions for Bell inequalities which either avoid violation in the maximally entangled state or enforce violation to occur on the maximally entangled state.

4.1. Unbounded violation away from the maximally entangled state. The fact that the maximally entangled state is not optimal in terms of violation is not new. Indeed, there are many examples in the context of quantum nonlocality where the maximally entangled state has been shown not to be the most nonlocal one. We can find some of these anomalies ([53]) in the study of Bell inequalities ([2]), detection loophole ([26]), extractable secrecy key ([68]), K-L distance ([3]), etc. Here we will show that there are Bell inequalities which avoid violation in the maximally entangled state in high dimension. The examples are closely related to Theorem 2, but we need more inputs. From an operator space perspective, we may say that we use a $C \log n$ completely complemented copy of $R_n \cap C_n$.

Theorem 9. There exist universal constants $C, D > 0$ with the following property: For $n \in \mathbb{N}$ there are linear maps $S : R_n \cap C_n \to \ell_1^k(\ell_\infty^{Dn})$ and $S^* : \ell_1^k(\ell_\infty^{Dn}) \to R_n \cap C_n$ for some $k$ such that $S^* S = id_{\ell_2^n}$ with $\|S^*\|_{cb} \le 1$ and $\|S\|_{cb} \le C \log n$. Moreover, $k \le 2^{D^2 n^2}$.
Before proving Theorem 9, let us see how to obtain Theorem 3.

Proof (Theorem 3). Following the same steps as in Theorem 5, we can prove that the element $G = (S \otimes S)\big( \sum_{k} e_k \otimes e_k \big) \in \ell_1^k(\ell_\infty^{Dn}) \otimes \ell_1^k(\ell_\infty^{Dn})$ satisfies
\[
\|G\|_\epsilon \preceq \log n \quad \text{and} \quad \|G\|_{min} \succeq \sqrt{n}.
\]
According to Lemma 2 and the cb-estimate from Theorem 9, we obtain
\[
\|M\|_{\psi\text{-}min} = \Big\| (S \otimes S)\Big( \sum_k e_k \otimes e_k \Big) \Big\|_{\psi\text{-}min} \le \|S\|_{cb}^2\, \|a\|_{\psi\text{-}min} \preceq \log n.
\]
Taking $M' = \frac{M}{\log n}$ and following Theorem 4, we obtain $\tilde{M} \in \ell_1^k(\ell_\infty^{Dn+1}) \otimes \ell_1^k(\ell_\infty^{Dn+1})$ such that $LV(\tilde{M}) \succeq \frac{\sqrt{n}}{\log n}$. Furthermore, adding 0's does not change the $\psi$-min-norm. Thus, we conclude the proof applying Lemma 1.

For any $1 \le p < \infty$ we will denote by $S_p^n$ the space of $n \times n$ matrices endowed with the norm $\|x\|_p = (tr(|x|^p))^{1/p}$. We need the following well-known lemma.

Lemma 9. Let $f_l$ be identically distributed independent copies of a random matrix $f$ on a probability space and $1 \le q \le \infty$. Then
\[
\Big\| \sum_{l=1}^{n} f_l \otimes e_l \Big\|_{L_1(S_1^m(\ell_\infty^n))} \le n^{1/q}\, \|f\|_{S_1^m(L_q)}. \tag{4.1}
\]
Proof. Let us first prove this for $q = 1$. In this case the triangle inequality clearly implies the assertion. For $q = \infty$ we may assume that $f$ is positive and that it can be written as $f = a^* F a$, with $a \in S_2^m$, $F \in L_\infty(M_m)$, such that $\|a\|_2 \le 1$ and $\|F\|_\infty \le \|f\|_{S_1^m(L_\infty)}$. Let $F_l$ be independent copies. Then we find $f_l = a^* F_l a$ and
\[
\sup_l \|F_l\| \le \|F\|_\infty.
\]
Since the underlying measure space is a probability space, we obtain the assertion in the case $q = \infty$. For $1 \le q \le \infty$, this follows from a complex interpolation argument as in [37]. Just note that in operator space jargon, the lemma is an immediate consequence of the fact that $id : \ell_q^n \to \ell_\infty^n$ and $id : L_q(\Omega; X) \to L_1(\Omega; X)$ are complete contractions.

Although the techniques to prove Theorem 9 are similar to the ones used before, we have to use a random embedding of $\ell_2^n$ into $L_1(\Omega, \ell_\infty^n)$ contained in the following lemma:

Lemma 10. Let $(g_{i,j})_{i,j=1}^n$ be a normalized family of independent and identically distributed real gaussian variables. The map $u : R_n \cap C_n \to L_1(\Omega, \ell_\infty^n)$ defined by
\[
u(e_i) = \sum_{j=1}^{n} g_{i,j} \otimes e_j \quad \text{for every } i = 1, \ldots, n,
\]
satisfies $\|u\|_{cb} \preceq \sqrt{\log n}$. Similarly, the map $u_{RAD} : R_n \cap C_n \to L_1(\Omega, \ell_\infty^n)$ defined by $u_{RAD}(e_i) = \sum_{j=1}^{n} \epsilon_{i,j} \otimes e_j$ for every $i = 1, \ldots, n$ satisfies $\|u_{RAD}\|_{cb} \le C \sqrt{\log n}$, where $(\epsilon_{i,j})_{i,j=1}^n$ is a family of independent Bernoulli random variables.

Proof. Let $x_1, \ldots, x_n \in S_1^m$. Then we have
\[
E \Big\| \sum_{i,j=1}^{n} g_{i,j}\, x_i \otimes e_j \Big\|_{S_1^m(\ell_\infty^n)} \le C \sqrt{\log n}\, \Big\| \sum_{i=1}^{n} x_i \otimes e_i \Big\|_{S_1^m(R_n \cap C_n)}.
\]
Indeed, let $2 < q < \infty$. From Lemma 9 we deduce that
\[
\Big\| \sum_{i,j=1}^{n} g_{i,j}\, x_i \otimes e_j \Big\|_{L_1(S_1^m(\ell_\infty^n))}
= \Big\| \sum_{j=1}^{n} (\pi_j \otimes id)\Big( \sum_{i=1}^{n} g_i x_i \Big) \otimes e_j \Big\|_{L_1(S_1^m(\ell_\infty^n))}
\le n^{1/q}\, \Big\| \sum_{i=1}^{n} x_i \otimes g_i \Big\|_{S_1^m(L_q(\Omega))},
\]
where $\pi_j : L_q(\Omega) \to L_1(\Omega^n)$ is defined by
\[
\pi_j(f) = \underbrace{id \otimes \cdots \otimes id}_{j-1} \otimes f \otimes \underbrace{id \otimes \cdots \otimes id}_{n-j}.
\]
We need to recall the definition of $RC_q^n = [R_n \cap C_n, R_n + C_n]_{1/q}$. Then we have
\[
\| id : R_n \cap C_n \to RC_q^n \|_{cb} \le 1,
\]
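and the noncommutative Khintchine inequality can be reformulated (see [58]) as
\[
\| id : RC_q^n \to span\{g_i : 1 \le i \le n\} \subset L_q(\Omega) \|_{cb} \le C \sqrt{q}.
\]
Hence $\| id : R_n \cap C_n \to span\{g_i : 1 \le i \le n\} \|_{cb} \le C \sqrt{q}$ implies
\[
\Big\| \sum_{i,j=1}^{n} g_{i,j}\, x_i \otimes e_j \Big\|_{L_1(S_1^m(\ell_\infty^n))}
\le n^{1/q}\, \Big\| \sum_{i=1}^{n} x_i \otimes g_i \Big\|_{S_1^m(L_q(\Omega))}
\le C \sqrt{q}\, n^{1/q}\, \Big\| \sum_{i=1}^{n} x_i \otimes e_i \Big\|_{S_1^m(R_n \cap C_n)}.
\]
Taking $q = \log n$ we obtain the first assertion. The second assertion follows immediately from the contraction principle (see [63])
\[
E \Big\| \sum_{ij} \epsilon_{ij}\, x_{ij} \Big\|_X \le \sqrt{\frac{\pi}{2}}\; E \Big\| \sum_{ij} g_{ij}\, x_{ij} \Big\|_X.
\]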
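The choice $q = \log n$ in the proof of Lemma 10 balances the two factors $\sqrt{q}$ and $n^{1/q}$. The following small sketch makes this explicit; it assumes the natural logarithm and omits the constant $C$, and the function name is hypothetical.

```python
import math

def khintchine_factor(n, q):
    """The factor sqrt(q) * n**(1/q) from the proof of Lemma 10 (constant C omitted)."""
    return math.sqrt(q) * n ** (1.0 / q)

n = 1000
at_log_n = khintchine_factor(n, math.log(n))         # equals e * sqrt(ln n)
at_two = khintchine_factor(n, 2.0)                   # equals sqrt(2 n), much larger
at_two_log_n = khintchine_factor(n, 2.0 * math.log(n))  # the exact minimizer in q
```

With $q = \ln n$ one has $n^{1/q} = e$, so the factor is $e \sqrt{\ln n}$. The exact minimizer $q = 2 \ln n$ only improves the constant, which is why $q = \log n$ suffices for the stated order $\sqrt{\log n}$.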
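Proof (Theorem 9). Let $0 < \delta < \frac{1}{2}$ and $n \in \mathbb{N}$. As a consequence of Chevet's inequality and Chebyshev's inequality we can obtain a constant $C(\delta) > 0$ and a set $A_\delta \subset \Omega$ such that for every $\omega \in A_\delta$ we have
\[
\Big\| \sum_{i,j=1}^{n} \epsilon_{i,j}(\omega)\, e_i \otimes e_j \Big\|_{\ell_2^n \otimes_\epsilon \ell_2^n} \le C(\delta) \sqrt{n}, \tag{4.2}
\]
and $\mu(A_\delta^c) < \delta$. In particular, we have
\[
\Big\| \sum_{i,j=1}^{n} \epsilon_{j,i}(\omega)\, e_i \otimes e_j \Big\|_{\ell_1^n \otimes_\epsilon \ell_2^n} \le C(\delta)\, n \tag{4.3}
\]
for $\omega \in A_\delta$. Then, we define $v : L_1(\Omega, \ell_\infty^n) \to R_n \cap C_n$ by
\[
v(f) = \frac{1}{n} \sum_{k=1}^{n} \Big( \int_{A_\delta} \sum_{j=1}^{n} f_j(\omega)\, \epsilon_{k,j}(\omega)\, d\mu \Big) e_k.
\]
We deduce from Eq. (4.3) that
\[
\|v\| = \sup_{\omega \in A_\delta} \Big\| \frac{1}{n} \sum_{j,k=1}^{n} \epsilon_{k,j}(\omega)\, e_j \otimes e_k \Big\|_{\ell_1^n \otimes_\epsilon \ell_2^n} \le C(\delta). \tag{4.4}
\]
Furthermore, by Lemma 6, we infer that $\|v\|_{cb} \le C'(\delta)$. Recall that $u : R_n \cap C_n \to L_1(\Omega, \ell_\infty^n)$ defined by
\[
u(e_i) = \sum_{j=1}^{n} \epsilon_{i,j} \otimes e_j \quad \text{for every } i = 1, \ldots, n,
\]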
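satisfies $\|u\|_{cb} \le C \sqrt{\log n}$. Let us now show that $v \circ u$ is invertible on a large subspace of dimension $\eta n$. Indeed, we observe that $(v \circ u)(e_i) = \frac{1}{n} \sum_{j,k=1}^{n} \big( \int_{A_\delta} \epsilon_{i,j}\, \epsilon_{k,j} \big) e_k$. Then
\[
\| v \circ u \|_{M_n} \le \frac{1}{n} \int_{A_\delta} \Big\| \sum_{i,k=1}^{n} \Big( \sum_{j=1}^{n} \epsilon_{i,j}\, \epsilon_{k,j} \Big) e_i \otimes e_k \Big\|_{\ell_2^n \otimes_\epsilon \ell_2^n} d\mu
= \frac{1}{n} \int_{A_\delta} \| R R^t \|_{\ell_2^n \otimes_\epsilon \ell_2^n}\, d\mu \le C(\delta)^2
\]
follows from Eq. (4.2) for $R = \sum_{i,j=1}^{n} \epsilon_{i,j}\, e_i \otimes e_j : \ell_2^n \to \ell_2^n$. On the other hand,
\[
\| v \circ u \|_{S_1^n} \ge |tr(v \circ u)| = \frac{1}{n} \int_{A_\delta} \sum_{i,j=1}^{n} \epsilon_{i,j}^2\, d\mu = n \mu(A_\delta) > \frac{n}{2}.
\]
Take now $0 < \eta = \frac{1}{4 C(\delta)^2}$ and recall that $s_i(v \circ u)$ denotes the $i$-th singular value of $v \circ u$. Then we have
\[
\frac{n}{2} \le \| v \circ u \|_{S_1^n} = \sum_{i=1}^{n} |s_i(v \circ u)| \le |s_1(v \circ u)|\, ([\eta n] - 1) + n\, |s_{[\eta n]}(v \circ u)|
\le C(\delta)^2 ([\eta n] - 1) + n\, |s_{[\eta n]}(v \circ u)| \le C(\delta)^2 \eta n + n\, |s_{[\eta n]}(v \circ u)|.
\]
We conclude that $|s_{[\eta n]}(v \circ u)| \ge \frac{1}{4}$. Let $\tilde{n} = [\eta n]$. Using the singular value decomposition we may think of $v \circ u$ as a diagonal operator. From this, it is easy to see that we can find an operator $w : \ell_2^{\tilde{n}} \to \ell_2^n$ such that $w^* \circ v \circ u \circ w = id_{\ell_2^{\tilde{n}}}$ and $\| w w^* \| \le 4$. Therefore, we can define $S = u \circ w$ and $S^* = w^* \circ v$. Furthermore, the Bernoulli variables are defined on a discrete probability space with $2^{n^2}$ many atoms. Thus, we may replace $L_1(\Omega, \ell_\infty^n)$ by $\ell_1^{2^{n^2}}(\ell_\infty^n)$ to obtain operators $S : \ell_2^{\tilde{n}} \to \ell_1^{2^{n^2}}(\ell_\infty^n)$ and $S^* : \ell_1^{2^{n^2}}(\ell_\infty^n) \to \ell_2^{\tilde{n}}$ verifying the statement of the theorem. To see the final assertion just note that $n \le \frac{2}{\eta} \tilde{n}$ and hence $D = 2 \eta^{-1} = 8 C(\delta)^2$ and $k \le 2^{D^2 \tilde{n}^2}$.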
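Remark 8 (Application to Bell inequalities). We also obtain similar violation of Bell inequalities as in Sect. 3 with a similar behavior for states. Indeed, the coefficients of the Bell inequalities with $k \le 2^{n^2}$ inputs and $n$ outputs are given by
\[
M_{a,b}^{\omega,\omega'} = 2^{-2n^2} \sum_{k=1}^{n} \epsilon_{k,a}(\omega)\, \epsilon_{k,b}(\omega').
\]
Then the estimate $\| u_{RAD} \|_{cb} \le C \sqrt{\log n}$ implies
\[
\Big\| \sum_{a,b,\omega,\omega'} M_{a,b}^{\omega,\omega'}\, (e_\omega \otimes e_a) \otimes (e_{\omega'} \otimes e_b) \Big\|_{\ell_1^{2^{n^2}}(\ell_\infty^n) \otimes_\epsilon \ell_1^{2^{n^2}}(\ell_\infty^n)} \le C^2 \log n.
\]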
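Moreover, with the correct normalization we have
\[
\sum_{a,b,\omega,\omega'} M_{a,b}^{\omega,\omega'}\, (e_\omega \otimes e_a) \otimes (e_{\omega'} \otimes e_b) = \sum_{k=1}^{n} u_{RAD}(e_k) \otimes u_{RAD}(e_k).
\]
Therefore we deduce from Lemma 2 and Lemma 10 that
\[
\|M\|_{\psi\text{-}min} \le 4\, \| u_{RAD} \|_{cb}^2 \preceq \log n. \tag{4.5}
\]
On the other hand, we find exactly the same behavior in terms of states as in Sect. 3. Indeed, using (4.4) and the argument from Step 2, we deduce that
\[
E_\omega^a = \frac{1_{A_\delta}(\omega)}{2 C(\delta)^2\, n}
\begin{pmatrix}
1 & \epsilon_{1,a}(\omega) & \cdots & \epsilon_{n,a}(\omega) \\
\epsilon_{1,a}(\omega) & 1 & \cdots & \epsilon_{1,a}(\omega)\, \epsilon_{n,a}(\omega) \\
\vdots & \vdots & \ddots & \vdots \\
\epsilon_{n,a}(\omega) & \epsilon_{n,a}(\omega)\, \epsilon_{1,a}(\omega) & \cdots & 1
\end{pmatrix}
\in M_{n+1}, \quad a = 1, \ldots, n,
\]
satisfies
\[
\sup_\omega \Big\| \sum_a E_\omega^a \Big\| \le 1.
\]
Thus we may define $E_\omega^{n+1} = 1 - \sum_a E_\omega^a$ for all $\omega$ and obtain POVMs indexed by $\omega \in \{-1, 1\}^{n^2}$. For $\phi = \alpha_1 |11\rangle + \sum_{i=2}^{n+1} \alpha_i |ii\rangle$ we see that the same argument as in Step 3 yields
\[
\sum_{\omega,\omega'} \sum_{a,b=1}^{n} M_{a,b}^{\omega,\omega'}\, \langle \phi | E_\omega^a \otimes E_{\omega'}^b | \phi \rangle \ge \frac{2}{(2 C(\delta)^2)^2}\, \alpha_1 \sum_{i=2}^{n+1} \alpha_i.
\]
Hence the matrix $\big( \tilde{M}_{a,b}^{\omega,\omega'} \big) = \big( \frac{1}{C \log n} M_{a,b}^{\omega,\omega'} \big)_{a,b=1,\ldots,n+1,\ \omega,\omega'}$ is an example of a Bell inequality which cannot produce large violation on the maximally entangled state, but produces large violation for all states with $iviol(\phi) \succeq \log n$.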
Conclusion 2. We have provided an example of a Bell inequality which gives violations of order $\frac{\sqrt{n}}{\log n}$ but only bounded violations can be obtained with any maximally entangled state. This example suggests that the maximally entangled state is a poor candidate to get large violations. A similar statement holds in the context of tripartite correlations, see [57] and the recent generalization to diagonal states in [10]. However, in [16] the authors showed the existence of a Bell inequality (constructed with $2^n$ inputs and $n$ outputs) for which the maximally entangled state in dimension $n$ gives violations of order $\frac{\sqrt{n}}{\log n}$. Therefore, we cannot expect to have condition b) in Theorem 3 for every Bell inequality $M$.

Problem 2. The key point to prove condition b) in Theorem 3 is the fact that the operator $u_{RAD} : R_n \cap C_n \to L_1(\Omega, \ell_\infty^n)$ admits a good estimate of the cb-norm. It would be nice to know whether the operator $V : R_n \cap C_n \to \ell_1^n(\ell_\infty^n)$, defined by
\[
V(e_k) = \frac{1}{n} \sum_{i,j=1}^{n} \epsilon_{i,j}^k\, e_i \otimes e_j \quad \text{for every } k = 1, \ldots, n,
\]
also satisfies a similar estimate $\|V\|_{cb} \preceq \sqrt{\log n}$. We refer to [35, Remark 9] for interest in the negative answer, which would imply that there are violations of the Bell inequality involving POVMs only for Alice or Bob, but not both. On the other hand, an affirmative answer would imply that Theorem 6 gives a $\sqrt{\log n}$-completely complemented copy of $R_{\delta n} \cap C_{\delta n}$ in $\ell_1^n(\ell_\infty^n)$. This would imply, in particular, that the element $M$ in Theorem 2 also verifies property b) in Theorem 3.

4.2. The role of the maximally entangled state in violation of Bell inequalities. Theorem 3 says that there exist violations of Bell inequalities which cannot be obtained from the maximally entangled state. This is completely different from the case of quantum correlations (see [70]). We will now show that for positive Bell inequalities and violation in dimension $n$, the maximally entangled state plays a prominent role. Using standard tools from interpolation theory we show that every diagonal pure state can be written, up to a $\sqrt{\log n}$ factor, as a superposition of maximally entangled states in lower dimensions.

Lemma 11. Let $|\psi\rangle = \sum_{i=1}^{n} a_i |i\rangle \in \ell_2^n$ be such that $a_1 \ge a_2 \ge \cdots \ge a_n \ge 0$ and $\sum_{i=1}^{n} a_i^2 \le 1$. Then, there exist positive coefficients $(\beta_s)_{s=1}^n$ and some vectors $(|\varphi_s\rangle)_{s=1}^n$ such that
\[
|\psi\rangle = \sum_{s=1}^{n} \beta_s |\varphi_s\rangle,
\]
and they verify the following two properties:
a) $\sum_{s=1}^{n} \beta_s \le 2 \sqrt{\log n}$;
b) For every $s = 1, \ldots, n$, $|\varphi_s\rangle = \frac{1}{\sqrt{|A_s|}} \sum_{i \in A_s} |i\rangle$, where $A_s$ is a set contained in $\{1, \ldots, n\}$.

Proof. We use the dyadic intervals $I_k = [2^{k-1}, 2^k - 1]$ for $k = 1, \ldots, \log(n+1)$ and note $|I_k| = 2^{k-1}$ for every $k$. Then, we can write
\[
|\psi\rangle = \sum_{k=1}^{\log(n+1)} \Big( \sum_{i \in I_k} a_i |i\rangle \Big) = \sum_{k=1}^{\log(n+1)} a_{2^{k-1}}\, 2^{\frac{k-1}{2}} \Big( \sum_{i \in I_k} \frac{a_i}{a_{2^{k-1}}} |i\rangle \Big) 2^{-\frac{k-1}{2}}.
\]
First, note that
\[
\sum_{k=1}^{\log(n+1)} a_{2^{k-1}}\, 2^{\frac{k-1}{2}} \le \sqrt{\log(n+1)}\, \Big( \sum_{k=1}^{\log(n+1)} a_{2^{k-1}}^2\, 2^{k-1} \Big)^{\frac{1}{2}} \le \sqrt{2 \log(n+1)}\, \Big( \sum_{i=1}^{n} a_i^2 \Big)^{\frac{1}{2}} \le \sqrt{2 \log(n+1)}.
\]
On the other hand, we see that for fixed $k$, $\big\{ \frac{a_i}{a_{2^{k-1}}} : i \in I_k \big\}$ is a set of $2^{k-1}$ positive numbers of value lower than or equal to one. Then we can find positive convex combinations $\sum_{j=1}^{2^{k-1}} \alpha_j^k = 1$ such that
\[
\frac{a_i}{a_{2^{k-1}}} = \sum_{j=1}^{2^{k-1}} \alpha_j^k\, A_j^k(i),
\]
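where $A_j^k(i) \in \{0, 1\}$ for every $k$, $j$ and $i$. Indeed, this can be done by defining $A_j^k(i) = 1$ if $i < 2^{k-1} + j$ and $A_j^k(i) = 0$ otherwise; and $\alpha_j^k = \frac{a_{2^{k-1}+j-1} - a_{2^{k-1}+j}}{a_{2^{k-1}}}$ for $j = 1, \ldots, 2^{k-1} - 1$, while $\alpha_{2^{k-1}}^k = \frac{a_{2^k - 1}}{a_{2^{k-1}}}$. Thus, we obtain
\[
|\psi\rangle = \sum_{k=1}^{\log(n+1)} a_{2^{k-1}}\, 2^{\frac{k-1}{2}} \Big( \sum_{i \in I_k} \sum_{j=1}^{2^{k-1}} \alpha_j^k\, A_j^k(i) |i\rangle \Big) 2^{-\frac{k-1}{2}}
= \sum_{k=1}^{\log(n+1)} \sum_{j=1}^{2^{k-1}} \alpha_j^k\, a_{2^{k-1}}\, 2^{\frac{k-1}{2}} \Big( \sum_{i \in I_k} A_j^k(i) |i\rangle \Big) 2^{-\frac{k-1}{2}}.
\]
Given $k$ and $j$, we have that $card\{i \in I_k : A_j^k(i) = 1\} = j$. Obviously, $j \le 2^{k-1} = |I_k|$. This yields
\[
|\psi\rangle = \sum_{k=1}^{\log(n+1)} \sum_{j=1}^{2^{k-1}} \alpha_j^k\, a_{2^{k-1}}\, 2^{\frac{k-1}{2}}\, \frac{\sqrt{j}}{\sqrt{2^{k-1}}} \Big( \frac{1}{\sqrt{j}} \sum_{i \in I_k} A_j^k(i) |i\rangle \Big).
\]
By construction we have
\[
\sum_{k=1}^{\log(n+1)} \sum_{j=1}^{2^{k-1}} \alpha_j^k\, a_{2^{k-1}}\, 2^{\frac{k-1}{2}}\, \frac{\sqrt{j}}{\sqrt{2^{k-1}}}
\le \sum_{k=1}^{\log(n+1)} \sum_{j=1}^{2^{k-1}} \alpha_j^k\, a_{2^{k-1}}\, 2^{\frac{k-1}{2}}
= \sum_{k=1}^{\log(n+1)} a_{2^{k-1}}\, 2^{\frac{k-1}{2}} \le \sqrt{2 \log(n+1)} \le 2 \sqrt{\log n}.
\]
Thus it remains to rename the double indices by $s(k,j)$ and to use $\beta_{s(k,j)} = \alpha_j^k\, a_{2^{k-1}}\, 2^{\frac{k-1}{2}}\, \frac{\sqrt{j}}{\sqrt{2^{k-1}}}$. Then $|\varphi_{s(k,j)}\rangle = \frac{1}{\sqrt{j}} \sum_{i \in I_k} A_j^k(i) |i\rangle$ is a normalized characteristic function with support $[2^{k-1}, 2^{k-1} + j - 1]$ contained in $I_k$.

Remark 9. The general case in which the coefficients of $|\psi\rangle$ are not necessarily positive follows by decomposition into real and imaginary and then positive and negative parts.

As an application of Lemma 11 we show that for games (even Bell inequalities satisfying a much weaker positivity condition) the largest violation is attained, up to a logarithmic factor, for a maximally entangled state. Indeed, using the notation introduced in Definition 1 we have the following result.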
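The dyadic decomposition in the proof of Lemma 11 is constructive. The sketch below is only an illustration under stated assumptions: the helper names `flat_decomposition` and `reconstruct` are hypothetical, indices are 0-based, and the last dyadic block is truncated at $n$ when $n + 1$ is not a power of two. Each term is a pair $(\beta_s, A_s)$ with $A_s$ an interval, so that $|\psi\rangle = \sum_s \beta_s |A_s|^{-1/2} \sum_{i \in A_s} |i\rangle$.

```python
import math

def flat_decomposition(a):
    """Decompose a nonincreasing, nonnegative sequence into flat normalized pieces.

    Returns a list of (beta, support) pairs, where support is a range of 0-based
    indices and the piece is the normalized indicator of that range.
    """
    n = len(a)
    terms = []
    start = 1  # 1-based left end 2^(k-1) of the dyadic block I_k
    while start <= n:
        end = min(2 * start - 1, n)  # right end 2^k - 1, truncated at n
        block = a[start - 1:end]
        for j in range(1, len(block) + 1):
            nxt = block[j] if j < len(block) else 0.0
            gap = block[j - 1] - nxt  # equals alpha_j^k * a_{2^(k-1)} in the proof
            if gap > 0.0:
                terms.append((gap * math.sqrt(j), range(start - 1, start - 1 + j)))
        start *= 2
    return terms

def reconstruct(terms, n):
    """Sum beta_s * (normalized indicator of A_s) back into a vector of length n."""
    out = [0.0] * n
    for beta, support in terms:
        weight = beta / math.sqrt(len(support))
        for i in support:
            out[i] += weight
    return out

raw = [1.0 / i for i in range(1, 11)]
norm = math.sqrt(sum(t * t for t in raw))
a = [t / norm for t in raw]
terms = flat_decomposition(a)
blocks = int(math.floor(math.log2(len(a)))) + 1  # number of dyadic blocks used
```

In this example the coefficients sum to less than $\sqrt{2 \cdot \text{blocks}}$, which is the intermediate bound $\sqrt{2 \log(n+1)}$ of the proof with the number of blocks in place of $\log(n+1)$.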
Theorem 10. Let $M = (M_{x,y}^{a,b})_{a,b,x,y}$ be a Bell inequality with positive coefficients. Suppose there exists a state $\rho$ acting on $H \otimes H$ for an $n$-dimensional Hilbert space $H$ and verifying $LV_\rho(M) = C$. Then, there exists $k \le n$ such that
\[
LV_{|\psi_k\rangle}(M) \ge \frac{C}{4 \log n}, \tag{4.6}
\]
where $|\psi_k\rangle = \frac{1}{\sqrt{k}} \sum_{i=1}^{k} |ii\rangle$ is the maximally entangled state in dimension $k$.
Proof. By hypothesis there exist some POVMs $\{E_x^a\}_{x,a}$, $\{F_y^b\}_{y,b}$ acting on a Hilbert space $H$ of dimension $n$ and a state $\rho : H \otimes_2 H \to H \otimes_2 H$ such that $\sum_{x,y,a,b} M_{x,y}^{a,b}\, tr(E_x^a \otimes F_y^b\, \rho) = C$. This implies that the operator $\tilde{M} = \sum_{x,y,a,b} M_{x,y}^{a,b}\, E_x^a \otimes F_y^b$ is positive (and this is the weaker positivity assumption we need). By a convexity argument, we may assume that $\rho$ is a pure state $\rho = |\psi\rangle\langle\psi|$ on $H \otimes_2 H$. Using the Hilbert-Schmidt decomposition, we may even assume that $|\psi\rangle$ is a diagonal element with positive coefficients given by the singular values, $|\psi\rangle = \sum_i \alpha_i |ii\rangle$. According to Lemma 11 we find a decomposition
\[
|\psi\rangle = \sum_p \beta_p |\varphi_p\rangle,
\]
where $(\beta_p)_p$ are positive coefficients satisfying $\sum_p \beta_p \le 2 \sqrt{\log n}$ and
\[
|\varphi_p\rangle = \frac{1}{\sqrt{|A_p|}} \sum_{i \in A_p} |ii\rangle
\]
holds for all $p$ and some sets $A_p \subset \{1, \ldots, n\}$. Thus we find $p_0$ and $q_0$ such that
\[
\langle \varphi_{q_0} | \tilde{M} | \varphi_{p_0} \rangle \ge \frac{C}{4 \log n}. \tag{4.7}
\]
Now, we use positivity and the Cauchy-Schwarz inequality:
\[
\frac{C^2}{4^2 (\log n)^2} \le \langle \varphi_{q_0} | \tilde{M} | \varphi_{p_0} \rangle^2 \le \langle \varphi_{q_0} | \tilde{M} | \varphi_{q_0} \rangle\, \langle \varphi_{p_0} | \tilde{M} | \varphi_{p_0} \rangle. \tag{4.8}
\]
Thus (4.6) must be satisfied by $\langle \varphi_{q_0} | \tilde{M} | \varphi_{q_0} \rangle$ or $\langle \varphi_{p_0} | \tilde{M} | \varphi_{p_0} \rangle$.

The previous theorem is particularly interesting once we know that there exist violations of Bell inequalities with positive coefficients (actually games) of order $n^\alpha$ for some $\alpha > 0$, where $n$ is the dimension of the Hilbert spaces ([9,66]).

Remark 10. For arbitrary $M$ we still have (4.7). In fact, previously we obtained large violation with $|q_0| = 1$ and $|p_0| = n - 1$. Furthermore, as a consequence of the polarization identity, we find some $k \in \{0, 1, 2, 3\}$ such that
\[
\langle \xi | M | \xi \rangle \ge \frac{C}{4^2 \log n}
\]
for the non-normalized state $|\xi\rangle = |\varphi_A\rangle + i^k |\varphi_B\rangle$.

Remark 11. It turns out that the operator $\sum_{x,y,a,b} M_{x,y}^{a,b}\, E_x^a \otimes F_y^b$ is positive for all POVMs if and only if $M_{x,y}^{a,b}$ represents a positive element in $NSG \otimes NSG$ (see Subsect. 5.2).
Fig. 2. Geometric meaning of d(S, Q)
5. Geometric Interpretation of Violation of Bell Inequalities and the NSG Space

5.1. Geometric interpretation of violation of Bell inequalities. We have seen in Sect. 0 that for certain applications in QIT (see [22,35], and [36]) the value $\sup_{P \in \mathcal{Q}} \nu(P)$ is very interesting. In this section we want to provide a geometric interpretation of this value in terms of convex sets of probabilities. We begin recalling some basic notions of convex geometry.

Definition 2. i) We say that a set $S \subset \mathbb{R}^k$ is absolutely convex if for every $x_1, x_2 \in S$ and $\lambda_1, \lambda_2 \in \mathbb{R}$ with $|\lambda_1| + |\lambda_2| \le 1$ we have $\lambda_1 x_1 + \lambda_2 x_2 \in S$.
ii) Let $S \subset \mathbb{R}^k$ be an absolutely convex set. Then $\rho_S(x) := \inf\{\lambda \ge 0 : x \in \lambda S\}$ is the Minkowski functional of $S$.

The bipolar theorem tells us that
\[
\rho_S(x) = \sup\Big\{ B(x) : B \in \mathbb{R}^k \text{ such that } \sup_{s \in S} B(s) \le 1 \Big\}. \tag{5.1}
\]
We say that the set $S$ is absorbing if we have $\rho_S(x) < \infty$ for every $x \in \mathbb{R}^k$. Clearly $S$ is always absorbing if it is restricted to the linear space $[S]$ generated by $S$. In particular, given two bounded absolutely convex sets $S \subseteq Q \subset [S] \subset \mathbb{R}^k$, we may consider the homothetic distance (Fig. 2):
\[
d(S, Q) := \sup_{x \in Q} \rho_S(x) = \inf\{\lambda \ge 0 : Q \subseteq \lambda S\}. \tag{5.2}
\]
Equation (5.1) tells us that we can also define the previous distance by duality. Indeed, if we consider
\[
\zeta(B) = \frac{\sup_{x \in Q} B(x)}{\sup_{y \in S} B(y)}, \tag{5.3}
\]
for every $B \in \mathbb{R}^k$, we have $\sup_{B \in \mathbb{R}^k} \zeta(B) = d(S, Q)$.

Remark 12. The fact that $Q \subset [S]$ guarantees that $\sup_{y \in S} B(y) = 0$ implies $\sup_{x \in Q} B(x) = 0$. So we just define $\frac{0}{0} = 0$ in Eq. (5.3).
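A toy instance may help to fix ideas about (5.2) and (5.3). The sketch below is an illustration chosen here and not an example from the text: it takes $S$ to be the unit ball of $\ell_1^k$ and $Q$ the cube $[-1, 1]^k$, so that $\rho_S$ is the $\ell_1$-norm and both suprema in (5.3) are explicit. The function names are hypothetical.

```python
import itertools

def minkowski_l1(x):
    """Minkowski functional of S = unit ball of l_1^k, i.e. the l_1 norm."""
    return sum(abs(t) for t in x)

def homothetic_distance_cube_vs_l1(k):
    """d(S, Q) = sup_{x in Q} rho_S(x) for Q = [-1, 1]^k.

    rho_S is convex, so the supremum over the cube is attained at a vertex.
    """
    vertices = itertools.product((-1.0, 1.0), repeat=k)
    return max(minkowski_l1(v) for v in vertices)

def zeta(B):
    """Dual ratio (5.3) for this pair: sup_Q B / sup_S B, with 0/0 = 0."""
    top = sum(abs(t) for t in B)      # supremum of B over the cube
    bottom = max(abs(t) for t in B)   # supremum of B over the l_1 ball
    return 0.0 if bottom == 0.0 else top / bottom
```

Here $d(S, Q) = k$, and the functional $B = (1, \ldots, 1)$ attains the supremum of $\zeta$, in agreement with the duality statement following (5.3).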
It is easy to see that the distance $d$ in Eq. (5.2) is the one considered in the case of correlation Bell inequalities (see [70]). In fact, it can be seen that the set of classical (resp. quantum) correlation matrices $M_C$ (resp. $M_Q$) is exactly the unit ball of $\ell_\infty^N \otimes_\pi \ell_\infty^N$ (resp. $\ell_\infty^N \otimes_{\gamma_2} \ell_\infty^N$) (see for instance [23] for the definition of such norms). Thus, in the case of correlation Bell inequalities the physical definition of violation of Bell inequalities coincides with the natural geometric definition. In particular, Eq. (5.3) coincides with the largest violation of the Bell inequality $B$ and the maximum of these values is the distance described in Eq. (5.2): $d(M_C, M_Q)$.

General Bell inequalities (where we work with the whole probability distribution) present some additional problems because the corresponding sets are not centrally symmetric. Indeed, our sets $\mathcal{L}$ and $\mathcal{Q}$ are contained in a $d = (NK - N + 1)^2 - 1$ dimensional affine subspace $A$ of $\mathbb{R}^{N^2 K^2}$ (see [70]). Actually, it is not difficult to see that $\mathcal{L} \subsetneq \mathcal{Q} \subsetneq \mathcal{C} \subset A = \mathrm{Aff}(\mathcal{L})$ with equality in the last inclusion if we consider only those elements in $\mathrm{Aff}(\mathcal{L})$ that are probability distributions. Here,
\[
\mathrm{Aff}(\mathcal{L}) = \Big\{ \sum_{i=1}^{N} \alpha_i P_i : N \in \mathbb{N},\ P_i \in \mathcal{L},\ \alpha_i \in \mathbb{R},\ \sum_{i=1}^{N} \alpha_i = 1 \Big\}
\]
denotes the affine hull of the space $\mathcal{L}$. Thus, in order to define a "good distance" in this situation we must be more careful. Consider a subset $S$ contained in an affine subspace $\mathrm{Aff}(S) \subset \mathbb{R}^k$ of dimension $d$. It is standard in convex geometry to consider the absolutely convex hull of $S$:
\[
\tilde{S} := conv(S \cup -S).
\]
$\tilde{S}$ is an absolutely convex set contained in a linear space of dimension $d + 1$. It is easy to see that
\[
\rho_{\tilde{S}}(x) = \inf\Big\{ \sum_{i=1}^{N} |\alpha_i| : x = \sum_{i=1}^{N} \alpha_i s_i,\ N \in \mathbb{N},\ s_i \in S,\ \alpha_i \in \mathbb{R} \Big\}
\]
for every $x \in [\tilde{S}]$. Therefore, if we have two sets $S \subseteq Q \subset \mathrm{Aff}(S)$, we can naturally define a distance between them by using their absolutely convex hulls $\tilde{S} \subseteq \tilde{Q}$ with the same geometric interpretation as in Eq. (5.2) (Fig. 3):
\[
d_1(S, Q) = d(\tilde{S}, \tilde{Q}).
\]
As before, the dual point of view defines a distance in terms of functionals. That is, for any linear functional $B$ on $\mathbb{R}^k$, we have
\[
\zeta_1(B) = \frac{\sup_{q \in \tilde{Q}} B(q)}{\sup_{s \in \tilde{S}} B(s)} = \frac{\sup_{q \in Q} |B(q)|}{\sup_{s \in S} |B(s)|}.
\]
We immediately deduce that this is the distance that we have considered in Sect. 0 in the particular case of $S = \mathcal{L}$ and $Q = \mathcal{Q}$. Specifically,
\[
\sup_{P \in \mathcal{Q}} \nu(P) = d_1(\mathcal{L}, \mathcal{Q}),
\]
Fig. 3. Geometric meaning of d1 (S, Q)
and for every linear functional $M$ on $\mathbb{R}^{K^2 N^2}$,
\[
LV(M) = \zeta_1(M).
\]
Therefore, Theorem 1 can be stated as follows:

Theorem 11. With the same notation as in Sect. 0, if $N = n$, $K = n$ and $d = n$,
\[
d_1(\mathcal{L}, \mathcal{Q}) \succeq \frac{\sqrt{n}}{\log n}.
\]
However, from a purely geometric point of view $d_1(\mathcal{L}, \mathcal{Q})$ presents two problems:
a) The sets we are using to define the distance, $\tilde{S}$ and $\tilde{Q}$, are "much" bigger than the sets $S$ and $Q$. In particular, the previous definition requires an extra dimension.
b) In order to measure distances between two affine subsets we would like to have a measure invariant under translations. The previous one does not verify this property.

There exists a natural way to obtain an absolutely convex set from an affine convex subset $S$ which solves the previous problems. Indeed, consider the set
\[
\hat{S} = S - S.
\]
The new absolutely convex set $\hat{S}$ is contained in a $d$-dimensional linear space and it is invariant under translations of $S$ (Fig. 4). As before, given two convex sets $S \subseteq Q \subset \mathrm{Aff}(S)$, we can naturally define a distance between them by using the sets $\hat{S} \subseteq \hat{Q}$ with the same geometric interpretation as in Eq. (5.2):
\[
d_2(S, Q) = d(\hat{S}, \hat{Q}).
\]
Furthermore, the dual formulation of $d_2$ defines a very nice distance in terms of functionals. Given an affine subset $S$ of $\mathbb{R}^k$, for any linear functional $\ell$ on $\mathbb{R}^k$ we define the $\ell$-width of $S$ as (Fig. 5)
\[
\omega_\ell(S) = \sup_{x \in \hat{S}} \ell(x) = \sup_{s \in S} \ell(s) - \inf_{s' \in S} \ell(s').
\]
Fig. 4. Geometric meaning of $\hat{S}$

Fig. 5. Geometric meaning of $\omega_\ell(S)$

Fig. 6. Geometric meaning of $\frac{\omega_\ell(Q)}{\omega_\ell(S)}$
Then, we define the band-width distance between $S$ and $Q$ as (Fig. 6)
\[
d_2(S, Q) = \sup_{\ell \in \mathbb{R}^k} \frac{\omega_\ell(Q)}{\omega_\ell(S)}.
\]
Remark 13. Again, $\omega_\ell(S) = 0$ implies $\omega_\ell(Q) = 0$, so we can define $\frac{0}{0} = 0$.
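For finite point sets the $\ell$-width and the ratio entering $d_2$ are elementary to compute, since the width of a convex hull along $\ell$ equals the width of its generating points. The sketch below is an illustration under that assumption, with hypothetical function names and a toy pair of segments on the affine line $x + y = 1$.

```python
def width(ell, points):
    """ell-width of the convex hull of a finite point set: sup ell - inf ell."""
    values = [sum(c * x for c, x in zip(ell, p)) for p in points]
    return max(values) - min(values)

def bandwidth_ratio(ell, S, Q):
    """omega_ell(Q) / omega_ell(S), with the convention 0/0 = 0 of Remark 13."""
    ws, wq = width(ell, S), width(ell, Q)
    return 0.0 if ws == 0.0 else wq / ws

def translate(points, shift):
    """Translate every point by the same vector."""
    return [tuple(x + s for x, s in zip(p, shift)) for p in points]

# Two nested segments on the affine line x + y = 1.
S = [(0.0, 1.0), (1.0, 0.0)]
Q = [(-1.0, 2.0), (2.0, -1.0)]
```

Along $\ell = (1, 0)$ the ratio is 3, along $\ell = (1, 1)$ both widths vanish, and translating $S$ and $Q$ by a common vector leaves the ratio unchanged, which is the invariance that $d_1$ lacks.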
Remark 14. In fact, there exists another standard way to obtain an absolutely convex set $S_h$ from $S$ in dimension $d$ (which is not, in general, invariant under translations). Consider the set $S_h = \tilde{S} \cap (\mathrm{Aff}(S))_{lin}$, where $(\mathrm{Aff}(S))_{lin}$ is the linear space associated to $\mathrm{Aff}(S)$ (that is, $(\mathrm{Aff}(S))_{lin} = \mathrm{Aff}(S) - x_0$ for any $x_0 \in \mathrm{Aff}(S)$). However, in our particular situation both definitions are equivalent since $0 \notin \mathrm{Aff}(S)$ implies
\[
\hat{S} = 2 S_h. \tag{5.4}
\]
Of course, we cannot compare the sets $\tilde{S}$ and $\hat{S}$ because they have different dimensions. However, it is easy to see that they are comparable when we consider functionals which vanish at some point $x_0 \in S$. We have:

Lemma 12. Let $S$ be a set contained in an affine subspace $\mathrm{Aff}(S)$ of $\mathbb{R}^k$ and let $\ell$ be a linear functional on $\mathbb{R}^k$ such that $\ell(x_0) = 0$ for some $x_0 \in S$. Then,
\[
\sup_{s \in \tilde{S}} \ell(s) \le \omega_\ell(S) \le 2 \sup_{s \in \tilde{S}} \ell(s). \tag{5.5}
\]
In particular, given two sets $S \subseteq Q \subset \mathrm{Aff}(S) \subset \mathbb{R}^k$ and given a linear functional $\ell$ on $\mathbb{R}^k$ such that $\ell(x_0) = 0$ for some $x_0 \in S$, we have
\[
\frac{1}{2} \zeta_1(\ell) \le \frac{\omega_\ell(Q)}{\omega_\ell(S)} \le 2 \zeta_1(\ell).
\]
Proof. To prove the first inequality just note that
\[
\sup_{s \in \tilde{S}} \ell(s) = \sup_{s \in S} |\ell(s)| = \sup_{s \in S} |\ell(s - x_0)| \le \sup_{t \in \hat{S}} |\ell(t)| = \sup_{t \in \hat{S}} \ell(t).
\]
On the other hand, to see the second inequality note that
\[
\omega_\ell(S) = \sup_{s \in S - S} |\ell(s)| \le \sup_{s \in S} |\ell(s)| + \sup_{s' \in S} |\ell(s')| = 2 \sup_{s \in S} |\ell(s)| = 2 \sup_{s \in \tilde{S}} \ell(s).
\]
The second part of the lemma is straightforward.

Remark 15. Note that, in particular, $\tilde{S}$ and $\hat{S}$ are comparable if $0 \in S$.

Now, it is trivial to check that the element $M$ given in Theorem 2 (see construction in Sect. 3) verifies that $M(P_0) = 0$ if we define $P_0(a, b | x, y) = P(a|x) P(b|y) \in \mathcal{L}$ by $P(a|x) = 0$ if $a = 1, \ldots, k$ and $P(k+1|x) = 1$ for every $x = 1, \ldots, N$. Therefore, the previous lemma allows us to state Theorem 1 as follows:

Theorem 12. With the same notation as in Sect. 0, if $N = n$, $K = n$ and $d = n$,
\[
d_2(\mathcal{L}, \mathcal{Q}) \succeq \frac{\sqrt{n}}{\log n}.
\]
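In fact, we can get an improvement of Lemma 12. Indeed, in our particular situation the distances $d_1$ and $d_2$ are equivalent in the following sense:

Lemma 13. The following statement holds:
\[
d_2(\mathcal{L}, \mathcal{Q}) \le d_1(\mathcal{L}, \mathcal{Q}) \le 4 d_2(\mathcal{L}, \mathcal{Q}) + 1.
\]
Proof. The first inequality follows from Eq. (5.4) in Remark 14. Indeed,
\[
d_1(\mathcal{L}, \mathcal{Q}) \le \lambda \Leftrightarrow \tilde{\mathcal{Q}} \subseteq \lambda \tilde{\mathcal{L}} \Rightarrow \mathcal{Q}_h \subseteq \lambda \mathcal{L}_h \Rightarrow \hat{\mathcal{Q}} \subseteq \lambda \hat{\mathcal{L}} \Leftrightarrow d_2(\mathcal{L}, \mathcal{Q}) \le \lambda.
\]
To see the second inequality let us define
\[
P_0 = P_0(a, b | x, y) = \frac{1}{K^2} \quad \text{for every } x, y = 1, \ldots, N;\ a, b = 1, \ldots, K
\]
and
\[
\varphi_0 = \varphi_0(a, b | x, y) = \frac{1}{N^2} \quad \text{for every } x, y = 1, \ldots, N;\ a, b = 1, \ldots, K.
\]
Then, we obtain an element $P_0 \in \mathcal{L}$ and a linear functional $\varphi_0$ on $\mathbb{R}^{N^2 K^2}$ such that $\varphi_0(P_0) = 1$ and $\sup_{Q \in \mathcal{Q}} \varphi_0(Q) = 1$.

Now, for any linear functional $\psi$ on $\mathbb{R}^{N^2 K^2}$, we can write $\psi = \psi_1 + \psi(P_0) \varphi_0$, where $\psi_1 = \psi - \psi(P_0) \varphi_0$. Note that $\psi_1$ is a linear functional on $\mathbb{R}^{N^2 K^2}$ such that $\psi_1(P_0) = 0$. Therefore, according to Lemma 12, if $d_2(\mathcal{L}, \mathcal{Q}) \le \lambda$ we have
\[
\sup_{Q \in \tilde{\mathcal{Q}}} \psi(Q) = \sup_{Q \in \tilde{\mathcal{Q}}} \big( \psi_1(Q) + \psi(P_0) \varphi_0(Q) \big) \le \omega_{\psi_1}(\mathcal{Q}) + \psi(P_0)
\le \lambda\, \omega_{\psi_1}(\mathcal{L}) + \psi(P_0) \le 2 \lambda \sup_{P \in \tilde{\mathcal{L}}} \psi_1(P) + \psi(P_0) \le (4 \lambda + 1) \sup_{P \in \tilde{\mathcal{L}}} \psi(P).
\]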
5.2. The NSG space and upper bounds. In this section we will show that there exists a canonical identification between the problems of computing the largest violation of a Bell inequality $LV(M)$ and the quotient
\[
\frac{\|M\|_{\ell_1^n(\ell_\infty^n) \otimes_{min} \ell_1^n(\ell_\infty^n)}}{\|M\|_{\ell_1^n(\ell_\infty^n) \otimes_\epsilon \ell_1^n(\ell_\infty^n)}}.
\]
This means an improvement of [35, Lemma 1] (see Theorem 4), where just the implication
\[
\frac{\|M\|_{\ell_1^n(\ell_\infty^n) \otimes_{min} \ell_1^n(\ell_\infty^n)}}{\|M\|_{\ell_1^n(\ell_\infty^n) \otimes_\epsilon \ell_1^n(\ell_\infty^n)}} \preceq LV(M)
\]
is shown. As a consequence of this, we will provide upper bounds for the violation of a Bell inequality as a function of the number of inputs, the number of outputs and the dimension of the Hilbert space. This will show that our Theorem 2 is almost optimal in all the parameters of the problem.

According to the previous subsection, the problem of computing the largest violation of Bell inequalities is exactly the same as computing the distance $d(\tilde{\mathcal{L}}, \tilde{\mathcal{Q}})$, where we denote $\tilde{\mathcal{L}} = conv(\mathcal{L} \cup -\mathcal{L})$ and $\tilde{\mathcal{Q}} = conv(\mathcal{Q} \cup -\mathcal{Q})$.
Let us start by defining the complex linear space
\[
NSG(N, K) = \Big\{ \{R(a|x)\}_{x,a=1}^{N,K} \in \mathbb{C}^{NK} : \sum_{a=1}^{K} R(a|x) = \text{constant} \in \mathbb{C} \text{ for every } x \Big\}.
\]
It is easy to see that $\dim(NSG(N, K)) = NK - N + 1$. We will identify the algebraic dual space $NSG(N, K)^*$ with $\mathbb{C}^{NK} / NSG(N, K)^\perp$, where
\[
NSG(N, K)^\perp = \big\{ B \in \mathbb{C}^{NK} : B(R) = 0 \text{ for every } R \in NSG(N, K) \big\}
\]
is the orthogonal space of $NSG(N, K)$. On the other hand, we will consider the family
\[
I = \Big\{ \{E_x^a\}_{x,a=1}^{N,K} : E_x^a \ge 0, \text{ and } \sum_{a=1}^{K} E_x^a = 1 \text{ for every } x \Big\}.
\]
Then, it is not difficult to see that the map
\[
J : NSG(N, K)^* \to \bigoplus_{\{E_x^a\} \in I} B(H_{\{E_x^a\}})
\]
given by
\[
\{\rho(x|a)\}_{x,a} \mapsto \Big( \sum_{x,a=1}^{N,K} \rho(x|a)\, E_x^a \Big)_{\{E_x^a\} \in I}
\]
is well defined and it defines an operator system structure on $NSG(N, K)^*$ (see [56] for the definition of operator system). Then, as we explained in Sect. 1, $NSG(N, K)$ has a natural operator space structure as dual space of $NSG(N, K)^*$.

Remark 16. Duality in the category of operator systems is in general a tricky point and we will disregard this problem here.

Remark 17. In [34] the authors show that the map $\iota : NSG(N, K)^* \to *_{i=1}^{N} \ell_\infty^K$ defined by $\iota(e_{x,a}) = \pi_x(e_a)$, where $e_a$ is the $a$-th canonical vector in $\ell_\infty^K$ and $\pi_x : \ell_\infty^K \to *_{i=1}^{N} \ell_\infty^K$ is the canonical embedding of $\ell_\infty^K$ into the $x$-th position of the free product, is a completely isometric embedding. Furthermore, the operator system structure on $NSG(N, K)^*$ is exactly the one defined by this embedding.

The following theorem shows that the operator space $NSG(N, K)$ is, actually, just a little distortion of $\ell_\infty^N(\ell_1^{K-1})$.
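Theorem 13. The map
\[
T : NSG(N, K) \to \ell_\infty^N(\ell_1^{K-1}) \oplus_\infty \mathbb{C}
\]
defined as
\[
T\big( \{R(x|a)\}_{x=1,a=1}^{N;K} \big) = \Big( \{R(x|a)\}_{x=1,a=1}^{N;K-1},\ \sum_{a=1}^{K} R(x|a) \Big)
\]
for every $\{R(x|a)\}_{x=1,a=1}^{N;K} \in NSG(N, K)$ is a complete isomorphism with $\|T\|_{cb} \le 1$ and $\|T^{-1}\|_{cb} \le 9$. Here, $T^{-1}$ is defined as
\[
T^{-1}\big( \{R(x|a)\}_{x=1,a=1}^{N;K-1},\ R \big) = \Big( \{R(x|a)\}_{a=1}^{K-1},\ R - \sum_{a=1}^{K-1} R(x|a) \Big)_{x=1}^{N}.
\]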
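At the purely algebraic level, the maps $T$ and $T^{-1}$ of Theorem 13 just drop and restore the last output in each row. The following sketch illustrates this bookkeeping only and says nothing about the cb-norm estimates. It assumes an element of $NSG(N, K)$ is stored as a list of $N$ rows of $K$ numbers with a common row sum, and the names `T` and `T_inv` are hypothetical helpers.

```python
def T(R):
    """Map R in NSG(N, K) to (first K-1 entries of each row, common row sum)."""
    sums = [sum(row) for row in R]
    common = sums[0]
    if any(abs(s - common) > 1e-12 for s in sums):
        raise ValueError("all rows must have the same sum")
    return [row[:-1] for row in R], common

def T_inv(truncated, total):
    """Restore the last entry of each row as total minus the sum of the others."""
    return [row + [total - sum(row)] for row in truncated]

R = [[1, 2, 3], [6, 0, 0]]      # N = 2 inputs, K = 3 outputs, common row sum 6
truncated, total = T(R)
number_of_coordinates = sum(len(row) for row in truncated) + 1
```

The number of free coordinates on the right-hand side is $N(K-1) + 1 = NK - N + 1$, which agrees with the dimension of $NSG(N, K)$.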
Proof. Clearly, the map $S_1 : NSG(N, K) \to \mathbb{C}$ defined by $S_1(\{R(x|a)\}) = \sum_{a=1}^{K} R(x|a)$ verifies that $\|S_1\| = \|S_1\|_{cb} \le 1$. Therefore it suffices to study the cb-norm of the map $P_1 : NSG(N, K) \to \ell_\infty^N(\ell_1^{K-1})$ defined by
\[
P_1\big( \{R(x|a)\}_{x=1,a=1}^{N;K} \big) = \{R(x|a)\}_{x=1,a=1}^{N;K-1}
\]
for every $\{R(x|a)\}_{x=1,a=1}^{N;K} \in NSG(N, K)$. Note that, by definition,
\[
\|P_1\|_{cb} = \Big\| \sum_{x,a=1}^{N,K-1} e_{x,a} \otimes (e_x \otimes e_a) \Big\|_{NSG(N,K)^* \otimes_{min} \ell_\infty^N(\ell_1^{K-1})}
= \sup_{\{E_x^a\}_{x,a=1}^{N,K} \in I} \Big\| \sum_{x,a=1}^{N,K-1} E_x^a \otimes (e_x \otimes e_a) \Big\|_{B(H_{\{E_x^a\}}) \otimes_{min} \ell_\infty^N(\ell_1^{K-1})} \le 1.
\]
Here the last inequality follows by the fact that for any $\{E_x^a\}_{x,a=1}^{N,K} \in I$, the map $\ell_1^N(\ell_\infty^{K-1}) \to B(H)$ defined by $e_x \otimes e_a \mapsto E_x^a$ is a complete contraction (see [35, Sect. 8]).

To study $\|T^{-1}\|_{cb}$, again it is clear that the map $S_2 : \mathbb{C} \to NSG(N, K)$ defined by $S_2(R) = (\underbrace{0, \ldots, 0}_{K-1}, R)_{x=1}^{N}$ verifies $\|S_2\| = \|S_2\|_{cb} \le 1$. Therefore, it suffices to show that $\|P_2\|_{cb} \le 8$, where $P_2 : \ell_\infty^N(\ell_1^{K-1}) \to NSG(N, K)$ is defined as
\[
P_2\big( \{R(x|a)\}_{x=1,a=1}^{N;K-1} \big) = \Big( \{R(x|a)\}_{a=1}^{K-1},\ -\sum_{a=1}^{K-1} R(x|a) \Big)_{x=1}^{N}.
\]
To see this, consider an element $x = \sum_{x,a=1}^{N,K-1} E_x^a \otimes (e_x \otimes e_a) \in B(H) \otimes_{min} \ell_\infty^N(\ell_1^{K-1})$ of norm 1. By using the same argument as in [35, Thm. 6] we can assume, up to a constant $C = 4$ in the norm, that the operators $\{E_x^a\}_{x,a=1}^{N,K-1}$ are positive. Now, we have
to check that $\|(id \otimes P_2)(x)\|_{B(H) \otimes_{\min} NSG(N,K)} \le 2$. Equivalently, we will show that the operator associated to $(id \otimes P_2)(x)$, $\hat{x} : NSG(N,K)^* \to B(H)$, defined as
\[ \hat{x}(\{\rho(x,a)\}) = \sum_{x=1}^{N} \sum_{a=1}^{K-1} E_x^a \big(\rho(x|a) - \rho(x|K)\big), \]
verifies $\|\hat{x}\|_{cb} \le 2$. To see this, consider the maps $U, V : NSG(N,K)^* \to B(H)$ defined as follows:
\[ U(\{\rho(x,a)\}) = \sum_{x,a=1}^{N,K} \rho(x,a)\, \hat{E}_x^a, \]
where $\hat{E}_x^a = E_x^a$ for every $a = 1, \dots, K-1$ and $\hat{E}_x^K = 1 - \sum_{a=1}^{K-1} \hat{E}_x^a$ for every $x = 1, \dots, N$, and
\[ V(\{\rho(x,a)\}) = \sum_{x,a=1}^{N,K} \rho(x,a)\, \hat{F}_x^a, \]
where $\hat{F}_x^a = 0$ for every $a = 1, \dots, K-1$ and $\hat{F}_x^K = 1$ for every $x = 1, \dots, N$. It is trivial to see that $\{\hat{E}_x^a\}_{x,a}$ (resp. $\{\hat{F}_x^a\}_{x,a}$) is a family of positive operators such that $\sum_{a=1}^{K} \hat{E}_x^a = 1$ (resp. $\sum_{a=1}^{K} \hat{F}_x^a = 1$) for every $x = 1, \dots, N$. Then, by the very definition of the operator space $NSG(N,K)^*$, it is clear that $\|U\|_{cb}, \|V\|_{cb} \le 1$. On the other hand, we have $\hat{x} = U - V$. Therefore, $\|\hat{x}\|_{cb} \le 2$.

The following corollary is a direct consequence of the metric mapping property of the norms $\pi$ and $\wedge$ in their corresponding categories (see [23,58]).

Corollary 3. For all natural numbers $N, K$,
a) $\tilde{T} := T \otimes T : NSG(N,K) \otimes_\pi NSG(N,K) \to \big(\ell_\infty^N(\ell_1^{K-1}) \oplus_\infty \mathbb{C}\big) \otimes_\pi \big(\ell_\infty^N(\ell_1^{K-1}) \oplus_\infty \mathbb{C}\big)$ defines an isomorphism with $\|\tilde{T}\| \|\tilde{T}^{-1}\| \le 81$.
b) $\tilde{T} := T \otimes T : NSG(N,K) \otimes_\wedge NSG(N,K) \to \big(\ell_\infty^N(\ell_1^{K-1}) \oplus_\infty \mathbb{C}\big) \otimes_\wedge \big(\ell_\infty^N(\ell_1^{K-1}) \oplus_\infty \mathbb{C}\big)$ defines a complete isomorphism with $\|\tilde{T}\|_{cb} \|\tilde{T}^{-1}\|_{cb} \le 81$.
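The decomposition $\hat{x} = U - V$ used in the proof of Theorem 13 is a purely algebraic identity, so it can be sanity-checked in the simplest possible setting. The following is a minimal sketch, assuming scalar "operators" (one-dimensional $H$) and exact rational arithmetic; the function name `check_difference_identity` and the sample data are illustrative choices, not part of the paper.

```python
from fractions import Fraction


def check_difference_identity(E, rho):
    """Compare x_hat(rho) with U(rho) - V(rho) for scalar operators.

    E[x][a]   : nonnegative scalars for a = 0..K-2, with sum over a at most 1
    rho[x][a] : arbitrary scalars for a = 0..K-1
    Returns the pair (x_hat(rho), U(rho) - V(rho)).
    """
    K = len(rho[0])
    x_hat = Fraction(0)
    U = Fraction(0)
    V = Fraction(0)
    for x in range(len(E)):
        # E_hat completes E to a family summing to 1; F_hat is concentrated on the last output.
        E_hat = list(E[x]) + [1 - sum(E[x])]
        F_hat = [0] * (K - 1) + [1]
        assert all(e >= 0 for e in E_hat) and sum(E_hat) == 1
        for a in range(K - 1):
            x_hat += E[x][a] * (rho[x][a] - rho[x][K - 1])
        for a in range(K):
            U += rho[x][a] * E_hat[a]
            V += rho[x][a] * F_hat[a]
    return x_hat, U - V


# Hypothetical sample data with N = 2 inputs and K = 3 outputs.
E = [[Fraction(1, 2), Fraction(1, 4)], [Fraction(1, 3), Fraction(1, 3)]]
rho = [[Fraction(2), Fraction(-1), Fraction(5)], [Fraction(0), Fraction(3), Fraction(-7)]]
lhs, rhs = check_difference_identity(E, rho)
```

For this sample both sides agree. The check says nothing about the cb-norm estimates themselves, which rely on the operator space structure of $NSG(N,K)^*$.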
Remark 18. Actually, it is easy to see that $\|T^{-1}\| \le 3$ in Theorem 13 when we restrict to real Banach spaces. In particular, in that case we obtain $\|\tilde{T}\| \|\tilde{T}^{-1}\| \le 9$ in part a) of Corollary 3.

The following lemma shows that $NSG$ is the suitable space to describe the sets $\tilde{\mathcal{L}}$ and $\tilde{\mathcal{Q}}$. However, since $\tilde{\mathcal{L}}$ and $\tilde{\mathcal{Q}}$ are real sets, we have to restrict the previous space to the real part. Taking the $\pi$ tensor product for real Banach spaces is well-defined and provides the correct dual. The correct way to understand the tensor product of the real space $NSG(N,K)_{\mathbb{R}}$ is to use the operator system $V = NSG(N,K)^* \otimes_{\min} NSG(N,K)^*$, and then consider the real part $V_{sa}$. Then we may define the tensor product of real coefficients as
\[ NSG(N,K)_{\mathbb{R}} \,\hat{\otimes}_{\mathbb{R}}\, NSG(N,K)_{\mathbb{R}} := V_{sa}^*. \]
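Using the fact that every element $\xi \in V$ can be written as $\xi = \xi_1 + i\xi_2$ with $\max\{\|\xi_1\|, \|\xi_2\|\} \le \|\xi\|$, we deduce that
\[ B_{NSG(N,K)_{\mathbb{R}} \hat{\otimes}_{\mathbb{R}} NSG(N,K)_{\mathbb{R}}} \subset B_{NSG(N,K) \otimes_\wedge NSG(N,K)} = B_{V^*} \subset B_{V_{sa}^*} + i B_{V_{sa}^*} = B_{NSG(N,K)_{\mathbb{R}} \hat{\otimes}_{\mathbb{R}} NSG(N,K)_{\mathbb{R}}} + i B_{NSG(N,K)_{\mathbb{R}} \hat{\otimes}_{\mathbb{R}} NSG(N,K)_{\mathbb{R}}}. \]
In order to describe the set $\tilde{\mathcal{L}}$, it can be seen that for every element $R \in NSG(N,K)_{\mathbb{R}}$ we have
\[ \|R\|_{NSG(N,K)} = \inf\{ |\lambda| + |\mu| : R = \lambda P + \mu Q,\ P, Q \in S(N,K) \}, \]
where
\[ S(N,K) = \Big\{ \{P(x|a)\}_{x,a=1}^{N,K} : P(x|a) \ge 0 \text{ for every } x, a \text{ and } \sum_{a=1}^{K} P(x|a) = 1 \text{ for every } x \Big\}. \]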
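To make the infimum formula concrete, the sketch below builds one explicit decomposition $R = \lambda P + \mu Q$ with $P, Q \in S(N,K)$ for a real array $R$ whose rows all have the same sum. This is an illustrative construction, not taken from the paper: it only yields an upper bound $|\lambda| + |\mu|$ for the norm, and no claim is made here that it attains the infimum. The function name `decompose` is hypothetical.

```python
from fractions import Fraction


def decompose(R):
    """Write R = lam * P + mu * Q with P, Q conditional probability arrays.

    R[x][a] are exact rationals and every row of R must have the same sum.
    Returns (lam, P, mu, Q) with lam >= 0 >= mu.
    """
    row_sums = {sum(row) for row in R}
    assert len(row_sums) == 1, "rows of R must share a common sum"
    s = row_sums.pop()
    K = len(R[0])
    pos = [[max(r, Fraction(0)) for r in row] for row in R]   # positive parts
    neg = [[max(-r, Fraction(0)) for r in row] for row in R]  # negative parts
    lam = max(sum(row) for row in pos)
    m = lam - s  # this is |mu|; it is nonnegative by construction
    P, Q = [], []
    for p_row, n_row in zip(pos, neg):
        extra = lam - sum(p_row)  # same padding added to both parts, so it cancels
        lp = [p_row[0] + extra] + p_row[1:]
        mq = [n_row[0] + extra] + n_row[1:]
        delta = [Fraction(1)] + [Fraction(0)] * (K - 1)  # fallback when a weight is 0
        P.append([v / lam for v in lp] if lam != 0 else delta)
        Q.append([v / m for v in mq] if m != 0 else delta)
    return lam, P, -m, Q


# Hypothetical example with N = 2 inputs and K = 2 outputs.
R = [[Fraction(3, 2), Fraction(-1, 2)], [Fraction(1), Fraction(0)]]
lam, P, mu, Q = decompose(R)
rebuilt = [[lam * P[x][a] + mu * Q[x][a] for a in range(2)] for x in range(2)]
```

In this example the construction gives $\lambda = 3/2$ and $\mu = -1/2$, so $\|R\|_{NSG(N,K)} \le 2$.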
That is, the norm $\|\cdot\|_{NSG(N,K)}$ coincides with the Minkowski functional of the set $\tilde{S} = \mathrm{conv}(S \cup -S)$ when we consider real coefficients. With this at hand, we have

Lemma 14.
a) $\tilde{\mathcal{L}} = B_{NSG(N,K)_{\mathbb{R}} \otimes_\pi NSG(N,K)_{\mathbb{R}}} = B_{(NSG(N,K)^*_{\mathbb{R}} \otimes_\epsilon NSG(N,K)^*_{\mathbb{R}})^*}$.
b) $\tilde{\mathcal{Q}} = B_{NSG(N,K)_{\mathbb{R}} \hat{\otimes}_{\mathbb{R}} NSG(N,K)_{\mathbb{R}}} := B_{V_{sa}^*}$.

Proof. To prove part a), let us denote, for a pair of sets $A$ and $B$, $A \otimes B = \{a \otimes b : a \in A, b \in B\}$. Now, as we have said, $B_{NSG(N,K)_{\mathbb{R}}} = \mathrm{conv}(S \cup -S)$. On the other hand, by definition
\[ \tilde{\mathcal{L}} = \mathrm{conv}\big(\mathrm{conv}(S \otimes S) \cup -\mathrm{conv}(S \otimes S)\big) = \mathrm{conv}\big(S \otimes S \cup -(S \otimes S)\big). \]
Then, it follows by the well-known fact $B_{X \otimes_\pi Y} = \mathrm{conv}(B_X \otimes B_Y)$ (see [23]) that
\[ B_{NSG(N,K)_{\mathbb{R}} \otimes_\pi NSG(N,K)_{\mathbb{R}}} = \mathrm{conv}\big(\mathrm{conv}(S \cup -S) \otimes \mathrm{conv}(S \cup -S)\big) = \mathrm{conv}\big(S \otimes S \cup -(S \otimes S)\big) = \tilde{\mathcal{L}}. \]
The second equality follows trivially by the duality between the $\epsilon$ and $\pi$ tensor norms.

In order to prove part b), recall that the map
\[ J \otimes J : NSG(N,K)^* \otimes_{\min} NSG(N,K)^* \longrightarrow \Big(\bigoplus_{\{E_x^a\} \in I} B(H_{\{E_x^a\}})\Big) \otimes_{\min} \Big(\bigoplus_{\{F_y^b\} \in I} B(H_{\{F_y^b\}})\Big) \subset \bigoplus_{\{E_x^a\} \times \{F_y^b\} \in I \times I} B\big(H_{\{E_x^a\}} \otimes H_{\{F_y^b\}}\big) \]
is a (complete) isometry. Then, the result follows easily just reasoning by duality on real elements $(M_{x,y}^{a,b})_{x,y;a,b}$.

Remark 19. The space $NSG(N,K)$ represents the formalization of the comments after Theorem 6. Actually, in [34] it is shown that this space is very useful to study the set of probability distributions when we assume another very interesting model of Nature.

To finish this section, we will use Corollary 3 and Lemma 14 to obtain upper bounds for the largest violation of Bell inequalities.
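Theorem 14. With the notation of Subsect. 5.1, if we consider $N$ inputs, $K$ outputs and Hilbert space dimension $d$, we have that
\[ d(\tilde{\mathcal{L}}, \tilde{\mathcal{Q}}) \preceq \min\{N, K, d\}. \]

The upper bound regarding the Hilbert space dimension $d$ was proved in [35, Prop. 1]. On the other hand, Theorem 14 improves the previously known $O(K^2)$ upper bounds for $d(\tilde{\mathcal{L}}, \tilde{\mathcal{Q}})$ as a function of the number of outputs (see [22, Prop. 25]). To our knowledge, no nontrivial upper bound for $d(\tilde{\mathcal{L}}, \tilde{\mathcal{Q}})$ as a function of the number of inputs was known.

Proof. The upper bound in terms of $d$ was proved in [35, Prop. 1]. It remains to show the upper bound in terms of the outputs or the inputs. Since we allow absolute constants, we may work with complex Banach spaces. Now, according to Corollary 3 and Lemma 14, it suffices to show that
\[ \big\| id \otimes id : \ell_1^N(\ell_\infty^K) \otimes_\epsilon \ell_1^N(\ell_\infty^K) \to \ell_1^N(\ell_\infty^K) \otimes_{\min} \ell_1^N(\ell_\infty^K) \big\| \preceq \min\{N, K\}. \]
Recall that $d(\ell_1^K, \ell_\infty^K) \le \sqrt{K}$ ([69]). Thus there exists $u : \ell_\infty^K \to \ell_1^K$ such that $\|u^{-1}\| \le 1$ and
\[ \| u : \ell_\infty^K \to \ell_1^K \| \le \sqrt{K}. \]
Moreover, we know that $\|u^{-1}\|_{cb} = \|u^{-1}\| \le 1$. We will use the amplifications $\tilde{u} = id \otimes u : \ell_1^N(\ell_\infty^K) \to \ell_1^N(\ell_1^K)$ and $\tilde{u}^{-1} = id \otimes u^{-1} : \ell_1^N(\ell_1^K) \to \ell_1^N(\ell_\infty^K)$. Then, we have a factorization
\[ id \otimes id = (\tilde{u}^{-1} \otimes \tilde{u}^{-1}) \circ (id \otimes id) \circ (\tilde{u} \otimes \tilde{u}) : \ell_1^N(\ell_\infty^K) \otimes_\epsilon \ell_1^N(\ell_\infty^K) \to \ell_1^N(\ell_\infty^K) \otimes_{\min} \ell_1^N(\ell_\infty^K). \]
By the previous comments we have that
\[ \big\| \tilde{u} \otimes \tilde{u} : \ell_1^N(\ell_\infty^K) \otimes_\epsilon \ell_1^N(\ell_\infty^K) \to \ell_1^N(\ell_1^K) \otimes_\epsilon \ell_1^N(\ell_1^K) \big\| \le K, \]
and
\[ \big\| \tilde{u}^{-1} \otimes \tilde{u}^{-1} : \ell_1^N(\ell_1^K) \otimes_{\min} \ell_1^N(\ell_1^K) \to \ell_1^N(\ell_\infty^K) \otimes_{\min} \ell_1^N(\ell_\infty^K) \big\| \le 1. \]
Finally, according to Grothendieck's inequality, for any operator $T : \ell_\infty \to \ell_1$, $\|T\|_{cb} \le K_G \|T\|$ or, equivalently, $\| id \otimes id : \ell_1 \otimes_\epsilon \ell_1 \to \ell_1 \otimes_{\min} \ell_1 \| \le K_G$. The metric mapping property of the $\epsilon$ and the $\min$ norms in their corresponding categories implies
\[ \big\| id \otimes id : \ell_1^N(\ell_\infty^K) \otimes_\epsilon \ell_1^N(\ell_\infty^K) \to \ell_1^N(\ell_\infty^K) \otimes_{\min} \ell_1^N(\ell_\infty^K) \big\| \le K_G K. \]
To prove the upper bound as a function of the number of inputs $N$, note that, according to Eq. (1.3), we have that
\[ \big\| id \otimes id : \ell_\infty^N(\ell_\infty^K) \to \ell_1^N(\ell_\infty^K) \big\|_{cb} \le N. \]
Then, the factorization
\[ id \otimes id : \ell_1^N(\ell_\infty^K) \otimes_\epsilon \ell_1^N(\ell_\infty^K) \to \ell_\infty^N(\ell_\infty^K) \otimes_{\min} \ell_1^N(\ell_\infty^K) \to \ell_1^N(\ell_\infty^K) \otimes_{\min} \ell_1^N(\ell_\infty^K) \]
implies
\[ \big\| id \otimes id : \ell_1^N(\ell_\infty^K) \otimes_\epsilon \ell_1^N(\ell_\infty^K) \to \ell_1^N(\ell_\infty^K) \otimes_{\min} \ell_1^N(\ell_\infty^K) \big\| \le N. \]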
6. A Relaxation of the Problem: The $\gamma_2^*$ Norm

The problem of computing or approximating the classical and quantum value of Bell inequalities has been studied from different points of view. On the one hand, the problem of studying the quantum value of Bell inequalities (related to deciding whether a given probability distribution belongs to the quantum set $\mathcal{Q}$) has captured the interest of many researchers in QIT (see e.g. [24,46,54,55,72]). On the other hand, since any two-prover one-round game can be seen as a Bell inequality, Game Theory and, in general, Computer Science can be considered as a very important source of results about the (mostly) classical value of Bell inequalities.

The study of two-prover one-round games is of great interest in Computer Science owing to the fact that many of the most important problems in Complexity Theory can be stated in terms of these kinds of games. In particular, they are extremely useful to study problems of hardness of approximation. One example of this is the so-called unique games conjecture (see [42,43]), which has become one of the crucial problems in Complexity Theory since it implies hardness of approximation results for several important problems (MaxCut, Multicut and Sparsest Cut, Vertex Cover, ...) which are difficult to obtain by standard complexity assumptions.

Although the results in Complexity Theory mainly focus on the classical value of games, recently some of the most relevant problems in the field have been studied in the context of quantum physics or under the assumption of the non-signaling condition. Some examples of this are the study of hardness of approximation of the quantum value (commonly called entangled value) of games ([31,39]), the unique games conjecture in the quantum context ([41]) or the parallel repetition theorem ([20,30,40,41]). The standard way to tackle these kinds of problems is to show that the entangled value (resp. non-signaling value) of the considered games can be approximated by some semidefinite programming (SDP) relaxation with some good properties. In this sense, the following relaxation has been shown to be very useful (see [5] and [41] for details). For every $M$ consider the following optimization problem (OP), which maximizes over complex vectors $\{u_x^a\}_{x,a=1}^{n}$, $\{v_y^b\}_{y,b=1}^{n}$ and $z$:

OP 1.
\[ \omega_{op}(M) := \max \Big\{ \Big| \sum_{x,y,a,b=1}^{n} M_{x,y}^{a,b} \langle u_x^a, v_y^b \rangle \Big| \Big\} \]
subject to:
\[ \begin{cases} \|z\| = 1, \\ \forall x, y: \ \sum_a u_x^a = \sum_b v_y^b = z, \\ \forall x, y,\ \forall a \neq b: \ \langle u_x^a, u_x^b \rangle = 0,\ \langle v_y^a, v_y^b \rangle = 0. \end{cases} \]
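As a concrete instance of the constraints in OP 1, vectors of the form $u_x^a = E_x^a \xi$, with $\{E_x^a\}_a$ a projective measurement and $\xi$ a unit vector, are feasible; this is the construction used later in the proof of Lemma 16. The sketch below checks this numerically under simplifying assumptions that are not in the paper: real vectors in $\mathbb{R}^2$, two outputs per input, and rank-one projections onto a rotated orthonormal basis. The helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def projected_vectors(theta, xi):
    """Return [E^0 xi, E^1 xi] for the projections onto the basis rotated by theta."""
    basis = [(math.cos(theta), math.sin(theta)),
             (-math.sin(theta), math.cos(theta))]
    return [tuple(dot(b, xi) * c for c in b) for b in basis]


def satisfies_op1(families, z, tol=1e-12):
    """Check the OP 1 constraints: |z| = 1, each family sums to z, and
    distinct vectors within one family are orthogonal."""
    if abs(dot(z, z) - 1.0) > tol:
        return False
    for vectors in families:
        total = [sum(coords) for coords in zip(*vectors)]
        if any(abs(t - c) > tol for t, c in zip(total, z)):
            return False
        for i in range(len(vectors)):
            for j in range(i + 1, len(vectors)):
                if abs(dot(vectors[i], vectors[j])) > tol:
                    return False
    return True


xi = (0.6, 0.8)  # plays the role of z
u_families = [projected_vectors(t, xi) for t in (0.0, 0.7, 1.9)]  # one family per input x
v_families = [projected_vectors(t, xi) for t in (0.3, 1.1, 2.5)]  # one family per input y
feasible = satisfies_op1(u_families + v_families, xi)
```

The angles are arbitrary sample values. Any choice yields a feasible point, because the two rank-one projections for a given angle are orthogonal and sum to the identity.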
Following the notation in [41], the relaxation we are considering here verifies
\[ \omega_{sdp3}(M) \ge \omega_{op}(M) \ge \omega_{sdp1}(M) \ge \sup\{ |\langle M, Q \rangle| : Q \in \mathcal{Q} \}. \]
Indeed, it is easy to see that $SDP_3$ and $SDP_1$ can be stated equivalently for real or complex vectors. $SDP_3$ and $SDP_1$ have been shown to be very useful in the study of different problems (see [40,41,55]). In particular, they can be used to approximate the entangled value of unique games.
Actually, $SDP_1$ is obtained when we consider the extra restriction $\langle u_x^a, v_y^b \rangle \ge 0$ for every $x, y, a, b$ in our problem OP. Note that this $SDP_1$ was already considered in the context of Bell inequalities in [54,55] (certificate of order 1). As far as we know, it is an open question whether the quantum value of a Bell inequality (in particular, the entangled value of a game) can be efficiently approximated up to a universal constant. $SDP_1$ (and the certificates of higher order in [54,55]) seems to be the best known candidate to approximate such a value by using semidefinite programming.

In this section, we will study the problem OP 1 to compute the classical and the quantum value of Bell inequalities. Our main theorem states:

Theorem 15. There exists a Bell inequality $M$ with $n$ inputs and $n+1$ outputs such that
a) $\sup_{P \in \mathcal{L}} |\langle M, P \rangle| \preceq 1$,
b) $\frac{\sqrt{n}}{\log n} \preceq \sup_{Q \in \mathcal{Q}} |\langle M, Q \rangle| \preceq \sqrt{n}$ and
c) $\frac{n}{\log n} \preceq \omega_{op}(M) \preceq \frac{n}{(\log n)^\beta}$ for a certain universal constant $\beta$.

Actually, the element $M$ in Theorem 15 is exactly the same as the one considered in Theorem 1. Furthermore, the key point to prove this result is again the complemented copy of $\ell_2^n$ provided in Theorem 6. Thus, we have a canonical object $M$ which allows us to "separate" the three different models: classical, quantum and OP (see the explanation after Theorem 6). In terms of distance, we immediately deduce:

Corollary 4. There exists a Bell inequality $M$ with $n$ inputs and $n+1$ outputs such that
a) $\frac{\omega_{op}(M)}{\sup_{P \in \mathcal{L}} |\langle M, P \rangle|} \succeq \frac{n}{\log n}$,
b) $\frac{\omega_{op}(M)}{\sup_{Q \in \mathcal{Q}} |\langle M, Q \rangle|} \succeq \frac{\sqrt{n}}{\log n}$.

Thus, even though this OP is very useful to approximate the entangled value of some kinds of games, we cannot use it to approximate the value of a general Bell inequality. Moreover, we will show that part a) in Corollary 4 is essentially optimal. Specifically,

Proposition 1. For every Bell inequality $M$ with $n$ inputs and $n$ outputs we have
\[ \frac{\omega_{op}(M)}{\sup_{P \in \mathcal{L}} |\langle M, P \rangle|} \preceq n. \]

Furthermore, as a consequence of a deep result in [14], we can prove a much sharper result about the previous optimality when we consider the rank of the operator $M$. Indeed, we will show

Theorem 16. Let us denote, for each $n \in \mathbb{N}$,
\[ A_n = \inf\Big\{ A : \frac{\omega_{op}(M)}{\sup_{P \in \mathcal{L}} |\langle M, P \rangle|} \le A n \text{ for every } M \text{ with rank } n \Big\}. \]
Then,
\[ \frac{1}{\log n} \preceq A_n \preceq \frac{1}{(\log n)^\beta} \]
for a certain universal constant $\beta$.
As we did in the previous sections (see Lemma 14), we will regard Bell inequalities as elements in $(NSG(n,n)^* \otimes NSG(n,n)^*)_{\mathbb{R}}$. Actually, since we are working up to universal constants, we can deal with the complex linear space $NSG(n,n)^* \otimes NSG(n,n)^*$ and, via Corollary 3, with $\ell_1^n(\ell_\infty^n) \otimes \ell_1^n(\ell_\infty^n)$. In the same way as we described the classical (resp. quantum) value of a Bell inequality via the tensor norm $\epsilon$ (resp. $\min$), we will be able to describe the value $\omega_{op}$ via another tensor norm. In this way, the results of this section emphasize again the importance of Banach space techniques to study these kinds of problems. We will need the following two definitions (see [58,61] respectively):

Definition 3. Let $X$ and $Y$ be two Banach spaces. For every $z \in X \otimes Y$ we define
\[ \gamma_2^*(z) = \sup\{ \|(u \otimes v)(z)\|_{H \otimes_\pi H} \}, \]
where the sup is taken over all Hilbert spaces $H$ and all contractions $u : X \to H$, $v : Y \to H$.

Definition 4. Given two operator spaces $X, Y$, for every $z = \sum_{i=1}^{n} x_i \otimes y_i \in X \otimes Y$ we define the Haagerup tensor norm of $z$ as
\[ \|z\|_h = \sup\Big\{ \Big\| \sum_{i=1}^{n} u(x_i) v(y_i) \Big\|_{B(H)} \Big\}, \]
where the sup is taken over all Hilbert spaces $H$ and all complete contractions $u : X \to B(H)$, $v : Y \to B(H)$.

The following lemma will be very helpful in this section:

Lemma 15. For every element $z \in \ell_1^N(\ell_\infty^K) \otimes \ell_1^N(\ell_\infty^K)$ we have $\|z\|_h \le \gamma_2^*(z) \le K_G^2 \|z\|_h$.

Proof. The first inequality follows from the self-duality of the tensor norm $\|\cdot\|_h$ and the fact that for any pair of Banach spaces $X$ and $Y$, the map $id : \min(X) \otimes_h \min(Y) \to X \otimes_{\gamma_2} Y$ defines an isometry (see for instance [8]). Here, given a Banach space $Z$, we denote by $\min(Z)$ the operator space structure induced by any embedding $Z \hookrightarrow C(K)$. To see the second inequality, consider an element $z$ such that $\|z\|_h \le 1$. Now, according to Lemma 6, any contraction $u : \ell_1^N(\ell_\infty^K) \to \ell_2^n$ (resp. $v : \ell_1^N(\ell_\infty^K) \to \ell_2^n$) verifies $\|u : \ell_1^N(\ell_\infty^K) \to R_n\|_{cb} \le K_G$ (resp. $\|v : \ell_1^N(\ell_\infty^K) \to C_n\|_{cb} \le K_G$). Then,
\[ \|(u \otimes v)(z)\|_{\ell_2^n \otimes_\pi \ell_2^n} = \|(u \otimes v)(z)\|_{R_n \otimes_h C_n} \le K_G^2. \]

Remark 20. According to Theorem 13, we know that for any $z \in NSG(N,K)^* \otimes NSG(N,K)^*$ we have $\|z\|_h \le \gamma_2^*(z) \le C \|z\|_h$ for a certain universal constant $C$.

The following lemma shows that $\gamma_2^*(M)$ is, up to a universal constant, the same as $\omega_{op}(M)$.

Lemma 16. For every Bell inequality $M$ with $N$ inputs and $K$ outputs we have
\[ \omega_{op}(M) \simeq \|M\|_{NSG(N,K)^* \otimes_{\gamma_2^*} NSG(N,K)^*}. \]
Proof. Suppose on the one hand that $\|M\|_{NSG(N,K)^* \otimes_{\gamma_2^*} NSG(N,K)^*} \le 1$ and consider some vectors $\{u_x^a\}_{x,a}$ and $\{v_y^b\}_{y,b}$ in a Hilbert space $H$ verifying the restrictions in OP 1. Then, it can be seen that the map $u : NSG(N,K)^* \to H$ given by $u(e_{x,a}) = u_x^a$ (resp. $v : NSG(N,K)^* \to H$ given by $v(e_{y,b}) = v_y^b$) is well defined and verifies $\|u\| \preceq 1$ (resp. $\|v\| \preceq 1$). Thus, we have that
\[ \Big| \sum_{x,y,a,b=1}^{n} M_{x,y}^{a,b} \langle u_x^a, v_y^b \rangle \Big| = \Big| \sum_{x,y,a,b=1}^{n} M_{x,y}^{a,b} \langle u(e_{x,a}), v(e_{y,b}) \rangle \Big| \preceq 1. \]
Therefore, $\omega_{op}(M) \preceq \|M\|_{NSG(N,K)^* \otimes_{\gamma_2^*} NSG(N,K)^*}$.

The proof of the second inequality requires a more sophisticated argument. Note that, according to Remark 20, we can consider $NSG(N,K)^* \otimes_h NSG(N,K)^*$. On the other hand, the map defined in Remark 17,
\[ \iota : NSG(N,K)^* \to *_{i=1}^{N} \ell_\infty^K, \]
defines a completely isometric embedding. Then, we know (see [58, Theorem 5.13]) that
\[ \iota \otimes \iota : NSG(N,K)^* \otimes_h NSG(N,K)^* \to \big(*_{i=1}^{N} \ell_\infty^K\big) * \big(*_{i=1}^{N} \ell_\infty^K\big) = *_{i=1}^{2N} \ell_\infty^K \]
defines a (completely) isometric embedding. Therefore, for any $M = \sum_{x,y,a,b} M_{x,y}^{a,b}\, e_{x,a} \otimes e_{y,b} \in NSG(N,K)^* \otimes NSG(N,K)^*$,
\[ \|M\|_h = \|(\iota \otimes \iota)(M)\|_{*_{i=1}^{2N} \ell_\infty^K} = \Big\| \sum_{x,y,a,b} M_{x,y}^{a,b} (\iota \otimes \iota)(e_{x,a} \otimes e_{y,b}) \Big\|_{*_{i=1}^{2N} \ell_\infty^K}. \]
Thus, it follows that
\[ \|M\|_h = \sup \Big\| \sum_{x,y,a,b} M_{x,y}^{a,b} E_x^a F_y^b \Big\|_{B(H)}, \]
where the sup is taken over all families of positive operators $\{E_x^a\}_{x,a} \subset B(H)$ (resp. $\{F_y^b\}_{y,b} \subset B(H)$) verifying $\sum_a E_x^a = 1$ for every $x$ (resp. $\sum_b F_y^b = 1$ for every $y$) and $E_x^a \perp E_x^{a'}$ for every $a \neq a'$ (resp. $F_y^b \perp F_y^{b'}$ for every $b \neq b'$). We conclude the proof by using the polarization identity in order to take the supremum of $\big| \sum_{x,y,a,b} M_{x,y}^{a,b} \langle E_x^a \xi, F_y^b \xi \rangle \big|$ over all operators as before and $\xi$ in the unit sphere of $H$. Indeed, it is trivial to check that the elements $u_x^a = E_x^a \xi$ and $v_y^b = F_y^b \xi$ verify the restrictions in OP 1.

Remark 21. In the case of correlation Bell inequalities, this norm exactly describes the set of quantum correlations. Specifically, given a matrix $(T_{i,j})_{i,j=1}^{n}$, we have ([70])
\[ \sup\Big\{ \sum_{i,j=1}^{n} T_{i,j} \gamma_{i,j} : (\gamma_{i,j})_{i,j=1}^{n} \text{ is a quantum correlation matrix} \Big\} = \Big\| \sum_{i,j=1}^{n} T_{i,j}\, e_i \otimes e_j \Big\|_{\ell_1^n \otimes_{\gamma_2^*} \ell_1^n}. \]
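A small worked instance may help with Remark 21. By Tsirelson's characterization, quantum correlation matrices can be written as $\gamma_{i,j} = \langle u_i, v_j \rangle$ for unit vectors $u_i, v_j$ in a real Hilbert space. The sketch below, which is an illustration and not part of the paper, evaluates the CHSH matrix $T$ on one such choice of unit vectors in $\mathbb{R}^2$ and compares it with the best deterministic $\pm 1$ assignment; the specific vectors are the standard textbook choice.

```python
import math
from itertools import product

# CHSH coefficient matrix.
T = [[1, 1], [1, -1]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def classical_value(T):
    """Maximum of sum_ij T_ij a_i b_j over deterministic signs a_i, b_j in {-1, +1}."""
    n = len(T)
    return max(
        sum(T[i][j] * a[i] * b[j] for i in range(n) for j in range(n))
        for a in product((-1, 1), repeat=n)
        for b in product((-1, 1), repeat=n)
    )


# Unit vectors giving a quantum correlation matrix gamma_ij = <u_i, v_j>.
r = 1.0 / math.sqrt(2.0)
u = [(1.0, 0.0), (0.0, 1.0)]
v = [(r, r), (r, -r)]
gamma = [[dot(u[i], v[j]) for j in range(2)] for i in range(2)]

quantum_value = sum(T[i][j] * gamma[i][j] for i in range(2) for j in range(2))
classical = classical_value(T)
```

Here `quantum_value` equals $2\sqrt{2}$ up to rounding, which is the Tsirelson bound for CHSH, while `classical` is 2.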
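Remark 22. The tensor norm $\gamma_2^*$ has been shown to be a very important tool in Communication Complexity (see [47] and [48] and the references therein).

According to Lemma 16 and Corollary 3, Theorem 15 follows from the next result:

Theorem 17. There exists an element $M \in \ell_1^n(\ell_\infty^n) \otimes \ell_1^n(\ell_\infty^n)$ such that
a) $\|M\|_\epsilon \preceq 1$,
b) $\frac{\sqrt{n}}{\log n} \preceq \|M\|_{\min} \preceq \sqrt{n}$ and
c) $\frac{n}{\log n} \preceq \gamma_2^*(M) \preceq \frac{n}{(\log n)^\beta}$.

Indeed, once we have an element $M$ as in Theorem 17, we can obtain $\hat{M}$ verifying Theorem 15 just by adding some extra zeros, as was explained in Theorem 4. Actually, it is easy to see that this is exactly the same as taking
\[ \hat{M} = \big( T^*|_{\ell_1^n(\ell_\infty^n)} \otimes T^*|_{\ell_1^n(\ell_\infty^n)} \big)(M) \in NSG(n, n+1)^* \otimes NSG(n, n+1)^*, \]
where $T : NSG(n, n+1) \to \ell_\infty^n(\ell_1^n) \oplus \mathbb{C}$ is the map defined in Theorem 13. Furthermore, Proposition 1 and Theorem 16 are equivalent, respectively, to

Proposition 2. For every element $M \in \ell_1^n(\ell_\infty^n) \otimes \ell_1^n(\ell_\infty^n)$ we have
\[ \frac{\|M\|_{\ell_1^n(\ell_\infty^n) \otimes_{\gamma_2^*} \ell_1^n(\ell_\infty^n)}}{\|M\|_{\ell_1^n(\ell_\infty^n) \otimes_\epsilon \ell_1^n(\ell_\infty^n)}} \preceq n, \]
and

Theorem 18. Let us denote, for each $n \in \mathbb{N}$,
\[ A_n = \inf\Big\{ A : \frac{\|M\|_{\ell_1^N(\ell_\infty^K) \otimes_{\gamma_2^*} \ell_1^N(\ell_\infty^K)}}{\|M\|_{\ell_1^N(\ell_\infty^K) \otimes_\epsilon \ell_1^N(\ell_\infty^K)}} \le A n \text{ for all } N, K \in \mathbb{N} \text{ and } M \in \ell_1^N(\ell_\infty^K) \otimes \ell_1^N(\ell_\infty^K) \text{ with rank } n \Big\}. \]
Then,
\[ \frac{1}{\log n} \preceq A_n \preceq \frac{1}{(\log n)^\beta}. \]
Here $\beta$ is a universal constant.

Note that parts a) and b) in Theorem 17 were already proved in Sect. 2 and 3. So it remains to show that our element
\[ M = \frac{1}{n^2} \sum_{x,y,a,b,k=1}^{n} \epsilon_{x,a}^k \epsilon_{y,b}^k\, e_{x,a} \otimes e_{y,b} \in \ell_1^n(\ell_\infty^n) \otimes \ell_1^n(\ell_\infty^n) \]
in Sect. 3 verifies part c). By construction, our signs $(\epsilon_{x,a}^k)_{x,a,k=1}^{n}$ verify Lemma 5.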
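In particular, the map $G^*(e_i \otimes e_j) = \sum_{k=1}^{n} \epsilon_{i,j}^k e_k$ for every $i, j = 1, \dots, n$ verifies $\| G^* : \ell_1^n(\ell_\infty^n) \to \ell_2^n \| \preceq n$. Then,
\[ \gamma_2^*(M) \succeq \frac{1}{n^4} \Big\| \sum_{x,y,a,b,k=1}^{n} \epsilon_{x,a}^k \epsilon_{y,b}^k\, G^*(e_{x,a}) \otimes G^*(e_{y,b}) \Big\|_{\ell_2^n \otimes_\pi \ell_2^n} \ge \frac{1}{n^4} \Big| \sum_{p=1}^{n} \sum_{x,y,a,b,k=1}^{n} \epsilon_{x,a}^k \epsilon_{y,b}^k \epsilon_{x,a}^p \epsilon_{y,b}^p \Big| \]
\[ = \frac{1}{n^4} \sum_{k,p=1}^{n} \Big( \sum_{x,a=1}^{n} \epsilon_{x,a}^k \epsilon_{x,a}^p \Big)^2 \ge \frac{1}{n^4} \sum_{k=1}^{n} \Big( \sum_{x,a=1}^{n} \epsilon_{x,a}^k \epsilon_{x,a}^k \Big)^2 = n. \]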
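The last two steps of this estimate are elementary identities about sign arrays and can be checked by brute force for small $n$. The following sketch, an illustration that assumes nothing about the specific signs of Lemma 5, draws random signs $\epsilon_{x,a}^k \in \{-1, +1\}$, confirms that the six-fold sum factorizes into a sum of squares, and confirms that the diagonal terms $k = p$ alone contribute $n \cdot (n^2)^2 = n^5$. The function name `sign_sums` is hypothetical.

```python
import random


def sign_sums(eps):
    """eps[k][x][a] in {-1, +1}. Returns (full, squares, diagonal) where
    full     = sum over p, k, x, a, y, b of eps^k_xa eps^k_yb eps^p_xa eps^p_yb,
    squares  = sum over k, p of (sum over x, a of eps^k_xa eps^p_xa)^2,
    diagonal = the k = p part of `squares`."""
    n = len(eps)
    idx = range(n)
    full = sum(
        eps[k][x][a] * eps[k][y][b] * eps[p][x][a] * eps[p][y][b]
        for p in idx for k in idx for x in idx for a in idx for y in idx for b in idx
    )
    squares = sum(
        sum(eps[k][x][a] * eps[p][x][a] for x in idx for a in idx) ** 2
        for k in idx for p in idx
    )
    diagonal = sum(
        sum(eps[k][x][a] * eps[k][x][a] for x in idx for a in idx) ** 2
        for k in idx
    )
    return full, squares, diagonal


n = 3
rng = random.Random(0)  # fixed seed so the run is reproducible
eps = [[[rng.choice((-1, 1)) for _ in range(n)] for _ in range(n)] for _ in range(n)]
full, squares, diagonal = sign_sums(eps)
```

Dividing `diagonal` by $n^4$ gives exactly $n$, matching the final equality in the display. The off-diagonal terms are nonnegative squares, which is why dropping them yields a lower bound.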
Therefore, we have proved the first inequality in part c) of Theorem 17.

Remark 23. As in the case of $\|M\|_{\min}$, the key point of the previous estimate is Theorem 6. However, as we already did in Sect. 3, we could construct explicitly the elements that we have to use in order to norm $M$. In this particular case we have shown that we can obtain $\gamma_2^*(M) \succeq n$ by using the vectors $u_x^a = v_x^a = \sum_{k=1}^{n} \epsilon_{x,a}^k e_k \in \ell_2^n$ in the definition of $\gamma_2^*$.

On the other hand, since the element $M$ we are considering has rank $n$, the upper bound in (Theorem 17, part c)) follows by Theorem 18. To prove this result, we will need the following theorem:

Theorem 19 ([14]). There exist universal constants $\alpha, \beta > 0$ such that for every $n \in \mathbb{N}$ and every pair of linear maps $u : \ell_1(c_0) \to \ell_2^n$ and $v : \ell_2^n \to \ell_1(c_0)$ verifying $u \circ v = id_{\ell_2^n}$, we have $\|u\| \|v\| \ge \alpha (\log n)^\beta$.

Remark 24. Note that Theorem 19 says that if we have a $C$-complemented copy of $\ell_2^n$ in $\ell_1(c_0)$, then $C \ge \alpha (\log n)^\beta$. On the other hand, in Theorem 6 we provided a $\sqrt{\log n}$-complemented copy of $\ell_2^n$ in $\ell_1^n(\ell_\infty^n)$.

With this at hand, we can prove Theorem 18.

Proof (Theorem 18). The first inequality is a consequence of the comments above. Indeed, we have shown that our particular element $M$ verifies $\frac{\gamma_2^*(M)}{\|M\|_\epsilon} \succeq \frac{n}{\log n}$. For the second inequality, let $z$ be a rank $n$ element in $\ell_1(c_0) \otimes \ell_1(c_0)$ such that $\|z\|_\epsilon \le 1$. Assume that $\|z\|_{\gamma_2^*} = C_n n$, where $C_n$ is a constant which may depend on $n$. We want to show that
\[ C_n \preceq \frac{1}{(\log n)^\beta}. \]
If we denote by $T_z : \ell_\infty(\ell_1) \to \ell_1(c_0)$ the operator associated to $z$, we know by hypothesis that there exist contractions $u : \ell_2^n \to \ell_\infty(\ell_1)$ and $v : \ell_1(c_0) \to \ell_2^n$ such that
\[ C_n n = tr\big( v \circ T_z \circ u : \ell_2^n \to \ell_\infty(\ell_1) \to \ell_1(c_0) \to \ell_2^n \big). \]
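Indeed, this is an immediate consequence of the very definition of the norm $\gamma_2^*$ and the fact that the operator $T_z$ has rank $n$. On the other hand, it is immediate that
\[ C_n n = |tr(v \circ T_z \circ u)| \le \sum_{i=1}^{n} a_i(v \circ T_z \circ u) \le \sum_{i=1}^{\delta n} a_i(v \circ T_z \circ u) + n\, a_{\delta n}(v \circ T_z \circ u), \]
where $a_i(v \circ T_z \circ u)$ denotes the $i$-th singular value of the operator $v \circ T_z \circ u$ and $\delta = \frac{C_n}{2}$. Now, it is very easy to see that $\sum_{i=1}^{\delta n} a_i(v \circ T_z \circ u) \le \delta n \|v \circ T_z \circ u\| \le \delta n$. It follows that $C_n n \le \delta n + n\, a_{\delta n}(v \circ T_z \circ u)$, so $a_{\delta n}(v \circ T_z \circ u) \ge \frac{C_n}{2}$.

This means that there exist operators $p : \ell_2^{\delta n} \to \ell_2^n$ and $q : \ell_2^n \to \ell_2^{\delta n}$ such that $\|p\| \|q\| \le \frac{2}{C_n}$ and $q \circ (v \circ T_z \circ u) \circ p = id_{\ell_2^{\delta n}}$. In particular, we have the following factorization:
\[ \ell_2^{\delta n} \xrightarrow{\ T_z \circ u \circ p\ } \ell_1(c_0) \xrightarrow{\ q \circ v\ } \ell_2^{\delta n}, \qquad (q \circ v) \circ (T_z \circ u \circ p) = id_{\ell_2^{\delta n}}. \]
According to Theorem 19, we know that
\[ \alpha \Big( \log \frac{C_n n}{2} \Big)^{\beta} \le \|q \circ v\| \|T_z \circ u \circ p\| \le \|p\| \|q\| \le \frac{2}{C_n}. \]
That is, $\big(\frac{2}{C_n}\big)^{\frac{1}{\beta}} \ge \alpha^{\frac{1}{\beta}} \log \frac{C_n n}{2}$. On the other hand,
\[ \alpha^{\frac{1}{\beta}} \log \frac{C_n n}{2} = \alpha^{\frac{1}{\beta}} \Big( \log n + \log \frac{C_n}{2} \Big) \succeq \log n, \]
where we have used that $C_n \succeq \frac{1}{\log n}$ in the last inequality. We conclude that
\[ C_n \preceq \frac{1}{(\log n)^\beta}. \]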
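The singular-value step in the proof above is a counting argument: if $n$ numbers in $[0,1]$, sorted decreasingly, sum to $C n$, then the one at position about $C n / 2$ is at least $C/2$. The sketch below checks this on exact rational data. It is an illustration under two conventions that the paper leaves implicit and that are assumed here: the index $\delta n$ is rounded down, and it is replaced by 1 when that rounding gives 0. The function name `index_and_bound` is hypothetical.

```python
import random
from fractions import Fraction


def index_and_bound(values):
    """values: numbers in [0, 1] playing the role of singular values.

    Returns (a_m, C / 2, m), where the values are sorted decreasingly,
    C = sum / n, and m = max(1, floor(C * n / 2)) is the 1-based index
    standing in for delta * n.
    """
    a = sorted(values, reverse=True)
    n = len(a)
    assert all(0 <= s <= 1 for s in a)
    total = sum(a)              # this is C * n
    m = max(1, int(total / 2))  # floor, since total >= 0
    return a[m - 1], total / (2 * n), m


# Randomized check on exact rationals (hypothetical test data).
rng = random.Random(1)
violations = 0
for _ in range(500):
    n = rng.randint(1, 12)
    values = [Fraction(rng.randint(0, 10), 10) for _ in range(n)]
    a_m, half_c, m = index_and_bound(values)
    if a_m < half_c:
        violations += 1
```

The reason is the same as in the proof: the first $m$ values contribute at most $m \le C n / 2$, so the remaining ones, each at most $a_m$, must contribute at least $C n / 2$.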
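We will finish this section with the proof of Proposition 2.

Proof (Proposition 2). We have to show that
\[ \big\| id \otimes id : \ell_1^n(\ell_\infty^n) \otimes_\epsilon \ell_1^n(\ell_\infty^n) \to \ell_1^n(\ell_\infty^n) \otimes_{\gamma_2^*} \ell_1^n(\ell_\infty^n) \big\| \preceq n. \]
To see this estimate, recall that Grothendieck's theorem can be stated (see [23] for details) as:
\[ \big\| id \otimes id : \ell_1 \otimes_\epsilon \ell_1 \to \ell_1 \otimes_{\gamma_2^*} \ell_1 \big\| \le K_G. \]
Then, the result follows easily by considering the same factorization as in the proof of Theorem 14:
\[ id \otimes id = (\tilde{u}^{-1} \otimes \tilde{u}^{-1}) \circ (id \otimes id) \circ (\tilde{u} \otimes \tilde{u}) : \ell_1^N(\ell_\infty^K) \otimes_\epsilon \ell_1^N(\ell_\infty^K) \to \ell_1^N(\ell_\infty^K) \otimes_{\gamma_2^*} \ell_1^N(\ell_\infty^K). \]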
Acknowledgements. We would like to thank Timur Oikhberg for drawing our attention to Theorem 19, and the referees for their many useful comments which helped to improve the readability of this paper. The authors are partially supported by National Science Foundation grant DMS-0901457.
References
1. Acín, A., Brunner, N., Gisin, N., Massar, S., Pironio, S., Scarani, V.: Device-independent security of quantum cryptography against collective attacks. Phys. Rev. Lett. 98, 230501 (2007)
2. Acín, A., Durt, T., Gisin, N., Latorre, J.I.: Quantum nonlocality in two three-level systems. Phys. Rev. A 65, 052325 (2002)
3. Acín, A., Gill, R., Gisin, N.: Optimal Bell tests do not require maximally entangled states. Phys. Rev. Lett. 95, 210402 (2005)
4. Acín, A., Masanes, L., Gisin, N.: From Bell's Theorem to Secure Quantum Key Distribution. Phys. Rev. Lett. 97, 120405 (2006)
5. Barak, B., Hardt, M., Haviv, I., Rao, A., Regev, O., Steurer, D.: Rounding parallel repetitions of unique games. In: Proc. 49th Annual IEEE Symp. on Foundations of Computer Science (FOCS), Piscataway, NJ: IEEE, 2008, pp. 374–383
6. Bell, J.S.: On the Einstein-Podolsky-Rosen paradox. Physics 1, 195 (1964)
7. Ben-Or, M., Hassidim, A., Pilpel, H.: Quantum Multi Prover Interactive Proofs with Communicating Provers. In: Proceedings of 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2008), Piscataway, NJ: IEEE, 2008
8. Blecher, D.P., Paulsen, V.I.: Tensor products of operator spaces. J. Funct. Anal. 99, 262–292 (1991)
9. Brassard, G., Broadbent, A., Tapp, A.: Quantum Pseudo-Telepathy. Foundations of Physics 35(11), 1877–1907 (2005)
10. Briët, J., Buhrman, H., Lee, T., Vidick, T.: Multiplayer XOR games and quantum communication complexity with clique-wise entanglement. http://arXiv.org/abs/0911.4007v1 [quant-ph], 2009
11. Briët, J., Buhrman, H., Toner, B.: A generalized Grothendieck inequality and entanglement in XOR games. http://arXiv.org/abs/0901.2009v1 [quant-ph], 2009
12. Brunner, N., Gisin, N., Scarani, V., Simon, C.: Detection loophole in asymmetric Bell experiments. Phys. Rev. Lett. 98, 220403 (2007)
13. Brunner, N., Pironio, S., Acín, A., Gisin, N., Methot, A.A., Scarani, V.: Testing the Hilbert space dimension. Phys. Rev. Lett. 100, 210503 (2008)
14. Bourgain, J., Casazza, P.G., Lindenstrauss, J., Tzafriri, L.: Banach spaces with a unique unconditional basis, up to a permutation. Memoirs Am. Math. Soc. No. 322, Providence, RI: Amer. Math. Soc., 1985
15. Buhrman, H., Cleve, R., Massar, S., de Wolf, R.: Non-locality and Communication Complexity. Rev. Mod. Phys. 82, 665–698 (2010)
16. Buhrman, H., Scarpa, G., de Wolf, R.: Better Non-Local Games from Hidden Matching. http://arXiv.org/abs/1007.2359v1 [quant-ph], 2010
17. Charikar, M., Makarychev, K., Makarychev, Y.: Near-optimal algorithms for unique games. Proc. 38th ACM STOC, New York: ACM Press, 2006, pp. 205–214
18. Cleve, R., Gavinsky, D., Jain, R.: Entanglement-resistant two-prover interactive proof systems and nonadaptive PIRs. Quantum Information and Computation 9, 648–656 (2009)
19. Cleve, R., Høyer, P., Toner, B., Watrous, J.: Consequences and Limits of Nonlocal Strategies. In: Proceedings of the 19th IEEE Annual Conference on Computational Complexity (CCC 2004), Piscataway, NJ: IEEE, 2004, pp. 236–249
20. Cleve, R., Slofstra, W., Unger, F., Upadhyay, S.: Perfect parallel repetition theorem for quantum XOR proof systems. In: Proc. 22nd IEEE Conference on Computational Complexity, Piscataway, NJ: IEEE, 2007, pp. 109–114
21. Collins, D., Gisin, N., Linden, N., Massar, S., Popescu, S.: Bell inequalities for arbitrarily high-dimensional systems. Phys. Rev. Lett. 88, 040404 (2002)
22. Degorre, J., Kaplan, M., Laplante, S., Roland, J.: The communication complexity of non-signaling distributions. In: Proc. 34th Int. Symp. of the MFCS, Berlin-Heidelberg-New York: Springer-Verlag, 2009, pp. 270–281
23. Defant, A., Floret, K.: Tensor Norms and Operator Ideals. Amsterdam: North-Holland, 1993
24. Doherty, A.C., Liang, Y.-C., Toner, B., Wehner, S.: The quantum moment problem and bounds on entangled multi-prover games. In: Proceedings of IEEE Conference on Computational Complexity 2008, Piscataway, NJ: IEEE, 2008, pp. 199–210
25. Donald, M.J., Horodecki, M., Rudolph, O.: The uniqueness theorem for entanglement measures. Jour. Math. Phys. 43, 4252–4272 (2002)
26. Eberhard, P.: Background level and counter efficiencies required for a loophole free Einstein-Podolsky-Rosen experiment. Phys. Rev. A 47, R747–R750 (1993)
27. Effros, E.G., Ruan, Z.-J.: Operator spaces. London Math. Soc. Monographs New Series, Oxford: Clarendon Press, 2000
28. Einstein, A., Podolsky, B., Rosen, N.: Can Quantum-Mechanical Description of Physical Reality Be Considered Complete? Phys. Rev. 47, 777 (1935)
29. Gisin, N.: Hidden quantum nonlocality revealed by local filters. Phys. Lett. A 210, 151–156 (1996)
30. Holenstein, T.: Parallel repetition: simplifications and the no-signaling case. In: Proceedings of the thirty-ninth annual ACM symposium on Theory of computing (STOC) 2007, New York: ACM Press, 2007
31. Ito, T., Kobayashi, H., Matsumoto, K.: Oracularization and two-prover one-round interactive proofs against nonlocal strategies. In: Proc. 24th IEEE Conference on Computational Complexity, Piscataway, NJ: IEEE, 2009, pp. 217–228
32. Jain, R., Ji, Z., Upadhyay, S., Watrous, J.: QIP=PSPACE. In: Proceedings of the 42nd ACM Symposium on Theory of Computing (STOC 2010), New York: ACM Press, pp. 573–582
33. Junge, M.: Factorization theory for Spaces of Operators. Habilitationsschrift Kiel (1996); see also: Preprint server of the University of Southern Denmark 1999, IMADA preprint: pp. 1999–2002
34. Junge, M., Navascués, M., Palazuelos, C., Pérez-García, D., Scholz, V.B., Werner, R.F.: Connes' embedding problem and Tsirelson's problem. J. Math. Phys. 52, 012102 (2011)
35. Junge, M., Palazuelos, C., Pérez-García, D., Villanueva, I., Wolf, M.M.: Unbounded violations of bipartite Bell Inequalities via Operator Space theory. Commun. Math. Phys. 300(3), 715–739 (2010)
36. Junge, M., Palazuelos, C., Pérez-García, D., Villanueva, I., Wolf, M.M.: Operator Space theory: a natural framework for Bell inequalities. Phys. Rev. Lett. 104, 170405 (2010)
37. Junge, M., Parcet, J.: Mixed-norm inequalities and operator space $L_p$ embedding theory. Mem. Amer. Math. Soc. 203(31), 953 (2010)
38. Junge, M., Pisier, G.: Bilinear forms on exact operator spaces and $B(H) \otimes B(H)$. Geom. Func. Anal. 5, 329–363 (1995)
39. Kempe, J., Kobayashi, H., Matsumoto, K., Toner, B., Vidick, T.: Entangled games are hard to approximate. http://arXiv.org/abs/0704.2903v2 [quant-ph], 2007
40. Kempe, J., Regev, O.: No Strong Parallel Repetition with Entangled and Non-signaling Provers. In: Proc. 25th CCC 10, Piscataway, NJ: IEEE, 2010, pp. 7–15
41. Kempe, J., Regev, O., Toner, B.: The Unique Games Conjecture with Entangled Provers is False. In: Proceedings of 49th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2008), Piscataway, NJ: IEEE, 2008
42. Khot, S.: On the power of unique 2-prover 1-round games. In: Proceedings of the 34th annual ACM Symposium on Theory of Computing, New York: ACM Press, 2002, pp. 767–775
43. Khot, S., Vishnoi, N.K.: The unique games conjecture, integrality gap for cut problems and embeddability of negative type metrics into $\ell_1$. In: Proc. 46th IEEE Symp. on Foundations of Computer Science, Piscataway, NJ: IEEE, 2005, pp. 53–62
44. Kwapien, S., Woyczynski, W.: Random Series and Stochastic Integrals: Single and Multiple. Probab. Appl., Boston, MA: Birkhäuser Boston, 1982
45. Ledoux, M., Talagrand, M.: Probability in Banach Spaces. Berlin-Heidelberg-New York: Springer-Verlag, 1991
46. Liang, Y.-C., Doherty, A.: Bounds on Quantum Correlations in Bell Inequality Experiments. Phys. Rev. A 75, 042103 (2007)
47. Linial, N., Mendelson, S., Schechtman, G., Shraibman, A.: Complexity Measures of Sign Matrices. Combinatorica 27(4), 439–463 (2007)
48. Linial, N., Shraibman, A.: Lower Bounds in Communication Complexity Based on Factorization Norms. Random Structures and Algorithms 34, 368–394 (2009)
49. Marcus, M.B., Pisier, G.: Random Fourier series with applications to harmonic analysis. Annals of Math. Studies 101, Princeton, NJ: Princeton Univ. Press, 1981
50. Masanes, L.: Universally-composable privacy amplification from causality constraints. Phys. Rev. Lett. 102, 140501 (2009)
51. Masanes, Ll., Renner, R., Winter, A., Barrett, J., Christandl, M.: Security of key distribution from causality constraints. http://arXiv.org/abs/quant-ph/0606049v4, 2009
52. Massar, S.: Nonlocality, closing the detection loophole, and communication complexity. Phys. Rev. A 65, 032121 (2002)
53. Methot, A.A., Scarani, V.: An anomaly of non-locality. Quant. Inf. Comput. 7, 157 (2007)
54. Navascués, M., Pironio, S., Acín, A.: Bounding the Set of Quantum Correlations. Phys. Rev. Lett. 98, 010401 (2007)
55. Navascués, M., Pironio, S., Acín, A.: A convergent hierarchy of semidefinite programs characterizing the set of quantum correlations. New J. Phys. 10, 073013 (2008)
56. Paulsen, V.I.: Completely Bounded Maps and Operator Algebras. Cambridge Studies in Advanced Mathematics 78, Cambridge: Cambridge University Press, 2003
57. Pérez-García, D., Wolf, M.M., Palazuelos, C., Villanueva, I., Junge, M.: Unbounded violation of tripartite Bell inequalities. Commun. Math. Phys. 279(2), 455–486 (2008)
58. Pisier, G.: An Introduction to Operator Spaces. London Math. Soc. Lecture Notes Series 294, Cambridge: Cambridge University Press, 2003
59. Pisier, G.: Non-Commutative Vector Valued Lp-Spaces and Completely p-Summing Maps. Asterisque 247 (1998)
60. Pisier, G.: The volume of convex bodies and Banach space geometry. Cambridge Tracts in Mathematics, Vol. 94, Cambridge: Cambridge University Press, 1989
61. Pisier, G.: Factorization of linear operators and geometry of Banach spaces. CBMS 60, Providence, RI: Amer. Math. Soc., 1986
62. Pisier, G.: The operator Hilbert space OH, complex interpolation and tensor norms. Mem. Am. Math. Soc. 585, 122 (1996)
63. Pisier, G.: Probabilistic methods in the geometry of Banach spaces. In: Probability and Analysis (Varenna, Italy, 1985), Lecture Notes Math. 1206, Berlin-Heidelberg-New York: Springer, 1986, pp. 167–241
64. Popescu, S.: Bell's Inequalities and Density Matrices: Revealing "Hidden" Nonlocality. Phys. Rev. Lett. 74, 2619 (1995)
65. Rao, A.: Parallel repetition in projection games and a concentration bound. STOC (2008), Piscataway, NJ: IEEE, 2008
66. Raz, R.: A Parallel Repetition Theorem. SIAM Journal on Computing 27, 763–803 (1998)
67. Raz, R.: A counterexample to strong parallel repetition. In: 49th Annual IEEE Symposium on Foundations of Computer Science, Piscataway, NJ: IEEE, 2008, pp. 369–373
68. Scarani, V., Gisin, N., Brunner, N., Masanes, L., Pino, S., Acín, A.: Secrecy extraction from no-signalling correlations. Phys. Rev. A 74, 042339 (2006)
69. Tomczak-Jaegermann, N.: Banach-Mazur Distances and Finite Dimensional Operator Ideals. Pitman Monographs and Surveys in Pure and Applied Mathematics 38, London: Longman Scientific and Technical, 1989
70. Tsirelson, B.S.: Some results and problems on quantum Bell-type inequalities. Hadronic Journal Supplement 8(4), 329–345 (1993)
71. Vertesi, T., Pal, K.F.: Bounding the dimension of bipartite quantum systems. Phys. Rev. A 79, 042106 (2009)
72. Wehner, S.: Tsirelson bounds for generalized Clauser-Horne-Shimony-Holt inequalities. Phys. Rev. A 73, 022110 (2006)
73. Wehner, S., Christandl, M., Doherty, A.C.: A lower bound on the dimension of a quantum system given measured data. Phys. Rev. A 78, 062112 (2008)
74. Werner, R.F.: Quantum states with Einstein-Podolsky-Rosen correlations admitting a hidden-variable model. Phys. Rev. A 40, 4277 (1989)
75. Werner, R.F., Wolf, M.M.: Bell inequalities and Entanglement. Quant. Inf. Comp. 1(3), 1–25 (2001)

Communicated by M.B. Ruskai
Commun. Math. Phys. 306, 747–776 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1262-5
Communications in
Mathematical Physics
Current in Periodic Lorentz Gases with Twists Hong-Kun Zhang Department of Math. & Statistics, University of Massachusetts, Amherst, MA 01003, USA. E-mail:
[email protected] Received: 8 September 2010 / Accepted: 5 December 2010 Published online: 20 May 2011 – © Springer-Verlag 2011
Abstract: We study the electrical current in a two-dimensional periodic Lorentz gas in the presence of a twist force on the scatterers. In this deterministic system, billiard orbits are still geodesics between collisions, but they do not reflect elastically upon reaching the boundary. When the horizon is finite, i.e. the free flights between collisions are bounded, the resulting current J is proportional to the strength of the twist force, measured by ε. We also prove the existence of a unique SRB measure, for which the Pesin entropy formula and Young's expression for the fractal dimension are valid.

1. Introduction

The Lorentz gas is a popular model in mathematical physics, introduced in 1905 (see [15]) to study the motion of electrons in metals. Here we consider a two-dimensional periodic Lorentz gas on $\tilde Q$ (the plane minus the union of scatterers), which reduces to a dynamical system generated by a point particle moving on $\mathbb{T}^2$ and bouncing off fixed convex scatterers. More precisely, let $B_1, \dots, B_s$ be open convex domains on the unit 2-dimensional torus $\mathbb{T}^2$. Assume that the closures of these domains are pairwise disjoint and that the boundary of each domain is $C^3$ smooth with non-vanishing curvature. The billiard dynamics describes the motion of a particle of unit mass moving in $Q := \mathbb{T}^2 \setminus \cup_i B_i$ according to the equations
$$\dot q = v, \qquad \dot v = 0, \tag{1.1}$$
where $q = (x, y) \in Q$ is the position vector and the unit vector $v$ is the velocity. In classical billiards, the particle reflects elastically upon reaching the boundary $\partial Q = \cup_i \partial B_i$, according to the law of elastic reflection:
$$v^+ = v^- - 2\langle n(q), v^-\rangle\, n(q). \tag{1.2}$$
Here $q \in \partial Q$ is the point of reflection, $n(q)$ is the inward unit normal vector to $\partial Q$ at $q$, and $v^+$, $v^-$ are the outgoing and incoming velocity vectors, respectively. We denote by $\varphi \in [-\pi/2, \pi/2]$ the outgoing angle formed by $v^+$ and $n(q)$, with the convention that $\varphi$ is positive (resp. negative) if it is obtained by rotating from $n(q)$ to $v^+$ in the counterclockwise (resp. clockwise) direction. In particular, the elastic reflection property implies that the incoming angle equals the outgoing one. Since each obstacle $B_i$ is convex, parallel bundles of trajectories diverge upon reflection. Billiards with this property are said to be dispersing, or Sinai billiards. The billiard preserves the kinetic energy $E = \frac12\|v\|^2 = \frac12$, and it generates a flow $\Phi^t_0$ on a compact 3-d manifold $\mathcal{M} := Q \times S^1$. The collision space is defined by
$$M := \{(q, v) \in \mathcal{M} : q \in \partial Q,\ \langle v, n(q)\rangle \ge 0\}, \tag{1.3}$$
which consists of all outgoing velocity vectors at reflection points on $\partial Q$. The first return map $T_0 : M \to M$ is then well defined; it is also called the collision (or billiard) map. The collision space $M$ can be parameterized by $(r, \varphi)$, where $r$ is the arclength parameter along $\partial Q$, oriented positively, and $\varphi \in [-\pi/2, \pi/2]$ is the angle defined above. In these coordinates, $M = \partial Q \times [-\pi/2, \pi/2]$ is made of a finite number of rectangles. It was shown that the flow $\Phi^t_0$ preserves the Liouville measure $\hat\mu_0$ on $\mathcal{M}$ and that $T_0$ preserves the measure $\mu_0$ defined by $d\mu_0 = c \cos\varphi\, dr\, d\varphi$ on $M$, where $c$ is the normalizing constant; see for example [3,9]. For any $x = (r, \varphi) \in M$, let the time of the first collision of the trajectory starting at $x$ be
$$\tau(x) = \min\{t > 0 : \Phi^t_0(x) \in M\}. \tag{1.4}$$
$\tau(x)$ is also known as the free path of $x$ along its forward orbit. The infinite domain $\tilde Q$ can be viewed as the universal cover of $Q$, which is convenient to use for studying the Lorentz gas. For a particle moving in the infinite domain $\tilde Q$, we denote by $\tilde q(t)$ its position at time $t$ and by $\tilde q_n$ its position at the $n$-th collision with $\partial \tilde Q$. Let $\Delta_n = \tilde q_{n+1} - \tilde q_n$ denote the displacement vector between collisions. (Note that we use tildes for notation related to the dynamics in the unbounded region $\tilde Q$.) The limiting distribution of $\tilde q(t)/\sqrt{t}$ was established by Bunimovich, Chernov and Sinai:

Theorem ([2,3]). Suppose that the particle moves freely without any perturbations.
(a) As $t \to \infty$, $\tilde q(t)/\sqrt{t}$ converges in distribution to the normal law $N(0, \hat D_*)$ with a non-degenerate covariance matrix $\hat D_*$ (also called the diffusion matrix). Moreover, the matrix $\hat D_*$ is given by the Green-Kubo formula:
$$\hat D_* = \frac{1}{\bar\tau} \sum_{n=-\infty}^{\infty} \mu_0(\Delta_0 \otimes \Delta_n), \tag{1.5}$$
where $\bar\tau = \mu_0(\tau)$ is the mean free path given by
$$\bar\tau = \mu_0(\tau) = \frac{\pi \cdot \mathrm{Area}(Q)}{\mathrm{length}(\partial Q)} \tag{1.6}$$
and $u \otimes v$ means the tensor product of two vectors.
(b) Similar results hold for the discrete map: as $n \to \infty$, $\tilde q_n/\sqrt{n}$ converges in distribution to $N(0, D_*)$, where $D_* := \bar\tau \hat D_*$.
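As a quick illustration of formula (1.6), the following minimal sketch computes the mean free path for a hypothetical table consisting of a single disc on the unit torus. The function names and the single-disc geometry are assumptions made here for illustration only; such a table has infinite horizon, so it illustrates the geometric formula and not the finite-horizon setting studied in this paper.

```python
import math

def mean_free_path(area_q: float, boundary_length: float) -> float:
    """Mean free path tau_bar = pi * Area(Q) / length(dQ), as in (1.6)."""
    if area_q <= 0 or boundary_length <= 0:
        raise ValueError("area and boundary length must be positive")
    return math.pi * area_q / boundary_length

def single_disc_table(radius: float):
    """Area of Q and length of dQ for one disc of the given radius on the unit torus.

    Hypothetical example geometry, not taken from the paper.
    """
    if not 0 < radius < 0.5:
        raise ValueError("the disc must fit inside the unit torus")
    area_q = 1.0 - math.pi * radius ** 2      # torus area minus the scatterer
    boundary_length = 2.0 * math.pi * radius  # perimeter of the scatterer
    return area_q, boundary_length

area_q, boundary_length = single_disc_table(0.4)
tau_bar = mean_free_path(area_q, boundary_length)
```

For several disjoint scatterers one would subtract each area and add each perimeter before calling `mean_free_path`.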
Young [19] showed that the series (1.5) converges exponentially fast. Here we consider a perturbation of the classical Sinai billiards which also preserves the kinetic energy and satisfies (1.1) between collisions, but for which the reflection upon collision is not elastic. More precisely, let $x = (r, \varphi) \in M$, $T_0 x = (r_1, \varphi_1)$, and let $\pi_2 : M \to [-\pi/2, \pi/2]$ be the natural projection such that $\pi_2(r, \varphi) = \varphi$. Then the perturbed map $T_\varepsilon : M \to M$ satisfies
$$T_\varepsilon(r, \varphi) = (r_1^\varepsilon, \varphi_1^\varepsilon) := T_0 x + \bigl(0, \varepsilon g(\pi_2(T_0 x))\bigr), \tag{1.7}$$
where $\varepsilon \ge 0$ and the twist function $g : [-\pi/2, \pi/2] \to [-\pi/2, \pi/2]$ is a $C^2$ function satisfying
$$g(\pm\pi/2) = 0. \tag{1.8}$$
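To make (1.7) and (1.8) concrete, here is a small sketch of the twist applied to an outgoing angle. The choice g(ϕ) = cos ϕ is a hypothetical example, not taken from the paper; it is smooth and vanishes at ±π/2, and for ε < 1 the map ϕ ↦ ϕ + εg(ϕ) is increasing, so the twisted angle stays in [−π/2, π/2].

```python
import math

def g_example(phi: float) -> float:
    """Hypothetical twist function: smooth, with g(+-pi/2) = 0 as required by (1.8)."""
    return math.cos(phi)

def twisted_angle(phi1: float, eps: float, g=g_example) -> float:
    """Outgoing angle after the twist, phi1 + eps * g(phi1), as in (1.7)."""
    return phi1 + eps * g(phi1)

# Sample the unperturbed outgoing angle phi1 over [-pi/2, pi/2].
angles = [-math.pi / 2 + i * math.pi / 200 for i in range(201)]
twisted = [twisted_angle(p, 0.1) for p in angles]
```

Grazing angles are left fixed by the twist, which is the numerical counterpart of the statement that the perturbation preserves grazing trajectories.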
This defines a perturbation of the classical billiard system. Assumption (1.8) implies that the perturbation preserves grazing trajectories. This assumption is reasonable, as it ensures that the outgoing angle (after the twist) lies in $[-\pi/2, \pi/2]$ and that the inverse map $T_\varepsilon^{-1}$ is well defined. We denote the perturbed flow by $\Phi^t_\varepsilon$ on $\mathcal{M}$. For any $x \in M$, the collision time function $\tau(x)$ does not depend on $\varepsilon$, but the free path of $T_\varepsilon x$ does, and we denote $\tau_1^\varepsilon(x) = \tau(T_\varepsilon x)$. In fact, whenever one of the quantities introduced above depends on $\varepsilon$ through $T_\varepsilon$ or $\Phi^t_\varepsilon$, we add $\varepsilon$ as a subscript or superscript without further comment. Recall that time reversibility of the transformation requires that the reflection be reversible, since the free flight between collisions already has this property in our case. This means that if the incoming angle is $-\varphi_1^\varepsilon$ then the outgoing one should be $-\varphi_1$, i.e. $-\varphi_1^\varepsilon + \varepsilon g(-\varphi_1^\varepsilon) = -\varphi_1$. Combining this with (1.7), we see that $T_\varepsilon$ is reversible if and only if
$$g(\varphi_1) = g(-\varphi_1^\varepsilon). \tag{1.9}$$
This relation also implies that the graph of $y = x + \varepsilon g(x)$ is symmetric about the line $y = -x$. Note that for a reversible system the choice of such a twist function $g$ does depend on $\varepsilon$. In this paper, reversibility is not required. There are many other types of perturbations of billiards. For example, billiards under small forces were studied by N. Chernov in [5,7] using a somewhat opposite approach: the author considered Sinai billiards with elastic collisions under small external forces between collisions. The ergodic and statistical properties were proved, and it was shown that the current generated by the perturbed flow is closely related to the strength of the force. Since we also have a perturbed system, it is important to investigate its ergodic and statistical properties. Furthermore, we would like to estimate the current generated by the system.

2. The Main Results

We first state our assumptions on the model.

(A1) (Finite horizon). There exists $\tau_{\max} > 0$ such that free paths between successive reflections are uniformly bounded, i.e. $\tau(x) < \tau_{\max}$ for all $x \in M$.

In addition, it follows from the assumptions on the scatterers that the free path function is bounded from below: $\tau(x) \ge \tau_{\min} > 0$. By the smoothness of the boundary, the curvature $\mathcal{K}(r)$ of the boundary also has bounds: $0 < \mathcal{K}_{\min} < \mathcal{K}(r) < \mathcal{K}_{\max} < \infty$ for all $r \in \partial Q$.
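Note that by (1.8) and the assumption that $g$ is $C^2$ smooth, there exists $a_g > 0$ such that for any $k = 0, 1, 2$,
$$\|g^{(k)}\|_\infty \le a_g \quad\text{and}\quad \Bigl\|\frac{g(\varphi)}{\cos\varphi}\Bigr\|_\infty \le a_g. \tag{2.1}$$

(A2) (Smallness of the perturbation). We assume $\varepsilon \in [0, \varepsilon_0]$, where $\varepsilon_0$ is small enough.

Note that under our assumptions, $T_\varepsilon$ may not even be a $C^1$ perturbation of $T_0$, as we will show in Sect. 3.2. Next we state the main results of this paper under the assumptions (A1)–(A2).

Theorem 2.1. The map $T_\varepsilon : M \to M$ is a uniformly hyperbolic map with singularities. It admits a unique SRB measure $\mu_\varepsilon$, which is positive on any open set, K-mixing and Bernoulli.
(1) The measure $\mu_\varepsilon$ satisfies the Pesin formula for the K-S entropy,
$$h_{\mu_\varepsilon}(T_\varepsilon) = \tilde\lambda^u_\varepsilon, \tag{2.2}$$
where $\tilde\lambda^s_\varepsilon < 0 < \tilde\lambda^u_\varepsilon$ are the Lyapunov exponents for the measure $\mu_\varepsilon$ and $h_{\mu_\varepsilon}(T_\varepsilon)$ is the K-S entropy of $(T_\varepsilon, \mu_\varepsilon)$.
(2) The measure $\mu_\varepsilon$ satisfies Young's formula for the fractal dimension,
$$HD(\mu_\varepsilon) = h_{\mu_\varepsilon}(T_\varepsilon)\Bigl(\frac{1}{\tilde\lambda^u_\varepsilon} - \frac{1}{\tilde\lambda^s_\varepsilon}\Bigr), \tag{2.3}$$
where $HD(\mu_\varepsilon)$ denotes the fractal dimension of $\mu_\varepsilon$.
(3) The flow $\Phi^t_\varepsilon$ preserves a unique SRB measure $\hat\mu_\varepsilon$ on $\mathcal{M}$.
(4) The measure $\hat\mu_\varepsilon$ is ergodic and mixing; its K-S entropy is given by $h_{\hat\mu_\varepsilon}(\Phi^1_\varepsilon) := h_{\mu_\varepsilon}(T_\varepsilon)/\mu_\varepsilon(\tau)$ and its fractal dimension by $HD(\hat\mu_\varepsilon) = HD(\mu_\varepsilon) + 1$.

The next theorem provides estimates of the current for the discrete system.

Theorem 2.2. Let $\Delta(x) = \tilde q_1 - \tilde q = (\Delta_x, \Delta_y)$ denote the displacement vector between collisions, where $\tilde q = (x, y)$.
(a) For small enough $\varepsilon$, the discrete-time electrical current is well defined:
$$J = \lim_{n\to\infty} \tilde q_n/n = \mu_\varepsilon(\Delta). \tag{2.4}$$
(b) The current $J$ satisfies
$$J = \varepsilon\sigma + o(\varepsilon), \tag{2.5}$$
where $\sigma = (\sigma_x, \sigma_y)$ and, for $a \in \{x, y\}$,
$$\sigma_a = \sum_{k=1}^{\infty} \mu_0\bigl[(\Delta_a \circ T_0^k) \cdot G\bigr], \tag{2.6}$$
where $G(x) = -g'(\varphi_1) + g(\varphi_1)\tan\varphi_1$ and $\varphi_1$ is the outgoing angle of $T_0 x = (r_1, \varphi_1)$.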
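(c) As $n \to \infty$,
$$\frac{\tilde q_n - Jn}{\sqrt{n}} \Rightarrow N(0, D_\varepsilon), \tag{2.7}$$
where the above convergence means that the distributions of $(\tilde q_n - Jn)/\sqrt{n}$ converge to the normal distribution $N(0, D_\varepsilon)$, and $D_\varepsilon$ is the discrete-time diffusion matrix of $T_\varepsilon$ given by
$$D_\varepsilon = \sum_{n=-\infty}^{\infty} \bigl(\mu_\varepsilon[\Delta \circ T_\varepsilon^n \otimes \Delta] - \mu_\varepsilon(\Delta) \otimes \mu_\varepsilon(\Delta)\bigr). \tag{2.8}$$
(d) $D_\varepsilon$ is continuous in $\varepsilon$ at $\varepsilon = 0$, with
$$D_\varepsilon = D_* + o(1). \tag{2.9}$$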
The corresponding results for the continuous-time system are provided by the following theorem.

Theorem 2.3. For the particle moving in the domain $\tilde Q$ under the perturbation:
(a) The current
$$\hat J = \lim_{t\to\infty} \tilde q(t)/t = \mu_\varepsilon(\Delta)/\mu_\varepsilon(\tau) \tag{2.10}$$
is well defined.
(b) The current $\hat J$ satisfies
$$\hat J = \varepsilon\hat\sigma + o(\varepsilon), \tag{2.11}$$
where $\hat\sigma = (\hat\sigma_x, \hat\sigma_y)$ and, for $a \in \{x, y\}$, $\hat\sigma_a = \sigma_a/\bar\tau$.
(c) As $t \to \infty$,
$$\frac{\tilde q(t) - \hat J t}{\sqrt{t}} \Rightarrow N(0, \hat D_\varepsilon), \tag{2.12}$$
where $\hat D_\varepsilon$ is the corresponding diffusion matrix.
(d) The diffusion matrix is continuous in $\varepsilon$ at $\varepsilon = 0$, and
$$\hat D_\varepsilon = \hat D_* + o(1). \tag{2.13}$$

Theorem 2.4. Let $h_0 = h_{\mu_0}(T_0)$ be the K-S entropy of $T_0$. Then
$$HD(\hat\mu_\varepsilon) = 3 - \frac{\varepsilon^2}{2h_0} \sum_{k=-\infty}^{\infty} \mu_0[G \circ T_0^k \cdot G] + o(\varepsilon^2),$$
where $G$ is defined in (2.6) and satisfies $\mu_0(G) = 0$.

We first investigate the hyperbolicity of the perturbed system in Sect. 3. The regularity of a class of invariant unstable curves is proved in Sect. 4, and the growth lemma is studied in Sect. 5. Section 6 proves a partial result for Theorem 2.1, namely that $T_\varepsilon$ admits at least one and at most finitely many SRB measures. Furthermore, every SRB measure is K-mixing and Bernoulli, up to a finite cycle. The uniqueness of such an SRB measure and the rest of Theorem 2.1 are proved in Sect. 7 using the exponential decay of correlations of $(T_\varepsilon, \mu_\varepsilon)$ and related properties. The last section is devoted to proving Theorem 2.2, Theorem 2.3 and Theorem 2.4.
3. Hyperbolic Properties of $T_\varepsilon$

In this section we show that hyperbolicity is stable under the small perturbations described by assumptions (A1)–(A2). First we introduce some notation and review some useful formulas for the original map $T_0$; the derivation of these formulas can be found in many references, for example [9]. For any $x, y \in M$, denote by $d(x, y)$ the Euclidean distance (defined by the $(r, \varphi)$-coordinates) between $x$ and $y$ in $M$. For any set $A \subset M$, denote by $m_A$ the Lebesgue measure on $A$.

3.1. Hyperbolicity of the original system $(T_0, M)$. For any $x = (r, \varphi)$, denote $x_1 = (r_1, \varphi_1) := T_0 x$. It was shown in [9] that the derivative $DT_0$ at the point $x = (r, \varphi) \in M$ is a $2 \times 2$ matrix:
$$\cos\varphi_1\, D_x T_0 = \begin{pmatrix} \tau\mathcal{K} + \cos\varphi & \tau \\ \mathcal{K}_1(\tau\mathcal{K} + \cos\varphi) + \mathcal{K}\cos\varphi_1 & \tau\mathcal{K}_1 + \cos\varphi_1 \end{pmatrix}, \tag{3.1}$$
where $\mathcal{K}_1 = \mathcal{K}(r_1)$. Furthermore, the map $T_0$ has two families of cones $\bar C^u_x$ (unstable) and $\bar C^s_x$ (stable) in the tangent spaces $T_x M$, for all $x \in M$. More precisely, the unstable cone $\bar C^u_x$ contains all tangent vectors based at $x$ whose images generate dispersing wave fronts:
$$\bar C^u_x = \{(dr, d\varphi) \in T_x M : \mathcal{K}(r) \le d\varphi/dr \le \mathcal{K}(r) + \tau_{\min}^{-1}\}.$$
For any $dx = (dr, d\varphi) \in \bar C^u_x$, one can check that the slope of $dx_1 = (dr_1, d\varphi_1) = D_x T_0\, dx$ satisfies
$$\mathcal{K}(r_1) + \frac{\cos\varphi_1}{\tau(x)} \ge V_1 := d\varphi_1/dr_1 \ge \mathcal{K}(r_1) + \frac{\cos\varphi_1}{\tau(x) + \cos\varphi/(2\mathcal{K}(r))}. \tag{3.2}$$
Note that $\cos\varphi > 0$ for any $x = (r, \varphi) \in \partial Q \times (-\pi/2, \pi/2)$. This implies that the unstable cone $\bar C^u$ is strictly invariant under $DT_0$ if $|\varphi| \ne \pi/2$. Before studying the expansion factor for vectors in unstable cones, we introduce two norms that are commonly used in billiard systems. Let $dx$ be a vector in $T_x M$. We define the p-norm of $dx$ as $\|dx\|_p = \cos\varphi\,|dr|$. The Euclidean norm is
$$\|dx\| = \sqrt{dr^2 + d\varphi^2} = |dr|\sqrt{1 + V^2}, \tag{3.3}$$
where $V = d\varphi/dr$. Note that
$$\cos\varphi\,\|dx\| = \sqrt{1 + V^2}\,\|dx\|_p, \tag{3.4}$$
which implies that $\|dx\|\cos\varphi$ and $\|dx\|_p$ are equivalent for vectors in $\bar C^u$ or $\bar C^s$. Let $dx \in \bar C^u_x$ be any nonzero vector and $x_1 = T_0 x = (r_1, \varphi_1)$. Then
$$\frac{\|dx_1\|_p}{\|dx\|_p} \ge 1 + \frac{2\mathcal{K}\tau(x)}{\cos\varphi} \ge \hat\Lambda_p := 1 + 2\tau_{\min}\mathcal{K}_{\min}. \tag{3.5}$$
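Furthermore, the expansion in the Euclidean norm satisfies
$$\frac{\|dx_1\|}{\|dx\|} = \frac{\cos\varphi}{\cos\varphi_1}\,\frac{\sqrt{1 + V_1^2}}{\sqrt{1 + V^2}}\,\frac{\|dx_1\|_p}{\|dx\|_p} \ge \hat\Lambda_e := C\hat\Lambda_p,$$
where $\hat\Lambda_e > 0$ is called the minimal expansion factor for $T_0$ in the Euclidean norm. Note that $\hat\Lambda_e$ may not be larger than 1, but one can find a large $N > 1$ such that $T_0^N$ has a uniform minimal expansion rate larger than 1. In addition,
$$\frac{\|dx_1\|}{\|dx\|} \ge \mathrm{const}\cdot\frac{\cos\varphi}{\cos\varphi_1}\Bigl(1 + \frac{2\tau_{\min}\mathcal{K}_{\min}}{\cos\varphi}\Bigr) \ge \mathrm{const}\cdot\frac{2\tau_{\min}\mathcal{K}_{\min}}{\cos\varphi_1}.$$
Similarly, the stable cone $\bar C^s_x$ is
$$\bar C^s_x = \{(dr, d\varphi) \in T_x M : -\mathcal{K}(r) \ge d\varphi/dr \ge -\mathcal{K}(r) - \tau_{\min}^{-1}\}.$$
One can also check that the stable cones are strictly invariant under $DT_0^{-1}$, and that any nonzero vector in the stable cone is uniformly contracted under $DT_0$, at least by $\hat\Lambda_p^{-1}$ and $\hat\Lambda_e^{-1}$ in the p-norm and the Euclidean norm, respectively.

3.2. Hyperbolicity of the perturbed system $(T_\varepsilon, M)$. Now we consider the perturbed map $T_\varepsilon$. For any phase point $x = (r, \varphi) \in M$, denote $T_0 x = (r_1, \varphi_1)$ and $T_\varepsilon x = (r_1^\varepsilon, \varphi_1^\varepsilon)$. According to (1.7),
$$dr_1^\varepsilon = dr_1 \quad\text{and}\quad d\varphi_1^\varepsilon = d\varphi_1\bigl(1 + \varepsilon g'(\varphi_1)\bigr). \tag{3.6}$$
This implies
$$D_x T_\varepsilon = \begin{pmatrix} 1 & 0 \\ 0 & 1 + \varepsilon g'(\varphi_1) \end{pmatrix} D_x T_0 = D_x T_0 + \varepsilon g'(\varphi_1) \begin{pmatrix} 0 & 0 \\ \frac{\mathcal{K}_1}{\cos\varphi_1}(\tau\mathcal{K} + \cos\varphi) + \mathcal{K} & \frac{\tau\mathcal{K}_1}{\cos\varphi_1} + 1 \end{pmatrix}. \tag{3.7}$$
Combining this with assumptions (A1)–(A2), one can show that $T_\varepsilon$ is not a $C^1$ perturbation of $T_0$. More precisely, it follows from (A1)–(A2) and (3.7) that for any $x \in M$,
$$\|D_x T_\varepsilon - D_x T_0\| \ge C\varepsilon\,\frac{|g'(\varphi_1)|}{\cos\varphi_1}, \tag{3.8}$$
which is unbounded near the boundary of $M$. Furthermore, the perturbed flow $\Phi^t_\varepsilon$ may not preserve the volume, and the map $T_\varepsilon$ no longer preserves $\mu_0$. In particular, one can check that
$$|\det D_x T_\varepsilon| = |\det D_x T_0|\,(1 + \varepsilon g'(\varphi_1)) = \frac{\cos\varphi}{\cos\varphi_1}\,(1 + \varepsilon g'(\varphi_1)). \tag{3.9}$$
An inductive argument leads to the $n$-th Jacobian:
$$\frac{dT_\varepsilon^{-n} m}{dm}(x) = \det D_x T_\varepsilon^n = \det D_x T_0^n \cdot \prod_{i=1}^{n} \bigl(1 + \varepsilon g'(\pi_2 T_0^i x)\bigr), \tag{3.10}$$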
where $\pi_2 : M \to [-\pi/2, \pi/2]$, $\pi_2(r, \varphi) = \varphi$, is the projection from $M$ to the angle space. This implies that for any $n \ge 1$,
$$(1 + \varepsilon_0 a_g)^{-n} \le \frac{|\det D_x T_\varepsilon^n|}{|\det D_x T_0^n|} \le (1 + \varepsilon_0 a_g)^n. \tag{3.11}$$
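The determinant identity (3.9) can be checked numerically from the matrix in (3.1). The sketch below builds the derivative matrices of the unperturbed and twisted maps for sample values of τ, K, K_1, ϕ, ϕ_1 (all chosen arbitrarily here) and a hypothetical twist with g(ϕ) = cos ϕ, so that g'(ϕ) = −sin ϕ. It is an illustration under these assumptions, not part of the paper's argument.

```python
import math

def dT0(tau, K, K1, phi, phi1):
    """2x2 derivative of the unperturbed map in (r, phi) coordinates, from (3.1)."""
    c1 = math.cos(phi1)
    a = (tau * K + math.cos(phi)) / c1
    b = tau / c1
    c = (K1 * (tau * K + math.cos(phi)) + K * c1) / c1
    d = (tau * K1 + c1) / c1
    return ((a, b), (c, d))

def dTeps(tau, K, K1, phi, phi1, eps, dg):
    """Derivative of the twisted map: second row scaled by 1 + eps * g'(phi1), as in (3.7)."""
    (a, b), (c, d) = dT0(tau, K, K1, phi, phi1)
    s = 1.0 + eps * dg(phi1)
    return ((a, b), (s * c, s * d))

def det2(m):
    """Determinant of a 2x2 matrix given as nested tuples."""
    (a, b), (c, d) = m
    return a * d - b * c

# Arbitrary sample values (assumptions for illustration only).
tau, K, K1, phi, phi1, eps = 0.7, 2.0, 3.0, 0.3, -0.5, 0.05
dg = lambda p: -math.sin(p)  # derivative of the hypothetical twist g = cos

det0 = det2(dT0(tau, K, K1, phi, phi1))
det_eps = det2(dTeps(tau, K, K1, phi, phi1, eps, dg))
```

Here `det0` agrees with cos ϕ / cos ϕ_1 and `det_eps` with that value times 1 + εg'(ϕ_1), up to rounding.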
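We are also interested in the density function defined by $J^\varepsilon(x) := dT_\varepsilon^{-1}\mu_0/d\mu_0(x)$.

Lemma 3.1. Let $\varepsilon \in [0, \varepsilon_0]$. For any $x \in M$, $J^\varepsilon(x)$ is differentiable in $\varepsilon$ and satisfies
$$1 - J^\varepsilon(x) = \varepsilon G(x) + \varepsilon^2 R_\varepsilon, \tag{3.12}$$
where $G(x) = -g'(\varphi_1) + g(\varphi_1)\tan\varphi_1$ with $\mu_0(G) = 0$, and $R_\varepsilon$ is continuous on $M$. In addition, there exists $C > 0$ such that for any $n \ge 1$,
$$C(1 + \varepsilon_0 a_g)^{-n} \le \frac{dT_\varepsilon^{-n}\mu_0}{d\mu_0}(x) \le C(1 + \varepsilon_0 a_g)^n. \tag{3.13}$$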
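Proof. For any $n \ge 1$, let $T_\varepsilon^n x = (r_n^\varepsilon, \varphi_n^\varepsilon)$ and $T_0^n x = (r_n, \varphi_n)$. Then
$$\frac{dT_\varepsilon^{-n}\mu_0(x)}{d\mu_0(x)} = \frac{dT_\varepsilon^{-n}\mu_0(x)}{dT_\varepsilon^{-n}m(x)} \cdot \frac{dT_\varepsilon^{-n}m(x)}{dm(x)} \cdot \frac{dm(x)}{d\mu_0(x)} = \frac{d\mu_0(T_\varepsilon^n x)}{dm(T_\varepsilon^n x)} \cdot \frac{dT_\varepsilon^{-n}m(x)}{dm(x)} \cdot \frac{dm(x)}{d\mu_0(x)}$$
$$= \cos\varphi_n^\varepsilon \cdot |\det D_x T_\varepsilon^n| \cdot \frac{1}{\cos\varphi} = \frac{\cos\varphi_n^\varepsilon}{\cos\varphi_n} \prod_{i=1}^{n} \bigl(1 + \varepsilon g'(\pi_2 T_0^i x)\bigr),$$
where we have used (3.10) in the last equation. The smoothness of $J^\varepsilon(x)$ follows from that of $\varphi_1^\varepsilon$. By (2.1), for any $n \ge 1$,
$$\cos\varphi_n^\varepsilon = \cos\varphi_n - \varepsilon g(\varphi_n)\sin\varphi_n - \frac{\varepsilon^2 g^2(\varphi_n)}{2}\cos\varphi_n + O(\varepsilon^3). \tag{3.14}$$
This implies that
$$\frac{\cos\varphi_n^\varepsilon}{\cos\varphi_n} = 1 - \varepsilon g(\varphi_n)\tan\varphi_n - \frac{\varepsilon^2 g^2(\varphi_n)}{2} + O(\varepsilon^3) \le 1 + 2a_g\varepsilon. \tag{3.15}$$
It follows from the invariance of $\mu_0$ under $T_0$ that $J^0(x) = dT_0^{-1}\mu_0(x)/d\mu_0(x) = 1$. Thus we get the expansion of $J^\varepsilon$ at $\varepsilon = 0$:
$$J^\varepsilon(x) = J^0(x) - \varepsilon G(x) - \varepsilon^2 R_\varepsilon, \tag{3.16}$$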
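where $G(x) = -g'(\varphi_1) + g(\varphi_1)\tan\varphi_1$ and
$$R_\varepsilon(x) = \frac{1}{2}g^2(\varphi_1) + g'(\varphi_1)g(\varphi_1)\tan\varphi_1 + O(\varepsilon).$$
Note that $d\mu_0 = c\cos\varphi\, d\varphi\, dr$. This implies that
$$\mu_0(G) = c\int_{\partial Q}\int_{[-\pi/2,\pi/2]} \bigl(-g'(\varphi_1) + g(\varphi_1)\tan\varphi_1\bigr)\cos\varphi_1\, d\varphi_1\, dr_1 = 0.$$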
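For completeness, the vanishing of this integral can be seen by a short computation (a routine step spelled out here, not quoted from the paper): the integrand is an exact derivative in $\varphi_1$, and the boundary terms vanish because $\cos(\pm\pi/2) = 0$ (and also $g(\pm\pi/2) = 0$ by (1.8)).

```latex
\[
\bigl(-g'(\varphi_1) + g(\varphi_1)\tan\varphi_1\bigr)\cos\varphi_1
  = -g'(\varphi_1)\cos\varphi_1 + g(\varphi_1)\sin\varphi_1
  = -\frac{d}{d\varphi_1}\bigl(g(\varphi_1)\cos\varphi_1\bigr),
\]
\[
\int_{-\pi/2}^{\pi/2} \bigl(-g'(\varphi_1) + g(\varphi_1)\tan\varphi_1\bigr)\cos\varphi_1 \, d\varphi_1
  = -\Bigl[\, g(\varphi_1)\cos\varphi_1 \Bigr]_{-\pi/2}^{\pi/2} = 0 .
\]
```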
By assumption (A1), both $G$ and $R_\varepsilon$ are continuous. The estimate (3.13) follows from (3.14) and (3.11).

The next proposition states that although the perturbed maps do not have the same families of stable/unstable manifolds, they share the same families of stable/unstable cones.

Proposition 3.2. There exist two families of cones $C^u_x$ (unstable) and $C^s_x$ (stable) in the tangent spaces $T_x M$ and a constant $\Lambda_p > 1$, such that for all $\varepsilon \in [0, \varepsilon_0]$ and all $x \in M$:
(1) $DT_\varepsilon(C^u_x) \subset C^u_{T_\varepsilon x}$ and $DT_\varepsilon(C^s_x) \supset C^s_{T_\varepsilon x}$ whenever $DT_\varepsilon$ exists.
(2) These families of cones are continuous on $M$, and the angle between $C^u_x$ and $C^s_x$ is uniformly bounded away from zero.
(3) $\|D_x T_\varepsilon(v)\|_p \ge \Lambda_p\|v\|_p$ for all $v \in C^u_x$, and $\|D_x T_\varepsilon^{-1}(v)\|_p \ge \Lambda_p\|v\|_p$ for all $v \in C^s_x$.
(4) Let $\Lambda^\varepsilon_e$, $\Lambda^\varepsilon_p$ be the minimal expansion factors for unstable vectors of $T_\varepsilon$ in the Euclidean norm and the p-norm, respectively. Then
$$\Lambda^\varepsilon_e = \hat\Lambda_e + O(\varepsilon) \quad\text{and}\quad \Lambda^\varepsilon_p = \hat\Lambda_p + O(\varepsilon) \ge \Lambda_p. \tag{3.17}$$
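Proof. For $x \in M$ and any unit vector $dx \in T_x M$, let $dx_1^\varepsilon = D_x T_\varepsilon\, dx$. Then by (3.6) the slope $V_1^\varepsilon$ of the vector $dx_1^\varepsilon$ at $T_\varepsilon x = (r_1^\varepsilon, \varphi_1^\varepsilon)$ satisfies
$$V_1^\varepsilon = V_1\bigl(1 + \varepsilon g'(\varphi_1)\bigr) \ge \Bigl(\mathcal{K}(r_1) + \frac{\cos\varphi_1}{\tau + \cos\varphi/(2\mathcal{K}(r))}\Bigr)\bigl(1 + \varepsilon g'(\varphi_1)\bigr) \ge \mathcal{K}(r_1)\bigl(1 + \varepsilon g'(\varphi_1)\bigr). \tag{3.18}$$
So the cone $\bar C^u$ may not be invariant under $DT_\varepsilon$. Accordingly, we define a slightly bigger cone,
$$C^u_x = \{(dr, d\varphi) \in T_x M : \mathcal{K}(r)(1 - \varepsilon_0 a_g) \le d\varphi/dr \le (\mathcal{K}(r) + \tau_{\min}^{-1})(1 + \varepsilon_0 a_g)\},$$
where we used Assumption (A2) to ensure that $a_g\varepsilon_0 < 1/2$. According to (3.2) and (3.6) we get
$$\bigl(1 + \varepsilon g'(\varphi_1)\bigr)\Bigl(\mathcal{K}(r_1) + \frac{\cos\varphi_1}{\tau(x)}\Bigr) \ge V_1^\varepsilon := d\varphi_1^\varepsilon/dr_1 \ge \bigl(1 + \varepsilon g'(\varphi_1)\bigr)\Bigl(\mathcal{K}(r_1) + \frac{\cos\varphi_1}{\tau(x) + \cos\varphi/(2\mathcal{K}(r))}\Bigr). \tag{3.19}$$
Note that $\cos\varphi > 0$ for any $x = (r, \varphi) \in \mathrm{int}\, M$. This implies that the unstable cone $C^u$ is strictly invariant under $DT_\varepsilon$ for any $\varepsilon \le \varepsilon_0$.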
Similarly we define the stable cone $C^s_x$ as
$$C^s_x = \{(dr, d\varphi) \in T_x M : -\mathcal{K}(r)(1 - \varepsilon_0 a_g) \ge d\varphi/dr \ge -(\mathcal{K}(r) + \tau_{\min}^{-1})(1 + \varepsilon_0 a_g)\}.$$
Then one can check that the stable cone $C^s$ is strictly invariant under $DT_\varepsilon^{-1}$, whenever $DT_\varepsilon^{-1}$ exists, for any $\varepsilon \in [0, \varepsilon_0]$. Observe that by (2.1) and (3.14), for any $\varepsilon \in [0, \varepsilon_0]$,
$$\frac{\|dx_1^\varepsilon\|_p}{\|dx\|_p} = \frac{\|dx_1\|_p}{\|dx\|_p}\,\frac{\cos\varphi_1^\varepsilon}{\cos\varphi_1} \ge \Lambda_p := \hat\Lambda_p(1 - 2a_g\varepsilon_0).$$
By choosing $\varepsilon_0 < \frac{1}{4a_g}$, one gets $\Lambda_p > 1$. Furthermore, (3.4) implies that
$$\frac{\|dx_1^\varepsilon\|}{\|dx\|} = \frac{\|dx_1\|}{\|dx\|}\sqrt{\frac{1 + V_1^2(1 + \varepsilon g')^2}{1 + V_1^2}} \ge \frac{\|dx_1\|}{\|dx\|} \ge \hat\Lambda_e.$$
Thus the hyperbolicity is uniform in both metrics, and the minimal expansion factors are bounded from below uniformly in $\varepsilon$. Note that the above relations also imply that at nearly grazing collisions
$$\frac{\|dx_1^\varepsilon\|_p}{\|dx\|_p} \asymp \frac{\tau(x)}{\cos\varphi} \quad\text{and}\quad \frac{\|dx_1^\varepsilon\|}{\|dx\|} \asymp \frac{\tau(x)}{\cos\varphi_1^\varepsilon}. \tag{3.20}$$
Here we write $F \asymp G$ for two quantities iff $C_1 \le F/G \le C_2$ for some constants $0 < C_1 < C_2$ depending only on $Q$. Similarly one can show properties (1)–(3) for stable cones, which we will not repeat here. The last statement follows directly.

It follows from Proposition 3.2 that the perturbed maps $T_\varepsilon$ share the same families of stable (resp. unstable) cones for all $\varepsilon < \varepsilon_0$. Every $C^2$ smooth curve $W$ in $M$ all of whose tangent vectors belong to unstable (resp. stable) cones is called an unstable (resp. stable) curve. If all preimages (resp. images) of an unstable (resp. stable) curve $W$ are unstable (resp. stable) curves, $W$ is called an unstable (resp. stable) manifold. Note that $T_\varepsilon$ may have a different set of stable/unstable manifolds compared to $T_0$. As a result, we will mostly use stable/unstable curves instead of the corresponding manifolds. We review some basic facts about stable and unstable curves. Stable curves have negative slope and unstable curves have positive slope. These curves cannot self-intersect. If $W^u$ is an unstable curve and $W^s$ a stable curve, then their intersection $W^u \cap W^s$ is transversal and consists of at most one point. Let $|W|$ be the length of $W$ and $m_W$ the Lebesgue measure on $W$.
4. Regularity of Stable (resp. Unstable) Curves In this section we will show that there is a class of C 2 smooth unstable curves in M which is invariant under Tε , for any ε ∈ [0, ε0 ]. Furthermore these curves are regular in the sense that they have curvature bounds, distortion bounds, absolute continuity, etc.
4.1. Curvature bounds. The next lemma was proved for $T_0$ in [6,9]; it states that the images of an unstable curve are essentially flattened under the map.

Lemma 4.1. Let $W \subset M$ be a $C^2$-smooth unstable curve with equation $\varphi_0 = \varphi_0(r_0)$. Then every smooth curve $W' \subset T_0^n(W)$ with equation $\varphi_n = \varphi_n(r_n)$ satisfies
$$\Bigl|\frac{d^2\varphi_n}{dr_n^2}\Bigr| \le C_1 + \theta^{3n}\Bigl|\frac{d^2\varphi_0}{dr_0^2}\Bigr|, \tag{4.1}$$
where $C_i = C_i(Q)$, $i = 1, 2$, are constants and $\theta \in (0, 1)$. Furthermore, for any regular unstable curve $W$, there exists $n_W \ge 1$ such that for any $n > n_W$, every smooth curve of $T_0^n W$ has uniformly bounded curvature.

In fact one can obtain a similar bounded curvature property for the perturbed map $T_\varepsilon$. More precisely, for any $n \ge 1$, let $\bar x_n = (\bar r_n, \bar\varphi_n) := T_\varepsilon^n x$; then we have:

Proposition 4.2 (Curvature bounds). Let $W$ be any $C^2$ smooth unstable curve. Then there exists $n_W \ge 1$ such that for any $n > n_W$, every smooth curve $W' \subset T_\varepsilon^n W$ with equation $\bar\varphi_n = \bar\varphi_n(\bar r_n)$ has curvature uniformly bounded from above by $C_b > 0$. More precisely, there exists $C_b = C_b(Q) > 0$ such that
$$|d^2\bar\varphi_n/d\bar r_n^2| \le C_b. \tag{4.2}$$

Proof. We fix any phase point $x \in W$. According to (3.6), the slope of the vector $DT_\varepsilon\, dx$ satisfies
$$d\bar\varphi_1/d\bar r_1 = \bigl(1 + \varepsilon g'(\varphi_1)\bigr)\, d\varphi_1/dr_1. \tag{4.3}$$
We differentiate the above equality with respect to $\bar r_1$ and get
$$\frac{d^2\bar\varphi_1}{d\bar r_1^2} = \bigl(1 + \varepsilon g'(\varphi_1)\bigr)\frac{d^2\varphi_1}{dr_1^2} + \varepsilon g''(\varphi_1)\Bigl(\frac{d\varphi_1}{dr_1}\Bigr)^2.$$
Using the same notation as in Lemma 4.1, by (2.1) we get for some $C_0 > 0$,
$$\Bigl|\frac{d^2\bar\varphi_1}{d\bar r_1^2}\Bigr| \le C_0 + (1 + \varepsilon_0 a_g)\theta^3\Bigl|\frac{d^2\bar\varphi_0}{d\bar r_0^2}\Bigr|, \tag{4.4}$$
where $d^2\bar\varphi_0/d\bar r_0^2 = d^2\varphi_0/dr_0^2$. According to (A2), by choosing $\varepsilon_0$ small one can make $(1 + \varepsilon_0 a_g)\theta^2 < 1$. Then we have for any $n \ge 1$,
$$\Bigl|\frac{d^2\bar\varphi_n}{d\bar r_n^2}\Bigr| \le \frac{C_0}{1 - \theta} + \theta^n\Bigl|\frac{d^2\bar\varphi_0}{d\bar r_0^2}\Bigr|.$$
Since $W$ is $C^2$, there exists $C_1 = C_1(W) > 0$ such that $|d^2\bar\varphi_0/d\bar r_0^2| < C_1$. We fix a constant $C_b = C_b(Q) > 0$ and define
$$n_W = \Bigl|\frac{\ln(C_b/C_1)}{\ln\theta}\Bigr|.$$
Then for any $n > n_W$, images in $T_\varepsilon^n W$ have equation $\bar\varphi_n = \bar\varphi_n(\bar r_n)$ with second derivative bounded from above by $C_b$.
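The passage from the one-step bound (4.4) to the n-step bound is a geometric-series argument: if a_{k+1} ≤ C_0 + θ a_k with 0 < θ < 1, then a_n ≤ C_0/(1 − θ) + θ^n a_0. The sketch below iterates the worst case numerically. The values of C_0, θ and a_0 are arbitrary placeholders, and θ stands for the effective contraction factor once ε_0 has been chosen small.

```python
def iterate_worst_case(a0: float, c0: float, theta: float, n: int) -> float:
    """Apply a -> c0 + theta * a exactly n times, starting from a0."""
    a = a0
    for _ in range(n):
        a = c0 + theta * a
    return a

def closed_form_bound(a0: float, c0: float, theta: float, n: int) -> float:
    """Upper bound c0 / (1 - theta) + theta**n * a0 for the n-th iterate."""
    return c0 / (1.0 - theta) + theta ** n * a0

# Placeholder values, not taken from the paper.
a0, c0, theta = 50.0, 2.0, 0.6
gaps = [
    closed_form_bound(a0, c0, theta, n) - iterate_worst_case(a0, c0, theta, n)
    for n in range(40)
]
```

Each entry of `gaps` equals C_0 θ^n/(1 − θ) up to rounding, so the closed form dominates the iterates and the initial curvature term is forgotten exponentially fast.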
From now on we fix the constants $C_b > 0$ and $c_M > 0$ (the latter is very small and will be chosen in Subsect. 4.4), and define $\mathcal{W}^u$ to be the class of all unstable curves $W$ with length $|W| < c_M$ whose curvature is uniformly bounded by $C_b$. It follows from the above proposition that the class $\mathcal{W}^u$ is invariant under $T_\varepsilon$ for all $\varepsilon \in [0, \varepsilon_0]$. Any unstable curve $W \in \mathcal{W}^u$ is called a regular unstable curve. We will show in later subsections that regular unstable curves also have bounded distortion and absolute continuity properties.

4.2. Singular curves. The billiard map $T_\varepsilon$ has discontinuities, which are analyzed here. We first note that the singular curves for the perturbed system $(T_\varepsilon, M)$ are the same as those of the original system $(T_0, M)$. The main reason is that, by assumption (1.8), the grazing collisions are preserved:
$$T_\varepsilon^{-1}(r, \pm\pi/2) = T_0^{-1}(r, \pm\pi/2), \qquad \forall r \in \partial Q. \tag{4.5}$$
Furthermore, since we assume that the billiard wall does not have any corners, the discontinuity set of $T_\varepsilon$ is made of the preimage of $S_0 := \{\varphi = \pm\pi/2\}$. To overcome some technical difficulties, in particular to control the distortion bounds for the dynamics, we also need to add some artificial curves to the singular set. Next we explain the construction of these additional singular curves. Let $W$ be an unstable curve on which $T_\varepsilon^n$ is continuous for some $n > 1$; then by the hyperbolic property the length of $T_\varepsilon^n W$ grows exponentially compared to that of $W$. We need to compare the expansion rates at different points $x, y \in W$ and ensure that those rates vary slowly over $W$ (this property is referred to as the distortion bound). However, at almost grazing reflections, i.e. $\cos\varphi \approx 0$, the expansion of unstable curves is highly nonuniform, and so the distortions are unbounded. To overcome this difficulty we divide the space $M$ into homogeneity strips $I_k$ as in [3], for $k \ge k_0$, where $k_0$ is a large integer:
$$I_k = \{(r, \varphi) : \pi/2 - k^{-2} < \varphi < \pi/2 - (k+1)^{-2}\}$$
and
$$I_{-k} = \{(r, \varphi) : -\pi/2 + (k+1)^{-2} < \varphi < -\pi/2 + k^{-2}\}.$$
We also put
$$I_0 = \{(r, \varphi) : -\pi/2 + k_0^{-2} < \varphi < \pi/2 - k_0^{-2}\}.$$
Denote
$$S^H_{\pm k} = \{(r, \varphi) : \varphi = \pm\pi/2 \mp k^{-2}\}$$
and $S_{0,H} = S_0 \cup \bigl(\cup_{k \ge k_0} S^H_{\pm k}\bigr)$. Let $S^\varepsilon_{\pm n} = \cup_{m=0}^{n} T_\varepsilon^{\mp m} S_{0,H}$ with $n \in \mathbb{N} \cup \{\infty\}$. Then $T_\varepsilon^{\pm n}$ is smooth on $M \setminus S^\varepsilon_{\pm n}$. Although $S_0$ does not depend on $\varepsilon$, the set $S_1^\varepsilon$, which is now called the singular set of $T_\varepsilon$, does depend on $\varepsilon$. From now on we assume that all regular curves are taken from $M \setminus S_{0,H}$. This is somewhat different from the literature; as a result, the $\varepsilon$-neighborhood of the singular set $S_{0,H}$ also increases to order $\varepsilon^{2/3}$, as we will see in (4.17). By the finite horizon condition, there are finitely many curves in $T_\varepsilon^{-1} S_0$ that are preimages of $S_0$. Since $\{S^H_{\pm k}\}$ converges to $S_0$ as $k$ goes to $\infty$, for any such curve $W$ there is a sequence $\{W_n\}$ with $W_n \subset T_\varepsilon^{-1} S^H_n$ that converges to $W$ as $n$ goes to $\infty$ or $-\infty$. Furthermore, since $S_0$ is made of closed curves, any smooth curve in $S_1^\varepsilon$ terminates on curves in $S_1^\varepsilon$.

Fig. 1. Lorentz gas in $\mathbb{T}^2$

Fig. 2. Singularity curves in $S_1^\varepsilon$ (left) and $S_{-1}^\varepsilon$ (right)

The singularity curves near the point marked $x$ on the boundary of $B_1$ in Fig. 1 are shown in Fig. 2. The long singularity curve $S$ corresponds to preimages of grazing collisions with the adjacent scatterer $B_2$. The short singular curves in black are made of images of tangent vectors of further scatterers in $\tilde Q$. By the finite horizon assumption, there exists $N > 0$ such that there are at most $N$ singular curves which are preimages of $S_0$. The dashed horizontal lines correspond to $S^H_k$, $k \ge k_0$; the grey curves are preimages of points in the $S^H_k$'s, and they form at most $N$ sequences converging to the black singular curves. The regions bounded by singular curves are usually called cells; they are made of vectors whose trajectories land on a particular scatterer, and are denoted by $M_{m,k}$ for $m = k_0, \dots$ and $k = k_0, \dots$, such that $M_{m,k} \subset I_k$ and $T_\varepsilon M_{m,k} \subset I_m$. The height of $M_{m,k}$ is of order $O(k^{-3})$ and its width is of order $O(m^{-5})$. Any regular unstable curve $W$ in $M_{m,k}$, whose image $T_\varepsilon W$ belongs to $I_m$, is expanded by a factor $\Lambda_{m,k} = m^2$. The measure of this cell is $\mu_0(M_{m,k}) = O(m^{-5}k^{-5})$. Let $W$ be a regular unstable curve such that $T_\varepsilon W$ is fully stretched in some $I_k$; then for any $x \in W$ with $T_\varepsilon x = (r_1, \varphi_1^\varepsilon)$,
$$|T_\varepsilon W| \asymp |k|^{-3} \asymp \cos^{3/2}\varphi_1^\varepsilon. \tag{4.6}$$
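The homogeneity strips defined in Sect. 4.2 are easy to index in practice: the strip I_k with k ≥ k_0 contains the angles whose distance to the grazing angle π/2 lies between (k+1)^{-2} and k^{-2}. The helper below is a hypothetical illustration (its name and the sample value k_0 = 10 are assumptions made here); it returns 0 for the central strip I_0 and ±k otherwise. Angles lying exactly on a strip boundary are not treated specially.

```python
import math

def strip_index(phi: float, k0: int) -> int:
    """Index of the homogeneity strip containing phi (0 means the central strip I_0)."""
    gap = math.pi / 2 - abs(phi)          # distance to the nearest grazing angle
    if gap <= 0:
        raise ValueError("phi must lie strictly between -pi/2 and pi/2")
    if gap > k0 ** -2:
        return 0                          # central strip I_0
    k = math.floor(gap ** -0.5)           # k**-2 >= gap > (k + 1)**-2
    return k if phi > 0 else -k

def strip_midpoint(k: int) -> float:
    """An angle well inside the band pi/2 - k**-2 < phi < pi/2 - (k+1)**-2, for k >= 1."""
    return math.pi / 2 - 0.5 * (k ** -2 + (k + 1) ** -2)

sample = [strip_index(strip_midpoint(k), 10) for k in (10, 11, 25)]
```

Since cos ϕ is comparable to k^{-2} on I_k, the index also gives the order of magnitude of the expansion factor, which is of order k^2 there.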
It follows from (3.20) that the expansion factor on $W$ is of order $\cos^{-1}\varphi_1^\varepsilon$, or $O(k^2)$, which implies that
$$|T_\varepsilon W| \asymp |W|k^2 \asymp |W|^{3/5}. \tag{4.7}$$
Accordingly, there exists $C > 0$ such that for any $x \in M \setminus S_1^\varepsilon$,
$$C^{-1} d(x, S_1^\varepsilon)^{2/5} \le \frac{\|D_x T_\varepsilon v\|}{\|v\|} \le C\, d(x, S_1^\varepsilon)^{-2/5}, \qquad \forall v \in T_x M, \tag{4.8}$$
where $d(x, S_1^\varepsilon)$ is the Euclidean distance between $x$ and the set $S_1^\varepsilon$ in $M$. Note that any curve $W$ in $S^H_k$, $k \ge k_0$, is horizontal, i.e. $d\varphi = 0$. Then $T_\varepsilon^{-1} W$ corresponds to a family of focusing wave fronts, which are stable curves. Similarly, any smooth curve in $T_\varepsilon^{-1} S_0$ is also stable, since it is a limit of stable curves. According to Proposition 4.2, curves in $S_1^\varepsilon$ have bounded curvature. In general the singular sets $S_1^\varepsilon$ and $S_{-1}^\varepsilon$ are not symmetric, except for time reversible maps, i.e. when the twist function $g$ satisfies (1.9).

4.3. Distortion bounds. Now we are ready to study the distortion bounds of $T_\varepsilon$. For any unstable curve $W$ and $x \in W$, denote by $J_W T_0^{-1}(x)$ (resp. $J_W T_\varepsilon^{-1}(x)$) the Jacobian of $T_0^{-1}$ (resp. $T_\varepsilon^{-1}$) along $W$ at $x \in W$. Note that for a unit vector $v = (dr, d\varphi) \in T_x W$, the Jacobian $J_W T_\varepsilon^{-1}(x) = \|D_x T_\varepsilon^{-1} v\|$ is the contraction factor of $W$ under $T_\varepsilon^{-1}$ at $x$. In [6,9] the distortion bounds were proved for the classical billiard map $T_0$. More precisely, it was shown that for any regular unstable curve $W$ there exist $C_1, C_2 > 0$ such that
$$|\ln J_W T_0^{-1}(x) - \ln J_W T_0^{-1}(y)| \le C_1\, d_W(x, y)^{1/3}, \tag{4.9}$$
where $d_W(x, y)$ is the arclength between $x$ and $y$ along $W$. Next we show that $T_\varepsilon$ has similar properties on the set of all regular unstable curves.

Lemma 4.3 (Distortion bounds). Let $W$ be a regular unstable curve. Then there exists $C_J > 0$ such that
$$|\ln J_W T_\varepsilon^{-1}(x) - \ln J_W T_\varepsilon^{-1}(y)| \le C_J\, d_W(x, y)^{1/3}.$$

Proof. It follows from (3.20) and (3.4) that these Jacobians along any unstable curve $W$ satisfy the relation
$$J_W T_\varepsilon^{-1}(x) = J_W T_0^{-1}(x)\sqrt{\frac{1 + V_{-1,\varepsilon}^2}{1 + V_{-1}^2}}, \tag{4.10}$$
where $V_{-1,\varepsilon} = V_{-1}(1 + \varepsilon g'(\varphi))$ is the slope of the vector $DT_\varepsilon^{-1} v$. Then it follows from (4.10) that
$$\ln J_W T_\varepsilon^{-1}(x) = \ln J_W T_0^{-1}(x) + \frac{1}{2}\ln\bigl(1 + V_{-1,\varepsilon}^2(x)\bigr) - \frac{1}{2}\ln\bigl(1 + V_{-1}^2(x)\bigr).$$
It follows from the smoothness of W and the curvature bounds, there exists C > 0 such that for any x, y ∈ W , any ε ∈ [0, ε0 ], 2 2 (x)) − ln(1 + V−1,ε (y))| ≤ C1 | | ln(1 + V−1,ε
2 (x) V−1,ε 2 V−1,ε (y)
| ≤ CdW (x, y).
Since the original map T0 satisfies the distortion bound, there exists C J > 0 such that | ln JW Tε−1 (x) − ln JW Tε−1 (y)| ≤ | ln JW T0−1 (x) − ln JW T0−1 (y)| + C2 dW (x, y) 1
≤ C J dW (x, y) 3 . In general, for any x, y ∈ W , n ∈ N, we denote Wn = Tε−n W , xm = Tε−m x and ym = Tε−m y. If Wn is a smooth unstable curve then | ln JW Tε−n (x) − ln JW Tε−n (y)| ≤
n−1
| ln JWk Tε−1 (xk ) − ln JWk Tε−1 (yk )|
k=0 n−1
≤C
1
dWk (xk , yk ) 3 .
k=0
Finally, due to the uniform hyperbolicity of Tε , we have 1
| ln JW Tε−n (x) − ln JW Tε−n (y)| ≤ CdW (x, y) 3 . Let M(x) be the open connected component of the set M\S1 containing the phase point x. For any x, y ∈ M denote the separation time of x and y by s+ε (x, y) = min{n ≥ 0 : Tεn y ∈ / M(Tεn x)}.
(4.11)
Observe that if $x$ and $y$ lie on one unstable curve $W$, then by the uniform expansion property and the distortion bounds, there exists $C = C(Q)$ such that
$$d_W(x, y) \le C \Lambda_e^{-s_+^\varepsilon(x, y)}, \tag{4.12}$$
where $\Lambda_e$ is the minimal expansion factor in the Euclidean metric for all $T_\varepsilon$ with $\varepsilon \in [0, \varepsilon_0]$. In addition, we prove the Hölder continuity of $DT_\varepsilon$ and $J_\varepsilon$.

Lemma 4.4. Both $\ln |\det D_x T_\varepsilon^{-1}|$ and $\ln |J_\varepsilon(T_\varepsilon^{-1} x)|$ are piecewise Hölder functions with exponent 1/3:
$$|\ln f(x) - \ln f(y)| \le C\, d(x, y)^{1/3}, \qquad \forall x, y \in M \setminus S_{0,H}, \tag{4.13}$$
where $C > 0$ is a constant and $f(x) = |\det D_x T_\varepsilon^{-1}|$ or $f(x) = |J_\varepsilon(T_\varepsilon^{-1} x)|$.

Proof. We first prove the following fact. Let $Q(x) = (\cos\varphi)^{-1}$, with $x = (r, \varphi)$. Then $Q$ is differentiable on $M \setminus \partial M$ and for any $x, y \in M \setminus S_{0,H}$,
$$|\ln Q(x) - \ln Q(y)| \le C\, d_W(x, y)^{1/3}, \tag{4.14}$$
where $C > 0$ is a constant. Since $Q$ is differentiable on $M \setminus \partial M$, for any $x, y$ belonging to one homogeneity strip $I_k$, $k \ge k_0$,
$$d(x, y) \le C_1 k^{-3} \le C_2 \cos^{3/2}\varphi.$$
This implies that
$$|\ln Q(x) - \ln Q(y)| \le C_3\, Q(x)\, d(x, y) \le C\, d(x, y)^{1/3}. \tag{4.15}$$
Here $C$ and $C_i$, $i \ge 1$, are constants that depend only on the table.
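The second inequality in (4.15) can be spelled out as follows. This is only a sketch, assuming $Q(x) = 1/\cos\varphi$ and the strip estimate $d(x, y) \le C_2 \cos^{3/2}\varphi$ stated in the proof, with $\cos\varphi$ comparable at $x$ and $y$ inside one homogeneity strip:

```latex
\[
Q(x)\,d(x,y)
= Q(x)\,d(x,y)^{2/3}\,d(x,y)^{1/3}
\le \frac{\bigl(C_2\cos^{3/2}\varphi\bigr)^{2/3}}{\cos\varphi}\,d(x,y)^{1/3}
= C_2^{2/3}\,d(x,y)^{1/3}.
\]
```

The factor $\cos\varphi$ cancels exactly, which is why the exponent 1/3 appears in (4.13).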
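According to (3.9) the Jacobian $\det D_x T_\varepsilon^{-1}$ is the product of $Q(x)$ and a $C^1$ function on $M$. It follows from (4.15) that $\ln |\det D_x T_\varepsilon^{-1}|$ is a piecewise Hölder function on $M \setminus S_{0,H}$ and satisfies (4.13). In addition, by Lemma 3.1, one can check that $J_\varepsilon(T_\varepsilon^{-1} x)$ is the product of $Q(\varphi)$ and a $C^1$ smooth function on $M$. Again, (4.15) implies that $J_\varepsilon(T_\varepsilon^{-1} x)$ satisfies (4.13).

Now we consider $x, y \in M \setminus S_{0,H}$ such that $s_+^\varepsilon(x, y) = n$. An argument similar to that for (4.12) leads to
$$|\det D_x T_\varepsilon^{-n}| \le C_1 e^{\vartheta^n} |\det D_y T_\varepsilon^{-n}| \quad\text{and}\quad J_\varepsilon(T_\varepsilon^{-n} x) \le C_1 e^{\vartheta^n} J_\varepsilon(T_\varepsilon^{-n} y), \tag{4.16}$$
where $C_1 = e^C$ and $\vartheta = \Lambda_e^{-1/3}$.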
4.4. Absolute continuity. Since billiards have singularities, if the orbit of $x$ approaches the singularity set $S_1^\varepsilon$ too fast under $T_\varepsilon$, then $x$ may not have a stable manifold. For the finite horizon case, one can show that $m$-a.e. $x \in M$ does have a stable (resp. unstable) manifold.

Definition 4.5. For any $x \in M$, let $r_\varepsilon^\sigma(x) = d_{W^\sigma(x)}(x, \partial W^\sigma(x))$, where $W^\sigma(x)$ is the stable (resp. unstable) manifold that contains $x$, for $\sigma \in \{s, u\}$.

Since we have added the boundaries of the homogeneity strips and their preimages to the singular set, one can check that for any $\delta > 0$, the $\delta$-neighborhood of $S_{\pm 1}^\varepsilon$ has measure
$$\mu_0(B_\delta(S_{\pm 1}^\varepsilon)) = O(\delta) \quad\text{and}\quad m(B_\delta(S_{\pm 1}^\varepsilon)) = O(\delta^{2/3}). \tag{4.17}$$
Here $B_\delta(S_{\pm 1}^\varepsilon) = \{x \in M : d_M(x, S_{\pm 1}^\varepsilon) \le \delta\}$ for any $\delta > 0$.
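Lemma 4.6. For any small $\delta > 0$, the set $(r_\varepsilon^s(x) < \delta)$ has measure
$$\mu_0(r_\varepsilon^s(x) < \delta) \le C\delta \quad\text{and}\quad m(r_\varepsilon^s(x) < \delta) \le C\delta^{2/3}. \tag{4.18}$$

Proof. It follows from hyperbolicity that there exists $c > 0$ such that for any small $\delta > 0$,
$$(r_\varepsilon^s(x) < \delta) \subset \bigcup_{n \ge 0} T_\varepsilon^{-n} B_{c\delta\Lambda_p^{-n}}(S_{-1}^\varepsilon).$$
We apply the measure $\mu_0$ and get
$$\mu_0(r_\varepsilon^s(x) < \delta) \le \sum_{n \ge 0} T_\varepsilon^n \mu_0\bigl(B_{c\delta\Lambda_p^{-n}}(S_{-1}^\varepsilon)\bigr).$$
By (3.12) and (4.16), for any measurable set $B \subset M$,
$$T_\varepsilon^n \mu_0(B) = \int_{x \in B} \frac{1}{J_\varepsilon(T_\varepsilon^{-n} x)}\, d\mu_0(x) \le C_1 (1 + 2\varepsilon_0 a_g)\, \mu_0(B).$$
We choose $\varepsilon_0$ such that $\theta := C_1(1 + 2\varepsilon_0 a_g)/\Lambda_p < 1$. In particular, there exist $c > 0$, $C > 0$, $C_1 > 0$ such that for any $\delta > 0$,
$$\mu_0(r_\varepsilon^s(x) < \delta) \le C \sum_{n=0}^{\infty} T_\varepsilon^n \mu_0\bigl(B_{c\delta\Lambda_p^{-n}}(S_{-1}^\varepsilon)\bigr) \le \sum_{n \ge 0} C_1 \delta \theta^n \le C\delta. \tag{4.19}$$
A similar argument for the Lebesgue measure follows from (4.16) and (4.17).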
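The last inequality in (4.19) is a geometric series. As a brief sketch, assuming only that $\theta \in (0, 1)$ and that $C_1$ does not depend on $n$:

```latex
\[
\sum_{n\ge 0} C_1\,\delta\,\theta^{n} \;=\; \frac{C_1\,\delta}{1-\theta},
\qquad 0<\theta<1 .
\]
```

So the constant $C$ in (4.19) may be taken proportional to $C_1/(1-\theta)$, which stays bounded as long as $\theta$ is bounded away from 1.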
The above lemma implies that almost every point in $M$ has a regular stable (resp. unstable) manifold and that there are plenty of reasonably long stable (resp. unstable) manifolds for $T_\varepsilon$.

We fix $\varepsilon \in [0, \varepsilon_0]$. Let $W^1$ and $W^2$ be two short, almost parallel regular unstable curves that are sufficiently close to each other. We assume that $W^1$ and $W^2$ have uniformly bounded first and second derivatives (then their future images will have uniformly bounded derivatives as well, due to Proposition 4.2). Some points $x \in W^1$ may be joined with points $\bar{x} \in W^2$ by stable manifolds. The images of these points $x$ and $\bar{x}$ get closer together exponentially fast under future iterations of $T_\varepsilon$. Denote $W_*^i = \{x \in W^i : W^s(x) \cap W^{3-i} \ne \emptyset\}$ for $i = 1, 2$, where $W^s(x)$ is the stable manifold for $T_\varepsilon$ at $x$. The holonomy map $h_\varepsilon : W_*^1 \to W_*^2$ takes points $x \in W_*^1$ to $\bar{x} = W^s(x) \cap W^2$. If both $W^1$ and $W^2$ are short enough, say $|W^i| < c_M$ for $i = 1, 2$ and some constant $c_M > 0$, the map $h_\varepsilon$ is well-defined for all $\varepsilon \in (0, \varepsilon_0]$. Note that the Lebesgue measure $m_{W^2}$ is transferred by $(h_\varepsilon)^{-1}$ onto $W_*^1$ as $m_*^1 = (h_\varepsilon)^{-1}(m_{W^2})$, which may differ from the Lebesgue measure on $W^1$. To study the distortion of $m_*^1$ compared to the Lebesgue measure on $W^1$, we need to study the Jacobian of the holonomy map, which is defined for any $x \in W_*^1$ by
$$J h_\varepsilon(x) = \frac{dm_*^1}{dm_{W^1}}(x).$$
It was known (see [9,6]) that the Jacobian of $h_0$ satisfies a basic formula:
$$J h_0(x) = \lim_{n \to \infty} \frac{J_{W^1} T_0^n(x)}{J_{W^2} T_0^n(h_0(x))}. \tag{4.20}$$
Furthermore, it is uniformly bounded. Moreover, the Jacobian $J h_0$ is also shown to be Hölder continuous on its domain. The perturbation is of order $\varepsilon$, which is small enough by (A2). Combining this with the absolute continuity property of the original map $T_0$, an argument similar to the proof of the distortion bound leads to the following property.

Lemma 4.7. For any two close enough regular unstable manifolds $W^1$ and $W^2$, the unstable holonomy map $h_\varepsilon : W_*^1 \subset W^1 \to W^2$ is absolutely continuous and $J h_\varepsilon$ is uniformly bounded. Moreover, for any $x, y \in W_*^1$,
$$|\ln J h_\varepsilon(x) - \ln J h_\varepsilon(y)| \le C_J\, d_W(x, y)^{1/3}. \tag{4.21}$$

From now on we will only consider regular unstable and stable curves, without emphasizing it.

5. Growth Lemma

One of the fundamental facts in the theory of chaotic billiards is the Growth Lemma, which describes the victory of hyperbolicity over singularities: expansion always prevails over fragmentation. We now turn to the precise definition. Let $W$ be any unstable curve in $M \setminus S_1^\varepsilon$. For any $x \in W$ and any $n \in \mathbb{N}$, denote by $W_n(x) \subset T_\varepsilon^n W$ the regular unstable curve that contains $x_n^\varepsilon := T_\varepsilon^n x$ and define
$$r_{W,n}^\varepsilon(x) = d_{W_n(x)}(x_n^\varepsilon, \partial W_n(x)).$$
Denote by $\mathcal{H}$ the class of Hölder continuous functions. More precisely, for any $f \in \mathcal{H}$ there exist $C_f > 0$ and $\vartheta_f \in (0, 1)$ such that for any $x, y \in M$,
$$|f(x) - f(y)| \le C_f\, d(x, y)^{\vartheta_f}. \tag{5.1}$$
Let
$$K_f = \sup_{x, y \in M} \frac{|f(x) - f(y)|}{d(x, y)^{\vartheta_f}}.$$
In this paper we consider Hölder continuous functions instead of piecewise smooth ones, to avoid a class of functions that depends on $\varepsilon$. For a fixed large constant $C_T > 0$, denote by $\mathcal{D}$ the set of all density functions $f : M \to \mathbb{R}$ such that:
(1) there exists a measurable foliation $\mathcal{W}$ of $M$ into regular unstable curves;
(2) for any $W \in \mathcal{W}$,
$$|\ln f(x) - \ln f(y)| \le C_T\, d(x, y)^{\vartheta_f} \qquad \forall x, y \in W.$$
We will see later that all of our density functions come from the set $\mathcal{D}$.

We now introduce the concepts of standard pairs and standard families that were brought up by Chernov and Dolgopyat in [6,8,13]. Let $W$ be any regular unstable curve in $M$. We say $(W, \nu)$ is a standard pair for $T_\varepsilon$ if $\nu$ is a probability measure that is absolutely continuous with respect to $m_W$ and the density function $d\nu/dm_W \in \mathcal{D}$. Let $\mathcal{G} = \{(W_\beta, \nu_\beta), \beta \in A\}$ be a family of standard pairs for $T_\varepsilon$. We call it a standard family if there exists a probability factor measure $\lambda_{\mathcal{G}}$ on $A$, which defines a measure $\nu$ on the Borel $\sigma$-algebra of $\mathcal{W} = \cup_\beta W_\beta$ as
$$\nu(B \cap \mathcal{W}) = \int_{\beta \in A} \nu_\beta(B \cap W_\beta)\, d\lambda_{\mathcal{G}}(\beta), \tag{5.2}$$
for all measurable sets $B$ in $M$. We denote such a standard family by $\mathcal{G} = (\mathcal{W}, \nu)$. The fact that $T\mathcal{G}$ is still a standard family follows from the distortion bounds.

For any $\delta > 0$ and any family of unstable curves $\mathcal{W}$, denote the set of short curves by $B_\delta(\mathcal{W}) = \{W \in \mathcal{W} \mid |W| < \delta\}$. For any standard family $\mathcal{G} = (\mathcal{W}, \nu)$, denote $F_{\mathcal{G}}(\delta) = \nu(B_\delta(\mathcal{W}))$. Accordingly, $F_{\mathcal{G}}$ defines the distribution of short unstable curves in $\mathcal{W}$ corresponding to $\nu$. Fix a large number $C_p$; we say $\mathcal{G}$ is a proper family if $F_{\mathcal{G}}(\delta) < C_p \delta^{2/3}$; $\mathcal{G}$ is an essentially proper family if $\delta^{-2/3} F_{\mathcal{G}}(\delta) < \infty$. In particular, one can check that there exists $\tilde{\delta} > 0$ (not depending on $\varepsilon$) such that any standard pair $(W, \nu)$ with $|W| \le \tilde{\delta}$ is proper. Furthermore, one may ask what the distribution of short curves in $T_\varepsilon^n \mathcal{W}$ is. We first mention a very important fact, which follows from an argument similar to the unperturbed case, see [9], Lemma 5.56.

Lemma 5.1. There exist $\delta_0 > 0$ and $\theta \in (0, 1)$ such that for any unstable curve $W$ that intersects $S_1^\varepsilon$ with $|W| < \delta_0$,
$$\sum_j \frac{|T_\varepsilon^{-1} V_j|}{|V_j|} < \theta, \tag{5.3}$$
where $\{V_j\}$ is the collection of connected components in $T_\varepsilon W \setminus S_{-1}^\varepsilon$.
Proof. It is enough to consider the worst case, where the image of an unstable curve $W$ is cut into infinitely many pieces by the homogeneity strips. Equation (4.6) implies that for each $V_k \subset I_k$, $|V_k| \sim O(k^{-3})$ for $k$ larger than some index $k_1$. Note that the expansion factor (in the Euclidean metric) is of order $k^2$ on $T^{-1} V_k$. Thus $|T^{-1} V_k|$ is of order $k^{-5}$, which implies that $|W|$ is of order $k_1^{-4}$. Combining these facts we have
$$\sum_{k \ge k_1} \frac{|T_\varepsilon^{-1} V_k|}{|V_k|} \le C \sum_{k \ge k_1} \frac{1}{k^2} \le \mathrm{const.}\, k_1^{-1} \le \mathrm{const.}\, |W|^{1/4}. \tag{5.4}$$
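The two steps compressed into (5.4) can be checked by an integral comparison. The following is a sketch under the orders of magnitude quoted in the proof ($|W|$ of order $k_1^{-4}$); the constant 2 is only an illustration and requires $k_1 \ge 2$:

```latex
\[
\sum_{k\ge k_1}\frac{1}{k^{2}}
\;\le\; \int_{k_1-1}^{\infty}\frac{dt}{t^{2}}
\;=\; \frac{1}{k_1-1}
\;\le\; \frac{2}{k_1}\qquad (k_1\ge 2),
\]
\[
|W| \asymp k_1^{-4}
\;\Longrightarrow\;
k_1^{-1} \asymp |W|^{1/4}.
\]
```

In particular the sum in (5.3) can be made smaller than any prescribed $\theta$ by taking $|W|$ small, which is the content of the lemma in this case.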
Another bad case is when $W$ meets several pieces, say $N > 1$, of $T_\varepsilon^{-1}(\partial M)$, each approached by a sequence of preimages of boundaries of homogeneity strips. As above, the sum is bounded by $\sum_{i=1}^{N} k_i^{-1}$. By choosing $|W|$ small, the sum is small, i.e. there exist $\delta_0 > 0$ and $\theta \in (0, 1)$ such that for $|W| < \delta_0$ the above expression is $\le \theta$.

Lemma 5.2 (Growth Lemma). The system $(T_\varepsilon, M)$ has the following properties:
(1) There exist $c > 0$, $C_z > 0$ and $\theta \in (0, 1)$ such that for any standard pair $\mathcal{G} = (W, \nu)$ and $\delta > 0$,
$$m_W(r_{W,n}^\varepsilon(x) < \delta) \le (c\theta^n + C_z |W|)\,\delta. \tag{5.5}$$
(2) There exists $\chi > 0$ such that for any standard pair $\mathcal{G} = (W, \nu)$ and $n > \chi\, |\ln |W||$, $T_\varepsilon^n \mathcal{G}$ is proper.
(3) There exist $c_1 > 0$ and $\theta \in (0, 1)$ such that for any proper family $\mathcal{G} = (\mathcal{W}, \nu)$ and $\delta > 0$,
$$\nu(r_{\mathcal{W},n}^\varepsilon(x) < \delta) \le c_1 \delta. \tag{5.6}$$
(4) If $\mathcal{G} = (\mathcal{W}, \mu_\varepsilon)$ where $\mu_\varepsilon$ is an SRB measure, then $\mathcal{G}$ is proper.

The above lemma follows directly from the general growth lemma proved in [11]. In fact, in the lemmas proved in that paper, the key assumption that guarantees the three properties (1)–(3) is the one-step expansion condition (5.3). Another important fact that follows from the one-step expansion condition (5.3), proved in [8,9,11], is the existence of plenty of long stable manifolds for points belonging to a proper family.

Lemma 5.3. There exists $C > 0$ such that for any $\delta > 0$ and any proper family $\mathcal{G} = (\mathcal{W}, \nu)$ of $T_\varepsilon$,
$$\nu(r_\varepsilon^{s/u}(x) < \delta) \le C\delta.$$
In particular this implies an estimate similar to (4.18) for $(T_\varepsilon, \mu_\varepsilon)$, where $\mu_\varepsilon$ is any SRB measure. More precisely, we take $\mathcal{W}$ to be the partition of the support of $\mu_\varepsilon$ into unstable manifolds of $T_\varepsilon$, then apply the above lemma to the proper family $(\mathcal{W}, \mu_\varepsilon)$.
6. Ergodic Properties

Our first task is to prove the existence and uniqueness of the SRB measure $\mu_\varepsilon$ on $M$, which is, by definition, a $T_\varepsilon$-invariant ergodic measure whose conditional distributions on unstable manifolds are absolutely continuous with respect to the Lebesgue measure along unstable manifolds. In [16] Pesin proved the existence and ergodic properties of SRB measures for a wide class of hyperbolic systems with singularities, covering the class we study here, under two extra assumptions, which in our notation are:
(h1) $\exists C > 0$, $r > 0$ such that $\forall \delta > 0$, $n \ge 1$, $m(T_\varepsilon^{-n} B_\delta(S_1^\varepsilon)) \le C\delta^r$;
(h2) $\exists z \in M$, $C > 0$, $r > 0$ such that $\forall \delta > 0$, $n \ge 1$, $m_{W^u(z)}(W^u(z) \cap T_\varepsilon^{-n} B_\delta(S_1^\varepsilon)) \le C\delta^r$.
Here $B_\delta(\cdot)$ denotes the $\delta$-neighborhood in the Riemannian metric. Under these assumptions, Pesin [16] proved that any SRB measure has at most a countable number of ergodic components. Later Sataev [17] showed that the number of ergodic components is finite under (h1) above and the following:
(h3) $\exists C > 0$, $r > 0$ such that for any unstable curve $W$ there are $n_W > 0$ and $C_W > 0$ such that $\forall \delta > 0$, $\forall \varepsilon \in [0, \varepsilon_0]$,
(a) $\upsilon_W(W \cap T_\varepsilon^{-n} B_\delta(S_1^\varepsilon)) \le C_W \delta^r$, $\forall n > 0$;
(b) $\upsilon_W(W \cap T_\varepsilon^{-n} B_\delta(S_1^\varepsilon)) \le C\delta^r$, $\forall n > n_W$.
Here $\upsilon_W$ is the probability measure on $W$ induced by $m_W$.

Lemma 6.1. The perturbed system $(T_\varepsilon, M)$ satisfies (h1)–(h3). Hence $T_\varepsilon$ admits at least one and at most finitely many SRB measures. Every SRB measure is K-mixing and Bernoulli, up to a finite cycle.

Proof. Assumption (h1) follows from (4.17) and (4.16) with $r = 2/3$. By (5.5), (h3)-(a) is true with $r = 1$; furthermore, by the second statement in the Growth Lemma, there exists $n_W$ such that $T_\varepsilon^{n_W} \mathcal{G}$ is proper, which implies (b). According to Lemma 5.3, long unstable manifolds exist almost everywhere. Thus (h2) is a special case of (h3)-(a) with $(W^u(z), \upsilon_{W^u(z)})$ being proper.

7. Statistical Properties

From now on we fix an SRB measure $\mu_\varepsilon$.
For any pair of integrable functions (observables) $f, h$ in $L^1_{\mu_\varepsilon}(M)$, the correlations of $f \circ T_\varepsilon^n$ and $h$ are defined by
$$C_{f,h}(n) = \int_M (f \circ T_\varepsilon^n)\, h\, d\mu_\varepsilon - \int_M f\, d\mu_\varepsilon \int_M h\, d\mu_\varepsilon = \mu_\varepsilon[(f \circ T_\varepsilon^n) h] - \mu_\varepsilon(f)\mu_\varepsilon(h), \quad n \in \mathbb{N}. \tag{7.1}$$
Furthermore, the rate of mixing of $T_\varepsilon$ is characterized by the speed of convergence in (7.1) for smooth enough functions $f$ and $h$. According to the general theorems in [9,8,11], the following statistical properties hold for the perturbed system $T_\varepsilon$.
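Lemma 7.1 (Equidistribution). Let $\mathcal{G}$ be a proper standard family. For any $f \in \mathcal{H}$ and $n \ge 0$,
$$\left| \int_M f \circ T_\varepsilon^n\, d\nu_{\mathcal{G}} - \int_M f\, d\mu_\varepsilon \right| \le B_f \theta_f^n, \tag{7.2}$$
where $B_f = C(K_f + \|f\|_\infty)$ and $\theta_f < 1$.

Lemma 7.2 (Exponential decay of correlations). For every $f, g \in \mathcal{H}$, $n \ge 0$ and any probability measure $\nu$ that is absolutely continuous with respect to $\mu_\varepsilon$,
$$\bigl| \nu[(f \circ T_\varepsilon^n) g] - \mu_\varepsilon(f)\nu(g) \bigr| \le B_{f,g} \theta^n, \tag{7.3}$$
where $\theta < 1$, and
$$B_{f,g} = C_0 \bigl( K_f \|g\|_\infty + K_g \|f\|_\infty + \|f\|_\infty \|g\|_\infty \bigr), \tag{7.4}$$
with $C_0 = C_0(Q) > 0$ being a constant.

Note that we use $\nu$ instead of $\mu_\varepsilon$ in (7.3) since the original proof of (7.3) with $\mu_\varepsilon$ also leads to the more general formula here, because $\int h\, d\nu = \int h\rho\, d\mu_\varepsilon$ with $d\nu = \rho\, d\mu_\varepsilon$.

The above results can be extended to observables taken at multiple times. Let $f_0, f_1, \ldots, f_k \in \mathcal{H}$, with
$$\vartheta_{f_i} = \vartheta_{f_0}, \qquad \|f_i\|_\infty = \|f_0\|_\infty, \qquad i = 0, 1, \ldots, k.$$
Consider the product $\tilde{f} = f_0 \cdot (f_1 \circ T_\varepsilon) \cdots (f_k \circ T_\varepsilon^k)$.

Lemma 7.3. Let $\mathcal{G}$ be a proper family. Then there exist $B_{\tilde{f}} > 0$ and $\theta \in (0, 1)$ such that for any $n \ge 0$,
$$\left| \int_M \tilde{f} \circ T_\varepsilon^n\, d\nu_{\mathcal{G}} - \int_M \tilde{f}\, d\mu_\varepsilon \right| \le B_{\tilde{f}}\, \theta^n. \tag{7.5}$$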
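Furthermore, let $g_0, g_1, \ldots, g_k \in \mathcal{H}$, with $\vartheta_{g_i} = \vartheta_{g_0}$ and $\|g_i\|_\infty = \|g_0\|_\infty$, $i = 1, \ldots, k$. Consider the product $\tilde{g} = g_0 \cdot (g_1 \circ T_\varepsilon) \cdots (g_k \circ T_\varepsilon^k)$. Then we can estimate the correlations between the observables $\tilde{f}$ and $\tilde{g}$ as we did in Lemma 7.2.

Lemma 7.4. There exists $B_{\tilde{f},\tilde{g}} > 0$ such that for all $n \ge 0$,
$$\bigl| \mu_\varepsilon[\tilde{g} \cdot (\tilde{f} \circ T_\varepsilon^n)] - \mu_\varepsilon(\tilde{f})\, \mu_\varepsilon(\tilde{g}) \bigr| \le B_{\tilde{f},\tilde{g}}\, \theta^n,$$
where $\theta$ is the same as in (7.3).

Let $f$ be an observable on $M$ with expectation 0. We say that the sequence $\{f \circ T_\varepsilon^n\}$ satisfies the Central Limit Theorem (CLT) if for any $z \in \mathbb{R}$,
$$\lim_{n \to \infty} \mu_\varepsilon \left( \frac{f + f \circ T_\varepsilon + \cdots + f \circ T_\varepsilon^{n-1}}{\sqrt{n}} \le z \right) = \frac{1}{\sqrt{2\pi}\, \sigma_{f,\varepsilon}} \int_{-\infty}^{z} e^{-\frac{s^2}{2\sigma_{f,\varepsilon}^2}}\, ds, \tag{7.6}$$
where
$$\sigma_{f,\varepsilon}^2 = \sum_{n=-\infty}^{\infty} C_{f,f}(n) = C_{f,f}(0) + 2 \sum_{n=1}^{\infty} C_{f,f}(n). \tag{7.7}$$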
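The second equality in (7.7) rests on the symmetry of the autocorrelations. A short sketch, assuming that $T_\varepsilon$ is invertible and preserves $\mu_\varepsilon$, so that (7.1) also makes sense for negative $n$:

```latex
\[
C_{f,f}(-n)
= \mu_\varepsilon\bigl[(f\circ T_\varepsilon^{-n})\,f\bigr]-\mu_\varepsilon(f)^2
= \mu_\varepsilon\bigl[f\,(f\circ T_\varepsilon^{n})\bigr]-\mu_\varepsilon(f)^2
= C_{f,f}(n),
\]
\[
\sum_{n=-\infty}^{\infty} C_{f,f}(n)
= C_{f,f}(0)+\sum_{n\ge 1}\bigl(C_{f,f}(n)+C_{f,f}(-n)\bigr)
= C_{f,f}(0)+2\sum_{n\ge 1} C_{f,f}(n).
\]
```

The middle step of the first line composes the integrand with $T_\varepsilon^n$ and uses invariance of $\mu_\varepsilon$; absolute convergence of the series follows from the exponential bound in Lemma 7.2.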
Lemma 7.5. Let $f \in \mathcal{H}$ be a Hölder continuous function. Then the CLT (7.6) holds, along with formula (7.7), and $\sigma_{f,\varepsilon} = 0$ if and only if $f$ is cohomologous to a constant.

The proof follows from an argument similar to that in [9], Section 7.8; we do not repeat it here. Note that in Lemmas 7.1–7.5 all the constants can be chosen independently of $\varepsilon$.

The uniqueness of the SRB measure requires a more elaborate argument, which is based on the fact that our map $T_\varepsilon$ is a small perturbation of a strongly mixing map $T_0$ with a smooth invariant measure $\mu_0$. The argument is very similar to that presented in [5]; we only recall its principal steps.

Let $X$ be the collection of all regular unstable curves of $T_0$ with length at least $\hat{\delta}$, where $\hat{\delta} > 0$ is chosen such that any $(W, \upsilon_W)$ is proper whenever an unstable curve $W$ has length larger than $\hat{\delta}$. This implies that for any unstable curve $W \in X$, any standard pair $(W, \nu)$ is proper under $T_\varepsilon$ for $\varepsilon$ small. Denote by $\bar{X}$ the closure of $X$ in the Hausdorff metric. Since curves in $X$ are at least $C^2$, their derivatives and curvatures are uniformly bounded according to the definition of the unstable cone and the curvature bound (Proposition 4.2). This implies that curves in $\bar{X}$ are at least $C^1$. Curves in $\bar{X}$ are called generalized unstable curves.

For any generalized unstable curve $W \in \bar{X}$, there exists an open set $\mathcal{U}(W)$ in $M$ whose closure is bounded by two stable and two unstable manifolds of $T_0$ (a curvilinear "rhombus") such that the images of every unstable curve $W \subset M$ under the billiard map $T_0$ cross $\mathcal{U}(W)$ fully (intersecting both of its stable sides) at a sufficiently high rate after a fixed number of iterations.

Such a set $\mathcal{U}(W)$ was constructed in [9,4], together with a Cantor hyperbolic set $\mathcal{R}(W)$ which can be viewed as the central part of the direct product of stable manifolds $\Gamma^s(W)$ with length $\ge \delta_s > 0$ and unstable manifolds $\Gamma^u(W)$ of length $\ge \delta_u = \hat{\delta}$. In fact the closure of $\mathcal{U}(W)$ is the minimal solid "rectangle" that contains $\mathcal{R}(W)$ and shares the u/s-boundary with $\mathcal{R}(W)$. Similarly one defines the Cantor hyperbolic set $\mathcal{R}_\varepsilon(W) \subset \bar{\mathcal{U}}(W)$ made of the direct product of $\Gamma_\varepsilon^s(W)$ and $\Gamma_\varepsilon^u(W)$. We say that an unstable curve $W$ fully u-crosses $\mathcal{U}(W)$ if $W$ meets every stable manifold in $\Gamma^s(W)$, sticks out of both sides of $\mathcal{U}(W)$ by at least $\delta_u/4$ units and stays away from the u-boundary of $\mathcal{U}(W)$ (made of two unstable manifolds) by at least $\delta_s/4$ units. Similarly one can define a stable manifold to s-cross $\mathcal{U}(W)$. Any strip in $\mathcal{U}(W)$ that is fully stretched between the two s-boundaries of $\mathcal{U}(W)$ and bounded by two unstable manifolds of $T_0$ is called a solid u-subrectangle of $\mathcal{U}(W)$. Note that each $\mathcal{U}(W)$ is fully u-crossed by an open ball $O(W)$ (centered at $W$) in $\bar{X}$. In particular we always assume there exists a solid u-subrectangle $B(W) \subset \mathcal{U}(W)$ such that every $W \in O(W)$ is squeezed between the two u-boundaries of the u-subrectangle $B(W)$. We also say $B(W)$ is the u-subrectangle associated with $O(W)$.

Since $\bar{X}$ is a compact set, there exists a finite open cover $\{O(W_i) : i = 1, \ldots, k\}$, and every $W \in \bar{X}$ must fully u-cross one of the sets $\mathcal{U}(W_i)$. In particular we fix $\mathcal{U}^* := \mathcal{U}(W_1)$ and $\mathcal{R}^* := \mathcal{R}(W_1)$ for $T_0$. The growth lemma also guarantees that there are plenty of stable and unstable curves crossing the domain $\mathcal{U}^*$ fully. More precisely, the following is true for the map $T_0$ according to [11], Lemma 13.

Lemma 7.6. There exist positive constants $d_0 \in (0, 1)$ and $N_1 > 1$ such that for any proper family $\mathcal{G} = (\mathcal{W}, \nu)$ and $n \ge N_1$ we have $T_0^n \nu(\mathcal{R}^* \cap_* T_0^n \mathcal{W}) > d_0$, where $\cap_* T_0^n \mathcal{W}$ only contains the portion of $T_0^n \mathcal{W}$ that fully u-crosses $\mathcal{U}^*$.
Next we will show that Tε also has similar property as long as ε is small enough. We need the following observations to proceed.
Lemma 7.7. For any $W \in \bar{X}$, there exist $\varepsilon(W) > 0$ and an open ball $O(W)$ centered at $W$ in the space $\bar{X}$, such that for any $W \in O(W)$,
$$T_\varepsilon^{N_1} \upsilon_W(\Gamma_\varepsilon^s \cap_* T_\varepsilon^{N_1} W) > d_1, \tag{7.8}$$
for all $\varepsilon < \varepsilon(W)$. Here $d_1 = d_1(W)$ and $\cap_* T_\varepsilon^{N_1} W$ only contains the portion of $T_\varepsilon^{N_1} W$ that fully stretches in $\mathcal{U}^*$, staying away from the u-boundary of $\mathcal{U}^*$ by at least $\delta_s/5$ units. Here $\upsilon_W$ represents the conditional measure of $m_W$ on $W$.

Proof. For any generalized unstable curve $W \in \bar{X}$, denote $\mathcal{G} = (W, \upsilon_W)$, where $\upsilon_W$ is the probability measure on $W$ induced by $m_W$. Then by the Growth Lemma, $T_\varepsilon^n \mathcal{G}$ and $T_0^n \mathcal{G}$ are both proper families for any $n \ge 1$. By Lemma 7.6, for $n \ge N_1$, at least a $d_0$ portion of $T_0^n \mathcal{G}$ properly returns to $\mathcal{R}^*$ along unstable curves that fully u-cross $\mathcal{U}^*$. In particular we denote by $\mathcal{W}^*$ the collection of all smooth subcurves $V \subset W$ such that $T_0^{N_1} V$ fully u-crosses $\mathcal{R}^*$. For an open set $O(W)$ that fully crosses $\mathcal{U}(W)$ and for any $V \in \mathcal{W}^*$, let $B_{\varepsilon,V}(W)$ be the portion of the u-subrectangle (associated with $O(W)$) that contains $V$ and is bounded by two stable manifolds in $\Gamma_\varepsilon^s(W)$ and two unstable manifolds in $\Gamma_\varepsilon^u(W)$. Since $T_\varepsilon$ is a small perturbation of $T_0$, one can choose $O(W)$ small enough (depending on $\varepsilon$) such that $T^{N_1} B_{\varepsilon,V} \cap \mathcal{U}^*$ contains a stable manifold of $T_\varepsilon$ that stretches between the u-boundaries of $T^{N_1} B_{\varepsilon,V}(W)$ and stays inside $\mathcal{U}^*$, away from the u-boundary of $\mathcal{U}^*$ by at least $\delta_s/5$ units. This can be done since it follows from Lemma 5.3 that a majority of points in $T^{N_1} V$ have long stable manifolds (of $T_\varepsilon$) and $T^{N_1} V$ stays away from the u-boundary of $\mathcal{U}^*$ by at least $\delta_s/4$ units. Now according to the continuity property of the singular curves (and their images of any order), and the fact that curves in $S_{-m}^\varepsilon \cap \mathrm{int}\, M$ are all unstable (with increasing slopes) for all $m \ge 1$, the strip $T^{N_1} B_{\varepsilon,V}(W)$ must be fully stretched in $\mathcal{U}^*$ from one s-boundary to the other. This implies that any generalized unstable curve in $O(W)$ must have a (smooth) portion in $B_{\varepsilon,V}(W)$ that fully u-crosses $\mathcal{U}^*$.

Now we sum over all $V \in \mathcal{W}^*$ and can show that there exists $d = d_{\varepsilon,W}$ such that for any $W \in O(W)$,
$$T_\varepsilon^{N_1} \upsilon_W(\Gamma_\varepsilon^s \cap_* T_\varepsilon^{N_1} W) > d. \tag{7.9}$$
Now we fix a small $\varepsilon = \varepsilon(W) < \varepsilon_0$; then (7.9) is true for $d_1(W) := d_{\varepsilon,W}$.

Again since $\bar{X}$ is compact, there exists a finite cover of $\bar{X}$ by $O(W_1), \ldots, O(W_k)$. We define
$$\varepsilon^* = \min_{i=1,\ldots,k} \varepsilon(W_i), \qquad d_2 = \min_{i=1,\ldots,k} d_1(W_i). \tag{7.10}$$
Thus we have shown that for any $\varepsilon < \varepsilon^*$ and any $T_\varepsilon$ satisfying (A2), $T_\varepsilon^{N_1} \upsilon_W(\mathcal{R}_\varepsilon \cap T_\varepsilon^{N_1} W) > d_2$ holds for any $W \in \bar{X}$. Combining this with the growth lemma and the distortion bound, for any proper family $(\mathcal{W}, \nu)$, any $\varepsilon < \varepsilon^*$ and any $T_\varepsilon$ satisfying (A1–A3),
$$T_\varepsilon^n \nu(\mathcal{R}_\varepsilon \cap T_\varepsilon^n \mathcal{W}) > d_3, \tag{7.11}$$
where $d_3 < d_2$ is a uniform constant. Note that (7.11) is also true for all $n \ge N_1$ because of the Growth Lemma and the fact that iterations of proper families are also proper.

Now the following modified coupling lemma can be proved for each $T_\varepsilon$ using arguments similar to those in [9]. Let $(W, \nu_W)$ be a standard pair, and define $\hat{W} = \{(x, t) \mid x \in W, t \in [0, 1]\}$. Then $\hat{W}$ is a rectangle based on $W$. We equip $\hat{W}$ with a probability measure $\hat{\nu}_W$ such that for any $(x, t) \in \hat{W}$,
$$d\hat{\nu}_W(x, t) = d\nu_W(x)\, dt. \tag{7.12}$$
Note that the map $T_\varepsilon^n$ defined on $W$ can be extended to $\hat{W}$ by $T_\varepsilon^n(x, t) := (T_\varepsilon^n(x), t)$. Let $\mathcal{G} = \{(W_\alpha, \nu_{W_\alpha}), \alpha \in A\}$ be a standard family with probability measure $\nu_{\mathcal{G}}$. Then the rectangle based on $\mathcal{W}$ is $\hat{\mathcal{G}} = \{(x, t) : x \in \cup_\alpha W_\alpha, t \in [0, 1]\}$. Again we define the probability measure $\hat{\nu}_{\hat{\mathcal{G}}}$ such that $d\hat{\nu}_{\hat{\mathcal{G}}}(x, t) = d\nu_{\mathcal{G}}(x)\, dt$.

Lemma 7.8. Given any two proper families $\mathcal{G}, \mathcal{E}$, there exist a measure preserving map (the coupling map) $\Theta : \hat{\mathcal{G}} \to \hat{\mathcal{E}}$ with $\Theta(x, t) = (y, s)$, $\Theta_* \hat{\nu}_{\hat{\mathcal{G}}} = \hat{\nu}_{\hat{\mathcal{E}}}$, and a coupling time function $\Upsilon$ defined on $\hat{\mathcal{G}}$, such that $T_\varepsilon^{\Upsilon(x,t)} x$ and $T_\varepsilon^{\Upsilon(x,t)} y$ lie on the same stable manifold. Furthermore, the coupling time function $\Upsilon : \hat{\mathcal{G}} \to \mathbb{N}$ has an exponential tail bound:
$$\hat{\nu}_{\hat{\mathcal{G}}}(\Upsilon > n) \le C_\Upsilon \vartheta_\Upsilon^n, \tag{7.13}$$
where $C_\Upsilon$ is a positive constant, and $\vartheta_\Upsilon \in (0, 1)$.

Now we assume that $\nu_1$ and $\nu_2$ are two SRB measures of $T_\varepsilon$, then take $\mathcal{G} = (\mathcal{W}^u, \nu_1)$ and $\mathcal{E} = (\mathcal{W}^u, \nu_2)$ with $\mathcal{W}^u$ being the measurable partition of $M$ into unstable manifolds of $T_\varepsilon$. It follows from the coupling lemma that these two measures converge to each other exponentially fast under $T_\varepsilon^n$ as $n$ goes to $\infty$. This proves the uniqueness and mixing property of the SRB measure $\mu_\varepsilon$, hence its Bernoulli property. Note that the requirement $\varepsilon < \varepsilon^*$ is much more stringent than any other requirement on $\varepsilon$ listed in (A2). Furthermore, for $\varepsilon$ larger than $\varepsilon^*$, we may expect to see several SRB measures for the perturbed map $T_\varepsilon$.

As an additional fact, the basin of $\mu_\varepsilon$ has full Lebesgue measure, and due to the mixing property of $\mu_\varepsilon$, we have
$$\mu_\varepsilon = \lim_{n \to \infty} T_\varepsilon^n \nu \tag{7.14}$$
exponentially on Hölder functions, where $\nu$ is any measure that gives rise to a proper or essentially proper family $\mathcal{G} = (\mathcal{W}, \nu)$.

Lemma 7.9. For any open set $A \subset M$, $\mu_\varepsilon(A) > 0$. In addition there exist $C > 0$ and $\delta_\varepsilon = \chi \ln(1 + \varepsilon a_g)$ such that $\mu_\varepsilon(A) > C[\mu_0(A)]^{1+\delta_\varepsilon}$.
Proof. It is enough to consider any open disc $A \subset M \setminus S_{0,H}$. Let $W \subset A$ be an unstable curve and $\mathcal{U} := \mathcal{U}(W)$ be the solid rhombus bounded by two stable and two unstable manifolds of $T_0$, such that $W$ is stretched between the s-boundaries of $\mathcal{U}$ and passes through the geometric center of $\mathcal{U}$. In fact one can choose $W$ long enough such that $\mu_0(\mathcal{U}) \ge c_0 \mu_0(A)$ for some $c_0 > 0$. Denote by $\Gamma^s$ the foliation of $\mathcal{U}$ into stable curves with full length, such that the two s-boundaries are contained in $\Gamma^s$. Then there exists $\delta_1 > 0$ such that $|W| > \delta_1$ for any $W \in \Gamma^s$. According to the Growth Lemma, there exists $n_0 > \chi |\ln \delta_1|$ such that $T_\varepsilon^{-n_0} \Gamma^s$ corresponds to a proper (stable) family for $T_\varepsilon^{-1}$. For any $n \ge 1$, let $T_\varepsilon^{-n} \Gamma^s \cap_* \Gamma^s$ be the union of all stable curves in $T_\varepsilon^{-n} \Gamma^s$ that are fully stretched in $\mathcal{U}$. Since $T_\varepsilon^n \mu_0$ converges to $\mu_\varepsilon$ exponentially on Hölder observables, which are dense in the function space $L^\infty(M)$, there exist $n_1 > 1$ and $C > 0$ such that for any $n \ge n_1$,
$$T_\varepsilon^n \mu_\varepsilon(A) \ge C\, T_\varepsilon^n \mu_0(A). \tag{7.15}$$
Now by the dual part of Lemma 7.7, there exists $N_1 > 1$ such that for each stable curve $W \subset \Gamma^s$,
$$T_\varepsilon^{\tilde{n}} \upsilon_W(T_\varepsilon^{-\tilde{n}} \Gamma^s \cap_* \Gamma^s) \ge d, \tag{7.16}$$
where $\tilde{n} = N_1 + n_0 + n_1$. We define $B(W) = T_\varepsilon^{\tilde{n}}[T_\varepsilon^{-\tilde{n}} W \cap_* \Gamma^s]$ to be the union of portions $V \subset W$ such that $T_\varepsilon^{-\tilde{n}} V$ stretches fully in $\mathcal{U}$. Since by assumption $A$ is contained in a homogeneity strip, $\mu_0$ and $m$ are equivalent on subsets of $A$. Accordingly there exists a global constant $c_1 > 0$ such that (7.16) implies
$$\mu_0\bigl(\cup_{W \in \Gamma^s} B(W)\bigr) \ge c_1 \mu_0(A). \tag{7.17}$$
Now we denote $B(\mathcal{U}) = \cup_{W \in \Gamma^s} B(W)$. Then the Jacobian bound (3.13) implies that
$$\mu_0(T_\varepsilon^{-\tilde{n}} B(\mathcal{U})) \ge c_2 (1 + \varepsilon_0 a_g)^{-\tilde{n}} \mu_0(A) \ge c_3\, \delta_1^{\delta} \mu_0(A) \ge c_4\, \mu_0(A)^{1+\delta}, \tag{7.18}$$
where $\delta = \chi \ln(1 + \varepsilon a_g)$ and $c_2, c_3, c_4 > 0$ are global constants. Finally, by the invariance of $\mu_\varepsilon$ and (7.15) we have
$$\mu_\varepsilon(A) = T_\varepsilon^{\tilde{n}} \mu_\varepsilon(A) \ge C\, T_\varepsilon^{\tilde{n}} \mu_0(A) \ge C\, T_\varepsilon^{\tilde{n}} \mu_0(B(\mathcal{U})) \ge C c_4\, \mu_0(A)^{1+\delta},$$
where we used (7.18) in the last step.

Next we prove the parts of Theorem 2.1 concerning entropy and fractal dimension for the measure $\mu_\varepsilon$. First, the Pesin formula was proved in [14] for hyperbolic systems with singularities under the only assumption that the underlying invariant measure is absolutely continuous on unstable curves, which is true in our case. The entropies of the flow and the map are related by the well-known Abramov formula [1]: $h_{\hat{\mu}_\varepsilon}(\Phi_\varepsilon^1) = h_{\mu_\varepsilon}(T_\varepsilon)/\mu_\varepsilon(\tau)$. The estimate of the fractal dimension of the measure $\mu_\varepsilon$ follows from the results of Young in [18] by making minor modifications of the proofs along the lines of Katok-Strelcyn [14], so we omit the details here. This finishes the proof of Theorem 2.1.
8. Proof of Theorem 2.2 and Theorem 2.4

We use the coordinates $(x, y)$ for the position of the particle $q$, thus $\Delta = (\Delta_x, \Delta_y)$. In this section we denote $\Delta = \Delta_x$ or $\Delta_y$. According to the Kawasaki-type formula and Lemma 3.1, for any $f \in \mathcal{H}$,
$$\mu_\varepsilon(f) = \lim_{n \to \infty} T_\varepsilon^n \mu_0(f) = \mu_0(f) + \lim_{n \to \infty} \sum_{k=1}^{n} \bigl[ T_\varepsilon^k \mu_0(f) - T_\varepsilon^{k-1} \mu_0(f) \bigr] = \mu_0(f) + \sum_{k=1}^{\infty} \mu_0[(f \circ T_\varepsilon^k)(1 - J_\varepsilon)]. \tag{8.1}$$
We will show that the convergence in (8.1) is exponentially fast and uniform in $\varepsilon$. Note that since $J_\varepsilon$ is the density function of a probability measure, $\mu_0(1 - J_\varepsilon) = 0$. According to (3.12), formula (8.1) can be rewritten as
$$\mu_\varepsilon(f) = \mu_0(f) + \varepsilon \sum_{k=1}^{\infty} \mu_0[(f \circ T_\varepsilon^k)(G(x) + \varepsilon R_\varepsilon(x))]. \tag{8.2}$$
Now we use the fact $\mu_0(\Delta) = 0$ to write
$$\mu_\varepsilon(\Delta) = \mu_0(\Delta) + \varepsilon \sum_{k=1}^{\infty} \mu_0[(\Delta \circ T_\varepsilon^k)(G(x) + \varepsilon R_\varepsilon(x))] = \varepsilon \sum_{k=1}^{\infty} \mu_0[(\Delta \circ T_\varepsilon^k)(G(x) + \varepsilon R_\varepsilon(x))]. \tag{8.3}$$

Lemma 8.1. Both functions $\Delta$ and $\tau$ belong to $\mathcal{H}$ with Hölder exponent $\tfrac12$ and are uniformly bounded. Furthermore,
$$\mu_0(\Delta \circ T_\varepsilon^{-1}) = \mu_0(\Delta) + O(\varepsilon) = O(\varepsilon).$$

Proof. The boundedness follows from the finite horizon assumption. Let $x = (q_1, v_1)$ and $y = (q_2, v_2)$ be two nearby phase points originating from one scatterer such that their images under the collision map land on the same scatterer. The worst case happens when one image is tangential to the scatterer. Thus by an elementary calculation we have
$$|\Delta(x) - \Delta(y)| \le C \Lambda(x) \cdot d(x, y) \le C_1\, d(x, y)^{1/2}.$$
Similarly we get
$$|\tau(x) - \tau(y)| \le C\, d(x, y)^{1/2}.$$
By the definition of our perturbed system (1.7) and (1.1), if a trajectory starts from a phase point $x$ after collision and moves to $T_\varepsilon^{-1} x$ for a time $\tau(T_\varepsilon^{-1} x)$, then it deviates from the classical billiard trajectory moving backward from the same point $x$ by
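$O(\varepsilon\tau)$. If $T_\varepsilon^{-1} x$ is almost grazing, then $T_0^{-1} x$ and $T_\varepsilon^{-1} x$ might land on different scatterers so that $|\Delta \circ T_\varepsilon^{-1} - \Delta \circ T_0^{-1}| = O(1)$. Thus on most of the phase space we have $\Delta \circ T_\varepsilon^{-1} = \Delta \circ T_0^{-1} + O(\varepsilon)$, except on the $O(\varepsilon)$-neighborhood of the singular set $\partial M \cup T_\varepsilon^{-1} \partial M$, which also has measure of order $O(\varepsilon)$. This completes the proof.

According to Lemma 7.2, for the smooth function $f$ on $M$ and any $\varepsilon \in [0, \varepsilon_0]$,
$$\bigl| \mu_0[(\Delta \circ T_\varepsilon^k) f] - \mu_0(f)\mu_\varepsilon(\Delta) \bigr| \le C\theta^k. \tag{8.4}$$
It follows from (8.4) and the fact that both $G$ and $R_\varepsilon$ are smooth functions that
$$\sum_{k=1}^{\infty} \mu_0[(\Delta \circ T_\varepsilon^k)(1 - J_\varepsilon)] = \varepsilon \sum_{k=1}^{\infty} \mu_0[(\Delta \circ T_\varepsilon^k)(G + \varepsilon R_\varepsilon)] = \varepsilon \sum_{k=1}^{\infty} \mu_0[(\Delta \circ T_\varepsilon^k) G] + \varepsilon^2 \sum_{k=1}^{\infty} \mu_0[(\Delta \circ T_\varepsilon^k) R_\varepsilon] = \varepsilon \sum_{k=1}^{\infty} \mu_0[(\Delta \circ T_0^k) G] + o(\varepsilon), \tag{8.5}$$
where we have used the Lebesgue dominated convergence theorem in the last step. Now (8.3) can be estimated as
$$\mu_\varepsilon(\Delta) = \varepsilon \sum_{k=1}^{\infty} \mu_0[(\Delta \circ T_0^k) G] + o(\varepsilon). \tag{8.6}$$
Again by Lemma 8.1 and the fact that $\mu_0(G + \varepsilon R_\varepsilon) = \mu_0((1 - J_\varepsilon)/\varepsilon) = 0$,
$$\mu_\varepsilon(\tau) = \mu_0(\tau) + \varepsilon \sum_{k=1}^{\infty} \mu_0[(\tau \circ T_\varepsilon^k)(G + \varepsilon R_\varepsilon)] = \bar{\tau} + O(\varepsilon),$$
where we have used the exponential decay of correlations stated in Lemma 7.2 in the last step. Denote $\sigma = (\sigma_x, \sigma_y)$, where for $a \in \{x, y\}$,
$$\sigma_a = \sum_{k=1}^{\infty} \mu_0[(\Delta_a \circ T_0^k) G].$$
Then we have shown that $\mathbf{J} = \sigma\varepsilon + o(\varepsilon)$ and $\hat{\mathbf{J}} = \varepsilon\sigma/\bar{\tau} + o(\varepsilon)$.

Now we turn to (2.7) and (2.9). The convergence to a normal law $N(0, D_\varepsilon)$ follows from the Central Limit Theorem (Lemma 7.5) and Lemma 8.1. Thus it is enough to estimate $D_\varepsilon$, which is given by the sum of correlations
$$D_\varepsilon = \sum_{n=-\infty}^{\infty} \bigl[ \mu_\varepsilon[(\Delta \circ T_\varepsilon^n) \otimes \Delta] - \mu_\varepsilon(\Delta) \otimes \mu_\varepsilon(\Delta) \bigr]. \tag{8.7}$$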
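Note that for any $n \ge 1$,
$$\lim_{\varepsilon \to 0} \Delta \circ T_\varepsilon^n = \Delta \circ T_0^n.$$
Furthermore, by Lemma 7.2 and its remarks, there exist $C > 0$ and $\theta \in (0, 1)$, independent of $\varepsilon$, such that for any $\varepsilon \in [0, \varepsilon_0]$,
$$\bigl| \mu_\varepsilon[(\Delta \circ T_\varepsilon^n) \otimes \Delta] - \mu_\varepsilon(\Delta)^2 \bigr| \le C\theta^n.$$
This implies that the series $\sum_{n=1}^{\infty} C_\Delta(n)$ converges to a fixed constant $C_1$. Hence the diffusion matrix is continuous in $\varepsilon$ at $\varepsilon = 0$:
$$D_\varepsilon = \sum_{n=-\infty}^{\infty} \bigl[ \mu_\varepsilon[(\Delta \circ T_\varepsilon^n) \otimes \Delta] - \mu_\varepsilon(\Delta)\mu_\varepsilon(\Delta) \bigr] = \sum_{n=-\infty}^{\infty} \mu_0[(\Delta \circ T_0^n) \otimes \Delta] + o(1) = D^* + o(1). \tag{8.8}$$
This finishes the proof of Theorem 2.2 and Theorem 2.3.

Finally we prove Theorem 2.4. Let $\tilde{\lambda}_\varepsilon^s < 0 < \tilde{\lambda}_\varepsilon^u$ denote the Lyapunov exponents of the map $T_\varepsilon$. The sum $\xi_\varepsilon := \tilde{\lambda}_\varepsilon^s + \tilde{\lambda}_\varepsilon^u$ represents the average rate of volume contraction if $\xi_\varepsilon < 0$ (resp. expansion if $\xi_\varepsilon > 0$) under the map $T_\varepsilon$. This implies $\tilde{\lambda}_\varepsilon^s + \tilde{\lambda}_\varepsilon^u = \mu_\varepsilon(\ln J_\varepsilon)$, and hence
$$\xi_\varepsilon = \mu_\varepsilon(\ln(1 - (1 - J_\varepsilon))) = -\mu_\varepsilon(1 - J_\varepsilon) - \tfrac12 \mu_\varepsilon((1 - J_\varepsilon)^2) + O(\varepsilon^3).$$
Using an analysis similar to (8.5), one can check that
$$\mu_\varepsilon(1 - J_\varepsilon) = \mu_0(1 - J_\varepsilon) + \sum_{k=1}^{\infty} \mu_0[(1 - J_\varepsilon) \circ T_\varepsilon^k \cdot (1 - J_\varepsilon)] = \varepsilon^2 \sum_{k=1}^{\infty} \mu_0(G \circ T_0^k \cdot G) + o(\varepsilon^2). \tag{8.9}$$
In addition
$$\mu_\varepsilon((1 - J_\varepsilon)^2) = \mu_0((1 - J_\varepsilon)^2) + \sum_{k=1}^{\infty} \mu_0[((1 - J_\varepsilon)^2) \circ T_\varepsilon^k \cdot (1 - J_\varepsilon)] = \varepsilon^2 \mu_0(G^2) + \varepsilon^3 \sum_{k=1}^{\infty} \mu_0(G^2 \circ T_0^k \cdot G) + o(\varepsilon^3), \tag{8.10}$$
where we used (1.8) and (8.3) in the above estimates. Combining these facts we get
$$\xi_\varepsilon = -\frac{\varepsilon^2}{2} \sum_{k=-\infty}^{\infty} \mu_0(G \circ T_0^k \cdot G) + o(\varepsilon^2).$$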
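The passage from (8.9) and (8.10) to the two-sided sum for $\xi_\varepsilon$ can be written out as follows. This is a sketch that assumes the expansion $\ln(1-u) = -u - u^2/2 + O(u^3)$ with $u = 1 - J_\varepsilon = O(\varepsilon)$, and the symmetry of correlations under the $T_0$-invariant measure $\mu_0$:

```latex
\[
\xi_\varepsilon
= -\mu_\varepsilon(1-J_\varepsilon)-\tfrac12\,\mu_\varepsilon\bigl((1-J_\varepsilon)^2\bigr)+O(\varepsilon^3)
= -\varepsilon^2\sum_{k\ge 1}\mu_0(G\circ T_0^k\cdot G)-\tfrac{\varepsilon^2}{2}\,\mu_0(G^2)+o(\varepsilon^2),
\]
\[
\mu_0(G\circ T_0^{-k}\cdot G)=\mu_0(G\circ T_0^{k}\cdot G)
\;\Longrightarrow\;
\xi_\varepsilon=-\frac{\varepsilon^2}{2}\sum_{k=-\infty}^{\infty}\mu_0(G\circ T_0^k\cdot G)+o(\varepsilon^2).
\]
```

The $k = 0$ term of the two-sided sum is $\mu_0(G^2)$, and each $k \ge 1$ term is counted twice, which accounts for the factor $\tfrac12$ in front.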
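Also note that by Pesin's formula, $h_{\mu_\varepsilon}(T_\varepsilon) = \mu_\varepsilon(\Lambda_\varepsilon)$, where $\Lambda_\varepsilon(x)$ is the local expansion rate of $x \in M$ along the unstable direction under the map $T_\varepsilon$. Then according to (8.1) and the exponential decay of correlations, we have
$$h_{\mu_\varepsilon}(T_\varepsilon) = \mu_0(\Lambda_\varepsilon) + \sum_{k=1}^{\infty} \mu_0(\Lambda_\varepsilon \circ T_\varepsilon^k \cdot (1 - J_\varepsilon)) = \mu_0(\Lambda_0) + \varepsilon \sum_{k=1}^{\infty} \mu_0(\Lambda_0 \circ T_\varepsilon^k \cdot (G + \varepsilon R_\varepsilon)) + O(\varepsilon) = h_0 + O(\varepsilon), \tag{8.11}$$
where $h_0 := h_{\mu_0}(T_0)$ is the Kolmogorov-Sinai entropy of the billiard map $T_0$. Combining the above facts and (2.3) we get
$$HD(\mu_\varepsilon) = h_{\mu_\varepsilon}(T_\varepsilon) \left( \frac{1}{\tilde{\lambda}_\varepsilon^u} - \frac{1}{\tilde{\lambda}_\varepsilon^s} \right) = 2 - \frac{\xi_\varepsilon}{\tilde{\lambda}_\varepsilon^s} = 2 - \frac{\varepsilon^2}{2h_0 + O(\varepsilon)} \sum_{k=-\infty}^{\infty} \mu_0(G \circ T_0^k \cdot G) + o(\varepsilon^2) = 2 - \frac{\varepsilon^2}{2h_0} \sum_{k=-\infty}^{\infty} \mu_0(G \circ T_0^k \cdot G) + o(\varepsilon^2).$$
Thus
$$HD(\hat{\mu}_\varepsilon) = HD(\mu_\varepsilon) + 1 = 3 - \frac{\varepsilon^2}{2h_0} \sum_{k=-\infty}^{\infty} \mu_0(G \circ T_0^k \cdot G) + o(\varepsilon^2).$$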
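The identity $HD(\mu_\varepsilon) = 2 - \xi_\varepsilon/\tilde{\lambda}^s_\varepsilon$ used above can be verified in one line. This sketch assumes that Pesin's formula takes the form $h_{\mu_\varepsilon}(T_\varepsilon) = \tilde{\lambda}^u_\varepsilon$ (a single positive exponent) and uses the definition $\xi_\varepsilon = \tilde{\lambda}^s_\varepsilon + \tilde{\lambda}^u_\varepsilon$:

```latex
\[
h_{\mu_\varepsilon}(T_\varepsilon)=\tilde\lambda^u_\varepsilon
\;\Longrightarrow\;
h_{\mu_\varepsilon}(T_\varepsilon)\Bigl(\frac{1}{\tilde\lambda^u_\varepsilon}-\frac{1}{\tilde\lambda^s_\varepsilon}\Bigr)
= 1-\frac{\tilde\lambda^u_\varepsilon}{\tilde\lambda^s_\varepsilon}
= 1-\frac{\xi_\varepsilon-\tilde\lambda^s_\varepsilon}{\tilde\lambda^s_\varepsilon}
= 2-\frac{\xi_\varepsilon}{\tilde\lambda^s_\varepsilon}.
\]
```

Since $\tilde{\lambda}^s_\varepsilon = \xi_\varepsilon - \tilde{\lambda}^u_\varepsilon = -h_0 + O(\varepsilon)$ and $\xi_\varepsilon = O(\varepsilon^2)$, the correction to the dimension is of order $\varepsilon^2$, consistent with the expansion above.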
Acknowledgements. The research is partially supported by NSF Grant DMS 0901448. The author would like to thank Prof. Chernov for constant encouragement and many useful discussions. She would also like to thank Prof. Joel Lebowitz, who actually brought up the model of billiards with twists.
References
1. Abramov, L.M.: On the entropy of a flow. Dokl. Akad. Nauk SSSR 128, 873–875 (1959)
2. Bunimovich, L.A., Sinai, Ya.G.: Statistical properties of Lorentz gas with periodic configuration of scatterers. Commun. Math. Phys. 78, 479–497 (1980/81)
3. Bunimovich, L.A., Sinai, Ya.G., Chernov, N.I.: Statistical properties of two-dimensional hyperbolic billiards. Russian Math. Surveys 46, 47–106 (1991)
4. Chernov, N.: Decay of correlations in dispersing billiards. J. Stat. Phys. 94, 513–556 (1999)
5. Chernov, N.: Sinai billiards under small external forces. Ann. Henri Poincaré 2, 197–236 (2001)
6. Chernov, N.: Advanced statistical properties of dispersing billiards. J. Stat. Phys. 122, 1061–1094 (2006)
7. Chernov, N.: Sinai billiards under small external forces II. Ann. Henri Poincaré 9, 91–107 (2008)
8. Chernov, N., Dolgopyat, D.: Brownian Motion - I. Memoirs of AMS 198, 927 (2009)
9. Chernov, N., Markarian, R.: Chaotic Billiards. Mathematical Surveys and Monographs 127, Providence, RI: Amer. Math. Soc., 2006
10. Chernov, N., Zhang, H.-K.: Billiards with polynomial mixing rates. Nonlinearity 4, 1527–1553 (2005)
11. Chernov, N., Zhang, H.-K.: Statistical properties of hyperbolic systems with general singularities. J. Stat. Phys. 136, 615–642 (2009)
12. Collet, P.: A remark about uniform de-correlation prefactors. Preprint, 1999
13. Dolgopyat, D.: On dynamics of mostly contracting diffeomorphisms. Commun. Math. Phys. 213, 181–201 (2001)
14. Katok, A., Strelcyn, J.-M. (with the collaboration of F. Ledrappier & F. Przytycki): Invariant manifolds, entropy and billiards; smooth maps with singularities. Lect. Notes Math. 1222, Berlin-Heidelberg-New York: Springer-Verlag, 1986
15. Lorentz, H.A.: The motion of electrons in metallic bodies. Proc. Amst. Acad. 7, 438–453 (1905)
16. Pesin, Ya.B.: Dynamical systems with generalized hyperbolic attractors: hyperbolic, ergodic and topological properties. Erg. Th. Dyn. Syst. 12, 123–152 (1992)
17. Sataev, E.: Invariant measures for hyperbolic maps with singularities. Russ. Math. Surv. 47, 191–251 (1992)
18. Young, L.-S.: Dimension, entropy and Lyapunov exponents. Erg. Th. Dyn. Syst. 2, 109–124 (1982)
19. Young, L.-S.: Statistical properties of systems with some hyperbolicity including certain billiards. Ann. Math. 147, 585–650 (1998)

Communicated by G. Gallavotti
Commun. Math. Phys. 306, 777–784 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1225-x
Lower Bounds for Nodal Sets of Eigenfunctions
Tobias H. Colding (1), William P. Minicozzi II (2)
(1) Dept. of Math., MIT, 77 Massachusetts Avenue, Cambridge, MA 02139-4307, USA. E-mail: [email protected]
(2) Dept. of Math., Johns Hopkins University, 3400 N. Charles St., Baltimore, MD 21218, USA. E-mail: [email protected]
Received: 26 September 2010 / Accepted: 26 October 2010
Published online: 9 March 2011 – © Springer-Verlag 2011
Abstract: We prove lower bounds for the Hausdorff measure of nodal sets of eigenfunctions.
0. Introduction
Let $M$ be a smooth closed Riemannian manifold and $\Delta$ the Laplace operator. A function $u$ is said to be an eigenfunction with eigenvalue $\lambda$ if
$$\Delta u = -\lambda u. \tag{0.1}$$
With our convention on the sign of $\Delta$, the eigenvalues are non-negative and go to infinity. One of the most fundamental questions about eigenfunctions is to understand the sets where they vanish; these sets are called nodal sets. Nodal sets are $(n-1)$-dimensional manifolds away from $(n-2)$-dimensional singular sets where the gradient also vanishes, so it is natural to estimate the $(n-1)$-dimensional Hausdorff measure $\mathcal{H}^{n-1}$. The main result of this short note is the following lower bound:
Theorem 1. Given a closed $n$-dimensional Riemannian manifold $M$, there exists $C$ so that
$$\mathcal{H}^{n-1}(\{u = 0\}) \ge C \lambda^{\frac{3-n}{4}}. \tag{0.2}$$
In particular, Theorem 1 gives a uniform lower bound in dimension $n = 3$. In [Y], S.-T. Yau conjectured the lower bound $C\sqrt{\lambda}$ in all dimensions. This was proven for surfaces by Brüning, [B], and Yau, independently, and for real analytic metrics by Donnelly and Fefferman in [DF], but remains open in the smooth case.
The authors were partially supported by NSF Grants DMS 0606629, DMS 0906233, and NSF FRG grants DMS 0854774 and DMS 0853501.
The Donnelly-Fefferman argument leads to exponentially decaying lower bounds in the smooth case; see also [HL]. In [M], Mangoubi considered eigenfunctions on a ball in a manifold and proved lower bounds for the volume of the subset of the ball where the function is positive and the subset where it is negative. Combined with the isoperimetric inequality, one can get the lower bound $C \lambda^{\frac{3-n}{2} - \frac{1}{2n}}$ for the measure of the nodal set on the entire manifold. Recently, Sogge and Zelditch, [SZ], proved the lower bound $C \lambda^{\frac{7-3n}{8}}$. Their argument is completely different and is based in part on a beautiful new integral formula relating the $L^1$ norm of $|\nabla u|$ on the nodal set and the $L^1$ norm of $u$ on $M$. Finally, note that some papers use $\lambda^2$ for the eigenvalue (i.e., $\Delta u = -\lambda^2 u$); with that convention, our bound is $C \lambda^{\frac{3-n}{2}}$.
1. Finding Good Balls
Throughout $M$ will be a fixed closed manifold with Laplace operator $\Delta$ and $u$ will be an eigenfunction with eigenvalue $\lambda$ and $\int_M u^2 = 1$. We will always assume that $\lambda \ge 1$, since our interest is in what happens when $\lambda$ goes to infinity. The first step is to fix a scale $r$ depending on $\lambda$:
Lemma 1. There exists $a > 0$ so that $u$ has a zero in every ball of radius $\frac{a}{3} \lambda^{-1/2}$.
This lemma is standard, but we will include a proof since it is so short.
Proof. If $u$ does not vanish on $B_{r/3}$, then Barta's theorem implies that the lowest Dirichlet eigenvalue on this ball is at least $\lambda$. Let $\phi$ be a function that is identically one on $B_{r/6}$ and that cuts off linearly to zero on the annulus $B_{r/3} \setminus B_{r/6}$. It follows that
$$\lambda \le \frac{\int |\nabla \phi|^2}{\int \phi^2} \le \frac{36\, r^{-2}\, \mathrm{Vol}(B_{r/3})}{\mathrm{Vol}(B_{r/6})} \le \frac{C}{r^2}, \tag{1.1}$$
where $C$ comes from the Bishop-Gromov volume comparison (see p. 275 of [G]) and depends only on $n$, the Ricci curvature of $M$, and an upper bound for $r$. Since this is impossible for $r^2 > C \lambda^{-1}$, the lemma follows.
From now on, we set $r = a \lambda^{-1/2}$ with $a$ given by Lemma 1. Next we use a standard covering argument to decompose the manifold $M$ into small balls of radius $r$. If $B$ is a ball in $M$, then we write $2B$ for the ball with the same center as $B$ and twice the radius.
Lemma 2. There exists a collection $\{B_i\}$ of balls of radius $r$ in $M$ so that $M \subset \cup_i B_i$ and each point of $M$ is contained in at most $C_M = C_M(M)$ of the double balls $2B_i$.
Proof. Choose a maximal disjoint collection of balls $B_{r/2}(p_i)$. It follows immediately from maximality that the double balls $B_i = B_r(p_i)$ cover $M$. Suppose that $p \in M$ is contained in balls $B_{2r}(p_1), \ldots, B_{2r}(p_k)$. In particular, the disjoint balls $B_{r/2}(p_1), \ldots, B_{r/2}(p_k)$ are all contained in $B_{3r}(p)$ so that
$$\sum_{i=1}^{k} \mathrm{Vol}(B_{r/2}(p_i)) \le \mathrm{Vol}(B_{3r}(p)). \tag{1.2}$$
On the other hand, for each $i$, the Bishop-Gromov volume comparison gives $C_M$ that depends only on $n$, an upper bound on $r$, and a bound on the Ricci curvature of $M$ so that
$$\mathrm{Vol}(B_{3r}(p)) \le \mathrm{Vol}(B_{5r}(p_i)) \le C_M\, \mathrm{Vol}(B_{r/2}(p_i)). \tag{1.3}$$
Combining (1.2) and (1.3) gives that $k \le C_M$. Since $p$ is arbitrary, the lemma follows.
From now on, we will use the balls $B_i$ given by Lemma 2. These balls will be sorted into two groups, depending on how fast $u$ is growing from $B_i$ to $2B_i$. The two groups will be the ones that are d-good and the ones that are not. Namely, given a constant $d > 1$, we will say that a ball $B_i$ is d-good if
$$\int_{2B_i} u^2 \le 2^d \int_{B_i} u^2. \tag{1.4}$$
Let $G_d$ be the union of the d-good balls
$$G_d = \cup \{B_i \mid B_i \text{ is } d\text{-good}\}. \tag{1.5}$$
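The covering argument of Lemma 2 can be illustrated in a flat model. The following is only a sketch under stated assumptions: a finite random sample of the unit square stands in for $M$, Euclidean distance replaces the Riemannian distance, and the helper names are hypothetical. In the plane the volume comparison (1.2) reads $k\,\pi (r/2)^2 \le \pi (3r)^2$, so the constant 36 plays the role of $C_M$.

```python
import math
import random


def greedy_separated_centers(points, r):
    """Greedily pick centers that are pairwise at distance >= r.

    The open balls of radius r/2 about the chosen centers are then disjoint,
    and the greedy pass makes the collection maximal among the sample points.
    """
    centers = []
    for p in points:
        if all(math.dist(p, c) >= r for c in centers):
            centers.append(p)
    return centers


def max_multiplicity(points, centers, radius):
    """Largest number of balls B_radius(center) that contain one sample point."""
    return max(
        sum(1 for c in centers if math.dist(p, c) < radius) for p in points
    )


random.seed(0)  # fixed seed so the sketch is reproducible
sample = [(random.random(), random.random()) for _ in range(2000)]
r = 0.1

centers = greedy_separated_centers(sample, r)
# Maximality: every sample point lies in some ball B_r(center).
covered = all(any(math.dist(p, c) < r for c in centers) for p in sample)
# Bounded overlap of the double balls 2B_i = B_{2r}(center).
overlap = max_multiplicity(sample, centers, 2 * r)
```

Here `covered` mirrors the statement that the balls $B_i$ cover $M$, and `overlap` is the empirical counterpart of the multiplicity bound for the double balls.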
The next lemma shows that most of the $L^2$ norm of $u$ comes from d-good balls, provided that $d$ is chosen fixed large independently of $\lambda$.
Lemma 3. There exists $d_M$ depending only on $C_M$ so that if $d \ge d_M$, then
$$\int_{G_d} u^2 \ge \frac{3}{4}. \tag{1.6}$$
Proof. Let $\Omega = \cup \{B_i \mid B_i \text{ is not } d\text{-good}\}$ be the union of the balls $B_i$ that are not d-good. Since the $B_i$'s cover $M$, we have
$$\int_{G_d} u^2 \ge \int_M u^2 - \int_{\Omega} u^2 = 1 - \int_{\Omega} u^2. \tag{1.7}$$
If the ball $B_i$ is not d-good, then
$$\int_{2B_i} u^2 > 2^d \int_{B_i} u^2. \tag{1.8}$$
Summing (1.8) over the balls that are not d-good gives
$$\int_{\Omega} u^2 \le \sum_{B_i \text{ is not } d\text{-good}} \int_{B_i} u^2 \le 2^{-d} \sum_i \int_{2B_i} u^2 \le 2^{-d} C_M \int_M u^2 = 2^{-d} C_M, \tag{1.9}$$
where the second inequality used (1.8) and the third inequality used Lemma 2. If we choose $d_M$ so that $2^{-d_M} C_M = \frac{1}{4}$, then (1.6) follows by combining (1.7) and (1.9).
To get a lower bound for the number of good balls, we will use the following $L^p$ bounds for eigenfunctions proven by Sogge in [S1]:
$$\|u\|_{L^p} \le \begin{cases} C \lambda^{\frac{n(p-2)-p}{4p}} & \text{if } p \ge \frac{2(n+1)}{n-1}, \\ C \lambda^{\frac{(n-1)(p-2)}{8p}} & \text{if } p \le \frac{2(n+1)}{n-1}. \end{cases} \tag{1.10}$$
We will only use this estimate for $p = \frac{2(n+1)}{n-1}$ as this gives the sharpest bound; see the remark right after the proof.
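Lemma 4. There exists $C$ depending only on $M$ so that there are at least $C \lambda^{\frac{n+1}{4}}$ balls that are $d_M$-good.
Proof. Let $N$ denote the number of $d_M$-good balls $B_i$. Given any $p > 2$, the $L^2$ norm of $u$ on a set $G$ is bounded by
$$\int_G u^2 \le \left( \int_G 1 \right)^{\frac{p-2}{p}} \left( \int_G (u^2)^{\frac{p}{2}} \right)^{\frac{2}{p}} \le (\mathrm{Vol}(G))^{\frac{p-2}{p}}\, \|u\|_{L^p}^2. \tag{1.11}$$
Raising both sides to the $\frac{p}{p-2}$ power, bringing the $L^p$ norm to the other side, and setting $G = G_{d_M}$ gives
$$\left( \frac{3}{4} \right)^{\frac{p}{p-2}} \|u\|_{L^p}^{-\frac{2p}{p-2}} \le \left( \int_{G_{d_M}} u^2 \right)^{\frac{p}{p-2}} \|u\|_{L^p}^{-\frac{2p}{p-2}} \le \mathrm{Vol}(G_{d_M}), \tag{1.12}$$
where the first inequality used Lemma 3. Thus, for any $p \le \frac{2(n+1)}{n-1}$, the $L^p$ eigenfunction bound (1.10) gives
$$C_p\, \lambda^{\frac{1-n}{4}} = C_p \left( \lambda^{\frac{(n-1)(p-2)}{8p}} \right)^{-\frac{2p}{p-2}} \le \mathrm{Vol}(G_{d_M}), \tag{1.13}$$
where the constant $C_p$ depends on $p$ and $M$ but not on $\lambda$.
Since $\mathrm{Vol}(B_i) \le C'_M r^n = C' \lambda^{-\frac{n}{2}}$ for $C'_M$ and $C'$ depending on $M$ (in fact, just on $n$, a lower bound for the Ricci curvature, and an upper bound on $r$), we get that
$$C \lambda^{\frac{1-n}{4}} \le \mathrm{Vol}(G_{d_M}) \le N\, C' \lambda^{-\frac{n}{2}}, \tag{1.14}$$
giving the lemma.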
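The exponent bookkeeping in (1.13) and (1.14) can be checked with exact rational arithmetic. The sketch below uses hypothetical helper names and only encodes the exponents of $\lambda$ stated above: the lower branch of (1.10), the power $-\frac{2p}{p-2}$ from (1.12), and the factor $\lambda^{-n/2}$ from the volume of a ball of radius $r = a\lambda^{-1/2}$.

```python
from fractions import Fraction


def sogge_exponent_low(n, p):
    """Exponent of lambda in (1.10) on the branch p <= 2(n+1)/(n-1)."""
    p = Fraction(p)
    return Fraction(n - 1) * (p - 2) / (8 * p)


def volume_exponent(n, p):
    """Exponent of lambda in the lower bound (1.13) for Vol(G_{d_M})."""
    p = Fraction(p)
    return sogge_exponent_low(n, p) * (-2 * p / (p - 2))


def good_ball_count_exponent(n, p):
    """Exponent of lambda in the count N read off from (1.14)."""
    return volume_exponent(n, p) + Fraction(n, 2)


# Example: n = 3 with the endpoint p = 2(n+1)/(n-1) = 4.
example_volume = volume_exponent(3, 4)
example_count = good_ball_count_exponent(3, 4)
```

On the lower branch the exponent in (1.13) does not depend on $p$, which is why the proof may use any admissible $p \le \frac{2(n+1)}{n-1}$ there.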
Remark 1. Setting $p = \frac{2(n+1)}{n-1}$ gives the sharpest bound in Lemma 4. To see this, suppose that $p \ge \frac{2(n+1)}{n-1}$ and use the $L^p$ eigenfunction bound (1.10) in (1.12) to get
$$C_p\, \lambda^{\frac{p}{2(p-2)} - \frac{n}{2}} = C_p \left( \lambda^{\frac{n(p-2)-p}{4p}} \right)^{-\frac{2p}{p-2}} \le \mathrm{Vol}(G_{d_M}), \tag{1.15}$$
where the constant $C_p$ depends on $p$ and $M$ but not on $\lambda$. Since $\frac{p}{2(p-2)}$ is monotone decreasing in $p$, the bound (1.15) is sharpest at the endpoint $p = \frac{2(n+1)}{n-1}$.
Remark 2. The lower bound (1.13) for the volume where $u^2$ concentrates is sharp. There are spherical harmonics concentrating on a $\lambda^{-1/4}$ neighborhood of a geodesic; see [S2].
Remark 3. If we used above the Sobolev inequality (p. 89 in [ScY]) instead of the $L^p$-bounds of Sogge, then we would get the following lower bound for the volume of $G$ where $p = \frac{2n}{n-2}$,
$$\int_G u^2 \le \left( \int_G 1 \right)^{\frac{p-2}{p}} \left( \int_G (u^2)^{\frac{p}{2}} \right)^{\frac{2}{p}} \le (\mathrm{Vol}(G))^{\frac{p-2}{p}}\, \|u\|_{L^p}^2 \le (\mathrm{Vol}(G))^{\frac{p-2}{p}}\, \|\nabla u\|_{L^2}^2 = (\mathrm{Vol}(G))^{\frac{p-2}{p}}\, \lambda. \tag{1.16}$$
Note that the Sobolev inequality holds since $u$ is an eigenfunction so $\int_M u = 0$. Since $\frac{p-2}{p} = 1 - \frac{2}{p} = \frac{2}{n}$, we get
$$\int_G u^2 \le (\mathrm{Vol}(G))^{\frac{2}{n}}\, \lambda, \tag{1.17}$$
which gives only that the number of good balls is bounded below (independent of $\lambda$). This leads to the lower bound $C \lambda^{\frac{1-n}{2}}$ for the measure of the nodal set.
2. Local Estimates for the Nodal Set
The main theorem will follow by combining the lower bound on the number of good balls with a lower bound for the nodal volume in each good ball. This local estimate for the nodal set is based on the isoperimetric inequality together with estimates for the sets where the function is positive and negative; this approach comes from [DF] where they prove a similar local estimate (cf. also [HL]). The local lower bound for nodal volume is the following:
Proposition 1. Given constants $d > 1$ and $\rho > 1$, there exist $\mu > 0$ and $\bar\lambda$ so that if $\Delta u = -\lambda u$ on $B_r(p) \subset M$ with $r \le \rho \lambda^{-1/2}$, $\lambda \ge \bar\lambda$, $u$ vanishes somewhere in $B_{r/3}(p)$, and
$$\int_{B_{2r}(p)} u^2 \le 2^d \int_{B_r(p)} u^2, \tag{2.1}$$
then
$$\mathcal{H}^{n-1}(B_r(p) \cap \{u = 0\}) \ge \mu\, r^{n-1}. \tag{2.2}$$
Given this proposition, we can now prove the main theorem:
Proof (of Theorem 1). We can assume that $\lambda$ is large. By Lemma 4, there are at least $C \lambda^{\frac{n+1}{4}}$ balls that are $d_M$-good. If $B_i$ is any of the balls in the covering, then Lemma 1 implies that $u$ vanishes somewhere in $\frac{1}{3} B_i$. Thus, if $B_i$ is $d_M$-good, then Proposition 1 (with $d = d_M$ and $r = a \lambda^{-1/2}$) gives
$$\mathcal{H}^{n-1}(B_i \cap \{u = 0\}) \ge C_1 \lambda^{-\frac{n-1}{2}}, \tag{2.3}$$
where $C_1$ depends only on $M$. Here, we have used that we can assume that $\lambda$ is large. Combining these two facts and using the covering bound from Lemma 2 gives
$$C \lambda^{\frac{n+1}{4}}\, C_1 \lambda^{-\frac{n-1}{2}} \le \sum_{B_i\ d_M\text{-good}} \mathcal{H}^{n-1}(B_i \cap \{u = 0\}) \le C_M\, \mathcal{H}^{n-1}(\{u = 0\}), \tag{2.4}$$
and the theorem follows.
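As a sanity check on the final exponent, the sketch below (hypothetical function names, exact arithmetic) combines the count of good balls from Lemma 4 with the per-ball bound (2.3) and compares the result with the exponents quoted in the introduction from [SZ] and [M] and with Yau's conjectured exponent.

```python
from fractions import Fraction


def theorem_exponent(n):
    """lambda^{(n+1)/4} good balls times lambda^{-(n-1)/2} nodal measure per ball."""
    return Fraction(n + 1, 4) - Fraction(n - 1, 2)


def sogge_zelditch_exponent(n):
    """Exponent quoted from [SZ] in the introduction."""
    return Fraction(7 - 3 * n, 8)


def mangoubi_exponent(n):
    """Exponent obtained from [M] plus the isoperimetric inequality."""
    return Fraction(3 - n, 2) - Fraction(1, 2 * n)


def yau_exponent(n):
    """Exponent in Yau's conjectured lower bound C * sqrt(lambda)."""
    return Fraction(1, 2)


# Rows of (n, this note, [SZ], [M], conjecture) for a few dimensions.
comparison = [
    (n, theorem_exponent(n), sogge_zelditch_exponent(n), mangoubi_exponent(n), yau_exponent(n))
    for n in range(2, 7)
]
```

The product of the two powers of $\lambda$ in (2.4) is $\lambda^{\frac{3-n}{4}}$, which is the exponent in (0.2) and vanishes at $n = 3$.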
It only remains to prove the local estimate in Proposition 1.
2.1. Proof of the local lower bound. In Euclidean space, if an eigenfunction $u$ vanishes at a point $p$, then its average is zero on every ball with center at $p$. We will use the following generalization of this:
Lemma 5. There exists $\bar R > 0$ depending on $M$ so that if $r \le \bar R$ and $u(p) = 0$, then
$$\left| \int_{B_r(p)} u \right| \le \frac{1}{3} \int_{B_r(p)} |u|. \tag{2.5}$$
Proof. Given a function $v$, define the spherical average
$$I_v(s) = s^{1-n} \int_{\partial B_s(p)} v.$$
Let $d$ denote the distance in $M$ to $p$. Differentiating $I_v(s)$ gives
$$I_v'(s) = s^{1-n} \int_{\partial B_s(p)} \left[ \frac{\partial v}{\partial r} + v \left( \Delta d + \frac{1-n}{s} \right) \right] \tag{2.6}$$
$$= s^{1-n} \int_{B_s(p)} \Delta v + s^{1-n} \int_{\partial B_s(p)} v \left( \Delta d + \frac{1-n}{s} \right), \tag{2.7}$$
where the second equality used the divergence theorem. On $\mathbb{R}^n$, $\Delta d + \frac{1-n}{d} = 0$. On $M$, as long as $d$ is small (depending on $\sup |K_M|$ and a lower bound for the injectivity radius of $M$), the Hessian comparison theorem (see, e.g., p. 4 in [ScY]) gives
$$\left| \Delta d + \frac{1-n}{d} \right| \le h(d), \tag{2.8}$$
where the function $h : [0, \infty) \to \mathbb{R}$ is continuous, monotone non-decreasing, and satisfies $h(0) = 0$. Thus, we see that
$$|I_u'(s)| \le \lambda\, s^{1-n} \left| \int_{B_s(p)} u \right| + s^{1-n} \int_{\partial B_s(p)} h(s)\, |u| \le \frac{\lambda s}{n} \max_{t \le s} |I_u(t)| + h(s)\, I_{|u|}(s). \tag{2.9}$$
Motivated by this, define the function $f$ by
$$f(s) = \max_{t \le s} |I_u(t)|. \tag{2.10}$$
Observe that $f$ is automatically monotone non-decreasing, $f(0) = 0$ (since $u(p) = 0$), and $f$ is Lipschitz with
$$f'(s) \le |I_u'(s)| \le \frac{\lambda s}{n} f(s) + h(s)\, I_{|u|}(s), \tag{2.11}$$
where this inequality is understood in the sense of the limsup of forward difference quotients. In particular, we have for $s \le r$ that
$$\frac{d}{ds} \left( f(s)\, e^{-\frac{\lambda r s}{n}} \right) \le h(s)\, I_{|u|}(s). \tag{2.12}$$
Using that $f(0) = 0$ and integrating this gives for each $t \le r$ that
$$f(t) \le e^{\frac{\lambda r t}{n}}\, h(t) \int_0^t I_{|u|}(s)\, ds \le e^{\frac{\lambda r^2}{n}}\, h(r) \int_0^t I_{|u|}(s)\, ds. \tag{2.13}$$
By the coarea formula, we have
$$\left| \int_{B_r(p)} u \right| = \left| \int_0^r t^{n-1} I_u(t)\, dt \right| \le \frac{r^n}{n} f(r) \le e^{\frac{\lambda r^2}{n}}\, h(r)\, \frac{r^n}{n} \int_0^r I_{|u|}(t)\, dt. \tag{2.14}$$
Observe that $e^{\frac{\lambda r^2}{n}}$ is bounded since $r$ is on the order of $\lambda^{-1/2}$ and we can make $h(r)$ as small as we like by taking $r$ small enough (independent of $\lambda$). Thus, to finish off the proof, we need only bound $r^n \int_0^r I_{|u|}(t)\, dt$ by a fixed multiple of $\int_{B_r(p)} |u|$. To do this, observe that, since $r$ is proportional to $\lambda^{-1/2}$, the mean value inequality (Theorem 1.2 in [LiSc]) gives
$$\sup_{B_{r/2}(p)} |u| \le C\, r^{-n} \int_{B_r(p)} |u|, \tag{2.15}$$
so we get
$$\int_0^r I_{|u|}(t)\, dt = \int_0^{r/2} t^{1-n} \int_{\partial B_t(p)} |u|\, dt + \int_{r/2}^{r} t^{1-n} \int_{\partial B_t(p)} |u|\, dt \le \frac{r}{2} \sup_{B_{r/2}(p)} |u|\, \sup_{t \le r/2} \frac{\mathrm{Vol}(\partial B_t(p))}{t^{n-1}} + \left( \frac{2}{r} \right)^{n-1} \int_{r/2}^{r} \int_{\partial B_t(p)} |u|\, dt \le C_1\, r^{1-n} \int_{B_r(p)} |u|. \tag{2.16}$$
Thus, since $r \le \bar R$, we get the desired bound and the lemma follows.
We are now ready to prove the local lower bound:
Proof (of Proposition 1). Let $q \in B_{r/3}(p)$ be a point with $u(q) = 0$. Note that
$$B_r(p) \subset B_{4r/3}(q) \quad \text{and} \quad B_{5r/3}(q) \subset B_{2r}(p). \tag{2.17}$$
Since the scale $r$ is proportional to $\lambda^{-1/2}$, we can apply the mean value inequality (Theorem 1.2 in [LiSc]) to $u^2$ to get
$$\sup_{B_{4r/3}(q)} u^2 \le C_0\, r^{-n} \int_{B_{2r}(p)} u^2 \le C_0\, 2^d\, r^{-n} \int_{B_r(p)} u^2 \le C_0\, 2^d\, r^{-n} \int_{B_{4r/3}(q)} u^2, \tag{2.18}$$
where the second inequality used (2.1) and $C_0$ depends only on $n$, the geometry of $M$, and an upper bound for $r^2 \lambda$ (all of which are fixed).
From now on, all integrals will be over $B_{4r/3}(q)$ unless stated otherwise. Using (2.18), we get the "reverse Hölder" inequality
$$\left( \int u^2 \right)^2 \le \sup u^2 \left( \int |u| \right)^2 \le C_0\, 2^d\, r^{-n} \int u^2 \left( \int |u| \right)^2, \tag{2.19}$$
which simplifies to
$$\int u^2 \le C_0\, 2^d\, r^{-n} \left( \int |u| \right)^2. \tag{2.20}$$
Let $u^+$ be the positive part of $u$, i.e., $u^+(x) = \max\{u(x), 0\}$, and let $u^- = u^+ - u$ be the negative part of $u$. It follows from Lemma 5 that
$$\int u^+ \ge \frac{1}{3} \int |u| \quad \text{and} \quad \int u^- \ge \frac{1}{3} \int |u|. \tag{2.21}$$
Let $B^+$ denote $B_{4r/3}(q) \cap \{u > 0\}$ and $B^-$ denote $B_{4r/3}(q) \cap \{u < 0\}$. Thus, applying Cauchy-Schwarz to $u^+$ gives
$$\frac{1}{9} \left( \int |u| \right)^2 \le \left( \int u^+ \right)^2 \le \mathrm{Vol}(B^+) \int u^2 \le \mathrm{Vol}(B^+)\, C_0\, 2^d\, r^{-n} \left( \int |u| \right)^2, \tag{2.22}$$
where the last inequality used (2.20). Dividing through by the square of the $L^1$ norm of $u$ gives a scale-invariant lower bound for the volume of $B^+$,
$$\frac{r^n}{9\, C_0\, 2^d} \le \mathrm{Vol}(B^+). \tag{2.23}$$
The same argument applies to $u^-$ to give the same lower bound for the volume of $B^-$. Together, these allow us to apply the isoperimetric inequality to get the lower bound for the measure of the nodal set in $B$, thus completing the proof of the proposition.
References
[B] Brüning, J.: Über Knoten von Eigenfunktionen des Laplace-Beltrami-Operators. Math. Z. 158(1), 15–21 (1978)
[DF] Donnelly, H., Fefferman, C.: Nodal sets of eigenfunctions on Riemannian manifolds. Invent. Math. 93, 161–183 (1988)
[G] Gromov, M.: Metric structures for Riemannian and non-Riemannian spaces. Boston, MA: Birkhäuser Boston, Inc., 2007
[HL] Han, Q., Lin, F.H.: Nodal sets of solutions of elliptic differential equations. Book in preparation, 2007, available at http://www.nd.edu/~qhan/nodal.pdf
[LiSc] Li, P., Schoen, R.: $L^p$ and mean value properties of subharmonic functions on Riemannian manifolds. Acta Math. 153, 279–301 (1984)
[M] Mangoubi, D.: Local asymmetry and the inner radius of nodal domains. Comm. Par. Diff. Eqs. 33(7–9), 1611–1621 (2008)
[ScY] Schoen, R., Yau, S.-T.: Lectures on differential geometry. Cambridge, MA: International Press, 1994
[S1] Sogge, C.: Concerning the $L^p$ norm of spectral clusters for second-order elliptic operators on compact manifolds. J. Funct. Anal. 77(1), 123–138 (1988)
[S2] Sogge, C.: Fourier integrals in classical analysis. Cambridge Tracts in Mathematics, 105. Cambridge: Cambridge University Press, 1993
[SZ] Sogge, C., Zelditch, S.: Lower bounds on the Hausdorff measure of nodal sets. Math. Res. Lett. 18(1), 25–37 (2011), available at http://arXiv.org/abs/1009.3573.v3 [math.AP]
[Y] Yau, S.-T.: Open problems in geometry. Proc. Sympos. Pure Math. Vol. 54, Part 1, Providence, RI: Amer. Math. Soc. 1993, pp. 1–28
[Z] Zelditch, S.: Complex zeros of real ergodic eigenfunctions. Invent. Math. 167(2), 419–443 (2007)
Communicated by S. Zelditch
Commun. Math. Phys. 306, 785–803 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1295-9
Specifying Angular Momentum and Center of Mass for Vacuum Initial Data Sets
Lan-Hsuan Huang (1), Richard Schoen (2), Mu-Tao Wang (1)
(1) Department of Mathematics, Columbia University, New York, NY 10027, USA. E-mail: [email protected]; [email protected]
(2) Department of Mathematics, Stanford University, Stanford, CA 94305, USA. E-mail: [email protected]
Received: 29 September 2010 / Accepted: 22 February 2011
Published online: 2 July 2011 – © Springer-Verlag 2011
Abstract: We show that it is possible to perturb arbitrary vacuum asymptotically flat spacetimes to new ones having exactly the same energy and linear momentum, but with center of mass and angular momentum equal to any preassigned values measured with respect to a fixed affine frame at infinity. This is in contrast to the axisymmetric situation where a bound on the angular momentum by the mass has been shown to hold for black hole solutions. Our construction involves changing the solution at the linear level in a shell near infinity, and perturbing to impose the vacuum constraint equations. The procedure involves the perturbation correction of an approximate solution which is given explicitly.
1. Introduction
For asymptotically flat spacetimes with appropriate asymptotics there are several conserved quantities which can be measured at spatial infinity. These include the total energy and linear momentum, as well as the angular momentum and center of mass. When we fix an affine frame at infinity the linear and angular momentum as well as the center of mass become three vectors. It is natural to ask whether there are any constraints on these quantities imposed by the Einstein equations. The positive mass theorem provides one such constraint, namely that the energy-momentum vector is a forward pointing timelike vector. In particular this says that the magnitude of the linear momentum vector is bounded above by the energy. For the Kerr solutions which describe rotating stationary axisymmetric vacuum black holes, it is true that the angular momentum must satisfy such a bound. It has been shown over the past several years by S. Dain [11] and P. T. Chruściel et al. [4,7,8] that such an inequality is also satisfied by general axisymmetric black hole solutions of the Einstein equations. The paper by X. Zhang [17] proves such an inequality under an energy condition involving his definition of angular momentum density, but it appears that general vacuum data sets do not satisfy this energy condition.
The main results of this paper show that there are no constraints on the angular momentum and center of mass in terms of the energy-momentum vector for general vacuum solutions of the Einstein equations. Precisely we fix an affine frame at infinity and we give an effective procedure for adding a specified amount of angular momentum to a solution of the vacuum Einstein equations, producing a new solution with specified angular momentum but with only slightly perturbed energy-momentum vector. We obtain a similar result for the center of mass. Then, by considering a family of initial data near the given one, and by doing the construction continuously, we obtain a perturbation with arbitrarily specified angular momentum and center of mass, while leaving the energy-momentum vector unchanged. One may think of these results as the pure gravity analogue of the addition to a Newtonian system of a very small symmetrically placed mass far from an axis whose rotation imposes a fixed amount of angular momentum. Similarly one can think of adding a small mass to a Newtonian system which when translated far from the center of mass of the original system produces a fixed change in the center of mass of the new system. From the point of view of the dynamics of the vacuum Einstein equations we expect the angular momentum that we add near spatial infinity to be radiated away and to have little effect on the final stationary state of the system. We emphasize that our solutions with arbitrarily specified angular momentum and center of mass are complete manifolds, and we can arbitrarily specify the angular momentum of the perturbed data while keeping the energy-momentum and center of mass fixed. Without the completeness condition, there are exterior vacuum solutions with arbitrary prescribed energy-momentum, angular momentum, and center of mass, such as the example of a boosted slice in an exterior Kerr solution computed by Chruściel–Delay [6].
Also, certain $N$-body solutions constructed by Chruściel–Corvino–Isenberg [5] should have large angular momentum, but this comes from orbital angular momentum, i.e. $c \times p$ for large $c$. In particular, the center of mass is not fixed in their case. From a technical point of view the reason it is possible to make these constructions is that the angular momentum and center of mass are determined by terms in the expansion of the solution which are of lower order than those which determine the energy and linear momentum. The idea then is to make perturbations near infinity which affect only the lower order terms in the expansion. We do this by explicitly constructing linear perturbations supported in a shell near infinity which impose the required change in angular momentum (or center of mass), and then by finding a solution of the vacuum constraint equations which is sufficiently close to the perturbed system so that the change in angular momentum (or center of mass) persists. This can be done in such a way that the energy and linear momentum are changed by an arbitrarily small amount.
Let $(M, g, \pi)$ be asymptotically flat in the sense that, outside a compact set, there exists an asymptotically flat coordinate system $\{x^i\}$ so that
$$g_{ij}(x) = \delta_{ij} + O(|x|^{-1}), \quad \pi_{ij}(x) = O(|x|^{-2}),$$
$$\partial^k g_{ij}(x) = O(|x|^{-1-k}) \ \text{for } k = 1, 2, \quad \partial \pi_{ij}(x) = O(|x|^{-3}).$$
In addition, we assume that $(M, g, \pi)$ satisfies the Regge–Teitelboim condition
$$g_{ij}(x) - g_{ij}(-x) = O(|x|^{-2}), \quad \pi_{ij}(x) + \pi_{ij}(-x) = O(|x|^{-3}),$$
$$\partial^k \left( g_{ij}(x) - g_{ij}(-x) \right) = O(|x|^{-2-k}) \ \text{for } k = 1, 2, \quad \partial \left( \pi_{ij}(x) + \pi_{ij}(-x) \right) = O(|x|^{-4}).$$
The notation $f = O(|x|^{-a})$ means that $|f| \le C |x|^{-a}$ for a constant $C$. We remark that our construction works for data $(g, \pi)$ with weaker assumptions on the decay rates. For
simplicity of notation, we assume the decay rates above and do not consider here the question of optimal decay conditions.
Let $E$, $C$, $P$, $J$ denote the energy, center of mass, linear momentum, and angular momentum of $(g, \pi)$. They are defined as limits of integrals over Euclidean spheres:
$$E = \frac{1}{16\pi} \lim_{\rho \to \infty} \int_{|x| = \rho} \sum_{i,j} (g_{ij,i} - g_{ii,j}) \frac{x^j}{|x|}\, d\sigma_0,$$
$$C^p = \frac{1}{16\pi E} \lim_{\rho \to \infty} \int_{|x| = \rho} \left[ x^p \sum_{i,j} (g_{ij,i} - g_{ii,j}) \frac{x^j}{|x|} - \sum_i \left( g_{ip} \frac{x^i}{|x|} - g_{ii} \frac{x^p}{|x|} \right) \right] d\sigma_0,$$
$$P_i = \frac{1}{8\pi} \lim_{\rho \to \infty} \int_{|x| = \rho} \sum_j \pi_{ij} \frac{x^j}{|x|}\, d\sigma_0,$$
$$J_i = \frac{1}{8\pi E} \lim_{\rho \to \infty} \int_{|x| = \rho} \sum_{j,k} \pi_{jk} Y_i^j \frac{x^k}{|x|}\, d\sigma_0,$$
where $d\sigma_0$ is the area measure of the Euclidean sphere $\{|x| = \rho\}$ and $Y_i = \frac{\partial}{\partial x^i} \times x$ (cross product) for $i = 1, 2, 3$ are the rotation vector fields. Denote by $\bar E$, $\bar C$, $\bar P$, $\bar J$ the energy, center of mass, linear momentum, and angular momentum of $(\bar g, \bar\pi)$.
We now give precise statements of the main theorems, where the definition of the weighted Sobolev spaces $W^{k,p}_{-q}$ is provided in Sect. 3, and we assume $p > 3$ and $q \in (1/2, 1)$. In our construction, we fix an affine frame near infinity and measure all asymptotic quantities relative to this frame; in fact, we may fix an asymptotically flat coordinate system throughout.
Theorem 1. Let $(g, \pi)$ be a nontrivial vacuum asymptotically flat initial data set with $g = v^4 \delta$ outside a compact set and $\pi(x) + \pi(-x) = O(|x|^{-1-2q})$. Given $\vec\alpha \in \mathbb{R}^3$ and $\epsilon > 0$, there exists a vacuum asymptotically flat initial data set $(\bar g, \bar\pi)$ such that $(\bar g, \bar\pi)$ is within the $\epsilon$-neighborhood of $(g, \pi)$ in $W^{2,p}_{-q} \times W^{1,p}_{-1-q}$ and
$$|\bar E - E| \le \epsilon, \quad |\bar C - C| \le \epsilon, \quad |\bar P - P| \le \epsilon, \quad \text{and} \quad |\bar J - J - \vec\alpha| \le \epsilon. \tag{1.1}$$
For the center of mass we prove the following.
Theorem 2. Let $(g, \pi)$ be a nontrivial vacuum asymptotically flat initial data set with $g = v^4 \delta$ outside a compact set and $\pi(x) + \pi(-x) = O(|x|^{-1-2q})$. Given $\vec\gamma \in \mathbb{R}^3$ and $\epsilon > 0$, there exists a vacuum asymptotically flat initial data set $(\bar g, \bar\pi)$ such that $(\bar g, \bar\pi)$ is within the $\epsilon$-neighborhood of $(g, \pi)$ in $W^{2,p}_{-q} \times W^{1,p}_{-1-q}$ and
$$|\bar E - E| \le \epsilon, \quad |\bar J - J| \le \epsilon, \quad |\bar P - P| \le \epsilon \quad \text{and} \quad |\bar C - C - \vec\gamma| \le \epsilon. \tag{1.2}$$
By combining these two results we can change both the center of mass and angular momentum so that they are arbitrarily close to specified values while leaving the energy and linear momentum essentially unchanged. The condition that $g = v^4 \delta$ can be removed by a density theorem. Moreover, given a vacuum initial data set, there is a small perturbation with arbitrary specified angular momentum and center of mass and with the same mass and linear momentum.
Theorem 3. Let $(g, \pi)$ be a nontrivial vacuum initial data set satisfying the Regge–Teitelboim condition. Given any constant vectors $\vec\alpha_0, \vec\gamma_0 \in \mathbb{R}^3$, there exists a vacuum initial data set $(\bar g, \bar\pi)$ within a small neighborhood of $(g, \pi)$ in $W^{2,p}_{-q} \times W^{1,p}_{-1-q}$ and $\bar E = E$, $\bar P = P$, and $\bar J = J + \vec\alpha_0$, $\bar C = C + \vec\gamma_0$.
In Sect. 2 we give an explicit construction of solutions of the linearized constraint equations which satisfy a certain moment condition. Sections 3, 4, and 5 are devoted to the proofs of the main theorems. We remark that the constructions of Sect. 2 are explicit, while the method of solving the exact constraint equations from the approximate solution involves constructing a small solution of an elliptic system with leading order term the diagonal Laplace equation. It should be possible to numerically approximate the resulting solutions to a high degree of accuracy.
2. Compactly Supported Solutions of the Linearized Constraints
Recall that the vacuum constraint equations for initial data $(g, \pi)$ may be written
$$R(g) + \frac{1}{2} (\mathrm{Tr}_g \pi)^2 - |\pi|^2 = 0, \quad \mathrm{div}_g(\pi) = 0,$$
where $\pi_{ij} = K_{ij} - \mathrm{Tr}_g(K) g_{ij}$ is the momentum tensor. In general, we consider the constraint map defined by
$$\Phi(g, \pi) = \left( R(g) + \frac{1}{2} (\mathrm{Tr}_g \pi)^2 - |\pi|^2,\ \mathrm{div}_g(\pi) \right).$$
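The quadratic term in the first constraint can be related to the second fundamental form $K$. The sketch below is an illustration under simplifying assumptions: it works at a single point where $g$ is the Euclidean metric in dimension three, uses an arbitrary sample symmetric matrix for $K$, and uses hypothetical helper names. It checks that $\frac{1}{2}(\mathrm{Tr}\,\pi)^2 - |\pi|^2 = (\mathrm{Tr}\,K)^2 - |K|^2$ when $\pi = K - \mathrm{Tr}(K)\, g$, so the first equation above agrees with the familiar form $R - |K|^2 + (\mathrm{Tr}\,K)^2 = 0$.

```python
from fractions import Fraction


def trace(a):
    """Trace of a 3x3 matrix with respect to the Euclidean metric."""
    return sum(a[i][i] for i in range(3))


def norm_sq(a):
    """Squared Euclidean norm of a 3x3 matrix (sum of squared entries)."""
    return sum(a[i][j] * a[i][j] for i in range(3) for j in range(3))


def momentum_tensor(k):
    """pi_ij = K_ij - Tr(K) delta_ij at a point where g is Euclidean."""
    t = trace(k)
    return [[k[i][j] - (t if i == j else 0) for j in range(3)] for i in range(3)]


def quadratic_term_pi(pi):
    """(1/2) (Tr pi)^2 - |pi|^2, as it appears in the constraint map."""
    return Fraction(1, 2) * trace(pi) ** 2 - norm_sq(pi)


def quadratic_term_k(k):
    """(Tr K)^2 - |K|^2, the same quantity written in terms of K."""
    return trace(k) ** 2 - norm_sq(k)


# Arbitrary symmetric sample matrix, chosen only for illustration.
K = [[1, 2, 3], [2, -4, 5], [3, 5, 7]]
PI = momentum_tensor(K)
```

For this sample, `quadratic_term_pi(PI)` and `quadratic_term_k(K)` agree; the identity holds for every symmetric $K$ because $\mathrm{Tr}\,\pi = -2\,\mathrm{Tr}\,K$ and $|\pi|^2 = |K|^2 + (\mathrm{Tr}\,K)^2$ in dimension three.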
The vacuum constraint equations, linearized at the trivial data $(\delta, 0)$, become
$$L\sigma := \sum_{i,j} (\sigma_{ij,ij} - \sigma_{ii,jj}) = 0, \quad \mathrm{div}(\tau) = 0,$$
for symmetric (0, 2) tensors $(\sigma, \tau)$. In this section we will construct solutions of the linearized constraint equations which are compactly supported in the shell $A_1 = \{x : 1 < |x| < 2\}$ contained in $\mathbb{R}^3$ and which have certain specified moment conditions with respect to rotation vector fields. We use the Einstein summation convention and sum over repeated indices, though we sometimes employ summation symbols for clarity. We write the Euclidean metric on $A_1$ in spherical coordinates $dr^2 + r^2 \tilde g_{ab}\, dx^a dx^b$, where $\tilde g$ is the standard round metric on $S^2$. The coordinates are labeled by $r = x^0$ and $\theta, \phi = x^1, x^2$. The ranges for the indices are $i, j, k, l, \ldots = 0, 1, 2$, and
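$a, b, c, d, e, \ldots = 1, 2$. If $\alpha, \beta$ are one-forms, we define the symmetric product $\alpha \odot \beta$ to be the symmetric (0, 2) tensor whose components are
$$(\alpha \odot \beta)_{ij} = \frac{1}{2} (\alpha_i \beta_j + \alpha_j \beta_i).$$
We first impose the following ansatz for our solutions.
Lemma 2.1. Suppose that $q$ and $Q$ are functions of $r$, $\tilde\alpha$ is a one-form on $S^2$, and $\tilde\tau$ is a trace-free symmetric (0, 2) tensor on $S^2$. Then $\tau = 2q\, \tilde\alpha \odot dr + Q \tilde\tau$ is a trace-free symmetric (0, 2) tensor on $A_1$. The condition $\mathrm{div}(\tau) = 0$ becomes
$$\widetilde{\mathrm{div}}\, \tilde\alpha = 0 \quad \text{and} \quad (r^2 q)' \tilde\alpha + Q (\widetilde{\mathrm{div}}\, \tilde\tau) = 0,$$
where $\widetilde{\mathrm{div}}$ is the divergence operator of $S^2$ on tensors. Suppose that $p$ and $P$ are functions of $r$, $\tilde\eta$ is a one-form on $S^2$, and $\tilde\sigma$ is a trace-free symmetric (0, 2) tensor on $S^2$. Then $\sigma = 2p\, \tilde\eta \odot dr + P \tilde\sigma$ is a trace-free symmetric (0, 2) tensor on $A_1$, and $L\sigma = 0$ if
$$2r (rp)'\, \widetilde{\mathrm{div}}\, \tilde\eta + P (\widetilde{\mathrm{div}}\, \widetilde{\mathrm{div}}\, \tilde\sigma) = 0.$$
This lemma follows directly from the following two computational lemmas. For a coordinate system $x^i$ on $\mathbb{R}^3$, denote by $g_{ij}\, dx^i dx^j$ the Euclidean metric.
Lemma 2.2. Let $h$ be any symmetric (0, 2) tensor on $\mathbb{R}^3$, then
$$(\mathrm{div}\, h)_i = \frac{1}{\sqrt{g}} \frac{\partial}{\partial x^k} \left( \sqrt{g}\, h_i^k \right) + \frac{1}{2} h_{jk} \frac{\partial}{\partial x^i} (g^{jk}).$$
Proof. Let $V^i$ be any vector field. We have $V^i (h_i^k)_{;k} = (V^i h_i^k)_{;k} - h_i^k V^i_{;k}$. Now
$$(V^i h_i^k)_{;k} = \frac{1}{\sqrt{g}} \frac{\partial}{\partial x^k} \left( V^i h_i^k \sqrt{g} \right) = V^i \frac{1}{\sqrt{g}} \frac{\partial}{\partial x^k} \left( \sqrt{g}\, h_i^k \right) + h_i^k \frac{\partial V^i}{\partial x^k}.$$
Therefore $V^i (h_i^k)_{;k} = V^i \frac{1}{\sqrt{g}} \frac{\partial}{\partial x^k} (\sqrt{g}\, h_i^k) - h_i^k \Gamma^i_{kl} V^l$, and then
$$(\mathrm{div}\, h)_i = \frac{1}{\sqrt{g}} \frac{\partial}{\partial x^k} \left( \sqrt{g}\, h_i^k \right) - h_l^k \Gamma^l_{ki}.$$
Lastly, we plug in the formula for $\Gamma^k_{ij}$.
Now we apply this to the spherical coordinates $g_{ij}\, dx^i dx^j = dr^2 + r^2 d\theta^2 + r^2 \sin^2\theta\, d\phi^2 = dr^2 + r^2 \tilde g_{ab}\, dx^a dx^b$ and write $h = h_{00}\, dr^2 + 2 h_{0a}\, dx^a dr + h_{ab}\, dx^a dx^b$ and $\alpha = \alpha_0\, dr + \alpha_a\, dx^a$.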
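Lemma 2.3. Let $h$ be a symmetric (0, 2) tensor and $\alpha$ a one-form on $\mathbb{R}^3$, then
$$\mathrm{div}\, h = \left[ r^{-2} \frac{\partial}{\partial r} (r^2 h_{00}) + r^{-2} \frac{1}{\sqrt{\tilde g}} \frac{\partial}{\partial x^a} \left( \sqrt{\tilde g}\, \tilde g^{ab} h_{0b} \right) - r^{-3} \tilde g^{ab} h_{ab} \right] dr + \left\{ r^{-2} \frac{\partial}{\partial r} (r^2 h_{a0}) + r^{-2} \left[ \frac{1}{\sqrt{\tilde g}} \frac{\partial}{\partial x^b} \left( \sqrt{\tilde g}\, \tilde g^{bc} h_{ac} \right) + \frac{1}{2} h_{bc} \frac{\partial}{\partial x^a} \tilde g^{bc} \right] \right\} dx^a$$
and
$$\mathrm{div}\, \alpha = r^{-2} \frac{\partial}{\partial r} (r^2 \alpha_0) + r^{-2} \frac{1}{\sqrt{\tilde g}} \frac{\partial}{\partial x^b} \left( \sqrt{\tilde g}\, \tilde g^{ab} \alpha_a \right).$$
We shall take the following ansatz on $h$: $h_{00} = 0$ and $\tilde g^{ab} h_{ab} = 0$. We also assume that $h_{0a} = p(r) \tilde\eta_a$ and $h_{ab} = P(r) \tilde h_{ab}$ for a one-form $\tilde\eta$ and a symmetric (0, 2) tensor $\tilde h$ on $S^2$; that is, we have
$$h = 2 p(r)\, \tilde\eta \odot dr + P(r) \tilde h.$$
We then have
$$\mathrm{div}\, h = (r^{-2} p)(\widetilde{\mathrm{div}}\, \tilde\eta)\, dr + r^{-2} (r^2 p)' \tilde\eta + r^{-2} P\, \widetilde{\mathrm{div}}\, \tilde h$$
and
$$\mathrm{div}\, \mathrm{div}\, h = 2 r^{-3} (rp)' (\widetilde{\mathrm{div}}\, \tilde\eta) + r^{-4} P (\widetilde{\mathrm{div}}\, \widetilde{\mathrm{div}}\, \tilde h).$$
Hence, Lemma 2.1 follows directly from the above two identities. In particular, if we consider the tensor $\tau$ given by
$$\tau = -2q\, \widetilde{\mathrm{div}}(\tilde\tau) \odot dr + (r^2 q)' \tilde\tau, \tag{2.1}$$
then $\tau$ satisfies $\mathrm{div}\, \tau = 0$ on $A_1$ if
$$\widetilde{\mathrm{div}}\, \widetilde{\mathrm{div}}\, \tilde\tau = 0 \ \text{on } S^2. \tag{2.2}$$
Also, we consider $\sigma$ as follows:
$$\sigma = -2p\, \widetilde{\mathrm{div}}\, \tilde\sigma \odot dr + 2r (rp)' \tilde\sigma. \tag{2.3}$$
Then $\sigma$ satisfies $L\sigma = 0$ on $A_1$ for any trace-free symmetric (0, 2) tensor $\tilde\sigma$ on $S^2$.
In the following, we show that there are nontrivial solutions to (2.2). We can start from any one-form $\tilde\eta$ on $S^2$ and construct a trace-free symmetric (0, 2) tensor $\tilde\delta^* \tilde\eta$ on $S^2$ by defining
$$(\tilde\delta^* \tilde\eta)_{ab} = \frac{1}{2} \left( \tilde\eta_{a;b} + \tilde\eta_{b;a} - \tilde g^{cd} \tilde\eta_{c;d}\, \tilde g_{ab} \right).$$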
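The radial coefficient $2r^{-3}(rp)'$ in the formula for $\mathrm{div}\,\mathrm{div}\,h$ arises by applying the divergence formula for one-forms from Lemma 2.3 to $\mathrm{div}\,h$: the $dr$ component contributes $r^{-2}\partial_r(r^2 \cdot r^{-2} p) = r^{-2} p'$ and the tangential component contributes $r^{-4}(r^2 p)'$, both multiplying $\widetilde{\mathrm{div}}\,\tilde\eta$. The sketch below checks the identity $r^{-2} p' + r^{-4}(r^2 p)' = 2 r^{-3} (rp)'$ exactly. It assumes a polynomial stand-in for $p$ (the construction itself uses compactly supported $p$, but the identity is pointwise), and the helper names are hypothetical.

```python
from fractions import Fraction


def poly_eval(coeffs, r):
    """Evaluate sum(coeffs[k] * r**k) at r."""
    return sum(c * r**k for k, c in enumerate(coeffs))


def poly_deriv(coeffs):
    """Coefficient list of the derivative of the polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]


def poly_times_r_power(coeffs, m):
    """Coefficient list of r**m times the polynomial."""
    return [0] * m + list(coeffs)


def radial_coefficient_lhs(p, r):
    """r^{-2} p'(r) + r^{-4} (r^2 p)'(r)."""
    dp = poly_eval(poly_deriv(p), r)
    d_r2p = poly_eval(poly_deriv(poly_times_r_power(p, 2)), r)
    return dp / r**2 + d_r2p / r**4


def radial_coefficient_rhs(p, r):
    """2 r^{-3} (r p)'(r)."""
    d_rp = poly_eval(poly_deriv(poly_times_r_power(p, 1)), r)
    return 2 * d_rp / r**3


# Sample polynomial p(r) = 3 - r + 4 r^2 + 2 r^3 and a point in the shell 1 < r < 2.
p_sample = [3, -1, 4, 2]
r_sample = Fraction(3, 2)
lhs = radial_coefficient_lhs(p_sample, r_sample)
rhs = radial_coefficient_rhs(p_sample, r_sample)
```

With $\tilde\eta = -\widetilde{\mathrm{div}}\,\tilde\sigma$ and $P = 2r(rp)'$ as in (2.3), the two terms of $r^4\, \mathrm{div}\,\mathrm{div}\,\sigma$ cancel, which is how (2.3) produces solutions of $L\sigma = 0$.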
We need the following computational result.
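Lemma 2.4.
$$\widetilde{\mathrm{div}}\, \tilde\delta^* \tilde\eta = -\frac{1}{2} (d d^* + d^* d) \tilde\eta + \tilde\eta.$$
In particular, if $u$ is a function on $S^2$, then
$$\widetilde{\mathrm{div}}\, \tilde\delta^* du = \frac{1}{2}\, d(\Delta u + 2u),$$
and
$$\widetilde{\mathrm{div}}\, \tilde\delta^* ({*du}) = \frac{1}{2} * d(\Delta u + 2u).$$
Proof. We recall that for one-forms on $S^2$, we have $d^* = -{*d*}$, and for a function $u$, $d^* du = -\Delta u$. We assume $\tilde\eta = du$ and compute in an orthonormal frame:
$$(\widetilde{\mathrm{div}}(\tilde\delta^* du))_1 = \frac{1}{2} (u_{1;11} - u_{2;21}) + u_{1;22}.$$
On the other hand, $\left( \frac{1}{2} (d d^* + d^* d) du \right)_1 = -\frac{1}{2} (\Delta u)_1 = -\frac{1}{2} (u_{1;11} + u_{2;21})$. Therefore,
$$(\widetilde{\mathrm{div}}\, \tilde\delta^* du)_1 + \left( \frac{1}{2} (d d^* + d^* d) du \right)_1 = u_{1;22} - u_{2;21}.$$
At last, we use the commutation formula $u_{b;ab} = u_{b;ba} + u_a$ for $a \ne b$ on $S^2$. Other formulae can be checked similarly.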
The next lemma shows how solutions of Eq. (2.2) can be constructed:
Lemma 2.5. Suppose that $\tilde\alpha$ is any one-form on $S^2$. Then $\tilde\tau = \tilde\delta^*\tilde\alpha$ is a trace-free symmetric $(0,2)$ tensor on $S^2$. Moreover, if $\tilde\alpha = *du$ for a function $u$ on $S^2$, then $\tilde\tau$ satisfies $\widetilde{\mathrm{div}}\,\widetilde{\mathrm{div}}\,\tilde\tau = 0$.
Proof. This follows from the formula of Lemma 2.4,
$$\widetilde{\mathrm{div}}\,\tilde\tau = \tfrac12 *d(\Delta u + 2u),$$
together with the fact that $(d^*)^2 = 0$.
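We need the following computational result.
Lemma 2.6. Suppose that $\tau = 2q\,\tilde\alpha\,dr + Q\,\tilde\tau$, $\sigma = 2p\,\tilde\eta\,dr + P\,\tilde\sigma$, and $Y = Y^a\frac{\partial}{\partial x^a}$ is tangent to $S^2$. Then
$$\left(\tfrac12\tau_{ij,l}Y^l + \tau_{il}Y^l_{,j}\right)\sigma^{ij} = pq\,r^{-2}\left(\tilde\alpha_{b;a}Y^a + \tilde\alpha_aY^a_{;b}\right)\tilde\eta_c\,\tilde g^{bc} + \tfrac12PQ\,r^{-4}\left(\tilde\tau_{bc;a}Y^a + \tilde\tau_{ab}Y^a_{;c} + \tilde\tau_{ac}Y^a_{;b}\right)\tilde g^{bd}\tilde\sigma_{de}\,\tilde g^{ec},$$
where $\tilde\alpha_{b;a}$, $\tilde\tau_{bc;a}$, and $Y^a_{;b}$ denote covariant derivatives of $\tilde\alpha$, $\tilde\tau$, and $Y$ with respect to the standard metric $\tilde g_{ab}$ on $S^2$.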
Proof. Direct computation.
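We are finally in a position to prove the main results of this section.
Theorem 2.1. Given any $\vec\lambda = (\lambda_1, \lambda_2, \lambda_3)$, there exist symmetric $(0,2)$ tensors $\sigma, \tau \in C_0^\infty(A_1)$ with $\sigma(x) = \sigma(-x)$ and $\tau(x) = \tau(-x)$ satisfying
$$L\sigma = 0, \qquad \sum_i\tau_{ij,i} = 0 \ \text{ for } j = 1, 2, 3, \tag{2.4}$$
so that
$$\int_{A_1}\left(\tfrac12\tau_{ij,l}(Y_k)^l + \tau_{il}(Y_k)^l_{,j}\right)\sigma^{ij}\,dx = \lambda_k \tag{2.5}$$
for $k = 1, 2, 3$, where $Y_k = \frac{\partial}{\partial x^k}\times\vec x$ (cross product).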
Remark. By a direct computation, the integrand in (2.5) equals $\tfrac12(\mathcal L_{Y_k}\tau)_{ij}\sigma^{ij}$, where $\mathcal L_{Y_k}\tau$ is the Lie derivative of $\tau$ along $Y_k$ on $\mathbb R^3$. Thus, if $\tau$ is axisymmetric with respect to $Y_k$, i.e. $\mathcal L_{Y_k}\tau = 0$, then (2.5) is always zero.
Proof. We first show how to make the integral on the left of (2.5) for $k = 1$ nonzero. To simplify notation for this purpose we denote $Y_1$ by $Y$. We choose $\tau$ of the form (2.1) for some $\tilde\tau = \tilde\delta^*{*du}$, where $u$ is an even function on $S^2$. By Lemma 2.1 and Lemma 2.5, $\tau$ satisfies Eq. (2.4) and has the desired symmetry $\tau(x) = \tau(-x)$. We take $\sigma$ of the form (2.3) for a symmetric $(0,2)$ tensor $\tilde\sigma$ to be determined later. By Lemma 2.6, the integral in question can be written as
$$\left(\int_1^2 pq\,r^{-2}\,dr\right)\int_{S^2}\left(\tilde\alpha_{b;a}Y^a + \tilde\alpha_aY^a_{;b}\right)\tilde\eta_c\,\tilde g^{bc}\,dS^2 + \left(\int_1^2\tfrac12PQ\,r^{-4}\,dr\right)\int_{S^2}\left(\tilde\tau_{bc;a}Y^a + \tilde\tau_{ab}Y^a_{;c} + \tilde\tau_{ac}Y^a_{;b}\right)\tilde g^{bd}\tilde\sigma_{de}\,\tilde g^{ec}\,dS^2,$$
for $\tilde\alpha = -\widetilde{\mathrm{div}}\,\tilde\tau$, $\tilde\eta = -\widetilde{\mathrm{div}}\,\tilde\sigma$, $P = 2r(rp)'$, and $Q = (r^2q)'$. We can choose $p$ and $q$ to be any compactly supported functions on the interval $(1, 2)$ so that $\left(\int_1^2 pq\,r^{-2}\,dr\right)$ and $\left(\int_1^2\tfrac12PQ\,r^{-4}\,dr\right)$ are arbitrary. It suffices to choose $\tilde\sigma$ to make the following integral nonzero:
$$\int_{S^2}\left(\tilde\tau_{bc;a}Y^a + \tilde\tau_{ab}Y^a_{;c} + \tilde\tau_{ac}Y^a_{;b}\right)\tilde g^{bd}\tilde\sigma_{de}\,\tilde g^{ec}\,dS^2 \neq 0.$$
To achieve this, we take
$$\tilde\sigma_{bc} = \tilde\tau_{bc;a}Y^a + \tilde\tau_{ab}Y^a_{;c} + \tilde\tau_{ac}Y^a_{;b}$$
and define $\sigma$ by Eq. (2.3). Since $Y$ is invariant under $x \to -x$, $\sigma$ defined in this way has the desired symmetry and satisfies $L\sigma = 0$. It is not hard to check that $\tilde\sigma = \mathcal L_Y\tilde\tau$, the Lie derivative of $\tilde\tau$ with respect to $Y$ on $S^2$. Thus, we can take an even function $u$ (e.g. the restriction of any homogeneous polynomial of even degree) so that $\tilde\tau_{bc;a}Y^a + \tilde\tau_{ab}Y^a_{;c} + \tilde\tau_{ac}Y^a_{;b}$ is nonzero.
To achieve the desired conclusion, we consider the linear functional $T_{(\sigma,\tau)}(v)$ given by the left-hand side of (2.5) with vector field $Y = v\times\vec x$. Since this is a nonzero linear functional we may choose a positively oriented orthonormal basis $\{e_1, e_2, e_3\}$ so that the vector $(T_{(\sigma,\tau)}(e_1), T_{(\sigma,\tau)}(e_2), T_{(\sigma,\tau)}(e_3))$ is proportional to $\vec\lambda$, and after multiplication of $\tau$ by a constant we may assume the vector is equal to $\vec\lambda$. It follows that there is a rotation $R$ of $\mathbb R^3$ so that $T_{(\sigma,\tau)}(R(\frac{\partial}{\partial x^k})) = \lambda_k$ for $k = 1, 2, 3$. It follows that (2.5) holds for the pair $((R^{-1})^*(\sigma), (R^{-1})^*(\tau))$ since we clearly have $T_{(S^*\sigma,S^*\tau)} = T_{(\sigma,\tau)}\circ S^{-1}$ for any rotation $S$.
We will need a corresponding result which will be used to specify the center of mass. This involves the construction of solutions of $L\sigma = 0$ satisfying a moment condition.
Theorem 2.2. Given any $\vec\beta = (\beta_1, \beta_2, \beta_3) \in \mathbb R^3$, there exists a trace-free and divergence-free symmetric $(0,2)$ tensor $\sigma \in C_0^\infty(A_1)$ satisfying $L\sigma = 0$ so that
$$\int_{A_1}x^p\sum_{i,j,k}(\sigma_{ij,k})^2\,dx = \beta_p \tag{2.6}$$
for $p = 1, 2, 3$.
Proof. We first show how to make the integral on the left of (2.6) nonzero for $p = 1$. By starting with a nonzero function $u$ supported in the first octant, we find from Lemma 2.5 and (2.1) a nonzero trace-free symmetric $(0,2)$ tensor $\sigma$ in $C_0^\infty(A_1)$ satisfying $\mathrm{div}(\sigma) = 0$. Then, in particular, $L\sigma = 0$. Because $\sigma$ is supported in the first octant, this implies
$$\int_{A_1}x^1\sum_{i,j,k}(\sigma_{ij,k})^2\,dx > 0.$$
We then let $T_\sigma$ be the linear functional on $\mathbb R^3$ given by
$$T_\sigma(v) = \int_{A_1}(\vec x\cdot v)\sum_{i,j,k}(\sigma_{ij,k})^2\,dx.$$
Since $T_\sigma$ is nonzero, there is a positively oriented orthonormal basis $\{e_1, e_2, e_3\}$ for which the vector defined by $(T_\sigma(e_1), T_\sigma(e_2), T_\sigma(e_3))$ is proportional to $\vec\beta$. Replacing $\sigma$ with a scalar multiple we may assume that $T_\sigma(e_p) = \beta_p$ for $p = 1, 2, 3$. Thus there is a rotation $R$ with $R(\frac{\partial}{\partial x^p}) = e_p$ so that $T_\sigma(R(\frac{\partial}{\partial x^p})) = \beta_p$ for $p = 1, 2, 3$. Since for a rotation $S$ we have $T_{S^*\sigma} = T_\sigma\circ S^{-1}$ it follows that $(R^{-1})^*\sigma$ satisfies the required condition (2.6).
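The rescale-and-rotate step in this proof can be sketched concretely. The code below is an illustration under stated assumptions, not the authors' construction: the functional is modelled as $T_\sigma(v) = t\cdot v$ for a hypothetical nonzero vector `t`, the target `beta` is assumed nonzero, and the rotation is built with Rodrigues' formula, which is one convenient choice. Since $T_\sigma$ is quadratic in $\sigma$, replacing $\sigma$ by $s\sigma$ multiplies `t` by $s^2$.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def unit(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

def rotation_taking(a, b):
    """Rotation matrix R (list of rows) with R a/|a| = b/|b|, via Rodrigues' formula."""
    a, b = unit(a), unit(b)
    c = dot(a, b)
    if c < -1.0 + 1e-12:
        # Antiparallel case: rotate by pi about an axis orthogonal to a.
        helper = [1.0, 0.0, 0.0] if abs(a[0]) < 0.9 else [0.0, 1.0, 0.0]
        n = unit(cross(a, helper))
        return [[2.0 * n[i] * n[j] - (1.0 if i == j else 0.0) for j in range(3)]
                for i in range(3)]
    v = cross(a, b)
    K = [[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
    return [[(1.0 if i == j else 0.0) + K[i][j] + K2[i][j] / (1.0 + c) for j in range(3)]
            for i in range(3)]

def realize_moments(t, beta):
    """Return (s, R) such that s^2 * t.(R e_p) = beta_p for p = 0, 1, 2."""
    s = math.sqrt(math.sqrt(dot(beta, beta) / dot(t, t)))  # s^2 = |beta| / |t|
    R = rotation_taking(beta, t)                             # columns R e_p are the new basis
    return s, R
```

The columns of `R` play the role of the basis $\{e_p\}$: after scaling, $T_{s\sigma}(R\,\partial/\partial x^p) = \beta_p$.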
3. Specifying the Angular Momentum
We first state a general analytic result which constructs new initial data sets from given ones. Let $L\sigma := \sum_{i,j}(\sigma_{ij,ij} - \sigma_{ii,jj})$ be the linearized (at the Euclidean metric) scalar curvature map which goes from smooth symmetric $(0,2)$ tensors to smooth functions. Denote by $W^{k,p}_{-q}$ the weighted Sobolev spaces defined as follows. We say $f \in W^{k,p}_{-q}$ if
$$\|f\|_{W^{k,p}_{-q}} := \left(\int_M\sum_{|\alpha|\le k}\left(|D^\alpha f|\,\rho^{|\alpha|+q}\right)^p\rho^{-3}\,d\mathrm{vol}_g\right)^{1/p} < \infty,$$
where $\alpha$ is a multi-index and $\rho$ is a continuous function with $\rho = |x|$ on the region where the asymptotically flat coordinate system $\{x^i\}$ is defined. When $p = \infty$,
$$\|f\|_{W^{k,\infty}_{-q}} = \sum_{|\alpha|\le k}\operatorname*{ess\,sup}_M|D^\alpha f|\,\rho^{|\alpha|+q}.$$
We assume $k = 1$ or $2$, $q \in (1/2, 1)$, and $p > 3$. By our assumption $(g, \pi) \in W^{2,p}_{-q}\times W^{1,p}_{-1-q}$.
In the following, the notation $f = O(r^{-a})$ means that $|f| \le Cr^{-a}$, $|\partial f| \le Cr^{-a-1}$ for some constant $C$ independent of $k$, and the analogous conditions on successive derivatives as needed. We remark that the following results would hold similarly for any $p > 3/2$, but the "O"-notation below would mean the decay in weighted Sobolev norms. Here we assume $p > 3$ because the weighted Sobolev norms can be replaced by pointwise estimates using the Sobolev imbedding theorem. Let $r := |x|$. Denote the shell by $A_k = \{k < r < 2k\}$ and denote by $C_0^\infty(A_k)$ the set of functions or tensors which are compactly supported in $A_k$.
Proposition 3.1. Let the symmetric $(0,2)$ tensors $\sigma, \tau \in C_0^\infty(A_1)$ satisfy the linearized constraint equations
$$L\sigma = 0, \qquad \sum_i\tau_{ij,i} = 0 \ \text{ for } j = 1, 2, 3.$$
Define $\sigma^k, \tau^k \in C_0^\infty(A_k)$ by $\sigma^k = k^{-1}\sigma(x/k)$, and $\tau^k = k^{-2}\tau(x/k)$. Then given any vacuum initial data set $(M, g, \pi)$ with decay rate $g - \delta = O(r^{-1})$, $\pi = O(r^{-2})$, and any fixed $q \in (1/2, 1)$, there exists a sequence of vacuum initial data sets $\{(\bar g^k, \bar\pi^k)\}$ so that for $k$ large and outside a fixed compact set (independent of $k$),
$$\bar g^k_{ij} = \left(1 + \frac{A^k}{r}\right)g_{ij} + \sigma^k_{ij} + O(r^{-2q}), \tag{3.1}$$
$$\bar\pi^k_{ij} = \pi_{ij} + \tau^k_{ij} + \frac{1}{r^3}\left(-B^k_ix^j - B^k_jx^i + \sum_lB^k_lx^l\,\delta_{ij}\right) + O(r^{-1-2q}), \tag{3.2}$$
where $A^k$ and $(B^k_1, B^k_2, B^k_3)$ are constants. The initial data sets $\{(\bar g^k, \bar\pi^k)\}$ are small perturbations of $(g, \pi)$ in the weighted Sobolev spaces; in fact,
$$\|g - \bar g^k\|_{W^{2,p}_{-q}} \to 0, \qquad \|\pi - \bar\pi^k\|_{W^{1,p}_{-1-q}} \to 0, \qquad \text{as } k \to \infty.$$
Moreover,
$$\bar E^k \to E \ \text{ and } \ \bar{\mathbf P}^k \to \mathbf P \qquad \text{as } k \to \infty. \tag{3.3}$$
Proof. Let $\hat g^k = g + \sigma^k$ and $\hat\pi^k = \pi + \tau^k$. Then $(\hat g^k, \hat\pi^k)$ satisfies the constraint equations $\Phi(\hat g^k, \hat\pi^k) = (0, 0)$ everywhere in $M\setminus A_k$ and $\Phi(\hat g^k, \hat\pi^k) = (O(r^{-4}), O(r^{-4}))$ in $A_k$. We denote $(\mathcal L_gX)_{ij} = X_{i;j} + X_{j;i} - (\mathrm{div}_gX)g_{ij}$ for any vector field $X$ and metric $g$. By the proof of [10, Thm. 1], there exist $(u^k, X^k)$ on $M$ and $(h^k, w^k)$ with compact supports (uniformly in $k$) such that $\bar g^k = (u^k)^4\hat g^k + h^k$, and $\bar\pi^k = (u^k)^2(\hat\pi^k + \mathcal L_{\hat g}X^k) + w^k$ satisfy $\Phi(\bar g^k, \bar\pi^k) = 0$ for all $k$ large, and
$$\|u^k - 1\|_{W^{2,p}_{-q}} \to 0, \quad \|X^k\|_{W^{2,p}_{-q}} \to 0, \quad \|h^k\|_{W^{2,p}_{-q}} \to 0, \quad \|w^k\|_{W^{1,p}_{-q}} \to 0, \tag{3.4}$$
as $k \to \infty$. The constraint equations imply $\Delta u^k = O(r^{-2-2q})$ and $\Delta(X^k)_i = O(r^{-2-2q})$. It follows that
$$u^k = 1 + \frac{A^k}{4r} + O(r^{-2q}), \quad \text{and} \quad (X^k)_i = \frac{B^k_i}{r} + O(r^{-2q}).$$
Therefore, (3.1) and (3.2) follow, and the convergence of (3.3) can be derived as in [10].
For the rest of the section, we consider the special case of Proposition 3.1 when $g = v^4\delta$ outside a compact set so we have $g_{ij} = \left(1 + \frac{2E}{r}\right)\delta_{ij} + O(r^{-2})$. We also assume that $\pi = O(r^{-2})$, $\pi(x) + \pi(-x) = O(r^{-1-2q})$. That any initial data can be approximated by such data follows from [10]. Note that the asymptotic oddness condition is the Regge–Teitelboim condition [16,1] required for the existence of $\mathbf J$. Therefore, near infinity, $(\bar g^k, \bar\pi^k)$ satisfy the conditions
$$\bar g^k_{ij} = (u^k)^4(g_{ij} + \sigma^k_{ij}) = \left(1 + \frac{2E + A^k}{r}\right)\delta_{ij} + \sigma^k_{ij} + O(r^{-2q}), \tag{3.5}$$
$$\bar\pi^k_{ij} = (u^k)^2(\pi + \tau^k + \mathcal L_{\hat g}X^k)_{ij} = \pi_{ij} + \tau^k_{ij} + \frac{1}{r^3}\left(-B^k_ix^j - B^k_jx^i + \sum_lB^k_lx^l\,\delta_{ij}\right) + O(r^{-1-2q}). \tag{3.6}$$
Proof of Theorem 1. Let $(g, \pi)$ be a vacuum initial data set satisfying the above conditions. Choose $\sigma$ and $\tau$ satisfying the assumptions in Theorem 2.1 with $\vec\lambda = -8\pi\vec\alpha$. There exist vacuum initial data sets $(\bar g^k, \bar\pi^k)$ satisfying (3.5) and (3.6) by Proposition 3.1, and $|\bar E^k - E| \le \epsilon$ and $|\bar{\mathbf P}^k - \mathbf P| \le \epsilon$. It remains to prove the desired properties of the center of mass and angular momentum. In the following, we suppress the superscript $k$ of $\bar g^k$ and $\bar\pi^k$ whenever it is clear from the context. We assume that $\vec\alpha = (\alpha_1, \alpha_2, \alpha_3)$. We denote by $Y_p$ the Euclidean Killing vector field $Y_p = \frac{\partial}{\partial x^p}\times\vec x$ for $p = 1, 2, 3$. Because of the asymptotics of $(\bar g, \bar\pi)$,
$$\bar E\bar J_p = \frac{1}{8\pi}\lim_{\rho\to\infty}\int_{r=\rho}\sum_{i,j}\bar\pi_{ij}(Y_p)^i\frac{x^j}{r}\,d\sigma_0 = \frac{1}{8\pi}\lim_{\rho\to\infty}\int_{r=\rho}\bar\pi_{ij}(Y_p)^i\nu^j\,d\sigma_g,$$
where $\nu$ and $d\sigma_g$ are respectively the outward unit normal vector and the area measure of $\{r = \rho\}$ with respect to $\bar g$. To shorten notation we do the following estimates using $Y$ in place of $Y_p$. By the divergence theorem, assuming $\rho_0 \ll k$ and $2k < \rho$,
$$\int_{r=\rho}\bar\pi_{ij}Y^i\nu^j\,d\sigma_g = \int_{\rho_0\le r\le\rho}\bar g^{lj}(\bar\pi_{il}Y^i)_{;j}\,d\mathrm{vol}_g + \int_{r=\rho_0}\bar\pi_{ij}Y^i\nu^j\,d\sigma_g.$$
Because $\tau^k$ vanishes on $\{r = \rho_0\}$,
$$\int_{r=\rho_0}\bar\pi_{ij}Y^i\nu^j\,d\sigma_g = \int_{r=\rho_0}\sum_{i,j}(u^k)^2(\pi + \mathcal L_{\hat g}X^k)_{ij}Y^i\nu^j\,d\sigma_g.$$
Although the absolute value of the last integrand is $O(\rho_0^{-1})$, which would not have a finite limit, the asymptotic symmetry conditions say that $\pi$ is odd and the term $\mathcal L_{\hat g}X^k$ is also asymptotically odd, and hence the leading order term of the integrand is odd, causing the limit to be finite. In fact, we have for $k$ large enough,
$$\begin{aligned}
\int_{r=\rho_0}\sum_{i,j}(u^k)^2\left(\pi + \mathcal L_{\hat g}X^k\right)_{ij}Y^i\nu^j\,d\sigma_g
&= \int_{r=\rho_0}\sum_{i,j}\pi_{ij}Y^i\nu^j\,d\sigma_g + \int_{r=\rho_0}\sum_{i,j}\left(\mathcal L_{\hat g}X^k\right)_{ij}Y^i\nu^j\,d\sigma_g && (3.7)\\
&\quad + \int_{r=\rho_0}\sum_{i,j}\left((u^k)^2 - 1\right)\left(\pi + \mathcal L_{\hat g}X^k\right)_{ij}Y^i\nu^j\,d\sigma_g && (3.8)\\
&= 8\pi EJ_p + O\left(\rho_0^{1-2q}\right).
\end{aligned}$$
Clearly, the first integral in (3.7) is $8\pi EJ_p + O(\rho_0^{-1})$. For the second integral in (3.7), we use (3.4) and choose $k$ large so that $|\mathcal L_{\hat g}X^k|$ is small, say less than $\rho_0^{-4}$. The integral in (3.8) is $O(\rho_0^{1-2q})$ by (3.5), (3.6), and the asymptotic symmetry of $\pi$. To estimate the interior integral, by the constraint equation $\bar g^{lj}\bar\pi_{il;j} = 0$ and the condition that $Y$ is a Euclidean Killing vector field,
$$\bar g^{lj}\left(\bar\pi_{il}Y^i\right)_{;j} = \left(\bar g^{lj} - \delta^{lj}\right)\bar\pi_{il}Y^i_{,j} + \bar g^{lj}\bar\pi_{il}\Gamma^i_{js}Y^s. \tag{3.9}$$
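Identity (3.9) uses that the Euclidean part $\delta^{lj}\pi_{il}Y^i_{,j}$ drops out: for $Y_p = \frac{\partial}{\partial x^p}\times\vec x$ the gradient $\partial_j(Y_p)^i = \epsilon_{ipj}$ is antisymmetric in $(i, j)$, so its contraction with any symmetric tensor vanishes. A minimal check of that step, with a hypothetical integer-valued symmetric matrix standing in for $\pi$ and indices running over 0, 1, 2:

```python
def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def rotation_field_gradient(p):
    """Matrix dY[i][j] = d(Y_p)^i / dx^j for Y_p = e_p x x, i.e. epsilon_{i p j}."""
    return [[levi_civita(i, p, j) for j in range(3)] for i in range(3)]

def contraction_with_symmetric(pi, p):
    """Sum over i, j of pi[i][j] * d(Y_p)^i / dx^j."""
    dY = rotation_field_gradient(p)
    return sum(pi[i][j] * dY[i][j] for i in range(3) for j in range(3))
```

Only the difference $\bar g^{lj} - \delta^{lj}$ and the Christoffel term therefore survive on the right-hand side of (3.9).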
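By (3.5), (3.6), and $\sigma^k(x) = \sigma^k(-x)$, the integral of the first term on the right-hand side is
$$\int_{\rho_0\le r\le\rho}\left(\bar g^{lj} - \delta^{lj}\right)\bar\pi_{il}Y^i_{,j}\,d\mathrm{vol}_g = -\int_{A_k}\sum_{i,j,l}\sigma^k_{ij}\tau^k_{il}Y^l_{,j}\,dx + O\left(\rho_0^{1-2q}\right) = -\int_{A_1}\sum_{i,j,l}\sigma_{ij}\tau_{il}Y^l_{,j}\,dx + O\left(\rho_0^{1-2q}\right),$$
where we use $\pi(x) + \pi(-x) = O(r^{-1-2q})$ and $\sigma^k(x) = \sigma^k(-x)$ to estimate the error terms. For example, for some of the error terms,
$$\int_{\rho_0\le r\le\rho}\sum_{i,j,l}-\frac{2E + A^k}{r}\,\delta_{lj}\pi_{il}Y^i_{,j}\,d\mathrm{vol}_g = \int_{\rho_0}^{\rho}O(r^{-2-2q})\,r^2\,dr = O\left(\rho_0^{1-2q}\right),$$
$$\int_{\rho_0\le r\le\rho}\sum_{i,j,l}-\sigma^k_{lj}\pi_{il}Y^i_{,j}\,d\mathrm{vol}_g = \int_k^{2k}O\left(k^{-1}r^{-1-2q}\right)r^2\,dr = O\left(k^{1-2q}\right) = O\left(\rho_0^{1-2q}\right).$$
To estimate the integral of the second term on the right-hand side of (3.9), we again use (3.5), (3.6) and asymptotic symmetry to derive the first equality:
$$\begin{aligned}
\int_{\rho_0\le r\le\rho}\bar g^{jl}\bar\pi_{il}\Gamma^i_{js}Y^s\,d\mathrm{vol}_g &= \frac12\int_{A_k}\sum_{i,j,l}\tau^k_{ij}Y^l\sigma^k_{ij,l}\,dx + O\left(\rho_0^{1-2q}\right)\\
&= -\frac12\int_{A_k}\sum_{i,j,l}\sigma^k_{ij}\tau^k_{ij,l}Y^l\,dx + O\left(\rho_0^{1-2q}\right) = -\frac12\int_{A_1}\sum_{i,j,l}\sigma_{ij}\tau_{ij,l}Y^l\,dx + O\left(\rho_0^{1-2q}\right),
\end{aligned}$$
where in the second-to-last identity, we integrate by parts and use the fact that $\sigma^k$ and $\tau^k$ vanish on the boundary and that $Y$ is divergence-free. Combining the above identities, we derive
$$8\pi\bar E\bar J_p = 8\pi EJ_p - \int_{A_1}\sum_{i,j,l}\left(\tfrac12\tau_{ij,l}(Y_p)^l + \tau_{il}(Y_p)^l_{,j}\right)\sigma_{ij}\,dx + O\left(\rho_0^{1-2q}\right).$$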
Then we choose $\rho_0$ large so that the error term is less than $\epsilon$. For $k \gg \rho_0$ large enough, we prove that $(\bar g^k, \bar\pi^k)$ satisfies (1.1) from Theorem 2.1 applied with $\vec\lambda = -8\pi\vec\alpha$.
We show that the center of mass of $(g, \pi)$ remains almost unchanged during the process. For $p = 1, 2, 3$ we have the components of $\bar{\mathbf C}$ defined by
$$\bar C^p = \frac{1}{16\pi\bar E}\lim_{\rho\to\infty}\int_{r=\rho}\left[x^p\sum_{i,j}\left(\bar g_{ij,i} - \bar g_{ii,j}\right)\frac{x^j}{r} - \sum_i\left(\bar g_{ip}\frac{x^i}{r} - \bar g_{ii}\frac{x^p}{r}\right)\right]d\sigma_0.$$
By the divergence theorem,
$$16\pi\bar E\bar C^p = \lim_{\rho\to\infty}\int_{\rho_0\le r\le\rho}x^p\sum_{i,j}\left(\bar g_{ij,ij} - \bar g_{ii,jj}\right)dx + \int_{r=\rho_0}\left[x^p\sum_{i,j}\left(\bar g_{ij,i} - \bar g_{ii,j}\right)\frac{x^j}{r} - \sum_i\left(\bar g_{ip}\frac{x^i}{r} - \bar g_{ii}\frac{x^p}{r}\right)\right]d\sigma_0. \tag{3.10}$$
Because $\sum_{i,j}\left(\bar g_{ij,ij} - \bar g_{ii,jj}\right)$ is the leading order term of the scalar curvature, it can be replaced by the lower order terms such as $|\bar\pi|^2$ and $(D\bar g)^2$ using the constraint equations. Then by (3.5), (3.6), and the symmetry $\sigma^k(x) = \sigma^k(-x)$, the interior integral above is $O\left(\rho_0^{1-2q}\right)$. Similarly, we have
$$16\pi EC^p = \int_{r=\rho_0}\left[x^p\sum_{i,j}\left(g_{ij,i} - g_{ii,j}\right)\frac{x^j}{r} - \sum_i\left(g_{ip}\frac{x^i}{r} - g_{ii}\frac{x^p}{r}\right)\right]d\sigma_0 + O(\rho_0^{-1}). \tag{3.11}$$
Moreover, because $\bar g = (u^k)^4g$ and $u^k$ is close to 1 on $\{r = \rho_0\}$ for $k$ large enough, the difference of the boundary integrals on $\{r = \rho_0\}$ is $O\left(\rho_0^{-1}\right)$. Therefore, for a fixed $\rho_0$ large and for $k \gg \rho_0$ large enough, we have $|\bar{\mathbf C} - \mathbf C| \le \epsilon$.
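4. Specifying the Center of Mass
As in the previous section we assume that $g = v^4\delta$ outside a compact set and $\pi = O\left(r^{-2}\right)$ and $\pi(x) + \pi(-x) = O\left(r^{-1-2q}\right)$. We apply Proposition 3.1 with $\tau = 0$ and $\sigma$ chosen by Theorem 2.2 to be a solution of $L\sigma = 0$ satisfying the moment condition (2.6).
Proof of Theorem 2. By Proposition 3.1, $|\bar E - E| \le \epsilon$ for $k$ large. Since $\tau = 0$, the proof that the angular momentum satisfies $|\bar{\mathbf J} - \mathbf J| \le \epsilon$ follows as in the previous section. To estimate the change in the center of mass we use (3.10), (3.11), and the argument following them. For $p = 1, 2, 3$ we have
$$16\pi\left(\bar E\bar C^p - EC^p\right) = \lim_{\rho\to\infty}\int_{\rho_0\le r\le\rho}x^p\sum_{i,j}\left(\bar g_{ij,ij} - \bar g_{ii,jj}\right)dx + O\left(\rho_0^{-1}\right).$$
Because $\sigma^k(x) \neq \sigma^k(-x)$, the interior term above is not of lower order. Let $\bar R$ denote the scalar curvature of $\bar g$. Then from the constraint equations, we have $\bar R = O(r^{-4})$ and $\bar R(x) - \bar R(-x) = O\left(r^{-3-2q}\right)$. By [13, Lemma 3.5],
$$\int_{\rho_0\le r\le\rho}x^p\sum_{i,j}\left(\bar g_{ij,ij} - \bar g_{ii,jj}\right)dx = \int_{\rho_0\le r\le\rho}x^p\bar R\,dx - \int_{\rho_0\le r\le\rho}x^pE_{\bar g}\,dx + O\left(\rho_0^{-1}\right),$$
where
$$E_{\bar g} = -\sum_{i,j,l}\left(\bar g_{il} - \delta_{il}\right)\left(2\bar g_{ij,lj} - \bar g_{il,jj} - \bar g_{jj,li}\right) + \sum_{i,j,l}\left(-\bar g_{jl,j}\bar g_{il,i} + \bar g_{jl,j}\bar g_{ii,l} + \frac34\bar g_{ij,l}\bar g_{ij,l} - \frac14\bar g_{jj,l}\bar g_{ii,l} - \frac12\bar g_{ij,l}\bar g_{il,j}\right).$$
Recall that $\sigma^k \in C_0^\infty(A_k)$ is trace-free and divergence-free. By (3.5), the above integral is equal to the following, up to an error term of order $O\left(\rho_0^{1-2q}\right)$,
$$\begin{aligned}
&\int_{A_k}x^p\left[\sum_{i,l}-2\sigma^k_{il}\left(\frac{2E + A^k}{r}\right)_{,li} - \sum_{i,l,j}\sigma^k_{il}\sigma^k_{il,jj}\right]dx\\
&\quad - \int_{A_k}\frac34x^p\sum_{i,j,l}\left[\left(\frac{2E + A^k}{r}\right)_{,l}\delta_{ij} + \sigma^k_{ij,l}\right]^2dx\\
&\quad + \int_{A_k}\frac12x^p\sum_{i,j,l}\left[\left(\frac{2E + A^k}{r}\right)_{,l}\delta_{ij} + \sigma^k_{ij,l}\right]\left[\left(\frac{2E + A^k}{r}\right)_{,j}\delta_{il} + \sigma^k_{il,j}\right]dx\\
&= \int_{A_k}x^p\sum_{i,l,j}\left(-\sigma^k_{il}\sigma^k_{il,jj} - \frac34\sigma^k_{ij,l}\sigma^k_{ij,l}\right)dx = \frac14\int_{A_1}x^p\sum_{i,j,l}\left(\sigma_{ij,l}\right)^2dx,
\end{aligned}$$
where in the last line we use integration by parts. We may then apply Theorem 2.2 with $\vec\beta = 64\pi E\vec\gamma$ to obtain the required condition (1.2) on the center of mass.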
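5. Proof of Theorem 3
In Theorem 1 and Theorem 2 we assumed that $g = v^4\delta$ outside a compact set and $\pi = O\left(r^{-2}\right)$, $\pi(x) + \pi(-x) = O\left(r^{-1-2q}\right)$. Using a density theorem [12], we prove below that the condition can be replaced by the weaker Regge–Teitelboim condition. In this section, we fix the constant $p > 3$ and the constant $q \in (1/2, 1)$.
Theorem 5.1. Let $(g, \pi)$ be a nontrivial vacuum initial data set satisfying the Regge–Teitelboim condition. Given $\vec\alpha, \vec\gamma \in \mathbb R^3$ and given $\epsilon > 0$, there exists a vacuum initial data set $(\tilde g, \tilde\pi)$ with $\|\tilde g - g\|_{W^{2,p}_{-q}} \le \epsilon$, $\|\tilde\pi - \pi\|_{W^{1,p}_{-1-q}} \le \epsilon$, so that
$$|\tilde E - E| \le \epsilon, \qquad |\tilde{\mathbf P} - \mathbf P| \le \epsilon, \tag{5.1}$$
and
$$|\tilde{\mathbf J} - \mathbf J - \vec\alpha| \le \epsilon, \qquad |\tilde{\mathbf C} - \mathbf C - \vec\gamma| \le \epsilon. \tag{5.2}$$
Proof. By the density theorem in [12], given the vacuum initial data set $(g, \pi)$ satisfying the Regge–Teitelboim condition and any $\epsilon > 0$, there exists a vacuum initial data set $(\check g, \check\pi)$ with $\check g = v^4\delta$ outside a compact set and $\check\pi(x) = O(r^{-2})$, $\check\pi(x) + \check\pi(-x) = O\left(r^{-1-2q}\right)$, so that $\|\check g - g\|_{W^{2,p}_{-q}} \le \epsilon$, $\|\check\pi - \pi\|_{W^{1,p}_{-1-q}} \le \epsilon$. Moreover,
$$|\check E - E| \le \frac\epsilon3, \quad |\check{\mathbf P} - \mathbf P| \le \frac\epsilon3, \quad |\check{\mathbf J} - \mathbf J| \le \frac\epsilon3, \quad |\check{\mathbf C} - \mathbf C| \le \frac\epsilon3.$$
Then we apply the construction in the proof of Theorem 1 to $(\check g, \check\pi)$. Given $\vec\alpha = (\alpha_1, \alpha_2, \alpha_3) \in \mathbb R^3$, there exist symmetric $(0,2)$ tensors $\sigma, \tau \in C_0^\infty(A_1)$ satisfying
$$\int_{A_1}\left(\tfrac12\tau_{ij,l}(Y_p)^l + \tau_{il}(Y_p)^l_{,j}\right)\sigma^{ij}\,dx = -8\pi\alpha_p, \tag{5.3}$$
where the rotation vector fields are $Y_p = \frac{\partial}{\partial x^p}\times\vec x$. Let $\sigma^k(x) = k^{-1}\sigma(x/k)$ and $\tau^k(x) = k^{-2}\tau(x/k)$. By Proposition 3.1, there exists a large integer $k$ so that $(\hat g^k, \hat\pi^k)$:
$$\hat g^k = (u^k)^4\left(\check g + \sigma^k\right) + h^k, \qquad \hat\pi^k = (u^k)^2\left(\check\pi + \tau^k + \mathcal L_{(\check g + \sigma^k)}X^k\right) + w^k$$
satisfies the vacuum constraint equations, where $(u^k, X^k)$ and $(h^k, w^k)$ arise from solving the linearized constraint equations. They satisfy the decay condition (3.4), and the $(h^k, w^k)$ have compact support. Then, from Theorem 1, we have
$$|\hat E - \check E| \le \frac\epsilon3, \quad |\hat{\mathbf P} - \check{\mathbf P}| \le \frac\epsilon3, \quad |\hat{\mathbf C} - \check{\mathbf C}| \le \frac\epsilon3, \quad \text{and} \quad |\hat{\mathbf J} - \check{\mathbf J} - \vec\alpha| \le \frac\epsilon3.$$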
We suppress the superscript $k$ of $\hat g^k, \hat\pi^k$ in the following. Clearly $(\hat g, \hat\pi)$ satisfies the condition in Theorem 2, namely $\hat g = (uv)^4\delta$ outside a compact set and $\hat\pi = O(r^{-2})$, $\hat\pi(x) + \hat\pi(-x) = O\left(r^{-1-2q}\right)$. Let $\hat\sigma$ be a symmetric $(0,2)$ tensor satisfying
$$\int_{A_1}x^p\sum_{i,j,k}\left(\hat\sigma_{ij,k}\right)^2dx = 64\pi\hat E\gamma_p. \tag{5.4}$$
Let $\hat\sigma^l(x) = l^{-1}\hat\sigma(x/l)$. By the construction in the proof of Theorem 2, there exists an integer $l \gg k$ so that
$$\tilde g^l = (\hat u^l)^4\left(\hat g + \hat\sigma^l\right) + \hat h^l, \qquad \tilde\pi^l = (\hat u^l)^2\left(\hat\pi + \mathcal L_{(\hat g + \hat\sigma^l)}\hat X^l\right) + \hat w^l$$
satisfies the vacuum constraint equations, where $(\hat u^l, \hat X^l)$ and $(\hat h^l, \hat w^l)$ are from solving the linearized constraint equations as above. Moreover,
$$|\tilde E - \hat E| \le \frac\epsilon3, \quad |\tilde{\mathbf P} - \hat{\mathbf P}| \le \frac\epsilon3, \quad |\tilde{\mathbf J} - \hat{\mathbf J}| \le \frac\epsilon3, \quad \text{and} \quad |\tilde{\mathbf C} - \hat{\mathbf C} - \vec\gamma| \le \frac\epsilon3.$$
Then (5.1) and (5.2) follow by combining the above inequalities.
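We can further perturb $(\tilde g, \tilde\pi)$ so that the energy-momentum vector equals that of $(g, \pi)$, while changing the angular momentum and center of mass by only a small amount.
Proposition 5.1. Let $(g, \pi)$ be a nontrivial vacuum initial data set satisfying the Regge–Teitelboim condition. Given $\vec\alpha, \vec\gamma \in \mathbb R^3$ and $\epsilon > 0$, there exists a vacuum initial data set $(\bar g, \bar\pi)$ satisfying $\|\bar g - g\|_{W^{2,p}_{-q}} \le \epsilon$ and $\|\bar\pi - \pi\|_{W^{1,p}_{-1-q}} \le \epsilon$ so that $\bar E = E$ and $\bar{\mathbf P} = \mathbf P$ and
$$|\bar{\mathbf J} - \mathbf J - \vec\alpha| \le \epsilon, \quad \text{and} \quad |\bar{\mathbf C} - \mathbf C - \vec\gamma| \le \epsilon.$$
Proof. Let $(\tilde g, \tilde\pi)$ be the initial data constructed in Theorem 5.1. Let $(\bar g, \bar\pi)$ be the vacuum initial data obtained by scaling $\bar g = \lambda^2\tilde g$ and $\bar\pi = \lambda\tilde\pi$, where $\lambda$ is a positive constant and $\lambda^2 = (E^2 - |\mathbf P|^2)/(\tilde E^2 - |\tilde{\mathbf P}|^2)$. Then, by straightforward computations, we have $(\bar E, \bar{\mathbf P}, \bar{\mathbf J}, \bar{\mathbf C}) = \lambda(\tilde E, \tilde{\mathbf P}, \tilde{\mathbf J}, \tilde{\mathbf C})$ and then
$$\bar E^2 - |\bar{\mathbf P}|^2 = E^2 - |\mathbf P|^2.$$
Notice that by (5.1),
$$\left|(\tilde E^2 - |\tilde{\mathbf P}|^2) - (E^2 - |\mathbf P|^2)\right| \le 2\epsilon(E + |\mathbf P|) + 2\epsilon^2.$$
Therefore, since $E > |\mathbf P|$ by the positive mass theorem, we divide the above inequality by $E^2 - |\mathbf P|^2$. Then
$$|\lambda^{-2} - 1| \le \frac{2\epsilon}{E - |\mathbf P|} + \frac{2\epsilon^2}{E^2 - |\mathbf P|^2}.$$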
Because $E$ and $\mathbf P$ are fixed, we can choose $k, l$ in the proof of Theorem 5.1 large enough so that $\lambda$ is close to 1. Therefore, $(\bar E, \bar{\mathbf P}, \bar{\mathbf J}, \bar{\mathbf C})$ is close to $(\tilde E, \tilde{\mathbf P}, \tilde{\mathbf J}, \tilde{\mathbf C})$ and hence to $(E, \mathbf P, \mathbf J, \mathbf C)$. Because $\bar E$ is close to $E$, we then boost the data $(\bar g, \bar\pi)$ by a small angle so that $\bar E = E$, and then $|\bar{\mathbf P}| = |\mathbf P|$ (the existence of such a boosted slice is proven in [3]). By rotating the asymptotically flat coordinates, we can make $\bar{\mathbf P} = \mathbf P$. Also, notice that the angular momentum and center of mass only change by a small amount after these transformations. (The transformation formulas of these quantities under the Poincaré transformations can be found in, for example, [6, App. E].)
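The rescaling step in this proof is elementary arithmetic and can be illustrated numerically. The sketch below is only an illustration with made-up sample values; the function names are hypothetical. It computes $\lambda$ from $(E, \mathbf P)$ and $(\tilde E, \tilde{\mathbf P})$, rescales the energy-momentum vector, and evaluates the bound on $|\lambda^{-2} - 1|$ stated above.

```python
import math

def minkowski_norm_sq(E, P):
    """E^2 - |P|^2 for an energy E and a momentum vector P."""
    return E * E - sum(c * c for c in P)

def rescale(E, P, E_t, P_t):
    """Return (lam, E_bar, P_bar), where lam^2 = (E^2 - |P|^2) / (E_t^2 - |P_t|^2)."""
    lam = math.sqrt(minkowski_norm_sq(E, P) / minkowski_norm_sq(E_t, P_t))
    return lam, lam * E_t, [lam * c for c in P_t]

def lambda_bound(E, P, eps):
    """Right-hand side 2 eps / (E - |P|) + 2 eps^2 / (E^2 - |P|^2)."""
    abs_p = math.sqrt(sum(c * c for c in P))
    return 2.0 * eps / (E - abs_p) + 2.0 * eps ** 2 / (E * E - abs_p * abs_p)
```

For instance, with $E = 2$, $\mathbf P = (1, 0, 0)$, $\tilde E = 2.01$, $\tilde{\mathbf P} = (1, 0.005, 0)$ and $\epsilon = 0.01$, the rescaled vector has the same Minkowski norm as $(E, \mathbf P)$ and $|\lambda^{-2} - 1|$ stays below the bound.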
To prove Theorem 3, we need the following degree argument.
Lemma 5.2. Fix the constant $a > 0$. Let $B_a(z_0) \subset \mathbb R^n$ denote the closed ball centered at $z_0$ with radius $a$. Let $f : B_a(z_0) \to \mathbb R^n$ be a continuous map satisfying, for any $z \in B_a(z_0)$, $|f(z) - z| \le a$. Then $f^{-1}(z_0)$ is non-empty. More precisely, either $f(z) = z_0$ for some $z \in \partial B_a(z_0)$ or the degree of $f$ at $z_0$ is one.
Proof. By scaling, we only need to prove the case when $a = 1$. We define the continuous homotopy between $f$ and the identity map for $0 \le t \le 1$: $h(z, t) = (1 - t)z + tf(z)$. For a boundary point $z \in \partial B_1(z_0)$,
$$(h(z, t) - z_0)\cdot(z - z_0) = [(z - z_0) + t(f(z) - z)]\cdot(z - z_0) \ge |z - z_0|^2 - t|f(z) - z||z - z_0| \ge 1 - t.$$
Then either $f(z) = z_0$ for some $z \in \partial B_1(z_0)$ or $h(z, t) \neq z_0$ for all $0 \le t \le 1$ and for all $z \in \partial B_1(z_0)$. In particular, the latter case implies that $z_0$ stays in the range of $h(\cdot, t)$ for all $t \in [0, 1]$. Therefore, $f^{-1}(z_0)$ is non-empty.
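A small two-dimensional example may help to see Lemma 5.2 at work. Everything specific below is an assumption made for illustration: the radius `A`, the centre `Z0`, and the sample map `f`, whose perturbation has norm at most $0.6\sqrt2\,a < a$. The preimage is located by a fixed-point iteration, which works here only because this particular perturbation is a contraction; the lemma itself relies on the degree argument and needs no such property. The boundary inequality is checked for radius $a$, where it reads $(h(z,t) - z_0)\cdot(z - z_0) \ge a^2(1 - t)$.

```python
import math

A = 0.2            # radius a of the ball (sample value)
Z0 = (1.0, -0.5)   # centre z_0 (sample value)

def f(z):
    """Hypothetical continuous map with |f(z) - z| <= A everywhere."""
    return (z[0] + 0.6 * A * math.sin(3.0 * z[1]),
            z[1] + 0.6 * A * math.cos(2.0 * z[0]))

def boundary_pairing(theta, t):
    """(h(z,t) - z0).(z - z0) for z on the boundary circle, h = (1 - t) z + t f(z)."""
    z = (Z0[0] + A * math.cos(theta), Z0[1] + A * math.sin(theta))
    fz = f(z)
    h = ((1.0 - t) * z[0] + t * fz[0], (1.0 - t) * z[1] + t * fz[1])
    return (h[0] - Z0[0]) * (z[0] - Z0[0]) + (h[1] - Z0[1]) * (z[1] - Z0[1])

def solve_preimage(iterations=200):
    """Iterate z <- z0 - (f(z) - z); converges because the perturbation is a contraction."""
    z = Z0
    for _ in range(iterations):
        fz = f(z)
        z = (Z0[0] - (fz[0] - z[0]), Z0[1] - (fz[1] - z[1]))
    return z
```

The returned point lies in the ball and is mapped to $z_0$ by `f`, as the lemma guarantees for any continuous map satisfying the hypothesis.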
Proof of Theorem 3. Denote the given constant vector $(\vec\alpha_0, \vec\gamma_0)$ by $z_0 \in \mathbb R^6$. We may without loss of generality prove only for the case $|\vec\alpha_0| \neq 0$ and $|\vec\gamma_0| \neq 0$, for if $\vec\alpha_0$ (or $\vec\gamma_0$) is the zero vector, we apply the theorem twice to a non-zero constant vector $\vec v$ and then to $-\vec v$. We define the map $f : B_\epsilon(z_0) \subset \mathbb R^6 \to \mathbb R^6$ by
$$f(\vec\alpha, \vec\gamma) = (\bar{\mathbf J} - \mathbf J, \bar{\mathbf C} - \mathbf C),$$
where $\bar{\mathbf J}$ and $\bar{\mathbf C}$ are the angular momentum and center of mass of $(\bar g, \bar\pi)$ constructed in Proposition 5.1. By the construction, $|f(z) - z| \le \epsilon$. Once we verify that $f$ is continuous, we apply Lemma 5.2 to obtain $f(\vec\alpha, \vec\gamma) = (\vec\alpha_0, \vec\gamma_0)$ for some $(\vec\alpha, \vec\gamma)$ and complete the proof of Theorem 3.
Claim. The map $f$ is continuous.
Proof. Given non-zero vectors $(\vec\alpha_0, \vec\gamma_0)$, let $(\sigma, \tau, \hat\sigma)$ be the symmetric $(0,2)$ tensors and $k, l$ be the integers in the proof of Theorem 5.1 for $(\vec\alpha_0, \vec\gamma_0)$. (Notice that $k, l$ may be chosen large depending only on $\vec\alpha_0$, $\vec\gamma_0$, and $\epsilon$.) Assume that $(\vec\alpha, \vec\gamma)$ is another pair of constant vectors. We fix the symmetric $(0,2)$ tensors $\sigma, \tau, \hat\sigma$ and the annular shells determined by $k$ and $l$. We choose the symmetric $(0,2)$ tensors for $\vec\alpha, \vec\gamma$ from $\sigma, \tau, \hat\sigma$ and apply the construction in Theorem 5.1 over the annular shells determined by $k$ and $l$. That the choice of the symmetric $(0,2)$ tensors depends continuously on $\vec\alpha$ can be seen as follows: Let $R_1 : \mathbb R^3 \to \mathbb R^3$ be a rotation so that $R_1(\vec\alpha_0)$ is parallel to $\vec\alpha$. The symmetric $(0,2)$ tensors defined by
$$\frac{|\vec\alpha|}{|\vec\alpha_0|}(R_1^{-1})^*\sigma, \qquad \frac{|\vec\alpha|}{|\vec\alpha_0|}(R_1^{-1})^*\tau$$
satisfy the corresponding condition (5.3) for $\vec\alpha$ on the right-hand side. It is easy to check that at each point
$$\left|\frac{|\vec\alpha|}{|\vec\alpha_0|}(R_1^{-1})^*\sigma - \sigma\right| \le |\vec\alpha - \vec\alpha_0|\left(\frac{3}{2|\vec\alpha_0|}|\sigma| + \frac12|D\sigma|\right).$$
A similar estimate can be derived for the other tensor $\tau$. For $\vec\gamma$ and the integral (5.4), we can also choose the tensor corresponding to $\hat\sigma$ in the same fashion. It is straightforward to check that the rest of the construction is continuous, and hence $f$ is continuous.
Corollary 5.3. Given any constant vector $(E, \mathbf P, \mathbf J, \mathbf C) \in \mathbb R^{10}$ with $E > |\mathbf P|$, there exists a smooth and complete asymptotically flat vacuum initial data set whose energy, linear momentum, angular momentum, and center of mass are the corresponding components of this constant vector.
Proof. By Theorem 3, it suffices to show that there exists a vacuum initial data set with the specified $E$ and $\mathbf P$. By the results on the global existence of the Cauchy problem (see [2,14,15]), given a strongly asymptotically flat vacuum initial data set $(g, \pi)$ close to the flat data, there exist future and past complete vacuum developments. In particular, we can boost the slice in spacetime and then rotate the coordinates so that the energy-momentum vector of $(g, \pi)$ is parallel to the given vector $(E, \mathbf P)$. Then, by scaling the data, we obtain a vacuum initial data set with the desired energy-momentum vector $(E, \mathbf P)$.
Acknowledgements. LH supported by DMS-1005560, RS by DMS-0604960, MW by DMS-0904281.
References
1. Beig, R., Ó Murchadha, N.: The Poincaré group as the symmetry group of canonical general relativity. Ann. Physics 174(2), 463–498 (1987)
2. Christodoulou, D., Klainerman, S.: The global nonlinear stability of the Minkowski space. Princeton Mathematical Series, 41. Princeton University Press, 1994
3. Christodoulou, D., Ó Murchadha, N.: The boost problem in general relativity. Commun. Math. Phys. 80(2), 271–300 (1981)
4. Chruściel, P.T.: Mass and angular-momentum inequalities for axi-symmetric initial data sets. I. Positivity of mass. Ann. Physics 323(10), 2566–2590 (2008)
5. Chruściel, P.T., Corvino, J., Isenberg, J.: Construction of N-body initial data sets in general relativity. Commun. Math. Phys. 304, 637–647 (2011)
6. Chruściel, P.T., Delay, E.: On mapping properties of the general relativistic constraints operator in weighted function spaces, with applications. Mém. Soc. Math. Fr. (N.S.) 94 (2003)
7. Chruściel, P.T., Li, Y., Weinstein, G.: Mass and angular-momentum inequalities for axi-symmetric initial data sets. II. Angular momentum. Ann. Physics 323(10), 2591–2613 (2008)
8. Chruściel, P.T., Costa, J.: Mass angular-momentum and charge inequalities for axisymmetric initial data. Class. Quantum Grav. 26(23), 235013 (2009)
9. Corvino, J.: Scalar curvature deformation and a gluing construction for the Einstein constraint equations. Commun. Math. Phys. 214(1), 137–189 (2000)
10. Corvino, J., Schoen, R.: On the asymptotics for the vacuum Einstein constraint equations. J. Diff. Geom. 73(2), 185–217 (2006)
11. Dain, S.: Proof of the angular momentum-mass inequality for axisymmetric black holes. J. Diff. Geom. 79, 3–67 (2008)
12. Huang, L.-H.: On the center of mass of isolated physical systems with general asymptotics. Class. Quantum Grav. 26(1), 015012 (2009)
13. Huang, L.-H.: Solutions of special asymptotics to the Einstein constraint equations. Class. Quantum Grav. 27(24), 245002 (2010)
14. Klainerman, S., Nicolò, F.: The evolution problem in general relativity. Progress in Mathematical Physics, 25, Boston, MA: Birkhauser, 2003
15. Lindblad, H., Rodnianski, I.: Global existence for the Einstein vacuum equations in wave coordinates. Commun. Math. Phys. 256(1), 43–110 (2005)
16. Regge, T., Teitelboim, C.: Role of Surface Integrals in the Hamiltonian Formulation of General Relativity. Ann. Phys. 88, 286–318 (1974)
17. Zhang, X.: Angular momentum and positive mass theorem. Commun. Math. Phys. 206, 137–155 (1999)
Communicated by P.T. Chruściel
Commun. Math. Phys. 306, 805–830 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1302-1
Communications in
Mathematical Physics
Faithful Squashed Entanglement Fernando G. S. L. Brandão1 , Matthias Christandl2 , Jon Yard3 1 Departamento de Física, Universidade Federal de Minas Gerais, Belo Horizonte, Caixa Postal 702,
MG 30123-970, Brazil. E-mail:
[email protected]
2 Institute for Theoretical Physics, ETH Zurich, Wolfgang-Pauli-Strasse 27, CH-8057 Zurich, Switzerland.
E-mail:
[email protected]
3 Center for Nonlinear Studies (CNLS), Computer, Computational and Statistical Sciences (CCS-3),
Los Alamos National Laboratory, Los Alamos, NM 87545, USA. E-mail:
[email protected];
[email protected]
Received: 8 October 2010 / Accepted: 14 June 2011 Published online: 2 August 2011 – © Springer-Verlag 2011
Abstract: Squashed entanglement is a measure for the entanglement of bipartite quantum states. In this paper we present a lower bound for squashed entanglement in terms of a distance to the set of separable states. This implies that squashed entanglement is faithful, that is, it is strictly positive if and only if the state is entangled. We derive the lower bound on squashed entanglement from a lower bound on the quantum conditional mutual information which is used to define squashed entanglement. The quantum conditional mutual information corresponds to the amount by which strong subadditivity of von Neumann entropy fails to be saturated. Our result therefore sheds light on the structure of states that almost satisfy strong subadditivity with equality. The proof is based on two recent results from quantum information theory: the operational interpretation of the quantum mutual information as the optimal rate for state redistribution and the interpretation of the regularised relative entropy of entanglement as an error exponent in hypothesis testing. The distance to the set of separable states is measured in terms of the LOCC norm, an operationally motivated norm giving the optimal probability of distinguishing two bipartite quantum states, each shared by two parties, using any protocol formed by local quantum operations and classical communication (LOCC) between the parties. A similar result for the Frobenius or Euclidean norm follows as an immediate consequence. The result has two applications in complexity theory. The first application is a quasipolynomial-time algorithm solving the weak membership problem for the set of separable states in LOCC or Euclidean norm. The second application concerns quantum Merlin-Arthur games. Here we show that multiple provers are not more powerful than a single prover when the verifier is restricted to LOCC operations thereby providing a new characterisation of the complexity class QMA.
Contents
1. Introduction
2. Results
3. Proof of Theorem
   A Proof of Lemma 1
   B Proof of Lemma 2
   C Proof of Lemma 3
   D Proof of Theorem
4. Proofs of Corollaries
References
1. Introduction
The correlations of a bipartite quantum state $\rho_{AB}$ can be measured by the quantum mutual information
$$I(A;B)_\rho := H(A)_\rho + H(B)_\rho - H(AB)_\rho, \tag{1}$$
where $H(X)_\rho := -\operatorname{tr}(\rho_X\log\rho_X)$ is the von Neumann entropy. Subadditivity of entropy implies that mutual information is always non-negative. If $\|\ast\|_1$ is the trace norm, defined as $\|X\|_1 = \operatorname{tr}\sqrt{X^\dagger X}$, the inequality
$$I(A;B)_\rho \ge \frac{1}{2\ln2}\,\|\rho_{AB} - \rho_A\otimes\rho_B\|_1^2 \tag{2}$$
follows from Pinsker's inequality [1] for the relative entropy. Thus a bipartite state has zero mutual information if – and only if – it has no correlations (i.e. it is a product state).
Conditional mutual information, in turn, measures the correlations of two quantum systems relative to a third. Given a tripartite state $\rho_{ABE}$, it is defined as
$$I(A;B|E)_\rho := H(AE)_\rho + H(BE)_\rho - H(ABE)_\rho - H(E)_\rho. \tag{3}$$
This measure is also always non-negative, as a consequence of the celebrated strong subadditivity inequality for the von Neumann entropy proved by Lieb and Ruskai [2]. As for mutual information, one can ask which states have zero conditional mutual information. Such a characterization was given in Ref. [3]. A state $\rho_{ABE}$ has $I(A;B|E)_\rho = 0$ if and only if it is a so-called quantum Markov chain, i.e. there is a decomposition of the $E$ system vector space
$$\mathcal H_E = \bigoplus_j\mathcal H_{e_j^L}\otimes\mathcal H_{e_j^R} \tag{4}$$
into a direct sum of tensor products such that
$$\rho_{ABE} = \bigoplus_jp_j\,\rho_{Ae_j^L}\otimes\rho_{Be_j^R}, \tag{5}$$
with states $\rho_{Ae_j^L}$ on $\mathcal H_A\otimes\mathcal H_{e_j^L}$, $\rho_{Be_j^R}$ on $\mathcal H_B\otimes\mathcal H_{e_j^R}$ and probabilities $p_j$. One might ask: Is there an inequality analogous to (2) for the conditional mutual information? Here one could expect the minimum distance to quantum Markov chain states to play the role of the distance to the tensor product of the reductions in (2). Up to now, however, only negative results were obtained in this direction [4]. Note, in particular, that the $AB$ reduction of any state of the form (5) is separable [5], i.e.
$$\rho_{AB} = \sum_jp_j\,\rho_{A,j}\otimes\rho_{B,j}. \tag{6}$$
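Inequality (2) can be checked by hand in the classical special case, where $\rho_{AB}$ is diagonal and all quantities reduce to their Shannon counterparts. The following sketch assumes entropies are measured in bits (base-2 logarithm), which matches the factor $1/(2\ln2)$; the function names and the sample distributions in the usage note are hypothetical.

```python
import math

def marginals(p):
    """Row and column marginals of a joint distribution given as a nested list."""
    pa = [sum(row) for row in p]
    pb = [sum(p[a][b] for a in range(len(p))) for b in range(len(p[0]))]
    return pa, pb

def mutual_information_bits(p):
    """Classical mutual information I(A;B) in bits."""
    pa, pb = marginals(p)
    total = 0.0
    for a, row in enumerate(p):
        for b, pab in enumerate(row):
            if pab > 0.0:
                total += pab * math.log2(pab / (pa[a] * pb[b]))
    return total

def l1_distance_to_product(p):
    """Trace-norm distance between p_AB and p_A x p_B for diagonal (classical) states."""
    pa, pb = marginals(p)
    return sum(abs(p[a][b] - pa[a] * pb[b])
               for a in range(len(p)) for b in range(len(p[0])))

def pinsker_lower_bound(p):
    """Right-hand side of (2)."""
    return l1_distance_to_product(p) ** 2 / (2.0 * math.log(2.0))
```

For the correlated distribution `[[0.4, 0.1], [0.1, 0.4]]` the mutual information exceeds the bound, while for a product distribution such as `[[0.06, 0.14], [0.24, 0.56]]` both sides vanish up to rounding.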
Rather than finding a lower bound in terms of a distance to Markov chain states, our main result, the theorem in Sect. 2, is a lower bound, similar to the one in (2), in terms of a distance to separable states. The fact that states ρ AB E with zero conditional mutual information are such that ρ A:B is separable motivated the introduction of the entanglement measure squashed entanglement [6],1 defined as 1 (7) I (A; B|E)ρ : ρ AB E is an extension of ρ AB . E sq (ρ A:B ) = inf 2 This measure is known to be an upper bound both for distillable entanglement and distillable key [6,9,10]. It satisfies several useful properties such as monotonicity under local operations and classical communication (LOCC), additivity on tensor products [6], and monogamy [11]. A central open question in entanglement theory, posed already in [6], is whether squashed entanglement is a faithful entanglement measure, i.e. whether it vanishes if and only if the state is separable. On the one hand, an entangled state with zero squashed entanglement would imply the existence of bound key, i.e. of an entangled state from which no secret key can be extracted. Analogously, a state with a non-positive partial transpose (NPT) [12,13] and vanishing squashed entanglement would show there are NPT-bound entangled states [14,15]. Both of these are core unresolved problems in quantum information theory. On the other hand, if squashed entanglement turned out to be positive on every entangled state, then we would have the first example of an entanglement measure which is faithful, LOCC monotone and which satisfies the monogamy inequality [16], three properties usually attributed to entanglement but which are not known to be compatible with each other. The lower bound on conditional mutual information obtained in this paper resolves this question by showing that squashed entanglement is strictly positive on every entangled state. 
Besides its impact on entanglement theory, this result has a number of unexpected consequences for separability testing, quantum data hiding and quantum complexity theory. In the following section we will present the results of this work in detail, before proceeding to the proofs in subsequent sections.
$^1$ The functional (without the factor $\tfrac{1}{2}$) has also been considered in [7,8].
2. Results
A lower bound on conditional mutual information. Our main result is an approximate version of the fact that states $\rho_{ABE}$ with zero conditional mutual information $I(A;B|E)$ are such that $\rho_{A:B}$ is separable: We show that if a tripartite state has small conditional mutual information, its AB reduction is close to a separable state. The appropriate distance measure on quantum states turns out to be crucial for a fruitful formulation of the result and its consequences. By analogy with the definition of the trace norm as the optimal probability of distinguishing two quantum states, our main results involve norms that quantify the distinguishability of quantum states under measurements that are restricted by locality. Specifically, given two equiprobable states
F. G. S. L. Brandão, M. Christandl, J. Yard
$\rho$ and $\sigma$, define the distance$^2$ $\|\rho - \sigma\|_{\mathrm{LOCC}} = 4\left(P_{\mathrm{succ}} - \tfrac{1}{2}\right)$, where $P_{\mathrm{succ}} \ge \tfrac{1}{2}$ is the probability of correctly distinguishing the states, maximised over all decision procedures that can be implemented using only local operations and classical communication (LOCC) [17]. When this distance is small, the states $\rho$ and $\sigma$ will roughly behave “in the same way” when used in any LOCC protocol. Writing the distance from a state $\rho_{AB}$ to the set $\mathcal{S}_{A:B}$ of separable states on $A:B$ as
$$\|\rho_{AB} - \mathcal{S}_{A:B}\| = \min_{\sigma \in \mathcal{S}_{A:B}} \|\rho_{AB} - \sigma\|, \tag{8}$$
we may state our main result.
Theorem. For every tripartite finite-dimensional state $\rho_{ABE}$,
$$I(A;B|E)_\rho \ge \frac{1}{8 \ln 2}\, \|\rho_{AB} - \mathcal{S}_{A:B}\|^2_{\mathrm{LOCC}}. \tag{9}$$
Interestingly, a similar statement is true when replacing the LOCC norm with the Frobenius norm, since there is the dimension-independent lower bound [17]
$$\|X\|_{\mathrm{LOCC}} \ge \frac{1}{\sqrt{153}}\, \|X\|_2, \tag{10}$$
where $\|X\|_2 = \sqrt{\operatorname{tr} X^\dagger X}$. The Frobenius norm is also known as the Euclidean norm since it measures the Euclidean distance between quantum states when interpreted as elements of the real vector space of Hermitian matrices. Note, however, that the theorem fails dramatically if we replace the LOCC norm by the trace norm, a counterexample being a tripartite extension of the antisymmetric state$^3$ constructed in [10].$^4$ On the other hand, the theorem readily implies a lower bound for the trace norm with a dimension-dependent factor. This is because of (10) and the well-known relation between the trace norm and the Frobenius norm
$$\|X\|_2 \ge \frac{1}{\sqrt{|A||B|}}\, \|X\|_1, \tag{11}$$
where $|A|$ denotes the dimension of system $A$ and $\|X\|_1 = \operatorname{tr} \sqrt{X^\dagger X}$.
A lower bound for squashed entanglement. Combining the theorem with the definition of squashed entanglement we find the following corollary.
Corollary 1. For every state $\rho_{AB}$,
$$E_{\mathrm{sq}}(\rho_{A:B}) \ge \frac{1}{16 \ln 2}\, \|\rho_{AB} - \mathcal{S}_{A:B}\|^2_{\mathrm{LOCC}}. \tag{12}$$
Because $\|\cdot\|_{\mathrm{LOCC}}$ is a norm, this implies that squashed entanglement is faithful, i.e. that it is strictly positive on every entangled state. This property had been conjectured in [6] and its resolution here settles the last major open question regarding squashed entanglement. Squashed entanglement is the entanglement measure which, among all known entanglement measures, satisfies most properties that have been proposed as useful for an entanglement measure. A comparison of different entanglement measures is provided in Table 1.
$^2$ This definition extends to a norm for the operators on AB, see (33).
$^3$ The antisymmetric state is a particular Werner state defined as $\omega_{AB} := \frac{\mathbb{I} - F}{d(d-1)}$, with $F$ the flip, or swap, operator.
$^4$ Indeed, Ref. [10] presents an extension $\omega_{ABE}$ of $\omega_{AB}$ such that $I(A;B|E)_\omega \le \frac{4 \log e}{|A| - 1}$, while a simple calculation gives that $\omega_{AB}$ is $\tfrac{1}{2}$ away from any separable state in trace distance.
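The dimension-dependent trace-norm bound mentioned after the theorem can be made explicit by chaining (9), (10) and (11). The following is a sketch of that arithmetic only; the symbol $\sigma^\ast$, denoting a separable state closest to $\rho_{AB}$ in LOCC norm, is introduced here for the derivation and does not appear in the original text.

```latex
\begin{align*}
\|\rho_{AB}-\mathcal{S}_{A:B}\|_{\mathrm{LOCC}}
  &= \|\rho_{AB}-\sigma^\ast\|_{\mathrm{LOCC}}
   \;\ge\; \frac{1}{\sqrt{153}}\,\|\rho_{AB}-\sigma^\ast\|_{2}
   && \text{by (10)} \\
  &\ge \frac{1}{\sqrt{153\,|A||B|}}\,\|\rho_{AB}-\sigma^\ast\|_{1}
   && \text{by (11)} \\
  &\ge \frac{1}{\sqrt{153\,|A||B|}}\,\|\rho_{AB}-\mathcal{S}_{A:B}\|_{1},
\end{align*}
% Squaring and inserting into (9), with 8 * 153 = 1224:
\[
I(A;B|E)_\rho \;\ge\; \frac{1}{8\ln 2}\,\|\rho_{AB}-\mathcal{S}_{A:B}\|_{\mathrm{LOCC}}^{2}
\;\ge\; \frac{1}{1224\,\ln 2\,|A||B|}\,\|\rho_{AB}-\mathcal{S}_{A:B}\|_{1}^{2}.
\]
```

The last step of the first chain uses only that $\sigma^\ast$ is separable, so its trace-norm distance to $\rho_{AB}$ is at least the minimum over all separable states.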
Table 1. If no citation is given, the property either follows directly from the definition or was derived by the authors of the main reference. Many recent results listed in this table have significance beyond the study of entanglement measures, such as Hastings’s counterexample to the additivity conjecture of the minimum output entropy [33] which implies that entanglement of formation is not strongly superadditive [32]

Measure                   E_sq [6]   E_D [18,19]   K_D [20,21]   E_C [18,22]   E_F [18]    E_R [23]   E_R^∞ [24]   E_N [25]
normalisation             y          y             y             y             y           y          y            y
faithfulness              y Cor. 1   n [14]        ?             y [26]        y           y          y [27]       n
LOCC monotonicity (a)     y          y             y             y             y           y          y            y [28]
asymptotic continuity     y [29]     ?             ?             ?             y           y [30]     y [9]        n [9]
convexity                 y          ?             ?             ?             y           y          y [31]       n
strong superadditivity    y          y             y             ?             n [32,33]   n [34]     ?            ?
subadditivity             y          ?             ?             y             y           y          y            y
monogamy                  y [11]     ?             ?             n [10]        n [10]      n [10]     n [10]       ?

(a) More precisely, we consider “weak” LOCC monotonicity, i.e. monotonicity without averages
Squashed entanglement is the quantum analogue of the intrinsic information, which is defined as
$$I(X;Y \downarrow Z) := \inf_{P_{\bar{Z}|Z}} I(X;Y|\bar{Z}), \tag{13}$$
for a triple of random variables $X, Y, Z$ [35]. The minimisation extends over all conditional probability distributions mapping $Z$ to $\bar{Z}$. It has been shown that the minimisation can be restricted to random variables $\bar{Z}$ with size $|\bar{Z}| = |Z|$ [36]. This implies that the minimum is achieved and in particular that the intrinsic information only vanishes if there exists a channel $Z \to \bar{Z}$ such that $I(X;Y|\bar{Z}) = 0$. Whereas our work does not allow us to derive a dimension bound on the system $E$ in the minimisation of squashed entanglement and hence conclude that the minimisation is achieved in general, we can assert such a bound if squashed entanglement vanishes: Corollary 1 implies that squashed entanglement vanishes only for separable $\rho_{AB}$. By Carathéodory’s theorem, the number of terms in the separable decomposition $\sum_i p_i\, \rho_{A,i} \otimes \rho_{B,i}$ can be bounded by $|AB|^2$, and thus $\rho_{ABE} = \sum_i p_i\, \rho_{A,i} \otimes \rho_{B,i} \otimes |i\rangle\langle i|_E$ has vanishing conditional mutual information with $|E| = |AB|^2$. Equivalently, there exists a channel applied to a purification of $\rho_{AB}$ resulting in $\rho_{ABE}$ such that $I(A;B|E)_\rho$ vanishes.
Positive rate in state redistribution. Quantum conditional mutual information can be given an operational interpretation in the context of the state redistribution problem [37,38]. Suppose a sender and a receiver share a quantum state $|\psi\rangle_{ABEE'}$ and the sender (who initially holds subsystems $BE$) wants to send $B$ to the receiver (who initially holds $E'$), while preserving the purity of the global state. The purifying subsystem $A$ is unavailable to both of them. Given free entanglement between the sender and receiver, the minimum achievable rate of quantum communication needed to send $B$, with vanishing error in the limit of a large number of copies of the state, is given by $\tfrac{1}{2} I(A;B|E)_\psi$ [37,38].
Then from an optimal protocol for state redistribution we find $E_{\mathrm{sq}}(\rho_{A:B})$ to be the minimum rate of quantum communication needed to send the $B$ system, optimised over all possible ways of distributing the side-information among $E$ and $E'$ [39]. One can ask: Is a positive rate of quantum communication always required in order to redistribute a system entangled with another (irrespective of the side information available to the sender and receiver)? Corollary 1 allows us to answer this question in the affirmative. The need of a positive rate in state redistribution can then be seen as a new distinctive feature of quantum correlations.
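The earlier observation that the extension $\sum_i p_i\, \rho_{A,i} \otimes \rho_{B,i} \otimes |i\rangle\langle i|_E$ of a separable state has vanishing conditional mutual information has a purely classical counterpart: if $X$ and $Y$ are independent given $Z$, then $I(X;Y|Z) = 0$. The sketch below checks this for small distributions. It is a classical toy model only, not a computation on density matrices; the function names are hypothetical and logarithms are taken to base 2.

```python
import math
from collections import defaultdict

def conditional_mutual_information(joint):
    """I(X;Y|Z) in bits for a classical joint distribution {(x, y, z): prob}."""
    pz, pxz, pyz = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, y, z), p in joint.items():
        pz[z] += p
        pxz[(x, z)] += p
        pyz[(y, z)] += p
    total = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            # p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ]
            total += p * math.log2(p * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
    return total

def conditionally_independent(weights, px_given_z, py_given_z):
    """Classical analogue of the separable extension: p(z) p(x|z) p(y|z)."""
    joint = {}
    for z, w in enumerate(weights):
        for x, a in enumerate(px_given_z[z]):
            for y, b in enumerate(py_given_z[z]):
                joint[(x, y, z)] = w * a * b
    return joint

# X and Y are perfectly correlated, but only through Z.
correlated_via_z = conditionally_independent(
    [0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
)
# X and Y are perfectly correlated and Z is trivial.
correlated_directly = {(0, 0, 0): 0.5, (1, 1, 0): 0.5}
```

In the first distribution the label $Z$ plays the role of the register $|i\rangle\langle i|_E$: conditioning on it removes all correlation between $X$ and $Y$, whereas in the second distribution no such side information is available.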
The quantum de Finetti theorem, the LOCC norm and data-hiding states. We say a bipartite state $\rho_{A:B}$ is k-extendible if there is a state $\rho_{A:B_1,\ldots,B_k}$ that is permutation-symmetric in the $B$ systems with $\rho_{A:B} = \operatorname{tr}_{B_2,\ldots,B_k}(\rho_{A:B_1,\ldots,B_k})$. Such a family of states provides a sequence of approximations to the set of separable states. A state is separable if, and only if, it is k-extendible for every k [40–45]. Indeed, quantum versions of the de Finetti theorem [44,45] show that any k-extendible state $\rho_{A:B}$ is such that
$$\|\rho_{AB} - \mathcal{S}_{A:B}\|_1 \le \frac{4|B|^2}{k}. \tag{14}$$
Moreover, this bound is close to tight, as there are k-extendible states that are $\Omega(|B| k^{-1})$-away from the set of separable states [45]. Unfortunately, for many applications this error estimate, exponentially large in the number of qubits of the state, is too big to be useful. Our next result shows that a significant improvement is possible if we are willing to relax our notion of distance of two quantum states. It shows that in LOCC norm we can obtain an error term that grows as the square root of the number of qubits of the A system:
Corollary 2. Let $\rho_{A:B}$ be k-extendible. Then
$$\|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathrm{LOCC}} \le \sqrt{\frac{16 \ln 2\, \log |A|}{k}}. \tag{15}$$
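To get a feeling for the numbers in Corollary 2, the bound (15) can be evaluated directly and inverted to find how many extensions guarantee a target LOCC distance. This is a minimal sketch under the assumption that $\log$ denotes the base-2 logarithm, so that $\log|A|$ counts the qubits of $A$; the function names are hypothetical.

```python
import math

def locc_distance_bound(dim_a: int, k: int) -> float:
    """Right-hand side of (15): sqrt(16 ln 2 * log2|A| / k)."""
    if dim_a < 2 or k < 1:
        raise ValueError("need |A| >= 2 and k >= 1")
    return math.sqrt(16 * math.log(2) * math.log2(dim_a) / k)

def extension_order(dim_a: int, eps: float) -> int:
    """Smallest k for which the bound (15) is at most eps."""
    if dim_a < 2 or eps <= 0:
        raise ValueError("need |A| >= 2 and eps > 0")
    return math.ceil(16 * math.log(2) * math.log2(dim_a) / eps ** 2)

# Two qubits on A (|A| = 4) and target LOCC distance 0.5.
k_needed = extension_order(4, 0.5)
bound_at_k = locc_distance_bound(4, k_needed)
```

Note that the required $k$ depends on $|A|$ only through $\log|A|$ and does not depend on $|B|$ at all, in contrast to the trace-norm estimate (14).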
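The key point in the proof of Corollary 2 is the fact that squashed entanglement satisfies the so-called monogamy inequality [11], namely
$$E_{\mathrm{sq}}(\rho_{A:B_1 B_2}) \ge E_{\mathrm{sq}}(\rho_{A:B_1}) + E_{\mathrm{sq}}(\rho_{A:B_2}), \tag{16}$$
for every state $\rho_{A B_1 B_2}$. This, together with the bound $E_{\mathrm{sq}}(\rho_{A:B}) \le \log |A|$ [6] implies that the squashed entanglement of any k-extendible state must be smaller than $k^{-1} \log |A|$, which combined with Corollary 1 gives Corollary 2. We do not know if the bound given in Corollary 2 is tight. An indication, however, is given by the following example, which shows that for all k there is a k-extendible state $\rho_{AB}$ with $\log |A| = k$ and $\|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathrm{LOCC}} \ge 1$.
Example (Lower bound). Consider systems $A = A_1 A_2 \cdots A_k$ and $B = \tilde{B} \bar{B}$ and define
$$\rho_{AB} := \operatorname{tr}_{B_2 \cdots B_k}(\rho_{A B_1 B_2 \cdots B_k}), \tag{17}$$
where, with $B_j = \tilde{B}_j \bar{B}_j$,
$$\rho_{A_1 \ldots A_k : \tilde{B}_1 \bar{B}_1 \cdots \tilde{B}_k \bar{B}_k} := \left(\mathrm{Id}_{A_1 \cdots A_k} \otimes \mathrm{Sym}_{\tilde{B}_1 \bar{B}_1, \ldots, \tilde{B}_k \bar{B}_k}\right)\left(\Phi_{A_1 \tilde{B}_1} \otimes |1\rangle\langle 1|_{\bar{B}_1} \otimes \cdots \otimes \Phi_{A_k \tilde{B}_k} \otimes |k\rangle\langle k|_{\bar{B}_k}\right), \tag{18}$$
where $\Phi := |\Phi\rangle\langle\Phi|$ is the projector onto an EPR pair $|\Phi\rangle := (|0\rangle|0\rangle + |1\rangle|1\rangle)/\sqrt{2}$, $\mathrm{Id}$ is the identity superoperator and $\mathrm{Sym}_{B_1,\ldots,B_k}$ is the symmetrization superoperator defined as
$$\mathrm{Sym}_{B_1,\ldots,B_k}(X) := \frac{1}{k!} \sum_{\pi \in S_k} P_\pi X P_{\pi^{-1}}, \tag{19}$$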
with $S_k$ the symmetric group of order k and $P_\pi$ the representation of the permutation $\pi$ which acts on a k-partite vector space as $P_\pi |l_1\rangle \otimes \cdots \otimes |l_k\rangle = |l_{\pi^{-1}(1)}\rangle \otimes \cdots \otimes |l_{\pi^{-1}(k)}\rangle$. The state $\rho_{AB}$ takes the form
$$\rho_{AB} = \frac{1}{k} \sum_{l=1}^{k} \Phi_{A_l \tilde{B}} \otimes |l\rangle\langle l|_{\bar{B}} \otimes \tau_{A_1 \cdots A_{l-1} A_{l+1} \cdots A_k}, \tag{20}$$
with $\tau_C := \mathbb{I}/|C|$ and is by construction k-extendible. Moreover we can deterministically obtain an EPR pair from $\rho_{AB}$ by the following simple LOCC protocol: Bob measures the $\bar{B}$ register in the computational basis and tells Alice the outcome $l$ obtained. Then she traces out all her systems except the $l$-th. Thus the LOCC distance of $\rho_{AB}$ to separable states is at least the LOCC distance of an EPR pair to separable states, which is known to be one [46], and so
$$\|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathrm{LOCC}} \ge 1. \tag{21}$$
A direct implication of Corollary 2 concerns data-hiding states [47–50]. Every state $\rho_{AB}$ that can be well-distinguished from separable states by a global measurement, yet is almost completely indistinguishable from a separable state by LOCC measurements is a so-called data-hiding state: it can be used to hide a bit of information (of whether the state prepared is $\rho_{AB}$ or its closest separable state in LOCC norm) that is not accessible by LOCC operations alone. The antisymmetric state of sufficiently high dimension is an example of a data hiding state [49], as are random mixed states with high probability [50] (given an appropriate choice of the dimensions and the rank of the state). Corollary 2 shows that highly extendible states that are far away in trace norm from the set of separable states must necessarily be data-hiding.
A quasipolynomial-time algorithm for separability of quantum states. Given a density matrix describing a bipartite quantum state $\rho_{A:B}$, the separability problem consists of determining if the state is entangled or separable. This is one of the most basic questions in quantum information theory and has been a topic of active research in the past years (see e.g. [15]). In the weak-membership problem $\mathrm{WSEP}(\epsilon, \|\cdot\|)$ for separability one should decide if a given bipartite state $\rho_{A:B}$ is in the interior of the set of separable states or $\epsilon$-away from any separable state (in norm $\|\cdot\|$), given the promise that one of the two alternatives holds true. The best known algorithms for the problem have worst case complexity $2^{\mathrm{poly}(|A|,|B|) \log(1/\epsilon)}$ (see e.g. [51–54]). In fact, the problem is NP-hard for $\epsilon = 1/\mathrm{poly}(n)$ [55–57]. Corollary 2 implies that for the LOCC and Euclidean norm, one can greatly improve upon the previous known algorithms.
Corollary 3. There is a quasipolynomial-time algorithm for solving the weak membership problem for separability in the LOCC norm and the Euclidean norm.
More precisely, there is an $\exp(O(\epsilon^{-2} \log |A| \log |B|))$-time algorithm for deciding $\mathrm{WSEP}(\epsilon, \|\cdot\|_{\mathrm{LOCC}})$ and $\mathrm{WSEP}(\epsilon, \|\cdot\|_2)$. We note that in many applications of the separability algorithm, e.g. assessing the usefulness of a quantum state for violating Bell’s inequalities or for performing quantum teleportation, the LOCC norm is actually the relevant quantity to consider. Also, it is intriguing that the complexity of our algorithm matches the best hardness result for the trace-norm version of the problem [58]. It is an open question if a similar hardness result
could be obtained for the LOCC norm case, which would imply that our algorithm is essentially optimal. The algorithm, which we analyse in more detail in Sect. 4, is very simple and in fact is the basis of a well-known hierarchy of efficient tests for separability [51]. Using semidefinite programming one looks for an $O(\log |A|\, \epsilon^{-2})$-extension of $\rho_{A:B}$ and decides that the state is separable if one is found. If the state is separable, then clearly there is such an extension. If it is $\epsilon$-away from any separable state, Corollary 2 shows that no such extension exists.$^5$ More detailed information on this algorithm as well as the related problem of optimising a linear function over separable states, relevant for instance in mean-field theory, is given in [59].
Quantum Merlin-Arthur games with multiple Merlins. A final application of the theorem concerns quantum Merlin-Arthur games. The class QMA can be considered the quantum analogue of NP and is formed by all languages that can be decided in quantum polynomial-time by a verifier who is given a quantum system of polynomially many qubits as a proof (see e.g. [60]). It is natural to ask how robust this definition is. A few results are known in this direction: For example, it is possible to amplify the soundness and completeness parameters to exponential accuracy, even without enlarging the proof size [61]. Also, the class does not change if we allow a first round of logarithmic-sized quantum communication from the verifier to the prover [62]. From Corollary 2 we get a new characterisation of QMA, which at first sight might appear to be strictly more powerful: We show QMA to be equal to the class of languages that can be decided in polynomial time by a verifier who is given a constant number k of unentangled proofs and can measure them using any quantum polynomial-time implementable LOCC protocol (among the k proofs). This answers an open question of Aaronson et al. [63].
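We hope this characterisation of QMA proves useful in devising new QMA verifying systems. In order to formalise our result we consider the classes $\mathrm{QMA}_{\mathsf{M}}(k)_{m,s,c}$, defined in analogy to QMA as follows [58,63,64]:
Definition. Let $\mathsf{M}$ be a set of POVM elements. A language $L$ is in $\mathrm{QMA}_{\mathsf{M}}(k)_{m,s,c}$ if for every input $x \in \{0,1\}^n$ there exists a polynomial time implementable two outcome measurement $\{M_x, \mathbb{I} - M_x\}$ with $M_x$ in $\mathsf{M}$ such that
• Completeness: If $x \in L$, there exist k witnesses $|\psi_1\rangle, \ldots, |\psi_k\rangle$, each of m qubits, such that
$$\operatorname{tr}\left(M_x \left(|\psi_1\rangle\langle\psi_1| \otimes \cdots \otimes |\psi_k\rangle\langle\psi_k|\right)\right) \ge c. \tag{22}$$
• Soundness: If $x \notin L$, then for any states $|\psi_1\rangle, \ldots, |\psi_k\rangle$,
$$\operatorname{tr}\left(M_x \left(|\psi_1\rangle\langle\psi_1| \otimes \cdots \otimes |\psi_k\rangle\langle\psi_k|\right)\right) \le s. \tag{23}$$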
We call $\mathrm{QMA}_{\mathsf{M}}(k) = \mathrm{QMA}_{\mathsf{M}}(k)_{\mathrm{poly}(n),2/3,1/3}$. Let $\mathsf{M} = \mathrm{SEP}_{\mathrm{YES}}$ be the class of (non-normalised) separable POVM elements.$^6$
$^5$ In [51] one imposed the further constraint that the symmetric extension has to have a positive partial transpose with respect to all $B$ systems. It is an interesting open question if the worst-case complexity of the algorithm (in the LOCC norm case) can be further improved taking into consideration this extra constraint.
$^6$ POVM elements $A$ from this class form two-outcome POVMs $\{A, \mathbb{I} - A\}$ known as separable-once-removed, meaning that they are separable measurements once one of the effects is removed. Here we use the notation $\mathrm{SEP}_{\mathrm{YES}}$ to denote that the POVM element associated to accept should be separable, i.e. proportional to a separable state.
Then Harrow and Montanaro showed that $\mathrm{QMA}_{\mathrm{SEP}_{\mathrm{YES}}}(2) = \mathrm{QMA}(2) = \mathrm{QMA}(k)$ for
any $k = \mathrm{poly}(n)$ [58], i.e. two proofs are just as powerful as k proofs and one can restrict the verifier’s action to $\mathrm{SEP}_{\mathrm{YES}}$ without changing the expressive power of the class. We define $\mathrm{QMA}_{\mathrm{LOCC}}(k)$ in an analogous way, but now the verifier can only measure the k proofs with a LOCC measurement. Then we have,
Corollary 4. For $k = O(1)$,
$$\mathrm{QMA}_{\mathrm{LOCC}}(k) = \mathrm{QMA}. \tag{24}$$
In particular, for any $\epsilon > 0$,
$$\mathrm{QMA}_{\mathrm{LOCC}}(2)_{m,s,c} \subseteq \mathrm{QMA}_{O(m^2 \epsilon^{-2}),\, s+\epsilon,\, c}. \tag{25}$$
A preliminary result in the direction of Corollary 4 appeared in [65], where a similar result was shown for $\mathrm{QMA}_{\mathrm{LO}}(k)$, a variant of $\mathrm{QMA}(k)$ in which the verifier is restricted to implement only local measurements on the k proofs and jointly post-process the outcomes classically.$^7$ It is an open question whether (25) remains true if we consider $\mathrm{QMA}(2)$ instead of $\mathrm{QMA}_{\mathrm{LOCC}}(2)$. If this turns out to be the case, then it would imply an optimal conversion of $\mathrm{QMA}(2)$ into QMA in what concerns the proof length (under a plausible complexity-theoretic assumption). For it follows from [58] (based on the $\mathrm{QMA}(\sqrt{n}\, \mathrm{poly} \log(n))_{\log(n),1,1/3}$ protocol for 3-SAT of [63]) that unless there is a subexponential-time quantum algorithm for 3-SAT, then there is a constant $\epsilon_0 > 0$ such that for every $\delta > 0$,
$$\mathrm{QMA}(2)_{m,s,c} \nsubseteq \mathrm{QMA}_{O(m^{2-\delta} \epsilon_0^{-2}),\, s+\epsilon_0,\, c}. \tag{26}$$
Another open question is whether a similar result holds for $\mathrm{QMA}_{\mathrm{LOCC}}(2)$. This would be the case if one could construct a LOCC version of the productness test of Harrow and Montanaro [58]. An interesting approach to the QMA(2) vs. QMA question concerns the existence of disentangler superoperators [63], defined as follows.
Definition. A superoperator $\Lambda : S \to AB$ is a $(\log |S|, \epsilon, \delta)$-disentangler in the $\|\cdot\|$ norm if
• $\Lambda(\rho)$ is $\epsilon$-close to a separable state for every $\rho$, and
• for every separable state $\sigma$, there is a $\rho$ such that $\Lambda(\rho)$ is $\delta$-close to $\sigma$.
As noted in [63], the existence of an efficiently implementable $(\mathrm{poly}(\log |A|, \log |B|), \epsilon, \delta)$-disentangler in trace norm (for sufficiently small $\epsilon$ and $\delta$) would imply QMA(2) = QMA. Watrous has conjectured that this is not the case and that for every $\epsilon, \delta < 1$, any $(\epsilon, \delta)$-disentangler (in trace-norm) requires $|S| = 2^{\Omega(\min(|A|,|B|))}$. For an exponentially large $|S|$, in turn, Matsumoto presented a construction of a quantum disentangler [66] and the quantum de Finetti theorem gives an alternative construction [45]. Corollary 2 can be rephrased as saying that there is an efficient disentangler in LOCC norm.
Corollary 5. Let $k = 16 \ln 2\, \log |A|\, \epsilon^{-2}$ and $S := A B_1 \cdots B_k$. Define the superoperator $\Lambda : S \to AB$, with $|B| = |B_j|$ for all $j \le k$, as
$$\Lambda(\rho_{A B_1 \cdots B_k}) := \frac{1}{k} \sum_{i=1}^{k} \rho_{A B_i}. \tag{27}$$
$^7$ $\mathrm{QMA}_{\mathrm{LO}}(k)$ is also called BellQMA(k) [63] since the verifier is basically restricted to perform a Bell test on the proofs.
Then $\Lambda$ is a $(O(\log |A| \log |B|\, \epsilon^{-2}), \epsilon, 0)$-disentangler in LOCC norm. For more applications of this work to quantum Merlin-Arthur games, see [59].
Distinguishability under locality-restricted measurements. We will now give a formal definition of the norm $\|\cdot\|_{\mathrm{LOCC}}$ that was used in order to state the results. $\|\cdot\|_{\mathrm{LOCC}}$ can be seen as a restricted version of the trace norm $\|\cdot\|_1$. Introducing the set ALL as consisting of all operators $M$ that satisfy $0 \le M \le \mathbb{I}$, the trace norm can be written as
$$\|X\|_1 = \max_{M \in \mathrm{ALL}} \operatorname{tr}((2M - \mathbb{I}) X), \tag{28}$$
where $\mathbb{I}$ is the identity matrix. The trace norm is of special importance in quantum information theory as it is directly related to the optimal probability for distinguishing two equiprobable states $\rho$ and $\sigma$ with a quantum measurement.$^8$ In analogy with this interpretation of the trace norm, and for operators $X$ on the tensor product space $AB$, we define the LOCC norm as
$$\|X\|_{\mathrm{LOCC}} := \max_{M \in \mathrm{LOCC}} \operatorname{tr}((2M - \mathbb{I}) X), \tag{29}$$
where LOCC is the convex set of matrices $0 \le M \le \mathbb{I}$ such that there is a two-outcome measurement $\{M, \mathbb{I} - M\}$ that can be realized by LOCC between Alice and Bob [17]. It will become clear below why this defines a norm. The optimal bias in distinguishing $\rho$ and $\sigma$ by any LOCC protocol is then $\tfrac{1}{2} \|\rho - \sigma\|_{\mathrm{LOCC}}$. We note that in many applications of the separability problem, e.g. assessing the usefulness of a quantum state for violating Bell’s inequalities or for performing quantum teleportation, the LOCC norm is actually the more relevant quantity to consider. In the proofs, we will also need versions of the LOCC norm, where the number of rounds of communication is restricted. Similarly to the definition of LOCC above, we therefore introduce $\mathrm{LOCC}^{\to}(k)$ consisting of operators $0 \le M \le \mathbb{I}$ such that there is a two-outcome measurement $\{M, \mathbb{I} - M\}$ that can be realized by a k-round LOCC protocol with the first communication from Alice to Bob. A special role is played by the one-round implementable POVM elements $\mathrm{LOCC}^{\to} \equiv \mathrm{LOCC}^{\to}(1)$ for which an explicit characterisation exists (Lemma 4). Note that
$$\mathrm{LOCC} = \lim_{k \to \infty} \mathrm{LOCC}^{\to}(k). \tag{30}$$
Although not strictly needed in this work, one of the lemmas (Lemma 3) will also be valid for the set SEP which is defined as the set of POVM elements $M$ such that both $M$ and $\mathbb{I} - M$ are proportional to a separable state. The inclusions of the sets then imply the following relations between the norms.
$$\|X\|_1 \ge \|X\|_{\mathrm{SEP}} \ge \|X\|_{\mathrm{LOCC}} \ge \cdots \ge \|X\|_{\mathrm{LOCC}^{\to}(k)} \ge \|X\|_{\mathrm{LOCC}^{\to}(k-1)} \ge \cdots \ge \|X\|_{\mathrm{LOCC}^{\to}}. \tag{31}$$
Analogous statements hold true for $\mathrm{LOCC}^{\leftarrow}(k)$, where the reverse arrow indicates that the first round of communication is from Bob to Alice.
$^8$ Two-outcome measurements suffice for such tasks, and these are described by a pair of positive semidefinite matrices summing to $\mathbb{I}$, which we write $\{M, \mathbb{I} - M\}$. When the state is $\rho$, the probabilities of the outcomes are $\Pr(M) = \operatorname{tr}(M\rho)$ and $\Pr(\mathbb{I} - M) = \operatorname{tr}((\mathbb{I} - M)\rho) = 1 - \Pr(M)$. The optimal bias of distinguishing two states $\rho$ and $\sigma$ is then given by $\max_{0 \le M \le \mathbb{I}} \operatorname{tr}(M(\rho - \sigma)) = \tfrac{1}{2} \|\rho - \sigma\|_1$.
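The variational formula (28) and the bias formula of footnote 8 are easy to check by hand when all operators are diagonal in a common basis, since then only the diagonal entries of $M$ enter and each can be chosen independently in $[0, 1]$. The sketch below covers this commuting special case only (it is not an LOCC computation); states are represented by their lists of diagonal entries and the function names are hypothetical.

```python
def trace_norm_diag(x):
    """||X||_1 for a diagonal operator X given by its diagonal entries."""
    return sum(abs(v) for v in x)

def optimal_measurement_diag(x):
    """Diagonal of an M with 0 <= M <= I maximising tr((2M - I) X)."""
    return [1.0 if v > 0 else 0.0 for v in x]

def objective(m, x):
    """tr((2M - I) X) for diagonal M and X, as in (28)."""
    return sum((2 * mi - 1) * xi for mi, xi in zip(m, x))

def bias(m, p, q):
    """tr(M (rho - sigma)) for diagonal M, rho, sigma, as in footnote 8."""
    return sum(mi * (pi - qi) for mi, pi, qi in zip(m, p, q))

rho = [0.5, 0.5, 0.0]      # diagonal of rho
sigma = [0.25, 0.25, 0.5]  # diagonal of sigma
x = [pi - qi for pi, qi in zip(rho, sigma)]
m_opt = optimal_measurement_diag(x)
norm = trace_norm_diag(x)
```

For these two states the optimal measurement accepts exactly on the entries where $\rho$ outweighs $\sigma$, and the resulting bias equals half the trace norm of the difference.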
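The introduced norms fit into a general framework [17] for restricted norms where one considers $\mathsf{M}$, a closed, convex subset of the operator interval
$$\{M : 0 \le M \le \mathbb{I}\} = \{M : M \ge 0,\ \|M\|_\infty \le 1\} \tag{32}$$
containing $\mathbb{I}$, having nonempty interior, and such that $M \in \mathsf{M}$ implies $\mathbb{I} - M \in \mathsf{M}$. Then
$$\|X\|_{\mathsf{M}} = \max_{M \in \mathsf{M}} \operatorname{tr}((2M - \mathbb{I}) X) = \max_{Y \in D} \operatorname{tr}\, Y X, \tag{33}$$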
where $D = \operatorname{conv}(\mathsf{M} \cup -\mathsf{M}) = \{2M - \mathbb{I} : M \in \mathsf{M}\} = \{Y : \|Y\|^*_{\mathsf{M}} \le 1\}$ is the unit ball for the dual norm $\|\cdot\|^*_{\mathsf{M}}$.
Relating entanglement measures. The core of the proof of the theorem is composed of three steps, Lemmas 1, 2, and 3 below, each of which is a new result about entanglement measures. Together, they establish the weaker inequality
$$I(A;B|E) \ge \frac{1}{8 \ln 2}\, \|\rho_{AB} - \mathcal{S}_{A:B}\|^2_{\mathrm{LOCC}^{\leftarrow}}, \tag{34}$$
and by symmetry, also the same inequality for $\mathrm{LOCC}^{\to}$. In Sect. 3D, we then give an inductive proof that the same bound holds for k-round LOCC, where k is arbitrary:
$$I(A;B|E) \ge \frac{1}{8 \ln 2}\, \|\rho_{AB} - \mathcal{S}_{A:B}\|^2_{\mathrm{LOCC}^{\to}(k)}. \tag{35}$$
For $k \to \infty$, this proves the theorem. A measure that plays a fundamental role in the proof is the regularised relative entropy of entanglement [24,34], defined as
$$E_R^\infty(\rho_{A:B}) := \lim_{n \to \infty} \frac{E_R(\rho_{A:B}^{\otimes n})}{n}, \tag{36}$$
with the relative entropy of entanglement given by
$$E_R(\rho_{A:B}) := \min_{\sigma_{AB} \in \mathcal{S}_{A:B}} S(\rho_{AB} \| \sigma_{AB}), \tag{37}$$
where the quantum relative entropy is given by $S(\rho \| \sigma) = \operatorname{tr}(\rho(\log \rho - \log \sigma))$ if $\operatorname{supp} \rho \subseteq \operatorname{supp} \sigma$ and $S(\rho \| \sigma) = \infty$ otherwise. A distinctive property of the relative entropy of entanglement among entanglement measures is the fact that it is not lockable, i.e. by discarding a small part of the state, $E_R$ can only drop by an amount proportional to the number of qubits traced out. Indeed, as shown in [67], one has
$$E_R(\rho_{A:BE}) \le E_R(\rho_{A:E}) + 2 H(B)_\rho. \tag{38}$$
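For states that commute, the quantum relative entropy defined above reduces to the classical Kullback-Leibler divergence of the eigenvalue lists, including the support condition. The sketch below implements only this commuting case; it assumes both states are diagonal in the same basis and that logarithms are base 2, and the function name is hypothetical.

```python
import math

def relative_entropy_diag(p, q):
    """S(rho||sigma) in bits for commuting states with eigenvalues p and q.

    p[i] and q[i] must belong to the same common eigenvector.
    Returns math.inf when supp(rho) is not contained in supp(sigma).
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # 0 log 0 = 0 by convention
        if qi == 0:
            return math.inf  # rho has weight outside the support of sigma
        total += pi * (math.log2(pi) - math.log2(qi))
    return total

s_generic = relative_entropy_diag([0.5, 0.5], [0.25, 0.75])
s_equal = relative_entropy_diag([0.5, 0.5], [0.5, 0.5])
s_orthogonal = relative_entropy_diag([1.0, 0.0], [0.0, 1.0])
```

The infinite value in the last case is the reason the minimisation in (37) is never attained by a separable state whose support fails to contain that of $\rho_{AB}$.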
While the same is true for $E_R^\infty$, we use an optimal protocol for state redistribution [37,38] to obtain the following improvement of (38), which is valid only for the regularised quantity.
Lemma 1. Let $\rho_{ABE}$ be a quantum state. Then,
$$I(A;B|E)_\rho \ge E_R^\infty(\rho_{A:BE}) - E_R^\infty(\rho_{A:E}). \tag{39}$$
The second step in the proof is to obtain a lower bound on $E_R^\infty(\rho_{A:BE}) - E_R^\infty(\rho_{A:E})$, which by Lemma 1 also gives a lower bound on the conditional mutual information. If we could prove that $E_R^\infty$ satisfied the monogamy inequality given by (16), we would be done, since Ref. [68] shows that
$$E_R^\infty(\rho_{A:B}) \ge \frac{1}{2 \ln 2}\, \|\rho_{A:B} - \mathcal{S}_{A:B}\|^2_{\mathrm{LOCC}}. \tag{40}$$
However, the antisymmetric state is a counterexample [10]. Nonetheless, we will show that the regularised relative entropy of entanglement satisfies a weaker form of the monogamy inequality, and this will suffice for us. In order to state the new inequality, we have to recall an operational interpretation of $E_R^\infty$ in the context of quantum hypothesis testing [27] (see Sect. 3B for more details). Suppose Alice and Bob are given either n copies of an entangled state $\rho_{A:B}$, or an arbitrary state which is separable in the cut $A^n : B^n$. Suppose further they can only measure a POVM from the class $\mathsf{M}$ in order to find out which of the two hypotheses is the case. Let $p_e(n)$ be the probability that Alice and Bob mistake a separable state for the n copies of $\rho_{A:B}$, minimised over all possible measurements available to them.$^9$ Then we define the optimal rate function $D_{\mathsf{M}}(\rho_{A:B})$ as follows:
$$D_{\mathsf{M}}(\rho_{A:B}) = \lim_{n \to \infty} -\frac{\log p_e(n)}{n}. \tag{41}$$
The main result of [27] implies
$$D_{\mathrm{ALL}}(\rho_{A:B}) = E_R^\infty(\rho_{A:B}), \tag{42}$$
i.e. the regularised relative entropy of entanglement is the optimal distinguishability rate when trying to distinguish many copies of an entangled state from (arbitrary) separable states, in the case where there are no restrictions on the measurements available. The new monogamy-like inequality we prove for $E_R^\infty$, using the connection of $E_R^\infty$ with hypothesis testing given by (42), is the following:
Lemma 2. For every state $\rho_{A:BE}$,
$$E_R^\infty(\rho_{A:BE}) - E_R^\infty(\rho_{A:E}) \ge D_{\mathrm{LOCC}^{\leftarrow}}(\rho_{A:B}). \tag{43}$$
The third and final step is to obtain a lower bound for $D_{\mathrm{LOCC}^{\leftarrow}}$ in terms of the minimum $\mathrm{LOCC}^{\leftarrow}$ distance to the set of separable states. In fact, we prove a slightly more general result.
Lemma 3. For any $\mathsf{M} \in \{\mathrm{LOCC}^{\to}, \mathrm{LOCC}^{\leftarrow}, \mathrm{LOCC}, \mathrm{SEP}\}$ and every state $\rho_{A:B}$,
$$D_{\mathsf{M}}(\rho_{A:B}) \ge \frac{1}{8 \ln 2}\, \|\rho_{A:B} - \mathcal{S}_{A:B}\|^2_{\mathsf{M}}. \tag{44}$$
In [68] a similar result was proved: consider the following modification of the relative entropy of entanglement:
$$E_{R,\mathsf{M}}(\rho) := \max_{M \in \mathsf{M}} \min_{\sigma \in \mathcal{S}} S(\mathcal{M}(\rho) \| \mathcal{M}(\sigma)), \tag{45}$$
$^9$ We also require that the measurement only mistakes $\rho_{A:B}^{\otimes n}$ for a separable state with a small probability. See Sect. 3B for details.
with $\mathcal{M}(X) := (\operatorname{tr} M X)\, |0\rangle\langle 0| + (1 - \operatorname{tr} M X)\, |1\rangle\langle 1|$ for $M \in \mathsf{M}$. Then for any class $\mathsf{M} \in \{\mathrm{LOCC}^{\to}, \mathrm{LOCC}^{\leftarrow}, \mathrm{LOCC}, \mathrm{SEP}\}$, it was shown that [68]
$$E_{R,\mathsf{M}}(\rho_{A:B}) \ge \frac{1}{2 \ln 2}\, \|\rho_{A:B} - \mathcal{S}_{A:B}\|^2_{\mathsf{M}}. \tag{46}$$
It is conceivable that
$$E^\infty_{R,\mathsf{M}}(\rho_{A:B}) = D_{\mathsf{M}}(\rho_{A:B}), \tag{47}$$
and if this was the case we could obtain Lemma 3 from (46), since $E_{R,\mathsf{M}}$ is superadditive. Indeed, for $\mathsf{M} = \mathrm{ALL}$, (47) holds true and this is precisely the content of (42). For restricted classes of POVMs, however, it is not known whether (47) remains valid and, alas, the techniques of [27] do not appear to be directly applicable. We note that one direction is easily seen to hold, namely$^{10}$
$$E^\infty_{R,\mathsf{M}}(\rho_{A:B}) \ge D_{\mathsf{M}}(\rho_{A:B}). \tag{48}$$
So we can obtain an asymptotic version of (46) from Lemma 3.
A new construction of entanglement witnesses: In the proof of Lemma 3 we derive a relation that might be of independent interest. For any restricted set of POVM elements $\mathsf{M}$,
$$2 \max_{M \in \mathsf{M}} \left[ \operatorname{tr}(M\rho) - \max_{\sigma \in \mathcal{S}} \operatorname{tr}(M\sigma) \right] = \|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathsf{M}}. \tag{49}$$
The POVM element maximising the left hand side is a so-called entanglement witness [15,71]: its expectation value on the entangled state $\rho$ is bigger than its expectation value on any separable state. Therefore it can be used as a witness to attest the entanglement of $\rho_{AB}$. Equation (49) gives a new construction of entanglement witnesses that is attractive in the following respects:
• The witness is a POVM element and thus can be measured with a single measurement setting.
• Every $\mathsf{M}$ that (after rescaling) contains an informationally complete POVM can be used to construct witnesses detecting every entangled state.
• The maximum gap in expectation value for an entangled state versus separable states, optimised over all witnesses from a class $\mathsf{M}$, is equal to the minimum distance of the entangled state to the set of separable states in the $\mathsf{M}$ norm and thus can also be used to quantify the entanglement of the state (see [72,73] for related results in this direction).
$^{10}$ The proof is very similar to the usual argument, which shows $S(\rho \| \sigma)$ to be an upper bound on the optimal distinguishability rate in quantum hypothesis testing (see e.g. [69,70]), and thus we omit it.
3. Proof of Theorem
A. Proof of Lemma 1. The main idea in the proof of Lemma 1 is to apply the non-lockability of $E_R$ given by (38) to a tensor power state $\rho_{A:BE}^{\otimes n}$, but to use an optimal state redistribution protocol to trace out $B$ in a more efficient way. A general protocol for state redistribution is outlined in Fig. 1. The following proposition follows immediately from the existence of a state redistribution protocol achieving the minimum possible rates of communication and entanglement cost:
Fig. 1. General protocol for redistributing the $B$ parts of a pure state $|\psi\rangle^{\otimes n}_{ABEE'}$ where the sender holds $E$ and the receiver holds $E'$. The protocol consumes maximal entanglement on $DD'$, transmits the quantum system $G$, and produces maximal entanglement on $FF'$. The dimensions of $D$, $F$ and $G$ typically depend exponentially on n. The protocol is asymptotically reversible
Proposition 1. ([37,38]) There is a sequence of asymptotically reversible encoding maps $\Lambda_n : B^n E^n D \to E^n F G$ such that if $\phi_{A^n : E^n F G} = \Lambda_n(\rho_{A:BE}^{\otimes n} \otimes \tau_D)$ (see Fig. 1), then
$$\frac{\log |G|}{n} \to \tfrac{1}{2} I(A;B|E)_\rho, \tag{50}$$
and
$$\|\phi_{A^n E^n F} - \rho_{AE}^{\otimes n} \otimes \tau_F\|_1 \to 0, \tag{51}$$
where $\tau_D$ and $\tau_F$ are the maximally mixed states on systems $D$ and $F$, respectively.
Proof of Lemma 1.
$$E_R^\infty(\rho_{A:BE}) = \lim_{n \to \infty} \frac{1}{n} E_R(\phi_{A^n : E^n F G}) \tag{52}$$
$$\le \lim_{n \to \infty} \frac{1}{n} E_R(\phi_{A^n : E^n F}) + I(A;B|E) \tag{53}$$
$$= \lim_{n \to \infty} \frac{1}{n} E_R(\rho_{A:E}^{\otimes n} \otimes \tau_F) + I(A;B|E) \tag{54}$$
$$= E_R^\infty(\rho_{A:E}) + I(A;B|E). \tag{55}$$
The first line holds by asymptotic continuity of E R [30,74] and the asymptotic reversibility of the encoding operation. The second follows from (38) and (50). The third line holds again by continuity of E R , applied to the asymptotic equality (51) of the respective arguments of E R . The same argument applied to the mutual information instead of the relative entropy of entanglement can be used to prove the converse of state redistribution (see [75,76] for a similar argument in the context of one-shot quantum state splitting).
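B. Proof of Lemma 2. Before proceeding to the proof, we give a precise definition of the rate functions $D_{\mathsf{M}}$.
Definition. Let $\rho_{AB}$ be a state and $\mathsf{M} \in \{\mathrm{LOCC}^{\to}, \mathrm{LOCC}^{\leftarrow}, \mathrm{LOCC}, \mathrm{SEP}, \mathrm{ALL}\}$ a class of POVMs. Then we define $D_{\mathsf{M}}(\rho_{A:B})$ as the unique non-negative number such that
• (Direct part): For every $\epsilon > 0$ there exists a sequence of POVM elements $\{M_{A^n B^n}\}_{n \in \mathbb{N}}$, with $M_{A^n B^n} \in \mathsf{M}$, such that
$$\lim_{n \to \infty} \operatorname{tr}(M_{A^n B^n}\, \rho_{AB}^{\otimes n}) = 1 \tag{56}$$
and for every $n \in \mathbb{N}$ and $\omega_{A^n B^n} \in \mathcal{S}_{A^n : B^n}$,
$$-\frac{\log \operatorname{tr}(M_{A^n B^n}\, \omega_{A^n B^n})}{n} \ge D_{\mathsf{M}}(\rho_{A:B}) - \epsilon. \tag{57}$$
• (Converse): If a real number $\epsilon > 0$ and a sequence of POVM elements $\{M_{A^n B^n}\}_{n \in \mathbb{N}}$ are such that for every $n \in \mathbb{N}$ and every $\omega_{A^n B^n} \in \mathcal{S}_{A^n : B^n}$,
$$-\frac{\log \operatorname{tr}(M_{A^n B^n}\, \omega_{A^n B^n})}{n} \ge D_{\mathsf{M}}(\rho_{A:B}) + \epsilon, \tag{58}$$
then
$$\lim_{n \to \infty} \operatorname{tr}(M_{A^n B^n}\, \rho_{AB}^{\otimes n}) < 1. \tag{59}$$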
Recall (42):

Proposition 2 ([27]). For every state $\rho_{A:B}$,
\[ D_{\mathrm{ALL}}(\rho_{A:B}) = E_R^\infty(\rho_{A:B}). \tag{60} \]
The main application of Proposition 2 so far has been to prove that the asymptotic manipulation of entanglement under operations that, approximately, do not generate entanglement is reversible: both the distillation and a cost function in this paradigm are equal to the regularised relative entropy of entanglement [77–79]. In fact, one can show that Proposition 2 is equivalent to such reversibility under non-entangling operations, in the sense that the latter can be used to prove the former [80]. The proof of Lemma 2 will utilise the special structure of LOCC measurements with one-way communication:

Lemma 4. Any operator in $\mathrm{LOCC}^{\leftarrow}$ necessarily has the form
\[ \sum_k M_{A,k} \otimes N_{B,k}, \tag{61} \]
where $0 \le M_{A,k} \le I_A$, $0 \le N_{B,k}$ and $\sum_k N_{B,k} \le I_B$.

Proof. Any LOCC measurement consisting of communication only from Bob to Alice has the general form $\{M_{A,a|b} \otimes N_{B,b}\}$, where $0 \le M_{A,a|b}$, $0 \le N_{B,b}$, $\sum_a M_{A,a|b} \le I_A$ and $\sum_b N_{B,b} \le I_B$. The convex hull of such POVM elements consists of all operators of the form given in the lemma.
Proof of Lemma 2. The main idea in this proof is to start with measurements for $\rho_{A:E}$ and $\rho_{A:B}$ that respectively achieve $D_{\mathrm{ALL}}(\rho_{A:E})$ and $D_{\mathrm{LOCC}^{\leftarrow}}(\rho_{A:B})$, and to construct a measurement for $\rho_{A:BE}$ distinguishing it from separable states at a rate given by the sum of the individual rates. The lemma then follows from Proposition 2. Here, it is crucial that we only use one-way LOCC measurements to discriminate $\rho_{A:B}$ from separable states, since only in this case are we able to make sure that the composed measurement for $\rho_{A:BE}$ acts as it should.

Let $\delta > 0$ and fix two $\delta$-optimal sequences of POVM elements: an unrestricted sequence $Q_{A^nE^n}$ of POVM elements for distinguishing $\rho_{A:E}$ from separable states, and a sequence $S_{A^nB^n}$ of one-way LOCC (with classical communication from Bob to Alice) elements for distinguishing $\rho_{A:B}$ from separable states. This means that
\[ \operatorname{tr}\!\left(Q_{A^nE^n} \rho^{\otimes n}_{A:E}\right) \to 1 \quad\text{and}\quad \operatorname{tr}\!\left(S_{A^nB^n} \rho^{\otimes n}_{A:B}\right) \to 1, \tag{62} \]
while
\[ -\frac{\log \operatorname{tr}\!\left(Q_{A^nE^n} \omega_{A^n:E^n}\right)}{n} \ge D_{\mathrm{ALL}}(\rho_{A:E}) - \delta \tag{63} \]
for every state $\omega_{A^n:E^n}$ separable across $A^n : E^n$ and
\[ -\frac{\log \operatorname{tr}\!\left(S_{A^nB^n} \omega_{A^n:B^n}\right)}{n} \ge D_{\mathrm{LOCC}^{\leftarrow}}(\rho_{A:B}) - \delta \tag{64} \]
for every state $\omega_{A^n:B^n}$ separable across $A^n : B^n$. We will use these to construct a sequence of POVM elements $T_{A^nB^nE^n}$ that distinguish $\rho^{\otimes n}_{A:BE}$ from separable states $\omega_{A^n:B^nE^n}$, satisfying
\[ \lim_{n\to\infty} \operatorname{tr}\!\left(T_{A^nB^nE^n} \rho^{\otimes n}_{A:BE}\right) = 1 \tag{65} \]
and
\[ \lim_{n\to\infty} -\frac{\log \operatorname{tr}\!\left(T_{A^nB^nE^n} \omega_{A^n:B^nE^n}\right)}{n} \ge D_{\mathrm{ALL}}(\rho_{A:E}) + D_{\mathrm{LOCC}^{\leftarrow}}(\rho_{A:B}). \tag{66} \]
Because the left-hand side is a lower bound to the maximal achievable error rate $D_{\mathrm{ALL}}(\rho_{A:BE})$, Proposition 2 then gives the desired inequality
\[ E_R^\infty(\rho_{A:BE}) \ge E_R^\infty(\rho_{A:E}) + D_{\mathrm{LOCC}^{\leftarrow}}(\rho_{A:B}). \]
To construct $T_{A^nB^nE^n}$, first observe that by Lemma 4, we can write the $\mathrm{LOCC}^{\leftarrow}$ operator $S_{A^nB^n}$ in the form
\[ S_{A^nB^n} = \sum_k M_{A^n,k} \otimes N_{B^n,k}, \tag{67} \]
with $0 \le M_{A^n,k} \le I$, $0 \le N_{B^n,k}$ and $\sum_k N_{B^n,k} \le I$. It will be useful to consider a Naimark extension of the measurement on B, consisting of orthogonal projections $P_{B^n\hat{B},k}$ such that $\langle 0|_{\hat{B}} P_{B^n\hat{B},k} |0\rangle_{\hat{B}} = N_{B^n,k}$. Define
\[ R_{A^nB^n\hat{B}} := \sum_k M_{A^n,k} \otimes P_{B^n\hat{B},k}, \tag{68} \]
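which by construction is such that $0 \le R_{A^nB^n\hat{B}} \le I$. Define
\begin{align}
T_{A^nB^nE^n} &:= \langle 0|_{\hat{B}} \sqrt{R_{A^nB^n\hat{B}}}\; Q_{A^nE^n} \sqrt{R_{A^nB^n\hat{B}}}\; |0\rangle_{\hat{B}} \tag{69} \\
&= \sum_k \sqrt{M_{A^n,k}}\; Q_{A^nE^n} \sqrt{M_{A^n,k}} \otimes N_{B^n,k}. \tag{70}
\end{align}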
Note that we omit identity operators from our notation here and in the remainder of this proof. There should be no room for confusion, as the global Hilbert space will always be clear, and we use subscripts to indicate on which “local systems” each operator acts. For instance, $Q_{A^nE^n} \equiv Q_{A^nE^n} \otimes I_{B^n\hat{B}}$ in Eq. (69), while $M_{A^n,k} \equiv M_{A^n,k} \otimes I_{E^n}$ in Eq. (70). Observe that $0 \le T_{A^nB^nE^n} \le I$ because the same holds for $Q_{A^nE^n}$ and $R_{A^nB^n\hat{B}}$. We will now verify the conditions (65) and (66) required to complete the proof. For proving (65), we first write
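\[ \operatorname{tr}\!\left(T_{A^nB^nE^n} \rho^{\otimes n}_{A:BE}\right) = \operatorname{tr}\!\left(Q_{A^nE^n} \sqrt{R_{A^nB^n\hat{B}}} \left(\rho^{\otimes n}_{A:BE} \otimes |0\rangle\langle 0|_{\hat{B}}\right) \sqrt{R_{A^nB^n\hat{B}}}\right). \tag{71} \]
From (62),
\[ \lim_{n\to\infty} \operatorname{tr}\!\left(R_{A^nB^n\hat{B}} \left(\rho^{\otimes n}_{A:BE} \otimes |0\rangle\langle 0|_{\hat{B}}\right)\right) = \lim_{n\to\infty} \operatorname{tr}\!\left(S_{A^nB^n} \rho^{\otimes n}_{A:B}\right) = 1. \tag{72} \]
Hence by the gentle measurement lemma [81,82],
\[ \lim_{n\to\infty} \left\| \rho^{\otimes n}_{A:BE} \otimes |0\rangle\langle 0|_{\hat{B}} - \sqrt{R_{A^nB^n\hat{B}}} \left(\rho^{\otimes n}_{A:BE} \otimes |0\rangle\langle 0|_{\hat{B}}\right) \sqrt{R_{A^nB^n\hat{B}}} \right\|_1 = 0, \tag{73} \]
and so (71) and (62) give
\begin{align}
\lim_{n\to\infty} \operatorname{tr}\!\left(T_{A^nB^nE^n} \rho^{\otimes n}_{A:BE}\right) &\ge \lim_{n\to\infty} \operatorname{tr}\!\left(Q_{A^nE^n} \left(\rho^{\otimes n}_{A:BE} \otimes |0\rangle\langle 0|_{\hat{B}}\right)\right) \notag \\
&\quad - \lim_{n\to\infty} \left\| \rho^{\otimes n}_{A:BE} \otimes |0\rangle\langle 0|_{\hat{B}} - \sqrt{R_{A^nB^n\hat{B}}} \left(\rho^{\otimes n}_{A:BE} \otimes |0\rangle\langle 0|_{\hat{B}}\right) \sqrt{R_{A^nB^n\hat{B}}} \right\|_1 \notag \\
&= 1. \tag{74}
\end{align}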
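Let us now prove (66). We write
\[ \operatorname{tr}\!\left(T_{A^nB^nE^n} \omega_{A^n:B^nE^n}\right) = \operatorname{tr}\!\left(Q_{A^nE^n} \tilde{\omega}_{A^n:E^n}\right) \operatorname{tr}\!\left(R_{A^nB^n\hat{B}} \left(\omega_{A^n:B^nE^n} \otimes |0\rangle\langle 0|_{\hat{B}}\right)\right), \tag{75} \]
with
\[ \tilde{\omega}_{A^n:E^n} := \frac{\operatorname{tr}_{B^n\hat{B}}\!\left(\sqrt{R_{A^nB^n\hat{B}}} \left(\omega_{A^n:B^nE^n} \otimes |0\rangle\langle 0|_{\hat{B}}\right) \sqrt{R_{A^nB^n\hat{B}}}\right)}{\operatorname{tr}\!\left(R_{A^nB^n\hat{B}} \left(\omega_{A^n:B^nE^n} \otimes |0\rangle\langle 0|_{\hat{B}}\right)\right)}. \tag{76} \]
We claim that $\tilde{\omega}_{A^n:E^n}$ is separable in the cut $A^n : E^n$ whenever $\omega_{A^n:B^nE^n}$ is separable in the cut $A^n : B^nE^n$. Then (75) will imply (66) because
\begin{align}
\lim_{n\to\infty} -\frac{\log \operatorname{tr}\!\left(T_{A^nB^nE^n} \omega_{A^n:B^nE^n}\right)}{n} &= \lim_{n\to\infty} -\frac{\log \operatorname{tr}\!\left(Q_{A^nE^n} \tilde{\omega}_{A^n:E^n}\right)}{n} + \lim_{n\to\infty} -\frac{\log \operatorname{tr}\!\left(S_{A^nB^n} \omega_{A^n:B^n}\right)}{n} \notag \\
&\ge E_R^\infty(\rho_{A:E}) + D_{\mathrm{LOCC}^{\leftarrow}}(\rho_{A:B}) - 2\delta, \tag{77}
\end{align}
where we used Proposition 2 and (63) and (64).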
To show that $\tilde{\omega}_{A^n:E^n}$ is separable, we now prove that the completely-positive trace-non-increasing map $\Lambda : A^nB^nE^n \to A^nE^n$ defined as
\[ \Lambda(X_{A^nB^nE^n}) := \operatorname{tr}_{B^n\hat{B}}\!\left(\sqrt{R_{A^nB^n\hat{B}}} \left(X_{A^nB^nE^n} \otimes |0\rangle\langle 0|_{\hat{B}}\right) \sqrt{R_{A^nB^n\hat{B}}}\right) \tag{78} \]
is a separable superoperator. From the form of $R_{A^nB^n\hat{B}}$ given in (68) we can write
\[ \Lambda(X_{A^nB^nE^n}) = \sum_{k,k'} \operatorname{tr}_{B^n\hat{B}}\!\left[\left(\sqrt{M_{A^n,k}} \otimes P_{B^n\hat{B},k}\right) \left(X_{A^nB^nE^n} \otimes |0\rangle\langle 0|_{\hat{B}}\right) \left(\sqrt{M_{A^n,k'}} \otimes P_{B^n\hat{B},k'}\right)\right]. \]
Then, taking the partial trace over $B^n\hat{B}$ we find that the cross terms in the sum above are zero since the projectors $P_{B^n\hat{B},k}$ are mutually orthogonal, and therefore
\[ \Lambda(X_{A^nB^nE^n}) = \sum_{k} \operatorname{tr}_{B^n\hat{B}}\!\left[\left(\sqrt{M_{A^n,k}} \otimes P_{B^n\hat{B},k}\right) \left(X_{A^nB^nE^n} \otimes |0\rangle\langle 0|_{\hat{B}}\right) \left(\sqrt{M_{A^n,k}} \otimes P_{B^n\hat{B},k}\right)\right], \]
which is manifestly a separable superoperator. This implies that $\tilde{\omega}_{A^n:E^n}$, which equals $\Lambda(\omega_{A^n:B^nE^n})$ up to normalisation, is separable, since $\omega_{A^n:B^nE^n}$ is separable.

C. Proof of Lemma 3. The final step is now to prove Lemma 3. For that we construct a particular protocol for distinguishing many copies of an entangled state from arbitrary separable states, with an exponentially small probability of mistaking a separable state for many copies of a particular entangled state. In Corollary II.2 of [27], a similar result was shown using the exponential de Finetti theorem [83]. Here we give a simpler proof with a stronger bound on the distinguishability rate. A useful result we will employ is the following simple extension of [84], which in turn is a consequence of the minimax theorem, which we quote below the proof.

Lemma 5. Let $\mathcal{M}$ be a closed convex set of POVM elements and $\mathcal{C}$ a closed convex set of states. Then given $\rho \notin \mathcal{C}$,
\[ 2 \max_{M\in\mathcal{M}} \left(\operatorname{tr}(M\rho) - \max_{\sigma\in\mathcal{C}} \operatorname{tr}(M\sigma)\right) = \|\rho - \mathcal{C}\|_{\mathcal{M}}. \tag{79} \]
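Proof. Following Jain [84], let the function $u : (\mathcal{M}, \mathcal{C}) \to \mathbb{R}$ be defined as $u(M, \sigma) = \operatorname{tr}(M\rho) - \operatorname{tr}(M\sigma)$. The sets $\mathcal{M}$ and $\mathcal{C}$ are compact convex subsets of the (real) vector space of Hermitian matrices. Since the function u is furthermore affine in both arguments, the conditions of the minimax theorem are satisfied (see Proposition 3 just below) and we find
\begin{align}
2 \max_{M\in\mathcal{M}} \left(\operatorname{tr}(M\rho) - \max_{\sigma\in\mathcal{C}} \operatorname{tr}(M\sigma)\right) &= 2 \max_{M\in\mathcal{M}} \min_{\sigma\in\mathcal{C}} \left(\operatorname{tr}(M\rho) - \operatorname{tr}(M\sigma)\right) \tag{80} \\
&= 2 \min_{\sigma\in\mathcal{C}} \max_{M\in\mathcal{M}} \left(\operatorname{tr}(M\rho) - \operatorname{tr}(M\sigma)\right) \tag{81} \\
&= \min_{\sigma\in\mathcal{C}} \|\rho - \sigma\|_{\mathcal{M}}. \tag{82}
\end{align}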
Proposition 3 (Minimax theorem). Let $C_1$, $C_2$ be non-empty, convex and compact subsets of $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Let $u : C_1 \times C_2 \to \mathbb{R}$ be convex in the first argument and concave in the second and continuous in both. Then,
\[ \min_{x_1\in C_1} \max_{x_2\in C_2} u(x_1, x_2) = \max_{x_2\in C_2} \min_{x_1\in C_1} u(x_1, x_2). \tag{83} \]
In the proof of Lemma 3 we will also make use of Azuma’s inequality, a large deviation bound for correlated random variables. We say a sequence of random variables $X_0, X_1, X_2, \ldots$ is a supermartingale if for all $k \in \mathbb{N}$,
\[ \mathbb{E}(X_{k+1} \,|\, X_0, \ldots, X_k) \le X_k. \tag{84} \]
Then we have

Proposition 4 (Azuma’s inequality). Let $\{X_k\}_k$ be a supermartingale with $|X_{k+1} - X_k| \le c_k$. Then for all positive integers n and all positive reals t,
\[ \Pr(X_n - X_0 \ge t) \le \exp\!\left(-\frac{t^2}{2\sum_{k=1}^{n} c_k^2}\right). \tag{85} \]

Proof of Lemma 3. By Lemma 5, there is $M_{AB} \in \mathcal{M}$ such that
\[ b = a + \frac{1}{2} \|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathcal{M}}, \quad\text{where } a := \max_{\sigma_{AB}\in\mathcal{S}_{A:B}} \operatorname{tr}(M_{AB}\sigma_{AB}), \quad b := \operatorname{tr}(M_{AB}\rho_{AB}), \tag{86} \]
and without loss of generality $\|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathcal{M}} > 0$, or $b > a$. The element $M_{A^nB^n} \in \mathcal{M}$ is constructed as follows. Fix $\delta > 0$. One measures the POVM $\{M_{AB}, I - M_{AB}\}$ in each of the n copies, obtaining n binary random variables $\{Z_i\}_{i=1}^{n}$, where $Z_i = 1$ is associated with effect $M_{AB}$ and $Z_i = 0$ with effect $I - M_{AB}$. Then if
\[ \frac{1}{n} \sum_{i=1}^{n} Z_i \ge a + (1-\delta)(b-a), \tag{87} \]
we accept (this corresponds to $M_{A^nB^n}$). Otherwise we reject. First, by the law of large numbers it is clear that
\[ \lim_{n\to\infty} \operatorname{tr}(M_{A^nB^n} \rho^{\otimes n}_{AB}) = 1. \tag{88} \]
We now show that
\[ \lim_{n\to\infty} -\frac{\log \operatorname{tr}(M_{A^nB^n} \omega_{A^nB^n})}{n} \ge \frac{(1-\delta)^2}{8 \ln 2} \|\rho_{AB} - \mathcal{S}_{A:B}\|^2_{\mathcal{M}}. \tag{89} \]
The key observation is that, in case we have a separable state $\omega_{A^nB^n}$, for all $i \in \{1, \ldots, n\}$,
\[ \Pr(Z_i = 1 \,|\, Z_1, \ldots, Z_{i-1}) \le a. \tag{90} \]
This is so because by assumption $M_{AB} \in \mathrm{SEP}$ implies that both POVM elements in the measurement $\{M_{AB}, I - M_{AB}\}$ are separable operators. Therefore, the state to
which the $k$th measurement is applied is always separable, even conditioning on previous measurement outcomes. Then by Proposition 4,
\[ \Pr\!\left(\frac{1}{n} \sum_{i=1}^{n} Z_i \ge a + (1-\delta)(b-a)\right) \le 2^{-\frac{(1-\delta)^2}{2 \ln 2} n (b-a)^2}. \tag{91} \]
In more detail, define the random variables
\[ X_k := \sum_{i=1}^{k} (Z_i - a). \tag{92} \]
Note that
\begin{align}
\mathbb{E}(X_k \,|\, X_1, \ldots, X_{k-1}) &= \mathbb{E}(Z_k \,|\, Z_1, \ldots, Z_{k-1}) - a + X_{k-1} \tag{93} \\
&= \Pr(Z_k = 1 \,|\, Z_1, \ldots, Z_{k-1}) - a + X_{k-1} \tag{94} \\
&\le X_{k-1}, \tag{95}
\end{align}
i.e. $\{X_k\}_k$ form a supermartingale. Moreover, for all k, $|X_k - X_{k-1}| = |Z_k - a| \le 1$. Hence by Proposition 4, setting $X_0 = 0$, we have
\begin{align}
\Pr\!\left(\frac{1}{n} \sum_{i=1}^{n} Z_i \ge a + (1-\delta)(b-a)\right) &= \Pr\left(X_n \ge n(1-\delta)(b-a)\right) \tag{96} \\
&\le \exp\!\left(-\frac{(1-\delta)^2}{2} n (b-a)^2\right) \tag{97} \\
&= 2^{-\frac{(1-\delta)^2}{2 \ln 2} n (b-a)^2}. \tag{98}
\end{align}
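From (86),
\[ -\frac{1}{n} \log \Pr\!\left(\frac{1}{n} \sum_{i=1}^{n} Z_i \ge a + (1-\delta)(b-a)\right) \ge \frac{(1-\delta)^2}{8 \ln 2} \|\rho_{AB} - \mathcal{S}_{A:B}\|^2_{\mathcal{M}}, \tag{99} \]
and thus
\[ D_{\mathcal{M}}(\rho_{A:B}) \ge \frac{(1-\delta)^2}{8 \ln 2} \|\rho_{AB} - \mathcal{S}_{A:B}\|^2_{\mathcal{M}}. \tag{100} \]
The result then follows from the fact that $\delta > 0$ is arbitrary.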
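As a numerical sanity check of the Azuma step (96)–(97), the following minimal sketch compares the bound with an exact tail probability in the simplest case that satisfies (90): independent outcomes with Pr(Z_i = 1) = a. The independence assumption, the function names and the values of a, b and δ are illustrative choices, not taken from the paper; the actual proof allows correlated outcomes.

```python
import math

def binomial_upper_tail(n, p, threshold):
    """Exact Pr(Z_1 + ... + Z_n >= threshold) for i.i.d. Bernoulli(p) variables."""
    k_min = max(0, math.ceil(threshold))
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

def azuma_bound(n, a, b, delta):
    """Right-hand side of (97): exp(-(1 - delta)^2 * n * (b - a)^2 / 2)."""
    return math.exp(-((1 - delta) ** 2) * n * (b - a) ** 2 / 2)

# Hypothetical parameters: a is the best acceptance probability on separable
# states, b the acceptance probability on the entangled state.
a, b, delta = 0.3, 0.6, 0.1
for n in (50, 100, 200):
    threshold = n * (a + (1 - delta) * (b - a))   # acceptance rule (87)
    tail = binomial_upper_tail(n, a, threshold)   # false-acceptance probability
    print(n, tail, azuma_bound(n, a, b, delta))
```

In this independent case the exact tail lies well below the Azuma bound, which is expected: the bound is designed to hold for every supermartingale with unit increments, not only for independent trials.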
D. Proof of Theorem.

Proof. We show by induction that for every $\rho_{ABE}$ and every $k \in \mathbb{N}$,
\[ I(A;B|E) \ge \frac{1}{8 \ln 2} \|\rho_{AB} - \mathcal{S}_{A:B}\|^2_{\mathrm{LOCC}^{\rightarrow}(k)}, \tag{101} \]
which is equivalent to the statement of the theorem. As we have already established the base case $k = 1$ for all $\rho_{ABE}$ as a consequence of Lemmas 1, 2, and 3,
\[ I(A;B|E) \ge \frac{1}{8 \ln 2} \|\rho_{AB} - \mathcal{S}_{A:B}\|^2_{\mathrm{LOCC}^{\rightarrow}}, \tag{102} \]
we only need to assume that (101) holds for $k - 1$ and prove it for k. By Lemma 5, we have for all $\rho_{AB} \notin \mathcal{S}_{A:B}$,
\[ \|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathrm{LOCC}^{\rightarrow}(k)} = \operatorname{tr}(M_{AB}\rho_{AB}) - \max_{\sigma_{AB}\in\mathcal{S}_{A:B}} \operatorname{tr}(M_{AB}\sigma_{AB}) \tag{103} \]
for some $M_{AB} \in \mathrm{LOCC}^{\rightarrow}(k)$. Without loss of generality, we may assume that $M_{AB}$ is a sum of POVM elements from a measurement of the following form: First Alice performs a projective measurement with mutually orthogonal projectors $P_{A,j}$ on her share of the state (we assume that the ancilla register needed to realise a general POVM is contained in the state $\rho_{A:B}$) and communicates the outcome to Bob, obtaining the state
\[ \Lambda(\rho_{AB}) := \sum_j (P_{A,j} \otimes I_B)\, \rho_{AB}\, (P_{A,j} \otimes I_B) \otimes |j\rangle\langle j|_{\hat{B}} \tag{104} \]
on $AB\hat{B}$. Then they measure a $(k-1)$-round LOCC POVM on $AB\hat{B}$ with the first round of communication from Bob to Alice, so that $M_{AB} = \Lambda^{\dagger}(\tilde{M}_{AB\hat{B}})$ for some $\tilde{M}_{AB\hat{B}} \in \mathrm{LOCC}^{\leftarrow}(k-1)$, where $\Lambda^{\dagger}$ is the dual map to $\Lambda$.

Let $\omega_{AB\hat{B}E\hat{E}}$ be the extension of $(I_E \otimes \Lambda)(\rho_{ABE})$ obtained by letting $\hat{E}$ be a copy of the classical register $\hat{B}$. It satisfies
\[ I(A;B|E)_\rho \ge I(A\hat{B};B|E)_\omega \ge I(A;B\hat{B}|E\hat{E})_\omega. \tag{105} \]
Indeed, the first inequality is by the data processing inequality, or equivalently, by strong subadditivity [2], while the second follows from the chain
\begin{align}
I(A\hat{B};B|E)_\omega &= I(\hat{B};B|E)_\omega + I(A;B|\hat{B}E)_\omega \notag \\
&\ge I(A;B|\hat{B}E)_\omega = I(A;B\hat{B}|E\hat{E})_\omega. \tag{106}
\end{align}
Here, we have used the chain rule for conditional mutual information, strong subadditivity, and the fact that $\hat{E}$ is a copy of $\hat{B}$. Assuming that (101) holds with $k - 1$ and applying it to this extension (with swapped roles of A and $B\hat{B}$ in order to account for the reverse arrow) gives
\[ I(A;B\hat{B}|E\hat{E})_\omega = I(B\hat{B};A|E\hat{E})_\omega \ge \frac{1}{8 \ln 2} \|\Lambda(\rho_{AB}) - \mathcal{S}_{A:B\hat{B}}\|^2_{\mathrm{LOCC}^{\leftarrow}(k-1)}. \tag{107} \]
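Because $M_{AB} = \Lambda^{\dagger}(\tilde{M}_{AB\hat{B}})$, we have the following, which, together with the inequalities (105) and (107), shows the desired inequality (101):
\begin{align}
\|\Lambda(\rho_{AB}) - \mathcal{S}_{A:B\hat{B}}\|_{\mathrm{LOCC}^{\leftarrow}(k-1)} &= \max_{M_{AB\hat{B}}\in\mathrm{LOCC}^{\leftarrow}(k-1)} \left[\operatorname{tr}\!\left(M_{AB\hat{B}} \Lambda(\rho_{AB})\right) - \max_{\sigma_{AB\hat{B}}\in\mathcal{S}_{A:B\hat{B}}} \operatorname{tr}\!\left(M_{AB\hat{B}} \sigma_{AB\hat{B}}\right)\right] \tag{108} \\
&\ge \operatorname{tr}\!\left(\tilde{M}_{AB\hat{B}} \Lambda(\rho_{AB})\right) - \max_{\sigma_{AB\hat{B}}\in\mathcal{S}_{A:B\hat{B}}} \operatorname{tr}\!\left(\tilde{M}_{AB\hat{B}} \sigma_{AB\hat{B}}\right) \tag{109} \\
&\ge \operatorname{tr}\!\left(\tilde{M}_{AB\hat{B}} \Lambda(\rho_{AB})\right) - \max_{\sigma_{AB}\in\mathcal{S}_{A:B}} \operatorname{tr}\!\left(\tilde{M}_{AB\hat{B}} \Lambda(\sigma_{AB})\right) \tag{110} \\
&= \operatorname{tr}(M_{AB}\rho_{AB}) - \max_{\sigma_{AB}\in\mathcal{S}_{A:B}} \operatorname{tr}(M_{AB}\sigma_{AB}) \tag{111} \\
&= \|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathrm{LOCC}^{\rightarrow}(k)}. \tag{112}
\end{align}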
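The only nonobvious step here is (110), but before we show it, we remark that since the reverse inequality is obvious, we will actually have established the equality
\[ \|\Lambda(\rho_{AB}) - \mathcal{S}_{A:B\hat{B}}\|_{\mathrm{LOCC}^{\leftarrow}(k-1)} = \|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathrm{LOCC}^{\rightarrow}(k)}, \tag{113} \]
which, together with (105) and (107), proves the induction step. To show the inequality (110), first observe that
\[ M_{AB} = \sum_j \left(P_{A,j} \otimes I_B \otimes \langle j|_{\hat{B}}\right) \tilde{M}_{AB\hat{B}} \left(P_{A,j} \otimes I_B \otimes |j\rangle_{\hat{B}}\right). \tag{114} \]
We may thus assume without loss of generality that
\[ \tilde{M}_{AB\hat{B}} = \sum_j \tilde{M}_{AB,j} \otimes |j\rangle\langle j|_{\hat{B}} \tag{115} \]
for operators $\tilde{M}_{AB,j} \in \mathrm{LOCC}^{\leftarrow}(k-1)$ satisfying $(P_{A,j} \otimes I_B)\, \tilde{M}_{AB,j}\, (P_{A,j} \otimes I_B) = \tilde{M}_{AB,j}$. Now let $\sigma_{AB\hat{B}} \in \mathcal{S}_{A:B\hat{B}}$ achieve the maximum in (109). Because of the assumed form of the $\tilde{M}_{AB,j}$, we may assume that
\[ \sigma_{AB\hat{B}} = \sum_j p_j\, \sigma_{AB,j} \otimes |j\rangle\langle j|_{\hat{B}} \tag{116} \]
for $\sigma_{AB,j} \in \mathcal{S}_{A:B}$ and probabilities $p_j$. Then the average state $\sigma_{AB} = \sum_j p_j \sigma_{AB,j}$ satisfies
\[ \operatorname{tr}\!\left(\tilde{M}_{AB\hat{B}} \sigma_{AB\hat{B}}\right) = \operatorname{tr}\!\left(\tilde{M}_{AB\hat{B}} \Lambda(\sigma_{AB})\right) \le \max_{\sigma_{AB}\in\mathcal{S}_{A:B}} \operatorname{tr}\!\left(\tilde{M}_{AB\hat{B}} \Lambda(\sigma_{AB})\right). \tag{117} \]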
4. Proofs of Corollaries

We start with the proof of Corollary 2, which is straightforward.

Proof of Corollary 2. Squashed entanglement satisfies the monogamy inequality [11]
\[ E_{sq}(\rho_{A:BB'}) \ge E_{sq}(\rho_{A:B}) + E_{sq}(\rho_{A:B'}). \tag{118} \]
Repeatedly applying it to $\rho_{A:B_1\cdots B_k}$ and noting that $E_{sq}(\rho_{A:B_1\cdots B_k}) \le \log|A|$ [6], we get
\[ \log|A| \ge E_{sq}(\rho_{A:B_1\cdots B_k}) \ge k\, E_{sq}(\rho_{A:B}), \tag{119} \]
where we used that $\rho_{AB_j} = \rho_{AB}$ for all j. The result then follows from the lower bound for squashed entanglement given in Corollary 1.

Here, we give an outline of the proof of Corollary 3, concerning the quasipolynomial-time algorithm for deciding separability. A more careful treatment can be found in [59].
Proof of Corollary 3. Given a density matrix $\rho_{AB}$, we use semidefinite programming [85] in order to search for a $\left(16 \ln 2\, \epsilon^{-2} \log|A|\right)$-extension of $\rho_{A:B}$. If $\rho_{AB} \in \mathcal{S}_{A:B}$, then there is such an extension (since separable states have a k-extension for every k). If $\|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathrm{LOCC}} \ge \epsilon$, Corollary 2 implies that there is no such extension. The computational cost of solving a feasibility semidefinite problem with n variables and a matrix of dimension m is $O(n^2 m^2)$. A k-extension of $\rho_{AB}$ has dimension $|A||B|^k$ and less than $|A|^2|B|^{2k}$ variables. Hence the time complexity of searching for a $\left(16 \ln 2\, \epsilon^{-2} \log|A|\right)$-extension is of order $\exp\!\left(O(\epsilon^{-2} \log|A| \log|B|)\right)$.

Proof of Corollary 4. We start by proving (25). Consider a protocol in $\mathrm{QMA}_{\mathrm{LOCC}}(2)_{m,s,c}$ given by the LOCC measurement $\{M_{AB}, I - M_{AB}\}$. We construct a protocol in $\mathrm{QMA}_{O(m^2\epsilon^{-2}),\,s+\epsilon,\,c}$ that can simulate it: The verifier asks for a proof of the form $|\psi\rangle_{AB_1\cdots B_k}$, where $|A| = |B_i| = 2^m$ for all i (each register consists of m qubits) and $k = 16 \ln 2\, \epsilon^{-2} m$. He then symmetrises the B systems obtaining the state $\rho_{AB_1\cdots B_k}$ and measures $\{M_{AB}, I - M_{AB}\}$ in the subsystems $AB \equiv AB_1$. Let us analyse the completeness and soundness of the protocol. For completeness, the prover can send $|\psi\rangle_A \otimes |\phi\rangle_B^{\otimes k}$, for states $|\psi\rangle$, $|\phi\rangle$ such that
\[ \operatorname{tr}\!\left(\left(|\psi\rangle\langle\psi|_A \otimes |\phi\rangle\langle\phi|_B\right) M_{AB}\right) \ge c. \tag{120} \]
Thus the completeness parameter of the QMA protocol is at least c. For soundness, we note that by Corollary 2,
\[ \|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathrm{LOCC}} \le \epsilon. \tag{121} \]
Thus, as $\{M_{AB}, I - M_{AB}\} \in \mathrm{LOCC}$, the soundness parameter for the QMA protocol can only be $\epsilon$ away from s. Indeed, for every $\rho_{AB_1\cdots B_k}$ symmetric in B,
\[ \operatorname{tr}(M_{AB}\rho_{AB}) \le \max_{\sigma_{AB}\in\mathcal{S}_{A:B}} \operatorname{tr}(M_{AB}\sigma_{AB}) + \|\rho_{AB} - \mathcal{S}_{A:B}\|_{\mathrm{LOCC}} \le s + \epsilon. \tag{122} \]
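To get a feel for the parameters in this simulation, the sketch below evaluates the extension size k = 16 ln 2 · m ε^{-2} and the resulting proof length for given m and ε. It assumes that Corollary 2 bounds the LOCC distance of a k-extendible state by sqrt(16 ln 2 · log|A| / k) with log|A| = m, which is the form the choice of k suggests; the function names and the rounding of k up to an integer are illustrative choices.

```python
import math

def extension_size(m_qubits, eps):
    """Number k of B registers, k = 16 ln 2 * m / eps^2, rounded up (our choice)."""
    return math.ceil(16 * math.log(2) * m_qubits / eps**2)

def locc_distance_bound(m_qubits, k):
    """Assumed form of the Corollary 2 bound: sqrt(16 ln 2 * m / k)."""
    return math.sqrt(16 * math.log(2) * m_qubits / k)

def total_proof_qubits(m_qubits, eps):
    """Qubits in the single proof on A B_1 ... B_k, i.e. O(m^2 / eps^2)."""
    return (extension_size(m_qubits, eps) + 1) * m_qubits

m, eps = 10, 0.1                       # hypothetical proof size and soundness gap
k = extension_size(m, eps)
print(k, total_proof_qubits(m, eps), locc_distance_bound(m, k))
```

With these illustrative values the single proof is several orders of magnitude longer than the original two proofs, but only polynomially so in m and 1/ε.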
Equation (24) now follows easily from the protocol above. Given a protocol in $\mathrm{QMA}_{\mathrm{LOCC}}(l)$ with each proof of size m qubits, we can simulate it in $\mathrm{QMA}_{\mathrm{LOCC}}(l-1)$ as follows: The verifier asks for $l - 1$ proofs, the first proof consisting of registers $AB_1 \cdots B_k$, each of size m qubits and $k = 16 \ln 2\, m\, \epsilon^{-2}$, and all the $l - 2$ other proofs in systems $C_j$ ($3 \le j \le l$) of size m qubits. Then he symmetrises the B systems and traces out all of them except the first. Finally he applies the original measurement from the $\mathrm{QMA}_{\mathrm{LOCC}}(l)$ protocol to the resulting state. The completeness of the protocol is unaffected by the simulation. For the soundness let $\rho_{AB_1\cdots B_k} \otimes \rho_{C_3} \otimes \cdots \otimes \rho_{C_l}$ be an arbitrary state sent by the prover (after the symmetrisation step has been applied to $B_1, \ldots, B_k$). Let $\{M_{C^l}, I - M_{C^l}\} \in \mathrm{LOCC}$ be the l-partite LOCC verification measurement from the $\mathrm{QMA}_{\mathrm{LOCC}}(l)$ protocol (where for notational simplicity we define $C_1 = A$, $C_2 = B_1$ and $C^l = C_1 \cdots C_l$). Then
\begin{align}
\operatorname{tr}\!\left(M_{C^l} \left(\rho_{C_1C_2} \otimes \rho_{C_3} \otimes \cdots \otimes \rho_{C_l}\right)\right) &\le \max_{\sigma_{C^l}\in\mathcal{S}_{C_1:\cdots:C_l}} \operatorname{tr}(M_{C^l}\sigma_{C^l}) + \min_{\sigma_{C^l}\in\mathcal{S}_{C_1:\cdots:C_l}} \left\|\rho_{C_1C_2} \otimes \rho_{C_3} \otimes \cdots \otimes \rho_{C_l} - \sigma_{C^l}\right\|_{\mathrm{LOCC}} \tag{123} \\
&= \max_{\sigma_{C^l}\in\mathcal{S}_{C_1:\cdots:C_l}} \operatorname{tr}(M_{C^l}\sigma_{C^l}) + \min_{\sigma_{C_1C_2}\in\mathcal{S}_{C_1:C_2}} \left\|\rho_{C_1C_2} - \sigma_{C_1C_2}\right\|_{\mathrm{LOCC}} \tag{124} \\
&\le s + \epsilon. \tag{125}
\end{align}
The equality in the second line follows since we can assume that the states $\rho_{C_3}, \ldots, \rho_{C_l}$ belong to Alice and adding local states does not change the minimum LOCC-distance to separable states. Since for going from $\mathrm{QMA}_{\mathrm{LOCC}}(l)$ to $\mathrm{QMA}_{\mathrm{LOCC}}(l-1)$ we had to blow up one of the proof’s size only by a quadratic factor, we can repeat the same protocol a constant number of times and still get each proof of polynomial size. In the end, the completeness parameter of the QMA procedure is the same as the original one for $\mathrm{QMA}_{\mathrm{LOCC}}(l)$, while the soundness is smaller than $s + l\epsilon$, which can be taken to be a constant away from c by choosing $\epsilon$ sufficiently small. To reduce the soundness back to the original value s we then use the standard amplification procedure for QMA (see e.g. [60]), which works in this case since the verification measurement is LOCC [63].

Acknowledgements. We thank Mario Berta, Aram Harrow and Andreas Winter for helpful discussions and in particular David Reeb for pointing out that our LOCC norm bound on squashed entanglement would extend to a bound on the conditional mutual information. FB is supported by a “Conhecimento Novo” fellowship from the Brazilian agency Fundacão de Amparo a Pesquisa do Estado de Minas Gerais (FAPEMIG). MC is supported by the Swiss National Science Foundation (grant PP00P2-128455) and the German Science Foundation (grants CH 843/1-1 and CH 843/2-1). JY is supported by a grant through the LDRD program of the United States Department of Energy. FB and JY thank the Institute Mittag Leffler, where part of this work was done, for their hospitality.
References 1. Ohya, M., Petz, D.: Quantum Entropy and Its Use. Berlin-Heidelberg-New York: Springer-Verlag, 2004 2. Lieb, E.H., Ruskai, M.B.: Proof of the strong subadditivity of quantum-mechanical entropy. J. Math. Phys. 14, 1938 (1973) 3. Hayden, P., Jozsa, R., Petz, D., Winter, A.: Structure of states which satisfy strong subadditivity of quantum entropy with equality. Commun. Math. Phys. 246, 359 (2004) 4. Ibinson, B., Linden, N., Winter, A.: Robustness of quantum Markov chains. Commun. Math. Phys. 277, 289 (2008) 5. Werner, R.F.: Quantum states with Einstein-Podolsky-Rosen correlations admitting a hidden-variable model. Phys. Rev. A 40, 4277 (1989) 6. Christandl, M., Winter, A.: “Squashed entanglement”: an additive entanglement measure. J. Math. Phys. 45, 829 (2004) 7. Tucci, R.R.: Quantum entanglement and conditional information transmission. http://arxiv.org/abs/quantph/9909041v2, 1999 8. Tucci, R.R.: Entanglement of distillation and conditional mutual information. http://arxiv.org/abs/quantph/0202144v2 2002 9. Christandl, M.: The Structure of Bipartite Quantum States - Insights from Group Theory and Cryptography. PhD thesis, Cambridge University, 2006 10. Christandl, M., Schuch, N., Winter, A.: Highly entangled states with almost no secrecy. Phys. Rev. Lett. 104, 240405 (2010) 11. Koashi, M., Winter, A.: Monogamy of entanglement and other correlations. Phys. Rev. A 69, 022309 (2004) 12. Peres, A.: Separability criterion for density matrices. Phys. Rev. Lett. 77, 1413 (1996) 13. Horodecki, M., Horodecki, P., Horodecki, R.: Separability of mixed states: Necessary and sufficient conditions. Phys. Lett. A 223, 1 (1996) 14. Horodecki, M., Horodecki, P., Horodecki, R.: Mixed-state entanglement and distillation: Is there a “bound” entanglement in nature? Phys. Rev. Lett. 80, 5239 (1998) 15. Horodecki, R., Horodecki, P., Horodecki, M., Horodecki, K.: Quantum entanglement. Rev. Mod. Phys. 81, 865 (2009) 16. 
Coffman, V., Kundu, J., Wootters, W.K.: Distributed entanglement. Phys. Rev. A 61, 052306 (2000) 17. Matthews, W., Wehner, S., Winter, A.: Distinguishability of quantum states under restricted families of measurements with an application to quantum data hiding. Commun. Math. Phys. 291, 813 (2009) 18. Bennett, C.H., DiVincenzo, D.P., Smolin, J.A., Wootters, W.K.: Mixed state entanglement and quantum error correction. Phys. Rev. A 54, 3824 (1996) 19. Rains, E.M.: Rigorous treatment of distillable entanglement. Phys. Rev. A 60, 173 (1999)
20. Devetak, I., Winter, A.: Distillation of secret key and entanglement from quantum states. Proc. Roy. Soc. Lond. Ser. A 461, 207 (2004) 21. Horodecki, K., Horodecki, M., Horodecki, P., Oppenheim, J.: Secure key from bound entanglement. Phys. Rev. Lett. 94, 160502 (2005) 22. Hayden, P., Horodecki, M., Terhal, B.M.: The asymptotic entanglement cost of preparing a quantum state. J. Phys. A 34, 6891 (2001) 23. Vedral, V., Plenio, M.B., Rippin, M.A., Knight, P.L.: Quantifying entanglement. Phys. Rev. Lett. 78, 2275 (1997) 24. Vedral, V., Plenio, M.B.: Entanglement measures and purification procedures. Phys. Rev. A 57, 1619 (1998) 25. Vidal, G., Werner, R.F.: A computable measure of entanglement. Phys. Rev. A 65, 032314 (2001) 26. Yang, D., Horodecki, M., Horodecki, R., Synak-Radtke, B.: Irreversibility for all bound entangled states. Phys. Rev. Lett. 95, 190501 (2005) 27. Brandão, F.G.S.L., Plenio, M.B.: A generalization of quantum Stein’s lemma. Commun. Math. Phys. 295, 791 (2010) 28. Plenio, M.B.: Logarithmic negativity: A full entanglement monotone that is not convex. Phys. Rev. Lett. 95, 090503 (2005) 29. Alicki, R., Fannes, M.: Continuity of quantum conditional information. J. Phys. A: Math. Gen. 37, L55 (2004) 30. Donald, M.J., Horodecki, M.: Continuity of relative entropy of entanglement. Phys. Lett. A 264, 257 (1999) 31. Donald, M.J., Horodecki, M., Rudolph, O.: The uniqueness theorem for entanglement measures. J. Math. Phys. 43, 4252 (2002) 32. Shor, P.W.: Equivalence of additivity questions in quantum information theory. Commun. Math. Phys. 246, 453 (2003) 33. Hastings, M.B.: Superadditivity of communication capacity using entangled inputs. Nature Physics 5, 255 (2009) 34. Vollbrecht, K.G.H., Werner, R.F.: Entanglement measures under symmetry. Phys. Rev. A 64, 062307 (2001) 35. Maurer, U.M., Wolf, S.: Unconditionally secure key agreement and the intrinsic conditional information. IEEE Trans. Inform. Theory 2, 499 (1999) 36. 
Christandl, M., Renner, R., Wolf, S.: A property of the intrinsic mutual information. In: Proc. 2003 IEEE Int. Symp. Inform. Theory, 2003, p. 258 37. Devetak, I., Yard, J.: Exact cost of redistributing quantum states. Phys. Rev. Lett 100, 230501 (2008) 38. Yard, J., Devetak, I.: Optimal quantum source coding with quantum information at the encoder and decoder. IEEE Trans. Inform. Theory 55, 5339 (2009) 39. Oppenheim, J.: A paradigm for entanglement theory based on quantum communication. http://arxiv.org/ abs/0801.0458v1 [quantph], 2008 40. Hudson, R.L., Moody, G.R.: Locally normal symmetric states and an analogue of de Finetti’s theorem. Z. Wahrschein. Verw. Geb. 33, 343 (1976) 41. Størmer, E.: Symmetric states of infinite tensor products of C ∗ -algebras. J. Funct. Anal. 3, 48 (1969) 42. Raggio, G.A., Werner, R.F.: Quantum statistical mechanics of general mean field systems. Helv. Phys. Acta. 62, 980 (1989) 43. Werner, R.F.: An application of Bell’s inequalities to a quantum state extension problem. Lett. Math. Phys. 17, 359 (1989) 44. König, R., Renner, R.: A de Finetti representation for finite symmetric quantum states. J. Math. Phys. 46, 122108 (2005) 45. Christandl, M., König, R., Mitchison, G., Renner, R.: One-and-a-half quantum de Finetti theorems. Commun. Math. Phys. 273, 473 (2007) 46. Virmani, S., Plenio, M.B.: Construction of extremal local positive operator-valued measures under symmetry. Phys. Rev. A 67, 062308 (2003) 47. DiVincenzo, D.P., Leung, D.W., Terhal, B.M.: Quantum data hiding. IEEE Trans. Inform. Theory 48, 580 (2002) 48. DiVincenzo, D.P., Hayden, P., Terhal, B.M.: Hiding quantum data. Found. Phys. 33, 1629 (2003) 49. Eggeling, T., Werner, R.F.: Hiding classical data in multi-partite quantum states. Phys. Rev. Lett. 89, 097905 (2002) 50. Hayden, P., Leung, D., Winter, A.: Aspects of generic entanglement. Commun. Math. Phys. 265, 95 (2006) 51. Doherty, A.C., Parrilo, P.A., Spedalieri, F.M.: A complete family of separability criteria. Phys. 
Rev. A 69, 022308 (2004) 52. Brandão, F.G.S.L., Vianna, R.O.: Separable multipartite mixed states - operational asymptotically necessary and sufficient conditions. Phys. Rev. Lett. 93, 220503 (2004)
53. Ioannou, L.M.: Computational complexity of the quantum separability problem. Quant. Inform. Comp. 7, 335 (2007) 54. Navascues, M., Owari, M., Plenio, M.B.: A complete criterion for separability detection. Phys. Rev. Lett. 103, 160404 (2009) 55. Gurvits, L.: Classical complexity and quantum entanglement. J. Comp. Sys. Sci 69, 448 (2004) 56. Gharibian, S.: Strong NP-hardness of the quantum separability problem. Quant. Inform. Comp. 10, 343 (2010) 57. Beigi, S.: NP vs QMAlog (2). Quant. Inform. Comp. 10, 141 (2010) 58. Harrow, A., Montanaro, A.: An efficient test for product states, with applications to quantum Merlin-Arthur games. In: Proc. Found. Comp. Sci. (FOCS), 2010, p. 633 59. Brandão, F.G.S.L., Christandl, M., Yard, J.: A quasipolynomial-time algorithm for the quantum separability problem. In: Proc. ACM Symp. on Theoretical Computer Science (STOC), 2011, p. 343 60. Watrous, J.: Quantum computational complexity. In: Encyclopedia of Complexity and System Science. Berlin-Heidelberg-New York: Springer, 2009 61. Marriott, C., Watrous, J.: Quantum Arthur-Merlin games. Computational Complexity 14, 122 (2005) 62. Beigi, S., Shor, P.W., Watrous, J.: Quantum interactive proofs with short messages. Theory of Computing 7, 201 (2011) 63. Aaronson, S., Beigi, S., Drucker, A., Fefferman, B., Shor, P.: The power of unentanglement. Theory of Computing 5, 1 (2009) 64. Kobayashi, H., Matsumoto, K., Yamakami, T.: Quantum Merlin-Arthur proof systems: Are multiple Merlins more helpful to Arthur? In: Lecture Notes in Computer Science, Volume 2906, BerlinHeidelberg-Newyork: Springer, 2003, p. 189 65. Brandão, F.G.S.L.: Entanglement Theory and the Quantum Simulation of Many-Body Physics. PhD thesis, Imperial College, 2008 66. Matsumoto, K.: Can entanglement efficiently be weakened by symmetrization? http://arxiv/org/abs/ quant-ph/0511240v3, 2005 67. Horodecki, K., Horodecki, M., Horodecki, P., Oppenheim, J.: Locking entanglement measures with a single qubit. Phys. Rev. Lett. 
94, 200501 (2005) 68. Piani, M.: Relative entropy of entanglement and restricted measurements. Phys. Rev. Lett. 103, 160504 (2009) 69. Hiai, F., Petz, D.: The proper formula for the relative entropy and its asymptotics in quantum probability. Commun. Math. Phys. 143, 99 (1991) 70. Ogawa, T., Nagaoka, H.: Strong converse and Stein’s lemma in the quantum hypothesis testing. IEEE Trans. Inform. Theory 46, 2428 (2000) 71. Gühne, O., Toth, G.: Entanglement detection. Phys. Rep. 474, 1 (2009) 72. Brandão, F.G.S.L.: Quantifying entanglement with witness operators. Phys. Rev. A 72, 022310 (2005) 73. Brandão, F.G.S.L.: Entanglement activation and the robustness of quantum correlations. Phys. Rev. A 76, 030301(R) (2007) 74. Synak-Radtke, B., Horodecki, M.: On asymptotic continuity of functions of quantum states. J. Phys. A: Math. Gen. 39, 423 (2006) 75. Berta, M., Christandl, M., Renner, R.: A conceptually simple proof of the quantum reverse Shannon theorem. In: Lecture Notes in Computer Science, Volume 6519, Berlin-Heidelberg-New York: Springer, 2011, p. 131 76. Berta, M., Christandl, M., Renner, R.: The quantum reverse Shannon theorem based on one-shot information theory. Commun. Math. Phys., 2011. to appear, http://arxiv.org/abs/0912.3805v2 77. Brandão, F.G.S.L., Plenio, M.B.: Entanglement theory and the second law of thermodynamics. Nature Physics 4, 873 (2008) 78. Brandão, F.G.S.L., Plenio, M.B.: A reversible theory of entanglement and its relation to the second law. Commun. Math. Phys. 295, 829 (2010) 79. Brandão, F.G.S.L., Datta, N.: One-shot rates for entanglement manipulation under non-entangling maps. IEEE Trans. Inform. Theory 57, 1754 (2011) 80. Brandão, F.G.S.L.: A reversible framework for resource theories. In preparation, 2011 81. Winter, A.: Coding theorem and strong converse for quantum channels. IEEE Trans. Inform. Theory 45, 2481 (1999) 82. 
Ogawa, T., Nagaoka, H.: A new proof of the channel coding theorem via hypothesis testing in quantum information theory. In: Proc. 2002 IEEE Int. Symp. Inform. Theory 2002, p. 73 83. Renner, R.: Symmetry implies independence. Nature Physics 3, 645 (2007) 84. Jain, R.: Distinguishing sets of quantum states. http://arxiv.org/abs/quant-ph/0506205v1, 2005 85. Vandenberghe, L., Boyd, S.: Semidefinite programming. SIAM Review 38, 49 (1996) Communicated by M.B. Ruskai
Commun. Math. Phys. 306, 831–851 (2011) Digital Object Identifier (DOI) 10.1007/s00220-011-1299-5
Communications in
Mathematical Physics
Global Solutions to the Ultra-Relativistic Euler Equations B. D. Wissman Natural Sciences Division, University of Hawaii at Hilo, 200 W. Kawili Street, Hilo, HI 96720, USA. E-mail:
[email protected]
Received: 4 November 2010 / Accepted: 25 February 2011 Published online: 12 July 2011 – © Springer-Verlag 2011
Abstract: We show that when entropy variations are included and special relativity is imposed, the thermodynamics of a perfect fluid leads to two distinct families of equations of state whose relativistic compressible Euler equations are of Nishida type. (In the non-relativistic case there is only one.) The first corresponds exactly to the Stefan-Boltzmann radiation law, and the other emerges most naturally in the ultra-relativistic limit of a γ-law gas, the limit in which the temperature is very high or the rest mass very small. We clarify how these two relativistic equations of state emerge physically, and provide a unified analysis of entropy variations to prove global existence in one space dimension for the two distinct 3 × 3 relativistic Nishida-type systems. In particular, as far as we know, this provides the first large data global existence result for a relativistic perfect fluid constrained by the Stefan-Boltzmann radiation law.

It was shown in [6,10] that for non-relativistic perfect fluids a unique equation of state of the form $p = a^2\rho$ emerges from a γ-law gas in the (appropriately re-scaled) limit γ → 1. A global existence theorem for the 3 × 3 non-relativistic compressible Euler equations was then proven for this model equation of state. This non-relativistic equation of state is unique, but has a questionable physical interpretation as an isothermal gas, cf. [8]. Surprisingly, in contrast with the classical γ → 1 limit, the equation of state $p = a^2\rho$ emerges in two fundamental limits, not one, when special relativity is imposed: it is exact in the case of the Stefan-Boltzmann radiation law, and also emerges in a most natural ultra-relativistic limit of a γ-law gas, the limit in which the temperature is very high or the rest mass very small [2] (not the awkward limit γ → 1).
Our results clarify how these two relativistic equations of state emerge physically, and provide a unified analysis of the entropy variations for the resulting two distinct relativistic Nishida systems that leads to a large data global existence theorem for both. In particular, as far as we know, this provides the first large data global existence result for a relativistic perfect fluid constrained by the Stefan-Boltzmann radiation law.
1. The Relativistic Euler Equations

The relativistic Euler equations in one spatial dimension form a system of conservation laws which can be written as [9],
\[ U_t + F(U)_x = 0, \tag{1} \]
where
\[ U = \left( \frac{n}{\sqrt{1-v^2}},\; (\rho+p)\frac{v}{1-v^2},\; (\rho+p)\frac{v^2}{1-v^2} + \rho \right) \tag{2} \]
and
\[ F(U) = \left( \frac{nv}{\sqrt{1-v^2}},\; (\rho+p)\frac{v^2}{1-v^2} + p,\; (\rho+p)\frac{v}{1-v^2} \right). \tag{3} \]
We designate: $v$ the gas velocity in a chosen Lorentz frame; $\rho$ the proper energy density; $p$ the pressure; $\epsilon$ the specific internal energy; $n$ the particle number; $S$ the specific entropy; and $T$ the temperature. We choose units where the speed of light is one and note that the thermodynamic quantities are related by the second law of thermodynamics, $T\,dS = d\epsilon + p\,d\tau$, where $\tau = 1/n$ is the specific volume. To close the system (1) we consider equations of state of the form
\[ p = a^2\rho. \tag{4} \]
With (4) the system (1) has special properties: in this case one can prove that global solutions exist and depend continuously on the initial data in the density, pressure and velocity variables, for initial data with arbitrarily large, but finite, variation [3,9]. We will use this existence result to prove a large data existence theorem for an ultra-relativistic gas that incorporates entropy and temperature variations via an equation of state of the form
\[ \epsilon(n, S) = A(S)\,n^{\gamma-1}, \tag{5} \]
with $1 < \gamma < 2$. Once the existence of bounded variation solutions is proven, their uniqueness and continuous dependence can be analyzed using techniques in [5]. We assume the function $A$ satisfies the following conditions:
\[ A \in C^1(\mathbb{R}^+, \mathbb{R}^+), \quad\text{and}\quad A'(S) > 0 \ \text{for}\ S > 0. \tag{6} \]
This family includes the equations of state for a polytropic gas, $A(S) \approx e^{\frac{\gamma-1}{R}S}$, and one modeling a radiation dominated gas constrained by the Stefan-Boltzmann radiation law, $A(S) \approx S^{\gamma}$ [11]. (When $\gamma = 4/3$ this gives the condition that $\rho \approx T^4$.) The proper energy density $\rho$ is the sum of the rest mass energy and internal energy. For a gas with particles of rest mass $m$ and specific internal energy $\epsilon$ this gives $\rho = n(m + \epsilon)$. Based upon this relation the equations of state (5) do not reduce to (4); however, we observe now that they do in the ultra-relativistic limit where either the rest mass is very small (e.g. neutrinos or, in the limiting sense, massless thermal radiation) or the temperature is very large, $m/T \ll 1$ [2]. In this limit it follows that
\[ \rho = n(m + \epsilon) \approx n\epsilon. \tag{7} \]
Indeed, using the second law of thermodynamics under assumption (7), (5) reduces to an equation of state of the form (4) with $a^2 = (\gamma - 1)$,
\[ p = n^2\frac{d\epsilon}{dn} = n^2(\gamma-1)A(S)\,n^{\gamma-2} = (\gamma-1)A(S)\,n^{\gamma} = (\gamma-1)\,n\epsilon = (\gamma-1)\rho. \]
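The reduction above can be checked numerically. The following Python sketch is only an illustration: the function names are hypothetical and the choice $A(S) = e^{S}$ is an assumption made for the example (any $A$ satisfying (6) could be substituted). It approximates the derivative of $\epsilon$ with respect to $n$ by a central difference and compares $p = n^2\,d\epsilon/dn$ with $(\gamma-1)\,n\epsilon$.

```python
import math

def internal_energy(n, S, gamma, A=math.exp):
    """Equation of state (5): eps(n, S) = A(S) * n**(gamma - 1).
    A defaults to exp, an illustrative choice that satisfies (6)."""
    return A(S) * n ** (gamma - 1.0)

def pressure(n, S, gamma, A=math.exp, h=1e-6):
    """p = n^2 * d(eps)/dn, with the derivative taken by a central difference."""
    d_eps = (internal_energy(n + h, S, gamma, A)
             - internal_energy(n - h, S, gamma, A)) / (2.0 * h)
    return n * n * d_eps

def ultra_relativistic_density(n, S, gamma, A=math.exp):
    """Limit (7): rho is approximated by n * eps (rest mass neglected)."""
    return n * internal_energy(n, S, gamma, A)
```

Up to discretization error, `pressure(n, S, gamma)` should agree with `(gamma - 1) * ultra_relativistic_density(n, S, gamma)` for any admissible inputs.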
We find it remarkable that in this limit, when special relativity is assumed, the pressure is still a function of $n$ and $S$, but reduces to (4) when viewed as a function of $\rho$ alone. This model allows one to find the temperature evolution of the gas and still take advantage of the simplifying effects of an equation of state of the form (4). An analogous equation of state incorporating the entropy was obtained in [6,10] with a rescaled limit γ → 1 in the non-relativistic case, and a corresponding global existence theorem for the classical Euler equations was given. In contrast to the classical γ → 1 limit, the equation of state $p = a^2\rho$ emerges in two fundamental limits, not one, when special relativity is imposed. It is exact in the case of the Stefan-Boltzmann radiation law and also emerges in the ultra-relativistic limit of a perfect fluid. With these two equations of state as motivation, the goal of this paper is now to prove the following:

Theorem 1. Let $\rho_0(x)$, $v_0(x)$ and $S_0(x)$ be arbitrary initial data satisfying $\rho_0(x) > 0$, $-1 < v_0(x) < 1$ and $S_0(x) > 0$. Let $\mathcal{S} = \ln[A(S)]$ for $\epsilon(n, S) = A(S)\,n^{\gamma-1}$, $1 < \gamma < 2$, and $A$ satisfying (6). Suppose further that
\[ \mathrm{Var}\{\mathcal{S}_0(\cdot)\} < \infty, \tag{8} \]
\[ \mathrm{Var}\{\ln(\rho_0(\cdot))\} < \infty, \tag{9} \]
and
\[ \mathrm{Var}\left\{\ln\left(\frac{1+v_0(\cdot)}{1-v_0(\cdot)}\right)\right\} < \infty. \tag{10} \]
Then there exists a bounded weak solution $(\rho(x,t), v(x,t), S(x,t))$ to (1) in the ultra-relativistic limit, satisfying
\[ \mathrm{Var}\{\mathcal{S}(\cdot,t)\} < N, \tag{11} \]
\[ \mathrm{Var}\{\ln(\rho(\cdot,t))\} < N, \tag{12} \]
and
\[ \mathrm{Var}\left\{\ln\left(\frac{1+v(\cdot,t)}{1-v(\cdot,t)}\right)\right\} < N, \tag{13} \]
where $N$ is a constant depending only on the initial variation bounds in (8), (9), and (10).

Theorem 1 is a generalization of the work by Smoller and Temple [9] that includes the entropy evolution. In other words, in this model we are able to prove global solutions exist including a physically relevant temperature profile. Smoller and Temple found that the relativistic Euler equations with equation of state (4) possessed the property that after each elementary wave interaction in a Glimm scheme, $\mathrm{Var}\{\ln(\rho)\}$ is non-increasing. This functional, introduced by Liu, is used as a replacement for the quadratic potential in Glimm’s original analysis, which can be used to show that (9) and (10) imply (12) and (13). Considering the ultra-relativistic limit, the solutions of Riemann problems are
independent of $S$, enabling one to solve for the intermediate state in the projected state space and place a corresponding entropy wave between them. In [9] it is shown that for an equation of state of the form (4), the shock curves are translationally invariant in the plane of Riemann invariants. In our case, this property continues to hold under certain coordinate changes in the three dimensional non-projected state space for an equation of state of the form (5). This can be viewed as the relativistic analogue of the large data existence result in [10] with a family of distinct temperature profiles.

The main part of the analysis is showing that $\mathrm{Var}\{S\}$ is bounded in our approximate solutions. We extend the analysis by Smoller and Temple for the ultra-relativistic regime with equation of state (5), by utilizing the geometry of the shock curves in the space of Riemann invariants. Considering only the change of $S$ across shock waves, we find that $\mathrm{Var}\{S\}$ is uniformly bounded by $\mathrm{Var}\{\ln(\rho)\}$ for a polytropic equation of state; however, across the linearly degenerate entropy waves there is no change in pressure and hence no jump in proper energy density by (4). Thus, another method must be employed to estimate the strengths of these jumps. For a gas dominated by radiation or a general equation of state of the form (5), the change in entropy across a shock depends on the initial entropy value. It is not known a priori that this dependence does not lead to blow-up in the variation in $S$. Furthermore, in certain elementary wave interactions, $\mathrm{Var}\{S\}$ may actually increase while $\mathrm{Var}\{\ln(\rho)\}$ remains invariant. Complicating matters, using $\Delta\ln(\rho)$ as the definition of wave strengths increases the technicality of the entropy wave estimates. For example, after the interaction of two shocks of the same family the new shock wave has strength strictly less than the sum of the two previous.
In other words, when two shock waves combine, the strengths are not simply additive, but the new wave strength is less than the simple sum of the incoming shock strengths. To alleviate technicalities with the entropy estimates and decreasing shock strengths after interaction, we use the change of Riemann invariants as a measure of wave strength. Under this regime, wave strengths are now additive and the sum of all the strengths of shock waves is shown to be non-increasing in time. In conclusion, using $\Delta\ln(\rho)$ as a measure of wave strength dramatically simplifies the interaction estimates for the nonlinear waves, but complicates the problem dealing with the entropy waves.

The rest of this paper is outlined as follows: In Sect. 2, we analyze the structure of simple wave solutions of (1) and derive the equations of state corresponding to both a γ-law gas and a gas constrained by the Stefan-Boltzmann law. Using these properties, we prove global existence of solutions to Riemann problems. We then obtain a priori wave interaction estimates which are used to produce estimates on approximate solutions constructed using a Glimm scheme in Sect. 3. Section 4 contains the proof of our main theorem.

2. Relativistic Gas Dynamics

We consider a gas where the proper energy density $\rho$ and pressure satisfy the relationship (4), where causality restricts the sound speed $c_s = \sqrt{dp/d\rho} = a$ to be less than one. Under assumption (4), the system (1) decouples so that we may solve for two variables first, then solve for the third afterward. In this section, we will show that in the domain ρ > 0, −1 < v < 1, and S > 0, Riemann problems are globally solvable and their general structure consists of two waves separated by a jump in entropy traveling with the fluid.
2.1. Riemann invariants. The Riemann invariants for the system (1) with $p = a^2\rho$ are given by [9],
\[ r = \frac{1}{2}\ln\left(\frac{1+v}{1-v}\right) - \frac{a}{1+a^2}\ln(\rho) \quad\text{and}\quad s = \frac{1}{2}\ln\left(\frac{1+v}{1-v}\right) + \frac{a}{1+a^2}\ln(\rho). \]
The function $r = r(\rho, v)$ is constant across 3-rarefaction waves and $s = s(\rho, v)$ is constant across 1-rarefaction waves. The entropy $S$ is a third Riemann invariant, constant across 1- and 3-rarefaction waves. In our analysis we will view state space in the coordinates of the Riemann invariants rather than the conserved variables. However, using $S$ is not sufficient because the shock curves in $(r, s, S)$ space are, in general, not translationally invariant. We will instead use $\mathcal{S} = \ln[A(S)]$ as our third coordinate. It is shown in Sect. 2.3 that in $(r, s, \mathcal{S})$ space the shock-rarefaction curves are independent of base point. We now change our variables from the conserved quantities $(U_1, U_2, U_3)$ to $(\rho, v, S)$.

Proposition 1. In the region ρ > 0, −1 < v < 1, S > 0, the mapping $(\rho, v, S) \to (U_1, U_2, U_3)$ is one-to-one, and the Jacobian determinant of the map is both continuous and non-zero.

Proof. The conserved quantities $U_2$ and $U_3$ depend only on $v$ and $\rho$. It can be shown that the mapping $(\rho, v) \to (U_2, U_3)$ is one-to-one for ρ > 0 and −1 < v < 1 [9]. Now we show that the mapping $(\rho, v, S) \to (U_1, U_2, U_3)$ is one-to-one. If $(\rho_1, v_1, S_1)$ and $(\rho_2, v_2, S_2)$ have the same image we must have $\rho_1 = \rho_2$ and $v_1 = v_2$, since $U_2$ and $U_3$ only depend on $\rho$ and $v$, and the mapping $(\rho, v) \to (U_2, U_3)$ is one-to-one. Moreover, since $n = n(\rho, S)$, the equality $U_1(\rho_1, v_1, S_1) = U_1(\rho_2, v_2, S_2)$ reduces to $n(\rho, S_1) = n(\rho, S_2)$. Thus we are done if $\partial n/\partial S \neq 0$. Using $\rho = n\epsilon$ to rewrite the second law of thermodynamics as $n\,d\rho = n^2 T\,dS + (a^2+1)\rho\,dn$, we conclude
\[ \frac{\partial n}{\partial S} = -\frac{n^2 T}{(a^2+1)\rho} \neq 0, \]
and the mapping $(\rho, v, S) \to (U_1, U_2, U_3)$ is one-to-one. Finally, the determinant of the Jacobian matrix of the map is
\[ \det(J) = \frac{n^2 T\,(1 - a^2 v^2)}{(1 - v^2)^2} > 0, \]
which is continuous for ρ > 0, −1 < v < 1 and S > 0.
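As an illustration of the change of coordinates used in Sect. 2.1, the following Python sketch maps $(\rho, v)$ to the Riemann invariants $(r, s)$ and back. The function names are hypothetical. The inverse uses only the two relations $r + s = \ln\frac{1+v}{1-v}$ and $s - r = \frac{2a}{1+a^2}\ln\rho$, which follow directly from the definitions of $r$ and $s$.

```python
import math

def riemann_invariants(rho, v, a):
    """Return (r, s) for a state with rho > 0, -1 < v < 1 and p = a^2 * rho."""
    c = a / (1.0 + a * a)
    half_log = 0.5 * math.log((1.0 + v) / (1.0 - v))
    return half_log - c * math.log(rho), half_log + c * math.log(rho)

def from_riemann_invariants(r, s, a):
    """Invert the map above: recover (rho, v) from (r, s)."""
    c = a / (1.0 + a * a)
    v = math.tanh(0.5 * (r + s))          # (r + s) / 2 = artanh(v)
    rho = math.exp((s - r) / (2.0 * c))   # s - r = 2 c ln(rho)
    return rho, v
```

A round trip such as `from_riemann_invariants(*riemann_invariants(2.5, -0.4, a), a)` should return the original pair up to rounding.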
2.2. Jump conditions. For systems of conservation laws, the relations defining the dynamics of shock waves are the Rankine-Hugoniot jump conditions,
\[ s[[U]] = [[F(U)]], \tag{14} \]
where $s$ is the speed of the shock and $[[U]]$ and $[[F(U)]]$ are the changes of $U$ and $F(U)$, respectively, across the shock [8]. Given a state $U_L$, the Rankine-Hugoniot relations, for each $i = 1, \ldots, n$, define a 1-parameter family of states that can be connected on the right by a shock wave in the $i$th characteristic family. Moreover, this curve has second order contact with the curve defining all the states that connect to $U_L$ on the right by an $i$th rarefaction wave, given by the $i$th integral curve. Only half of these curves are physically relevant. For the first and third genuinely non-linear characteristic fields, with wave speeds $\lambda_1 = (v - a)/(1 - va)$ and $\lambda_3 = (v + a)/(1 + va)$, we take the portion of the integral curve extending from $U_L$ that satisfies $\lambda_i(U_L) < \lambda_i(U)$. On the other hand, we take the portion of the shock curve $S_i$ that satisfies the Lax entropy condition, $\lambda_i(U) < s < \lambda_i(U_L)$ [8]. The second characteristic class is linearly degenerate with characteristic speed $\lambda_2 = v$. We now give two lemmas that describe the structure of the shock curves.
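Lemma 1 [7]. Let $U = (\rho, v, n)$ and $U_L = (\rho_L, v_L, n_L)$ be two states separated by a shock wave. Then with (4) the following relation holds:
\[ \left(\frac{n}{n_L}\right)^2 = \left(\frac{\rho}{\rho_L}\right)^2 \frac{1 + a^2\,\dfrac{\rho_L}{\rho}}{1 + a^2\,\dfrac{\rho}{\rho_L}}. \tag{15} \]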
The global structure of the solutions of the shock relations (14) for the relativistic Euler equations in the space of Riemann invariants was studied by Smoller and Temple [9] for an equation of state of the form (4). We summarize their results in the following lemma:

Lemma 2 [9]. Let $p = a^2\rho$ with $0 < a < 1$. The projections of the $i$-shock curves, $i = 1, 3$, onto the plane of Riemann invariants $(r, s)$ at any entropy level satisfy the following:
1. The shock speed $s$ is monotone along the shock curve $S_i$ and for each state $(\rho_L, v_L) \neq (\rho_R, v_R)$ on $S_i$ the Lax entropy condition holds.
2. The shock curves when parameterized by $\Delta\ln(\rho)$ are translationally invariant. Furthermore the 1- and 3-shock curves based at a common point $(r, s)$ have mirror symmetry across the line $r = s$ through the point $(r, s)$.
3. The $i$-shock curves are convex and
\[ 0 \le \frac{ds}{dr} \le \frac{\sqrt{2K} - 1}{-\sqrt{2K} - 1} < 1 \]
for $i = 1$ and
\[ 0 \le \frac{dr}{ds} \le \frac{\sqrt{2K} - 1}{-\sqrt{2K} - 1} < 1 \]
for $i = 3$, where $K = 2a^2/(1 + a^2)^2$.

In light of Lemma 2 we can globally define the shock curves in the $rs$-plane and know that everywhere on this curve the Lax entropy conditions hold.
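The slope bound in item 3 of Lemma 2 can be evaluated directly. The Python sketch below (the function name is hypothetical) computes the bound from $K$ as written in the lemma. Since $\sqrt{2K} = 2a/(1+a^2)$, elementary algebra shows that the bound equals $\left(\frac{1-a}{1+a}\right)^2$, which makes it evident that it lies strictly between 0 and 1 for $0 < a < 1$.

```python
import math

def shock_slope_bound(a):
    """Upper bound on ds/dr (1-shocks) or dr/ds (3-shocks) from Lemma 2,
    written with K = 2 a^2 / (1 + a^2)^2 exactly as in the lemma."""
    K = 2.0 * a * a / (1.0 + a * a) ** 2
    root = math.sqrt(2.0 * K)
    return (root - 1.0) / (-root - 1.0)
```

For any sound speed `a` in (0, 1), `shock_slope_bound(a)` should agree with `((1 - a) / (1 + a)) ** 2` up to rounding.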
2.3. Equations of state. In this section we derive both the equations of state for an ultra-relativistic γ-law gas and one dominated by thermal radiation subject to the Stefan-Boltzmann law. We then show that the general class of equations of state (5) have the property that as a function of wave strength, the change in a certain function of entropy is independent of base point. Moreover, we will find that the change of this function of entropy and its derivative are monotonically increasing. These facts are used in our estimates on the entropy waves in Sect. 2.5.

We begin by assuming the ultra-relativistic limit (7) and (4) with $a^2 = \gamma - 1$ for $1 < \gamma < 2$. Finding the differential of $\epsilon = \rho\tau$ we get,
\[ d\epsilon = \frac{1}{\gamma-1}\,p\,d\tau + \frac{1}{\gamma-1}\,\tau\,dp. \]
Plugging this into the second law, we get the two constraints on the entropy function $S(p, \tau)$,
\[ \frac{\partial S}{\partial\tau} = \frac{\gamma}{\gamma-1}\,\frac{p}{T} \quad\text{and}\quad \frac{\partial S}{\partial p} = \frac{1}{\gamma-1}\,\frac{\tau}{T}. \tag{16} \]
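2.3.1. Ideal gas. We first consider a gas subject to the ideal gas law: $p\tau = RT$. Using the ideal gas law, rewrite (16) as,
\[ \frac{\partial S}{\partial\tau} = \frac{\gamma}{\gamma-1}\,\frac{R}{\tau} \quad\text{and}\quad \frac{\partial S}{\partial p} = \frac{1}{\gamma-1}\,\frac{R}{p}, \]
which has the solution,
\[ S(p, \tau) = \frac{\gamma}{\gamma-1}\,R\ln(\tau) + \frac{1}{\gamma-1}\,R\ln(p) + C. \]
After dividing by $\frac{R}{\gamma-1}$, exponentiating and rearranging, we get,
\[ \rho(n, S) = C e^{\frac{\gamma-1}{R}S}\,n^{\gamma}, \]
that returns the equation of state (5) with $A(S) = C e^{\frac{\gamma-1}{R}S}$.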
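A small numerical check of the ideal gas case may be helpful. The Python sketch below uses hypothetical function names and illustrative default values for the constants $R$ and $C$. It evaluates the entropy $S(p, \tau)$ given above and inverts it, with $\tau = 1/n$ and $p = (\gamma-1)\rho$, to recover the energy density.

```python
import math

def entropy_ideal(p, tau, gamma, R=1.0, C=0.0):
    """S(p, tau) = gamma/(gamma-1) * R ln(tau) + 1/(gamma-1) * R ln(p) + C."""
    return (gamma / (gamma - 1.0) * R * math.log(tau)
            + R / (gamma - 1.0) * math.log(p) + C)

def density_from_entropy(n, S, gamma, R=1.0, C=0.0):
    """Invert S(p, tau) with tau = 1/n, then use p = (gamma - 1) * rho."""
    p = math.exp((gamma - 1.0) * (S - C) / R) * n ** gamma
    return p / (gamma - 1.0)
```

Starting from a pair `(p, tau)`, computing `S = entropy_ideal(p, tau, gamma, R)` and then `density_from_entropy(1 / tau, S, gamma, R)` should return `p / (gamma - 1)`.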
2.3.2. Stefan-Boltzmann. For the Stefan-Boltzmann equation of state, we now assume that the pressure, and hence energy density by (4), depends only on the temperature $T$. With this we equate the mixed partials of our constraint equations (16),
\[ \frac{\partial}{\partial p}\left(\frac{\gamma}{\gamma-1}\,\frac{p}{T}\right)_{\tau} = \frac{\partial}{\partial\tau}\left(\frac{1}{\gamma-1}\,\frac{\tau}{T}\right)_{p}. \]
After differentiation and simplifying we get the differential equation for $T$,
\[ \frac{dT}{dp} = \frac{\gamma-1}{\gamma}\,\frac{T}{p}, \tag{17} \]
with solution $p(T) = C\,T^{\frac{\gamma}{\gamma-1}}$. Equivalently by (4), $\rho(T) = b\,T^{\frac{\gamma}{\gamma-1}}$. (Note that when $\gamma = 4/3$, this reduces to the fourth power law, $\rho = bT^4$.) Now we put this equation of state in the form (5). To proceed we first find the entropy function $S(n, \rho)$, then solve for $\rho$. From the first equation of (16) we find,
\[ S(p, \tau) = \frac{\gamma}{\gamma-1}\,\frac{p\tau}{T} + f(p). \]
The second constraint gives us,
\[ \frac{df}{dp} = -\frac{\tau}{T}\left(1 - \frac{\gamma}{\gamma-1}\,\frac{p}{T}\,\frac{dT}{dp}\right). \]
In light of (17), $f'(p) = 0$; this gives,
\[ S(p, \tau) = \frac{\gamma}{\gamma-1}\,\frac{p\tau}{T}. \]
Now, replacing $p$, $\tau$ and $T$ with their equivalent expressions in terms of $\rho$ and $n$, we obtain,
\[ S(n, \rho) = \gamma\,b^{\frac{\gamma-1}{\gamma}}\,\rho^{1/\gamma}\,n^{-1}. \]
Solving for $\rho$ results in $\rho = b^{1-\gamma}\gamma^{-\gamma}S^{\gamma}n^{\gamma}$. Therefore, we have (5) with $A(S) = b^{1-\gamma}\gamma^{-\gamma}S^{\gamma} \approx S^{\gamma}$.
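The Stefan-Boltzmann relations above can be cross-checked with a short Python sketch. The function names and the default value of the constant $b$ are illustrative assumptions. Starting from $(n, S)$, the code computes $\rho$, then the temperature from $\rho = bT^{\gamma/(\gamma-1)}$, and finally recomputes the entropy from $S = \frac{\gamma}{\gamma-1}\frac{p\tau}{T}$ with $p = (\gamma-1)\rho$ and $\tau = 1/n$.

```python
def sb_density(n, S, gamma, b=1.0):
    """rho = b^(1-gamma) * gamma^(-gamma) * S^gamma * n^gamma."""
    return b ** (1.0 - gamma) * gamma ** (-gamma) * S ** gamma * n ** gamma

def sb_temperature(rho, gamma, b=1.0):
    """Invert rho = b * T^(gamma/(gamma-1))."""
    return (rho / b) ** ((gamma - 1.0) / gamma)

def sb_entropy(rho, n, gamma, b=1.0):
    """S = gamma/(gamma-1) * p * tau / T with p = (gamma-1)*rho, tau = 1/n."""
    p = (gamma - 1.0) * rho
    tau = 1.0 / n
    T = sb_temperature(rho, gamma, b)
    return gamma / (gamma - 1.0) * p * tau / T
```

If the relations are consistent, `sb_entropy(sb_density(n, S, gamma, b), n, gamma, b)` should return `S` up to rounding, and for `gamma = 4/3` the density should equal `b * T**4`.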
2.3.3. Shockwave entropy change. For an equation of state of the form $\epsilon(n, S) = A(S)\,n^{\gamma-1}$, with $A(S)$ satisfying (6), the second law of thermodynamics says $p(n, S) = n^2\,\frac{\partial\epsilon}{\partial n} = (\gamma-1)A(S)\,n^{\gamma} = (\gamma-1)\,n\epsilon$. In the ultra-relativistic limit this further reduces to $p(n, S) = (\gamma-1)\rho$, an equation of state of the form (4) with $a^2 = (\gamma - 1)$. Define $\mathcal{S}$ by
\[ \mathcal{S}(S) = \ln[A(S)]. \tag{18} \]
We show that across a shock wave the difference $[\mathcal{S} - \mathcal{S}_L]$ is a function of the change of the corresponding Riemann invariant alone. Then the difference $[\mathcal{S} - \mathcal{S}_L]$ along the shock curve is independent of base point. Finally, we show that the difference $[\mathcal{S} - \mathcal{S}_L]$ and its derivative, as a function of the change of Riemann invariants, are monotonically increasing. Note that it is sufficient to show that the change $[\mathcal{S} - \mathcal{S}_L]$ and its derivative are monotonically increasing when viewed as a function of $\ln(\rho/\rho_L)$, because they satisfy the relationship as parameters, $\Delta r = \frac{2a}{a^2+1}\,\Delta\ln(\rho)$. (For 3-shocks replace $\Delta r$ with $\Delta s$.) Indeed,
\[ \frac{d[\mathcal{S} - \mathcal{S}_L]}{d(r - r_L)} = \frac{d[\mathcal{S} - \mathcal{S}_L]}{d\ln(\rho/\rho_L)} \cdot \frac{d\ln(\rho/\rho_L)}{d(r - r_L)} = \frac{a^2+1}{2a} \cdot \frac{d[\mathcal{S} - \mathcal{S}_L]}{d\ln(\rho/\rho_L)}. \]
Since $\rho = A(S)\,n^{\gamma}$ in the ultra-relativistic limit, we have $\mathcal{S} - \mathcal{S}_L = \ln(\rho/\rho_L) - \gamma\ln(n/n_L)$. Using (15), we have for $\sigma = \ln(\rho/\rho_L)$,
\[ [\mathcal{S} - \mathcal{S}_L](\sigma) = (1 - \gamma)\sigma + \frac{\gamma}{2}\ln\left(\frac{1 + (\gamma-1)e^{\sigma}}{1 + (\gamma-1)e^{-\sigma}}\right). \]
After differentiating, we have
\[ \frac{d[\mathcal{S} - \mathcal{S}_L]}{d\sigma} = \frac{(e^{\sigma} - 1)^2 (2 - \gamma)(\gamma - 1)}{2(1 + e^{\sigma}(\gamma-1))(e^{\sigma} + (\gamma-1))}, \]
which is non-negative in the domain $1 < \gamma < 2$ and $\sigma \ge 0$. Furthermore, the derivative is zero only when $\sigma = 0$. Thus, $[\mathcal{S} - \mathcal{S}_L](\sigma)$ is a monotonically increasing function. Differentiating a second time we find,
\[ \frac{d^2[\mathcal{S} - \mathcal{S}_L]}{d\sigma^2} = \frac{\gamma^2 (2 - \gamma)(\gamma - 1)(e^{3\sigma} - e^{\sigma})}{2(1 + e^{\sigma}(\gamma-1))^2 (e^{\sigma} + (\gamma-1))^2} > 0, \]
showing that $d[\mathcal{S} - \mathcal{S}_L]/d\sigma$ is also monotonically increasing for $1 < \gamma < 2$ and $\sigma > 0$. Considering Lemma 2 we have proven:

Proposition 2. Consider the ultra-relativistic Euler equations with the equation of state (5), $1 < \gamma < 2$ and $A$ satisfying (6). Then the change in $\mathcal{S} = \ln[A(S)]$, when regarded as a function of the change in the corresponding Riemann invariant, is independent of base state. Geometrically, the shock curves, as viewed in $(r, s, \mathcal{S})$-space, are translationally invariant.

An interesting fact is that the change in $\mathcal{S}$ becomes nearly linear for strong shock waves. We state this as a corollary.

Corollary 1. Under the assumptions of Proposition 2, the change in $\mathcal{S}$ becomes nearly linear for large $\sigma$.

Proof.
\[ \lim_{\sigma\to\infty} \frac{d[\mathcal{S} - \mathcal{S}_L]}{d\sigma} = \lim_{\sigma\to\infty} \frac{(e^{\sigma} - 1)^2 (2 - \gamma)(\gamma - 1)}{2(1 + e^{\sigma}(\gamma-1))(e^{\sigma} + (\gamma-1))} = \frac{(2 - \gamma)}{2}. \]
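The monotonicity statements and the limit in Corollary 1 can be illustrated numerically. The Python sketch below (with hypothetical function names) implements the entropy change across a shock as a function of $\sigma = \ln(\rho/\rho_L)$ together with the closed-form derivative stated above. It is a check of the stated formulas, not a substitute for the proof.

```python
import math

def entropy_jump(sigma, gamma):
    """Change in ln A(S) across a shock as a function of sigma = ln(rho/rho_L)."""
    g1 = gamma - 1.0
    return (1.0 - gamma) * sigma + 0.5 * gamma * math.log(
        (1.0 + g1 * math.exp(sigma)) / (1.0 + g1 * math.exp(-sigma)))

def entropy_jump_derivative(sigma, gamma):
    """Closed-form derivative of entropy_jump with respect to sigma."""
    g1 = gamma - 1.0
    e = math.exp(sigma)
    return (e - 1.0) ** 2 * (2.0 - gamma) * g1 / (2.0 * (1.0 + e * g1) * (e + g1))
```

For large `sigma`, `entropy_jump_derivative(sigma, gamma)` should approach `(2 - gamma) / 2`, and a finite-difference quotient of `entropy_jump` should agree with the closed-form derivative.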
Fig. 1. A solution to the Riemann problem. The states $U_M$ and $U_M'$ differ only in $S$
2.4. The Riemann problem. The Riemann problem is a particular class of Cauchy problems with initial data of the form,
\[ U_0(x) = \begin{cases} U_L, & x < 0, \\ U_R, & x > 0. \end{cases} \]
From the geometry of the shock-rarefaction curves in the coordinate system of Riemann invariants, we can globally solve the Riemann problem for any two initial states in the region ρ > 0, −1 < v < 1 and S > 0.

Theorem 2. Consider left and right states $U_L = (\rho_L, v_L, S_L)$ and $U_R = (\rho_R, v_R, S_R)$, such that $\rho_L, \rho_R > 0$, $-1 < v_L, v_R < 1$, and $S_L, S_R > 0$. With the equation of state (5) satisfying $1 < \gamma < 2$ and (6), there exists a weak solution to the Riemann problem for system (1) in the ultra-relativistic limit. This solution is unique in the class of solutions with constant states separated by centered rarefaction, shock and contact waves.

We parameterize the 1- (resp. 3-) shock/rarefaction curve by the change in $r$ (resp. $s$) and define the strength of a shock or rarefaction wave as the difference in the values of either $r$ for a 1-shock-rarefaction wave, or $s$ for a 3-shock-rarefaction wave. We choose the orientation on our parametrization so that we have a positive parameter along the rarefaction curve and negative parameter along the shock curve. Therefore, the solution of the Riemann problem can be given as a sequence of coordinates, $(\varepsilon_1, \varepsilon_2, \varepsilon_3)$, where $\varepsilon_1$ denotes the change in the Riemann invariant $r$ from $U_L$ to $U_M$, $\varepsilon_2$ the change in $S$ from $U_M$ to $U_M'$, and $\varepsilon_3$ the change in the Riemann invariant $s$ from $U_M'$ to $U_R$. In summary, for $i = 1, 3$ we have a shock wave of strength $|\varepsilon_i|$ when $\varepsilon_i < 0$ and a rarefaction wave of strength $\varepsilon_i$ when $\varepsilon_i > 0$ (Fig. 1). We adopt the following notation: $\alpha$, strength of 1-shock wave; $\beta$, strength of 3-shock wave; $\mu$, strength of 1-rarefaction wave; $\eta$, strength of 3-rarefaction wave; and $\delta$, strength of entropy wave. If $(\varepsilon_1, \varepsilon_2, \varepsilon_3)$ is the solution to the Riemann problem with states $U_L$, $U_R$, we would have:
\[ \alpha = \begin{cases} -\varepsilon_1, & \varepsilon_1 \le 0, \\ 0, & \text{otherwise}, \end{cases} \quad \beta = \begin{cases} -\varepsilon_3, & \varepsilon_3 \le 0, \\ 0, & \text{otherwise}, \end{cases} \quad \mu = \begin{cases} \varepsilon_1, & \varepsilon_1 \ge 0, \\ 0, & \text{otherwise}, \end{cases} \quad \eta = \begin{cases} \varepsilon_3, & \varepsilon_3 \ge 0, \\ 0, & \text{otherwise}. \end{cases} \tag{19} \]
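The bookkeeping in (19) amounts to splitting each signed wave coordinate into a shock part and a rarefaction part. A minimal Python sketch follows; the function name and the argument names `eps1` and `eps3` (standing for the signed changes in $r$ and $s$) are hypothetical.

```python
def wave_strengths(eps1, eps3):
    """Split the signed 1- and 3-wave coordinates as in (19).

    Returns (alpha, beta, mu, eta): the strengths of the 1-shock, 3-shock,
    1-rarefaction and 3-rarefaction. A negative coordinate is a shock,
    a positive one a rarefaction; the absent wave has strength zero."""
    alpha = -eps1 if eps1 <= 0 else 0.0
    beta = -eps3 if eps3 <= 0 else 0.0
    mu = eps1 if eps1 >= 0 else 0.0
    eta = eps3 if eps3 >= 0 else 0.0
    return alpha, beta, mu, eta
```

For instance, a solution with a 1-shock and a 3-rarefaction corresponds to `eps1 < 0` and `eps3 > 0`, so only `alpha` and `eta` are non-zero.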
We define $\delta = \mathcal{S}_R - \mathcal{S}_L$, where $\mathcal{S} = \ln[A(S)]$, and denote by $\delta_\omega$ the absolute change of $\mathcal{S}$ across a shock wave of strength $\omega$. More specifically, if two states are separated by a shock of strength $\omega$, the absolute change in $\mathcal{S}$ across the shock would be $\delta_\omega$ for either a 1- or 3-shock. Since we have shown that the change in $\mathcal{S}$ is independent of the base state and dependent only on the strength of the wave, $\delta_\omega$ is well defined.

2.5. Interaction estimates. Consider the following three states: $U_L = (\rho_L, v_L, \mathcal{S}_L)$, $U_M = (\rho_M, v_M, \mathcal{S}_M)$, and $U_R = (\rho_R, v_R, \mathcal{S}_R)$. We wish to estimate the difference in the solutions of the three Riemann problems $\langle U_L, U_M\rangle$, $\langle U_M, U_R\rangle$, and $\langle U_L, U_R\rangle$, with solutions denoted by a 1 subscript, a 2 subscript and no subscript, respectively.

Proposition 3. Let $\Omega$ be a simply connected compact set in $rs$-space. Then there exists a constant $C_0$, $1/2 < C_0 < 1$, such that for any interaction $\langle U_L, U_M\rangle + \langle U_M, U_R\rangle \to \langle U_L, U_R\rangle$ in $\Omega$ at any value of $\mathcal{S}$, one of the following holds:
1. $A = -\xi \le 0$, $0 \le B \le C_0\xi$, or $B = -\xi \le 0$, $0 \le A \le C_0\xi$.
2. $A \le 0$ and $B \le 0$,
where $A = \alpha - \alpha_1 - \alpha_2$ and $B = \beta - \beta_1 - \beta_2$ are the changes in the strengths of the 1- and 3-shock waves in the solutions.

These estimates are proven by systematically looking at all possible wave interactions, for which we show several representative examples in the Appendix. Because the interactions are independent of entropy level, we only consider interactions within the first and third characteristic classes. The main consequence is that after an interaction, there cannot be an overall increase in the strengths of the shock waves. This fact follows since as the solution progresses forward in time, cancellations and merging of shock and rarefaction waves of the same class lead to a decrease in shock strength. For example, when a shock wave is weakened by a rarefaction wave, a reflected shock wave is created in the opposite family.
This interaction may increase the total strength of the shock waves in the opposite family, but the total gain in shock strength is uniformly bounded by the loss in the weakened or annihilated shock. We choose the constant $C_0$ to be the maximum slope of the largest shock curve that lies within the compact set, or 1/2 in order to bound the constant below. More specifically, let $\bar\omega$ be the strongest shock wave possible in $\Omega$. Then we take $C_0$ to be
\[ C_0 = \max\left\{ \frac{1}{2},\; \left.\frac{dr}{ds}\right|_{\bar\omega},\; \left.\frac{ds}{dr}\right|_{\bar\omega} \right\}. \tag{20} \]
By Lemma 2, the slopes of the shock wave curves in a compact set in the $rs$-plane are strictly bounded away from 1. Therefore, we conclude $C_0 < 1$. For interactions in a compact set, the variation in $\mathcal{S}$ across a shock wave is uniformly bounded by a constant times the strength of the shock, but the variation in $\mathcal{S}$ may increase after an interaction because of the creation of an entropy wave. Typically, across these waves the pressure is invariant and there is a jump in density; however, under the assumption (4), there must be no jump in energy density. Thus, we cannot use $\ln(\rho/\rho_L)$ or the change in the Riemann invariants $r$ or $s$ as a measure of wave strength. It should be noted that under certain interactions, such as an $i$-shock being weakened by an incoming $i$-rarefaction wave, an entropy wave is created with strength such that
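$\mathcal{S}_R - \mathcal{S}_L$ is equal to the loss in entropy change across the shock, plus the change in the entropy across the new shock wave in the opposite family. We need a way to bound the variation in the entropy waves and it turns out that this increase is bounded by a corresponding decrease in the shock strengths.

Proposition 4. For every simply connected compact set $\Omega$ in $rs$-space, there exists a constant $M > 0$ such that after every interaction in $\Omega$, at any value of $\mathcal{S}$, for the system (1) with (5) in the ultra-relativistic limit, the following holds:
\[ |\delta| - |\delta_1| - |\delta_2| + (\delta_{\alpha_1} + \delta_{\alpha_2} - \delta_{\alpha}) + (\delta_{\beta_1} + \delta_{\beta_2} - \delta_{\beta}) \le -M(A + B). \]

Proof. Choose $C_0$ so that Proposition 3 holds. Since $\Omega$ is a compact set, let
\[ \bar\omega = \sup\left\{ \|(r_1, s_1) - (r_2, s_2)\| : (r_1, s_1), (r_2, s_2) \in \Omega \right\} \]
so that the strength of the largest shock wave in $\Omega$ is bounded above by $\bar\omega$. Furthermore, let $M = (1 - C_0)^{-1}\bar{M}$, where
\[ \bar{M} = 2\,\frac{d[\mathcal{S} - \mathcal{S}_L]}{d\omega}(\bar\omega), \tag{21} \]
which is twice the largest rate of change of $\mathcal{S}$ for all shocks contained in $\Omega$. Also, since $[\mathcal{S} - \mathcal{S}_L](\omega)$ is positive and convex up, we have for strengths $\omega \ge \omega_1 + \omega_2$, $\delta_\omega \ge \delta_{\omega_1} + \delta_{\omega_2}$. The proof will be split into two cases, one for each of the two cases from Proposition 3. First let us assume that $A \le 0$ and $B \le 0$, i.e., $\alpha - \alpha_1 - \alpha_2 = -\xi_\alpha \le 0$ and $\beta - \beta_1 - \beta_2 = -\xi_\beta \le 0$. We have $\alpha_1 + \alpha_2 - \xi_\alpha = \alpha$ and hence $\delta_{(\alpha_1 + \alpha_2 - \xi_\alpha)} = \delta_\alpha$. It follows that
\[ \delta_{\alpha_1} + \delta_{\alpha_2} - \frac{1}{2}\bar{M}\xi_\alpha \le \delta_{\alpha_1 + \alpha_2} - \frac{1}{2}\bar{M}\xi_\alpha \le \delta_\alpha, \]
and
\[ \delta_{\alpha_1} + \delta_{\alpha_2} - \delta_\alpha \le \frac{1}{2}\bar{M}\xi_\alpha \le -\frac{1}{2}\bar{M}A. \tag{22} \]
Also, the change in entropy across the two Riemann problems before and the resulting one are equal:
\[ \delta_\alpha + \delta - \delta_\beta = \delta_{\alpha_1} + \delta_1 - \delta_{\beta_1} + \delta_{\alpha_2} + \delta_2 - \delta_{\beta_2}. \tag{23} \]
Rearranging (23) and using (22), we find
\[ \delta - \delta_1 - \delta_2 + \delta_{\beta_1} + \delta_{\beta_2} - \delta_\beta = \delta_{\alpha_1} + \delta_{\alpha_2} - \delta_\alpha \le -\frac{1}{2}\bar{M}A. \tag{24} \]
Adding the inequality (22) to (24),
\[ \delta - \delta_1 - \delta_2 + (\delta_{\alpha_1} + \delta_{\alpha_2} - \delta_\alpha) + (\delta_{\beta_1} + \delta_{\beta_2} - \delta_\beta) \le -\frac{1}{2}\bar{M}A - \frac{1}{2}\bar{M}A = -\bar{M}A. \]
By a similar argument we also have
\[ -(\delta - \delta_1 - \delta_2) + (\delta_{\alpha_1} + \delta_{\alpha_2} - \delta_\alpha) + (\delta_{\beta_1} + \delta_{\beta_2} - \delta_\beta) \le -\bar{M}B. \]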
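Since $0 \le -\bar{M}A$ and $0 \le -\bar{M}B$ by assumption, $\bar{M} \le M$, and $|\delta - \delta_1 - \delta_2| \ge |\delta| - |\delta_1| - |\delta_2|$, we deduce
\[ |\delta| - |\delta_1| - |\delta_2| + (\delta_{\alpha_1} + \delta_{\alpha_2} - \delta_\alpha) + (\delta_{\beta_1} + \delta_{\beta_2} - \delta_\beta) \le -M(A + B), \]
which concludes the proof of the first case. Now, without loss of generality assume $A = -\xi \le 0$ and $0 \le B \le C_0\xi$. The other case when $0 \le A$ is similar. As before we find $\delta_{\alpha_1} + \delta_{\alpha_2} - \delta_\alpha \le \frac{1}{2}\bar{M}\xi$ and
\[ \delta - \delta_1 - \delta_2 + (\delta_{\alpha_1} + \delta_{\alpha_2} - \delta_\alpha) + (\delta_{\beta_1} + \delta_{\beta_2} - \delta_\beta) \le \bar{M}\xi. \tag{25} \]
Since $\beta \ge \beta_1 + \beta_2$, we have $\delta_{\beta_1} + \delta_{\beta_2} - \delta_\beta \le 0$ and so by adding this inequality twice to (23),
\[ -(\delta - \delta_1 - \delta_2) + (\delta_{\alpha_1} + \delta_{\alpha_2} - \delta_\alpha) + (\delta_{\beta_1} + \delta_{\beta_2} - \delta_\beta) \le 0. \tag{26} \]
Therefore, from (25), (26) and $|\delta - \delta_1 - \delta_2| \ge |\delta| - |\delta_1| - |\delta_2|$,
\[ |\delta| - |\delta_1| - |\delta_2| + (\delta_{\alpha_1} + \delta_{\alpha_2} - \delta_\alpha) + (\delta_{\beta_1} + \delta_{\beta_2} - \delta_\beta) \le \bar{M}\xi. \]
But $\bar{M}\xi = M(1 - C_0)\xi = M(\xi - C_0\xi) \le M(-A - B) = -M(A + B)$, where we used the fact $-C_0\xi \le -B$, following from the assumption that $0 \le B \le C_0\xi$ and $A = -\xi$.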
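The proof of Proposition 4 uses the superadditivity of the entropy change, $\delta_\omega \ge \delta_{\omega_1} + \delta_{\omega_2}$ for $\omega \ge \omega_1 + \omega_2$, which follows from convexity together with the fact that the change vanishes at zero strength. The Python sketch below illustrates this numerically. The function name is hypothetical, and the conversion between the wave strength $\omega$ and $\sigma = \ln(\rho/\rho_L)$ assumes the parameter relation $\Delta r = \frac{2a}{a^2+1}\Delta\ln(\rho)$ quoted in Sect. 2.3.3 with $a^2 = \gamma - 1$.

```python
import math

def delta(omega, gamma):
    """Entropy change across a shock of strength omega >= 0.

    Assumes sigma = (a^2 + 1) / (2 a) * omega with a^2 = gamma - 1,
    and the closed-form expression for the change in ln A(S) from Sect. 2.3.3."""
    a = math.sqrt(gamma - 1.0)
    sigma = (a * a + 1.0) / (2.0 * a) * omega
    g1 = gamma - 1.0
    return (1.0 - gamma) * sigma + 0.5 * gamma * math.log(
        (1.0 + g1 * math.exp(sigma)) / (1.0 + g1 * math.exp(-sigma)))
```

On a grid of positive strengths `x` and `y`, one would expect `delta(x + y, gamma) >= delta(x, gamma) + delta(y, gamma)`.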
3. Glimm’s Difference Scheme

In 1965 Glimm [4] proved existence of solutions to general systems of strictly hyperbolic conservation laws with genuinely non-linear or linearly degenerate characteristic fields. Glimm’s method takes a piecewise constant approximate solution at one time step and uses Riemann problems, defined at each point of discontinuity, to evolve the solution to a later time. After the approximate solution is brought forward in time, the solution is randomly sampled and a new piecewise constant approximate solution is obtained. In this section we use a Glimm scheme to construct approximate solutions to (1).

3.1. Glimm difference scheme. Begin by partitioning space into intervals of length $\Delta x$ and time into intervals of length $\Delta t$. In order to keep neighboring Riemann problems from colliding, we impose the following CFL condition:
\[ \frac{\Delta x}{\Delta t} > 1 > |\lambda_i|, \quad i = 1, 2, 3. \]
For $1 < \gamma < 2$ this condition is satisfied since the characteristic speeds are bounded above and below by 1 and −1. We inductively define our approximate solution. To begin, suppose that we have an approximate solution at time $t = n\Delta t$, $U(x, n\Delta t)$, which is constant on the intervals $(k\Delta x, (k+2)\Delta x)$, where $k + n$ is odd. At each point $x = k\Delta x$ a Riemann problem is defined. Solve each Riemann problem for time $t = \Delta t$. This evolves our approximate solution forward in time from $t = n\Delta t$ to $t = (n+1)\Delta t$. To finish, we must construct a new piecewise constant function at time $t = (n+1)\Delta t$. Choose $a \in [-1, 1]$ and define
\[ U(x, (n+1)\Delta t) = U((k + 1 + a)\Delta x, (n+1)\Delta t-) \]
for $x \in (k\Delta x, (k+2)\Delta x)$ and $k + n + 1$ odd. Here $\Delta t-$ denotes the lower limit. To begin this process at $t = 0$, obtain a piecewise constant function from the initial data $U_0(x)$ by again choosing $a \in [-1, 1]$ and defining $U(x, 0) = U_0((k + a)\Delta x)$ for $k$ odd.
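The staggered mesh and the sampling step described above can be sketched in a few lines of Python. This is only an illustration of the indexing: the function names are hypothetical, and the Riemann solver itself is not included. A cell at time level `step` is an interval $(k\Delta x, (k+2)\Delta x)$ with $k + \text{step}$ odd, and its sample point is $(k + 1 + a)\Delta x$ for a sampling parameter $a \in [-1, 1]$.

```python
def cfl_ok(dx, dt):
    """CFL condition dx/dt > 1; the wave speeds satisfy |lambda_i| < 1."""
    return dx / dt > 1.0

def cells(step, k_min, k_max):
    """Left indices k of the cells (k*dx, (k+2)*dx) at time level `step`,
    i.e. those k in [k_min, k_max] with k + step odd."""
    return [k for k in range(k_min, k_max + 1) if (k + step) % 2 != 0]

def sample_point(k, a, dx):
    """Sample location (k + 1 + a)*dx inside the cell (k*dx, (k+2)*dx)."""
    if not -1.0 <= a <= 1.0:
        raise ValueError("sampling parameter a must lie in [-1, 1]")
    return (k + 1 + a) * dx
```

With this indexing the cells at consecutive time levels are offset by one mesh width, so each new cell is centered on a point where a Riemann problem was posed at the previous level.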
Consider $\theta \in \prod_{i=0}^{\infty}[-1, 1]$. We call $U_{\theta,\Delta x}(x, t)$ the approximate solution given by a mesh size of $\Delta x$ with sampling points at the $n$th time step given by $\theta_n$. In order to estimate the change in the variation of our approximate solutions, we define piecewise linear, space-like curves, called I-curves, which connect sample points at different time levels. If an I-curve $J$ passes through the sampling point $((k + \theta_n)\Delta x, n\Delta t)$, then $J$ is only allowed to connect to $((k + 1 + \theta_{n\pm1})\Delta x, (n \pm 1)\Delta t)$ on the right and to $((k - 1 + \theta_{n\pm1})\Delta x, (n \pm 1)\Delta t)$ on the left. We consider two functionals defined on I-curves. Define for an I-curve $J$:
\[ F(J) = \sum_J \alpha_i + \sum_J \beta_i + V \tag{27} \]
and
\[ L(J) = \sum_J \left(\alpha_i - M_0\delta_{\alpha_i}\right) + \sum_J \left(\beta_i - M_0\delta_{\beta_i}\right) - M_0\sum_J |\delta_j| + V, \tag{28} \]
where the sums are taken over all waves that cross $J$. The constant $M_0$ will be chosen later and $V = \mathrm{Var}\{U_0(\cdot)\}$ is the variation of the initial data. The main problem in our analysis is to show that the variation in the entropy waves stays bounded for all time. To do this we need to bound the possible change in $\mathcal{S}$ across shock waves. This is accomplished by first showing that the variation in $r$ and $s$ stays finite for all time. This implies that all the interactions, as projected onto the $rs$-plane, occur in a compact set. Thus, there is a largest possible shock strength in this compact set. Using the fact that the derivative of the entropy change as a function of wave strength is monotonically increasing, there is a constant such that the entropy change is bounded by a constant times the wave strength. Moreover, we can then use Proposition 4 to estimate the increase in the variation in entropy in our approximate solutions.

3.2. Estimates on approximate solutions. For initial data $U_0(x)$ and the corresponding approximate solution $U_{\theta,\Delta x}(x, t)$, define $\bar{U}_0(x)$ and $\bar{U}_{\theta,\Delta x}(x, t)$ as the initial data and approximate solutions viewed as functions of $r$ and $s$ only. The first estimate will show that the variation in the Riemann invariants across an I-curve $J$ is bounded above by the functional $F$ on $J$.

Proposition 5. Let $\bar{U}_0(\cdot)$ be of finite variation, $J$ an I-curve and suppose that the approximate solution $\bar{U}_{\theta,\Delta x}(x, t)$ is defined on $J$. Then $\mathrm{Var}_{rs}(J) \le 4F(J)$.

Proof. Let $\mathrm{Var}_r^-(J)$ and $\mathrm{Var}_r^+(J)$ denote the variation across $J$ given by a decrease and increase in $r$ respectively. The only waves that contribute to the decrease in $r$ are 1- and 3-shocks, and the only waves that contribute to the increase are 1-rarefactions. Therefore,
\[ \mathrm{Var}_r^-(J) \le \sum_J \alpha_i + \sum_J \beta_i \quad\text{and}\quad \mathrm{Var}_r^+(J) = \sum_J \mu_i, \tag{29} \]
where the sum is over all waves of the particular type crossing $J$. Following this line of reasoning for $s$, we also have
\[ \mathrm{Var}_s^-(J) \le \sum_J \alpha_i + \sum_J \beta_i \quad\text{and}\quad \mathrm{Var}_s^+(J) = \sum_J \eta_i. \tag{30} \]
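The initial data $\bar{U}_0$ may be written as a function of the Riemann invariants $r$ and $s$, $\bar{U}_0(x) = (r_0(x), s_0(x))$. Since $\bar{U}_0(\cdot)$ is of finite variation, the limits $\lim_{x\to\pm\infty} r_0(x) = r^{\pm}$ and $\lim_{x\to\pm\infty} s_0(x) = s^{\pm}$ exist. For any I-curve $J$, the end states at $\pm\infty$ are given by $(r^{\pm}, s^{\pm})$. From this we obtain $|\mathrm{Var}_r^+(J) - \mathrm{Var}_r^-(J)| = |r^+ - r^-| \le V$, and hence $\mathrm{Var}_r^+(J) \le \mathrm{Var}_r^-(J) + V$. Using (29) and similarly from (30), $\sum_J \mu_i \le \sum_J \alpha_i + \sum_J \beta_i + V$ and $\sum_J \eta_i \le \sum_J \alpha_i + \sum_J \beta_i + V$. Combining these together we have $\sum_J \mu_i + \sum_J \eta_i \le 2\left(\sum_J \alpha_i + \sum_J \beta_i + V\right)$. Thus,
\[ \mathrm{Var}_{rs}(J) \le 2\left(\sum_J \alpha_i + \sum_J \beta_i\right) + \sum_J \mu_i + \sum_J \eta_i \le 4\left(\sum_J \alpha_i + \sum_J \beta_i\right) + 2V \le 4F(J). \]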
We now show that the functional $F$ on the I-curves is non-increasing. We define a partial ordering on the I-curves by saying that $J \prec J'$ if the curve $J'$ never lies below the curve $J$. Furthermore, we say that $J'$ is an immediate successor to $J$ if $J \prec J'$ and $J$ and $J'$ share all the same sample points except for one. It is clear that for any pair of I-curves such that $J \prec J'$, there is a sequence of immediate successors that begins at $J$ and ends at $J'$. The next proposition shows that if our approximate solution is defined on an I-curve, it can be defined for all following I-curves.

Proposition 6. Let $J$ and $J'$ be I-curves, $J \prec J'$, and suppose that $J$ is in the domain of definition of $U_{\theta,\Delta x}$. If $F(J) < \infty$, then $J'$ is in the domain of definition of $U_{\theta,\Delta x}$, and $F(J') \le F(J)$. Moreover, if $\mathrm{Var}_{rs}\{U_0(\cdot)\} < \infty$ then $U_{\theta,\Delta x}$ can be defined for $t \ge 0$.

Proof. We proceed by induction. Suppose first that $J'$ is an immediate successor to $J$. Then the difference $F(J') - F(J)$ is given by the change in shock wave strengths across the diamond enclosed by $J$ and $J'$. This is a consequence of the fact that the waves that head into the diamond from the left and right solve the same Riemann problem as the outgoing waves in the new single Riemann problem. If we denote by $J_0$ and $J_0'$ the diamond portions of $J$ and $J'$, we have by Proposition 3,
\[
\begin{aligned}
F(J') - F(J) &= \Bigl[\sum_{J'} \alpha_i + \sum_{J'} \beta_i + V\Bigr] - \Bigl[\sum_{J} \alpha_i + \sum_{J} \beta_i + V\Bigr], \\
&= \sum_{J_0'} \alpha_i + \sum_{J_0'} \beta_i - \sum_{J_0} \alpha_i - \sum_{J_0} \beta_i, \\
&= \alpha' - \alpha_1 - \alpha_2 + \beta' - \beta_1 - \beta_2 \le A + B \le 0.
\end{aligned}
\]
Thus, $F(J') \le F(J)$ for immediate successors. For general $J$ and $J'$ such that $J \prec J'$, we produce a sequence of immediate successors that takes $J$ to $J'$. At each step the functional $F$ is non-increasing, thus $F(J') \le F(J)$ continues to hold. By Proposition 5, $\mathrm{Var}_{rs}(J') \le 4F(J') \le 4F(J)$, so $J'$ is in the domain of definition of $U_{\theta,\Delta x}$.

Moreover, if $\mathrm{Var}_{rs}\{U_0(\cdot)\} < \infty$, then $\mathrm{Var}_{rs}(0) < \infty$ for the unique I-curve $0$ that lies along the line $t = 0$. In order to show that $U_{\theta,\Delta x}$ can be defined for $t \ge 0$, we must show that $\mathrm{Var}_{rs}\{U_{\theta,\Delta x}(\cdot, t)\} < \infty$ for all time. But this condition is equivalent to showing that the variation across any I-curve $J$ is always finite. Since for any I-curve $J$, $\mathrm{Var}_{rs}(J) \le 4F(J) \le 4F(0) \le 8\,\mathrm{Var}_{rs}\{U_0(\cdot)\}$, the result follows.
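The closing chain of inequalities combines three separate bounds. The following is a sketch only: it assumes, in line with the display used in Section 4, that $F(0) = \sum_0 \alpha_i + \sum_0 \beta_i + V$, that $V \le \mathrm{Var}_{rs}\{U_0(\cdot)\}$, and that the shock strengths along the initial I-curve satisfy $\sum_0 \alpha_i + \sum_0 \beta_i \le \mathrm{Var}_{rs}\{U_0(\cdot)\}$.

```latex
% Assumptions (labelled, not restated from the source):
%   F(0) = \sum_0 \alpha_i + \sum_0 \beta_i + V,
%   V \le Var_{rs}\{U_0\},  \sum_0 \alpha_i + \sum_0 \beta_i \le Var_{rs}\{U_0\}.
\begin{align*}
\mathrm{Var}_{rs}(J)
  &\le 4F(J)                       && \text{(Proposition 5)} \\
  &\le 4F(0)                       && \text{($F$ is non-increasing, } 0 \prec J) \\
  &=   4\Bigl[\sum_0 \alpha_i + \sum_0 \beta_i + V\Bigr] \\
  &\le 4\bigl[\mathrm{Var}_{rs}\{U_0\} + \mathrm{Var}_{rs}\{U_0\}\bigr]
   =   8\,\mathrm{Var}_{rs}\{U_0(\cdot)\}.
\end{align*}
```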
Global Solutions to the Ultra-Relativistic Euler Equations
Proposition 6 shows that the variation of our approximate solution in the variables $r$ and $s$ is finite. Thus, there exists a compact set in the $rs$-plane that contains all the interactions in our approximate solution.

Corollary 2. Suppose that $\mathrm{Var}_{rs}\{U_0(\cdot)\} < \infty$. Then there exists a simply connected compact set in the $rs$-plane that contains all possible interactions.

Proof. From Proposition 5 and Proposition 6 we know that for any I-curve $J$, $\mathrm{Var}_{rs}(J) \le 4F(J) \le 4F(0) \le 8\,\mathrm{Var}_{rs}\{U_0(\cdot)\} = N < \infty$. Thus, the distance between any two states occurring anywhere in our approximate solution is bounded by $N$. Taking the left limit state $(r^{-}, s^{-})$ of $U_0(\cdot)$ as a reference point, all states must therefore be contained within the closed ball of radius $2N$ centered at $(r^{-}, s^{-})$.
Now we show that the variation of our approximate solution, including the variation in the entropy variable, is bounded above by the functional $L(\cdot)$.

Proposition 7. Suppose $\mathrm{Var}\{U_0(\cdot)\} < \infty$ and $J$ is an I-curve that is in the domain of definition of $U_{\theta,\Delta x}$. Then there exist constants $M_0 > 0$ and $K > 0$, independent of $\Delta x$ and $\theta$, such that $\mathrm{Var}(J) \le K \cdot L(J)$.

Proof. The variation across the I-curve $J$ is bounded by
\[
\mathrm{Var}(J) \le \mathrm{Var}(\text{shock waves}) + \mathrm{Var}(\text{rarefaction waves}) + \mathrm{Var}(\text{entropy waves}) + \mathrm{Var}(\text{entropy change across shocks}).
\]
Since $\mathrm{Var}_{rs}\{U_0(\cdot)\} \le \mathrm{Var}\{U_0(\cdot)\} < \infty$, we have from Corollary 2 that all the interactions projected into the $rs$-plane occur in a compact set. Therefore there exists a constant $M' > 0$ such that for a shock wave of strength $\omega$, $\delta_\omega \le M'\omega$. Let $M = (1 - C_0)^{-1} M'$ as in Proposition 4. Since $M' < M$, we have for a shock wave of strength $\omega$, $\delta_\omega < M\omega$. From the proof of Proposition 5, we can bound the variation from the shock waves and rarefaction waves by the shock waves crossing $J$ and the initial variation $V$. Thus,
\[
\begin{aligned}
\mathrm{Var}(J) &\le 2\Bigl[\sum_J \alpha_i + \sum_J \beta_i\Bigr] + \sum_J \mu_i + \sum_J \eta_i + \sum_J |\delta| + \sum_J \delta_{\alpha_i} + \sum_J \delta_{\beta_i}, \\
&\le 4\Bigl[\sum_J \alpha_i + \sum_J \beta_i + V\Bigr] + \sum_J |\delta| + M\Bigl[\sum_J \alpha_i + \sum_J \beta_i\Bigr], \\
&\le (4 + M)\Bigl[\sum_J \alpha_i + \sum_J \beta_i + V\Bigr] + \sum_J |\delta|.
\end{aligned}
\]
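Let $M_0 \le 1/(2M)$. Then $M_0\,\delta_\omega \le \frac{1}{2M}\,\delta_\omega \le \frac{1}{2M}(M\omega) = \frac{1}{2}\omega$. Thus, for a shock wave of strength $\omega$, $\omega \le 2(\omega - M_0\,\delta_\omega)$. Using this we find,
\[
\mathrm{Var}(J) \le 2(4 + M)\Bigl[\sum_J \bigl(\alpha_i - M_0\,\delta_{\alpha_i}\bigr) + \sum_J \bigl(\beta_i - M_0\,\delta_{\beta_i}\bigr) + V\Bigr] + \sum_J |\delta|.
\]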
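The inequality $\omega \le 2(\omega - M_0\,\delta_\omega)$ used in the bound above can be spelled out in three lines. This is a minimal sketch that uses only $\delta_\omega \le M\omega$ and $0 < M_0 \le 1/(2M)$ from the text, together with the assumption (not stated explicitly here) that $\delta_\omega \ge 0$.

```latex
% Assumes 0 \le \delta_\omega \le M\omega and 0 < M_0 \le 1/(2M).
\begin{align*}
M_0\,\delta_\omega &\le \frac{1}{2M}\,\delta_\omega \le \frac{1}{2M}\,(M\omega) = \tfrac12\,\omega, \\
\omega - M_0\,\delta_\omega &\ge \omega - \tfrac12\,\omega = \tfrac12\,\omega, \\
\omega &\le 2\,\bigl(\omega - M_0\,\delta_\omega\bigr).
\end{align*}
```

Applying the last line to each $\alpha_i$ and $\beta_i$ is what replaces the plain shock strengths by the corrected strengths $\alpha_i - M_0\,\delta_{\alpha_i}$ and $\beta_i - M_0\,\delta_{\beta_i}$, at the cost of the factor 2.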
B. D. Wissman
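Finally, since $M_0 \cdot 2(4 + M) \ge 2M M_0 \ge 1$, we move the sum of the strengths of the entropy waves inside,
\[
\mathrm{Var}(J) \le 2(4 + M)\Bigl[\sum_J \bigl(\alpha_i - M_0\,\delta_{\alpha_i}\bigr) + \sum_J \bigl(\beta_i - M_0\,\delta_{\beta_i}\bigr) + M_0 \sum_J |\delta| + V\Bigr].
\]
Therefore, $\mathrm{Var}(J) \le K \cdot L(J)$, with $K = 2(4 + M)$.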
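Proposition 8. Suppose that $\mathrm{Var}\{U_0(\cdot)\} < \infty$ and $J$, $J'$ are I-curves such that $J \prec J'$ and $L(J) < \infty$. Then $J'$ is in the domain of definition of $U_{\theta,\Delta x}(x, t)$, $L(J') \le L(J)$, and $U_{\theta,\Delta x}(x, t)$ is defined for $t \ge 0$.

Proof. Since $\mathrm{Var}_{rs}\{U_0(\cdot)\} \le \mathrm{Var}\{U_0(\cdot)\} < \infty$, there exists a compact set that contains all possible interactions. Define $M$ as in Proposition 4 and take $M_0 \le 1/(2M)$. As in Proposition 6, we prove the result by induction on the I-curves. First let $J'$ be an immediate successor to $J$. Let $J_0$ and $J_0'$ be the parts of $J$ and $J'$ that bound the diamond formed by $J$ and $J'$. Using this and the definition of $L(J)$,
\[
\begin{aligned}
L(J') - L(J) &\le \Bigl[\sum_{J_0'} \bigl(\alpha_i - M_0\,\delta_{\alpha_i}\bigr) + \sum_{J_0'} \bigl(\beta_i - M_0\,\delta_{\beta_i}\bigr) + M_0 \sum_{J_0'} |\delta|\Bigr] \\
&\quad - \Bigl[\sum_{J_0} \bigl(\alpha_i - M_0\,\delta_{\alpha_i}\bigr) + \sum_{J_0} \bigl(\beta_i - M_0\,\delta_{\beta_i}\bigr) + M_0 \sum_{J_0} |\delta|\Bigr], \\
&= \alpha' - \alpha_1 - \alpha_2 + \beta' - \beta_1 - \beta_2 + M_0\bigl[\delta_{\alpha_1} + \delta_{\alpha_2} - \delta_{\alpha'}\bigr] \\
&\quad + M_0\bigl[\delta_{\beta_1} + \delta_{\beta_2} - \delta_{\beta'}\bigr] + M_0\bigl[|\delta'| - |\delta_1| - |\delta_2|\bigr].
\end{aligned}
\]
Now we refer to Propositions 3 and 4. We see that the first two groups of terms are equal to $(A + B)$ and the others are bounded above by $-M M_0 (A + B)$. Putting this together,
\[
L(J') - L(J) \le (A + B) - M M_0 (A + B) \le \tfrac{1}{2}(A + B) \le 0.
\]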
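The last step relies on the sign of $A + B$. A short sketch, assuming $A + B \le 0$ (the content of Proposition 3 as it is used here) and $0 < M_0 \le 1/(2M)$:

```latex
% Assumes A + B \le 0 and 0 < M_0 \le 1/(2M), so that M M_0 \le 1/2.
\begin{align*}
M M_0 \le \tfrac12 \;&\Longrightarrow\; 1 - M M_0 \ge \tfrac12, \\
(A + B) - M M_0\,(A + B) &= (1 - M M_0)\,(A + B) \\
&\le \tfrac12\,(A + B) \le 0 .
\end{align*}
% The middle inequality holds because (1 - M M_0 - 1/2)(A + B) is the product
% of a non-negative factor and a non-positive one.
```

A larger $M_0$ would weaken the factor $1 - M M_0$, which is why the same threshold $1/(2M)$ appears in both Proposition 7 and Proposition 8.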
For immediate successors, we therefore have $L(J') \le L(J)$. Moreover, by Proposition 7 the variation along $J'$ is bounded by a constant multiple of $L(J')$ and hence of $L(J)$. Thus, $J'$ is in the domain of definition of $U_{\theta,\Delta x}$. For general $J$ and $J'$ such that $J \prec J'$, the same conclusion holds by constructing a sequence of immediate successors to move from $J$ to $J'$. Along each step, the results above continue to hold. Finally, if $\mathrm{Var}\{U_0(\cdot)\} < \infty$, we have $L(0) < \infty$ and, for any I-curve $J$, $L(J) \le L(0)$. Thus we can conclude that $\mathrm{Var}(J) \le 2(4 + M)L(J) \le 2(4 + M)L(0) < \infty$, so our approximate solution can be defined for $t \ge 0$.
4. Existence of Weak Solutions

We use Glimm's Theorem [4] to prove existence of solutions to (1) in the ultra-relativistic limit with an equation of state of the form (5). For $\theta$ fixed and $\Delta x_n = 1/2^n$, the set of approximate solutions $\{U_{\theta,\Delta x_n}(x, t)\}_{n=1}^{\infty}$ has uniformly bounded variation by Proposition 7. Furthermore, since the variation is bounded and each approximate solution has the same limits at infinity, the sup norm is also uniformly bounded, and the approximate solutions are $L^1$ Lipschitz in time. At this point Helly's Theorem [1] provides a subsequence $U_{\theta,\Delta x_{n_i}}(x, t)$ that converges to a function $U(x, t)$ with finite variation for each fixed time. However, this alone does not justify that the limit is actually a weak solution. Glimm's Theorem guarantees that there exists a subsequence that converges to a weak solution.

Theorem 3 [4]. Assume that the approximate solution $U_{\theta,\Delta x_i}$ satisfies
\[
\mathrm{Var}\{U_{\theta,\Delta x_i}(\cdot, t)\} < N < \infty \tag{31}
\]
for $\Delta x_i = 1/2^i$, all sample sequences $\theta \in \prod_{i=0}^{\infty} [-1, 1]$, and all $t \ge 0$. Then there exists a subsequence of mesh lengths $\Delta x_{i_k}$ such that $U_{\theta,\Delta x_{i_k}} \to U$ in $L^1_{loc}$, where $U(x, t)$ satisfies $\mathrm{Var}\{U(\cdot, t)\} < N$. Furthermore, there exists a measure-zero subset of the sample space $\prod_{i=0}^{\infty} [-1, 1]$ such that, if $\theta$ lies outside this subset, then $U(x, t)$ is a weak solution to (1).

We now prove Theorem 1 by showing that our approximate solutions meet the assumptions of Glimm's Theorem.

Proof. Assume the initial data satisfies (8), (9), and (10). We show that for all $\Delta x_i$ and sample points $\theta$, $\mathrm{Var}\{U_{\theta,\Delta x}(\cdot, t)\} < N < \infty$, where $U_{\theta,\Delta x}(\rho(x, t), v(x, t), S(x, t)) = (U_1, U_2, U_3)_{\theta,\Delta x}$. First we show that the variation in $\rho$, $v$, and $S$ is bounded for all time in the approximate solutions. From Proposition 5 and Proposition 6 we have that the variation of our approximate solution in $r$ and $s$ is uniformly bounded for all time. More specifically,
\[
\mathrm{Var}_{rs}\{U_{\theta,\Delta x}(\cdot, t)\} \le 4F(0) \le 4\Bigl[\sum_0 \alpha_i + \sum_0 \beta_i + \mathrm{Var}_{rs}\{U_0\}\Bigr] \le 8 \cdot \mathrm{Var}_{rs}\{U_0(\cdot)\}.
\]
From this the variations of $\ln(\rho)$ and $\ln\frac{1+v}{1-v}$ are also bounded for all time. Using $\ln\frac{1+v}{1-v} = \frac{1}{2}(r + s)$ we have
\[
\begin{aligned}
\mathrm{Var}\Bigl\{\ln\frac{1 + v(\cdot, t)}{1 - v(\cdot, t)}\Bigr\} &= \sup_N \frac{1}{2} \sum_{i=1}^{N} \bigl|(r(x_{i+1}, t) + s(x_{i+1}, t)) - (r(x_i, t) + s(x_i, t))\bigr|, \\
&\le \frac{1}{2}\mathrm{Var}_{rs}\{U_{\theta,\Delta x}(\cdot, t)\} + \frac{1}{2}\mathrm{Var}_{rs}\{U_{\theta,\Delta x}(\cdot, t)\}, \\
&\le 8 \cdot \mathrm{Var}\{U_0(\cdot)\}.
\end{aligned}
\]
Similarly, using $\ln(\rho) = \frac{1+a^2}{a}(s - r)$ we find $\mathrm{Var}\{\ln(\rho(\cdot, t))\} \le 16\,\frac{1+a^2}{a}\,\mathrm{Var}\{U_0(\cdot)\}$.
The variation in the entropy variable is also bounded for all time in the approximate solutions. This is clear from Proposition 7 and Proposition 8, because there exists a constant $M$ such that the variation of the entropy variable of $U_{\theta,\Delta x}(\cdot, t)$ is at most $2(4 + M)L(0)$. We can now show that the variation in $\rho$, $v$ and $S$ is bounded for all time. Since $\mathrm{Var}\{\ln(\rho(\cdot, t))\} < \infty$ for all $t > 0$, there exists a constant $b > 0$ such that $\rho(x, t) < b$. Let $c = \max\{1, b\}$; then $\mathrm{Var}\{\rho(\cdot, t)\} \le c \cdot \mathrm{Var}\{\ln(\rho(\cdot, t))\}$. For $v$ we have
\[
\begin{aligned}
\mathrm{Var}\{v(\cdot, t)\} &= \sup_N \sum_{i=1}^{N} |v(x_{i+1}, t) - v(x_i, t)|, \\
&\le \sup_N \sum_{i=1}^{N} \Bigl|\ln\frac{1 + v(x_{i+1}, t)}{1 - v(x_{i+1}, t)} - \ln\frac{1 + v(x_i, t)}{1 - v(x_i, t)}\Bigr|, \\
&\le \mathrm{Var}\Bigl\{\ln\frac{1 + v(\cdot, t)}{1 - v(\cdot, t)}\Bigr\}.
\end{aligned}
\]
For $S$ we need to find a constant $C$ such that $|S(x, t) - S(y, t)|$ is at most $C$ times the corresponding difference of the entropy variable between $(x, t)$ and $(y, t)$. Since the entropy variable is of finite variation for all time, there exist a largest and a smallest value of $S$, say $S_{max}$ and $S_{min}$, with $0 < S_{min} \le S_{max}$. Define $C$ as the maximum over $S \in [S_{min}, S_{max}]$ of the reciprocal of the derivative of the entropy variable with respect to $S$, that is,
\[
C = \max_{S \in [S_{min}, S_{max}]} \frac{A(S)}{A'(S)}.
\]
It follows that the variation of $S(\cdot, t)$ is at most $C$ times the variation of the entropy variable. Finally, from Proposition 1 the determinant of the Jacobian is bounded away from zero for all approximate solutions. Thus, the variation in the conserved variables $(U_1, U_2, U_3)$ is bounded for all $t \ge 0$, $\theta$ and $\Delta x_i$. Therefore, Theorem 3 provides a measure-zero set of sample sequences such that, if we choose $\theta$ outside this set, there exists a subsequence of mesh refinements $\Delta x_{i_k} \to 0$ such that $U_{\theta,\Delta x_{i_k}}$ converges in $L^1_{loc}$, and pointwise almost everywhere, to a weak solution $U(x, t)$ of (1). Moreover, this solution satisfies (11), (12) and (13) for some $N > 0$ and all $t > 0$, and is $L^1$ Lipschitz in time.
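The two elementary comparisons used for $\rho$ and $v$ in the proof above both follow from the mean value theorem. The following is a sketch, assuming $0 < \rho \le b$ and $|v| < 1$ (the speed of light normalized to one); the subscripts 1 and 2 denote values at two arbitrary points and are introduced here only for illustration.

```latex
% Assumes 0 < \rho_1, \rho_2 \le b and |v_1|, |v_2| < 1.
\begin{align*}
|\rho_1 - \rho_2|
  &= \bigl| e^{\ln \rho_1} - e^{\ln \rho_2} \bigr|
   \le \max\{\rho_1, \rho_2\}\, |\ln \rho_1 - \ln \rho_2|
   \le b\, |\ln \rho_1 - \ln \rho_2|, \\[4pt]
\frac{d}{dv} \ln\frac{1+v}{1-v}
  &= \frac{1}{1+v} + \frac{1}{1-v} = \frac{2}{1 - v^2} \ge 2
  \quad\Longrightarrow\quad
  |v_1 - v_2| \le \tfrac12 \Bigl| \ln\frac{1+v_1}{1-v_1} - \ln\frac{1+v_2}{1-v_2} \Bigr| .
\end{align*}
```

Summing either inequality over a partition $x_1 < \dots < x_{N+1}$ and taking the supremum gives the corresponding variation bound; the factor $\tfrac12$ in the second line is slightly sharper than what the proof requires.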
Acknowledgements. This paper is part of a doctoral thesis written under Professor J.B. Temple at the University of California, Davis.
5. Appendix: Interaction Estimates

In this section we discuss four cases of the interaction estimates needed to prove Proposition 3. In total there are sixteen possible incoming wave profiles, corresponding to whether each of the four incoming waves is a shock or a rarefaction wave, and between one and four outgoing wave configurations. The main consequence of our estimates is that after an interaction there can be an increase in the strengths of the shock waves in one class, but it is accompanied by a corresponding decrease in overall shock strength in the other class. We assume that all the interactions occur in a simply connected compact subset of $\mathbb{R}^2$ and, as in (20), we define $C_0$ as the maximum of $1/2$ and the maximum slope of the largest shock wave contained in this set.
Fig. 2. The change in Riemann invariants along shock curves satisfies $y/z < C_0$
For these estimates we repeatedly use the facts that the shock curves in the space of Riemann invariants are translationally invariant and convex, and that their derivatives are bounded above by $C_0$. Since our definition of wave strength is determined by the change in the Riemann invariant $r$ for 1-waves and $s$ for 3-waves, we use the following two facts: one, the change in $s$ (respectively $r$) along a 1-shock (respectively 3-shock) is uniformly bounded by the change in $r$ (respectively $s$); and two, if two shock waves of the same family begin at two distinct states $U_1$ and $U_2$ and meet at a common third state $U_3$, then the ratio of the distances along the $r$ and $s$ axes from $U_1$ to $U_2$ is bounded above by $C_0$. These two facts are shown geometrically in Fig. 2. We also note that interaction estimates are often similar for cases where the shock and rarefaction waves are permuted in either the incoming or outgoing waves. For example, one can show the estimates hold in a similar manner for the four permutations of three incoming shock waves and one rarefaction wave.

We begin by noticing that after an interaction, the strengths of the shock waves cannot increase in both families at once. Suppose that $B > 0$. If the outgoing 1-wave is a rarefaction wave, we are done since $A = -\alpha_1 - \alpha_2 \le 0$. Suppose now that the outgoing 1-wave is a shock. Since the starting and ending states, $U_L$ and $U_R$, are fixed before and after an interaction, the total change in Riemann invariants is the same. Equating the change in $r$ we find
\[
-\alpha' - \Delta r_{\beta'} = -\alpha_1 - \alpha_2 + \mu_1 + \mu_2 - \Delta r_{\beta_1} - \Delta r_{\beta_2},
\]
where $\Delta r_{\beta}$ is the change in $r$ across the $\beta$ shock. Rearranging terms we get
\[
\alpha' - \alpha_1 - \alpha_2 = -\mu_1 - \mu_2 + \Delta r_{\beta_1} + \Delta r_{\beta_2} - \Delta r_{\beta'}.
\]
By strict concavity of the shock curves and $\beta' > \beta_1 + \beta_2$, we have $\Delta r_{\beta_1} + \Delta r_{\beta_2} - \Delta r_{\beta'} < 0$ and thus $A = \alpha' - \alpha_1 - \alpha_2 \le 0$. Proposition 3 refines this result further. It states that the increase in $B$ is strictly bounded above by the decrease in $A$.
For our first wave interaction estimate consider the case with four incoming shock waves, $(\alpha_1, \beta_1) + (\alpha_2, \beta_2)$, for which there are three possible outgoing wave profiles: two shock waves, $(\alpha', \beta')$, or one rarefaction wave and one shock wave, $(\mu', \beta')$ or $(\alpha', \eta')$. Consider the interaction $(\alpha_1, \beta_1) + (\alpha_2, \beta_2) \to (\alpha', \beta')$ and suppose that $A \ge 0$. (The case with $B \ge 0$ is similar.) See Fig. 3. We have $B = \beta' - \beta_1 - \beta_2 = -z_1 - z_2 = -\xi$ and $A = \alpha' - \alpha_1 - \alpha_2 = y_1 + y_2 < C_0(z_1 + z_2) = C_0\,\xi$. Now consider the same interaction, but with outgoing waves $(\alpha', \eta')$. In this case, $B = -\beta_1 - \beta_2 = -z$ and $A = \alpha' - \alpha_1 - \alpha_2 = y < C_0 z$. The interaction $(\alpha_1, \beta_1) + (\alpha_2, \eta_2)$ has two possible outgoing profiles, $(\alpha', \beta')$ or $(\alpha', \eta')$. See Fig. 4. For the first case, $B = \beta' - \beta_1 = -z$ and $\alpha' = \alpha_1 + \alpha_2 + y$. Hence, $A = y < C_0 z$. For the outgoing waves $(\alpha', \eta')$, $B = -z$ and $A = y < C_0 z$. Now consider the incoming waves $(\alpha_1, \eta_1)$ and $(\mu_2, \beta_2)$. If two rarefaction waves are produced, $A, B \le 0$ and we are done. For one or two outgoing shocks we look at the outgoing cases $(\alpha', \eta')$ and $(\alpha', \beta')$. See Fig. 5. With one shock, either $A, B \le 0$
Fig. 3. Interaction $(\alpha_1, \beta_1) + (\alpha_2, \beta_2)$, with result (a) $(\alpha', \eta')$ and (b) $(\alpha', \beta')$
Fig. 4. Interaction $(\alpha_1, \beta_1) + (\alpha_2, \eta_2)$, with result (a) $(\alpha', \eta')$ and (b) $(\alpha', \beta')$
Fig. 5. Interaction $(\alpha_1, \eta_1) + (\mu_2, \beta_2)$, with result (a) $(\alpha', \eta')$ and (b) $(\alpha', \beta')$
or $B = -\beta_2 = -z$ and $A = y - \mu_2 \le y < C_0 z$. For two outgoing shocks with $A \ge 0$, $B = -z$ and $A = y < C_0 z$. Lastly, we consider the interaction of three rarefaction waves and a shock wave, $(\alpha_1, \eta_1) + (\mu_2, \eta_2)$; see Fig. 6. The cases with outgoing waves $(\mu', \eta')$ and $(\alpha', \eta')$ have $A, B \le 0$. For $(\alpha_1, \eta_1) + (\mu_2, \eta_2) \to (\mu', \beta')$, we have $A = -\alpha_1 = -z$ and $B = \beta' \le y < C_0 z$, and for $(\alpha_1, \eta_1) + (\mu_2, \eta_2) \to (\alpha', \beta')$ we have $A = \alpha' - \alpha_1 = -z$ and $B = \beta' \le y < C_0 z$.
Fig. 6. Interaction $(\alpha_1, \eta_1) + (\mu_2, \eta_2)$, with result (a) $(\mu', \beta')$ and (b) $(\alpha', \beta')$
References
1. Bressan, A.: Hyperbolic Systems of Conservation Laws: The One-Dimensional Cauchy Problem. Oxford: Oxford University Press, 2000
2. Cercignani, C., Kremer, G.M.: The Relativistic Boltzmann Equation: Theory and Applications. Basel-Boston: Birkhäuser Verlag, 2002
3. Colombo, R., Risebro, N.H.: Continuous dependence in the large for some equations of gas dynamics. Comm. Partial Diff. Eqs. 23, 1693–1718 (1998)
4. Glimm, J.: Solutions in the large for nonlinear hyperbolic systems of equations. Comm. Pure Appl. Math. 18, 697–715 (1965)
5. Lewicka, M.: Well-Posedness for hyperbolic systems of conservation laws with large BV data. Arch. Rat. Mech. Anal. 173, 415–445 (2004)
6. Nishida, T.: Global Solution for an Initial Boundary Value Problem of a Quasilinear Hyperbolic System. Proc. Jap. Acad. 44, 642–646 (1968)
7. Taub, A.H.: Relativistic Rankine-Hugoniot Equations. Phys. Rev. 74(3), 328–334 (1948)
8. Smoller, J.: Shock Waves and Reaction-Diffusion Equations, 2nd Ed. Berlin-Heidelberg-New York: Springer, 1994
9. Smoller, J., Temple, B.: Global solutions of the Relativistic Euler equations. Commun. Math. Phys. 156, 67–99 (1993)
10. Temple, B.: Solutions in the large for the nonlinear hyperbolic conservation laws of gas dynamics. J. Diff. Eq. 41(1), 96–161 (1981)
11. Weinberg, S.: Gravitation and Cosmology: Principles and Applications of the General Theory of Relativity. New York: Wiley, 1972

Communicated by P. Constantin