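$$\int_{\text{configurations } \phi} e^{-\frac{1}{2}\langle Q\phi, \phi\rangle}\, \mathcal{D}_Q[\phi] = ({\det}_\zeta Q)^{-\frac{1}{2}}, \tag{A.1}$$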
where $Q$ is an invertible admissible elliptic operator with positive order. The integrals on the infinite dimensional configuration space of the physical system are therefore to be understood as the well-defined ζ-determinant on the r.h.s. The “volume measures” $\mathcal{D}_Q[\phi]$, which are there to remind us that we are mimicking the finite dimensional integration procedure, can a priori depend on $Q$, a dependence one needs to take into account in the following. Just as the operator $Q$ “weights” a priori divergent traces in a way that enables us to extract a finite part, it serves here to “extract a finite part” of a priori ill-defined formal path integrals.

Let us see how this $Q$-dependence can affect the computations. Starting from the finite dimensional setting, let us make the change of variable $\tilde x = Cx$ in a Gaussian integral and denote by $J$ the corresponding Jacobian determinant:

$$(\det Q)^{-\frac{1}{2}} = \int_{\mathbb{R}^n} e^{-\frac{1}{2}\langle Q\tilde x, \tilde x\rangle}\, d\tilde x = \int_{\mathbb{R}^n} e^{-\frac{1}{2}\langle C^* Q C x, x\rangle}\, J\, dx = J \cdot \det(C^* Q C)^{-\frac{1}{2}}.$$

Furthermore

$$J := \frac{(\det(C^* Q C))^{\frac{1}{2}}}{(\det Q)^{\frac{1}{2}}} = \sqrt{\det(C^* C)} = |\det C|.$$
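As a quick sanity check of the finite dimensional identity $J = |\det C|$, the following sketch compares $J^2 = \det(C^* Q C)/\det Q$ with $(\det C)^2$ in exact rational arithmetic. The matrices `Q` and `C` and the helper names `det`, `matmul` and `transpose` are illustrative choices, not taken from the article; real matrices are assumed, so that $C^*$ is the transpose.

```python
from fractions import Fraction


def det(m):
    """Determinant by exact Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in m]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        # Find a row with a nonzero pivot in this column.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


# Hypothetical example data: Q symmetric positive definite, C invertible.
Q = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
C = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]

# J^2 computed as the quotient of determinants appearing in the text.
j_squared = det(matmul(matmul(transpose(C), Q), C)) / det(Q)
print(j_squared == det(C) ** 2)
```

Working with $J^2$ avoids square roots, so the comparison is exact. In finite dimensions the determinant is multiplicative, which is precisely the property that fails for ζ-determinants.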
Similarly, replacing ordinary determinants by ζ-determinants, one could expect the modulus of the Jacobian determinant of a change of variable $\tilde\phi = C\phi$ in (A.1) to correspond to a quotient of ζ-determinants. But at this point the multiplicative anomaly gets in the way. Let $C$ be an invertible elliptic operator (with possibly zero order) and $C^*$ its formal adjoint (with respect to an $L^2$ structure on the space of sections it is acting on). Assuming that $Q$ is positive (or “sufficiently close” to a positive operator [KV, Du]), $C^* Q C$ is a positive elliptic operator (or “sufficiently close” to a positive operator) with positive order, so that we can define its ζ-determinant. Applying a computation similar to the finite dimensional one would yield:

$$J_Q := \frac{{\det}_\zeta(C^* Q C)^{\frac{1}{2}}}{({\det}_\zeta Q)^{\frac{1}{2}}}.$$

But this does not generally coincide with

$$\tilde J := \sqrt{{\det}_\zeta(C^* C)}.$$
A. Cardona, C. Ducourtioux, S. Paycha
In any case the latter determinant is only defined if $C$ has non-vanishing positive order, which is not always the case in applications, where $C$ could typically be a multiplication operator. The fact that $\tilde J \neq J_Q$ is a consequence of the multiplicative anomaly for ζ-determinants recalled in (16), as the following computation shows:

$$J_Q^2 = \frac{{\det}_\zeta(C^* Q C)}{{\det}_\zeta Q} = \frac{{\det}_\zeta(Q C^* C)}{{\det}_\zeta Q} = F_\zeta(Q, C^* C)\, {\det}_\zeta(C^* C) = F_\zeta(Q, C^* C) \cdot \tilde J^2. \tag{A.2}$$
The second identity follows from interpolating $C^* Q C$ and $Q C^* C$ by the family $Q_t := Q^t C^* Q^{1-t} C$, $t \in [0, 1]$, of constant order elliptic operators which have a constant determinant since $\frac{d}{dt} \log {\det}_\zeta Q_t = 0$. The third identity follows from (16).

B. Computation of the Chern-Simons Term in TQFT in Dimension 3 Using the Atiyah-Patodi-Singer Theorem

Theorem [APS II]. Let $X$ be an oriented Riemannian manifold of dimension $4l$ with boundary $M$ such that $X$ is isometric to a product $M \times I$, $I \subset \mathbb{R}$, near the boundary. Let $\nabla^W$ be a connection on the exterior bundle $W$ based on $X$ and $\nabla^{L.C.}$ the Levi-Civita connection on $X$. Let $D_\nabla := d_\nabla + d_\nabla^*$, where $d_\nabla = d \otimes 1 + 1 \otimes \nabla^W$ and $d_\nabla^* = d^* \otimes 1 + 1 \otimes \nabla^W$ as in Sect. 5, and let $D_\nabla^+$ denote the restriction of $D_\nabla$ to the even forms on $X$. Near the boundary,

$$D_\nabla^+ = c \circ \left(\frac{d}{dt} + B^{odd}\right),$$

where $B^{odd}$ is the restriction to odd forms on the boundary of the operator defined on $2p$ or $2p + 1$ forms by

$$B_\nabla = (-1)^{k+p+1} (\epsilon * d_\nabla - d_\nabla *),$$

$\epsilon$ denoting the grading operator on forms. We let the operator $D_\nabla^+$ act on sections $f$ of the vector bundle satisfying the Atiyah-Patodi-Singer (APS) boundary condition $Pf(\cdot, 0) = 0$, where $P$ is the spectral projection of $B^{odd}$ corresponding to non-negative eigenvalues. Then

$$\mathrm{ind}\, D_\nabla^+ = \int_X L(\nabla^{L.C.})\, \mathrm{tr}_x\, e^{-\Omega^W} + \eta_B(0),$$

where $L$ is the Hirzebruch L polynomial, $\Omega^W$ the curvature on $W$, and where $\eta_B$ denotes the η invariant of $B^{odd}$.

Let us apply this result to $X = M \times [0, 1]$, where $M$ is a $(4l - 1)$-dimensional closed Riemannian manifold, and let us equip $X$ with the product metric. The boundary of $X$ is the odd dimensional manifold $M \times \{0\} \sqcup M \times \{1\}$. With the notations of the above theorem, where we set $p = k$ since $k$ is odd, we have $B_k = * d_k - d_{n-k} *$, where $B_k$ is the restriction of $B$ to the odd $k$ forms. Since $*^2 = 1$ on $k$ forms in dimension $n = 2k + 1$, we have $d_{n-k}^* = - * d_k *$, so that the restriction of $B_k$ to $R(d_{k-1}^*)$ coincides with the restriction of $* d_k$.
From Tracial Anomalies to Anomalies in Quantum Field Theory
In order to compute the r.h.s. of (40) we need to compute the difference of η-invariants of $B_k$. Following Atiyah, Patodi and Singer, let us first investigate the metric dependence of the eta invariants $\eta_{*d_k}(0)$ in order to build an invariant independent of the choice of metric. To two metrics $g$ and $g'$ on $M$ correspond two operators $B$ and $B'$, and it follows from the Atiyah-Patodi-Singer index theorem that (see (2.3) in [APS II]):

$$\eta_{B'}(0) - \eta_B(0) = n \int_{M \times [0,1]} L(\nabla^{L.C.}) \tag{B.3}$$
using the fact that $\mathrm{sign}(M \times [0, 1]) = 0$ and that the connection on $W$ is flat. Let us now fix the metric and take two flat connections $\nabla_0^W$ and $\nabla_1^W$ on $W$ restricted to $M$, this leading again to two η invariants $\eta_{B_{k,1}}(0)$ and $\eta_{B_{k,0}}(0)$. From the above it follows that their difference is independent of the choice of metric (see Theorem 2.4 in [APS II]). We now equip $W$ restricted to $M$ with a one parameter family of connections $\nabla_t^W := (1 - t)\nabla_0^W + t\nabla_1^W$ and correspondingly a one parameter family of operators $B_t = (-1)^{k+p+1} (\epsilon * d_t - d_t *)$. We can equip $W$, seen as a bundle over $X = [0, 1] \times M$, with the connection $\nabla^W := \frac{d}{dt} + \nabla_t^W$ and build the corresponding Dirac operator:

$$D_\nabla^+ = c \circ \left(\frac{d}{dt} + B_t^{odd}\right).$$
Because $\eta_{B_{k,1}}(0) - \eta_{B_{k,0}}(0)$ does not depend on the choice of metric, we can choose a flat metric. Thus the $L$ form will be trivial. On the other hand $\mathrm{sgn}(X) = 0$ for the particular choice of manifold $X = M \times [0, 1]$ we took, so that the spectral flow vanishes. Applying once again the Atiyah-Patodi-Singer theorem yields:

$$\eta_{B_{k,1}}(0) - \eta_{B_{k,0}}(0) = \int_{M \times [0,1]} \mathrm{tr}_x\, e^{-\Omega^W}. \tag{B.4}$$
Combining (B.3) and (B.4), where the Levi-Civita connection reads $d + \omega$ and the connection on $W$ reads $\nabla^W = d + A$ (provided both the tangent bundle and the bundle $E$ are trivial), yields the expression of the Chern-Simons term computed by Witten (see formula (2.23) in [Wi]).

Acknowledgement. The last author would like to thank Sergio Albeverio, Edwin Langmann and Jouko Mickelsson for giving her the opportunity to present the results of this article at an early stage of their development and for the fruitful discussions that followed. She is also very grateful to Daniel Bennequin who made very useful comments at a later stage of the development of the results presented here. The authors also benefited from useful and interesting discussions with Jose Gracia Bondia, Matthias Lesch and Jean Orloff whom we would like to thank most warmly. The first author was supported by an ECOS-Nord grant during a stay at the Université Blaise Pascal in Clermont-Ferrand where this article was written. The last author thanks the von Humboldt Stiftung for the financial support during a one month stay at the University of Bonn. The authors also want to address their thanks to the referee for his/her very useful comments.
References

[Ad] Adler, S.: Axial vector vertex in spinor electrodynamics. Phys. Rev. 177, 2426–2438 (1969)
[AdSe] Adams, D., Sen, S.: Phase and scaling properties of determinants arising in topological field theories. Phys. Lett. 353, 495–500 (1995)
[AG] Alvarez-Gaume, L.: Supersymmetry and the Atiyah-Singer index theorem. Commun. Math. Phys. 90, 161–173 (1983)
[AGDPM] Alvarez-Gaume, L., Della Pietra, S., Moore, G.: Anomalies and odd dimensions. Ann. Phys. 163, 288–317 (1985)
[AM] Arnlind, J., Mickelsson, J.: Trace extensions, determinant bundles, and gauge group cocycles. hep-th/0205126, 2002
[At] Atiyah, M.: The Geometry and Physics of Knots. Cambridge: Cambridge University Press, 1990
[APS I] Atiyah, M., Patodi, V., Singer, I.M.: Spectral asymmetry and Riemannian Geometry I. Math. Proc. Camb. Phil. Soc. 77, 43–69 (1975)
[APS II] Atiyah, M., Patodi, V., Singer, I.M.: Spectral asymmetry and Riemannian Geometry II. Math. Proc. Camb. Phil. Soc. 78, 405–432 (1975)
[APS III] Atiyah, M., Patodi, V., Singer, I.M.: Spectral asymmetry and Riemannian Geometry III. Math. Proc. Camb. Phil. Soc. 79, 71–99 (1976)
[AS] Atiyah, M., Singer, I.M.: Dirac operators coupled to vector potentials. Proc. Natl. Acad. Sci. USA 81, 2597–2600 (1984)
[Ba] Baadhio, R.: Quantum Topology and Global Anomalies. Adv. Ser. Math. Phys. 23, Singapore: World Scientific, 1996
[BJ] Bell, J.S., Jackiw, R.: A PCAC Puzzle: π⁰ → γγ in the σ model. Il Nuovo Cimento LX A, 47–61 (1969)
[Bar] Bardeen, W.A.: Anomalous Ward identities in spinor field theories. Phys. Rev. 184, 1848–1859 (1969)
[Ber] Bertlmann, R.: Anomalies in Quantum Field Theory. Oxford: Oxford University Press, 1996
[BF] Bismut, J.-M., Freed, D.: The analysis of elliptic families I. Commun. Math. Phys. 106, 159–176 (1986)
[BGV] Berline, N., Getzler, E., Vergne, M.: Heat Kernels and Dirac Operators. Berlin-Heidelberg-New York: Springer-Verlag, 1992
[BLP] Booss-Bavnbek, B., Lesch, M., Phillips, J.: Spectral flow of paths of self-adjoint Fredholm operators. Nucl. Phys. (Proc. Suppl.) 104, 177–180 (2002); Unbounded Fredholm Operators and Spectral Flow. Preprint TEKST Nr 407, Roskilde University (2001)
[C] Cardona, A.: Geometry of Families of Elliptic Complexes, Duality and Anomalies. Ph.D. thesis, Mathematics Department, Université Blaise Pascal, 2002
[CDMP] Cardona, A., Ducourtioux, C., Magnot, J.-P., Paycha, S.: Weighted traces on algebras of pseudo-differential operators and geometry on loop groups. Infinite Dim. Anal. Quant. Prob. Rel. Top. 5(4), 503–540 (2002)
[CS] Chern, S.-S., Simons, J.: Characteristic forms and geometric invariants. Ann. Math. 99, 48–69 (1974)
[CZ] Cognola, G., Zerbini, S.: Consistent, covariant and multiplicative anomalies. hep-th98110398, 1998
[Do] Dowker, J.S.: On the relevance of the multiplicative anomaly. hep-th/9803200, 1998
[Du] Ducourtioux, C.: Weighted Traces on Pseudo-differential Operators and Associated Determinants. Ph.D. thesis, Mathematics Department, Université Blaise Pascal, 2001
[E] Eckstrand, C.: A simple algebraic derivation of the covariant anomaly and Schwinger term. J. Math. Phys. 41(11), 7294–7303 (2000)
[EM] Eckstrand, C., Mickelsson, J.: Gravitational anomalies, gerbes and hamiltonian quantization. Commun. Math. Phys. 212, 613–624 (2000)
[ECZ] Elizalde, E., Cognola, G., Zerbini, S.: Applications in physics of the multiplicative anomaly formula involving some basic differential operators. Nucl. Phys. B 532(1-2), 407–428 (1998)
[EFVZ] Elizalde, E., Filippi, A., Vanzo, L., Zerbini, S.: Is the multiplicative anomaly relevant? hep-th/9804072, 1998
[FU] Freed, D., Uhlenbeck, K.: Instantons and Four-manifolds. Berlin-Heidelberg-New York: Springer-Verlag, 1984
[Fr] Friedrich, R.: Dirac Operatoren in der Riemannschen Geometrie. Advanced Lectures in Mathematics, Vieweg, 1997
[Fu] Fujikawa, K.: Path integral measure for gauge invariant fermion theories. Phys. Rev. Lett. 42, 1195 (1979)
[GJ] Gross, D.J., Jackiw, R.: Effect of anomalies on quasirenormalizable theories. Phys. Rev. D6, 477–493 (1972)
[Gr] Grubb, G.: Functional calculus of pseudodifferential boundary problems. Progress in Mathematics 65, Basel-Boston: Birkhäuser, 1996
[KV] Kontsevich, M., Vishik, S.: Determinants of elliptic pseudo-differential operators. Max Planck Institut preprint, 1994
[L] Lesch, M.: On the non commutative residue for pseudo-differential operators with log-polyhomogeneous symbols. Annals of Global Analysis and Geometry 17, 151–187 (1999)
[LM] Langmann, E., Mickelsson, J.: Elementary derivation of the chiral anomaly. Lett. Math. Phys. 36, 45–54 (1996)
[LaMi] Lawson, H., Michelsohn, M.-L.: Spin Geometry. Princeton: Princeton University Press, 1989
[Me] Melrose, R.: The Atiyah-Patodi-Singer Index Theorem. Research Notes in Mathematics, Vol. 4, Wellesley, MA: A K Peters, Ltd., 1993
[MN] Melrose, R., Nistor, V.: Homology of pseudo-differential operators I. Manifolds with boundary. funct-an/9606005, June 1999
[M] Mickelsson, J.: Second Quantization, anomalies and group extensions. In: Lecture notes given at the “Colloque sur les Méthodes Géométriques en physique”, C.I.R.M, Luminy, June 1997; Wodzicki residue and anomalies on current algebras. In: “Integrable Models and Strings”, A. Alekseev et al. (eds.), Lecture Notes in Physics 436, Berlin-Heidelberg-New York: Springer, 1994
[MR] Mickelsson, J., Rajeev, S.: Current algebras in d + 1 dimensions and determinant bundles over infinite-dimensional Grassmannians. Commun. Math. Phys. 116, 365–400 (1985)
[N] Nakahara, M.: Geometry, Topology and Physics. Bristol: Adam Hilger, 1990
[O] Okikiolu, K.: The Campbell-Hausdorff theorem for elliptic operators and a related trace formula. Duke Math. J. 79, 687–722 (1995); The multiplicative anomaly for determinants of elliptic operators. Duke Math. J. 79, 723–750 (1995)
[P1] Paycha, S.: Renormalized traces as a looking glass into infinite dimensional geometry. Inf. Dim. Anal. Quant. Prob. Rel. Top. 4(2), 221–266 (2001)
[PR] Paycha, S., Rosenberg, S.: Curvature on determinant bundles and first Chern forms. J. Geom. Phys. 45, 393–429 (2003)
[PR2] Paycha, S., Rosenberg, S.: Traces and characteristic classes on Loop spaces. To appear in “Infinite dimensional groups and manifolds”. Proceedings of the 70th Meeting of Theoretical Physicists and Mathematicians held in Strasbourg, May 23–25, 2002. Edited by Tilmann Wurzbacher. IRMA Lectures in Mathematics and Theoretical Physics. Berlin: Walter de Gruyter & Co.
[Q1] Quillen, D.: Determinants of Cauchy-Riemann operators over a Riemann surface. Funct. Anal. Appl. 19, 37–41 (1985)
[Q2] Quillen, D.: Superconnections and the Chern character. Topology 24, 89–95 (1985)
[R] Radul, A.O.: Lie algebras of differential operators, their central extensions, and W-algebras. Funct. Anal. Appl. 25, 25–39 (1991)
[RS] Ray, D.B., Singer, I.M.: R-torsion and the Laplacian on Riemannian manifolds. Adv. Math. 7, 145–210 (1971)
[Sc] Schwarz, A.: The partition function of a degenerate functional. Commun. Math. Phys. 67, 1–16 (1979)
[Si] Singer, I.M.: Families of Dirac operators with applications to physics. Astérisque (hors série), 323–340 (1985)
[TJZW] Treiman, S., Jackiw, R., Zumino, B., Witten, E.: Current Algebra and Anomalies. Singapore: World Scientific, 1985
[Wi] Witten, E.: Quantum field theory and the Jones polynomial. Commun. Math. Phys. 121, 351–399 (1989)
[Wo] Wodzicki, M.: Non-commutative residue. In: Lecture Notes in Mathematics, 1289, Berlin-Heidelberg-New York: Springer Verlag, 1987
Communicated by M. Aizenman
Commun. Math. Phys. 242, 67–80 (2003)
Digital Object Identifier (DOI) 10.1007/s00220-003-0922-5
Communications in Mathematical Physics

Arithmetic and Equidistribution of Measures on the Sphere

Siegfried Böcherer¹, Peter Sarnak², Rainer Schulze-Pillot³

¹ Kunzenhof 4B, 79117 Freiburg, Germany. E-mail: [email protected]
² Department of Mathematics, Princeton University, Fine Hall, Princeton, NJ 08544, USA. E-mail: [email protected]
³ FR 6.1 Mathematik, Universität des Saarlandes, Postfach 151150, 66041 Saarbrücken, Germany. E-mail: [email protected]
Received: 18 December 2002 / Accepted: 16 April 2003 Published online: 22 September 2003 – © Springer-Verlag 2003
Abstract: Motivated by problems of mathematical physics (quantum chaos), questions of equidistribution of eigenfunctions of the Laplace operator on a Riemannian manifold have been studied by several authors. We consider here, in analogy with arithmetic hyperbolic surfaces, orthonormal bases of eigenfunctions of the Laplace operator on the two dimensional unit sphere which are also eigenfunctions of an algebra of Hecke operators which act on these spherical harmonics. We formulate an analogue of the equidistribution of mass conjecture for these eigenfunctions as well as of the conjecture that their moments tend to moments of the Gaussian as the eigenvalue increases. For such orthonormal bases we show that these conjectures are related to the analytic properties of degree eight arithmetic L-functions associated to triples of eigenfunctions. Moreover we establish the conjecture for the third moments and give a conditional (on standard analytic conjectures about these arithmetic L-functions) proof of the equidistribution of mass conjecture.

Electronic Supplementary Material: Supplementary material is available in the online version of this article at http://dx.doi.org/10.1007/s00220-003-0922-5

Part of this work was done at the Institute for Advanced Study, Princeton, NJ.

1. Introduction

Let $X$ be a Riemannian manifold of finite volume. Starting out from problems of theoretical physics (quantum chaos), several authors have recently studied questions of equidistribution of eigenfunctions of the Laplace operator. In particular, precise versions of conjectures on equidistribution properties have been put forward by Rudnick and Sarnak [15] for arithmetic hyperbolic manifolds $X = \Gamma \backslash H$, where $H$ is the upper half plane of the complex numbers and $\Gamma$ an arithmetic subgroup of $SL_2(\mathbb{R})$, and for eigenfunctions of the Laplace operator that are eigenfunctions of the (arithmetically defined) Hecke operators as well. The phenomenon of (conjectural) equidistribution of eigenfunctions in this arithmetical situation is one of the central problems in what has become known as arithmetic quantum chaos.

We investigate here the analogous question for the situation of the 2-dimensional unit sphere. Although the dynamics of geodesics for this manifold is certainly not chaotic, it turns out that it nevertheless makes sense to look for an equidistribution property of eigenfunctions. At first sight, the well known fact that the usual spherical eigenfunctions $Y_{l,m}$ (see [20, Chap. III]) concentrate for $l = m \to \infty$ around the equator [3] seems to contradict the expectation of equidistribution, but since the eigenvalues occur on the sphere with multiplicities bigger than one, it makes sense to look into the question of what happens if one varies the basis of eigenfunctions. In this direction, it has been proved by Zelditch [22] that for a random orthonormal basis of eigenfunctions the equidistribution of mass conjecture is true. We consider here, in analogy to the arithmetic hyperbolic surfaces, an orthonormal basis of eigenfunctions of the Laplace operator that are also eigenfunctions of an algebra of Hecke operators that acts on the space of spherical functions. The papers [11, 19] also consider questions of the behaviour of eigenfunctions for such bases of spherical harmonics.

Concretely, a definite quaternion algebra such as the Hamilton quaternions $H$ over $\mathbb{Q}$ gives rise to Hecke operators on $L^2(S^2)$ (see [4, 13]). For $\alpha = x_0 + x_1 i + x_2 j + x_3 k \in H(\mathbb{R})$, let

$$S(\alpha) = \frac{1}{\sqrt{N(\alpha)}} \begin{pmatrix} x_0 + x_1 i & x_2 + x_3 i \\ -x_2 + x_3 i & x_0 - x_1 i \end{pmatrix} \in SU(2).$$

Here

$$N(\alpha) = \alpha \bar\alpha = x_0^2 + x_1^2 + x_2^2 + x_3^2.$$

For $n \geq 1$ an odd integer, define the Hecke operator $T_n$ on $L^2(S^2)$ by

$$(T_n \phi)(P) = \sum_{\substack{\alpha \in H(\mathbb{Z}) \\ N(\alpha) = n}} \phi(S(\alpha) P),$$
where $P \in S^2$ and $SU(2)$ acts on $S^2$ by isometries, after one realizes $S^2$ as $\mathbb{C} \cup \{\infty\}$ via stereographic projection and lets $SU(2)$ act by linear fractional transformations. The $T_n$'s are selfadjoint; they commute with each other as well as with the Laplacian on the round sphere. Thus the $T_n$'s can be simultaneously diagonalized in each of the $2\nu + 1$ dimensional spaces $H_\nu$, consisting of spherical harmonics on $S^2$ of degree $\nu$ (that is, the restrictions of harmonic polynomials in $\mathbb{R}^3$, homogeneous of degree $\nu$). This algebra of Hecke operators arises naturally if one views the spherical harmonics as components at infinity of automorphic forms on the multiplicative group of the adelization of the rational Hamilton quaternions. We denote by $\psi_\nu$ such a Hecke eigenform, with $\nu$ indicating its degree (so that its Laplace eigenvalue is $\nu(\nu + 1)/2$). The analogue of the equidistribution of mass conjecture [15] for the $\psi_\nu$'s is the following:

Conjecture 1. Normalize $\psi_\nu$ on $S^2$ to have $L^2$-norm equal to 1, so that $\mu_{\psi_\nu} := |\psi_\nu(P)|^2\, dv(P)$ is a probability measure. Then

$$\lim_{\nu \to \infty} \mu_{\psi_\nu} = \frac{dv}{2\pi},$$

in the sense of integration against continuous functions on $S^2$.

The analogue of the Gaussian equidistribution conjecture of Berry and others [7] in this context is as follows:

Conjecture 2. Fix $q \geq 0$ an integer, then

$$\lim_{\nu \to \infty} \int_{S^2} \psi_\nu^q\, dv = \frac{c_q}{(2\pi)^{q/2}},$$
where $c_q$ is the $q$-th moment of the Gaussian distribution.

By the work of Eichler [4] and of Jacquet/Langlands it is known that there is a correspondence between spherical harmonic polynomials and modular forms via the theory of theta series with spherical harmonics. This correspondence is Hecke-equivariant, and thus methods and results from the theory of modular forms, in particular from the theory of L-functions associated to Hecke eigenforms (or irreducible automorphic representations), can be used in the study of the spherical harmonics. The crucial point for our study of the integrals appearing in the equidistribution conjecture above is a formula proved in [2] that connects the integral of a product of 3 eigenfunctions over the sphere with the central critical value of the automorphic L-functions associated to a triple of modular Hecke eigenforms; this allows one to connect the equidistribution conjecture with conjectural properties of such automorphic L-functions. We note in passing that such integrals of products of 3 eigenfunctions of the Laplace operator on the sphere have been considered in various places in the physics literature, see [17].

The purpose of this note is to show that combining the main formula in [2] with the recent subconvex estimates for special values of L-functions of holomorphic modular forms [14] allows one to prove Conjecture 2 for $q = 3$ (the cases $q = 1$ and $q = 2$ are obvious). We also show that Conjecture 1 would follow from subconvex estimates for the degree 8 L-functions mentioned above. Such subconvex estimates are an immediate consequence of the Riemann Hypothesis for these L-functions. At the present time such subconvex estimates are known only for special forms, see [16] and [10]. In his recent thesis [21], Watson has derived general explicit identities relating integrals of products of 3 Maass (or holomorphic) Hecke eigenforms on arithmetic surfaces to special values of degree 8 L-functions. As a consequence he obtains similar results for “chaotic” eigenstates.

The estimates in this article depend on the results of [2], in some cases in corrected versions. A revised version of that article and a list of errata are available at www.math.uni-sb.de/~ag-schulze/Preprints; the list of errata is also attached to the electronic version of this article.

2. Equidistribution

Our first goal is to describe explicitly the connection between the central critical value of the triple product L-function associated to a triple of cusp forms on one side and integrals of harmonic polynomials over the unit sphere on the other side. We first fix some notation.
We consider a definite quaternion algebra $D$ of discriminant $N$ (where $N$ is the product of the primes ramified in $D$) over $\mathbb{Q}$ and a maximal order $R$ in $D$; we assume that the class number (i.e., the number of classes of left $R$-ideals) of $D$ is 1; this restricts $D$ to be one of the algebras of discriminant equal to 2, 3, 5, 7, 13. On $D$ we have the involution $x \mapsto \bar x$, the (reduced) trace $tr(x) = x + \bar x$ and the (reduced) norm $n(x) = x\bar x$.

For $\nu \in \mathbb{N}$ let $U_\nu^{(0)}$ be the space of homogeneous harmonic polynomials of degree $\nu$ on $\mathbb{R}^3$ and view $P \in U_\nu^{(0)}$ as a polynomial on $D_\infty^{(0)} = \{x \in D_\infty = D \otimes \mathbb{R} \mid tr(x) = 0\}$ by putting $P(\sum_{i=1}^3 x_i e_i) = P(x_1, x_2, x_3)$ for an orthonormal basis $\{e_i\}$ of $D_\infty^{(0)}$ with respect to the norm form $n$. Integrating the polynomial in this identification over the set of $x \in D_\infty^{(0)}$ of norm 1 is the same as integrating the original polynomial in 3 real variables over the unit sphere $S^2$; we will freely use this identification below. In the same way we fix an orthonormal basis of $D_\infty$ extending the one from above and use it to identify (harmonic) polynomials in 4 variables with (harmonic) polynomial functions on $D_\infty$.

The representation of $D_\infty^\times$ on $U_\nu^{(0)}$ by conjugation of the argument is denoted by $\tau_\nu$. By $\langle\ ,\ \rangle$ we denote the invariant scalar product in the representation space $U_\nu^{(0)}$ (where the choice of normalization will be discussed later). The $D^\times \times D^\times$-space $U_\nu^{(0)} \otimes U_\nu^{(0)}$ is isomorphic to the $D^\times \times D^\times$-space $U_{2\nu}$ of harmonic polynomials on $D_\infty$ of degree $2\nu$, where $(d_1, d_2) \in D^\times \times D^\times$ acts by sending $P(x)$ to $P(d_1^{-1} x d_2)$. An explicit isomorphism is given by mapping $P_1 \otimes P_2$ to the polynomial $(P_1 \otimes P_2)(d) := \langle P_2(x), P_1(d x \bar d) \rangle$. We will henceforth identify $U_\nu^{(0)} \otimes U_\nu^{(0)}$ with $U_{2\nu}$ using this isomorphism.

There is a Hecke action on $U_\nu^{(0)}$ which has been described by Eichler [4] in terms of Brandt matrices with polynomial entries; it is given by

$$\tilde T(p) P = \sum_{\substack{y \in R \\ n(y) = p}} \tau_\nu(y)(P),$$

see also [13]. In particular the space $U_\nu^{(0)}$ has a basis consisting of eigenforms of all the $\tilde T(p)$ for the $p \nmid N$.

To $P_1 \in U_\nu^{(0)}$ we associate the theta series of $R$ with harmonic polynomial $P_1 \otimes P_1$, given as usual as

$$f_{P_1}(z) := \frac{1}{|R^\times|} \sum_{x \in R} (P_1 \otimes P_1)(x) \exp(2\pi i\, n(x) z).$$
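To make the index set $\{y \in R : n(y) = p\}$ of the Hecke sum and the unit group order $|R^\times|$ concrete, here is a small enumeration sketch. It assumes, for illustration only, that $D$ is the Hamilton quaternion algebra (discriminant 2) and that $R$ is the Hurwitz order; the text itself allows any of the listed discriminants. The function names are hypothetical. The comparison value, 24 times the sum of the odd divisors of $n$, is the classical count of Hurwitz quaternions of norm $n$.

```python
from itertools import product
from math import isqrt


def hurwitz_elements_of_norm(n):
    """Hurwitz quaternions (a + b i + c j + d k)/2 of reduced norm n.

    Elements are returned as integer 4-tuples (a, b, c, d) of doubled
    coordinates, which must be all even or all odd.
    """
    bound = isqrt(4 * n)
    found = []
    for a, b, c, d in product(range(-bound, bound + 1), repeat=4):
        # Reduced norm of (a + b i + c j + d k)/2 is (a^2+b^2+c^2+d^2)/4.
        if a * a + b * b + c * c + d * d != 4 * n:
            continue
        if len({a % 2, b % 2, c % 2, d % 2}) == 1:
            found.append((a, b, c, d))
    return found


def sum_odd_divisors(n):
    return sum(d for d in range(1, n + 1, 2) if n % d == 0)


for n in (1, 3, 5, 7):
    count = len(hurwitz_elements_of_norm(n))
    print(n, count, 24 * sum_odd_divisors(n))
```

For $n = 1$ the enumeration lists the units of the Hurwitz order, so the first count is $|R^\times|$ under the stated assumption; for an odd prime $p$ the count is the number of terms in the sum defining $\tilde T(p)$.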
For this to be nonzero we have to restrict to polynomials $P_1$ that are invariant under the action of the unit group $R^\times$ of $R$; we will always do so in the sequel. The function $f_{P_1}$ is then a cusp form for $\Gamma_0(N)$ of weight $2 + 2\nu$ if $\nu > 0$, and it is an eigenform for the Hecke operators $T(p)$ for $p \nmid N$ if $P_1$ is an eigenfunction of the $\tilde T(p)$ for the $p \nmid N$. In fact it is a normalized newform if $\langle P_1, P_1 \rangle = 1$, and it is a result of [4] that one gets all normalized newforms of level $N$, weight $2 + 2\nu$ and trivial character in this way (we will actually not use the latter fact). With these notations we can now formulate:

Proposition 2.1. Let $P_1, P_2, P_3 \in U_{\nu_1}^{(0)}, U_{\nu_2}^{(0)}, U_{\nu_3}^{(0)}$ (with $\nu_1 = \nu_2$, $\nu_3 > 0$) be harmonic polynomials that are Hecke eigenforms as above, denote by $f_1, f_2, f_3$ the associated cusp forms of weights $k_1 = k_2 = 2 + 2\nu_1$, $k_3 = 2 + 2\nu_3$ and by $L(f_1, f_2, f_3; s)$ the triple product L-function associated to $f_1, f_2, f_3$ (as defined for the good primes e.g. in [5]; for the Euler factors at the bad primes we refer to [2]). Then one has for all $\epsilon > 0$:

$$L\left(f_1, f_2, f_3; \frac{2k_1 + k_3}{2} - 1\right) \geq C_1(N, \nu_3)\, \nu_1^{1-\epsilon} \left(\int_{S^2} P_1(x) P_2(x) P_3(x)\, dx\right)^2 \tag{2.1}$$

with a positive constant $C_1(N, \nu_3)$ depending only on $N$, $\nu_3$ and $\epsilon$. If $\nu_1 = \nu_2 = \nu_3 =: \nu$, one has for all $\epsilon > 0$ (with $k := k_1 = k_2 = k_3 = 2 + 2\nu$):

$$L(f_1, f_2, f_3; 2 + 3\nu) \geq C_2(N)\, \nu^{2-\epsilon} \left(\int_{S^2} P_1(x) P_2(x) P_3(x)\, dx\right)^2 \tag{2.2}$$
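with a positive constant $C_2(N)$ depending only on $N$ and $\epsilon$.

Proof. According to [2] the central critical value is:

$$(-1)^{a'}\, 2^{5+4a+3b-\omega(N)}\, \pi^{5+9a'+4b}\, \frac{(a'+1)^{[b]}}{2^{[a+b]}\, 2^{[a']}\, (b+1)^{[a']}\, (1)^{[a']}} \times \langle f_1, f_1 \rangle \langle f_2, f_2 \rangle \langle f_3, f_3 \rangle \left(\frac{1}{|R^\times|}\, T_0(P_1 \otimes P_2 \otimes P_3)\right)^2 \tag{2.3}$$

with a certain trilinear form $T_0$ on $U_{\nu_1}^{(0)} \otimes U_{\nu_2}^{(0)} \otimes U_{\nu_3}^{(0)}$ whose definition is recorded below. Here we have the following notations:

$$\alpha^{[\nu]} = \frac{\Gamma(\alpha + \nu)}{\Gamma(\alpha)} = \begin{cases} 1 & \nu = 0 \\ \alpha(\alpha + 1) \cdots (\alpha + \nu - 1) & \nu > 0, \end{cases}$$

hence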
$$\frac{(a'+1)^{[b]}}{2^{[a+b]}\, 2^{[a']}\, (b+1)^{[a']}\, (1)^{[a']}} = \frac{b!}{(a'!)^2\, (a'+1)!\, (a+b+1)!}.$$

Unfortunately, [2] contains a mistake at this point; the correct value of the factor arising here is

$$\frac{\Gamma(a'+1+b)\, b!}{(a'!)^2\, (a'+1)!\, (3a'+b+1)!}. \tag{2.4}$$

Moreover, there should be an additional factor of $N^{-1}$ in (2.3), the exponent at $\pi$ should be $5 + 6a' + 2b$ and the factor $(-1)^{a'}$ should be omitted. The forms $f_1, f_2, f_3$ are normalized newforms of weights $k_1 \geq k_2 \geq k_3$ with $k_2 + k_3 > k_1$; in our case we have $k_1 = k_2$. We write $k_1 = k_2 = 2 + a + b$ and $k_3 = 2 + a$, $a = 2a'$. For our purposes we can restrict to the case that both $a$ and $b$ are even. We normalize the invariant scalar product on the latter space in such a way that the Gegenbauer polynomial $G^{(\alpha)}(x, x')$ obtained from

$$G_1^{(\alpha)}(t) = 2^\alpha \sum_{j=0}^{[\alpha/2]} (-1)^j\, \frac{(\alpha - j)!}{j!\,(\alpha - 2j)!}\, \frac{1}{2^{2j}}\, t^{\alpha - 2j}$$

by

$$G^{(\alpha)}(x_1, x_2) = 2^\alpha (n(x_1) n(x_2))^{\alpha/2}\, G_1^{(\alpha)}\left(\frac{tr(x_1 \bar x_2)}{2\sqrt{n(x_1) n(x_2)}}\right)$$

is a reproducing kernel.
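The finite sum defining $G_1^{(\alpha)}$ can be checked numerically. Read as $2^\alpha \sum_{j=0}^{[\alpha/2]} (-1)^j \frac{(\alpha-j)!}{j!(\alpha-2j)!} 2^{-2j} t^{\alpha-2j}$, it is the standard expansion of the Chebyshev polynomial of the second kind $U_\alpha$, which satisfies $U_\alpha(\cos\theta) = \sin((\alpha+1)\theta)/\sin\theta$. The sketch below compares the two; the function name `g1` and the sample points are illustrative choices, and the reading of the garbled sum is an assumption.

```python
from math import cos, factorial, isclose, sin


def g1(alpha, t):
    """Evaluate the finite sum for G_1^(alpha)(t) as read from the text."""
    total = 0.0
    for j in range(alpha // 2 + 1):
        coeff = factorial(alpha - j) / (factorial(j) * factorial(alpha - 2 * j))
        total += (-1) ** j * coeff * t ** (alpha - 2 * j) / 2 ** (2 * j)
    return 2 ** alpha * total


def chebyshev_u(alpha, theta):
    """Closed form U_alpha(cos theta) = sin((alpha + 1) theta) / sin(theta)."""
    return sin((alpha + 1) * theta) / sin(theta)


for alpha in range(7):
    for theta in (0.3, 1.1, 2.0):
        ok = isclose(g1(alpha, cos(theta)), chebyshev_u(alpha, theta),
                     rel_tol=1e-9, abs_tol=1e-9)
        print(alpha, theta, ok)
```

This matches the expectation that the zonal harmonics on the 3-sphere (harmonic polynomials in 4 variables) are Gegenbauer polynomials of parameter 1, i.e. Chebyshev polynomials of the second kind.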
The invariant scalar product on $U_\alpha^{(0)}$ is then normalized such that we obtain the product on $U_{2\alpha}$ given above under the identification $U_{2\alpha} = U_\alpha^{(0)} \otimes U_\alpha^{(0)}$. Given this, the polynomials $P_1, P_2, P_3$ are normalized to $\langle P_i, P_i \rangle = 1$.

The normalization of the trilinear form is then as follows: We have a harmonic polynomial $P \in U_{a+b} \otimes U_{a+b} \otimes U_a$ in three vector variables (each vector being a quaternion) derived from the action of a certain differential operator on an exponential in Sect. 1 of [2]. This gives an invariant trilinear form $T$ on $U_{a+b} \otimes U_{a+b} \otimes U_a$ defined by taking the scalar product with $P_0 := \pi^{-3a'-b}\, i^{-3a'} P$ (notice that in [2] we write erroneously $\pi^{-3a-2b} P$). Using the identification $U_{2\alpha} = U_\alpha^{(0)} \otimes U_\alpha^{(0)}$ from above, $T$ decomposes as $T_0 \otimes T_0$.

For the intended application the form $T_0(Q_1^{(0)}, Q_2^{(0)}, Q_3^{(0)})$ should be replaced by the integral $\int Q_1^{(0)}(x) Q_2^{(0)}(x) Q_3^{(0)}(x)\, dx$ over the unit sphere. As a first step we compare $T(Q_1, Q_2, Q_3)$ with $\int Q_1(x) Q_2(x) Q_3(x)\, dx$; since both expressions give invariant trilinear forms they have to be proportional. We compute $T$ for special polynomials $Q_i$ on the space of quaternions: Write $G_w^{(\alpha)}(x)$ for $G^{(\alpha)}(w, x)$. Then we have

$$T(G_w^{(a+b)}, G_w^{(a+b)}, G_w^{(a)}) = P_0(w, w, w)$$

by the reproducing property of the $G^{(\alpha)}$. On the other hand, by [20, p.490] the integral

$$\int G_w^{(a+b)}(x)\, G_w^{(a+b)}(x)\, G_w^{(a)}(x)\, dx$$

is equal to $\pi/2$ and hence

$$T(Q_1, Q_2, Q_3) = \frac{2}{\pi}\, P_0(w, w, w) \int Q_1(x) Q_2(x) Q_3(x)\, dx. \tag{2.5}$$
We have to compute P0 (w, w, w) explicitly. This looks at first sight rather awkward since our description in [2] gives us an explicit formula only for one coefficient of the polynomial. Fortunately there are some results on such polynomials in the forthcoming work of Ibukiyama and Zagier, see [9]: For n ∈ N we denote by Hn (4) the space of harmonic homogeneous polynomials of degree n in 4 variables. For nonnegative integers µ1 , µ2 , µ3 we then put
O(4) . Hµ1 ,µ2 ,µ3 (4) := Hµ2 +µ3 (4) ⊗ Hµ1 +µ3 (4) ⊗ Hµ1 +µ2 (4)
Arithmetic and Equidistribution of Measures on the Sphere
73
This space is then always one-dimensional and a nonzero element of Hµ1 ,µ2 ,µ3 (4) is µ µ µ (explicitly!) given as the coefficient of X1 1 X2 2 X3 3 in the formal power series G4 (X, T ) = G4 (X1 , X2 , X3 ; T ) =
1 (X, T )2
− 4d(T )X1 X2 X3
Here T is (twice of) a Gram matrix 2m1 r3 r2 (x, x) (x, y) (x, z) T = r3 2m2 r1 = 2 · (y, x) (y, y) (y, z) (z, x) (z, y) (z, z) r2 r1 2m3
.
(x, y, z ∈ C4 )
and d(T ) = 4m1 m2 m3 − m1 r12 − m2 r22 − m3 r32 + r1 r2 r3 =
1 det(T ), 2
(X1 , X2 , X3 ; T ) = (X, T ) = 1 − r 1 X1 − r 2 X2 − r 3 X3 + r 1 m1 X2 X3 + r 2 m2 X3 X1 +r3 m3 X1 X2 + m1 m2 X32 + m2 m3 X12 + m3 m1 X22 . We are interested in the coefficient of X1a X2a X3a +b which we call P˜ in the sequel. The coefficient of m01 m02 m03 r1a r2a r3a +b in the polynomial P˜ can be read off from the expression above (putting m1 = m2 = m3 = 0); it is
$$\sum_{\alpha=0}^{a}\binom{2a-2\alpha}{a-\alpha}\binom{2a+b+\alpha}{3\alpha+b}\binom{3\alpha+b}{\alpha,\alpha,\alpha+b},$$
where we write
$$\binom{j}{\alpha,\beta,\gamma} := \frac{j!}{\alpha!\,\beta!\,\gamma!}.$$
This is known to be equal to
$$\frac{(2a)!\,(b+2a)!^2}{a!^4\,(b+a)!^2},$$
an identity which can be reduced to a special case of an exercise on p. 44 in [18] with hints to [1], who traces it back to "Saalschütz summation". Now we compare $P_0$ with Ibukiyama's polynomial $\tilde P$; it is enough to compare the coefficients in the monomial above. From Sect. 1 of [2] one reads off that the coefficient of $P_0$ in the same monomial is
$$\frac{2^b}{(b+1)!\,b!}.$$
Again, there is a mistake in [2] here. The correct value is
$$\frac{2^b\,2^{4a}\,\Gamma(a+2)}{\Gamma(a+b+2)\,b!},$$
so that we arrive at
$$P_0 = 2^{b+4a}\,\frac{a!^4\,(b+a)!^2\,(a+1)!}{(2a)!\,(b+2a)!^2\,(a+b+1)!\,b!}\,\tilde P. \tag{2.6}$$
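The binomial-sum identity quoted above can be spot-checked by machine. The following is a minimal sketch in plain Python; the helper names `multinomial`, `coefficient_sum` and `closed_form_parts` are illustrative and do not come from [2], [9] or [18]. It evaluates the sum over α and the closed form in exact integer arithmetic for small a and b.

```python
from math import comb, factorial


def multinomial(j, parts):
    """Return j! / (p1! p2! ...) for parts summing to j (exact integer)."""
    assert sum(parts) == j
    value = factorial(j)
    for p in parts:
        value //= factorial(p)  # each partial quotient is an integer
    return value


def coefficient_sum(a, b):
    """Sum over alpha = 0..a of the three (multi)binomial factors."""
    return sum(
        comb(2 * a - 2 * al, a - al)
        * comb(2 * a + b + al, 3 * al + b)
        * multinomial(3 * al + b, (al, al, al + b))
        for al in range(a + 1)
    )


def closed_form_parts(a, b):
    """Numerator and denominator of (2a)! (b+2a)!^2 / (a!^4 (b+a)!^2)."""
    numerator = factorial(2 * a) * factorial(b + 2 * a) ** 2
    denominator = factorial(a) ** 4 * factorial(b + a) ** 2
    return numerator, denominator


if __name__ == "__main__":
    for a in range(6):
        for b in range(6):
            num, den = closed_form_parts(a, b)
            # compare without dividing, so no rounding can occur
            assert coefficient_sum(a, b) * den == num
    print("identity confirmed for 0 <= a, b <= 5")
```

A finite check of this kind is of course no substitute for the proof via Saalschütz summation; it only guards against transcription errors in the formula.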
S. Böcherer, P. Sarnak, R. Schulze-Pillot
Next we have to evaluate $\tilde P$ at the triple $(w,w,w)$, i.e., at the matrix $T$ with $m_1=m_2=m_3=1$, $r_1=r_2=r_3=2$. We get in this case
$$G_4(X,T) = \Bigl(1-\sum_{i=1}^{3}X_i\Bigr)^{-2},$$
the coefficient of which at $X_1^{a}X_2^{a}X_3^{a+b}$ is the value we try to compute. It is proved easily (Taylor expansion) that this is equal to
$$\frac{(3a+b+1)!}{a!\,a!\,(a+b)!},$$
which leads us to
$$P_0(w,w,w) = 2^{b+4a}\,\frac{(a+1)!\,(3a+b+1)!\,a!^4\,(b+a)!^2}{a!^2\,(a+b)!\,(2a)!\,(b+2a)!^2\,(a+b+1)!\,b!} = 2^{b+4a}\,\frac{(3a+b+1)!\,a!^2\,(a+1)!\,(a+b)!}{(2a)!\,(b+2a)!^2\,b!\,(a+b+1)!}.$$
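The Taylor-expansion step for the triple (w, w, w) can be checked independently. The sketch below builds the polynomial 1 − r₁X₁ − … + m₃m₁X₂² that enters G₄ (stored as `DELTA`), confirms that d(T) vanishes for m₁ = m₂ = m₃ = 1, r₁ = r₂ = r₃ = 2, and then inverts that polynomial as a formal power series. It assumes, as the specialisation G₄ = (1 − ΣXᵢ)⁻² in the text suggests, that G₄ reduces to the reciprocal of this polynomial when d(T) = 0. All function names are illustrative.

```python
from functools import lru_cache
from math import factorial


def d_of_T(m, r):
    """d(T) = 4 m1 m2 m3 - m1 r1^2 - m2 r2^2 - m3 r3^2 + r1 r2 r3."""
    (m1, m2, m3), (r1, r2, r3) = m, r
    return 4 * m1 * m2 * m3 - m1 * r1**2 - m2 * r2**2 - m3 * r3**2 + r1 * r2 * r3


def delta_coefficients(m, r):
    """Coefficients of the quadratic polynomial in X1, X2, X3, keyed by exponents."""
    (m1, m2, m3), (r1, r2, r3) = m, r
    return {
        (0, 0, 0): 1,
        (1, 0, 0): -r1, (0, 1, 0): -r2, (0, 0, 1): -r3,
        (0, 1, 1): r1 * m1, (1, 0, 1): r2 * m2, (1, 1, 0): r3 * m3,
        (0, 0, 2): m1 * m2, (2, 0, 0): m2 * m3, (0, 2, 0): m3 * m1,
    }


M, R = (1, 1, 1), (2, 2, 2)      # the triple (w, w, w)
assert d_of_T(M, R) == 0         # so the square root in G4 disappears
DELTA = delta_coefficients(M, R)


@lru_cache(maxsize=None)
def inverse_coefficient(n1, n2, n3):
    """Coefficient of X1^n1 X2^n2 X3^n3 in the power series 1 / DELTA."""
    if min(n1, n2, n3) < 0:
        return 0
    if (n1, n2, n3) == (0, 0, 0):
        return 1
    total = 0
    for (e1, e2, e3), c in DELTA.items():
        if (e1, e2, e3) != (0, 0, 0):
            # DELTA * (1/DELTA) = 1, constant term of DELTA is 1
            total -= c * inverse_coefficient(n1 - e1, n2 - e2, n3 - e3)
    return total


def taylor_formula(a, b):
    """(3a + b + 1)! / (a! a! (a + b)!), the value claimed in the text."""
    return factorial(3 * a + b + 1) // (factorial(a) ** 2 * factorial(a + b))


if __name__ == "__main__":
    for a in range(4):
        for b in range(4):
            assert inverse_coefficient(a, a, a + b) == taylor_formula(a, b)
    print("Taylor coefficient confirmed for 0 <= a, b <= 3")
```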
For polynomials $Q_1^{(0)}, Q_2^{(0)} \in U^{(0)}_{a+b/2}$, $Q_3^{(0)} \in U^{(0)}_{a}$ we have by definition
$$\bigl(T_0(Q_1^{(0)},Q_2^{(0)},Q_3^{(0)})\bigr)^2 = T\bigl(Q_1^{(0)}\otimes Q_1^{(0)},\,Q_2^{(0)}\otimes Q_2^{(0)},\,Q_3^{(0)}\otimes Q_3^{(0)}\bigr),$$
and hence (as a consequence of the discussion given above)
$$\bigl(T_0(Q_1^{(0)},Q_2^{(0)},Q_3^{(0)})\bigr)^2 = 2^{b+4a+1}\pi^{-1}\,\frac{(3a+b+1)!\,a!^2\,(a+1)!\,(a+b)!}{(2a)!\,(b+2a)!^2\,b!\,(a+b+1)!} \times \int (Q_1^{(0)}\otimes Q_1^{(0)})(x)\,(Q_2^{(0)}\otimes Q_2^{(0)})(x)\,(Q_3^{(0)}\otimes Q_3^{(0)})(x)\,dx,$$
where the integration is over the 3-dimensional unit sphere. Our next task is to relate the integral
$$\int (Q_1^{(0)}\otimes Q_1^{(0)})(x)\,(Q_2^{(0)}\otimes Q_2^{(0)})(x)\,(Q_3^{(0)}\otimes Q_3^{(0)})(x)\,dx \tag{2.7}$$
with $\bigl(\tilde T_0(Q_1^{(0)},Q_2^{(0)},Q_3^{(0)})\bigr)^2$, where we put
$$\tilde T_0(Q_1^{(0)},Q_2^{(0)},Q_3^{(0)}) := \int Q_1^{(0)}(z)\,Q_2^{(0)}(z)\,Q_3^{(0)}(z)\,dz, \tag{2.8}$$
and where the integration is now over the 2-dimensional unit sphere (the factor of proportionality arising here depends on the identification between $U_{2\alpha}$ and $U^{(0)}_{\alpha}\otimes U^{(0)}_{\alpha}$ and hence on the degrees of the polynomials involved). In order to do this we need again special polynomials which show us the normalization of our isomorphism. We recall first how this isomorphism is described: Given $P_1^{(0)}, P_2^{(0)} \in U^{(0)}_{\alpha}$ we defined the polynomial $P_1^{(0)}\otimes P_2^{(0)}$ by
$$(P_1^{(0)}\otimes P_2^{(0)})(x) = \bigl\langle P_1^{(0)}(d),\,P_2^{(0)}(x d\bar x)\bigr\rangle_0,$$
where $\langle\cdot,\cdot\rangle_0$ denotes the invariant scalar product chosen in $U^{(0)}_{\alpha}$.

We consider again the Gegenbauer polynomial $G^{(\alpha,0)}$ of degree $\alpha$ in $U^{(0)}_{\alpha}\otimes U^{(0)}_{\alpha}$, derived in the same way from the one-variable polynomial with indices $l=\alpha$, $p=1/2$ given in [20] as we did it above for the $G_\beta$ in $U_\beta$, and let $\langle\cdot,\cdot\rangle_0$ be normalized such that this polynomial is a reproducing kernel; this normalization then determines our choice of the isomorphism between $U_{2\alpha}$ and $U^{(0)}_{\alpha}\otimes U^{(0)}_{\alpha}$.

In order to relate the integrals in (2.7) and (2.8) we evaluate them for a special choice of polynomials: We put $Q_1^{(0)} = Q_2^{(0)} = G_z^{(a+b/2,0)}$ and $Q_3^{(0)} = G_z^{(a,0)}$ for some quaternion $z$ of norm 1 and trace 0. The integral in (2.8) is then by [20, p. 490] equal to
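$$c_1 := \frac{\Gamma\bigl(\tfrac{3a+b}{2}+1\bigr)\,\Gamma\bigl(\tfrac{a+1}{2}\bigr)^2\,\Gamma\bigl(\tfrac{a+b+1}{2}\bigr)}{\pi\,\Gamma\bigl(\tfrac{a}{2}+1\bigr)^2\,\Gamma\bigl(\tfrac{a+b}{2}+1\bigr)\,\Gamma\bigl(\tfrac{3a+b+3}{2}\bigr)} \tag{2.9}$$
$$= 16\,\frac{\Gamma(a)^2\,\Gamma(a+b)\,\Gamma\bigl(\tfrac{3a+b}{2}+1\bigr)^2}{\Gamma(3a+b+2)\,\Gamma\bigl(\tfrac{a}{2}\bigr)^2\,\Gamma\bigl(\tfrac{a}{2}+1\bigr)^2\,\Gamma\bigl(\tfrac{a+b}{2}\bigr)\,\Gamma\bigl(\tfrac{a+b}{2}+1\bigr)} \tag{2.10}$$
(where the second form is derived from the first using the duplication formula for the $\Gamma$-function and where a factor $\Gamma(2w)/\Gamma(w)$ has to be replaced by 2 if $w = 0$).

On the other hand, the definition of the isomorphism between $U_{2\alpha}$ and $U^{(0)}_{\alpha}\otimes U^{(0)}_{\alpha}$ and the reproducing property of the Gegenbauer polynomials imply that
$$(G_z^{(\alpha,0)}\otimes G_z^{(\alpha,0)})(x) = G_z^{(\alpha,0)}(x z \bar x),$$
and hence that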
$$\int (G_z^{(a+b/2,0)}\otimes G_z^{(a+b/2,0)})(x)\,(G_z^{(a+b/2,0)}\otimes G_z^{(a+b/2,0)})(x)\,(G_z^{(a,0)}\otimes G_z^{(a,0)})(x)\,dx = \operatorname{vol}(\operatorname{Stab}(z))\int G_z^{(a+b/2,0)}(z')\,G_z^{(a+b/2,0)}(z')\,G_z^{(a,0)}(z')\,dz', \tag{2.11}$$
where $\operatorname{Stab}(z)$ is the set of $x$ of norm 1 with $x z \bar x = z$. The normalizations of the integrals over the 3-sphere and over the 2-sphere in [20] are such that $\operatorname{vol}(\operatorname{Stab}(z)) = \frac{\pi}{4}$ holds. Taken together we obtain
$$\int (Q_1^{(0)}\otimes Q_1^{(0)})(x)\,(Q_2^{(0)}\otimes Q_2^{(0)})(x)\,(Q_3^{(0)}\otimes Q_3^{(0)})(x)\,dx = \frac{\pi}{4c_1}\Bigl(\int Q_1^{(0)}(z)\,Q_2^{(0)}(z)\,Q_3^{(0)}(z)\,dz\Bigr)^2, \tag{2.12}$$
where $c_1$ is the constant computed in (2.9).
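The passage from (2.9) to (2.10) uses only the duplication formula Γ(z)Γ(z + 1/2) = 2^(1−2z) √π Γ(2z), so the two expressions for c₁ can be compared numerically. The sketch below does this with `math.lgamma`. It assumes the lost Γ symbols in the two displays sit as follows: (2.9) has Γ((3a+b)/2 + 1) Γ((a+1)/2)² Γ((a+b+1)/2) over π Γ(a/2 + 1)² Γ((a+b)/2 + 1) Γ((3a+b+3)/2), and (2.10) is 16 Γ(a)² Γ(a+b) Γ((3a+b)/2 + 1)² over Γ(3a+b+2) Γ(a/2)² Γ(a/2 + 1)² Γ((a+b)/2) Γ((a+b)/2 + 1). The function names are illustrative.

```python
from math import lgamma, log, pi


def log_c1_first_form(a, b):
    """Logarithm of c1 as written in (2.9)."""
    return (
        lgamma((3 * a + b) / 2 + 1)
        + 2 * lgamma((a + 1) / 2)
        + lgamma((a + b + 1) / 2)
        - log(pi)
        - 2 * lgamma(a / 2 + 1)
        - lgamma((a + b) / 2 + 1)
        - lgamma((3 * a + b + 3) / 2)
    )


def log_c1_second_form(a, b):
    """Logarithm of c1 as written in (2.10); requires a > 0."""
    return (
        log(16)
        + 2 * lgamma(a)
        + lgamma(a + b)
        + 2 * lgamma((3 * a + b) / 2 + 1)
        - lgamma(3 * a + b + 2)
        - 2 * lgamma(a / 2)
        - 2 * lgamma(a / 2 + 1)
        - lgamma((a + b) / 2)
        - lgamma((a + b) / 2 + 1)
    )


if __name__ == "__main__":
    for a in range(1, 8):
        for b in range(0, 8):
            gap = abs(log_c1_first_form(a, b) - log_c1_second_form(a, b))
            assert gap < 1e-9, (a, b, gap)
    print("(2.9) and (2.10) agree for 1 <= a <= 7, 0 <= b <= 7")
```

The case a = 0 is excluded because Γ(a) and Γ(a/2) have poles there; this is the situation covered by the parenthetical remark after (2.10).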
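This gives us the first formula for the central critical value of the triple product L-function,
$$N^{-1}\,2^{12a+4b-4-\omega(N)}\,\pi^{5+6a+2b}\,\langle f_1,f_1\rangle\langle f_1,f_1\rangle\langle f_3,f_3\rangle \times \frac{(b+a)\,a\,\Gamma\bigl(\tfrac{a+b}{2}\bigr)^2\,\Gamma\bigl(\tfrac{a}{2}\bigr)^4\,\Gamma(3a+b+2)}{\Gamma(a)^2\,\Gamma(2a)\,\Gamma\bigl(\tfrac{3a+b}{2}+1\bigr)^2\,\Gamma(a+b)\,\Gamma(2a+b+1)^2} \times \Bigl(\int P_1(x)P_1(x)P_3(x)\,dx\Bigr)^2. \tag{2.13}$$
In this we replace the Petersson product $\langle f_i,f_i\rangle$ by $(4\pi)^{1-k_i}\,\Gamma(k_i)\,D_{f_i}(k_i-1)$ (where $D_{f_i}$ denotes the symmetric square L-function of $f_i$), which leads us to
$$2^{-9-\omega(N)}\,\pi^{2}\,D_{f_1}(k_1-1)\,D_{f_1}(k_1-1)\,D_{f_3}(k_3-1) \times \frac{a^2\,(2a+1)\,(a+b)\,(b+2a+1)^2\,\Gamma\bigl(\tfrac{a+b}{2}\bigr)^2\,\Gamma(3a+b+2)\,\Gamma\bigl(\tfrac{a}{2}\bigr)^4}{\Gamma(a)^2\,\Gamma\bigl(\tfrac{3a+b}{2}+1\bigr)^2\,\Gamma(a+b)} \times \Bigl(\int P_1(x)P_1(x)P_3(x)\,dx\Bigr)^2. \tag{2.14}$$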
Here the factor $D_{f_1}(k_1-1)D_{f_1}(k_1-1)D_{f_3}(k_3-1)$ does not contribute in an essential way to the asymptotics as $k_1\to\infty$ since it is well known that $k_i^{-\delta} \ll D_{f_i}(k_i-1) \ll k_i^{\delta}$ for all $\delta>0$, see e.g. [8]. We analyze the total factor on the right-hand side in front of $D_{f_1}(k_1-1)D_{f_1}(k_1-1)D_{f_3}(k_3-1)\bigl(\int P_1(x)P_2(x)P_3(x)\,dx\bigr)^2$ with Stirling's formula: For the first assertion of the proposition we fix $\nu_3$ and let $\nu_1$ tend to infinity; we find that the factor from above can for all $\varepsilon>0$ be bounded from below by $c\,b^{3-\varepsilon}$ for some nonzero constant $c$ depending on $a$, $\varepsilon$ and the level $N$ as $b$ tends to infinity. For the second part of the proposition we have all the $\nu_i$ equal, which implies $b=0$ in the notation used above; we find that the factor can (for all $\varepsilon>0$) be bounded from below by $c\,a^{5-\varepsilon}$ for some nonzero constant $c$ depending on $\varepsilon$ and the level $N$.

We have to adjust a final normalization: The $\varphi_i$ were normalized to have $\langle\varphi_i,\varphi_i\rangle_0 = 1$, whereas we want them to have $L^2$-norm 1. A comparison of $\langle\varphi_i,\varphi_i\rangle_0$ with the scalar product on the space of $\varphi$ derived from the $L^2$-norm with the help of [20, p. 461] shows that we have
$$P_1 = \sqrt{a+(b+1)/2}^{\,-1}\,\tilde P_1, \qquad P_3 = \sqrt{a+1/2}^{\,-1}\,\tilde P_3,$$
where the $\tilde P_i$ are $L^2$-normalized.
This multiplies the last formula with $(a+(b+1)/2)^{-2}(a+1/2)^{-1}$ in the first case and leads to an expression that is bounded from below for every $\varepsilon>0$ by $C_1\,b^{1-\varepsilon}$ for some nonzero constant $C_1$ depending on $\varepsilon$, $a$ and the level $N$ as $b$ tends to infinity. In the second case the formula gets multiplied with $(a+1/2)^{-3}$ and leads to an expression that is bounded from below for every $\varepsilon>0$ by $C_2\,a^{2-\varepsilon}$ for some nonzero constant $C_2$ depending on $\varepsilon$ and the level $N$ as $a$ tends to infinity. This finishes the proof of the proposition.
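The growth rates obtained from Stirling's formula can be illustrated numerically. The sketch below is only a plausibility check and rests on an assumption about the garbled display (2.14): the quotient in front of the integral is read as a² (2a+1) (a+b) (b+2a+1)² Γ((a+b)/2)² Γ(3a+b+2) Γ(a/2)⁴ divided by Γ(a)² Γ((3a+b)/2 + 1)² Γ(a+b). It estimates the power of growth by comparing the logarithm of that quotient at x and 2x; the symmetric square factors D, which account for the ε in the exponents, are ignored. The function names are illustrative.

```python
from math import lgamma, log


def log_gamma_factor(a, b):
    """log of the assumed Gamma quotient in (2.14); requires a > 0."""
    return (
        2 * log(a) + log(2 * a + 1) + log(a + b) + 2 * log(b + 2 * a + 1)
        + 2 * lgamma((a + b) / 2) + lgamma(3 * a + b + 2) + 4 * lgamma(a / 2)
        - 2 * lgamma(a) - 2 * lgamma((3 * a + b) / 2 + 1) - lgamma(a + b)
    )


def log_normalized_factor(a, b):
    """Same factor after passing to L2-normalized polynomials."""
    return log_gamma_factor(a, b) - 2 * log(a + (b + 1) / 2) - log(a + 0.5)


def doubling_exponent(log_f, x):
    """Estimate e in f(x) ~ C x^e from the values at x and 2x."""
    return (log_f(2 * x) - log_f(x)) / log(2)


if __name__ == "__main__":
    big = 10**5
    # first case: a fixed (here a = 2), b -> infinity
    print(doubling_exponent(lambda b: log_gamma_factor(2, b), big))       # close to 3
    print(doubling_exponent(lambda b: log_normalized_factor(2, b), big))  # close to 1
    # second case: b = 0, a -> infinity
    print(doubling_exponent(lambda a: log_gamma_factor(a, 0), big))       # close to 5
    print(doubling_exponent(lambda a: log_normalized_factor(a, 0), big))  # close to 2
```

The exponents 3 and 5 before normalization, and 1 and 2 after it, match the bounds stated in the proof.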
On the other hand one can investigate the dependence of the central critical value $L(f_1,f_1,f_3;\tfrac{2k_1+k_3}{2}-1)$ on the level and weights with analytic methods from the theory of L-functions. We do this first for the equidistribution of mass conjecture, i.e., for the situation in which $\nu_3$ is fixed and $\nu_1=\nu_2$ tends to infinity: As usual in the theory of L-functions the first step is to establish the convexity bound.

Lemma 2.2. Let $f_1$, $f_3$ be newforms of level $N$ and weights $k_1$, $k_3$ as above. Then
$$L\bigl(f_1,f_1,f_3;\tfrac{2k_1+k_3}{2}-1\bigr) = O_N(k_1^{1+\varepsilon})$$
for all $\varepsilon>0$.

Proof. In [10] a general description of the convexity bound of (standard) automorphic L-functions for GL(n) is given. That bound is also applicable for our triple L-function. We recall the (normalized) functional equation of the triple L-function (quoting from [6] for weight 2 and more generally from [2]; we restrict ourselves to the case where all the cusp forms involved are newforms of the same, squarefree, level): Putting
$$\Lambda(s) = \Gamma_{\mathbb C}\bigl(s+k_1+\tfrac{k_3}{2}\bigr)\,\Gamma_{\mathbb C}\bigl(s+1+\tfrac{k_3}{2}\bigr)\,\Gamma_{\mathbb C}\bigl(s+1+\tfrac{k_3}{2}\bigr) \times \Gamma_{\mathbb C}\bigl(s+1+k_1-\tfrac{k_3}{2}\bigr)\,\mathcal L(f_1,f_2,f_3,s) \tag{2.15}$$
with
$$\mathcal L(f_1,f_2,f_3,s) = L\bigl(f_1,f_2,f_3,\,s+\tfrac{k_1+k_2+k_3-3}{2}\bigr)$$
and $\Gamma_{\mathbb C}(s) = (2\pi)^{-s}\Gamma(s)$ we get the functional equation
$$\Lambda(1-s) = (N^5)^{s-\frac12}\,w\,\Lambda(s).$$
Under such circumstances, the convexity bound (as described in [10]) is
$$\mathcal L\bigl(f_1,f_2,f_3,\tfrac12+it\bigr) \ll \bigl(C(f_1,f_2,f_3,t)\bigr)^{\frac14+\varepsilon}. \tag{2.16}$$
Here $C(f_1,f_2,f_3,t)$ is given in terms of the gamma factors; in our case this means
$$C(f_1,f_2,f_3,t) = (N^5)\,\bigl|1+it-k_1-\tfrac{k_3}{2}\bigr|^2 \times \bigl|1+it-1-\tfrac{k_3}{2}\bigr|^2\,\bigl|1+it-1-k_1+\tfrac{k_3}{2}\bigr|^2. \tag{2.17}$$
In our special case (i.e. $k_1=k_2$ and $k_3$ fixed) this implies an estimate of type
$$\mathcal L\bigl(f_1,f_1,f_3,\tfrac12\bigr) \ll k_1^{1+\varepsilon}$$
or
$$\mathcal L\bigl(f_1,f_1,f_3,\tfrac12\bigr) \ll b^{1+\varepsilon}.$$
For our intended application this result is just too weak. We will therefore assume henceforth that one can break convexity for the estimate of $L(f_1,f_1,f_3;\tfrac{2k_1+k_3}{2}-1)$ in the $k_1$-aspect (see [10] for a survey of subconvex estimates).

Subconvexity hypothesis. Fix $f_3$ as above. There is $\delta>0$ such that for all $f_1$ as above
$$\mathcal L\bigl(f_1,f_1,f_3,\tfrac12\bigr) = O_{f_3}(k_1^{1-\delta}).$$
Then the result from our proposition immediately translates into a statement about equidistribution of measures on the unit sphere that are associated to Hecke eigenfunctions on a quaternion algebra, which proves the last assertion in the introduction (concerning Conjecture 1).
Proposition 2.3. Let $D$ be as above, let $S^2\subseteq\mathbb R^3$ be identified with $\{x\in D_\infty^{(0)} \mid n(x)=1\}$ as above. For a harmonic polynomial $P\in\mathbb C[X_1,X_2,X_3]$ let the measure $\mu_P$ on $S^2$ be defined by
$$\int_{S^2} f(x)\,d\mu_P(x) := \int_{S^2} |P(x)|^2 f(x)\,dx.$$
Then under our subconvexity hypothesis the measures $\mu_P$ become equidistributed if $P$ runs through Hecke eigenfunctions of degree $\nu$ for $\nu\to\infty$, i.e., one has
$$\lim_{\nu(P)\to\infty}\int_{S^2} f(x)\,d\mu_P(x) = \int_{S^2} f(x)\,dx \tag{$*$}$$
for all continuous $f$ on $S^2$.

Proof. We have to check $(*)$ only for Hecke eigenfunctions $f=Q$, as these form a Hilbert space basis of $L^2(S^2)$. Then for $Q\neq 1$ the right-hand side of $(*)$ is zero, for $Q=1$ it is 1. For $Q=1$ we have equality in $(*)$ for all $P$ (of $L^2$-norm 1). For $Q\neq 1$, Proposition 2 together with the convexity breaking assumption implies that
$$\lim_{\nu\to\infty}\int_{S^2} Q(x)\,|P(x)|^2\,dx = 0,$$
which proves the assertion.
We end with the statement and proof of Conjecture 1 for the case $q=3$.

Proposition 2.4. Let $D$ and $P$ (harmonic of degree $\nu(P)$) be as in Proposition 2.3. Then
$$\lim_{\nu(P)\to\infty}\int_{S^2} P(x)^3\,dx = 0. \tag{2.18}$$
Proof. According to (2.2) we have for $\varepsilon>0$,
$$\Bigl(\int_{S^2} P(x)^3\,dx\Bigr)^2 \ll \nu(P)^{\varepsilon-2}\,L(f,f,f;2+3\nu(P)). \tag{2.19}$$
Now with the notation from (2.16) (i.e. denoting by $\mathcal L$ an L-function normalized to have a functional equation under $s\mapsto(1-s)$) we have
$$\mathcal L(f,f,f,s) = \mathcal L(\mathrm{Sym}^3 f,s)\,(\mathcal L(f,s))^2. \tag{2.20}$$
Furthermore $\mathcal L(\mathrm{Sym}^3 f,s)$ is the L-function of a $GL_4$-cusp form [12], so we may apply the general Molteni subconvexity bound (see [10]) to $\mathcal L(\mathrm{Sym}^3 f,\tfrac12)$. This combined with the subconvex bound for $\mathcal L(f,\tfrac12)$ due to Peng [14] shows that $\nu(P)^{\varepsilon-2}L(f,f,f;2+3\nu(P)) = O(\nu^{-\delta})$ for a fixed $\delta>0$. This proves the proposition.
Remark. We see presently no way to extend the statement of Proposition 2.1 (and hence our arguments in this article) to a product of more than three polynomials, since our proofs here and in [2] use several special features of the case of three polynomials. In particular our proofs depend on
– the existence of an integral representation for the triple product L-function using an Eisenstein series whose special value is expressed by theta series with spherical harmonics,
– the existence and uniqueness of trilinear forms on tensor products of spaces of harmonic polynomials (and their explicit description by the generating series in [9]).

Remark. In the case that the class number of the quaternion algebra is $h\neq 1$, Hecke eigenforms (on the adelic quaternion algebra) give rise to h-tuples of harmonic polynomials; they should then be viewed as functions on the disjoint sum of h copies of the unit sphere. All arguments from above can be carried out for such h-tuples of harmonic polynomials (resp. functions on the disjoint sum of h copies of the unit sphere).

Acknowledgement. Part of this work was done during a stay of all three authors at the Institute for Advanced Study (IAS), Princeton, supported by von Neumann Fund, Weyl Fund and Bankers Trust Fund. We thank the IAS for its hospitality. S. Böcherer and R. Schulze-Pillot thank D. Prasad and the Harish Chandra Research Institute (HCRI), Allahabad, India, for their hospitality. Schulze-Pillot's visit to HCRI was also supported by DFG. We would like to thank T. Ibukiyama for the permission to use the not yet published results in [9].
References
1. Andrews, G.E.: Identities in combinatorics I: On sorting two ordered sets. Discrete Math. 11, 97–106 (1975)
2. Böcherer, S., Schulze-Pillot, R.: On the central critical value of the triple product L-function. In: Number Theory 1993–94, Cambridge: Cambridge University Press, 1996, pp. 1–46
3. Colin de Verdière, Y.: Ergodicité et fonctions propres du laplacien. Commun. Math. Phys. 102, 497–502 (1985)
4. Eichler, M.: The basis problem for modular forms and the traces of the Hecke operators. In: Modular functions of one variable I, Lecture Notes Math. 320, Berlin-Heidelberg-New York: Springer-Verlag, 1973, pp. 76–151
5. Garrett, P.: Decomposition of Eisenstein series: Rankin triple products. Ann. Math. 125, 209–235 (1987)
6. Gross, B., Kudla, S.S.: Heights and the central critical value of triple product L-functions. Compositio Math. 81, 143–209 (1992)
7. Hejhal, D.A., Rackner, B.: On the topography of Maaß wave forms for PSL(2, Z). Exp. Math. 1, 275–305 (1992)
8. Hoffstein, J., Lockhart, P.: Coefficients of Maaß forms and the Siegel zero. Ann. Math. II. Ser. 140, 161–176 (1994)
9. Ibukiyama, T., Zagier, D.: Higher spherical polynomials. In preparation
10. Iwaniec, H., Sarnak, P.: Perspectives on the analytic theory of L-functions. GAFA, Special Volume II, 705–741 (2000)
11. Jakobson, D., Zelditch, S.: Classical limits of eigenfunctions for some completely integrable systems. In: Emerging applications of number theory (Minneapolis 1996), IMA Vol. Math. Appl. 109, New York: Springer-Verlag, 1999, pp. 329–354
12. Kim, H., Shahidi, F.: Functorial products for GL2 × GL3 and the symmetric cube for GL2. C. R. Acad. Sc. Paris 331, 599–604 (2000)
13. Lubotzky, A., Phillips, R., Sarnak, P.: Hecke operators and distributing points on the sphere. I. Commun. Pure Appl. Math. 39, S149–S186 (1986)
14. Peng, Z.: Zeros of central values of automorphic L-functions. PhD Thesis, Princeton: Princeton University, 2001
15. Rudnick, Z., Sarnak, P.: The behaviour of eigenstates of arithmetic hyperbolic manifolds. Commun. Math. Phys. 161(1), 195–213 (1994)
16. Sarnak, P.: Estimates for Rankin-Selberg L-functions and quantum unique ergodicity. J. Funct. Anal. 184, 419–453 (2001)
17. Sébilleau, D.: On the computation of the integrated products of three spherical harmonics. J. Phys. A: Math. Gen. 31, 7157–7168 (1998)
18. Stanley, R.P.: Enumerative Combinatorics I. Cambridge: Cambridge University Press, 1997
19. VanderKam, J.M.: L∞ norms and quantum ergodicity on the sphere. Internat. Math. Res. Notices 7, 329–347 (1997), correction in: Internat. Math. Res. Notices 1, 65 (1998)
20. Vilenkin, N.Ja.: Special functions and the theory of group representations. Translated from the Russian by V. N. Singh. Translations of Mathematical Monographs, Vol. 22, Providence, RI: American Mathematical Society, 1983
21. Watson, T.C.: Rankin triple products and quantum chaos. PhD Thesis, Princeton: Princeton University, 2002
22. Zelditch, S.: Quantum ergodicity on the sphere. Commun. Math. Phys. 146, 61–71 (1992)

Communicated by M. Aizenman
Commun. Math. Phys. 242, 81–135 (2003)
Digital Object Identifier (DOI) 10.1007/s00220-003-0937-y
Communications in Mathematical Physics

Self-Averaging of Wigner Transforms in Random Media

Guillaume Bal (1), Tomasz Komorowski (2), Lenya Ryzhik (3)

(1) Department of Applied Physics & Applied Mathematics, Columbia University, New York, NY 10027, USA.
(2) Institute of Mathematics, UMCS, pl. Marii Curie Skłodowskiej 1, 20-031 Lublin, Poland.
(3) Department of Mathematics, University of Chicago, Chicago, IL 60637, USA.

Received: 8 October 2002 / Accepted: 25 April 2003
Published online: 7 October 2003 – © Springer-Verlag 2003
Abstract: We establish the self-averaging properties of the Wigner transform of a mixture of states in the regime when the correlation length of the random medium is much longer than the wave length but much shorter than the propagation distance. The main ingredients in the proof are the error estimates for the semiclassical approximation of the Wigner transform by the solution of the Liouville equations, and the limit theorem for two-particle motion along the characteristics of the Liouville equations. The results are applied to a mathematical model of the time-reversal experiments for the acoustic waves, and self-averaging properties of the re-transmitted wave are proved.

1. Introduction

1.1. The Wigner transform of mixtures of states. The Wigner transform is a useful tool in the analysis of the semi-classical limits of non-dissipative evolution equations as well as in the high frequency wave propagation [14, 17, 20]. It is defined as follows: given a family of functions $f_\varepsilon(t,x)$ uniformly bounded in $L^\infty([0,T];L^2(\mathbb R^d))$, its Wigner transform is
$$\tilde W_\varepsilon(t,x,k) = \int e^{ik\cdot y}\,f_\varepsilon\Bigl(t,x-\frac{\varepsilon y}{2}\Bigr)\,f_\varepsilon^*\Bigl(t,x+\frac{\varepsilon y}{2}\Bigr)\,\frac{dy}{(2\pi)^d}. \tag{1.1}$$
The family $\tilde W_\varepsilon$ is uniformly bounded in the space of Schwartz distributions $\mathcal S'(\mathbb R^d\times\mathbb R^d)$, and all its limit points are non-negative measures of bounded total mass [14, 17]. It is customary to interpret a limit Wigner measure $W$ as the energy density in the phase space, since the limit points of $n_\varepsilon = |f_\varepsilon|^2$ are of the form $n(t,x) = \int W(t,x,k)\,dk$, provided that the family $f_\varepsilon$ is ε-oscillatory and compact at infinity [14]. However, while neither $n_\varepsilon$ nor its limit $n(t,x)$ satisfy a closed equation, both $\tilde W_\varepsilon$ and $W$ usually obey an evolution equation when the family $f_\varepsilon(t,x)$ arises from a time-dependent PDE. This makes the Wigner transform a useful tool in the study of semiclassical and high
frequency limits, especially in random media [1, 2, 10, 18, 20]. However, a priori bounds on the Wigner transform $\tilde W_\varepsilon$ other than those mentioned above are usually difficult to obtain. It has been observed in [17] that the Wigner transform of a mixture of states
$$W_\varepsilon(x,k) = \int e^{ik\cdot y}\,f_\varepsilon\Bigl(x-\frac{\varepsilon y}{2};\zeta\Bigr)\,f_\varepsilon^*\Bigl(x+\frac{\varepsilon y}{2};\zeta\Bigr)\,\frac{dy\,d\mu(\zeta)}{(2\pi)^d}$$
enjoys better regularity properties. The family $f_\varepsilon$ above depends on an additional "state" parameter $\zeta\in S$, where $S$ is a state space equipped with a non-negative bounded measure $d\mu(\zeta)$. Typically this corresponds to introducing random initial data for $f_\varepsilon$ at $t=0$ and estimating the expectation of $\tilde W_\varepsilon$ with respect to this randomness. This improved regularity has been used, for instance, in [18, 22] in the analysis of the average of the Wigner transform of mixtures of states in random media and in [17] in order to obtain an asymptotic expansion for the Wigner transform of a mixture of states.

The purpose of this paper is to analyze the self-averaging properties of moments of the mixed Wigner transform of the form $\int W_\varepsilon(t,x,k)S(k)\,dk$, where $S(k)$ is a test function, and the family $f_\varepsilon(t,x;\zeta)$ satisfies the acoustic equations. This problem arises naturally in the mathematical study of the experiments in time-reversal of acoustic waves that we will describe in detail below. However, apart from the time-reversal application, the statistical stability of such moments provides an important key to understanding the physical applicability of the limit equations for the Wigner transform in random media in the situations when results for each realization are more relevant than the statistically averaged quantities.

We start with the wave equation in dimension $d\ge 3$,
$$\frac{1}{c^2(x)}\frac{\partial^2\phi}{\partial t^2} - \Delta\phi = 0, \tag{1.2}$$
and assume that the wave speed has the form $c(x) = c_0+\sqrt{\delta}\,c_1(x)$. Here $c_0>0$ is the constant sound speed of the uniform background medium, while the small parameter $\delta\ll 1$ measures the strength of the mean zero random perturbation $c_1$. Rescaling the spatial and temporal variables $x = x'/\delta$ and $t = t'/\delta$ we obtain (after dropping the primes) Eq. (1.2) with rapidly fluctuating wave speed
$$c_\delta(x) = c_0+\sqrt{\delta}\,c_1\Bigl(\frac{x}{\delta}\Bigr). \tag{1.3}$$
It is convenient to re-write (1.2) as the system of acoustic equations for the "pressure" $p = \frac{1}{c}\phi_t$ and "acoustic velocity" $u = -\nabla\phi$:
$$\frac{\partial u}{\partial t}+\nabla\bigl(c_\delta(x)\,p\bigr) = 0, \qquad \frac{\partial p}{\partial t}+c_\delta(x)\,\nabla\cdot u = 0. \tag{1.4}$$
The energy density for (1.4) is $E(t,x) = |u|^2+p^2$: $\int E(t,x)\,dx = \text{const}$ is independent of time. We will denote for brevity $v = (u,p)\in\mathbb C^{d+1}$ and write (1.4) in the more general form of a first order linear symmetric hyperbolic system,
$$\frac{\partial v_\varepsilon^\delta}{\partial t}+A_\delta(x)\,D^j\,\frac{\partial}{\partial x^j}\bigl(A_\delta(x)\,v_\varepsilon^\delta(x)\bigr) = 0. \tag{1.5}$$
In the present case, the symmetric matrices $A_\delta$ and $D^j$ are defined by
$$A_\delta(x) = \operatorname{diag}(1,1,1,c_\delta(x)), \quad\text{and}\quad D^j = e_j\otimes e_{d+1}+e_{d+1}\otimes e_j,\quad j=1,\dots,d. \tag{1.6}$$
Notice that the matrices $D^j$ are independent of $x$. Here $e_m\in\mathbb R^{d+1}$ is the standard orthonormal basis: $(e_m)_k = \delta_{mk}$. The dispersion matrix for (1.5) is
$$P_0^\delta(x,k) = iA_\delta(x)\,k_jD^j\,A_\delta(x) = ic_\delta(x)\,k_jD^j = ic_\delta(x)\bigl(\tilde k\otimes e_{d+1}+e_{d+1}\otimes\tilde k\bigr), \qquad \tilde k = \sum_{j=1}^{d}k_je_j. \tag{1.7}$$
The self-adjoint matrix $(-iP_0^\delta)$ has an eigenvalue $\lambda_0 = 0$ of multiplicity $d-1$, and two simple eigenvalues $\lambda_\pm^\delta(x,k) = \pm c_\delta(x)|k|$. The corresponding eigenvectors are
$$b_m^0 = (k_m^\perp,0),\quad m=1,\dots,d-1; \qquad b_\pm = \frac{1}{\sqrt2}\Bigl(\frac{\tilde k}{|k|}\pm e_{d+1}\Bigr), \tag{1.8}$$
where $k_m^\perp\in\mathbb R^d$ is the orthonormal basis of vectors orthogonal to $k$.

We assume that the initial data $v_0(x;\zeta) = v_\varepsilon^\delta(0,x;\zeta) = (-\varepsilon\nabla\phi_0^\varepsilon,\ 1/c_\delta\,\dot\phi_0^\varepsilon)$ for (1.5) is an ε-oscillatory and compact at infinity family of functions uniformly bounded in $L^2(\mathbb R^d)$ [14] for each "realization" $\zeta$ of the initial data. The scale $\varepsilon$ of oscillations is much smaller than the correlation length $\delta$ of the medium: $\varepsilon\ll\delta\ll 1$. The $(d+1)\times(d+1)$ Wigner matrix of a mixture of solutions of (1.5) is defined by
$$W_\varepsilon^\delta(t,x,k) = \int_{\mathbb R^d\times S} e^{ik\cdot y}\,v_\varepsilon^\delta\Bigl(t,x-\frac{\varepsilon y}{2};\zeta\Bigr)\,v_\varepsilon^{\delta*}\Bigl(t,x+\frac{\varepsilon y}{2};\zeta\Bigr)\,\frac{dy\,d\mu(\zeta)}{(2\pi)^d}.$$
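The eigenstructure of the dispersion matrix stated in (1.7) and (1.8) can be checked directly. The following sketch (plain Python, d = 3; the sample wave vector `k`, the sample speed `c` and all helper names are illustrative choices, not taken from the paper) builds the real symmetric matrix −iP₀ = c (k̃ ⊗ e_{d+1} + e_{d+1} ⊗ k̃) and verifies the eigenpairs by matrix-vector multiplication.

```python
from math import sqrt


def dispersion_matrix(k, c):
    """Real symmetric matrix c * (k~ (x) e_{d+1} + e_{d+1} (x) k~)."""
    d = len(k)
    M = [[0.0] * (d + 1) for _ in range(d + 1)]
    for j in range(d):
        M[j][d] = c * k[j]   # entry of k~ (x) e_{d+1}
        M[d][j] = c * k[j]   # entry of e_{d+1} (x) k~
    return M


def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def b_pm(k, sign):
    """b_+- = (k~/|k| +- e_{d+1}) / sqrt(2)."""
    norm = sqrt(sum(x * x for x in k))
    return [x / (norm * sqrt(2)) for x in k] + [sign / sqrt(2)]


def is_eigenpair(M, v, lam, tol=1e-12):
    return all(abs(mv - lam * x) <= tol for mv, x in zip(matvec(M, v), v))


if __name__ == "__main__":
    k, c = [1.0, -2.0, 2.0], 1.5          # |k| = 3, so c|k| = 4.5
    M = dispersion_matrix(k, c)
    assert is_eigenpair(M, b_pm(k, +1.0), +4.5)
    assert is_eigenpair(M, b_pm(k, -1.0), -4.5)
    # two independent vectors orthogonal to k span the kernel (d - 1 = 2)
    assert is_eigenpair(M, [2.0, 1.0, 0.0, 0.0], 0.0)
    assert is_eigenpair(M, [2.0, 0.0, -1.0, 0.0], 0.0)
    print("eigenvalues +-c|k| and 0 confirmed")
```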
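The non-negative measure $d\mu$ has bounded total mass: $\int_S d\mu(\zeta)<\infty$. It is well-known [14, 17] that for each fixed $\delta>0$ (and even without introduction of a mixture of states) one may pass to the limit $\varepsilon\to 0$ and show that $W_\varepsilon^\delta$ converges weakly in $\mathcal S'(\mathbb R^d\times\mathbb R^d)$ to
$$\bar W^\delta(t,x,k) = u_+^\delta(t,x,k)\,b_+(k)\otimes b_+(k)+u_-^\delta(t,x,k)\,b_-(k)\otimes b_-(k).$$
The scalar amplitudes $u_\pm^\delta$ satisfy the Liouville equations:
$$\frac{\partial u_\pm^\delta}{\partial t}+\nabla_k\lambda_\pm^\delta\cdot\nabla_xu_\pm^\delta-\nabla_x\lambda_\pm^\delta\cdot\nabla_ku_\pm^\delta = 0. \tag{1.9}$$
Furthermore, one may formally pass to the limit $\delta\to 0$ in (1.9) and show that (see [3]) $E\{u_\pm^\delta\}$ converge to the solution of
$$\frac{\partial\bar u_\pm}{\partial t}\pm c_0\,\hat k\cdot\nabla_x\bar u_\pm = \frac{\partial}{\partial k_m}\Bigl(|k|^2D_{mn}(\hat k)\,\frac{\partial\bar u_\pm}{\partial k_n}\Bigr). \tag{1.10}$$
Here $\hat k = k/|k|$, and the diffusion matrix $D$ is given by
$$D_{mn} = -\frac12\int_{-\infty}^{\infty}\frac{\partial^2R(c_0s\hat k)}{\partial x_n\,\partial x_m}\,ds,$$
where $R(x)$ is the correlation function of $c_1$:
$$E\{c_1(y)\,c_1(x+y)\} = R(x). \tag{1.11}$$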
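To see what the diffusion matrix looks like in a concrete case, one can take a Gaussian correlation function R(x) = exp(−|x|²/2). This choice is purely an illustrative assumption; the paper only requires the hypotheses of Sect. 1.2. For it the integral can be done by hand and gives D_mn = (√(2π) / (2c₀)) (δ_mn − k̂_m k̂_n), a multiple of the projection onto the plane orthogonal to k̂, so the diffusion in (1.10) acts only in directions tangent to the sphere |k| = const. The sketch below compares this closed form with a trapezoidal quadrature of the defining integral; all names are illustrative.

```python
from math import exp, pi, sqrt


def hessian_R(x):
    """Hessian of the assumed correlation function R(x) = exp(-|x|^2 / 2)."""
    r2 = sum(t * t for t in x)
    w = exp(-0.5 * r2)
    d = len(x)
    return [[(x[m] * x[n] - (1.0 if m == n else 0.0)) * w for n in range(d)]
            for m in range(d)]


def diffusion_matrix(k_hat, c0, s_max=12.0, n_steps=4800):
    """D_mn = -1/2 * integral over s of d^2 R / dx_n dx_m at x = c0 * s * k_hat."""
    d = len(k_hat)
    h = 2.0 * s_max / n_steps
    D = [[0.0] * d for _ in range(d)]
    for i in range(n_steps + 1):
        s = -s_max + i * h
        weight = h * (0.5 if i in (0, n_steps) else 1.0)  # trapezoidal rule
        H = hessian_R([c0 * s * t for t in k_hat])
        for m in range(d):
            for n in range(d):
                D[m][n] -= 0.5 * weight * H[m][n]
    return D


def closed_form(k_hat, c0):
    """sqrt(2 pi) / (2 c0) * (identity - k_hat (x) k_hat) for the Gaussian R."""
    d = len(k_hat)
    pref = sqrt(2.0 * pi) / (2.0 * c0)
    return [[pref * ((1.0 if m == n else 0.0) - k_hat[m] * k_hat[n])
             for n in range(d)] for m in range(d)]


if __name__ == "__main__":
    k_hat, c0 = [1.0 / 3.0, -2.0 / 3.0, 2.0 / 3.0], 1.5
    D_num, D_exact = diffusion_matrix(k_hat, c0), closed_form(k_hat, c0)
    err = max(abs(D_num[m][n] - D_exact[m][n]) for m in range(3) for n in range(3))
    print("max deviation from closed form:", err)
```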
The purpose of this paper is to make the passage to the limit $\varepsilon,\delta\to 0$ rigorous for a mixture of states (and eliminate the consecutive limits $\varepsilon\to 0$ then $\delta\to 0$) and establish the self-averaging properties of moments of the form
$$s_\varepsilon^\delta(t,x) = \int W_\varepsilon^\delta(t,x,k)\,S(k)\,dk,$$
where $S\in L^2(\mathbb R^d)$ is a given test function. The assumption that $\varepsilon\ll\delta$ is formalized as follows. We let
$$K_\mu = \bigl\{(\varepsilon,\delta):\ \delta\ge|\ln\varepsilon|^{-2/3+\mu}\bigr\},$$
with $0<\mu<2/3$ and assume that $(\varepsilon,\delta)\in K_\mu$ for some $\mu\in(0,2/3)$. From now on, $\mu$ is a given fixed number in $(0,2/3)$.

1.2. The random medium. We make the following assumptions on the random field $c_1(x)$. Let $(\Omega,\mathcal C,\mathbb P)$ be a certain probability space, and let $E$ denote the expectation with respect to $\mathbb P$ and $\|\cdot\|_p$ denote the respective $L^p$ norm for any $p\in[1,+\infty]$. We suppose further that $c_1:\mathbb R^d\times\Omega\to\mathbb R$ is a measurable, strictly stationary, mean-zero random field, that is pathwise $C^4$-smooth and satisfies
$$D_i := \operatorname*{ess\,sup}_{\omega\in\Omega}\,|\nabla_x^i c_1(x;\omega)| < +\infty, \qquad i=0,1,\dots,4. \tag{1.12}$$
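The scale separation encoded in the set K_µ introduced before Sect. 1.2 is only logarithmic: δ may tend to zero, but no faster than a negative power of |ln ε|. A small sketch (the function names are illustrative) makes the size of the admissible δ concrete.

```python
from math import log


def delta_lower_bound(eps, mu):
    """Smallest delta allowed by K_mu for a given eps: |ln eps|^(-2/3 + mu)."""
    if not 0.0 < mu < 2.0 / 3.0:
        raise ValueError("mu must lie in (0, 2/3)")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie in (0, 1)")
    return abs(log(eps)) ** (-2.0 / 3.0 + mu)


def in_K_mu(eps, delta, mu):
    """True if the pair (eps, delta) belongs to K_mu."""
    return delta >= delta_lower_bound(eps, mu)


if __name__ == "__main__":
    mu = 1.0 / 3.0
    for eps in (1e-3, 1e-6, 1e-12, 1e-24):
        print(f"eps = {eps:g}: delta must be at least {delta_lower_bound(eps, mu):.4f}")
```

Even for a wave length as small as ε = 10⁻²⁴ the bound with µ = 1/3 is still of order 0.26, which illustrates how strongly the regime ε ≪ δ is enforced.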
We assume in addition that $c_1$ is exponentially φ-mixing. More precisely, for any $R>0$ we let $\mathcal C_R^i := \sigma\{c_1(x): |x|\le R\}$ and $\mathcal C_R^e := \sigma\{c_1(x): |x|\ge R\}$. We also define
$$\phi(\rho) := \sup\bigl[\,|\mathbb P(B)-\mathbb P(B|A)| :\ R>0,\ A\in\mathcal C_R^i,\ B\in\mathcal C_{R+\rho}^e\,\bigr],$$
for all $\rho>0$. We suppose that there exists a constant $C_1>0$ such that
$$\phi(\rho)\le 2e^{-C_1\rho}, \qquad \forall\,\rho>0. \tag{1.13}$$
We let also
$$R(y) = E[c_1(y)\,c_1(0)], \qquad y\in\mathbb R^d$$
be the covariance function of the field $c_1(\cdot)$ and note that (1.13) implies that there exists a constant $C_2>0$ such that
$$|\nabla_y^m R(y)|\le C_2e^{-|y|/C_2}, \qquad \forall\,y\in\mathbb R^d,\ m=0,\dots,4. \tag{1.14}$$
Finally we assume that $R\in C^\infty(\mathbb R^d)$; this condition will be used only to establish the hypoellipticity of (1.10). Notice that sufficiently regular random fields with finite correlation length satisfy the hypotheses of this section. The exponential φ-mixing assumption was used in [15] to analyze the solutions of Liouville equations with random coefficients. Their techniques lie at the core of our proof of the mixing properties presented in our main result, Theorem 1.1, below.

1.3. The main result. We assume that the initial Wigner transform $W_\varepsilon^\delta(0,x,k)$ is uniformly bounded in $L^2(\mathbb R^d\times\mathbb R^d)$ and
$$W_\varepsilon^\delta(0,x,k)\to W_0(x,k)\ \text{strongly in } L^2(\mathbb R^d\times\mathbb R^d)\ \text{as } K_\mu\ni(\varepsilon,\delta)\to 0. \tag{1.15}$$
We also assume that the limit $W_0\in C_c(\mathbb R^d\times\mathbb R^d)$ with a support that satisfies
$$\operatorname{supp}W_0(x,k)\subseteq X = \bigl\{(x,k):\ |x|\le C,\ C^{-1}\le|k|\le C\bigr\}. \tag{1.16}$$
Note that (1.15) may not hold for a pure state since $\|\tilde W_\varepsilon\|_2 = (2\pi\varepsilon)^{-d/2}\|f_\varepsilon\|_2^2$ [17]. We will later present examples where it does hold for a mixture of states. Furthermore, we assume that $W_0$ has the form
$$W_0(x,k) = u_+^0\,b_+\otimes b_++u_-^0\,b_-\otimes b_- \tag{1.17}$$
and let
$$\bar W(t,x,k) = \bar u_+(t,x,k)\,b_+(k)\otimes b_+(k)+\bar u_-(t,x,k)\,b_-(k)\otimes b_-(k). \tag{1.18}$$
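The norm identity for a pure state quoted at the start of this subsection explains why (1.15) fails without a mixture: the L² norm of the Wigner transform blows up like ε^(−d/2). It can be illustrated numerically in dimension d = 1. The sketch below uses the Gaussian wave packet f(x) = exp(−x²/2) exp(i k₀ x/ε), which is an illustrative choice and not an example from the paper, evaluates the defining integral (1.1) on a grid, and compares the resulting L² norm with (2πε)^(−1/2) ‖f‖₂² = (2ε)^(−1/2). Grid sizes are hand-picked for this particular f.

```python
from cmath import exp as cexp
from math import exp, pi, sqrt


def wigner_l2_norm(eps, k0=1.0):
    """L2 norm over (x, k) of the Wigner transform (1.1) of a Gaussian packet, d = 1."""
    def f(x):
        return exp(-0.5 * x * x) * cexp(1j * k0 * x / eps)

    hx, hk, hy = 0.25, 0.25 * eps, 0.25 / eps
    xs = [-5.0 + i * hx for i in range(41)]
    ks = [k0 + (i - 20) * hk for i in range(41)]
    ys = [(i - 48) * hy for i in range(97)]

    total = 0.0
    for x in xs:
        # f(x - eps*y/2) * conj(f(x + eps*y/2)) sampled on the y-grid
        prod = [f(x - 0.5 * eps * y) * f(x + 0.5 * eps * y).conjugate() for y in ys]
        for k in ks:
            w = sum(p * cexp(1j * k * y) for p, y in zip(prod, ys)) * hy / (2.0 * pi)
            total += abs(w) ** 2 * hx * hk
    return sqrt(total)


def predicted_norm(eps):
    """(2 pi eps)^(-1/2) * ||f||_2^2 with ||f||_2^2 = sqrt(pi)."""
    return sqrt(pi) / sqrt(2.0 * pi * eps)


if __name__ == "__main__":
    for eps in (0.5, 0.1):
        print(eps, wigner_l2_norm(eps), predicted_norm(eps))
```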
The functions $\bar u_\pm$ satisfy the Fokker-Planck equation (1.10) with initial data $u_\pm^0$ as in (1.17). The main result of this paper is the following theorem.

Theorem 1.1. Let us assume that the random field $c_1(x)$ satisfies the assumptions given in Sect. 1.2 and that the initial data $W_\varepsilon^\delta(0,x,k)$ satisfies (1.15) and (1.16). Let $S(k)\in L^2(\mathbb R^d)$ be a test function, and define the moments
$$s_\varepsilon^\delta(t,x) = \int W_\varepsilon^\delta(t,x,k)\,S(k)\,dk \quad\text{and}\quad \bar s(t,x) = \int\bar W(t,x,k)\,S(k)\,dk,$$
where $\bar W$ is given by (1.18). Then for each $t>0$ we have
$$E\int|s_\varepsilon^\delta(t,x)-\bar s(t,x)|^2\,dx\to 0 \tag{1.19}$$
as $K_\mu\ni(\varepsilon,\delta)\to 0$.

Theorem 1.1 means that the moments $s_\varepsilon^\delta$ converge to a deterministic limit. The main application of Theorem 1.1 we have in mind is the mathematical modeling of refocusing in the time-reversal experiments we present in Sect. 2. Our results may be generalized in a fairly straightforward manner to other wave equations that may be written in the form (1.5), which include acoustic equations with variable density and compressibility, electromagnetic and elastic equations [20].

The paper is organized as follows. The mathematical framework of the time-reversal experiment as well as the main result concerning the self-averaging properties of the time reversed signal, Theorem 2.1, are presented in Sect. 2. Section 3 contains the derivation of the Liouville equations in the $L^2$-framework. Some straightforward but tedious calculations from this section are presented in Appendices A, B and C. The limit theorem for the two-point motion along the characteristics of the Liouville equations, Theorem 4.4, is presented in Sect. 4. Theorem 1.1 follows from this result. The proof of Theorem 4.4 is contained in Sect. 5.

2. Refocusing in the Time-Reversal Experiments

2.1. Mathematical formulation of the time-reversal experiments. Refocusing of time-reversed acoustic waves is a remarkable mathematical property of wave propagation in complex media that has been discovered and intensively studied by experimentalists in the last decade (see [12, 16] and also [8] for further references to the physical literature). A typical experiment may be described schematically as follows. A point source emits a localized signal. The signal is recorded in time by an array of receivers. It is then reemitted into the medium reversed in time so that the part of the signal recorded last is reemitted first and vice versa. There are two striking experimental observations. First
the repropagated signal tightly refocuses at the location of the original source when the medium is sufficiently heterogeneous even with a recording array of small size. This is to be compared with the extremely poor refocusing that would occur if the heterogeneous medium were replaced by a homogeneous medium. Second, the repropagated signal is self-averaging. This means that the refocused signal is essentially independent of the realization of a random medium with given statistics, assuming that we model the heterogeneous medium as a random medium.

The first mathematical study of a time-reversal experiment has been performed in [11] in the framework of one-dimensional layered random media. The one-dimensional case has been further studied in [21], and a three-dimensional layered medium was considered in [13]. The time-reversal experiments in an ergodic domain have been analyzed mathematically in [6]. The basic ideas that explain the role of randomness in the refocusing beyond the one-dimensional case were first outlined in [8] in the parabolic approximation of the wave equation, that was further analyzed in [2, 19]. Time-reversal in the general framework of multidimensional wave equations in random media has been studied formally in [3, 4]. One of the purposes of this paper is to present the rigorous proof of some of the results announced in [3].

The re-transmission scheme introduced in [3, 4] is as follows. Consider the system of acoustic equations (1.4) (or, equivalently, (1.5) for the pressure $p$ and the acoustic velocity $u(t,x)$). The initial data for (1.5) is assumed to be localized in space:
$$v_\varepsilon(0,x) = S_0\Bigl(\frac{x-x_0}{\varepsilon}\Bigr) = \Bigl(-\nabla\phi_0\Bigl(\frac{x-x_0}{\varepsilon}\Bigr),\ \frac{1}{c_\delta}\dot\phi_0\Bigl(\frac{x-x_0}{\varepsilon}\Bigr)\Bigr). \tag{2.1}$$
Here $x_0\in\mathbb R^d$ is the location of the original source, and $S_0\in\mathcal S(\mathbb R^d)$ is the source shape function. The small parameter $\varepsilon\ll 1$ measures the spatial localization of the source. The signal $v_\varepsilon^\delta(t,T)$ is recorded at some time $t=T$, processed at the recording array and re-emitted into the medium. The new signal $\tilde v_\varepsilon^\delta$ is the solution of (1.5) on the time interval $T\le t\le 2T$ with the Cauchy data
$$\tilde v_\varepsilon^\delta(T,x) = \Gamma\bigl[f_\varepsilon*[\chi v_\varepsilon^\delta(T)](x)\bigr]\chi(x). \tag{2.2}$$
The initial data (2.2) reflects the process of recording of the signal at the array and its smoothing by the recording process. The kernel $f_\varepsilon(x) = \varepsilon^{-d}f(x/\varepsilon)$ represents the smoothing. The array function $\chi(x)$ is either the characteristic function of the set of the receivers, or some non-uniform function supported on this set. We will assume for simplicity that $f(|y|)$ is radially symmetric, and, moreover,
$$\chi\in C_c(\mathbb R^d), \qquad \hat f\in C_c(\mathbb R^d), \qquad \operatorname{supp}\hat f(k)\subseteq\bigl\{0<C^{-1}\le|k|\le C<\infty\bigr\}, \tag{2.3}$$
where
$$\hat f(k) = \int e^{-ik\cdot y}f(y)\,dy$$
is the Fourier transform of $f$. The matrix $\Gamma$ corresponds to the linear transformation of the signal. The pure time-reversal corresponds to keeping pressure unchanged but reversing the acoustic velocity so that $\Gamma = \Gamma_0 := \operatorname{diag}(-1,-1,-1,1)$. However, this is only one possible transformation, and while we restrict to the above choice our results may be extended to more general matrices $\Gamma$, or even allow $\Gamma$ to be a pseudo-differential operator of the form $\Gamma(x,\varepsilon D)$.
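The pure time-reversal matrix diag(−1, −1, −1, 1) interacts in a simple way with the matrices of the hyperbolic system (1.5): it commutes with the diagonal matrix A_δ and anticommutes with each D^j from (1.6), and its square is the identity. These are the commutation relations used later in the proof of Lemma 2.2. A minimal check in plain Python for d = 3 follows; the value 1.7 stands in for c_δ(x) and, like the helper names, is an arbitrary illustrative choice.

```python
def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def negate(A):
    return [[-x for x in row] for row in A]


def D(j, d=3):
    """D^j = e_j (x) e_{d+1} + e_{d+1} (x) e_j, with 0-based index j."""
    M = [[0.0] * (d + 1) for _ in range(d + 1)]
    M[j][d] = 1.0
    M[d][j] = 1.0
    return M


GAMMA0 = diag([-1.0, -1.0, -1.0, 1.0])   # reverse velocity, keep pressure
A_DELTA = diag([1.0, 1.0, 1.0, 1.7])     # 1.7 stands in for c_delta(x)

if __name__ == "__main__":
    assert matmul(GAMMA0, A_DELTA) == matmul(A_DELTA, GAMMA0)
    for j in range(3):
        assert matmul(GAMMA0, D(j)) == negate(matmul(D(j), GAMMA0))
    assert matmul(GAMMA0, GAMMA0) == diag([1.0] * 4)
    print("Gamma0 commutes with A_delta and anticommutes with every D^j")
```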
The re-propagated field near the source at time t = 2T is defined as a function of the local coordinate ξ and of the source location x0 : vεδ,B (ξ ; x0 ) = v˜ εδ (2T , x0 + εξ ). 2.2. The re-propagated signal and the Wigner transform. Let us assume that the random field c1 satisfies the assumptions of Theorem 1.1 outlined in Sect. 1.2. Then Theorem 1.1 implies the following result. Theorem 2.1. Under the assumptions made above the re-propagated field vεδ,B (ξ , x0 ) converges as Kµ (ε, δ) → 0, vB (ξ , x0 ) = eik·ξ [u+ (T , x0 , k)Sˆ0 (k), b− (k)b+ (k) dk + u− (T , x0 , k)Sˆ0 (k), b+ (k)b− (k)] (2π )d in the sense that
sup E
ξ ∈Rd
|vεδ,B (ξ , x0 ) − vB (ξ , x0 )|2 dx0 → 0.
(2.4)
The functions $u_\pm(t,x,k)$ are the solutions of the Fokker-Planck equation (1.10) with initial data $u_\pm(0,x,k) = |\chi(x)|^2 \hat f(k)$. The proof of Theorem 2.1 is based on Theorem 1.1 and a representation of the re-propagated signal in terms of the Wigner transform of a mixture of solutions of the acoustic wave equations. The latter arises as follows. Let $Q^\delta_\varepsilon(t,x;q)$ be the matrix-valued solution of (1.5) with initial data
$$Q^\delta_\varepsilon(0,x;q) = \chi(x)e^{iq\cdot x/\varepsilon} I, \tag{2.5}$$
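where $I$ is the $(d+1)\times(d+1)$ identity matrix, $\chi(x)$ is the array function, and $q \in \mathbb{R}^d$ is a fixed vector that plays the role of the “state” of the initial data. Physically, $Q^\delta_\varepsilon$ describes the evolution of a wave that is emitted by the recorders-transducers with a wave vector $q$. The Wigner transform of the family $Q^\delta_\varepsilon(t,x;q)$ is
$$\tilde W^\delta_\varepsilon(t,x,k;q) = \int e^{ik\cdot y}\, Q^\delta_\varepsilon\Big(t, x - \frac{\varepsilon y}{2}; q\Big) Q^{\delta*}_\varepsilon\Big(t, x + \frac{\varepsilon y}{2}; q\Big)\frac{dy}{(2\pi)^d}. \tag{2.6}$$
The corresponding “mixed” Wigner transform is
$$W^\delta_\varepsilon(t,x,k) = \int \tilde W^\delta_\varepsilon(t,x,k;q)\hat f(q)\,dq. \tag{2.7}$$
Then the re-propagated signal is described as follows in terms of $W^\delta_\varepsilon$.

Lemma 2.2. The re-propagated signal may be expressed as
$$v^{\delta,B}_\varepsilon(\xi, x_0) = \int e^{ik\cdot(\xi-y)}\, W^\delta_\varepsilon\Big(T, x_0 + \frac{\varepsilon(\xi+y)}{2}, k\Big)\Gamma_0 S_0(y)\frac{dk\,dy}{(2\pi)^d}. \tag{2.8}$$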
88
G. Bal, T. Komorowski, L. Ryzhik
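Proof. Let $G(t,x;y)$ be the Green's matrix of (1.5), that is, the solution of (1.5) with the initial data $G(0,x;y) = I\delta(x-y)$. Then the signal arriving at the recorders-transducers array is
$$v^\delta_\varepsilon(T,x) = \int G(T,x;y)v^\delta_\varepsilon(0,y)\,dy = \int G(T,x;y)S_0\Big(\frac{y-x_0}{\varepsilon}\Big)dy$$
and the re-emitted signal is
$$\tilde v^\delta_\varepsilon(T,z) = \int \Gamma_0 f_\varepsilon(z-z')\chi(z)\chi(z')v^\delta_\varepsilon(T,z')\,dz'.$$
Therefore we obtain
$$v^{\delta,B}_\varepsilon(\xi,x_0) = \int G(T,x_0+\varepsilon\xi;z)\tilde v^\delta_\varepsilon(T,z)\,dz = \int G(T,x_0+\varepsilon\xi;z)\,\Gamma_0 f\Big(\frac{z-z'}{\varepsilon}\Big)\chi(z)\chi(z')G(T,z';y)\,S_0\Big(\frac{y-x_0}{\varepsilon}\Big)\frac{dz\,dz'\,dy}{\varepsilon^d}. \tag{2.9}$$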
However, we also have
$$\Gamma_0 G(t,x;y)\Gamma_0 = G^*(t,y;x). \tag{2.10}$$
This is seen as follows: a solution of (1.5) satisfies
$$v(t,x) = \int G(t-s,x;y)v(s,y)\,dy$$
for all $0 \le s \le t$. Differentiating the above equation with respect to $s$, using (1.5) for $v(s,y)$ and integrating by parts, we obtain
$$0 = \int\Big[-\frac{\partial G(t-s,x;y)}{\partial t} + \frac{\partial}{\partial y_j}\big(G(t-s,x;y)A_\delta(y)\big)D^jA_\delta(y)\Big]v(s,y)\,dy.$$
Passing to the limit $s \to 0$ and using the fact that the initial data $v_0(y)$ is arbitrary, we obtain
$$\frac{\partial G(t,y;x)}{\partial t} - \frac{\partial}{\partial x_j}\big(G(t,y;x)A_\delta(x)\big)D^jA_\delta(x) = 0. \tag{2.11}$$
Furthermore, the matrix $G^*(t,x;y)$ satisfies
$$\frac{\partial G^*(t,x;y)}{\partial t} + \frac{\partial}{\partial x_j}\big(G^*(t,x;y)A_\delta(x)\big)D^jA_\delta(x) = 0.$$
Multiplying (2.11) by $\Gamma_0$ on the left and on the right, and using the commutation relations
$$\Gamma_0 A_\delta = A_\delta\Gamma_0, \qquad \Gamma_0 D^j = -D^j\Gamma_0, \tag{2.12}$$
we deduce (2.10). Then, since $\Gamma_0^2 = I$, (2.9) may be re-written as
$$v^{\delta,B}_\varepsilon(\xi,x_0) = \int G(T,x_0+\varepsilon\xi;z)\chi(z)e^{iq\cdot z/\varepsilon}\chi(z')e^{-iq\cdot z'/\varepsilon}\,G^*(T,x_0+\varepsilon y;z')\,\Gamma_0\hat f(q)S_0(y)\frac{dz\,dz'\,dy\,dq}{(2\pi)^d} = \int Q^\delta_\varepsilon(T,x_0+\varepsilon\xi;q)\,Q^{\delta*}_\varepsilon(T,x_0+\varepsilon y;q)\hat f(q)\,\Gamma_0 S_0(y)\frac{dy\,dq}{(2\pi)^d}, \tag{2.13}$$
and (2.8) follows.
The following lemma allows us to drop the term of order $\varepsilon$ in the argument of $W^\delta_\varepsilon$ in (2.8).

Lemma 2.3. Let us define
$$\tilde v^{\delta,B}_\varepsilon(\xi;x_0) = \int e^{ik\cdot(\xi-y)}\,W^\delta_\varepsilon(T,x_0,k)\,\Gamma_0 S_0(y)\frac{dk\,dy}{(2\pi)^d}.$$
There exists a deterministic function $C(\varepsilon,\delta)$ so that
$$\sup_{\xi}\big\|v^{\delta,B}_\varepsilon - \tilde v^{\delta,B}_\varepsilon\big\|_{L^2_{x_0}} \le C(\varepsilon,\delta) \tag{2.14}$$
and $C(\varepsilon,\delta) \to 0$ as $K_\mu \ni (\varepsilon,\delta) \to 0$.

The proof of Lemma 2.3 is presented in Appendix A. Note that Theorem 1.1 may be applied directly to the moment
$$\tilde v^{\delta}_\varepsilon(\xi;x_0) = \int e^{ik\cdot\xi}\,W^\delta_\varepsilon(T,x_0,k)\,\Gamma_0\hat S_0(k)\frac{dk}{(2\pi)^d}$$
and the conclusion of Theorem 2.1 follows.

3. The High Frequency Analysis

In this section we study the deterministic high-frequency behavior of the Wigner transform and estimate the error between the Wigner transform and the solution of the Liouville equations. It is well known [14, 17, 20] that in the high frequency regime the weak limit of the Wigner transform as $\varepsilon \to 0$, with $\delta > 0$ fixed, is described by the classical Liouville equations in the phase space. Here, we do not pass to the limit $\varepsilon \to 0$ at $\delta$ fixed but rather control the error introduced by the semi-classical approximation. As explained in the introduction, this is possible because we are dealing with the Wigner transform of a mixture of states that may have strong limits [17] rather than the Wigner transform of pure states, which converges only weakly.

3.1. Convergence on the initial data. We first show that the assumptions on the convergence of the initial data in Theorem 1.1 are not purely academic, and in particular are satisfied in the time-reversal application. We note that the $L^2$-norm of a pure Wigner transform $\tilde W_\varepsilon(t,x,k;q)$ of a single wave function, such as (2.6), blows up as $\varepsilon \to 0$, because
$$\|\tilde W_\varepsilon(t;\zeta)\|_{L^2(\mathbb{R}^d\times\mathbb{R}^d)} := \Big(\int \mathrm{Tr}\big[\tilde W_\varepsilon(t,x,k;\zeta)\tilde W^*_\varepsilon(t,x,k;\zeta)\big]dx\,dk\Big)^{1/2} = (2\pi\varepsilon)^{-d/2}\|Q_\varepsilon(t,q)\|^2_{L^2(\mathbb{R}^d)} = (2\pi\varepsilon)^{-d/2}\|\chi\|^2_{L^2(\mathbb{R}^d)}. \tag{3.1}$$
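The blow-up of the pure-state norm can be checked numerically. The following is a minimal sketch in dimension $d = 1$, assuming a scalar Gaussian profile $\chi(x) = e^{-x^2/2}$ (an illustrative choice, not the array function of the paper). It uses Plancherel in $k$, so the phase $e^{iqx/\varepsilon}$ drops out, and compares the result with the squared form of (3.1), $\|\tilde W_\varepsilon\|^2_{L^2} = (2\pi\varepsilon)^{-d}\|\chi\|^4_{L^2}$. The function names are hypothetical.

```python
import numpy as np

def pure_state_wigner_norm_sq(eps, n=1201, half_width=12.0):
    """Squared L2 norm of the d = 1 Wigner transform of chi(x) exp(i q x / eps).

    By Plancherel in k,
        int |W|^2 dk = (2 pi)^{-1} int |chi(x - eps y/2)|^2 |chi(x + eps y/2)|^2 dy,
    so the wave vector q does not enter. chi(x) = exp(-x^2/2) is an assumption.
    """
    x = np.linspace(-half_width, half_width, n)
    y = np.linspace(-half_width, half_width, n) / eps  # the y-support grows like 1/eps
    dx, dy = x[1] - x[0], y[1] - y[0]
    X, Y = np.meshgrid(x, y, indexing="ij")
    chi = lambda s: np.exp(-0.5 * s**2)
    integrand = chi(X - 0.5 * eps * Y) ** 2 * chi(X + 0.5 * eps * Y) ** 2
    # Riemann sum; the integrand is negligible at the grid boundary
    return integrand.sum() * dx * dy / (2.0 * np.pi)

def predicted_norm_sq(eps):
    """(2 pi eps)^{-1} ||chi||_{L2}^4, with ||chi||_{L2}^2 = sqrt(pi) for the Gaussian."""
    return np.pi / (2.0 * np.pi * eps)

for eps in (0.2, 0.1, 0.05):
    print(eps, pure_state_wigner_norm_sq(eps), predicted_norm_sq(eps))
```

Halving $\varepsilon$ doubles the squared norm in this one-dimensional example, which is the $\varepsilon^{-d}$ growth that rules out strong $L^2$ convergence for a single pure state.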
Therefore (1.15) may not hold for a pure state. Two examples when assumption (1.15) holds are given by the following lemma, which may be verified by a straightforward calculation. The first one arises when the initial data is random, and the second comes from the time-reversal application.

Lemma 3.1. Assumption (1.15) is satisfied in the following two cases:

(1) Statistical averaging: the initial data is $v^\varepsilon_0(x;\zeta) = \psi(x)V(x/\varepsilon;\zeta)$, where $V(y;\zeta)$ is a mean zero, scalar, spatially homogeneous random process with a rapidly decaying two-point correlation function $R(z)$:
$$E\{V(y)V(y+z)\} = \int V(y;\zeta)V(y+z;\zeta)\,d\mu(\zeta) = R(z) \in L^2(\mathbb{R}^d),$$
and $\psi(x) \in C_c(\mathbb{R}^d)$. The limit Wigner distribution is given by $W_0(x,k) = |\psi(x)|^2\tilde R(k)$, where $\tilde R(k)$ is the inverse Fourier transform of $R(y)$.

(2) Smoothing of oscillations: the initial data is $v^\varepsilon_0(x;\zeta) = \psi(x)e^{i\zeta\cdot x/\varepsilon}$, where $\psi(x) \in C_c(\mathbb{R}^d)$. The measure $\mu$ is $d\mu(\zeta) = g(\zeta)\,d\zeta$, $\zeta \in \mathbb{R}^d$, and $g \in L^2(\mathbb{R}^d)$. The limit Wigner distribution is $W_0(x,k) = |\psi(x)|^2 g(k)$.

Proof. We only verify case (2), the other case being similar:
$$W^0_\varepsilon(x,k) = \int e^{ik\cdot y}\,\psi\Big(x - \frac{\varepsilon y}{2}\Big)\psi^*\Big(x + \frac{\varepsilon y}{2}\Big)\hat g(y)\frac{dy}{(2\pi)^d}$$
so that
$$\|W^0_\varepsilon - W_0\|_2^2 = \int\Big|\psi\Big(x - \frac{\varepsilon y}{2}\Big)\psi^*\Big(x + \frac{\varepsilon y}{2}\Big) - |\psi(x)|^2\Big|^2|\hat g(y)|^2\frac{dx\,dy}{(2\pi)^d} = \int I_\varepsilon(y)|\hat g(y)|^2\frac{dy}{(2\pi)^d}. \tag{3.2}$$
However, we have $|I_\varepsilon(y)| \le 4\|\psi\|^4_{L^4}$ and
$$I_\varepsilon(y) = \int\Big|\psi\Big(x - \frac{\varepsilon y}{2}\Big)\psi^*\Big(x + \frac{\varepsilon y}{2}\Big) - |\psi(x)|^2\Big|^2dx \to 0$$
as $\varepsilon \to 0$, pointwise in $y$, since $\psi \in C_c(\mathbb{R}^d)$. Therefore $\|W^0_\varepsilon - W_0\|_2 \to 0$ by the Lebesgue dominated convergence theorem.

Note that if $g(\zeta)$ and $\psi$ in part (2) of Lemma 3.1 are sufficiently regular, then $\|W^0_\varepsilon - W_0\|_2 = O(\varepsilon)$, so that one may get the order of convergence in (1.15).

3.2. Approximation by the Liouville equations. We now estimate directly the error between the mixed Wigner transform and its semi-classical approximation. The dispersion matrix $P^\delta_0(x,k) = ic_\delta(x)k_jD^j$ may be diagonalized as
$$-iP^\delta_0(x,k) = \sum_{q=0}^{2}\lambda^\delta_q(x,k)\Pi_q(k), \qquad \sum_{q=0}^{2}\Pi_q(k) = I. \tag{3.3}$$
Here $\Pi_q$ is the projection matrix onto the eigenspace corresponding to the eigenvalue $\lambda^\delta_q$. Notice that the eigenspaces are independent of the spatial position $x$, hence of the parameter $\delta$; see (1.7)-(1.8).
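As we have mentioned before, for a fixed $\delta > 0$ the Wigner transform $W^\delta_\varepsilon(t,x,k)$ converges weakly as $\varepsilon \to 0$ to its semi-classical limit $U^\delta(t,x,k)$ given by
$$U^\delta(t,x,k) = \sum_q u^\delta_q(t,x,k)\Pi_q(k). \tag{3.4}$$
The functions $u^\delta_q$ satisfy the Liouville equations
$$\frac{\partial u^\delta_q}{\partial t} + \nabla_k\lambda^\delta_q\cdot\nabla_x u^\delta_q - \nabla_x\lambda^\delta_q\cdot\nabla_k u^\delta_q = 0 \tag{3.5}$$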
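with initial data $u^\delta_q(0,x,k) = \mathrm{Tr}\,\Pi_qW_0(x,k)\Pi_q$. Our goal is to estimate the difference between $W^\delta_\varepsilon$ and $U^\delta$ in $L^2(\mathbb{R}^d\times\mathbb{R}^d)$. Let us denote by $\gamma^\delta_q(x,k)$ the largest eigenvalue of the matrix $(F^\delta_qF^{\delta*}_q)^{1/2}$, where
$$F^\delta_q = \begin{pmatrix} -\dfrac{\partial^2\lambda^\delta_q}{\partial k_i\partial k_j} & -\dfrac{\partial^2\lambda^\delta_q}{\partial k_i\partial x_j} \\[2ex] \dfrac{\partial^2\lambda^\delta_q}{\partial x_i\partial x_j} & \dfrac{\partial^2\lambda^\delta_q}{\partial x_i\partial k_j} \end{pmatrix}.$$
Note that (1.12) implies that $\gamma^\delta_1(x,k) = \gamma^\delta_2(x,k) = \gamma^\delta(x,k)$, while $\gamma_0 = 0$. The initial data $u^0_q$ is supported on a compact set $S$ because $W_0$ is (see (1.16)). Then the set
$$S = \bigcup_{t\ge 0,\ \delta\in(0,1]}\mathrm{supp}\,u^\delta_q(t,x,k)$$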
is bounded because the speed $c_\delta(x)$ is uniformly bounded from above and below for $\delta$ sufficiently small (1.12). Therefore we have $\gamma^\delta(x,k) \le C/\delta^{3/2}$ with a deterministic constant $C > 0$. We denote $\bar\gamma_\delta = \sup_S\gamma^\delta(x,k)$. We have the following approximation theorem.

Theorem 3.2. Let the acoustic speed $c_\delta(x)$ be of the form (1.3) and satisfy assumptions (1.12). We assume that the Wigner transform $W^\delta_\varepsilon$ satisfies (1.15) and that (1.16) holds. Moreover, we assume that the initial limit Wigner transform $W_0$ is of the form
$$W_0(x,k) = \sum_q u^0_q(x,k)\Pi_q(k). \tag{3.6}$$
Let $U^\delta(t,x,k) = \sum_p u^\delta_p(t,x,k)\Pi_p(k)$, where the functions $u^\delta_p$ satisfy the Liouville equations (3.5) with initial data $u^0_p(x,k)$. Then we have
$$\|W^\delta_\varepsilon(t,x,k) - U^\delta(t,x,k)\|_2 \le C(\delta)\big[\varepsilon\|W_0\|_{H^2}e^{2\bar\gamma_\delta t} + \varepsilon^2\|W_0\|_{H^3}e^{3\bar\gamma_\delta t}\big] + \|W^\delta_\varepsilon(0) - W_0\|_2, \tag{3.7}$$
where $C(\delta)$ is a rational function of $\delta$ with deterministic coefficients that may depend on the constant $C > 0$ in the bound (1.16) on the support of $W_0$.
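To get a feel for the time scale implied by the first term in (3.7), one can solve $\varepsilon e^{2\bar\gamma_\delta t} = \text{tol}$ for $t$. The sketch below does only that; it sets the prefactor $C(\delta)\|W_0\|_{H^2}$ to one, which is an assumption made purely for illustration, and the function name and the tolerance parameter are hypothetical.

```python
import math

def validity_time(eps, gamma_bar, tol=1e-2):
    """Largest t with eps * exp(2 * gamma_bar * t) <= tol.

    Prefactors in the bound (3.7) are set to one (illustrative assumption).
    Returns 0.0 when eps already exceeds the tolerance.
    """
    if eps >= tol:
        return 0.0
    return math.log(tol / eps) / (2.0 * gamma_bar)

# The admissible time grows only logarithmically as eps decreases.
for eps in (1e-3, 1e-6, 1e-9):
    print(eps, validity_time(eps, gamma_bar=1.0))
```

The logarithmic growth in $1/\varepsilon$ and the inverse dependence on $\bar\gamma_\delta$ are the two features that matter for the discussion of the range of validity of the semi-classical approximation.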
Theorem 3.2 shows that the semi-classical approximation is valid for times $T \ll |\ln\varepsilon|/\bar\gamma_\delta$. This is reminiscent of the Ehrenfest time of validity of the semi-classical approximation in quantum mechanics; see [5, 9] for recent mathematical results in this direction for the Schrödinger operators. The pre-factor constants on the right side of (3.7) are not optimal but sufficient for the purposes of our analysis. The assumption that initially $W_0$ has no cross terms between different eigenspaces (indices $p \ne q$) is necessary in general for the Liouville equation to provide an approximation to $W^\delta_\varepsilon$ in the strong sense. This may be seen on the simple example of the solution $u_\varepsilon(t,x) = ae^{i(q\cdot x - ct)/\varepsilon} + be^{i(q\cdot x + ct)/\varepsilon}$ of the wave equation $u_{tt} - c^2u_{xx} = 0$ with a constant speed $c$. The cross-terms in the Wigner distribution
$$W_\varepsilon = \big[|a|^2 + |b|^2 + ab^*e^{-2ict/\varepsilon} + a^*be^{2ict/\varepsilon}\big]\delta(k-q)$$
vanish only in the weak sense as a function of $t$ but not strongly. The Wigner distribution that arises in the time-reversal application has initial data that is described by part (2) of Lemma 3.1: $W_0(x,k) = |\chi(x)|^2\hat f(k)I$, and satisfies the assumption (3.6) with $u^0_q(x,k) = |\chi(x)|^2\hat f(k)$ for all eigenspaces because of the second equation in (3.3). The error introduced by the replacement of the initial data in (3.7) in that case is given by (3.2) and is $O(\varepsilon)$ provided that $\chi$ and $f$ are sufficiently regular. The proof of Theorem 3.2 is quite straightforward though tedious. We first obtain the evolution equation for $W^\delta_\varepsilon$ in Sect. 3.3, and show that it preserves the $L^2$-norm. This allows us to replace the initial data in the equation for $W^\delta_\varepsilon$ by $W_0$ at the expense of the last term in (3.7). We obtain the Liouville equations (3.5) in Sect. 3.4 and estimate the right side of (3.7) in terms of the $H^3$-norm of its solution. Finally, in Appendix C we obtain the necessary estimates for the solution of the Liouville equation.

3.3. The evolution equation for the Wigner transform.
The $L^2$-norm of the Wigner transform $\tilde W(t,x,k;\zeta)$ of a pure state, that is, for a fixed $\zeta$, is preserved in time, as follows from the preservation of the $L^2$-norm of solutions of (1.5). We obtain now an evolution equation for the Wigner transform $W_\varepsilon$ of mixed states and show that its $L^2$-norm is also preserved. It is convenient to define the skew-symmetric matrix symbol
$$P^\delta_\varepsilon(x,k) = P^\delta_0(x,k) + \varepsilon P^\delta_1(x), \tag{3.8}$$
where $P^\delta_0$ is defined by (1.7) and the symbol $P^\delta_1$ depends only on $x$:
$$P^\delta_1(x) = \frac12 A_\delta(x)D^j\frac{\partial A_\delta}{\partial x_j}(x) - \frac12\frac{\partial A_\delta}{\partial x_j}(x)D^jA_\delta(x) = \frac12\frac{\partial c_\delta}{\partial x_j}(x)\big[e_j\otimes e_{d+1} - e_{d+1}\otimes e_j\big]. \tag{3.9}$$
The latter equality follows from (1.6) and calculations of the form
$$D^j\frac{\partial A_\delta}{\partial x_j}(x) = \frac{\partial c_\delta}{\partial x_j}(x)\,e_j\otimes e_{d+1}. \tag{3.10}$$
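The following lemma describes the evolution of the Wigner transform $W^\delta_\varepsilon$.

Lemma 3.3. The Wigner transform $W^\delta_\varepsilon(t,x,k)$ satisfies the evolution equation
$$\varepsilon\frac{\partial W^\delta_\varepsilon}{\partial t} + L^\delta_\varepsilon W^\delta_\varepsilon = 0 \tag{3.11}$$
with initial data $W^\delta_\varepsilon(0,x,k)$. The operator $L^\delta_\varepsilon$ is given by
$$L^\delta_\varepsilon f(x,k) = \int\big[P^\delta_\varepsilon(y,q)e^{i\phi}f(z,p) - f(z,p)e^{-i\phi}P^\delta_\varepsilon(y,q)\big]\frac{dz\,dp\,dy\,dq}{(\pi\varepsilon)^{2d}}, \tag{3.12}$$
where $\phi(x,z,k,p,y,q) = \frac{2}{\varepsilon}\big((p-k)\cdot y + (q-p)\cdot x + (k-q)\cdot z\big)$. The integral of the trace and the $L^2$-norm of the Wigner transform $W_\varepsilon$ are preserved:
$$\int\mathrm{Tr}\,W^\delta_\varepsilon(t,x,k)\,dx\,dk = \int\mathrm{Tr}\,W^\delta_\varepsilon(0,x,k)\,dx\,dk \tag{3.13}$$
and
$$\int\mathrm{Tr}\big[W^\delta_\varepsilon(t,x,k)W^{\delta*}_\varepsilon(t,x,k)\big]dx\,dk = \int\mathrm{Tr}\big[W^\delta_\varepsilon(0,x,k)W^{\delta*}_\varepsilon(0,x,k)\big]dx\,dk. \tag{3.14}$$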
This lemma is verified by a direct calculation that we present for the convenience of the reader in Appendix B. Note that the solution of (3.11) with self-adjoint initial data remains self-adjoint and the $L^2$-norm is preserved. Therefore, we have the following corollary.

Corollary 3.4. Let $W_0(x,k)$ be a strong limit of $W_\varepsilon(0)$ in $L^2$, which exists by assumption (1.15). Then the solutions $W^\delta_\varepsilon(t,x,k)$ and $\bar W^\delta_\varepsilon(t,x,k)$ of (3.11) with initial conditions $W^0_\varepsilon(x,k)$ and $W_0(x,k)$, respectively, satisfy
$$\|\bar W^\delta_\varepsilon(t) - W^\delta_\varepsilon(t)\|_2 = \|W^0_\varepsilon - W_0\|_2 \to 0 \quad\text{as } \varepsilon \to 0.$$
This shows that in the analysis of (3.11), we can replace strongly converging initial conditions by their limit, and then consider the limit of $\bar W_\varepsilon(t,x,k)$ as $\varepsilon \to 0$ with fixed initial conditions. This is done in the following section.
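3.4. Derivation of the Liouville equations. We consider in this section the solution $\bar W^\delta_\varepsilon(t,x,k)$ of the evolution equation (3.11) with fixed initial data $W_0(x,k)$ and show that it may be approximated by the solution of the Liouville equation. We split the operator $L^\delta_\varepsilon = L^{\delta,0}_\varepsilon + \varepsilon L^{\delta,1}_\varepsilon$, where
$$L^{\delta,j}_\varepsilon f(x,k) = \int\big[P^\delta_j(y,q)e^{i\phi}f(z,p) - f(z,p)e^{-i\phi}P^\delta_j(y,q)\big]\frac{dz\,dp\,dy\,dq}{(\pi\varepsilon)^{2d}}, \quad j = 0,1,$$
and the symbols $P^\delta_j$ are given by (1.7) and (3.9). The operator $L^{\delta,0}_\varepsilon$ is given explicitly by
$$L^{\delta,0}_\varepsilon f(x,k) = \int e^{i(k-p)\cdot y}\Big[c_\delta\Big(x - \frac{\varepsilon y}{2}\Big)ip_jD^jf(x,p) - c_\delta\Big(x + \frac{\varepsilon y}{2}\Big)f(x,p)ip_jD^j\Big]\frac{dp\,dy}{(2\pi)^d} + \frac{\varepsilon}{2}\int e^{i(k-p)\cdot y}\Big[D^j\frac{\partial}{\partial x_j}\Big(c_\delta\Big(x - \frac{\varepsilon y}{2}\Big)f(x,p)\Big) + \frac{\partial}{\partial x_j}\Big(c_\delta\Big(x + \frac{\varepsilon y}{2}\Big)f(x,p)\Big)D^j\Big]\frac{dp\,dy}{(2\pi)^d} = L^{01}_{\varepsilon,\delta} + L^{02}_{\varepsilon,\delta}.$$
We recast the operator $L^{01}_{\varepsilon,\delta}$ as
$$L^{01}_{\varepsilon,\delta}f(x,k) = c_\delta(x)\big[ik_jD^jf(x,k) - f(x,k)ik_jD^j\big] - \frac{\varepsilon}{2}\frac{\partial c_\delta(x)}{\partial x_m}\frac{\partial}{\partial k_m}\big[k_jD^jf(x,k) + f(x,k)k_jD^j\big] + \varepsilon R^{01}_{\varepsilon,\delta}f$$
with the correction term
$$R^{01}_{\varepsilon,\delta}f(x,k) = \frac{1}{\varepsilon}\int e^{i(k-p)\cdot y}\Big\{\Big[c_\delta\Big(x - \frac{\varepsilon y}{2}\Big) - c_\delta(x) + \frac{\varepsilon}{2}y\cdot\nabla c_\delta(x)\Big]ip_jD^jf(x,p) - \Big[c_\delta\Big(x + \frac{\varepsilon y}{2}\Big) - c_\delta(x) - \frac{\varepsilon}{2}y\cdot\nabla c_\delta(x)\Big]f(x,p)ip_jD^j\Big\}\frac{dp\,dy}{(2\pi)^d}.$$
Similarly, we have
$$L^{02}_{\varepsilon,\delta}f(x,k) = \frac{\varepsilon}{2}D^j\frac{\partial}{\partial x_j}\big(c_\delta(x)f(x,k)\big) + \frac{\varepsilon}{2}\frac{\partial}{\partial x_j}\big(c_\delta(x)f(x,k)\big)D^j + \varepsilon R^{02}_{\varepsilon,\delta}f$$
with
$$R^{02}_{\varepsilon,\delta}f(x,k) = \frac12\int e^{i(k-p)\cdot y}\Big\{D^j\frac{\partial}{\partial x_j}\Big(\Big[c_\delta\Big(x - \frac{\varepsilon y}{2}\Big) - c_\delta(x)\Big]f(x,p)\Big) + \frac{\partial}{\partial x_j}\Big(\Big[c_\delta\Big(x + \frac{\varepsilon y}{2}\Big) - c_\delta(x)\Big]f(x,p)\Big)D^j\Big\}\frac{dp\,dy}{(2\pi)^d}.$$
The operator $L^{\delta,1}_\varepsilon$ is given explicitly by
$$L^{\delta,1}_\varepsilon f(x,k) = \int e^{i(k-p)\cdot y}\Big[P^\delta_1\Big(x - \frac{\varepsilon y}{2}\Big)f(x,p) - f(x,p)P^\delta_1\Big(x + \frac{\varepsilon y}{2}\Big)\Big]\frac{dp\,dy}{(2\pi)^d} = P^\delta_1(x)f(x,k) - f(x,k)P^\delta_1(x) + R^1_{\varepsilon,\delta}f(x,k) \tag{3.15}$$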
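with the correction $R^1_{\varepsilon,\delta}$ defined by
$$R^1_{\varepsilon,\delta}f(x,k) = \int e^{i(k-p)\cdot y}\Big\{\Big[P^\delta_1\Big(x - \frac{\varepsilon y}{2}\Big) - P^\delta_1(x)\Big]f(x,p) - f(x,p)\Big[P^\delta_1\Big(x + \frac{\varepsilon y}{2}\Big) - P^\delta_1(x)\Big]\Big\}\frac{dp\,dy}{(2\pi)^d}. \tag{3.16}$$
Putting together the above expressions, and using (3.11), we obtain the following equation for $\bar W^\delta_\varepsilon$:
$$\frac{\partial\bar W^\delta_\varepsilon}{\partial t} = -\frac{1}{\varepsilon}L^\delta_\varepsilon\bar W^\delta_\varepsilon = \frac{[\bar W^\delta_\varepsilon, P^\delta_0]}{\varepsilon} + [\bar W^\delta_\varepsilon, P^\delta_1] + \frac{1}{2i}\big(\{\bar W^\delta_\varepsilon, P^\delta_0\} - \{P^\delta_0, \bar W^\delta_\varepsilon\}\big) - R^\delta_\varepsilon\bar W^\delta_\varepsilon \tag{3.17}$$
with $R^\delta_\varepsilon = R^{01}_{\varepsilon,\delta} + R^{02}_{\varepsilon,\delta} + R^1_{\varepsilon,\delta}$. Here $\{f,g\}$ is the standard Poisson bracket
$$\{f,g\} = \nabla_kf\cdot\nabla_xg - \nabla_xf\cdot\nabla_kg$$
and $[A,B] = AB - BA$ is the commutator. We now introduce the expansion
$$\bar W^\delta_\varepsilon = U^\delta + \varepsilon U^\delta_1 + U^\delta_{2,\varepsilon}. \tag{3.18}$$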
We insert this ansatz into (3.17) and, equating like powers of $\varepsilon$, obtain at the order $\varepsilon^{-1}$
$$[P^\delta_0, U^\delta] = 0, \tag{3.19}$$
which is equivalent to
$$U^\delta = \sum_{q=0}^{2}\Pi_qU^\delta\Pi_q = \sum_{q=0}^{2}U^\delta_q, \tag{3.20}$$
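where $U^\delta_q = \Pi_qU^\delta\Pi_q$, and for $q = 1, 2$ one has $U^\delta_q = u^\delta_q\Pi_q$ with $u^\delta_q = \mathrm{Tr}\,U^\delta_q$. The matrices $\Pi_q$ are projections on the eigenspaces of $P^\delta_0$, as in (3.3). This means that the matrix $U^\delta$ does not have off-diagonal contributions in the eigenbasis of $P^\delta_0$. The equation of order $O(\varepsilon^0)$ is given by
$$\frac{\partial U^\delta}{\partial t} = [U^\delta_1, P^\delta_0] + [U^\delta, P^\delta_1] + \frac{1}{2i}\big(\{U^\delta, P^\delta_0\} - \{P^\delta_0, U^\delta\}\big). \tag{3.21}$$
Multiplying the above equation on both sides by $\Pi_q$ yields
$$\frac{\partial(\Pi_qU^\delta\Pi_q)}{\partial t} = \Pi_q[U^\delta, P^\delta_1]\Pi_q + \frac{1}{2i}\Pi_q\big(\{U^\delta, P^\delta_0\} - \{P^\delta_0, U^\delta\}\big)\Pi_q. \tag{3.22}$$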
This is nothing but Eq. (6.16) of reference [14] for the Wigner matrix without consideration of mixtures of states. The only difference is that the leading order term $P_0$ depends on the parameter $\delta$. This, of course, does not change the algebra, and following [14] one obtains a system of decoupled Liouville equations for $u^\delta_q = \mathrm{Tr}\,U^\delta_q$, $q = 1, 2$,
$$\frac{\partial u^\delta_q}{\partial t} + \{\lambda^\delta_q, u^\delta_q\} = 0 \tag{3.23}$$
with initial data $u^\delta_q(0,x,k) = \mathrm{Tr}[\Pi_q(k)W_0(x,k)\Pi_q(k)]$. The zero eigenvalue component of the matrix $U^\delta$, that is, $U_0(t,x,k) = \Pi_0(k)W_0(x,k)\Pi_0(k)$, does not change in time.

We have to show that the terms $U^\delta_1$ and $U^\delta_{2,\varepsilon}$ in (3.18) are small. In order to uniquely characterize $U^\delta_1$, we assume that it is orthogonal to the terms of the form (3.20), that is,
$$U^\delta_1 = \sum_{p\ne q}\Pi_pU^\delta_1\Pi_q. \tag{3.24}$$
Then, (3.20) and (3.21) imply that
$$\Pi_mU^\delta_1\Pi_p = \frac{1}{i(\lambda^\delta_m - \lambda^\delta_p)}\Pi_mB(U^\delta)\Pi_p, \tag{3.25}$$
where
$$B(U^\delta) = [U^\delta, P^\delta_1] + \frac{1}{2i}\big(\{U^\delta, P^\delta_0\} - \{P^\delta_0, U^\delta\}\big).$$
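We now analyze the term $U^\delta_{2,\varepsilon}$ in (3.18) and show that it vanishes in the limit $\varepsilon \to 0$. The equation for $U^\delta_{2,\varepsilon}$, obtained by subtracting the equations at orders $\varepsilon^{-1}$ and $\varepsilon^0$ from (3.17), is
$$\frac{\partial U^\delta_{2,\varepsilon}}{\partial t} = -\frac{1}{\varepsilon}L^\delta_\varepsilon U^\delta_{2,\varepsilon} + S_\varepsilon, \tag{3.26}$$
where
$$S_\varepsilon = \varepsilon\Big([U^\delta_1, P^\delta_1] + \frac{1}{2i}\big(\{U^\delta_1, P^\delta_0\} - \{P^\delta_0, U^\delta_1\}\big)\Big) - \varepsilon\frac{\partial U^\delta_1}{\partial t} - R^\delta_\varepsilon(U^\delta + \varepsilon U^\delta_1). \tag{3.27}$$
The initial condition for (3.26) is $U^\delta_{2,\varepsilon}(0,x,k) = -\varepsilon U^\delta_1(0,x,k)$ because of (3.6), which implies that $W_0(x,k) = U^\delta(0,x,k)$. We now use the fact that $L^\delta_\varepsilon$ is skew-symmetric to obtain the bound
$$\|U^\delta_{2,\varepsilon}(t)\|_2 \le \varepsilon\|U^\delta_1(0)\|_2 + \int_0^t\|S_\varepsilon(s)\|_2\,ds. \tag{3.28}$$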
The analysis of the convergence of the difference of $\bar W^\delta_\varepsilon$ and $U^\delta$ to zero thus relies on estimating the error term $S_\varepsilon$. The relevant bounds are provided by the following two lemmas. Here we denote $\|f\|_{\dot H^s} = \|D^sf\|_{L^2}$.

Lemma 3.5. There exists a constant $C > 0$ that depends on the constant in the bound (1.16) on the support of $W_0$, and on the constants $D_i$ in (1.12), so that
$$\|S_\varepsilon\|_2 \le C\Big[\frac{\varepsilon}{\delta^3}\|U^\delta\|_{H^2} + \frac{\varepsilon^2}{\delta^{11/2}}\|U^\delta\|_{H^3}\Big]. \tag{3.29}$$

Lemma 3.6. The $\dot H^s(\mathbb{R}^d\times\mathbb{R}^d)$-norm, $s = 1, 2, 3$, of $U^\delta(t)$ is bounded by
$$\|u^\delta_q(t)\|_{\dot H^s} \le C_s\|u^0_q\|_{\dot H^s}\exp(s\bar\gamma_\delta t). \tag{3.30}$$
Here $u^0_q = \mathrm{Tr}[\Pi_qW_0\Pi_q]$ is the initial data for the Liouville equation (3.23), and the constant $C_s$ is a deterministic rational function of $\delta$.
Note that the prefactors of the type $\delta^{-m}$ in Lemma 3.5 are not as important as the terms $\|U^\delta\|_{H^s}$ since the latter grow exponentially in $\bar\gamma_\delta \sim \delta^{-3/2}$ according to Lemma 3.6.

Proof of Lemma 3.5. Observe that thanks to (3.27), we have
$$\|S_\varepsilon\|_2 \le C\Big[\frac{\varepsilon}{\sqrt\delta}\|U^\delta_1\|_{H^1} + \varepsilon\Big\|\frac{\partial U^\delta_1}{\partial t}\Big\|_2 + \|R^\delta_\varepsilon(U^\delta + \varepsilon U^\delta_1)\|_2\Big]. \tag{3.31}$$
We have the following bound for $U^\delta_1$:
$$\|U^\delta_1\|_{H^s} \le \frac{C}{\delta^{2s}}\|U^\delta\|_{H^{s+1}} \tag{3.32}$$
with a constant $C > 0$ that depends only on the constant in the bound (1.16) on the support of $W_0$ and on the constants $D_i$ in (1.12). Indeed, expression (3.24) implies that $\|U^\delta_1\|_{H^s} \le C\delta^{\frac12 - s}\|B(U^\delta)\|_{H^s}$, while we have $\|B(U^\delta)\|_{H^s} \le C\delta^{-s-\frac12}\|U^\delta\|_{H^{s+1}}$, so that (3.32) follows. This bound is by no means optimal but will be sufficient for our purposes. Furthermore, we have
$$\Big\|\frac{\partial U^\delta_1}{\partial t}\Big\|_2 \le C\Big\|B\Big(\frac{\partial U^\delta}{\partial t}\Big)\Big\|_2 \le \frac{C}{\delta^3}\|U^\delta\|_{H^2}. \tag{3.33}$$
In order to complete the bound (3.29) for $S_\varepsilon$ we show that
$$\|R^\delta_\varepsilon f\|_2 \le \frac{C\varepsilon}{\delta^{3/2}}\Big[\sum_j\|k_jf\|_{H^2} + \|f\|_{H^2}\Big]. \tag{3.34}$$
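We only consider $R^{01}_{\varepsilon,\delta}$, as the corresponding bounds for the operators $R^{02}_{\varepsilon,\delta}$ and $R^1_{\varepsilon,\delta}$ are obtained similarly. We split $R^{01}_{\varepsilon,\delta}$ as $R^{01}_{\varepsilon,\delta} = I_{01} - II_{01}$. We have
$$I_{01}f = \frac{1}{\varepsilon}\int e^{i(k-p)\cdot y}\Big[c_\delta\Big(x - \frac{\varepsilon y}{2}\Big) - c_\delta(x) + \frac{\varepsilon}{2}y\cdot\nabla c_\delta(x)\Big]ip_jD^jf(x,p)\frac{dp\,dy}{(2\pi)^d} = \frac{1}{4\varepsilon}\int_0^\varepsilon(\varepsilon - s)\int e^{i(k-p)\cdot y}\,y_ly_m\frac{\partial^2c_\delta\big(x - \frac{sy}{2}\big)}{\partial x_l\partial x_m}\,ip_jD^jf(x,p)\frac{dp\,dy}{(2\pi)^d}\,ds = \frac{1}{4\varepsilon}\int_0^\varepsilon(\varepsilon - s)\tilde I_{01}(s)f\,ds.$$
Moreover, we obtain that
$$\int|\tilde I_{01}(s)f(x,k)|^2dx\,dk = \mathrm{Tr}\int e^{i(k-p)\cdot y - i(k-q)\cdot z}\,y_ly_m\frac{\partial^2c_\delta\big(x - \frac{sy}{2}\big)}{\partial x_l\partial x_m}\,z_{l'}z_{m'}\frac{\partial^2c_\delta\big(x - \frac{sz}{2}\big)}{\partial x_{l'}\partial x_{m'}}\,p_jq_rD^jf(x,p)f^*(x,q)D^r\frac{dp\,dy\,dq\,dz\,dx\,dk}{(2\pi)^{2d}}$$
$$= \mathrm{Tr}\int e^{i(q-p)\cdot y}\,y_ly_m\frac{\partial^2c_\delta\big(x - \frac{sy}{2}\big)}{\partial x_l\partial x_m}\,y_{l'}y_{m'}\frac{\partial^2c_\delta\big(x - \frac{sy}{2}\big)}{\partial x_{l'}\partial x_{m'}}\,p_jq_rD^jf(x,p)f^*(x,q)D^r\frac{dp\,dy\,dq\,dx}{(2\pi)^d} \le \frac{C}{\delta^3}\sum_j\|k_jf\|^2_{H^2}.$$
Therefore the Minkowski inequality implies that $\|I_{01}f\|_2 \le C\varepsilon\delta^{-3/2}\sum_j\|k_jf\|_{H^2}$, and the same bound holds for $II_{01}$. The operators $R^{02}_\varepsilon$ and $R^1_\varepsilon$ may be bounded in a similar way as $\|R^{02}_\varepsilon f\|_{L^2} + \|R^1_\varepsilon f\|_{L^2} \le C\varepsilon\delta^{-3/2}\|f\|_{H^2}$. Therefore we have the bound (3.34), and then (3.29) follows from (3.31)-(3.34).

Theorem 3.2 now follows from the bound (3.32) for $U^\delta_1$, the bound (3.28) for $U^\delta_{2,\varepsilon}$, and Lemmas 3.6 and 3.5. It only remains to prove Lemma 3.6, which is done in Appendix C.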
4. The Liouville Equations in a Random Medium

We formulate in this section the main result concerning the convergence of the expectation of the solution of the Liouville equation (1.9) to the solution of the phase space diffusion equation (1.10) in the limit $\delta \to 0$. We also show that the values of the solution of the Liouville equation at different points in the phase space become independent in this limit. This allows us to establish the self-averaging property in Theorem 1.1.

4.1. Preliminaries. We let $C_m := C([0,+\infty); (\mathbb{R}^d)^m)$, and for any $R_1, \cdots, R_m > 0$ we denote by $C_m(R_1,\cdots,R_m) := C\big([0,+\infty); S^{d-1}_{R_1}\times\cdots\times S^{d-1}_{R_m}\big)$, where $S^{d-1}_R$ is the sphere in $\mathbb{R}^d$ of radius $R > 0$ centered at 0. We also let $\pi_t : C_m \to (\mathbb{R}^d)^m$, $t > 0$, be the canonical mapping $\pi_t(K) = (K_1(t),\cdots,K_m(t))$, $K = (K_1,\cdots,K_m) \in C_m$. For any $u \le v$ we denote by $\mathcal{M}^{u,v}_m$ the $\sigma$-algebra of subsets of $C_m$ generated by $\pi_t$, $t \in [u,v]$, and let $\mathcal{M}_m := \mathcal{M}^{0,+\infty}_m$ and $T_m$ be the filtered measurable space $\big(C_m, \mathcal{M}_m, (\mathcal{M}^{0,t}_m)_{t\ge 0}\big)$.

For any set $A \in \mathcal{B}(\mathbb{R}^d)$ we denote $\mathcal{C}(A) := \sigma\{c_1(x) : x \in A\}$. We suppose further that $c_1 : \mathbb{R}^d\times\Omega \to \mathbb{R}$ is a scalar, measurable, strictly stationary, zero mean random field that satisfies the assumptions presented in Sect. 1.2, that is, it satisfies the almost sure bounds (1.12), is exponentially $\phi$-mixing (1.13), and has a $C^\infty$-correlation function $R(x)$. We define the differential operator
$$LF(k) = \sum_{p,q=1}^{d}|k|^2D_{p,q}(\hat k)\partial^2_{k_p,k_q}F(k) + \sum_{p=1}^{d}|k|E_p(\hat k)\partial_{k_p}F(k), \qquad F \in C_0^\infty(\mathbb{R}^d\setminus\{0\}),$$
with the diffusion matrix $D$ given by (1.11) and the drift $E$ defined by
$$E_p(\hat k) = -c_0\sum_{q=1}^{d}\int_0^{+\infty}s\,\partial^3_{x_p,x_q,x_q}R(c_0s\hat k)\,ds, \qquad \forall\,p = 1,\cdots,d. \tag{4.1}$$
A simple calculation shows that $L$ is a generator of a diffusion on $S^{d-1}_{k_0}$, $k_0 = |k_0|$, given by the Itô S.D.E.
$$\begin{cases} dk(t) = |k(t)|E(\hat k(t))\,dt + \sqrt2\,D^{1/2}(\hat k(t))\,dB(t), \\ k(0) = k_0 \ne 0. \end{cases} \tag{4.2}$$
Here $E = (E_1,\cdots,E_d)$ and $B(\cdot)$ is a $d$-dimensional standard Brownian motion.

Remark 4.1. A simple calculation shows that the diffusion $k(\cdot)$ given by (4.2) is symmetric. Indeed the generator can be written in the form
$$LF(k) = \sum_{p,q=1}^{d}\partial_{k_p}\big(|k|^2D_{p,q}(\hat k)\partial_{k_q}F(k)\big), \qquad F \in C_0^\infty(\mathbb{R}^d\setminus\{0\}).$$
For any $k \ne 0$ we denote by $Q_k$ the law of such a diffusion starting at $k$, which is supported in $C_1(k)$, $k = |k|$.

Remark 4.2. The matrix $D := [D_{p,q}]$ is degenerate since $D(\hat k)\hat k = 0$ for all $k \in \mathbb{R}^d\setminus\{0\}$. It can be shown however that under fairly general assumptions its rank equals $d - 1$.

Proposition 4.3. Suppose that $\hat R(0) > 0$. Then, the rank of $D$ equals $d - 1$.

Proof. Suppose that $c_0 = 1$ and let $H_k := [p \in \mathbb{R}^d : p\cdot\hat k = 0]$ be the hyperplane orthogonal to $k$. Then,
$$D_{ml}(\hat k) = -\frac12\int_{-\infty}^{\infty}\partial^2_{x_m,x_l}R(s\hat k)\,ds = \frac12\int_{-\infty}^{\infty}\int_{\mathbb{R}^d}e^{is\hat k\cdot p}\,p_mp_l\hat R(p)\frac{dp}{(2\pi)^d}\,ds = \frac{1}{2^d\pi^{d-1}}\int_{H_k}p_mp_l\hat R(p)\,dp,$$
and hence for any $\xi \in \mathbb{R}^d$ we have
$$(D(\hat k)\xi, \xi) = D_{ml}(\hat k)\xi_m\xi_l = \frac{1}{2^d\pi^{d-1}}\int_{H_k}(p\cdot\xi)^2\hat R(p)\,dp. \tag{4.3}$$
Suppose that $\xi \in H_k$. Then, since $\hat R(p) \ge 0$, the right hand side of (4.3) is nonnegative. We claim that in fact $(D(\hat k)\xi, \xi) > 0$. Indeed, otherwise, since $\hat R$ is continuous, we would have $\hat R(p)(p\cdot\xi)^2 = 0$ for all $p \in H_k$, which is impossible due to the fact that $\hat R(0) > 0$ and the set $H_\xi\cap H_k$ has the linear dimension $d - 2$.

The above argument shows that $D(\hat k)$ is of rank $d - 1$ if there exists $p_0 \in H_k$ such that $\hat R(p_0) > 0$. On the other hand, if $\hat R(p) = 0$ for all $p$ in the plane $H_k$, then $D(\hat k)\xi = 0$ for all $\xi \in \mathbb{R}^d$. Therefore the matrix $D(\hat k)$ either has rank $d - 1$, or vanishes identically. Another condition ensuring that the latter does not happen is the radial symmetry of $R(\cdot)$.
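To illustrate the kind of process described by (4.2) and Remark 4.1, the following is a minimal simulation sketch under a strong simplifying assumption that is not taken from the paper: the diffusion matrix is isotropic, $D(\hat k) = D_0(I - \hat k\otimes\hat k)$, which has rank $d-1$ and satisfies $D(\hat k)\hat k = 0$. For this choice the symmetric form of the generator in Remark 4.1 reduces to $D_0$ times the Laplace-Beltrami operator on the sphere, the drift is $-(d-1)D_0k$, and the noise amplitude is taken as $\sqrt{2D_0}\,|k|$ times the tangential projection so that the second-order part of the generator is $|k|^2D_{p,q}\partial^2_{k_p,k_q}$. The scheme is Euler-Maruyama with renormalization back to the sphere; all names are hypothetical.

```python
import numpy as np

def simulate_isotropic_diffusion(k0, d0, t_final, n_steps, n_paths, seed=0):
    """Euler-Maruyama sketch for dk = -(d-1) d0 k dt + sqrt(2 d0) |k| P(k) dB,
    where P(k) = I - khat khat^T (assumed isotropic diffusion matrix).

    After every step the paths are rescaled to |k| = |k0|, since the exact
    diffusion stays on that sphere while the discrete scheme drifts off it.
    """
    rng = np.random.default_rng(seed)
    k0 = np.asarray(k0, dtype=float)
    d = k0.size
    radius = np.linalg.norm(k0)
    dt = t_final / n_steps
    k = np.tile(k0, (n_paths, 1))
    for _ in range(n_steps):
        khat = k / np.linalg.norm(k, axis=1, keepdims=True)
        xi = rng.standard_normal((n_paths, d))
        # project the Gaussian increment onto the tangent plane at khat
        tangent = xi - khat * np.sum(xi * khat, axis=1, keepdims=True)
        k = k - (d - 1) * d0 * k * dt + np.sqrt(2.0 * d0 * dt) * radius * tangent
        k *= radius / np.linalg.norm(k, axis=1, keepdims=True)
    return k

paths = simulate_isotropic_diffusion([0.0, 0.0, 1.0], d0=0.5, t_final=1.0,
                                     n_steps=200, n_paths=4000)
# For this isotropic model E[k(t)] = k0 * exp(-(d-1) d0 t), here exp(-1) along k0.
print(paths[:, 2].mean(), np.exp(-1.0))
```

The decay of the mean direction at rate $(d-1)D_0$ is specific to the isotropic model; with the matrix $D$ of (1.11) the diffusion on the sphere is in general anisotropic.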
4.2. Two particle model. We would like to show that the solution $u^\delta(t,x,k)$ of (1.9) decorrelates in the limit $\delta \to 0$ at two different points, that is, that
$$E\{u^\delta(t,x_1,k_1)u^\delta(t,x_2,k_2)\} - E\{u^\delta(t,x_1,k_1)\}E\{u^\delta(t,x_2,k_2)\} \to 0 \quad\text{as } \delta \to 0, \tag{4.4}$$
provided that $k_1 \ne k_2$. Recall that $u^\delta(t,x,k)$ may be represented as $u^\delta_q(T,x,k) = u^0_q(X^\delta(T,x,k), -K^\delta(T,x,k))$, where $u^0_q$ is the initial data for (1.9), and
$$\frac{dX^\delta(t)}{dt} = \frac{\partial\lambda^\delta_q}{\partial k}(X^\delta(t),K^\delta(t)), \quad X^\delta(0) = x, \qquad \frac{dK^\delta(t)}{dt} = -\frac{\partial\lambda^\delta_q}{\partial x}(X^\delta(t),K^\delta(t)), \quad K^\delta(0) = -k. \tag{4.5}$$
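The representation of $u^\delta_q$ through the characteristics (4.5) can be sketched numerically. The example below is only an illustration under stated assumptions: it works in two dimensions, takes the branch $\lambda^\delta(x,k) = c_\delta(x)|k|$ with $c_\delta(x) = c_0 + \sqrt\delta\,c_1(x/\delta)$, and uses a hypothetical smooth deterministic $c_1$ in place of a random field. The trajectories are integrated with the classical fourth-order Runge-Kutta scheme, and conservation of the Hamiltonian $\lambda^\delta$ along the flow serves as a sanity check.

```python
import numpy as np

C0, DELTA = 1.0, 0.1  # background speed and fluctuation scale (illustrative values)

def speed(x):
    """c_delta(x) = c0 + sqrt(delta) * c1(x / delta) with a hypothetical smooth c1."""
    y = x / DELTA
    return C0 + np.sqrt(DELTA) * 0.25 * (np.cos(y[0]) + np.sin(y[1]))

def grad_speed(x):
    """Gradient of c_delta; the chain rule brings a factor 1 / delta."""
    y = x / DELTA
    return 0.25 / np.sqrt(DELTA) * np.array([-np.sin(y[0]), np.cos(y[1])])

def rhs(state):
    """Right side of (4.5) for lambda = c_delta(x) |k| (assumed branch)."""
    x, k = state[:2], state[2:]
    norm_k = np.linalg.norm(k)
    return np.concatenate([speed(x) * k / norm_k, -grad_speed(x) * norm_k])

def hamiltonian(state):
    return speed(state[:2]) * np.linalg.norm(state[2:])

def characteristics(x, k, t_final, n_steps):
    """Integrate (4.5) with X(0) = x, K(0) = -k by classical RK4."""
    state = np.concatenate([np.asarray(x, float), -np.asarray(k, float)])
    dt = t_final / n_steps
    for _ in range(n_steps):
        s1 = rhs(state)
        s2 = rhs(state + 0.5 * dt * s1)
        s3 = rhs(state + 0.5 * dt * s2)
        s4 = rhs(state + dt * s3)
        state = state + dt / 6.0 * (s1 + 2.0 * s2 + 2.0 * s3 + s4)
    return state[:2], state[2:]

def u0(x, k):
    """Hypothetical smooth initial data for the Liouville equation."""
    return np.exp(-np.dot(x, x) - np.dot(k - np.array([1.0, 0.5]), k - np.array([1.0, 0.5])))

X, K = characteristics([0.0, 0.0], [1.0, 0.5], t_final=1.0, n_steps=2000)
print("u(T, x, k) =", u0(X, -K))
```

Evaluating the solution at one phase space point therefore costs one trajectory; the two particle analysis that follows concerns the joint behavior of two such trajectories in the same realization of the medium.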
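In order to establish (4.4) we have to consider the motion of two particles that may start at the same physical point but are moving in different directions. The equations of motion for two particles are governed by the Hamiltonian system
$$\begin{cases} \dfrac{dx^{(\delta)}_m(t;x_m,k_m)}{dt} = \nabla_k\lambda^\delta_q\big(x^{(\delta)}_m(t;x_m,k_m), k^{(\delta)}_m(t;x_m,k_m)\big), \\[1.5ex] \dfrac{dk^{(\delta)}_m(t;x_m,k_m)}{dt} = -\nabla_x\lambda^\delta_q\big(x^{(\delta)}_m(t;x_m,k_m), k^{(\delta)}_m(t;x_m,k_m)\big), \\[1.5ex] x^{(\delta)}_m(0;x_m,k_m) = x_m, \quad k^{(\delta)}_m(0;x_m,k_m) = k_m, \quad m = 1,2. \end{cases} \tag{4.6}$$
We will assume that
$$x_1 = x_2 = 0, \quad k_1 \ne 0, \quad k_2 \ne 0 \quad\text{and}\quad \hat k_1 \ne \hat k_2. \tag{4.7}$$
The above system can be rewritten in the form
$$\begin{cases} \dfrac{dx^{(\delta)}_m(t;x_m,k_m)}{dt} = \Big[c_0 + \sqrt\delta\,c_1\Big(\dfrac{x^{(\delta)}_m(t;x_m,k_m)}{\delta}\Big)\Big]\hat k^{(\delta)}_m(t;x_m,k_m), \\[1.5ex] \dfrac{dk^{(\delta)}_m(t;x_m,k_m)}{dt} = -\dfrac{1}{\sqrt\delta}\nabla_xc_1\Big(\dfrac{x^{(\delta)}_m(t;x_m,k_m)}{\delta}\Big)|k^{(\delta)}_m(t;x_m,k_m)|, \\[1.5ex] x^{(\delta)}_m(0;x_m,k_m) = 0, \quad k^{(\delta)}_m(0;x_m,k_m) = k_m, \quad m = 1,2. \end{cases} \tag{4.8}$$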
The main result of this section is the following.

Theorem 4.4. Suppose that the random field $c_1(\cdot)$ satisfies the assumptions in Sect. 1.2 and that $d \ge 3$. Then, the laws of the processes $(k^{(\delta)}_1(\cdot), x^{(\delta)}_1(\cdot), k^{(\delta)}_2(\cdot), x^{(\delta)}_2(\cdot))$ determined by (4.6) converge weakly in $C_4$, as $\delta \to 0$, to the law of $(k_1(\cdot), x_1(\cdot), k_2(\cdot), x_2(\cdot))$, where $k_j(\cdot)$, $j = 1, 2$, are independent symmetric diffusions given by (4.2) starting at $k_j$, $j = 1, 2$, respectively, and
$$x_j(t) = -c_0\int_0^t\hat k_j(s)\,ds, \qquad j = 1, 2.$$

Theorem 1.1 is a simple corollary of Theorems 3.2 and 4.4.
Proof of Theorem 1.1. First we observe that
$$\int\Big|\int\big[W^\delta_\varepsilon(t,x,k) - U^\delta(t,x,k)\big]S(k)\,dk\Big|^2dx \le \|S\|^2_{L^2}\int|W^\delta_\varepsilon(t,x,k) - U^\delta(t,x,k)|^2\,dk\,dx \to 0$$
as $(\varepsilon,\delta) \to 0$ in $K_\mu$, and this convergence is uniform in realizations of the random medium provided that the bounds (1.12) are satisfied. Therefore it suffices to study $\tilde s^\delta(x) = \int U^\delta(t,x,k)S(k)\,dk$. We observe that
$$E\Big\{\int|\tilde s^\delta(x) - \bar s(x)|^2dx\Big\} = E\Big\{\int\Big|\int\big(U^\delta(t,x,k) - \bar W(t,x,k)\big)S(k)\,dk\Big|^2dx\Big\} = E\Big\{\int S^*(k_1)\big(U^{\delta*}(t,x,k_1) - \bar W^*(t,x,k_1)\big)\big(U^\delta(t,x,k_2) - \bar W(t,x,k_2)\big)S(k_2)\,dk_1\,dk_2\,dx\Big\}$$
with $\bar s(x)$ and $\bar W(t,x,k)$ as in the formulation of Theorem 1.1. Theorem 4.4 implies that
$$E\{U^\delta(t,x,k)\} \to \bar W(t,x,k), \qquad E\{U^\delta(t,x,k_1)U^\delta(t,x,k_2)\} \to \bar W(t,x,k_1)\bar W(t,x,k_2)$$
pointwise in $x$ and $k$. Recall that the functions $U^\delta(t,x,k)$ and $\bar W(t,x,k)$ are uniformly compactly supported and bounded in $L^\infty$. Therefore the Lebesgue dominated convergence theorem implies that
$$E\Big\{\int|\tilde s^\delta(x) - \bar s(x)|^2dx\Big\} \to 0$$
and the proof of Theorem 1.1 is complete.
5. Proof of Theorem 4.4

Before we present the proof of this result we wish to spend a few words to lay out its main ideas. They are based in large part on the ideas of [15], where the phase space diffusion equation for the limit of the expectation of the solution of the Liouville equation with the Hamiltonian $H^\delta(x,k) = k^2/2 + \sqrt\delta\,V(x/\delta)$ has been obtained. The two-particle case introduces some additional difficulties into the problem. Our first step in the proof, in Sect. 5.1 below, is to replace the processes $(k^\delta_1(\cdot), k^\delta_2(\cdot))$ by $(l^\delta_1(\cdot), l^\delta_2(\cdot))$ that agree with $(k^\delta_1(\cdot), k^\delta_2(\cdot))$ up to certain stopping times. These times are determined by the stopping rules, introduced by multiplying the Hamiltonian $\lambda^\delta(x,k)$ by several cut-off functions. Their role is to prevent the trajectory of each particle from self-intersecting and also to keep the particles from getting too close to each other. We shall prove tightness of such modified processes by showing that for any bounded, positive and continuous function $F$ one can find a constant $C > 0$ such that $F(l^\delta_1(t), l^\delta_2(t)) + Ct$, $t \ge 0$, are sub-martingales (see e.g. [23] Theorem 1.4.6), cf. (5.29). This fact will be established thanks to the decorrelation properties of the random field $\nabla_xc_1(\cdot)$. More precisely, the latter imply the mixing lemmas contained in Sect. 5.2. The second ingredient of the proof is a perturbative argument that allows us to replace the trajectory $x^{(\delta)}_i(\cdot)$ (in fact its modification $y^{(\delta)}_i(\cdot)$ that arises from the replacement of $k^\delta$ by $l^\delta$) by a linear approximation over a time interval that is much longer than the correlation time (which, we recall, is of order $O(\delta)$) yet is sufficiently short so that we can control the accuracy of the approximation, cf. Lemma 5.4. In order to ensure that the approximate motion (under linear approximation) is not transverse to the direction of the field at a given time, which could prevent us from using the decorrelation properties of the field, but is rather propelled forward, we have to introduce another stopping time rule, cf. the condition on the scalar product of wave number directions contained in (5.5). In the course of the proof of tightness we also identify a certain martingale property of any limiting law of $(l^\delta_1(\cdot), l^\delta_2(\cdot))$, as $\delta \to 0$, that holds up to the aforementioned stopping time. By proving that this time goes to infinity with the removal of the cut-offs we are able to prove both the weak convergence of the laws of $(k^\delta_1(\cdot), k^\delta_2(\cdot))$ and identify a well-posed martingale problem associated with the limiting measure. This step is done in Sect. 5.4. With no loss of generality we shall assume throughout this section that $c_0 = 1$.

5.1. The cut-off functions. Let $p, q > 0$ and $k \ge 0$ be integers. Let $M$ be chosen in such a way that
$$M \ge |k_1|\vee|k_2| \quad\text{and}\quad |k_1|\wedge|k_2| \ge M^{-1}. \tag{5.1}$$
Let $\hat k_1 \ne \hat k_2$ be as in the statement of Theorem 4.4. Denote
$$K_N := \Big\{(\hat k, \hat k') : (\hat k, \hat k_1)_{\mathbb{R}^d} \ge 1 - \frac{1}{N+1},\ (\hat k', \hat k_2)_{\mathbb{R}^d} \ge 1 - \frac{1}{N+1}\Big\} \tag{5.2}$$
and choose $N$ a positive integer such that
$$\gamma_N := \inf\big\{|\hat k - \hat k'| : (\hat k, \hat k') \in K_N\big\} > 0, \tag{5.3}$$
that is, the cones of aperture $1/(N+1)$ centered at $\hat k_1$ and $\hat k_2$ are separated. As a consequence of (5.3) we may choose a positive integer $q$ so that
$$\lambda_N(p) := \inf\Big\{\Big|\frac1p\hat k - \rho\hat k'\Big|\wedge\Big|\frac1p\hat k' - \rho\hat k\Big| : \rho \in \Big[0,\frac1p\Big],\ (\hat k,\hat k') \in K_N\Big\} \ge \frac4q. \tag{5.4}$$
We define now several auxiliary functions that will be used to introduce the cut-offs in the dynamics. The function $\psi : \mathbb{R}^d\times(S^{d-1}_1)^2 \to [0,1]$ is $C^\infty$ and has the property that
$$\psi(k, l_1, l_2) = \begin{cases} 1, & \text{if } \hat k\cdot l_1 \ge 1 - \frac{1}{N+1} \text{ and } \hat k\cdot l_2 \ge 1 - \frac{1}{N+1} \text{ and } M^{-1} \le |k| \le M, \\[1ex] 0, & \text{if } \hat k\cdot l_1 \le 1 - \frac{2}{N+1} \text{ or } \hat k\cdot l_2 \le 1 - \frac{2}{N+1} \text{ or } |k| \le (2M)^{-1} \text{ or } |k| \ge 2M. \end{cases} \tag{5.5}$$
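The function $\phi_k : \mathbb{R}^d\times C_1 \to [0,1]$ is $C^\infty$ for a fixed path $K(t)$ and satisfies
$$\phi_k(y;K) = \begin{cases} 1, & \text{if } \inf\limits_{0\le t\le t^{(p)}_{k-1}}\Big|y - \displaystyle\int_0^tK(s)\,ds\Big| \ge \dfrac2q, \\[2ex] 0, & \text{if } \inf\limits_{0\le t\le t^{(p)}_{k-1}}\Big|y - \displaystyle\int_0^tK(s)\,ds\Big| \le \dfrac1q. \end{cases} \tag{5.6}$$
Here $t^{(p)}_k := kp^{-1}$ and, by convention, $K(s) := K(0)$, $s \le 0$. The function $\xi_k : \mathbb{R}^d\times\mathbb{R}^d\times C_2 \to [0,1]$ is smooth when the paths $K_1(\cdot), K_2(\cdot) \in C_1$ are fixed. We let
$$p_1 := 2q[8(1 + D_0)]p \tag{5.7}$$
and $s^{(p_1)}_k := kp_1^{-1}$ be a sub-partition of $t^{(p)}_k$, and define
$$\xi_k(y_1,y_2;K_1(\cdot),K_2(\cdot)) = \begin{cases} 1, & \text{if } \inf\limits_{0\le t\le s^{(p_1)}_k}\Big|y_1 - \displaystyle\int_0^tK_2(s)\,ds\Big| \ge \dfrac2q \text{ and } \inf\limits_{0\le t\le s^{(p_1)}_k}\Big|y_2 - \displaystyle\int_0^tK_1(s)\,ds\Big| \ge \dfrac2q, \\[2ex] 0, & \text{if } \inf\limits_{0\le t\le s^{(p_1)}_k}\Big|y_1 - \displaystyle\int_0^tK_2(s)\,ds\Big| \le \dfrac1q \text{ or } \inf\limits_{0\le t\le s^{(p_1)}_k}\Big|y_2 - \displaystyle\int_0^tK_1(s)\,ds\Big| \le \dfrac1q. \end{cases}$$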
(5.8) For j = 1, 2 we set # j (t, y; K(·)) :=
(p)
1, if 0 ≤ t < t1 (p) (p) φk (y; K(·)), if tk ≤ t < tk+1 .
(5.9)
Each j (·) shall be used to modify the dynamics of the corresponding particle in order to avoid a possibility of self-intersections of its trajectory. The cut-off function # (p) (p) (p) (p) ψ k, Kˆ tk−1 , Kˆ tk for t ∈ [tk , tk+1 ) and k ≥ 1 (t, k; K(·)) := (p) ˆ ˆ ψ(k, K(0), K(0)) for t ∈ [0, t1 ) (5.10) will allow us to control the direction of the particle motion over each interval of the partition as well as not to allow the trajectory to escape to the regions where the change of velocity can be uncontrollable. The cut-off (t, y#1 , y2 ; K1 (·), K2 (·)) (p) 1, if 0 ≤ t < t1 = (p1 ) (p1 ) (p) (p ) ξk (y1 , y2 ; K1 (·), K2 (·)), if sk ≤ t < sk+1 and t1 ≤ sk 1
(5.11)
is introduced in order not to allow the two trajectories to come too close to each other. (p) Note that this cut-off is “switched on” only after time t = t1 so as to allow the two particles to separate initially. After this time it is updated every 1/p1 time step, that is, more frequently than the cut-offs that control the self-intersections of each trajectory that are updated only at each 1/p time step. The following lemma can be checked by a direct calculation. Both here and in what follows we denote by D•,β the partial with respect to the β component of the given vector variable. Lemma 5.1. Let m = (m1 , · · · , md ) be a multi-index with nonnegative integer vald ued components, m = mp . There exist constants C3 , C4 > 0 depending only on p=1
M, N, p, q, m such that |Dym j (t, y)| ≤ C3 ,
|Dymj (t, y1 , y2 )| ≤ C4 ,
j = 1, 2.
Let K = (K1 , K2 ) ∈ C2 and denote j (s, y 1 , y 2 , l; K) := (s, l; Kj )j s, y j ; Kj s, y 1 , y 2 ; K ,
(5.12)
j (s, y1 , y2 , y1 , y2 , l; K) := j (s, y1 , y2 , l; K)j (s, y1 , y2 , l; K).
(5.13)
˜ We also introduce a random transformation of paths K(·) = (K˜ 1 (·), K˜ 2 (·)) for any K ∈ C2 given by √ Kj (t) Kˆ j (t), K˜ j (t) = 1 + δc1 δ
t ≥ 0.
(5.14)
Finally, let us set Fj (t, y1 , y2 , l; K) = j (t, δy1 , δy2 , l; K)∇yj c1 yj |l|,
j = 1, 2.
(5.15)
The modified two particle system with the cut-offs that we will consider is given by (δ) (δ) √ y j (t;xj ,kj ) d y j (t) (δ) lˆj (t; xj , kj ) = 1 + δc1 dt δ (δ) (δ) (δ) (δ) d l j (t) y (t) y 2 (t) (δ) 1 1 ˜ = − √ Fj t, δ , δ , l j (t); l (·) dt δ (δ) (δ) y j (0) = 0, l j (0) = kj , j = 1, 2,
(5.16)
(δ) (δ) (δ) where the path l˜ (·) = (l˜1 (·), l˜2 (·)) is obtained from l(·) by the transformation (δ) (δ) (δ) (δ) (5.14). We will denote by Qδ (·; M, N, p, q) the law of (l 1 (·), y 1 (·), l 2 (·), y 2 (·)) for a given δ > 0 over C4 .
Self-Averaging of Wigner Transforms in Random Media
105
5.2. The Mixing Lemmas. For any $t \ge 0$ we denote by $\mathcal{F}_t$ the $\sigma$-algebra generated by $(l_1^{(\delta)}(s), y_1^{(\delta)}(s), l_2^{(\delta)}(s), y_2^{(\delta)}(s))$, $s \le t$. Throughout this section we assume that $X_1, X_2 : \mathbb{R} \times \mathbb{R}^d \times \mathbb{R}^{d^2} \to \mathbb{R}$ are certain continuous functions, $Z$ is a random variable and $g_1, g_2$ are $\mathbb{R}^d$-valued random vectors. We suppose further that $Z, g_1, g_2$ are $\mathcal{F}_t$-measurable, while $X_1, X_2$ are random fields of the form $X_i(x) = X_i\bigl(c_1(x), \nabla_x c_1(x), \nabla_x^2 c_1(x)\bigr)$ that satisfy $\lim_{|x|\to 0} \|X_i(x) - X_i(0)\|_\infty = 0$, $i = 1, 2$. We also let
\[ U(\theta_1,\theta_2) := E\left[X_1(\theta_1)X_2(\theta_2)\right], \qquad (\theta_1,\theta_2) \in (\mathbb{R}^d)^2. \tag{5.17} \]
The following mixing lemmas will be of crucial importance for us in the sequel.

Lemma 5.2. Assume that $r, t \ge 0$ and
\[ \inf_{u \le t} \left| g_i - \frac{y_j^{(\delta)}(u)}{\delta} \right| \ge \frac{r}{\delta}, \tag{5.18} \]
P-a.s. on the set $Z \ne 0$ for $i, j = 1, 2$. Then, we have
\[ \bigl| E[X_1(g_1)X_2(g_2)Z] - E[U(g_1,g_2)Z] \bigr| \le 2\phi\!\left(\frac{r}{2\delta}\right) \|X_1\|_\infty \|X_2\|_\infty \|Z\|_1. \tag{5.19} \]
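Proof. The proof is a modification of the proof of Lemma 2 of [15] so we only highlight its main points. Choose an arbitrary $\eta > 0$. By a suitable modification of $g_1, g_2$ on the event $Z = 0$, so that the modified random variables remain $\mathcal{F}_t$-measurable, we can guarantee that (5.18) holds P-a.s. Let $\mathbf{i} = (i_1,\dots,i_d) \in \mathbb{Z}^d$ and
\[ C_{\mathbf{i}} := [i_1/2^{M_1}, (i_1+1)/2^{M_1}) \times \cdots \times [i_d/2^{M_1}, (i_d+1)/2^{M_1}), \qquad c_{\mathbf{i}} := \bigl((2i_1+1)/2^{M_1+1}, \dots, (2i_d+1)/2^{M_1+1}\bigr). \]
Here $M_1 > 0$ is a sufficiently large integer so that
\[ \|X_i(x) - X_i(c_{\mathbf{i}})\|_\infty \le \eta, \qquad \forall\, \mathbf{i} \in \mathbb{Z}^d,\ x \in C_{\mathbf{i}},\ i = 1, 2 \tag{5.20} \]
and $2^{-M_1} < r/(20\delta)$. We let
\[ D^0_{\mathbf{i},\mathbf{j}} := [z : \operatorname{dist}(z, C_{\mathbf{i}} \cup C_{\mathbf{j}}) > 3r(4\delta)^{-1}], \qquad D_{\mathbf{i},\mathbf{j}} := [z : \operatorname{dist}(z, C_{\mathbf{i}} \cup C_{\mathbf{j}}) > r(2\delta)^{-1}], \]
and
\[ Y_t^{(\delta)} := \left\{ \frac{1}{\delta}\bigl(y_1^{(\delta)}(s), y_2^{(\delta)}(s)\bigr) : s \le t \right\}. \]
Let us denote by $I_{\mathbf{i},\mathbf{j}}$ the indicator of the event $[(g_1,g_2) \in C_{\mathbf{i}} \times C_{\mathbf{j}}]$ and by $A_{\mathbf{i},\mathbf{j}}$ the event $A_{\mathbf{i},\mathbf{j}} = [\omega : Y_t^{(\delta)}(\omega) \subseteq D^0_{\mathbf{i},\mathbf{j}}]$.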
Note that
\[ E[X_1(g_1)X_2(g_2)Z] = \sum_{\mathbf{i},\mathbf{j}} E\left[X_1(g_1)X_2(g_2) Z I_{\mathbf{i},\mathbf{j}} \chi_{A_{\mathbf{i},\mathbf{j}}}\right]. \tag{5.21} \]
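Using precisely the same argument as the one contained in p. 31 of [15] we prove that $Z I_{\mathbf{i},\mathbf{j}} \chi_{A_{\mathbf{i},\mathbf{j}}}$ is $C(D_{\mathbf{i},\mathbf{j}})$-measurable for each $\mathbf{i}, \mathbf{j} \in \mathbb{Z}^d$. Note however that the right-hand side of (5.21) is equal, up to a term of order $O(\eta)$, to
\[ \sum_{\mathbf{i},\mathbf{j}} E\left[X_1(c_{\mathbf{i}})X_2(c_{\mathbf{j}}) Z I_{\mathbf{i},\mathbf{j}} \chi_{A_{\mathbf{i},\mathbf{j}}}\right]. \tag{5.22} \]
The random variable $X_1(c_{\mathbf{i}})X_2(c_{\mathbf{j}})$ is however $C(C_{\mathbf{i}} \cup C_{\mathbf{j}})$-measurable. Therefore we can write, see e.g. [7] p.171, that
\[ \sum_{\mathbf{i},\mathbf{j}} \left| E\left[X_1(c_{\mathbf{i}})X_2(c_{\mathbf{j}}) Z I_{\mathbf{i},\mathbf{j}} \chi_{A_{\mathbf{i},\mathbf{j}}}\right] - U(c_{\mathbf{i}},c_{\mathbf{j}}) E\left[Z I_{\mathbf{i},\mathbf{j}} \chi_{A_{\mathbf{i},\mathbf{j}}}\right] \right| \le \phi\!\left(\frac{r}{2\delta}\right) \sum_{\mathbf{i},\mathbf{j}} \left| E\left[Z I_{\mathbf{i},\mathbf{j}} \chi_{A_{\mathbf{i},\mathbf{j}}}\right] \right| \|X_1\|_\infty \|X_2\|_\infty. \tag{5.23} \]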
However, $U(c_{\mathbf{i}},c_{\mathbf{j}})$ equals, up to a term of order $O(\eta)$, $U(g_1,g_2)$ on the event corresponding to $I_{\mathbf{i},\mathbf{j}}$. The conclusion of Lemma 5.2 follows upon the passage to the limit $M_1 \to +\infty$ and $\eta \downarrow 0$.

Lemma 5.3. Assume that $r, t$ are as in the previous lemma. Let $EX_1 = 0$. Furthermore, we assume that $g_2$ satisfies (5.18),
\[ \inf_{u \le t} \left| g_1 - \frac{y_j^{(\delta)}(u)}{\delta} \right| \ge \frac{r + r_1}{\delta}, \qquad j = 1, 2 \tag{5.24} \]
and
\[ |g_1 - g_2| \ge \frac{r_1}{\delta} \tag{5.25} \]
for some $r_1 \ge 0$, P-a.s. on the event $Z \ne 0$. Then we have
\[ \bigl| E[X_1(g_1)X_2(g_2)Z] - E[U(g_1,g_2)Z] \bigr| \le C_5\, \phi^{1/2}\!\left(\frac{r}{2\delta}\right) \phi^{1/2}\!\left(\frac{r_1}{2\delta}\right) \|X_1\|_\infty \|X_2\|_\infty \|Z\|_1 \tag{5.26} \]
for some absolute constant $C_5 > 0$. Here the function $U$ is given by (5.17).
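Proof. We prove that the left hand side of (5.26) is bounded by
\[ C_6\, \phi\!\left(\frac{r_1}{2\delta}\right) \|X_1\|_\infty \|X_2\|_\infty \|Z\|_1. \tag{5.27} \]
This together with the result of the previous lemma implies (5.26), since $\min\{a,b\} \le a^{1/2}b^{1/2}$ for $a, b \ge 0$. Let $\eta > 0$ and $M_1$ be as in the proof of Lemma 5.2, and in addition $2^{-M_1} < r_1/(20\delta)$. Note that $X_2(c_{\mathbf{j}}) Z I_{\mathbf{i},\mathbf{j}} \chi_{A_{\mathbf{i},\mathbf{j}}}$ (in the notation of the proof of Lemma 5.2) is $C(D_{\mathbf{i},\mathbf{j}} \cup C_{\mathbf{j}})$-measurable. In addition, we have $\operatorname{dist}(C_{\mathbf{i}}, D_{\mathbf{i},\mathbf{j}} \cup C_{\mathbf{j}}) > r_1(2\delta)^{-1}$; thus, using the mixing coefficient as in e.g. [7] p.171, we can estimate
\[ \sum_{\mathbf{i},\mathbf{j}} \left| E\left[X_1(c_{\mathbf{i}})X_2(c_{\mathbf{j}}) Z I_{\mathbf{i},\mathbf{j}} \chi_{A_{\mathbf{i},\mathbf{j}}}\right] \right| \le 2\phi\!\left(\frac{r_1}{2\delta}\right) \|X_1\|_\infty \|X_2\|_\infty \|Z\|_1. \]
On the other hand, we have $I_{\mathbf{i},\mathbf{j}} \ne 0$ only if $|c_{\mathbf{i}} - c_{\mathbf{j}}| \ge r_1(2\delta)^{-1}$, which in turn implies that
\[ |U(c_{\mathbf{i}},c_{\mathbf{j}})| \le C_7\, \phi\!\left(\frac{r_1}{2\delta}\right) \|X_1\|_\infty \|X_2\|_\infty, \]
with the constant $C_7$ independent of $\eta > 0$. Summarizing, we have shown that
\[ \sum_{\mathbf{i},\mathbf{j}} \left| U(c_{\mathbf{i}},c_{\mathbf{j}}) E\left[Z I_{\mathbf{i},\mathbf{j}} \chi_{A_{\mathbf{i},\mathbf{j}}}\right] \right| \le C_8\, \phi\!\left(\frac{r_1}{2\delta}\right) \|X_1\|_\infty \|X_2\|_\infty \|Z\|_1, \]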
with the constant $C_8$ independent of $\eta > 0$. Letting $\eta \to 0$ and using (5.20) we conclude (5.26).

5.3. Tightness and the martingale property of limiting measures. In this section we prove tightness of the family $Q_\delta(\cdot; M, N, p, q)$, $\delta \in (0, 1]$, and show that any weak limit point $Q(\cdot; M, N, p, q)$ of this family, as $\delta \to 0$, has a certain martingale property. Let $L_{M,N,p,q}$ be a random partial differential operator defined on $C_0^\infty((\mathbb{R}^d)^2)$ as follows. For any $K = (K_1, K_2) \in C^2$ and $G \in C_0^\infty((\mathbb{R}^d)^2)$ we set $Y = (Y_1, Y_2) \in C^2$,
\[ Y_i(t) = \int_0^t K_i(s)\,ds, \qquad i = 1, 2, \tag{5.28} \]
i (t; K) := i,∗ (t; Ki )∗ (t; Ki )∗ (t; K), where i,∗ (t; Ki ) := i (t, Yi (t); K1 ) , ∗ (t; Ki ) := (t, Ki (t); Ki ), ∗ (t; K) := (t, Y1 (t), Y2 (t); K) . We let (LM,N,p,q G)(k1 , k2 ; K) := 21 (t; K)Lk1 G(k1 , k2 ) + 22 (t; K)Lk2 G(k1 , k2 ), with Lki , i = 1, 2 given by (4.1). Let ζ ∈ Cb ((Rd )2n ) be an arbitrary nonnegative function, let 0 ≤ t1 < · · · < tn ≤ t < u and define ζ (K) := ζ (K(t1 ), · · · , K(tn )). We will show that for any function G ∈ C0∞ ((Rd )2 ) there exists a deterministic constant C9 > 0 such that ( ( ( ( (δ) (δ) (δ) (δ) (δ) (δ) (E G(l 1 (u), l 2 (u)) − G(l 1 (t), l 2 (t)) ζ (l 1 (·), l 2 (·)) ( (δ)
(δ)
≤ C9 (u − t)E[ζ (l 1 (·), l 2 (·))],
∀ ζ (·), δ ∈ (0, 1].
(5.29)
The choice of the constant C9 may depend on a particular function G but should be the same for all the spatial translates of G, and may not depend on the test function ζ . This, (δ) (δ) according to Theorem 1.4.6 of [23], implies tightness of the laws of (l 1 (·), l 2 (·)), δ ∈ (0, 1] over C2 . Additionally, we prove that if Q(·; M, N, p, q) is any limiting law of Qδn (·; M, N, p, q), as δn → 0 then
\[
\lim_{n\to+\infty} E\Bigl\{ \bigl[ G(l_1^{(\delta_n)}(u), l_2^{(\delta_n)}(u)) - G(l_1^{(\delta_n)}(t), l_2^{(\delta_n)}(t)) \bigr] \zeta(l_1^{(\delta_n)}(\cdot), l_2^{(\delta_n)}(\cdot)) \Bigr\}
= \int \left[ \int_t^u (L_{M,N,p,q}G)(K(s); K)\,ds \right] \zeta(K)\, Q(dK; M, N, p, q) \tag{5.30}
\]
for any $u > t$. This property will be used in the next section to identify the limiting law of $(k_1^{(\delta)}(\cdot), k_2^{(\delta)}(\cdot))$, as $\delta \to 0$. Throughout the remainder of this section we suppress writing both the superscript $\delta$ and the cut-off parameters $M, N, p, q$ of the respective measures. With no loss of generality we assume that there exists $k_1$ such that $s_{k_1}^{(p_1)} \le t < u \le s_{k_1+1}^{(p_1)}$, cf. (5.7). Given $s \ge \sigma > 0$, we define the linear approximation
\[ L_j(\sigma, s) := y_j(\sigma) + (s - \sigma)\hat{l}_j(\sigma), \]
and
\[ R_j(v, \sigma, s) := (1 - v)L_j(\sigma, s) + v\,y_j(s), \qquad j = 1, 2. \]
The following simple lemma can be verified by a direct calculation.

Lemma 5.4. Suppose that $s \ge \sigma$. Then,
\[ |y_j(s) - L_j(\sigma, s)| \le \frac{D_1 (s - \sigma)^2}{2\sqrt{\delta}} + D_0 \sqrt{\delta}\,(s - \sigma), \qquad \forall\, \delta > 0,\ j = 1, 2. \]
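The direct calculation behind Lemma 5.4 can be sketched as follows. This is only a sketch under stated assumptions: it reads the first equation of (5.16) as $\frac{d}{d\rho}y_j = [1 + \sqrt{\delta}\,c_1(y_j/\delta)]\hat{l}_j$, and it assumes that $D_0$ bounds $|c_1|$ and that $D_1/\sqrt{\delta}$ bounds $|\frac{d}{d\rho}\hat{l}_j|$. These roles of $D_0$ and $D_1$ are our reading of the constants and are not stated in this section.

```latex
% Sketch of the estimate in Lemma 5.4.
% Assumptions (not taken from this section): |c_1| <= D_0 and
% |d\hat l_j / d\rho| <= D_1 / \sqrt{\delta}.
\begin{align*}
|y_j(s) - L_j(\sigma,s)|
 &= \left| \int_\sigma^s \bigl[\hat l_j(\rho) - \hat l_j(\sigma)\bigr]\,d\rho
   + \sqrt{\delta}\int_\sigma^s c_1\!\left(\frac{y_j(\rho)}{\delta}\right)\hat l_j(\rho)\,d\rho \right| \\
 &\le \int_\sigma^s \frac{D_1}{\sqrt{\delta}}\,(\rho-\sigma)\,d\rho
   + \sqrt{\delta}\,D_0\,(s-\sigma) \\
 &= \frac{D_1}{2\sqrt{\delta}}\,(s-\sigma)^2 + D_0\sqrt{\delta}\,(s-\sigma).
\end{align*}
```

The first term measures how much the direction $\hat{l}_j$ turns over $[\sigma, s]$, and the second is the contribution of the speed fluctuation $\sqrt{\delta}\,c_1$.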
Remark 5.5. Throughout this argument we use
\[ \sigma(s) := \max[t, s - \delta^{1-\gamma_1}] \quad \text{for some } \gamma_1 \in (0, 1/8). \tag{5.31} \]
The above lemma proves that for this choice of $\sigma$ the linear approximation $L_j(\sigma, s)$ of the particle position given by $y_j(s)$ is exact, up to a term of order
\[ O(\delta^{3/2 - 2\gamma_1}). \tag{5.32} \]
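The order in (5.32) follows by inserting the choice (5.31) into the bound of Lemma 5.4. A short worked check, assuming $0 < \delta \le 1$ so that $\delta^{3/2-\gamma_1} \le \delta^{3/2-2\gamma_1}$:

```latex
% With sigma = sigma(s) as in (5.31) we have 0 <= s - sigma <= delta^{1-gamma_1}.
% Assumes 0 < delta <= 1.
\begin{align*}
\frac{D_1}{2\sqrt{\delta}}\,(s-\sigma)^2
 &\le \frac{D_1}{2}\,\delta^{-1/2}\,\delta^{2-2\gamma_1}
  = \frac{D_1}{2}\,\delta^{3/2-2\gamma_1}, \\
D_0\sqrt{\delta}\,(s-\sigma)
 &\le D_0\,\delta^{3/2-\gamma_1}
  \le D_0\,\delta^{3/2-2\gamma_1}, \\
|y_j(s) - L_j(\sigma,s)|
 &\le \Bigl(\tfrac{D_1}{2} + D_0\Bigr)\,\delta^{3/2-2\gamma_1}.
\end{align*}
```

The quadratic term dominates: it is the one that produces the exponent $3/2 - 2\gamma_1$.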
We begin now the proof of (5.29). Our strategy is based on the perturbation method: the trajectory is approximated by the iterated linear approximation sufficiently many times so that the error becomes deterministically small. The terms that involve the linear approximation are potentially large but are handled with the help of the mixing lemmas. Note that
\[
G(l_1^{(\delta)}(u), l_2^{(\delta)}(u)) - G(l_1^{(\delta)}(t), l_2^{(\delta)}(t))
= -\frac{1}{\sqrt{\delta}} \sum_{j,\alpha} \int_t^u D_{l_j,\alpha}G(l_1(s), l_2(s))\, F_{j,\alpha}\!\left(s, \frac{y_1(s)}{\delta}, \frac{y_2(s)}{\delta}, l_j(s)\right) ds. \tag{5.33}
\]
We can rewrite (5.33) in the form
\[ I^{(1)} + I^{(2)} + I^{(3)}, \tag{5.34} \]
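where
\[
I^{(1)} := -\frac{1}{\sqrt{\delta}} \sum_{j,\alpha} \int_t^u D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, F_{j,\alpha}\!\left(s, \frac{y_1(s)}{\delta}, \frac{y_2(s)}{\delta}, l_j(\sigma)\right) ds,
\]
\[
I^{(2)} := \frac{1}{\delta} \sum_{j,\alpha,\beta} \int_t^u\!\!\int_\sigma^s D_{l_j,\alpha}G(l_1(\rho), l_2(\rho))\, D_{l_j,\beta}F_{j,\alpha}\!\left(s, \frac{y_1(s)}{\delta}, \frac{y_2(s)}{\delta}, l_j(\rho)\right)
F_{j,\beta}\!\left(\rho, \frac{y_1(\rho)}{\delta}, \frac{y_2(\rho)}{\delta}, l_j(\rho)\right) ds\,d\rho,
\]
\[
I^{(3)} := \frac{1}{\delta} \sum_{j,\alpha}\sum_{i,\beta} \int_t^u\!\!\int_\sigma^s D_{l_i,\beta}D_{l_j,\alpha}G(l_1(\rho), l_2(\rho))\,
F_{j,\alpha}\!\left(s, \frac{y_1(s)}{\delta}, \frac{y_2(s)}{\delta}, l_j(\rho)\right)
F_{i,\beta}\!\left(\rho, \frac{y_1(\rho)}{\delta}, \frac{y_2(\rho)}{\delta}, l_i(\rho)\right) ds\,d\rho.
\]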
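5.3.1. Term $E[I^{(1)}\zeta]$. The term $I^{(1)}$ can be rewritten in the form $J^{(1)} + J^{(2)}$, where
\[
J^{(1)} := -\frac{1}{\sqrt{\delta}} \sum_{j,\alpha} \int_t^u D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, F_{j,\alpha}\!\left(s, \frac{L_1(\sigma,s)}{\delta}, \frac{L_2(\sigma,s)}{\delta}, l_j(\sigma)\right) ds,
\]
and
\[
J^{(2)} := -\frac{1}{\delta^{3/2}} \sum_{j,\alpha}\sum_{i,\beta} \int_t^u\!\!\int_0^1 D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, D_{y_i,\beta}F_{j,\alpha}\!\left(s, \frac{R_1(v,\sigma,s)}{\delta}, \frac{R_2(v,\sigma,s)}{\delta}, l_j(\sigma)\right)
\bigl(y_{i,\beta}(s) - L_{i,\beta}(\sigma,s)\bigr)\, ds\,dv. \tag{5.35}
\]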
Note that we have replaced y j by its linearization Lj in the term J (1) . The linear approximation is always “propelled forward”, which allows us to use Lemma 5.2 to (p) (p) handle the term E[J (1) ζ ]. Suppose that k is such that s, t ∈ [tk , tk+1 ), recall also that (p )
(p )
1 s, t, u ∈ [sk1 1 , sk1 +1 ), and let us fix one trajectory by setting, for instance, j = 1. We will use Lemma 5.2 with X1 (x) = −∇x c1 (x), X2 (x) ≡ 1, (p1 ) L1 (σ, s) L2 (σ, s) Z = 1 sk1 , , , l 1 (σ ) |l 1 (σ )|Dl1 G(l 1 (σ ), l 2 (σ ))ζ δ δ
and g1 = L1 (σ, s)δ −1 , cf. (5.12). We need to verify (5.18). Suppose therefore that (p) Z = 0. For ρ ∈ [0, tk−1 ] we have |L1 (σ, s) − y 1 (ρ)| ≥ (2q)−1 , provided that 0 < δ <
(2q)−1/(1−γ1 ) . For ρ ∈ [tk−1 , σ ] we have (p)
(p) (L1 (σ, s) − y 1 (ρ)) · lˆ1 tk−1
σ
y 1 (ρ1 ) (p) lˆ1 (ρ1 ) · lˆ1 tk−1 dρ1 δ ρ √ 2 2 ≥ (s − σ ) 1 − + (1 − δD0 )(s − ρ) 1 − N + 1 N +1 2 ≥ (s − σ ) 1 − , (5.36) N +1 ≥ (s − σ )lˆ1 (σ ) · lˆ1
(p) tk−1
+
1+
√
δc1
provided that $\delta < 1/D_0^2$. We see from (5.36) that (5.18) is satisfied with $r = \left(1 - \frac{2}{N+1}\right)(s - \sigma)$ and $j = 1$.

We verify next that $g_1$ is also separated from $y_2(\rho)\delta^{-1}$, $\rho \in [0, \sigma]$. Consider two cases. First, when $s, t \in [0, t_1^{(p)})$, using condition (5.3) we obtain that there exists $\gamma_N > 0$ depending only on $N$ such that
\[ \left| g_1 - \frac{y_2(\rho)}{\delta} \right| \ge \gamma_N \frac{(s - \sigma)}{\delta}. \]
Suppose then that $s, t \ge 1/p$ and $s, t \in [s_{k_1}^{(p_1)}, s_{k_1+1}^{(p_1)})$. Then we have for $\rho \in [0, s_{k_1}^{(p_1)}]$, with $p_1$ given by (5.7), $|L_1(\sigma, s) - y_2(\rho)| \ge (2q)^{-1}$, provided that $\delta$ is as above. For $\rho \in [s_{k_1}^{(p_1)}, \sigma]$ we get, thanks to (5.7),
\[
|L_1(\sigma, s) - y_2(\rho)| \ge \left| L_1(\sigma, s) - y_2\bigl(s_{k_1}^{(p_1)}\bigr) \right| - \left| y_2\bigl(s_{k_1}^{(p_1)}\bigr) - y_2(\rho) \right|
\ge \frac{1}{2q} - \frac{1 + D_0}{p_1} \ge \frac{1}{4q} \ge (s - \sigma)\left(1 - \frac{2}{N+1}\right),
\]
provided that $\delta < (4q)^{-1/(1-\gamma_1)}$. Using Lemma 5.2 we estimate
\[
\left| E[J^{(1)}\zeta] \right| \le \frac{M D_0}{\sqrt{\delta}}\, \|\nabla G\|_{L^\infty((\mathbb{R}^d)^2)}\, E[\zeta] \int_t^u \phi\!\left(C_{10}\,\frac{s - \sigma}{\delta}\right) ds
\le C_{11}(\delta)(u - t)\, \|\nabla G\|_{L^\infty((\mathbb{R}^d)^2)}\, E[\zeta], \tag{5.37}
\]
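where $C_{10} := \min\left[\gamma_N, \tfrac{1}{2}\left(1 - \tfrac{2}{N+1}\right)\right]$, and $C_{11}(\delta)$ depends only on $\delta$ and vanishes as $\delta \to 0$. On the other hand, the term $J^{(2)}$ defined by (5.35) may be written as $J^{(2)} = J_1^{(2)} + J_2^{(2)}$, where
\[
J_1^{(2)} := -\frac{1}{\delta^{3/2}} \sum_{j,\alpha}\sum_{i,\beta} \int_t^u D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, D_{y_i,\beta}F_{j,\alpha}\!\left(s, \frac{L_1(\sigma,s)}{\delta}, \frac{L_2(\sigma,s)}{\delta}, l_j(\sigma)\right)
\bigl(y_{i,\beta}(s) - L_{i,\beta}(\sigma,s)\bigr)\, ds
\]
and
\[
J_2^{(2)} := -\frac{1}{\delta^{5/2}} \sum_{j,\alpha}\sum_{i,\beta}\sum_{k,\gamma} \int_t^u\!\!\int_0^1\!\!\int_0^1 D_{y_k,\gamma}D_{y_i,\beta}F_{j,\alpha}\!\left(s, \frac{R_1(\theta v,\sigma,s)}{\delta}, \frac{R_2(\theta v,\sigma,s)}{\delta}, l_j(\sigma)\right)
\times D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, v\, \bigl(y_{i,\beta}(s) - L_{i,\beta}(\sigma,s)\bigr)\bigl(y_{k,\gamma}(s) - L_{k,\gamma}(\sigma,s)\bigr)\, ds\,dv\,d\theta. \tag{5.38}
\]
The second term may be handled easily with the help of Lemma 5.4 and (5.32). We have
\[
\left| E[J_2^{(2)}\zeta] \right| \le C_{12} D_2\, E[\zeta]\, \|\nabla G\|_{L^\infty((\mathbb{R}^d)^2)} (u - t)\, \delta^{-5/2}\delta^{3-4\gamma_1}
\le C_{13}\, \delta^{1/2-4\gamma_1}(u - t)\, E[\zeta]\, \|\nabla G\|_{L^\infty(\mathbb{R}^{2d})}. \tag{5.39}
\]
In order to estimate $J_1^{(2)}$ we split it as
\[ J_1^{(2)} = J_{1,1}^{(2)} + J_{1,2}^{(2)}, \tag{5.40} \]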
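where
\[
J_{1,1}^{(2)} := -\frac{1}{\delta^{3/2}} \sum_{j,\alpha}\sum_{i,\beta} \int_t^u\!\!\int_\sigma^s D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, D_{y_i,\beta}F_{j,\alpha}\!\left(s, \frac{L_1(\sigma,s)}{\delta}, \frac{L_2(\sigma,s)}{\delta}, l_j(\sigma)\right)
(s - \rho_1)\, \frac{d}{d\rho_1}\hat{l}_{i,\beta}(\rho_1)\, ds\,d\rho_1, \tag{5.41}
\]
and
\[
J_{1,2}^{(2)} := -\frac{1}{\delta} \sum_{j,\alpha}\sum_{i,\beta} \int_t^u\!\!\int_\sigma^s D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, D_{y_i,\beta}F_{j,\alpha}\!\left(s, \frac{L_1(\sigma,s)}{\delta}, \frac{L_2(\sigma,s)}{\delta}, l_j(\sigma)\right)
c_1\!\left(\frac{y_i(\rho)}{\delta}\right) \hat{l}_{i,\beta}(\rho)\, ds\,d\rho, \tag{5.42}
\]
with
\[
\frac{d}{d\rho_1}\hat{l}_{i,\beta}(\rho_1) = |l_i(\rho_1)|^{-1} \left[ \frac{d}{d\rho_1} l_{i,\beta}(\rho_1) - \left( \hat{l}_i(\rho_1), \frac{d}{d\rho_1} l_i(\rho_1) \right)_{\mathbb{R}^d} \hat{l}_{i,\beta}(\rho_1) \right]. \tag{5.43}
\]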
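The splitting (5.40) into the terms (5.41) and (5.42) rests on an identity for the linearization error $y_{i,\beta}(s) - L_{i,\beta}(\sigma,s)$. The following is a sketch of that identity, assuming the first equation of (5.16) reads $\frac{d}{d\rho}y_i = [1 + \sqrt{\delta}\,c_1(y_i/\delta)]\hat{l}_i$; the interchange of the order of integration in the second line is the only other step used.

```latex
% Sketch: the identity behind J_1^{(2)} = J_{1,1}^{(2)} + J_{1,2}^{(2)}.
% Assumes dy_i/d\rho = [1 + \sqrt{\delta} c_1(y_i/\delta)] \hat l_i, cf. (5.16).
\begin{align*}
y_{i,\beta}(s) - L_{i,\beta}(\sigma,s)
 &= \int_\sigma^s \bigl[\hat l_{i,\beta}(\rho) - \hat l_{i,\beta}(\sigma)\bigr]\,d\rho
  + \sqrt{\delta}\int_\sigma^s c_1\!\left(\frac{y_i(\rho)}{\delta}\right)\hat l_{i,\beta}(\rho)\,d\rho \\
 &= \int_\sigma^s\!\!\int_\sigma^{\rho} \frac{d}{d\rho_1}\hat l_{i,\beta}(\rho_1)\,d\rho_1\,d\rho
  + \sqrt{\delta}\int_\sigma^s c_1\!\left(\frac{y_i(\rho)}{\delta}\right)\hat l_{i,\beta}(\rho)\,d\rho \\
 &= \int_\sigma^s (s-\rho_1)\,\frac{d}{d\rho_1}\hat l_{i,\beta}(\rho_1)\,d\rho_1
  + \sqrt{\delta}\int_\sigma^s c_1\!\left(\frac{y_i(\rho)}{\delta}\right)\hat l_{i,\beta}(\rho)\,d\rho .
\end{align*}
% Multiplying by the prefactor -\delta^{-3/2} of J_1^{(2)}:
% the first integral keeps \delta^{-3/2}, as in (5.41);
% the second picks up \delta^{-3/2}\cdot\sqrt{\delta} = \delta^{-1}, as in (5.42).
```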
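We deal with $J_{1,2}^{(2)}$ first. It may be split as $J_{1,2}^{(2)} = J_{1,2,1}^{(2)} + J_{1,2,2}^{(2)} + J_{1,2,3}^{(2)}$, where
\[
J_{1,2,1}^{(2)} := -\frac{1}{\delta} \sum_{j,\alpha}\sum_{i,\beta} \int_t^u\!\!\int_\sigma^s D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, D_{y_i,\beta}F_{j,\alpha}\!\left(s, \frac{L_1(\sigma,s)}{\delta}, \frac{L_2(\sigma,s)}{\delta}, l_j(\sigma)\right)
c_1\!\left(\frac{L_i(\sigma,\rho)}{\delta}\right) \hat{l}_{i,\beta}(\sigma)\, ds\,d\rho, \tag{5.44}
\]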
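\[
J_{1,2,2}^{(2)} := -\frac{1}{\delta^2} \sum_{j,\alpha}\sum_{i,\beta,\gamma} \int_t^u\!\!\int_\sigma^s\!\!\int_0^1 D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, D_{y_i,\beta}F_{j,\alpha}\!\left(s, \frac{L_1(\sigma,s)}{\delta}, \frac{L_2(\sigma,s)}{\delta}, l_j(\sigma)\right)
\times (D_{y_i,\gamma}c_1)\!\left(\frac{R_i(v,\sigma,\rho)}{\delta}\right) \bigl(y_{i,\gamma}(\rho) - L_{i,\gamma}(\sigma,\rho)\bigr)\, \hat{l}_{i,\beta}(\rho)\, ds\,d\rho\,dv,
\]
and
\[
J_{1,2,3}^{(2)} := -\frac{1}{\delta} \sum_{j,\alpha}\sum_{i,\beta} \int_t^u\!\!\int_\sigma^s\!\!\int_\sigma^\rho D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, D_{y_i,\beta}F_{j,\alpha}\!\left(s, \frac{L_1(\sigma,s)}{\delta}, \frac{L_2(\sigma,s)}{\delta}, l_j(\sigma)\right)
\times c_1\!\left(\frac{L_i(\sigma,s)}{\delta}\right) \frac{d}{d\rho_1}\hat{l}_{i,\beta}(\rho_1)\, ds\,d\rho\,d\rho_1.
\]
By virtue of Lemma 5.4, (5.32) and the definition (5.15), we obtain easily
\[
\left| E[J_{1,2,2}^{(2)}\zeta] \right| = O(\delta^{1/2-2\gamma_1})\, \|\nabla G\|_{L^\infty((\mathbb{R}^d)^2)} (u - t)\, E\zeta, \qquad \text{as } \delta \to 0. \tag{5.45}
\]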
The same argument also shows that |E[J1,2,3 ζ ]| is of the order of magnitude of the right-hand side of (5.45). Using Lemma 5.1 and the definition (5.15) we conclude that L1 (σ, s) L2 (σ, s) Dyi i s, , , l i (σ ) = O(δ). δ δ (2)
Therefore, |E[J1,2,1 ζ ]| is equal, up to a term of order O(δ 1−γ1 )(u−t) ∇G L∞ ((Rd )2 ) Eζ , to u s 1 (p1 ) L1 (σ, s) L2 (σ, s) − , , l i (σ ) E Dli ,α G(l 1 (σ ), l 2 (σ ))i sk1 , δ δ δ i,α,β t σ Li (σ, s) Li (σ, ρ) ˆ × Dyi ,β Dyi ,α c1 (5.46) c1 |l i (σ )|li,β (σ ) ζ ds dρ. δ δ Let δ < (2p1 )1/(1−γ1 ) and fix i. We may apply Lemma 5.3, with (p1 ) L1 (σ, s) L2 (σ, s) , , l i (σ ) |l i (σ )|lˆi,β (σ )ζ, Z = Dli ,α G(l 1 (σ ), l 2 (σ ))i sk1 , δ δ X1 := Dyi ,β Dyi ,α c1 (x), X2 := c1 (x), Li (σ, s) Li (σ, ρ) , g2 := , r = C13 (ρ − σ ), r1 = C13 (s − ρ), g1 := δ δ where C13 > 0 depends only on N . We conclude that
( u s ( L1 (σ, s) L2 (σ, s) ( (E J (2) ζ + 1 E i σ, , , l i (σ ) 1,2,1 ( δ δ δ ( i,α,β t σ ( ( ( (σ, s) − L (σ, ρ) L i i 2 × Dli ,α G(l 1 (σ ), l 2 (σ ))|l i (σ )|lˆi,β (σ )∂α,β R ζ ds dρ (( δ ( ≤ C14 δ −1 ∇G L∞ ((Rd )2 ) E[ζ ] u s C13 (ρ − σ ) C13 (s − ρ) 1/2 1/2 × φ φ ds dρ. 2δ 2δ t
(5.47)
σ
Here we used the fact that L1 (σ, s) L2 (σ, s) (p ) L1 (σ, s) L2 (σ, s) i σ, , , l i (σ ) = i sk1 1 , , , l i (σ ) . δ δ δ δ The right-hand side of (5.47) is of the form C15 (δ)(u − t) ∇G L∞ ((Rd )2 ) Eζ , where C15 (δ) vanishes, as δ → 0. The second term appearing on the left-hand side equals L1 (σ, s) L2 (σ, s) E i σ, , , l i (σ ) Dli ,α G(l 1 (σ ), l 2 (σ )) δ δ i,α,β t s d s − ρ × ∂α R lˆi (σ ) dρ |l i (σ )|ζ ds dρ δ
u
−
σ
u
L1 (σ, s) L2 (σ, s) = E i σ, , , l i (σ ) δ δ i,α,β t s−σ ˆ × Dli ,α G(l 1 (σ ), l 2 (σ ))∂α R l i (σ ) |l i (σ )|ζ ds, δ
(5.48)
thanks to the fact that ∇y R(0) = 0. The term appearing on the right-hand side of (2) (5.48) vanishes as δ → 0 and, in consequence we have shown that |E[J1,2,1 ζ ]| = C16 (δ)(u − t) ∇G L∞ ((Rd )2 ) Eζ , where C16 (δ) vanishes, as δ → 0. (2)
We now estimate J1,1 given by (5.41). Note that according to (5.43) and (5.16) we have (2)
(2)
(2)
J1,1 = J1,1,1 + J1,1,2 , where (2) J1,1,1
u s 1 := 2 Dlj ,α G(l 1 (σ ), l 2 (σ ))Dyi ,β Fj,α δ j,α i,β t σ L1 (σ, s) L2 (σ, s) , , l j (σ ) × s, δ δ y (ρ1 ) y 2 (ρ1 ) ×(s − ρ1 ) i,β ρ1 , 1 , , l i (σ ) ds dρ1 , δ δ
with
i ρ, y 1 , y 2 , l := |l|−1 Fi ρ, y 1 , y 2 , l − ˆl, Fi ρ, y 1 , y 2 , l
R
ˆl , d
while (2) J1,1,2
u s ρ1 1 := 2 Dlj ,α G(l 1 (σ ), l 2 (σ ))Dyi ,β Fj,α δ j,α i,β t σ σ L1 (σ, s) L2 (σ, s) d , , l j (σ ) (s − ρ1 ) × s,
i,β δ δ dρ2 y (ρ1 ) y 2 (ρ1 ) × ρ1 , 1 , , l i (ρ2 ) ds dρ1 dρ2 . δ δ
A straightforward computation, using Lemma 5.4 (note that shows that
d dρ2 i,β
(5.49)
∼ δ −1/2 in (5.49)),
(2)
|E[J1,1,2 ζ ]| ≤ O(δ 1/2−3γ1 )(u − t) ∇G L∞ ((Rd )2 ) E[ζ ]. An application of Lemma 5.4, in the same fashion as was done in the calculations (2) (2) (2) concerning the terms E[J1,2,2 ζ ] and E[J1,2,3 ζ ], yields that E[J1,1,1 ζ ] is equal, up to a term of the order C17 (δ)(u − t) ∇G L∞ ((Rd )2 ) E[ζ ], where lim C17 (δ) = 0, to δ→0
u s 1 (s − ρ )E Dli ,α G(l 1 (σ ), l 2 (σ ))Dyi ,β Fi,α 1 δ2 i,α,β t σ L1 (σ, s) L2 (σ, s) × s, , , l i (σ ) i,β δ δ L1 (σ, ρ1 ) L2 (σ, ρ1 ) , , l i (σ ) ζ ds dρ1 . × ρ1 , δ δ We denote
Vi,β (y 1 , y 2 , y 1 , y 2 , l) :=
γ ,q
3 ˆ ˆ ∂β,γ ,q R(y i − y i )lq lγ −
(5.50)
3 ∂β,γ ,γ R(y i − y i ) |l|.
γ
Applying Lemmas 5.1 and 5.3, as in (5.46) (5.47), we conclude that (5.50) is equal, up to a term of order C18 (δ)(u − t) ∇G L∞ ((Rd )2 ) E[ζ ], where lim C18 (δ) = 0, to δ→0
u s 1 ˜ (s − ρ )E D G(l (σ ), l (σ )) (σ, P ; l(·))V (P ) ζ ds dρ1 , (5.51) 1 l ,α 1 2 i i i,α i i δ2 i,α t
σ
with i defined by (5.13), and Pi = (L1 (σ, s), L2 (σ, s), L1 (σ, ρ1 ), L2 (σ, ρ1 ), l i (σ )) . (p ) (p1 ) ], Note, however, that for s ∈ [sk1 1 , sk1 +1 (p ) (s, L1 (σ, s), L2 (σ, s)) = sk1 1 , L1 (σ, s), L2 (σ, s)
and ( ( ( (σ, L1 (σ, s), L2 (σ, s)) − σ, y 1 (σ ), y 2 (σ ) ( 2 ( ( (Lp (σ, s) − y p (σ )( ≤ C(s − σ ) ≤ Cδ 1−γ1 .
≤C
p=1
A similar estimate holds also for the terms containing Li (σ, ρ1 ) and we conclude that the expression in (5.51) is equal, up to a term of order C19 (δ)(u−t) ∇G L∞ ((Rd )2 ) E[ζ ], where lim C19 (δ) = 0, to δ→0
i,α
s 2 E Dlj ,α G(l 1 (σ ), l 2 (σ ))i (σ ) (s − ρ1 )Vi,α (Pi ) dρ1 ζ ds,
u
1 δ2
σ
t+δ 1−γ1
(5.52) with ˜ i (σ ) := i (σ, y 1 (σ ), y 2 (σ ), l(σ ); l(·)). Note that, for s > t + δ 1−γ1 we have s
1 δ2 i,α
(s − ρ1 )Vi,α (Pi ) dρ1
s−δ 1−γ1
i,α
−
=
3 ∂α,γ ,γ R δ−γ1
|l i (σ )|
i,α
−
3 ∂α,γ ,q R
γ ,q
s − ρ1 ˆ l i (σ ) lˆi,q (σ )lˆi,γ (σ ) δ
s − ρ1 ˆ l i (σ ) dρ1 δ
ρ1
3 ˆ ˆ ˆ ∂α,γ ,q R ρ1 l i (σ ) li,q (σ )li,γ (σ )
γ ,q
3 ˆ dρ1 . ∂α,γ ,γ R ρ1 l i (σ ) 0
(s − ρ1 )
s−δ 1−γ1
γ
s
1 = 2 |l i (σ )| δ
(5.53)
γ
Using the fact that q
3 ˆi (σ ) lˆi,q (σ ) = d ∂ 2 R ρ1 lˆi (σ ) ∂α,γ R ρ l 1 ,q dρ1 α,γ
we obtain, upon the integration by parts performed in the first term on the utmost righthand side of (5.53), that this term equals
δ−γ1 −γ1 2 −γ1 ˆ 2 l i (σ ) lˆi,γ (σ ) − |l i (σ )| δ ∂α,γ R δ ∂α,γ R ρ1 lˆi (σ ) lˆi,γ (σ ) dρ1
i,α,γ δ−γ1
− 0
=
0
3 ˆ ρ1 ∂α,γ R ρ (σ ) dρ1 l 1 i ,γ
2 |l i (σ )| δ −γ1 ∂α,γ R δ −γ1 lˆi (σ ) lˆi,γ (σ ) − ∂α R δ −γ1 lˆi (σ )
i,α,γ δ−γ1
−
3 ˆ ρ1 ∂α,γ ,γ R ρ1 l i (σ ) dρ1 .
(5.54)
0
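We have used here the fact that $\nabla R(0) = 0$ and
\[ \sum_\gamma \partial^2_{\alpha,\gamma}R\bigl(\rho_1 \hat{l}_i(\sigma)\bigr)\, \hat{l}_{i,\gamma}(\sigma) = \frac{d}{d\rho_1}\, \partial_\alpha R\bigl(\rho_1 \hat{l}_i(\sigma)\bigr). \]
Summarizing the work done in this section, we have shown that
\[ \left| E[I^{(1)}\zeta] \right| \le C_{20}(u - t)\, \|\nabla G\|_{L^\infty((\mathbb{R}^d)^2)}\, E\zeta, \tag{5.55} \]
where the constant $C_{20}$ does not depend on $\delta$ and $G$.

5.3.2. The terms $E[I^{(2)}\zeta]$ and $E[I^{(3)}\zeta]$. The calculations concerning these terms essentially follow the respective steps performed in the previous section so we only highlight their main points. First, we note that because $l_i(\rho) - l_i(\sigma) \sim \delta^{1/2-\gamma_1}$ we have that $E[I^{(2)}\zeta]$ is, up to a term $C_{21}(\delta)(u - t)\, \|\nabla G\|_{L^\infty((\mathbb{R}^d)^2)}\, E[\zeta]$, where $\lim_{\delta\to 0} C_{21}(\delta) = 0$,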
equal to u s 1 y (s) y (s) E Dlj ,α G(l 1 (σ ), l 2 (σ ))Dlj ,β Fj,α s, 1 , 2 , l j (σ ) δ δ δ j,α,β t σ y (ρ) y 2 (ρ) × Fj,β ρ, 1 (5.56) , , l j (σ ) ζ ds dρ. δ δ Replacing ρ by σ as the argument of l 1 (·), l 2 (·) in (5.56) needs a correction that is of order of magnitude O(δ 1/2−2γ1 )(u − t) ∇G L∞ ((Rd )2 ) E[ζ ], since γ1 ∈ (0, 1/8]. Next we note that (5.56) equals u s 1 L1 (σ, s) L2 (σ, s) , , l j (ρ) E Dlj ,α G(l 1 (σ ), l 2 (σ ))Dlj ,β Fj,α s, δ δ δ j,α,β t σ L1 (σ, ρ) L2 (σ, ρ) × Fj,β ρ, , , l j (σ ) ζ ds dρ δ δ u s 1 1 + 2 E Dlj ,α G(l 1 (ρ), l 2 (ρ))Dyi ,γ Dlj ,β Fj,α δ i,γ j,α,β t
σ
0
R 1 (v, σ, s) R 2 (v, σ, s) , , l j (σ ) δ δ L1 (σ, ρ) L2 (σ, ρ) × Fj,β ρ, , , l j (σ ) (yi,γ (s) − Li,γ (σ, s))ζ ds dρ dv δ δ u s 1 1 + 2 E Dlj ,α G(l 1 (ρ), l 2 (ρ))Dyi ,γ Dlj ,β Fj,α δ i,γ j,α,β t σ 0 R 1 (v, σ, ρ) R 2 (v, σ, ρ) y (s) y (s) , , l j (σ ) × s, 1 , 2 , l j (σ ) Fj,β ρ, δ δ δ δ × (yi,γ (s) − Li,γ (σ, ρ))ζ ds dρ dv. (5.57) ×
s,
A simple argument using Lemma 5.4, (5.31) and (5.32) shows that the second and third terms of (5.57) are both of order of magnitude $O(\delta^{1/2-3\gamma_1})(u - t)\, \|\nabla G\|_{L^\infty((\mathbb{R}^d)^2)}\, E[\zeta]$. The first term, on the other hand, can be handled with the help of Lemma 5.3 in the same fashion as we have dealt with the term $J_{1,2,1}^{(2)}$, given by (5.44) of Sect. 5.3.1, and we obtain that
\[ \left| E[I^{(2)}\zeta] \right| \le C_{22}(\delta)(u - t)\, \|\nabla G\|_{L^\infty((\mathbb{R}^d)^2)}\, E[\zeta], \tag{5.58} \]
where $\lim_{\delta\to 0} C_{22}(\delta) = 0$.
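Finally, concerning the limit of $E[I^{(3)}\zeta]$ we note that by Lemma 5.4 we have
\[ E[I^{(3)}\zeta] \le C_{23}(\delta)(u - t)\, \|\nabla G\|_{L^\infty((\mathbb{R}^d)^2)}\, E[\zeta] + \sum_{i,j} I_{i,j}, \tag{5.59} \]
where $\lim_{\delta\to 0} C_{23}(\delta) = 0$ and
\[
I_{i,j} := \frac{1}{\delta} \sum_{\alpha,\beta} \int_t^u\!\!\int_\sigma^s E\left[ D_{l_i,\beta}D_{l_j,\alpha}G(l_1(\sigma), l_2(\sigma))\, F_{j,\alpha}\!\left(s, \frac{L_1(\sigma,s)}{\delta}, \frac{L_2(\sigma,s)}{\delta}, l_j(\sigma)\right)
F_{i,\beta}\!\left(\rho, \frac{L_1(\sigma,\rho)}{\delta}, \frac{L_2(\sigma,\rho)}{\delta}, l_i(\sigma)\right) \zeta \right] ds\,d\rho.
\]
First, let $i \ne j$ and $2\delta^{1-\gamma_1}M \le (2q)^{-1}$. Suppose also that $s \ge t_1^{(p)}$. We have then
\[ |L_i(\sigma, s) - L_j(\sigma, \rho)| \ge \frac{1}{q} - 2M(s - \sigma) \ge \frac{1}{2q} \]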
on the event (with fixed α, β) L1 (σ, s) L2 (σ, s) L1 (σ, ρ) L2 (σ, ρ) , , l j (σ ) i s, , , l i (σ ) j s, δ δ δ δ × Dli ,β Dlj ,α G(l 1 (σ ), l 2 (σ ))|l j (σ )||l i (σ )| = 0. (p)
When, on the other hand, s, ρ ∈ [0, t1 ], then we conclude from (5.3) that |Li (σ, s) − Lj (σ, ρ)| ≥ γN s ≥ γN (s − σ ).
Therefore |Ii,j | can then be estimated via Lemma 5.3 and Lemma 5.1 by γ C23 D12 M 2 Dl1 Dl2 G L∞ δ −γ1 φ 1/2 γN + δ 1−2γ1 E[ζ ]. δ 1
(5.60)
It obviously vanishes, as δ → 0. The second term in (5.60) arises from the contribution of s < t + δ 1−γ1 . When i = j we can use Lemma 5.3 in order to obtain |Ii,i | ≤ C24 (u − t) ∇ 2 G L∞ ((Rd )2 ) E[ζ ]. Summarizing, we conclude that |E[I (3) ζ ]| ≤ C25 (u − t) ∇ 2 G L∞ ((Rd )2 ) E[ζ ], where C25 can be chosen independently of δ and G. Hence we conclude (5.29) and tightness follows. Suppose now that Q is any limiting measure of Q(δn ) for a certain sequence δn → 0, as n → +∞. Coming back to (5.52) we conclude, using calculation (5.53)–(5.54), that the limit, as δ → 0, of the expression on the left hand side of (5.52) equals u aα(i) (s)Dli ,α G(K1 (s), K2 (s))|Ki (s)|i (s) ζ (K) ds Q(dK), (5.61) i,α
t
where
ˆ i (s) := i s, Y1 (s), Y2 (s), K1 (s), K2 (s); K(·), K(·) , s Yi (s) := xi +
Kˆ i (ρ) dρ,
i = 1, 2,
0
aα(i) (s)
+∞ 3 ˆ K := − ρ1 ∂α,γ R ρ (s) dρ1 . 1 i ,γ γ
0
Similarly, we calculate the limit, as δ → 0, of E[I (3) ζ ]. We know that only the limits of the terms Ii,i contribute. A straightforward computation shows that lim Ii,i δ→0
=
i
i,α,β
u c(i) (s)Dli ,α Dli ,β G(K1 (s), K2 (s))H (i) (s) ζ (K) ds Q(dK), α,β t
where (i) cα,β (s)
+∞ 2 := −|Ki (s)| ∂α,β R(ρ Kˆ i (s)) dρ, 2
0
H
(i)
(s) := 2i (s, Y1 (s), Y2 (s), Ki (s)) .
Summarizing, we have shown that any limiting measure Q satisfies (5.30).
5.4. The removal of cut-offs and the proof of weak convergence of $(k_1^{(\delta)}(\cdot), k_2^{(\delta)}(\cdot))$. Let $Q_{k_1,k_2} := Q_{k_1} \otimes Q_{k_2}$ be the law of two independent copies of the diffusion given by (4.1) over $C^2(k_1, k_2)$ starting respectively at $k_1$ and $k_2$. For a fixed $M$ let $Q_{k_1}^{(M)}$ be the law over $C^1$ of any diffusion starting at a given $k_1 \in \mathbb{R}^d$ with the generator $L^{(M)}$ given by
\[ L^{(M)}F(k) = \sum_{p,q} a_{p,q}^{(M)}(k)\, \partial^2_{k_p,k_q}F(k) + \sum_p b_p^{(M)}(k)\, \partial_{k_p}F(k), \qquad F \in C_0^\infty(\mathbb{R}^d). \]
Here $a_{p,q}^{(M)}(\cdot)$, $b_p^{(M)}(\cdot)$ are bounded and twice continuously differentiable, $a_{p,q}^{(M)}(k) = |k|^2 D_{p,q}(\hat{k})$, $b_p^{(M)}(k) = |k| E_p(\hat{k})$ for $M^{-1} \le |k| \le M$. By virtue of Theorems 5.2.3 and 5.3.2 of [23] we conclude that $Q_{k_1}^{(M)}$ is the unique probability measure such that
\[ F(K(t)) - F(k_1) - \int_0^t L^{(M)}F(K(s))\,ds, \qquad t \ge 0 \]
is an $(\mathcal{M}_1^{0,t})_{t\ge 0}$-martingale for any $F \in C_b^2(\mathbb{R}^d)$. We define $Q_{k_1,k_2}^{(M)} := Q_{k_1}^{(M)} \otimes Q_{k_2}^{(M)}$.
Let us briefly describe the strategy of the proof of weak convergence of $(k_1^{(\delta)}(\cdot), k_2^{(\delta)}(\cdot))$. First, for any $K \in C^2$ we define a certain stopping time $W(K; M, N, p, q)$, see (5.63). The crucial property of that time is that the dynamics given by (4.8) agrees with the dynamics of the truncated system (5.16) up to $W(\cdot; M, N, p, q)$. We also show that any limiting measure $Q(\cdot; M, N, p, q)$ satisfies, up to the stopping time, the martingale problem associated with the diffusion given by $Q_{k_1,k_2}$. This property allows us to identify $Q(\cdot; M, N, p, q)$ with $Q_{k_1,k_2}$ on the $\sigma$-algebra $\mathcal{M}_2^{0,W}$ corresponding to the stopping time. The final step is to show that for sufficiently large $N$, so that (5.3) is satisfied, and sufficiently large $M$, as in (5.1), the stopping time $W(\cdot; M, N, p, q)$ converges to infinity in $Q_{k_1,k_2}$ as $q \to +\infty$ and $p \to +\infty$ (in that order), see (5.64). The weak convergence statement is a consequence of this property of the stopping time and it is shown in the calculation following (5.80).

We introduce the following $(\mathcal{M}_2^{0,t})_{t\ge 0}$-stopping times. As before, for any $K = (K_1, K_2)$ such that $K(t) \ne 0$ for all $t \ge 0$ we define
\[ Y_j(t) := \int_0^t \hat{K}_j(s)\,ds. \tag{5.62} \]
For such a $K$ we let $S(N, p) := \lim_{n\uparrow+\infty} S_n(N, p)$, where
\[
S_n(N, p) := \inf\Bigl[ t \ge 0 : \text{for some } k \ge 0 \text{ we have } t \in \bigl[t_k^{(p)}, t_{k+1}^{(p)}\bigr) \text{ and } \hat{K}_i(t_j^{(p)}) \cdot \hat{K}_i(t) < 1 - \frac{2}{N+1} + \frac{1}{n}, \text{ for some } i \in \{1, 2\} \text{ or } j \in \{k-1, k\} \Bigr].
\]
If $K$ is such that it becomes 0 for some $t$ we adopt the convention that $S(N, p) = +\infty$. We let further $T(M) := \lim_{n\uparrow+\infty} T_n(M)$, where
\[
T_n(M) := \inf\Bigl[ t \ge 0 : |K_i(t)| < \frac{1}{M} + \frac{1}{n}, \text{ for some } i \in \{1, 2\}, \text{ or } |K_i(t)| > M - \frac{1}{n}, \text{ for some } i \in \{1, 2\} \Bigr].
\]
Finally, for any $R_1, R_2 > 0$ and $K = (K_1, K_2) \in C^2(R_1, R_2)$ we let $U(p, q; K) := \lim_{n\uparrow+\infty} U_n(p, q; K)$, $V(p, q; K) := \lim_{n\uparrow+\infty} V_n(p, q; K)$, where
\[
U_n(p, q; K) := \inf\Bigl[ t \ge 0 : \text{for some } k \ge 1,\ i \in \{1, 2\} \text{ we have } t \in \bigl[t_k^{(p)}, t_{k+1}^{(p)}\bigr) \text{ and } u \in \bigl[0, t_{k-1}^{(p)}\bigr] \text{ such that } |Y_i(t) - Y_i(u)| < \frac{1}{q} + \frac{1}{n} \Bigr],
\]
\[
V_n(p, q; K) := \inf\Bigl[ t \ge \frac{1}{p} : \inf_{0\le u\le t} |Y_1(t) - Y_2(u)| < \frac{1}{q} + \frac{1}{n}, \text{ or } \inf_{0\le u\le t} |Y_2(t) - Y_1(u)| < \frac{1}{q} + \frac{1}{n} \Bigr].
\]
We adopt the convention that any of the above defined stopping times is infinite if the respective set of times over which it is determined is empty. Suppose that $T_0 > 0$ is an arbitrary deterministic time. Let $W(M, N, p, q) := S(N, p) \wedge T(M) \wedge U(p, q) \wedge V(p, q)$ and
\[ B(M, N, p, q) := \{ S(N, p) \wedge U(p, q) \wedge V(p, q) \le T(M) \wedge T_0 \}. \tag{5.63} \]
We have $B \in \mathcal{M}_2^{0,W}$. According to Theorem 6.1.2 of [23] the measures $Q_{k_1,k_2}^{(M)}$, $Q_{k_1,k_2}$, $Q(\cdot; M, N, p, q)$ agree, when restricted to $\mathcal{M}_2^{0,W}$. In what follows we show that
\[ \lim_{p\to+\infty} \lim_{q\to+\infty} Q_{k_1,k_2}[W(M, N, p, q) < +\infty] = 0. \tag{5.64} \]
The condition
\[ T_0 < W(k_1^{(\delta)}(\cdot), k_2^{(\delta)}(\cdot); M, N, p, q) = W(l_1^{(\delta)}(\cdot), l_2^{(\delta)}(\cdot); M, N, p, q) \tag{5.65} \]
implies $(k_1^{(\delta)}(s), k_2^{(\delta)}(s)) = (l_1^{(\delta)}(s), l_2^{(\delta)}(s))$ for $s \in [0, T_0]$. We will use both (5.64) and (5.65) to establish weak convergence of the laws of $(k_1^{(\delta)}(\cdot), k_2^{(\delta)}(\cdot))$ over $C([0, T_0]; (\mathbb{R}^d)^2)$. We start with the following simple observation.

Lemma 5.6. With the choice of $M$ as in (5.1) we have $Q_{k_1,k_2}[T(M) = +\infty] = 1$.

Proof. A simple calculation using the Itô formula and Remark 4.1 shows that $d|k_j(t)|^2 = 0$, $j = 1, 2$, which proves the lemma.

Lemma 5.7. Under the assumptions of Theorem 4.4 we have
\[ \lim_{q\to+\infty} U(p, q) = +\infty, \qquad \forall\, p, \quad Q_{k_1,k_2}\text{-a.s.} \tag{5.66} \]
Proof. The proof is essentially the repetition of the argument from [15] pp. 60–61 so we only highlight its main points. It suffices only to show that
\[ \lim_{q\to+\infty} U^{(i)}(p, q) = +\infty, \qquad \forall\, p, \quad Q_{k_i}\text{-a.s.}, \tag{5.67} \]
where $U^{(i)}(p, q) := \lim_{n\uparrow+\infty} U_n^{(i)}(p, q)$,
\[
U_n^{(i)}(p, q) := \inf\Bigl[ t \ge 0 : \text{for some } k \ge 1, \text{ we have } u \in \bigl[0, t_{k-1}^{(p)}\bigr],\ t \in \bigl[t_k^{(p)}, t_{k+1}^{(p)}\bigr) \text{ such that } \Bigl| \int_u^t \hat{K}_i(s)\,ds \Bigr| < \frac{1}{q} + \frac{1}{n} \Bigr].
\]
However, (5.67) can be proved with the help of the argument contained in pp. 60–61 of [15] so we omit the details here. We obtain from (5.67) that
\[ \lim_{q\to+\infty} U^{(i)}(p, q) = +\infty, \qquad \forall\, p, \quad Q_{k_1,k_2}\text{-a.s.}, \quad i = 1, 2. \]
Since $U(p, q) = U^{(1)}(p, q) \wedge U^{(2)}(p, q)$, (5.66) follows.
Let us denote by
\[ Y_t^{(j)} := \bigcup_{0\le s\le t} \{Y_j(s)\} \tag{5.68} \]
and by $B_r(Y_t^{(j)}) := [x : \operatorname{dist}(x, Y_t^{(j)}) \le r]$ the sausage, up to time $t$, of diameter $r > 0$ around the trajectory $Y_j(\cdot)$. The next lemma shows that $S(N, p)$ becomes infinite as $p \to \infty$ for each $N$.

Lemma 5.8. We have
\[ \lim_{p\to+\infty} S(N, p) = +\infty, \qquad \forall\, N, \quad Q_{k_1,k_2}\text{-a.s.} \tag{5.69} \]
Proof. The conclusion of the lemma is a consequence of the uniform continuity of paths of the diffusion on any finite time interval $[0, T]$, which implies that
\[ \lim_{p\to+\infty}\ \min_{t_k^{(p)} \in [0,T]}\ \min_{t \in [t_k^{(p)}, t_{k+1}^{(p)}]} \hat{K}_j(t) \cdot \hat{K}_j(t_k^{(p)}) = 1, \qquad j = 1, 2. \]
Our next lemma shows that $V(p, q)$ becomes infinite with $p, q \to +\infty$.

Lemma 5.9. Suppose that $N$ is as in (5.3) and $T_1, \eta > 0$ are arbitrary. Then, one can find $p_0, q_0$ such that
\[ Q_{k_1,k_2}[S(N, p) \wedge V(p, q) \le T_1] < \eta, \qquad \forall\, p \ge p_0,\ q \ge q_0. \tag{5.70} \]
In order to prove this lemma we will need an auxiliary property of (K1 (·), Y1 (·)). Let k1 = |k1 |. Note that the process (K1 (·), Y1 (·)) is a diffusion on Rd × Rd , actually (0) × Rd , over (T1 , Qk1 ). Its generator is given by supported on Skd−1 1 N (k, x) := LF (k, x) + kˆ · ∇x F (k, x). We denote by P (t, k, x; ·) its transition probability. It satisfies the Fokker-Planck equation +∞
(∂t − N )ϕ(t, k , y)P (t, k, x; dt, dk , dy) = 0,
0 Rd Rd
∀ϕ ∈ C0∞ ((0, +∞) × Rd × Rd ).
(5.71)
Lemma 5.10. Let $t > 0$, $(\mathbf{k}, x) \in S_k^{d-1} \times \mathbb{R}^d$ ($k = |\mathbf{k}|$). Then, $P(t, \mathbf{k}, x; \cdot)$ is absolutely continuous with respect to the Lebesgue measure on $S_k^{d-1} \times \mathbb{R}^d$, with the transition probability density $p(t, \mathbf{k}, x, \cdot, \cdot)$ that is a $C^\infty$-function. In particular, for any $T, K, \eta > 0$ there exists a constant $C > 0$ such that
\[ \max_{t \in [\eta, T]}\ \max_{(\mathbf{k}, x) \in S_k^{d-1} \times B_K(0)} P(t, \mathbf{k}, x; S_k^{d-1} \times A) \le C|A| \tag{5.72} \]
for any $A \subset B_K(0)$ and $A \in \mathcal{B}(\mathbb{R}^d)$.
Proof. Let k := |k| and Si : Bkd−1 (0) → S d−1 be given by 3 (±) Si (l) := (l1 , · · · , ± k 2 − l 2 , · · · , ld−1 ), l = (l1 , · · · , ld−1 ) ∈ Bkd−1 (0), l := |l|. 56 7 4 i th component (±)
(±)
Define the measure Pi (t, B × A) := P (t, k, x; Si (B) × A), A ∈ B(Rd ), B ∈ B(Bkd−1 (0)). The conclusion of the lemma holds if we can show that each measure Pi possesses a C ∞ smooth density. In what follows we consider only the case i = d and (+) (+) denote S := Sd , PS := Pd . Note that the respective measure satisfies the equation (∂t − N˜ ∗ )PS (t, ·) = 0 in the distribution sense, with d−1
N˜ F (l, x) := k 2
D˜ p,q (l)∂l2p ,lq F (l, x) + k
p,q=1
+
1 k
d−1 p=1
8 lp ∂xp F (k, x) +
d−1
E˜ p (l)∂lp F (l, x)
p=1
2 l 1− ∂xd F (k, x), k
√ √ where D˜ p,q (l) = Dp,q (k −1 l, k −1 k 2 − l 2 ), E˜ p (l) = Ep (k −1 l, k −1 k 2 − l 2 ). It suffices therefore to prove that ∂t − N˜ ∗ is hypoelliptic in order to prove the lemma. Note that (∂t − N˜ ∗ )F =
d p=1
Xp2 F + X0 F + a(l)F,
∀ F ∈ C0∞ (Bkd−1 ),
where Xp (l) := k
d−1
1/2 Dˆ p,q (l)∂lq ,
q=1
1 X0 (l) := ∂t − lq ∂xq − k d−1 q=1
p = 1, · · · , d − 1, 8
2 d−1 l 1− ∂ xd + aq (l)∂lq , k q=1
and a(·), ap (·), p = 1, · · · , d − 1 are C ∞ -smooth functions. It suffices therefore to prove that for any (t, l, x) ∈ R × S d−1 × Rd the linear subspace Lt,l,x of the tangent space to R × S d−1 × Rd , spanned by the vector fields belonging to the Lie algebra L generated by X0 , X1 , · · · , Xd , is of dimension 2d. The (d − 1) × (d − 1) matrix ˜ 1/2 (·), is non-degenerate in B d−1 (0) due to Proposition ˜ D(·) := [D˜ p,q (l)], as well as D k 4.3 (actually it degenerates in the limit when l approaches ∂Bkd−1 (0)). Hence the vectors ∂lp ∈ Lt,l,x , p = 1, · · · , d − 1. Note also that d−1 d−1 lq 1/2 ˆ [X0 , Xp ] = ∂xd + bq (l)∂lq , Dp,q (l) ∂xq + √ k2 − l2 q=1 q=1 where bp (·), p = 1, · · · , d − 1 are C ∞ -smooth functions. Hence, ∂xq + lq (k 2 − l 2 )−1/2 ∂xd ∈ Lt,l,x , q = 1, · · · , d − 1. Furthermore, d−1
˜ + (D(l)l, ˜ [[X0 , Xp ], Xp ] = −k trD(l) l)Rd (k 2 − l 2 )−1 (k 2 − l 2 )−1/2 ∂xd
p=1
+
d−1 lq dq (l) ∂xq + √ ∂xd + cq (l)∂lq , k2 − l2 q=1 q=1
d−1
where cp (·), dp (·), p = 1, · · · , d − 1 are C ∞ -smooth functions. We can conclude therefore that ∂xd ∈ Lt,l,x , hence also ∂xp ∈ Lt,l,x , p = 1, · · · , d − 1 and finally we also get ∂t ∈ Lt,l,x , so that the proof of Lemma 5.10 is complete. Proof of Lemma 5.9. Let A(N, p) := [S(N, p) ≥ T1 + 1]. Choose p sufficiently large so that Qk1 ,k2 [A(N, p)] ≥ 1 − η/2. This can be done thanks to the continuity property of diffusion paths. For any (K1 (·), K2 (·)) ∈ A(N, p) we have ( ( ( ( ( ( ( ( (Y1 1 − Y2 (ρ)( ≥ λN (p), and (Y2 1 − Y1 (ρ)( ≥ λN (p) (5.73) ( ( ( ( p p for all ρ ∈ [0, 1/p], according to (5.4) (see (5.62) for the definition of Yi (·), i = 1, 2 ). (i) Recall also that Yt (Ki ), i = 1, 2 are defined by (5.68).
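As a side illustration of the bracket argument used in the proof of Lemma 5.10 above, the following sketch checks a Hörmander-type rank condition numerically for a much simpler model, not for the operator $\tilde N$ of the lemma. The fields `X0` (a kinetic drift $\partial_t - v\,\partial_x$) and `X1` (a single diffusion direction $\partial_v$), as well as every function name, are assumptions introduced only for this example. The point it shows is the same mechanism as in the proof: the diffusion field alone does not span the tangent space, but adding the commutator $[X_0, X_1]$ does.

```python
def jacobian(field, point, h=1e-5):
    """Central-difference Jacobian J[i][j] = d(field_i)/d(point_j)."""
    n = len(point)
    cols = []
    for j in range(n):
        up = list(point)
        dn = list(point)
        up[j] += h
        dn[j] -= h
        fu, fd = field(up), field(dn)
        cols.append([(fu[i] - fd[i]) / (2 * h) for i in range(n)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def bracket(X, Y):
    """Lie bracket [X, Y] = (DY) X - (DX) Y, returned as a new vector field."""
    def Z(point):
        JX, JY = jacobian(X, point), jacobian(Y, point)
        x, y = X(point), Y(point)
        n = len(point)
        return [sum(JY[i][j] * x[j] - JX[i][j] * y[j] for j in range(n))
                for i in range(n)]
    return Z


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Coordinates are (t, x, v); both fields are hypothetical model fields.
def X0(p):
    t, x, v = p
    return [1.0, -v, 0.0]      # d_t - v d_x


def X1(p):
    return [0.0, 0.0, 1.0]     # d_v


def spans_tangent_space(point):
    """True if X0, X1 and [X0, X1] are linearly independent at the point."""
    rows = [X0(point), X1(point), bracket(X0, X1)(point)]
    return abs(det3(rows)) > 1e-6
```

In this toy setting `spans_tangent_space` plays the role of the dimension count for $L_{t,\mathbf{l},\mathbf{x}}$; the actual proof needs second-order brackets because the $\partial_{x_d}$ direction only appears there.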
Let $V^{(1)}(p, q; K) := \lim_{n \to +\infty} V_n^{(1)}(p, q; K)$, where
$$V_n^{(1)}(p, q; K) := \inf\Big[ t \ge \frac{1}{p} : \operatorname{dist}\big(Y_1(t), Y_t^{(2)}\big) < \frac{1}{q} + \frac{1}{n} \Big],$$
and likewise we introduce $V^{(2)}(p, q; K) := \lim_{n \to +\infty} V_n^{(2)}(p, q; K)$, with
$$V_n^{(2)}(p, q; K) := \inf\Big[ t \ge \frac{1}{p} : \operatorname{dist}\big(Y_2(t), Y_t^{(1)}\big) < \frac{1}{q} + \frac{1}{n} \Big].$$
Note that $V(p, q; K) := V^{(1)}(p, q; K) \wedge V^{(2)}(p, q; K)$.

The conclusion of Lemma 5.9 is then a consequence of the following.

Lemma 5.11. For any $N$ sufficiently large so that (5.3) holds and $p \ge 1$ we have
$$\lim_{q \to +\infty} V^{(i)}(p, q; K) = +\infty \quad Q_{\mathbf{k}_1, \mathbf{k}_2}\text{-a.s. on } A(N, p), \qquad \forall\, \mathbf{k}_1, \mathbf{k}_2 \ne 0,\ i = 1, 2. \tag{5.74}$$

Proof. With no loss of generality we assume that $i = 1$. Note that obviously $V^{(1)}(p, q'; K) \ge V^{(1)}(p, q; K)$ for $q' \ge q$ and all $K \in C_2(\mathbf{k}_1, \mathbf{k}_2)$. For any $K_2$ we denote $A(N, p; K_2) := [K_1 : (K_1, K_2) \in A(N, p)]$. It suffices to show that for $Q_{\mathbf{k}_2}$-a.s. $K_2$ we have
$$\lim_{q \to +\infty} V^{(1)}(p, q; K_1, K_2) = +\infty, \qquad Q_{\mathbf{k}_1}\text{-a.s. on } A(N, p; K_2).$$
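Let us denote $B(t, \mathbf{x}; K_2) := \big[ K_1 : |Y_1(t; \mathbf{x}) - Y_2(\rho)| \ge \lambda_N(p),\ \rho \in [0, 1/p] \big]$. Note that $A(N, p; K_2) \subseteq B\big(\tfrac{1}{p}, 0; K_2\big)$, according to (5.73). Let $T_1 > 0$ be arbitrary. We show that
$$\lim_{q \to +\infty} Q_{\mathbf{k}_1}\Big[ V^{(1)}(p, q; \cdot, K_2) \le T_1,\ B\Big(\frac{1}{p}, 0; K_2\Big) \Big] = 0, \qquad Q_{\mathbf{k}_2}\text{-a.s. in } K_2. \tag{5.75}$$
The expression under the limit in (5.75) can be estimated by
$$Q_{\mathbf{k}_1}\Big[ \inf_{u \in [0, T_1]} \operatorname{dist}\big(Y_1(u), Y_{T_1}^{(2)}\big) \le \frac{1}{q},\ B\Big(\frac{1}{p}, 0; K_2\Big) \Big] = \int_{S_{k_1}^{d-1} \times [1/p \ge |\mathbf{x}| \ge \lambda_N(p)]} P\Big(\frac{1}{p}, \mathbf{k}_1, 0, d\mathbf{k}, d\mathbf{x}\Big) \times Q_{\mathbf{k}}\Big[ \inf_{u \in [0, T_1 - 1/p]} \operatorname{dist}\big(Y_1(u; \mathbf{x}), Y_{T_1}^{(2)}\big) \le \frac{1}{q},\ B(0, \mathbf{x}; K_2) \Big].$$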
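Here we used the Markov property of the process $(K_1, Y_1)$. Equation (5.75) follows if we can show that
$$\lim_{q \to +\infty} Q_{\mathbf{k}}\Big[ \inf_{u \in [0, T_1 - 1/p]} \operatorname{dist}\big(Y_1(u; \mathbf{x}), Y_{T_1}^{(2)}\big) \le \frac{1}{q} \Big] = 0 \tag{5.76}$$
for every $\mathbf{k} \in S_{k_1}^{d-1}$ and $\mathbf{x}$ satisfying $1/p \ge |\mathbf{x}| \ge \lambda_N(p)$, $Q_{\mathbf{k}_2}$-a.s. in $K_2$.

Suppose first that $\eta_1 := \frac{1}{2} \operatorname{dist}\big(\mathbf{x}, Y_{T_1}^{(2)}\big) > 0$. Then,
$$Q_{\mathbf{k}}\Big[ \inf_{0 \le u \le \eta_1} \operatorname{dist}\big(Y_1(u; \mathbf{x}), Y_{T_1}^{(2)}\big) \ge \frac{1}{q} \Big] = 1, \qquad \forall\, q \ge 4\eta_1^{-1}. \tag{5.77}$$
Note that the expression under the limit on the left-hand side of (5.76) can be estimated by
$$Q_{\mathbf{k}}\Big[ \inf_{\eta_1 \le j/q \le T_1} \operatorname{dist}\big(Y_1(j/q; \mathbf{x}), Y_{T_1}^{(2)}\big) \le \frac{2}{q} \Big] \le (T_1 + 1)\, q \max_{\eta_1 \le j/q \le T_1} Q_{\mathbf{k}}\Big[ Y_1(j/q; \mathbf{x}) \in B_{2/q}\big(Y_{T_1}^{(2)}\big) \Big]. \tag{5.78}$$
The right-hand side of (5.78) can be estimated, with the help of Lemma 5.10, by
$$C(\eta_1)(T_1 + 1)\, q^{2-d}, \qquad \forall\, q \ge 4\eta_1^{-1}$$
(recall that $Y_2(\cdot)$ is of $C^1$-class, with $|Y_2'(\cdot)| \le 1$) and (5.75) follows, provided we can prove that
$$Q_{\mathbf{k}_2}\Big[ \operatorname{dist}\big(\mathbf{x}, Y_{T_1}^{(2)}\big) = 0 \Big] = 0 \tag{5.79}$$
for $1/p \ge |\mathbf{x}| \ge \lambda_N(p)$. Recall that $|\mathbf{x} - Y_2(\rho)| \ge \lambda_N(p)$, $\rho \in [0, 1/p]$. For any $\rho > 0$ we can estimate therefore the left-hand side of (5.79) by
$$\frac{T_1 + 1}{\rho} \max_{1/p \le j\rho \le T_1} Q_{\mathbf{k}_2}\big[\, |Y_2(j\rho) - \mathbf{x}| \le 2\rho \,\big] \le C(p)(T_1 + 1)\, \rho^{d-1}$$
for some constant $C(p) > 0$ depending only on $p$. Since the last inequality holds for all $\rho > 0$ we conclude (5.79).

An immediate consequence of Lemmas 5.6, 5.7 and 5.9 is the following.

Corollary 5.12. For any $M, \varepsilon > 0$ there exist sufficiently large $p$, $q$ and $N$ so that $Q_{\mathbf{k}_1, \mathbf{k}_2}[B(M, N, p, q)] < \varepsilon$.

Choose any $T_0 > 0$ and $F$ a bounded and continuous functional over $C_2$ that is $\mathcal{M}_2^{0,T_0}$-measurable. We show that
$$\limsup_{\delta \to 0} \mathbb{E}\, F\big(\mathbf{k}_1^{(\delta)}(\cdot), \mathbf{k}_2^{(\delta)}(\cdot)\big) \le \int F(K(\cdot))\, Q_{\mathbf{k}_1, \mathbf{k}_2}(dK). \tag{5.80}$$
This, in fact, implies weak convergence of the laws of $\big(\mathbf{k}_1^{(\delta)}(\cdot), \mathbf{k}_2^{(\delta)}(\cdot)\big)$ over $C_2$, as $\delta \to 0$.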
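Fix $\eta > 0$ and choose $M > 0$ such that $M - 1$ satisfies (5.1). Then, by virtue of Lemma 5.6,
$$Q_{\mathbf{k}_1, \mathbf{k}_2}[T(M - 1) \le T_0] = 0. \tag{5.81}$$
Let $p$, $q$ be such that
$$Q_{\mathbf{k}_1, \mathbf{k}_2}[B(M, N, p, q)] \le \eta. \tag{5.82}$$
Note that $B(M - 1, N - 1, p, q - 1) \subseteq B(M, N, p, q)$. Let $\delta_n \to 0$; then we can choose a subsequence, which we still denote by $(\delta_n)$, such that the laws of $\big(\mathbf{l}_1^{(\delta_n)}(\cdot), \mathbf{l}_2^{(\delta_n)}(\cdot)\big)$ converge over $C_2$, as $n \to +\infty$, to a certain $Q(\cdot; M, N, p, q)$. We have
$$\limsup_{n \to +\infty} \mathbb{E}\, F\big(\mathbf{k}_1^{(\delta_n)}(\cdot), \mathbf{k}_2^{(\delta_n)}(\cdot)\big) \le \limsup_{n \to +\infty} \mathbb{E}\Big[ F\big(\mathbf{l}_1^{(\delta_n)}(\cdot), \mathbf{l}_2^{(\delta_n)}(\cdot)\big);\ W\big(\mathbf{l}_1^{(\delta_n)}(\cdot), \mathbf{l}_2^{(\delta_n)}(\cdot); M - 1, N - 1, p, q - 1\big) > T_0 \Big]$$
$$\qquad + \limsup_{n \to +\infty} \Big| \mathbb{E}\Big[ F\big(\mathbf{k}_1^{(\delta_n)}(\cdot), \mathbf{k}_2^{(\delta_n)}(\cdot)\big);\ W\big(\mathbf{l}_1^{(\delta_n)}(\cdot), \mathbf{l}_2^{(\delta_n)}(\cdot); M - 1, N - 1, p, q - 1\big) \le T_0 \Big] \Big|. \tag{5.83}$$
The second term on the right-hand side of (5.83) can be estimated by
$$\|F\|_{L^\infty} \Big\{ \limsup_{n \to +\infty} \mathbb{P}\Big[ T\big(\mathbf{l}_1^{(\delta_n)}(\cdot), \mathbf{l}_2^{(\delta_n)}(\cdot); M - 1\big) \le T_0 \Big] + Q_{\mathbf{k}_1, \mathbf{k}_2}\big[B(M - 1, N - 1, p, q - 1)\big] \Big\}. \tag{5.84}$$
Note also that
$$\limsup_{n \to +\infty} \Big| \mathbb{P}\Big[ T\big(\mathbf{l}_1^{(\delta_n)}(\cdot), \mathbf{l}_2^{(\delta_n)}(\cdot); M - 1\big) > T_0 \Big] - \mathbb{P}\Big[ W\big(\mathbf{l}_1^{(\delta_n)}(\cdot), \mathbf{l}_2^{(\delta_n)}(\cdot); M - 1, N - 1, p, q - 1\big) > T_0 \Big] \Big|$$
$$\qquad \le \limsup_{n \to +\infty} \mathbb{P}\Big[ \big(\mathbf{l}_1^{(\delta_n)}(\cdot), \mathbf{l}_2^{(\delta_n)}(\cdot)\big) \in B(M - 1, N - 1, p, q - 1) \Big] \le Q_{\mathbf{k}_1, \mathbf{k}_2}\big[B(M - 1, N - 1, p, q - 1)\big] \le \eta, \tag{5.85}$$
and hence
$$\limsup_{n \to +\infty} \mathbb{P}\Big[ T\big(\mathbf{l}_1^{(\delta_n)}(\cdot), \mathbf{l}_2^{(\delta_n)}(\cdot); M - 1\big) \le T_0 \Big] = 1 - \liminf_{n \to +\infty} \mathbb{P}\Big[ T\big(\mathbf{l}_1^{(\delta_n)}(\cdot), \mathbf{l}_2^{(\delta_n)}(\cdot); M - 1\big) > T_0 \Big]$$
$$\qquad \le Q_{\mathbf{k}_1, \mathbf{k}_2}\big[W(K; M - 1, N - 1, p, q - 1) \le T_0\big] + \eta. \tag{5.86}$$
The first expression on the utmost right-hand side of (5.86) is less than or equal to
$$Q_{\mathbf{k}_1, \mathbf{k}_2}[T(K; M - 1) \le T_0] + Q_{\mathbf{k}_1, \mathbf{k}_2}\big[B(M - 1, N - 1, p, q - 1)\big] \le \eta \tag{5.87}$$
according to (5.81) and (5.82). Summarizing, the expression in (5.84) can be estimated by $2\eta \|F\|_{L^\infty}$.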
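The first term on the right-hand side of (5.83) can be estimated by
$$\int F(K(\cdot))\, \mathbf{1}_{[W(K; M, N, p, q) > T_0]}\, Q_{\mathbf{k}_1, \mathbf{k}_2}(dK) \le \int F(K(\cdot))\, Q_{\mathbf{k}_1, \mathbf{k}_2}(dK) + \|F\|_{L^\infty}\, Q_{\mathbf{k}_1, \mathbf{k}_2}[W(K; M, N, p, q) \le T_0]$$
$$\qquad \le \int F(K(\cdot))\, Q_{\mathbf{k}_1, \mathbf{k}_2}(dK) + 2\eta \|F\|_{L^\infty}.$$
The last estimate follows from an estimate analogous to (5.87). Summarizing, since $\eta > 0$ is arbitrary we conclude (5.80).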
A. Proof of Lemma 2.3 We define dεδ (ξ , x0 ) = vεδ,B − v˜ εδ,B = β
and split dε =
e
ik·(ξ −y)
3
δ,j j =1 dε
Wεδ (x0
ε(ξ + y) dkdy δ , k) − Wε (x0 , k) 0 S0 (y) + , (A.1) 2 (2π )d
according to the decomposition
ε(ξ + y) Wεδ x0 + , k − Wεδ (x0 , k) 2 ε(ξ + y) ε(ξ + y) δ δ = Wε x0 + , k − U x0 + ,k 2 2 ε(ξ + y) + U δ x0 + , k − U δ (x0 , k) + U δ (x0 , k) − Wεδ (x0 , k) . 2 δ Here U δ = uq q is the semi-classical approximation of Wεδ , with uδq the solutions of the Liouville equations. The last term may be estimated as dεδ,3 (ξ , x0 ) 2 dx0
!2 ! ! dk ! ik·ξ δ δ ! dx0 ! ˆ ≤ ! e [U (x0 , k) − Wε (x0 , k)] 0 S0 (k) (2π )d ! 2 ≤ C S0 L2 U δ (x0 , k) − Wεδ (x0 , k) 2 dx0 dk → 0
as Kµ (ε, δ) → 0 with C independent of ξ . The Fourier transform of the first term in x0 is ε(ξ + y) dydkdx0 dˆεδ,1 (ξ ; p) = e−ip·x0 +ik·(ξ −y) fεδ (x0 + , k) 0 S0 (y) 2 (2π )d dydk = eik·(ξ −y)+iεp·(ξ +y)/2 fˆεδ (p, k) 0 S0 (y) (2π )d εp dk = ei(k+εp/2)·ξ fˆεδ (p, k) 0 Sˆ 0 (k − , ) 2 (2π )d
where fεδ (x, k) = Wεδ (x, k) − U δ (x, k). Therefore we have using the Cauchy-Schwarz inequality dˆεδ,1 (ξ ; p) 2 dp ≤ C S0 L2 fεδ L2 → 0 as Kµ (ε, δ) → 0 with C independent of ξ . Finally, the Fourier transform of dεδ,2 is ε(ξ + y) dydkdx0 δ,2 ˆ , k) − U δ (x0 , k)] 0 S0 (y) dε (ξ ; p) = e−ip·x0 +ik·(ξ −y) [U δ (x0 + 2 (2π )d dydk = eik·(ξ −y) eiεp·(ξ +y)/2 − 1 Uˆ δ (p, k) 0 S0 (y) (2π )d dk εp . ) − Sˆ 0 (k) = eik·ξ Uˆ δ (p, k) 0 eiεp·ξ /2 Sˆ 0 (k − 2 (2π )d We write eiεp·ξ /2 = (eiεp·ξ /2 − 1) + 1 and decompose dˆεδ,2 (ξ ; p) as I1 (ξ ; p) + I2 (ξ ; p) accordingly. We have for the second term dk 2 εp ˆ 2 δ ˆ ˆ I2 (ξ ; p) dp ≤ C U (p, k) S0 k − dp − S0 (k) 2 (2π )d εp ˆ ≤C Uˆ δ (p, k) 2 dk − S0 (l) 2 dl dp. Sˆ 0 l − 2 Note that
1 εp ˆ ∇S0 (l − εps) 2 ds dl − S0 (l) 2 dl ≤ ε2 |p|2 Sˆ 0 l − 2 0 ≤ ε2 |p|2 ∇S0 2L2
and hence
I2 (ξ ; p) 2 dp ≤ ε2 ∇S0 2L2 U δ 2H 1 → 0
as Kµ (ε, δ) → 0 according to Lemma 3.6. It remains to bound the L2 norm of I1 (p; ξ ). We derive two estimates according to whether ξ is small or large. The first estimate is εp dk )| |I1 (ξ ; p)| ≤ C eik·ξ |Uˆ δ (p, k)|ε|p · ξ ||Sˆ 0 (k − 2 (2π )d δ ˆ ≤ Cε|ξ | U (p, k) L2 (p) |p| S0 2 , k
so that
|I1 (ξ ; p)|2 dp ≤ ε2 |ξ |2 U δ (p, k) 2H 1 (Rd ;L2 (Rd )) S0 22 . x
k
At the same time using integrations by parts we get i ik·ξ ˆ dk εp iεp·ξ /2 I1 (ξ ; p) = e ξ · ∇k Uˆ δ (p, k) 0 Sˆ 0 k − (e − 1) . |ξ | 2 (2π )d
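This shows that
$$\int |I_1(\xi; p)|^2\, dp \le \frac{C}{|\xi|^2}\, \|U\|^2_{L^2_x(\mathbb{R}^d; H^1_k(\mathbb{R}^d))}\, \big\| (1 + |x|^2)^{1/2} S_0 \big\|_2^2.$$
With these estimates, we obtain that
$$\int |I_1(\xi; p)|^2\, dp \le C \min\big(h_\varepsilon^\delta |\xi|^2, |\xi|^{-2}\big)$$
with $h_\varepsilon^\delta \to 0$ as $K_\mu(\varepsilon, \delta) \to 0$. This implies that $\int |I_1(\xi; p)|^2\, dp \to 0$, hence $\int |\hat d_\varepsilon^{\delta,2}(\xi; p)|^2\, dp \to 0$ as $K_\mu(\varepsilon, \delta) \to 0$ uniformly with respect to $\xi \in \mathbb{R}^d$, and concludes the proof of Lemma 2.3.

B. Proof of Lemma 3.3

We may recast (1.5) as
$$\frac{\partial v_\varepsilon^\delta}{\partial t} + c_\delta(x) D^j \frac{\partial v_\varepsilon^\delta}{\partial x_j} + \frac{\partial c_\delta}{\partial x_j}\, e_j \otimes e_{d+1}\, v_\varepsilon^\delta = 0. \tag{B.1}$$
Thanks to calculations of the form (3.10), this is equivalent to the equation
$$\varepsilon \frac{\partial v_\varepsilon^\delta}{\partial t} + P_\varepsilon^{\delta,W}(x, \varepsilon D_x)\, v_\varepsilon^\delta = 0, \tag{B.2}$$
where the symbol $P_\varepsilon^\delta$ is given by (3.8). We recall that the pseudo-differential Weyl operator $P^W(x, \varepsilon D)$ associated to a symbol $P(x, k)$ is defined by Weyl's quantization rule
$$P^W(x, \varepsilon D_x) u = \int_{\mathbb{R}^{2d}} e^{i(x - y) \cdot k}\, P\Big(\frac{x + y}{2}, \varepsilon k\Big) u(y)\, \frac{dy\, dk}{(2\pi)^d}. \tag{B.3}$$
The fact that (B.2) is equivalent to (B.1) is verified by a straightforward calculation:
$$P_0^{\delta,W}(x, \varepsilon D) v_\varepsilon^\delta(x) = \int e^{i(x - y) \cdot \xi}\, P_0^\delta\Big(\frac{x + y}{2}, \varepsilon \xi\Big) v_\varepsilon^\delta(y)\, \frac{d\xi\, dy}{(2\pi)^d}$$
$$= i\varepsilon \int e^{i(x - y) \cdot \xi}\, c_\delta\Big(\frac{x + y}{2}\Big) D^j \xi_j\, v_\varepsilon^\delta(y)\, \frac{d\xi\, dy}{(2\pi)^d}$$
$$= \varepsilon \int c_\delta\Big(\frac{x + y}{2}\Big) D^j v_\varepsilon^\delta(y) \Big( -\frac{\partial}{\partial y_j} \Big) \delta(x - y)\, dy$$
$$= \varepsilon \frac{\partial}{\partial y_j} \Big[ c_\delta\Big(\frac{x + y}{2}\Big) D^j v_\varepsilon^\delta(y) \Big] \Big|_{y = x}$$
$$= \varepsilon c_\delta(x) D^j \frac{\partial v_\varepsilon^\delta(x)}{\partial x_j} + \frac{\varepsilon}{2} \frac{\partial c_\delta(x)}{\partial x_j} D^j v_\varepsilon^\delta(x)$$
$$= -\varepsilon \frac{\partial v_\varepsilon^\delta}{\partial t} + \frac{\varepsilon}{2} \frac{\partial c_\delta(x)}{\partial x_j} \big[ -e_j \otimes e_{d+1} + e_{d+1} \otimes e_j \big] v_\varepsilon^\delta$$
$$= -\varepsilon \frac{\partial v_\varepsilon^\delta}{\partial t} - \varepsilon P_1^\delta(x) v_\varepsilon^\delta(x),$$
and now (B.2) follows because $P_1^{\delta,W}(x) v_\varepsilon^\delta(x) = P_1^\delta(x) v_\varepsilon^\delta(x)$, since $P_1^\delta(x)$ is independent of $k$.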
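The associated Cauchy problem for the Wigner transform $\tilde W_\varepsilon^\delta$ with a fixed $\zeta$ is given by
$$\varepsilon \frac{\partial \tilde W_\varepsilon^\delta}{\partial t} + \tilde W\big[P_\varepsilon^{\delta,W}(x, \varepsilon D_x) v_\varepsilon^\delta, v_\varepsilon^\delta\big] + \tilde W\big[v_\varepsilon^\delta, P_\varepsilon^{\delta,W}(x, \varepsilon D_x) v_\varepsilon^\delta\big] = 0, \qquad \tilde W_\varepsilon^\delta(0, x, k) = \tilde W_\varepsilon^\delta(0, x, k; \zeta), \tag{B.4}$$
where the Wigner transform of two different fields is defined by
$$\tilde W[\phi_\varepsilon, \psi_\varepsilon](x, k) = \int_{\mathbb{R}^d} e^{ik \cdot y}\, \phi_\varepsilon\Big(x - \frac{\varepsilon y}{2}\Big) \psi_\varepsilon^*\Big(x + \frac{\varepsilon y}{2}\Big) \frac{dy}{(2\pi)^d}. \tag{B.5}$$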
We deduce from the definitions of W˜ ε and PεW that W˜ [Pεδ,W (x, εDx )vεδ , vεδ ](x, k) εy δ∗ εy dy vε x + = eik·y (Pεδ,W (x, εDx )vεδ ) x − 2 2 (2π )d
εy x − εy εy dydzdq 2 +z , εq vεδ (z)vεδ∗ x + = eik·y ei(x− 2 −z)·q Pεδ 2 2 (2π )2d
εy p εy x − εy 2 +z = eik·y ei(x− 2 −z)·q Pεδ , εq e−i ε ·(x+ 2 −z) 2
εy x+ 2 +z dpdydzdq × Wεδ ,p 2 (2π)2d εy dzdpdydq = eiy·(k−p) ei2(x−z)·(q−p/ε) Pεδ (z − , εq)Wεδ (z, p) √ 2 (π 2)2d 2i dzdpdydq = Pεδ (y, q)Wεδ (z, p)e ε ((p−k)·y+(q−p)·x+(k−q)·z) (π ε)2d dzdpdydq = Pεδ (y, q)Wεδ (z, p)eiφ . (B.6) (π ε)2d Moreover, the matrix W˜ εδ is self-adjoint, while W [fε , gε ] = Wε∗ [g, f ] for any pair of functions f and g, and the symbol Pε is skew-symmetric. Thus (B.6) and (B.4) imply that the pure Wigner transform W˜ ε satisfies (3.11), and hence so does Wε . Moreover, the function φ satisfies an anti-symmetry relation φ(x, z, k, p; y, q) = −φ(z, x, p, k; y, q). Then, using the fact that Wε is self-adjoint we obtain Tr((Lδε Wεδ )Wεδ∗ )dxdk R2d Tr(Pεδ (y, q)Wεδ (z, p)Wεδ (x, k)eiφ = R6d
dxdkdydzdpdq −Wε (z, p)Pεδ (y, q)Wεδ (x, k)e−iφ ) (π ε)2d = Tr(Pεδ (y, q)Wεδ (z, p)Wεδ (x, k)eiφ − Pεδ (y, q)Wεδ (z, p)Wεδ (x, k)eiφ ) = 0, R6d
where we interchanged $x \leftrightarrow z$ and $k \leftrightarrow p$ in the second term on the last line, and used the anti-symmetry of $\phi$. This implies conservation of the $L^2$-norm (3.14). Note that (3.13) follows immediately from (3.11) and the proof of Lemma 3.3 is complete.

C. Regularity of the Liouville Equations

We prove Lemma 3.6 in this Appendix. We recall that the functions $u_q^\delta$ satisfy the evolution equations
$$\frac{\partial u_q^\delta}{\partial t} + \{\lambda_q^\delta, u_q^\delta\} = 0, \qquad u_q^0 = u_q(t = 0) = \operatorname{Tr}[\Pi_q W_0 \Pi_q]. \tag{C.1}$$
These equations can be solved by following the Hamiltonian flow generated by $\lambda_q^\delta$. More precisely, let us define for $T$, $x$, $k$ given, the trajectories
$$\frac{dX(t)}{dt} = -\frac{\partial \lambda_q^\delta}{\partial k}(X(t), K(t)), \quad X(0) = x, \qquad \frac{dK(t)}{dt} = \frac{\partial \lambda_q^\delta}{\partial x}(X(t), K(t)), \quad K(0) = k. \tag{C.2}$$
Then the solution of (C.1) is given by
$$u_q^\delta(T, x, k) = u_q^0(X(T, x, k), K(T, x, k)). \tag{C.3}$$
The flow (4.5) preserves the Hamiltonian $\lambda_q^\delta(x, k)$ and the initial data $u_q^0$ is supported on a compact set $S$. Therefore the set
$$\mathcal{S} = \bigcup_{t \ge 0,\ \delta \in (0, 1]} \operatorname{supp} u_q^\delta(t, x, k)$$
is compact because the speed $c_\delta(x)$ is uniformly bounded from above and below. Furthermore
$$\nabla u_q^\delta = D^{\delta *} \nabla u_q^0, \qquad \|u_q(t)\|_{\dot H^1} \le \|u_q^0\|_{\dot H^1} \|D^\delta(t)\|_\infty,$$
where $D^\delta(t, x, k)$ is the Jacobian matrix, $D_j^{\delta,i} = \partial Z_i^\delta / \partial z_j$, with $\det D^\delta(t) \equiv 1$, $Z = (X, K)$, and $z = (x, k)$. To simplify notation, we do not write explicitly the dependence of $D^\delta$ and its derivatives on the eigenvalue label $q$ in the sequel. Here we define
$$\|D^\delta\|_\infty = \sup_{(x,k) \in \mathcal{S}} \big( \operatorname{Tr}[D^\delta(x, k) D^{\delta *}(x, k)] \big)^{1/2}.$$
More generally, given a tensor $T_{j_1 j_2 \ldots j_m}$ we denote
$$\|T\|_\infty = \sup_{(x,k) \in \mathcal{S}} \Big( \sum_{j_1, \ldots, j_m} |T_{j_1 \ldots j_m}|^2 \Big)^{1/2}.$$
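The representation (C.2)-(C.3) is the method of characteristics, and it can be sketched numerically in one space dimension. Everything specific below is an assumption made for illustration and is not taken from the paper: the Hamiltonian $\lambda(x, k) = c(x)|k|$, the speed profile `c`, the Gaussian initial datum `u0`, and the use of a classical Runge-Kutta step. The sketch checks the two structural facts used in the text, namely that the flow preserves the Hamiltonian and that its Jacobian has determinant one.

```python
import math


def c(x):
    """Hypothetical sound speed, bounded between 0.5 and 1.5."""
    return 1.0 + 0.5 * math.sin(x)


def dc(x):
    """Derivative of the hypothetical speed profile."""
    return 0.5 * math.cos(x)


def hamiltonian(x, k):
    """Model Hamiltonian lambda(x, k) = c(x) |k| (an assumption)."""
    return c(x) * abs(k)


def rhs(x, k):
    """Right-hand side of (C.2): dX/dt = -d(lambda)/dk, dK/dt = d(lambda)/dx."""
    sign = 1.0 if k >= 0 else -1.0
    return -c(x) * sign, dc(x) * abs(k)


def flow(x, k, T, steps=1000):
    """Integrate (C.2) from (x, k) up to time T with classical RK4."""
    h = T / steps
    for _ in range(steps):
        a1, b1 = rhs(x, k)
        a2, b2 = rhs(x + 0.5 * h * a1, k + 0.5 * h * b1)
        a3, b3 = rhs(x + 0.5 * h * a2, k + 0.5 * h * b2)
        a4, b4 = rhs(x + h * a3, k + h * b3)
        x += h * (a1 + 2 * a2 + 2 * a3 + a4) / 6
        k += h * (b1 + 2 * b2 + 2 * b3 + b4) / 6
    return x, k


def u0(x, k):
    """Hypothetical smooth initial datum."""
    return math.exp(-(x ** 2) - (k - 1.0) ** 2)


def u(T, x, k):
    """Formula (C.3): evaluate the initial datum at the end of the trajectory."""
    X, K = flow(x, k, T)
    return u0(X, K)


def jacobian_det(x, k, T, eps=1e-5):
    """Finite-difference determinant of D = d(X, K)/d(x, k)."""
    Xp, Kp = flow(x + eps, k, T)
    Xm, Km = flow(x - eps, k, T)
    Xq, Kq = flow(x, k + eps, T)
    Xr, Kr = flow(x, k - eps, T)
    dXdx, dKdx = (Xp - Xm) / (2 * eps), (Kp - Km) / (2 * eps)
    dXdk, dKdk = (Xq - Xr) / (2 * eps), (Kq - Kr) / (2 * eps)
    return dXdx * dKdk - dXdk * dKdx
```

Because the solution is just the initial datum transported along trajectories, every derivative bound on $u_q^\delta$ reduces to a bound on derivatives of the flow map, which is how the estimates that follow are organized.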
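We will also use the matrix norm $|A|$ that is dual to the Euclidean norm on $\mathbb{R}^d$ and is equal to the square root of the largest eigenvalue of the matrix $AA^*$, and denote
$$|A|_\infty = \sup_{(x,k) \in \mathcal{S}} |A(x, k)|.$$
Furthermore, we have
$$\frac{\partial^2 u_q^\delta}{\partial z_j \partial z_p} = \frac{\partial^2 Z_m^\delta}{\partial z_j \partial z_p} \frac{\partial u_q^0}{\partial z_m} + \frac{\partial Z_m^\delta}{\partial z_j} \frac{\partial Z_r^\delta}{\partial z_p} \frac{\partial^2 u_q^0}{\partial z_m \partial z_r},$$
so that
$$\sum_{j,p} \Big| \frac{\partial^2 u_q^\delta}{\partial z_j \partial z_p} \Big|^2 \le 2\, \frac{\partial^2 Z_m^\delta}{\partial z_j \partial z_p} \frac{\partial u_q^0}{\partial z_m} \frac{\partial^2 Z_l^\delta}{\partial z_j \partial z_p} \frac{\partial u_q^0}{\partial z_l} + 2\, \frac{\partial Z_m^\delta}{\partial z_j} \frac{\partial Z_r^\delta}{\partial z_p} \frac{\partial^2 u_q^0}{\partial z_m \partial z_r} \frac{\partial Z_l^\delta}{\partial z_j} \frac{\partial Z_s^\delta}{\partial z_p} \frac{\partial^2 u_q^0}{\partial z_l \partial z_s}$$
$$\le 2 \sum_{m,j,p} \Big| \frac{\partial^2 Z_m^\delta}{\partial z_j \partial z_p} \Big|^2 |\nabla u_q^0|^2 + 2 \Big( \sum_{m,j} |D_{mj}^\delta|^2 \Big)^2 \sum_{l,s} \Big| \frac{\partial^2 u_q^0}{\partial z_l \partial z_s} \Big|^2,$$
and hence
$$\|u_q^\delta(t)\|_{\dot H^2} \le 2 \|u_q^0\|_{\dot H^1} \|D_2^\delta(t)\|_\infty + 2 \|u_q^0\|_{\dot H^2} \|D^\delta(t)\|_\infty^2,$$
with $D_{2,jl}^{\delta,m} = \partial^2 Z_m^\delta / \partial z_j \partial z_l$. We observe that
$$\frac{\partial^3 u_q^\delta}{\partial z_j \partial z_p \partial z_s} = \frac{\partial^3 Z_m^\delta}{\partial z_j \partial z_p \partial z_s} \frac{\partial u_q^0}{\partial z_m} + \frac{\partial^2 Z_m^\delta}{\partial z_j \partial z_p} \frac{\partial Z_l^\delta}{\partial z_s} \frac{\partial^2 u_q^0}{\partial z_m \partial z_l} + \frac{\partial^2 Z_m^\delta}{\partial z_j \partial z_s} \frac{\partial Z_r^\delta}{\partial z_p} \frac{\partial^2 u_q^0}{\partial z_m \partial z_r}$$
$$\qquad + \frac{\partial Z_m^\delta}{\partial z_j} \frac{\partial^2 Z_r^\delta}{\partial z_p \partial z_s} \frac{\partial^2 u_q^0}{\partial z_m \partial z_r} + \frac{\partial Z_m^\delta}{\partial z_j} \frac{\partial Z_r^\delta}{\partial z_p} \frac{\partial Z_l^\delta}{\partial z_s} \frac{\partial^3 u_q^0}{\partial z_m \partial z_r \partial z_l}.$$
Therefore we have
$$\sum_{j,p,s} \Big| \frac{\partial^3 u_q^\delta}{\partial z_j \partial z_p \partial z_s} \Big|^2 \le 5 \sum_{m,j,p,s} \Big| \frac{\partial^3 Z_m}{\partial z_j \partial z_p \partial z_s} \Big|^2 |\nabla u_q^0|^2 + 15\, \|D(t)\|_\infty^2 \|D_2(t)\|_\infty^2 \sum_{m,n} \Big| \frac{\partial^2 u_q^0}{\partial z_m \partial z_n} \Big|^2 + 5\, \|D(t)\|_\infty^6 \sum_{m,r,l} \Big| \frac{\partial^3 u_q^0}{\partial z_m \partial z_r \partial z_l} \Big|^2,$$
so that
$$\|u_q^\delta(t)\|_{\dot H^3} \le 3 \|u_q^0\|_{\dot H^1} \|D_3^\delta(t)\|_\infty + 3 \|u_q^0\|_{\dot H^2} \|D_2^\delta(t)\|_{L^\infty} \|D^\delta(t)\|_\infty + \|u_q^0\|_{\dot H^3} \|D^\delta(t)\|_\infty^3,$$
with $D_{3,jlp}^{\delta,m} = \partial^3 Z_m^\delta / \partial z_j \partial z_l \partial z_p$.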
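It thus remains to estimate the matrices $D^\delta$, $D_2^\delta$, and $D_3^\delta$. The matrix $D^\delta$ satisfies the differential equation
$$\frac{dD^\delta}{dt} = F^\delta D^\delta, \qquad F^\delta = \begin{pmatrix} -\dfrac{\partial^2 \lambda_q^\delta}{\partial k_i \partial x_j} & -\dfrac{\partial^2 \lambda_q^\delta}{\partial k_i \partial k_j} \\[2ex] \dfrac{\partial^2 \lambda_q^\delta}{\partial x_i \partial x_j} & \dfrac{\partial^2 \lambda_q^\delta}{\partial x_i \partial k_j} \end{pmatrix}, \qquad D^\delta(0) = I.$$
Therefore we have
$$\frac{d}{dt} \operatorname{Tr}[D^\delta D^{\delta *}] = 2 \operatorname{Tr}[F^\delta D^\delta D^{\delta *}] = 2 F_{il}^\delta D_m^{\delta,l} D_i^{\delta,m} \le 2 F_{il}^\delta \Big( \sum_k |D_k^{\delta,l}|^2 \Big)^{1/2} \Big( \sum_p |D_i^{\delta,p}|^2 \Big)^{1/2} \le 2 |F| \operatorname{Tr}[D^\delta D^{\delta *}],$$
so that $\|D^\delta(t)\|_\infty \le \exp\big(|F^\delta|_\infty t\big)$ and hence
$$\|u_q^\delta(t)\|_{\dot H^1} \le \|u_q^0\|_{\dot H^1} \exp\big(|F^\delta|_\infty t\big).$$
Differentiating (C.2) once again we obtain
$$\frac{dD_{2,jk}^{\delta,i}}{dt} = \frac{\partial F_{il}^\delta}{\partial z_m} D_k^{\delta,m} D_j^{\delta,l} + F_{il}^\delta D_{2,jk}^{\delta,l},$$
so that along each characteristic
$$\frac{1}{2} \frac{d}{dt} \|D_2^\delta\|^2 = \frac{\partial F_{il}^\delta}{\partial z_m} D_k^{\delta,m} D_j^{\delta,l} D_{2,jk}^{\delta,i} + F_{il}^\delta D_{2,jk}^{\delta,l} D_{2,jk}^{\delta,i} \le \|F_2^\delta\|_\infty \|D^\delta\|^2 \|D_2^\delta\| + |F^\delta|_\infty \|D_2^\delta\|^2,$$
where $F_{2,ijk}^\delta = \partial F_{ij}^\delta / \partial z_k$. Furthermore, initially at $t = 0$ we have $D_2^\delta(0) = 0$. Therefore we obtain
$$\|D_2^\delta(t)\|_{L^\infty} \le \frac{\|F_2^\delta\|_\infty}{|F^\delta|_\infty} \exp\big(2 |F^\delta|_\infty t\big),$$
and thus
$$\|u_q^\delta(t)\|_{\dot H^2} \le 2 \Big[ \frac{\|F_2^\delta\|_\infty}{|F^\delta|_\infty} \|u_q^0\|_{\dot H^1} + \|u_q^0\|_{\dot H^2} \Big] \exp\big(2 |F^\delta|_\infty t\big) \le 2 \Big[ \frac{\|F_2^\delta\|_\infty}{|F^\delta|_\infty} + 1 \Big] \|u_q^0\|_{H^2} \exp\big(2 |F^\delta|_\infty t\big).$$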
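The first of these Gronwall-type bounds can be checked on a toy problem. The sketch below assumes a constant $2 \times 2$ matrix `F` (hypothetical numbers, chosen trace-free as for a Hamiltonian flow in one space dimension) and integrates $dD/dt = FD$, $D(0) = I$ with a Runge-Kutta step. It then compares the Frobenius norm of $D(t)$ with $\|D(0)\| \exp(|F| t)$, where $|F|$ is the operator norm defined above. The initial norm $\|D(0)\| = \sqrt{2}$ is kept explicitly here, which is a choice made for this example.

```python
import math


def matmul(A, B):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def add_scaled(A, B, s):
    """Return A + s * B for 2x2 matrices."""
    return [[A[i][j] + s * B[i][j] for j in range(2)] for i in range(2)]


def frobenius(A):
    """(Tr[A A^T])^(1/2), the norm used for D in the text."""
    return math.sqrt(sum(a * a for row in A for a in row))


def operator_norm(A):
    """Square root of the largest eigenvalue of A A^T (2x2 case)."""
    At = [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]
    G = matmul(A, At)
    tr = G[0][0] + G[1][1]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return math.sqrt(0.5 * (tr + math.sqrt(max(tr * tr - 4 * det, 0.0))))


def solve_jacobian(F, T, steps=2000):
    """Integrate dD/dt = F D with D(0) = I by classical RK4."""
    D = [[1.0, 0.0], [0.0, 1.0]]
    h = T / steps
    for _ in range(steps):
        k1 = matmul(F, D)
        k2 = matmul(F, add_scaled(D, k1, 0.5 * h))
        k3 = matmul(F, add_scaled(D, k2, 0.5 * h))
        k4 = matmul(F, add_scaled(D, k3, h))
        D = [[D[i][j] + h * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j]) / 6
              for j in range(2)] for i in range(2)]
    return D


def gronwall_bound(F, T):
    """Upper bound ||D(0)|| exp(|F| T) with ||D(0)|| = sqrt(2)."""
    return math.sqrt(2.0) * math.exp(operator_norm(F) * T)


# Hypothetical trace-free matrix with the block structure of F in 1D.
F_EXAMPLE = [[-0.3, -1.0], [0.5, 0.3]]
```

The higher-order tensors $D_2^\delta$ and $D_3^\delta$ obey linear equations with the same matrix $F^\delta$ plus forcing terms built from lower-order quantities, which is why the exponents $2|F^\delta|_\infty t$ and $3|F^\delta|_\infty t$ appear in the corresponding bounds.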
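Similarly, the tensor $D_3^\delta$ satisfies the ordinary differential equation
$$\frac{dD_{3,jkm}^{\delta,i}}{dt} = F_{3,ilnp}^\delta D_m^{\delta,p} D_k^{\delta,n} D_j^{\delta,l} + F_{2,iln}^\delta D_{2,km}^{\delta,n} D_j^{\delta,l} + F_{2,iln}^\delta D_k^{\delta,n} D_{2,jm}^{\delta,l} + F_{2,iln}^\delta D_m^{\delta,n} D_{2,jk}^{\delta,l} + F_{il}^\delta D_{3,jkm}^{\delta,l},$$
so that along each characteristic
$$\frac{1}{2} \frac{d}{dt} \|D_3^\delta\|^2 = F_{3,ilnp}^\delta D_m^{\delta,p} D_k^{\delta,n} D_j^{\delta,l} D_{3,jkm}^{\delta,i} + F_{2,iln}^\delta D_{2,km}^{\delta,n} D_j^{\delta,l} D_{3,jkm}^{\delta,i} + F_{2,iln}^\delta D_k^{\delta,n} D_{2,jm}^{\delta,l} D_{3,jkm}^{\delta,i}$$
$$\qquad + F_{2,iln}^\delta D_m^{\delta,n} D_{2,jk}^{\delta,l} D_{3,jkm}^{\delta,i} + F_{il}^\delta D_{3,jkm}^{\delta,l} D_{3,jkm}^{\delta,i}$$
$$\le \|F_3^\delta\| \|D^\delta\|^3 \|D_3^\delta\| + 3 \|F_2^\delta\| \|D^\delta\| \|D_2^\delta\| \|D_3^\delta\| + |F^\delta| \|D_3^\delta\|^2,$$
where $F_{3,ijkn}^\delta = \partial F_{2,ijk}^\delta / \partial z_n$, and at $t = 0$ we have $D_3^\delta(0) = 0$. Therefore we obtain
$$\|D_3^\delta(t)\|_\infty \le \Big[ \frac{\|F_3^\delta\|_\infty}{|F^\delta|_\infty} + \frac{3 \|F_2^\delta\|_\infty^2}{|F^\delta|_\infty^2} \Big] \exp\big(3 |F^\delta|_\infty t\big),$$
and thus
$$\|u_q^\delta(t)\|_{\dot H^3} \le \Big[ \Big( \frac{\|F_3^\delta\|_\infty}{|F^\delta|_\infty} + \frac{3 \|F_2^\delta\|_\infty^2}{|F^\delta|_\infty^2} \Big) \|u_q^0\|_{\dot H^1} + 3 \frac{\|F_2^\delta\|_\infty}{|F^\delta|_\infty} \|u_q^0\|_{\dot H^2} + \|u_q^0\|_{\dot H^3} \Big] \exp\big(3 |F^\delta|_\infty t\big)$$
$$\le 6 \Big[ \frac{\|F_3^\delta\|_\infty}{|F^\delta|_\infty} + \frac{\|F_2^\delta\|_\infty^2}{|F^\delta|_\infty^2} + 1 \Big] \|u_q^0\|_{H^3} \exp\big(3 |F^\delta|_\infty t\big).$$
This completes the proof of Lemma 3.6 because $\bar\gamma_\delta = |F^\delta|_\infty$.

Acknowledgement. The authors thank the organizers of the Mathematical Geophysics Summer School at Stanford, where part of this work was completed, for their hospitality. This work was supported in part by ONR grant N00014-02-1-0089. GB was supported in part by NSF Grants DMS-0072008 and DMS-0233549, TK was partially supported by grant Nr. 2 PO3A 031 23 from the State Committee for Scientific Research of Poland, and LR in part by NSF Grants DMS-9971742 and DMS-0203537, and by an Alfred P. Sloan Fellowship.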
References

1. Bal, G., Papanicolaou, G., Ryzhik, L.: Radiative transport limit for the random Schroedinger equation. Nonlinearity 15, 513–529 (2002)
2. Bal, G., Papanicolaou, G., Ryzhik, L.: Self-averaging in time reversal for the parabolic wave equation. Stochastics and Dynamics 2, 507–531 (2002)
3. Bal, G., Ryzhik, L.: Time reversal for classical waves in random media. Comptes Rendus de l'Acad. Sci., Série I/Math 333, 1041–1046 (2001)
4. Bal, G., Ryzhik, L.: Time reversal and refocusing in random media. SIAM J. Appl. Math. 63, 1475–1498 (2003)
5. Bambusi, D., Graffi, S., Paul, T.: Long time semiclassical approximation of quantum flows: A proof of the Ehrenfest time. Asymptot. Anal. 21, 149–160 (1999)
6. Bardos, C., Fink, M.: Mathematical foundations of the time reversal mirror. Asympt. Anal. 29, 157–182 (2002)
7. Billingsley, P.: Convergence of Probability Measures. New York: Wiley, 1968
8. Blomgren, P., Papanicolaou, G., Zhao, H.: Super-Resolution in Time-Reversal Acoustics. J. Acoust. Soc. Am. 111, 230–248 (2002)
9. Bouzouina, A., Robert, D.: Uniform semiclassical estimates for the propagation of quantum observables. Duke Math. J. 111, 223–252 (2002)
10. Erdős, L., Yau, H.T.: Linear Boltzmann equation as the weak coupling limit of a random Schrödinger Equation. Commun. Pure Appl. Math. 53, 667–735 (2000)
11. Clouet, J.F., Fouque, J.P.: A time-reversal method for an acoustical pulse propagating in randomly layered media. Wave Motion 25, 361–368 (1997)
12. Fink, M., Prada, C.: Acoustic time-reversal mirrors. Inverse Problems 17, R1–R38 (2001)
13. Fouque, J.P., Sølna, K.: SIAM J. for Multiscale Modeling and Simulation 1, 239–259 (2003)
14. Gérard, P., Markowich, P.A., Mauser, N.J., Poupaud, F.: Homogenization limits and Wigner transforms. Comm. Pure Appl. Math. 50, 323–380 (1997)
15. Kesten, H., Papanicolaou, G.: A limit theorem for stochastic acceleration. Commun. Math. Phys. 78, 19–63 (1980)
16. Kuperman, W., Hodgkiss, W., Song, H., Akal, T., Ferla, C., Jackson, D.: Phase-conjugation in the ocean. J. Acoust. Soc. Am. 102, 1–16 (1997)
17. Lions, P.-L., Paul, T.: Sur les mesures de Wigner. Rev. Mat. Iberoamericana 9, 553–618 (1993)
18. Poupaud, F., Vasseur, A.: Classical and quantum transport in random media. J. Math. Pure et Appl. 82, 711–748 (2003)
19. Papanicolaou, G., Ryzhik, L., Sølna, K.: Statistical stability in time reversal. To appear in SIAM J. Appl. Math., 2003
20. Ryzhik, L., Papanicolaou, G., Keller, J.B.: Transport equations for elastic and other waves in random media. Wave Motion 24, 327–370 (1996)
21. Sølna, K.: Focusing of time-reversed reflections. Waves in Random Media 12, 365–385 (2002)
22. Spohn, H.: Derivation of the transport equation for electrons moving through random impurities. J. Stat. Phys. 17, 385–412 (1977)
23. Stroock, D., Varadhan, S.R.S.: Multidimensional Diffusion Processes. Berlin-Heidelberg-New York: Springer-Verlag, 1979

Communicated by P. Constantin
Commun. Math. Phys. 242, 137–183 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0946-x
Communications in Mathematical Physics

Critical Region for Droplet Formation in the Two-Dimensional Ising Model

Marek Biskup¹, Lincoln Chayes¹, Roman Kotecký²

¹ Department of Mathematics, UCLA, Los Angeles, CA 90095-1555, USA
² Center for Theoretical Study, Charles University, Prague, Czech Republic

Received: 13 December 2002 / Accepted: 28 April 2003. Published online: 7 October 2003 – © M. Biskup, L. Chayes and R. Kotecký 2003
Abstract: We study the formation/dissolution of equilibrium droplets in finite systems at parameters corresponding to phase coexistence. Specifically, we consider the 2D Ising model in volumes of size $L^2$, inverse temperature $\beta > \beta_c$ and overall magnetization conditioned to take the value $m^\star L^2 - 2 m^\star v_L$, where $\beta_c^{-1}$ is the critical temperature, $m^\star = m^\star(\beta)$ is the spontaneous magnetization and $v_L$ is a sequence of positive numbers. We find that the critical scaling for droplet formation/dissolution is when $v_L^{3/2} L^{-2}$ tends to a definite limit. Specifically, we identify a dimensionless parameter $\Delta$, proportional to this limit, a non-trivial critical value $\Delta_c$ and a function $\lambda_\Delta$ such that the following holds: For $\Delta < \Delta_c$, there are no droplets beyond $\log L$ scale, while for $\Delta > \Delta_c$, there is a single, Wulff-shaped droplet containing a fraction $\lambda_\Delta \ge \lambda_c = 2/3$ of the magnetization deficit and there are no other droplets beyond the scale of $\log L$. Moreover, $\lambda_\Delta$ and $\Delta$ are related via a universal equation that apparently is independent of the details of the system.

Contents

1. Introduction
   1.1 Motivation
   1.2 The model
   1.3 Main results
   1.4 Discussion and outline
2. Technical Ingredients
   2.1 Variational problem
   2.2 Skeleton estimates
       2.2.1 Definition and geometric properties
       2.2.2 Probabilistic estimates
       2.2.3 Quantitative estimates around Wulff minimum
   2.3 Small-contour ensemble
       2.3.1 Estimates using the GHS inequality
       2.3.2 Gaussian control of negative deviations
3. Lower Bound
   3.1 Large-deviation lower bound
   3.2 Results using random-cluster representation
       3.2.1 Preliminaries
       3.2.2 Decay estimates
       3.2.3 Corona estimates
4. Absence of Intermediate Contour Sizes
   4.1 Statement and outline
   4.2 Contour length and volume
       4.2.1 Total contour length
       4.2.2 Interiors and exteriors
       4.2.3 Volume of large contours
   4.3 Magnetization deficit due to large contours
       4.3.1 Magnetization inside
       4.3.2 Magnetization outside
   4.4 Proof of Theorem 4.1
       4.4.1 A lemma for the restricted ensemble
       4.4.2 Absence of intermediate contours
5. Proof of Main Results

© Copyright rests with the authors. Reproduction, by any means, of the entire article for non-commercial purposes is permitted without charge.
1. Introduction

1.1. Motivation. The connection between microscopic interactions and pure-phase (bulk) thermodynamics has been understood at a mathematically sophisticated level for many years. However, an analysis of systems at phase coexistence which contain droplets has begun only recently. Over a century ago, Curie [25], Gibbs [33] and Wulff [55] derived from surface-thermodynamical considerations that a single droplet of a particular shape (the Wulff shape) will appear in systems that are forced to exhibit a fixed excess of a minority phase. A mathematical proof of this fact starting from a system defined on the microscopic scale has been given in the context of percolation and Ising systems, first in dimension $d = 2$ [4, 27] and, more recently, in all dimensions $d \ge 3$ [13, 21, 22]. Other topics related to the droplet shape have been studied intensively: Fluctuations of a contour line [3, 18–20, 26, 37], wetting phenomena [50] and Gaussian fields near a "wall" [5, 15, 29]. See [14] for a summary of these results and comments on the (recent) history of these developments.

The initial stages of the rigorous "Wulff construction" program have focused on systems in which the droplet subsumes a finite fraction of the available volume. Of no less interest is the situation when the excess represents only a vanishing fraction of the total volume. In [28], substantial progress has been made on these questions in the context of the Ising model at low temperatures. Subsequent developments [38, 39, 48, 49] have allowed the extension, in $d = 2$, of the aforementioned results up to the critical point [40]. Specifically, what has so far been shown is as follows: For two-dimensional volumes $\Lambda_L$ of side $L$ and $\delta > 0$ arbitrarily small, if the magnetization deficit exceeds $L^{4/3+\delta}$, then a Wulff droplet accounts, pretty much, for all the deficit, while if the magnetization deficit is bounded by $L^{4/3-\delta}$, there are no droplets beyond the scale of $\log L$. The preceding are of course asymptotic statements that hold with probability tending to one as $L \to \infty$.

The focus of this paper is the intermediate regime, which has not yet received appropriate attention. Assuming the magnetization deficit divided by $L^{4/3}$ tends to a definite limit, we define a dimensionless parameter, denoted by $\Delta$, which is proportional to this limit. (A precise definition of $\Delta$ is provided in (1.10).) Our principal result is as follows: There is a critical value $\Delta_c$ such that for $\Delta < \Delta_c$, there are no large droplets (again, nothing beyond $\log L$ scale), while for $\Delta > \Delta_c$, there is a single, large droplet of a diameter of the order $L^{2/3}$. However, in contrast to all situations that have previously been analyzed, this large droplet only accounts for a finite fraction, $\lambda_\Delta < 1$, of the magnetization deficit, which, in addition, does not tend to zero as $\Delta \downarrow \Delta_c$! (Indeed, $\lambda_\Delta \downarrow \lambda_c$, with $\lambda_c = 2/3$.) Whenever the droplet appears, its interior is representative of the minus phase, its shape is close to the optimal (Wulff) shape and its volume is tuned to contain the $\lambda_\Delta$-fraction of the deficit magnetization. Furthermore, for all values of $\Delta$, there is at most one droplet of size $L^{2/3}$ and nothing else beyond the scale $\log L$. At $\Delta = \Delta_c$ the situation is not completely resolved. However, there are only two possibilities: Either there is one droplet of linear size $L^{2/3}$ or no droplet at all.

The above transition is the result of a competition between two mechanisms for coping with a magnetization deficit in the system: Absorption of the deficit by the ambient fluctuations or the formation of a droplet. The results obtained in [27, 28] and [40] deal with the situations when one of the two mechanisms completely dominates the other.
As is seen by a simple-minded comparison of the exponential costs of the two mechanisms, $L^{4/3}$ is the only conceivable scaling of the magnetization deficit where these are able to coexist. (This is the core of the heuristic approach outlined in [9, 46] and [7], see also [8, 11].) However, at the point where the droplets first appear, one can envision alternate scenarios involving complicated fluctuations and/or a multitude of droplets with effective interactions ranging across many scales. To rule out such possibilities it is necessary to demonstrate the absence of these "intermediate-sized" droplets and the insignificance (or absence) of large fluctuations. This was argued on a heuristic level in [10] and will be proven rigorously here. Thus, instead of blending into each other through a series of intermediate scales, the droplet-dominated and the fluctuation-dominated regimes meet, literally, at a single point. Furthermore, all essential system dependence is encoded into one dimensionless parameter $\Delta$ and the transition between the Gaussian-dominated and the droplet-dominated regimes is thus characterized by a universal constant $\Delta_c$. In addition, the relative fraction $\lambda_\Delta$ of the deficit "stored" in the droplet depends on $\Delta$ via a universal equation which is apparently independent of the details of the system [10].

At this point we would like to stress that, even though the rigorous results presented here are restricted to the case of the two-dimensional Ising model, we expect that their validity can be extended to a much larger class of models and the universality of the dependence on $\Delta$ will become the subject of a mathematical statement. Notwithstanding the rigorous analysis, this universal setting offers the possibility of fitting experimental/numerical data from a variety of systems onto a single curve. A practical understanding of how droplets disappear is by no means an esoteric issue.
Aside from the traditional, i.e., three-dimensional, setting, there are experimental realizations which are effectively two-dimensional (see [42] and references therein). Moreover, there are purported applications of Ising systems undergoing “fragmentation” in such diverse areas as nuclear physics and adatom formation [36]. From the perspective of statistical physics, perhaps more important are the investigations of small systems at
parameter values corresponding to a first order transition in the bulk. In these situations, non-convexities appear in finite-volume thermodynamic functions [36, 43, 44, 51], which naturally suggest the appearance of a droplet. Several papers have studied the disappearance of droplets and reported intriguing finite-size characteristics [7, 9, 42, 45, 46, 51, 52]. It is hoped that the results established here will shed some light on these situations.

1.2. The model. The primary goal of this paper is a detailed description of the above droplet-formation phenomenon in the Ising model. In general dimension, this system is defined by the formal Hamiltonian
$$H = -\sum_{\langle x, y \rangle} \sigma_x \sigma_y, \tag{1.1}$$
where $\langle x, y \rangle$ denotes a nearest-neighbor pair on $\mathbb{Z}^d$ and where $\sigma_x \in \{-1, +1\}$ denotes an Ising spin. To define the Hamiltonian in a finite volume $\Lambda \subset \mathbb{Z}^d$, we use $\partial\Lambda$ to denote the external boundary of $\Lambda$, $\partial\Lambda = \{x \notin \Lambda : \text{there exists a bond } \langle x, y \rangle \text{ with } y \in \Lambda\}$, fix a collection of boundary spins $\sigma_{\partial\Lambda} = (\sigma_x)_{x \in \partial\Lambda}$ and restrict the sum in (1.1) to bonds $\langle x, y \rangle$ such that $\{x, y\} \cap \Lambda \ne \emptyset$. We denote this finite-volume Hamiltonian by $H_\Lambda(\sigma_\Lambda, \sigma_{\partial\Lambda})$. The special choices of the boundary configurations such that $\sigma_x = +1$, resp., $\sigma_x = -1$ for all $x \in \partial\Lambda$ will be referred to as plus, resp., minus boundary conditions.

The Hamiltonian gives rise to the concept of a finite-volume Gibbs measure (also known as Gibbs state) which is a measure assigning each configuration $\sigma_\Lambda = (\sigma_x)_{x \in \Lambda} \in \{-1, +1\}^\Lambda$ the probability
$$P_\Lambda^{\sigma_{\partial\Lambda}, \beta}(\sigma_\Lambda) = \frac{e^{-\beta H_\Lambda(\sigma_\Lambda, \sigma_{\partial\Lambda})}}{Z_\Lambda^{\sigma_{\partial\Lambda}}(\beta)}. \tag{1.2}$$
Here $\beta \ge 0$ denotes the inverse temperature, $\sigma_{\partial\Lambda}$ is an arbitrary boundary configuration and $Z_\Lambda^{\sigma_{\partial\Lambda}}(\beta)$ is the partition function. Most of this work will concentrate on squares of $L \times L$ sites, which we will denote by $\Lambda_L$, and the plus boundary conditions. In this case we denote the above probability by $P_L^{+,\beta}(-)$ and the associated expectation by $\langle - \rangle_L^{+,\beta}$. As the choice of the signs in (1.1–1.2) indicates, the measure $P_L^{+,\beta}$ with $\beta > 0$ tends to favor alignment of neighboring spins with an excess of plus spins over minus spins.

Remark 1. As is well known, the Ising model is equivalent to a model of a lattice gas where at most one particle is allowed to occupy each site. In our case, the sites occupied by a particle are represented by minus spins, while the plus spins correspond to the sites with no particles. In the particle distribution induced by $P_L^{+,\beta}$, the total number of particles is not fixed; hence, we will occasionally refer to this measure as the "grand canonical" ensemble. On the other hand, if the number of minus spins is fixed (by conditioning on the total magnetization, see Sect. 1.3), the resulting measure will sometimes be referred to as the "canonical" ensemble.

The Ising model has been studied very extensively by mathematical physicists in the last 20–30 years and a lot of interesting facts have been rigorously established. We proceed by listing the properties of the two-dimensional model which will ultimately be needed in this paper. For general overviews of various aspects mentioned below we refer to, e.g., [14, 31, 32, 54]. The readers familiar with the background (and the standard notation) should feel free to skip the remainder of this section and go directly to Sect. 1.3 where we discuss the main results of the present paper.
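For very small boxes the definitions (1.1)-(1.2) can be evaluated by brute force, which makes the role of the boundary bonds concrete. The sketch below is only an illustration: the function names and the row-major layout of the spin tuple are assumptions, and exhaustive enumeration is feasible only for boxes of a few sites per side.

```python
import itertools
import math


def energy(spins, L, boundary=+1):
    """H_Lambda for an L x L box: minus the sum of sigma_x * sigma_y over all
    nearest-neighbour bonds with at least one endpoint inside the box.
    `spins` is a tuple of length L*L in row-major order (an assumed layout)."""
    def s(i, j):
        inside = 0 <= i < L and 0 <= j < L
        return spins[i * L + j] if inside else boundary

    total = 0
    for i in range(L):
        for j in range(L):
            # Bonds to the right and below: every interior bond once, plus
            # the bonds crossing the right and bottom sides of the box.
            total += s(i, j) * s(i, j + 1) + s(i, j) * s(i + 1, j)
            # Bonds crossing the left and top sides of the box.
            if j == 0:
                total += s(i, j) * s(i, j - 1)
            if i == 0:
                total += s(i, j) * s(i - 1, j)
    return -total


def gibbs_measure(L, beta, boundary=+1):
    """Return a dict mapping each configuration to its probability (1.2)."""
    configs = list(itertools.product((-1, +1), repeat=L * L))
    weights = [math.exp(-beta * energy(cfg, L, boundary)) for cfg in configs]
    Z = sum(weights)  # partition function
    return {cfg: w / Z for cfg, w in zip(configs, weights)}


def mean_magnetization(L, beta, boundary=+1):
    """Expected spin per site under the finite-volume Gibbs measure."""
    P = gibbs_measure(L, beta, boundary)
    return sum(p * sum(cfg) for cfg, p in P.items()) / (L * L)
```

With plus boundary conditions and $\beta > 0$ the all-plus configuration carries the largest weight and the mean magnetization is positive, while flipping the boundary condition flips the sign of the mean magnetization.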
Droplet Formation in the 2D Ising Model
Bulk properties. For all $\beta\ge0$, the measure $P_L^{+,\beta}$ has a unique infinite volume (weak) limit $P^{+,\beta}$ which is a translation-invariant, ergodic, extremal Gibbs state for the interaction (1.1). Let $\langle-\rangle^{+,\beta}$ denote the expectation with respect to $P^{+,\beta}$. The persistence of the plus-bias in the thermodynamic limit, characterized by the magnetization
$$m^\star(\beta)=\langle\sigma_0\rangle^{+,\beta}, \tag{1.3}$$
marks the region of phase coexistence in this model. Indeed, there is a non-trivial critical value $\beta_c\in(0,\infty)$, known [1, 6, 41, 47] to satisfy $e^{2\beta_c}=1+\sqrt2$, such that for $\beta>\beta_c$, we have $m^\star(\beta)>0$ and there are multiple infinite-volume Gibbs states, while for $\beta\le\beta_c$, the magnetization vanishes and there is a unique infinite-volume Gibbs state for the interaction (1.1). Further, using $\langle A;B\rangle^{+,\beta}$ to denote the truncated correlation function $\langle AB\rangle^{+,\beta}-\langle A\rangle^{+,\beta}\langle B\rangle^{+,\beta}$, the magnetic susceptibility, defined by
$$\chi(\beta)=\sum_{x\in\mathbb{Z}^2}\langle\sigma_0;\sigma_x\rangle^{+,\beta}, \tag{1.4}$$
is finite for all $\beta>\beta_c$, see [24, 53]. By the GHS or FKG inequalities, we have $\chi(\beta)\ge1-m^\star(\beta)^2>0$ for all $\beta\in[0,\infty)$.

Peierls’ contours. Our next requisite item is a description of the Ising configurations in terms of Peierls’ contours. Given an Ising configuration in $\Lambda$ with plus boundary conditions, we consider the set of dual bonds intersecting direct bonds that connect a plus spin with a minus spin. These dual bonds will be assembled into contours as follows: First we note that only an even number of dual bonds meet at each site of the dual lattice. When two bonds meet at a single dual site, we simply connect them. When four bonds are incident with one dual lattice site, we apply the rounding rule “south-east/north-west” to resolve the “cross” into two curves “bouncing” off each other (see, e.g., [27, 49] or Fig. 1). Using these rules consistently, the aforementioned set of dual bonds decomposes into a set of non-self-intersecting polygons with rounded corners. These are our contours. Each contour $\gamma$ is a boundary of a bounded subset of $\mathbb{R}^2$, which we denote by $V(\gamma)$. We will also need a symbol for the set of sites in the interior of $\gamma$; we let $\mathbb{V}(\gamma)=V(\gamma)\cap\mathbb{Z}^2$. The diameter of a contour $\gamma$ is defined as the diameter of the set $V(\gamma)$ in the $\ell^2$-metric on $\mathbb{R}^2$. In the thermodynamic interpretation used in Sect. 1.1, contours represent microscopic boundaries of droplets. The advantage of the contour language is that it permits the identification of a sharp boundary between two phases; the disadvantage is that, in order to study the typical shape (and other properties) of large droplets, one has to first resum over small fluctuations of this boundary.

Surface tension. In order to study droplet equilibrium, we need to introduce the concept of microscopic surface tension. Following [4, 48], on $\mathbb{Z}^2$ we can conveniently use duality. Given a $\beta>\beta_c$, let $\beta^*=\frac12\log\coth\beta$ denote the dual temperature.
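As a quick numerical illustration of the duality map, the sketch below evaluates $\beta^*=\frac12\log\coth\beta$ and checks that the critical value with $e^{2\beta_c}=1+\sqrt2$ is its fixed point. The function name `dual_temperature` is a label chosen here, not notation from the paper.

```python
import math


def dual_temperature(beta):
    """Dual inverse temperature beta* = (1/2) log coth(beta), for beta > 0."""
    return 0.5 * math.log(1.0 / math.tanh(beta))


# Critical inverse temperature from exp(2 * beta_c) = 1 + sqrt(2).
beta_c = 0.5 * math.log(1.0 + math.sqrt(2.0))

if __name__ == "__main__":
    beta = 0.7  # an arbitrary low-temperature value, beta > beta_c
    print("beta_c          =", beta_c)
    print("dual of beta_c  =", dual_temperature(beta_c))
    print("dual of 0.7     =", dual_temperature(beta))
    print("dual of dual    =", dual_temperature(dual_temperature(beta)))
```

The map is an involution that exchanges the low-temperature regime $\beta>\beta_c$ with the high-temperature regime $\beta^*<\beta_c$, which is why low-temperature surface quantities can be expressed through high-temperature correlations.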
For any $(k_1,k_2)\in\mathbb{Z}^2$ and $k=(k_1^2+k_2^2)^{1/2}$, let $n=(k_1/k,k_2/k)\in S_1=\{x\in\mathbb{R}^2\colon\|x\|=1\}$. (Here $\|x\|$ is the Euclidean norm of $x$.) Then the limit
$$\tau_\beta(n)=-\lim_{N\to\infty}\frac1{Nk}\log\langle\sigma_0\sigma_{Nkn}\rangle^{+,\beta^*}, \tag{1.5}$$
where $Nkn=(k_1N,k_2N)\in\mathbb{Z}^2$, exists independently of what integers $k_1$ and $k_2$ we chose to represent $n$ and defines a function on a dense subset of $S_1$. It turns out that this
M. Biskup, L. Chayes, R. Kotecký
Fig. 1. An example of an Ising spin configuration and its associated Peierls’ contours. In general, a contour consists of a string of dual lattice bonds that bisect a direct bond between a plus spin and a minus spin. When four such dual bonds meet at a single (dual) lattice site, an ambiguity is resolved by applying the south-east/north-west rounding rule. (The remaining corners are rounded just for æsthetic reasons.) The shaded areas correspond to the part of V (γ ) occupied by the minus spins
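The first step of the contour construction can be sketched in a few lines of Python: collect the dual bonds that separate a plus spin from a minus spin and count how many of them meet at each dual site. This is only an illustration under stated conventions: a dual site $(i+\frac12,j+\frac12)$ is encoded by the integer pair `(i, j)`, spins outside the box are set to $+1$, and the rounding rule that resolves four-fold crossings is not implemented. The names `peierls_dual_bonds` and `dual_site_degrees` are hypothetical.

```python
from collections import Counter


def peierls_dual_bonds(config, L):
    """Dual bonds separating unequal neighbouring spins.

    `config` maps sites (i, j) of the box {0..L-1}^2 to +1 or -1; sites
    missing from `config` (in particular all sites outside the box) are +1.
    A dual site (i + 1/2, j + 1/2) is encoded by the integer pair (i, j).
    """
    def spin(x):
        return config.get(x, +1)

    dual_bonds = []
    for i in range(-1, L):
        for j in range(-1, L):
            # Horizontal direct bond (i, j)-(i+1, j) is crossed by the
            # vertical dual bond from (i+1/2, j-1/2) to (i+1/2, j+1/2).
            if spin((i, j)) != spin((i + 1, j)):
                dual_bonds.append(((i, j - 1), (i, j)))
            # Vertical direct bond (i, j)-(i, j+1) is crossed by the
            # horizontal dual bond from (i-1/2, j+1/2) to (i+1/2, j+1/2).
            if spin((i, j)) != spin((i, j + 1)):
                dual_bonds.append(((i - 1, j), (i, j)))
    return dual_bonds


def dual_site_degrees(dual_bonds):
    """Number of selected dual bonds incident to each dual site."""
    degrees = Counter()
    for a, b in dual_bonds:
        degrees[a] += 1
        degrees[b] += 1
    return degrees
```

A single minus spin produces four dual bonds forming a unit square; two diagonally adjacent minus spins produce a dual site of degree four, which is exactly the situation in which the rounding rule is needed.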
function can be continuously extended to all $n\in S_1$. We call the resulting quantity $\tau_\beta(n)$ the surface tension in direction $n$ at inverse temperature $\beta$. As is well known, $n\mapsto\tau_\beta(n)$ is invariant under rotations of $n$ by integer multiples of $\frac\pi2$ and $\tau_{\min}=\inf_{n\in S_1}\tau_\beta(n)>0$ for all $\beta>\beta_c$ [48]. Informally, the quantity $\tau_\beta(n)N$ represents the statistical-mechanical cost of a (fluctuating) contour line connecting two sites at distance $N$ on a straight line with direction (or normal vector) $n$.

Remark 2. Our definition of the surface tension differs from the standard definition by a factor of $\beta^{-1}$. In particular, the physical units of $\tau_\beta$ are length$^{-1}$ rather than energy$\times$length$^{-1}$. The present definition eliminates the need for an explicit occurrence of $\beta$ in many expressions throughout this paper and, as such, is notationally more convenient.

Surface properties. On the level of macroscopic thermodynamics, it is obvious that when a droplet of the minority phase is present in the system, it is pertinent to minimize the total surface cost. By our previous discussion, the cost per unit length is given by the surface tension $\tau_\beta(n)$. Thus, one is naturally led to the functional $W_\beta(\gamma)$ that assigns the number
$$W_\beta(\gamma)=\int_\gamma\tau_\beta(n_t)\,dt \tag{1.6}$$
to each rectifiable, closed curve $\gamma=(\gamma_t)$ in $\mathbb{R}^2$. Here $n_t$ denotes the normal vector at $\gamma_t$. The goal of the resulting variational problem is to minimize $W_\beta(\partial D)$ over all $D\subset\mathbb{R}^2$ with rectifiable boundary subject to the constraint that the volume of $D$ coincides with that of the droplet. The classic solution, due to Wulff [55], is that $W_\beta(\partial D)$ is minimized by the shape
$$D_W=\bigl\{r\in\mathbb{R}^2\colon r\cdot n\le\tau_\beta(n),\ n\in S_1\bigr\} \tag{1.7}$$
rescaled to contain the appropriate volume. (Here $r\cdot n$ denotes the dot product in $\mathbb{R}^2$.) We will use $W$ to denote the shape $D_W$ scaled to have a unit (Lebesgue) volume. It follows from (1.7) that $W$ is a convex set in $\mathbb{R}^2$. We define
$$w_1(\beta)=W_\beta(\partial W) \tag{1.8}$$
and note that $w_1(\beta)>0$ once $\beta>\beta_c$. Our preliminary arsenal is now complete and we are prepared to discuss the main results.

1.3. Main results. Recall the notation $\Lambda_L$ for a square of $L\times L$ sites in $\mathbb{Z}^2$. Consider the Ising model in volume $\Lambda_L$ with plus boundary condition and inverse temperature $\beta$. Let us define the total magnetization (of a configuration $\sigma$) in $\Lambda_L$ by the formula
$$M_L=\sum_{x\in\Lambda_L}\sigma_x. \tag{1.9}$$
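Let $(v_L)_{L\ge1}$ be a sequence of positive numbers, with $v_L\to\infty$ as $L\to\infty$, such that $m^\star|\Lambda_L|-2m^\star v_L$ is an allowed value of $M_L$ for all $L\ge1$. Our first result concerns the decay rate of the probability that $M_L=m^\star|\Lambda_L|-2m^\star v_L$ in the “grand canonical” ensemble $P_L^{+,\beta}$:

Theorem 1.1. Let $\beta>\beta_c$ and let $m^\star=m^\star(\beta)$, $\chi=\chi(\beta)$, and $w_1=w_1(\beta)$ be as above. Suppose that the limit
$$\Delta=2\,\frac{(m^\star)^2}{\chi w_1}\lim_{L\to\infty}\frac{v_L^{3/2}}{|\Lambda_L|} \tag{1.10}$$
exists with $\Delta\in(0,\infty)$. Then
$$\lim_{L\to\infty}\frac1{\sqrt{v_L}}\log P_L^{+,\beta}\bigl(M_L=m^\star|\Lambda_L|-2m^\star v_L\bigr)=-w_1\inf_{0\le\lambda\le1}\Phi_\Delta(\lambda), \tag{1.11}$$
where
$$\Phi_\Delta(\lambda)=\sqrt\lambda+\Delta(1-\lambda)^2,\qquad0\le\lambda\le1. \tag{1.12}$$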
Theorem 1.1 is a direct consequence of Theorems 3.1 and 4.1; the actual proof comes in Sect. 5. We proceed with some remarks:

Remark 3. Note that, by our choice of the deviation scale, the term $m^\star(\beta)|\Lambda_L|$ can be replaced by the mean value $\langle M_L\rangle_L^{+,\beta}$ in all formulas; see Lemma 2.9 below. The motivation for introducing the factor “$2m^\star$” on the left-hand side of (1.11) is that then $v_L$ represents the volume of a droplet that must be created in order to achieve the required value of the overall magnetization (provided the magnetization outside, resp., inside the droplet is $m^\star$, resp., $-m^\star$).

Remark 4. The quantity $\lambda$ that appears in (1.11–1.12) represents the trial fraction of the deficit magnetization which might go into a large-scale droplet. (So, by our convention, the volume of such a droplet is just $\lambda v_L$.) The core of the proof of Theorem 1.1, roughly speaking, is that the probability of seeing a droplet of this size tends to zero as $\exp\{-w_1\sqrt{v_L}\,\Phi_\Delta(\lambda)\}$. Evidently, a large deviation principle for the size of such a droplet is satisfied with rate $L^{2/3}$ and a rate function proportional to $\Phi_\Delta$. However, we will not attempt to make this statement mathematically rigorous.
Next we shall formulate our main result on the asymptotic form of typical configurations in the “canonical” ensemble described by the conditional measure $P_L^{+,\beta}(\,\cdot\,|M_L=m^\star|\Lambda_L|-2m^\star v_L)$. For any two sets $A,B\subset\mathbb{R}^2$, let $d_H(A,B)$ denote the Hausdorff distance between $A$ and $B$,
$$d_H(A,B)=\max\Bigl\{\sup_{x\in A}\operatorname{dist}(x,B),\ \sup_{y\in B}\operatorname{dist}(y,A)\Bigr\}, \tag{1.13}$$
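where $\operatorname{dist}(x,A)$ is the Euclidean distance of $x$ and $A$. Our second main theorem is then as follows:

Theorem 1.2. Let $\beta>\beta_c$ and suppose that the limit in (1.10) exists with $\Delta\in(0,\infty)$. Recall that $W$ denotes the Wulff shape of a unit volume. Given $\kappa,s,L\in(0,\infty)$, let $A_{\kappa,s,L}$ be the event that any external contour $\gamma$ for which $\operatorname{diam}\gamma\ge s$ must also satisfy $\operatorname{diam}\gamma>\kappa\sqrt{v_L}$. Next, for each $\epsilon>0$, let $B_{\epsilon,s,L}$ be the event that there is at most one external contour $\gamma_0$ in $\Lambda_L$ with $\operatorname{diam}\gamma_0\ge s$ and, whenever such a contour $\gamma_0$ exists, it satisfies the conditions
$$\inf_{z\in\mathbb{R}^2}d_H\bigl(V(\gamma_0),\,z+\sqrt{|V(\gamma_0)|}\,W\bigr)\le\sqrt{\epsilon v_L} \tag{1.14}$$
and
$$\Phi_\Delta\bigl(v_L^{-1}|V(\gamma_0)|\bigr)\le\inf_{0\le\lambda'\le1}\Phi_\Delta(\lambda')+\epsilon. \tag{1.15}$$
In addition, the event $B_{\epsilon,s,L}$ also requires that the magnetization inside $\gamma_0$ obeys the constraint
$$\Bigl|\sum_{x\in\mathbb{V}(\gamma_0)}(\sigma_x+m^\star)\Bigr|\le\epsilon v_L. \tag{1.16}$$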
There exists a constant $\kappa_0>0$ such that for each $\zeta>0$ and each $\epsilon>0$ there exist numbers $K_0<\infty$ and $L_0<\infty$ such that
$$P_L^{+,\beta}\bigl(A_{\kappa,s,L}\cap B_{\epsilon,s,L}\,\big|\,M_L=m^\star|\Lambda_L|-2m^\star v_L\bigr)\ge1-L^{-\zeta} \tag{1.17}$$
holds provided $\kappa\le\kappa_0$ and $s=K\log L$ with $K\ge K_0$ and $L\ge L_0$.

Thus, simply put, whenever there is a large droplet in the system, its shape rarely deviates from that of the Wulff shape and its volume (in units of $v_L$) is almost always given by a value of $\lambda$ nearly minimizing $\Phi_\Delta$. Moreover, all other droplets in the system are at most of a logarithmic size. Most of the physically interesting behavior of this system is simply a consequence of where $\Phi_\Delta$ achieves its minimum and how this minimum depends on $\Delta$. The upshot, which is stated concisely in Proposition 2.1 below, is that there is a critical value of $\Delta$, given by
$$\Delta_c=\frac12\Bigl(\frac32\Bigr)^{3/2}, \tag{1.18}$$
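such that if $\Delta<\Delta_c$, then $\Phi_\Delta$ has the unique minimizer at $\lambda=0$, while for $\Delta>\Delta_c$, the unique minimizer of $\Phi_\Delta$ is nontrivial. More explicitly, for $\Delta\ne\Delta_c$, the function $\Phi_\Delta$ is minimized by
$$\lambda_\Delta=\begin{cases}0,&\text{if }\Delta<\Delta_c,\\ \lambda_+(\Delta),&\text{if }\Delta>\Delta_c,\end{cases} \tag{1.19}$$
where $\lambda_+(\Delta)$ is the maximal positive solution to the equation
$$4\Delta\sqrt\lambda\,(1-\lambda)=1. \tag{1.20}$$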
The reason for the changeover is that, as $\Delta$ increases through $\Delta_c$, a local minimum becomes a global minimum, see the proof of Proposition 2.1. As a consequence, the minimizing fraction $\lambda_\Delta$ does not tend to zero as $\Delta\downarrow\Delta_c$; in particular, it tends to $\lambda_c=2/3$. Using the information about the unique minimizer of $\Phi_\Delta$ for $\Delta\ne\Delta_c$, it is worthwhile to reformulate Theorem 1.2 as follows:

Corollary 1.3. Let $\beta>\beta_c$ and suppose that the limit in (1.10) exists with $\Delta\in(0,\infty)$. Let $\Delta_c$ and $\lambda_\Delta$ be as in (1.18) and (1.19), respectively. Let $K$ be sufficiently large (i.e., $K\ge K_0$, where $K_0$ is as in Theorem 1.2). Considering the conditional distribution $P_L^{+,\beta}(\,\cdot\,|M_L=m^\star|\Lambda_L|-2m^\star v_L)$, the following holds with probability tending to one as $L\to\infty$:
(1) If $\Delta<\Delta_c$, then all contours $\gamma$ in $\Lambda_L$ satisfy $\operatorname{diam}\gamma\le K\log L$.
(2) If $\Delta>\Delta_c$, then there is exactly one external contour $\gamma_0$ with $\operatorname{diam}\gamma_0>K\log L$ and all other external contours $\gamma$ satisfy $\operatorname{diam}\gamma\le K\log L$. Moreover, the unique “large” external contour $\gamma_0$ asymptotically satisfies the bounds (1.14–1.16) for all $\epsilon>0$. In particular, $|V(\gamma_0)|=v_L(\lambda_\Delta+o(1))$ with probability tending to one as $L\to\infty$.

We remark that although the situation at $\Delta=\Delta_c$ is not fully resolved, we must have either a single large droplet or no droplet at all; i.e., the outcome must mimic the case $\Delta>\Delta_c$ or $\Delta<\Delta_c$. A better understanding of the case $\Delta=\Delta_c$ will certainly require a more refined analysis; e.g., the second-order large-deviation behavior of the measure $P_L^{+,\beta}(\cdot)$.

Remark 5. We note that in the course of this work, the phrase “$\beta>\beta_c$” appears in three disparate meanings. First, for $\beta>\beta_c$, the magnetization is positive; second, for $\beta>\beta_c$, the surface tension is positive; and third, for $\beta>\beta_c$, truncated correlations decay exponentially. The facts that the transition temperatures associated with these properties all coincide and that $\beta_c$ is given by the self-dual condition play no essential role in our arguments. Nor are any other particulars of the square lattice really used.
Thus, we believe that our results could be extended to other planar lattices without much modification. However, in the cases where the coincidence has not yet been (or cannot be) established, we would need to define “$\beta_c$” so as to satisfy all three criteria.

1.4. Discussion and outline. The mechanism which drives the droplet formation/dissolution phenomenon described in the above theorems is not difficult to understand on a heuristic level. This heuristic derivation (which applies to all dimensions $d\ge2$) has been discussed in detail elsewhere [10], so we will be correspondingly brief. The main ideas
are best explained in the context of the large-deviation theory for the “grand canonical” distribution and, as a matter of fact, the actual proof also follows this path. Consider the Ising model in the box $\Lambda_L$ and suppose we wish to observe a magnetization deficiency $\delta M=2m^\star v_L$ from the nominal value of $m^\star|\Lambda_L|$. Of course, this can be achieved in one shot by the formation of a Wulff droplet at the cost of about $\exp\{-w_1\sqrt{v_L}\}$. Alternatively, if we demand that this deficiency emerges out of the background fluctuations, we might guess on the basis of fluctuation-dissipation arguments that the cost would be of the order
$$\exp\Bigl\{-\frac{(\delta M)^2}{2\operatorname{Var}(M_L)}\Bigr\}\approx\exp\Bigl\{-2\,\frac{(m^\star v_L)^2}{\chi|\Lambda_L|}\Bigr\}, \tag{1.21}$$
where $\chi$ is the susceptibility and $\operatorname{Var}(M_L)=(\chi+o(1))|\Lambda_L|$ is the variance of $M_L$ in distribution $P_L^{+,\beta}$. Obviously, the former mechanism dominates when $\sqrt{v_L}\ll v_L^2/|\Lambda_L|$, i.e., when $v_L\gg L^{4/3}$, while the latter dominates under the opposite extreme conditions, i.e., when $v_L\ll L^{4/3}$. (These are exactly the regions previously treated in [28, 40] where the corresponding statements have been established in full rigor.) In the case when $v_L/L^{4/3}$ tends to a finite limit we now find that the two terms are comparable. This is the basis of our parameter $\Delta$ defined in (1.10).

Assuming $v_L^{3/2}/|\Lambda_L|$ is essentially at its limit, let us instead try a droplet of volume $\lambda v_L$, where $0\le\lambda\le1$. The droplet cost is now reduced to
$$\exp\bigl\{-w_1\sqrt\lambda\sqrt{v_L}\bigr\}, \tag{1.22}$$
but we still need to account for the remaining fraction of the deficiency. Assuming the fluctuation-dissipation reasoning can still be applied, this is now
$$\exp\Bigl\{-2\,\frac{(m^\star v_L)^2}{\chi|\Lambda_L|}(1-\lambda)^2\Bigr\}=\exp\bigl\{-w_1\sqrt{v_L}\,\Delta(1-\lambda)^2\bigr\}. \tag{1.23}$$
Putting these together we find that the total cost of achieving the deficiency $\delta M=2m^\star v_L$ using a droplet of volume $\lambda v_L$ is given in the leading order by
$$\exp\bigl\{-w_1\Phi_\Delta(\lambda)\sqrt{v_L}\bigr\}. \tag{1.24}$$
An optimal droplet size is then found by minimizing $\Phi_\Delta(\lambda)$ over $\lambda$. This is exactly the content of Theorem 1.1. We remark that even on the level of heuristic understanding, some justification is required for the decoupling of these two mechanisms. In [10], we have argued this case on a heuristic level; in the present work, we simply provide a complete proof.

The pathway of the proof is as follows: The approximate equalities (1.22–1.24) must be proved in the form of upper and lower bounds which agree in the $L\to\infty$ limit. (Of course, we never actually have to go through the trouble of establishing the formulas involving $\Phi_\Delta(\lambda)$ for non-optimal values of $\lambda$.) For the lower bound (see Theorem 3.1) we simply shoot for the minimum of $\Phi_\Delta(\lambda)$: We produce a near-Wulff droplet of the desired area and, on the complementary region, allow the background fluctuations to account for the rest. Here, as a bound, we are permitted to use a contour ensemble with restriction to contours of logarithmic size which ensures the desired Gaussian behavior. The upper bound requires considerably more effort. The key step is to show that, with probability close to one, there are no droplets at any scale larger than $\log L$ or smaller
than $\sqrt{v_L}$. Notwithstanding the technical difficulties, the result (Theorem 4.1) is of independent interest because it applies for all $\Delta\in(0,\infty)$, including the case $\Delta=\Delta_c$. Once the absence of these “intermediate” contour scales has been established, the proofs of the main results directly follow.

We finish with a brief outline of the remainder of this paper. In the next section we collect the necessary technical statements needed for the proof of both the upper and lower bound. Specifically, in Sect. 2.1 we discuss in detail the minimizers of $\Phi_\Delta$, in Sect. 2.2 we introduce the concept of skeletons and in Sect. 2.3 we list the needed properties of the logarithmic contour ensemble. Sect. 3 contains the proof of the lower bound, while Sect. 4 establishes the absence of contours on scales between $\log L$ and the anticipated droplet size. Sect. 5 assembles these ingredients together into the proofs of the main results.

2. Technical Ingredients

This section contains three subsections: Sect. 2.1 presents the solution of the variational problem for the function $\Phi_\Delta$ on the right-hand side of (1.12), while Sects. 2.2 and 2.3 collect the necessary technical lemmas concerning the skeleton calculus and the small-contour ensemble. We remark that a variety of closely related results have appeared in the literature; in particular, in [40] (and the earlier [27, 28, 48]). For completeness, we will provide proofs, but keep them as brief as possible. Readers familiar with these topics (or who are otherwise uninterested) are invited to skip the entire section on a preliminary run-through, referring back only for definitions when reading through Sects. 3–5.

2.1. Variational problem. Here we investigate the global minima of the function $\Phi_\Delta$ that was introduced in (1.12). Since the general picture is presumably applicable in higher dimensions as well (certainly at the level of heuristic arguments, see [10]), we might as well carry out the analysis in the case of a general dimension $d\ge2$.
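For the purpose of this subsection, let
$$\Phi_\Delta(\lambda)=\lambda^{\frac{d-1}d}+\Delta(1-\lambda)^2,\qquad0\le\lambda\le1. \tag{2.1}$$
We define
$$\Phi_\Delta^\star=\inf_{0\le\lambda\le1}\Phi_\Delta(\lambda) \tag{2.2}$$
and note that $\Phi_\Delta^\star>0$ once $\Delta>0$. Let us introduce the $d$-dimensional version of (1.18),
$$\Delta_c=\frac1d\Bigl(\frac{d+1}2\Bigr)^{\frac{d+1}d}. \tag{2.3}$$
The minimizers of $\Phi_\Delta$ are then characterized as follows: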
Proposition 2.1. Let $d\ge2$ and, for any $\Delta\ge0$, let $M_\Delta$ denote the set of all global minimizers of $\Phi_\Delta$ on $[0,1]$. Then we have:
(1) If $\Delta<\Delta_c$, then $M_\Delta=\{0\}$.
(2) If $\Delta=\Delta_c$, then $M_\Delta=\{0,\lambda_c\}$, where
$$\lambda_c=\frac2{d+1}. \tag{2.4}$$
(3) If $\Delta>\Delta_c$, then $M_\Delta=\{\lambda_0\}$, where $\lambda_0$ is the maximal positive solution to the equation
$$\frac{2d}{d-1}\,\Delta\,\lambda^{\frac1d}(1-\lambda)=1. \tag{2.5}$$
In particular, $\lambda_0>\lambda_c$.

Proof. A simple calculation shows that $\lambda=0$ is always a (one-sided) local minimum of $\lambda\mapsto\Phi_\Delta(\lambda)$, while $\lambda=1$ is always a (one-sided) local maximum. Moreover, the stationary points of $\Phi_\Delta$ in $(0,1)$ have to satisfy (2.5). Consider the quantity
$$q_\Delta(\lambda)=1-\frac d{d-1}\,\lambda^{1/d}\,\Phi_\Delta'(\lambda)=\frac{2d}{d-1}\,\Delta\,\lambda^{1/d}(1-\lambda), \tag{2.6}$$
i.e., $q_\Delta(\lambda)$ is essentially the left-hand side of (2.5). A simple calculation shows that $q_\Delta(\lambda)$ achieves its maximal value on $[0,1]$ at $\lambda=\lambda_d=\frac1{d+1}$, where it equals $\Delta/\Delta_d$ with $\Delta_d^{-1}=2d^2(d-1)^{-1}(d+1)^{-(1+1/d)}$, and is strictly increasing for $\lambda<\lambda_d$ and strictly decreasing for $\lambda>\lambda_d$. On the basis of these observations, it is easy to verify the following facts:
(1) For $\Delta\le\Delta_d$, we have $q_\Delta(\lambda)<1$ for all $\lambda\in[0,1]$ (except perhaps at $\lambda=\lambda_d$ when $\Delta$ equals $\Delta_d$). Consequently, $\lambda\mapsto\Phi_\Delta(\lambda)$ is strictly increasing throughout $[0,1]$. In particular, $\lambda=0$ is the unique global minimum of $\Phi_\Delta(\lambda)$ in $[0,1]$.
(2) For $\Delta>\Delta_d$, (2.5), resp., $q_\Delta(\lambda)=1$ has two distinct solutions in $[0,1]$. Consequently, $\lambda\mapsto\Phi_\Delta(\lambda)$ has two local extrema in $(0,1)$: A local maximum at $\lambda=\lambda_-(\Delta)$ and a local minimum at $\lambda=\lambda_+(\Delta)$, where $\lambda_-(\Delta)$ and $\lambda_+(\Delta)$ are the minimal and maximal positive solutions to (2.5), respectively. As a simple calculation shows, the function $\Delta\mapsto\lambda_+(\Delta)$ is strictly increasing on its domain with $\lambda_+(\Delta)\sim1-\frac{d-1}{2d\Delta}$ as $\Delta\to\infty$.

In order to decide which of the two previously described local minima ($\lambda=0$ or $\lambda=\lambda_+(\Delta)$) gives rise to the global minimum, we first note that, while $\Phi_\Delta(0)=\Delta$ tends to infinity as $\Delta\to\infty$, the above asymptotics of $\lambda_+(\Delta)$ shows that $\Phi_\Delta(\lambda_+(\Delta))\to1$ as $\Delta\to\infty$. Hence, $\lambda_+(\Delta)$ is the unique global minimum of $\Phi_\Delta$ once $\Delta$ is sufficiently large. Thus, it remains to show that the two local minima interchange their roles at $\Delta=\Delta_c$. To that end we compute
$$\frac d{d\Delta}\Phi_\Delta\bigl(\lambda_+(\Delta)\bigr)=\frac\partial{\partial\Delta}\Phi_\Delta\bigl(\lambda_+(\Delta)\bigr)=\bigl(1-\lambda_+(\Delta)\bigr)^2<1, \tag{2.7}$$
where we used that $\lambda_+(\Delta)$ is a stationary point of $\Phi_\Delta$ to derive the first equality. Comparing this with $\frac d{d\Delta}\Phi_\Delta(0)=1$, we see that $\Delta\mapsto\Phi_\Delta(\lambda_+(\Delta))$ increases with $\Delta$ strictly slower than $\Delta\mapsto\Phi_\Delta(0)$ on any finite interval of $\Delta$’s. Hence, there must be a unique value of $\Delta$ for which $\Phi_\Delta(0)$ and $\Phi_\Delta(\lambda_+(\Delta))$ are exactly equal. An elementary computation shows that this happens at $\Delta=\Delta_c$, where $\Delta_c$ is given by (2.3). This finishes the proof of (1) and (3); in order to show that also (2) holds, we just need to note that $\lambda_+(\Delta_c)$ is exactly $\lambda_c$ as given in (2.4).

Proposition 2.1 allows us to define a quantity $\lambda_\Delta$ by formula (1.19), where now $\lambda_+(\Delta)$ is the maximal positive solution to (2.5). Since $\lim_{\Delta\downarrow\Delta_c}\lambda_\Delta=\lambda_c>0$, the function $\Delta\mapsto\lambda_\Delta$ undergoes a jump at $\Delta_c$.
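The statement of Proposition 2.1 is easy to probe numerically. The sketch below, a plain grid search rather than anything used in the paper, evaluates the function from (2.1) and compares the location of its global minimum just below and just above the critical value from (2.3). The function names and the grid resolution are choices made here for illustration.

```python
def phi(lam, delta, d=2):
    """The function from (2.1): lam^((d-1)/d) + delta * (1 - lam)^2."""
    return lam ** ((d - 1) / d) + delta * (1.0 - lam) ** 2


def delta_c(d=2):
    """Critical value from (2.3)."""
    return (1.0 / d) * ((d + 1) / 2.0) ** ((d + 1) / d)


def lambda_c(d=2):
    """Nontrivial minimizer at the critical value, from (2.4)."""
    return 2.0 / (d + 1)


def global_minimizer(delta, d=2, n=20001):
    """Grid search for the global minimizer of phi on [0, 1].

    Returns (lam, value). Ties are resolved in favour of the smaller lam,
    so lam = 0 is returned whenever no grid point does strictly better.
    """
    best_lam, best_val = 0.0, phi(0.0, delta, d)
    for k in range(1, n):
        lam = k / (n - 1)
        val = phi(lam, delta, d)
        if val < best_val:
            best_lam, best_val = lam, val
    return best_lam, best_val


if __name__ == "__main__":
    for d in (2, 3):
        dc = delta_c(d)
        below, _ = global_minimizer(0.99 * dc, d)
        above, _ = global_minimizer(1.01 * dc, d)
        print(f"d={d}: delta_c={dc:.6f}, lambda_c={lambda_c(d):.6f}, "
              f"minimizer below={below:.4f}, above={above:.4f}")
```

The jump of the minimizer from $0$ to a value slightly above $\lambda_c$ when $\Delta$ crosses $\Delta_c$ is the discontinuity described at the end of the proof.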
2.2. Skeleton estimates. In this section we introduce coarse-grained versions of contours called skeletons. These objects will be extremely useful whenever an upper bound on the probability of large contours is needed. Indeed, the introduction of skeletons will permit us to effectively integrate out small fluctuations of contour lines and thus express the contour weights directly in terms of the surface tension. Skeletons were first introduced in [4, 27]; here we use a modified version of the definition from [40].

2.2.1. Definition and geometric properties. Given a scale $s>0$, an $s$-skeleton is an $n$-tuple $(x_1,\dots,x_n)$ of points on the dual lattice, $x_i\in(\mathbb{Z}^2)^*$, such that $n>1$ and
$$s\le\|x_{i+1}-x_i\|\le2s,\qquad i=1,\dots,n. \tag{2.8}$$
Here $\|\cdot\|$ denotes the $\ell^2$-distance on $\mathbb{R}^2$ and $x_{n+1}$ is identified with $x_1$. Given a skeleton $S$, let $P(S)$ be the closed polygonal curve in $\mathbb{R}^2$ induced by $S$. We will use $|P(S)|$ to denote the total length of $P(S)$, in accord with our general notation for the length of curves. A contour $\gamma$ is called compatible with an $s$-skeleton $S=(x_1,\dots,x_n)$, if
(1) $\gamma$, viewed as a simple closed path on $\mathbb{R}^2$, passes through all sites $x_i$, $i=1,\dots,n$ in the corresponding order.
(2) $d_H(\gamma,P(S))\le s$, where $d_H$ is the Hausdorff distance (1.13).
We write $\gamma\sim S$ if $\gamma$ and $S$ are compatible. For each configuration $\sigma$, we let $\Gamma_s(\sigma)$ be the set of all $s$-large contours $\gamma$ in $\sigma$; namely all $\gamma$ in $\sigma$ for which there is an $s$-skeleton $S$ such that $\gamma\sim S$. Given a set of $s$-skeletons $\mathfrak{S}=(S_1,\dots,S_m)$, we say that a configuration $\sigma$ is compatible with $\mathfrak{S}$, if $\Gamma_s(\sigma)=(\gamma_1,\dots,\gamma_m)$ and $\gamma_k\sim S_k$ for all $k=1,\dots,m$. We will write $\sigma\sim\mathfrak{S}$ to denote that $\sigma$ and $\mathfrak{S}$ are compatible.

It is easy to see that $\Gamma_s(\sigma)$ actually consists of all contours $\gamma$ of the configuration $\sigma$ such that $\operatorname{diam}\gamma\ge s$. Indeed, $\operatorname{diam}\gamma\ge s$ for every $\gamma\in\Gamma_s(\sigma)$ by the conditions (1) and (2.8) above. On the other hand, for any $\gamma$ with $\operatorname{diam}\gamma\ge s$, we will construct an $s$-skeleton by the following procedure: Regard $\gamma$ as a closed non-self-intersecting curve, $\gamma=(\gamma_t)_{0\le t\le1}$, where $\gamma_0$ is chosen so that $\sup_{x\in\gamma}\|x-\gamma_0\|\ge s$. Then we let $x_1=\gamma_0$ and $x_2=\gamma_{t_2}$, where $t_2=\inf\{t>0\colon\|\gamma_t-\gamma_0\|\ge s\}$. Similarly, if $t_j$ has been defined and $x_j=\gamma_{t_j}$, we let $x_{j+1}=\gamma_{t_{j+1}}$, where $t_{j+1}=\inf\{t\in(t_j,1]\colon\|\gamma_t-\gamma_{t_j}\|\ge s\}$. Note that this definition ensures that (2.8) as well as the conditions (1) and (2) hold. The consequence of this construction is that, via the equivalence relation $\sigma\sim\mathfrak{S}$, the set of all skeletons induces a covering of the set of all spin configurations.

Remark 6. The reader familiar with [27, 40] will notice that we explicitly keep the stronger condition (1) from [27].
Without the requirement that contours pass through the skeleton points in the given order, Lemma 2.3 and, more importantly, Lemma 2.4 below would fail to hold.

Next we will discuss some subtleties of the geometry of the skeletons stemming from the fact that the corresponding polygons (unlike contours) may have self-intersections. We will stay rather brief; a detailed account of the topic can be found in [27]. We commence with a few geometric definitions: Let $\mathfrak{P}=\{P_1,\dots,P_k\}$ denote a finite collection of polygonal curves. Consider a smooth self-avoiding path $\mathcal{L}$ from a point $x$ to $\infty$ that is generic with respect to the polygons from $\mathfrak{P}$ (i.e., the path $\mathcal{L}$ has a finite number of intersections with each $P_j$ and this number does not change under small perturbations of $\mathcal{L}$). Let $\#(\mathcal{L}\cap P_j)$ be the number of intersections of $\mathcal{L}$ with $P_j$. Then we define $V(\mathfrak{P})\subset\mathbb{R}^2$ to be the set of points $x\in\mathbb{R}^2$ such that the total number
of intersections, $\sum_{j=1}^k\#(\mathcal{L}\cap P_j)$, is odd for any path $\mathcal{L}$ from $x$ to $\infty$ with the above properties. We will use $|V(\mathfrak{P})|$ to denote the area of $V(\mathfrak{P})$. If $\mathfrak{P}$ happens to be a collection of skeletons, $\mathfrak{P}=\mathfrak{S}$, the relevant set will be $V(\mathfrak{S})$. If $\mathfrak{P}$ happens to be a collection of Ising contours, $\mathfrak{P}=\Gamma$, the associated $V(\Gamma)$ can be thought of as a union of plaquettes centered at sites of $\mathbb{Z}^2$; we will use $\mathbb{V}(\Gamma)=V(\Gamma)\cap\mathbb{Z}^2$ to denote the relevant set of sites. It is clear that if $\Gamma$ are the contours associated with a spin configuration $\sigma_\Lambda$ in $\Lambda$ and the plus boundary condition on $\partial\Lambda$, then $\mathbb{V}(\Gamma)$ are exactly the sites $x\in\Lambda$ where $\sigma_x=-1$.

We proceed by listing a few important estimates concerning compatible collections of contours and their associated skeletons:

Lemma 2.2. There is a finite geometric constant $g_1$ such that if $\Gamma$ is a collection of contours and $\mathfrak{S}$ is a collection of $s$-skeletons with $\Gamma\sim\mathfrak{S}$, then
$$\sum_{\gamma\in\Gamma}|\gamma|\le g_1s\sum_{S\in\mathfrak{S}}\bigl|P(S)\bigr|. \tag{2.9}$$
In particular, if $\operatorname{diam}\gamma\le\kappa$ for all $\gamma\in\Gamma$, then we also have, for some finite constant $g_2$,
$$\bigl|V(\Gamma)\bigr|\le g_2\kappa\sum_{S\in\mathfrak{S}}\bigl|P(S)\bigr|. \tag{2.10}$$
Proof. Immediate from the definition of $s$-skeletons.
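The greedy procedure of Sect. 2.2.1 that extracts skeleton points from a contour of diameter at least $s$ can be sketched as follows. This is an illustration under simplifying assumptions: the curve is given as an ordered list of points sampled with unit steps (integer coordinates are used instead of dual-lattice coordinates), and only the forward steps are controlled; the closing step from the last skeleton point back to $x_1$ is not examined here. The name `greedy_skeleton` is hypothetical.

```python
import math


def greedy_skeleton(curve, s):
    """Greedy skeleton points of a closed curve sampled at discrete points.

    `curve` is the ordered list of points of a closed curve (the last point
    is implicitly joined to the first). Starting from a point gamma_0 that
    has some curve point at distance >= s, the next skeleton point is the
    first subsequent curve point at distance >= s from the current one.
    Returns [] if no admissible starting point exists (diameter < s).
    """
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Index of a starting point with sup_x |x - gamma_0| >= s, if any.
    start = next((k for k, p in enumerate(curve)
                  if max(dist(p, q) for q in curve) >= s), None)
    if start is None:
        return []

    # Traverse the curve once, beginning at the starting point.
    ordered = curve[start:] + curve[:start]
    skeleton = [ordered[0]]
    for p in ordered[1:]:
        if dist(p, skeleton[-1]) >= s:
            skeleton.append(p)
    return skeleton
```

For a curve sampled with unit steps, each forward step of the resulting skeleton has length in $[s,s+1)$, which lies within the window required by (2.8) as soon as $s\ge1$.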
Lemma 2.2 will be useful because of the following observation: Let $\mathfrak{S}$ be a collection of $s$-skeletons and recall that the minimal value of the surface tension, $\tau_{\min}=\inf_{n\in S_1}\tau_\beta(n)$, is strictly positive, $\tau_{\min}>0$. Then
$$\sum_{S\in\mathfrak{S}}W_\beta\bigl(P(S)\bigr)\ge\tau_{\min}\sum_{S\in\mathfrak{S}}\bigl|P(S)\bigr|. \tag{2.11}$$
Thus the bounds in (2.9–2.10) will allow us to convert a lower bound on the overall contour surface area/volume into a lower bound on the Wulff functional of the associated skeletons. A little less trivial is the estimate on the difference between the volumes of $V(\Gamma)$ and $V(\mathfrak{S})$:

Lemma 2.3. There is a finite geometric constant $g_3$ such that if $\Gamma$ is a collection of contours and $\mathfrak{S}$ is a collection of $s$-skeletons with $\Gamma\sim\mathfrak{S}$, then
$$\Bigl|\,\bigl|V(\Gamma)\bigr|-\bigl|V(\mathfrak{S})\bigr|\,\Bigr|\le\bigl|V(\Gamma)\triangle V(\mathfrak{S})\bigr|\le g_3s\sum_{S\in\mathfrak{S}}\bigl|P(S)\bigr|. \tag{2.12}$$
Here $V(\Gamma)\triangle V(\mathfrak{S})$ denotes the symmetric difference of $V(\Gamma)$ and $V(\mathfrak{S})$.

Proof. Follows by the same arguments as used in the proof of Theorem 5.13 in [27].
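The parity rule that defines the region enclosed by a collection of possibly self-intersecting polygons (a point belongs to the region when a generic path to infinity crosses the polygons an odd number of times) can be illustrated with a standard even-odd ray-casting test. The sketch below uses a horizontal ray and assumes the query point is in generic position, i.e., the ray does not pass through a vertex or a self-intersection point; the names `crossings` and `in_region` are chosen here for illustration.

```python
def crossings(point, polygon):
    """Edges of a closed polygon crossed by the horizontal ray to +infinity.

    `polygon` is a list of vertices; consecutive vertices (and the last with
    the first) are joined by straight edges. Self-intersections are allowed.
    """
    x, y = point
    count = 0
    n = len(polygon)
    for k in range(n):
        (x1, y1), (x2, y2) = polygon[k], polygon[(k + 1) % n]
        # The edge straddles the horizontal line through the point;
        # the half-open comparison avoids counting a vertex twice.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                count += 1
    return count


def in_region(point, polygons):
    """True if the total number of crossings over all polygons is odd."""
    return sum(crossings(point, P) for P in polygons) % 2 == 1
```

With two nested squares, a point inside both is crossed twice and hence excluded, while a point in the annulus between them is included; this mirrors the fact that, for contours of a spin configuration, the enclosed region consists of the minus sites only.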
2.2.2. Probabilistic estimates. The main reason why skeletons are useful is the availability of the so-called skeleton upper bound, originally due to Pfister [48]. Recall that, for each $A\subset\mathbb{Z}^2$, we use $P_A^{+,\beta}$ to denote the probability distribution on spins in $A$ with plus boundary condition on the boundary of $A$. Given a set of skeletons $\mathfrak{S}$, we let $P_A^{+,\beta}(\mathfrak{S})=P_A^{+,\beta}(\{\sigma\colon\sigma\sim\mathfrak{S}\})$ be the probability that $\mathfrak{S}$ is a skeleton of some configuration in $A$. Then we have:
Lemma 2.4 (Skeleton upper bound). For all $\beta>\beta_c$, all finite $A\subset\mathbb{Z}^2$, all scales $s$ and all collections $\mathfrak{S}$ of $s$-skeletons in $A$, we have
$$P_A^{+,\beta}(\mathfrak{S})\le\exp\bigl\{-W_\beta(\mathfrak{S})\bigr\}, \tag{2.13}$$
where
$$W_\beta(\mathfrak{S})=\sum_{S\in\mathfrak{S}}W_\beta\bigl(P(S)\bigr). \tag{2.14}$$
Proof. This is exactly Eq. (1.3.1) in [40]. The proof goes back to [48], Lemma 6.7. For our purposes, the key “splitting” argument is provided in Lemma 5.4 of [49]. A special case of the key estimate appears in Eq. (5.51) from Lemma 5.5 of [49] with the correct interpretation of the left-hand side.

The bound (2.13) will be used in several ways: First, to show that the $K\log L$-large contours in a box of side-length $L$ are improbable, provided $K$ is large enough; this is a consequence of Lemma 2.5 below. The absence of such contours will be wielded to rule out the likelihood of other improbable scenarios. Finally, after all atypical situations have been dispensed with, the skeleton upper bound will deliver the contribution corresponding to the term $\sqrt\lambda$ in (1.11).

An important consequence of the skeleton upper bound is the following generalization of the Peierls estimate, which will be useful at several steps of the proof of our main theorems.

Lemma 2.5. Let $s=K\log L$ and let $\mathcal{S}_{L,K}$ denote the set of all $s$-skeletons that arise from contours in $\Lambda_L$. For each $\beta>\beta_c$ and $\alpha>0$, there is a $K_0=K_0(\alpha,\beta)<\infty$, such that
$$\sum_{\mathfrak{S}\subset\mathcal{S}_{L,K}}\exp\bigl\{-\alpha W_\beta(\mathfrak{S})\bigr\}\le1 \tag{2.15}$$
for (all $L$ and) all $K\ge K_0$.

Proof. Let $\mathcal{S}^0_{L,K}$ be the set of all $K\log L$-skeletons $S$ such that $S=(x_1,\dots,x_k)$ with $x_1=0$. By translation invariance,
$$\sum_{\mathfrak{S}\subset\mathcal{S}_{L,K}}e^{-\alpha W_\beta(\mathfrak{S})}\le\sum_{n\ge1}\Bigl(L^2\sum_{S\in\mathcal{S}^0_{L,K}}e^{-\alpha W_\beta(P(S))}\Bigr)^n, \tag{2.16}$$
where the prefactor $L^2$ accounts for the translation entropy of each skeleton within $\Lambda_L$. The latter sum can be estimated by mimicking the proof of Peierls’ bound, where contour entropy was bounded by that of the simple random walk on $\mathbb{Z}^2$. Indeed, each skeleton can be thought of as a sequence of steps with step-length entropy at most $32s^2$, where $s=K\log L$, and with each step weighted by a factor not exceeding $e^{-\alpha\tau_{\min}s}$. This and (2.11) yield
$$\sum_{S\in\mathcal{S}^0_{L,K}}e^{-\alpha W_\beta(P(S))}\le\sum_{m\ge1}\bigl(32s^2e^{-\alpha\tau_{\min}s}\bigr)^m. \tag{2.17}$$
By choosing $K_0$ sufficiently large, the right-hand side is less than $\frac12L^{-2}$ for all $K\ge K_0$. Using this in (2.16), the claim follows.
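The last step of the proof, choosing $K_0$ so that the geometric series on the right-hand side of (2.17) drops below $\frac12L^{-2}$, can be explored numerically. In the sketch below the product $\alpha\tau_{\min}$ is treated as a single free parameter `a`; its value, the range of box sizes, and the function names are all hypothetical choices made for illustration, since the paper does not specify numerical constants.

```python
import math


def single_skeleton_sum_bound(L, K, a):
    """Right-hand side of (2.17) with s = K log L and a = alpha * tau_min.

    The series sum_{m>=1} r^m with r = 32 s^2 exp(-a s) equals r / (1 - r)
    when r < 1 and diverges otherwise.
    """
    s = K * math.log(L)
    r = 32.0 * s * s * math.exp(-a * s)
    return r / (1.0 - r) if r < 1.0 else math.inf


def condition_holds(K, a, sizes):
    """True if the bound is at most (1/2) L^-2 for every L in `sizes`."""
    return all(single_skeleton_sum_bound(L, K, a) <= 0.5 * L ** -2
               for L in sizes)


def smallest_K(a, sizes):
    """Smallest integer K for which `condition_holds` on the given sizes."""
    K = 1
    while not condition_holds(K, a, sizes):
        K += 1
    return K


if __name__ == "__main__":
    sizes = range(2, 200)          # box sizes to test (illustrative)
    K0 = smallest_K(0.5, sizes)    # a = 0.5 is an arbitrary sample value
    print("smallest admissible integer K on this range:", K0)
```

Once the inner sum is at most $\frac12L^{-2}$, each term of the outer series in (2.16) is bounded by $2^{-n}$, and summing over $n\ge1$ gives the bound $1$ claimed in (2.15).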
Lemmas 2.4 and 2.5 will be used in the form of the following corollary:

Corollary 2.6. Let $\beta>\beta_c$, $L\ge1$ and $\kappa>0$ be fixed, and let $A$ be the set of configurations $\sigma$ such that $W_\beta(\mathfrak{S})\ge\kappa$ for at least one collection of $s$-skeletons $\mathfrak{S}$ satisfying $\mathfrak{S}\sim\sigma$. Let $\alpha\in(0,1)$, and let $K_0(\alpha,\beta)$ be as in Lemma 2.5. If $s=K\log L$ with $K\ge K_0(\alpha,\beta)$, then
$$P_L^{+,\beta}(A)\le e^{-(1-\alpha)\kappa}. \tag{2.18}$$
Proof. By the assumptions of the corollary, we have
$$P_L^{+,\beta}(A)\le\sum_{\substack{\mathfrak{S}\subset\mathcal{S}_{L,K}\\ W_\beta(\mathfrak{S})\ge\kappa}}P_L^{+,\beta}(\mathfrak{S}), \tag{2.19}$$
where we used the notation $P_L^{+,\beta}(\mathfrak{S})=P_L^{+,\beta}(\{\sigma\colon\sigma\sim\mathfrak{S}\})$. Lemma 2.4 then implies
$$P_L^{+,\beta}(A)\le\sum_{\substack{\mathfrak{S}\subset\mathcal{S}_{L,K}\\ W_\beta(\mathfrak{S})\ge\kappa}}e^{-W_\beta(\mathfrak{S})}\le e^{-(1-\alpha)\kappa}\sum_{\mathfrak{S}\subset\mathcal{S}_{L,K}}e^{-\alpha W_\beta(\mathfrak{S})}. \tag{2.20}$$
Here we wrote $e^{-W_\beta(\mathfrak{S})}=e^{-\alpha W_\beta(\mathfrak{S})}e^{-(1-\alpha)W_\beta(\mathfrak{S})}$ and then invoked the bound $W_\beta(\mathfrak{S})\ge\kappa$ to estimate $e^{-(1-\alpha)W_\beta(\mathfrak{S})}$ by $e^{-(1-\alpha)\kappa}$. Finally, we dropped the constraint $W_\beta(\mathfrak{S})\ge\kappa$ in the last sum. Since $s=K\log L$ with $K\ge K_0(\alpha,\beta)$, the last sum is less than one by Lemma 2.5.

Ideas similar to those used in the proof of Lemma 2.5 can be used to estimate the probability of the occurrence of an $s$-large contour:

Lemma 2.7. For each $\beta>\beta_c$, there exists a constant $\alpha(\beta)>0$ such that
$$P_A^{+,\beta}\bigl(\Gamma_s(\sigma)\ne\emptyset\bigr)\le|A|e^{-\alpha(\beta)s} \tag{2.21}$$
for any finite $A\subset\mathbb{Z}^2$ and any scale $s$.

Proof. Fix $\alpha>0$ and suppose, without loss of generality, that $|A|>1$ and $s\ge\alpha^{-1}\log|A|$. If $\Gamma_s(\sigma)\ne\emptyset$, the associated $s$-skeleton must satisfy $W_\beta(\mathfrak{S})\ge\tau_{\min}s$. Invoking (2.13) and a variant of the estimate (2.16–2.17) (here is where $s\ge\alpha^{-1}\log|A|$ enters into play), we show that $P_A^{+,\beta}(\Gamma_s(\sigma)\ne\emptyset)\le C|A|s^2e^{-\frac12\tau_{\min}s}$, where $C>0$ is a constant. From here the bound (2.21) follows by absorbing the factor $Cs^2$ into the exponential.

2.2.3. Quantitative estimates around Wulff minimum. The existence of a minimum for the functional (1.6) and a coarse-graining scheme supplemented with a bound of the type in (2.13) tell us the following: Consider a collection of contours, all of which are roughly of the same scale and which enclose a fixed total volume, and suppose that the value of the Wulff functional on a collection $\mathfrak{S}$ with $\mathfrak{S}\sim\Gamma$ is close to the Wulff minimum. Then (1) it must be the case that $\Gamma$ consists of a single contour and (2) the shape of this contour must be close to the Wulff shape. A quantitative (and mathematically precise) version of this statement is given in the forthcoming lemma:
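Lemma 2.8. For any $\beta>\beta_c$, there exist constants $\epsilon_0=\epsilon_0(\beta)\in(0,1)$, $c=c(\beta)>0$, and $C=C(\beta)<\infty$ such that the following holds for all $\epsilon\in(0,\epsilon_0)$: Let $\Gamma$ be a collection of contours such that $\operatorname{diam}\gamma>c\sqrt{\epsilon}\sqrt{|V(\Gamma)|}$ for all $\gamma\in\Gamma$ and let $s$ be a scale function satisfying $s\le\epsilon\sqrt{|V(\Gamma)|}$. Let $\mathfrak S$ be a collection of $s$-skeletons compatible with $\Gamma$, $\mathfrak S\sim\Gamma$, such that
\[ \mathscr W_\beta(\mathfrak S)\le w_1\sqrt{|V(\Gamma)|}\,(1+\epsilon). \tag{2.22} \]
Then $\Gamma$ consists of a single contour, $\Gamma=\{\gamma\}$, and there is an $x\in\mathbb R^2$ such that
\[ d_{\mathrm H}\bigl(V(\gamma),\sqrt{|V(\gamma)|}\,W+x\bigr)\le c\sqrt{\epsilon}\sqrt{|V(\gamma)|}, \tag{2.23} \]
where $W$ is the Wulff shape of unit area centered at the origin. Moreover,
\[ \bigl||V(\gamma)|-|V(\mathfrak S)|\bigr|\le C\epsilon|V(\gamma)|. \tag{2.24} \]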
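To make the Hausdorff distance appearing in (2.23) concrete, the following minimal sketch computes it for two finite point sets in the plane. It is only an illustration: the sets in the lemma are continuum regions, and the function name `hausdorff` and the sample point sets are our own hypothetical choices, not objects from the text.

```python
import math


def hausdorff(a, b):
    """Hausdorff distance between two non-empty finite point sets in R^2."""

    def one_sided(p, q):
        # Largest distance from a point of p to its nearest point of q.
        return max(min(math.dist(x, y) for y in q) for x in p)

    # The Hausdorff distance is the larger of the two one-sided distances.
    return max(one_sided(a, b), one_sided(b, a))


# Hypothetical sample data: corners of a unit square and a shifted copy.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
shifted = [(x + 0.25, y) for x, y in square]

if __name__ == "__main__":
    print(hausdorff(square, shifted))
```

In the setting of Lemma 2.8 one would compare (a discretization of) $V(\gamma)$ with the rescaled and shifted Wulff shape; a brute-force double loop such as the one above is adequate only for small point sets.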
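Proof. We begin by noting that, by the assumptions of the present lemma, $|V(\Gamma)|$ and $|V(\mathfrak S)|$ have to be of the same order of magnitude. More precisely, we claim that
\[ \bigl||V(\Gamma)|-|V(\mathfrak S)|\bigr|\le C\epsilon|V(\Gamma)| \tag{2.25} \]
holds with some $C=C(\beta)<\infty$ independent of $\Gamma$, $\mathfrak S$ and $\epsilon$. Indeed, from (2.11) and (2.22) we have
\[ \sum_{S\in\mathfrak S}|\mathsf P(S)|\le\tau_{\min}^{-1}\mathscr W_\beta(\mathfrak S)\le w_1(1+\epsilon)\tau_{\min}^{-1}\sqrt{|V(\Gamma)|}, \tag{2.26} \]
which, using Lemma 2.3 and the bounds $s\le\epsilon\sqrt{|V(\Gamma)|}$ and $\epsilon\le1$, gives (2.25) with $C=2g_3w_1\tau_{\min}^{-1}$.

The bound (2.25) essentially allows us to replace $V(\Gamma)$ by $V(\mathfrak S)$ in (2.22). Applying Theorem 2.10 from [27] to the set of skeletons $\mathfrak S$ rescaled by $|V(\mathfrak S)|^{-1/2}$, we can conclude that there is a point $x\in\mathbb R^2$ and a skeleton $S_0\in\mathfrak S$ such that
\[ d_{\mathrm H}\bigl(\mathsf P(S_0),\sqrt{|V(\mathfrak S)|}\,\partial W+x\bigr)\le\alpha\sqrt{\epsilon}\sqrt{|V(\mathfrak S)|}, \tag{2.27} \]
and
\[ \sum_{S\in\mathfrak S\setminus\{S_0\}}|\mathsf P(S)|\le\alpha\epsilon\sqrt{|V(\mathfrak S)|}, \tag{2.28} \]
where $\alpha$ is a constant proportional to the ratio of the maximum and the minimum of the surface tension. Using (2.25) once more, we can modify (2.27–2.28) by replacing $V(\mathfrak S)$ on the right-hand sides by $V(\Gamma)$ at the cost of changing $\alpha$ to $\alpha(1+C)$. Moreover, since (2.25) also implies that $\bigl|\sqrt{|V(\Gamma)|}-\sqrt{|V(\mathfrak S)|}\bigr|\le C\epsilon\sqrt{|V(\Gamma)|}$, we have
\[ d_{\mathrm H}\bigl(\sqrt{|V(\Gamma)|}\,\partial W,\sqrt{|V(\mathfrak S)|}\,\partial W\bigr)\le C\epsilon\,\operatorname{diam}W\sqrt{|V(\Gamma)|}. \tag{2.29} \]
Let $\gamma\in\Gamma$ be the contour corresponding to $S_0$. By the definition of skeletons, we have $d_{\mathrm H}(\gamma,\mathsf P(S_0))\le s\le\epsilon\sqrt{|V(\Gamma)|}$. Combining this with (2.29), the modified bound (2.27), and $\epsilon\le1$, we get
\[ d_{\mathrm H}\bigl(\gamma,\sqrt{|V(\Gamma)|}\,\partial W+x\bigr)\le c\sqrt{\epsilon}\sqrt{|V(\Gamma)|} \tag{2.30} \]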
M. Biskup, L. Chayes, R. Kotecký
for any $c\ge1+\alpha(1+C)+C\operatorname{diam}W$. (From the properties of $W$, it is easily shown that $\operatorname{diam}W$ is of the order of unity.)

Let us proceed by proving that $\Gamma=\{\gamma\}$. For any $\gamma'\in\Gamma\setminus\{\gamma\}$, let $S_{\gamma'}$ be the unique skeleton in $\mathfrak S$ such that $\gamma'\sim S_{\gamma'}$. Since $\operatorname{diam}\gamma'\le|\mathsf P(S_{\gamma'})|+s$ and since also $|\mathsf P(S_{\gamma'})|\ge s$, we have $\operatorname{diam}\gamma'\le2|\mathsf P(S_{\gamma'})|$. Using the modified bound (2.28), we get
\[ \operatorname{diam}\gamma'\le2|\mathsf P(S_{\gamma'})|\le2\alpha(1+C)\epsilon\sqrt{|V(\Gamma)|}. \tag{2.31} \]
If $c$ also satisfies the inequality $c>2\alpha(1+C)$, then this estimate contradicts the assumption that $\operatorname{diam}\gamma'\ge c\sqrt{\epsilon}\sqrt{|V(\Gamma)|}$ for all $\gamma'\in\Gamma$. Hence, $\Gamma=\{\gamma\}$ as claimed. Thus, $V(\Gamma)=V(\gamma)$ and the bound (2.24) is directly implied by (2.25). Moreover, (2.30) holds with $V(\Gamma)$ replaced by $V(\gamma)$ on both sides. To prove (2.23), it remains to show that the naked $\gamma$ on the left-hand side of (2.30) can be replaced by $V(\gamma)$. But that is trivial because $\gamma$ is the boundary of $V(\gamma)$ and the Hausdorff distance of two closed sets in $\mathbb R^2$ equals the Hausdorff distance of their boundaries.

2.3. Small-contour ensemble. The goal of this section is to collect some estimates for the probability in $P_L^{+,\beta}$ conditioned on the fact that all contours are $s$-small in the sense that $\Gamma_s(\sigma)=\emptyset$. Most of what is to follow appears, in various guises, in the existing literature (cf. Remark 7). For some of the estimates (Lemmas 2.9 and 2.10) we will actually provide a proof, while for others (Lemma 2.11) we can quote directly.

2.3.1. Estimates using the GHS inequality. The principal resource for what follows are two basic properties of the correlation function of Ising spins. Specifically, let $\langle\sigma_x;\sigma_y\rangle_{A,h}^{+,\beta}$ denote the truncated correlation function of the Ising model in a set $A\subset\mathbb Z^2$ with plus boundary condition, in non-negative inhomogeneous external fields $h=(h_x)$ and inverse temperature $\beta$. Then:

(1) If $\beta>\beta_c$, then the correlations in infinite volume decay exponentially, i.e., we have
\[ \langle\sigma_x;\sigma_y\rangle_{\mathbb Z^2,h}^{+,\beta}\le e^{-\|x-y\|/\xi} \tag{2.32} \]
for some $\xi=\xi(\beta)<\infty$ and all $x$ and $y$.

(2) The GHS inequality implies that the finite-volume correlation function, $\langle\sigma_x;\sigma_y\rangle_{A,h}^{+,\beta}$, is dominated by the infinite-volume correlation function at any pointwise-smaller field:
\[ 0\le\langle\sigma_x;\sigma_y\rangle_{A,h}^{+,\beta}\le\langle\sigma_x;\sigma_y\rangle_{\mathbb Z^2,h'}^{+,\beta} \tag{2.33} \]
for all $A\subset\mathbb Z^2$ and all $h'=(h'_x)$ with $h'_x\in[0,h_x]$ for all $x$.

Note that, via (2.33), the exponential decay (2.32) holds uniformly in $A\subset\mathbb Z^2$. Part (1) is a consequence of the main result of [24], see [53]; the GHS inequality from part (2) dates back to [34].

Now we are ready to state the desired estimates. Let $A\subset\mathbb Z^2$ be a finite set and let $s$ be a scale function. Let $P_A^{+,\beta,s}$ be the Gibbs measure of the Ising model in $A\subset\mathbb Z^2$ conditioned on the event $\{\Gamma_s(\sigma)=\emptyset\}$ and let us use $\langle-\rangle_A^{+,\beta,s}$ to denote the expectation with respect to $P_A^{+,\beta,s}$. Then we have the following bounds:
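Lemma 2.9. For each $\beta>\beta_c$, there exist constants $\alpha_1(\beta)$ and $\alpha_2(\beta)$ such that
\[ \bigl|\langle M_A\rangle_A^{+,\beta,s}-m^\star|A|\bigr|\le\alpha_1(\beta)\bigl(|\partial A|+|A|^2e^{-\alpha_2(\beta)s}\bigr) \tag{2.34} \]
for each finite set $A\subset\mathbb Z^2$ and any scaling function $s$. Moreover, if $A'\subset A$, then
\[ \bigl|\langle M_A\rangle_A^{+,\beta,s}-\langle M_{A\setminus A'}\rangle_{A\setminus A'}^{+,\beta,s}\bigr|\le\alpha_1(\beta)\bigl(|A'|+|A|^2e^{-\alpha_2(\beta)s}\bigr). \tag{2.35} \]

Proof. By Lemma 2.7, we have $P_A^{+,\beta}(\Gamma_s(\sigma)\ne\emptyset)\le|A|e^{-\alpha_2s}$ for some $\alpha_2>0$, independent of $A$. Note that we can suppose that $|A|e^{-\alpha_2s}$ does not exceed, e.g., $1/2$, because otherwise (2.34–2.35) can be ensured by deterministic estimates. An easy bound then shows that, for some $\alpha_1=\alpha_1(\beta)<\infty$,
\[ \bigl|\langle M_A\rangle_A^{+,\beta,s}-\langle M_A\rangle_A^{+,\beta}\bigr|\le\alpha_1|A|^2e^{-\alpha_2s}. \tag{2.36} \]
Therefore, it suffices to prove the bounds (2.34–2.35) without the restriction to the ensemble of $s$-small contours. The proof will use that, for any $B\subset\mathbb Z^2$ we have
\[ 0\le\langle\sigma_x\rangle_B^{+,\beta}-\langle\sigma_x\rangle_{B\cup\{y\}}^{+,\beta}\le e^{-\|x-y\|/\xi}. \tag{2.37} \]
This inequality is a direct consequence of properties (1-2) above. The original derivation goes back to [17]. The bound (2.37) immediately implies both (2.34) and (2.35). Indeed, using (2.37) for all $x\in A$ and $y\in B\setminus A$, we have for all $A\subseteq B\subseteq\mathbb Z^2$ that
\[ 0\le\langle M_A\rangle_A^{+,\beta}-\langle M_A\rangle_B^{+,\beta}\le\sum_{x\in A}\sum_{y\in B\setminus A}e^{-\|x-y\|/\xi}\le\alpha_1|\partial A|, \tag{2.38} \]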
where $\alpha_1=\alpha_1(\beta)<\infty$. This and (2.36) directly imply (2.34). To get (2.35), we also need to note that $|M_A-M_{A\setminus A'}|\le|A'|$.

Our next claim concerns an upper bound on the probability that the magnetization in the plus state deviates from its mean by a positive amount:

Lemma 2.10. Let $\beta>\beta_c$ and let $\chi=\chi(\beta)$ be the susceptibility. Then there exists a constant $K=K(\beta)$ such that
\[ P_A^{+,\beta,s}\bigl(M_A\ge\langle M_A\rangle_A^{+,\beta}+m^\star v\bigr)\le2\exp\Bigl\{-\frac{(vm^\star)^2}{2\chi|A|}\Bigr\} \tag{2.39} \]
for any finite $A\subset\mathbb Z^2$, any $v\ge0$, and any $s\ge K\log|A|$.

Proof. Let $\mathcal M$ denote the event $\mathcal M=\{\sigma\colon M_A\ge\langle M_A\rangle_A^{+,\beta}+m^\star v\}$. By Lemma 2.7 we have that $P_A^{+,\beta,s}(\mathcal M)\le2P_A^{+,\beta}(\mathcal M)$, so we just need to estimate $P_A^{+,\beta}(\mathcal M)$. Consider the cumulant generating function $F_A^{+,\beta}(h)=\log\langle e^{hM_A}\rangle_A^{+,\beta}$. The exponential Chebyshev inequality then gives
\[ \log P_A^{+,\beta}(\mathcal M)\le F_A^{+,\beta}(h)-h\langle M_A\rangle_A^{+,\beta}-hm^\star v,\qquad h\ge0. \tag{2.40} \]
By property (2) of the truncated correlation function, we get
\[ \frac{d^2F_A^{+,\beta}}{dh^2}(h)=\langle M_A;M_A\rangle_{A,\mathbf h}^{+,\beta}\le\langle M_A;M_A\rangle_{A,\mathbf 0}^{+,\beta}, \tag{2.41} \]
where $\mathbf h=(h_x)$ with $h_x=h$ for all $x\in\mathbb Z^2$ and where $\mathbf 0$ is the zero field. Since $F_A^{+,\beta}(0)=0$ and $\frac{d}{dh}F_A^{+,\beta}(0)=\langle M_A\rangle_A^{+,\beta}$, we get the bound
\[ F_A^{+,\beta}(h)\le h\langle M_A\rangle_A^{+,\beta}+\frac{h^2}{2}\langle M_A;M_A\rangle_{A,\mathbf 0}^{+,\beta}. \tag{2.42} \]
Now, once more by property (2) above,
\[ |A|^{-1}\langle M_A;M_A\rangle_{A,\mathbf 0}^{+,\beta}\le|A|^{-1}\langle M_A;M_A\rangle_{\mathbb Z^2,\mathbf 0}^{+,\beta}\le|A|^{-1}\sum_{x\in A}\sum_{y\in\mathbb Z^2}\langle\sigma_x;\sigma_y\rangle^{+,\beta}=\chi, \tag{2.43} \]
where the sums converge by property (1) above. The claim now follows by optimizing over $h$. Indeed, inserting (2.42) and (2.43) into (2.40) yields $\log P_A^{+,\beta}(\mathcal M)\le\frac12h^2\chi|A|-hm^\star v$, and the choice $h=m^\star v/(\chi|A|)$ gives the exponent $-(vm^\star)^2/(2\chi|A|)$.

Remark 7. The bound in Lemma 2.10 corresponds to Eq. (9.33) of Proposition 9.1 in [49] proved with the help of Lemma 5.1 from [48]. Similarly, the estimates in Lemma 2.9 are closely related to the bounds in Lemma 2.2.1 of [40]. We included the proofs of both statements to pinpoint the exact formulation needed for our analysis as well as to reduce the number of extraneous references.

2.3.2. Gaussian control of negative deviations. Our last claim concerns the deviations of the plus magnetization in the negative direction. Unlike in the previous section, here the restriction to the small contours is crucial because, obviously, if the deviation is too large, there is a possibility of forming a droplet which cannot be controlled by bulk estimates.

Let $\beta>\beta_c$ and let $v$ be such that $\langle M_A\rangle_A^{+,\beta,s}-2m^\star v$ is an allowed value of $M_A$. Define $\Omega_A^s(v)$ by the expression
\[ P_A^{+,\beta,s}\bigl(M_A=\langle M_A\rangle_A^{+,\beta,s}-2vm^\star\bigr)=\frac{1}{\sqrt{2\pi\chi|A|}}\exp\Bigl\{-2\frac{(m^\star)^2}{\chi|A|}v^2+\Omega_A^s(v)\Bigr\}. \tag{2.44} \]
Then we have:

Lemma 2.11 (Gaussian estimate). For each $\beta>\beta_c$ and each set of positive constants $a_1,a_2,a_3$, there are constants $C<\infty$ and $K<\infty$ such that if $s=K\log L$, then
\[ \bigl|\Omega_A^s(v)\bigr|\le C\max\Bigl\{K\frac{v^2\log L}{L^3},\frac{v^3}{L^4}\Bigr\} \tag{2.45} \]
for all allowed values of $v$ such that
\[ 0\le v\le a_1\frac{L^2}{\log L} \tag{2.46} \]
and all connected sets $A\subset\mathbb Z^2$ such that
\[ a_2L^2\le|A|\le L^2\quad\text{and}\quad|\partial A|\le a_3L\log L. \tag{2.47} \]

Proof. This is a reformulation of (a somewhat nontrivial) Lemma 2.3.3 from [40].
3. Lower Bound

In this section we establish a lower bound for the asymptotic stated in (1.11). In addition to its contribution to the proof of Theorem 1.1, this lower bound will play an essential role in the proofs of Theorem 1.2 and Corollary 1.3. A considerable part of the proof hinges on the Fortuin-Kasteleyn representation of the Ising (and Potts) models, which makes the technical demands of this section rather different from those of the following sections.

3.1. Large-deviation lower bound. This section is devoted to the proof of the following theorem:

Theorem 3.1 (Lower bound). Let $\beta>\beta_c$ and let $(v_L)$ be a sequence of positive numbers such that $m^\star|\Lambda_L|-2m^\star v_L$ is an allowed value of $M_L$ for all $L$. Suppose that the limit $\Delta$ in (1.10) exists with $\Delta\in(0,\infty)$. Then there exists a sequence $(\epsilon_L)$ with $\epsilon_L\to0$ such that
\[ P_L^{+,\beta}\bigl(M_L=m^\star|\Lambda_L|-2m^\star v_L\bigr)\ge\exp\Bigl\{-w_1\sqrt{v_L}\Bigl(\inf_{0\le\lambda\le1}\Phi_\Delta(\lambda)+\epsilon_L\Bigr)\Bigr\} \tag{3.1} \]
holds for all $L$.

Remark 8. It is worth noting that, unlike in the corresponding statements of the lower bounds in [27, 40], we do not require any control over how fast the error $\epsilon_L$ tends to zero as $L\to\infty$. Indeed, it turns out that in the regime of finite $\Delta$, the simple convergence $\epsilon_L\to0$ will be enough to prove our main results. However, in the cases when $v_L$ tends to infinity so fast that $\Delta$ is infinite, a proof would probably also need some information about the rate of the convergence $\epsilon_L\to0$.

The strategy of the proof will simply be to produce a near-Wulff droplet that comprises a particular fraction of the volume $v_L$. The droplet will account for its requisite share of the deficit magnetization and we then force the exterior to absorb the rest. The probability of the latter event is estimated by using the truncated contour ensemble.

Let us first attend to the production of the droplet. Consider the Wulff shape $W$ of unit area centered at the origin and a closed, self-avoiding polygonal curve $\mathsf P\subset W$. We will assume that the vertices of $\mathsf P$ have rational coordinates and, if $N$ denotes the number of vertices of $\mathsf P$, that each vertex is at most $1/N$ away from the boundary of $W$. Let $\operatorname{Int}\mathsf P$ denote the set of points $x\in\mathbb R^2$ surrounded by $\mathsf P$. For any $t,r>1$, let $\mathsf P_0,\mathsf P_1,\mathsf P_2,\mathsf P_3$ be four magnified copies of $\mathsf P$ obtained by rescaling $\mathsf P$ by factors $t$, $t+r$, $t+2r$, and $t+3r$, respectively. (Thus, for instance, $\mathsf P_0=\{x\in\mathbb R^2\colon x/t\in\mathsf P\}$.) This yields three “coronas” $K^{\rm I}_{t,r}=\operatorname{Int}\mathsf P_1\setminus\operatorname{Int}\mathsf P_0$, $K^{\rm II}_{t,r}=\operatorname{Int}\mathsf P_2\setminus\operatorname{Int}\mathsf P_1$, and $K^{\rm III}_{t,r}=\operatorname{Int}\mathsf P_3\setminus\operatorname{Int}\mathsf P_2$ surrounding $\mathsf P_0$. Let $\mathbb K^{\rm I}_{t,r}=K^{\rm I}_{t,r}\cap\mathbb Z^2$, and similarly for $\mathbb K^{\rm II}_{t,r}$ and $\mathbb K^{\rm III}_{t,r}$.

Recall that a $*$-connected circuit in $\mathbb Z^2$ is a closed path on vertices of $\mathbb Z^2$ whose elementary steps connect either nearest or next-nearest neighbors. Let $\mathcal E_{t,r}$ be the set of configurations $\sigma$ such that $\mathbb K^{\rm I}_{t,r}$ contains a $*$-connected circuit of sites $x\in\mathbb Z^2$ with $\sigma_x=-1$ and $\mathbb K^{\rm III}_{t,r}$ contains a $*$-connected circuit of sites $x\in\mathbb Z^2$ with $\sigma_x=+1$.

The essential part of our lower bound comes from the following estimate:

Lemma 3.2. Let $\beta>\beta_c$ and let $\mathsf P$ be a polygonal curve as specified above. For any pair of sequences $(t_L)$ and $(r_L)$ tending to infinity as $L\to\infty$ in such a way that
\[ t_LL^{-1}\to0,\qquad t_Lr_Le^{-r_L\tau_{\min}/3}\to0\qquad\text{and}\qquad r_Lt_L^{-1}\to0, \tag{3.2} \]
Fig. 2. An illustration of the “coronas” $\mathbb K^{\rm I}_{t,r}$, $\mathbb K^{\rm II}_{t,r}$, $\mathbb K^{\rm III}_{t,r}$, the sets INT and EXT, and the $*$-connected circuits $C_+$ and $C_-$ of plus and minus sites, respectively, which are used in Lemma 3.2 and the proof of Theorem 3.1. Going from inside out, the four polygons correspond to $\mathsf P_0$, $\mathsf P_1$, $\mathsf P_2$ and $\mathsf P_3$; the shaded region denotes the set $A_\pm$.
there is a sequence $(\epsilon_L)$ with $\epsilon_L\to0$ such that
\[ P_L^{+,\beta}(\mathcal E_{t_L,r_L})\ge\exp\bigl\{-t_L\mathscr W_\beta(\mathsf P)(1+\epsilon_L)\bigr\}, \tag{3.3} \]
for all $L\ge1$.

The proof of this lemma requires some substantial preparations and is therefore deferred to Sect. 3.2. Using Lemma 3.2, we can prove Theorem 3.1.

Proof of Theorem 3.1. Let us introduce the abbreviation
\[ \mathcal M_L=\bigl\{\sigma\colon M_L=m^\star|\Lambda_L|-2m^\star v_L\bigr\} \tag{3.4} \]
for the central event in question. Suppose first that $\Delta\le\Delta_c$, where $\Delta_c$ is as in (1.18). Proposition 2.1 then guarantees that $\inf_{0\le\lambda\le1}\Phi_\Delta(\lambda)=\Phi_\Delta(0)=\Delta$. In particular, there is no need to produce a droplet in the system. Let $s=K\log L$. By restricting to the set of configurations $\{\sigma\colon\Gamma_s(\sigma)=\emptyset\}$ we get
\[ P_L^{+,\beta}(\mathcal M_L)\ge P_L^{+,\beta,s}(\mathcal M_L)\,P_L^{+,\beta}\bigl(\Gamma_s(\sigma)=\emptyset\bigr). \tag{3.5} \]
The resulting lower bound is then a consequence of (2.44), Lemma 2.11 and Lemma 2.7, provided $K$ is sufficiently large.

To handle the remaining cases, $\Delta>\Delta_c$, we will have to produce a droplet. Fix a polygon $\mathsf P$ with the above properties, let $\operatorname{Vol}(\mathsf P)$ denote the two-dimensional Lebesgue volume of its interior, and let $|\mathsf P|$ denote the size (i.e., length) of its boundary. Let $\lambda=\lambda_\Delta$, where $\lambda_\Delta$ is as defined in (1.19), and recall that, for this choice of $\lambda$, we have $\Phi_\Delta(\lambda)=\inf_{0\le\lambda'\le1}\Phi_\Delta(\lambda')$ and $\lambda\ge\lambda_c>0$. Since the goal is to produce a droplet of volume $\lambda v_L$, we let $t_L=\sqrt{\lambda v_L}$ and pick $r_L$ to be such that (3.2) holds as $L\to\infty$.
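As a sanity check on the choice of scales, the sketch below evaluates the three quantities in (3.2) for one admissible-looking choice of $r_L$. Everything specific here is our own assumption for illustration: the scaling $v_L=L^{4/3}$ (the order of magnitude for which $v_L^{3/2}/L^2$ stays bounded), the value $\lambda=1/2$, the normalization $\tau_{\min}=1$, and the choice $r_L=(\log L)^2$. The text itself only requires that (3.2) hold.

```python
import math


def scales(L, lam=0.5, tau_min=1.0):
    """Return the three quantities of (3.2) for hypothetical scales.

    Assumptions (not from the text): v_L = L^(4/3), t_L = sqrt(lam * v_L),
    and r_L = (log L)^2.
    """
    v = L ** (4.0 / 3.0)
    t = math.sqrt(lam * v)
    r = math.log(L) ** 2
    return (
        t / L,                                 # t_L / L
        t * r * math.exp(-r * tau_min / 3.0),  # t_L r_L exp(-r_L tau_min / 3)
        r / t,                                 # r_L / t_L
    )


if __name__ == "__main__":
    for L in (1e3, 1e6, 1e9):
        print(L, scales(L))
```

With these hypothetical choices all three quantities shrink as $L$ grows, which is consistent with (3.2); any $r_L$ growing faster than a large multiple of $\log t_L$ but slower than $t_L$ would serve equally well.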
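Abbreviating $\mathcal E_L=\mathcal E_{t_L,r_L}$, we let $(\epsilon_L)$ denote the corresponding sequence from Lemma 3.2. (Note that $\epsilon_L$ may depend on $\mathsf P$.) For configurations in $\mathcal E_L$, let $C_+$ be the innermost $*$-connected circuit of plus spins in $\mathbb K^{\rm III}_{t,r}$ and let $C_-$ denote the outermost $*$-connected circuit of minus spins in $\mathbb K^{\rm I}_{t,r}$. Let INT be the set of sites in the interior of $C_-$ and let EXT be the set of sites in $\Lambda_L$ that are in the exterior of $C_+$. (Thus, we have $\mathrm{INT}\cap C_-=\mathrm{EXT}\cap C_+=\emptyset$.) Further, let $A_\pm=\Lambda_L\setminus(\mathrm{INT}\cup\mathrm{EXT})$ and use $\sigma_\pm$ to denote the spin configuration on $A_\pm$. Let $M_{\rm INT}$, $M_{\rm EXT}$ and $M_\pm$ denote the overall magnetization in INT, EXT and $A_\pm$, respectively. Finally, let us abbreviate $\mu_{\rm INT}=\langle M_{\rm INT}\rangle_{\rm INT}^{+,\beta,s}$ and introduce the event $\mathcal E'_L=\{\sigma\in\mathcal E_L\colon M_{\rm INT}=-\mu_{\rm INT}\}$.

The lower bound on $P_L^{+,\beta}(\mathcal M_L)$ will be derived by restricting to the event $\mathcal E'_L$, conditioning on $\sigma_\pm$, extracting the probability of having the correct magnetization in $\Lambda_L\setminus A_\pm$, and applying Lemma 2.11 to retrieve the contribution from droplet surface tension. The first two steps of this program give
\[ P_L^{+,\beta}(\mathcal M_L)\ge P_L^{+,\beta}(\mathcal M_L\cap\mathcal E'_L)\ge\sum_{\sigma_\pm}P_L^{+,\beta}(\mathcal M_L\cap\mathcal E'_L\,|\,\sigma_\pm)\,P_L^{+,\beta}(\sigma_\pm). \tag{3.6} \]
Our next goal is to produce a lower bound of the type (3.1) on $P_L^{+,\beta}(\mathcal M_L\cap\mathcal E'_L\,|\,\sigma_\pm)$, uniformly in $\sigma_\pm$. The advantage of conditioning on a fixed configuration is that, if $\mathcal M_L\cap\mathcal E'_L\cap\{\sigma_\pm\}$ occurs, the overall magnetizations in INT and EXT are fixed. Thus, on $\mathcal M_L\cap\mathcal E'_L\cap\{\sigma_\pm\}$ we get
\[ M_{\rm EXT}=M_L-M_\pm-M_{\rm INT}=\langle M_{\rm EXT}\rangle_{\rm EXT}^{+,\beta,s}-2m^\star v_L\bigl(1-\lambda\operatorname{Vol}(\mathsf P)-\delta_L\bigr), \tag{3.7} \]
where $\delta_L=\delta_L(\sigma_\pm)$ is given by the equation $2m^\star v_L\delta_L=\mathrm I+\mathrm{II}+\mathrm{III}+\mathrm{IV}$ with I–IV defined by
\[ \mathrm I=\mu_{\rm INT}-m^\star|\mathrm{INT}|,\qquad\mathrm{II}=-\langle M_{\rm EXT}\rangle_{\rm EXT}^{+,\beta,s}+m^\star|\mathrm{EXT}|, \tag{3.8} \]
\[ \mathrm{III}=-M_\pm+m^\star|A_\pm|,\qquad\mathrm{IV}=2m^\star\bigl(|\mathrm{INT}|-\lambda\operatorname{Vol}(\mathsf P)v_L\bigr). \tag{3.9} \]
To estimate I–IV, we first notice the geometric bounds
\[ \begin{aligned} t_L^2\operatorname{Vol}(\mathsf P)-t_L|\mathsf P|\le|\mathrm{INT}|&\le(t_L+r_L)^2\operatorname{Vol}(\mathsf P)+(t_L+r_L)|\mathsf P|,\\ |A_\pm|&\le\bigl((t_L+3r_L)^2-t_L^2\bigr)+(t_L+3r_L)|\mathsf P|, \end{aligned} \tag{3.10} \]
and recall that, since both $C_+$ and $C_-$ are contained in $A_\pm$, we have $|C_-|,|C_+|\le|A_\pm|$. Lemma 2.9 for $s=K\log L$ then allows us to estimate $|\mathrm I|\le\alpha_1(\beta)(|A_\pm|+|\mathrm{INT}|^2L^{-\alpha_2(\beta)K})$ and, similarly, $|\mathrm{II}|\le\alpha_1(\beta)(|A_\pm|+4L+L^{4-\alpha_2(\beta)K})$, while the remaining two quantities are bounded by invoking $|\mathrm{III}|\le2|A_\pm|$ and $|\mathrm{IV}|\le4r_Lt_L+2r_L^2+2(t_L+r_L)|\mathsf P|$. Using that $r_L=o(\sqrt{v_L})$ and $t_L=O(\sqrt{v_L})$, we have $|A_\pm|=o(v_L)$ as $L\to\infty$. Moreover, if $K$ is so large that $4-\alpha_2(\beta)K<4/3$, we also have $|\mathrm{INT}|^2L^{-\alpha_2(\beta)K}\le L^{4-\alpha_2(\beta)K}=o(v_L)$ as $L\to\infty$. Combining these bounds, it is easy to show that $|\delta_L(\sigma_\pm)|\le\bar\delta_L$ for all $\sigma_\pm$, where $\bar\delta_L$ is a sequence such that $\lim_{L\to\infty}\bar\delta_L=0$.

Now we are ready to estimate the probability that both INT and EXT produce their share of magnetization deficit. Note first that
\[ P_{\rm INT}^{-,\beta}(M_{\rm INT}=-\mu_{\rm INT})\ge P_{\rm INT}^{-,\beta,s}(M_{\rm INT}=-\mu_{\rm INT})\,P_{\rm INT}^{-,\beta}\bigl(\Gamma_s(\sigma)=\emptyset\bigr). \tag{3.11} \]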
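Using Lemmas 2.11 and 2.7, we get $P_{\rm INT}^{-,\beta}(M_{\rm INT}=-\mu_{\rm INT})\ge CL^{-2/3}$ for some $C=C(\beta)>0$. On the other hand, letting $\mathcal M_{\rm EXT}=\{\sigma\colon M_{\rm EXT}=\langle M_{\rm EXT}\rangle_{\rm EXT}^{+,\beta,s}-2m^\star v_L(1-\lambda\operatorname{Vol}(\mathsf P)-\delta_L)\}$, a bound similar to (3.11) for $P_{\rm EXT}^{+,\beta}$ combined with Lemmas 2.11 and 2.7 yields
\[ P_{\rm EXT}^{+,\beta}(\mathcal M_{\rm EXT})\ge\frac{C'}{\sqrt{|\mathrm{EXT}|}}\exp\Bigl\{-2\frac{(m^\star v_L)^2}{\chi|\mathrm{EXT}|}\bigl(1-\lambda\operatorname{Vol}(\mathsf P)-\delta_L\bigr)^2\Bigr\}, \tag{3.12} \]
where $C'=C'(\beta)>0$ is independent of $\sigma_\pm$ contributing to (3.6). Combining the previous estimates, we can use Lemma 3.2 to extract the surface energy term. The result is
\[ P_L^{+,\beta}(\mathcal M_L)\ge C''L^{-5/3}\exp\bigl\{-w_1\sqrt{v_L}\,\Phi_L-\epsilon_L\sqrt{v_L}\bigr\}, \tag{3.13} \]
where $C''=C''(\beta)>0$ and where $\Phi_L$ stands for the quantity
\[ \Phi_L=\frac{\mathscr W_\beta(\mathsf P)}{w_1}\sqrt\lambda+\frac{2(m^\star)^2\chi^{-1}w_1^{-1}v_L^{3/2}}{L^2-(t_L+r_L)^2}\bigl(1-\lambda\operatorname{Vol}(\mathsf P)+\bar\delta_L\bigr)^2. \tag{3.14} \]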
As is clear from our previous reasoning, the quantity $\Phi_L$ can be made arbitrarily close to $\Phi_\Delta(\lambda)$ by letting $L\to\infty$ and optimizing over $\mathsf P$ with the above properties. The existence of the desired sequence $(\epsilon_L)$ then follows by the definition of the limit.

3.2. Results using random-cluster representation. In this section we establish some technical results necessary for the completion of the proof of our lower bound. These results are stated mostly in terms of the random cluster counterpart of the Ising model; the crowning achievement, which is Lemma 3.5, gives immediately the proof of Lemma 3.2. We remark that the latter is the sum total of what this section contributes to the proof of Theorem 3.1. The uninterested, or well-informed, readers are invited to skip the entire section, provided they are prepared to accept Lemma 3.2 without a proof.

3.2.1. Preliminaries. The random cluster representation for the Ising (and Potts) ferromagnets is by now a well established tool. The purpose of the following remarks is to define our notation; for more background and details we refer the reader to, e.g., [12, 35] or the excellent review [32].

Let $\mathbb T\subset\mathbb Z^2$ denote a finite graph. A bond configuration, generically denoted by $\omega$, is the assignment of a zero (vacant) or a one (occupied) to each bond in $\mathbb T$. The weight of a configuration $\omega$ is given, informally, by $R^{|\omega|}q^{C(\omega)}$, where $|\omega|$ denotes the number of occupied bonds and $C(\omega)$ denotes the number of connected components. For the Ising system at hand we have $q=2$ and $R=e^{2\beta}-1$. The precise meaning of $C(\omega)$ depends on the boundary conditions; of concern here are the so called free and wired boundary conditions. In the former, $C(\omega)$ is the usual number of connected components including the isolated sites, while in the latter all clusters touching the bond-complement of $\mathbb T$ are identified as a single component.
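The weight just described can be sketched in a few lines for free boundary conditions. This is only a minimal illustration under our own assumptions: sites are arbitrary hashable labels, a bond configuration is represented by the list of its occupied bonds, and the helper names `count_components` and `fk_weight` are hypothetical. The wired case, which additionally merges all clusters touching the boundary, is not implemented here.

```python
import math


def count_components(sites, occupied):
    """Number of connected components, isolated sites included (free b.c.)."""
    parent = {s: s for s in sites}

    def find(s):
        # Union-find root lookup with path halving.
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    for x, y in occupied:
        parent[find(x)] = find(y)
    return len({find(s) for s in sites})


def fk_weight(sites, occupied, beta, q=2):
    """Unnormalized random-cluster weight R^{|omega|} q^{C(omega)}, R = e^{2 beta} - 1."""
    R = math.exp(2.0 * beta) - 1.0
    return R ** len(occupied) * q ** count_components(sites, occupied)


# Hypothetical example: the four sites of a 2 x 2 block and its four bonds.
sites = [(0, 0), (1, 0), (0, 1), (1, 1)]
bonds = [((0, 0), (1, 0)), ((0, 0), (0, 1)), ((1, 0), (1, 1)), ((0, 1), (1, 1))]

if __name__ == "__main__":
    beta = 0.5 * math.log(2.0)  # chosen so that R = 1
    print(fk_weight(sites, [], beta), fk_weight(sites, bonds, beta))
```

Normalizing these weights over all $2^{|\text{bonds}|}$ configurations would give the free random-cluster measure on this small graph.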
The free and wired random-cluster measures in $\Lambda_L$, denoted by $P^{\mathrm{free},\beta}_{L,\mathrm{FK}}$ and $P^{\mathrm w,\beta}_{L,\mathrm{FK}}$, respectively, correspond to the free and plus (or minus) boundary conditions in the Ising spin system. Both random-cluster measures enjoy the FKG property and the wired measure stochastically dominates the free measure. The infinite volume limits of these measures also exist; we denote these limiting objects by $P^{\mathrm{free},\beta}_{\mathrm{FK}}$ and $P^{\mathrm w,\beta}_{\mathrm{FK}}$. The most important type of event we shall consider is the event that sites are connected by paths
of occupied bonds. Our notation is as follows: If $x,y\in\mathbb T$, we define $\{x\longleftrightarrow y\}$ to be the event that there is such a connection. If we demand the existence of a path using only bonds with both ends in some subgraph $A\subset\mathbb T$, we write $\{x\underset{A}{\longleftrightarrow}y\}$.

The next concept we need to discuss is duality. For any $\mathbb T\subset\mathbb Z^2$, the dual graph $\mathbb T^*$ is defined as follows: Each bond of $\mathbb T$ is transversal to a bond on $(\mathbb Z+\frac12)\times(\mathbb Z+\frac12)=(\mathbb Z^2)^*$. These bonds are the bonds of $\mathbb T^*$; the sites of $\mathbb T^*$ are the endpoints of these bonds. Each configuration $\omega$ induces a configuration on the dual graph via the correspondence “direct occupied” with “dual vacant” and vice versa. It turns out that, if we start with either free or wired boundary conditions on $\mathbb T$, the weights for the dual configurations are also random-cluster weights with parameters $(q^*,R^*)=(q,q/R)$, provided we also interchange the designation of “free” and “wired.” Of course, the graph and its dual are not precisely the same. For example, if we examine the relevant graph for the problem dual to the wired system in $\Lambda_L$, this consists of an $(L+1)\times(L+1)$ rectangle with the corners missing. Moreover, because the boundary conditions on the dual graph are free, all dual edges touching the boundary sites are occupied independently of the rest of the configuration. Thus, ignoring these decoupled degrees of freedom, the restricted measure is equivalent to a free measure on $\Lambda_{L-1}$.

In general, we will use $\beta^*$ to denote the inverse temperature dual to $\beta$, which, for $q=2$ and the normalization of the Hamiltonian (1.1), is related to $\beta$ via $\beta^*=\frac12\log\coth\beta$. The critical temperature is self dual, i.e., $\beta_c=\frac12\log\coth\beta_c$. For $\beta>\beta_c$, the dual model is in the high-temperature phase. Hence, the limiting free and wired measures at $\beta^*$ coincide and, using the well-known relation between the spin-correlations and the connectivity functions in the FK representation, we have
\[ P^{\mathrm{free},\beta^*}_{\mathrm{FK}}(x\longleftrightarrow y)=P^{\mathrm w,\beta^*}_{\mathrm{FK}}(x\longleftrightarrow y)=\langle\sigma_x\sigma_y\rangle^{+,\beta^*}, \tag{3.15} \]
for all $x,y\in\mathbb Z^2$. Thus, the exponential decay of correlations in the spin system at high temperatures, $\langle\sigma_x\sigma_y\rangle^{+,\beta^*}\le e^{-\|x-y\|/\xi}$ where $\xi=\xi(\beta^*)$ is the correlation length, corresponds to an exponential decay of the connectivity probabilities. In particular, the surface tension at $\beta>\beta_c$, as defined in (1.5) for unit vectors $n$ with rationally related components, is the inverse of the correlation length for two point connectivity functions in the direction $n$ at inverse temperature $\beta^*$.

3.2.2. Decay estimates. Here we assemble two important ingredients for the proof of Lemma 3.2. We begin by quantifying the decay of the point-to-boundary connectivity function:

Lemma 3.3. Consider the $q=2$ random cluster model at $\beta<\beta_c$ (which corresponds to the high-temperature phase of the Ising system). Then,
\[ P^{\mathrm w,\beta}_{\ell,\mathrm{FK}}\bigl(\{0\longleftrightarrow\partial\Lambda_\ell\}\bigr)\le4\ell e^{-\ell/\xi} \tag{3.16} \]
for all $\ell\ge1$.

Proof. This is one portion of the proof of Proposition 4.1 in [23].
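The duality relations quoted in Sect. 3.2.1 are easy to check numerically. The sketch below, whose function names are our own, verifies three facts for $q=2$: that $\beta\mapsto\beta^*=\frac12\log\coth\beta$ is an involution, that the dual parameter satisfies $R^*=q/R$ with $R=e^{2\beta}-1$, and that the fixed point of the map is $\frac12\log(1+\sqrt2)$, the standard critical value of the square-lattice Ising model in this normalization.

```python
import math


def dual_beta(beta):
    """Dual inverse temperature beta* = (1/2) log coth(beta), for beta > 0."""
    return 0.5 * math.log(1.0 / math.tanh(beta))


def R(beta):
    """Random-cluster parameter R = e^{2 beta} - 1."""
    return math.exp(2.0 * beta) - 1.0


# Self-dual point: solves coth(beta) = e^{2 beta}, i.e. e^{2 beta} = 1 + sqrt(2).
beta_c = 0.5 * math.log(1.0 + math.sqrt(2.0))

if __name__ == "__main__":
    beta = 0.7  # an arbitrary low-temperature value, beta > beta_c
    print(beta_c, dual_beta(beta_c))
    print(dual_beta(dual_beta(beta)), R(dual_beta(beta)), 2.0 / R(beta))
```

In particular, any $\beta>\beta_c$ is mapped to $\beta^*<\beta_c$, which is the sense in which the dual model is in the high-temperature phase.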
For the purposes of the next lemma, let $n$ be a unit vector with rationally related components and let $\mathcal C(n)$ be the set of all pairs $(a,b)$ of positive real numbers such that the $a\times b$ rectangle with side $b$ perpendicular to $n$ can be positioned in $\mathbb R^2$ in such a way that all its four corners are in $\mathbb Z^2$. We will use $R^n_{a,b}\subset\mathbb Z^2$ to denote a generic $a\times b$
rectangle with the latter property. If $x$ and $y$ are the two corners along the same $b$-side of $R^n_{a,b}$, we let $\mathcal B^n_{a,b}$ denote the event $\{x\underset{R^n_{a,b}}{\longleftrightarrow}y\}$.

Lemma 3.4. Let $\beta\in(0,\beta_c)$ and let $\beta^*=\frac12\log\coth\beta$. Let $n$ be a unit vector with rationally related components and suppose that $L$, $a_L$ and $b_L$, with $(a_L,b_L)\in\mathcal C(n)$, tend to infinity in such a way that $a_L/L\to0$, $b_L/L\to0$ and $\operatorname{dist}(R^n_{a_L,b_L},\mathbb Z^2\setminus\Lambda_L)/(b_L+\log L)\to\infty$ as $L\to\infty$. Then
\[ \lim_{L\to\infty}P^{\mathrm{free},\beta}_{L,\mathrm{FK}}\bigl(\mathcal B^n_{a_L,b_L}\bigr)^{1/b_L}\ge e^{-\tau_{\beta^*}(n)}. \tag{3.17} \]

Proof. We will first establish the limit (3.17) for the measure in infinite volume and then show that, provided the rectangles $R^n_{a_L,b_L}$ are well separated from $\mathbb Z^2\setminus\Lambda_L$ as specified, the finite volume effects are not important. Throughout the proof, we will omit the subscript $\beta^*$ for the surface tension. Fix $n\in S_1$ with rationally related components and let $\beta<\beta_c$. Let
\[ \theta^n_{a,b}=P^{\mathrm w,\beta}_{\mathrm{FK}}\bigl(\mathcal B^n_{a,b}\bigr),\qquad(a,b)\in\mathcal C(n), \tag{3.18} \]
and note that if $(a,b_1)\in\mathcal C(n)$ and $(a,b_2)\in\mathcal C(n)$ with $b_2\ge b_1$, then also $(a,b_1+b_2)\in\mathcal C(n)$ and $(a,b_2-b_1)\in\mathcal C(n)$. We begin by the claim that the events in question enjoy a subadditive property:
\[ \theta^n_{a,b_1+b_2}\ge\theta^n_{a,b_1}\theta^n_{a,b_2},\qquad(a,b_1),(a,b_2)\in\mathcal C(n). \tag{3.19} \]
Indeed, we let $R^n_{a,b_2}$ be translated relative to $R^n_{a,b_1}$ so that the “left” $a$-side of $R^n_{a,b_2}$ coincides with the “right” $a$-side of $R^n_{a,b_1}$. Let $x_1$ and $y_1$ be the “left” and “right” bottom corners of $R^n_{a,b_1}$ and let $x_2$ and $y_2$ be similar corners of $R^n_{a,b_2}$. By our construction, $y_1$ and $x_2$ coincide. Let $R^n_{a,b_1+b_2}$ denote the union $R^n_{a,b_1}\cup R^n_{a,b_2}$. Then
\[ \bigl\{x_1\underset{R^n_{a,b_1+b_2}}{\longleftrightarrow}y_2\bigr\}\supset\bigl\{x_1\underset{R^n_{a,b_1}}{\longleftrightarrow}y_1\bigr\}\cap\bigl\{x_2\underset{R^n_{a,b_2}}{\longleftrightarrow}y_2\bigr\}. \tag{3.20} \]
The inequality (3.19) then follows immediately from the FKG property of the measure $P^{\mathrm w,\beta}_{\mathrm{FK}}$.

Let $\mathcal A(n)=\{a>0\colon\exists b>0,\ (a,b)\in\mathcal C(n)\}$ be the set of allowed values of $a$. As a consequence of subadditivity, for any $a\in\mathcal A(n)$ we have the existence of the limit $e^{-\varsigma_a(n)}=\lim_{b\to\infty}(\theta^n_{a,b})^{1/b}$. (Here $b$ only takes values such that $(a,b)\in\mathcal C(n)$.) Further, if $a_1,a_2\in\mathcal A(n)$ with $a_1\ge a_2$, then there is a $b$ such that both $(a_1,b)\in\mathcal C(n)$ and $(a_2,b)\in\mathcal C(n)$, and, for any such $b$, we have $\theta^n_{a_1,b}\ge\theta^n_{a_2,b}$. Thence $\varsigma_{a_1}(n)\le\varsigma_{a_2}(n)$ whenever $a_1,a_2\in\mathcal A(n)$ satisfy $a_1\ge a_2$. Let $\varsigma(n)=\lim_{a\to\infty}\varsigma_a(n)$, where the $a$'s are still restricted to $\mathcal A(n)$. Now the quantity $\theta^n_{\infty,b}=\lim_{a\to\infty}\theta^n_{a,b}$, where $(a,b)\in\mathcal C(n)$, obeys the subadditivity relation (3.19) and, in particular, the half-space surface tension $\tau_{\rm h}(n)$ is well defined by the limit
\[ e^{-\tau_{\rm h}(n)}=\lim_{b\to\infty}\ \lim_{\substack{a\to\infty\\(a,b)\in\mathcal C(n)}}(\theta^n_{a,b})^{1/b}. \tag{3.21} \]
Moreover, $\theta^n_{\infty,b}\ge\theta^n_{a,b}$ for all $a$ and $b$ such that $(a,b)\in\mathcal C(n)$ and, therefore, $\tau_{\rm h}(n)\le\varsigma(n)$. Our goal is to demonstrate that $\tau_{\rm h}(n)=\varsigma(n)$ and that the half-space surface tension $\tau_{\rm h}(n)$ equals the full space surface tension $\tau(n)$.
Let $\epsilon>0$. Then there is a $b$ such that $\theta^n_{\infty,b}\ge e^{-b(\tau_{\rm h}(n)+\epsilon)}$. However, since $\theta^n_{\infty,b}$ simply equals the limit of $\theta^n_{a,b}$ as $a\to\infty$, there is an $a'$ such that $\theta^n_{a',b}\ge e^{-b(\tau_{\rm h}(n)+2\epsilon)}$. Thence $\varsigma(n)\le\tau_{\rm h}(n)$ and the equality of $\tau_{\rm h}(n)$ and $\varsigma(n)$ follows.

To remove the half-space constraint, consider the analogue of the previously defined events. Let $x$ and $y$ be related to $R^n_{a,b}$ as in the definition of event $\mathcal B^n_{a,b}$ and let $D^n_{a,b}$ denote the union of $R^n_{a,b}$ and its reflection through the line joining $x$ and $y$. Let
\[ \rho^n_{a,b}=P^{\mathrm w,\beta}_{\mathrm{FK}}\bigl(\{x\underset{D^n_{a,b}}{\longleftrightarrow}y\}\bigr). \tag{3.22} \]
Reasoning identical to that employed thus far yields
\[ e^{-\tau(n)}=\lim_{b\to\infty}\lim_{a\to\infty}(\rho^n_{a,b})^{1/b}=\lim_{a\to\infty}\lim_{b\to\infty}(\rho^n_{a,b})^{1/b}, \tag{3.23} \]
where we tacitly assume $(a,b)\in\mathcal C(n)$ for the production of both limits. Now, obviously, $\rho^n_{a,b}\ge\theta^n_{a,b}$ and hence $\tau(n)\le\tau_{\rm h}(n)$. To derive the opposite inequality, we note that for each $a\in\mathcal A(n)$, there is a $g(a)>0$ such that
\[ \theta^n_{2a,b}\ge g(a)\rho^n_{a,b},\qquad(a,b)\in\mathcal C(n). \tag{3.24} \]
Indeed, the event giving rise to $\theta^n_{2a,b}$ can certainly be achieved by connecting the bottom corners of $R^n_{2a,b}$ directly to the middle points and then connecting the middle points on the opposite $a$-sides of $R^n_{2a,b}$. Then (3.24) follows by FKG. (To get that $g(a)>0$, we also used that $\beta>0$.) Taking the $1/b$-th power of both sides of (3.24) and letting $b\to\infty$ followed by $a\to\infty$ we arrive at $\varsigma(n)=\tau_{\rm h}(n)=\tau(n)$ as promised.

To finish the proof, we must account for the effects of finite volume. Consider the event $\mathcal F^n_{a,b}=\{\partial R^n_{a,b}\longleftrightarrow\partial\Lambda_L\}$. Should $\mathcal F^n_{a,b}$ not occur, a vacant ring separates $R^n_{a,b}$ from $\partial\Lambda_L$ and, using fairly standard arguments, we have
\[ P^{\mathrm{free},\beta}_{L,\mathrm{FK}}\bigl(\mathcal B^n_{a,b}\bigr)\ge P^{\mathrm w,\beta}_{\mathrm{FK}}\bigl(\mathcal B^n_{a,b}\cap(\mathcal F^n_{a,b})^{\rm c}\bigr). \tag{3.25} \]
On the other hand, by Lemma 3.3, we have
\[ P^{\mathrm w,\beta}_{\mathrm{FK}}(\mathcal F^n_{a,b})\le P^{\mathrm w,\beta}_{L,\mathrm{FK}}(\mathcal F^n_{a,b})\le8L(a+b)\,e^{-\operatorname{dist}(\partial R^n_{a,b},\partial\Lambda_L)/\xi}. \tag{3.26} \]
Thus if the distance between $\partial R^n_{a_L,b_L}$ and $\partial\Lambda_L$ exceeds a large multiple of $b_L+\log L$, the dominant contribution to $P^{\mathrm w,\beta}_{\mathrm{FK}}(\mathcal B^n_{a,b})$ comes from $P^{\mathrm w,\beta}_{\mathrm{FK}}(\mathcal B^n_{a,b}\cap(\mathcal F^n_{a,b})^{\rm c})$. Using (3.25), the claim follows.

3.2.3. Corona estimates. We recall the “corona” regions $K^{\rm I}_{t,r}$–$K^{\rm III}_{t,r}$ associated with some given polygon $\mathsf P$. In addition, we will also need to consider the collection of dual sites $\mathbb K^{*\rm II}_{t,r}=K^{\rm II}_{t,r}\cap(\mathbb Z^2)^*$, where $(\mathbb Z^2)^*$ is the lattice dual to $\mathbb Z^2$. (This differs slightly from the graph dual to $\mathbb K^{\rm II}_{t,r}$ by some boundary sites.) In the context of the random cluster model (and its dual) we will consider three events: The first event, to be denoted $\mathcal E^{\rm I}_{t,r}$, takes place in $\mathbb K^{\rm I}_{t,r}$ and is defined by
\[ \mathcal E^{\rm I}_{t,r}=\bigl\{\omega\colon\text{there is a circuit of occupied bonds in }\mathbb K^{\rm I}_{t,r}\text{ surrounding the origin}\bigr\}. \tag{3.27} \]
The event $\mathcal E^{\rm III}_{t,r}$ is defined similarly except that the circuit takes place in the region $\mathbb K^{\rm III}_{t,r}$. Finally, we need one more circuit, this time a dual circuit in the region $\mathbb K^{*\rm II}_{t,r}$. We define
\[ \mathcal E^{\rm II*}_{t,r}=\bigl\{\omega\colon\text{there is a dual circuit of vacant bonds in }\mathbb K^{*\rm II}_{t,r}\text{ surrounding the origin}\bigr\}. \tag{3.28} \]
As we will see in the proof of Lemma 3.2, the event $\mathcal E^{\rm I}_{t,r}\cap\mathcal E^{\rm II*}_{t,r}\cap\mathcal E^{\rm III}_{t,r}$ more or less implies the desired event $\mathcal E_{t,r}$. The desired lower bound will then be an immediate consequence of the following lemma:

Lemma 3.5. Let $\beta>\beta_c$ and let $\mathsf P$ be as in Lemma 3.2. For any sequences $(t_L)$ and $(r_L)$ satisfying (3.2), there is a sequence $(\epsilon_L)$ such that $\epsilon_L\to0$ and, for all $L$,
\[ P^{\mathrm w,\beta}_{L,\mathrm{FK}}\bigl(\mathcal E^{\rm I}_{t_L,r_L}\cap\mathcal E^{\rm II*}_{t_L,r_L}\cap\mathcal E^{\rm III}_{t_L,r_L}\bigr)\ge\exp\bigl\{-t_L\mathscr W_\beta(\mathsf P)(1+\epsilon_L)\bigr\}. \tag{3.29} \]

Proof. In the course of this proof, let us abbreviate $\mathcal E^{\rm I}_L=\mathcal E^{\rm I}_{t_L,r_L}$, and similarly for $\mathcal E^{\rm II*}_L$ and $\mathcal E^{\rm III}_L$, as well as $\mathbb K^{\rm I}_L$, $\mathbb K^{*\rm II}_L$, and $\mathbb K^{\rm III}_L$. We will start with an estimate for $P^{\mathrm w,\beta}_{L,\mathrm{FK}}(\mathcal E^{\rm II*}_L)$, which is in any case the central ingredient of this lemma. Let $T$ be the smallest integer $T\ge2$ such that the polygon $\mathsf P$ magnified by $T$ has all vertices on $\mathbb Z^2$. Let $u_L=T\lfloor(t_L+r_L)/T\rfloor+T$ and let $x_1,\dots,x_N$ be the vertices of the polygon $\mathsf P$ magnified by $u_L$. Let $x_1^*,\dots,x_N^*$ be the corresponding vertices of the polygon $\mathsf P$ magnified by $u_L$ and translated by $(-\frac12,-\frac12)$. Notice that (once $t_L$ and $r_L$ are large enough) the sites $x_1^*,\dots,x_N^*$ lie inside the “corona” $\mathbb K^{*\rm II}_L$. We use $n_i$ to denote the unit vector constituting the outer normal to the side between $x_i^*$ and $x_{i+1}^*$ (where $x_{N+1}^*$ is identified with $x_1^*$). By our construction, $x_1,\dots,x_N\in\mathbb Z^2$, $x_1^*,\dots,x_N^*\in(\mathbb Z^2)^*$ and the $n_i$ have rationally related components.

For $i=1,\dots,N$, let us consider the rectangles $R^{n_i}_{a_i,b_i}$ with the base coinciding with the line between $x_i^*$ and $x_{i+1}^*$. Here $a_i$ is the largest possible number such that $(a_i,b_i)\in\mathcal C(n_i)$ and $R^{n_i}_{a_i,b_i}\subset\mathbb K^{*\rm II}_L$. We remark that all $(a_i)$ and $(b_i)$ have $L$-dependence which is notationally suppressed and that these tend to infinity as $L\to\infty$. In particular, the $b_i$'s scale with $u_L$. Let us denote
\[ \bar b_i=\lim_{L\to\infty}\frac{b_i}{t_L},\qquad i=1,\dots,N, \tag{3.30} \]
where the limit exists by the construction of the $b_i$'s and where we noted that $t_L/u_L\to1$ as $L\to\infty$.

Let $\mathcal B_i^*$ be the event that there is a dual vacant connection $x_i^*\longleftrightarrow x_{i+1}^*$ in the box $R^{n_i}_{a_i,b_i}$ and let $\mathcal B_i$ be the corresponding “direct” event that there is a direct occupied path $x_i\longleftrightarrow x_{i+1}$ contained in the $(\frac12,\frac12)$-translate of $R^{n_i}_{a_i,b_i}$. It is clear that the intersection $\bigcap_{i=1}^N\mathcal B_i^*$ produces the event $\mathcal E^{\rm II*}_L$ and that these events are FKG-correlated. Moreover, by duality, we have
\[ P^{\mathrm w,\beta}_{L,\mathrm{FK}}(\mathcal B_i^*)=P^{\mathrm{free},\beta^*}_{L-1,\mathrm{FK}}(\mathcal B_i) \tag{3.31} \]
(cf. the paragraph before (3.15)). Now we are perfectly positioned to apply Lemma 3.4: Using FKG, the scaling relation (3.30), and the fact that also the $a_j$'s tend to infinity by our construction, we have as a consequence of the above-mentioned lemma that
\[ \lim_{L\to\infty}P^{\mathrm w,\beta}_{L,\mathrm{FK}}\bigl(\mathcal E^{\rm II*}_L\bigr)^{1/t_L}=\exp\Bigl\{-\sum_{j=1}^N\bar b_j\tau_\beta(n_j)\Bigr\}. \tag{3.32} \]
The remainder of the proof concerns the estimate of the probability $P^{\mathrm w,\beta}_{L,\mathrm{FK}}(\mathcal E^{\rm I}_L\cap\mathcal E^{\rm III}_L\,|\,\mathcal E^{\rm II*}_L)$. We claim that this conditional probability tends to one as $L\to\infty$. First, as a worst-case scenario, consider the event $\mathcal V^{\rm II*}_L$ that all bonds in $\mathbb K^{*\rm II}_L$ are vacant. By monotonicity in boundary conditions and the strong FKG property of $P^{\mathrm w,\beta}_{L,\mathrm{FK}}$ it is seen that
\[ P^{\mathrm w,\beta}_{L,\mathrm{FK}}\bigl(\mathcal E^{\rm I}_L\cap\mathcal E^{\rm III}_L\,\big|\,\mathcal E^{\rm II*}_L\bigr)\ge P^{\mathrm w,\beta}_{L,\mathrm{FK}}\bigl(\mathcal E^{\rm I}_L\cap\mathcal E^{\rm III}_L\,\big|\,\mathcal V^{\rm II*}_L\bigr). \tag{3.33} \]
Under the condition that $\mathcal V^{\rm II*}_L$ occurs, $\mathcal E^{\rm I}_L$ and $\mathcal E^{\rm III}_L$ are independent and we may treat them separately. The arguments are virtually identical for both events, so we need only be explicit about $P^{\mathrm w,\beta}_{L,\mathrm{FK}}(\mathcal E^{\rm I}_L\,|\,\mathcal V^{\rm II*}_L)$.

Let $\ell_L$ be a maximal integer such that there is a circuit of dual sites, $z_1^*,\dots,z_m^*$, separating the boundaries of $\mathbb K^{\rm I}_L$ with the property that, if $\Lambda^*_{\ell_L}(z_j^*)$ is the translate of $\Lambda^*_{\ell_L}$ by (the vector) $z_j^*$, then $\Lambda^*_{\ell_L}(z_j^*)\subset\mathbb K^{\rm I}_L$. Note that $\liminf_{L\to\infty}\ell_L/r_L>1/3$. Now, for the event $\mathcal E^{\rm I}_L$ not to occur, there must be a dual occupied path connecting some dual site on the outer boundary of $\mathbb K^{\rm I}_L$ to another on the inner boundary and hence at least one $z_j^*$ has to be connected to the boundary of its $\Lambda^*_{\ell_L}(z_j^*)$ by a path of dual occupied bonds. Using subadditivity of the probability measure, we find
\[ 1-P^{\mathrm w,\beta}_{L,\mathrm{FK}}\bigl(\mathcal E^{\rm I}_L\,\big|\,\mathcal V^{\rm II*}_L\bigr)\le\sum_{j=1}^mP^{\mathrm w,\beta}_{L,\mathrm{FK}}\bigl(z_j^*\longleftrightarrow\partial\Lambda^*_{\ell_L}(z_j^*)\,\big|\,\mathcal V^{\rm II*}_L\bigr). \tag{3.34} \]
Now, again invoking monotonicity in the boundary conditions, the probability of the above connection events may be estimated from above by placing dual wired (i.e., direct free) boundary conditions on $\Lambda^*_{\ell_L}(z_j^*)$. But then, by duality, we have exactly the event which is the subject of Lemma 3.3. Explicitly,
\[ P^{\mathrm w,\beta}_{L,\mathrm{FK}}\bigl(z_j^*\longleftrightarrow\partial\Lambda^*_{\ell_L}(z_j^*)\,\big|\,\mathcal V^{\rm II*}_L\bigr)\le P^{\mathrm w,\beta^*}_{\ell_L,\mathrm{FK}}\bigl(0\longleftrightarrow\partial\Lambda_{\ell_L}\bigr) \tag{3.35} \]
holds for all $j=1,\dots,m$, and the bound in (3.16) can be applied. Now the number of sites $z_j^*$ which comprise the circuit does not exceed a multiple of $t_L$. Thus, for some constant $C$ independent of $L$ we have
\[ P^{\mathrm w,\beta}_{L,\mathrm{FK}}\bigl(\mathcal E^{\rm I}_L\,\big|\,\mathcal V^{\rm II*}_L\bigr)\ge1-C\ell_Lt_Le^{-\ell_L/\xi}. \tag{3.36} \]
By the condition stated in (3.2), the fact that $r_L\ge\ell_L\ge r_L/3$ for sufficiently large $L$, and the observation that $\xi^{-1}=\tau_{\min}$, the desired result for $\mathcal E^{\rm I}_L$ follows. Similarly for the event $\mathcal E^{\rm III}_L$.

Proof of Lemma 3.2. We make liberal use of the correspondence between the graphical configurations $\omega$ and (sets of) spin configurations as described, e.g., in [2, 12, 30]. Each connected cluster in $\omega$ represents the spin configurations in which all sites of the cluster have spins of the same type. Thus, if $\mathcal E^{\rm I}_L\cap\mathcal E^{\rm II*}_L\cap\mathcal E^{\rm III}_L$ occurs, then the inner circuit of occupied bonds in $\mathbb K^{\rm I}_L$ forces the spins on these sites to be of the same type. Since these are disconnected from the boundary of $\Lambda_L$ by the dual vacant circuit in $\mathbb K^{*\rm II}_L$, with probability one-half, all spins on the circuit are minus. Similarly, the outer circuit of bonds in $\mathbb K^{\rm III}_L$ is plus-type with probability one if it is connected to $\partial\Lambda_L$ and with probability $1/2$ otherwise. Thus, $P_L^{+,\beta}(\mathcal E_{t_L,r_L}\,|\,\mathcal E^{\rm I}_L\cap\mathcal E^{\rm II*}_L\cap\mathcal E^{\rm III}_L)$ is at least $1/4$, and the claim follows using Lemma 3.5.
M. Biskup, L. Chayes, R. Kotecký
4. Absence of Intermediate Contour Sizes

4.1. Statement and outline. The goal of this section is to prove that, with probability tending to one as $L\to\infty$, there will be no contours with a diameter between the scales of $\log L$ and $\sqrt{v_L}$ in the “canonical” ensemble of the Ising model in volume $\Lambda_L$. This result is by far the most difficult part of the proof of our main results stated in Sect. 1.3.

We start with a standard notion from contour theory. Let $\Gamma(\sigma)$ denote the set of all contours of a configuration $\sigma$ in $\Lambda_L$ with plus boundary condition. Applying the rounding rule, contours are self-avoiding simple curves in $\mathbb{R}^2$. Recall that $\Gamma_s(\sigma)$ is the set of contours of $\sigma$ that have a non-trivial $s$-skeleton. We say that $\gamma\in\Gamma(\sigma)$ is an external contour if it is not surrounded by any other contour from $\Gamma(\sigma)$. We will use $\Gamma_s^{\mathrm{ext}}(\sigma)$ to denote the set of external contours of $\Gamma_s(\sigma)$. (We remark that $\Gamma_s^{\mathrm{ext}}(\sigma)$, namely the external contours of $\Gamma(\sigma)$ which are big enough to have an $s$-skeleton, coincides exactly with the set of external contours of the collection $\Gamma_s(\sigma)$.) Using this notation, the event $A_{\kappa,s,L}$ from Theorem 1.2 is best described via its complement:
\[ A^{c}_{\kappa,s,L} = \bigl\{\sigma : \exists\gamma\in\Gamma_s^{\mathrm{ext}}(\sigma),\ \operatorname{diam}\gamma\le\kappa\sqrt{v_L}\bigr\}. \tag{4.1} \]
The relevant claim is then restated as follows:

Theorem 4.1. Let $\beta>\beta_c$ and let $(v_L)$ be a sequence of positive numbers that make $m^\star|\Lambda_L|-2m^\star v_L$ an allowed value of $M_L$ for all $L$. Suppose the limit in (1.10) obeys $\Delta\in(0,\infty)$. For each $c_0>0$ there exist $\kappa>0$, $K_0<\infty$ and $L_0<\infty$ such that if $K\ge K_0$, $L\ge L_0$ and $s=K\log L$, then
\[ P_L^{+,\beta}\bigl(A^{c}_{\kappa,s,L}\,\big|\,M_L=m^\star|\Lambda_L|-2m^\star v_L\bigr)\le L^{-c_0}. \tag{4.2} \]
Let $s=K\log L$ be a scale function and recall that a contour $\gamma$ is $s$-large if $\gamma\in\Gamma_s(\sigma)$. For $\kappa>0$, a contour $\gamma$ large enough to be an $s$-large contour but satisfying $\operatorname{diam}\gamma\le\kappa\sqrt{v_L}$ will be called a $\kappa$-intermediate contour. Thus, Theorem 4.1 shows that, in the canonical ensemble with the magnetization fixed to $m^\star|\Lambda_L|-2m^\star v_L$, there are no $\kappa$-intermediate contours with probability tending to one as $L$ tends to infinity. This statement, which is of interest in its own right, reduces the proof of our main result to a straightforward application of isoperimetric inequalities for the Wulff functional as formulated in Lemma 2.8.

Remark 9. A power of $L$ appears on the right-hand side because we only demand the absence of contours with sizes over $K\log L$. Indeed, for a general $s$, the right-hand side of (4.2) could be replaced by $e^{-\alpha s}$ for some constant $\alpha>0$. In particular, the decay can be made substantially faster by easing the lower limit of what we chose to call an intermediate size contour. Finally, we note that $L_0$ in Theorem 4.1 depends not only on $\beta$, $\Delta$, and $c_0$, but also on how fast the limit of $v_L^{3/2}/|\Lambda_L|$ is achieved.

The proof of Theorem 4.1 will require some preparations. In particular, we will need to estimate the (conditional) probability of five highly improbable events that we would like to exclude explicitly from further considerations. All five events are defined with reference to a positive number $\kappa$ which, more or less, is the same $\kappa$ that appears in Theorem 4.1.
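The arithmetic behind Remark 9 can be checked numerically. The following is a minimal sketch, not part of the original argument: with the scale function s = K log L, a bound of the form e^{-αs} is the same thing as the power L^{-αK}, so raising K speeds up the decay. The function names and the sample values of α, K and L are illustrative assumptions only.

```python
import math


def exp_bound(alpha: float, K: float, L: float) -> float:
    """Return exp(-alpha * s) for the scale function s = K * log(L)."""
    s = K * math.log(L)
    return math.exp(-alpha * s)


def power_bound(alpha: float, K: float, L: float) -> float:
    """Return the equivalent power-law form L ** (-alpha * K)."""
    return L ** (-alpha * K)


if __name__ == "__main__":
    # alpha = 0.5 and K = 4 are arbitrary illustrative constants.
    for L in (10.0, 100.0, 1000.0):
        print(L, exp_bound(0.5, 4.0, L), power_bound(0.5, 4.0, L))
```

The two columns printed for each L agree up to floating-point rounding, which is the content of the remark.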
The first event, $R^1_{\kappa,s,L}$, collects the configurations for which the combined length of all $s$-large contours in $\Lambda_L$ exceeds $\kappa^{-1}s\sqrt{v_L}$. These configurations need to be a priori excluded because all of the crucial Gaussian estimates from Sect. 2.3 can only be applied to regions with a moderate surface-to-volume ratio. Next, we show that one can ignore configurations whose large contours occupy too big a volume. This is the basis of the event $R^2_{\kappa,s,L}$. The remaining three events concern the magnetization deficit in two random subsets of $\Lambda_L$: a set $\mathrm{Int}^\circ\subset V(\Gamma_s^{\mathrm{ext}}(\sigma))$ of sites enclosed by an $s$-large contour and a set $\mathrm{Ext}^\circ$ of sites outside all $s$-large contours. The precise definitions of these sets are given in Sect. 4.2. The respective events are:
(3) The event $R^3_{\kappa,s,L}$ that $M_{\mathrm{Int}^\circ}\le -m^\star|\mathrm{Int}^\circ|-\kappa^{-1}s\,v_L^{3/4}$.
(4) The event $R^4_{\kappa,s,L}$ that $M_{\mathrm{Ext}^\circ}\ge m^\star|\mathrm{Ext}^\circ|-2\kappa m^\star v_L$.
(5) The event $R^5_{\kappa,s,L}$ that $M_{\mathrm{Ext}^\circ}\le m^\star|\mathrm{Ext}^\circ|-2(1+\kappa^{-1})m^\star v_L$.

By choosing $\kappa$ sufficiently small, the events $R^1,\dots,R^5$ will be shown to have a probability vanishing exponentially fast with $\sqrt{v_L}$. These estimates are the content of Lemma 4.2 and Lemmas 4.6–4.8. Once the preparatory statements have been proven, we consider a rather extreme version of the restricted contour ensemble, namely, one in which no contour that is larger than $\kappa$-intermediate is allowed to appear. We show, in a rather difficult Lemma 4.9, that despite this restriction, bounds similar to those of (4.2) still hold. The final step, the proof of Theorem 4.1, is now achieved by conditioning on the location(s) of the large contour(s), which by the “R-lemmas” are typically not too big and not too rough. By definition, the exterior region is now in the restricted ensemble featured in Lemma 4.9 and the result derived therein allows a relatively easy endgame.

Throughout Sects. 4.2–4.4 we will let $\beta>\beta_c$ be fixed and let $(v_L)$ be a sequence of positive numbers such that $m^\star|\Lambda_L|-2m^\star v_L$ is an allowed value of $M_L$ for all $L$. Moreover, we will assume that $(v_L)$ is such that the limit in (1.10) exists with $\Delta\in(0,\infty)$.
4.2. Contour length and volume. In this section we will prepare the grounds for the proof of Theorem 4.1. In particular, we derive rather crude estimates on the total length of large contours and the volume inside and outside large external contours. These results come as Lemmas 4.2 and 4.4 below.

4.2.1. Total contour length. We begin by estimating the combined length of large contours. Let $s$ be a scale function and, for any $\kappa>0$, let $R^1_{\kappa,s,L}$ be the event
\[ R^1_{\kappa,s,L} = \Bigl\{\sigma : \sum_{\gamma\in\Gamma_s(\sigma)}|\gamma|\ \ge\ \kappa^{-1}s\sqrt{v_L}\Bigr\}. \tag{4.3} \]
The probability of event $R^1_{\kappa,s,L}$ is then estimated as follows:

Lemma 4.2. For each $c_1>0$ there exist $\kappa_0>0$, $K_0<\infty$ and $L_0<\infty$ such that
\[ P_L^{+,\beta}\bigl(R^1_{\kappa,s,L}\,\big|\,M_L=m^\star|\Lambda_L|-2m^\star v_L\bigr)\le e^{-c_1\sqrt{v_L}} \tag{4.4} \]
holds for all $\kappa\le\kappa_0$, $K\ge K_0$, $L\ge L_0$, and $s=K\log L$.
Proof. Let $K_0$ be the quantity $K_0(\frac12,\beta)$ from Lemma 2.5 and let us recall that $\tau_{\min}$ denotes the minimal value of the surface tension. We claim that it suffices to show that, for all $c_1'>0$ and an appropriate choice of $\kappa$, the bound
\[ P_L^{+,\beta}\bigl(R^1_{\kappa,s,L}\bigr)\le e^{-c_1'\sqrt{v_L}} \tag{4.5} \]
holds true once $L$ is sufficiently large. Indeed, if (4.5) is established, we just choose $c_1'$ so large that the difference $c_1'-c_1$ exceeds the rate constant from the lower bound in Theorem 3.1 and the estimate (4.4) immediately follows. In order to prove (4.5), fix $c_1'>0$ and let $\kappa_0^{-1}=2g_1c_1'/\tau_{\min}$, where $g_1$ is as in (2.9). Let $K\ge K_0$, $\kappa\le\kappa_0$ and $s=K\log L$. We claim that if $\sigma\in R^1_{\kappa,s,L}$ and $\mathfrak{S}$ is a collection of $s$-skeletons such that $\mathfrak{S}\sim\sigma$, then (2.9) and (2.11) force
\[ \kappa^{-1}s\sqrt{v_L}\ \le \sum_{\gamma\in\Gamma_s(\sigma)}|\gamma|\ \le\ g_1 s\sum_{S\in\mathfrak{S}}\mathcal{P}(S)\ \le\ g_1 s\,\tau_{\min}^{-1}\,\mathcal{W}_\beta(\mathfrak{S}). \tag{4.6} \]
Hence, for each $\sigma\in R^1_{\kappa,s,L}$ there is at least one $\mathfrak{S}$ such that $\mathfrak{S}\sim\sigma$ and $\mathcal{W}_\beta(\mathfrak{S})\ge 2c_1'\sqrt{v_L}$. By Corollary 2.6 with $\kappa=2c_1'\sqrt{v_L}$ and $\alpha=\frac12$, and our choice of $K_0$, (4.5) follows.

4.2.2. Interiors and exteriors. Given a scale function $s$ and a configuration $\sigma$, let $\Gamma_s^{\mathrm{ext}}(\sigma)$ be the set of external contours in $\Gamma_s(\sigma)$. (Note that these contours will also be external in the set of all contours of $\sigma$.) Define $\mathrm{Int}=\mathrm{Int}_{s,L}(\sigma)$ to be the set of all sites in $\Lambda_L$ enclosed by some $\gamma\in\Gamma_s^{\mathrm{ext}}(\sigma)$ and let $\mathrm{Ext}=\mathrm{Ext}_{s,L}(\sigma)$ be the complement of $\mathrm{Int}$, i.e., $\mathrm{Ext}=\Lambda_L\setminus\mathrm{Int}$.

Given a set of external contours $\Gamma$, we claim that under the condition that $\Gamma_s^{\mathrm{ext}}(\sigma)=\Gamma$, the measure $P_L^{+,\beta}$ is a product of independent measures on $\mathrm{Ext}$ and $\mathrm{Int}$. A coarse look might suggest a product of the plus-boundary-condition measure on $\mathrm{Ext}$ and the minus measure on $\mathrm{Int}$. Indeed, all spins in $\mathrm{Ext}$ up against a piece of $\Gamma$ are necessarily pluses and similarly all spins on the $\mathrm{Int}$ sides of these contours are minuses. But this is not quite the end of the story; two small points are in order. First, we have invoked a rounding rule. Thus, for example, certain spins in $\mathrm{Ext}$ (at some corners but not up against the contours) are forced to be plus, since otherwise the rounding rule would have drawn the contour differently. On the other hand, some corner spins are permitted either sign because the rounding rule would separate any such resulting contour. Fortunately, the upshot of these “rounding anomalies” is only to force a few more minus spins in $\mathrm{Int}$ and plus spins in $\mathrm{Ext}$ than would appear from a naive look at $\Gamma$.

To make the aforementioned observations notationally apparent, we define $\mathrm{Int}^\circ\subset\mathrm{Int}$ to be the set of sites that can be flipped without changing $\Gamma$ and similarly for $\mathrm{Ext}$. We thus have $\sigma_x=-1$ for all $x\in\mathrm{Int}\setminus\mathrm{Int}^\circ$ and $\sigma_x=+1$ for all $x\in\mathrm{Ext}\setminus\mathrm{Ext}^\circ$. Explicitly, there are a few more boundary spins than one might have thought, but they are always of the correct type.

Thus, clearly, although rather trivially, the measure $P_L^{+,\beta}(\,\cdot\,|\,\Gamma_s^{\mathrm{ext}}(\sigma)=\Gamma)$ restricted to $\mathrm{Int}$ is simply the measure in $\mathrm{Int}$ with minus boundary conditions. The same measure on $\mathrm{Ext}$ is not quite the corresponding plus-measure due to the condition that $\Gamma$ constitutes all the external contours visible on the scale $s$. Thus, beyond the scale $s$ in $\mathrm{Ext}$, we must see . . . no contours. But this is precisely the definition of the restricted ensemble. We conclude that the conditional measure splits on $\mathrm{Int}$ and $\mathrm{Ext}$ into independent measures that are well understood. Explicitly, if $A$ is an event depending only on the spins
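in $\mathrm{Int}^\circ$ and $B$ is an event depending only on the spins in $\mathrm{Ext}^\circ$, then
\[ P_L^{+,\beta}\bigl(A\cap B\,\big|\,\Gamma_s^{\mathrm{ext}}(\sigma)=\Gamma\bigr) = P^{-,\beta}_{\mathrm{Int}^\circ}(A)\,P^{+,\beta,s}_{\mathrm{Ext}^\circ}(B). \tag{4.7} \]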
This observation will be crucial for our estimates in the next section. Next we will notice that the number of sites associated with the contours can be easily bounded in terms of the total length of $\Gamma$:

Lemma 4.3. There exists a geometrical constant $g_4<\infty$ such that the following is true: If $\Gamma$ is a set of external contours and $\mathrm{Int}^\circ$ and $\mathrm{Ext}^\circ$ are as defined above, then
\[ \bigl|\Lambda_L\setminus(\mathrm{Int}^\circ\cup\mathrm{Ext}^\circ)\bigr|\ \le\ g_4\sum_{\gamma\in\Gamma}|\gamma|. \tag{4.8} \]

Proof. Each site from $\Lambda_L\setminus(\mathrm{Int}^\circ\cup\mathrm{Ext}^\circ)$ is within some fixed (Euclidean) distance from a dual lattice site $x^*\in(\mathbb{Z}^2)^*$ such that some contour $\gamma\in\Gamma$ passes through $x^*$. On the other hand, the number of dual lattice sites $x^*$ visited by contours from $\Gamma$ does not exceed twice the total length of all contours in $\Gamma$. From here the existence of a $g_4$ satisfying (4.8) follows.

The definition of the event $R^1_{\kappa,s,L}$ gives us the following easy bounds:

Lemma 4.4. Let $g_4$ be as in Lemma 4.3. Let $\sigma\notin R^1_{\kappa,s,L}$ and let the sets $\mathrm{Int}=\mathrm{Int}_{s,L}(\sigma)$, $\mathrm{Int}^\circ=\mathrm{Int}^\circ_{s,L}(\sigma)$ and $\mathrm{Ext}^\circ=\mathrm{Ext}^\circ_{s,L}(\sigma)$ be as above. Then we have the bounds
\[ |\partial\mathrm{Int}^\circ|\le g_4\kappa^{-1}s\sqrt{v_L}\quad\text{and}\quad|\partial\mathrm{Ext}^\circ|\le g_4\kappa^{-1}s\sqrt{v_L}+4L \tag{4.9} \]
and
\[ |\mathrm{Int}^\circ|\le|\mathrm{Int}|\le g_4^2\kappa^{-2}s^2v_L. \tag{4.10} \]

Proof. Since $\partial\mathrm{Int}^\circ\subset\Lambda_L\setminus(\mathrm{Ext}^\circ\cup\mathrm{Int}^\circ)$, which by Lemma 4.3 implies $|\partial\mathrm{Int}^\circ|\le g_4\sum_{\gamma\in\Gamma_s(\sigma)}|\gamma|$, the first bound in (4.9) is an immediate consequence of the fact that $\sigma\notin R^1_{\kappa,s,L}$. Note that the same inequality is true for $|\partial\mathrm{Int}|$. The second bound in (4.9) then follows by the fact that $\partial\mathrm{Ext}^\circ\subset\partial\Lambda_L\cup\bigl(\Lambda_L\setminus(\mathrm{Ext}^\circ\cup\mathrm{Int}^\circ)\bigr)$. The last bound, (4.10), is then implied by the first bound in (4.9) for $\partial\mathrm{Int}$ instead of $\partial\mathrm{Int}^\circ$ and the isoperimetric inequality $|\Lambda|\le\frac{1}{16}|\partial\Lambda|^2$ valid for any $\Lambda\subset\mathbb{R}^2$ that is a finite union of closed unit squares (see, e.g., Lemma A.1 in [16]).
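The isoperimetric inequality used in the last step, that a finite union of closed unit squares has area at most 1/16 of its squared boundary length, can be sanity-checked on small examples. The following is a minimal sketch under stated assumptions: squares are encoded by the integer coordinates of their lower-left corners, and the boundary length is counted as the number of unit edges not shared by two squares of the set. The function names are hypothetical and do not come from the paper or from [16].

```python
def area_and_perimeter(cells):
    """Area and boundary length of a union of closed unit squares.

    `cells` is an iterable of integer pairs (x, y); each pair stands for
    the unit square [x, x+1] x [y, y+1].
    """
    cells = set(cells)
    perimeter = 0
    for x, y in cells:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            # A unit edge lies on the boundary when the neighbouring
            # square across that edge is not part of the set.
            if (x + dx, y + dy) not in cells:
                perimeter += 1
    return len(cells), perimeter


def satisfies_isoperimetric(cells):
    """Check area <= perimeter**2 / 16 using integer arithmetic."""
    area, perimeter = area_and_perimeter(cells)
    return 16 * area <= perimeter ** 2


if __name__ == "__main__":
    square = [(x, y) for x in range(3) for y in range(3)]
    l_shape = [(0, 0), (1, 0), (0, 1)]
    for name, shape in (("square", square), ("L-shape", l_shape)):
        print(name, area_and_perimeter(shape), satisfies_isoperimetric(shape))
```

A full square is the extremal case: for the 3 by 3 square the two sides of the inequality coincide, while less compact shapes satisfy it with room to spare.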
4.2.3. Volume of large contours. The preceding lemma asserts that, for typical configurations, the interior of large contours is not too big. Actually, one can be a bit more precise. Namely, introducing
\[ R^2_{\kappa,s,L} = \bigl\{\sigma : |V(\Gamma_s^{\mathrm{ext}}(\sigma))|\ge(1-\kappa)v_L\bigr\}, \tag{4.11} \]
we will show in the next lemma that, whenever $\kappa$ is sufficiently small, the conditional probability of $R^2_{\kappa,s,L}$ given the $M_L$'s of interest is still exponentially small in $\sqrt{v_L}$. However, unlike in Lemma 4.2 (and Lemma 4.6 below), here the constant multiplying $\sqrt{v_L}$ in the exponent can no longer be made arbitrarily large.

Lemma 4.5. There exist constants $c_2>0$, $\kappa_0>0$, $K_0<\infty$, and $L_0<\infty$ such that
\[ P_L^{+,\beta}\bigl(R^2_{\kappa,s,L}\,\big|\,M_L=m^\star|\Lambda_L|-2m^\star v_L\bigr)\le e^{-c_2\sqrt{v_L}} \tag{4.12} \]
holds for all $K\ge K_0$, $\kappa\in(0,\kappa_0]$, $L\ge L_0$, and $s=K\log L$.
Proof. Let $\Phi^\star_\Delta$ be as defined in (2.2). Clearly, it suffices to prove the statement for some $\kappa>0$, so let $\kappa\in(0,1)$ be such that
\[ c_2 = w_1\bigl[(1-\kappa)^2-(\Phi^\star_\Delta+2\kappa)\bigr]>0. \tag{4.13} \]
(This is possible because $\Phi^\star_\Delta<1$ for all $\Delta<\infty$.) Let $L_0$ be so large that $\epsilon_L$ from Theorem 3.1 satisfies $\epsilon_L\le\kappa$ for all $L\ge L_0$. Let $K_0$ be chosen to exceed the quantity $K_0(\kappa,\beta)$ from Lemma 2.5. Fix $K\ge K_0$, $L\ge L_0$, and $s=K\log L$. Let now $\sigma\in R^2_{\kappa,s,L}$ and let us temporarily abbreviate $\Gamma=\Gamma_s(\sigma)$ and $\Gamma'=\Gamma_s^{\mathrm{ext}}(\sigma)$. Let $\mathfrak{S}$ be any $s$-skeleton such that $\mathfrak{S}\sim\Gamma$, and let $\mathfrak{S}'$ be the set of skeletons in $\mathfrak{S}$ corresponding to $\Gamma'$. First we note that we may as well assume that, for some fixed $B>0$ to be specified later,
\[ \sum_{S\in\mathfrak{S}}\mathcal{P}(S)\ \le\ \frac{B}{\tau_{\min}}\sqrt{v_L}. \tag{4.14} \]
Indeed, the contribution of the configurations violating this bound can be directly estimated, combining Corollary 2.6 with $\alpha=\kappa$ and (2.11), by $e^{-(1-\kappa)B\sqrt{v_L}}$. For configurations satisfying (4.14), Lemma 2.3 in turn implies
\[ V(\mathfrak{S}')\ \ge\ V(\Gamma')-g_3 s\sum_{S\in\mathfrak{S}'}\mathcal{P}(S)\ \ge\ (1-\kappa)^2v_L, \tag{4.15} \]
provided $L$ is sufficiently large to ensure that $g_3K\frac{\log L}{\sqrt{v_L}}\frac{B}{\tau_{\min}}\ll1$. As a consequence of this and the Wulff variational problem, $\mathcal{W}_\beta(\mathfrak{S}')\ge w_1(1-\kappa)\sqrt{v_L}$. Since $\mathfrak{S}\supset\mathfrak{S}'$, we have $\mathcal{W}_\beta(\mathfrak{S})\ge\mathcal{W}_\beta(\mathfrak{S}')$ and thus for every $\sigma\in R^2_{\kappa,s,L}$ satisfying (4.14) there is a collection $\mathfrak{S}$ of $s$-skeletons such that $\mathfrak{S}\sim\sigma$ and $\mathcal{W}_\beta(\mathfrak{S})\ge w_1(1-\kappa)\sqrt{v_L}$. Using, once more, Corollary 2.6 with $\alpha=\kappa$ and our choice of $K_0$, we have
\[ P_L^{+,\beta}\bigl(R^2_{\kappa,s,L}\bigr)\ \le\ e^{-(1-\kappa)^2w_1\sqrt{v_L}}+e^{-(1-\kappa)B\sqrt{v_L}}. \tag{4.16} \]
Letting $B=(1-\kappa)w_1$, the right-hand side beats the lower bound $P_L^{+,\beta}(M_L=m^\star|\Lambda_L|-2m^\star v_L)\ge\exp\{-w_1\sqrt{v_L}\,(\Phi^\star_\Delta+\kappa)\}$ from Theorem 3.1 and our choice of $L_0$ and $\kappa$ by exactly $2e^{-(c_2+\kappa w_1)\sqrt{v_L}}$. Using the leeway in the exponent to absorb the extra factor of 2 (which may require that we further increase $L_0$), the estimate (4.12) follows.

4.3. Magnetization deficit due to large contours. In this section we will provide the necessary control over the magnetization deficit inside and outside large contours. The relevant statements come as Lemmas 4.6–4.8.

4.3.1. Magnetization inside. Our next claim concerns the total magnetization inside the large contours in $\Lambda_L$. Recalling the definition of $\mathrm{Int}^\circ$, we reintroduce the event
\[ R^3_{\kappa,s,L} = \bigl\{\sigma : M_{\mathrm{Int}^\circ}\le -m^\star|\mathrm{Int}^\circ|-\kappa^{-1}s\,v_L^{3/4}\bigr\}. \tag{4.17} \]
For the probability of $R^3_{\kappa,s,L}$ we have the following bound:
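Lemma 4.6. For each $c_3>0$ there exist $\kappa_0>0$, $K_0<\infty$ and $L_0<\infty$ such that
\[ P_L^{+,\beta}\bigl(R^3_{\kappa,s,L}\,\big|\,M_L=m^\star|\Lambda_L|-2m^\star v_L\bigr)\le e^{-c_3\sqrt{v_L}} \tag{4.18} \]
for any $\kappa\le\kappa_0$, $K\ge K_0$, $L\ge L_0$, and $s=K\log L$.

Proof. Fix a $c_3>0$. By Lemma 4.2, there are $\vartheta<\infty$, $K_0<\infty$ and $L_0<\infty$ such that $P_L^{+,\beta}(R^1_{\vartheta,s,L}\,|\,M_L=m^\star|\Lambda_L|-2m^\star v_L)\le e^{-2c_3\sqrt{v_L}}$ whenever $s=K\log L$ and $L\ge L_0$. Consider the collections $\Gamma$ of external contours of the form $\Gamma=\Gamma_s^{\mathrm{ext}}(\sigma)$ with $\sigma\notin R^1_{\vartheta,s,L}$. Recalling the lower bound in Theorem 3.1, it is clearly sufficient to prove that for some $c_3'>0$ large enough,
\[ P_L^{+,\beta}\bigl(R^3_{\kappa,s,L}\,\big|\,\Gamma_s^{\mathrm{ext}}(\sigma)=\Gamma\bigr)\le 2e^{-c_3'\sqrt{v_L}} \tag{4.19} \]
holds for all such $\Gamma$ and all $L$ sufficiently large provided $\kappa$ is sufficiently small and the $K$ in $s=K\log L$ is sufficiently large. (Note that, for (4.19) to imply (4.18), $c_3'$ will have to exceed $c_3$ by a $\beta$-dependent factor. The factor of “2” was put in for later convenience.)

Pick such a $\Gamma$. Since $R^3_{\kappa,s,L}$ depends only on the configuration in $\mathrm{Int}^\circ$, (4.7) implies
\[ P_L^{+,\beta}\bigl(R^3_{\kappa,s,L}\,\big|\,\Gamma_s^{\mathrm{ext}}(\sigma)=\Gamma\bigr) = P^{-,\beta}_{\mathrm{Int}^\circ}\bigl(R^3_{\kappa,s,L}\bigr). \tag{4.20} \]
In order to apply Lemma 2.10, we need to compare $-m^\star|\mathrm{Int}^\circ|$ with the actual average magnetization of the Ising model in volume $\mathrm{Int}^\circ$ with minus boundary condition. By (4.10) and (4.9), we have $|\mathrm{Int}^\circ|\le g_4^2\vartheta^{-2}s^2v_L$ and $|\partial\mathrm{Int}^\circ|\le g_4\vartheta^{-1}s\sqrt{v_L}$. Then Lemma 2.9 and (2.36) imply the existence of constants $\alpha_1=\alpha_1(\beta)<\infty$ and $\alpha_2=\alpha_2(\beta)>0$ such that
\[ \Bigl|\langle M_{\mathrm{Int}^\circ}\rangle^{-,\beta}_{\mathrm{Int}^\circ}+m^\star|\mathrm{Int}^\circ|\Bigr|\ \le\ \alpha_1g_4\vartheta^{-1}s\sqrt{v_L}+\bigl(g_4^2s^2\vartheta^{-2}v_L\bigr)^2e^{-\alpha_2s}. \tag{4.21} \]
Now, since $s=K\log L$, for $K$ large the right-hand side is less than $2\alpha_1g_4\vartheta^{-1}s\sqrt{v_L}$. Thus, if $L$ is so large that the latter does not exceed $\frac12\kappa^{-1}s\,v_L^{3/4}$ (i.e., if $4\alpha_1g_4\vartheta^{-1}s\sqrt{v_L}\le\kappa^{-1}s\,v_L^{3/4}$), then $\sigma\in R^3_{\kappa,s,L}$ and $\Gamma_s^{\mathrm{ext}}(\sigma)=\Gamma$ imply
\[ M_{\mathrm{Int}^\circ}\ \le\ \langle M_{\mathrm{Int}^\circ}\rangle^{-,\beta}_{\mathrm{Int}^\circ}-\frac12\kappa^{-1}s\,v_L^{3/4}. \tag{4.22} \]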
Let now $\kappa_0>0$ be such that $c_3'\le\vartheta^2(8\kappa_0^2\chi g_4^2)^{-1}$, where $\chi=\chi(\beta)$ is the susceptibility, and let $\kappa\le\kappa_0$. By Eq. (2.39) in Lemma 2.10 and the fact that $|\mathrm{Int}^\circ|\le g_4^2\vartheta^{-2}s^2v_L$, the right-hand side of (4.20) is bounded by $2e^{-c_3'\sqrt{v_L}}$. The bound (4.19) is thus proved.

4.3.2. Magnetization outside. Recall the definition of $\mathrm{Ext}^\circ$. Our first concern here is an upper bound on the total magnetization in $\mathrm{Ext}^\circ$. Let $R^4_{\kappa,s,L}$ be the event
\[ R^4_{\kappa,s,L} = \bigl\{\sigma : M_{\mathrm{Ext}^\circ}\ge m^\star|\mathrm{Ext}^\circ|-2\kappa m^\star v_L\bigr\}. \tag{4.23} \]
To bound the conditional probability of this event is easy; we will actually show that it can be included into the preceding ones for configurations contained in $\mathcal{M}_L=\{\sigma : M_L=m^\star|\Lambda_L|-2m^\star v_L\}$.
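Lemma 4.7. For any $\kappa>0$ and any $K<\infty$ there exists an $L_0<\infty$ such that
\[ R^4_{\kappa/2,s,L}\cap\mathcal{M}_L\ \subset\ \bigl(R^1_{\kappa,s,L}\cup R^2_{\kappa,s,L}\cup R^3_{\kappa,s,L}\bigr)\cap\mathcal{M}_L \tag{4.24} \]
for any $L\ge L_0$ and $s=K\log L$.

Proof. Let $\kappa$ and $K$ be fixed. Let us abbreviate $\mathrm{Int}^\circ=\mathrm{Int}^\circ_{s,L}(\sigma)$ and $\mathrm{Ext}^\circ=\mathrm{Ext}^\circ_{s,L}(\sigma)$ for a configuration $\sigma$ which we will take to be in $(R^1_{\kappa,s,L})^c\cap(R^2_{\kappa,s,L})^c\cap(R^3_{\kappa,s,L})^c\cap\mathcal{M}_L$. First, we note that if $\sigma\notin R^1_{\kappa,s,L}$, we can use Lemmas 4.3 and 4.4 to get
\[ |\Lambda_L|-\bigl(|\mathrm{Ext}^\circ|+|\mathrm{Int}^\circ|\bigr)\ \le\ g_4\kappa^{-1}s\sqrt{v_L} \tag{4.25} \]
and hence
\[ \bigl|M_L-M_{\mathrm{Ext}^\circ}-M_{\mathrm{Int}^\circ}\bigr|\ \le\ g_4\kappa^{-1}s\sqrt{v_L}. \tag{4.26} \]
Now, since the total magnetization is held fixed, i.e., $\sigma\in\mathcal{M}_L$, we have $M_L=m^\star|\Lambda_L|-2m^\star v_L$ and by a simple calculation we get
\[ \begin{aligned} M_{\mathrm{Ext}^\circ}&\le M_L-M_{\mathrm{Int}^\circ}+g_4\kappa^{-1}s\sqrt{v_L}\\ &= m^\star\bigl(|\Lambda_L|-|\mathrm{Int}^\circ|\bigr)-M_{\mathrm{Int}^\circ}+m^\star|\mathrm{Int}^\circ|-2m^\star v_L+g_4\kappa^{-1}s\sqrt{v_L}. \end{aligned} \tag{4.27} \]
At the expense of another factor of $g_4\kappa^{-1}s\sqrt{v_L}$, we can replace $|\Lambda_L|-|\mathrm{Int}^\circ|$ with $|\mathrm{Ext}^\circ|$. Finally, since $\sigma\notin R^2_{\kappa,s,L}\cup R^3_{\kappa,s,L}$ we can use the bounds
\[ M_{\mathrm{Int}^\circ}\ \ge\ -m^\star|\mathrm{Int}^\circ|-\kappa^{-1}s\,v_L^{3/4} \tag{4.28} \]
and
\[ |\mathrm{Int}^\circ|\ \le\ |V(\Gamma_s^{\mathrm{ext}}(\sigma))|\ \le\ (1-\kappa)v_L \tag{4.29} \]
in succession to arrive at
\[ M_{\mathrm{Ext}^\circ}\ \le\ m^\star|\mathrm{Ext}^\circ|-2m^\star\kappa v_L+2g_4\kappa^{-1}s\sqrt{v_L}+\kappa^{-1}s\,v_L^{3/4}. \tag{4.30} \]
From here we see that $\sigma\notin R^4_{\kappa/2,s,L}$ once $L$ is so large that the remaining terms on the right-hand side are swamped by $-m^\star\kappa v_L$.

Our second task concerning the magnetization outside the large external contours is to show that $M_{\mathrm{Ext}^\circ}-m^\star|\mathrm{Ext}^\circ|$ will not get substantially below the deficit value forced in by the condition on overall magnetization. (Note, however, that we have to allow for the possibility that $\mathrm{Ext}^\circ=\Lambda_L$ in which case the exterior takes the entire deficit.) Let $\kappa>0$ and consider the event
\[ R^5_{\kappa,s,L} = \bigl\{\sigma : M_{\mathrm{Ext}^\circ}\le m^\star|\mathrm{Ext}^\circ|-2m^\star(1+\kappa^{-1})v_L\bigr\}. \tag{4.31} \]
The probability of $R^5_{\kappa,s,L}$ is bounded as follows:

Lemma 4.8. For any $c_5>0$ there exist constants $\kappa_0>0$, $K_0<\infty$ and $L_0<\infty$ such that
\[ P_L^{+,\beta}\bigl(R^5_{\kappa,s,L}\,\big|\,M_L=m^\star|\Lambda_L|-2m^\star v_L\bigr)\le e^{-c_5\sqrt{v_L}} \tag{4.32} \]
for all $K\ge K_0$, $\kappa\le\kappa_0$, $L\ge L_0$, and $s=K\log L$.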
Proof. With as in (2.2) and c5 fixed, choose κ0 so that c5 ≤
w1 + − . 2 3κ0
(4.33)
For this κ0 > 0, let L0 be so large that for all L ≥ L0 , the finite-L expression on the right-hand side of (1.10) exceeds (1 + 2κ1 0 )−1 and, at the same time, L from Theorem 3.1 is bounded by /(6κ0 ). First, we can restrict ourselves to the complement of R1ϑ,s,L with ϑ so small that the corresponding c1 exceeds 2c5 . Once again using Lemma 2.9, we get
MExt◦ +,β◦ − m |Ext◦ | ≤ α1 g4 ϑ −1 s √vL + 4L + L4 e−α2 s ). Ext
(4.34)
Now, since s = K log L and vL ∼ L4/3 , for K sufficiently large the right-hand side does not exceed 8α1 L. Thus, if L is so large that the latter does not exceed m vL κ0−1 , it suffices to prove the corresponding bound for the event +,β R = σ : MExt◦ ≤ MExt◦ Ext◦ − m (2 + κ0−1 )vL .
(4.35)
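The size comparison made just before (4.35) can be illustrated numerically. The following is a rough sketch under explicit assumptions, not the authors' computation: the right-hand side of (4.34) is read as α1 (g4 ϑ^{-1} s √v_L + 4L) + L^4 e^{-α2 s}, the relation v_L ∼ L^{4/3} is taken with proportionality constant one, and all constants (α1, α2, g4, ϑ, K) are set to hypothetical sample values. It shows the boundary term 4 α1 L dominating and the exponential tail becoming negligible once L is large.

```python
import math


def rhs_4_34(L, K, alpha1=1.0, alpha2=1.0, g4=1.0, theta=1.0):
    """Assumed reading of the right-hand side of (4.34).

    All default constants are hypothetical sample values; v_L is modelled
    as exactly L**(4/3), so sqrt(v_L) = L**(2/3).
    """
    s = K * math.log(L)
    sqrt_vL = L ** (2.0 / 3.0)
    boundary_term = alpha1 * (g4 * s * sqrt_vL / theta + 4.0 * L)
    tail_term = L ** 4 * math.exp(-alpha2 * s)
    return boundary_term + tail_term


def within_target(L, K, alpha1=1.0):
    """Check whether the assumed right-hand side is at most 8 * alpha1 * L."""
    return rhs_4_34(L, K, alpha1=alpha1) <= 8.0 * alpha1 * L


if __name__ == "__main__":
    for L in (1e2, 1e4, 1e6):
        print(L, rhs_4_34(L, K=5.0), 8.0 * L, within_target(L, K=5.0))
```

With these sample constants the comparison fails for small L and holds for large L, which matches the role of the phrase "if L is so large" in the text.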
Clearly, R depends only on the configuration in Ext◦ , and thus (4.7) makes the estimates in Lemma 2.11 available. We get +,β
PL
(m v )2
1 2 L R sext (σ ) = ≤ C exp −2 1+ ◦ χ |Ext | 2κ0
1 √ ≤ C exp −w1 1 + vL . 2κ0
(4.36)
Here C = C(β) < ∞ is independent of and the second inequality follows from our assumption about L0 . Now, using (4.33) and the fact that L ≤ /(6κ0 ), we derive the bound +,β
PL
√ √
R sext (σ ) = ≤ Ce−w1 vL ( +L )−2c5 vL .
(4.37)
+,β
The claim then follows by multiplying both sides by PL (sext (σ ) = ), summing over all with the above properties and comparing the right-hand side with the lower bound in Theorem 3.1.
4.4. Proof of Theorem 4.1. The ultimate goal of this section is to rule out the occurrence of intermediate contours. As a first step we derive an upper bound on the probability of the occurrence of contours of intermediate sizes in√a contour ensemble constrained to not contain contours with diameters larger than κ vL . The relevant statement comes as Lemma 4.9. Once this lemma is established, we will give a proof of Theorem 4.1.
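4.4.1. A lemma for the restricted ensemble. Recall our notation $P_\Lambda^{+,\beta,s'}$ for the probability measure in volume $\Lambda\subset\Lambda_L$ conditioned on the event that the contour diameters do not exceed $s'$. We will show that the occurrence of intermediate contours is improbable in $P_\Lambda^{+,\beta,s'}$ with $s'=\kappa\sqrt{v_L}$ and magnetization restricted to “reasonable” values. For any $\Lambda\subset\Lambda_L$ and any $s>0$ and $\kappa>0$, let
\[ A^{c}_{\kappa,s,\Lambda} = \bigl\{\sigma : \text{there exists }\gamma\text{ in }\Lambda\text{ such that } s\le\operatorname{diam}\gamma\le\kappa\sqrt{v_L}\bigr\}. \tag{4.38} \]
Then we have the following estimates:

Lemma 4.9. For any $c_6>0$, $\phi_0>1$, and $\vartheta>1$, there exist $\kappa_0\in(0,1)$, $K_0<\infty$, and $L_0<\infty$, such that for $s=K\log L$, all $\kappa\in(0,\kappa_0]$, $K\ge K_0$, $L\ge L_0$, all $\Lambda\subset\Lambda_L$ satisfying the bounds
\[ |\Lambda|\ge\vartheta^{-1}L^2\quad\text{and}\quad|\partial\Lambda|\le\vartheta L, \tag{4.39} \]
and all $\phi\in[\kappa_0,\phi_0]$ that make $m^\star|\Lambda|-2\phi m^\star v_L$ an allowed value of $M_\Lambda$, we have
\[ P_\Lambda^{+,\beta,\kappa\sqrt{v_L}}\bigl(A^{c}_{\kappa,s,\Lambda}\,\big|\,M_\Lambda=m^\star|\Lambda|-2\phi m^\star v_L\bigr)\le L^{-c_6}. \tag{4.40} \]

Proof. Notice that the event $A^{c}_{\kappa,s,\Lambda}$ is monotone in $s=K\log L$ and thus it is sufficient to prove the claim for only a fixed $K$ (chosen suitably large). Let $\kappa_0\in(0,1)$ be fixed and let $\kappa\in(0,\kappa_0]$. (At the very end of the proof, we will have to assume that $\kappa_0$ is sufficiently small, see (4.54).) Fix a set $\Lambda\subset\mathbb{Z}^2$ satisfying (4.39) and let
\[ \mathcal{M}_\Lambda(\phi) = \bigl\{\sigma : M_\Lambda=m^\star|\Lambda|-2\phi m^\star v_L\bigr\}. \tag{4.41} \]
Let us define
\[ \delta_\Lambda = \langle M_\Lambda\rangle_\Lambda^{+,\beta,s}-m^\star|\Lambda| \tag{4.42} \]
and note that, on $\mathcal{M}_\Lambda(\phi)$, we have $M_\Lambda=\langle M_\Lambda\rangle_\Lambda^{+,\beta,s}-\delta_\Lambda-2\phi m^\star v_L$. The proof of (4.40) will be performed by writing the conditional probability as a quotient of two probabilities with unconstrained contour sizes and estimating separately the numerator and the denominator. Let
\[ E = \bigl\{\sigma : \forall\gamma\in\Gamma_s(\sigma),\ \operatorname{diam}\gamma\le\kappa\sqrt{v_L}\bigr\} \tag{4.43} \]
and, using the shorthand $A=A_{\kappa,s,\Lambda}$, write
\[ P_\Lambda^{+,\beta,\kappa\sqrt{v_L}}\bigl(A^{c}\,\big|\,\mathcal{M}_\Lambda(\phi)\bigr) = \frac{P_\Lambda^{+,\beta}\bigl(A^{c}\cap\mathcal{M}_\Lambda(\phi)\cap E\bigr)}{P_\Lambda^{+,\beta}\bigl(\mathcal{M}_\Lambda(\phi)\cap E\bigr)}. \tag{4.44} \]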
As to the bound on the denominator, we restrict the contour sizes in to s = K log L as in (3.5) and apply Lemmas 2.11 and 2.7 with the result +,β
P (M (ϕ) ∩ E) ≥
(m v )2 C1 m ϕ vL L 2 ϕ δ , exp −2 − 2 L2 χ || χ ||
(4.45)
where C1 = C1 (β, ϑ, ϕ0 ) > 0. Here, we note that two distinct terms were incorpo2 since, by Lemma 2.9 and rated into the constant C1 : First, a term proportional to δ (4.39), |δ | ≤ 2α1 ϑL once K is sufficiently large and thus |δ |2 /|| is bounded by a constant independent of L. Second, a term that comes from the bound (2.45) yielding log L δ |s (ϕvL + 2m )| ≤ C2 max{K 1/3 , 1} with some C2 = C2 (β, ϑ, ϕ0 ) < ∞. (Notice L
that, to get a constant C1 independent of L, we have to choose L0 after a choice of K is done.) Although the second term on the right-hand side of (4.45) is negligible compared to the first one, its exact form will be needed to cancel an inconvenient contribution of the complement of intermediate contours. In order to estimate the numerator, let = {s (σ ) : σ ∈ E, s (σ ) = ∅} be the set of all collections of s-large contours that can possibly contribute to E. (We also demand that s (σ ) = ∅, because on Ac there will be at least one s-large contour.) Then we have
+,β +,β +,β P Ac ∩ M (ϕ) ∩ E ≤ P M (ϕ) s (σ ) = P s (σ ) = . ∈
(4.46) +,β
Our strategy is to derive a bound on P (M (ϕ)|s (σ ) = ) which is uniform in +,β ∈ and to estimate P (s (σ ) = ) using the skeleton upper bound. Let ∈ and let S be an s-skeleton such that S ∼ . We claim that, for some C = C (β, ϑ) < ∞ and some η0 = η0 (β, ϑ) < ∞, independent of , S, κ0 and L, +,β
P (M (ϕ)|s (σ ) = ) +,β P (M (ϕ) ∩ E)
√
≤ C L2 eη0 κ 0 Wβ (S)
(4.47)
holds true. Indeed, let be the abbreviation for the set of external contours in and let S be the set of skeletons in S corresponding to . Recall the definition of Int and Int◦ and note that V( ) = Int and Wβ (S) ≥ W√ β (S ), since S ⊃ S . Also note that, by (2.10) and (2.11) and the fact that diam γ ≤ κ vL for all γ ∈ , we have
√
−1 √ |Int| ≤ g2 κ vL P(S) ≤ g2 κ0 τmin vL Wβ (S). (4.48) S∈S
√ This bound tells us that we might as well assume that |Int| ≤ κ0 vL . Indeed, in the opposite case, the bound (4.47) would directly follow by noting that (4.45) implies √ +,β −2 −η κ W (S) with η given by β 1 0 PL (M (ϕ) ∩ E) ≥ C1 L e 1 η1 = 2g2
(m ϕ)2 v 3/2 L
χ τmin ||
+
√ m ϕ δ v L . χ τmin ||
(4.49)
Notice that η1 is bounded uniformly in L and by (4.39) and the facts that < ∞ and δ √1 ϑL. A similar bound, using (2.9) instead of (2.10), shows that √ also |∂Int| ≤ √ ≤ 2α s vL / κ0 . Indeed, if the opposite is true, then (2.9–2.11) imply that κ0 Wβ (S) ≥ √ τmin g1−1 vL and we can proceed as before. √ √ √ Thus, let us assume that |Int| ≤ κ0 vL and |∂Int| ≤ s vL / κ0 hold true. In order for M (ϕ) to occur, the total magnetization in should deviate from m || by −2ϕm vL , while the volume Int can help the bulk only by at most −|Int|. More +,β,s precisely, MExt◦ is forced to deviate from its mean value MExt◦ Ext◦ by at least −2m u (and by not more than −2m u − 2|Int|) where u is defined by −2m u = −2ϕm vL − δExt◦ + 2|Int|,
(4.50)
√ with δExt◦ as in (4.42). By the estimates |Int| ≤ κ0 vL , |Ext◦ | ≥ 21 ϑ −1 L2 , |∂Ext◦ | ≤ 2ϑL, and u ≤ C3 L4/3 L2 / log L, with C3 = C3 (β, ϑ, ϕ0 ) (all these bounds hold
√ for L sufficiently large—in particular, to ensure that K vL log L ≤ ϑL), we now have, once more, Lemma 2.11 at our disposal. Thus,
(m v )2
m ϕvL L M (ϕ) s (σ ) = ≤ C4 exp −2 ϕ2 − 2 δExt◦ − 2|Int| , χ || χ || (4.51)
+,β
P
where C4 = C4 (β, ϑ, ϕ0 ) < ∞. Similarly as in (4.45), the constant C4 incorporates also the error term sExt◦ (u). To compare the right-hand side of (4.51) and (4.45), we invoke the second part of Lemma 2.9 to note that, for K sufficiently large and some α1 = α1 (β) < ∞, δ − δExt◦ ≤ α1 | \ Ext◦ |.
(4.52)
√ Using (4.48) again, |Int| is bounded by a constant times κ0 Wβ (S) vL and the same ◦ holds for | \ Ext |. Therefore, there is a constant η2 = η2 (β, ϑ) < ∞, independent of κ0 , such that 2
m ϕvL δ − δExt◦ + 2|Int| ≤ η2 κ0 Wβ (S), χ ||
(4.53)
holds true for all ∈ and their associated skeletons S. By combining this with (4.51) and (4.45), the bound (4.47) is established with η0 = max{η1 , η2 }, which we recall is independent of κ0 . With (4.47), the proof is easily concluded. Indeed, a straightforward application of the skeleton bound to the second term on the right-hand side of (4.46) then shows that √ 2 −(1−η √κ )W (S) +,β,κ vL c
β 0 0 P A M (ϕ) ≤ CL e . (4.54) S =∅
√ Now, choosing κ0 sufficiently small, we have 1−η0 κ0 > 2/3. Then we can extract the 1 term C e− 3 Wβ (S) which, choosing the K in s = K log L sufficiently large, can be made less than L−2−c6 , for any c6 initially prescribed. Invoking Lemma 2.5, the remaining sum is then estimated by one. 4.4.2. Absence of intermediate contours. Lemmas 4.2 and 4.5–4.9 finally put us in the position to rule out the intermediate contours altogether. +,β
Proof of Theorem 4.1. Recall that our goal is to prove (4.2), i.e., PL (Ac |ML ) ≤ L−c0 . Pick any c0 > 0 and κ0 < 1. Let K0 and L0 be chosen so that Lemmas 4.2, 4.5, 4.6, and 4.8 hold with some c1 , c2 , c3 , c5 > 0 for all κ ≤ 2κ0 , K ≥ K0 and L ≥ L0 . We also assume that L0 is chosen so that Lemma 4.7 is valid for κ = 2κ0 . We wish to restrict attention to configuration outside the sets R1κ0 ,s,L , R4κ0 ,s,L and R5κ0 ,s,L , but since R4κ0 ,s,L is essentially included in R2κ0 ,s,L and R3κ0 ,s,L , we might as well focus on the event Rc , where R = 5=1 Rκ0 ,s,L . Fix any κ ≤ κ0 , let s = K log L and let us introduce the shorthand A = Aκ ,s,L . Appealing to the aforementioned lemmas, our +,β goal will be achieved if we establish the bound PL (Ac ∩ Rc |ML ) ≤ L−2c0 . √ Let us abbreviate q = κ vL and let = {qext (σ ) : σ ∈ Rc } be the set of all collections of external contours that can possibly arise from Rc . Fix ∈ and recall
our notation Ext◦ for the exterior component of L induced by the contours in . To prove (4.2), it suffices to show that, for all ∈ ,
+,β +,β PL Ac ∩ Rc ∩ ML qext (σ ) = ≤ L−2c0 PL ML qext (σ ) = . (4.55) +,β
Indeed, multiplying (4.55) by PL (qext (σ ) = ) and summing over all ∈ , we derive that +,β +,β PL Ac ∩ Rc ∩ ML ≤ L−2c0 PL (ML ). (4.56) +,β
+,β
Thence, PL (Ac ∩ Rc |ML ) ≤ L−2c0 which, in light of the bound PL (R|ML ) ≤ √ −c v L , where c = min{c , c , c , c }, implies (4.2) once L is sufficiently large. 4e 1 2 3 5 It remains to prove (4.55) for all ∈ . Let ϕ ≥ 0 be such that m |Ext◦ | − 2ϕm vL is an allowed value of MExt◦ and consider the corresponding event MExt◦ (ϕ) (cf. (4.41)). Note that, by the restriction to the complements of R4κ0 ,s,L and R5κ0 ,s,L , we only need to consider ϕ ∈ [κ0 , 1 + κ0−1 ]. We claim that, for all such allowed values of ϕ, we have
+,β,q +,β PL Ac {qext (σ ) = } ∩ ML ∩ MExt◦ (ϕ) = PExt◦ Ac MExt◦ (ϕ) . (4.57) Indeed, given that qext (σ ) = , the event A depends only on the configurations in Ext◦ . Moreover, ML ∩ MExt◦ (ϕ) can be written as an intersection of MExt◦ (ϕ), which also depend only on σ in Ext◦ , and the event {σ : ML \Ext◦ = m (|L | − |Ext◦ |) − 2m (1 − ϕ)vL }, which depends only on the configuration in Int◦ . Thus, (4.57) follows from (4.7) and some elementary manipulations. By the restriction to the complement of R1κ0 ,s,L , we have |Ext◦ | ≥ L2 /2 and |∂Ext◦ | ≤ 8L for all ∈ . Choosing now c6 = 2c0 and then K0 and L0 (if necessary, even bigger than before) so that Lemma 4.9 can be applied, the right-hand side of (4.57) can be bounded by L−c6 = L−2c0 uniformly in ∈ , provided κ is sufficiently small and L ≥ L0 . Using (4.57), we thus have
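$$\begin{aligned}
P_L^{+,\beta}\bigl(\mathcal A^c\cap\mathcal R^c\cap\mathcal M_L\cap\mathcal M_{\mathrm{Ext}^\circ}(\varphi)\,\big|\,\Gamma_q^{\rm ext}(\sigma)=\Gamma\bigr)
&\le P_L^{+,\beta}\bigl(\mathcal A^c\,\big|\,\{\Gamma_q^{\rm ext}(\sigma)=\Gamma\}\cap\mathcal M_L\cap\mathcal M_{\mathrm{Ext}^\circ}(\varphi)\bigr)\\
&\qquad\times P_L^{+,\beta}\bigl(\mathcal M_L\cap\mathcal M_{\mathrm{Ext}^\circ}(\varphi)\,\big|\,\Gamma_q^{\rm ext}(\sigma)=\Gamma\bigr)\\
&\le L^{-2c_0}\,P_L^{+,\beta}\bigl(\mathcal M_L\cap\mathcal M_{\mathrm{Ext}^\circ}(\varphi)\,\big|\,\Gamma_q^{\rm ext}(\sigma)=\Gamma\bigr),
\end{aligned}\tag{4.58}$$
for all $\varphi$ for which $m^\star|\mathrm{Ext}^\circ|-2\varphi m^\star v_L$ is an allowed value of $M_{\mathrm{Ext}^\circ}$. (In the cases when $\varphi\notin[\kappa_0,1+\kappa_0^{-1}]$ we have $\mathcal R^c\cap\mathcal M_{\mathrm{Ext}^\circ}(\varphi)=\emptyset$ and the left-hand side vanishes.) This implies (4.55) by summing over all allowed values of $\varphi$.

5. Proof of Main Results

Having established the absence of intermediate-size contours, we are now in a position to prove our main results.

Proof of Theorem 1.2. Fix a $\zeta>0$ and recall our notation $\mathcal M_L=\{\sigma\colon M_L=m^\star|\Lambda_L|-2m^\star v_L\}$. Our goal is to estimate the conditional probability $P_L^{+,\beta}(\mathcal A^c_{\kappa,s,L}\cup\mathcal B^c_{\epsilon,s,L}|\mathcal M_L)$ by $L^{-\zeta}$. Let $c_0>\zeta$ and note that, by Theorem 4.1, we have
$$P_L^{+,\beta}\bigl(\mathcal A^c_{\kappa,s,L}\,\big|\,\mathcal M_L\bigr)\le L^{-c_0}, \tag{5.1}$$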
provided κ is sufficiently small and L sufficiently large. This means we can restrict our c attention to the event B,s,L \Acκ ,s,L . Furthermore, we can use Lemmas 4.2, 4.5, 4.6, and 4.7 to exclude the events R1ϑ,s,L , R2ϑ,s,L , R3ϑ,s,L , and R4ϑ,s,L , provided ϑ is sufficiently small. We therefore introduce the event E,κ ,ϑ defined by c E,κ ,ϑ = B,s,L \ (Acκ ,s,L ∪ R1ϑ,s,L ∪ R2ϑ,s,L ∪ R3ϑ,s,L ∪ R4ϑ,s,L ),
(5.2)
where we have suppressed $s=K\log L$ and $L$ from the notation. On the basis of the aforementioned lemmas, the proof of Theorem 1.2 will follow if we can establish that for each $\kappa>0$ and each $\epsilon>0$ there are $K_0<\infty$, $\vartheta>0$ and $c_7>0$ such that
$$P_L^{+,\beta}\bigl(\mathcal E_{\epsilon,\kappa,\vartheta}\,\big|\,\mathcal M_L\bigr)\le e^{-c_7\sqrt{v_L}} \tag{5.3}$$
whenever L is sufficiently large. The proof of (5.3) will be performed by conditioning on the set of s-large exterior contours and applying separately the Gaussian estimates and the skeleton upper bound. The argument will be split into several cases, depending on which of the bounds (1.14–1.16) constituting the event B,s,L fail to hold. 1 2 1 Let us write E,κ ,ϑ as the disjoint union E, κ ,ϑ ∪ E,κ ,ϑ , where E,κ ,ϑ is the set of all 2 1 configurations on which one of (1.14) or (1.15) fail and where E,κ ,ϑ = E,κ ,ϑ \ E, κ ,ϑ . ext Let = {s (σ ) : σ ∈ E,κ ,ϑ } be the set of all collections of exterior contours allowed c by E,κ ,ϑ . (Here s = K log L.) Since s (σ ) is non-empty for all σ contributing to B,s,L , we have = ∅ for all ∈ . Let λ = vL−1 |V ()|.
(5.4)
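The volume fraction defined in (5.4) is the variable in which the competition between the surface cost and the Gaussian cost is expressed. As a rough numerical illustration only, the sketch below assumes the variational function has the form Φ_Δ(λ) = √λ + Δ(1 − λ)², which matches the structure of the left-hand side of (5.14) but is not restated in this section. The names `phi`, `minimizer` and the value `DELTA_C` = ½(3/2)^{3/2} are assumptions introduced for this example.

```python
import math

# Assumed threshold: the value of Delta at which phi(0) == phi(2/3)
# for the assumed form of the variational function below.
DELTA_C = 0.5 * 1.5 ** 1.5


def phi(lam, delta):
    """Assumed variational function: sqrt(lam) + delta * (1 - lam)^2."""
    return math.sqrt(lam) + delta * (1.0 - lam) ** 2


def minimizer(delta, steps=100_000):
    """Grid search for the minimizing volume fraction lam in [0, 1]."""
    best = min(range(steps + 1), key=lambda n: phi(n / steps, delta))
    return best / steps


for delta in (0.5, DELTA_C, 2.0):
    lam = minimizer(delta)
    print(f"Delta = {delta:.4f}: minimizer ~ {lam:.4f}, minimum ~ {phi(lam, delta):.4f}")
```

Under this assumption the minimizer sits at λ = 0 for small Δ and jumps to a value above 2/3 for large Δ, which is the dichotomy used in the proof of Corollary 1.3.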
To apply the Gaussian estimate, we need the following upper bound on the magnetization in Ext◦ : Lemma 5.1. Let > 0, κ > 0 and ϑ > 0 and let the K in s = K log L be sufficiently large. Then there exists a sequence (κL ) with limL→∞ κL = 0 such that for both i ext i = 1, 2, all ∈ and all σ ∈ ML ∩ E, κ ,ϑ ∩ {s (σ ) = }, the magnetization MExt◦ = MExt◦s,L (σ ) (σ ) obeys the bound +,β,s
MExt◦ ≤ MExt◦ Ext◦ − 2m vL (1 − λ + i − κL ).
(5.5)
Here 1 = 0 and 2 = /(2m ). Proof. Recall the exact definition of Ext◦ . The proof is similar in spirit to the reasoning 1 (4.29–4.30). First we will address the case of configurations in E, κ ,ϑ . Using the equality ML = m |L | − 2m vL and our restriction to the complement of R1ϑ,s,L , we have √ ML ≤ m |Ext◦ | + m |V ()| − 2m vL + g4 ϑ −1 s vL ,
(5.6)
√ where g4 ϑ −1 s vL bounds the volume of Ext \ Ext◦ according to Lemma 4.3. Next, in view of the restriction to (R3ϑ,s,L )c , we have √ 3/4 MV() ≥ −m |V ()| − ϑ −1 svL − g4 ϑ −1 s vL .
(5.7)
√ Finally, since MExt◦ ≤ ML − MV() + g4 ϑ −1 s vL and since (4.34) implies that +,β,s m |Ext◦ | − MExt◦ Ext◦ can be bounded by 8α1 L once K is sufficiently large, we have (5.5) with κL given by −1/4
$$2m^\star\kappa_L=\vartheta^{-1}s\,v_L^{-1/4}+3g_4\vartheta^{-1}s\,v_L^{-1/2}+8\alpha_1Lv_L^{-1}. \tag{5.8}$$
Since vL ∼ L4/3 , we have limL→∞ κL = 0 as claimed. 2 Next we will attend to the case of configurations from E, κ ,ϑ , for which the bound 2 3 c (1.16) must fail. Since E,κ ,ϑ is still a subset of (Rϑ,s,L ) , we still have the bound (5.7) at our disposal implying that MV() ≥ −m |V ()| − vL once L is sufficiently large. However, this means that the only way (1.16) can fail is that, in fact, the lower bound MV() ≥ −m |V ()| + vL
(5.9)
holds. Substituting this stronger bound in the above derivation in the place of (5.7), the desired estimate follows. With Lemma 5.1 in hand, we are ready to start proving the bound (5.3). We begin with the Gaussian estimate. By the restriction to the complement of R2ϑ,s,L , we have the bound λ ≤ 1 − ϑ and thus 1 − λ + i − κL ≥ 0 once L is sufficiently large. Moreover, since we also discarded R1ϑ,s,L , Lemma 2.11 for A = Ext◦ applies. Combining this with the observation (4.7) and the bound (5.5), there exists a constant C < ∞ such that
ext (m vL )2 +,β i 2
PL ML ∩ E,κ ,ϑ s (σ ) = ≤ C exp −2 (1 − λ + i − κL ) χ |L | (5.10) holds for all ∈ . Next we will estimate the probability that sext (σ ) = . Let S be a collection of skeletons corresponding to . The skeleton upper bound in Lemma 2.4 along with the estimates featured in Lemma 2.5 then yields +,β PL sext (σ ) = ≤ e−Wβ (S ) ≤ C e−Wβ (S), (5.11) S ⊇S
where C < ∞ and where S corresponds to the skeleton of a full set s (σ ) with sext (σ ) = . i ext To estimate the probability of ML ∩ E, κ ,ϑ ∩ {s (σ ) = }, we will write as the union of two disjoint sets, = 1 ∪ 2 . Here 1 = ∈ : ∃S ∼ , Wβ (S) ≤ w1 λ vL (1 + c−2 ) , (5.12) where c is the constant from Lemma 2.8, and 2 = \ 1 . First we will study the √ cases when ∈ 1 . By the restriction to the event Aκ ,s,L , we know that diam γ ≥ κ vL for all γ ∈ . Using that λ ≤ 1 − ϑ—recall that we are in the complement of R2ϑ,s,L — √ we have diam γ ≥ c(c−2 ) |V ()| whenever κ ≥ /c. Moreover, √ the upper bound on Wβ (S) from (5.12) along with the estimate Wβ (S) ≥ τ κ vL imply that λ min √ √ is bounded away from zero and thus |V ()| = λ vL ≥ s for L sufficiently large. This verifies the assumptions of Lemma 2.8 with replaced by c−2 , which then guarantees that is a singleton, = {γ0 }, and that √ inf dH V (γ0 ), |V (γ0 )|W + z ≤ |V (γ0 )|. (5.13) z∈R2
Now, |V (γ0 )| = λ vL ≤ vL√(because, as noted before, λ ≤ 1), which means that the i right-hand side is less than vL and (1.14) holds. But on E, κ ,ϑ the event B,s,L must fail, so we must have either that (λ ) > + , which only applies when i = 1, or that (1.16) fails, which only applies when i = 2. We claim that, in both cases, there exists an > 0 and an α > 0—both proportional to —such that for some S ∼ and L sufficiently large, we have (1 − α)Wβ (S) + 2
√ (m vL )2 (1 − λ + i − κL )2 ≥ w1 vL + . χ |L |
(5.14)
Indeed, the Wulff variational problem in conjunction with Lemma 2.3, the restriction to (R1ϑ,s,L )c and the bound (1 − x)1/2 ≥ 1 − x for x ∈ [0, 1] imply that √ 1/2 Wβ (S) ≥ w1 |V(S)|1/2 ≥ w1 |V (γ0 )| − g3 ϑ −1 s 2 vL −1 2 ≥ w 1 λ v L − g 3 w 1 ϑ λ s .
(5.15)
3/2
Observing also that the difference 2(m )2 vL /(χ |L |) − w1 → 0 as L → ∞, the left hand side of (5.14) can be bounded from below by √ √ √ w1 vL (λ ) − αw1 λ vL − δL vL + 2w1 vL (i − κL )ϑ,
(5.16)
where δL → 0 (as well as κL → 0) with L → ∞. (Here we again used that 1−λ ≥ ϑ.) Now, for i = 1 we have (λ ) > + from which (5.14) follows once α < and L is sufficiently large. For i = 2, we use (λ ) ≥√ and√get the same conclusion since (5.16) now contains the positive term 2w1 2 vL ∝ vL . By putting (5.10) and (5.11) together, applying (5.14), choosing K ≥ K0 (α, β) and invoking Lemma 2.5 to bound the sum over all skeletons S, we find that +,β
PL
√ ML ∩ E,κ ,ϑ ∩ {sext (σ ) ∈ 1 } ≤ 2CC exp −w1 vL + ,
(5.17)
whenever L is sufficiently large. (Here the embarrassing factor “2” comes from combining the corresponding estimates for i = 1 and i = 2.) Thus, we are down to √the cases ∈ 2 , which means that for every skeleton S ∼ , we have Wβ (S) > w1 λ vL (1 + c−2 ). Moreover, since E,ϑ,κ ⊂√Aκ ,s,L , all s-large contours that we have to consider actually satisfy that diam γ ≥ κ vL . In particular, √ we also have that Wβ (S) ≥ τmin κ vL . Combining these bounds we derive that, for some c > 0 and regardless of the value of λ , Wβ (S) ≥ w1
λ + c
√
vL .
(5.18)
Disregarding the factor i in (5.10) and performing similar estimates as in the derivation of (5.17), we find that (5.14) holds again for some α > 0. Hence an analogue of (5.17) is valid also for all ∈ 2 . A combination of these estimates in conjunction with Theorem 3.1 show that, indeed, (5.3) is true with a c7 proportional to . This finishes the proof. The previous proof immediately provides us with the proof of the other main results:
Proof of Theorem 1.1. In light of Theorem 3.1, we need to prove an appropriate upper +,β bound on PL (ML ), where ML = {σ : ML = m |L | − 2m vL }. First we note +,β +,β that for L sufficiently large, the probability PL (ML ) is comparable with PL (FL ), where FL is the event c (5.19) FL = ML ∩ Aκ ,s,L ∩ B,s,L ∩ R1ϑ,s,L ∪ R3ϑ,s,L ∪ R4ϑ,s,L with , κ, ϑ as in the proof of Theorem 1.2. But on FL , we have at most one large contour and the skeleton and Gaussian upper bounds readily give us that +,β
PL
(FL ) ≤ Ce−w1
√
vL ( − )
(5.20)
for some C < ∞ and some > 0 proportional to . From here and Theorem 3.1, the claim (1.11) follows by letting L → ∞ and ↓ 0. Our last task is to prove Corollary 1.3. Proof of Corollary 1.3. By Proposition 2.1, if < c , the unique minimizer of (λ) is λ = 0. Thus, for > 0 sufficiently small and L large enough, the contour volumes are restricted to a small number times vL . Since (1.14) says that the contour volume √ is proportional to the square of its diameter, this (eventually) forces diam γ < κ vL for any fixed κ > 0. But that contradicts the fact that Aκ ,s,L holds for a κ sufficiently small. Hence, no such intermediate γ exists and all contours have a diameter smaller than K log L. In the cases > c , the function (λ) is minimized only by a non-zero λ (which is, in fact, larger than 2/3) and so the scenarios without large contours are exponentially √ suppressed. Since, again, diam γ > κ vL for all potential contours, Theorem 1.2 guarantees that there is only one such contour and it obeys the bounds (1.14) and (1.15). All the other contours have diameter less than K log L. Acknowledgement. The research of L.C. was supported by the NSF under the grant DMS-9971016 and by the NSA under the grant NSA-MDA 904-00-1-0050. The research of R.K. was supported by the ˇ 201/00/1149 and MSM 110000001. R.K. would also like to thank the UCLA Department grants GACR of Mathematics and the Max-Planck Institute for Mathematics in Leipzig for their hospitality as well as the A. von Humboldt Foundation whose Award made the stay in Leipzig possible.
References
1. Abraham, D.B., Martin-Löf, A.: The transfer matrix for a pure phase in the two-dimensional Ising model. Commun. Math. Phys. 31, 245–268 (1973)
2. Aizenman, M., Chayes, J.T., Chayes, L., Newman, C.M.: Discontinuity of the magnetization in one-dimensional 1/|x − y|^2 Ising and Potts models. J. Stat. Phys. 50, 1–40 (1988)
3. Alexander, K.: Cube-root boundary fluctuations for droplets in random cluster models. Commun. Math. Phys. 224, 733–781 (2001)
4. Alexander, K., Chayes, J.T., Chayes, L.: The Wulff construction and asymptotics of the finite cluster distribution for two-dimensional Bernoulli percolation. Commun. Math. Phys. 131, 1–51 (1990)
5. Ben Arous, G., Deuschel, J.-D.: The construction of the d + 1-dimensional Gaussian droplet. Commun. Math. Phys. 179, 467–488 (1996)
6. Benettin, G., Gallavotti, G., Jona-Lasinio, G., Stella, A.: On the Onsager-Yang value of the spontaneous magnetization. Commun. Math. Phys. 30, 45–54 (1973)
7. Binder, K.: Theory of evaporation/condensation transition of equilibrium droplets in finite volumes. Physica A 319, 99–114 (2003)
8. Binder, K.: Reply to ‘Comment on “Theory of the evaporation/condensation transition of equilibrium droplets in finite volumes”’. Physica A 327, 589–592 (2003)
9. Binder, K., Kalos, M.H.: Critical clusters in a supersaturated vapor: Theory and Monte Carlo simulation. J. Statist. Phys. 22, 363–396 (1980)
10. Biskup, M., Chayes, L., Kotecký, R.: On the formation/dissolution of equilibrium droplets. Europhys. Lett. 60(1), 21–27 (2002)
11. Biskup, M., Chayes, L., Kotecký, R.: Comment on “Theory of the evaporation/condensation transition of equilibrium droplets in finite volumes”. Physica A 327, 583–588 (2003)
12. Biskup, M., Borgs, C., Chayes, J.T., Kotecký, R.: Gibbs states of graphical representations of the Potts model with external fields. J. Math. Phys. 41, 1170–1210 (2000)
13. Bodineau, T.: The Wulff construction in three and more dimensions. Commun. Math. Phys. 207, 197–229 (1999)
14. Bodineau, T., Ioffe, D., Velenik, Y.: Rigorous probabilistic analysis of equilibrium crystal shapes. J. Math. Phys. 41, 1033–1098 (2000)
15. Bolthausen, E., Ioffe, D.: Harmonic crystal on the wall: a microscopic approach. Commun. Math. Phys. 187, 523–566 (1997)
16. Borgs, C., Kotecký, R.: Surface-induced finite-size effects for first-order phase transitions. J. Stat. Phys. 79, 43–115 (1995)
17. Bricmont, J., Lebowitz, J.L., Pfister, C.E.: On the local structure of the phase separation line in the two-dimensional Ising system. J. Statist. Phys. 26(2), 313–332 (1981)
18. Campanino, M., Chayes, J.T., Chayes, L.: Gaussian fluctuations of connectivities in the subcritical regime of percolation. Probab. Theory Rel. Fields 88, 269–341 (1991)
19. Campanino, M., Ioffe, D.: Ornstein-Zernike theory for the Bernoulli bond percolation on Z^d. Ann. Probab. 30(2), 652–682 (2002)
20. Campanino, M., Ioffe, D., Velenik, Y.: Ornstein-Zernike theory for the finite-range Ising models above Tc. Probab. Theory Rel. Fields 125(3), 305–349 (2003)
21. Cerf, R.: Large deviations for three dimensional supercritical percolation. Astérisque 267, vi+177 (2000)
22. Cerf, R., Pisztora, A.: On the Wulff crystal in the Ising model. Ann. Probab. 28, 947–1017 (2000)
23. Chayes, J.T., Chayes, L., Fisher, D.S., Spencer, T.: Correlation length bounds for disordered Ising ferromagnets. Commun. Math. Phys. 120, 501–523 (1989)
24. Chayes, J.T., Chayes, L., Schonmann, R.H.: Exponential decay of connectivities in the two-dimensional Ising model. J. Statist. Phys. 49, 433–445 (1987)
25. Curie, P.: Sur la formation des cristaux et sur les constantes capillaires de leurs différentes faces. Bull. Soc. Fr. Mineral. 8, 145 (1885); Reprinted in Œuvres de Pierre Curie, Paris: Gauthier-Villars, 1908, pp. 153–157
26. Dobrushin, R.L., Hryniv, O.: Fluctuation of the phase boundary in the 2D Ising ferromagnet. Commun. Math. Phys. 189, 395–445 (1997)
27. Dobrushin, R.L., Kotecký, R., Shlosman, S.B.: Wulff construction. A global shape from local interaction. Providence, RI: Am. Math. Soc., 1992
28. Dobrushin, R.L., Shlosman, S.B.: Large and moderate deviations in the Ising model. In: Probability contributions to statistical mechanics, Adv. Soviet Math., Vol. 20, Providence, RI: Amer. Math. Soc., 1994, pp. 91–219
29. Dunlop, F., Magnen, J., Rivasseau, V., Roche, Ph.: Pinning of an interface by a weak potential. J. Statist. Phys. 66, 71–98 (1992)
30. Edwards, R.G., Sokal, A.D.: Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D 38, 2009–2012 (1988)
31. Georgii, H.-O.: Gibbs Measures and Phase Transitions. de Gruyter Studies in Mathematics, Vol. 9, Berlin: Walter de Gruyter & Co., 1988
32. Georgii, H.-O., Häggström, O., Maes, C.: The random geometry of equilibrium phases. In: C. Domb and J.L. Lebowitz (eds), Phase Transitions and Critical Phenomena, Vol. 18, New York: Academic Press, 1999, pp. 1–142
33. Gibbs, J.W.: On the equilibrium of heterogeneous substances. (1876). In: Collected Works, Vol. 1, London: Longmans, Green and Co., 1928
34. Griffiths, R.B., Hurst, C.A., Sherman, S.: Concavity of magnetization of an Ising ferromagnet in a positive external field. J. Math. Phys. 11, 790–795 (1970)
35. Grimmett, G.R.: The stochastic random cluster process and the uniqueness of random cluster measures. Ann. Probab. 23, 1461–1510 (1995)
36. Gross, D.H.E.: Microcanonical Thermodynamics: Phase Transitions in “Small” Systems. Lecture Notes in Physics, Vol. 66, Singapore: World Scientific, 2001
37. Hryniv, O., Kotecký, R.: Surface tension and the Ornstein-Zernike behaviour for the 2D Blume-Capel model. J. Stat. Phys. 106(3-4), 431–476 (2002)
38. Ioffe, D.: Large deviations for the 2D Ising model: a lower bound without cluster expansions. J. Statist. Phys. 74, 411–432 (1994)
39. Ioffe, D.: Exact large deviation bounds up to Tc for the Ising model in two dimensions. Probab. Theory Rel. Fields 102, 313–330 (1995)
40. Ioffe, D., Schonmann, R.H.: Dobrushin-Kotecký-Shlosman theorem up to the critical temperature. Commun. Math. Phys. 199, 117–167 (1998)
41. Kaufman, B., Onsager, L.: Crystal statistics. III. Short range order in a binary Ising lattice. Phys. Rev. 76, 1244–1252 (1949)
42. Krishnamachari, B., McLean, J., Cooper, B., Sethna, J.: Gibbs-Thomson formula for small island sizes: Corrections for high vapor densities. Phys. Rev. B 54, 8899–8907 (1996)
43. Lee, J., Kosterlitz, J.M.: Finite-size scaling and Monte Carlo simulations of first-order phase transitions. Phys. Rev. B 43, 3265–3277 (1990)
44. Machta, J., Choi, Y.S., Lucke, A., Schweizer, T., Chayes, L.M.: Invaded cluster algorithm for Potts models. Phys. Rev. E 54, 1332–1345 (1996)
45. Müller, T., Selke, W.: Stability and diffusion of surface clusters. Eur. Phys. J. B 10, 549–553 (1999)
46. Neuhaus, T., Hager, J.S.: 2d crystal shapes, droplet condensation and supercritical slowing down in simulations of first order phase transitions. J. Statist. Phys. 113, 47–83 (2003)
47. Onsager, L.: Crystal statistics. I. A two-dimensional model with an order-disorder transition. Phys. Rev. 65, 117–149 (1944)
48. Pfister, C.-E.: Large deviations and phase separation in the two-dimensional Ising model. Helv. Phys. Acta 64, 953–1054 (1991)
49. Pfister, C.-E., Velenik, Y.: Large deviations and continuum limit in the 2D Ising model. Probab. Theory Rel. Fields 109, 435–506 (1997)
50. Pfister, C.-E., Velenik, Y.: Interface, surface tension and reentrant pinning transition in 2D Ising model. Commun. Math. Phys. 204, 269–312 (1999)
51. Pleimling, M., Hüller, A.: Crossing the coexistence line at constant magnetization. J. Statist. Phys. 104, 971–989 (2001)
52. Pleimling, M., Selke, W.: Droplets in the coexistence region of the two-dimensional Ising model. J. Phys. A: Math. Gen. 33, L199–L202 (2000)
53. Schonmann, R.H., Shlosman, S.B.: Wulff droplets and the metastable relaxation of kinetic Ising models. Commun. Math. Phys. 194, 389–462 (1998)
54. Simon, B.: The Statistical Mechanics of Lattice Gases. Vol. I, Princeton Series in Physics, Princeton, NJ: Princeton University Press, 1993
55. Wulff, G.: Zur Frage der Geschwindigkeit des Wachsthums und der Auflösung der Krystallflächen. Z. Krystallog. Mineral. 34, 449–530 (1901)

Communicated by H. Spohn
Commun. Math. Phys. 242, 185–219 (2003)
Digital Object Identifier (DOI) 10.1007/s00220-003-0940-3
Communications in Mathematical Physics

An Area-Preserving Action of the Modular Group on Cubic Surfaces and the Painlevé VI Equation

Katsunori Iwasaki
Faculty of Mathematics, Kyushu University, 6-10-1 Hakozaki, Higashi-ku, Fukuoka 812-8581 Japan. E-mail: [email protected]
Received: 28 October 2002 / Accepted: 2 May 2003
Published online: 26 September 2003 – © Springer-Verlag 2003

Abstract: We construct an area-preserving action of the modular group on a general 4-parameter family of affine cubic surfaces. We present a geometrical background behind this construction, that is, a natural symplectic structure on a moduli space of rank two linear monodromy representations over the 2-dimensional sphere with four punctures, and a natural symplectic action upon it of the braid group on three strings. Studying this action as a discrete dynamical system will be important in discussing the monodromy of the Painlevé VI equation.
1. Introduction

In the announcement, Iwasaki [18], the author constructed a modular group action on a general 4-parameter family of affine cubic surfaces and stated that this action represents the nonlinear monodromy of the Painlevé VI equation. In this paper we shall give a detailed account of this construction, shedding light on a geometrical background behind it, that is, a natural symplectic structure on a moduli space of rank two linear monodromy representations over the 2-dimensional sphere with four punctures, and a natural symplectic action upon it of the braid group on three strings.

Let us briefly recall our construction. Let $\mathbb C^7=\mathbb C^3\times\mathbb C^4$ be the complex 7-space with coordinates $(x,a)$, where $x=(x_1,x_2,x_3)\in\mathbb C^3$ are the space variables and $a=(a_1,a_2,a_3,a_4)\in\mathbb C^4$ are the parameters. Throughout the paper we denote by $(i,j,k)$ any cyclic permutation of $(1,2,3)$ and put
$$\theta_i(a)=a_ia_4+a_ja_k\qquad(i=1,2,3).$$
Let $g_i\colon(x,a)\mapsto(x',a')$ be a polynomial automorphism of $\mathbb C^7$ defined by
$$g_i\colon\quad\begin{cases}x_i'=\theta_j(a)-x_j-x_kx_i,\quad x_j'=x_i,\quad x_k'=x_k,\\[2pt] a_i'=a_j,\quad a_j'=a_i,\quad a_k'=a_k,\quad a_4'=a_4.\end{cases}\tag{1.1}$$
We remark that the action of $g_i$ on the parameters $a=(a_1,a_2,a_3,a_4)$ is very simple; it just permutes $a_i$ and $a_j$, while keeping $a_k$ and $a_4$ fixed. A transformation rule similar to (1.1) has been given by Dubrovin and Mazzocco [7] for a special case of the Painlevé VI equation, which inspired our treatment of the general case in this paper. We should also remark that a classical work [20] by Jimbo and a recent work [10] by Guzzetti settle the connection problem for the general 4-parameter family of the Painlevé VI equation. However, the contents and presentation of this paper are quite different from theirs. We employ geometrical and group theoretical points of view. In particular, dynamical and symplecto-geometrical aspects are the main focus of this paper, while the papers [20, 10] provide a thorough treatment of the analytic side of the problem. The present paper has no emphasis on the latter aspect.

Let $G=\langle g_1,g_2,g_3\rangle$ be the group generated by the transformations $g_1,g_2,g_3$. Then an inspection shows that the generators satisfy three relations
$$g_ig_jg_i=g_jg_ig_j,\qquad(g_ig_j)^3=1,\qquad g_k=g_ig_jg_i^{-1}.\tag{1.2}$$
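The "inspection" behind (1.2) can be spot-checked numerically. The following is a minimal sketch, not part of the original construction: it implements (1.1) in exact integer arithmetic and tests the three relations at random integer points. It assumes that products of maps are read as ordinary composition (the rightmost factor acts first); the helper names `theta`, `g`, `g_inv`, `compose`, `relations_hold` and `random_point` are introduced here for illustration.

```python
import random
from functools import partial

# (i, j, k) runs over the cyclic permutations of (1, 2, 3).
CYCLIC = {1: (1, 2, 3), 2: (2, 3, 1), 3: (3, 1, 2)}


def theta(i, a):
    """theta_i(a) = a_i a_4 + a_j a_k."""
    i, j, k = CYCLIC[i]
    return a[i - 1] * a[3] + a[j - 1] * a[k - 1]


def g(i, point):
    """The map g_i of (1.1) acting on point = (x, a)."""
    x, a = point
    i, j, k = CYCLIC[i]
    new_x, new_a = list(x), list(a)
    new_x[i - 1] = theta(j, a) - x[j - 1] - x[k - 1] * x[i - 1]
    new_x[j - 1] = x[i - 1]
    new_a[i - 1], new_a[j - 1] = a[j - 1], a[i - 1]
    return tuple(new_x), tuple(new_a)


def g_inv(i, point):
    """Inverse of g_i, obtained by solving (1.1) for (x, a)."""
    y, b = point
    i, j, k = CYCLIC[i]
    x, a = list(y), list(b)
    x[i - 1] = y[j - 1]
    x[j - 1] = theta(i, b) - y[i - 1] - y[k - 1] * y[j - 1]
    a[i - 1], a[j - 1] = b[j - 1], b[i - 1]
    return tuple(x), tuple(a)


def compose(*maps):
    """compose(f, h)(p) == f(h(p)): the rightmost map acts first."""
    def composed(point):
        for m in reversed(maps):
            point = m(point)
        return point
    return composed


def relations_hold(point):
    """Check the three relations (1.2) at one point, for each cyclic (i, j, k)."""
    for i, j, k in CYCLIC.values():
        gi, gj, gk = partial(g, i), partial(g, j), partial(g, k)
        gi_inv = partial(g_inv, i)
        braid = compose(gi, gj, gi)(point) == compose(gj, gi, gj)(point)
        cube = compose(gi, gj, gi, gj, gi, gj)(point) == point
        conj = compose(gi, gj, gi_inv)(point) == gk(point)
        if not (braid and cube and conj):
            return False
    return True


def random_point(rng, bound=5):
    """A random integer point (x, a) in Z^3 x Z^4."""
    x = tuple(rng.randint(-bound, bound) for _ in range(3))
    a = tuple(rng.randint(-bound, bound) for _ in range(4))
    return x, a


rng = random.Random(0)
print(all(relations_hold(random_point(rng)) for _ in range(200)))
```

A random test of this kind is only evidence, not a proof; since both sides of each relation are polynomial maps, agreement on sufficiently many integer points is nevertheless a useful sanity check on the signs and indices in (1.1).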
The last relation implies that only two transformations $g_i,g_j$ are sufficient to generate the group $G$, while the first two tell us that if we put
$$s=g_ig_jg_i,\qquad t=g_i,\tag{1.3}$$
then $s$ and $t$ satisfy the relations
$$s^2=(ts)^3=1.\tag{1.4}$$
Conversely, if relations (1.4) are assumed, Eq. (1.3) can be solved for the generators as
$$g_i=t,\qquad g_j=sts,\qquad g_k=st^{-2}.$$
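As a companion check, the passage from $(g_i, g_j)$ to $(s, t)$ can be tested in the same way. This sketch again assumes that products are ordinary compositions (rightmost factor first) and repeats the hypothetical helpers `theta`, `g`, `g_inv` and `compose` so that it runs on its own; it verifies (1.4) and the three expressions for the generators at random integer points.

```python
import random
from functools import partial

CYCLIC = {1: (1, 2, 3), 2: (2, 3, 1), 3: (3, 1, 2)}


def theta(i, a):
    """theta_i(a) = a_i a_4 + a_j a_k."""
    i, j, k = CYCLIC[i]
    return a[i - 1] * a[3] + a[j - 1] * a[k - 1]


def g(i, point):
    """The map g_i of (1.1)."""
    x, a = point
    i, j, k = CYCLIC[i]
    new_x, new_a = list(x), list(a)
    new_x[i - 1] = theta(j, a) - x[j - 1] - x[k - 1] * x[i - 1]
    new_x[j - 1] = x[i - 1]
    new_a[i - 1], new_a[j - 1] = a[j - 1], a[i - 1]
    return tuple(new_x), tuple(new_a)


def g_inv(i, point):
    """Inverse of g_i."""
    y, b = point
    i, j, k = CYCLIC[i]
    x, a = list(y), list(b)
    x[i - 1] = y[j - 1]
    x[j - 1] = theta(i, b) - y[i - 1] - y[k - 1] * y[j - 1]
    a[i - 1], a[j - 1] = b[j - 1], b[i - 1]
    return tuple(x), tuple(a)


def compose(*maps):
    """compose(f, h)(p) == f(h(p))."""
    def composed(point):
        for m in reversed(maps):
            point = m(point)
        return point
    return composed


def st_relations_hold(point, i=1):
    """Check s^2 = (ts)^3 = 1, g_j = s t s and g_k = s t^{-2} at one point."""
    i, j, k = CYCLIC[i]
    t, t_inv = partial(g, i), partial(g_inv, i)
    s = compose(partial(g, i), partial(g, j), partial(g, i))
    return (
        compose(s, s)(point) == point
        and compose(t, s, t, s, t, s)(point) == point
        and compose(s, t, s)(point) == g(j, point)
        and compose(s, t_inv, t_inv)(point) == g(k, point)
    )


rng = random.Random(0)
points = [
    (tuple(rng.randint(-5, 5) for _ in range(3)),
     tuple(rng.randint(-5, 5) for _ in range(4)))
    for _ in range(200)
]
print(all(st_relations_hold(p, i) for p in points for i in (1, 2, 3)))
```

Note that $g_k = st^{-2}$ is immediate from (1.2) and (1.3), since $st^{-2} = g_ig_jg_i\,g_i^{-2} = g_ig_jg_i^{-1}$.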
Hence the group $G$ is generated by $s$ and $t$. This fact suggests that $G$ is closely related to the full modular group
$$\Gamma=PSL(2,\mathbb Z)=\left\{z\mapsto\frac{az+b}{cz+d}\;:\;a,b,c,d\in\mathbb Z,\ ad-bc=1\right\}.$$
It is well known that $\Gamma$ has generators $S$, $T$ with defining relations
$$S^2=(TS)^3=1,\tag{1.5}$$
where $S$ and $T$ are the Möbius transformations $S(z)=-1/z$ and $T(z)=z+1$.
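The relations (1.5) can be seen concretely on integer matrices. The sketch below represents $S$ and $T$ by the matrices $\begin{pmatrix}0&-1\\1&0\end{pmatrix}$ and $\begin{pmatrix}1&1\\0&1\end{pmatrix}$, and treats a matrix as trivial in $PSL(2,\mathbb Z)$ when it equals $\pm I$; the function names are chosen for this example only.

```python
from fractions import Fraction

IDENTITY = ((1, 0), (0, 1))
MINUS_IDENTITY = ((-1, 0), (0, -1))
S = ((0, -1), (1, 0))  # z -> -1/z
T = ((1, 1), (0, 1))   # z -> z + 1


def matmul(A, B):
    """Product of two 2x2 integer matrices given as nested tuples."""
    return tuple(
        tuple(sum(A[r][m] * B[m][c] for m in range(2)) for c in range(2))
        for r in range(2)
    )


def power(A, n):
    """A^n for n >= 0."""
    result = IDENTITY
    for _ in range(n):
        result = matmul(result, A)
    return result


def trivial_in_psl(A):
    """A matrix acts trivially as a Moebius map iff it is +I or -I."""
    return A in (IDENTITY, MINUS_IDENTITY)


def mobius(A, z):
    """Apply z -> (a z + b) / (c z + d)."""
    (a, b), (c, d) = A
    return (a * z + b) / (c * z + d)


z = Fraction(3, 7)
print(mobius(S, z), mobius(T, z))           # S(z) = -1/z, T(z) = z + 1
print(trivial_in_psl(power(S, 2)))          # S^2 = 1 in PSL(2, Z)
print(trivial_in_psl(power(matmul(T, S), 3)))  # (TS)^3 = 1 in PSL(2, Z)
```

In $SL(2,\mathbb Z)$ both $S^2$ and $(TS)^3$ equal $-I$, which is why the relations hold only after passing to the projective group.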
In view of (1.4) and (1.5), there exists a surjective group homomorphism
$$\Gamma\to G\quad\text{such that}\quad S\mapsto s,\quad T\mapsto t.\tag{1.6}$$
Through the homomorphism (1.6), the group $\Gamma$ acts on $\mathbb C^7$ as a polynomial automorphism group. Moreover, it follows from the remark after (1.1) that there exists a surjective group homomorphism
$$G\to S_3\quad\text{such that}\quad g_i\mapsto\sigma_i=(i,j),\tag{1.7}$$
where the symmetric group $S_3$ acts on parameters $a\in\mathbb C^4$ by permuting the first three coordinates $(a_1,a_2,a_3)$, while keeping the fourth coordinate $a_4$ always fixed. For each element $g\in G$, there exists a commutative diagram
$$\begin{array}{ccc}\mathbb C^7&\xrightarrow{\ g\ }&\mathbb C^7\\ \pi\downarrow&&\downarrow\pi\\ \mathbb C^4&\xrightarrow{\ \sigma\ }&\mathbb C^4\end{array}\tag{1.8}$$
where $\pi\colon(x,a)\mapsto a$ is the projection down to parameters and $\sigma\in S_3$ is the permutation corresponding to $g\in G$ under the homomorphism (1.7).

Let $\Gamma(2)$ be the principal congruence subgroup of $\Gamma$ of level 2,
$$\Gamma(2)=\left\{z\mapsto\frac{az+b}{cz+d}\;:\;\begin{array}{l}a,b,c,d\in\mathbb Z,\ ad-bc=1,\\ a\equiv d\equiv1,\ b\equiv c\equiv0\pmod 2\end{array}\right\},$$
and let $G(2)$ be the subgroup of $G$ generated by $g_1^2,g_2^2,g_3^2$, namely, $G(2)=\langle g_1^2,g_2^2,g_3^2\rangle$. Then it is easy to see that $G(2)$ is exactly the image of $\Gamma(2)$ under the homomorphism (1.6) and is contained in the kernel of the homomorphism (1.7). Since $\sigma=1$ for $g\in G(2)$ in (1.8), the subgroup $\Gamma(2)$ acts on $\mathbb C^7$, keeping parameters $a$ fixed. We thus have a 4-parameter family of $\Gamma(2)$-actions on the 3-space $\mathbb C^3$ parametrized by $a\in\mathbb C^4$.

It is an important observation that the polynomial
$$f(x,a)=x_1x_2x_3+x_1^2+x_2^2+x_3^2-\theta_1(a)x_1-\theta_2(a)x_2-\theta_3(a)x_3+\theta_4(a)\tag{1.9}$$
is $G$-invariant, where $\theta_4(a)$ is defined by
$$\theta_4(a)=a_1a_2a_3a_4+a_1^2+a_2^2+a_3^2+a_4^2-4.$$
To show this, we have only to check that $f(x,a)$ is invariant under the transformation $g_i$ in (1.1). The polynomial $f(x,a)$ is written $f_a(x)$ when it is regarded as a cubic polynomial of $x$ depending on parameters $a$. This cubic appears in Jimbo [20] in connection with a parametrization of monodromy data (see §3). Thanks to the $G$-invariance of $f(x,a)$, its zero-level set
$$S=\{(x,a)\in\mathbb C^7\;:\;f(x,a)=0\}$$
is stable under the action of $G$. So the commutative diagram (1.8) restricts to
$$\begin{array}{ccc}S&\xrightarrow{\ g\ }&S\\ \pi\downarrow&&\downarrow\pi\\ \mathbb C^4&\xrightarrow{\ \sigma\ }&\mathbb C^4\end{array}$$
The space $S$ is the total space of a 4-parameter family of cubic surfaces
$$S(a)=\{x\in\mathbb C^3\;:\;f_a(x)=0\}$$
parametrized by $a\in\mathbb C^4$. Every element $g\in G$ induces an isomorphism
$$g\colon S(a)\to S(\sigma(a))\quad\text{for each }a\in\mathbb C^4.\tag{1.10}$$
If g ∈ G(2), then σ = 1 and hence (1.10) is an automorphism g : S(a) → S(a).
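The $G$-invariance of $f$ that underlies (1.10) reduces to a check on the three generators, and this can be spot-checked in exact arithmetic. The sketch below is illustrative only: `theta`, `theta4`, `f` and `g` are names chosen here, and the sample point $(2,0,0)$ on $S(0,0,0,0)$ is an example picked for this check, not one taken from the paper.

```python
import random

CYCLIC = {1: (1, 2, 3), 2: (2, 3, 1), 3: (3, 1, 2)}


def theta(i, a):
    """theta_i(a) = a_i a_4 + a_j a_k."""
    i, j, k = CYCLIC[i]
    return a[i - 1] * a[3] + a[j - 1] * a[k - 1]


def theta4(a):
    """theta_4(a) = a_1 a_2 a_3 a_4 + a_1^2 + a_2^2 + a_3^2 + a_4^2 - 4."""
    return a[0] * a[1] * a[2] * a[3] + sum(c * c for c in a) - 4


def f(point):
    """The cubic polynomial f(x, a) of (1.9)."""
    x, a = point
    return (
        x[0] * x[1] * x[2]
        + sum(c * c for c in x)
        - sum(theta(i, a) * x[i - 1] for i in (1, 2, 3))
        + theta4(a)
    )


def g(i, point):
    """The map g_i of (1.1)."""
    x, a = point
    i, j, k = CYCLIC[i]
    new_x, new_a = list(x), list(a)
    new_x[i - 1] = theta(j, a) - x[j - 1] - x[k - 1] * x[i - 1]
    new_x[j - 1] = x[i - 1]
    new_a[i - 1], new_a[j - 1] = a[j - 1], a[i - 1]
    return tuple(new_x), tuple(new_a)


# f is unchanged by each generator at random integer points.
rng = random.Random(0)
invariant = True
for _ in range(200):
    p = (tuple(rng.randint(-5, 5) for _ in range(3)),
         tuple(rng.randint(-5, 5) for _ in range(4)))
    invariant = invariant and all(f(g(i, p)) == f(p) for i in (1, 2, 3))
print(invariant)

# A point on the surface S(a) with a = (0, 0, 0, 0) stays on the family S.
p = ((2, 0, 0), (0, 0, 0, 0))
print(f(p), f(g(1, p)), f(g(2, g(1, p))))
```

Because invariance holds for $f$ itself and not only on its zero set, every level set of $f$ is stable under $G$; the surfaces $S(a)$ are the level-zero members of this larger family.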
Through the homomorphism (1.6), we have an action of $\Gamma$ on $S$. It restricts to an action of $\Gamma(2)$ on each cubic surface $S(a)$ parametrized by $a\in\mathbb C^4$. Now we can draw a picture as in Fig. 1, which gives an overall image of the action constructed. The construction so far was already announced in [18]. We now add a new ingredient, that is, a (complex) area form on each cubic surface $S(a)$.

Definition 1.1 (Area form). For each $a\in\mathbb C^4$, the cubic surface $S(a)$ is provided with a (complex) area form
$$\omega_a=\frac{dx_i\wedge dx_j}{y_k(x,a)},\tag{1.11}$$
where $(i,j,k)$ is any cyclic permutation of $(1,2,3)$ and
$$y_i(x,a)=\frac{\partial f_a(x)}{\partial x_i}=2x_i+x_jx_k-\theta_i(a)\qquad(i=1,2,3).\tag{1.12}$$
Since the function fa (x) is identically zero on the surface S(a), we have y1 (x, a) dx1 + y2 (x, a) dx2 + y3 (x, a) dx3 = 0 on S(a).
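The relation just displayed is what makes the three candidate expressions for the area form agree. As a short worked step (a sketch using only (1.11), (1.12) and this relation), take the wedge product of the relation with $dx_j$, for $(i,j,k)$ cyclic:

```latex
\begin{aligned}
0 &= \bigl(y_i\,dx_i + y_j\,dx_j + y_k\,dx_k\bigr)\wedge dx_j
   = y_i\,dx_i\wedge dx_j + y_k\,dx_k\wedge dx_j ,\\[2pt]
\text{hence}\quad
y_i\,dx_i\wedge dx_j &= y_k\,dx_j\wedge dx_k ,
\qquad\text{that is,}\qquad
\frac{dx_i\wedge dx_j}{y_k} = \frac{dx_j\wedge dx_k}{y_i}
\quad\text{on } S(a),
\end{aligned}
```

wherever $y_i y_k \neq 0$. The right-hand side is the expression (1.11) written for the cyclic permutation $(j,k,i)$, so passing from one cyclic permutation to the next does not change the form.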
[Fig. 1. A total image of the action. The diagram shows an element g ∈ G carrying a point (x, a) of S(a) to a point (x′, a′) of S(σ(a)) inside S, the projection π down to C^4 where σ carries a to σ(a), and the inclusions Γ(2) ⊂ Γ, G(2) ⊂ G and {1} ⊂ S3.]
It is easily seen from this relation that the 2-form $\omega_a$ is independent of the choice of $(i,j,k)$, provided that it is a cyclic permutation of $(1,2,3)$. The area forms $\omega_a$ are put together to form a relative area form $\omega$ on the fibration $\pi\colon S\to\mathbb C^4$. Note that $\omega_a$ is the Poincaré residue of the surface $S(a)$, which turns out to be a concrete realization of the symplectic structure discussed in Iwasaki [17] and Hitchin [13] (see Theorem 5.1). The area form $\omega_a$ fails to be defined exactly at those points which satisfy $y_1(x,a)=y_2(x,a)=y_3(x,a)=0$, that is, at the singular points of $S(a)$. Now we should recall a result of Iwasaki [18, Theorem 1]: the surface $S(a)$ has singular points if and only if
$$w(a)\prod_{l=1}^{4}(a_l^2-4)=0,\tag{1.13}$$
with $w(a)$ being a polynomial defined by
$$w(a)=\prod_{\varepsilon_1\varepsilon_2\varepsilon_3=1}(\varepsilon_1a_1+\varepsilon_2a_2+\varepsilon_3a_3+a_4)-\prod_{i=1}^{3}(a_ia_4-a_ja_k),\tag{1.14}$$
where the first product on the right-hand side is taken over all triple signs $\varepsilon=(\varepsilon_1,\varepsilon_2,\varepsilon_3)\in\{\pm1\}^3$ satisfying $\varepsilon_1\varepsilon_2\varepsilon_3=1$. Thus $S(a)$ is a nonsingular surface for a generic value of $a\in\mathbb C^4$. If $a$ satisfies condition (1.13), then the surface $S(a)$ has finitely many singular points and the 2-form $\omega_a$ has singularities exactly at those points. We remark that results of Mazzocco [28], Saito and Terajima [36] imply that the special function solutions of the Painlevé VI equation correspond to the singularities of cubics. Singular points should also be discussed from a dynamical point of view. The author asked what the polynomial $w(a)$ is all about [18, Problem 2]. In response to this question, Terajima [39] gave a Lie theoretic interpretation of it, which seems to be very useful in connection with singularity theory (see also Lemma 6.7). A characteristic feature of the relative area form $\omega$ is the following:

Theorem 1.2 (Area-preserving property). Our modular group action is area-preserving, namely, the isomorphism (1.10) preserves the relative 2-form $\omega$ introduced in Definition 1.1.

Proof. The theorem itself is quite easily proved, once the transformations $g_i$ and the 2-form $\omega_a$ are introduced as in (1.1) and (1.11). Indeed, the assertion that $g_i$ preserves $\omega$ is almost immediate if one notices that $g_i$ in (1.1) induces a transformation $g_i\colon(y_1,y_2,y_3)\mapsto(y_1',y_2',y_3')$ with
$$g_i\colon\quad y_i'=-y_j,\qquad y_j'=y_i-x_ky_j,\qquad y_k'=y_k-x_iy_j,\tag{1.15}$$
on the variables $y=(y_1,y_2,y_3)$ in (1.12). Since the definition (1.11) of the 2-form $\omega_a$ is independent of the cyclic permutation $(i,j,k)$ chosen, the second and third formulas of (1.1) and the first formula of (1.15) yield
$$\omega_{a'}=\frac{dx_j'\wedge dx_k'}{y_i'}=\frac{dx_i\wedge dx_k}{-y_j}=\frac{dx_k\wedge dx_i}{y_j}=\omega_a.$$
Hence $g_i$ preserves $\omega$ and the theorem is established.
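The induced rule (1.15) is the only computation the proof relies on, so it is worth checking against (1.1) and (1.12) directly. The following sketch evaluates $y$ at a point and at its image under $g_i$ and compares the result with the right-hand sides of (1.15); the helper names `theta`, `g` and `y` are chosen for this illustration.

```python
import random

CYCLIC = {1: (1, 2, 3), 2: (2, 3, 1), 3: (3, 1, 2)}


def theta(i, a):
    """theta_i(a) = a_i a_4 + a_j a_k."""
    i, j, k = CYCLIC[i]
    return a[i - 1] * a[3] + a[j - 1] * a[k - 1]


def g(i, point):
    """The map g_i of (1.1)."""
    x, a = point
    i, j, k = CYCLIC[i]
    new_x, new_a = list(x), list(a)
    new_x[i - 1] = theta(j, a) - x[j - 1] - x[k - 1] * x[i - 1]
    new_x[j - 1] = x[i - 1]
    new_a[i - 1], new_a[j - 1] = a[j - 1], a[i - 1]
    return tuple(new_x), tuple(new_a)


def y(point):
    """(y_1, y_2, y_3) with y_i = 2 x_i + x_j x_k - theta_i(a), as in (1.12)."""
    x, a = point
    return tuple(
        2 * x[i - 1] + x[j - 1] * x[k - 1] - theta(i, a)
        for i, j, k in (CYCLIC[1], CYCLIC[2], CYCLIC[3])
    )


def rule_1_15_holds(i, point):
    """Compare y evaluated at g_i(point) with the right-hand sides of (1.15)."""
    x, _ = point
    i, j, k = CYCLIC[i]
    old, new = y(point), y(g(i, point))
    return (
        new[i - 1] == -old[j - 1]
        and new[j - 1] == old[i - 1] - x[k - 1] * old[j - 1]
        and new[k - 1] == old[k - 1] - x[i - 1] * old[j - 1]
    )


rng = random.Random(0)
points = [
    (tuple(rng.randint(-5, 5) for _ in range(3)),
     tuple(rng.randint(-5, 5) for _ in range(4)))
    for _ in range(200)
]
print(all(rule_1_15_holds(i, p) for p in points for i in (1, 2, 3)))
```

Only the first formula of (1.15) enters the final chain of equalities; the other two are checked here because they are part of the stated transformation rule.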
The aim of this paper is not merely to state this very simple observation but also, or much more significantly, to uncover its deep geometrical meaning (Theorem 5.1), as well as to suggest its important role in investigating the Painlev´e VI equation (Theorem 6.5). Namely, we shall present a geometrical construction underlying our area-preserving action of the modular group. The main ingredients are a moduli space of rank two monodromy representations over the 2-dimensional plane with three punctures, or the 2-dimensional sphere with four punctures, and a natural action upon it of the braid group B3 on three strings. More specifically, we shall introduce a moduli space Rt of monodromy representations and identify it with the space of monodromy data, M. As will be seen in §2, the space Rt and hence M admit a natural action of the group B3 , which we call the isomonodromic action (Definition 2.3). Note that this action is an abstract object of purely topological nature. In §3 we shall show that a dense open subset M◦ of M can be identified with a Zariski open subset S ◦ of S (Theorem 3.6). Through this identification, the B3 -action on M is recast into a more concrete B3 -action on S. Moreover, it factors through the modular group action constructed in this section, and the transformation rule (1.1) is just a concretization of the former abstract action (Theorem 3.7). In this manner, the family S of cubic surfaces and the modular group action upon it emerge into the foreground. We will also be concerned with moduli spaces Rt (a) of monodromy representations with fixed local monodromy data and the corresponding spaces M(a) in M. Here local monodromy data are parametrized by a ∈ C4 . Again, for each a ∈ C4 , the dense open subset M◦ (a) = M(a) ∩ M◦ is identified with the Zariski open subset S ◦ (a) = S(a) ∩ S ◦ of S(a). For a generic value of a, the subspace S ◦ (a) coincides with the entire surface S(a) (Theorem 4.1). 
As was constructed in Iwasaki [16] and Hitchin [13] (see also Goldman [8]), there exists a natural symplectic structure on each Rt (a) M(a) such that the braid group action mentioned above is a symplectic action. This symplectic structure is also an abstract object of purely topological nature that arises from the Poincar´e-Lefschetz duality for cohomology. So we should realize it as a concrete object on S(a), through the isomorphism M◦ (a) S ◦ (a). Then we will be able to obtain a symplectic structure on S(a), or an area form on it, as it is a surface. (The symplectic structure may have singularities, but even if so, we will be happy with another interesting problem to tackle in connection with singularity theory.) In any case, as is expected naturally, the area form (1.11) is actually the one so obtained (Theorem 5.1). As was mentioned in Iwasaki [18], our modular group action describes the nonlinear monodromy of the Painlev´e VI equation, PVI . Here PVI is a nonlinear ordinary differential equation which, in the author’s opinion, may be thought of as a nonlinear analogue of the Gauss hypergeometric equation. See Iwasaki et al. [19] for general information about PVI . Recently it is a focus of much attention by many authors; here we wish to cite [1, 6, 7, 9–12, 20, 25–27, 31, 34, 35, 37, 40] to list only a few. In the final section, §6, we shall discuss the connection of our construction with the isomonodromic nature of the Painlev´e VI equation. This will confirm the meaning of the area-preserving property of our discrete dynamical system as a manifestation of the global Hamiltonian structure of the Painlev´e VI equation (Theorem 6.5). The area-preserving property will also be important in investigating our modular group action itself, for instance, in the classification of its bounded orbits. Under a
certain generic condition on $a$, the surface $\mathcal{S}(a)$ contains a bounded orbit only when it is a (complexified) real surface, and every bounded orbit is confined to the real part of $\mathcal{S}(a)$. In this situation, the 2-form $\omega_a$ is an area form in the original sense, that is, a real area form.

We conclude this fairly long introduction with a few words about the moduli of cubic surfaces. It is well known in classical algebraic geometry that isomorphism classes of complex cubic surfaces admit a 4-dimensional moduli space and that Cayley [5] constructed a normal form parametrizing general cubic surfaces. More recently, Cayley’s normal form was modified in a convenient way by Naruki and Sekiguchi [29, 30]. Comparing our 4-parameter family with theirs, we find that our family also captures general moduli. This fact makes our dynamical system more interesting, suggesting a close connection between the moduli parameters of cubic surfaces and the four parameters of the Painlevé VI equation.

2. Action of Braids on Monodromy Data

The action of braids to be discussed here was already considered by Dubrovin and Mazzocco [7] and Mazzocco [27]. Earlier still, Iwasaki [16, 17] hinted at it more abstractly (hence less concretely) in a more general Riemann surface setting. In [7] the action was constructed on a moduli space of certain special monodromy representations, and this was partially extended to the general case in [27]. In this paper, we shall deal with a moduli space of general monodromy representations. Let us review some of the constructions in [7, 27], both to make things more transparent in our situation and to keep the exposition self-contained. Let $T$ be the configuration space of unordered distinct three points in $\mathbb{C}$,

$$T = \{\, t = \{t_1, t_2, t_3\} \in \mathbb{C}^3/S_3 : t_i \neq t_j \ \text{for}\ i \neq j \,\}. \tag{2.1}$$
Consider the fibration $\pi : X \to T$ whose fiber over $t = \{t_1, t_2, t_3\} \in T$ is the punctured plane $X_t = \mathbb{C} - \{t_1, t_2, t_3\}$. Since this fibration is locally trivial topologically, the fundamental group $\pi_1(T)$ of the base space $T$ acts on the fundamental group $\pi_1(X_t)$ of a typical fiber $X_t$. On the other hand, the group $\pi_1(T)$ is isomorphic to the braid group $B_3$ on three strings with base points at $t_1, t_2, t_3$. We thus have a right action of the group $B_3$ on $\pi_1(X_t)$,

$$\pi_1(X_t) \times B_3 \to \pi_1(X_t), \qquad (\gamma, \beta) \mapsto \gamma^{\beta}. \tag{2.2}$$

Intuitively, a braid $\beta \in B_3$ is thought of as a movement of the three points $t_1, t_2, t_3$ going around in $\mathbb{C}$. Then $\gamma^{\beta}$ is the result of a continuous deformation of $\gamma \in \pi_1(X_t)$ that keeps the moving points $t_1, t_2, t_3$ away from $\gamma$. Let us describe the action (2.2) explicitly in terms of generators of the groups involved. As generators of $\pi_1(X_t)$, we take the loops $\gamma_1, \gamma_2, \gamma_3$ as in Fig. 2. The braid group $B_3$ is generated by the three braids $\beta_1, \beta_2, \beta_3$ indicated in Fig. 3. They satisfy the relations

$$\beta_i \beta_j \beta_i = \beta_j \beta_i \beta_j, \qquad \beta_k = \beta_i^{-1} \beta_j \beta_i. \tag{2.3}$$

The second relation means that the group $B_3$ is generated by the two braids $\beta_i$, $\beta_j$, while the first one is the well-known braid relation, which is the defining relation of the group $B_3$. Now the action (2.2) is described as follows.

Lemma 2.1. Write $\gamma' = \gamma^{\beta_i}$ for each $\gamma \in \pi_1(X_t)$. Then,

$$\gamma_i' = \gamma_i \gamma_j \gamma_i^{-1}, \qquad \gamma_j' = \gamma_i, \qquad \gamma_k' = \gamma_k. \tag{2.4}$$
[Fig. 2. The loops $\gamma_i$, $\gamma_j$, $\gamma_k$]

[Fig. 3. The braids $\beta_i$ $(i = 1, 2, 3)$]
Proof. Draw a picture carefully. Deforming the loops $\gamma_i$, $\gamma_j$, $\gamma_k$ along the braid $\beta_i$, we get the loops $\gamma_i'$, $\gamma_j'$, $\gamma_k'$ in Fig. 4. A detailed explanation can be found in Dubrovin and Mazzocco [7] and is omitted here.
[Fig. 4. The action of $\beta_i$ on $\gamma_i$, $\gamma_j$, $\gamma_k$]

The action (2.2) induces a natural action of the braid group $B_3$ on the space of conjugacy classes of monodromy representations. Here, by a monodromy representation, we mean a group anti-homomorphism $\rho : \pi_1(X_t) \to SL(2, \mathbb{C})$.

Remark 2.2. Defining a monodromy representation to be an anti-homomorphism is purely a convention, which stems from the connection with the theory of Fuchsian differential equations. Consider a Fuchsian system on the Riemann sphere $\mathbb{P}^1$ with four regular singular points at $t_1, t_2, t_3, t_4$ with $t_4 = \infty$, and let $Y$ be a fundamental matrix of solutions at a base point. Then we can speak of the associated monodromy representation $\rho$; for each loop $\gamma \in \pi_1(X_t)$, the result $Y^{\gamma}$ of the analytic continuation of $Y$ along $\gamma$ is expressed as $Y^{\gamma} = Y\rho(\gamma)$ for some nonsingular matrix $\rho(\gamma)$. Then, for two loops $\gamma, \gamma' \in \pi_1(X_t)$, we have $Y^{\gamma\gamma'} = Y\rho(\gamma\gamma')$ on one hand and $Y^{\gamma\gamma'} = [Y^{\gamma}]^{\gamma'} = [Y\rho(\gamma)]^{\gamma'} = Y\rho(\gamma')\rho(\gamma)$ on the other hand. Hence $\rho(\gamma\gamma') = \rho(\gamma')\rho(\gamma)$, and so the monodromy representation $\rho$ is an anti-homomorphism. Here, following the convention in [7], we understand that the composite $\gamma\gamma'$ of two loops $\gamma$, $\gamma'$ is the loop obtained by joining $\gamma$ and $\gamma'$ in this order.

Two monodromy representations $\rho$, $\rho'$ are said to be conjugate if there exists a matrix $P \in SL(2, \mathbb{C})$ such that

$$\rho'(\gamma) = P\,\rho(\gamma)\,P^{-1} \qquad \text{for any } \gamma \in \pi_1(X_t).$$
A monodromy representation, say $\rho$, and its conjugacy class will be denoted by the same symbol $\rho$, and the phrase “the conjugacy class of” will often be omitted. This abuse of notation and terminology should cause no confusion. Let $\mathcal{R}_t$ be the space of all conjugacy classes of monodromy representations,

$$\mathcal{R}_t = \mathrm{Hom}(\pi_1(X_t), SL(2, \mathbb{C}))/\sim \tag{2.5}$$

equipped with a natural topology: we provide $\mathrm{Hom}$ with the compact-open topology and then $\mathcal{R}_t = \mathrm{Hom}/\!\sim$ with its quotient topology, where $\pi_1(X_t)$ and $SL(2, \mathbb{C})$ carry their topologies as a discrete group and a complex Lie group, respectively. We are now in a position to define an action of the braid group $B_3$ on the space $\mathcal{R}_t$.

Definition 2.3 (Isomonodromic action). The isomonodromic action

$$\mathcal{R}_t \times B_3 \to \mathcal{R}_t, \qquad \rho \mapsto \rho^{\beta} \tag{2.6}$$

is the right action of $B_3$ on $\mathcal{R}_t$ that satisfies the condition

$$\rho^{\beta}(\gamma^{\beta}) = \rho(\gamma) \qquad \text{for any } \gamma \in \pi_1(X_t). \tag{2.7}$$

A monodromy representation $\rho$ is expressed by a triple of matrices $M = (M_1, M_2, M_3) \in SL(2, \mathbb{C})^3$, where the matrices $M_1, M_2, M_3$ are defined by

$$M_i = \rho(\gamma_i) \qquad (i = 1, 2, 3). \tag{2.8}$$
This triple is called the monodromy data of $\rho$. Similarly the conjugacy class of a monodromy representation is expressed by the conjugacy class of its monodromy data. Here two triples $M = (M_1, M_2, M_3)$, $M' = (M_1', M_2', M_3')$ are said to be conjugate if there exists a matrix $P \in SL(2, \mathbb{C})$ such that

$$M_i' = P M_i P^{-1} \qquad (i = 1, 2, 3).$$

Let $\mathcal{M}$ be the space of all conjugacy classes of triples in $SL(2, \mathbb{C})^3$, $\mathcal{M} = SL(2, \mathbb{C})^3/\!\sim$, equipped with the quotient topology. The conjugacy class of a triple, say $M \in SL(2, \mathbb{C})^3$, is denoted by the same symbol $M$ and is referred to as a monodromy data. Again this abuse of notation and terminology should cause no confusion. There exists a natural bijection, or identification,

$$\mathcal{R}_t \to \mathcal{M}, \qquad \rho \mapsto M = (M_1, M_2, M_3), \tag{2.9}$$

associating to each monodromy representation $\rho \in \mathcal{R}_t$ its monodromy data $M \in \mathcal{M}$. Through this identification the isomonodromic action (2.6) on $\mathcal{R}_t$ induces a right action on the space $\mathcal{M}$ of monodromy data,

$$\mathcal{M} \times B_3 \to \mathcal{M}, \qquad M \mapsto M^{\beta}, \tag{2.10}$$

which will also be called the isomonodromic action. This action can be described explicitly in terms of the generators $\beta_1, \beta_2, \beta_3$ of $B_3$. The following lemma is due to Mazzocco [27].

Lemma 2.4. The isomonodromic action (2.10) of the braid $\beta_i$ on $\mathcal{M}$ is given by $\beta_i : M = (M_1, M_2, M_3) \mapsto M' = (M_1', M_2', M_3')$ with

$$\beta_i : \qquad M_i' = M_j, \qquad M_j' = M_j M_i M_j^{-1}, \qquad M_k' = M_k. \tag{2.11}$$
Proof. Let $M \in \mathcal{M}$ be the monodromy data of a monodromy representation $\rho \in \mathcal{R}_t$. Write $\rho' = \rho^{\beta_i}$ and $M' = M^{\beta_i}$. By (2.8), $M' = (M_1', M_2', M_3')$ is given by $M_l' = \rho'(\gamma_l)$ for $l = 1, 2, 3$. Condition (2.7) for the braid $\beta_i$ reads $\rho'(\gamma') = \rho(\gamma)$ for any $\gamma \in \pi_1(X_t)$. Substituting (2.4) into this and recalling that a monodromy representation is an anti-homomorphism, we have

$$\begin{aligned}
M_i &= \rho(\gamma_i) = \rho'(\gamma_i') = \rho'(\gamma_i \gamma_j \gamma_i^{-1}) = M_i'^{-1} M_j' M_i', \\
M_j &= \rho(\gamma_j) = \rho'(\gamma_j') = \rho'(\gamma_i) = M_i', \\
M_k &= \rho(\gamma_k) = \rho'(\gamma_k') = \rho'(\gamma_k) = M_k'.
\end{aligned}$$

Solving these equations for $M_i'$, $M_j'$, $M_k'$ yields (2.11) as desired.
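The transformations (2.11) can be checked against the braid relation in (2.3) on a concrete triple. The following is a minimal sketch, not part of the original argument: the sample matrices in $SL(2, \mathbb{Z})$ and the helper names `mul`, `inv`, `conj`, `beta` are illustrative choices, and indices are 0-based. It also checks that applying $\beta_2$ and then $\beta_1$, three times over, conjugates the whole triple by $M_3 M_2 M_1$, so that this composite acts trivially on conjugacy classes.

```python
# 2x2 integer matrices are stored as ((a, b), (c, d)); all have determinant 1.

def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def inv(X):
    # Inverse of a determinant-one 2x2 matrix.
    (a, b), (c, d) = X
    return ((d, -b), (-c, a))

def conj(P, X):
    # P X P^{-1}
    return mul(mul(P, X), inv(P))

def beta(i, M):
    # Transformation (2.11) for the braid beta_i, with 0-based index i and
    # (i, j, k) a cyclic permutation of (0, 1, 2).
    j, k = (i + 1) % 3, (i + 2) % 3
    out = [None, None, None]
    out[i] = M[j]
    out[j] = conj(M[j], M[i])
    out[k] = M[k]
    return tuple(out)

# Hypothetical sample triple (M1, M2, M3) in SL(2, Z), chosen for illustration.
M = (((1, 1), (0, 1)), ((1, 0), (1, 1)), ((2, 1), (1, 1)))

# Braid relation: beta_1, beta_2, beta_1 applied in turn agrees with
# beta_2, beta_1, beta_2 applied in turn.
braid_ok = beta(0, beta(1, beta(0, M))) == beta(1, beta(0, beta(1, M)))

# Apply beta_2 first and then beta_1; repeating this three times conjugates
# every entry of the triple by P = M3 M2 M1.
g = lambda T: beta(0, beta(1, T))
P = mul(mul(M[2], M[1]), M[0])
cube_ok = g(g(g(M))) == tuple(conj(P, X) for X in M)

print(braid_ok, cube_ok)
```

Exact integer arithmetic is used so that the equalities are tested without rounding error; a single sample triple is of course only a sanity check, not a proof.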
Two remarks are in order at the end of this section.

Remark 2.5. It is evident from the construction that the transformations $\beta_1, \beta_2, \beta_3$ in (2.11) satisfy relations (2.3). Moreover, a direct check shows that they satisfy an additional relation $(\beta_j \beta_i)^3 = 1$. Hence they satisfy the same relations as those in (1.2) satisfied by the transformations $g_1, g_2, g_3$ in (1.1), except that the order of products is reversed; the order reversal is not a contradiction, since $\beta_i$ acts from the right, while $g_i$ acts from the left. This observation means that the isomonodromic action of $B_3$ on $\mathcal{M}$ factors through an action of the modular group $\Gamma$.

Remark 2.6. The space $T$ in (2.1) is the configuration space of unordered distinct three points in $\mathbb{C}$. We may replace it by the configuration space of ordered distinct three points, to restrict the $B_3$-action to the pure braid group $P_3 = \langle \beta_1^2, \beta_2^2, \beta_3^2 \rangle$. Clearly, the action obtained factors through the restriction to $\Gamma(2)$ of the $\Gamma$-action in Remark 2.5.

3. Parametrization of Monodromy Data

The aim of this section is to parametrize the space $\mathcal{M}$ in terms of the family of cubic surfaces $\mathcal{S}$ constructed in §1, along the line of arguments in Jimbo [20]. We shall introduce certain big open subsets of $\mathcal{M}$, $\mathcal{S}$ and establish a neat parametrization theorem (Theorem 3.6). The reason why $\mathcal{S}$ appears in the parametrization is as follows: a natural strategy to parametrize $\mathcal{M}$ is to interpret it as the categorical quotient of the triple product $SL(2, \mathbb{C})^3$ by the diagonal adjoint action of $SL(2, \mathbb{C})$, namely, as the spectrum of its invariant ring, which is none other than the cubics $\mathcal{S}$. As in [20], we shall employ the following basis of the invariant ring.

Definition 3.1 (Invariants). Given $M = (M_1, M_2, M_3) \in SL(2, \mathbb{C})^3$, let

$$x_i = \mathrm{Tr}(M_k M_j), \qquad a_i = \mathrm{Tr}\, M_i \qquad (i = 1, 2, 3), \qquad a_4 = \mathrm{Tr}(M_3 M_2 M_1), \tag{3.1}$$
and put $x = (x_1, x_2, x_3)$, $a = (a_1, a_2, a_3, a_4)$. As will be seen later (Theorem 3.6), the invariants $(x, a)$ introduced here are nothing other than the coordinates $(x, a)$ that are used to construct the family $\mathcal{S}$ of cubic surfaces in §1. The following remark is in order.

Remark 3.2. For any cyclic permutation $(i, j, k)$ of $(1, 2, 3)$, we have $a_4 = \mathrm{Tr}(M_k M_j M_i)$, since the value of $\mathrm{Tr}(M_k M_j M_i)$ depends only on the signature of $(i, j, k)$.
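The invariants (3.1) are easy to evaluate on a concrete triple. The sketch below is illustrative only: the sample matrices and helper names are assumptions, and since the polynomial $f(x, a)$ of (1.9) is not reproduced in this part of the paper, the function `fricke_cubic` uses the classical Fricke form $x_1x_2x_3 + x_1^2 + x_2^2 + x_3^2 - \theta_1x_1 - \theta_2x_2 - \theta_3x_3 + \theta_4$ with $\theta_i = a_ia_4 + a_ja_k$ and $\theta_4 = a_1a_2a_3a_4 + a_1^2 + a_2^2 + a_3^2 + a_4^2 - 4$ as an assumed stand-in for it.

```python
def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def tr(X):
    return X[0][0] + X[1][1]

def invariants(M):
    # (3.1) with 0-based indices: x_i = Tr(M_k M_j), a_i = Tr M_i,
    # a_4 = Tr(M_3 M_2 M_1), where (i, j, k) is cyclic.
    x = tuple(tr(mul(M[(i + 2) % 3], M[(i + 1) % 3])) for i in range(3))
    a = (tr(M[0]), tr(M[1]), tr(M[2]), tr(mul(mul(M[2], M[1]), M[0])))
    return x, a

def fricke_cubic(x, a):
    # ASSUMPTION: classical Fricke cubic, standing in for f(x, a) of (1.9).
    theta = [a[i]*a[3] + a[(i + 1) % 3]*a[(i + 2) % 3] for i in range(3)]
    theta4 = a[0]*a[1]*a[2]*a[3] + sum(t*t for t in a) - 4
    return (x[0]*x[1]*x[2] + sum(t*t for t in x)
            - sum(theta[i]*x[i] for i in range(3)) + theta4)

# Hypothetical sample triple (M1, M2, M3) in SL(2, Z).
M = (((1, 1), (0, 1)), ((1, 0), (1, 1)), ((2, 1), (1, 1)))

x, a = invariants(M)
value = fricke_cubic(x, a)
print(x, a, value)
```

For this sample triple the invariants are $x = (4, 4, 3)$ and $a = (2, 2, 3, 6)$, and the assumed cubic vanishes, in line with the statement that the invariants of a triple satisfy the defining equation of $\mathcal{S}$.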
Any polynomial $p = p(x, a)$ of $(x, a)$ may be thought of as a function on $\mathcal{S}$ if $(x, a)$ is regarded as the coordinates in §1, and as a function on $\mathcal{M}$ if $(x, a)$ is regarded as the invariants in (3.1). So we can speak of the open subsets

$$\mathcal{S}[p] = \mathcal{S} \cap \{p \neq 0\}, \qquad \mathcal{M}[p] = \mathcal{M} \cap \{p \neq 0\}$$

of $\mathcal{S}$ and $\mathcal{M}$, respectively. As such polynomials we will employ

$$p_{i\nu}(x, a) = \begin{cases} (x_i^2 - 4)\, \psi(x_i, a_i, a_4) & (\nu = 1), \\ (x_i^2 - 4)\, \psi(x_i, a_j, a_k) & (\nu = 2), \end{cases} \tag{3.2}$$

where the polynomial $\psi(s, t, u)$ is defined by

$$\psi(s, t, u) = s^2 + t^2 + u^2 - stu - 4. \tag{3.3}$$

Then the following open subsets (charts) will play an important role,

$$\mathcal{S}_{i\nu} = \mathcal{S}[p_{i\nu}], \qquad \mathcal{M}_{i\nu} = \mathcal{M}[p_{i\nu}] \qquad (i = 1, 2, 3,\ \nu = 1, 2).$$

The reason why the polynomials $p_{i\nu}(x, a)$ are relevant to our discussion will be clear in the proof of Theorem 3.6 below; see especially (3.15) and (3.16). The polynomial $\psi(s, t, u)$ in (3.3) will frequently appear in the rest of this paper, with variables taking the form $(s, t, u) = (x_i, a_p, a_q)$. There are two ways of looking at $\psi(s, t, u)$: it is a (symmetric) cubic polynomial in the three variables $(s, t, u)$, as well as a quadratic polynomial in the single variable $s$ with parameters $(t, u)$. We shall mainly adopt the second viewpoint. A bit more notation: fix a square root of $x_i^2 - 4$ and put

$$r_i = \sqrt{x_i^2 - 4}, \qquad \lambda_i^{\pm} = \frac{x_i \pm r_i}{2}. \tag{3.4}$$

Our parametrization of the space $\mathcal{M}$ is based on the following:

Definition 3.3 (Normal forms). For $i = 1, 2, 3$, $\nu = 1, 2$, let

$$\varphi_{i\nu} : \mathcal{S}_{i\nu} \to \mathcal{M}_{i\nu} \tag{3.5}$$

be the map associating to each $(x, a) \in \mathcal{S}_{i\nu}$ the conjugacy class of the triple $M = (M_1, M_2, M_3) \in \mathcal{M}_{i\nu}$ defined as in Tables 1 (for $\nu = 1$) and 2 (for $\nu = 2$). This triple is referred to as the normal form on the chart $\mathcal{M}_{i\nu}$.

First of all, the well-definedness of Definition 3.3 should be discussed.

Lemma 3.4. The map $\varphi_{i\nu}$ is well defined, that is, the conjugacy class of the triple $M = (M_1, M_2, M_3)$ defined in Tables 1 and 2 is uniquely determined, not depending on the choice of the branch in (3.4).

Proof. We only consider the case $\nu = 1$; the other case is treated in a similar manner and is omitted. Taking the other branch in (3.4) has the effect that $r_i \leftrightarrow -r_i$ and $\lambda_i^{\pm} \leftrightarrow \lambda_i^{\mp}$, which results in a change of the triple $M$. However, this change is canceled by conjugation by a matrix

$$\begin{pmatrix} 0 & \xi \\ -\xi^{-1} & 0 \end{pmatrix} \qquad \text{such that} \qquad \xi^2 = \frac{\psi(x_i, a_i, a_4)}{x_i^2 - 4}.$$

Hence the conjugacy class is independent of the choice of the branch.
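Table 1. The normal form on $\mathcal{M}_{i\nu}$ with $\nu = 1$

$$M_i = \begin{pmatrix} \dfrac{a_4 - a_i \lambda_i^-}{r_i} & -\dfrac{\psi(x_i, a_i, a_4)}{x_i^2 - 4} \\[2ex] 1 & -\dfrac{a_4 - a_i \lambda_i^+}{r_i} \end{pmatrix}$$

$$M_j = \begin{pmatrix} -\dfrac{a_k - a_j \lambda_i^+}{r_i} & -\dfrac{y_k - y_j \lambda_i^-}{x_i^2 - 4} \\[2ex] \dfrac{y_k - y_j \lambda_i^+}{\psi(x_i, a_i, a_4)} & \dfrac{a_k - a_j \lambda_i^-}{r_i} \end{pmatrix}$$

$$M_k = \begin{pmatrix} -\dfrac{a_j - a_k \lambda_i^+}{r_i} & -\dfrac{y_j - y_k \lambda_i^+}{x_i^2 - 4} \\[2ex] \dfrac{y_j - y_k \lambda_i^-}{\psi(x_i, a_i, a_4)} & \dfrac{a_j - a_k \lambda_i^-}{r_i} \end{pmatrix}$$

Table 2. The normal form on $\mathcal{M}_{i\nu}$ with $\nu = 2$

$$M_i = \begin{pmatrix} \dfrac{a_4 - a_i \lambda_i^-}{r_i} & -\dfrac{y_k - y_j \lambda_i^+}{x_i^2 - 4} \\[2ex] \dfrac{y_k - y_j \lambda_i^-}{\psi(x_i, a_j, a_k)} & -\dfrac{a_4 - a_i \lambda_i^+}{r_i} \end{pmatrix}$$

$$M_j = \begin{pmatrix} -\dfrac{a_k - a_j \lambda_i^+}{r_i} & -\dfrac{\psi(x_i, a_j, a_k)}{x_i^2 - 4} \\[2ex] 1 & \dfrac{a_k - a_j \lambda_i^-}{r_i} \end{pmatrix}$$

$$M_k = \begin{pmatrix} -\dfrac{a_j - a_k \lambda_i^+}{r_i} & \dfrac{\lambda_i^+\, \psi(x_i, a_j, a_k)}{x_i^2 - 4} \\[2ex] -\lambda_i^- & \dfrac{a_j - a_k \lambda_i^-}{r_i} \end{pmatrix}$$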
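It is not likely that a good parametrization is available on the entire space $\mathcal{M}$. So we try to construct it on an open subset that is as large as possible. We introduce such an open subset of $\mathcal{M}$ and its counterpart in $\mathcal{S}$.

Definition 3.5 (Big opens). Define open subsets $\mathcal{S}^\circ$, $\mathcal{M}^\circ$ of $\mathcal{S}$, $\mathcal{M}$ by

$$\mathcal{S}^\circ = \bigcup_{i=1}^{3} \bigcup_{\nu=1}^{2} \mathcal{S}_{i\nu}, \qquad \mathcal{M}^\circ = \bigcup_{i=1}^{3} \bigcup_{\nu=1}^{2} \mathcal{M}_{i\nu}, \tag{3.6}$$

respectively. These open subsets are referred to as the big opens.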
It is of interest to ask how large the big open $\mathcal{S}^\circ$ is, or equivalently, how small the complement $\mathcal{S} \setminus \mathcal{S}^\circ$ of the big open $\mathcal{S}^\circ$ is. This question will be answered in §4. With these preliminaries, we shall establish the following:

Theorem 3.6 (Parametrization theorem). For each $i = 1, 2, 3$, $\nu = 1, 2$, the map $\varphi_{i\nu} : \mathcal{S}_{i\nu} \to \mathcal{M}_{i\nu}$ in (3.5) is a homeomorphism. These six local homeomorphisms are patched together to yield a global homeomorphism between the big opens,

$$\varphi : \mathcal{S}^\circ \to \mathcal{M}^\circ. \tag{3.7}$$

Proof. We shall only prove that the map $\varphi_{i\nu}$ is bijective; the proofs of the remaining assertions are mere formalities. Further we only consider the case $\nu = 1$; the other case $\nu = 2$ can be treated in a similar manner. We first show that $\varphi_{i\nu}$ is surjective. Given any $M = (M_1, M_2, M_3) \in \mathcal{M}_{i1}$, we have $x_i \neq \pm 2$. So the numbers $\lambda_i^{\pm}$ are distinct and the matrix $M_k M_j$ has distinct eigenvalues $\lambda_i^{\pm}$. Hence there exists a matrix $P \in SL(2, \mathbb{C})$ such that $P (M_k M_j) P^{-1} = \mathrm{diag}\{\lambda_i^+, \lambda_i^-\}$. Such a matrix $P$ is unique up to the replacement

$$P \mapsto DP, \tag{3.8}$$

where $D$ is any diagonal matrix of determinant one. If we put

$$P M_i P^{-1} = U = \begin{pmatrix} u_{11} & u_{12} \\ u_{21} & u_{22} \end{pmatrix}, \qquad P M_j P^{-1} = V = \begin{pmatrix} v_{11} & v_{12} \\ v_{21} & v_{22} \end{pmatrix}, \qquad P M_k P^{-1} = W = \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \end{pmatrix}, \tag{3.9}$$

then the above diagonalization of $M_k M_j$ is expressed as

$$W V = \mathrm{diag}\{\lambda_i^+, \lambda_i^-\}. \tag{3.10}$$

Conditions $\mathrm{Tr}\, U = \mathrm{Tr}\, M_i = a_i$ and $\mathrm{Tr}(W V U) = \mathrm{Tr}(M_k M_j M_i) = a_4$ yield

$$u_{11} + u_{22} = a_i, \qquad \lambda_i^+ u_{11} + \lambda_i^- u_{22} = a_4,$$

where (3.10) is used to derive the second equality. This system is solved as

$$u_{11} = \frac{a_4 - a_i \lambda_i^-}{r_i}, \qquad u_{22} = -\frac{a_4 - a_i \lambda_i^+}{r_i}. \tag{3.11}$$

Equation (3.10), or equivalently $W = \mathrm{diag}\{\lambda_i^+, \lambda_i^-\}\, V^{-1}$, is written as

$$w_{11} = \lambda_i^+ v_{22}, \qquad w_{12} = -\lambda_i^+ v_{12}, \qquad w_{21} = -\lambda_i^- v_{21}, \qquad w_{22} = \lambda_i^- v_{11}. \tag{3.12}$$
Conditions $\mathrm{Tr}\, V = \mathrm{Tr}\, M_j = a_j$ and $\mathrm{Tr}\, W = \mathrm{Tr}\, M_k = a_k$ yield

$$v_{11} + v_{22} = a_j, \qquad \lambda_i^- v_{11} + \lambda_i^+ v_{22} = a_k,$$

where (3.12) is used to derive the second equality. This system is solved as

$$v_{11} = -\frac{a_k - a_j \lambda_i^+}{r_i}, \qquad v_{22} = \frac{a_k - a_j \lambda_i^-}{r_i}. \tag{3.13}$$

Substituting (3.13) into (3.12), we have

$$w_{11} = -\frac{a_j - a_k \lambda_i^+}{r_i}, \qquad w_{22} = \frac{a_j - a_k \lambda_i^-}{r_i}. \tag{3.14}$$

Applying (3.11) to the condition $\det U = \det M_i = 1$, we have

$$u_{12} u_{21} = -\frac{\psi(x_i, a_i, a_4)}{x_i^2 - 4}. \tag{3.15}$$

Similarly, applying (3.13) to the condition $\det V = \det M_j = 1$, we have

$$v_{12} v_{21} = -\frac{\psi(x_i, a_j, a_k)}{x_i^2 - 4}. \tag{3.16}$$

Conditions $\mathrm{Tr}(V U) = \mathrm{Tr}(M_j M_i) = x_k$ and $\mathrm{Tr}(U W) = \mathrm{Tr}(M_i M_k) = x_j$ yield

$$\begin{aligned}
u_{12} v_{21} + u_{21} v_{12} &= x_k - u_{11} v_{11} - u_{22} v_{22}, \\
\lambda_i^- u_{12} v_{21} + \lambda_i^+ u_{21} v_{12} &= -x_j + \lambda_i^+ u_{11} v_{22} + \lambda_i^- u_{22} v_{11},
\end{aligned}$$

where (3.12) is used to derive the second equality. Upon substituting (3.11) and (3.13) into the right-hand side, this system is solved as

$$u_{12} v_{21} = \frac{y_j \lambda_i^+ - y_k}{x_i^2 - 4}, \qquad u_{21} v_{12} = \frac{y_j \lambda_i^- - y_k}{x_i^2 - 4}. \tag{3.17}$$

Now we notice that there exists the following identity,

$$\psi(x_i, a_i, a_4)\, \psi(x_i, a_j, a_k) - (y_k - y_j \lambda_i^+)(y_k - y_j \lambda_i^-) = (x_i^2 - 4)\, f(x, a), \tag{3.18}$$

where $f(x, a)$ is the polynomial defined by (1.9). Putting (3.15), (3.16), (3.17) together and using (3.18) yield

$$(u_{12} u_{21})(v_{12} v_{21}) - (u_{12} v_{21})(u_{21} v_{12}) = \frac{f(x, a)}{x_i^2 - 4}.$$
This leads to $f(x, a) = 0$, since the left-hand side is clearly zero. Hence we have $(x, a) \in \mathcal{S}$. On the other hand, since $M = (M_1, M_2, M_3) \in \mathcal{M}_{i1}$, we have $(x_i^2 - 4)\, \psi(x_i, a_i, a_4) \neq 0$, and so $(x, a) \in \mathcal{S}_{i1}$. Moreover this, together with (3.15), implies that $u_{12} u_{21} \neq 0$. If we make the replacement (3.8) with $D = \mathrm{diag}\{\delta, \delta^{-1}\}$, then the $(1, 2)$-entries and $(2, 1)$-entries of (3.9) are multiplied by $\delta^2$ and $\delta^{-2}$ respectively, while all the diagonal entries are kept invariant. Taking a suitable number $\delta$, if necessary, we may assume from the beginning that

$$u_{21} = 1. \tag{3.19}$$

Then (3.15) and the second equality of (3.17) yield

$$u_{12} = -\frac{\psi(x_i, a_i, a_4)}{x_i^2 - 4}, \qquad v_{12} = -\frac{y_k - y_j \lambda_i^-}{x_i^2 - 4}. \tag{3.20}$$

Substituting the first equality of (3.20) into that of (3.17), we have

$$v_{21} = \frac{y_k - y_j \lambda_i^+}{\psi(x_i, a_i, a_4)}. \tag{3.21}$$

Moreover, substituting (3.20) and (3.21) into (3.12), we obtain

$$w_{12} = -\frac{y_j - y_k \lambda_i^+}{x_i^2 - 4}, \qquad w_{21} = \frac{y_j - y_k \lambda_i^-}{\psi(x_i, a_i, a_4)}. \tag{3.22}$$

Comparing (3.11), (3.13), (3.14), (3.19), (3.20), (3.21), (3.22) with Table 1, we conclude that $\varphi_{i1}(x, a) = M$, namely, the map (3.5) is surjective. To show that the map (3.5) is injective, we have only to notice that, once the normalization (3.19) is employed, the admissible diagonal matrices $D$ in (3.8) are only $D = \pm I$. However, for $D = \pm I$, the replacement (3.8) leaves every entry in (3.9) unchanged. This fact readily implies the injectivity of the map (3.5). The proof is complete.
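The first steps of the proof above lend themselves to a quick numerical check. The sketch below is an illustration only, with arbitrarily chosen complex sample values for $x_i$, $a_i$, $a_4$ (they are assumptions, not data from the paper). It verifies that the entries (3.11) solve the two trace conditions, and that $\det U = 1$ then forces the product $u_{12}u_{21}$ to take the value stated in (3.15).

```python
import cmath

def psi(s, t, u):
    # The polynomial (3.3).
    return s*s + t*t + u*u - s*t*u - 4

# Hypothetical sample values; any x_i with x_i^2 != 4 would do.
x_i = 3.0 + 1.0j
a_i = 0.5 - 2.0j
a_4 = 1.5 + 0.25j

# (3.4): a fixed branch of the square root and the two eigenvalues.
r = cmath.sqrt(x_i*x_i - 4)
lam_p = (x_i + r) / 2
lam_m = (x_i - r) / 2

# (3.11): diagonal entries of U.
u11 = (a_4 - a_i*lam_m) / r
u22 = -(a_4 - a_i*lam_p) / r

# Residuals of the two linear conditions Tr U = a_i and Tr(W V U) = a_4.
res_trace = abs(u11 + u22 - a_i)
res_a4 = abs(lam_p*u11 + lam_m*u22 - a_4)

# det U = 1 gives u12*u21 = u11*u22 - 1; compare with the right side of (3.15).
res_315 = abs((u11*u22 - 1) + psi(x_i, a_i, a_4) / (x_i*x_i - 4))

print(res_trace, res_a4, res_315)
```

All three residuals should vanish up to floating-point rounding. The same pattern, with $(a_i, a_4)$ replaced by $(a_j, a_k)$ and the roles of $\lambda_i^{\pm}$ exchanged, applies to (3.13) and (3.16).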
Theorem 3.6 enables us to derive the transformation formula (1.1) for the isomonodromic action (2.11) of $B_3$ on $\mathcal{M}$.

Theorem 3.7 (Transformations). In terms of invariants $(x, a)$ in (3.1), the action of the braid $\beta_i$ in (2.11) is represented by the transformation $g_i$ in (1.1).

Proof. It is easy to prove the equalities in (1.1) except for the first one. Indeed, by (2.11) and (3.1), we have

$$\begin{aligned}
x_j' &= \mathrm{Tr}(M_i' M_k') = \mathrm{Tr}(M_j M_k) = \mathrm{Tr}(M_k M_j) = x_i, \\
x_k' &= \mathrm{Tr}(M_j' M_i') = \mathrm{Tr}(M_j M_i M_j^{-1} M_j) = \mathrm{Tr}(M_j M_i) = x_k, \\
a_i' &= \mathrm{Tr}\, M_i' = \mathrm{Tr}\, M_j = a_j, \\
a_j' &= \mathrm{Tr}\, M_j' = \mathrm{Tr}(M_j M_i M_j^{-1}) = a_i, \\
a_k' &= \mathrm{Tr}\, M_k' = \mathrm{Tr}\, M_k = a_k.
\end{aligned}$$

It remains to prove the first equality. We again use (2.11) and (3.1) to obtain

$$x_i' = \mathrm{Tr}(M_j' M_k') = \mathrm{Tr}(M_k M_j M_i M_j^{-1}).$$
To evaluate the right-hand side, we utilize the parametrization $\varphi_{i\nu} : \mathcal{S}_{i\nu} \to \mathcal{M}_{i\nu}$ in (3.5). If we pick out the case $\nu = 1$, namely, the normal form in Table 1, then we have $(M_i, M_j, M_k) = (U, V, W)$, where $U$, $V$, $W$ are given by (3.9). By (3.10) we have

$$x_i' = \mathrm{Tr}(W V U V^{-1}) = \lambda_i^+ (u_{11} v_{22} - u_{12} v_{21}) + \lambda_i^- (u_{22} v_{11} - u_{21} v_{12}).$$

Substituting (3.11), (3.13) and (3.17) into this formula yields

$$x_i' = \frac{\lambda_i^+ \{(a_4 - a_i \lambda_i^-)(a_k - a_j \lambda_i^-) - (y_j \lambda_i^+ - y_k)\}}{x_i^2 - 4} + \frac{\lambda_i^- \{(a_4 - a_i \lambda_i^+)(a_k - a_j \lambda_i^+) - (y_j \lambda_i^- - y_k)\}}{x_i^2 - 4}.$$

After some computations, we obtain $x_i' = \theta_j(a) - x_j - x_k x_i$ as desired.
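The transformation rule just derived can be tested directly on a sample triple, without passing through the normal form. The following sketch is an illustration under stated assumptions: the sample matrices and helper names are hypothetical, indices are 0-based, and $\theta_j(a)$, which is defined in §1 and not reproduced here, is taken to be $a_j a_4 + a_k a_i$.

```python
def mul(X, Y):
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a*e + b*g, a*f + b*h), (c*e + d*g, c*f + d*h))

def inv(X):
    # Inverse of a determinant-one 2x2 matrix.
    (a, b), (c, d) = X
    return ((d, -b), (-c, a))

def tr(X):
    return X[0][0] + X[1][1]

def beta(i, M):
    # Transformation (2.11), 0-based, with (i, j, k) cyclic.
    j, k = (i + 1) % 3, (i + 2) % 3
    out = [None, None, None]
    out[i] = M[j]
    out[j] = mul(mul(M[j], M[i]), inv(M[j]))
    out[k] = M[k]
    return tuple(out)

def invariants(M):
    # (3.1): x_i = Tr(M_k M_j), a_i = Tr M_i, a_4 = Tr(M_3 M_2 M_1).
    x = tuple(tr(mul(M[(i + 2) % 3], M[(i + 1) % 3])) for i in range(3))
    a = (tr(M[0]), tr(M[1]), tr(M[2]), tr(mul(mul(M[2], M[1]), M[0])))
    return x, a

def rule_holds(i, M):
    # Compare the invariants of beta_i(M) with the rule of Theorem 3.7.
    j, k = (i + 1) % 3, (i + 2) % 3
    x, a = invariants(M)
    xn, an = invariants(beta(i, M))
    theta_j = a[j]*a[3] + a[k]*a[i]  # ASSUMED form of theta_j from Section 1
    return (xn[i] == theta_j - x[j] - x[k]*x[i]
            and xn[j] == x[i] and xn[k] == x[k]
            and an[i] == a[j] and an[j] == a[i]
            and an[k] == a[k] and an[3] == a[3])

# Hypothetical sample triple (M1, M2, M3) in SL(2, Z).
M = (((1, 1), (0, 1)), ((1, 0), (1, 1)), ((2, 1), (1, 1)))

results = [rule_holds(i, M) for i in range(3)]
print(results)
```

Because the check runs on the matrices themselves, it does not depend on which chart $\mathcal{M}_{i\nu}$ the triple lies in.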
Corresponding to Remark 2.6, we make the following:

Remark 3.8. Definition (3.1) allows us to consider the fibration

$$\pi : \mathcal{M} \to \mathbb{C}^4, \qquad M = (M_1, M_2, M_3) \mapsto a = (a_1, a_2, a_3, a_4). \tag{3.23}$$

It is clear that the fiber $\mathcal{M}(a)$ over each $a \in \mathbb{C}^4$ is stable under the action of the pure braid group $P_3$ in Remark 2.6. Then Theorem 3.7 implies that the action of the pure braid $\beta_i^2$ is represented by the transformation $g_i^2$.

We conclude this section with the following:

Definition 3.9 (Extended monodromy data). Fix an index $i \in \{1, 2, 3\}$. For a monodromy data $M = (M_1, M_2, M_3) \in \mathcal{M}$, we put

$$M_4 = (M_k M_j M_i)^{-1}, \tag{3.24}$$

where $(i, j, k)$ is the cyclic permutation of $(1, 2, 3)$ starting from $i$. The quartet $(M_1, M_2, M_3, M_4)$ is then called the extended monodromy data relative to the index $i$. Reference to the index will often be omitted; which index is chosen should be clear from the context.

Lemma 3.10. If $M = (M_1, M_2, M_3)$ is the normal form on $\mathcal{M}_{i\nu}$ with $\nu = 1$ (see Table 1), then the matrix $M_4$ in (3.24) takes the form

$$M_4 = \begin{pmatrix} \dfrac{a_i - a_4 \lambda_i^-}{r_i} & \dfrac{\lambda_i^+\, \psi(x_i, a_i, a_4)}{x_i^2 - 4} \\[2ex] -\lambda_i^- & -\dfrac{a_i - a_4 \lambda_i^+}{r_i} \end{pmatrix}. \tag{3.25}$$

Proof. The proof is a straightforward calculation.

A formula similar to (3.25) can also be obtained for $\nu = 2$; its derivation is left to the reader. If $M = (M_1, M_2, M_3)$ is the normal form on $\mathcal{M}_{i\nu}$, then the quartet $(M_1, M_2, M_3, M_4)$ with $M_4$ given by (3.25) is referred to as the extended normal form on $\mathcal{M}_{i\nu}$. This notion will be necessary in §5.
4. The Big Open

We shall characterize the big open $\mathcal{S}^\circ$ in some detail and make sure that it occupies a large portion of $\mathcal{S}$. Based on the fibration

$$\pi : \mathcal{S} \to \mathbb{C}^4, \qquad (x, a) \mapsto a,$$

this problem can be discussed fiberwise. We define the big open of $\mathcal{S}(a)$ by

$$\mathcal{S}^\circ(a) = \mathcal{S}(a) \cap \mathcal{S}^\circ \qquad (a \in \mathbb{C}^4).$$

Then a first question is to ask when the big open $\mathcal{S}^\circ(a)$ coincides with the entire surface $\mathcal{S}(a)$. To approach this problem, we consider the complement $\mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$ rather than $\mathcal{S}^\circ(a)$ itself. In view of the definition (3.6), a point $x \in \mathcal{S}(a)$ belongs to the complement $\mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$ if and only if $x$ is a common root of six algebraic equations,

$$p_{i\nu}(x, a) = 0 \qquad (i = 1, 2, 3,\ \nu = 1, 2), \tag{4.1}$$

where $p_{i\nu}(x, a)$ are defined by (3.2). Clearly the eight points $(\pm 2, \pm 2, \pm 2)$ are common roots of (4.1) on $\mathbb{C}^3$. Thus, for a triple sign $\varepsilon = (\varepsilon_1, \varepsilon_2, \varepsilon_3) \in \{\pm 1\}^3$, we have $2\varepsilon = (2\varepsilon_1, 2\varepsilon_2, 2\varepsilon_3) \in \mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$ if and only if $f_a(2\varepsilon) = 0$. This observation leads us to introduce the polynomial

$$v(a) = \prod_{\varepsilon \in \{\pm 1\}^3} f_a(2\varepsilon), \tag{4.2}$$

which is naturally expected to play a role in solving the above problem. Somewhat more unexpectedly, the polynomial $w(a)$ defined by (1.14) also plays an important part. Indeed we have the following:

Theorem 4.1. For any $a \in \mathbb{C}^4$, we have $\mathcal{S}^\circ(a) = \mathcal{S}(a)$ if and only if

$$v(a)\, w(a) \neq 0, \tag{4.3}$$

where $v(a)$ and $w(a)$ are defined by (4.2) and (1.14), respectively.

Proof. First, we show that if the complement $\mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$ is nonempty, then we have $v(a) w(a) = 0$. It is sufficient to deduce $w(a) = 0$ upon assuming $v(a) \neq 0$. Let $x = (x_1, x_2, x_3)$ be a point of $\mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$. By the assumption $v(a) \neq 0$, it follows from (4.2) that $x_i \neq \pm 2$ for some $i \in \{1, 2, 3\}$. In view of (3.2), we notice that $x_i$ is a common root of two quadratic equations,

$$(\ast)_i \qquad \begin{cases} \psi(x_i, a_i, a_4) = x_i^2 - (a_i a_4) x_i + a_i^2 + a_4^2 - 4 = 0, \\ \psi(x_i, a_j, a_k) = x_i^2 - (a_j a_k) x_i + a_j^2 + a_k^2 - 4 = 0. \end{cases} \tag{4.4}$$

Two cases occur according to whether $\tau_i(a) = a_i a_4 - a_j a_k$ is zero or not.

Case (1). $\tau_i(a) \neq 0$: In this case, subtracting one equation from the other in (4.4), we find that the common root $x_i$ must be

$$z_i = \frac{a_i^2 - a_j^2 - a_k^2 + a_4^2}{\tau_i(a)}. \tag{4.5}$$
On the other hand, a simple check shows that there exist identities

$$\psi(z_i, a_i, a_4) = \psi(z_i, a_j, a_k) = \frac{w(a)}{\tau_i^2(a)}. \tag{4.6}$$

Hence (4.4) implies $w(a) = 0$.

Case (2). $\tau_i(a) = 0$: In this case, the two quadratic equations (4.4) have common roots if and only if they are identical, that is, if and only if

$$a_i a_4 = a_j a_k, \qquad a_i^2 + a_4^2 = a_j^2 + a_k^2.$$

This is the case if and only if there exists a sign $\varepsilon \in \{\pm 1\}$ such that either

$$\text{(i)} \quad \begin{cases} a_j = \varepsilon a_i, \\ a_k = \varepsilon a_4, \end{cases} \qquad \text{or} \qquad \text{(ii)} \quad \begin{cases} a_j = \varepsilon a_4, \\ a_k = \varepsilon a_i. \end{cases} \tag{4.7}$$

In either case, if we put $\varepsilon_i = 1$, $\varepsilon_j = \varepsilon_k = -\varepsilon$, then we have $\varepsilon_1 \varepsilon_2 \varepsilon_3 = 1$ and

$$\varepsilon_1 a_1 + \varepsilon_2 a_2 + \varepsilon_3 a_3 + a_4 = 0. \tag{4.8}$$

This, together with $\tau_i(a) = a_i a_4 - a_j a_k = 0$, yields $w(a) = 0$.

Conversely, we shall show that if $v(a) w(a) = 0$, then the complement $\mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$ is nonempty. First, if $v(a) = 0$, we have $f_a(2\varepsilon) = 0$ for some triple sign $\varepsilon \in \{\pm 1\}^3$, and hence $2\varepsilon \in \mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$ as desired. Next we assume that $w(a) = 0$. The arguments are divided into four cases.

Case (1). $\tau_1(a) \tau_2(a) \tau_3(a) \neq 0$: In this case, (4.5) makes sense for each $i = 1, 2, 3$. Since we are assuming $w(a) = 0$, (4.6) implies that $z_i$ is a common root of (4.4) for $i = 1, 2, 3$. Hence $z = (z_1, z_2, z_3)$ is a common root of (4.1), namely, $z \notin \mathcal{S}^\circ(a)$. On the other hand, there exists an identity

$$f_a(z) = \frac{(a_1 a_2 a_3 a_4)\, w^2(a)}{\tau_1^2(a)\, \tau_2^2(a)\, \tau_3^2(a)}.$$

By the assumption $w(a) = 0$, we have $f_a(z) = 0$ and hence $z \in \mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$.

Case (2). $\tau_i(a) = 0$, $\tau_j(a) \tau_k(a) \neq 0$. By $\tau_i(a) = 0$, (1.14) implies that

$$\prod_{\varepsilon_i \varepsilon_j \varepsilon_k = 1} (\varepsilon_i a_i + \varepsilon_j a_j + \varepsilon_k a_k + a_4) = 0.$$
Hence there exists a triple sign $(\varepsilon_1, \varepsilon_2, \varepsilon_3) \in \{\pm 1\}^3$ with $\varepsilon_1 \varepsilon_2 \varepsilon_3 = 1$ such that (4.8) holds. Then conditions (4.8) and $\tau_i(a) = 0$ readily yield either case of (4.7), with $\varepsilon = -\varepsilon_k$ in case (i) and $\varepsilon = -\varepsilon_j$ in case (ii). But we have $\tau_j(a) = 0$ in case (i) and $\tau_k(a) = 0$ in case (ii). So neither case is feasible.

Case (3). $\tau_i(a) = \tau_j(a) = 0$, $\tau_k(a) \neq 0$. The argument in Case (2) shows that the case (i) of (4.7) is occurring. Then we have $\tau_k(a) = \varepsilon(a_4^2 - a_i^2) \neq 0$ and $z_k = 2\varepsilon$. By the assumption $w(a) = 0$ and identities (4.6) with $i$ replaced by $k$, we see that $z_k$ is a common root of the system $(\ast)_k$ in (4.4). On the other hand, systems $(\ast)_i$ and $(\ast)_j$ are reduced to single equations,

$$\begin{aligned}
(\ast)_i \qquad & x_i^2 - (a_i a_4) x_i + a_i^2 + a_4^2 - 4 = 0, \\
(\ast)_j \qquad & x_j^2 - (\varepsilon a_i a_4) x_j + a_i^2 + a_4^2 - 4 = 0,
\end{aligned}$$

respectively. Moreover, in the present situation, we observe that $f_a(x_i, x_j, z_k) = (x_i + \varepsilon x_j - a_i a_4)^2$. In view of these, take any root $\alpha$ of $(\ast)_i$ and put

$$x_i = \alpha, \qquad x_j = \varepsilon(a_i a_4 - \alpha), \qquad x_k = z_k = 2\varepsilon.$$

Then we easily see that $x_j$ is a root of $(\ast)_j$, along with the trivial fact that $x_i$ is a root of $(\ast)_i$ and $f_a(x) = 0$. This means that $x \in \mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$.

Case (4). $\tau_1(a) = \tau_2(a) = \tau_3(a) = 0$. Condition (4.8) is still satisfied for some $\varepsilon = (\varepsilon_1, \varepsilon_2, \varepsilon_3) \in \{\pm 1\}^3$ with $\varepsilon_1 \varepsilon_2 \varepsilon_3 = 1$. Using this we easily see that

$$a_i = -\varepsilon_i a_4 \qquad (i = 1, 2, 3).$$

For each $i = 1, 2, 3$, the system $(\ast)_i$ in (4.4) is reduced to a single equation

$$x_i^2 + \varepsilon_i a_4^2 x_i + 2(a_4^2 - 2) = (x_i + 2\varepsilon_i)\{x_i + \varepsilon_i(a_4^2 - 2)\} = 0.$$

Hence $(\ast)_i$ has the roots $-2\varepsilon_i$, $-\varepsilon_i(a_4^2 - 2)$. On the other hand, we have

$$f_a(x) = x_1 x_2 x_3 + x_1^2 + x_2^2 + x_3^2 - 2\varepsilon_1 a_4^2 x_1 - 2\varepsilon_2 a_4^2 x_2 - 2\varepsilon_3 a_4^2 x_3 + a_4^4 + 4a_4^2 - 4.$$

It can easily be seen that $x = (-2\varepsilon_1, -2\varepsilon_2, -\varepsilon_3(a_4^2 - 2))$, for instance, satisfies $(\ast)_i$, $i = 1, 2, 3$, and $f_a(x) = 0$ simultaneously, and hence $x \in \mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$. In any case, the complement is nonempty and the proof is complete.
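The role of the point $z_i$ in (4.5) can be illustrated with exact rational arithmetic. The sketch below uses a hypothetical sample value of $a$ with all $\tau_i(a) \neq 0$; it checks the first equality in (4.6), and that the quantity $\tau_i^2(a)\, \psi(z_i, a_i, a_4)$ does not depend on $i$, as it must if it equals a single polynomial $w(a)$. The explicit formula (1.14) for $w(a)$ is not reproduced in this part of the paper, so it is not used here.

```python
from fractions import Fraction as F

def psi(s, t, u):
    # The polynomial (3.3).
    return s*s + t*t + u*u - s*t*u - 4

# Hypothetical sample parameters a = (a_1, a_2, a_3, a_4).
a = [F(1), F(2), F(3), F(5)]

def tau(i):
    # tau_i(a) = a_i a_4 - a_j a_k, 0-based with (i, j, k) cyclic.
    j, k = (i + 1) % 3, (i + 2) % 3
    return a[i]*a[3] - a[j]*a[k]

def z(i):
    # The candidate common root (4.5); requires tau_i(a) != 0.
    j, k = (i + 1) % 3, (i + 2) % 3
    return (a[i]**2 - a[j]**2 - a[k]**2 + a[3]**2) / tau(i)

first_equality = True
scaled = []
for i in range(3):
    j, k = (i + 1) % 3, (i + 2) % 3
    lhs = psi(z(i), a[i], a[3])
    rhs = psi(z(i), a[j], a[k])
    first_equality = first_equality and (lhs == rhs)
    scaled.append(lhs * tau(i)**2)

print(first_equality, scaled)
```

For the sample value $a = (1, 2, 3, 5)$ the three scaled quantities all equal 256, which is nonzero, so in this example none of the $z_i$ is a common root of the system $(\ast)_i$.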
Theorem 4.1 prompts a complete characterization of the set $\mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$ for each $a \in \mathbb{C}^4$ satisfying $v(a) w(a) = 0$. This problem is not discussed in this paper and is left for another occasion. Here we content ourselves with the following:

Lemma 4.2. For any $a \in \mathbb{C}^4$, the set $\mathcal{S}(a) \setminus \mathcal{S}^\circ(a)$ contains at most 64 points.

Proof. In (4.1), each equation $p_{i\nu}(x, a) = 0$ is a quartic equation for the single unknown $x_i$. Hence, for each $i = 1, 2, 3$, there are at most four possible values of $x_i$. In total, $x = (x_1, x_2, x_3)$ has at most $4^3 = 64$ possibilities.

The upper bound 64 in Lemma 4.2 is not best possible; it is the result of a very rough estimate. We only wish to illustrate that, for every $a$, the complement of the big open $\mathcal{S}^\circ(a)$ in $\mathcal{S}(a)$ is a finite set.

5. Symplectic Structure

The parameters $a = (a_1, a_2, a_3, a_4)$ in (3.1) play the role of local monodromy data around the punctures $t_1, t_2, t_3, t_4$ of the space $X_t = \mathbb{C} - \{t_1, t_2, t_3\} = \mathbb{P}^1 - \{t_1, t_2, t_3, t_4\}$, where $t_4 = \infty$ is the point at infinity. To discuss moduli spaces of monodromy representations with fixed local monodromy data, let $\mathcal{R}_t(a)$ be the subspace of $\mathcal{R}_t$ that can be identified with $\mathcal{M}(a)$ through the bijection (2.9),

$$\begin{array}{ccc} \mathcal{R}_t & \longrightarrow & \mathcal{M} \\ \cup & & \cup \\ \mathcal{R}_t(a) & \longrightarrow & \mathcal{M}(a). \end{array}$$
Then $\mathcal{R}_t(a)$ may be regarded as the moduli space of monodromy representations with a fixed local monodromy data $a$. This naïve picture is true, provided that $a \in \mathbb{C}^4$ satisfies the condition

$$\prod_{i=1}^{4} (a_i^2 - 4) \neq 0. \tag{5.1}$$

Indeed, for $i = 1, 2, 3, 4$, a local monodromy data at the point $t_i$ is the conjugacy class of a local monodromy matrix $M_i$, while the conjugacy class of a matrix $M_i \in SL(2, \mathbb{C})$ is uniquely determined by the value of its trace $a_i = \mathrm{Tr}\, M_i$, provided that $M_i$ has distinct eigenvalues, namely, provided that $a_i \neq \pm 2$. This constraint for every $i = 1, 2, 3, 4$ leads to the condition (5.1). In previous papers [16, 17], following the idea of Goldman [8], we constructed a natural symplectic structure on $\mathcal{R}_t(a)$ based on the Poincaré-Lefschetz duality for cohomology. Note that Hitchin [13] also considered the symplectic structure in similar isomonodromic problems. Let us briefly recall our construction.

First, we notice that the space $X_t$ in (2.5) can be replaced by a compact domain $D$ with boundary $C = C_1 \cup C_2 \cup C_3 \cup C_4$ as indicated in Fig. 5, where $C_l$, $l = 1, 2, 3, 4$, are copies of the circle $S^1$. Namely, $D$ is obtained from the Riemann sphere $\mathbb{P}^1 \cong S^2$ by removing four sufficiently small open disks centered at $t_1, t_2, t_3, t_4$. Then the space $\mathcal{R}_t$ is identified with the moduli space $\mathcal{R}_D$ of monodromy representations of $\pi_1(D)$, $\mathcal{R}_t = \mathcal{R}_D$. Let $\mathcal{R}_C$ be the moduli space of monodromy representations of $\pi_1(C)$ and let $r : \mathcal{R}_D \to \mathcal{R}_C = \mathcal{R}_{C_1} \times \mathcal{R}_{C_2} \times \mathcal{R}_{C_3} \times \mathcal{R}_{C_4}$ be the natural restriction map. Then each local monodromy data $a \in \mathbb{C}^4$ satisfying (5.1) is thought of as an element of $\mathcal{R}_C$ and we have, for $a \in \mathcal{R}_C$,

$$\mathcal{R}_t(a) = \{\, \rho \in \mathcal{R}_D : r(\rho) = a \,\}.$$

[Fig. 5. The domain $D$ with boundary $C = C_1 \cup C_2 \cup C_3 \cup C_4$]
Hence the tangent space to $\mathcal{R}_t(a)$ at a point $\rho \in \mathcal{R}_t(a)$ is expressed as

$$T_\rho \mathcal{R}_t(a) = \mathrm{Ker}\,[\, (dr)_\rho : T_\rho \mathcal{R}_D \to T_{r(\rho)} \mathcal{R}_C \,].$$

This expression has the following cohomological interpretation. A monodromy representation $\rho \in \mathcal{R}_t(a)$ defines a linear representation $\mathrm{Ad} \circ \rho^{-1}$ of $\pi_1(D)$ in $\mathfrak{sl}(2, \mathbb{C})$, where $\mathrm{Ad}$ is the adjoint representation of $SL(2, \mathbb{C})$ in its Lie algebra $\mathfrak{sl}(2, \mathbb{C})$. Let $L_\rho$ be the associated flat $\mathfrak{sl}(2, \mathbb{C})$-bundle over $D$. Then the standard deformation theory tells us that the tangent spaces $T_\rho \mathcal{R}_D$ and $T_{r(\rho)} \mathcal{R}_C$ are identified with the first cohomology groups $H^1(D; L_\rho)$ and $H^1(C; L_\rho)$, respectively, and the tangent map $(dr)_\rho : T_\rho \mathcal{R}_D \to T_{r(\rho)} \mathcal{R}_C$ at $\rho$ of the restriction map $r$ is represented by the homomorphism $j^*$ in the cohomology long exact sequence of the pair $(D, C)$ with local system $L_\rho$,

$$H^0(C; L_\rho) \xrightarrow{\ \delta^*\ } H^1(D, C; L_\rho) \xrightarrow{\ i^*\ } H^1(D; L_\rho) \xrightarrow{\ j^*\ } H^1(C; L_\rho).$$

Hence the tangent space to $\mathcal{R}_t(a)$ at the point $\rho$ can be expressed as

$$T_\rho \mathcal{R}_t(a) = \mathrm{Ker}\,[\, j^* : H^1(D; L_\rho) \to H^1(C; L_\rho) \,]. \tag{5.2}$$

The long exact sequence allows another description of the tangent space,

$$T_\rho \mathcal{R}_t(a) = \frac{H^1(D, C; L_\rho)}{\delta^* H^0(C; L_\rho)}, \tag{5.3}$$

where this identification is induced from the homomorphism $i^*$. On the other hand, there exists the Poincaré-Lefschetz duality pairing

$$H^1(D; L_\rho) \times H^1(D, C; L_\rho) \xrightarrow{\ \text{cup product}\ } H^2(D, C; L_\rho \otimes L_\rho) \xrightarrow{\ \text{Killing form}\ } H^2(D, C; \mathbb{C}_D) = \mathbb{C}. \tag{5.4}$$

Here the second arrow is induced from the morphism of local systems $L_\rho \otimes L_\rho \to \mathbb{C}_D$ associated to the Killing form on the Lie algebra $\mathfrak{sl}(2, \mathbb{C})$, where $\mathbb{C}_D$ is the constant system on $D$ with fiber $\mathbb{C}$. In (5.4) the orthogonal complement to $\mathrm{Ker}\, j^* \subset H^1(D; L_\rho)$ is the subspace $\delta^* H^0(C; L_\rho) \subset H^1(D, C; L_\rho)$. Hence (5.4) induces a perfect pairing between the two vector spaces $\mathrm{Ker}\, j^*$ and $H^1(D, C; L_\rho)/\delta^* H^0(C; L_\rho)$. Then the identifications (5.2) and (5.3) lead to a nondegenerate skew-symmetric bilinear form

$$\Omega_{a,\rho} : T_\rho \mathcal{R}_t(a) \times T_\rho \mathcal{R}_t(a) \to \mathbb{C}. \tag{5.5}$$

We thus have an almost symplectic structure $\Omega_a$ on $\mathcal{R}_t(a)$, which turns out to be integrable and hence defines a (complex) symplectic structure. The aim of this section is to represent this symplectic structure explicitly in terms of the coordinates $(x, a)$ in §1, and the main result is the following:

Theorem 5.1 (Symplectic structure). In terms of the coordinates $(x, a)$, the symplectic structure (5.5) is identical with the 2-form $\omega_a$ defined by (1.11).
Area-Preserving Action of the Modular Group
Fig. 6. The domain $D$ with cuts $\delta = \delta_1 \cup \delta_2 \cup \delta_3$
The rest of this section is devoted to the proof of this theorem, which will be completed only at the end of the section after several preliminary discussions. Although we are now considering the space of monodromy representations over $\mathbb{P}^1$ minus four points, the calculation presented below remains valid over $\mathbb{P}^1$ minus $n$ points. To express the pairing (5.5) explicitly, we shall describe the tangent space $T_\rho R_t(a)$ in terms of de Rham cohomology. By (5.2), any tangent vector $X \in T_\rho R_t(a)$ is an element of $H^1(D; L_\rho)$ whose $j^*$-image in $H^1(C; L_\rho)$ is trivial. We begin by describing the de Rham isomorphism
\[ H^1(D; L_\rho) \to H^1_{DR}(D; L_\rho), \qquad X \mapsto \phi. \tag{5.6} \]
Let $M = (M_1, M_2, M_3) \in M(a)$ be the monodromy data of the monodromy representation $\rho \in R_t(a)$. We provide the domain $D$ with cuts $\delta_1, \delta_2, \delta_3$, where for each $l = 1, 2, 3$, the cut $\delta_l$ is a line segment joining the circles $C_l$ and $C_4$ in the manner indicated in Fig. 6. Let $\delta_l^+$ (resp. $\delta_l^-$) be the line segment infinitesimally near $\delta_l$ to the right (resp. left). The domain $D$ is provided with the usual counter-clockwise orientation. Then the induced orientations on $C_l$, $l = 1, 2, 3, 4$, and $\delta_l^\pm$, $l = 1, 2, 3$, are as indicated in Fig. 6. The loop obtained by joining $\delta_l^+$, $C_l$, $\delta_l^-$ in this order corresponds to the loop $\gamma_l$ in Fig. 2 with reversed orientation. An $L_\rho$-valued smooth differential 1-form $\phi$ on $D$ is identified with an $\mathfrak{sl}(2,\mathbb{C})$-valued smooth differential 1-form $\phi$ on $D - (\delta_1 \cup \delta_2 \cup \delta_3)$, having extensions $\phi_l^\pm$ to $\delta_l^\pm$, such that
\[ \phi_l^+ = M_l\, \phi_l^-\, M_l^{-1} \quad \text{on } \delta_l \quad (l = 1, 2, 3). \tag{5.7} \]
On the other hand, an element $X \in H^1(D; L_\rho)$ is represented by a triple $X = (X_1, X_2, X_3) \in \mathfrak{sl}(2,\mathbb{C})^3$, where $X_l$ is regarded as an $\mathfrak{sl}(2,\mathbb{C})$-valued constant function on $\delta_l$. Since the sheaf of smooth sections of $L_\rho$ is soft, there exists an $\mathfrak{sl}(2,\mathbb{C})$-valued smooth function $u$ on $D - (\delta_1 \cup \delta_2 \cup \delta_3)$, having extensions $u_l^\pm$ to $\delta_l^\pm$, such that
\[ X_l = u_l^+ - M_l\, u_l^-\, M_l^{-1} \quad \text{on } \delta_l \quad (l = 1, 2, 3). \tag{5.8} \]
K. Iwasaki
Then $\phi = du$ satisfies (5.7) and defines an $L_\rho$-valued closed 1-form whose de Rham class is none other than the image of $X$ under the isomorphism (5.6). Assume that $X \in H^1(D; L_\rho)$ has the trivial $j^*$-image in
\[ H^1(C; L_\rho) = \bigoplus_{l=1}^{4} H^1(C_l; L_\rho). \]
For $l = 1, 2, 3$, the condition that $X \in H^1(D; L_\rho)$ has the trivial $j^*$-image in $H^1(C_l; L_\rho)$ implies that there exists a matrix $Y_l \in \mathfrak{sl}(2,\mathbb{C})$ such that
\[ X_l = Y_l - M_l Y_l M_l^{-1} \quad (l = 1, 2, 3), \tag{5.9} \]
where $Y_l$ is regarded as an $\mathfrak{sl}(2,\mathbb{C})$-valued constant function on $C_l$. Similarly, the condition that $X \in H^1(D; L_\rho)$ has the trivial $j^*$-image in $H^1(C_4; L_\rho)$ implies that there exists a triple $Z = (Z_{ij}, Z_{jk}, Z_{ki}) \in \mathfrak{sl}(2,\mathbb{C})^3$ such that
\[ \begin{aligned} X_i &= Z_{ij} - M_i Z_{ki} M_i^{-1}, \\ X_j &= Z_{jk} - M_j Z_{ij} M_j^{-1}, \\ X_k &= Z_{ki} - M_k Z_{jk} M_k^{-1}, \end{aligned} \tag{5.10} \]
where $Z_{ij}$, $Z_{jk}$, $Z_{ki}$ are regarded as $\mathfrak{sl}(2,\mathbb{C})$-valued constant functions on the arcs $\widehat{q_i q_j}$, $\widehat{q_j q_k}$, $\widehat{q_k q_i}$ on $C_4$, respectively.

Remark 5.2. Giving a triple $Z = (Z_{ij}, Z_{jk}, Z_{ki})$ satisfying condition (5.10) is equivalent to giving a matrix $Z_{ki}$ satisfying the compatibility condition
\[ Z_{ki} - M_4^{-1} Z_{ki} M_4 = X_k + M_k X_j M_k^{-1} + (M_k M_j) X_i (M_k M_j)^{-1}. \tag{5.11} \]
Indeed, given a triple $Z$, elimination of $Z_{ij}$ and $Z_{jk}$ from (5.10) yields the compatibility condition (5.11) for $Z_{ki}$. Conversely, if a matrix $Z_{ki}$ satisfying (5.11) is given, then $Z_{ij}$ and $Z_{jk}$ are uniquely determined from the first and second equations of (5.10); the third equation is automatically satisfied thanks to condition (5.11).

In the present case, for any data $Y = (Y_1, Y_2, Y_3)$ and $Z = (Z_{ij}, Z_{jk}, Z_{ki})$ satisfying conditions (5.9) and (5.10), we may and shall assume that the function $u$ mentioned above takes constant boundary values,
\[ u = \begin{cases} Y_l & \text{on } C_l \quad (l = 1, 2, 3), \\ Z_{ij} & \text{on } \widehat{q_i q_j}, \\ Z_{jk} & \text{on } \widehat{q_j q_k}, \\ Z_{ki} & \text{on } \widehat{q_k q_i}. \end{cases} \tag{5.12} \]
Accordingly, the 1-form $\phi = du$ has zero boundary values
\[ \phi = 0 \quad \text{on } C_l \quad (l = 1, 2, 3, 4). \tag{5.13} \]
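The equivalence stated in Remark 5.2 is a purely algebraic consequence of $M_4 M_k M_j M_i = I$, so it can be checked with exact rational arithmetic. The following is a minimal sketch; the sample matrices are hypothetical and chosen only so that each $M_l$ has determinant one.

```python
from fractions import Fraction

def mat(a, b, c, d):
    """Build a 2x2 matrix with exact rational entries."""
    return ((Fraction(a), Fraction(b)), (Fraction(c), Fraction(d)))

def mul(A, B):
    return tuple(tuple(sum(A[r][k] * B[k][c] for k in range(2))
                       for c in range(2)) for r in range(2))

def add(A, B):
    return tuple(tuple(A[r][c] + B[r][c] for c in range(2)) for r in range(2))

def sub(A, B):
    return tuple(tuple(A[r][c] - B[r][c] for c in range(2)) for r in range(2))

def inv(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((A[1][1] / det, -A[0][1] / det), (-A[1][0] / det, A[0][0] / det))

def conj(M, A):
    """Return M A M^{-1}."""
    return mul(mul(M, A), inv(M))

# Hypothetical sample data: three matrices of determinant one,
# and M4 fixed by the relation M4 Mk Mj Mi = I.
Mi, Mj, Mk = mat(2, 1, 1, 1), mat(1, 3, 0, 1), mat(1, 0, 2, 1)
M4 = inv(mul(mul(Mk, Mj), Mi))

# An arbitrary traceless triple Z = (Zij, Zjk, Zki).
Zij, Zjk, Zki = mat(1, 2, 3, -1), mat(0, 1, 5, 0), mat(2, -1, 4, -2)

# (5.10), read as the definition of X in terms of Z.
Xi = sub(Zij, conj(Mi, Zki))
Xj = sub(Zjk, conj(Mj, Zij))
Xk = sub(Zki, conj(Mk, Zjk))

# Both sides of the compatibility condition (5.11).
lhs = sub(Zki, conj(inv(M4), Zki))
rhs = add(add(Xk, conj(Mk, Xj)), conj(mul(Mk, Mj), Xi))

# Converse direction: rebuild Zij and Zjk from Zki, then test the third equation.
Zij_rebuilt = add(Xi, conj(Mi, Zki))
Zjk_rebuilt = add(Xj, conj(Mj, Zij_rebuilt))
third_equation_holds = sub(Zki, conj(Mk, Zjk_rebuilt)) == Xk

print(lhs == rhs, third_equation_holds)
```

Exact fractions are used instead of floating point so that the two sides of (5.11) can be compared with `==`.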
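In conclusion, a tangent vector $X \in T_\rho R_t(a)$ is represented by the data $(X, Y, Z, u, \phi)$ constructed above. We call it the de Rham data of $X$. Then the symplectic pairing (5.5) between $X, \tilde X \in T_\rho R_t(a)$ is expressed as
\[ \Omega_{a,\rho}(X, \tilde X) = \int_D \operatorname{Tr}(\phi \wedge \tilde\phi), \tag{5.14} \]
where $(X, Y, Z, u, \phi)$ and $(\tilde X, \tilde Y, \tilde Z, \tilde u, \tilde\phi)$ are the de Rham data of $X$ and $\tilde X$, respectively. Note that $\operatorname{Tr}(\phi \wedge \tilde\phi)$ is a single-valued smooth 2-form on $D$ (with zero boundary value), since $\phi$ and $\tilde\phi$ satisfy condition (5.7). Hence the integral in (5.14) is well defined. Now the Stokes theorem allows us to recast the integral representation (5.14) into a more elementary expression.

Lemma 5.3. The symplectic pairing (5.14) between two tangent vectors $X, \tilde X \in T_\rho R_t(a)$ with de Rham data $(X, Y, Z, u, \phi)$, $(\tilde X, \tilde Y, \tilde Z, \tilde u, \tilde\phi)$ is given by
\[ \Omega_{a,\rho}(X, \tilde X) = \operatorname{Tr}(X_i[\tilde Y_i - \tilde Z_{ij}]) + \operatorname{Tr}(X_j[\tilde Y_j - \tilde Z_{jk}]) + \operatorname{Tr}(X_k[\tilde Y_k - \tilde Z_{ki}]). \tag{5.15} \]

Proof. Integral representation (5.14) yields
\begin{align*}
\Omega_{a,\rho}(X, \tilde X) &= \int_D \operatorname{Tr}(\phi \wedge \tilde\phi) = \int_{D-\delta} \operatorname{Tr}(\phi \wedge \tilde\phi) \\
&= \int_{D-\delta} \operatorname{Tr}(du \wedge \tilde\phi) = \int_{D-\delta} d\operatorname{Tr}(u\,\tilde\phi) && \text{(by } \phi = du\text{)} \\
&= \int_{\partial(D-\delta)} \operatorname{Tr}(u\,\tilde\phi) && \text{(by Stokes)} \\
&= \sum_{l=1}^{3} \int_{\delta_l^+} \operatorname{Tr}(u_l^+ \tilde\phi_l^+) + \sum_{l=1}^{3} \int_{\delta_l^-} \operatorname{Tr}(u_l^- \tilde\phi_l^-) + \sum_{l=1}^{4} \int_{C_l} \operatorname{Tr}(u\,\tilde\phi) \\
&= \sum_{l=1}^{3} \int_{\delta_l^+} \operatorname{Tr}(u_l^+ \tilde\phi_l^+) + \sum_{l=1}^{3} \int_{\delta_l^-} \operatorname{Tr}(u_l^- \tilde\phi_l^-) && \text{(by (5.13))}.
\end{align*}
Rewriting the second term on the last line by using the equalities
\[ u_l^- = M_l^{-1}(u_l^+ - X_l) M_l, \qquad \tilde\phi_l^- = M_l^{-1} \tilde\phi_l^+ M_l \quad \text{on } \delta_l, \]
which follow from (5.8) and (5.7), and noting that $\delta_l^-$ carries the orientation opposite to that of $\delta_l^+$, we have
\begin{align*}
\Omega_{a,\rho}(X, \tilde X) &= \sum_{l=1}^{3} \int_{\delta_l^+} \operatorname{Tr}(u_l^+ \tilde\phi_l^+) - \sum_{l=1}^{3} \int_{\delta_l^+} \operatorname{Tr}((u_l^+ - X_l)\,\tilde\phi_l^+) \\
&= \sum_{l=1}^{3} \int_{\delta_l^+} \operatorname{Tr}(X_l\,\tilde\phi_l^+) = \sum_{l=1}^{3} \int_{\delta_l^+} \operatorname{Tr}(X_l\, d\tilde u_l^+) && \text{(by } \tilde\phi = d\tilde u\text{)} \\
&= \sum_{l=1}^{3} \int_{\delta_l^+} d\operatorname{Tr}(X_l\,\tilde u_l^+) = \sum_{l=1}^{3} \operatorname{Tr}(X_l[\tilde u_l^+(p_l) - \tilde u_l^+(q_l)]).
\end{align*}
By (5.12), we have $\tilde u_l^+(p_l) = \tilde Y_l$ for $l = 1, 2, 3$, and $\tilde u_i^+(q_i) = \tilde Z_{ij}$, $\tilde u_j^+(q_j) = \tilde Z_{jk}$, $\tilde u_k^+(q_k) = \tilde Z_{ki}$. Substituting these equalities into the above formula, we establish (5.15). The proof is complete.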
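The only algebraic input in the last step of the proof is the conjugation invariance of the trace: with $u_l^- = M_l^{-1}(u_l^+ - X_l)M_l$ and $\tilde\phi_l^- = M_l^{-1}\tilde\phi_l^+ M_l$, the integrand on $\delta_l^-$ becomes $\operatorname{Tr}((u_l^+ - X_l)\tilde\phi_l^+)$. A small sketch of this pointwise identity follows, with hypothetical sample matrices standing in for the values of $u^+$, $X$ and the matrix coefficient of $\tilde\phi^+$ at one point of a cut.

```python
from fractions import Fraction

def mat(a, b, c, d):
    """Build a 2x2 matrix with exact rational entries."""
    return ((Fraction(a), Fraction(b)), (Fraction(c), Fraction(d)))

def mul(A, B):
    return tuple(tuple(sum(A[r][k] * B[k][c] for k in range(2))
                       for c in range(2)) for r in range(2))

def sub(A, B):
    return tuple(tuple(A[r][c] - B[r][c] for c in range(2)) for r in range(2))

def inv(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((A[1][1] / det, -A[0][1] / det), (-A[1][0] / det, A[0][0] / det))

def conj(M, A):
    """Return M A M^{-1}."""
    return mul(mul(M, A), inv(M))

def tr(A):
    return A[0][0] + A[1][1]

# Hypothetical values at a single point of the cut.
M = mat(2, 1, 1, 1)              # monodromy matrix, determinant one
u_plus = mat(1, 4, -2, -1)       # value of u on the right side of the cut
X = mat(0, 3, 1, 0)              # the constant X_l
phi_plus = mat(2, 5, 7, -2)      # matrix coefficient of the 1-form on the right side

# Left-side values dictated by (5.8) and (5.7).
u_minus = conj(inv(M), sub(u_plus, X))     # M^{-1} (u+ - X) M
phi_minus = conj(inv(M), phi_plus)         # M^{-1} phi+ M

left_integrand = tr(mul(u_minus, phi_minus))
rewritten = tr(mul(sub(u_plus, X), phi_plus))
jump = tr(mul(u_plus, phi_plus)) - left_integrand

print(left_integrand == rewritten, jump == tr(mul(X, phi_plus)))
```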
Formula (5.15) is still an intermediate result that has yet to be made more explicit. This will be done in terms of the extended normal form on $M_i^\nu$ constructed in §3. In that procedure, it will be necessary to diagonalize the matrices in Table 1 and (3.25) for $\nu = 1$ and their counterparts for $\nu = 2$. In what follows, only the case $\nu = 1$ is treated, the other case being omitted. We require a bit of notation; as in (3.4), fix a square root of $a_i^2 - 4$ and put
\[ s_i = \sqrt{a_i^2 - 4}, \qquad \xi_i^\pm = \frac{a_i \pm s_i}{2}. \]
Under the assumption (5.1), $s_i$ is nonzero and $\xi_i^\pm$ are mutually distinct. To save space, the following abbreviated notation is employed in the sequel,
\[ y_{pq}^\pm = y_p - y_q \lambda_i^\pm, \qquad \psi_{pq} = \psi(x_i, a_p, a_q), \qquad a_{pq}^\pm = a_p - a_q \lambda_i^\pm, \qquad b_{pq} = 2a_p - a_q x_i. \]
Moreover, we put $\mathbf{i} = \sqrt{-1}$ (boldface) to distinguish it from the index $i$.
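For orientation, the numbers $\xi_i^\pm$ are the two roots of $\xi^2 - a_i \xi + 1 = 0$, so their product is one and their sum is $a_i$. The sketch below checks this numerically and also illustrates the parametrization $a = 2\cos\pi\kappa$ that appears later in (6.9), for which the pair is $\{e^{\pm\pi\mathbf{i}\kappa}\}$; the function name `xi_pair` and the sample value of $\kappa$ are hypothetical.

```python
import cmath
import math

def xi_pair(a):
    """Return (xi_plus, xi_minus) = ((a + s)/2, (a - s)/2) with s a square root of a^2 - 4."""
    s = cmath.sqrt(a * a - 4)
    return (a + s) / 2, (a - s) / 2

kappa = 0.3                                  # hypothetical sample exponent
a = 2 * math.cos(math.pi * kappa)
xi_plus, xi_minus = xi_pair(a)

product_error = abs(xi_plus * xi_minus - 1)
sum_error = abs(xi_plus + xi_minus - a)
# The pair coincides with exp(+-pi*i*kappa) as a set; which root is "plus" depends on the branch.
target = cmath.exp(1j * math.pi * kappa)
match_error = min(abs(xi_plus - target), abs(xi_minus - target))

print(product_error, sum_error, match_error)
```

The two roots collide exactly when $a = \pm 2$, which is the degenerate situation in which $s$ vanishes.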
Lemma 5.4. Let $M = (M_1, M_2, M_3, M_4)$ be the extended normal form on $M_i^1$ as in Table 1 and (3.25). Then the matrices $M_l$ are diagonalized as
\[ P_l^{-1} M_l P_l = \operatorname{diag}\{\xi_l^+, \xi_l^-\} \quad (l = 1, 2, 3, 4), \tag{5.16} \]
where $P_l \in SL(2,\mathbb{C})$, $l = 1, 2, 3, 4$, are defined as in Table 3.

Proof. We use the following general fact: Assume that a matrix
\[ M = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2,\mathbb{C}) \]
Table 3. The diagonalizing matrices Pl . (i = + + 1 a4i + ξi ri Pi = √ ri si 1 1 Pj = √ sj
1 Pk = √ sk
+ + ξi− ri a4i ri 1
− ykj
ri + −akj − ξj+ ri
−
yj+k ri −aj+k − ξk+ ri
√ −1)
−
1 ri + akj + ξj− ri − ykj
1 ri aj+k + ξk− ri yj+k
a + + ξ4+ ri λ+ (a + + ξ4− ri ) i − i4 − i i4 P4 = √ ri ri s4 − λi 1
(5.16)
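has distinct eigenvalues $\xi^\pm$ and define the matrix $P \in SL(2,\mathbb{C})$ by
\[ P = \begin{cases} \dfrac{1}{\sqrt{c(\xi^+ - \xi^-)}} \begin{pmatrix} \mu(\xi^+ - d) & (\xi^- - d)/\mu \\ \mu c & c/\mu \end{pmatrix} & (\text{if } c \neq 0), \\[3ex] \dfrac{\mathbf{i}}{\sqrt{b(\xi^+ - \xi^-)}} \begin{pmatrix} \mu b & b/\mu \\ \mu(\xi^+ - a) & (\xi^- - a)/\mu \end{pmatrix} & (\text{if } b \neq 0), \end{cases} \tag{5.17} \]
where $\mu$ is any nonzero number. Then the matrix $M$ is diagonalized as $P^{-1} M P = \operatorname{diag}\{\xi^+, \xi^-\}$. We apply the first formula of (5.17) to $M = M_i, M_4$ and the second formula to $M = M_j, M_k$, respectively. Taking suitable numbers $\mu$, we obtain the matrices $P_i$, $P_j$, $P_k$, $P_4$ as in Table 3. The proof is complete.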
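The general fact (5.17) can be tested numerically. The sketch below builds $P$ from either line of (5.17) and measures how far $\det P$ is from $1$ and $P^{-1}MP$ is from $\operatorname{diag}\{\xi^+, \xi^-\}$; the function names and the sample matrices are hypothetical and do not come from Table 1.

```python
import cmath

def mul(A, B):
    return tuple(tuple(sum(A[r][k] * B[k][c] for k in range(2))
                       for c in range(2)) for r in range(2))

def diagonalizer(M, mu=1.0, formula="c"):
    """Build P as in (5.17); formula='c' needs c != 0, formula='b' needs b != 0."""
    (a, b), (c, d) = M
    trace = a + d
    s = cmath.sqrt(trace * trace - 4)          # s = xi_plus - xi_minus
    xi_plus, xi_minus = (trace + s) / 2, (trace - s) / 2
    if formula == "c":
        n = 1 / cmath.sqrt(c * s)
        P = ((n * mu * (xi_plus - d), n * (xi_minus - d) / mu),
             (n * mu * c, n * c / mu))
    else:
        n = 1j / cmath.sqrt(b * s)
        P = ((n * mu * b, n * b / mu),
             (n * mu * (xi_plus - a), n * (xi_minus - a) / mu))
    return P, (xi_plus, xi_minus)

def check(M, mu=1.0, formula="c"):
    """Return (|det P - 1|, max entrywise error of P^{-1} M P against diag{xi+, xi-})."""
    P, (xi_plus, xi_minus) = diagonalizer(M, mu, formula)
    (p11, p12), (p21, p22) = P
    det = p11 * p22 - p12 * p21
    P_inv = ((p22 / det, -p12 / det), (-p21 / det, p11 / det))
    D = mul(mul(P_inv, M), P)
    err = max(abs(D[0][0] - xi_plus), abs(D[1][1] - xi_minus),
              abs(D[0][1]), abs(D[1][0]))
    return abs(det - 1), err

# Hypothetical samples: a real and a complex matrix of determinant one.
M_real = ((2, 1), (1, 1))
a0, b0, c0 = 1 + 1j, 2, 1j
M_complex = ((a0, b0), (c0, (1 + b0 * c0) / a0))

for M in (M_real, M_complex):
    for formula in ("c", "b"):
        print(check(M, mu=0.5 + 2j, formula=formula))
```

Either branch of the complex square root may be used in the prefactor, since replacing $P$ by $-P$ changes neither $\det P$ nor $P^{-1}MP$.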
Next we have to calculate the infinitesimal variation of monodromy data in $M(a)$. Let $d$ denote the exterior differentiation on the space $M(a)$, namely, the relative differentiation on the fibration (3.23). Note that we must treat $a = (a_1, a_2, a_3, a_4)$ as constants when applying the differential $d$.

Lemma 5.5. Let $M = (M_1, M_2, M_3, M_4)$ be the extended normal form on $M_i^1$ and $P = (P_1, P_2, P_3, P_4)$ be the diagonalizing matrices given in Table 3. Define $\mathfrak{sl}(2,\mathbb{C})$-valued 1-forms $X_l$ and $Y_l$ by
\[ X_l = (dM_l) M_l^{-1}, \qquad Y_l = (dP_l) P_l^{-1} \quad (l = 1, 2, 3, 4), \tag{5.18} \]
and put $X = (X_1, X_2, X_3)$ and $Y = (Y_1, Y_2, Y_3, Y_4)$. Then $X_l$ and $Y_l$ satisfy
\[ X_l = Y_l - M_l Y_l M_l^{-1} \quad (l = 1, 2, 3, 4), \tag{5.19} \]
and, in particular, condition (5.9). If we put
\[ \begin{aligned} Z_{ij} &= Y_i - M_i (Y_i - Y_4) M_i^{-1}, \\ Z_{jk} &= Y_j - M_j (Y_j - Y_i) M_j^{-1} - (M_j M_i)(Y_i - Y_4)(M_j M_i)^{-1}, \\ Z_{ki} &= Y_4, \end{aligned} \tag{5.20} \]
then the triple $Z = (Z_{ij}, Z_{jk}, Z_{ki})$ satisfies condition (5.10). Moreover, the explicit formulas for $X$ and $Y$ are given by Tables 4 and 5, respectively.

Proof. Condition (5.19) is readily obtained by differentiating (5.16) and using (5.18). Here we use the fact that the right-hand side of (5.16) is constant. Next we show the second assertion. It follows from $M_4 M_k M_j M_i = I$ that
\[ d(M_4 M_k M_j M_i) \cdot (M_4 M_k M_j M_i)^{-1} = 0, \]
which leads to
\[ X_4 + M_4 X_k M_4^{-1} + (M_4 M_k) X_j (M_4 M_k)^{-1} + (M_4 M_k M_j) X_i (M_4 M_k M_j)^{-1} = 0. \]
Substituting (5.19) with $l = 4$ into this equation yields
\[ Y_4 - M_4^{-1} Y_4 M_4 = X_k + M_k X_j M_k^{-1} + (M_k M_j) X_i (M_k M_j)^{-1}. \]
K. Iwasaki Table 4. The matrices X = (X1 , X2 , X3 ) − − b4i a4i ψi4 − a4i b r i4 ri2 Xi = i 3 dxi − ri a4i 1 − r i + akj ψj k − − ri ykj ri2 dykj Xj = ψ a− j k kj ψj k ri − − − − (ykj )2 ri ykj
− − + ykj bj k (2xi akj + bj k ) 2xi ψj k + akj − − 2 ri ri + 2xi ψj k a − + a − bj k bkj − ψj k bj k 2xi ψj k + a − bj k kj kj kj − ri ykj aj+k ψj k + ri yj+k ri2 dyj k Xk = ψ a− jk jk ψj k ri − + − + (yj k )2 ri yj k
2xi ψj k + aj−k bkj
− ri + 2xi ψj k a − + a − bj k bkj − ψj k bkj jk jk yj+k
−
dxi r3 i
yj+k (2xi aj+k + bkj ) ri2 2xi ψj k + aj−k bkj ri
dxi r3 i
Table 5. The matrices Y = (Y1 , Y2 , Y3 , Y4 ) Yi =
01 00
− Yj = − Yk = − Y4 =
+ akj
bi4
dxi ri3 + ξj− ri
− ri ykj
ψj k − 2 (ykj ) + aj k + ξk− ri
1 − 2 ri + akj + ξj− ri − ri ykj
ψj k
1 − 2 ri aj+k + ξk− ri
(yj+k )2
ri yj+k
ri yj+k
+ ai4
+ ξ4− ri ri2 λ− i ri
λ+ ψi4 − i 3 ri + ai4 + ξ4− ri ri2
x i 0 − − dykj ri b + s − j k xi j − r ykj i
dxi ri
x i − 0 + dyj k ri dxi + bkj xi s ri − k yj+k ri dx i s4
−
01 00
+ λi b4i ri3
dxi
This means that $Z_{ki} = Y_4$ satisfies the compatibility condition (5.11). In view of Remark 5.2, if we put
\[ \begin{aligned} Z_{ij} &= X_i + M_i Y_4 M_i^{-1}, \\ Z_{jk} &= X_j + M_j X_i M_j^{-1} + (M_j M_i) Y_4 (M_j M_i)^{-1}, \\ Z_{ki} &= Y_4, \end{aligned} \]
then the triple $Z = (Z_{ij}, Z_{jk}, Z_{ki})$ satisfies (5.10). Substituting (5.19) with $l = 1, 2$ into the above formula yields (5.20). Finally, the explicit formulas for $X$ and $Y$ in Tables 4 and 5 are obtained by direct but somewhat elaborate calculations. In this process, we have only to apply the general formula
\[ (dR) R^{-1} = \begin{pmatrix} r_{22}\, dr_{11} - r_{21}\, dr_{12} & r_{11}\, dr_{12} - r_{12}\, dr_{11} \\ r_{22}\, dr_{21} - r_{21}\, dr_{22} & r_{11}\, dr_{22} - r_{12}\, dr_{21} \end{pmatrix} \quad \text{for} \quad R = \begin{pmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{pmatrix}, \]
valid when $\det R = 1$, to the matrices in Tables 1 and 3. Here preliminary formulas such as
\[ dr_i = \frac{x_i\, dx_i}{r_i}, \qquad d\lambda_i^\pm = \pm \frac{\lambda_i^\pm\, dx_i}{r_i}, \]
\[ (a_{kj}^+ + \xi_j^+ r_i)(a_{kj}^+ + \xi_j^- r_i) = \psi_{jk}, \qquad (a_{i4}^+ + \xi_4^+ r_i)(a_{i4}^+ + \xi_4^- r_i) = \psi_{i4}, \qquad y_{kj}^+\, y_{kj}^- = \psi_{jk}\, \psi_{i4} \]
are effectively used in the course of calculations. Note that the last formula follows from the identity (3.18), since we have $f(x, a) = 0$.
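Two ingredients of this proof are elementary enough to verify by machine: the closed formula for $(dR)R^{-1}$ when $\det R = 1$, and the identity obtained by differentiating $M_4 M_k M_j M_i = I$, in which the $X_j$ term is conjugated by $M_4 M_k$. The sketch below uses exact rational arithmetic; the tangent directions $dM_l = A_l M_l$ with traceless $A_l$ are hypothetical sample data, not the entries of Tables 4 and 5.

```python
from fractions import Fraction

def mat(a, b, c, d):
    """Build a 2x2 matrix with exact rational entries."""
    return ((Fraction(a), Fraction(b)), (Fraction(c), Fraction(d)))

def mul(*Ms):
    """Product of any number of 2x2 matrices, left to right."""
    P = Ms[0]
    for B in Ms[1:]:
        P = tuple(tuple(sum(P[r][k] * B[k][c] for k in range(2))
                        for c in range(2)) for r in range(2))
    return P

def add(*Ms):
    return tuple(tuple(sum(M[r][c] for M in Ms) for c in range(2)) for r in range(2))

def neg(A):
    return tuple(tuple(-A[r][c] for c in range(2)) for r in range(2))

def inv(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return ((A[1][1] / det, -A[0][1] / det), (-A[1][0] / det, A[0][0] / det))

def conj(M, A):
    """Return M A M^{-1}."""
    return mul(M, A, inv(M))

def dR_Rinv(R, dR):
    """The closed formula for (dR) R^{-1}, valid when det R = 1."""
    (r11, r12), (r21, r22) = R
    (d11, d12), (d21, d22) = dR
    return ((r22 * d11 - r21 * d12, r11 * d12 - r12 * d11),
            (r22 * d21 - r21 * d22, r11 * d22 - r12 * d21))

# Hypothetical data: determinant-one matrices and traceless directions A_l.
Mi, Mj, Mk = mat(2, 1, 1, 1), mat(1, 3, 0, 1), mat(1, 0, 2, 1)
Ai, Aj, Ak = mat(1, 2, 3, -1), mat(0, 1, 5, 0), mat(2, -1, 4, -2)
dMi, dMj, dMk = mul(Ai, Mi), mul(Aj, Mj), mul(Ak, Mk)   # tangent to SL(2)

# M4 and dM4 are forced by M4 Mk Mj Mi = I.
N = mul(Mk, Mj, Mi)
dN = add(mul(dMk, Mj, Mi), mul(Mk, dMj, Mi), mul(Mk, Mj, dMi))
M4 = inv(N)
dM4 = neg(mul(M4, dN, M4))

Xi, Xj, Xk, X4 = (dR_Rinv(Mi, dMi), dR_Rinv(Mj, dMj),
                  dR_Rinv(Mk, dMk), dR_Rinv(M4, dM4))

total = add(X4, conj(M4, Xk), conj(mul(M4, Mk), Xj), conj(mul(M4, Mk, Mj), Xi))
zero = mat(0, 0, 0, 0)
print(total == zero, Xi == Ai)
```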
Summarizing the above arguments, we obtain the following:

Lemma 5.6. Let $M = (M_1, M_2, M_3)$, $X = (X_1, X_2, X_3)$, $Y = (Y_1, Y_2, Y_3, Y_4)$ be as in Tables 1, 4, 5. Then the symplectic 2-form $\Omega_a$ is expressed as
\[ \begin{aligned} \Omega_a = {} & \operatorname{Tr}(X_i \wedge M_i (Y_i - Y_4) M_i^{-1}) + \operatorname{Tr}(X_j \wedge M_j (Y_j - Y_i) M_j^{-1}) \\ & + \operatorname{Tr}(X_j \wedge (M_j M_i)(Y_i - Y_4)(M_j M_i)^{-1}) + \operatorname{Tr}(X_k \wedge (Y_k - Y_4)). \end{aligned} \tag{5.21} \]

Proof. Let $Z = (Z_{ij}, Z_{jk}, Z_{ki})$ be defined by (5.20). Then (5.15) yields
\[ \Omega_a = \operatorname{Tr}(X_i \wedge [Y_i - Z_{ij}]) + \operatorname{Tr}(X_j \wedge [Y_j - Z_{jk}]) + \operatorname{Tr}(X_k \wedge [Y_k - Z_{ki}]). \]
Here the exterior product $\wedge$ is introduced between, say, $X_i$ and $Y_i - Z_{ij}$, because they are 1-forms. Replacing $Z_{ij}$, $Z_{jk}$, $Z_{ki}$ by the right-hand sides of (5.20), we obtain (5.21).
We are now in a position to establish Theorem 5.1.

Proof of Theorem 5.1. The only task yet to be done is to substitute into (5.21) the explicit formulas for $M$, $X$, $Y$ in Tables 1, 4, 5 and to carry out some straightforward but elaborate computations. We have done this using the computer algebra system Mathematica. A large number of cancellations occur in the course of the calculation, and we finally arrive at the simple formula (1.11). More precisely, we are able to show that the 2-form defined by (5.14) is two times the 2-form defined by (1.11). To drop this inessential factor 2, we have only to replace the definition (5.14) by its half. The proof is complete.
6. Hamiltonian Dynamics

We shall discuss the connection between our construction and the isomonodromic nature of the Painlevé VI equation from the viewpoint of Hamiltonian dynamics. In this section, we understand that $T$ is the configuration space of three ordered distinct points in $\mathbb{C}$, and hence the relevant action is that of the pure braid group $P_3$. So far we have considered the moduli space $R_t(a)$ of monodromy representations for a fixed $t \in T$. Hereafter we shall consider the family of spaces $R_t(a)$ parametrized by $t \in T$, namely, the fibration whose fiber over each $t \in T$ is the space $R_t(a)$,
\[ \pi : R(a) \to T. \tag{6.1} \]
This fibration admits a local system structure whose monodromy at a base point $t \in T$ is represented by the isomonodromic action of the pure braid group $P_3$ on $R_t(a)$. In our previous papers [16, 17] (see also Kawai [23, 24], Boalch [3]), we insisted on the standpoint of studying the isomonodromic deformation of Fuchsian differential equations based on the commutative diagram
\[ \begin{array}{ccc} E(a) & \xrightarrow{\ \text{monodromy map}\ } & R(a) \\ {\scriptstyle \pi} \downarrow & & \downarrow {\scriptstyle \pi} \\ T & \xrightarrow{\quad \text{identity} \quad} & T \end{array} \tag{6.2} \]
where $E(a)$ is a moduli space of Fuchsian connections on a Riemann surface with a fixed local monodromy data $a$. In the present situation, it is a moduli space of Fuchsian equations with four regular singular points on the Riemann sphere $\mathbb{P}^1$. The canonical projection
\[ \pi : E(a) \to T \tag{6.3} \]
is the map associating to each Fuchsian connection its (ordered) regular singular points. The top horizontal arrow in (6.2) is the monodromy map, or the Riemann-Hilbert correspondence, associating to each Fuchsian connection its monodromy representation. The moduli space $E(a)$ should be formulated in such a manner that the monodromy map is a covering map. Then this map lifts the local system structure on (6.1) up to the fibration (6.3). To describe this lifting explicitly, we shall introduce certain 2-forms, called the fundamental 2-forms. To this end, note that giving a local system structure on a fibration is equivalent to giving an integrable foliation whose leaves are transverse to each fiber, and that the latter structure determines a horizontal structure on the fibration. Now we make the following:

Definition 6.1 (Fundamental 2-forms). The fundamental 2-form $\Omega_{R(a)}$ on $R(a)$ is, by definition, the unique global 2-form on $R(a)$ such that
(1) $\Omega_{R(a)}|_{R_t(a)} = \Omega_{R_t(a)}$ for each $t \in T$, where $\Omega_{R_t(a)} = \Omega_a$ is the symplectic 2-form on each fiber $R_t(a)$ defined by (5.5);
(2) $\iota_X \Omega_{R(a)} = 0$ for any (local) horizontal vector field $X$ on the fibration (6.1), where $\iota_X$ denotes the interior product by the vector field $X$.
Moreover, the fundamental 2-form $\Omega_{E(a)}$ on $E(a)$ is defined to be the pull-back of $\Omega_{R(a)}$ by the monodromy map $E(a) \to R(a)$.
We remark that the fundamental 2-form $\Omega_{R(a)}$ determines a relative symplectic structure on the fibration (6.1), with respect to which the foliation on it is symplectic, namely, Hamiltonian. If the monodromy map is a covering map, then these structures are lifted verbatim up to the fibration (6.3). The lifting principle then merely restates what we have just mentioned.

Lemma 6.2 (Lifting principle). Assume that the monodromy map $E(a) \to R(a)$ is a covering map. Then it lifts the local system structure on (6.1) up to the fibration (6.3). In terms of the fundamental 2-form $\Omega_{E(a)}$, the lifted local system structure is described as the condition that
\[ \iota_X \Omega_{E(a)} = 0 \tag{6.4} \]
for any (local) horizontal vector field $X$ on $E(a)$. The fundamental 2-form $\Omega_{E(a)}$ determines a relative symplectic structure on the fibration (6.3), with respect to which the lifted foliation is Hamiltonian. The associated Hamiltonian system is none other than the rewrite of (6.4) as a differential equation.

The almost tautological statement of Lemma 6.2 will yield a nontrivial result if the moduli space $E(a)$ is formulated so as to admit good coordinates that make it possible to write down (6.4) as a concrete Hamiltonian system. We shall now set up the space $E(a)$ in such a manner. There are two ways of representing Fuchsian equations with four regular singular points on $\mathbb{P}^1$: one is in terms of first order systems (connections) of rank two, and the other is in terms of second order single equations with an additional singularity called an apparent singular point, each having its merits and demerits. Naturally, we should take both approaches for a complete discussion of this theme, but this issue is left to forthcoming papers, e.g., Inaba, Iwasaki and Saito [15]. Here we only wish to clarify the role that our modular group action should play in the isomonodromic deformation, so we content ourselves with taking the second approach only. We denote by $z$ the coordinate on the finite plane $\mathbb{C} = \mathbb{P}^1 \setminus \{\infty\}$.
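The passage from condition (6.4) to a Hamiltonian system can be made concrete by a short computation. The sketch below assumes fiber coordinates $(q, p)$ and base coordinates $t = (t_1, t_2, t_3)$ in which the fundamental 2-form has the shape $dq \wedge dp - \sum_j dt_j \wedge dH_j$, which is the shape found in (6.10) below; the symbols $Q_i$, $P_i$ for the components of the horizontal lift of $\partial/\partial t_i$ are notation introduced here only for illustration.

```latex
\begin{align*}
X_i &= \frac{\partial}{\partial t_i} + Q_i \frac{\partial}{\partial q} + P_i \frac{\partial}{\partial p},
\qquad Q_i = \frac{\partial q}{\partial t_i}, \quad P_i = \frac{\partial p}{\partial t_i}, \\
\iota_{X_i}(dq \wedge dp) &= Q_i\, dp - P_i\, dq, \\
\iota_{X_i}(dt_j \wedge dH_j) &= \delta_{ij}\, dH_j - X_i(H_j)\, dt_j, \\
\iota_{X_i}\Bigl(dq \wedge dp - \sum_j dt_j \wedge dH_j\Bigr)
 &= \Bigl(Q_i - \frac{\partial H_i}{\partial p}\Bigr) dp
  - \Bigl(P_i + \frac{\partial H_i}{\partial q}\Bigr) dq
  + \sum_k \Bigl(X_i(H_k) - \frac{\partial H_i}{\partial t_k}\Bigr) dt_k .
\end{align*}
```

Setting the $dp$ and $dq$ coefficients to zero gives $\partial q/\partial t_i = \partial H_i/\partial p$ and $\partial p/\partial t_i = -\partial H_i/\partial q$, while the $dt_k$ coefficients give the compatibility conditions among the $H_i$ that express the integrability of the foliation.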
To set up the space $E(a)$, consider a meromorphic differential operator of the form
\[ L = -\frac{d^2}{dz^2} + Q(z). \tag{6.5} \]
We think of it as an operator from $M_{\mathbb{P}^1}(1)$ to $M_{\mathbb{P}^1}(-3)$, where $M_{\mathbb{P}^1}(d)$ is the sheaf of meromorphic sections of the degree $d$ line bundle over $\mathbb{P}^1$. Then $L$ takes the same form as (6.5) in terms of the coordinate $z^{-1}$ around the point at infinity. We assume that the potential $Q(z)$ is of the form
\[ Q(z) = \sum_{i=1}^{3} \left\{ \frac{\kappa_i^2 - 1}{4(z - t_i)^2} - \frac{H_i}{z - t_i} \right\} + \frac{3}{4(z - q)^2} + \frac{p}{z - q}. \tag{6.6} \]
Then $L$ has regular singular points at $t_i$, $i = 1, 2, 3$, and $q$, with local exponents $(1 \pm \kappa_i)/2$, $i = 1, 2, 3$, and $(1 \pm 2)/2$, respectively. Assume further that the point at infinity, $t_4 = \infty$, is also a regular singular point with exponents $(1 \pm \kappa_4)/2$ and that $q$ is an apparent singular point, namely, a non-logarithmic singular point. Then the last assumption forces $H_i$ to be a rational function of $(q, p, t)$ with parameters $\kappa = (\kappa_1, \kappa_2, \kappa_3, \kappa_4)$, namely,
\[ H_i = \frac{q_i}{t_{ij}\, t_{ki}} \left[ q_j q_k\, p^2 + (q_j + q_k)\, p + \alpha - \sum_{l=1}^{3} \frac{\kappa_l^2 - 1}{4} \cdot \frac{q_m q_n}{q_l\, q_i} \right], \tag{6.7} \]
where $\{l, m, n\} = \{1, 2, 3\}$ and the following abbreviated notation is used,
\[ q_i = q - t_i, \qquad t_{ij} = t_i - t_j, \qquad \alpha = \frac{\kappa_1^2 + \kappa_2^2 + \kappa_3^2 - \kappa_4^2 + 1}{4}. \]
Given an operator $L$, its monodromy matrix along the loop $\gamma_i$ in Fig. 2 is denoted by $-M_i$ for each $i = 1, 2, 3, 4$. Then $M_i$ has the eigenvalues $\exp(\pm \pi \mathbf{i} \kappa_i)$, and hence the trace $\operatorname{Tr} M_i = 2\cos \pi\kappa_i$. On the other hand, the monodromy matrix around the apparent singular point $q$ is $-I$. Dropping the factor $-1$, we may define the monodromy data of $L$ to be the triple $M = (M_1, M_2, M_3)$. Now we are in a position to make the following:
Definition 6.3 (Fuchsian moduli). Let $E_\kappa$ be the space of Fuchsian differential operators $L$ in (6.5), with (6.6) and (6.7), and let
\[ E(a) = \coprod_{\kappa} E_\kappa, \tag{6.8} \]
where the disjoint union is taken over all $\kappa = (\kappa_1, \kappa_2, \kappa_3, \kappa_4)$ such that
\[ a_i = 2\cos \pi\kappa_i \quad (i = 1, 2, 3), \qquad a_4 = -2\cos \pi\kappa_4. \tag{6.9} \]
The monodromy map $E(a) \to R(a)$ is defined by $L \mapsto (M, t)$, where $M$ is as above and $t = (t_1, t_2, t_3)$ is the location of ordered regular singular points. A more precise description of the monodromy map is given in Inaba, Iwasaki and Saito [14, 15]. In this setting, a result of Iwasaki [16] and Yoshida [41] implies that the monodromy map is a covering map over a Zariski open subset of $R(a)$, one-to-one on each $E_\kappa$. Then the general pull-back principle established by Iwasaki [17] yields the following:

Theorem 6.4 (Hamiltonian system). Let $H_i = H_i(q, p, t, \kappa)$, $i = 1, 2, 3$, be as in (6.7). Then the fundamental 2-form $\Omega_{E(a)}$ on $E(a)$ is expressed as
\[ \Omega_{E(a)} = dq \wedge dp - \sum_{i=1}^{3} dt_i \wedge dH_i, \tag{6.10} \]
and the isomonodromic deformation is described by the Hamiltonian system
\[ \frac{\partial q}{\partial t_i} = \frac{\partial H_i}{\partial p}, \qquad \frac{\partial p}{\partial t_i} = -\frac{\partial H_i}{\partial q} \quad (i = 1, 2, 3). \tag{6.11} \]
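The Hamiltonians $H_i$ entering Theorem 6.4 are those of (6.7), which in turn are pinned down by three linear conditions on the potential (6.6): the $1/z$ coefficient of $Q$ at infinity vanishes, the $1/z^2$ coefficient equals $(\kappa_4^2 - 1)/4$, and the singular point $q$ carries no logarithm. The sketch below checks these conditions with exact rational arithmetic. The signs used for (6.6), namely $-H_i/(z - t_i)$ and $+p/(z - q)$, and the non-logarithmic criterion $c_0 = c_{-1}^2$ for $Q = 3/(4w^2) + c_{-1}/w + c_0 + \cdots$ with $w = z - q$, are the working assumptions of this sketch; the function names and sample values are hypothetical.

```python
from fractions import Fraction

def hamiltonians(q, p, t, kappa):
    """H_1, H_2, H_3 as in (6.7); t is keyed by 1..3 and kappa by 1..4 (use Fractions)."""
    alpha = (kappa[1]**2 + kappa[2]**2 + kappa[3]**2 - kappa[4]**2 + 1) / 4
    qd = {i: q - t[i] for i in (1, 2, 3)}                 # q_i = q - t_i
    H = {}
    for i, j, k in ((1, 2, 3), (2, 3, 1), (3, 1, 2)):
        tail = Fraction(0)
        for l in (1, 2, 3):
            m, n = (x for x in (1, 2, 3) if x != l)       # {l, m, n} = {1, 2, 3}
            tail += (kappa[l]**2 - 1) / 4 * qd[m] * qd[n] / (qd[l] * qd[i])
        bracket = qd[j] * qd[k] * p**2 + (qd[j] + qd[k]) * p + alpha - tail
        H[i] = qd[i] / ((t[i] - t[j]) * (t[k] - t[i])) * bracket   # t_ij * t_ki
    return H

def residuals(q, p, t, kappa):
    """Three quantities that should all vanish if (6.6) and (6.7) are consistent."""
    H = hamiltonians(q, p, t, kappa)
    A = {i: (kappa[i]**2 - 1) / 4 for i in (1, 2, 3, 4)}
    # Coefficient of 1/z in Q(z) at infinity.
    r1 = -sum(H[i] for i in (1, 2, 3)) + p
    # Coefficient of 1/z^2 at infinity minus the prescribed value (kappa_4^2 - 1)/4.
    r2 = sum(A[i] - H[i] * t[i] for i in (1, 2, 3)) + Fraction(3, 4) + p * q - A[4]
    # Non-logarithmic condition at z = q: c0 = c_{-1}^2 with c_{-1} = p.
    c0 = sum(A[i] / (q - t[i])**2 - H[i] / (q - t[i]) for i in (1, 2, 3))
    r3 = c0 - p**2
    return r1, r2, r3

# Hypothetical sample point with rational data.
kappa = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 5), 4: Fraction(1, 7)}
t = {1: Fraction(0), 2: Fraction(1), 3: Fraction(3)}
print(residuals(Fraction(5), Fraction(2), t, kappa))
```

The three conditions form a Vandermonde-type linear system in $H_1, H_2, H_3$, which is why they determine the $H_i$ uniquely as long as the points $t_i$ are distinct and $q \neq t_i$.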
The complete integrability of the system (6.11) is clear from the way in which it is constructed; of course, a direct check will also reconfirm it. The Hamiltonian system (6.11) is a system of partial differential equations with three independent variables $t = (t_1, t_2, t_3)$. However, it can be reduced to a system of ordinary differential equations by a symmetry reduction. Indeed, the group of affine transformations on $\mathbb{C}$ acts diagonally on the configuration space $T$ and this action on the base space lifts symplectically up to the total space $E(a)$ of the fibration (6.3). Then the last action reduces (6.11) into a Hamiltonian system of single time-variable, which is equivalent to the Painlevé VI equation, $P_{VI}(\kappa)$, with parameters $\kappa = (\kappa_1, \kappa_2, \kappa_3, \kappa_4)$,
\[ \begin{aligned} q_{tt} = {} & \frac{1}{2}\left( \frac{1}{q} + \frac{1}{q-1} + \frac{1}{q-t} \right) q_t^2 - \left( \frac{1}{t} + \frac{1}{t-1} + \frac{1}{q-t} \right) q_t \\ & + \frac{q(q-1)(q-t)}{2t^2(t-1)^2} \left\{ \kappa_4^2 - \kappa_1^2 \frac{t}{q^2} + \kappa_2^2 \frac{t-1}{(q-1)^2} + (1 - \kappa_3^2) \frac{t(t-1)}{(q-t)^2} \right\}. \end{aligned} \]
This reduction amounts to the simple fact that an affine transformation allows us to normalize the variables $(t_1, t_2, t_3)$ as $t_1 = 0$, $t_2 = 1$, making $t = t_3$ the only essentially independent variable. Putting all the arguments together, we can summarize the meaning of our modular group action and its area-preserving property (Theorem 1.2) in the following manner.

Theorem 6.5 (Summary). Our $\Gamma(2)$-action on the cubic surface $S(a)$ is an explicit representation of the nonlinear monodromy of the Painlevé VI equation $P_{VI}(\kappa)$, or of the Hamiltonian system (6.11) equivalent to $P_{VI}(\kappa)$, realized on the moduli space of monodromy representations with a fixed local monodromy data $a \in \mathbb{C}^4$. Here two kinds of parameters $a = (a_1, a_2, a_3, a_4)$ and $\kappa = (\kappa_1, \kappa_2, \kappa_3, \kappa_4)$ are related by (6.9). Through the monodromy map, or the Riemann-Hilbert correspondence, the Hamiltonian structure on the moduli space of Fuchsian equations can be identified with the Hamiltonian structure on the moduli space of monodromy representations. These facts result in the area-preserving property of our modular group action.

In general, a continuous dynamical system induces a discrete dynamical system, called the Poincaré map. From this point of view, our modular group action may also be regarded as the Poincaré map, induced on the moduli space of monodromy representations, of the Hamiltonian system (6.11). The nonlinear monodromy of $P_{VI}$ constructed in this paper is not the true monodromy in its strict sense, because it is constructed only on a moduli space of monodromy representations which is related to the true phase space of $P_{VI}$ via a highly transcendental map, that is, the Riemann-Hilbert correspondence. Here the true phase space should be the so-called space of initial values or the defining manifold for $P_{VI}$ constructed as in Okamoto [33], Shioda and Takano [38], Arinkin and Lysenko [1], Sakai [37], Saito, Takebe and Terajima [35], Noumi, Takano and Yamada [32] and others.
Roughly speaking, our first task is to lift the modular group action on $S(a) \cong M(a) \cong R_t(a)$ up to a typical fiber $E_t(a)$ of the fibration (6.3). For this purpose, however, our setting of $E(a)$ in Definition 6.3 is somewhat too naïve; it must be replaced by a slightly refined space $\bar{E}(a)$. We now pose the following:

Problem 6.6. Introduce an appropriate fibration of moduli spaces $\pi : \bar{E}(a) \to T$ on which the isomonodromic foliation becomes uniform, and lift our modular group action up to its typical fiber $\bar{E}_t(a)$, to obtain the true nonlinear monodromy of the Painlevé VI equation.

It is natural that the space $\bar{E}(a)$ should be constructed as a refinement of a moduli space of gauge equivalence classes of Fuchsian connections so that each piece $E_\kappa$ in (6.8) serves as a Zariski open chart of $\bar{E}(a)$. Intuitively, $\bar{E}(a)$ is constructed by gluing all the pieces $E_\kappa$, with $\kappa$ satisfying (6.9), together by the Bäcklund transformations, namely, by those transformations which describe the symmetry of $P_{VI}$, but this construction should be made algebro-geometrically and conceptually. We recall that Okamoto [34], Arinkin and Lysenko [2], Noumi and Yamada [31] gave various descriptions of the Bäcklund transformations as an affine Weyl group symmetry of type $D_4^{(1)}$. In this connection, the following observation due to Terajima [39] is important.

Lemma 6.7. Under the relation (6.9), the variables $\theta = \theta(a) = (\theta_1(a), \theta_2(a), \theta_3(a), \theta_4(a))$ form a basis of the $W(D_4^{(1)})$-invariant functions of $\kappa = (\kappa_1, \kappa_2, \kappa_3, \kappa_4)$.
Using this lemma, Inaba, Iwasaki and Saito [14] give a characterization of the Bäcklund transformations in terms of the Riemann-Hilbert correspondence, which is natural in the context of the present paper. The space $\bar{E}(a)$ in Problem 6.6 is constructed in Inaba, Iwasaki and Saito [15] as a moduli space of certain stable parabolic connections on $\mathbb{P}^1$ by generalizing the construction of Arinkin and Lysenko [1]. See [15] for a more precise treatment of this section.

Acknowledgements. The author would like to thank Masaaki Yoshida for bringing his attention to the papers [29, 30] on the moduli of cubic surfaces. Thanks are also due to Masa-Hiko Saito, Hitomi Terajima and Hiroshi Umemura for helpful discussions about the singular locus of cubic surfaces. A personal communication with Terajima should particularly be acknowledged; her observation in [39] must play significant roles in the future development of the theory. The author is also grateful to Kyoichi Takano for his keen interest in this work and to the referee for helpful suggestions.
References

1. Arinkin, D., Lysenko, S.: On the moduli of SL(2)-bundles with connections on $\mathbb{P}^1 \setminus \{x_1, \ldots, x_4\}$. Internat. Math. Res. Notices 19, 983–999 (1997)
2. Arinkin, D., Lysenko, S.: Isomorphisms between moduli spaces of SL(2)-bundles with connections on $\mathbb{P}^1 \setminus \{x_1, \ldots, x_4\}$. Math. Res. Lett. 4, 181–190 (1997)
3. Boalch, P.P.: Symplectic manifolds and isomonodromic deformations. Adv. Math. 163, 137–205 (2001)
4. Birman, J.S.: Braids, links, and mapping class groups. Ann. Math. Stud., Princeton, NJ: Princeton Univ. Press, 1974
5. Cayley, A.: On the triple tangent planes of surfaces of the third order. Collected Papers I, Cambridge: Cambridge Univ. Press, 1889, pp. 231–326
6. Dubrovin, B.: Painlevé transcendents in two-dimensional topological field theory. In: The Painlevé property, one century later, R. Conte (ed.), New York: Springer-Verlag, 1999
7. Dubrovin, B., Mazzocco, M.: Monodromy of certain Painlevé-VI transcendents and reflection groups. Invent. Math. 141(1), 55–147 (2000)
8. Goldman, W.M.: The symplectic nature of the fundamental groups of surfaces. Adv. Math. 54, 200–225 (1984)
9. Guzzetti, D.: On the critical behavior, the connection problem and the elliptic representation of a Painlevé VI equation. J. Math. Phys. Anal. Geom. 4, 293–377 (2001)
10. Guzzetti, D.: The elliptic representation of the general Painlevé VI equation. Comm. Pure Appl. Math. 55, 1280–1363 (2002)
11. Hitchin, N.: Poncelet polygons and the Painlevé equations. In: Geometry and analysis (Bombay, 1992), Bombay: Tata Inst. Fund. Res., 1995, pp. 151–185
12. Hitchin, N.: Twistor spaces, Einstein metrics and isomonodromic deformations. J. Diff. Geom. 42, 30–112 (1995)
13. Hitchin, N.: Frobenius manifolds. In: Gauge theory and symplectic geometry (Montreal, PQ, 1995), NATO Adv. Sci. Inst. Ser. C, Math. Phys. Sci., 488, Dordrecht: Kluwer Acad. Publ., 1997, pp. 69–112
14. Inaba, M., Iwasaki, K., Saito, M.-H.: Bäcklund transformations of the sixth Painlevé equation in terms of Riemann-Hilbert correspondence. To appear in Internat. Math. Res. Notices
15. Inaba, M., Iwasaki, K., Saito, M.-H.: Moduli of stable parabolic connections, Riemann-Hilbert correspondence and geometry of Painlevé equation of type VI. Preprint (2003)
16. Iwasaki, K.: Moduli and deformation for Fuchsian projective connections on a Riemann surface. J. Fac. Sci. Univ. Tokyo, Sect. IA, Math. 38, 431–531 (1991)
17. Iwasaki, K.: Fuchsian moduli on a Riemann surface – its Poisson structure and Poincaré-Lefschetz duality. Pacific J. Math. 155, 319–340 (1992)
18. Iwasaki, K.: A modular group action on cubic surfaces and the monodromy of the Painlevé VI equation. Proc. Japan Acad. 78, Ser. A, 131–135 (2002)
19. Iwasaki, K., Kimura, H., Shimomura, S., Yoshida, M.: From Gauss to Painlevé. Wiesbaden: Vieweg-Verlag, 1991
20. Jimbo, M.: Monodromy problem and the boundary condition for some Painlevé equation. Publ. Res. Inst. Math. Sci. 18, 1137–1161 (1982)
21. Jimbo, M., Miwa, T., Ueno, K.: Monodromy preserving deformation of linear ordinary differential equations with rational coefficients I – General theory and τ-functions. Physica 2D, 306–352 (1981)
22. Jimbo, M., Miwa, T.: Monodromy preserving deformation of linear ordinary differential equations with rational coefficients II, III. Physica 2D, 407–448 (1981); ibid. 4D, 26–46 (1981)
23. Kawai, S.: The symplectic nature of the space of projective connections on Riemann surfaces. Math. Ann. 305, 161–182 (1996)
24. Kawai, S.: Isomonodromic deformation of Fuchsian projective connections on elliptic curves. To appear in Nagoya Math. J. 171 (2003)
25. Manin, Y.I.: Sixth Painlevé equation, universal elliptic curve, and mirror of $\mathbb{P}^2$. In: Geometry of differential equations, Amer. Math. Soc. Transl. Ser. 2, 186, Providence, RI: Am. Math. Soc., 1998, pp. 131–151
26. Mazzocco, M.: Picard and Chazy solutions to the Painlevé VI equation. Math. Ann. 321, 157–195 (2001)
27. Mazzocco, M.: Rational solutions of the Painlevé VI equation. J. Phys. A: Math. Gen. 34, 2281–2294 (2001)
28. Mazzocco, M.: The geometry of the classical solutions of the Garnier systems. Internat. Math. Res. Notices, 2002 (12), 613–646 (2002)
29. Naruki, I.: Cross ratio variety as a moduli space of cubic surfaces. Proc. London Math. Soc. 45(3), 1–30 (1982)
30. Naruki, I., Sekiguchi, J.: A modification of Cayley's family of cubic surfaces and birational action of $W(E_6)$ over it. Proc. Japan Acad. 56, Ser. A, 122–125 (1980)
31. Noumi, M., Yamada, Y.: A new Lax pair for the sixth Painlevé equation associated with $\widehat{\mathfrak{so}}(8)$. In: Microlocal analysis and complex Fourier analysis, Kawai, T. and Fujita, K. eds., NJ: World Scientific, 2002, pp. 238–252
32. Noumi, M., Takano, K., Yamada, Y.: Bäcklund transformations and the manifolds of Painlevé systems. Funkcial. Ekvac. 45, 237–258 (2002)
33. Okamoto, K.: Sur les feuilletages associés aux équations du second ordre à points critiques de Painlevé, espace de conditions initiales. Japan. J. Math. 5, 1–79 (1979)
34. Okamoto, K.: Studies on the Painlevé equations I, sixth Painlevé equation $P_{VI}$. Annali di Math. Pura, Appl. 146(4), 337–381 (1987)
35. Saito, M.-H., Takebe, T., Terajima, H.: Deformation of Okamoto-Painlevé pairs and Painlevé equations. J. Algebraic Geom. 11, 311–362 (2002)
36. Saito, M.-H., Terajima, H.: Nodal curves and Riccati solutions of Painlevé equations. math.AG/0201225
37. Sakai, H.: Rational surfaces associated with affine root systems and geometry of the Painlevé equations. Commun. Math. Phys. 220, 165–229 (2001)
38. Shioda, T., Takano, K.: On some Hamiltonian structures of Painlevé systems, I. Funkcial. Ekvac. 40(2), 271–291 (1997)
39. Terajima, H.: On the space of monodromy data of Painlevé VI. Preprint, Kobe University, March, 2003
40. Tod, K.P.: Self-dual Einstein metrics from the Painlevé VI equation. Phys. Lett. A 190, 221–224 (1994)
41. Yoshida, M.: On the number of apparent singularities of the Riemann-Hilbert problem on Riemann surface. J. Math. Soc. Japan 49, 145–159 (1997)

Communicated by L. Takhtajan
Commun. Math. Phys. 242, 221–250 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0943-0
Quasi-Periodic Solutions for Two-Level Systems
Guido Gentile
Dipartimento di Matematica, Università di Roma Tre, 00146, Roma, Italy
Received: 4 November 2002 / Accepted: 26 May 2003
Published online: 26 September 2003 – © Springer-Verlag 2003
Abstract: We consider the Schrödinger equation for a class of two-level atoms in a quasi-periodic external field in the case in which the spacing 2ε between the two unperturbed energy levels is small, and we study the problem of finding quasi-periodic solutions of a related generalized Riccati equation. We prove the existence of quasi-periodic solutions of the latter equation for a Cantor set E of values of ε around the origin which is of positive Lebesgue measure: such solutions can be obtained from the formal power series by a suitable resummation procedure. The set E can be characterized by imposing infinitely many Diophantine conditions of Mel'nikov type.

1. Introduction

Consider the Hamiltonian describing a two-level system in a quasi-periodic external field

$$H(t) = \varepsilon\sigma_3 - f(t)\sigma_1, \tag{1.1}$$
where σ1, σ2, σ3 are the Pauli matrices and f(t) is a real analytic quasi-periodic function with frequency vector ω; the real parameter ε measures half the spacing between the unperturbed energy levels. The model has been widely studied in physics (for an introduction to the subject we refer to such classical textbooks as [9, 18]), and it was recently considered in [3, 20], in connection with the problem of studying the existence of pure point spectrum for the quasi-energy operator. In [3] the case of small external field (large ε) with two frequencies ω1 and ω2 was treated, and the spectrum of the quasi-energy operator was shown to be pure point for α = ω1/ω2 Diophantine and excluding a further small set of resonant values. In [20] the same problem was studied for large external field (small ε), and it was shown to be reducible to the case of large ε provided that the average f0 of the external field is nonvanishing: this is accomplished by performing a unitary transformation which
casts the quasi-energy operator into the same form as in the case of large ε, but one needs f0 to be nonzero. In [1] the problem was investigated of studying quasi-periodic solutions of the corresponding time-dependent Schrödinger equation

$$i\,\frac{\partial}{\partial t}\psi(t) = H(t)\psi(t) \tag{1.2}$$

for small ε: the solutions of the Schrödinger equation (1.2) were shown to be expressible in terms of particular solutions of a generalized Riccati equation (see the next section). In particular in [1] it was found that quasi-periodic solutions of the generalized Riccati equation exist in the form of formal power series, but such series were argued to be in general divergent. Here we prove that quasi-periodic solutions do indeed exist. However they are likely not to be analytic in ε, according to the conjecture proposed in [1]; in fact we are able to define them only on a set of values of the perturbative parameter ε centered around the origin and with a dense set of holes.

The problem we consider by following [1] is slightly different from that considered in [3, 20], as we fix the frequencies ω1, . . . , ωd, with d ≥ 1, of the external field, and, by imposing a Diophantine condition on the vector ω = (ω1, . . . , ωd, f0) if f0 ≠ 0 and on the vector ω = (ω1, . . . , ωd) if f0 = 0, we find quasi-periodic solutions by imposing further conditions on the parameter ε: therefore we study the dependence on ε of the quasi-periodic solutions. But of course we can also fix ε and find conditions on ω: this requires some modifications of the technical part of the forthcoming sections, which are discussed in [13]. Even after taking such modifications into account, returning to the original problem about the spectrum of the quasi-energy operator is not immediate, as one has to check some properties of the solution of the generalized Riccati equation, which are not obvious (see Sect. 7 in [1]).
Besides that, there are further problems which make it difficult to control the number of frequencies of the quasi-periodic solutions, and which can be easily settled only when the external field has zero average. So in the latter case (which is precisely the case left out in the literature) we are able to conclude that the spectrum of the quasi-energy operator is pure point, so completing the results in [20]: again this is discussed in [13], which we refer to for details.

As we said, in this paper we focus our attention directly on the related generalized Riccati equation, so proving a result left as an open problem in [1]. For simplicity we assume a nondegeneracy condition on the external field (which corresponds to the condition of case (1) of Theorem 2.2 in [1]), but we think that our methods can be successfully applied in order to deal also with the condition of case (2); we refer to Sect. 2 for a more technical discussion. We first show that it is possible, by a suitable choice of the initial conditions, to eliminate the secular terms and to obtain a formal solution which is given by a formal power series in ε quasi-periodic in time, so recovering the case (1) of Theorem 2.2 in [1]. The main difference with respect to [1] is that we use a graphic representation of the solution in terms of trees which allows us to obtain a very simple proof of existence of the formal solution. Then we introduce a suitable resummation which leads to the proof of existence of a solution which is quasi-periodic in time and defined on a Cantor set E of values of the perturbative parameter. This represents the main interest of the present paper, and the main novelty with respect to the existing literature.

An interesting question would be what happens for the values of the perturbative parameter which are excluded. This is a difficult problem. The case of the one-dimensional Schrödinger equation with a small quasi-periodic potential was considered by
Eliasson [6], and reducibility was proved for a full measure set of the perturbative parameter. Then in [7] the case of skew systems on $\mathbb{T}^d \times SO(3,\mathbb{R})$ was dealt with, and the question was raised whether reducibility for a full measure set of parameters holds also in such a case (under some reasonable conditions). A positive answer was then given by Krikorian [15], who also extended the results to more general cases (see for instance [16, 17]). As the systems they consider are very close to ours, one can expect a similar result to be valid here; in other words one can expect that for a full measure set of values outside E the system is still reducible. It would also be interesting to study systems with infinitely many levels (extension to systems with a finite number of levels should not be difficult): some results in this direction can be found in [4, 5], where the case of the periodic external field was considered.

To prove our results, we use a version of the techniques introduced in classical mechanics by Eliasson [8] in order to study KAM-type problems. Such techniques were further developed (see [10, 14, 11, 12] and papers quoted therein), by emphasizing the analogy with the methods of quantum field theory. In particular in [11] a resummation procedure was introduced which was reminiscent of the mass graphs resummation in field theory. Here we follow the same approach but slightly change the resummation procedure into a form which is more suitable to deal with the small divisors in the present case. With respect to [11] we have the extra difficulty that the resummation produces new small divisors which can vanish for certain values of ε: so we have to perform the resummation in an iterative way, being careful to exclude more and more values of ε at each step, a feature which was obviously already present in [3], even if our approach is completely different.

2. Existence of Formal Solutions

In [1] the solution of the Schrödinger equation (1.2) is shown to be expressible in terms of a particular solution of the generalized Riccati equation

$$\dot G - iG^2 - 2ifG + i\varepsilon^2 = 0, \tag{2.1}$$

with $\dot G = dG/dt$, where

$$f = f(t) = \sum_{\underline{\nu}\in\mathbb{Z}^{\underline{d}}} e^{i\underline{\nu}\cdot\underline{\omega}t} f_{\underline{\nu}} \tag{2.2}$$

is the real analytic quasi-periodic function appearing in (1.1), and $\underline{d} \ge 1$ is an integer; see Theorem 2.1 in [1]. Let us look for a solution of (2.1) of the form

$$G = -i\varepsilon Q u, \qquad Q(t) = e^{2iF(t)}, \qquad F(t) = \int_0^t dt'\, f(t'). \tag{2.3}$$
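The substitution (2.3) can be checked term by term; the short computation below is included only as a reading aid and uses nothing beyond (2.1) and (2.3). Since $\dot Q = 2if\,Q$, one finds

$$\dot G = -i\varepsilon(\dot Q u + Q\dot u) = 2\varepsilon f Q u - i\varepsilon Q\dot u, \qquad -iG^2 = i\varepsilon^2 Q^2u^2, \qquad -2ifG = -2\varepsilon f Q u,$$

so that the left-hand side of (2.1) becomes

$$\dot G - iG^2 - 2ifG + i\varepsilon^2 = -i\varepsilon Q\left(\dot u - \varepsilon Q u^2 - \varepsilon Q^{-1}\right).$$

The terms proportional to $f$ cancel, which is the purpose of the factor $Q$ in (2.3).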
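Then, for ε ≠ 0, (2.1) implies for u the following equation:

$$\dot u = \varepsilon\left(Qu^2 + Q^{-1}\right). \tag{2.4}$$

Define, with $\underline{d}$ and $\underline{\omega}$ denoting the number of frequencies and the frequency vector of the external field f,

$$d = \begin{cases} \underline{d}, & \text{if } f_0 = 0,\\ \underline{d}+1, & \text{if } f_0 \neq 0,\end{cases} \qquad \omega = \begin{cases} \underline{\omega}, & \text{if } f_0 = 0,\\ (f_0, \underline{\omega}), & \text{if } f_0 \neq 0,\end{cases} \tag{2.5}$$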
and assume that ω is a Diophantine vector, i.e. a vector satisfying the Diophantine condition

$$|\omega\cdot\nu| > C_0|\nu|^{-\tau} \qquad \forall \nu\in\mathbb{Z}^d\setminus\{0\}, \tag{2.6}$$

with $C_0$, $\tau$ two positive constants and $|\nu| = |\nu_1| + \dots + |\nu_d|$ for $\nu = (\nu_1,\ldots,\nu_d)$. Given any function g of the form

$$g(t) = \sum_{\nu\in\mathbb{Z}^d} e^{i\nu\cdot\omega t} g_\nu, \tag{2.7}$$

let us denote by $\langle g\rangle = g_0$ the constant term in its Fourier expansion. Let us suppose that one has

$$\langle Q\rangle \neq 0; \tag{2.8}$$

this corresponds to the assumption (1) of Theorem 2.2 in [1]. By the analyticity assumptions on f one has

$$|Q_\nu| \le Q\,e^{-\kappa|\nu|}, \qquad |(Q^{-1})_\nu| \le Q\,e^{-\kappa|\nu|}, \tag{2.9}$$
for two suitable positive constants Q and κ. Moreover one has $\langle Q^{-1}\rangle \neq 0$ if and only if (2.8) holds. Then we have the following result [1].

Theorem 1. The generalized Riccati equation (2.1), with f a real analytic quasi-periodic function of the form (2.2), under the hypotheses (2.6) and (2.8), admits a formal power series

$$g(t;\varepsilon) = \sum_{k=0}^{\infty} \varepsilon^k g^{(k)}(\omega t), \tag{2.10}$$

which represents a formal particular solution, i.e. to all orders k the functions $g^{(k)}(\psi)$ are well defined, and they are 2π-periodic and analytic in the variable ψ.

The proof consists in determining the initial conditions in a suitable way in order to eliminate the secular terms, and it is performed in Sects. 4 and 5. In such a way we find that the Fourier coefficients $g^{(k)}_\nu$ of the functions $g^{(k)}(\psi)$ depend on k as factorials to some powers, so that convergence does not follow.

3. Existence of Solutions

The result of the previous section can be improved into the following one.

Theorem 2. Consider the generalized Riccati equation (2.1), with f a real analytic quasi-periodic function of the form (2.2), and assume that the hypotheses (2.6) and (2.8) are fulfilled. There exist three positive constants $\varepsilon_0$, $b$ and $\xi$ and a set $E \subset (-\varepsilon_0, \varepsilon_0)$ of Lebesgue measure $\mathrm{meas}(E) \ge 2\varepsilon_0(1 - b\,\varepsilon_0^{\xi})$ such that, for all ε ∈ E, (2.1) admits a particular solution of the form

$$g(t;\varepsilon) = \tilde g(\omega t;\varepsilon), \tag{3.1}$$

where the function $\tilde g(\psi;\varepsilon)$ is 2π-periodic and analytic in the variable ψ.
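Both theorems assume the Diophantine condition (2.6). As a rough numerical illustration, the following Python sketch scans all integer vectors ν with 0 < |ν| ≤ N and returns the smallest value of |ω·ν| |ν|^τ it finds. The choices d = 2, ω = (1, (1+√5)/2), τ = 1, the cutoff N = 200 and the function name `diophantine_margin` are assumptions made for the example; a finite scan can only suggest, not prove, that (2.6) holds for a given C0.

```python
import itertools
import math

def diophantine_margin(omega, tau, n_max):
    """Return (m, nu) with m = min |omega.nu| * |nu|**tau over 0 < |nu| <= n_max."""
    best, best_nu = math.inf, None
    ranges = [range(-n_max, n_max + 1)] * len(omega)
    for nu in itertools.product(*ranges):
        norm = sum(abs(c) for c in nu)      # |nu| = |nu_1| + ... + |nu_d|
        if norm == 0 or norm > n_max:
            continue
        value = abs(sum(w * c for w, c in zip(omega, nu))) * norm ** tau
        if value < best:
            best, best_nu = value, nu
    return best, best_nu

GOLDEN = (1 + math.sqrt(5)) / 2             # assumed example frequency ratio
margin, worst_nu = diophantine_margin((1.0, GOLDEN), tau=1.0, n_max=200)
```

On the scanned range, (2.6) holds for every C0 smaller than the returned `margin`; enlarging `n_max` probes smaller divisors at a cost that grows like `n_max**d`.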
The proof of the above statements will be performed in Sects. 6 to 8. The solution (3.1) is likely not to be analytic in ε; in fact it can be obtained from the formal power expansion (2.10) through a suitable resummation procedure. The set of values of ε which have to be excluded from $(-\varepsilon_0, \varepsilon_0)$ is dense, and it depends on the external field f(t). From a technical point of view such values arise by imposing infinitely many Diophantine conditions of Mel'nikov type of the form

$$\left| i\omega\cdot\nu - \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon)\right| \ge C_0|\nu|^{-\tau_1} \qquad \forall\nu\in\mathbb{Z}^d\setminus\{0\} \text{ and } \forall n\ge -1, \tag{3.2}$$

where $\tau_1 > \tau + d$ and $\mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon)$ are suitable functions which will be constructed recursively along the iterative resummation of the formal solution. For instance one has $\mathcal{M}^{[-1]}(x;\varepsilon) = 0$, $\mathcal{M}^{[0]}(x;\varepsilon) = 2\varepsilon\langle Q\rangle c^{[0]} + O(\varepsilon^2)$, with $c^{[0]} = i\sqrt{\langle Q^{-1}\rangle/\langle Q\rangle}$, and so on.

Of course the relevance of the conditions (3.2) depends on the value of the imaginary parts of the functions $\mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon)$. We do not study this problem in general, but one sees immediately that nontrivial situations arise easily. Consider for instance the case of an external field f which is an even function with vanishing average and of order µ, with µ ≪ 1: then F(t) in (2.3) is odd, and one has that $\langle Q\rangle = 1 + O(\mu)$ and $\langle Q^{-1}\rangle = 1 + O(\mu)$ are both real, so that $c^{[0]} = i(1 + O(\mu))$, hence $\mathcal{M}^{[0]}(x;\varepsilon) = 2i\varepsilon(1 + O(\mu)) + O(\varepsilon^2)$. Therefore the conditions (3.2) give nontrivial results at least in such a case for n = 0; furthermore we shall see that one has $\mathcal{M}^{[n]}(x;\varepsilon) = \mathcal{M}^{[0]}(x;\varepsilon) + O(\varepsilon^2)$ for all n ≥ 1 and that the difference between two functions $\mathcal{M}^{[n+1]}(x;\varepsilon)$ and $\mathcal{M}^{[n]}(x;\varepsilon)$ tends exponentially to zero as n → ∞.

4. Graphical Representation and Tree Formalism

We look for a formal solution of (2.4) of the form

$$u(t) = \sum_{k=0}^{\infty}\varepsilon^k u^{(k)}(t) = \sum_{k=0}^{\infty}\varepsilon^k \sum_{\nu\in\mathbb{Z}^d} e^{i\nu\cdot\omega t} u^{(k)}_\nu; \tag{4.1}$$

by setting, for all k ≥ 0,

$$u^{(k)}_0 = c^{(k)}, \tag{4.2}$$

we can write

$$u^{(k)}(t) = c^{(k)} + U^{(k)}, \qquad \langle U^{(k)}\rangle = 0 \quad \forall k\ge 0. \tag{4.3}$$
By inserting (4.1) into (2.4) we obtain

$$\begin{aligned}
\dot u^{(0)} &= 0,\\
\dot u^{(1)} &= Q^{-1} + Q\,(u^{(0)})^2,\\
\dot u^{(k)} &= Q\sum_{k_1+k_2=k-1} u^{(k_1)}u^{(k_2)} \qquad \forall k\ge 2,
\end{aligned} \tag{4.4}$$
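which, expressed in Fourier space, becomes, for all ν ≠ 0,

$$\begin{aligned}
u^{(0)}_\nu &= 0,\\
(i\omega\cdot\nu)\,u^{(1)}_\nu &= (Q^{-1})_\nu + Q_\nu\,(c^{(0)})^2,\\
(i\omega\cdot\nu)\,u^{(k)}_\nu &= \sum_{k_1+k_2=k-1}\ \sum_{\nu_0+\nu_1+\nu_2=\nu} Q_{\nu_0}\,u^{(k_1)}_{\nu_1}u^{(k_2)}_{\nu_2} \qquad \forall k\ge 2,
\end{aligned} \tag{4.5}$$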
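provided that the right hand side of (4.4) has vanishing average. This requires

$$\begin{aligned}
0 &= (Q^{-1})_0 + Q_0\,(c^{(0)})^2,\\
0 &= \sum_{k_1+k_2=k-1}\ \sum_{\nu_0+\nu_1+\nu_2=0} Q_{\nu_0}\,u^{(k_1)}_{\nu_1}u^{(k_2)}_{\nu_2} \qquad \forall k\ge 2.
\end{aligned} \tag{4.6}$$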
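To make the first steps of this scheme concrete, here is a minimal Python sketch for a periodic field (d = 1, f0 = 0). The field f(t) = μ cos(ωt), the values of μ and ω, the number of sample points and the helper name `fourier_coeffs` are all assumptions chosen for illustration. The sketch computes the Fourier coefficients of Q and Q⁻¹ by uniform sampling, solves the first equation in (4.6) for c⁽⁰⁾ (choosing, arbitrarily, the root with positive imaginary part), and evaluates u⁽¹⁾_ν from the second line of (4.5).

```python
import cmath
import math

MU, OMEGA = 0.3, 1.0        # assumed amplitude and frequency of f(t) = MU*cos(OMEGA*t)
N_SAMPLES, N_MODES = 256, 6

def fourier_coeffs(func, n_modes=N_MODES, n_samples=N_SAMPLES):
    """Fourier coefficients g_nu (|nu| <= n_modes) of a 2*pi-periodic function of psi."""
    grid = [2 * math.pi * j / n_samples for j in range(n_samples)]
    values = [func(p) for p in grid]
    return {
        nu: sum(v * cmath.exp(-1j * nu * p) for v, p in zip(values, grid)) / n_samples
        for nu in range(-n_modes, n_modes + 1)
    }

def Q(psi):
    # Q = exp(2iF) with F(t) = (MU/OMEGA)*sin(OMEGA*t) and psi = OMEGA*t
    return cmath.exp(2j * (MU / OMEGA) * math.sin(psi))

Q_hat = fourier_coeffs(Q)
Qinv_hat = fourier_coeffs(lambda psi: 1 / Q(psi))

# First equation in (4.6): (c0)^2 = -(Q^{-1})_0 / Q_0; pick the root with Im c0 > 0.
c0 = cmath.sqrt(-Qinv_hat[0] / Q_hat[0])
if c0.imag < 0:
    c0 = -c0

# Second line of (4.5): (i*omega*nu) u1_nu = (Q^{-1})_nu + Q_nu * c0^2 for nu != 0.
u1 = {nu: (Qinv_hat[nu] + Q_hat[nu] * c0 ** 2) / (1j * OMEGA * nu)
      for nu in Q_hat if nu != 0}
```

For this particular field the averages of Q and Q⁻¹ coincide, so the two admissible values of c⁽⁰⁾ are ±i; higher orders would require the convolution sums in (4.5) and the compatibility conditions in (4.6), which the sketch does not implement.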
The first equation in (4.6) fixes $c^{(0)}$ to a value such that

$$(c^{(0)})^2 = -\frac{\langle Q^{-1}\rangle}{\langle Q\rangle}, \tag{4.7}$$

which is well defined and different from zero by the hypothesis (2.8). The second equation in (4.6) can be written as

$$0 = 2Q_0\,c^{(0)}c^{(k-1)} + \text{other terms depending on } c^{(0)},\ldots,c^{(k-2)}, \tag{4.8}$$
which allows us, in principle, to fix iteratively the coefficients $\{c^{(k)}\}_{k=1}^{\infty}$.

We can represent graphically the functions $u^{(0)}_\nu$ and $u^{(1)}_\nu$ as in Fig. 4.1. More generally we can represent $u^{(k)}_\nu$ for all k ≥ 0 as in Fig. 4.2, where the graphical representation has to be interpreted as in Fig. 4.1 when either k = 0 or k = 1, while it can be developed iteratively as shown in Fig. 4.3 when k ≥ 2 and ν ≠ 0. If ν = 0 one has $u^{(k)}_0 = c^{(k)}$, with $c^{(k)}$ to be recursively defined, as will be explained below. For instance when k = 2 and ν ≠ 0 we obtain the graphical representation of Fig. 4.4.

Fig. 4.1. Graphical representation of $u^{(0)}_\nu$ and $u^{(1)}_\nu$. The function $u^{(0)}_\nu$ is represented by a graph formed by a line and an endpoint (white bullet): we associate to the line a momentum ν = 0 and a propagator 1, and to the white bullet an order label k = 0, a mode label ν = 0 and a node factor $c^{(0)}$, so that one has $u^{(0)}_\nu = 0$ for ν ≠ 0, while $u^{(0)}_0 = c^{(0)}$. The function $u^{(1)}_\nu$, for ν ≠ 0, is represented by the sum of two graphs. We associate to the line with momentum ν a propagator 1/(iω·ν). In the first graph we associate to the endpoint (black bullet) a mode label ν and a node factor $(Q^{-1})_\nu$, while in the second graph we associate to the point (vertex) carrying the mode $\nu_0$ a node factor $Q_{\nu_0}$ and to the two white bullets order labels $k_1 = k_2 = 0$, modes $\nu_1 = \nu_2 = 0$ and node factors $c^{(0)}$. In the second graph one has the constraint $\nu = \nu_0 + \nu_1 + \nu_2$. The function $u^{(1)}_0$ is represented as $u^{(0)}_0$, with the only difference that now the white bullet carries an order label k = 1.

Fig. 4.2. Graphical representation of $u^{(k)}_\nu$. The line carries a momentum ν: we associate to it a propagator 1/(iω·ν) if ν ≠ 0, and a propagator 1 if ν = 0.

Fig. 4.3. Graphical representation of $u^{(k)}_\nu$ in terms of $u^{(k')}_{\nu'}$, with k′ < k, for ν ≠ 0. One has the constraints $\nu = \nu_0 + \nu_1 + \nu_2$ and $k = 1 + k_1 + k_2$. To the vertex with mode $\nu_0$ we associate a node factor $Q_{\nu_0}$.

Therefore we can see that, iterating the graphical procedure described above, we can give a graphical representation of $u^{(k)}_\nu$, for all $k\in\mathbb{Z}_+$ and for all $\nu\in\mathbb{Z}^d$, in terms of trees.

A tree θ is a connected set of points and lines such that the lines are oriented toward a point which is called the root of the tree. We call nodes all the points of the tree other than the root. The orientation induces a partial ordering relation between the nodes (and the lines), which we shall denote by ≼: given two nodes v and w we shall write w ≼ v if v is along the path (of lines) connecting w to the root, and w = v means that the two nodes coincide. We shall be interested only in trees such that for all nodes there are only either two or zero entering lines (keep in mind Fig. 4.4 as an example). Note that for the root there is by construction one and only one line entering it: we shall call such a line the root line.
Fig. 4.4. Graphical representation of $u^{(2)}_\nu$ for ν ≠ 0. The symbols and the labels have the same meaning as in the previous Figures. One has the constraints $\nu = \nu_0 + \nu_1 + \nu_2$ in all graphs, $\nu_1 = \nu_0' + \nu_1' + \nu_2'$ in the second graph, and $\nu_2 = \nu_0' + \nu_1' + \nu_2'$ in the fourth graph. The lines coming out from the white endpoints must have momentum ν = 0; to each white endpoint we associate a node factor $c^{(k)}$ if k = 0, 1 is the order label of the endpoint, to each black endpoint we associate a node factor $(Q^{-1})_\nu$ if ν is the momentum of the line coming out from it (which is equal to the mode of the endpoint itself), and to each node v which is not an endpoint (vertex) we associate a mode $\nu_v$ and a node factor $Q_{\nu_v}$.
Given a tree θ let us distinguish between the set E(θ) of nodes such that no lines enter them and the set V(θ) of nodes such that there is at least a line (hence two lines) entering them: we call endpoints the first ones and vertices the latter. Graphically the endpoints will be depicted as bullets which can be black or white (see for instance Fig. 4): we denote by $E_B(\theta)$ and $E_W(\theta)$ the two sets, respectively. Define W(θ) as the set of the endpoints represented by white bullets, and B(θ) as the set of vertices and of endpoints represented by black bullets; of course one has $W(\theta) = E_W(\theta)$ and $B(\theta) = E_B(\theta)\cup V(\theta)$. To each vertex v ∈ V(θ) we associate a mode label $\nu_v\in\mathbb{Z}^d$ and a node factor $F_v = Q_{\nu_v}$, to each endpoint $v\in E_B(\theta)$ we associate a mode label $\nu_v\in\mathbb{Z}^d$ and a node factor $F_v = (Q^{-1})_{\nu_v}$, and to each endpoint $v\in E_W(\theta)$ we associate a mode label $\nu_v = 0$, an order label $k_v\in\mathbb{Z}_+$ and a node factor $F_v = c^{(k_v)}$. Define L(θ) as the set of lines in θ. Each line ℓ comes out from a point and enters another point; if we denote by v the first one we shall denote by v′ the latter, and we shall call it the "point immediately following v"; as the line is uniquely identified by the node v we shall also write $\ell = \ell_v$.
To each line ℓ we associate a momentum label $\nu_\ell\in\mathbb{Z}^d$ and a propagator which is $g_\ell = 1/(i\omega\cdot\nu_\ell)$ if $\nu_\ell\neq 0$ and $g_\ell = 1$ if $\nu_\ell = 0$. If the line ℓ comes out from a node v ∈ W(θ) one has necessarily $\nu_\ell = 0$, while if the line ℓ comes out from a node v ∈ B(θ) all values of $\nu_\ell$ (except 0) are possible. We say that $\nu_\ell$ "flows" through the line ℓ. The modes and the momenta are related by the following relation: if $\ell = \ell_v$ and ℓ′ and ℓ″ are the lines entering v one has

$$\nu_\ell = \nu_v + \nu_{\ell'} + \nu_{\ell''} = \sum_{\substack{w\in B(\theta)\\ w\preceq v}} \nu_w, \tag{4.9}$$

which represents a sort of conservation law.

We call equivalent two trees which can be transformed into each other by continuously deforming the lines in such a way that the latter do not cross each other. Finally we define $T_{k,\nu}$ as the set of inequivalent trees θ such that

(1) for each vertex v ∈ V(θ) there are exactly two entering lines;
(2) the endpoints v ∈ E(θ) can be either white or black;
(3) the number of black endpoints, the number of vertices and the order labels of the white endpoints are such that, by setting $|B(\theta)| = |E_B(\theta)| + |V(\theta)| = k_1$ and $\sum_{v\in E_W(\theta)} k_v = k_2$, one has $k_1 + k_2 = k$;
(4) the momentum flowing through the line entering the root (root line) is ν.

We shall call $T_{k,\nu}$ the set of trees of order k and with total momentum ν. With the above notations we can write, for ν ≠ 0,

$$u^{(k)}_\nu = \sum_{\theta\in T_{k,\nu}} \mathrm{Val}(\theta), \qquad \mathrm{Val}(\theta) = \Big(\prod_{\ell\in L(\theta)} g_\ell\Big)\Big(\prod_{v\in E(\theta)\cup V(\theta)} F_v\Big), \tag{4.10}$$

where Val(θ) is called the value of the tree θ, and

$$F_v = \begin{cases} Q_{\nu_v}, & \text{if } v\in V(\theta),\\ (Q^{-1})_{\nu_v}, & \text{if } v\in E_B(\theta),\\ c^{(k_v)}, & \text{if } v\in E_W(\theta),\end{cases} \qquad g_\ell = \begin{cases} \dfrac{1}{i\omega\cdot\nu_\ell}, & \text{if } \nu_\ell\neq 0,\\ 1, & \text{if } \nu_\ell = 0,\end{cases} \tag{4.11}$$

while, for ν = 0, one can easily write the contribution $c^{(k)} = u^{(k)}_0$ of order k to the initial condition by imposing (4.8). This yields, for k ≥ 1,

$$c^{(k)} = -\frac{1}{2c^{(0)}\langle Q\rangle}\sum_{\theta\in T^{*}_{k+1,0}} \mathrm{Val}(\theta), \tag{4.12}$$

where $T^{*}_{k+1,0}$ is defined as $T_{k+1,0}$, with the constraint that one has to discard the two trees of the form represented in Fig. 4.5 such that the three represented lines carry vanishing momenta and the mode label associated to the represented vertex is zero. Note that in (4.12), as well as in (4.10), the values Val(θ) will depend on the node factors $c^{(k')}$, with k′ < k, so that (4.12) provides a recursive definition of the coefficients $c^{(k)}$. An example of a tree of order k = 25 is given in Fig. 4.6.
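The rules (4.9) to (4.11) translate directly into a small recursive data structure. The Python sketch below is only an illustration under simplifying assumptions: it takes d = 1 (integer mode labels), it receives the Fourier coefficients of Q and Q⁻¹ and the constants c⁽ᵏ⁾ as plain dictionaries, and the names `Node`, `momentum`, `order` and `value` are hypothetical. It evaluates a single given tree and does not enumerate the inequivalent trees of $T_{k,\nu}$.

```python
from dataclasses import dataclass, field
from math import prod

@dataclass
class Node:
    kind: str                  # "vertex", "black" or "white"
    mode: int = 0              # mode label nu_v (d = 1 here); 0 for white endpoints
    order: int = 0             # order label k_v, used by white endpoints only
    children: list = field(default_factory=list)   # two for a vertex, none otherwise

def momentum(node):
    """Momentum of the line exiting `node`: sum of the modes below it, as in (4.9)."""
    return node.mode + sum(momentum(child) for child in node.children)

def order(node):
    """Tree order: number of vertices and black endpoints plus the white order labels."""
    own = node.order if node.kind == "white" else 1
    return own + sum(order(child) for child in node.children)

def value(node, omega, Q_hat, Qinv_hat, c):
    """Val(theta) of (4.10) for the tree whose root line exits `node`."""
    nu = momentum(node)
    propagator = 1.0 if nu == 0 else 1.0 / (1j * omega * nu)
    if node.kind == "vertex":
        factor = Q_hat.get(node.mode, 0.0)
    elif node.kind == "black":
        factor = Qinv_hat.get(node.mode, 0.0)
    else:
        factor = c[node.order]
    return propagator * factor * prod(
        value(child, omega, Q_hat, Qinv_hat, c) for child in node.children)

# The two trees of order 1 with total momentum nu = 1 (cf. Fig. 4.1), with toy data
# that is not derived from any particular field f.
Q_hat, Qinv_hat, c = {1: 0.5}, {1: -0.5}, {0: 1j}
tree_a = Node("black", mode=1)
tree_b = Node("vertex", mode=1, children=[Node("white"), Node("white")])
u1_1 = sum(value(t, 2.0, Q_hat, Qinv_hat, c) for t in (tree_a, tree_b))
```

Summing the two tree values reproduces the second line of (4.5) for ν = 1, namely $((Q^{-1})_1 + Q_1 (c^{(0)})^2)/(i\omega)$ with the toy numbers above.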
Fig. 4.5. Trees that do not appear in $T^{*}_{k+1,0}$: besides having $\nu_1 = \nu_2 = 0$, by definition of white endpoint, one requires also $\nu = \nu_0 = 0$, so that the value of both such trees is $Q_0c^{(0)}c^{(k)} = \langle Q\rangle c^{(0)}c^{(k)}$.

Fig. 4.6. An example of a tree with 11 vertices, 5 black endpoints and 7 white endpoints (with order labels 0, 0, 3, 0, 4, 0, 2). Unlike the order labels, the modes of the nodes and the momenta of the lines are not explicitly shown. The order of the tree is k = 11 + 5 + (3 + 2 + 4) = 25, while the total momentum ν is the momentum flowing through the root line, which is the leftmost one.
5. Multiscale Decomposition

In this section we introduce a multiscale decomposition of the propagators: with respect to [1] this will allow us to obtain better estimates on some contributions to the coefficients $u^{(k)}_\nu$. Moreover this will be the first step in order to prove Theorem 2 in Sect. 3. Let ψ(x) be a $C^\infty$ non-decreasing compact support function defined on $\mathbb{R}_+$ such that

$$\psi(x) = \begin{cases} 1, & \text{for } x\ge C_0,\\ 0, & \text{for } x\le C_0/2,\end{cases} \tag{5.1}$$
Fig. 5.1. Possible graphs of the $C^\infty$ compact support functions ψ(x) and χ(x).
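where $C_0$ is the Diophantine constant appearing in (2.6), and set χ(x) = 1 − ψ(x); see Fig. 5.1. Define also $\psi_n(x) = \psi(2^n x)$ and $\chi_n(x) = \chi(2^n x)$ for all n ≥ 0; of course $\psi_0 = \psi$ and $\chi_0 = \chi$. Then, for any line ℓ ∈ L(θ) with $\nu_\ell\neq 0$, set

$$g_\ell \equiv \frac{1}{i\omega\cdot\nu_\ell} = \frac{\psi_0(|\omega\cdot\nu_\ell|)}{i\omega\cdot\nu_\ell} + \sum_{n=1}^{\infty}\frac{\psi_n(|\omega\cdot\nu_\ell|)\,\chi_{n-1}(|\omega\cdot\nu_\ell|)}{i\omega\cdot\nu_\ell}, \tag{5.2}$$

which can be rewritten as

$$g_\ell = \sum_{n=0}^{\infty} g^{(n)}_\ell, \qquad g^{(0)}_\ell = \frac{\psi_0(|\omega\cdot\nu_\ell|)}{i\omega\cdot\nu_\ell}, \qquad g^{(n)}_\ell = \frac{\psi_n(|\omega\cdot\nu_\ell|)\,\chi_{n-1}(|\omega\cdot\nu_\ell|)}{i\omega\cdot\nu_\ell} \quad \forall n\ge 1. \tag{5.3}$$

We shall call $g^{(n)}_\ell \equiv g^{(n)}(\omega\cdot\nu_\ell)$ a propagator on scale n. We shall assign to each line ℓ ∈ L(θ) with $\nu_\ell\neq 0$ also a new label $n_\ell = 0, 1, 2, \ldots$, which will be called the scale label of the line ℓ; we can associate a scale label also to a line ℓ with $\nu_\ell = 0$, by setting $n_\ell = -1$. Then we shall define $\Theta_{k,\nu}$ as the set of trees which differ from those in $T_{k,\nu}$ just because of the newly introduced scale labels, so that (4.10) can be replaced with

$$u^{(k)}_\nu = \sum_{\theta\in\Theta_{k,\nu}} \mathrm{Val}(\theta), \qquad \mathrm{Val}(\theta) = \Big(\prod_{\ell\in L(\theta)} g^{(n_\ell)}_\ell\Big)\Big(\prod_{v\in E(\theta)\cup V(\theta)} F_v\Big), \tag{5.4}$$

and an expression analogous to (4.12) holds for $c^{(k)}$, provided that $T^{*}_{k+1,0}$ is replaced with $\Theta^{*}_{k+1,0}$, with obvious meaning of the symbols.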
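The decomposition (5.2) is a smooth partition of unity in the size of the divisor. As a hedged illustration, the Python sketch below builds one possible cutoff ψ (the specific formula based on exp(−1/t), the value C0 = 1, the truncation `n_max` and all function names are assumptions, since the text only prescribes the properties in (5.1)) and computes the weights $\psi_0$ and $\psi_n\chi_{n-1}$ and the corresponding scale-n propagators.

```python
import math

C0 = 1.0   # Diophantine constant of (2.6); the numerical value is an assumption

def _bump(t):
    return math.exp(-1.0 / t) if t > 0 else 0.0

def psi(x):
    """Smooth non-decreasing cutoff: 0 for x <= C0/2 and 1 for x >= C0, cf. (5.1)."""
    s = (x - C0 / 2) / (C0 / 2)
    return _bump(s) / (_bump(s) + _bump(1 - s))

def chi(x):
    return 1.0 - psi(x)

def scale_weights(x, n_max=64):
    """Nonzero weights psi_0(|x|) and psi_n(|x|)*chi_{n-1}(|x|), n = 1..n_max, as in (5.2)."""
    a = abs(x)
    weights = {0: psi(a)}
    for n in range(1, n_max + 1):
        weights[n] = psi(2 ** n * a) * chi(2 ** (n - 1) * a)
    return {n: w for n, w in weights.items() if w != 0.0}

def propagators(x, n_max=64):
    """Scale-n propagators g^(n)(x) of (5.3) for x != 0."""
    return {n: w / (1j * x) for n, w in scale_weights(x, n_max).items()}
```

For any x with |x| larger than about `2**(-n_max) * C0` the weights sum to 1, so `sum(propagators(x).values())` reproduces 1/(ix) up to rounding, and only the one or two scales n with |x| of size $2^{-n}C_0$ contribute.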
Note that, for fixed x = ω·ν, one can have $g^{(n)}(x)\neq 0$ for at most two values of n, so that the series (5.3) is in fact a finite sum. Note also that $g^{(n)}(x)\neq 0$ only if $2^{-n-1}C_0 < |x| < 2^{-n+1}C_0$ for n ≥ 1 and only if $|x| > 2^{-1}C_0$ for n = 0. This means that for any line ℓ on scale n such that $g^{(n)}_\ell\neq 0$ we can bound $|g^{(n)}_\ell| \le C_0^{-1}2^{n+1}$. Hence, if $N_n(\theta)$ denotes the number of lines on scale n in θ and $|\nu(\theta)| = \sum_{v\in B(\theta)}|\nu_v|$, we can bound, for each tree $\theta\in\Theta_{k,\nu}$ and for any integer $n_0$,

$$|\mathrm{Val}(\theta)| \le C_0^{-k_1}2^{k_1}2^{n_0k_1}Q^{k_1}e^{-\kappa|\nu(\theta)|}\Big(\prod_{v\in E_W(\theta)}|c^{(k_v)}|\Big)\Big(\prod_{n=n_0}^{\infty}2^{nN_n(\theta)}\Big), \tag{5.5}$$

where $k_1 = |B(\theta)|$ and $\sum_{v\in E_W(\theta)}k_v = k_2 = k - k_1$.

A cluster T on scale n is a maximal set of points and lines connecting them such that all the lines have scales n′ ≤ n and there is at least one line with scale n. The $m_T\ge 0$ lines entering the cluster T and the possible line coming out from it (unique if existing at all) are called the external lines of the cluster T. Given a cluster T on scale n, we shall denote by $n_T = n$ the scale of the cluster. Given a cluster T in a tree θ call V(T), E(T), $E_W(T)$, $E_B(T)$, B(T), and L(T) the set of vertices, of endpoints, of white endpoints, of black endpoints, of vertices plus black endpoints, and of lines of T, respectively. Let us define also $\nu_T = \sum_{v\in B(T)}\nu_v$.

We call a self-energy graph of a tree θ any cluster T such that (1) T has only one entering line $\ell^2_T$ and one exiting line $\ell^1_T$, (2) one has

$$\nu_T \equiv \sum_{v\in B(T)}\nu_v = 0. \tag{5.6}$$

We say that the line $\ell^1_T$ exiting a self-energy graph T is a self-energy line; we call a normal line any line of the tree which is not a self-energy line. Note that the two external lines of a self-energy graph do not necessarily have the same scale: if $n_1$ and $n_2$ are the scales of the lines $\ell^1_T$ and $\ell^2_T$, and $n = n_T$ is the scale of the self-energy graph as a cluster, one must have $n + 1\le\min\{n_1, n_2\}$. An example of a tree with self-energy graphs is depicted in Fig. 5.2; one can immediately realize that because of the presence of self-energy graphs one can have accumulation of small divisors.

It is not difficult to prove (see for instance [11]) that, if we denote by $N^{\rm norm}_n(\theta)$ the number of normal lines on scale n in θ, then there exists a positive constant c such that

$$N^{\rm norm}_n(\theta) \le c\,2^{-n/\tau}\sum_{v\in B(\theta)}|\nu_v|, \tag{5.7}$$

so that, if we could neglect the self-energy lines, i.e. if we could replace $N_n(\theta)$ with $N^{\rm norm}_n(\theta)$ in (5.5), we would obtain, for some constant C,

$$\begin{aligned}
|\mathrm{Val}(\theta)| &\le C_0^{-k_1}2^{2k_1}2^{n_0k_1}Q^{k_1}e^{-\kappa|\nu(\theta)|}\Big(\prod_{v\in E_W(\theta)}|c^{(k_v)}|\Big)\exp\Big(|\nu(\theta)|\,c\log 2\sum_{n=n_0}^{\infty}n\,2^{-n/\tau}\Big)\\
&\le C^{k_1}e^{-\kappa'|\nu(\theta)|}\prod_{v\in E_W(\theta)}|c^{(k_v)}|,
\end{aligned} \tag{5.8}$$

for some κ′ < κ and $n_0$ suitably chosen (see [14] or [11]).
Fig. 5.2. Example of a tree containing self-energy graphs. All the lines contained inside a self-energy graph have scales strictly less than the scales of the external lines of the self-energy graph. One can have self-energy graphs containing other self-energy graphs on lower scales; in the Figure one has a self-energy graph with external lines carrying a momentum ν′ contained inside a self-energy graph with external lines carrying a momentum ν: the first one will be on a scale n′ ≤ n − 1 if n is the scale of the latter.
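Unfortunately the bound (5.8) is in general false, as it can apply only to trees without self-energy graphs. Therefore, as we are going to show, the above analysis is sufficient to prove Theorem 1, but not to prove Theorem 2. Indeed, as we cannot use the bound (5.7) for all trees, all we can do in general for a tree $\theta\in\Theta_{k,\nu}$ is to estimate the product of propagators in (5.4) by

$$\prod_{\ell\in L(\theta)}\big|g^{(n_\ell)}_\ell\big| \le C_0^{-k_1}\prod_{\ell\in L(\theta)}|\nu_\ell|^{\tau}, \tag{5.9}$$

so that, by writing

$$\prod_{v\in B(\theta)}e^{-\kappa|\nu_v|} \le \Big(\prod_{v\in B(\theta)}e^{-\kappa|\nu_v|/2}\Big)\Big(\prod_{\ell\in L(\theta)}e^{-\kappa|\nu_\ell|/2k_1}\Big), \tag{5.10}$$

one obtains for each line ℓ ∈ L(θ) the bound $|\nu_\ell|^{\tau}e^{-\kappa|\nu_\ell|/2k_1}\le\tau!\,(2k_1/\kappa)^{\tau}$. Therefore we can bound

$$\Big(\prod_{\ell\in L(\theta)}\big|g^{(n_\ell)}_\ell\big|\Big)\Big(\prod_{\ell\in L(\theta)}e^{-\kappa|\nu_\ell|/2k_1}\Big) \le C_0^{-k_1}\,\tau!^{\,k_1}\Big(\frac{2k_1}{\kappa}\Big)^{k_1\tau}, \tag{5.11}$$

and for all trees in $\Theta_{k,\nu}$ we have

$$|\mathrm{Val}(\theta)| \le C_1^{k_1}(k_1!)^{\alpha}\Big(\prod_{v\in B(\theta)}e^{-\kappa|\nu_v|/2}\Big)\Big(\prod_{v\in E_W(\theta)}|c^{(k_v)}|\Big), \tag{5.12}$$

for two positive constants $C_1$ and α. Therefore, by using that the number of trees of fixed order and fixed mode labels is bounded by $C_2^k$ for some positive constant $C_2$ (taking into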
account the number of shapes of trees and the number of ways of assigning the scale labels in such a way that the corresponding tree value is not vanishing) and expanding each $c^{(k_v)}$ in terms of trees according to (4.12), as in [11], we obtain at the end, for suitable positive constants κ′ < κ and $C_3$, a bound $|u^{(k)}_\nu|\le e^{-\kappa'|\nu|}C_3^k\,k!^{\alpha}$, which reproduces the result in [1]. The only case in which we obtain bounds containing no factorial, and hence can deduce the convergence of the perturbative series, is the case d = 1, where there are no small divisors (one can bound |ω·ν| = |ων| ≥ |ω|), as was already pointed out in [2].

6. Renormalized Expansion

To prove Theorem 2 we need a different tree expansion, which we develop starting from the present section. We shall define new propagators $g^{[n_\ell]}_\ell$ iteratively. First some notations are needed. Suppose that the node factors $c^{[k_v]}$ and the propagators $g^{[n_\ell]}_\ell$ are assigned. Given a self-energy graph T which does not contain any other self-energy graphs, define the self-energy value as

$$V_T(\omega\cdot\nu;\varepsilon) = \varepsilon^{k_T}\Big(\prod_{\ell\in L(T)}g^{[n_\ell]}_\ell\Big)\Big(\prod_{v\in E(T)\cup V(T)}F_v\Big), \tag{6.1}$$
where $F_v$ is defined as in (4.11) except for the white endpoints for which one has $F_v = c^{[k_v]}$, and $k_T = |B(T)| + \sum_{v\in E_W(T)}k_v\ge 1$ is called the self-energy order and represents the number of vertices and black endpoints in T plus the sum of the orders of the white endpoints in T; of course $V_T(\omega\cdot\nu;\varepsilon)$ depends on ω·ν through the propagators of the lines in L(T).

Define $\Theta^{R}_{k,\nu}$ as the set of trees which do not contain any self-energy graphs (renormalized trees), and $S^{R}_{k,n}$ as the set of self-energy graphs of order k which do not contain any other self-energy graph and such that the maximum of the scales of the lines in T is exactly n (renormalized self-energy graphs on scale n). Then we can define the renormalized propagators $g^{[n]}_\ell\equiv g^{[n]}(\omega\cdot\nu_\ell;\varepsilon)$ and the quantities $M^{[n]}(\omega\cdot\nu_\ell;\varepsilon)$ recursively as follows. We set

$$\begin{aligned}
g^{[-1]}(x;\varepsilon) &= 1, & M^{[-1]}(x;\varepsilon) &= 0,\\
g^{[0]}(x;\varepsilon) &= \frac{\psi_0(|x|)}{ix}, & M^{[0]}(x;\varepsilon) &= 2\varepsilon Q_0c^{[0]} + \sum_{k=1}^{\infty}\sum_{T\in S^{R}_{k,0}}V_T(x;\varepsilon),
\end{aligned} \tag{6.2}$$

with $c^{[0]} = c^{(0)}$, as given by (4.7), while, for n ≥ 1, we define

$$\begin{aligned}
g^{[n]}(x;\varepsilon) &= \frac{\chi_0(|x|)\,\chi_1(|ix-\mathcal{M}^{[0]}(x;\varepsilon)|)\cdots\chi_{n-1}(|ix-\mathcal{M}^{[n-2]}(x;\varepsilon)|)\,\psi_n(|ix-\mathcal{M}^{[n-1]}(x;\varepsilon)|)}{ix-\mathcal{M}^{[n-1]}(x;\varepsilon)},\\
\mathcal{M}^{[n]}(x;\varepsilon) &= \mathcal{M}^{[n-1]}(x;\varepsilon) + M^{[n]}(x;\varepsilon)\,\chi_0(|x|)\,\chi_1(|ix-\mathcal{M}^{[0]}(x;\varepsilon)|)\cdots\chi_n(|ix-\mathcal{M}^{[n-1]}(x;\varepsilon)|)\\
&= \sum_{j=0}^{n}M^{[j]}(x;\varepsilon)\,\chi_0(|x|)\,\chi_1(|ix-\mathcal{M}^{[0]}(x;\varepsilon)|)\cdots\chi_j(|ix-\mathcal{M}^{[j-1]}(x;\varepsilon)|),\\
M^{[n]}(x;\varepsilon) &= \sum_{k=1}^{\infty}\sum_{T\in S^{R}_{k,n}}V_T(x;\varepsilon),
\end{aligned} \tag{6.3}$$

where $V_T(x;\varepsilon)$ is defined as in (6.1). Note that for all n ≥ 0 and for all $T\in S^{R}_{k,n}$ one has $k_T\ge 2$, so that $M^{[0]}(x;\varepsilon) = 2\varepsilon Q_0c^{[0]} + O(\varepsilon^2)$ and $M^{[n]}(x;\varepsilon) = O(\varepsilon^2)$ for n ≥ 1. For instance one has
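$$\begin{aligned}
g^{[1]}(x;\varepsilon) &= \frac{\chi_0(|x|)\,\psi_1(|ix-\mathcal{M}^{[0]}(x;\varepsilon)|)}{ix-\mathcal{M}^{[0]}(x;\varepsilon)},\\
g^{[2]}(x;\varepsilon) &= \frac{\chi_0(|x|)\,\chi_1(|ix-\mathcal{M}^{[0]}(x;\varepsilon)|)\,\psi_2(|ix-\mathcal{M}^{[1]}(x;\varepsilon)|)}{ix-\mathcal{M}^{[1]}(x;\varepsilon)},
\end{aligned} \tag{6.4}$$

with

$$\begin{aligned}
\mathcal{M}^{[0]}(x;\varepsilon) &= M^{[0]}(x;\varepsilon)\,\chi_0(|x|),\\
\mathcal{M}^{[1]}(x;\varepsilon) &= \mathcal{M}^{[0]}(x;\varepsilon) + M^{[1]}(x;\varepsilon)\,\chi_0(|x|)\,\chi_1(|ix-\mathcal{M}^{[0]}(x;\varepsilon)|),
\end{aligned} \tag{6.5}$$

and so on. Note that if a line ℓ is on scale n and, by setting $x = \omega\cdot\nu_\ell$, one has $g^{[n]}(x;\varepsilon)\neq 0$, this requires $\chi_0(|x|)\neq 0$, $\chi_1(|ix-\mathcal{M}^{[0]}(x;\varepsilon)|)\neq 0$, . . . , $\chi_{n-1}(|ix-\mathcal{M}^{[n-2]}(x;\varepsilon)|)\neq 0$ and $\psi_n(|ix-\mathcal{M}^{[n-1]}(x;\varepsilon)|)\neq 0$, which means

$$\begin{aligned}
|x| &\le C_0,\\
|ix-\mathcal{M}^{[0]}(x;\varepsilon)| &\le 2^{-1}C_0,\\
|ix-\mathcal{M}^{[1]}(x;\varepsilon)| &\le 2^{-2}C_0,\\
&\ \ \vdots\\
|ix-\mathcal{M}^{[n-2]}(x;\varepsilon)| &\le 2^{-(n-1)}C_0,\\
|ix-\mathcal{M}^{[n-1]}(x;\varepsilon)| &\ge 2^{-(n+1)}C_0,
\end{aligned} \tag{6.6}$$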
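so that, in particular, if a line ℓ is on scale n, then one has $|g^{[n]}_\ell|\le C_0^{-1}2^{n+1}$. Then we define, formally, for ν ≠ 0,

$$u^{[k]}_\nu = \sum_{\theta\in\Theta^{R}_{k,\nu}}\mathrm{Val}(\theta), \qquad \mathrm{Val}(\theta) = \Big(\prod_{\ell\in L(\theta)}g^{[n_\ell]}_\ell\Big)\Big(\prod_{v\in E(\theta)\cup V(\theta)}F_v\Big), \tag{6.7}$$

while, for ν = 0, one has

$$c^{[k]} = -\frac{1}{2c^{[0]}\langle Q\rangle}\sum_{\theta\in\Theta^{R*}_{k+1,0}}\mathrm{Val}(\theta), \tag{6.8}$$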
where $\Theta^{R*}_{k+1,0}$ is defined as $\Theta^{*}_{k+1,0}$, after (4.12) and (5.4), with the only difference that one has to consider renormalized trees instead of trees; of course this provides a recursive definition of the coefficients $c^{[k]}$, as both (6.7) and (6.8) depend on the values $c^{[k']}$ with k′ < k. Then we shall write

$$u(t) = \sum_{k=1}^{\infty}\varepsilon^k u^{[k]}(t) = \sum_{k=1}^{\infty}\varepsilon^k\sum_{\nu\in\mathbb{Z}^d}e^{i\nu\cdot\omega t}u^{[k]}_\nu, \tag{6.9}$$

where the coefficients $u^{[k]}_\nu$ are defined through (6.7) and depend on ε (as the propagators do); note that the order k of a renormalized tree θ is still defined as $k = |B(\theta)| + \sum_{v\in E_W(\theta)}k_v$, but it does not correspond anymore to the perturbative order.

Fix ε such that one has

$$\left|i\omega\cdot\nu - \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon)\right| \ge C_0|\nu|^{-\tau_1} \qquad \forall\nu\in\mathbb{Z}^d\setminus\{0\} \text{ and } \forall n\ge -1, \tag{6.10}$$

with Diophantine constants $C_0$ and $\tau_1$, where $C_0$ is the same as in (2.6), while $\tau_1 > \tau$ is to be fixed later. We call E the set of ε for which the Diophantine conditions (6.10) are satisfied. We shall see in the next section that for ε ∈ E we are able to give a meaning to the (so far formal) renormalized expansion (6.9), and then we shall prove that the set E has positive Lebesgue measure.

7. Convergence of the Renormalized Expansion

Now we study the renormalized expansion introduced in Sect. 6. First we show that if the propagators satisfy the Diophantine conditions (6.10), with the functions $\mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon)$ well defined, smooth enough and small enough together with their derivatives, then a bound like $|u^{[k]}_\nu|\le C^k e^{-\kappa'|\nu|}$ follows for suitable constants C and κ′. By recalling the discussion in Sect. 5, we realize that it is sufficient to obtain a bound on the number of lines on fixed scale like (5.7): this will be the content of Lemma 1 below. Then we prove inductively that the conditions on the functions $\mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon)$ are satisfied, provided that we exclude some values of the perturbative parameter ε: this will be done in Lemma 2. The admissible values of ε are exactly the ones for which the Diophantine conditions (6.10) are satisfied. So we are left with the problem of studying how many values of ε are left, i.e. how large the set E of admissible values of ε is: through Lemma 3 and Lemma 4 we shall verify that E is a set with positive, relatively large measure.

Lemma 1. Assume that the set E has non-zero measure and that for all ε ∈ E the functions $\mathcal{M}^{[n]}(x;\varepsilon)$ are $C^1$ in x and satisfy the bounds

$$\left|\mathcal{M}^{[n]}(x;\varepsilon)\right| \le D|\varepsilon|, \qquad \left|\partial_x\mathcal{M}^{[n]}(x;\varepsilon)\right| \le D|\varepsilon|, \tag{7.1}$$

for some constant D. Then for any renormalized tree θ such that Val(θ) ≠ 0 the number $N_n(\theta)$ of lines on scale n satisfies the bound

$$N_n(\theta) \le c\,2^{-n/\tau_1}\sum_{v\in B(\theta)}|\nu_v|, \tag{7.2}$$

for a suitable positive constant c.
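Proof. We prove inductively on the order $k$ of the renormalized trees the bound
\[ N_n^*(\theta) \le \max\{0,\ 2|\nu(\theta)|\,2^{(3-n)/\tau_1} - 1\}, \tag{7.3} \]
where $|\nu(\theta)| \equiv \sum_{v \in B(\theta)} |\nu_v|$ and $N_n^*(\theta)$ is the number of lines in $L(\theta)$ on scale $n' \ge n$.

If $\theta$ has $k = 1$ one has $B(\theta) = \{v\}$ and $|\nu(\theta)| = |\nu_v|$. In order that the line coming out from $v$ be on scale $\ge n$ one must have $|i\omega\cdot\nu_v - \mathcal{M}^{[n-2]}(\omega\cdot\nu_v;\varepsilon)| \le 2^{-n+1} C_0$ (see (6.6)), hence, by the Diophantine conditions (6.10), $|\nu_v| \ge 2^{(n-1)/\tau_1}$, which implies $2|\nu(\theta)|\,2^{(3-n)/\tau_1} \ge 2 \cdot 2^{2/\tau_1} \ge 2$. Therefore in such a case the bound (7.3) is trivially satisfied.

If $\theta$ is a renormalized tree of order $k > 1$, we assume that the bound holds for all renormalized trees of order $k' < k$. Define $E_n = (2 \cdot 2^{(3-n)/\tau_1})^{-1}$: so we have to prove that $N_n^*(\theta) \le \max\{0, |\nu(\theta)| E_n^{-1} - 1\}$. Call $\ell$ the root line of $\theta$ and $\ell_1, \ldots, \ell_m$ the $m \ge 0$ lines on scale $\ge n$ which are the closest to $\ell$ (i.e. such that no other line along the paths connecting the lines $\ell_1, \ldots, \ell_m$ to the root line is on scale $\ge n$).

If the root line $\ell$ of $\theta$ is on scale $n_\ell < n$, then
\[ N_n^*(\theta) = \sum_{i=1}^{m} N_n^*(\theta_i), \tag{7.4} \]
where $\theta_i$ is the renormalized subtree with $\ell_i$ as root line, hence the bound follows by the inductive hypothesis.

If the root line $\ell$ has scale $\ge n$, then $\ell_1, \ldots, \ell_m$ are the entering lines of a cluster $T$. By denoting again with $\theta_i$ the renormalized subtree having $\ell_i$ as root line, one has
\[ N_n^*(\theta) = 1 + \sum_{i=1}^{m} N_n^*(\theta_i), \tag{7.5} \]
so that the bound becomes trivial if either $m = 0$ or $m \ge 2$. If $m = 1$ then one has a cluster $T$ with two external lines $\ell$ and $\ell_1$, which are both on scales $\ge n$; then
\[ \left| i\omega\cdot\nu_\ell - \mathcal{M}^{[n-2]}(\omega\cdot\nu_\ell;\varepsilon) \right| \le 2^{-n+1} C_0, \qquad \left| i\omega\cdot\nu_{\ell_1} - \mathcal{M}^{[n-2]}(\omega\cdot\nu_{\ell_1};\varepsilon) \right| \le 2^{-n+1} C_0, \tag{7.6} \]
and $\nu_\ell \ne \nu_{\ell_1}$, otherwise $T$ would be a self-energy graph. Then, by (7.6), one has
\begin{align*}
2^{-n+2} C_0 &\ge \left| i\omega\cdot(\nu_\ell - \nu_{\ell_1}) - \mathcal{M}^{[n-2]}(\omega\cdot\nu_\ell;\varepsilon) + \mathcal{M}^{[n-2]}(\omega\cdot\nu_{\ell_1};\varepsilon) \right| \\
&= \left| i\omega\cdot(\nu_\ell - \nu_{\ell_1}) + \partial_x \mathcal{M}^{[n-2]}(x_*;\varepsilon)\,(\omega\cdot\nu_{\ell_1} - \omega\cdot\nu_\ell) \right| \\
&\ge |\omega\cdot(\nu_\ell - \nu_{\ell_1})| - D|\varepsilon|\,|\omega\cdot(\nu_\ell - \nu_{\ell_1})| \ge \frac{1}{2} |\omega\cdot(\nu_\ell - \nu_{\ell_1})|, \tag{7.7}
\end{align*}
where $x_*$ is a point between $\omega\cdot\nu_\ell$ and $\omega\cdot\nu_{\ell_1}$, and (7.1) has been used. By the Diophantine conditions (6.10), one has $|\nu_\ell - \nu_{\ell_1}| > 2^{(n-3)/\tau_1}$, so that
\[ \sum_{v \in B(T)} |\nu_v| \ge |\nu_T| = |\nu_\ell - \nu_{\ell_1}| > 2^{(n-3)/\tau_1} > E_n, \tag{7.8} \]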
hence $|\nu(\theta)| - |\nu(\theta_1)| > E_n$, which, inserted into (7.5) with $m = 1$, gives, by using the inductive hypothesis,
\[ N_n^*(\theta) = 1 + N_n^*(\theta_1) \le 1 + |\nu(\theta_1)| E_n^{-1} - 1 \le 1 + \bigl( |\nu(\theta)| - E_n \bigr) E_n^{-1} - 1 \le |\nu(\theta)| E_n^{-1} - 1, \tag{7.9} \]
hence the bound is proved also if the root line is on scale $\ge n$.
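The arithmetic behind the base case and the $m = 1$ induction step of the proof above can be checked numerically. The following is only a minimal sketch: the function names are hypothetical, and the sample values of $n$, $\tau_1$ and $|\nu(\theta)|$ are arbitrary illustrative choices, not values taken from the paper.

```python
def e_n(n, tau1):
    """E_n = (2 * 2^{(3-n)/tau1})^{-1}, as defined in the proof of Lemma 1."""
    return 1.0 / (2.0 * 2.0 ** ((3 - n) / tau1))


def bound_7_3(nu_norm, n, tau1):
    """Right-hand side of (7.3): max{0, |nu(theta)| / E_n - 1}."""
    return max(0.0, nu_norm / e_n(n, tau1) - 1.0)


def base_case_ok(n, tau1):
    """Order k = 1: a single line on scale >= n forces |nu_v| >= 2^{(n-1)/tau1},
    and the bound must then allow at least one line."""
    nu_min = 2.0 ** ((n - 1) / tau1)
    return bound_7_3(nu_min, n, tau1) >= 1.0


def induction_step_ok(nu_theta, nu_theta1, n, tau1):
    """Case m = 1: given |nu(theta)| - |nu(theta_1)| > E_n as in (7.8),
    check 1 + bound(theta_1) <= bound(theta) as in (7.9).
    The subtree theta_1 is assumed to satisfy |nu(theta_1)| >= 2 E_n,
    i.e. its own bound allows at least one line on scale >= n."""
    en = e_n(n, tau1)
    assert nu_theta - nu_theta1 > en and nu_theta1 >= 2.0 * en
    return 1.0 + bound_7_3(nu_theta1, n, tau1) <= bound_7_3(nu_theta, n, tau1) + 1e-12
```

For instance, `base_case_ok(7, 3.0)` and `induction_step_ok(5.0, 4.0, 5, 3.0)` both evaluate to `True`, reflecting that the gap $E_n$ in the mode sum pays for exactly one extra line.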
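Lemma 2. For $\varepsilon \in E$ and for $x$ such that $g^{[n]}(x;\varepsilon) \ne 0$, there exist two constants $D$ and $D'$ such that the functions $\mathcal{M}^{[j]}(x;\varepsilon)$ are smooth functions of $x$ and satisfy the bounds
\[ \left| \mathcal{M}^{[j]}(x;\varepsilon) \right| \le D|\varepsilon|, \qquad \left| \partial_x \mathcal{M}^{[j]}(x;\varepsilon) \right| \le D|\varepsilon|, \qquad \left| \mathcal{M}^{[j]}(x;\varepsilon) - \mathcal{M}^{[j-1]}(x;\varepsilon) \right| \le D|\varepsilon|\, e^{-D' 2^{j/\tau_1}}, \tag{7.10} \]
for all $0 \le j \le n - 1$.

Proof. The proof is by induction on $j$. For $j = 0$ the bounds (7.10) are trivially satisfied; then, assuming that the bounds hold for all $j' < j$, for some $j \le n - 1$, we want to show that they follow also for $j$. The quantity $M^{[j]}(x;\varepsilon)$ is given by
\[ M^{[j]}(x;\varepsilon) = \sum_{k=1}^{\infty} \sum_{T \in \mathcal{S}^{\mathcal{R}}_{k,j}} V_T(x;\varepsilon), \tag{7.11} \]
with $x$ satisfying the bounds (6.6) by hypothesis; in particular one has $|ix - \mathcal{M}^{[j-2]}(x;\varepsilon)| < 2^{-j+1} C_0$.

We want to show, by reductio ad absurdum, that for all $T \in \mathcal{S}^{\mathcal{R}}_{k,j}$ contributing to $M^{[j]}(x;\varepsilon)$ through the self-energy value $V_T(x;\varepsilon)$, one must have
\[ \sum_{v \in B(T)} |\nu_v| > 2^{(j-4)/\tau_1}. \tag{7.12} \]
By construction all renormalized self-energy graphs in $\mathcal{S}^{\mathcal{R}}_{k,j}$ must contain at least one line $\ell$ on scale $n_\ell = j$. Therefore for such a line one has (see (6.6))
\[ \left| i\omega\cdot\nu_\ell - \mathcal{M}^{[j-2]}(\omega\cdot\nu_\ell;\varepsilon) \right| \le 2^{-j+1} C_0. \tag{7.13} \]
Furthermore, by the inductive hypothesis, the quantity $\mathcal{M}^{[j-2]}(x;\varepsilon)$ is smooth in $x$, so that one can write
\[ \mathcal{M}^{[j-2]}(\omega\cdot\nu_\ell;\varepsilon) = \mathcal{M}^{[j-2]}(\omega\cdot\nu;\varepsilon) + \partial_x \mathcal{M}^{[j-2]}(x_*;\varepsilon)\, \omega\cdot(\nu_\ell - \nu), \tag{7.14} \]
where $x_*$ is a point between $\omega\cdot\nu$ and $\omega\cdot\nu_\ell$. We can write $\nu_\ell = \nu_\ell^0 + \sigma_\ell \nu$, where, if we write as usual $\ell = \ell_v$,
\[ \nu_\ell^0 = \sum_{\substack{w \in B(T) \\ w \preceq v}} \nu_w, \tag{7.15} \]
and $\sigma_\ell = 1$ if the line entering $T$ is comparable with $\ell$ and $\sigma_\ell = 0$ otherwise.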
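Note that if (7.12) does not hold then one has $|i\omega\cdot\nu_\ell^0 - \mathcal{M}^{[n]}(\omega\cdot\nu_\ell^0;\varepsilon)| \ge 2^{4-j} C_0$ for all $n \ge -1$, by the Diophantine conditions (6.10). Then if $\sigma_\ell = 0$ one has $\nu_\ell = \nu_\ell^0$, hence $|i\omega\cdot\nu_\ell - \mathcal{M}^{[j-2]}(\omega\cdot\nu_\ell;\varepsilon)| \ge 2^{4-j} C_0$, while if $\sigma_\ell = 1$ one has, by using the inductive hypothesis,
\begin{align*}
\left| i\omega\cdot\nu_\ell - \mathcal{M}^{[j-2]}(\omega\cdot\nu_\ell;\varepsilon) \right|
&= \left| i\omega\cdot\nu_\ell^0 + i\omega\cdot\nu - \mathcal{M}^{[j-2]}(\omega\cdot\nu;\varepsilon) - \partial_x \mathcal{M}^{[j-2]}(x_*;\varepsilon)\, \omega\cdot\nu_\ell^0 \right| \\
&\ge |\omega\cdot\nu_\ell^0| - \left| i\omega\cdot\nu - \mathcal{M}^{[j-2]}(\omega\cdot\nu;\varepsilon) \right| - D|\varepsilon|\,|\omega\cdot\nu_\ell^0| \\
&\ge \frac{1}{2} |\omega\cdot\nu_\ell^0| - \left| i\omega\cdot\nu - \mathcal{M}^{[j-2]}(\omega\cdot\nu;\varepsilon) \right| \\
&\ge 2^{-1}\, 2^{4-j} C_0 - 2^{1-j} C_0 > 2^{1-j} C_0, \tag{7.16}
\end{align*}
which are both in contradiction with (7.13). Therefore (7.12) follows.

By reasoning as in the proof of Lemma 1 one obtains that, if we denote with $N_{j'}(T)$ the number of lines on scale $j'$ contained in $T \in \mathcal{S}^{\mathcal{R}}_{k,j}$, one has
\[ N_{j'}(T) \le c\, 2^{-j'/\tau_1} \sum_{v \in B(T)} |\nu_v|. \tag{7.17} \]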
More precisely (and more generally), if $T$ is a connected subset of lines and nodes in a tree such that (1) $T$ has only one exiting line and only one entering line, both on scales $\ge j'$, and (2) all the lines in $L(T)$ are on scale $\le j$, then, by denoting with $N^*_{j'}(T)$ the number of lines on scale $\ge j'$ contained in $T$ and defining $|\nu(T)| = \sum_{v \in B(T)} |\nu_v|$, one can prove inductively on the order of $T$ the bound $N^*_{j'}(T) \le \max\{0,\ 2|\nu(T)|\,2^{(3-j')/\tau_1} - 1\}$ for all $j' \le j$, by reasoning as follows.

Consider the path $\mathcal{P}$ formed by the lines connecting the entering line with the exiting line of $T$, and call $V(\mathcal{P})$ the vertices connected by such lines. If all the lines $\ell \in \mathcal{P}$ are on scales $n_\ell < j'$ then one has $N^*_{j'}(T) = \sum_{i=1}^{m} N^*_{j'}(\theta_i)$, where $\theta_1, \ldots, \theta_m$ are the trees inside $T$ with root in a vertex $v \in V(\mathcal{P})$, so that the bound follows from (the proof of) Lemma 1, i.e. from the bound (7.3).$^1$

If there is at least one line $\ell \in \mathcal{P}$ on scale $\ge j'$, call $T_1$ and $T_2$ the connected subsets of $T$ such that $L(T) = \{\ell\} \cup L(T_1) \cup L(T_2)$. If both $T_1$ and $T_2$ contain at least a line on scale $\ge j'$, then they have the same structure as $T$, i.e. they are subsets of lines (on scales $\le j$) and nodes with only one exiting line and only one entering line, both on scales $\ge j'$, so that by the inductive hypothesis one has
\[ N^*_{j'}(T) \le 1 + N^*_{j'}(T_1) + N^*_{j'}(T_2) \le 1 + \bigl( 2|\nu(T_1)|\,2^{(3-j')/\tau_1} - 1 \bigr) + \bigl( 2|\nu(T_2)|\,2^{(3-j')/\tau_1} - 1 \bigr) \le 2|\nu(T)|\,2^{(3-j')/\tau_1} - 1. \]
If only the subset $T_2$ contains at least a line on scale $\ge j'$, then we can reason as in deriving (7.12) through (7.16) to conclude that one must have $|\nu(T_1)| > 2^{(j'-4)/\tau_1}$, hence
\[ N^*_{j'}(T) = 1 + N^*_{j'}(T_2) \le 1 + \bigl( 2|\nu(T_2)|\,2^{(3-j')/\tau_1} - 1 \bigr) \le 2|\nu(T)|\,2^{(3-j')/\tau_1} - 2|\nu(T_1)|\,2^{(3-j')/\tau_1} \le 2|\nu(T)|\,2^{(3-j')/\tau_1} - 1; \]
analogously one discusses the case in which only the set $T_1$ contains at least a line on scale $\ge j'$, and the case in which both sets do not contain any line on scale $\ge j'$. Hence the bound on $N^*_{j'}(T)$ is proved also in the case in which there is at least one line $\ell \in \mathcal{P}$ on scale $\ge j'$.

In a renormalized self-energy graph $T \in \mathcal{S}^{\mathcal{R}}_{k,j}$ with external lines on scale $n$ all the lines $\ell \in L(T)$ are on scale $j' \le j$, so that for all $j' \le j$ the renormalized self-energy graph $T \in \mathcal{S}^{\mathcal{R}}_{k,j}$ is a subset verifying the properties (1) and (2), and we can apply the above result, so that the bound (7.17) follows.

$^1$ Note that, even if in the statement of Lemma 1 one requires that the bounds (7.1) hold for all $n$, what is really needed is that they hold for all $n$ such that $N_{n+1}(\theta) \ne 0$. Therefore we can apply Lemma 1 to the trees $\theta_1, \ldots, \theta_m$ because for each line $\ell \in L(T)$ one has $n_\ell \le j$ and the bounds (7.1) hold for all $j' < j$ by the inductive hypothesis.

Therefore we see that (7.12) and (7.17) imply, for all $T \in \mathcal{S}^{\mathcal{R}}_{k,j}$,
\[ |V_T(x;\varepsilon)| \le |\varepsilon|^k A_1 A_2^k\, e^{-A_3 2^{j/\tau_1}} \prod_{v \in B(T)} e^{-\kappa|\nu_v|/2}, \tag{7.18} \]
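for suitable constants $A_1$, $A_2$ and $A_3$; this can be easily obtained from the definitions (6.1) and (6.3), by reasoning as in deducing (5.8) and using the bound (2.9) for the node factors.

By inserting the bound (7.18) into (7.11) and using the definitions (6.3), we obtain
\begin{align*}
\left| \mathcal{M}^{[j]}(x;\varepsilon) - \mathcal{M}^{[j-1]}(x;\varepsilon) \right| &\le \left| M^{[j]}(x;\varepsilon) \right| \le \sum_{k=1}^{\infty} D_1 D_2^k |\varepsilon|^k e^{-D_3 2^{j/\tau_1}}, \\
\left| \mathcal{M}^{[j]}(x;\varepsilon) \right| &\le \sum_{i=0}^{j} \left| M^{[i]}(x;\varepsilon) \right| \le \sum_{i=0}^{j} \sum_{k=1}^{\infty} D_1 D_2^k |\varepsilon|^k e^{-D_3 2^{i/\tau_1}} \le \sum_{k=1}^{\infty} \tilde{D}_1 D_2^k |\varepsilon|^k, \tag{7.19}
\end{align*}
for suitable constants $D_1$, $\tilde{D}_1$, $D_2$ and $D_3$; this proves the first and third bounds in (7.10). Note that in the first of (7.19) we can let the sum start from $k = 2$ for all $j \ge 1$, as any renormalized self-energy graph of scale $\ge 1$ has to contain at least two nodes (see comments after (6.3)).

To prove the second bound in (7.10) we use the second line in the definition of $\mathcal{M}^{[n]}(x;\varepsilon)$ in (6.3), the regularity of the functions $\chi_n$ and $\psi_n$, and the inductive hypothesis. One has
\begin{align*}
\partial_x \mathcal{M}^{[j]}(x;\varepsilon) = \sum_{j'=0}^{j} \Biggl( & \chi_0(|x|) \cdots \chi_{j'}(|ix - \mathcal{M}^{[j'-1]}(x;\varepsilon)|)\, \partial_x M^{[j']}(x;\varepsilon) \\
& + M^{[j']}(x;\varepsilon) \sum_{i=0}^{j'} \chi_0(|x|) \cdots \partial\chi_i(|ix - \mathcal{M}^{[i-1]}(x;\varepsilon)|) \cdots \chi_{j'}(|ix - \mathcal{M}^{[j'-1]}(x;\varepsilon)|)\, \partial_x |ix - \mathcal{M}^{[i-1]}(x;\varepsilon)| \Biggr), \tag{7.20}
\end{align*}
where $\partial\chi_i$ denotes the derivative of $\chi_i$ with respect to its argument, and
\[ \partial_x M^{[j]}(x;\varepsilon) = \sum_{k=1}^{\infty} \sum_{T \in \mathcal{S}^{\mathcal{R}}_{k,j}} \partial_x V_T(x;\varepsilon), \qquad \partial_x V_T(x;\varepsilon) = \varepsilon^{k_T} \sum_{\ell \in L(T)} \partial_x g^{[n_\ell]} \Biggl( \prod_{\ell' \in L(T) \setminus \ell} g^{[n_{\ell'}]} \Biggr) \Biggl( \prod_{v \in E(T) \cup V(T)} F_v \Biggr), \tag{7.21} \]
so that one has to evaluate the derivatives of the propagators.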
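By using the definition of $g^{[n]}(x;\varepsilon)$ in (6.3) one has$^2$
\begin{align*}
\partial_x g^{[n]}(x;\varepsilon) = {} & - \frac{\chi_0(|x|) \cdots \psi_n(|ix - \mathcal{M}^{[n-1]}(x;\varepsilon)|)}{(ix - \mathcal{M}^{[n-1]}(x;\varepsilon))^2}\, \partial_x \bigl( ix - \mathcal{M}^{[n-1]}(x;\varepsilon) \bigr) \\
& + \sum_{j=0}^{n} \frac{\chi_0(|x|) \cdots \partial\chi_j(|ix - \mathcal{M}^{[j-1]}(x;\varepsilon)|) \cdots \psi_n(|ix - \mathcal{M}^{[n-1]}(x;\varepsilon)|)}{ix - \mathcal{M}^{[n-1]}(x;\varepsilon)}\, \partial_x |ix - \mathcal{M}^{[j-1]}(x;\varepsilon)|, \tag{7.22}
\end{align*}
which, by using the fact that $|\partial\chi_j|$ and $|\partial\psi_j|$ are both bounded by a positive constant times $2^j C_0^{-1}$, and the inductive hypothesis, can be bounded as
\[ |\partial_x g^{[n]}(x;\varepsilon)| \le \tilde{A} \Biggl( \frac{1}{C_0^2\, 2^{-2(n+1)}} + \sum_{j=0}^{n} \frac{C_0^{-1} 2^j}{C_0\, 2^{-(n+1)}} \Biggr) \le \tilde{A}\, C_0^{-2}\, 2^{2(n+1)} \Biggl( 1 + \sum_{j=0}^{n} 2^{j-(n+1)} \Biggr) \le A\, C_0^{-2}\, 2^{2(n+1)}, \tag{7.23} \]
for some constants $\tilde{A}$ and $A$. Therefore we can bound $\partial_x V_T(x;\varepsilon)$ in (7.21) by
\begin{align*}
|\partial_x V_T(x;\varepsilon)| &\le |\varepsilon|^{k_T} \sum_{\ell \in L(T)} A\, C_0^{-2}\, 2^{2(n_\ell+1)} \Biggl( \prod_{\ell' \in L(T) \setminus \ell} C_0^{-1}\, 2^{n_{\ell'}+1} \Biggr)\, Q^{k_T} \Biggl( \prod_{v \in B(T)} e^{-\kappa|\nu_v|} \Biggr) \Biggl( \prod_{v \in E_W(\theta)} |c^{[k_v]}| \Biggr) \\
&\le |\varepsilon|^{k_T} \tilde{A}_1 \tilde{A}_2^{k_T}\, e^{-A_3 \kappa\, 2^{j/\tau_1}}, \tag{7.24}
\end{align*}
where we have used also (7.12), for suitable constants $\tilde{A}_1$ and $\tilde{A}_2$,$^3$ and a bound $|c^{[k_v]}| < C^{k_v}$ can be inductively assumed.

Then (7.24) implies immediately the bound, for suitable constants $\tilde{D}_1$, $\tilde{D}_2$, $\tilde{D}_3$ and $\tilde{D}$,
\[ |\partial_x M^{[j]}(x;\varepsilon)| \le \sum_{k=2}^{\infty} |\varepsilon|^k \tilde{D}_1 \tilde{D}_2^k\, e^{-\tilde{D}_3 2^{j/\tau_1}} \le |\varepsilon|^2 \tilde{D}\, e^{-\tilde{D}_3 2^{j/\tau_1}}, \tag{7.25} \]
which, together with (7.20), yields
\[ |\partial_x \mathcal{M}^{[j]}(x;\varepsilon)| \le \sum_{j'=0}^{j} \Biggl( |\varepsilon|^2 \tilde{D}\, e^{-\tilde{D}_3 2^{j'/\tau_1}} + |\varepsilon| \tilde{D}\, e^{-D_3 2^{j'/\tau_1}} \sum_{i=0}^{j'} 2^i \Biggr) \le |\varepsilon| D, \tag{7.26} \]
provided that $D$ is large enough, so that the second bound in (7.10) follows.

$^2$ With obvious interpretation of the term with $j = n$ in the last sum.

$^3$ With respect to the bound (7.18) we have the extra difficulty that, in order to prove the bound (7.17), when using an inequality like (7.16) with $\ell \in \mathcal{P}$ on scale $\ge j'$, the quantity $\omega\cdot\nu$ has to be replaced with a continuously varying $x$. Nevertheless, as in the previous case, one has $|ix - \mathcal{M}^{[j-2]}(x;\varepsilon)| < 2^{-j+1} C_0$, by the support properties of the functions $\chi_n$, so that (7.16) still applies when needed, and the same conclusions still hold.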
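The last step in (7.23) rests on the elementary fact that the bracket $1 + \sum_{j=0}^{n} 2^{j-(n+1)}$ stays below 2 uniformly in $n$, so that one may take $A = 2\tilde{A}$. A minimal sketch of this check, using exact rational arithmetic to avoid rounding at large $n$ (the function name is hypothetical):

```python
from fractions import Fraction


def bracket_7_23(n):
    """Exact value of 1 + sum_{j=0}^{n} 2^{j-(n+1)} appearing in (7.23)."""
    return 1 + sum(Fraction(2) ** (j - (n + 1)) for j in range(n + 1))


# The geometric sum equals 1 - 2^{-(n+1)}, so the bracket is 2 - 2^{-(n+1)} < 2.
values = [bracket_7_23(n) for n in range(5)]
```

The same geometric structure explains why each additional scale costs only a bounded factor in the estimate of $\partial_x g^{[n]}$.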
To apply the above results and conclude the proof of Theorem 2, we have still to construct the set $E$ for which the Diophantine conditions (6.10) hold, and to show that such a set has positive measure.

Define recursively the sets $E^{[n]}$ as follows. Fix $\varepsilon_0$ such that the series
\[ \sum_{k=1}^{\infty} \varepsilon^k \sum_{\nu \in \mathbb{Z}^d} e^{i\omega\cdot\nu t}\, u^{(k)}_\nu, \qquad u^{(k)}_\nu = \sum_{\theta \in \Theta^{\mathcal{R}}_{k,\nu}} \mathrm{Val}(\theta), \tag{7.27} \]
obtained from (6.9), with the definitions (6.7), by replacing the propagators $g^{[n_\ell]}$ with $g^{(n_\ell)}$ (which is equal to the series obtained from (4.1), with the definitions (4.10), by discarding all trees containing self-energy graphs), converges for $|\varepsilon| \le \varepsilon_0$. Therefore $\mathrm{Val}(\theta)$ is a numerical value satisfying the bound $|\mathrm{Val}(\theta)| \le C^k e^{-\kappa'|\nu|}$, for some constant $C$, as we can prove by reasoning as in Sect. 5 and bounding the product of the propagators through the bound (7.2) of Lemma 1 (equivalently through the bound (5.7)). Set
\[ E^{[0]} = (-\varepsilon_0, \varepsilon_0), \tag{7.28} \]
and, for $n \ge 1$,
\[ E^{[n]} = \left\{ \varepsilon \in E^{[n-1]} : \left| i\omega\cdot\nu - \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon) \right| > C_0 |\nu|^{-\tau_1} \right\}, \tag{7.29} \]
for a suitable Diophantine constant $\tau_1$ (to be fixed later); finally define
\[ E = \bigcap_{n=0}^{\infty} E^{[n]} = \lim_{n \to \infty} E^{[n]}. \tag{7.30} \]
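To make the type of condition entering (6.10) and (7.29) concrete, the sketch below tests the unperturbed small-divisor condition $|\omega\cdot\nu| \ge C_0 |\nu|^{-\tau}$, i.e. the case in which the correction $\mathcal{M}^{[n]}$ is set to zero. Everything specific here is an illustrative assumption and not taken from the paper: the dimension $d = 2$, the frequency vector $\omega = (1, (1+\sqrt{5})/2)$, the use of the $\ell^1$ norm for $|\nu|$, and the function name.

```python
import math


def worst_diophantine_ratio(omega, tau, max_norm):
    """Minimum of |omega . nu| * |nu|**tau over 0 < |nu| <= max_norm in Z^2.

    |nu| is taken as the l1 norm (an assumption of this sketch). Any C0
    not exceeding the returned value satisfies
    |omega . nu| >= C0 |nu|^{-tau} on the scanned range.
    """
    worst = math.inf
    for n1 in range(-max_norm, max_norm + 1):
        for n2 in range(-max_norm, max_norm + 1):
            norm = abs(n1) + abs(n2)
            if norm == 0 or norm > max_norm:
                continue
            small_divisor = abs(omega[0] * n1 + omega[1] * n2)
            worst = min(worst, small_divisor * norm ** tau)
    return worst


golden = (1.0 + math.sqrt(5.0)) / 2.0
ratio = worst_diophantine_ratio((1.0, golden), tau=1.0, max_norm=60)
```

In the actual construction the divisor $i\omega\cdot\nu$ is shifted by $\mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon)$, which depends on $\varepsilon$; this is why the condition has to be imposed as a restriction on $\varepsilon$ rather than on $\omega$ alone.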
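Lemma 3. The function $M^{[n]}(x;\varepsilon)$ is $C^1$-extendible in the sense of Whitney outside $E^{[n-1]}$, and for all $\varepsilon, \varepsilon' \in E^{[n-1]}$ one has
\[ M^{[n]}(x;\varepsilon') - M^{[n]}(x;\varepsilon) = (\varepsilon' - \varepsilon)\, \partial_\varepsilon M^{[n]}(x;\varepsilon) + o(\varepsilon' - \varepsilon), \tag{7.31} \]
where $\partial_\varepsilon M^{[n]}(x;\varepsilon)$ is the formal derivative with respect to $\varepsilon$ of $M^{[n]}(x;\varepsilon)$.

Proof. The proof is by induction on $n$. Both $M^{[n]}(x;\varepsilon)$ and $M^{[n]}(x;\varepsilon')$ can be expressed by the last equation in (6.3): the only difference is that one has to replace $\varepsilon$ with $\varepsilon'$ for $M^{[n]}(x;\varepsilon')$. This means that there is a one-to-one correspondence between the graphs contributing to $M^{[n]}(x;\varepsilon)$ and those contributing to $M^{[n]}(x;\varepsilon')$, so that we can write
\begin{align*}
M^{[n]}(x;\varepsilon') - M^{[n]}(x;\varepsilon) = {} & 2(\varepsilon' - \varepsilon)\, Q_0\, c^{[0]}\, \delta_{n,0} \\
& + \sum_{k=1}^{\infty} \sum_{T \in \mathcal{S}^{\mathcal{R}}_{k,n}} \bigl( (\varepsilon')^{k_T} - \varepsilon^{k_T} \bigr) \Biggl( \prod_{v \in E(T) \cup V(T)} F_v \Biggr) \Biggl( \prod_{\ell \in L(T)} g^{[n_\ell]}(\omega\cdot\nu_\ell;\varepsilon') \Biggr) \\
& + \sum_{k=1}^{\infty} \sum_{T \in \mathcal{S}^{\mathcal{R}}_{k,n}} \varepsilon^{k_T} \Biggl( \prod_{v \in E(T) \cup V(T)} F_v \Biggr) \Biggl( \prod_{\ell \in L(T)} g^{[n_\ell]}(\omega\cdot\nu_\ell;\varepsilon') - \prod_{\ell \in L(T)} g^{[n_\ell]}(\omega\cdot\nu_\ell;\varepsilon) \Biggr), \tag{7.32}
\end{align*}
where of course $k_T = k$. The terms in the first two lines can be trivially studied, so we concentrate on the last sum in (7.32).

Let us call $\Lambda(T)$ the set of lines in $L(T)$ coming out from nodes in $B(T)$. We can order the $|B(T)| - 1$ lines in $\Lambda(T)$ and construct a set of $|B(T)|$ subsets $\Lambda_1(T), \ldots, \Lambda_{|B(T)|}(T)$ of $\Lambda(T)$, with $|\Lambda_j(T)| = j - 1$, in the following way. Set $\Lambda_1(T) = \emptyset$, $\Lambda_2(T) = \ell_1$, if $\ell_1$ is a line connected to the outcoming line of $T$, and, inductively for $|B(T)| \ge 3$ and $2 \le j \le |B(T)| - 1$, $\Lambda_{j+1}(T) = \Lambda_j(T) \cup \ell_j$, where the line $\ell_j \in \Lambda(T) \setminus \Lambda_j(T)$ is connected to $\Lambda_j(T)$. Then in (7.32) we have
\begin{align*}
& \prod_{\ell \in \Lambda(T)} g^{[n_\ell]}(\omega\cdot\nu_\ell;\varepsilon') - \prod_{\ell \in \Lambda(T)} g^{[n_\ell]}(\omega\cdot\nu_\ell;\varepsilon) \\
& \quad = \sum_{j=1}^{|B(T)|} \Biggl( \prod_{\ell \in \Lambda_j(T)} g^{[n_\ell]}(\omega\cdot\nu_\ell;\varepsilon') \Biggr) \Bigl( g^{[n_{\ell_j}]}(\omega\cdot\nu_{\ell_j};\varepsilon') - g^{[n_{\ell_j}]}(\omega\cdot\nu_{\ell_j};\varepsilon) \Bigr) \Biggl( \prod_{\ell \in \Lambda(T) \setminus (\Lambda_j(T) \cup \ell_j)} g^{[n_\ell]}(\omega\cdot\nu_\ell;\varepsilon) \Biggr), \tag{7.33}
\end{align*}
where, by setting $n_j = n_{\ell_j}$, $x_j = \omega\cdot\nu_{\ell_j}$, $X_s(\varepsilon) = \chi_s(|ix_j - \mathcal{M}^{[s-1]}(x_j;\varepsilon)|)$ for $s = 1, \ldots, n_j - 1$, and $\Psi_{n_j}(\varepsilon) = \psi_{n_j}(|ix_j - \mathcal{M}^{[n_j-1]}(x_j;\varepsilon)|)$, we can write (see the first equation in (6.3))
\begin{align*}
g^{[n_j]}(x_j;\varepsilon') - g^{[n_j]}(x_j;\varepsilon) = {} & \frac{\bigl( \mathcal{M}^{[n_j-1]}(x_j;\varepsilon') - \mathcal{M}^{[n_j-1]}(x_j;\varepsilon) \bigr)\, \chi_0(|x_j|) X_1(\varepsilon) \cdots X_{n_j-1}(\varepsilon) \Psi_{n_j}(\varepsilon)}{\bigl( ix_j - \mathcal{M}^{[n_j-1]}(x_j;\varepsilon') \bigr) \bigl( ix_j - \mathcal{M}^{[n_j-1]}(x_j;\varepsilon) \bigr)} \\
& + \sum_{s=1}^{n_j-1} \frac{\chi_0(|x_j|) \cdots X_{s-1}(\varepsilon') \bigl( X_s(\varepsilon') - X_s(\varepsilon) \bigr) X_{s+1}(\varepsilon) \cdots \Psi_{n_j}(\varepsilon)}{ix_j - \mathcal{M}^{[n_j-1]}(x_j;\varepsilon')} \\
& + \frac{\chi_0(|x_j|) X_1(\varepsilon') \cdots X_{n_j-1}(\varepsilon') \bigl( \Psi_{n_j}(\varepsilon') - \Psi_{n_j}(\varepsilon) \bigr)}{ix_j - \mathcal{M}^{[n_j-1]}(x_j;\varepsilon')}. \tag{7.34}
\end{align*}
By writing "symbolically"
\begin{align*}
X_s(\varepsilon') - X_s(\varepsilon) &= \partial\chi_s(|ix_j - \mathcal{M}^{[s-1]}(x_j;\varepsilon_*)|) \bigl( \mathcal{M}^{[s-1]}(x_j;\varepsilon') - \mathcal{M}^{[s-1]}(x_j;\varepsilon) \bigr), \\
\Psi_{n_j}(\varepsilon') - \Psi_{n_j}(\varepsilon) &= \partial\psi_{n_j}(|ix_j - \mathcal{M}^{[n_j-1]}(x_j;\varepsilon_*')|) \bigl( \mathcal{M}^{[n_j-1]}(x_j;\varepsilon') - \mathcal{M}^{[n_j-1]}(x_j;\varepsilon) \bigr), \tag{7.35}
\end{align*}
where $\varepsilon_*$ and $\varepsilon_*'$ are two suitable values (depending on $j$ and $s$) between $\varepsilon$ and $\varepsilon'$, we can use the inductive hypothesis for all differences $M^{[j']}(x_j;\varepsilon') - M^{[j']}(x_j;\varepsilon)$, with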
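$j' \le n - 1$, appearing in (7.34) and (7.35), so that (7.31) follows, by defining
\begin{align*}
\partial_\varepsilon M^{[n]}(\omega\cdot\nu;\varepsilon) &= \sum_{k=1}^{\infty} \sum_{T \in \mathcal{S}^{\mathcal{R}}_{k,n}} \partial_\varepsilon V_T^{[k]}(\omega\cdot\nu;\varepsilon), \\
\partial_\varepsilon V_T^{[k]}(\omega\cdot\nu;\varepsilon) &= k_T\, \varepsilon^{k_T-1} \Biggl( \prod_{v \in E(T) \cup V(T)} F_v \Biggr) \Biggl( \prod_{\ell \in L(T)} g^{[n_\ell]} \Biggr) + \varepsilon^{k_T} \Biggl( \prod_{v \in E(T) \cup V(T)} F_v \Biggr) \sum_{\ell \in \Lambda(T)} \partial_\varepsilon g^{[n_\ell]} \Biggl( \prod_{\ell' \in \Lambda(T) \setminus \ell} g^{[n_{\ell'}]} \Biggr), \tag{7.36}
\end{align*}
where
\begin{align*}
\partial_\varepsilon g^{[n]} = {} & \frac{g^{[n]}}{ix - \mathcal{M}^{[n-1]}(x;\varepsilon)}\, \partial_\varepsilon \mathcal{M}^{[n-1]}(x;\varepsilon) \\
& - g^{[n]} \Biggl( \sum_{j=1}^{n-1} \frac{\partial\chi_j(|ix - \mathcal{M}^{[j-2]}(x;\varepsilon)|)}{\chi_j(|ix - \mathcal{M}^{[j-2]}(x;\varepsilon)|)}\, \partial_\varepsilon \bigl| ix - \mathcal{M}^{[j-2]}(x;\varepsilon) \bigr| + \frac{\partial\psi_n(|ix - \mathcal{M}^{[n-1]}(x;\varepsilon)|)}{\psi_n(|ix - \mathcal{M}^{[n-1]}(x;\varepsilon)|)}\, \partial_\varepsilon \bigl| ix - \mathcal{M}^{[n-1]}(x;\varepsilon) \bigr| \Biggr) \tag{7.37}
\end{align*}
denotes the formal derivative of the propagator. Moreover $M^{[n]}(\omega\cdot\nu;\varepsilon)$ is defined for all $\varepsilon \in E^{[n-1]}$ and it can be extended by continuity to its closure $\overline{E^{[n-1]}}$ and hence to the full $E^{[0]}$; its extension is then $C^1$ in the sense of Whitney, [19], and satisfies the same bounds (7.10).

Therefore for all $\varepsilon \in E^{[n-1]}$ the quantities $\mathcal{M}^{[n]}(x;\varepsilon)$ are well defined and formally differentiable (in the sense of Whitney), so that one has
\[ \frac{B}{2} \le \left| \partial_\varepsilon \mathcal{M}^{[n]}(x;\varepsilon) \right| \le 2B, \tag{7.38} \]
for a suitable positive constant $B$, while the propagators admit the bounds
\[ |g^{[n]}| \le 2^{n+1} C_0^{-1}, \qquad |\partial_\varepsilon g^{[n]}| \le G\, 2^{2(n+1)} C_0^{-2}, \tag{7.39} \]
for a suitable constant $G$, as can be easily obtained by reasoning as in the proof of Lemma 2; the bounds (7.38) and (7.39) follow inductively from the formulae and the analysis performed along the proof of the above lemma, and from the definitions (6.1) and (6.3).

Lemma 4. There are two positive constants $b$ and $\xi$ such that, for $\varepsilon_0$ small enough, one has
\[ \mathrm{meas}(E) \ge \varepsilon_0 \bigl( 1 - b\, \varepsilon_0^{\xi} \bigr), \tag{7.40} \]
where meas denotes the Lebesgue measure.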
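Proof. Define
\[ I^{[0]} = \emptyset, \qquad I^{[n]} = E^{[n-1]} \setminus E^{[n]} \quad \text{for } n \ge 1; \tag{7.41} \]
note that $I \equiv \cup_{n=0}^{\infty} I^{[n]} = (-\varepsilon_0, \varepsilon_0) \setminus E$. We shall prove that one has
\[ \mathrm{meas}(I^{[n]}) \le b\, \varepsilon_0^{1+\xi} \qquad \forall n \ge 0, \tag{7.42} \]
for suitable positive constants $b$ and $\xi$.

For all $n \ge 1$ and for all $\nu \in \mathbb{Z}^d \setminus \{0\}$ define
\[ I^{[n]}(\nu) = \left\{ \varepsilon \in E^{[n-1]} : \left| i\omega\cdot\nu - \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon) \right| \le C_0 |\nu|^{-\tau_1} \right\}. \tag{7.43} \]
Each set $I^{[n]}(\nu)$ has a center in a point $\varepsilon^{[n]}(\nu)$. We can easily prove that there exist two positive constants $B_1$ and $B_2$ such that one has
\[ \left| \varepsilon^{[n]}(\nu) - \varepsilon^{[n-1]}(\nu) \right| \le \varepsilon_0 B_1 e^{-B_2 2^{n/\tau_1}} \tag{7.44} \]
for all $n \ge 2$ and for all $\nu \in \mathbb{Z}^d \setminus \{0\}$. By definition of $\varepsilon^{[n]}(\nu)$ one has
\[ i\omega\cdot\nu - \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon^{[n]}(\nu)) = 0, \tag{7.45} \]
where we are using the Whitney extension of $\mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon)$ outside $E^{[n-1]}$, so that, by setting $\delta\varepsilon = \varepsilon^{[n]}(\nu) - \varepsilon^{[n-1]}(\nu)$, one obtains (again by using Whitney extensions)
\begin{align*}
0 &= i\omega\cdot\nu - \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon^{[n]}(\nu)) \\
&= i\omega\cdot\nu - \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon^{[n-1]}(\nu) + \delta\varepsilon) - \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon^{[n]}(\nu)) + \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon^{[n]}(\nu)) \\
&= -\partial_\varepsilon \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon^{[n-1]}(\nu))\, \delta\varepsilon + o(\delta\varepsilon) - \Bigl( \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon^{[n]}(\nu)) - \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon^{[n]}(\nu)) \Bigr), \tag{7.46}
\end{align*}
by Lemma 3; therefore one can use that one has
\[ \left| \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon^{[n]}(\nu)) - \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon^{[n]}(\nu)) \right| \le \varepsilon_0 D_1 e^{-D_2 2^{n/\tau_1}}, \tag{7.47} \]
by Lemma 2, and
\[ \left| \partial_\varepsilon \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon^{[n-1]}(\nu)) \right| > \frac{B}{2}, \tag{7.48} \]
by (7.38), so that (7.44) follows.

Therefore one has to exclude from the set $E^{[n-1]}$ all the values $\varepsilon$ around $\varepsilon^{[n]}(\nu)$ in $I^{[n]}(\nu)$, which gives a set of measure
\[ \int_{I^{[n]}(\nu)} d\varepsilon = \int_{-1}^{1} dt\, \left| \frac{d\varepsilon(t)}{dt} \right|, \tag{7.49} \]
where $\varepsilon(t)$ is defined by
\[ i\omega\cdot\nu - \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon(t)) = t\, C_0 |\nu|^{-\tau_1}, \tag{7.50} \]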
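which means
\[ \int_{I^{[n]}(\nu)} d\varepsilon \le \int_{-1}^{1} dt\, C_0 |\nu|^{-\tau_1} \frac{1}{|\partial_\varepsilon \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon(t))|} \le \frac{4}{B}\, C_0 |\nu|^{-\tau_1}, \tag{7.51} \]
by (7.38).

This has to be done for all $\nu \in \mathbb{Z}^d$ satisfying $|\omega\cdot\nu| < 2\varepsilon_0 D$, where $D$ is the positive constant such that $\varepsilon_0 D$ is a bound on $|\mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon)|$, i.e. for all $\nu \in \mathbb{Z}^d$ such that
\[ |\nu| \ge \left( \frac{C_0}{2\varepsilon_0 D} \right)^{1/\tau} \equiv N_0. \tag{7.52} \]
This yields that we have to exclude from $E^{[n-1]}$ a set
\[ I^{[n]} = \bigcup_{|\nu| \ge N_0} I^{[n]}(\nu) \tag{7.53} \]
of measure bounded by
\[ \mathrm{meas}(I^{[n]}) \le \sum_{|\nu| \ge N_0} \mathrm{meas}(I^{[n]}(\nu)) \le \mathrm{const.} \sum_{|\nu| \ge N_0} C_0 |\nu|^{-\tau_1} \le \mathrm{const.}\, C_0 \left( \frac{\varepsilon_0}{C_0} \right)^{(\tau_1 - d)/\tau} = \mathrm{const.}\, \varepsilon_0^{1+\xi}, \tag{7.54} \]
where $\xi = (\tau_1 - \tau - d)/\tau$, so that $\xi > 0$ if
\[ \tau_1 > \tau + d, \tag{7.55} \]
which fixes the value of $\tau_1$.

For all $|\nu| \ge N_0$ fix $n_0 = n_0(\nu)$ such that
\[ \left| \varepsilon^{[n_0+1]}(\nu) - \varepsilon^{[n_0]}(\nu) \right| \le C_0 |\nu|^{-\tau_1}; \tag{7.56} \]
by (7.44) one can choose
\[ n_0 \equiv n_0(\nu) \le \mathrm{const.}\, \tau_1 \log \log |\nu|. \tag{7.57} \]
Then for all $n \le n_0$ define $J^{[n]}(\nu)$ as
\[ J^{[n]}(\nu) = \left\{ \varepsilon \in E^{[n-1]} : \left| i\omega\cdot\nu - \mathcal{M}^{[n]}(\omega\cdot\nu;\varepsilon) \right| < 2 C_0 |\nu|^{-\tau_1} \right\}; \tag{7.58} \]
by construction all the sets $I^{[n]}(\nu)$ fall inside $J^{[n_0]}(\nu)$ as soon as $n > n_0$.

Then we can bound $\mathrm{meas}(I)$ by the sum of the measures of the sets $J^{[1]}(\nu), \ldots, J^{[n_0]}(\nu)$ for all $\nu \in \mathbb{Z}^d$ such that $|\nu|$ verifies (7.52). The condition (7.57) on $n_0$ implies that such a measure can be bounded by
\[ \mathrm{const.} \sum_{|\nu| \ge N_0} n_0(\nu)\, C_0 |\nu|^{-\tau_1} \le \mathrm{const.}\, \varepsilon_0^{1+\xi'}, \tag{7.59} \]
with a value $\xi'$ smaller than $\xi$ in order to take into account the logarithmic corrections due to the factor $n_0(\nu)$.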
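The exponent bookkeeping in (7.54) and the convergence of the lattice sum can be illustrated as follows. This is a minimal sketch under stated assumptions: it takes $d = 2$ and measures $|\nu|$ in the $\ell^1$ norm (so that exactly $4r$ lattice points have $|\nu| = r$), and the function names and sample parameters are hypothetical.

```python
def xi_exponent(tau1, tau, d):
    """xi = (tau1 - tau - d) / tau, so that (tau1 - d) / tau = 1 + xi."""
    return (tau1 - tau - d) / tau


def lattice_tail_sum(n0, tau1, r_max):
    """Truncated sum of |nu|^{-tau1} over nu in Z^2 with n0 <= |nu|_1 <= r_max.
    There are exactly 4 r lattice points with |nu|_1 = r >= 1."""
    return sum(4 * r * r ** (-tau1) for r in range(n0, r_max + 1))


def tail_bound(n0, tau1):
    """Integral-test bound for the full tail, valid for tau1 > 2:
    sum_{r >= n0} 4 r^{1 - tau1} <= 4 (n0^{1-tau1} + n0^{2-tau1} / (tau1 - 2))."""
    return 4.0 * (n0 ** (1.0 - tau1) + n0 ** (2.0 - tau1) / (tau1 - 2.0))
```

With $\tau_1 = 4.5$, $\tau = 1.5$, $d = 2$ one gets $\xi = 2/3 > 0$, in agreement with (7.55), and the tail decays like $N_0^{d - \tau_1}$, which is what turns $N_0 \sim \varepsilon_0^{-1/\tau}$ into the power $\varepsilon_0^{(\tau_1 - d)/\tau}$.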
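The above lemmata imply the convergence of the series (6.9) for all values $\varepsilon \in E$, with $E$ a Cantor set with positive Lebesgue measure such that
\[ \lim_{\varepsilon_0 \to 0} \frac{\mathrm{meas}(E)}{2\varepsilon_0} = 1, \tag{7.60} \]
as it follows immediately from the construction of $E$ and from the property (7.40).

8. Properties of the Renormalized Expansion

To complete the proof of Theorem 2 we have still to show that the function (6.9), defined through the renormalized expansion (6.7), solves Eq. (2.4) for all $\varepsilon \in E$, i.e. that one has
\[ u = g\, \varepsilon \bigl( Q^{-1} + Q u^2 \bigr), \tag{8.1} \]
where $g$ is the differential operator with kernel $g(\omega\cdot\nu) = 1/(i\omega\cdot\nu)$. We can write
\[ u(t) = c + \sum_{\nu \in \mathbb{Z}^d} e^{i\omega\cdot\nu t} u_\nu, \qquad u_\nu = \sum_{n=0}^{\infty} u_{n,\nu}, \qquad u_{n,\nu} = \sum_{k=1}^{\infty} \varepsilon^k \sum_{\theta \in \Theta^{\mathcal{R}}_{k,\nu}(n)} \mathrm{Val}(\theta), \tag{8.2} \]
where $\Theta^{\mathcal{R}}_{k,\nu}(n)$ is the set of trees in $\Theta^{\mathcal{R}}_{k,\nu}$ such that the root line has scale $n$.

Note that for all $x \ne 0$ one has
\[ 1 = \sum_{n=0}^{\infty} X_n(x;\varepsilon), \qquad X_n(x;\varepsilon) = \chi_0(|x|) \cdots \chi_{n-1}(|ix - \mathcal{M}^{[n-2]}(x;\varepsilon)|)\, \psi_n(|ix - \mathcal{M}^{[n-1]}(x;\varepsilon)|), \tag{8.3} \]
where the term with $n = 0$ has to be interpreted as $\psi_0(|x|)$; more generally for all $x \ne 0$ and for all $j \ge 0$ one has
\[ 1 = \sum_{n=j}^{\infty} \chi_j(|ix - \mathcal{M}^{[j-1]}(x;\varepsilon)|) \cdots \chi_{n-1}(|ix - \mathcal{M}^{[n-2]}(x;\varepsilon)|)\, \psi_n(|ix - \mathcal{M}^{[n-1]}(x;\varepsilon)|), \tag{8.4} \]
where again the term with $n = j$ has to be interpreted as $\psi_j(|ix - \mathcal{M}^{[j-1]}(x;\varepsilon)|)$. Note that both in (8.3) and in (8.4) only a finite number of addends is different from zero, as the analysis of Sect. 7 implies, so that the two series are well defined.

By using (8.3) one can write, in Fourier space,
\begin{align*}
g(\omega\cdot\nu) \bigl[ \varepsilon ( Q^{-1} + Q u^2 ) \bigr]_\nu
&= g(\omega\cdot\nu) \sum_{n=0}^{\infty} X_n(\omega\cdot\nu;\varepsilon) \bigl[ \varepsilon ( Q^{-1} + Q u^2 ) \bigr]_\nu \\
&= g(\omega\cdot\nu) \sum_{n=0}^{\infty} X_n(\omega\cdot\nu;\varepsilon) \bigl( g^{[n]}(\omega\cdot\nu;\varepsilon) \bigr)^{-1} g^{[n]}(\omega\cdot\nu;\varepsilon) \bigl[ \varepsilon ( Q^{-1} + Q u^2 ) \bigr]_\nu \\
&= g(\omega\cdot\nu) \sum_{n=0}^{\infty} \bigl( i\omega\cdot\nu - \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon) \bigr)\, g^{[n]}(\omega\cdot\nu;\varepsilon) \bigl[ \varepsilon ( Q^{-1} + Q u^2 ) \bigr]_\nu \\
&= g(\omega\cdot\nu) \sum_{n=0}^{\infty} \bigl( i\omega\cdot\nu - \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon) \bigr) \sum_{k=1}^{\infty} \varepsilon^k \sum_{\theta \in \overline{\Theta}^{\mathcal{R}}_{k,\nu}(n)} \mathrm{Val}(\theta), \tag{8.5}
\end{align*}
where $\overline{\Theta}^{\mathcal{R}}_{k,\nu}(n)$ differs from $\Theta^{\mathcal{R}}_{k,\nu}(n)$ as it contains also trees which can have one renormalized self-energy graph $T$ with exiting line $\ell_0$, if $\ell_0$ denotes the root line of $\theta$; for such trees the line entering $T$ will be on a scale $p \ge 0$, while the renormalized self-energy graph $T$ will have a scale $n_T = j$, with $j + 1 \le \min\{n, p\}$ (by definition of the renormalized self-energy graph). Then we obtain, by explicitly separating in (8.5) the trees containing such self-energy graphs from the others,
\begin{align*}
g(\omega\cdot\nu) \bigl[ \varepsilon ( Q^{-1} + Q u^2 ) \bigr]_\nu
= {} & g(\omega\cdot\nu) \sum_{n=0}^{\infty} \bigl( i\omega\cdot\nu - \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon) \bigr) \sum_{k=1}^{\infty} \varepsilon^k \sum_{\theta \in \Theta^{\mathcal{R}}_{k,\nu}(n)} \mathrm{Val}(\theta) \\
& + g(\omega\cdot\nu) \sum_{n=1}^{\infty} \bigl( i\omega\cdot\nu - \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon) \bigr)\, g^{[n]}(\omega\cdot\nu;\varepsilon) \sum_{p=n}^{\infty} \sum_{j=0}^{n-1} M^{[j]}(\omega\cdot\nu;\varepsilon) \sum_{k=1}^{\infty} \varepsilon^k \sum_{\theta \in \Theta^{\mathcal{R}}_{k,\nu}(p)} \mathrm{Val}(\theta) \\
& + g(\omega\cdot\nu) \sum_{n=2}^{\infty} \bigl( i\omega\cdot\nu - \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon) \bigr)\, g^{[n]}(\omega\cdot\nu;\varepsilon) \sum_{p=0}^{n-1} \sum_{j=0}^{p-1} M^{[j]}(\omega\cdot\nu;\varepsilon) \sum_{k=1}^{\infty} \varepsilon^k \sum_{\theta \in \Theta^{\mathcal{R}}_{k,\nu}(p)} \mathrm{Val}(\theta), \tag{8.6}
\end{align*}
which, by the definitions (8.2), can be written as
\begin{align*}
g(\omega\cdot\nu) \bigl[ \varepsilon ( Q^{-1} + Q u^2 ) \bigr]_\nu = g(\omega\cdot\nu) \Biggl( & \sum_{n=0}^{\infty} \bigl( i\omega\cdot\nu - \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon) \bigr) u_{n,\nu} \\
& + \sum_{n=1}^{\infty} X_n(\omega\cdot\nu;\varepsilon) \sum_{p=n}^{\infty} \sum_{j=0}^{n-1} M^{[j]}(\omega\cdot\nu;\varepsilon)\, u_{p,\nu} \\
& + \sum_{n=2}^{\infty} X_n(\omega\cdot\nu;\varepsilon) \sum_{p=0}^{n-1} \sum_{j=0}^{p-1} M^{[j]}(\omega\cdot\nu;\varepsilon)\, u_{p,\nu} \Biggr). \tag{8.7}
\end{align*}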
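The resolution of the identity (8.3), on which the manipulations leading to (8.7) rely, is a telescoping sum. The sketch below illustrates it with exact rational arithmetic under one explicit assumption that is not restated in this section: the cutoff functions are taken to satisfy $\psi_n = 1 - \chi_n$ pointwise. The function name and the sample cutoff values are hypothetical.

```python
from fractions import Fraction


def partition_terms(chi_values):
    """Given chi_0, ..., chi_N at a fixed point, return the terms
    X_n = chi_0 ... chi_{n-1} * psi_n with psi_n = 1 - chi_n (assumed),
    together with the leftover product chi_0 ... chi_N."""
    terms = []
    prefix = Fraction(1)
    for chi in chi_values:
        terms.append(prefix * (1 - chi))  # X_n
        prefix *= chi                     # chi_0 ... chi_n
    return terms, prefix


# Sample cutoff values: equal to 1 on the first scales, then decreasing to 0.
chi_sample = [Fraction(1), Fraction(1), Fraction(2, 3), Fraction(1, 4), Fraction(0)]
terms, leftover = partition_terms(chi_sample)
```

The partial sums satisfy $\sum_{n \le N} X_n = 1 - \chi_0 \cdots \chi_N$, so the sum equals 1 as soon as one cutoff vanishes, which mirrors the remark that only finitely many addends are nonzero.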
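We can write
\begin{align*}
& \sum_{n=1}^{\infty} X_n(\omega\cdot\nu;\varepsilon) \sum_{p=n}^{\infty} \sum_{j=0}^{n-1} M^{[j]}(\omega\cdot\nu;\varepsilon)\, u_{p,\nu} + \sum_{n=2}^{\infty} X_n(\omega\cdot\nu;\varepsilon) \sum_{p=0}^{n-1} \sum_{j=0}^{p-1} M^{[j]}(\omega\cdot\nu;\varepsilon)\, u_{p,\nu} \\
& \quad = \sum_{p=1}^{\infty} u_{p,\nu} \sum_{j=0}^{p-1} \sum_{n=j+1}^{p} M^{[j]}(\omega\cdot\nu;\varepsilon) X_n(\omega\cdot\nu;\varepsilon) + \sum_{p=1}^{\infty} u_{p,\nu} \sum_{j=0}^{p-1} \sum_{n=p+1}^{\infty} M^{[j]}(\omega\cdot\nu;\varepsilon) X_n(\omega\cdot\nu;\varepsilon) \\
& \quad = \sum_{n=1}^{\infty} u_{n,\nu} \sum_{j=0}^{n-1} M^{[j]}(\omega\cdot\nu;\varepsilon) \sum_{s=j+1}^{\infty} X_s(\omega\cdot\nu;\varepsilon) \\
& \quad = \sum_{n=1}^{\infty} u_{n,\nu} \sum_{j=0}^{n-1} M^{[j]}(\omega\cdot\nu;\varepsilon)\, \chi_0(|\omega\cdot\nu|) \cdots \chi_j(|i\omega\cdot\nu - \mathcal{M}^{[j-1]}(\omega\cdot\nu;\varepsilon)|) \\
& \qquad\qquad \times \sum_{s=j+1}^{\infty} \chi_{j+1}(|i\omega\cdot\nu - \mathcal{M}^{[j]}(\omega\cdot\nu;\varepsilon)|) \cdots \psi_s(|i\omega\cdot\nu - \mathcal{M}^{[s-1]}(\omega\cdot\nu;\varepsilon)|) \\
& \quad = \sum_{n=1}^{\infty} u_{n,\nu} \sum_{j=0}^{n-1} M^{[j]}(\omega\cdot\nu;\varepsilon)\, \chi_0(|\omega\cdot\nu|) \cdots \chi_j(|i\omega\cdot\nu - \mathcal{M}^{[j-1]}(\omega\cdot\nu;\varepsilon)|), \tag{8.8}
\end{align*}
where the identity (8.4) has been used in the last line (with the correct interpretation of the term with $s = j + 1$ explained after (8.4)). By the second definition in (6.3) one has
\[ \sum_{j=0}^{n-1} M^{[j]}(\omega\cdot\nu;\varepsilon)\, \chi_0(|\omega\cdot\nu|) \cdots \chi_j(|i\omega\cdot\nu - \mathcal{M}^{[j-1]}(\omega\cdot\nu;\varepsilon)|) = \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon), \tag{8.9} \]
so that, by inserting (8.8) in (8.6), after having used (8.9) we obtain
\begin{align*}
g(\omega\cdot\nu) \bigl[ \varepsilon ( Q^{-1} + Q u^2 ) \bigr]_\nu
&= g(\omega\cdot\nu) \sum_{n=0}^{\infty} \bigl( i\omega\cdot\nu - \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon) + \mathcal{M}^{[n-1]}(\omega\cdot\nu;\varepsilon) \bigr) u_{n,\nu} \\
&= g(\omega\cdot\nu) \sum_{n=0}^{\infty} (i\omega\cdot\nu)\, u_{n,\nu} = \sum_{n=0}^{\infty} u_{n,\nu} = u_\nu, \tag{8.10}
\end{align*}
so that (8.1) follows. Note that at each step only absolutely converging series have been dealt with, so that the above analysis is rigorous and not only formal.

Acknowledgement. I'm indebted to João C.A. Barata for interesting and clarifying discussions about his work [1], and to Daniel A. Cortez for comments on the manuscript.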
References

1. Barata, J.C.A.: On formal quasi-periodic solutions of the Schrödinger equation for a two-level system with a Hamiltonian depending quasi-periodically on time. Rev. Math. Phys. 12(1), 25–64 (2000)
2. Barata, J.C.A.: Convergent perturbative solutions of the Schrödinger equation for two-level systems with Hamiltonians depending periodically on time. Ann. Henri Poincaré 2(5), 963–1005 (2001)
3. Blekher, P.M., Jauslin, H.R., Lebowitz, J.L.: Floquet spectrum for two-level systems in quasiperiodic time-dependent fields. J. Statist. Phys. 68(1–2), 271–310 (1992)
4. Duclos, P., Šťovíček, P.: Floquet Hamiltonians with pure point spectrum. Commun. Math. Phys. 177(2), 327–347 (1996)
5. Duclos, P., Lev, O., Šťovíček, P., Vittot, M.: Weakly regular Floquet Hamiltonians with pure point spectrum. Rev. Math. Phys. 14(6), 531–568 (2002)
6. Eliasson, L.H.: Floquet solutions for the 1-dimensional quasiperiodic Schrödinger equation. Commun. Math. Phys. 146(3), 447–482 (1992)
7. Eliasson, L.H.: Ergodic skew systems on $\mathbb{T}^d \times SO(3, \mathbb{R})$. Preprint, ETH Zürich, 1991
8. Eliasson, L.H.: Absolutely convergent series expansions for quasi-periodic motions. Math. Phys. Electron. J. 2, paper 4, 1–33 (1996). Preprint, 1988
9. Feynman, R.P., Leighton, R.B., Sands, M.: The Feynman lectures on physics. Vol. 3: Quantum mechanics. Reading, MA: Addison-Wesley, 1965
10. Gallavotti, G.: Twistless KAM tori. Commun. Math. Phys. 164, 145–156 (1994)
11. Gallavotti, G., Gentile, G.: Hyperbolic low-dimensional invariant tori and summations of divergent series. Commun. Math. Phys. 227(3), 421–460 (2002)
12. Gentile, G.: Diagrammatic techniques in perturbation theory, and applications. In: Degasperis, A., Gaeta, G. (eds.) Proceedings of "Symmetry and Perturbation Theory II" (Rome, 16–22 December 1998). Singapore: World Scientific, 1999, pp. 59–78
13. Gentile, G.: Pure point spectrum for two-level systems in a strong quasi-periodic field. Preprint, 2003
14. Gentile, G., Mastropietro, V.: Methods for the analysis of the Lindstedt series for KAM tori and renormalizability in classical mechanics. A review with some applications. Rev. Math. Phys. 8, 393–444 (1996)
15. Krikorian, R.: Réductibilité presque partout des systèmes quasipériodiques dans le cas SO(3). C. R. Acad. Sci. Paris Sér. I Math. 321(8), 1039–1044 (1995)
16. Krikorian, R.: Réductibilité des systèmes produits-croisés à valeurs dans les groupes compacts. Astérisque 259, 1–216 (1999)
17. Krikorian, R.: Global density of reducible quasi-periodic cocycles on $\mathbb{T}^1 \times SU(2)$. Ann. of Math. 154(2), 269–326 (2001)
18. Nussenzveig, H.M.: Introduction to Quantum Optics. New York: Gordon and Breach, 1973
19. Whitney, H.: Analytic extensions of differentiable functions defined in closed sets. Trans. Am. Math. Soc. 36(1), 63–89 (1934)
20. Wreszinski, W.F., Casmeridis, S.: Models of two-level atoms in quasiperiodic external fields. J. Statist. Phys. 90(3–4), 1061–1068 (1998)

Communicated by G. Gallavotti
Commun. Math. Phys. 242, 251–275 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0944-z
Communications in
Mathematical Physics
Integrable Dynamics of Charges Related to the Bilinear Hypergeometric Equation Igor Loutsenko SISSA, Via Beirut 2-4, 34014 Trieste, Italy. E-mail: [email protected] Received: 20 November 2002 / Accepted: 28 May 2003 Published online: 26 September 2003 – © Springer-Verlag 2003
Abstract: A family of systems related to a linear and bilinear evolution of roots of polynomials in the complex plane is introduced. Restricted to the line, the evolution induces dynamics of the Coulomb charges (or point vortices) in external potentials, while its fixed points correspond to equilibriums of charges in the plane. The construction reveals a direct connection with the theories of the Calogero-Moser systems and Lie-algebraic differential operators. A study of the equilibrium configurations amounts in a construction (bilinear hypergeometric equation) for which the classical orthogonal and the Adler-Moser polynomials represent some particular cases. 1. Introduction In the present paper we propose to discuss a Bilinear Hypergeometric Operator 1 Hλ [f, g] = f g − 2f g + 2 g f P + f g + 2 g f P 2 + f g − g f U + λf g, P := P (z) = A + Bz + Cz2 , U := U (z) = a + bz, ∈ R, df (z) , f := f (z), g := g(z), f := dz and to study integrable dynamics: n d x = 2P (x ) − i dt i
1 + xi − x j
m
1 − U (xi ) − P (xi ), xi − y j 2 j =1,j =i j =1 m n 1 d − U (yi ) + P (yi ), − dt yi = 2P (yi ) yi − yj yi − x j 2 j =1,i=i
j =1
(1)
(2)
of roots $x_i$, $y_i$ of polynomials in a complex variable $z$
\[ q_m(z,t) = \prod_{i=1}^{m} (z - x_i(t)), \qquad p_n(z,t) = \prod_{i=1}^{n} (z - y_i(t)), \]
induced by the action of (1) pn
dqm dpn − qm = Hλnm [pn , qm ], λnm = (m − n) U + (n − m)P /2 . dt dt
(3)
The above construction has a nice physical interpretation: the fixed points of (2) correspond to equilibrium distributions of $n$ and $m$ Coulomb charges (or point vortices in hydrodynamics) of values $1$ and $-\Lambda$ respectively in external potentials on the plane or cylinder, while the real solutions describe their motion on the line or circle. It should, however, be noted that the physical analogies are not complete, because $dx_i^*/dt$ and $dy_i^*/dt$ should appear in place of $dx_i/dt$ and $dy_i/dt$ in the lhs of (2). Nevertheless, all equilibrium solutions in the plane as well as time dependent solutions on the real line are the same for (2) and corresponding physical systems.

The electrostatic interpretation for roots goes back to works by Stieltjes on the classical orthogonal [20], and by Bartman on the Adler-Moser polynomials [4]. These results represent special cases in our construction.

The Bilinear Hypergeometric Equation
\[ H_\lambda[f, g](z) = 0 \tag{4} \]
is a natural extension of the Gauss hypergeometric equation
\[ H_\lambda[f, 1](z) = P(z) f''(z) + \Bigl( U(z) + \frac{1}{2} P'(z) \Bigr) f'(z) + \lambda f(z) = 0 \]
and the recurrent relation
\[ H^{1}_{0}[\theta_i, \theta_{i+1}](z) = \theta_{i+1}(z)\, \theta_i''(z) - 2\, \theta_i'(z)\, \theta_{i+1}'(z) + \theta_{i+1}''(z)\, \theta_i(z) = 0, \qquad P = 1, \quad U = 0 \]
for the Adler-Moser polynomials $\theta_i(z)$ [1]. Another special case $\Lambda = 1$, $P(z) = -z^2$, $U = 0$ of (4) provides an interpretation for Huygensian polynomials in two variables studied by Y. Berest and the author in connection with the Hadamard problem [6].

The paper is organized as follows: The principal results on the bilinear evolution of polynomials are gathered in the next section. Starting from the linear evolution as its particular case (in Subsect. 2.1) we show that such an evolution corresponds to dynamics of pairwise interacting charges iff it is induced by second order Lie-algebraic (hypergeometric or "quasi exactly solvable") differential operators. The system of charges then can be embedded into an integrable Hamiltonian (in general, elliptic Calogero-Moser or Inozemtsev) model related to a Coxeter root system. Skipping consideration of the "quasi" and elliptic cases in the sequel, we classify the remaining cases (in Subsect. 2.2) by types of corresponding hypergeometric equations. Subsection 2.3 is devoted to the study of the bilinear evolution in the special case $\Lambda = 1$: Introducing the bilinear hypergeometric operator, we show that
its action induces dynamics, which, in some settings, can be interpreted as a motion of unit positive and negative Coulomb charges (or a system of vortices in hydrodynamics) in external potentials. We find that it can be embedded in a flow generated by a sum of two independent Calogero(Sutherland)-Moser Hamiltonians. In this way (2) can be integrated by the Lax method if $\Lambda = 1$. Returning to generic two component dynamics in Subsect. 2.4, we arrive at the principal point of the article stated as a conjecture about integrability of (2) for an arbitrary real $\Lambda$. Some arguments in its favor are presented.

Section 3 is devoted to the study of special limits of the bilinear operator and fixed points of evolution in the case $\Lambda = 1$: Considering polynomial solutions to $H^{1}_{\lambda}[f, g] = 0$ in Subsect. 3.1, we analyze equilibrium configurations of charges. It turns out that such solutions can be obtained from associated linear problems by a finite number of the Darboux transformations. In Subsect. 3.2 we discuss degenerate limits of the bilinear dynamics related to the Kadomtsev-Petviashvili equation and give an interpretation to a set of algebraic solutions of a particular type of (4), obtained earlier in connection with the Hadamard problem in two dimensions.

In Sect. 4 we introduce multi-linear ($l$-linear) hypergeometric operators and related dynamical systems, with (1)–(3) presenting a special case $l = 2$. In this picture, $l$ distinct types of charges move in external potentials, interacting with each other. Such a dynamics can be embedded in a Hamiltonian system of $l$ species of particles of a Calogero-Moser type. In contrast with the $l = 2$, $\Lambda = 1$ case, there is no separation of the Hamiltonian flow in independent components and the Calogero(Sutherland)-Moser type potentials are not related to the Coxeter root systems.

Some open questions are discussed in the concluding section of the paper.

2. Integrable Dynamics Induced by Linear and Bilinear Evolution

2.1. Linear evolution.
This section, which generalizes works by Choodnovsky & Choodnovsky [10] and Calogero [8], is devoted to the study of a particular case of the bilinear evolution. Let
$$V = \mathrm{Span}\{1, z, \dots, z^{n-1}, z^n\} \tag{5}$$
be the linear space of polynomials in z of degree less than or equal to n over the complex numbers C, $V \cong \mathbf{C}^{n+1}$. We consider the evolution of polynomials p(z, t),
$$\frac{dp(z,t)}{dt} = L[p(z,t)], \qquad p(z,t) = T(t)\prod_{i=1}^{n}(z - z_i(t)) \in V, \tag{6}$$
under the action of a time independent linear operator L ∈ End(V, V). Rewriting (6) in terms of the roots $z_i(t)$, i = 1 . . . n, and the common factor T(t), we arrive at the following

Lemma 1. The linear evolution equation (6) is equivalent to the following dynamical system
$$T^{-1}\frac{d}{dt}T = \tau(z_1 \dots z_n), \tag{7a}$$
$$\frac{d}{dt}z_i = v(z_i \,|\, z_1, z_2, \dots, \hat z_i, \dots, z_{n-1}, z_n), \qquad i = 1 \dots n, \tag{7b}$$
I. Loutsenko
where v is a rational function symmetric in the last n − 1 variables (all but $z_i$; the hat in (7b) denotes omission of $z_i$).

Proof. Representing L in the matrix form
$$L[z^i] = \sum_{j=0}^{n} L_{ij} z^j, \qquad L_{ij} \in \mathbf{C}, \tag{8}$$
we equate the lhs and the rhs coefficients at different powers of z in (6). Equation (7a) is obtained by picking out a coefficient at $z^0 = 1$. Expressing dT/dt from (7a) and substituting it into the rest of the equations we get:
$$\frac{d\sigma_i}{dt} = f_i(z_1 \dots z_n), \qquad i = 1 \dots n, \tag{9}$$
where $f_i$ are polynomials symmetric in $z_i$, i = 1..n, and $\sigma_i$ stand for the elementary symmetric polynomials $\sigma_0(z_1 \dots z_n) = 1$, $\sigma_1(z_1 \dots z_n) = \sum_{i=1}^{n} z_i$, $\sigma_2(z_1 \dots z_n) = \sum_{i<j} z_i z_j$, . . . . Equation (9) is a linear system for $dz_i/dt$ with determinant $\prod_{i<j}(z_i - z_j)$. It is not singular, provided all $z_i$ are distinct, and (9) has a unique solution. Since $f_i$ are symmetric in all arguments, this completes the proof.

We call (7) a system with two body interactions if
$$v(z_i \,|\, z_1, z_2, \dots, \hat z_i, \dots, z_{n-1}, z_n) = \sum_{j\in N,\, j\ne i} w(z_i, z_j) + u(z_i). \tag{10}$$
In (10) and in the sequel the following notations N := {1, 2, ...n}, M := {1, 2, ...m} are used.

Theorem 1. A system generated by (6) for n > 2 is a system with the two-body interaction if and only if L is (modulo adding a constant) a second order differential operator,
$$L = P(z)\frac{d^2}{dz^2} + U(z)\frac{d}{dz} - \frac{n}{2}U'(z) - \frac{n(n-1)}{6}P''(z),$$
$$P(z) = A + Bz + Cz^2 + Dz^3 + Ez^4, \qquad U(z) = a + bz + cz^2 - 2(n-1)Ez^3, \tag{11}$$
with polynomial coefficients P(z) and U(z) of degree at most four and three respectively. Under condition (11), (7) becomes
$$\frac{dz_i}{dt} = -2P(z_i)\sum_{j\in N,\, j\ne i}\frac{1}{z_i - z_j} - U(z_i). \tag{12}$$
In other words, (12) is the most general system with two body interactions induced by a linear evolution in the polynomial space. Remark. The cases n = 1, 2 are excluded from the theorem, since any linear operator induces a two-body dynamics.
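The passage from (6) to (12) can be checked on a small instance. The sketch below is only an illustration under stated assumptions: it takes T = 1, evaluates dp/dt = L[p] at a root $z_i$ (where the zeroth order part of L drops out and dp/dt reduces to $-\dot z_i\, p'(z_i)$), and compares the resulting velocity with the two-body form (12). The function names and the sample roots and coefficients are hypothetical choices, not taken from the paper.

```python
from fractions import Fraction

def poly_from_roots(roots):
    """Coefficients (lowest degree first) of the monic polynomial prod (z - r)."""
    coeffs = [Fraction(1)]
    for r in roots:
        nxt = [Fraction(0)] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k + 1] += c      # contribution of z * (c z^k)
            nxt[k] -= r * c      # contribution of -r * (c z^k)
        coeffs = nxt
    return coeffs

def derivative(coeffs):
    """Coefficients of the derivative polynomial."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, z):
    return sum(c * z**k for k, c in enumerate(coeffs))

def velocity_from_operator(roots, i, P, U):
    """dz_i/dt read off from dp/dt = L[p] at z = z_i (p(z_i) = 0, T = 1)."""
    p = poly_from_roots(roots)
    dp = derivative(p)
    d2p = derivative(dp)
    zi = roots[i]
    return -(P(zi) * evaluate(d2p, zi) + U(zi) * evaluate(dp, zi)) / evaluate(dp, zi)

def velocity_two_body(roots, i, P, U):
    """Right-hand side of (12)."""
    zi = roots[i]
    pair_sum = sum(Fraction(1) / (zi - zj) for j, zj in enumerate(roots) if j != i)
    return -2 * P(zi) * pair_sum - U(zi)

# Hypothetical sample data: n = 4 distinct rational roots, E = 3 in (11).
n = 4
roots = [Fraction(0), Fraction(1), Fraction(-2), Fraction(5, 3)]
P = lambda z: 1 + 2 * z - z**2 + 3 * z**4
U = lambda z: 1 - z + 2 * z**2 - 2 * (n - 1) * 3 * z**3

for i in range(n):
    print(i, velocity_from_operator(roots, i, P, U), velocity_two_body(roots, i, P, U))
```

Exact rational arithmetic is used so that the two expressions can be compared for equality rather than up to rounding.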
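Proof. Introducing the following set of differential operators
$$\mathcal{L}_{ij} = \frac{(-1)^{n-i} z^j}{i!\,(n-i)!}\prod_{k=1}^{n-i}\left(z\frac{d}{dz} - k\right)\frac{d^i}{dz^i}, \qquad \mathcal{L}_{ij}[z^k] = \delta_{ik} z^j, \qquad 0 \le i, j, k \le n,$$
we see from (8) that any L ∈ End(V, V) can be represented by a differential operator with polynomial coefficients $Q_i(z)$,
$$L = \sum_{i=0}^{n}\sum_{j=0}^{n} L_{ij}\mathcal{L}_{ij} = \sum_{i=0}^{n} Q_i(z)\frac{d^i}{dz^i}. \tag{13}$$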
Substituting (13) into (6), using (7a) and imposing condition (10) we have:
$$Q := \tau(z_1, z_2, \dots) - Q_0(z) - \sum_i \left(Q_1(z) - u(z_i)\right)t_i - 2\sum_{i<j}\left(Q_2(z) - w(z_i, z_j)\right)t_i t_j - 6\sum_{i<j<k} Q_3(z)\,t_i t_j t_k - \dots = 0,$$
where $t_i = 1/(z - z_i)$. Taking the (n + 1)th partial derivative
$$\frac{\partial^{n+1} Q}{\partial z\,\partial z_1\,\partial z_2 \dots \partial z_{n-1}\,\partial z_n} = 0,$$
we get the following equation
$$(t_1 \dots t_n)^2\left(2Q_n(z)\sum_{i\in N} t_i - Q_n'(z)\right) = 0.$$
Picking out the coefficients at different powers of $t_i$, i ∈ N, we find that $Q_n(z) = 0$, since $z, t_1, \dots, t_n$ is a set of independent variables. Proceeding by induction and taking the n, n − 1, . . . , 4th derivatives of Q,
$$\frac{\partial^{j+1} Q}{\partial z\,\partial z_{i_1}\,\partial z_{i_2} \dots \partial z_{i_j}} = 0, \qquad i_1 < i_2 < \dots < i_j, \qquad 2 < j < n,$$
we eliminate all $Q_i$ with i > 2. Hence, the operator L is at most of the second order. Decomposing L into the sum of homogeneous components
$$L = \sum_i L_i, \qquad L_i : \mathbf{C}z^j \to \mathbf{C}z^{i+j}, \qquad L_i = a_i z^{i+2}\frac{d^2}{dz^2} + b_i z^{i+1}\frac{d}{dz} + c_i z^i,$$
and remembering that V is spanned by $z^i$, i = 0..n only, we have
$$L_i[z^{n-2}] = L_i[z^{n-1}] = L_i[z^n] = 0 \quad \text{for } i > 2.$$
It is immediate that
$$a_i = b_i = c_i = 0 \quad \text{for } i > 2.$$
Proceeding in this way, we impose similar conditions for i = 1 and i = 2, $L_2[z^{n-1}] = L_2[z^n] = L_1[z^n] = 0$, getting the most general expression (11) for a second order differential operator L ∈ End(V, V).
The sufficient condition of the theorem is proved by direct calculation: Substituting (12) and
$$T^{-1}\frac{d}{dt}T = L_{nn} + L_{n-1,n}\sum_{i=1}^{n} z_i + L_{n-2,n}\sum_{i<j} z_i z_j$$
into (11), expressing the matrix elements $L_{ij}$ in terms of a, b, c, A, B, C, D, E, n and using the following identity:
$$\frac{1}{(z - z_i)(z - z_j)} = \frac{1}{z_i - z_j}\left(\frac{1}{z - z_i} - \frac{1}{z - z_j}\right), \tag{14}$$
we show that (6) holds identically. This completes the proof.
Although (12) is not a Hamiltonian system, the following proposition holds.

Proposition 1. Equation (12) is a trajectory of a system with the Hamiltonian
$$H = \sum_{i\in N}\frac{1}{2P(z_i)}\left(\frac{dz_i}{dt}\right)^2 - V,$$
$$V = 2\sum_{i<j}\left[\frac{P(z_i) + P(z_j)}{(z_i - z_j)^2} + (n-2)\left(E(z_i + z_j)^2 + D(z_i + z_j)\right) + \frac{U(z_i) - U(z_j)}{z_i - z_j}\right] + \frac12\sum_i\frac{U^2(z_i)}{P(z_i)}. \tag{15}$$
In other words, the Hamiltonian equations of motion
$$\frac{1}{P(z_i)}\frac{d^2 z_i}{dt^2} - \frac{P'(z_i)}{2P(z_i)^2}\left(\frac{dz_i}{dt}\right)^2 = \frac{\partial V}{\partial z_i}, \qquad i \in N, \tag{16}$$
are corollaries of (12).

Remark. Equations (16) are reduced to the Newtonian form
$$\frac{d^2\phi_i}{dt^2} = \frac{\partial V}{\partial \phi_i}, \qquad i \in N, \tag{17}$$
by the change of variables
$$\phi_i = \zeta(z_i), \qquad \frac{d\zeta(z)}{dz} = \frac{1}{\sqrt{P(z)}}. \tag{18}$$
Proof of Proposition 1. Expressing second derivatives through (12),
$$\frac{d^2 z_i}{dt^2} = \sum_{j\in N}\frac{dz_j}{dt}\frac{\partial}{\partial z_j}\frac{dz_i}{dt},$$
by direct calculations we evaluate the lhs of (16):
$$\mathrm{lhs}(16) = -\frac{\partial}{\partial z_i}\left[2\sum_{i,\,n\ne i,\,j\ne i}\frac{P(z_i)}{(z_j - z_i)(z_n - z_i)} + \sum_{i\ne j}\frac{U(z_i) - U(z_j)}{z_i - z_j} + \frac12\sum_i\frac{U^2(z_i)}{P(z_i)}\right].$$
Using the identity
$$\sum_{i,\,n\ne i,\,j\ne i}\frac{P(z_i)}{(z_j - z_i)(z_n - z_i)} = \sum_{i<j}\left[\frac{P(z_i) + P(z_j)}{(z_i - z_j)^2} + (n-2)\left(E(z_i + z_j)^2 + D(z_i + z_j) + \frac23 C\right)\right],$$
we get (15), which completes the proof.
2.2. System classification, Lie-algebraic operators and Calogero-Moser models. Consider the n + 1 dimensional representation of the Lie algebra sl(2, C),
$$[J^0, J^\pm] = \pm J^\pm, \qquad [J^-, J^+] = 2J^0,$$
by differential operators
$$J^+ = z^2\frac{d}{dz} - nz, \qquad J^0 = z\frac{d}{dz} - \frac{n}{2}, \qquad J^- = \frac{d}{dz} \tag{19}$$
acting on (5). Operator (11) is an element of the universal enveloping algebra of sl(2, C),
$$L = \sum \alpha_{\upsilon\varsigma} J^\upsilon J^\varsigma + \sum \beta_\varsigma J^\varsigma, \qquad \upsilon, \varsigma = \pm, 0.$$
Such Lie-algebraic operators are called "quasi-exactly solvable" [21]. They can be separated into nine nontrivial equivalence classes under the linear-fractional transformations of the independent variable z. Based on the invariant-theoretic classification of canonical forms for quartic polynomials [12], operators (11) can be placed in the nine canonical forms with

(1) $P(z) = 1$
(2) $P(z) = z$
(3) $P(z) = -z^2$
(4, 5) $P(z) = 1 \mp z^2$
(6, 7) $P(z) = (1 \mp z^2)^2$
(8, 9) $P(z) = z^4 + \tau z^2 \mp 1$
(20)
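The commutation relations of the representation (19) can be verified mechanically on the monomial basis of (5). The following is a minimal sketch, not part of the original argument: polynomials are stored as coefficient lists (lowest degree first), the value n = 4 is an arbitrary choice, and all helper names are hypothetical.

```python
from fractions import Fraction

N_DEG = 4  # the integer n in (19); arbitrary choice for this illustration

def trim(p):
    """Drop trailing zero coefficients so that equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    m = max(len(p), len(q))
    p = list(p) + [Fraction(0)] * (m - len(p))
    q = list(q) + [Fraction(0)] * (m - len(q))
    return trim([a + b for a, b in zip(p, q)])

def scale(c, p):
    return trim([c * a for a in p])

def times_z(p, power=1):
    return trim([Fraction(0)] * power + list(p))

def ddz(p):
    return trim([k * a for k, a in enumerate(p)][1:])

def J_plus(p):   # z^2 d/dz - n z
    return add(times_z(ddz(p), 2), scale(-N_DEG, times_z(p)))

def J_zero(p):   # z d/dz - n/2
    return add(times_z(ddz(p)), scale(Fraction(-N_DEG, 2), p))

def J_minus(p):  # d/dz
    return ddz(p)

def commutator(A, B, p):
    return add(A(B(p)), scale(-1, B(A(p))))

def monomial(k):
    return [Fraction(0)] * k + [Fraction(1)]

for k in range(N_DEG + 1):
    zk = monomial(k)
    print(k, commutator(J_minus, J_plus, zk) == scale(2, J_zero(zk)))
```

Note that J_plus annihilates the top monomial $z^n$, which is why the operators (19) preserve the space (5).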
Most general Hamiltonian systems (15) in this classification are elliptic Inozemtsev models (for trigonometric and rational Inozemtsev models see e.g. [15, 19]) related to $A_n$ and $BC/D_n$ Coxeter root systems. Skipping analysis of "quasi" and elliptic cases in the sequel, we content ourselves with "exactly solvable" hypergeometric operators dealing with the first five classes only (and linear U(z) = a + bz) in (20). Figure 1 gathers needed information on the first four (the fifth one is a hyperbolic version of the fourth) classes, which are related to the rational/trigonometric Calogero(Sutherland)-Moser systems.

[Fig. 1: table with columns P(z); z = z(φ) (18); Root System; Calogero-Moser Potential V(z(φ)), (15), (17); Polynomial Eigenfunctions of L (11). Rows: P = 1, z = φ, $A_n$, Hermite; P = z, z = φ²/4, $BC/D_n$, Laguerre; P = −z², z = exp(iφ), $A_n$; P = 1 − z², z = cos(φ), $BC/D_n$, Jacobi.]
Fig. 1. Four generic classes of Hypergeometric systems

2.3. Bilinear evolution, Λ = 1. In this section we introduce a special case $H_\lambda[\cdot, \cdot] := H^1_\lambda[\cdot, \cdot]$ of the Bilinear Hypergeometric Operator (1) and study related integrable dynamics of roots. Let $V_1 \cong \mathbf{C}^{n+1}$, $V_2 \cong \mathbf{C}^{m+1}$, $V_3 \cong \mathbf{C}^{m+n}$ be linear spaces of polynomials of degree less than or equal to n, m and n + m − 1 respectively. Consider the evolution
$$q\frac{dp}{dt} - p\frac{dq}{dt} = H_\lambda[p, q], \qquad p \in V_1,\ q \in V_2 \tag{21}$$
under the action of a bilinear operator $H_\lambda : V_1 \times V_2 \to V_3$ on the monic polynomials
$$p = \prod_{i\in N}(z - x_i(t)), \qquad q = \prod_{i\in M}(z - y_i(t)) \tag{22}$$
of the nth and mth degrees.
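Lemma 2. The bilinear evolution equation (21) is equivalent to the following dynamical system:
$$\frac{d}{dt}x_i = v_1(x_i \,|\, x_1..\hat x_i..x_n \,|\, y_1..y_m), \quad i = 1 \dots n, \qquad \frac{d}{dt}y_i = v_2(y_i \,|\, y_1..\hat y_i..y_m \,|\, x_1..x_n), \quad i = 1 \dots m,$$
where $v_1$ and $v_2$ are rational functions ($v_1$ is symmetric in $x_1..\hat x_i..x_n$ and in y, $v_2$ is symmetric in $y_1..\hat y_i..y_m$ and in x).

Proof. Similar to the linear case (Lemma 1), except that the polynomials are now monic. dx/dt and dy/dt are uniquely expressed from a linear system of equations, which is nonsingular provided that all roots are distinct and p and q have no common roots.

In analogy with (10), in a system with two-body interactions the velocities are sums of pairwise terms, e.g.
$$v_2(y_i \,|\, \hat y \,|\, x) = \sum_{j\ne i} w_{22}(y_i, y_j) + \sum_{j} w_{21}(y_i, x_j).$$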
As in the linear case, it is natural to look for differential operators inducing integrable dynamics in a system with two-body interactions. It turns out that such operators exist and are extensions of the linear case.

Proposition 2. The bilinear operator $H_{\lambda_{nm}} : V_1 \times V_2 \to V_3$,
$$H_{\lambda_{nm}}[p, q] = (p''q - 2p'q' + pq'')P + \frac12(p'q + pq')P' + (p'q - q'p)U + \lambda_{nm}\,pq, \tag{23}$$
$$P(z) = A + Bz + Cz^2, \qquad U(z) = a + bz, \qquad \lambda_{nm} = (m - n)\left(U' + (n - m)P''/2\right),$$
induces dynamics
$$\frac{d}{dt}x_i = 2P(x_i)\left(-\sum_{j=1,\,j\ne i}^{n}\frac{1}{x_i - x_j} + \sum_{j=1}^{m}\frac{1}{x_i - y_j}\right) - U(x_i) - \frac12 P'(x_i),$$
$$\frac{d}{dt}y_i = 2P(y_i)\left(\sum_{j=1,\,j\ne i}^{m}\frac{1}{y_i - y_j} - \sum_{j=1}^{n}\frac{1}{y_i - x_j}\right) - U(y_i) + \frac12 P'(y_i) \tag{24}$$
by action (21) on (22).

Remark. Equations (21), (23) may be written in the form of the Schrödinger evolution equation with a time-dependent potential
$$\frac{d\psi}{dt} = P(z)\psi'' + \left(U + \tfrac12 P'\right)\psi' + \left(P'(\ln q)' - 2P(\ln q)'' + \lambda\right)\psi, \tag{25}$$
where ψ = p/q. In this setting, we study the time evolution of the polynomial q and a rational function ψ with denominator q, which is a rather inconvenient formulation for our purposes.

Remark. Substituting P(z) = 1, U(z) = −k (dk/dz = 0), and λ = 0 in (25) and re-expressing it in the formally self-adjoint form we obtain the non-stationary Schrödinger equation
$$\frac{d\Psi}{dt} = \Psi'' + U\Psi, \qquad U = -2(\log\tau)'', \qquad \tau = q, \tag{26}$$
which is a second equation of an auxiliary linear problem for the Kadomtsev-Petviashvili hierarchy [18] (with q as a τ-function). The solution to (26) is now a quasi-rational function
$$\Psi = \frac{p}{q}\exp(kz + k^2 t).$$
We observe similarities with the Krichever construction [16, 17] for the rational Baker-Akhiezer function. In more detail, the Baker-Akhiezer function Ψ(z, t, k) is a special n = deg(p(z)) = m = deg(q(z)) case of the above quasi-rational function
$$\Psi = \left(1 + \sum_{i=1}^{n}\frac{\eta_i(t, k)}{z - x_i(t)}\right)\exp\left(kz + k^2 t\right)$$
with a divisor of simple poles defined at points $x_i$, i ∈ N, N = M.
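Proof of Proposition 2. Essentially similar to the proof of the sufficient condition of Theorem 1: one substitutes (24) in (23) and uses identity (14).

Similarly to the linear case, equations of motion (24) can be expressed in the Newtonian coordinates (18):
$$\frac{d\phi_i}{dt} = -\frac{\partial H}{\partial \phi_i}, \qquad \frac{d\theta_j}{dt} = \frac{\partial H}{\partial \theta_j},$$
$$H = \sum_{i<j\in N} w(\phi_i, \phi_j) + \sum_{i<j\in M} w(\theta_i, \theta_j) - \sum_{i\in N,\, j\in M} w(\phi_i, \theta_j) + \sum_{i\in N} u(\phi_i) - \sum_{i\in M} u(\theta_i), \tag{27}$$
$$x_i = \zeta(\phi_i), \quad y_j = \zeta(\theta_j), \quad i \in N,\ j \in M,$$
$$w(\phi, \theta) = \ln\frac{(\zeta(\phi) - \zeta(\theta))^2}{P(\zeta(\phi))P(\zeta(\theta))}, \qquad P(z)\frac{du(\phi(z))}{dz} = U(z) - \frac{m - n}{2}P'(z). \tag{28}$$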
Should we have $d\phi_i^*/dt$, $d\theta_i^*/dt$ instead of $d\phi_i/dt$ and $d\theta_i/dt$ in the lhs of the equations of motion, the system (27) would be a Hamiltonian system of n positive and m negative vortices or Coulomb charges on the plane or cylinder [2, 3]. It is not Hamiltonian in our case, but has the same fixed points in the plane or cylinder and dynamics on the real line or circle.

Let us discuss the question of integrability of (24). In the linear case, which is a particular case m = 0 of (21), there were two lines of approach to the integration of system (12): In the first, the linear dynamics (6) in the finite basis (5) allowed us to find p(z, t) (and $z_i(t)$) at any t. The second way to find $z_i$ was to solve the Hamiltonian system (17) with initial conditions given by (12) itself. Obviously, the first of the above approaches does not apply in the bilinear case, since the evolution is not linear any more. Therefore, we use the second method, trying to embed (24) into a Hamiltonian system.

Lemma 3. If the odd function Φ(x) satisfies the functional equation
$$\Phi(x)\Phi(y) + \Phi(z)\Phi(x) + \Phi(y)\Phi(z) = 0 \tag{29}$$
whenever x + y + z = 0, then the following identities hold:
$$I_1(x) = \sum_{n\in N}\ \sum_{i\in N,\, i\ne n}\ \sum_{j\in N,\, j\ne n}\Phi(x_n - x_i)\Phi(x_n - x_j) = 2\sum_{i<j\in N}\Phi(x_i - x_j)^2, \tag{30}$$
$$I_2(x, y) = 2\sum_{m\in M}\sum_{i\in N}\sum_{j\in N,\, j\ne i}\Phi(x_j - y_m)\Phi(x_i - x_j) - 2\sum_{m\in N}\sum_{i\in M}\sum_{j\in M,\, j\ne i}\Phi(y_j - x_m)\Phi(y_i - y_j)$$
$$\qquad + \sum_{m\in N}\sum_{i\in N}\sum_{j\in M}\Phi(y_j - x_m)\Phi(x_i - y_j) - \sum_{m\in M}\sum_{i\in M}\sum_{j\in N}\Phi(x_j - y_m)\Phi(y_i - x_j) = 0. \tag{31}$$
Proof. The proof is a calculation.
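As an illustration of the calculation, the identities (30) and (31) can be checked in exact arithmetic for the rational choice 1/x of the odd function in Lemma 3 (written `phi` below; the name, the sample points and the helper functions are assumptions made for this sketch only). For this choice the functional equation (29) reads 1/(xy) + 1/(zx) + 1/(yz) = (x + y + z)/(xyz) = 0.

```python
from fractions import Fraction
from itertools import combinations

def phi(u):
    """Rational solution of (29): an odd function with phi(x)phi(y)+phi(z)phi(x)+phi(y)phi(z)=0 when x+y+z=0."""
    return Fraction(1) / u

def I1_lhs(x):
    """Triple sum on the left of (30)."""
    idx = range(len(x))
    return sum(phi(x[n] - x[i]) * phi(x[n] - x[j])
               for n in idx for i in idx if i != n for j in idx if j != n)

def I1_rhs(x):
    """Pair sum on the right of (30)."""
    return 2 * sum(phi(a - b) ** 2 for a, b in combinations(x, 2))

def I2(x, y):
    """The four triple sums of (31), in the order in which they are written."""
    N, M = range(len(x)), range(len(y))
    t1 = 2 * sum(phi(x[j] - y[m]) * phi(x[i] - x[j])
                 for m in M for i in N for j in N if j != i)
    t2 = -2 * sum(phi(y[j] - x[m]) * phi(y[i] - y[j])
                  for m in N for i in M for j in M if j != i)
    t3 = sum(phi(y[j] - x[m]) * phi(x[i] - y[j])
             for m in N for i in N for j in M)
    t4 = -sum(phi(x[j] - y[m]) * phi(y[i] - x[j])
              for m in M for i in M for j in N)
    return t1 + t2 + t3 + t4

# Hypothetical sample configuration with pairwise distinct points.
x = [Fraction(0), Fraction(1), Fraction(3)]
y = [Fraction(1, 2), Fraction(-2)]

print(I1_lhs(x), I1_rhs(x), I2(x, y))
```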
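Theorem 2. Equation (24) can be embedded into the flow generated by the sum of two independent Hamiltonians
$$H = H_+ + H_-,$$
$$H_+ = \sum_{i\in N}\frac{1}{2P(x_i)}\left(\frac{dx_i}{dt}\right)^2 - V_+(x), \qquad H_- = -\sum_{i\in M}\frac{1}{2P(y_i)}\left(\frac{dy_i}{dt}\right)^2 + V_-(y), \tag{32}$$
$$V_\pm(z) = 2\sum_{i<j}\frac{P(z_i) + P(z_j)}{(z_i - z_j)^2} + \frac12\sum_i\frac{U_\pm^2(z_i)}{P(z_i)}, \qquad U_\pm(z) = U(z) \pm \frac12 P'(z).$$
In other words, the Hamiltonian equations of motion
$$\frac{d^2}{dt^2}\phi_i = \frac{\partial V_+(\zeta(\phi))}{\partial \phi_i}, \qquad \frac{d^2}{dt^2}\theta_i = \frac{\partial V_-(\zeta(\theta))}{\partial \theta_i}, \tag{33}$$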
where ζ is given by (18), are corollaries of (24).

Remark. Some results related to the special case P(z) = 1 of Theorem 2 (rational $A_n$ root system) were obtained by Veselov [22], who studied rational solutions of the Kadomtsev-Petviashvili equation. In particular, it was found that poles of (unbounded at infinity) rational solutions of the KP equation (which are coordinates $x_i$, i ∈ N in our case) move under the Calogero-Moser flow with nonzero external potential. It is interesting to note that the non-degenerate external potentials $U_+$ and $U_-$ coincide only in the above mentioned special case.

Proof of Theorem 2. Let us check that the Hamiltonian equation of motion for $x_i$,
$$\frac{1}{P(x_i)}\frac{d^2 x_i}{dt^2} - \frac{P'(x_i)}{2P(x_i)^2}\left(\frac{dx_i}{dt}\right)^2 = \frac{\partial}{\partial x_i}V_+, \tag{34}$$
holds, expressing the second derivatives through (2):
$$\frac{d^2 x_i}{dt^2} = \sum_{j=1}^{n}\frac{dx_j}{dt}\frac{\partial}{\partial x_j}\frac{dx_i}{dt} + \sum_{j=1}^{m}\frac{dy_j}{dt}\frac{\partial}{\partial y_j}\frac{dx_i}{dt}.$$
By direct calculations we get
$$\mathrm{lhs}(34) = -\frac{\partial}{\partial x_i}\left(2W_1(A, B, C|x, y) + 4W_2(A, B, C|x) + \frac12\frac{U_+(x_i)^2}{P(x_i)}\right),$$
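where
$$W_1(A, B, C|x, y) = 2\sum_{m\in M}\sum_{i\in N}\sum_{j\in N,\, j\ne i}\frac{P(x_j)}{(x_j - y_m)(x_i - x_j)} - 2\sum_{m\in N}\sum_{i\in M}\sum_{j\in M,\, j\ne i}\frac{P(y_j)}{(y_j - x_m)(y_i - y_j)}$$
$$\qquad + \sum_{m\in N}\sum_{i\in N}\sum_{j\in M}\frac{P(y_j)}{(y_j - x_m)(x_i - y_j)} - \sum_{m\in M}\sum_{i\in M}\sum_{j\in N}\frac{P(x_j)}{(x_j - y_m)(y_i - x_j)} + \sum_{i\in M}\sum_{j\in N}\frac{P'(y_i)}{y_i - x_j},$$
$$W_2(A, B, C|x) = \sum_{n\in N}\ \sum_{i\in N,\, i\ne n}\ \sum_{j\in N,\, j\ne n}\frac{P(x_n)}{(x_n - x_i)(x_n - x_j)}$$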
and $P(z) = A + Bz + Cz^2$. Let us evaluate
$$W_1(A, B, C|x, y) = AW_1(1, 0, 0|x, y) + BW_1(0, 1, 0|x, y) + CW_1(0, 0, 1|x, y).$$
In $W_1(1, 0, 0|x, y)$ we immediately recognize identity (31) with Φ(x) = 1/x. Consequently $W_1(1, 0, 0|x, y) = I_2(x, y) = 0$. Changing variables $x_i = \exp(\phi_i)$, $y_i = \exp(\theta_i)$ we find that
$$W_1(0, 0, 1|x, y) = I_2(\phi, \theta) + Anm(1 - n + m) = Anm(1 - n + m), \qquad \Phi(x) = \coth(x).$$
Finally, using linearity of $W_1$ with respect to the parameters A, B, C we write
$$W_1(0, 1, 0|x, y) = W_1(0, 0, 1|x + \tfrac12, y + \tfrac12) - W_1(0, 0, 1|x, y) - \tfrac14 W_1(1, 0, 0|x, y) = 0.$$
Therefore $W_1(A, B, C|x, y) = Anm(1 - n + m)$. Applying (30), we evaluate $W_2$ in a similar way, getting (34) with $V_+$ given in (32). The proof is completed by applying a similar procedure to $y_i$.

Corollary 1. Equation (24) is integrated by the Lax method.

Proof. Since (32) are Calogero(Sutherland)-Moser Hamiltonians, equations of motion (33) can be represented in the Lax form [19]. Thus solutions to (24) may be found from (32) subject to initial conditions given by (24) itself.

2.4. General bilinear dynamics for arbitrary Λ, evidences of integrability. Let us concentrate on the general bilinear hypergeometric operator (1)–(3), restricting ourselves to the rational case
$$P(z) = i, \qquad U(z) = i\omega z, \qquad i := \sqrt{-1}.$$
Such a choice of coefficients leads to a dynamical system (see (2))
$$i\frac{d}{dt}x_j = 2\left(-\sum_{k\in N,\, k\ne j}\frac{1}{x_k - x_j} - \sum_{k\in M}\frac{\Lambda}{x_j - y_k}\right) + \omega x_j, \qquad j \in N,$$
$$i\frac{d}{dt}y_j = 2\left(-\sum_{k\in M,\, k\ne j}\frac{\Lambda}{y_j - y_k} + \sum_{k\in N}\frac{1}{y_j - x_k}\right) + \omega y_j, \qquad j \in M \tag{35}$$
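of n and m particles of two different types. According to (1)–(2), the dynamical system (35) is a corollary of the evolution equation (Eqs. (35)–(37) are corollaries of Proposition 8 and Theorem 3 of Sect. 4)
$$iq\frac{dp}{dt} - i\Lambda p\frac{dq}{dt} = p''q - 2\Lambda p'q' + \Lambda^2 pq'' + \omega z\left(p'q - \Lambda pq'\right) + \omega(\Lambda m - n)pq \tag{36}$$
for polynomials p and q (22), and is a trajectory of a system with the Hamiltonian
$$H = \frac12\sum_{j\in N}\left(\left(\frac{dx_j}{dt}\right)^2 + \omega^2 x_j^2\right) - \frac{\Lambda}{2}\sum_{j\in M}\left(\left(\frac{dy_j}{dt}\right)^2 + \omega^2 y_j^2\right) + V(x, y),$$
$$V(x, y) = \sum_{k<j\in N}\frac{2}{(x_j - x_k)^2} + \sum_{j\in N,\, k\in M}\frac{\Lambda(\Lambda - 1)}{(x_j - y_k)^2} - \sum_{k<j\in M}\frac{2\Lambda^3}{(y_j - y_k)^2}. \tag{37}$$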
Remark. Although, similarly to the case Λ = 1, (36) can be written in the form of a time dependent Schrödinger equation (25) (changing t to it) with the "ψ" and "τ" functions given by
$$\psi = p/q^{\Lambda}, \qquad \tau = q^{\frac12\Lambda(\Lambda - 1)},$$
the dynamics of poles of the potential $U = 2(\log q)''$ cannot be embedded in a Hamiltonian flow uncoupled from the dynamics of zeros of p. This is why system (35) cannot be connected with solutions of the KP hierarchy.

Although the Hamiltonian system (37) is unlikely to be integrable for arbitrary initial conditions and Λ ≠ 0, ±1, we find that its trajectories (35) (defined by the polynomial evolution (36)) are integrable.

Conjecture 1. System (35) is completely integrable for arbitrary real Λ and ω in the sense that there exist 2(n + m) − 1 functionally independent integrals of motion $I_j$, which are real rational functions of x, y, i.e.
$$I_j = I_j(x_1, x_1^*, \dots, x_n, x_n^*, y_1, y_1^*, \dots, y_m, y_m^*) = I_j(x_1^*, x_1, \dots, x_n^*, x_n, y_1^*, y_1, \dots, y_m^*, y_m), \qquad j = 1..2(m + n) - 1.$$
We devote the rest of this section to examples in favor of this conjecture. We take the case Λ = 1 as the first example: According to Corollary 1 the equations of motion can be reduced to the Lax form (see e.g. [19])
$$i\frac{d}{dt}L_x = [L_x, A_x] + \omega L_x, \qquad i\frac{d}{dt}L_y = [L_y, A_y] + \omega L_y,$$
where $L_x$, $A_x$ and $L_y$, $A_y$ are matrices of dimensions n × n, m × m respectively,
$$(L_x)_{jk} = \frac12\left(i\frac{dx_j}{dt} + \omega x_j\right)\delta_{kj} + \frac{1 - \delta_{jk}}{x_j - x_k}, \qquad j, k \in N,$$
$$(L_y)_{jk} = \frac12\left(i\frac{dy_j}{dt} + \omega y_j\right)\delta_{kj} + \frac{1 - \delta_{jk}}{y_j - y_k}, \qquad j, k \in M. \tag{38}$$
Substituting (24) into (38) we eliminate the velocities dx/dt, dy/dt, getting the Lax matrices $\tilde L_x$ and $\tilde L_y$ for (24), depending on the coordinates only. It is easy to see that the absolute values of squares of traces
$$I_1 = (\mathrm{Tr}\,L)(\mathrm{Tr}\,L)^*, \quad I_2 = (\mathrm{Tr}\,L^2)(\mathrm{Tr}\,L^2)^*, \quad \dots, \quad I_{2(n+m)-1} = (\mathrm{Tr}\,L^{2(n+m)-1})(\mathrm{Tr}\,L^{2(n+m)-1})^*$$
of the (m + n) × (m + n) matrix
$$L = \begin{pmatrix}\tilde L_x & 0\\ 0 & \tilde L_y\end{pmatrix} \tag{39}$$
are real rational integrals of motion. They are homogeneous functions in x, y, ω (with x, y and ω having weights −1, −1, 2 respectively). The functional independence of (39) can be easily proved by considering them as polynomials in ω with functionally independent highest symbols
$$I_k = \omega^{2k}\left(\sum_{j\in N}x_j^k + \sum_{j\in M}y_j^k\right)\left(\sum_{j\in N}(x_j^*)^k + \sum_{j\in M}(y_j^*)^k\right) + \dots, \qquad k = 1..2(n + m) - 1.$$
Let us now turn to the general system Λ ≠ 0, ±1. Although, in this case, our arguments in favor of integrability of (35) stem mainly from numerical studies, we would like to mention some analytic results: The Hamiltonian (37) admits total separation of variables in low dimensions n + m < 4. Namely, separating the motion of the center of mass, we obtain a one or two dimensional problem, admitting (in the latter case) further separation of variables in the polar coordinates. Another (less trivial) example is the system with an even number of unit charges n = 2l and a single particle of the second type m = 1 having an arbitrary charge −Λ. The system is subject to symmetric initial conditions:
$$x_j(0) = -x_{j+l}(0), \qquad y_1(0) = 0, \qquad j = 1..l.$$
It is seen without much difficulty that due to this $Z_2$ symmetry, the above conditions hold for any t. Taking this fact into account, we may reduce (35) by this symmetry, keeping only variables $x_j$, j = 1..l. Changing the variables as $x_j = \sqrt{z_j}$ we arrive at the following equations of motion: i
4zj dzj +ω =− dt zj − z k
(40)
k=j
which correspond to the $BC_n$ rational case of Fig. 1. The integrability of (40) is then provided by arguments used for the study of linear dynamics. The rational integrals of motion for (40) can be found using the Lax representation for the Calogero-Moser system of the $BC_n$ type. One can also prove the periodicity of small non-symmetric deviations from the symmetric trajectories as linear perturbations around (40). We do not perform this analysis here, since it requires cumbersome calculations.

Finally, numerical simulations show that for any initial conditions and real Λ trajectories of (35) turn out to be periodic, which shows existence of 2(n + m) − 1 independent rational integrals of motion. The period of motion is an integer multiple of the period T = 2π/ω of the "free" oscillator, with an integer factor depending on the initial conditions. Typical examples of trajectories for generic initial conditions and Λ are shown in Fig. 2.

Fig. 2. Examples of trajectories of the two component system consisting of n = 6 unit charges and m = 1 charge −Λ, where Λ = 1.213579. Every curve shows an individual trajectory of each charge in its own coordinates (Re $x_j$, Im $x_j$), j = 1..6 or (Re $y_j$, Im $y_j$), j = 1. The charge of value −Λ is depicted by the gray solid line. The motion shown on the left figure has period 4π/ω = 2T, while the period on the right figure is equal to T = 2π/ω. The motion on the left is depicted within the time interval t = [0, T], which is a half-period for such initial conditions. In this case trajectories of several charges coincide, interchanging each half-period T

3. Special Limits: Equilibrium Configurations and Degenerate Operators

3.1. Equilibrium configurations, bilinear hypergeometric equation. The fixed points $dx_i/dt = 0$ and $dy_i/dt = 0$ of (24) describe the equilibrium of the unit positive and negative Coulomb charges in two dimensional electrostatics or point vortices in hydrodynamics respectively [20, 3]. The polynomials p and q (22) must then satisfy an ordinary bilinear differential equation
$$H_{\lambda_{nm}}[p, q] = 0 \tag{41}$$
which is a special case Λ = 1 of the Bilinear Hypergeometric Equation (4). Studying the dynamics of roots we supposed that they are distinct and that the polynomials p and q do not have common factors (Lemmas 1, 2). However, solutions of (41) may have multiple roots and/or common factors. In these circumstances we need to modify (24).

Remark. One does not encounter such a problem in the linear case since polynomial solutions of the ordinary Hypergeometric equation (classical orthogonal polynomials) do not have multiple roots.

Proposition 3. Let p and q be polynomials of orders n and m satisfying (41) and
$$p/q = \bar p/\bar q, \qquad \bar p = \prod_{i=1}^{\bar n}(z - x_i)^{\nu_i}, \qquad \bar q = \prod_{i=1}^{\bar m}(z - y_i)^{\sigma_i},$$
where $\bar p$ and $\bar q$ do not have common roots. Then x and y are critical points of the Energy function
$$H(x, y) = \sum_{i<j=1}^{\bar n}\nu_j\nu_i\,w(x_i, x_j) + \sum_{i<j=1}^{\bar m}\sigma_i\sigma_j\,w(y_i, y_j) - \sum_{i=1}^{\bar n}\sum_{j=1}^{\bar m}\nu_i\sigma_j\,w(x_i, y_j) + \sum_{i=1}^{\bar n}\nu_i\,u(x_i) - \sum_{j=1}^{\bar m}\sigma_j\,u(y_j), \tag{42}$$
$$w(x_1, x_2) = \ln\frac{(x_1 - x_2)^2}{P(x_1)P(x_2)}.$$
In other words, the total charge at point $x_i$ equals the difference of multiplicities of the corresponding root in p and q.

Proof. It is straightforward to check that
$$\mathrm{res}_{z=x_i}\frac{H_\lambda[p, q](z)}{p(z)q(z)} = \mathrm{res}_{z=x_i}\frac{H_\lambda[\bar p, \bar q](z)}{\bar p(z)\bar q(z)},$$
where $\mathrm{res}_{z=x_i}$ stands for the residue of a simple pole at the point $x_i$. The residue is zero since p and q satisfy (41). By direct calculation we get
$$0 = \mathrm{res}_{z=x_i}\frac{H_\lambda[\bar p, \bar q](z)}{\bar p(z)\bar q(z)} = \nu_i\left(\sum_{j=1,\, j\ne i}^{\bar n}\frac{2\nu_j P(x_i)}{x_i - x_j} - \sum_{j=1}^{\bar m}\frac{2\sigma_j P(x_i)}{x_i - y_j} - \frac{1 - 2\nu_i}{2}P'(x_i) + U(x_i)\right),$$
which is a derivative $\partial H/\partial x_i$ of the energy (42). Repeating a similar calculation for $y_i$ we complete the proof.

The following proposition gives examples of equilibrium configurations corresponding to several generic cases of Fig. 1.

Proposition 4. Let $I = i_1 < i_2 < \dots < i_k < i_{k+1}$ be a strictly increasing sequence of nonnegative integers and let $Q_i(z)$ be classical orthogonal polynomials satisfying the hypergeometric equation
$$(L + \lambda_i)Q_i(z) = 0, \qquad L = P(z)\frac{d^2}{dz^2} + U(z)\frac{d}{dz}, \tag{43}$$
where (up to a linear transformation of z)

(i) $P(z) = 1$, $U(z) = bz$,
(ii) $P(z) = -z^2$, $U(z) = bz$,

with b ≠ 0. Then the polynomials p and q,
$$p(z) = P(z)^{\frac14 k(k+1)}\,W[Q_{i_1}(z), Q_{i_2}(z), \dots, Q_{i_k}(z), Q_{i_{k+1}}(z)],$$
$$q(z) = P(z)^{\frac14 (k-1)k}\,W[Q_{i_1}(z), Q_{i_2}(z), \dots, Q_{i_k}(z)], \tag{44}$$
with

(i) $\deg(p) = n = \sum_{j=1}^{k+1} i_j - \frac12 k(k+1)$, $\deg(q) = m = \sum_{j=1}^{k} i_j - \frac12 k(k-1)$,
(ii) $\deg(p) = n = \sum_{j=1}^{k+1} i_j$, $\deg(q) = m = \sum_{j=1}^{k} i_j$,

satisfy the bilinear hypergeometric equation (41) with $\lambda_{nm}$ given in Proposition 2. Here $W[\psi_1(z) \dots \psi_k(z)] = \det\|d^j\psi_i(z)/dz^j\|$ in (44) denotes the Wronskian determinant.

To prove the proposition we need the following lemma by Crum [11].

Lemma 4. Let L be a given second order Sturm-Liouville operator
$$L = \frac{d^2}{d\phi^2} + u_0(\phi)$$
with a sufficiently smooth potential $u_0$, and let $\{\psi_1, \dots, \psi_k\}$ be its eigenfunctions corresponding to arbitrary fixed pairwise different eigenvalues $\{\lambda_1, \dots, \lambda_k\}$, i.e. $\psi_i \in \ker(L + \lambda_i)$, i = 1..k. Then, for arbitrary ψ ∈ ker(L + λ) the function
$$\tilde\psi = \frac{W[\psi_1 \dots \psi_k, \psi]}{W[\psi_1 \dots \psi_k]}$$
satisfies the differential equation
$$\left(\frac{d^2}{d\phi^2} + u_k(\phi) + \lambda\right)\tilde\psi = 0$$
with
$$u_k = u_0 + 2\frac{d^2}{d\phi^2}\ln W[\psi_1 \dots \psi_k].$$
Proof of Proposition 4. Changing variables as in (18) and making a gauge transformation,
$$L \to L_0 = \nu L\nu^{-1}, \qquad \frac{d}{d\phi}\ln\nu = \frac{U}{2\sqrt P},$$
we get a formally self-adjoint operator
$$L_0 = \frac{d^2}{d\phi^2} + u_0 = \left(\frac{d}{d\phi} + \frac{U}{2\sqrt P}\right)\left(\frac{d}{d\phi} - \frac{U}{2\sqrt P}\right)$$
with eigenfunctions
$$\psi_i = \nu Q_i, \qquad (L_0 + \lambda_i)\psi_i = 0, \qquad i = 0, 1, 2, \dots.$$
According to the Crum lemma the function
$$\frac{W[\psi_{i_1}(\phi), \psi_{i_2}(\phi), \dots, \psi_{i_k}(\phi), \psi_{i_{k+1}}(\phi)]}{W[\psi_{i_1}(\phi), \psi_{i_2}(\phi), \dots, \psi_{i_k}(\phi)]} = \frac{\nu p}{q} \tag{45}$$
is an eigenfunction of
$$L_k = \frac{d^2}{d\phi^2} + u_k, \qquad u_k = u_0 + 2\frac{d^2}{d\phi^2}\log W[\psi_{i_1}, \psi_{i_2}, \dots, \psi_{i_k}] = u_0 + 2\frac{d^2}{d\phi^2}\log(\nu^k q)$$
with the eigenvalue $\lambda_{i_{k+1}}$. Deriving (45) and the last equation we used the following properties of Wronskians:
$$W[\nu f_1, \dots, \nu f_n] = \nu^n W[f_1, \dots, f_n], \qquad W[f_1(z(\phi)) \dots f_n(z(\phi))] = \left(\frac{dz}{d\phi}\right)^{\frac12 n(n-1)}W[f_1(z) \dots f_n(z)].$$
It then follows immediately that
$$\left(L_k + \lambda_{i_{k+1}}\right)\frac{\nu p}{q} = 0 = \frac{\nu}{q^2}\left(q\frac{d^2 p}{d\phi^2} - 2\frac{dq}{d\phi}\frac{dp}{d\phi} + p\frac{d^2 q}{d\phi^2} + \frac{U}{\sqrt P}\left(q\frac{dp}{d\phi} - p\frac{dq}{d\phi}\right) + (\Delta_k + \lambda_{i_{k+1}})pq\right), \tag{46}$$
where
$$\Delta_k = k\left(\frac{dU}{dz} - \frac{U}{2P}\frac{dP}{dz}\right).$$
It is clear from the statement of the proposition that $\Delta_k$ is independent of z. Finally, changing the independent variable back to z (18), φ = φ(z), we arrive at the bilinear hypergeometric operator in the rhs of (46). The degrees of p and q can then be evaluated from the highest powers of Wronskians (44).

Example. Consider, for instance, the equilibrium configuration corresponding to the sequence
$$I = 2, 4, 6 \tag{47}$$
in the system with
$$P = 1, \qquad U = -2z.$$
The eigenstates of the linear problem are Hermite polynomials $H_n(z)$ (see Fig. 1). Computing p and q with the help of (44), we obtain
$$p = 8192z^3(8z^6 - 12z^4 + 18z^2 - 15), \qquad q = 32z(4z^4 - 4z^2 + 3).$$
The polynomial p has a multiple root z = 0 and this is a common root with the polynomial q. Excluding common factors we have
$$\bar p = 256z^2(8z^6 - 12z^4 + 18z^2 - 15), \qquad \bar q = 4z^4 - 4z^2 + 3.$$
It can be verified without much difficulty that q¯ and p¯ do not have multiple roots, other than z = 0. Hence sequence (47) gives the following equilibrium distribution of charges (interacting via logarithmic potentials), in the linear external field: One charge of the value ν1 = +2 at z = 0, six charges of the value ν2..7 = +1 on the real line and four negative charges σ1..4 = −1 in the complex plane.
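The example can be checked directly. The sketch below is an illustration under stated assumptions, not the author's computation: it evaluates the bilinear operator of Proposition 2 for P = 1, U = −2z (so the P′ term vanishes and $\lambda_{nm}$ = (m − n)U′ = −2(m − n)) on the polynomials p and q quoted above, and also recomputes q as the Wronskian of the physicists' Hermite polynomials $H_2$ and $H_4$. Polynomials are integer coefficient lists, lowest degree first; all function names are hypothetical.

```python
def trim(p):
    """Drop trailing zero coefficients."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def add(*polys):
    m = max(len(p) for p in polys)
    return trim([sum(p[k] if k < len(p) else 0 for p in polys) for k in range(m)])

def scale(c, p):
    return trim([c * a for a in p])

def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def ddz(p):
    return trim([k * a for k, a in enumerate(p)][1:])

def bilinear_hermite(p, q):
    """(p''q - 2p'q' + pq'') + U (p'q - q'p) + lambda pq with P = 1, U = -2z."""
    n, m = len(p) - 1, len(q) - 1
    U = [0, -2]
    lam = -2 * (m - n)                      # (m - n) U'
    dp, dq = ddz(p), ddz(q)
    second = add(mul(ddz(dp), q), scale(-2, mul(dp, dq)), mul(p, ddz(dq)))
    first = mul(U, add(mul(dp, q), scale(-1, mul(dq, p))))
    return add(second, first, scale(lam, mul(p, q)))

# p = 8192 z^3 (8 z^6 - 12 z^4 + 18 z^2 - 15),  q = 32 z (4 z^4 - 4 z^2 + 3)
p = scale(8192, [0, 0, 0, -15, 0, 18, 0, -12, 0, 8])
q = scale(32, [0, 3, 0, -4, 0, 4])

# Physicists' Hermite polynomials H_2 = 4z^2 - 2 and H_4 = 16z^4 - 48z^2 + 12.
H2 = [-2, 0, 4]
H4 = [12, 0, -48, 0, 16]
wronskian_H2_H4 = add(mul(H2, ddz(H4)), scale(-1, mul(ddz(H2), H4)))

print(bilinear_hermite(p, q))   # empty list means the zero polynomial
print(wronskian_H2_H4 == q)
```

Since the operator is bilinear, the numerical prefactors 8192 and 32 do not affect the vanishing of the result.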
The following proposition is an analog of Proposition 4 for P(z) = z.

Proposition 5. Let $I = i_1 < i_2 < \dots < i_k < i_{k+1}$ be a strictly increasing sequence of nonnegative integers and k = 0 mod 4. Let $Q_i(z) = L_i^{(-1)}(-bz)$ be Laguerre polynomials satisfying the hypergeometric equation
$$(L + \lambda_i)Q_i(z) = 0, \qquad L = z\frac{d^2}{dz^2} + bz\frac{d}{dz}, \qquad b \ne 0.$$
Then the polynomials p and q,
$$p(z) = z^{\frac14 k(k+1)}\,W[Q_{i_1}(z), Q_{i_2}(z), \dots, Q_{i_k}(z), Q_{i_{k+1}}(z)],$$
$$q(z) = z^{\frac14 (k-1)k}\,W[Q_{i_1}(z), Q_{i_2}(z), \dots, Q_{i_k}(z)], \tag{48}$$
$$\deg(p) = n = \sum_{j=1}^{k+1} i_j - \frac14 k(k+1), \qquad \deg(q) = m = \sum_{j=1}^{k} i_j - \frac14 k(k-1),$$
satisfy the bilinear hypergeometric equation (41) with P(z) = z and U(z) = bz.

Proof. We repeat the proof of Proposition 4, except that now k must be a multiple of 4 in order for (48) to be polynomials.

3.2. Rational and trigonometric solutions of KP/KdV hierarchies, evolution in two dimensions. Another interesting set of examples are the degenerate limits U = 0. They correspond to decreasing at infinity rational or periodic soliton solutions of the KP/KdV hierarchies (see Eq. (26)). For instance, studying the case
$$P = 1, \qquad U = 0,$$
Bartman [4] provided an electrostatic interpretation for the Adler-Moser polynomials. Indeed, in this limit the bilinear hypergeometric equation becomes the recurrence relation for the Adler-Moser polynomials
$$p''q - 2p'q' + pq'' = 0 \tag{49}$$
which, as shown by Burchnall and Chaundy (who, according to the author's knowledge, first studied (49) in [7]), exhaust all polynomial solutions of (49). Note that, different from the generic cases shown in Fig. 1, we have a set of polynomials depending continuously on k + 1 parameters:
$$p = \theta_{k+1}, \quad q = \theta_k, \quad \theta_k = W[\psi_1, \dots, \psi_k], \quad \theta_k = \theta_k(z + t_1, t_2, \dots, t_k),$$
$$\psi_j'' = \psi_{j-1}, \quad \psi_0 = 1, \quad \psi_1 = z, \qquad \deg(\theta_k(z)) = (k + 1)k/2,$$
with the second logarithmic derivatives of the θs being rational solutions of the KdV hierarchy. Thus, we have (at generic values of $t_i$) equilibrium of k(k + 1)/2 positive and $\frac12(k + 1)(k + 2)$ negative free charges with positions in the complex plane continuously depending on $t_i$.
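Let us turn now to the following problem: Find homogeneous polynomials p(X, Y, t), deg(p) = n, q(X, Y, t), deg(q) = m in two variables X, Y satisfying the equation
$$\frac{dp}{dt}q - \frac{dq}{dt}p = (X^2 + Y^2)\left(q\Delta p - 2(\nabla q, \nabla p) + p\Delta q\right), \tag{50}$$
where
$$\Delta := \frac{\partial^2}{\partial X^2} + \frac{\partial^2}{\partial Y^2}, \qquad \nabla := \left(\frac{\partial}{\partial X}, \frac{\partial}{\partial Y}\right),$$
and ( , ) stands for the standard scalar product in $\mathbf{C}^2$. Factorizing p and q as
$$p = \prod_{i=1}^{n}(X\sin\phi_i - Y\cos\phi_i), \qquad q = \prod_{i=1}^{m}(X\sin\theta_i - Y\cos\theta_i),$$
we come to the following

Proposition 6. The bilinear evolution (50) induces dynamics
$$\frac{d\phi_i}{dt} = -2\sum_{j\in N,\, j\ne i}\cot(\phi_i - \phi_j) + 2\sum_{j\in M}\cot(\phi_i - \theta_j),$$
$$\frac{d\theta_i}{dt} = 2\sum_{j\in M,\, j\ne i}\cot(\theta_i - \theta_j) - 2\sum_{j\in N}\cot(\theta_i - \phi_j). \tag{51}$$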
Proof. It is convenient to write (50) in the polar coordinates (X = r cos φ, Y = r sin φ):
$$p = r^n\tilde p = r^n\prod_{i=1}^{n}\sin(\phi - \phi_i), \qquad q = r^m\tilde q = r^m\prod_{i=1}^{m}\sin(\phi - \theta_i),$$
$$\tilde q\frac{d\tilde p}{dt} - \tilde p\frac{d\tilde q}{dt} = \tilde q\frac{\partial^2\tilde p}{\partial\phi^2} - 2\frac{\partial\tilde q}{\partial\phi}\frac{\partial\tilde p}{\partial\phi} + \tilde p\frac{\partial^2\tilde q}{\partial\phi^2} + (n - m)^2\tilde p\tilde q. \tag{52}$$
Then, it can be verified that Eq. (52) corresponds to the case P(z) = −z², U(z) = 0 in the classification of Fig. 1. It must be remarked, however, that the solutions $\tilde p$ and $\tilde q$ are not polynomial, but algebraic functions of z = exp(2iφ):
$$\tilde p = z^{-n/2}\prod_{j\in N}x_j^{-1/2}\,\bar p, \qquad x_j = \exp(2i\phi_j), \qquad \bar p = \prod_{j\in N}(z - x_j),$$
$$\tilde q = z^{-m/2}\prod_{j\in M}y_j^{-1/2}\,\bar q, \qquad y_j = \exp(2i\theta_j), \qquad \bar q = \prod_{j\in M}(z - y_j).$$
Nevertheless, since $\tilde p$ and $\tilde q$ are of "almost" polynomial type, we get (51) by arguments similar to the proof of Proposition 2. More precisely, for this purpose it is rather more convenient to use (25), where we can replace p and q with the "pure" polynomials $\bar p$ and $\bar q$ (and ψ = p/q with $\bar p/\bar q$): This substitution adds constants to coefficients in (25) and the rhs of the equation acquires the common factor $z^{\frac12(n-m)}\exp\left(i\sum_{j\in N}\phi_j - i\sum_{j\in M}\theta_j\right)$. The lhs of (25) acquires the same factor, since the quantity ("center of mass of the system") $\sum_{j\in N}\phi_j - \sum_{j\in M}\theta_j$ does not change with time. The latter statement can be easily verified by adding the equations of motion for the φs and subtracting the equations of motion for the θs in (51). Thus, the problem is reduced to the purely polynomial dynamics, which completes the proof.
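The conservation of the "center of mass" used in the proof can be illustrated numerically. The sketch below assumes the dynamics (51) in the form of Proposition 6: minus twice the sum of cot over same-species pairs and plus twice the sum over mixed pairs for the φs, with the opposite signs for the θs. It evaluates the velocities at a sample configuration and checks that the sum of the φ velocities minus the sum of the θ velocities vanishes up to rounding. The function names and the sample angles are hypothetical.

```python
import math

def cot(x):
    return math.cos(x) / math.sin(x)

def velocities(phi, theta):
    """Velocities d(phi_i)/dt and d(theta_i)/dt as in (51)."""
    dphi = []
    for i, a in enumerate(phi):
        same = sum(cot(a - b) for j, b in enumerate(phi) if j != i)
        mixed = sum(cot(a - c) for c in theta)
        dphi.append(-2 * same + 2 * mixed)
    dtheta = []
    for i, a in enumerate(theta):
        same = sum(cot(a - b) for j, b in enumerate(theta) if j != i)
        mixed = sum(cot(a - c) for c in phi)
        dtheta.append(2 * same - 2 * mixed)
    return dphi, dtheta

# Hypothetical configuration: no two angles differ by a multiple of pi.
phi = [0.3, 1.1, 2.0]
theta = [0.7, 2.6]

dphi, dtheta = velocities(phi, theta)
drift = sum(dphi) - sum(dtheta)
print(drift)
```

The same-species sums cancel pairwise because cot is odd, and the mixed sums cancel between the two species, which is the content of the remark in the proof.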
The flow (51) is a trajectory of two Sutherland systems in the absence of external potentials. It is interesting that the equilibrium condition for (52), written in the coordinates X, Y,

$$q\,\Delta p - 2(\nabla q, \nabla p) + p\,\Delta q = 0, \tag{53}$$
has been studied in [6, 5] in connection with the Hadamard problem in Minkowski space: Solutions of (53) (or fixed points of (51)) define differential operators possessing the Huygens property in the Hadamard sense [13]. The angular parts $\tilde p$, $\tilde q$ of solutions to (53) are periodic soliton solutions of the Korteweg-de Vries equation. The following proposition provides us with a (k + 1)-parametric family of solutions to (53) describing equilibrium configurations of the Coulomb charges (vortices) on the cylinder.

Proposition 7. Let $I = i_1 < i_2 < \dots < i_k < i_{k+1}$ be a strictly increasing sequence of nonnegative integers. Then

$$p = r^n\,W[\psi_{i_1}, \psi_{i_2}, \dots, \psi_{i_k}, \psi_{i_{k+1}}], \qquad q = r^m\,W[\psi_{i_1}, \psi_{i_2}, \dots, \psi_{i_k}],$$

where

$$\psi_{i_j} := \sin(i_j\phi + t_{i_j}), \qquad W := \det\left\|\frac{d^{\,j}\psi_i}{d\phi^{\,j}}\right\|, \qquad m = \sum_{j=1}^{k} i_j, \qquad n = \sum_{j=1}^{k+1} i_j,$$

satisfy (53).

Proof. Repeats the proof of Proposition 4 for (52)=0, except that now we have a superposition of the Tchebyshev trigonometric polynomials sin(jφ) and cos(jφ) [20] instead of $Q_j$.

Thus, in contrast with the plane case, we have k + 1 continuous parameters and k + 1 integers defining equilibrium configurations on the cylinder. Also, unlike the plane distributions, the equilibrium is possible not only for consecutive triangle powers of the Adler-Moser polynomials, but for any two values of partitions $n = \deg(p) = \sum_{j=1}^{k+1} i_j$ and $m = \deg(q) = \sum_{j=1}^{k} i_j$. This is due to a different topology of the problem: roughly speaking, the charges on the cylinder have fewer "possibilities" to "escape" to infinity than on the plane. It must, however, be restated that numbers and values of charges depend on multiplicities and common factors of p and q.

4. Multi-Linear Evolution Equations and Related Hamiltonian Systems

As was mentioned in the introduction, (3) is, in fact, a special case of the more general multi-linear equation. The multi-linear equation induces a polynomial dynamics which can also be embedded in a Hamiltonian flow. However, this flow does not now separate into independent components. The Hamiltonians are of Calogero-Moser type for several species of interacting particles. They are not generally related to the Coxeter reflection groups. Let us begin by introducing l species of particles with distinct charges $Q := \{Q_i,\ i = 1..l\}$, $Q_i \neq Q_j$, $i \neq j$. We define the l-linear differential operator
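$$\begin{aligned}
H^{Q}_{\lambda}[q_1, \dots, q_l](z) ={}& P(z)\left(\sum_{i=1}^{l} Q_i^2\, q_i''(z)\prod_{n\neq i}^{l} q_n(z) + 2\sum_{i<j}^{l} Q_iQ_j\, q_i'(z)\,q_j'(z)\prod_{n\neq i,j}^{l} q_n(z)\right) \\
&+ \frac12 P'(z)\sum_{i=1}^{l} Q_i^2\, q_i'(z)\prod_{n\neq i}^{l} q_n(z) + U(z)\sum_{i=1}^{l} Q_i\, q_i'(z)\prod_{n\neq i}^{l} q_n(z) + \lambda\prod_{i=1}^{l} q_i(z)
\end{aligned} \tag{54}$$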
acting on polynomials $q_i \in V_i \cong \mathbb{C}^{n_i+1}$, i = 1..l,

$$q_i(z) = \prod_{j=N_i+1}^{N_i+n_i} (z - z_j(t)), \qquad N_i = \sum_{j=1}^{i-1} n_j. \tag{55}$$
For convenience, we now use a single numeration for the roots of all polynomials. Similarly to the case l = 2, $Q_1 = 1$, $Q_2 = -1$ of the bilinear hypergeometric operator (1), the following proposition holds (we skip proofs below, since they repeat arguments of preceding sections):

Proposition 8. The multi-linear operator

$$H^{Q}_{\lambda_{n_1,..,n_l}} : V_1 \times V_2 \times \dots \times V_l \to V_{l+1}, \qquad V_{l+1} \cong \mathbb{C}^{\sum_{i=1}^{l} n_i},$$

given by (54) with

$$P(z) = A + Bz + Cz^2, \qquad U(z) = a + bz,$$

and

$$\lambda_{n_1,..,n_l} = -\left(U' + \frac12 P''\sum_{i=1}^{l} Q_i n_i\right)\sum_{i=1}^{l} Q_i n_i$$

induces the dynamics

$$\frac{dz_i}{dt} = -2P(z_i)\sum_{j\neq i}\frac{Q_j}{z_i - z_j} - U(z_i) - \frac{Q_i}{2}P'(z_i) \tag{56}$$

by the action

$$\sum_{i=1}^{l} Q_i\,\frac{dq_i}{dt}\prod_{n\neq i}^{l} q_n = H^{Q}_{\lambda_{n_1,...,n_l}}[q_1, .., q_l]$$

on (55). Under conditions mentioned before, (56) describes motion/equilibrium of $n_1$ charges $Q_1$, $n_2$ charges $Q_2$, etc. To avoid confusion, we note that in (56) and up to the end of this section, the summation indices run from 1 to the total number of roots $\sum_{j=1}^{l} n_j$, and to each root $z_j$ we assign $Q_j$, which is equal to the charge of the corresponding polynomial. It is natural to ask the following important question: May dynamics (56) be embedded in a Hamiltonian flow? We address it in the following
Theorem 3. Equation (56) is a trajectory of the Hamiltonian system

$$H = \sum_{j}\frac{Q_j}{2P(z_j)}\left(\frac{dz_j}{dt}\right)^2 - \sum_{j} Q_j\,\frac{U_{Q_j}(z_j)^2}{2P(z_j)} - \sum_{k<j} Q_kQ_j(Q_k + Q_j)\,W(z_k, z_j), \tag{57}$$

where

$$W(z_1, z_2) = \frac{P(z_1) + P(z_2)}{(z_1 - z_2)^2}, \qquad U_{Q_j}(z) = U(z) + Q_j P'(z)/2.$$
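Since Theorem 3 builds on the root dynamics (56), it can be reassuring to confirm numerically that (56) is what the multi-linear action produces at each root. The sketch below assumes the reading of (54) in which the P-terms carry $q_i''$ and $q_i'q_j'$, and the $P'$ and U terms carry $q_i'$. All function names, roots, charges and coefficients are hypothetical illustration values. At a root $z_a$ of $q_i$ the left-hand side of the action reduces to $-Q_i \dot z_a q_i'(z_a)\prod_{n\neq i} q_n(z_a)$, which is compared with $H^Q_\lambda$ evaluated at $z_a$.

```python
from itertools import combinations
from math import isclose, prod


def poly_and_derivs(roots, z):
    """Values q(z), q'(z), q''(z) of the monic polynomial with the given roots."""
    idx = range(len(roots))
    q0 = prod(z - r for r in roots)
    q1 = sum(prod(z - roots[k] for k in idx if k != a) for a in idx)
    q2 = 2 * sum(
        prod(z - roots[k] for k in idx if k not in (a, b))
        for a, b in combinations(idx, 2)
    )
    return q0, q1, q2


def multilinear_H(species, charges, z, P, dP, U, lam):
    """Operator (54) evaluated at z, under the reading stated in the lead-in."""
    vals = [poly_and_derivs(r, z) for r in species]
    idx = range(len(species))

    def rest(skip):
        # Product of q_n(z) over all species n not listed in skip.
        return prod(vals[n][0] for n in idx if n not in skip)

    second = sum(charges[i] ** 2 * vals[i][2] * rest((i,)) for i in idx)
    mixed = 2 * sum(
        charges[i] * charges[j] * vals[i][1] * vals[j][1] * rest((i, j))
        for i, j in combinations(idx, 2)
    )
    first_sq = sum(charges[i] ** 2 * vals[i][1] * rest((i,)) for i in idx)
    first = sum(charges[i] * vals[i][1] * rest((i,)) for i in idx)
    return P(z) * (second + mixed) + 0.5 * dP(z) * first_sq + U(z) * first + lam * rest(())


def velocity_56(species, charges, i, a, P, dP, U):
    """Right-hand side of (56) for root number a of species i."""
    za = species[i][a]
    coulomb = sum(
        charges[j] / (za - zb)
        for j, roots in enumerate(species)
        for b, zb in enumerate(roots)
        if (j, b) != (i, a)
    )
    return -2 * P(za) * coulomb - U(za) - 0.5 * charges[i] * dP(za)


def action_lhs_at_root(species, charges, i, a, P, dP, U):
    """Left-hand side of the action at z_a when the roots move according to (56)."""
    za = species[i][a]
    vals = [poly_and_derivs(r, za) for r in species]
    others = prod(vals[n][0] for n in range(len(species)) if n != i)
    return -charges[i] * velocity_56(species, charges, i, a, P, dP, U) * vals[i][1] * others


# Hypothetical sample data: three species with distinct charges.
species = [[0.3, 1.1], [-0.7, 2.0, 3.4], [-1.9]]
charges = [1.0, -1.0, 2.5]
P = lambda z: 1.0 + 0.5 * z - 0.2 * z * z
dP = lambda z: 0.5 - 0.4 * z
U = lambda z: 0.3 - 0.8 * z

checks = [
    isclose(
        multilinear_H(species, charges, species[i][a], P, dP, U, lam=7.0),
        action_lhs_at_root(species, charges, i, a, P, dP, U),
        rel_tol=1e-9,
        abs_tol=1e-9,
    )
    for i in range(len(species))
    for a in range(len(species[i]))
]
```

The value of λ does not affect the comparison, since the last term of (54) vanishes at every root. It matters only for the degree count that fixes $\lambda_{n_1,..,n_l}$ in Proposition 8.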
System (57) is of Calogero(Sutherland)-Moser type of l species of particles with masses Qi . In this picture, the two-body potentials are of similar form, but having different amplitudes Qj Qk (Qj + Qk ) (for interaction within each of the species of particles and between the species respectively). They are translation invariant in Newtonian coordinates (18) if (up to a linear transformation) P (z) = 1 or z2 . We recall that in the case l = 2, Q1 = 1, Q2 = −1 the interaction between two different species vanishes, leading to separation of the Hamiltonians, while for l = 1 we obtain identical Calogero-Moser particles. Both above cases are related to A/BC/D root systems. It is seen without much difficulty that for generic l, Q, system (57) is not related to any Coxeter reflection group. Although, as far as the author knows, quantum models related to different deformations of the Coxeter root systems were considered in earlier works (e.g. [6, 9]), (57) has not appeared in the literature. We do not attempt to address the question of integrability of (57) in this paper, leaving it for future studies. 5. Conclusions and Open Questions The results at which we have now arrived may be summed up as follows: The bilinear hypergeometric operator (1) induces dynamics (2), which may be embedded in a Hamiltonian flow. In the case = 1 this flow is generated by a sum of two independent Calogero(Sutherland)-Moser Hamiltonians (Theorem 2) with (generally) different forms of external potentials. This allows us to integrate (24) by the Lax method. The fixed points of bilinear evolution correspond to equilibrium distributions of different species of point vortices on the plane or cylinder. They may be obtained (again, for = 1) by a finite number of Darboux transformations from the eigenstates of associated linear problems (Propositions 4, 5, 7). 
The dynamical system (35) of two species of interacting points in an external field is conjectured to be completely integrable for arbitrary real and ω. Let us now mention some open questions. The main problem, of course, is to prove integrability of (35) for arbitrary (Conjecture 1). It might be done by using two approaches: The first approach is to find a Lax representation for (35). The Lax matrices for arbitrary could have a complicated structure, being rational functions of x, y and ω of greater homogeneity degree in comparison with the Calogero-Moser case. It makes it hard to find them using straightforward computational approaches. Another method is to try to linearize (36). Although, as was mentioned before, such a linearization, connected with the KP equation, was possible for P = 1 and = 1, we cannot apply a similar scheme in the general case. Another set of questions is connected with solutions of the bilinear hypergeometric equation: In 1929 [7], Burchnall and Chaundy studied the following question: What
conditions must be satisfied by two polynomials p(z), q(z) in order that the indefinite integrals

$$\int\left(\frac{q(x)}{p(x)}\right)^2 dx, \qquad \int\left(\frac{p(x)}{q(x)}\right)^2 dx$$

may be rational, provided p and q do not have multiple and common roots? They found that the integrals are rational if $p = \theta_i$, $q = \theta_{i+1}$ are (now known as the Adler-Moser) polynomials satisfying (49). On the other hand, we know that any two polynomial solutions of the ordinary hypergeometric equations $p = Q_n$, $q = Q_m$ are orthogonal with the measure ν(z):

$$\int \nu(z)\,Q_n(z)\,Q_m(z)\,dz = 0, \qquad n \neq m.$$

Since the two above integrals are related to particular forms of (4), it is natural to ask the following question: what integration condition may be imposed on two polynomials p and q in order for them to satisfy a nondegenerate bilinear hypergeometric equation (4)? In the same work [7], Burchnall and Chaundy have shown that any polynomial solutions of (49) may be obtained by a finite number of Darboux transformations from the kernel of the "free" differential operator $d^2/dz^2$. In this paper we have proved Proposition 4, stating that polynomials obtained from the eigenfunctions of the ordinary hypergeometric equation by a finite number of Darboux transformations are solutions of (4). By analogy with [7] it is natural to state the following

Conjecture 2. Any polynomial solutions to bilinear hypergeometric equations of Propositions 4 and 5 are (44) and (48) respectively.

Concluding the article, we would like to mention briefly possible multi-dimensional generalizations of (4). A particular generalization was constructed in Sect. 3 for the homogeneous polynomials in two variables (53). A similar construction [5], related to the classical special functions in many dimensions [14], is connected with the quantum Calogero-Moser systems on the Coxeter root systems (and their deformations [6, 9]). In this context, it would be interesting to find a proper analog of (4) in many dimensions.

Acknowledgements.
The author is grateful to H.Aref, Y.Berest, F.Calogero, B.Dubrovin, A. Kirillov, and A.Orlov for useful information and remarks.
References
1. Adler, M., Moser, J.: On a class of polynomials connected with the Korteweg-de Vries equation. Comm. Math. Phys. 61, 1–30 (1978)
2. Aref, H.: Integrable, chaotic and turbulent motion of vortices in two dimensional flows. Ann. Rev. Fluid Mech. 15, 345–389 (1983)
3. Arnold, V.I., Khesin, B.A.: Topological Methods in Hydrodynamics. NY: Springer-Verlag, 1998
4. Bartman, A.B.: A new interpretation of the Adler-Moser KdV polynomials: interaction of vortices. Nonlinear and Turbulent Processes in Physics, Vol. 3 (Kiev 1983), Chur: Harwood Academic Publ., 1984, pp. 1175–1181
5. Berest, Y.: Huygens principle and the bispectral problem. CRM Proceedings and Lecture Notes 14, 1998
6. Berest, Y., Loutsenko, I.: Huygens principle in Minkowski space and soliton solutions of the Korteweg-de Vries equation. Commun. Math. Phys. 190, 113–132 (1997)
7. Burchnall, J.L., Chaundy, T.W.: A set of differential equations which can be solved by polynomials. Proc. London Math. Soc. 30, 401–414 (1929)
8. Calogero, F.: Motion of poles and zeros of special solutions of nonlinear and linear partial differential equations and related "solvable" many body problems. Nuovo Cimento 43 B, 177 (1978)
9. Chalych, O., Feigin, M., Veselov, A.: Multidimensional Baker-Akhiezer functions and Huygens principle. Commun. Math. Phys. 206, 533–566 (1999)
10. Choodnovsky, D., Choodnovsky, G.: Pole expansions of nonlinear partial differential equations. Nuovo Cimento 40 B, 339 (1977)
11. Crum, M.: Associated Sturm-Liouville systems. Quart. J. Math 6, 121–127 (1955)
12. Gurevich, B.B.: Foundations of the Theory of Algebraic Invariants. Groningen, Holland: P. Noordhoff, 1964
13. Hadamard, J.: Lectures on Cauchy's Problem in Linear Partial Differential Equations. New Haven, CT: Yale Univ. Press, 1923
14. Heckman, G.J., Opdam, E.M.: Root systems and hypergeometric functions I. Compositio Math. 64, 329–352 (1987)
15. Inozemtsev, V.: On the motion of classical integrable systems of interacting particles in an external field. Phys. Lett. 98, 316–318 (1984)
16. Krichever, I.: Methods of algebraic geometry in the theory of nonlinear equations. Russ. Math. Surv. 32, 185–213 (1977)
17. Krichever, I.: Rational solutions of the Kadomtsev-Petviashvili equation and integrable systems of N particles on line. Funct. Anal. Appl. 12, 76–78 (1978)
18. Novikov, S., Pitaevski, L., Zakharov, V., Manakov, S.: Theory of Solitons: Inverse Scattering Method. New York, NY: Contemporary Soviet Mathematics, 1984
19. Perelomov, A.M.: Integrable Systems in Classical Mechanics and Lie's Algebras. Moscow: Nauka, 1990
20. Szegö, G.: Orthogonal Polynomials. NY: AMS, 1939
21. Turbiner, A.V.: Quasi-exactly solvable problems and sl(2) algebra. Commun. Math. Phys. 118, 467 (1988)
22. Veselov, A.: Rational solutions of the Kadomtsev-Petviashvili equation and Hamiltonian systems. Russ. Math. Surv. 35, 239–240 (1980)

Communicated by L. Takhtajan
Commun. Math. Phys. 242, 277–329 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0945-y
Discrete Polynuclear Growth and Determinantal Processes Kurt Johansson Department of Mathematics, Royal Institute of Technology, 100 44 Stockholm, Sweden. E-mail: [email protected] Received: 26 November 2002 / Accepted: 17 June 2003 Published online: 7 October 2003 – © Springer-Verlag 2003
Abstract: We consider a discrete polynuclear growth (PNG) process and prove a functional limit theorem for its convergence to the Airy process. This generalizes previous results by Prähofer and Spohn. The result enables us to express the F1 GOE Tracy-Widom distribution in terms of the Airy process. We also show some results, and give a conjecture, about the transversal fluctuations in a point to line last passage percolation problem. Furthermore we discuss a rather general class of measures given by products of determinants and show that these measures have determinantal correlation functions.
1. Introduction and Results

1.1. Discrete polynuclear growth. Recently there have been interesting developments concerning certain special 1 + 1 dimensional local random growth models. This development has its starting point in the new results on the longest increasing subsequence in a random permutation, [2]. We will not review all these developments here. In this paper we consider a certain discrete growth model called the discrete polynuclear growth (PNG) model, [24], a special version of which is closely related to the last-passage percolation problem studied in [16]. It is a discrete version of the PNG model studied by Prähofer and Spohn, [31], which can be obtained as a special limiting case. In the paper we will extend the results in [31] to the present model and prove a stronger convergence result. We also obtain some preliminary results on the transversal fluctuations in the point to line version of the last-passage percolation problem, which should have many similarities with the corresponding problems for first-passage percolation and directed polymers. The discrete polynuclear growth (PNG) model is a local random growth model defined by

$$h(x, t+1) = \max(h(x-1, t), h(x, t), h(x+1, t)) + \omega(x, t+1), \tag{1.1}$$
x ∈ Z, t ∈ N, h(x, 0) = 0, x ∈ Z. Here ω(x, t), (x, t) ∈ Z×N, are independent random variables, see [24]. Typically they could be Bernoulli random variables. We should think of h(x, t) as the height above x at time t, so x → h(x, t) gives an interface developing in time. We will treat a special case where ω(x, t) = 0 if t − x is even or if |x| > t, and

$$w(i, j) = \omega(i - j, i + j - 1), \tag{1.2}$$

$(i, j) \in \mathbb{Z}_+^2$, are independent geometric random variables with parameter $a_ib_j$,

$$P[w(i, j) = m] = (1 - a_ib_j)(a_ib_j)^m, \tag{1.3}$$

m ≥ 0. We will mainly consider the case when $a_i = b_i = \sqrt{q}$, 0 < q < 1, i ≥ 1, and we do this in the rest of this section. If we define

$$G(i, j) = h(i - j, i + j - 1), \tag{1.4}$$

$(i, j) \in \mathbb{Z}_+^2$, it follows from (1.1) that

$$G(i, j) = \max(G(i-1, j), G(i, j-1)) + w(i, j), \tag{1.5}$$

see Proposition 3.2. This leads immediately to a different formula for G(M, N), [16],

$$G(M, N) = \max_{\pi}\sum_{(i,j)\in\pi} w(i, j),$$

where the maximum is taken over all up/right paths from (1, 1) to (M, N). We can think of G(M, N) as a point to point last-passage time. It is also natural, from the point of view of directed polymers for example, to consider the point to line last-passage time,

$$G_{pl}(N) = \max_{|K|<N} G(N + K, N - K). \tag{1.6}$$

This makes it reasonable to study the process K → G(N + K, N − K), −N < K < N, which, by (1.4), is the same as K → h(2K, 2N − 1), i.e. the height curve at even sites at time 2N − 1. Let $F_1$ and $F_2$ denote the GOE and GUE Tracy-Widom largest eigenvalue distributions, respectively, [36]. It is known, [16], that there are constants $a = 2\sqrt{q}(1 - \sqrt{q})^{-1}$ and d given by (1.8) below, such that $P[G(N, N) \le aN + dN^{1/3}\xi] \to F_2(\xi)$ as N → ∞, and, [4], $P[G_{pl}(N) \le aN + dN^{1/3}\xi] \to F_1(\xi)$ as N → ∞. Also, if the maximum in (1.6) is assumed at some point $K_N$, which need not be unique, we expect $K_N$ to be of order $N^{2/3}$, i.e. the transversal fluctuations are of order $N^{2/3}$. This can be seen heuristically, [24], and there are some rigorous results for a related question, [17, 3, 39]. These scales motivate the introduction of a rescaled process $t \to H_N(t)$, t ∈ R, defined by

$$G(N + u, N - u) = \frac{2\sqrt{q}}{1 - \sqrt{q}}\,N + dN^{1/3}\,H_N\!\left(\frac{1 - \sqrt{q}}{1 + \sqrt{q}}\,d\,u\,N^{-2/3}\right), \tag{1.7}$$

and linear interpolation, |u| < N, compare with [31]. This is our rescaled discrete PNG process. The constant d is given by

$$d = \frac{(\sqrt{q})^{1/3}(1 + \sqrt{q})^{1/3}}{1 - q}. \tag{1.8}$$
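The relation between the growth rule (1.1) and the last-passage recursion (1.5) is easy to test on a small simulation. The sketch below assumes the special case described above: ω(x, t) vanishes unless t − x is odd and |x| < t, where it equals the geometric weight w(i, j) with i − j = x and i + j − 1 = t. The function names, the seed and the sampling method are illustrative choices, not taken from the paper.

```python
import math
import random


def sample_weights(size, q, rng):
    """Independent geometric weights w(i, j), P[w = m] = (1 - q) q^m, for 1 <= i, j <= size."""
    return {
        (i, j): int(math.log(1.0 - rng.random()) / math.log(q))
        for i in range(1, size + 1)
        for j in range(1, size + 1)
    }


def last_passage(w, size):
    """G(i, j) from recursion (1.5), with G = 0 outside the quadrant."""
    G = {}
    for i in range(1, size + 1):
        for j in range(1, size + 1):
            G[(i, j)] = max(G.get((i - 1, j), 0), G.get((i, j - 1), 0)) + w[(i, j)]
    return G


def png_heights(w, T):
    """Run the growth rule (1.1) up to time T; history[t][x] is h(x, t)."""
    width = T + 1  # heights are zero for |x| >= t, so this window is large enough
    h = {x: 0 for x in range(-width, width + 1)}
    history = [dict(h)]
    for t in range(1, T + 1):
        new_h = {}
        for x in range(-width, width + 1):
            base = max(h.get(x - 1, 0), h[x], h.get(x + 1, 0))
            if (t - x) % 2 == 1 and abs(x) < t:
                # omega(x, t) = w(i, j) with i - j = x and i + j - 1 = t, cf. (1.2)
                omega = w[((t + x + 1) // 2, (t - x + 1) // 2)]
            else:
                omega = 0
            new_h[x] = base + omega
        h = new_h
        history.append(h)
    return history


def png_matches_last_passage(T, q, seed):
    """Check G(i, j) = h(i - j, i + j - 1), cf. (1.4), for all sites reached by time T."""
    rng = random.Random(seed)
    w = sample_weights(T, q, rng)
    G = last_passage(w, T)
    history = png_heights(w, T)
    return all(
        history[i + j - 1][i - j] == G[(i, j)]
        for i in range(1, T + 1)
        for j in range(1, T + 1)
        if i + j - 1 <= T
    )
```

Calling `png_matches_last_passage(12, 0.5, seed=1)` compares the two constructions site by site. Agreement on every sampled weight field is what Proposition 3.2 asserts, so a mismatch would indicate an indexing error in the sketch rather than in the paper.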
In the limit when q is small and N is large, we can obtain the continuous PNG process studied by Prähofer and Spohn, [31]. We want to extend their results to the present discrete setting and also prove a stronger form of convergence to the limiting process, a functional limit theorem. Before we can state the theorem we must define the limiting process, which is the Airy process introduced by Prähofer and Spohn, [31]. We will approach $H_N$ by considering it as the top curve in a multilayer PNG process, compare [21, 31]. This will lead to measures of the form introduced in Sect. 1.2 and we will be able to use the formulas for the correlation functions derived there. The same methods can also be applied to Dyson's Brownian motion, compare with [13], which can be obtained from N non-intersecting Brownian motions. Appropriately rescaled, the top path in Dyson's Brownian motion converges to the Airy process as N → ∞, see below. This gives some intuition about what it looks like. Its precise definition is more technical. The extended Airy kernel, [13, 26, 31], is defined by

$$A(\tau, \xi; \tau', \xi') = \begin{cases} \displaystyle\int_0^{\infty} e^{-\lambda(\tau - \tau')}\,\mathrm{Ai}(\xi + \lambda)\,\mathrm{Ai}(\xi' + \lambda)\,d\lambda, & \text{if } \tau \ge \tau', \\[2mm] \displaystyle -\int_{-\infty}^{0} e^{-\lambda(\tau - \tau')}\,\mathrm{Ai}(\xi + \lambda)\,\mathrm{Ai}(\xi' + \lambda)\,d\lambda, & \text{if } \tau < \tau', \end{cases} \tag{1.9}$$

where Ai(·) is the Airy function. When τ = τ' the extended Airy kernel reduces to the ordinary Airy kernel, [36]. We define the Airy process t → A(t) by giving its finite-dimensional distribution functions. Given $\xi_1, \dots, \xi_m \in \mathbb{R}$ and $\tau_1 < \dots < \tau_m$ in R we define f on $\{\tau_1, \dots, \tau_m\}\times\mathbb{R}$ by $f(\tau_j, x) = \chi_{(\xi_j,\infty)}(x)$. Then,

$$P[A(\tau_1) \le \xi_1, \dots, A(\tau_m) \le \xi_m] = \det(I - f^{1/2}Af^{1/2})_{L^2(\{\tau_1,\dots,\tau_m\}\times\mathbb{R})}, \tag{1.10}$$
where we have counting measure on $\{\tau_1, \dots, \tau_m\}$ and Lebesgue measure on R. The Fredholm determinant can be defined via its Fredholm expansion, see Sect. 2.1 below. We will prove in Sect. 2.2 that $f^{1/2}Af^{1/2}$ is a trace class operator on $L^2(\{\tau_1, \dots, \tau_m\}\times\mathbb{R})$, so this is also a Fredholm determinant in the sense of determinants for trace class operators. The right hand side of (1.10) can also be thought of as a Fredholm determinant of a block operator, see the discussion after Proposition 2.1 below. Note that in particular

$$P[A(\tau) \le \xi] = F_2(\xi). \tag{1.11}$$

This defines the Airy process. It is proved in [31] that it has a version with continuous paths, which also follows from the results below. As mentioned above, another way of understanding the Airy process is as follows. Let $\lambda(t) = (\lambda_1(t), \dots, \lambda_N(t))$ with $\lambda_1(t) < \dots < \lambda_N(t)$, be the eigenvalues in Dyson's Brownian motion model, [9], for GUE with stationary distribution $Z_N^{-1}\Delta_N(\lambda)^2\prod_{j=1}^{N}\exp(-\lambda_j^2)$. Then,

$$\lim_{N\to\infty}\sqrt{2}\,N^{1/6}\left(\lambda_N(N^{-1/3}t) - \sqrt{2N}\right) = A(t),$$

say in the sense of convergence of finite-dimensional distributions. This can be proved using the methods of the present paper, and using techniques from [20], it is possible to get an integral formula for the (extended) correlation kernel. The details will not be given here. This scaling limit has been studied before, see [13] and references therein. In analogy with the results of [31], we can show that the rescaled height process $H_N$ converges in finite dimensional distributions to the Airy process.
Theorem 1.1. Let $H_N$ be the process defined by (1.7). Then for any fixed $t_1, \dots, t_m$ and $\xi_1, \dots, \xi_m$,

$$\lim_{N\to\infty} P[H_N(t_1) \le \xi_1, \dots, H_N(t_m) \le \xi_m] = P[A(t_1) \le \xi_1 + t_1^2, \dots, A(t_m) \le \xi_m + t_m^2], \tag{1.12}$$

where A is the Airy process.

This result can be sharpened to a functional limit theorem.

Theorem 1.2. Let A(t) be the Airy process defined by its finite-dimensional distributions, (1.10). Also, let $H_N(t)$ be defined by (1.7) and linear interpolation. Fix T > 0 arbitrary. There is a continuous version of A(t) and

$$H_N(t) \to A(t) - t^2, \tag{1.13}$$

as N → ∞ in the weak∗-topology of probability measures on C(−T, T).

The theorem will be proved in Sect. 5.2. As a corollary to this theorem and the results of Baik and Rains, [4], we obtain the following result which expresses the GOE largest eigenvalue distribution $F_1$ in terms of the Airy process.

Corollary 1.3. For all ξ ∈ R,

$$F_1(\xi) = P[\sup_{t}(A(t) - t^2) \le \xi]. \tag{1.14}$$

The proof of (1.14) is very indirect. It would be interesting to see a more straightforward approach. As discussed above we are also interested in the transversal fluctuations of the endpoint of a maximal path in the point to line case. In our discrete model this is not well-defined, since there could be several maximal paths. Consider the random variable

$$K_N = \inf\{u\ ;\ \sup_{t\le u} H_N(t) = \sup_{t\in\mathbb{R}} H_N(t)\}, \tag{1.15}$$

the first point that gives the maximum. The corresponding quantity for the limiting process $H(t) = A(t) - t^2$ is

$$K = \inf\{u\ ;\ \sup_{t\le u} H(t) = \sup_{t\in\mathbb{R}} H(t)\}. \tag{1.16}$$

We would like to show that $K_N$ converges to K so that we could call the law of K the asymptotic law of transversal fluctuations. Unfortunately we can only prove this under a very plausible assumption on the Airy process. We can show,

Proposition 1.4. The sequence of random variables $\{K_N\}_{N\ge1}$ is tight, i.e. given ε > 0 there is a T > 0 and an $N_0$ such that $P[|K_N| > T] < \epsilon$ for all $N \ge N_0$.

The assumption we need to make on the Airy process can be formulated as follows.

Conjecture 1.5. Let $H(t) = A(t) - t^2$. Then, for each T > 0, H(t) has a unique point of maximum in [−T, T] almost surely.
If we accept this we can prove Theorem 1.6. Assume that Conjecture 1.5 is true. Then KN → K in distribution as N → ∞. The law of K is thus a natural candidate for the law of the transversal fluctuations. It would be interesting to find a more explicit formula for this law, which could be used to study it numerically. Assuming the truth of the same conjecture it may also be possible to prove that the endpoints of all maximal paths, or asymptotically maximal paths, converge to the same limit K. By using the limit results of [5], Prop. 3.12 and Theorem 3.14 we can obtain the correlation functions of the eigenvalues of the successive minors H (k) = (hij )1≤i,j ≤k , 1 ≤ k ≤ N, of an N × N GUE matrix H = (hij )1≤i,j ≤N . In this way it is possible to get the Airy process as an appropriate limit of the successive largest eigenvalues of H (k) . More details will be given in future work.
1.2. Measures defined by products of determinants. Probability measures given by products of determinants have been studied in several papers, e.g. by Eynard and Mehta, [10], in connection with eigenvalue correlations in chains of matrices, by Forrester, Nagao and Honner, [13], in connection with Dyson's Brownian motion model and by Okounkov and Reshetikhin, [30], when introducing the so-called Schur process. The problem is to compute the correlation functions and to show that these are given by determinants so that we obtain a determinantal point process, [34]. The same type of correlation functions are also obtained by Prähofer and Spohn, [31], in a cascade of continuous polynuclear growth (PNG) models. We will study a class of measures which include all the above as special cases and show that we obtain determinantal correlation functions. As an example of the result, we will in Sect. 2.3 investigate random walks on the discrete circle using the same strategy. This will lead to an extended discrete sine kernel, compare with [30]. We will see in Sect. 3 that our main topic, the discrete PNG problem, fits nicely into this framework. This particular application is very close to the Schur process in [30], and their results could also have been used. In fact, we rederive their main formulas. For r ∈ Z let $x^r = (x_1^r, \dots, x_n^r) \in \mathbb{R}^n$ and $\bar{x} = (x^{-M+1}, \dots, x^{M-1})$, M ≥ 1. We think of $\bar{x}$ as a point configuration in $\{-M+1, \dots, M-1\}\times\mathbb{R}^n$, and we also specify fixed initial $x^{-M}$ and final $x^{M}$ positions. Let $\phi_{r,r+1} : \mathbb{R}^2 \to \mathbb{C}$, r ∈ Z, be given transition weights. The weight of the configuration $\bar{x}$ is then

$$w_{n,M}(\bar{x}) = \prod_{r=-M}^{M-1}\det(\phi_{r,r+1}(x_i^r, x_j^{r+1}))_{i,j=1}^{n}. \tag{1.17}$$

Let dµ be a given reference measure on R, typically Lebesgue measure or counting measure. We assume that $|\phi_{r,r+1}(x, y)| \le c_r(x)d_r(y)$, where $c_r \in L^1(\mathbb{R}, \mu)$ and $d_r \in L^\infty(\mathbb{R}, \mu)$, −M ≤ r < M. This assumption is not necessary but is convenient and suffices for the convergence of all the objects we will encounter. The partition function is

$$Z_{n,M} = \frac{1}{(n!)^{2M-1}}\int_{(\mathbb{R}^n)^{2M-1}} w_{n,M}(\bar{x})\,d\mu(\bar{x}), \tag{1.18}$$
where $d\mu(\bar{x}) = \prod_{r=-M+1}^{M-1}\prod_{j=1}^{n} d\mu(x_j^r)$. We will assume that $Z_{n,M} \neq 0$ so that we can define the normalized weight

$$p_{n,M}(\bar{x}) = \frac{1}{(n!)^{2M-1}Z_{n,M}}\,w_{n,M}(\bar{x}). \tag{1.19}$$

If $w_{n,M}(\bar{x}) \ge 0$, this is a probability density on $(\mathbb{R}^n)^{2M-1}$ with respect to the reference measure $d\mu(\bar{x})$. The $(k_{-M+1}, \dots, k_{M-1})$-correlation function can now be defined in a standard way by

$$R_{k_{-M+1},\dots,k_{M-1}}(x_1^{-M+1}, \dots, x_{k_{-M+1}}^{-M+1}, \dots, x_1^{M-1}, \dots, x_{k_{M-1}}^{M-1}) = \int_{\mathbb{R}^{n(2M-1)-k}} p_{n,M}(\bar{x})\prod_{r=-M+1}^{M-1}\frac{n!}{(n - k_r)!}\prod_{j=k_r+1}^{n} d\mu(x_j^r), \tag{1.20}$$
where $k = k_{-M+1} + \dots + k_{M-1}$, $0 \le k_j \le n$. Given two transition functions we define their convolution by

$$\phi * \psi(x, y) = \int_{\mathbb{R}}\phi(x, z)\psi(z, y)\,d\mu(z).$$

Set $\phi_{r,s}(x, y) = (\phi_{r,r+1} * \dots * \phi_{s-1,s})(x, y)$ if r < s and $\phi_{r,s} \equiv 0$ if r ≥ s. Let $A = (A_{ij})$ be the n × n matrix with elements $A_{ij} = \phi_{-M,M}(x_i^{-M}, x_j^{M})$, 1 ≤ i, j ≤ n. By repeated use of the Heine identity,

$$\frac{1}{n!}\int_{\mathbb{R}^n}\det(\phi_i(x_j))_{i,j=1}^{n}\det(\psi_i(x_j))_{i,j=1}^{n}\,d\mu(x) = \det\left(\int_{\mathbb{R}}\phi_i(t)\psi_j(t)\,d\mu(t)\right)_{i,j=1}^{n}, \tag{1.21}$$

we see that $Z_{n,M} = \det A$. Hence det A ≠ 0 by our assumption. Define a kernel $K^{n,M} : (\{-M+1, \dots, M-1\}\times\mathbb{R})^2 \to \mathbb{C}$ by

$$K^{n,M}(r, x; s, y) = \tilde{K}^{n,M}(r, x; s, y) - \phi_{r,s}(x, y), \tag{1.22}$$

where

$$\tilde{K}^{n,M}(r, x; s, y) = \sum_{i,j=1}^{n}\phi_{r,M}(x, x_i^{M})(A^{-1})_{ij}\,\phi_{-M,s}(x_j^{-M}, y). \tag{1.23}$$
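When µ is counting measure on a finite set, the Heine identity (1.21) is a finite sum and can be verified exactly. For M = 1 this is precisely the statement $Z_{n,M} = \det A$. The following sketch uses integer-valued test functions so that all arithmetic is exact. The helper names, the support and the test functions are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def det(matrix):
    """Determinant by the Leibniz permutation expansion (adequate for tiny matrices)."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for a in range(n):
            for b in range(a + 1, n):
                if perm[a] > perm[b]:
                    sign = -sign
        term = sign
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total


def heine_lhs(phis, psis, support):
    """Left side of (1.21) with mu the counting measure on the finite set support."""
    n = len(phis)
    total = 0
    for xs in product(support, repeat=n):
        d_phi = det([[phi(x) for x in xs] for phi in phis])
        d_psi = det([[psi(x) for x in xs] for psi in psis])
        total += d_phi * d_psi
    return Fraction(total, factorial(n))


def heine_rhs(phis, psis, support):
    """Right side of (1.21): determinant of the Gram-type matrix of pairings."""
    return det([[sum(phi(t) * psi(t) for t in support) for psi in psis] for phi in phis])


# Hypothetical integer-valued test functions on a five-point support.
support = range(5)
phis = [lambda x: 1, lambda x: x, lambda x: x * x]
psis = [lambda x: 1, lambda x: 2 * x + 1, lambda x: x ** 3 - x]
lhs = heine_lhs(phis, psis, support)
rhs = heine_rhs(phis, psis, support)
```

In this discrete form the identity is the Cauchy-Binet formula, so `lhs` and `rhs` should agree exactly. The brute-force sum grows like the support size to the power n, and is only meant for small sanity checks.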
In the case M = 1 the kernel $\tilde{K}$ has appeared before, see [37, 7] and also [19].

Theorem 1.7. The correlation functions defined by (1.20) are given by

$$R_{k_{-M+1},\dots,k_{M-1}}(x_1^{-M+1}, \dots, x_{k_{-M+1}}^{-M+1}, \dots, x_1^{M-1}, \dots, x_{k_{M-1}}^{M-1}) = \det\left(K^{n,M}(r, x_{i_r}^{r}; s, x_{j_s}^{s})\right)_{-M<r,s<M,\ 1\le i_r\le k_r,\ 1\le j_s\le k_s}. \tag{1.24}$$
The determinant in the right-hand side of (1.24) has a block structure with the blocks given by r, s and having size $k_r \times k_s$. The theorem will be proved in Sect. 2.1. A case of particular interest is when the transition weights are given by Fourier coefficients. We are then in a situation similar to that in [30]. Let $f_r(e^{i\theta})$ be a function in $L^1(\mathbb{T})$ with Fourier coefficients $\hat{f}_r$. Assume that the transition weights are given by

$$\phi_{r,r+1}(x, y) = \hat{f}_r(y - x), \tag{1.25}$$

−M ≤ r < M, x, y ∈ Z, and that the initial and final configurations are given by $x_j^{-M} = x_j^{M} = 1 - j$, j = 1, . . . , n. If we set

$$f_{r,s}(z) = \prod_{\ell=r}^{s-1} f_\ell(z),$$

$z = e^{i\theta}$, then

$$\phi_{r,s}(x, y) = \hat{f}_{r,s}(y - x) \tag{1.26}$$

for r < s. The matrix A defined above is then a Toeplitz matrix with symbol

$$a(z) = f_{-M,M}(z) = \prod_{\ell=-M}^{M-1} f_\ell(z). \tag{1.27}$$

Define

$$\tilde{K}^{n,M}_{r,s}(z, w) = \sum_{x,y\in\mathbb{Z}}\tilde{K}^{n,M}(r, x; s, y)\,z^{x}w^{-y}, \tag{1.28}$$

where $\tilde{K}^{n,M}$ is given by (1.23). When the transition functions and the initial and final configurations are given in this way we are able to give a formula for the limit of this generating function as n → ∞.

Proposition 1.8. Assume that $f_r(z)$ has winding number zero, a Wiener-Hopf factorization $f_r(z) = f_r^{+}(z)f_r^{-}(z)$ and is analytic in $1 - \epsilon_r < |z| < 1 + \epsilon_r$ for some $\epsilon_r > 0$. Furthermore, suppose that

$$\sum_{n\in\mathbb{Z}}|n|^{\alpha}|\hat{a}_n| < \infty,$$

for some α > 0, where $\hat{a}_n$ are the Fourier coefficients of the symbol a(z) given by (1.27). Set $\epsilon = \min\epsilon_r$ and

$$\tilde{K}^{M}_{r,s}(z, w) = \frac{z}{z - w}\,G(z, w), \tag{1.29}$$

where

$$G(z, w) = \frac{\prod_{t=r}^{M-1} f_t^{-}(\frac{1}{z})\,\prod_{t=-M}^{s-1} f_t^{+}(\frac{1}{w})}{\prod_{t=-M}^{r-1} f_t^{+}(\frac{1}{z})\,\prod_{t=s}^{M-1} f_t^{-}(\frac{1}{w})}. \tag{1.30}$$
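Then, for $1 - \epsilon < |w| < 1 < |z| < 1 + \epsilon$,

$$\left|\tilde{K}^{n,M}_{r,s}(z, w) - \tilde{K}^{M}_{r,s}(z, w)\right| \le \frac{|f_{r,M}(\frac{1}{z})|\,|f_{-M,s}(\frac{1}{w})|}{(|z| - 1)(1 - |w|)}\left(\frac{1}{n^{\alpha}} + |w|^{n/2} + \frac{1}{|z|^{n/2}}\right). \tag{1.31}$$

Furthermore,

$$\phi_{r,s}(x, y) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(y-x)\theta}\,G(e^{i\theta}, e^{i\theta})\,d\theta, \tag{1.32}$$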
for r < s. The same type of formula for the limiting kernel was obtained in [30]. The formula will be proved in Sect. 2.1. This proposition makes it possible to compute the asymptotics of the kernel given by (1.22) in certain cases, since it gives an integral formula for the n → ∞ limit of $K^{n,M}$.

An outline of the paper is as follows. The PNG process can be described as the top process in a multilayer growth process as explained in Sect. 3. This multilayer growth process gives rise to families of non-intersecting paths, and using the Karlin-McGregor or Lindström-Gessel-Viennot method we see that it gives rise to a measure of the form (1.17), Proposition 3.13. Hence, there is an associated point process with determinantal correlation functions by Theorem 1.7. In the case we are studying the correlation kernel is given by a double contour integral, Theorem 3.14, which is a consequence of the more general Proposition 1.8. It follows, Proposition 2.1, that the joint probability in the left-hand side of (1.12) is given by a Fredholm determinant involving this correlation kernel, (3.27). To prove Theorem 1.1 we need to control the convergence of this correlation kernel, suitably rescaled, to the extended Airy kernel, (1.9). This asymptotic analysis is carried out in Sect. 4 and involves a standard saddle-point argument and some estimates of the kernel. We will not give all the details of the proof of the convergence of the Fredholm determinant. This is similar to the corresponding argument in [16], Lemma 3.1. More details of the proof of a completely analogous result are given in [22]. The proof of Theorem 1.2 uses standard methods, a moment estimate, to prove a functional limit theorem, [6]. We do not work directly with the PNG-process, the top curve in the multi-layer process. Instead we work with a linear statistic over all the layers, which suffices to prove the result.
This idea was used also in [31] to prove the continuity of the Airy process. The necessary moment estimate, Lemma 5.1, can then be handled using only the fourth order correlation functions. The expansion of the 4 × 4 determinant giving the fourth order correlation functions leads to many terms. In order to show the estimate in Lemma 5.1 these terms must be combined in the right way. We will not discuss all the terms in detail, but concentrate on typical terms and provide all the details for these. We need the moment estimate for finite N, so to prove Lemma 5.1 we need some estimates from Sect. 4 and also the explicit integral formula for the correlation kernel. In Sect. 5 we also give the proofs of the consequences of the functional limit theorem for the problem of the transversal fluctuations, and the proofs of Proposition 1.4 and Theorem 1.6. In Sect. 2 measures of the general form (1.17) are analyzed and, as an example, correlations for non-intersecting walks on the discrete lattice are discussed.
2. Determinantal Measures

2.1. General theory. In this section we will prove the results of Sect. 1.2. We will prove Theorem 1.7 using a generalization of the method of [37, 7] for β = 2 random matrix ensembles, see also [19]. It is also possible to generalize the approach of [10], which is closer to the original Dyson approach. Let $\Lambda_M = \{-M+1, \dots, M-1\} \times \mathbb{R}$, let λ be the counting measure on {−M + 1, . . . , M − 1} and ν = λ ⊗ µ. Furthermore, we let $g : \Lambda_M \to \mathbb{C}$ be a bounded function and define
\[
Z_{n,M}[g] = \frac{1}{(n!)^{2M-1}} \int_{(\mathbb{R}^n)^{2M-1}} \prod_{|r|<M} \prod_{j=1}^{n} \bigl(1 + g(r, x_j^r)\bigr)\, w_{n,M}(\bar x)\, d\mu(\bar x).
\]
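We want to compute $Z_{n,M}[g]/Z_{n,M}[0]$. Using the Heine identity (1.21) repeatedly we see that
\[
\begin{aligned}
Z_{n,M}[g] &= \frac{1}{(n!)^{2M-1}} \int_{(\mathbb{R}^n)^{2M-1}} \det\bigl(\phi_{-M,-M+1}(x_i^{-M}, x_j^{-M+1})\bigr)_{1\le i,j\le n} \\
&\qquad\times \prod_{r=-M+1}^{M-1} \det\bigl((1 + g(r, x_i^r))\,\phi_{r,r+1}(x_i^r, x_j^{r+1})\bigr)_{1\le i,j\le n}\, d\mu(\bar x) \\
&= \det\Bigl( \int_{\mathbb{R}^{2M-1}} \phi_{-M,-M+1}(x_i^{-M}, t_{-M+1}) \prod_{|r|<M} (1 + g(r, t_r)) \\
&\qquad\times \prod_{r=-M+1}^{M-2} \phi_{r,r+1}(t_r, t_{r+1})\; \phi_{M-1,M}(t_{M-1}, x_j^M)\, d^{2M-1}\mu(t) \Bigr)_{1\le i,j\le n}.
\end{aligned}
\]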
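The step above turns an integral of a product of two determinants into a single determinant. As a small sanity check, the following sketch verifies the discrete form of that identity on a finite set, assuming that (1.21) is the standard identity (often attributed to Andréief) $\frac{1}{n!}\sum_{x\in X^n}\det(\phi_i(x_j))\det(\psi_i(x_j)) = \det\bigl(\sum_{x\in X}\phi_i(x)\psi_j(x)\bigr)$. The functions `phi`, `psi` and the point set are hypothetical test data, not taken from the paper.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def det(m):
    """Determinant via the Leibniz permutation expansion (only for small n)."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        # Sign of the permutation from its number of inversions.
        inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = -1 if inv % 2 else 1
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total


def heine_lhs(phi, psi, points, n):
    """(1/n!) * sum over x in points^n of det(phi_i(x_j)) * det(psi_i(x_j))."""
    s = 0
    for x in product(points, repeat=n):
        a = [[phi(i, x[j]) for j in range(n)] for i in range(n)]
        b = [[psi(i, x[j]) for j in range(n)] for i in range(n)]
        s += det(a) * det(b)
    return Fraction(s, factorial(n))


def heine_rhs(phi, psi, points, n):
    """det of the n x n matrix with entries sum_x phi_i(x) * psi_j(x)."""
    return det([[sum(phi(i, x) * psi(j, x) for x in points) for j in range(n)]
                for i in range(n)])


# Hypothetical integer-valued test functions on a four-point set.
phi = lambda i, x: x ** i + i
psi = lambda i, x: (x + 1) ** i - 2 * i
points = [0, 1, 2, 3]
```

Exact integer arithmetic is used so that both sides can be compared with `==` rather than up to rounding.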
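Now,
\[
\prod_{|r|<M} (1 + g(r, t_r)) = 1 + \sum_{\ell=1}^{2M-1} \sum_{-M<r_1<\dots<r_\ell<M} g(r_1, t_{r_1}) \cdots g(r_\ell, t_{r_\ell}),
\]
and hence
\[
\begin{aligned}
Z_{n,M}[g] = \det\Bigl( A_{ij} + \sum_{\ell=1}^{2M-1} \sum_{-M<r_1<\dots<r_\ell<M} \int_{\mathbb{R}^\ell} &\phi_{-M,r_1}(x_i^{-M}, t_1) \prod_{s=1}^{\ell-1} g(r_s, t_s)\,\phi_{r_s,r_{s+1}}(t_s, t_{s+1}) \\
&\times g(r_\ell, t_\ell)\,\phi_{r_\ell,M}(t_\ell, x_j^M)\, d^\ell\mu(t) \Bigr)_{1\le i,j\le n},
\end{aligned}
\tag{2.1}
\]
where we have used the notation of Sect. 1.2. If we set g = 0 we obtain $Z_{n,M}[0] = Z_{n,M} = \det A$ as before. By definition $\phi_{r,s} = 0$ if r ≥ s, and hence we can remove the ordering of the $r_i$'s in (2.1). We find,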
K. Johansson
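\[
\begin{aligned}
\frac{Z_{n,M}[g]}{Z_{n,M}[0]} = \det\Bigl( \delta_{ij} + \sum_{k=1}^{n} \sum_{\ell=1}^{2M-1} \sum_{-M<r_1,\dots,r_\ell<M} &(A^{-1})_{ik} \int_{\mathbb{R}^\ell} \phi_{-M,r_1}(x_k^{-M}, t_1) \prod_{s=1}^{\ell-1} g(r_s, t_s)\,\phi_{r_s,r_{s+1}}(t_s, t_{s+1}) \\
&\times g(r_\ell, t_\ell)\,\phi_{r_\ell,M}(t_\ell, x_j^M)\, d^\ell\mu(t) \Bigr)_{1\le i,j\le n}.
\end{aligned}
\tag{2.2}
\]
Write $\psi(u, t; v, s) = g(u, t)\,\phi_{u,v}(t, s)$, and define $\psi^{*0}(u, t; v, s) = \delta_{uv}\,\delta(t - s)$, $\psi^{*1} = \psi$ and
\[
\psi^{*(r+1)}(u, t; v, s) = \int_{\Lambda_M^r} \psi(u, t; m_1, \xi_1)\,\psi(m_1, \xi_1; m_2, \xi_2) \cdots \psi(m_r, \xi_r; v, s)\, d^r\nu(m, \xi)
\]
for r ≥ 1. Note that, since $\phi_{r,s} = 0$ if r ≥ s, we have $\psi^{*\ell} = 0$ if ℓ ≥ 2M − 1. This follows immediately from the definition. The formula (2.2) can now be written
\[
\begin{aligned}
\frac{Z_{n,M}[g]}{Z_{n,M}[0]} = \det\Bigl( \delta_{ij} + \sum_{k=1}^{n} (A^{-1})_{ik} &\int_{\Lambda_M} d\nu(u, \xi) \int_{\Lambda_M} d\nu(v, \eta)\, \phi_{-M,u}(x_k^{-M}, \xi) \\
&\times \Bigl( \sum_{\ell=1}^{2M-1} \psi^{*(\ell-1)}(u, \xi; v, \eta) \Bigr) g(v, \eta)\,\phi_{v,M}(\eta, x_j^M) \Bigr)_{i,j=1,\dots,n}.
\end{aligned}
\tag{2.3}
\]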
If K(x, y) is an integral kernel on $L^2(\Lambda, \mu)$ we define the determinant $\det(I + K)_{L^2(\Lambda,\mu)}$ via a Fredholm expansion,
\[
\det(I + K)_{L^2(\Lambda,\mu)} = \sum_{m=0}^{\infty} \frac{1}{m!} \int_{\Lambda^m} \det\bigl(K(x_i, x_j)\bigr)_{i,j=1,\dots,m}\, d^m\mu(x).
\tag{2.4}
\]
We assume that K is such that all the integrals are well-defined and the series converges. For example, by Hadamard's inequality, [15] Corollary 7.8.2, it is sufficient to require that |K(x, y)| ≤ a(x)b(y), where $a \in L^1(\Lambda, \mu)$, $b \in L^\infty(\Lambda, \mu)$. Note that if Λ = {1, . . . , n} and µ is a counting measure this is the ordinary determinant $\det(\delta_{ij} + K(i, j))_{i,j=1,\dots,n}$. Let K(x, y) be an integral kernel from $L^2(\Lambda_1, \mu_1)$ to $L^2(\Lambda_2, \mu_2)$ and L(x, y) an integral kernel from $L^2(\Lambda_2, \mu_2)$ to $L^2(\Lambda_1, \mu_1)$. Then
\[
L * K(x, y) = \int_{\Lambda_2} L(x, z)\,K(z, y)\, d\mu_2(z)
\]
is an integral kernel on $L^2(\Lambda_1, \mu_1)$. Furthermore,
\[
\det(I + L * K)_{L^2(\Lambda_1,\mu_1)} = \det(I + K * L)_{L^2(\Lambda_2,\mu_2)}.
\tag{2.5}
\]
This is easy to see using the Heine identity in the definition (2.4).
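For a finite set with counting measure both statements above can be checked directly. The sketch below, with hypothetical integer test matrices, evaluates the Fredholm expansion (2.4) on {0, ..., n−1} (terms with m > n vanish because a point must repeat) and compares it with the ordinary determinant det(I + K). It also checks the finite-dimensional form of (2.5) for rectangular matrices.

```python
from fractions import Fraction
from itertools import permutations, product
from math import factorial


def det(m):
    """Leibniz expansion; the determinant of the empty matrix is 1."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = -1 if inv % 2 else 1
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total


def fredholm_det(K):
    """Sum over m of (1/m!) * sum over x in {0..n-1}^m of det(K(x_i, x_j))."""
    n = len(K)
    total = Fraction(0)
    for m in range(n + 1):  # for m > n every term has a repeated row
        s = sum(det([[K[a][b] for b in x] for a in x])
                for x in product(range(n), repeat=m))
        total += Fraction(s, factorial(m))
    return total


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def one_plus(m):
    """Return I + m for a square matrix m."""
    return [[m[i][j] + (1 if i == j else 0) for j in range(len(m))] for i in range(len(m))]


# Hypothetical test data.
K = [[1, 2, 0], [-1, 3, 1], [2, 0, -2]]
L_rect = [[1, 0, 2], [3, -1, 1]]      # 2 x 3
K_rect = [[2, 1], [0, -1], [1, 4]]    # 3 x 2
```

The second check is the matrix identity det(I + LK) = det(I + KL), which is what (2.5) reduces to when both spaces are finite.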
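Set
\[
b(i; u, \xi) = \sum_{k=1}^{n} (A^{-1})_{ik}\,\phi_{-M,u}(x_k^{-M}, \xi),
\qquad
c(u, \xi; j) = \sum_{\ell=1}^{2M-1} \int_{\Lambda_M} \psi^{*(\ell-1)}(u, \xi; v, \eta)\, g(v, \eta)\,\phi_{v,M}(\eta, x_j^M)\, d\nu(v, \eta),
\]
so that, by (2.3) and (2.5),
\[
\frac{Z_{n,M}[g]}{Z_{n,M}[0]} = \det\bigl(\delta_{ij} + (b * c)(i, j)\bigr)_{1\le i,j\le n} = \det(I + c * b)_{L^2(\Lambda_M,\nu)}.
\]
Now, a computation shows that
\[
(c * b)(u, \xi; v, \eta) = \sum_{\ell=1}^{2M-1} \psi^{*(\ell-1)} * (g\tilde K)(u, \xi; v, \eta),
\]
where $\tilde K$ is defined by (1.23). Thus,
\[
\frac{Z_{n,M}[g]}{Z_{n,M}[0]} = \det\Bigl( I + \sum_{\ell=1}^{2M-1} \psi^{*(\ell-1)} * (g\tilde K) \Bigr)_{L^2(\Lambda_M,\nu)}.
\tag{2.6}
\]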
The kernel in (2.6) has finite rank, so the sum (2.4) in the definition of the determinant actually has finitely many terms. We now claim that the right-hand side of (2.6) equals
\[
\det(I - \psi + g\tilde K)_{L^2(\Lambda_M,\nu)},
\]
which is what we want. Formally the computation goes as follows. Since $\psi^{*\ell} = 0$ for ℓ ≥ 2M − 1, the sum $\sum_{\ell=1}^{2M-1} \psi^{*(\ell-1)}$ is the Neumann series for $(I - \psi)^{-1}$, so the expression in (2.6) is $\det(I + (I - \psi)^{-1} g\tilde K)$, and we multiply this by det(I − ψ) = 1. Since we are only working with determinants defined by a Fredholm expansion the product rule is not obvious, so we will give a proof in this special case.

Write $a = g\tilde K$. We will prove that for any z, w ∈ C,
\[
\det\Bigl( I + w \sum_{j=1}^{m} z^j\, \psi^{*(j-1)} * a \Bigr)_{L^2(\Lambda_M)} = \det(I - z\psi + zwa)_{L^2(\Lambda_M)},
\tag{2.7}
\]
where m = 2M − 1. The left-hand side is a polynomial in z, w so it suffices to prove (2.7) for |z|, |w| sufficiently small. In that case, under our assumption on $\phi_{r,r+1}$, all the expressions below are well-defined and convergent. Write $b = \sum_{j=1}^{m} z^j\, \psi^{*(j-1)} * a$. Then, see e.g. [27, 33],
\[
\det(I + wb)_{L^2(\Lambda_M)} = \exp\Bigl( \sum_{k=1}^{\infty} \frac{(-1)^{k+1} w^k}{k} \int_{\Lambda_M} b^{*k}(t, t)\, d\nu(t) \Bigr)
\]
and
\[
\det(I + z(-\psi + wa))_{L^2(\Lambda_M)} = \exp\Bigl( \sum_{k=1}^{\infty} \frac{(-1)^{k+1} z^k}{k} \int_{\Lambda_M} (-\psi + wa)^{*k}(t, t)\, d\nu(t) \Bigr).
\]
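Set c = za, d = zψ. It suffices to show that
\[
\sum_{k=1}^{\infty} \frac{(-1)^{k+1} w^k}{k} \int_{\Lambda_M} \Bigl( \sum_{j=0}^{m-1} d^{*j} * c \Bigr)^{*k}(t, t)\, d\nu(t)
= \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} \int_{\Lambda_M} (-d + wc)^{*k}(t, t)\, d\nu(t).
\tag{2.8}
\]
The equality (2.8) holds for w = 0 since
\[
\int_{\Lambda_M} (-d)^{*k}(t, t)\, d\nu(t) = (-z)^k \int_{\Lambda_M} \psi^{*k}(t, t)\, d\nu(t) = 0
\]
if k ≥ 1. This follows from $\phi_{r,s} = 0$ for r ≥ s. Hence it is enough to show that the derivatives with respect to w of the two sides of (2.8) coincide,
\[
\sum_{k=0}^{\infty} (-1)^k w^k \int_{\Lambda_M} \Bigl( \sum_{j=0}^{m-1} d^{*j} * c \Bigr)^{*(k+1)}(t, t)\, d\nu(t)
= \sum_{n=0}^{\infty} (-1)^n \int_{\Lambda_M} \bigl( (-d + wc)^{*n} * c \bigr)(t, t)\, d\nu(t).
\]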
To prove this last equality is a straightforward but somewhat tedious computation, which is based on expanding both sides and showing that the coefficient of $w^k$ is the same on both sides. We omit the details. We have proved

Proposition 2.1. Let $p_{n,M}(\bar x)$ be defined by (1.19) and assume that $\phi_{r,r+1}(x, y)$ satisfies $|\phi_{r,r+1}(x, y)| \le c(x)d(y)$ with $c \in L^1(\mathbb{R}, \mu)$, $d \in L^\infty(\mathbb{R}, \mu)$, −M ≤ r < M. Furthermore, let $\Lambda_M = \{-M+1, \dots, M-1\} \times \mathbb{R}$, ν = λ ⊗ µ, where λ is a counting measure on {−M + 1, . . . , M − 1}, and let $g : \Lambda_M \to \mathbb{C}$ be a bounded function. Then
\[
\int_{(\mathbb{R}^n)^{2M-1}} \prod_{|\mu|<M} \prod_{j=1}^{n} \bigl(1 + g(\mu, x_j^{\mu})\bigr)\, p_{n,M}(\bar x)\, d\mu(\bar x) = \det(I + gK)_{L^2(\Lambda_M,\nu)},
\tag{2.9}
\]
where K is given by (1.22), and the determinant is defined by using the Fredholm expansion (2.4).

Theorem 1.7 is a direct consequence of (2.9), compare with the discussion in [37]. If $X_m = \{1, \dots, m\}$ and λ is counting measure on $X_m$, then $L^2(X_m, \lambda) \cong \mathbb{R}^m$ and we have a chain of isomorphisms
\[
L^2(X_m \times \Lambda, \lambda \otimes \mu) \cong L^2(X_m, \lambda) \otimes L^2(\Lambda, \mu) \cong \mathbb{R}^m \otimes L^2(\Lambda, \mu) \cong L^2(\Lambda, \mu) \oplus \cdots \oplus L^2(\Lambda, \mu),
\]
where we have m terms in the last direct sum. We can think of an element in $\mathbb{R}^m \otimes L^2(\Lambda, \mu)$ as a column vector $(f_1(x) \dots f_m(x))^t$, where $f_i \in L^2(\Lambda, \mu)$, 1 ≤ i ≤ m. Hence, an operator on $L^2(X_m \times \Lambda, \lambda \otimes \mu)$ defined by an integral kernel $K(r, \xi; r', \xi')$ can be thought of as a block operator on these column vectors with block kernel $(K(r, \xi; r', \xi'))_{1\le r,r'\le m}$.
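The determinant identity (2.7) used in the proof has a transparent finite-dimensional analogue: if ψ is a nilpotent matrix with ψ^m = 0, then det(I − zψ) = 1 and (I − zψ)(I + w Σ_{j=1}^m z^j ψ^{j−1} a) = I − zψ + zwa. The following sketch checks this with exact rational arithmetic. The matrices `psi` and `a` and the values of `z`, `w` are hypothetical test data; the strictly upper triangular `psi` only mimics the property φ_{r,s} = 0 for r ≥ s.

```python
from fractions import Fraction
from itertools import permutations


def det(m):
    """Leibniz expansion, adequate for the 4 x 4 matrices used here."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = -1 if inv % 2 else 1
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total


def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def lincomb(c1, a, c2, b):
    """Entrywise c1 * a + c2 * b."""
    n = len(a)
    return [[c1 * a[i][j] + c2 * b[i][j] for j in range(n)] for i in range(n)]


def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def lhs_2_7(psi, a, z, w, m):
    """det(I + w * sum_{j=1}^m z^j psi^(j-1) a)."""
    n = len(psi)
    power = identity(n)                      # holds psi^(j-1)
    acc = [[0] * n for _ in range(n)]
    for j in range(1, m + 1):
        acc = lincomb(1, acc, z ** j, matmul(power, a))
        power = matmul(power, psi)
    return det(lincomb(1, identity(n), w, acc))


def rhs_2_7(psi, a, z, w):
    """det(I - z psi + z w a)."""
    n = len(psi)
    return det(lincomb(1, lincomb(1, identity(n), -z, psi), z * w, a))


# Hypothetical data: psi is strictly upper triangular, so psi^4 = 0.
psi = [[0, 2, -1, 3], [0, 0, 4, 1], [0, 0, 0, -2], [0, 0, 0, 0]]
a = [[1, 0, 2, -1], [3, 1, 0, 2], [-2, 1, 1, 0], [0, 5, -1, 1]]
z, w = Fraction(2, 3), Fraction(-5, 7)
```

Here m = 4 plays the role of 2M − 1 in the text, the index beyond which the powers of ψ vanish.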
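We also want to prove Proposition 1.8. Let us write $T_n(a)$ for the n × n Toeplitz matrix with symbol a and T(a) for the one-sided infinite Toeplitz matrix with symbol a. Consider the function $\tilde K_{n,M}(z, w)$ defined by (1.28) and let the symbol a be given by (1.27). Then,
\[
\begin{aligned}
\tilde K_{n,M}(z, w) &= \sum_{x,y\in\mathbb{Z}} \sum_{i,j=1}^{n} \phi_{r,M}(x, 1 - i)\,[T_n^{-1}(a)]_{ij}\,\phi_{-M,s}(1 - j, y)\, z^x w^{-y} \\
&= \sum_{i,j=1}^{n} \Bigl( \sum_{x\in\mathbb{Z}} \hat f_{r,M}(1 - i - x)\, z^{x+i-1} \Bigr) z^{1-i}\,[T_n^{-1}(a)]_{ij}\, w^{j-1} \Bigl( \sum_{y\in\mathbb{Z}} \hat f_{-M,s}(y + j - 1)\, w^{-y+1-j} \Bigr) \\
&= \frac{z}{w}\, f_{r,M}\Bigl(\frac{1}{z}\Bigr) f_{-M,s}\Bigl(\frac{1}{w}\Bigr) \sum_{i,j=1}^{n} z^{-i}\,[T_n^{-1}(a)]_{ij}\, w^{j}.
\end{aligned}
\tag{2.10}
\]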
To proceed we need a formula for the inverse of a Toeplitz matrix. We will use the following result which follows from Theorem 1.15 and Theorem 2.15, together with its proof, in [8].

Proposition 2.2. Assume that $a(z) = a_+(z)a_-(z)$, z ∈ T, where
\[
a_+(z) = \sum_{n=0}^{\infty} a_n^{+} z^n, \qquad a_-(z) = \sum_{n=0}^{\infty} a_{-n}^{-} z^{-n},
\tag{2.11}
\]
$\sum_{n=0}^{\infty} (|a_n^{+}| + |a_{-n}^{-}|) < \infty$, and that a(z) has winding number zero. Furthermore, suppose that
\[
\sum_{n\in\mathbb{Z}} |n|^{\alpha}\, |\hat a_n| < \infty
\tag{2.12}
\]
for some α > 0, where $\hat a_n$ is the Fourier coefficient of a(z). Using (2.11) we can extend $a_+(z)$ to |z| ≤ 1 and $a_-(z)$ to {|z| ≥ 1} ∪ {∞} and we assume that they have no zeros in these regions. Then, $T_n(a)$ is invertible for n sufficiently large and there is a constant C (which depends on a) such that
\[
\bigl| [T_n^{-1}(a)]_{jk} - [T(a_+^{-1})\,T(a_-^{-1})]_{jk} \bigr| \le C \min\Bigl( \frac{1}{(n + 1 - k)^{\alpha}}, \frac{1}{(n + 1 - j)^{\alpha}} \Bigr)
\tag{2.13}
\]
for 1 ≤ j, k ≤ n.

We can now give the proof of Proposition 1.8.

Proof (of Proposition 1.8). The function a defined by (1.27) has a Wiener-Hopf factorization $a = a_+ a_-$, where
\[
a_{\pm}(z) = \prod_{t=-M}^{M-1} f_t^{\pm}(z),
\]
and all the assumptions of the previous proposition are satisfied. By (2.13)
\[
\begin{aligned}
\Bigl| \sum_{i,j=1}^{n} z^{-i}\,[T_n^{-1}(a)]_{ij}\, w^{j} - \sum_{i,j=1}^{n} z^{-i}\,[T(a_+^{-1})\,T(a_-^{-1})]_{ij}\, w^{j} \Bigr|
&\le C \sum_{i,j=1}^{n} |z|^{-i} |w|^{j} \min\Bigl( \frac{1}{(n + 1 - i)^{\alpha}}, \frac{1}{(n + 1 - j)^{\alpha}} \Bigr) \\
&\le \frac{C}{(|z| - 1)(1 - |w|)} \Bigl( \frac{1}{n^{\alpha}} + |w|^{n/2} + \frac{1}{|z|^{n/2}} \Bigr).
\end{aligned}
\]
Also,
\[
\Bigl| \sum_{i > n \text{ or } j > n} z^{-i}\,[T(a_+^{-1})\,T(a_-^{-1})]_{ij}\, w^{j} \Bigr| \le C\, \frac{|w|^{n} + |1/z|^{n}}{(|z| - 1)(1 - |w|)}.
\]
Set $b_{\pm} = 1/a_{\pm}$ and note that $(\hat b_+)_k = 0$ if k < 0 and $(\hat b_-)_k = 0$ if k > 0. We can now compute
\[
\begin{aligned}
\sum_{i,j=1}^{\infty} z^{-i}\,[T(a_+^{-1})\,T(a_-^{-1})]_{ij}\, w^{j}
&= \sum_{k=1}^{\infty} \Bigl(\frac{w}{z}\Bigr)^{k} \sum_{i\in\mathbb{Z}} z^{-i+k} (\hat b_+)_{i-k} \sum_{j\in\mathbb{Z}} w^{j-k} (\hat b_-)_{k-j} \\
&= \frac{1}{a_+(1/z)\,a_-(1/w)}\, \frac{w/z}{1 - w/z}.
\end{aligned}
\]
It follows that
\[
\frac{z}{w}\, f_{r,M}\Bigl(\frac{1}{z}\Bigr) f_{-M,s}\Bigl(\frac{1}{w}\Bigr) \sum_{i,j=1}^{\infty} z^{-i}\,[T(a_+^{-1})\,T(a_-^{-1})]_{ij}\, w^{j}
= \frac{z}{z - w}\, \frac{\prod_{t=r}^{M-1} f_t^{+}(\frac{1}{z}) f_t^{-}(\frac{1}{z}) \prod_{t=-M}^{s-1} f_t^{+}(\frac{1}{w}) f_t^{-}(\frac{1}{w})}{\prod_{t=-M}^{M-1} f_t^{+}(\frac{1}{z}) f_t^{-}(\frac{1}{w})}
= K_{r,s}^{M}(z, w),
\]
and the proposition is proved.
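The content of Proposition 2.2 can be seen numerically in the simplest case. The sketch below takes the hypothetical symbol a(z) = (1 − pz)(1 − q/z), so that a_+(z) = 1 − pz and a_−(z) = 1 − q/z, and uses the convention [T(a)]_{jk} = â_{j−k} (an assumption about the paper's convention). Then 1/a_+ has Fourier coefficients p^n (n ≥ 0), 1/a_− has coefficients q^n at index −n, and [T(1/a_+)T(1/a_−)]_{jk} = Σ_{l=1}^{min(j,k)} p^{j−l} q^{k−l}. Entries of T_n(a)^{-1} far from the lower-right corner should agree with this closed form.

```python
def toeplitz_matrix(n, p, q):
    """T_n(a) for a(z) = (1 - p z)(1 - q / z), with entries a_hat_{j-k}."""
    coeff = {0: 1.0 + p * q, 1: -p, -1: -q}
    return [[coeff.get(j - k, 0.0) for k in range(n)] for j in range(n)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def wiener_hopf_entry(j, k, p, q):
    """[T(1/a_+) T(1/a_-)]_{jk} for 1-based indices j, k."""
    return sum(p ** (j - l) * q ** (k - l) for l in range(1, min(j, k) + 1))


def entry_error(n, p, q, j, k):
    """|[T_n(a)^{-1}]_{jk} - [T(1/a_+) T(1/a_-)]_{jk}| for 1-based j, k."""
    unit = [1.0 if i == k - 1 else 0.0 for i in range(n)]
    column = solve(toeplitz_matrix(n, p, q), unit)   # column k of the inverse
    return abs(column[j - 1] - wiener_hopf_entry(j, k, p, q))


def max_block_error(n, p, q, block):
    """Largest error over the top-left block x block corner."""
    return max(entry_error(n, p, q, j, k)
               for j in range(1, block + 1) for k in range(1, block + 1))
```

With p = 0.5, q = 0.3 and n = 40 the top-left entries agree to rounding error, while the (n, n) entry does not, in line with the bound (2.13), which is only small when n + 1 − j or n + 1 − k is large.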
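We have
\[
K(r, x; s, y) = \tilde K(r, x; s, y) = \frac{1}{(2\pi i)^2} \int_{\gamma_{r_2}} \frac{dz}{z^{x+1}} \int_{\gamma_{r_1}} w^{y-1}\, dw\, \frac{z}{z - w}\, G(z, w)
\tag{2.14}
\]
if r ≥ s, where $1 - \epsilon < r_1 < r_2 < 1 + \epsilon$. Using the residue theorem it follows that for r < s,
\[
K(r, x; s, y) = \frac{1}{(2\pi i)^2} \int_{\gamma_{r_1}} \frac{dz}{z^{x+1}} \int_{\gamma_{r_2}} w^{y-1}\, dw\, \frac{z}{z - w}\, G(z, w),
\tag{2.15}
\]
$1 - \epsilon < r_1 < r_2 < 1 + \epsilon$, compare [30].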
2.2. The extended Airy kernel. The extended Airy kernel is defined by (1.9). We can also define a modification by
\[
\tilde A(\tau, \xi; \tau', \xi') = \int_0^{\infty} e^{-\lambda(\tau - \tau')} \operatorname{Ai}(\xi + \lambda) \operatorname{Ai}(\xi' + \lambda)\, d\lambda,
\tag{2.16}
\]
which is well-defined both for τ ≥ τ′ and for τ < τ′ by the following standard estimate for the Airy function,
\[
|\operatorname{Ai}(\xi)| \le C_M\, e^{-2|\xi|^{3/2}/3}
\tag{2.17}
\]
for ξ ≥ −M. Both A and $\tilde A$ have a useful double integral formula.

Proposition 2.3. The extended Airy kernel (1.9) is given by
\[
A(\tau, \xi; \tau', \xi') = -\frac{1}{4\pi^2} \int_{\operatorname{Im} z = \eta} dz \int_{\operatorname{Im} w = \eta'} dw\, \frac{e^{i\xi z + i\xi' w + i(z^3 + w^3)/3}}{\tau' - \tau + i(z + w)},
\tag{2.18}
\]
where η, η′ > 0 and η + η′ + τ − τ′ < 0 in case τ′ > τ. Also, the modified kernel $\tilde A$, (2.16), is given by the same formula but where we now require that η + η′ + τ − τ′ > 0.

Proof. This is straightforward using the identities
\[
\int_0^{\infty} e^{-\lambda(\tau - \tau' - iz - iw)}\, d\lambda = -\frac{1}{\tau' - \tau + i(z + w)}
\]
if τ − τ′ + η + η′ > 0 and
\[
\int_0^{\infty} e^{-\lambda(\tau' - \tau + iz + iw)}\, d\lambda = \frac{1}{\tau' - \tau + i(z + w)}
\]
if τ − τ′ + η + η′ < 0, and the integral formula
\[
\operatorname{Ai}(x) = \frac{1}{2\pi} \int_{\operatorname{Im} z = \eta} e^{\frac{i}{3} z^3 + ixz}\, dz,
\]
η > 0, for the Airy function.
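For the modified kernel the computation behind Proposition 2.3 can be written out as follows. This is a sketch of the omitted step, with primes marking the second pair of variables and assuming that the order of integration may be interchanged (which the decay of the integrand on the lines Im z = η, Im w = η′ suggests). Inserting the contour representation of the Airy function twice into (2.16) gives

\[
\begin{aligned}
\tilde A(\tau, \xi; \tau', \xi')
&= \frac{1}{4\pi^2} \int_{\operatorname{Im} z = \eta} dz \int_{\operatorname{Im} w = \eta'} dw\; e^{i\xi z + i\xi' w + i(z^3 + w^3)/3} \int_0^{\infty} e^{-\lambda(\tau - \tau' - iz - iw)}\, d\lambda \\
&= \frac{1}{4\pi^2} \int_{\operatorname{Im} z = \eta} dz \int_{\operatorname{Im} w = \eta'} dw\; \frac{e^{i\xi z + i\xi' w + i(z^3 + w^3)/3}}{\tau - \tau' - i(z + w)} \\
&= -\frac{1}{4\pi^2} \int_{\operatorname{Im} z = \eta} dz \int_{\operatorname{Im} w = \eta'} dw\; \frac{e^{i\xi z + i\xi' w + i(z^3 + w^3)/3}}{\tau' - \tau + i(z + w)} .
\end{aligned}
\]

The λ-integral converges because $\operatorname{Re}(\tau - \tau' - iz - iw) = \tau - \tau' + \eta + \eta' > 0$, which is exactly the condition required for $\tilde A$.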
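If we move the contour of integration between the two cases in Proposition 2.3 we pick up a contribution from the singularity and we obtain
\[
A(\tau, \xi; \tau', \xi') = \tilde A(\tau, \xi; \tau', \xi') - \phi_{\tau,\tau'}(\xi, \xi'),
\tag{2.19}
\]
where $\phi_{\tau,\tau'} \equiv 0$ if τ ≥ τ′ and
\[
\phi_{\tau,\tau'}(\xi, \xi') = \frac{1}{\sqrt{4\pi(\tau' - \tau)}}\, e^{-(\xi - \xi')^2/4(\tau' - \tau) - (\tau' - \tau)(\xi + \xi')/2 + (\tau' - \tau)^3/12}
\tag{2.20}
\]
if τ < τ′. Combining (1.9), (2.16) and (2.19) we see that
\[
\phi_{\tau,\tau'}(\xi, \xi') = \int_{-\infty}^{\infty} e^{-\lambda(\tau - \tau')} \operatorname{Ai}(\xi + \lambda) \operatorname{Ai}(\xi' + \lambda)\, d\lambda
\tag{2.21}
\]
if τ < τ′. We would also like to show that the operator in (1.10) is actually a trace class operator.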
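Proposition 2.4. Let f(τ, x) be a non-negative function in $L^\infty(\mathbb{R})$ for each $\tau \in \{\tau_1, \dots, \tau_m\}$, where $\tau_1 < \dots < \tau_m$. Assume also that $f(\tau_k, x) = 0$ if $x < M_k$ for some number $M_k$, k = 1, . . . , m. Then, the kernel $f(\tau, x)^{1/2} A(\tau, x; \tau', x') f(\tau', x')^{1/2}$ defines a trace class operator on $L^2(\{\tau_1, \dots, \tau_m\} \times \mathbb{R})$, where we have counting measure λ on $\{\tau_1, \dots, \tau_m\}$ and Lebesgue measure µ on R.

Proof. We will prove the result by factoring into two Hilbert-Schmidt operators. Let H(t) = 1 if t < 0 and H(t) = 0 if t ≥ 0. Set
\[
\tilde B(\tau, x; \tau', x') = H(\tau - \tau') \int_{-\infty}^{\infty} e^{-y(\tau - \tau')} \operatorname{Ai}(x + y) \operatorname{Ai}(x' + y)\, dy.
\]
For i < j we define $\tilde B_{ij}(\tau, x; \tau', x') = \tilde B(\tau, x; \tau', x')\, \delta_{\tau,\tau_i} \delta_{\tau',\tau_j}$ so that
\[
\tilde B(\tau, x; \tau', x') = \sum_{1\le i<j\le m} \tilde B_{ij}(\tau, x; \tau', x').
\]
Since, by (2.19) and (2.21), $A = \tilde A - \tilde B$, it suffices to show that $f^{1/2} \tilde A f^{1/2}$ and $f^{1/2} \tilde B_{ij} f^{1/2}$, 1 ≤ i < j ≤ m, are trace class operators. Set
\[
\begin{aligned}
a(\tau, x; \sigma, y) &= \frac{1}{\sqrt m}\, f(\tau, x)^{1/2} \operatorname{Ai}(x + y)\, e^{-y(\tau - \sigma)} \chi_{[0,\infty)}(y), \\
b(\sigma, y; \tau', x') &= \frac{1}{\sqrt m}\, \chi_{[0,\infty)}(y) \operatorname{Ai}(x' + y)\, e^{-y(\sigma - \tau')} f(\tau', x')^{1/2}.
\end{aligned}
\]
Then a and b are Hilbert-Schmidt kernels on $L^2(\Lambda_m, \lambda \otimes \mu)$, $\Lambda_m = \{\tau_1, \dots, \tau_m\} \times \mathbb{R}$. We have
\[
\begin{aligned}
\int_{\Lambda_m} \int_{\Lambda_m} |a(\tau, x; \sigma, y)|^2\, d(\lambda \otimes \mu)(\tau, x)\, d(\lambda \otimes \mu)(\sigma, y)
&= \frac{1}{m} \int_{\Lambda_m} \int_{\Lambda_m} f(\tau, x) \operatorname{Ai}(x + y)^2 \chi_{[0,\infty)}(y)\, e^{-2y(\tau - \sigma)}\, dx\, dy\, d\lambda(\tau)\, d\lambda(\sigma) \\
&\le \frac{\|f\|_\infty}{m} \sum_{i,j=1}^{m} \int_M^{\infty} dx \int_0^{\infty} dy \operatorname{Ai}(x + y)^2\, e^{2y(\tau_m - \tau_1)},
\end{aligned}
\]
where $M = \min(M_1, \dots, M_m)$. Using (2.17) we see that the integral in the last expression is < ∞. The proof that b is a Hilbert-Schmidt kernel is analogous. Now,
\[
\begin{aligned}
\int_{\Lambda_m} a(\tau, x; \sigma, y)\, b(\sigma, y; \tau', x')\, d(\lambda \otimes \mu)(\sigma, y)
&= \frac{1}{m}\, f(\tau, x)^{1/2} f(\tau', x')^{1/2} \sum_{\sigma \in \{\tau_1, \dots, \tau_m\}} \int_0^{\infty} e^{-y(\tau - \tau')} \operatorname{Ai}(x + y) \operatorname{Ai}(x' + y)\, dy \\
&= f(\tau, x)^{1/2} \tilde A(\tau, x; \tau', x') f(\tau', x')^{1/2}.
\end{aligned}
\]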
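Hence, the operator $f^{1/2} \tilde A f^{1/2}$ is trace class. Next, set
\[
c_{ij}(\tau, x; \sigma, y) = \frac{1}{\sqrt m}\, f(\tau, x)^{1/2} \operatorname{Ai}(x + y)\, e^{-y(\tau - \tau_j)/2}\, \delta_{\tau,\tau_i}
\]
(it is independent of σ) and
\[
d_{ij}(\sigma, y; \tau', x') = \frac{1}{\sqrt m}\, f(\tau', x')^{1/2} \operatorname{Ai}(x' + y)\, e^{-y(\tau_i - \tau')/2}\, \delta_{\tau',\tau_j}.
\]
Then,
\[
\begin{aligned}
\int_{\Lambda_m} c_{ij}(\tau, x; \sigma, y)\, d_{ij}(\sigma, y; \tau', x')\, d(\lambda \otimes \mu)(\sigma, y)
&= \frac{1}{m}\, f(\tau, x)^{1/2} f(\tau', x')^{1/2} \sum_{\sigma \in \{\tau_1, \dots, \tau_m\}} \int_{-\infty}^{\infty} e^{-y(\tau - \tau')} \operatorname{Ai}(x + y) \operatorname{Ai}(x' + y)\, dy\; \delta_{\tau,\tau_i} \delta_{\tau',\tau_j} \\
&= f(\tau, x)^{1/2} \tilde B_{ij}(\tau, x; \tau', x') f(\tau', x')^{1/2}.
\end{aligned}
\]
It remains to prove that $c_{ij}$ and $d_{ij}$ are Hilbert-Schmidt kernels. Consider $c_{ij}$; the proof for $d_{ij}$ is similar. We get
\[
\begin{aligned}
\frac{1}{m} \sum_{\sigma,\tau \in \{\tau_1, \dots, \tau_m\}} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\tau, x) \operatorname{Ai}(x + y)^2 e^{-y(\tau - \tau_j)}\, dx\, dy\, \delta_{\tau,\tau_i}
&= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(\tau_i, x) \operatorname{Ai}(x + y)^2 e^{-y(\tau_i - \tau_j)}\, dx\, dy \\
&\le \|f\|_\infty \int_{M_i}^{\infty} dx \int_{-\infty}^{\infty} dy \operatorname{Ai}(x + y)^2 e^{y(\tau_j - \tau_i)} \\
&\le \|f\|_\infty \int_{M_i}^{\infty} dx \int_0^{\infty} dy \operatorname{Ai}(x + y)^2 e^{y(\tau_j - \tau_i)} \\
&\quad + \|f\|_\infty \int_{M_i}^{\infty} dx \int_{-\infty}^{0} dy \operatorname{Ai}(x + y)^2 e^{y(\tau_j - \tau_i)}.
\end{aligned}
\]
The first integral in the last expression is < ∞ by (2.17). Now, by (2.17),
\[
\int_{M_i}^{\infty} \operatorname{Ai}(x + y)^2\, dx = \int_{M_i + y}^{\infty} \operatorname{Ai}(x)^2\, dx \le C(1 + |y|),
\]
since the Airy function is bounded. Hence,
\[
\|f\|_\infty \int_{M_i}^{\infty} dx \int_{-\infty}^{0} dy \operatorname{Ai}(x + y)^2 e^{y(\tau_j - \tau_i)} \le C \|f\|_\infty \int_{-\infty}^{0} (1 + |y|)\, e^{y(\tau_j - \tau_i)}\, dy < \infty,
\]
since $\tau_j - \tau_i > 0$. This completes the proof.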
2.3. An example: Random walks on the discrete circle. We will consider non-intersecting walks on the set $\mathbb{Z}_N$ of integers modulo N, the discrete circle. This type of model has been analyzed in [12] and we will show how it fits into the present formalism. We have 2M − 1 copies of $\mathbb{Z}_N$, where the first and the last are identified so that we have periodic boundary conditions in the time direction. We will have non-intersecting paths on the discrete torus. Let $x^r \in \mathbb{Z}_N^n$ be the particle configuration (n particles) on the r th discrete circle, |r| < M, $x^{-M+1} \equiv x^{M-1}$. Assume that n is odd, n = 2ν + 1, and that the transition probabilities for the walks are given by
\[
\phi_{r,r+1}(x, y) = \begin{cases} p, & \text{if } y - x = 1, \\ q, & \text{if } y - x = 0, \\ 0, & \text{otherwise} \end{cases}
\]
for $x, y \in \mathbb{Z}_N$, where p, q ≥ 0 and p + q = 1. The probability for non-intersecting paths from a configuration $x^r$ to a configuration $x^{r+1}$ is $\det(\phi_{r,r+1}(x_i^r, x_j^{r+1}))_{1\le i,j\le n}$. We can think of this as an un-normalized transition probability. Write $\bar x = (x^{-M+1}, \dots, x^{M-1}) \in (\mathbb{Z}_N^n)^{2M-1}$ for the total configuration. The probability of $\bar x$ is
\[
q_{n,N,M}(\bar x) = \prod_{r=-M+1}^{M-2} \det\bigl(\phi_{r,r+1}(x_i^r, x_j^{r+1})\bigr)_{1\le i,j\le n}.
\tag{2.22}
\]
We will use discrete Fourier series on $\mathbb{Z}_N$,
\[
\hat f(n) = \frac{1}{N} \sum_{\ell \in \mathbb{Z}_N} f(\ell)\, z^{-\ell n}, \qquad f(\ell) = \sum_{n \in \mathbb{Z}_N} \hat f(n)\, z^{\ell n},
\]
where $z = e^{2\pi i/N}$. Also, we can represent Kronecker's delta on $\mathbb{Z}_N$ as
\[
\delta_{xy} = \frac{1}{N} \sum_{k \in \mathbb{Z}_N} z^{k(x - y)}.
\tag{2.23}
\]
Let $\tilde{\mathbb{Z}}_N^n = \{x \in \mathbb{Z}_N^n \,;\, 0 \le x_1 < \dots < x_n < N\}$ be all ordered configurations of n particles on $\mathbb{Z}_N$. We identify $\mathbb{Z}_N$ with {0, 1, . . . , N − 1} and order these numbers in the usual way. If $x^{-M+1}, x^{M-1} \in \tilde{\mathbb{Z}}_N^n$, then
\[
\det\bigl(\delta_{x_i^{-M+1}, x_j^{M-1}}\bigr)_{1\le i,j\le n} = \delta_{x^{-M+1}, x^{M-1}}
\]
by the definition of the determinant. This determinant can be rewritten using (2.23) and Heine's identity,
\[
\begin{aligned}
\delta_{x^{-M+1}, x^{M-1}} &= \det\Bigl( \frac{1}{N} \sum_{k \in \mathbb{Z}_N} z^{k(x_i^{-M+1} - x_j^{M-1})} \Bigr)_{1\le i,j\le n} \\
&= \frac{1}{n!\, N^n} \sum_{k_1, \dots, k_n \in \mathbb{Z}_N} \det\bigl(z^{k_i x_j^{-M+1}}\bigr)_{1\le i,j\le n} \det\bigl(z^{-k_i x_j^{M-1}}\bigr)_{1\le i,j\le n},
\end{aligned}
\tag{2.24}
\]
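if $x^{-M+1}, x^{M-1} \in \tilde{\mathbb{Z}}_N^n$. This leads us to the measure
\[
p_{n,N,M}(\bar x) = \frac{1}{(n!)^{2M-1} Z_{n,N,M}}\, q_{n,N,M}(\bar x)\, \delta_{x^{-M+1}, x^{M-1}},
\]
where $Z_{n,N,M}$ is the normalization constant. Let g(r, x), |r| < M, $x \in \mathbb{Z}_N$, be given functions and set
\[
G(\bar x) = \prod_{|r|<M} \prod_{j=1}^{n} \bigl(1 + g(r, x_j^r)\bigr).
\]
We want to compute the expectation
\[
\sum_{\bar x \in (\mathbb{Z}_N^n)^{2M-1}} G(\bar x)\, p_{n,N,M}(\bar x) = \frac{n!}{N^n Z_{n,N,M}} \sum_{k \in \tilde{\mathbb{Z}}_N^n} \sum_{\bar x \in (\tilde{\mathbb{Z}}_N^n)^{2M-1}} G(\bar x)\, w_{n,N,M}(k; \bar x),
\tag{2.25}
\]
where
\[
w_{n,N,M}(k; \bar x) = \det\bigl(z^{k_i x_j^{-M+1}}\bigr)\, q_{n,N,M}(\bar x)\, \det\bigl(z^{-k_i x_j^{M-1}}\bigr) = \prod_{r=-M}^{M-1} \det\bigl(\phi_{r,r+1}(x_i^r, x_j^{r+1})\bigr).
\tag{2.26}
\]
Here we have set $\phi_{-M,-M+1}(k_i, x_j^{-M+1}) = z^{k_i x_j^{-M+1}}$, $\phi_{M-1,M}(x_i^{M-1}, k_j) = z^{-k_j x_i^{M-1}}$ and $x_i^{-M} = x_i^{M} = k_i \in \mathbb{Z}_N$. We have a measure of the form (1.17). Set
\[
Z_{n,N,M}(k) = \sum_{\bar x \in (\tilde{\mathbb{Z}}_N^n)^{2M-1}} w_{n,N,M}(k; \bar x)
\tag{2.27}
\]
and note that G ≡ 1 in (2.25) gives
\[
Z_{n,N,M} = \frac{n!}{N^n} \sum_{k \in \tilde{\mathbb{Z}}_N^n} Z_{n,N,M}(k).
\tag{2.28}
\]
Let us also write
\[
p_{n,N,M}(k, \bar x) = \frac{1}{(n!)^{2M-1} Z_{n,N,M}(k)}\, w_{n,N,M}(k; \bar x).
\]
The expectation (2.25) can then be written, using (2.28),
\[
\sum_{k \in \tilde{\mathbb{Z}}_N^n} \frac{Z_{n,N,M}(k)}{\sum_{k \in \tilde{\mathbb{Z}}_N^n} Z_{n,N,M}(k)}\, E_{n,N,M}(k; G),
\tag{2.29}
\]
where $E_{n,N,M}(k; G)$ is the "expectation" of G with respect to the measure $p_{n,N,M}(k, \bar x)$. This "expectation" can be computed using the standard framework. Let $\hat f(n)$ be equal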
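to q if n = 0, p if n = 1 and 0 otherwise, $n \in \mathbb{Z}_N$, so that $\phi_{r,r+1}(x, y) = \hat f(y - x)$, for −M < r < M − 1. Then $f(\ell) = q + p z^{\ell}$, $\ell \in \mathbb{Z}_N$, and by standard properties of convolution
\[
\phi_{r,s}(x, y) = \widehat{f^{\,s-r}}(y - x),
\tag{2.30}
\]
−M < r < s < M − 1. From this we see that
\[
A_{ij} = \phi_{-M,M}(k_i, k_j) = \sum_{x,y \in \mathbb{Z}_N} z^{k_i x}\, \widehat{f^{\,2M-2}}(y - x)\, z^{-k_j y} = N f(N - k_i)^{2M-2}\, \delta_{k_i,k_j} = N f(N - k_i)^{2M-2}\, \delta_{ij}
\]
if $k \in \tilde{\mathbb{Z}}_N^n$. Thus,
\[
A = \bigl(N f(N - k_i)^{2M-2}\, \delta_{ij}\bigr)_{i,j=1,\dots,n}.
\tag{2.31}
\]
Now,
\[
Z_{n,N,M}(k) = \det A = N^n \prod_{i=1}^{n} \bigl(q + p e^{-2\pi i k_i/N}\bigr)^{2M-2}.
\tag{2.32}
\]
This is always non-zero if p ≠ q. If p = q = 1/2, then we assume that N is odd, which also implies that det A ≠ 0. We obtain
\[
\begin{aligned}
\tilde K_n(k; r, x; s, y) &= \sum_{i,j=1}^{n} \sum_{\ell \in \mathbb{Z}_N} \widehat{f^{\,M-r-1}}(\ell - x)\, z^{-k_i \ell}\, \frac{1}{N} f(N - k_i)^{2-2M} \delta_{ij} \sum_{m \in \mathbb{Z}_N} \widehat{f^{\,s+M-1}}(y - m)\, z^{k_j m} \\
&= \frac{1}{N} \sum_{i=1}^{n} f(N - k_i)^{s-r}\, z^{k_i (y - x)},
\end{aligned}
\tag{2.33}
\]
where we have indicated the dependence of the kernel on k. Note that this kernel is independent of M. We have
\[
K_n(k; r, x; s, y) = \frac{1}{N} \sum_{i=1}^{n} f(N - k_i)^{s-r}\, z^{k_i (y - x)} - \phi_{r,s}(x, y),
\tag{2.34}
\]
where
\[
\phi_{r,s}(x, y) = \frac{1}{N} \sum_{\ell \in \mathbb{Z}_N} f(\ell)^{s-r}\, z^{\ell (y - x)}
\tag{2.35}
\]
if s > r and $\phi_{r,s}(x, y) = 0$ if s ≤ r. Computations similar to those leading up to formula (3.27) below show that if we assume that g(r, ·) ≡ 0 if $|r| > M_0$, then $E_{n,N,M}(k, G) = E_{n,N,M_0}(k, G)$ for $M \ge M_0$. Hence, the expectation (2.29) can be written
\[
\sum_{k \in \tilde{\mathbb{Z}}_N^n} \lim_{M \to \infty} \frac{Z_{n,N,M}(k)}{\sum_{k \in \tilde{\mathbb{Z}}_N^n} Z_{n,N,M}(k)}\, E_{n,N,M_0}(k, G).
\tag{2.36}
\]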
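Lemma 2.5. Let $\alpha_i = i - 1$, i = 1, . . . , ν + 1, $\alpha_{2\nu+2-i} = N - i$, i = 1, . . . , ν, n = 2ν + 1. If $k \in \tilde{\mathbb{Z}}_N^n$, then
\[
\lim_{M \to \infty} \frac{Z_{n,N,M}(k)}{\sum_{k \in \tilde{\mathbb{Z}}_N^n} Z_{n,N,M}(k)} = \delta_{\alpha,k}.
\tag{2.37}
\]
Proof. We use the explicit formula (2.32) for $Z_{n,N,M}(k)$,
\[
Z_{n,N,M}(k) = N^n \prod_{j=1}^{n} \bigl(q + p e^{-2\pi i k_j/N}\bigr)^{2M-2}.
\]
Now,
\[
\bigl|q + p e^{-2\pi i k_i/N}\bigr|^2 = p^2 + q^2 + 2pq \cos\frac{2\pi k_i}{N}.
\]
This is maximal (= 1) if $k_i = 0$ (= N) and it is easy to see that
\[
\prod_{j=1}^{n} \Bigl( p^2 + q^2 + 2pq \cos\frac{2\pi k_j}{N} \Bigr) \le \prod_{j=1}^{n} \Bigl( p^2 + q^2 + 2pq \cos\frac{2\pi \alpha_j}{N} \Bigr)
\]
with strict inequality unless k = α. This completes the proof.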
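The maximization claim in the proof of Lemma 2.5 is easy to confirm by brute force for small N. The sketch below enumerates all ordered configurations k of n = 2ν + 1 distinct points in {0, ..., N − 1}, computes the product of p² + q² + 2pq cos(2πk_j/N), and returns the maximizers. The parameter values are hypothetical, and the tolerance `1e-12` is an arbitrary choice for detecting ties.

```python
from itertools import combinations
from math import cos, pi


def weight(k, N, p, q):
    """|q + p e^{-2 pi i k / N}|^2 written in real form."""
    return p * p + q * q + 2 * p * q * cos(2 * pi * k / N)


def alpha(N, nu):
    """alpha_i = i - 1 for i = 1..nu+1 and alpha_{2nu+2-i} = N - i for i = 1..nu."""
    return tuple(sorted(list(range(nu + 1)) + [N - i for i in range(1, nu + 1)]))


def maximizers(N, n, p, q):
    """All increasing n-tuples k maximizing the product of the weights."""
    scored = []
    for k in combinations(range(N), n):
        prod = 1.0
        for kj in k:
            prod *= weight(kj, N, p, q)
        scored.append((prod, k))
    top = max(s for s, _ in scored)
    return [k for s, k in scored if abs(s - top) < 1e-12]
```

For example, `maximizers(9, 5, 0.3, 0.7)` can be compared with `alpha(9, 2)`, the configuration clustered symmetrically around 0 on the discrete circle.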
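Hence, if g(r, ·) ≡ 0 for $|r| \ge M_0$, then
\[
\lim_{M \to \infty} \sum_{\bar x \in (\mathbb{Z}_N^n)^{2M-1}} G(\bar x)\, p_{n,N,M}(\bar x) = E_{n,N,M_0}(\alpha; G).
\]
From this it follows that the correlation kernel K(r, x; s, y) on the cylinder $\mathbb{Z} \times \mathbb{Z}_N$ is given by K(α; r, x; s, y). We obtain the following proposition.

Proposition 2.6. The correlation function for n = 2ν + 1 non-intersecting walks on the infinite cylinder $\mathbb{Z} \times \mathbb{Z}_N$ as defined above is given by
\[
K(r, x; s, y) = \frac{1}{N} \sum_{j=-\nu}^{\nu} \bigl(q + p e^{2\pi i j/N}\bigr)^{s-r} e^{2\pi i j (x - y)/N} - \omega_{r,s}\, \frac{1}{N} \sum_{j=-\nu}^{N-\nu-1} \bigl(q + p e^{2\pi i j/N}\bigr)^{s-r} e^{2\pi i j (x - y)/N},
\tag{2.38}
\]
where $\omega_{r,s} = 1$ if r < s, $\omega_{r,s} = 0$ if r ≥ s.

The induced measure on $\mathbb{Z}_N$ is given by
\[
\frac{1}{n!} \det\bigl(K(0, x_\mu; 0, x_\nu)\bigr)_{1\le \mu,\nu\le n} = \frac{1}{n!} \det\Bigl( \frac{1}{N} \sum_{j=-\nu}^{\nu} e^{2\pi i j (x_\mu - x_\nu)/N} \Bigr)_{1\le \mu,\nu\le n} = \frac{1}{n!\, N^n} \prod_{1\le \mu<\nu\le n} \bigl| e^{2\pi i x_\mu/N} - e^{2\pi i x_\nu/N} \bigr|^2,
\]
the equilibrium measure on $\mathbb{Z}_N$ (discrete CUE), see [25]. We can take the limit n, N → ∞, n/N → ρ, 0 < ρ < 1, and obtain a limiting determinantal process on $\mathbb{Z}^2$.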
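The last equality, between the determinant of the equal-time kernel and the squared Vandermonde product, can be checked numerically. The sketch below builds the n × n matrix with entries (1/N) Σ_{j=−ν}^{ν} e^{2πij(x_µ−x_ν)/N} for a hypothetical configuration and compares its determinant with N^{−n} Π_{µ<ν} |e^{2πix_µ/N} − e^{2πix_ν/N}|². The common factor 1/n! is omitted on both sides.

```python
import cmath
from itertools import combinations, permutations


def det(m):
    """Leibniz expansion; works for complex entries and small n."""
    n = len(m)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])
        term = -1 if inv % 2 else 1
        for row, col in enumerate(perm):
            term *= m[row][col]
        total += term
    return total


def kernel_matrix(xs, N, nu):
    """Entries (1/N) * sum_{m=-nu}^{nu} exp(2 pi i m (a - b) / N)."""
    return [[sum(cmath.exp(2j * cmath.pi * m * (a - b) / N)
                 for m in range(-nu, nu + 1)) / N
             for b in xs] for a in xs]


def vandermonde_weight(xs, N):
    """N^{-n} * product over pairs of |e^{2 pi i a/N} - e^{2 pi i b/N}|^2."""
    prod = 1.0
    for a, b in combinations(xs, 2):
        prod *= abs(cmath.exp(2j * cmath.pi * a / N) - cmath.exp(2j * cmath.pi * b / N)) ** 2
    return prod / N ** len(xs)


def cue_gap(xs, N):
    """Absolute difference between the two sides; n = len(xs) = 2 nu + 1 must be odd."""
    nu = (len(xs) - 1) // 2
    return abs(det(kernel_matrix(xs, N, nu)) - vandermonde_weight(xs, N))
```

The identity reflects the factorization of the kernel matrix as V V*/N with V a Vandermonde matrix in the points e^{2πix_µ/N}, up to unimodular row factors.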
Proposition 2.7. The correlation function for the determinantal process on $\mathbb{Z}^2$ induced by non-intersecting random walks as defined above is given by
\[
K(r, x; s, y) = \int_{-\rho/2}^{\rho/2} \bigl(q + p e^{2\pi i \theta}\bigr)^{s-r} e^{2\pi i \theta (x - y)}\, d\theta
\tag{2.39}
\]
if r ≥ s, and
\[
K(r, x; s, y) = -\int_{\rho/2}^{1-\rho/2} \bigl(q + p e^{2\pi i \theta}\bigr)^{s-r} e^{2\pi i \theta (x - y)}\, d\theta
\tag{2.40}
\]
if r < s for 0 < ρ < 1.

This kernel is related to the B±-kernels in [30]. Compare also with [40].

3. Multi-Layer Discrete PNG

We will discuss how the PNG model defined by (1.1), in the case when ω(x, t), (x, t) ∈ Z × N, satisfies ω(x, t) = 0 if t − x is even or if |x| > t, can be embedded as the top curve in a multi-layer process given by a family of non-intersecting paths. We think of the ω(x, t)'s as given numbers. The initial condition is h(x, 0) = 0, x ∈ Z. We extend h(x, t) to all x ∈ R by letting h(x, t) = h([x], t), which makes it right continuous at the jumps. Note that it follows immediately that h(x, t) = 0 if x < −t + 1 or x > t. For t − x odd we define the jumps,
\[
\eta^+(x, t) = h(x, t) - h(x - 1, t), \qquad \eta^-(x, t) = h(x, t) - h(x + 1, t).
\tag{3.1}
\]
We will see below that $\eta^+, \eta^- \ge 0$ and we should think of $\eta^+(x, t)$ as a positive jump at x at time t, and $\eta^-(x, t)$ as the size of a negative jump at x + 1 at time t. Define
\[
T\omega(x, t) = \min\bigl(\eta^+(x + 1, t - 1),\, \eta^-(x - 1, t - 1)\bigr)
\tag{3.2}
\]
if t − x is odd and Tω(x, t) = 0 if t − x is even.

Claim 3.1. The jumps $\eta^+$ and $\eta^-$ satisfy the following evolution equation:
\[
\begin{aligned}
\eta^+(x + 1, t + 1) &= \max\bigl(\eta^+(x + 2, t) - \eta^-(x, t), 0\bigr) + \omega(x + 1, t + 1), \\
\eta^-(x + 1, t + 1) &= \max\bigl(\eta^-(x, t) - \eta^+(x + 2, t), 0\bigr) + \omega(x + 1, t + 1)
\end{aligned}
\tag{3.3}
\]
for t − x odd. Furthermore $\eta^+(x, t)$ and $\eta^-(x, t)$ are ≥ 0.

Proof. We proceed by induction on t. Assume that $\eta^+(x, t), \eta^-(x, t) \ge 0$ for all x such that t − x is odd. We will prove that then (3.3) holds, and hence $\eta^+(x + 1, t + 1), \eta^-(x + 1, t + 1) \ge 0$ for all x such that t − x is odd. Obviously our induction assumption is true for t = 0. Note that $h(x + 1, t) = h(x, t) - \eta^-(x, t)$, $h(x + 2, t) = h(x, t) + \eta^+(x + 2, t) - \eta^-(x, t)$ and $h(x - 1, t) = h(x, t) - \eta^+(x, t)$. It follows from (1.1), our induction assumption and ω(x, t + 1) = 0, that
\[
h(x + 1, t + 1) = h(x, t) + \max\bigl(0, \eta^+(x + 2, t) - \eta^-(x, t)\bigr) + \omega(x + 1, t + 1), \qquad h(x, t + 1) = h(x, t),
\]
and the first half of (3.3) follows. The proof of the second half is analogous.
It is easy to see that (1.5) holds.

Proposition 3.2. Set G(i, j) = h(i − j, i + j − 1). Then
\[
G(i, j) = \max\bigl(G(i - 1, j), G(i, j - 1)\bigr) + w(i, j)
\tag{3.4}
\]
for i, j ≥ 1.

Proof. We have that
\[
\begin{aligned}
h(i - j, i + j - 1) &= \max\bigl(h(i - j - 1, i + j - 2),\, h(i - j, i + j - 2),\, h(i - j + 1, i + j - 2)\bigr) + w(i, j) \\
&= \max\bigl(G(i - 1, j),\, h(i - j, i + j - 2),\, G(i, j - 1)\bigr) + w(i, j).
\end{aligned}
\]
Since $h(i - j - 1, i + j - 2) - h(i - j, i + j - 2) = \eta^-(i - j - 1, i + j - 2) \ge 0$, this last expression equals the right-hand side of (3.4) and we are done.

There is also an inverse recursion formula.

Claim 3.3. If t − x is odd, then
\[
\begin{aligned}
\omega(x + 1, t + 1) &= \min\bigl(\eta^-(x + 1, t + 1),\, \eta^+(x + 1, t + 1)\bigr), \\
\eta^+(x, t) &= \eta^+(x - 1, t + 1) - \omega(x - 1, t + 1) + T\omega(x - 1, t + 1), \\
\eta^-(x, t) &= \eta^-(x + 1, t + 1) - \omega(x + 1, t + 1) + T\omega(x + 1, t + 1).
\end{aligned}
\tag{3.5}
\]
Proof. The first equation follows immediately from (3.3). From (3.2) and (3.3) we see that the right-hand side of the second equation in (3.5) equals
\[
\max\bigl(\eta^+(x, t) - \eta^-(x - 2, t), 0\bigr) + \min\bigl(\eta^+(x, t),\, \eta^-(x - 2, t)\bigr),
\]
which equals $\eta^+(x, t)$. The proof of the last equation is similar.

From this claim we immediately deduce the following

Claim 3.4. If we know $\eta^+(x + 1, t + 1)$, $\eta^-(x + 1, t + 1)$ for all x such that t − x is odd, and Tω(x, s) for s ≤ t + 1 and all x, we can reconstruct ω(x, s), s ≤ t + 1, x ∈ Z, uniquely.

Let a coordinate system (i, j) be related to the (x, t) coordinate system via the transformation
\[
(x, t) = (i - j, i + j - 1),
\tag{3.6}
\]
and define w(i, j) by (1.2). Then w(i, j) = 0 if $(i, j) \notin \mathbb{Z}_+^2$, and this condition corresponds exactly to our assumptions on ω(x, t). Similarly to (1.2) we define
\[
Tw(i, j) = T\omega(i - j, i + j - 1).
\tag{3.7}
\]
Claim 3.5. Assume that w(i, j) = 0 if i or j is ≤ s. Then, Tw(i, j) = 0 if i or j is ≤ s + 1.

Proof. It follows from the condition on w(i, j) that ω(x, t) = 0 if (t + x + 1)/2 ≤ s or (t − x + 1)/2 ≤ s, which implies, using (1.1), that h(x, t) = 0 under the same conditions. It follows from (3.1), (3.2) and (3.7) that Tw(i, j) = 0 if h(i − j + 1, i + j − 2) = 0 or h(i − j − 1, i + j − 2) = 0. Now, h(i − j + 1, i + j − 2) = 0 if i ≤ s or j ≤ s + 1, and h(i − j − 1, i + j − 2) = 0 if i ≤ s + 1 or j ≤ s. Hence Tw(i, j) = 0 if i ≤ s + 1 or j ≤ s + 1.
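The relation between the PNG height and the last-passage recursion in Proposition 3.2 can be tested by direct simulation. The sketch below assumes that (1.1) has the form h(x, t+1) = max(h(x−1, t), h(x, t), h(x+1, t)) + ω(x, t+1), which is the form used in the proof of Proposition 3.2, and that (1.2) reads w(i, j) = ω(i − j, i + j − 1). The random weights and the function names are hypothetical.

```python
import random


def omega_from_w(w, x, t):
    """omega(x, t) = w(i, j) for (x, t) = (i - j, i + j - 1); zero otherwise."""
    if (t - x) % 2 == 0 or abs(x) > t:
        return 0
    return w.get(((t + x + 1) // 2, (t - x + 1) // 2), 0)


def png_heights(w, t_max):
    """levels[t][x] = h(x, t) for 0 <= t <= t_max under the assumed rule (1.1)."""
    xs = range(-t_max - 1, t_max + 2)
    levels = [{x: 0 for x in xs}]
    for t in range(1, t_max + 1):
        prev = levels[-1]
        levels.append({x: max(prev.get(x - 1, 0), prev[x], prev.get(x + 1, 0))
                          + omega_from_w(w, x, t) for x in xs})
    return levels


def last_passage(w, size):
    """G(i, j) = max(G(i-1, j), G(i, j-1)) + w(i, j) with zero boundary values."""
    g = {}
    for i in range(size + 1):
        for j in range(size + 1):
            g[(i, j)] = 0 if i == 0 or j == 0 else max(g[(i - 1, j)], g[(i, j - 1)]) + w.get((i, j), 0)
    return g


def check_png(n, seed):
    """True if G(i, j) = h(i - j, i + j - 1) and all jumps (3.1) are >= 0."""
    rng = random.Random(seed)
    t_max = 2 * n - 1
    w = {(i, j): rng.randint(0, 3) for i in range(1, t_max + 1) for j in range(1, t_max + 1)}
    levels, g = png_heights(w, t_max), last_passage(w, t_max)
    for i in range(1, t_max + 1):
        for j in range(1, t_max + 1):
            if i + j - 1 <= t_max and levels[i + j - 1][i - j] != g[(i, j)]:
                return False
    for t in range(1, t_max + 1):
        for x in range(-t, t + 1):
            if (t - x) % 2 == 1:
                h = levels[t]
                if h[x] - h[x - 1] < 0 or h[x] - h[x + 1] < 0:
                    return False
    return True
```

The second loop checks the non-negativity of η⁺ and η⁻ from Claim 3.1 at all sites with t − x odd.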
It follows from Claim 3.5 that $T^n w(i, j) = 0$ if i or j is ≤ n, since w(i, j) = 0 if i or j is ≤ 0. Hence $T^n \omega(x, t) = 0$ if t ≤ 2n − 1, since i + j − 1 ≤ 2n − 1 implies that i or j is ≤ n. We formulate this as our next claim.

Claim 3.6. If t ≤ 2n − 1, then $T^n \omega(x, t) = 0$.

Let $h_i(x, t)$, i ≥ 0, be the PNG process defined by (1.1) with ω(x, t + 1) replaced by $T^i \omega(x, t + 1)$, and with initial condition $h_i(x, 0) = -i$. We let $T^0 \omega(x, t + 1) = \omega(x, t + 1)$, so $h_0(x, t) = h(x, t)$ is our original growth process. It follows from Claim 3.6 that at time t = 2n − 1 only $h_0, \dots, h_{n-1}$ can be non-trivial, i.e. $h_i(x, 2n - 1) = -i$ for all x if i ≥ n. Combining Claim 3.4 and Claim 3.6 we get

Claim 3.7. Given $h_i(x, 2n - 1)$, x ∈ Z, i = 0, . . . , n − 1, we can uniquely reconstruct {ω(x, t) ; t ≤ 2n − 1, x ∈ Z}.

We can think of $h_i$ at time 2n − 1 as a directed path from (−2n + 1, −i) to (2n − 1, −i) which has up-steps $\eta^+(2m, 2n - 1)$ at even x-coordinates, x = 2m, and down-steps $\eta^-(2m, 2n - 1)$ at odd x-coordinates, x = 2m + 1, |m| < n, and horizontal steps in between. According to Claim 3.7 there is a bijection between these paths $h_0, \dots, h_{n-1}$ and the set {ω(x, t) ; t ≤ 2n − 1, x ∈ Z}. We set $h_i(x, t) = h_i([x], t)$ for x ∈ R. The paths obtained are non-intersecting:

Claim 3.8. If t − x is odd, then
\[
h_{i+1}(x, t) < h_i(x - 0, t)
\tag{3.8}
\]
and if t − x is even
\[
h_{i+1}(x - 0, t) < h_i(x, t),
\tag{3.9}
\]
so that corners will not meet.

Proof. This is proved by induction on t. It is clearly true for t = 0. If it is true at time t it is still true after forming the maximum in (1.1) (deterministic step). (Note that $h_i$ and $h_{i+1}$ have up-steps/down-steps at the same positions.) From the definition (3.2) it is still true after adding $T^{i+1} \omega(x, t)$ to the lower curve.

In order to understand how a geometric distribution (1.3) on the w(i, j) is transported to a measure on the non-intersecting paths, we will assign weights to the jumps. Let $a_i$ and $b_j$ be given variables. The jumps are assigned weights as follows: $\eta^+(x, t)$ has weight $a_i^{\eta^+(x,t)}$, i = (t + x + 1)/2, and $\eta^-(x, t)$ has weight $b_j^{\eta^-(x,t)}$, j = (t − x + 1)/2. Also, to $T^k \omega(x, t)$ we assign the weight $(a_i b_j)^{T^k \omega(x,t)}$ with the same correspondence between (i, j) and (x, t), k ≥ 0. The proof of the next claim is a straightforward computation using the definitions of the quantities involved.

Claim 3.9. The product of the weights of $\eta^+(x - 1, t + 1)$, $\eta^-(x - 1, t + 1)$ and Tω(x − 1, t + 1) equals the product of the weights of $\eta^-(x - 2, t)$, $\eta^+(x, t)$ and ω(x − 1, t + 1).

Using this claim we can show that the measure is transported in the way we want.

Claim 3.10. The product of all the weights of all the jumps in the multi-layer PNG, $h_0, \dots, h_{n-1}$, at time t = 2n − 1 equals
\[
\prod_{i+j \le 2n} (a_i b_j)^{w(i,j)}.
\]
Proof. Using Claim 3.9 repeatedly we see that
\[
\begin{aligned}
\prod_{i+j \le 2n} (a_i b_j)^{w(i,j)} &= \prod_{x \in \mathbb{Z},\, t \le 2n-1} \bigl(a_{(t+x+1)/2}\, b_{(t-x+1)/2}\bigr)^{\omega(x,t)} \\
&= \prod_{|m|<n} a_{n+m}^{\eta^+(2m, 2n-1)}\, b_{n-m}^{\eta^-(2m, 2n-1)} \prod_{x \in \mathbb{Z},\, t \le 2n-1} \bigl(a_{(t+x+1)/2}\, b_{(t-x+1)/2}\bigr)^{T\omega(x,t)}.
\end{aligned}
\]
Repeated use of this identity together with Claim 3.6 proves the claim.

If $\eta_r^+$, $\eta_r^-$ are the jumps for $h_r$ it follows from the assignments of weights that $\eta_r^+(2m, 2n - 1) = u$ has weight $a_{m+n}^u$ and $\eta_r^-(2m, 2n - 1) = u$ has weight $b_{n-m}^u$, |m| < n, 0 ≤ r < n. If we think of the weights as labels transported from the w(i, j)'s we see that if w(i, j) = 0 for i > n or j > n, we have no labels $a_i$ with i > n or $b_j$ with j > n and hence $\eta_r^-(2m, 2n - 1) = 0$ if m < 0 and $\eta_r^+(2m, 2n - 1) = 0$ if m > 0. Hence all plus-steps take place to the left of the origin and all minus-steps to the right of the origin. This is the case discussed in [21]. From this consideration and (1.4) we obtain

Proposition 3.11. If |K| < N, then
\[
G(N + K, N - K) = h(2K, 2N - 1).
\tag{3.10}
\]
Also, if w(i, j) = 0 for |i| > N or j > N, then for 0 ≤ K < N,
\[
G(N - K, N) = h(-2K, 2N - 1)
\tag{3.11}
\]
and
\[
G(N, N - K) = h(2K, 2N - 1).
\tag{3.12}
\]
The discussion of the multi-layer extension of the PNG-growth model discussed above is closely related to the Viennot/matrix-ball construction, [32, 14, 38], of the Robinson-Schensted-Knuth (RSK) correspondence. Let us briefly discuss the relation. We can think of (3.2) and (3.3) geometrically as follows. From (x, t) to (x − 1, t + 1) we draw a line with multiplicity η+ (x, t) and from (x, t) to (x +1, t +1) we draw a line with multiplicity η− (x, t). A line with multiplicity zero means no line. At (x, t) a line with multiplicity η+ (x + 1, t − 1) and a line with multiplicity η− (x − 1, t − 1) meet and we have a collision/annihilation of size T ω(x, t) as given by (3.2). If η+ (x + 1, t − 1) ≥ η− (x − 1, t − 1), then η+ (x + 1, t − 1) − η− (x − 1, t − 1) plus-lines survive and we add ω(x, t) new lines. Similarly in the other case. This explains (3.3). Assume that w(i, j ) = 0 if i or j is > N . If (w(i, j ))1≤i,j ≤N is a permutation matrix this gives exactly the “shadow lines” of the Viennot construction, [14]. We obtain a mapping to a pair of semi-standard Young tableaux P and Q of shape λ. The number of m:s in the first row of P equals η− (−(N − m), N + m − 1), m = 1, . . . , N and the number of m:s in the first column of Q equals η+ (−(N − m), N + m − 1), m = 1, . . . , N. Similarly, the same procedure starting with T ω instead gives the second rows and so on. Using this line of argument we obtain Proposition 3.12. Let (w(i, j ))1≤i,j ≤N be given and set w(i, j ) = 0 if i or j is > N . The RSK-correspondence maps a submatrix (w(i, j ))1≤i≤M,1≤j ≤N , M ≤ N to a pair of semi-standard Young tableaux of shape λ(M, N ) = (λ1 (M, N ), λ2 (M, N ), . . . ). (Similarly, we can consider (w(i, j ))1≤i≤N,1≤j ≤M .) Consider the family of height curves
$h_i$, 0 ≤ i < N, obtained from the multi-layer PNG process using (w(i, j)). Then, for 0 ≤ K < N, 1 ≤ j ≤ N,
\[
\lambda_j(N - K, N) = h_{j-1}(-2K, 2N - 1) + j - 1
\tag{3.13}
\]
and
\[
\lambda_j(N, N - K) = h_{j-1}(2K, 2N - 1) + j - 1.
\tag{3.14}
\]
If we add vertical line segments to the graphs, $x \to h_i(x, 2N - 1)$, 0 ≤ i < N, we obtain N non-intersecting paths with $h_i(-(2N - 1), 2N - 1) = h_i(2N - 1, 2N - 1) = -i$. Recall that $h_i(x, 2N - 1) \equiv -i$ for i ≥ N so that at most N paths are "active". The paths are described by particle configurations. Let $C_{2N-1}(x) = (h_0(x, 2N - 1), \dots, h_{N-1}(x, 2N - 1))$ and $C_{2N-1} = (C_{2N-1}(-M + 1), \dots, C_{2N-1}(M - 1))$, where M = 2N − 1. Note that $C_{2N-1}(-M) = C_{2N-1}(M) = (0, -1, \dots, -N + 1)$. Set
\[
\phi_{2j-1,2j}(x, y) = \begin{cases} (1 - a_{j+N})\, a_{j+N}^{\,y-x} & \text{if } y \ge x, \\ 0 & \text{if } y < x, \end{cases}
\tag{3.15}
\]
\[
\phi_{2j,2j+1}(x, y) = \begin{cases} 0 & \text{if } y > x, \\ (1 - b_{N-j})\, b_{N-j}^{\,x-y} & \text{if } y \le x \end{cases}
\tag{3.16}
\]
for |j| < N with the convention that $0^0 = 1$. It follows from the Lindström-Gessel-Viennot method or from the Karlin-McGregor theorem that the weight of the non-intersecting path configuration corresponding to $C_{2N-1} = \bar x$, with weights assigned to jumps as above, equals
\[
\frac{1}{\prod_{j=1}^{M} (1 - a_j)^N (1 - b_j)^N} \prod_{r=-M}^{M-1} \det\bigl(\phi_{r,r+1}(x_i^r, x_j^{r+1})\bigr)_{i,j=1}^{N}.
\]
The way the weights are related to the "weights" of the geometric random variables as described above shows that
\[
P[C_{2N-1} = \bar x] = \frac{1}{Z_{n,M}} \prod_{r=-M}^{M-1} \det\bigl(\phi_{r,r+1}(x_i^r, x_j^{r+1})\bigr)_{i,j=1}^{n}
\]
with $Z_{n,M}$ given by (1.18) and n = N, M = 2N − 1. Hence we obtain a measure of the form (1.17). We note that
\[
Z_{n,M} = \frac{\prod_{j=1}^{M} (1 - a_j)^n (1 - b_j)^n}{\prod_{i+j \le M} (1 - a_i b_j)}.
\tag{3.17}
\]
We summarize what we have found in the next proposition.
Proposition 3.13. Let $h_i$, i ≥ 0, be the multi-layer PNG process obtained from geometric random variables with parameters $a_i b_j$ as defined above and let $\phi_{r,r+1}$ be defined by (3.15) and (3.16). Then,
\[
P[h_{k-1}(r, 2N - 1) = x_k^r,\ 1 \le k \le N,\ |r| < M] = \frac{1}{Z_{n,M}} \prod_{r=-M}^{M-1} \det\bigl(\phi_{r,r+1}(x_i^r, x_j^{r+1})\bigr)_{i,j=1}^{n},
\tag{3.18}
\]
where $Z_{n,M}$ is given by (3.17), $x_i^{-M} = x_i^{M} = 1 - i$, $x_1^r > x_2^r > \dots > x_N^r$ for each r, n = N and M = 2N − 1.
The fact that the probability measure has this form makes it possible to compute the correlation functions. Set
\[
f_{2j-1}(z) = (1 - a_{j+N}) \sum_{m=0}^{\infty} a_{j+N}^m z^m = \frac{1 - a_{j+N}}{1 - a_{j+N} z}
\tag{3.19}
\]
and
\[
f_{2j}(z) = (1 - b_{N-j}) \sum_{m=0}^{\infty} b_{N-j}^m z^{-m} = \frac{1 - b_{N-j}}{1 - b_{N-j}/z}
\tag{3.20}
\]
so that (1.25) holds. The interpretation of the correlation functions given by (1.23) in this case is that they give the probability of finding particles at the specified positions. We can take n ≥ N in (3.18), where n is the number of PNG height curves. All height curves $h_i$ with i ≥ N have to be trivial, i.e. $h_i \equiv -i$ if i ≥ N. It follows that the probability of a certain configuration is independent of n for n ≥ N. Thus, we can take the kernel $K_{n,M}$, (1.22), with arbitrary n ≥ N and obtain the same value. In particular we can let n → ∞ and use Proposition 1.8. It is clear that all the conditions of this proposition are satisfied when $f_r(z)$ is given by (3.19) and (3.20). Let r = 2u, s = 2v both be even, |u|, |v| < N. The expression (1.30) becomes
\[
G(z, w) = \frac{\displaystyle \prod_{j=u}^{N-1} \frac{1 - b_{N-j}}{1 - b_{N-j} z} \prod_{j=-N+1}^{v} \frac{1 - a_{N+j}}{1 - a_{N+j}/w}}{\displaystyle \prod_{j=-N+1}^{u} \frac{1 - a_{N+j}}{1 - a_{N+j}/z} \prod_{j=v}^{N-1} \frac{1 - b_{N-j}}{1 - b_{N-j} w}}.
\tag{3.21}
\]
We summarize our results for the correlation functions in a theorem.
Theorem 3.14. Let the multi-layer PNG process be defined using geometric random variables $w(i,j)$ with parameter $a_ib_j$, $0 < a_i, b_j < 1$, and let $G(z,w)$ be given by (3.21). Set
$$\tilde K_N(2u,x;2v,y) = \frac{1}{(2\pi i)^2} \int_{\gamma_{r_2}} \frac{dz}{z} \int_{\gamma_{r_1}} \frac{dw}{w}\, \frac{w^y}{z^x}\, \frac{z}{z-w}\, G(z,w), \tag{3.22}$$
where $\gamma_r$ is the circle with radius $r$ and center at the origin, $1-\epsilon < r_1 < r_2 < 1+\epsilon$ with $1+\epsilon < \min(1/b_j)$, $1-\epsilon > \max(a_j)$, and $|u|, |v| < N$, $x, y \in \mathbb{Z}$. Furthermore, let
$$\phi_{2u,2v}(x,y) = \frac{1}{2\pi} \int_{-\pi}^{\pi} e^{i(y-x)\theta}\, G(e^{i\theta}, e^{i\theta})\, d\theta \tag{3.23}$$
304
K. Johansson
for u < v and φ2u,2v (x, y) = 0 for u ≥ v. Set KN (2u, x; 2v, y) = K˜ N (2u, x; 2v, y) − φ2u,2v (x, y).
(3.24)
Then, P[(2u, xj2u ) ∈ {(2t, hi (2t, 2N − 1)) ; |t| < N , 0 ≤ i < N}, |u| < N, 1 ≤ j ≤ ku ] = det(KN (2u, xi2u ; 2v, xj2v ))|u|,|v|
(3.25)
for any $x_j^{2u} \in \mathbb{Z}$ and any $k_u \in \{0, \dots, N\}$.
Consider the finite-dimensional distribution of $h_0(x,t) = h(x,t)$, the top curve,
$$P_{n,M}[h_0(2s_i, 2N-1) \le \ell_i,\ 1 \le i \le m], \tag{3.26}$$
where $n$ is the number of paths, $M = 2N-1$, $n \ge N$, $|s_i| < N$ and $\ell_i > -N$. This can also be written $P_{n,M}[\text{no particles in } \{2s_i\} \times (\ell_i, \infty),\ 1 \le i \le m]$. This probability is independent of $n \ge N$ and hence we can let $n \to \infty$. Let $g(2s_i, x) = -\chi_{(\ell_i,\infty)}(x)$, $1 \le i \le m$, and $g(r,x) \equiv 0$ if $r$ is not equal to one of the $2s_i$. Hence, by Proposition 2.1,
$$P_{N,M}[h_0(2s_i, 2N-1) \le \ell_i,\ 1 \le i \le m] = \det(I + gK_N)_{L^2(M)}. \tag{3.27}$$
This formula can be used to study the convergence in distribution of the rescaled height curve.

4. Asymptotics

We will consider the asymptotics of the kernel (3.24) in the case $a_i = b_i = \alpha$ for all $i \ge 1$, so that $w(i,j)$ are geometric random variables with parameter $q = \alpha^2$. The function $G(z,w)$ in (3.21) then becomes
$$G(z,w) = (1-\alpha)^{2(v-u)}\, \frac{(1-\alpha/z)^{N+u}(1-\alpha w)^{N-v}}{(1-\alpha z)^{N-u}(1-\alpha/w)^{N+v}}. \tag{4.1}$$
Write
$$F_{u,x}(z) = \frac{1}{z^{x+N+u}}\, \frac{(z-\alpha)^{N+u}}{(1-\alpha z)^{N-u}},$$
so that, by (3.22),
$$\tilde K_N(2u,x;2v,y) = \frac{(1-\alpha)^{2(v-u)}}{(2\pi i)^2} \int_{\gamma_{r_2}} \frac{dz}{z} \int_{\gamma_{r_1}} \frac{dw}{w}\, \frac{z}{z-w}\, F_{u,x}(z)\, F_{-v,y}\Big(\frac{1}{w}\Big), \tag{4.2}$$
where $\alpha < r_1 < r_2 < 1/\alpha$.
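The reduction from (3.21) to (4.1) can be checked numerically. The following is a minimal sketch, not part of the original argument; the function names and the test values of $z$, $w$, $N$, $u$, $v$ are illustrative assumptions, and the product ranges are read off from (3.21).

```python
import cmath

def G_general(z, w, a, b, N, u, v):
    """Evaluate G(z, w) as in (3.21); a[i], b[i] hold a_i, b_i (index 0 unused)."""
    g = 1.0 + 0.0j
    for j in range(u, N):            # j = u, ..., N-1
        g *= (1 - b[N - j]) / (1 - b[N - j] * z)
    for j in range(-N + 1, u + 1):   # j = -N+1, ..., u (this product is divided out)
        g /= (1 - a[N + j]) / (1 - a[N + j] / z)
    for j in range(-N + 1, v + 1):   # j = -N+1, ..., v
        g *= (1 - a[N + j]) / (1 - a[N + j] / w)
    for j in range(v, N):            # j = v, ..., N-1 (this product is divided out)
        g /= (1 - b[N - j]) / (1 - b[N - j] * w)
    return g

def G_homogeneous(z, w, alpha, N, u, v):
    """Closed form (4.1), valid when a_i = b_i = alpha."""
    return ((1 - alpha) ** (2 * (v - u))
            * (1 - alpha / z) ** (N + u) * (1 - alpha * w) ** (N - v)
            / ((1 - alpha * z) ** (N - u) * (1 - alpha / w) ** (N + v)))

def relative_gap(alpha=0.3, N=5, u=-2, v=3):
    """Relative difference between (3.21) with constant parameters and (4.1)."""
    a = [alpha] * (2 * N)            # indices 1..2N-1 are the ones used
    b = [alpha] * (2 * N)
    z = cmath.rect(1.1, 0.4)         # arbitrary test points, chosen for illustration
    w = cmath.rect(0.9, -0.7)
    lhs = G_general(z, w, a, b, N, u, v)
    rhs = G_homogeneous(z, w, alpha, N, u, v)
    return abs(lhs - rhs) / abs(rhs)
```

Calling `relative_gap()` should return a value at the level of floating-point rounding, which reflects the power counting $(1-\alpha)^{(N-u)+(N+v)-(N+u)-(N-v)} = (1-\alpha)^{2(v-u)}$.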
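Set $\mu = m/N$, $\mu' = m'/N$, $\beta = u/N$, $\beta' = -v/N$ and
$$f_{\mu,\beta}(z) = \frac{1}{N} \log F_{u,m-N}(z) = (1+\beta)\log(z-\alpha) - (1-\beta)\log(1-\alpha z) - (\mu+\beta)\log z.$$
Then, $f'(z) = P(z)/Q(z)$, where $Q(z) = z(z-\alpha)(1-\alpha z)$ and
$$P(z) = \alpha(\mu-\beta)\left[z^2 + \frac{1-\alpha^2-\mu(1+\alpha^2)}{\alpha(\mu-\beta)}\, z + \frac{\mu+\beta}{\mu-\beta}\right].$$
We will write
$$p = p(\mu,\beta) = \frac{\mu(1+\alpha^2) - (1-\alpha^2)}{2\alpha(\mu-\beta)}; \qquad q = q(\mu,\beta) = \frac{\mu+\beta}{\mu-\beta}. \tag{4.3}$$
The critical points of $f$ are $z_c^{\pm} = p \pm \sqrt{p^2-q}$ and we obtain a double critical point if $p^2 = q$, which gives
$$\mu = \mu_c(\beta) = \frac{1+\alpha^2}{1-\alpha^2} + \sqrt{\left(\frac{1+\alpha^2}{1-\alpha^2}\right)^2 - 1 - \frac{4\alpha^2\beta^2}{(1-\alpha^2)^2}} \tag{4.4}$$
and
$$p_c = p(\mu_c, \beta) = \frac{2\alpha + (1+\alpha^2)\sqrt{1-\beta^2}}{1+\alpha^2+2\alpha\sqrt{1-\beta^2} - \beta(1-\alpha^2)}. \tag{4.5}$$
Set
$$d = \frac{\alpha^{1/3}(1+\alpha)^{1/3}}{1-\alpha}, \qquad d' = \frac{1-\alpha}{1+\alpha}\, d. \tag{4.6}$$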
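As a sanity check on (4.3) to (4.6), the sketch below evaluates the formulas numerically: at $\mu = \mu_c(\beta)$ the discriminant $p^2 - q$ should vanish, $p$ should agree with (4.5), and $f'$ should vanish at $p_c$. It also checks that the third derivative of $f$ at the double critical point equals $2d^3$ when $\beta = 0$, which is the normalisation used in the cubic expansion later in the section. The function names and the test parameters are illustrative assumptions.

```python
import math

def mu_c(alpha, beta):
    """Critical value of mu from (4.4)."""
    r = (1 + alpha**2) / (1 - alpha**2)
    return r + math.sqrt(r * r - 1 - 4 * alpha**2 * beta**2 / (1 - alpha**2) ** 2)

def p_of(mu, alpha, beta):
    """p(mu, beta) from (4.3)."""
    return (mu * (1 + alpha**2) - (1 - alpha**2)) / (2 * alpha * (mu - beta))

def q_of(mu, beta):
    """q(mu, beta) from (4.3)."""
    return (mu + beta) / (mu - beta)

def p_c(alpha, beta):
    """Double critical point from (4.5)."""
    s = math.sqrt(1 - beta**2)
    return (2 * alpha + (1 + alpha**2) * s) / (
        1 + alpha**2 + 2 * alpha * s - beta * (1 - alpha**2))

def f_prime(z, mu, alpha, beta):
    """First derivative of f_{mu,beta}(z)."""
    return (1 + beta) / (z - alpha) + alpha * (1 - beta) / (1 - alpha * z) - (mu + beta) / z

def f_third(z, mu, alpha, beta):
    """Third derivative of f_{mu,beta}(z)."""
    return (2 * (1 + beta) / (z - alpha) ** 3
            + 2 * alpha**3 * (1 - beta) / (1 - alpha * z) ** 3
            - 2 * (mu + beta) / z**3)

def d_const(alpha):
    """The constant d from (4.6)."""
    return (alpha * (1 + alpha)) ** (1 / 3) / (1 - alpha)
```

For example, with $\alpha = 0.4$ and $\beta = 0.25$ one can compare `p_of(mu_c(0.4, 0.25), 0.4, 0.25)` with `p_c(0.4, 0.25)`.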
It will be convenient to write
$$u = \frac{1}{d'}\, \tau N^{2/3}, \qquad v = \frac{1}{d'}\, \tau' N^{2/3}, \tag{4.7}$$
since $N^{2/3}$ is the right scale for $u$ and $v$ if we want a non-trivial limit. The correct way of writing $x$ and $y$ will turn out to be
$$x = N(\mu_c(\beta) - 1) + \xi dN^{1/3}, \qquad y = N(\mu_c(\beta') - 1) + \xi' dN^{1/3}. \tag{4.8}$$
We will assume that |τ |, |τ |, |ξ |, |ξ | are ≤ log N . The paths of integration can be deformed into η it . : R t → z(t ) = pc (β) + = p − it, − 1/3 1/3 dN dN −1 η is . : R s → w(s ) = pc (β ) + = (p − it)−1 , − 1/3 1/3 dN dN
(4.9) (4.10)
where $\eta, \eta' > 0$ will be appropriately chosen; we will require that
$$\Big(p_c(\beta') + \frac{\eta'}{dN^{1/3}}\Big)\Big(p_c(\beta) + \frac{\eta}{dN^{1/3}}\Big) > 1. \tag{4.11}$$
In that case we have, by Cauchy’s theorem, (1 − α) K˜ N (2u, x; 2v, y) = − (2πi)2
2(v−u)
dz z
dw z Fu,x (z)F−v,y w z−w
1 . w (4.12)
We first estimate this integral and then we will compute its asymptotics using a saddle-point argument. Choose $\mu$ so that $p(\mu,\beta) = p = p_c(\beta) + \eta/dN^{1/3}$ as in (4.9), and let $q = q(\mu,\beta)$ be the value we get with this $\mu$. This is possible by formula (4.3) with $\mu > \mu_c$. We can write
$$F_{u,x}(z) = \frac{1}{z^{x+N-\mu N}}\, \frac{1}{z^{(\mu+\beta)N}}\, \frac{(z-\alpha)^{N+u}}{(1-\alpha z)^{N-u}},$$
and then let $z = p - it$, $t \in \mathbb{R}$, and take the absolute value to get
$$|F_{u,x}(p-it)|^2 = \frac{1}{(p^2+t^2)^{x+N-\mu N}}\, e^{2Nh(t)}, \tag{4.13}$$
where $2h(t) = (1+\beta)\log A - (1-\beta)\log B - (\mu+\beta)\log C$, with
$$A = (p-\alpha)^2 + t^2 = 1 - 2\alpha p + \alpha^2 + \frac{2\beta}{\mu-\beta} + p^2 - q + t^2,$$
$$B = (1-\alpha p)^2 + \alpha^2t^2 = 1 - 2\alpha p + \alpha^2 + \frac{2\alpha^2\beta}{\mu-\beta} + \alpha^2(p^2 - q + t^2),$$
$$C = p^2 + t^2 = 1 + \frac{2\beta}{\mu-\beta} + p^2 - q + t^2.$$
Note that
$$(\mu-\beta)A = (1-\alpha^2)\beta + 1 - \alpha^2 + (\mu-\beta)(p^2 - q + t^2),$$
$$(\mu-\beta)B = 1 - \alpha^2 - (1-\alpha^2)\beta + \alpha^2(\mu-\beta)(p^2 - q + t^2),$$
$$(\mu-\beta)C = (\mu-\beta)(p^2 - q + t^2) + \mu + \beta.$$
A computation now gives
$$h'(t) = \frac{t(p^2-q+t^2)}{(\mu-\beta)ABC}\Big\{(1-\alpha^2)^2 - (1-\alpha^4)(\mu+\beta) + (1+\alpha^4)\mu\beta + 2\alpha^2\beta^2 - \alpha^2(\mu-\beta)^2(p^2-q+t^2)\Big\}. \tag{4.14}$$
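Formula (4.14) can be compared against a finite-difference derivative of $h$. The sketch below is only a numerical cross-check under the definitions of $A$, $B$, $C$, $p$ and $q$ given above; the function names and the sample parameters are assumptions made for illustration.

```python
import math

def _p_q(alpha, mu, beta):
    """p and q from (4.3)."""
    p = (mu * (1 + alpha**2) - (1 - alpha**2)) / (2 * alpha * (mu - beta))
    q = (mu + beta) / (mu - beta)
    return p, q

def h(t, alpha, mu, beta):
    """h(t) defined through 2h = (1+beta) log A - (1-beta) log B - (mu+beta) log C."""
    p, _ = _p_q(alpha, mu, beta)
    A = (p - alpha) ** 2 + t**2
    B = (1 - alpha * p) ** 2 + alpha**2 * t**2
    C = p**2 + t**2
    return 0.5 * ((1 + beta) * math.log(A) - (1 - beta) * math.log(B)
                  - (mu + beta) * math.log(C))

def h_prime_formula(t, alpha, mu, beta):
    """Right-hand side of (4.14)."""
    p, q = _p_q(alpha, mu, beta)
    A = (p - alpha) ** 2 + t**2
    B = (1 - alpha * p) ** 2 + alpha**2 * t**2
    C = p**2 + t**2
    s = p**2 - q + t**2
    brace = ((1 - alpha**2) ** 2 - (1 - alpha**4) * (mu + beta)
             + (1 + alpha**4) * mu * beta + 2 * alpha**2 * beta**2
             - alpha**2 * (mu - beta) ** 2 * s)
    return t * s / ((mu - beta) * A * B * C) * brace

def h_prime_numeric(t, alpha, mu, beta, eps=1e-6):
    """Central finite difference approximation of h'(t)."""
    return (h(t + eps, alpha, mu, beta) - h(t - eps, alpha, mu, beta)) / (2 * eps)
```

With, say, $\alpha = 0.5$, $\beta = 0.1$ and $\mu = 3.2 > \mu_c(\beta)$, the two derivatives should agree up to the finite-difference error.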
Another computation shows that $p^2 - q \ge 0$. To leading order we have $p^2 - q \approx 2\eta/dN^{1/3}$. Recall that $t = t'/dN^{1/3}$. When $t$ is large we have $h'(t) \approx -(\mu-\beta)/t$, and we also have the estimate
$$h'(t) \le \frac{t(p^2-q)}{(\mu-\beta)ABC}\Big\{(1-\alpha^2)^2 - (1-\alpha^4)(\mu+\beta) + (1+\alpha^4)\mu\beta + 2\alpha^2\beta^2\Big\} \tag{4.15}$$
for $t \ge 0$. Consider the case $0 \le t' \le N^{\gamma}$; the case $-N^{\gamma} \le t' \le 0$ is analogous by symmetry. Here $0 < \gamma < 1/3$. Using (4.15) we see that $h(t) - h(0) \approx -2\eta t'^2/N$ and we can show that
$$N h(t) \le -\frac{3}{2}\eta t'^2 + N h(0) \tag{4.16}$$
for $|t'| \le N^{\gamma}$, $0 < \gamma < 1/3$, and $N$ sufficiently large. Define $h_*(t,p)$ by
$$e^{2Nh_*(t,p)} = \frac{[(p-\alpha)^2+t^2]^{(1+\beta)N}}{(p^2+t^2)^{(\mu_c+\beta)N}\,[(1-p\alpha)^2+\alpha^2t^2]^{(1-\beta)N}}.$$
A computation, compare (4.23) below, gives
$$e^{Nh_*(0,p_c)} \sim (1-\alpha)^{2u} e^{\frac{1}{3}\tau^3},$$
and $h_*(0,p) = h_*(0,p_c) + d^3(p-p_c)^3/3 + \dots = h_*(0,p_c) + \eta^3/3N + \dots$. Consequently,
$$e^{Nh(0)} = \frac{1}{p^{N(\mu-\mu_c)}}\, e^{Nh_*(0,p)} \sim \frac{(1-\alpha)^{2u}}{p^{N(\mu-\mu_c)}}\, e^{(\tau^3+\eta^3)/3}.$$
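Combining this with (4.16) gives
$$|F_{u,x}(p-it)| \le \frac{C}{(p^2+t^2)^{(x+N(1-\mu))/2}}\, \frac{(1-\alpha)^{2u}}{p^{N(\mu-\mu_c)}}\, e^{(\tau^3+\eta^3)/3 - 3\eta t'^2/2}.$$
Write
$$\frac{1}{p^{N(\mu-\mu_c)}}\, \frac{1}{(p^2+t^2)^{(x+N(1-\mu))/2}} = \left(\frac{p^2+t^2}{p^2}\right)^{N(\mu-\mu_c)/2} \left|\frac{1}{(p-it)^{x+N(1-\mu_c)}}\right|.$$
Further computation shows that
$$\left|\frac{1}{(p-it)^{x+N(1-\mu_c)}}\right| \sim e^{-\xi\tau-\xi\eta}, \qquad (1+t^2/p^2)^{N(\mu-\mu_c)/2} \sim e^{\eta t'^2}.$$
Collecting the estimates we find
$$\left|F_{u,x}\Big(p_c(\beta) + \frac{\eta}{dN^{1/3}} - \frac{it'}{dN^{1/3}}\Big)\right| \le C(1-\alpha)^{2u}\, e^{\frac{1}{3}(\tau^3+\eta^3) - \xi\tau - \xi\eta - \frac{\eta}{2}t'^2} \tag{4.17}$$
for $|t'| \le N^{\gamma}$, $0 < \eta < N^{\gamma}$, $0 < \gamma < 1/3$.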
Using (4.14), (4.15) and the other estimates above we see that the contribution to the integral from $|t'| \ge N^{\gamma}$ and/or $|s'| \ge N^{\gamma}$ is $\le C\exp(-cN^{2\gamma})$ for some constant $c > 0$. Hence, using the parametrization (4.9) in (4.12) we can restrict to $|t'| \le N^{\gamma}$, $|s'| \le N^{\gamma}$. We can use (4.17) if we want an estimate of the integral. To get the asymptotics we make a local saddle-point argument. To leading order we have $p_c(\beta) = 1 + \tau/dN^{1/3}$, $p_c(\beta') = 1 - \tau'/dN^{1/3}$ and hence the condition (4.11) requires
$$\tau - \tau' + \eta + \eta' > 0. \tag{4.18}$$
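We will use the parametrizations (4.9) and consider the integral
$$-\frac{(1-\alpha)^{2(v-u)}}{(2\pi i)^2} \int_{|t|\le N^{\gamma}} dt \int_{|s|\le N^{\gamma}} ds\, \frac{z'(t)}{z(t)}\, \frac{w'(s)}{w(s)}\, \frac{z(t)}{z(t)-w(s)}\, \frac{w(s)^{y+N(1-\mu_c(\beta'))}}{z(t)^{x+N(1-\mu_c(\beta))}}\, e^{Nf_{\mu_c(\beta),\beta}(z(t)) + Nf_{\mu_c(\beta'),\beta'}(1/w(s))}. \tag{4.19}$$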
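Now,
$$Nf_{\mu_c(\beta),\beta}\Big(p_c(\beta) + \frac{1}{dN^{1/3}}(\eta - it)\Big) = Nf_{\mu_c(\beta),\beta}(p_c(\beta)) + \frac{i}{3}\, \frac{1}{2d^3}\, f^{(3)}_{\mu_c(\beta),\beta}(p_c(\beta))\,(t+i\eta)^3 + r_N(t)$$
$$= Nf_{\mu_c(\beta),\beta}(p_c(\beta)) + \frac{i}{3}(t+i\eta)^3 + r_N(t), \tag{4.20}$$
where the remainder term $r_N(t)$ can be neglected for $|t| \le N^{\gamma}$. Also, $z'(t) = -i/dN^{1/3}$, $w'(s) = iw(s)^2/dN^{1/3}$ and
$$\frac{w(s)}{z(t)-w(s)} \sim -\frac{dN^{1/3}}{\tau' - \tau + i(t + i\eta + s + i\eta')}. \tag{4.21}$$
Furthermore
$$\frac{w(s)^{y+N(1-\mu_c(\beta'))}}{z(t)^{x+N(1-\mu_c(\beta))}} \sim e^{\xi'\tau' - \xi\tau + i\xi(t+i\eta) + i\xi'(s+i\eta')}. \tag{4.22}$$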
We also need to compute
$$e^{Nf_{\mu_c(\beta),\beta}(p_c(\beta))} = \frac{1}{p_c^{N\mu_c(\beta)+u}}\, \frac{(p_c(\beta)-\alpha)^{N+u}}{(1-\alpha p_c(\beta))^{N-u}}.$$
Using the formulas (4.4) and (4.5) above a rather long computation, which we omit, shows that
$$e^{Nf_{\mu_c(\beta),\beta}(p_c(\beta))} \sim (1-\alpha)^{2u} e^{\tau^3/3}. \tag{4.23}$$
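Inserting (4.20) – (4.23) into (4.19) we see that, provided (4.18) holds,
$$\lim_{N\to\infty} dN^{1/3}\, \tilde K_N\Big(2\frac{1+\alpha}{1-\alpha}\frac{1}{d}N^{2/3}\tau,\ \frac{2\alpha}{1-\alpha}N + (\xi-\tau^2)dN^{1/3};\ 2\frac{1+\alpha}{1-\alpha}\frac{1}{d}N^{2/3}\tau',\ \frac{2\alpha}{1-\alpha}N + (\xi'-\tau'^2)dN^{1/3}\Big)$$
$$= -\frac{1}{4\pi^2}\, e^{\frac{1}{3}(\tau^3-\tau'^3) + \xi'\tau' - \xi\tau} \int_{\operatorname{Im} z=\eta} \int_{\operatorname{Im} w=\eta'} \frac{e^{i\xi z + i\xi' w + \frac{i}{3}(z^3+w^3)}}{\tau' - \tau + i(z+w)}\, dz\, dw. \tag{4.24}$$
Here we have used $\mu_c(\beta) = \frac{1+\alpha}{1-\alpha} - \frac{\alpha}{1-\alpha^2}\beta^2 + \dots$. We also want to compute the corresponding limit of (3.23) with $G(w,w)$ given by (4.1), i.e. we consider, for $u < v$,
$$\phi_{2u,2v}(x,y) = \frac{(1-\alpha)^{2(v-u)}}{2\pi} \int_{-\pi}^{\pi} e^{i(y-x)\theta - (v-u)\log(1+\alpha^2-2\alpha\cos\theta)}\, d\theta. \tag{4.25}$$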
If we set $g(\theta) = \log(1+\alpha^2-2\alpha\cos\theta)$, then
$$g'(\theta) = \frac{2\alpha\sin\theta}{1+\alpha^2-2\alpha\cos\theta},$$
and we see that $g(\theta)$ has a quadratic minimum at $\theta = 0$. Hence, we can immediately both compute the asymptotics of and estimate the integral in (4.25) when $x = 2\alpha(1-\alpha)^{-1}N + (\xi-\tau^2)dN^{1/3}$, $y = 2\alpha(1-\alpha)^{-1}N + (\xi'-\tau'^2)dN^{1/3}$, $u = \frac{1+\alpha}{1-\alpha}d^{-1}N^{2/3}\tau$ and $v = \frac{1+\alpha}{1-\alpha}d^{-1}N^{2/3}\tau'$. We obtain
$$\lim_{N\to\infty} dN^{1/3}\phi_{2u,2v}(x,y) = \frac{1}{2\pi}\int_{\mathbb{R}} e^{i(\xi'-\xi+\tau^2-\tau'^2)t - (\tau'-\tau)t^2}\, dt = \frac{1}{\sqrt{4\pi(\tau'-\tau)}}\, e^{-(\xi'-\xi+\tau^2-\tau'^2)^2/4(\tau'-\tau)}. \tag{4.26}$$
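The last equality in (4.26) is the standard Gaussian Fourier integral $\frac{1}{2\pi}\int_{\mathbb{R}} e^{iat - bt^2}\,dt = (4\pi b)^{-1/2}e^{-a^2/4b}$ with $a = \xi'-\xi+\tau^2-\tau'^2$ and $b = \tau'-\tau > 0$. The sketch below checks this identity by trapezoidal quadrature; the truncation width, the step count and the sample values of $a$ and $b$ are assumptions chosen for illustration.

```python
import cmath
import math

def gaussian_fourier_numeric(a, b, half_width=12.0, steps=24000):
    """Trapezoidal approximation of (1/2pi) * integral of exp(i a t - b t^2) over R."""
    step = 2 * half_width / steps
    total = 0j
    for k in range(steps + 1):
        t = -half_width + k * step
        weight = 0.5 if k in (0, steps) else 1.0   # trapezoid end-point weights
        total += weight * cmath.exp(1j * a * t - b * t * t)
    return total * step / (2 * math.pi)

def gaussian_fourier_closed(a, b):
    """Closed form exp(-a^2 / (4 b)) / sqrt(4 pi b), valid for b > 0."""
    return math.exp(-a * a / (4 * b)) / math.sqrt(4 * math.pi * b)
```

The truncation to $|t| \le 12$ is only adequate when $b$ is not too small; for small $\tau'-\tau$ a wider window would be needed.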
We want to identify the right-hand side of (4.24) combined with (4.26) with the extended Airy kernel. This can be done using Proposition 2.3. Combining this double integral formula for the extended Airy kernel with (4.24) and (4.26) we obtain the following result.
Proposition 4.1. Let $d = (1-\alpha)^{-1}\alpha^{1/3}(1+\alpha)^{1/3}$ and let $K_N$ be given by (3.24). Then
$$\lim_{N\to\infty} dN^{1/3} K_N\Big(2\frac{1+\alpha}{1-\alpha}d^{-1}N^{2/3}\tau,\ \frac{2\alpha}{1-\alpha}N + (\xi-\tau^2)dN^{1/3};\ 2\frac{1+\alpha}{1-\alpha}d^{-1}N^{2/3}\tau',\ \frac{2\alpha}{1-\alpha}N + (\xi'-\tau'^2)dN^{1/3}\Big)$$
$$= e^{(\tau^3-\tau'^3)/3 + \xi'\tau' - \xi\tau} A(\tau,\xi;\tau',\xi') \tag{4.27}$$
uniformly for $\xi, \xi', \tau, \tau'$ in a compact set.
We can now combine formula (3.27), Theorem 3.14, Proposition 4.1 and some estimates of $K_N$, which can be obtained from the asymptotic analysis above, to prove Theorem 1.1 on convergence in distribution to the Airy process. This can be done by proving that the Fredholm expansions converge. The details of this are similar to those in the proof of Lemma 3.1 in [16] and we will not present them here. The individual determinants in the Fredholm expansion can be estimated using the Hadamard inequality. More details for a very similar convergence theorem are given in [22].

5. A Functional Limit Theorem

5.1. Weak convergence. Consider the PNG height functions $h_k(x, 2N-1)$ defined in Sect. 3. Set
$$t_j = \frac{j}{cN^{2/3}},$$
where $c = (1+\alpha)(1-\alpha)^{-1}d^{-1}$, $j \in \mathbb{Z}$. The normalized height functions are
$$H_{N,k}(t_j) = \frac{1}{dN^{1/3}}\Big(h_k(2j, 2N-1) - \frac{2\alpha}{1-\alpha}N\Big)$$
with $d$ as in (4.6), $k \in \mathbb{N}$. For a given function $f : \mathbb{R} \to \mathbb{C}$ we write
$$f_N(x) = f\Big(\frac{1}{dN^{1/3}}\Big(x - \frac{2\alpha}{1-\alpha}N\Big)\Big).$$
Assume that there is a $K$ such that $f(x) = 0$ for $x \le K$. Define
$$H_N(f, t_j) = \sum_{k=0}^{\infty} f_N(h_k(2j, 2N-1)) = \sum_{k=0}^{\infty} f(H_{N,k}(t_j)). \tag{5.1}$$
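To make the definition (5.1) concrete, here is a minimal sketch of how the linear statistic $H_N(f, t_j)$ could be evaluated from a finite list of heights at a fixed site. The function names, the sample heights and the step function used in the usage example are hypothetical; the heights are not produced by any PNG simulation, and the step function is not smooth as Lemma 5.1 below requires.

```python
def scaling_d(alpha):
    """The constant d from (4.6)."""
    return (alpha * (1 + alpha)) ** (1 / 3) / (1 - alpha)

def normalized_height(height, N, alpha):
    """Centre by 2*alpha*N/(1-alpha) and scale by d*N^(1/3), as in H_{N,k}."""
    d = scaling_d(alpha)
    return (height - 2 * alpha * N / (1 - alpha)) / (d * N ** (1 / 3))

def linear_statistic(f, heights, N, alpha):
    """Sum of f over the normalized heights: the right-hand side of (5.1).

    `heights` lists h_0, h_1, ... at one site; since f vanishes far to the
    left, only finitely many curves contribute and the list may be truncated.
    """
    return sum(f(normalized_height(h, N, alpha)) for h in heights)
```

For instance, with $N = 8$ and $\alpha = 0.5$ the centring constant is $16$, so `linear_statistic(lambda x: 1.0 if x > 0 else 0.0, [20, 15, 9, -3, -4], 8, 0.5)` counts the single height above $16$.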
The basic estimate in the proof of the functional limit theorem is
Lemma 5.1. Assume that $f$ is a $C^{\infty}$ function and that there are constants $K_1$ and $K_2$ such that $f(x) = 0$ if $x \le K_1$, and $f(x)$ equals a constant if $x \ge K_2$. There is a constant $C(f,\alpha)$ so that
$$E[(H_N(f,t_u) - H_N(f,t_v))^4] \le C(f,\alpha)\, e^{-|t_u|^3}\, |t_u - t_v|^2, \tag{5.2}$$
for $|t_u - t_v| \le 1$ and $|t_u|, |t_v| \le \log N$.
We will prove this lemma in Subsect. 5.3. Consider $H_N(f,t_j)$ as defined by (5.1). The next lemma is a standard consequence of Lemma 5.1.
Lemma 5.2. Under the same assumptions as in Lemma 5.1,
$$P[\max_{j=u,\dots,v} |H_N(f,t_j) - H_N(f,t_u)| \ge \lambda] \le C(f,\alpha)\lambda^{-4} e^{-|t_u|^3} |t_u - t_v|^2. \tag{5.3}$$
Proof. Let $\eta_i = H_N(f,t_{u+i}) - H_N(f,t_{u+i-1})$, $T_m = \sum_{i=1}^{m}\eta_i$ and $T_0 = 0$, so that $T_j - T_i = H_N(f,t_{u+j}) - H_N(f,t_{u+i})$. It follows from (5.2) and Chebyshev's inequality that
$$P[|T_i - T_j| \ge \lambda] \le C(f,\alpha)\lambda^{-4} e^{-|t_u|^3} \Big(\sum_{i<\ell\le j} u_\ell\Big)^2$$
for $0 \le i < j \le v-u$, where $u_\ell = t_{\ell+u} - t_{\ell-1+u}$. This implies (5.3) according to Theorem 12.2 in [6].
Fix $T > 0$ and consider a rescaled top height curve $H_{N,0}(t)$ for $|t| \le T$ and its modulus of continuity, $0 < \delta \le 1$,
$$w_N(\delta) = \sup_{|t|,|s|\le T,\ |s-t|\le\delta} |H_{N,0}(t) - H_{N,0}(s)|.$$
Lemma 5.3. Let $w_N$ be defined as above. Given $\epsilon, \lambda > 0$ there is a $\delta > 0$ and an integer $N_0$ such that $P[w_N(\delta) \ge \lambda] \le \epsilon$ if $N \ge N_0$.
Together with the convergence of the finite dimensional distributions, Theorem 1.1, this proves Theorem 1.2, [6]. We turn now to the proof of Lemma 5.3.
Proof. Assume that $\delta^{-1}, T \in \mathbb{Z}$ and divide the interval $[-T, T]$ into $2m$ parts of length $T/m = \delta$. Write $r_j = [j\delta[cN^{2/3}]]$, $c = 2\alpha/(1-\alpha)$, so that $t_{r_j} \approx j\delta$.
Claim 5.4. Let $L = T[cN^{2/3}]$ and let $B_M$ be the subset of our probability space where $\max_{|j|\le L}|H_{N,0}(t_j)| \le M$. Then, given $\epsilon > 0$, we can choose $M$ so that
$$P[B_M^c] \le \epsilon. \tag{5.4}$$
We will prove this claim below. We will also need
Claim 5.5. For any $\lambda > 0$ there is a constant $C(M)$ that depends on $M$ but not on $\lambda$ such that
$$P[\max_{r_j\le i\le r_{j+1}} |H_{N,0}(t_i) - H_{N,0}(t_{r_j})| \ge \lambda,\ B_M] \le \frac{C(M)}{\lambda}\,\delta^2. \tag{5.5}$$
We will return to the proofs. The proofs of both claims are based on choosing appropriate functions $f$ in Lemma 5.2 and results about convergence in distribution. Assuming the validity of the two claims we can prove Lemma 5.3. Set
$$A_j = \Big\{\max_{t_{r_j}\le s\le t_{r_{j+1}}} |H_{N,0}(s) - H_{N,0}(t_{r_j})| \ge \lambda/3\Big\}, \tag{5.6}$$
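so that $\{w_N(\delta) \ge \lambda\} \subseteq \cup_{|j|\le m} A_j$. Choose $M$ so large that $P[B_M^c] \le \epsilon$, which is possible by Claim 5.4. Hence
$$P[w_N(\delta) \ge \lambda] \le \epsilon + \sum_{|j|\le m} P[A_j \cap B_M]. \tag{5.7}$$
Now, if the inequality in (5.6) holds then
$$\max_{r_j\le i\le r_{j+1}} |H_{N,0}(t_i) - H_{N,0}(t_{r_j})| \ge \lambda/9.$$
Consequently, using (5.5) and (5.7),
$$P[w_N(\delta) \ge \lambda] \le \epsilon + (2m+1)C(M)\lambda^{-1}\delta^2 \le \epsilon + 2TC(M)\lambda^{-1}\delta.$$
Choose $\delta$ so that $\delta \le \epsilon\lambda/2TC(M)$. Lemma 5.3 is proved.
Consider now Claim 5.4. Pick a $C^{\infty}$ function $g$ such that $0 \le g \le 1$ and
$$g(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ 0 & \text{if } x \le -1 \end{cases} \tag{5.8}$$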
and let gM (x) = g(x − M). It is not hard to see that if we take f = gM in (5.2) the C(f, α) can be taken to be independent of M (only sup-norms of f and its derivatives enter). If HN (gM , trj ) > 1/4, then HN,0 (trj ) ≥ M − 1 and using the convergence in distribution to F2 we see that we can choose M so large that P[HN (gM , trj ) > 1/4] ≤ 2 for |j | ≤ m and all sufficiently large N . Let ω denote a point in our probability space. Now, P[ max HN (gM , tj ) > 1/2] |j |≤L
= P[HN (gM , tj (ω) ) > 1/2] m
= ≤
j =−m+1 m
P[HN (gM , tj (ω) ) > 1/2, trj −1 ≤ tj (ω) ≤ trj ] P[HN (gM , trj j ) > 1/4]
j =−m+1 m
+
P[
j =−m+1
≤ 2m 2 +
max
rj −1 ≤i≤trj
m j =−m+1
|HN (gM , ti ) − HN (gM , trj −1 )| ≥ 1/4]
C 2 2 + Cδ δ ≤ 2T (1/4)4 δ
by Lemma 5.2. We can now choose $\delta = \epsilon$. If $H_{N,0}(t_j) \ge M$, then $H_N(g_M, t_j) \ge 1$, and hence $\max_{|j|\le L} H_{N,0}(t_j) \ge M$ implies $\max_{|j|\le L} H_N(g_M, t_j) \ge 1/2$. It follows that
$$P[\max_{|j|\le L} H_{N,0}(t_j) \ge M] \le \epsilon.$$
The case $\max_{|j|\le L} H_{N,0}(t_j) \le -M$ is analogous. This proves Claim 5.4.
To prove Claim 5.5 we let $i(\omega)$ be defined by
$$\max_{r_j\le i\le r_{j+1}} |H_{N,0}(t_i) - H_{N,0}(t_{r_j})| = |H_{N,0}(t_{i(\omega)}) - H_{N,0}(t_{r_j})|.$$
Let $I_j = [j\lambda, (j+1)\lambda)$, $j = -K, \dots, K-1$, where $M = K\lambda$, $K \in \mathbb{Z}_+$. Take a $C^{\infty}$ function $f$, $0 \le f \le 1$, such that $f(x) = 0$ if $x \le -\lambda$, $f(x) = 1$ if $0 \le x \le \lambda$ and $f(x) = 0$ if $x \ge \lambda$. Set
$$f_j(x) = f(x - \lambda j). \tag{5.9}$$
Suppose first that $H_{N,0}(t_{i(\omega)}) \le H_{N,0}(t_{r_j}) - 2\lambda$ and that $\omega \in B_M$. Then there is a $k(\omega)$ such that $H_{N,0}(t_{r_j}) \in I_{k(\omega)}$, and $H_{N,0}(t_{i(\omega)}) \le (k+1)\lambda - 2\lambda = (k-1)\lambda$, and consequently $H_N(f_{k(\omega)}, t_{i(\omega)}) = 0$. Since $H_N(f_{k(\omega)}, t_{r_j}) \ge 1$, we see that $|H_N(f_{k(\omega)}, t_{i(\omega)}) - H_N(f_{k(\omega)}, t_{r_j})| \ge 1$. Hence,
$$\max_{|k|\le m}\ \max_{r_j\le i\le r_{j+1}} |H_N(f_k, t_i) - H_N(f_k, t_{r_j})| \ge 1. \tag{5.10}$$
Call this event $F$. If we instead suppose that $H_{N,0}(t_{i(\omega)}) \ge H_{N,0}(t_{r_j}) + 2\lambda$ we can proceed similarly and see that (5.10) still holds. Now,
$$P[F] \le P[\cup_{r_j\le i\le r_{j+1}} \{|H_N(f_k, t_i) - H_N(f_k, t_{r_j})| \ge 1\}] \le \sum_{k=-K+1}^{K} C|t_{r_{j+1}} - t_{r_j}|^2 \le \frac{2MC}{\lambda}\,\delta^2,$$
by Lemma 5.2.
5.2. Transversal fluctuations. In this section we will prove Corollary 1.3, Proposition 1.4 and Theorem 1.6. Let $T > 0$ be fixed and set
$$S_N^T(u) = \sup_{-T\le t\le u} H_{N,0}(t), \qquad S^T(u) = \sup_{-T\le t\le u} (A(t) - t^2).$$
We will write $S_N^T$ for $S_N^T(T)$ and $S^T$ for $S^T(T)$.
Lemma 5.6. Given $\epsilon > 0$ we can choose $T = T(\epsilon)$ so that
$$P[S_N^{\infty} \ne S_N^T] \le \epsilon \tag{5.11}$$
for all sufficiently large $N$.
Note that together with Theorem 1.2 and the results in [4] this proves Corollary 1.3.
Proof. Let $g_M$ be defined as above and set $R_j = T + (j-1)\delta$, $j \ge 1$, where $\delta$ will be specified below. It follows from Lemma 5.2 that for $R_j \le \log N$,
$$P[\sup_{R_j\le t\le R_{j+1}} |H_N(g_M, t) - H_N(g_M, R_j)| \ge 1/2] \le Ce^{-R_j^3}\delta^2, \tag{5.12}$$
where $C$ is independent of $M$. Now,
$$P[\sup_{T\le t\le R_L} H_{N,0}(t) \ge M] \le P[\sup_{T\le t\le R_L} H_N(g_M, t) \ge 1]$$
$$= P[\max_{1\le j<L}\ \sup_{R_j\le t\le R_{j+1}} H_N(g_M, t) \ge 1]$$
$$\le \sum_{j=1}^{L-1} P[\sup_{R_j\le t\le R_{j+1}} (H_N(g_M, t) - H_N(g_M, R_j)) + H_N(g_M, R_j) \ge 1]$$
$$\le \sum_{j=1}^{L-1} P[\sup_{R_j\le t\le R_{j+1}} (H_N(g_M, t) - H_N(g_M, R_j)) \ge 1/2] + \sum_{j=1}^{L-1} P[H_N(g_M, R_j) \ge 1/2]. \tag{5.13}$$
Claim 5.7. There is a positive constant $c_1$ such that
$$P[H_{N,0}(R) \ge s] \le e^{-c_1(s+R^2)^{3/2}}. \tag{5.14}$$
Proof. Let $c = (1+\alpha)(1-\alpha)^{-1}d^{-1}$, $d^3 = \alpha(1+\alpha)(1-\alpha)^{-3}$ and $t_j = j/cN^{2/3}$ as before. We have
$$P[H_{N,0}(R) \ge s] = P\Big[\frac{1}{dN^{1/3}}\Big(h_0(2j, 2N-1) - \frac{2\alpha}{1-\alpha}N\Big) \ge s\Big] = P\Big[G(N + cN^{2/3}R,\ N - cN^{2/3}R) \ge \frac{2\alpha}{1-\alpha}N + sdN^{1/3}\Big]$$
by (1.4) and the definition of $H_{N,0}$. Recall that the parameter in the geometric distribution is $q = \alpha^2$. By Corollary 2.4 in [16] we have, for all $K \ge 1$ and $\gamma \ge 1$,
$$P[G([\gamma K], K) > Kt] \le e^{-2KJ(t+1)}, \tag{5.15}$$
where the function $J$ satisfies
$$J\big((1+\sqrt{q\gamma})^2(1-q)^{-1} + \delta\big) \ge c_1\delta^{3/2} \tag{5.16}$$
for $0 \le \delta \le 1$; $c_1$ is a positive constant. We take $K = N - cN^{2/3}R$, $\gamma = (N + cN^{2/3}R)/K$ and $t = (2\alpha(1-\alpha)^{-1}N + sdN^{1/3})/K$. Pick $\delta$ so that $(1+\sqrt{q\gamma})^2(1-q)^{-1} + \delta = 1 + t$. This gives $\delta = dN^{-2/3}(s + R^2) + O(N^{-1})$ and if we insert this into (5.16), the estimate (5.15) gives us exactly what we want.
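The expansion $\delta = dN^{-2/3}(s+R^2) + O(N^{-1})$ in the proof of Claim 5.7 can be checked numerically. The sketch below computes $\delta$ directly from its defining relation and compares it with the leading term. It treats $N$ as a real number and ignores integer parts, and the function names and sample values are assumptions made for illustration.

```python
import math

def scaling_constants(alpha):
    """Return (c, d) with d^3 = alpha(1+alpha)/(1-alpha)^3 and c = (1+alpha)/((1-alpha) d)."""
    d = (alpha * (1 + alpha)) ** (1 / 3) / (1 - alpha)
    c = (1 + alpha) / ((1 - alpha) * d)
    return c, d

def delta_exact(N, R, s, alpha):
    """delta defined by (1 + sqrt(q*gamma))^2 / (1 - q) + delta = 1 + t, with q = alpha^2."""
    c, d = scaling_constants(alpha)
    q = alpha**2
    K = N - c * N ** (2 / 3) * R
    gamma = (N + c * N ** (2 / 3) * R) / K
    t = (2 * alpha / (1 - alpha) * N + s * d * N ** (1 / 3)) / K
    return 1 + t - (1 + math.sqrt(q * gamma)) ** 2 / (1 - q)

def delta_leading(N, R, s, alpha):
    """Leading term d * N^(-2/3) * (s + R^2)."""
    _, d = scaling_constants(alpha)
    return d * N ** (-2 / 3) * (s + R * R)
```

For large $N$ the ratio `delta_exact(N, R, s, alpha) / delta_leading(N, R, s, alpha)` should approach $1$, with a relative error of order $N^{-1/3}$.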
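If $H_N(g_M, R_j) \ge 1/2$, then $H_{N,0}(R_j) \ge M-1$ and hence
$$\sum_{j=1}^{L-1} P[H_N(g_M, R_j) \ge 1/2] \le \frac{1}{\delta}\sum_{j=1}^{L-1} e^{-c(M-1+(T+(j-1)\delta)^2)^{3/2}}\,\delta \le \frac{1}{\delta}\int_{T-1}^{\infty} e^{-c(M-1+x^2)^{3/2}}\,dx. \tag{5.17}$$
Using (5.12) we find
$$\sum_{j=1}^{L-1} P[\sup_{R_j\le t\le R_{j+1}} (H_N(g_M, t) - H_N(g_M, R_j)) \ge 1/2] \le \sum_{j=1}^{L-1} Ce^{-R_j^3}\delta^2 \le C\delta\int_{T-1}^{\infty} e^{-x^3}\,dx. \tag{5.18}$$
Inserting (5.17) and (5.18) into (5.13) gives
$$P[\sup_{T\le t\le R_L} H_{N,0}(t) \ge M] \le \frac{1}{\delta}\int_{T-1}^{\infty} e^{-c(M-1+x^2)^{3/2}}\,dx + C\delta\int_{T-1}^{\infty} e^{-x^3}\,dx \tag{5.19}$$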
if $R_L \le \log N$. We can take $\delta = 1$. It follows from (5.14) that
$$P[\sup_{R_L\le t} H_{N,0}(t) \ge M] \le \sum_{t_u\ge R_L} P[H_{N,0}(t_u) \ge M] \le \sum_{u\ge cN^{2/3}R_L} e^{-c(M+t_u^2)^{3/2}} \le CNe^{-c(\log N)^3} < \epsilon/4 \tag{5.20}$$
if $N$ is sufficiently large. We know that $P[H_{N,0}(0) \le M] \to F_2(M)$ as $N \to \infty$ and we can choose $M$ so large that the right-hand side of (5.19) is $\le \epsilon/4$. Together with (5.20) this gives (using symmetry),
$$P[\sup_{|t|\ge T} H_{N,0}(t) \ge M] \le \epsilon. \tag{5.21}$$
If $H_{N,0}(0) > M$ and $\sup_{|t|\ge T} H_{N,0}(t) \le M$, then $S_N^{\infty} = S_N^T$ and consequently
$$P[S_N^{\infty} \ne S_N^T] \le P[H_{N,0}(0) \le M] + P[\sup_{|t|\ge T} H_{N,0}(t) > M] \le 2\epsilon$$
for all sufficiently large $N$.
We turn now to the transversal fluctuations and the proof of Theorem 1.6. Define
$$K_N^T = \inf\{u \ge -T\ ;\ S_N^T(u) = S_N^T\}, \qquad K^T = \inf\{u \ge -T\ ;\ S^T(u) = S^T\},$$
which give the leftmost point of maximum in $[-T, T]$ before and after the limit. We first prove Proposition 1.4.
Proof (Proposition 1.4). Note that
$$\{K_N < -T\} \subseteq \{\sup_{t\le -T} H_{N,0}(t) \ge H_{N,0}(0)\}.$$
It follows that
$$P[K_N < -T] \le P[\sup_{t\le -T} H_{N,0}(t) \ge M] + P[H_{N,0}(0) < M] \le 2\epsilon,$$
by (5.21) and the discussion preceding it. Also, $\{K_N > T\} \subseteq \{S_N^{\infty} \ne S_N^T\}$ and we can use Lemma 5.6.
Proof (Theorem 1.6). It follows from Lemma 5.6 that given $\epsilon > 0$ we can choose $T$ and $N_0$ so that
$$P[K_N = K_N^T] \ge 1 - \epsilon \tag{5.22}$$
if $N \ge N_0$. Let $h_T : C(\mathbb{R}) \to \mathbb{R}$ be defined by
$$h_T(x) = \inf\{u \ge -T\ ;\ \sup_{-T\le t\le u} x(t) = \sup_{-T\le t\le T} x(t)\},$$
and let $D_{h_T} = \{x \in C(\mathbb{R})\ ;\ h_T \text{ is discontinuous at } x\}$. It follows from our assumption that $P[D_{h_T}] = 0$, since $h_T$ is continuous at $x$ unless $x$ has two distinct maximum points. Since $H_{N,0}$ converges in distribution to $X$ in $C[-T, T]$ it follows that
$$K_N^T = h_T(H_{N,0}) \to h_T(X) = K^T \tag{5.23}$$
as $N \to \infty$. Let $D_T$ be all points of discontinuity for $x \to P[K^T \le x]$, $T \in \mathbb{Z}$, and $D = \cup_{T\ge 1} D_T$. We will prove that
$$P[K_N \le x] \to P[K \le x] \tag{5.24}$$
as $N \to \infty$ for all $x \in \mathbb{R} \setminus D$, which implies what we want since $D$ is countable. All the results and assumptions that are behind the estimate (5.22) can also be proved for the limiting Airy process and we can assume that $N_0$ and $T \in \mathbb{Z}_+$ are chosen so that also
$$P[K = K^T] \ge 1 - \epsilon. \tag{5.25}$$
Let $x \in \mathbb{R} \setminus D$. Then, $P[K_N \le x] = P[K_N^T \le x, K_N^T = K_N] + P[K_N \le x, K_N^T \ne K_N]$, and similarly for $K_N^T$. Hence, $|P[K_N \le x] - P[K_N^T \le x]| \le 2\epsilon$ if $N \ge N_0$. Since $x \in \mathbb{R} \setminus D$ it follows from (5.23) that we can choose $N_1$ so that $|P[K_N^T \le x] - P[K^T \le x]| \le \epsilon$ if $N \ge N_1$. It follows from (5.25) that $|P[K \le x] - P[K^T \le x]| \le \epsilon$. Combining the estimates we see that $|P[K_N \le x] - P[K \le x]| \le 4\epsilon$ if $N \ge \max(N_0, N_1)$, which proves (5.24).
5.3. Proof of Lemma 5.1. In this subsection we will give the proof of Lemma 5.1. The proof is rather long and complicated. In the expansions of the determinant giving the 4-point correlation function many terms appear. These terms have to be combined in the right way to see the cancellations and obtain the desired estimate. We will describe how the terms should be combined, but then do the details of the estimates only for some typical terms. The others are handled in a similar fashion.
Proof. The left-hand side of (5.2) can be written
$$E\Big[\sum_{k_1,k_2,k_3,k_4=0}^{\infty}\ \prod_{r=1}^{4} \big(f_N(h_{k_r}(2u, 2N-1)) - f_N(h_{k_r}(2v, 2N-1))\big)\Big]. \tag{5.26}$$
We can rewrite this using formula (3.25) in Theorem 3.14. Let us write the kernel $K_N(2u,x;2v,y)$ in (3.24) as $K_{uv}(x,y)$. We will use the following notation:
$$K^{r_1r_2\dots r_m}_{s_1s_2\dots s_m}\begin{pmatrix} x_1 & x_2 & \dots & x_m \\ y_1 & y_2 & \dots & y_m \end{pmatrix} = \det(K(r_i, x_i; s_j, y_j))_{i,j=1}^{m}, \tag{5.27}$$
and we will also write
$$K(r_1, x_1\ r_2, x_2 \dots r_m, x_m) = \det(K(r_i, x_i; r_j, x_j))_{i,j=1}^{m}. \tag{5.28}$$
Furthermore, we will write
$$D_{u_1,\dots,u_m}(x_1, \dots, x_m) = K(2u_1, x_1\ 2u_2, x_2 \dots 2u_m, x_m). \tag{5.29}$$
Set hN (x1 , x2 , x3 ) = −6[fN (x1 )2 fN (x2 )fN (x3 ) + fN (x1 )fN (x2 )2 fN (x3 ) −fN (x1 )fN (x2 )fN (x3 )], which is symmetric under permutation of x1 and x2 . Then, the sum in (5.26) can be written fN (x1 )fN (x2 )fN (x3 )fN (x4 )[Duuuu (x1 , x2 , x3 , x4 ) − 4Duuuv (x1 , x2 , x3 , x4 ) x∈Z4
+ 6Duuvv (x1 , x2 , x3 , x4 ) − 4Duvvv (x1 , x2 , x3 , x4 ) + Dvvvv (x1 , x2 , x3 , x4 )] {6fN (x1 )2 fN (x2 )fN (x3 )[Duuu (x1 , x2 , x3 ) + Dvvv (x1 , x2 , x3 )] + x∈Z3
+ hN (x1 , x2 , x3 )Duuv (x1 , x2 , x3 ) + hN (x3 , x2 , x1 )Duvv (x1 , x2 , x3 )} 2(fN (x1 )3 fN (x2 ) + fN (x1 )fN (x3 )3 ) + x∈Z2
× [Duu (x1 , x2 ) − 2Duv (x1 , x2 ) + Dvv (x1 , x2 )] 3fN (x1 )2 fN (x2 )2 [Duu (x1 , x2 ) + 2Duv (x1 , x2 ) + Dvv (x1 , x2 )] + x∈Z2
+
2fN (x1 )4 [Kuu (x1 , x1 ) + Kvv (x1 , x1 )]
x1 ∈Z
. = 1 + 2 + 3 + 4 + 5 .
(5.30)
We have Kuv = K˜ uv − φ if u < v and Kuv = K˜ uv if u ≥ v. Here we have written φ = φu,v . Set Kuv = K˜ uv − K˜ uu , Kvu = K˜ vu − K˜ uu Kvv = K˜ uv − K˜ uu . We see
from (4.26) that φ acts like a kind of approximate δ-function. This will be important for the cancellation between different terms in (5.30). The argument goes as follows. We will take out all terms in (5.30) containing φ and combine them with other terms so that we get cancellation. We will then expand in Kuv , Kvu and Kvv . The terms linear in K will cancel and what will remain will be terms containing (K)2 or higher powers. They will give a contribution proportional to |tu − tv |2 which is what we want. In the computations below we use symmetries and also relabelling of variables. Expand in φ and in the terms linear in φ we expand in K. Let D˜ denote the same ˜ We find object as in (5.29) but with K replaced by K. 1 = fN (x1 )fN (x2 )fN (x3 )fN (x4 )[D˜ uuuu (x1 , x2 , x3 , x4 ) − 4D˜ uuuv (x1 , x2 , x3 , x4 ) x
+ 6D˜ uuvv (x1 , x2 , x3 , x4 ) − 4D˜ uvvv (x1 , x2 , x3 , x4 ) + D˜ vvvv (x1 , x2 , x3 , x4 )] uu x3 x4 [Kvu (x2 , x3 ) + Kuv (x2 , x3 ) 24φ(x1 , x4 )Kuu + x1 x2 x
− Kvv (x2 , x3 )]fN (x1 )fN (x2 )fN (x3 )fN (x4 ) vv x3 x4 ˜ f (x )f (x )f (x )f (x ) 12φ(x1 , x4 )φ( x2 , x3 )Kuu − x1 x2 N 1 N 2 N 3 N 4 x × (terms with K 2 ).
(5.31)
x
We will give a brief discussion of the K 2 terms below. Also we will see then that terms containing Kvu (x, y) + Kuv (x, y) − Kvv (x, y)
(5.32)
˜ of (5.31) in will give a contribution proportional to |tu − tv |2 . If we expand the D-part K we will see that the terms linear in K cancel out. Since obviously the 0:th order term equals zero we are left with K 2 -terms. The term containing two φ-factors will be combined with other terms below. We expand 2 similarly. The part linear in K is 2 uu x3 x1 ˜ [Kvu (x2 , x3 ) − 24fN (x1 ) fN (x2 )fN (x3 )Kuu x1 x2 x
+ Kuv (x2 , x3 ) − Kvv (x2 , x3 )].
(5.33)
Actually this sum can be combined with the corresponding term in (5.31) to get some cancellation, see the φ-calculations below, but we can also use the fact that (5.32) has the right order. We also get K 2 -terms and a term linear in φ, 12φ(x1 , x3 )[fN (x1 )2 fN (x2 )fN (x3 ) − fN (x1 )fN (x2 )fN (x3 )2 ] x
! " uv x2 x3 vv x2 x3 × K˜ uu − K˜ uv x1 x2 x1 x2 " ! uv x2 x3 vv x2 x3 + + K˜ uv 12φ(x1 , x3 )fN (x1 )fN (x2 )2 fN (x3 ) K˜ uu x1 x2 x1 x2 x
. = a + b .
(5.34)
Consider next 3 . We get a term linear in φ, 4φ(x1 , x2 )Kvu (x2 , x1 )[fN (x1 )fN (x2 )3 + fN (x1 )3 fN (x2 )], −
(5.35)
x
a term linear in K, 4(fN (x1 )fN (x2 )3 + fN (x1 )3 fN (x2 )])[Kvu (x1 , x2 ) x
+ Kuv (x1 , x2 ) − Kvv (x1 , x2 )]Kuu (x2 , x1 ),
(5.36)
and K 2 -terms. In (5.36) we again have the expression (5.32). The leading term in 4 is 2 2 ˜ uu x1 x2 uv x1 x2 vv x1 x2 ˜ ˜ + 2Kuv + Kvv 3fN (x1 ) fN (x2 ) Kuu x1 x2 x1 x2 x 1 x2
(5.37)
x
and we also have a term linear in φ, 6fN (x1 )2 fN (x2 )2 φ(x1 , x2 )K˜ vu (x2 , x1 ).
(5.38)
x
Finally we have 5 which is ˜ 1 , x1 )]. 2fN (x1 )4 [K˜ uu (x1 , x1 ) + K(x
(5.39)
x1
When calculating the cancellations involving the φ-terms we will combine the double φ-term in (5.31) with b in (5.34) and (5.37). Also we will combine (5.35), (5.38) and (5.39). We will discuss this second case first in some detail and then the first case more briefly. The term a is similar and finally we will indicate what is involved in estimating (5.32) and the K 2 -terms. We want to estimate φ(x, y)K¯ vu (y, x)[6fN (x1 )2 fN (x2 )2 − 4fN (x1 )fN (x2 )3 x,y∈Z
− 4fN (x1 )3 fN (x2 )] +
(K˜ uu (y, y) + K˜ vv (y, y))fN (y)4 .
(5.40)
y∈Z
Here we have made a symmetrization in $x$ and $y$ by setting
$$\bar K_{vu}(x,y) = \frac{1}{2}[\tilde K_{vu}(x,y) + \tilde K_{vu}(y,x)];$$
note that $\phi(x,y)$ is symmetric in $x$ and $y$. Next, we will introduce some notation and some formulas that will be used. Set
$$g(z) = -\frac{\alpha}{(1-\alpha)^2}\Big(z + \frac{1}{z} - 2\Big) \tag{5.41}$$
and
$$G^*_{ab}(z,w) = \Big(\frac{1-\alpha/z}{1-\alpha z}\Big)^N \Big(\frac{1-\alpha w}{1-\alpha/w}\Big)^N (1+g(z))^a (1+g(w))^{-b}\, \frac{1}{w(z-w)} \tag{5.42}$$
so that
$$\tilde K_{ab}(x,y) = \frac{1}{(2\pi i)^2}\int_{\gamma_{r_2}} dz \int_{\gamma_{r_1}} dw\, G^*_{ab}(z,w)\, \frac{w^y}{z^x}, \tag{5.43}$$
and
$$\phi(x,y) = \frac{1}{2\pi}\int_{-\pi}^{\pi} e^{i(y-x)\theta}(1+g(e^{i\theta}))^{u-v}\, d\theta. \tag{5.44}$$
(5.45)
(5.46)
where Fm (λ) = fˆ (λ1 ) . . . fˆ (λm ), c = 2α(1 + α)−1 and ξm (λ) = (λ1 + · · · + λm − im)/dN 1/3 with d given by (4.8). Integration by parts gives (1 + g(z))u−v − 1 = (u − v)g(z) +
u−v (u − v)2 R (z) + R2 (z), 1 d 4 N 4/3 d 4 N 4/3
(5.47)
where
1
R1 (z) = d 4 N 4/3 g(z)2 0
1−t dt (1 + tg(z))2
and R2 (z) = d N 4
4/3
(log(1 + g(z)))
1
2
(1 − t)(1 + g(z))t (u−v) dt.
0
Let f (x) = f (x + 1) − f (x) be the usual finite difference operator. We have the following formula: y wy w wx fN (x)m = fN (y)m y [(1 + g(z))u−v + (1 + g(w))u−v ] φ(x, y) + x y z z z x∈Z y w y−1 w y+1 α(u − v) w m m f (y) − f (y − 1) + fNm (y) − N N (1 − α)2 zy+1 zy zy wy 1 m + y−1 fN (y − 1) + d m λFm (λ)eiξm (λ)(y−cN) z (2π)m Rm ! u−v −iξm (λ) iξm (λ) × (ze ) − R (z)R (we ) − R (w) R 1 1 1 1 d 4 N 4/3 " (u − v)2 −iξm (λ) iξm (λ) + 4 4/3 R2 (ze ) − R2 (z) + R2 (we ) − R2 (w) (5.48) d N
for |w| = exp(−m/dN 1/3 ) = r1 , |z| = r2 = 1/r1 . To prove this, introduce the formula (5.44) for φ and the formula (5.46) for fNm into the left-hand side of (5.48) and use
(e−iθ r1 eiφ eiξm )x = δ0 θ −
x∈Z
1 (λ + · · · + λ ) − φ , 1 m dN 1/3
where δ0 is the Dirac δ-function, to carry out the x-summation. This gives y y w wx 1 m m iξm (λ)(y−cN) w f φ(x, y) + (x) = d λF (λ)e N m zx zy (2π)m Rm zy x∈Z
× [(1 + g(ze−iξm (λ) ))u−v + (1 + g(weiξm (λ) ))u−v ] wy = fN (y)m y [(1 + g(z))u−v + (1 + g(w))u−v ] z y 1 m iξm (λ)(y−cN) w + d λF (λ)e [(1 + g(ze−iξm (λ) ))u−v − (1 + g(z))u−v m (2π )m Rm zy + (1 + g(weiξm (λ) ))u−v − (1 + g(w))u−v ].
(5.49)
In the last expression we use (5.47) and the explicit form (5.41) of g to obtain the right-hand side of (5.48). We will call the first part of the right-hand side of (5.48), y fN (y)m wzy [(1 + g(z))u−v + (1 + g(w))u−v ], the contraction term, which is the main contribution. The second part is called the finite difference term. We can now insert the integral formula (5.43) into (5.40) and use (5.48). The contraction term from the first sum in (5.40) will then exactly cancel the second sum. Here we use (5.45). What remains is u−v u−v (u − v)2 S0 + 4 4/3 S1 + 4 4/3 S2 , 4 4/3 d N d N d N
(5.50)
where S0 is the part coming from the finite differences, S1 is the part coming from terms involving R1 and S2 from the terms involving R2 . After some computation we find 1 1 α ∗ S0 = − dz dwG (z, w) + w vu (1 − α)2 (2πi)2 γr2 z γr1 wy × (dN 1/3 (fN (y + 1) − fN (y)))4 . (5.51) zy y∈Z
Also,
Si =
wy
1 d 3 λF3 (λ) fN (y)eiξ3 (λ)(y−cN) y (2π )3 R3 z y∈Z 1 wy ∗ {6[h(z, w; ξ2 (λ)) × dz dwG (z, w) vu (2π i)2 γr2 zy γr1 −h(z, w; 0)] − 4[h(z, w; ξ1 (λ)) − h(z, w; 0)] − 4[h(z, w; ξ3 (λ)) − h(z, w; 0)]} , (5.52)
i = 1, 2. In order to restrict the y-summation so that (y −cN )/dN 1/3 ranges over a compact interval we make a summation by parts in (5.52). Recall that we assume that f (y) is a constant for large y. If we let a = exp(iξ3 (λ))w/z and use a y = (1 − a)−1 (a y − a y+1 ) in the y-sum in (5.52) a summation by parts gives wy 1 (fN (y) − fN (y − 1))eiξ3 (λ)y y . 1 − exp(iξ3 (λ))w/z z y∈Z
Hence, for i = 1, 2, Si = (fN (y) − fN (y − 1)) y∈Z
1 × (2πi)2
dz γr2
γr1
1 d 3 λF3 (λ)eiξ3 (λ)(y−cN) (2π)3 R3
dwG∗vu (z, w)
wy {6[h(z, w; ξ2 (λ)) zy
− h(z, w; 0)] − 4[h(z, w; ξ1 (λ)) − h(z, w; 0)] 1 . −4[h(z, w; ξ3 (λ)) − h(z, w; 0)]} 1 − exp(iξ3 (λ))w/z
(5.53)
The expressions $S_i$ will be estimated using the types of estimates derived in Sect. 4. Write $u = (1+\alpha)(1-\alpha)^{-1}d^{-1}N^{2/3}\tau$, $v = (1+\alpha)(1-\alpha)^{-1}d^{-1}N^{2/3}\tau'$ and $y = 2\alpha(1-\alpha)^{-1}N + (\xi-\tau^2)dN^{1/3}$. To estimate (5.51) we can now use our results from Sect. 4. We use
$$z(t) = p_c(\beta) + \frac{\eta}{dN^{1/3}} - \frac{it}{dN^{1/3}} = 1 + \frac{\tau+\eta-it}{dN^{1/3}} + \dots,$$
$$w(s)^{-1} = p_c(\beta') + \frac{\eta'}{dN^{1/3}} - \frac{is}{dN^{1/3}} = 1 + \frac{-\tau'+\eta'-is}{dN^{1/3}} + \dots \tag{5.54}$$
as parametrizations of the integrals as before. Using the same estimates as in Section 4 we can restrict the integration to |t|, |s| ≤ N γ with an error ≤ C exp(−cN 2γ ) with some c > 0. Since v − u ≥ 1, and hence tv − tu ≥ c/N 2/3 , and furthermore |tu | ≤ log N , we can incorporate the error term into the right-hand side of (5.2). The integral in (5.51) can then be estimated using (4.17). Note that, by our assumptions on f , the number of y-terms = 0 is ≤ CN 2/3 and we get a compensating factor 1/N 2/3 from the parametrizations; see also (4.21). The numbers η, η are chosen so that η + η ≥ 2. Note that, since we assume τ − τ ≤ 1, the condition (4.18) is satisfied. We find 1 (τ 3 −τ 3 )/3+ξ(τ −τ )+(η3 +η 3 )/3−ξ(η+η ) |S0 | ≤ C(f, α) 2/3 e , (5.55) N y where the y-summation is over all y ∈ Z such that (y − cN )/dN 1/3 ∈ [K1 − 1, K2 + 1]. Consider now Si . Write z˜ = z exp(−m/dN 1/3 ), λ˜ = (λ1 + · · · + λm )/dN 1/3 . Then, α 2 1 i λ˜ −iξm (λ) ˜ ˜ (˜z − 1) e − 2i(˜z − 1) sin λ + 2(cos λ − 1) . )=− g(ze (1 − α)2 z˜ Thus, 1+g(ze−iξm (λ) ) = 1+
(˜z − 1)2 i λ˜ 2αt αt ˜ ˜ . (1 − cos λ)− −2i(˜ z − 1) sin λ e (1 − α)2 (1 − α)2 z˜
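The last term is small for large $N$ and the second is $\ge 1$. Hence,
\[ \frac{1}{|1 + tg(z(t)e^{-i\xi_m(\lambda)})|} \le 2 \tag{5.56} \]
for $|t| \le N^{\gamma}$ and $N$ sufficiently large. Consequently, there is a constant $c_1(\alpha)$ depending only on $\alpha$ such that
\[ |g(ze^{-i\xi_m(\lambda)})| \le c_1(\alpha)(\tilde\lambda^2 + |\tilde z - 1|^2) \tag{5.57} \]
and
\[ |\log(1 + tg(ze^{-i\xi_m(\lambda)}))| \le c_1(\alpha)(\tilde\lambda^2 + |\tilde z - 1|^2). \tag{5.58} \]
We can also write
\[ 1 + tg(ze^{-i\xi_m(\lambda)}) = \Big(1 + \frac{2\alpha t}{(1-\alpha)^2}(1-\cos\tilde\lambda)\Big)\Big(1 - \frac{\alpha t}{(1-\alpha)^2}\,\frac{(\tilde z-1)^2e^{i\tilde\lambda}/\tilde z - 2i(\tilde z-1)\sin\tilde\lambda}{1 + 2\alpha t(1-\alpha)^{-2}(1-\cos\tilde\lambda)}\Big). \tag{5.59} \]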
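The splitting of $g$ into a $(1-\cos\tilde\lambda)$ part and a remainder rests on an elementary identity: the bracket $(\tilde z-1)^2e^{i\tilde\lambda}/\tilde z - 2i(\tilde z-1)\sin\tilde\lambda + 2(\cos\tilde\lambda-1)$ equals $(\zeta-1)^2/\zeta$ with $\zeta = \tilde z e^{-i\tilde\lambda}$. The following is only a numerical sanity check of that identity; the function names are illustrative and not taken from the paper, and the sampling region for $\tilde z$ is an arbitrary choice near 1.

```python
import cmath
import random


def bracket(z_t, lam):
    """Bracket as it appears in the text:
    (z-1)^2 e^{i lam}/z - 2i (z-1) sin(lam) + 2 (cos(lam) - 1)."""
    return ((z_t - 1) ** 2 * cmath.exp(1j * lam) / z_t
            - 2j * (z_t - 1) * cmath.sin(lam)
            + 2 * (cmath.cos(lam) - 1))


def compact(z_t, lam):
    """Compact form (zeta - 1)^2 / zeta with zeta = z_t * exp(-i lam)."""
    zeta = z_t * cmath.exp(-1j * lam)
    return (zeta - 1) ** 2 / zeta


def max_discrepancy(samples=1000, seed=0):
    """Largest |bracket - compact| over random points with z_t near 1."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        z_t = complex(1 + rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))
        lam = rng.uniform(-cmath.pi, cmath.pi)
        worst = max(worst, abs(bracket(z_t, lam) - compact(z_t, lam)))
    return worst
```

At $\tilde z = 1$ the bracket reduces to $2(\cos\tilde\lambda - 1) \le 0$, which is the sign information used when the $(1-\cos\tilde\lambda)$ term is singled out.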
By periodicity it is enough to consider $|\tilde\lambda| \le \pi$. Estimating the cosine and sine functions we see that there are constants $c_2(\alpha)$ and $c_3(\alpha)$ such that
\[ |1 + tg(ze^{-i\xi_m(\lambda)})| \ge \exp\big(c_2(\alpha)\tilde\lambda^2 - c_3(\alpha)(|\tilde z-1|^2 + |\tilde z-1||\tilde\lambda|)\big) \]
and hence,
\[ |1 + tg(ze^{-i\xi_m(\lambda)})|^{t(u-v)} \le \exp\big(-c_2(\alpha)\tilde\lambda^2 + c_3(\alpha)(|\tilde z-1|^2 + |\tilde z-1||\tilde\lambda|)\big). \]
Estimating the quadratic polynomial in $\tilde\lambda$ we obtain
\[ |1 + tg(ze^{-i\xi_m(\lambda)})|^{t(u-v)} \le \exp\big(t(\tau'-\tau)c_4(\alpha)N^{2/3}|\tilde z-1|^2\big). \]
Now, $z(t)\exp(-m/dN^{1/3}) = 1 + (\tau + \eta - m - it)/dN^{1/3} + \dots$, and we obtain an estimate
\[ |1 + tg(z(t)e^{-i\xi_m(\lambda)})|^{t(u-v)} \le \exp\big(c_5(\alpha)[(\tau+\eta-m)^2 + t^2]\big). \tag{5.60} \]
A computation shows that
\[ \frac{1}{|1 - e^{i\xi_3(\lambda)}w/z|} \le c_6 \tag{5.61} \]
if $\tau - \tau' + \eta + \eta' > \epsilon > 0$. Since $\tau' - \tau \le 1$ and we take $\eta + \eta' \ge 2$ we see that we can take $\epsilon = 1/2$ for example. Furthermore, since $(y - cN)/dN^{1/3}$ is bounded for the $y$:s that contribute to the sum,
\[ |e^{i\xi_3(\lambda)(y-cN)}| \le c_7. \tag{5.62} \]
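We can now again estimate as in Sect. 4 and use (4.17). This results in an estimate
\[ |S_i| \le c_8(t,\alpha)\Big(\int_{\mathbb{R}}(1+\lambda^2)|\hat f(\lambda)|\,d\lambda\Big)^3 e^{c_5(\alpha)(\tau+\eta-m)^2} \times \frac{1}{N^{1/3}}\sum_y e^{(\tau^3-\tau'^3)/3+\xi(\tau'-\tau)+(\eta^3+\eta'^3)/3-\xi(\eta+\eta')} \times \int_{\mathbb{R}} e^{(c_5(\alpha)-\eta/2)t^2}\,dt \int_{\mathbb{R}} e^{(c_5(\alpha)-\eta'/2)s^2}\,ds. \tag{5.63} \]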
We pick $\eta, \eta' \ge 3c_5(\alpha)$. Recall that $\xi = (y - cN)/dN^{1/3} + \tau^2$. Let $\eta = \max(|\tau|, 3c_5(\alpha), 1)$ and $\eta' = \max(|\tau'|, 3c_5(\alpha), 1)$. It follows from (5.55) and (5.63) that
\[ |S_i| \le c_9(f,\alpha), \tag{5.64} \]
$i = 1, 2$ if $|\tau|, |\tau'|$ are small. If $|\tau|$ and $|\tau'|$ are large, say $\tau, \tau' \gg 1$, then $\eta = \tau$ and $\eta' = \tau'$, $0 \le \tau' - \tau \le 1$, and we get from (5.55) and (5.63) that
\[ |S_i| \le c_{10}(f,\alpha)e^{-\tau^3}. \tag{5.65} \]
Inserting these estimates into (5.50) and using $v - u \ge 1$, we obtain an estimate of (5.40) of the type we have in the right-hand side of (5.2). Consider the expression
\[ \sum_{x_2,x_4\in\mathbb{Z}} \phi(x_2,x_4)f_N(x_2)f_N(x_4)\,\tilde K^{ab}_{cd}\begin{pmatrix} x_4 & x_3 \\ x_2 & x_1 \end{pmatrix}. \tag{5.66} \]
In our computations with the kernel $\tilde K$ given by (5.43) we will leave out the complex integrations. Thus
\[ \tilde K^{ab}_{cd}\begin{pmatrix} x_4 & x_3 \\ x_2 & x_1 \end{pmatrix} = \begin{vmatrix} G^*_{ac}(z_1,w_1)\dfrac{w_1^{x_2}}{z_1^{x_4}} & G^*_{ad}(z_1,w_1)\dfrac{w_1^{x_1}}{z_1^{x_4}} \\[2mm] G^*_{bc}(z_2,w_2)\dfrac{w_2^{x_2}}{z_2^{x_3}} & G^*_{bd}(z_2,w_2)\dfrac{w_2^{x_1}}{z_2^{x_3}} \end{vmatrix} = G^*_{ac}(z_1,w_1)G^*_{bd}(z_2,w_2)\frac{w_1^{x_2}w_2^{x_1}}{z_1^{x_4}z_2^{x_3}} - G^*_{ad}(z_1,w_1)G^*_{bc}(z_2,w_2)\frac{w_1^{x_1}w_2^{x_2}}{z_1^{x_4}z_2^{x_3}}. \]
We are led to the symmetrized expression
\[ \frac{1}{2}\sum_{x_2,x_4\in\mathbb{Z}} \phi(x_2,x_4)f_N(x_2)f_N(x_4)\Big(\frac{w_1^{x_2}}{z_1^{x_4}} + \frac{w_1^{x_4}}{z_1^{x_2}}\Big). \tag{5.67} \]
Perform the $x_4$-summation first and use the formula (5.48). The parts containing $R_1$ and $R_2$ can be estimated in the same way as above. We will only discuss the contraction and finite-difference parts. The contraction part of (5.67) is
\[ \frac{1}{2}\sum_{x_2\in\mathbb{Z}} f_N(x_2)^2\frac{w_1^{x_2}}{z_1^{x_2}}\big[(1+g(z_1))^{u-v} + (1+g(w_1))^{u-v}\big], \tag{5.68} \]
and hence the contraction part of (5.66) is
\[ \frac{1}{2}\sum_{x_2\in\mathbb{Z}} f_N(x_2)^2\Big\{ G^*_{a+u-v,c}(z_1,w_1)G^*_{bd}(z_2,w_2)\frac{w_1^{x_2}w_2^{x_1}}{z_1^{x_2}z_2^{x_3}} + G^*_{a,c+v-u}(z_1,w_1)G^*_{bd}(z_2,w_2)\frac{w_1^{x_2}w_2^{x_1}}{z_1^{x_2}z_2^{x_3}} - G^*_{a+u-v,d}(z_1,w_1)G^*_{bc}(z_2,w_2)\frac{w_1^{x_1}w_2^{x_2}}{z_1^{x_2}z_2^{x_3}} - G^*_{ad}(z_1,w_1)G^*_{b,c+v-u}(z_2,w_2)\frac{w_1^{x_1}w_2^{x_2}}{z_1^{x_2}z_2^{x_3}} \Big\}. \]
Here we have also used (5.45). Performing the complex integrations we obtain
\[ \frac{1}{2}\sum_{x_2\in\mathbb{Z}} f_N(x_2)^2\big\{ \tilde K_{a+u-v,c}(x_2,x_2)\tilde K_{bd}(x_3,x_1) + \tilde K_{a,c+v-u}(x_2,x_2)\tilde K_{bd}(x_3,x_1) - \tilde K_{a+u-v,d}(x_2,x_1)\tilde K_{bc}(x_3,x_2) - \tilde K_{a,d}(x_2,x_1)\tilde K_{b,c+v-u}(x_3,x_2) \big\}. \tag{5.69} \]
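The finite difference part of (5.66) is
\[ \frac{\alpha}{2(1-\alpha)^2}\sum_{x_2\in\mathbb{Z}} f_N(x_2)^2\Big\{ G^*_{ac}(z_1,w_1)G^*_{bd}(z_2,w_2)\frac{w_1^{x_2}w_2^{x_1}}{z_1^{x_2}z_2^{x_3}}\Big(\frac{1}{z_1}+w_1\Big) - G^*_{ad}(z_1,w_1)G^*_{bc}(z_2,w_2)\frac{w_2^{x_2}w_1^{x_1}}{z_1^{x_2}z_2^{x_3}}\Big(\frac{1}{z_1}+w_2\Big) \Big\}. \tag{5.70} \]
The double $\phi$-term in (5.31), b in (5.34) and (5.37) combined give
\[ 12\sum_x \phi(x_2,x_4)\phi(x_1,x_3)\,\tilde K^{vv}_{uu}\begin{pmatrix} x_4 & x_3 \\ x_2 & x_1 \end{pmatrix} f_N(x_1)f_N(x_2)f_N(x_3)f_N(x_4) - 12\sum_x \phi(x_1,x_3)\Big[\tilde K^{uv}_{uu}\begin{pmatrix} x_2 & x_3 \\ x_2 & x_1 \end{pmatrix} + \tilde K^{vv}_{vu}\begin{pmatrix} x_2 & x_3 \\ x_2 & x_1 \end{pmatrix}\Big] f_N(x_1)f_N(x_2)^2f_N(x_3) + 3\sum_x \Big[\tilde K^{uu}_{uu}\begin{pmatrix} x_1 & x_2 \\ x_1 & x_2 \end{pmatrix} + 2\tilde K^{uv}_{uv}\begin{pmatrix} x_1 & x_2 \\ x_1 & x_2 \end{pmatrix} + \tilde K^{vv}_{vv}\begin{pmatrix} x_1 & x_2 \\ x_1 & x_2 \end{pmatrix}\Big] f_N(x_1)^2f_N(x_2)^2 \doteq A_1 + A_2 + A_3. \tag{5.71} \]
Consider the $x_4$-summation in $A_1$. The contraction part is, by (5.69),
\[ 6\sum_x f_N(x_1)f_N(x_2)^2f_N(x_3)\phi(x_1,x_3)\Big[\tilde K^{uv}_{uu}\begin{pmatrix} x_2 & x_3 \\ x_2 & x_1 \end{pmatrix} + \tilde K^{vv}_{vu}\begin{pmatrix} x_2 & x_3 \\ x_2 & x_1 \end{pmatrix}\Big], \]
which is exactly $-\frac{1}{2}A_2$. Hence what remains of $A_2$ is
\[ -6\sum_x \phi(x_1,x_3)\Big[\tilde K^{uv}_{uu}\begin{pmatrix} x_2 & x_3 \\ x_2 & x_1 \end{pmatrix} + \tilde K^{vv}_{vu}\begin{pmatrix} x_2 & x_3 \\ x_2 & x_1 \end{pmatrix}\Big] f_N(x_1)f_N(x_2)^2f_N(x_3). \tag{5.72} \]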
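We have
\[ \sum_{x_1,x_3\in\mathbb{Z}} f_N(x_1)f_N(x_3)\,\tilde K^{uv}_{uu}\begin{pmatrix} x_2 & x_3 \\ x_2 & x_1 \end{pmatrix} = \sum_{x_2,x_4\in\mathbb{Z}} f_N(x_2)f_N(x_4)\,\tilde K^{vu}_{uu}\begin{pmatrix} x_4 & x_1 \\ x_2 & x_1 \end{pmatrix}. \]
We can now apply (5.69) to compute the contraction part of the first half of (5.72) and get
\[ -6\sum_{x_1,x_2\in\mathbb{Z}} \Big[\tilde K^{uu}_{uu}\begin{pmatrix} x_1 & x_2 \\ x_1 & x_2 \end{pmatrix} + \tilde K^{uv}_{uv}\begin{pmatrix} x_1 & x_2 \\ x_1 & x_2 \end{pmatrix}\Big] f_N(x_1)^2f_N(x_2)^2. \tag{5.73} \]
Similarly the second half of (5.72) has the contraction part
\[ -6\sum_{x_1,x_2\in\mathbb{Z}} \Big[\tilde K^{uv}_{uv}\begin{pmatrix} x_1 & x_2 \\ x_1 & x_2 \end{pmatrix} + \tilde K^{vv}_{vv}\begin{pmatrix} x_1 & x_2 \\ x_1 & x_2 \end{pmatrix}\Big] f_N(x_1)^2f_N(x_2)^2. \tag{5.74} \]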
Since the contraction part of (5.72) equals (5.73) plus (5.74) we see that this exactly cancels $A_3$. It remains to consider the finite difference parts. From $A_1$ we get a finite difference part
\[ \frac{6\alpha}{(1-\alpha)^2}\sum_x \Big[ G^*_{vu}(z_1,w_1)G^*_{vu}(z_2,w_2)\frac{w_1^{x_2}w_2^{x_1}}{z_1^{x_2}z_2^{x_3}}\Big(\frac{1}{z_1}+w_1\Big) - G^*_{vu}(z_1,w_1)G^*_{vu}(z_2,w_2)\frac{w_2^{x_2}w_1^{x_1}}{z_1^{x_2}z_2^{x_3}}\Big(\frac{1}{z_1}+w_2\Big) \Big] f_N(x_2)^2\phi(x_1,x_3)f_N(x_1)f_N(x_3). \tag{5.75} \]
We also need the finite difference part of (5.72). These finite difference parts should be cancelled by the contraction part of (5.75). The finite difference part of (5.72) is
\[ -\frac{3\alpha}{(1-\alpha)^2}\sum_x \Big[ G^*_{vu}(z_1,w_1)G^*_{uu}(z_2,w_2)\frac{w_1^{x_2}w_2^{x_1}}{z_1^{x_2}z_2^{x_1}}\Big(\frac{1}{z_1}+w_1\Big) - G^*_{vu}(z_1,w_1)G^*_{uu}(z_2,w_2)\frac{w_2^{x_2}w_1^{x_1}}{z_1^{x_2}z_2^{x_1}}\Big(\frac{1}{z_1}+w_2\Big) \Big] f_N(x_2)^2\phi(x_1,x_3)f_N(x_1)^2 \]
\[ -\frac{3\alpha}{(1-\alpha)^2}\sum_x \Big[ G^*_{vu}(z_1,w_1)G^*_{vv}(z_2,w_2)\frac{w_1^{x_2}w_2^{x_1}}{z_1^{x_2}z_2^{x_1}}\Big(\frac{1}{z_1}+w_1\Big) - G^*_{vv}(z_1,w_1)G^*_{vu}(z_2,w_2)\frac{w_2^{x_2}w_1^{x_1}}{z_1^{x_2}z_2^{x_1}}\Big(\frac{1}{z_1}+w_2\Big) \Big] f_N(x_2)^2\phi(x_1,x_3)f_N(x_1)^2. \tag{5.76} \]
In (5.75) we have first
\[ \sum_x \frac{w_1^{x_2}w_2^{x_1}}{z_1^{x_2}z_2^{x_3}}\phi(x_1,x_3)f_N(x_1)f_N(x_3)f_N(x_2)^2 = \sum_{x_1} f_N(x_1)^2\frac{w_1^{x_1}}{z_1^{x_1}}\cdot\frac{1}{2}\sum_{x_2,x_4}\Big(\frac{w_2^{x_2}}{z_2^{x_4}} + \frac{w_2^{x_4}}{z_2^{x_2}}\Big)\phi(x_2,x_4)f_N(x_2)f_N(x_4). \]
This gives the contraction part
\[ \frac{1}{2}\sum_{x_1,x_2} \frac{w_1^{x_1}w_2^{x_2}}{z_1^{x_1}z_2^{x_2}} f_N(x_2)^2f_N(x_1)^2\big[(1+g(z_2))^{u-v} + (1+g(w_2))^{u-v}\big]. \tag{5.77} \]
From the other half of (5.75) we get similarly the contraction part
\[ \frac{1}{2}\sum_{x_1,x_2} \frac{w_2^{x_1}w_1^{x_2}}{z_1^{x_1}z_2^{x_2}} f_N(x_2)^2f_N(x_1)^2\big[(1+g(z_2))^{u-v} + (1+g(w_1))^{u-v}\big]. \tag{5.78} \]
By (5.77) and (5.78) the contraction part of (5.75) is
\[ \frac{3\alpha}{(1-\alpha)^2}\sum_x f_N(x_1)^2f_N(x_2)^2\frac{w_1^{x_1}w_2^{x_2}}{z_1^{x_1}z_2^{x_2}}\Big(\frac{1}{z_1}+w_1\Big) \times \big[G^*_{vu}(z_1,w_1)G^*_{uu}(z_2,w_2) + G^*_{vu}(z_1,w_1)G^*_{vv}(z_2,w_2)\big] \]
\[ -\frac{3\alpha}{(1-\alpha)^2}\sum_x f_N(x_1)^2f_N(x_2)^2\frac{w_2^{x_1}w_1^{x_2}}{z_1^{x_1}z_2^{x_2}}\Big(\frac{1}{z_1}+w_2\Big) \times \big[G^*_{vu}(z_1,w_1)G^*_{uu}(z_2,w_2) + G^*_{vv}(z_1,w_1)G^*_{vu}(z_2,w_2)\big], \]
which exactly cancels (5.76). The finite difference part of (5.75) is handled in the same way as (5.51).

We will end with some brief comments about the remaining estimates. By (5.43) we have for example
\[ K_{uv}(x,y) = \frac{1}{(2\pi i)^2}\int_{\gamma_{r_2}} dz \int_{\gamma_{r_1}} dw \Big(\frac{1-\alpha/z}{1-\alpha z}\Big)^N \Big(\frac{1-\alpha w}{1-\alpha/w}\Big)^N \frac{w^y}{z^x} \times \frac{1}{w(z-w)}\big[(1+g(w))^{v-u} - 1\big](1+g(z))^u(1+g(w))^u. \tag{5.79} \]
Here we can expand $(1+g(w))^{v-u} - 1$ as in (5.47) and then estimate in the same way as we did for the $R_1$- and $R_2$-terms above. In this way we will see that the $K^2$-terms give contributions of the right type. We get a similar integral expression for $K_{uv} + K_{vu} - K_{vv}$ involving $(1+g(z))^v(1+g(w))^{-u}[(1+g(z))^{u-v} - 1][(1+g(w))^{v-u} - 1]$, and we proceed similarly.
Acknowledgement. I thank Peter Forrester for drawing my attention a few years ago to the relation between the exponents occurring in [13] and [17].
References
1. Adler, M., van Moerbeke, P.: The spectrum of coupled random matrices. Ann. Math. 149, 921–976 (1999)
2. Baik, J., Deift, P.A., Johansson, K.: On the distribution of the length of the longest increasing subsequence in a random permutation. J. Am. Math. Soc. 12, 1119–1178 (1999)
3. Baik, J., Deift, P.A., McLaughlin, K., Miller, P., Zhou, X.: Optimal tail estimates for directed last passage site percolation with geometric random variables. Adv. Theor. Math. Phys. 5, 1207–1250 (2001)
4. Baik, J., Rains, E.: Symmetrized random permutations. In: Random Matrix Models and Their Applications, P.M. Bleher and A.R. Its, (eds.), MSRI Publications 40, Cambridge: Cambridge Univ. Press, 2001
5. Baryshnikov, Yu.: GUEs and queues. Probab. Theory Relat. Fields 119, 256–274 (2001)
6. Billingsley, P.: Convergence of Probability Measures. New York: John Wiley & Sons, 1968
7. Borodin, A.: Biorthogonal ensembles. Nucl. Phys. B 536, 704–732 (1999)
8. Böttcher, A., Silbermann, B.: Introduction to Large Truncated Toeplitz Matrices. Berlin-Heidelberg-New York: Springer, 1999
9. Dyson, F.J.: A Brownian-Motion Model for the Eigenvalues of a Random Matrix. J. Math. Phys. 3, 1191–1198 (1962)
10. Eynard, B., Mehta, M.L.: Matrices coupled in a chain I: Eigenvalue correlations. J. Phys. A 31, 4449–4456 (1998)
11. Fisher, M.E., Stephenson, J.: Statistical Mechanics of Dimers on a Plane Lattice II: Dimer Correlations and Monomers. Phys. Rev. 132, 1411–1431 (1963)
12. Forrester, P.J.: Exact solution of the lock step model of vicious walkers. J. Phys. A: Math. Gen. 23, 1259–1273 (1990)
13. Forrester, P.J., Nagao, T., Honner, G.: Correlations for the orthogonal-unitary and symplectic-unitary transitions at the soft and hard edges. Nucl. Phys. B 553, 601–643 (1999)
14. Fulton, W.: Young Tableaux. London Mathematical Society, Student Texts 35, Cambridge: Cambridge Univ. Press, 1997
15. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge: Cambridge University Press, 1985
16. Johansson, K.: Shape fluctuations and random matrices. Commun. Math. Phys. 209, 437–476 (2000)
17. Johansson, K.: Transversal fluctuations for increasing subsequences on the plane. Probab. Theory Relat. Fields 116, 445–456 (2000)
18. Johansson, K.: Discrete orthogonal polynomial ensembles and the Plancherel measure. Ann. Math. 153, 259–296 (2001)
19. Johansson, K.: Random growth and Random matrices. In: European Congress of Mathematics, Barcelona, Vol. I, Basel-Boston: Birkhäuser, 2001
20. Johansson, K.: Universality of the local spacing distribution in certain ensembles of hermitian Wigner matrices. Commun. Math. Phys. 215, 683–705 (2001)
21. Johansson, K.: Non-intersecting paths, random tilings and random matrices. Probab. Theory Relat. Fields 123, 225–280 (2002)
22. Johansson, K.: The arctic circle boundary and the Airy process. math.PR/0306216
23. Kenyon, R.: Local statistics of lattice dimers. Ann. Inst. H. Poincaré, Probabilités et Statistiques, 33, 591–618 (1997)
24. Krug, J., Spohn, H.: Kinetic Roughening of Growing Interfaces. In: Solids far from Equilibrium: Growth, Morphology and Defects, C. Godrèche, (ed.), Cambridge: Cambridge University Press, 1992, pp. 479–582
25. König, W., O'Connell, N., Roch, S.: Non-colliding random walks, tandem queues and discrete orthogonal polynomial ensembles. Electron. J. Probab. 7(5), (2002)
26. Macêdo, A.M.S.: Europhys. Lett. 26, 641 (1994)
27. Mehta, M.L.: Random Matrices. 2nd ed., San Diego: Academic Press, 1991
28. Nagle, J.F., Yokoi, C.S.O., Bhattacharjee, S.M.: Dimer models on anisotropic lattices. In: Phase Transitions and Critical Phenomena, Vol. 13, C. Domb, J. L. Lebowitz, (eds.), London-New York: Academic Press, 1989
29. Okounkov, A.: Infinite wedge and random partitions. Selecta Math. (N.S.) 7, 57–81 (2001)
30. Okounkov, A., Reshetikhin, N.: Correlation function of Schur process with application to local geometry of a random 3-dimensional Young diagram. math.CO/0107056
31. Prähofer, M., Spohn, H.: Scale invariance of the PNG droplet and the Airy process. J. Stat. Phys. 108, 1076–1106 (2002)
32. Sagan, B.: The Symmetric Group. Monterey, CA: Brooks/Cole Publ. Comp. 1991
33. Simon, B.: Trace ideals and their applications. LMS Lecture Notes Series 35, Cambridge: Cambridge University Press, 1979
34. Soshnikov, A.: Determinantal random point fields. Russ. Math. Surv. 55, 923–975 (2000)
35. Stanley, R.P.: Enumerative Combinatorics. Vol. 2, Cambridge: Cambridge University Press, 1999
36. Tracy, C.A., Widom, H.: Level Spacing Distributions and the Airy Kernel. Commun. Math. Phys. 159, 151–174 (1994)
37. Tracy, C.A., Widom, H.: Correlation Functions, Cluster Functions, and Spacing Distributions for Random Matrices. J. Stat. Phys. 92, 809–835 (1998)
38. Viennot, G.: Une forme géométrique de la correspondance de Robinson-Schensted. Lecture Notes in Math. 579, Berlin: Springer, 1977, pp. 29–58
39. Widom, H.: On Convergence of Moments for Random Young Tableaux and a Random Growth Model. Int. Math. Res. Not. 9, 455–464 (2002)
40. Yokoi, C.S.O., Nagle, J.F., Salinas, S.R.: Dimer Pair Correlations on the Brick Lattice. J. Stat. Phys. 44, 729–747 (1986)

Communicated by H. Spohn
Commun. Math. Phys. 242, 331–360 (2003)
Digital Object Identifier (DOI) 10.1007/s00220-003-0933-2
Communications in Mathematical Physics

Jack Polynomials in Superspace

Patrick Desrosiers$^1$, Luc Lapointe$^2$, Pierre Mathieu$^1$

$^1$ Département de Physique, de Génie Physique et d'Optique, Université Laval, Québec, Canada, G1K 7P4. E-mail: [email protected]; [email protected]
$^2$ Instituto de Matemática y Física, Universidad de Talca, Casilla 747, Talca, Chile. E-mail: [email protected]

Received: 9 September 2002 / Accepted: 20 June 2003
Published online: 26 September 2003 – © Springer-Verlag 2003
Abstract: This work initiates the study of orthogonal symmetric polynomials in superspace. Here we present two approaches leading to a family of orthogonal polynomials in superspace that generalize the Jack polynomials. The first approach relies on previous work by the authors in which eigenfunctions of the supersymmetric extension of the trigonometric Calogero-Moser-Sutherland Hamiltonian were constructed. Orthogonal eigenfunctions are now obtained by diagonalizing the first nontrivial element of a bosonic tower of commuting conserved charges not containing this Hamiltonian. Quite remarkably, the expansion coefficients of these orthogonal eigenfunctions in the supermonomial basis are stable with respect to the number of variables. The second and more direct approach amounts to symmetrizing products of non-symmetric Jack polynomials with monomials in the fermionic variables. This time, the orthogonality is inherited from the orthogonality of the non-symmetric Jack polynomials, and the value of the norm is given explicitly.

1. Introduction

A natural direction in which the theory of orthogonal symmetric polynomials can be generalized is to consider its extension to superspace. One possible approach for such an extension is to consider polynomials involving fermionic (i.e., Grassmannian) variables, or superpolynomials, that arise from physically relevant eigenvalue problems invariant under supersymmetry. In many respects (physical and mathematical), one of the most fundamental bases of symmetric orthogonal polynomials is that of the Jack polynomials. This work is concerned with their orthogonality-preserving extension to superspace. A basic requirement of Jack superpolynomials is that they reduce to Jack polynomials when the fermionic variables are set to zero. Another requirement is that they be solutions of the supersymmetric generalization of the eigenvalue problem characterizing the Jack polynomials.
More precisely, Jack polynomials are eigenfunctions of the trigonometric Calogero-Moser-Sutherland (tCMS) model (see e.g., [1] and [2, 3] for properties of the Jack polynomials). Jack superpolynomials must thus be eigenfunctions of the supersymmetric extension of the trigonometric Calogero-Moser-Sutherland (stCMS) model [4].$^1$ This eigenfunction characterization, as in the non-fermionic case, does not uniquely define Jack superpolynomials. A triangular decomposition, in terms of a superspace extension of the symmetric monomial functions (or supermonomials), must be imposed. Unique eigenfunctions, $J_\Lambda$, defined according to such a triangular decomposition, were constructed in [4, 5], and called Jack superpolynomials. They are indexed by superpartitions $\Lambda = (\Lambda^a; \Lambda^s)$, composed of a partition $\Lambda^a$ with distinct parts, and a usual partition $\Lambda^s$. The number of entries in $\Lambda^a$ characterizes the fermion sector (i.e., the number of anticommuting variables appearing in every term of the expansion of $J_\Lambda$).

$^1$ An extensive list of references on the CMS model and its supersymmetric extension can be found in [4].

The integrability of the tCMS model also makes Jack polynomials eigenfunctions of a family of $N$ independent commuting quantities, where $N$ is the number of variables. We prove in this article that, similarly, Jack superpolynomials are eigenfunctions of a whole tower of commuting conserved charges, denoted $\mathcal{H}_n$, for $n = 1, \dots, N$, where $\mathcal{H}_2$ is the Hamiltonian of the stCMS model. The proof relies heavily on the remarkable fact that, if we consider the restriction to the space of superpolynomials symmetric under the simultaneous interchange of any pair of bosonic and fermionic variables, these charges can be expressed using Dunkl operators as $H_n = \sum_{i=1}^N (D_i)^n$. That is, under this restriction, $\mathcal{H}_n$ is equivalent to $H_n$.

Now, even though Jack superpolynomials are eigenfunctions of $N$ commuting conserved charges, degeneracies are still present. Indeed, two distinct superpolynomials labeled by two different superpartitions built out of the same set of $N$ integers (but distributed differently among the two partitions $\Lambda^a$ and $\Lambda^s$) have identical $\mathcal{H}_n$ eigenvalues. As a result, the Jack superpolynomials $J_\Lambda$ of [4, 5] are not orthogonal under the scalar product (19) with respect to which $\mathcal{H}_n$ is self-adjoint. The Gram-Schmidt orthogonalization procedure can of course be used to construct orthogonal superpolynomials, but a general pattern is not likely to appear using this construction. The question is thus whether we can naturally define a family of orthogonal superpolynomials.

The answer to this question lies in the following observation. By extending the usual tCMS model with $N$ bosonic degrees of freedom to the supersymmetric case, we have introduced $N$ new degrees of freedom. Integrability leads in this case to the appearance of new conserved charges. Indeed, there are $3N$ new conserved charges, new in the sense that they all disappear when the fermionic variables vanish [4]. Among these new charges, $2N$ of them are fermionic, that is, they change the fermion number of the function on which they act. However, the remaining $N$ charges are bosonic, mutually commute, and do not affect the fermion number. These charges, denoted $\mathcal{I}_n$, $n = 1, \dots, N$, are thus natural candidates for extra operators that may lift the degeneracy of the Jack superpolynomials, and thereby produce orthogonal combinations of these superpolynomials.

This expectation indeed materializes. Actually, to construct orthogonal superpolynomials it suffices to consider the action of the charge $\mathcal{I}_1$, or equivalently, its Dunkl-operator version $I_1$ in the space of symmetric superpolynomials. Its action is also triangular, but with respect to an ordering on superpartitions stronger than the one introduced in [5]. Knowing the action of $I_1$ explicitly on Jack superpolynomials allows one to define orthogonal fermionic extensions, $\mathcal{J}_\Lambda$, of the usual Jack polynomials. Moreover, it also leads to determinantal formulas for the expansion coefficients of the orthogonal Jack superpolynomials $\mathcal{J}_\Lambda$ in terms of Jack superpolynomials $J_\Lambda$.

The program that we just sketched is the subject of the first part of this paper (up to Sect. 8). It is in line with our previous work [4, 5], and can be viewed as its natural completion. We stress that it is also very explicit in that the precise relation between the orthogonal Jack superpolynomials $\mathcal{J}_\Lambda$ and the old $J_\Lambda$ is provided, and that closed form expressions for the latter, in the supermonomial basis, were already obtained in [5]. The construction is also “physical”: the quantum many-body problem and its underlying integrability structure is the guiding tool used to identify a complete set of simultaneously diagonalizable operators. This leads us to our first characterization of the orthogonal Jack superpolynomials:

Theorem 1 (See Theorems 22 and 31). The orthogonal Jack superpolynomial $\mathcal{J}_\Lambda$ is the unique function satisfying:
\[ \mathcal{H}_2\,\mathcal{J}_\Lambda = \varepsilon_\Lambda\,\mathcal{J}_\Lambda, \qquad \mathcal{I}_1\,\mathcal{J}_\Lambda = \epsilon_\Lambda\,\mathcal{J}_\Lambda, \qquad \text{and} \qquad \mathcal{J}_\Lambda = m_\Lambda + \sum_{\Omega<\Lambda} c_{\Lambda\Omega}(\beta)\, m_\Omega, \tag{1} \]
where $\varepsilon_\Lambda$ and $\epsilon_\Lambda$ are defined in Lemma 20 and Theorem 28 respectively, while the ordering on superpartitions is introduced in Definition 7.

In the second part of the paper (which is essentially Sect. 9) we propose a much more direct, although less explicit, construction of the orthogonal Jack superpolynomials. The starting point is no longer the extension of the usual Jack polynomial eigenvalue problem, but rather a symmetrization process performed on the non-symmetric Jack polynomials [6, 7], suitably dressed with products of fermionic variables. Recall that the non-symmetric Jack polynomials $E_\lambda$ (where $\lambda$ is now a composition) are eigenfunctions of the $D_i$ operators [6], and, from the self-adjointness of these operators, orthogonal. On the other hand, as already pointed out, the stCMS commuting conserved charges can all be expressed in terms of the $D_i$'s. This naturally suggests a very direct path for the construction of the common eigenfunctions of all the commuting stCMS charges: start with $E_\lambda$, add a fermionic-monomial prefactor and symmetrize with respect to both types of variables. Quite remarkably, this indeed produces the orthogonal Jack superpolynomials $\mathcal{J}_\Lambda$ (with $\Lambda$ determined by $\lambda$ and the fermionic number). The advantage of this construction is that orthogonality is built in, and preserved by the symmetrization. This second construction (especially the argument in the proof of Theorem 41) leads to another characterization of the orthogonal Jack superpolynomials:

Theorem 2 (See Theorems 35 and 41). The orthogonal Jack superpolynomials $\mathcal{J}_\Lambda$ are the unique functions satisfying:
\[ \langle \mathcal{J}_\Lambda, \mathcal{J}_\Omega \rangle_\beta \propto \delta_{\Lambda,\Omega}, \qquad \text{and} \qquad \mathcal{J}_\Lambda = m_\Lambda + \sum_{\Omega<\Lambda} c_{\Lambda\Omega}(\beta)\, m_\Omega. \tag{2} \]
Our two characterizations of the orthogonal Jack superpolynomials thus extend the two most common definitions of Jack polynomials.

An important clarification is in order concerning the second construction. Because of the anticommuting nature of the fermionic variables, a symmetrized superpolynomial that contains $m$ fermionic variables is necessarily an antisymmetric function of the $m$ corresponding bosonic variables, in addition to being symmetric in the remaining bosonic variables. In other words, viewed solely as functions of the bosonic variables, the symmetric superpolynomials can be decomposed as a sum over polynomials with mixed symmetry properties such as those studied in [8, 9], but with coefficients involving fermionic variables. This, however, does not mean that the theory of symmetric superpolynomials is only a special case of the theory of polynomials with mixed symmetry. Indeed, by symmetrizing over all the variables, including the fermionic ones, the antisymmetrized bosonic variables will be dependent on the particular term under consideration, ensuring that the net result is a brand new object.

Let us illustrate this comment by comparing a four-variable monomial function with mixed symmetry with its supermonomial counterpart. Take the four variables to be $x_1, x_2$, and $y_1, y_2$, and let the $x_i$ variables be symmetrized while the $y_i$ are antisymmetrized. Let also the $x$ and $y$ parts of the monomial be parametrized by the partitions $(2, 1)$ and $(3, 1)$ respectively. The associated monomial with mixed symmetry reads:
\[ m_{(3,1)_a,(2,1)_s} = (y_1^3y_2 - y_1y_2^3)(x_1^2x_2 + x_1x_2^2). \tag{3} \]
The corresponding supermonomial has four bosonic and four fermionic variables denoted $z_i$ and $\theta_i$ respectively, with $i = 1, \dots, 4$ and $\theta_i\theta_j = -\theta_j\theta_i$. Given the superpartition $(3, 1; 2, 1)$, it reads as
\[ m_{(3,1;2,1)} = \theta_1\theta_2(z_1^3z_2 - z_1z_2^3)(z_3^2z_4 + z_3z_4^2) + \theta_1\theta_3(z_1^3z_3 - z_1z_3^3)(z_2^2z_4 + z_2z_4^2) + \theta_1\theta_4(z_1^3z_4 - z_1z_4^3)(z_2^2z_3 + z_2z_3^2) + \theta_2\theta_3(z_2^3z_3 - z_2z_3^3)(z_1^2z_4 + z_1z_4^2) + \theta_2\theta_4(z_2^3z_4 - z_2z_4^3)(z_1^2z_3 + z_1z_3^2) + \theta_3\theta_4(z_3^3z_4 - z_3z_4^3)(z_1^2z_2 + z_1z_2^2). \tag{4} \]
Clearly, each bosonic component of this expression corresponds to a monomial with mixed symmetry of type (3). The main point is that the supermonomial is the sum of all these mixed symmetry monomials, each with its appropriate fermionic-monomial prefactor. These prefactors have drastic effects, say, when multiplying supermonomials together, due to the fermionic nature of their constituents. The multiplication of polynomials reveals in a rather critical way one aspect of the difference between the polynomials with mixed symmetry and superpolynomials: the product of two polynomials of the former type cannot be decomposed into a linear combination of polynomials with mixed symmetry, that is, there is no ring structure. This is not so for the superpolynomials.

It should thus be crystal clear that Jack superpolynomials are not simply Jack polynomials with mixed symmetry properties in disguised form. However, pinpointing the relationship between these two types of objects is technically important since it allows one to use the results of [8] on the norm of the Jack polynomials with mixed symmetry to obtain, in a rather direct way, the norm of the orthogonal Jack superpolynomials.

The presentation of the two approaches is preceded by four sections in which we introduce our notation, define our basic superobjects and derive relevant properties. Sections 6 and 7 deal respectively with the construction of the $\mathcal{H}_n$ and $\mathcal{I}_n$ eigenfunctions. The common eigenfunctions are shown to be orthogonal in Sect. 8.
The alternative construction based on the non-symmetric Jack polynomials is the subject of Sect. 9. In Appendix A, we present a number of examples of Jack superpolynomials, including a detailed computation based on the determinantal formula. These examples illustrate a nice property of the orthogonal Jack superpolynomials: they do not depend upon N (when N is sufficiently large). In other words, their expansion coefficients in the supermonomial basis are independent of the number of variables. This property can of course be obtained from the explicit formulas, but it is not at once manifest. Finally, various natural extensions of this work are mentioned in the conclusion.
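The supermonomial (4) can be reproduced mechanically from the symmetrization that defines it. The sketch below is an illustration only: it assumes the definition $m_\Lambda = (1/f_\Lambda)\sum_{\sigma} \mathcal{K}_\sigma\,\theta_1\cdots\theta_m z^\Lambda$ given in Sect. 2, stores a term $\theta_I z^\lambda$ as a key `(I, exponents)`, and uses hypothetical helper names (`sort_with_sign`, `supermonomial`) that do not come from the paper.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def sort_with_sign(indices):
    """Sort anticommuting indices increasingly; each adjacent swap flips the sign."""
    idx = list(indices)
    sign = 1
    for i in range(len(idx)):
        for j in range(len(idx) - 1 - i):
            if idx[j] > idx[j + 1]:
                idx[j], idx[j + 1] = idx[j + 1], idx[j]
                sign = -sign
    return sign, tuple(idx)


def supermonomial(anti, sym):
    """Return m_Lambda for Lambda = (anti; sym) as {(theta_indices, z_exponents): coeff}.

    theta indices are 1-based; N = len(anti) + len(sym).
    """
    m = len(anti)
    parts = tuple(anti) + tuple(sym)
    n = len(parts)
    terms = Counter()
    for sigma in permutations(range(n)):
        # The permutation acts simultaneously on z_i and theta_i.
        exps = [0] * n
        for i, p in enumerate(parts):
            exps[sigma[i]] = p
        sign, thetas = sort_with_sign([sigma[i] + 1 for i in range(m)])
        terms[(thetas, tuple(exps))] += sign
    # Normalization f_Lambda: product of multiplicity factorials of the symmetric part.
    f = 1
    for mult in Counter(sym).values():
        f *= factorial(mult)
    return {key: c // f for key, c in terms.items() if c != 0}


m_3121 = supermonomial((3, 1), (2, 1))
```

For $\Lambda = (3,1;2,1)$ the dictionary has 24 entries, matching the six $\theta_i\theta_j$ prefactors in (4), each multiplying four bosonic monomials with coefficients $\pm 1$.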
Note that for the readers not particularly interested in the “physical” construction relying on the structure of the integrable supersymmetric stCMS model and its conserved charges, reading Sect. 2, and the first two definitions of Sect. 4 is sufficient to understand Sect. 9.

2. Basic Definitions

For $i, j \in \{1, \dots, N\}$, let $K_{ij}$ be the operator that exchanges the variables $z_i$ and $z_j$. Similarly, let $\kappa_{ij}$ exchange the anticommuting variables $\theta_i$ and $\theta_j$. Their action on a superfunction $f(z, \theta)$ is thus
\[ K_{ij} f(\dots, z_i, \dots, z_j, \dots, \theta_i, \dots, \theta_j, \dots) = f(\dots, z_j, \dots, z_i, \dots, \theta_i, \dots, \theta_j, \dots), \]
\[ \kappa_{ij} f(\dots, z_i, \dots, z_j, \dots, \theta_i, \dots, \theta_j, \dots) = f(\dots, z_i, \dots, z_j, \dots, \theta_j, \dots, \theta_i, \dots). \tag{5} \]
Each of these sets of operators generates a realization of the permutation group $S_N$. Since the $K_{ij}$'s and the $\kappa_{ij}$'s commute, the operators $\mathcal{K}_{ij} = K_{ij}\kappa_{ij}$, acting as
\[ \mathcal{K}_{ij} f(\dots, z_i, \dots, z_j, \dots, \theta_i, \dots, \theta_j, \dots) = f(\dots, z_j, \dots, z_i, \dots, \theta_j, \dots, \theta_i, \dots), \tag{6} \]
are also seen to generate a realization of $S_N$.

Let $P$ be the space of polynomials in the variables $\theta_1, \dots, \theta_N$ and $z_1, \dots, z_N$. We will denote by $P^{S_N}$ the subspace of $P$ of polynomials invariant under the simultaneous exchange of any pair of variables $\theta_i \leftrightarrow \theta_j$ and $z_i \leftrightarrow z_j$. A polynomial $f \in P$ thus belongs to $P^{S_N}$ iff $\mathcal{K}_\sigma f = f$ for any $\sigma \in S_N$.$^2$

Let $I = \{i_1, \dots, i_m\}$ $(1 \le i_1 < i_2 < \dots < i_m \le N)$ be an ordered set of integers, and let $\lambda$ be a composition with $N$ parts, that is, a sequence of $N$ nonnegative integers (e.g., if $N = 5$, one possible composition is $(2, 0, 1, 3, 4)$).$^3$ A natural basis of $P$ is provided by the monomials $\theta_I z^\lambda$, where
\[ \theta_I = \theta_{\{i_1,\dots,i_m\}} = \theta_{i_1}\cdots\theta_{i_m}, \qquad z^\lambda = z_1^{\lambda_1}\cdots z_N^{\lambda_N}. \tag{7} \]
If $I$ has $m$ entries, $\theta_I z^\lambda$ is said to belong to the $m$-fermion sector. A superpartition $\Lambda$ in the $m$-fermion sector is made of a partition $\Lambda^a$ whose parts are all distinct, and of a usual partition $\Lambda^s$, that is,
\[ \Lambda = (\Lambda_1, \dots, \Lambda_m; \Lambda_{m+1}, \dots, \Lambda_N) = (\Lambda^a; \Lambda^s), \tag{8} \]
with
\[ \Lambda^a = (\Lambda_1, \dots, \Lambda_m), \qquad \Lambda_i > \Lambda_{i+1} \ge 0, \qquad i = 1, \dots, m-1, \tag{9} \]
and
\[ \Lambda^s = (\Lambda_{m+1}, \dots, \Lambda_N), \qquad \Lambda_i \ge \Lambda_{i+1} \ge 0, \qquad i = m+1, \dots, N-1. \tag{10} \]

$^2$ It is understood that the product decomposition of $\mathcal{K}_\sigma$ into elementary permutations $\mathcal{K}_{i,i+1}$ follows the decomposition of $\sigma$ into elementary transpositions $\sigma_i = (i, i+1)$. In other words, $\mathcal{K}_\sigma$ is a realization on superspace variables of the action of $\sigma$ on indices.
$^3$ We use the word composition in a broader sense, given that, strictly speaking, a composition should not contain any zeroes. Similarly, we allow the presence of zeroes in a partition.
In the zero-fermion sector, the semicolon is omitted and $\Lambda$ reduces to $\Lambda^s$. We often write the degree of a superpartition as $n = |\Lambda| = \sum_{i=1}^N \Lambda_i$. Given a supercomposition $\gamma = (\gamma_a; \gamma_s) = (\gamma_1, \dots, \gamma_m; \gamma_{m+1}, \dots, \gamma_N)$, we will denote by $\overline{\gamma}$ the superpartition whose antisymmetric part is the rearrangement of $(\gamma_1, \dots, \gamma_m)$ and whose symmetric part is the rearrangement of $(\gamma_{m+1}, \dots, \gamma_N)$. Denoting by $\lambda^+$ the partition obtained by the rearrangement of the entries of any composition $\lambda$, we have
\[ \overline{(\gamma_a; \gamma_s)} = (\gamma_a^+; \gamma_s^+), \tag{11} \]
which we can illustrate with the example:
\[ \overline{(1, 4, 2; 2, 5, 1, 3)} = (4, 2, 1; 5, 3, 2, 1). \tag{12} \]
Furthermore, $\sigma_\gamma$ will stand for the element of $S_N$ that sends $\gamma$ to $\overline{\gamma}$, that is $\sigma_\gamma \gamma = \overline{\gamma}$. Note that we can always choose $\sigma_\gamma$ such that $\sigma_\gamma = \sigma_{\gamma_a}\sigma_{\gamma_s}$, with $\sigma_{\gamma_a}$ and $\sigma_{\gamma_s}$ permutations of $\{1, \dots, m\}$ and $\{m+1, \dots, N\}$ respectively. If we delete the semicolon in a superpartition $\Lambda$, we obtain an ordinary composition that we will denote as $\Lambda_c$:
\[ \Lambda = (\Lambda_1, \dots, \Lambda_m; \Lambda_{m+1}, \dots, \Lambda_N) \ \Rightarrow\ \Lambda_c = (\Lambda_1, \dots, \Lambda_N). \tag{13} \]
Finally, to any superpartition $\Lambda$, we associate a unique standard partition $\Lambda^*$ obtained by rearranging the parts of the superpartition in decreasing order:
\[ \Lambda^* = (\Lambda_c)^+. \tag{14} \]
For instance, the $*$-rearrangement of $(4, 2, 1; 5, 3, 2, 1)$ is
\[ (4, 2, 1; 5, 3, 2, 1)^* = (5, 4, 3, 2, 2, 1, 1). \tag{15} \]
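The rearrangements (11)–(15) are simple sorting operations, and a few lines of code make them concrete. This is a minimal sketch; the function names `superpartition`, `star` and `degree` are chosen here for illustration and are not notation from the paper.

```python
def superpartition(gamma_a, gamma_s):
    """Rearrange a supercomposition (gamma_a; gamma_s) into a superpartition, as in (11)."""
    anti = tuple(sorted(gamma_a, reverse=True))
    if len(set(anti)) != len(anti):
        # The antisymmetric part of a superpartition must have distinct parts.
        raise ValueError("antisymmetric part must have distinct entries")
    return anti, tuple(sorted(gamma_s, reverse=True))


def star(anti, sym):
    """Lambda^*: delete the semicolon (giving Lambda_c) and sort decreasingly, as in (14)."""
    return tuple(sorted(tuple(anti) + tuple(sym), reverse=True))


def degree(anti, sym):
    """|Lambda|: the sum of all parts."""
    return sum(anti) + sum(sym)
```

Calling `superpartition((1, 4, 2), (2, 5, 1, 3))` reproduces example (12), and `star((4, 2, 1), (5, 3, 2, 1))` reproduces (15).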
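A natural basis of $P^{S_N}$ is provided by the monomial symmetric superfunctions$^4$
\[ m_\Lambda = \frac{1}{f_\Lambda}\sum_{\sigma\in S_N} \mathcal{K}_\sigma\, \theta_{\{1,\dots,m\}}\, z^\Lambda, \tag{16} \]
where the normalization constant $f_\Lambda$ is
\[ f_\Lambda = f_{\Lambda^s} = n_{\Lambda^s}(0)!\, n_{\Lambda^s}(1)!\, n_{\Lambda^s}(2)! \cdots, \tag{17} \]
with $n_{\Lambda^s}(i)$ the number of $i$'s in $\Lambda^s$, the symmetric part of the superpartition $\Lambda = (\Lambda^a; \Lambda^s)$. This normalization ensures that the coefficient of the monomial $\theta_{\{1,\dots,m\}} z^\Lambda$ appearing in the expansion of $m_\Lambda$ is equal to 1. The supermonomial $m_{(3,1;2,1)}$ is given in (4) for $N = 4$.

$^4$ The quantity $z^\Lambda$ is to be understood as $z_1^{\Lambda_1}\cdots z_N^{\Lambda_N}$, that is, as if $\Lambda$ were replaced by $\Lambda_c$. However, to alleviate the notation, we will omit the subindex $c$ when $\Lambda$ is treated as a formal power.

Finally, we will define a scalar product $\langle \cdot, \cdot \rangle_\beta$ in $P$. With
\[ \Delta(z) = \prod_{1\le j<k\le N} \frac{z_j - z_k}{z_j z_k}, \tag{18} \]
$\langle \cdot, \cdot \rangle_\beta$ is defined (for $\beta$ a positive integer) on the basis elements of $P$ as
\[ \langle \theta_I z^\lambda, \theta_J z^\mu \rangle_\beta = \begin{cases} \mathrm{C.T.}\big[\Delta^\beta(z)\Delta^\beta(\bar z)\, z^\lambda / z^\mu\big] & \text{if } I = J, \\ 0 & \text{otherwise,} \end{cases} \tag{19} \]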
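where $\bar z_i = 1/z_i$, and where $\mathrm{C.T.}[E]$ stands for the constant term of the expression $E$. This scalar product is a special case of the physical scalar product of the underlying supersymmetric quantum many-body problem ($\beta$ is now arbitrary)
\[ \langle A(z, \theta), B(z, \theta) \rangle_\beta = \prod_{i=1}^N \oint \frac{dz_i}{2\pi i z_i} \int d\theta_1 \cdots d\theta_N\, \Delta^\beta(z)\Delta^\beta(\bar z)\, A(z, \theta)\, B(\bar z, \bar\theta), \tag{20} \]
where $\overline{\theta_{i_1}\cdots\theta_{i_m}}$ is defined such that
\[ (\theta_{i_1}\cdots\theta_{i_m})(\overline{\theta_{i_1}\cdots\theta_{i_m}}) = \theta_N \cdots \theta_1, \tag{21} \]
an operation akin to the Hodge duality transformation. For instance, if $N = 5$, we have $\overline{\theta_2\theta_5} = -\theta_4\theta_3\theta_1$. The integral over fermionic variables refers to the Berezin integration
\[ \int d\theta = 0, \qquad \int d\theta\, \theta = 1. \tag{22} \]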
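The sign in the duality (21) is just the sign of the permutation that brings $\theta_I$ followed by the complementary $\theta$'s into the order $\theta_N\cdots\theta_1$. A minimal sketch of that bookkeeping follows; the function name `dual` and the convention of listing the complement in decreasing order are choices made here for illustration, not taken from the paper.

```python
def dual(indices, n):
    """Return (sign, complement) such that
    theta_I * (sign * theta_complement) = theta_n ... theta_1,
    where complement lists the indices not in I in decreasing order."""
    comp = [k for k in range(n, 0, -1) if k not in indices]
    seq = list(indices) + comp
    # Number of pairs that are out of decreasing order; each one costs a sign
    # when the anticommuting thetas are reordered to theta_n ... theta_1.
    inversions = sum(
        1
        for a in range(len(seq))
        for b in range(a + 1, len(seq))
        if seq[a] < seq[b]
    )
    return (-1) ** inversions, tuple(comp)
```

With `dual((2, 5), 5)` the function returns the sign $-1$ and the complement $(4, 3, 1)$, in agreement with the example $\overline{\theta_2\theta_5} = -\theta_4\theta_3\theta_1$ quoted in the text.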
3. Dunkl Operators and Conserved Charges The Dunkl operators, Di , are defined as [10]5 D i = zi
zi zj ∂ +β 1 − Kij + β 1 − Kij − β(i − 1) ∂zi zi − z j zi − z j j
j >i
∂ = zi +β Oij + β Oij − β(i − 1) , ∂zi j
(23)
j >i
where
Oij =
zi zi −zj zj zi −zj
1 − Kij 1 − Kij
j i
(24)
.
The set $\{D_i\}$ forms a family of commuting operators satisfying the degenerate Hecke relations
\[ K_{i,i+1} D_{i+1} - D_i K_{i,i+1} = \beta \qquad\text{and}\qquad K_{j,j+1} D_i = D_i K_{j,j+1} \quad (i \neq j, j+1) . \tag{25} \]
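The definition (23)-(24) can be made concrete on polynomials. The sketch below is a minimal illustration, not an implementation taken from the paper: polynomials are stored as dictionaries mapping exponent tuples to coefficients, indices are 0-based (so the constant −β(i−1) becomes −β·i), and the names `add_term`, `o_ij` and `dunkl` are hypothetical. The quotient by $z_i - z_j$ is carried out in closed form through the geometric-sum identity for $(z_i^a z_j^b - z_i^b z_j^a)/(z_i - z_j)$.

```python
from fractions import Fraction


def add_term(poly, expo, coeff):
    """Add coeff * z^expo to the polynomial stored in the dict `poly`."""
    new = poly.get(expo, 0) + coeff
    if new == 0:
        poly.pop(expo, None)
    else:
        poly[expo] = new


def o_ij(i, j, expo):
    """Terms of O_ij z^expo as a list of (exponent tuple, coefficient)."""
    a, b = expo[i], expo[j]
    if a == b:
        return []  # (1 - K_ij) annihilates the monomial
    sign = 1 if a > b else -1
    low, high = min(a, b), max(a, b)
    terms = []
    for k in range(high - low):
        e = list(expo)
        e[i], e[j] = low + k, high - 1 - k
        # Prefactor z_i when j < i, z_j when j > i, as in (24).
        if j < i:
            e[i] += 1
        else:
            e[j] += 1
        terms.append((tuple(e), sign))
    return terms


def dunkl(i, poly, beta):
    """Apply D_i (0-based index i) to `poly`."""
    result = {}
    for expo, coeff in poly.items():
        # z_i d/dz_i together with the constant term of (23).
        add_term(result, expo, coeff * (expo[i] - beta * i))
        for j in range(len(expo)):
            if j != i:
                for new_expo, c in o_ij(i, j, expo):
                    add_term(result, new_expo, coeff * beta * c)
    return result


beta = Fraction(3, 2)
p = {(2, 0, 1): 1}  # the monomial z_1^2 z_3
print(dunkl(0, dunkl(1, p, beta), beta) == dunkl(1, dunkl(0, p, beta), beta))
```

Exact rational arithmetic keeps the comparison of the two orders of application free of rounding issues.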
It turns out that any conserved charge, $C_n$, of the stCMS model can be written as the $P^{S_N}$-projection of an expression, $\mathcal{C}_n$, involving Dunkl operators:
\[ \mathcal{C}_n \big|_{P^{S_N}} = C_n \iff \mathcal{C}_n f = C_n f , \qquad \forall f \in P^{S_N} . \tag{26} \]
5 Following [4], we use the qualifier “Dunkl” for all Dunkl-type operators. The present $D_i$'s are often called Cherednik operators.
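This is a key tool in our subsequent analysis, since by working with Dunkl operators, we avoid manipulating fermionic variables to a large extent. More explicitly, the stCMS conserved charges are defined as the projection onto $P^{S_N}$ of the following expressions:
\[ \mathcal{H}_n = \sum_{i=1}^{N} D_i^n , \tag{27} \]
\[ \mathcal{Q}_n = \sum_{w\in S_N} \mathcal{K}_w\, \theta_1 D_1^n , \tag{28} \]
\[ \mathcal{Q}_n^\dagger = \sum_{w\in S_N} \mathcal{K}_w\, \frac{\partial}{\partial\theta_1} D_1^n , \tag{29} \]
\[ \mathcal{I}_n = \sum_{w\in S_N} \mathcal{K}_w\, \theta_1 \frac{\partial}{\partial\theta_1} D_1^n . \tag{30} \]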
Of these charges, $\mathcal{I}_0/(N-1)!$ gives the fermion number. Observe that $\mathcal{H}_n$ and $\mathcal{I}_n$ preserve the number of fermions (the number of $\theta_i$'s) of the superpolynomials on which they act, while $\mathcal{Q}_n$ (resp. $\mathcal{Q}_n^\dagger$) increases (resp. decreases) it by 1. Also, since $\mathcal{H}_n$ is known to be central in the degenerate Hecke algebra [6], it commutes with $K_\sigma$, for any element σ of the symmetric group. Finally, since these expressions are not unique, we should mention that the present choice for $\mathcal{H}_n$ and $\mathcal{I}_n$ is motivated by the requirement that they act triangularly on monomial superfunctions (as we will show later on). We finish this section with two propositions, the first one relying on the following lemma.

Lemma 3. For any nonnegative integers n and m, we have
\[ [D_1^n \mathcal{K}_{12} , D_1^m \mathcal{K}_{12}] + [\mathcal{K}_{12} D_1^n , \mathcal{K}_{12} D_1^m] = 0 . \tag{31} \]

Proof. Given that $\kappa_{12}$ (in $\mathcal{K}_{12} = \kappa_{12} K_{12}$) commutes with $D_1$, the lemma is equivalent to
\[ [D_1^n K_{12} , D_1^m K_{12}] + [K_{12} D_1^n , K_{12} D_1^m] = 0 . \tag{32} \]
We will now seek to prove this expression. First, it is easy to verify, using $K_{12} D_1 = D_2 K_{12} - \beta$, that
\[ K_{12} D_1^n = D_2^n K_{12} - \beta \left( D_1^{n-1} + D_1^{n-2} D_2 + \cdots + D_1 D_2^{n-2} + D_2^{n-1} \right) = D_2^n K_{12} - \beta\, h_{n-1}(D_1, D_2) , \tag{33} \]
where $h_i(x_1, x_2)$ is the homogeneous symmetric function of degree i in the variables $x_1$ and $x_2$. Using (33) and the commutativity of $D_1$ and $D_2$, we obtain
\[ \begin{aligned} &[D_1^n K_{12} , D_1^m K_{12}] + [K_{12} D_1^n , K_{12} D_1^m] \\ &\quad = D_1^n D_2^m - \beta h_{m-1} D_1^n K_{12} - D_1^m D_2^n + \beta h_{n-1} D_1^m K_{12} \\ &\qquad + D_2^n D_1^m - \beta h_{n-1} K_{12} D_1^m - D_2^m D_1^n + \beta h_{m-1} K_{12} D_1^n \\ &\quad = -\beta h_{m-1} D_1^n K_{12} + \beta h_{n-1} D_1^m K_{12} - \beta h_{n-1} D_2^m K_{12} + \beta^2 h_{n-1} h_{m-1} \\ &\qquad + \beta h_{m-1} D_2^n K_{12} - \beta^2 h_{n-1} h_{m-1} \\ &\quad = \beta h_{m-1} (D_2^n - D_1^n) K_{12} + \beta h_{n-1} (D_1^m - D_2^m) K_{12} , \end{aligned} \tag{34} \]
where $h_i$ stands for $h_i(D_1, D_2)$. Finally, using the simple identity $x_1^m - x_2^m = (x_1 - x_2)\, h_{m-1}(x_1, x_2)$, the previous expression vanishes, and (32) is thus seen to hold.
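The last step of the proof only uses an identity between commuting quantities, so it can be checked with ordinary integers standing in for $D_1$ and $D_2$. The sketch below is such a numerical sanity check; the helper name `h` and the chosen test values are illustrative assumptions.

```python
def h(degree, x1, x2):
    """Complete homogeneous symmetric polynomial of the given degree in (x1, x2)."""
    return sum(x1 ** k * x2 ** (degree - k) for k in range(degree + 1))


def last_line_of_34(n, m, x1, x2):
    """The bracket multiplying beta * K_12 in the last line of (34),
    with commuting numbers x1, x2 in place of D_1, D_2 (n, m >= 1)."""
    return h(m - 1, x1, x2) * (x2 ** n - x1 ** n) + h(n - 1, x1, x2) * (x1 ** m - x2 ** m)


# The identity x1^m - x2^m = (x1 - x2) h_{m-1}(x1, x2) on a sample point.
x1, x2, m = 7, 3, 5
print(x1 ** m - x2 ** m == (x1 - x2) * h(m - 1, x1, x2))
print(last_line_of_34(4, 2, x1, x2))
```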
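Proposition 4. The families of operators $\mathcal{H}_n$, n = 1, . . . , N, and $\mathcal{I}_m$, m = 1, . . . , N, when acting on $P^{S_N}$, form a set of mutually commuting operators, that is, they satisfy
\[ [\mathcal{H}_n , \mathcal{H}_m] f = [\mathcal{H}_n , \mathcal{I}_m] f = [\mathcal{I}_n , \mathcal{I}_m] f = 0 , \tag{35} \]
for any $f \in P^{S_N}$.

Proof. Since the Dunkl operators $D_i$ mutually commute, we have immediately $[\mathcal{H}_n , \mathcal{H}_m] = 0$. Further, since $\mathcal{H}_n$ commutes with $\mathcal{K}_\sigma$ for any permutation σ, we also get $[\mathcal{H}_n , \mathcal{I}_m] = 0$. The relation $[\mathcal{I}_n , \mathcal{I}_m] f = 0$ is less trivial. We have
\[ \begin{aligned} \mathcal{I}_n \mathcal{I}_m f &= \sum_{w,\sigma\in S_N} \mathcal{K}_w\, \theta_1 \frac{\partial}{\partial\theta_1} D_1^n\; \mathcal{K}_\sigma\, \theta_1 \frac{\partial}{\partial\theta_1} D_1^m f \\ &= \sum_{w,\sigma\in S_N} \theta_{(w)_1} \frac{\partial}{\partial\theta_{(w)_1}}\, \theta_{(w\sigma)_1} \frac{\partial}{\partial\theta_{(w\sigma)_1}}\; \mathcal{K}_w D_1^n \mathcal{K}_\sigma D_1^m f , \end{aligned} \tag{36} \]
where $(w)_1$ is the first entry of the permutation w. Therefore, $[\mathcal{I}_n , \mathcal{I}_m] f$ can be written as
\[ [\mathcal{I}_n , \mathcal{I}_m] f = \sum_{w,\sigma\in S_N} \theta_{(w)_1} \frac{\partial}{\partial\theta_{(w)_1}}\, \theta_{(w\sigma)_1} \frac{\partial}{\partial\theta_{(w\sigma)_1}} \left( \mathcal{K}_w D_1^n \mathcal{K}_\sigma D_1^m - \mathcal{K}_w D_1^m \mathcal{K}_\sigma D_1^n \right) f . \tag{37} \]
To prove that this expression is equal to zero, we will match its summands (w, σ) and (wσ, σ⁻¹), and see that they cancel each other. First, if we let w → wσ and σ → σ⁻¹, the summand (w, σ) of (37) becomes
\[ \begin{aligned} &\theta_{(w\sigma)_1} \frac{\partial}{\partial\theta_{(w\sigma)_1}}\, \theta_{(w)_1} \frac{\partial}{\partial\theta_{(w)_1}} \left( \mathcal{K}_w \mathcal{K}_\sigma D_1^n \mathcal{K}_{\sigma^{-1}} D_1^m - \mathcal{K}_w \mathcal{K}_\sigma D_1^m \mathcal{K}_{\sigma^{-1}} D_1^n \right) f \\ &\quad = \theta_{(w)_1} \frac{\partial}{\partial\theta_{(w)_1}}\, \theta_{(w\sigma)_1} \frac{\partial}{\partial\theta_{(w\sigma)_1}} \left( \mathcal{K}_w \mathcal{K}_\sigma D_1^n \mathcal{K}_{\sigma^{-1}} D_1^m - \mathcal{K}_w \mathcal{K}_\sigma D_1^m \mathcal{K}_{\sigma^{-1}} D_1^n \right) f , \end{aligned} \tag{38} \]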
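the equality being obtained by interchanging the two prefactors $\theta_i\, \partial/\partial\theta_i$. Now, (w, σ) = (wσ, σ⁻¹) iff σ = e. Since, in the case (w, σ) = (w, e), the summand (w, σ) of (37) vanishes, we can assume that (w, σ) and (wσ, σ⁻¹) are distinct summands. Having shown that the prefactors are the same for the two summands (w, σ) and (wσ, σ⁻¹) (cf. (38)), verifying their cancellation amounts to checking that
\[ \left( \mathcal{K}_w D_1^n \mathcal{K}_\sigma D_1^m - \mathcal{K}_w D_1^m \mathcal{K}_\sigma D_1^n \right) f + \left( \mathcal{K}_w \mathcal{K}_\sigma D_1^n \mathcal{K}_{\sigma^{-1}} D_1^m - \mathcal{K}_w \mathcal{K}_\sigma D_1^m \mathcal{K}_{\sigma^{-1}} D_1^n \right) f = 0 . \tag{39} \]
Since $f \in P^{S_N}$, this is equivalent to
\[ \left( D_1^n \mathcal{K}_\sigma D_1^m \mathcal{K}_{\sigma^{-1}} - D_1^m \mathcal{K}_\sigma D_1^n \mathcal{K}_{\sigma^{-1}} + \mathcal{K}_\sigma D_1^n \mathcal{K}_{\sigma^{-1}} D_1^m - \mathcal{K}_\sigma D_1^m \mathcal{K}_{\sigma^{-1}} D_1^n \right) f = 0 . \tag{40} \]
Now, $D_1$ commutes with $\mathcal{K}_{i,i+1}$, as long as $i \neq 1$. Therefore, if σ leaves 1 fixed, (40) holds. We can thus assume that σ does not leave 1 fixed. In this case, σ can be decomposed as α(12)β, where α and β are permutations that leave 1 invariant, and (40) becomes
\[ \mathcal{K}_\alpha \left( D_1^n \mathcal{K}_{12} D_1^m \mathcal{K}_{12} - D_1^m \mathcal{K}_{12} D_1^n \mathcal{K}_{12} + \mathcal{K}_{12} D_1^n \mathcal{K}_{12} D_1^m - \mathcal{K}_{12} D_1^m \mathcal{K}_{12} D_1^n \right) \mathcal{K}_{\alpha^{-1}} f = 0 , \tag{41} \]
where we have used the facts that $\mathcal{K}_{12}^{-1} = \mathcal{K}_{12}$, and that $\mathcal{K}_\alpha$ and $\mathcal{K}_\beta$ commute with $D_1$. Given that, from Lemma 3, $[D_1^n \mathcal{K}_{12} , D_1^m \mathcal{K}_{12}] + [\mathcal{K}_{12} D_1^n , \mathcal{K}_{12} D_1^m] = 0$, the expression is finally seen to hold, thereby proving $[\mathcal{I}_n , \mathcal{I}_m] f = 0$.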
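Proposition 5. The charges $\mathcal{H}_n$ and $\mathcal{I}_n$ are self-adjoint with respect to the scalar product (19).

Proof. In the $\mathcal{H}_n$ case, this simply follows from the self-adjointness of the operators $D_i$ [6]. In the case of $\mathcal{I}_n$, we also need to use $\theta_1^\dagger = \partial/\partial\theta_1$, and
\[ \left( \theta_1 \frac{\partial}{\partial\theta_1} \right)^{\dagger} = \left( \frac{\partial}{\partial\theta_1} \right)^{\dagger} \theta_1^\dagger = \theta_1 \frac{\partial}{\partial\theta_1} . \tag{42} \]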
Our first goal will be to find the common eigenfunctions of the commuting operators Hn and In . However, before plunging into the relevant computations, we need to introduce further technical tools. This will be the subject of the following two sections.
4. Orderings on Superpartitions

In this section we introduce three orderings on superpartitions. They will provide three different ways of defining triangular decompositions. First recall the usual dominance ordering on partitions [3]. If λ and µ are two partitions (i.e., $\lambda = \lambda^+$ and $\mu = \mu^+$), then λ ≥ µ iff $\lambda_1 + \cdots + \lambda_i \ge \mu_1 + \cdots + \mu_i$ for all i. This ordering can be extended to compositions as follows. Any composition λ can be obtained from its rearranged partition $\lambda^+$ by a sequence of permutations. Among all permutations w such that $\lambda = w\lambda^+$, there exists a unique one, denoted $w_\lambda$, of minimal length.

Definition 6. Given two compositions λ, µ, we say that λ ≥ µ if either $\lambda^+ > \mu^+$, or $\lambda^+ = \mu^+$ and $w_\lambda \le w_\mu$ in the Bruhat order of the symmetric group. This will be called the Bruhat ordering on compositions.⁶

An immediate consequence of this definition is that $\lambda^+ \ge \mu$ for any composition µ such that $\mu^+ = \lambda^+$. Moreover, given that to any superpartition Λ is associated a composition $\Lambda^c$, this ordering induces an ordering on superpartitions.

Definition 7. Given two superpartitions Λ, Ω, we say that Λ ≥ Ω if $\Lambda^c \ge \Omega^c$. We shall refer to this ordering as the Bruhat ordering on superpartitions.

We finally define two other orderings on superpartitions.

Definition 8. The h-ordering $\le_h$ is defined such that $\Omega \le_h \Lambda$ if either Ω = Λ, or $\Omega^* \neq \Lambda^*$ and Ω ≤ Λ.

Definition 9. The t-ordering $\le_t$ is defined such that $\Omega \le_t \Lambda$ iff $\Omega^* = \Lambda^*$ and Ω ≤ Λ.

6 The ordering on compositions could alternatively be formulated as follows [8]. We say that λ ≥ µ if either $\lambda^+ > \mu^+$, or $\lambda^+ = \mu^+$ and $\sum_{i=1}^{k} \lambda_i \ge \sum_{i=1}^{k} \mu_i$ for all k.
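As a rough illustration of these definitions, the comparison of two compositions can be coded through the partial-sum formulation quoted in footnote 6 (rather than through Bruhat order on permutations). The function names are hypothetical, and the sketch assumes both compositions have the same length and the same total degree.

```python
from itertools import accumulate


def dominates(lam, mu):
    """True if every partial sum of lam is >= the matching partial sum of mu."""
    return all(a >= b for a, b in zip(accumulate(lam), accumulate(mu)))


def composition_geq(lam, mu):
    """lam >= mu in the ordering on compositions (partial-sum formulation)."""
    if len(lam) != len(mu) or sum(lam) != sum(mu):
        return False  # only compositions of the same size are compared here
    lam_plus = sorted(lam, reverse=True)
    mu_plus = sorted(mu, reverse=True)
    if lam_plus != mu_plus:
        # Different rearranged partitions: compare them by dominance.
        return dominates(lam_plus, mu_plus)
    # Same rearranged partition: compare the compositions themselves.
    return dominates(lam, mu)


def superpartition_geq(big, small):
    """Ordering on superpartitions, via the associated compositions."""
    return composition_geq(big[0] + big[1], small[0] + small[1])


# A partition dominates every rearrangement of itself.
print(composition_geq((3, 2, 1), (1, 3, 2)))
# Two superpartitions whose *-rearrangements differ.
print(superpartition_geq(((5, 1), (4, 4, 0)), ((5, 3), (4, 1, 1))))
```

The h- and t-orderings then follow by additionally testing whether the two sorted compositions differ or coincide.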
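Obviously, these two new orderings on superpartitions are special cases of the Bruhat ordering in the sense that if either $\Omega \le_h \Lambda$ or $\Omega \le_t \Lambda$, then Ω ≤ Λ. Let us look at illustrative examples. The two superpartitions Λ = (5, 3; 4, 1, 1) and Ω = (5, 1; 4, 4, 0) cannot be t-compared since $\Lambda^* \neq \Omega^*$. However, they can be h-compared since $(\Lambda^c)^+ = \Lambda^* = (5, 4, 3, 1, 1) < (\Omega^c)^+ = \Omega^* = (5, 4, 4, 1, 0)$. On the other hand, let us see how Γ = (5, 1; 4, 3, 1) compares with the previous two superpartitions. Again, Γ and Ω cannot be t-compared, but are such that [...]

[...] (Property 10) the Dunkl operators act triangularly on monomials,
\[ D_i z^\lambda = \bar\lambda_i z^\lambda + \sum_{\mu<\lambda} c_{\lambda\mu} z^\mu , \tag{43} \]
where
\[ \bar\lambda_i = \lambda_i - \beta \left( \#\{ j = 1, \dots, i-1 \mid \lambda_j \ge \lambda_i \} + \#\{ j = i+1, \dots, N \mid \lambda_j > \lambda_i \} \right) . \tag{44} \]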
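Formula (44) is a simple counting rule, which the following sketch spells out. The function name `dunkl_eigenvalue` is an illustrative choice; the index i is 1-based as in the text, and β is passed as an exact rational.

```python
from fractions import Fraction


def dunkl_eigenvalue(lam, i, beta):
    """The quantity bar(lambda)_i of (44) for a composition lam (1-based i)."""
    value = lam[i - 1]
    # Entries to the left that are at least as large as lambda_i.
    left = sum(1 for j in range(i - 1) if lam[j] >= value)
    # Entries to the right that are strictly larger than lambda_i.
    right = sum(1 for j in range(i, len(lam)) if lam[j] > value)
    return value - beta * (left + right)


beta = Fraction(1, 2)
lam = (3, 1, 1, 0)
print([dunkl_eigenvalue(lam, i, beta) for i in range(1, len(lam) + 1)])
```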
For i < j, let $T_{ij}$ be such that on a composition λ,
\[ T_{ij} \lambda = \begin{cases} (\cdots \lambda_j \cdots \lambda_i \cdots) & \text{if } \lambda_i > \lambda_j , \\ (\cdots \lambda_i \cdots \lambda_j \cdots) & \text{otherwise,} \end{cases} \tag{45} \]
that is, $T_{ij}$ interchanges the entries $\lambda_i$ and $\lambda_j$ only when $\lambda_i > \lambda_j$. The action of $T_{ij}$ on superpartitions is defined via the corresponding compositions $\Lambda^c$. The order on compositions satisfies the following obvious property.

Property 11. Let µ and λ be two compositions such that µ ≤ λ, and $\mu^+ = \lambda^+$, that is, such that $\mu \le_t \lambda$. Then, there exists a sequence of operators $T_{ij}$ giving
\[ \mu = T_{i_1 j_1} \cdots T_{i_\ell j_\ell}\, \lambda . \tag{46} \]
Lemma 12. Let Λ and Ω be two superpartitions. If
\[ \Omega = \overline{T_{i_1 j_1} \cdots T_{i_\ell j_\ell}\, \Lambda} , \tag{47} \]
for some $T_{i_k j_k}$, 1 ≤ k ≤ ℓ, then Ω ≤ Λ.

7 This is the ordering introduced in [4]. In reference [5], a more precise formulation of this ordering was introduced (and called $\le_s$).
First, it is important to realize that the product $T_{i_1 j_1} \cdots T_{i_\ell j_\ell}$ can be rewritten as a product of $T_{ij}$'s where all the $T_{ij}$'s that interchange elements between the fermionic and bosonic sectors (that is, that interchange entries of $\Lambda^a$ and $\Lambda^s$) are to the right. Since, in this form, the remaining elements only interchange entries within each sector, their action will amount to nothing after the “bar” operation has been performed. We can therefore assume that all the $T_{ij}$'s in $T_{i_1 j_1} \cdots T_{i_\ell j_\ell}$ interchange elements between the fermionic and bosonic sectors. Before going into the proof of the lemma, let us first give an example that will hopefully shed some light on the many steps involved in the proof. Let us consider the superpartition
\[ \Lambda = (7, 5, 4, 3, 0; 9, 6, 4, 4, 2, 2, 1, 1, 1) , \tag{48} \]
and act on it with $T_{1,11}$, and then with $T_{4,13}$. We have thus
\[ T_{4,13} T_{1,11} \Lambda = T_{4,13} T_{1,11} (7, 5, 4, 3, 0; 9, 6, 4, 4, 2, 2, 1, 1, 1) = (2, 5, 4, 1, 0; 9, 6, 4, 4, 2, 7, 1, 3, 1) . \tag{49} \]
The superpartition Ω is obtained by applying the “bar” operation:
\[ \Omega = \overline{(2, 5, 4, 1, 0; 9, 6, 4, 4, 2, 7, 1, 3, 1)} = (5, 4, 2, 1, 0; 9, 7, 6, 4, 4, 3, 2, 1, 1) . \tag{50} \]
Now, can we conclude directly that Ω < Λ? No, because even though $\Lambda^c$ Bruhat dominates the intermediate composition (2, 5, 4, 1, 0, 9, 6, 4, 4, 2, 7, 1, 3, 1), the latter does not dominate (it is actually dominated by) the composition $\Omega^c$ associated to the resulting superpartition Ω. In the intermediate step, we somehow ended up too low to apply a chain of Bruhat dominance. This simply indicates that the “bar” operation is not compatible with the ordering on compositions. Therefore, the lemma does not follow immediately from the previous property. Actually, what the proof of the lemma gives is a precise construction to arrive at Ω via a sequence of $T_{ij}$'s without introducing rearrangements at any intermediate step.

Proof. Essentially, we want to show that any Ω that can be obtained from Λ by exchanging a certain number of elements of $\Lambda^a$ and $\Lambda^s$, and then rearranging both vectors, can also be obtained by simply applying a sequence of $T_{ij}$'s, without rearrangement. Let $(a_1, \dots, a_\ell)$ be the partition corresponding to the elements of $\Lambda^a$ that will be moved to the symmetric side. Also, let $(p_1, \dots, p_\ell)$ be their respective positions in Λ, and $(p'_1, \dots, p'_\ell)$ be their final positions, that is, their positions in Ω. Similarly, let $(b_1, \dots, b_\ell)$ be the partition corresponding to the elements of $\Lambda^s$ that will be moved to the antisymmetric side, and denote by $(q_1, \dots, q_\ell)$ their positions in Λ, and by $(q'_1, \dots, q'_\ell)$ their final positions in Ω. Because we move larger elements to the symmetric side, we must have $a_k > b_k$ for all k = 1, ..., ℓ. In our example, we have ℓ = 2, and
\[ (a_1, a_2) = (7, 3), \quad (b_1, b_2) = (2, 1), \tag{51} \]
\[ (p_1, p_2) = (1, 4), \quad (p'_1, p'_2) = (7, 11), \quad (q_1, q_2) = (10, 12), \quad (q'_1, q'_2) = (3, 4) . \tag{52} \]
Now, start from Λ and move $(a_1, \dots, a_\ell)$ so that they occupy the intermediate positions $q'_1, \dots, q'_\ell$ respectively. This can be done using a sequence of $T_{ij}$ because, from $a_k > b_k$, we know that all the $a_k$'s are moved to the right past smaller elements. The precise sequence of $T_{ij}$'s that performs this operation is $T_{p_1 q'_1} \cdots T_{p_\ell q'_\ell}$. In the resulting vector,
move $(b_1, \dots, b_\ell)$ so that they occupy positions $p'_1, \dots, p'_\ell$ respectively. Again this can be done using $T_{ij}$ operators because, from $a_k > b_k$ and choosing $b_m$ such that it occupies the leftmost position whenever there are multiplicities, all the $b_k$'s will be moved to the left past larger elements. This amounts to applying $T_{p'_1 q_1} \cdots T_{p'_\ell q_\ell}$. Finally, applying the sequence $T_{q'_1 p'_1} \cdots T_{q'_\ell p'_\ell}$ gives Ω. Transposing these various steps to our example yields
\[ \overline{T_{3,7} T_{4,11}\, T_{7,10} T_{11,12}\, T_{1,3} T_{4,4}\, (7, 5, 4, 3, 0; 9, 6, 4, 4, 2, 2, 1, 1, 1)} = (5, 4, 2, 1, 0; 9, 7, 6, 4, 4, 3, 2, 1, 1) = \Omega . \tag{53} \]
This shows that Ω ≤ Λ.
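The operators $T_{ij}$ of (45) and the “bar” operation are easy to replay on the example of this proof. The sketch below is illustrative only: the names `t` and `bar` are hypothetical, positions are 1-based as in the text, and a product of operators is applied from right to left. It assumes that the bar operation sorts each sector separately, which is how it is used in (50).

```python
def t(i, j, comp):
    """T_ij of (45): swap entries i and j (1-based) only if comp_i > comp_j."""
    c = list(comp)
    if c[i - 1] > c[j - 1]:
        c[i - 1], c[j - 1] = c[j - 1], c[i - 1]
    return tuple(c)


def bar(comp, m):
    """Sort the first m entries and the remaining ones separately."""
    return (tuple(sorted(comp[:m], reverse=True)),
            tuple(sorted(comp[m:], reverse=True)))


def apply_product(pairs, comp):
    """Apply T_{pairs[0]} ... T_{pairs[-1]} to comp, rightmost factor first."""
    for i, j in reversed(pairs):
        comp = t(i, j, comp)
    return comp


lam_c = (7, 5, 4, 3, 0, 9, 6, 4, 4, 2, 2, 1, 1, 1)  # composition of (48), m = 5

# Equations (49) and (50).
intermediate = apply_product([(4, 13), (1, 11)], lam_c)
print(intermediate)
print(bar(intermediate, 5))

# The sequence written in (53); T_{4,4} acts as the identity.
chain = apply_product([(3, 7), (4, 11), (7, 10), (11, 12), (1, 3), (4, 4)], lam_c)
print(bar(chain, 5))
```

In this replay, the composition produced by the six operators of (53) differs from Ω only by the order of entries inside each sector, so both routes end on the same superpartition once each sector is sorted.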
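Corollary 13. Let $\mu = (\mu_1, \dots, \mu_m ; \mu_{m+1}, \dots, \mu_N)$ be such that $\bar\mu = \Omega$. If Ω ≤ Λ, then $\overline{T_{ij}\mu} \le \Lambda$.

Proof. Since $\bar\mu = \Omega$, µ can be written as $\mu = T_{i_1 j_1} \cdots T_{i_\ell j_\ell}\, \Omega$, for some operators $T_{i_k j_k}$, k = 1, . . . , ℓ. Therefore, from Lemma 12,
\[ \overline{T_{ij}\mu} = \overline{T_{ij} T_{i_1 j_1} \cdots T_{i_\ell j_\ell}\, \Omega} \le \Omega , \tag{54} \]
which gives $\overline{T_{ij}\mu} \le \Lambda$ if Ω ≤ Λ.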
5. Triangular Operators and Determinants

This section presents basic results regarding triangular operators. We should point out that Theorem 16 and Corollary 17 appear for instance in a disguised form in [3]. The exposition of the material in this section follows that of [11, 12]. Let $\{s_\Lambda\}_\Lambda$ be any basis of $P^{S_N}$. We write $P^{(s)}_{\Lambda,\preceq}$ for the finite-dimensional subspace of $P^{S_N}$ spanned by the $s_\Omega$'s such that $\Omega \preceq \Lambda$, with respect to some ordering ⪯ (which could be any of the three orderings introduced previously), i.e.,
\[ P^{(s)}_{\Lambda,\preceq} = \mathrm{Span}\{ s_\Omega \}_{\Omega \preceq \Lambda} . \tag{55} \]

Definition 14. A linear operator $O_t : P^{S_N} \to P^{S_N}$ is called triangular if $O_t (P^{(s)}_{\Lambda,\preceq}) \subseteq P^{(s)}_{\Lambda,\preceq}$ for every superpartition Λ.

The triangularity of a linear operator $O_t$ in $P^{S_N}$ reduces its eigenvalue problem to a finite-dimensional one. Triangular operators can be diagonalized through a determinantal representation of the eigenfunctions. The triangularity implies that the expansion of $O_t s_\Lambda$ is of the form
\[ O_t s_\Lambda = \epsilon_\Lambda s_\Lambda + \sum_{\Omega \prec \Lambda} d_{\Lambda\Omega}\, s_\Omega , \tag{56} \]
with the diagonal matrix elements $\epsilon_\Lambda$ being precisely the eigenvalues of $O_t$.

Definition 15. The triangular operator $O_t$ is called regular if $\epsilon_\Lambda \neq \epsilon_\Omega$ whenever $\Omega \prec \Lambda$.
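Let $\{p_\Lambda\}_\Lambda$ be a corresponding basis of eigenfunctions diagonalizing $O_t$. Clearly, we can choose $p_\Lambda$ to have an expansion of the form
\[ p_\Lambda = s_\Lambda + \sum_{\Omega \prec \Lambda} c_{\Lambda\Omega}\, s_\Omega , \tag{57} \]
where the normalization has been chosen to make $p_\Lambda$ monic. The following theorem provides an explicit determinantal formula for $p_\Lambda$, given the action of $O_t$ on $s_\Lambda$ expressed in the basis $s_\Omega$.

Theorem 16. Let $O_t$ be a regular triangular operator in $P^{S_N}$ whose action on the basis $\{s_\Lambda\}_\Lambda$ is given by (56). Then the unique monic basis $\{p_\Lambda\}_\Lambda$ of $P^{S_N}$ triangularly related to the basis $\{s_\Lambda\}_\Lambda$ (cf. (57)) diagonalizing $O_t$, i.e.,
\[ O_t\, p_\Lambda = \epsilon_\Lambda\, p_\Lambda , \qquad \forall \Lambda , \tag{58} \]
is given explicitly by the (lower) Hessenberg determinant
\[ p_\Lambda = \frac{1}{E_\Lambda} \begin{vmatrix} s_{\Lambda^{(1)}} & \epsilon_{\Lambda^{(1)}} - \epsilon_{\Lambda^{(n)}} & 0 & \cdots & \cdots & 0 \\ s_{\Lambda^{(2)}} & d_{\Lambda^{(2)}\Lambda^{(1)}} & \epsilon_{\Lambda^{(2)}} - \epsilon_{\Lambda^{(n)}} & 0 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \ddots & \vdots \\ \vdots & \vdots & & & \ddots & 0 \\ s_{\Lambda^{(n-1)}} & d_{\Lambda^{(n-1)}\Lambda^{(1)}} & d_{\Lambda^{(n-1)}\Lambda^{(2)}} & \cdots & \cdots & \epsilon_{\Lambda^{(n-1)}} - \epsilon_{\Lambda^{(n)}} \\ s_{\Lambda^{(n)}} & d_{\Lambda^{(n)}\Lambda^{(1)}} & d_{\Lambda^{(n)}\Lambda^{(2)}} & \cdots & \cdots & d_{\Lambda^{(n)}\Lambda^{(n-1)}} \end{vmatrix} . \tag{59} \]
Here $\Lambda^{(1)} < \Lambda^{(2)} < \cdots < \Lambda^{(n-1)} < \Lambda^{(n)} = \Lambda$ denotes any linear ordering, refining the natural order ⪯, of the superpartitions $\Lambda^{(i)}$, i = 1, . . . , n − 1, that precede Λ in the ordering ⪯. The normalization is determined by
\[ E_\Lambda = \prod_{i=1}^{n-1} \left( \epsilon_\Lambda - \epsilon_{\Lambda^{(i)}} \right) . \tag{60} \]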
With $O_t = \mathcal{H}_2$ and $s_\Lambda = m_\Lambda$, the previous theorem leads to a closed expression for the $\mathcal{H}_2$ eigenfunctions, the Jack superpolynomials $\mathcal{J}_\Lambda$ of [4, 5], in terms of the coefficients $d_{\Lambda\Omega}$ entering in the supermonomial decomposition of $\mathcal{H}_2 m_\Lambda$. These coefficients have been computed in [5]. As already indicated, the superpolynomials $\mathcal{J}_\Lambda$ are not orthogonal. We will seek linear combinations that are orthogonal by considering the eigenfunctions of $\mathcal{I}_1$. Theorem 16 will then be invoked again, but this time with $s_\Lambda = \mathcal{J}_\Lambda$ and $O_t = \mathcal{I}_1$. Computing the action of $\mathcal{I}_1$ in the $\mathcal{J}_\Lambda$ basis will provide closed form formulas for the orthogonal superpolynomials $J_\Lambda$. In the $m_\Lambda$ basis, $J_\Lambda$ will appear as a determinant of determinants. As an aside, we point out that the determinantal formula for $p_\Lambda$ leads to a linear recurrence relation encoding an efficient algorithm for the computation of the coefficients $c_{\Lambda\Omega}$ entering the expansion (57).

Corollary 17. The expansion of $p_\Lambda$ is of the form
\[ p_\Lambda = \sum_{\ell=1}^{n} c_{\Lambda\Lambda^{(\ell)}}\, s_{\Lambda^{(\ell)}} , \tag{61} \]
with $c_{\Lambda\Lambda^{(n)}} = c_{\Lambda\Lambda} = 1$ and, for 1 < ℓ ≤ n,
\[ c_{\Lambda\Lambda^{(\ell-1)}} = \frac{1}{\epsilon_\Lambda - \epsilon_{\Lambda^{(\ell-1)}}} \sum_{k=\ell}^{n} c_{\Lambda\Lambda^{(k)}}\, d_{\Lambda^{(k)}\Lambda^{(\ell-1)}} . \tag{62} \]
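The recurrence (62) is a back-substitution, and a small sketch may help to see it at work. Everything below is illustrative: the data layout (a list of eigenvalues and a lower-triangular table `d`), the function names, and the sample numbers are assumptions made here, with 0-based indices replacing the labels $\Lambda^{(1)}, \dots, \Lambda^{(n)}$.

```python
from fractions import Fraction


def eigenvector_coefficients(eigenvalues, d):
    """Solve the recurrence (62) for the top element of a triangular operator.

    The operator acts as  O s_k = eigenvalues[k] s_k + sum_{j<k} d[k][j] s_j.
    Returns c with p = sum_k c[k] s_k, normalized by c[n-1] = 1.
    Regularity (distinct eigenvalues along the chain) is assumed.
    """
    n = len(eigenvalues)
    c = [Fraction(0)] * n
    c[n - 1] = Fraction(1)
    for j in range(n - 2, -1, -1):
        total = sum(c[k] * d[k][j] for k in range(j + 1, n))
        c[j] = total / (eigenvalues[n - 1] - eigenvalues[j])
    return c


def apply_operator(eigenvalues, d, c):
    """Coefficients of O p in the basis s_0, ..., s_{n-1}."""
    n = len(c)
    return [c[j] * eigenvalues[j] + sum(c[k] * d[k][j] for k in range(j + 1, n))
            for j in range(n)]


# A made-up 3 x 3 triangular operator.
eigs = [1, 2, 4]
d = [[], [3], [5, 7]]
c = eigenvector_coefficients(eigs, d)
print(c)
print(apply_operator(eigs, d, c) == [eigs[-1] * x for x in c])
```

Each coefficient costs one pass over the coefficients already found, which is why the recurrence is cheaper than expanding the determinant (59) directly.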
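We conclude this section with an elementary and surely well known proposition that we prove for lack of a reference. It provides a simple way of computing the eigenvalues, on the $p_\Lambda$, of mutually commuting operators in terms of the action of these operators on the $s_\Lambda$ basis.

Proposition 18. Let $D_t$ be a triangular operator commuting with $O_t$. Then,
\[ D_t\, p_\Lambda = \varepsilon_\Lambda\, p_\Lambda , \tag{63} \]
where $\varepsilon_\Lambda$ is the coefficient of $s_\Lambda$ in $D_t s_\Lambda$.

Proof. Let $\tilde p_\Lambda = D_t p_\Lambda / \varepsilon_\Lambda$. Then, from (57) and the fact that $D_t$ is a triangular operator, $\tilde p_\Lambda$ is seen to be of the form
\[ \tilde p_\Lambda = s_\Lambda + \sum_{\Omega \prec \Lambda} g_{\Lambda\Omega}\, s_\Omega . \tag{64} \]
Now, because $O_t$ and $D_t$ commute, we have $O_t \tilde p_\Lambda = D_t O_t p_\Lambda / \varepsilon_\Lambda = \epsilon_\Lambda \tilde p_\Lambda$. Therefore, the monic polynomial $\tilde p_\Lambda$ diagonalizes $O_t$ and, from (64), is triangularly related to the basis $\{s_\Lambda\}_\Lambda$. From the uniqueness in Theorem 16, we must have $\tilde p_\Lambda = p_\Lambda$, or $D_t p_\Lambda = \varepsilon_\Lambda p_\Lambda$.

6. The Action of $\mathcal{H}_n$

We are now ready to tackle one of our main objectives, which is to obtain common eigenfunctions of the commuting operators $\mathcal{H}_n$ and $\mathcal{I}_n$. In this section, we first study the action of the $\mathcal{H}_n$'s. We start with a very simple proposition concerning the operators $O_{ij}$ that we state without proof.

Proposition 19. If we only consider terms that are permutations of $z_i^{\lambda_i} z_j^{\lambda_j}$, we have, for i > j,
\[ O_{ij}\, z_i^{\lambda_i} z_j^{\lambda_j} = \begin{cases} z_i^{\lambda_i} z_j^{\lambda_j} & \lambda_i > \lambda_j , \\ -z_i^{\lambda_j} z_j^{\lambda_i} & \lambda_i < \lambda_j , \\ 0 & \text{otherwise,} \end{cases} \tag{65} \]
and, for i < j,
\[ O_{ij}\, z_i^{\lambda_i} z_j^{\lambda_j} = \begin{cases} z_i^{\lambda_j} z_j^{\lambda_i} & \lambda_i > \lambda_j , \\ -z_i^{\lambda_i} z_j^{\lambda_j} & \lambda_i < \lambda_j , \\ 0 & \text{otherwise.} \end{cases} \tag{66} \]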
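Proposition 19 can be checked directly from (24) by expanding $(z_i^a z_j^b - z_i^b z_j^a)/(z_i - z_j)$ as a finite geometric sum. The following sketch does this for a single pair of variables; the function names and the encoding of a term as a pair (power of $z_i$, power of $z_j$) are illustrative assumptions.

```python
def o_ij_terms(i, j, a, b):
    """All terms of O_ij z_i^a z_j^b as {(power of z_i, power of z_j): coeff}."""
    if a == b:
        return {}
    sign = 1 if a > b else -1
    low, high = min(a, b), max(a, b)
    terms = {}
    for k in range(high - low):
        pi, pj = low + k, high - 1 - k
        # Prefactor z_i when j < i and z_j when j > i, as in (24).
        if j < i:
            pi += 1
        else:
            pj += 1
        terms[(pi, pj)] = sign
    return terms


def permutation_terms(i, j, a, b):
    """Keep only the terms whose exponents are a permutation of (a, b)."""
    return {e: c for e, c in o_ij_terms(i, j, a, b).items() if e in ((a, b), (b, a))}


# Case i > j of (65), then case i < j of (66), each for both orderings of the powers.
print(permutation_terms(2, 1, 3, 1), permutation_terms(2, 1, 1, 3))
print(permutation_terms(1, 2, 3, 1), permutation_terms(1, 2, 1, 3))
```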
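Lemma 20. Let λ be a partition, and let $\lambda^R$ be λ in reverse order. Then
\[ \mathcal{H}_n z^{\lambda^R} = \varepsilon_{n,\lambda}\, z^{\lambda^R} + \sum_{\mu<\lambda^R ;\, \mu^+ \neq \lambda} a_{n,\mu}\, z^\mu , \tag{67} \]
with $\varepsilon_{n,\lambda}$ given explicitly by the formula
\[ \varepsilon_{n,\lambda} = \sum_{i=1}^{N} \left( \overline{\lambda^R}_i \right)^n , \tag{68} \]
where the symbol $\bar\gamma_i$ was introduced in Property 10, and where $\lambda^R_i$ stands for $(\lambda^R)_i$.

Proof. The lemma will hold if we can demonstrate that, for $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$, we have
\[ D_i z^{\lambda^R} = \overline{\lambda^R}_i\, z^{\lambda^R} + \sum_{\mu<\lambda^R ;\, \mu^+ \neq \lambda} a_\mu\, z^\mu . \tag{69} \]
Using Property 10, for this to be true we only need to show that terms of the type $z^\mu$, where $\mu^+ = \lambda$, never occur (except for $z^{\lambda^R}$). Since $\lambda^R \le \mu$ for any µ such that $\mu^+ = \lambda$, this is indeed seen to be true.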
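The special action of $\mathcal{H}_n$ on $z^{\lambda^R}$ induces a triangularity on $m_\Lambda$.

Theorem 21. Let $\Lambda^* = (\lambda_1, \dots, \lambda_N) = \lambda$. Then,
\[ \mathcal{H}_n m_\Lambda = \varepsilon_{n,\lambda}\, m_\Lambda + \sum_{\Omega <_h \Lambda} a^{(n)}_{\Lambda\Omega}\, m_\Omega , \tag{70} \]
with $\varepsilon_{n,\lambda}$ given in Lemma 20.

Proof. We have that $\theta_{\{1,\dots,m\}} z^{\Lambda} = \pm \mathcal{K}_\sigma\, \theta_I z^{\lambda^R}$, for some $\sigma \in S_N$ and some $I \subseteq \{1, \dots, N\}$. Since $\mathcal{H}_n$ commutes with $\mathcal{K}_{ij}$ and $\theta_I$, we therefore obtain, using Eq. (16) and Lemma 20,
\[ \begin{aligned} \mathcal{H}_n m_\Lambda &= \pm \frac{1}{f_\Lambda} \sum_{\sigma' \in S_N} \mathcal{K}_{\sigma'} \mathcal{K}_\sigma\, \theta_I\, \mathcal{H}_n z^{\lambda^R} \\ &= \pm \frac{1}{f_\Lambda} \sum_{\sigma' \in S_N} \mathcal{K}_{\sigma'} \mathcal{K}_\sigma\, \theta_I \Big( \varepsilon_{n,\lambda}\, z^{\lambda^R} + \sum_{\mu<\lambda^R ;\, \mu^+ \neq \lambda} a_{n,\mu}\, z^\mu \Big) \\ &= \varepsilon_{n,\lambda}\, m_\Lambda + \sum_{\Omega <_h \Lambda} a^{(n)}_{\Lambda\Omega}\, m_\Omega , \end{aligned} \]
which proves the theorem.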
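The next theorem was proven in [5] (cf. Theorem 10 therein).

Theorem 22. There exists a unique basis $\{\mathcal{J}_\Lambda\}_\Lambda$ of $P^{S_N}$ such that
\[ \mathcal{H}_2\, \mathcal{J}_\Lambda = \varepsilon_{\lambda,2}\, \mathcal{J}_\Lambda \qquad\text{and}\qquad \mathcal{J}_\Lambda = m_\Lambda + \sum_{\Omega <_h \Lambda} v_{\Lambda\Omega}\, m_\Omega , \tag{71} \]
where $\lambda = \Lambda^*$.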
The explicit action of $\mathcal{H}_2$ on the supermonomial basis, $m_\Lambda$, was computed in [5]. In view of Theorem 16, a determinantal formula giving $\mathcal{J}_\Lambda$ in terms of supermonomials immediately followed. It should be noted that an efficient algorithm to evaluate such a determinant can be found in Corollary 17. We now show that $\mathcal{J}_\Lambda$ is an eigenfunction of $\mathcal{H}_n$, for all n.

Theorem 23. With $\Lambda^* = \lambda$, we have
\[ \mathcal{H}_n\, \mathcal{J}_\Lambda = \varepsilon_{\lambda,n}\, \mathcal{J}_\Lambda = \sum_{i=1}^{N} \left( \overline{\lambda^R}_i \right)^n \mathcal{J}_\Lambda . \tag{72} \]
Furthermore, for µ a partition such that µ ≠ λ, there exists at least one n such that $\varepsilon_{\lambda,n} \neq \varepsilon_{\mu,n}$.

Proof. Given Theorem 22, formula (70) and the fact that $\mathcal{H}_n$ and $\mathcal{H}_2$ commute, the first part of the theorem follows immediately from Proposition 18. The second part of the theorem is obvious because the $\varepsilon_{\lambda,n}$ (n = 1, 2, . . . ) are polynomials in β whose constant terms $\varepsilon_{\lambda,n}\big|_{\beta=0} = \sum_{i=1}^{N} \lambda_i^n$ are such that if $\varepsilon_{\lambda,n}\big|_{\beta=0} = \varepsilon_{\mu,n}\big|_{\beta=0}$ for all n = 1, 2, . . . , then λ = µ. Therefore, $\varepsilon_{\lambda,n}$ and $\varepsilon_{\mu,n}$, considered as functions of a generic parameter β, cannot be equal for all n when µ ≠ λ.

The last theorem implies that the superpolynomials $\mathcal{J}_\Lambda$ and $\mathcal{J}_\Omega$ associated to distinct superpartitions such that $\Lambda^* = \Omega^*$ share the same eigenvalues. Therefore, additional commuting operators need to be diagonalized in order to lift the degeneracies. These are the $\mathcal{I}_n$ charges, whose action is considered in the following section.

7. The Action of $\mathcal{I}_n$

Let $I_n$ (as opposed to $\mathcal{I}_n$ of (30)) be the operator
\[ I_n = \sum_{i=1}^{N} \mathcal{K}_{1i}\, \theta_1 \frac{\partial}{\partial\theta_1} D_1^n . \tag{73} \]
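The following proposition states that $\mathcal{I}_n$ and $I_n$ are equivalent (up to a constant) on $P^{S_N}$.

Proposition 24. We have, for $f \in P^{S_N}$,
\[ \mathcal{I}_n f = (N-1)!\; I_n f . \tag{74} \]

Proof. The symmetric group can be factorized in the following way:
\[ \sum_{\sigma\in S_N} \mathcal{K}_\sigma = \Big( \sum_{i=1}^{N} \mathcal{K}_{1i} \Big) \Big( \sum_{w\in S_{\{2,\dots,N\}}} \mathcal{K}_w \Big) . \tag{75} \]
Therefore, since for $w \in S_{\{2,\dots,N\}}$, $\mathcal{K}_w$ commutes with $\theta_1 \frac{\partial}{\partial\theta_1} D_1^n$, and since $\mathcal{K}_w$ leaves f invariant, the proposition follows.

We now introduce a subspace of $P^{S_N}$.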
Definition 25. For a superpartition Λ in the m-fermion sector, $L_\Lambda$ is given by
\[ L_\Lambda = \mathrm{Span}\left\{ z^\mu \,\middle|\, \bar\mu \le \Lambda \right\} = \mathrm{Span}\left\{ K_\sigma K_w z^{\Omega} \,\middle|\, \sigma \in S_m ,\ w \in S_{\{m+1,\dots,N\}} ,\ \text{and } \Omega \le \Lambda \right\} , \]
where as usual Ω is a superpartition.

This subspace has the following property.

Lemma 26. Let Λ be a superpartition in the m-fermion sector, and let i ∈ {1, . . . , m}. Then,
\[ D_i (L_\Lambda) \subseteq L_\Lambda . \tag{76} \]

Proof. From the action of $D_i$ given in Property 10, the lemma is false only if, for $\bar\mu \le \Lambda$, there exist, in $D_i z^\mu$, some terms of the type $z^\nu$, where $\nu^+ = \Lambda^*$ and $\bar\nu \not\le \Lambda$. Given that by definition, $D_i$ decomposes into the blocks $O_{ij}$, it is thus sufficient to limit ourselves to the terms of $O_{ij} z^\mu$ considered in Proposition 19. If j ∈ {1, . . . , m}, these terms are seen to be of the type $z^\nu$, with $\bar\nu = \bar\mu \le \Lambda$, and hence belong to $L_\Lambda$. On the other hand, when j ∈ {m + 1, . . . , N}, since i ∈ {1, . . . , m}, we are always in the case j > i of Proposition 19, in which case these terms are of the type $z^{T_{ij}\mu}$. The lemma then follows because, from Corollary 13, $\overline{T_{ij}\mu} \le \Lambda$ whenever $\bar\mu \le \Lambda$.

Theorem 27. We have
\[ \mathcal{I}_n m_\Lambda = \epsilon_{\Lambda,n}\, m_\Lambda + \sum_{\Omega < \Lambda} b^{(n)}_{\Lambda\Omega}\, m_\Omega . \tag{77} \]

Proof. We will prove the equivalent statement that $I_n$ (see Proposition 24) acts triangularly. We have
\[ I_n m_\Lambda = \sum_{i=1}^{N} \mathcal{K}_{1i}\, \theta_1 \frac{\partial}{\partial\theta_1} D_1^n\, \frac{1}{f_\Lambda} \sum_{w\in S_N} \mathcal{K}_w\, \theta_{\{1,\dots,m\}} z^{\Lambda} . \tag{78} \]
We will now focus on the part of this expression involving $\theta_{\{1,\dots,m\}}$ to the left. This term is of the form
\[ I_n m_\Lambda \big|_{\theta_{\{1,\dots,m\}}} = \frac{1}{f_\Lambda} \sum_{w\in S_m} (-1)^{\ell(w)+1} \sum_{i=1}^{m} K_{1i} D_1^n z^{w\Lambda} , \tag{79} \]
where ℓ(w) is the length of the permutation w. This is because, if $\kappa_w \theta_{\{1,\dots,m\}}$ does not contain $\theta_1$, the term will be annihilated by $\partial/\partial\theta_1$. Now, if $\kappa_w \theta_{\{1,\dots,m\}}$ contains $\theta_1$, then $\kappa_{1i} \kappa_w \theta_{\{1,\dots,m\}}$ contains $\theta_i$, and thus $\kappa_{1i} \kappa_w \theta_{\{1,\dots,m\}}$ will certainly not be equal to $\pm\theta_{\{1,\dots,m\}}$ if i > m. Therefore i ≤ m, which means that w needs to belong to $S_m$ for $\kappa_{1i} \kappa_w \theta_{\{1,\dots,m\}}$ to be equal to $\pm\theta_{\{1,\dots,m\}}$. Finally, with $\kappa_{1i} \kappa_w \theta_{\{1,\dots,m\}} = (-1)^{\ell(w)+1} \theta_{\{1,\dots,m\}}$, formula (79) is seen to hold. Now, if $w \in S_m$, then $z^{w\Lambda} \in L_\Lambda$, which implies, from Lemma 26, that $D_i^n z^{w\Lambda} \in L_\Lambda$. Therefore, since $K_{1i}$, for i = 1, . . . , m, also preserves $L_\Lambda$ (by definition), we have that $I_n m_\Lambda |_{\theta_{\{1,\dots,m\}}}$ belongs to $L_\Lambda$. Therefore, any $z^{\Omega}$ in $I_n m_\Lambda |_{\theta_{\{1,\dots,m\}}}$, for Ω a superpartition, will be such that Ω ≤ Λ, which proves the theorem.

Note that the eigenvalues $\epsilon_{\Lambda,n}$ are not given explicitly. However, $\epsilon_{\Lambda,1}$ is obtained in the next theorem. This theorem also characterizes the precise action of $\mathcal{I}_1$ on monomials, if we discard coefficients that will not be needed in the sequel.
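Theorem 28. The action of $\mathcal{I}_1$ on the monomials is the following:
\[ \mathcal{I}_1 m_\Lambda = \epsilon_{\Lambda,1}\, m_\Lambda + \sum_{\Omega <_t \Lambda} b_{\Lambda\Omega}\, m_\Omega + \sum_{\Omega <_h \Lambda} c_{\Lambda\Omega}\, m_\Omega , \tag{80} \]
with
\[ \epsilon_{\Lambda,1} = (N-1)! \left[ \sum_{i=1}^{m} \Lambda_i - \beta \left( m(m-1) + \#_\Lambda \right) \right] , \tag{81} \]
where $\#_\Lambda$ is the number of pairs (i, j) such that i ∈ {1, . . . , m}, j ∈ {m + 1, . . . , N} and $\Lambda_i < \Lambda_j$, and with
\[ b_{\Lambda\Omega} = \begin{cases} (N-1)!\; \beta\; \mathrm{sgn}(\sigma^a_{T_{ij}\Lambda})\; n_{\Omega^s}(\Lambda_i) & \text{if } \Omega = \overline{T_{ij}\Lambda} \text{ for some } i < j , \\ 0 & \text{otherwise} \end{cases} \tag{82} \]
(the $c_{\Lambda\Omega}$'s are left undetermined).

The action of $\mathcal{I}_1$ given in (80) displays two types of subleading terms, each type being characterized by one of the two specializations of the Bruhat ordering. As in the action of $\mathcal{H}_n$, we recover terms that are h-ordered. But in addition, there appear terms that are t-ordered, labeled by superpartitions Ω such that $\Omega^* = \Lambda^*$. It is precisely the superpolynomials associated to such superpartitions that were $\mathcal{H}_n$-degenerate. It is because they can now be compared that the action of $\mathcal{I}_1$ lifts the degeneracies, as we will see in the lemma that follows this theorem.

Proof. To simplify the analysis, we again work with $I_1$ instead of $\mathcal{I}_1$, and focus on the coefficient of $\theta_{\{1,\dots,m\}}$. From (79), this coefficient is given by
\[ I_1 m_\Lambda \big|_{\theta_{\{1,\dots,m\}}} = \frac{1}{f_\Lambda} \sum_{i=1}^{m} K_{1i} D_1 \sum_{w\in S_m} (-1)^{\ell(w)+1} z^{w\Lambda} . \tag{83} \]
From the Hecke algebra relations, and $(1i) = \sigma_{i-1} \cdots \sigma_1 \cdots \sigma_{i-1}$, we obtain
\[ K_{1i} D_1 = D_i K_{1i} + \beta \sum_{j=1}^{i-1} K_{(\sigma_{i-1} \cdots \sigma_1)_j\, \sigma_2 \cdots \sigma_{i-1}} , \tag{84} \]
where the symbol $(\;)_j$ means that $\sigma_j$ does not belong to the product in parenthesis. Now, the transposition (1i) contains an odd number of elementary transpositions, while all the terms of the form $(\sigma_{i-1} \cdots \sigma_1)_j\, \sigma_2 \cdots \sigma_{i-1}$ contain an even number of such transpositions. With $\sum_{w\in S_m} (-1)^{\ell(w)+1} z^{w\Lambda}$ being totally antisymmetric in the first m variables, we thus obtain
\[ \begin{aligned} I_1 m_\Lambda \big|_{\theta_{\{1,\dots,m\}}} &= \frac{1}{f_\Lambda} \sum_{i=1}^{m} \left( D_i - \beta (i-1) \right) \sum_{w\in S_m} (-1)^{\ell(w)} z^{w\Lambda} \\ &= \frac{1}{f_\Lambda} \sum_{w\in S_m} (-1)^{\ell(w)} K_w \sum_{i=1}^{m} \left( D_i - \beta (i-1) \right) z^{\Lambda} , \end{aligned} \tag{85} \]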
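where we have used the fact that $\sum_{i=1}^{m} D_i$ commutes with any $K_w$ such that $w \in S_m$. We have, from (23),
\[ \sum_{i=1}^{m} \left( D_i - \beta (i-1) \right) = \sum_{i=1}^{m} z_i \frac{\partial}{\partial z_i} + \beta \sum_{1\le i<j\le m} (O_{ij} + O_{ji}) + \beta \sum_{i=1}^{m} \sum_{j=m+1}^{N} O_{ij} - \beta m(m-1) . \tag{86} \]
With $O_{ij} + O_{ji} = 0$, this leads to
\[ I_1 m_\Lambda \big|_{\theta_{\{1,\dots,m\}}} = \Big[ \sum_{i=1}^{m} \Lambda_i - \beta m(m-1) \Big]\, m_\Lambda \big|_{\theta_{\{1,\dots,m\}}} + \frac{\beta}{f_\Lambda} \sum_{w\in S_m} (-1)^{\ell(w)} K_w \sum_{i=1}^{m} \sum_{j=m+1}^{N} O_{ij}\, z^{\Lambda} . \tag{87} \]
The last term of this expression becomes (if we do not consider terms that are not permutations of Λ)
\[ \frac{\beta}{f_\Lambda} \sum_{w\in S_m} (-1)^{\ell(w)} K_w \sum_{i=1}^{m} \sum_{j=m+1}^{N} O_{ij}\, z^{\Lambda} = - \frac{\beta\, \#_\Lambda}{f_\Lambda} \sum_{w\in S_m} (-1)^{\ell(w)} K_w z^{\Lambda} + \frac{\beta}{f_\Lambda} \sum_{w\in S_m} (-1)^{\ell(w)} K_w \sum_{(i,j) ;\, \Lambda_i > \Lambda_j} z^{T_{ij}\Lambda} , \tag{88} \]
where (i, j) is considered to be such that i ∈ {1, . . . , m} and j ∈ {m + 1, . . . , N}. Putting everything together, we get
\[ I_1 m_\Lambda = \epsilon_{\Lambda,1}\, m_\Lambda + \frac{\beta}{f_\Lambda} \sum_{w\in S_N} \mathcal{K}_w\, \theta_1 \cdots \theta_m \sum_{(i,j) ;\, \Lambda_i > \Lambda_j} z^{T_{ij}\Lambda} + \sum_{\Omega <_h \Lambda} c_{\Lambda\Omega}\, m_\Omega . \tag{89} \]
If $\Omega = \overline{T_{ij}\Lambda}$, the coefficient of $m_\Omega$ in the last formula is then given by
\[ \begin{aligned} \beta\, \mathrm{sgn}(\sigma^a_{T_{ij}\Lambda})\, \frac{f_\Omega}{f_\Lambda}\, n_{\Lambda^s}(\Lambda_j) &= \beta\, \mathrm{sgn}(\sigma^a_{T_{ij}\Lambda})\, \frac{n_{\Omega^s}(\Lambda_i)!\; n_{\Omega^s}(\Lambda_j)!}{n_{\Lambda^s}(\Lambda_j)!\; n_{\Lambda^s}(\Lambda_i)!}\, \left( n_{\Omega^s}(\Lambda_j) + 1 \right) \\ &= \beta\, \mathrm{sgn}(\sigma^a_{T_{ij}\Lambda})\, \frac{n_{\Omega^s}(\Lambda_i)}{n_{\Omega^s}(\Lambda_j) + 1}\, \left( n_{\Omega^s}(\Lambda_j) + 1 \right) \\ &= \beta\, \mathrm{sgn}(\sigma^a_{T_{ij}\Lambda})\, n_{\Omega^s}(\Lambda_i) , \end{aligned} \tag{90} \]
since $n_{\Lambda^s}(\Lambda_i) = n_{\Omega^s}(\Lambda_i) - 1$ and $n_{\Lambda^s}(\Lambda_j) = n_{\Omega^s}(\Lambda_j) + 1$.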
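Lemma 29. The triangular operator $\mathcal{I}_1$ is regular, that is, $\epsilon_{\Lambda,1} \neq \epsilon_{\Omega,1}$ if $\Omega <_t \Lambda$.

Proof. $\Omega <_t \Lambda$ means that Ω differs from Λ by the application of a sequence of $T_{ij}$'s, with i ≤ m and j > m. It follows that $\sum_{i=1}^{m} \Omega_i < \sum_{i=1}^{m} \Lambda_i$, so that the constant term in $\epsilon_{\Omega,1}$, viewed as a polynomial in β, is strictly smaller than $\epsilon_{\Lambda,1}|_{\beta=0}$. This readily implies that $\epsilon_{\Lambda,1} \neq \epsilon_{\Omega,1}$.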
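Lemma 30. The action of $\mathcal{I}_1$ on $\mathcal{J}_\Lambda$ is triangular with respect to the t-ordering, that is,
\[ \mathcal{I}_1\, \mathcal{J}_\Lambda = \epsilon_{\Lambda,1}\, \mathcal{J}_\Lambda + \sum_{\Omega <_t \Lambda} b_{\Lambda\Omega}\, \mathcal{J}_\Omega , \tag{91} \]
where the coefficients $b_{\Lambda\Omega}$ are given in (82).

Proof. From Theorem 22 and because the order $\le_h$ is weaker than the Bruhat order ≤, we have
\[ \mathcal{J}_\Lambda = m_\Lambda + \sum_{\Omega<\Lambda ;\, \Omega^* \neq \Lambda^*} v_{\Lambda\Omega}\, m_\Omega , \tag{92} \]
and thus, from (80),
\[ \begin{aligned} \mathcal{I}_1\, \mathcal{J}_\Lambda &= \epsilon_{\Lambda,1}\, m_\Lambda + \sum_{\Omega <_t \Lambda} b_{\Lambda\Omega}\, m_\Omega + \sum_{\Omega<\Lambda ;\, \Omega^* \neq \Lambda^*} c_{\Lambda\Omega}\, m_\Omega \\ &= \epsilon_{\Lambda,1}\, \mathcal{J}_\Lambda + \sum_{\Omega <_t \Lambda} b_{\Lambda\Omega}\, \mathcal{J}_\Omega + \sum_{\Omega<\Lambda ;\, \Omega^* \neq \Lambda^*} d_{\Lambda\Omega}\, \mathcal{J}_\Omega , \end{aligned} \tag{93} \]
since from (92) we get the inverse relation
\[ m_\Lambda = \mathcal{J}_\Lambda + \sum_{\Omega<\Lambda ;\, \Omega^* \neq \Lambda^*} w_{\Lambda\Omega}\, \mathcal{J}_\Omega . \tag{94} \]
It now suffices to show that the coefficients $d_{\Lambda\Omega}$ in (93) do in fact vanish. Since $\mathcal{I}_1$ commutes with $\mathcal{H}_n$, $\mathcal{I}_1 \mathcal{J}_\Lambda$ must be an eigenfunction of $\mathcal{H}_n$ with eigenvalue $\varepsilon_{\Lambda^*,n}$ (n = 1, 2, . . . ). From Theorem 23, the expansion of $\mathcal{I}_1 \mathcal{J}_\Lambda$ in terms of $\mathcal{J}_\Omega$ can thus only contain terms such that $\Omega^* = \Lambda^*$.

8. The Orthogonal Jack Superpolynomials $J_\Lambda$

Theorem 31. There exists a unique basis $\{J_\Lambda\}_\Lambda$ of $P^{S_N}$ such that
\[ \mathcal{I}_1 J_\Lambda = \epsilon_{\Lambda,1}\, J_\Lambda \qquad\text{and}\qquad J_\Lambda = \mathcal{J}_\Lambda + \sum_{\Omega <_t \Lambda} u_{\Lambda\Omega}\, \mathcal{J}_\Omega . \tag{95} \]

Proof. Using Lemmas 29 and 30, the theorem follows immediately from Theorem 16.

From Theorem 16, Lemma 30 and the explicit expression of the eigenvalues $\epsilon_{\Lambda,1}$ given in Theorem 28, a determinantal formula, giving $J_\Lambda$ in terms of $\mathcal{J}_\Omega$, can be obtained. Moreover, by Corollary 17, a recurrence is provided for the coefficients $u_{\Lambda\Omega}$. Note that because the eigenvalues $\epsilon_{\Lambda,1}$ and the coefficients $b_{\Lambda\Omega}$ given in Theorem 28 do not depend upon N, apart from a factorizable overall prefactor (N − 1)!, the coefficients $u_{\Lambda\Omega}$ are N-independent. Moreover, since the expansion of $\mathcal{J}_\Lambda$ in the supermonomial basis is N-independent, this holds true for the supermonomial decomposition of $J_\Lambda$. This is illustrated in Appendix A.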
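Given Theorem 23, the previous theorem has the following corollary.

Corollary 32.
\[ \mathcal{H}_n J_\Lambda = \varepsilon_{\Lambda^*,n}\, J_\Lambda . \tag{96} \]
Furthermore, if $\Lambda^* \neq \Omega^*$, then there exists at least one n such that $\varepsilon_{\Lambda^*,n} \neq \varepsilon_{\Omega^*,n}$.

We now show that $J_\Lambda$ is also an eigenfunction of $\mathcal{I}_n$, for all n.

Theorem 33. We have
\[ \mathcal{I}_n J_\Lambda = \epsilon_{\Lambda,n}\, J_\Lambda . \tag{97} \]
Furthermore, if $\Lambda^a \neq \Omega^a$, then there exists at least one n such that $\epsilon_{\Lambda,n} \neq \epsilon_{\Omega,n}$.

Proof. Given Theorem 31, formula (77) and the fact that $\mathcal{I}_n$ and $\mathcal{I}_1$ commute, the first part of the theorem follows immediately from Proposition 18. The second part of the theorem is obvious because the $\epsilon_{\Lambda,n}$'s (n = 1, 2, . . . ) are polynomials in β whose constant terms $\epsilon_{\Lambda,n}\big|_{\beta=0} = \sum_{i=1}^{m} \Lambda_i^n$ are such that if $\epsilon_{\Lambda,n}\big|_{\beta=0} = \epsilon_{\Omega,n}\big|_{\beta=0}$ for n = 1, 2, . . . , then $\Lambda^a = \Omega^a$. Hence, $\epsilon_{\Lambda,n}$ and $\epsilon_{\Omega,n}$, considered as functions of a generic parameter β, cannot be equal when $\Lambda^a \neq \Omega^a$.

Now, it is obvious that if $\Lambda^* = \Omega^*$ and $\Lambda^a = \Omega^a$, then Λ = Ω. Therefore, using Corollary 32, the previous theorem has the following corollary.

Corollary 34. The polynomial $J_\Lambda$ is the unique common eigenfunction of the operators $\mathcal{H}_n$ and $\mathcal{I}_\ell$ (n, ℓ = 1, 2, . . . ), with respective eigenvalues $\varepsilon_{\Lambda^*,n}$ and $\epsilon_{\Lambda,\ell}$.

We thus have immediately, since the operators $\mathcal{H}_n$ and $\mathcal{I}_\ell$, n, ℓ = 1, 2, . . . , are self-adjoint with respect to the scalar product $\langle\,.\,,.\,\rangle_\beta$, the orthogonality of the basis $\{J_\Lambda\}_\Lambda$.

Theorem 35. The basis $\{J_\Lambda\}_\Lambda$ of $P^{S_N}$ satisfies
\[ \langle J_\Lambda , J_\Omega \rangle_\beta = c_\Lambda(\beta)\, \delta_{\Lambda\Omega} , \tag{98} \]
where $c_\Lambda(\beta)$ is some function of β (to be determined in the next section).

9. Symmetrizing the Non-Symmetric Jack Polynomials in Superspace

For a composition λ, the non-symmetric Jack polynomials, $E_\lambda$, are the unique polynomials in the variables $z_1, \dots, z_N$ satisfying
\[ E_\lambda = z^\lambda + \sum_{\mu<\lambda} c_{\lambda\mu}(\beta)\, z^\mu , \qquad\text{and}\qquad \langle E_\lambda , E_\mu \rangle_\beta \propto \delta_{\lambda\mu} . \tag{99} \]
The non-symmetric Jack polynomial $E_\lambda$ is an eigenfunction of the Dunkl operators,
\[ D_i E_\lambda = \bar\lambda_i\, E_\lambda , \tag{100} \]
where the eigenvalue $\bar\lambda_i$ is given in (44). This property characterizes $E_\lambda$ uniquely [6]. Using the orthogonality of the non-symmetric Jack polynomials, the polynomials $E_{\lambda,I} = \theta_I E_\lambda$ are immediately seen to form an orthogonal basis of $P$. These polynomials are in fact the unique common eigenfunctions of the operators $D_i$ and $\theta_i \frac{\partial}{\partial\theta_i}$, for i = 1, . . . , N.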
A basis of the space of SN -symmetric polynomials in the variables z1 , . . . , zN is given by the Jack polynomials. The following formula is known [6, 7]: Jλ+ ∝ Kw Eλ+ , (101) w∈SN
where Jλ+ is the Jack polynomial indexed by the partition λ+ . On the operatorial side, the Jack polynomials are the unique common eigenfunctions of the operators Hn = N n i=1 Di (n = 1, 2, . . . ). Other polynomials obtained from the non-symmetric Jack polynomials have been studied in [8, 9]. We are particularly interested in those obtained by antisymmetrizing the first m variables and symmetrizing the remaining ones. Namely, given partitions λ and µ with m and N − m parts respectively, let8 S(λ,µ) =
(−1)m(m−1)/2 Kσ Kw (−1)(σ ) E(λR ,µR ) , fµ
(102)
σ ∈Sm w∈Smc
where E(λR ,µR ) is the non-symmetric Jack polynomial indexed by the concatenation of the compositions λR (recall that this is the partition λ in reversed order) and µR (that is, the adjunction of the entries of µR to the right of those of λR without rearrangement), and where Smc stands for the permutations of {m + 1, . . . , N} (or the permutations of SN that leave 1, . . . , m fixed). Property 36 [8]. The polynomials S(λ,µ) are orthogonal with respect to the scalar product , β , the norm being given explicitly by S(λ,µ) , S(λ,µ) β =
m!(N − m)! d(λR ,µ) d(λR ,µR ) E(λR ,µR ) , E(λR ,µR ) β , d(λ fµ R ,µR ) d(λ,µR )
(103)
where for a composition γ (in this case given by the concatenation of two compositions) dγ = [a(i, j ) + 1 + β( l(i, j ) + 1)], (i,j )∈γ
dγ
=
[a(i, j ) + 1 + β l(i, j )],
(104)
(i,j )∈γ
with
\[ a(i,j) = \gamma_i - j, \qquad l(i,j) = \#\{k = 1,\dots,i-1 \mid j \le \gamma_k + 1 \le \gamma_i\} + \#\{k = i+1,\dots,N \mid j \le \gamma_k \le \gamma_i\}, \tag{105} \]
and
\[ \langle E_\gamma, E_\gamma \rangle_\beta = \prod_{1\le i<j\le N}\ \prod_{p=0}^{\beta-1} \left( \frac{\bar\gamma_j - \bar\gamma_i + p}{\bar\gamma_j - \bar\gamma_i - p - 1} \right)^{\epsilon(\bar\gamma_j - \bar\gamma_i)}, \tag{106} \]
where $\bar\gamma_i$ is the eigenvalue of (44) and ε(x) = 1 if x > 0 and −1 otherwise.

8 In [8], the antisymmetrization (resp. symmetrization) is performed on the last m variables (resp. first N − m variables). This does not affect in any meaningful way the properties of these polynomials. For instance, formula (103) can be extracted easily from a similar formula of [8].
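As a small illustration of the combinatorial quantities entering (104), the following sketch evaluates the arm and leg functions and the two products for a given composition, using exact rational arithmetic. The function names and the `primed` flag are hypothetical conveniences, not notation from the text; the leg function follows the two counting conditions quoted above, with 1-based cell indices.

```python
from fractions import Fraction

def arm(gamma, i, j):
    # a(i, j) = gamma_i - j, with 1-based row i and column j
    return gamma[i - 1] - j

def leg(gamma, i, j):
    # l(i, j) = #{k < i : j <= gamma_k + 1 <= gamma_i}
    #         + #{k > i : j <= gamma_k     <= gamma_i}
    gi = gamma[i - 1]
    above = sum(1 for k in range(1, i) if j <= gamma[k - 1] + 1 <= gi)
    below = sum(1 for k in range(i + 1, len(gamma) + 1) if j <= gamma[k - 1] <= gi)
    return above + below

def d_gamma(gamma, beta, primed=False):
    # Product over the cells (i, j) of gamma of a + 1 + beta * (l + 1),
    # or of a + 1 + beta * l when primed=True (the second product in (104)).
    beta = Fraction(beta)
    shift = 0 if primed else 1
    result = Fraction(1)
    for i, gi in enumerate(gamma, start=1):
        for j in range(1, gi + 1):
            result *= arm(gamma, i, j) + 1 + beta * (leg(gamma, i, j) + shift)
    return result
```

For instance, for the two compositions (1, 0) and (0, 1) the single cell has leg 0 and 1 respectively, so the unprimed products are 1 + β and 1 + 2β.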
P. Desrosiers, L. Lapointe, P. Mathieu
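We now build a basis of $P^{S_N}$ from the non-symmetric Jack polynomials.

Definition 37. Given a superpartition Λ = (Λ^a; Λ^s),
\[ \tilde J_\Lambda = \frac{(-1)^{m(m-1)/2}}{f_{\Lambda^s}} \sum_{w\in S_N} K_w\, \theta_{\{1,\dots,m\}}\, E_{((\Lambda^a)^R,(\Lambda^s)^R)}. \tag{107} \]

Proposition 38. We have
\[ \tilde J_\Lambda = \sum_{w\in S_N/(S_m\times S_{m^c})} K_w\, \theta_{\{1,\dots,m\}}\, S_{(\Lambda^a,\Lambda^s)}. \tag{108} \]

Proof. From the definition of $\tilde J_\Lambda$, we get
\[ \begin{aligned} \tilde J_\Lambda &= \frac{(-1)^{m(m-1)/2}}{f_{\Lambda^s}} \sum_{w\in S_N/(S_m\times S_{m^c})} \sum_{\sigma\in S_m} \sum_{\rho\in S_{m^c}} K_w K_\sigma K_\rho\, \theta_{\{1,\dots,m\}}\, E_{((\Lambda^a)^R,(\Lambda^s)^R)} \\ &= \sum_{w\in S_N/(S_m\times S_{m^c})} K_w\, \theta_{\{1,\dots,m\}} \times \frac{(-1)^{m(m-1)/2}}{f_{\Lambda^s}} \sum_{\sigma\in S_m} \sum_{\rho\in S_{m^c}} K_\sigma K_\rho\, (-1)^{\ell(\sigma)}\, E_{((\Lambda^a)^R,(\Lambda^s)^R)}, \end{aligned} \tag{109} \]
which gives the desired result from the definition of $S_{(\Lambda^a,\Lambda^s)}$.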
Note that the left coset representatives of SN /(Sm × Smc ) can be described as min(m,N−m) (110) Ki1 ,ji · · · Kik ,jk , 1≤i1
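with the understanding that when k = 0, the product of K factors reduces to the identity.

Proposition 39. We have
\[ \tilde J_\Lambda = m_\Lambda + \sum_{\Omega<\Lambda} a_{\Lambda\Omega}(\beta)\, m_\Omega, \tag{111} \]
that is, $\tilde J_\Lambda$ is monic and triangularly related (with respect to the Bruhat ordering on superpartitions) to the monomial superfunction basis.

Proof. The monicity of $\tilde J_\Lambda$, given the monicity of S(λ,µ) [8], follows from Proposition 38. Now, let $\Lambda_R$ stand for $((\Lambda^a)^R, (\Lambda^s)^R)$. From Definition 37 and (99), the coefficient of θ1 · · · θm in $\tilde J_\Lambda$ is
\[ \begin{aligned} \tilde J_\Lambda \big|_{\theta_1\cdots\theta_m} &= \frac{(-1)^{m(m-1)/2}}{f_{\Lambda^s}} \sum_{\sigma\in S_m} \sum_{w\in S_{m^c}} K_\sigma K_w\, (-1)^{\ell(\sigma)}\, E_{\Lambda_R} \\ &= \frac{(-1)^{m(m-1)/2}}{f_{\Lambda^s}} \sum_{\sigma\in S_m} \sum_{w\in S_{m^c}} K_\sigma K_w\, (-1)^{\ell(\sigma)} \sum_{\Omega\le\Lambda_R} c_{\Lambda_R\Omega}(\beta)\, z^{\Omega}, \end{aligned} \tag{112} \]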
where in the last equality, the ordering is on compositions. To obtain the monomial superfunctions that appear in the expansion of $\tilde J_\Lambda$, we must simply select the superpartitions that arise as powers of z in this last equality. Because of the nature of the sums over σ and w, the only superpartition such that ∗ = ∗ that can arise is . Finally, since any rearrangement of a composition such that < R and ∗ = ∗ is also such that < , the proposition is seen to hold.

The orthogonality of the $\tilde J_\Lambda$'s is almost immediate from Proposition 38.

Proposition 40. We have
\[ \langle \tilde J_\Lambda, \tilde J_\Omega \rangle_\beta = \delta_{\Lambda\Omega}\, \frac{N!}{m!\,(N-m)!}\, \langle S_{(\Lambda^a,\Lambda^s)}, S_{(\Lambda^a,\Lambda^s)} \rangle_\beta, \tag{113} \]
where $\langle S_{(\Lambda^a,\Lambda^s)}, S_{(\Lambda^a,\Lambda^s)} \rangle_\beta$ is given explicitly in (103).

Proof. Using Proposition 38, we have
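\[ \langle \tilde J_\Lambda, \tilde J_\Omega \rangle_\beta = \Bigl\langle \sum_{w\in G} K_w\, \theta_{\{1,\dots,m\}}\, S_{(\Lambda^a,\Lambda^s)},\ \sum_{\sigma\in G} K_\sigma\, \theta_{\{1,\dots,m\}}\, S_{(\Omega^a,\Omega^s)} \Bigr\rangle_\beta, \tag{114} \]
where G is a set of left coset representatives of SN/(Sm × Smc). Since $K_\sigma^\dagger = K_{\sigma^{-1}}$, this gives
\[ \langle \tilde J_\Lambda, \tilde J_\Omega \rangle_\beta = \sum_{w,\sigma\in G} \Bigl\langle K_{\sigma^{-1}} K_w\, \theta_{\{1,\dots,m\}}\, S_{(\Lambda^a,\Lambda^s)},\ \theta_{\{1,\dots,m\}}\, S_{(\Omega^a,\Omega^s)} \Bigr\rangle_\beta. \tag{115} \]
Now, $K_{\sigma^{-1}} K_w$ must belong to Sm × Smc for the product of θ's to be the same on both sides. Therefore, since this only occurs for w = σ,
\[ \langle \tilde J_\Lambda, \tilde J_\Omega \rangle_\beta = \#G\ \langle S_{(\Lambda^a,\Lambda^s)}, S_{(\Omega^a,\Omega^s)} \rangle_\beta, \tag{116} \]
which proves the proposition since the cardinality of G is N!/(m! (N − m)!).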
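The counting used at the end of this proof can be checked by brute force for small N. The sketch below is only an illustration (the function name is hypothetical): it uses the fact that two permutations w and w' lie in the same left coset of Sm × Smc exactly when they map the set {1, . . . , m} to the same set, so the cosets are labelled by m-element subsets of {1, . . . , N}.

```python
from itertools import permutations
from math import factorial

def coset_count(N, m):
    # A permutation w is stored as a tuple with w[i-1] = w(i).
    # Left cosets w (S_m x S_{m^c}) are labelled by the image set w({1..m}),
    # since right-multiplying by an element of S_m x S_{m^c} only permutes
    # {1..m} and {m+1..N} among themselves before w is applied.
    images = {frozenset(w[:m]) for w in permutations(range(1, N + 1))}
    return len(images)
```

For N = 4 and m = 2 this gives six cosets, in agreement with N!/(m! (N − m)!).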
We now make the connection with the family {JΛ} introduced before.

Theorem 41. For any superpartition Λ, we have $J_\Lambda = \tilde J_\Lambda$.

Proof. Both families are triangular with respect to the same ordering (namely, the Bruhat ordering on compositions) when expanded in the supermonomial basis. Since the Gram-Schmidt orthonormalization procedure ensures that there exists at most one orthonormal family with such triangularity, the two families can only differ by a constant. The theorem then follows from the monicity of both families.
10. Conclusion
This work has many natural generalizations. The most direct one is to consider the rational counterpart of the above results. The orthogonal eigenfunctions of the supersymmetric rational CMS model can be obtained in a rather direct way using the present results and the remarkable relation, preserved in the supersymmetric case, that exists between the eigenfunctions of the trigonometric and rational models [13]. This leads to a closed form expression (a determinant of determinants) for the orthogonal generalized Hermite (or Hi-Jack) superpolynomials. These results will be presented in [14].

Another rather immediate line of generalizations would be to examine the supersymmetric extension of the r/tCMS models associated to root systems of any type. To find the corresponding orthogonal superpolynomials, one would proceed as follows: take the Dunkl operators of the corresponding exchange version of the r/tCMS model of interest, look for their non-symmetric eigenfunctions, dress them with a fermionic monomial prefactor and symmetrize the result with respect to both types of variables. For instance, the generalized Jacobi and Laguerre polynomials in superspace could be constructed in this way. The resulting superpolynomial would be a linear combination (with θI coefficients) of the corresponding version (Jacobi or Laguerre) of the generalized polynomials with mixed symmetry [9]. Note, however, that the norm of these special polynomials has not yet been computed. The conserved charges of the model would be constructed exactly as in the present case, by symmetrizing the Dunkl operators raised to the nth power and multiplied by a fermionic prefactor.

Along these lines, the formulation of the supersymmetric extension of the elliptic (eCMS) model appears to be direct. The elliptic Dunkl operators are given in [15]. Their supersymmetric lift is immediate, leading directly to an expression for the Hamiltonian of the seCMS model. Again, the general form of the charges is bound to be similar to the one found in this article. However, in this case, little is known about the eigenfunctions.

The q-deformations of the Jack polynomials are the Macdonald polynomials [3], eigenfunctions (up to a conjugation) of the Ruijsenaars-Schneider model [16], a relativistic version of the CMS model. Again, there exist q-analogues of the Dunkl operators, giving a natural road for the formulation of the supersymmetric Ruijsenaars-Schneider model. Moreover, the non-symmetric Macdonald polynomials being known, they can be lifted to orthogonal Macdonald superpolynomials.

Further avenues regarding future works concern the study of properties of the Jack superpolynomials JΛ themselves. On that matter, we already have strong indications that these objects have rather remarkable properties. In particular, the Pieri formulas appear to be rather nice. Moreover, exploratory analyses indicate that products of the JΛ have a combinatorial interpretation in terms of novel types of supertableaux. Finally, a natural problem that should not be out of reach at this stage is working out the superspace extension of the operator construction of [1].

Acknowledgement. This work was supported by NSERC. L.L. wishes to thank Luc Vinet for his support, and P.D. is grateful to the Fondation J.A Vincent for a student fellowship.
A. Examples of Jack Superpolynomials
In this appendix, we present a detailed calculation of one Jack polynomial in superspace via determinantal formulas. Then, we give explicitly all orthogonal superpolynomials of degrees not larger than 3.

Let Λ = (3, 1; 0). It has weight 4 and lies in the 2-fermion sector. The explicit action of the conserved operator I1 in the space Span{J }
J(3,1;0) =
(118)
We now want the monomial decomposition of the previous result. For this, we must determine the action of the Hamiltonian H2 in the space Span{m }
(119)
The monomial m(2,1;1) is itself an eigenfunction of the Hamiltonian, i.e., H2 m(2,1;1) = (6 − 10β + 4Nβ) m(2,1;1) .
(120)
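Again, Theorem 16 yields the non-orthogonal superpolynomial J(3,1;0) as a determinant:
\[ J_{(3,1;0)} = \frac{1}{(-4-4\beta)} \begin{vmatrix} m_{(2,1;1)} & -4-4\beta \\ m_{(3,1;0)} & 4\beta \end{vmatrix} = m_{(3,1;0)} + \frac{\beta}{1+\beta}\, m_{(2,1;1)}. \tag{121} \]
Proceeding in the same way, we get:
\[ \begin{aligned} J_{(3,0;1)} &= \frac{1}{(-6-10\beta)(-4-4\beta)^3(-2-2\beta)} \begin{vmatrix} m_{(1,0;1^3)} & -6-10\beta & 0 & 0 & 0 & 0 \\ m_{(1,0;2,1)} & 12\beta & -4-4\beta & 0 & 0 & 0 \\ m_{(2,0;1^2)} & 6\beta & 0 & -4-4\beta & 0 & 0 \\ m_{(2,1;1)} & 0 & 0 & 0 & -4-4\beta & 0 \\ m_{(2,0;2)} & 0 & 2\beta & 4\beta & 2\beta & -2-2\beta \\ m_{(3,0;1)} & 0 & 2\beta & 8\beta & 2\beta & 2\beta \end{vmatrix} \\ &= m_{(3,0;1)} + \frac{\beta}{1+\beta}\, m_{(2,0;2)} + \frac{\beta(1+2\beta)}{2(1+\beta)^2}\, m_{(2,1;1)} + \frac{\beta(2+3\beta)}{(1+\beta)^2}\, m_{(2,0;1^2)} + \frac{\beta(1+2\beta)}{2(1+\beta)^2}\, m_{(1,0;2,1)} + \frac{3\beta^2}{(1+\beta)^2}\, m_{(1,0;1^3)} \end{aligned} \tag{122} \]
and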
Table 1. Non-orthogonal superpolynomials JΛ of weight |Λ| ≤ 3

Superpartition Λ    Superpolynomial JΛ(z, θ; 1/β)
(0)                 m(0)
(1)                 m(1)
(1^2)               m(1^2)
(2)                 m(2) + 2β/(1+β) m(1^2)
(1^3)               m(1^3)
(2, 1)              m(2,1) + 6β/(1+2β) m(1^3)
(3)                 m(3) + 3β/(2+β) m(2,1) + 6β^2/((1+β)(2+β)) m(1^3)

(0; 0)              m(0;0)
(0; 1)              m(0;1)
(1; 0)              m(1;0)
(0; 1^2)            m(0;1^2)
(1; 1)              m(1;1)
(0; 2)              m(0;2) + β/(1+β) m(1;1) + 2β/(1+β) m(0;1^2)
(2; 0)              m(2;0) + β/(1+β) m(1;1)
(0; 1^3)            m(0;1^3)
(1; 1^2)            m(1;1^2)
(0; 2, 1)           m(0;2,1) + 2β/(1+2β) m(1;1^2) + 6β/(1+2β) m(0;1^3)
(1; 2)              m(1;2) + 2β/(1+2β) m(1;1^2)
(2; 1)              m(2;1) + 2β/(1+2β) m(1;1^2)
(0; 3)              m(0;3) + β/(2+β) m(2;1) + 2β/(2+β) m(1;2) + 3β/(2+β) m(0;2,1) + 4β^2/((1+β)(2+β)) m(1;1^2) + 6β^2/((1+β)(2+β)) m(0;1^3)
(3; 0)              m(3;0) + 2β/(2+β) m(2;1) + β/(2+β) m(1;2) + 2β^2/((1+β)(2+β)) m(1;1^2)

(1, 0; 0)           m(1,0;0)
(1, 0; 1)           m(1,0;1)
(2, 0; 0)           m(2,0;0) + β/(1+β) m(1,0;1)
(1, 0; 1^2)         m(1,0;1^2)
(1, 0; 2)           m(1,0;2) + 2β/(1+2β) m(1,0;1^2)
(2, 0; 1)           m(2,0;1) + 2β/(1+2β) m(1,0;1^2)
(2, 1; 0)           m(2,1;0)
(3, 0; 0)           m(3,0;0) + β/(2+β) m(2,1;0) + 2β/(2+β) m(2,0;1) + β/(2+β) m(1,0;2) + 2β^2/((1+β)(2+β)) m(1,0;1^2)
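\[ \begin{aligned} J_{(1,0;3)} &= \frac{1}{(-6-10\beta)(-4-4\beta)^3(-2-2\beta)} \begin{vmatrix} m_{(1,0;1^3)} & -6-10\beta & 0 & 0 & 0 & 0 \\ m_{(1,0;2,1)} & 12\beta & -4-4\beta & 0 & 0 & 0 \\ m_{(2,0;1^2)} & 6\beta & 0 & -4-4\beta & 0 & 0 \\ m_{(2,1;1)} & 0 & 0 & 0 & -4-4\beta & 0 \\ m_{(2,0;2)} & 0 & 2\beta & 4\beta & 2\beta & -2-2\beta \\ m_{(1,0;3)} & 0 & 6\beta & 0 & -2\beta & 2\beta \end{vmatrix} \\ &= m_{(1,0;3)} + \frac{\beta}{1+\beta}\, m_{(2,0;2)} - \frac{\beta}{2(1+\beta)^2}\, m_{(2,1;1)} + \frac{\beta^2}{(1+\beta)^2}\, m_{(2,0;1^2)} + \frac{\beta(3+4\beta)}{2(1+\beta)^2}\, m_{(1,0;2,1)} + \frac{3\beta^2}{(1+\beta)^2}\, m_{(1,0;1^3)}. \end{aligned} \tag{123} \]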
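The monomial expansions quoted in (122) and (123) can be cross-checked numerically. The sketch below does not implement Theorem 16 itself; it only uses the elementary fact that expanding such a determinant along its column of monomials gives coefficients v satisfying Σ_i v_i A_ij = 0 for every numeric column j, and it normalizes v so that the last monomial has coefficient 1. The function names are hypothetical, and the matrix entries are those read off from the two determinants.

```python
from fractions import Fraction

def monic_kernel(A):
    # A has n rows and n-1 numeric columns, with A[i][j] = 0 for j > i.
    # Solve sum_i v[i] * A[i][j] = 0 for all j by back-substitution,
    # starting from v[n-1] = 1 (coefficient of the leading monomial).
    n = len(A)
    v = [Fraction(0)] * n
    v[n - 1] = Fraction(1)
    for j in range(n - 2, -1, -1):
        s = sum(v[i] * A[i][j] for i in range(j + 1, n))
        v[j] = -s / A[j][j]
    return v

def determinant_columns(beta):
    beta = Fraction(beta)
    a, b, c = -6 - 10 * beta, -4 - 4 * beta, -2 - 2 * beta
    common = [
        [a, 0, 0, 0, 0],                        # row of m_(1,0;1^3)
        [12 * beta, b, 0, 0, 0],                # row of m_(1,0;2,1)
        [6 * beta, 0, b, 0, 0],                 # row of m_(2,0;1^2)
        [0, 0, 0, b, 0],                        # row of m_(2,1;1)
        [0, 2 * beta, 4 * beta, 2 * beta, c],   # row of m_(2,0;2)
    ]
    last_122 = [0, 2 * beta, 8 * beta, 2 * beta, 2 * beta]   # row of m_(3,0;1)
    last_123 = [0, 6 * beta, 0, -2 * beta, 2 * beta]         # row of m_(1,0;3)
    return common + [last_122], common + [last_123]
```

The returned lists are ordered like the rows of the determinants, so the entry of index 3 is the coefficient of m(2,1;1) and the last entry is the coefficient of the leading monomial.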
Table 2. Orthogonal Jack superpolynomials JΛ of weight |Λ| ≤ 3

Superpartition Λ    Jack superpolynomial JΛ(z, θ; 1/β)
(0)                 J(0)
(1)                 J(1)
(1^2)               J(1^2)
(2)                 J(2)
(1^3)               J(1^3)
(2, 1)              J(2,1)
(3)                 J(3)

(0; 0)              J(0;0)
(0; 1)              J(0;1)
(1; 0)              J(1;0) + β/(1+β) J(0;1)
(0; 1^2)            J(0;1^2)
(1; 1)              J(1;1) + 2β/(1+2β) J(0;1^2)
(0; 2)              J(0;2)
(2; 0)              J(2;0) + β/(2+β) J(0;2)
(0; 1^3)            J(0;1^3)
(1; 1^2)            J(1;1^2) + 3β/(1+3β) J(0;1^3)
(0; 2, 1)           J(0;2,1)
(1; 2)              J(1;2) + β/(1+β) J(0;2,1)
(2; 1)              J(2;1) + β/(1+β) J(1;2) + β(1+2β)/(2(1+β)^2) J(0;2,1)
(0; 3)              J(0;3)
(3; 0)              J(3;0) + β/(3+β) J(0;3)

(1, 0; 0)           J(1,0;0)
(1, 0; 1)           J(1,0;1)
(2, 0; 0)           J(2,0;0)
(1, 0; 1^2)         J(1,0;1^2)
(1, 0; 2)           J(1,0;2)
(2, 0; 1)           J(2,0;1) + β/(1+β) J(1,0;2)
(2, 1; 0)           J(2,1;0) + β/(1+β) J(2,0;1) − β/(2(1+β)^2) J(1,0;2)
(3, 0; 0)           J(3,0;0)
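Consequently, the orthogonal Jack superpolynomial J(3,1;0) is written in the monomial basis as:
\[ \begin{aligned} J_{(3,1;0)} = {}& m_{(3,1;0)} + \frac{\beta}{1+\beta}\, m_{(3,0;1)} - \frac{\beta}{(1+\beta)(3+2\beta)}\, m_{(1,0;3)} + \frac{2\beta^2}{(1+\beta)(3+2\beta)}\, m_{(2,0;2)} + \frac{\beta(3+4\beta)}{(1+\beta)(3+2\beta)}\, m_{(2,1;1)} \\ & + \frac{6\beta^2}{(1+\beta)(3+2\beta)}\, m_{(2,0;1^2)} + \frac{2\beta^3}{(1+\beta)^2(3+2\beta)}\, m_{(1,0;2,1)} + \frac{6\beta^3}{(1+\beta)^2(3+2\beta)}\, m_{(1,0;1^3)}. \end{aligned} \tag{124} \]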
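As a consistency check on (124), one can recombine the monomial expansions of the three non-orthogonal superpolynomials (121)-(123). The sketch below assumes that the orthogonal J(3,1;0) is the combination J(3,1;0) + a J(3,0;1) + b J(1,0;3) of non-orthogonal ones, with a and b simply read off from the coefficients of m(3,0;1) and m(1,0;3) in (124); this is an assumption made for the check, and the function names and string labels are hypothetical.

```python
from fractions import Fraction

def nonorthogonal_expansions(beta):
    # Monomial coefficients of (121), (122) and (123), keyed by superpartition.
    beta = Fraction(beta)
    q = 1 + beta
    j310 = {"(3,1;0)": Fraction(1), "(2,1;1)": beta / q}
    j301 = {
        "(3,0;1)": Fraction(1),
        "(2,0;2)": beta / q,
        "(2,1;1)": beta * (1 + 2 * beta) / (2 * q**2),
        "(2,0;1^2)": beta * (2 + 3 * beta) / q**2,
        "(1,0;2,1)": beta * (1 + 2 * beta) / (2 * q**2),
        "(1,0;1^3)": 3 * beta**2 / q**2,
    }
    j103 = {
        "(1,0;3)": Fraction(1),
        "(2,0;2)": beta / q,
        "(2,1;1)": -beta / (2 * q**2),
        "(2,0;1^2)": beta**2 / q**2,
        "(1,0;2,1)": beta * (3 + 4 * beta) / (2 * q**2),
        "(1,0;1^3)": 3 * beta**2 / q**2,
    }
    return j310, j301, j103

def recombined(beta):
    # Assumed combination: a and b are taken from (124), not derived here.
    beta = Fraction(beta)
    a = beta / (1 + beta)
    b = -beta / ((1 + beta) * (3 + 2 * beta))
    total = {}
    for coeff, poly in zip((1, a, b), nonorthogonal_expansions(beta)):
        for label, value in poly.items():
            total[label] = total.get(label, 0) + coeff * value
    return total

def formula_124(beta):
    beta = Fraction(beta)
    q = 1 + beta
    r = q * (3 + 2 * beta)
    return {
        "(3,1;0)": Fraction(1),
        "(3,0;1)": beta / q,
        "(1,0;3)": -beta / r,
        "(2,0;2)": 2 * beta**2 / r,
        "(2,1;1)": beta * (3 + 4 * beta) / r,
        "(2,0;1^2)": 6 * beta**2 / r,
        "(1,0;2,1)": 2 * beta**3 / (q * r),
        "(1,0;1^3)": 6 * beta**3 / (q * r),
    }
```

With exact fractions the two dictionaries can be compared directly, so the five coefficients not used to fix a and b are tested independently.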
Finally, we give the simplest Jack superpolynomials explicitly in Tables 1 and 2. The polynomials have degrees less than 3 and 4 in θ and z respectively. In the first table,
the non-orthogonal eigenfunctions J are written in the monomial basis {m }
Commun. Math. Phys. 242, 361–392 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0947-9
Communications in
Mathematical Physics
Multi-Trace Superpotentials vs. Matrix Models
Vijay Balasubramanian1, Jan de Boer2, Bo Feng3, Yang-Hui He1, Min-xin Huang1, Vishnu Jejjala4, Asad Naqvi1
1 David Rittenhouse Laboratories, The University of Pennsylvania, 209 S. 33rd St., Philadelphia, PA 19104-6396, USA. E-mail: [email protected], [email protected], [email protected], [email protected]
2 Institute of Theoretical Physics, University of Amsterdam, Valckenierstraat 65, 1018 XE, Amsterdam, The Netherlands. E-mail: [email protected]
3 Institute for Advanced Study, Olden Lane, Princeton, NJ 08540, USA. E-mail: [email protected]
4 Institute for Particle Physics and Astrophysics, Department of Physics, Virginia Tech, Blacksburg, VA 24061, USA. E-mail: [email protected]
Received: 21 December 2002 / Accepted: 23 June 2003
Published online: 26 September 2003 – © Springer-Verlag 2003
Abstract: We consider N = 1 supersymmetric U(N) field theories in four dimensions with adjoint chiral matter and a multi-trace tree-level superpotential. We show that the computation of the effective action as a function of the glueball superfield localizes to computing matrix integrals. Unlike the single-trace case, holomorphy and symmetries do not forbid non-planar contributions. Nevertheless, only a special subset of the planar diagrams contributes to the exact result. In addition, the computation of the superpotential localizes to doing matrix integrals. In view of the results of Dijkgraaf and Vafa for single-trace theories, one might have naively expected that these matrix integrals are related to the free energy of a multi-trace matrix model. We explain why this naive identification does not work. Rather, an auxiliary single-trace matrix model with additional singlet fields can be used to exactly compute the field theory superpotential. Along the way we also describe a general technique for computing the large-N limits of multi-trace matrix models and raise the challenge of finding the field theories whose effective actions they may compute. Since our models can be treated as N = 1 deformations of pure N = 2 gauge theory, we show that the effective superpotential that we compute also follows from the N = 2 Seiberg-Witten solution. Finally, we observe an interesting connection between multi-trace local theories and non-local field theory.

1. Introduction
Dijkgraaf and Vafa have recently made the remarkable proposal that the superpotential and other holomorphic data of N = 1 supersymmetric gauge theories in four dimensions can be computed from an auxiliary matrix model [16–18]. While the original proposal arose from consideration of stringy dualities arising in the context of geometrically engineered field theories, two recent papers have suggested direct field theory proofs of the proposal [19, 12].
These works considered U(N) gauge theories with an adjoint chiral matter multiplet Φ and a tree-level superpotential $W(\Phi) = \sum_k g_k \mathrm{Tr}(\Phi^k)$. Using somewhat different techniques ([19] uses properties of superspace perturbation theory while [12] relies on factorization of chiral correlation functions, symmetries, and the Konishi anomaly) these papers conclude that:
1. The computation of the effective superpotential as a function of the glueball superfield reduces to computing matrix integrals.
2. Because of holomorphy and symmetries (or properties of superspace perturbation theory), only planar Feynman diagrams contribute.
3. These diagrams can be summed up by the large-N limit of an auxiliary matrix model. The field theory effective action is obtained as a derivative of the matrix model free energy.
Various generalizations and extensions of these ideas (e.g., N = 1∗ theories [23, 24], fundamental matter [3, 36], quantum moduli spaces [9], non-supersymmetric cases [20], other gauge groups [32, 33, 5, 26], baryonic matter [4, 7], gravitational corrections [35, 21], and Seiberg Duality [27, 28]) have been considered in the recent literature. A stringent and simple test of the Dijkgraaf-Vafa proposal and of the proofs presented in [19, 12] is to consider superpotentials containing multi-trace terms such as
\[ W(\Phi) = g_2\, \mathrm{Tr}(\Phi^2) + g_4\, \mathrm{Tr}(\Phi^4) + \tilde g_2\, (\mathrm{Tr}(\Phi^2))^2. \tag{1} \]
We will show that for such multi-trace theories: 1. The computation of the effective superpotential as a function of the glueball superfield still reduces to computing matrix integrals. 2. Holomorphy and symmetries do not forbid non-planar contributions; nevertheless only a certain subset of the planar diagrams contributes to the effective superpotential. 3. This subclass of planar graphs also contributes to the large-N limit of an associated multi-trace matrix model. However, because of differences in combinatorial factors, the field theory effective superpotential is not obtained simply as a derivative of the multi-trace matrix model free energy as one might have naively expected. 4. However, multi-trace theories can be linearized in traces by the addition of auxiliary singlet fields Ai . The superpotentials for these theories as a function of both the Ai and the glueball can be exactly computed from an associated matrix model following [18]. Integrating out Ai from this superpotential reproduces the desired field theory results. The plan of this paper is as follows. In Sect. 2 we carefully analyze the methods of [19] and generalize them so that they apply to an N = 1 U (N ) gauge theory in four dimensions with a tree-level superpotential of the form (1). Along the way we introduce some new techniques that deepen our understanding of the selection rules determining which perturbative field theory diagrams contribute to the effective superpotential of an N = 1 field theory. Using this understanding we demonstrate how the conclusions (1) and (2) above arise and show that contributing diagrams are tree-like graphs in which single-trace diagrams are pasted together by double-trace vertices through which no momentum flows. We illustrate our results by explicitly computing the superpotential to the first few orders in perturbation theory. Finally, we observe an intriguing connection between multi-trace local theories and non-local field theory. 
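The idea in item 4, that a double-trace term can be traded for a term linear in traces at the cost of a singlet field, can be illustrated with a toy calculation. The sketch below is not the construction used later in the paper: it assumes one common choice of linearized function, A x − A²/(4g̃), where x stands for the value of Tr(Φ²) and g̃ for the double-trace coupling, and checks that eliminating A through its stationarity condition returns g̃ x². All names are hypothetical.

```python
from fractions import Fraction

def linearized(A, x, g):
    # Toy function, linear in x = Tr(Phi^2), with an auxiliary singlet A.
    # The specific form A*x - A^2/(4g) is an assumption made for illustration.
    return A * x - A * A / (4 * g)

def integrate_out_singlet(x, g):
    # d/dA [A x - A^2/(4g)] = x - A/(2g) = 0  gives  A = 2 g x.
    A_star = 2 * g * x
    return linearized(A_star, x, g)
```

Substituting the stationary value A = 2 g̃ x gives 2 g̃ x² − g̃ x² = g̃ x², that is, the double-trace term itself.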
Since the field content of pure N = 2 supersymmetric U(N) gauge theory in four dimensions consists in N = 1 language of a vector multiplet Wα and an adjoint chiral multiplet Φ, the superpotential (1) can be treated as a deformation of an N = 2 theory to an N = 1 theory. Hence, we can use global symmetries, holomorphy, regularity conditions, and the Seiberg-Witten solution of N = 2 gauge theory to compute the exact superpotential. We carry out this procedure in Sect. 3, using the fact that the vacuum expectation value of the product of chiral operators ⟨Tr(Φ²)²⟩ factorizes as ⟨Tr(Φ²)⟩². We show that the result exactly captures the subset of the planar diagrams that contribute to the exact field theory superpotential. The assumption of factorization in the Seiberg-Witten analysis is equivalent to the vanishing of a certain subset of planar diagrams in our perturbative computations.

In Sect. 4 we demonstrate a general technique for solving U(M) matrix models (or general complex matrix models) with multi-trace potentials. The essential observation, following Das, Dhar, Sengupta, and Wadia [15], is that in the large-M limit, mean field methods can be used to solve for the effect on a single matrix eigenvalue of the rest of the matrix. We explain the general method and solve two examples in detail. The first example has a potential $V(\Phi) = M\bigl(g_2 \mathrm{Tr}(\Phi^2) + g_4 \mathrm{Tr}(\Phi^4) + \frac{\tilde g_2}{M} \mathrm{Tr}(\Phi^2)^2\bigr)$ for Φ ∈ U(M). As a further illustration of the mean field technique for computing large-M limits, we study a matrix model with a general quartic potential. By expanding the exact large-M result in powers of the couplings we demonstrate how this limit computes the data relevant for a certain subset of the planar contributions to the effective action of the field theory with the tree-level superpotential in (1).

In the proposal of Dijkgraaf and Vafa [18] and the subsequent generalizations (e.g., [23–28]) with single-trace superpotentials, the field theory effective action was related simply to the free energies of auxiliary single-trace matrix models and their derivatives. Here we explain in detail why the superpotential resulting from a multi-trace action is not computed in a like manner from a multi-trace matrix model. There is nevertheless a clean matrix model prescription for computing the field theory superpotential.
Multi-trace field theories can be linearized in traces by the introduction of new singlet fields which can be integrated out to produce the multi-trace theory. In Sect. 5 we show how this procedure is carried out and relate the resulting linearized superpotential to a matrix model following the techniques of [19]. In the matrix model integrating out the singlets at the level of the free energy reproduces the multi-trace results that do not agree with the field theory. However, integrating out after computing the linearized field theory superpotential leads to exact agreement. It is worth mentioning several further reasons why multi-trace superpotentials are interesting. First of all, the general deformation of a pure N = 2 field theory to an N = 1 theory with adjoint matter involves multi-trace superpotentials, and therefore these deformations are important to understand. What is more, multi-trace superpotentials cannot be geometrically engineered [8] in the usual manner for a simple reason: in geometric engineering of gauge theories the tree-level superpotential arises from a disc diagram for open strings on a D-brane and these, having only one boundary, produce single-trace terms. In this context, even if multi-trace terms could be produced by quantum corrections, their coefficients would be determined by the tree-level couplings and would not be freely tunable. Hence comparison of the low-energy physics arising from multi-trace superpotentials with the corresponding matrix model calculations is a useful probe of the extent to which the Dijkgraaf-Vafa proposal is tied to its geometric and D-brane origins. In addition to these motivations, it is worth recalling that the double scaling limit of the U (N ) matrix model with a double-trace potential is related to a theory of two-dimensional gravity with a cosmological constant. This matrix model also displays phase transitions between smooth, branched polymer and intermediate phases [15]. 
It would be interesting to understand whether and how these phenomena manifest themselves as effects in a four dimensional field theory. The results of our paper suggest that these phase transitions and the physics of two-dimensional cosmological constants are embedded within four-dimensional field theory. It would be interesting to explore
this. Finally, multi-trace deformations of field theories have recently made an appearance in the contexts of the AdS/CFT correspondence and a proposed definition of string theories with a nonlocal worldsheet theory [2].

2. Multi-Trace Superpotentials from Perturbation Theory
In this section we begin by reviewing the field theoretic proof that when treated as a function of the glueball superfield, the effective superpotential of an N = 1 supersymmetric gauge theory with single-trace tree-level interactions is computed by planar matrix diagrams [19, 12]. We will then describe how these arguments are modified by the presence of multi-trace terms in the tree-level action. Finally, we will explicitly illustrate our reasoning by perturbatively computing the diagrams that contribute to the effective superpotential of a multi-trace theory up to third order in the couplings. We will always work around a vacuum with unbroken U(N) symmetry.

2.1. A Schematic Review of the Field Theory Superpotential Computation. Below we give a schematic description of the methods of [19] for the computation of the effective superpotential of an N = 1 field theory. While [19] discussed theories with single-trace Lagrangians, we will find that most of their arguments will generalize easily to multi-trace theories.

1. The Action: The matter action for an N = 1 U(N) gauge theory with a vector multiplet V, a massive chiral superfield Φ, and superpotential W(Φ), is given in superspace by
\[ S(\Phi,\bar\Phi) = \int d^4x\, d^4\theta\ \bar\Phi\, e^{V} \Phi + \int d^4x\, d^2\theta\ W(\Phi) + \text{h.c.} \tag{2} \]
2. The Goal: We seek to compute the effective superpotential as a function of the glueball superfield
\[ S = \frac{1}{32\pi^2}\, \mathrm{Tr}(W^\alpha W_\alpha), \tag{3} \]
where
\[ W_\alpha = i \bar D^2 e^{-V} D_\alpha e^{V} \tag{4} \]
is the gauge field strength of V, with $D_\alpha = \partial/\partial\theta^\alpha$ and $\bar D_{\dot\alpha} = \partial/\partial\bar\theta^{\dot\alpha} + i\theta^\alpha \partial_{\alpha\dot\alpha}$ the superspace covariant derivatives, and $D^2 = \frac12 D^\alpha D_\alpha$ and $\bar D^2 = \frac12 \bar D^{\dot\alpha} \bar D_{\dot\alpha}$. The gluino condensate S is a commuting field constructed out of a pair of fermionic operators Wα.
3. The Power of Holomorphy: We are interested in expressing the effective superpotential in terms of the chiral glueball superfield S. Holomorphy tells us that it will be independent of the parameters of the anti-holomorphic part of the tree-level superpotential. Therefore, without loss of generality, we can choose a particularly simple form for $\bar W(\bar\Phi)$:
\[ \bar W(\bar\Phi) = \frac12\, \bar m\, \bar\Phi^2. \tag{5} \]
Integrating out the anti-holomorphic fields and performing standard superspace manipulations as discussed in Sect. 2 of [19], gives
\[ S = \int d^4x\, d^2\theta \left( -\frac{1}{2\bar m}\, \Phi \bigl( \Box - i W^\alpha D_\alpha \bigr) \Phi + W_{tree}(\Phi) \right) \tag{6} \]
as the part of the action that is relevant for computing the effective potential as a function of S. Here, $\Box = \frac12 \partial_{\alpha\dot\alpha} \partial^{\alpha\dot\alpha}$ is the d'Alembertian, and Wtree is the tree-level superpotential, expanded as $\frac12 m \Phi^2$ + interactions. (The reader may consult Sect. 2 of [19] for a discussion of various subtleties such as why the □ can be taken as the ordinary d'Alembertian as opposed to a gauge covariantized □cov.)
4. The Propagator: After reduction into the form (6), the quadratic part gives the propagator. We write the covariant derivative in terms of Grassmann momentum variables
\[ D_\alpha = \partial/\partial\theta^\alpha := -i\pi_\alpha, \tag{7} \]
and it has been shown in [19] that by rescaling the momenta we can put $\bar m = 1$ since all $\bar m$ dependence cancels out. Then the momentum space representation of the propagator is simply
\[ \int_0^\infty ds_i\ \exp\bigl( -s_i (p_i^2 + W^\alpha \pi_{i\alpha} + m) \bigr), \tag{8} \]
where si is the Schwinger time parameter of the i-th Feynman propagator. Here the precise form of the W^α πα depends on the representation of the gauge group that is carried by the field propagating in the loop.
5. Calculation of Feynman Diagrams: The effective superpotential as a function of the glueball S is a sum of vacuum Feynman diagrams computed in the background of a fixed constant Wα leading to insertions of this field along propagators. In general there will be ℓ momentum loops, and the corresponding momenta must be integrated over yielding the contribution
\[ \begin{aligned} I &= \int \prod_i ds_i\, e^{-s_i m} \cdot \prod_a d^4p_a\, e^{-s_i p_i^2} \cdot \prod_b d^2\pi_b\, e^{-s_i W^\alpha \pi_{i\alpha}} \\ &= \int \prod_i ds_i\, e^{-s_i m} \cdot I_{boson} \cdot I_{fermion} \\ &= I_{boson} \cdot I_{fermion} \cdot \frac{1}{m^P} \end{aligned} \tag{9} \]
to the overall amplitude. We will explain the origin of the last line later. Here a, b label momentum loops, while i = 1, . . . , P labels propagators. In going from the second to the third line, we have assumed that the si dependence in Iboson · Ifermion cancels. The momenta in the propagators are linear combinations of the loop momenta because of momentum conservation.
6. Bosonic Momentum Integrations: The bosonic contribution can be expressed as
\[ I_{boson} = \int \prod_{a=1}^{\ell} \frac{d^4p_a}{(2\pi)^4}\ \exp\Bigl[ -\sum_{a,b} p_a M_{ab}(s)\, p_b \Bigr] = \frac{1}{(4\pi)^{2\ell}}\, \frac{1}{(\det M(s))^2}, \tag{10} \]
where we have defined the momentum of the i-th propagator in terms of the independent loop momenta pa,
\[ p_i = \sum_a L_{ia}\, p_a \tag{11} \]
via the matrix elements Lia ∈ {0, ±1} and
\[ M_{ab}(s) = \sum_i s_i L_{ia} L_{ib}. \tag{12} \]
7. Which Diagrams Contribute: Since each momentum loop comes with two fermionic πα integrations (9) a non-zero amplitude will require the insertion of 2ℓ πα's. From (8) we see that πα insertions arise from the power series expansion of the fermionic part of the propagator and that each πα is accompanied by a Wα. So in total we expect an amplitude containing 2ℓ factors of Wα. Furthermore, since we wish to compute the superpotential as a function of S ∼ Tr(W^α Wα) each index loop can only have zero or two Wα insertions. These considerations together imply that if a diagram contributes to the effective superpotential as a function of the S, then the number of index loops h must be greater than or equal to the number of momentum loops ℓ, i.e.,
\[ h \ge \ell. \tag{13} \]
8. Planarity: The above considerations are completely general. Now let us specialize to U(N) theories with single-trace operators. A diagram with ℓ momentum loops has
\[ h = \ell + 1 - 2g \tag{14} \]
index loops, where g is the genus of the surface generated by 't Hooft double line notation. Combining this with (13) tells us that g = 0, i.e., only planar diagrams contribute.
9. Doing The Fermionic Integrations: First let us discuss the combinatorial factors that arise from the fermionic integrations. Since the number of momentum loops is one less than the number of index loops, we must choose which of the latter to leave free of Wα insertions. This gives a combinatorial factor of h, and the empty index loop gives a factor of N from the sum over color. For each loop with two Wα insertions we get a factor of $\frac12 W^\alpha W_\alpha = 16\pi^2 S$. Since we are dealing with adjoint matter, the action of Wα is through a commutator
\[ \exp\bigl( -s_i\, [W_i^\alpha, -]\, \pi_{i\alpha} \bigr) \tag{15} \]
in the Schwinger term. (See the appendix of [32] for a nice explanation of this notation as it appears in [19]. In Sect. 2.2 we will give an alternative discussion of the fermionic integrations that clarifies various points.) As in the bosonic integrals above, it is convenient to express the fermionic propagator momenta as sums of the independent loop momenta:
\[ \pi_{i\alpha} = \sum_a L_{ia}\, \pi_{a\alpha}, \tag{16} \]
where the Lia are the same matrix elements as introduced above. The authors of [19] also find it convenient to introduce auxiliary fermionic variables via the equation
\[ W_i^\alpha = \sum_a L_{ia}\, W_a^\alpha. \tag{17} \]
Here, the Lia = ±1 denotes the left- or right-action of the commutator. In terms of the $W_a^\alpha$, the fermionic contribution to the amplitude can be written as
\[ I_{fermion} = N h\, (16\pi^2 S)^{\ell} \int \prod_a d^2\pi_a\, d^2W_a\ \exp\Bigl[ -\sum_{a,b} W_a^\alpha M_{ab}(s)\, \pi_{b\alpha} \Bigr] = (4\pi)^{2\ell}\, N h\, S^{\ell}\, (\det M(s))^2. \tag{18} \]
10. Localization: The Schwinger parameter dependence in the bosonic and fermionic momentum integrations cancels exactly,
\[ I_{boson} \cdot I_{fermion} = N h\, S^{\ell}, \tag{19} \]
implying that the computation of the effective superpotential as a function of the S localizes to summing matrix integrals. All the four-dimensional spacetime dependence has washed out. The full effective superpotential Weff(S) is thus a sum over planar matrix graphs with the addition of the Veneziano-Yankielowicz term for the pure Yang-Mills theory [38]. The terms in the effective action proportional to S^ℓ arise exclusively from planar graphs with ℓ momentum loops giving a perturbative computation of the exact superpotential.
11. The Matrix Model: The localization of the field theory computation to a set of planar matrix diagrams suggests that the sum of diagrams can be computed exactly by the large-M limit of a bosonic matrix model. (We distinguish between M, the rank of the matrices in the matrix model and N, the rank of the gauge group.) The prescription of Dijkgraaf and Vafa does exactly this for single-trace superpotentials. Since the number of momentum loops is one less than the number of index loops in a planar diagram, the net result of the bosonic and fermionic integrations in (19) can be written as
\[ I_{boson} \cdot I_{fermion} = N\, \frac{\partial S^h}{\partial S}. \tag{20} \]
Because of this, the perturbative part of the effective superpotential, namely the sum over planar diagrams in the field theory, can be written in terms of the genus zero free energy F0(S) of the corresponding matrix model:
\[ W_{pert}(S) = N\, \frac{\partial}{\partial S} F_0(S), \tag{21} \]
\[ F_0(S) = \sum_h F_{0,h}\, S^h. \tag{22} \]
This free energy is conveniently isolated by taking the large-$M$ limit of the zero-dimensional one-matrix model with $M \times M$ matrices $\Phi$ (see footnote 1) and potential $W(\Phi)$, whose partition function is given by
\[ Z = \exp(M^2 F_0) = \frac{1}{{\rm Vol}(U(M))} \int [D\Phi]\, \exp\left(-\frac{1}{g_s}\,{\rm Tr}\, W(\Phi)\right). \qquad (23) \]
1 The original papers of Dijkgraaf and Vafa [16, 17] consider $M \times M$ Hermitian matrices (i.e. matrices with real eigenvalues $\lambda_i$). In fact, we should think of the matrices as belonging to $GL(M, \mathbb{C})$ with eigenvalues distributed along contours in the complex plane rather than along domains on the real axis. The prior results do not depend crucially on this point. Indeed, they carry through exactly by analytic continuation. We thank David Berenstein for emphasizing this to us.
In this matrix model every index loop gives a power of $M$ just as in the field theory computation, and all but one index loop gives a power of $S$. Because of this simple fact the powers of the gluino condensate in the field theory superpotential can be conveniently counted by identifying it with the ’t Hooft coupling $S \equiv M g_s$, and then differentiating the matrix model free energy as in (21). Rather surprisingly, the Veneziano-Yankielowicz term in $W_{\rm eff}(S)$ arises from the volume factor in the integration over matrices in (23). One important unanswered question is why the low-energy dynamics simplifies so much when written in terms of the gluino condensates.

2.2. Computation of a Multi-Trace Superpotential. We have reviewed above how the field theory calculation of the effective superpotential for a single-trace theory localizes to a matrix model computation. In this subsection we show how the argument is modified when the tree-level superpotential includes multi-trace terms. We consider an $\mathcal{N} = 1$ theory with the tree-level superpotential
\[ W_{\rm tree} = \frac{1}{2}\,{\rm Tr}(\Phi^2) + g_4\,{\rm Tr}(\Phi^4) + \tilde g_2\,({\rm Tr}(\Phi^2))^2. \qquad (24) \]
To set the stage for our perturbative computation of the effective superpotential we begin by analyzing the structure of the new diagrams introduced by the double-trace term. If $\tilde g_2 = 0$, the connected diagrams we get are the familiar single-trace ones; we will call these primitive diagrams. When $\tilde g_2 \neq 0$, propagators in primitive diagrams can be spliced together by new double-trace vertices. It is useful to do an explicit example to see how this splicing occurs. As an example, let us study the expectation value of the double-trace operator $\langle {\rm Tr}(\Phi^2)\,{\rm Tr}(\Phi^2) \rangle$. To lowest order in couplings, the two ways to contract the $\Phi$s give rise to the two diagrams in Fig. 1. When we draw these diagrams in double line notation, we find that Fig. 1a, corresponding to ${\rm Tr}(\Phi\Phi)\,{\rm Tr}(\Phi\Phi)$ with the $\Phi$s contracted within each trace, has four index loops, while Fig. 1b, corresponding to ${\rm Tr}(\Phi\Phi)\,{\rm Tr}(\Phi\Phi)$ with the $\Phi$s contracted between the two traces, has only two index loops. Both these graphs have two momentum loops.

Fig. 1. Two ways, (a) and (b), in which the double-trace operator ${\rm Tr}(\Phi^2)\,{\rm Tr}(\Phi^2)$ can be contracted using the vertex shown in (c)

Fig. 2. With the inclusion of the double-trace term we need new types of vertices. These can be obtained from the “primitive” diagrams associated with the pure single-trace superpotential by (a) pasting or (b) pinching. The vertices have been marked with a cross

Fig. 3. More examples of “pinched” diagrams

For our purposes both of these Feynman diagrams can also be generated by a simple pictorial algorithm: we splice together propagators of primitive diagrams using the vertex in Fig. 1c, as displayed in Fig. 2a and b. All graphs of the double-trace theory can be generated from primitive diagrams by this simple algorithm. Note that the number of index loops never changes when primitive diagrams are spliced by this pictorial algorithm. If a splicing of diagrams does not create a new momentum loop we say that the diagrams have been pasted together. This happens when the diagrams being spliced are originally disconnected as, for example, in Fig. 2a. In fact, because of momentum conservation, no momentum at all flows between pasted diagrams. If a new momentum loop is created we say that the diagrams have been pinched. This happens when two propagators within an already connected diagram are spliced together as, for example, in Fig. 2b. In this example one momentum loop becomes two because momentum can flow through the double-trace vertex. Further examples of pinched diagrams are given in Fig. 3, where the new loop arises from momentum flowing between the primitive diagrams via double-trace vertices.

To make the above statement clearer, let us provide some calculations. First, according to our operation, the number of double index loops never increases, whether under pasting or pinching. Second, we can calculate the total number of independent momentum loops by $\ell = P - V + 1$, where $P$ is the number of propagators and $V$,
the number of vertices. If we connect two separate diagrams by pasting, we will have $P_{\rm tot} = (P_1 - 1) + (P_2 - 1) + 4$, $V_{\rm tot} = V_1 + V_2 + 1$ and
\[ \ell_{\rm tot} = P_{\rm tot} - V_{\rm tot} + 1 = \ell_1 + \ell_2, \qquad (25) \]
which means that the total number of momentum loops is just the sum of the individual ones. If we insert the double-trace vertex in a single connected diagram by pinching, we will have $P_{\rm tot} = P - 2 + 4$, $V_{\rm tot} = V + 1$, and
\[ \ell_{\rm tot} = P_{\rm tot} - V_{\rm tot} + 1 = \ell + 1, \qquad (26) \]
which indicates the creation of one new momentum loop. Having understood the structure of double-trace diagrams in this way, we can adapt the techniques of [19] to our case. Steps 1–6 as described in Sect. 2.1 go through without modification since they are independent of the details of the tree-level superpotential. However, Steps 7–11 are modified in various ways. First of all, naive counting of powers of fermionic momenta as in Step 7 leads to the selection rule
\[ h \geq \ell, \qquad (27) \]
where $h$ is the total number of index loops and $\ell$ is the total number of momentum loops. (The holomorphy and symmetry based arguments of [12] would lead to the same conclusion.) Since no momentum flows between pasted primitive diagrams, it is clear that this selection rule would permit some of the primitive components to be non-planar. Likewise, both planar and some non-planar pinching diagrams are admitted. An example of a planar pinching diagram that can contribute according to this rule is Fig. 2b. However, we will show in the next subsection that a more careful consideration of the structure of perturbative diagrams implies that only diagrams built by pasting planar primitive graphs give non-zero contributions to the effective superpotential.
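The loop counting in (25) and (26) can be checked mechanically. The following is a minimal sketch, assuming a connected diagram is described only by its number of propagators and vertices (with at least one vertex); the names `Diagram`, `paste` and `pinch` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagram:
    """Connected diagram, described only by propagator and vertex counts (assumed V >= 1)."""
    propagators: int  # P
    vertices: int     # V

    @property
    def momentum_loops(self) -> int:
        # Independent momentum loops of a connected diagram: l = P - V + 1
        return self.propagators - self.vertices + 1


def paste(d1: Diagram, d2: Diagram) -> Diagram:
    # Cut one propagator in each diagram and attach the four resulting
    # legs to one new double-trace vertex, as in the counting above (25).
    return Diagram((d1.propagators - 1) + (d2.propagators - 1) + 4,
                   d1.vertices + d2.vertices + 1)


def pinch(d: Diagram) -> Diagram:
    # Cut two propagators of the same diagram and attach the four legs
    # to one new double-trace vertex, as in the counting above (26).
    return Diagram(d.propagators - 2 + 4, d.vertices + 1)


if __name__ == "__main__":
    quartic = Diagram(propagators=2, vertices=1)  # single quartic vertex, two propagators
    print(quartic.momentum_loops, paste(quartic, quartic).momentum_loops,
          pinch(quartic).momentum_loops)
```

Pasting adds the loop numbers of the two components, while pinching raises the loop number by one, which is why pinched diagrams are the ones threatened by the stronger selection rule derived in the next subsection.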
2.3. Which Diagrams Contribute: Selection Rules. In order to explain which diagrams give non-zero contributions to the multi-trace superpotential it is useful to first give another perspective on the fermionic momentum integrations described in Steps 7–9 above. A key step in the argument of [19] was to split the glueball insertions up in terms of auxiliary fermionic variables associated with each of the momentum loops as in (17). We will take a somewhat different approach. In the end we want to attach zero or two fields $W^{\alpha}_{(p)}$ to each index loop, where $p$ labels the index loop, and the total number of such fields must bring down enough fermionic momenta to soak up the corresponding integrations. On each oriented propagator, with momentum $\pi_{i\alpha}$, we have a left index line which we label $p_L$ and a right index line which we label $p_R$. Because of the commutator in (15), the contribution of this propagator will be
\[ \exp\left(-s_i\, \pi_{i\alpha} \left( W^{\alpha}_{(p_L)} - W^{\alpha}_{(p_R)} \right)\right). \qquad (28) \]
Notice that we are omitting U (N ) indices, which are simply replaced by the different index loop labels. In a standard planar diagram for a single-trace theory, we have one more index loop than momentum loop. So even in this case the choice of auxiliary variables in (28) is not quite the same as in (17), since the number of Wα s is twice the number of index loops in (28) while the number of auxiliary variables is twice the number of momentum loops in (17).
Now in order to soak up the fermionic $\pi$ integrations in (9), we must expand (28) in powers and extract terms of the form
\[ W^2_{(p_1)} W^2_{(p_2)} \cdots W^2_{(p_\ell)}, \qquad (29) \]
where $\ell$ is the number of momentum loops and all the $p_i$ are distinct. The range of $p$ is over $1, \ldots, h$, with $h$ the number of index loops. In the integral over the anticommuting momenta, we have all $h$ of the $W_{(p)}$ appearing. However, one linear combination, which is the “center of mass” of the $W_{(p)}$, does not appear. This can be seen from (28): if we add a constant to all $W_{(p)}$ simultaneously, the propagators do not change. Thus, without loss of generality, one can set the $W_{(p)}$ corresponding to the outer loop in a planar diagram equal to zero. Let us assume this variable is $W_{(h)}$ and later reinstate it. All $W_{(p)}$ corresponding to inner index loops remain, leaving as many of these as there are momentum loops in a planar diagram. It is then straightforward to demonstrate that the $W$ appearing in (17) in linear combinations reproduce the relations between propagator momenta and loop momenta. In other words, in this “gauge” where the $W$ corresponding to the outer loop is zero, we recover the decomposition of $W_\alpha$ in terms of auxiliary fermions associated to momentum loops that was used in [19] and reviewed in (17) above. We can now reproduce the overall factors arising from the fermionic integrations in the planar diagrams contributing to (18). The result from the $\pi$ integrations is some constant times
\[ \prod_{p=1}^{h-1} W^2_{(p)}. \qquad (30) \]
Reinstating $W_{(h)}$ by undoing the gauge choice, namely by shifting
\[ W_{(p)} \to W_{(p)} + W_{(h)} \qquad (31) \]
for $p = 1, \ldots, h - 1$, (30) becomes
\[ \prod_{p=1}^{h-1} \left( W_{(p)} + W_{(h)} \right)^2. \qquad (32) \]
The terms in which each index loop has either zero or two $W$ insertions are easily extracted:
\[ \sum_{k=1}^{h} \Big( \prod_{p \neq k} W^2_{(p)} \Big). \qquad (33) \]
In this final result we should replace each of the $W^2_{(p)}$ by $S$, and therefore the final result is of the form
\[ h\, S^{h-1}, \qquad (34) \]
as derived in [19] and reproduced in (18). Having reproduced the result for single-trace theories we can easily show that all non-planar and pinched contributions to the multi-trace effective superpotential vanish. Consider any diagram with $\ell$ momentum loops and $h$ index loops. By the same arguments as above, we attach some $W_{(p)}$ to each index loop as in (28), and again, the “center of mass” decouples due to the commutator nature of the propagator. Therefore, in the momentum integrals, only $h - 1$ inequivalent $W_{(p)}$ appear. By doing the momentum integrals, we generate a polynomial of order $2\ell$ in the $h - 1$ inequivalent $W^{\alpha}_{(p)}$. This polynomial can be non-zero only if $\ell \leq h - 1$: $W^3_{(p)}$ is non-chiral for all $p$ and vanishes in the chiral ring. Therefore, we reach the important conclusion that the total number of index loops must be larger than the number of momentum loops,
\[ h > \ell, \qquad (35) \]
while the naive selection rule (27) says that it could be larger or equal. Consider pasting and pinching $k$ primitive diagrams together, each with $h_i$ index loops and $\ell_i$ momentum loops. According to the rules set out in the previous subsection, the total number of index loops and the total number of momentum loops are given by:
\[ h = \sum_i h_i; \qquad \ell \geq \sum_i \ell_i, \qquad (36) \]
with equality only when all the primitive diagrams are pasted together without additional momentum loops. Now the total number of independent $W$s that appear in the full diagram is $\sum_i (h_i - 1)$, since in each primitive diagram the “center of mass” $W$ will not appear. So the full diagram is non-vanishing only when
\[ \ell \leq \sum_i (h_i - 1). \qquad (37) \]
This inequality is already saturated by the momenta appearing in the primitive diagrams if they are planar. So we can conclude two things. First, only planar primitive diagrams appear in the full diagram. Second, only pasted diagrams are non-vanishing, since pinching introduces additional momentum loops which would violate this inequality.

Summary. The only diagrams that contribute to the effective multi-trace superpotential are pastings of planar primitive diagrams. These are tree-like diagrams which string together double-trace vertices with “propagators” and “external legs” which are themselves primitive diagrams of the single-trace theory. Below we will explicitly evaluate such diagrams and raise the question of whether there is a generating functional for them.

2.4. Summing Pasted Diagrams. In the previous section we generalized Steps 7 and 8 of the single-trace case in Sect. 2.1 to the double-trace theory, and found that the surviving diagrams consist of planar connected primitive vacuum graphs pasted together with double-trace vertices. Because of momentum conservation, no momentum can flow through the double-trace vertices in such graphs. Consequently the fermionic integrations and the proof of localization can be carried out separately for each primitive graph, and the entire diagram evaluates to a product of the primitive components times a suitable power of $\tilde g_2$, the double-trace coupling. Let $G_i$, $i = 1, \ldots, k$ be the planar primitive graphs that have been pasted together, each with $h_i$ index loops and $\ell_i = h_i - 1$ momentum loops, to make a double-trace diagram $G$. Then, using the result (19) for the single-trace case, the Schwinger parameters in the bosonic and fermionic momentum integrations cancel, giving a factor
\[ I_{\rm boson} \cdot I_{\rm fermion} = \prod_i \left( N h_i S^{h_i - 1} \right) = N^{k}\, S^{\sum_i (h_i - 1)} \prod_i h_i, \qquad (38) \]
where the last factor arises from the number of ways in which the glueballs $S$ can be inserted into the propagators of each primitive diagram. Defining $C(G) = \prod_i h_i$ as the glueball symmetry factor, $k(G)$ as the number of primitive components, $h(G) = \sum_i h_i$ as the total number of index loops and $\ell(G) = \sum_i \ell_i = h(G) - k(G)$ as the total number of momentum loops, we get
\[ I_{\rm boson} \cdot I_{\rm fermion} = \prod_i \left( N h_i S^{h_i - 1} \right) = N^{h(G) - \ell(G)}\, S^{\ell(G)}\, C(G). \qquad (39) \]
We can assemble this with the Veneziano-Yankielowicz contribution for pure gauge theory [38] to write the complete glueball effective action as
\[ W_{\rm eff} = -N S \left( \log(S/\Lambda^2) - 1 \right) + \sum_G C(G)\, F(G)\, N^{h(G) - \ell(G)}\, S^{\ell(G)}, \qquad (40) \]
where $F(G)$ is the combinatorial factor for generating the graph $G$ from the Feynman diagrams of the double-trace theory. Notice that in our discussion we have set $g_2 = m = 1$, so $\Lambda^2$ in this equation is in fact $m \Lambda^2$, which matches the dimension of $S$. We can define a free energy related to the above diagrams as
\[ F_0 = \sum_G F(G)\, S^{h(G)}. \qquad (41) \]
$F_0$ is a generating function for the diagrams that contribute to the effective superpotential, but does not include the combinatorial factors arising from the glueball insertions. In the single-trace case that combinatorial factor was simply $N h(G)$ and so we could write $W_{\rm eff} = N(\partial F_0/\partial S)$. Here $C(G) = \prod_i h_i$ is a product rather than a sum $h(G) = \sum_i h_i$, and so the effective superpotential cannot be written as a derivative of the free energy. Notice that if we rescale $\tilde g_2$ to $\tilde g_2/N$, there will be a factor $N^{-(k(G)-1)}$ from the $k(G) - 1$ insertions of the double-trace vertex. This factor will change the $N^{h(G) - \ell(G)}$ dependence in (40) to just $N$ for every diagram. This implies that the matrix diagrams contributing to the superpotential are exactly those that survive the large-$M$ limit of a bosonic $U(M)$ matrix model with a potential
\[ V(\Phi) = g_2\,{\rm Tr}(\Phi^2) + g_4\,{\rm Tr}(\Phi^4) + \frac{\tilde g_2}{M}\,{\rm Tr}(\Phi^2)\,{\rm Tr}(\Phi^2). \qquad (42) \]
In Sect. 4 we will compute the large-$M$ limit of such a matrix model and obtain the free energy $F_0$ in this way. Below we will compute the effective action (40) to the first few orders. In Sect. 3 we will show that it is reproduced by an analysis based on the Seiberg-Witten solution of $\mathcal{N} = 2$ gauge theories. In the single-trace case Dijkgraaf and Vafa argued that the large-$N$ limit of an associated single-trace matrix model carries out the sum in (40), or equivalently, that the matrix model free energy provides a generating function for the perturbative series of matrix diagrams contributing to the exact field theory superpotential. In Sect. 4 we will show that the well known double-trace matrix models that have large-$N$ limits do sum up the same “planar pasted diagrams” that we described above and give the free energy defined by (41). However, this multi-trace matrix model will not reproduce the combinatorial factors $C(G)$ appearing in (40). Rather, in Sect. 5 we show how an auxiliary single-trace matrix model with additional singlet fields can be used to exactly compute the double-trace field theory superpotential.
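The bookkeeping of (38) and (39), together with the effect of rescaling the double-trace coupling by $1/N$, can be illustrated with a short script. This is only a sketch under the assumption that a pasted diagram is specified by the list of index-loop counts $h_i$ of its planar primitive components; the function names are hypothetical.

```python
from math import prod


def pasted_diagram_data(h_list):
    """Data of a diagram pasted from planar primitives with index-loop counts h_list."""
    k = len(h_list)        # number of primitive components k(G)
    h = sum(h_list)        # total index loops h(G)
    loops = h - k          # l(G): each planar primitive has h_i - 1 momentum loops
    c = prod(h_list)       # glueball symmetry factor C(G)
    return {"k": k, "h": h, "l": loops, "C": c, "N_power": h - loops}


def n_power_after_rescaling(h_list):
    # A tree of k primitives contains k - 1 double-trace vertices; with the
    # coupling rescaled by 1/N each of them contributes one factor of 1/N.
    data = pasted_diagram_data(h_list)
    return data["N_power"] - (data["k"] - 1)


if __name__ == "__main__":
    for components in ([2, 2], [3, 2], [2, 2, 2, 2]):
        print(components, pasted_diagram_data(components),
              n_power_after_rescaling(components))
```

For every choice of components the power of $N$ after rescaling is one, in line with the statement above that each surviving diagram then scales as a single power of $N$.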
Fig. 4a,b. All two-loop primitive and pasting diagrams: (a) order $\tilde g_2$, (b) order $g_4$. The vertices have been marked with a cross

Fig. 5a–d. All three-loop primitive and pasting diagrams: (a) order $\tilde g_2^2$, (b) order $g_4 \tilde g_2$, (c) order $g_4^2$, (d) order $g_4^2$. The vertices have been marked with a cross
2.5. Perturbative Calculation. Thus equipped, let us begin our explicit perturbative calculations. We shall tabulate all combinatoric data of the pasting diagrams up to third order. Here $C(G) = \prod_i h_i$ and $F(G)$ is obtained by counting the contractions of the $\Phi$s. For pure single-trace diagrams the values of $F(G)$ have been computed in Table 1 in [11], so we can utilize their results.
2.5.1. First Order. To first order in coupling constants, all primitive (diagram (b)) and pasting (diagram (a)) diagrams are presented in Fig. 4a,b. Let us illustrate by showing the computations for (a). There is a total of four index loops and hence $h = 4$ for this diagram. Moreover, it is composed of the pasting of two primitive diagrams, each of which has $h = 2$; thus we have $C(G) = 2 \times 2 = 4$. Finally, $F = \tilde g_2$ because there is only one contraction possible, viz., ${\rm Tr}(\Phi\Phi)\,{\rm Tr}(\Phi\Phi)$. In summary we have:

diagram   h   C(G)   F(G)
(a)       4   4      $\tilde g_2$
(b)       3   3      $2 g_4$
                                        (43)
2.5.2. Second Order. To second order in the coupling all primitive ((c) and (d)) and pasting ((a) and (b)) diagrams are drawn in Fig. 5a–d and the combinatorics are summarized in (44). Again, let us do an illustrative example. Take diagram (b): there are five index loops, so $h = 5$; more precisely, it is composed of pasting a left primitive diagram with $h = 3$ and a right primitive diagram with $h = 2$, so $C(G) = 2 \times 3 = 6$. Now for $F(G)$, we need contractions of the form ${\rm Tr}(\Phi\Phi)\,{\rm Tr}(\Phi\Phi)\,{\rm Tr}(\Phi\Phi\Phi\Phi)$; there are $4 \times 2 \times 2 = 16$ ways of doing so. Furthermore, for this even overall power in the coupling, we have a minus sign when expanding out the exponent. Therefore $F(G) = -16\,\tilde g_2 g_4$ for this diagram.
Fig. 6a–m. All four-loop primitive and pasting diagrams. The vertices have been marked with a cross
In summary, we have:

diagram   h   C(G)   F(G)
(a)       6   8      $-4\,\tilde g_2^2$
(b)       5   6      $-16\,\tilde g_2 g_4$
(c)       4   4      $-2\,g_4^2$
(d)       4   4      $-16\,g_4^2$
                                        (44)
2.5.3. Third Order. Finally, the third order diagrams are drawn in Fig. 6a–m. The combinatorics are tabulated in (45). Here the demonstrative example is diagram (b), which is composed of pasting four diagrams, each with $h = 2$; thus $h(G) = 4 \times 2 = 8$ and $C(G) = 2^4 = 16$. For $F(G)$, first we have a factor $\frac{1}{3!}$ from the exponential. Next we have contractions of the form ${\rm Tr}(\Phi\Phi)^3\,{\rm Tr}(\Phi\Phi)\,{\rm Tr}(\Phi\Phi)\,{\rm Tr}(\Phi\Phi)$; there are $2^3 \times 4 \times 2$ ways of doing this. Thus altogether we have $F(G) = \frac{32}{3}\,\tilde g_2^3$ for this diagram. In summary:

diagram   h   C(G)   F(G)
(a)       8   16     $16\,\tilde g_2^3$
(b)       8   16     $\frac{32}{3}\,\tilde g_2^3$
(c)       7   12     $64\,\tilde g_2^2 g_4$
(d)       7   12     $32\,\tilde g_2^2 g_4$
(e)       7   12     $64\,\tilde g_2^2 g_4$
(f)       6   8      $128\,\tilde g_2 g_4^2$
(g)       6   8      $128\,\tilde g_2 g_4^2$
(h)       6   8      $32\,\tilde g_2 g_4^2$
(i)       6   9      $64\,\tilde g_2 g_4^2$
(j)       5   5      $128\,g_4^3$
(k)       5   5      $\frac{32}{3}\,g_4^3$
(l)       5   5      $64\,g_4^3$
(m)       5   5      $\frac{256}{3}\,g_4^3$
                                        (45)

2.5.4. Obtaining the Effective Action. Now to the highlight of our calculation. From (43), (44), and (45) we can readily compute the effective glueball superpotential and free energy. We do so by summing the factors, with the appropriate powers of $S$, in accordance with (40) and (41). We obtain, up to four-loop order,
\[ F_0 = \sum_{G = \text{all diagrams}} F(G)\, S^{h(G)} = (2 g_4 + \tilde g_2 S) S^3 - 2 (9 g_4^2 + 8 g_4 \tilde g_2 S + 2 \tilde g_2^2 S^2) S^4 + \frac{16}{3} (54 g_4^3 + 66 g_4^2 \tilde g_2 S + 30 g_4 \tilde g_2^2 S^2 + 5 \tilde g_2^3 S^3) S^5 + \cdots, \qquad (46) \]
and subsequently,
\[ W_{\rm eff} = -N S (\log(S/\Lambda^2) - 1) + \sum_{G = \text{all diagrams}} C(G)\, F(G)\, N^{h(G) - \ell(G)}\, S^{\ell(G)} \]
\[ = -N S (\log(S/\Lambda^2) - 1) + (6 g_4 + 4 \tilde g_2 N) N S^2 - (72 g_4^2 + 96 g_4 \tilde g_2 N + 32 \tilde g_2^2 N^2) N S^3 + \frac{20}{3} (6 g_4 + 4 \tilde g_2 N)^3 N S^4 + \cdots. \qquad (47) \]
We will later see how this result may be reproduced from independent considerations, i.e., the effective action from the factorization of the Seiberg-Witten curve and the free energy from the corresponding matrix model.
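The sums leading to (46) and (47) can be reproduced directly from the entries of (43), (44) and (45). The sketch below uses exact rational arithmetic. It assumes that a pasted diagram with $b$ double-trace vertices has $k = b + 1$ primitive components (a tree), so that $\ell = h - k$; the names `TABLE`, `sum_tables` and `closed_form_weff` are illustrative only.

```python
from collections import defaultdict
from fractions import Fraction
from math import comb

# Rows: (h, C(G), numerical factor of F(G), power of g4, power of g2~),
# transcribed from tables (43), (44) and (45).
TABLE = [
    (4, 4, 1, 0, 1), (3, 3, 2, 1, 0),
    (6, 8, -4, 0, 2), (5, 6, -16, 1, 1), (4, 4, -2, 2, 0), (4, 4, -16, 2, 0),
    (8, 16, 16, 0, 3), (8, 16, Fraction(32, 3), 0, 3),
    (7, 12, 64, 1, 2), (7, 12, 32, 1, 2), (7, 12, 64, 1, 2),
    (6, 8, 128, 2, 1), (6, 8, 128, 2, 1), (6, 8, 32, 2, 1), (6, 9, 64, 2, 1),
    (5, 5, 128, 3, 0), (5, 5, Fraction(32, 3), 3, 0), (5, 5, 64, 3, 0),
    (5, 5, Fraction(256, 3), 3, 0),
]


def sum_tables(table):
    f0 = defaultdict(Fraction)    # key: (g4 power, g2~ power, S power)
    weff = defaultdict(Fraction)  # key: (g4 power, g2~ power, N power, S power)
    for h, c, f, a, b in table:
        k = b + 1                 # assumed: tree with b double-trace vertices
        loops = h - k             # l(G) = h(G) - k(G)
        f0[(a, b, h)] += Fraction(f)
        weff[(a, b, h - loops, loops)] += c * Fraction(f)
    return dict(f0), dict(weff)


def closed_form_weff():
    # Expand b*N*S^2 - 2*b^2*N*S^3 + (20/3)*b^3*N*S^4 with b = 6*g4 + 4*g2~*N.
    prefactor = {1: Fraction(1), 2: Fraction(-2), 3: Fraction(20, 3)}
    out = {}
    for n, pre in prefactor.items():
        for b in range(n + 1):
            a = n - b
            out[(a, b, b + 1, n + 1)] = pre * comb(n, b) * 6**a * 4**b
    return out


if __name__ == "__main__":
    f0, weff = sum_tables(TABLE)
    print(weff == closed_form_weff())
```

The comparison in the last line checks the perturbative part of (47) coefficient by coefficient against the compact form in powers of $6 g_4 + 4 \tilde g_2 N$.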
2.6. Multiple Traces, Pasted Diagrams and Nonlocality. Above we found that the only diagrams that contribute to the effective superpotential have zero momentum flowing through the double-trace vertex. Now observe that the double-trace term in the tree-level action can be written in momentum space as:
\[ V = \int d^4x\, {\rm Tr}(\Phi^2(x))\, {\rm Tr}(\Phi^2(x)) = \int d^4p_1\, d^4p_2\, d^4p_3\, d^4p_4\, {\rm Tr}(\Phi(p_1)\Phi(p_2))\, {\rm Tr}(\Phi(p_3)\Phi(p_4))\, \delta(p_1 + p_2 + p_3 + p_4). \qquad (48) \]
Since no momentum flows through the double-trace vertices contributing to the superpotential, the delta function momentum constraint factorizes in our pasted diagrams as
\[ \delta(p_1 + p_2 + p_3 + p_4) \sim \delta(p_1 + p_2)\, \delta(p_3 + p_4). \qquad (49) \]
Therefore for the purposes of computing the superpotential we might as well replace the double-trace term in the action by
\[ \tilde V = \int d^4p_1\, d^4p_2\, d^4p_3\, d^4p_4\, {\rm Tr}(\Phi(p_1)\Phi(p_2))\, {\rm Tr}(\Phi(p_3)\Phi(p_4))\, \delta(p_1 + p_2)\, \delta(p_3 + p_4) = \frac{1}{\rm Vol} \int d^4x\, d^4y\, {\rm Tr}(\Phi^2(x))\, {\rm Tr}(\Phi^2(y)), \qquad (50) \]
where Vol is the volume of spacetime. The Feynman diagrams of this nonlocal theory include the ones that compute the superpotential in the double-trace, local theory. This fact suggests that correlation functions of chiral operators are position independent, as described in [12]. Along with cluster decomposition, this position independence leads to the statement that correlators of operators in the chiral ring of an $\mathcal{N} = 1$ theory factorize, which we use in the next section to write $\langle {\rm Tr}(\Phi^2)^2 \rangle = \langle {\rm Tr}(\Phi^2) \rangle^2$. However, it is subtle to establish the precise equivalence between factorization of chiral operators and the vanishing that we demonstrated of all except the pasted diagrams, which leads in turn to the nonlocal action in (50). We leave this potential connection for exploration in future work.

3. The Field Theory Analysis

In this section, we will show that in the confining vacuum the effective superpotential of the field theory discussed in the previous section is
\[ W_{\rm eff} = N \Lambda^2 + (6 N g_4 + 4 N^2 \tilde g_2)\, \Lambda^4. \qquad (51) \]
After integrating in the glueball superfield and expanding the superpotential in a power series in $S$, (51) can be written
\[ W_{\rm eff} = -N S (\log(S/\Lambda^2) - 1) + (6 g_4 + 4 \tilde g_2 N) N S^2 - 2 (6 g_4 + 4 \tilde g_2 N)^2 N S^3 + \frac{20}{3} (6 g_4 + 4 \tilde g_2 N)^3 N S^4 + \cdots. \qquad (52) \]
We shall compare this expression for the low-energy gauge dynamics to the perturbative field theory computations in Sect. 2. The two results, of course, are in concert. We begin by considering an $\mathcal{N} = 1$ $U(N)$ gauge theory with a single adjoint superfield $\Phi$, deformed from $\mathcal{N} = 2$ by the tree-level superpotential ($n < N$)
\[ W_{\rm tree} = \sum_{r=1}^{n+1} g_r u_r + 4 \tilde g_2 u_2^2, \qquad (53) \]
where
\[ u_k := \frac{1}{k}\, {\rm Tr}(\Phi^k). \qquad (54) \]
The tree-level superpotential in (53) is more general than the one used in (24). Here, we allow single-trace terms at arbitrary powers of $\Phi$. We shall specialize to the previous example at the end of our discussion.
3.1. The Classical Vacua. To find the classical vacua, we have to solve the D-term and F-term conditions. The D-term is proportional to ${\rm Tr}[\Phi, \bar\Phi]^2$, which is zero if $\Phi$ is diagonal. Let the diagonal entries be $x_i$, $i = 1, \ldots, N$. We still need to solve the F-term condition. In terms of the $x_i$, the tree-level superpotential is
\[ W = \sum_{r=1}^{n+1} \frac{g_r}{r} \sum_{i=1}^{N} x_i^r + \tilde g_2 \Big( \sum_{i=1}^{N} x_i^2 \Big)^2. \qquad (55) \]
From this, the F-flatness condition reads:
\[ 0 = \frac{\partial W}{\partial x_k} = \sum_{r=1}^{n+1} g_r x_k^{r-1} + 4 \tilde g_2\, x_k \sum_{i=1}^{N} x_i^2, \qquad k = 1, \ldots, N. \qquad (56) \]
This is certainly different from the case without the double-trace term, where the F-term equations for different $x_k$s decouple. Here, the eigenvalues interact with each other even at the classical level. To solve (56), which may be recast as
\[ \frac{1}{x_k} \sum_{r=1}^{n+1} g_r x_k^{r-1} = -4 \tilde g_2 \sum_{i=1}^{N} x_i^2, \qquad (57) \]
we take (minus) the RHS of (57),
\[ C := 4 \tilde g_2 \sum_{i=1}^{N} x_i^2, \qquad (58) \]
as an unknown constant for all $N$ F-terms. This gives
\[ \sum_{r=1}^{n+1} g_r x_k^{r-1} + C x_k = 0 \qquad \forall\, k. \qquad (59) \]
Now the F-terms are decoupled. We can solve this system just as we solve for the vacua of a field theory with only single-trace interactions [13], simply by taking $g_2 \to g_2 + C$. As the F-terms are order $n$ polynomials in $x$, we should generically expect $n$ solutions for each eigenvalue $x_k$. The eigenvalues are the roots of the polynomial
\[ 0 = \sum_{r=1}^{n+1} g_r x^{r-1} \equiv g_{n+1} \prod_{i=1}^{n} (x - a_i). \qquad (60) \]
If $N_i$ of the eigenvalues are located at $a_i$, where $\sum_i N_i = N$, the unbroken gauge symmetry is
\[ U(N) \to \prod_{i=1}^{n} U(N_i). \qquad (61) \]
As the $a_i$s are a function of $C$, we need to impose the additional consistency condition that follows from (58),
\[ 4 \tilde g_2 \sum_{j=1}^{n} N_j a_j^2 = C. \qquad (62) \]
To simplify the discussion, we henceforth focus on the special case where all of the $x_i$s have the same value. The $SU(N)$ part of the gauge group is unbroken, and will confine in the infrared.
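As a sanity check on (56), one can compare it with a numerical derivative of (55). The following is a minimal sketch, assuming the single-trace couplings are passed as a dictionary `{r: g_r}` and the double-trace coupling as `g2t`; these names and the sample values are illustrative and are not taken from the paper.

```python
def tree_level_w(x, g, g2t):
    """Eq. (55): sum_r (g_r / r) sum_i x_i^r + g2t * (sum_i x_i^2)^2."""
    single_trace = sum(g_r / r * sum(xi**r for xi in x) for r, g_r in g.items())
    return single_trace + g2t * sum(xi**2 for xi in x) ** 2


def f_term(x, k, g, g2t):
    """Eq. (56): dW/dx_k, including the eigenvalue interaction term."""
    local = sum(g_r * x[k] ** (r - 1) for r, g_r in g.items())
    return local + 4 * g2t * x[k] * sum(xi**2 for xi in x)


def numerical_gradient(x, k, g, g2t, eps=1e-6):
    """Central finite difference of (55) with respect to x_k."""
    up, down = list(x), list(x)
    up[k] += eps
    down[k] -= eps
    return (tree_level_w(up, g, g2t) - tree_level_w(down, g, g2t)) / (2 * eps)


if __name__ == "__main__":
    x = [0.3, -0.7, 1.1]          # sample eigenvalues (arbitrary)
    g = {2: 1.0, 4: 0.5}          # sample couplings g_2, g_4 (arbitrary)
    for k in range(len(x)):
        print(k, f_term(x, k, g, 0.2), numerical_gradient(x, k, g, 0.2))
```

The last term of `f_term` is the one that couples different eigenvalues; setting `g2t` to zero removes it and the conditions decouple, as in the single-trace case.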
3.2. The Exact Superpotential in the Confining Vacuum. We now proceed to find the exact superpotential in this confining vacuum [37]. Let us recall the general philosophy of the method (see e.g., [29], whose notations we adopt, for a recent discussion). A generic point in the moduli space of the $U(N)$ $\mathcal{N} = 2$ theory will be lifted by the addition of the general superpotential (53). The points which are not lifted are precisely where at least $N - n$ mutually local monopoles become massless. This can be seen from the following argument. The gauge group in the $\mathcal{N} = 1$ theory is broken down to $\prod_{i=1}^{n} U(N_i)$, and the $SU(N_i)$ factors each confine. We expect condensation of $N_i - 1$ magnetic monopoles in each of these $SU(N_i)$ factors and a total of $N - n$ condensed magnetic monopoles. These monopoles condense at the points on the $\mathcal{N} = 2$ moduli space where $N - n$ mutually local monopoles become massless. These are precisely the points which are not lifted by addition of the superpotential. These considerations are equivalent to the requirement that the corresponding Seiberg-Witten curve has the factorization
\[ P_N(x, u)^2 - 4 \Lambda^{2N} = H_{N-n}(x)^2\, F_{2n}(x), \qquad (63) \]
where $P_N(x, u)$ is an order $N$ polynomial in $x$ with coefficients determined by the (vevs of) the $u_k$, $\Lambda$ is an ultraviolet cut-off, and $H$ and $F$ are, respectively, order $N - n$ and $2n$ polynomials in $x$. The $N - n$ double roots place $N - n$ conditions on the original variables $u_k$. We can parametrize all the $u_k$ by $n$ independent variables $\alpha_j$. In other words, the $\alpha_j$s then correspond to massless fields in the low-energy effective theory. If we know the exact effective action for these fields, to find the vacua, we simply minimize $S_{\rm eff}$. Furthermore, substituting $u_k$ back into the effective action gives the action for the vacua. Holomorphy and regularity of the superpotential as the couplings go to zero require that there are no perturbative corrections to the tree-level superpotential. In addition, we assume that all non-perturbative effects are captured in the Seiberg-Witten curve analysis discussed above. Then, we need to minimize (see footnote 2)
\[ W_{\rm exact} = \sum_{r=1}^{n+1} g_r \langle u_r \rangle + 4 \tilde g_2 \langle u_2^2 \rangle. \qquad (64) \]
In general the factorization problem is hard to solve [14], but for the confining vacuum where all $N - 1$ monopoles have condensed, there is a general solution given by Chebyshev polynomials (see footnote 3). In our case, we have the solution
\[ \langle u_p \rangle = \frac{N}{p} \sum_{q=0}^{[p/2]} C_p^{2q}\, C_{2q}^{q}\, \Lambda^{2q}\, z^{p-2q}, \qquad (65) \]
\[ z = \frac{\langle u_1 \rangle}{N}, \qquad C_n^p := \binom{n}{p} = \frac{n!}{p!\,(n-p)!}. \qquad (66) \]
2 In principle, we can have additional terms generated quantum mechanically in the superpotential. However, the agreement between this analysis and the matrix model analysis suggests that such terms are not present.
Notice that in (65), there is one free parameter $z$, which is the field left upon condensation. Now we put it into the superpotential
\[ W = \sum_{r=1}^{n+1} g_r \langle u_r(z, \Lambda) \rangle + 4 \tilde g_2 \langle u_2(z, \Lambda) \rangle^2, \qquad (67) \]
solve for $z$ from $\partial W/\partial z = 0$, and back-substitute to obtain the effective superpotential $W_{\rm eff}$. Notice that in the above result, we have used
\[ \langle u_2^2 \rangle = \langle u_2 \rangle^2. \qquad (68) \]
This is true because $u_2$ is a chiral field, and cluster decomposition in the field theory lets us factor the correlation functions of operators in the chiral ring [12]. Although the above procedure finds $W_{\rm eff}$, it is not the best form to compare with our previous results because there is no gluino condensate $S$. To make the comparison, we need to “integrate in” [31] the glueball superfield as in [29]. The integrating in procedure is as follows (here we use the single-trace superpotential as an illustrative example of the technique).
• We set $\Delta := \Lambda^2$, and use the equation
\[ N S = \Delta \frac{\partial W}{\partial \Delta} = N \sum_{p=2}^{n+1} \frac{g_p}{p} \sum_{q=1}^{[p/2]} q\, C_p^{2q}\, C_{2q}^{q}\, z^{p-2q}\, \Delta^{q} \qquad (69) \]
to solve for $\Delta$ in terms of $S$.
• Next, we find $z$ by solving
\[ 0 = \frac{\partial W}{\partial z} = \sum_{p=1}^{n+1} \frac{g_p}{p} \sum_{q=0}^{[p/2]} (p - 2q)\, C_p^{2q}\, C_{2q}^{q}\, z^{p-2q-1}\, \Delta^{q}. \qquad (70) \]
• Now the effective action for the glueball superfield $S$ can be written as
\[ W_S(S, g, \Lambda) = -S \log\left( \frac{\Delta}{\Lambda^2} \right)^{N} + W_{\rm tree}(S, g, \Lambda), \qquad (71) \]
which will reproduce the result
\[ \frac{\partial}{\partial S} W_S(S, \Lambda^2, g) = -\ln\left( \Delta/\Lambda^2 \right)^{N}. \qquad (72) \]
3 This was worked out first by Douglas and Shenker [25], but here we use the results and nomenclature of Ferrari [29].
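The binomial sum in (65) is easy to tabulate. The sketch below returns the coefficients of $\langle u_p \rangle / N$ as a dictionary keyed by the powers of $z$ and $\Lambda$; the function name and output format are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def u_coefficients(p):
    """Coefficients of <u_p>/N from (65).

    Returns {(power of z, power of Lambda): coefficient}.
    """
    return {
        (p - 2 * q, 2 * q): Fraction(comb(p, 2 * q) * comb(2 * q, q), p)
        for q in range(p // 2 + 1)
    }


if __name__ == "__main__":
    for p in (2, 4):
        print(p, u_coefficients(p))
```

For $p = 2$ and $p = 4$ the output can be compared term by term with the expressions for $u_2$ and $u_4$ used in the explicit example of Sect. 3.3.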
3.3. An Explicit Example. Let us work out the double-trace example that we are interested in solving. The superpotential is
\[ W = g_2 u_2 + 4 g_4 u_4 + 4 \tilde g_2 u_2^2 \qquad (73) \]
(later, we can set $g_2 = m = 1$). Using
\[ \langle u_2 \rangle = \frac{N}{2} \left[ z^2 + 2 \Lambda^2 \right], \qquad (74) \]
\[ \langle u_4 \rangle = \frac{N}{4} \left[ z^4 + 12 \Lambda^2 z^2 + 6 \Lambda^4 \right], \qquad (75) \]
from (65), we obtain
\[ W = N z^2 \left[ \frac{g_2}{2} + N \tilde g_2 z^2 + g_4 z^2 + 12 g_4 \Lambda^2 + 4 N \tilde g_2 \Lambda^2 \right] + N \Lambda^2 \left[ g_2 + 6 g_4 \Lambda^2 + 4 N \tilde g_2 \Lambda^2 \right]. \qquad (76) \]
From this we have the following equations by setting $\Lambda^2 = \Delta$:
\[ S = \Delta \left[ g_2 + (12 g_4 + 4 N \tilde g_2) z^2 + (12 g_4 + 8 N \tilde g_2) \Delta \right], \qquad (77) \]
\[ 0 = \frac{\partial W}{\partial z} = z \left[ g_2 + z^2 (4 g_4 + 4 N \tilde g_2) + 2 (12 g_4 + 4 N \tilde g_2) \Delta \right]. \qquad (78) \]
We take the solution $z = 0$, for which
\[ \Delta = \frac{-g_2 + \sqrt{g_2^2 + 4 S (12 g_4 + 8 N \tilde g_2)}}{2 (12 g_4 + 8 N \tilde g_2)}. \qquad (79) \]
The effective action with $S$ integrated in is
\[ W_{\rm eff} = S \log\left( \frac{\Lambda^2}{\Delta} \right)^{N} + N \Delta \left[ g_2 + (6 g_4 + 4 N \tilde g_2) \Delta \right], \qquad (80) \]
which will be the one used in comparison to our previous results. After minimizing this action, we find
\[ W_{z=0} = N \Lambda^2 \left[ g_2 + 6 g_4 \Lambda^2 + 4 N \tilde g_2 \Lambda^2 \right], \qquad (81) \]
which is the promised result of (51). Setting $g_2 = 1$, expanding (79) in powers of $S$, and substituting into (80), we get the second formula (52) from the beginning of this section:
\[ W_{\rm eff} = -N S (\log(S/\Lambda^2) - 1) + (6 g_4 + 4 \tilde g_2 N) N S^2 - 2 (6 g_4 + 4 \tilde g_2 N)^2 N S^3 + \frac{20}{3} (6 g_4 + 4 \tilde g_2 N)^3 N S^4 + \cdots. \qquad (82) \]
Crucial in matching the result of this calculation with the perturbative analysis in the previous section is the assumption of factorization $\langle u_2^2 \rangle = \langle u_2 \rangle^2$. This is equivalent to the vanishing of the pinching diagrams in the perturbative analysis. Notice that in both formulas (79) and (80), $g_4$ and $\tilde g_2$ combine together as $g_4 + \frac{2N}{3} \tilde g_2$. So if we shift $g_4$ to $g_4 + \frac{2N}{3} \tilde g_2$, the single-trace result will reproduce the double-trace result, and the effective action for the double-trace theory can be naively calculated from the DV prescription by partial differentiation with respect to the glueball field $S$.
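The statement that substituting (79) into (80) reproduces the series (82) can be checked numerically for small $S$. This is a rough sketch, assuming $g_2 = 1$ and $\Lambda = 1$ by default; the function names and the sample parameter values are illustrative, and the combination $12 g_4 + 8 N \tilde g_2$ is assumed to be non-zero.

```python
import math


def delta_of_s(s, n, g4, g2t, g2=1.0):
    """Eq. (79): Delta on the z = 0 branch (requires 12*g4 + 8*n*g2t != 0)."""
    c = 12 * g4 + 8 * n * g2t
    return (-g2 + math.sqrt(g2**2 + 4 * s * c)) / (2 * c)


def weff_exact(s, n, g4, g2t, lam=1.0, g2=1.0):
    """Eq. (80) evaluated with Delta taken from (79)."""
    d = delta_of_s(s, n, g4, g2t, g2)
    return n * s * math.log(lam**2 / d) + n * d * (g2 + (6 * g4 + 4 * n * g2t) * d)


def weff_series(s, n, g4, g2t, lam=1.0):
    """Eq. (82), truncated after the S^4 term (g2 = 1)."""
    b = 6 * g4 + 4 * n * g2t
    return (-n * s * (math.log(s / lam**2) - 1) + b * n * s**2
            - 2 * b**2 * n * s**3 + (20 / 3) * b**3 * n * s**4)


if __name__ == "__main__":
    args = (0.01, 3, 0.1, 0.05)  # S, N, g4, g2~ (arbitrary small values)
    print(weff_exact(*args), weff_series(*args))
```

The two numbers should differ only by the neglected terms of order $S^5$, which gives a quick consistency check on the signs and coefficients in (79), (80) and (82).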
4. The Matrix Model

In Sect. 2 we demonstrated that the explicit field theory computation of the effective superpotential localizes to a certain sum of matrix diagrams. All of these diagrams are constructed by pasting planar single-trace diagrams together with double-trace vertices in such a way that no additional momentum loops are created. By examining the scaling of these diagrams with $N$ in (40) we also observed that these are precisely the diagrams that would survive the large-$M$ limit of a $U(M)$ bosonic matrix model with a potential
\[ V(\Phi) = g_2\,{\rm Tr}(\Phi^2) + g_4\,{\rm Tr}(\Phi^4) + \frac{\tilde g_2}{M}\,{\rm Tr}(\Phi^2)\,{\rm Tr}(\Phi^2). \qquad (83) \]
The extra factor of $1/M$ multiplying $\tilde g_2$, in comparison with the field theory tree-level superpotential (1), is necessary for a well-defined ’t Hooft large-$M$ limit. This is because each trace, being a sum of eigenvalues, will give a term proportional to $M$. So to prevent the double-trace term from completely dominating the large-$M$ limit we must divide by an extra factor of $M$. Fortunately, this is a well known model and was solved more than a decade ago [15, 34]. Below we will review this solution, compare its results to our field theory calculations, and then generalize to other multi-trace deformations.
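4.1. The Mean-Field Method. The basic observation, following [15], that allows us to solve the double-trace matrix model (83), is that in the large-$M$ limit the effects on a given matrix eigenvalue of all the other eigenvalues can be treated in a mean field approximation. Accordingly we compute the matrix model free energy $F$ as
\[ \exp(-M^2 F) = \int d^{M^2}\Phi\, \exp\left\{ -M \left[ \frac{1}{2}\,{\rm Tr}(\Phi^2) + g_4\,{\rm Tr}(\Phi^4) + \tilde g_2\, \frac{({\rm Tr}(\Phi^2))^2}{M} \right] \right\} \]
\[ = \int \prod_i d\lambda_i\, \exp\left\{ M \left[ -\frac{1}{2} \sum_i \lambda_i^2 - g_4 \sum_i \lambda_i^4 - \frac{\tilde g_2}{M} \Big( \sum_i \lambda_i^2 \Big)^2 \right] + \sum_{i \neq j} \log|\lambda_i - \lambda_j| \right\}. \qquad (84) \]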
Here $\lambda_i$ are the $M$ eigenvalues of $\Phi$ and $F$ is the free energy, which can be evaluated by saddle point approximation in the planar limit. The log term comes from the standard Vandermonde determinant. This matrix model is Hermitian with rank $M$ in the notation of [22] (of course, as mentioned earlier, we should really consider $GL(M, \mathbb{C})$ matrices, though the techniques hold equally). We have introduced an extra factor of $M$ in the exponent on the right hand side of (84) by rescaling the fields and couplings in (83) in accordance with the conventions of [15]. The density of eigenvalues

$$\rho(\lambda) := \frac{1}{M}\sum_{i=1}^{M}\delta(\lambda - \lambda_i) \tag{85}$$

becomes continuous in an interval $(-2a, 2a)$ when $M$ goes to infinity in the planar limit, for some $a \in \mathbb{R}^+$. Here the interval is symmetric around zero since the potential of our model is an even function. The normalization condition for the eigenvalue density is

$$\int_{-2a}^{2a} d\lambda\,\rho(\lambda) = 1. \tag{86}$$
Multi-Trace Superpotentials vs. Matrix Models
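We can rewrite (84) in terms of the eigenvalue density in the continuum limit as

$$\exp(-M^2 F) = \int \prod_{i=1}^{M} d\lambda_i\, \exp\left\{-M^2\left(\int_{-2a}^{2a} d\lambda\,\rho(\lambda)\Big(\frac12\lambda^2 + g_4\lambda^4\Big) + \tilde g_2\Big(\int_{-2a}^{2a} d\lambda\,\rho(\lambda)\lambda^2\Big)^2 - \int_{-2a}^{2a}\!\!\int_{-2a}^{2a} d\lambda\, d\mu\,\rho(\lambda)\rho(\mu)\ln|\lambda - \mu|\right)\right\}. \tag{87}$$

Then the saddle point equation is

$$\frac12\lambda + 2g_4\lambda^3 + 2\tilde g_2 c\,\lambda = P\int_{-2a}^{2a} d\mu\,\frac{\rho(\mu)}{\lambda - \mu}, \tag{88}$$

where $c$ is the second moment

$$c := \int_{-2a}^{2a} d\lambda\,\rho(\lambda)\lambda^2 \tag{89}$$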
and $P$ means principal value integration. The effect of the double-trace term is to modify the coefficient of $\lambda$ in the saddle point equation. We can determine the number $c$ self-consistently by (89). The solution $\rho(\lambda)$ of (88) can be obtained by standard matrix model techniques by introducing a resolvent. The answer is

$$\rho(\lambda) = \frac{1}{\pi}\left(\frac12 + 2\tilde g_2 c + 4g_4 a^2 + 2g_4\lambda^2\right)\sqrt{4a^2 - \lambda^2}. \tag{90}$$

Plugging the solution into (86) and (89) we obtain two equations that determine the parameters $a$ and $c$:

$$a^2(1 + 4\tilde g_2 c) = 1 - 12 g_4 a^4, \tag{91}$$

$$16 g_4 \tilde g_2 a^8 + (12 g_4 + 4\tilde g_2)a^4 + a^2 - 1 = 0. \tag{92}$$
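The consistency of (90) with (86), (89), (91) and (92) can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names, the bisection solver and the coupling values $g_4 = 0.02$, $\tilde g_2 = 0.01$ are illustrative assumptions. It solves (92) for $a^2$, reads off $c$ from (91), and integrates the density (90) after the substitution $\lambda = 2a\sin\theta$.

```python
import math

def solve_a2(g4, gt2, iterations=200):
    """Bisection for x = a^2 in (0, 1) solving eq. (92); assumes g4, gt2 > 0."""
    def f(x):
        return 16 * g4 * gt2 * x**4 + (12 * g4 + 4 * gt2) * x**2 + x - 1
    lo, hi = 0.0, 1.0  # f(0) = -1 < 0 and f(1) > 0 for positive couplings
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def density_moments(g4, gt2, n=64):
    """Return (zeroth moment, second moment, c) for the density (90)."""
    x = solve_a2(g4, gt2)                          # x = a^2
    a = math.sqrt(x)
    c = (1 - 12 * g4 * x**2 - x) / (4 * gt2 * x)   # c read off from eq. (91)
    pref = 0.5 + 2 * gt2 * c + 4 * g4 * x
    h = math.pi / n
    m0 = m2 = 0.0
    for k in range(n):
        theta = -0.5 * math.pi + (k + 0.5) * h     # midpoint rule in theta
        lam = 2 * a * math.sin(theta)
        root = 2 * a * math.cos(theta)             # sqrt(4 a^2 - lambda^2)
        rho = (pref + 2 * g4 * lam**2) * root / math.pi
        w = rho * root * h                         # rho(lambda) d(lambda)
        m0 += w
        m2 += w * lam**2
    return m0, m2, c

m0, m2, c = density_moments(0.02, 0.01)
print(m0, m2, c)
```

Under these assumptions the zeroth moment should come out equal to 1 and the second moment equal to $c$, up to rounding; this is just the statement that (91) encodes the normalization (86) and (92) encodes the self-consistency condition (89).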
Substituting these expressions into (87) gives us the free energy in the planar limit $M \to \infty$ as:

$$F = \int_{-2a}^{2a} d\lambda\,\rho(\lambda)\Big(\frac12\lambda^2 + g_4\lambda^4\Big) + \tilde g_2 c^2 - \int_{-2a}^{2a}\!\!\int_{-2a}^{2a} d\lambda\,d\mu\,\rho(\lambda)\rho(\mu)\ln|\lambda - \mu|. \tag{93}$$

One obtains

$$F(g_4, \tilde g_2) - F(0, 0) = \frac14(a^2 - 1) + (6g_4 a^4 + a^2 - 2)\,g_4 a^4 - \frac12\log(a^2). \tag{94}$$

Equation (94) together with (92) gives the planar free energy. We can also expand the free energy in powers of the couplings, by using (92) to solve for $a^2$ perturbatively:

$$a^2 = 1 - (12g_4 + 4\tilde g_2) + (288g_4^2 + 176 g_4\tilde g_2 + 32\tilde g_2^2) - (8640 g_4^3 + 7488 g_4^2\tilde g_2 + 2496 g_4\tilde g_2^2 + 320\tilde g_2^3) + \cdots. \tag{95}$$

Plugging this back into (94) we find the free energy as a perturbative series

$$F_0 = F(g_4, \tilde g_2) - F(0, 0) = 2g_4 + \tilde g_2 - 2(9g_4^2 + 8g_4\tilde g_2 + 2\tilde g_2^2) + \frac{16}{3}(54 g_4^3 + 66 g_4^2\tilde g_2 + 30 g_4\tilde g_2^2 + 5\tilde g_2^3) + \cdots. \tag{96}$$
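As a cross-check of the expansion, one can compare the closed form (94), evaluated on the root of (92), with the truncated series (96) at weak coupling. The sketch below is illustrative only; the helper names and the sample couplings $g_4 = \tilde g_2 = 10^{-3}$ are assumptions, and the series is truncated at cubic order exactly as printed in (96).

```python
import math

def solve_a2(g4, gt2, iterations=200):
    """Bisection for x = a^2 in (0, 1) solving eq. (92); assumes g4, gt2 > 0."""
    def f(x):
        return 16 * g4 * gt2 * x**4 + (12 * g4 + 4 * gt2) * x**2 + x - 1
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

def free_energy_closed_form(g4, gt2):
    """F(g4, gt2) - F(0, 0) from eq. (94), with x = a^2 taken from eq. (92)."""
    x = solve_a2(g4, gt2)
    return 0.25 * (x - 1) + (6 * g4 * x**2 + x - 2) * g4 * x**2 - 0.5 * math.log(x)

def free_energy_series(g4, gt2):
    """Perturbative series (96), truncated at cubic order in the couplings."""
    return (2 * g4 + gt2
            - 2 * (9 * g4**2 + 8 * g4 * gt2 + 2 * gt2**2)
            + (16.0 / 3.0) * (54 * g4**3 + 66 * g4**2 * gt2
                              + 30 * g4 * gt2**2 + 5 * gt2**3))

g4, gt2 = 1e-3, 1e-3
print(free_energy_closed_form(g4, gt2), free_energy_series(g4, gt2))
```

At these couplings the two numbers should differ only at the level of the neglected quartic terms, which is well below the size of the cubic contribution.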
Comparing with (46) we see that $F$ reproduces the explicit computation of the generating function of "planar pasted" field theory diagrams in Sect. 2.5. In matching the two we have to restore the proper powers of the glueball $S$ in (96). First recall that to keep the relevant diagrams in the matrix model, we inserted a factor of $1/M$ into the double-trace term in (83). Therefore to compare with (46), we need to rescale $\tilde g_2$ in (96) to $\tilde g_2 M \equiv \tilde g_2 S$, where we have effectively identified the glueball $S$ in the field theory with $M$ in the matrix model. In addition, we should re-insert powers of $M$ into (96) by loop counting. The first two terms in (96) have three index loops so we need to multiply them by $M^3 \equiv S^3$. The third term has four index loops and the fourth term, five, and hence we respectively need factors of $S^4$ and $S^5$. With these factors correctly placed into (96), we recover (46) completely.

This verifies our claim that the diagrams surviving the large-$M$ limit of the matrix model (83) are precisely the graphs that contribute to the effective action of the field theory with the tree-level superpotential (1). Nevertheless, as we have already discussed in Sect. 2.4, we cannot compute the effective superpotential of the field theory $W_{\rm eff}(S)$ by taking a derivative $\partial F_0/\partial S$, because the combinatorial factors will not agree. It would be interesting to find the field theories whose effective superpotential is computed by the matrix model (94).

4.2. Generalized Multi-Trace Deformations. In fact the mean field techniques of the previous subsection can be generalized to solve the general multi-trace model. Below we illustrate this by solving the general quartic matrix model; as discussed above, it is an interesting challenge to find a field theory whose effective superpotential these models compute.
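Specifically, let us consider the Lagrangian

$$L = \tilde g_2\,({\rm Tr}\,\Phi^2)^2 + \mu\,({\rm Tr}\,\Phi)^2 + \nu_1 ({\rm Tr}\,\Phi)^2\,{\rm Tr}(\Phi^2) + \nu_2 ({\rm Tr}\,\Phi)^4 + 2\nu_3 ({\rm Tr}\,\Phi)\,{\rm Tr}(\Phi^3), \tag{97}$$

which together with the original two terms in (83) exhausts all quartic interactions. The one-matrix model partition function

$$Z_M = \int [D\lambda]\, \exp\left\{-M^2\left(L - \int_0^1 dx \int_0^1 dy\, \log|\lambda(x) - \lambda(y)|\right)\right\} \tag{98}$$

gives the saddle point equation

$$4g_4\lambda^3 + 6\nu_3 c_1\lambda^2 + 2(g_2 + 2\tilde g_2 c_2 + \nu_1 c_1^2)\lambda + 2(\mu c_1 + \nu_1 c_1 c_2 + 2\nu_2 c_1^3 + \nu_3 c_3) = 2P\int_{-2a}^{2b} \frac{u(\tau)\,d\tau}{\lambda - \tau}, \tag{99}$$

where the moments $c_k$ are defined as

$$c_k = \int_{-2a}^{2b} d\tau\, u(\tau)\tau^k, \qquad c_0 = \int_{-2a}^{2b} d\tau\, u(\tau) = 1. \tag{100}$$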
Note that we have introduced the separate upper and lower cut parameters $a$ and $b$, as opposed to the standard symmetric treatment, because $u(\lambda)$ is not of explicit parity (such asymmetric examples have also been considered in [11]). When $a = b$ one can recast (99) into a Fredholm integral equation of the first kind and Cauchy type, which affords a general solution as follows [10]:

$$\frac{1}{\pi} P\int_{-a}^{a} \frac{u(t)\,dt}{t - x} = -v(x) \;\Rightarrow\; u(x) = \frac{1}{\pi} P\int_{-a}^{a} \sqrt{\frac{a^2 - t^2}{a^2 - x^2}}\,\frac{v(t)\,dt}{t - x} + \frac{C}{\sqrt{a^2 - x^2}} \tag{101}$$

for some constant $C$. When $a \neq b$ we can use the ansatz

$$u(\lambda) = \frac{1}{\pi}(A\lambda^2 + B\lambda + C)\sqrt{(2a + \lambda)(2b - \lambda)} \tag{102}$$

with the constants matching the coefficients in the LHS of (99) as

$$2A = 2g_4,$$
$$2aA - 2Ab + 2B = 3\nu_3 c_1,$$
$$-a^2 A - 2aAb - Ab^2 + 2aB - 2bB + 2C = g_2 + 2\tilde g_2 c_2 + \nu_1 c_1^2,$$
$$a^3 A + a^2 A b - aAb^2 - Ab^3 - a^2 B - 2abB - b^2 B + 2aC - 2bC = \mu c_1 + \nu_1 c_1 c_2 + 2\nu_2 c_1^3 + \nu_3 c_3. \tag{103}$$

We see a well-behaved $u(\lambda)$ which is zero at the end-points and vanishes outside the support $(-2a, 2b)$. We now need to check the consistency of our mean-field method. This simply means the following. Considering the definition of $c_i$ in (100), the definitions (103) actually constitute a system of equations for $A, B, C, a, b$, because each $c_i$ on the RHS, through (100), depends on $A, B, C, a, b$. To (103) we must append one more normalization condition, that $c_0 = \int_{-2a}^{2b} u(\lambda)\,d\lambda = 1$. Therefore we have five equations in five variables which will fix our parameters in terms of the seven couplings. Our mean-field method is therefore self-consistent. It would be interesting to find a role for such exactly solvable models in the physics of four-dimensional field theories.

5. Linearizing Traces: A Matrix Model Prescription for Multi-Trace Superpotentials

In previous sections we showed that the field theory computation of the effective superpotential of a double-trace theory as a function of the glueball $S$ localized to summing matrix diagrams. In the end, only the "pasted" diagrams contributed, namely certain tree-like graphs obtained by pasting together planar single-trace graphs with double-trace vertices in such a way that no momentum flows through the latter. After verifying the result via the $N = 2$ Seiberg-Witten solution, we demonstrated that the sum of these diagrams, given as a series in (40), is not computed by the large-$M$ limit of a $U(M)$ matrix model as one might have hoped. Here we will show that there is a different matrix model that does compute the field theory superpotential.
Since the authors of [19] and [12] have proven that the superpotential of a single-trace gauge theory can be computed from an associated matrix model, we seek to construct our double-trace theory from another single-trace model. Recall that we are considering the tree-level superpotential

$$W_{\rm tree} = \frac12 g_2\,{\rm Tr}(\Phi^2) + g_4\,{\rm Tr}(\Phi^4) + \tilde g_2\,({\rm Tr}(\Phi^2))^2. \tag{104}$$

Now consider another theory with an additional gauge singlet field $A$,

$$W_{\rm tree} = \frac12(g_2 + 4\tilde g_2 A)\,{\rm Tr}(\Phi^2) + g_4\,{\rm Tr}(\Phi^4) - \tilde g_2 A^2. \tag{105}$$

It is easy to see that integrating out $A$ in (105), which amounts to solving $\partial W_{\rm tree}/\partial A = 0$ and back-substituting, produces the double-trace theory (104). Indeed, $\partial W_{\rm tree}/\partial A = 2\tilde g_2\,({\rm Tr}(\Phi^2) - A)$ vanishes at $A = {\rm Tr}(\Phi^2)$, and substituting this value turns the $A$-dependent terms into $2\tilde g_2({\rm Tr}(\Phi^2))^2 - \tilde g_2({\rm Tr}(\Phi^2))^2 = \tilde g_2({\rm Tr}(\Phi^2))^2$. The advantage of (105) is that it consists purely of single-trace operators. The first two terms will generate an effective potential $W_{\rm single}(A, S)$, as a function of $A$ and the glueball superfield $S$ (the subscript "single" refers to the fact that this is the superpotential for the model without the double-trace term and with an $A$-dependent mass term). Then

$$W_{\rm eff}(A, S) = W_{\rm single}(A, S) - \tilde g_2 A^2. \tag{106}$$

The exact superpotential for the glueball superfield $S$ for the double-trace theory then follows by integrating $A$ out, i.e. solving $\partial W_{\rm single}/\partial A - 2\tilde g_2 A = 0$ for $A$ and substituting in (106). Since single-trace theories are directly related to the matrix model, we might hope to use this construction with an added auxiliary field $A$ to find an auxiliary matrix model that sums the pasted diagrams of the double-trace theory.
5.1. Field Theory Computation of $W_{\rm single}(A, S)$ and Pasted Matrix Diagrams. In this section⁵ we will discuss how the superpotential for the double-trace theory can be computed in field theory from the linearized model (105). First, observe that the superpotential for an adjoint theory with an additional gauge singlet (105) localizes to summing matrix integrals, since the arguments of [19] that are reviewed in Sect. 2 go through essentially unchanged. To compute the effective potential as a function of both $A$ and the glueball $S$, we need to sum superspace Feynman diagrams with insertions of both $A$ and $W_\alpha$, with both of these treated as background fields. Since we are only interested in contributions to the superpotential, we can restrict ourselves to constant background $A$. Then it is easy to verify that the entire analysis in Sect. 2 goes through for the theory (105), with the double-trace coupling $\tilde g_2$ set to zero and a shift in the mass of the field $\Phi$, viz. $g_2 \to g_2 + 4\tilde g_2 A$. In particular, the computation of the effective superpotential $W_{\rm single}(A, S)$ localizes to summing matrix diagrams and there is some free energy $F_{\rm single}$ in terms of which $W_{\rm single} = N\,\partial F_{\rm single}(S, A)/\partial S$.

Let us verify that this procedure will yield the correct double-trace result when we integrate $A$ out. Making the replacement $g_2 \to g_2 + 4\tilde g_2 A$ with $\tilde g_2 = 0$ in the known single-trace result (80) and (79), we find the effective superpotential

$$W_{\rm single}(A, S) = NS\log\Big(\frac{\Lambda^2}{\Delta}\Big) + N\big((g_2 + 4\tilde g_2 A)\Delta + 6g_4\Delta^2\big), \tag{107}$$
⁵ We thank Cumrun Vafa and Ken Intriligator for communications concerning the material in this section.
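where $\Delta$ is determined by the quadratic equation

$$12 g_4\Delta^2 + (g_2 + 4\tilde g_2 A)\Delta = S. \tag{108}$$

Now we can integrate $A$ out and obtain the superpotential for the glueball superfield $S$. We solve $\partial W_{\rm eff}/\partial A = \partial W_{\rm single}/\partial A - 2\tilde g_2 A = 0$ for $A$. This is a simple calculation from (107), (108):

$$\frac{\partial}{\partial A} W_{\rm single}(\Delta(S, A), S, A) = \frac{\partial W_{\rm single}}{\partial A} + \frac{\partial \Delta}{\partial A}\frac{\partial W_{\rm single}}{\partial \Delta} = \Big(g_2 + 4\tilde g_2 A + 12 g_4\Delta - \frac{S}{\Delta}\Big)N\frac{\partial\Delta}{\partial A} + 4\tilde g_2 N\Delta = 4\tilde g_2 N\Delta, \tag{109}$$

where in the last step we have used (108). So the solution to $\partial W_{\rm single}/\partial A - 2\tilde g_2 A = 0$ is

$$A = 2N\Delta. \tag{110}$$

Plugging $A = 2N\Delta$ into (107), (108) and (106), we find the effective glueball superpotential to be

$$W_{\rm eff} = NS\log\Big(\frac{\Lambda^2}{\Delta}\Big) + N\Delta\big(g_2 + (4\tilde g_2 N + 6g_4)\Delta\big), \tag{111}$$

with $\Delta$ determined by the quadratic equation

$$(12 g_4 + 8\tilde g_2 N)\Delta^2 + g_2\Delta = S. \tag{112}$$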
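The chain (106)-(112) can be exercised numerically. The sketch below is only an illustration under stated assumptions: the parameter values ($N = 2$, $g_2 = 1$, $g_4 = 0.01$, $\tilde g_2 = 0.005$, $S = 0.1$, $\Lambda = 1$) and all function names are hypothetical, and the symbol written $\Delta$ here is the root of the quadratic equation (108). It finds the stationary value of the singlet by iterating $A \mapsto 2N\Delta(A)$, then compares with the direct solution of (112) and with (111).

```python
import math

# Illustrative parameter values (assumptions, not taken from the text).
N, G2, G4, GT2, S, LAM = 2, 1.0, 0.01, 0.005, 0.1, 1.0

def delta_of(A):
    """Positive root of eq. (108): 12 g4 D^2 + (g2 + 4 gt2 A) D = S."""
    m = G2 + 4 * GT2 * A
    return (-m + math.sqrt(m * m + 48 * G4 * S)) / (24 * G4)

def w_eff(A):
    """W_eff(A, S) = W_single(A, S) - gt2 A^2, from eqs. (106)-(108)."""
    d = delta_of(A)
    w_single = (N * S * math.log(LAM**2 / d)
                + N * ((G2 + 4 * GT2 * A) * d + 6 * G4 * d * d))
    return w_single - GT2 * A * A

def stationary_a(iterations=100):
    """Solve A = 2 N Delta(A), eq. (110), by fixed-point iteration."""
    A = 0.0
    for _ in range(iterations):
        A = 2 * N * delta_of(A)
    return A

def double_trace_result():
    """Delta from eq. (112) and W_eff from eq. (111)."""
    k = 12 * G4 + 8 * GT2 * N
    d = (-G2 + math.sqrt(G2 * G2 + 4 * k * S)) / (2 * k)
    w = N * S * math.log(LAM**2 / d) + N * d * (G2 + (4 * GT2 * N + 6 * G4) * d)
    return d, w

A_star = stationary_a()
d_star, w_star = double_trace_result()
h = 1e-5
slope = (w_eff(A_star + h) - w_eff(A_star - h)) / (2 * h)  # central difference
print(A_star, 2 * N * d_star, w_eff(A_star), w_star, slope)
```

With these inputs the fixed point should coincide with $2N\Delta$ computed from (112), the value of $W_{\rm eff}$ at the fixed point should match (111), and the finite-difference slope in $A$ should vanish up to discretization error.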
Equations (111) and (112) are of course the double-trace effective glueball superpotential we computed previously in (79) and (80).

Why does this procedure reproduce precisely the sum of pasted diagrams that contribute to the double-trace superpotential in (40)? From the point of view of perturbation theory, we are doing the following. If we treat $A$ as a constant, we should simply sum the planar diagrams in the theory with a quartic superpotential, and after doing so, we obtain the superpotential $\int d^4x\, d^2\theta\, W$ with

$$W = W_{\rm connected\ planar}(S, g_2 + 4\tilde g_2 A, g_4) - \tilde g_2 A^2. \tag{113}$$

Next, we should integrate out $A$. To do so, we write $A = A_0 + \tilde A$, where $A_0$ solves $\partial W/\partial A = 0$. We see that $W$ becomes

$$W = W_{\rm connected\ planar}(S, g_2 + 4\tilde g_2 A_0, g_4) - \tilde g_2 A_0^2 + c_2\tilde A^2 + c_3\tilde A^3 + \ldots. \tag{114}$$

What is the meaning of integrating over $\tilde A$? From the diagrammatic point of view, $\tilde A$ is the field that allows momentum to flow through the $\tilde g_2$ vertices. All diagrams where such momentum flow is prohibited are taken into account by the background value $A_0$. Thus, picking $A_0$ takes the pasting process into account, whereas the further integrals over $\tilde A$ should correspond to pinching diagrams. We already know that these latter diagrams should vanish from our diagrammatic analysis, and therefore we should simply drop all terms involving $\tilde A$. The final answer for $W$ is thus

$$W = W_{\rm connected\ planar}(S, g_2 + 4\tilde g_2 A_0, g_4) - \tilde g_2 A_0^2. \tag{115}$$
One can also see directly that integrating out $\tilde A$ gives no contribution to the superpotential. In the diagrams that one can write down, there will be many loops of $\tilde A$, but there are no vertices that can absorb any fermionic momentum, and therefore these diagrams do not yield any contribution to the superpotential. It is an interesting exercise to verify explicitly that (115) is a generating function for pasted diagrams (which are all tree graphs), made out of building blocks corresponding to $W_{\rm connected\ planar}$.

5.2. Matrix model perspective. Above we argued that the methods of [19] show that the above linearization of the double-trace deformation via introduction of an auxiliary singlet field $A$ leads to a theory whose superpotential can be computed by a matrix model. First observe that the double-trace matrix model partition function can be linearized in traces by the introduction of an auxiliary parameter $A$, over which we integrate:

$$Z = \exp(-M^2 F_0^{\rm double}) = \int d^{M^2}(\Phi)\, \exp\left\{-M\left(\frac12 g_2\,{\rm Tr}(\Phi^2) + g_4\,{\rm Tr}(\Phi^4) + \tilde g_2\frac{({\rm Tr}(\Phi^2))^2}{M}\right)\right\}$$
$$\qquad = \int dA\; d^{M^2}(\Phi)\, \exp\left\{-M\left(\frac12(g_2 + 4\tilde g_2 A)\,{\rm Tr}(\Phi^2) + g_4\,{\rm Tr}(\Phi^4) - M\tilde g_2 A^2\right)\right\}. \tag{116}$$

This is the matrix model analog of the statement that the double-trace field theory can be generated by integrating out a gauge singlet. In terms of the free energy of the single-trace matrix model, this can be written as

$$\exp(-M^2 F_0^{\rm double}) = \int dA\, \exp(-M^2 F_0^{\rm single} + M^2\tilde g_2 A^2). \tag{117}$$

Hence to obtain the free energy of the double-trace matrix model, we need to solve the equation $\partial F_0^{\rm single}(A, S)/\partial A - 2\tilde g_2 A = 0$ for $A$ and substitute in $F_0^{\rm double} = F_0^{\rm single}(A, S) - \tilde g_2 A^2$, where we have used the identification from [18] that $S \sim M$. The resulting expression for the double-trace matrix model is, of course, the same as that obtained by mean field methods in Sect. 4.1. However, as emphasized in the previous sections, the derivative of this free energy with respect to $S$ does not yield the correct superpotential for the field theory.

Let us now contrast this with a different matrix model construction suggested by the field theory analysis in the previous subsection. Consider the matrix partition function

$$\tilde Z = \exp(-M^2 F_0^{\rm single}) = \int d^{M^2}(\Phi)\, \exp\left\{-M\left(\frac12(g_2 + 4\tilde g_2 A)\,{\rm Tr}(\Phi^2) + g_4\,{\rm Tr}(\Phi^4)\right)\right\}, \tag{118}$$

where $A$ is now treated as a fixed parameter of the matrix model in analogy with the constant $A$ appearing in the field theory superpotential. As we explained above, the arguments of [19] applied to the linearized model (105) show that in terms of $F_0^{\rm single}(A, S)$ (with $S \sim M$ as in [18])

$$W_{\rm eff}(A, S) = -NS\left(\log(S/\Lambda^2) - 1\right) + N\left.\frac{\partial F_0^{\rm single}}{\partial S}\right|_{{\rm constant}\ A} - \tilde g_2 A^2, \tag{119}$$
where $W_{\rm eff}$ is the field theory superpotential in (106). Note that we have not integrated out $A$ at this stage. Since the single-trace matrix model has reproduced $W_{\rm eff}(A, S)$, it is manifest that integrating out $A$ will correctly produce the double-trace superpotential as a function of the glueball, just as it did in the field theory. Notice that if we start with the single-trace matrix model (116) and integrate out the singlet $A$, we find the free energy of the double-trace model as indicated in (117), and $\partial F_0^{\rm double}/\partial S$ does not reproduce the field theory superpotential. However, if we first differentiate with respect to $S$ and then integrate out $A$ we reproduce the field theory result. Of course, the direct field theory computation described above indicates to us that this is the right order in which to do things.

5.3. General Multi-trace Operators. Finally, we show that the procedure of introducing auxiliary parameters to linearize traces in a double-trace theory can be extended to a general multi-trace model. Consider the term

$${\rm Tr}(\Phi^{m_1})\,{\rm Tr}(\Phi^{m_2}) \tag{120}$$

in the superpotential. We can rewrite this in terms of single trace terms by introducing four gauge singlet fields $A_i$, $i = 1 \cdots 4$, as

$$W_2 = 3\left(A_1^2 + A_2^2 + A_1 A_2 + A_1\,{\rm Tr}\,\Phi^{m_1} + A_2\,{\rm Tr}\,\Phi^{m_2} + \frac{2}{\sqrt3}A_3\,{\rm Tr}\,\Phi^{m_1} - A_3^2 + \frac{2}{\sqrt3}A_4\,{\rm Tr}\,\Phi^{m_2} - A_4^2\right). \tag{121}$$

Integrating out $A_i$ by setting $\partial W_2/\partial A_i = 0$, solving for $A_i$ and substituting in (121) yields the double trace superpotential (120). (Writing $X = {\rm Tr}\,\Phi^{m_1}$ and $Y = {\rm Tr}\,\Phi^{m_2}$, the $A_3$ and $A_4$ terms contribute $(X^2 + Y^2)/3$ inside the bracket and the $A_1$, $A_2$ terms contribute $-(X^2 - XY + Y^2)/3$, leaving $W_2 = XY$.) To generate a term of the form

$${\rm Tr}(\Phi^{m_1})\,{\rm Tr}(\Phi^{m_2})\,{\rm Tr}(\Phi^{m_3}), \tag{122}$$

we iterate the above procedure twice, i.e. we introduce additional gauge singlet fields $B_i$, $i = 1 \cdots 4$, and consider the theory with a superpotential

$$W_3 = 3\left(B_1^2 + B_2^2 + B_1 B_2 + B_1\,{\rm Tr}\,\Phi^{m_3} + B_2 W_2 + \frac{2}{\sqrt3}B_3\,{\rm Tr}\,\Phi^{m_3} - B_3^2 + \frac{2}{\sqrt3}B_4 W_2 - B_4^2\right), \tag{123}$$

where $W_2$ is defined in (121). Integrating out $A_i$ and $B_i$ for $i = 1 \cdots 4$ yields the term (122). Generalization to terms with more traces is obvious.

6. Conclusion

We have studied an $N = 1$ $U(N)$ gauge theory with adjoint chiral matter and a double-trace tree-level superpotential. We found that the computation of the effective superpotential as a function of the glueball superfield localizes to summing a set of matrix integrals. The associated set of matrix diagrams have the structure of tree diagrams in which double-trace vertices are strung together by "propagators" and "external" legs that are themselves connected single-trace diagrams. We showed that the Seiberg-Witten solution to $N = 2$ field theories computes an effective superpotential for the double-trace theory that matches our direct analysis. The use of factorization in our Seiberg-Witten analysis, namely that $\langle{\rm Tr}(\Phi^2)^2\rangle = \langle{\rm Tr}(\Phi^2)\rangle^2$, was equivalent in our perturbative computations to the vanishing of any diagrams where extra momentum loops were introduced by the double-trace vertices. Next, we showed that the large-$M$ limit of the standard double-trace $U(M)$ matrix model does sum up the same set of matrix diagrams, but the combinatorial factors are different from those appearing in the field theory. In particular, the double-trace field theory superpotential is not computed by a derivative of the double-trace matrix model free energy. However, there is a simple matrix model prescription for computing a multi-trace field theory superpotential. We can linearize the traces by introducing some auxiliary singlets, and the associated single trace matrix model free energy can be manipulated to compute the desired multi-trace field theory result.

Our results raise several issues:
1. Does the large-$N$ limit of the standard double-trace matrix model compute the superpotential for some $N = 1$ field theory?
2. We have worked in the vacuum with an unbroken gauge group. It would be good to generalize our arguments to the other vacua with partially broken gauge symmetry.
3. In Sect. 2.6 we pointed out that there is an intriguing connection between the contributions made by multi-trace vertices to the superpotential of a local theory and certain Feynman diagrams of an associated nonlocal theory. It would be very interesting to flesh out this connection.
4. In a $U(N)$ theory with adjoint $\Phi$, the operator ${\rm Tr}(\Phi^K)$ with $K > N$ decomposes into a sum of multi-trace operators. This decomposition can receive quantum corrections as discussed in [12]. How do our arguments generalize to this case?
5. We can also add baryon-like operators like $\det(\Phi)$ to the superpotential (for theories with fundamental matter in the context of matrix models, baryons were studied by [4, 7]). Such operators also decompose into sums of products of traces, and are very interesting because, even without fundamental matter, they can give rise to an open string sector in Yang-Mills theory as opposed to the standard closed string sector that the 't Hooft expansion leads us to expect [6, 1]. It would be useful to understand in this case how and whether the computation of holomorphic data in such a theory localizes to sums of matrix integrals.

In addition to these directions there are some interesting applications that arise from known facts about the large-$N$ limit of the standard double-trace $U(N)$ matrix model. This theory is related to two-dimensional gravity with a positive cosmological constant and displays phase transitions between branched polymer and smooth phases of two-dimensional gravity [15]. Presumably such phase transitions manifest themselves as interesting phenomena in a four-dimensional field theory.

Acknowledgement. Work on this project at the University of Pennsylvania was supported by the DOE under cooperative research agreement DE-FG02-95ER40893, and by an NSF Focused Research Grant DMS0139799 for "The Geometry of Superstrings". This research was also supported by the Institute for Advanced Study, under the NSF grant PHY-0070928 and the Virginia Polytechnic Institute and State University, under the DOE grant DE-FG05-92ER40709. We gratefully acknowledge J. Erlich for participation in the earlier stages of the work and D. Berenstein, P. Berglund, F. Cachazo, R. Dijkgraaf, A. Hanany, K. Intriligator, P. Kraus, R. Leigh, M. Mariño, D. Minic, J. McGreevy, N. Seiberg, and C. Vafa for comments and revelations. We thank Ravi Nicholas Balasubramanian for inspirational babbling. BF and VJ also express their appreciation of the most generous hospitality of the High Energy Group at the University of Pennsylvania; they and YHH further toast W. Buchanan of La Reserve B & B for his warm congeniality.
References
1. Aharony, O., Antebi, Y.E., Berkooz, M., Fishman, R.: "Holey sheets": Pfaffians and subdeterminants as D-brane operators in large N gauge theories. arXiv:hep-th/0211152
2. Aharony, O., Berkooz, M., Silverstein, E.: Multiple-trace operators and non-local string theories. hep-th/0105309
3. Argurio, R., Campos, V.L., Ferretti, G., Heise, R.: Exact superpotentials for theories with flavors via a matrix integral. arXiv:hep-th/0210291
4. Argurio, R., Campos, V.L., Ferretti, G., Heise, R.: Baryonic corrections to superpotentials from perturbation theory. arXiv:hep-th/0211249
5. Ashok, S.K., Corrado, R., Halmagyi, N., Kennaway, K.D., Romelsberger, C.: Unoriented strings, loop equations, and N=1 superpotentials from matrix models. arXiv:hep-th/0211291
6. Balasubramanian, V., Huang, M.X., Levi, T.S., Naqvi, A.: Open strings from N = 4 super Yang-Mills. JHEP 0208, 037 (2002) [arXiv:hep-th/0204196]
7. Bena, I., Roiban, R., Tatar, R.: Baryons, boundaries and matrix models. arXiv:hep-th/0211271
8. Berenstein, D.: Reverse geometric engineering of singularities. JHEP 04, 052 (2002) http://arXiv.org/abs/hep-th/0201093
9. Berenstein, D.: Quantum moduli spaces from matrix models. arXiv:hep-th/0210183
10. Bitsadze, A.V.: Integral equations of first kind. Carleman, T.: Über die Abelsche Integralgleichung mit Konstanten Integrationsgrenzen. Mathematische Zeitschrift 15, 111–120 (1922)
11. Brezin, E., Itzykson, C., Parisi, G., Zuber, J.B.: Planar diagrams. Commun. Math. Phys. 59, 35 (1978)
12. Cachazo, F., Douglas, M.R., Seiberg, N., Witten, E.: Chiral rings and anomalies in supersymmetric gauge theory. arXiv:hep-th/0211170
13. Cachazo, F., Intriligator, K.A., Vafa, C.: A large N duality via a geometric transition. Nucl. Phys. B603, 3–41 (2001) http://arXiv.org/abs/hep-th/0103067
14. Cachazo, F., Vafa, C.: N = 1 and N = 2 geometry from fluxes. http://arXiv.org/abs/hep-th/0206017
15. Das, S.R., Dhar, A., Sengupta, A.M., Wadia, S.R.: New critical behavior in d = 0 large N matrix models. Mod. Phys. Lett. A5, 1041–1056 (1990)
16. Dijkgraaf, R., Vafa, C.: Matrix models, topological strings, and supersymmetric gauge theories. Nucl. Phys. B644, 3–20 (2002) http://arXiv.org/abs/hep-th/0206255
17. Dijkgraaf, R., Vafa, C.: On geometry and matrix models. Nucl. Phys. B644, 21–39 (2002) http://arXiv.org/abs/hep-th/0207106
18. Dijkgraaf, R., Vafa, C.: A perturbative window into non-perturbative physics. http://arXiv.org/abs/hep-th/0208048
19. Dijkgraaf, R., Grisaru, M.T., Lam, C.S., Vafa, C., Zanon, D.: Perturbative computation of glueball superpotentials. http://arXiv.org/abs/hep-th/0211017
20. Dijkgraaf, R., Neitzke, A., Vafa, C.: Large N strong coupling dynamics in non-supersymmetric orbifold field theories. arXiv:hep-th/0211194
21. Dijkgraaf, R., Sinkovics, A., Temurhan, M.: Matrix models and gravitational corrections. arXiv:hep-th/0211241
22. Dijkgraaf, R., Gukov, S., Kazakov, V.A., Vafa, C.: Perturbative analysis of gauged matrix models. arXiv:hep-th/0210238
23. Dorey, N., Hollowood, T.J., Kumar, S.P., Sinkovics, A.: Massive vacua of N = 1∗ theory and S-duality from matrix models. arXiv:hep-th/0209099
24. Dorey, N., Hollowood, T.J., Prem Kumar, S., Sinkovics, A.: Exact superpotentials from matrix models. arXiv:hep-th/0209089
25. Douglas, M., Shenker, S.: Dynamics of SU(N) supersymmetric gauge theory. hep-th/9503163
26. Feng, B.: Geometric dual and matrix theory for SO/Sp gauge theories. hep-th/0212010
27. Feng, B.: Seiberg duality in matrix model. arXiv:hep-th/0211202
28. Feng, B., He, Y.H.: Seiberg duality in matrix models II. arXiv:hep-th/0211234
29. Ferrari, F.: On exact superpotentials in confining vacua. arXiv:hep-th/0210135
30. Gubser, S.S., Mitra, I.: Double-trace operators and one-loop vacuum energy in AdS/CFT. arXiv:hep-th/0210093
31. Intriligator, K.: Integrating in and exact superpotentials in 4d. hep-th/9407106
32. Ita, H., Nieder, H., Oz, Y.: Perturbative computation of glueball superpotentials for SO(N) and USp(N). arXiv:hep-th/0211261
33. Janik, R.A., Obers, N.A.: SO(N) Superpotential, Seiberg-Witten Curves and Loop Equations. arXiv:hep-th/0212069
34. Klebanov, I.R., Hashimoto, A.: Nonperturbative solution of matrix models modified by trace squared terms. Nucl. Phys. B434, 264–282 (1995) http://arXiv.org/abs/hep-th/9409064
35. Klemm, A., Marino, M., Theisen, S.: Gravitational corrections in supersymmetric gauge theory and matrix models. arXiv:hep-th/0211216
36. McGreevy, J.: Adding flavor to Dijkgraaf-Vafa. hep-th/0211009
37. Seiberg, N., Witten, E.: Monopole condensation, and confinement in N = 2 supersymmetric Yang-Mills theory. hep-th/9407087; Monopoles, duality and chiral symmetry breaking in N = 2 supersymmetric QCD. hep-th/9408099
38. Veneziano, G., Yankielowicz, S.: An effective lagrangian for the pure N = 1 supersymmetric Yang-Mills theory. Phys. Lett. B 113, 231 (1982)
Communicated by M.R. Douglas
Commun. Math. Phys. 242, 393–423 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0906-5
Communications in
Mathematical Physics
Sum Rules and the Szegő Condition for Orthogonal Polynomials on the Real Line
Barry Simon, Andrej Zlatoš
Mathematics 253-37, California Institute of Technology, Pasadena, CA 91125, USA. E-mail: [email protected]; [email protected]
Received: 27 January 2003 / Accepted: 25 March 2003
Published online: 10 October 2003 – © Springer-Verlag 2003
Abstract: We study the Case sum rules, especially $C_0$, for general Jacobi matrices. We establish situations where the sum rule is valid. Applications include an extension of Shohat's theorem to cases with an infinite point spectrum and a proof that if $\lim n(a_n - 1) = \alpha$ and $\lim n b_n = \beta$ exist and $2\alpha < |\beta|$, then the Szegő condition fails.

1. Introduction

This paper discusses the relation among three objects well known to be in one-one correspondence: nontrivial (i.e., not supported on a finite set) probability measures, $\nu$, of bounded support in $\mathbb{R}$; orthogonal polynomials associated to geometrically bounded moments; and bounded Jacobi matrices. One goes from measure to polynomials via the Gram-Schmidt procedure, from polynomials to Jacobi matrices by the three-term recurrence relation, and from Jacobi matrices to measures by the spectral theorem. We will use $J$ to denote the Jacobi matrix ($a_n > 0$)

$$J = \begin{pmatrix} b_1 & a_1 & 0 & \dots \\ a_1 & b_2 & a_2 & \dots \\ 0 & a_2 & b_3 & \dots \\ \dots & \dots & \dots & \dots \end{pmatrix}. \tag{1.1}$$

$\nu$ will normally denote the spectral measure of the vector $\delta_1 \in \ell^2(\mathbb{Z}_+)$ and $P_n(x)$ the orthonormal polynomials. We are interested in $J$'s close to the free Jacobi matrix, $J_0$, with $b_n = 0$, $a_n = 1$, and $d\nu_0(E) = (2\pi)^{-1}\chi_{[-2,2]}\sqrt{4 - E^2}\,dE$. Most often, we will suppose $J - J_0$ is compact. That means $\sigma_{\rm ess}(J) = [-2, 2]$ and $J$ has only eigenvalues outside $[-2, 2]$, of multiplicity one, denoted $E_j^\pm$ with $E_1^+ > E_2^+ > \cdots > 2$ and $E_1^- < E_2^- < \cdots < -2$.
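For the free Jacobi matrix $J_0$ the correspondence between the three objects can be made concrete. With $a_n = 1$, $b_n = 0$ the three-term recurrence reads $x P_n = P_{n+1} + P_{n-1}$, and the resulting polynomials should be orthonormal with respect to $d\nu_0$. The sketch below checks this numerically; the function names and the quadrature (a midpoint rule after the substitution $x = 2\cos\theta$, under which $d\nu_0 = (2/\pi)\sin^2\theta\,d\theta$) are illustrative choices, not part of the paper.

```python
import math

def free_polys(x, n_max):
    """P_0..P_{n_max} at x for J_0: P_{n+1} = x P_n - P_{n-1}, P_0 = 1, P_{-1} = 0."""
    p = [1.0, x]
    for _ in range(n_max - 1):
        p.append(x * p[-1] - p[-2])
    return p[: n_max + 1]

def gram_matrix(n_max, nodes=200):
    """Inner products <P_m, P_n> in L^2(d nu_0), computed with x = 2 cos(theta)."""
    h = math.pi / nodes
    g = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    for k in range(nodes):
        theta = (k + 0.5) * h
        x = 2 * math.cos(theta)
        # d nu_0 = (2 pi)^{-1} sqrt(4 - x^2) dx becomes (2/pi) sin^2(theta) d(theta)
        weight = (2 / math.pi) * math.sin(theta) ** 2 * h
        p = free_polys(x, n_max)
        for m in range(n_max + 1):
            for n in range(n_max + 1):
                g[m][n] += weight * p[m] * p[n]
    return g

gram = gram_matrix(4)
for row in gram:
    print([round(v, 12) for v in row])
```

The printed Gram matrix should be the identity up to rounding, which is the statement that the recurrence coefficients of $J_0$ and the measure $\nu_0$ describe the same object.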
Supported in part by NSF grant DMS-9707661.
One of the main objects of study here is the Szegő integral

$$Z(J) = \frac{1}{2\pi}\int_{-2}^{2} \ln\left(\frac{\sqrt{4 - E^2}}{2\pi\, d\nu_{\rm ac}/dE}\right)\frac{dE}{\sqrt{4 - E^2}}. \tag{1.2}$$

The Szegő integral is often taken in the literature as

$$(2\pi)^{-1}\int_{-2}^{2} \ln\left(\frac{d\nu_{\rm ac}}{dE}\right)\frac{dE}{\sqrt{4 - E^2}},$$

which differs from $Z(J)$ by a constant and a critical minus sign (so the common condition that the Szegő integral not be $-\infty$ becomes $Z(J) < \infty$ in our normalization). There is an enormous literature discussing when $Z(J) < \infty$ holds (see, e.g., [1, 2, 7, 9, 13, 14, 16, 17, 22, 24]). It can be shown by Jensen's inequality that $Z(J) \geq -\frac12\ln(2)$, so the integral can only diverge to $+\infty$. We will focus here on various sum rules that are valid. One of our main results is the following:

Theorem 1. Suppose

$$A_0(J) = \lim_{N\to\infty}\left(-\sum_{n=1}^{N} \ln(a_n)\right) \tag{1.3}$$

exists (although it may be $+\infty$ or $-\infty$). Consider the additional quantities $Z(J)$ given by (1.2) and

$$E_0(J) = \sum_{\pm}\sum_{j} \ln\left(\frac12\left(|E_j^\pm| + \sqrt{(E_j^\pm)^2 - 4}\right)\right). \tag{1.4}$$

If any two of the three quantities $A_0(J)$, $E_0(J)$, and $Z(J)$ are finite, then all three are, and

$$Z(J) = A_0(J) + E_0(J). \tag{1.5}$$

Remarks.
1. It is not hard to see that $E_0(J) < \infty$ if and only if

$$\sum_{\pm}\sum_{j} \sqrt{(E_j^\pm)^2 - 4} < \infty. \tag{1.6}$$
2. The full theorem (Theorem 4.1) does not require the limit (1.3) to exist, but is more complicated to state in that case.
3. If the three quantities are finite, many additional sum rules hold.
4. This is what Killip-Simon [11] call the $C_0$ sum rule.
5. Peherstorfer-Yuditskii [17] (see their remark after Lemma 2.1) prove that if $Z(J) < \infty$, $E_0(J) = \infty$, then the limit in (1.3) is also infinite.

Theorem 1 is an analog for the real line of a seventy-year old theorem for orthogonal polynomials on the unit circle:

$$\frac{1}{2\pi}\int_0^{2\pi} \ln\left(\frac{d\nu_{\rm ac}}{d\theta}\right)d\theta = \sum_{j=0}^{\infty} \ln(1 - |\alpha_j|^2), \tag{1.7}$$

where $\{\alpha_j\}_{j=1}^{\infty}$ are the Verblunsky coefficients (also called reflection, Geronimus, Schur, or Szegő coefficients) of $\nu$. This result was first proven by Verblunsky [27] in 1935, although it is closely related to Szegő's 1920 paper [24]. For $J$'s with $J - J_0$ finite rank (and perhaps even with $\sum_{n=1}^{\infty} n(|a_n - 1| + |b_n|) < \infty$), the sum rule (1.5) is due to Case [2]. Recently, Killip-Simon [11] showed how to exploit these sum rules as a spectral tool (motivated in turn by work on Schrödinger operators by Deift-Killip [5] and Denissov [6]). In particular, Killip-Simon emphasized the importance of proving sum rules on as large a class of $J$'s as possible. One application we will make of Theorem 1 and related ideas is to prove the following (≡ Theorem 5.2):

Theorem 2. Suppose $\sigma_{\rm ess}(J) \subset [-2, 2]$ and (1.6) holds. Then $Z(J) < \infty$ if and only if

$$\liminf_{N}\left(-\sum_{n=1}^{N} \ln(a_n)\right) < \infty. \tag{1.8}$$

Moreover, if these conditions hold, then
(i) The limit $A_0(J)$ in (1.3) exists and is finite.
(ii) $\lim_{N\to\infty} \sum_{n=1}^{N} b_n$ exists and is finite.
(iii)
$$\sum_{n=1}^{\infty} (a_n - 1)^2 + \sum_{n=1}^{\infty} b_n^2 < \infty. \tag{1.9}$$
Results of this genre when it is assumed that $\sigma(J) = [-2, 2]$ go back to Shohat [22] with important contributions by Nevai [14]. The precise form is from Killip-Simon [11]. Nikishin [16] showed how to extend this to Jacobi matrices with finitely many eigenvalues. Peherstorfer-Yuditskii [17] proved $Z(J) < \infty$ implies (i) under the condition $E_0(J) < \infty$, allowing an infinity of eigenvalues for the first time. Our result cannot extend to situations with $E_0(J) = \infty$ since Theorem 1 says if (i) holds and $Z(J) < \infty$, then $E_0(J) < \infty$. We will highlight one other result we will prove later (Corollary 6.3).
Theorem 3. Let $a_n$, $b_n$ be Jacobi matrix parameters so that
\[ \lim_{n\to\infty} n(a_n - 1) = \alpha, \qquad \lim_{n\to\infty} n b_n = \beta \tag{1.10} \]
exist and are finite. Suppose that
\[ |\beta| > 2\alpha. \tag{1.11} \]
Then $Z(J) = \infty$.
Remark. In particular, if $\alpha < 0$, (1.11) always holds. Equation (1.11) describes three-quarters of the $(2\alpha, \beta)$ plane. In Sect. 6, we will discuss the background for this result, and describe results of Zlatoš [28] that show if $|\beta| \le 2\alpha$ and one has additional information on the approach to the limit (1.10), then $Z(J) < \infty$. Thus Theorem 3 captures the precise region where one has (1.10) and one can hope to prove $Z(J) = \infty$. Theorem 3 will actually follow from a more general result (see Theorems 4.4, 6.1, and 6.2).
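Theorem 4. Suppose (1.9) holds and that either $\limsup_{n\to\infty}\bigl(-\sum_{j=1}^{n} (a_j - 1 + \tfrac12 b_j)\bigr) = \infty$ or $\limsup_{n\to\infty}\bigl(-\sum_{j=1}^{n} (a_j - 1 - \tfrac12 b_j)\bigr) = \infty$. Then $Z(J) = \infty$.
The main technique in this paper exploits the m-function, the Borel transform of the measure, $\nu$:
\[ m_\nu(E) = \int \frac{d\nu(x)}{x - E}. \tag{1.12} \]
Since $\nu$ is supported on $[-2, 2]$ plus the set of points $\{E_j^\pm\}$, we can write
\[ m_\nu(E) = \sum_{\pm}\sum_{j} \frac{\nu(\{E_j^\pm\})}{E_j^\pm - E} + \int_{-2}^{2} \frac{d\nu(x)}{x - E}. \tag{1.13} \]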
It is useful to transfer everything to the unit circle, using the fact that $z \mapsto E = z + z^{-1}$ maps $D = \{z \mid |z| < 1\}$ onto the cut plane $\mathbb{C}\setminus[-2, 2]$. Thus we can define for $|z| < 1$,
\[ M(z) = -m_\nu(z + z^{-1}). \tag{1.14} \]
The minus sign is picked so $\operatorname{Im} M(z) > 0$ if $\operatorname{Im} z > 0$. We use $M(z; J)$ when we want to make the $J$-dependence explicit. The function $M$ is meromorphic in $D$ with poles at $(\beta_j^\pm)^{-1}$ such that
\[ E_j^\pm = \beta_j^\pm + (\beta_j^\pm)^{-1} \tag{1.15} \]
with $|\beta_j^\pm| > 1$. We sometimes drop the explicit $\pm$ symbol and count the $\beta_j$'s in one set. We define a signed measure $d\mu^\#$ on $[0, 2\pi]$ by $\operatorname{Im} M(re^{i\theta})\,d\theta \to d\mu^\#(\theta)$ weakly as $r \uparrow 1$. Hence $\mu^\#$ is positive on $(0, \pi)$ and negative on $(\pi, 2\pi)$. Actually, $\overline{M(z)} = M(\bar z)$ implies $d\mu^\#(\pi + \theta) = -d\mu^\#(\pi - \theta)$, so we let
\[ \mu \equiv \mu^\# \restriction [0, \pi]. \tag{1.16} \]
By general principles [21],
\[ \operatorname{Im} M(e^{i\theta}) \equiv \lim_{r\uparrow 1} \operatorname{Im} M(re^{i\theta}) = \frac{d\mu_{ac}}{d\theta}(\theta) = \pi\,\frac{d\nu_{ac}}{dx}(2\cos\theta) \tag{1.17} \]
for a.e. $\theta \in (0, \pi)$. One actually has that if
\[ d\tilde\mu(\theta) \equiv 2\sin\theta\, d\mu^\#(\theta) = 2|\sin\theta|\, d|\mu^\#|(\theta) \tag{1.18} \]
then for any interval $I \subset (0, \pi) \cup (\pi, 2\pi)$,
\[ \tilde\mu(I) = \pi\nu(2\cos I). \tag{1.19} \]
The reason why we exclude 0, π, 2π is possible mass points of ν at ±2. These do not translate to µ# because Im M(±r) = 0 (notice that r + r −1 → ±2 as r → ±1 is not a nontangential limit).
By (1.19), $\tilde\mu([0, 2\pi]) \le 2\pi$, so $\tilde\mu$ is a finite measure. This need not be true for $\mu^\#$, as can be seen from (1.18) and (1.19). Indeed, these formulae show that $\mu^\#$ is finite if and only if
\[ \int_{-2}^{2} \chi_{(-2,2)}(x)\,\frac{d\nu(x)}{\sqrt{4 - x^2}} < \infty, \tag{1.20} \]
where $\chi_{(-2,2)}$ ensures that possible mass points at $\pm 2$ do not enter here. We would now like to write (1.13) (or rather its imaginary part) in terms of $M$. The pole terms (including those at $\pm 2$, if they are present) translate directly, and so
\[ \operatorname{Im} M(z) = \operatorname{Im} \sum_{\pm} \frac{\mu(\{\pm 1\})}{z + z^{-1} \mp 2} + \operatorname{Im} \sum_{j} \frac{\mu(\{\beta_j^{-1}\})}{z + z^{-1} - (\beta_j + \beta_j^{-1})} + K(z), \tag{1.21} \]
where
\[ K(z) \equiv \operatorname{Im} \int_{-2}^{2} \chi_{(-2,2)}(x)\,\frac{d\nu(x)}{z + z^{-1} - x} \]
and we use $\mu(\{\beta_j^{-1}\})$ for the weights $\nu(\{E_j\})$ (and $\mu(\{\pm 1\})$ for $\nu(\{\pm 2\})$). We note that since $\mu(\{\beta_j^{-1}\})$ are point masses of a probability measure, we have
\[ \sum_{j} \mu(\{\beta_j^{-1}\}) \le 1 \]
with the $\mu(\{\pm 1\})$ terms included in the sum as $\beta_\infty^\pm \equiv \pm 1$.
We will rewrite $K(z)$ in terms of the Poisson kernel. Assume first that (1.20) holds, that is, $\mu^\#$ is a finite measure. Notice that $K(re^{i\theta})$ is a harmonic function in $D$. Moreover, since the imaginary parts of the pole terms go to 0 as $r \uparrow 1$,
$K(re^{i\theta}) - \operatorname{Im} M(re^{i\theta}) \to 0$ as $r \uparrow 1$, uniformly for $\theta$ in compact subsets of $(0, \pi) \cup (\pi, 2\pi)$. This means that $K(re^{i\theta})\,d\theta \to d\mu^\#(\theta)$ weakly as measures on $(0, \pi) \cup (\pi, 2\pi)$. Clearly $K(\pm 1) = 0$ and $\mu^\#(\{0, \pi\}) = 0$, and so $K(re^{i\theta})\,d\theta \to d\mu^\#(\theta)$ as measures on $[0, 2\pi]$. Since $K(z)$ is harmonic in $D$, it follows (see, e.g., [21]) that
\[ K(z) = \int_0^{2\pi} P(z, e^{i\varphi})\,\frac{d\mu^\#(\varphi)}{2\pi}, \tag{1.22} \]
where $P(z, w)$, with $|z| < 1$, $|w| = 1$, is the Poisson kernel
\[ P(z, w) \equiv \frac{1 - |z|^2}{|z - w|^2} \tag{1.23} \]
or
\[ P_r(\theta, \varphi) \equiv P(re^{i\theta}, e^{i\varphi}) = \frac{1 - r^2}{1 + r^2 - 2r\cos(\theta - \varphi)}. \tag{1.24} \]
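Then using the fact that $\mu^\#$ is odd under reflection, we can rewrite (1.21) as
\[ \operatorname{Im} M(re^{i\theta}) = \operatorname{Im} \sum_{j} \frac{\mu(\{\beta_j^{-1}\})}{re^{i\theta} + r^{-1}e^{-i\theta} - (\beta_j + \beta_j^{-1})} + \int_0^{\pi} D_r(\theta, \varphi)\,\frac{d\mu(\varphi)}{2\pi}, \tag{1.25} \]
where
\[ D_r(\theta, \varphi) = P_r(\theta, \varphi) - P_r(\theta, -\varphi). \tag{1.26} \]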
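This is because $\overline{M(z)} = M(\bar z)$, so that $\mu = -\mu^\# \restriction [-\pi, 0]$. As we shall see, it turns out that (1.25) holds even if (1.20) fails, although (1.22) is meaningless in that case. We only need to consider $\theta, \varphi \in [0, \pi]$. Then obviously $D_r(\theta, \varphi) \ge 0$, and (1.24) and (1.26) show
\[ D_r(\theta, \varphi) = \frac{(1 - r^2)\,2r\,(\cos(\theta - \varphi) - \cos(\theta + \varphi))}{(1 + r^2 - 2r\cos(\theta - \varphi))(1 + r^2 - 2r\cos(\theta + \varphi))} = P_r(\theta, \varphi)\,\frac{4r\sin\theta\sin\varphi}{1 + r^2 - 2r\cos(\theta + \varphi)}. \]
Notice that if $\theta, \varphi \in [0, \pi]$, then $2r\cos(\theta + \varphi) \le 2r|\cos\theta|$. Since $r < 1$, we have
\[ \sin^2(\theta) + 2|\cos\theta| = 1 - \cos^2(\theta) + 2|\cos\theta| \le 2 \le \frac{1 + r^2}{r}, \]
which implies
\[ \frac{r\sin^2(\theta)}{1 + r^2 - 2r|\cos\theta|} \le 1. \]
Hence
\[ 0 \le D_r(\theta, \varphi) \le \frac{4P_r(\theta, \varphi)\sin\varphi}{\sin\theta} \tag{1.27} \]
for $\theta, \varphi \in (0, \pi)$ and $r < 1$. Using (1.19), the integral in (1.25) can now be estimated as
\[ 0 \le \int_0^{\pi} \frac{D_r(\theta, \varphi)}{\sin\varphi}\,\frac{d\tilde\mu(\varphi)}{4\pi} \le \frac{\|P_r\|_\infty}{\sin\theta}, \]
and so is finite (notice that if $\theta = 0, \pi$, then $D_r(\theta, \varphi) \equiv 0$). Moreover, if $\nu_n$ are probability measures which coincide with $\nu$ outside of $[-2, -2 + \tfrac1n] \cup [2 - \tfrac1n, 2]$, satisfy (1.20), and $\nu_n \to \nu$ weakly, then clearly for any $\theta \in [0, \pi]$ and $r < 1$, $M(re^{i\theta}; \nu_n) \to M(re^{i\theta}; \nu)$ and
\[ \int_0^{\pi} D_r(\theta, \varphi)\,d\mu_n(\varphi) \to \int_0^{\pi} D_r(\theta, \varphi)\,d\mu(\varphi). \]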
Since the sum in (1.25) is the same for νn and ν, and (1.25) holds for νn , it holds for ν.
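The closed form for $D_r$ and the bound (1.27) can be sanity-checked numerically. The following Python sketch is only an illustration: the function names, the sample radii, and the grid size are our own choices and are not part of the argument. It compares the definition (1.26) with the closed form and tests $0 \le D_r \le 4P_r\sin\varphi/\sin\theta$ on a grid of interior angles.

```python
import math

def poisson(r, theta, phi):
    """Poisson kernel P_r(theta, phi) from (1.24)."""
    return (1 - r * r) / (1 + r * r - 2 * r * math.cos(theta - phi))

def d_kernel(r, theta, phi):
    """D_r(theta, phi) = P_r(theta, phi) - P_r(theta, -phi), as in (1.26)."""
    return poisson(r, theta, phi) - poisson(r, theta, -phi)

def d_closed_form(r, theta, phi):
    """Closed form P_r * 4 r sin(theta) sin(phi) / (1 + r^2 - 2 r cos(theta + phi))."""
    denom = 1 + r * r - 2 * r * math.cos(theta + phi)
    return poisson(r, theta, phi) * 4 * r * math.sin(theta) * math.sin(phi) / denom

def check_kernel(r_values, n):
    """Return (largest mismatch between definition and closed form,
    whether 0 <= D_r <= 4 P_r sin(phi) / sin(theta) held everywhere)
    over an (n-1) x (n-1) grid of angles strictly inside (0, pi)."""
    worst = 0.0
    bound_holds = True
    for r in r_values:
        for i in range(1, n):
            theta = math.pi * i / n
            for k in range(1, n):
                phi = math.pi * k / n
                d = d_kernel(r, theta, phi)
                worst = max(worst, abs(d - d_closed_form(r, theta, phi)))
                upper = 4 * poisson(r, theta, phi) * math.sin(phi) / math.sin(theta)
                if d < 0 or d > upper:
                    bound_holds = False
    return worst, bound_holds

# Illustrative radii and grid size (hypothetical choices).
worst, bound_holds = check_kernel([0.5, 0.9, 0.99], 40)
print(worst, bound_holds)
```

The grid avoids the endpoints $\theta = 0, \pi$, where $D_r \equiv 0$ and the right side of (1.27) is not defined.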
Section 2, the technical core of the paper, proves some convergence results about integrals of $\ln[\operatorname{Im} M(re^{i\theta})]$. It is precisely such integrals that arise in Sect. 3 where, following Killip-Simon [11], we use the well-known $-m(z; J)^{-1} = z - b_1 + a_1^2\, m(z; J^{(1)})$, where $J^{(1)}$ is $J$ with the top row and leftmost column removed. We will be able to prove sum rules that compare $J$ and $J^{(1)}$. In Sect. 4, we will then list various sum rules, including Theorems 1 and 4. Section 5 proves Theorem 2 and Sect. 6 discusses Coulomb Jacobi matrices ($J - J_0$ decays as $n^{-1}$) and Theorem 3 in particular.
It is a pleasure to thank Mourad Ismail, Rowan Killip, and Paul Nevai for useful discussions.
2. Continuity of Integrals of ln(Im M)
In this section, we will prove a general continuity result about boundary values of interest for M-functions of the type defined in (1.24). We will consider suitable weight functions, $w(\varphi)$, on $[0, \pi]$, of which the examples of most interest are $w(\varphi) = \sin^k(\varphi)$, $k = 0$ or 2. Our goal is to prove that
\[ \lim_{r\uparrow 1} \int \ln[\operatorname{Im} M(re^{i\varphi})]\, w(\varphi)\,d\varphi = \int \ln[\operatorname{Im} M(e^{i\varphi})]\, w(\varphi)\,d\varphi \tag{2.1} \]
and that the convergence is in $L^1$ if the integral on the right is finite. All integrals in this section are from 0 to $\pi$ if not indicated otherwise. We define
\[ d(\varphi) \equiv \min(\varphi, \pi - \varphi) \tag{2.2} \]
and we suppose that
\[ 0 \le w(\varphi) \le C_1\, d(\varphi)^{-1+\alpha} \tag{2.3} \]
for some $C_1, \alpha > 0$ and that $w$ is $C^1$ with
\[ |w'(\varphi)\,w(\varphi)^{-1}| \le C_2\, d(\varphi)^{-\beta} \tag{2.4} \]
for $C_2, \beta > 0$. For weights of interest, one can take $\alpha = \beta = 1$.
Remarks. 1. For the applications in mind, we are only interested in allowing “singularities” (i.e., $w$ vanishing or going to infinity) at 0 or $\pi$, but all results hold with unchanged proofs if $d(\varphi) \equiv \min\{|\varphi - \varphi_j|\}$ for any finite set $\{\varphi_j\}$. For example, $w(\varphi) = \sin^2(m\varphi)$ as in [12] is fine. 2. Note that by (2.3), $\int_0^\pi w(\varphi)\,d\varphi < \infty$.
The main technical result we will need is:
Theorem 2.1. Let $M$ be a function with a representation of the form (1.25) and let $w$ be a weight function obeying (2.3) and (2.4). Then (2.1) holds. Moreover, if
\[ \int \ln[\operatorname{Im} M(e^{i\varphi})]\, w(\varphi)\,d\varphi > -\infty \tag{2.5} \]
(it is never $+\infty$), then
\[ \lim_{r\uparrow 1} \int \bigl|\ln[\operatorname{Im} M(re^{i\varphi})] - \ln[\operatorname{Im} M(e^{i\varphi})]\bigr|\, w(\varphi)\,d\varphi = 0. \tag{2.6} \]
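Let $\ln_\pm$ be defined by $\ln_\pm(y) = \max(0, \pm\ln(y))$ so $\ln(y) = \ln_+(y) - \ln_-(y)$, $|\ln(y)| = \ln_+(y) + \ln_-(y)$. We will prove Theorem 2.1 by proving
Theorem 2.2. For any $a > 0$ and $p < \infty$, $\ln_+[\operatorname{Im}(M(e^{i\varphi}))/a] \in L^p((0, \pi), w(\varphi)\,d\varphi)$, and
\[ \lim_{r\uparrow 1} \int \Bigl|\ln_+\Bigl(\frac{\operatorname{Im} M(re^{i\varphi})}{a}\Bigr) - \ln_+\Bigl(\frac{\operatorname{Im} M(e^{i\varphi})}{a}\Bigr)\Bigr|^p w(\varphi)\,d\varphi = 0. \tag{2.7} \]
Theorem 2.3. For any $a > 0$, we have
\[ \lim_{r\uparrow 1} \int \ln_-\Bigl(\frac{\operatorname{Im} M(re^{i\varphi})}{a}\Bigr) w(\varphi)\,d\varphi = \int \ln_-\Bigl(\frac{\operatorname{Im} M(e^{i\varphi})}{a}\Bigr) w(\varphi)\,d\varphi. \tag{2.8} \]
Proof of Theorem 2.1 given Theorems 2.2 and 2.3. By Fatou's lemma and the fact that for a.e. $\varphi$, $\operatorname{Im} M(re^{i\varphi}) \to \operatorname{Im} M(e^{i\varphi})$, we have
\[ \liminf_{r\uparrow 1} \int \ln_-[\operatorname{Im} M(re^{i\varphi})]\, w(\varphi)\,d\varphi \ge \int \ln_-[\operatorname{Im} M(e^{i\varphi})]\, w(\varphi)\,d\varphi. \tag{2.9} \]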
Since Theorem 2.2 says that sup0
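\[ \lim_{a\downarrow 0} \int \ln_-\Bigl(\frac{\operatorname{Im} M(e^{i\varphi})}{a}\Bigr) w(\varphi)\,d\varphi = 0 \]
since $\ln_-(y/a)$ is monotone decreasing to 0 as $a$ decreases. Given $\varepsilon$, first find $a$ so
\[ \int \ln_-\Bigl(\frac{\operatorname{Im} M(e^{i\varphi})}{a}\Bigr) w(\varphi)\,d\varphi < \frac{\varepsilon}{3} \]
and then, by (2.8), $r_1 < 1$ so for $r_1 < r < 1$,
\[ \int \ln_-\Bigl(\frac{\operatorname{Im} M(re^{i\varphi})}{a}\Bigr) w(\varphi)\,d\varphi < \frac{\varepsilon}{3}. \]
By (2.7), find $r_2 < 1$, so for $r_2 < r < 1$,
\[ \int \Bigl|\ln_+\Bigl(\frac{\operatorname{Im} M(re^{i\varphi})}{a}\Bigr) - \ln_+\Bigl(\frac{\operatorname{Im} M(e^{i\varphi})}{a}\Bigr)\Bigr|\, w(\varphi)\,d\varphi < \frac{\varepsilon}{3}. \]
Writing
\[ |\ln(\alpha) - \ln(\beta)| \le \Bigl|\ln_+\Bigl(\frac{\alpha}{a}\Bigr) - \ln_+\Bigl(\frac{\beta}{a}\Bigr)\Bigr| + \ln_-\Bigl(\frac{\alpha}{a}\Bigr) + \ln_-\Bigl(\frac{\beta}{a}\Bigr) \]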
we see that if $\max(r_1, r_2) < r < 1$, then
\[ \int \bigl|\ln[\operatorname{Im} M(re^{i\varphi})] - \ln[\operatorname{Im} M(e^{i\varphi})]\bigr|\, w(\varphi)\,d\varphi < \varepsilon \]
so (2.6) holds.
We will prove Theorem 2.2 by using the dominated convergence theorem and standard maximal function techniques. We let the maximal function of the measure $\tilde\mu$ defined in (1.18) be
\[ \tilde\mu^*(x) = \sup_{0 < a} \frac{\tilde\mu(x - a, x + a)}{2a}. \]
The Hardy-Littlewood maximal inequality for measures (see Rudin [21]) says that
\[ |\{x \mid \tilde\mu^*(x) > \lambda\}| \le \frac{3\tilde\mu(0, \pi)}{\lambda}. \tag{2.10} \]
Lemma 2.4. Let $M$ satisfy (1.25), and let $\alpha$ be the sum of the weights of the poles $(\beta_j^\pm)^{-1}$. Then for $0 < r < 1$ and $0 \le \theta \le \pi$,
\[ \operatorname{Im} M(re^{i\theta}) \le \tilde\mu^*(\theta)[\sin\theta]^{-1} + \alpha r^{-1}[\sin\theta]^{-2}. \tag{2.11} \]
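Proof. Since $D_r(\theta, \varphi) \le P_r(\theta, \varphi)$ and $P_r$ is a convolution operator with a positive even function of $\varphi$ decreasing on $[0, \pi]$ with $\int_0^{2\pi} P_r(\varphi)\,d\varphi/2\pi = 1$, we have, by standard calculations, (1.18), (1.19), and (1.27), that
\[ \int_0^{\pi} D_r(\theta, \varphi)\,\frac{d\mu(\varphi)}{2\pi} \le \int_0^{\pi} \frac{4P_r(\theta, \varphi)}{\sin\theta}\,\frac{d\tilde\mu(\varphi)}{4\pi} \le \frac{\tilde\mu^*(\theta)}{\sin\theta}. \]
On the other hand, for $|\beta| \ge 1$,
\[ \Bigl|\frac{1}{z + z^{-1} - \beta - \beta^{-1}}\Bigr| = \Bigl|\frac{z}{(z - \beta)(z - \beta^{-1})}\Bigr| \le \frac{|z|}{|\operatorname{Im} z|^2} = \frac{1}{r\sin^2\theta} \]
if $z = re^{i\theta}$, so summing the pole term shows,
\[ \operatorname{Im} \sum_{j} \frac{\mu(\{\beta_j^{-1}\})}{z + z^{-1} - \beta_j - \beta_j^{-1}} \le \sum_{j} \frac{\mu(\{\beta_j^{-1}\})}{r\sin^2(\theta)}. \]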
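Proof of Theorem 2.2. Let
\[ f_1(\theta) = \tilde\mu^*(\theta)[\sin\theta]^{-1}, \qquad f_2(\theta) = 2[\sin(\theta)]^{-2}. \]
For a.e. $\theta$, $\ln_+[(\operatorname{Im} M(re^{i\theta}))/a] \to \ln_+[(\operatorname{Im} M(e^{i\theta}))/a]$. By (2.11) for all $\tfrac12 < r < 1$, $\ln_+[(\operatorname{Im} M(re^{i\theta}))/a] \le \ln_+[(f_1(\theta) + f_2(\theta))/a]$. Thus if we prove that for all $p < \infty$,
\[ \int \Bigl|\ln_+\Bigl(\frac{f_1(\varphi) + f_2(\varphi)}{a}\Bigr)\Bigr|^p w(\varphi)\,d\varphi < \infty, \]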
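we obtain (2.7) by the dominated convergence theorem. Since $|\ln_+(x)|^p \le C(p, q)|x|^q$ for any $p < \infty$, $q > 0$, and suitable $C(p, q)$, and $|x + y|^q \le 2^q|x|^q + 2^q|y|^q$, it suffices to find some $q > 0$, so
\[ \int (|f_1(\varphi)|^q + |f_2(\varphi)|^q)\, w(\varphi)\,d\varphi < \infty. \]
Since for $v^{-1} + t^{-1} = 1$,
\[ \int |f_1(\varphi)|^q\, w(\varphi)\,d\varphi \le \Bigl(\int |f_1(\varphi)|^{qv}\,d\varphi\Bigr)^{1/v} \Bigl(\int |w(\varphi)|^t\,d\varphi\Bigr)^{1/t} \]
and $w(\varphi) \in L^t$ for some $t > 1$ by (2.3), it suffices to find some $s > 0$ with
\[ \int (|f_1(\varphi)|^s + |f_2(\varphi)|^s)\,d\varphi < \infty. \tag{2.12} \]
By (2.10) and Cauchy-Schwarz,
\[ \int |f_1(\varphi)|^s\,d\varphi \le \Bigl(\int |\sin\varphi|^{-2s}\,d\varphi\Bigr)^{1/2} \Bigl(\int |\tilde\mu^*(\varphi)|^{2s}\,d\varphi\Bigr)^{1/2} < \infty \]
and $\int |f_2(\varphi)|^s\,d\varphi < \infty$ whenever $s < \tfrac12$.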
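As a preliminary to the proof of Theorem 2.3, we need
Lemma 2.5. Let $w$ obey (2.4). Let $0 < \varphi_0 < \pi$ and let $\varphi_1, \varphi_2 \in [0, \pi]$ obey
(a) $d(\varphi_1) \ge d(\varphi_0)$, $d(\varphi_2) \ge d(\varphi_0)$, (2.13)
(b) $|\varphi_1 - \varphi_2| \le d(\varphi_0)^{\beta}$. (2.14)
Then for $C_3 = C_2\exp(C_2)$,
\[ \Bigl|\frac{w(\varphi_1)}{w(\varphi_2)} - 1\Bigr| \le C_3\,|\varphi_1 - \varphi_2|\, d(\varphi_0)^{-\beta}. \tag{2.15} \]
Proof.
\[ \Bigl|\frac{w(\varphi_1)}{w(\varphi_2)} - 1\Bigr| = \Bigl|\exp\Bigl(-\int_{\varphi_1}^{\varphi_2} \frac{w'(\eta)}{w(\eta)}\,d\eta\Bigr) - 1\Bigr| \le \bigl|\exp(C_2|\varphi_2 - \varphi_1|\, d(\varphi_0)^{-\beta}) - 1\bigr| \tag{2.16} \]
by (2.4) and (2.13). But $|e^x - 1| \le e^{|x|}|x|$, so by (2.14),
\[ \Bigl|\frac{w(\varphi_1)}{w(\varphi_2)} - 1\Bigr| \le C_2\exp(C_2)\,|\varphi_1 - \varphi_2|\, d(\varphi_0)^{-\beta}, \]
which is (2.15).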
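We will also need the following pair of lemmas:
Lemma 2.6. Let $0 < \eta < \theta < \pi - \eta$ and
\[ N_r(\theta, \eta) = \int_{\theta-\eta}^{\theta+\eta} D_r(\theta, \varphi)\,\frac{d\varphi}{2\pi}. \]
Then
\[ 0 \le [1 - N_r(\theta, \eta)] \le \frac{4(1 - r)}{r\sin^2(\eta)}. \tag{2.17} \]
Proof. We have
\[ 1 = \int_0^{2\pi} P_r(\theta, \varphi)\,\frac{d\varphi}{2\pi}, \]
so since $D_r \le P_r$, $N_r \le 1$ and
\[ 1 - N_r(\theta, \eta) \le \frac{2}{2\pi} \int_{\substack{\varphi\in[0,2\pi] \\ |\theta-\varphi|\ge\eta}} P_r(\theta, \varphi)\,d\varphi. \]
If $|\theta - \varphi| \ge \eta$, then
\[ P_r(\theta, \varphi) = \frac{1 - r^2}{(1 - r)^2 + 4r\sin^2[\tfrac12(\theta - \varphi)]} \le \frac{2(1 - r)}{4r\sin^2(\tfrac12\eta)} \le \frac{2(1 - r)}{r\sin^2(\eta)}, \]
and (2.17) is immediate.
Lemma 2.7. If $\int_0^\pi \operatorname{Im} M(e^{i\theta})\,d\theta \ne 0$, then for $\theta \in [0, \pi]$, $r \in (\tfrac12, 1)$,
\[ \operatorname{Im} M(re^{i\theta}) \ge c\,(r^{-1} - r)\sin\theta. \tag{2.18} \]
Proof. In terms of the real line $m$ function, for $E_2 > 0$, $E_1$ real,
\[ \operatorname{Im}[-m(E_1 - iE_2)] \ge \frac{E_2}{\pi} \int_{-2}^{2} \frac{\operatorname{Im} m(E)\,dE}{(E_1 - E)^2 + E_2^2}, \tag{2.19} \]
since we have dropped the positive contributions of $\nu_{sing}$ to $\operatorname{Im}(-m)$. Now if $z = re^{i\theta}$, $M(z) = -m(E_1 - iE_2)$, where $z + z^{-1} = E_1 - iE_2$ or $E_1 = (r + r^{-1})\cos\theta$, $E_2 = (r^{-1} - r)\sin\theta$. If $r > \tfrac12$, then $|E_1| \le \tfrac52$, $|E_2| \le \tfrac32$, and in (2.19), $|E| \le 2$. Thus $\operatorname{Im} M(z) \ge cE_2(z)$ which is (2.18).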
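As an informal check of Lemma 2.6, the sketch below approximates $N_r(\theta, \eta)$ with a midpoint rule and compares $1 - N_r$ with the right side of (2.17). The function names, the number of quadrature steps, and the sample values of $r$, $\theta$, $\eta$ are illustrative assumptions, chosen so that $0 < \eta < \theta < \pi - \eta$; it is a numerical illustration, not a proof.

```python
import math

def poisson(r, theta, phi):
    """Poisson kernel P_r(theta, phi) from (1.24)."""
    return (1 - r * r) / (1 + r * r - 2 * r * math.cos(theta - phi))

def n_r(r, theta, eta, steps=20000):
    """Midpoint-rule approximation of
    N_r(theta, eta) = int_{theta-eta}^{theta+eta} D_r(theta, phi) dphi / (2 pi),
    with D_r(theta, phi) = P_r(theta, phi) - P_r(theta, -phi)."""
    h = 2 * eta / steps
    total = 0.0
    for k in range(steps):
        phi = theta - eta + (k + 0.5) * h
        total += poisson(r, theta, phi) - poisson(r, theta, -phi)
    return total * h / (2 * math.pi)

def bound_2_17(r, eta):
    """Right side of (2.17): 4 (1 - r) / (r sin^2(eta))."""
    return 4 * (1 - r) / (r * math.sin(eta) ** 2)

# Hypothetical sample point with 0 < eta < theta < pi - eta.
r, theta, eta = 0.99, 1.0, 0.5
gap = 1 - n_r(r, theta, eta)
print(gap, bound_2_17(r, eta))
```

For $r$ far from 1 the bound exceeds 1 and carries no information, which is why the sample radius is taken close to 1.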
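Proof of Theorem 2.3. Since $\ln_-$ is a decreasing function, to get upper bounds on $\ln_-[\operatorname{Im} M(re^{i\theta})/a]$, we can use a lower bound on $\operatorname{Im} M$. The elementary bound
\[ \ln_-(ab) \le \ln_-(a) + \ln_-(b) \tag{2.20} \]
will be useful. As already noted, Fatou's lemma implies the lim inf of the left side of (2.8) is bounded from below by the right side, so it suffices to prove that
\[ \limsup_{r\uparrow 1} \int_0^{\pi} \ln_-\Bigl(\frac{\operatorname{Im} M(re^{i\varphi})}{a}\Bigr) w(\varphi)\,d\varphi \le \int_0^{\pi} \ln_-\Bigl(\frac{\operatorname{Im} M(e^{i\varphi})}{a}\Bigr) w(\varphi)\,d\varphi. \tag{2.21} \]
Pick $\gamma$ and $\kappa$ so $0 < \max(\beta, 1)\gamma < \kappa < \tfrac12$ and let $\theta_0(r) = (1 - r)^{\gamma}$, $\eta(r) = (1 - r)^{\kappa}$. We will bound $\operatorname{Im} M(re^{i\theta})$ from below for $d(\theta) \le \theta_0(r)$ using (2.18), and for $d(\theta) \ge \theta_0(r)$, we will use the Poisson integral for the region $|\varphi - \theta| \le \eta(r)$. By (2.18) and (2.3),
\[ \int_{d(\varphi)\le\theta_0(r)} \ln_-\Bigl(\frac{\operatorname{Im} M(re^{i\varphi})}{a}\Bigr) w(\varphi)\,d\varphi \le C_a\,\theta_0^{\alpha}\,[\ln_-(r^{-1} - r) + \ln_-\theta_0], \]
which goes to zero as $r \uparrow 1$ for any $a$. So suppose $d(\theta) > \theta_0$. Write
\[ \operatorname{Im} M(re^{i\theta}) \ge \int_{\theta-\eta(r)}^{\theta+\eta(r)} D_r(\theta, \varphi) \operatorname{Im} M(e^{i\varphi})\,\frac{d\varphi}{2\pi} = N_r(\theta, \eta) \int_{\theta-\eta(r)}^{\theta+\eta(r)} \frac{D_r(\theta, \varphi)}{2\pi N_r(\theta, \eta)} \operatorname{Im} M(e^{i\varphi})\,d\varphi. \tag{2.22} \]
For later purposes, note that for $d(\theta) > \theta_0$, (2.17) implies
\[ 0 \le 1 - N_r(\theta, \eta) \le C(1 - r)^{1-2\kappa}, \tag{2.23} \]
which goes to zero since $\kappa < \tfrac12$. Using (2.22) and (2.20), we bound $\ln_-[\operatorname{Im} M(re^{i\theta})/a]$ as two $\ln_-$'s. Since $\ln_-$ is convex and $D_r(\theta, \varphi)/(2\pi N_r(\theta, \eta))\,\chi_{(\theta-\eta,\theta+\eta)}(\varphi)\,d\varphi$ is a probability measure, we can use Jensen's inequality to see that
\[ w(\theta)\ln_-\Bigl(\frac{\operatorname{Im} M(re^{i\theta})}{a}\Bigr) \le w(\theta)\ln_-[N_r(\theta, \eta)] + \int_{\theta-\eta(r)}^{\theta+\eta(r)} \frac{w(\theta)}{w(\varphi)}\,\frac{D_r(\theta, \varphi)}{N_r(\theta, \eta)}\, w(\varphi)\ln_-\Bigl(\frac{\operatorname{Im} M(e^{i\varphi})}{a}\Bigr)\frac{d\varphi}{2\pi}. \tag{2.24} \]
In the first term for the $\theta$'s with $d(\theta) \ge \theta_0(r)$, $N_r$ obeys (2.23) so
\[ \int_{d(\theta)\ge\theta_0(r)} w(\theta)\ln_-[N_r(\theta, \eta)]\,d\theta = O((1 - r)^{1-2\kappa}) \to 0. \tag{2.25} \]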
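In the second term, note that for the $\theta$'s in question, $N_r(\theta, \eta)^{-1} - 1 = O((1 - r)^{1-2\kappa})$ and by (2.15), $w(\theta)/w(\varphi) - 1 = O((1 - r)^{\kappa-\beta\gamma})$. Since $D_r(\theta, \varphi) \le P_r(\theta, \varphi)$, we thus have
\[ \int_{d(\theta)\ge\theta_0} \ln_-\Bigl(\frac{\operatorname{Im} M(re^{i\theta})}{a}\Bigr) w(\theta)\,d\theta \le O((1 - r)^{1-2\kappa}) + [1 + O((1 - r)^{1-2\kappa})][1 + O((1 - r)^{\kappa-\beta\gamma})] \int_{d(\theta)\ge\theta_0} \int_{|\varphi-\theta|\le\eta} P_r(\theta, \varphi)\, w(\varphi)\ln_-\Bigl(\frac{\operatorname{Im} M(e^{i\varphi})}{a}\Bigr) d\varphi\,\frac{d\theta}{2\pi}. \tag{2.26} \]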
Since the integrand is positive, we can extend it to $\{(\theta, \varphi) \mid \theta \in [0, 2\pi],\ \varphi \in [0, \pi]\}$ and do the $\theta$ integration using $\int_0^{2\pi} P_r(\theta, \varphi)\,d\theta/2\pi = 1$. The result is (2.21).
This concludes the proof of Theorem 2.1. By going through the proof, one easily sees that
Theorem 2.8. Theorem 2.1 remains true if in (2.1) and (2.6), $\ln[\operatorname{Im} M(re^{i\varphi})]$ is replaced by $\ln[g(r)\sin\varphi + \operatorname{Im} M(re^{i\varphi})]$, where $g(r) \ge 0$ and $g(r) \to 0$ as $r \uparrow 1$.
Proof. In the $\ln_+$ bounds, we get an extra [sup 1
3. The Step-by-Step Sum Rules
We will call $J$ a BW matrix (for Blumenthal-Weyl) if $J$ is a bounded Jacobi matrix with $\sigma_{ess}(J) = [-2, 2]$, for example, if $J - J_0$ is compact. Let $J^{(n)}$ be the matrix resulting from removing the first $n$ rows and columns. Let $\{E_j^\pm(J)\}_{j=1}^\infty$ be the eigenvalues of $J$ above/below $\pm 2$, ordered by $\pm E_1^\pm \ge \pm E_2^\pm \ge \cdots$ with $E_j^\pm(J)$ defined to be $\pm 2$ if there are only finitely many eigenvalues $k < j$ above/below $\pm 2$. Then by the min-max principle,
\[ \pm E_{j+n}^\pm(J) \le \pm E_j^\pm(J^{(n)}) \le \pm E_j^\pm(J). \tag{3.1} \]
We have $\lim_{j\to\infty} E_j^\pm(J) = \pm 2$ if $J$ is a BW matrix. It follows by the convergence of sums of alternating series that if $f$ is even or odd and monotone on $[2, \infty)$ with $f(2) = 0$, then
\[ \lim_{N\to\infty} \sum_{\pm} \sum_{j=1}^{N} [f(E_j^\pm(J)) - f(E_j^\pm(J^{(n)}))] \equiv \delta f_n(J) \tag{3.2} \]
exists and is finite. If $\beta_j^\pm$ is defined by $E_j^\pm = \beta_j^\pm + (\beta_j^\pm)^{-1}$ with $|\beta_j| > 1$, we define $X_\ell^{(n)}(J)$ as $\delta f_n(J)$ for
\[ f(E) = \begin{cases} \ln|\beta| & \ell = 0 \\ -\frac{1}{\ell}[\beta^{\ell} - \beta^{-\ell}] & \ell = 1, 2, \dots \end{cases} \tag{3.3} \]
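In addition, we will need
\[ \zeta_\ell^{(n)}(J) = \begin{cases} -\sum_{j=1}^{n} \ln(a_j) & \ell = 0 \\ \frac{2}{\ell} \lim_{m\to\infty} \bigl[\operatorname{Tr}(T_\ell(\tfrac12 J_{m;F})) - \operatorname{Tr}(T_\ell(\tfrac12 J^{(n)}_{m-n;F}))\bigr] & \ell = 1, 2, \dots \end{cases} \tag{3.4} \]
where $J_{m;F}$ is the finite matrix formed from the first $m$ rows and columns of $J$ and $T_\ell$ is the $\ell$th Chebyshev polynomial (of the first kind). As noted in [11, Prop. 4.3], the limit in (3.4) exists since the expression is independent of $m$ once $m > \ell + n$. Note that
\[ \zeta_1^{(n)}(J) = \sum_{j=1}^{n} b_j, \tag{3.5} \]
\[ \zeta_2^{(n)}(J) = \sum_{j=1}^{n} \bigl[\tfrac12 b_j^2 + (a_j^2 - 1)\bigr], \tag{3.6} \]
as computed in [11]. Note that by construction (with $J^{(0)} \equiv J$),
\[ X_\ell^{(n)}(J) = \sum_{j=0}^{n-1} X_\ell^{(1)}(J^{(j)}) \tag{3.7} \]
and
\[ \zeta_\ell^{(n)}(J) = \sum_{j=0}^{n-1} \zeta_\ell^{(1)}(J^{(j)}). \tag{3.8} \]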
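The formulas (3.5) and (3.6), and the independence of (3.4) from $m$, can be checked directly on small matrices. The following Python sketch is a hedged illustration: it assumes the usual convention that $b_j$ sits on the diagonal and $a_j$ couples sites $j$ and $j+1$, uses 0-based lists (so `a[0]` stands for $a_1$), and all function names and sample parameters are hypothetical.

```python
def jacobi_block(a, b, start, size):
    """Finite Jacobi matrix with diagonal b[start], ..., b[start+size-1] and
    off-diagonal a[start], ..., a[start+size-2] (0-based: a[0] = a_1, b[0] = b_1)."""
    J = [[0.0] * size for _ in range(size)]
    for i in range(size):
        J[i][i] = b[start + i]
        if i + 1 < size:
            J[i][i + 1] = J[i + 1][i] = a[start + i]
    return J

def matmul(A, B):
    """Product of two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def chebyshev_trace(X, ell):
    """Tr T_ell(X) via the recurrence T_{k+1}(X) = 2 X T_k(X) - T_{k-1}(X)."""
    n = len(X)
    if ell == 0:
        return float(n)
    prev = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T_0
    cur = [row[:] for row in X]                                            # T_1
    for _ in range(ell - 1):
        prod = matmul(X, cur)
        nxt = [[2 * prod[i][j] - prev[i][j] for j in range(n)] for i in range(n)]
        prev, cur = cur, nxt
    return sum(cur[i][i] for i in range(n))

def halve(J):
    """Return J / 2."""
    return [[0.5 * x for x in row] for row in J]

def zeta(a, b, n, ell, m):
    """(2/ell) [Tr T_ell(J_{m;F}/2) - Tr T_ell(J^{(n)}_{m-n;F}/2)], ell >= 1, m > ell + n."""
    full = chebyshev_trace(halve(jacobi_block(a, b, 0, m)), ell)
    stripped = chebyshev_trace(halve(jacobi_block(a, b, n, m - n)), ell)
    return (2.0 / ell) * (full - stripped)

# Hypothetical sample parameters.
a_example = [1.3, 0.8, 1.1, 0.9, 1.2, 1.0, 0.7, 1.05]
b_example = [0.4, -0.3, 0.2, 0.1, -0.5, 0.0, 0.3, -0.2]
n = 3
print(zeta(a_example, b_example, n, 1, 8), sum(b_example[:n]))
print(zeta(a_example, b_example, n, 2, 8),
      sum(0.5 * bj ** 2 + aj ** 2 - 1 for aj, bj in zip(a_example[:n], b_example[:n])))
```

Each printed pair should agree up to rounding, and changing `m` (while keeping `m > ell + n`) should not change the value.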
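As final objects we need
\[ Z(J) = \frac{1}{4\pi} \int_0^{2\pi} \ln\Bigl(\frac{\sin\theta}{\operatorname{Im} M(e^{i\theta}, J)}\Bigr)\,d\theta, \tag{3.9} \]
and for $\ell \ge 1$,
\[ Z_\ell^\pm(J) = \frac{1}{4\pi} \int_0^{2\pi} \ln\Bigl(\frac{\sin\theta}{\operatorname{Im} M(e^{i\theta}, J)}\Bigr)(1 \pm \cos(\ell\theta))\,d\theta, \tag{3.10} \]
\[ Y_\ell(J) = -\frac{1}{2\pi} \int_0^{2\pi} \ln\Bigl(\frac{\sin\theta}{\operatorname{Im} M(e^{i\theta}, J)}\Bigr)\cos(\ell\theta)\,d\theta. \tag{3.11} \]
We include “$\sin\theta$” inside $\ln(\dots)$ so that $Z(J_0) = Z_\ell^\pm(J_0) = Y_\ell(J_0) = 0$ because $M(z, J_0) = z$. Notice that (3.9) is the same as (1.2). Indeed,
\[ \operatorname{Im} M(e^{i\theta}, J) = \operatorname{sgn}(\pi - \theta)\,\pi\,\frac{d\nu_{ac}}{dE}(2\cos\theta) \]
for a.e. $\theta \in (0, 2\pi)$, and the factor $(4\pi)^{-1}$ replaces $(2\pi)^{-1}$ because under $z \mapsto z + z^{-1}$ the unit circle covers $(-2, 2)$ twice. Of course,
\[ Z_\ell^\pm(J) = Z(J) \mp \tfrac12 Y_\ell(J) \tag{3.12} \]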
when all integrals converge. By Theorem 2.2, the $\ln_-$ piece of the integrals in (3.9)–(3.11) always converges. Since $1 \pm \cos(\ell\theta) \ge 0$, the integrals defining $Z(J)$, $Z_\ell^\pm(J)$ either converge or diverge to $+\infty$. We therefore always define $Z(J)$ and $Z_\ell^\pm(J)$ although they may take the value $+\infty$. Since $[1 \pm \cos(\ell\theta)] \le 2$, $Z(J) < \infty$ implies $Z_\ell^\pm(J) < \infty$, so we define $Y_\ell(J)$ by (3.12) if and only if $Z(J) < \infty$. If $Z(J) < \infty$, we say $J$ obeys the Szegő condition or $J$ is Szegő. If $Z_1^\pm(J) < \infty$, we say $J$ is Szegő at $\pm 2$ since, for example, if $Z_1^+(J) < \infty$, the integral in (3.9) converges near $\theta = 0$ ($E = 2\cos(\theta)$ near $+2$) and if $Z_1^-(J) < \infty$, the integral converges near $\theta = \pi$ (i.e., $E = -2$). Note that while $Z_1^+(J) < \infty$ only implies convergence of (3.9) at $\theta = 0$, it also implies that at $\theta = \pi$ the integral with a $\sin^2\theta$ inserted converges (quasi-Szegő condition). Our main goal in this section is to prove the next three theorems.
Theorem 3.1 (Step-by-Step Sum Rules). Let $J$ be a BW matrix. $Z(J) < \infty$ if and only if $Z(J^{(1)}) < \infty$, and if $Z(J) < \infty$, we have
\[ Z(J) = -\ln(a_1) + X_0^{(1)}(J) + Z(J^{(1)}), \tag{3.13} \]
\[ Y_\ell(J) = \zeta_\ell^{(1)}(J) + X_\ell^{(1)}(J) + Y_\ell(J^{(1)}); \qquad \ell = 1, 2, 3, \dots. \tag{3.14} \]
Remarks. 1. By iteration and (3.7)/(3.8), we obtain if $Z(J) < \infty$, then $Z(J^{(n)}) < \infty$ and
\[ Z(J) = -\sum_{j=1}^{n} \ln(a_j) + X_0^{(n)}(J) + Z(J^{(n)}), \tag{3.15} \]
\[ Y_\ell(J) = \zeta_\ell^{(n)}(J) + X_\ell^{(n)}(J) + Y_\ell(J^{(n)}); \qquad \ell = 1, 2, 3, \dots. \tag{3.16} \]
2. We call (3.13)/(3.14) the step-by-step Case sum rules.
Theorem 3.2 (One-Sided Step-by-Step Sum Rules). Let $J$ be a BW matrix. $Z_1^\pm(J) < \infty$ if and only if $Z_1^\pm(J^{(1)}) < \infty$, and if $Z_1^\pm(J) < \infty$, then we have for $\ell = 1, 3, 5, \dots$,
\[ Z_\ell^\pm(J) = -\ln(a_1) \mp \tfrac12 \zeta_\ell^{(1)}(J) + X_0^{(1)}(J) \mp \tfrac12 X_\ell^{(1)}(J) + Z_\ell^\pm(J^{(1)}). \tag{3.17} \]
Remark. Theorem 3.2 is intended to be two statements: one with all the upper signs used and one with all the lower signs used.
Theorem 3.3 (Quasi-Step-by-Step Sum Rules). Let $J$ be a BW matrix. $Z_2^-(J) < \infty$ if and only if $Z_2^-(J^{(1)}) < \infty$, and if $Z_2^-(J) < \infty$, then for $\ell = 2, 4, \dots$, we have
\[ Z_\ell^-(J) = -\ln(a_1) + \tfrac12 \zeta_\ell^{(1)}(J) + X_0^{(1)}(J) + \tfrac12 X_\ell^{(1)}(J) + Z_\ell^-(J^{(1)}). \tag{3.18} \]
Remarks. 1. The name comes from the fact that since $1 - \cos 2\theta = 2\sin^2\theta$, $Z_2^-(J)$ is what Killip-Simon [11] called the quasi-Szegő integral,
\[ Z_2^-(J) = \frac{1}{2\pi} \int_0^{2\pi} \ln\Bigl(\frac{\sin\theta}{\operatorname{Im} M(e^{i\theta}, J)}\Bigr)\sin^2\theta\,d\theta. \tag{3.19} \]
2. Since $Z(J) < \infty$ implies $Z_1^+(J)$ and $Z_1^-(J) < \infty$, and $Z_1^+(J)$ or $Z_1^-(J) < \infty$ imply $Z_2^-(J) < \infty$, we have additional sum rules in various cases.
3. In [12], Laptev et al. prove sum rules for $Z_\ell^-(J)$ where $\ell = 4, 6, 8, \dots$. One can develop step-by-step sum rules in this case and use it to streamline the proof of their rules as we streamline the proof of the Killip-Simon $P_2$ rule (our $Z_2^-$ sum rule) in the next section.
The step-by-step sum rules were introduced in Killip-Simon, who first take $r < 1$ (in our language below), then take $n \to \infty$, and then $r \uparrow 1$ with some technical hurdles to take $r \uparrow 1$. By first letting $r \uparrow 1$ with $n < \infty$, and then $n \to \infty$ as in the next section, we can both simplify their proof and obtain additional results. The idea of using the imaginary part of
\[ -M(z; J)^{-1} = -(z + z^{-1}) + b_1 + a_1^2\, M(z; J^{(1)}) \tag{3.20} \]
is taken from Killip-Simon [11].
Proof of Theorem 3.1. Taking imaginary parts of both sides of (3.20) with $z = re^{i\theta}$ and $r < 1$, we obtain
\[ [\operatorname{Im} M(re^{i\theta}; J)]\,|M(re^{i\theta}; J)|^{-2} = (r^{-1} - r)\sin\theta + a_1^2 \operatorname{Im} M(re^{i\theta}; J^{(1)}). \tag{3.21} \]
Taking ln's of both sides, we obtain
\[ \ln\Bigl(\frac{\sin\theta}{\operatorname{Im} M(re^{i\theta}; J)}\Bigr) = t_1 + t_2 + t_3, \tag{3.22} \]
where
\[ t_1 = -2\ln|M(re^{i\theta}; J)|, \tag{3.23} \]
\[ t_2 = -2\ln a_1, \tag{3.24} \]
\[ t_3 = \ln\Bigl(\frac{\sin\theta}{g(r)\sin\theta + \operatorname{Im} M(re^{i\theta}; J^{(1)})}\Bigr), \tag{3.25} \]
where
\[ g(r) = a_1^{-2}(r^{-1} - r). \tag{3.26} \]
Let
\[ f(z) = \frac{M(rz; J)}{rz}, \]
so $f(0) = 1$ (see (3.20)). In the unit disk, $f(z)$ is meromorphic and has poles at $\{(r\beta_j^\pm(J))^{-1} \mid j \text{ so that } |\beta_j^\pm(J)| > r^{-1}\}$ and zeros at $\{(r\beta_j^\pm(J^{(1)}))^{-1} \mid j \text{ so that } |\beta_j^\pm(J^{(1)})| > r^{-1}\}$. Thus, by Jensen's formula for $f$:
\[ \frac{1}{4\pi} \int_0^{2\pi} t_1\,d\theta = -\ln r + \sum_{|\beta_j^\pm(J)| > r^{-1}} \ln|r\beta_j^\pm(J)| - \sum_{|\beta_j^\pm(J^{(1)})| > r^{-1}} \ln|r\beta_j^\pm(J^{(1)})|. \]
By (3.1), the number of terms in the sums differs by at most 2, so that the $\ln(r)$'s cancel up to at most $2\ln(r) \to 0$ as $r \uparrow 1$. Thus as $r \uparrow 1$,
\[ \frac{1}{4\pi} \int_0^{2\pi} (t_1 + t_2)\,d\theta \to -\ln(a_1) + X_0^{(1)}(J). \tag{3.27} \]
It follows by (3.22) and Theorems 2.1 and 2.8 (with $w(\varphi) = 1$) that $Z(J) < \infty$ if and only if $Z(J^{(1)}) < \infty$, and if they are finite, (3.13) holds. It also follows that if $Z(J) < \infty$, we have $L^1$ convergence of the ln's to their $r = 1$ values. That implies convergence of the integrals with $\cos(\ell\theta)$ inside. Higher Jensen's formula as in [11] then implies (3.14). In place of $\ln|\beta r^{-1}|$, we have $(r\beta)^{\ell} - (r\beta)^{-\ell}$, but the sums still converge to the $r = 1$ limit since we can separate the $\beta^{\ell}$ and $\beta^{-\ell}$ terms, and then the $r$'s factor out.
Proofs of Theorems 3.2 and 3.3. These are the same as the above proof, but now the weight $w$ is either $1 \pm \cos(\theta)$ or $1 - \cos(2\theta)$ and that weight obeys (2.3) and (2.4).
Corollary 3.4. Let $J$ be a BW matrix. If $J$ and $\tilde J$ differ by a finite rank perturbation, then $J$ is Szegő (resp. Szegő at $\pm 2$) if and only if $\tilde J$ is.
Proof. For some $n$, $J^{(n)} = \tilde J^{(n)}$, so this is immediate from Theorems 3.1 and 3.2.
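The pointwise identity (3.22), which drives the step-by-step rules, can be tested numerically for a finite rank perturbation of $J_0$. The sketch below is an illustration under stated assumptions: $J$ has finitely many prescribed parameters followed by the free tail $a_j = 1$, $b_j = 0$, and $M(z; J)$ is obtained by starting from $M(z; J_0) = z$ and inverting (3.20) one row at a time. The function names and sample values are hypothetical.

```python
import cmath
import math

def m_function(z, a, b):
    """M(z; J) for |z| < 1, where J has Jacobi parameters a[0], a[1], ... and
    b[0], b[1], ... followed by the free tail a_j = 1, b_j = 0.
    Starts from M(z; J_0) = z and inverts (3.20) from the tail inward."""
    M = z
    for a_k, b_k in zip(reversed(a), reversed(b)):
        M = 1.0 / (z + 1.0 / z - b_k - a_k ** 2 * M)
    return M

def step_identity_sides(r, theta, a, b):
    """Return (left side of (3.22), t1 + t2 + t3) at z = r e^{i theta}, 0 < theta < pi."""
    z = r * cmath.exp(1j * theta)
    M = m_function(z, a, b)
    M1 = m_function(z, a[1:], b[1:])      # M(z; J^{(1)})
    lhs = math.log(math.sin(theta) / M.imag)
    t1 = -2 * math.log(abs(M))            # (3.23)
    t2 = -2 * math.log(a[0])              # (3.24)
    g = (1 / r - r) / a[0] ** 2           # (3.26)
    t3 = math.log(math.sin(theta) / (g * math.sin(theta) + M1.imag))  # (3.25)
    return lhs, t1 + t2 + t3

# Hypothetical rank-two example.
lhs, rhs = step_identity_sides(0.9, 1.0, [1.2, 0.7], [0.3, -0.4])
print(lhs, rhs)
```

With empty parameter lists, `m_function` returns $z$ itself, matching $M(z, J_0) = z$.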
Conjecture 3.5. Let $J$ be a BW matrix. If $J$ and $\tilde J$ differ by a trace class perturbation, then $J$ is Szegő (resp. Szegő at $\pm 2$) if and only if $\tilde J$ is. It is possible this conjecture is only generally true if $J - J_0$ is only assumed compact or is only assumed Hilbert-Schmidt.
This conjecture for $J = J_0$ is Nevai's conjecture recently proven by Killip-Simon. Their method of proof and the ideas here would prove this conjecture if one can prove a result of the following form. Let $J$, $\tilde J$ differ by a finite rank operator so that by the discussion before (3.2),
\[ \lim_{N\to\infty} \sum_{\pm} \sum_{j=1}^{N} \Bigl(\sqrt{E_j^\pm(J)^2 - 4} - \sqrt{E_j^\pm(\tilde J)^2 - 4}\Bigr) \equiv \delta(J, \tilde J) \]
exists and is finite. The conjecture would be provable by the method of [11] and this paper (by using the step-by-step sum rule to remove the first $m$ pieces of $J$ and then replacing them with the first $m$ pieces of $\tilde J$) if one had a bound of the form
\[ |\delta(J, \tilde J)| \le (\text{const.})\operatorname{Tr}(|J - \tilde J|). \tag{3.28} \]
Equation (3.28) with $J = J_0$ is the estimate of Hundertmark-Simon [10]. We have counterexamples that show (3.28) does not hold for a universal constant $c$. However, in these examples, $\|J\| \to \infty$ as $c \to \infty$. Thus it could be that (3.28) holds with $c$ only depending on $J$ for some class of $J$'s. If it held with a bound depending only on $J$, the conjecture would hold in general. If $J$ was required in $J_0$ + Hilbert-Schmidt, we would get the conjecture for such $J$'s.
4. The $Z_0$, $Z_1^\pm$, and $Z_2^-$ Sum Rules
Our goal here is to prove that sum rules of Case type hold under certain hypotheses. Of interest on their own, these considerations also somewhat simplify the proof of the $P_2$ sum rule in Sect. 8 of [11], and considerably simplify the proof of the $C_0$ sum rule for trace class $J - J_0$ from Sect. 9 of [11]. Throughout, $J$ will be a BW matrix.
There are two main tools. As in [11], lower semicontinuity of the $Z$'s in $J$ (in the topology of pointwise convergence of matrix elements) gets inequalities in one direction. We use step-by-step sum rules and boundedness from below of $Z$ for the other direction.
We first introduce some quantities involving a fixed Jacobi matrix:
\[ \bar A_0(J) = \limsup_{n\to\infty} \Bigl(-\sum_{j=1}^{n} \ln(a_j)\Bigr), \qquad \underline{A}_0(J) = \liminf_{n\to\infty} \Bigl(-\sum_{j=1}^{n} \ln(a_j)\Bigr), \tag{4.1} \]
\[ \bar A_1^\pm(J) = \limsup_{n\to\infty} \Bigl(-\sum_{j=1}^{n} (a_j - 1 \pm \tfrac12 b_j)\Bigr), \qquad \underline{A}_1^\pm(J) = \liminf_{n\to\infty} \Bigl(-\sum_{j=1}^{n} (a_j - 1 \pm \tfrac12 b_j)\Bigr), \tag{4.2} \]
\[ A_2(J) = \sum_{j=1}^{\infty} \bigl[\tfrac14 b_j^2 + \tfrac12 G(a_j)\bigr], \tag{4.3} \]
where $G(a) = a^2 - 1 - \ln(a^2)$. Since $G(a) \ge 0$, the finite sums have a limit (which may be $+\infty$). We note that for $a$ near 1, $G(a) \sim 2(a - 1)^2$. Thus $A_2(J)$ is finite if and only if $J - J_0$ is Hilbert-Schmidt. In (4.2), we can use $a_j - 1$ in place of $\ln(a_j)$ because if $\{a_j - 1\} \in \ell^2$ (e.g., if $J - J_0$ is Hilbert-Schmidt), then $\sum_j |\ln(a_j) - (a_j - 1)| < \infty$. Notice also that in the case of a discrete Schrödinger operator (i.e., $a_n \equiv 1$), $\bar A_0(J) = \underline{A}_0(J) = 0$. Next, we introduce some functions of the eigenvalues:
\[ E_0(J) = \sum_{j,\pm} \ln|\beta_j^\pm|, \tag{4.4} \]
\[ E_1^\pm(J) = \sum_{j} \sqrt{(E_j^\pm)^2 - 4}, \tag{4.5} \]
\[ E_2(J) = \sum_{j,\pm} F(E_j^\pm), \tag{4.6} \]
where $F(E) = \tfrac14[\beta^2 - \beta^{-2} - \ln(\beta^4)]$ with $E = \beta + \beta^{-1}$ and $|\beta| > 1$. For $|E| \sim 2$, $F(E)$ is $O((|E| - 2)^{3/2})$. In (4.4) and (4.6), we sum over $+$ and $-$. In (4.5), we define $E_1^+$ and $E_1^-$ with only the $+$ or only the $-$ terms.
We need the following basis-dependent notion:
Definition. Let $B$ be a bounded operator on $\ell^2(\mathbb{Z}_+)$. We say $B$ has a conditional trace if
\[ \lim_{\ell\to\infty} \sum_{j=1}^{\ell} \langle\delta_j, B\delta_j\rangle \equiv \text{c-Tr}(B) \tag{4.7} \]
exists and is finite. If $B$ is not trace class, this object is not unitarily invariant.
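The small-parameter behavior of $G$ and $F$ quoted in this section can be illustrated numerically. In the Python sketch below (function names are our own), the ratio $G(1 + x)/x^2$ should approach 2, and $F(2 + x)/x^{3/2}$ should approach $2/3$; the latter constant is not stated in the text but follows from writing $\beta = e^t$, so that $F = \tfrac12\sinh(2t) - t \approx \tfrac23 t^3$ and $E - 2 = 2\cosh t - 2 \approx t^2$.

```python
import math

def G(a):
    """G(a) = a^2 - 1 - ln(a^2), the function entering A_2(J) in (4.3)."""
    return a * a - 1 - math.log(a * a)

def beta_from_E(E):
    """Root of beta + 1/beta = E with |beta| > 1, for real |E| > 2."""
    root = math.sqrt(E * E - 4)
    return (E + root) / 2 if E > 0 else (E - root) / 2

def F(E):
    """F(E) = (1/4) [beta^2 - beta^{-2} - ln(beta^4)] with E = beta + 1/beta."""
    beta = beta_from_E(E)
    return 0.25 * (beta ** 2 - beta ** -2 - math.log(beta ** 4))

# Ratios for decreasing x (illustrative sample values).
for x in (1e-1, 1e-2, 1e-3):
    print(x, G(1 + x) / x ** 2, F(2 + x) / x ** 1.5)
```

The function `F` is even in $E$, so the same behavior holds near $E = -2$.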
Our goal in this section is to prove the following theorems whose proof is deferred until after all the statements.
Theorem 4.1. Let $J$ be a BW matrix. Consider the four statements:
(i) $\bar A_0(J) > -\infty$,
(ii) $\underline{A}_0(J) < \infty$,
(iii) $Z(J) < \infty$,
(iv) $E_0(J) < \infty$.
Then
(a) (ii) + (iv) ⇒ (iii) + (i),
(b) (i) + (iii) ⇒ (iv) + (ii),
(c) (iii) ⇒ $\bar A_0(J) < \infty$,
(d) (iv) ⇒ $\underline{A}_0(J) > -\infty$.
Thus (iii) + (iv) ⇒ (i) + (ii). In particular, if $\underline{A}_0(J) = \bar A_0(J)$, that is, the limit exists, then the finiteness of any two of $Z(J)$, $E_0(J)$, and $\bar A_0(J)$ implies the finiteness of the third. If all four conditions hold and $J - J_0$ is compact, then
(e)
\[ \lim_{n\to\infty} \Bigl(-\sum_{j=1}^{n} \ln(a_j)\Bigr) \equiv A_0(J) \tag{4.8} \]
exists and is finite, and
\[ Z(J) = A_0(J) + E_0(J). \tag{4.9} \]
(f) For each $\ell = 1, 2, \dots$,
\[ -\sum_{j,\pm} \frac{1}{\ell}\bigl[\beta_j^\pm(J)^{\ell} - \beta_j^\pm(J)^{-\ell}\bigr] \equiv X_\ell^{(\infty)}(J) \tag{4.10} \]
converges absolutely and equals $\lim_{n\to\infty} X_\ell^{(n)}(J)$.
(g) For each $\ell = 1, 2, \dots$,
\[ B_\ell(J) = \frac{2}{\ell}\Bigl[T_\ell\Bigl(\frac{J}{2}\Bigr) - T_\ell\Bigl(\frac{J_0}{2}\Bigr)\Bigr] \tag{4.11} \]
has a conditional trace and
\[ \text{c-Tr}(B_\ell(J)) = \lim_{n\to\infty} \zeta_\ell^{(n)}(J) \tag{4.12} \]
for example, if $\ell = 1$, $\sum_{j=1}^{n} b_j$ converges to a finite limit.
(h) The Case sum rule holds:
\[ Y_\ell(J) = \text{c-Tr}(B_\ell(J)) + X_\ell^{(\infty)}(J), \tag{4.13} \]
where $Y_\ell$ is given by (3.11), $X_\ell^{(\infty)}$ by (4.10), and $\text{c-Tr}(B_\ell(J))$ by (4.7), (4.11), and (4.12).
Remarks. 1. In one sense, this is the main result of this paper. 2. We will give examples later where $\bar A_0(J) = \underline{A}_0(J)$ and one of the conditions (i)/(ii), (iii), (iv) holds and the other two fail. 3. For $\ell$ odd, $T_\ell(J_0/2)$ vanishes on-diagonal. By Proposition 2.2 of [11] and the fact that the diagonal matrix elements of $J_0^k$ are eventually constant, it follows that for $\ell$ even, $T_\ell(J_0/2)$ eventually vanishes on-diagonal and $\text{c-Tr}(T_\ell(J_0/2)) = -\tfrac12$. Thus (g) says $\text{c-Tr}(T_\ell(J/2))$ exists and the sum rule (4.13) can replace $\text{c-Tr}(B_\ell(J))$ by $\tfrac{2}{\ell}\,\text{c-Tr}(T_\ell(J/2))$ plus a constant (zero if $\ell$ is odd and $1/\ell$ if $\ell$ is even). For $\ell$ even, $\text{c-Tr}(T_\ell(J_0/2)) = -\tfrac12$ while $\operatorname{Tr}(T_\ell(J_{0,n;F}/2)) = -1$ for $n$ large because $T_\ell(J_{0,n;F}/2)$ has two ends.
Corollary 4.2. Let $J - J_0$ be compact. If $Z(J) < \infty$, then $-\sum_{j=1}^{n} \ln(a_j)$ either converges or diverges to $-\infty$.
Remarks. 1. We will give an example later where $Z(J) < \infty$, and $\lim_{n\to\infty}\bigl(-\sum_{j=1}^{n} \ln(a_j)\bigr) = -\infty$. 2. In other words, if $J - J_0$ is compact and $\bar A_0(J) \ne \underline{A}_0(J)$, then $Z(J) = \infty$. 3. Similarly, if $J - J_0$ is compact and $E_0(J) < \infty$, then the limit exists and is finite or is $+\infty$.
Proof. If $Z(J) < \infty$ and $\bar A_0 > -\infty$, then by (b) of the theorem, all four conditions hold, and so by (e), the limit exists. On the other hand, if $\bar A_0 = -\infty$, then $\bar A_0 = \underline{A}_0 = -\infty$.
Corollary 4.3. If $J - J_0$ is trace class, then $Z(J) < \infty$, $E_0(J) < \infty$, and the sum rules (4.9) and (4.13) hold.
Remark. This is a result of Killip-Simon [11]. Our proof that $Z(J) < \infty$ is essentially the same as theirs, but our proof of the sum rules is much easier.
Proof. Since $J - J_0$ is trace class, it is compact. Clearly, $\bar A_0 = \underline{A}_0$, and is neither $\infty$ nor $-\infty$ since $a_j > 0$ and $\sum_j |a_j - 1| < \infty$ imply $\sum_j |\ln(a_j)| < \infty$. By the bound of Hundertmark-Simon [10], $E_0(J) < \infty$. The sum rules then hold by (a), (e), and (h) of Theorem 4.1.
Theorem 4.4. Suppose $J - J_0$ is Hilbert-Schmidt. Then
(i) $\underline{A}_1^\pm < \infty$ and $E_1^\pm < \infty$ implies $Z_1^\pm < \infty$.
(ii) $Z_1^\pm < \infty$ implies $\bar A_1^\pm < \infty$.
(iii) $Z_1^\pm < \infty$ and $\bar A_1^\pm > -\infty$ implies $E_1^\pm < \infty$.
(iv) $E_1^\pm < \infty$ implies $\underline{A}_1^\pm > -\infty$.
Remarks. 1. Each of (i)–(iv) is intended as two statements. 2. In Sect. 6, we will explore (ii), which is the most striking of these results since its contrapositive gives very general conditions under which the Szegő condition fails. 3. The Hilbert-Schmidt condition in (i) and (iv) can be replaced by the somewhat weaker condition that
\[ \sum_{j,\pm} (|E_j^\pm| - 2)^{3/2} < \infty. \tag{4.14} \]
That is true for (ii) and (iii) also, but by the $Z_2^-$ sum rule, (4.14) plus $Z_1^\pm < \infty$ implies $J - J_0$ is Hilbert-Schmidt.
Theorem 4.5. Let $J$ be a BW matrix. Then
\[ Z_2^-(J) + E_2(J) = A_2(J). \tag{4.15} \]
Remarks. 1. This is, of course, the $P_2$ sum rule of Killip-Simon [11]. Our proof that $Z_2^-(J) + E_2(J) \le A_2(J)$ is identical to that in [11], but our proof of the other half is somewhat streamlined. 2. As in [11], the values $+\infty$ are allowed in (4.15).
Proof of Theorem 4.1. As in [11], let $J_n$ be the infinite Jacobi matrix obtained from $J$ by replacing $a_\ell$ by 1 if $\ell \ge n$ and $b_\ell$ by 0 if $\ell \ge n + 1$. Then (3.15) (noting $J_n^{(n)} = J_0$ and $Z(J_0) = 0$) reads
\[ Z(J_n) = -\sum_{j=1}^{n} \ln(a_j) + \sum_{j,\pm} \ln|\beta_j^\pm(J_n)|. \tag{4.16} \]
[11, Sect. 6] implies the eigenvalue sum converges to $E_0(J)$ if $J - J_0$ is compact, and in any event, is bounded above by $E_0(J) + c_0$, where $c_0 = 0$ if $J - J_0$ is compact and otherwise,
\[ c_0 = \ln|\beta_1^+(J)| + \ln|\beta_1^-(J)|. \tag{4.17} \]
Moreover, by semicontinuity of the entropy [11, Sect. 5], $Z(J) \le \liminf Z(J_n)$. Thus we have
\[ Z(J) \le \underline{A}_0(J) + E_0(J) + c_0. \tag{4.18} \]
Thus far, the proof is directly from [11]. On the other hand, by (3.15), we have
\[ Z(J) \ge \bar A_0(J) + \liminf X_0^{(n)}(J) + \liminf Z(J^{(n)}). \tag{4.19} \]
By the lemma below, $\lim_{n\to\infty} X_0^{(n)}(J) = E_0(J)$. Moreover, by Theorem 5.5 (Eq. (5.26)) of Killip-Simon [11], $Z(J^{(n)}) \ge -\tfrac12\ln(2)$, and if $J^{(n)} \to J_0$ in norm, that is, $J - J_0$ is compact, then by semicontinuity of $Z$, $0 = Z(J_0) \le \liminf Z(J^{(n)})$. Therefore, (4.19) implies that
\[ Z(J) \ge \bar A_0(J) + E_0(J) - c, \tag{4.20} \]
where
\[ c = 0 \text{ if } J - J_0 \text{ is compact}; \qquad c = \tfrac12\ln(2) \text{ in general}. \tag{4.21} \]
With these preliminaries out of the way,
Proof of (d). (iv) and (4.18) imply that
\[ \bar A_0(J) \ge \underline{A}_0(J) \ge Z(J) - E_0(J) - c_0 > -\infty. \tag{4.22} \]
Proof of (a). Equation (4.18) shows $Z(J) < \infty$, and (d) shows that (i) holds.
Proof of (c). By (4.20) and $E_0(J) \ge 0$, $Z(J) \ge \bar A_0(J) - c$, so $Z(J) < \infty$ implies $\bar A_0(J) < \infty$.
Proof of (b). Since $\bar A_0(J) > -\infty$ and $c < \infty$, (4.20) plus $Z(J) < \infty$ implies $E_0(J) < \infty$. (c) shows that (ii) holds.

Note that (iii), (iv), and (4.20) imply that
\[ A_0(J) \le \bar A_0(J) \le Z(J) - E_0(J) + \tfrac12 \ln(2) < \infty. \tag{4.23} \]
Thus we have shown more than merely (iii) + (iv) ⇒ (i) + (ii), namely, (iii) + (iv) imply by (4.22) and (4.23)
\[ -\infty < \bar A_0(J) \le A_0(J) + \tfrac12 \ln(2) + c_0 < \infty. \tag{4.24} \]
We can say more if $J - J_0$ is compact.

Proof of (e). Equation (4.23) is now replaced by $A_0(J) \le \bar A_0(J) \le Z(J) - E_0(J)$, since we can take $c = 0$ in (4.20). This plus (4.22) with $c_0 = 0$ implies $\bar A_0(J) = A_0(J)$ and (4.9).

Proof of (f), (g), (h). We have the sum rules (3.15), (3.16). $Z(J) \pm \tfrac12 Y_\ell(J)$ is an entropy up to a constant, and so, lower semicontinuous. Since $J^{(n)} - J_0 \to 0$, we have
\[ \liminf \bigl( Z(J^{(n)}) \pm \tfrac12 Y_\ell(J^{(n)}) \bigr) \ge 0. \tag{4.25} \]
On the other hand, since $Z(J^{(n)}) < \infty$ and $E_0(J^{(n)}) \le E_0(J) < \infty$, $J^{(n)}$ obeys the sum rule (4.9). Since $-\sum_{j=1}^{n} \ln(a_j)$ converges conditionally,
\[ \lim_{n\to\infty} \lim_{m\to\infty} \Bigl( -\sum_{j=n}^{m+n} \ln(a_j) \Bigr) = 0. \]
Moreover, $E_0(J^{(n)}) \to 0$ by Lemma 4.6 below and we conclude that $\lim Z(J^{(n)}) = 0$. Thus (4.25) becomes
\[ \liminf_n Y_\ell(J^{(n)}) \ge 0, \qquad \limsup_n Y_\ell(J^{(n)}) \le 0, \]
or
\[ \lim_n Y_\ell(J^{(n)}) = 0. \tag{4.26} \]
By the lemma below, $\lim_n X_\ell^{(n)}(J) = X_\ell^{(\infty)}(J)$ exists and is finite. Since $E_0(J) < \infty$, we have that the sum defining $X_\ell^{(\infty)}(J)$ is absolutely convergent. This proves (f).

By this fact, (3.16), and (4.26), $\lim_{n\to\infty} \zeta_\ell^{(n)}(J)$ exists, is finite, and obeys the sum rule
\[ Y_\ell(J) = \lim_{n\to\infty} \zeta_\ell^{(n)}(J) + X_\ell^{(\infty)}(J). \]
By Propositions 2.2 and 4.3 of Killip-Simon [11], the existence of $\lim_{n\to\infty} \zeta_\ell^{(n)}(J)$ is precisely the existence of the conditional trace.
Sum Rules and the Szeg˝o Condition
Lemma 4.6. Let $J$ be a BW matrix. Let $f$ be a monotone increasing continuous function on $[2, \infty)$ with $f(2) = 0$. Then
\[ \lim_{n\to\infty} \sum_{j=1}^{\infty} \bigl[ f(E_j^+(J)) - f(E_j^+(J^{(n)})) \bigr] = \sum_{j=1}^{\infty} f(E_j^+(J)). \tag{4.27} \]
Remarks. 1. The right side of (4.27) may be finite or infinite.
2. The sum on the left is interpreted as the limit of the sum from 1 to n as n → ∞, which exists and is finite by the arguments at the start of Sect. 3.
3. A similar result holds for $E_j^-$ and $f$ monotone decreasing on $(-\infty, -2]$.

Proof. Call the sum on the left of (4.27) $(\delta f)(J, n)$. Since $E_j^+(J^{(n)}) \le E_j^+(J)$, we have
\[ (\delta f)(J, n) \ge \sum_{j=1}^{m} \bigl[ f(E_j^+(J)) - f(E_j^+(J^{(n)})) \bigr] \tag{4.28} \]
so, if we show for each fixed $j$ as $n \to \infty$,
\[ E_j^+(J^{(n)}) \to 2, \tag{4.29} \]
we have, by taking $n \to \infty$ and then $m \to \infty$, that
\[ \liminf (\delta f)(J, n) \ge \sum_{j=1}^{\infty} f(E_j^+(J)). \tag{4.30} \]
On the other hand, since $f \ge 0$, for each $m$,
\[ \sum_{j=1}^{m} \bigl[ f(E_j^+(J)) - f(E_j^+(J^{(n)})) \bigr] \le \sum_{j=1}^{m} f(E_j^+(J)), \]
so taking $m$ to infinity and then $n \to \infty$,
\[ \limsup (\delta f)(J, n) \le \sum_{j=1}^{\infty} f(E_j^+(J)). \tag{4.31} \]
Thus (4.29) implies the result, so we need only prove that. Fix $\varepsilon > 0$ and look at the orthogonal polynomial sequence $u_n = P_n(2 + \varepsilon)$ as a function of $n$. By Sturm oscillation theory [8], the number of sign changes of $u_n$ (i.e., the number of zeros of the piecewise linear interpolation of $u_n$) is the number of $j$ with $E_j^+(J) > 2 + \varepsilon$. Since $J$ is a BW matrix, this is finite, so there exists $N_0$ with $u_n$ of definite sign if $n \ge N_0 - 1$. It follows by Sturm oscillation theory again that for all $j$, $E_j^+(J^{(n)}) \le 2 + \varepsilon$ if $n \ge N_0$. This implies (4.29).
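The eigenvalue counting behind this Sturm oscillation argument can be tried numerically on finite truncations. The following is only a minimal sketch and not part of the original argument: it assumes a finite $N \times N$ truncation of $J$ (diagonal $b_1, \dots, b_N$, off-diagonal $a_1, \dots, a_{N-1}$), and the helper names `count_above` and `strip` are hypothetical. It counts the eigenvalues above a level $E$ from the signs of the pivots of $J - E$, which is a standard Sturm-sequence count for tridiagonal matrices.

```python
def count_above(a, b, energy):
    """Count eigenvalues > energy of the finite Jacobi matrix with
    diagonal b[0..N-1] and off-diagonal a[0..N-2] (all a[k] > 0).

    Uses the pivots d_k of the LDL^T factorization of J - energy:
    by Sylvester's law of inertia, the number of positive pivots equals
    the number of eigenvalues above `energy`.
    """
    count = 0
    d = b[0] - energy
    for k in range(len(b)):
        if k > 0:
            d = (b[k] - energy) - a[k - 1] ** 2 / d
        if d == 0.0:
            d = -1e-300  # nudge an exact zero pivot (illustrative safeguard)
        if d > 0:
            count += 1
    return count


def strip(a, b, n):
    """Coefficients of the truncated J^(n): drop the first n rows and columns."""
    return a[n:], b[n:]


N = 50                       # truncation size (an assumption of this sketch)
a = [1.0] * (N - 1)          # a_n = 1
b = [3.0] + [0.0] * (N - 1)  # b_1 = 3 and b_n = 0 otherwise

above_full = count_above(a, b, 2.0)
above_stripped = count_above(*strip(a, b, 1), 2.0)
print(above_full, above_stripped)
```

In this toy model the single bound state above 2 comes from the first site, so stripping one row and column already leaves no eigenvalue above 2, in line with the statement that $E_j^+(J^{(n)})$ drops to the edge of the essential spectrum.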
The combination of this Sturm oscillation argument and Theorem 3.1 gives one tools to handle finitely many bound states as an alternative to Nikishin [16]. For the oscillation argument says that if $J$ has finitely many eigenvalues outside $[-2, 2]$, there is a $J^{(n)}$ with no eigenvalues. On the other hand, by Theorem 3.1, $Z(J) < \infty$ if and only if $Z(J^{(n)}) < \infty$.

Proof of Theorem 4.5. $Z_2^-(J)$ is an entropy and not merely an entropy up to a constant (see [11]). Thus $Z_2^-(J^{(n)}) \ge 0$ for all $J^{(n)}$. Moreover, since the terms in $A_2$ are positive, the limit exists. Thus, following the proofs of (4.18) and (4.20) but using (3.18) in place of (3.15), $Z_2^-(J) + E_2(J) \le A_2(J)$ and $Z_2^-(J) + E_2(J) \ge A_2(J)$, which yields the $P_2$ sum rule. In the above, we use the fact that in place of $Z(J) \ge -\tfrac12 \ln(2)$, one has $Z_2^-(J) \ge 0$, and the fact that $A_2(J) < \infty$ implies that $J - J_0$ is compact.

Proof of Theorem 4.4. Let $g(\beta) = \ln \beta - \tfrac12 (\beta - \beta^{-1})$ in the region $\beta > 0$. Then
\[ g'(\beta) = \beta^{-1} - \tfrac12 - \tfrac12 \beta^{-2} = -\tfrac12 \beta^{-2} (\beta - 1)^2 \]
so $g$ is analytic near $\beta = 1$ and $g(1) = g'(1) = g''(1) = 0$, that is, $g(\beta) \sim c(\beta - 1)^3$. On the other hand, $h(\beta) = \ln \beta + \tfrac12 (\beta - \beta^{-1})$ is $g(\beta) + (\beta - \beta^{-1}) = \beta - \beta^{-1} + O((\beta - 1)^3)$. Since $\beta + \beta^{-1} = E$ means $\beta - \beta^{-1} = \sqrt{E^2 - 4}$ and $\beta - 1 = O(\sqrt{E - 2})$, we conclude that
\[ E > 2 \Rightarrow \ln(\beta) - \tfrac12 (\beta - \beta^{-1}) = O(|E - 2|^{3/2}), \qquad \ln(\beta) + \tfrac12 (\beta - \beta^{-1}) = \sqrt{E^2 - 4} + O(|E - 2|^{3/2}), \]
while
\[ E < -2 \Rightarrow \ln(|\beta|) - \tfrac12 (\beta - \beta^{-1}) = \sqrt{E^2 - 4} + O(|E + 2|^{3/2}), \qquad \ln(|\beta|) + \tfrac12 (\beta - \beta^{-1}) = O(|E + 2|^{3/2}). \]
It follows, using Lemma 4.6, that
\[ \lim_{n\to\infty} \bigl[ X_0^{(n)}(J) \mp \tfrac12 X_1^{(n)}(J) \bigr] = E_1^\pm + \text{bdd} \]
since Theorem 4.5 implies $\sum_{j,\pm} \bigl( \sqrt{(E_j^\pm)^2 - 4} \bigr)^3 < \infty$ (or, by results of [10]). Thus for a constant $c_1$ depending only on $\|J - J_0\|_2$, we have
\[ Z_1^\pm(J) \le c_1 + A_1^\pm + E_1^\pm \tag{4.32} \]
by writing the finite rank sum rule, taking limits and using the argument between (4.16) and (4.17). Since $Z_1^\pm(J)$ are entropies up to a constant, we have $Z_1^\pm(J^{(n)}) \ge -c_2$ and so by (3.17),
\[ Z_1^\pm(J) \ge -c_2 + \bar A_1^\pm + E_1^\pm - c \|J - J_0\|_2^2. \tag{4.33} \]
With these preliminaries, we have:

Proof of (i), (iv). Immediate from (4.32).

Proof of (ii). Since $E_1^\pm \ge 0$, (4.33) implies $Z_1^\pm(J) \ge -c_2 + \bar A_1^\pm$, so (ii) holds.

Proof of (iii). Immediate from (4.33).
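The two elementary facts about $g$ and $h$ used in the proof of Theorem 4.4 (third-order vanishing of $g$ at $\beta = 1$, and $h(\beta) = \sqrt{E^2 - 4} + O(|E - 2|^{3/2})$ for $E > 2$) can be checked numerically. This is only an illustrative sketch; the function names `g`, `h` and `beta_of` mirror the notation of the proof and the sample points are arbitrary choices.

```python
import math


def g(beta):
    """g(beta) = ln(beta) - (beta - 1/beta)/2 for beta > 0."""
    return math.log(beta) - 0.5 * (beta - 1.0 / beta)


def h(beta):
    """h(beta) = ln(beta) + (beta - 1/beta)/2 for beta > 0."""
    return math.log(beta) + 0.5 * (beta - 1.0 / beta)


def beta_of(energy):
    """Root beta > 1 of beta + 1/beta = energy, for energy > 2."""
    return 0.5 * (energy + math.sqrt(energy * energy - 4.0))


# g vanishes to third order at beta = 1: g(beta) / (beta - 1)^3 stays bounded.
ratio = g(1.01) / (0.01 ** 3)

# For E slightly above 2, h(beta) is close to sqrt(E^2 - 4).
energy = 2.01
err = abs(h(beta_of(energy)) - math.sqrt(energy ** 2 - 4.0))
print(ratio, err)
```

Since $g'''(1) = -1$, the ratio should be close to $-1/6$, which identifies the constant $c$ in $g(\beta) \sim c(\beta - 1)^3$; the error term is of the size of $|E - 2|^{3/2}$ or smaller.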
Remark. (i)–(iv) of Theorem 4.4 are exactly (a)–(d) of Theorem 4.1 for the $Z_1^\pm$ sum rules. One therefore expects a version of (e) of that theorem to hold as well. Indeed, a modification of the above proof yields for $J - J_0$ Hilbert-Schmidt that if $E_1^+$, $Z_1^+$, $\bar A_1^+$ are finite, then
\[ Z_1^+(J) = -\sum_{n=1}^{\infty} \bigl[ \ln(a_n) + \tfrac12 b_n \bigr] + \sum_{j,\pm} \bigl[ \ln|\beta_j^\pm| + \tfrac12 \bigl( \beta_j^\pm - (\beta_j^\pm)^{-1} \bigr) \bigr] \]
and if $E_1^-$, $Z_1^-$, $\bar A_1^-$ are finite, then
\[ Z_1^-(J) = -\sum_{n=1}^{\infty} \bigl[ \ln(a_n) - \tfrac12 b_n \bigr] + \sum_{j,\pm} \bigl[ \ln|\beta_j^\pm| - \tfrac12 \bigl( \beta_j^\pm - (\beta_j^\pm)^{-1} \bigr) \bigr]. \]
5. Shohat's Theorem with an Eigenvalue Estimate

Shohat [22] translated Szegő's theory from the unit circle to the real line and was able to identify all Jacobi matrices which lead to measures with no mass points outside $[-2, 2]$ and have $Z(J) < \infty$. The strongest result we know of this type is the following (Theorem 4′) from Killip-Simon [11] (the methods of Nevai [14] can prove the same result):

Theorem 5.1. Let $\sigma(J) \subset [-2, 2]$. Consider
(i) $A_0(J) < \infty$ where $A_0$ is given by (4.1).
(ii) $Z(J) < \infty$.
(iii) $\sum_{n=1}^{\infty} (a_n - 1)^2 + \sum_{n=1}^{\infty} b_n^2 < \infty$.
(iv) $A_0 = \bar A_0$ and is finite.
(v) $\lim_{N\to\infty} \sum_{n=1}^{N} b_n$ exists and is finite.
Then (under $\sigma(J) \subset [-2, 2]$), we have (i) ⟺ (ii), and either one implies (iii), (iv), and (v).

We can prove the following extension of this result:

Theorem 5.2. Theorem 5.1 remains true if $\sigma(J) \subset [-2, 2]$ is replaced by $\sigma_{\mathrm{ess}}(J) \subset [-2, 2]$ and (1.6).
Remarks. 1. Gončar [9], Nevai [14], and Nikishin [16] extended Shohat-type theorems to allow finitely many bound states outside $[-2, 2]$.
2. Peherstorfer-Yuditskii [17] recently proved that $E_0(J) < \infty$ and (ii) implies (iv) and additional results on polynomial asymptotics.

Proof. Let us suppose first $\sigma_{\mathrm{ess}}(J) = [-2, 2]$, so $J$ is a BW matrix. By Theorem 4.1(a), (i) of this theorem plus $E_0(J) < \infty$ implies (ii) of this theorem. By Theorem 4.1(c), (ii) of this theorem implies (i) of this theorem. If either holds, then (iv) follows from (e) of Theorem 4.1, (v) from the $\ell = 1$ case of (g) of Theorem 4.1. (iii) follows from Theorem 4.5 if we note that $E_0 < \infty$ implies $E_2 < \infty$, that $Z(J) < \infty$ implies $Z_2^-(J) < \infty$ and that $G(a) = O((a - 1)^2)$.

If we only have a priori that $\sigma_{\mathrm{ess}}(J) \subset [-2, 2]$, we proceed as follows. If $Z(J) < \infty$, $\sigma_{\mathrm{ac}}(J) \supset [-2, 2]$ so, in fact, $\sigma_{\mathrm{ess}}(J) = [-2, 2]$. If $A_0 < \infty$, we look closely at the proof of Theorem 4.1(a). Equation (4.18) does not require $\sigma_{\mathrm{ess}}(J) = [-2, 2]$, but only that $\sigma_{\mathrm{ess}}(J) \subset [-2, 2]$. Thus, $A_0 < \infty$ implies $Z(J) < \infty$ if $E_0(J) < \infty$.

There is an interesting way of rephrasing this. Let the normalized orthogonal polynomial obey
\[ P_n(x) = \gamma_n x^n + O(x^{n-1}). \tag{5.1} \]
As is well known (see, e.g., [23]),
\[ \gamma_n = (a_1 a_2 \cdots a_n)^{-1}. \tag{5.2} \]
Thus
\[ A_0 = \liminf \ln(\gamma_n) \tag{5.3} \]
and
\[ \bar A_0 = \limsup \ln(\gamma_n). \tag{5.4} \]
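As a concrete illustration of (5.2)–(5.4), the leading coefficients $\gamma_n$ can be computed directly from the $a_n$. The following is a minimal sketch under the assumption $a_n = 1 + 1/n$ (the model sequence that also appears among the examples of Sect. 6); the helper name `log_gamma` and the cutoff `n_max` are illustrative choices, not part of the paper.

```python
import math


def log_gamma(a_seq):
    """Return [ln(gamma_1), ..., ln(gamma_n)] with gamma_n = (a_1 ... a_n)^(-1)."""
    logs = []
    total = 0.0
    for a in a_seq:
        total -= math.log(a)  # ln(gamma_n) = -sum_{j <= n} ln(a_j)
        logs.append(total)
    return logs


n_max = 1000
a_seq = [1.0 + 1.0 / n for n in range(1, n_max + 1)]  # a_n = 1 + 1/n
logs = log_gamma(a_seq)

# Here a_1 ... a_n telescopes to n + 1, so gamma_n = 1 / (n + 1).
gamma_10 = math.exp(logs[9])
print(gamma_10, logs[-1])
```

For this sequence $\ln(\gamma_n) = -\ln(n + 1) \to -\infty$, so $A_0 = \bar A_0 = -\infty$ and $\gamma_n \to 0$.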
Corollary 5.3. Suppose $\sigma_{\mathrm{ess}}(J) \subset [-2, 2]$ and $E_0(J) < \infty$. Then $Z(J) < \infty$ (i.e., the Szegő condition holds) if and only if $\gamma_n$ is bounded from above (and in that case, it is also bounded away from 0; indeed, $\lim \gamma_n$ exists and is in $(0, \infty)$).

Remark. Actually, $\limsup \gamma_n < \infty$ is not needed; $\liminf \gamma_n < \infty$ is enough.

Proof. By (5.3), $\gamma_n$ bounded implies $A_0 < \infty$, and thus $Z(J) < \infty$. Conversely, $Z(J) < \infty$ implies $-\infty < A_0 = \bar A_0 < \infty$. So by (5.2), it implies $\gamma_n$ is bounded above and below.

In the case of orthogonal polynomials on the circle, Szegő's theorem says $Z < \infty$ if and only if $\kappa_j$ is bounded if and only if $\sum_{j=1}^{\infty} |\alpha_j|^2 < \infty$, where $\kappa_j$ is the leading coefficient of the normalized polynomials, and $\alpha_j$ are the Verblunsky (aka Geronimus, aka reflection) coefficients. In the real line case, if one drops the a priori requirement that $E_0(J) < \infty$, it can happen that $\gamma_n$ is bounded but $Z(J) = \infty$. For example, if $a_n \equiv 1$ but $b_n = n^{-1}$, then $Z(J)$ cannot be finite. For $J - J_0$ is Hilbert-Schmidt, so Theorem 4.4(ii) is applicable and thus, $\bar A_1^- = \infty$ implies $Z(J) = \infty$. But the other direction always holds:

Theorem 5.4. Let $J$ be a BW matrix with $Z(J) < \infty$ (i.e., the Szegő condition holds). Then $\gamma_n$ is bounded. Moreover, if $J - J_0$ is compact, then $\lim_{n\to\infty} \gamma_n$ exists.
Remarks. 1. The examples of the next section show $Z(J) < \infty$ is consistent with $\lim \gamma_n = 0$.
2. This result, even without a compactness hypothesis, is known. For $\gamma_n$ is monotone increasing in the measure (see, e.g., Nevai [15]) and so one can reduce to the case where Shohat's theorem applies.

Proof. By Theorem 4.1(c), $Z(J) < \infty$ implies $\bar A_0 < \infty$ which, by (5.4), implies $\gamma_n$ is bounded. If $J - J_0$ is compact, then Corollary 4.2 implies that $\lim \gamma_n = \exp\bigl( \lim -\sum_{j=1}^{n} \ln(a_j) \bigr)$ exists but can be zero.

Here is another interesting application of Theorem 5.2.

Theorem 5.5. Suppose $b_n \ge 0$ and
\[ \sum_{n=1}^{\infty} |a_n - 1| < \infty. \tag{5.5} \]
Then $E_0(J) < \infty$ if and only if $\sum_{n=1}^{\infty} b_n < \infty$.

Proof. If $\sum_{n=1}^{\infty} b_n < \infty$, $E_0(J) < \infty$ by (5.5) and the bounds of Hundertmark-Simon [10]. On the other hand, if $E_0(J) < \infty$, (5.5) implies $A_0 < \infty$, so by Theorem 5.2, $\sum_{n=1}^{N} b_n$ is convergent. Since $b_n \ge 0$, $\sum_{n=1}^{\infty} b_n < \infty$.

6. $O(n^{-1})$ Perturbations

In this section, we will discuss examples where
\[ a_n = 1 + \alpha n^{-1} + E_a(n), \tag{6.1} \]
\[ b_n = \beta n^{-1} + E_b(n), \tag{6.2} \]
where $E_\cdot(n)$ is small compared to $\frac{1}{n}$ in some sense. Our main result will involve the very weak requirement on the errors that $n(|E_a(n)| + |E_b(n)|) \to 0$. (In fact, we only need the weaker condition that $\sum_{j=1}^{n} (|E_a(j)| + |E_b(j)|)$ is $o(\ln n)$.) In discussing the historical context, we will consider stronger assumptions like
\[ E_\cdot(n) = \frac{\gamma}{n^2} + o\Bigl( \frac{1}{n^2} \Bigr). \tag{6.3} \]
We will also mention examples where the leading $n^{-1}$ terms are replaced by $(-1)^n n^{-1}$.

These examples are natural because they are just at the borderline beyond $J - J_0$ trace class or $A_0(J) < \infty$ or $\bar A_0(J) > -\infty$.

Here is the general picture for these examples. The $(\alpha, \beta)$ plane is divided into four regions:
(a) $|\beta| < -2\alpha$. Szegő fails at both −2 and 2.
(b) $|\beta| \le 2\alpha$. Szegő holds.
(c) $\beta > 2|\alpha|$ or $\beta = -2\alpha$ with $\beta > 0$. Szegő holds at +2 but fails at −2.
(d) $\beta < -2|\alpha|$ or $\beta = 2\alpha$ with $\beta < 0$. Szegő holds at −2 but fails at +2.
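The four regions can be encoded in a few lines. The sketch below is illustrative only: it uses the inequality $2\alpha \pm \beta \ge 0$ as a compact restatement of regions (a)–(d), ignores the error terms entirely, and the function names `szego_at_edges` and `region` are hypothetical.

```python
def szego_at_edges(alpha, beta):
    """Guideline only: Szego holds at +2 iff 2*alpha + beta >= 0,
    and at -2 iff 2*alpha - beta >= 0 (error terms are ignored here)."""
    return {"+2": 2 * alpha + beta >= 0, "-2": 2 * alpha - beta >= 0}


def region(alpha, beta):
    """Label (a)-(d) of the four regions of the (alpha, beta) plane."""
    holds = szego_at_edges(alpha, beta)
    if holds["+2"] and holds["-2"]:
        return "b"
    if not holds["+2"] and not holds["-2"]:
        return "a"
    return "c" if holds["+2"] else "d"


for point in [(-1.0, 0.5), (1.0, 0.5), (0.0, 1.0), (0.0, -1.0)]:
    print(point, region(*point))
```

The boundary cases $|\beta| = 2|\alpha|$ are assigned here exactly as in the list above, but, as the text stresses, they are the ones where the actual theorems need the strongest hypotheses on the errors.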
Remarks. 1. These are only guidelines and the actual result that we can prove requires estimates on the errors.
2. Put more succinctly, Szegő holds at ±2 if and only if $2\alpha \pm \beta \ge 0$.
3. We need strong hypotheses at the edges of our regions where $|\beta| = 2|\alpha|$. For example, “generally” Szegő should hold if $\beta = 2\alpha > 0$, but if $a_n = 1 + \alpha n^{-1} - (n \ln(n))^{-1}$ and $b_n = 2\alpha n^{-1}$, the Szegő condition fails (at −2), as follows from Theorem 6.1 below.

Here is the history of these kinds of problems:
(1) Pollaczek [18–20] found an explicit class of orthogonal polynomials in the region (in our language) $|\beta| < -2\alpha$, one example for each such $(\alpha, \beta)$, with further study by Szegő [24, 26] (but note that formula (1.7) in the appendix to Szegő's book [26] is wrong: he uses in that formula the Bateman project normalization of the parameters he calls a, b, not the normalization he uses elsewhere). They found that for these polynomials, the Szegő condition fails.
(2) In [13], Nevai reported a conjecture of Askey that (with $O(n^{-2})$ errors) Szegő fails for all $(\alpha, \beta) \ne (0, 0)$.
(3) In [1], Askey-Ismail found some explicit examples with $b_n \equiv 0$ and $\alpha > 0$, and noted that the Szegő condition holds (!), so they concluded the conjecture needed to be modified.
(4) In [7], Dombrowski-Nevai proved a general result that Szegő holds when $b_n \equiv 0$ and $\alpha > 0$ with errors of the form (6.3).
(5) In [3], Charris-Ismail computed the weights for Pollaczek-type examples in the entire $(\alpha, \beta)$ plane to the left of the line $\alpha = 1$, and considered a class depending on an additional parameter, $\lambda$. While they did not note the consequence for the Szegő condition, their example is consistent with our picture above.

In addition, we note that in [13], Nevai proved that the Szegő condition holds if $a_n = 1 + (-1)^n \alpha/n + O(n^{-2})$ and $b_n = (-1)^n \beta/n + O(n^{-2})$; see also [4].

With regard to this class, here is our result in this paper:

Theorem 6.1. Suppose
\[ \sum_{n=1}^{\infty} (a_n - 1)^2 + b_n^2 < \infty, \tag{6.4} \]
\[ \limsup_N \Bigl( -\sum_{n=1}^{N} (a_n - 1 \pm \tfrac12 b_n) \Bigr) = \infty \tag{6.5} \]
for either plus or minus. Then the Szegő condition fails at ±2.

Proof. Equation (6.5) implies that $\bar A_1^\pm(J) = \infty$ so by Theorem 4.4(ii), $Z_1^\pm(J) = \infty$.
Remark. The same kind of argument lets us also prove the failure of the Szegő condition without assuming (6.4), and with (6.5) replaced by the slightly stronger condition that
\[ \limsup_N \Bigl( -\sum_{n=1}^{N} (\ln(a_n) \pm p\, b_n) \Bigr) = \infty \tag{6.6} \]
for some $0 \le p < \tfrac12$. For one can use the step-by-step sum rule for the weight $1 \pm 2p \cos\theta$. Equation (6.4) is not needed to control errors in E-sums since they have a
definite sign near both +2 and −2, and it is not needed to replace $\ln(a)$ by $a - 1$ since (6.6) has $\ln(a_n)$.

These considerations yield another interesting result. One can prove Theorem 4.1 for the weight $w(\theta) = 1 \pm 2p \cos\theta$ just as we did it for the weight 1. Since $w(\theta)$ is bounded away from zero, the corresponding $Z^\pm$ term is finite if and only if $Z$ is. Since $p < \tfrac12$, the corresponding eigenvalue term is finite if and only if $E_0$ is. Using Theorem 4.1(a)–(d) for this $w(\theta)$, we obtain

Theorem 6.2. Let $|p| < \tfrac12$ and $|q| < \tfrac12$.
(i) If
\[ \limsup_N \Bigl( -\sum_{n=1}^{N} (\ln(a_n) + p\, b_n) \Bigr) > -\infty \]
and
\[ \liminf_N \Bigl( -\sum_{n=1}^{N} (\ln(a_n) + q\, b_n) \Bigr) = -\infty \]
then $Z(J) = \infty$.
(ii) If
\[ \liminf_N \Bigl( -\sum_{n=1}^{N} (\ln(a_n) + p\, b_n) \Bigr) < \infty \]
and
\[ \limsup_N \Bigl( -\sum_{n=1}^{N} (\ln(a_n) + q\, b_n) \Bigr) = \infty \]
then $E_0(J) = \infty$.
In particular, if $a_n = 1$, $b_n \ge 0$, and $\sum_{n=1}^{\infty} b_n = \infty$, we have $Z(J) = \infty$ and $E_0(J) = \infty$. On the other hand, if instead $\sum_{n=1}^{\infty} b_n < \infty$, then $Z(J) < \infty$ and $E_0(J) < \infty$ (see [11, 10]).

Corollary 6.3. If $a_n$, $b_n$ are given by (6.1), (6.2) with
\[ \lim_{n\to\infty} n \bigl[ |E_a(n)| + |E_b(n)| \bigr] = 0 \tag{6.7} \]
and $2\alpha \pm \beta < 0$, then the Szegő condition fails at ±2.

Remarks. 1. This is intended as separate results for + and for −.
2. All we need is
\[ \lim_{N\to\infty} (\ln N)^{-1} \sum_{n=1}^{N} (|E_a(n)| + |E_b(n)|) = 0 \]
instead of (6.7). In particular, trace class errors can be accommodated.
Proof. If (6.7) holds,
\[ \sum_{n=1}^{N} \bigl[ (a_n - 1) \pm \tfrac12 b_n \bigr] = (\alpha \pm \tfrac12 \beta) \ln N + o(\ln N) \]
so (6.5) holds if $2\alpha \pm \beta < 0$.
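The logarithmic growth used in this proof is easy to observe numerically. The following minimal sketch assumes the pure model $a_n = 1 + \alpha/n$, $b_n = \beta/n$ with the error terms $E_a$, $E_b$ set to zero; the function name `partial_sum` and the sample values of $\alpha$, $\beta$ and $N$ are illustrative choices.

```python
import math


def partial_sum(alpha, beta, sign, n_terms):
    """Sum over n = 1..N of (a_n - 1) + sign * b_n / 2 for the model
    a_n = 1 + alpha/n, b_n = beta/n (error terms set to zero)."""
    return sum((alpha + sign * 0.5 * beta) / n for n in range(1, n_terms + 1))


alpha, beta, sign = -1.0, 0.5, +1   # here 2*alpha + beta < 0
n_terms = 100_000
ratio = partial_sum(alpha, beta, sign, n_terms) / math.log(n_terms)
print(ratio, alpha + sign * 0.5 * beta)
```

The ratio approaches $\alpha \pm \tfrac12 \beta$ only slowly, since the correction is of order $1/\ln N$, but its sign already shows that the partial sums tend to $-\infty$, so that (6.5) holds.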
As for the complementary region $|\beta| \le 2\alpha$, one of us has proven (see Zlatoš [28]) the following:

Theorem 6.4 (Zlatoš [28]). Suppose $|\beta| \le 2\alpha$ and $a_n = 1 + \alpha n^{-1} + O(n^{-1-\varepsilon})$, $b_n = \beta n^{-1} + O(n^{-1-\varepsilon})$ for some $\varepsilon > 0$. Then the Szegő condition holds.

Remarks. 1. This is a corollary of a more general result (see [28]).
2. In these cases, $-\sum_{n=1}^{N} \ln(a_n)$ diverges to $-\infty$. This is only consistent with (4.18) because $E_0(J) = \infty$, that is, the eigenvalue sum diverges and the two infinities cancel.

We can use these examples to illustrate the limits of Theorem 4.1:
(1) If $a_n = 1$ and $b_n = \frac{1}{n}$, then $Z(J) = \infty$ (by Corollary 6.3) while $\bar A_0(J) = A_0(J) < \infty$. Thus $E_0(J) = \infty$.
(2) If $a_n = 1 - \frac{1}{n}$, $b_n = 0$, then $Z(J) = \infty$ (by Corollary 6.3), $\bar A_0(J) = A_0(J) = \infty$, but $E_0(J) < \infty$ since $J$ has no spectrum outside $[-2, 2]$.
(3) If $a_n = 1 + \frac{1}{n}$, $b_n = 0$, then $Z(J) < \infty$ (by Theorem 6.4), but $\bar A_0(J) = A_0(J) = -\infty$ and so $E_0(J) = \infty$.

Finally, we note that Nevai's [13] $(-1)^n/n$ theorem shows that we can have $Z(J) < \infty$, $E_0(J) < \infty$, and have the sums for $a_n$ and/or $b_n$ be only conditionally and not absolutely convergent.
References
1. Askey, R., Ismail, M.: Recurrence relations, continued fractions, and orthogonal polynomials. Mem. Am. Math. Soc. 49 (1984)
2. Case, K.M.: Orthogonal polynomials. II. J. Math. Phys. 16, 1435–1440 (1975)
3. Charris, J., Ismail, M.E.H.: On sieved orthogonal polynomials, V. Sieved Pollaczek polynomials. SIAM J. Math. Anal. 18, 1177–1218 (1987)
4. Damanik, D., Hundertmark, D., Simon, B.: Bound states and the Szegő condition for Jacobi matrices and Schrödinger operators. J. Funct. Anal., to appear
5. Deift, P., Killip, R.: On the absolutely continuous spectrum of one-dimensional Schrödinger operators with square summable potentials. Commun. Math. Phys. 203, 341–347 (1999)
6. Denisov, S.A.: On the coexistence of absolutely continuous and singular continuous components of the spectral measure for some Sturm-Liouville operators with square summable potentials. J. Diff. Eqs. 191, 90–104 (2003)
7. Dombrowski, J., Nevai, P.: Orthogonal polynomials, measures and recurrence relations. SIAM J. Math. Anal. 17, 752–759 (1986)
8. Figotin, A., Pastur, L.: Spectra of random and almost-periodic operators. Berlin: Springer-Verlag, 1992
9. Gončar, A.A.: On convergence of Padé approximants for some classes of meromorphic functions. Math. USSR Sb. 26, 555–575 (1975)
10. Hundertmark, D., Simon, B.: Lieb-Thirring inequalities for Jacobi matrices. J. Approx. Theory 118, 106–130 (2002)
11. Killip, R., Simon, B.: Sum rules for Jacobi matrices and their applications to spectral theory. Ann. Math. 158, 253–321 (2003)
12. Laptev, A., Naboko, S., Safronov, O.: On new relations between spectral properties of Jacobi matrices and their coefficients. Commun. Math. Phys., to appear
13. Nevai, P.: Orthogonal polynomials defined by a recurrence relation. Trans. Am. Math. Soc. 250, 369–384 (1979)
14. Nevai, P.: Orthogonal polynomials. Mem. Am. Math. Soc. 18(213), 185 pp (1979)
15. Nevai, P.: Géza Freud, orthogonal polynomials and Christoffel functions. A case study. J. Approx. Theory 48, 3–167 (1986)
16. Nikishin, E.M.: Discrete Sturm-Liouville operators and some problems of function theory. J. Sov. Math. 35, 2679–2744 (1986)
17. Peherstorfer, F., Yuditskii, P.: Asymptotics of orthonormal polynomials in the presence of a denumerable set of mass points. Proc. Am. Math. Soc. 129, 3213–3220 (2001)
18. Pollaczek, F.: Sur une généralisation des polynomes de Legendre. C. R. Acad. Sci. Paris 228, 1363–1365 (1949)
19. Pollaczek, F.: Systèmes de polynomes biorthogonaux qui généralisent les polynomes ultrasphériques. C. R. Acad. Sci. Paris 228, 1998–2000 (1949)
20. Pollaczek, F.: Sur une famille de polynômes orthogonaux qui contient les polynômes d'Hermite et de Laguerre comme cas limites. C. R. Acad. Sci. Paris 230, 1563–1565 (1950)
21. Rudin, W.: Real and Complex Analysis, 3rd edn. New York: McGraw-Hill, 1987
22. Shohat, J.A.: Théorie Générale des Polinomes Orthogonaux de Tchebichef. Mémorial des Sciences Mathématiques, Vol. 66. Paris: 1934, pp. 1–69
23. Simon, B.: The classical moment problem as a self-adjoint finite difference operator. Adv. Math. 137, 82–203 (1998)
24. Szegő, G.: Beiträge zur Theorie der Toeplitzschen Formen, I, II. Math. Z. 6, 167–202 (1920); 9, 167–190 (1921)
25. Szegő, G.: On certain special sets of orthogonal polynomials. Proc. Am. Math. Soc. 1, 731–737 (1950)
26. Szegő, G.: Orthogonal Polynomials, 4th edn. American Mathematical Society, Colloquium Publications, Vol. XXIII. Providence, RI: American Mathematical Society, 1975
27. Verblunsky, S.: On positive harmonic functions. Proc. London Math. Soc. 40, 290–320 (1935)
28. Zlatoš, A.: The Szegő condition for Coulomb Jacobi matrices. J. Approx. Theory 121, 119–142 (2003)

Communicated by M. Aizenman
Commun. Math. Phys. 242, 425–444 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0949-7
Communications in
Mathematical Physics
A Functional-Analytic Theory of Vertex (Operator) Algebras, II
Yi-Zhi Huang¹,²
¹ Department of Mathematics, Kerchof Hall, University of Virginia, Charlottesville, VA 22904-4137, USA
² Department of Mathematics, Rutgers University, 110 Frelinghuysen Rd., Piscataway, NJ 08854-8019, USA (permanent address). E-mail: [email protected]
Received: 30 January 2003 / Accepted: 25 April 2003
Published online: 14 October 2003 – © Springer-Verlag 2003
Abstract: For a finitely-generated vertex operator algebra $V$ of central charge $c \in \mathbb{C}$, a locally convex topological completion $H^V$ is constructed. We construct on $H^V$ a structure of an algebra over the operad of the $\frac{c}{2}$-th power $\mathrm{Det}^{c/2}$ of the determinant line bundle Det over the moduli space of genus-zero Riemann surfaces with ordered analytically parametrized boundary components. In particular, $H^V$ is a representation of the semigroup of the $\frac{c}{2}$-th power $\mathrm{Det}^{c/2}(1)$ of the determinant line bundle over the moduli space of conformal equivalence classes of annuli with analytically parametrized boundary components. The results in Part I for Z-graded vertex algebras are also reformulated in terms of the framed little disk operad. Using May's recognition principle for double loop spaces, one immediate consequence of such operadic formulations is that the compactly generated spaces corresponding to (or the k-ifications of) the locally convex completions constructed in Part I and in the present paper have the weak homotopy types of double loop spaces. We also generalize the results above to locally-grading-restricted conformal vertex algebras and to modules.
0. Introduction

The present paper develops the functional-analytic aspects of vertex operator algebras. More specifically, we construct a locally convex topological completion of a finitely-generated vertex operator algebra and a structure on this completion of an algebra over a certain natural operad constructed from genus-zero Riemann surfaces with boundaries. We obtain representation-theoretic and homotopy-theoretic consequences and give generalizations to more general algebras and modules.

For a complex number $c$, consider the sequence $\mathrm{Det}^{c/2}$ of the $\frac{c}{2}$-th powers of the determinant line bundles $\mathrm{Det}^{c/2}(n)$, $n \ge 0$, over the moduli spaces of genus-zero Riemann surfaces with $n + 1$ ordered analytically parametrized boundaries. This sequence $\mathrm{Det}^{c/2}$ has a natural structure of (genuine) operad. (See [M1, HL1, HL2] and Appendix
C of [H3] for the notion of operads and other related notions and see [Se1, Se2] and Appendix D of [H3] for determinant line bundles.) An algebra over $\mathrm{Det}^{c/2}$ such that the underlying vector space is a complete locally convex topological vector space and the corresponding maps are continuous and depend holomorphically on $\mathrm{Det}^{c/2}$ is called a genus-zero holomorphic conformal field theory of central charge $c$. See [Se1 and Se2] for a geometric definition of conformal field theory in the more general case of arbitrary genus and nonholomorphic theories.

Genus-zero conformal field theories are the starting point of a number of papers on algebraic structures derived from conformal field theories (see, for example, [KSV, KVZ]). However, the construction of examples of conformal field theories, even in this genus-zero case, is difficult and subtle. It has been expected that vertex operator algebras will give examples of such genus-zero holomorphic theories. But it is clear that vertex operator algebras themselves are not such theories. In fact, in [H1, H2 and H3], it was established that a vertex operator algebra has only the structure of an algebra over a $\mathbb{C}^\times$-rescalable partial operad in the sense of [HL1 and HL2]. It is also clear that to construct such a theory from a vertex operator algebra, one first has to construct a suitable locally convex completion of the algebra. We know that $\mathrm{Det}^{c/2}$ is generated by $\mathrm{Det}^{c/2}(1)$ and $\mathrm{Det}^{c/2}(2)$, the $\frac{c}{2}$-th powers of the determinant line bundles over moduli spaces of genus-zero Riemann surfaces with two and three, respectively, ordered analytically parametrized boundary components. Thus one must next construct continuous linear maps associated to elements in $\mathrm{Det}^{c/2}(1)$ and $\mathrm{Det}^{c/2}(2)$. Combining these maps with the geometric formulation of vertex operator algebras in terms of partial operads in [H3], it is easy to see that we will have a genus-zero holomorphic conformal field theory.
The main purpose of the present paper is to carry out this construction of genus-zero holomorphic conformal field theories from finitely-generated vertex operator algebras. The results in Part I for finitely-generated Z-graded vertex algebras are also reformulated in terms of the framed little disk operad.

Note that any genus-zero conformal field theory must be a representation of the semi-group $\mathrm{Det}^{c/2}(1)$, the $\frac{c}{2}$-th power of the determinant line bundle over the moduli space of annuli with analytically parametrized boundary components. Thus, in particular, we construct in this paper a representation of $\mathrm{Det}^{c/2}(1)$ from a finitely-generated vertex operator algebra. In fact, from the construction it is easy to see that part of our construction actually gives a representation of $\mathrm{Det}^{c/2}(1)$ from an arbitrary Z-graded representation of the Virasoro algebra satisfying a certain truncation condition. As far as the author knows, there seem to be no such general results on the integration of representations of the Virasoro algebra in the literature.

Combining the operadic formulations mentioned above with May's recognition principle for double loop spaces [M1], we conclude that the compactly generated spaces corresponding to (or the k-ifications of) the locally convex completions constructed in Part I and in the present paper have the weak homotopy types of double loop spaces. It is known that vertex operator algebras are a basic ingredient in conformal field theory and that conformal field theories describe string theory or M theory perturbatively. In string theory, there are two kinds of geometry involved, the “world-sheet” geometry and “space-time” geometry. The operad $\mathrm{Det}^{c/2}$ is part of the world-sheet geometry. The double loop space structures are interesting because they give us some “space-time” information about the vertex (operator) algebra.
Since the operad Detc/2 has a much richer structure than the little disk operad, one should be able to recognize more properties of algebras over it. It will be much more interesting if one can recognize topological properties homeomorphically, not just (weak) homotopically, or even recognize some geometric properties. It will be especially interesting to see what geometric and topolog-
ical properties can be recognized from structures associated to conformal field theories such as the minimal models which are constructed without any “space-time” geometry information. These constructions and results above generalize to locally-grading-restricted conformal vertex algebras without any difficulty. These generalizations have been used in [HZ]. We also give the corresponding results for modules without giving detailed proofs. The present paper is organized as follows: In Sect. 1, a locally convex topological completion H V of a finitely-generated vertex operator algebra V of central charge c ∈ C is constructed. In Sect. 2, a structure of a representation on H V of Detc/2 (1) is constructed. In Sect. 3, we construct linear continuous maps from the completed tensor product of two copies of H V to H V associated to elements of Detc/2 (2). In Sect. 4, we first reformulate the result in Part I ([H4]) in terms of the framed little disk operad. Then we state the main result (Theorem 4.2) of the present paper. Structures of double loop spaces on the compactly generated spaces corresponding to (or the k-ifications of) the completions constructed in [H4] and in this paper are also stated in this section. The statements of the generalizations to locally-grading-restricted conformal vertex algebras and the corresponding results for modules are given in Sects. 5 and 6, respectively. 1. A Locally Convex Completion of a Finitely-Generated Vertex Operator Algebra In this section, we construct a locally convex topological completion of a finitely-generated vertex operator algebra V . The topological completion is larger than the topological completion of a finitely-generated Z-graded grading-restricted vertex algebra constructed in Part I ([H4]). For simplicity, we shall use the same notation H V as in [H4] to denote the topological completion we shall construct in the present paper. But we warn the reader that H V in the present paper is larger than H V in [H4]. 
As in [H4], since V is fixed in the present paper, we shall denote H V simply by H . First we need to consider some geometric objects. A disk is a genus-zero Riemann surface with a connected boundary. A smooth invertible map from S 1 to the boundary of a disk is called an analytic parametrization if it can be extended to an analytic map from a neighborhood of S 1 inside the closed unit disk on the complex plane to a neighborhood of the boundary of the disk. A disk with analytically parametrized boundary is a disk equipped with an analytic parametrization of its boundary. For k ≥ 0, a k-punctured disk with analytically parametrized boundary is a disk with analytically parametrized boundary and k ordered and distinct points in the interior of the disk. Conformal equivalences between k-punctured disks with analytically parametrized boundaries are defined in the obvious way. Let (k), k ≥ 0, be the moduli spaces of k-punctured disks with analytically parametrized boundaries and let = ∪k≥0 (k). Also consider the moduli spaces B0,1,k , k ≥ 0, of genus-zero Riemann surfaces with ordered analytically parametrized boundary components, one positively oriented and the other negatively oriented and ordered. The sequence {B0,1,k }k≥0 has a natural structure of an analytic operad and this operad is isomorphic to the suboperad KH1 of the sphere partial operad K discussed in Sect. 6.4 of [H3]. It is clear that has a natural structure of a space over the operad {B0,1,k }k≥0 or equivalently over the operad KH1 . For any k ≥ 0, we have an injective map from (k) to K(k) defined as follows: Take any element of (k), that is, a conformal equivalence class of k-punctured disks with analytically parametrized boundaries. For any k-punctured disk with analytically parametrized boundary in this conformal equivalence class, by sewing the union of the exterior
of S 1 and ∞ to this k-punctured disk using the analytic boundary parametrization, we obtain a k + 1-punctured genus-zero Riemann surface, one puncture negatively oriented and the other puncture positively oriented and ordered, together with a local analytic coordinate vanishing at the negatively oriented puncture. Using the uniformization theorem, this k + 1-punctured genus-zero Riemann surface with a local coordinate at the negatively oriented puncture is conformally equivalent to C ∪ {∞} with k + 1 punctures, one negatively oriented and the other positively oriented and ordered, together with a local coordinate vanishing at the negatively oriented puncture. Moreover, we can choose the conformal equivalence (analytic diffeomorphism) such that the negatively oriented puncture is mapped to ∞, the k th positively oriented puncture is mapped to 0 and the derivative at ∞ of the local coordinate map vanishing at ∞ is 1. Adding the standard local coordinates vanishing at the positively oriented punctures, we obtain a canonical sphere with tubes of type (1, k) (see Chapter 3 of [H3]). It is clear that this canonical sphere with tubes of type (1, k) is independent of the choice of the k-punctured disk with analytically parametrized boundary in the given conformal equivalence class. We define the image of the element of (k) to be the conformal equivalence class of spheres with tubes of type (1, k) containing this canonical sphere with tubes of type (1, k). We obtain a map from (k) to K(k). Clearly this map is injective. We shall identify (k) with its image in K(k). Note that any element of (k) viewed as an element of K(k) is of the form P = (z1 , . . . , zk−1 ; A, (1, 0), . . . , (1, 0)),
(1.1)
where A ∈ H . Note, however, that not every element of K(k) of this form is an element of (k). Note also that (k) can be viewed as a subset of the Banach space $\mathbb{C}^{k-1}$ × Hol(D 1 ), where Hol(D 1 ) is the Banach space of all functions continuous on the closed unit disk D 1 and holomorphic on the open unit disk. We give (k) the topology and analytic structure induced from those on $\mathbb{C}^{k-1}$ × Hol(D 1 ).

Let (V , Y, 1, ω) be a vertex operator algebra (in the sense of [FLM and FHL]). By the isomorphism theorem proved in Chapter 5 of [H3], there exists a canonical geometric vertex operator algebra structure on V . Let $\nu_k : K(k) \to \mathrm{Hom}(V^{\otimes k}, \overline{V})$, k ≥ 0, be the maps defining the geometric vertex operator algebra structure on V . Then for any $v' \in V'$ and any $u_1, \dots, u_k, v \in V$, $\langle v', (\nu_k(P))(u_1 \otimes \cdots \otimes u_k \otimes v)\rangle$ as a function of P is meromorphic on K(k). Thus for any $u_1, \dots, u_k, v \in V$ and any P ∈ K(k), we have an element $Q(u_1, \dots, u_k, v; P) = (\nu_k(P))(u_1 \otimes \cdots \otimes u_k \otimes v) \in \overline{V}$. In particular, for any $u_1, \dots, u_k, v \in V$ and any P ∈ (k), we have an element $Q(u_1, \dots, u_k, v; P) \in \overline{V}$, since (k) can be viewed as a subset of K(k). For k ≥ 0 and n > 0, let $J_n^{(k)}$ be the set of all elements $(z_1, \dots, z_{k-1}; A, (1, 0), \dots, (1, 0))$ of (k) such that
$$|z_i - z_j| \ge \frac{1}{n},\ i \ne j, \qquad |z_i| > \frac{1}{n},\ i = 1, \dots, k-1,$$
and such that the distances from $z_i$, $i = 1, \dots, k-1$, and 0 to $C_1$ and from 0 to $C_1^{-1}$ are larger than or equal to $\frac{1}{n}$,
A Functional-Analytic Theory of Vertex (Operator) Algebras, II
where
$$C_1 = f_A(\{w \in \mathbb{C} \mid |w| = 1\}), \qquad f_A(w) = e^{\sum_{j>0} A_j w^{j+1} \frac{d}{dw}}\, w,$$
and
$$C_1^{-1} = \{w^{-1} \mid w \in C_1\}.$$
Then we see that (k) = $\bigcup_{n>0} J_n^{(k)}$, k ≥ 0. We denote the projections from V to $V_{(n)}$, n ∈ Z, by $P_n$ as in [H3]. For fixed k ≥ 0, by the sewing axiom for geometric vertex operator algebras in [H3], the series
$$\sum_{n \in \mathbb{Z}} \langle v', (\nu_l(Q))(v_1 \otimes \cdots \otimes v_{l-1} \otimes P_n(Q(u_1, \dots, u_k, v; P)))\rangle, \tag{1.2}$$
$v' \in V'$, $u_1, \dots, u_k, v_1, \dots, v_l \in V$, P ∈ (k) and $Q \in K_{H_1}(l)$, is absolutely convergent. For fixed $v' \in V'$, $u_1, \dots, u_k, v_1, \dots, v_l \in V$, and $Q \in K_{H_1}(l)$, the sum of (1.2) gives a function on (k).
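The exhaustion of the moduli space by the sets $J_n^{(k)}$ is easier to use once one notes that these sets increase with n. The following routine check is not taken from the source; it uses only the fact that every defining condition of $J_n^{(k)}$ is a lower bound by 1/n.

```latex
% Monotonicity of the sets J_n^{(k)} (routine check, not from the source).
% Every defining condition is a lower bound by 1/n, and 1/(n+1) < 1/n.
\[
  |z_i - z_j| \ge \tfrac{1}{n} \;\Longrightarrow\; |z_i - z_j| \ge \tfrac{1}{n+1},
  \qquad
  |z_i| > \tfrac{1}{n} \;\Longrightarrow\; |z_i| > \tfrac{1}{n+1},
\]
% and likewise for the distance conditions involving C_1 and C_1^{-1}. Hence
\[
  J_1^{(k)} \subset J_2^{(k)} \subset \cdots \subset J_n^{(k)} \subset J_{n+1}^{(k)} \subset \cdots .
\]
```

In particular, a bound on every $J_n^{(k)}$ is a statement of uniform control away from configurations in which punctures collide or approach the boundary curve.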
Lemma 1.1. The function defined by the sum of (1.2) is bounded on $J_n^{(k)}$, n > 0.

Proof. By the sewing axiom for geometric vertex operator algebras in [H3], (1.2) is equal to
$$\langle v', (\nu_{k+l-1}(Q\,{}_l\infty_0\,P))(v_1 \otimes \cdots \otimes v_{l-1} \otimes u_1 \otimes \cdots \otimes u_k \otimes v)\rangle. \tag{1.3}$$
From (1.3) and the definition of $\nu_{k+l-1}$ in [H3], we see that to prove the lemma, we need only show that when $P \in J_n^{(k)}$, the distances between distinct punctures of $Q\,{}_l\infty_0\,P$ are larger than a fixed positive number depending only on n, and that each expansion coefficient, as a function of P , of the analytic local coordinate maps vanishing at these punctures is bounded on $J_n^{(k)}$. We first recall some facts and results from [H3]. Let
$$Q = (\xi_1, \dots, \xi_{l-1}; B^{(0)}, (b_0^{(1)}, B^{(1)}), \dots, (b_0^{(l)}, B^{(l)}))$$
and
$$f_{B^{(i)}, b_0^{(i)}}(w) = b_0^{(i)}\, e^{\sum_{j>0} B_j^{(i)} w^{j+1} \frac{d}{dw}}\, w$$
for i = 0, . . . , l. We shall also use the same notations $f_{B^{(i)}, b_0^{(i)}}$, i = 0, . . . , l, and $f_A$ to denote the corresponding local coordinate maps. Then by the study of the sewing operation in [H3], the sewing equation
$$F^{(1)}(w) = F^{(2)}\left(f_A^{-1}\left(\frac{1}{f_{B^{(l)}, b_0^{(l)}}(w)}\right)\right)$$
together with the normalization conditions
$$F^{(1)}(\infty) = \infty, \qquad F^{(2)}(0) = 0, \qquad \lim_{w \to \infty} \frac{F^{(1)}(w)}{w} = 1,$$
has a unique solution pair $F^{(1)}$, $F^{(2)}$, and the positively oriented punctures of $Q\,{}_l\infty_0\,P$ corresponding to the positively oriented punctures of P are $F^{(2)}(z_1), \dots, F^{(2)}(z_{k-1})$ and 0. The local coordinate maps vanishing at these punctures are $F^{(2)}(w) - F^{(2)}(z_1), \dots, F^{(2)}(w) - F^{(2)}(z_{k-1})$ and $F^{(2)}(w)$, respectively. It is also proved in [H3] that the sewing operation is analytic. In particular, it is continuous. Thus $Q\,{}_l\infty_0\,P$ is continuous in P ∈ (k). In fact, the proof shows that $F^{(1)}$ and $F^{(2)}$ depend on $f_A$ and $f_{B^{(l)}, b_0^{(l)}}$ analytically and, in particular, continuously.

First we prove that when $P \in J_n^{(k)}$, the distances between distinct punctures of $Q\,{}_l\infty_0\,P$ are larger than a fixed positive number depending only on n. If this is not true, then there is a sequence $\{P_m\}_{m>0}$ in $J_n^{(k)}$ and two punctures on $Q\,{}_l\infty_0\,P_m$ for each m > 0 having the same orders, such that the distance between these two punctures goes to 0 when m goes to ∞. We consider the case that these two punctures are positively oriented punctures corresponding to two nonzero positively oriented punctures on $P_m$. If we use $z_1(P_m), \dots, z_{k-1}(P_m)$ to denote the punctures of $P_m$ and $F_m^{(1)}$ and $F_m^{(2)}$ to denote the solution of the sewing equation and the normalization conditions with P replaced by $P_m$, then by the results in [H3] we recalled above, these two punctures of $Q\,{}_l\infty_0\,P_m$ must be of the form $F_m^{(2)}(z_p(P_m))$ and $F_m^{(2)}(z_q(P_m))$ for some 0 < p, q < k. On the other hand, we can also obtain $z_1(P_m), \dots, z_{k-1}(P_m)$ from $F_m^{(2)}(z_1(P_m)), \dots, F_m^{(2)}(z_{k-1}(P_m))$
(1)
as follows: We sew the first puncture of (0, (b0 , B (l) (b0 ))) to the 0-th puncture of (Fm(2) (z1 (Pm )), . . . , Fm(2) (zk−1 (Pm )); − − , (1, 0), . . . , (1, 0)), where
(l)
(l)
(l)
B (l) (b0 ) = {(b0 )j Bj }j >0
and − − = {−j }j <0 is the sequence defined by F (1) (w) = e−
j <0
d j wj +1 dw
w.
Then the positively oriented punctures of the resulting element are $z_1(P_m), \dots, z_k(P_m)$. In particular, $z_1(P_m), \dots, z_k(P_m)$ depend continuously on $F_m^{(2)}(z_1(P_m)), \dots, F_m^{(2)}(z_{k-1}(P_m))$. Thus, since the distance between $F_m^{(2)}(z_p(P_m))$ and $F_m^{(2)}(z_q(P_m))$ goes to 0 when m goes to ∞, the distance between $z_p(P_m)$ and $z_q(P_m)$ must also go to 0 when m goes to ∞. But $\{P_m\}_{m>0}$ is in $J_n^{(k)}$ and by the definition of $J_n^{(k)}$, this is impossible. Similarly we get contradictions in the other cases. Thus when $P \in J_n^{(k)}$, the distances between distinct punctures of $Q\,{}_l\infty_0\,P$ are larger than a fixed positive number depending only on n.

Now we prove that each expansion coefficient, as a function of P , of the analytic local coordinate maps vanishing at the punctures of $Q\,{}_l\infty_0\,P$ is bounded on $J_n^{(k)}$. For simplicity, we prove this for the expansion coefficients of the analytic local coordinate map vanishing at the last puncture 0 of $Q\,{}_l\infty_0\,P$. By the results in [H3] we recalled above, the local coordinate map vanishing at 0 is $(F^{(2)})^{-1}$. Since the expansion coefficients of $(F^{(2)})^{-1}$ at 0 are polynomials in the expansion coefficients of $F^{(2)}$, we need only show that each expansion coefficient of $F^{(2)}$ as a function of P is bounded on $J_n^{(k)}$. Note that
the domain of $F^{(2)}$ contains $C_1$ and the interior of $C_1$. When $P \in J_n^{(k)}$, the union of $C_1$ and the interior of $C_1$ always contains the closed disk centered at 0 of radius 1/n. Since the radius 1/n is independent of P , using the Cauchy formulas for the expansion coefficients of $F^{(2)}$, we see that each of these coefficients is bounded on $J_n^{(k)}$.

Let $\tilde{G}$ be the subspace of $V^*$ consisting of linear functionals λ on V such that for any k ≥ 0, $u_1, \dots, u_k, v \in V$, P ∈ (k),
$$\sum_{n \in \mathbb{Z}} \lambda(P_n(Q(u_1, \dots, u_k, v; P))) \tag{1.4}$$
is absolutely convergent and its sum as a function on (k) is bounded on $J_n^{(k)}$, n > 0. The dual pair $(V^*, V)$ of vector spaces gives $V^*$ a locally convex topology. With the topology induced from the one on $V^*$, $\tilde{G}$ is also a locally convex space. Note that $V'$ is a subspace of $\tilde{G}$. (We warn the reader that $\tilde{G}$ here is different from $\tilde{G}$ in [H4]. In the present paper, many notations we use are the same as the corresponding notations in [H4], but what they denote is different from what the same notations denote in [H4].) We denote the analytic function on (k) defined by (1.4) by $g_k(\lambda \otimes u_1 \otimes \cdots \otimes u_k \otimes v)$, since (1.4) is multilinear in λ, $u_1, \dots, u_k$ and v. These functions span a vector space $F_k$ of analytic functions on (k). We obtain a linear map
$$g_k : \tilde{G} \otimes V^{\otimes(k+1)} \to F_k.$$
By definition, elements of $F_k$ are bounded on $J_n^{(k)}$, n > 0. We define a family of norms $\|\cdot\|_{F_k, n}$, n > 0, on $F_k$ by
$$\|g\|_{F_k, n} = \sup_{Q \in J_n^{(k)}} |g(Q)|$$
for $g \in F_k$. These norms give a locally convex topology on $F_k$. Note that a net $\{f_\alpha\}_{\alpha \in A}$ (where A is an index set) in $F_k$ is convergent to $f \in F_k$ if and only if it is convergent uniformly on $J_n^{(k)}$ for $n > n_0$, where $n_0$ is a positive integer.

For any k ≥ 0, there is an embedding $\iota_{F_k}$ from $F_k$ to $F_{k+1}$ defined as follows: We use $(z_0, \dots, z_{k-1}; A, (1, 0), \dots, (1, 0))$ instead of $(z_1, \dots, z_k; A, (1, 0), \dots, (1, 0))$ to denote the elements of (k + 1). For $\lambda \in \tilde{G}$, $u_1, \dots, u_k, v \in V$, since Y (1, z) = 1 for any nonzero complex number z, $g_{k+1}(\lambda \otimes \mathbf{1} \otimes u_1 \otimes \cdots \otimes u_k \otimes v)$ as a function of $(z_0, \dots, z_{k-1}; A, (1, 0), \dots, (1, 0))$ is in fact independent of $z_0$, and is equal to $g_k(\lambda \otimes u_1 \otimes \cdots \otimes u_k \otimes v)$
as a function of $(z_1, \dots, z_{k-1}; A, (1, 0), \dots, (1, 0))$. Thus we obtain a well-defined linear map $\iota_{F_k} : F_k \to F_{k+1}$ such that $\iota_{F_k} \circ g_k = g_{k+1} \circ \phi_k$, where
$$\phi_k : \tilde{G} \otimes V^{\otimes(k+1)} \to \tilde{G} \otimes V^{\otimes(k+2)}$$
is defined by
$$\phi_k(\lambda \otimes u_1 \otimes \cdots \otimes u_k \otimes v) = \lambda \otimes \mathbf{1} \otimes u_1 \otimes \cdots \otimes u_k \otimes v$$
for $\lambda \in \tilde{G}$, $u_1, \dots, u_k, v \in V$. It is clear that $\iota_{F_k}$ is injective. Thus we can regard $F_k$ as a subspace of $F_{k+1}$. Moreover, we have:

Proposition 1.2. For any k ≥ 0, $\iota_{F_k}$ as a map from $F_k$ to $\iota_{F_k}(F_k)$ is continuous and open. In other words, the topology on $F_k$ is induced from that on $F_{k+1}$.

Proof. We consider the two topologies on $F_k$: one is the topology defined above for $F_k$ and the other is induced from the topology on $F_{k+1}$. We need only prove that for any n > 0, (i) the norm $\|\cdot\|_{F_k, n}$ is continuous in the topology induced from the one on $F_{k+1}$, and (ii) the restriction of the norm $\|\cdot\|_{F_{k+1}, n}$ to $F_k$ is continuous in the topology on $F_k$. Let $\{f_\alpha\}_{\alpha \in A}$ (where A is an index set) be a net in $F_k$ convergent in the topology induced from the one on $F_{k+1}$. Then $\{f_\alpha\}_{\alpha \in A}$, when viewed as a net of functions of $(z_0, z_1, \dots, z_{k-1}; A, (1, 0), \dots, (1, 0))$, is convergent uniformly on $J_n^{(k+1)}$, n > 0. Since $f_\alpha$, α ∈ A, are independent of $z_0$, $\{f_\alpha\}_{\alpha \in A}$ is in fact convergent uniformly on the sets $J_{n+1}^{(k)}$, n > 0, proving (i). Now let $\{f_\alpha\}_{\alpha \in A}$ be a net in $F_k$ convergent in the topology on $F_k$. Then $\{f_\alpha\}_{\alpha \in A}$ is convergent uniformly on $J_n^{(k)}$, n > 0. If we view $f_\alpha$, α ∈ A, as functions on C × (k), then the net $\{f_\alpha\}_{\alpha \in A}$ is convergent uniformly on $(\mathbb{C} \times J_n^{(k)})$ ∩ (k + 1) (where we view (k + 1) as a subset of C × (k)). Since $J_n^{(k+1)} \subset (\mathbb{C} \times J_n^{(k)})$ ∩ (k + 1), $\{f_\alpha\}_{\alpha \in A}$ is convergent uniformly on $J_n^{(k+1)}$, n > 0, proving (ii).

We equip the topological dual space $F_k^*$, k ≥ 0, of $F_k$ with the strong topology, that is, the topology of uniform convergence on all the weakly bounded subsets of $F_k$. Then $F_k^*$ is a locally convex space. For k ≥ 0, we define a linear map $\gamma_k : F_{k+1} \to F_k$ as follows: We use $P = (z_0, \dots, z_{k-1}; A, (1, 0), \dots, (1, 0))$ to denote an element of (k + 1). Recall that
$$C_1 = f_A(\{w \in \mathbb{C} \mid |w| = 1\}) \qquad \text{and} \qquad f_A(w) = e^{\sum_{j>0} A_j w^{j+1} \frac{d}{dw}}\, w.$$
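To make the exponential formula for $f_A$ more concrete, here is a low-order expansion. It is an illustrative computation rather than part of the source; the shorthand L for the vector field in the exponent is introduced only for this sketch.

```latex
% Low-order expansion of f_A(w) = exp(L) w, where (notation for this sketch only)
%   L = \sum_{j>0} A_j w^{j+1} d/dw.
\begin{align*}
  L\,w   &= A_1 w^2 + A_2 w^3 + O(w^4), \\
  L^2 w  &= (A_1 w^2 + A_2 w^3 + \cdots)(2A_1 w + 3A_2 w^2 + \cdots) = 2A_1^2 w^3 + O(w^4), \\
  f_A(w) &= w + L\,w + \tfrac{1}{2} L^2 w + \cdots
          = w + A_1 w^2 + (A_2 + A_1^2)\, w^3 + O(w^4).
\end{align*}
% Sanity check with A_j = 0 for j >= 2: the time-one flow of A_1 w^2 d/dw is
%   f_A(w) = w / (1 - A_1 w) = w + A_1 w^2 + A_1^2 w^3 + \cdots,
% in agreement with the expansion above.
```

So $f_A(0) = 0$ and $f_A'(0) = 1$, and the sequence A records the higher Taylor coefficients of the boundary parametrization in a nonlinear way.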
We define
$$\gamma_k(g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)) = \frac{1}{2\pi\sqrt{-1}} \oint_{C_1} z_0^{-1}\, g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)\, dz_0 \tag{1.5}$$
for $\lambda \in \tilde{G}$, $u_0, u_1, \dots, u_k, v \in V$.

We still need to show that the right-hand side of (1.5) is indeed in $F_k$. Let $P' = (z_1, \dots, z_{k-1}; A, (1, 0), \dots, (1, 0))$ ∈ (k). Then we have $P = (f_A^{-1}(z_0); 0, B(z_0), (1, 0))\,{}_2\infty_0\,P'$ (see formula (A.6.1) in [H3]), where B(z0 ) = Eˆ −1
(1.6)
fA−1
1
−
1 x+ f 1(z
1 . z0
A 0)
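By the definition of $g_k$ and (1.6), we have
$$\gamma_k(g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)) = \frac{1}{2\pi\sqrt{-1}} \oint_{C_1} z_0^{-1} \sum_{n \in \mathbb{Z}} \lambda(P_n(Q(u_0, \dots, u_k, v; P)))\, dz_0. \tag{1.7}$$
Since the series
$$\sum_{n \in \mathbb{Z}} \lambda(P_n(Q(u_0, \dots, u_k, v; P)))$$
is absolutely convergent, the right-hand side of (1.7) is equal to
$$\begin{aligned}
&\frac{1}{2\pi\sqrt{-1}} \sum_{n \in \mathbb{Z}} \oint_{C_1} z_0^{-1}\, \lambda(P_n(\nu_{k+1}(P)(u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)))\, dz_0 \\
&\quad = \frac{1}{2\pi\sqrt{-1}} \sum_{n \in \mathbb{Z}} \oint_{C_1} z_0^{-1}\, \lambda(P_n(\nu_{k+1}((f_A^{-1}(z_0); 0, B(z_0), (1, 0))\,{}_2\infty_0\,P')(u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)))\, dz_0 \\
&\quad = \frac{1}{2\pi\sqrt{-1}} \sum_{n \in \mathbb{Z}} \oint_{C_1} z_0^{-1}\, \lambda(P_n((\nu_2((f_A^{-1}(z_0); 0, B(z_0), (1, 0)))\,{}_2{*}_0\,\nu_k(P'))(u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)))\, dz_0 \\
&\quad = \frac{1}{2\pi\sqrt{-1}} \sum_{n \in \mathbb{Z}} \oint_{C_1} z_0^{-1}\, \lambda(Y(e^{-\sum_{j>0} B_j(z_0) L(j)} u_0, f_A^{-1}(z_0))\, P_n((\nu_k(P'))(u_1 \otimes \cdots \otimes u_k \otimes v)))\, dz_0 \\
&\quad = \frac{1}{2\pi\sqrt{-1}} \sum_{n \in \mathbb{Z}} \oint_{|w|=1} (f_A(w))^{-1} f_A'(w)\, \lambda(Y(e^{-\sum_{j>0} B_j(f_A(w)) L(j)} u_0, w)\, P_n((\nu_k(P'))(u_1 \otimes \cdots \otimes u_k \otimes v)))\, dw \\
&\quad = \sum_{n \in \mathbb{Z}} \mathrm{Res}_w\, (f_A(w))^{-1} f_A'(w)\, \lambda(Y(e^{-\sum_{j>0} B_j(f_A(w)) L(j)} u_0, w)\, P_n((\nu_k(P'))(u_1 \otimes \cdots \otimes u_k \otimes v))).
\end{aligned} \tag{1.8}$$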
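The passage from the integral over $C_1$ to the integral over the unit circle in the computation above is a change of variables. The following sketch spells it out; the function name h is a placeholder introduced here for the integrand and does not appear in the source.

```latex
% Substitution z_0 = f_A(w), |w| = 1, which parametrizes C_1 = f_A({|w| = 1}).
% h is a hypothetical name for the part of the integrand depending on z_0.
\begin{align*}
  dz_0 &= f_A'(w)\,dw, \qquad z_0^{-1} = (f_A(w))^{-1}, \qquad f_A^{-1}(z_0) = w, \\
  \frac{1}{2\pi\sqrt{-1}} \oint_{C_1} z_0^{-1}\, h(z_0)\, dz_0
     &= \frac{1}{2\pi\sqrt{-1}} \oint_{|w|=1} (f_A(w))^{-1} f_A'(w)\, h(f_A(w))\, dw .
\end{align*}
% If the integrand on the right has a Laurent expansion \sum_m c_m w^m that
% converges uniformly on |w| = 1, then the last integral equals c_{-1},
% which is what \mathrm{Res}_w extracts.
```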
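Let $\tilde{\lambda}$ be an element of $V^*$ defined by
$$\tilde{\lambda}(v) = \mathrm{Res}_w\, (f_A(w))^{-1} f_A'(w)\, \lambda(Y(e^{-\sum_{j>0} B_j(f_A(w)) L(j)} u_0, w) v).$$
Then by (1.7) and (1.8), $\tilde{\lambda} \in \tilde{G}$ and thus the right-hand side of (1.5) is in $F_k$.

Proposition 1.3. The map $\gamma_k$ is continuous and satisfies
$$\gamma_k \circ \iota_{F_k} = I_{F_k}, \tag{1.9}$$
where $I_{F_k}$ is the identity map on $F_k$.

Proof. We still use $(z_0, \dots, z_{k-1}; A, (1, 0), \dots, (1, 0))$ instead of $(z_1, \dots, z_k; A, (1, 0), \dots, (1, 0))$ to denote an element of (k + 1). We know that there exists t ∈ [0, 1) such that for $\epsilon \in [t, 1]$, $\{w \in \mathbb{C} \mid |w| = \epsilon\}$ is in the domain of $f_A(1/w)$. Let $C_\epsilon = f_A(\{w \in \mathbb{C} \mid |w| = 1/\epsilon\})$ for $\epsilon \in [t, 1]$. Then by the definition of $\gamma_k$ and Cauchy's theorem, for any $\epsilon \in [t, 1]$ such that $z_1, \dots, z_{k-1}$ are in the interior of $C_\epsilon$, we have
$$\gamma_k(g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)) = \frac{1}{2\pi\sqrt{-1}} \oint_{C_\epsilon} z_0^{-1}\, g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)\, dz_0$$
for $\lambda \in \tilde{G}$, $u_0, \dots, u_k, v \in V$. Thus by the definition of $J_n^{(k)}$, for any n > 0, there exists $\epsilon_n \in [t, 1]$ such that
$$\begin{aligned}
\|\gamma_k(g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v))\|_{F_k, n}
&= \sup_{(z_1, \dots, z_{k-1}; A, (1,0), \dots, (1,0)) \in J_n^{(k)}} |\gamma_k(g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v))| \\
&= \sup_{(z_1, \dots, z_{k-1}; A, (1,0), \dots, (1,0)) \in J_n^{(k)}} \left| \frac{1}{2\pi\sqrt{-1}} \oint_{C_{\epsilon_n}} z_0^{-1}\, g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)\, dz_0 \right| \\
&\le \sup_{(z_1, \dots, z_{k-1}; A, (1,0), \dots, (1,0)) \in J_n^{(k)},\ z_0 \in C_{\epsilon_n}} |g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)|.
\end{aligned} \tag{1.10}$$
For any $z_0 \in C_{\epsilon_n}$, it is clear that there always exist a positive integer $n_{z_0}$ and an open subset $U_{z_0}$ of C containing $z_0$ such that $U_{z_0} \times J_n^{(k)} \subset J_{n_{z_0}}^{(k+1)}$. Since $C_{\epsilon_n}$ is compact, there exist finitely many points $z_0^{(1)}, \dots, z_0^{(l)} \in C_{\epsilon_n}$ such that $U_{z_0^{(1)}}, \dots, U_{z_0^{(l)}}$ cover $C_{\epsilon_n}$. Thus the right-hand side of (1.10) is less than or equal to
$$\sum_{i=1}^{l} \max_{(z_0, \dots, z_{k-1}; A, (1,0), \dots, (1,0)) \in J^{(k+1)}_{n_{z_0^{(i)}}}} |g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)| = \sum_{i=1}^{l} \|g_{k+1}(\lambda \otimes u_0 \otimes u_1 \otimes \cdots \otimes u_k \otimes v)\|_{F_{k+1},\, n_{z_0^{(i)}}}. \tag{1.11}$$
Combining (1.10) and (1.11), we see that $\gamma_k$ is continuous.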
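For $\lambda \in \tilde{G}$, $u_1, \dots, u_k, v \in V$, by definition, $g_{k+1}(\lambda \otimes \mathbf{1} \otimes u_1 \otimes \cdots \otimes u_k \otimes v) = \iota_{F_k}(g_k(\lambda \otimes u_1 \otimes \cdots \otimes u_k \otimes v))$. By definition, $g_{k+1}(\lambda \otimes \mathbf{1} \otimes u_1 \otimes \cdots \otimes u_k \otimes v) = g_k(\lambda \otimes u_1 \otimes \cdots \otimes u_k \otimes v)$. Thus $\gamma_k(g_{k+1}(\lambda \otimes \mathbf{1} \otimes u_1 \otimes \cdots \otimes u_k \otimes v)) = g_k(\lambda \otimes u_1 \otimes \cdots \otimes u_k \otimes v)$. So we have (1.9).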
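The last step of this proof can be checked directly from the integral formula (1.5). The sketch below assumes, as is implicit in the construction, that $C_1$ is a simple closed curve winding once counterclockwise around 0; that assumption is stated here and is not quoted from the source.

```latex
% Why gamma_k(iota_{F_k}(g)) = g, written out from the integral formula (1.5).
% Assumption (made explicit here): C_1 winds once counterclockwise around 0.
% Let g = g_{k+1}(\lambda \otimes \mathbf{1} \otimes u_1 \otimes \cdots \otimes u_k \otimes v),
% which does not depend on z_0. Then
\[
  \gamma_k(g)
  = \frac{1}{2\pi\sqrt{-1}} \oint_{C_1} z_0^{-1}\, g \, dz_0
  = g \cdot \frac{1}{2\pi\sqrt{-1}} \oint_{C_1} \frac{dz_0}{z_0}
  = g \cdot 1
  = g .
\]
```

The normalization factor $z_0^{-1}$ in (1.5) is thus exactly what makes $\gamma_k$ a left inverse of $\iota_{F_k}$.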
The proof of the following consequence is the same as the proof of Corollary 1.3 in [H4]:

Corollary 1.4. The adjoint map $\gamma_k^*$ of $\gamma_k$ satisfies
$$\iota_{F_k}^* \circ \gamma_k^* = I_{F_k^*}, \tag{1.12}$$
where $\iota_{F_k}^* : F_{k+1}^* \to F_k^*$ is the adjoint of $\iota_{F_k}$ and $I_{F_k^*}$ is the identity on $F_k^*$. The map $\gamma_k^*$ is injective and continuous. As a map from $F_k^*$ to $\gamma_k^*(F_k^*)$, it is also open. In particular, if we identify $F_k^*$ with $\gamma_k^*(F_k^*)$, the topology on $F_k^*$ is induced from the one on $F_{k+1}^*$.

In the rest of this section, we give the remaining steps in the construction of the locally convex completion. These steps are mostly the same as those in [H4]. Thus our description of these steps shall be brief. Also we warn the reader again that although the notations we use below are the same as those in [H4], they denote different things in the present paper.

We use $\langle \cdot, \cdot \rangle$ to denote the pairing between $\tilde{G}$ and the algebraic dual space $\tilde{G}^*$ of $\tilde{G}$. It is an extension of the pairing between $V'$ and V denoted using the same symbol. The spaces $\tilde{G}$ and $\tilde{G}^*$ with this pairing form a dual pair of vector spaces and thus give a locally convex topology to $\tilde{G}^*$. The dual space $\tilde{G}^*$ can be viewed as a subspace of $(V')^* = \overline{V}$. We define
$$e_k : V^{\otimes(k+1)} \otimes F_k^* \to \tilde{G}^* \subset \overline{V}$$
by
$$\langle \lambda, e_k(u_1 \otimes \cdots \otimes u_k \otimes v \otimes \mu) \rangle = \mu(g_k(\lambda \otimes u_1 \otimes \cdots \otimes u_k \otimes v))$$
for $\lambda \in \tilde{G}$, $u_1, \dots, u_k, v \in V$ and $\mu \in F_k^*$.

We now have to assume that V is finitely generated. Let X be the finite-dimensional subspace of V spanned by a finite set of generators of V containing the vacuum vector 1. We give X the topology induced by any norm on X. Then $X^{\otimes(k+1)} \otimes F_k^*$ is a locally convex space. Let $G_k$ be the image $e_k(X^{\otimes(k+1)} \otimes F_k^*)$ of $X^{\otimes(k+1)} \otimes F_k^* \subset V^{\otimes(k+1)} \otimes F_k^*$ under $e_k$. The proofs of Propositions 1.5 and 1.6 below are the same as the proofs of Propositions 1.4 and 1.5 in [H4]:
Proposition 1.5. For any k ≥ 0, $G_k \subset G_{k+1}$.

Proposition 1.6. The linear map
$$e_k|_{X^{\otimes(k+1)} \otimes F_k^*} : X^{\otimes(k+1)} \otimes F_k^* \to \tilde{G}^*$$
is continuous.

Corollary 1.7. The quotient space
$$(X^{\otimes(k+1)} \otimes F_k^*) / (e_k|_{X^{\otimes(k+1)} \otimes F_k^*})^{-1}(0)$$
is a locally convex space.

Using the isomorphism from $G_k$ to
$$(X^{\otimes(k+1)} \otimes F_k^*) / (e_k|_{X^{\otimes(k+1)} \otimes F_k^*})^{-1}(0),$$
we obtain a locally convex space structure on $G_k$ from that on this quotient space.

Let $H_k$ be the completion of $G_k$. Then $H_k$ is a complete locally convex space. The proof of the following proposition is the same as the proof of Proposition 1.7 in [H4]:

Proposition 1.8. The space $H_k$ can be embedded canonically in $H_{k+1}$. The topology on $H_k$ is the same as the one induced from the topology on $H_{k+1}$.

Now we have a strictly increasing sequence $\{H_k\}_{k \ge 0}$ of complete locally convex spaces. Let
$$H = \bigcup_{k \ge 0} H_k$$
equipped with the inductive limit topology. Then H is a complete locally convex space. Let
$$G = \bigcup_{k \ge 0} G_k \subset H.$$
Then V ⊂ G and G is dense in H . The same argument as in [H4] shows that G is in the closure of V . Thus we have:

Theorem 1.9. The vector space H equipped with the strict inductive limit topology is a locally convex completion of V .

2. The Locally Convex Completion and a Semi-Group of Annuli

In this section, we construct, on the topological completion H , a structure of a representation of the semi-group of the c/2-th power of the determinant line bundle over the moduli space of conformal equivalence classes of annuli with analytically parametrized boundary components. Consider the moduli space $B_{0,1,1}$ of annuli, that is, the space of conformal equivalence classes of genus-zero Riemann surfaces with two boundary components, one positively
oriented and one negatively oriented, and with analytic parametrizations of the boundary components. There is a sewing operation on $B_{0,1,1}$ such that it becomes a semi-group. (See Appendix D of [H3] for details.) There is a determinant line bundle Det(1) over $B_{0,1,1}$ and its c-th power $\mathrm{Det}^c(1)$ for any c ∈ C is well-defined.

Proposition 2.1. For any complex number c, $\mathrm{Det}^c(1)$ has a structure of a semi-group and is the central extension of $B_{0,1,1}$ with central charge 2c.

This result and its proof are contained implicitly in Appendix D of [H3]. See [H3] for details. By the uniformization theorem, it is clear that the semi-group $B_{0,1,1}$ is isomorphic to the semi-group of the moduli space $K_{H_1}(1)$ equipped with the sewing operation. We shall identify $B_{0,1,1}$ with $K_{H_1}(1)$. Over the moduli space K(1), we have a determinant line bundle and its c/2-th power $\tilde{K}^c(1)$ for any complex number c. We denote the restriction of $\tilde{K}^c(1)$ to $K_{H_1}(1)$ by $\tilde{K}^c_{H_1}(1)$. Then $\tilde{K}^c_{H_1}(1)$ is a semi-group isomorphic to $\mathrm{Det}^{c/2}(1)$. See [H3] for details. We now construct a structure of a representation of $\tilde{K}^c_{H_1}(1)$ on H , where c is the central charge of V .

First we give a right action of $\tilde{K}^c_{H_1}(1)$ on $\tilde{G}$. Let $\lambda \in \tilde{G}$ and $\tilde{Q} = (Q; C) \in \tilde{K}^c_{H_1}(1)$ (where $Q \in K_{H_1}(1)$ and $C \in \mathbb{C}$). We define $\lambda_{\tilde{Q}} \in V^*$ by
$$\lambda_{\tilde{Q}}(v) = C \sum_{n \in \mathbb{Z}} \lambda(P_n((\nu_1(Q))(v))). \tag{2.1}$$
Note that the right-hand side of (2.1) is absolutely convergent because $\lambda \in \tilde{G}$.

Lemma 2.2. The linear functional $\lambda_{\tilde{Q}}$ is in fact in $\tilde{G}$.

Proof. By definition, for any P ∈ (k),
$$\begin{aligned}
\sum_{n \in \mathbb{Z}} \lambda_{\tilde{Q}}(P_n(Q(u_1, \dots, u_k, v; P)))
&= \sum_{n \in \mathbb{Z}} \lambda_{\tilde{Q}}(P_n((\nu_k(P))(u_1 \otimes \cdots \otimes u_k \otimes v))) \\
&= C \sum_{n \in \mathbb{Z}} \sum_{m \in \mathbb{Z}} \lambda(P_m((\nu_1(Q))(P_n((\nu_k(P))(u_1 \otimes \cdots \otimes u_k \otimes v))))).
\end{aligned} \tag{2.2}$$
We want to show that the right-hand side of (2.2) is absolutely convergent. To show this convergence, we note that, by the sewing axiom for geometric vertex operator algebras,
$$\begin{aligned}
\sum_{m \in \mathbb{Z}} \sum_{n \in \mathbb{Z}} \lambda(P_m((\nu_1(Q))(P_n((\nu_k(P))(u_1 \otimes \cdots \otimes u_k \otimes v)))))
&= \sum_{m \in \mathbb{Z}} \lambda(P_m(((\nu_1(Q))\,{}_1{*}_0\,(\nu_k(P)))(u_1 \otimes \cdots \otimes u_k \otimes v))) \\
&= \sum_{m \in \mathbb{Z}} \lambda(P_m((\nu_k(Q\,{}_1\infty_0\,P))(u_1 \otimes \cdots \otimes u_k \otimes v))).
\end{aligned} \tag{2.3}$$
Note that since $\lambda \in \tilde{G}$, the right-hand side of (2.3) is absolutely convergent and is analytic in P and Q. Thus the double sum and the iterated sum in the other order are also absolutely convergent. Since the iterated sum in the right-hand side of (2.2) is exactly the iterated sum in the other order, it is absolutely convergent.
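The final step of this proof relies on the usual rearrangement property of absolutely convergent double series. For reference, the fact being used can be stated as follows; this is a standard result recalled here, not a statement quoted from the source.

```latex
% Standard fact about double series of complex numbers a_{m,n}:
\[
  \sum_{(m,n) \in \mathbb{Z}^2} |a_{m,n}| < \infty
  \quad\Longrightarrow\quad
  \sum_{m \in \mathbb{Z}} \sum_{n \in \mathbb{Z}} a_{m,n}
  = \sum_{n \in \mathbb{Z}} \sum_{m \in \mathbb{Z}} a_{m,n}
  = \sum_{(m,n) \in \mathbb{Z}^2} a_{m,n},
\]
% with every inner and outer series absolutely convergent.
% In the proof above one takes
%   a_{m,n} = \lambda(P_m((\nu_1(Q))(P_n((\nu_k(P))(u_1 \otimes \cdots \otimes u_k \otimes v))))).
```

The nontrivial input in the proof is therefore the absolute convergence of the double sum, which the text obtains from analyticity in P and Q.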
By this lemma, $\lambda \mapsto \lambda_{\tilde{Q}}$ for $\lambda \in \tilde{G}$ gives a right action of $\tilde{K}^c_{H_1}(1)$ on $\tilde{G}$. This right action induces a left action on $\tilde{G}^*$. It also induces right actions of $\tilde{K}^c_{H_1}(1)$ on $F_k$, k ≥ 0, as follows:
$$g_k(\lambda \otimes u_1 \otimes \cdots \otimes u_k \otimes v) \mapsto g_k(\lambda \otimes u_1 \otimes \cdots \otimes u_k \otimes v)^{\tilde{Q}} = g_k(\lambda_{\tilde{Q}} \otimes u_1 \otimes \cdots \otimes u_k \otimes v),$$
for $\tilde{Q} \in \tilde{K}^c_{H_1}(1)$, $\lambda \in \tilde{G}$, $u_1, \dots, u_k, v \in V$. These right actions on $F_k$, k ≥ 0, induce left actions on $F_k^*$. For simplicity, we shall also use $\tilde{Q}$ to denote the images of $\tilde{Q} \in \tilde{K}^c_{H_1}(1)$ in $\mathrm{End}\, \tilde{G}^*$ and $\mathrm{End}\, F_k^*$, k ≥ 0.

Proposition 2.3. For k ≥ 0, $\tilde{Q} \in \tilde{K}^c_{H_1}(1)$, $\mu \in F_k^*$, $u_1, \dots, u_k, v \in V$,
$$\tilde{Q} \cdot e_k(u_1 \otimes \cdots \otimes u_k \otimes v \otimes \mu) = e_k(u_1 \otimes \cdots \otimes u_k \otimes v \otimes \tilde{Q} \cdot \mu).$$

Proof. This follows from the definitions of $e_k$ and the left actions of $\tilde{K}^c_{H_1}(1)$ on $\tilde{G}^*$ and $F_k^*$.

By this proposition, we immediately obtain:

Corollary 2.4. For k ≥ 0, the actions of $\tilde{K}^c_{H_1}(1)$ on $\tilde{G}^*$ and $F_k^*$ induce an action of $\tilde{K}^c_{H_1}(1)$ on $G_k$ and thus an action on $H_k$. The actions of $\tilde{K}^c_{H_1}(1)$ on $H_k$ induce an action on H .

We shall still use $\tilde{Q}$ to denote the images of $\tilde{Q} \in \tilde{K}^c_{H_1}(1)$ in $\mathrm{End}\, H_k$, k ≥ 0, and End H . We have the following:

Proposition 2.5. Let $\tilde{Q} \in \tilde{K}^c_{H_1}(1)$. Then its images in $\mathrm{End}\, H_k$, k ≥ 0, and End H are continuous.

Proof. We need only prove the continuity of the images of $\tilde{Q}$ in $\mathrm{End}\, H_k$, k ≥ 0. Since the actions on $H_k$, k ≥ 0, are induced from the action on $\tilde{G}^*$, we need only show that the image of $\tilde{Q}$ in $\mathrm{End}\, \tilde{G}^*$ is continuous. This is equivalent to the continuity of the image of $\tilde{Q}$ in $\mathrm{End}\, \tilde{G}$. But from the definition (2.1), it is clear that the image of $\tilde{Q}$ in $\mathrm{End}\, \tilde{G}$ is continuous.

Combining Corollary 2.4 and Proposition 2.5, we obtain the following:

Theorem 2.6. The complete locally convex spaces $H_k$, k ≥ 0, and H have structures of continuous representations of $\tilde{K}^c_{H_1}(1)$ or of $\mathrm{Det}^{c/2}(1)$.

Note that in the constructions of $H_0$ and of the structure of a continuous representation of $\tilde{K}^c_{H_1}(1)$ on $H_0$, only the structure of a Z-graded representation of the Virasoro algebra on V and a certain lower-truncation condition of the representation are used. Thus we actually have the following:

Theorem 2.7. Let $V = \bigoplus_{n \in \mathbb{Z}} V_{(n)}$ be a Z-graded module for the Virasoro algebra satisfying the conditions: (i) L(0)v = nv for $v \in V_{(n)}$ and (ii) for any v ∈ V , the Z-graded submodule $W = \bigoplus_{n \in \mathbb{Z}} W_{(n)}$ for the Virasoro algebra generated by v is lower truncated, that is, $W_{(n)} = 0$ when n is sufficiently small. Then the same constructions as in Sect. 1 and in this section give a locally convex completion $H_0$ of V and a structure of a continuous representation of $\tilde{K}^c_{H_1}(1)$ or of $\mathrm{Det}^{c/2}(1)$ on $H_0$.
3. The Locally Convex Completion and the Vertex Operator Map Consider a conformal equivalence class of genus-zero Riemann surfaces with three ordered boundary components, the first positively oriented and the other two negatively oriented, and with analytic parametrizations at these boundary components. Such a conformal equivalence class can be naturally identified with an element of KH1 (2) (see [H3]). We shall denote the corresponding element in KH1 (2) by Q. Then a pair consisting of such a conformal equivalence class and an element of the 2c th power of the ˜ of K˜ c (2). determinant line over it corresponding to an element Q H1 In this section, we use the vertex operator map to construct continuous linear maps ˜ ∈ K˜ c (2). from the topological completion of H ⊗ H to H associated to Q H1 H be the locally convex completion of the vector space tensor product H ⊗H . Let H ⊗ We would like to construct a continuous linear map ˜ : H⊗ H → H Y (Q) ˜ : ˜ such that restricting to V ⊗ V , it is equal to the linear map Y (Q) associated to Q c ˜ V ⊗ V → V constructed in [H3]. Because KH1 (2) is infinite-dimensional, our construction here is more complicated than the one in [H4]. Nevertheless, the idea and the steps are mostly the same. Because of this, we shall be brief in our arguments below. Given any Q ∈ K(2), let Q be the element of K(2) obtained by switching the negatively oriented and the second positively oriented punctures of Q. Thus we obtain a bijective map from K(2) to itself. Since the line bundle K˜ c (2) is canonically trivial, this map can be extended to a bijective map from K˜ c (2) to itself. It is clear that this c (2) to itself. map maps K˜ H 1 ˜ ∈ K˜ c (2). For any λ ∈ G ˜ and u ∈ V , we define an element u ˜ λ ∈ V ∗ We now fix Q Q H1 by ˜ ))(u ⊗ v)) (u Q˜ λ)(v) = λ(Pn ((2 (Q n∈Z
for v ∈ V . ˜ Proposition 3.1. The element u Q˜ λ is in G. ˜ = (Q; C). For any k ≥ 0, u1 , . . . , uk , v ∈ V , P ∈ k , Proof. We write Q (u Q˜ λ)(Pm (Q(u1 , . . . , uk , v; P ))) m∈Z
=
˜ ))(u ⊗ Pm (Q(u1 , . . . , uk , v; P )))) λ(Pn ((2 (Q
m∈Z n∈Z
=C
λ(Pn ((ν2 (Q ))(u ⊗ Pm (Q(u1 , . . . , uk , v; P ))))
m∈Z n∈Z
=C
λ(Pn ((ν2 (Q ))(u ⊗ Pm (Q(u1 , . . . , uk , v; P ))))
m∈Z n∈Z
=C
λ(Pn (Q(u, Pm (Q(u1 , . . . , uk , v; P )); Q ))).
m∈Z n∈Z
We need to prove that the right-hand side of (3.1) is absolutely convergent.
(3.1)
As in [H4], we consider the iterated sum in the other order λ(Pn (Q(u, Pm (Q(u1 , . . . , uk , v; P )); Q ))) C n∈Z m∈Z
˜ Moreover it is which is convergent by using the sewing axiom and the fact that λ ∈ G. clear that this iterated sum is the expansion of an analytic function in two variables evaluated at a certain particular point. Thus the double sum must be absolutely convergent and consequently the right-hand side of (3.1) is absolutely convergent. ˜ ⊗ X l+1 ⊗ F ∗ → V ∗ by For any l ≥ 0, we define a linear map αl : G l (αl (λ ⊗ v1 ⊗ · · · ⊗ vl ⊗ v ⊗ µ))(u) = u Q˜ λ, el (v1 ⊗ · · · ⊗ vl ⊗ v ⊗ µ) ˜ v1 , . . . , vl , v ∈ X, µ ∈ F ∗ and u ∈ V . for λ ∈ G, l ˜ Proposition 3.2. The image of αl is in G. ˜ u1 , . . . , uk , u ∈ V , P ∈ k , v1 , . . . , vl , v ∈ X and Proof. For any k ≥ 0, λ ∈ G, ∗ µ ∈ Fl , (αl (λ ⊗ v1 ⊗ · · · ⊗ vl ⊗ v ⊗ µ))(Pn (Q(u1 , . . . , uk , u; P ))) n∈Z
=
(Pn (Q(u1 , . . . , uk , u; P )) Q˜ λ), el (v1 ⊗ · · · ⊗ vl ⊗ v ⊗ µ)
n∈Z
=
µ(gl ((Pn (Q(u1 , . . . , uk , u; P )) Q˜ λ) ⊗ v1 ⊗ · · · ⊗ ⊗vl ⊗ v))
n∈Z
= µ (Pn (Q(u1 , . . . , uk , u; P )) Q˜ λ)(Pm (Q(v1 , . . . , vk , v; ·))) n∈Z
=
m∈Z
˜ )) µ(λ(Pp ((2 (Q
n∈Z m∈Z p∈Z
((Pn (Q(u1 , . . . , uk , u; P )) ⊗ Pm (Q(v1 , . . . , vk , v; ·)))) ˜ )) =C µ(λ(Pp ((ν2 (Q n∈Z m∈Z p∈Z
((Pn (Q(u1 , . . . , uk , u; P )) ⊗ Pm (Q(v1 , . . . , vk , v; ·)))).
(3.2)
We need only to show that the right-hand side of (3.2) is absolutely convergent. The proof is similar to the proof in Proposition 3.1 above: We first show that one of the iterated sums in other orders is absolutely convergent and is convergent to an analytic function in Q. Then this function can be expanded as series and the series is triply absolutely convergent. In particular, the iterated sum in the right-hand side of (3.2) is absolutely convergent and is equal to this triple sum. By Proposition 3.2, ˜ αl (λ ⊗ v1 ⊗ · · · ⊗ vl ⊗ v ⊗ µ)(Pn (Q(u1 , . . . , uk , u; Q))) n∈Z
is absolutely convergent and equal to gk (αl (λ ⊗ v1 ⊗ · · · ⊗ vl ⊗ v ⊗ µ) ⊗ u1 ⊗ · · · ⊗ uk ⊗ u) ∈ Fk . We define a linear map
∗ βk,l : Fk∗ ⊗ Fl∗ → Fk+l+1
by (βk,l (µ1 , µ2 ))(gk+l+1 (λ ⊗ u1 ⊗ · · · ⊗ uk+1 ⊗ u ⊗ v1 ⊗ · · · ⊗ vl ⊗ v)) = µ1 (gk (αl (λ ⊗ v1 ⊗ · · · ⊗ vl ⊗ v ⊗ µ2 ) ⊗ u1 ⊗ · · · ⊗ uk ⊗ u)) ˜ u1 , . . . , uk , u, v1 , . . . , vl , v ∈ V , µ1 ∈ F ∗ and µ2 ∈ F ∗ . In fact this formula for λ ∈ G, k l only gives a linear map from Fk∗ ⊗ Fl∗ to the algebraic dual of Fk+l+1 . The proof of the following result is completely analogous to Proposition 2.3 in [H4]: ∗ Proposition 3.3. The image of the map βk,l is indeed in Fk+l+1 and the map βk,l is continuous.
Let and
h1 = ek (u1 ⊗ · · · ⊗ uk ⊗ u ⊗ µ1 ) ∈ Gk h2 = el (v1 ⊗ · · · ⊗ vl ⊗ v ⊗ µ2 ) ∈ Gl ,
where u1 , . . . , uk , u, v1 , . . . , vl , v ∈ X, µ1 ∈ Fk∗ and µ2 ∈ Fl∗ . We define ˜ ( Y (Q))(h 1 ⊗ h2 ) = ek+l+1 (u1 ⊗ · · · ⊗ uk ⊗ u ⊗ v1 ⊗ · · · ⊗ vl ⊗ v ⊗ βk,l (µ1 , µ2 )). Note that any element of Gk or Gl is a linear combination of elements of the form h1 or h2 , respectively, given above, and that k and l are arbitrary. Thus we obtain a linear map ˜ |G⊗G : G ⊗ G → G. Y (Q) The proof of the following result is completely analogous to the proof of Proposition 2.4 in [H4]: ˜ |G⊗G is continuous. Proposition 3.4. The map Y (Q) ˜ |G⊗G to a linear map Y (Q) ˜ from Since G is dense in H , we can extend Y (Q) H to H . The proof of the following theorem is completely analogous to the proof H⊗ of Theorem 2.5 in [H4]: ˜ is a continuous extension of Y (Q) ˜ to H ⊗ H. That is, Theorem 3.5. The map Y (Q) ˜ is continuous and Y (Q) ˜ |V ⊗V = Y (Q). ˜ Y (Q) 4. Locally Convex Completions, Operads and Double Loop Spaces In this section, we reformulate the result obtained in [H4] and in Sects. 2 and 3 above using the language of operads.
First, the result in Sect. 2 of [H4] immediately gives the following:

Theorem 4.1. Let V be a finitely-generated Z-graded vertex algebra. Then the topological completion H of V constructed in [H4] has a structure of an algebra over the framed little disk operad such that for the unit disk with two embedded disks of radius $r_1$ and $r_2$ centered at 0 and z, respectively, the corresponding map from $H \tilde{\otimes} H$ to H is the map νY ([D(z, r1 , r2 )]). (See [H4] for the notation νY and [D(z, r1 , r2 )].)

Proof. The framed little disk operad is generated by the unit disk with two embedded disks of radius $r_1$ and $r_2$ centered at 0 and z and the unit disk with the unit disk itself embedded and with the frames given by complex numbers a of absolute value equal to 1. So we need only define the maps corresponding to these elements of the operad. For the unit disk with two embedded disks of radius $r_1$ and $r_2$ centered at 0 and z, we define the associated map to be νY ([D(z, r1 , r2 )]). For the unit disk with the unit disk itself embedded and with the frames given by complex numbers a of absolute value equal to 1, we define the associated map to be $a^{L(0)} : H \to H$. Then we get a structure of an algebra on H over the framed little disk operad.

Next, combining the results of [H3] and the results in Sects. 2 and 3 above, we obtain the following result:

Theorem 4.2. Let V be a finitely-generated vertex operator algebra. Then the topological completion H of V constructed above has a structure of an algebra over the operad $\tilde{K}^c_{H_1}$ or, equivalently, of $\mathrm{Det}^{c/2}$.

Corollary 4.3. Let V be a finitely-generated Z-graded vertex algebra or a finitely-generated vertex operator algebra. Then the locally convex completion H of V constructed in Part I ([H4]) or in Sect. 1 above has a structure of a space over the framed little disk operad. In particular, it has a structure of a space over the little disk operad.

Proof. Since we have a natural continuous map from H × H to $H \tilde{\otimes} H$, we see from Theorem 4.1 that when V is a finitely-generated Z-graded vertex algebra, its locally convex completion constructed in Part I has a structure of a space over the framed little disk operad. If V is a finitely-generated vertex operator algebra, note that the framed little disk operad can in fact be viewed as a suboperad of $K_{H_1}$. Also note that the sewing of the determinant lines over elements in the framed little disk operad is trivial (see Appendix D of [H3]). Thus H has a structure of an algebra over the framed little disk operad and consequently has a structure of a space over the little disk operad.

A subspace of a Hausdorff space is said to be compactly closed if the intersection of the subspace with each compact subset of the Hausdorff space is closed. A Hausdorff space is said to be compactly generated if every compactly closed subspace is closed. See [St] (and [W and M2]) for the notion of compactly generated topological space and properties of these spaces. In [M1], May proved, among other things, the following recognition principle for double loop spaces:

Theorem 4.4. If a compactly generated Hausdorff based topological space has a structure of a space over the little disk operad, then it has the weak homotopy type of a double loop space.
A Functional-Analytic Theory of Vertex (Operator) Algebras, II
From [St] (see also [W] and [M2]), we know that we can make a Hausdorff space into a compactly generated Hausdorff space by giving it a new topology in which a subspace is closed if and only if it is compactly closed in the original topology. Since this functor is usually denoted by k, here we call the space with the new compactly generated topology the k-ification of the original space. Note that in the category of compactly generated spaces, the product of spaces is defined to be the k-ification of the usual product (see [St], [W] and [M2]). The following lemma follows immediately from the properties of k-ifications of topological spaces:

Lemma 4.5. If a Hausdorff based topological space is a space over the little disk operad (with the usual products of topological spaces), then the k-ification of the space has a natural structure of a space over the little disk operad (with the products of compactly generated spaces).

Combining Corollary 4.3 with Theorem 4.4 and Lemma 4.5, we obtain:

Theorem 4.6. The k-ifications of the locally convex completions constructed in [H4] and in Sect. 1 above have weak homotopy types of double loop spaces.

5. Locally-Grading-Restricted Conformal Vertex Algebras and Topological Completions

The results in the present paper are true also for algebras which do not satisfy the (global) grading-restriction conditions. We first need the following:

Definition 5.1. A conformal vertex algebra of central charge c is a Z-graded vertex algebra equipped with a Virasoro element ω satisfying all the axioms for vertex operator algebras of central charge c except the two grading-restriction axioms. A conformal vertex algebra is said to be locally grading-restricted if (i) for any n > 0 and v1, . . . , vn ∈ V, there exists r ∈ Z such that the coefficients of the series Y(v1, x1) · · · Y(v_{n−1}, x_{n−1}) v_n are in ⊕_{n>r} V_(n), and (ii) for any element of the conformal vertex algebra, the module W = ⊕_{n∈Z} W_(n) for the Virasoro algebra generated by this element satisfies the grading-restriction conditions, that is, dim W_(n) < ∞ for n ∈ Z and W_(n) = 0 for n sufficiently small.

Remark 5.2. In fact, it is not difficult to show that the condition dim W_(n) < ∞ in the definition above can be derived as a consequence. Thus for concrete examples, one need only verify the lower-truncation condition W_(n) = 0 for n sufficiently small.

We have the following:

Theorem 5.3. The constructions and results in [H4] and in Sects. 1, 2, 3 and 4 above hold for finitely-generated locally-grading-restricted conformal vertex algebras.

Proof. Note that the constructions and results in [H4] and in Sects. 1, 2, 3 and 4 above need only the locally-grading-restriction conditions: all the properties of vertex operator algebras used, for example, commutativity, associativity, rationality and the factorization of exponentials of infinite sums of Virasoro operators, still hold if the locally-grading-restriction conditions are satisfied. The details are left to the reader as an exercise.

Remark 5.4. Theorem 5.3 has been used in [HZ].
Y.-Z. Huang
6. A Locally Convex Completion of a Finitely-Generated Module and Operads

We give the results for modules in this section. Since the constructions and proofs are all similar to the case of algebras, we shall only state the final results. All the constructions and proofs are left to the reader as exercises.

Theorem 6.1. Let V be a finitely-generated vertex operator algebra of central charge c, H its locally convex topological completion constructed in Sect. 1 and W a finitely-generated V-module. Then constructions completely analogous to those in Sects. 1, 2, 3 and 4 above give a locally convex topological completion H^W of W and a structure of a module for the algebra H over the operad K̃^c_{H_1} (or equivalently of Det^{c/2}) on H^W.

Acknowledgement. I am grateful to J. Peter May and Nick Kuhn for discussions on compactly generated spaces and the recognition principle for double loop spaces. This research is supported in part by NSF grant DMS-0070800.
References
[FHL] Frenkel, I.B., Huang, Y.-Z., Lepowsky, J.: On axiomatic approaches to vertex operator algebras and modules. Preprint, 1989; Memoirs Am. Math. Soc. 104 (1993)
[FLM] Frenkel, I.B., Lepowsky, J., Meurman, A.: Vertex operator algebras and the Monster. Pure and Appl. Math. 134, New York: Academic Press, 1988
[H1] Huang, Y.-Z.: On the geometric interpretation of vertex operator algebras. Ph.D. thesis, Rutgers University, 1990
[H2] Huang, Y.-Z.: Geometric interpretation of vertex operator algebras. Proc. Natl. Acad. Sci. USA 88, 9964–9968 (1991)
[H3] Huang, Y.-Z.: Two-dimensional conformal geometry and vertex operator algebras. Progress in Mathematics, Vol. 148, Boston: Birkhäuser, 1997
[H4] Huang, Y.-Z.: A functional-analytic theory of vertex (operator) algebras, I. Commun. Math. Phys. 204, 61–84 (1999)
[HL1] Huang, Y.-Z., Lepowsky, J.: Operadic formulation of the notion of vertex operator algebra. In: Mathematical Aspects of Conformal and Topological Field Theories and Quantum Groups, Proc. Joint Summer Research Conference, Mount Holyoke, 1992, P. Sally, M. Flato, J. Lepowsky, N. Reshetikhin, G. Zuckerman (eds.), Contemporary Math., Vol. 175, Providence, RI: Am. Math. Soc., 1994, pp. 131–148
[HL2] Huang, Y.-Z., Lepowsky, J.: Vertex operator algebras and operads. In: The Gelfand Mathematical Seminars, 1990–1992, L. Corwin, I. Gelfand, J. Lepowsky (eds.), Boston: Birkhäuser, 1993, pp. 145–161
[HZ] Huang, Y.-Z., Zhao, W.: Semi-infinite forms and topological vertex operator algebras. Commun. Contemp. Math. 2, 191–241 (2000)
[KSV] Kimura, T., Stasheff, J., Voronov, A.A.: On operad structures of moduli spaces and string theory. Commun. Math. Phys. 171, 1–25 (1995)
[KVZ] Kimura, T., Voronov, A.A., Zuckerman, G.J.: Homotopy Gerstenhaber algebras and topological field theory. In: Operads: Proceedings of Renaissance Conferences, J.-L. Loday, J. Stasheff, A.A. Voronov (eds.), Contemporary Math. 202, Providence, RI: Am. Math. Soc., 1997, pp. 305–333
[KM] Kriz, I., May, J.P.: Operads, algebras, modules and motives. Astérisque, No. 233, Marseille: Soc. Math. France, 1995
[M1] May, J.P.: The geometry of iterated loop spaces. Lect. Notes Math. 271. Berlin: Springer-Verlag, 1972
[M2] May, J.P.: A concise course in algebraic topology. Chicago Lectures in Mathematics, Chicago, IL: The University of Chicago Press, 1999
[Se1] Segal, G.B.: The definition of conformal field theory. Preprint, 1988
[Se2] Segal, G.B.: Two-dimensional conformal field theories and modular functors. In: Proceedings of the IXth International Congress on Mathematical Physics, Swansea, 1988, Bristol: Hilger, 1989, pp. 22–37
[St] Steenrod, N.E.: A convenient category of topological spaces. Mich. Math. J. 14, 133–152 (1967)
[W] Whitehead, G.W.: Elements of homotopy theory. New York: Springer, 1978

Communicated by L. Takhtajan
Commun. Math. Phys. 242, 445–472 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0952-z
Communications in
Mathematical Physics
Ferromagnetism in the Hubbard Model: A Constructive Approach Hal Tasaki Department of Physics, Gakushuin University, Tokyo 171-8588, Japan. E-mail: [email protected] Received: 9 January 2003 / Accepted: 9 May 2003 Published online: 10 October 2003 – © Springer-Verlag 2003
Abstract: It is believed that strong ferromagnetic orders in some solids are generated by subtle interplay between quantum many-body effects and spin-independent Coulomb interactions between electrons. Here we describe our rigorous and constructive approach to ferromagnetism in the Hubbard model, which is a standard idealized model for strongly interacting electrons in a solid. We introduce a class of Hubbard models in any dimensions which are nonsingular in the sense that both the Coulomb interaction and the density of states (at the Fermi level) are finite. We then prove that the ground states of the models exhibit saturated ferromagnetism, i.e., have maximum total spins. Combined with our earlier results, the present work provides nonsingular models of itinerant electrons with only spin-independent interactions where low energy behaviors are proved to be that of a “healthy” ferromagnetic insulator.

Contents
1. Introduction
2. Definition of the Hubbard Model
   2.1 Basic operators
   2.2 General Hamiltonian
3. Rigorous Results About Ferromagnetism in the Hubbard Model
   3.1 Saturated ferromagnetism in the ground states
   3.2 Ferromagnetism of Nagaoka and Thouless
   3.3 Lieb’s ferrimagnetism and flat-band ferromagnetism
   3.4 Beyond flat-band ferromagnetism
4. Ferromagnetism in Typical d-Dimensional Nearly-Flat-Band Models
5. The Model and Main Results
   5.1 Construction of the lattice
   5.2 Fermion operators
   5.3 Definition of the model and the main theorem
   5.4 “Band” structure in the single-electron problem
6. Proof
   6.1 Proof of the main theorem
   6.2 Some extensions
   6.3 Proof of Lemma 6.1
      6.3.1 The limit t, U → ∞
      6.3.2 The case ν = 0
      6.3.3 Non-limiting cases
1. Introduction

The origin of strong ferromagnetic order observed in some solids has long been a mystery in physical science. After Heisenberg [1], it became clear that the ultimate origin of ferromagnetism lies in a subtle interplay between quantum many-body effects and strong Coulomb interaction between electrons. The problem of providing convincing derivations of ferromagnetism in concrete models of many electrons, however, remained unsolved (even on a heuristic level) for a long time. The problem is difficult because neither quantum many-body effects nor the Coulomb interaction alone favors ferromagnetism (or any magnetic ordering). One must deal with an interplay of both factors. Moreover, the intrinsically nonperturbative nature of the phenomenon makes the problem almost impossible to attack within conventional perturbative methods in condensed matter physics.

A generic many-electron system without interactions is known to have a paramagnetic ground state, a phenomenon known as Pauli paramagnetism. In order to destabilize Pauli paramagnetism and stabilize ferromagnetism, one must have a sufficiently large interaction. For example, a heuristic argument due to Stoner implies the criterion that U D_F ≳ 1 is necessary to stabilize ferromagnetism, where U is the on-site Coulomb interaction and D_F is the density of states at the Fermi level (see footnote 1). This is the well-known “competition” between quantum dynamics and Coulomb interaction.

In the present paper, we describe our constructive and mathematically rigorous approach to the origin of ferromagnetism. This is a continuation of the series of works [2–5], and the main result of the present paper was described in [6] for a special one-dimensional model. Here we present a class of Hubbard models in any dimensions with a finite density of states (at the Fermi level) and finite interactions, and prove that their ground states are ferromagnetic.
Combined with our earlier work [4, 5], this provides a class of nonsingular models of itinerant electrons (with only spin-independent interactions) in which low energy behaviors (i.e., the nature of the ground states and the low-lying excitations) are rigorously proved to be those expected in ferromagnetic insulators. We hope that the present work becomes a starting point of further investigations of deep interplay between quantum dynamics and nonlinear interactions in strongly interacting quantum many-body systems. The present paper is written in a nearly self-contained manner. In Sect. 2, we give the definition of the Hubbard model. In Sect. 3, we briefly review rigorous results about ferromagnetism in the Hubbard model, and motivate the present paper. In Sect. 4, we summarize, in a typical class of models, the main results of our constructive program in our present and previous works. The reader who is interested in the new physical results is invited to start from this section. In Sect. 5, which is the main section of the paper,
Footnote 1: This is only a heuristic criterion, and there are many counterexamples.
we define our models in the most general setting and state our conclusions precisely. Finally Sect. 6 is devoted to the proof of the main theorem.

2. Definition of the Hubbard Model

The Hubbard model is a standard simple model of interacting itinerant electrons in a solid. Although this model is too idealized to be regarded as a quantitatively reliable model of real solids, it contains physically essential features of interacting itinerant electron systems. It is expected to exhibit various phenomena including antiferromagnetism, ferromagnetism, ferrimagnetism, superconductivity, and metal-insulator transition. Some (but not all) of these phenomena have been treated rigorously in some cases [7]. In the present section, we define the Hubbard model in the general setting, and fix our notation. We leave details and background to more careful reviews (such as [7–9]) and present only the minimum necessary ingredients.

2.1. Basic operators. Let a lattice Λ be a finite set whose elements r, s, · · · ∈ Λ are called sites. A site represents an atomic orbit in a solid. For each r ∈ Λ and σ = ↑, ↓, we define the creation and the annihilation operators c†_{r,σ} and c_{r,σ} for an electron at site r with spin σ. These operators satisfy the canonical anticommutation relations

$$\{c^\dagger_{r,\sigma}, c_{s,\tau}\} = \delta_{r,s}\,\delta_{\sigma,\tau}, \tag{2.1}$$

and

$$\{c^\dagger_{r,\sigma}, c^\dagger_{s,\tau}\} = \{c_{r,\sigma}, c_{s,\tau}\} = 0, \tag{2.2}$$

for any r, s ∈ Λ and σ, τ = ↑, ↓, where {A, B} = AB + BA. The number operator is defined by

$$n_{r,\sigma} = c^\dagger_{r,\sigma}\, c_{r,\sigma}, \tag{2.3}$$

which has eigenvalues 0 and 1.

The Hilbert space of the model is constructed as follows. Let Φ_vac be a normalized vector state which satisfies c_{r,σ} Φ_vac = 0 for any r ∈ Λ and σ = ↑, ↓. Physically Φ_vac corresponds to a state where there are no electrons in the system. Then for arbitrary subsets Λ↑, Λ↓ ⊂ Λ, we define a state (see footnote 2)

$$\Bigl(\prod_{r\in\Lambda_\uparrow} c^\dagger_{r,\uparrow}\Bigr)\Bigl(\prod_{r\in\Lambda_\downarrow} c^\dagger_{r,\downarrow}\Bigr)\Phi_{\mathrm{vac}}, \tag{2.4}$$

in which sites in Λ↑ are occupied by up-spin electrons and sites in Λ↓ by down-spin electrons. The Hilbert space for the system with N_e electrons is spanned by the basis states (2.4) with all subsets Λ↑ and Λ↓ such that (see footnote 3) |Λ↑| + |Λ↓| = N_e.

Footnote 2: Throughout the present paper, we assume that the sites in the lattice Λ are ordered (in an arbitrary but fixed manner), and products of fermion operators respect the ordering.
Footnote 3: Throughout the present paper |S| denotes the number of elements in a set S.
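The relations (2.1)–(2.3) can be checked mechanically on a small system. The following is a minimal sketch, not part of the original construction: it assumes a hypothetical encoding in which each pair (site, spin) is numbered as a "mode", a basis state of the form (2.4) is stored as a bitmask of occupied modes, and the fixed ordering of fermion operators is taken to be increasing mode index. The helper names (`parity_sign`, `create`, `annihilate`, `check_car`) are illustrative only.

```python
from itertools import product

def parity_sign(bits, mode):
    """Return (-1)**(number of occupied modes with index below `mode`)."""
    return -1 if bin(bits & ((1 << mode) - 1)).count("1") % 2 else 1

def create(mode, state):
    """Apply the creation operator of `mode` to a state {bitmask: amplitude}."""
    out = {}
    for bits, amp in state.items():
        if not (bits >> mode) & 1:  # creation on an occupied mode gives zero
            new = bits | (1 << mode)
            out[new] = out.get(new, 0) + parity_sign(bits, mode) * amp
    return out

def annihilate(mode, state):
    """Apply the annihilation operator of `mode` to a state {bitmask: amplitude}."""
    out = {}
    for bits, amp in state.items():
        if (bits >> mode) & 1:  # annihilation on an empty mode gives zero
            new = bits & ~(1 << mode)
            out[new] = out.get(new, 0) + parity_sign(bits, mode) * amp
    return out

def add(a, b):
    """Sum of two states, dropping entries whose amplitude cancels to zero."""
    out = dict(a)
    for bits, amp in b.items():
        out[bits] = out.get(bits, 0) + amp
    return {bits: amp for bits, amp in out.items() if amp != 0}

def check_car(n_modes):
    """Verify (2.1) and (2.2) on every basis state of `n_modes` fermion modes."""
    for bits in range(1 << n_modes):
        phi = {bits: 1}
        for a, b in product(range(n_modes), repeat=2):
            # {c_a^dagger, c_b} must act as delta_{a,b} times the identity
            mixed = add(create(a, annihilate(b, phi)), annihilate(b, create(a, phi)))
            if mixed != (phi if a == b else {}):
                return False
            # {c_a^dagger, c_b^dagger} = {c_a, c_b} = 0
            if add(create(a, create(b, phi)), create(b, create(a, phi))):
                return False
            if add(annihilate(a, annihilate(b, phi)), annihilate(b, annihilate(a, phi))):
                return False
    return True

def number_eigenvalue(mode, bits):
    """Eigenvalue of the number operator (2.3) on the basis state `bits`."""
    return create(mode, annihilate(mode, {bits: 1})).get(bits, 0)

if __name__ == "__main__":
    print(check_car(4))  # two sites, two spin orientations
```

Because all amplitudes are integers, the comparison is exact; the sign factor in `parity_sign` is what encodes the operator ordering mentioned in footnote 2.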
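We finally define total spin operators Ŝ_tot = (Ŝ^(1)_tot, Ŝ^(2)_tot, Ŝ^(3)_tot) by

$$\hat{S}^{(\alpha)}_{\mathrm{tot}} = \frac{1}{2}\sum_{r\in\Lambda}\ \sum_{\sigma,\tau=\uparrow,\downarrow} c^\dagger_{r,\sigma}\,(p^{(\alpha)})_{\sigma,\tau}\, c_{r,\tau}, \tag{2.5}$$

for α = 1, 2, and 3. Here p^(α) are the Pauli matrices defined by

$$p^{(1)} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},\quad p^{(2)} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix},\quad p^{(3)} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{2.6}$$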
The operators Ŝ_tot are the generators of SU(2) rotations of the total spin angular momentum of the system. As usual we denote the eigenvalue of (Ŝ_tot)² as S_tot(S_tot + 1). The maximum possible value of S_tot is N_e/2 when N_e ≤ |Λ|.

2.2. General Hamiltonian. The model is characterized by the hopping amplitudes t_{r,s} = t_{s,r} ∈ R defined for all r, s ∈ Λ, and the magnitude U > 0 of the on-site Coulomb interaction. Physically, t_{r,s} represents the quantum mechanical amplitude for an electron to hop from the site s to site r when s ≠ r, and the on-site potential when r = s. Usually t_{r,s} is non-negligible only when the two sites r and s are close to each other. We then define the general Hubbard Hamiltonian as

$$H = \sum_{r,s\in\Lambda}\ \sum_{\sigma=\uparrow,\downarrow} t_{r,s}\, c^\dagger_{r,\sigma}\, c_{s,\sigma} + U \sum_{r\in\Lambda} n_{r,\uparrow}\, n_{r,\downarrow}. \tag{2.7}$$
Here the first term describes the quantum mechanical motion of electrons which hop around the lattice according to the amplitude tr,s . The second term represents nonlinear interactions between electrons. There is an increase in energy by U > 0 for each doubly occupied site, i.e., a site which is occupied by both an up-spin electron and a down-spin electron. This is a highly idealized treatment of the Coulomb interaction between electrons. The Hamiltonian which consists only of the first term in (2.7) describes the free tight-binding electron model. It is not very difficult to analyze this model especially when the hopping amplitude tr,s has a translation invariance. The Hamiltonian can be diagonalized in the states in which electrons behave as “waves.” The Hamiltonian which consists only of the second term in (2.7) is also easy to study. The Hamiltonian is already diagonalized in the basis states (2.4), in which electrons behave as “particles.” When both the first and the second terms in (2.7) are present, a “competition” between the wave-like nature and the particle-like nature of electrons takes place. The competition generates rich nontrivial phenomena including ferromagnetism. To investigate these phenomena is a main motivation in the study of the Hubbard model. 3. Rigorous Results About Ferromagnetism in the Hubbard Model In the present section, we formulate the problem of saturated ferromagnetism in the Hubbard model. We then give a brief review of the rigorous results about ferromagnetism in the Hubbard model, and explain the background of the present work. For more careful reviews, see [8, 9].
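The Hamiltonian (2.7) and the spin operators (2.5) can be written down concretely for a very small lattice. The sketch below is an illustration under stated assumptions, not the paper's method: states are stored as dictionaries from occupation bitmasks to amplitudes, the mode numbering `2 * site + spin` (spin 0 for ↑, 1 for ↓) is a hypothetical convention, and the function names are made up for this example. It applies H to basis states and checks that H commutes with the lowering operator Ŝ⁻ = Ŝ^(1) − iŜ^(2) and with 2Ŝ^(3), using integer hopping amplitudes so that the check is exact.

```python
def parity_sign(bits, mode):
    """Return (-1)**(number of occupied modes with index below `mode`)."""
    return -1 if bin(bits & ((1 << mode) - 1)).count("1") % 2 else 1

def create(mode, state):
    out = {}
    for bits, amp in state.items():
        if not (bits >> mode) & 1:
            out[bits | (1 << mode)] = parity_sign(bits, mode) * amp
    return out

def annihilate(mode, state):
    out = {}
    for bits, amp in state.items():
        if (bits >> mode) & 1:
            out[bits & ~(1 << mode)] = parity_sign(bits, mode) * amp
    return out

def add(a, b, scale=1):
    """Return a + scale * b, dropping entries that cancel to zero."""
    out = dict(a)
    for bits, amp in b.items():
        out[bits] = out.get(bits, 0) + scale * amp
    return {bits: amp for bits, amp in out.items() if amp != 0}

def mode(site, spin):
    """Assumed mode numbering: spin 0 is up, spin 1 is down."""
    return 2 * site + spin

def hubbard(state, t, U):
    """Apply the Hamiltonian (2.7) with hopping matrix t and interaction U."""
    n = len(t)
    out = {}
    for r in range(n):
        for s in range(n):
            for spin in (0, 1):
                hop = create(mode(r, spin), annihilate(mode(s, spin), state))
                out = add(out, hop, t[r][s])
    for bits, amp in state.items():
        # count sites occupied by both an up-spin and a down-spin electron
        doubly = sum((bits >> mode(r, 0)) & (bits >> mode(r, 1)) & 1 for r in range(n))
        out = add(out, {bits: amp}, U * doubly)
    return out

def spin_lower(state, n):
    """Apply S^- = sum_r c^dagger_{r,down} c_{r,up}."""
    out = {}
    for r in range(n):
        out = add(out, create(mode(r, 1), annihilate(mode(r, 0), state)))
    return out

def twice_s3(state, n):
    """Apply 2 S^(3) = sum_r (n_{r,up} - n_{r,down})."""
    out = {}
    for bits, amp in state.items():
        diff = sum(((bits >> mode(r, 0)) & 1) - ((bits >> mode(r, 1)) & 1) for r in range(n))
        if diff:
            out[bits] = diff * amp
    return out

def commutes_with_spin(t, U):
    """Check [H, S^-] = [H, 2 S^(3)] = 0 on every basis state."""
    n = len(t)
    for bits in range(1 << (2 * n)):
        phi = {bits: 1}
        for op in (spin_lower, twice_s3):
            if add(hubbard(op(phi, n), t, U), op(hubbard(phi, t, U), n), -1):
                return False
    return True

if __name__ == "__main__":
    t = [[2, -1, 3], [-1, 0, 5], [3, 5, -4]]  # arbitrary symmetric integer amplitudes
    print(commutes_with_spin(t, 7))
```

With the hopping matrix set to zero, `hubbard` simply multiplies each basis state by U times its number of doubly occupied sites, which is the "particle" picture described above.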
3.1. Saturated ferromagnetism in the ground states. It is easily shown that the Hamiltonian (2.7) commutes with the total spin operators Ŝ^(α)_tot. Therefore one can look for simultaneous eigenstates of H and (Ŝ_tot)². When all the ground states of the Hamiltonian H (with a fixed electron number N_e ≤ |Λ|) are eigenstates of (Ŝ_tot)² with S_tot = N_e/2, we say that the model exhibits saturated ferromagnetism. This is the strongest form of ferromagnetism since N_e/2 is the maximum possible value for S_tot.

3.2. Ferromagnetism of Nagaoka and Thouless. The first rigorous and nontrivial result about saturated ferromagnetism in the Hubbard model is due to Nagaoka [10] and to Thouless [11]. It was proved that the Hubbard model on a class of lattices (which includes most of the standard lattices in two and three dimensions) with t_{r,s} ≥ 0 exhibits saturated ferromagnetism when N_e = |Λ| − 1 and U = ∞. In other words the model is not allowed to have any doubly occupied sites, and there is only one site without an electron. The ferromagnetism of Nagaoka and Thouless is quite important since it showed for the first time that the Hubbard model can generate ferromagnetism through nontrivial interplay between quantum dynamics and Coulomb interaction. Subsequent studies, however, have suggested that their mechanism of ferromagnetism is restricted to a special situation with infinite U and a single hole. See Sect. 4 of [8] for a modern proof and further discussions.
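To make the quantum number S_tot used in the definition of saturated ferromagnetism (Sect. 3.1) concrete, the following sketch evaluates (Ŝ_tot)² on a few simple states. It is only an illustration under assumptions: the bitmask encoding, the mode numbering `2 * site + spin`, and all function names are hypothetical. It uses the standard identity (Ŝ_tot)² = Ŝ⁻Ŝ⁺ + (Ŝ^(3))² + Ŝ^(3) and works with 4(Ŝ_tot)² so that all amplitudes stay integers.

```python
def parity_sign(bits, mode):
    """Return (-1)**(number of occupied modes with index below `mode`)."""
    return -1 if bin(bits & ((1 << mode) - 1)).count("1") % 2 else 1

def create(mode, state):
    out = {}
    for bits, amp in state.items():
        if not (bits >> mode) & 1:
            out[bits | (1 << mode)] = parity_sign(bits, mode) * amp
    return out

def annihilate(mode, state):
    out = {}
    for bits, amp in state.items():
        if (bits >> mode) & 1:
            out[bits & ~(1 << mode)] = parity_sign(bits, mode) * amp
    return out

def add(a, b, scale=1):
    """Return a + scale * b, dropping entries that cancel to zero."""
    out = dict(a)
    for bits, amp in b.items():
        out[bits] = out.get(bits, 0) + scale * amp
    return {bits: amp for bits, amp in out.items() if amp != 0}

def mode(site, spin):
    """Assumed mode numbering: spin 0 is up, spin 1 is down."""
    return 2 * site + spin

def spin_flip(state, n, to_spin):
    """Apply S^+ (to_spin=0) or S^- (to_spin=1), summed over all n sites."""
    out = {}
    for r in range(n):
        out = add(out, create(mode(r, to_spin), annihilate(mode(r, 1 - to_spin), state)))
    return out

def twice_s3(state, n):
    """Apply 2 S^(3) = sum_r (n_{r,up} - n_{r,down})."""
    out = {}
    for bits, amp in state.items():
        diff = sum(((bits >> mode(r, 0)) & 1) - ((bits >> mode(r, 1)) & 1) for r in range(n))
        if diff:
            out[bits] = diff * amp
    return out

def four_s_squared(state, n):
    """Apply 4 (S_tot)^2 = 4 S^- S^+ + (2 S^(3))^2 + 2 (2 S^(3))."""
    out = add({}, spin_flip(spin_flip(state, n, 0), n, 1), 4)
    out = add(out, twice_s3(twice_s3(state, n), n))
    return add(out, twice_s3(state, n), 2)

if __name__ == "__main__":
    all_up = {0b010101: 1}                  # three sites, each with one up-spin electron
    singlet = {0b1001: 1, 0b0110: -1}       # two sites, antisymmetric spin combination
    print(four_s_squared(all_up, 3))        # eigenvalue 15 = 4 * (3/2) * (5/2)
    print(four_s_squared(singlet, 2))       # zero vector, i.e. S_tot = 0
```

The fully polarized state has 4 S_tot(S_tot + 1) = 15, that is S_tot = 3/2 = N_e/2, the maximum value; the two-site singlet has S_tot = 0. Which relative sign gives the singlet depends on the assumed operator ordering.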
3.3. Lieb’s ferrimagnetism and flat-band ferromagnetism. In 1989, more than two decades after the works of Nagaoka and Thouless, Lieb proved an important theorem for the Hubbard model with N_e = |Λ| (i.e., half-filling) on a bipartite lattice [12]. For the Hubbard model with U > 0 on lattices which have two sublattices with different numbers of sites, Lieb’s theorem implies the existence of ferrimagnetism, a weaker version of ferromagnetism. A typical example is the Hubbard model on the so-called copper oxide lattice of Fig. 1, where the ground states are proved to have S_tot = N_e/6 when N_e = |Λ|. The models exhibiting Lieb’s ferrimagnetism have peculiar single-electron band structures where the band at the middle of the spectrum is completely flat (or dispersionless). One may regard Lieb’s ferrimagnetism as a precursor to the flat-band ferromagnetism that we shall discuss.

Flat-band ferromagnetism was discovered first by Mielke [13–15] and then by Tasaki [2, 3]. Mielke treated the Hubbard model on a general line graph, where t_{r,s} = t > 0 for those pairs (r, s) corresponding to the edges (or bonds) of the lattice, and t_{r,s} = 0 otherwise. The models have peculiar band structure where the lowest single-electron band is completely flat. Mielke proved that the models with U > 0 exhibit saturated ferromagnetism for suitable electron numbers which correspond to the half-filling of the lowest bands. A typical example (and the most beautiful example of flat-band ferromagnetism) is the Hubbard model on the kagomé lattice of Fig. 2, which was proved to exhibit saturated ferromagnetism when N_e = |Λ|/3. See also [16–18] for Mielke’s results on Hubbard models with partially flat bands.

Tasaki [2, 3] proposed his version of Hubbard models with flat lowest bands, and proved the existence of saturated ferromagnetism for U > 0 when the lowest bands are half-filled. As can be seen from the one-dimensional example in Fig. 3, his models have two different kinds of lattice sites which are sometimes interpreted as metallic and oxide atoms, and have next nearest neighbor hopping amplitudes. By fine-tuning the
hopping amplitudes and the on-site potentials, the lowest band becomes flat. See [19] for an extension. A common feature of Lieb’s ferrimagnetism and Mielke’s and Tasaki’s ferromagnetism is that their models have single-electron bands which are totally flat (i.e., dispersionless), and the magnetization is supported by electrons in the flat bands. (For Lieb’s ferrimagnetism, the latter statement is correct only in a vague sense.) This observation is consistent with the Stoner criterion which states that large U DF favors ferromagnetism. Here the criterion is realized by an infinitely large density of states DF . The works of Lieb, Mielke, and Tasaki have shown that rich classes of Hubbard models on slightly complicated lattices exhibit nontrivial magnetic behavior. Such a view may be helpful in understanding insulating ferromagnetism observed in a cuprate [20, 8], and has even motivated some scientists to design novel ferromagnetic materials. See [21] and references therein. But one should not forget that the Hubbard model is a highly idealized model. To find implications of the results for the Hubbard model in realistic many-electron systems defined in continuum space is a formidably difficult but a challenging problem. See, for example, [21, 22].
3.4. Beyond flat-band ferromagnetism. Although Lieb’s ferrimagnetism and Mielke’s and Tasaki’s flat-band ferromagnetism certainly have shed novel light on the mechanisms of magnetic ordering in interacting electron systems, they do not deal with the true “competition” between quantum dynamics and Coulomb interactions. When the Coulomb interaction U is vanishing, all of their models have highly degenerate ground states. The degeneracy reflects the existence of completely flat bands. Among these degenerate ground states for U = 0, there are ferrimagnetic or ferromagnetic states as well as states with much smaller magnetization. The role of the Coulomb interaction in these models is simply to lift the huge degeneracy and “select” the states with highest magnetization as unique ground states. Consequently ferrimagnetism or ferromagnetism in these models takes place for any values of U > 0. In other words magnetic ordering is stabilized by infinitesimally small Coulomb interaction. This is quite different from situations in realistic systems where the interaction must be greater than some positive critical value in order to destabilize Pauli paramagnetism and get magnetic ordering. It may be needless to say that the existence of completely flat lowest bands (especially in Tasaki’s models) is unrealistic, or even pathological. The flatness of the bands is destroyed by arbitrarily small generic perturbation, and is far from robust.
Fig. 1. The so-called copper oxide lattice. As a consequence of Lieb’s theorem [12], it is proved that the Hubbard model with U > 0 on this lattice has S_tot = N_e/6 when N_e = |Λ|

Fig. 2. The kagomé lattice is the line graph of the hexagonal lattice. Mielke [13–15] showed that the Hubbard model on the kagomé lattice exhibits ferromagnetism when N_e = |Λ|/3 for any U > 0

Fig. 3. Tasaki’s flat-band Hubbard model in one dimension [2, 3]. The hopping amplitude t_{r,s} is ν²t for the horizontal bonds and νt for the diagonal bonds. The sites in the upper and the lower rows have on-site potentials t_{r,r} which equal t and 2ν²t, respectively. When N_e = |Λ|/2, the model exhibits saturated ferromagnetism for any t > 0, ν > 0, and U > 0. See Theorem 4.1
It was therefore highly desirable to go beyond flat band models. A natural step was to modify the model by adding extra hopping terms to the Hamiltonian thus making the flat band dispersive, and then to show that the magnetic ordering survives. One can only hope this scenario to work for sufficiently large U since magnetic ordering becomes truly a nonperturbative phenomenon when the band is not flat. As the first step in this direction, the local stability of the ferromagnetic state was proved in models obtained by adding arbitrary small short-range hopping terms to Tasaki’s version of flat-band Hubbard models [4, 5]. In this work, it was also shown that low-lying excitation energy above the ferromagnetic state has the dispersion relation expected for a magnon excitation. Then it was proved in [6] that a one-dimensional Hubbard model with non-flat bands exhibits saturated ferromagnetism for sufficiently large U. The model was obtained by adding extra nearest neighbor hopping terms to Tasaki’s one-dimensional flat-band Hubbard model as in Fig. 4. This was the first rigorous example of ferromagnetism in an electron system without any singularities, i.e., with finite interaction and finite density of states. Shen [23] announced a computer assisted extension of the proof in [6] to some higher dimensional models. The method in [6] inspired similar rigorous works in different classes of Hubbard models [24, 25]. In particular Tanaka and Ueda [26] recently succeeded in proving the existence of saturated ferromagnetism in a Hubbard model obtained by adding extra hopping terms to Mielke’s flat band Hubbard model on the kagomé lattice. For closely related heuristic works, see [27, 28] and other references in Sect. 6.6 of [8]. The present work is an extension of that in [6]. We extend the theorem in [6] to general models in higher dimensions. As was noted in [6], a straightforward extension of the method in [6] applies to a class of higher dimensional models.
Instead of using such a method, we here present a much more general and simplified proof which naturally covers a more general class of models.
Fig. 4. Tasaki’s nearly-flat-band Hubbard model in one dimension [6]. The hopping amplitude t_{r,s} is −ν²s and ν²t for the horizontal bonds in the upper and the lower rows, respectively, and ν(t + s) for the diagonal bonds. The sites in the upper and the lower row have on-site potentials t_{r,r} which equal t − 2ν²s and 2ν²t − s, respectively. The model has two bands which are not flat. When N_e = |Λ|/2, the model exhibits saturated ferromagnetism for sufficiently large U/s and t/s for any ν > 0. When ν = 1/√2, for example, the appearance of ferromagnetism is proved for t/s ≥ 1.6 and sufficiently large U/s. See Theorem 4.2
4. Ferromagnetism in Typical d-Dimensional Nearly-Flat-Band Models

In the present section, we concentrate on a simple class of models defined on decorated hypercubic lattices, and precisely describe the results of the present paper and our previous works. Although our works cover much more general models, it may be useful for the readers to see what has been achieved in the context of simple models. In short, we start from a concretely defined non-singular model of itinerant electrons, and prove that its low energy properties coincide with what one expects in a “healthy” ferromagnet.

Let E denote (only in the present section) the d-dimensional L × · · · × L hypercubic lattice with the unit lattice spacing and periodic boundary conditions. We let L > 0 be an odd integer. We take a new site in the middle of each bond (i.e., a pair of neighboring sites) in E, and denote by I (again only in this section) the collection of all such sites. We shall study the decorated hypercubic lattice Λ = E ∪ I in the present section. See Fig. 5. We define a Hubbard model on Λ which is characterized by four parameters t > 0, s > 0, ν > 0, and U > 0. The hopping amplitude of the model is given by
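$$t_{r,s} = \begin{cases} \nu(t+s) & \text{if } |r-s| = 1/2; \\ \nu^2 t & \text{if } r, s \in E \text{ and } |r-s| = 1; \\ -\nu^2 s & \text{if } (r,s) \in B; \\ 2d\nu^2 t - s & \text{if } r = s \in E; \\ t - 2\nu^2 s & \text{if } r = s \in I; \\ 0 & \text{otherwise,} \end{cases} \tag{4.1}$$

where we set

$$B = \bigl\{(r,s) \,\bigm|\, r, s \in I,\ |r-s| = 1/\sqrt{2}\bigr\} \cup \bigl\{(r,s) \,\bigm|\, r, s \in I,\ |r-s| = 1,\ (r+s)/2 \in E\bigr\}. \tag{4.2}$$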
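The lattice Λ = E ∪ I and the hopping amplitudes (4.1)–(4.2) translate directly into a small data structure. The sketch below is one possible encoding, stated as an assumption rather than as the paper's construction: coordinates are doubled so that sites of E have all-even integer coordinates and sites of I have exactly one odd coordinate, taken modulo 2L to implement the periodic boundary conditions. The function names are hypothetical, the parameter `s_hop` stands for the parameter s of the text, and L ≥ 3 is assumed so that the neighbors of a site are distinct.

```python
from itertools import product

def decorated_lattice(d, L):
    """Return (sites of E, sites of I) in doubled coordinates modulo 2L."""
    sites_E = [tuple(2 * x for x in p) for p in product(range(L), repeat=d)]
    sites_I = []
    for r in sites_E:
        for axis in range(d):
            mid = list(r)
            mid[axis] = (mid[axis] + 1) % (2 * L)  # midpoint of the bond from r along `axis`
            sites_I.append(tuple(mid))
    return sites_E, sites_I

def is_E(r):
    return all(x % 2 == 0 for x in r)

def displacement(r1, r2, L):
    """Minimal-image displacement r2 - r1 in doubled units."""
    return tuple(((b - a + L) % (2 * L)) - L for a, b in zip(r1, r2))

def hopping(r1, r2, d, L, t, s_hop, nu):
    """Hopping amplitude (4.1) between two sites of the decorated lattice."""
    delta = displacement(r1, r2, L)
    dist2 = sum(x * x for x in delta)  # equals 4 |r1 - r2|^2
    if dist2 == 0:
        return 2 * d * nu**2 * t - s_hop if is_E(r1) else t - 2 * nu**2 * s_hop
    if dist2 == 1:                      # |r1 - r2| = 1/2, one site in E and one in I
        return nu * (t + s_hop)
    if dist2 == 4 and is_E(r1) and is_E(r2):
        return nu**2 * t
    if not is_E(r1) and not is_E(r2):   # both sites in I: test membership in B, Eq. (4.2)
        if dist2 == 2:                  # |r1 - r2| = 1/sqrt(2)
            return -nu**2 * s_hop
        if dist2 == 4:                  # |r1 - r2| = 1, need the midpoint to lie in E
            mid = tuple((a + x // 2) % (2 * L) for a, x in zip(r1, delta))
            if is_E(mid):
                return -nu**2 * s_hop
    return 0

if __name__ == "__main__":
    d, L = 2, 3
    E, I = decorated_lattice(d, L)
    print(len(E), len(I))  # L**d sites in E and d * L**d sites in I
```

For d = 1 this reduces to the two-row chain of Fig. 4, with the sites of E carrying the on-site potential 2ν²t − s.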
There are nearest neighbor and next nearest neighbor hopping amplitudes. See Fig. 5. This rather complicated expression for t_{r,s} comes from a simple construction in Sect. 5.3. See (5.12).

Fig. 5 (a), (b). The lattice structure and the hopping amplitudes in the two dimensional model. The black dots are sites in E, and the gray dots are sites in I. (a) shows the flat-band model with s = 0, and (b) shows the general model with s > 0

One can easily calculate the single-electron properties corresponding to the above hopping amplitudes. There are (d + 1) single-electron bands (see footnote 4) and their dispersion relations are given by (see footnote 5)

$$\varepsilon_j(k) = \begin{cases} -s - 2\nu^2 s \sum_{\mu=1}^{d} (1 + \cos k_\mu) & \text{if } j = 1; \\ t & \text{if } j = 2, \dots, d; \\ t + 2\nu^2 t \sum_{\mu=1}^{d} (1 + \cos k_\mu) & \text{if } j = d+1. \end{cases} \tag{4.3}$$

Here k = (k_1, . . . , k_d) is the wave vector in the set

$$K = \Bigl\{ \Bigl(\frac{2\pi}{L} n_1, \dots, \frac{2\pi}{L} n_d\Bigr) \Bigm| n_i = 0, \pm 1, \pm 2, \dots, \pm\frac{L-1}{2} \Bigr\}. \tag{4.4}$$

Footnote 4: The readers unfamiliar with the notion of bands may ignore this part or refer to Appendix E of [8].
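The dispersion relation (4.3) can be checked numerically in the simplest case d = 1, where there are only two bands. The sketch below is an illustration under assumptions, not taken from the paper: sites are labelled by integers in units of 1/2 (even labels for E, odd labels for I), the 2 × 2 Bloch matrix is obtained by Fourier transforming the amplitudes (4.1), and its eigenvalues are computed in closed form from the trace and determinant. All function names are hypothetical.

```python
import cmath
import math

def amplitude_1d(a, b, t, s, nu):
    """Hopping (4.1) for d = 1; a, b are positions in units of 1/2 (even = E, odd = I)."""
    dist = abs(a - b)
    if dist == 0:
        return 2 * nu**2 * t - s if a % 2 == 0 else t - 2 * nu**2 * s
    if dist == 1:                  # |r - s| = 1/2
        return nu * (t + s)
    if dist == 2:                  # E-E bond, or I-I pair whose midpoint lies in E
        return nu**2 * t if a % 2 == 0 else -nu**2 * s
    return 0.0

def bloch_matrix(k, t, s, nu):
    """2 x 2 Bloch matrix; index 0 is the E sublattice, index 1 the I sublattice."""
    h = [[0j, 0j], [0j, 0j]]
    for a0 in (0, 1):
        for b in range(a0 - 2, a0 + 3):
            phase = cmath.exp(1j * k * (b - a0) / 2)
            h[a0][b % 2] += amplitude_1d(a0, b, t, s, nu) * phase
    return h

def band_energies(k, t, s, nu):
    """Eigenvalues of the Hermitian Bloch matrix, in increasing order."""
    h = bloch_matrix(k, t, s, nu)
    trace = (h[0][0] + h[1][1]).real
    det = (h[0][0] * h[1][1] - h[0][1] * h[1][0]).real
    root = math.sqrt(max(trace**2 / 4 - det, 0.0))
    return trace / 2 - root, trace / 2 + root

def dispersion_formula(k, t, s, nu):
    """The two bands of (4.3) for d = 1 (j = 1 and j = d + 1 = 2)."""
    c = 1 + math.cos(k)
    return -s - 2 * nu**2 * s * c, t + 2 * nu**2 * t * c

def max_deviation(L, t, s, nu):
    """Largest mismatch between the two computations over the set K of (4.4)."""
    half = (L - 1) // 2
    worst = 0.0
    for n in range(-half, half + 1):
        k = 2 * math.pi * n / L
        for numeric, exact in zip(band_energies(k, t, s, nu), dispersion_formula(k, t, s, nu)):
            worst = max(worst, abs(numeric - exact))
    return worst

if __name__ == "__main__":
    print(max_deviation(5, 2.0, 0.5, 0.7))
```

Setting s = 0 in `band_energies` gives a lower band that is independent of k, which is the flat band of the model in Fig. 3.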
In the flat-band model with s = 0, all the bands except the uppermost band with j = d + 1 are dispersionless (or flat) as in Fig. 6(a). In a general model with s > 0, the lowest band becomes dispersive as in Fig. 6(b). Since our ferromagnetism is supported by electrons in the lowest band, it is crucial that the lowest band becomes dispersive. Reflecting the special geometry of the decorated lattice, the intermediate bands with j = 2, . . . , d are always dispersionless. This, however, is not crucial to low energy behavior of our model. Indeed it is not difficult to add proper extra hopping terms to the model to make all the bands dispersive while maintaining the existence of ferromagnetism. See Sect. 6.2.

We consider the Hubbard model with the Hamiltonian (2.7), the hopping amplitudes (4.1), and the electron number N_e = |E| = L^d. We first recall the result about the flat-band ferromagnetism proved in [2, 3]. (See Sect. 6 of [8] for the most compact proof.)

Footnote 5: In our models, all the bands have simple cosine dispersion relations. This is not the case in general multi-band systems, and reflects a special character of our hopping amplitudes. In this sense, our models may be regarded as a kind of “idealized tight-binding models.” Whether such models are useful in studying problems other than ferromagnetism is an open question.

Fig. 6 (a), (b). The dispersion relation (4.3) of the single-electron bands in the two dimensional model. The horizontal axes represent −π ≤ k_1, k_2 ≤ π, and the vertical axis denotes the single-electron energy. (a) shows the flat-band model with s = 0, and (b) shows the general model with s > 0
Theorem 4.1 (Flat-band ferromagnetism). Let $s = 0$. Then for arbitrary $t > 0$, $\nu > 0$, and $U > 0$, the above model exhibits saturated ferromagnetism.

As we have stressed in Sect. 3.4, ferromagnetism takes place for any positive value of $U$ in the flat-band models. When the lowest band is no longer flat, saturated ferromagnetism cannot take place for too small values of $U > 0$. This fact can be seen, for example, from the following (easy and well-known) theorem. (See Sect. 3.3 of [8] for a proof.)

Theorem 4.2 (Instability of saturated ferromagnetism). Let $s > 0$ and $U < 4\nu^2 s$. Then the lowest energy among the states with $S_{\rm tot} = S_{\max} - 1$ is strictly lower than the lowest energy among the states with $S_{\rm tot} = S_{\max}$. This means that the ground state of the model has $S_{\rm tot} < S_{\max}$, and hence the model does not exhibit saturated ferromagnetism.

The theorem, unfortunately, does not tell us what the ground states look like for small $U$. (We nevertheless believe that the model has ground states with $S_{\rm tot} = 0$ for sufficiently small $U$.) It assures us, however, that the appearance of saturated ferromagnetism, which is established in the following theorem, is a purely nonperturbative phenomenon.

Theorem 4.3 (Ferromagnetism in nearly-flat-band models). When $t/s$, $U/s$, and $1/\nu$ are sufficiently large (how large these quantities should be depends only on the dimensionality $d$), the above model exhibits saturated ferromagnetism.

This is a special case of our main theorem in the present paper, Theorem 5.2. For the model with $d = 1$, one can prove the same statement for any value of $\nu > 0$. See Sect. 6.2. A computer-assisted proof of the above theorem for $d = 2, 3$ (which makes use of an extension of the method in [6]) was announced by Shen [23].
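Moreover our earlier results in [4, 5] about low-lying excitations also apply to the present model. For any $x \in \mathbb{Z}^d$, define the translation operator $T_x$ by
\[
T_x\Bigl[\Bigl(\prod_{r\in\Lambda_\uparrow} c^\dagger_{r,\uparrow}\Bigr)\Bigl(\prod_{r\in\Lambda_\downarrow} c^\dagger_{r,\downarrow}\Bigr)\Phi_{\rm vac}\Bigr]
= \Bigl(\prod_{r\in\Lambda_\uparrow} c^\dagger_{r+x,\uparrow}\Bigr)\Bigl(\prod_{r\in\Lambda_\downarrow} c^\dagger_{r+x,\downarrow}\Bigr)\Phi_{\rm vac},
\tag{4.5}
\]
where we use periodic boundary conditions to identify $r + x$ with a site in $\Lambda$ (if necessary). Then, for any $k \in \mathcal{K}$, we let $E_{\rm SW}(k)$ be the lowest possible energy among the states $\Phi$ that satisfy $\hat{S}^{(3)}_{\rm tot}\Phi = \{(N_e/2) - 1\}\Phi$ and $T_x[\Phi] = e^{ik\cdot x}\Phi$ for any $x$. In other words, $E_{\rm SW}(k)$ is the lowest energy among the states where a single spin is flipped (from the ferromagnetic ground state) and the total momentum is $k$. Then we have the following theorem. (For more precise statements, see Sect. 3.3 of [5].)

Theorem 4.4 (Dispersion relation of low-lying excitations). Let $E_{\rm GS}$ be the ground state energy. When $t/s$, $U/s$, $t/U$, and $1/\nu$ are sufficiently large, one has
\[
F_1\, 4\nu^4 U \sum_{\mu=1}^{d} \Bigl(\sin\frac{k_\mu}{2}\Bigr)^2
\;\ge\; E_{\rm SW}(k) - E_{\rm GS} \;\ge\;
F_2\, 4\nu^4 U \sum_{\mu=1}^{d} \Bigl(\sin\frac{k_\mu}{2}\Bigr)^2,
\tag{4.6}
\]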
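for any $k \in \mathcal{K}$. Moreover the constants $F_1$ and $F_2$ tend to 1 as $s \to 0$ and $\nu \to 0$.

Therefore, for sufficiently small $s$ and $\nu$, we have an almost precise estimate
\[
E_{\rm SW}(k) - E_{\rm GS} \simeq 4\nu^4 U \sum_{\mu=1}^{d} \Bigl(\sin\frac{k_\mu}{2}\Bigr)^2,
\tag{4.7}
\]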
about the low-lying excitation energies. We note that this dispersion relation is what one expects for the elementary magnon excitation in a ferromagnetic Heisenberg model on $\mathcal{E}$ with the exchange interaction $J_{\rm eff} = 2\nu^4 U$.

To summarize, we have obtained a class of non-singular models of itinerant electrons⁶ (with only spin-independent interactions) whose low energy behaviors are rigorously proved to be that of a “healthy” insulating ferromagnet⁷. By a “healthy” insulator, we mean an itinerant electron system whose low energy properties can effectively be described by an appropriate quantum spin system. Although we can hardly claim that our model is realistic, the similarity with ferromagnetism observed in a cuprate (see Sect. 7.1 of [8]) suggests that our models share some features with some of the existing ferromagnetic insulators.

Let us finally discuss whether our ferromagnetism is robust against perturbations. We note that Theorem 4.4 about the low-lying excitation is still valid when one adds an arbitrary small translation-invariant perturbation to the hopping amplitudes⁸. In other words, local stability of the ferromagnetic state is proved for slightly perturbed models. Since it is generally believed that local stability of ferromagnetism implies global stability (see [17] for a related rigorous result), this strongly suggests that the global stability of ferromagnetism (as is stated in Theorem 4.3) is valid for general perturbed models.

⁶ It is true that the Hubbard model itself is “singular” when compared with more realistic models in continuum. But this is a consequence of the way of describing physical systems, and does not necessarily mean that the underlying system (if any) is singular. We believe, on the other hand, that the models with $U = \infty$ or $D_{\rm F} = \infty$ have more manifest singularities.

⁷ It should be noted that insulating ferromagnets are rather rare in reality. To prove the existence of metallic ferromagnetism, in which a set of electrons contribute both to conduction and magnetism, in a certain version of the Hubbard model is a challenging open problem [8].

⁸ One must also replace $E_{\rm GS}$ in the theorem with the lowest energy of the states with $S_{\rm tot} = S_{\max}$.

5. The Model and Main Results

5.1. Construction of the lattice. We define our lattice and the Hubbard model on it. Let us give a brief explanation first. Our lattice consists of two kinds of sites called external sites and internal sites. The sets of all the external and the internal sites are denoted as $\mathcal{E}$ and $\mathcal{I}$, respectively. In the model of Fig. 5, for example, the black dots are external sites and gray dots are internal sites. The whole lattice is decomposed into a union of overlapping cells. Each cell contains a single internal site and $n$ ($n \ge 2$) external sites. An internal site $u$ belongs to exactly one cell (denoted as $C_u$), while an external site $x$ belongs to $m$ ($m \ge 2$) cells. In Fig. 5, a bond which consists of two black dots and a gray dot is a cell.

To be more precise, let us define the general lattice by using the “cell construction” as in [8]. This allows us to cover a general class of models in a unified manner. We fix two integers $n, m \ge 2$ which will characterize our lattice. Let the basic cell be a set of $(n + 1)$ sites written as
\[
C = \{u, x_1, x_2, \ldots, x_n\}.
\tag{5.1}
\]
For convenience, we call $u$ the internal site of $C$, and $x_1, x_2, \ldots, x_n$ the external sites. To form the lattice $\Lambda$, we assemble $M$ identical copies of the basic cell, and identify external sites from $m$ distinct cells, regarding them as a single site. We do not make such identifications for internal sites. We assume that the lattice thus constructed is connected. Usually $\Lambda$ becomes a periodic lattice by this construction. The lattice is naturally decomposed as
\[
\Lambda = \mathcal{I} \cup \mathcal{E},
\tag{5.2}
\]
where $\mathcal{I}$ and $\mathcal{E}$ are the sets of internal sites and external sites, respectively. From the above construction, we see that the numbers of sites in these sublattices are $|\mathcal{I}| = M$ and $|\mathcal{E}| = nM/m$. See Fig. 7 for some examples of the basic cell and corresponding lattices. The examples treated in Sect. 4 are obtained by considering the cell with $n = 2$, and setting $m = 2d$.

We can easily treat models where $n$ and $m$ are not identical for different cells, but we here concentrate on the simplest case with constant $n$ and $m$. (We still can treat a variety of lattices by choosing different $n$, $m$, and ways of assembling the cells.)

For an internal site $u \in \mathcal{I}$, we denote by $C_u \subset \Lambda$ the cell which contains the site $u$. For an external site $x \in \mathcal{E}$, we denote by $\Lambda_x \subset \Lambda$ the union of the $m$ cells which contain the site $x$.

5.2. Fermion operators. We define special fermion operators which will be crucial for our analysis. Let $\nu > 0$ be a constant. (We note that $1/\nu$ corresponds to $\lambda$ in our previous publications [2–5, 8].) For $x \in \mathcal{E}$, let
\[
a_{x,\sigma} = c_{x,\sigma} - \nu \sum_{u \in \Lambda_x \cap \mathcal{I}} c_{u,\sigma},
\tag{5.3}
\]
Fig. 7. Examples of cells and lattices. The black dots represent external sites, and the gray dots represent internal sites. (A) From the cell with three sites (n = 2), one can form (a) a one-dimensional lattice (which is drawn as the delta chain or the zigzag chain in Figs. 3 and 4) by identifying two external sites (m = 2), or (b) a decorated square lattice (which also appears in Fig. 5) by identifying four (m = 4). (B) From the cell with five sites (n = 4), one can form, for example, (c) another decorated square lattice (which will appear in Fig. 8) by identifying four external sites (m = 4). There are many similar examples in higher dimensions
where the sum is over the $m$ internal sites adjacent to $x$. Similarly for $u \in \mathcal{I}$, let
\[
b_{u,\sigma} = c_{u,\sigma} + \nu \sum_{x \in C_u \setminus \{u\}} c_{x,\sigma},
\tag{5.4}
\]
where the sum is over the $n$ external sites adjacent to $u$.
From the anticommutation relations (2.1) for the basic $c$ operators, one can easily verify that
\[
\{a_{x,\sigma}, b^\dagger_{u,\tau}\} = 0
\tag{5.5}
\]
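for any $x \in \mathcal{E}$, $u \in \mathcal{I}$, and $\sigma, \tau = \uparrow, \downarrow$. This means that the $a$ operators and the $b$ operators correspond to distinct spaces of electrons. We shall discuss more about this point in Sect. 5.4.

The anticommutation relations between the $a$ operators are
\[
\{a_{x,\sigma}, a^\dagger_{y,\tau}\} =
\begin{cases}
1 + m\nu^2, & \text{if } x = y,\ \sigma = \tau;\\
\ell_{x,y}\,\nu^2, & \text{if } x \ne y,\ \sigma = \tau;\\
0, & \text{if } \sigma \ne \tau.
\end{cases}
\tag{5.6}
\]
For $x, y \in \mathcal{E}$, we defined
\[
\ell_{x,y} = |\Lambda_x \cap \Lambda_y \cap \mathcal{I}|,
\tag{5.7}
\]
which is the number of distinct cells which contain both $x$ and $y$. For the $b$ operators, we similarly have
\[
\{b_{u,\sigma}, b^\dagger_{v,\tau}\} =
\begin{cases}
1 + n\nu^2, & \text{if } u = v,\ \sigma = \tau;\\
\ell_{u,v}\,\nu^2, & \text{if } u \ne v,\ \sigma = \tau;\\
0, & \text{if } \sigma \ne \tau.
\end{cases}
\tag{5.8}
\]
For $u, v \in \mathcal{I}$, we defined
\[
\ell_{u,v} = |C_u \cap C_v \cap \mathcal{E}|,
\tag{5.9}
\]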
which is the number of external sites which are adjacent to both $u$ and $v$. One sees that $a$ operators or $b$ operators simply anticommute with each other if the reference sites are sufficiently separated. The slightly complicated anticommutation relations (found for sufficiently close reference sites) reflect the use of basis states which are localized but not orthogonal to each other.

An important property of the $a$ and $b$ operators is that one can represent arbitrary states of the system by using these operators. The key is the following lemma.

Lemma 5.1. For any $r \in \Lambda$ and $\sigma = \uparrow, \downarrow$, one has
\[
c_{r,\sigma} = \sum_{x \in \mathcal{E}} \gamma_x\, a_{x,\sigma} + \sum_{u \in \mathcal{I}} \eta_u\, b_{u,\sigma},
\tag{5.10}
\]
with suitable coefficients $\gamma_x$ and $\eta_u$.
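Proof. Consider a Hilbert space which consists of operators of the form $\sum_{r\in\Lambda} \alpha_r c_{r,\sigma}$ with $\alpha_r \in \mathbb{C}$. We fix $\sigma$ to be either $\uparrow$ or $\downarrow$. The inner product of the two “vectors” $\sum_{r\in\Lambda} \alpha_r c_{r,\sigma}$ and $\sum_{r\in\Lambda} \beta_r c_{r,\sigma}$ is defined to be the anticommutator $\{(\sum_{r\in\Lambda} \alpha_r c_{r,\sigma})^\dagger, \sum_{r\in\Lambda} \beta_r c_{r,\sigma}\} = \sum_{r\in\Lambda} \overline{\alpha_r}\,\beta_r$. Since $\{a_{x,\sigma}, b^\dagger_{u,\sigma}\} = 0$ for any $x \in \mathcal{E}$ and any $u \in \mathcal{I}$, the subspace spanned by the set $\{a_{x,\sigma}\}_{x\in\mathcal{E}}$ and that spanned by the set $\{b_{u,\sigma}\}_{u\in\mathcal{I}}$ are orthogonal. Since $a_{x,\sigma}$ with different $x$ are linearly independent, the dimension of the former subspace is equal to $|\mathcal{E}|$. Similarly the dimension of the latter subspace is $|\mathcal{I}|$. Noting that $|\mathcal{E}| + |\mathcal{I}| = |\Lambda|$ is the dimension of the whole space, one finds that the set $\{a_{x,\sigma}\}_{x\in\mathcal{E}} \cup \{b_{u,\sigma}\}_{u\in\mathcal{I}}$ spans the whole space. This means that any $c_{r,\sigma}$ can be expanded in terms of $a_{x,\sigma}$ and $b_{u,\sigma}$ as in (5.10).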
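The inner-product picture used in this proof also gives a quick numerical check of the relations (5.5), (5.6) and (5.8): each operator is a coefficient vector over the sites, and an anticommutator is a dot product. The sketch below assumes the one-dimensional chain with $n = m = 2$ and periodic boundary conditions; the site labels and the helper names (`build_operators`, `anticomm`, `shared_cells`, `shared_externals`) are hypothetical, introduced only for illustration.

```python
def build_operators(cells, nu):
    """Coefficient vectors (site -> coefficient) of a_x and b_u, following (5.3) and (5.4).

    cells maps each internal site u to the list of external sites in its cell C_u.
    """
    a, b = {}, {}
    for u, xs in cells.items():
        b[u] = {u: 1.0}
        for x in xs:
            b[u][x] = nu                           # b_u = c_u + nu * sum of c_x
            a.setdefault(x, {x: 1.0})[u] = -nu     # a_x = c_x - nu * sum of c_u
    return a, b

def anticomm(p, q):
    """{p, q^dagger} for operators with real coefficients: the dot product of the vectors."""
    return sum(coef * q.get(site, 0.0) for site, coef in p.items())

def shared_cells(x, y, cells):
    """Number of cells containing both external sites x and y (equals m when x == y)."""
    return sum(1 for xs in cells.values() if x in xs and y in xs)

def shared_externals(u, v, cells):
    """Number of external sites adjacent to both internal sites u and v (equals n when u == v)."""
    return len(set(cells[u]) & set(cells[v]))

# Chain with M cells: cell i contains external sites i and i+1 (mod M). Values are illustrative.
M, nu = 6, 0.3
cells = {("I", i): [("E", i), ("E", (i + 1) % M)] for i in range(M)}
a, b = build_operators(cells, nu)

dev_ab = max(abs(anticomm(a[x], b[u])) for x in a for u in b)
dev_aa = max(abs(anticomm(a[x], a[y])
                 - ((1.0 if x == y else 0.0) + shared_cells(x, y, cells) * nu**2))
             for x in a for y in a)
dev_bb = max(abs(anticomm(b[u], b[v])
                 - ((1.0 if u == v else 0.0) + shared_externals(u, v, cells) * nu**2))
             for u in b for v in b)
print("max deviations from (5.5), (5.6), (5.8):", dev_ab, dev_aa, dev_bb)
```

The diagonal cases are covered by the same formula because an external site shares $m$ cells with itself and an internal site shares $n$ external sites with itself.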
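Recall that the basis states of the many-electron Hilbert space are (2.4). As a consequence of the lemma, we find that an arbitrary many-electron state of the system can be represented as a linear combination of the basis states
\[
\Psi_0^{(\nu)}(E_\uparrow, E_\downarrow, I_\uparrow, I_\downarrow)
= \Bigl(\prod_{x\in E_\uparrow} a^\dagger_{x,\uparrow}\Bigr)
\Bigl(\prod_{x\in E_\downarrow} a^\dagger_{x,\downarrow}\Bigr)
\Bigl(\prod_{u\in I_\uparrow} b^\dagger_{u,\uparrow}\Bigr)
\Bigl(\prod_{u\in I_\downarrow} b^\dagger_{u,\downarrow}\Bigr)\Phi_{\rm vac},
\tag{5.11}
\]
with arbitrary subsets $E_\uparrow, E_\downarrow \subset \mathcal{E}$ and $I_\uparrow, I_\downarrow \subset \mathcal{I}$. Here $|E_\uparrow| + |E_\downarrow| + |I_\uparrow| + |I_\downarrow| = N_e$ is the total electron number.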
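5.3. Definition of the model and the main theorem. Our model is characterized by the four parameters $t > 0$, $s > 0$, $U > 0$, and $\nu > 0$. The Hamiltonian of the model on $\Lambda$ is
\[
H = -s \sum_{x\in\mathcal{E}} \sum_{\sigma=\uparrow,\downarrow} a^\dagger_{x,\sigma} a_{x,\sigma}
+ t \sum_{u\in\mathcal{I}} \sum_{\sigma=\uparrow,\downarrow} b^\dagger_{u,\sigma} b_{u,\sigma}
+ U \sum_{r\in\Lambda} n_{r,\uparrow} n_{r,\downarrow},
\tag{5.12}
\]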
where the number operator $n_{r,\sigma}$ is defined in (2.3). Recalling the definitions (5.3) and (5.4), one sees that this defines a Hubbard model with nearest and next nearest neighbor hopping terms. We can rewrite (5.12) in the standard form (2.7) with the hopping matrix given by
\[
\begin{aligned}
t_{x,x} &= m t \nu^2 - s, && x \in \mathcal{E},\\
t_{u,u} &= t - n s \nu^2, && u \in \mathcal{I},\\
t_{x,u} = t_{u,x} &=
\begin{cases}
\nu (t + s), & x \in C_u\\
0, & x \notin C_u
\end{cases}
&& x \in \mathcal{E},\ u \in \mathcal{I},\\
t_{x,y} &= \ell_{x,y}\,\nu^2 t, && x, y \in \mathcal{E},\ x \ne y,\\
t_{u,v} &= -\ell_{u,v}\,\nu^2 s, && u, v \in \mathcal{I},\ u \ne v,
\end{aligned}
\tag{5.13}
\]
where $\ell_{x,y}$ is the number of cells containing both $x$ and $y$, and $\ell_{u,v}$ is the number of external sites adjacent to both $u$ and $v$. Note that the model has nearest and next-nearest neighbor hopping amplitudes, but not more. See Figs. 4, 5(b), and 8 for examples⁹.

We consider the Hilbert space with the electron number fixed to $N_e = |\mathcal{E}| = nM/m$. Note that this electron number is consistent with the interpretation that an external site represents a metallic atom which emits one electron to the system.

Exactly as in Theorem 4.1, it can be shown that the flat-band models with $s = 0$ exhibit saturated ferromagnetism for any $t > 0$, $\nu > 0$ and $U > 0$. See Sect. 6 of [8] for a proof. The instability of saturated ferromagnetism for sufficiently small $U$ as in Theorem 4.2 can of course be proved for the general models. See Sect. 3.3 of [8].

Our main result is the following theorem which shows that the ground states of the model exhibit saturated ferromagnetism¹⁰.

⁹ Observe that the lattice in Fig. 5(b) can be obtained by either setting $n = 2$, $m = 4$, or $n = 4$, $m = 2$. (In the latter case, the black dots correspond to the internal sites.) This means that we have models which exhibit saturated ferromagnetism at different electron numbers in different regions in the parameter space.

¹⁰ Theorem 4.4 about the low-lying excitation is valid in a wide range of models. See [5].
Fig. 8. Another example on the fcc like lattice in two dimensions obtained by setting n = m = 4
Theorem 5.2. When $t/s$, $U/s$ and $1/\nu$ are sufficiently large (how large these quantities should be depends only on the local structure of the lattice, but not on the size of the lattice), the ground states of the model are $(N_e + 1)$-fold degenerate and have the total spin $S_{\rm tot} = N_e/2$.

In the proof of the theorem, we further show that one of the ground states is written as
\[
\Phi_{\rm GS} = \Bigl(\prod_{x\in\mathcal{E}} a^\dagger_{x,\uparrow}\Bigr)\Phi_{\rm vac},
\tag{5.14}
\]
and the other ground states are obtained by repeatedly applying the spin lowering operator $\hat{S}^-_{\rm tot} = \sum_{r\in\Lambda} c^\dagger_{r,\downarrow} c_{r,\uparrow}$ to the state (5.14).
5.4. “Band” structure in the single-electron problem. Before proceeding to prove the theorem, we discuss a basic property of the single electron problem associated with the present model. Although the discussion is not necessary for the proof of the main theorem, it sheds light on a special character of the model that we are studying.

The single electron Hilbert space $h$ is the $|\Lambda|$-dimensional linear space spanned by $c^\dagger_{r,\uparrow}\Phi_{\rm vac}$ with $r \in \Lambda$. (We here consider the space of up-spin electrons, but this choice is not essential.) This space is decomposed as
\[
h = h_{\rm L} \oplus h_{\rm U},
\tag{5.15}
\]
where $h_{\rm L}$ is spanned by $a^\dagger_{x,\uparrow}\Phi_{\rm vac}$ with $x \in \mathcal{E}$, and $h_{\rm U}$ by $b^\dagger_{u,\uparrow}\Phi_{\rm vac}$ with $u \in \mathcal{I}$. Then we have the following.
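Proposition 5.3. The Hamiltonian $H$ can be diagonalized within $h_{\rm L}$ and within $h_{\rm U}$, respectively. The energy eigenvalues $\epsilon$ in $h_{\rm L}$ satisfy
\[
-s\{1 + (m + \ell_{\rm L})\nu^2\} \le \epsilon \le \min\{0,\ -s\{1 + (m - \ell_{\rm L})\nu^2\}\},
\tag{5.16}
\]
where $\ell_{\rm L} = \sum_{y\in\mathcal{E},\, y\ne x} \ell_{x,y}$ with $x \in \mathcal{E}$, and the energy eigenvalues $\epsilon$ in $h_{\rm U}$ satisfy
\[
\max\{0,\ t\{1 + (n - \ell_{\rm U})\nu^2\}\} \le \epsilon \le t\{1 + (n + \ell_{\rm U})\nu^2\},
\tag{5.17}
\]
where $\ell_{\rm U} = \sum_{v\in\mathcal{I},\, v\ne u} \ell_{u,v}$ with $u \in \mathcal{I}$. Here $\ell_{x,y}$ is the number of cells containing both $x$ and $y$, and $\ell_{u,v}$ is the number of external sites adjacent to both $u$ and $v$.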
The proposition states that the spectrum of the Hamiltonian $H$ in the single electron Hilbert space $h$ consists of two distinct “bands.” When $\nu$ is sufficiently small (which is the case when the theorem holds), the two “bands” do not overlap and are separated by a finite gap. The fermion operator $a^\dagger_{x,\sigma}$ creates an electron in the lower “band”, and $b^\dagger_{u,\sigma}$ creates an electron in the upper “band.”

When the model has a translation invariance as in the models of Sect. 4, the single electron Hilbert space is decomposed into several bands in the standard sense. The lower or upper “band” that we mentioned above is not necessarily a band in the usual sense, but may be a union of several bands. In the band structure (4.3) discussed in Sect. 4, the lowest band with $j = 1$ corresponds to the lower “band”, and the collection of the remaining $d$ bands corresponds to the upper “band.”

Proof. The proof is elementary but requires some care. Consider a state of the form
\[
\Phi = \sum_{x\in\mathcal{E}} \varphi(x)\, a^\dagger_{x,\uparrow}\Phi_{\rm vac},
\tag{5.18}
\]
where $\varphi(x)$ are complex coefficients. From the anticommutation relations (5.6), one finds that
\[
H\Phi = -s \sum_{x\in\mathcal{E}} \Bigl\{ (1 + m\nu^2)\varphi(x) + \nu^2 \sum_{y\in\mathcal{E},\, y\ne x} \ell_{x,y}\,\varphi(y) \Bigr\}\, a^\dagger_{x,\uparrow}\Phi_{\rm vac}.
\tag{5.19}
\]
Since the right-hand side is again a linear combination of $a^\dagger_{x,\uparrow}\Phi_{\rm vac}$, we find that $H$ can be diagonalized within $h_{\rm L}$. We now assume $H\Phi = \epsilon\Phi$. By comparing the coefficients in (5.18) and (5.19), we find
\[
-s(1 + m\nu^2)\varphi(x) - s\nu^2 \sum_{y\in\mathcal{E},\, y\ne x} \ell_{x,y}\,\varphi(y) = \epsilon\,\varphi(x).
\tag{5.20}
\]
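By multiplying (5.20) by $\overline{\varphi(x)}$, summing it over $x \in \mathcal{E}$, and solving it for $\epsilon$, one gets
\[
\epsilon = -\Bigl(\sum_{x\in\mathcal{E}} |\varphi(x)|^2\Bigr)^{-1}
\Bigl\{ s(1 + m\nu^2) \sum_{x\in\mathcal{E}} |\varphi(x)|^2
+ s\nu^2 \sum_{x,y\in\mathcal{E},\, x\ne y} \ell_{x,y}\,\overline{\varphi(x)}\,\varphi(y) \Bigr\}.
\tag{5.21}
\]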
By using the inequalities
\[
-(|\varphi(x)|^2 + |\varphi(y)|^2) \le \overline{\varphi(x)}\,\varphi(y) + \varphi(x)\,\overline{\varphi(y)} \le |\varphi(x)|^2 + |\varphi(y)|^2,
\tag{5.22}
\]
which follow immediately from $|\varphi(x) \pm \varphi(y)|^2 \ge 0$, we find from (5.21) that
\[
-s(1 + m\nu^2) - s\nu^2 \ell_{\rm L} \le \epsilon \le -s(1 + m\nu^2) + s\nu^2 \ell_{\rm L}.
\tag{5.23}
\]
This, with the positive semidefiniteness of $a^\dagger_{x,\sigma} a_{x,\sigma}$, proves the desired (5.16). The other inequality (5.17) is proved in exactly the same manner using the $b^\dagger_{u,\uparrow}$ operators.
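As a consistency check (a sketch under stated assumptions, not part of the original argument), one can apply the eigenvalue equation (5.20) to the decorated hypercubic lattice of Sect. 4. We assume that the external sites are labelled by $x \in \mathbb{Z}^d$ with period $L \ge 3$, that $m = 2d$, and that the number of cells shared by $x$ and $y$ (written $\ell_{x,y}$ here) equals 1 exactly when $y = x \pm e_\mu$ for a unit vector $e_\mu$, and 0 for all other $y \ne x$. A plane wave then reproduces the lowest band of (4.3), and the bounds (5.16) are saturated at the band edges:

```latex
\begin{align*}
\varphi(x) = e^{i k\cdot x}
\;\Longrightarrow\;
\epsilon &= -s(1 + 2d\,\nu^2) - s\nu^2 \sum_{\mu=1}^{d} \bigl(e^{i k_\mu} + e^{-i k_\mu}\bigr) \\
&= -s - 2\nu^2 s \sum_{\mu=1}^{d} (1 + \cos k_\mu) = \varepsilon_1(k), \\[4pt]
\ell_{\mathrm L} = \sum_{y \ne x} \ell_{x,y} = 2d
\;\Longrightarrow\;
-s(1 + 4d\,\nu^2) &\le \epsilon \le -s .
\end{align*}
```

The lower bound corresponds to $k = 0$, and the upper bound is approached as every $k_\mu$ tends to $\pm\pi$.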
6. Proof

6.1. Proof of the main theorem. The basic strategy of the proof is first to show the appearance of ferromagnetism in a local piece of the system, and then to “connect” these local ferromagnetisms together to get the desired ferromagnetic state on the whole system. Of course such a “connection” usually does not work in itinerant electron systems where electrons behave as “waves.” Our method makes full use of special features of our model as well as of ferromagnetic states.

Our proof is based on the decomposition of the Hamiltonian
\[
H = \sum_{x\in\mathcal{E}} h_x,
\tag{6.1}
\]
where $h_x$ acts only on the sublattice $\Lambda_x$. The local Hamiltonian $h_x$ is defined as
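\[
h_x = -s \sum_{\sigma=\uparrow,\downarrow} a^\dagger_{x,\sigma} a_{x,\sigma}
+ \frac{t}{n} \sum_{u\in\Lambda_x\cap\mathcal{I}} \sum_{\sigma=\uparrow,\downarrow} b^\dagger_{u,\sigma} b_{u,\sigma}
+ \frac{U}{n'} \sum_{y\in\Lambda_x\cap\mathcal{E}} n_{y,\uparrow} n_{y,\downarrow}
+ \frac{U}{n} \sum_{u\in\Lambda_x\cap\mathcal{I}} n_{u,\uparrow} n_{u,\downarrow},
\tag{6.2}
\]
where $n' = |\Lambda_x \cap \mathcal{E}|$. It should be stressed that $h_x$ with neighboring $x$ do not commute with each other. One therefore cannot diagonalize all $h_x$ simultaneously. As for the lowest eigenvalue and the corresponding eigenstates, however, we have the following. This lemma plays a key role in our proof of the theorem.

Lemma 6.1. When $t/s$, $U/s$ and $1/\nu$ are sufficiently large, the lowest eigenvalue of $h_x$ is $-s(1 + m\nu^2)$, and any corresponding eigenstate $\Phi$ can always be written as
\[
\Phi = a^\dagger_{x,\uparrow}\Phi_1 + a^\dagger_{x,\downarrow}\Phi_2,
\tag{6.3}
\]
where $\Phi_1$, $\Phi_2$ are suitable states with $N_e - 1$ electrons. The eigenstate also satisfies
\[
c_{r,\uparrow} c_{r,\downarrow}\Phi = 0, \quad \text{for any } r \in \Lambda_x.
\tag{6.4}
\]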
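The decomposition (6.1) rests on a counting argument: every internal site lies in the sublattice $\Lambda_x$ of exactly $n$ external sites $x$, and every external site lies in exactly $|\Lambda_x \cap \mathcal{E}|$ of them, so the local prefactors add up to $t$ and $U$. The sketch below checks this on two example lattices, assuming periodic boundary conditions; the lattice builders and the function names (`chain_cells`, `square_cells`, `sublattice`, `local_weights`) are hypothetical helpers written for this illustration.

```python
def chain_cells(M):
    """One-dimensional chain (n = m = 2): cell i holds external sites i and i+1 (mod M)."""
    return {("I", i): [("E", i), ("E", (i + 1) % M)] for i in range(M)}

def square_cells(L):
    """Decorated square lattice (n = 2, m = 4): one internal site on each bond of an L x L torus."""
    cells = {}
    for i in range(L):
        for j in range(L):
            cells[("I", i, j, 0)] = [("E", i, j), ("E", (i + 1) % L, j)]
            cells[("I", i, j, 1)] = [("E", i, j), ("E", i, (j + 1) % L)]
    return cells

def sublattice(x, cells):
    """Lambda_x: the union of all cells that contain the external site x."""
    sites = set()
    for u, xs in cells.items():
        if x in xs:
            sites.add(u)
            sites.update(xs)
    return sites

def local_weights(cells, t, U):
    """Total prefactor each site receives when the local terms are summed over all x."""
    externals = {x for xs in cells.values() for x in xs}
    n = len(next(iter(cells.values())))               # external sites per cell
    t_total = {u: 0.0 for u in cells}
    U_total = {r: 0.0 for r in list(cells) + list(externals)}
    for x in externals:
        lam = sublattice(x, cells)
        n_prime = sum(1 for r in lam if r in externals)   # number of external sites in Lambda_x
        for r in lam:
            if r in cells:                            # internal site: t/n and U/n
                t_total[r] += t / n
                U_total[r] += U / n
            else:                                     # external site: U/n'
                U_total[r] += U / n_prime
    return t_total, U_total

for name, cells in (("chain", chain_cells(6)), ("square", square_cells(4))):
    t_tot, U_tot = local_weights(cells, t=1.0, U=2.0)
    print(name, "t range:", min(t_tot.values()), max(t_tot.values()),
          "U range:", min(U_tot.values()), max(U_tot.values()))
```

The check covers only the $t$ and $U$ terms; the $-s$ term needs no counting because each $x$ contributes its own $a^\dagger_{x,\sigma} a_{x,\sigma}$ exactly once.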
We shall prove Theorem 5.2 assuming Lemma 6.1. The lemma will be proved in Sect. 6.3.

Since $h_x \ge -s(1 + m\nu^2)$, we have $H = \sum_{x\in\mathcal{E}} h_x \ge -s(1 + m\nu^2)|\mathcal{E}|$. A straightforward calculation using the anticommutation relations (5.5) and (5.6) shows that the state (5.14) is an eigenstate of $H$ with the eigenvalue $-s(1 + m\nu^2)|\mathcal{E}|$. Therefore we see that the state (5.14) is a ground state. Our goal here is to characterize all the ground states.

Let $\Phi$ be an arbitrary eigenstate of $H$ with the eigenvalue $-s(1 + m\nu^2)|\mathcal{E}|$. Then it follows from $h_x \ge -s(1 + m\nu^2)$ that
\[
h_x \Phi = -s(1 + m\nu^2)\Phi,
\tag{6.5}
\]
for any $x \in \mathcal{E}$. Thus $\Phi$ satisfies the properties stated in Lemma 6.1.

Let us expand $\Phi$ in the basis states $\Psi_0^{(\nu)}(E_\uparrow, E_\downarrow, I_\uparrow, I_\downarrow)$ of (5.11). Since $\Phi$ satisfies (6.3) for any $x \in \mathcal{E}$, it follows that only those basis states with $E_\uparrow \cup E_\downarrow = \mathcal{E}$ contribute. Since the electron number is $|E_\uparrow| + |E_\downarrow| + |I_\uparrow| + |I_\downarrow| = N_e = |\mathcal{E}|$, the condition $E_\uparrow \cup E_\downarrow = \mathcal{E}$ implies $E_\uparrow \cap E_\downarrow = \emptyset$ and $I_\uparrow = I_\downarrow = \emptyset$. Therefore the expansion of $\Phi$ in the basis states (5.11) can be rearranged into a “spin system representation” as
\[
\Phi = \sum_{\sigma} \psi(\sigma) \Bigl(\prod_{x\in\mathcal{E}} a^\dagger_{x,\sigma(x)}\Bigr)\Phi_{\rm vac},
\tag{6.6}
\]
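where $\sigma = (\sigma(x))_{x\in\mathcal{E}}$ is summed over all the spin configurations with $\sigma(x) = \uparrow, \downarrow$, and $\psi(\sigma)$ are complex coefficients.

We then examine the property (6.4) for $u \in \mathcal{I}$. From the definition (5.3), we find that for any $u \in \mathcal{I}$,
\[
c_{u,\uparrow} c_{u,\downarrow} \Bigl(\prod_{x\in\mathcal{E}} a^\dagger_{x,\sigma(x)}\Bigr)\Phi_{\rm vac}
= \nu^2 \sum_{\substack{y,z\in C_u\setminus\{u\}\\ y\ne z}} \mathrm{sgn}(y,z)\, \chi[\sigma(y) = \uparrow, \sigma(z) = \downarrow]
\Bigl(\prod_{x\in\mathcal{E}\setminus\{y,z\}} a^\dagger_{x,\sigma(x)}\Bigr)\Phi_{\rm vac},
\tag{6.7}
\]
where the sign factor $\mathrm{sgn}(y,z)$ comes from the anticommutation relations, and satisfies $\mathrm{sgn}(y,z) = -\mathrm{sgn}(z,y)$. The prefactor $\nu^2$ arises because each of $c_{u,\uparrow}$ and $c_{u,\downarrow}$ contributes a factor $-\nu$ through $\{c_{u,\sigma}, a^\dagger_{y,\sigma}\} = -\nu$ for $y \in C_u \setminus \{u\}$. The characteristic function $\chi[\cdot]$ is defined as usual by $\chi[\text{true}] = 1$ and $\chi[\text{false}] = 0$. By using (6.6) and (6.7), we find for any $u \in \mathcal{I}$ that
\[
c_{u,\uparrow} c_{u,\downarrow}\Phi
= \nu^2 \sum_{\sigma} \sum_{\substack{y,z\in C_u\setminus\{u\}\\ y > z}} \mathrm{sgn}(y,z)\, \chi[\sigma(y) = \uparrow, \sigma(z) = \downarrow]\, \{\psi(\sigma) - \psi(\sigma_{y\leftrightarrow z})\}
\Bigl(\prod_{x\in\mathcal{E}\setminus\{y,z\}} a^\dagger_{x,\sigma(x)}\Bigr)\Phi_{\rm vac},
\tag{6.8}
\]
where we have introduced an arbitrary ordering in $\mathcal{E}$ to avoid double counting. The spin configuration $\sigma_{y\leftrightarrow z}$ is obtained from $\sigma = (\sigma(x))_{x\in\mathcal{E}}$ by switching $\sigma(y)$ and $\sigma(z)$. Since the basis states in the sum (6.8) are all linearly independent, we find from the property (6.4) that
\[
\psi(\sigma) = \psi(\sigma_{y\leftrightarrow z}),
\tag{6.9}
\]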
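for any $y, z \in \mathcal{E}$ for which there is $u \in \mathcal{I}$ such that $y, z \in C_u$. Since the whole lattice is connected, (6.9) implies that all $\psi(\sigma)$ with the same $M = \sum_{x\in\mathcal{E}} \sigma(x)$ are identical, where we identify $\sigma(x) = \uparrow, \downarrow$ with $\pm 1/2$. This completes the characterization of the space of the ground states. The ground state which has a fixed total spin in the z-direction is
\[
\Phi_M = \sum_{\substack{\sigma\\ \sum_{x} \sigma(x) = M}} \Bigl(\prod_{x\in\mathcal{E}} a^\dagger_{x,\sigma(x)}\Bigr)\Phi_{\rm vac},
\tag{6.10}
\]
where $M = -(|\mathcal{E}|/2),\ 1 - (|\mathcal{E}|/2),\ \ldots,\ (|\mathcal{E}|/2) - 1,\ |\mathcal{E}|/2$. Thus the ground states are $(|\mathcal{E}| + 1)$-fold degenerate. It is easy to check that
\[
(\hat{S}_{\rm tot})^2 \Phi_M = S_{\max}(S_{\max} + 1)\Phi_M, \qquad \hat{S}^{(z)}_{\rm tot}\Phi_M = M\,\Phi_M,
\tag{6.11}
\]
with $S_{\max} = |\mathcal{E}|/2$ being the maximum possible value of the total spin of $N_e = |\mathcal{E}|$ electrons.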
6.2. Some extensions. Let us make two brief remarks about extensions of Theorem 5.2.

The first extension deals with the one dimensional model of Fig. 4, which (in the notation of Sect. 5.1) has $n = m = 2$. In this model, Tanaka [29] observed that the statement of Theorem 5.2 can be proved if one first fixes arbitrary $\nu > 0$ and then takes sufficiently large $t/s$ and $U/s$. To show this extended theorem, one proves the statement corresponding to Lemma 6.1 by the method we used in [6] to prove the similar lemma. The difference between the lemma in [6] and that in the present paper comes from a difference in the definitions of the local Hamiltonian. Unlike the definition (6.2) in the present paper, we did not include the on-site repulsion terms from the external sites other than $x$ in the local Hamiltonian used in [6]. This seemingly minor difference indeed makes a considerable difference in the conditions that we obtain in the limit $U \to \infty$. The same method as in [6] thus yields much stronger information for the local Hamiltonian defined as in the present paper¹¹. We leave the details to interested readers.

The second extension is much more straightforward and less important. For arbitrary complex coefficients $f_u$, define
\[
B = \sum_{\sigma=\uparrow,\downarrow} \Bigl(\sum_{u\in\mathcal{I}} f_u\, b_{u,\sigma}\Bigr)^\dagger \Bigl(\sum_{u\in\mathcal{I}} f_u\, b_{u,\sigma}\Bigr),
\tag{6.12}
\]
which is obviously positive semidefinite. From the expression (6.10) for the ground states and the anticommutation relations (5.5), one readily finds that $B\Phi = 0$ for any ground state $\Phi$.

¹¹ After the publication of [6], Kubo [30] and Shen [31, 23] independently noticed the importance of including the on-site repulsions from the external sites in the local Hamiltonian.
This means that one may add to the Hamiltonian the new hopping terms
\[
H' = \sum_{j} \sum_{\sigma=\uparrow,\downarrow} \Bigl(\sum_{u\in\mathcal{I}} f^{(j)}_u\, b_{u,\sigma}\Bigr)^\dagger \Bigl(\sum_{u\in\mathcal{I}} f^{(j)}_u\, b_{u,\sigma}\Bigr),
\tag{6.13}
\]
with arbitrary $f^{(j)}_u$ without modifying the ferromagnetic ground states. In this manner, one can modify, for example, the models in Sect. 4 so that all the bands become dispersive while maintaining the appearance (and the provability) of saturated ferromagnetism.
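6.3. Proof of Lemma 6.1. It suffices to prove the lemma for $h_o$ with a fixed $o \in \mathcal{E}$. Since $h_o$ acts only on $\Lambda_o$, we only consider an electron system defined on $\Lambda_o$ without specifying the electron number. We also write $\mathcal{E}_o = \Lambda_o \cap \mathcal{E}$ and $\mathcal{I}_o = \Lambda_o \cap \mathcal{I}$. The local Hamiltonian that we consider is
\[
h_o = -s \sum_{\sigma=\uparrow,\downarrow} a^\dagger_{o,\sigma} a_{o,\sigma}
+ \frac{t}{n} \sum_{u\in\mathcal{I}_o} \sum_{\sigma=\uparrow,\downarrow} b^\dagger_{u,\sigma} b_{u,\sigma}
+ \frac{U}{n'} \sum_{x\in\mathcal{E}_o} n_{x,\uparrow} n_{x,\downarrow}
+ \frac{U}{n} \sum_{u\in\mathcal{I}_o} n_{u,\uparrow} n_{u,\downarrow},
\tag{6.14}
\]
where $n' = |\mathcal{E}_o|$. We stress that the statement of the lemma is about the property of a finite dimensional matrix $h_o$. It is thus possible (in principle) to prove the lemma for fixed $n$, $m$ by using a computer¹². But the problem is indeed rather delicate, and the proof for general cases seems highly nontrivial.

As we have restricted our lattice, we redefine (only in this proof) the operator $a_{x,\sigma}$ for $x \in \mathcal{E}_o \setminus \{o\}$ as
\[
a_{x,\sigma} = c_{x,\sigma} - \nu \sum_{u\in\Lambda_x\cap\mathcal{I}_o} c_{u,\sigma}.
\tag{6.15}
\]
The definition of $b_{u,\sigma}$ is unchanged. Note that we still have $\{a_{x,\sigma}, b^\dagger_{u,\tau}\} = 0$. Exactly as in (5.11), any state defined on $\Lambda_o$ can be written as a linear combination of the basis states
\[
\Psi_1^{(\nu)}(E_\uparrow, E_\downarrow, I_\uparrow, I_\downarrow)
= \Bigl(\prod_{x\in E_\uparrow} a^\dagger_{x,\uparrow}\Bigr)
\Bigl(\prod_{x\in E_\downarrow} a^\dagger_{x,\downarrow}\Bigr)
\Bigl(\prod_{u\in I_\uparrow} b^\dagger_{u,\uparrow}\Bigr)
\Bigl(\prod_{u\in I_\downarrow} b^\dagger_{u,\downarrow}\Bigr)\Phi_{\rm vac},
\tag{6.16}
\]
with arbitrary subsets $E_\uparrow, E_\downarrow \subset \mathcal{E}_o$ and $I_\uparrow, I_\downarrow \subset \mathcal{I}_o$. Here we do not fix the electron number, which is given by $|E_\uparrow| + |E_\downarrow| + |I_\uparrow| + |I_\downarrow|$.

¹² The numerical values of $t/s$ in the caption to Fig. 4 were obtained by using a computer. See also Fig. 2 of [6]. Shen [23] has done this for some models in higher dimensions.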
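6.3.1. The limit $t, U \to \infty$. Let us first consider the limit where $t \to \infty$ and $U \to \infty$. It is easily found that the lowest eigenvalue of $h_o$ is finite in this limit. (Try, for example, the state $a^\dagger_{o,\uparrow}\Phi_{\rm vac}$.) Note that the parts in $h_o$ which contain $t$ are $(t/n) \sum_{u\in\mathcal{I}_o} \sum_{\sigma=\uparrow,\downarrow} b^\dagger_{u,\sigma} b_{u,\sigma}$, and those which contain $U$ are $(U/n') \sum_{x\in\mathcal{E}_o} n_{x,\uparrow} n_{x,\downarrow} + (U/n) \sum_{u\in\mathcal{I}_o} n_{u,\uparrow} n_{u,\downarrow}$. Since each term in these sums is positive semidefinite, the necessary and sufficient condition for a state $\Phi$ to have a finite energy in the limit $t \to \infty$, $U \to \infty$ is
\[
b_{u,\sigma}\Phi = 0,
\tag{6.17}
\]
for any $u \in \mathcal{I}_o$ and $\sigma = \uparrow, \downarrow$, and
\[
c_{r,\uparrow} c_{r,\downarrow}\Phi = 0
\tag{6.18}
\]
for any $r \in \Lambda_o$. To get (6.18), we noted that $n_{r,\uparrow} n_{r,\downarrow} = (c_{r,\uparrow} c_{r,\downarrow})^\dagger c_{r,\uparrow} c_{r,\downarrow}$.

To see implications of the condition (6.17), we introduce dual operators $\tilde{b}_{u,\sigma}$ for $u \in \mathcal{I}_o$ and $\sigma = \uparrow, \downarrow$ which satisfy
\[
\{\tilde{b}_{u,\sigma}, b^\dagger_{v,\tau}\} = \delta_{u,v}\,\delta_{\sigma,\tau},
\tag{6.19}
\]
for any $u, v \in \mathcal{I}_o$ and $\sigma, \tau = \uparrow, \downarrow$. More precisely the construction is as follows. Define the Gram matrix $G$ by $(G)_{u,v} = \{b_{u,\sigma}, b^\dagger_{v,\sigma}\}$, where $u, v$ run over $\mathcal{I}_o$. The linear independence of the basis states implies that $G$ is invertible. For $u \in \mathcal{I}_o$ and $\sigma = \uparrow, \downarrow$, define
\[
\tilde{b}_{u,\sigma} = \sum_{w\in\mathcal{I}_o} (G^{-1})_{u,w}\, b_{w,\sigma},
\tag{6.20}
\]
for which it is easy to check (6.19). From (6.19) and (6.16), one sees that
\[
b^\dagger_{u,\sigma} \tilde{b}_{u,\sigma}\, \Psi_1^{(\nu)}(E_\uparrow, E_\downarrow, I_\uparrow, I_\downarrow) =
\begin{cases}
\Psi_1^{(\nu)}(E_\uparrow, E_\downarrow, I_\uparrow, I_\downarrow), & \text{if } u \in I_\sigma;\\
0, & \text{otherwise},
\end{cases}
\tag{6.21}
\]
for any $u \in \mathcal{I}_o$ and $\sigma = \uparrow, \downarrow$.

Let $\Phi$ be an arbitrary state satisfying (6.17). Then since $\tilde{b}_{u,\sigma}$ is a linear combination of $b_{w,\sigma}$, one has $\tilde{b}_{u,\sigma}\Phi = 0$ and hence
\[
b^\dagger_{u,\sigma} \tilde{b}_{u,\sigma}\Phi = 0,
\tag{6.22}
\]
for any $u \in \mathcal{I}_o$ and $\sigma = \uparrow, \downarrow$. Then from (6.21) and the linear independence of the basis states (6.16), one finds that the state $\Phi$, when expanded in the basis states $\Psi_1^{(\nu)}$, cannot include $\Psi_1^{(\nu)}(E_\uparrow, E_\downarrow, I_\uparrow, I_\downarrow)$ with nonempty $I_\uparrow$ or $I_\downarrow$. Therefore we conclude that $\Phi$ is a linear combination of the basis states
\[
\Psi_2^{(\nu)}(E_\uparrow, E_\downarrow) = \Psi_1^{(\nu)}(E_\uparrow, E_\downarrow, \emptyset, \emptyset)
= \Bigl(\prod_{x\in E_\uparrow} a^\dagger_{x,\uparrow}\Bigr)\Bigl(\prod_{x\in E_\downarrow} a^\dagger_{x,\downarrow}\Bigr)\Phi_{\rm vac},
\tag{6.23}
\]
with arbitrary $E_\uparrow, E_\downarrow \subset \mathcal{E}_o$.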
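We then examine the condition (6.18) for $r \in \mathcal{E}_o$. Noting the definitions (5.3), (6.15) of $a_{x,\sigma}$ and (6.23), we see that
\[
a^\dagger_{x,\downarrow} a^\dagger_{x,\uparrow} c_{x,\uparrow} c_{x,\downarrow}\, \Psi_2^{(\nu)}(E_\uparrow, E_\downarrow) =
\begin{cases}
\Psi_2^{(\nu)}(E_\uparrow, E_\downarrow), & \text{if } x \in E_\uparrow \cap E_\downarrow;\\
0, & \text{otherwise},
\end{cases}
\tag{6.24}
\]
for any $x \in \mathcal{E}_o$. Now the condition (6.18) for $r \in \mathcal{E}_o$ implies
\[
a^\dagger_{x,\downarrow} a^\dagger_{x,\uparrow} c_{x,\uparrow} c_{x,\downarrow}\Phi = 0,
\tag{6.25}
\]
for any $x \in \mathcal{E}_o$. Then, as before, we see that $\Phi$ is a linear combination of $\Psi_2^{(\nu)}(E_\uparrow, E_\downarrow)$ with $E_\uparrow, E_\downarrow \subset \mathcal{E}_o$ such that $E_\uparrow \cap E_\downarrow = \emptyset$.

For the state $\Phi$ to have finite energy in the limits $t, U \to \infty$, it must further satisfy
\[
c_{u,\uparrow} c_{u,\downarrow}\Phi = 0
\tag{6.26}
\]
for any $u \in \mathcal{I}_o$. This condition is not as straightforward to treat as the previous two conditions. To see implications of (6.26), we first note that when $E_\uparrow \cap E_\downarrow = \emptyset$,
\[
c_{u,\uparrow} c_{u,\downarrow}\, \Psi_2^{(\nu)}(E_\uparrow, E_\downarrow)
= \nu^2 \sum_{\substack{x,y\in C_u\setminus\{u\}\\ x\ne y}} \chi[x \in E_\uparrow, y \in E_\downarrow]\, \mathrm{sgn}(x, y; E_\uparrow, E_\downarrow)\,
\Psi_2^{(\nu)}(E_\uparrow\setminus\{x\}, E_\downarrow\setminus\{y\}),
\tag{6.27}
\]
where we used the definitions (5.3), (6.15) of $a_{x,\sigma}$, and (6.23) of $\Psi_2^{(\nu)}(E_\uparrow, E_\downarrow)$. Here $\chi[\cdot]$ is the characteristic function as before, and $\mathrm{sgn}(x, y; E_\uparrow, E_\downarrow) = \pm 1$ is the sign factor coming from anticommutation relations. Let us then expand the state $\Phi$ as
\[
\Phi = \sum_{\substack{E_\uparrow, E_\downarrow \subset \mathcal{E}_o\\ E_\uparrow \cap E_\downarrow = \emptyset}} \varphi(E_\uparrow, E_\downarrow)\, \Psi_2^{(\nu)}(E_\uparrow, E_\downarrow).
\tag{6.28}
\]
The zero energy condition (6.26) for any $u \in \mathcal{I}_o$ implies certain relations that the coefficients $\varphi(E_\uparrow, E_\downarrow)$ must satisfy. Noting that the parameter $\nu$ appears in (6.27) only as a prefactor, one finds that these relations for $\varphi(E_\uparrow, E_\downarrow)$ depend only on the lattice structure and do not depend on $\nu$ at all.

Although the precise forms of the relations are not needed here, let us write them down for completeness. The conditions that the coefficients $\varphi(E_\uparrow, E_\downarrow)$ must satisfy are
\[
\sum_{\substack{x,y\in C_u\setminus\{u\}\\ x\ne y}} \chi[x \notin E_\uparrow, y \notin E_\downarrow]\, \mathrm{sgn}(x, y; E_\uparrow\cup\{x\}, E_\downarrow\cup\{y\})\,
\varphi(E_\uparrow\cup\{x\}, E_\downarrow\cup\{y\}) = 0,
\tag{6.29}
\]
for any $E_\uparrow, E_\downarrow \subset \mathcal{E}_o$ such that $E_\uparrow \cap E_\downarrow = \emptyset$, and for any $u \in \mathcal{I}_o$.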
For $\nu \ge 0$, we let $\mathcal{H}^{(\nu)}_{\rm fin}$ be the space of all $\Phi$ which are expanded as (6.28) with the coefficients $\varphi(E_\uparrow, E_\downarrow)$ satisfying the conditions (6.29). For $\nu > 0$, the space $\mathcal{H}^{(\nu)}_{\rm fin}$ is precisely the space of all $\Phi$ which have finite energy (expectation value) in the limit $t, U \to \infty$. The space $\mathcal{H}^{(0)}_{\rm fin}$ has no such interpretation, but it is convenient to define this space. Note that $\mathcal{H}^{(\nu)}_{\rm fin}$ depends continuously on $\nu \ge 0$ since the range of allowed coefficients $\varphi(E_\uparrow, E_\downarrow)$ is independent of $\nu$ and the basis states $\Psi_2^{(\nu)}(E_\uparrow, E_\downarrow)$ are continuous in $\nu$.

We also let $P^{(\nu)}_{\rm fin}$ be the orthogonal projection onto the space $\mathcal{H}^{(\nu)}_{\rm fin}$. Again $P^{(\nu)}_{\rm fin}$ is continuous in $\nu \ge 0$. For $\nu > 0$, to study finite energy states of the local Hamiltonian $h_o$ in the limit $t, U \to \infty$ is equivalent to studying the effective Hamiltonian
\[
h^{(\nu)}_{\rm eff} = P^{(\nu)}_{\rm fin}\, \tilde{h}^{(\nu)}_o\, P^{(\nu)}_{\rm fin},
\tag{6.30}
\]
where
\[
\tilde{h}^{(\nu)}_o = -s \sum_{\sigma=\uparrow,\downarrow} a^\dagger_{o,\sigma} a_{o,\sigma}.
\tag{6.31}
\]
Again we extend the range of $\nu$ and define $h^{(0)}_{\rm eff}$ by (6.30) with $\nu = 0$. Since $\tilde{h}^{(\nu)}_o$ is continuous in $\nu \ge 0$, the effective Hamiltonian $h^{(\nu)}_{\rm eff}$ is also continuous in $\nu \ge 0$.

It follows from the standard argument that the eigenvalues of the local Hamiltonian $h_o$ with given $\nu > 0$ are classified into two sets. In the limit $t, U \to \infty$, the eigenvalues in the first set diverge, while those in the second set converge to the eigenvalues of $h^{(\nu)}_{\rm eff}$ including the degeneracies.

Our next task is to investigate the eigenvalues of $h^{(\nu)}_{\rm eff}$. But this is still a nontrivial problem since $P^{(\nu)}_{\rm fin}$ and $\tilde{h}^{(\nu)}_o$ do not commute.
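6.3.2. The case $\nu = 0$. Let us set $\nu = 0$ and study $h^{(0)}_{\rm eff}$. Although $h^{(0)}_{\rm eff}$ is not really an effective Hamiltonian, we get crucial information about $h^{(\nu)}_{\rm eff}$ by studying $h^{(0)}_{\rm eff}$.

For $\nu = 0$, the operator $a_{x,\sigma}$ is nothing but the basic fermion operator $c_{x,\sigma}$, and the part (6.31) of the local Hamiltonian becomes
\[
\tilde{h}^{(0)}_o = -s\, n_o,
\tag{6.32}
\]
where $n_o = n_{o,\uparrow} + n_{o,\downarrow}$ is the number operator. The problem becomes that of electrons strictly localized at sites in $\mathcal{E}_o$, except for the projection operator $P^{(0)}_{\rm fin}$. The existence of the projection makes the problem nontrivial.

Let us decompose the space $\mathcal{H}^{(0)}_{\rm fin}$ as
\[
\mathcal{H}^{(0)}_{\rm fin} = \mathcal{S}^{(0)} \oplus \mathcal{V}^{(0)} \oplus \mathcal{M}^{(0)}.
\tag{6.33}
\]
Here $\mathcal{S}^{(0)}$ consists of all $\Phi \in \mathcal{H}^{(0)}_{\rm fin}$ which satisfy $n_o\Phi = \Phi$. In other words, $\mathcal{S}^{(0)}$ is the set of states in $\mathcal{H}^{(0)}_{\rm fin}$ with singly occupied $o$. Any $\Phi \in \mathcal{S}^{(0)}$ is written as a linear combination of $\Psi_2^{(0)}(E_\uparrow, E_\downarrow)$ with $E_\uparrow \cup E_\downarrow \ni o$. Similarly $\mathcal{V}^{(0)}$ consists of all $\Phi \in \mathcal{H}^{(0)}_{\rm fin}$ which satisfy $n_o\Phi = 0$. It is the set of states in $\mathcal{H}^{(0)}_{\rm fin}$ with vacant $o$. Any $\Phi \in \mathcal{V}^{(0)}$ is written as a linear combination of $\Psi_2^{(0)}(E_\uparrow, E_\downarrow)$ with $E_\uparrow \cup E_\downarrow \not\ni o$. The space $\mathcal{M}^{(0)}$ is defined as the orthogonal complement.

Note that $\mathcal{S}^{(0)}$ and $\mathcal{V}^{(0)}$ are never empty since $c^\dagger_{o,\sigma}\Phi_{\rm vac} \in \mathcal{S}^{(0)}$ and $c^\dagger_{x,\sigma}\Phi_{\rm vac} \in \mathcal{V}^{(0)}$, where $x \in \mathcal{E}_o \setminus \{o\}$. On the other hand, $\mathcal{M}^{(0)}$ is empty in models with $n = 2$. Since the following argument becomes almost trivial when $\mathcal{M}^{(0)}$ is empty, we shall assume that $\mathcal{M}^{(0)}$ is not empty.

Any $\Phi \in \mathcal{M}^{(0)}$ is uniquely decomposed as
\[
\Phi = c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2 + \Phi_3,
\tag{6.34}
\]
where $\Phi_i$ ($i = 1, 2, 3$) satisfy $n_o\Phi_i = 0$, i.e., $o$ is vacant in these states. We then define
\[
\alpha = \sup_{\Phi\in\mathcal{M}^{(0)}} \frac{\| c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2 \|}{\| c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2 + \Phi_3 \|},
\tag{6.35}
\]
and note that
\[
\alpha < 1.
\tag{6.36}
\]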
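To see this, suppose that α = 1. Since $\mathcal{M}^{(0)}$ is closed there is $\Phi \in \mathcal{M}^{(0)}$ which attains α = 1. Then α = 1 implies $\|c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2 + \Phi_3\| = \|c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2\|$, which means $\Phi_3 = 0$, since $\Phi_3$ is orthogonal to the other two terms. But this means $\Phi \in \mathcal{S}^{(0)}$, which contradicts (6.33).

Now from (6.32), one has

$$h^{(0)}_{\rm eff}\Phi = P^{(0)}_{\rm fin}\tilde h^{(0)}_o P^{(0)}_{\rm fin}\Phi = -s\,P^{(0)}_{\rm fin} n_o \Phi = \begin{cases} -s\,\Phi, & \text{if } \Phi \in \mathcal{S}^{(0)}; \\ 0, & \text{if } \Phi \in \mathcal{V}^{(0)}. \end{cases} \qquad (6.37)$$

Thus any Φ in $\mathcal{S}^{(0)}$ or $\mathcal{V}^{(0)}$ is an eigenstate of $h^{(0)}_{\rm eff}$. The remaining eigenstates are within the space $\mathcal{M}^{(0)}$. As for $\Phi \in \mathcal{M}^{(0)}$, one has

$$\begin{aligned} h^{(0)}_{\rm eff}\Phi &= P^{(0)}_{\rm fin}\tilde h^{(0)}_o P^{(0)}_{\rm fin}\Phi \\ &= -s\,P^{(0)}_{\rm fin} n_o (c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2 + \Phi_3) \\ &= -s\,P^{(0)}_{\rm fin} (c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2). \end{aligned} \qquad (6.38)$$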
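Therefore

$$\begin{aligned} (\Phi, h^{(0)}_{\rm eff}\Phi) &= -s\,(\Phi, P^{(0)}_{\rm fin}(c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2)) \\ &\ge -s\,\|\Phi\|\,\|c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2\|, \end{aligned} \qquad (6.39)$$

and we get

$$\frac{(\Phi, h^{(0)}_{\rm eff}\Phi)}{\|\Phi\|^2} \ge -s\,\frac{\|c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2\|}{\|\Phi\|} \ge -\alpha s > -s, \qquad (6.40)$$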
where we used (6.35) and (6.36). By the variational principle, we see that the eigenvalues of $h^{(0)}_{\rm eff}$ within the space $\mathcal{M}^{(0)}$ are not less than −αs.

Thus we found that the lowest eigenvalue of $h^{(0)}_{\rm eff}$ is −s and its degeneracy is equal to the dimension γ of the space $\mathcal{S}^{(0)}$. Note that γ is not vanishing since $\mathcal{S}^{(0)}$ is not empty. There is a finite gap above the lowest eigenvalue.
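The step from (6.38) to (6.40) can be spelled out as follows. This is only a sketch, under the assumptions that $P^{(0)}_{\rm fin}$ is the orthogonal projection onto $\mathcal{H}^{(0)}_{\rm fin}$ (hence self-adjoint, and acting as the identity on any Φ in $\mathcal{M}^{(0)}$), that s > 0, and that (6.35) is read as the supremum of a ratio of norms. The shorthand X is introduced here for brevity and is not part of the original notation.

```latex
% Shorthand (hypothetical, for this sketch only):
%   X := c^\dagger_{o,\uparrow}\Phi_1 + c^\dagger_{o,\downarrow}\Phi_2,
% for \Phi \in \mathcal{M}^{(0)} decomposed as in (6.34).
\begin{align*}
(\Phi, h^{(0)}_{\rm eff}\Phi)
  &= -s\,(\Phi, P^{(0)}_{\rm fin} X)
   = -s\,(P^{(0)}_{\rm fin}\Phi, X)
   = -s\,(\Phi, X) \\
  &\ge -s\,\|\Phi\|\,\|X\|
   \qquad \text{(Cauchy--Schwarz, using } s > 0\text{)}, \\
\frac{(\Phi, h^{(0)}_{\rm eff}\Phi)}{\|\Phi\|^2}
  &\ge -s\,\frac{\|X\|}{\|\Phi\|}
   \ge -\alpha s .
\end{align*}
```

Under the same assumptions one also has $(\Phi_3, X) = (\Phi_3, n_o X) = (n_o\Phi_3, X) = 0$, so the Cauchy–Schwarz step is not tight; it is nevertheless sufficient for the gap estimate.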
6.3.3. Non-limiting cases. By using the properties of $h^{(0)}_{\rm eff}$ and the continuity in ν ≥ 0, one finds that $h^{(\nu)}_{\rm eff}$ has γ low lying eigenvalues which are separated from larger eigenvalues by a finite gap, provided that ν > 0 is sufficiently small. By recalling the remark at the end of Sect. 6.3.1, one finds that for sufficiently small ν > 0 and sufficiently large t and U, the local Hamiltonian $h_o$ has γ low lying eigenvalues which are separated from larger eigenvalues by a finite gap. In what follows, we shall explicitly find these low lying eigenvalues (all of which will turn out to be equal to $-s(1 + m\nu^2)$), and characterize all the corresponding eigenstates.

For ν > 0, we define $\mathcal{S}^{(\nu)}$ as the space of all $\Phi \in \mathcal{H}^{(\nu)}_{\rm fin}$ which are written as linear combinations of $\Psi^{(\nu)}(E_\uparrow, E_\downarrow)$ such that $E_\uparrow \cup E_\downarrow \ni o$. $\mathcal{S}^{(\nu)}$ is not empty since $a^\dagger_{o,\sigma}\Phi_{\rm vac} \in \mathcal{S}^{(\nu)}$. In other words, $\mathcal{S}^{(\nu)}$ is the set of all Φ which are expanded as (6.28) with the coefficients $\varphi(E_\uparrow, E_\downarrow)$ satisfying the conditions (6.29) and the additional condition that $\varphi(E_\uparrow, E_\downarrow) = 0$ unless $E_\uparrow \cup E_\downarrow \ni o$. Again we see that the set of allowed coefficients $\{\varphi(E_\uparrow, E_\downarrow)\}$ is independent of ν. Since the basis states $\Psi^{(\nu)}(E_\uparrow, E_\downarrow)$ are mutually linearly independent for each fixed ν ≥ 0, we find that the $\mathcal{S}^{(\nu)}$ for different ν ≥ 0 are all identical as linear spaces. In particular $\mathcal{S}^{(\nu)}$ for any ν > 0 has the same dimension as the space $\mathcal{S}^{(0)}$, i.e., γ. Note that any $\Phi \in \mathcal{S}^{(\nu)}$ can be written uniquely in the form

$$\Phi = a^\dagger_{o,\uparrow}\Phi_1 + a^\dagger_{o,\downarrow}\Phi_2, \qquad (6.41)$$
where $\Phi_1$ and $\Phi_2$ are linear combinations of $\Psi^{(\nu)}(E_\uparrow, E_\downarrow)$ with $E_\uparrow \cap E_\downarrow = \emptyset$ and $E_\uparrow \cup E_\downarrow \not\ni o$.

We will show that, for arbitrary t, U, and ν, any $\Phi \in \mathcal{S}^{(\nu)}$ is an eigenstate of the local Hamiltonian $h_o$ with eigenvalue $-s(1 + m\nu^2)$. Note that this eigenvalue converges to −s as ν → 0, and is γ-fold degenerate. These facts imply that we have precisely located the γ low-lying eigenvalues of $h_o$ for sufficiently large t, U and sufficiently small ν. These low-lying eigenvalues turn out to be completely degenerate, and form the lowest eigenvalue. Since $\Phi \in \mathcal{S}^{(\nu)}$ has all the properties declared in the lemma, this leads us to the lemma.

Let Φ be an arbitrary state in $\mathcal{S}^{(\nu)}$. It only remains to prove that $h_o\Phi = -s(1 + m\nu^2)\Phi$. By construction we have $b^\dagger_{u,\sigma}b_{u,\sigma}\Phi = 0$ for any $u \in I_o$ and σ = ↑, ↓, and $n_{r,\uparrow}n_{r,\downarrow}\Phi = 0$ for any $r \in \Lambda_o$. Thus we only need to show that

$$\sum_{\sigma=\uparrow,\downarrow} a^\dagger_{o,\sigma}a_{o,\sigma}\Phi = (1 + m\nu^2)\Phi. \qquad (6.42)$$
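From the expression (6.41) and $\{a^\dagger_{o,\sigma}, a_{o,\sigma}\} = 1 + m\nu^2$, one has

$$\sum_{\sigma=\uparrow,\downarrow} a^\dagger_{o,\sigma}a_{o,\sigma}\Phi = (1 + m\nu^2)\Phi + \Xi, \qquad (6.43)$$

where

$$\Xi = a^\dagger_{o,\uparrow}\sum_{\sigma=\uparrow,\downarrow} a^\dagger_{o,\sigma}a_{o,\sigma}\Phi_1 + a^\dagger_{o,\downarrow}\sum_{\sigma=\uparrow,\downarrow} a^\dagger_{o,\sigma}a_{o,\sigma}\Phi_2. \qquad (6.44)$$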
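On the other hand, from the expression (6.41) and the zero energy condition (6.26), one has

$$\begin{aligned} c_{u,\uparrow}c_{u,\downarrow}\Phi &= c_{u,\uparrow}c_{u,\downarrow}a^\dagger_{o,\uparrow}\Phi_1 + c_{u,\uparrow}c_{u,\downarrow}a^\dagger_{o,\downarrow}\Phi_2 \\ &= a^\dagger_{o,\uparrow}c_{u,\uparrow}c_{u,\downarrow}\Phi_1 + a^\dagger_{o,\downarrow}c_{u,\uparrow}c_{u,\downarrow}\Phi_2 + \nu(c_{u,\downarrow}\Phi_1 - c_{u,\uparrow}\Phi_2) \\ &= 0, \end{aligned} \qquad (6.45)$$

where we used (5.3). By operating $a^\dagger_{o,\uparrow}a^\dagger_{o,\downarrow}$ from the left, the final two lines yield the relation

$$a^\dagger_{o,\uparrow}a^\dagger_{o,\downarrow}(c_{u,\downarrow}\Phi_1 - c_{u,\uparrow}\Phi_2) = 0, \qquad (6.46)$$

for any $u \in I_o$. By recalling that $a_{o,\sigma} = c_{o,\sigma} - \nu\sum_{u\in I_o} c_{u,\sigma}$ and noting that $c_{o,\sigma}\Phi_1 = c_{o,\sigma}\Phi_2 = 0$, we can rewrite (6.44) as

$$\begin{aligned} \Xi &= a^\dagger_{o,\uparrow}a^\dagger_{o,\downarrow}a_{o,\downarrow}\Phi_1 + a^\dagger_{o,\downarrow}a^\dagger_{o,\uparrow}a_{o,\uparrow}\Phi_2 \\ &= -\nu\sum_{u\in I_o} a^\dagger_{o,\uparrow}a^\dagger_{o,\downarrow}(c_{u,\downarrow}\Phi_1 - c_{u,\uparrow}\Phi_2) \\ &= 0, \end{aligned} \qquad (6.47)$$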
where we used (6.46). Recalling (6.43), this completes the proof of the lemma.

Acknowledgements. It is a pleasure to thank Akinori Tanaka for pointing out crucial flaws in the earlier versions of the present work, and for indispensable discussions and comments. I also wish to thank Tom Kennedy, Tohru Koma, Kenn Kubo, Koichi Kusakabe, Elliott Lieb, Andreas Mielke, Bruno Nachtergaele, Teppei Sekizawa, and Shun-Qing Shen for various useful conversations and discussions.
References

1. Heisenberg, W.J.: Z. Phys. 49, 619 (1928)
2. Tasaki, H.: Phys. Rev. Lett. 69, 1608 (1992)
3. Mielke, A., Tasaki, H.: Commun. Math. Phys. 158, 341 (1993)
4. Tasaki, H.: Phys. Rev. Lett. 73, 1158 (1994)
5. Tasaki, H.: J. Stat. Phys. 84, 535 (1996)
6. Tasaki, H.: Phys. Rev. Lett. 75, 4678 (1995), cond-mat/9509063
7. Lieb, E.H.: In: Advances in Dynamical Systems and Quantum Physics. Singapore: World Scientific, 1995, cond-mat/9311033
8. Tasaki, H.: Prog. Theor. Phys. 99, 489 (1998), cond-mat/9712219
9. Tasaki, H.: J. Phys. Cond. Matt. 10, 4353 (1998), cond-mat/9512169
10. Nagaoka, Y.: Phys. Rev. 147, 392 (1966)
11. Thouless, D.J.: Proc. Phys. Soc. London 86, 893 (1965)
12. Lieb, E.H.: Phys. Rev. Lett. 62, 1201 (1989)
13. Mielke, A.: J. Phys. A24, L73 (1991)
14. Mielke, A.: J. Phys. A24, 3311 (1991)
15. Mielke, A.: J. Phys. A25, 4335 (1992)
16. Mielke, A.: Phys. Lett. A 174, 443 (1993)
17. Mielke, A.: Phys. Rev. Lett. 82, 4312 (1999)
18. Mielke, A.: J. Phys. A, Math. Gen. 32, 8411 (1999)
19. Sekizawa, T.: J. Phys. A, Math. Gen. in press. cond-mat/0304295
20. Mizuno, F., Masuda, H., Hirabayashi, I.: In: Narlikar, A. (ed), Studies of High Temperature Superconductors, Vol. 10, Commack, NY: Nova Science Publisher, 1993
21. Arita, R., Suwa, Y., Kuroki, K., Aoki, H.: Phys. Rev. Lett. 88, 127202 (2002)
22. Kusakabe, K., Maruyama, M.: Phys. Rev. B 67, 092406 (2003), cond-mat/0212391
23. Shen, S.-Q.: Eur. Phys. J. B 2, 11 (1998)
24. Tanaka, A., Idogaki, T.: J. Phys. Soc. Jpn. 67, 401 (1998)
25. Tanaka, A., Idogaki, T.: Physica A 297, 441 (2001)
26. Tanaka, A., Ueda, H.: Phys. Rev. Lett. 90, 067204 (2003), cond-mat/0209423
27. Kusakabe, K., Aoki, H.: Phys. Rev. Lett. 72, 144 (1994)
28. Penc, K., Shiba, H., Mila, F., Tsukagoshi, T.: Phys. Rev. B 54, 4056 (1996), cond-mat/9603042
29. Tanaka, A.: Private communication
30. Kubo, K.: Private communication
31. Shen, S.-Q.: Private communication
Communicated by M. Aizenman
Commun. Math. Phys. 242, 473–500 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0942-1
Abelian Duality and Abelian Wilson Loops

Roberto Zucchini$^{1,2}$

1 Dipartimento di Fisica, Università degli Studi di Bologna, V. Irnerio 46, 40126 Bologna, Italy
2 INFN, sezione di Bologna, Bologna, Italy
Received: 30 October 2002 / Accepted: 13 May 2003 Published online: 10 October 2003 – © Springer-Verlag 2003
Abstract: We consider a pure U (1) quantum gauge field theory on a general Riemannian compact four manifold. We compute the partition function with Abelian Wilson loop insertions. We find its duality covariance properties and derive topological selection rules. Finally, we show that, to have manifest duality, one must assume the existence of twisted topological sectors besides the standard untwisted one.
1. Introduction and Conclusions

Electromagnetic Abelian duality is an old subject that has fascinated theoretical physicists for a long time as a means to explain the quantization of electric charge [1–4] and the apparent absence of magnetic charge [5–9]. Its study has also provided important clues in the analysis of analogous dualities in supersymmetric gauge theory [10, 11], supergravity [12, 13] and string theory [14–17]. It is also of considerable interest for the nontrivial interplay of quantum field theory, geometry and topology that it exhibits [18–21]. The aim of this paper is to further explore these latter aspects of Abelian duality, as we briefly outline next. For an updated review of these matters, see for instance refs. [22–24].

Consider a pure U(1) gauge field theory on a general Riemannian compact four manifold M. The Wick rotated action is

$$S(A, \tau) = \frac{i}{2}\int_M F_A \wedge *F_A + \frac{q^2\theta}{8\pi^2}\int_M F_A \wedge F_A. \qquad (1.1)$$

Here, the charge q and the angle θ are combined as the real and imaginary parts of the complex parameter

$$\tau = \frac{\theta}{2\pi} + i\,\frac{2\pi}{q^2}, \qquad (1.2)$$
varying in the open upper complex half plane $H_+$. A is the physical gauge field. Its field strength $F_A = dA$ satisfies the quantization condition

$$\frac{q}{2\pi}\int_\Sigma F_A \in \mathbb{Z}, \qquad (1.3)$$

for any 2–cycle Σ. The quantization of the gauge field theory is attained as usual by summation over all topological classes of the gauge field and by functional integration of the quantum fluctuations of the gauge field about the vacuum gauge configuration of each class, with the gauge group volume factored out. In this way, one can compute in principle the partition function, possibly with gauge invariant insertions.

It is known that the partition function proper Z(τ) is a modular form of weights $\frac{\chi+\eta}{4}$, $\frac{\chi-\eta}{4}$ of the subgroup $\Gamma_\nu$ of the modular group generated by

$$\tau \to -1/\tau, \qquad \tau \to \tau + \nu, \qquad (1.4)$$

where χ and η are respectively the Euler characteristic and the signature invariant of M, and ν = 1 if M is a spin manifold and ν = 2 otherwise [18]. This property of Z(τ) is what is usually meant by Abelian duality.

The natural question arises whether the partition function with simple gauge invariant insertions exhibits analogous duality covariance properties. Specifically, we shall consider the partition function with insertion of the Abelian Wilson loop associated to a 1–cycle Λ of M:

$$Z(\Lambda, \tau) = Z(\tau)\left\langle \exp\left(iq\oint_\Lambda A\right)\right\rangle_\tau. \qquad (1.5)$$
In due course, we shall discover the following.

a) Due to a peculiar combination of the contributions of the torsion classical topological classes and the quantum fluctuations in the field theory, the partition function Z(Λ, τ) vanishes unless the 1–cycle Λ is a boundary.

b) Z(Λ, τ) is a member of a family of partition functions $Z_A(\Lambda, \tau)$ mixing under the transformations (1.4). $Z_A(\Lambda, \tau)$ is of the general form

$$Z_A(\Lambda, \tau) = \exp\left(-\frac{\pi\sigma(\Lambda)}{{\rm Im}\,\tau}\right) F_A(\Lambda, \tau), \qquad (1.6)$$

where σ(Λ) is the renormalized selfenergy of the classical conserved current associated to the 1–cycle Λ. When the 1–boundary Λ satisfies certain conditions, $F_A(\Lambda, \tau)$ is the Ath component of a vector modular form F(Λ, τ) of weights $\frac{\chi+\eta}{4}$, $\frac{\chi-\eta}{4}$ for the subgroup $\Gamma_\nu$.

c) To have manifest duality, one must assume the existence of twisted topological sectors besides the standard untwisted one, one for each independent value of the index A. $Z_A(\Lambda, \tau)$ is the partition function of the twisted sector A.

In a topologically non-trivial manifold M, the definition of the integral $\oint_\Lambda A$ is not straightforward, as the gauge field A is not a globally defined 1–form. We approach this problem using the theory of the Cheeger–Simons differential characters. This produces however a family of possible definitions of $\oint_\Lambda A$ parameterized by the choices of certain background fields. In spite of this, the result of the calculations of Z(Λ, τ) does not depend on the choices made, as it should.
This fact is related to the selection rules mentioned above. Z(Λ, τ) is non zero when Λ is a 1–boundary. When this happens, the choices entering in the definition of $\oint_\Lambda A$ turn out to be immaterial. The proof of this intriguing result involves an interesting relationship between flat Cheeger–Simons differential characters and Morgan–Sullivan torsion invariants.

The physical significance of the twisted topological sectors remains to be explored. It seems to indicate that the non-perturbative structure of electrodynamics might be far richer than thought so far.

Plan of the paper. In Sect. 2, we introduce the necessary topological set up. We use this to properly define the Wilson loop corresponding to a given 1–cycle. In Sect. 3, we proceed to the calculation of the partition function with a Wilson loop insertion and show that it vanishes unless the associated 1–cycle is a boundary. In Sect. 4, we study the duality properties of the partition function and show the existence of twisted topological sectors. Finally, in the Appendix, we collect some of the technical details of the calculation of the partition function.

Conventions and notation. For a review of the mathematical formalism, see for instance [25]. For a clear exposition of its field theoretic applications, see [26]. In this paper, M denotes a compact connected oriented four manifold.

For a sheaf of Abelian groups $\mathcal{S}$ over M, $H^p(M, \mathcal{S})$ denotes the pth sheaf cohomology group of $\mathcal{S}$ and ${\rm Tor}^p(M, \mathcal{S})$ its torsion subgroup. For an Abelian group G, G denotes the associated constant sheaf on M. For an Abelian Lie group G, $\underline{G}$ denotes the sheaf of germs of smooth G valued functions on M.

$C^s_p(M)$, $Z^s_p(M)$, $B^s_p(M)$ denote the groups of smooth singular p–chains, cycles and boundaries of M, respectively, and b the boundary operator. $H^s_p(M)$ denotes the pth singular homology group and ${\rm Tor}^s_p(M)$ its torsion subgroup.

For an Abelian group G, $C^p_{sG}(M)$, $Z^p_{sG}(M)$, $B^p_{sG}(M)$ denote the groups of smooth singular p–cochains, cocycles and coboundaries of M with coefficients in G, respectively, and d the coboundary operator. $H^p_{sG}(M)$ denotes the pth singular cohomology group with coefficients in G and ${\rm Tor}^p_{sG}(M)$ its torsion subgroup.

$C^p_{dR}(M)$, $Z^p_{dR}(M)$, $B^p_{dR}(M)$ denote the groups of general, closed and exact smooth p–forms of M, respectively, and d the differential operator. $H^p_{dR}(M)$ denotes the pth de Rham cohomology space. Further, $Z^p_{dR\mathbb{Z}}(M)$ denotes the subgroup of closed smooth p–forms of M with integer periods and $H^p_{dR\mathbb{Z}}(M)$ the integer cohomology lattice in $H^p_{dR}(M)$. q denotes the natural homomorphism of $H^p(M, \mathbb{Z})$ into $H^p_{dR}(M)$. $b_p$ denotes the pth Betti number.

When M is equipped with a metric g, ${\rm Harm}^p(M)$ denotes the space of harmonic p–forms of M and ${\rm Harm}^p_{\mathbb{Z}}(M)$ the lattice ${\rm Harm}^p(M) \cap Z^p_{dR\mathbb{Z}}(M)$. $b_2^\pm$ denotes the dimension of the space of (anti)selfdual harmonic 2–forms.

2. U(1) Principal Bundles, Connections and Cheeger–Simons Characters

In this section, we review well known facts about U(1) principal bundles, connections and Cheeger–Simons differential characters, which are relevant in the following. See ref. [27] for background material.
2.1. Smooth and flat principal bundles. The quantization of Maxwell theory involves a summation over the topological classes of the gauge field. Mathematically, these classes can be identified with the isomorphism classes of smooth U(1) principal bundles, which we describe below.

The group of isomorphism classes of smooth U(1) principal bundles on M, Princ(M), can be identified with the 1st cohomology of the sheaf $\underline{U(1)}$:

$${\rm Princ}(M) = H^1(M, \underline{U(1)}). \qquad (2.1.1)$$
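There is a well known, more convenient alternative characterization of Princ(M), derived as follows. Consider the short exact sequence of sheaves

$$0 \to \mathbb{Z} \xrightarrow{\;i\;} \underline{\mathbb{R}} \xrightarrow{\;e\;} \underline{U(1)} \to 0, \qquad (2.1.2)$$

where i(n) = n for $n \in \mathbb{Z}$ and e(x) = exp(2πix) for $x \in \mathbb{R}$. The associated long exact sequence of sheaf cohomology contains the segment

$$\cdots \to H^1(M, \underline{\mathbb{R}}) \xrightarrow{\;e_*\;} H^1(M, \underline{U(1)}) \xrightarrow{\;c\;} H^2(M, \mathbb{Z}) \xrightarrow{\;i_*\;} H^2(M, \underline{\mathbb{R}}) \to \cdots. \qquad (2.1.3)$$

Since $\underline{\mathbb{R}}$ is a fine sheaf, $H^p(M, \underline{\mathbb{R}}) = 0$ for all p ≥ 1. Therefore, c is an isomorphism $H^1(M, \underline{U(1)}) \cong H^2(M, \mathbb{Z})$. It follows that

$$c : {\rm Princ}(M) \cong H^2(M, \mathbb{Z}). \qquad (2.1.4)$$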
This isomorphism associates to any smooth U(1) principal bundle P its Chern class $c_P$.

Flat U(1) principal bundles play an important role in determining the selection rules of the Abelian Wilson loops, as will be shown later. It is therefore necessary to understand their place within the family of smooth U(1) principal bundles.

The group of isomorphism classes of flat U(1) principal bundles on M, Flat(M), can be identified with the 1st cohomology of the constant sheaf U(1):

$${\rm Flat}(M) = H^1(M, U(1)). \qquad (2.1.5)$$
There is an obvious natural sheaf morphism $U(1) \to \underline{U(1)}$, to which there corresponds a homomorphism $H^1(M, U(1)) \to H^1(M, \underline{U(1)})$ of sheaf cohomology. By (2.1.1), (2.1.5), this can be viewed as a homomorphism of Flat(M) into Princ(M). Its image is the subgroup of smooth isomorphism classes of flat principal bundles, ${\rm Princ}_0(M)$. On account of (2.1.4), ${\rm Princ}_0(M)$ is isomorphic to a subgroup of $H^2(M, \mathbb{Z})$, which we shall identify next. Consider the short exact sequence of sheaves

$$0 \to \mathbb{Z} \xrightarrow{\;i\;} \mathbb{R} \xrightarrow{\;e\;} U(1) \to 0, \qquad (2.1.6)$$

where i and e are defined as above. The associated long exact sequence of sheaf cohomology contains the segment

$$\cdots \to H^1(M, \mathbb{R}) \xrightarrow{\;e_*\;} H^1(M, U(1)) \xrightarrow{\;c\;} H^2(M, \mathbb{Z}) \xrightarrow{\;i_*\;} H^2(M, \mathbb{R}) \to \cdots. \qquad (2.1.7)$$

Recalling that ${\rm Tor}^2(M, \mathbb{Z}) = \ker i_*|_{H^2(M, \mathbb{Z})}$, c induces an isomorphism $H^1(M, U(1))/e_*H^1(M, \mathbb{R}) \cong {\rm Tor}^2(M, \mathbb{Z})$. Using the Čech realization of sheaf cohomology, it is easy
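to see that $H^1(M, U(1))/e_*H^1(M, \mathbb{R})$ is isomorphic to the image of $H^1(M, U(1))$ in $H^1(M, \underline{U(1)})$. Therefore, we conclude that

$$c : {\rm Princ}_0(M) \cong {\rm Tor}^2(M, \mathbb{Z}). \qquad (2.1.8)$$

Combining (2.1.4), (2.1.8), we conclude that there is a commutative diagram

$$\begin{array}{ccc} {\rm Princ}_0(M) & \xrightarrow{\;c\;} & {\rm Tor}^2(M, \mathbb{Z}) \\ \downarrow{\scriptstyle\subseteq} & & \downarrow{\scriptstyle\subseteq} \\ {\rm Princ}(M) & \xrightarrow{\;c\;} & H^2(M, \mathbb{Z}), \end{array} \qquad (2.1.9)$$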
where the horizontal arrows are isomorphisms. This describes in some detail the set of U(1) principal bundles on M.

Before proceeding to the next topic, the following remark is in order. The Chern class $c_P$ of a principal U(1) bundle P belongs by definition to the cohomology group $H^2(M, \mathbb{Z})$. Another definition identifies the Chern class of P with $q(c_P)$, the natural image of $c_P$ in the integer lattice $H^2_{dR\mathbb{Z}}(M)$ of de Rham cohomology. The advantage of the first definition, adopted in this paper, is that it discriminates principal bundles differing by a flat bundle. The second, though more popular in the physics literature, does not.

2.2. The gauge group. The fixing of the gauge symmetry is an essential step of the quantization of Maxwell theory. Below, we recall the main structural properties of the gauge group.

For P ∈ Princ(M), the gauge group of P, Gau(P), can be identified with the 0th cohomology of the sheaf $\underline{U(1)}$:

$${\rm Gau}(P) = H^0(M, \underline{U(1)}). \qquad (2.2.1)$$

Its elements are often called large gauge transformations in the physics literature. The flat gauge group of P, G(P), can similarly be identified with the 0th cohomology of the constant sheaf U(1):

$$G(P) = H^0(M, U(1)). \qquad (2.2.2)$$

Its elements are commonly called rigid gauge transformations. The natural sheaf morphism $U(1) \to \underline{U(1)}$ induces a homomorphism $H^0(M, U(1)) \to H^0(M, \underline{U(1)})$ of sheaf cohomology, which is readily seen to be an injection. Thus, G(P) is isomorphic to a subgroup ${\rm Gau}_0(P)$ of Gau(P). Note that

$$G(P) \cong {\rm Gau}_0(P) \cong U(1). \qquad (2.2.3)$$

Gau(P) and G(P) or ${\rm Gau}_0(P)$ do not depend on P. Therefore, to emphasize this fact, we shall occasionally denote these groups by Gau(M) and G(M) or ${\rm Gau}_0(M)$, respectively.

For $h \in H^0(M, \underline{U(1)})$, define

$$\alpha(h) = \frac{1}{2\pi i}\, h^{-1}dh. \qquad (2.2.4)$$
It is straightforward to show that $\alpha(h) \in Z^1_{dR\mathbb{Z}}(M)$ and that the map $\alpha : H^0(M, \underline{U(1)}) \to Z^1_{dR\mathbb{Z}}(M)$ is a group homomorphism with range $Z^1_{dR\mathbb{Z}}(M)$ and kernel $H^0(M, U(1))$. Thus, on account of (2.2.1)–(2.2.3), we have the important isomorphism

$$\alpha : {\rm Gau}(M)/{\rm Gau}_0(M) \cong Z^1_{dR\mathbb{Z}}(M). \qquad (2.2.5)$$

The counterimage of $B^1_{dR}(M)$ by α is the subgroup ${\rm Gau}_c(M)$ of Gau(M) of the gauge group elements homotopic to the identity. Its elements are called small gauge transformations in the physics literature. Obviously, ${\rm Gau}_0(M) \subseteq {\rm Gau}_c(M)$. Thus,

$$\alpha : {\rm Gau}_c(M)/{\rm Gau}_0(M) \cong B^1_{dR}(M). \qquad (2.2.6)$$

The quotient ${\rm Gau}(M)/{\rm Gau}_c(M)$ is the gauge class group. By the above,

$${\rm Gau}(M)/{\rm Gau}_c(M) \cong H^1_{dR\mathbb{Z}}(M). \qquad (2.2.7)$$
2.3. Connections. After rescaling by a suitable factor q/2π, the photon gauge field of Maxwell theory can mathematically be characterized as a connection of some U(1) principal bundle. Next, we recall the main properties of the set of connections of a U(1) principal bundle.

For any P ∈ Princ(M), the family of connections of P, Conn(P), is an affine space modeled on $C^1_{dR}(M)$. For A ∈ Conn(P),

$$F_A = dA \qquad (2.3.1)$$

is the curvature of A. As is well known, $F_A \in Z^2_{dR\mathbb{Z}}(M)$ and $q(c_P) = [F_A]_{dR}$ (cf. Eq. (2.1.4)).

If P, P′ ∈ Princ(M), A ∈ Conn(P), A′ ∈ Conn(P′), then A + A′ ∈ Conn(PP′). If P′ ∈ ${\rm Princ}_0(M)$ ⊆ Princ(M) is flat, then 0 ∈ Conn(P′). So, if P ∈ Princ(M), P′ ∈ ${\rm Princ}_0(M)$, then Conn(PP′) = Conn(P). In particular, Conn(P′) = Conn(1) = $C^1_{dR}(M)$.

For P ∈ Princ(M), Gau(P) acts on Conn(P) as usual, viz

$$A^h = A + \alpha(h) \qquad (2.3.2)$$

for A ∈ Conn(P) and h ∈ Gau(P) (cf. Eq. (2.2.4)). Note that ${\rm Gau}_0(P)$ is precisely the invariance subgroup of A.
2.4. Cheeger Simons differential characters. As is well known, if A is a connection of some principal U(1) bundle P, the line integral $\oint_\Lambda A$ over some closed path Λ cannot be defined in the usual naive sense, since A suffers local gauge ambiguities and, thus, is not a globally defined 1–form. Nevertheless, one can try to give a meaning to such a formal expression modulo integers using the theory of the Cheeger Simons differential characters, whose main features are described below [27–30].

A Cheeger Simons differential character is a mathematical object having the formal properties characterizing the holonomy map of a principal U(1) bundle. It has however a somewhat wider scope, since it is defined for singular 1–cycles, which are objects more general than closed paths. Roughly speaking, we define the formal integral $\oint_\Lambda A$
as the logarithm of a suitably chosen differential character computed at the appropriate 1–cycle Λ.

A Cheeger Simons differential character is a group homomorphism $\Phi : Z^s_1(M) \to U(1)$ such that there is a 2–form $F_\Phi \in C^2_{dR}(M)$ for which

$$\Phi(bS) = \exp\left(2\pi i\int_S F_\Phi\right), \qquad (2.4.1)$$

for $S \in C^s_2(M)$. The Cheeger Simons differential characters form naturally a group $CS^2(M)$. From (2.4.1), it is simple to see that, for $\Phi \in CS^2(M)$, $F_\Phi \in Z^2_{dR\mathbb{Z}}(M)$ and that the map $F : CS^2(M) \to Z^2_{dR\mathbb{Z}}(M)$, $\Phi \to F_\Phi$ is a group homomorphism.

To any $\Phi \in CS^2(M)$, there is associated a class $c_\Phi \in H^2(M, \mathbb{Z})$ such that $q(c_\Phi) = [F_\Phi]_{dR}$, defined as follows. Since $U(1) \cong \mathbb{R}/\mathbb{Z}$ is a divisible group and $Z^s_1(M)$ is a subgroup of the free group $C^s_1(M)$, there exists a real cochain $f \in C^1_{s\mathbb{R}}(M)$ such that $\Phi = \exp\left(2\pi i f|_{Z^s_1(M)}\right)$. Then, by (2.4.1),

$$\varsigma(S) = f(bS) - \int_S F_\Phi, \qquad S \in C^s_2(M), \qquad (2.4.2)$$

defines an integer cochain $\varsigma \in C^2_{s\mathbb{Z}}(M)$. It is readily checked that $\varsigma \in Z^2_{s\mathbb{Z}}(M)$ is an integer cocycle which, viewed as a real cocycle, is cohomologically equivalent to $F_\Phi$. The choice of f affects ς at most by an integer coboundary. Hence, the class $c_\Phi$ of ς in the 2nd integer cohomology $H^2_{s\mathbb{Z}}(M)$ is unambiguously determined by Φ. The statement then follows from the isomorphism of integer singular and sheaf cohomology. It is simple to see that the map $c : CS^2(M) \to H^2(M, \mathbb{Z})$, $\Phi \to c_\Phi$ is a group homomorphism.

To any $v \in C^1_{dR}(M)$, there is associated an element $\chi_v \in CS^2(M)$ by

$$\chi_v(\Lambda) = \exp\left(2\pi i\int_\Lambda v\right), \qquad \Lambda \in Z^s_1(M). \qquad (2.4.3)$$
One has $F_{\chi_v} = dv$ and $c_{\chi_v} = 0$. Clearly $\chi_v$ depends only on the class of v mod $Z^1_{dR\mathbb{Z}}(M)$ and the map $\chi : C^1_{dR}(M)/Z^1_{dR\mathbb{Z}}(M) \to CS^2(M)$, $[v] \to \chi_v$ is a group homomorphism. When $a \in Z^1_{dR}(M) \subseteq C^1_{dR}(M)$, $\chi_a$ depends only on the cohomology class of a in $H^1_{dR}(M)$ mod $H^1_{dR\mathbb{Z}}(M)$ and the map $\chi : H^1_{dR}(M)/H^1_{dR\mathbb{Z}}(M) \to CS^2(M)$, $[a] \to \chi_a$ is again a group homomorphism.

The above properties are encoded in the short exact sequences

$$0 \to H^1_{dR}(M)/H^1_{dR\mathbb{Z}}(M) \xrightarrow{\;\chi\;} CS^2(M) \xrightarrow{\;(c,F)\;} A^2_{\mathbb{Z}}(M) \to 0, \qquad (2.4.4)$$

$$0 \to C^1_{dR}(M)/Z^1_{dR\mathbb{Z}}(M) \xrightarrow{\;\chi\;} CS^2(M) \xrightarrow{\;c\;} H^2(M, \mathbb{Z}) \to 0. \qquad (2.4.5)$$

Here, $A^2_{\mathbb{Z}}(M)$ is the subset of the Cartesian product $H^2(M, \mathbb{Z}) \times Z^2_{dR\mathbb{Z}}(M)$ formed by the pairs (e, G) such that $q(e) = [G]_{dR}$.

Before entering the details of the definition of the formal integral $\oint_\Lambda A$, with P ∈ Princ(M), A ∈ Conn(P) and $\Lambda \in Z^s_1(M)$, let us recall the properties which it is required to have. First, when Λ is a boundary, so that Λ = bS for some $S \in C^s_2(M)$, one has

$$\oint_\Lambda A = \int_S F_A, \qquad {\rm mod}\ \mathbb{Z}, \qquad (2.4.6)$$
where the integral in the right-hand side is computed according to the ordinary differential geometric prescription. This is a formal generalization of Stokes' theorem. Second, for $v \in C^1_{dR}(M)$, the obvious relation

$$\oint_\Lambda (A + v) = \oint_\Lambda A + \oint_\Lambda v, \qquad {\rm mod}\ \mathbb{Z}, \qquad (2.4.7)$$

holds, where the second integral in the right-hand side is computed according to the ordinary differential geometric prescription. This property may be called semilinearity. Third, for h ∈ Gau(P),

$$\oint_\Lambda A^h = \oint_\Lambda A, \qquad {\rm mod}\ \mathbb{Z}. \qquad (2.4.8)$$

In this way, gauge invariance is ensured. This property, albeit important, is not independent from the others. Indeed, it follows from (2.4.7), (2.3.2) and the fact that $\alpha(h) \in Z^1_{dR\mathbb{Z}}(M)$ and, thus, $\oint_\Lambda \alpha(h) \in \mathbb{Z}$.

Tentatively, for $\Lambda \in Z^s_1(M)$, we define $\oint_\Lambda A$ mod $\mathbb{Z}$ as follows. We consider a character $\Phi \in CS^2(M)$ such that $c_\Phi = c_P$ and $F_\Phi = F_A$. As $q(c_P) = [F_A]_{dR}$, the condition $q(c_\Phi) = [F_\Phi]_{dR}$ is fulfilled. Then, we set

$$\Phi(\Lambda) = \exp\left(2\pi i\oint_\Lambda A\right). \qquad (2.4.9)$$
The definition given is however ambiguous. Indeed, by the exact sequence (2.4.4), the character Φ with the stated properties is not unique, being defined up to a character of the form $\chi_a$ with $a \in Z^1_{dR}(M)$ defined modulo $Z^1_{dR\mathbb{Z}}(M)$. The definition is also not satisfactory, since, apparently, it yields the same result for connections differing by a closed form $a \in Z^1_{dR}(M)$.

To solve these problems, we proceed as follows. With some natural criterion, we fix a reference connection $A_P \in {\rm Conn}(P)$ and a fiducial character $\Phi_P \in CS^2(M)$ such that $c_{\Phi_P} = c_P$ and $F_{\Phi_P} = F_{A_P}$ and declare $\oint_\Lambda A_P$ to be given mod $\mathbb{Z}$ by the above procedure:

$$\Phi_P(\Lambda) = \exp\left(2\pi i\oint_\Lambda A_P\right). \qquad (2.4.10)$$

Next, for a generic connection A ∈ Conn(P), we define a form $v_A \in C^1_{dR}(M)$ depending on A by the relation

$$A = A_P + v_A. \qquad (2.4.11)$$

Then, we set

$$\oint_\Lambda A = \oint_\Lambda A_P + \oint_\Lambda v_A, \qquad {\rm mod}\ \mathbb{Z}. \qquad (2.4.12)$$

It is easy to check that this definition of $\oint_\Lambda A$ has the required properties (2.4.6)–(2.4.8). Note that $\oint_\Lambda A$ depends on P via its Chern class $c_P$ and not simply via $q(c_P) = [F_A]_{dR}$. It is therefore sensitive to torsion. By the isomorphism (2.1.8), the torsion part of $c_P$ reflects the flat factors of P. Thus, $\oint_\Lambda A$ depends explicitly on these latter.

Needless to say, what we have done here is to provide a family of definitions of $\oint_\Lambda A$ parameterized by the choices of $A_P$ and $\Phi_P$. In the next subsection, we shall devise a way of restricting the amount of arbitrariness involved.
2.5. Background connection and character assignments. We consider below the group isomorphism that associates to any $c \in H^2(M, \mathbb{Z})$ the unique (up to smooth equivalence) U(1) principal bundle $P_c$ such that $c_{P_c} = c$. This map is the inverse of the isomorphism (2.1.4).

A background connection assignment is a map that associates to any $c \in H^2(M, \mathbb{Z})$ a connection $A_c \in {\rm Conn}(P_c)$ in such a way that

$$A_{c+c'} = A_c + A_{c'}, \qquad c, c' \in H^2(M, \mathbb{Z}), \qquad (2.5.1)$$

$$A_t = 0, \qquad t \in {\rm Tor}^2(M, \mathbb{Z}). \qquad (2.5.2)$$

We set $F_c = F_{A_c}$. A background character assignment compatible with a background connection assignment $c \to A_c$ is a map that associates to any $c \in H^2(M, \mathbb{Z})$ a character $\Phi_c \in CS^2(M)$ such that $c_{\Phi_c} = c$ and $F_{\Phi_c} = F_c$ and that

$$\Phi_{c+c'} = \Phi_c \cdot \Phi_{c'}, \qquad c, c' \in H^2(M, \mathbb{Z}). \qquad (2.5.3)$$

A background connection assignment $c \to A_c$ and a compatible background character assignment $c \to \Phi_c$ can be constructed as follows. Let $f_r$, $r = 1, \ldots, b_2$ and $t_\rho$, $\rho = 1, \ldots, t_2$ be a set of independent generators of $H^2(M, \mathbb{Z})$, where the $f_r$ are free and the $t_\rho$ are torsion of order $\kappa_\rho$. Every $c \in H^2(M, \mathbb{Z})$ can be written uniquely as

$$c = \sum_r n^r(c) f_r + \sum_\rho k^\rho(c) t_\rho, \qquad (2.5.4)$$

for certain $n^r(c) \in \mathbb{Z}$ depending linearly on c and $k^\rho(c) = 0, 1, \ldots, \kappa_\rho - 1$ depending linearly on c modulo $\kappa_\rho$. Next, choose $A_r \in {\rm Conn}(P_{f_r})$ with curvature $F_{A_r} = F_r$. Then, set

$$A_c = \sum_r n^r(c) A_r. \qquad (2.5.5)$$

Similarly, choose $\Phi_r \in CS^2(M)$ with $c_{\Phi_r} = f_r$ and $F_{\Phi_r} = F_r$ and $\Phi_\rho \in CS^2(M)$ with $c_{\Phi_\rho} = t_\rho$ and $F_{\Phi_\rho} = 0$. As $\kappa_\rho t_\rho = 0$, $\Phi_\rho^{\kappa_\rho} = \chi_a$ for some $a \in Z^1_{dR}(M)$, by the exact sequence (2.4.4). Redefining $\Phi_\rho$ into $\Phi_\rho\chi_{-a/\kappa_\rho}$, one can impose

$$\Phi_\rho^{\kappa_\rho} = 1. \qquad (2.5.6)$$

Then, set

$$\Phi_c = \prod_r \Phi_r^{\,n^r(c)} \cdot \prod_\rho \Phi_\rho^{\,k^\rho(c)}. \qquad (2.5.7)$$

Then, the maps $c \to A_c$ and $c \to \Phi_c$ are respectively a connection assignment and a compatible character assignment.

Let a background connection assignment $c \to A_c$ and a compatible background character assignment $c \to \Phi_c$ be given. For $\Lambda \in Z^s_1(M)$ and $A \in {\rm Conn}(P_c)$, we define $\oint_\Lambda A$ by the procedure expounded in the previous subsection by taking $A_{P_c} = A_c$ and $\Phi_{P_c} = \Phi_c$, for $c \in H^2(M, \mathbb{Z})$. In this way, (2.4.10)–(2.4.12) hold with $A_P$ and $\Phi_P$ replaced by $A_c$ and $\Phi_c$. It is convenient, though not necessary, to choose $A_c$, $\Phi_c$ of the form (2.5.5), (2.5.7). In this way, the arbitrariness inherent in the definition of $\oint_\Lambda A$, discussed at the end of the previous subsection, is reduced to that associated with the choice of $A_r$, $\Phi_r$, $\Phi_\rho$.
2.6. Example, the 4–torus. Since the formalism expounded above is rather abstract, we illustrate it with a simple example. We consider the case where M is the 4–torus $T^4$. As coordinates of $T^4$, we use angles $\theta^i \in [0, 2\pi[$, 1 ≤ i ≤ 4.

The 4–torus $T^4$ has the nice property that torsion vanishes both in homology and in cohomology. Thus, we have the isomorphisms $H^s_p(T^4) \cong H^p_{dR\mathbb{Z}}(T^4) \cong H^p(T^4, \mathbb{Z}) \cong \mathbb{Z}^{C^4_p}$, where $C^4_p = b_p$ is a binomial coefficient. A standard basis of $H^s_p(T^4)$ consists of the homology classes of the singular p–cycles $\Lambda_{a_1\cdots a_p} \in Z^s_p(T^4)$, $1 \le a_1 < \cdots < a_p \le 4$, defined by

$$\theta^i(t_1, \cdots, t_p) = 2\pi\sum_{s=1}^p \delta^i_{a_s} t_s, \qquad 0 \le t_1, \cdots, t_p < 1. \qquad (2.6.1)$$

A standard basis of $H^p_{dR\mathbb{Z}}(T^4)$ consists of the cohomology classes of the integer period p–forms $\omega_{a_1\cdots a_p} \in Z^p_{dR\mathbb{Z}}(T^4)$, $1 \le a_1 < \cdots < a_p \le 4$, defined by

$$\omega_{a_1\cdots a_p} = \frac{1}{(2\pi)^p}\, d\theta^{a_1} \wedge \cdots \wedge d\theta^{a_p}. \qquad (2.6.2)$$

For a given p, the homology and cohomology bases are reciprocally dual.

Since $H^2(T^4, \mathbb{Z}) \cong H^2_{dR\mathbb{Z}}(T^4)$, a principal U(1) bundle on $T^4$ is determined up to equivalence by the de Rham cohomology class of the curvature of any connection. We consider the principal U(1) bundle $P^{ab} \in {\rm Princ}(T^4)$ defined by the de Rham cohomology class of the 2–form

$$F^{ab} = \omega_{ab} \in Z^2_{dR\mathbb{Z}}(T^4), \qquad (2.6.3)$$

with 1 ≤ a < b ≤ 4. $P^{ab}$ is described concretely by the monodromy of a section of the associated line bundle around the 1–cycles $\Lambda_c$,

$$T^{ab}{}_c = \exp(i\delta^a{}_c\theta^b - i\delta^b{}_c\theta^a). \qquad (2.6.4)$$

Any $P \in {\rm Princ}(T^4)$ is expressible as a product of $P^{ab}$'s and their inverses. A connection $A^{ab} \in {\rm Conn}(P^{ab})$ with curvature $F^{ab}$ is

$$A^{ab} = \frac{1}{2(2\pi)^2}\left(\theta^a d\theta^b - \theta^b d\theta^a\right). \qquad (2.6.5)$$
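As a quick consistency check of (2.6.5) against (2.6.2) and (2.6.3), the curvature can be computed directly. The lines below are a sketch that treats the $\theta^a$ as local coordinates on a chart of $T^4$, where $A^{ab}$ is an ordinary 1–form; the symbol $\Lambda_{ab}$ for the basic 2–cycle of (2.6.1) is a notational assumption.

```latex
\begin{align*}
dA^{ab}
  &= \frac{1}{2(2\pi)^2}\left(d\theta^a\wedge d\theta^b - d\theta^b\wedge d\theta^a\right)
   = \frac{1}{(2\pi)^2}\, d\theta^a\wedge d\theta^b
   = \omega_{ab} = F^{ab}, \\
\int_{\Lambda_{ab}} F^{ab}
  &= \frac{1}{(2\pi)^2}\int_0^{2\pi}\! d\theta^a \int_0^{2\pi}\! d\theta^b
   = 1 .
\end{align*}
```

The second line shows that $F^{ab}$ has unit period on the 2–cycle dual to it, as expected of an element of $Z^2_{dR\mathbb{Z}}(T^4)$.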
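$[F^{ab}]_{dR} \in H^2_{dR\mathbb{Z}}(T^4)$ determines unambiguously a class $c^{ab} \in H^2(T^4, \mathbb{Z})$. There is a unique Cheeger Simons character $\Phi^{ab} \in CS^2(T^4)$ such that $F_{\Phi^{ab}} = F^{ab}$, $c_{\Phi^{ab}} = c^{ab}$ and that

$$\Phi^{ab}(\Lambda_c) = 1, \qquad 1 \le c \le 4. \qquad (2.6.6)$$

Indeed, (2.6.6) selects unambiguously a unique character among those such that $F_{\Phi^{ab}} = F^{ab}$, $c_{\Phi^{ab}} = c^{ab}$ (cf. the exact sequence (2.4.5)). By (2.4.1), (2.6.6),

$$\Phi^{ab}(\Lambda) = \exp\left(2\pi i\int_S F^{ab}\right), \qquad (2.6.7)$$

for $\Lambda = \sum_{a=1}^4 n^a\Lambda_a + bS \in Z^s_1(T^4)$ with $n^a \in \mathbb{Z}$ and $S \in C^s_2(T^4)$ a 2–chain. A background connection assignment and a compatible background character assignment are given by

$$A_c = \sum_{1\le a<b\le 4} n_{ab}(c) A^{ab}, \qquad (2.6.8)$$

$$\Phi_c = \prod_{1\le a<b\le 4} (\Phi^{ab})^{n_{ab}(c)}, \qquad (2.6.9)$$

for $c = \sum_{1\le a<b\le 4} n_{ab}(c)\, c^{ab} \in H^2(T^4, \mathbb{Z})$.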
3. The Gauge Partition Function

The physical motivation of the following construction has been given in the introduction. To begin with, to properly define the kinetic term of the photon action and to carry out the gauge fixing and quantization program, we endow M with a fixed background Riemannian metric g.

3.1. The photon action. For any P ∈ Princ(M) and any A ∈ Conn(P), the Wick rotated photon action$^1$ S(A, τ) is given by

$$S(A, \tau) = \pi\int_M F_A \wedge \hat\tau F_A. \qquad (3.1.1)$$

Here, τ varies in the open upper complex half plane $H_+$,

$$\tau = \tau_1 + i\tau_2, \qquad \tau_1 \in \mathbb{R}, \qquad \tau_2 \in \mathbb{R}_+, \qquad (3.1.2)$$

and $\hat\tau$ is the operator

$$\hat\tau = \tau_1 + i * \tau_2. \qquad (3.1.3)$$
The action S(A, τ ) takes the form (1.1) upon expressing τ as in (1.2) and rescaling A into (q/2π )A. The integrality of the de Rham cohomology class of FA translates in the flux quantization condition (1.3) after the rescaling. The action S(A, τ ) has the obvious symmetry A → A + a,
(3.1.4)
1 (M). Unless H 1 (M, R) = 0, this symmetry is larger than gauge symwhere a ∈ ZdR 1 metry, which corresponds to a ∈ ZdR Z (M) (cf. Subsect. 2.2). The field equations can be written as
d τˆ FA = 0.
(3.1.5)
They are equivalent to the vacuum Maxwell equations and the Bianchi identity dFA = 0,
d ∗ FA = 0.
(3.1.6)
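As a short worked step, the equivalence of (3.1.5) and (3.1.6) can be seen by expanding the operator (3.1.3). This is only a sketch; it uses nothing beyond the constancy of $\tau_1$, $\tau_2$ and the fact that $F_A$ is the curvature of a connection.

```latex
\begin{align*}
d\hat\tau F_A &= d\bigl(\tau_1 F_A + i\tau_2\,{*}F_A\bigr) \\
              &= \tau_1\, dF_A + i\tau_2\, d{*}F_A .
\end{align*}
% F_A is a curvature, so dF_A = 0 holds identically (Bianchi identity).
% Since \tau_2 > 0, the condition d\hat\tau F_A = 0 is then equivalent
% to d{*}F_A = 0, the vacuum Maxwell equation.
```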
3.2. The Wilson loop action. The insertion of a Wilson loop along a cycle $\Lambda \in Z_1^s(M)$ amounts to adding to the photon action a coupling of the gauge field A to a one dimensional defect represented by Λ. For any $A \in \mathrm{Conn}(P)$, the interaction term of A and Λ is then

$W(A,\Lambda) = 2\pi \oint_\Lambda A$ mod $2\pi Z$,  (3.2.1)

where the right-hand side is defined in the way expounded in Subsect. 2.4. The fact that Λ is a 1–cycle is equivalent to the conservation of the associated current. (See Subsect. 3.5 below.) As explained in Subsect. 2.4, the definition of $\oint_\Lambda A$ involves choices and, thus, is not unique. It will be necessary to check at the end that the result of our calculations does not depend on the conventions used.

Footnote 1: The Wick rotated action S is related to the Euclidean action $S_E$ as $S = iS_E$.
484
R. Zucchini
3.3. The partition function. The partition function with a Wilson loop insertion is given by

$Z(\Lambda,\tau) = \sum_{P\in\mathrm{Princ}(M)} \int_{A\in\mathrm{Conn}(P)} \frac{DA}{\mathrm{vol}(\mathrm{Gau}(P))}\, \exp\big(iS(A,\tau) + iW(A,\Lambda)\big)$  (3.3.1)

[18–21]. The right-hand side of this expression is the formal mathematical statement of the physical quantization prescription consisting in a summation over all topological classes of the gauge field and a functional integration of the quantum fluctuations of the gauge field about the vacuum gauge configuration of each class with the gauge group volume divided out.

To compute the above formal expression, we exploit heavily the results of Subsect. 2.5. We first set $P = P_c$ with $c \in H^2(M, Z)$ and transform the summation over P into one over c. Next, we choose a background connection assignment $c \to A_c$ and write a generic $A \in \mathrm{Conn}(P_c)$ as

$A = A_c + v$,  (3.3.2)

where $v \in C^1_{dR}(M)$ is a fluctuation, and transform the integration over A into one over v. To evaluate the Wilson loop action, we further pick a background character assignment $c \to \Phi_c$ compatible with the connection assignment $c \to A_c$. It is possible and convenient to impose that the connections $A_c$ of the connection assignment satisfy the Maxwell equation

$d{*}F_c = 0$.  (3.3.3)

To keep the arbitrariness involved in the various choices as controlled as possible, we assume further that the background connection and character assignments $c \to A_c$, $c \to \Phi_c$ are of the form (2.5.5), (2.5.7), respectively. Proceeding in this way, we find that the partition function factorizes in a classical background and a quantum fluctuation factor,

$Z(\Lambda,\tau) = Z_{cl}(\Lambda,\tau)\cdot Z_{qu}(\Lambda,\tau)$,  (3.3.4)

where

$Z_{cl}(\Lambda,\tau) = \sum_{c\in H^2(M,Z)} \exp\Big(i\pi\int_M F_c\wedge\hat\tau F_c + 2\pi i\oint_\Lambda A_c\Big)$,  (3.3.5)

$Z_{qu}(\Lambda,\tau) = \varpi \int_{v\in C^1_{dR}(M)} \frac{Dv}{\mathrm{vol}(Z^1_{dR\,Z}(M))}\, \exp\Big(-\pi\tau_2\int_M dv\wedge{*}dv + 2\pi i\oint_\Lambda v\Big)$.  (3.3.6)

Here, $\varpi$ is a universal Jacobian relating the formal volumes $\mathrm{vol}(\mathrm{Gau}(M))$ and $\mathrm{vol}(Z^1_{dR\,Z}(M))$ (cf. Subsect. 2.2).
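The factorization (3.3.4) can be checked in a few lines. The following is a sketch under the assumptions that M is closed, so that Stokes' theorem produces no boundary terms, and that the Wilson loop term is additive under the splitting (3.3.2); the symbols are those of (3.1.1), (3.1.3), (3.3.2) and (3.3.3).

```latex
\begin{align*}
S(A_c+v,\tau)
 &= \pi\int_M (F_c+dv)\wedge\hat\tau(F_c+dv) \\
 &= \pi\int_M F_c\wedge\hat\tau F_c
   + 2\pi\tau_1\int_M dv\wedge F_c
   + 2\pi i\tau_2\int_M dv\wedge{*}F_c \\
 &\quad + \pi\tau_1\int_M dv\wedge dv
   + i\pi\tau_2\int_M dv\wedge{*}dv .
\end{align*}
% On a closed manifold the three middle integrals are total derivatives:
%   \int_M dv\wedge F_c     = \int_M d(v\wedge F_c)     = 0   (dF_c = 0),
%   \int_M dv\wedge{*}F_c   = \int_M d(v\wedge{*}F_c)   = 0   (d{*}F_c = 0, eq. (3.3.3)),
%   \int_M dv\wedge dv      = \int_M d(v\wedge dv)      = 0 .
% Hence
%   iS(A_c+v,\tau) = i\pi\int_M F_c\wedge\hat\tau F_c - \pi\tau_2\int_M dv\wedge{*}dv ,
% which splits into the exponents of (3.3.5) and (3.3.6).
```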
3.4. Evaluation of the classical partition function. In order for (3.3.3) to hold, the curvatures $F_r$ of the connections $A_r$ appearing in (2.5.5) all satisfy (3.3.3). Hence, the $F_r$ form a basis of the lattice $\mathrm{Harm}^2_Z(M)$. The inverse intersection matrix Q is defined by

$Q_{rs} = \int_M F_r \wedge F_s$.  (3.4.1)
As is well known, Q is a unimodular symmetric integer $b_2 \times b_2$ matrix characterizing the topology of M and Q is even or odd according to whether M is spin or not. As ${*}\mathrm{Harm}^2(M) \subseteq \mathrm{Harm}^2(M)$ and ${*}^2 = 1$ on $\mathrm{Harm}^2(M)$, one has

${*}F_r = \sum_s H^s{}_r F_s$,  (3.4.2)

where H is a non-singular real $b_2 \times b_2$ matrix such that $H^2 = 1$. As $\int_M F \wedge {*}F$ is a norm on $\mathrm{Harm}^2(M)$, QH is a positive definite symmetric $b_2 \times b_2$ matrix. From (2.5.5), one has immediately that

$F_c = \sum_r n^r(c) F_r$.  (3.4.3)

Recalling from Subsect. 2.5 that $\exp\big(2\pi i\oint_\Lambda A_c\big) = \Phi_c(\Lambda)$ and using (2.5.7), we find

$\exp\Big(2\pi i\oint_\Lambda A_c\Big) = \exp\Big(2\pi i\sum_r n^r(c)\oint_\Lambda A_r\Big) \prod_\rho \Phi_\rho(\Lambda)^{k^\rho(c)}$.  (3.4.4)

Using (3.3.5), (3.1.3), (3.4.1)–(3.4.4), we obtain

$Z_{cl}(\Lambda,\tau) = \sum_{c\in H^2(M,Z)} \exp\Big(i\pi\, n(c)^t Q(\tau_1 1 + i\tau_2 H) n(c) + 2\pi i\, n(c)^t\gamma(\Lambda)\Big) \prod_\rho \Phi_\rho(\Lambda)^{k^\rho(c)}$,  (3.4.5)

where

$\gamma_r(\Lambda) = \oint_\Lambda A_r$.  (3.4.6)

From (2.5.4), by setting $n^r = n^r(c)$ and $k^\rho = k^\rho(c)$, we can transform the summation over $c \in H^2(M, Z)$ in a summation over $n^r \in Z$ and $k^\rho = 0, 1, \ldots, \kappa_\rho - 1$. Using (2.5.6), it is easy to see that

$\sum_{k^\rho = 0,\ldots,\kappa_\rho-1} \prod_\beta \Phi_\beta(\Lambda)^{k^\beta} = \prod_\rho \kappa_\rho\, \varsigma(\Lambda)$,  (3.4.7)

where the characteristic map ς is defined by

$\varsigma(\Lambda) = 1$ if $\Phi_\rho(\Lambda) = 1$ for all ρ, $\varsigma(\Lambda) = 0$ otherwise.  (3.4.8)

Thus,

$Z_{cl}(\Lambda,\tau) = \sum_{n\in Z^{b_2}} \exp\Big(i\pi\, n^t Q(\tau_1 1 + i\tau_2 H) n + 2\pi i\, n^t\gamma(\Lambda)\Big) \prod_\rho \kappa_\rho\, \varsigma(\Lambda)$,  (3.4.9)
which is our final expression of the classical partition function.

The origin of the strange looking factor $\varsigma(\Lambda)$ is not difficult to interpret intuitively. Comparing (3.4.3), (3.4.4), we notice that, while the gauge curvature $F_c$ is not sensitive to the torsion part of c (cf. Eq. (2.5.4)), the Abelian Wilson loop $\exp\big(2\pi i\oint_\Lambda A_c\big)$ is. When we sum over all classes $c \in H^2(M, Z)$ in (3.3.5), a finite subsum over all torsion classes $t \in \mathrm{Tor}^2(M, Z)$ is involved. By (3.4.3), (3.4.4), the terms of the subsum differ only by phases, which, on account of (2.5.6), are rational. The superposition of these phases leads to either constructive or destructive interference and yields the factor $\varsigma(\Lambda)$. As explained in Subsect. 2.4, the dependence of the Abelian Wilson loop $\exp\big(2\pi i\oint_\Lambda A_c\big)$ on the torsion part of c can be traced to its dependence on the flat factors of the underlying principal bundle $P_c$. Thus, the factor $\varsigma(\Lambda)$ can ultimately be attributed to an interference effect of the flat topological classes in (3.3.1).
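The interference mechanism behind (3.4.7) can be illustrated numerically. The sketch below assumes a single torsion generator of order κ, so that the flat character takes values in the κ-th roots of unity, Φ(Λ) = exp(2πi m/κ) for some integer m. The function name `torsion_subsum` and the parameters `kappa` and `m` are illustrative choices, not notation from the paper.

```python
import cmath


def torsion_subsum(kappa: int, m: int) -> complex:
    """Return sum_{k=0}^{kappa-1} Phi**k for Phi = exp(2*pi*i*m/kappa).

    kappa plays the role of the order of one torsion generator and Phi
    the value of the corresponding flat character on the cycle
    (both are assumptions made for this illustration).
    """
    phi = cmath.exp(2j * cmath.pi * m / kappa)
    return sum(phi ** k for k in range(kappa))


if __name__ == "__main__":
    kappa = 6
    for m in range(kappa):
        # Print the modulus of the subsum for each root of unity.
        print(m, round(abs(torsion_subsum(kappa, m)), 12))
```

The subsum has modulus κ when the character is trivial on the cycle (constructive interference) and vanishes otherwise (destructive interference), which is the single-generator version of the factor κ_ρ ς(Λ).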
3.5. Evaluation of the quantum partition function. The computation of the quantum partition function proceeds through two basic steps [26]. Firstly, one endows the relevant field spaces with suitable Hilbert structures in order to define the corresponding functional measures. Secondly, one determines the appropriate field kinetic operators required by the definition of the perturbative expansion. In our case, the problem is simplified by the fact that the field theory we are dealing with is free. There are however complications related to gauge invariance and the consequent need for gauge fixing.

In our model, the relevant field spaces are $C^p_{dR}(M)$ with p = 0, 1, corresponding to the Faddeev–Popov ghost field and photon field. The Hilbert structure of $C^p_{dR}(M)$ is defined as usual by

$\langle u, v\rangle = \int_M u \wedge {*}v$, $u, v \in C^p_{dR}(M)$.  (3.5.1)

The relevant kinetic operators are the standard form Laplacians $\Delta_p$ acting on $C^p_{dR}(M)$,

$\Delta_p = (d^\dagger d + d d^\dagger)_p$,  (3.5.2)

which are order 2 elliptic non-negative self adjoint operators.

Since we are using a Hilbert space formalism, it is convenient to express the argument of the exponential in (3.3.6) in terms of the Hilbert structure (3.5.1). To this end, for a cycle $\Lambda \in Z_1^s(M)$, we define a distribution $j_\Lambda$ on $C^1_{dR}(M)$ by

$\langle j_\Lambda, \omega\rangle = \oint_\Lambda \omega$, $\omega \in C^1_{dR}(M)$.  (3.5.3)

As a consequence of the relation $b\Lambda = 0$, one has

$d^\dagger j_\Lambda = 0$.  (3.5.4)

Intuitively, $j_\Lambda$ is the current associated to the 1–cycle Λ and (3.5.4) is the statement that $j_\Lambda$ is conserved.

As one is computing the partition function of a field theory on a generally topologically non-trivial manifold, particular care must be taken for a proper treatment of the zero modes of the kinetic operators. The p = 0 ghost zero modes form the 1–dimensional vector space of constant functions on M, $\mathrm{Harm}^0(M)$. As a basis of this, we choose the constant scalar 1. The p = 1 photon zero modes form the $b_1$–dimensional vector space of harmonic 1–forms of M, $\mathrm{Harm}^1(M)$. As a basis of this, we choose a basis $\{\omega_m\}$, $m = 1, \ldots, b_1$, of the lattice $\mathrm{Harm}^1_Z(M)$ for convenience.
We fix the gauge by imposing the customary Lorentz gauge fixing condition. By using standard Faddeev–Popov type manipulations to perform the gauge fixing, we find

$$Z_{qu}(\Lambda,\tau) = \frac{(\det G_1)^{1/2}}{(2\pi)^{(b_1-1)/2}\,\mathrm{vol}\,M} \left[\frac{\det'(\Delta_0)\,\det'(2\pi\tau_2\Delta_0)}{\det'(2\pi\tau_2\Delta_1)}\right]^{1/2} \prod_n \delta_{\langle j_\Lambda,\omega_n\rangle,0}\; \exp\Big(-\pi^2\big\langle j_\Lambda, (\pi\tau_2\Delta_1)'^{-1} j_\Lambda\big\rangle\Big). \qquad (3.5.5)$$

Here, $\det'(\Delta)$ and $\Delta'^{-1}$ denote the determinant and the inverse of the restriction of Δ to the orthogonal complement of its kernel, respectively, and

$G_{1mn} = \langle\omega_m, \omega_n\rangle$.  (3.5.6)

We collect in the appendix the details of the derivation of (3.5.5). Without going through all that, we can intuitively understand the origin of the various factors appearing in (3.5.5). $[\det'(2\pi\tau_2\Delta_1)]^{-1/2}$ is the photon determinant. Roughly speaking, the combination $[\det'(\Delta_0)\det'(2\pi\tau_2\Delta_0)]^{1/2}$ is the ghost determinant, since the second determinant equals the first up to a $\tau_2$ dependent constant. The factor $\prod_n \delta_{\langle j_\Lambda,\omega_n\rangle,0}$ is yielded by the integration over the photon zero modes that satisfy the Lorentz gauge fixing condition with the volume of the residual gauge symmetry divided out. The zero modes live in the torus $\mathrm{Harm}^1(M)/\mathrm{Harm}^1_Z(M)$. Only the integral $\exp\big(2\pi i\oint_\Lambda v\big)$ in (3.3.6) depends on them. Integration of this phase on the zero modes torus produces the above combination of Kronecker delta functions. Finally, the exponential factor $\exp\big(-\pi^2\langle j_\Lambda, (\pi\tau_2\Delta_1)'^{-1} j_\Lambda\rangle\big)$ is the result of the Gaussian integration in (3.3.6) and represents the selfenergy of the current $j_\Lambda$. The remaining factors are just normalization constants.

In (3.5.5), both the determinants and the argument of the exponential suffer ultraviolet divergences which have to be regularized and renormalized. We regularize the determinants using Schwinger's proper time method, which we now briefly review [26]. Let Δ be an elliptic non-negative self adjoint operator in some Hilbert space of fields on a manifold X. Its proper time regularized determinant is given by

$\det'_\epsilon(\Delta) = \exp\Big(-\int_\epsilon^\infty \frac{dt}{t}\,\big[\mathrm{tr}\exp(-t\Delta) - \dim\ker\Delta\big]\Big)$,  (3.5.7)

where $\epsilon > 0$ is a small ultraviolet cutoff of mass dimension exponent −2. According to the heat kernel expansion,

$\mathrm{tr}\exp(-t\Delta) \sim \sum_{k=0}^\infty t^{(k-\dim X)/\mathrm{ord}\,\Delta} \int_X a_k(\Delta)$, $t \to 0+$,  (3.5.8)

where $a_k(\Delta)$ is a dim X–form depending locally on the background geometry. Using (3.5.7), (3.5.8), it is easy to show that

$$\det'_\epsilon(\Delta) = \epsilon^{-\dim\ker\Delta} \exp\Big(-\sum_{l=1}^{\dim X} \frac{\epsilon^{-l/\mathrm{ord}\,\Delta}}{l/\mathrm{ord}\,\Delta} \int_X a_{\dim X-l}(\Delta) + \ln\epsilon \int_X a_{\dim X}(\Delta) + O(\epsilon)\Big)\, \det'_{ms}(\Delta). \qquad (3.5.9)$$

Here, $\det'_{ms}(\Delta)$ is the finite minimally subtracted renormalized determinant.
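For the reader's convenience, the step from (3.5.7), (3.5.8) to (3.5.9) can be sketched as follows. This is only an outline: the proper time integral is split at an arbitrary finite scale (taken to be t = 1 here, an assumption made purely for illustration), and all cutoff independent contributions are absorbed into the finite factor.

```latex
% Notation: n = \dim X, o = \mathrm{ord}\,\Delta, A_k = \int_X a_k(\Delta).
\begin{align*}
-\ln{\det}'_\epsilon(\Delta)
 &= \int_\epsilon^\infty \frac{dt}{t}\,
    \bigl[\mathrm{tr}\exp(-t\Delta) - \dim\ker\Delta\bigr] \\
 &= \sum_{l=1}^{n} A_{n-l}\int_\epsilon^{1} dt\, t^{-l/o-1}
    + \bigl(A_n - \dim\ker\Delta\bigr)\int_\epsilon^{1}\frac{dt}{t}
    + \text{(finite)} + \text{(terms vanishing as } \epsilon\to 0) \\
 &= \sum_{l=1}^{n} \frac{\epsilon^{-l/o}}{l/o}\,A_{n-l}
    - \bigl(A_n - \dim\ker\Delta\bigr)\ln\epsilon
    + \text{(finite)} + \text{(terms vanishing as } \epsilon\to 0).
\end{align*}
% Exponentiating gives the prefactor \epsilon^{-\dim\ker\Delta}, the power
% divergences and the \ln\epsilon term of (3.5.9); the cutoff independent
% remainder defines \det'_{ms}(\Delta).
```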
We note that, for any κ > 0, one has

$\det'_\epsilon(\kappa\Delta) = \det'_{\kappa\epsilon}(\Delta)$,  (3.5.10)

a simple property that will be useful in the calculations below.

We replace the formally divergent determinants appearing in (3.5.5) with their proper time regularized counterparts and use the expansion (3.5.9). The expressions of the heat kernel forms $a_k(\Delta_p)$ are well known in the literature [31]. In this way, one finds

$$
\begin{aligned}
\left[\frac{\det'_\epsilon(\Delta_0)\,\det'_\epsilon(2\pi\tau_2\Delta_0)}{\det'_\epsilon(2\pi\tau_2\Delta_1)}\right]^{1/2}
= \exp\bigg\{ & -\frac{1}{\epsilon^2}\frac{1}{(8\pi)^2}\Big(1-\frac{3}{(2\pi\tau_2)^2}\Big)\int_M d^4x\,g^{1/2} \\
& +\frac{1}{\epsilon}\frac{1}{(8\pi)^2}\Big(\frac{1}{2\pi\tau_2}+\frac{1}{3}\Big)\int_M d^4x\,g^{1/2}R \\
& +\ln\epsilon\,\frac{1}{(8\pi)^2}\frac{1}{90}\int_M d^4x\,g^{1/2}\big(25R^2-88R^{ij}R_{ij}+13R^{ijkl}R_{ijkl}\big) \\
& +\frac{\ln(2\pi\tau_2)}{(8\pi)^2}\frac{1}{60}\int_M d^4x\,g^{1/2}\big(15R^2-58R^{ij}R_{ij}+8R^{ijkl}R_{ijkl}\big)+O(\epsilon)\bigg\} \\
& \times\,\epsilon^{b_1/2-1}\,(2\pi\tau_2)^{(b_1-1)/2}\,\frac{\det'_{ms}(\Delta_0)}{\det'_{ms}(\Delta_1)^{1/2}}. \qquad (3.5.11)
\end{aligned}
$$

The prefactor $\epsilon^{b_1/2-1}$ can be absorbed into an appropriate ε dependent normalization of the zero mode part of the partition function measure. The local divergences appearing in the exponential can be removed by adding to the action $S(A,\tau)$ (cf. Eq. (3.1.1)) local counterterms with suitable ε dependent coefficients. The general form of these counterterms, predicted also by standard power counting considerations, is

$$\Delta S_\epsilon(\tau) = \frac{i}{(8\pi)^2}\int_M d^4x\,g^{1/2}\Big[c_4(\epsilon,\tau) + c_2(\epsilon,\tau)R + c_0(\epsilon,\tau)R^2 + c_0'(\epsilon,\tau)R^{ij}R_{ij} + c_0''(\epsilon,\tau)R^{ijkl}R_{ijkl}\Big], \qquad (3.5.12)$$

where the suffix of the numerical coefficients denotes the exponent of their mass dimension. If one adopts the minimal subtraction renormalization scheme, one obtains

$$\left[\frac{\det'(\Delta_0)\,\det'(2\pi\tau_2\Delta_0)}{\det'(2\pi\tau_2\Delta_1)}\right]^{1/2}_{ms} = \exp\bigg\{\frac{\ln(2\pi\tau_2)}{(8\pi)^2}\frac{1}{60}\int_M d^4x\,g^{1/2}\big(15R^2-58R^{ij}R_{ij}+8R^{ijkl}R_{ijkl}\big)\bigg\}\,(2\pi\tau_2)^{(b_1-1)/2}\,\frac{\det'_{ms}(\Delta_0)}{\det'_{ms}(\Delta_1)^{1/2}}. \qquad (3.5.13)$$
As it turns out, the τ2 dependence of the resulting renormalized product of determinants has bad duality covariance properties due to the exponential factor. It is possible
to remove the latter by adjusting the finite part of the local counterterms. This amounts to adopting another duality covariant renormalization scheme for which

$$\left[\frac{\det'(\Delta_0)\,\det'(2\pi\tau_2\Delta_0)}{\det'(2\pi\tau_2\Delta_1)}\right]^{1/2}_{dc} = (2\pi\tau_2)^{(b_1-1)/2}\,\frac{\det'_{ms}(\Delta_0)}{\det'_{ms}(\Delta_1)^{1/2}}. \qquad (3.5.14)$$
It is Witten's choice [18] and also ours.

Next, we regularize the Green function by using again Schwinger's proper time method, as described below [26]. Let Δ be an elliptic non-negative self adjoint operator in some Hilbert space of fields on a manifold X as before. Its proper time regularized Green function is

$\Delta'^{-1}_\epsilon = \int_\epsilon^\infty dt\,\big(\exp(-t\Delta) - P(\ker\Delta)\big)$,  (3.5.15)

where $P(\ker\Delta)$ is the orthogonal projector on $\ker\Delta$ and $\epsilon > 0$ is a small ultraviolet cutoff of mass dimension exponent −2. Indeed, carrying out the integration explicitly, one has

$\Delta'^{-1}_\epsilon = \Delta'^{-1}\exp(-\epsilon\Delta)$.  (3.5.16)

We note that, for any κ > 0, one has

$(\kappa\Delta)'^{-1}_\epsilon = \kappa^{-1}\Delta'^{-1}_{\kappa\epsilon}$,  (3.5.17)

as is apparent also from (3.5.16). The heat kernel $\exp(-t\Delta)(x,x')$, $x, x' \in M$, is a bitensor with the small t expansion

$\exp(-t\Delta)(x,x') \sim \frac{1}{(4\pi t)^{\dim X/2}} \exp\Big(-\frac{\sigma(x,x')}{2t}\Big) \sum_{l=0}^\infty t^l f_l(x,x')$, $t \to 0+$.  (3.5.18)

Here, $\sigma(x,x')$ is half the square geodesic distance of x, x'. The $f_l(x,x')$ are certain bitensors of the same type as $\exp(-t\Delta)(x,x')$ [31].

We regularize the formal expression $\langle j_\Lambda, (\pi\tau_2\Delta_1)'^{-1} j_\Lambda\rangle$ appearing in (3.5.5) by replacing $(\pi\tau_2\Delta_1)'^{-1}$ with $(\pi\tau_2\Delta_1)'^{-1}_\epsilon$. The only thing one needs to know about the small t expansion of the heat kernel $\exp(-t\Delta_1)_{ij'}(x,x')$ is that $f_{0ij'}(x,x')|_{x'=x} = g_{ij}(x)$ and $\partial_k f_{0ij'}(x,x')|_{x'=x} = g_{kl}\Gamma^l{}_{ij}(x)$. In this way, one finds
$$\big\langle j_\Lambda, (\pi\tau_2\Delta_1)'^{-1}_\epsilon j_\Lambda\big\rangle = \frac{1}{\epsilon^{1/2}}\frac{2}{(4\pi^2\tau_2)^{3/2}} \int_0^1 dt\,(\Lambda^* g_{tt})^{1/2} + \frac{1}{\pi\tau_2}\,\sigma(\Lambda) + O(\epsilon^{1/2}), \qquad (3.5.19)$$

where $\sigma(\Lambda)$ is a finite constant depending on Λ. In the first term, the 1–cycle Λ is viewed as a parameterized path $\Lambda : [0, 1] \to M$ and the value of the integral is just the length of the path as measured by the metric g. The divergent part can be removed by adding to the interaction action $W(A,\Lambda)$ (cf. Eq. (3.2.1)) a local counterterm of the form

$\Delta W_\epsilon(\Lambda,\tau) = i c_1(\epsilon,\tau) \int_0^1 dt\,(\Lambda^* g_{tt})^{1/2}$  (3.5.20)

with a suitably adjusted ε dependent coefficient of mass dimension exponent 1.
One finds in this way

$$Z_{qu\,ren}(\Lambda,\tau) = \frac{(\det G_1)^{1/2}}{\mathrm{vol}\,M}\,\tau_2^{(b_1-1)/2}\,\frac{\det'_{ms}(\Delta_0)}{\det'_{ms}(\Delta_1)^{1/2}} \prod_n \delta_{\langle j_\Lambda,\omega_n\rangle,0}\; \exp\Big(-\frac{\pi\sigma(\Lambda)}{\tau_2}\Big), \qquad (3.5.21)$$

which is our final expression of the renormalized quantum partition function. The factors appearing in (3.5.21) are easily interpreted. $\det'_{ms}(\Delta_0)$, $\det'_{ms}(\Delta_1)^{-1/2}$ are the renormalized ghost and photon determinants, respectively. $\tau_2^{(b_1-1)/2}$ is the explicit $\tau_2$ dependence of the renormalized determinants. $\sigma(\Lambda)$ is the conventionally normalized renormalized selfenergy of the conserved current $j_\Lambda$ associated with Λ. The origin of the combination $\prod_n \delta_{\langle j_\Lambda,\omega_n\rangle,0}$ was explained below (3.5.6).

3.6. Selection rules. Let us examine the implications of the above calculation. Consider a cycle $\Lambda \in Z_1^s(M)$. From (3.4.8), (3.4.9), it follows that $Z_{cl}(\Lambda,\tau) = 0$ unless $\Phi_\rho(\Lambda) = 1$ for all ρ, that is Λ is contained in the kernel of all characters $\Phi \in CS^2(M)$ such that $c_\Phi \in \mathrm{Tor}^2(M, Z)$, $F_\Phi = 0$. This is the classical selection rule. From (3.5.21), recalling that $\langle j_\Lambda, \omega_k\rangle = \oint_\Lambda \omega_k$ by (3.5.3), it follows that $Z_{qu}(\Lambda,\tau) = 0$ unless $\oint_\Lambda \omega_k = 0$ for all k, that is Λ is a torsion cycle, i.e. $[\Lambda]_s \in \mathrm{Tor}_1^s(M)$. This is the quantum selection rule. From (3.3.4) and the above, we conclude that $Z(\Lambda,\tau) = 0$ identically unless $\Lambda \in Z_1^s(M)$ satisfies

$[\Lambda]_s \in \mathrm{Tor}_1^s(M)$,  (3.6.1)

$\Phi(\Lambda) = 1$, for all $\Phi \in CS^2(M)$ with $F_\Phi = 0$.  (3.6.2)
3.7. Flat characters and the Morgan–Sullivan torsion invariant. Let $\Phi \in CS^2(M)$ be a flat character, i.e. such that $F_\Phi = 0$. Then, $c_\Phi \in \mathrm{Tor}^2(M, Z) \cong \mathrm{Tor}^2_{sZ}(M)$. Therefore, there exist a minimal integer $\nu_\Phi \in N$, an integer cocycle $\rho \in Z^2_{sZ}(M)$ and an integer cochain $s \in C^1_{sZ}(M)$ such that $c_\Phi = [\rho]_{sZ}$ and $\nu_\Phi\rho = ds$. On the other hand, as explained in Subsect. 2.4, there is a real cochain $f \in C^1_{sR}(M)$ such that $\Phi = \exp(2\pi i f)|_{Z_1^s(M)}$, $df \in Z^2_{sZ}(M)$ and $c_\Phi = [df]_{sZ}$. We thus have $df = \rho + dt$ for some integer cochain $t \in C^1_{sZ}(M)$. Let $\Lambda \in Z_1^s(M)$ be such that $[\Lambda]_s \in \mathrm{Tor}_1^s(M)$. Then, there are a minimal $\nu_\Lambda \in N$ and $S \in C_2^s(M)$ such that $\nu_\Lambda\Lambda = bS$. Using the above relations, one easily shows that $\nu_\Lambda f(\Lambda) = \rho(S) + \nu_\Lambda t(\Lambda)$ and $\nu_\Phi\rho(S) = \nu_\Lambda s(\Lambda)$. Thus

$f(\Lambda) = \rho(S)/\nu_\Lambda = s(\Lambda)/\nu_\Phi$ mod Z.  (3.7.1)

Now, using (3.7.1), it is easy to check that $f(\Lambda)$ depends only on the cohomology class $c_\Phi$ of ρ and the homology class $[\Lambda]_s$ of Λ mod Z. Hence, the object defined by

$\langle[\Lambda]_s, c_\Phi\rangle = f(\Lambda)$ mod Z  (3.7.2)

is a topological invariant. It is called the Morgan–Sullivan torsion invariant pairing [32, 33]. It is Z linear in both arguments and non-singular. From the above, we conclude that, for a character $\Phi \in CS^2(M)$ such that $F_\Phi = 0$,

$\Phi(\Lambda) = \exp\big(2\pi i\langle[\Lambda]_s, c_\Phi\rangle\big)$,  (3.7.3)

for all $\Lambda \in Z_1^s(M)$ such that $[\Lambda]_s \in \mathrm{Tor}_1^s(M)$.
3.8. The final form of the selection rules. Using the results of the previous subsection, we can restate the selection rules (3.6.1), (3.6.2) as follows:

$[\Lambda]_s \in \mathrm{Tor}_1^s(M)$,  (3.8.1)

$\langle[\Lambda]_s, c\rangle = 0$ mod Z, $c \in \mathrm{Tor}^2_{sZ}(M)$.  (3.8.2)

As the Morgan–Sullivan pairing is non singular, these are equivalent to

$\Lambda \in B_1^s(M)$.  (3.8.3)
Thus, the partition function $Z(\Lambda,\tau)$ vanishes unless Λ is a 1–boundary. This is the final form of the selection rules of the Abelian Wilson loops. Note that they originate from a non-trivial combination of the classical and quantum selection rules.

Remarkably, in spite of the ambiguity inherent in the definition of the integral $\oint_\Lambda A$, the partition function $Z(\Lambda,\tau)$ is unambiguously defined. Indeed, as explained in Subsect. 2.4, the indetermination of $\oint_\Lambda A$ is of the form $\oint_\Lambda a$ mod Z with $a \in Z^1_{dR}(M)$ and this object vanishes when Λ is a boundary. When, conversely, Λ is not a boundary, $Z(\Lambda,\tau)$ vanishes identically, regardless of the way the ambiguity of $\oint_\Lambda A$ is fixed.

The selection rule found is rather surprising when compared to the result for Abelian Chern Simons theory [34], where non-trivial Abelian Wilson loops are found for non-trivial knots. This calls for an explanation. As a gauge theory on a topologically non-trivial manifold M, Chern Simons theory is rather trivial, since the underlying principal bundle is trivial. For non-trivial bundles, the Chern Simons Lagrangian would not be globally defined on M in general and thus could not be integrated to yield an action. Further, it is implicitly assumed that there are no photon zero modes. This restricts the manifold M to be such that $H^1(M, R) = 0$. Thus, unlike for Maxwell theory, the quantization of Chern Simons theory involves no sum over the topological classes of the gauge field, since only the trivial class is involved. For this reason, the basic interference mechanism involving flat bundles which is partly responsible for the selection rule of Abelian Wilson loops of Maxwell theory is not working in Abelian Chern Simons theory. Further, all 1–cycles one is dealing with are torsion from the start. Finally, in the Abelian Chern Simons model the relevant invariants of a knot are given in terms of the selfenergy of the current associated to the knot, which is of a topological nature.
In Maxwell theory, the selfenergy of a 1–cycle is obviously not topological.

3.9. Example, the 4–torus. We illustrate the above results with an example. We consider again the case where M is the 4–torus $T^4$, which was already discussed in Subsect. 2.6. It is not difficult to compute the τ dependent factor of the partition function $Z(\Lambda,\tau)$. $Z(\Lambda,\tau)$ is given by (3.3.4) with $Z_{cl}(\Lambda,\tau)$, $Z_{qu}(\Lambda,\tau)$ given respectively by (3.4.9), (3.5.21) (after renormalization). Since $\mathrm{Tor}^2(T^4, Z) = 0$, the factor $\prod_\rho \kappa_\rho\,\varsigma(\Lambda)$ appearing in (3.4.9) is identically 1. The Betti numbers $b_1$, $b_2$ of the 4–torus $T^4$ are 4, 6, respectively. It follows that, for a 1–boundary $\Lambda \in B_1^s(T^4)$,

$Z(\Lambda,\tau) = Z_0\,\tau_2^{3/2} \exp\Big(-\frac{\pi\sigma(\Lambda)}{\tau_2}\Big)\,\Theta(\gamma(\Lambda),\tau)$,  (3.9.1)

where $Z_0$ is a constant independent from Λ, τ, $\gamma(\Lambda)$ is defined in (3.4.6) and $\Theta(\gamma,\tau)$ is a certain function of $\gamma \in C^6$, $\tau \in H_+$, given by (3.4.9) with $\gamma(\Lambda)$ replaced by γ and $\prod_\rho \kappa_\rho\,\varsigma(\Lambda)$ set to 1.
It is not difficult to compute $\Theta(\gamma,\tau)$ when $T^4$ is endowed with the standard flat metric

$g = \delta_{ij}\,d\theta^i \otimes d\theta^j$.  (3.9.2)

The 2–forms $\omega^{ab}$, $1 \le a < b \le 4$, defined in (2.6.2), belong to $\mathrm{Harm}^2_Z(T^4)$ and form a basis of the latter. A simple calculation shows that $Q^{ab,cd} = \int_{T^4}\omega^{ab}\wedge\omega^{cd} = \epsilon^{abcd}$ and $QH^{ab,cd} = \int_{T^4}\omega^{ab}\wedge{*}\omega^{cd} = \delta^{ac}\delta^{bd} - \delta^{ad}\delta^{bc}$. If we use the index r = 1, 2, 3, 4, 5, 6 for the pairs (ab) = (12), (34), (13), (24), (14), (23), Q and QH are representable as the 6 × 6 matrices

$Q = \sigma_1 \oplus -\sigma_1 \oplus \sigma_1$, $QH = 1_2 \oplus 1_2 \oplus 1_2$,  (3.9.3)

where $1_2$ is the 2 × 2 unit matrix and $\sigma_1$ is a Pauli matrix. Using (3.9.3), it is straightforward to show that

$\Theta(\gamma,\tau) = \psi(\gamma^{(1)},\tau)\,\psi(\gamma^{(2)},-\bar\tau)\,\psi(\gamma^{(3)},\tau)$, $\gamma = \gamma^{(1)}\oplus\gamma^{(2)}\oplus\gamma^{(3)}$,  (3.9.4)

where $\gamma^{(h)} \in C^2$ and, for $\tau \in H_+$, $g \in C^2$,

$\psi(g,\tau) = \vartheta_2(g_1+g_2|2\tau)\,\vartheta_2(\bar g_1-\bar g_2|2\tau) + \vartheta_3(g_1+g_2|2\tau)\,\vartheta_3(\bar g_1-\bar g_2|2\tau)$,  (3.9.5)

$\vartheta_2$, $\vartheta_3$ being standard Jacobi theta functions.

4. Analysis of Abelian Duality

We now come to the analysis of the duality covariance properties of the partition function with Wilson loop insertion $Z(\Lambda,\tau)$, which is the main subject of the paper.
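Returning briefly to the 4–torus example of Subsect. 3.9, the matrices (3.9.3) can be checked mechanically. The sketch below builds Q from the Levi-Civita symbol and QH from the Kronecker deltas quoted there, using the pair ordering (12), (34), (13), (24), (14), (23). The helper names (`levi_civita`, `delta`, `matmul`, `PAIRS`) are our own and not part of the paper.

```python
# Index pairs (ab) in the order r = 1..6 used for (3.9.3).
PAIRS = [(1, 2), (3, 4), (1, 3), (2, 4), (1, 4), (2, 3)]


def levi_civita(*idx):
    """Totally antisymmetric symbol, normalized so that epsilon(1,2,3,4) = +1."""
    if len(set(idx)) < len(idx):
        return 0
    inversions = sum(
        1
        for i in range(len(idx))
        for j in range(i + 1, len(idx))
        if idx[i] > idx[j]
    )
    return -1 if inversions % 2 else 1


def delta(a, b):
    """Kronecker delta."""
    return 1 if a == b else 0


def matmul(A, B):
    """Product of two square integer matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Q^{ab,cd} = epsilon^{abcd} and (QH)^{ab,cd} = delta^{ac}delta^{bd} - delta^{ad}delta^{bc}.
Q = [[levi_civita(a, b, c, d) for (c, d) in PAIRS] for (a, b) in PAIRS]
QH = [[delta(a, c) * delta(b, d) - delta(a, d) * delta(b, c) for (c, d) in PAIRS]
      for (a, b) in PAIRS]
IDENTITY = [[delta(i, j) for j in range(6)] for i in range(6)]

# Q squares to the identity here, so Q^{-1} = Q and H = Q^{-1}(QH).
H = matmul(Q, QH)

if __name__ == "__main__":
    for row in Q:
        print(row)
```

With these conventions Q comes out block diagonal with blocks σ1, −σ1, σ1, QH is the unit matrix, and H squares to 1, in agreement with (3.9.3) and with the general properties stated below (3.4.2). The vanishing diagonal of Q makes it even, as expected for a spin manifold.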
4.1. Study of the τ dependence and duality. We next study the τ dependence of the partition function $Z(\Lambda,\tau)$. This resides essentially in a ϑ function of the appropriate characteristics. It is therefore necessary to review first some of the basics of the theory of ϑ functions. See for instance [35] for background. We recall that the standard ϑ function with characteristics is defined by

$\vartheta_b\begin{bmatrix} x \\ y\end{bmatrix}(K) = \sum_{n\in Z^b+x} \exp\big(i\pi\, n^t K n + 2\pi i\, n^t y\big)$,  (4.1.1)

where $b \in N$, $x, y \in R^b$ and $K \in C(b)$ such that $K = K^t$ and $\mathrm{Im}\,K > 0$. The main properties of $\vartheta_b\begin{bmatrix} x \\ y\end{bmatrix}(K)$ used below are the following. Using the Poisson resummation formula, one can show that the ϑ function satisfies the relation

$\vartheta_b\begin{bmatrix} x \\ y\end{bmatrix}(K) = \det(-iK)^{-1/2} \exp\big(2\pi i\, x^t y\big)\, \vartheta_b\begin{bmatrix} y \\ -x\end{bmatrix}(-K^{-1})$,  (4.1.2)

where the branch of the square root used is that for which $u^{1/2} > 0$ for u > 0. If $L \in R(b)$ induces an automorphism of the lattice $Z^b$, one has

$\vartheta_b\begin{bmatrix} Lx \\ (L^t)^{-1}y\end{bmatrix}(K) = \vartheta_b\begin{bmatrix} x \\ y\end{bmatrix}(L^t K L)$.  (4.1.3)
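The inversion relation (4.1.2) lends itself to a quick numerical sanity check. The sketch below is restricted to b = 1 (so that K is a single complex number with positive imaginary part) and truncates the lattice sum (4.1.1) at a finite `cutoff`; both restrictions, as well as the function names `theta` and `poisson_rhs`, are assumptions made for illustration only.

```python
import cmath


def theta(x, y, K, cutoff=12):
    """Truncated sum (4.1.1) for b = 1: n runs over m + x with |m| <= cutoff.

    The truncation is adequate when Im K is of order one; a smaller Im K
    would need a larger cutoff.
    """
    total = 0j
    for m in range(-cutoff, cutoff + 1):
        n = m + x
        total += cmath.exp(1j * cmath.pi * K * n * n + 2j * cmath.pi * n * y)
    return total


def poisson_rhs(x, y, K, cutoff=12):
    """Right-hand side of (4.1.2) for b = 1.

    cmath.sqrt uses the principal branch, which is positive on the
    positive real axis, as required below (4.1.2).
    """
    prefactor = 1 / cmath.sqrt(-1j * K)
    phase = cmath.exp(2j * cmath.pi * x * y)
    return prefactor * phase * theta(y, -x, -1 / K, cutoff)


if __name__ == "__main__":
    K, x, y = 0.3 + 0.8j, 0.25, 0.4
    print(abs(theta(x, y, K) - poisson_rhs(x, y, K)))
```

The printed difference should be at the level of floating point rounding, since the neglected terms of the sum are exponentially small for the sample value of K.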
An element $Z \in Z(b)$ with $Z = Z^t$ is called even if $n^t Z n \in 2Z$ for any $n \in Z^b$ and odd otherwise. We set $\nu_Z = 1$ if Z is even and $\nu_Z = 2$ if Z is odd. Then, one has

$\vartheta_b\begin{bmatrix} x \\ y\end{bmatrix}(K) = \exp\big(\nu_Z \pi i\, x^t Z x\big)\, \vartheta_b\begin{bmatrix} x \\ y - \nu_Z Z x\end{bmatrix}(K + \nu_Z Z)$.  (4.1.4)

From (3.3.4), (3.4.9), (3.5.21), the τ dependent factor of the partition function $Z(\Lambda,\tau)$ can be written as

$\mathcal{Z}(\Lambda,\tau) = \exp\Big(-\frac{\pi\sigma(\Lambda)}{\tau_2}\Big)\,\mathcal{F}(\Lambda,\tau)$,  (4.1.5)

where

$\mathcal{F}(\Lambda,\tau) = \tau_2^{(b_1-1)/2}\, \vartheta_{b_2}\begin{bmatrix} 0 \\ \gamma(\Lambda)\end{bmatrix}(K(\tau))$.  (4.1.6)

Here, $\tau = \tau_1 + i\tau_2$ varies in the open upper complex half plane $H_+$. On account of the selection rules derived in Subsect. 3.8, we can assume that $\Lambda \in B_1^s(M)$ is a boundary. $K(\tau)$ is given by

$K(\tau) = Q(\tau_1 + i\tau_2 H)$,  (4.1.7)

where Q and H are defined by (3.4.1), (3.4.2), respectively. Since $Q, H \in R(b_2)$, $Q = Q^t$, $QH = (QH)^t$ and $QH > 0$, $K(\tau) \in C(b_2)$, $K(\tau) = K(\tau)^t$ and $\mathrm{Im}\,K(\tau) > 0$, as required. The vector $\gamma(\Lambda) \in R^{b_2}$ is given by (3.4.6). $\gamma(\Lambda)$ is defined modulo $Z^{b_2}$. Since Λ is a boundary and the curvatures $F_r$ of the connections $A_r$ satisfy the Maxwell equations (3.3.3), $\gamma(\Lambda)$ does not depend on the choice of the $A_r$ modulo $Z^{b_2}$. For convenience, we have extracted the exponential factor $\exp(-\pi\sigma(\Lambda)/\tau_2)$, whose τ dependence is anyway quite simple.

The analysis of duality reduces essentially to the study of the covariance properties of the function $\mathcal{Z}(\Lambda,\tau)$ under a suitable subgroup of the modular group [18, 19], whose main properties we now briefly review [36]. The modular group $\bar\Gamma[1]$ consists of all transformations of the open upper complex half plane $H_+$ of the form

$u(\tau) = \frac{a\tau + b}{c\tau + d}$, with $a, b, c, d \in Z$, $ad - bc = 1$.  (4.1.8)

As is well known, $\bar\Gamma[1]$ is generated by two elements s, t defined by

$s(\tau) = -1/\tau$, $t(\tau) = \tau + 1$.  (4.1.9)

These satisfy the relations

$s^2 = \mathrm{id}$, $(st)^3 = \mathrm{id}$.  (4.1.10)

The modular group $\bar\Gamma[1]$ is isomorphic to the group $PSL(2, Z) \cong SL(2, Z)/\{-1, 1\}$, the isomorphism being defined by

$A(u) = \pm\begin{pmatrix} a & b \\ c & d\end{pmatrix}$,  (4.1.11)

with $u \in \bar\Gamma[1]$ given by (4.1.8). In particular,

$A(s) = \pm\begin{pmatrix} 0 & -1 \\ 1 & 0\end{pmatrix}$, $A(t) = \pm\begin{pmatrix} 1 & 1 \\ 0 & 1\end{pmatrix}$.  (4.1.12)
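The relations (4.1.10) can be verified directly on the matrix representatives (4.1.12). The following is a minimal sketch; the helper names (`matmul2`, `projective`, `mobius`) are hypothetical, and `projective` simply picks one representative of the pair {A, −A} so that equality in PSL(2, Z) can be tested.

```python
def matmul2(A, B):
    """Product of two 2x2 integer matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def projective(A):
    """Canonical representative of the class {A, -A} in PSL(2, Z)."""
    neg = tuple(tuple(-x for x in row) for row in A)
    return max(A, neg)


def mobius(A, tau):
    """Action (4.1.8) of the matrix A on a point tau of the upper half plane."""
    (a, b), (c, d) = A
    return (a * tau + b) / (c * tau + d)


S = ((0, -1), (1, 0))   # representative of s, eq. (4.1.12)
T = ((1, 1), (0, 1))    # representative of t, eq. (4.1.12)
ID = ((1, 0), (0, 1))

ST = matmul2(S, T)
S_SQUARED = matmul2(S, S)
ST_CUBED = matmul2(ST, matmul2(ST, ST))

if __name__ == "__main__":
    print(projective(S_SQUARED) == ID, projective(ST_CUBED) == ID)
```

Both products equal −1 as matrices, hence the identity in PSL(2, Z). The same helpers can be used to represent the generator t² of the Hecke subgroup as `matmul2(T, T)`.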
To efficiently study the duality covariance of $\mathcal{F}(\Lambda,\tau)$, it is necessary to introduce a class of functions of $\tau \in H_+$ defined as follows. Recall that $Q \in Z(b_2)$ and $Q = Q^t$ and, so, Q can be even or odd (according to whether M is spin or not). For $k, l \in Z$ with $kl \in \nu_Q Z$, we set

$\mathcal{F}_{(k,l)}(\Lambda,\tau) = \tau_2^{(b_1-1)/2} \exp\big(-i\pi kl\,\gamma(\Lambda)^t Q^{-1}\gamma(\Lambda)\big)\, \vartheta_{b_2}\begin{bmatrix} kQ^{-1}\gamma(\Lambda) \\ l\gamma(\Lambda)\end{bmatrix}(K(\tau))$.  (4.1.13)

It is readily checked that this expression is defined unambiguously in spite of the $Z^{b_2}$ indeterminacy of $\gamma(\Lambda)$. Our function $\mathcal{F}(\Lambda,\tau)$ is actually a member of this function class, since indeed

$\mathcal{F}(\Lambda,\tau) = \mathcal{F}_{(0,1)}(\Lambda,\tau)$.  (4.1.14)

A simple analysis shows that

$\mathcal{F}_{(k,l)}(\Lambda,\tau) = e^{\frac{i\pi}{4}\eta}\, \tau^{-\frac{\chi+\eta}{4}}\, \bar\tau^{-\frac{\chi-\eta}{4}}\, \mathcal{F}_{(l,-k)}(\Lambda,-1/\tau)$.  (4.1.15)

Here, χ and η are respectively the Euler and signature invariant of M and are given by

$\chi = 2(1 - b_1) + b_2$,  (4.1.16)

$\eta = b_2^+ - b_2^-$.  (4.1.17)
To prove (4.1.15), one uses (4.1.2), (4.1.3) with L = Q, and the relations $b_2 = b_2^+ + b_2^-$ and

$\det(-iK(\tau))^{1/2} = e^{-\frac{i\pi}{4}\eta}\, \tau^{b_2^+/2}\, \bar\tau^{b_2^-/2}$,  (4.1.18)

$-K(\tau)^{-1} = Q^{-1}K(-1/\tau)Q^{-1}$.  (4.1.19)

Using (4.1.4), one shows similarly that

$\mathcal{F}_{(k,l)}(\Lambda,\tau) = \mathcal{F}_{(k,l-\nu_Q k)}(\Lambda,\tau + \nu_Q)$.  (4.1.20)

Let $G_{\nu_Q}$ be the subgroup of $\bar\Gamma[1]$ generated by s and $t^{\nu_Q}$. Specifically, $G_1 = \bar\Gamma[1]$ and $G_2 = \bar\Gamma_\theta$, the so called Hecke subgroup of $\bar\Gamma[1]$. In [18, 19], it was shown that $G_{\nu_Q}$ is the duality group, the subgroup of $\bar\Gamma[1]$ under which the partition function without insertions behaves as a modular form of weights $\frac{\chi+\eta}{4}$, $\frac{\chi-\eta}{4}$. Now, (4.1.15) and (4.1.20) can be written as

$\mathcal{F}_{(k,l)}(\Lambda,\tau) = e^{\frac{i\pi}{4}\eta}\, \tau^{-\frac{\chi+\eta}{4}}\, \bar\tau^{-\frac{\chi-\eta}{4}}\, \mathcal{F}_{(k,l)A(s)^{-1}}(\Lambda,s(\tau)) = \mathcal{F}_{(k,l)A(t^{\nu_Q})^{-1}}(\Lambda,t^{\nu_Q}(\tau))$.  (4.1.21)

Since $\mathcal{F}_{(k,l)}(\Lambda,\tau) = \mathcal{F}_{(-k,-l)}(\Lambda,\tau)$, as is easy to show from (4.1.13) using (4.1.1), the above expressions are unambiguously defined in spite of the sign indeterminacy of $A(s)$ and $A(t^{\nu_Q})$. Equation (4.1.21) states that $\mathcal{F}_{(k,l)}(\Lambda,\tau)$ is a generalized modular form of $G_{\nu_Q}$ of weights $\frac{\chi+\eta}{4}$, $\frac{\chi-\eta}{4}$. In this sense, $G_{\nu_Q}$ continues to be the duality group also for the partition function with Wilson loop insertions.

We denote by $E_{\nu_Q}(\Lambda)$ the subspace of $\mathrm{Fun}(H_+)$ spanned by the functions $\mathcal{F}_{(k,l)}(\Lambda,\tau)$. We note that, when $\gamma(\Lambda)$ satisfies certain restrictions, the functions $\mathcal{F}_{(k,l)}(\Lambda,\tau)$ are not all independent. For instance, if $\gamma(\Lambda) = 0$ mod $Z^{b_2}$, $\mathcal{F}_{(k,l)}(\Lambda,\tau)$ is actually independent from k, l. So, $E_{\nu_Q}(\Lambda)$ may in some instance be finite dimensional. To see how this can come about in greater detail, suppose that $\gamma(\Lambda) \in Q^{b_2}$. Then, there is a minimal $p \in N$ such that $p\gamma(\Lambda) \in Z^{b_2}$. Let $k, l \in Z$ such that $kl \in \nu_Q Z$. Let further $m, n \in Z$ such that $(kn + lm + mnp)p \in \nu_Q Z$. Then, $(k + mp)(l + np) \in \nu_Q Z$ and, as is easy to show from (4.1.13), one has

$\mathcal{F}_{(k+mp,l+np)}(\Lambda,\tau) = \exp\big(2\pi i(nk - ml - mnp)w(\Lambda)/\nu_Q p\big)\, \mathcal{F}_{(k,l)}(\Lambda,\tau)$,  (4.1.22)

where $w(\Lambda) \in Z$ is given by

$w(\Lambda) = \tfrac{1}{2}\nu_Q p^2\, \gamma(\Lambda)^t Q^{-1}\gamma(\Lambda)$.  (4.1.23)

The phase factor is a $\nu_Q p$-th root of unity independent from τ. Therefore, when $\gamma(\Lambda)$ satisfies the above condition, $E_{\nu_Q}(\Lambda)$ is finite dimensional. A standard basis of $E_{\nu_Q}(\Lambda)$ consists of the $\mathcal{F}_{(k,l)}(\Lambda,\tau)$ such that $0 \le k, l \le p - 1$. The dimension of $E_{\nu_Q}(\Lambda)$ is therefore

$n_p = p^2 - [p/2]^2(\nu_Q - 1)$.  (4.1.24)
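The counting behind (4.1.24) can be reproduced by brute force. The sketch below assumes that the standard basis is labelled by the pairs (k, l) with 0 ≤ k, l ≤ p − 1 that also satisfy the condition kl ∈ ν_Q Z imposed on the labels throughout; the function names `count_basis` and `dimension_formula` are illustrative.

```python
def count_basis(p, nu_q):
    """Number of pairs (k, l), 0 <= k, l <= p - 1, with k*l divisible by nu_q."""
    return sum(
        1
        for k in range(p)
        for l in range(p)
        if (k * l) % nu_q == 0
    )


def dimension_formula(p, nu_q):
    """Closed form (4.1.24): n_p = p^2 - [p/2]^2 (nu_q - 1)."""
    return p * p - (p // 2) ** 2 * (nu_q - 1)


if __name__ == "__main__":
    for nu_q in (1, 2):
        for p in range(1, 7):
            print(nu_q, p, count_basis(p, nu_q), dimension_formula(p, nu_q))
```

For ν_Q = 2 the excluded pairs are exactly those with both k and l odd, and there are [p/2] odd integers in the range 0, ..., p − 1, which accounts for the subtracted term.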
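Denote by $\mathcal{F}_A(\Lambda,\tau)$ the standard basis of $E_{\nu_Q}(\Lambda)$. Combining (4.1.15), (4.1.20) and (4.1.22), it is simple to show that there are invertible $n_p \times n_p$ complex matrices $S_{AB}(\Lambda)$ and $T^{\nu_Q}{}_{AB}(\Lambda)$ such that

$\mathcal{F}_A(\Lambda,\tau) = e^{\frac{i\pi}{4}\eta}\, \tau^{-\frac{\chi+\eta}{4}}\, \bar\tau^{-\frac{\chi-\eta}{4}} \sum_B S_{AB}(\Lambda)\,\mathcal{F}_B(\Lambda,-1/\tau)$,  (4.1.25)

$\mathcal{F}_A(\Lambda,\tau) = \sum_B T^{\nu_Q}{}_{AB}(\Lambda)\,\mathcal{F}_B(\Lambda,\tau + \nu_Q)$.  (4.1.26)

This means that $\mathcal{F}_A(\Lambda,\tau)$ is the Ath component of a vector modular form $\mathcal{F}(\Lambda,\tau)$ of $G_{\nu_Q}$ of weights $\frac{\chi+\eta}{4}$, $\frac{\chi-\eta}{4}$. The matrices $S_{AB}(\Lambda)$ and $T^{\nu_Q}{}_{AB}(\Lambda)$ have the property that only one matrix element in each row and column is non-zero. For instance, if p = 2 and $\nu_Q = 1$, one has $n_p = 4$, A = (0, 0), (0, 1), (1, 0), (1, 1) and

$$S(\Lambda) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & \varepsilon\end{pmatrix}, \quad T(\Lambda) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & \varepsilon \\ 0 & 0 & 1 & 0\end{pmatrix}, \quad \varepsilon = \exp(-i\pi w(\Lambda)). \qquad (4.1.27)$$

For p = 2, $\nu_Q = 2$, one has $n_p = 3$, A = (0, 0), (0, 1), (1, 0) and

$$S(\Lambda) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0\end{pmatrix}, \quad T^2(\Lambda) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & \varepsilon\end{pmatrix}, \quad \varepsilon = \exp(-i\pi w(\Lambda)/2). \qquad (4.1.28)$$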
4.2. Duality and Twisted sectors. The question arises whether the formal considerations expounded in the previous subsection have a physical interpretation. Here, we propose one. To anticipate, to each boundary $\Lambda \in B_1^s(M)$, there is associated a family $\mathcal{T}_\Lambda$ of twisted sectors of the quantum field theory. $\mathcal{T}_\Lambda$ is characterized by a point of the cohomology torus $H^2_{dR}(M)/H^2_{dR\,Z}(M)$ and is parameterized by a pair of integers $k, l \in Z$ such that $kl \in \nu_Q Z$ and satisfying further restrictions when $\gamma(\Lambda) \in Q^{b_2}$, as explained earlier. In turn, each sector is a collection of topological vacua in one–to–one correspondence with $\mathrm{Princ}(M)$, as usual. The τ dependent factor of the partition function with a Wilson loop insertion associated to Λ of the sector k, l is

$\mathcal{Z}_{(k,l)}(\Lambda,\tau) = \exp\Big(-\frac{\pi\sigma(\Lambda)}{\tau_2}\Big)\,\mathcal{F}_{(k,l)}(\Lambda,\tau)$  (4.2.1)

(cf. Eq. (4.1.5)). In the rest of the subsection, we shall try to justify the claims made.

For $\Lambda \in B_1^s(M)$, we define first

$B_\Lambda = \sum_{rs} Q^{-1rs} \oint_\Lambda A_r\, A_s$,  (4.2.2)

$G_\Lambda = dB_\Lambda = \sum_{rs} Q^{-1rs} \oint_\Lambda A_r\, F_s$.  (4.2.3)

As is easy to see from (3.4.1),

$\oint_\Lambda B_\Lambda = \int_M G_\Lambda \wedge G_\Lambda$.  (4.2.4)

Next, for $k, l \in Z$ with $kl \in \nu_Q Z$, we define the action

$$S_{(k,l)}(A,\Lambda,\tau) = \pi\int_M (F_A + kG_\Lambda)\wedge\hat\tau(F_A + kG_\Lambda) + 2\pi l\oint_\Lambda (A + kB_\Lambda) - \pi kl\int_M G_\Lambda\wedge G_\Lambda, \qquad (4.2.5)$$

where $A \in \mathrm{Conn}(P)$ with $P \in \mathrm{Princ}(M)$ (cf. Eqs. (3.1.1)–(3.1.3) and (3.2.1)). We shall consider now the quantum field theory defined by $S_{(k,l)}(A,\Lambda,\tau)$. But first a few remarks are in order.

Since $\oint_\Lambda A_r$ is defined up to an arbitrary integer $m_r \in Z$, $B_\Lambda$ is defined up to a shift of the form $B_m = \sum_{rs} Q^{-1rs} m_r A_s$. Correspondingly, $G_\Lambda$ is defined up to a shift of the form $G_m = \sum_{rs} Q^{-1rs} m_r F_s$. Note that $B_m$ is a connection of a U(1) principal bundle $Q_m$ such that $n^r(c_{Q_m}) = \sum_s Q^{-1rs} m_s$ (cf. Eqs. (2.5.4), (2.5.5)) and that $G_m$ is its curvature. If we make the replacements $B_\Lambda \to B_\Lambda + B_m$ and $G_\Lambda \to G_\Lambda + G_m$, one has

$S_{(k,l)}(A,\Lambda,\tau) \to S_{(k,l)}(A + kB_m,\Lambda,\tau) + \pi kl\int_M G_m\wedge G_m$.  (4.2.6)

Note that $A + kB_m \in \mathrm{Conn}(P Q_m{}^k)$. Further, $kl\int_M G_m\wedge G_m \in 2Z$.

Next, we come to the quantum field theory defined by the action $S_{(k,l)}(A,\Lambda,\tau)$. Its partition function is computed summing over all topological vacua of $\mathrm{Princ}(M)$ and factoring the classical and quantum fluctuation contributions, as usual. As is easy to see, the ambiguity (4.2.6) is absorbed by exponentiation and topological vacua summation. A calculation completely analogous to that expounded in Sect. 3 for the partition function $Z(\Lambda,\tau)$ shows that the τ dependent factor of the partition function is precisely $\mathcal{Z}_{(k,l)}(\Lambda,\tau)$, Eq. (4.2.1). The class of $G_\Lambda$ in $Z^2_{dR}(M)$ modulo $Z^2_{dR\,Z}(M)$ is the point of $H^2_{dR}(M)/H^2_{dR\,Z}(M)$ characterizing $\mathcal{T}_\Lambda$ mentioned at the beginning of the subsection.

The conclusion of the analysis is that, to preserve Abelian duality in the presence of Wilson loops, it is necessary to assume the existence of twisted sectors.

Acknowledgements. I am greatly indebted to R. Stora for useful discussions. This paper is dedicated to the memory of my grandmother Ornella Scaramagli, whose loving and serene eyes still stare at me in my heart.
Appendix

In this appendix, we provide briefly the details of the derivation of the formal expression (3.5.5) of the quantum partition function $Z_{qu}(\Lambda,\tau)$. The starting expression of $Z_{qu}(\Lambda,\tau)$, given in (3.3.6), is a formal functional integral which requires a careful treatment. We normalize conventionally the functional measure $D\phi$ on a Hilbert space $\mathscr{F}$ of fields ϕ so that

$\int_{\phi\in\mathscr{F}} D\phi\, \exp\big(-\tfrac{1}{2}\langle\phi,\phi\rangle\big) = 1$.  (A.1)

In our case, the relevant field Hilbert spaces are certain subspaces of $C^p_{dR}(M)$ with p = 0, 1 equipped with the Hilbert space structure defined by (3.5.1). The corresponding functional measures are characterized by (A.1).

The invariant measure on the gauge group $\mathrm{Gau}(M)$ is defined by the translation of that on its Lie algebra $\mathrm{Lie\,Gau}(M)$ once the normalization of the exponential map is chosen. Recall that $\mathrm{Lie\,Gau}(M) \cong C^0_{dR}(M)$. We fix the normalization by writing $h \in \mathrm{Gau}(M)$ near 1 as $h = \exp(2\pi i f)$ with $f \in C^0_{dR}(M)$ and choose $Df$ as the measure on $\mathrm{Lie\,Gau}(M)$.

Let us go back to (3.3.6). We fix the gauge by imposing a generalized Lorentz condition

$d_1{}^\dagger v = a$, $v \in C^1_{dR}(M)$,  (A.2)

where $a \in \mathrm{ran}\, d_1{}^\dagger$. We then employ a slight variant of the Faddeev–Popov trick. We define a functional $B(v, a)$ of the fields $v \in C^1_{dR}(M)$, $a \in \mathrm{ran}\, d_1{}^\dagger$ through the identity

$1 = B(v, a) \int_{x\in\mathrm{ran}\, d_0} Dx\, \delta_{\mathrm{ran}\, d_1{}^\dagger}\big(d_1{}^\dagger(v + x) - a\big)$.  (A.3)

It is easy to show that

$B(v - x, a) = B(v, a)$, $x \in \mathrm{ran}\, d_0$.  (A.4)

Further, when v satisfies the gauge fixing condition (A.2),

$B(v, a) = B_0$,  (A.5)

where $B_0$ is a constant. We now insert these relations in the functional integral (3.3.6) and, after some straightforward manipulations, we obtain
Zqu ( , τ ) =
B0 Dvδran d1 † (d1 † v − a) 1 (M) vol(Harm 1Z (M)) v∈CdR × exp − v, (π τ2 d † d)1 v + 2π i j , v ,
(A.6)
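where j is defined in (3.5.3). Here, we have used the identity $\mathrm{ran}\,d_0 = B^1_{\mathrm{dR}}(M)$ and the formal relation
$$\mathrm{vol}(Z^1_{\mathrm{dR}\,\mathbb{Z}}(M))/\mathrm{vol}(B^1_{\mathrm{dR}}(M)) = \mathrm{vol}(\mathrm{Harm}^1_{\mathbb{Z}}(M)). \tag{A.7}$$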
Next, we define a function (ξ ) of the parameter ξ > 0 by the formal identity
$$1 = \ (\xi)\int_{a\in\mathrm{ran}\,d_1^\dagger} Da\,\exp\left(-\xi\langle a, a\rangle\right). \tag{A.8}$$
Introducing the above relation in the functional integral (A.6), we eliminate the δ function, obtaining Zqu ( , τ ) =
B0 (ξ ) 1 (M)) vol(Harm Z ×
1 (M) v∈CdR
Dv exp − v, (π τ2 d † d + ξ dd † )1 v + 2π i j , v . (A.9)
We compute first the Jacobian . Recalling the facts about the structure of the gauge group Gau (M) expounded in Subsect. 2.2, we find the formal relation
$$\ \ = \mathrm{vol}(Z^1_{\mathrm{dR}\,\mathbb{Z}}(M))/\mathrm{vol}(\mathrm{Gau}(M)) = \mathrm{vol}(B^1_{\mathrm{dR}}(M))/\mathrm{vol}(\mathrm{Gau}_c(M)). \tag{A.10}$$
The tangent map of the isomorphism $\alpha : \mathrm{Gau}_c(M)/\mathrm{Gau}_0(M) \to B^1_{\mathrm{dR}}(M)$ at the identity is just $d_0|_{\ker d_0^\perp}$. From here, we have
$$\ \ = \det\,(d^\dagger d)_0^{1/2}/\mathrm{vol}(G(M)).$$
One easily computes $\mathrm{vol}(G(M)) = (\mathrm{vol}\,M/2\pi)^{1/2}$. Thus,
$$\ \ = \left[\frac{2\pi}{\mathrm{vol}\,M}\,\det\,(d^\dagger d)_0\right]^{1/2}. \tag{A.11}$$
The constant $B_0$ is easily computed from (A.3), taking (A.2) into account and writing $x = df$ with $f \in \ker d_0^\perp$. The result is
$$B_0 = \det\,(d^\dagger d)_0^{1/2}. \tag{A.12}$$
Similarly, (ξ ) is easily computed from (A.8), writing $a = d_1^\dagger x$ with $x \in \ker d_1^{\dagger\perp}$:
$$\ (\xi) = \left[\frac{\det\,2\xi(dd^\dagger)_1}{\det\,(dd^\dagger)_1}\right]^{1/2}. \tag{A.13}$$
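As a hedged finite-dimensional sanity check of (A.8) and (A.13), suppose $\mathrm{ran}\,d_1^\dagger$ is replaced by $\mathbb{R}^n$. The normalization (A.1) then fixes the measure to be $d^n a/(2\pi)^{n/2}$, so the Gaussian integral in (A.8) equals $(2\xi)^{-n/2}$, while the right-hand side of (A.13) reduces to $(2\xi)^{n/2}$. The sketch below checks the case n = 1 by numerical quadrature; the function names are illustrative and do not come from the paper.

```python
import math

def normalized_gaussian_integral(xi, half_width=12.0, steps=20000):
    """Trapezoidal estimate of the integral of exp(-xi * a^2) da / sqrt(2*pi)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        a = -half_width + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * math.exp(-xi * a * a)
    return total * h / math.sqrt(2.0 * math.pi)

def gauge_factor(xi, n=1):
    """Finite-dimensional analogue of (A.13).

    For any invertible n x n matrix D,
    [det(2 xi D D^T) / det(D D^T)]^(1/2) = (2 xi)^(n/2).
    """
    return (2.0 * xi) ** (n / 2.0)

if __name__ == "__main__":
    # The product should be 1 for every xi > 0, as required by (A.8).
    for xi in (0.5, 1.0, 3.0):
        print(xi, gauge_factor(xi) * normalized_gaussian_integral(xi))
```

The product of the two factors is independent of ξ, which is the one-dimensional shadow of the ξ independence discussed at the end of the appendix.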
The functional integrand (A.9) is invariant under the shifts $v \to v + \hat v_0$, where $\hat v_0 \in \mathrm{Harm}^1_{\mathbb{Z}}(M)$, as is easy to see. Thus, we can factorize the functional integration as follows:
$$\frac{1}{\mathrm{vol}(\mathrm{Harm}^1_{\mathbb{Z}}(M))}\int_{v\in C^1_{\mathrm{dR}}(M)} Dv = \int_{v_0\in\mathrm{Harm}^1(M)/\mathrm{Harm}^1_{\mathbb{Z}}(M)} Dv_0 \int_{v'\in\mathrm{Harm}^1(M)^\perp} Dv'. \tag{A.14}$$
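Proceeding in this way, we carry out the Gaussian integration straightforwardly and obtain
$$\int_{v\in C^1_{\mathrm{dR}}(M)} Dv\,\exp\left(-\langle v, (\pi\tau_2 d^\dagger d + \xi dd^\dagger)_1 v\rangle + 2\pi i\langle j, v\rangle\right) = \left[\frac{\det G_1}{(2\pi)^{b_1}}\right]^{1/2}\det\,(2\pi\tau_2 d^\dagger d + 2\xi dd^\dagger)_1^{-1/2}\,\prod_k \delta_{\langle j,\omega_k\rangle,0}\;\exp\left(-\pi^2\langle j, (\pi\tau_2 d^\dagger d + \xi dd^\dagger)_1^{-1} j\rangle\right), \tag{A.15}$$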
where $G_1$ is the matrix given by (3.5.6). Next, we substitute (A.11), (A.12), (A.13) and (A.15) into (A.9). The resulting expression can be simplified noting that the operators $(d^\dagger d)_0$, $(dd^\dagger)_1$ have the same nonzero spectrum, counting also multiplicity, and thus equal determinants, and that
$$\det\,(p\,d^\dagger d + q\,dd^\dagger)_1 = \det\,p(d^\dagger d)_1\,\det\,q(dd^\dagger)_1, \tag{A.16}$$
with p, q > 0. Proceeding in this way, the ξ gauge independence of Zqu ( , τ ) becomes manifest and one straightforwardly obtains (3.5.5).

References
1. Dirac, P.A.M.: Quantized Singularities in the Electromagnetic Field. Proc. Roy. Soc. A133, 60 (1931)
2. Wu, T.T., Yang, C.N.: Concept of non Integrable Phase Factors and Global Formulation of Gauge Fields. Phys. Rev. D12, 3845 (1975)
3. Schwinger, J.S.: Magnetic Charge and Quantum Field Theory. Phys. Rev. 144, 1087 (1966)
4. Zwanziger, D.: Exactly Soluble Nonrelativistic Model of Particles with Both Electric and Magnetic Charges. Phys. Rev. 176, 1480 (1968)
5. Polyakov, A.M.: Particle Spectrum In Quantum Field Theory. JETP Lett. 20, 194 (1974) [Pisma Zh. Eksp. Teor. Fiz. 20, 430 (1974)]
6. ’t Hooft, G.: Magnetic Monopoles in Unified Gauge Theories. Nucl. Phys. B79, 276 (1974)
7. Julia, B., Zee, A.: Poles with Both Magnetic and Electric Charges in Nonabelian Gauge Theory. Phys. Rev. D11, 2227 (1975)
8. Prasad, M.K., Sommerfield, C.M.: An Exact Classical Solution for the ’t Hooft Monopole and the Julia-Zee Dyon. Phys. Rev. Lett. 35, 760 (1975)
9. Bogomolnyi, E.B.: The Stability of Classical Solutions. Sov. J. Nucl. Phys. 24, 449 (1976)
10. Witten, E., Olive, D.I.: Supersymmetry Algebras that Include Topological Charges. Phys. Lett. B78, 97 (1978)
11. Seiberg, N., Witten, E.: Monopoles, Duality and Chiral Symmetry Breaking in N = 2 Supersymmetric QCD. Nucl. Phys. B431, 484 (1994), arXiv:hep-th/9408099
12. Cremmer, E., Julia, B.: The SO(8) Supergravity. Nucl. Phys. B159, 141 (1979)
13. Gaillard, M.K., Zumino, B.: Duality Rotations for Interacting Fields. Nucl. Phys. B193, 221 (1981)
14. Font, A., Ibanez, L.E., Lust, D., Quevedo, F.: Strong–Weak Coupling Duality and Nonperturbative Effects in String Theory. Phys. Lett. B249, 35 (1990)
15. Hull, C.M., Townsend, P.K.: Nucl. Phys. B438, 109 (1995), arXiv:hep-th/9410167
16. Witten, E.: String Theory Dynamics in Various Dimensions. Nucl. Phys. B443, 85 (1995), arXiv:hep-th/9503124
17. Strominger, A.: Massless Black Holes and Conifolds in String Theory. Nucl. Phys. B451, 96 (1995), arXiv:hep-th/9504090
18. Witten, E.: On S Duality in Abelian Gauge Theory. Selecta Math. 1, 383 (1995), arXiv:hep-th/9505186
19. Verlinde, E.: Global Aspects of Electric-Magnetic Duality. Nucl. Phys. B455, 211 (1995), arXiv:hep-th/9506011
20. Olive, D.I., Alvarez, M.: Spin and Abelian Electromagnetic Duality on Four-Manifolds. Commun. Math. Phys. 217, 331 (2001), arXiv:hep-th/0003155
21. Alvarez, M., Olive, D.I.: The Dirac Quantization Condition for Fluxes on Four-Manifolds. Commun. Math. Phys. 210, 13 (2000), arXiv:hep-th/9906093
22. Olive, D.I.: Exact Electromagnetic Duality. Prepared for NATO Advanced Study Institute on Strings, Branes and Dualities, Cargese, France, 26 May–14 June 1997
23. Alvarez-Gaume, L., Zamora, F.: Duality in Quantum Field Theory (and String Theory). Prepared for 37th Internationale Universitatswochen fuer Kernphysik und Teilchenphysik: Broken Symmetries (37 IUKT), Schladming, Austria, 28 Feb–7 Mar 1998, arXiv:hep-th/9709180
24. Olive, D.I.: Spin and Electromagnetic Duality: An outline. arXiv:hep-th/0104062
25. Bott, R., Tu, L.: Differential Forms in Algebraic Topology. New York: Springer Verlag, 1982
26. Schwarz, A.S.: Quantum Field Theory And Topology. Berlin: Springer Verlag, 1993
27. Brylinski, J.-L.: Loop Spaces, Characteristic Classes and Geometric Quantization. Basel-Boston: Birkhäuser, 1993
28. Koszul, J.L.: Travaux de S. S. Chern et J. Simons sur les Classes Caractéristiques. Séminaire Bourbaki 26ème année 440, 69 (1973/74)
29. Cheeger, J.: Multiplication of Differential Characters. Convegno Geometrico INDAM, Roma maggio 1972, in Symposia Mathematica XI, London-New York: Academic Press, 1973, p. 441
30. Cheeger, J., Simons, J.: Differential Characters and Geometric Invariants. Stony Brook preprint (1973), reprinted in Lecture Notes in Math. 1167, Berlin-Heidelberg-New York: Springer Verlag, 1985, p. 50
31. Gilkey, P.B.: Invariance Theory, the Heat Equation and the Atiyah-Singer Index Theorem. Wilmington, DE: Publish or Perish, 1984
32. Morgan, J.W., Sullivan, D.P.: The Transversality Characteristic Class and Linking Cycles in Surgery Theory. Ann. Math. 99, 461 (1974)
33. Freed, D.S.: Determinants, Torsion and Strings. Commun. Math. Phys. 107, 483 (1986)
34. Witten, E.: Quantum Field Theory and the Jones Polynomial. Commun. Math. Phys. 121, 351 (1989)
35. Igusa, J.: Theta Functions. Berlin: Springer Verlag, 1972
36. Miyake, T.: Modular Forms. Berlin: Springer Verlag, 1989

Communicated by H. Nicolai
Commun. Math. Phys. 242, 501–529 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0953-y
Communications in
Mathematical Physics
Spectral Theory for Periodic Schrödinger Operators with Reflection Symmetries

B. Helffer¹, T. Hoffmann-Ostenhof²,³

¹ Département de Mathématiques, Bâtiment 425, Université Paris-Sud, 91045 Orsay Cédex, France. E-mail: [email protected]
² Institut für Theoretische Chemie, Universität Wien, Währinger Strasse A-17, 1090 Wien, Austria.
³ International Erwin Schrödinger Institute for Mathematical Physics, Boltzmanngasse 9, 1090 Wien, Austria. E-mail: [email protected]

Received: 9 January 2003 / Accepted: 28 May 2003
Published online: 10 October 2003 – © Springer-Verlag 2003
Abstract: Let $H = -\Delta + V$ be defined on $\mathbb{R}^d$ with smooth potential $V$, such that $V(x) = V(x + n)$ for all $n \in \mathbb{Z}^d$. In addition we assume that $V(T_j x) = V(x)$, $j = 1, \dots, d$, where $T_j(x_1, \dots, x_{j-1}, x_j, x_{j+1}, \dots, x_d) = (x_1, \dots, x_{j-1}, -x_j, x_{j+1}, \dots, x_d)$. This is a periodic Schrödinger operator with additional reflection symmetries. We investigate the associated Floquet operators $H^q$, $q \in [0, 1]^d$. In particular we show that the associated lowest eigenvalues $\lambda_q$ are simple if $q = (q_1, q_2, \dots, q_d)$ satisfies $q_j \neq 1/2$ for each $j = 1, 2, \dots, d$.

Supported by Ministerium für Bildung, Wissenschaft und Kunst der Republik Österreich. Supported by the European Science Foundation Programme Spectral Theory and Partial Differential Equations (SPECT).

1. Introduction and Main Results

Consider a selfadjoint Schrödinger operator $H = -\Delta + V$ on some domain $\Omega \subseteq \mathbb{R}^d$ with suitable boundary conditions and assume for simplicity that $H$ has discrete spectrum. It is well known that under rather general assumptions the groundstate eigenvalue is nondegenerate and that the associated groundstate can be chosen to be positive. Now assume that $H$ commutes with the actions of a discrete group $G$. Then one can write $H$ as a direct sum, $H = \oplus H_i$, where the $H_i$ are the restrictions of $H$ to mutually orthogonal symmetry subspaces which correspond to the irreducible representations $D_i$ of $G$. Denote by $\lambda_i$ the groundstate eigenvalues of the $H_i$ and by $m_i$ their multiplicities. Motivated by these general results on groundstates and groundstate eigenvalues of $H$, the following questions seem natural:

(a) Denote by $\ell_i$ the degree of the irreducible representations $D_i$. When is it possible to find universal upper bounds to the multiplicities of the $\lambda_i$'s, the lowest eigenvalues of the $H_i$'s, in terms of the $\ell_i$'s?
(b) When is it possible to find an ordering of these eigenvalues $\lambda_i$?
(c) Under which assumptions is there, for the groundstates of the $H_i$, some counterpart to the fact that the absolute groundstate of $H$ can be chosen to be positive?

Of course these questions are much too general, but one can study them by investigating specific cases. In two recent papers, [3] and [4], the present authors, together with M. Hoffmann-Ostenhof and N. Nadirashvili, considered the above mentioned questions for two dimensional problems. Namely, in [3] the case that $H$ commutes with the actions of the dihedral group $D_{2n}$, the group of the regular n-gon, was investigated. These investigations were generalized to the case of a periodic strip with additional reflection symmetries in [4], where similar questions concerning Aharonov-Bohm Hamiltonians were also studied. In [3] and in [4] the above questions (a), (b) and (c) were completely answered: the multiplicity satisfied $m(\lambda_i) = \ell_i$, the $\lambda_i$ exhibited a natural ordering and the groundstates showed some behaviour which can be interpreted as a kind of “positivity”. For the strip case it was shown that the groundstate eigenvalues of certain Floquet operators were simple and that the associated eigenfunctions had empty zerosets.

In the present paper we consider question (a) for periodic Schrödinger operators with additional reflection symmetries. We consider on $\mathbb{R}^d$
$$H = -\Delta + V, \tag{1.1}$$
and assume that $V \in C^\infty(\mathbb{R}^d)$ is bounded and real valued and that
$$V(x + n) = V(x), \qquad \forall n \in \mathbb{Z}^d,\ \forall x \in \mathbb{R}^d. \tag{1.2}$$
In addition, we assume that for $0 < j \le d$,
$$V(T_j x) = V(x), \tag{1.3}$$
where the $T_j$ are defined by
$$T_j(x_1, \dots, x_{j-1}, x_j, x_{j+1}, \dots, x_d) = (x_1, \dots, x_{j-1}, -x_j, x_{j+1}, \dots, x_d). \tag{1.4}$$
The operator domain of $H$ is $W^{2,2}(\mathbb{R}^d)$. We have a periodic Schrödinger operator and the spectral analysis of $H$ can be done by Floquet theory, see [8]. For any $q \in \mathbb{R}^d$ we associate to $q$ the space $W_q(\mathbb{R}^d)$ of the functions $u \in W^{2,2}_{\mathrm{loc}}(\mathbb{R}^d)$ such that
$$u(x - n) = e^{2\pi i\langle n, q\rangle}u(x), \qquad \forall n \in \mathbb{Z}^d. \tag{1.5}$$
The $W_q$ norm is defined by taking the $W^{2,2}$ norm over the fundamental cell, in our case the unit cube,
$$C = \{x \in \mathbb{R}^d \mid 0 < x_j < 1,\ j = 1, \dots, d\}, \tag{1.6}$$
and we observe that a function $u_q \in W^{2,2}_{\mathrm{loc}}(\mathbb{R}^d)$ is well defined, if it satisfies (1.5), by its restriction to $C$. If we restrict the operator $H$ to $W_q$ we obtain a selfadjoint operator $H^q$ and it is standard that the spectrum of $H$, $\sigma(H)$, is given by
$$\sigma(H) = \bigcup_{q\in[0,1)^d} \sigma(H^q). \tag{1.7}$$
We will analyze the multiplicities of the groundstate energies $\lambda_q$ of $H^q$. We note that $\lambda_q$ can be defined by the variational principle
$$\lambda_q = \inf_{\varphi\in W_{q,f}} \frac{\int_C \left(|\nabla\varphi|^2 + V|\varphi|^2\right)dx}{\int_C |\varphi|^2\,dx}, \tag{1.8}$$
where $W_{q,f}$ is defined analogously to $W_q$, by replacing the $W^{2,2}_{\mathrm{loc}}$ space by $W^{1,2}_{\mathrm{loc}}$. Any groundstate $u_q$ will satisfy
$$H u_q = \lambda_q u_q \quad \text{in } \mathbb{R}^d. \tag{1.9}$$
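Because (1.8) is an infimum, any admissible trial function gives an upper bound for $\lambda_q$. The following minimal sketch, for d = 1, evaluates the quotient in (1.8) for the plane wave $\varphi(x) = e^{-2\pi i q x}$, which satisfies (1.5). The potential $V(x) = \cos 2\pi x$ and all function names are assumptions made for illustration only.

```python
import cmath
import math

def rayleigh_quotient(phi, dphi, V, samples=2000):
    """Midpoint-rule estimate of the quotient in (1.8) over the unit cell [0, 1]."""
    num = 0.0
    den = 0.0
    for k in range(samples):
        x = (k + 0.5) / samples
        num += abs(dphi(x)) ** 2 + V(x) * abs(phi(x)) ** 2
        den += abs(phi(x)) ** 2
    return num / den

def plane_wave_bound(q, V, samples=2000):
    """Upper bound for lambda_q from the trial function exp(-2*pi*i*q*x)."""
    phi = lambda x: cmath.exp(-2j * math.pi * q * x)
    dphi = lambda x: -2j * math.pi * q * phi(x)
    return rayleigh_quotient(phi, dphi, V, samples)

if __name__ == "__main__":
    V = lambda x: math.cos(2.0 * math.pi * x)  # 1-periodic and even, as in (1.2)-(1.3)
    print(plane_wave_bound(0.25, V))  # 4*pi^2*q^2 plus the mean of V over the cell
```

For this trial function the quotient equals $4\pi^2 q^2$ plus the mean of $V$ over the cell, so the bound is exact when $V = 0$ and $q \in [0, 1/2]$.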
The Floquet conditions (1.5), together with the observation that $u_q \in W_q$ implies $\overline{u_q} \in W_{-q}$, show that it suffices to consider
$$q \in [0, 1/2]^d. \tag{1.10}$$
Theorem 1.1. Suppose $H^q$ and $\lambda_q$ are defined as above. Then the multiplicity $m(q)$ of $\lambda_q$ satisfies
$$m(q) = 1 \quad \text{for } q \in [0, 1/2)^d. \tag{1.11}$$
Suppose that $q_i = 1/2$ for $i \in I$, where $I$ is a subset of $\{1, 2, \dots, d\}$, and $q_i \in [0, \tfrac{1}{2})$ for $i \notin I$. Then
$$m(q) \le 2^{|I|}. \tag{1.12}$$
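For the free case $V = 0$ the statement can be checked by hand, and the following sketch does the counting. It assumes the standard plane-wave description: the functions $e^{2\pi i\langle m - q, x\rangle}$ with $m \in \mathbb{Z}^d$ satisfy (1.5) and are eigenfunctions of $-\Delta$ with eigenvalue $4\pi^2|m - q|^2$. The helper name and the search radius are illustrative choices.

```python
from fractions import Fraction
from itertools import product

def free_ground_state(q, radius=2):
    """Return (min over m of |m - q|^2, number of minimizing m in Z^d).

    The ground state energy of the free operator on W_q is 4*pi^2 times the
    first entry; the second entry is its multiplicity. Exact rational
    arithmetic avoids spurious ties or near-ties. For q in [0, 1/2]^d the
    minimizers have entries in {0, 1}, so radius=2 is more than enough.
    """
    q = [Fraction(c) for c in q]
    best, count = None, 0
    for m in product(range(-radius, radius + 1), repeat=len(q)):
        value = sum((mi - qi) ** 2 for mi, qi in zip(m, q))
        if best is None or value < best:
            best, count = value, 1
        elif value == best:
            count += 1
    return best, count

if __name__ == "__main__":
    print(free_ground_state([Fraction(1, 4), 0]))                  # q in [0, 1/2)^2
    print(free_ground_state([Fraction(1, 2), Fraction(1, 2), 0]))  # |I| = 2
```

In the free case the bound (1.12) is attained: each coordinate with $q_i = 1/2$ contributes the two minimizers $m_i \in \{0, 1\}$.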
Remarks 1.2.
(i) The proof is easier in the case when $q \in (0, 1/2)^d$ so we will first treat this case, where the scheme of the proof will be more transparent.
(ii) We have chosen for simplicity the unit cube as the fundamental cell. The same result holds also for the case that we have a right parallelepiped.
(iii) Unlike in [3] and [4] we have here only results concerning question (a). Questions (b) and (c) seem to be much harder in the present context, though one might ask whether for $q, q' \in [0, 1/2]^d$, $\lambda_q \le \lambda_{q'}$ if $q_j \le q'_j$ for all $j$, and whether for $q \in [0, 1/2)^d$, $u_q$ has empty zeroset.
(iv) As in [3] and [4] we are not able to make even a plausible guess about the multiplicities without assuming the reflection symmetries. Note that these symmetries are also introduced in a similar context in [7].
(v) We have chosen for simplicity $V \in C^\infty$. One can certainly allow less regular potentials, but we did not strive for that. Furthermore one can replace the Laplacian by an elliptic operator in divergence form, as was done in [4], with appropriate conditions on the coefficients to ensure the symmetry properties.
In order to facilitate the reading of the paper we give a very rough sketch of some of the ideas which are basic for the proof of Theorem 1.1. Our approach is not so far from the ideas developed in [3] and [4], but of course in the present case the true multi-dimensionality causes many new problems which did not turn up in the fairly simple geometrical situations described in [3] and [4]. Assume for simplicity that the Floquet parameters $q = (q_1, \dots, q_d)$ satisfy $q_i \in (0, 1/2)$. As in [3] and [4] we shall show that the problem of multiplicity is almost equivalent to the analysis of the nodal set of a totally antisymmetric state (with respect to the reflections $T_j$) $v_q$ canonically attached to a “real” groundstate $u_q$ living in a representation space, see Definition 2.7. This state $v_q$ is still a solution of $H v_q = \lambda_q v_q$ but of course does not satisfy (1.5). An explicit construction is given in Sect. 2. (There are some minor complications if some of the $q_i$ are either 0 or 1/2.)

It is easy to see that for $H = -\Delta$, Theorem 1.1 holds true. Let $H(\alpha) = -\Delta + \alpha V$. Basic for our proof is the observation that the zeroset of $v_q(0)$ has certain localization properties. Later on, see Sect. 6, these localization properties will be called canonicity. Consider the infinite collection of mutually disjoint boxes $\mathbb{R}^d \setminus \bigcup_{i=1}^d \{x_i = n/2\}_{n\in\mathbb{Z}}$. It turns out that for small $\alpha$, the zero set of $v_q(\alpha)$ has empty intersection with subsets of the boundary of some of these boxes. Furthermore it has empty intersection with the closure of some of these boxes together with the boxes adjacent to them. If we had a higher multiplicity of the $\lambda_q$, these localization properties would not hold for some $v_q$ associated to $\lambda_q$. The proof is then by contradiction. First one notes that if the zeroset intersects the aforementioned parts of the boundary then it would also intersect some boxes in which there had been no zero set for small $\alpha$. This follows from considerations of the actions of the reflections and translations on $v_q(\alpha)$. Roughly speaking one assumes for contradiction that for some $\alpha_0$ this localization property is violated for the first time and finds then that it must already have been violated for some $0 < \alpha < \alpha_0$, a contradiction.
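2. Symmetry Considerations

2.1. Preliminaries. By assumption (see (1.2) and (1.3)) the operator commutes with the reflections $\{T_j\}_{j=1}^d$ and the translations
$$g_j^k(x_1, \dots, x_j, \dots, x_d) = (x_1, \dots, x_j - k, \dots, x_d), \qquad k \in \mathbb{Z}. \tag{2.1}$$
We can understand the composition of these symmetry operations by noting that
$$\begin{aligned} &T_j T_j = \mathrm{Id}, \quad g_j^k g_j^\ell = g_j^{k+\ell} \quad \text{for } 0 < j \le d \text{ and } k, \ell \in \mathbb{Z},\\ &T_j T_k = T_k T_j, \quad g_j g_k = g_k g_j, \quad g_j T_k = T_k g_j \quad \text{for } 0 < j \neq k \le d,\\ &\text{and } T_j g_j^\ell = g_j^{-\ell} T_j \quad \text{for } 0 < j \le d \text{ and } \ell \in \mathbb{Z}. \end{aligned} \tag{2.2}$$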
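The relations (2.2) can be checked mechanically by letting the maps (1.4) and (2.1) act on points of $\mathbb{Z}^d$. The sketch below does this for one sample point; the helper names `T`, `g`, `compose` and `check_relations` are illustrative and not part of the paper.

```python
def T(j):
    """Reflection T_j of (1.4): flip the sign of the j-th coordinate (1-based)."""
    def apply(x):
        return tuple(-c if i == j - 1 else c for i, c in enumerate(x))
    return apply

def g(j, k=1):
    """Translation g_j^k of (2.1): subtract k from the j-th coordinate (1-based)."""
    def apply(x):
        return tuple(c - k if i == j - 1 else c for i, c in enumerate(x))
    return apply

def compose(a, b):
    """Composition of maps: compose(a, b)(x) = a(b(x))."""
    return lambda x: a(b(x))

def check_relations(x, j, other, k, ell):
    """Evaluate every relation in (2.2) at the point x, for indices j != other."""
    same = lambda a, b: a(x) == b(x)
    identity = lambda y: y
    return all([
        same(compose(T(j), T(j)), identity),
        same(compose(g(j, k), g(j, ell)), g(j, k + ell)),
        same(compose(T(j), T(other)), compose(T(other), T(j))),
        same(compose(g(j), g(other)), compose(g(other), g(j))),
        same(compose(g(j), T(other)), compose(T(other), g(j))),
        same(compose(T(j), g(j, ell)), compose(g(j, -ell), T(j))),
    ])

if __name__ == "__main__":
    print(check_relations((3, -2, 5), j=1, other=2, k=4, ell=-7))
```

The last relation is the only non-commutative one: reflecting after a shift by $\ell$ is the same as shifting by $-\ell$ after reflecting.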
This is a discrete group $D^d_\infty$ which is generated by $\{g_j\}_{j=1}^d$ and $\{T_j\}_{j=1}^d$. Actually $D^d_\infty$ can be considered as the d-fold direct product
$$D^d_\infty = D_{\infty,1} \times D_{\infty,2} \times \cdots \times D_{\infty,d}, \tag{2.3}$$
where the $D_{\infty,j}$ are generated by $g_j$ and $T_j$. Any $h \in D^d_\infty$ can be uniquely (up to the ordering) written as
$$h = \prod_{j=1}^d h_j,$$
where $h_j$ is an element of $D_{\infty,j}$. By (2.2), $h_j h_k = h_k h_j$ for $j \neq k$. Suppose $h \in D^d_\infty$, then we define the action of $h$ on a function $\varphi : \mathbb{R}^d \to \mathbb{C}$ by $(h\varphi)(x) = \varphi(h^{-1}(x))$.

In the following we consider some $q \in [0, \tfrac{1}{2}]^d$ and after reordering the variables we can assume that, for $1 \le \nu_1 \le \nu_2 \le d$, we have
$$q_j \in (0, 1/2) \ \text{for } 1 \le j \le \nu_1; \qquad q_j = 1/2 \ \text{for } \nu_1 < j \le \nu_2; \qquad q_j = 0 \ \text{for } \nu_2 < j \le d. \tag{2.4}$$
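The reordering behind (2.4) is a simple three-way partition of the components of $q$. A minimal sketch, with an illustrative function name and exact rational arithmetic so that the tests $q_j = 1/2$ and $q_j = 0$ are reliable:

```python
from fractions import Fraction

def reorder(q):
    """Sort the components of q in [0, 1/2]^d into the order used in (2.4).

    Returns (reordered q, nu1, nu2): the first nu1 entries lie in (0, 1/2),
    the next nu2 - nu1 entries equal 1/2 and the remaining d - nu2 equal 0.
    """
    q = [Fraction(c) for c in q]
    half = Fraction(1, 2)
    if any(c < 0 or c > half for c in q):
        raise ValueError("q must lie in [0, 1/2]^d")
    interior = [c for c in q if 0 < c < half]
    halves = [c for c in q if c == half]
    zeros = [c for c in q if c == 0]
    return interior + halves + zeros, len(interior), len(interior) + len(halves)

if __name__ == "__main__":
    ordered, nu1, nu2 = reorder(["1/2", 0, "1/3", "1/2"])
    print(ordered, nu1, nu2)
    print(2 ** (nu2 - nu1))  # the bound (1.12) with |I| = nu2 - nu1
```

With this ordering the set $I$ of Theorem 1.1 is $\{\nu_1 + 1, \dots, \nu_2\}$, so the bound (1.12) reads $m(q) \le 2^{\nu_2 - \nu_1}$.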
The proof is mixing the representation theory of the group $D_\infty$ and of its subgroup $G := \mathbb{Z}^d$. But, due to the very simple structure of these finite groups, our presentation will avoid explicitly referring to their representation theory and we have preferred to make all the decompositions explicit. We will treat first the case when $\nu_1 = \nu_2 = d$.

2.2. The case when $q \in \left((-\tfrac{1}{2}, \tfrac{1}{2}) \setminus \{0\}\right)^d$.

2.2.1. Decomposition. We already introduced (cf. (1.5)) for $q \in [-\tfrac{1}{2}, \tfrac{1}{2}]^d$ the complex spaces $W_q$,
$$W_q := \{u \in W^{2,2}_{\mathrm{loc}}(\mathbb{R}^d) \mid u(x - n) = \exp(2\pi i\langle n, q\rangle)\,u(x),\ \forall n \in \mathbb{Z}^d\}. \tag{2.5}$$
We start with

Lemma 2.1. We have $W_q \subset S_q$, where $S_q$ is defined as the subspace of $W^{2,2}_{\mathrm{loc}}(\mathbb{R}^d)$ such that
$$(g_j + g_j^{-1})u = 2\cos(2\pi q_j)\,u, \qquad \forall j \in \{1, \dots, d\}.$$
The proof is immediate. We have for any $j$ and any $u$ in $W_q$,
$$g_j u = \exp(2i\pi q_j)\,u; \qquad g_j^{-1} u = \exp(-2i\pi q_j)\,u.$$
The result is obtained by addition of the two lines. This completes the proof of the lemma.
Definition 2.2. We denote by d the finite group associated to {−1, +1}d . For σ and σ in d , the law of composition is given by (σ ◦ σ )j = σj · σj , ∀j = 1, . . . , d . The group acts naturally on (− 21 , 21 )d by σ (q) = (σj qj )j =1,··· ,d . We observe that Sq = Sσ (q) and that, provided q ∈ (0, 21 )d , the orbit of q by the group d has 2d distinct points in (− 21 , 21 )d . Remark 2.3. Sq is stable by complex conjugation and by the action of the group G = Zd , Sq = C ⊗ SqR , where SqR denotes the real valued functions of Sq . Proposition 2.4. Sq = ⊕σ ∈d Wσ (q) . Proof. We observe that there exists a family of 2d projectors defined by σ =
1 d
dj =1 (I + iσj Rj ) ,
(2.6)
Rj = (gj−1 − gj )/(2 sin 2π qj ) .
(2.7)
R2j = −I and Rj Rk = Rk Rj , ∀j, k .
(2.8)
2
where
We note that
Using these relations, it is easy to verify that this family satisfies σ = I ,
(2.9)
σ ∈d
and σ · σ˜ = δσ,σ˜ σ ,
(2.10)
where δσ,σ = 1 and δσ,σ˜ = 0 if σ = σ . It remains to show that the operator σ is the projector of Sq onto Wσ (q) . To show this last point, we obtain by explicit computation the following
Lemma 2.5. If $v$ satisfies, for some $q_j \in (0, \tfrac{1}{2})$, $(g_j + g_j^{-1})v = 2\cos(2\pi q_j)\,v$, then $w := \tfrac{1}{2}[v + iR_j v]$ satisfies, for any $k \in \mathbb{Z}$,
$$g_j^k w = \exp(2i\pi k q_j)\,w, \tag{2.11}$$
and in addition $v$ satisfies
$$g_j^k v = \cos(2\pi k q_j)\,v - \sin(2\pi k q_j)\,R_j v \tag{2.12}$$
and
$$g_j^k R_j v = \sin(2\pi k q_j)\,v + \cos(2\pi k q_j)\,R_j v. \tag{2.13}$$
This achieves the proof of the proposition. A particular role will be played by the projector σ 0 associated to the neutral element σ 0 of the group d corresponding to σj = 1 for all j = 1, · · · , d. We now consider the group (actually the same group but working on Rd ) generated by the Tj ’s (j = 1, · · · , d). It is immediate to see that for all 1 ≤ j ≤ d, Tj Sq ⊂ Sq , and that Tj commutes with complex conjugation. The Tj ’s generate a group with 2d elements and according to finite group theory we can have an alternative decomposition of Sq by using the family of projectors Pτ =
1 d 2
dj =1 (I + τj Tj ) ,
with τ ∈ {−1, +1}d . One verifies immediately that they satisfy Pτ = I τ ∈{−1,+1}d
and Pτ Pτ˜ = δ(τ, τ˜ )Pτ . A particular role is played by τ0 = (−1, −1, · · · , −1) . If u belongs to Wq the function Pτ0 u is called its totally antisymmetrized function. The relation of Wq and Pτ0 Sq is described by the following Proposition 2.6. The map Pτ0 is a bijection from Wq onto Pτ0 Sq . Moreover the inverse is given by 2d σ 0 .
Proof. We use the observation that
$$T_j \circ R_j = -R_j \circ T_j, \tag{2.14}$$
and that
$$T_j \circ R_k = R_k \circ T_j, \tag{2.15}$$
when $j \neq k$. The proof is easily reduced to the case $d = 1$. Take $u \in W_q$. We have just to show that
$$\tfrac{1}{2}[I + iR][u - Tu] = u.$$
But we know that $\tfrac{1}{2}(I + iR)u = u$, which implies also that $\tfrac{1}{2}(I - iR)u = 0$. It is then enough to consider the anticommutation of $T$ and $R$ for getting the result. Conversely, if we take $v$ such that $Tv = -v$, we immediately obtain
$$(I - T)(I + iR)v = (I - T)v + iR(I + T)v = 2v.$$
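The one-dimensional computation in this proof can be reproduced numerically. The sketch below assumes the convention $(gf)(x) = f(x - 1)$, under which $gu = e^{2\pi i q}u$ for $u \in W_q$ as used in the proof of Lemma 2.1. It takes $u(x) = e^{-2\pi i q x}$, which satisfies (1.5), antisymmetrizes it and recovers it with $I + iR$. The value of `Q` and all function names are illustrative.

```python
import cmath
import math

Q = 0.3  # sample Floquet parameter in (0, 1/2); an arbitrary choice

def u(x):
    """Element of W_q for d = 1: u(x - n) = exp(2*pi*i*n*Q) * u(x)."""
    return cmath.exp(-2j * math.pi * Q * x)

def antisymmetrize(f):
    """Totally antisymmetrized function (f - T f)/2, with (T f)(x) = f(-x)."""
    return lambda x: 0.5 * (f(x) - f(-x))

def R(f):
    """R = (g^{-1} - g) / (2 sin 2 pi Q), assuming (g f)(x) = f(x - 1)."""
    s = 2.0 * math.sin(2.0 * math.pi * Q)
    return lambda x: (f(x + 1.0) - f(x - 1.0)) / s

def recover(v):
    """Apply I + i R, the inverse map of Proposition 2.6 for d = 1."""
    Rv = R(v)
    return lambda x: v(x) + 1j * Rv(x)

if __name__ == "__main__":
    v = antisymmetrize(u)  # equals -i * sin(2*pi*Q*x)
    for x in (-1.3, 0.0, 0.4, 2.7):
        print(x, abs(recover(v)(x) - u(x)), abs(R(R(v))(x) + v(x)))
```

Both printed differences vanish up to rounding: the first is the bijection of Proposition 2.6, the second is the relation $R^2 = -I$ of (2.8) on this function.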
2.2.2. Real spaces. We finally would like to consider real spaces. We have seen that the second decomposition commutes with complex conjugation and we can consequently consider the real totally antisymmetric space. One can now recognize the “real” subspace of Wq which is characterized by the following Definition 2.7. We denote by WqRκ the “real” subspace of Wq determined by the condition Ku = u with K := (−1)d dj =1 Tj ,
(2.16)
where denotes complex conjugation. Lemma 2.8. Any element u in Wq can be decomposed in the following way : u = u1 + iu2 , WqRκ .
with uj ∈ Moreover, if u is an eigenstate, the corresponding uj are eigenstates when not identically zero. Proof. We can take u1 =
1 i (u + Ku) , u2 = − (u − Ku) . 2 2
We then observe that K is antilinear, that K 2 = I and that K commutes with − + V and respects Wq . Then the reduction procedure is achieved through the following Lemma 2.9. The map u → Pτ0 u is a bijection of WqRκ onto τ0 SqR . We observe indeed that Pτ0 = KPτ0 = Pτ0 K . All which has been done for the pair of spaces (Wq , Sq ), can be done by restricting all the constructions to a spectral subspace of a selfadjoint operator commuting with , Tj , and gj (j = 1, · · · , d).
2.2.3. Strategy of the proof. It is easy to see, as in the case $d = 1$, that if we want to show that $\lambda = \lambda_q$ is of multiplicity 1, then it suffices to show that $\lambda$ is an eigenvalue of multiplicity 1 of $H$ restricted to $W_q^{\mathbb{R}_\kappa}$. For an element $u$ in $W_q^{\mathbb{R}_\kappa}$ we define:
$$M(u) = N(P_{\tau_0}u), \tag{2.17}$$
where, for a real valued $v$ in $C^0(\mathbb{R}^d)$ (we will always be in this situation when considering eigenstates), $N(v) = \{v^{-1}(0)\}$. So an important part of the analysis is to analyze the zero set of the associated real totally antisymmetric function. We note that this associated function $v = P_{\tau_0}u$ is still a distribution solution of $(H - \lambda_q)v = 0$.

The strategy for showing simplicity of the eigenvalues can be roughly described as follows: Show that $M(u)$ is well localized when deforming $-\Delta$ into $-\Delta + V$ by the family $-\Delta + \alpha V$. Then show that this localization, that we call canonicity, makes the occurrence of a change of multiplicity (which will be seen to be one for $\alpha = 0$) impossible.

2.3. The “border” cases.

2.3.1. Decomposition. We extend the previous considerations to the more general case when some of the $q_j$'s are equal to 0 or $\tfrac{1}{2}$. The main idea here is roughly to apply the approach of the previous subsection with respect to the $\nu_1$ first variables. We have already introduced (see (1.5)) the complex spaces $W_q$ and we consider the case when
$$q \in \left(0, \tfrac{1}{2}\right)^{\nu_1} \times \left\{\tfrac{1}{2}\right\}^{\nu_2 - \nu_1} \times \{0\}^{d - \nu_2},$$
for some $0 \le \nu_1 \le \nu_2 \le d$. The first lemma, which extends Lemma 2.1, is

Lemma 2.10. $W_q \subset S_q$, where $S_q$ is now defined as the subspace of $W^{2,2}_{\mathrm{loc}}(\mathbb{R}^d)$ such that
$$(g_j + g_j^{-1})u = 2\cos(2\pi q_j)\,u, \quad \forall j \in \{1, \dots, \nu_1\}$$
and
$$g_j u = (-1)^{2q_j}u, \quad \forall j \in \{\nu_1 + 1, \dots, d\}.$$

Remarks 2.11. (i) The new definitions are compatible with the previous ones when we had $\nu_1 = d$.
(ii) The following property is true : Sq = Sσ (q), where
σ (q) = (σj qj )j =1,··· ,ν1 , (qj )j =ν1 +1,··· ,d
with σj = ±1, for j = 1, · · · , ν1 .
ˆ ν1 ,d , which acts (iii) When we work under condition (2.4), the orbit of q by the group1 effectively on the ν1 first variables and trivially on the other variables, is the same as the orbit of q by the group d , has 2ν1 distinct points. (iv) Sq is stable by complex conjugation and by the action of the group G = Zd , Sq = C ⊗ SqR , where SqR denotes the space of the real valued functions of Sq . Proposition 2.12. Sq = ⊕σ ∈ˆ ν ,d Wσ (q) . 1
Proof. There exists actually a family of projectors defined by : 1 −ν1 νj 1=1 (I + iσj Rj ) . σ = 2 These projectors satisfy : σ · σ˜ = σ δσ σ˜ and
σ = I .
(2.18)
(2.19)
σ
Moreover σ is the projector of Sq onto Wσ (q) . A particular role will be played by the projector σ corresponding to σj = 1 for all 0
ˆ ν1 ,d , identified with j = 1, · · · , ν1 . σ0 corresponds to the neutral element of the group the neutral element of d . We now consider the group generated by the Tj ’s (j = 1, · · · , ν1 ) and according to finite group theory we can have an alternative decomposition of Sq by using the family of projectors 1 ν1 Pτ = νj 1=1 (I + τj Tj ) , 2
with τ ∈ {−1, +1}ν1 . A particular role is played by τ0 = (−1, −1, · · · , −1) and by Pτ Sq which corre0 sponds to the partially antisymmetric states with respect to the ν1 first variables. More explicitly, we have 1 ν1 Pτ = νj 1=1 (I − Tj ) . 0 2 Corresponding to Proposition 2.6 we have 1
ˆ ν ,d is identified naturally with a subgroup of d , by the map τ → (τ , 1, . . . , 1). Note that 1
Proposition 2.13. The map Pτ is a bijection from Wq onto Pτ Sq . Moreover the inverse 0
is given by 2ν1 σ .
0
0
Proof. This is a consequence of the properties (2.14) and (2.15). We finally would like to consider real spaces. We have seen that the second decomposition commutes with the complex conjugation and we can consequently consider the partially antisymmetric real space (with respect to the ν1 first variables). One can now recognize the “real” space of Wq which is characterized by Definition 2.14. Let WqRκ be the “real” subspace of Wq determined by the condition Ku = u, where K is given by K := (−1)ν1 νj 1=1 Tj .
(2.20)
Then we have corresponding to Lemma 2.9 Lemma 2.15. The map u → Pτ u is a bijection of WqRκ onto Pτ0 SqR . 0
We observe indeed that Pτ = KPτ = Pτ K . 0
0
0
All which has been done for the pair of spaces (Wq , Sq ), can be done by restricting all the constructions to a spectral subspace of a selfadjoint operator commuting with , Tj , and gj (j = 1, · · · , ν1 ). We have not used all the properties. According to our conditions on q, these spaces Wq are left invariant by the Tj ’s (j = ν1 + 1, · · · , d). For the j s corresponding to qj = 0, it is natural to consider the space which is invariant with respect to the Tj ’s (j > ν2 ). This is indeed what is observed for the free Laplacian and what will be proved later in general. For the j ’s corresponding to qj = 21 , one can decompose the space using the 2ν2 −ν1 commuting projectors Pτ associated to τ ∈ {−1, +1}ν2 −ν1 , and Pτ =
1 ν2 −ν1 2
2 −ν1 ν=1 (I + τ Tν1 + ) .
This can be combined with the decomposition associated with the family Pτ =
1 d−ν2 2
2 d−ν =1 (I + τ Tν2 + ) ,
for τ ∈ {−1, +1}d−ν2 . Finally, we have to analyze the multiplicity for each of the spectral spaces attached to the ground state energy of Hq restricted to the symmetry spaces : WqRκ ,τ
,τ
:= Pτ Pτ WqRκ .
(2.21)
As we shall see, we will reduce the analysis to the particular case when τ = τ0 = (1, . . . , 1) (totally symmetric (ts) with respect to the Tj such that qj = 0) and
τ = τ0 = (−1, . . . , −1) (totally antisymmetric (ta) with respect to the Tj such that qj = 21 ). So it is natural to introduce : Rκ ,τ0 ,τ0
WqRκ ,ta,ts := Wq
(2.22)
,
which, for a given q, will be the domain of our basic operator. Note that we will also use in one of our statements the space defined by
WqRκ ,τ := Pτ WqRκ . It is immediate to see that
WqRκ ,τ :=
WqRκ ,τ
,τ
.
τ ∈{−1,+1}d−ν2
2.3.2. The reduced problem. For each of these spaces WqRκ ,τ ,τ , one has to analyze the multiplicity for the free Laplacian and then to analyze what is going on by deformation, with respect to α. Actually, we will only analyze the case when τ = (1, 1, . . . , 1). We will indeed prove in Sect 9. that this gives the same ground state and the same multiplicity as for the space WqRκ ,τ . More precisely, we will show Proposition 2.16. If λτq then
,τ
is the ground state energy of H q restricted to WqRκ ,τ τ ,τ0
λτq = λq
< λτq
,τ
, ∀τ = τ0 ,
,τ
,
(2.23)
where τ0 = (1, · · · , 1) and τ belongs to {−1, +1}d−ν2 . In particular, the multiplicity τ ,τ of λτq is the same as the multiplicity of λq 0 and the corresponding ground state is symmetric with respect to the Tj , for ν2 < j ≤ d. If we then show that for each τ the multiplicity is one, we get that the multiplicity is bounded by 2ν2 −ν1 . Indeed, for the case of − restricted to Wq one easily verifies that the multiplicity of the lowest eigenvalue is exactly 2ν2 −ν1 . So we will show the following Theorem 2.17. Suppose that q ∈ [0, 21 ]d . For any τ ∈ {−1, +1}ν2 −ν1 , let H q,τ Rκ ,τ ,τ0 τ ,τ the restriction of the operator to Wq . Let λq 0 be the ground τ ,τ H q,τ ,τ0 . Then the multiplicity of λτq = λq 0 , mτ (q), satisfies
mτ
,τ 0
(q) = 1.
,τ 0
be
state energy of
(2.24)
Note that this result will give the general case of the main theorem, if we observe that our symmetry considerations give the following Proposition 2.18. Suppose that q ∈ [0, 21 ]d , then
λq = inf λτq , τ
and
1 ≤ m(q) ≤
τ Counting the cardinality of the set of the τ
mτ (q) .
gives then the theorem in full generality. The multiplicity statement is due to the possible crossing of two λτq with different τ .
2.3.3. The analysis corresponding to one τ is sufficient. Proposition 2.19. Suppose that we have shown that, for any V satisfying (1.2) and (1.3), τ ,τ
the multiplicity of λq0 0 , with τ0 = (−1, . . . , −1) is one. Then the same property is true for any V satisfying (1.2) and (1.3) and any τ ∈ ({−1, +1})d . Proof. Let us define, for ν1 < j ≤ ν2 , by gˆ j the translation operator by
$\tfrac{1}{2}$:
$$(\hat g_j)(x) = x + \tfrac{1}{2}e_j,$$
with $(e_j)_k = \delta_{jk}$ ($\delta$ being the Kronecker symbol). In general, these operators $\hat g_j$ do not commute with $H$, and this is why we have to assume in the proposition that we have the property for any $V$. The point is now to observe that through these translations one can exchange the symmetry spaces corresponding to different τ . Let us see this first in one variable. Let $u$ be an anti-periodic, symmetric function. Then $\hat u$ defined by
$$\hat u(x) = u\left(x - \tfrac{1}{2}\right)$$
is an anti-periodic, antisymmetric function. The extension to higher dimensions does not create new problems. We use this trick for the variables $x_j$ corresponding to the $j$ such that τj = 1. By this procedure, we have sent the initial problem to a new problem with τ = (−1, . . . , −1) and a new $V$ which satisfies also (1.2) and (1.3), obtained from $V$ by translations $\hat g_j$ in some directions.
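The one-variable statement is easy to test on a concrete function. In the sketch below the sample $u$ is an arbitrary anti-periodic, even function chosen for illustration; the check is that its half-period translate is anti-periodic and odd.

```python
import math

def u(x):
    """A sample anti-periodic, symmetric function: u(x + 1) = -u(x), u(-x) = u(x)."""
    return math.cos(math.pi * x) + 0.3 * math.cos(3.0 * math.pi * x)

def half_shift(f):
    """Half-period translate: returns the function x -> f(x - 1/2)."""
    return lambda x: f(x - 0.5)

def is_antiperiodic(f, points, tol=1e-12):
    return all(abs(f(x + 1.0) + f(x)) < tol for x in points)

def is_antisymmetric(f, points, tol=1e-12):
    return all(abs(f(-x) + f(x)) < tol for x in points)

if __name__ == "__main__":
    pts = (0.0, 0.17, 0.5, 1.3, -2.4)
    u_hat = half_shift(u)
    print(is_antiperiodic(u, pts), is_antisymmetric(u, pts))          # True False
    print(is_antiperiodic(u_hat, pts), is_antisymmetric(u_hat, pts))  # True True
```

The general argument is one line: $\hat u(-x) = u(-x - \tfrac{1}{2}) = u(x + \tfrac{1}{2}) = -u(x - \tfrac{1}{2}) = -\hat u(x)$, using first the symmetry and then the anti-periodicity of $u$.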
More precisely, the translation by $\hat g_j$ sends $W_q^{R_\kappa,\tau',\tau''}$ onto $W_q^{R_\kappa,(\hat\tau^{(j)})',\tau''}$, with $(\hat\tau^{(j)})'_\ell = \tau'_\ell$ for $\ell \ne j - \nu_1$ and $(\hat\tau^{(j)})'_\ell = -\tau'_\ell$ for $\ell = j - \nu_1$, and an eigenvector $u$ of $(-\Delta + V)$ becomes an eigenvector $(\hat g_j u)$ of $-\Delta + (\hat g_j V)$. Moreover we emphasize that
$$(\hat g_j V)(T_\ell x) = (\hat g_j V)(x), \quad \text{for } \ell = 1,\dots,d.$$

Remark 2.20. Note that if $V$ is invariant with respect to some of these translations (smaller period), then we get isospectrality between the restrictions of the Hamiltonian to the representation spaces corresponding to some $\tau$. To be more specific, if $V$ is in addition periodic with period $\tfrac12$ in the $j$th direction for some $\nu_1 < j \le \nu_2$, then the problems relative to $\tau'$ and $(\hat\tau^{(j)})'$ are isospectral. There is no reason for this isospectrality in general.

3. Perturbation Theory

3.1. Kato's theory. We shall consider the family of operators $H_q(\alpha)$, defined for $\alpha$ in a complex neighborhood of the interval $[0,1]$,
$$H_q(\alpha) = -\Delta + \alpha V,$$
whose domain is restricted to $W_q^{R_\kappa,\tau_0',\tau_0''}$.

We recall that $W_q^{R_\kappa,\tau_0',\tau_0''}$, also denoted by $W_q^{R_\kappa,ta,ts}$, is characterized by
$$W_q^{R_\kappa,ta,ts} = \{u \in W_q^{R_\kappa} \mid T_j u = -u \text{ for } \nu_1 < j \le \nu_2,\ T_j u = u \text{ for } \nu_2 < j \le d\}.$$
This family of operators, $H_q(\alpha)$, is a type A family in the sense of Kato [6] and hence standard perturbation theory applies. Furthermore we know from standard perturbation theory that the eigenvalues can be chosen to depend analytically on $\alpha$. This is particularly simple when the eigenvalue is of multiplicity one, and in this case it is also easy to choose eigenfunctions depending smoothly on $\alpha$. We shall also need a more precise result at a possible change of multiplicity (see Lemma 3.3 below).

3.2. The case $\alpha = 0$. We note for further reference that, for given $q$ and for $\alpha = 0$, a ground state of $-\Delta$ in $W_q^{R_\kappa,ta,ts}$ is given by
$$u_q(0) = e^{2\pi i \sum_{j=1}^{\nu_1} q_j x_j} \cdot \prod_{\ell=\nu_1+1}^{\nu_2} \sin \pi x_\ell \tag{3.1}$$
and we recall that the projector $P_{\tau_0}$ associates to $u_q(0)$
$$v_q(0) = P_{\tau_0} u_q(0) = \prod_{j=1}^{\nu_2} \sin 2\pi q_j x_j . \tag{3.2}$$
The corresponding eigenvalue is
$$\lambda_q^{ta,ts}(0) = 4\pi^2 \sum_{j=1}^{d} q_j^2 = \pi^2(\nu_2 - \nu_1) + 4\pi^2 \sum_{j=1}^{\nu_1} q_j^2 . \tag{3.3}$$
It is then easy to see that the corresponding multiplicity satisfies
$$m(\lambda_q^{ta,ts}(0)) = 1 . \tag{3.4}$$
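Formula (3.3) and the eigenfunction (3.2) can be sanity-checked numerically. The following is a minimal sketch under stated assumptions: the sample values of $q$, the ordering ($q_j = 1/2$ for $\nu_1 < j \le \nu_2$ and $q_j = 0$ for $j > \nu_2$), the test point and all function names are chosen here for illustration, and the Laplacian is approximated by central differences.

```python
import math

def free_ground_energy(q):
    # Left form of (3.3): 4 * pi^2 * sum of q_j^2 over all d components.
    return 4.0 * math.pi ** 2 * sum(qj * qj for qj in q)

def split_energy(q, nu1, nu2):
    # Right form of (3.3), assuming q_j = 1/2 for nu1 < j <= nu2
    # and q_j = 0 for j > nu2.
    return math.pi ** 2 * (nu2 - nu1) + 4.0 * math.pi ** 2 * sum(qj * qj for qj in q[:nu1])

def v0(q, nu2, x):
    # (3.2): product of sin(2 pi q_j x_j) over j = 1..nu2.
    return math.prod(math.sin(2.0 * math.pi * q[j] * x[j]) for j in range(nu2))

def minus_laplacian(f, x, h=1e-3):
    # Central second differences in each coordinate (approximation of -Delta f).
    total = 0.0
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        total += (f(xp) - 2.0 * f(x) + f(xm)) / (h * h)
    return -total

# Hypothetical data: d = 4, nu1 = 2, nu2 = 3.
q = [0.3, 0.2, 0.5, 0.0]
nu1, nu2 = 2, 3
x = [0.11, 0.23, 0.37, 0.5]

energy = free_ground_energy(q)
lhs = minus_laplacian(lambda y: v0(q, nu2, y), x)
rhs = energy * v0(q, nu2, x)
print("both forms of (3.3) agree:", abs(energy - split_energy(q, nu1, nu2)) < 1e-9)
print("-Laplacian(v) vs lambda * v:", lhs, rhs)
```

The finite-difference value agrees with $\lambda v$ only up to discretization error of order $h^2$, so the comparison should be read with a relative tolerance rather than as an exact identity.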
3.3. Starting the deformation argument. Let us consider our family of operators $H_q(\alpha)$ restricted to $W_q^{R_\kappa,ta,ts}(\mathbb R^d)$, which is defined for $\alpha \in [0, 1+\epsilon_0]$ ($\epsilon_0 > 0$). We consider the set $J$ of the $\alpha$'s in $[0, 1+\epsilon_0]$ such that $m(\lambda_q^{ta,ts}(\alpha)) > 1$. We define

Definition 3.1.
$$\alpha_0(q) = 1 + \epsilon_0 \ \text{ if } J \text{ is empty}, \qquad \alpha_0(q) = \inf\{0 \le \alpha \mid \alpha \in J\} \ \text{ if } J \text{ is not empty}. \tag{3.5}$$

Kato's perturbation theory implies

Lemma 3.2. There is a $\delta > 0$ such that
$$m(\lambda_q^{ta,ts}(\alpha)) = 1, \quad \text{for } |\alpha| < \delta \tag{3.6}$$
and

Lemma 3.3. Let $U_q(\alpha_0(q))$ denote the eigenspace of $\lambda_q^{ta,ts}(\alpha_0(q))$. Then
$$U_q(\alpha_0(q)) = U_q^1(\alpha_0(q)) \oplus U_q^2(\alpha_0(q)), \tag{3.7}$$
where
$$U_q^1(\alpha_0(q)) = \lim_{\alpha \uparrow \alpha_0(q)} U_q(\alpha) \tag{3.8}$$
is one-dimensional and $U_q^2(\alpha_0(q))$ is orthogonal to it.

4. Zero Sets of the Associated Real Totally Antisymmetric States

As in [3] and [4] the zero set of some real totally antisymmetric state will play an important role. Let us mention however some difference here. In the case of the strip, this associated real antisymmetric state was the imaginary part of the groundstate. Here this is no longer the case and the operation $u \mapsto \operatorname{Im} u$ has to be replaced by $u \mapsto P_{\tau_0} u$.
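4.1. The zero set of $v_q(0)$. It is useful to analyze first the case when $\alpha = 0$. Recall that we have by (3.2)
$$v_q(0) = \prod_{j=1}^{\nu_2} \sin 2\pi q_j x_j .$$
Let
$$M_j = \left\{ y \in \mathbb R \;\middle|\; y \in \frac{\mathbb Z}{2 q_j} \right\} \ \text{ for } 1 \le j \le \nu_2 \quad \text{and} \quad M_j = \emptyset \ \text{ for } \nu_2 < j \le d . \tag{4.1}$$
Then
$$N(v_q(0)) = \bigcup_{j=1}^{\nu_2} \{x \in \mathbb R^d \mid x_j \in M_j\}. \tag{4.2}$$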
Further we can split each Mj into two disjoint sets, Mj,0 and Mj,1 with Mj,0 = Mj ∩ Z/2.
(4.3)
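The splitting (4.3) is easy to tabulate when $q_j$ is rational. The sketch below uses exact rational arithmetic; the window radius, the sample value $q = 2/5$ and the function names are assumptions made for this illustration only.

```python
from fractions import Fraction
from math import floor

def zeros_in_window(q, radius):
    # Points of M = Z / (2q) with |y| <= radius, for a rational q > 0.
    step = 1 / (2 * Fraction(q))
    m_max = floor(Fraction(radius) / step)
    return [m * step for m in range(-m_max, m_max + 1)]

def split_special(points):
    # M_0 = M intersected with Z/2 (2y is an integer), M_1 = the remaining points.
    special = [y for y in points if (2 * y).denominator == 1]
    normal = [y for y in points if (2 * y).denominator != 1]
    return special, normal

# Hypothetical example: q = 2/5, window [-5, 5].
points = zeros_in_window(Fraction(2, 5), 5)
special, normal = split_special(points)
print("M_0:", [str(y) for y in special])
print("M_1:", [str(y) for y in normal])

# For q = 1/2 every zero is an integer, so M_1 is empty.
special_half, normal_half = split_special(zeros_in_window(Fraction(1, 2), 5))
print("q = 1/2, M_1 empty:", normal_half == [])
```

For $q = p/r$ in lowest terms, a zero $m r/(2p)$ lies in $\mathbb Z/2$ exactly when $p$ divides $m$, which is what the example with $q = 2/5$ exhibits.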
Let us observe that $M_j = M_{j,0}$ for all $\nu_1 < j \le \nu_2$.

Definition 4.1. We define the special nodal set by
$$N_q^0 = \bigcup_{j=1}^{\nu_2} \{x \in \mathbb R^d \mid x_j \in M_{j,0}\}. \tag{4.4}$$
The hyperplanes appearing in $N_q^0$ will be called special canonical hyperplanes.
We notice that, if all the $q_j$'s are irrational, then the only special hyperplanes are given by $\{x_\ell = 0\}$ for $1 \le \ell \le d$. We also introduce, for some eigenstate $u \in W_q^{R_\kappa,ta,ts}(\mathbb R^d)$ for the free Laplacian,
$$N_q(0) = M_q(u) \setminus N_q^0 . \tag{4.5}$$
Here as in (2.17), $M(u) = N(P_{\tau_0} u)$ is well defined. Note that, since $P_{\tau_0}$ is a projection operator, any two $u_q$ and $\tilde u_q$ such that $\tilde u_q = \prod_{i \in I} T_i u_q$ with $I$ a subset of $\{1, 2, \dots, d\}$ will have $M(u_q) = M(\tilde u_q)$. More generally we introduce, for $\alpha < \alpha_0(q)$ and for an associated groundstate $u \in W_q^{R_\kappa,ta,ts}(\mathbb R^d)$ of $H_q(\alpha)$,
$$N_q(\alpha) = M_q(u) \setminus N_q^0 . \tag{4.6}$$
Actually the analysis of $N_q(\alpha)$ will be crucial in the sequel. Locally in $\alpha$, we will always choose an analytic family of $u_\alpha$. Using Lemma 3.3, we can also have a natural definition for $N_q(\alpha_0(q))$ by taking $u \in U_q^1(\alpha_0(q))$.
4.2. Preliminaries on the zero set of totally antisymmetrized real states. In the following we will investigate $N_q(\alpha)$.

Lemma 4.2. If $u$ is in $W_q^{R_\kappa,as,ts}$, then $N_q^0$ is contained in $M(u)$.

Proof of Lemma 4.2. The case $j \le \nu_1$. Let $\ell_j$ be such that $\cos(2\pi \ell_j q_j) = 1$, for some $\ell_j \in \mathbb Z$. This corresponds to a hyperplane determined by $x_j = \ell_j/2$. It is enough to show that
$$g_j^{\ell_j} v = -T_j v .$$
For this we observe that, if $\sin(2\pi \ell_j q_j) = 0$, then, by (2.12) and the antisymmetry of $v$,
$$g_j^{\ell_j} v = \cos(2\pi \ell_j q_j)\, v = v = -T_j v .$$
The case $\nu_1 < j \le \nu_2$. We first observe that in this case we have $\ell_j \in 2\mathbb Z$ and the special hyperplane corresponds to $x_j = k_j$ for some $k_j \in \mathbb Z$. We have only to use here that $v$ is antisymmetric with respect to $T_j$.
4.3. Nodal sets and orbits. We would like to analyze the properties of the states $u_q$ in $W_q^{R_\kappa,ta,ts}$ and of the associated $v_q$.

Lemma 4.3. Assume
$$x_0 = (y_1, \dots, y_d) \in N(v_q) \tag{4.7}$$
and that, for some $1 \le j \le \nu_1$,
$$y_j \in \mathbb Z/2 \setminus M_{j,0}. \tag{4.8}$$
Then
$$x_0 + \mathbb Z e_j \subset N(v_q). \tag{4.9}$$
Let $P := P(\partial_{x_1}, \dots, \widehat{\partial_{x_j}}, \dots, \partial_{x_d})$ be a differential operator with constant coefficients, for which no differentiation with respect to $x_j$ appears. Assume in addition that $(P v_q)(x_0) = 0$; then
$$x_0 + \mathbb Z e_j \subset N(P v_q). \tag{4.10}$$
Proof. Let $x_0$ satisfy (4.7) and (4.8). Then, since $v_q$ is totally antisymmetric, we have in particular that
$$v_q(x_1, x_2, \dots, x_j, \dots, x_d) = -v_q(x_1, x_2, \dots, -x_j, \dots, x_d). \tag{4.11}$$
This implies that for $\ell_j = 2 y_j$,
$$T_j x_0 = g_j^{-\ell_j} x_0 \in N(v_q).$$
We apply (2.12), with $k = \ell_j$. When $q_j$ is irrational, then $(R_j v_q)(x_0) = 0$. Equation (2.13) shows that $(g_j^k R_j v_q)(x_0) = 0$ for $k \in \mathbb Z$ and hence by (2.12) also that $(g_j^k v_q)(x_0) = 0$, proving (4.9) for $q_j$ irrational.

Now consider the case that $q_j$ is rational. If $\sin 2\pi \ell_j q_j \ne 0$ then we can proceed as above. Hence assume that $\sin 2\pi \ell_j q_j = 0$. This implies that $|\cos 2\pi \ell_j q_j| = 1$. If $\cos 2\pi \ell_j q_j = 1$ then, remembering that $\ell_j = 2 y_j$, we see that $y_j \in M_{j,0}$, contradicting our assumption (4.8). So it remains to consider the case $\cos 2\pi \ell_j q_j = -1$. Equation (2.13) implies
$$(g_j^{-\ell_j} R_j v_q)(x_0) = -(R_j v_q)(x_0). \tag{4.12}$$
From (2.2) and the definition of $R_j$ it follows that
$$(T_j R_j v_q)(x) = (R_j v_q)(x), \tag{4.13}$$
which means that $R_j v_q$ is symmetric with respect to the reflection $T_j$. We hence obtain $(R_j v_q)(x_0) = 0$ and from (2.13) that for any $k \in \mathbb Z$, $(g_j^k R_j v_q)(x_0) = 0$. By (2.12) we have again that $x_0 + e_j \mathbb Z \subset N(v_q)$, proving (4.9) for the rational case.

The proof of (4.10) does not lead to any new difficulties, since differentiating the equalities like (4.11) appearing in the proof with respect to the variables $x_\ell$ ($\ell \ne j$) does not change the argument. The partial derivatives $\partial_{x_\ell}$ indeed commute with $T_j$ and $R_j$ when $\ell \ne j$.
5. Nodal Sets and Continuity

5.1. Nodal sets for solutions of the Schrödinger operator. As in our previous works [3] and [4] we have to describe the qualitative behaviour of zero sets of real valued distributional solutions of elliptic partial differential equations. We start with a classical result of Bers [1].

Proposition 5.1. Let $\Omega \subset \mathbb R^d$ and suppose that $W \in C^\infty(\Omega)$ is real valued. Suppose $w$ is a nontrivial distributional real valued solution of
$$(-\Delta + W) w = 0 \tag{5.1}$$
in $\Omega$. Then $w \in C^\infty(\Omega)$ and for all $x_0 \in \Omega$ there is a homogeneous harmonic polynomial $P_M \not\equiv 0$ of degree $M \ge 0$ such that
$$w(x) = P_M(x - x_0) + O(|x - x_0|^{M+1}) \tag{5.2}$$
in a neighborhood of $x_0$.

Remark 5.2. There are much more general versions of this proposition. In particular one can allow for a wide class of $W$, and there is also a suitable reformulation if the Laplace operator is replaced by a general elliptic operator of second order.
5.2. On harmonic polynomials. Before we continue, let us collect in a lemma useful results about homogeneous harmonic polynomials $P : \mathbb R^d \to \mathbb R$ of degree $\ell$.

Lemma 5.3.
a) A homogeneous harmonic polynomial cannot be of constant sign, unless it is constant.
b) A homogeneous harmonic polynomial which vanishes on $\bigcup_{j \in I} \{x_j = 0\}$ can be written in the form
$$P_M = \Big( \prod_{j \in I} x_j \Big) Q_{M-|I|}, \quad \text{where } M \ge |I|,\ |I| \le d,$$
and where $|I|$ denotes the cardinality of $I$. Moreover, $Q_{M-|I|}$ cannot have a constant sign inside $\bigcap_{j \in I} \{x_j > 0\}$ unless it is constant.
c) Let $P_M$ be a homogeneous harmonic polynomial of order $M$, such that $N(P_M) = \bigcup_{j \in I} \{x_j = 0\}$. Then $M = |I|$ and
$$P_M(x) = c \prod_{j \in I} x_j ,$$
for some constant $c \ne 0$.
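Parts b) and c) of the lemma can be illustrated on a concrete polynomial. The example below is an assumption chosen for this sketch, not taken from the paper: in $d = 2$ with $I = \{1, 2\}$, the polynomial $P = x_1 x_2 (x_1^2 - x_2^2)$ is harmonic of degree 4, vanishes on both coordinate axes, and its quotient $Q = x_1^2 - x_2^2$ takes both signs in the open quadrant.

```python
def P(x1, x2):
    # Homogeneous of degree 4, vanishing on {x1 = 0} and {x2 = 0}.
    return x1 * x2 * (x1 * x1 - x2 * x2)

def Q(x1, x2):
    # Quotient of P by the product x1 * x2.
    return x1 * x1 - x2 * x2

def laplacian(f, x1, x2, h=1e-2):
    # Five-point stencil; for this quartic the stencil error vanishes
    # because the pure fourth derivatives are zero.
    return (f(x1 + h, x2) + f(x1 - h, x2) + f(x1, x2 + h) + f(x1, x2 - h)
            - 4.0 * f(x1, x2)) / (h * h)

# Harmonicity at a few arbitrary sample points.
samples = [(0.7, 0.3), (-1.2, 0.4), (0.5, -0.9)]
print("max |Laplacian P|:", max(abs(laplacian(P, a, b)) for a, b in samples))

# Q changes sign inside the open quadrant {x1 > 0, x2 > 0}.
print("Q(2, 1) =", Q(2.0, 1.0), " Q(1, 2) =", Q(1.0, 2.0))

# P also vanishes on the diagonal x1 = x2, so its zero set is strictly
# larger than the union of the coordinate axes, in line with part c).
print("P(1, 1) =", P(1.0, 1.0))
```

Since $M = 4 > |I| = 2$ here, part c) predicts exactly this: the nodal set cannot reduce to the two axes.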
Proof. First (see for example [9]), we have for any homogeneous harmonic polynomial of order $M$
$$P_M(x) = |x|^M Y_M(\omega), \tag{5.3}$$
where $\omega = x/|x|$ and $Y_M : S^{d-1} \to \mathbb R$ is an eigenfunction of the Laplace-Beltrami operator $\Delta_S$ of the standard unit sphere $S^{d-1}$. This means that $-\Delta_S Y_M = \lambda_M Y_M$ and the eigenvalue $\lambda_M$ satisfies $\lambda_0 = 0 < \lambda_M < \lambda_{M'}$ for $0 < M < M'$. In particular $Y_M$ is always orthogonal to a constant in $L^2(S^{d-1})$. This shows a). Now let
$$N_I = \{x \in \mathbb R^d \mid x_j = 0 \text{ for } j \in I\} \cap S^{d-1} \tag{5.4}$$
and let $D_I^+ = \{\omega \in S^{d-1} \mid \omega_j > 0 \text{ for } j \in I\}$. We first observe that $Y_I^0(\omega) := \prod_{j \in I} \omega_j$ is strictly positive in $D_I^+$, and is consequently the unique eigenfunction (up to a constant) corresponding to the lowest eigenvalue $\lambda_{|I|}$ of $-\Delta_S$ on $D_I^+$ with Dirichlet boundary conditions on $\partial D_I^+$. Consequently $Y_M(\omega) = Q(\omega) Y_I^0(\omega)$ is an eigenfunction corresponding to a higher eigenvalue of $-\Delta_S$ on $D_I^+$. This implies that $Y_M$ has to change sign in $D_I^+$, hence there are $\omega_\pm \in D_I^+$ such that $Y_M(\omega_+) > 0$ and $Y_M(\omega_-) < 0$. By the homogeneity of $P_M$ this completes the proof of b). Let us complete the proof of c). If $M > |I|$, we get immediately by b) a contradiction on the zero set. So $M = |I|$ and the polynomial $Q$ must be constant.

5.3. Continuity. We are interested in the dependence of $M_q(u_q(\alpha)) = N(v_q(\alpha))$ and in particular of $N_q(\alpha)$ upon $\alpha$. We recall from (4.6) that $N(v_q(\alpha)) = N_q^0 \cup N_q(\alpha)$ and that, while $N_q^0$ is independent of $\alpha$, $N_q(\alpha)$ depends upon $\alpha$.

Proposition 5.4. Suppose $0 < \alpha < \alpha_0(q)$ and that
$$x_0 \notin N_q(\alpha). \tag{5.5}$$
Then there is an $\epsilon > 0$ such that
$$x_0 \notin N_q(\beta) \quad \text{for } \beta \in (\alpha - \epsilon, \alpha + \epsilon). \tag{5.6}$$
Vice versa, assume that $0 < \alpha < \alpha_0(q)$ and that
$$x_0 \in N_q(\alpha). \tag{5.7}$$
Then, for each $\epsilon > 0$, there is a $\delta > 0$ such that for $\alpha - \delta < \beta < \alpha + \delta$,
$$\{x \in \mathbb R^d \mid |x - x_0| < \epsilon\} \cap N_q(\beta) \ne \emptyset. \tag{5.8}$$

Proof. We recall that $v_q(\alpha)$ was constructed as an analytic family (with respect to the parameter $\alpha$) of real local solutions of a second order elliptic equation with real coefficients, depending also analytically on $\alpha$:
$$(-\Delta + \alpha V - \lambda_q(\alpha)) v_q(\alpha) = 0 .$$
We start with the proof of the first part of the proposition. Hence we want to show that (5.5) implies (5.6). There are two possibilities,
(i) $x_0 \notin N_q^0$ and (ii) $x_0 \in N_q^0$. In case (i) the implication follows just from the continuity of $v_q(\alpha)$ with respect to $\alpha$. For the case (ii), namely $x_0 \in N_q^0 \setminus N_q(\alpha)$, we will use the consequences of Proposition 5.1. Without loss of generality assume that $x_0 = (y_1, y_2, \dots, y_d)$ with $y_j \in M_{j,0}$ for $j \in I$, where $I$ is a subset of $\{1, 2, \dots, d\}$. Or more explicitly
$$x_0 \in \bigcap_{j \in I} \{x \in \mathbb R^d \mid x_j = y_j\}. \tag{5.9}$$
Using Lemma 5.3 (point c), we get immediately

Lemma 5.5.
a) Assume that $x_0 \notin N_q(\alpha)$ but that it is in $N_q^0$. Then there exists $c \ne 0$ such that
$$v_q(x) = c \prod_{j \in I} (x_j - y_j) + O\big(|x - x_0|^{|I|+1}\big) \tag{5.10}$$
in a neighborhood of $x_0$, where the leading harmonic polynomial is just the first term on the right hand side. In particular
$$\Big( \Big( \prod_{j \in I} \partial_{x_j} \Big) v_q(\alpha) \Big)(x_0) \ne 0. \tag{5.11}$$
b) If $x_0 \in N_q(\alpha)$ and satisfies in addition (5.9) then the left-hand side of (5.11) equals zero, hence the leading homogeneous harmonic polynomial must have a degree strictly larger than $|I|$.

Remark 5.6. By continuity with respect to $\alpha$, the property (5.11) continues to hold for $\beta \in (\alpha - \epsilon, \alpha + \epsilon)$ for sufficiently small $\epsilon > 0$, proving the first assertion of Proposition 5.4.

Next we are going to prove that (5.7) implies (5.8). Again we have two cases: (i) $x_0 \in N_q(\alpha) \setminus N_q^0$ and (ii) $x_0 \in N_q(\alpha) \cap N_q^0$. Case (i) follows again immediately from continuity. To be more precise, $x_0 \in N_q(\alpha)$ implies that in any ball $B_\epsilon(x_0) = \{|x - x_0| < \epsilon\}$ there are two points, say $x^+$, $x^-$, with $(v_q(\alpha))(x^+) > 0$ and $(v_q(\alpha))(x^-) < 0$. This is a consequence of Bers's Theorem and of a) in Lemma 5.3. One can also use Harnack's inequality (see for instance [2]). If $\beta$ is sufficiently close to $\alpha$, the signs of $(v_q(\beta))(x^\pm)$ will not change by continuity, hence along any path joining $x^+$ and $x^-$, $v_q(\beta)$ will have a zero. Finally we have to consider case (ii). As in the proof of the first part of our proposition we assume $x_0 = (y_1, y_2, \dots, y_d)$ with $y_j \in M_{j,0}$ for $j \in I$. It suffices to show that $v_q(\alpha)$ has for every $\epsilon > 0$ both signs in
$$\tilde B_\epsilon(x_0) := \{x \in \mathbb R^d \mid |x - x_0| < \epsilon\} \setminus \{x \in \mathbb R^d \mid x_j = y_j \text{ for } j \in I\}.$$
From Proposition 5.1 and Lemma 5.5 we have
$$(v_q(\alpha))(x) = P_M(x - x_0) + O\big(|x - x_0|^{M+1}\big) \tag{5.12}$$
with $M > |I|$. Hence it suffices to show that $P_M(x - x_0)$ has both signs in $\tilde B_\epsilon(x_0)$ for all $\epsilon > 0$. First we note that $P_M(x - x_0)$ vanishes identically on the set $\{x \in \mathbb R^d \mid x_j = y_j\}$, and therefore has the form
$$P_M(x - x_0) = Q(x_1 - y_1, x_2 - y_2, \dots, x_d - y_d) \prod_{j \in I} (x_j - y_j), \tag{5.13}$$
where $Q$ is a homogeneous polynomial whose degree must be at least one. We then use Lemma 5.3.

6. Nodal Sets and Canonicity

Although it will not appear explicitly in the notations, all the notions of canonicity which will be considered depend on the choice of a given $q \in [0,\tfrac12]^d$ and of the associated $0 \le \nu_1 \le \nu_2 \le d$.

6.1. Canonicity. We recall that $M_{j,0}$ and $M_{j,1}$ were introduced in (4.3). Suppose $z \in M_{j,1}$ for some $1 \le j \le \nu_1$. For each $z \in M_{j,1}$, let $J(z)$ be the largest open interval containing $z$ such that $J(z) \cap \mathbb Z/2 = \emptyset$, so that the endpoints of $J(z)$ are points in $\mathbb Z/2$ and $J(z)$ is an interval of length $1/2$. Define
$$J_j = \bigcup_{z \in M_{j,1}} J(z) \tag{6.1}$$
and let
$$A_q = \bigcup_{j=1}^{\nu_1} \{x \in \mathbb R^d \mid x_j \in J_j\}. \tag{6.2}$$
We observe that
$$A_q \text{ is open} \tag{6.3}$$
and is just a thickening of $N_q(0)$ in which $N_q(\alpha)$ should be contained for small $\alpha$. We also observe that $\partial A_q$ is a union of hyperplanes defined by $\{x_j = y_j\}$ for some $j \in \{1, \dots, \nu_1\}$ and some $y_j \in \partial J_j$.

Definition 6.1. We call a normal canonical hyperplane any hyperplane contained in $\partial A_q$.

Remembering the definition of a special canonical hyperplane (cf. Definition 4.1), we get the natural notion of canonical hyperplane, this hyperplane being normal or special according to the previous definitions.

Definition 6.2. We call $v_q(\alpha)$, respectively $N_q(\alpha)$, canonical if
$$N_q(\alpha) \subset A_q . \tag{6.4}$$
For a given open set $\Omega$ in $\mathbb R^d$, we will say that $v_q(\alpha)$ is canonical in $\Omega$ if
$$N_q(\alpha) \cap \Omega \subset A_q . \tag{6.5}$$
A suitable definition for the case that $\Omega$ is closed will be given in the next subsection. One crucial step in the proof of our result is the following proposition:

Proposition 6.3. For all $\alpha \in [0, \alpha_0(q))$, $N_q(\alpha)$ is canonical.

We will give the proof in the next sections.

6.2. Localized canonicity. We introduce various notions which will be useful for our considerations. In particular we will have to investigate, having Proposition 5.4 in mind, how canonicity can be violated. For this purpose we introduce a localized version of canonicity.

Definition 6.4. We shall say that a box $L$ is canonical if there exists $k \in \mathbb R_+^{\nu_1}$ such that
$$L := L(k_1, \dots, k_{\nu_1}) = \{x \in \mathbb R^d \mid -k_j \le x_j \le k_j, \text{ for } 1 \le j \le \nu_1\}, \tag{6.6}$$
and if the $\{x_j = k_j\}$ are normal canonical hyperplanes.

Remark 6.5. Note that this definition implies that the $k_j$ are half integers, hence satisfy $0 < k_j \in \mathbb N/2$. Note also that we do not need to localize with respect to the variables $x_j$ ($j > \nu_1$). The simple reason is that we shall consider sets which are invariant under the translations $g_j$ ($j > \nu_1$). In particular we observe that
$$g_j N_q(\alpha) = N_q(\alpha), \quad \forall j > \nu_1 .$$
This is an immediate consequence of (2.10) and of our choice of symmetries. We observe that the union of canonical boxes covers $\mathbb R^d$, and this will be enough for analyzing the localization of the nodal sets using the symmetries of our eigenstates.

In analogy to Definition 6.2 we introduce for a given closed box the notion of $L$-canonicity.

Definition 6.6. We say that $v_q(\alpha)$, respectively $N_q(\alpha)$, is $L$-canonical if there exists an open neighborhood $V(L)$ of $L$ such that
$$N_q(\alpha) \cap V(L) \subset A_q . \tag{6.7}$$
From this definition it is natural to formulate the following

Lemma 6.7. If $L$ is canonical, then the set of $\alpha$'s in $[0, \alpha_0(q))$ such that $v_q(\alpha)$ is $L$-canonical is an open set of $[0, \alpha_0(q))$.

When $\nu_1 = d$, the proof of the lemma is immediate from Proposition 5.4 because $L$ is compact. When $\nu_1 < d$, Remark 6.5 permits us to work with $L \cap (\mathbb R^{\nu_1} \times [0,1]^{d-\nu_1})$. Note that the reduction to the compact case is important. A difficulty occurs, in the non-compact case, when $N_q(\alpha)$ is canonical but the distance of $N_q(\alpha)$ to $\partial A_q$ is equal to zero. Typically this is the case when the $q_j$'s are irrational (look at the function $t \mapsto \sin 2\pi q t$). This is why we introduce here this localization in canonical boxes. Observing that, for $\alpha = 0$, $v_q(0)$ is canonical (see (4.2)), the proof of the lemma also gives

Lemma 6.8. If $L$ is canonical, then there exists $\epsilon(q, L) > 0$ such that $v_q(\alpha)$ is $L$-canonical for $\alpha \in [0, \epsilon(q, L))$.
6.3. Breaking of local canonicity. We would like to analyze how the local canonicity can be lost for the first time when increasing $\alpha$. Lemma 6.8 says that this can only occur for $\alpha > 0$. According to Lemma 6.7, it is natural to introduce the

Definition 6.9. For a given canonical box $L$, we define a critical $\alpha_1(q; L)$ by
$$\alpha_1(q; L) = \alpha_0(q), \tag{6.8}$$
if $v_q(\alpha)$ is $L$-canonical for any $\alpha \in [0, \alpha_0(q))$, and otherwise by
$$\alpha_1(q; L) = \inf\{\alpha \in (0, \alpha_0(q)) \mid v_q(\alpha) \text{ is not } L\text{-canonical}\}. \tag{6.9}$$
Notice that we have already shown (cf. Lemma 6.8) that $\alpha_1(q; L) > 0$, and that, by Lemma 6.7, $v_q(\alpha_1(q; L))$ is not $L$-canonical if $\alpha_1(q; L) < \alpha_0(q)$.

6.4. Former local canonicity. For a given $L$, we now analyze the notion of former $L$-canonicity for $v_q(\alpha_1(q; L))$. By definition, assuming that $\alpha_1(q; L) < \alpha_0(q)$, $v_q(\alpha)$ is an analytic family (with respect to $\alpha$) such that $v_q(\alpha)$ is $L$-canonical for $\alpha < \alpha_1(q; L)$, and we observed already that $v_q(\alpha_1(q; L))$ is not $L$-canonical. We first note that by continuity, $v_q(\alpha_1(q; L))$ has the following weaker property of local canonicity:
$$N_q(\alpha_1(q; L)) \cap L \subset \overline{A_q} . \tag{6.10}$$
This leads to the introduction of the localized touching set:

Definition 6.10. The $L$-touching set of $v_q(\alpha_1(q; L))$ is by definition
$$T_q(\alpha_1(q; L); L) = N_q(\alpha_1(q; L)) \cap L \cap \partial A_q . \tag{6.11}$$

Remark 6.11. We observe that a touching point necessarily belongs to at least one normal canonical hyperplane.

The role played by this touching set appears in the following

Lemma 6.12. If $\alpha_1(q; L) < \alpha_0(q)$, then $T_q(\alpha_1(q; L); L) \ne \emptyset$.

Proof. Let us first express that $v_q(\alpha_1(q; L))$ is not $L$-canonical. We consider a decreasing family of open neighborhoods $V_n$ of $L$, such that $\cap_n V_n = L$ and $V_{n+1} \subset V_n$. For each $n$, there should be some point $z_n$ in $N_q(\alpha_1(q; L)) \cap V_n$ such that $z_n \notin A_q$. Using Remark 6.5, we can in addition impose that $z_n$ is bounded. Let us extract a converging subsequence, still denoted by $z_n$, and let us consider the limit $z_\infty$. It is clear that $z_\infty$ belongs to $N_q(\alpha_1(q; L)) \cap L$ and that $z_\infty \notin A_q$. So it remains to show that $z_\infty \in \partial A_q$. We now discuss two possible cases. If $z_\infty$ was in the interior of $L$, we would get a contradiction by continuity unless $z_\infty \in \partial A_q$. If $z_\infty \in \partial L$, then $z_\infty \in \overline{A_q}$ because our box is canonical, and it is not in $A_q$ by the previous step. Consequently $z_\infty \in \partial A_q$.

Our aim is to show that $\alpha_1(q; L) = \alpha_0(q)$; the basic idea would be to prove that this touching set is actually empty.
6.5. From local to global. The proof of Proposition 6.3 will be an immediate consequence of the following Proposition 6.13. There exists an increasing exhausting sequence L(n) (n ∈ N) of canonical boxes, such that L(n) ⊂ L(n+1) , ∪n L(n) = Rd and such that α1 (q; L(n) ) = α0 (q) . From now on, we will work on the localized problem. An additional condition on the canonical boxes will be given in (7.5).
7. Proof of Proposition 6.13

7.1. Preliminary remarks. The proof of Proposition 6.13 is modelled on similar considerations in [3] and [4], which were related to the $d = 1$ case and the case of a strip in $\mathbb R^2$. It was possible in these papers to avoid the localization at this step by treating first the rational case and then treating the irrational case by a comparison with the rational case. The proof given here will not distinguish between the two cases, the only difference between rational and irrational being the presence or not of nontrivial special hyperplanes. We recall that we have to analyze how the local canonicity can be broken in a canonical box $L$ and we proceed by contradiction. Hence we assume by contradiction
$$0 < \alpha_1(q; L) < \alpha_0(q). \tag{7.1}$$
The contradiction will be shown for boxes $L := L(k)$ such that $\inf_j k_j$ is large enough in a sense which will be given in (7.5). Let us first consider as a warm-up the cases $d = 1$ and $d = 2$.

7.2. The case $d = 1$. Of course for $d = 1$ the result (and much more [8]) is known and treated in [3] and [4]. Our canonical box $L$ is some interval $L(k) := [-k, k]$ where $k$ is a half integer. By former local canonicity, $N(\alpha_1(q; L(k))) \cap L(k)$ must be contained in $\overline{A_q} \cap L(k)$. The breaking of $L(k)$-canonicity has to occur at a touching point $t_0 \in [-k, k]$ which is a half integer. By Lemma 4.3, $t_0 + \mathbb Z$ belongs to the nodal set of $v$. Each of these points should belong to the boundary of a closed interval containing exactly one zero of the function $t \mapsto f_q(t) = \sin 2\pi q t$. This implies a contradiction as follows. We recall here the argument of [4]. Let us consider the function $f_q$ with $q \in (0, \tfrac12)$. Let $P_{t_0} = \{t_0 + \mathbb Z\}$. The argument is then simply that $P_{t_0}$ cannot have the property $\mathcal P_0$ that, for any $t \in P_{t_0}$, there exists $s(t)$ such that $|s(t) - t| < \tfrac12$ and $\sin 2\pi q s(t) = 0$. The contradiction is obtained by counting, for $k \in \mathbb N$ large, the number of zeros of the function $f_q$ in $[-k, k]$. By the property $\mathcal P_0$, we would find that this number is larger than $2k - 1$, in contradiction with the computation based on the distribution of the zeros of the sine function, which gives a number asymptotic to $4qk$ as $k \to +\infty$.
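The counting step above is elementary and can be made explicit. The zeros of $s \mapsto \sin 2\pi q s$ in $[-k, k]$ are the points $m/(2q)$ with $|m| \le 2qk$, so their number is $2\lfloor 2qk \rfloor + 1 \le 4qk + 1$. The sketch below uses exact rational arithmetic; the function names and the explicit threshold $3/(2 - 4q)$ are assumptions of this illustration (a sufficient bound, not necessarily the constant used by the authors).

```python
from fractions import Fraction
from math import floor

def count_zeros(q, k):
    # Number of zeros of s -> sin(2 pi q s) in [-k, k], for rational q > 0.
    return 2 * floor(2 * Fraction(q) * Fraction(k)) + 1

def sufficient_K(q):
    # Since count <= 4qk + 1, the bound count < 2k - 2 holds
    # as soon as k > 3 / (2 - 4q). Requires 0 < q < 1/2.
    q = Fraction(q)
    return 3 / (2 - 4 * q)

q = Fraction(49, 100)  # close to the limiting value 1/2
K = sufficient_K(q)
print("sufficient threshold:", K)

# Small boxes can violate the bound ...
print("k = 10:", count_zeros(q, 10), "zeros versus 2k - 2 =", 2 * 10 - 2)

# ... but every half-integer k above the threshold satisfies it.
ks = [Fraction(n, 2) for n in range(2 * int(K) + 1, 2 * int(K) + 200)]
print("bound holds beyond threshold:", all(count_zeros(q, k) < 2 * k - 2 for k in ks))
```

The threshold blows up as $q \to 1/2$, which matches the fact that the limiting cases $q = 0$ and $q = 1/2$ are handled separately.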
Here we keep for future reference the property which was used:

Lemma 7.1. For any $q \in (0, \tfrac12)$, there exists a constant $K(q)$ such that, for all $k \ge K(q)$, the number of zeroes in $[-k, k]$ of the function $s \mapsto \sin 2\pi q s$ is less than $2k - 2$.

So we have shown that for any $k$ such that
$$L(k) \text{ is canonical and } k \ge K(q), \tag{7.2}$$
there cannot be any touching point, in contradiction with Lemma 6.12. The treatment of the limiting cases $q = 0$ and $q = \tfrac12$ is easy. This ends the proof of the one-dimensional case.

7.3. The case $d = 2$.

7.3.1. The subcase $q \in (0, \tfrac12)^2$. We consider a canonical box $L = L(k_1, k_2)$ and we assume in addition that
$$k_j \ge K(q_j), \quad \text{for } j = 1, 2 . \tag{7.3}$$
The breaking of canonicity should occur at a touching point $x_0 \in L$ such that at least one component $y_j$ satisfies
$$y_j \in \frac{\mathbb Z}{2} \setminus M_{j,0} . \tag{7.4}$$
Without loss of generality, we can suppose that $j = 1$. The point $x_0$ necessarily belongs to a normal canonical hyperplane $H_1(y_1) := \{x \mid x_1 = y_1\}$. In this case, we have seen by Lemma 4.3 that $x_0 + \mathbb Z e_1$, with $(e_1)_\ell = \delta_{1\ell}$, also belongs to the nodal set of $v$, $N(v)$. The first coordinate of $x_0$ being a half integer, we observe that all the points $x_0 + p e_1$ ($p \in \mathbb Z$) have the same property. There are actually two exclusive cases. In the first case, each of the points $x_0 + p e_1$ ($p \in \mathbb Z$) meets a canonical hyperplane orthogonal to $e_1$ and, using Lemma 7.1 and condition (7.3), this would imply too many zeros for the function $t \mapsto \sin 2\pi q_1 t$. In the second case, there exists $n_1 \in \mathbb Z$ such that $x_1 := x_0 + n_1 e_1$ is a zero which no longer belongs to some canonical hyperplane $H_1$ orthogonal to $e_1$. But this implies that $x_1$ should, for some $y_2 := (x_1)_2$,
• either belong to a special canonical hyperplane $H_2(y_2)$
• or to a normal canonical hyperplane $H_2(y_2)$.

In the first sub-case, we use the second part of Lemma 4.3 and get a contradiction with the property that, at this point, the normal derivative of $v$ with respect to the special canonical hyperplane $H_2(y_2)$ should not be zero. We can indeed verify first that $\nabla v(x_0) = 0$, because at $x_0$ we have in this case $v(x_1, x_2) = (x_1 - (x_0)_1)(x_2 - (x_0)_2)\, q(x)$, where $q$ is a $C^\infty$ function. We use Lemma 4.3 for getting the property $\partial_{x_2} v(x_1) = 0$. (Note that the tangent derivative of $v$ along $H_2(y_2)$ is zero, because $H_2(y_2)$ is a special hyperplane. So we have also $\partial_{x_1} v(x_1) = 0$.) In the second sub-case, we can come back to the argument of the first case, with $j = 1$ replaced by $j = 2$. This completes the proof when $d = 2$ under the condition that $k_1$ and $k_2$ satisfy (7.3). It is indeed easy to find an exhausting family of canonical boxes $L^{(n)}$ satisfying (7.3), and this completes the proof of Proposition 6.13.
7.3.2. Border subcases. The limiting cases do not lead to new difficulties. The only cases which remain are:
• $q_1 = 0$, $q_2 \in (0, \tfrac12)$. This case is treated as a one dimensional case (see for example the strip case in [4]).
• $q_1 \in (0, \tfrac12)$, $q_2 = \tfrac12$. This case is treated as the second case.

7.4. The case $d > 2$: Recursion argument. We consider a canonical box $L(k)$ satisfying
$$k_j \ge K(q_j), \quad \text{for } j = 1, \dots, \nu_1 . \tag{7.5}$$
As a consequence of the definition in (6.11), the touching points in $T_q(\alpha_1(q; L); L)$ should belong to the intersection of some canonical hyperplanes, one at least being normal. For each point $x_0$ in $T(\alpha_1(q; L); L)$, we denote by $k(x_0)$ the number of these canonical hyperplanes and by $k_s(x_0)$ the number of the special hyperplanes in which $x_0$ lies. So we have
$$1 \le k(x_0) \le \nu_2, \qquad 0 \le k_s(x_0) < k(x_0). \tag{7.6}$$
Let us now show how we arrive at a contradiction. Let $x_0 \in T(\alpha_1(q; L); L)$ be such that $k(x_0)$ is minimal:
$$k(x_0) = \inf_{x \in T(\alpha_1(q; L); L)} k(x). \tag{7.7}$$
By the second inequality of (7.6), there exists $1 \le j \le \nu_1$ such that the hyperplane $H_j(y_j)$ is a normal canonical hyperplane, where $y_j$ is the $j$th component of $x_0$. We denote by $J^{sp}(x_0)$ the set of the $\ell$'s such that $x_0$ belongs to a special canonical hyperplane orthogonal to $e_\ell$. Let us observe that, by the second statement of Lemma 5.5, we have
$$\Big( \prod_{\ell \in J^{sp}(x_0)} \partial_{x_\ell} v \Big)(x_0) = 0 . \tag{7.8}$$
As for $d = 2$, we consider two cases. In the first case, the argument is identical to the one described in the case $d = 2$ and we get a contradiction with (7.5). In the second case, we observe that, for a new point denoted by $x_1$ ($x_1 = x_0 + p_j e_j$), the number $k(x_1)$ of the canonical hyperplanes containing $x_1$ is equal to $k(x_0) - 1$. On the other hand, the number $k_s(x_1)$ of the special hyperplanes containing $x_1$ remains equal to $k_s(x_0)$.

Lemma 7.2. If $k_s(x_1) < k(x_1)$, that is if $k_s(x_0) < k(x_0) - 1$, then $x_1$ is a touching point in $T(\alpha_1(q; L); L)$.

Proof. The claim follows by inspection of higher derivatives. One should transport the information that there would be a contradiction to $L$-canonicity at $x_0$ if $x_1$ was not touching. This statement is clear when $x_1$ does not belong to any special canonical hyperplane. We know indeed that $v(x_1) = 0$ and it should belong to a normal canonical hyperplane (hence touching by former canonicity). When $x_1$ belongs to some special canonical hyperplanes, we observe that all the derivatives of $v$ at the point $x_1$ with respect to the variables defining these special hyperplanes are equal to zero. If locally near $x_1$, $v^{-1}(0)$ was just the union of these special hyperplanes, we would get a contradiction with Lemma 5.5.
We have then a contradiction with the minimality of $k(x)$ for $x_0$. It remains to treat the case when $k_s(x_1) = k(x_1)$, that is if $x_1$ is exclusively in the intersection of special hyperplanes. The previous argument fails, because $x_1$ is no longer a touching point. One again gets a contradiction in the following way. First, we can apply Lemma 4.3 which gives, remembering (7.8):
$$\Big( \prod_{\ell \in J^{sp}(x_0)} \partial_{x_\ell} v \Big)(x_1) = 0 . \tag{7.9}$$
But this is in contradiction with the inequality (5.11) given in Lemma 5.5 and applied at the point $x_1$.

Remark 7.3. Note that the variables $x_j$, $\nu_2 < j \le d$, are dummy variables in all the discussion.

Remark 7.4. Note that what we have actually proved is that an element $v_q$ which has the property of former $L$-canonicity (hence is a continuous limit, in the sense of Proposition 5.4, of $L$-canonical functions), for a canonical box satisfying (7.5), is $L$-canonical.

8. Multiplicity is one for $q \in [0, 1/2)^d$

In the previous section, we gave the proof that $\alpha_1(q, L^{(n)}) = \alpha_0(q)$ for an exhausting family $L^{(n)}$ of canonical boxes. We would like now to show that $\alpha_0(q) = 1 + \epsilon_0$. Again the proof is by contradiction. We will assume that $\alpha_0(q) < 1 + \epsilon_0$ and show a contradiction. The multiplicity for $\alpha = \alpha_0(q)$ is in this case larger than or equal to 2. From Lemma 3.3 (in particular (3.8)) and using Remark 7.4, we infer that
$$v_q(\alpha_0(q)) = \lim_{\alpha \uparrow \alpha_0(q)} v_q(\alpha) \tag{8.1}$$
is still canonical. We also have
$$U_q(\alpha_0(q)) = U_q^1(\alpha_0(q)) \oplus U_q^2(\alpha_0(q)) \tag{8.2}$$
with $v_q(\alpha_0(q)) \in U_q^1(\alpha_0(q))$. By taking real linear combinations we see that there must be an $f \in U_q(\alpha_0(q))$ such that $f$ is not canonical. To be definite we can pick $f$ so that $x_0 = (1/2, 1/2, \dots, 1/2) \in N(f)$. This $x_0$ is obviously not in $A_q$. Consider
$$w_\theta = v_q(\alpha_0(q)) \cos\theta + f \sin\theta . \tag{8.3}$$
Obviously $w_0$ is canonical and $w_{\pi/2}$ is not canonical. We choose a canonical box $L$ satisfying (7.5) and define
$$\theta_0 = \inf\{0 < \theta \le \pi/2 \mid w_\theta \text{ is not } L\text{-canonical}\}. \tag{8.4}$$
Indeed, we have just to mimic the proof of Proposition 6.3 and obtain then that $w_{\theta_0}$ has to be also $L$-canonical. This leads to a contradiction as in the proof of Proposition 6.13.

Remark 8.1. The limit cases do not cause any new problems.
9. Variational Principle and Canonicity

In this section we give the proof of Proposition 2.16, whose role was explained in Subsubsect. 2.3.2. We recall indeed that in the case when some of the $q_j$ are equal to zero, there is a specific problem to solve. We have shown that the multiplicity is one under the additional restriction that the state is totally symmetric with respect to these variables and, say, totally antisymmetric with respect to the variables corresponding to $q_j = \tfrac12$. We would like to show that the multiplicity is one without this assumption that the state should be totally symmetric with respect to the variables corresponding to $q_j = 0$. As done in the paper [3] in another case, we shall implement the variational principle. Let us treat for simplicity the case when $d = 2$ and $q = (q_1, 0)$. According to the symmetry with respect to $T_2$, the space $W_q^{R_\kappa}$ is decomposed into the direct sum:
$$W_q^{R_\kappa} = W_q^{R_\kappa,a} \oplus W_q^{R_\kappa,s} .$$
Corresponding to this decomposition, we have two ground state energies, which are denoted by $\lambda_q^s$ and $\lambda_q^a$. We would like to show:

Lemma 9.1. $\lambda_q^s < \lambda_q^a$.

Proof. Let us assume by contradiction that $\lambda_q^s \ge \lambda_q^a$. Let $u_a$ be a corresponding groundstate. By antisymmetry, $u_a$ vanishes on the lines $x_2 = \tfrac{j}{2}$ ($j \in \mathbb Z$). These lines determine bands of width $\tfrac12$. Let $\hat u_a$ be the symmetric and periodic function with respect to $T_2$ which coincides with $u_a$ in the band $0 < x_2 < \tfrac12$. Its energy is equal to the energy of $u_a$ and hence equal to $\lambda_q^a$. But $\hat u_a$ being symmetric with respect to $T_2$, this energy should also satisfy $\lambda_q^s \le \lambda_q^a$. So we get $\lambda_q^s = \lambda_q^a$, and $\hat u_a$ is consequently an eigenvector in $W_q^{R_\kappa,s}$. The eigenvalue $\lambda_q^s$ being simple, $\hat u_a$ is collinear to $u_s$. But $\hat u_a$, which has the same zeros as $u_a$, vanishes on the lines $x_2 = \tfrac{j}{2}$, and the associated antisymmetric $v_a = \tfrac12 (I - T_1) \hat u_a$ has the same properties and is not canonical. This gives the contradiction.

Remark 9.2. The proof is not limited to this particular case. The small modifications in the general case are left to the reader.

Acknowledgement. It is a pleasure to thank the Mittag-Leffler Institute where part of this work was done. T. H-O also wants to thank M. Hoffmann-Ostenhof for helpful discussions.
Spectral Theory for Periodic Schrödinger Operators
References
1. Bers, L.: Local behaviour of solutions of general linear equations. Commun. Pure Appl. Math. 8, 473–496 (1955)
2. Gilbarg, D., Trudinger, N.S.: Elliptic partial differential equations of second order. Berlin-Heidelberg-New York: Springer, 1983
3. Helffer, B., Hoffmann-Ostenhof, M., Hoffmann-Ostenhof, T., Nadirashvili, N.: Spectral theory for the dihedral group. Geom. Funct. Anal. 12(5), 989–1017 (2002)
4. Helffer, B., Hoffmann-Ostenhof, T., Nadirashvili, N.: Periodic Schrödinger operators and Aharonov Bohm Hamiltonians. Moscow Math. J. 3, 45–61 (2003)
5. Hoffmann-Ostenhof, T., Michor, P., Nadirashvili, N.: Bounds on the multiplicity of eigenvalues of fixed membrane. Geom. Funct. Anal. 9, 1169–1188 (1999)
6. Kato, T.: Perturbation Theory for Linear Operators. Second edition, Berlin-Heidelberg-New York: Springer, 1977
7. Kirsch, W., Simon, B.: Comparison theorems for the gap of Schrödinger operators. J. Funct. Anal. 75(2), 396–410 (1987)
8. Reed, M., Simon, B.: Methods of modern mathematical physics IV: Analysis of operators. New York: Academic Press, 1978
9. Stein, E., Weiss, G.: Introduction to Fourier analysis on Euclidean spaces. Princeton, New Jersey: Princeton University Press (sixth printing), 1990

Communicated by B. Simon
Commun. Math. Phys. 242, 531–545 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0955-9
Communications in
Mathematical Physics
Inequalities for Trace Norms of 2 × 2 Block Matrices Christopher King Department of Mathematics, Northeastern University, Boston, MA 02115, USA. E-mail: [email protected] Received: 20 February 2003 / Accepted: 4 June 2003 Published online: 14 October 2003 – © Springer-Verlag 2003
Abstract: This paper derives an inequality relating the p-norm of a positive 2 × 2 block matrix to the p-norm of the 2 × 2 matrix obtained by replacing each block by its p-norm. The inequality had been known for integer values of p, so the main contribution here is the extension to all values p ≥ 1. In a special case the result reproduces Hanner’s inequality. A weaker inequality which applies also to non-positive matrices is presented. As an application in quantum information theory, the inequality is used to obtain some results concerning maximal p-norms of product channels.

1. Introduction and Statement of Results

Quantum information theory has raised some interesting mathematical questions about completely positive trace preserving maps. Such maps describe the evolution of open quantum systems, or quantum systems in the presence of noise [3]. Many of these questions are related to the quantum entropy of states, and the associated notion of the trace norm, or p-norm, of a state. In one case [7] the investigation of the additivity question for product channels (which will be explained in Sect. 5) led to an inequality for p-norms of positive 2 × 2 block matrices for integer values of p. The present paper is devoted to showing that this inequality extends to non-integer values of p. Some implications of this result for the additivity question are presented, as well as a somewhat weaker inequality which applies to all 2 × 2 block matrices.

The inequality for positive matrices turns out to be closely related to Hanner’s inequality [6] for the matrix spaces $C_p$ (these matrix spaces are the non-commutative versions of the function spaces $L_p$). The precise relation between these results will be described after the statement of Theorem 1 below. Hanner’s inequality for $C_p$ was first established by Tomczak-Jaegermann [10] for even integer values of p. In later work Ball, Carlen and Lieb [2] extended Tomczak-Jaegermann’s results to non-integer values of p, although still with some restrictions in the range 4/3 ≤ p ≤ 4. Many of the ideas and methods used in the proofs of Theorems 1 and 2 in this paper are taken from the paper by Ball, Carlen and Lieb. The heart of the proof of Theorem 1 is the convexity result presented below in Lemma 4, which extends a result used by Hanner [6] in his original paper.

Let $M$ be a $2n \times 2n$ positive semi-definite matrix. It can be written in the block form
$$M = \begin{pmatrix} X & Y \\ Y^* & Z \end{pmatrix}, \tag{1}$$
where $X, Y, Z$ are $n \times n$ matrices. The condition $M \ge 0$ requires that $X \ge 0$ and $Z \ge 0$, and also that $Y = X^{1/2} R Z^{1/2}$, where $R$ is a contraction. Recall that the p-norm of a matrix $A$ is defined as
$$\|A\|_p = \left( \operatorname{Tr} (A^* A)^{p/2} \right)^{1/p}. \tag{2}$$
Define the $2 \times 2$ matrix
$$m = \begin{pmatrix} \|X\|_p & \|Y\|_p \\ \|Y\|_p & \|Z\|_p \end{pmatrix}. \tag{3}$$
From Hölder’s inequality it follows that
$$\|Y\|_p = \|X^{1/2} R Z^{1/2}\|_p \le \|X\|_p^{1/2}\, \|Z\|_p^{1/2}, \tag{4}$$
which implies that $m \ge 0$ also.

Theorem 1. Let $M$ and $m$ be defined as in (1) and (3). The following inequalities hold:
a) for $1 \le p \le 2$,
$$\|M\|_p \ge \|m\|_p, \tag{5}$$
b) for $2 \le p \le \infty$,
$$\|M\|_p \le \|m\|_p. \tag{6}$$

Theorem 1 is easily proved for integer values of p using Hölder’s inequality (see [7] for details). In the case where $X = Z$ and $Y = Y^*$, the norms of $M$ and $m$ simplify in the following way:
$$\|M\|_p^p = \|X + Y\|_p^p + \|X - Y\|_p^p, \tag{7}$$
$$\|m\|_p^p = \left( \|X\|_p + \|Y\|_p \right)^p + \left( \|X\|_p - \|Y\|_p \right)^p. \tag{8}$$
With these substitutions, the inequalities (5) and (6) are seen to be special cases of Hanner’s inequality [6] for the matrix spaces Cp . Our results apply only when M is positive semidefinite, which in turn requires that the matrices X + Y and X − Y be positive semidefinite. It is conjectured [2] that Hanner’s inequality holds for all complex matrices X and Y . Tomczak-Jaegermann [10] established the general inequality when p is an even integer. Later Ball, Carlen and Lieb [2] proved it for all p ≥ 1 except in the interval 4/3 ≤ p ≤ 4. For values of p in this interval, they were able to establish Hanner’s inequality under the same assumption as in Theorem 1, namely that the matrices X + Y and X − Y are positive semidefinite (although this conclusion follows from the proofs presented in the paper [2], the result for the subinterval 2 ≤ p ≤ 4 in Theorem 2 of that paper states the (incorrect) conditions X > 0 and Y > 0 [4]). The next theorem presents a weaker pair of inequalities which hold for all 2 × 2 block matrices.
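As a numerical aside, Theorem 1 can be spot-checked on random positive block matrices. The sketch below is not part of the proof: it assumes NumPy, and the helper names (`schatten_norm`, `random_psd_blocks`, `block_norm_matrix`, `theorem1_gap`) are hypothetical. It builds $M = GG^*$ from a random complex $G$, forms the 2 × 2 matrix $m$ of block norms as in (3), and returns $\|M\|_p - \|m\|_p$, which should be nonnegative for 1 ≤ p ≤ 2 and nonpositive for p ≥ 2.

```python
import numpy as np

def schatten_norm(a, p):
    """Schatten p-norm of (2): the l^p norm of the singular values of a."""
    s = np.linalg.svd(a, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def random_psd_blocks(n, rng):
    """Random 2n x 2n positive semidefinite matrix with its blocks X, Y, Z."""
    g = rng.standard_normal((2 * n, 2 * n)) + 1j * rng.standard_normal((2 * n, 2 * n))
    m_big = g @ g.conj().T  # G G^* is always positive semidefinite
    return m_big, m_big[:n, :n], m_big[:n, n:], m_big[n:, n:]

def block_norm_matrix(x, y, z, p):
    """The 2 x 2 matrix m of (3), built from the p-norms of the blocks."""
    ny = schatten_norm(y, p)
    return np.array([[schatten_norm(x, p), ny], [ny, schatten_norm(z, p)]])

def theorem1_gap(n, p, seed):
    """Return ||M||_p - ||m||_p for one random positive M (illustrative only)."""
    rng = np.random.default_rng(seed)
    m_big, x, y, z = random_psd_blocks(n, rng)
    return schatten_norm(m_big, p) - schatten_norm(block_norm_matrix(x, y, z, p), p)

if __name__ == "__main__":
    for p in (1.0, 1.5, 2.0, 3.0):
        gaps = [theorem1_gap(3, p, seed) for seed in range(5)]
        print(p, min(gaps), max(gaps))
```

At p = 1 and p = 2 the gap should vanish up to rounding, since the inequality becomes an equality there. A random test of this kind can only fail to find a counterexample; it proves nothing.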
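Theorem 2. Let $X, Y, Z, W$ be complex $n \times n$ matrices. Define the $2 \times 2$ symmetric matrix
$$\alpha = \begin{pmatrix} \|X\|_p & \left( \tfrac12 \|Y\|_p^p + \tfrac12 \|W\|_p^p \right)^{1/p} \\ \left( \tfrac12 \|Y\|_p^p + \tfrac12 \|W\|_p^p \right)^{1/p} & \|Z\|_p \end{pmatrix}. \tag{9}$$
The following inequalities hold:
a) for $1 \le p \le 2$,
$$\left\| \begin{pmatrix} X & Y \\ W & Z \end{pmatrix} \right\|_p \ge 2^{1/p} \left[ \frac{p-1}{2} \operatorname{Tr}(\alpha^2) + \frac{2-p}{4} (\operatorname{Tr}\alpha)^2 \right]^{1/2}, \tag{10}$$
b) for $2 \le p \le \infty$,
$$\left\| \begin{pmatrix} X & Y \\ W & Z \end{pmatrix} \right\|_p \le 2^{1/p} \left[ \frac{p-1}{2} \operatorname{Tr}(\alpha^2) + \frac{2-p}{4} (\operatorname{Tr}\alpha)^2 \right]^{1/2}. \tag{11}$$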
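The two sides of (10) and (11) are easy to evaluate numerically, which gives a quick sanity check of Theorem 2 for blocks that are neither positive nor self-adjoint. The following is a minimal sketch assuming NumPy; `schatten_norm` and `theorem2_sides` are hypothetical helper names, not notation from the paper.

```python
import numpy as np

def schatten_norm(a, p):
    """Schatten p-norm computed from the singular values."""
    s = np.linalg.svd(a, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def theorem2_sides(x, y, w, z, p):
    """Left and right sides of (10)/(11) for the block matrix [[X, Y], [W, Z]]."""
    lhs = schatten_norm(np.block([[x, y], [w, z]]), p)
    # Off-diagonal entry of alpha in (9): a p-mean of ||Y||_p and ||W||_p.
    off = (0.5 * schatten_norm(y, p) ** p + 0.5 * schatten_norm(w, p) ** p) ** (1.0 / p)
    alpha = np.array([[schatten_norm(x, p), off], [off, schatten_norm(z, p)]])
    bracket = 0.5 * (p - 1) * np.trace(alpha @ alpha) + 0.25 * (2 - p) * np.trace(alpha) ** 2
    return lhs, float(2 ** (1.0 / p) * np.sqrt(bracket))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = [rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)) for _ in range(4)]
    for p in (1.3, 2.0, 4.0):
        print(p, theorem2_sides(*blocks, p))
```

For p = 2 both sides reduce to the Hilbert-Schmidt norm of the block matrix, so they should agree up to rounding.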
Again considering the special case where $X = X^* = Z$ and $Y = Y^* = W$, the right side of (10) and (11) becomes
$$2^{1/p} \left[ \|X\|_p^2 + (p-1) \|Y\|_p^2 \right]^{1/2}. \tag{12}$$
The inequalities in this case were derived in [2], and used to establish the 2-uniform convexity (with best constant) of the space $C_p$.

When the block matrix $M$ on the left side of (10) is positive and defined as in (1), the inequality can be easily derived from Theorem 1, as follows. Observe that in this case
$$\|m\|_p = \left[ (u+v)^p + (u-v)^p \right]^{1/p}, \tag{13}$$
where
$$u = \frac{\|X\|_p + \|Z\|_p}{2}, \tag{14}$$
$$v = \left[ \left( \frac{\|X\|_p - \|Z\|_p}{2} \right)^2 + \|Y\|_p^2 \right]^{1/2}. \tag{15}$$
Gross’s two-point inequality [5] states that for all real numbers $a$ and $b$, and all $1 \le p \le 2$,
$$\left( |a+b|^p + |a-b|^p \right)^{1/p} \ge 2^{1/p} \left( a^2 + (p-1)\, b^2 \right)^{1/2}. \tag{16}$$
Applying Gross’s inequality to the right side of (13) and using (5) immediately gives (10). In Sect. 3 we prove Theorem 2 in the general case (where positivity is not assumed) by using some very non-trivial results from the paper [2].

Most of the new work in this paper goes into the proof of Theorem 1, part (a). The proof has three main ingredients: for convenience we state them as separate lemmas here. The first ingredient is a slight modification of a convexity result from [2].
Lemma 3. Let $M = \begin{pmatrix} X & Y \\ Y^* & Z \end{pmatrix} \ge 0$, where $X, Y, Z$ are $n \times n$ matrices. For fixed $Y$, and for $1 \le p \le 2$, the function
$$(X, Z) \longmapsto \operatorname{Tr} M^p - \operatorname{Tr} X^p - \operatorname{Tr} Z^p \tag{17}$$
is jointly convex in $X$ and $Z$.

The second ingredient extends a convexity result of Hanner [6] to the case of positive 2 × 2 matrices with positive coefficients.

Lemma 4. Let $A = \begin{pmatrix} a & c \\ c & b \end{pmatrix} > 0$, where $a, b, c \ge 0$. For $1 \le p \le 2$, the function
$$g(A) = \operatorname{Tr} \begin{pmatrix} a^{1/p} & c^{1/p} \\ c^{1/p} & b^{1/p} \end{pmatrix}^{p} \tag{18}$$
is convex in $A$.

The third ingredient is a monotonicity result for positive 2 × 2 matrices.

Lemma 5. Let $A = \begin{pmatrix} a & c \\ c & b \end{pmatrix} > 0$, where $a, b, c \ge 0$. For fixed $c$, and for $1 \le p \le 2$, the function
$$(a, b) \longmapsto \operatorname{Tr} A^p - a^p - b^p \tag{19}$$
is decreasing in $a$ and $b$.

The paper is organised as follows. In Sect. 2 we present the proof of Theorem 1 using Lemmas 3, 4 and 5. Sect. 3 contains the proof of Theorem 2, which is mostly a straightforward adaptation of the proof of the uniform convexity result in [2]. Lemmas 3, 4 and 5 are proved in Sect. 4, and Sect. 5 describes an application of Theorem 1 in Quantum Information Theory.
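The function $g$ of Lemma 4 involves only a 2 × 2 symmetric matrix, so it can be evaluated in closed form from the two eigenvalues. The sketch below (pure Python; the function name `g` and its argument order are choices made here for illustration) can be used to observe the two properties that the proof of Theorem 1 relies on: homogeneity of degree one and, as a consequence of convexity, subadditivity $g(A+B) \le g(A) + g(B)$.

```python
import math

def g(a, b, c, p):
    """g(A) of (18) for A = [[a, c], [c, b]] with a, b, c >= 0 and c*c <= a*b."""
    m11, m22, m12 = a ** (1 / p), b ** (1 / p), c ** (1 / p)
    # Eigenvalues of the symmetric 2 x 2 matrix [[m11, m12], [m12, m22]].
    mean = 0.5 * (m11 + m22)
    radius = math.hypot(0.5 * (m11 - m22), m12)
    lam_minus = max(mean - radius, 0.0)  # clip tiny negative rounding errors
    return (mean + radius) ** p + lam_minus ** p

if __name__ == "__main__":
    p = 1.5
    print(g(3.0, 4.0, 1.5, p), g(2.0, 1.0, 1.0, p) + g(1.0, 3.0, 0.5, p))
```

When $c = 0$ the matrix is diagonal and $g(A) = a + b$, which gives a simple consistency check on the implementation.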
2. Proof of Theorem 1

Many of the ideas in this proof are taken from the proof of Hanner’s inequality in [2]. First, we borrow the duality argument from Sect. IV of that paper to show that part (b) follows from part (a). For $p \ge 2$ define $q \le 2$ to be its conjugate index. Then there is a $2n \times 2n$ matrix $K$ satisfying $\|K\|_q = 1$ such that
$$\|M\|_p = \sup_{L : \|L\|_q = 1} |\operatorname{Tr}(LM)| = \operatorname{Tr}(KM). \tag{20}$$
The positivity of $M$ means that $K$ can be assumed to be positive. Let
$$K = \begin{pmatrix} A & C \\ C^* & B \end{pmatrix} \ge 0, \tag{21}$$
then
$$\begin{aligned}
\operatorname{Tr}(KM) &= \operatorname{Tr}(AX) + \operatorname{Tr}(CY^*) + \operatorname{Tr}(C^*Y) + \operatorname{Tr}(BZ) \\
&\le \|A\|_q \|X\|_p + 2 \|C\|_q \|Y\|_p + \|B\|_q \|Z\|_p \\
&= \operatorname{Tr} \left[ \begin{pmatrix} \|A\|_q & \|C\|_q \\ \|C\|_q & \|B\|_q \end{pmatrix} m \right] \\
&\le \left\| \begin{pmatrix} \|A\|_q & \|C\|_q \\ \|C\|_q & \|B\|_q \end{pmatrix} \right\|_q \|m\|_p \\
&\le \|K\|_q \, \|m\|_p = \|m\|_p.
\end{aligned} \tag{22}$$
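The first and second inequalities are applications of Hölder’s inequality; the last inequality uses part (a) of Theorem 1.

Next we turn to the proof of part (a) of Theorem 1. The inequality becomes an equality at the values $p = 1, 2$, so we will assume henceforth that $1 < p < 2$. Using the singular value decomposition we can write
$$Y = U D V^*, \tag{23}$$
where $U, V$ are unitary matrices and $D \ge 0$ is diagonal. Unitary invariance of the p-norm implies that
$$\|M\|_p = \left\| \begin{pmatrix} U^* X U & D \\ D & V^* Z V \end{pmatrix} \right\|_p \tag{24}$$
and also that $\|X\|_p = \|U^* X U\|_p$, $\|Z\|_p = \|V^* Z V\|_p$ and $\|Y\|_p = \|D\|_p$. So without loss of generality we will assume henceforth that $Y$ is diagonal and non-negative.

Next we use a diagonalization argument from Sect. III of [2]. Let $U_1, \dots, U_{2^n}$ denote the $2^n$ diagonal $n \times n$ matrices with diagonal entries $\pm 1$. Then for any $n \times n$ matrix $A$ we have
$$A_d = \sum_{i=1}^{2^n} 2^{-n}\, U_i A U_i^*, \tag{25}$$
where $A_d$ is the diagonal part of $A$. Since $Y$ is diagonal this implies that
$$\sum_{i=1}^{2^n} 2^{-n} \begin{pmatrix} U_i & 0 \\ 0 & U_i \end{pmatrix} \begin{pmatrix} X & Y \\ Y & Z \end{pmatrix} \begin{pmatrix} U_i^* & 0 \\ 0 & U_i^* \end{pmatrix} = \begin{pmatrix} X_d & Y \\ Y & Z_d \end{pmatrix}, \tag{26}$$
and by the same reasoning
$$\sum_{i=1}^{2^n} 2^{-n} \begin{pmatrix} U_i & 0 \\ 0 & U_i \end{pmatrix} \begin{pmatrix} X & 0 \\ 0 & Z \end{pmatrix} \begin{pmatrix} U_i^* & 0 \\ 0 & U_i^* \end{pmatrix} = \begin{pmatrix} X_d & 0 \\ 0 & Z_d \end{pmatrix}. \tag{27}$$
Now we combine (26) and (27) with the convexity result Lemma 3, which gives
$$\operatorname{Tr} \begin{pmatrix} X & Y \\ Y & Z \end{pmatrix}^p - \operatorname{Tr} \begin{pmatrix} X & 0 \\ 0 & Z \end{pmatrix}^p \ge \operatorname{Tr} \begin{pmatrix} X_d & Y \\ Y & Z_d \end{pmatrix}^p - \operatorname{Tr} \begin{pmatrix} X_d & 0 \\ 0 & Z_d \end{pmatrix}^p. \tag{28}$$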
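The averaging identity (25) is elementary: conjugation by a diagonal sign matrix multiplies the entry $a_{jk}$ by $s_j s_k$, and the average of $s_j s_k$ over all sign choices is $\delta_{jk}$. A minimal sketch of this computation, assuming NumPy and using the hypothetical helper name `sign_average`, is:

```python
import itertools
import numpy as np

def sign_average(a):
    """Average U A U^* over all 2^n diagonal sign matrices U, as in (25)."""
    n = a.shape[0]
    total = np.zeros_like(a, dtype=complex)
    for signs in itertools.product((1.0, -1.0), repeat=n):
        u = np.diag(signs)
        total += u @ a @ u.conj().T
    return total / 2 ** n

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    # The average keeps the diagonal of a and removes everything else.
    print(np.round(sign_average(a), 12))
```

The loop has $2^n$ terms, so this direct enumeration is only meant for small $n$.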
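The matrices $X_d, Y, Z_d$ are all diagonal with non-negative entries. Denote these entries by $(x_1, \dots, x_n)$, $(y_1, \dots, y_n)$ and $(z_1, \dots, z_n)$ respectively. Then
$$\operatorname{Tr} \begin{pmatrix} X_d & Y \\ Y & Z_d \end{pmatrix}^p = \sum_{i=1}^{n} \operatorname{Tr} \begin{pmatrix} x_i & y_i \\ y_i & z_i \end{pmatrix}^p. \tag{29}$$
Now for $i = 1, \dots, n$ define
$$a_i = x_i^p, \qquad b_i = z_i^p, \qquad c_i = y_i^p, \tag{30}$$
and introduce the $2 \times 2$ matrices
$$A_i = \begin{pmatrix} a_i & c_i \\ c_i & b_i \end{pmatrix}. \tag{31}$$
It follows that
$$\|X_d\|_p = (a_1 + \cdots + a_n)^{1/p}, \qquad \|Y\|_p = (c_1 + \cdots + c_n)^{1/p}, \qquad \|Z_d\|_p = (b_1 + \cdots + b_n)^{1/p}, \tag{32}$$
and the definition (18) implies that
$$\operatorname{Tr} \begin{pmatrix} \|X_d\|_p & \|Y\|_p \\ \|Y\|_p & \|Z_d\|_p \end{pmatrix}^p = g(A_1 + \cdots + A_n). \tag{33}$$
Furthermore (29) implies that
$$\operatorname{Tr} \begin{pmatrix} X_d & Y \\ Y & Z_d \end{pmatrix}^p = g(A_1) + \cdots + g(A_n). \tag{34}$$
Also, for any positive number $k$ we have $g(kA) = k\, g(A)$. Combining this with the convexity result Lemma 4 gives
$$g(A_1 + \cdots + A_n) \le g(A_1) + \cdots + g(A_n), \tag{35}$$
which from (34) and (33) implies that
$$\operatorname{Tr} \begin{pmatrix} X_d & Y \\ Y & Z_d \end{pmatrix}^p \ge \operatorname{Tr} \begin{pmatrix} \|X_d\|_p & \|Y\|_p \\ \|Y\|_p & \|Z_d\|_p \end{pmatrix}^p. \tag{36}$$
Combining (28) with (36) gives
$$\operatorname{Tr} \begin{pmatrix} X & Y \\ Y & Z \end{pmatrix}^p - \operatorname{Tr} \begin{pmatrix} X & 0 \\ 0 & Z \end{pmatrix}^p \ge \operatorname{Tr} \begin{pmatrix} \|X_d\|_p & \|Y\|_p \\ \|Y\|_p & \|Z_d\|_p \end{pmatrix}^p - \operatorname{Tr} \begin{pmatrix} \|X_d\|_p & 0 \\ 0 & \|Z_d\|_p \end{pmatrix}^p. \tag{37}$$
Furthermore
$$\|X_d\|_p \le \|X\|_p, \qquad \|Z_d\|_p \le \|Z\|_p. \tag{38}$$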
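Applying Lemma 5 to the right side of (37) shows that
$$\operatorname{Tr} \begin{pmatrix} \|X_d\|_p & \|Y\|_p \\ \|Y\|_p & \|Z_d\|_p \end{pmatrix}^p - \operatorname{Tr} \begin{pmatrix} \|X_d\|_p & 0 \\ 0 & \|Z_d\|_p \end{pmatrix}^p \ge \operatorname{Tr} \begin{pmatrix} \|X\|_p & \|Y\|_p \\ \|Y\|_p & \|Z\|_p \end{pmatrix}^p - \operatorname{Tr} \begin{pmatrix} \|X\|_p & 0 \\ 0 & \|Z\|_p \end{pmatrix}^p. \tag{39}$$
Furthermore
$$\operatorname{Tr} \begin{pmatrix} X & 0 \\ 0 & Z \end{pmatrix}^p = \operatorname{Tr} \begin{pmatrix} \|X\|_p & 0 \\ 0 & \|Z\|_p \end{pmatrix}^p, \tag{40}$$
and therefore (37) and (39) imply the result Theorem 1.

3. Proof of Theorem 2

This proof follows very closely the methods in Sect. III of [2]. First we use a duality argument to deduce (11) from (10). Let $p \ge 2$ and let $q$ be the index conjugate to $p$. Then it follows as in (22) that there is a matrix $K = \begin{pmatrix} A & C \\ D & B \end{pmatrix}$ such that $\|K\|_q = 1$ and
$$\left\| \begin{pmatrix} X & Y \\ W & Z \end{pmatrix} \right\|_p = \operatorname{Tr} \left[ K \begin{pmatrix} X & Y \\ W & Z \end{pmatrix} \right] = \operatorname{Tr} \left( AX + CW + DY + BZ \right). \tag{41}$$
Define
$$a = \|A\|_q, \qquad b = \|B\|_q, \qquad c = \left( \tfrac12 \|C\|_q^q + \tfrac12 \|D\|_q^q \right)^{1/q}, \tag{42}$$
and similarly
$$x = \|X\|_p, \qquad z = \|Z\|_p, \qquad y = \left( \tfrac12 \|Y\|_p^p + \tfrac12 \|W\|_p^p \right)^{1/p}. \tag{43}$$
Then applying Hölder’s inequality to (41) gives
$$\left\| \begin{pmatrix} X & Y \\ W & Z \end{pmatrix} \right\|_p \le ax + bz + 2cy. \tag{44}$$
This is rewritten as
$$\begin{aligned}
ax + bz + 2cy &= 2 \left( \frac{a+b}{2} \right) \left( \frac{x+z}{2} \right) + 2 \left( \frac{a-b}{2} \right) \left( \frac{x-z}{2} \right) + 2cy \\
&= 2 \left( \frac{a+b}{2} \right) \left( \frac{x+z}{2} \right) + 2 \left[ (q-1)^{1/2}\, \frac{a-b}{2} \right] \left[ \left( \frac{1}{q-1} \right)^{1/2} \frac{x-z}{2} \right] + 2 \left[ (q-1)^{1/2}\, c \right] \left[ \left( \frac{1}{q-1} \right)^{1/2} y \right].
\end{aligned} \tag{45}$$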
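Now we apply the Cauchy-Schwarz inequality to the right side of (45); the result is
$$ax + bz + 2cy \le 2 \left[ \left( \frac{a+b}{2} \right)^2 + (q-1) \left( \frac{a-b}{2} \right)^2 + (q-1)\, c^2 \right]^{1/2} \times \left[ \left( \frac{x+z}{2} \right)^2 + \frac{1}{q-1} \left( \frac{x-z}{2} \right)^2 + \frac{1}{q-1}\, y^2 \right]^{1/2}. \tag{46}$$
Furthermore,
$$\left( \frac{a+b}{2} \right)^2 + (q-1) \left( \frac{a-b}{2} \right)^2 + (q-1)\, c^2 = \frac{q-1}{2} \operatorname{Tr}(k^2) + \frac{2-q}{4} (\operatorname{Tr} k)^2, \tag{47}$$
where $k$ is the $2 \times 2$ matrix
$$k = \begin{pmatrix} a & c \\ c & b \end{pmatrix}. \tag{48}$$
Since $q \le 2$, (10) implies that
$$\left[ \frac{q-1}{2} \operatorname{Tr}(k^2) + \frac{2-q}{4} (\operatorname{Tr} k)^2 \right]^{1/2} \le 2^{-1/q} \left\| \begin{pmatrix} A & C \\ D & B \end{pmatrix} \right\|_q = 2^{-1/q} \|K\|_q = 2^{-1/q}. \tag{49}$$
Combining (44), (46) and (49) gives
$$\begin{aligned}
\left\| \begin{pmatrix} X & Y \\ W & Z \end{pmatrix} \right\|_p &\le 2^{1 - 1/q} \left[ \left( \frac{x+z}{2} \right)^2 + \frac{1}{q-1} \left( \frac{x-z}{2} \right)^2 + \frac{1}{q-1}\, y^2 \right]^{1/2} \\
&= 2^{1/p} \left[ \left( \frac{x+z}{2} \right)^2 + (p-1) \left( \frac{x-z}{2} \right)^2 + (p-1)\, y^2 \right]^{1/2} \\
&= 2^{1/p} \left[ \frac{p-1}{2} \operatorname{Tr}(\alpha^2) + \frac{2-p}{4} (\operatorname{Tr}\alpha)^2 \right]^{1/2},
\end{aligned} \tag{50}$$
where $\alpha$ was defined in (9), and this proves (11).

Suppose now that $1 \le p \le 2$. The first step in the proof of (10) is to reduce the result to the case where the matrix is self-adjoint. This is done by modifying an argument from Sect. III of [2]. Given $X$, $Y$, $W$ and $Z$ define the matrices
$$J = \begin{pmatrix} X & Y \\ W & Z \end{pmatrix} \tag{51}$$
and
$$L = \begin{pmatrix} 0 & X & 0 & Y \\ X^* & 0 & W^* & 0 \\ 0 & W & 0 & Z \\ Y^* & 0 & Z^* & 0 \end{pmatrix}. \tag{52}$$
Then $L = L^*$ and furthermore
$$\operatorname{Tr} |L|^p = \operatorname{Tr} (L^* L)^{p/2} = \operatorname{Tr} (J^* J)^{p/2} + \operatorname{Tr} (J J^*)^{p/2} = 2 \operatorname{Tr} |J|^p. \tag{53}$$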
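The identity (53) reflects the fact that $L$ is, up to a permutation of block rows and columns, the matrix $\begin{pmatrix} 0 & J \\ J^* & 0 \end{pmatrix}$, whose singular values are those of $J$ counted twice. A short numerical illustration, assuming NumPy and with hypothetical helper names `trace_abs_power` and `dilation`, is:

```python
import numpy as np

def trace_abs_power(a, p):
    """Tr |a|^p, i.e. the sum of the p-th powers of the singular values."""
    return float(np.sum(np.linalg.svd(a, compute_uv=False) ** p))

def dilation(x, y, w, z):
    """The self-adjoint 4n x 4n matrix L of (52)."""
    o = np.zeros_like(x)

    def adj(m):
        return m.conj().T

    return np.block([
        [o,      x, o,      y],
        [adj(x), o, adj(w), o],
        [o,      w, o,      z],
        [adj(y), o, adj(z), o],
    ])

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    x, y, w, z = (rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2)) for _ in range(4))
    big = dilation(x, y, w, z)
    j = np.block([[x, y], [w, z]])
    print(trace_abs_power(big, 1.4), 2 * trace_abs_power(j, 1.4))
```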
Assuming that (10) holds for self-adjoint matrices, it implies that
$$\|L\|_p \ge 2^{1/p} \left[ \frac{p-1}{2} \operatorname{Tr}(\beta^2) + \frac{2-p}{4} (\operatorname{Tr}\beta)^2 \right]^{1/2}, \tag{54}$$
where $\beta$ is given by
$$\beta = \begin{pmatrix} 2^{1/p} \|X\|_p & \left( \|Y\|_p^p + \|W\|_p^p \right)^{1/p} \\ \left( \|Y\|_p^p + \|W\|_p^p \right)^{1/p} & 2^{1/p} \|Z\|_p \end{pmatrix}. \tag{55}$$
Comparing with (9) shows that $\beta = 2^{1/p} \alpha$, and hence (53) and (54) imply (10).

The self-adjoint case will be handled by modifying slightly a very non-trivial proof in Sect. III of the paper [2]. For convenience we state the hard part of the proof in [2] as a separate lemma here, and refer the reader to the original source for its proof.

Lemma 6 (Ball, Carlen and Lieb). Let $A$ and $B$ be self-adjoint $n \times n$ matrices, with $A$ non-singular, and suppose that $1 \le p \le 2$. Then
$$\frac{d^2}{dr^2} \left( \operatorname{Tr} |A + rB|^p \right)^{2/p} \Big|_{r=0} \ge 2(p-1) \left( \operatorname{Tr} |B|^p \right)^{2/p}. \tag{56}$$

Now suppose that $X$, $Y$ and $Z$ are $n \times n$ complex matrices with $X$ and $Z$ self-adjoint. Define
$$F = \begin{pmatrix} X & 0 \\ 0 & Z \end{pmatrix}, \qquad G = \begin{pmatrix} 0 & Y \\ Y^* & 0 \end{pmatrix}. \tag{57}$$
Using the notation introduced in (43), the goal is to show that
$$\left( \operatorname{Tr} |F + rG|^p \right)^{2/p} \ge 2^{2/p} \left[ \left( \frac{x+z}{2} \right)^2 + (p-1) \left( \frac{x-z}{2} \right)^2 + (p-1)\, r^2 y^2 \right] \tag{58}$$
at the value $r = 1$, where now $y = \|Y\|_p$. First, it is easy to show that (58) holds at $r = 0$: in this case the left side is $(x^p + z^p)^{2/p}$, and Gross’s two-point inequality (16) implies that
$$(x^p + z^p)^{2/p} \ge 2^{2/p} \left[ \left( \frac{x+z}{2} \right)^2 + (p-1) \left( \frac{x-z}{2} \right)^2 \right]. \tag{59}$$
Second, both sides of (58) are even functions of $r$ (the left side because the matrices $F + rG$ and $F - rG$ have the same spectrum), hence the derivatives of both sides vanish at $r = 0$. Therefore it is sufficient to prove that
$$\frac{d^2}{dr^2} \left( \operatorname{Tr} |F + rG|^p \right)^{2/p} \ge 2^{2/p}\, 2(p-1)\, y^2 = 2(p-1) \left( \operatorname{Tr} |G|^p \right)^{2/p} \tag{60}$$
for all $0 \le r \le 1$. The inequality (60) is established by the following argument (again borrowed from [2]). By continuity, it can be assumed that the ranges of $F$ and $G$ span all of $\mathbb{C}^{2n}$ (recall that $X$, $Y$, $Z$ are $n \times n$ matrices) and therefore that $F + rG$ is non-singular at all but possibly $2n$ values of $r$ in the interval $0 \le r \le 1$. By continuity again it is sufficient to establish (60) at these non-singular values. Let $r_0$ be such a non-singular value, and let $A = F + r_0 G$ and $B = G$. Then at $r = r_0$, (60) becomes
$$\frac{d^2}{dr^2} \left( \operatorname{Tr} |A + rB|^p \right)^{2/p} \Big|_{r=0} \ge 2(p-1) \left( \operatorname{Tr} |B|^p \right)^{2/p}. \tag{61}$$
But this is exactly the statement of Lemma 6, hence (10) is proved.
4. Proofs of Lemmas

4.1. Proof of Lemma 3. This result is a slight modification of a convexity result proved in Sect. IV of [2]. For a positive matrix $M = \begin{pmatrix} X & Y \\ Y^* & Z \end{pmatrix} \ge 0$, define $M_d = \begin{pmatrix} X & 0 \\ 0 & Z \end{pmatrix} \ge 0$ and $F = M - M_d$. Let
$$D = \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix} = D^* \tag{62}$$
be a block diagonal self-adjoint matrix, and define
$$\phi(s) = \operatorname{Tr}(M + sD)^p - \operatorname{Tr}(M_d + sD)^p = \operatorname{Tr}(M_d + F + sD)^p - \operatorname{Tr}(M_d + sD)^p.$$
Then for $1 \le p \le 2$ the second derivative of $\phi$ has the following integral representation (see [2] for details):
$$\phi''(0) = p \gamma_p \int_0^\infty t^{p-1} \operatorname{Tr} \left[ \frac{1}{t + M_d + F}\, D\, \frac{1}{t + M_d + F}\, D - \frac{1}{t + M_d}\, D\, \frac{1}{t + M_d}\, D \right] dt \tag{63}$$
for some constant $\gamma_p$. Furthermore, the matrices $M_d + F + sD$ and $M_d - F + sD$ have the same spectrum, hence (63) can be written
$$\phi''(0) = \frac{p}{2} \gamma_p \int_0^\infty t^{p-1} \operatorname{Tr} \left[ \frac{1}{t + M_d + F}\, D\, \frac{1}{t + M_d + F}\, D + \frac{1}{t + M_d - F}\, D\, \frac{1}{t + M_d - F}\, D - 2\, \frac{1}{t + M_d}\, D\, \frac{1}{t + M_d}\, D \right] dt. \tag{64}$$
Ball, Carlen and Lieb [2] proved that for $t \ge 0$, and for any self-adjoint matrix $A$, the map
$$X \longmapsto \operatorname{Tr}\, \frac{1}{t + X}\, A\, \frac{1}{t + X}\, A \tag{65}$$
is convex on the set of positive matrices. Applying this to (64) with $X = M_d$ and $A = D$ shows that $\phi''(0) \ge 0$, which is the convexity result in Lemma 3.

4.2. Proof of Lemma 4. Since $g$ is homogeneous it is sufficient to prove that
$$g(A + B) \le g(A) + g(B) \tag{66}$$
for any $A$, $B$ of the specified form. To prove this, it is sufficient to show that
$$\frac{d}{dt}\, g(A + tB) \Big|_{t=0} \le g(B) \tag{67}$$
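for any $A$, $B$. Let
$$A = \begin{pmatrix} a & c \\ c & b \end{pmatrix}, \qquad B = \begin{pmatrix} x & y \\ y & z \end{pmatrix}. \tag{68}$$
Define
$$M = \begin{pmatrix} a^{1/p} & c^{1/p} \\ c^{1/p} & b^{1/p} \end{pmatrix}, \qquad L = \begin{pmatrix} a^{(1-p)/p}\, x & c^{(1-p)/p}\, y \\ c^{(1-p)/p}\, y & b^{(1-p)/p}\, z \end{pmatrix}. \tag{69}$$
Then
$$\frac{d}{dt}\, g(A + tB) \Big|_{t=0} = \operatorname{Tr} M^{p-1} L. \tag{70}$$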
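Formula (70) follows from the chain rule: $\frac{d}{dt} \operatorname{Tr} M(t)^p = p \operatorname{Tr} M^{p-1} M'(t)$, and differentiating the entries $(a + tx)^{1/p}$ and so on gives $M'(0) = L/p$. The sketch below compares (70) with a central finite difference. It assumes NumPy; the function names are hypothetical, and matrices are passed as triples (first diagonal entry, second diagonal entry, off-diagonal entry).

```python
import numpy as np

def root_matrix(a, b, c, p):
    """The matrix M of (69): entrywise 1/p powers of A = [[a, c], [c, b]]."""
    return np.array([[a ** (1 / p), c ** (1 / p)], [c ** (1 / p), b ** (1 / p)]])

def g(a, b, c, p):
    """g(A) = Tr M^p as in (18), from the eigenvalues of the symmetric matrix M."""
    lam = np.linalg.eigvalsh(root_matrix(a, b, c, p))
    return float(np.sum(np.clip(lam, 0.0, None) ** p))

def derivative_formula(A, B, p):
    """Right side of (70), Tr M^{p-1} L, for A = (a, b, c) and B = (x, z, y)."""
    a, b, c = A
    x, z, y = B
    lam, vec = np.linalg.eigh(root_matrix(a, b, c, p))
    m_pow = (vec * lam ** (p - 1)) @ vec.T  # spectral calculus for M^{p-1}
    e = (1 - p) / p
    ell = np.array([[a ** e * x, c ** e * y], [c ** e * y, b ** e * z]])
    return float(np.trace(m_pow @ ell))

def finite_difference(A, B, p, t=1e-6):
    """Central difference approximation of d/dt g(A + tB) at t = 0."""
    a, b, c = A
    x, z, y = B
    plus = g(a + t * x, b + t * z, c + t * y, p)
    minus = g(a - t * x, b - t * z, c - t * y, p)
    return (plus - minus) / (2 * t)

if __name__ == "__main__":
    A, B, p = (2.0, 1.0, 1.0), (1.0, 3.0, 0.5), 1.5
    print(derivative_formula(A, B, p), finite_difference(A, B, p), g(*B, p))
```

The sketch requires $a, b, c > 0$ and $c^2 < ab$, so that $M$ is strictly positive and the negative powers in $L$ are finite.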
The idea of the proof is to maximise the right side of (70) as a function of $M$, and show that the maximum is achieved when $A$ and $B$ are proportional, in which case the bound is an equality. This will be done by explicitly finding the critical points of $\operatorname{Tr} M^{p-1} L$. To this end write the spectral decomposition of $M$ in the form
$$M = \begin{pmatrix} a^{1/p} & c^{1/p} \\ c^{1/p} & b^{1/p} \end{pmatrix} = \lambda P_1 + \mu P_2, \tag{71}$$
where $P_i$ are projectors onto the normalised eigenvectors of $M$, and $\lambda, \mu$ are the eigenvalues (notice that the positivity of $A$ and $B$ implies that both $M$ and $L$ are also positive). If we assume that $\lambda \ge \mu$ then for some $0 \le t \le 1$ we have
$$a^{1/p} = \lambda t + \mu (1-t), \tag{72}$$
$$c^{1/p} = \sqrt{t(1-t)}\, (\lambda - \mu), \tag{73}$$
$$b^{1/p} = \lambda (1-t) + \mu t. \tag{74}$$
Furthermore it also follows that
$$M^{p-1} = \begin{pmatrix} k_{11} & k_{12} \\ k_{12} & k_{22} \end{pmatrix} = \lambda^{p-1} P_1 + \mu^{p-1} P_2, \tag{75}$$
where
$$k_{11} = \lambda^{p-1} t + \mu^{p-1} (1-t), \tag{76}$$
$$k_{12} = \sqrt{t(1-t)}\, (\lambda^{p-1} - \mu^{p-1}), \tag{77}$$
$$k_{22} = \lambda^{p-1} (1-t) + \mu^{p-1} t. \tag{78}$$
Substituting into (70) gives
$$\operatorname{Tr} M^{p-1} L = k_{11}\, a^{(1-p)/p}\, x + 2 k_{12}\, c^{(1-p)/p}\, y + k_{22}\, b^{(1-p)/p}\, z. \tag{79}$$
Equation (79) is invariant under a rescaling of $M$. Define
$$h = \frac{\mu}{\lambda}, \qquad 0 \le h \le 1, \tag{80}$$
then (79) is a function of $t$ and $h$, and can be written as
$$\operatorname{Tr} M^{p-1} L = F(t, h) = F_1(t, h)\, x + F_2(t, h)\, y + F_3(t, h)\, z, \tag{81}$$
where
$$F_1(t, h) = \frac{t + (1-t)\, h^{p-1}}{\left( t + (1-t)\, h \right)^{p-1}}, \tag{82}$$
$$F_2(t, h) = 2 \left( t(1-t) \right)^{1 - p/2}\, \frac{1 - h^{p-1}}{(1-h)^{p-1}}, \tag{83}$$
$$F_3(t, h) = F_1(1-t, h). \tag{84}$$
The goal is to maximise $F(t, h)$ over $t$ and $h$. Define
$$G = \left( t + (1-t) h \right) \left( 1 - h^{p-1} \right) - (p-1)(1-h) \left( t + (1-t) h^{p-1} \right), \tag{85}$$
$$H = \left( (1-t) + t h \right) \left( 1 - h^{p-1} \right) - (p-1)(1-h) \left( (1-t) + t h^{p-1} \right), \tag{86}$$
and also let
$$\xi = x \left( t + (1-t) h \right)^{-p}, \tag{87}$$
$$\eta = y\, (1-h)^{-p} \left( t(1-t) \right)^{-p/2}, \tag{88}$$
$$\zeta = z \left( 1 - t + t h \right)^{-p}. \tag{89}$$
Then explicit calculation shows that
$$\frac{\partial F}{\partial t} = G \xi - (G - H) \eta - H \zeta \tag{90}$$
and
$$\frac{\partial F}{\partial h} = -t(1-t)(p-1)\left( 1 - h^{p-2} \right) (\xi - 2\eta + \zeta). \tag{91}$$
The critical equations are
$$\frac{\partial F}{\partial t} = \frac{\partial F}{\partial h} = 0. \tag{92}$$
One obvious set of solutions is obtained when $t = 0$ or $t = 1$, or $h = 1$. In all of these cases, the matrix $M$ must be diagonal, in which case (70) implies
$$\operatorname{Tr} M^{p-1} L = \operatorname{Tr} B = \operatorname{Tr} \begin{pmatrix} x^{1/p} & 0 \\ 0 & z^{1/p} \end{pmatrix}^p \le g(B), \tag{93}$$
and this establishes the result. If $0 < t < 1$ and $h < 1$, the critical equations can be written
$$G (\xi - \eta) = H (\zeta - \eta), \qquad \xi - \eta = -(\zeta - \eta). \tag{94}$$
It is easy to show that $h < 1$ implies that $G > 0$ and $H > 0$, hence the solution of (94) satisfies $\xi = \eta = \zeta$. In this case $M$ must be proportional to the matrix
$$\begin{pmatrix} x^{1/p} & y^{1/p} \\ y^{1/p} & z^{1/p} \end{pmatrix}, \tag{95}$$
and substituting into (70) then gives
$$\operatorname{Tr} M^{p-1} L = g(B), \tag{96}$$
which proves the result.

4.3. Proof of Lemma 5. By the convexity result Lemma 4, it is sufficient to prove that the function $(a, b) \to \operatorname{Tr} A^p - a^p - b^p$ is decreasing as $a, b \to \infty$. For $a \gg 1$, and for $1 < p < 2$, easy estimates show that
$$\operatorname{Tr} A^p - a^p - b^p \simeq p\, c^2 a^{p-2}, \tag{97}$$
which is indeed decreasing. Similarly for $b$.

5. Application to Qubit Maps

Quantum information theory has generated an interesting conjecture concerning completely positive maps on matrix algebras. Let $\Phi$ be a completely positive trace-preserving (CPTP) map on the algebra of $n \times n$ matrices. The minimal entropy of $\Phi$ is defined by
$$S_{\min}(\Phi) = \inf_{\rho} S(\Phi(\rho)), \tag{98}$$
where $S$ is the von Neumann entropy and the inf runs over $n \times n$ density matrices (satisfying $\rho \ge 0$ and $\operatorname{Tr} \rho = 1$). Minimal entropy is conjectured to be additive for product maps, that is, it is conjectured that
$$S_{\min}(\Phi_1 \otimes \Phi_2) = S_{\min}(\Phi_1) + S_{\min}(\Phi_2) \tag{99}$$
for any pair of CPTP maps $\Phi_1$ and $\Phi_2$. The conjecture (99) has been established in some special cases [9, 8] but a general proof remains elusive. For related reasons, Amosov, Holevo and Werner [1] defined the maximal p-norm for a CPTP map $\Phi$ to be
$$\nu_p(\Phi) = \sup_{\rho} \|\Phi(\rho)\|_p, \tag{100}$$
where the sup runs again over density matrices. They conjectured that this quantity is multiplicative for product maps, that is
$$\nu_p(\Phi_1 \otimes \Phi_2) = \nu_p(\Phi_1)\, \nu_p(\Phi_2). \tag{101}$$
Holevo and Werner later discovered a family of counterexamples to this conjecture for $p \ge 4.79$, using maps which act on $3 \times 3$ or higher dimensional matrices [11]. The conjecture remains open if at least one of the pair is a qubit map (which acts on $2 \times 2$ matrices) or if $p \le 4$.

As an application of Theorem 1, we now show that it implies the result (101) in one special case, namely when $\Phi_1$ is the qubit depolarizing channel and $p \ge 2$. This result was derived previously using a lengthier argument [8], and the purpose of this presentation is to explore an alternative method which may allow new approaches to the additivity problem. Indeed, the method shown below can be easily extended to cover all unital qubit channels and even some non-unital qubit maps, thus extending the results in [7] which were derived for integer values of p. Unfortunately, the restriction to $p \ge 2$ does not allow any conclusions to be drawn about additivity of minimal entropy.

The depolarizing channel $\Delta$ acts on a state $\rho = \begin{pmatrix} a & c \\ c & b \end{pmatrix}$ by
$$\Delta(\rho) = \lambda \rho + \frac{1-\lambda}{2}\, I = \begin{pmatrix} \lambda_+ a + \lambda_- b & \lambda c \\ \lambda c & \lambda_- a + \lambda_+ b \end{pmatrix}, \tag{102}$$
where $\lambda$ is a real parameter and $\lambda_\pm = (1 \pm \lambda)/2$. We will suppose here that $0 \le \lambda \le 1$. The maximal p-norm of $\Delta$ is easily computed to be
$$\nu_p(\Delta) = \left[ \left( \frac{1+\lambda}{2} \right)^p + \left( \frac{1-\lambda}{2} \right)^p \right]^{1/p}. \tag{103}$$
Now consider a positive $2n \times 2n$ matrix $M$:
$$M = \begin{pmatrix} A & C \\ C^* & B \end{pmatrix}. \tag{104}$$
The map $\Delta \otimes I$ acts on $M$ via
$$(\Delta \otimes I)(M) = \begin{pmatrix} \lambda_+ A + \lambda_- B & \lambda C \\ \lambda C^* & \lambda_- A + \lambda_+ B \end{pmatrix}. \tag{105}$$
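The formulas (102), (103) and (105) can be illustrated numerically. In the sketch below (NumPy assumed; all function names are hypothetical), a pure qubit state is sent through the depolarizing channel and the p-norm of the output is compared with the closed form (103). The blockwise action (105) is then compared with the tensor product of the channel output and a fixed second factor.

```python
import numpy as np

def p_norm(a, p):
    """Schatten p-norm computed from the singular values."""
    s = np.linalg.svd(a, compute_uv=False)
    return float(np.sum(s ** p) ** (1.0 / p))

def depolarize(rho, lam):
    """Depolarizing channel (102) on a 2 x 2 density matrix rho (Tr rho = 1)."""
    return lam * rho + 0.5 * (1.0 - lam) * np.eye(2)

def nu_p_formula(lam, p):
    """Closed form (103) for the maximal p-norm of the depolarizing channel."""
    return (((1 + lam) / 2) ** p + ((1 - lam) / 2) ** p) ** (1.0 / p)

def depolarize_first_factor(m_big, lam):
    """Blockwise action (105) on a 2n x 2n matrix [[A, C], [C*, B]]."""
    n = m_big.shape[0] // 2
    a, c = m_big[:n, :n], m_big[:n, n:]
    c_star, b = m_big[n:, :n], m_big[n:, n:]
    lam_plus, lam_minus = (1 + lam) / 2, (1 - lam) / 2
    return np.block([[lam_plus * a + lam_minus * b, lam * c],
                     [lam * c_star, lam_minus * a + lam_plus * b]])

if __name__ == "__main__":
    lam, p = 0.6, 3.0
    v = np.array([1.0, 1.0j]) / np.sqrt(2.0)
    rho = np.outer(v, v.conj())  # a pure state, which attains the maximum
    print(p_norm(depolarize(rho, lam), p), nu_p_formula(lam, p))
```

For a pure input state the output has eigenvalues $(1 \pm \lambda)/2$, so its p-norm equals (103). Mixed inputs give smaller values.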
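Let $p \ge 2$, and let $q \le 2$ be the index conjugate to $p$. Then as explained at the start of Sect. 2, there is a positive $2n \times 2n$ matrix $K$ satisfying $\|K\|_q = 1$ such that
$$\|(\Delta \otimes I)(M)\|_p = \operatorname{Tr} \left[ K\, (\Delta \otimes I)(M) \right]. \tag{106}$$
Following the methods used in (22), this leads to
$$\operatorname{Tr} \left[ K\, (\Delta \otimes I)(M) \right] \le \left\| \begin{pmatrix} \lambda_+ \|A\|_p + \lambda_- \|B\|_p & \lambda \|C\|_p \\ \lambda \|C\|_p & \lambda_- \|A\|_p + \lambda_+ \|B\|_p \end{pmatrix} \right\|_p = \|\Delta(m)\|_p, \tag{107}$$
where $m$ is the $2 \times 2$ matrix
$$m = \begin{pmatrix} \|A\|_p & \|C\|_p \\ \|C\|_p & \|B\|_p \end{pmatrix}. \tag{108}$$
By definition of the p-norm this implies
$$\|(\Delta \otimes I)(M)\|_p \le \nu_p(\Delta) \left( \|A\|_p + \|B\|_p \right). \tag{109}$$
Now let $\rho$ be a $2n \times 2n$ density matrix,
$$\rho = \begin{pmatrix} \rho_{11} & \rho_{12} \\ \rho_{21} & \rho_{22} \end{pmatrix}, \tag{110}$$
and consider the case where $M = (I \otimes \Omega)(\rho)$ and $\Omega$ is some other channel, so that $(\Delta \otimes I)(M) = (\Delta \otimes \Omega)(\rho)$. Then
$$A = \Omega(\rho_{11}), \qquad B = \Omega(\rho_{22}), \tag{111}$$
and hence
$$\|A\|_p + \|B\|_p \le \nu_p(\Omega) \operatorname{Tr}(\rho_{11} + \rho_{22}) = \nu_p(\Omega). \tag{112}$$
Therefore (109) implies that
$$\|(\Delta \otimes \Omega)(\rho)\|_p \le \nu_p(\Delta)\, \nu_p(\Omega). \tag{113}$$
Since (113) is valid for all $\rho$, we get
$$\nu_p(\Delta \otimes \Omega) \le \nu_p(\Delta)\, \nu_p(\Omega), \tag{114}$$
and this establishes the result (101), since the inequality in the other direction follows by restricting to product states.

Acknowledgement. This work was supported in part by National Science Foundation Grant DMS–0101205.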
References
1. Amosov, G.G., Holevo, A.S., Werner, R.F.: On Some Additivity Problems in Quantum Information Theory. Problems in Information Transmission 36, 305–313 (2000)
2. Ball, K., Carlen, E., Lieb, E.: Sharp uniform convexity and smoothness inequalities for trace norms. Invent. Math. 115, 463–482 (1994)
3. Bennett, C.H., Shor, P.W.: Quantum Information Theory. IEEE Trans. Info. Theor. 44, 2724–2748 (1998)
4. Carlen, E.: Private communication
5. Gross, L.: Logarithmic Sobolev inequalities. Am. J. Math. 97, 1061–1083 (1975)
6. Hanner, O.: On the uniform convexity of $L^p$ and $l^p$. Ark. Math. 3, 239–244 (1958)
7. King, C.: Maximization of capacity and $l_p$ norms for some product channels. J. Math. Phys. 43(3), 1247–1260 (2002)
8. King, C.: Additivity for unital qubit channels. J. Math. Phys. 43(10), 4641–4653 (2002)
9. Shor, P.W.: Additivity of the classical capacity of entanglement-breaking quantum channels. J. Math. Phys. 43(9), 4334–4340 (2002)
10. Tomczak-Jaegermann, N.: The moduli of smoothness and convexity and Rademacher averages of trace classes $S_p$. Studia Math. 50, 163–182 (1974)
11. Werner, R.F., Holevo, A.S.: Counterexample to an additivity conjecture for output purity of quantum channels. J. Math. Phys. 43(9), 4353–4357 (2002)

Communicated by M.B. Ruskai
Commun. Math. Phys. 242, 547–578 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0950-1
Communications in
Mathematical Physics
Effective Dynamics for Bloch Electrons: Peierls Substitution and Beyond
Gianluca Panati, Herbert Spohn, Stefan Teufel
Zentrum Mathematik and Physik Department, Technische Universität München, 85747 Garching, Germany. E-mail: [email protected]; [email protected]; [email protected]
Received: 21 January 2003 / Accepted: 5 June 2003 Published online: 10 October 2003 – © Springer-Verlag 2003
Abstract: We consider an electron moving in a periodic potential and subject to an additional slowly varying external electrostatic potential, $\phi(\varepsilon x)$, and vector potential $A(\varepsilon x)$, with $x \in \mathbb{R}^d$ and $\varepsilon \ll 1$. We prove that associated to an isolated family of Bloch bands there exists an almost invariant subspace of $L^2(\mathbb{R}^d)$ and an effective Hamiltonian governing the evolution inside this subspace to all orders in $\varepsilon$. To leading order the effective Hamiltonian is given through the Peierls substitution. We explicitly compute the first order correction. From a semiclassical analysis of this effective quantum Hamiltonian we establish the first order correction to the standard semiclassical model of solid state physics.

Contents
1. Introduction
2. The Periodic Hamiltonian
3. Space-Adiabatic Perturbation for Bloch Bands
4. Semiclassical Dynamics for Bloch Electrons
A. Operator-Valued Weyl Calculus for τ-Equivariant Symbols
B. Hamiltonian Formulation for the Refined Semiclassical Model
1. Introduction

A central problem of solid state physics is to understand the motion of electrons in the periodic potential generated by the ionic cores. While the dynamics is quantum mechanical, many electronic properties of solids can be understood already in the semiclassical approximation [AsMe, Ko, Za]. One argues that for suitable wave packets, which are spread over many lattice spacings, the main effect of a periodic potential $V$ on the electron dynamics consists in changing the dispersion relation from the free kinetic energy $E_{\mathrm{free}}(k) = \frac12 k^2$ to the modified kinetic energy $E_n(k)$ given by the $n$th Bloch band. Otherwise the electron responds to slowly varying external potentials $A$, $\phi$ as in the case of a vanishing periodic potential. Therefore the semiclassical equations of motion read
$$\dot r = \nabla E_n(\kappa), \qquad \dot\kappa = -\nabla \phi(r) + \dot r \times B(r), \tag{1}$$
where $r \in \mathbb{R}^3$ is the position of the electron, $\kappa = k - A(r)$ its kinetic momentum with $k$ its Bloch momentum, $-\nabla\phi$ the external electric field, and $B = \nabla \times A$ the external magnetic field. Note that there is a semiclassical evolution for each Bloch band separately. (We choose units in which the Planck constant $\hbar$, the speed $c$ of light, and the mass $m$ of the electron are equal to one, and absorb the charge $e$ into the potentials.)

One goal of this article is to understand on a mathematical level how these semiclassical equations emerge from the underlying Schrödinger equation
$$i \varepsilon\, \partial_t \psi(x, t) = \left( \tfrac12 \left( -i \nabla_x - A(\varepsilon x) \right)^2 + V(x) + \phi(\varepsilon x) \right) \psi(x, t) = H^\varepsilon \psi(x, t) \tag{2}$$
in the limit $\varepsilon \to 0$ at leading order. Here the potential $V : \mathbb{R}^3 \to \mathbb{R}$ is periodic with respect to some regular lattice $\Gamma$. $\Gamma$ is generated through the basis $\{\gamma_1, \gamma_2, \gamma_3\}$, $\gamma_j \in \mathbb{R}^3$, i.e. $\Gamma = \{ x \in \mathbb{R}^3 : x = \sum_{j=1}^3 \alpha_j \gamma_j \text{ for some } \alpha \in \mathbb{Z}^3 \}$, and $V(x + \gamma) = V(x)$ for all $\gamma \in \Gamma$, $x \in \mathbb{R}^3$. The spacing of the lattice $\Gamma$ defines the microscopic spatial scale. The external potentials $A(\varepsilon x)$ and $\phi(\varepsilon x)$, with $A : \mathbb{R}^3 \to \mathbb{R}^3$ and $\phi : \mathbb{R}^3 \to \mathbb{R}$, are slowly varying on the scale of the lattice, as expressed through the dimensionless scale parameter $\varepsilon$, $\varepsilon \ll 1$. In particular, this means that the external fields are weak compared to the fields generated by the ionic cores, a condition which is satisfied for real metals even for the strongest external electrostatic fields available and for a wide range of magnetic fields, see [AsMe], Chapter 12.

In solid state physics the derivation of the semiclassical model (1) received a lot of attention during the 1950s to the 1970s. We mention representatively the work by Luttinger [Lu], Kohn [Ko], Blount [Bl1, Bl2] and Zak [Za]. As late as 1962 Wannier [Wa] argues that the derivation of (1) from (2) is still incomplete. On the mathematical side the semiclassical asymptotics of the spectrum of $H^\varepsilon$ have been studied in great detail by Gérard, Martinez and Sjöstrand [GMS] with predecessors [BeRa, Bu, HeSj, Ne]. The large time asymptotics of the solutions to (2) without external potentials is studied in [AsKn] and the scattering theory is developed in [GeNi]. However for the dynamics of wave functions, our interest here, the results are modest. In [GMMP] the case $\phi = 0$, $A = 0$ is considered, in [HST and BMP] a proof is given for $A = 0$, which leaves out many interesting applications. The method of Gaussian beams is developed in [GRT] for a weak uniform magnetic field and in [DGR] for magnetic Bloch bands. In fact, as our title indicates, we are more ambitious and plan to derive also the first order correction to (1).
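To get a feeling for the leading order model (1), one can integrate it in the simplest setting. The following sketch is an illustration under assumptions that are not taken from the text: one space dimension, no magnetic field, a constant force $F = -\nabla\phi$, and a hypothetical tight-binding band $E(k) = -\cos k$. Under these assumptions (1) reduces to $\dot r = \sin\kappa$, $\dot\kappa = F$, whose solutions are periodic in position with period $2\pi/F$ (Bloch oscillations) instead of accelerating uniformly.

```python
import math

def band_velocity(kappa):
    """Group velocity dE/dk for the assumed band E(k) = -cos(k)."""
    return math.sin(kappa)

def rhs(state, force):
    """Right side of (1) in one dimension with B = 0 and constant force."""
    _, kappa = state
    return (band_velocity(kappa), force)

def rk4_step(state, dt, force):
    """One classical Runge-Kutta step for the pair (r, kappa)."""
    k1 = rhs(state, force)
    k2 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), force)
    k3 = rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), force)
    k4 = rhs(tuple(s + dt * k for s, k in zip(state, k3)), force)
    return tuple(s + dt * (a + 2 * b + 2 * c + d) / 6
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))

def trajectory(r0, kappa0, force, t_end, steps):
    """List of (r, kappa) pairs on a uniform time grid from 0 to t_end."""
    dt = t_end / steps
    state = (r0, kappa0)
    path = [state]
    for _ in range(steps):
        state = rk4_step(state, dt, force)
        path.append(state)
    return path

if __name__ == "__main__":
    force = 0.5
    path = trajectory(0.0, 0.0, force, 2 * math.pi / force, 2000)
    print(path[-1], max(abs(r) for r, _ in path))
```

With $\kappa(0) = 0$ the exact solution is $r(t) = (1 - \cos Ft)/F$, so the position stays within $[0, 2/F]$ and returns to its starting value after one period.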
The electron acquires then a k-dependent electric moment An (k) and magnetic moment Mn (k). If the nth band is nondegenerate (hence isolated) with Bloch eigenfunctions ψn (k, x), the electric dipole moment is given by the Berry connection (3) An (k) = i ψn (k), ∇k ψn (k)
Effective Dynamics for Bloch Electrons
549
and the magnetic moment by the Rammal-Wilkinson term M(k)n = 2i ∇k ψn (k), ×(Hper (k) − E(k))∇k ψn (k) .
(4)
Here · , · denotes the inner product in L2 (R3 / ) and Hper (k) is H ε of (2) with φ = 0 = A for fixed Bloch momentum k, see Eq. (17). As will be explained in detail, the corrected semiclassical equations read r˙ = ∇κ En (κ) − ε B(r) · Mn (κ) − ε κ˙ × n (κ) , (5) κ˙ = −∇r φ(r) − ε B(r) · Mn (κ) + r˙ × B(r) with n (k) = ∇ × An (k) the curvature of the Berry connection. The issue of first order corrections to the semiclassical equations of motion has been investigated recently by Sundaram and Niu [SuNi] in the context of magnetic Bloch bands, see also Chang and Niu [ChNi]. One adds in (2) a strong uniform magnetic field B0 , i.e. the vector potential A0 (x) = 21 B0 × x. If its magnetic flux per unit cell is rational, then the Hamiltonian in (2) is still periodic at the expense of a larger unit cell and replacing the usual translations by the magnetic translations. Eq. (5) remains formally unaltered, only En now refers to the energy of the magnetic subband. Instructive plots of n and Mn are provided in [SuNi] for the particular case of the 2-dimensional Hofstadter model at rational flux 1/3. The first order corrections obtained in [SuNi] agree with our Eq. (5), except for the term of order ε in the second equation. On a technical level magnetic Bloch bands require some extra considerations and we defer them to a forthcoming paper [PST3 ]. It has been recognized repeatedly, as e.g. emphasized in [ABL], that the geometric phases appearing in the first order correction contain novel physics as compared to the leading order. Bloch electrons are no exception. For example for the case of magnetic Bloch bands, the equations of motion (5) provide a simple semiclassical explanation of the quantum Hall effect. Let us specialize (5) to two dimensions and take B(r) = 0, φ(r) = −E · r, i.e. a weak driving electric field and a strong uniform magnetic field with rational flux. 
Then, since $\kappa = k$, the equations of motion become

$$\dot r = \nabla_k E_n(k) - E^\perp\,\Omega_n(k), \qquad \dot k = E,$$

where $\Omega_n$ is now scalar, and $E^\perp$ is $E$ rotated by $\pi/2$. We assume initially $k(0) = k$ and a completely filled band, which means we integrate with respect to $k$ over the first Brillouin zone $M^*$. Then the average current for band $n$ is given by

$$j_n = \int_{M^*} dk\,\dot r(k) = \int_{M^*} dk\,\big(\nabla_k E_n(k) - E^\perp\,\Omega_n(k)\big) = -E^\perp \int_{M^*} dk\,\Omega_n(k) .$$

The gradient term does not contribute, since $E_n$ is periodic on $M^*$. $\int_{M^*} dk\,\Omega_n(k)$ is the Chern number of the magnetic Bloch bundle and as such an integer. Further applications related to the semiclassical first order corrections are the anomalous Hall effect [JNM] and the thermodynamics of the Hofstadter model [GaAv].

Our derivation of (5) from (2) proceeds in two conceptually and mathematically distinct steps. The first step is to obtain an effective Hamiltonian whose unitary group closely approximates the solution to the Schrödinger equation (2) for $\varepsilon$ small, in case the initial wave function lies in a subspace corresponding to a prescribed family of Bloch bands. Inside the family, band crossings and almost crossings are allowed. It is crucial, however, that for every $k$ the family of bands is separated by a gap from the remaining energy bands. Then, associated to the given family of bands, there is a subspace $\Pi^\varepsilon L^2(\mathbb{R}^3)$
which is adiabatically decoupled from its orthogonal complement to all orders in $\varepsilon$. The effective Hamiltonian generates the approximate time evolution in $\Pi^\varepsilon L^2(\mathbb{R}^3)$.

Compared to the space-adiabatic perturbation theory developed in [PST1], as a new element we have to face the fact that the classical phase space is $(\mathbb{R}^3/\Gamma^*)\times\mathbb{R}^3$, with $\Gamma^*$ the lattice dual to $\Gamma$ and $\mathbb{R}^3/\Gamma^* = M^*$ the first Brillouin zone. To come close to the scheme in [PST1] a natural approach is to use the extended zone scheme. Going from one cell to the next, one picks up a phase factor, which makes it necessary to generalize the pseudodifferential calculus to $\tau$-equivariant symbols, see Appendix A. The effective Hamiltonian is expanded in an $\varepsilon$-independent reference Hilbert space. For example, for a nondegenerate band the reference space is $L^2(M^*, dk)$ and the leading order effective Hamiltonian is given through the Peierls substitution

$$h_0(k, i\varepsilon\nabla_k) = E_n\big(k - A(i\varepsilon\nabla_k)\big) + \phi(i\varepsilon\nabla_k) , \qquad (6)$$
where $i\nabla_k$ is understood with periodic boundary conditions on $M^*$.

The natural second step consists in a semiclassical analysis of the effective Hamiltonian. It is a standard result that the unitary group generated by $h_0$ is well approximated by the semiclassical equations (1). At next order, $h_0(k, i\varepsilon\nabla_k)$ is corrected to $h_0(k, i\varepsilon\nabla_k) + \varepsilon h_1(k, i\varepsilon\nabla_k)$, with $h_1$ given in (22). However, (5) is not the semiclassical evolution corresponding to that Hamiltonian. The reason is that the subspace $\Pi^\varepsilon L^2(\mathbb{R}^3)$ is mapped to the reference Hilbert space $L^2(M^*, dk)$ through a unitary operator which itself depends on $\varepsilon$. Therefore, the transformation of observables generates an $\varepsilon$-dependence in addition to the transformation of time-evolved states. If done properly, one arrives at (5).

We give a brief outline of the paper. In Sect. 2 we discuss the periodic Hamiltonian. In particular we recall the unitary Zak transform and state our assumptions on $V$, $A$, $\phi$ and the gap condition. In Sect. 3 we apply the space-adiabatic perturbation theory to the present case, using the pseudodifferential calculus developed in Appendix A. The semiclassical analysis of the effective Hamiltonian including first order is carried out in Sect. 4. The precise link between (2) and (5) is stated in Theorem 2. In Appendix B we show that Eqs. (5) are of Hamiltonian form with respect to an appropriate symplectic structure.

2. The Periodic Hamiltonian

In order to formulate our setup we first need to recall several well known facts about the periodic Hamiltonian

$$H_{\rm per} := -\tfrac{1}{2}\Delta + V ,$$

acting in $L^2(\mathbb{R}^d)$, keeping from now on the dimension $d$ arbitrary. The potential $V$ is periodic with respect to the lattice $\Gamma$. Its dual lattice $\Gamma^*$ is defined as the lattice generated by the dual basis $\{\gamma_1^*, \ldots, \gamma_d^*\}$ determined through the conditions $\gamma_i\cdot\gamma_j^* = 2\pi\,\delta_{ij}$, $i, j \in \{1, \ldots, d\}$.
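As a small numerical illustration of the duality condition $\gamma_i\cdot\gamma_j^* = 2\pi\delta_{ij}$, the following sketch computes a dual basis from a given lattice basis. The function name `dual_basis`, the row-vector storage convention and the hexagonal example lattice are assumptions made for this illustration only and are not part of the original text.

```python
import numpy as np

def dual_basis(gammas):
    """Return the dual basis of a lattice basis (hypothetical helper).

    The rows of `gammas` are the basis vectors gamma_1, ..., gamma_d.
    The rows of the result are gamma*_1, ..., gamma*_d, fixed by
    gamma_i . gamma*_j = 2*pi*delta_ij.
    """
    G = np.asarray(gammas, dtype=float)
    # With D holding the dual vectors as rows, the condition reads
    # G @ D.T = 2*pi*I, hence D = 2*pi * inv(G).T.
    return 2.0 * np.pi * np.linalg.inv(G).T

# Illustrative choice: a hexagonal lattice in d = 2.
hex_basis = np.array([[1.0, 0.0], [0.5, np.sqrt(3.0) / 2.0]])
hex_dual = dual_basis(hex_basis)

# Matrix of all pairings gamma_i . gamma*_j.
gram = hex_basis @ hex_dual.T
print(gram)
```

For the cubic lattice $\mathbb{Z}^d$ the same routine simply returns $2\pi$ times the standard basis, which is the case used implicitly whenever one writes $\Gamma^* = 2\pi\mathbb{Z}^d$.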
The centered fundamental domain of $\Gamma$ is denoted by $M = \big\{ x \in \mathbb{R}^d : x = \sum_{j=1}^d \alpha_j\gamma_j \ \text{for}\ \alpha_j \in [-\tfrac12, \tfrac12] \big\}$, and analogously the centered fundamental domain of $\Gamma^*$ is denoted by $M^*$. In solid state physics the set $M^*$ is called the first Brillouin zone. In the following $M^*$ is always
equipped with the normalized Lebesgue measure denoted by $dk$. We introduce the notation $x = [x] + \gamma$ for the a.e. unique decomposition of $x \in \mathbb{R}^d$ as a sum of $[x] \in M$ and $\gamma \in \Gamma$. We use the same brackets for the analogous splitting $k = [k] + \gamma^*$.

We employ a variant of the Bloch-Floquet transform, called the Zak transform (also Lifshitz-Gelfand-Zak transform). The Zak transform of a function $\psi \in \mathcal{S}(\mathbb{R}^d)$ is defined as

$$(U\psi)(k, y) := \sum_{\gamma\in\Gamma} e^{-ik\cdot(y+\gamma)}\,\psi(y+\gamma), \qquad (k, y) \in \mathbb{R}^{2d}, \qquad (7)$$

and one directly reads off from (7) the following periodicity properties:

$$(U\psi)(k, y+\gamma) = (U\psi)(k, y) \quad \text{for all } \gamma \in \Gamma, \qquad (8)$$
$$(U\psi)(k+\gamma^*, y) = e^{-iy\cdot\gamma^*}\,(U\psi)(k, y) \quad \text{for all } \gamma^* \in \Gamma^*. \qquad (9)$$

From (8) it follows that, for any fixed $k \in \mathbb{R}^d$, $(U\psi)(k, \cdot)$ is a $\Gamma$-periodic function and can thus be regarded as an element of $L^2(\mathbb{T}^d)$, $\mathbb{T}^d$ being the flat torus $\mathbb{R}^d/\Gamma$. Eq. (9) involves a unitary representation of the group of lattice translations on $\Gamma^*$ (denoted again as $\Gamma^*$ with a little abuse of notation), given by

$$\tau : \Gamma^* \to \mathcal{U}(L^2(\mathbb{T}^d)), \qquad \gamma^* \mapsto \tau(\gamma^*), \qquad (\tau(\gamma^*)\varphi)(y) = e^{iy\cdot\gamma^*}\varphi(y).$$
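The periodicity properties (8) and (9) can be checked numerically in a simple case. The sketch below assumes $d = 1$, $\Gamma = \mathbb{Z}$ (so that $\Gamma^* = 2\pi\mathbb{Z}$) and a Gaussian test function; the helper `zak`, the truncation parameter `n_max` and the sample point are choices made for this illustration, not part of the original text.

```python
import numpy as np

def zak(psi, k, y, n_max=40):
    """Truncated Zak transform (7) for d = 1 and Gamma = Z (illustrative).

    Sums exp(-i k (y + gamma)) * psi(y + gamma) over |gamma| <= n_max.
    For rapidly decaying psi the neglected tail is negligible.
    """
    gammas = np.arange(-n_max, n_max + 1)
    x = y + gammas
    return np.sum(np.exp(-1j * k * x) * psi(x))

def gaussian(x):
    """A Schwartz test function."""
    return np.exp(-0.5 * x**2)

k, y = 0.7, 0.3

# Property (8): periodicity in y with respect to Gamma = Z.
defect_8 = abs(zak(gaussian, k, y + 1.0) - zak(gaussian, k, y))

# Property (9): tau-equivariance in k with respect to Gamma* = 2*pi*Z.
gamma_star = 2.0 * np.pi
defect_9 = abs(zak(gaussian, k + gamma_star, y)
               - np.exp(-1j * y * gamma_star) * zak(gaussian, k, y))

print(defect_8, defect_9)
```

Both defects vanish up to rounding and the (negligible) truncation of the lattice sum. The phase in (9) is exactly the factor that $\tau(\gamma^*)^{-1}$ produces.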
It will turn out convenient to introduce the Hilbert space

$$\mathcal{H}_\tau := \big\{ \psi \in L^2_{\rm loc}(\mathbb{R}^d, L^2(\mathbb{T}^d)) : \ \psi(k-\gamma^*) = \tau(\gamma^*)\,\psi(k) \big\} , \qquad (10)$$

equipped with the inner product

$$\langle \psi, \varphi\rangle_{\mathcal{H}_\tau} = \int_{M^*} dk\, \langle \psi(k), \varphi(k)\rangle_{L^2(\mathbb{T}^d)} .$$

Notice that if one considers the trivial representation, i.e. $\tau \equiv 1$, then $\mathcal{H}_\tau$ is simply a space of $\Gamma^*$-periodic vector-valued functions over $\mathbb{R}^d$. Obviously, there is a natural isomorphism between $\mathcal{H}_\tau$ and $L^2(M^*, L^2(\mathbb{T}^d))$ given by restriction from $\mathbb{R}^d$ to $M^*$, and with inverse given by $\tau$-equivariant continuation, as suggested by (9). The reason for working with $\mathcal{H}_\tau$ instead of $L^2(M^*, L^2(\mathbb{T}^d))$ is twofold. First, it allows us to apply the pseudodifferential calculus as developed in Appendix A. Second, it makes statements about domains of operators more transparent, as we shall see.

The map defined by (7) extends to a unitary operator

$$U : L^2(\mathbb{R}^d) \to \mathcal{H}_\tau \cong L^2(M^*, L^2(\mathbb{T}^d)) \cong L^2(M^*)\otimes L^2(\mathbb{T}^d) .$$

$U$ is an isometry and $U^{-1}$ given through

$$(U^{-1}\varphi)(x) = \int_{M^*} dk\, e^{ix\cdot k}\,\varphi(k, [x]) \qquad (11)$$

satisfies $U^{-1}U\psi = \psi$ for $\psi \in \mathcal{S}(\mathbb{R}^d)$, as can be checked by direct calculation. $U^{-1}$ extends to an isometry from $\mathcal{H}_\tau$ to $L^2(\mathbb{R}^d)$. Hence $U^{-1}$ must be injective and as a consequence $U$ must be surjective, thus unitary.

In order to determine the Zak transform of operators like the full Hamiltonian in (2), we need to discuss how differential and multiplication operators behave under the Zak
transform, see [Bl1, Za]. Let $P = -i\nabla_x$ with domain $H^1(\mathbb{R}^d)$ and $Q$ multiplication by $x$ on the maximal domain. Then

$$U P U^{-1} = 1\otimes(-i\nabla_y^{\rm per}) + k\otimes 1 , \qquad (12)$$
$$U Q U^{-1} = i\nabla_k^\tau , \qquad (13)$$

where $-i\nabla_y^{\rm per}$ is equipped with periodic boundary conditions or, equivalently, operates on the domain $H^1(\mathbb{T}^d)$. The domain of $i\nabla_k^\tau$ is $\mathcal{H}_\tau \cap H^1_{\rm loc}(\mathbb{R}^d, L^2(\mathbb{T}^d))$, i.e. it consists of distributions in $H^1(M^*, L^2(\mathbb{T}^d))$ which satisfy the $y$-dependent boundary condition associated with (9). In addition to (12) and (13) we notice that multiplication with a $\Gamma$-periodic function like $V$ is mapped into multiplication with the same function, i.e. $U V(x) U^{-1} = 1\otimes V(y)$.

For later use we remark that the following relations can be checked using the definitions (7) and (11):

$$\psi \in H^m(\mathbb{R}^d),\ m \ge 0 \iff U\psi \in L^2(M^*, H^m(\mathbb{T}^d)) ,$$
$$\langle x\rangle^m \psi(x) \in L^2(\mathbb{R}^d),\ m \ge 0 \iff U\psi \in \mathcal{H}_\tau \cap H^m_{\rm loc}(\mathbb{R}^d, L^2(\mathbb{T}^d)) .$$
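Remark 1. The Bloch-Floquet transform is usually defined as

$$(\widetilde U\psi)(k, y) := \sum_{\gamma\in\Gamma} e^{-ik\cdot\gamma}\,\psi(y+\gamma), \qquad (k, y) \in \mathbb{R}^{2d}, \qquad (14)$$

for $\psi \in \mathcal{S}(\mathbb{R}^d)$. In contrast to (7), functions in the range of $\widetilde U$ are periodic in $k$ and quasi-periodic in $y$,

$$(\widetilde U\psi)(k, y+\gamma) = e^{ik\cdot\gamma}\,(\widetilde U\psi)(k, y) \quad \text{for all } \gamma \in \Gamma, \qquad (15)$$
$$(\widetilde U\psi)(k+\gamma^*, y) = (\widetilde U\psi)(k, y) \quad \text{for all } \gamma^* \in \Gamma^*. \qquad (16)$$

Our choice of using the Zak transform $U$ instead of $\widetilde U$ comes from the fact that the transform of the gradient has a domain which is independent of $k \in M^*$, see (12). As we shall see, this is essential for the application of the pseudodifferential calculus of Appendix A. ♦

For the Zak transform of the free Hamiltonian one finds

$$U H_{\rm per} U^{-1} = \int_{M^*}^{\oplus} dk\, H_{\rm per}(k)$$

with

$$H_{\rm per}(k) = \tfrac{1}{2}\big(-i\nabla_y + k\big)^2 + V(y) , \qquad k \in M^* . \qquad (17)$$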
For fixed $k \in M^*$ the operator $H_{\rm per}(k)$ acts on $L^2(\mathbb{T}^d)$ with domain $H^2(\mathbb{T}^d)$ independent of $k \in M^*$, whenever the following assumption on the potential is satisfied.

Assumption A1. We assume that $V$ is infinitesimally bounded with respect to $-\Delta$ and that $\phi \in C_b^\infty(\mathbb{R}^d, \mathbb{R})$ and $A_j \in C_b^\infty(\mathbb{R}^d, \mathbb{R})$ for any $j \in \{1, \ldots, d\}$.
Here $C_b^\infty(\mathbb{R}^d, \mathbb{R})$ denotes the space of bounded smooth functions with derivatives of any order bounded. From this assumption it follows in particular that also the full Hamiltonian $H^\varepsilon$ of (2) is self-adjoint on $H^2(\mathbb{R}^d)$. Assumption (A1) excludes the case of globally constant electric and magnetic fields. However, since we are not concerned with the spectral analysis of $H^\varepsilon$, but with the dynamics of states for large but finite times, locally constant fields serve us as well.

The band structure of the fibered spectrum of $H_{\rm per}$ is crucial for the following. The resolvent $R^0_\lambda = (H_0(k) - \lambda)^{-1}$ of the operator $H_0(k) = \frac{1}{2}(-i\nabla_y + k)^2$ is compact for fixed $k \in M^*$. Since, by assumption, $R_\lambda V$ is bounded, also $R_\lambda = (H_{\rm per}(k) - \lambda)^{-1} = R^0_\lambda - R_\lambda V R^0_\lambda$ is compact. As a consequence $H_{\rm per}(k)$ has purely discrete spectrum with eigenvalues of finite multiplicity which accumulate at infinity. A more detailed discussion can be found e.g. in [Wi]. For definiteness the eigenvalues are enumerated increasingly as $E_1(k) \le E_2(k) \le E_3(k) \le \ldots$ and repeated according to their multiplicity. The corresponding normalized eigenfunctions $\{\varphi_n(k)\}_{n\in\mathbb{N}} \subset H^2(\mathbb{T}^d)$ are called Bloch functions and form, for any fixed $k$, an orthonormal basis of $L^2(\mathbb{T}^d)$. We will call $E_n(k)$ the $n$th band function or just the $n$th band. Notice that, with this choice of the labelling, $E_n(k)$ and $\varphi_n(k)$ are generally not smooth functions of $k$ due to eigenvalue crossings. Since

$$H_{\rm per}(k - \gamma^*) = \tau(\gamma^*)\, H_{\rm per}(k)\, \tau(\gamma^*)^{-1} , \qquad (18)$$

the band functions $E_n(k)$ are periodic with respect to $\Gamma^*$.
Definition 1. A family of Bloch bands $\{E_n(k)\}_{n\in I}$, $I = [I_-, I_+] \cap \mathbb{N}$, is called isolated, or satisfies the gap condition, if

$$\inf_{k\in M^*} {\rm dist}\Big( \bigcup_{n\in I}\{E_n(k)\},\ \bigcup_{m\notin I}\{E_m(k)\} \Big) =: C_g > 0 .$$
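To make the band functions and the gap condition concrete, the following sketch discretizes the fiber operator (17) in a plane-wave basis. It assumes $d = 1$, $\Gamma = \mathbb{Z}$ and the cosine potential $V(y) = 2v\cos(2\pi y)$; the potential, the truncation `m_max` and all function names are choices made for this illustration and do not appear in the original text.

```python
import numpy as np

def fiber_hamiltonian(k, v=5.0, m_max=20):
    """Plane-wave matrix of H_per(k) = 0.5*(-i d/dy + k)^2 + V(y) on L^2(T).

    Illustrative assumptions: Gamma = Z and V(y) = 2*v*cos(2*pi*y).
    The basis functions are exp(2*pi*i*m*y) with |m| <= m_max.
    """
    m = np.arange(-m_max, m_max + 1)
    h = np.diag(0.5 * (2.0 * np.pi * m + k) ** 2)
    # V couples plane waves whose indices differ by one, with amplitude v.
    h += v * (np.eye(m.size, k=1) + np.eye(m.size, k=-1))
    return h

def bands(k, n_bands=3, **kwargs):
    """Lowest eigenvalues E_1(k) <= E_2(k) <= ... of the truncated operator."""
    return np.linalg.eigvalsh(fiber_hamiltonian(k, **kwargs))[:n_bands]

# Sample the first Brillouin zone M* = [-pi, pi].
ks = np.linspace(-np.pi, np.pi, 201)
E = np.array([bands(k) for k in ks])

# Periodicity with respect to Gamma* = 2*pi*Z, up to truncation error.
periodicity_defect = np.max(np.abs(bands(0.3) - bands(0.3 + 2.0 * np.pi)))

# Gap constant C_g of Definition 1 for the single band I = {1},
# approximated on the sampled grid.
gap_constant = np.min(E[:, 1] - E[:, 0])

print(periodicity_defect, gap_constant)
```

In this one-dimensional example the lowest band is isolated, so $I = \{1\}$ satisfies the gap condition. In higher dimensions bands may overlap or cross, and the same computation then has to be done for a whole family $I$.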
In the following we fix an index set $I \subset \mathbb{N}$ for an isolated family of bands. Let $P_I(k)$ be the spectral projector of $H_{\rm per}(k)$ corresponding to the eigenvalues $\{E_n(k)\}_{n\in I}$; then $P_I := \int_{M^*}^{\oplus} dk\, P_I(k)$ defines the projector on the given isolated family of bands. In terms of Bloch functions, $P_I(k) = \sum_{n\in I} |\varphi_n(k)\rangle\langle\varphi_n(k)|$. However, in general, $\varphi_n(k)$ are not smooth functions of $k$ at eigenvalue crossings, while $P_I(k)$ is a smooth function of $k$ because of the gap condition. Moreover, from (18) it follows that

$$P_I(k - \gamma^*) = \tau(\gamma^*)\, P_I(k)\, \tau(\gamma^*)^{-1} .$$

For the mapping to the reference space we will need the following assumption.

Assumption A2. If the isolated family of bands $\{E_n(k)\}_{n\in I}$ is degenerate, in the sense that $\ell = |I| > 1$, then we assume that there exists an orthonormal basis $\{\psi_j(k)\}_{j=1}^{\ell}$ of ${\rm Ran}\,P_I(k)$ whose elements are smooth and $\tau$-equivariant with respect to $k$, i.e. $\psi_j(k - \gamma^*) = \tau(\gamma^*)\psi_j(k)$ for all $j \in \{1, \ldots, \ell\}$ and $\gamma^* \in \Gamma^*$.

In the case of a single isolated $\ell$-fold degenerate Bloch band (i.e. $E_n(k) = E_*(k)$ for every $n \in I$, $|I| = \ell$), Assumption (A2) is equivalent to the existence of an orthonormal basis consisting of smooth and $\tau$-equivariant Bloch functions. On the other hand, if there are eigenvalue crossings inside the family of bands, Assumption (A2) requires only that $\psi_j(k)$ is an eigenfunction of the corresponding eigenprojection $P_I(k)$ and not of the free Hamiltonian $H_{\rm per}(k)$.
From the geometrical viewpoint Assumption (A2) is equivalent to the triviality of a complex vector bundle over $\mathbb{T}^d$, namely the bundle of the null spaces of $1 - P_I(k)$ for $k \in M^*$. In this geometrical perspective it is not difficult to see that Assumption (A2) is always satisfied if either $d = 1$ or $\ell = 1$. Indeed, classification theory for bundles implies that any complex vector bundle over $\mathbb{T}^1 = S^1$ is trivial. As for $\ell = 1$, it is a classical result, due to Kostant and Weil, that smooth complex line bundles are completely classified by their first integer Chern class. In our case, the time-reversal symmetry of $H_{\rm per}$ implies the vanishing of the first integer Chern class, and therefore the triviality of the bundle. The same, and indeed slightly stronger, results can be proved with analytical techniques, as in [Ne] and references therein. By pushing forward the geometrical approach above, we expect that Assumption (A2) is generically satisfied for $d \le 3$, as will be discussed in [Pa].

In the presence of a strong external magnetic field the Bloch bands split into magnetic sub-bands. Generically, their first Chern number does not vanish and therefore Assumption (A2) fails. As is well understood and discussed in the introduction, the nonvanishing of the first Chern number is directly linked to the integer quantum Hall effect [TKNN, Si], hence our interest in extending Theorem 3 to magnetic Bloch bands. The required modifications of our theory will be discussed in [PST3].

3. Space-Adiabatic Perturbation for Bloch Bands

Let $P_n(k) = |\varphi_n(k)\rangle\langle\varphi_n(k)|$. Then the projector on the $n$th band subspace is given through $P_n = \int_{M^*}^{\oplus} dk\, P_n(k)$. By construction the band subspaces are invariant under the dynamics generated by $H_{\rm per}$,

$$\big[ e^{-iU H_{\rm per} U^{-1} s}, P_n \big] = \big[ e^{-iE_n(k)s}, P_n \big] = 0 \quad \text{for all } n \in \mathbb{N},\ s \in \mathbb{R} .$$

Notice that $P_n$ is not a spectral projector of $H_{\rm per}$, in general, since in more than one space dimension it can happen that e.g. $E_n(k) < E_{n+1}(k)$ for all $k \in M^*$ but $\inf_k E_{n+1}(k) < \sup_k E_n(k)$. According to the identity (12), in the original representation $H_{\rm per}$ acts on the $n$th band subspace as

$$H_{\rm per}\,\psi = U^{-1}\big(E_n(k)\otimes 1\big)U\,\psi = E_n(-i\nabla_x)\,\psi ,$$

where $\psi \in U^{-1}P_n U\, L^2(\mathbb{R}^d)$. In other words, under the time evolution generated by the periodic Hamiltonian, wave functions in the $n$th band subspace propagate freely but with a modified dispersion relation given through the $n$th band function $E_n(k)$.

In the presence of non-periodic external fields the subspaces $P_n\mathcal{H}_\tau$ are no longer invariant, since the external fields induce transitions between different band subspaces. If the potentials are varying slowly, these transitions are small and one expects that there still exist almost invariant subspaces associated with isolated Bloch bands. To construct them, and to study the dynamics inside these almost invariant subspaces, we apply adiabatic perturbation theory to perturbed Bloch bands.

We first present a theorem which summarizes the main results of this section. The remaining parts give the results and the proofs of the three main steps in space-adiabatic perturbation theory: In Sect. 3.1 we construct the almost invariant subspaces associated with isolated Bloch bands. In Sect. 3.2 we explain how to unitarily map the decoupled subspace to a suitable reference Hilbert space. In this reference space the action of the full Hamiltonian is given through a semiclassical pseudodifferential operator, whose
power series expansion can be computed to any order in $\varepsilon$. This effective Hamiltonian is constructed in Sect. 3.3 and we compute explicitly its principal and subprincipal symbol.

The main technical innovation necessary in order to apply the scheme to the present case is the development of a pseudodifferential calculus for operators acting on sections of a bundle over the flat torus $M^*$, or, equivalently, acting on the space $\mathcal{H}_\tau$. This task is deferred to Appendix A.

Before going into the details of the construction we present a theorem which encompasses the main results of this section. Generalizing from (10) it is convenient to introduce the following notation. For any separable Hilbert space $\mathcal{H}_f$ and any unitary representation $\tau : \Gamma^* \to \mathcal{U}(\mathcal{H}_f)$, one defines the Hilbert space

$$L^2_\tau(\mathbb{R}^d, \mathcal{H}_f) := \big\{ \psi \in L^2_{\rm loc}(\mathbb{R}^d, \mathcal{H}_f) : \ \psi(k-\gamma^*) = \tau(\gamma^*)\,\psi(k) \big\} ,$$

equipped with the inner product

$$\langle\psi, \varphi\rangle_{L^2_\tau} = \int_{M^*} dk\, \langle\psi(k), \varphi(k)\rangle_{\mathcal{H}_f} .$$

Using the results of the previous section and imposing Assumption (A1), the Zak transform of the full Hamiltonian in (2) is given through

$$H^\varepsilon_Z := U H^\varepsilon U^{-1} = \tfrac{1}{2}\big(-i\nabla_y + k - A(i\varepsilon\nabla_k^\tau)\big)^2 + V(y) + \phi(i\varepsilon\nabla_k^\tau) \qquad (19)$$

with domain $L^2_\tau(\mathbb{R}^d, H^2(\mathbb{T}^d))$. The application of space-adiabatic perturbation theory to an isolated family of bands $\{E_n(k)\}_{n\in I}$ yields the following result, where the reference Hilbert space for the effective dynamics is $\mathcal{K} := L^2(M^*)\otimes\mathbb{C}^\ell$ with $\ell := \dim P_I(k)$.

Theorem 1 (Peierls substitution and higher order corrections). Let $\{E_n\}_{n\in I}$ be an isolated family of bands, see Definition 1, and let the Assumptions (A1) and (A2) be satisfied. Then there exist
(i) an orthogonal projection $\Pi^\varepsilon \in \mathcal{B}(\mathcal{H}_\tau)$,
(ii) a unitary map $U^\varepsilon \in \mathcal{B}(\Pi^\varepsilon\mathcal{H}_\tau, \mathcal{K})$, and
(iii) a self-adjoint operator $\hat h \in \mathcal{B}(\mathcal{K})$
such that

$$\big\| [H^\varepsilon_Z, \Pi^\varepsilon] \big\| = O(\varepsilon^\infty)$$

and

$$\Big\| \big( e^{-iH^\varepsilon_Z t} - U^{\varepsilon\,*}\, e^{-i\hat h t}\, U^\varepsilon \big)\Pi^\varepsilon \Big\| = O\big(\varepsilon^\infty(1+|t|)\big) .$$

The effective Hamiltonian $\hat h$ is the Weyl quantization of a semiclassical symbol $h \in S^1_{\tau\equiv 1}(\varepsilon, \mathcal{B}(\mathbb{C}^\ell))$ with an asymptotic expansion to any order. The $\mathcal{B}(\mathbb{C}^\ell)$-valued principal symbol $h_0(k, r)$ has matrix-elements

$$h_0(k, r)_{\alpha\beta} = \big\langle \psi_\alpha(k - A(r)),\ H_0(k, r)\,\psi_\beta(k - A(r)) \big\rangle , \qquad (20)$$

where $\alpha, \beta \in \{1, \ldots, \ell\}$ and $H_0(k, r)$ is defined in (24).

The general formula for the subprincipal symbol of the effective Hamiltonian can be found in [PST1]. The structure and the interpretation of the effective Hamiltonian are most transparent for the case of a single isolated band.
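Corollary 1. For an isolated $\ell$-fold degenerate eigenvalue $E(k)$ the $\mathcal{B}(\mathbb{C}^\ell)$-valued symbol $h(k, r) = h_0(k, r) + \varepsilon h_1(k, r) + O_0(\varepsilon^2)$ constructed in Theorem 1 has matrix-elements

$$h_0(k, r)_{\alpha\beta} = \big( E(k - A(r)) + \phi(r) \big)\,\delta_{\alpha\beta} \qquad (21)$$

and

$$h_1(k, r)_{\alpha\beta} = -\big( -\nabla\phi(r) + \nabla E(\tilde k)\times B(r) \big)\cdot\mathcal{A}(\tilde k)_{\alpha\beta} - B(r)\cdot\mathcal{M}(\tilde k)_{\alpha\beta}$$
$$:= \Big( \partial_j\phi(r) - \partial_l E(\tilde k)\,\big(\partial_j A_l - \partial_l A_j\big)(r) \Big)\,\mathcal{A}_j(\tilde k)_{\alpha\beta} - \big(\partial_j A_l - \partial_l A_j\big)(r)\ {\rm Re}\Big[ \tfrac{i}{2}\,\big\langle \partial_l\psi_\alpha(\tilde k),\ (H_{\rm per} - E)(\tilde k)\,\partial_j\psi_\beta(\tilde k) \big\rangle_{\mathcal{H}_f} \Big] , \qquad (22)$$

where summation over indices appearing twice is implicit, $\tilde k(k, r) = k - A(r)$, and $\alpha, \beta \in \{1, \ldots, \ell\}$. The coefficients of the Berry connection are

$$\mathcal{A}(k)_{\alpha\beta} = i\,\big\langle \psi_\alpha(k), \nabla\psi_\beta(k) \big\rangle_{\mathcal{H}_f} . \qquad (23)$$

In dimension $d = 3$ the subprincipal symbol (22) has a straightforward physical interpretation. The 2-forms $B$ and $\mathcal{M}$ are naturally identified with the vectors $B = {\rm curl}\,A$ and

$$\mathcal{M}(k)_{\alpha\beta} = \tfrac{i}{2}\,\big\langle \nabla\psi_\alpha(k),\ \times\,(H_{\rm per}(k) - E(k))\,\nabla\psi_\beta(k) \big\rangle_{\mathcal{H}_f} .$$

Therefore the symbol of the effective Hamiltonian has the same form as the energy of a classical charge distribution in weak external fields, in first order multipole expansion. In this sense $\mathcal{A}(k)$ is interpreted as an effective electric dipole moment and $\mathcal{M}(k)$ as an effective magnetic dipole moment.

Remark 2. Our results hold for arbitrary dimension $d$. However, to simplify presentation, we use a notation motivated by the vector product and the duality between 1-forms and 2-forms for $d = 3$. If $d \neq 3$, then $B$, $\Omega_n$ and $\mathcal{M}_n$ are 2-forms. The inner product of 2-forms is

$$B\cdot\mathcal{M} := *^{-1}(B\wedge *\mathcal{M}) = \sum_{j=1}^{d}\sum_{i=1}^{d} B_{ij}\,\mathcal{M}_{ij} ,$$

where $*$ denotes the Hodge duality induced by the euclidean metric, and for a vector field $w$ and a 2-form $F$ the "vector product" is

$$(w\times F)_j := \big( *^{-1}(w\wedge *F) \big)_j = \sum_{i=1}^{d} w_i F_{ij} ,$$

where the duality between 1-forms and vector fields was used implicitly. ♦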
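The component formulas of Remark 2 can be compared with ordinary vector calculus in $d = 3$. The sketch below assumes the identification $F_{ij} = \epsilon_{ijk} b_k$ between a vector $b$ and an antisymmetric matrix $F$; this identification, as well as all function names, is an assumption made for this illustration, and the identification intended in the text may differ from it by a sign or a normalization factor.

```python
import numpy as np

def levi_civita():
    """Totally antisymmetric tensor eps_ijk in three dimensions."""
    eps = np.zeros((3, 3, 3))
    eps[0, 1, 2] = eps[1, 2, 0] = eps[2, 0, 1] = 1.0
    eps[0, 2, 1] = eps[2, 1, 0] = eps[1, 0, 2] = -1.0
    return eps

def two_form_from_vector(b):
    """Assumed identification F_ij = eps_ijk b_k (illustrative convention)."""
    return np.einsum("ijk,k->ij", levi_civita(), np.asarray(b, dtype=float))

def form_inner(B, M):
    """Inner product of 2-forms as in Remark 2: sum_ij B_ij M_ij."""
    return np.sum(B * M)

def vector_times_form(w, F):
    """'Vector product' of Remark 2: (w x F)_j = sum_i w_i F_ij."""
    return np.asarray(w, dtype=float) @ F

b = np.array([0.2, -1.0, 0.5])
m = np.array([1.5, 0.3, -0.7])
w = np.array([0.4, 0.1, 2.0])

B_form = two_form_from_vector(b)
M_form = two_form_from_vector(m)

inner = form_inner(B_form, M_form)        # equals 2 * (b . m) in this convention
product = vector_times_form(w, B_form)    # equals b x w in this convention
print(inner, product)
```

With this particular convention the double sum counts every independent component twice, and the "vector product" reproduces the cross product with the factors in the order $b \times w$. The point of the sketch is only that the formulas of Remark 2 make sense for any antisymmetric matrix, in any dimension.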
Theorem 1 is a direct consequence of the results proved in Propositions 1, 2 and 3. The proof of Corollary 1 is given at the end of this section. As mentioned before, the main idea of the proof is to adapt the general scheme of space-adiabatic perturbation theory to the case of the Bloch electron. While formally this seems straightforward, one must overcome two mathematical problems. First of all, in the present case the symbols are unbounded-operator-valued functions. One can deal with unbounded-operator-valued symbols by considering them as bounded operators from their domain equipped with the graph norm into the Hilbert space, see e.g. [DiSj]. The second, more serious problem consists in setting up a Weyl calculus for operators
acting on spaces like $L^2_\tau(\mathbb{R}^d, \mathcal{H}_f)$. This is done in Appendix A and we will use in this section the terminology and notations introduced there.

The results of Appendix A allow us to write the Hamiltonian $H^\varepsilon_Z$ as the Weyl quantization $H_0(k, i\varepsilon\nabla_k)$ of the $\tau$-equivariant symbol

$$H_0(k, r) = \tfrac{1}{2}\big(-i\nabla_x + k - A(r)\big)^2 + V(x) + \phi(r) \qquad (24)$$

acting on the Hilbert space $\mathcal{H}_f := L^2(\mathbb{T}^d, dx)$ with constant domain $\mathcal{D} := H^2(\mathbb{T}^d)$. For the sake of clarity, we elaborate on this point. For any fixed $(k, r) \in \mathbb{R}^{2d}$, $H_0(k, r)$ is regarded as a bounded operator from $\mathcal{D}$ to $\mathcal{H}_f$ which is $\tau$-equivariant with respect to the bounded representation $\tau_1 := \tau|_{\mathcal{D}}$ acting on $\mathcal{D}$ and the unitary representation $\tau_2 := \tau$ acting on $\mathcal{H}_f$, see Definition 6. Then the general theory developed in Appendix A can be applied. The usual Weyl quantization of $H_0$ is an operator from $\mathcal{S}'(\mathbb{R}^d, \mathcal{D})$ to $\mathcal{S}'(\mathbb{R}^d, \mathcal{H}_f)$ given by

$$\widehat H_0 = \tfrac{1}{2}\big(-i\nabla_y + k - A(i\varepsilon\nabla_k)\big)^2 + V(y) + \phi(i\varepsilon\nabla_k) . \qquad (25)$$

Then $\widehat H_0$ can be restricted to $L^2_{\rm loc}(\mathbb{R}^d, \mathcal{D})$, since $A$ and $\phi$ are smooth and bounded. Since $H_0$ is a $\tau$-equivariant symbol, $\widehat H_0$ preserves $\tau$-equivariance and can then be restricted to an operator from $L^2_\tau(\mathbb{R}^d, \mathcal{D})$ to $L^2_\tau(\mathbb{R}^d, \mathcal{H}_f)$. To conclude that (25), restricted to $L^2_\tau(\mathbb{R}^d, \mathcal{D})$, agrees with (19), it is enough to recall that $i\nabla_k^\tau$ is defined as $i\nabla_k$ restricted to $H^1\cap\mathcal{H}_\tau$ and to use the spectral calculus.

Moreover, if one introduces the order function $w(k, r) := (1 + k^2)$, then $H_0 \in S^w_\tau(\mathcal{B}(\mathcal{D}, \mathcal{H}_f))$. More generally, we will give the proofs for any symbol $H \in S^w_\tau(\varepsilon, \mathcal{B}(\mathcal{D}, \mathcal{H}_f))$, whose principal symbol is then denoted by $H_0$.

3.1. The almost invariant subspace. In this section we construct the adiabatically decoupled subspace associated with an isolated Bloch band. Similar constructions have a considerable history and we refer to [MaSo, NeSo, PST1, Te1] and references therein.

Given an isolated family of bands $\{E_n(k)\}_{n\in I}$, we define $\pi_0(k, r) = P_I(k - A(r))$. It follows from the $\tau$-equivariance of $H_0$ and from the gap condition that $\pi_0 \in S^1_\tau(\mathcal{B}(\mathcal{H}_f))$. We also define the shorthand $A(\varepsilon) = O_0(\varepsilon^n)$, where the subscript expresses that a family $A(\varepsilon) \in \mathcal{B}(\mathcal{H})$ is $O(\varepsilon^n)$ in the norm of bounded operators. By $A(\varepsilon) = O_0(\varepsilon^\infty)$ we mean that $A(\varepsilon) = O_0(\varepsilon^n)$ for any $n \in \mathbb{N}$. The remaining notation is defined in Appendix A.

Proposition 1.
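Let $\{E_n\}_{n\in I}$ be an isolated family of bands and let Assumption (A1) be satisfied. Then there exists an orthogonal projection $\Pi^\varepsilon \in \mathcal{B}(\mathcal{H}_\tau)$ such that

$$\big[ H^\varepsilon_Z, \Pi^\varepsilon \big] = O_0(\varepsilon^\infty) \qquad (26)$$

and $\Pi^\varepsilon = \hat\pi + O_0(\varepsilon^\infty)$, where $\hat\pi$ is the Weyl quantization of a $\tau$-equivariant semiclassical symbol

$$\pi \asymp \sum_{j\ge 0} \varepsilon^j \pi_j \quad \text{in } S^1_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f)) ,$$

whose principal part $\pi_0(k, r)$ is the spectral projector of $H_0(k, r)$ corresponding to the given isolated family of bands.

Proof. We first construct $\pi$ on a formal symbol level.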
Lemma 1. Let $w(k, r) = (1 + k^2)$. There exists a unique formal symbol

$$\pi = \sum_{j=0}^{\infty} \varepsilon^j \pi_j \ \in\ M^1_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f)) \cap M^w_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f, \mathcal{D}))$$

such that $\pi_0(k, r) = P_I(k - A(r))$ and
(i) $\pi\,\sharp\,\pi = \pi$,
(ii) $\pi^* = \pi$,
(iii) $H\,\sharp\,\pi - \pi\,\sharp\,H = 0$.

Proof. We construct the formal symbol $\pi$ locally in phase space and obtain by uniqueness, which can be proved as in [PST1], a globally defined formal symbol. Fix a point $z_0 = (k_0, r_0) \in \mathbb{R}^{2d}$. From the continuity of the map $z \mapsto H(z)$ and the gap condition it follows that there exists a neighborhood $U_{z_0}$ of $z_0$ such that for every $z \in U_{z_0}$ the set $\{E_n(z)\}_{n\in I}$ can be enclosed by a positively-oriented circle $\Lambda(z_0) \subset \mathbb{C}$ independent of $z$ in such a way that $\Lambda(z_0)$ is symmetric with respect to the real axis,

$${\rm dist}\big( \Lambda(z_0), \sigma(H(z)) \big) \ge \tfrac{1}{4} C_g \quad \text{for all } z \in U_{z_0} , \qquad (27)$$

and

$${\rm Radius}(\Lambda(z_0)) \le C_r . \qquad (28)$$

The constant $C_g$ appearing in (27) is the same as in Definition 1 and the existence of a constant $C_r$ independent of $z_0$ such that (28) is satisfied follows from the periodicity of $\{E_n(z)\}_{n\in I}$ and the fact that $A$ and $\phi$ are bounded. Indeed, $\Lambda$ can be chosen $\Gamma^*$-periodic, i.e. such that $\Lambda(k_0 + \gamma^*, r_0) = \Lambda(k_0, r_0)$ for all $\gamma^* \in \Gamma^*$.

Let us choose any $\zeta \in \Lambda(z_0)$ and restrict all the following expressions to $z \in U_{z_0}$. We will construct a formal symbol $R(\zeta)$ with values in $\mathcal{B}(\mathcal{H}_f, \mathcal{D})$, the local Moyal resolvent of $H$, such that

$$(H - \zeta)\,\sharp\,R(\zeta) = 1_{\mathcal{H}_f} \quad \text{and} \quad R(\zeta)\,\sharp\,(H - \zeta) = 1_{\mathcal{D}} \quad \text{on } U_{z_0} . \qquad (29)$$

To this end let

$$R_0(\zeta) = (H - \zeta)^{-1} ,$$

where according to (27) $R_0(\zeta)(z) \in \mathcal{B}(\mathcal{H}_f, \mathcal{D})$ for all $z \in U_{z_0}$, and, using differentiability of $H(z)$, $\partial_z^\alpha R_0(\zeta)(z) \in \mathcal{B}(\mathcal{H}_f, \mathcal{D})$ for all $z \in U_{z_0}$. By construction one has

$$(H - \zeta)\,\sharp\,R_0(\zeta) = 1_{\mathcal{H}_f} + O_0(\varepsilon) ,$$

where the remainder is $O(\varepsilon)$ in the $\mathcal{B}(\mathcal{H}_f)$-norm. We proceed by induction. Suppose that

$$R^{(n)}(\zeta) = \sum_{j=0}^{n} \varepsilon^j R_j(\zeta)$$

with $R_j(\zeta)(z) \in \mathcal{B}(\mathcal{H}_f, \mathcal{D})$ for all $z \in U_{z_0}$ satisfies the first equality in (29) up to $O(\varepsilon^{n+1})$, i.e.

$$(H - \zeta)\,\sharp\,R^{(n)}(\zeta) = 1_{\mathcal{H}_f} + \varepsilon^{n+1} E_{n+1}(\zeta) + O_0(\varepsilon^{n+2}) , \qquad (30)$$
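where $E_{n+1}(\zeta)(z) \in \mathcal{B}(\mathcal{H}_f)$. By choosing

$$R_{n+1}(\zeta) = -R_0(\zeta)\, E_{n+1}(\zeta) , \qquad (31)$$

we obtain that $R^{(n+1)}(\zeta) = R^{(n)}(\zeta) + \varepsilon^{n+1} R_{n+1}(\zeta)$ takes values in $\mathcal{B}(\mathcal{H}_f, \mathcal{D})$ and satisfies the first equality in (29) up to $O(\varepsilon^{n+2})$. Hence the formal symbol $R(\zeta) = \sum_{j=0}^{\infty} \varepsilon^j R_j(\zeta)$ constructed that way satisfies the first equality in (29) exactly. By the same argument one shows that there exists a formal symbol $\widetilde R(\zeta)$ with values in $\mathcal{B}(\mathcal{H}_f, \mathcal{D})$ which exactly satisfies the second equality in (29). By the associativity of the Moyal product, they must agree:

$$\widetilde R(\zeta) = \widetilde R(\zeta)\,\sharp\,(H - \zeta)\,\sharp\,R(\zeta) = R(\zeta) \quad \text{on } U_{z_0} .$$

Equations (29) imply that $R(\zeta)$ satisfies the resolvent equation

$$R(\zeta) - R(\zeta') = (\zeta - \zeta')\, R(\zeta)\,\sharp\,R(\zeta') \quad \text{on } U_{z_0} \qquad (32)$$

for any $\zeta, \zeta' \in \Lambda(z_0)$. From the resolvent equation it follows as in [PST1] that the $\mathcal{B}(\mathcal{H}_f, \mathcal{D})$-valued formal symbol $\pi = \sum_{j=0}^{\infty} \varepsilon^j \pi_j$ defined through

$$\pi_j(z) := \frac{i}{2\pi} \oint_{\Lambda(z_0)} d\zeta\, R_j(\zeta, z) \quad \text{on } U_{z_0} \qquad (33)$$

satisfies (i) and (ii) of Lemma 1. As for (iii), a little bit of care is required. Let $J : \mathcal{D} \to \mathcal{H}_f$ be the continuous injection of $\mathcal{D}$ into $\mathcal{H}_f$. Using (33) and (32) it follows that $\pi\,\sharp\,J R(\zeta) = R(\zeta)\,\sharp\,J\pi$ for all $\zeta \in \Lambda(z_0)$. Moyal-multiplying from the left and from the right with $H - \zeta$ one finds $H\,\sharp\,\pi J = J\pi\,\sharp\,H$ as operators in $\mathcal{B}(\mathcal{D}, \mathcal{H}_f)$. However, by construction $H\,\sharp\,\pi$ takes values in $\mathcal{B}(\mathcal{H}_f)$ and, by density of $\mathcal{D}$, the same must be true for $\pi\,\sharp\,H$.

We are left to show that $\pi \in M^1_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f)) \cap M^w_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f, \mathcal{D}))$. To this end notice that by construction $\pi$ inherits the $\tau$-equivariance of $H$, i.e.

$$\pi_j(k - \gamma^*, q) = \tau(\gamma^*)\,\pi_j(k, q)\,\tau(\gamma^*)^{-1} .$$

From (33) and (28) we conclude that for each $\alpha \in \mathbb{N}^{2d}$ and $j \in \mathbb{N}$ one has

$$\big\| (\partial_z^\alpha \pi_j)(z) \big\| \le 2\pi C_r \sup_{\zeta\in\Lambda(z_0)} \big\| (\partial_z^\alpha R_j)(\zeta, z) \big\| , \qquad (34)$$

where $\|\cdot\|$ stands either for the norm of $\mathcal{B}(\mathcal{H}_f)$ or for the norm of $\mathcal{B}(\mathcal{H}_f, \mathcal{D})$. In order to show that $\pi \in M^1_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f))$ it suffices to consider $z = (k, r) \in M^*\times\mathbb{R}^d$, since $\tau(\gamma^*)$ is unitary and thus the $\mathcal{B}(\mathcal{H}_f)$-norm of $\pi$ is periodic. According to (34) we must show that

$$\big\| (\partial_z^\alpha R_j)(\zeta, z) \big\|_{\mathcal{B}(\mathcal{H}_f)} \le C_{\alpha j} \quad \forall\, z \in U_{z_0},\ \zeta \in \Lambda(z_0) \qquad (35)$$

with $C_{\alpha j}$ independent of $z_0 \in M^*\times\mathbb{R}^d$. We prove (35) by induction. Assume, by induction hypothesis, that for any $j \le n$ one has that

$$R_j(\zeta) \in S^1_\tau(\mathcal{B}(\mathcal{H}_f)) \cap S^w_\tau(\mathcal{B}(\mathcal{H}_f, \mathcal{D})) \qquad (36)$$

uniformly in $\zeta$, in the sense that the Fréchet semi-norms are bounded by $\zeta$-independent constants. Then, according to Proposition 9, $E_{n+1}(\zeta)$, as defined by (30), belongs to $S^{w^2}_\tau(\mathcal{B}(\mathcal{H}_f))$ uniformly in $\zeta$. By $\tau$-equivariance, the norm of $E_{n+1}(\zeta)$ is periodic and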
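one concludes that $E_{n+1}(\zeta) \in S^1_\tau(\mathcal{B}(\mathcal{H}_f))$ uniformly in $\zeta$. It follows from (31) that (36) is satisfied for $j = n+1$.

We are left to show that (36) is fulfilled for $j = 0$. We notice that according to (27) one has for all $z \in \mathbb{R}^{2d}$,

$$\big\| R_0(\zeta)(z) \big\|_{\mathcal{B}(\mathcal{H}_f)} = \big\| (H(z) - \zeta)^{-1} \big\|_{\mathcal{B}(\mathcal{H}_f)} = \frac{1}{{\rm dist}(\zeta, \sigma(H(z)))} \le \frac{4}{C_g} .$$

By the chain rule,

$$\big\| (\partial_z R_0)(\zeta, z) \big\|_{\mathcal{B}(\mathcal{H}_f)} = \big\| \big( R_0(\zeta)\,(\partial_z H_0)\,R_0(\zeta) \big)(z) \big\|_{\mathcal{B}(\mathcal{H}_f)} . \qquad (37)$$

Since $\partial_z H_0\, R_0(\zeta)$ is a $\tau$-equivariant $\mathcal{B}(\mathcal{H}_f)$-valued symbol, its norm is periodic. Therefore it suffices to estimate its norm for $z \in M^*\times\mathbb{R}^d$, which yields the required bound. For a general $\alpha \in \mathbb{N}^{2d}$, the norm of $\partial_z^\alpha R_0(\zeta)$ can be bounded in a similar way. This proves that $R_0(\zeta)$ belongs to $S^1_\tau(\mathcal{B}(\mathcal{H}_f))$ uniformly in $\zeta$. On the other hand,

$$\big\| R_0(k, r) \big\|_{\mathcal{B}(\mathcal{H}_f, \mathcal{D})} = \big\| (1 + \Delta_x)\, R_0([k] + \gamma^*, r) \big\|_{\mathcal{B}(\mathcal{H}_f)} = \big\| (1 + \Delta_x)\,\tau(\gamma^*)\, R_0([k], r)\,\tau^{-1}(\gamma^*) \big\|_{\mathcal{B}(\mathcal{H}_f)}$$
$$\le C\,(1 + |\gamma^*|^2)\, \big\| (1 + \Delta_x)\, R_0([k], r) \big\|_{\mathcal{B}(\mathcal{H}_f)} \le C'\,(1 + |\gamma^*|^2) \le 2C'\,(1 + k^2) ,$$

where we used the fact that $\| (1 + \Delta_x) R_0(z) \|_{\mathcal{B}(\mathcal{H}_f)}$ is bounded for $z \in M^*\times\mathbb{R}^d$. The previous estimate and the fact that $\partial_z H_0\, R_0(\zeta) \in S^1_\tau(\mathcal{B}(\mathcal{H}_f))$ yield

$$\big\| (\partial_z R_0)(\zeta, z) \big\|_{\mathcal{B}(\mathcal{H}_f, \mathcal{D})} = \big\| \big( R_0(\zeta)\,(\partial_z H_0)\,R_0(\zeta) \big)(z) \big\|_{\mathcal{B}(\mathcal{H}_f, \mathcal{D})} \le C\,(1 + k^2) .$$

Higher order derivatives are bounded by the same argument, yielding that $R_0(\zeta)$ belongs to $S^w_\tau(\mathcal{B}(\mathcal{H}_f, \mathcal{D}))$ uniformly in $\zeta$. This concludes the induction argument.

From the previous argument it follows moreover that

$$\big\| (\partial_z^\alpha R_j)(\zeta, z) \big\|_{\mathcal{B}(\mathcal{H}_f, \mathcal{D})} \le C_{\alpha j}\, w(z) \quad \forall\, z \in U_{z_0},\ \zeta \in \Lambda(z_0) \qquad (38)$$

with $C_{\alpha j}$ independent of $z_0 \in \mathbb{R}^{2d}$. By (34), this implies $\pi \in M^w_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f, \mathcal{D}))$ and concludes the proof.

Proof of Proposition 1. From the projector constructed in Lemma 1 one obtains, by resummation, a semiclassical symbol $\pi \in S^1_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f))$ whose asymptotic expansion is given by $\sum_{j\ge 0} \varepsilon^j \pi_j$. Then according to Proposition 3 Weyl quantization yields a bounded operator $\hat\pi \in \mathcal{B}(\mathcal{H}_\tau)$, which is approximately a projector in the sense that

$$\hat\pi^2 = \hat\pi + O_0(\varepsilon^\infty) \quad \text{and} \quad \hat\pi^* = \hat\pi .$$

We notice that Proposition 9 implies that $H\,\sharp\,\pi \in S^{w^2}_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f))$. But $\tau$-equivariance implies that the norm is periodic and then $H\,\sharp\,\pi$ belongs indeed to $S^1_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f))$. Then $\pi\,\sharp\,H = (H\,\sharp\,\pi)^*$ belongs to the same class, so that $[H, \pi]_\sharp \in S^1_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_f))$. This a priori information on the symbol class, together with Lemma 1.(iii), assures that

$$\big[ \widehat H, \hat\pi \big] = O_0(\varepsilon^\infty) \qquad (39)$$

with the remainder bounded in the $\mathcal{B}(\mathcal{H}_\tau)$-norm.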
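In order to get a true projector, we proceed as in [NeSo]. For $\varepsilon$ small enough, let

$$\Pi^\varepsilon := \frac{i}{2\pi} \oint_{|\zeta - 1| = \frac{1}{2}} d\zeta\, (\hat\pi - \zeta)^{-1} . \qquad (40)$$

Then it follows that $(\Pi^\varepsilon)^2 = \Pi^\varepsilon$, $\Pi^\varepsilon = \hat\pi + O_0(\varepsilon^\infty)$ and

$$\big\| [\widehat H, \Pi^\varepsilon] \big\|_{\mathcal{B}(\mathcal{H}_\tau)} \le C\, \big\| [\widehat H, \hat\pi] \big\|_{\mathcal{B}(\mathcal{H}_\tau)} = O(\varepsilon^\infty) .$$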
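The step from the almost-projector $\hat\pi$ to the true projector $\Pi^\varepsilon$ in (40) has a finite dimensional analogue that can be tested directly. The sketch below replaces $\hat\pi$ by a symmetric matrix that is a small perturbation of an orthogonal projection and evaluates the contour integral by the trapezoidal rule; the matrix size, the perturbation strength and the number of quadrature nodes are arbitrary choices made for this illustration, not part of the original text.

```python
import numpy as np

def riesz_projection(a, center=1.0, radius=0.5, n_nodes=64):
    """Evaluate (i / 2 pi) * contour integral of (a - zeta)^(-1) d zeta.

    The contour is the positively oriented circle |zeta - center| = radius,
    discretized by the trapezoidal rule with n_nodes equispaced nodes.
    """
    dim = a.shape[0]
    identity = np.eye(dim)
    total = np.zeros((dim, dim), dtype=complex)
    for theta in 2.0 * np.pi * np.arange(n_nodes) / n_nodes:
        zeta = center + radius * np.exp(1j * theta)
        # d zeta = i * radius * exp(i theta) d theta
        dzeta = 1j * radius * np.exp(1j * theta) * (2.0 * np.pi / n_nodes)
        total += np.linalg.inv(a - zeta * identity) * dzeta
    return (1j / (2.0 * np.pi)) * total

rng = np.random.default_rng(0)

# An exact rank-2 orthogonal projection in dimension 5.
q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
p_exact = q @ np.diag([1.0, 1.0, 0.0, 0.0, 0.0]) @ q.T

# A symmetric perturbation of operator norm 1e-3 plays the role of the O(eps) error.
s = rng.standard_normal((5, 5))
s = 0.5 * (s + s.T)
almost_projector = p_exact + 1e-3 * s / np.linalg.norm(s, 2)

projector = riesz_projection(almost_projector)

idempotency_defect = np.linalg.norm(projector @ projector - projector, 2)
distance_to_input = np.linalg.norm(projector - almost_projector, 2)
print(idempotency_defect, distance_to_input)
```

The result is idempotent up to rounding, while it differs from the input only by the size of the perturbation. This mirrors the two statements $(\Pi^\varepsilon)^2 = \Pi^\varepsilon$ and $\Pi^\varepsilon = \hat\pi + O_0(\varepsilon^\infty)$, with the matrix perturbation standing in for the $O_0(\varepsilon^\infty)$ defect of $\hat\pi$.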
3.2. The intertwining unitaries. After having determined the decoupled subspace associated with an isolated family of Bloch bands, we aim at an effective description of the intraband dynamics, i.e. the dynamics inside this subspace. In order to get a workable formulation of the effective dynamics, it is convenient to map the decoupled subspace to a simpler reference space. The natural reference Hilbert space for the effective dynamics is $\mathcal{K} := L^2(\mathbb{T}^{d*})\otimes\mathbb{C}^\ell$, where $\ell := \dim P_I(k)$ and $\mathbb{T}^{d*}$ is $M^*$ with periodic boundary conditions. Notation will be simpler in the following, if we think of the fibre $\mathbb{C}^\ell$ as a subspace of $\mathcal{H}_f$. In order to construct such a unitary mapping, we reformulate Assumption (A2).

Assumption A2'. Let $\{E_n(k)\}_{n\in I}$ be an isolated family of bands and let $\pi_r \in \mathcal{B}(\mathcal{H}_f)$ be an orthogonal projector with $\dim\pi_r = \ell$. There is a unitary-operator-valued map $u_0 : \mathbb{R}^{2d} \to \mathcal{U}(\mathcal{H}_f)$ so that

$$u_0(k, r)\,\pi_0(k, r)\,u_0^*(k, r) = \pi_r \quad \text{for any } (k, r) \in \mathbb{R}^{2d} , \qquad (41)$$
$$u_0(k + \gamma^*, r) = u_0(k, r)\,\tau(\gamma^*)^{-1} , \qquad (42)$$

and $u_0$ belongs to $S^1(\mathcal{B}(\mathcal{H}_f))$.

Clearly,

$$u_0^*(k + \gamma^*, r) = \tau(\gamma^*)\,u_0^*(k, r) . \qquad (43)$$

An operator-valued symbol satisfying (43) (resp. (42)) is called left $\tau$-covariant (resp. right $\tau$-covariant).

The equivalence of (A2) and (A2') can be seen as follows. According to Assumption (A2), there exists an orthonormal basis $\{\psi_j(k)\}_{j=1}^{\ell}$ of ${\rm Ran}\,P_I(k)$ which is smooth and $\tau$-equivariant with respect to $k$. Let $\pi_r := \pi_0(k_0, r_0)$ for any fixed point $(k_0, r_0)$. By the gap condition, $\dim\pi_r = \dim P_I(k)$. Then for any orthonormal basis $\{\chi_j\}_{j=1}^{\ell}$ for ${\rm Ran}\,\pi_r$, the formula

$$\tilde u_0(k, r) := \sum_{j=1}^{\ell} |\chi_j\rangle\langle\psi_j(k - A(r))| \qquad (44)$$

defines a partial isometry which can be extended to a unitary operator $u_0(k, r) \in \mathcal{U}(\mathcal{H}_f)$. The fact that $\{\psi_j(k)\}_{j=1}^{\ell}$ spans ${\rm Ran}\,P_I(k)$ implies (41), and the $\tau$-equivariance of $\psi_j(k)$ is reflected in (42). Conversely, given $u_0$ fulfilling Assumption (A2'), one can check that the formula $\psi_j(k - A(r)) := u_0^*(k, r)\chi_j$,
G. Panati, H. Spohn, S. Teufel
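with $\{\chi_j\}_{j=1}^{\ell}$ spanning $\mathrm{Ran}\,\pi_r$, defines an orthonormal basis for $\mathrm{Ran}\,P_I(k)$ which satisfies Assumption (A$_2$).

After these remarks recall that the goal of this section is to construct a unitary operator which allows us to map the intraband dynamics from $\mathrm{Ran}\,\Pi^\varepsilon$ to an $\varepsilon$-independent reference space $\mathcal{K} \subset \mathcal{H}_{\rm ref}$. Since all the twisting of $\mathcal{H}_\tau$ has been absorbed in the $\tau$-equivariant basis $\{\psi_j\}_{j=1}^{\ell}$, or equivalently in $u_0$, the space $\mathcal{H}_{\rm ref}$ can be chosen to be a space of periodic vector-valued functions, i.e.
\[ \mathcal{H}_{\rm ref} := L^2_{\tau\equiv1}(\mathbb{R}^d, \mathcal{H}_{\rm f}) \cong L^2(\mathbb{T}^{d*}, \mathcal{H}_{\rm f}) . \]
We introduce the orthogonal projector $\Pi_r := \hat\pi_r \in \mathcal{B}(\mathcal{H}_{\rm ref})$, since the effective intraband dynamics can be described in
\[ \mathcal{K} := \mathrm{Ran}\,\Pi_r \cong L^2_{\tau\equiv1}(\mathbb{R}^d, \mathbb{C}^\ell) \cong L^2(\mathbb{T}^{d*}, \mathbb{C}^\ell) , \]
as will become apparent later on. Recall that $\ell = \dim P_I(k) = \dim \pi_r$.

Proposition 2. Let $\{E_n\}_{n\in I}$ be an isolated family of bands and let Assumptions (A$_1$) and (A$_2$) be satisfied. Then there exists a unitary operator $U^\varepsilon : \mathcal{H}_\tau \to \mathcal{H}_{\rm ref}$ such that
\[ U^\varepsilon\, \Pi^\varepsilon\, U^{\varepsilon *} = \Pi_r \tag{45} \]
and $U^\varepsilon = \hat u + O_0(\varepsilon^\infty)$, where $u \asymp \sum_{j\ge0} \varepsilon^j u_j$ belongs to $S^1(\varepsilon, \mathcal{B}(\mathcal{H}_{\rm f}))$, is right $\tau$-covariant at any order and has principal symbol $u_0$.

Proof. By using the same method as in Lemma 3.3 in [PST1], one first constructs the formal symbol $\sum_{j\ge0} \varepsilon^j u_j$. Since $u_0$ is right $\tau$-covariant, one proves by induction that the same holds true for any $u_j$. Indeed, referring to the notation in [PST1], one has that $u_{n+1} = (a_{n+1} + b_{n+1}) u_0$ with $a_{n+1} = -\tfrac12 A_{n+1}$ and $b_{n+1} = [\pi_r, B_{n+1}]$. From the defining equation
\[ u^{(n)} \,\#\, u^{(n)*} - 1 = \varepsilon^{n+1} A_{n+1} + O(\varepsilon^{n+2}) \]
and the induction hypothesis, it follows that $A_{n+1}$ is a periodic symbol. Then $w^{(n)} := u^{(n)} + \varepsilon^{n+1} a_{n+1} u_0$ is right $\tau$-covariant. Then the defining equation
\[ w^{(n)} \,\#\, \pi \,\#\, w^{(n)*} - \pi_r = \varepsilon^{n+1} B_{n+1} + O(\varepsilon^{n+2}) \]
shows that $B_{n+1}$ is a periodic symbol, and so is $b_{n+1}$. Hence $u_j$ is right $\tau$-covariant, and there exists a semiclassical symbol $u \asymp \sum_j \varepsilon^j u_j$ so that $u \in S^1(\varepsilon, \mathcal{B}(\mathcal{H}_{\rm f}))$.

One notices that right $\tau$-covariance is nothing but a special case of $(\tau_1, \tau_2)$-equivariance, for $\tau_2 \equiv 1$ and $\tau_1 = \tau$. Thus it follows from Theorem 3 that the Weyl quantization of $u$ is a bounded operator $\hat u \in \mathcal{B}(\mathcal{H}_\tau, \mathcal{H}_{\rm ref})$ such that:
(i) $\hat u\, \hat u^* = 1_{\mathcal{H}_{\rm ref}} + O_0(\varepsilon^\infty)$ and $\hat u^* \hat u = 1_{\mathcal{H}_\tau} + O_0(\varepsilon^\infty)$,
(ii) $\hat u\, \Pi^\varepsilon\, \hat u^* = \Pi_r + O_0(\varepsilon^\infty)$.
Finally we modify $\hat u$ as in [PST1] by an $O_0(\varepsilon^\infty)$-term in order to get the unitary operator $U^\varepsilon \in \mathcal{U}(\mathcal{H}_\tau, \mathcal{H}_{\rm ref})$.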
Effective Dynamics for Bloch Electrons
3.3. The effective Hamiltonian. The final step in space-adiabatic perturbation theory is to define the effective Hamiltonian for the intraband dynamics and to compute its lower order terms. This is done, in principle, by projecting the full Hamiltonian $H_Z^\varepsilon$ to the decoupled subspace and afterwards rotating to the reference space.

Proposition 3. Let $\{E_n\}_{n\in I}$ be an isolated family of bands and let Assumptions (A$_1$) and (A$_2$) be satisfied. Let $h$ be a resummation in $S^1_{\tau\equiv1}(\varepsilon, \mathcal{B}(\mathcal{H}_{\rm f}))$ of the formal symbol
\[ h = u \,\#\, \pi \,\#\, H \,\#\, \pi \,\#\, u^* \in M^1_{\tau\equiv1}(\varepsilon, \mathcal{B}(\mathcal{H}_{\rm f})) . \tag{46} \]
Then $\hat h \in \mathcal{B}(\mathcal{H}_{\rm ref})$, $[\hat h, \Pi_r] = 0$ and
\[ \big( e^{-iH_Z^\varepsilon t} - U^{\varepsilon *} e^{-i\hat h t}\, U^\varepsilon \big)\, \Pi^\varepsilon = O_0(\varepsilon^\infty (1 + |t|)) . \tag{47} \]

Remark 3. The definition of the effective Hamiltonian is not entirely unique in the sense that any $H_{\rm eff}$ satisfying (47) would serve as well as an effective Hamiltonian. However, the asymptotic expansion of $H_{\rm eff}$ is unique and therefore it is most convenient to define the effective Hamiltonian through (46). ♦

Proof. In the proof we denote $H_Z^\varepsilon$ as $\hat H$ to emphasize the fact that it is the Weyl quantization of $H \in S^w_\tau(\varepsilon, \mathcal{B}(\mathcal{D}, \mathcal{H}_{\rm f}))$. First note that (46) follows from the following facts: according to Lemma 1 and Proposition 9 we have that
\[ \pi \,\#\, H \,\#\, \pi \in M^{w^2}_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_{\rm f})) = M^1_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_{\rm f})) , \]
where we used that $\tau$ is a unitary representation. With Proposition 2 it follows that $h \in M^1_{\tau\equiv1}(\varepsilon, \mathcal{B}(\mathcal{H}_{\rm f}))$. Therefore $\hat h \in \mathcal{B}(\mathcal{H}_{\rm ref})$ follows from Theorem 3, while $[\hat h, \Pi_r] = 0$ is satisfied by construction. It remains to check (47):
\[ \begin{aligned}
\big( e^{-i\hat H t} - U^{\varepsilon *} e^{-i\hat h t}\, U^\varepsilon \big) \Pi^\varepsilon
&= \big( e^{-i\hat H t} - e^{-i U^{\varepsilon *} \hat h\, U^\varepsilon t} \big)\, \hat\pi + O_0(\varepsilon^\infty) \\
&= \big( e^{-i \hat\pi \hat H \hat\pi\, t} - e^{-i U^{\varepsilon *} \hat h\, U^\varepsilon t} \big)\, \hat\pi + O_0(\varepsilon^\infty) \\
&= O(\varepsilon^\infty (1 + |t|)) ,
\end{aligned} \]
where the last equality follows from the usual Duhamel argument and the fact that the difference of the generators is $O_0(\varepsilon^\infty)$ in the norm of bounded operators by construction.

Since $[\hat h, \Pi_r] = 0$, the effective Hamiltonian will be regarded, without distinctions in notation, either as an element of $\mathcal{B}(\mathcal{H}_{\rm ref})$ or as an element of $\mathcal{B}(\mathcal{K})$. We compute the principal and the subprincipal symbol of $h$ for the special but most relevant case of an isolated eigenvalue, possibly $\ell$-fold degenerate, i.e. $E_n(k) \equiv E(k)$ for every $n \in I$, $|I| = \ell$. Recall that in this special case Assumption (A$_2$) is equivalent to the existence of an orthonormal system of smooth and $\tau$-equivariant Bloch functions corresponding to the eigenvalue $E(k)$. If $\ell = 1$ then Assumption (A$_2$) is always satisfied. The part of $u_0$ intertwining $\pi_0$ and $\pi_r$ is given by Eq. (44), where $\psi_j(k)$ are now Bloch functions, i.e. eigenvectors of $H_{\rm per}(k)$ with eigenvalue $E(k)$.
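Proof of Corollary 1. In the following $h$ is identified with $\pi_r h \pi_r$ and regarded as a $\mathcal{B}(\mathbb{C}^\ell)$-valued symbol. We consider the matrix elements $h(k,r)_{\alpha\beta} := \langle \chi_\alpha, h(k,r)\chi_\beta \rangle$ for $\alpha, \beta \in \{1, \ldots, \ell\}$, where we recall that $\chi_\alpha = u_0(k,r)\psi_\alpha(k - A(r))$. Eq. (21) follows immediately from the fact that $h_0 = u_0 H_0 u_0^*$ and that $\psi_\alpha$ are Bloch functions. As for $h_1$, we use the general formula of [PST1], which reads, transcribed to the present setting and with $\tilde k = k - A(r)$, as
\[ h_{1\,\alpha\beta}(k,r) = -i \big\langle \psi_\alpha(\tilde k), \{E(\tilde k) + \phi(r), \psi_\beta(\tilde k)\} \big\rangle - \frac{i}{2} \big\langle \psi_\alpha(\tilde k), \{(H_{\rm per}(\tilde k) - E(\tilde k)), \psi_\beta(\tilde k)\} \big\rangle . \tag{48} \]
Here $\{A, \varphi\} = \nabla_r A \cdot \nabla_k \varphi - \nabla_k A \cdot \nabla_r \varphi$ are the Poisson brackets for an operator-valued function $A(k,r)$ acting on a vector-valued function $\varphi(k,r)$. We need to evaluate (48). Inserting (44) and performing a straightforward computation, the first term in (48) gives the first term in (22), while the second term contributes to the $\alpha\beta$ matrix element with
\[ \frac{i}{2} \sum_{j,l=1}^{d} \big( \partial_j A_l - \partial_l A_j \big)(r)\, \big\langle \psi_\alpha(\tilde k),\, \partial_l (H_{\rm per} - E)(\tilde k)\, \partial_j \psi_\beta(\tilde k) \big\rangle_{\mathcal{H}_{\rm f}} . \]
The derivative on $(H_{\rm per} - E)$ can be moved to the first argument of the inner product by noticing that
\[ 0 = \nabla \langle \psi_\alpha, (H_{\rm per} - E)\varphi \rangle = \langle \nabla\psi_\alpha, (H_{\rm per} - E)\varphi \rangle + \langle \psi_\alpha, \nabla(H_{\rm per} - E)\varphi \rangle , \]
since $\psi_\alpha$ is in the kernel of $(H_{\rm per} - E)$. Finally the imaginary part of
\[ \frac{i}{2} \sum_{j,l=1}^{d} \big( \partial_j A_l - \partial_l A_j \big)(r)\, \big\langle \partial_l \psi_\alpha(\tilde k),\, (H_{\rm per} - E)(\tilde k)\, \partial_j \psi_\beta(\tilde k) \big\rangle_{\mathcal{H}_{\rm f}} \]
vanishes, as can be seen by direct computation, concluding the proof.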
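As a small numerical illustration of the bracket convention $\{A, \varphi\} = \nabla_r A \cdot \nabla_k \varphi - \nabla_k A \cdot \nabla_r \varphi$ entering (48), the following sketch evaluates the bracket by central differences. It is only a toy check under simplifying assumptions: $d = 1$, and both $A$ and $\varphi$ are scalar-valued (in the text $A$ is operator-valued and $\varphi$ vector-valued). The function names and the two test functions are hypothetical and not taken from the paper.

```python
def poisson_bracket(A, phi, k, r, h=1e-4):
    """{A, phi} = dA/dr * dphi/dk - dA/dk * dphi/dr at (k, r), via central differences."""
    dA_dr = (A(k, r + h) - A(k, r - h)) / (2 * h)
    dA_dk = (A(k + h, r) - A(k - h, r)) / (2 * h)
    dphi_dr = (phi(k, r + h) - phi(k, r - h)) / (2 * h)
    dphi_dk = (phi(k + h, r) - phi(k - h, r)) / (2 * h)
    return dA_dr * dphi_dk - dA_dk * dphi_dr


# Hypothetical test functions (assumptions for illustration only).
def A(k, r):
    return k**2 + r      # dA/dr = 1, dA/dk = 2k


def phi(k, r):
    return k * r         # dphi/dk = r, dphi/dr = k


k0, r0 = 0.5, 2.0
value = poisson_bracket(A, phi, k0, r0)
exact = r0 - 2 * k0**2   # 1 * r - 2k * k, computed by hand from the definition
```

For these quadratic test functions the central differences are exact up to rounding, so `value` and `exact` should agree to high precision.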
4. Semiclassical Dynamics for Bloch Electrons

We now have at our disposal the tools to establish the link between the Schrödinger equation (2) and the corrected semiclassical equations of motion (5). To this end we specialize to the case of a non-degenerate Bloch band $E_n$. The phase space for (5) is $\mathbb{R}^d \times \mathbb{R}^d$, since we use the extended zone scheme, and we denote by $\Phi^t_\varepsilon$ the corresponding solution flow. Since the effective Hamiltonian is written in canonical variables, it is necessary to switch in (5) to $(r, k)$ with $k = \kappa + A(r)$. In the new coordinates the solution flow is denoted by $\overline{\Phi}{}^t_\varepsilon$ and
\[ \overline{\Phi}{}^t_\varepsilon(r, k) = \Big( \Phi^t_{\varepsilon\, r}\big(r, k - A(r)\big),\; \Phi^t_{\varepsilon\, \kappa}\big(r, k - A(r)\big) + A(r) \Big) . \]
Let us consider any admissible semiclassical observable $\hat a = a(\varepsilon x, -i\nabla_x)$ acting on the "physical" Hilbert space $L^2(\mathbb{R}^d, dx)$. Its symbol is transported by $\overline{\Phi}{}^t_\varepsilon$ to $a \circ \overline{\Phi}{}^t_\varepsilon$ with Weyl quantization $\widehat{a \circ \overline{\Phi}{}^t_\varepsilon}$. On the other hand the operator $\hat a$ is transported by the Heisenberg equation as $e^{iH^\varepsilon t/\varepsilon}\, \hat a\, e^{-iH^\varepsilon t/\varepsilon}$. Our assertion is that on the subspace $\Pi^\varepsilon_n L^2(\mathbb{R}^d) := \mathcal{U}^*\, \Pi^\varepsilon \mathcal{H}_\tau$, with $\Pi^\varepsilon$ and $U^\varepsilon$ as constructed in the previous section, these two operators are uniformly close to order $\varepsilon^2$.
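Theorem 2. Let $E_n$ be an isolated, non-degenerate Bloch band, see Definition 1, and let the potentials satisfy Assumption (A$_1$). Let $a \in C_b^\infty(\mathbb{R}^{2d})$ be $\Gamma^*$-periodic in the second argument, i.e. $a(r, k + \gamma^*) = a(r, k)$ for all $\gamma^* \in \Gamma^*$, and $\hat a = a(\varepsilon x, -i\nabla_x)$ be its Weyl quantization. Then for each finite time-interval $I \subset \mathbb{R}$ there is a constant $C < \infty$ such that for $t \in I$,
\[ \Big\| \Pi^\varepsilon_n \Big( e^{iH^\varepsilon t/\varepsilon}\, \hat a\, e^{-iH^\varepsilon t/\varepsilon} - \widehat{a \circ \overline{\Phi}{}^t_\varepsilon} \Big) \Pi^\varepsilon_n \Big\|_{\mathcal{B}(L^2(\mathbb{R}^d))} \le \varepsilon^2 C . \]
In particular, for $\psi_0 \in \Pi^\varepsilon_n \mathcal{H}$ we have that
\[ \Big| \big\langle \psi_0, e^{iH^\varepsilon t/\varepsilon}\, \hat a\, e^{-iH^\varepsilon t/\varepsilon} \psi_0 \big\rangle - \big\langle \psi_0, \widehat{a \circ \overline{\Phi}{}^t_\varepsilon}\, \psi_0 \big\rangle \Big| \le \varepsilon^2 C\, \|\psi_0\|^2 . \tag{49} \]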
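Theorem 2 is an Egorov-type theorem, see [Ro]. An unconventional feature is that the first order corrections are treated by considering an $\varepsilon$-dependent Hamiltonian flow instead of having a separate dynamics for the subprincipal symbol of an observable. By exploiting the relation between Weyl-quantized operators and Wigner transforms, one can easily translate (49) to the language of Wigner functions. For a detailed discussion on how Theorem 2 relates to alternative approaches to the semiclassical limit in perturbed periodic potentials we refer the reader to [Te2]. To prove Theorem 2, our strategy is to first establish a corresponding Egorov theorem in the reference space and then to pull back to $L^2(\mathbb{R}^d, dx)$.

Proposition 4. Let $E$ be an isolated non-degenerate Bloch band and let $\hat h$ be the effective Hamiltonian constructed in Theorem 1, which acts on the reference space $\mathcal{K} = L^2_{\tau\equiv1}(\mathbb{R}^d)$ of $\Gamma^*$-periodic $L^2_{\rm loc}$-functions. Let $\widetilde{\Phi}{}^t : \mathbb{R}^{2d} \to \mathbb{R}^{2d}$ be the Hamiltonian flow generated by the Hamiltonian function
\[ h_{\rm cl}(k, r) = h_0(k, r) + \varepsilon h_1(k, r) . \]
Then for any semiclassical observable $\hat a = a_0(k, i\varepsilon\nabla_k) + \varepsilon a_1(k, i\varepsilon\nabla_k)$ with $a \in S^1(\varepsilon, \mathbb{C})$ we have that
\[ \Big\| e^{i\hat h t/\varepsilon}\, \hat a\, e^{-i\hat h t/\varepsilon} - \widehat{a \circ \widetilde{\Phi}{}^t} \Big\| \le C_T\, \varepsilon^2 \tag{50} \]
uniformly for any finite interval in time $[-T, T]$.

Proof. Since the Hamiltonian function is bounded with bounded derivatives, it follows immediately that $a \circ \widetilde{\Phi}{}^t \in S^1(\varepsilon)$ and that $\frac{d}{dt}(a \circ \widetilde{\Phi}{}^t) \in S^1(\varepsilon)$. Therefore the proof is just the standard computation
\[ \begin{aligned}
e^{i\hat h t/\varepsilon}\, \hat a\, e^{-i\hat h t/\varepsilon} - \widehat{a \circ \widetilde{\Phi}{}^t}
&= \int_0^t dt'\, \frac{d}{dt'} \Big( e^{i\hat h t'/\varepsilon}\, \widehat{a \circ \widetilde{\Phi}{}^{t-t'}}\, e^{-i\hat h t'/\varepsilon} \Big) \\
&= \int_0^t dt'\, e^{i\hat h t'/\varepsilon} \Big( \frac{i}{\varepsilon} \big[ \hat h, \widehat{a \circ \widetilde{\Phi}{}^{t-t'}} \big] - \widehat{\tfrac{d}{dt}\big(a \circ \widetilde{\Phi}{}^{t-t'}\big)} \Big) e^{-i\hat h t'/\varepsilon} ,
\end{aligned} \]
together with the fact that the integrand is $O(\varepsilon^2)$ in the norm of bounded operators, since by construction
\[ \frac{d}{dt}\big(a \circ \widetilde{\Phi}{}^{t-t'}\big) = \big\{ h_{\rm cl},\, a \circ \widetilde{\Phi}{}^{t-t'} \big\} \]
and, computing the expansion of the Moyal product,
\[ \frac{i}{\varepsilon} \big[ \hat h, \widehat{a \circ \widetilde{\Phi}{}^{t-t'}} \big] = \widehat{\big\{ h_{\rm cl},\, a \circ \widetilde{\Phi}{}^{t-t'} \big\}} + O(\varepsilon^2) . \]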
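In order to obtain the Egorov theorem for the physical observables, we need to undo the transform to the reference space and the Zak transform. We start with the simpler observation on how the Zak transform maps semiclassical observables.

Proposition 5. Let $a \in S^1(\varepsilon, \mathbb{C})$ be $\Gamma^*$-periodic, i.e. $a(r, k + \gamma^*) = a(r, k)$ for all $\gamma^* \in \Gamma^*$. Let $b(k, r) = a(r, k)$, then $b \in S^1_\tau(\varepsilon, \mathbb{C})$ and
\[ \hat a = \mathcal{U}^*\, \hat b\, \mathcal{U} , \]
where the Weyl quantization is in the sense of $\hat a = a(\varepsilon x, -i\nabla_x)$ acting on $L^2(\mathbb{R}^d)$ and $\hat b = b(k, \varepsilon i\nabla_k)$ acting on $\mathcal{H}_\tau$.

Remark 4. An analogous statement cannot be true for general operator-valued $\tau$-equivariant symbols. For example, the symbol $b(k, r) := H_{\rm per}(k - A(r))$ is $\tau$-equivariant and in particular a semiclassical observable. However, the corresponding operator in the original representation is
\[ \mathcal{U}^*\, \hat b\, \mathcal{U} = \frac12 \big( -i\nabla_x - A(\varepsilon x) \big)^2 + V_\Gamma(x) , \]
which cannot be written as an $\varepsilon$-pseudodifferential operator with scalar symbol. ♦

Proof. We give the proof for $a(\cdot, k) \in \mathcal{S}(\mathbb{R}^d)$. The general result follows from standard density arguments, see [DiSj]. For $\psi \in \mathcal{S}(\mathbb{R}^d)$ we have according to (64) the explicit formula
\[ \big( a(\varepsilon x, -i\nabla_x)\psi \big)(x) = \frac{1}{(2\pi)^{d/2}} \sum_{\gamma\in\Gamma} \int_{\mathbb{R}^d} d\eta\, (\mathcal{F}a)(\eta, \gamma)\, e^{i\varepsilon(\eta\cdot\gamma)/2}\, e^{i\varepsilon\eta\cdot x}\, \psi(x + \gamma) . \tag{51} \]
On the other hand, for $(\mathcal{U}\psi)(k, r) =: \varphi(k, r)$ by definition it holds that
\[ \big( b(k, i\varepsilon\nabla_k)\varphi \big)(k, r) = \sum_{\gamma\in\Gamma} \int_{\mathbb{R}^d} d\eta\, (\mathcal{F}b)(\gamma, \eta)\, e^{-i\varepsilon(\eta\cdot\gamma)/2}\, e^{i\gamma\cdot k}\, \varphi(k - \varepsilon\eta, r) . \tag{52} \]
The assumptions on $a$ and $\psi$ guarantee that all the integrals and sums in the following expressions are absolutely convergent and thus that interchanges in the order of integration are justified by Fubini's theorem. We compute the inverse Zak transform of (52) using (11),
\[ \begin{aligned}
\big( \mathcal{U}^{-1}\, \hat b\, \varphi \big)(x)
&= \sum_{\gamma\in\Gamma} \int_{B} dk \int_{\mathbb{R}^d} d\eta\, (\mathcal{F}b)(\gamma, \eta)\, e^{ik\cdot x}\, e^{-i\varepsilon(\eta\cdot\gamma)/2}\, e^{i\gamma\cdot k}\, \varphi(k - \varepsilon\eta, [x]) \\
&= \sum_{\gamma\in\Gamma} \int_{\mathbb{R}^d} d\eta\, (\mathcal{F}b)(\gamma, \eta)\, e^{i\varepsilon(\eta\cdot\gamma)/2}\, e^{i\varepsilon\eta\cdot x} \int_{M^*} dk\, e^{i(k - \varepsilon\eta)\cdot(x + \gamma)}\, \varphi(k - \varepsilon\eta, [x]) .
\end{aligned} \tag{53} \]
The $\tau$-equivariance of $\varphi$ implies that the function $f(k, y) := e^{ik\cdot y}\varphi(k, [y])$ is exactly periodic in the first variable. Then the integral in $dk$ can be shifted by an arbitrary amount, so that
\[ \int_{M^*} dk\, e^{i(k - \varepsilon\eta)\cdot(x + \gamma)}\, \varphi(k - \varepsilon\eta, [x]) = \int_{M^*} dk\, e^{ik\cdot(x + \gamma)}\, \varphi(k, [x + \gamma]) = \psi(x + \gamma) . \]
Inserting this expression in the last line of (53) and comparing with (51) concludes the proof.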
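Before we arrive at the proof of Theorem 2, one has to study how the unitary map constructed in Sect. 3.2 maps observables in the Zak representation to observables in the reference representation.

Proposition 6. Let $\hat b = b_0(k, \varepsilon i\nabla_k) + \varepsilon\, b_1(k, \varepsilon i\nabla_k)$ with symbol $b \in S^1(\varepsilon, \mathbb{C})$ which is $\Gamma^*$-periodic in the first argument. Let $U^\varepsilon : \Pi^\varepsilon \mathcal{H}_\tau \to \mathcal{K}$ be the unitary map constructed in Sect. 3.2. Then
\[ U^\varepsilon\, \Pi^\varepsilon\, \hat b\, \Pi^\varepsilon\, U^{\varepsilon *} = \hat c + O(\varepsilon^2) , \]
where $c(\varepsilon, k, r) = b \circ T(k, r)$ with
\[ T : \mathbb{R}^{2d} \to \mathbb{R}^{2d} , \quad (k, r) \mapsto \Big( k + \varepsilon\, \mathcal{A}_m\big(k - A(r)\big)\, \nabla A_m(r),\; r + \varepsilon\, \mathcal{A}\big(k - A(r)\big) \Big) . \]
Here and in the following, summation over indices appearing twice is implicit.

Proof. In order to compute $c = u \,\#\, \pi \,\#\, b \,\#\, \pi \,\#\, u^*$, observe that, since $b$ is scalar-valued, the principal symbol remains unchanged, i.e. $c_0 = u_0 \pi_0 b_0 \pi_0 u_0^* = b_0$. For the subprincipal symbol we use the general transformation formula (48) obtained for the Hamiltonian, which applies to all operators whose principal symbol commutes with $\pi_0$. In this case the eigenvalue $E$ in (48) must be replaced by the corresponding principal symbol and a term for the subprincipal symbol $b_1$ must be added. Hence we find that
\[ \begin{aligned}
c_1(k, r) &= -i \big\langle \psi(k - A(r)), \{ b_0(k, r), \psi(k - A(r)) \} \big\rangle + \big\langle \psi(k - A(r)), b_1(k, r)\, \psi(k - A(r)) \big\rangle \\
&= \partial_{k_n} b_0(k, r)\; i \big\langle \psi(k - A(r)), \partial_m \psi(k - A(r)) \big\rangle\, \partial_n A_m(r) + \partial_{r_n} b_0(k, r)\; i \big\langle \psi(k - A(r)), \partial_n \psi(k - A(r)) \big\rangle + b_1(k, r) \\
&= \partial_{k_n} b_0(k, r)\, \mathcal{A}_m(k - A(r))\, \partial_n A_m(r) + \partial_{r_n} b_0(k, r)\, \mathcal{A}_n(k - A(r)) + b_1(k, r) ,
\end{aligned} \]
where summation over indices appearing twice is implicit. Now a comparison with the Taylor expansion of $b \circ T(k, r)$ in powers of $\varepsilon$ proves the claim.

We now have all the ingredients needed for the proof of Theorem 2.

Proof of Theorem 2. Let $a \in C_b^\infty(\mathbb{R}^{2d})$ be $\Gamma^*$-periodic in the second argument, then according to Proposition 5 we have
\[ \Pi^\varepsilon_n\, e^{iH^\varepsilon t/\varepsilon}\, \hat a\, e^{-iH^\varepsilon t/\varepsilon}\, \Pi^\varepsilon_n = \mathcal{U}^*\, \Pi^\varepsilon\, e^{iH_Z^\varepsilon t/\varepsilon}\, \hat b\, e^{-iH_Z^\varepsilon t/\varepsilon}\, \Pi^\varepsilon\, \mathcal{U} \tag{54} \]
with $b(k, r) = a(r, k)$. With Theorem 1 and Proposition 6 we find that
\[ \Pi^\varepsilon\, e^{iH_Z^\varepsilon t/\varepsilon}\, \hat b\, e^{-iH_Z^\varepsilon t/\varepsilon}\, \Pi^\varepsilon = U^{\varepsilon *}\, e^{i\hat h t/\varepsilon}\, \hat c\, e^{-i\hat h t/\varepsilon}\, U^\varepsilon + O(\varepsilon^2) , \tag{55} \]
where $c(\varepsilon, k, r) = b \circ T(k, r)$. Now we can apply Proposition 4 to conclude that
\[ e^{i\hat h t/\varepsilon}\, \hat c\, e^{-i\hat h t/\varepsilon} = \widehat{c \circ \widetilde{\Phi}{}^t} + O(\varepsilon^2) . \]
Since, for $\varepsilon$ sufficiently small, $T$ is a diffeomorphism, one can write
\[ c \circ \widetilde{\Phi}{}^t = c \circ T^{-1} \circ T \circ \widetilde{\Phi}{}^t \circ T^{-1} \circ T =: c \circ T^{-1} \circ \overline{\Phi}{}^t_\varepsilon \circ T = b \circ \overline{\Phi}{}^t_\varepsilon \circ T , \]
where the flow $\overline{\Phi}{}^t_\varepsilon$ in the new coordinates will be computed explicitly below. Inserting the results into (55), one obtains
\[ \begin{aligned}
\Pi^\varepsilon\, e^{iH_Z^\varepsilon t/\varepsilon}\, \hat b\, e^{-iH_Z^\varepsilon t/\varepsilon}\, \Pi^\varepsilon
&= U^{\varepsilon *}\, \widehat{b \circ \overline{\Phi}{}^t_\varepsilon \circ T}\, U^\varepsilon + O(\varepsilon^2) \\
&= \Pi^\varepsilon\, \widehat{b \circ \overline{\Phi}{}^t_\varepsilon}\, \Pi^\varepsilon + O(\varepsilon^2) ,
\end{aligned} \]
where we used Proposition 6 for the second equality. Inserting into (54) we finally find that
\[ \Pi^\varepsilon_n\, e^{iH^\varepsilon t/\varepsilon}\, \hat a\, e^{-iH^\varepsilon t/\varepsilon}\, \Pi^\varepsilon_n = \Pi^\varepsilon_n\, \widehat{a \circ \overline{\Phi}{}^t_\varepsilon}\, \Pi^\varepsilon_n + O(\varepsilon^2) , \tag{56} \]
where we did not make the exchange of the order of the arguments in $a$ explicit. Since the flow is determined only in approximation and only through its vector field, we make use of the following lemma.

Lemma 2. Let $\Phi^t_i : \mathbb{R}^{2d} \times \mathbb{R} \to \mathbb{R}^{2d}$ be the flow associated with the vector field $v_i \in C_b^\infty(\mathbb{R}^{2d}, \mathbb{R}^{2d})$, $i = 1, 2$.
(i) If for all $\alpha \in \mathbb{N}^{2d}$ there is a $c_\alpha < \infty$ such that
\[ \sup_{x\in\mathbb{R}^{2d}} | \partial^\alpha (v_1 - v_2)(x) | \le c_\alpha\, \varepsilon^2 , \]
then for each bounded interval $I \subset \mathbb{R}$ there are constants $C_{I,\alpha} < \infty$ such that
\[ \sup_{t\in I,\, x\in\mathbb{R}^{2d}} | \partial^\alpha (\Phi^t_1 - \Phi^t_2)(x) | \le C_{I,\alpha}\, \varepsilon^2 . \tag{57} \]
(ii) Let $a \in S^1(\varepsilon, \mathbb{C})$. If (57) holds for the flows $\Phi_1$, $\Phi_2$, then there is a constant $C < \infty$ such that for all $t \in I$,
\[ \big\| \widehat{a \circ \Phi^t_1} - \widehat{a \circ \Phi^t_2} \big\|_{\mathcal{B}(L^2(\mathbb{R}^d))} \le C\, \varepsilon^2 . \]

Proof. Assertion (i) is a simple application of Gronwall's lemma. Assertion (ii) follows from the fact that the norm of the quantization of a symbol in $S^1$ is bounded by a constant times the sup-norm of finitely many derivatives of the symbol, which are $O(\varepsilon^2)$ according to (57).

According to assertion (ii) of the lemma it suffices to show that
\[ \overline{\Phi}{}^t_\varepsilon(r, k) = \Big( \Phi^t_{\varepsilon\, r}(r, k - A(r)),\; \Phi^t_{\varepsilon\, \kappa}(r, k - A(r)) + A(r) \Big) + O(\varepsilon^2) \]
in the above sense, where $\Phi^t_\varepsilon$ is the flow of (5). And from assertion (i) we infer that it suffices to prove the analogous properties on the level of the vector fields. Through a subsequent change of coordinates we aim at computing the vector field of $\overline{\Phi}{}^t_\varepsilon$ up to an error of order $O(\varepsilon^2)$. We start with the vector field of $\widetilde{\Phi}{}^t$. The effective Hamiltonian on the reference space including first order terms reads
\[ h(r, k) = E(k - A(r)) + \phi(r) - \varepsilon \Big( F_{\rm Lor}\big(r, \nabla E(k - A(r))\big) \cdot \mathcal{A}(k - A(r)) + B(r) \cdot \mathcal{M}(k - A(r)) \Big) , \tag{58} \]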
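with the Lorentz force
\[ F_{\rm Lor}\big(r, \nabla E(k - A(r))\big) = -\nabla\phi(r) + \nabla E(k - A(r)) \times B(r) . \]
Componentwise, the canonical equations of motion are
\[ \begin{aligned}
\dot r_j &= \partial_{k_j} h(r, k) = \partial_{k_j} E(k - A(r)) - \varepsilon\, \partial_{k_j} \Big( F_{\rm Lor}(r, k - A(r)) \cdot \mathcal{A}(k - A(r)) + B(r) \cdot \mathcal{M}(k - A(r)) \Big) , \\
\dot k_j &= -\partial_{r_j} h(r, k) = -\partial_j \phi(r) + \partial_l E(k - A(r))\, \partial_j A_l(r) \\
&\quad - \varepsilon\, \partial_{k_l} \Big( \mathcal{A}(k - A(r)) \cdot F_{\rm Lor}(r, k - A(r)) + B(r) \cdot \mathcal{M}(k - A(r)) \Big)\, \partial_j A_l(r) \\
&\quad - \varepsilon\, \mathcal{A}_l(k - A(r)) \Big( \partial_j \partial_l \phi(r) - \big( \nabla E(k - A(r)) \times \partial_j B(r) \big)_l \Big) + \varepsilon\, \partial_j B(r) \cdot \mathcal{M}(k - A(r)) ,
\end{aligned} \]
with the convention to sum over repeated indices. Substituting $\tilde k = k - A(r)$ one obtains
\[ \dot r_j = \partial_j E(\tilde k) - \varepsilon\, \partial_{\tilde k_j} \Big( F_{\rm Lor}(r, \tilde k) \cdot \mathcal{A}(\tilde k) + B(r) \cdot \mathcal{M}(\tilde k) \Big) \]
and
\[ \begin{aligned}
\dot{\tilde k}_j &= \dot k_j - \partial_l A_j(r)\, \dot r_l \\
&= -\partial_j \phi(r) + \partial_l E(\tilde k)\, \partial_j A_l(r) - \varepsilon\, \partial_{k_l} \Big( \mathcal{A}(\tilde k) \cdot F_{\rm Lor}(r, \tilde k) + \mathcal{M}(\tilde k) \cdot B(r) \Big)\, \partial_j A_l(r) \\
&\quad + \varepsilon\, \mathcal{A}_l(\tilde k)\, \partial_{r_j} F_{{\rm Lor}\, l}(r, \tilde k) + \varepsilon\, \partial_j B(r) \cdot \mathcal{M}(\tilde k) - \partial_l A_j(r)\, \dot r_l \\
&= -\partial_j \phi(r) + \dot r_l \big( \partial_j A_l(r) - \partial_l A_j(r) \big) + \varepsilon\, \mathcal{A}_l(\tilde k)\, \partial_{r_j} F_{{\rm Lor}\, l}(r, \tilde k) + \varepsilon\, \partial_j B(r) \cdot \mathcal{M}(\tilde k) \\
&= -\partial_j \phi(r) + \big( \dot r \times B(r) \big)_j + \varepsilon\, \mathcal{A}_l(\tilde k)\, \partial_{r_j} F_{{\rm Lor}\, l}(r, \tilde k) + \varepsilon\, \partial_j B(r) \cdot \mathcal{M}(\tilde k) ,
\end{aligned} \]
which, in more compact form, read
\[ \begin{aligned}
\dot r &= \nabla E(\tilde k) - \varepsilon\, \nabla_{\tilde k} \Big( \mathcal{A}(\tilde k) \cdot F_{\rm Lor}(r, \tilde k) + B(r) \cdot \mathcal{M}(\tilde k) \Big) , \\
\dot{\tilde k} &= -\nabla\phi(r) + \dot r \times B(r) + \varepsilon\, \nabla_r \Big( \mathcal{A}(\tilde k) \cdot F_{\rm Lor}(r, \tilde k) + B(r) \cdot \mathcal{M}(\tilde k) \Big) .
\end{aligned} \tag{59} \]
As the next step we perform the change of coordinates induced by $T$,
\[ q = r + \varepsilon\, \mathcal{A}(\tilde k) , \qquad p = \tilde k + A(r) + \varepsilon\, \mathcal{A}_l(\tilde k)\, \nabla A_l(r) , \tag{60} \]
and then switch to the kinetic momentum
\[ \begin{aligned}
v = p - A(q) &= \tilde k + \varepsilon\, \mathcal{A}_l(\tilde k)\, \nabla A_l(r) - \varepsilon\, \mathcal{A}_l(\tilde k)\, \partial_l A(r) + O(\varepsilon^2) \\
&= \tilde k + \varepsilon\, \mathcal{A}(\tilde k) \times B(r) + O(\varepsilon^2) ,
\end{aligned} \tag{61} \]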
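where we used Taylor expansion. The inverse transformations are
\[ r = q - \varepsilon\, \mathcal{A}(v) + O(\varepsilon^2) , \qquad \tilde k = v - \varepsilon\, \mathcal{A}(v) \times B(q) + O(\varepsilon^2) . \]
Recall that we want to show that $(q, v)$ satisfy the semiclassical equations of motion (5), where $q$ is identified with $r$ and $v$ with $\kappa$. The new notation is introduced here only to make a clear distinction between the canonical variables $(r, k)$ in the reference representation and the canonical variables $(q, p)$ in the original representation. We now substitute (60) and (61). In the following computations we use several times the Taylor expansion to first order and drop terms of order $\varepsilon^2$. In particular, in the terms of order $\varepsilon$ one can replace $r$ by $q$ and $\tilde k$ by $v$. We find
\[ \begin{aligned}
\dot q_j &= \dot r_j + \varepsilon\, \dot{\mathcal{A}}_j(v) \\
&= \partial_j E(v) - \varepsilon \big( \mathcal{A}(v) \times B(q) \big)_l\, \partial_l \partial_j E(v) - \varepsilon\, \partial_{v_j} \Big( \big( -\nabla\phi(q) + \nabla E(v) \times B(q) \big)_l\, \mathcal{A}_l(v) + B(q) \cdot \mathcal{M}(v) \Big) + \varepsilon\, \partial_l \mathcal{A}_j\, \dot v_l \\
&= \partial_j E(v) - \varepsilon\, \dot v_l \big( \partial_j \mathcal{A}_l - \partial_l \mathcal{A}_j \big) - \varepsilon\, B(q) \cdot \partial_j \mathcal{M}(v) \\
&= \partial_j E(v) - \varepsilon \big( \dot v \times \Omega(v) \big)_j - \varepsilon\, B(q) \cdot \partial_j \mathcal{M}(v) ,
\end{aligned} \]
where it is used that $\dot v = F_{\rm Lor} + O(\varepsilon)$. Thus we obtained the first equation of (5). For the second equation we find
\[ \begin{aligned}
\dot v_j &= \dot{\tilde k}_j + \varepsilon\, \frac{d}{dt} \big( \mathcal{A}(v) \times B(q) \big)_j \\
&= -\partial_j \phi(q) + \varepsilon\, \mathcal{A}_l(v)\, \partial_l \partial_j \phi(q) + \big( \dot q \times B(q) \big)_j - \varepsilon \big( \dot{\mathcal{A}}(v) \times B(q) \big)_j - \varepsilon \big( \dot q \times \mathcal{A}_l(v)\, \partial_l B(q) \big)_j \\
&\quad + \varepsilon\, \mathcal{A}_l(v)\, \partial_{q_j} F_{{\rm Lor}\, l}(q, v) + \varepsilon\, \partial_j B(q) \cdot \mathcal{M}(v) + \varepsilon \big( \dot{\mathcal{A}}(v) \times B(q) \big)_j + \varepsilon \big( \mathcal{A}(v) \times \dot q_l\, \partial_l B(q) \big)_j \\
&= -\partial_j \phi(q) + \big( \dot q \times B(q) \big)_j + \varepsilon\, \partial_j B(q) \cdot \mathcal{M}(v) ,
\end{aligned} \]
where the term
\[ \varepsilon\, \mathcal{A}_l(v) \Big( \partial_{q_j} F_{{\rm Lor}\, l}(q, v) + \partial_l \partial_j \phi(q) \Big) = \varepsilon\, \mathcal{A}_l(v) \big( \dot q \times \partial_j B(q) \big)_l + O(\varepsilon^2) \]
cancels the remaining two terms. Changing back notation from $(q, v)$ to $(r, \kappa)$, this concludes the proof of Theorem 2.

A. Operator-Valued Weyl Calculus for $\tau$-Equivariant Symbols

The pseudodifferential calculus for scalar-valued symbols defined on the phase space $T^*\mathbb{R}^d = \mathbb{R}^{2d}$ can be translated to the phase space $T^*\mathbb{T}^d = \mathbb{T}^d \times \mathbb{R}^d$, $\mathbb{T}^d$ a flat torus, by restricting to periodic functions and symbols. This approach is used by Gérard and Nier [GeNi] in the context of scattering theory in periodic media.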
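In this appendix we present a similar approach to Weyl quantization of operator-valued symbols which are not exactly periodic, but $\tau$-equivariant with respect to some nontrivial representation $\tau$ of the group of lattice translations. We obtain a pseudodifferential and semiclassical calculus which can be applied to $\tau$-equivariant symbols like the Schrödinger Hamiltonian with periodic potential in the Zak representation. In particular, the full computational power of the usual Weyl calculus is retained. The strategy is to use the strong results available for the phase space $\mathbb{R}^{2d}$ by restricting to functions which are $\tau$-equivariant in the configurational variable.

Let $\Gamma \subset \mathbb{R}^d$ be a regular lattice generated through the basis $\{\gamma_1, \ldots, \gamma_d\}$, $\gamma_j \in \mathbb{R}^d$, i.e.
\[ \Gamma = \Big\{ x \in \mathbb{R}^d : x = \sum_{j=1}^{d} \alpha_j \gamma_j \text{ for some } \alpha \in \mathbb{Z}^d \Big\} . \]
Clearly the translations on $\mathbb{R}^d$ by elements of $\Gamma$ form an abelian group isomorphic to $\mathbb{Z}^d$. The centered fundamental cell of $\Gamma$ is denoted as
\[ M = \Big\{ x \in \mathbb{R}^d : x = \sum_{j=1}^{d} \alpha_j \gamma_j \text{ for } \alpha_j \in [-\tfrac12, \tfrac12] \Big\} . \]
Let $\mathcal{H}$ be a separable Hilbert space and let $\tau$ be a representation of $\Gamma$ in $\mathcal{B}^*(\mathcal{H})$, the group of invertible elements of $\mathcal{B}(\mathcal{H})$, i.e. a group homomorphism
\[ \tau : \Gamma \to \mathcal{B}^*(\mathcal{H}) , \quad \gamma \mapsto \tau(\gamma) . \]
If more than one Hilbert space appears, then $\tau$ denotes a collection of such representations, i.e. one on each Hilbert space.

Warning. In the application of the results of this appendix to Bloch electrons the lattice $\Gamma$ corresponds to the dual lattice $\Gamma^*$ in momentum space $\mathbb{R}^d$.

Let $L_\gamma$ be the operator of translation by $\gamma \in \Gamma$ on $\mathcal{S}(\mathbb{R}^d, \mathcal{H})$, i.e. $(L_\gamma \varphi)(x) = \varphi(x - \gamma)$, and extend it by duality to distributions, i.e. for $T \in \mathcal{S}'(\mathbb{R}^d, \mathcal{H})$ let $(L_\gamma T)(\varphi) = T(L_{-\gamma}\varphi)$.

Definition 2. A tempered distribution $T \in \mathcal{S}'(\mathbb{R}^d, \mathcal{H})$ is said to be $\tau$-equivariant if
\[ L_\gamma T = \tau(\gamma) T \quad \text{for all } \gamma \in \Gamma , \]
where $(\tau(\gamma)T)(\varphi) = T\big(\tau(\gamma)^{-1}\varphi\big)$ for $\varphi \in \mathcal{S}(\mathbb{R}^d, \mathcal{H})$. The subspace of $\tau$-equivariant distributions is denoted as $\mathcal{S}'_\tau$. Analogously we define
\[ \mathcal{H}_\tau = \big\{ \psi \in L^2_{\rm loc}(\mathbb{R}^d, \mathcal{H}) : \psi(x - \gamma) = \tau(\gamma)\, \psi(x) \text{ for all } \gamma \in \Gamma \big\} , \]
which, equipped with the inner product
\[ \langle \varphi, \psi \rangle_{\mathcal{H}_\tau} = \int_M dx\, \langle \varphi(x), \psi(x) \rangle_{\mathcal{H}} , \]
is a Hilbert space. Clearly
\[ C^\infty_\tau = \big\{ \psi \in C^\infty(\mathbb{R}^d, \mathcal{H}) : \psi(x - \gamma) = \tau(\gamma)\, \psi(x) \text{ for all } \gamma \in \Gamma \big\} \]
is a dense subspace of $\mathcal{H}_\tau$.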
♦
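To make the equivariance condition $\psi(x - \gamma) = \tau(\gamma)\psi(x)$ concrete, here is a minimal sketch in the simplest setting one can think of. All choices are assumptions made for illustration and are not taken from the paper: $d = 1$, lattice $\Gamma = \mathbb{Z}$, fibre $\mathcal{H} = \mathbb{C}$, and the unitary character $\tau(\gamma) = e^{i\theta\gamma}$ with a hypothetical parameter `THETA`. Any function of the form $e^{-i\theta x} g(x)$ with $g$ periodic is then $\tau$-equivariant, and the pointwise pairing of two such functions is periodic.

```python
import cmath
import math

THETA = 0.7  # hypothetical twist parameter; tau(gamma) = exp(i * THETA * gamma)


def tau(gamma):
    """Unitary character of the lattice Z acting on the fibre C."""
    return cmath.exp(1j * THETA * gamma)


def make_equivariant(g):
    """Turn a 1-periodic function g into a tau-equivariant one: psi(x) = exp(-i*THETA*x) g(x)."""
    return lambda x: cmath.exp(-1j * THETA * x) * g(x)


phi = make_equivariant(lambda x: math.cos(2 * math.pi * x))
psi = make_equivariant(lambda x: 1.0 + 0.5 * math.sin(2 * math.pi * x))


def equivariance_defect(f, x, gamma):
    """|f(x - gamma) - tau(gamma) f(x)|, which vanishes for tau-equivariant f."""
    return abs(f(x - gamma) - tau(gamma) * f(x))


def pairing(x):
    """Pointwise inner product <phi(x), psi(x)> in the fibre C."""
    return phi(x).conjugate() * psi(x)


defect = equivariance_defect(psi, 0.3, 2)
periodicity_defect = abs(pairing(0.3 - 2) - pairing(0.3))
```

Both `defect` and `periodicity_defect` are zero up to rounding; the second relies on $|\tau(\gamma)| = 1$, i.e. on unitarity of the representation.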
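Notice that if $\tau$ is a unitary representation, then for any $\varphi, \psi \in \mathcal{H}_\tau$ the map $x \mapsto \langle \varphi(x), \psi(x) \rangle_{\mathcal{H}}$ is periodic, since
\[ \langle \varphi(x - \gamma), \psi(x - \gamma) \rangle_{\mathcal{H}} = \langle \tau(\gamma)\varphi(x), \tau(\gamma)\psi(x) \rangle_{\mathcal{H}} = \langle \varphi(x), \psi(x) \rangle_{\mathcal{H}} . \]
Now that we have $\tau$-equivariant functions, we define $\tau$-equivariant symbols. To this end we first recall the definition of the standard symbol classes.

Definition 3. A function $w : \mathbb{R}^{2d} \to [0, +\infty)$ is said to be an order function, if there exist constants $C_0 > 0$ and $N_0 > 0$ such that
\[ w(x) \le C_0\, \langle x - y \rangle^{N_0}\, w(y) \]
for every $x, y \in \mathbb{R}^{2d}$.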
♦
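As a sanity check of Definition 3, the sketch below tests numerically that a weight of the form $w(x) = \langle x \rangle^{-n}$, with $\langle x \rangle = (1 + |x|^2)^{1/2}$, satisfies the order-function inequality. The constants $C_0 = 2^{n/2}$ and $N_0 = n$ are the ones suggested by Peetre's inequality; this choice, the dimension, the sampling range and all function names are assumptions made for the illustration, not statements from the paper.

```python
import math
import random


def japanese_bracket(x):
    """<x> = sqrt(1 + |x|^2) for a point x given as a list of coordinates."""
    return math.sqrt(1.0 + sum(c * c for c in x))


def w(x, n=2.0):
    """Candidate order function w(x) = <x>^(-n)."""
    return japanese_bracket(x) ** (-n)


def worst_ratio(n=2.0, dim=2, samples=10000, seed=0):
    """Largest observed value of w(x) / (C0 <x - y>^N0 w(y)) with C0 = 2^(n/2), N0 = n."""
    rng = random.Random(seed)
    c0 = 2.0 ** (n / 2.0)
    worst = 0.0
    for _ in range(samples):
        x = [rng.uniform(-50.0, 50.0) for _ in range(dim)]
        y = [rng.uniform(-50.0, 50.0) for _ in range(dim)]
        diff = [a - b for a, b in zip(x, y)]
        bound = c0 * japanese_bracket(diff) ** n * w(y, n)
        worst = max(worst, w(x, n) / bound)
    return worst


ratio = worst_ratio()
```

A value of `ratio` not exceeding 1 is consistent with $w$ being an order function for these constants; a random test of this kind is of course no substitute for the one-line proof via $1 + |x|^2 \le 2(1 + |x - y|^2)(1 + |y|^2)$.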
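It is obvious and will be used implicitly that the product of two order functions is again an order function.

Definition 4. A function $A \in C^\infty(\mathbb{R}^{2d}, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$ belongs to the symbol class $S^w(\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$ with order function $w$, if for every $\alpha, \beta \in \mathbb{N}^d$ there exists a positive constant $C_{\alpha,\beta}$ such that
\[ \big\| (\partial_q^\alpha \partial_p^\beta A)(q, p) \big\|_{\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)} \le C_{\alpha,\beta}\, w(q, p) \tag{62} \]
for every $q, p \in \mathbb{R}^d$. ♦

Definition 5. A map $A : [0, \varepsilon_0) \to S^w(\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$, $\varepsilon \mapsto A_\varepsilon$, is a semiclassical symbol of order $w$, if there exists a sequence $\{A_j\}_{j\in\mathbb{N}}$, $A_j \in S^w(\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$, such that
\[ A \asymp \sum_{j=0}^{\infty} \varepsilon^j A_j \quad \text{in } S^w(\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)) , \]
which means that for every $n \in \mathbb{N}$ and for all $\alpha, \beta \in \mathbb{N}^d$ there exists a constant $C_{\alpha,\beta,n}$ such that for any $\varepsilon \in [0, \varepsilon_0)$ one has
\[ \Big\| \partial_q^\alpha \partial_p^\beta \Big( A_\varepsilon(q, p) - \sum_{j=0}^{n-1} \varepsilon^j A_j(q, p) \Big) \Big\|_{\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)} \le \varepsilon^n\, C_{\alpha,\beta,n}\, w(q, p) . \tag{63} \]
The space of semiclassical symbols of order $w$ is denoted as $S^w(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$ or, if clear from the context or if no specification is required, as $S^w(\varepsilon)$. The space of formal power series with coefficients in $S^w(\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$ is denoted as $M^w(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$. ♦

Definition 6. A symbol $A_\varepsilon \in S^w(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$ is $\tau$-equivariant (more precisely $(\tau_1, \tau_2)$-equivariant), if
\[ A_\varepsilon(q - \gamma, p) = \tau_2(\gamma)\, A_\varepsilon(q, p)\, \tau_1(\gamma)^{-1} \quad \text{for all } \gamma \in \Gamma . \]
The space of $\tau$-equivariant symbols is denoted as $S^w_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$. ♦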
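Notice that the coefficients in the asymptotic expansion of a $\tau$-equivariant semiclassical symbol must be $\tau$-equivariant as well, i.e. if $A_\varepsilon \asymp \sum_{j=0}^{\infty} \varepsilon^j A_j$, $A_\varepsilon \in S^w_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$, then $A_j \in S^w_\tau(\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$. Given any $\tau$-equivariant symbol $A \in S^w_\tau(\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$, one can consider the usual Weyl quantization $\hat A$, regarded as an operator acting on $\mathcal{S}'(\mathbb{R}^d, \mathcal{H}_1)$ with distributional integral kernel
\[ K_A(x, y) = \frac{1}{(2\pi\varepsilon)^d} \int_{\mathbb{R}^d} d\xi\, A\big( \tfrac12 (x + y), \xi \big)\, e^{i\xi\cdot(x - y)/\varepsilon} . \tag{64} \]
Notice that the integral kernel associated to a $\tau$-equivariant symbol $A$ is $\tau$-equivariant in the following sense:
\[ K_A(x - \gamma, y - \gamma) = \tau_2(\gamma)\, K_A(x, y)\, \tau_1(\gamma)^{-1} \quad \text{for all } \gamma \in \Gamma . \tag{65} \]
The simple but important observation is that the space of $\tau$-equivariant distributions is invariant under the action of pseudodifferential operators with $\tau$-equivariant symbols.

Proposition 7. Let $A \in S^w_\tau(\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$, then
\[ \hat A\, \mathcal{S}'_{\tau_1}(\mathbb{R}^d, \mathcal{H}_1) \subset \mathcal{S}'_{\tau_2}(\mathbb{R}^d, \mathcal{H}_2) . \]

Proof. Since $\hat A$ maps $\mathcal{S}'(\mathbb{R}^d, \mathcal{H}_1)$ continuously into $\mathcal{S}'(\mathbb{R}^d, \mathcal{H}_2)$, we only need to show that $(L_\gamma \hat A T)(\varphi) = (\tau_2(\gamma) \hat A T)(\varphi)$ for all $T \in \mathcal{S}'_{\tau_1}(\mathbb{R}^d, \mathcal{H}_1)$ and $\varphi \in \mathcal{S}(\mathbb{R}^d, \mathcal{H}_2)$. To this end notice that, as acting on $\mathcal{S}(\mathbb{R}^d, \mathcal{H}_2)$, one finds by direct computation using (64) that
\[ \hat A^* L_\gamma = L_\gamma\, (\tau_1(\gamma)^{-1})^*\, \hat A^*\, \tau_2(\gamma)^* . \]
Indeed, let $\psi \in \mathcal{S}(\mathbb{R}^d, \mathcal{H}_2)$, then
\[ \begin{aligned}
\big( \hat A^* L_\gamma \psi \big)(x) &= \int_{\mathbb{R}^d} dy\, K_{A^*}(x, y)\, \psi(y - \gamma) = \int_{\mathbb{R}^d} dy\, K_{A^*}(x, y + \gamma)\, \psi(y) \\
&= \int_{\mathbb{R}^d} dy\, (\tau_1(\gamma)^{-1})^*\, K_{A^*}(x - \gamma, y)\, \tau_2(\gamma)^*\, \psi(y) \\
&= \big( L_\gamma\, (\tau_1(\gamma)^{-1})^*\, \hat A^*\, \tau_2(\gamma)^*\, \psi \big)(x) .
\end{aligned} \]
Hence, using the fact that $\tau$ is a representation and that $L_\gamma T = \tau_1(\gamma) T$,
\[ \begin{aligned}
(L_\gamma \hat A T)(\varphi) &= T\big( \hat A^* L_{-\gamma} \varphi \big) = T\big( L_{-\gamma}\, \tau_1(\gamma)^*\, \hat A^*\, (\tau_2(\gamma)^{-1})^*\, \varphi \big) \\
&= \big( \tau_2(\gamma)\, \hat A\, \tau_1(\gamma)^{-1} L_\gamma T \big)(\varphi) = \big( \tau_2(\gamma)\, \hat A\, T \big)(\varphi) .
\end{aligned} \]

For the convenience of the reader we also recall the definition and the basic result about the Weyl product of semiclassical symbols. For a proof see e.g. [DiSj].

Proposition 8. Let $A \in S^{w_1}(\varepsilon, \mathcal{B}(\mathcal{H}_2, \mathcal{H}_3))$ and $B \in S^{w_2}(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$, then $\hat A \hat B = \hat C$, with $C \in S^{w_1 w_2}(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_3))$ given through
\[ C(\varepsilon, q, p) = \exp\Big( \frac{i\varepsilon}{2} \big( \nabla_p \cdot \nabla_x - \nabla_\xi \cdot \nabla_q \big) \Big) A(\varepsilon, q, p)\, B(\varepsilon, x, \xi) \Big|_{x=q,\, \xi=p} =: A \,\tilde{\#}\, B . \tag{66} \]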
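The corresponding product on the level of the formal power series is called the Moyal product and denoted as
\[ \# : M^{w_1}(\varepsilon, \mathcal{B}(\mathcal{H}_2, \mathcal{H}_3)) \times M^{w_2}(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)) \to M^{w_1 w_2}(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_3)) . \]
The $\tau$-equivariance of symbols is preserved under the pointwise product, the Weyl product and the Moyal product.

Proposition 9. Let $A_\varepsilon \in S^{w_1}_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_2, \mathcal{H}_3))$ and $B_\varepsilon \in S^{w_2}_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2))$, then $A_\varepsilon B_\varepsilon \in S^{w_1 w_2}_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_3))$ and $A_\varepsilon \,\tilde{\#}\, B_\varepsilon \in S^{w_1 w_2}_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_3))$.

Proof. One has
\[ A_\varepsilon(q - \gamma, p)\, B_\varepsilon(q - \gamma, p) = \tau_3(\gamma)\, A_\varepsilon(q, p)\, \tau_2(\gamma)^{-1}\, \tau_2(\gamma)\, B_\varepsilon(q, p)\, \tau_1(\gamma)^{-1} = \tau_3(\gamma)\, A_\varepsilon(q, p)\, B_\varepsilon(q, p)\, \tau_1(\gamma)^{-1} , \]
which shows $A_\varepsilon B_\varepsilon \in S^{w_1 w_2}_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_3))$ and, inserted into (66), immediately yields also $A_\varepsilon \,\tilde{\#}\, B_\varepsilon \in S^{w_1 w_2}_\tau(\varepsilon, \mathcal{B}(\mathcal{H}_1, \mathcal{H}_3))$.

An analogous statement holds for the Moyal product of formal symbols. A not completely obvious fact is the following variant of the Calderon-Vaillancourt theorem.

Theorem 3. Let $A \in S^1_\tau(\mathcal{B}(\mathcal{H}))$ and $\tau_1, \tau_2$ unitary representations of $\Gamma$ in $\mathcal{B}(\mathcal{H})$, then $\hat A \in \mathcal{B}(\mathcal{H}_{\tau_1}, \mathcal{H}_{\tau_2})$ and for $A_\varepsilon \in S^1_\tau(\varepsilon, \mathcal{B}(\mathcal{H}))$ we have that
\[ \sup_{\varepsilon\in[0,\varepsilon_0)} \big\| \hat A_\varepsilon \big\|_{\mathcal{B}(\mathcal{H}_{\tau_1}, \mathcal{H}_{\tau_2})} < \infty . \]

Proof. Fix $n > d/2$ and let $w(x) = \langle x \rangle^{-n}$. We consider the weighted $L^2$-space
\[ L^2_w = \Big\{ \psi \in L^2_{\rm loc}(\mathbb{R}^d, \mathcal{H}) : \int_{\mathbb{R}^d} dx\, w(x)^2\, |\psi(x)|^2 < \infty \Big\} . \]
Let $j = 1, 2$, then $\mathcal{H}_{\tau_j} \subset L^2_w$ and for any $\psi \in \mathcal{H}_{\tau_j}$ one has the norm equivalence
\[ C_1\, \|\psi\|_{\mathcal{H}_{\tau_j}} \le \|\psi\|_{L^2_w} \le C_2\, \|\psi\|_{\mathcal{H}_{\tau_j}} \tag{67} \]
for appropriate constants $0 < C_1, C_2 < \infty$. The first inequality in (67) is obvious and the second one follows by exploiting $\tau_j$-equivariance of $\psi$ and unitarity of $\tau_j$:
\[ \begin{aligned}
\|\psi\|^2_{L^2_w} &= \sum_{\gamma\in\Gamma} \int_{M+\gamma} dx\, w(x)^2\, \|\psi(x)\|^2_{\mathcal{H}} = \sum_{\gamma\in\Gamma} \int_{M} dx\, w(x + \gamma)^2\, \big\| \tau_j(\gamma)^{-1} \psi(x) \big\|^2_{\mathcal{H}} \\
&\le \Big( \sum_{\gamma\in\Gamma} \sup_{x\in M+\gamma} w(x)^2 \Big) \int_M dx\, \|\psi(x)\|^2_{\mathcal{H}} \le C_2^2\, \|\psi\|^2_{\mathcal{H}_{\tau_j}} .
\end{aligned} \]
According to (67) it suffices to show that $\hat A \in \mathcal{B}(L^2_w)$ and to estimate the norm of $\hat A_\varepsilon$ in this space. Let $\psi \in C^\infty_{\tau_1}(\mathbb{R}^d, \mathcal{H})$, then by the general theory $\hat A\psi$ is smooth as well (see [Fo], Corollary 2.62) and thus, according to Proposition 7, $\hat A\psi \in C^\infty_{\tau_2}(\mathbb{R}^d, \mathcal{H})$. Hence we can use (67) and find
\[ \big\| \hat A\psi \big\|_{L^2_w} = \big\| w \hat A \psi \big\|_{L^2} \le \big\| w \hat A w^{-1} \big\|_{\mathcal{B}(L^2)}\, \| w\psi \|_{L^2} = \big\| w \hat A w^{-1} \big\|_{\mathcal{B}(L^2)}\, \|\psi\|_{L^2_w} . \]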
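However, by Proposition 8, we have that $w \,\tilde{\#}\, A \,\tilde{\#}\, w^{-1} \in S^1(\varepsilon, \mathcal{B}(\mathcal{H}))$. Thus from the usual Calderon-Vaillancourt theorem it follows that
\[ \big\| w \hat A w^{-1} \big\|_{\mathcal{B}(L^2)} \le C_d\, \big\| w \,\tilde{\#}\, A \,\tilde{\#}\, w^{-1} \big\|_{C_b^{2d+1}(\mathbb{R}^{2d})} . \]
This shows that for $A \in S^1_\tau(\mathcal{B}(\mathcal{H}))$ we have $\hat A \in \mathcal{B}(\mathcal{H}_{\tau_1}, \mathcal{H}_{\tau_2})$. With $w \,\tilde{\#}\, A_\varepsilon \,\tilde{\#}\, w^{-1} \in S^1(\varepsilon, \mathcal{B}(\mathcal{H}))$ for $A_\varepsilon \in S^1_\tau(\varepsilon, \mathcal{B}(\mathcal{H}))$, we conclude that
\[ \sup_{\varepsilon\in[0,\varepsilon_0)} \big\| \hat A_\varepsilon \big\|_{\mathcal{B}(\mathcal{H}_{\tau_1}, \mathcal{H}_{\tau_2})} < \infty \]
by the same argument.

Remark 5. It is clear from the proof that the previous result still holds true under the weaker assumption that $\tau_1$ and $\tau_2$ are uniformly bounded, i.e. that
\[ \sup_{\gamma\in\Gamma} \| \tau_j(\gamma) \|_{\mathcal{B}(\mathcal{H})} \le C , \quad j = 1, 2 . \]
♦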
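Finally we would also like to show that for $A \in S^1_\tau(\mathcal{B}(\mathcal{H}))$ the adjoint of $\hat A$ as an operator in $\mathcal{B}(\mathcal{H}_\tau)$, denoted by $\hat A^\dagger$, is given through the quantization of the pointwise adjoint, i.e. through $\widehat{A^*}$. Here it is crucial that $\tau$ is a unitary representation.

Proposition 10. Let $A \in S^1_\tau(\mathcal{B}(\mathcal{H}))$ with a unitary representation $\tau$ (with $\tau_1 = \tau_2 = \tau$) and let $\hat A^\dagger$ be the adjoint of $\hat A \in \mathcal{B}(\mathcal{H}_\tau)$, then $\hat A^\dagger = \widehat{A^*}$.

Proof. Let $\psi \in \mathcal{H}_\tau$ and $\varphi \in C^\infty_\tau$ such that $\tilde\varphi := 1_M \varphi \in C_0^\infty(\mathbb{R}^d, \mathcal{H})$, where $1_M$ denotes the characteristic function of the set $M$. Such $\varphi$ are dense in $\mathcal{H}_\tau$ and the corresponding $\tilde\varphi$ can be used as a test function:
\[ \begin{aligned}
\langle \varphi, \hat A\psi \rangle_{\mathcal{H}_\tau}
&= \int_M dx\, \big\langle \varphi(x), (\hat A\psi)(x) \big\rangle_{\mathcal{H}} = \int_{\mathbb{R}^d} dx\, \big\langle \tilde\varphi(x), (\hat A\psi)(x) \big\rangle_{\mathcal{H}} \\
&= \int_{\mathbb{R}^d} dx\, \big\langle (\widehat{A^*}\tilde\varphi)(x), \psi(x) \big\rangle_{\mathcal{H}} \\
&= \int_{\mathbb{R}^d} dx \int_{\mathbb{R}^d} dy\, \big\langle K_{A^*}(x, y)\, \tilde\varphi(y), \psi(x) \big\rangle_{\mathcal{H}} \\
&= \int_{\mathbb{R}^d} dx \int_{M} dy\, \big\langle K_{A^*}(x, y)\, \tilde\varphi(y), \psi(x) \big\rangle_{\mathcal{H}} \\
&= \sum_{\gamma\in\Gamma} \int_M dx \int_M dy\, \big\langle K_{A^*}(x + \gamma, y)\, \tilde\varphi(y), \psi(x + \gamma) \big\rangle_{\mathcal{H}} \\
&= \sum_{\gamma\in\Gamma} \int_M dx \int_M dy\, \big\langle \tau^{-1}(\gamma)\, K_{A^*}(x, y - \gamma)\, \tau(\gamma)\, \tilde\varphi(y),\; \tau^{-1}(\gamma)\, \psi(x) \big\rangle_{\mathcal{H}} \\
&= \sum_{\gamma\in\Gamma} \int_M dx \int_M dy\, \big\langle K_{A^*}(x, y - \gamma)\, \varphi(y - \gamma), \psi(x) \big\rangle_{\mathcal{H}} \\
&= \int_M dx \int_{\mathbb{R}^d} dy\, \big\langle K_{A^*}(x, y)\, \varphi(y), \psi(x) \big\rangle_{\mathcal{H}} \\
&= \int_M dx\, \big\langle (\widehat{A^*}\varphi)(x), \psi(x) \big\rangle_{\mathcal{H}} = \big\langle \widehat{A^*}\varphi, \psi \big\rangle_{\mathcal{H}_\tau} .
\end{aligned} \]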
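In particular, we used the $\tau$-equivariance of the kernel (65) and of the functions in $\mathcal{H}_\tau$ and the unitarity of $\tau$. By density we have $\widehat{A^*} = \hat A^\dagger$.

B. Hamiltonian Formulation for the Refined Semiclassical Model

The dynamical equations (5), which define the $\varepsilon$-corrected semiclassical model, can be written as
\[ \dot r = \nabla_\kappa H_{\rm sc}(r, \kappa) - \varepsilon\, \dot\kappa \times \Omega_n(\kappa) , \qquad \dot\kappa = -\nabla_r H_{\rm sc}(r, \kappa) + \dot r \times B(r) \tag{68} \]
with
\[ H_{\rm sc}(r, \kappa) := E_n(\kappa) + \phi(r) - \varepsilon\, \mathcal{M}_n(\kappa) \cdot B(r) . \]
Recall that we are using the notation introduced in Remark 2 and that $B$ and $\Omega_n$ are the 2-forms corresponding to the magnetic field and to the curvature of the Berry connection, i.e. in components
\[ B(r)_{ij} = \big( \partial_i A_j - \partial_j A_i \big)(r) \quad \text{for } i, j \in \{1, \ldots, d\} , \quad \text{and} \quad \Omega_n(\kappa)_{ij} = \big( \partial_i \mathcal{A}_j - \partial_j \mathcal{A}_i \big)(\kappa) . \]
We fix the system of coordinates $z = (r, \kappa)$ in $\mathbb{R}^{2d}$. The standard symplectic form $\Theta_0 = \Theta_0(z)_{lm}\, dz_m \wedge dz_l$, where $l, m \in \{1, \ldots, 2d\}$, has coefficients given by the constant matrix
\[ \Theta_0(z) = \begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix} , \]
where $I$ is the identity matrix in $\mathrm{Mat}(d, \mathbb{R})$. The symplectic form which turns (68) into Hamilton's equation of motion for $H_{\rm sc}$ is given by the 2-form $\Theta_{B,\varepsilon} = \Theta_{B,\varepsilon}(z)_{lm}\, dz_m \wedge dz_l$ with coefficients
\[ \Theta_{B,\varepsilon}(r, \kappa) = \begin{pmatrix} B(r) & -I \\ I & \varepsilon\, \Omega_n(\kappa) \end{pmatrix} . \tag{69} \]
For $\varepsilon = 0$ the 2-form $\Theta_{B,\varepsilon}$ coincides with the magnetic symplectic form $\Theta_B$ usually employed to describe in a gauge-invariant way the motion of a particle in a magnetic field ([MaRa], Sect. 6.6). For $\varepsilon$ small enough, the matrix (69) defines a symplectic form, i.e. a closed non-degenerate 2-form. Indeed, since $\det \Theta_B = 1$ it follows that, for $\varepsilon$ small enough, $\Theta_{B,\varepsilon}$ is not degenerate. In particular it is sufficient to choose
\[ \varepsilon < \Big( \sup_{r,\kappa\in\mathbb{R}^d} \big( |B(r)|\, |\Omega_n(\kappa)| + |\Omega_n(\kappa)| \big) \Big)^{-1} . \]
The closedness of $\Theta_{B,\varepsilon}$ follows from the fact that $B$ and $\Omega_n$ correspond to closed 2-forms over $\mathbb{R}^d$. With these definitions the corresponding Hamiltonian equations are
\[ \Theta_{B,\varepsilon}(z)\, \dot z = dH_{\rm sc}(z) , \]
or equivalently
\[ \begin{pmatrix} B(r) & -I \\ I & \varepsilon\, \Omega_n(\kappa) \end{pmatrix} \begin{pmatrix} \dot r \\ \dot\kappa \end{pmatrix} = \begin{pmatrix} \nabla_r H(r, \kappa) \\ \nabla_\kappa H(r, \kappa) \end{pmatrix} , \]
which agrees with (68). We notice that this discussion remains valid if $\Omega_n$ admits a potential only locally, as it happens generically for magnetic Bloch bands.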
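The equivalence between the matrix form of the Hamiltonian equations and (68) can be checked numerically at a single phase-space point. The sketch below is only an illustration under stated assumptions: $d = 3$, the 2-forms $B$ and $\Omega_n$ are represented by the antisymmetric matrices $F_{ij} = \epsilon_{ijk} f_k$ built from the corresponding vectors (so that $F v = v \times f$), and all numerical values as well as the function names are hypothetical.

```python
import numpy as np


def two_form(f):
    """Antisymmetric matrix F with F[i, j] = sum_k eps_ijk f[k], so that F @ v == np.cross(v, f)."""
    f1, f2, f3 = f
    return np.array([[0.0, f3, -f2],
                     [-f3, 0.0, f1],
                     [f2, -f1, 0.0]])


def velocities(B_vec, Omega_vec, grad_r_H, grad_k_H, eps):
    """Solve Theta_{B,eps} z_dot = dH for z_dot = (r_dot, kappa_dot) at one phase-space point."""
    d = 3
    identity = np.eye(d)
    theta = np.block([[two_form(B_vec), -identity],
                      [identity, eps * two_form(Omega_vec)]])
    dH = np.concatenate([grad_r_H, grad_k_H])
    z_dot = np.linalg.solve(theta, dH)
    return z_dot[:d], z_dot[d:]


# Hypothetical values of the fields and gradients at one point (assumptions for illustration).
B_vec = np.array([0.3, -0.2, 0.5])
Omega_vec = np.array([0.1, 0.4, -0.3])
grad_r_H = np.array([1.0, -0.5, 0.2])
grad_k_H = np.array([0.3, 0.8, -1.1])
eps = 0.05

r_dot, k_dot = velocities(B_vec, Omega_vec, grad_r_H, grad_k_H, eps)

# Residuals of the two equations in (68), written with cross products.
res_r = r_dot - (grad_k_H - eps * np.cross(k_dot, Omega_vec))
res_k = k_dot - (-grad_r_H + np.cross(r_dot, B_vec))
```

Both residuals vanish up to rounding, which reflects that the first block row of the matrix equation is the $\dot\kappa$ equation and the second block row is the $\dot r$ equation of (68).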
Acknowledgements. G. P. is grateful for financial support by the Research Training Network HYKE of the European Union and by the Priority Program “Analysis, Modeling and Simulation of Multiscale Problems” of the Deutsche Forschungsgemeinschaft.
References
[AsKn] Asch, J., Knauf, A.: Motion in periodic potentials. Nonlinearity 11, 175–200 (1998)
[AsMe] Ashcroft, N.W., Mermin, N.D.: Solid State Physics. New York: Saunders, 1976
[ABL] Avron, J.E., Berger, J., Last, Y.: Piezoelectricity: Quantized charge transport driven by adiabatic deformations. Phys. Rev. Lett. 78, 511–514 (1997)
[BMP] Bechouche, P., Mauser, N.J., Poupaud, F.: Semiclassical limit for the Schrödinger-Poisson equation in a crystal. Comm. Pure Appl. Math. 54, 851–890 (2001)
[BeRa] Bellissard, J., Rammal, R.: An algebraic semi-classical approach to Bloch electrons in a magnetic field. J. Physique France 51, 1803 (1990)
[Bl1] Blount, E.I.: Formalisms of band theory. In: Solid State Physics 13, New York: Academic Press, 1962, pp. 305–373
[Bl2] Blount, E.I.: Bloch electrons in a magnetic field. Phys. Rev. 126, 1636–1653 (1962)
[Bu] Buslaev, V.: Semiclassical approximation for equations with periodic coefficients. Russ. Math. Surv. 42, 97–125 (1987)
[ChNi] Chang, M.C., Niu, Q.: Berry phase, hyperorbits and the Hofstadter spectrum: Semiclassical dynamics and magnetic Bloch bands. Phys. Rev. B 53, 7010–7023 (1996)
[DGR] Dimassi, M., Guillot, J.-C., Ralston, J.: Semiclassical asymptotics in magnetic Bloch bands. J. Phys. A 35, 7597–7605 (2002)
[DiSj] Dimassi, M., Sjöstrand, J.: Spectral Asymptotics in the Semi-Classical Limit. London Mathematical Society Lecture Note Series 268, Cambridge: Cambridge University Press, 1999
[Fo] Folland, G.B.: Harmonic analysis in phase space. Princeton, NJ: Princeton University Press, 1989
[GaAv] Gat, O., Avron, J.E.: Magnetic fingerprints of fractal spectra and the duality of Hofstadter models. New J. Phys. 5, 44.1–44.8 (2003)
[GMMP] Gérard, P., Markowich, P.A., Mauser, N.J., Poupaud, F.: Homogenization limits and Wigner transforms. Commun. Pure Appl. Math. 50, 323–380 (1997)
[GMS] Gérard, C., Martinez, A., Sjöstrand, J.: A mathematical approach to the effective Hamiltonian in perturbed periodic problems. Commun. Math. Phys. 142, 217–244 (1991)
[GeNi] Gérard, C., Nier, F.: Scattering theory for the perturbations of periodic Schrödinger operators. J. Math. Kyoto Univ. 38, 595–634 (1998)
[GRT] Guillot, J.C., Ralston, J., Trubowitz, E.: Semi-classical asymptotics in solid state physics. Commun. Math. Phys. 116, 401–415 (1988)
[HeSj] Helffer, B., Sjöstrand, J.: On diamagnetism and de Haas-Van Alphen effect. Annales I.H.P. (physique théorique) 52, 303–375 (1990)
[HST] Hövermann, F., Spohn, H., Teufel, S.: Semiclassical limit for the Schrödinger equation with a short scale periodic potential. Commun. Math. Phys. 215, 609–629 (2001)
[JNM] Jungwirth, T., Niu, Q., MacDonald, A.H.: Anomalous Hall effect in ferromagnetic semiconductors. Phys. Rev. Lett. 88, 207208 (2002)
[Ko] Kohn, W.: Theory of Bloch electrons in a magnetic field: The effective Hamiltonian. Phys. Rev. 115, 1460–1478 (1959)
[Lu] Luttinger, J.M.: The effect of a magnetic field on electrons in a periodic potential. Phys. Rev. 84, 814–817 (1951)
[MaNo] Maltsev, A.Ya., Novikov, S.P.: Topological phenomena in normal metals. Physics - Uspekhi 41, 231–239 (1998)
[MaRa] Marsden, J.E., Ratiu, T.S.: Introduction to Mechanics and Symmetry. Texts in Applied Mathematics 17, Berlin-Heidelberg-New York: Springer Verlag, 1999
[MaSo] Martinez, A., Sordoni, V.: On the time-dependent Born-Oppenheimer approximation with smooth potential. C. R. Math. Acad. Sci. Paris 334, 185–188 (2002)
[Ne] Nenciu, G.: Dynamics of band electrons in electric and magnetic fields: Rigorous justification of the effective Hamiltonians. Rev. Mod. Phys. 63, 91–127 (1991)
[NeSo] Nenciu, G., Sordoni, V.: Semiclassical limit for multistate Klein-Gordon systems: Almost invariant subspaces and scattering theory. Math. Phys. Preprint Archive mp arc 01–36 (2001)
[Pa] Panati, G.: On the existence of smooth and periodic Bloch functions. In preparation
[PST1] Panati, G., Spohn, H., Teufel, S.: Space-adiabatic perturbation theory. To appear in Adv. Theor. Math. Phys., 2003
578 [PST2 ] [PST3 ] [Ro] [Si] [SuNi] [Te1 ] [Te2 ] [TKNN] [Wa] [Wi] [Za]
G. Panati, H. Spohn, S. Teufel Panati, G., Spohn, H., Teufel, S.: Space-adiabatic perturbation theory in quantum dynamics. Phys. Rev. Lett. 88, 250405 (2002) Panati, G., Spohn, H., Teufel, S.: Effective dynamics in magnetic Bloch bands. In preparation, 2002 Robert, D.: Autour de l’Approximation Semi-Classique. Progress in Mathematics, Volume 68, Basel-Boston: Birkh¨auser, 1987 Simon, B.: Holonomy, the quantum adiabatic theorem, and Berry’s phase. Phys. Rev. Lett. 51, 2167–2170 (1983) Sundaram, G., Niu, Q.: Wave-packet dynamics in slowly perturbed crystals: Gradient corrections and Berry-phase effects. Phys. Rev. B 59, 14915–14925 (1999) Teufel, S.: Adiabatic perturbation theory in quantum dynamics. Springer Lecture Notes in Mathematics 1821, 2003 Teufel, S.: Propagation of Wigner functions for the Schr¨odinger equation with a slowly perturbed periodic potential. To appear in the Proceedings of the conference Multiscale Methods in Quantum Mechanics in Rome, December 16–20, 2002 Thouless, D.J., Kohomoto, M., Nightingale, M.P., den Nijs, M.: Quantized Hall conductance in a two-dimensional periodic potential. Phys. Rev. Lett. 49, 405–408 (1982) Wannier, G.H.: Dynamics of band electrons in electric and magnetic fields. Rev. Mod. Phys. 34, 645–656 (1962) Wilcox, C.H.: Theory of bloch waves. J. Anal. Math. 33, 146–167 (1978) Zak, J.: Dynamics of electrons in solids in external fields. Phys. Rev. 168, 686–695 (1968)
Communicated by B. Simon
Commun. Math. Phys. 242, 579–584 (2003) Digital Object Identifier (DOI) 10.1007/s00220-003-0957-7
Communications in
Mathematical Physics
Generalized Symmetry Transformations on Quaternionic Indefinite Inner Product Spaces: An Extension of Quaternionic Version of Wigner’s Theorem

Peter Šemrl

Department of Mathematics, University of Ljubljana, Jadranska 19, 1000 Ljubljana, Slovenia. E-mail: [email protected]

Received: 6 March 2003 / Accepted: 20 June 2003
Published online: 10 October 2003 – © Springer-Verlag 2003
Abstract: Uhlhorn’s extension of Wigner’s unitary-antiunitary theorem has been recently generalized by Molnár to indefinite inner product spaces. We present the quaternionic version of this result.

1. Introduction

Wigner’s unitary-antiunitary theorem plays a fundamental role in quantum mechanics. It states that any quantum mechanical invariance transformation (symmetry transformation) can be represented by a unitary or an antiunitary operator. In mathematical language this reads as follows. Let $H$ be a complex Hilbert space and $\phi$ a bijective transformation on the set of all one-dimensional linear subspaces of $H$ preserving the angle between every pair of such subspaces (the transition probability in the terminology of quantum mechanics). Then there exists a unitary or an antiunitary operator $U : H \to H$ such that $\phi(L) = \{Ux : x \in L\}$ for every one-dimensional subspace $L$ of $H$.

In the case that $\dim H \ge 3$, Uhlhorn [10] obtained the same conclusion under the weaker assumption that $\phi$ preserves the orthogonality between the one-dimensional subspaces of $H$. This is a significant generalization, since Uhlhorn’s transformation preserves only the logical structure of the quantum mechanical system in question, while Wigner’s transformation preserves its complete probabilistic structure.

As noted in [2, Introduction], indefinite inner product spaces represent a more useful mathematical setting for describing several physical problems than definite ones. This has raised the need to study Wigner’s and Uhlhorn’s theorems in this more general setting as well [2, 3, 5]. The most general result of this type for real and complex Hilbert spaces has been recently proved by Molnár [6]. Motivated by applications of quaternionic Hilbert spaces in certain physical problems (see, for example, [4]) he posed the problem of whether an analogous result holds for such spaces as well.

It should be mentioned here that the theorems of Wigner and Uhlhorn have already been extended to the quaternionic case [1, 9, 10]. Molnár used Ovchinnikov’s characterization of automorphisms of the poset of idempotent operators [7] to
obtain the general form of bijective transformations on rank one idempotent operators preserving zero products, and from this he deduced the extension of Uhlhorn’s theorem. He suggested the same approach in the quaternionic case. This approach would require the extension of Ovchinnikov’s result to the quaternionic case. We will solve the problem with a much shorter direct proof that does not depend on results of Ovchinnikov’s type. This kind of approach also leads to a shorter proof of Molnár’s result [8].

In the second section we will fix the notation and obtain some preliminary results on quaternionic Hilbert spaces, including Theorem 1, which characterizes zero product preserving bijective transformations on rank one idempotent bounded linear operators acting on a quaternionic Hilbert space. The third section will be devoted to the formulation and the proof of the main result, that is, the generalization of the quaternionic version of Uhlhorn’s result.

2. Preliminary Results

We will denote by $\mathbb{H}$ the skew-field of quaternions, $\mathbb{H} = \{t + ai + bj + ck : t, a, b, c \in \mathbb{R}\}$. For $q = t + ai + bj + ck \in \mathbb{H}$, $q^*$ is defined by $q^* = t - ai - bj - ck$ and $|q|$ by $|q| = \sqrt{t^2 + a^2 + b^2 + c^2}$. We will say that $t$ is the real part of $q$. Quaternions $t + ai + bj + ck$ will sometimes be identified with ordered pairs $(t, v) \in \mathbb{R} \times \mathbb{R}^3$, where $v = ai + bj + ck$ and the triple $i, j, k$ is the standard orthonormal basis of the three-dimensional Euclidean space. Then the multiplication on $\mathbb{H}$ is given by

$(t, v)(s, u) = (ts - \langle v, u\rangle,\; tu + sv + v \times u)$.

We denote by $SO(3)$ the $3 \times 3$ special orthogonal group, that is, the group of all linear orthogonal operators $O$ on $\mathbb{R}^3$ with $\det O = 1$. For every $O \in SO(3)$ and every pair of vectors $v, u$ we have $O(v \times u) = (Ov) \times (Ou)$ and $(-O)(v \times u) = ((-O)u) \times ((-O)v)$. For any linear operator $A : \mathbb{R}^3 \to \mathbb{R}^3$ we define $f_A : \mathbb{H} \to \mathbb{H}$ by $f_A((t, v)) = (t, Av)$, $(t, v) \in \mathbb{H}$.
So, if $Q \in SO(3)$, then $f_Q$ is an automorphism of the skew-field $\mathbb{H}$ and $f_{-Q}$ is an antiautomorphism of $\mathbb{H}$, that is, a bijective map from $\mathbb{H}$ onto itself satisfying $f_{-Q}(q + p) = f_{-Q}(q) + f_{-Q}(p)$ and $f_{-Q}(qp) = f_{-Q}(p) f_{-Q}(q)$, $q, p \in \mathbb{H}$. The following statement is probably well-known. As the proof is very short we will include it for the sake of completeness.

Proposition 1. Let $f$ be an automorphism of $\mathbb{H}$. Then there exists $Q \in SO(3)$ such that $f = f_Q$. Similarly, if $g$ is an antiautomorphism of the skew-field $\mathbb{H}$, then there exists $P \in SO(3)$ such that $g = f_{-P}$.

Proof. We will prove only the second part of the statement, as the proof of the first part goes through in exactly the same way. The center of the skew-field $\mathbb{H}$, that is, the set of all quaternions $(t, 0)$, where $t$ is any real number, is mapped by every antiautomorphism onto itself. The restriction of $g$ to the center is an automorphism of the real field. It is well-known that the only automorphism of the real field is the identity. Hence, $g$ is a real linear map on $\mathbb{H}$. If $(t, v)^2 = -1$ for some quaternion $q = (t, v)$, then $2tv = 0$, and consequently, either $t = 0$ or $v = 0$. The second possibility leads to $t^2 = -1$, a contradiction. Thus, $q$ is of the form $(0, v)$. It follows that the real linear subspace of all such quaternions is invariant under $g$. Therefore, $g$ is of the form $g((t, v)) = (t, Av)$ for some real linear operator $A$ on $\mathbb{R}^3$. Comparing the real parts of $g(pq)$ and $g(q)g(p)$, $q, p \in \mathbb{H}$, we first see that $A$ is an orthogonal operator. Applying $g(pq) = g(q)g(p)$, $q, p \in \mathbb{H}$, once again we conclude that $A(v \times u) = (Au) \times (Av)$. Thus, $A = -P$ for some $P \in SO(3)$.

In particular, every automorphism as well as every antiautomorphism of the skew-field $\mathbb{H}$ is a real linear isometry with respect to $|\cdot|$.
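The identification of quaternions with pairs $(t, v)$ and the maps $f_A$ can be checked numerically. The following is a minimal sketch in plain Python, not part of the original argument: the helper names (`qmul`, `f`, `rotation_z`, `close`) and the choice of a rotation about the third axis are illustrative assumptions. It tests, for one rotation $Q$ and one random pair of quaternions, that $f_Q$ is multiplicative and that $f_{-Q}$ reverses products.

```python
import math
import random

def dot(v, u):
    """Euclidean inner product on R^3."""
    return sum(a * b for a, b in zip(v, u))

def cross(v, u):
    """Cross product on R^3."""
    return (v[1] * u[2] - v[2] * u[1],
            v[2] * u[0] - v[0] * u[2],
            v[0] * u[1] - v[1] * u[0])

def qmul(p, q):
    """Quaternion product in pair form: (t, v)(s, u) = (ts - <v,u>, tu + sv + v x u)."""
    (t, v), (s, u) = p, q
    w = cross(v, u)
    return (t * s - dot(v, u),
            tuple(t * u[i] + s * v[i] + w[i] for i in range(3)))

def f(A, q):
    """The map f_A((t, v)) = (t, Av) for a 3x3 matrix A given as a tuple of rows."""
    t, v = q
    return (t, tuple(dot(row, v) for row in A))

def rotation_z(theta):
    """An element of SO(3): rotation by theta about the third axis (illustrative choice)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))

def neg(A):
    """The matrix -A."""
    return tuple(tuple(-a for a in row) for row in A)

def close(p, q, tol=1e-9):
    """Componentwise comparison of two quaternions up to a tolerance."""
    return abs(p[0] - q[0]) < tol and all(abs(a - b) < tol for a, b in zip(p[1], q[1]))

rng = random.Random(0)
Q = rotation_z(0.7)
p = (rng.uniform(-1, 1), tuple(rng.uniform(-1, 1) for _ in range(3)))
q = (rng.uniform(-1, 1), tuple(rng.uniform(-1, 1) for _ in range(3)))

# f_Q preserves the order of products; f_{-Q} reverses it.
is_auto = close(f(Q, qmul(p, q)), qmul(f(Q, p), f(Q, q)))
is_anti = close(f(neg(Q), qmul(p, q)), qmul(f(neg(Q), q), f(neg(Q), p)))
print(is_auto, is_anti)
```

A single random sample is of course only a sanity check of the two identities, not a substitute for the proof of Proposition 1.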
Let us recall some basic definitions. Let $V$ be a (left) vector space over $\mathbb{H}$. An inner product on $V$ is a map $\langle\cdot,\cdot\rangle : V \times V \to \mathbb{H}$ satisfying

– $\langle x, y\rangle = \langle y, x\rangle^*$,
– $\langle px + qy, z\rangle = p\langle x, z\rangle + q\langle y, z\rangle$,
– $\langle x, py + qz\rangle = \langle x, y\rangle p^* + \langle x, z\rangle q^*$,
– $\langle x, x\rangle \ge 0$ and $\langle x, x\rangle = 0 \iff x = 0$,

for all $p, q \in \mathbb{H}$ and all $x, y, z \in V$. If $\langle\cdot,\cdot\rangle$ is an inner product on $V$, then $\|x\| = \sqrt{\langle x, x\rangle}$ is a norm on $V$, regarded as a real vector space, with $\|qx\| = |q|\,\|x\|$ for all $q \in \mathbb{H}$, $x \in V$. In particular, the space $V$ equipped with such a norm is a normed left $\mathbb{H}$-module. A left vector space $H$ over $\mathbb{H}$ together with an inner product which makes the resulting normed linear space complete is called a quaternionic Hilbert space. The geometry of quaternionic Hilbert spaces is similar to that of complex Hilbert spaces. In particular, we have $|\langle x, y\rangle| \le \|x\|\,\|y\|$, $x, y \in H$, and every bounded linear functional on $H$ is of the form $x \mapsto \langle x, y\rangle$ for a unique $y$.

Let $H$ be a quaternionic Hilbert space. A map $A : H \to H$ is called a semilinear operator if there exists an automorphism $f$ of $\mathbb{H}$ such that $A(x + y) = Ax + Ay$ and $A(qx) = f(q)Ax$ for all $q \in \mathbb{H}$ and $x, y \in H$. We say that the automorphism $f$ corresponds to the semilinear operator $A$. Since every semilinear operator on $H$ is real linear, we can apply the closed graph theorem to such operators. In particular, if $A : H \to H$ is a semilinear operator and if for every sequence $(x_n)$ of vectors the facts that $x_n \to 0$ and $Ax_n \to y$ imply that $y = 0$, then $A$ is bounded.

We will also need the notion of the adjoint operator of a bounded semilinear operator $A : H \to H$. Let $Q \in SO(3)$ be chosen in such a way that $A(qx) = f_Q(q)Ax$, $q \in \mathbb{H}$, $x \in H$. Pick $y \in H$. Then $x \mapsto f_Q^{-1}(\langle Ax, y\rangle)$ is a bounded linear functional on $H$, and therefore there exists a unique $w \in H$ such that $f_Q^{-1}(\langle Ax, y\rangle) = \langle x, w\rangle$, or equivalently, $\langle Ax, y\rangle = f_Q(\langle x, w\rangle)$, $x \in H$. We define $w = A^*y$. It is now easy to see using standard arguments that $A^*$ is a bounded semilinear operator on $H$ satisfying $\langle Ax, y\rangle = f_Q(\langle x, A^*y\rangle)$, $x, y \in H$, and $A^*(qx) = f_Q^{-1}(q)A^*x$, $q \in \mathbb{H}$, $x \in H$.

The set of all bounded linear operators on $H$ will be denoted by $B(H)$. We say that $A \in B(H)$ is of rank one if its range is one-dimensional. For every pair of vectors $x, y \in H$ we denote by $T_{x,y}$ the bounded linear operator on $H$ defined by $T_{x,y}u = \langle u, y\rangle x$, $u \in H$.
If $x$ and $y$ are nonzero then this is a rank one operator, and every rank one operator from $B(H)$ can be written in this form. Note that $T_{qx,y} = T_{x,q^*y}$ for all $q \in \mathbb{H}$ and $x, y \in H$. Further, $T_{x,y}T_{u,v} = T_{\langle u, y\rangle x, v}$, $x, y, u, v \in H$. In particular, for nonzero $x, y$ the operator $T_{x,y}$ is a rank one idempotent if and only if $\langle x, y\rangle = 1$. We denote by $I(H) \subset B(H)$ the subset of all bounded linear rank one idempotents on $H$. Clearly, if $T_{x,y}, T_{u,v} \in I(H)$, then $T_{x,y}T_{u,v} = 0$ if and only if $\langle u, y\rangle = 0$.

Now we are ready to formulate the main result of this section. This result will be the main tool for proving the quaternionic version of Molnár’s result.
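The composition rule for rank one operators stated above can be verified by a direct computation. The sketch below uses only the definition $T_{x,y}w = \langle w, y\rangle x$ and the left-linearity of the inner product in its first argument; the test vector $w$ is an auxiliary symbol introduced for this check.

```latex
\begin{align*}
T_{x,y}T_{u,v}\,w
  &= T_{x,y}\bigl(\langle w, v\rangle u\bigr)
   = \bigl\langle \langle w, v\rangle u,\; y \bigr\rangle\, x
   = \langle w, v\rangle\,\langle u, y\rangle\, x, \\
T_{\langle u, y\rangle x,\, v}\,w
  &= \langle w, v\rangle\,\langle u, y\rangle\, x .
\end{align*}
```

Taking $u = x$ and $v = y$ gives $T_{x,y}^2 = T_{\langle x, y\rangle x, y}$, which for nonzero $x, y$ coincides with $T_{x,y}$ exactly when $\langle x, y\rangle = 1$. Likewise, the product vanishes for nonzero $x, v$ exactly when $\langle u, y\rangle = 0$.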
Theorem 1. Let $H$ be a quaternionic Hilbert space with $\dim H \ge 3$ and $\phi : I(H) \to I(H)$ a bijective transformation satisfying $TS = 0 \iff \phi(T)\phi(S) = 0$ for all $T, S \in I(H)$. Then $\phi(T) = ATA^{-1}$, $T \in I(H)$, where $A : H \to H$ is a bounded invertible semilinear operator.
Proof. Take any two idempotents $T_{x,y}$ and $T_{u,v}$ of rank one. Clearly, the vectors $x$ and $u$ are linearly dependent if and only if for every $S \in I(H)$ we have $ST_{x,y} = 0 \iff ST_{u,v} = 0$. For a nonzero $x \in H$ define $L_x$ to be the set of all rank one idempotents from $I(H)$ whose range is the linear span of $x$, that is, $L_x = \{T_{x,y} : y \in H \text{ and } \langle x, y\rangle = 1\}$. Thus, for every nonzero $x \in H$ there exists a nonzero $z \in H$ such that $\phi(L_x) = L_z$. Set $PH = \{[x] : x \in H \setminus \{0\}\}$, where $[x]$ denotes the one-dimensional linear span of $x$. Hence, $\phi$ induces a bijective map $\varphi$ on $PH$ such that $[z] = \varphi([x])$ if and only if $\phi(L_x) = L_z$.

If $[x_1] \subset [x_2] + [x_3]$ for some nonzero $x_1, x_2, x_3 \in H$, then for every $S \in I(H)$ satisfying $S \cdot L_{x_2} = S \cdot L_{x_3} = \{0\}$ we have $S \cdot L_{x_1} = \{0\}$. So, if $\varphi([x_i]) = [z_i]$, $i = 1, 2, 3$, then for every $R \in I(H)$ satisfying $R \cdot L_{z_2} = R \cdot L_{z_3} = \{0\}$ we have $R \cdot L_{z_1} = \{0\}$. It follows that $\varphi([x_1]) \subset \varphi([x_2]) + \varphi([x_3])$. Conversely, if $\varphi([x_1]) \subset \varphi([x_2]) + \varphi([x_3])$ then, by applying the same argument to the inverse of $\phi$, we must have $[x_1] \subset [x_2] + [x_3]$. By the fundamental theorem of projective geometry the map $\varphi$ is induced by a semilinear bijective map $A : H \to H$. Thus, for every $T_{x,y} \in I(H)$ there exists $u \in H$ such that $\phi(T_{x,y}) = T_{Ax,u}$ and $\langle Ax, u\rangle = 1$. Choose $Q \in SO(3)$ such that $A(qx) = f_Q(q)Ax$, $q \in \mathbb{H}$, $x \in H$.

Let us prove that $A$ is bounded. In order to do this we assume that $(x_n) \subset H$ is a sequence of vectors satisfying $x_n \to 0$ and $Ax_n \to y$. We have to show that $y = 0$. Assume on the contrary that $y \ne 0$. Take any $u$ with $\langle u, y\rangle = 1$. Choose and fix vectors $z, v$ with $\|v\| = 1$ such that $\langle z, v\rangle = 1$ and $\phi(T_{z,v}) = T_{u,y}$. Obviously, for every nonzero $w \in H$ we have $T_{u,y} \cdot L_{Aw} = \{0\} \iff T_{z,v} \cdot L_w = \{0\}$, or equivalently, $\langle Aw, y\rangle = 0 \iff \langle w, v\rangle = 0$. Take any $x \in H$. Because $x - \langle x, v\rangle v$ is orthogonal to $v$, its $A$-image is orthogonal to $y$, or equivalently, for every $x \in H$ we have $\langle Ax, y\rangle = f_Q(\langle x, v\rangle)\langle Av, y\rangle$.
Replacing $x$ by $x_n$, sending $n$ to infinity and applying the fact that $|f_Q(\langle x, v\rangle)| \le \|x\|\,\|v\|$, we get $\|y\|^2 = 0$, a contradiction. Thus, $A$ is bounded.

Replacing $\phi$ by $T \mapsto A^{-1}\phi(T)A$ we may, and we do, assume that for every $T_{x,y} \in I(H)$ there exists $u \in H$ such that $\phi(T_{x,y}) = T_{x,u}$ and $\langle x, u\rangle = 1$. For a nonzero $w \in H$ we have $\langle w, y\rangle = 0$ if and only if $T_{x,y} \cdot L_w = \{0\}$, which is equivalent to $\phi(T_{x,y}) \cdot L_w = T_{x,u} \cdot L_w = \{0\}$. So, $\langle w, y\rangle = 0 \iff \langle w, u\rangle = 0$. Moreover, $\langle x, u\rangle = \langle x, y\rangle = 1$. It follows that $y = u$. Hence, $\phi(T) = T$ for every $T \in I(H)$. This completes the proof.

3. Quaternionic Version of Molnár’s Theorem

Let $D \in B(H)$ be an invertible operator. We will consider the indefinite inner product induced by $D$, which is defined by $(x, y)_D = \langle Dx, y\rangle$, $x, y \in H$. For nonzero vectors $x, y \in H$ we write $[x] \perp_D [y]$ if $(x, y)_D = 0$. Note that $[x] \perp_D [y]$ does not imply that $[y] \perp_D [x]$ in general. A ray transformation $\tau : PH \to PH$ is called a generalized symmetry transformation with respect to the indefinite inner product generated by $D$ if $[x] \perp_D [y] \iff \tau[x] \perp_D \tau[y]$.

Theorem 2. Let $H$ be a quaternionic Hilbert space, $\dim H \ge 3$, $D \in B(H)$ an invertible operator and $\tau : PH \to PH$ a bijective generalized symmetry transformation with respect to the indefinite inner product generated by $D$. Then there exist a nonzero $c \in \mathbb{R}$
and a bounded semilinear bijective operator $U : H \to H$ such that $\tau[x] = [Ux]$ for every nonzero $x \in H$ and

$\langle DUx, Uy\rangle = c f(\langle Dx, y\rangle)$, $x, y \in H$,

where $f$ is the automorphism of the skew-field $\mathbb{H}$ corresponding to the semilinear operator $U$.

Remark 1. If we take $D = I$ then we get the quaternionic version of Uhlhorn’s generalization of Wigner’s theorem.

Proof. We define a map $\phi : I(H) \to I(H)$ in the following way. Let $T_{x,y} \in I(H)$. Then $(D^{-1}x, y)_D = \langle x, y\rangle = 1$, and consequently, $[D^{-1}x] \not\perp_D [y]$, so that $\tau[D^{-1}x] \not\perp_D \tau[y]$. Hence, we can find $u \in \tau[D^{-1}x]$ and $v \in \tau[y]$ such that $\langle Du, v\rangle = 1$. Define $\phi(T_{x,y}) = T_{Du,v}$. Applying the fact that for nonzero vectors $x, y, u, v$ we have $T_{x,y} = T_{u,v}$ if and only if $u = qx$ and $y = q^*v$ for some nonzero $q \in \mathbb{H}$, we easily see that $\phi$ is well-defined.

For idempotents $T_{x,y}, T_{u,v} \in I(H)$ we have $T_{x,y}T_{u,v} = 0$ if and only if $(D^{-1}u, y)_D = 0$, which is equivalent to $\tau[D^{-1}u] \perp_D \tau[y]$. This is further equivalent to $\phi(T_{x,y})\phi(T_{u,v}) = 0$.

Moreover, $\phi$ is bijective. Indeed, if $T_{x,y} \ne T_{u,v} \in I(H)$, then either $x$ and $u$ are linearly independent, or $y$ and $v$ are linearly independent. We will consider only the first case. Then $[D^{-1}x] \ne [D^{-1}u]$, and consequently, since $\tau$ is injective, $\phi(T_{x,y}) \ne \phi(T_{u,v})$. To prove the surjectivity, choose any pair of vectors $z$ and $v$ with $\langle z, v\rangle = 1$. We want to find $T_{x,y} \in I(H)$ such that $\phi(T_{x,y}) = T_{z,v}$. We can find nonzero vectors $x$ and $y$ such that $D^{-1}z \in \tau[D^{-1}x]$ and $v \in \tau[y]$. Because $[D^{-1}z] \not\perp_D [v]$ we have $[D^{-1}x] \not\perp_D [y]$. Hence, multiplying $x$ by an appropriate scalar, if necessary, we may assume that $\langle x, y\rangle = 1$. Then $\phi(T_{x,y}) = T_{z,v}$, as desired.

Thus, by Theorem 1, there exists a bounded invertible semilinear operator $A : H \to H$ such that every rank one idempotent $T_{x,y}$ is mapped into a rank one idempotent whose range is the linear span of $Ax$. In other words, for every nonzero $x \in H$ we have $[Ax] = [Du]$ for some nonzero $u \in \tau[D^{-1}x]$. Replacing $x$ by $Dx$ we see that $\tau[x] = [Ux]$, $x \in H$, where $U = D^{-1}AD$ is a semilinear bounded bijective map.
From $[x] \perp_D [y] \iff \tau[x] \perp_D \tau[y]$ we get $\langle DUx, Uy\rangle = 0$ if and only if $\langle Dx, y\rangle = 0$, $x, y \in H$, or equivalently, $\langle U^*DUD^{-1}x, y\rangle = 0$ if and only if $\langle x, y\rangle = 0$, $x, y \in H$. Hence, for every $x \in H$, the vector $U^*DUD^{-1}x$ belongs to the linear span of $x$, and because $U^*DUD^{-1}$ is an additive map we have $U^*DUD^{-1} = qI$ for some nonzero $q \in \mathbb{H}$. Here, $I$ denotes the identity operator on $H$. Indeed, for every nonzero $x \in H$ there exists a scalar $q_x \in \mathbb{H}$ such that $U^*DUD^{-1}x = q_x x$. Let $x$ and $y$ be linearly independent. By additivity of $U^*DUD^{-1}$ we have

$q_x x + q_y y = U^*DUD^{-1}x + U^*DUD^{-1}y = U^*DUD^{-1}(x + y) = q_{x+y}(x + y)$,

and consequently, $q_x = q_{x+y} = q_y$. If $x$ and $y$ are linearly dependent, then we can find $z \in H$ linearly independent of $x$ and $y$. So, by the previous step we have $q_x = q_z$ and $q_y = q_z$. Thus, $q_x$ is independent of $x$, as desired. Now, $U^*DUD^{-1}$ is a linear map on the quaternionic vector space, and therefore, $q \in \mathbb{R}$. It follows that $\langle U^*DUD^{-1}x, y\rangle = q\langle x, y\rangle$, $x, y \in H$, which further yields $\langle DUx, Uy\rangle = q f(\langle Dx, y\rangle)$, $x, y \in H$, where $f$ is the automorphism of the skew-field $\mathbb{H}$ corresponding to the semilinear operator $U$.

Acknowledgements. I would like to thank Lajos Molnár for many valuable comments on this work and the referee for numerous suggestions and for information on some known related results. This research was supported in part by a grant from the Ministry of Science of Slovenia.
References

1. Bargmann, V.: Note on Wigner’s theorem on symmetry operations. J. Math. Phys. 5, 862–868 (1964)
2. Bracci, L., Morchio, G., Strocchi, F.: Wigner’s theorem on symmetries in indefinite metric spaces. Commun. Math. Phys. 41, 289–299 (1975)
3. Van den Broek, P.M.: Symmetry transformations in indefinite metric spaces: A generalization of Wigner’s theorem. Physica A 127, 599–612 (1984)
4. Finkelstein, J., Jauch, J.M., Schiminovich, S., Speiser, D.: Foundations of quaternion quantum mechanics. J. Math. Phys. 3, 207–220 (1962)
5. Molnár, L.: Generalization of Wigner’s unitary-antiunitary theorem for indefinite inner product spaces. Commun. Math. Phys. 201, 785–791 (2000)
6. Molnár, L.: Orthogonality preserving transformations on indefinite inner product spaces: generalization of Uhlhorn’s version of Wigner’s theorem. J. Funct. Anal. 194, 248–262 (2002)
7. Ovchinnikov, P.G.: Automorphisms of the poset of skew projections. J. Funct. Anal. 115, 184–189 (1993)
8. Šemrl, P.: Applying projective geometry to transformations on rank one idempotents. J. Funct. Anal., to appear
9. Sharma, C.S., Almeida, D.F.: Additive isometries on a quaternionic Hilbert space. J. Math. Phys. 31, 1035–1041 (1990)
10. Uhlhorn, U.: Representation of symmetry transformations in quantum mechanics. Ark. Fysik 23, 307–340 (1963)

Communicated by M.B. Ruskai